The creation of test structures for characterizing and modeling processes and interconnect is an active endeavor in IC technology [Schr90][Ghan82][Ghee85]. Typically, a regular set of test circuits is included in the kerf as a process or control monitor. These structures are designed by process designers and are probed during and after the run to collect key process statistics. For a mature process, these measurements should be highly consistent from run to run. The same structures are used, together with special structures designed by device and circuit modeling engineers, to model the process for circuit designers. Again, for a mature process, any variation is attributed to the design and simulation tools, which are then calibrated accordingly, and another set is run for verification. Typically, these special test structures are designed using simple scaled geometries that ignore the effect of real on-chip topologies and circuit environments. In addition, these test structures have limited suitability for predicting circuit behavior in the middle of a reticle field. Therefore, delay and interconnect modeling based on these isolated test structures is not accurate enough to extract maximum performance out of a circuit. Effects such as non-planarity of the multilayer interconnect with dominating 3-D fringing fields, or self and mutual heating with the resulting device slowdown, are common in such technologies. These effects will generally not be seen in small planar test structures. Model verification and performance prediction must be done by measuring structures at the relevant dimensions with the actual materials and processing history to be implemented [Edel95][Deut94]. This becomes very important for high-performance chips, which try to squeeze maximum performance out of a process. Thus, test structures must approximate a typical chip environment and provide the desired data with a high level of accuracy.
This is especially important for a technology involving new semiconductor materials and new circuit concepts [Zucc80]. The continuously evolving nature of the AlGaAs/GaAs HBT process technology became apparent when the first batch of chips developed by the FRISC group came back from the foundry and the measured results were found to differ significantly from the simulation results based on the earlier models [Nah93]. This indicated the need for an additional safety margin in subsequent design efforts to compensate for unexpected variations in the process itself. The absence of any characterization structures on that run made it very difficult to correlate the measured results with simulation.
An opportunity arrived, in the fall of 1993, in the form of the High-Speed Circuit Design (HSCD) project to make demonstration test circuits and drive the calibration and augmentation of the CAD tools to reflect the test results. A strong emphasis was placed on the suitability of the technology for high-speed circuit design. Hence, key issues were the multilayer interconnect characteristics over short and long distances with varying topology and geometry, and device switching performance under transient, loaded, unloaded, low current density, and high-current density conditions. A test chip was designed to carry structures for this purpose.
A reticle containing the test chip and a few other demonstration
and experimental circuits was submitted to the foundry in the
Fall of 1994 and processed wafers were received by year end. The
testing and correlation process continued for another year after
receiving the first wafers. This chapter gives a brief overview
of the strategy employed to evaluate key circuit parameters and
the design of the test structures. The testing scheme and the
measurement results are described in the next chapter.
The HBT devices are formed in organo-metallic vapor phase epitaxy
(OMVPE) grown layers on semi-insulating GaAs substrates. Figure
2.1 shows a cross-section of the HBT device [Asbe91] used in this
effort. Many HBTs are made using an AlGaAs/GaAs heterostructure
due to a close match in their lattice constants [Lee93]. These
devices have a smaller base spreading resistance compared to a
homojunction BJT. Another advantage is that the emitter-base capacitance
can be made very small by a relatively low doping of the emitter.
The baseline device has an f_T of 50 GHz at a collector
current of 2.0 mA.
The interconnect process is non-planar, employing three levels of gold conductors, Si3N4 and polyimide interlevel dielectrics, and thin-film resistors (NiCr and WSiNx). The interconnect cross-section is shown in Figure 2.2. The nominal dielectric constants of polyimide and silicon nitride are 2.9 and 6.8, respectively. Silicon nitride lies on top of the first level of metal and is used to make capacitors for analog components of the circuits or to provide bypass capacitors. The gold interconnect and nitride layer on top of the devices also act as good heat spreaders. The overall fabrication technology is discussed in detail in [Ali91].
Early attempts to characterize GaAs technology targeted MESFETs [Zucc80][Brow93] and low-integration HBT gate arrays [Ying93]. The present work was the first attempt to characterize the HBT technology for high-density, high-speed, full-custom digital applications. A quarter of the reticle space (2 cm x 2 cm) was allocated for this purpose. Selecting appropriate test structures and a test scheme that could be implemented in this limited area therefore required many iterations from concept to implementation.
A typical high-density chip contains gates with varied loads, high fan-out nets, short local interconnects, long global interconnects, clock interconnection networks, and dense circuit macros such as register files and datapaths. Maximizing performance requires accurate information about device and interconnect behavior under different circuit conditions. Large, regular, low-complexity test structures were designed to make global performance estimates, and small high-density structures were designed to emulate the local environment. The structures fall into two main groups, passive and active, which are described in the following sections. Figure 2.3 shows the design flow adhered to in the design process. The layout of the test chip is shown in Figure 2.4 and the layout of the full reticle is given in Figure 2.5.
The passive structures consisted, mostly, of capacitive and resistive
structures to determine, electrically, parameters such as line
width, sheet resistivity, thickness, and dielectric constant of
metal and/or insulator layers in the process and to calibrate
extraction tools against these results. These structures are described
in this section.
There were three types of capacitors on the test chip: MIM/parallel,
finger, and crossover. The main difference between these capacitors
was the direction of the electric field between the capacitor
plates. All the capacitors were designed to be measured by 1-port
s-parameter measurements, which limits the capacitor values to a
few picofarads. The capacitors are probed with a coplanar GSG
probe with a 150 µm pad pitch [Casc83]. All the capacitors
are described below.
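The 1-port measurement can be illustrated with a short sketch. It assumes the capacitor behaves as an ideal shunt capacitance to ground and that the reference impedance is 50 ohms; neither the function names nor the numerical values come from this work, and probe and pad parasitics are ignored.

```python
import math

Z0 = 50.0  # assumed reference impedance of the measurement system, in ohms


def capacitance_from_s11(s11, freq_hz, z0=Z0):
    """Capacitance (farads) implied by a 1-port reflection coefficient.

    The input admittance is Y = (1 - S11) / (Z0 * (1 + S11)); for a pure
    capacitor Y = j*2*pi*f*C, so C = Im(Y) / (2*pi*f).
    """
    y_in = (1 - s11) / (z0 * (1 + s11))
    return y_in.imag / (2 * math.pi * freq_hz)


def s11_of_ideal_capacitor(c_farads, freq_hz, z0=Z0):
    """Reflection coefficient of an ideal capacitor; used only as synthetic data."""
    z = 1 / (1j * 2 * math.pi * freq_hz * c_farads)
    return (z - z0) / (z + z0)


if __name__ == "__main__":
    # Hypothetical 1 pF capacitor observed at 1 GHz.
    s11 = s11_of_ideal_capacitor(1e-12, 1e9)
    print("S11 =", s11)
    print("extracted C (pF) =", capacitance_from_s11(s11, 1e9) * 1e12)
```

In a real measurement the pad and probe contributions would have to be removed before this conversion is meaningful.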
The MIM capacitors were obtained by sandwiching the silicon nitride
layer between the first and second level metal layers. The schematic
and layout of a MIM capacitor are given in Figure 2.6. The Y and X
dimensions were varied to obtain two such capacitors. Their dimensions
and extracted capacitances, using QuickCap, are given in Table
2-1. The capacitance of these structures consists of a large
parallel plate component and a small fringing field component,
as shown in Figure 2.7.
Parallel plate capacitors were made by overlapping any two of
the three available metal layers. The schematic and layout of a parallel
plate capacitor between the M1 and M2 layers are shown in Figure
2.8. The only difference between a MIM capacitor and an M1/M2
parallel plate capacitor is the removal of the polyimide layer in
the former to obtain a high capacitance value. Again, X and Y
were varied to get capacitors of different values and of different
layer combinations: M1/M2, M1/M3, and M2/M3.
A circular dot would have been the best type of structure, as it
has the minimum periphery for a given area, but Boston (non-Manhattan) geometries
were not allowed in the layout. The overlap capacitors included
in the test set, with their dimensions and extracted capacitances
using QuickCap, are given in Table 2-2. Figure 2.9 illustrates
a model of an M1/M2 capacitor.
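The effect of removing the polyimide layer can be sketched with the ideal parallel-plate formula for dielectrics in series. The dielectric constants below are the nominal values quoted earlier (6.8 for silicon nitride, 2.9 for polyimide); the plate area and layer thicknesses are hypothetical placeholders, not process data, and fringing fields are ignored.

```python
EPS0 = 8.854e-12  # vacuum permittivity, F/m

EPS_NITRIDE = 6.8    # nominal value quoted in the text
EPS_POLYIMIDE = 2.9  # nominal value quoted in the text


def stack_capacitance(area_m2, layers):
    """Parallel-plate capacitance of dielectric layers stacked in series.

    layers is a list of (relative_permittivity, thickness_m) tuples.
    C = eps0 * A / sum(t_k / eps_k); fringing is not modeled.
    """
    return EPS0 * area_m2 / sum(t / er for er, t in layers)


# Hypothetical geometry chosen only for illustration.
AREA = 100e-6 * 100e-6   # 100 um x 100 um plate
T_NITRIDE = 0.2e-6       # assumed nitride thickness
T_POLYIMIDE = 1.0e-6     # assumed polyimide thickness

c_mim = stack_capacitance(AREA, [(EPS_NITRIDE, T_NITRIDE)])
c_overlap = stack_capacitance(
    AREA, [(EPS_NITRIDE, T_NITRIDE), (EPS_POLYIMIDE, T_POLYIMIDE)]
)

print("MIM (nitride only)          : %.3f pF" % (c_mim * 1e12))
print("M1/M2 (nitride + polyimide) : %.3f pF" % (c_overlap * 1e12))
```

With these assumed thicknesses the thick, low-permittivity polyimide dominates the series stack, which is why removing it raises the capacitance so sharply.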
Finger, or interdigital, capacitors were made by interdigitating signal and ground electrodes in a plane using only one metal layer. These capacitors emphasize the line-to-line capacitance, as the field in these capacitors is mainly in the horizontal direction. This kind of capacitance is present in circuits between the base and emitter contacts of bipolar transistors [Asbe87] or between the wires of a differential pair. The layout and schematic of a typical structure are shown in Figure 2.10 and the photomicrograph of a fabricated capacitor is shown in Figure 2.11. Two finger capacitors were put on the chip in the M1 layer with the dimensions shown in Table 2-3. These dimensions are typical of the M1 wiring on F-RISC/G chips. The spacing between two adjacent fingers was kept at 20 µm to make interfinger coupling negligible. Since the metal layers are not very thick, the number of fingers was increased to twenty to amplify the mutual capacitance. Both silicon nitride and polyimide are present as insulators between these M1 fingers. An approximate model of the finger capacitor is given in Figure 2.12.
|416 x 747.6||399/2/3|
|416 x 757.5||398/2/4|
Cross-over capacitors were obtained by orthogonally crossing conductors
in two metal layers at regular intervals. Such a structure maximizes
the field lines between two conductors and simulates the worst
case wiring capacitance when all the wiring tracks are occupied.
The schematic and photomicrograph of a crossover capacitor are
shown in Figure 2.13 and Figure 2.14, respectively. The characteristics of these
capacitors, provided in Table 2-4, are based on standard routing
pitches used currently in F-RISC/G chips. Use of the same routing
pitches was important for later correlation with the measured
results, and the direct insertion of the result into chip routing
tools. A simple model of an M1/M2 crossover capacitor is also
shown in Figure 2.14.
|M1 pitch = 4 µm (1 track), M2 pitch = 3 µm (1 track)|
|M1 pitch = 22 µm (3 tracks), M2 pitch = 21 µm (3 tracks)|
|M1 pitch = 6 µm (1 track), M3 pitch = 8 µm (1 track)|
|M2 pitch = 6 µm (1 track), M3 pitch = 8 µm (1 track)|
The resistor structures were designed to investigate the effect of line width, corners, and processing steps on the sheet resistivity of resistive and metal layers. A non-planar process produces variation in interconnect sheet resistance. Although this variation is small, long signal lines routed on top of devices in a dense register file can see increased line resistance, which adversely affects the current levels. Each resistor structure consists of five separate resistors, as shown in Figure 2.15. The resistors are measured by the four-point method and are probed with a multi-channel (SSPGSSGPSS) Cascade probe with a 150 µm pad pitch [Casc91]. The six signal pads on the probe can be used to measure the voltage across the five resistors (R1, R2, R3, R4, and R5). This arrangement provides additional accuracy by allowing the resistance between any two points in the structure to be measured.
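As a minimal sketch of how four-point readings translate into sheet resistance, the snippet below divides the sensed voltage by the forced current and then normalizes by the number of squares (length over width). The forced current, sensed voltages, and segment geometries are hypothetical values chosen for illustration; they are not measurements from this chip.

```python
def four_point_resistance(v_sense_volts, i_force_amps):
    """Resistance from a four-point measurement: sensed voltage over forced current."""
    return v_sense_volts / i_force_amps


def sheet_resistance(resistance_ohms, length_um, width_um):
    """Sheet resistance in ohms per square: R divided by the number of squares (L/W)."""
    return resistance_ohms * width_um / length_um


I_FORCE = 1e-3  # hypothetical forced current, in amperes

# Hypothetical segments: (name, sensed voltage in V, length in um, width in um)
segments = [
    ("R1", 0.0125, 500.0, 2.0),
    ("R2", 0.0063, 500.0, 4.0),
    ("R3", 0.0032, 500.0, 8.0),
]

for name, v_sense, length, width in segments:
    r = four_point_resistance(v_sense, I_FORCE)
    rs = sheet_resistance(r, length, width)
    print("%s: R = %.2f ohm, Rs = %.4f ohm/sq" % (name, r, rs))
```

Comparing the sheet resistance obtained from segments of different drawn widths is one way to expose line-width bias, since a constant width error affects narrow lines proportionally more.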
The metal resistor structures determine the effect on the sheet resistivity of metal layers due to different types of contacts, crossovers, and vias in a high-density layout. One example is shown in Figure 2.16. All the resistor structures are summarized in Table 2-5.
The response of a known passive network to a known excitation can be determined exactly by network analysis or experimentally by applying the stimulus to a sample. The uncertainty in this approach lies in the variation of the passive network properties and in replicating the excitation waveform encountered in real circuits. Even a fast ramp on the order of 25 ps to 30 ps requires very expensive equipment. Moreover, measurements with network analyzers and time domain reflectometry techniques require de-embedding the effect of the probes, pads, and contact parasitics. Some of these, such as contact resistances, depend on the force applied and can leave uncertainty in the measurements. On the other hand, in-circuit test schemes can easily be used to measure the network response to a large signal and to calibrate parasitic extraction tools. One common scheme is the use of ring oscillators made of a chain of gates. A ring oscillator can very easily produce and apply a stable multi-GHz stimulus to give direct time domain results. Ring oscillators can also simultaneously provide a low frequency synchronization signal to trigger an external scope to capture a high-speed signal. Ring oscillators have been used in many testing situations before [Dutt95][Lane87].
The active structures on the test chip use a ring oscillator as the basic structure and modify it to isolate the desired variable. At first it was thought sufficient to include a few oscillators with minimal wiring, composed of several interconnected stages, and to measure their oscillation period for gate-delay measurement [Lane87]. This idea can also be used to monitor the interconnect instead of the gate [Dutt95]. Here, the ring oscillators were used to evaluate both the devices and the 3-D interconnect effects simultaneously, with help from the passive structures. The ring oscillators were loaded with the typical interconnect loads encountered in digital designs.
The basic idea is shown in Figure 2.17. An odd number of inverters, or a chain of buffers with an odd number of inversions, can be connected in a circular chain to make an oscillator. This oscillator will oscillate at a frequency given by

$$f = \frac{1}{2\sum_{i=1}^{N} d_i} \qquad [2.1]$$

where $N$ is the number of oscillator stages and $d_i$ is the delay through the $i$th stage. Hence, if the delay through each stage is equal to $d$, the resulting oscillation frequency is simply given by

$$f = \frac{1}{2Nd} \qquad [2.2]$$
Therefore, if the delay through each stage is kept the same by careful design and layout, the maximum frequency of oscillation depends only on $d$, which varies from $d_{unloaded}$ to $d_{loaded}$, where $d_{loaded}$ is given by [Bako90]

$$d_{loaded} = d_{unloaded} + d_{interconnect} \qquad [2.3]$$

Here $d_{loaded}$, $d_{unloaded}$, and $d_{interconnect}$ signify the loaded gate, unloaded gate, and interconnect delays, respectively. $d_{unloaded}$ is the intrinsic gate delay and is calculated by measuring an unloaded ring oscillator. Once $d_{unloaded}$ is known, equation 2.3 can be used to obtain $d_{interconnect}$, as $d_{loaded}$ is calculated by measuring the loaded ring oscillators. Depending on the interconnect, as shown in Figure 2.18, this delay can be due to a lumped capacitive load, a distributed RC load, or a distributed RLC load.
A current mode logic (CML) gate acts like a current source charging a lumped capacitive load. The delay in charging a capacitor $C$ through a voltage swing $V$ by a current $I$ is given by

$$d = \frac{CV}{I} \qquad [2.4]$$
If V and I are known, the above equation determines the capacitive load at the gate output. The load at the output is also extracted by QuickCap. Any discrepancy between the measured and extracted results leads to a change in the process technology file for QuickCap. The same process technology file is used to verify the capacitive structures described earlier. This triple-check method therefore yields a process technology file with high accuracy. The device characteristics are measured with a separate structure to verify the device models. Thus, device performance under varying load can be estimated by employing ring oscillators with unloaded and loaded gates together. The design parameters of the oscillators designed for these measurements are given in Table 2-6. The unloaded and loaded ring oscillators are described in the following sections.
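The extraction chain described above can be sketched in a few lines. The sketch assumes the relations take their usual forms: oscillation period T = 2Nd for N identical stages, d_interconnect = d_loaded - d_unloaded, and C = I d / V for a current source charging a lumped capacitor. The periods used are hypothetical; the 2 mA gate current and 250 mV swing are the nominal values quoted for these oscillators.

```python
def stage_delay(period_s, n_stages):
    """Per-stage delay from the oscillation period, assuming T = 2 * N * d."""
    return period_s / (2 * n_stages)


def interconnect_delay(period_loaded_s, period_unloaded_s, n_stages):
    """Interconnect delay per stage: d_loaded - d_unloaded."""
    return stage_delay(period_loaded_s, n_stages) - stage_delay(
        period_unloaded_s, n_stages
    )


def load_capacitance(delay_s, current_a, swing_v):
    """Lumped load implied by a delay, assuming d = C * V / I."""
    return current_a * delay_s / swing_v


N_STAGES = 8       # stages per oscillator, as used on the test chip
I_GATE = 2e-3      # nominal gate current, A
V_SWING = 0.250    # nominal output swing, V

# Hypothetical measured periods, for illustration only.
T_UNLOADED = 350e-12
T_LOADED = 560e-12

d_int = interconnect_delay(T_LOADED, T_UNLOADED, N_STAGES)
c_load = load_capacitance(d_int, I_GATE, V_SWING)

print("interconnect delay per stage: %.3f ps" % (d_int * 1e12))
print("implied lumped load         : %.1f fF" % (c_load * 1e15))
```

The implied load is what would be compared against the QuickCap extraction; a real gate is not an ideal current source, so some systematic difference should be expected from this simple model.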
Figure 2.18: Ring oscillators
with (a) unloaded (b) capacitive (c) distributed RC, and (d) distributed
RLC loads.
|Gate current level|
|Number of stages|
|Length of interconnect|
|Type of interconnect|
|Type of load|
The main design issues for unloaded oscillators were the area required and testing convenience. A six-channel 5 GHz ceramic probe from Cascade [Casc91] was available for testing. A 4-to-1 multiplexer was used to switch between four oscillators, requiring three channels: two for the multiplexer select signals and one for the multiplexer output. The other three channels were used to monitor the on-chip power supply and the voltage swing in a special voltage-current monitor cell, in order to calibrate the swing at 250 mV across all the oscillators.
The next problem was the number of stages in an oscillator. More stages give a lower oscillation frequency, which is easier to measure. The best choice turned out to be an 8-stage oscillator. A 6-stage oscillator would have been close to the upper measurement frequency of the probe and a 7-stage oscillator would have disrupted the regular layout. The multiplexer could also have been included in the oscillating loop, as shown in Figure 2.19. That configuration does not lend itself to easy post-fabrication analysis, as the constituent stages are not all of the same type. Finally, the configuration shown in Figure 2.20 was chosen to decouple the multiplexer from the ring oscillators.
The schematic shown in Figure 2.20 formed the basis of both the unloaded and loaded oscillators. The layout of the unloaded oscillator structure is shown in Figure 2.21, with four 8-stage loops connected via a 4-input multiplexer to a 50 Ω pad driver for scope outputs. These oscillators are made up of two types of devices: a standard Q1 and a round-emitter Q1. There were two oscillators of each device type in the structure and their design parameters are given in Table 2-7.
Three basic types of routing schemes were used in F-RISC/G HBT
standard cell areas as enumerated in Table 2-8.
|Differential routing between standard cells||wiring pitch = 6 µm|
|Differential routing inside standard cells||wiring pitch = 4/5 µm|
|Single ended routing inside standard cells||wiring pitch = 2/3 µm|
Several ideas emerged regarding the design of test structures which would give an indication of capacitance of a wire under these routing conditions. These conditions can also be described as
A strategy was developed to provide as many structures as possible and still be able to measure all the situations described above.
Since there are four oscillator spots available in any structure, as described in the section on unloaded oscillators, all connected to a single multiplexer, it was easier to calibrate the structures if all of them contained the same kind of oscillator. Finally, a test structure was designed with four oscillators of the same load type and signal lengths varying from 530 µm to 1906 µm. In total there was space for seven oscillator structures on the chip, including the unloaded oscillators. The line lengths were similar in each group and only the loading type was varied: overlap, crossover, or finger. One underlying assumption was that the interconnect capacitance would behave as a lumped load instead of a distributed one. Therefore, the maximum signal length was kept below 2 mm to ensure a lumped load. The photomicrograph of one such structure is shown in Figure 2.22.
Table 2-9 lists all the different types of ring oscillators along with their load type, load capacitance, and their oscillation period - determined from PSPICE simulations - grouped according to the structure type. The structure type in the first column also cross-references the layout by the encircled numbers in Figure 2.23 which shows the combined layout of all the ring oscillator structures.
|Structure type (number in Figure 2.23)||Load type||Load capacitance||Oscillation period|
|Unloaded (1)||Minimum M1||14||354|
|Simple (4)||530 µm M1||86||570|
|Simple (4)||1218 µm M1||180||825|
|Simple (4)||1562 µm M1||229||947|
|Simple (4)||1906 µm M1||281||1087|
|Simple (7)||530 µm M2||73||534|
|Simple (7)||1218 µm M2||156||764|
|Simple (7)||1562 µm M2||200||875|
|Simple (7)||1906 µm M2||238||980|
|Overlap / M3 Plane (5)||530 µm M1/M3 Gnd||84||567|
|Overlap / M3 Plane (5)||1218 µm M1/M3 Gnd||188||841|
|Overlap / M3 Plane (5)||1562 µm M1/M3 Gnd||234||966|
|Overlap / M3 Plane (5)||1906 µm M1/M3 Gnd||283||1090|
|Overlap / M2 Plane (3)||530 µm M1/M2 Gnd||99||603|
|Overlap / M2 Plane (3)||1218 µm M1/M2 Gnd||211||900|
|Overlap / M2 Plane (3)||1562 µm M1/M2 Gnd||266||1058|
|Overlap / M2 Plane (3)||1906 µm M1/M2 Gnd||328||1209|
|Finger / M3 (6)||530 µm M1/M3 Gnd||87||573|
|Finger / M3 (6)||1218 µm M1/M3 Gnd||187||844|
|Finger / M3 (6)||1562 µm M1/M3 Gnd||233||970|
|Finger / M3 (6)||1906 µm M1/M3 Gnd||286||1105|
|Finger / M2 (2)||530 µm M1/M2 Gnd||94||595|
|Finger / M2 (2)||1218 µm M1/M2 Gnd||203||876|
|Finger / M2 (2)||1562 µm M1/M2 Gnd||255||1027|
|Finger / M2 (2)||1906 µm M1/M2 Gnd||310||1169|
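One way to read Table 2-9 is to fit the simulated oscillation period against the load capacitance within a group. The sketch below does this for the four "Simple (4)" M1 rows with an ordinary least-squares line. The numbers are taken directly from the table and used in the table's own units, which are not restated here; the fitting routine and its interpretation are illustrative and not part of the original analysis.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# "Simple (4)" M1 rows of Table 2-9: load capacitance and simulated period.
load_cap = [86, 180, 229, 281]
period = [570, 825, 947, 1087]

slope, intercept = linear_fit(load_cap, period)
print("period per unit load capacitance:", round(slope, 3))
print("extrapolated zero-load period   :", round(intercept, 1))

# Residuals show how well a single lumped-load line describes the group.
for c, t in zip(load_cap, period):
    print(c, t, round(t - (slope * c + intercept), 1))
```

If the gates behaved as ideal current sources driving lumped loads, the slope would be expected to approach 2NV/I in consistent units; a larger fitted slope would suggest that the simulated gate is slower to charge the load than that idealization implies.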
Figure 2.23: Layout of all
the unloaded and loaded ring-oscillators along with the load type.
The structure numbers are cross-referenced in Table 2-9.
On-chip wire topology is much more complex than a simple long
wire. Standard options in present day design tools are to model
the wire as (a) or (b) in Figure 2.24 [Kahn96]. With longer wire
lengths, the delay is dependent on wire resistance too. These
models start to fail if the total interconnect resistance is comparable
to the driver output resistance.
Figure 2.24: Various types
of interconnect models (a) lumped C (b) lumped R and C (c) distributed
R and C (d) distributed R, L, and C.
In this regime the wire delay depends on both resistance and capacitance and grows quadratically with wire length. When the wire lengths are short, the delay is linear with respect to the capacitance. The transition from the linear to the quadratic regime comes when the wire resistance becomes significant with respect to the output resistance of the driver. In bipolar logic this point comes very early, as the driver output resistances are in the 150 - 250 Ω range. Based on calculations, M1 wires start showing the quadratic RC delay effect after a length of only 1.5-2 mm. On long global wires, even the wire inductance is starting to become significant. Assuming a lumped R and lumped C, as in (b), is optimistic as all the capacitance is shielded by the resistance. A distributed RC tree (c) gives an accurate delay as long as the inductance is not significant. As signal frequencies increase, a full RLC model (d) is needed to model these interconnects.
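The linear and quadratic contributions can be compared with a first-order estimate. The sketch below uses the common approximation that a driver resistance charging the wire capacitance contributes about 0.69 R_driver C_wire to the 50% delay, while the distributed wire itself contributes about 0.38 R_wire C_wire. The per-unit-length resistance and capacitance are hypothetical placeholders, not extracted values for this process; the 200 Ω driver resistance is simply a value inside the range quoted above.

```python
def wire_delay_terms(length_um, r_driver_ohm, r_per_um, c_per_um):
    """Return (linear_term, quadratic_term) of a first-order 50% delay, in seconds.

    linear term    : 0.69 * R_driver * c * L   (driver charging the wire capacitance)
    quadratic term : 0.38 * r * c * L**2       (distributed RC of the wire itself)
    """
    c_wire = c_per_um * length_um
    r_wire = r_per_um * length_um
    return 0.69 * r_driver_ohm * c_wire, 0.38 * r_wire * c_wire


R_DRIVER = 200.0    # ohms, within the 150 - 250 ohm range quoted in the text
R_PER_UM = 0.025    # ohms per um, hypothetical
C_PER_UM = 0.15e-15 # farads per um, hypothetical

for length in (500, 1000, 2000, 4000, 8000):
    lin, quad = wire_delay_terms(length, R_DRIVER, R_PER_UM, C_PER_UM)
    print(
        "L = %5d um  linear = %6.2f ps  quadratic = %6.2f ps  ratio = %.3f"
        % (length, lin * 1e12, quad * 1e12, quad / lin)
    )
```

Doubling the length doubles the linear term but quadruples the quadratic one, so the ratio grows in proportion to length; where it stops being negligible depends entirely on the assumed wire and driver values.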
On-chip interconnections are fanned out in many ways and this
makes the calculation of driver to receiver delay, also known
as sink delay, even more complex. A number of algorithms, such
as asymptotic waveform evaluation [Pill90] and Padé approximation
[Lin92], have been put forward to approximate an RC tree. The earliest
one used a simple RC time constant to approximate the 50% delay
time [Elmo48]. These approximations can eat into the design margin
of a high-performance design and therefore real SPICE based numbers
need to be generated to get a measure of confidence in delay modeling
before committing to an expensive run.
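The simple RC time-constant approach can be illustrated with an Elmore-style delay calculation on a small fanned-out RC tree. In this sketch each node has one upstream branch resistance and a capacitance to ground, and the delay to a sink is the sum, over every branch on the path from the driver, of that branch resistance times all capacitance downstream of it. The tree topology and element values are hypothetical.

```python
def elmore_delays(parent, resistance, capacitance):
    """Elmore delay from the driver to every node of an RC tree.

    parent[n]      : upstream node of n, or None for the node fed by the driver
    resistance[n]  : resistance of the branch feeding node n
                     (for the first node, the driver output resistance)
    capacitance[n] : capacitance to ground at node n
    """
    # Accumulate each node's capacitance into itself and all of its ancestors.
    downstream = {n: 0.0 for n in parent}
    for n in parent:
        node = n
        while node is not None:
            downstream[node] += capacitance[n]
            node = parent[node]

    delays = {}

    def delay(n):
        if n not in delays:
            upstream = 0.0 if parent[n] is None else delay(parent[n])
            delays[n] = upstream + resistance[n] * downstream[n]
        return delays[n]

    for n in parent:
        delay(n)
    return delays


if __name__ == "__main__":
    # Hypothetical net: driver -> trunk, which fans out to two sinks.
    parent = {"trunk": None, "sink_a": "trunk", "sink_b": "trunk"}
    resistance = {"trunk": 200.0, "sink_a": 20.0, "sink_b": 50.0}       # ohms
    capacitance = {"trunk": 50e-15, "sink_a": 30e-15, "sink_b": 80e-15}  # farads
    for node, t in elmore_delays(parent, resistance, capacitance).items():
        print("%-7s %.2f ps" % (node, t * 1e12))
```

The result is a single time constant per sink, which is why it is fast to evaluate but only approximates the delay that a SPICE simulation would report.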
A test structure was designed using an 8 mm long tapped delay
M1 line as shown in Figure 2.25. It has a very compact layout
and simplifies testing by using a special ring-oscillator configuration.
There are 8 ring oscillators formed out of 8 taps on the delay
line. All 8 oscillators can be measured in one probe touchdown,
compared with only 4 in the scheme described earlier for the unloaded oscillators.
The layout of the structure is shown in Figure 2.26. The inputs
to the structure are 3 DC signals to a 3-to-8 decoder which generates
8 select lines. These lines in turn select one of the ring oscillators.
|Size||1.55 mm x 1.6 mm|
|Device Count||302 Standard Q1 HBT, 20 Diodes, 104 resistors|
|Current Level / Output Swing||2 mA / 250 mV|
|Number of Stages||8|
|Outputs / Inputs||3 (1 High-speed) / 3|
Since the ring oscillator structures involve active devices, special probe de-embedding sites were provided nearby as an independent way of measuring the device characteristics. There were both de-embedded transistors and de-embedded Schottky diodes on the chip. A standard compact structure was used to characterize these devices with two-port s-parameter measurements, as shown in Figure 2.27.
The upper half of the structure contains a device connected in
a common-emitter configuration with the B (base), E (emitter),
and the collector (C) terminals as labeled on the pads. The lower
half of the structure contains the same geometry but without the
device. The measurements from the lower half of the structure
are used to de-embed the data from the upper half. The structure
is probed in two steps with a coplanar high-speed probe. The measurement
scheme and results are described in the next chapter.
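One common way to use an empty companion site, assumed here for illustration, is open (Y-parameter) de-embedding: both two-port measurements are converted from S-parameters to Y-parameters and the admittance of the empty site is subtracted from that of the device site. The sketch below uses plain 2 x 2 complex arithmetic and a 50 ohm reference impedance; the function names and the synthetic data are hypothetical, and the procedure actually used in this work is the one described in the next chapter.

```python
IDENTITY = [[1, 0], [0, 1]]


def mat_add(a, b, sign=1):
    """Element-wise a + sign * b for 2 x 2 matrices."""
    return [[a[i][j] + sign * b[i][j] for j in range(2)] for i in range(2)]


def mat_mul(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_inv(m):
    """Inverse of a 2 x 2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def s_to_y(s, z0=50.0):
    """Y = (1/Z0) * (I - S) * (I + S)^-1 for a real reference impedance Z0."""
    y = mat_mul(mat_add(IDENTITY, s, -1), mat_inv(mat_add(IDENTITY, s)))
    return [[y[i][j] / z0 for j in range(2)] for i in range(2)]


def y_to_s(y, z0=50.0):
    """Inverse conversion; used here only to build synthetic measurements."""
    zy = [[z0 * y[i][j] for j in range(2)] for i in range(2)]
    return mat_mul(mat_add(IDENTITY, zy, -1), mat_inv(mat_add(IDENTITY, zy)))


def open_deembed(s_device_site, s_open_site, z0=50.0):
    """Device Y-parameters after subtracting the empty-site admittance."""
    return mat_add(s_to_y(s_device_site, z0), s_to_y(s_open_site, z0), -1)


if __name__ == "__main__":
    # Synthetic example: a made-up device admittance plus made-up pad parasitics.
    y_device = [[0.010 + 0.002j, -0.001j], [0.050, 0.003 + 0.001j]]
    y_pads = [[0.001j, -0.0001j], [-0.0001j, 0.001j]]
    s_meas = y_to_s(mat_add(y_device, y_pads))
    s_open = y_to_s(y_pads)
    print(open_deembed(s_meas, s_open))
```

This removes only parasitics that appear in parallel with the device; series contributions such as contact resistance would need an additional short or through structure.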
This structure contains eight 30-stage oscillators to test the
unloaded gate delays of different types of transistors under different
biases. The structure has a very compact layout, contains more
than 1000 transistors in an area of 1.6 mm², and is
used as a yield indicator. An augmented testing scheme measured
oscillation frequencies, voltage swing, and current levels of
all eight oscillators in just one probe touchdown.
Table 2-11 summarizes the characteristics of this structure. The
schematic and layout of the structure are shown in Figure 2.28
and Figure 2.29 respectively. Four ring oscillators feed each
of the 4-to-1 multiplexers which in turn feed a 2-to-1 multiplexer.
The output of this multiplexer drives a 50-ohm driver. The inputs
to the structure are 3 DC select signals that select one of the 8 oscillators.
|Size||1.0 mm x 1.6 mm|
|Device Count||1057 Transistors, 8 Diodes, 782 Resistors|
|Device Type||Standard and Non-Standard Q1|
|Current Level / Stages||2 mA, 1.6 mA, 0.8 mA / 30 Stages|
|Output Swing||250 mV|
|Outputs / Inputs||4 (1 High-speed) / 3|
There were other chips and chiplets on the reticle as shown in
Figure 2.5. These were: RPI Test Chip, Voltage Controlled Oscillator,
and Boundary Scan Test Chip. These chips contained much more complex
circuits, which were verified by applying results
from the structures described herein. The RPI test chip tests
a 200 ps, 32-word x 8-bit SRAM and a 1 ns delay carry chain. The
SRAM is a sensitive indicator of the accuracy of capacitance extraction
because its wiring is extremely intricate and the operating speed of the
memory is heavily limited by wiring capacitance. The carry chain
similarly tests the raw speed of the devices in a minimal-load environment.
The voltage controlled oscillator [Camp97] tested the limits of
the transistor speed by trying to achieve a 20 GHz oscillator.
The boundary scan chip contained a scheme to test complex chips
A test chip containing passive and active test structures was designed to characterize an AlGaAs/GaAs HBT process. The structures were developed, in particular, to investigate the process for high-density digital applications. A further requirement was the calibration of parasitic extraction tools for this process, in which the parasitics are mostly 3-D in nature. Both passive and active structures were designed for this purpose. The measurement methods and test results are described in the next chapter.