To compensate for the higher capacitance of the wires, we shrank the Metal 1 wire width from 2.0 µm to 1.6 µm, taking advantage of the new wire design rules in the Rockwell HBT process. Shrinking the wires reduces the interconnect capacitance to about the level we expected based on nominal dielectric layer thicknesses. However, an analysis of distributed RC effects indicated that Metal 1 wires longer than 2 mm show distributed RC delay effects, resulting in a quadratically growing interconnect delay. This causes excessive interconnect delays that were not included in our simulations, since our tools supported only a linear interconnect delay model. Even using 2.0 µm wide Metal 1 wires to lower the wire resistance does not significantly reduce the quadratic delay effects, since the wider wires have larger coupling and crossover capacitances.

This is surprising, since distributed RC delays are typically relevant only in interconnects with submicron design rules. We notice distributed RC effects in our HBT circuits earlier because the drive impedance of bipolar drivers is low compared to CMOS; hence the wire resistance exceeds the output impedance of the drivers after only a few millimeters of interconnect. Since the targeted cycle time for our HBT chips is very short, a 50 ps excess delay due to distributed RC is significant, whereas a 50 ps delay increase in a CMOS circuit would have a much lower impact.
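The quadratic growth can be sketched with a first-order distributed-RC (Elmore/Sakurai-style) delay estimate. The coefficients and per-mm values below (21.9 Ω/mm derived from a 35 mΩ/sq sheet resistance at 1.6 µm width, 143 fF/mm, and a 112 Ω driver impedance) are assumptions drawn from elsewhere in this text, not the exact model used in our simulations:

```python
R_DRIVER = 112.0   # Ohm, assumed driver output impedance (from the 112 Ohm sensitivity quoted below)
R_WIRE = 21.9      # Ohm/mm, assumed: 35 mOhm/sq sheet resistance / 1.6 um width
C_WIRE = 143e-15   # F/mm, Metal 1 capacitance per unit length (Table 1)

def delay_ps(length_mm):
    """50%-point delay of a driver plus distributed RC line (textbook approximation)."""
    linear = 0.69 * R_DRIVER * C_WIRE * length_mm          # driver charging the total wire capacitance
    distributed = 0.38 * R_WIRE * C_WIRE * length_mm ** 2  # distributed wire term: quadratic in length
    return (linear + distributed) * 1e12                   # seconds -> picoseconds
```

With these assumed values the distributed term roughly doubles the linear estimate at 8 mm, consistent with the factor-of-two excess delay observed on long Metal 1 lines.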
In the Rockwell polyimide/Au interconnect process, only Metal 1 shows significant distributed RC delay effects, and only if the total Metal 1 wire length from the driver to a receiver on a net exceeds 2 mm. Metal 2 and Metal 3 are much thicker and have lower interconnect capacitances per unit length, since they are farther away from the GaAs substrate with its dielectric constant of 13.1. The other dielectrics in the Rockwell process (SiO2 εr=3.9, SiN εr=6.9, polyimide εr=2.9) have significantly lower dielectric constants. Table 1 shows the characteristics of the three interconnect layers in the Rockwell HBT process, and Figure 1 shows a vertical Metal 3 channel crossing a Metal 2 wiring channel and a standard cell.
|Table 1: Rockwell's Polyimide/Au Interconnect Layers|
|Layer||Thickness [µm]||Sheet Resistance [mΩ/sq]||Capacitance [fF/mm]|
|Metal 1 - W=1.6 µm P=6.0 µm||0.65||35||143|
|Metal 2 - W=2.0 µm P=6.0 µm||1.40||14||101|
|Metal 3 - W=3.0 µm P=9.0 µm||1.60||11||123|
Since Metal 3 is thicker than Metal 1, its sheet resistance is only 16 mΩ per square instead of 55 mΩ. In addition, the M2-M3 dielectric is thicker than the M1-M2 dielectric, reducing the coupling capacitance to Metal 2, and Metal 3 is farther away from the GaAs substrate with its high dielectric constant of 13.1. However, the minimum width and spacing of Metal 3 are 3 µm, instead of 1.6 µm for Metal 1 and 2.0 µm for Metal 2. Also, while the minimum width and spacing of the interconnect were reduced in 1995, the vias were not scaled: a M2-M3 via requires a 5 µm by 5 µm Metal 2 and Metal 3 area, and a M1-M2 via requires a 3 µm by 3 µm area. The large size of the vias can be seen in Figure 1, which shows a Metal 3 wiring channel crossing a standard cell with two power rails and a standard cell wiring channel.
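The per-millimeter wire resistance follows directly from the sheet resistance and the drawn width (R = Rs · L/W, i.e. sheet resistance times the number of squares). A small sketch using the Table 1 values as assumptions:

```python
def wire_resistance_ohm(sheet_mohm_per_sq, length_mm, width_um):
    """R = Rs * L / W: sheet resistance times the number of squares in the run."""
    squares = (length_mm * 1000.0) / width_um  # convert length to um so units match
    return sheet_mohm_per_sq * 1e-3 * squares  # mOhm/sq -> Ohm

# Table 1 values: Metal 1 (35 mOhm/sq, W=1.6 um) vs Metal 3 (11 mOhm/sq, W=3.0 um)
r_m1 = wire_resistance_ohm(35, 1.0, 1.6)  # ~21.9 Ohm per mm
r_m3 = wire_resistance_ohm(11, 1.0, 3.0)  # ~3.7 Ohm per mm
```

The roughly sixfold lower resistance per millimeter on Metal 3 is what suppresses the quadratic distributed-RC term when long nets are moved there.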
The RC problem can be avoided if segments of nets that contain long Metal 1 runs between the driver and a receiver are moved to Metal 3. However, the routing density of Metal 3 is much lower than that of Metal 1, and Metal 3 is already used for the power rails. Thus the power rails have to be cut and bridged with Metal 2, as shown in Figure 1. On the F-RISC/G architecture chips, Metal 1 is used for vertical interconnects and Metal 3 for the horizontal power rails. Fortunately, our estimates indicated that only 20-40 nets are affected on each chip, so the reduced wire density on Metal 3 is no impediment.
Based on these results, a move from Metal 1 to Metal 3 can both lower the interconnect capacitance on long nets and eliminate the distributed RC effect! Additional QuickCap runs indicate that the capacitance increase on Metal 2 wiring due to the additional Metal 3 overlap in the wiring channels is small (<5%).
The structure simulates a Metal 3 wiring channel crossing a standard cell row and a wiring channel with 20 differential Metal 2 wires. The cut in the Metal 3 power rails is closed with Metal 2 and M2-M3 vias as shown in Figure 1. The wiring pitch for Metal 3 is varied from 6 µm to 10 µm. The Metal 3 width is 3 µm, the minimum Metal 3 design rule.
Assuming the following net configuration:
The Metal 1 line is dominated by losses, and the signal rise time is severely degraded on long lines, causing long delays and a jitter problem in the presence of noise. The Metal 3 lines have much less resistance and show some ringing, since on-chip interconnects are not terminated.
Figure 7 shows Metal 1 and Metal 3 interconnect delays as a function of interconnect length. The Metal 3 delays are almost linear up to 10 mm, while the Metal 1 delays are clearly quadratic: the delay of an 8 mm long Metal 1 line is twice as long as expected from a linear delay model! The excess delay on a net running from the top to the bottom of a chip can therefore be 150 ps longer than originally simulated. Even spacing Metal 1 wires farther apart leads to no significant improvement. Increasing the pitch of Metal 3 from 6 µm to 7 µm improves delays on long lines; further increases in the Metal 3 pitch have no significant effect on delays but reduce interconnect density. Considering the size of the M2-M3 vias, a routing pitch of at least 9 µm is recommended. A 9 µm Metal 3 pitch allows in-line vias to be placed in the Metal 3 channels; thus, with a 9 µm pitch, the transition from Metal 3 to Metal 2 or Metal 1 can be made in the existing Metal 2 and Metal 3 wiring tracks of the nets to be moved to Metal 3. The M3W30P90r (Metal 3, width 3.0 µm, pitch 9.0 µm, reduced interlayer dielectric thickness) delays are a few percent lower than the original M1W20P60nR delays, which do not include the wire resistance and are based on a delay sensitivity to capacitance of 112 Ω for a high-power gate with the output at level 1.

Table 2 lists the interconnect delays for Metal 1, Metal 2, and Metal 3 differential interconnections as a function of wire length. The table shows the delays with nominal (suffix n) and reduced (suffix r) dielectric thicknesses. The M1W20P60nR entries represent the expected delays with nominal dielectric thicknesses and no distributed RC effects (Rwire=0). Table 3 lists the equivalent capacitance to ground (C11+C12 = C22+C12) of differential Metal 1, Metal 2, and Metal 3 interconnects.
|Table 2: Interconnect Delays [ps]|
|Table 3: Equivalent Capacitance to Ground for Differential Interconnects|
|Configuration||C11=C22 [fF/mm]||C12 [fF/mm]|
|M1W16P60r||143.5||22.0|
|M1W20P60n||135.1||28.6|
|M1W16P70r||141.0||19.4|
|M1W16P80r||137.7||18.1|
|M2W20P60r||100.5||20.8|
|M2W30P60n||93.8||21.3|
|M3W30P60r||135.1||25.2|
|M3W30P70r||125.4||17.5|
|M3W30P80r||124.1||13.7|
|M3W30P90r||122.8||13.4|
Figure 8 shows the voltage drop on a 7.2 mm long Metal 3 power rail and on a 7.2 mm power rail with three bridges for 14 differential Metal 3 wires with a 9 µm pitch at 1/5, 1/2, and 4/5 of the length. The three Metal 2 power rail bridges are each 288 µm wide. Clearly, even under worst-case assumptions, the Metal 2 bridges and vias do not significantly increase the voltage drop on the power rails.
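The rail voltage drop can be checked against a simple discretized model: an end-fed rail with a uniformly distributed current draw sees a far-end drop of I·R/2 rather than the lumped I·R. The rail resistance and total current below are hypothetical placeholders, not the actual F-RISC/G power-rail numbers:

```python
def ir_drop_volts(r_per_mm, i_total_a, length_mm, n=1000):
    """Far-end voltage drop of an end-fed rail with uniform current draw,
    discretized into n segments each sinking i_total/n."""
    r_seg = r_per_mm * length_mm / n
    drop = 0.0
    for k in range(n):
        current_through_seg = i_total_a * (n - k) / n  # current still flowing past segment k
        drop += r_seg * current_through_seg
    return drop

# hypothetical rail: 0.05 Ohm/mm, 106 mA total draw, 7.2 mm long
drop = ir_drop_volts(0.05, 0.106, 7.2)
```

The discretized result converges to I·R/2, half the naive lumped estimate, which is why even a worst-case uniform-draw assumption stays benign.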
(a) Original Routing (b) Local Optimization (c) Global Optimization
(a) Optimization with external M3 channel (b) Optimization with internal M3 channels
(a) Extraneous differential-signal crossovers (b) Optimized layout
Simulations indicate that the circuit should draw 106 mA, producing a power dissipation of about 0.25 W.
In the design of adders, the critical path is typically in the carry chain. The carry into any bit of an adder depends on all previous inputs. With a ripple-carry adder, the simplest type of adder, the carry out of one bit is computed directly from the carry of the previous bit, which is computed from the carry of the bit before that, and so forth. The carry chain thus extends in one path through every bit in turn, and the longest path through the adder is directly proportional to the number of bits in the adder.
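The linear-depth carry chain can be made concrete with a bit-level sketch (Python is used here purely for illustration; the actual adder is a gate-level design):

```python
def ripple_carry_add(a, b, n=32, cin=0):
    """Bit-serial ripple-carry addition: each bit's carry depends on the
    previous bit's carry, so the critical path grows linearly with n."""
    carry, result = cin, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i          # sum bit: a XOR b XOR carry-in
        carry = (ai & bi) | (carry & (ai ^ bi))   # carry out (majority function)
    return result, carry
```

The loop body corresponds to one full adder; the loop-carried `carry` variable is exactly the dependency that makes the hardware path n stages long.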
The carry select adder attempts to shorten this path through parallelization. The adder is divided into several stages. Within a stage, some other carry scheme is used, such as ripple carry. However, to avoid waiting for the carry in to the first bit of the stage, separate sets of logic are used to find the carries and sums for both possible carry-in values (0 and 1). By the time the carry in is available, all possible sums and carry outs have been computed, and the carry in can then be used to select the proper output via a multiplexor at each output. The longest path in this case consists of the carry through the first stage plus one multiplexor delay for each stage where the carry in selects the proper carry output.
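A behavioral sketch of the selection scheme (the 8-bit stage width and the Python modeling are illustrative assumptions): each stage adds its operand slices for both possible carry-in values, and the arriving carry merely steers a multiplexor:

```python
def carry_select_add(a, b, n=32, stage=8, cin=0):
    """Each stage precomputes results for carry-in 0 and carry-in 1; the
    arriving carry selects one, so the critical path is the first stage's
    carry plus one multiplexor per subsequent stage."""
    mask = (1 << stage) - 1
    result, carry = 0, cin
    for base in range(0, n, stage):
        aa, bb = (a >> base) & mask, (b >> base) & mask
        cand0 = aa + bb        # speculative stage result for carry-in = 0
        cand1 = aa + bb + 1    # speculative stage result for carry-in = 1
        chosen = cand1 if carry else cand0   # the "multiplexor"
        result |= (chosen & mask) << base
        carry = chosen >> stage              # selected carry out feeds the next stage
    return result, carry
```

Both candidate sums are formed before `carry` is inspected, mirroring how the hardware computes them in parallel while the carry ripples in.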
The performance of a full-custom, single-macro, varied-stage carry select adder was compared with the even-length-stage carry select adder spread across four dies currently used in the F-RISC project microprocessor. For future stages of the project, it has been proposed that a factor of two can be gained in adder speed through architectural adjustments and migration to a higher-yield technology in which the datapath could fit on one die, before device speed gains are even considered.
|Stages||Carry Stage Widths||Carry Settling Time [# Gate Delays]|
|9||No Possible Sequence||N/A|
|(1) Ripple carry only. No select needed for one stage|
|Path||Delay|
|b31 rising to s31 rising||57.2 ps|
|b32 falling to s31 falling||53.7 ps|
|cin falling to s0 rising||26.9 ps|
|cin falling to s31 rising||452.8 ps|
|cin falling to cout falling||460.6 ps|
|b0 rising to s0 falling||45.6 ps|
|b0 rising to s31 falling||521.3 ps|
|b0 rising to cout falling||527.7 ps|