The key requirements of the F-RISC/G package are dense placement and routing and the ability to handle high power dissipation. These requirements led to frequent pad rearrangement [Loy96], combined with critical path routing estimation and driver power optimization. The behavioral-level design of the package involved chip partitioning, global route estimation, and power allocation, as explained in the previous chapter. This chapter describes the physical design of the package and the package testing scheme.

The chapter is organized as follows:

Chip Preparation

Any packaging scheme places some design requirements on the chip pad footprint. A modern chip can have its signal I/O and power pads arranged in a single peripheral ring, in staggered or non-staggered rows, or in an area array. A single pad ring is typically tried first for a single-chip package, unless the number of pads required is much larger than one ring can hold. In that case, multiple rings or area arrays are considered. Sometimes, even if the chip is not pad limited, power distribution concerns may require area arrays.

Figure 5.1: Staggered pad arrangement on F-RISC chips.

In a performance-driven multichip package, chip pads need not follow any regular arrangement. In practice, however, routing considerations lead designers to keep them on a regular grid. While the package type was being investigated, several iterations were made on the chip pad footprints. Because of process-induced restrictions, pads could be placed only around the core logic; the ability to place pads in the middle would have saved power and eased routing. The final arrangement is a garland of staggered pad rows with a 150 µm pitch within a row and a 75 µm pitch between the two rows, as shown in Figure 5.1. Each differential pad pair was split between the two rows on a 75 µm pitch.

On-chip Terminators

The signal environment required 50 Ω termination. This could have been achieved with either on-chip resistors, taking advantage of the readily available resistor layer in the process, or thin-film resistors on the package. Package resistors would have increased the routing complexity on the package. On the other hand, the on-chip resistor layer had been tested thoroughly in the earlier interconnect characterization effort, and the on-chip resistors were found to be within 5% of the desired value, as reported in Chapter 3. Therefore, the terminators were placed on the chips themselves, which also simplified manufacturing.

As shown in Figure 5.2, each receiver pad has a nearby 50 Ω resistor with its own pads. If the receiver in question is the last receiver in the daisy chain, it becomes a candidate for termination. In that case, the receiver pad is connected to one end of the resistor and the other end of the resistor is pulled up to the power supply. The incoming signal is also connected to the first end of the resistor. Alternatively, the incoming signal is connected to the receiver pad, and a small jog from there connects it to the resistor. Each differential receiver has two input pads, and each pad requires its own resistor. The two resistors share a common node and are connected accordingly to save space on the package. In addition, a nearby ground pad is provided on an extra rail for connection to the ground plane. This rail is separated from the chip ground rail so as not to disturb the circuits in the chip core.

Figure 5.2: Pad and terminator arrangement around the chip periphery.

Boundary Scan Pad Redesign

Known Good Die (KGD) acquisition has always been a source of concern for MCM foundries [Mali94]. Normally, a KGD is supplied after some type of testing - wafer level, functional, at-speed, or chip-level burn-in. This requires some sort of carrier for the die to be put in. After the appropriate test, this carrier is removed from the die and the die is declared a KGD in case of a successful test.

The die can be damaged by probing or during removal from the test carrier, or its pads can become unsuitable for making good contact in the next packaging step. Therefore, the boundary scan pads were elongated so that each pad still has undamaged area left after the chip is declared a KGD. The elongation increases the pad capacitance, which is not a concern for the low-speed boundary scan signals. Only one of these pads requires a high-speed clock for at-speed testing, and it is designed to drive this extra capacitance. This sacrificial area is shown in Figure 5.3.

Figure 5.3: Redesigned boundary scan pad.

Extra Ground Pads for Testing

One result of the terminator pad design explained in the previous sections was the inclusion of extra ground pads near all the drivers. These pads allow a 50 Ω ground-signal (GS) probe to land on a driver pad for voltage sensing. This probe is used in addition to the other probes for scan testing.

The schematic of this type of test is shown in Figure 5.4. Only one pad of a differential pad pair can be probed this way. The reasoning is that if that pad is damaged or its signal is not at the right level, probing the other pad yields no additional useful information for passing the chip as a KGD. A less obvious consequence of the asymmetric probe pad arrangement is that two ground-signal probes with mirrored arrangements, GS and SG, are needed to test the left and right sides of the chip simultaneously, because the probe arm can be moved in only one direction.

Figure 5.4 : Extra ground pads for testing.

Floorplanning and Optimization

The final floorplan was reached by simultaneously adjusting the chip I/O pad placement and the overall package floorplan. Typically, a package designer has the freedom to change die placement to meet system specifications. An additional degree of freedom is necessary to achieve maximum system performance with a given set of chip and packaging technologies. This flexibility comes from the ability to change pad placements on the chips and requires close collaboration between the chip and package designers. Such an approach can be called "giant chip" [Kuh92] or "superchip" [Poon95] development.

The ability to dictate pad placements on the chips led to a modification of the earlier floorplan shown in Figure 5.5, which overlays several critical path windows on the floorplan. With a cycle time goal of 1 ns, the system chip placement was originally performed by hand. Signal propagation times were analyzed for the critical paths in the system. The cache controller address broadcasts were the most crucial nets, and their time-of-flight windows are overlaid on the preliminary MCM. The time-of-flight delays did not meet the timing specification for a number of critical paths. The longest net, from the cache controllers to all the memory chips, was about 10 cm. Given these timing constraints, it became apparent that a revised floorplan was necessary.
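As a rough illustration of why a 10 cm net was a problem, the sketch below converts net length to one-way time of flight. It assumes the nominal stripline delay of 5.85 ps/mm from Table 5-3 and ignores driver, receiver, and rise-time effects.

```python
# Rough time-of-flight estimate for a package net.
# Assumption: uniform stripline delay of 5.85 ps/mm (nominal value from Table 5-3).
DELAY_PS_PER_MM = 5.85


def time_of_flight_ps(length_mm: float, delay_ps_per_mm: float = DELAY_PS_PER_MM) -> float:
    """Return the one-way flight time in ps for a line of the given length in mm."""
    return length_mm * delay_ps_per_mm


cycle_ps = 1000.0                            # 1 ns cycle time goal
longest_net_ps = time_of_flight_ps(100.0)    # ~10 cm cache controller broadcast
fraction_of_cycle = longest_net_ps / cycle_ps
print(f"{longest_net_ps:.1f} ps of flight time, {fraction_of_cycle:.3f} of the cycle")
```

Under these assumptions, flight time alone uses more than half of the 1 ns cycle before any logic, driver, or receiver delay is added.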

Figure 5.5: An earlier floorplan with overlaid critical timing windows.

The resulting floorplan is shown in Figure 5.6. In this configuration, both cache controllers are located directly beneath Datapath chips, which shortens the address broadcast. This revision fixed the broadcast from the datapath chips to the cache controllers. However, the cache controller address broadcasts were still out of tolerance. Placing two sets of output address pads on each cache controller allowed each controller to broadcast to two sets of four memory chips, as shown in Figure 5.7. This modification significantly reduces the maximum time-of-flight delay compared with broadcasting to all eight chips from a single driver. All the new critical paths are shown in Figure 5.8.

Figure 5.6: Final version of F-RISC/G package floorplan.

Figure 5.7: Cache controller address broadcast from two sets of pads.

Figure 5.8: Critical data, instruction, and control paths.

Electrical Design

I/O Circuit Design

Several types of I/O drivers and receivers were designed for specific circuit needs and signal environments. All off-chip signals are differential except the boundary scan control signals. The differential driver and receiver are described below, and their schematics are shown in Figure 5.9.

Figure 5.9: Circuit schematics of an open collector differential driver and a differential receiver.

Differential Drivers

The output drivers are open-collector drivers with high and low voltage levels of 0 V and -400 mV, respectively, at 25 °C. Schottky diodes prevent the driver from saturating if the output is unused. The drivers act as high-impedance switched current sources, and the pull-up resistors also serve as terminators for the off-chip transmission lines. As shown in Figure 5.10, the driver has a 10%-90% rise time of 70 ps and a delay of 45 ps at a current of 8 mA and an output swing of 400 mV.

Differential Receivers

The pad receivers are differential buffers with a high level of 0 V and a low level of -200 mV (min) at 25 °C. To account for transmission line losses on long lines, the receivers have enough gain to generate a full 250 mV output swing from an input differential of 150 mV. Receiver sensitivity was analyzed by sweeping the input differential voltage and observing the output differential. The results are shown in Figure 5.11. A difference of 100 mV at the receiver input is enough to switch the receiver. The receiver delay is 15 ps at a switching current of 1.2 mA, as shown in Figure 5.12. The driver and receiver characteristics are summarized in Table 5-1.
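The sketch below gives a first-order check of the driver swing and the power entries in Table 5-1. It assumes that the open-collector driver steers its full current through the 50 Ω pull-up and terminator. It also assumes that each circuit's power equals its switching current times the magnitude of the -5.2 V supply listed in Table 5-5.

```python
# First-order check of the I/O numbers in Table 5-1.
# Assumptions: full driver current flows through the 50-ohm pull-up/terminator,
# and power = switching current * |supply| with the -5.2 V supply of Table 5-5.
SUPPLY_V = 5.2       # magnitude of the -5.2 V supply [V]
TERM_OHMS = 50.0     # on-chip terminator / pull-up [ohm]


def swing_mv(current_ma: float, load_ohms: float = TERM_OHMS) -> float:
    """Voltage swing across the load: mA * ohm = mV."""
    return current_ma * load_ohms


def power_mw(current_ma: float, supply_v: float = SUPPLY_V) -> float:
    """Power drawn from the supply: mA * V = mW."""
    return current_ma * supply_v


driver_swing_mv = swing_mv(8.0)        # 8 mA driver current
driver_power_mw = power_mw(8.0)
receiver_power_mw = power_mw(1.2)      # 1.2 mA receiver current
print(driver_swing_mv, round(driver_power_mw, 2), round(receiver_power_mw, 2))
```

Both products match the 41.6 mW and 6.24 mW table entries. This suggests the tabulated power is current times supply voltage, but that reading is an inference rather than a statement from the design data.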

Table 5-1: Summary of driver and receiver characteristics.

Circuit | Power | Delay | Signal Swing | Rise Time
Differential Driver | 41.6 mW | 45 ps | 400 mV at output | 70 ps
Differential Receiver | 6.24 mW | 15 ps | >= 150 mV at input |
Figure 5.10: Driver response with (a) delay = 45 ps, (b) 10%-90% rise time = 70 ps, and (c) switching current = 8 mA.

Figure 5.11: SPICE simulation of the output voltage differential vs. the input voltage.



Figure 5.12: SPICE simulated receiver response: (a) delay = 15 ps and (b) switching current = 1.2 mA.

Layer stack design

The characteristics of the off-chip interconnect structure can be inferred from the number of signals, the congestion, the critical signals, and the power requirements. The horizontal congestion of the nets between ID and DP demands a very small wire pitch. The pitch is also constrained by the signal swing requirement of the longest net. For equal insulator thicknesses, striplines provide higher density than microstrips. Table 5-2 summarizes the design issues in selecting a stack with a minimum of two controlled-impedance routing layers, and the candidate cases are shown in Figure 5.13. Thick power planes serve as reference planes and provide low-loss power distribution. Case 1 was selected because of its high routing capacity. The signal lines have 50 Ω impedance and are 36 µm wide on a 75 µm routing pitch. There are two ground planes and two power planes. Each signal layer references a ground layer to provide a low-inductance path for any common-mode current. A notable feature of the stack is a set of power planes placed directly on top of the chips. This moves all the power vias out of the routing layers, which increases routability significantly. The shorter power vias also reduce the power and ground via inductances.

Table 5-2: Trade-offs between Cases 1,2 and 3.

Case 1
Case 2
Case 3
routing direction
routing density
manufacturing cost

Figure 5.13: Possible stacks. Case 1: 6 layers, Case 2: 5 layers, Case 3: 5 layers.

Interconnect Parameters

The interconnect cross-section for a routing layer is shown in Figure 5.14. Here h is the distance between a line and the nearest reference plane, t is the line thickness, w is the line width, s is the line pitch, and d is the differential routing pitch.

Figure 5.14: Interconnect cross-section.

Each interconnect layer consists of an adhesive layer, a polymer dielectric film, and a Ti/Cu/Ti metal layer [Kris96]. The first-layer dielectric is Ultem adhesive with KaptonE film. The upper layers use siloxane polyimide/epoxy (SPIE) blend adhesive with KaptonE film. For upper metal layers 1 to 5, the interconnect metal is 0.1 µm sputtered Ti / 0.3 µm sputtered Cu / 4 µm electroplated Cu / 0.1 µm sputtered Ti. The metal 0 layer uses 10 µm thick Cu to bring in power. The modeled cross-section with its layer dimensions is shown in Figure 5.15.

Figure 5.15: Cross-section of the interconnect structure for parameter extraction.

This cross-section was modeled in QuickCap [Rle94], and the capacitance was extracted for two cases: air dielectric and SPIE/KaptonE dielectric. The capacitance matrix obtained with air dielectric was inverted and divided by the square of the speed of light to obtain the inductance matrix. R and G values were taken from published results [Kris96]. The parameters for a single line in a differential pair are given in Table 5-3.

Table 5-3: Nominal line parameters.

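Parameter | Value
Resistance R | 1.38 Ω/cm
Inductance L | 294 nH/m
Capacitance C | 116.5 pF/m
Dielectric loss tangent (for G) | 0.002-0.0025 (10 GHz - 20 GHz)
Propagation delay | 5.85 ps/mm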
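The L and C values in Table 5-3 can be cross-checked with the lossless relations Z0 = sqrt(L/C) and t_pd = sqrt(LC). Neglecting R and G is a simplifying assumption of this sketch. The result, about 50.2 Ω and 5.85 ps/mm, agrees with the 50 Ω design target and the tabulated delay.

```python
import math

# Nominal per-unit-length parameters from Table 5-3
L_PER_M = 294e-9      # inductance [H/m]
C_PER_M = 116.5e-12   # capacitance [F/m]
C0 = 299_792_458.0    # speed of light in vacuum [m/s]

# Lossless approximations (R and G neglected)
z0_ohms = math.sqrt(L_PER_M / C_PER_M)
delay_s_per_m = math.sqrt(L_PER_M * C_PER_M)
delay_ps_per_mm = delay_s_per_m * 1e12 / 1e3   # s/m -> ps/m -> ps/mm
eps_eff = (C0 * delay_s_per_m) ** 2             # effective relative permittivity

print(f"Z0 = {z0_ohms:.1f} ohm, delay = {delay_ps_per_mm:.2f} ps/mm, eps_eff = {eps_eff:.2f}")
```

The effective permittivity of about 3.1 reflects the mix of adhesive and KaptonE seen by the stripline field.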

Transmission Line Delay Modeling and Analysis

The lines were modeled as lossy transmission lines using the W element in HSPICE [Hspi97], which incorporates the frequency dependence of the line resistance. Figure 5.16 shows the schematic of a differential transmission line in a point-to-point net. The driver and receiver currents are 8 mA and 1.2 mA, respectively. The driver input is a 250 mV differential signal, representative of the on-chip signal feeding a driver. The driver launches a 400 mV pulse with a 70 ps rise time onto the transmission line, and the pulse is absorbed by the 50 Ω terminators at the receiver input. The line length was varied from 1 cm to 10 cm and the receiver output was simulated. The results are shown in Figure 5.17. The receiver input waveforms become progressively more attenuated as the line length increases. For lines up to 9 cm long, the receiver switching threshold of about 150 mV falls within the initial step, so rise-time degradation adds very little delay.

Figure 5.16: Schematic to determine the driver strength.

Figure 5.17: Input wave at the receiver input after the specified transmission line length.

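The skin-effect resistance is modeled by a frequency-dependent sheet resistance R_s as follows:

\[ R_s(f) = \sqrt{\pi f \mu_0 \rho} \qquad [5.1] \]

\[ R_{ac}(f) = \frac{R_s(f)}{w} \qquad [5.2] \]

where μ0 is the permeability of free space, ρ = 1.9 µΩ·cm, and w = 72 µm. The effective width w is twice the line width because, in a stripline, current flows on both the top and bottom surfaces of the line. The receiver requires an input differential of at least 150 mV to switch. The input and output differentials are shown in Figure 5.18 for line lengths from 0.1 mm to 9 cm.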
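The sketch below evaluates the skin-effect model numerically. It assumes the standard forms R_s(f) = sqrt(π f μ0 ρ) for the sheet resistance and R = R_s / w for the per-unit-length resistance, with w = 72 µm and ρ = 1.9 µΩ·cm as given in the text. The 10 GHz evaluation point is chosen only for illustration.

```python
import math

MU0 = 4e-7 * math.pi   # permeability of free space [H/m]
RHO = 1.9e-8           # 1.9 uOhm*cm expressed in ohm*m
W_EFF = 72e-6          # effective current-carrying width: 2 x 36 um [m]


def sheet_resistance(f_hz: float, rho: float = RHO) -> float:
    """Skin-effect sheet resistance R_s = sqrt(pi * f * mu0 * rho) [ohm/square]."""
    return math.sqrt(math.pi * f_hz * MU0 * rho)


def ac_resistance_ohm_per_cm(f_hz: float, width_m: float = W_EFF) -> float:
    """Per-unit-length resistance R_s / w, converted from ohm/m to ohm/cm."""
    return sheet_resistance(f_hz) / width_m / 100.0


def skin_depth_um(f_hz: float, rho: float = RHO) -> float:
    """Skin depth sqrt(rho / (pi * f * mu0)), in micrometres."""
    return math.sqrt(rho / (math.pi * f_hz * MU0)) * 1e6


f = 10e9  # illustrative frequency: 10 GHz
print(f"R_ac = {ac_resistance_ohm_per_cm(f):.2f} ohm/cm, "
      f"skin depth = {skin_depth_um(f):.2f} um")
```

At 10 GHz this gives roughly 3.8 Ω/cm, compared with the 1.38 Ω/cm value in Table 5-3. The skin depth of about 0.69 µm is well below the roughly 4.3 µm of copper in the line, so the surface-current approximation is reasonable at these frequencies.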

Figure 5.18: Voltage differentials at the input and output of the receiver.

Adder Critical Path

The carry chain adder lies on the critical path in the MCM and was simulated for a 32-bit addition. The schematic of this path through the MCM is given in Figure 5.19, and its delays are listed in Table 5-4. A complete 32-bit addition involves three types of delay: the logic delay from the operands to the carry-out pad on DP0, the logic delay from carry-in to the carry-out pad on DP1/DP2/DP3, and the carry propagation delay between chips. The carry propagation delay comprises the driver/receiver delays and the package signal delays. The package delays are measured from the zero crossing at the driver output to the zero crossing at the receiver input. The total time to add two 32-bit operands is 764 ps, compared with the originally stipulated 800 ps. The relative contribution of these delays is shown in the pie chart in Figure 5.20.

Figure 5.19: Schematic of the carry chain critical path.

Table 5-4: Delay breakdown in the 32-bit adder.

Delay Type
Delay [ps]
Operand A and B to DP0COUT

Figure 5.20: Distribution of delay in the adder critical path.

Coupling Noise Analysis

In a real circuit, differential signaling reduces common-mode coupling noise but does not eliminate it, because the members of a differential pair still couple unevenly to their neighbors in a minimum-pitch routing environment. To determine the noise impressed on maximally coupled differential lines, the line arrangement shown in Figure 5.21 was simulated for the maximum near-end and far-end coupled noise. The line cross-section is the same as that shown earlier in Figure 5.15. The capacitive line parameters were extracted with QuickCap, once with the actual dielectrics and once with air dielectric only. The capacitance matrix in air was inverted and divided by the square of the speed of light in air to obtain the inductance matrix. These matrices are given in Figure 5.22. Note that the dielectric environment is not homogeneous, because the adhesive and the KaptonE film have different dielectric constants. This inhomogeneity introduces far-end noise, as described later. The lines were assumed to be coupled over a maximum length of 5 cm. Pairs 1,2 and 5,6 were driven differentially with 400 mV signals, and the coupled signals at the input and output of lines 3 and 4 were simulated in HSPICE.
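The inversion step L = (1/c²)·C_air⁻¹ can be illustrated for two coupled lines. The air-dielectric capacitance values below are hypothetical and were chosen only to be of the same order as Figure 5.22. A full 6-conductor extraction would use a general matrix inverse rather than the closed-form 2×2 inverse used here.

```python
C_LIGHT = 299_792_458.0  # speed of light in vacuum [m/s]


def inductance_from_air_capacitance(c_air):
    """Return L = inv(C_air) / c^2 for a 2x2 Maxwell capacitance matrix.

    c_air: [[c11, c12], [c21, c22]] in F/m (off-diagonal terms are negative).
    Result: inductance matrix in H/m.
    """
    (c11, c12), (c21, c22) = c_air
    det = c11 * c22 - c12 * c21
    inv = [[c22 / det, -c12 / det],
           [-c21 / det, c11 / det]]
    return [[x / C_LIGHT**2 for x in row] for row in inv]


# Hypothetical air-dielectric capacitance matrix for two coupled lines [F/m]
c_air = [[38e-12, -3e-12],
         [-3e-12, 38e-12]]
L = inductance_from_air_capacitance(c_air)
print(f"L11 = {L[0][0]:.3e} H/m, L12 = {L[0][1]:.3e} H/m")
```

The negative mutual term of the Maxwell capacitance matrix becomes a positive mutual inductance, which matches the sign pattern of Figure 5.22.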

(a) (b)

Figure 5.21: Schematic of coupled lines simulation (a) near end (b) far end. Lines 1, 2, 5, and 6 are excited and noise is monitored on lines 3 and 4.

* L0 (H/m)


2.2e-8 2.935e-7

1.7e-9 2.2e-8 2.919e-7

1.0e-10 1.8e-9 2.221e-8 2.93e-7

0 2.0e-10 2.0e-9 2.5e-8 2.923e-7

0 0 2.0e-10 1.9e-9 2.19e-8 2.94e-7

* C0 (F/m)


-0.9e-11 1.18e-10

-0.035e-12 -0.9e-11 1.18e-10

-0.001e-13 -0.035e-12 -0.9e-11 1.18e-10

-0.0001e-14 -0.001e-13 -0.035e-12 -0.9e-11 1.18e-10

-0.00001e-15 -0.0001e-14 -0.001e-13 -0.035e-12 -0.9e-11 1.18e-10

Figure 5.22: L and C matrices for the 6-conductor system.

Figure 5.23 shows the noise at near-end of lines 3 and 4 when only lines 1 and 2 are excited. Line 3 is closest to 1 and 2 and experiences a peak noise amplitude of 20 mV. Coupled noise to line 4 drops to less than 2 mV. The peak differential noise amplitude is about 18 mV. Coupled noise on lines 5 and 6 is minimal. It is clear from Figure 5.23 that the coupled noise drops very fast as we move away from the excited lines. When both line pairs 1,2 and 5,6 are excited in-phase, the near end noise on lines 3 and 4 takes the form shown in Figure 5.24. The peak noise differential is about 30 mV. Figure 5.25 shows the noise when the pairs 1,2 and 5,6 are excited out-of-phase. Both lines 3 and 4 noise waveforms are in phase reducing the peak differential to 5 mV. Thus, the worst case near end noise on a quiet differential receiver is about 30 mV when the neighbors on both sides are excited in-phase.
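For a hand cross-check of the near-end numbers, the matrices in Figure 5.22 can be plugged into the classic weak-coupling backward-crosstalk coefficient K_b = (Lm/L + Cm/C)/4. The sketch considers only line 2 as the aggressor and line 3 as the victim. It ignores line 1, which switches in the opposite direction and partly cancels the noise, and it also ignores losses and the inhomogeneous dielectric. The result is therefore only an order-of-magnitude check, not a substitute for the HSPICE simulation.

```python
# Single-aggressor, weak-coupling estimate of saturated near-end (backward) crosstalk.
# Entries read from Figure 5.22: line 2 is the aggressor, line 3 the victim.
L33 = 2.919e-7   # victim self-inductance [H/m]
L32 = 2.2e-8     # mutual inductance [H/m]
C33 = 1.18e-10   # victim Maxwell diagonal capacitance [F/m]
C32 = -0.9e-11   # Maxwell off-diagonal capacitance [F/m]; mutual capacitance is -C32

SWING_MV = 400.0          # aggressor swing
DELAY_PS_PER_MM = 5.85    # Table 5-3
RISE_PS = 70.0            # driver 10%-90% rise time
COUPLED_MM = 50.0         # 5 cm coupled length


def backward_coefficient(lm, l, cm, c):
    """K_b = (Lm/L + Cm/C) / 4 for weakly coupled lines."""
    return 0.25 * (lm / l + cm / c)


kb = backward_coefficient(L32, L33, -C32, C33)
round_trip_ps = 2 * COUPLED_MM * DELAY_PS_PER_MM
saturated = round_trip_ps > RISE_PS   # NEXT reaches its full value K_b * V
next_mv = kb * SWING_MV
print(f"Kb = {kb:.4f}, saturated = {saturated}, NEXT ~ {next_mv:.1f} mV")
```

This gives about 15 mV, the same order as the 20 mV simulated on line 3. The 5 cm coupled length is long compared with the 70 ps rise time, so the near-end noise is saturated under this model.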

Figure 5.23: Near end noise on lines 3 and 4 (bottom) with excitation on lines 1 and 2 (top). The waveform on line 3 is shifted up by 400 mV for easy comparison with noise on line 4.

Figure 5.24: Near end noise on lines 3 and 4 (middle) with in-phase excitation on line pairs 1,2 and 5,6 (top). The waveform on line 3 is shifted up by 400 mV for easy comparison with noise on line 4. Noise differential is shown at bottom.

Figure 5.25: Near end noise on lines 3 and 4 (middle) with out-of-phase excitation on line pairs 1,2 and 5,6 (top). The waveform on line 3 is shifted up by 400 mV for easy comparison with noise on line 4. Noise differential is shown at bottom.

Figure 5.26 shows the far-end noise waveforms on lines 3 and 4 when only lines 1 and 2 are excited. Line 3 sees the largest noise, with a peak amplitude below 20 mV, while line 4 again experiences minimal coupling. When pairs 1,2 and 5,6 are both excited in phase with 400 mV signals, the coupled noise on lines 3 and 4 is as shown in Figure 5.27. The peak differential noise is about 30 mV. Out-of-phase excitation of pairs 1,2 and 5,6 produces the noise waveforms shown in Figure 5.28. Because the noise on lines 3 and 4 is then in phase, the differential noise amplitude drops below 10 mV. Thus, in the worst case, both the near-end and far-end differential noise amplitudes are below 30 mV, which is about 30% of a 100 mV noise margin.

Figure 5.26: Far end noise on lines 3 and 4 (bottom) with excitation on lines 1 and 2 (top). The waveform on line 4 is shifted up by 400 mV for easy comparison with noise on line 3.

Figure 5.27: Far end noise on lines 3 and 4 (middle) with in-phase excitation on line pairs 1,2 and 5,6 (top). The waveform on line 4 is shifted up by 400 mV for easy comparison with noise on line 3. Noise differential is shown at bottom.

Figure 5.28: Far end noise on lines 3 and 4 (middle) with out-of-phase excitation on line pairs 1,2 and 5,6 (top). The waveform on line 4 is shifted up by 400 mV for easy comparison with noise on line 3. Noise differential is shown at bottom.

Coupling Test Structure

A test structure is placed on the package to measure the coupling between signal lines as the spacing between them increases. Its location relative to the other chips is shown in Figure 5.29, and its left end is shown in Figure 5.30. A signal can be launched on line A, and the coupled signal can be probed on lines B, B', C, C', D, and D' at both the near and far ends. The lines are 36 µm wide on a 75 µm pitch. Line A is 5 cm long, while B/B', C/C', and D/D' are coupled to A/A' over lengths of 4.825 cm, 4.775 cm, and 4.7 cm, respectively. A slow-wave test structure has also been designed and will be described in Liyong Wang's thesis.

Figure 5.29: Location of the coupling test structure on the package.

Figure 5.30: Layout of the left end of the coupling test structure.

Power Distribution Analysis

The scheme for supplying power to the package is shown in Figure 5.31, with bold arrows pointing to the locations of the solder lands for the power connectors. Power is distributed through 10 µm thick solid Cu planes, which provide a low-resistance, low-inductance path. The layer stack is reproduced in Figure 5.32 to show the arrangement of the power planes. The total power requirements of the package are given in Table 5-5.

Figure 5.31: Power supply scheme with the power connectors shown by arrows.

The worst case voltage drop was estimated by making the following assumptions:

Figure 5.32: Layer stack used in the package.

Table 5-5: Power requirements of the package.

Chip | Number of Chips | Total Power [W] | Supply Voltage [V] | Current [A]
ID | 1 | 12.0 | -5.2 | 2.30
DP | 4 | 52.0 | -5.2 | 10.0
CC | 2 | 25.2 | -5.2 | 4.84
CR | 16 | 128.0 | -5.2 | 24.61
Deskew | 1 | 4.0 | -5.2 | 0.76
Total | | 221.2 | -5.2 | 42.5
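The current column of Table 5-5 follows from I = P / V with the 5.2 V supply magnitude. The short check below recomputes it. Some tabulated entries appear to be truncated rather than rounded; for example, 4.0 W / 5.2 V = 0.769 A is listed as 0.76 A.

```python
SUPPLY_V = 5.2  # magnitude of the -5.2 V supply [V]

# Total power per chip type from Table 5-5 [W]
power_w = {"ID": 12.0, "DP": 52.0, "CC": 25.2, "CR": 128.0, "Deskew": 4.0}

current_a = {chip: p / SUPPLY_V for chip, p in power_w.items()}
total_power_w = sum(power_w.values())
total_current_a = total_power_w / SUPPLY_V

for chip, i in current_a.items():
    print(f"{chip:7s} {i:6.3f} A")
print(f"Total   {total_current_a:6.3f} A from {total_power_w:.1f} W")
```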

This implies that the current level keeps decreasing as it travels from the sides towards the center of the package and goes to zero at the vertical line in the center of the package. To the first order, this current level is shown in Figure 5.33. The value under each vertical line is the amount of the current crossing that line in the direction shown. The distance between adjacent vertical lines is 1 cm.

Figure 5.33: The current distribution along a power plane.

Based on Figure 5.33, the voltage drop from left supply rails to the center of the package is calculated in Table 5-6. The parameters assumed for the plane are

Therefore, using two pairs of power planes and supplying power from the shorter sides results in a maximum voltage drop of less than 0.5%. In practice, power is supplied from all four sides, which reduces this variation further.

Table 5-6: Calculations for worst case voltage drop.

Location | Current Leaving | Cumulative Voltage Drop [mV]

Percentage voltage drop = 21.53 mV × 100 / 5.2 V = 0.41%
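The cumulative drop in Table 5-6 is built by multiplying the current crossing each plane segment by that segment's resistance and summing the products. The sketch below shows this bookkeeping for 1 cm segments of a 10 µm Cu plane. The plane width and the segment currents are hypothetical placeholders, because the plane parameters used in the original calculation are not reproduced here. The resistivity ρ = 1.9 µΩ·cm is borrowed from the signal-line model as an assumption.

```python
RHO_CU = 1.9e-8       # resistivity [ohm*m]; assumed same as the signal-line model
T_PLANE = 10e-6       # power plane thickness: 10 um Cu [m]
SEGMENT_M = 0.01      # 1 cm between the vertical lines of Figure 5.33 [m]


def cumulative_drop_v(segment_currents_a, plane_width_m):
    """Return the cumulative IR drop [V] at the end of each 1 cm plane segment.

    Each segment is a rectangle of the plane with resistance
    R = (rho / t) * (length / width), carrying the current that crosses it.
    """
    r_segment = (RHO_CU / T_PLANE) * (SEGMENT_M / plane_width_m)
    drops, total = [], 0.0
    for current in segment_currents_a:
        total += current * r_segment
        drops.append(total)
    return drops


# Hypothetical example: current falling toward the package centre, 4 cm wide plane
drops = cumulative_drop_v([10.0, 8.0, 6.0, 4.0, 2.0], plane_width_m=0.04)
worst_mv = drops[-1] * 1e3
print(f"worst-case drop = {worst_mv:.2f} mV ({worst_mv / 5200 * 100:.2f} % of 5.2 V)")
```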

Other factors to consider in supplying power are the via resistance from the power fingers to the power planes, the solder resistance, and the IR drop in the external cables. The power fingers carry as many vias as the design rules permit. These factors are tabulated in Table 5-7. The voltages entering all sides are kept equal by carefully controlling the external cable lengths, and the external power supplies share the load to keep this variation close to zero. The maximum current density in the power plane is well below the conservative limit of 1E5 A/cm2 [Ghan82].

Table 5-7: Other factors in supplying power.

Via resistance (A = 1296 µm2, t = 45 µm) | 0.5 mΩ
Solder resistance | 2 mΩ
Maximum current density in the power plane | 1.5E4 A/cm2

Simultaneous Switching Noise Analysis

Solid power planes provide very low inductance, and differential signaling keeps the switching current very low. The numbers of drivers and power pads on each chip are shown in Table 5-8. The via parameters are given in Table 5-9.

Table 5-8: Number of power and driver pads on F-RISC/G chips.

Chip | Number of Drivers | Number of Power Pads
ID | 65 | 80
DP | 28 x 4 = 112 | 124 x 4 = 496
CC | 41 x 2 = 82 | 136 x 2 = 272
CR | 5 x 16 = 80 | 130 x 16 = 2080
Total | 339 | 2928

Table 5-9: Via parameters [Daum93].

Via Resistance | 0.5 mΩ
Via Capacitance | 1 fF
Via Inductance | 10 pH

To determine the current fluctuation at the pull up resistors, the circuit shown in Figure 5.34 was simulated in HSPICE. It lumps the inductance of both power and ground connections into a single inductance L in series with the pull-up resistors. The resultant current fluctuation and its rate of change are shown in Figure 5.35 and Figure 5.36.

Figure 5.34: Schematic of structure to determine current fluctuations.

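From Figure 5.36:

Maximum rate of change of current: dI/dt = 20 × 10^6 A/s

Total number of drivers: N = 339

The condition for keeping the voltage drop below 10 mV when all the drivers switch simultaneously is

\[ N \, L \, \frac{dI}{dt} < 10\ \text{mV} \qquad [5.3] \]

or

\[ L < \frac{10\ \text{mV}}{N \, dI/dt} \qquad [5.4] \]

Substituting N and dI/dt, we obtain

\[ L < \frac{10 \times 10^{-3}\ \text{V}}{339 \times 20 \times 10^{6}\ \text{A/s}} \approx 1.47\ \text{pH} \qquad [5.5] \]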

Since the inductance of one set of power and ground vias is about 20 pH, at least 14 such sets must be connected in parallel to satisfy this condition. The package easily meets this requirement, with thousands of vias to the power planes.
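The via count follows directly from the simultaneous-switching condition above. The sketch below assumes that identical via sets share current equally, so that n sets in parallel present l_per_set / n. It also ignores mutual inductance between neighbouring vias.

```python
import math


def min_parallel_via_sets(n_drivers: int, di_dt: float,
                          dv_max: float = 10e-3, l_per_set: float = 20e-12):
    """Return (L_max, n) for the condition N * L * dI/dt < dv_max.

    L_max is the largest allowed effective supply inductance [H]; n is the
    number of identical via sets (l_per_set each) that must be in parallel.
    """
    l_max = dv_max / (n_drivers * di_dt)
    return l_max, math.ceil(l_per_set / l_max)


l_max, n_sets = min_parallel_via_sets(n_drivers=339, di_dt=20e6)
print(f"L_max = {l_max * 1e12:.2f} pH, via sets needed = {n_sets}")
```

With N = 339 and dI/dt = 20 × 10^6 A/s, L_max is about 1.47 pH, and 20 pH / 1.47 pH rounds up to 14 via sets.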

Figure 5.35: Current fluctuation at the VDD node of an output driver.

Figure 5.36: Rate of change of switching current in a differential driver (×10^6).

Clock Routing and Deskewing Scheme

The clock lines are distributed from the deskew chip as shown in Figure 5.37. The deskewing scheme implemented in an earlier test chip [Nah94] could deskew by up to half a phase, i.e., 125 ps, to minimize the effect of temperature and humidity on the synchronization among chips. Matched-length clock lines complement this capability by minimizing the effect of any time-of-flight mismatch on the clock signals. The longest clock line, to the DP3 chip, was 4.7 cm as routed. The remaining chip clock lines were therefore meandered to a length of 4.7 cm as well. Figure 5.38 shows a close-up of these meandered clock lines.
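The length-matching rule can be sketched as follows. Every clock line is padded with meander up to the longest routed line, and the added length is converted to delay with the nominal 5.85 ps/mm from Table 5-3. The routed lengths for chips other than DP3 are hypothetical placeholders.

```python
DELAY_PS_PER_MM = 5.85  # nominal stripline delay, Table 5-3


def meander_plan(routed_mm):
    """Extra length [mm] and added delay [ps] to match every line to the longest."""
    target = max(routed_mm.values())
    return {chip: (target - length, (target - length) * DELAY_PS_PER_MM)
            for chip, length in routed_mm.items()}


# Hypothetical as-routed lengths [mm]; only the 47 mm DP3 value is from the text
routed = {"DP3": 47.0, "DP0": 31.5, "ID": 22.0}
plan = meander_plan(routed)
for chip, (extra_mm, extra_ps) in plan.items():
    print(f"{chip}: add {extra_mm:.1f} mm (~{extra_ps:.0f} ps)")
```

The same conversion shows why matching matters: each millimetre of residual mismatch becomes about 6 ps of clock skew.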

Figure 5.37: Location of synchronous chips w.r.t. the clock driver.

A second version of the package has also been prepared in case the deskew chip is not completed on time. This scheme drives a clock tree from an external clock to generate 10 matched clocks, one of which is routed out of the package for monitoring. The circuit also generates the RESET and SYNC signals from the external RESET and SYNC signals. This alternate scheme is shown in Figure 5.39.

Figure 5.38: Equal length clock delay lines to the receivers.

Figure 5.39: Clock supply scheme on the MCM.

The clock tree designed for this new scheme uses matched-length lines both on-chip and off-chip to obtain low skew. The off-chip transmission lines have an equal number of bends to minimize the effect of inductive discontinuities on clock skew, as shown more clearly in Figure 5.40. The external clock enters through a surface-mount mini-SMA connector and feeds a clock driver, which drives ten more drivers over matched-length lines. Seven of these lines go to the seven synchronous chips on the package. The memory chips require a free-running high-speed clock for testing and pattern loading, but supplying sixteen high-speed clocks would have consumed valuable connector space on the package periphery. Therefore, two of the ten clocks go to two banks of memory chips. Within each bank, the clock is forwarded in relay fashion from chip to chip, which keeps the rise time sharp at the memory clock inputs. The remaining clock output is sent out of the package through another mini-SMA connector for testing.

Figure 5.40: Schematic of an alternative low skew balanced clock scheme.

Placement and Routing

Design For Manufacturability

While a suitable packaging strategy was being sought, it was observed that the motherboard-daughterboard concept, popular in traditional PCB designs, could be applied to this package to increase the final package yield. In this concept, the original floorplan is broken into three clusters, which are packaged and tested separately. The working clusters are then combined to obtain the final configuration. This design-for-manufacturability approach is illustrated in Figure 5.41.

Figure 5.41 : Division of the floorplan into three clusters.

The cluster size was traded off against the critical path margin to increase yield while limiting the critical path penalty. The critical path in the 3-package processor is longer than in a single-package processor because of the minimum distance required between adjacent packages, as shown in Figure 5.42. In this approach, adjacent packages are kept at most about 2 mm apart. The commercial availability of 6-chip MCMs from Ross Technology, a SPARC vendor, provided a measure of confidence in 6-chip MCMs. The leftmost cluster (or daughter MCM) consists of 6 cache memory chips placed edge to edge. The rightmost cluster contains 6 chips: 5 memory chips and one datapath chip. The middle cluster contains 12 chips, including the instruction decoder chip, 3 datapath chips, 2 cache controller chips, and the rest of the memory chips.

Figure 5.42: Well locations with respect to the chips.

Initially, the idea was to completely route these daughter MCMs and then combine them on a mother substrate using a final personalization tape, as shown in Figure 5.43. As routing progressed, it became clear that the final personalization step would need many more than one layer. In the present scheme, the first layer is put down separately on each daughter MCM, and these modules are then tested completely.

Figure 5.43 : Connecting edges between the daughter MCMs.

Subsequently, three working daughter MCMs are put together and the remaining 5 layers are put down on top of the combined MCMs. Figure 5.44 shows a base substrate, which can be removed to reduce the thermal resistance of the overall package. The steps in assembling the module are as follows:

Figure 5.44 : Arrangement of three daughter MCMs on one mother MCM.

Differential Routing on MCM

Designing interconnect networks for thin-film packages always strains existing routing systems beyond their limits [Dai92]. Each signal layer provided about 130 cm of differential wiring (75 µm pitch) per square cm. A scheme had to be found to route these differential lines with minimum skew. This kind of net-matching problem also arises in any high-performance CMOS design when clock lines and critical nets are routed [Xiao89]. The typical approach is to route such nets individually and let the router balance the delay by optimizing the drivers on these lines along with the length and topology [Carr96]. The F-RISC group developed a new scheme [Loy94] that halves the routing complexity of any given design and makes hand-editing easier in a post-routing session. However, the scheme works well only when it is given a pre-existing route. Since the MCM was routed mostly by hand, the differential routing was also completed by hand. The final routing was planned simultaneously with the placement, guided by the critical nets. Only staircase and staggered vias were used in the design; they are shown in Figure 5.45. Figure 5.46 shows the picket fence arrangement of pads, which were the toughest to route. Figure 5.47 to Figure 5.56 show all the interconnect layers except the top ground plane.
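The figure of about 130 cm of wiring per square cm follows from the 75 µm pitch: 1 cm / 75 µm gives about 133 tracks, each 1 cm long. The sketch below shows this calculation. Applying the 0.75 wiring efficiency from Table 5-11 to a single layer is an illustrative assumption. Because each differential pair occupies two tracks, the number of pairs per layer is half the track count.

```python
def wire_capacity_cm_per_cm2(pitch_um: float, efficiency: float = 1.0) -> float:
    """Wire length per unit area on one layer: (1 cm / pitch) tracks, each 1 cm long."""
    tracks_per_cm = 1e4 / pitch_um   # 1 cm = 1e4 um
    return tracks_per_cm * efficiency


ideal = wire_capacity_cm_per_cm2(75.0)                     # every track used
usable = wire_capacity_cm_per_cm2(75.0, efficiency=0.75)   # Table 5-11 efficiency
print(f"ideal {ideal:.1f} cm/cm^2, at 75% efficiency {usable:.1f} cm/cm^2")
```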

Figure 5.45: Different types of vias.

Figure 5.46: Picket fence like pad arrangement and routing directions.

Figure 5.47: A rat's nest representation of the route showing congested horizontal control paths.

Figure 5.48: A graphical estimation of the routing density on the package.

Figure 5.49: Layer 0 for signal and power I/O.

Figure 5.50: Layer 1 - IC pads.

Figure 5.51 : Layer 2 - Power plane with signal feedthroughs.

Figure 5.52 : Layer 3 - Ground plane with signal feedthroughs.

Figure 5.53: Layer 4 - Signal layer (X and Y routing).

Figure 5.54: Layer 5 - Power plane with signal feedthroughs.

Figure 5.55: Layer 6 - Signal 2 (X and Y routing).

Figure 5.56: Top substrate view.

Final MCM Statistics

The final wiring statistics are shown in Figure 5.57, with the longest wires reaching up to 5 cm in length. Table 5-10 shows how the longest wires are divided among the major blocks of the processor, and Table 5-11 lists the major characteristics of the package.

Figure 5.57: Net length distribution (series 1 - original, series 2 - final).

Table 5-10: Final length of the major signal buses.

Net Type | Bus Width [bits] | Length [mm]
Data Memory Address | 18 | 50.3
Address to CC | 32 | 35.8
Instruction Bus to ID | 32 | 24.25

Table 5-11: Wiring statistics.

Parameter | Value
Wire width | 36 µm
Wire pitch | 75 µm
Wiring efficiency | 0.75
Minimum chip separation | 0.4 mm
Maximum chip separation | 2 mm
Bond pad area / pitch on chips | 75 x 75 µm2 / 75 µm
Number of wiring layers | 7
Total wire length | 19.5 m
Number of nets | 780
Average net length | 25 mm
Total number of vias | 18600
Total number of pins | 12208
Total number of signal I/Os | 67
Maximum voltage drop | < 47 mV
Board area | 2101 mm2
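As a quick consistency check on Table 5-11, the average net length equals the total wire length divided by the number of nets.

```python
total_wire_m = 19.5   # Table 5-11: total wire length
n_nets = 780          # Table 5-11: number of nets

avg_net_mm = total_wire_m * 1e3 / n_nets   # convert m to mm, then average
print(f"average net length = {avg_net_mm:.1f} mm")
```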


The design of this package required a number of optimizations and techniques. The problem of chip damage during testing was solved by providing sacrificial pad area for testing. Package-level floorplanning used a new technique that involved the individual chip designers in a global optimization. Differential signaling gave the package superior noise rejection and signal transmission capability. The module design was optimized for manufacturability by breaking it into 3 sub-modules. The hand-crafted routing achieved a high wiring efficiency of 75%.