CHAPTER 6

THERMAL MODELING OF THE F-RISC/G PACKAGE

Introduction

In modern high-performance machines, the burden of cooling has shifted from the system level to the component level [Tumm89]. This shift has been brought about by the increasing heat density at the chip level, as evidenced by the power dissipation of recent commercial high-performance microprocessors shown in Table 6-1. The heat flux of most of these microprocessors is greater than 10 W/cm2. The F-RISC/G package dissipates about 220 W with an average heat flux of 10.47 W/cm2.

Table 6-1 : Power dissipation of recent high-performance microprocessors.

Microprocessor | Technology | Devices | Power [W] | Size [mm2] | Heat Flux [W/cm2]
DEC Alpha 21264 | 0.35 µm CMOS | 15,200,000 | 72 | 302 | 23.84
Intel Pentium Pro | 0.35 µm BiCMOS | 5,500,000 | 30 | 195 | 15.30
IBM PowerPC 620 | 0.5 µm CMOS | 6,900,000 | 30 | 311 | 9.64
UltraSPARC II | 0.35 µm CMOS | 5,400,000 | 25 | 149 | 16.77
PA-RISC 8000 | 0.5 µm CMOS | 3,900,000 | 40 | 345 | 11.59
MIPS R10000 | 0.35 µm CMOS | 6,800,000 | 30 | 298 | 10.06
Exponential X-704 | 0.5 µm BiCMOS | 2,700,000 | 85 | 150 | 56.67
F-RISC/G | GaAs HBTs | 301642 | 220 | 2101 | 10.47
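The heat flux column is simply power divided by area. As a minimal sketch (the function name and the choice of sample rows are ours, not part of the original work), the conversion from mm2 to cm2 can be checked as follows:

```python
def heat_flux_w_per_cm2(power_w, area_mm2):
    """Average heat flux in W/cm2 for a power in W spread over an area in mm2."""
    return power_w / (area_mm2 / 100.0)  # 100 mm2 = 1 cm2


# (power [W], size [mm2]) copied from three rows of Table 6-1
SAMPLES = {
    "DEC Alpha 21264": (72.0, 302.0),
    "Exponential X-704": (85.0, 150.0),
    "F-RISC/G": (220.0, 2101.0),
}

for name, (power_w, area_mm2) in SAMPLES.items():
    print(f"{name}: {heat_flux_w_per_cm2(power_w, area_mm2):.2f} W/cm2")
```

Note that this is an average over the whole die or package; local heat flux at the devices is far higher, which is the subject of the rest of the chapter.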

The temperature that directly affects chip operation is the device junction temperature. Chip reliability is inversely proportional to the product of the junction temperature and the time spent at that temperature [Serg95][Cohe94]. Functionality may vary too, as gate delay figures are typically valid only over a limited temperature range. System noise margin is also compromised because current and voltage levels depend on the temperature at the drivers and receivers [Poon95]. Power consumption in dense areas, such as register files, cache blocks, and the ALU, leads to a non-uniform temperature distribution and gives rise to local hot spots. If one region operates at a much lower temperature than another, voltage swings need to be adjusted to meet design margins. Therefore, squeezing maximum performance out of a microprocessor requires that all of its regions be tuned to perform compatibly [Fitc94]. If these temperature variations are not known, the designs are made over-conservative, with a loss of performance. Identification of thermal problems and thermal management therefore become critical for high-performance circuits.

Unlike that of homojunction bipolar transistors, the gain of HBTs goes down with an increase in temperature, which slows the transistors down [Wald92]. Interestingly, this is in conflict with the goal of higher integration unless the devices consume less power, resulting in a lower total chip power. Otherwise, the temperature increase will offset the speed advantages gained through the reduction of interconnection parasitics. The available HBT SPICE models show the opposite effect, as they were modeled after homojunction bipolar devices. Therefore, SPICE's internal temperature parameter, which is meant for silicon homojunction bipolar devices, was not used in circuit simulations. Though SPICE device models were available at several temperatures, the most reliable and extensively tested model was valid only at 25 °C. All the designs were therefore simulated with the 25 °C model, and the key requirement on the cooling structure of the package was to keep the device junction temperature at 25 °C.

Silicon's thermal conductivity is about three times that of GaAs at room temperature and is inversely proportional to temperature, while the thermal conductivity of GaAs varies with temperature to the power of -1.25 [Roes88]. Therefore, hot spots are more likely to form on GaAs surfaces than on Si at the same power density. Some researchers have devised CAD tools to lay out GaAs ICs with temperature effects in mind [Vold95]. For designers working on high-performance designs, however, the results of such tools have not been an overriding factor [Fitc94]. Typically, a post-processing step is used to make minor rearrangements for thermal integrity, such as moving a big resistor away from a device. As heat flux increases, a thermal engineering approach becomes more suitable than simple post-layout thermal management. Because performance was already reduced by process-based penalties, the F-RISC/G designers could not apply power reduction techniques to the chips, and most of the power reduction was obtained by powering down the bi-directional pads between the primary and secondary cache.

Powering down the pads brought the power dissipation of the package from a maximum of 270 W down to 220 W, with the maximum power density on a chip at 18 W/cm2. The effect of this power density on the device junction temperatures was modeled for different HBT devices to determine the worst-case junction temperature rise. A layout guide was also generated to help in device-level floorplanning and to minimize the effect of mutual heating. The device temperature models were based on the methods available in the literature [Smit91][Poul91] and were verified against measured results [Wald92].

Thermal management and design have always been an integral part of mainframe and supercomputer designs [Cohe93]. These machines are generally large and can afford elaborate cooling schemes. On the other hand, processors geared towards personal or workstation-type computing applications cannot afford the cost of a complex cooling system. Since F-RISC/G was targeted towards a workstation-type environment, a demonstration air-cooled system was set as a goal. The basic heat dissipation structure therefore required the attachment of the package to a sufficiently large air-cooled heat sink. This became possible through the donation by IBM of a low thermal resistance heat sink with a low form factor, capable of dissipating 600 W to 800 W [Knic91].

Since the device junction temperature needed to be kept at 25 °C, some form of additional cooling was still required. One of the most attractive options was the incorporation of semiconductor cooling elements in the form of thermoelectric coolers. This reduced the demands on the ambient cooling and provided fine control of the chip temperatures with low noise. The resulting cooling arrangement, along with the boundary conditions, has the structure shown in Figure 6.1. The rest of the chapter is organized as follows:

Figure 6.1: Major elements of the proposed cooling structure.

Distribution of Thermal Sources on the Package

The processor dissipates power in the current switches used for logic, the emitter followers used for driving long on-chip interconnections, the memory circuits used for temporary storage, and the driver-receiver circuits used for off-chip communication. Table 6-2 shows the power requirements of all the chips. The computational core of one decoder and four datapath chips dissipates close to 64 W. The primary cache memory and the cache controllers spend an additional 205 W of power.

Table 6-2 : Latest power dissipation numbers for F-RISC/G chipset.

Chip | Devices | Area [mm2] | Power [W] | Power Density [W/cm2]
ID | 7358 | 7.6 x 8.7 | 12.0 | 18.1
DP (4) | 9785 | 8.5 x 9.3 | 13.0 x 4 | 16.4
CC (2) | 13172 | 9.5 x 8.3 | 12.6 x 2 | 15.9
CR (16) | 14300 | 7.0 x 9.2 | 11.2 x 16 | 17.3
Deskew | - | 6.0 x 6.0 | 4.0 | 11.1
Total | 301642 | - | 272.4 | -

The power spent in the cache memory is unusually high due to its implementation in HBT technology. This step was necessary because lower-power, high-density CMOS bare dies with sufficient access speeds were unavailable at the time. The 66% power contribution of the cache chips (see Figure 6.2) can be brought down in future designs by using alternate technologies. As shown in Table 6-3, the power allocated to these chips kept increasing as the project neared completion, due to the unexpected increase in interconnect capacitances described in the previous chapters.


Figure 6.2: Distribution of power among F-RISC/G chipset.

Table 6-3: Increase in chip power dissipation over design duration.

Chip Name | 1992-93 | 1994-95 | 1996-97
ID | 7.0 W | 12.0 W | 12.0 W
DP | 7.0 W | 12.0 W | 13.0 W
CC | 6.0 W | 9.5 W | 12.6 W
CR | 5.0 W | 9.0 W | 11.2 W
DSK | 4.0 W | 4.0 W | 4.0 W
Total | 131.0 W | 227.0 W | 272.4 W
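The totals in Table 6-3 are package totals: each per-chip figure is weighted by the number of chips of that type (1 ID, 4 DP, 2 CC, 16 CR, 1 deskew). A short sketch of that bookkeeping, with dictionary and function names of our own choosing, is:

```python
# Number of chips of each type on the package
CHIP_COUNTS = {"ID": 1, "DP": 4, "CC": 2, "CR": 16, "DSK": 1}

# Per-chip power in W for each design period, copied from Table 6-3
PER_CHIP_POWER_W = {
    "1992-93": {"ID": 7.0, "DP": 7.0, "CC": 6.0, "CR": 5.0, "DSK": 4.0},
    "1994-95": {"ID": 12.0, "DP": 12.0, "CC": 9.5, "CR": 9.0, "DSK": 4.0},
    "1996-97": {"ID": 12.0, "DP": 13.0, "CC": 12.6, "CR": 11.2, "DSK": 4.0},
}


def package_power_w(per_chip):
    """Total package power: per-chip power weighted by chip count."""
    return sum(CHIP_COUNTS[chip] * watts for chip, watts in per_chip.items())


def cache_ram_share(per_chip):
    """Fraction of the package power spent in the 16 cache RAM (CR) chips."""
    return CHIP_COUNTS["CR"] * per_chip["CR"] / package_power_w(per_chip)


for period, per_chip in PER_CHIP_POWER_W.items():
    print(f"{period}: {package_power_w(per_chip):.1f} W, "
          f"cache RAM share {100 * cache_ram_share(per_chip):.0f}%")
```

For the 1996-97 column this reproduces both the 272.4 W total and the 66% cache contribution quoted above.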

One major component of this dissipation in all the chips is the power expended in the clock drivers, clock buffers, address drivers, and the superbuffers used to drive large capacitances. The increased capacitance required additional power in the already "hot" drivers to maintain speed. The circuit blocks dissipating most of the heat are shown in Table 6-4 with the pie distribution shown in Figure 6.3.

Table 6-4: Major sources of heat on the package.

Circuit/Element | Number | Total Power Dissipated [W]
L1-L2 Drivers/Receivers | 1024 | 50.5
Core Drivers | 312 | 13.0
Regfile Blocks | 4 | 8.0
Cache RAM Blocks | 64 | 128.0
Tag RAM Blocks | 4* | 8.0
Boundary Scan | 23 | 21.0
Core logic + receivers | | 39.44
Core Drivers/Receivers | 82/118 | 4.46
Other (Terminators) | Counted with drivers/receivers above |
Total | | 272.4

*only 12 columns out of a possible 32 are used here.

One important fact indicated in Table 6-4 is the power consumed by the communication between the first and second levels of cache. More than a thousand signals are used for data exchange between the L1 and L2 caches, sustaining a data rate of 8 GB/s between the L1 cache and the CPU core. This bandwidth was essential to keep throughput high, and its price was increased power.

Figure 6.3: Distribution of power among different circuit elements.

A decision was made not to implement the L1-L2 interface on the demonstration prototype because it was not needed to run test programs. These driver/receiver pairs have been powered down on the present system, saving 50 W in power dissipation. The proposed system currently dissipates about 220 W. The modified power dissipation numbers are given in Table 6-5 and the pie distribution is shown in Figure 6.4. Table 6-6 provides the area and power distribution numbers for the sub-packages. The silicon (or GaAs) efficiency of the complete package is

showing the high packing density achieved in this design. The areas are given in mm2.

Table 6-5 : Modified power dissipation numbers for F-RISC/G chipset.

Chip | Devices | Area [mm2] | Power [W] | Power Density [W/cm2]
ID | 7358 | 7.6 x 8.7 | 12.0 | 18.1
DP (4) | 9785 | 8.5 x 9.3 | 13.0 x 4 | 16.4
CC (2) | 13172 | 9.5 x 8.3 | 12.6 x 2 | 15.9
CR (16) | 14300 | 7.0 x 9.2 | 8.0 x 16 | 12.5
Deskew | - | 6.0 x 6.0 | 4.0 | 11.1
Total | 301642 | 1570.2 | 221.2 | -


Figure 6.4: Modified distribution of power among F-RISC/G chipset.

Table 6-6 : Aggregate power dissipation numbers for sub-modules.

Module | Devices | Active Area [mm2] | Power [W] | Power Density [W/cm2]
MCM1 | 85800 | 15.45 x 29.8 = 460.41 | 48.0 | 10.42
MCM2 | 134557 | 35.27 x 32.75 = 1155.09 | 120.2 | 10.38
MCM3 | 81285 | 15.45 x 31.5 = 486.67 | 53.0 | 10.89
Total | 301642 | 2102.17 | 221.2 | -

Determination of Junction Temperature Rise

One major constraint on this design was the requirement to keep the device junctions at room temperature. The measurement of peak junction temperature is very difficult [Anho95], and the SPICE models are mostly verified against an average junction temperature [Wald92]. Since the area occupied by the devices is very small compared to the area occupied by the wires, as shown in Table 6-7, the heat flux on the surface of the chip is highly non-uniform.

Table 6-7: Comparison of interconnect area and device area.

Chip Name | Total Devices | Total Area [mm2] | Device Area | Wire Area | Metal Layers
Instruction Decoder | 7358 | 66.12 | 2.2% | 97.8% | 3
Datapath | 9785 | 79.05 | 2.4% | 97.6% | 3
Cache RAM | 14300 | 64.4 | 4.4% | 95.6% | 3
Cache Controller | 13172 | 78.85 | 3.3% | 96.7% | 3
4004* (10 µm) | 2300 | 12 | 40% | 60% |
Pentium Pro** (0.35 µm) | 5,500,000 | 195 | 7.25% | 92.75% |

*PMOS, **CMOS (device area approximated by 21 x [minimum design rule]^2).
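The device-area estimate in the footnote can be reproduced with a few lines. This is only a sketch under the stated approximation that each device occupies 21 squares of the minimum design rule; the function and parameter names are ours.

```python
def device_area_fraction(n_devices, min_rule_um, die_area_mm2,
                         squares_per_device=21):
    """Fraction of the die covered by devices.

    Each device is approximated as `squares_per_device` squares whose side is
    the minimum design rule (in micrometres).
    """
    device_area_um2 = n_devices * squares_per_device * min_rule_um ** 2
    die_area_um2 = die_area_mm2 * 1.0e6  # 1 mm2 = 1e6 um2
    return device_area_um2 / die_area_um2


# Rows for the two commercial parts in Table 6-7
print(f"4004:        {100 * device_area_fraction(2300, 10.0, 12.0):.2f}%")
print(f"Pentium Pro: {100 * device_area_fraction(5_500_000, 0.35, 195.0):.2f}%")
```

Both values agree with the device-area column of the table to within rounding.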

Thus, an assumption of uniform heat flux on the surface, also known as the parallel plate case [Poul91], can lead to an over-optimistic thermal design. The nominal thickness of a GaAs wafer is about 25 mil (625 µm) and individual dies are thinned down to reduce the thermal resistance of the die. The thermal resistance presented by a die is not a strong function of its thickness as most of the heat is generated in a very small area at the top of the die. The heat is spread out from a device to cooler points on the surface or back of the chip as shown in Figure 6.5.

Figure 6.5 : Thermal spreading from devices and dense circuits at the chip surface.

The already low thermal conductivity of GaAs decreases further as temperature rises, by as much as 30% over the 25 °C to 125 °C range, which leads to self-heating of the device [Smit91]. Similarly, if two or more devices are located close together, their temperature fields interact and the temperature at any point is given by the superposition of the fields. This effect, called proximity heating, can also increase the device temperature, requiring even more cooling to operate the device at a given junction temperature.

Usually, the chip can work within a relatively wide temperature range, unlike wire delays, which have a rather narrow range available to maximize performance. Conventional tools are therefore configured to minimize wire delays, and the temperature problem is fixed later by external cooling [Fitc94]. This is not true everywhere in the circuit: highly congested memory cells may need to trade off temperature against capacitance and speed for reliable operation, especially if the device degrades at higher temperatures.

Figure 6.6: HBT device layouts illustrating the relative sizes of the emitters.

Determining the temperature distribution on a chip is a complex problem, akin to capacitance determination, due to the multitude of irregular heat sources and conductive geometries. This rules out a closed-form analytical solution, and all the methods (finite-element, finite-difference, series-summation) use some kind of empirical approximation to avoid excessive memory use. The next section describes the thermal models for an HBT device, which estimate maximum junction temperatures at different power levels and die thicknesses.

Modeling Junction Temperature

The temperature is highly peaked where the power is dissipated in a transistor. As mentioned before, it is very difficult to measure the peak temperature because of its high non-uniformity. Most of the measured results and SPICE models are based on an average temperature [Wald92]. Therefore the device thermal resistance needs to be modeled and verified against measured results. The basis of most of the modeling in the literature is the assumption that the heat flow problem can be separated from the calculation of the device's electrical characteristics. Thus, instead of solving three or more coupled differential equations (Poisson, current continuity, and heat flow) in 3-D space extending to the back of the substrate, we solve just the temperature problem in three dimensions, weakly couple it to the electrical problem through the thermal impedance parameter, and use linear thermal coefficients for the electrical and physical parameters [Anho95].

The heat flow equation is

$$\nabla \cdot \left( k(T)\, \nabla T \right) = -P$$ [6.1]

where k is the thermal conductivity of the medium in W/mK, T is the temperature at a point in K and P is the power generated per unit volume in W/m3. This equation can be solved analytically or numerically but most of the methods assume a temperature independent thermal conductivity k [Poul91]. GaAs thermal conductivity depends on temperature and can be expressed as

$$k(T) = k_{300} \left( \frac{T}{300\ \mathrm{K}} \right)^{n}$$ [6.2]

where k300 is the thermal conductivity at 300 K. Various authors have provided different values of n, varying from -0.85 to -1.41 [Anho95]. We assume n = -1.25, which gives a reasonable approximation [Blak82]. With n = -1.25 the thermal conductivity of GaAs at 300 K is 44 W/mK.
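To get a feel for how strongly the conductivity falls with temperature, the power law can be evaluated directly. The sketch below assumes the form k(T) = 44 W/mK x (T/300 K)^n with n = -1.25; the constant and function names are ours.

```python
K_300 = 44.0   # W/(m K): GaAs conductivity at 300 K quoted in the text
N_EXP = -1.25  # exponent assumed in the text [Blak82]


def k_gaas(t_kelvin, n=N_EXP):
    """GaAs thermal conductivity in W/(m K), assuming a pure power law in T."""
    return K_300 * (t_kelvin / 300.0) ** n


def fractional_drop(t_low_c, t_high_c):
    """Fractional loss of conductivity between two temperatures given in deg C."""
    k_low = k_gaas(t_low_c + 273.15)
    k_high = k_gaas(t_high_c + 273.15)
    return 1.0 - k_high / k_low


print(f"k at 25 C : {k_gaas(298.15):.1f} W/mK")
print(f"k at 125 C: {k_gaas(398.15):.1f} W/mK")
print(f"drop      : {100 * fractional_drop(25.0, 125.0):.0f}%")
```

The drop over the 25 °C to 125 °C range comes out at roughly 30%, consistent with the figure quoted earlier for the self-heating effect.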

The temperature-dependent conductivity of GaAs can be replaced by a temperature-independent one by applying Kirchhoff's integral transformation [Smit86]. With this transformation, equations (6.1) and (6.2) reduce to a constant-k equation in a pseudo-temperature θ, defined by the following equation [Poul91].

$$\theta = T_0 + \frac{1}{k(T_0)} \int_{T_0}^{T} k(T')\, dT'$$ [6.3]

Here T0 is the chip backside temperature. The real temperature can then be obtained by using the following transformation after solving for θ:

$$T = T_0 \left[ 1 + \frac{(n+1)\,(\theta - T_0)}{T_0} \right]^{1/(n+1)}$$ [6.4]

This reduces equation (6.1) to the familiar Poisson equation

$$\nabla^2 \theta = -\frac{P}{k(T_0)}$$ [6.5]
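The practical use of the transformation is that the linear problem is solved for the pseudo-temperature and the result is mapped back to a real temperature afterwards. A minimal numerical sketch follows. It assumes a pure power-law conductivity, k(T) proportional to T^n, for which the Kirchhoff integral has the closed forms coded below; these closed forms are our own working under that assumption, and the function names are hypothetical.

```python
def to_pseudo(t_real, t0, n=-1.25):
    """Kirchhoff pseudo-temperature for k(T) proportional to T**n (n != -1).

    theta = T0 + (1/k(T0)) * integral of k from T0 to T
          = T0 + T0/(n+1) * ((T/T0)**(n+1) - 1)
    """
    m = n + 1.0
    return t0 + t0 / m * ((t_real / t0) ** m - 1.0)


def to_real(theta, t0, n=-1.25):
    """Inverse transformation: real temperature from the pseudo-temperature."""
    m = n + 1.0
    return t0 * (1.0 + m * (theta - t0) / t0) ** (1.0 / m)


T0 = 300.0  # assumed chip backside temperature in K
for linear_rise in (5.0, 10.0, 20.0, 40.0):
    real_rise = to_real(T0 + linear_rise, T0) - T0
    print(f"constant-k rise {linear_rise:5.1f} K -> real rise {real_rise:6.2f} K")
```

Because the conductivity falls as the material heats up, the real rise is always somewhat larger than the constant-k estimate, and the gap widens with the size of the rise.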

There are several methods in the literature to solve the Poisson equation and the most relevant one here is the solution for a point or hemispherical source on the surface of a semi-infinite medium. Assuming that the temperature field is distributed two-dimensionally on the surface of the chip with no heat loss to either the back of the die or from the top surface, the temperature can be written as [Holm81]

$$T(r) = \frac{P}{2 \pi k r}$$ [6.6]

Here T is the temperature at a point away from a source, with power P, by a distance r. Superposition is used to calculate temperature rise due to multiple sources. On the surface of a die, equation (6.5) can be rewritten as

[6. 7]

where z is normal to the die surface. If we apply the boundary condition of a constant chip backside temperature, there are no known closed-form solutions [Poul91]. This problem can be circumvented by using series-summation methods such as the method of images [Daws93], where for every power source a negative power source is placed at twice the chip thickness below the top source. This is followed by a positive source at four times the thickness of the chip, and so on, so that the temperature at any point on the surface due to the original source is given by the sum of a series of sources with alternating signs. This method suffers from an excessive computation requirement for a typical multi-source chip, and the series takes a long time to converge.
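A minimal sketch of the image series for a single point source is given below. It assumes a constant conductivity, an adiabatic top surface, and an isothermal backside, which places the images at even multiples 2mH of the die thickness H with signs (-1)^m; each image below the surface is paired with its mirror above it, hence the factor of 2. The function names and the truncation at 2000 terms are our choices, not taken from [Daws93].

```python
import math

K_GAAS = 44.0  # W/(m K), constant conductivity assumed for this sketch


def point_source_rise(power_w, r_m, k=K_GAAS):
    """Surface temperature rise at distance r from a point source on a
    semi-infinite solid with an adiabatic surface: P / (2 pi k r)."""
    return power_w / (2.0 * math.pi * k * r_m)


def image_series_rise(power_w, r_m, thickness_m, k=K_GAAS, n_images=2000):
    """Same source on a die of finite thickness with an isothermal backside.

    The alternating series converges slowly, so many terms are needed.
    """
    total = 1.0 / r_m
    for m in range(1, n_images + 1):
        sign = -1.0 if m % 2 else 1.0
        total += 2.0 * sign / math.hypot(r_m, 2.0 * m * thickness_m)
    return power_w / (2.0 * math.pi * k) * total


def superpose(sources, x_m, y_m, thickness_m):
    """Rise at (x, y) from several point sources given as (power, x, y)."""
    return sum(
        image_series_rise(p, math.hypot(x_m - sx, y_m - sy), thickness_m)
        for p, sx, sy in sources
    )


H = 75e-6  # die thickness in m
for r in (1e-6, 10e-6, 75e-6, 300e-6):
    ratio = image_series_rise(2.8e-3, r, H) / point_source_rise(2.8e-3, r)
    print(f"r = {r * 1e6:6.1f} um: finite/infinite = {ratio:.3f}")
```

Close to the source the finite die behaves almost like a semi-infinite one, while at distances of several die thicknesses the rise is almost completely suppressed by the cold backside. The cost of the loop, repeated for every source and every evaluation point, illustrates why the method becomes expensive for a multi-source chip.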

Other solution methods are programs employing finite-element and finite-difference calculations, which require a lot of memory and are very slow even for simple structures. Floating random-walk programs such as QuickCAP [Rlc94] can also be applied to solve this equation, but a suitable version was not available. Instead, the following approach was used to model the device temperature rise and other effects readily, and was shown to conform to measured results. The approach is based on [Smit86] and [Poul91] and is a hybrid of finite-element and analytical solutions.

Step 1

Previous research [Anho95] has shown that most of the power in an HBT is dissipated below the emitter and that the power density is nearly independent of the position beneath the emitter. This conforms to the fact that in an HBT the current and fields are vertical. Therefore, for the worst-case analysis, the heat source can be approximated as a rectangular plate having the dimensions of an emitter stripe. The current density in the emitter can be as high as 1000 A/mm2.
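The quoted current density can be checked against the emitter dimensions given later in this chapter. The sketch below assumes a switching current of 2 mA, the high-power level used in F-RISC/G gates; the function name is ours.

```python
def current_density_a_per_mm2(current_ma, emitter_w_um, emitter_l_um):
    """Emitter current density in A/mm2 for a rectangular emitter stripe."""
    area_mm2 = emitter_w_um * emitter_l_um * 1.0e-6  # 1 um2 = 1e-6 mm2
    return current_ma * 1.0e-3 / area_mm2


# 2 mA through a 1.2 um x 1.7 um stripe and through a 1.4 um x 3.0 um stripe
print(f"1.2 x 1.7 um: {current_density_a_per_mm2(2.0, 1.2, 1.7):.0f} A/mm2")
print(f"1.4 x 3.0 um: {current_density_a_per_mm2(2.0, 1.4, 3.0):.0f} A/mm2")
```

The smaller stripe runs at close to 1000 A/mm2, roughly twice the density of the larger one at the same current.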

Step 2

Equation (6.6) for a point source can be integrated in two dimensions to model a rectangular source. This results in the following expression from [Smit86] for the temperature at any point (x,y) with the source of dimension L x W centered at (x0,y0).

[6. 8]

where,

, and
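As an illustration of this step, the sketch below integrates the point-source field over a rectangle in two ways: with a closed form that is valid only at the center of the rectangle, and with a brute-force midpoint summation of small sub-sources that works at any surface point. Both assume a semi-infinite substrate with constant conductivity. The center formula is a standard result for a uniformly heated rectangle and is not claimed to be the expression of [Smit86]; all names are ours.

```python
import math

K_GAAS = 44.0  # W/(m K), constant conductivity assumed for this sketch


def center_rise_closed_form(power_w, length_m, width_m, k=K_GAAS):
    """Rise at the center of a uniformly heated L x W rectangle on a
    semi-infinite solid with an adiabatic surface."""
    shape = (length_m * math.asinh(width_m / length_m)
             + width_m * math.asinh(length_m / width_m))
    return power_w * shape / (math.pi * k * length_m * width_m)


def rise_by_summation(power_w, length_m, width_m, x_m, y_m,
                      k=K_GAAS, cells=100):
    """Rise at (x, y) from an L x W source centered at the origin, obtained
    by superposing cells*cells point sources (midpoint rule).

    `cells` should be even so that the source center is never a cell midpoint.
    """
    dx = length_m / cells
    dy = width_m / cells
    p_cell = power_w / (cells * cells)
    total = 0.0
    for i in range(cells):
        xc = -length_m / 2.0 + (i + 0.5) * dx
        for j in range(cells):
            yc = -width_m / 2.0 + (j + 0.5) * dy
            total += p_cell / (2.0 * math.pi * k * math.hypot(x_m - xc, y_m - yc))
    return total


P, L, W = 2.8e-3, 1.7e-6, 1.2e-6  # 2.8 mW in a 1.2 um x 1.7 um emitter
print(f"center, closed form: {center_rise_closed_form(P, L, W):.1f} K")
print(f"center, summation  : {rise_by_summation(P, L, W, 0.0, 0.0):.1f} K")
print(f"12 um away         : {rise_by_summation(P, L, W, 12e-6, 0.0):.2f} K")
```

For these inputs the center rise is roughly 25 K before any correction for finite die thickness or temperature-dependent conductivity, which is the same order as the simulated peak values reported later for a device of this size.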

Step 3

Now add the effects of a finite chip thickness, die-attach layers, and chip edges. [Poul91] determined that, for a chip thickness larger than the source dimensions, the temperature distribution obtained under the infinite-thickness assumption should be multiplied by e^(-r/H), where H is the chip thickness, to obtain the temperature at the desired point. This factor was determined by comparing the results with finite-difference and series-summation simulators. Therefore, the temperature at the point (x,y) can be given as

[6. 9]
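The effect of the thickness correction is easy to tabulate. The sketch below applies the exponential factor to an infinite-thickness result, taking H to be the die thickness and r the lateral distance from the source; the function names and sample distances are ours.

```python
import math


def thickness_factor(r_m, thickness_m):
    """Empirical finite-thickness correction exp(-r/H) from Step 3."""
    return math.exp(-r_m / thickness_m)


def corrected_rise(rise_infinite, r_m, thickness_m):
    """Scale a rise computed for an infinitely thick die to a die of
    thickness H, evaluated at lateral distance r from the source."""
    return rise_infinite * thickness_factor(r_m, thickness_m)


for thickness_um in (75.0, 600.0):
    row = ", ".join(
        f"r = {r_um:5.1f} um: {thickness_factor(r_um * 1e-6, thickness_um * 1e-6):.3f}"
        for r_um in (1.0, 12.0, 75.0, 300.0)
    )
    print(f"H = {thickness_um:5.1f} um -> {row}")
```

The factor is close to unity within a few micrometres of the source, so it changes the peak temperature very little, but on a thin die it strongly suppresses the temperature field at distances comparable to the die thickness.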

The contribution due to the die-attach layers, such as epoxy for F-RISC/G chips, can be determined using the above expression by assuming two layers with the same conductivity and is given from [Poul91] for the sake of completeness:

[6. 10]

where k1 and h1 refer to the chip conductivity and thickness and k2 , h2 refer to the die-attach layer. This expression is valid when

[6. 11]

Step 4

Use superposition to add up the contribution of multiple sources.

Step 5

Model the standard devices used in F-RISC/G and compare them with the published experimental results.

This algorithm was followed to develop a program in MATLAB [Math92], a popular matrix based package, to simulate the temperature distribution on top of HBT transistors with different size emitters. Table 6-8 and Figure 6.7 give the worst case power dissipation situations in the F-RISC/G design. The 3-input AND gate shown in Figure 6.7 shows level-2 and level-3 transistors, Q4 and Q6 respectively, dissipating 5.1 mW and 7.9 mW in the ON state. The other transistors in the same pair still dissipate 2.8 mW maximum.

Table 6-8: Calculation of power dissipation in a HBT device at different current and logic levels.

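Type | VCE [V] | IC [mA] | P [mW] | IC [mA] | P [mW] | IC [mA] | P [mW] | Worst Case VCE [V] | Worst Case IC [mA] | Worst Case P [mW]
Level 1 | 1.15 | 0.5 | 0.575 | 1.2 | 1.38 | 2 | 2.3 | 1.15 | 2.0 | 2.3
Level 2 | 1.4 | 0.5 | 0.7 | 1.2 | 1.68 | 2 | 2.8 | 5.1 | 2.0 | 5.1
Level 3 | 1.4 | 0.5 | 0.7 | 1.2 | 1.68 | 2 | 2.8 | 7.9 | 2.0 | 7.9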
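The nominal entries of Table 6-8 follow from P = VCE x IC. A short sketch of that calculation (dictionary and function names are ours; only the nominal columns are reproduced, not the worst-case ones) is:

```python
# Nominal collector-emitter voltage in V for each logic level (Table 6-8)
VCE_V = {"Level 1": 1.15, "Level 2": 1.4, "Level 3": 1.4}

# Low, medium, and high power switching currents in mA
CURRENTS_MA = (0.5, 1.2, 2.0)


def device_power_mw(vce_v, ic_ma):
    """Static power dissipated in a conducting transistor, in mW."""
    return vce_v * ic_ma  # V x mA = mW


for level, vce in VCE_V.items():
    powers = ", ".join(f"{device_power_mw(vce, ic):.3f} mW at {ic} mA"
                       for ic in CURRENTS_MA)
    print(f"{level} (VCE = {vce} V): {powers}")
```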

Figure 6.7: A 3-input AND gate displaying bias variations in level 1, level 2, and level 3 devices at 2 mA.

Results of the Junction Temperature Model

All the devices used in the design were modeled for temperature rise, and the results were used to determine the maximum temperature rise in the package. Current per unit area and total power dissipation are the two main indicators of the maximum temperature rise on a device. These two numbers were higher on the transistors than on the Schottky diodes and the resistors. Therefore, only the results for the transistors are presented in the following sections. There are three types of gates in the design: high, medium, and low power. Their devices consume, on average, 2.8 mW, 1.2 mW, and 0.7 mW respectively, as given in Table 6-8. In a real circuit these devices switch between on and off states, and therefore their average power dissipation is lower. Still, to get a fair idea of the maximum temperature rise, a power dissipation of 2.8 mW was assumed in all the Q1 devices. Q3 transistors are assumed to dissipate 12.6 mW (4.2 mW/finger).

Standard Q1 Device

The standard Q1, also known as Q1430, having an emitter stripe area of 1.4 µm x 3.0 µm was simulated by varying device power dissipation values at a chip thickness of 600 µm, and was compared with the published measured results [Wald92]. This device was analyzed to gain confidence in the device temperature modeling due to the availability of experimental results. Figure 6.8 shows the comparison between the simulated and measured average temperatures. The simulated average temperature, averaged over the emitter area, matches the measured temperature within 10%.

Figure 6.8: Comparison of modeled and measured [Wald92] average temperatures for a Q1 Device (1.4 µm x 3.0 µm Emitter) at different device power dissipation values (tdie = 600µm, tepoxy = 50µm).

Figure 6.9 and Figure 6.10 show the temperature distribution due to Q1430 from different angles. The average temperature rise on this device is about 10 °C at the worst-case power dissipation values encountered in the current designs. These transistors were later abandoned in favor of Q1217 transistors, which are described in the next section.


Figure 6.9: Temperature distribution for a Q1430 with P = 2.8 mW, tdie = 600 µm, tepoxy = 50 µm, Tpeak = 16.2 °C, and Tavg = 9.48 °C.

Figure 6.10: 2-D surface plot of temperature variation for a Q1430 device with P = 2.8 mW, tdie = 600 µm, tepoxy = 50 µm, Tpeak = 16.2 °C, and Tavg = 9.48 °C.

Q1217 Device


Figure 6.11: Temperature distribution on top of the die surface for a Q1217 device with P = 2.8 mW, tdie = 75 µm, tepoxy = 25 µm, Tmax = 23.6 °C, and Tavg = 13.03 °C.

The Q1217 device was designed to increase the switching speed of the Q1430 device by increasing the current density in the emitter at the same power. This was accomplished by reducing the emitter size [Camp97]. This, in turn, increased the device temperature rise from about 10 °C in Q1430 to 13.03 °C, as shown in Figure 6.11. Figure 6.12 shows the lateral temperature distribution from the center of the device. All the Q1430 devices have been replaced by this device in the layouts. Therefore, the temperature rise over this device at the maximum power dissipation conditions was taken as an important design constraint for the thermal design of the package.

Figure 6.12: Two-dimensional numerical simulation of the temperature rise on a Q1217 transistor with distance from the center of the transistor with P = 2.8 mW, tdie = 75 µm, Tmax = 23.6 °C, and Tavg = 13.03 °C.

Another interesting problem was the variation in temperature with respect to die thickness. Figure 6.13 shows the temperature increase as device power and die thickness are varied. The device temperature goes up by at least 80 °C at a device power dissipation of 20 mW and an almost zero thickness. The results of a simulation of temperature rise at a constant device power of 2.8 mW and varying die thickness are shown in Figure 6.14. Due to the small power generation area, the total thermal resistance of the device does not change much until the die is very thin. Therefore the die needs to be lapped to at least 75 µm. Figure 6.15 shows the same kind of temperature variation with different device power and thickness values.


Figure 6.13: Variation of surface temperature with device power and die thickness for Q1217 device.

Figure 6.14: Variation of temperature with die thickness at device power of 2.8 mW for a Q1217 transistor.

Figure 6.15: Temperature variation with varying die thickness at different device powers for a Q1217 transistor.

Q1226 Device

Figure 6.16: Temperature distribution for a Q1226 device with P = 4.2 mW, tdie = 75 µm, tepoxy = 25 µm, Tpeak = 28.19 °C, Tavg = 16.05 °C.

Q1226 is a long emitter device used in the register files and cache blocks when the switching current required was a little more than 2 mA. The resulting temperature rise is higher than the temperature rise on a standard Q1217 device. Since this device has been used only in a few places it is not a cause of concern. The temperature peak on this device is sharper than the temperature peak on Q1217 due to an increase in emitter stripe area at the same current density. Since all these devices are used in pairs, one of them is off at any time and therefore the device has enough surrounding area so that it doesn't affect any other device's temperature field.

Q3 Device

Figure 6.17: Q3 device with each finger dissipating 4.2 mW, at I = 3 mA in each finger, with tdie = 600 µm, Tmax = 21.86 °C, and Tavg = 15.32 °C.

Q3 devices are three-emitter devices which drive large loads. These devices have mostly been used in address drivers in memory or in off-chip drivers requiring high power. They can sink a maximum of 12 mA of current but, on average, have been used at 9 mA levels. The current density is lower than in a Q1217 device and the temperature rise is slightly more than in a standard Q1 device. Each finger dissipates 4.2 mW, and the temperature distribution is shown in Figure 6.17 and Figure 6.18.

Figure 6.18: Variation of average and peak temperatures for a Q3 transistor at a power level of 4.2 mW/finger. The emitter stripe area is 1.2 µm x 5.0 µm.

Another important fact evident from Figure 6.17 is that local heating effects do not interact even within the narrow confines of a device where we see three individual peaks of equal heights. Therefore, even in a dense device array such as register file the effects of mutual heating are minimal and the junction temperature depends, primarily, on the power dissipation of the device itself. This effect is also more pronounced due to the lower thermal spreading capability of GaAs material.

Proximity Heating

Figure 6.19: Relative placement of two heat sources and devices.

Simulations were done to find the minimum distance needed to keep the temperature rise on either device under control. The F-RISC/G chips employ differential logic with devices placed in pairs, which means that one of the devices is off and the other is on at any time. For a worst-case analysis, two different current switches were placed very close together with two ON devices side by side. This was simulated at the maximum power levels by varying the distance between them. The results for Q1217, Q1226, and Q3 are shown in Figure 6.20, Figure 6.21, Figure 6.22, and Figure 6.23. The emitter pitch and minimum device distances are illustrated in Figure 6.19.
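A rough feel for the spacing requirement can be had from a far-field estimate. The sketch below is not the simulation used for the design guide figures: it treats the neighboring ON device as a point source (reasonable only when the spacing is much larger than the emitter), uses a constant conductivity of 44 W/mK, and applies the exponential thickness factor from Step 3. The 0.5 °C budget in the usage lines is a purely hypothetical limit.

```python
import math

K_GAAS = 44.0  # W/(m K), constant conductivity assumed for this sketch


def neighbor_rise(power_w, spacing_m, thickness_m, k=K_GAAS):
    """Extra rise at one device caused by a neighbor at a given spacing:
    point-source field P/(2 pi k d) times the factor exp(-d/H)."""
    return (power_w / (2.0 * math.pi * k * spacing_m)
            * math.exp(-spacing_m / thickness_m))


def min_spacing(power_w, thickness_m, max_rise, lo=1e-7, hi=1e-3):
    """Smallest spacing (m) keeping the neighbor's contribution below
    `max_rise`, found by bisection (the rise falls monotonically with d)."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if neighbor_rise(power_w, mid, thickness_m) > max_rise:
            lo = mid
        else:
            hi = mid
    return hi


P, H = 2.8e-3, 75e-6  # 2.8 mW device on a 75 um die
for d_um in (6.0, 12.0, 24.0, 48.0):
    print(f"d = {d_um:4.1f} um: +{neighbor_rise(P, d_um * 1e-6, H):.2f} C")
print(f"spacing for a 0.5 C budget: {min_spacing(P, H, 0.5) * 1e6:.1f} um")
```

Even under this crude model the contribution of a single neighbor a few tens of micrometres away is small compared with the self-heating of the device.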

Figure 6.20: Design guide for specifying the minimum distance between two Q1217 devices of different current switches, with one device fixed at (0,0) and the other moved along (X,0) with P = 2.8 mW, tdie = 75 µm.

Figure 6.21: Average junction temperature increase with one Q1217 device fixed at (0,0) and the other positioned at (x,y) with P = 2.8 mW, and tdie = 75 µm.

Figure 6.22: Distance vs. temperature rise for two Q1226 devices with tdie = 75 µm, tepoxy = 25 µm, and P = 4.2 mW.

Figure 6.23: Distance vs. temperature rise for two Q3 devices, tdie = 75 µm, tepoxy = 25 µm, and P=4.2 mW.

Chip Level Mutual Heating

Figure 6.24: Densely packed array of heat sources.

All the results presented so far were obtained by simulating one or two devices in an isolated environment. The thermal environment on a real chip is much more complex, and a device can heat up further through the mutual heating effects of a number of devices much farther away. Therefore, to simulate the worst-case temperature rise, an array of Q1217 devices was simulated with all the devices dissipating power. Due to the slow speed of simulation, an array size of 1.6 mm x 1.3 mm was simulated by replicating heat sources at Wx and Wy pitch, as shown in Figure 6.24. This size was more than sufficient, as mutual heating effects decrease exponentially once the devices are further apart than the maximum die thickness of 600 µm. A few higher-density placements are possible but are not legal under the design rules. The simulation results are shown in Figure 6.25, with the temperature peaking at the junctions.


Figure 6.25: 3x3 array of Q1217 devices showing temperature peaks at device junctions with P = 2.8 mW, tdie = 75 µm, Tmax = 26.92 °C, and Tavg = 16.03 °C.

The temperature increase over the chip due to mutual heating and field coupling is shown in Figure 6.26 and Figure 6.27. The big device array was simulated at different die thicknesses to determine the optimum thickness, keeping in mind the handling of the die. The 11 data points in Figure 6.28 took 240 hours to simulate and indicated a maximum temperature rise of 18.4 °C at a die thickness of 75 µm. This decided the die thickness and the maximum temperature difference across it to be considered in the thermal structure design. A summary of all the device-level simulations and the resulting thermal impedances is given in Table 6-9.

Figure 6.26: Field coupling in a dense array of Q1217 devices with P = 2.8 mW, tdie = 75 µm, Tpeak = 26.92 °C, and Tavg = 16.03 °C. Interacting lines are at 20% of Tpeak.

Figure 6.27: Field coupling in a dense array of Q1217 devices with P = 2.8 mW, tdie = 75 µm, Tpeak = 26.92 °C, and Tavg = 16.03 °C. Interacting lines are at 15% of Tpeak.

Figure 6.28: Average junction temperature rise vs. die thickness for a big array of Q1217 devices with P = 2.8 mW (Area simulated = 1.6 mm x 1.3 mm).

Table 6-9: Summary of different device thermal simulations.

Device | Emitter Stripe [µm] | Peak Temp [°C] | Ave Temp [°C] | Minimum Design Rule | Rthermal [°C/mW]
Q1430 | 1.4 x 3.0 | 16.2 | 9.48 | | 3.30
Q1217 | 1.2 x 1.7 | 23.6 | 13.03 | 12 µm | 4.65
Q1226 | 1.2 x 2.6 | 28.19 | 16.05 | 15 µm | 5.73
Q3 | 3 x 1.2 x 5.0 | 21.86 | 15.32 | 18 µm | 5.47

Peak and average temperatures are for P = 2.8 mW and tdie = 75 µm.
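The thermal resistance column is the average temperature rise divided by the device power. The sketch below reproduces it for the three devices currently used in the layouts and shows how the resistance can be reused to estimate the rise at another power level; the names are ours, and the linear scaling ignores the temperature dependence of the GaAs conductivity.

```python
REFERENCE_POWER_MW = 2.8  # power at which the table's rises are quoted

# Average temperature rise in deg C, copied from Table 6-9
AVG_RISE_C = {"Q1217": 13.03, "Q1226": 16.05, "Q3": 15.32}


def thermal_resistance_c_per_mw(avg_rise_c, power_mw=REFERENCE_POWER_MW):
    """Device thermal resistance in deg C per mW."""
    return avg_rise_c / power_mw


def estimated_rise_c(device, power_mw):
    """First-order estimate of the average rise at another power level."""
    return thermal_resistance_c_per_mw(AVG_RISE_C[device]) * power_mw


for device, rise in AVG_RISE_C.items():
    print(f"{device}: {thermal_resistance_c_per_mw(rise):.2f} C/mW")
print(f"Q1217 at 1.2 mW: about {estimated_rise_c('Q1217', 1.2):.1f} C")
```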

Design Guidelines for Thermal Integrity

Consideration of all the factors discussed earlier results in the following design guide for managing the thermal integrity of a chip.

  1. Reduce power dissipation of a device by decreasing the voltage across it at the same current. For example, try to avoid cells with VCE of more than 1.4 V such as in the AND gate shown in Figure 6.7.
  2. Use device layouts with wider spacing between the emitters [Poul91].
  3. Decrease the device current density [Poul91] required to drive a line by lowering its capacitance and/or resistance.
  4. A Q1226 device can switch a maximum of 3 mA of current. This increases the average junction temperature by as much as 3 °C over the average temperature of a normal Q1217 device. The impact of this temperature rise should be offset by providing enough margin around the places where this device has been used.
  5. The resistors should be kept as far as possible from a device as they also dissipate heat.
  6. Big devices such as Q3s should be used sparingly and be kept away from other heat sources.

Active Cooling with Thermo-Electric Coolers

A standard thermoelectric cooler (TEC) consists of a number of thermocouples, made of n- and p-type legs connected electrically in series and thermally in parallel, between two ceramic plates. When a current is sent through it, a temperature difference is generated across its two face plates due to the Peltier effect [Gold95]. TECs have been used successfully in applications where space and weight are at a premium, such as space-based electronics [Vand95]. Therefore, one way to increase the cooling ability of the F-RISC/G package is to incorporate a TEC between the package and the heat sink. Peltier coolers increase the efficiency of a heat sink by increasing its base temperature. They are also very useful in dealing with the hot spot problem, effectively pumping the heat away from the center of a chip. For example, the register files are very hot and are located almost in the center of the package, so this heat pumping ability can be used to cool the F-RISC/G module.

Good thermoelectric materials are good electrical conductors but poor thermal conductors. They also possess large Seebeck coefficients. These properties are combined in a figure of merit, z, given as

$$z = \frac{\alpha^2 \sigma}{k}$$ [6.12]

where α is the Seebeck coefficient in V/°C, σ is the electrical conductivity in mhos/m, k is the thermal conductivity in W/m·K, and the units of z are K⁻¹. α, σ, k, and z are strong functions of temperature. z determines the maximum temperature difference possible with a thermocouple made of a set of materials. The standard active materials used for thermoelectric coolers are bismuth telluride based alloys, and the theoretical maximum z which can be obtained with this material is in the range 3 - 4 × 10⁻³ K⁻¹ depending on the average temperature of operation [Gold86]. The maximum available in commercial products from several manufacturers is about 2.7 × 10⁻³ K⁻¹. z varies with temperature and therefore zT, where T is the absolute temperature, is also used as a figure of merit at a temperature T. At present the highest zT which can be obtained is close to 1 for SiGe at 1050 K, as compared to the theoretical maximum possible of 1.75 at 1050 K [Rowe95].
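As a quick numerical check of the figure of merit, the sketch below evaluates z = α²σ/k and zT using the bismuth telluride values listed later in Table 6-12 (|α| = 2×10⁻⁴ V/°C, ρ = 1.0×10⁻³ Ω·cm, k = 1.5×10⁻² W/cm·K at 296 K). The unit conversions and names are illustrative assumptions.

```python
def figure_of_merit(alpha_v_per_k, sigma_s_per_m, k_w_per_m_k):
    """z = alpha^2 * sigma / k, in 1/K."""
    return alpha_v_per_k ** 2 * sigma_s_per_m / k_w_per_m_k


ALPHA = 2.0e-4                    # Seebeck coefficient magnitude [V/K]
RHO_OHM_CM = 1.0e-3               # electrical resistivity [ohm*cm]
K_W_PER_CM_K = 1.5e-2             # thermal conductivity [W/(cm*K)]

sigma = 1.0 / (RHO_OHM_CM * 1e-2)  # conductivity [S/m] from resistivity in ohm*m
k_si = K_W_PER_CM_K * 100.0        # thermal conductivity [W/(m*K)]

z = figure_of_merit(ALPHA, sigma, k_si)
zT = z * 296.0                     # dimensionless figure of merit at 296 K

print(f"z = {z:.3e} 1/K, zT(296 K) = {zT:.2f}")
```

The result, about 2.67×10⁻³ K⁻¹, agrees with the Z entry of Table 6-12.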

The efficiency of a thermoelectric converter depends upon the temperature difference across which it operates, the average temperature of operation, and its figure of merit. There are three main categories of materials which are popular for thermoelectric applications in different temperature ranges [Rowe95]. These are given in Table 6-10.

Commercial ratings are typically specified for operation in vacuum or under a dry N2 atmosphere. Typically, the specifications are derated by up to 25% for operation under standard atmospheric conditions due to parasitic losses. Another specification of importance is the maximum temperature inside a TEC. The standard modules are assembled with solder that melts at 138 °C. Therefore, the maximum operating temperature should be well below this melting temperature. A good value is 80 °C. TECs can be mounted using solder, adhesives, or compression methods. Soldering large TECs to a heat sink is not recommended, as the stresses developed across the thermocouples due to the temperature difference between their hot and cold surfaces may crack them.

Table 6-10: Prominent thermoelectric materials [Rowe95].

| Material | Figure of Merit Ranking | Temperature range |
|---|---|---|
| Bismuth Telluride and its alloys | 1 | 450 K (max) |
| Lead Telluride based alloys | 2 | 1000 K |
| Silicon germanium | 3 | 1300 K |

There are five effects taking place inside a TEC: Peltier, Seebeck, Thomson, Joule, and Fourier. The Peltier, Seebeck, and Thomson effects are reversible, but Joule heating and Fourier heat conduction are not. Therefore, a thermoelectric cooler rejects more heat on the hot side than it can absorb on the cold side. This puts a limit on the amount of possible cooling, depending on the heat dissipation capacity on the hot side. The Peltier effect is proportional to the current, while Joule heating is proportional to the square of the current. Therefore, above an optimum current value, Joule heating dominates over Peltier cooling and the device starts losing its efficiency. Since one requirement of this design was to keep it air-cooled with the available heat sinks, it was necessary to find a cooler solution within a 600 W range.

Thermoelectric Calculations for Module Selection

The best commercially available thermoelectric material in the temperature range of -120 °C to 230 °C is a pseudo-binary alloy of bismuth, tellurium, selenium, and antimony, (Bi,Sb)2(Te,Se)3, also referred to as bismuth telluride [Marl95]. The highest reported figure of merit for this material is about 4 × 10⁻³ K⁻¹ [Seel72], though the commercially available devices have a figure of merit of about 2.67 × 10⁻³ K⁻¹. All the good thermoelectric materials, such as bismuth telluride, are semiconductors, and a thermocouple can be obtained by joining a p-type and an n-type element. When a direct current is passed from the n to the p element, cooling takes place at the cold junction, while reversing the current direction generates heat at the same junction. The normal temperature range of operation of F-RISC/G is well within the range of bismuth telluride, and therefore, parameters of this material were used in the design equations described in this section.

The following equations are based on [Krau83][Gold95][Marl95]. They illustrate the theory used in determining the cooler specifications and have been reformulated to incorporate the factors employed by commercial manufacturers. Figure 6.29 shows the schematic of a TEC with only one thermocouple. The two legs of the thermocouple are made of a p-type element, with resistivity ρp, thermal conductivity kp, and Seebeck coefficient αp, and an n-type element, with resistivity ρn, thermal conductivity kn, and Seebeck coefficient αn. The area and length of the p- and n-type elements are Ap, Lp, and An, Ln respectively.

Figure 6.29: Schematic of a thermoelectric cooler.

Assuming Th and Tc as the hot and cold junction temperatures of the thermocouple, we can define Tave and ΔT as

$$T_{ave} = \frac{T_h + T_c}{2} \quad \text{and} \quad \Delta T = T_h - T_c$$ [6.13]

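The geometric factor, G, of a thermoelectric element is defined as the ratio of its cross-sectional area to its length,

$$G_p = \frac{A_p}{L_p}, \qquad G_n = \frac{A_n}{L_n}$$ [6.14]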

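The overall resistance, thermal conductance, and Seebeck coefficient of the thermocouple are given as

$$R = \frac{\rho_p}{G_p} + \frac{\rho_n}{G_n}$$ [6.15]

$$K = k_p G_p + k_n G_n$$ [6.16]

and,

$$\alpha_{pn} = \alpha_p - \alpha_n$$ [6.17]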

The figure-of-merit (FOM) of this thermocouple can now be defined as

$$Z = \frac{(\alpha_p - \alpha_n)^2}{K R}$$ [6.18]

whereas z was the FOM for a material. Given a material, the highest value of Z is desired for maximum performance, and Z reaches that value when the denominator, KR, in the previous equation is minimized. KR is minimized when

$$\frac{G_p}{G_n} = \frac{A_p / L_p}{A_n / L_n} = \sqrt{\frac{\rho_p k_n}{\rho_n k_p}}$$ [6.19]
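The condition for minimum KR can be checked numerically. The sketch below assumes the couple resistance R = ρp/Gp + ρn/Gn and conductance K = kp·Gp + kn·Gn, and uses hypothetical, deliberately dissimilar leg properties (not values from this work) so that the optimum ratio differs from one.

```python
import math


def kr_product(g_p, g_n, rho_p, rho_n, k_p, k_n):
    """Product of couple conductance K and resistance R for given geometry factors."""
    r = rho_p / g_p + rho_n / g_n
    k = k_p * g_p + k_n * g_n
    return k * r


def optimal_ratio(rho_p, rho_n, k_p, k_n):
    """Gp/Gn that minimizes K*R."""
    return math.sqrt(rho_p * k_n / (rho_n * k_p))


# Hypothetical leg properties [ohm*cm] and [W/(cm*K)], chosen only for illustration.
rho_p, rho_n = 1.2e-3, 0.9e-3
k_p, k_n = 1.4e-2, 1.6e-2
g_n = 1.0  # [cm]; only the ratio Gp/Gn matters for K*R

ratio = optimal_ratio(rho_p, rho_n, k_p, k_n)
kr_min = kr_product(ratio * g_n, g_n, rho_p, rho_n, k_p, k_n)

# Closed form of the minimum: (sqrt(rho_p*k_p) + sqrt(rho_n*k_n))^2
kr_closed = (math.sqrt(rho_p * k_p) + math.sqrt(rho_n * k_n)) ** 2

# Brute-force scan over Gp/Gn from 0.5 to 2.0 as an independent check.
scan_min = min(
    kr_product(x * g_n, g_n, rho_p, rho_n, k_p, k_n)
    for x in (0.5 + 0.001 * i for i in range(1501))
)

print(f"optimal Gp/Gn = {ratio:.4f}, KR_min = {kr_min:.4e}")
```

For identical leg properties the optimum ratio reduces to one.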

Due to the flat nature of the top and bottom surfaces of a TEC, Lp = Ln. In this temperature range ρ and k are also comparable for both p- and n-type materials, and therefore Ap = An [Gold95]. This also makes Gp = Gn. Therefore, the p- and n-type legs can now be described with a single set of parameters (α = αp = -αn, ρ = ρp = ρn, k = kp = kn, G = Gp = Gn) and will not be distinguished in the equations. The new Z is given by

$$Z = \frac{\alpha^2}{\rho k}$$ [6.20]

When a current I is sent through the couple, it develops a voltage across the couple which is equal to the sum of the Seebeck voltages and the ohmic drop in the couple, i.e.,

$$V = 2\left(\alpha\,\Delta T + \frac{I \rho}{G}\right)$$ [6.21]

The power supplied to the couple is P = VI or,

$$P = 2\left(\alpha I\,\Delta T + \frac{I^2 \rho}{G}\right)$$ [6.22]

The heat absorbed at the cold junction is reduced by two sources: Joule heat and Fourier heat. The net heat absorbed at the cold junction is then equal to the Peltier heat at the cold junction minus the sum of the resistive heat loss (half of the total resistive loss is assumed to appear at each junction) and the heat conducted from the hot junction to the cold junction. With R = 2ρ/G and K = 2kG for identical legs, this is given as

$$q_c = 2\alpha I T_c - \tfrac{1}{2} I^2 R - K\,\Delta T$$ [6.23]

or,

$$q_c = 2\left(\alpha I T_c - \frac{I^2 \rho}{2G} - k G\,\Delta T\right)$$ [6.24]
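The competition between Peltier cooling (linear in I) and Joule heating (quadratic in I) can be illustrated with a short sketch. It assumes the per-couple form q_c = 2(αITc − I²ρ/2G − kGΔT), the 296 K material values of Table 6-12 held constant, and G = 1.179 expressed in cm; setting dq_c/dI = 0 then gives the current of maximum cooling, I = αTcG/ρ. All names are illustrative.

```python
ALPHA = 2.0e-4   # per-leg Seebeck coefficient magnitude [V/K] (Table 6-12)
RHO = 1.0e-3     # resistivity [ohm*cm] (Table 6-12)
K = 1.5e-2       # thermal conductivity [W/(cm*K)] (Table 6-12)
G = 1.179        # geometry factor, assumed to be in cm


def q_cold(i_a, t_c_k, delta_t_k):
    """Net heat absorbed at the cold junction of one couple [W]."""
    peltier = ALPHA * i_a * t_c_k
    joule_half = i_a ** 2 * RHO / (2.0 * G)
    conduction = K * G * delta_t_k
    return 2.0 * (peltier - joule_half - conduction)


def max_cooling_current(t_c_k):
    """Current at which dq_c/dI = 0 [A]."""
    return ALPHA * t_c_k * G / RHO


t_c = 273.15     # cold side at 0 °C
delta_t = 20.0   # example temperature difference [K]
i_peak = max_cooling_current(t_c)

for i in (0.0, 0.5 * i_peak, i_peak, 1.5 * i_peak):
    print(f"I = {i:6.2f} A -> q_c = {q_cold(i, t_c, delta_t):6.3f} W per couple")
```

At zero current only the conduction term remains, so q_c is negative; beyond the peak current the Joule term grows faster than the Peltier term and cooling falls off again.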

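The total heat rejected by the couple on the hot side is the sum of the heat absorbed and the power supplied, qh = qc + P, or

$$q_h = 2\left(\alpha I T_h + \frac{I^2 \rho}{2G} - k G\,\Delta T\right)$$ [6.25]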

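The coefficient of performance (COP), φ, of the couple is defined as the ratio of the heat withdrawn to the applied power. Hence,

$$\phi = \frac{q_c}{P} = \frac{\alpha I T_c - \dfrac{I^2 \rho}{2G} - k G\,\Delta T}{\alpha I\,\Delta T + \dfrac{I^2 \rho}{G}}$$ [6.26]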

φ can be maximized by applying the optimum current I0. Differentiating φ with respect to I and equating the result to zero gives the following I0

$$I_0 = \frac{\alpha\,G\,\Delta T}{\rho\left(\sqrt{1 + Z T_{ave}} - 1\right)}$$ [6.27]

Thus, the optimum current through a couple for a given average temperature and temperature difference depends solely on its bulk material properties (α, ρ, and Z) and its geometrical properties (G). The coefficient of performance at the optimum current can be simplified to

$$\phi_0 = \frac{T_c}{\Delta T} \cdot \frac{\sqrt{1 + Z T_{ave}} - T_h / T_c}{\sqrt{1 + Z T_{ave}} + 1}$$ [6.28]
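The optimum current and the corresponding COP can be cross-checked against the direct ratio q_c/P. The sketch below assumes the standard textbook forms I0 = αGΔT / (ρ(√(1 + Z·Tave) − 1)) and φ0 = (Tc/ΔT)(√(1 + Z·Tave) − Th/Tc)/(√(1 + Z·Tave) + 1), the constant 296 K parameters of Table 6-12, and G in cm; the function names are illustrative.

```python
import math

ALPHA = 2.0e-4   # per-leg Seebeck coefficient magnitude [V/K]
RHO = 1.0e-3     # resistivity [ohm*cm]
K = 1.5e-2       # thermal conductivity [W/(cm*K)]
G = 1.179        # geometry factor, assumed to be in cm
Z = ALPHA ** 2 / (RHO * K)  # couple figure of merit [1/K]


def cop(i_a, t_c_k, t_h_k):
    """COP of one couple as heat absorbed divided by electrical input power."""
    delta_t = t_h_k - t_c_k
    q_c = 2.0 * (ALPHA * i_a * t_c_k - i_a ** 2 * RHO / (2.0 * G) - K * G * delta_t)
    p_in = 2.0 * (ALPHA * i_a * delta_t + i_a ** 2 * RHO / G)
    return q_c / p_in


def optimum_current(t_c_k, t_h_k):
    """Current that maximizes the COP [A]."""
    m = math.sqrt(1.0 + Z * 0.5 * (t_h_k + t_c_k))
    return ALPHA * G * (t_h_k - t_c_k) / (RHO * (m - 1.0))


def optimum_cop(t_c_k, t_h_k):
    """Closed-form COP at the optimum current."""
    m = math.sqrt(1.0 + Z * 0.5 * (t_h_k + t_c_k))
    return (t_c_k / (t_h_k - t_c_k)) * (m - t_h_k / t_c_k) / (m + 1.0)


t_c, t_h = 273.15, 293.15   # example: 0 °C cold side, 20 °C hot side
i0 = optimum_current(t_c, t_h)
print(f"I0 = {i0:.2f} A, COP(I0) = {cop(i0, t_c, t_h):.3f}, "
      f"closed form = {optimum_cop(t_c, t_h):.3f}")
```

Note that the current of maximum COP is generally lower than the current of maximum cooling, so a module sized for best efficiency absorbs less heat per couple than its rated maximum.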

The goal is to minimize the total heat rejected on the hot side for a given qc, ΔT, and area. For a given material, maximizing G maximizes the optimum current and satisfies the minimum heat rejection constraints. Currently, couples with a range of G values are available from commercial manufacturers such as MELCOR Corporation [Eem95], and the maximum G available from MELCOR is 1.179. The problem in increasing G is the low yield obtained when making pellets with a larger area and a shorter length. Anything shorter or larger also requires a high current source, which can be impractical to obtain at such a low voltage.

The commercial modules are specified by four temperature dependent parameters: Imax, Vmax, ΔTmax, and Qcmax. These parameters are always specified at a hot side temperature, Th, of 25 °C. Imax/G is unaffected by Th and is about 5 × 10³ A/m. The parameters of the couples made by MELCOR and other manufacturers are given in Table 6-11 and Table 6-12.

Table 6-11: Commercial module specifications.

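| Parameter | Minimum | Maximum |
|---|---|---|
| Size | 1.8 mm x 3.4 mm | 62 mm x 62 mm |
| H | 2.45 mm | 5.8 mm |
| Qc-max | 0.2 W | 125 W |
| Imax | 0.8 A | 60 A |
| Vmax | 0.4 V | 15.4 V |
| N | 4 | 127 |
| ΔTmax | 67 °C | |
| G | 0.016 | 1.179 |
| Surface Finish | Lapped 1 mil | |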

Table 6-12: Thermoelectric parameters.

| Item | Value (T = 296 K) |
|---|---|
| αn | -2 × 10⁻⁴ V/°C |
| αp | 2 × 10⁻⁴ V/°C |
| ρn | 1.0 × 10⁻³ Ω·cm |
| ρp | 1.0 × 10⁻³ Ω·cm |
| kn | 1.5 × 10⁻² W/cm·K |
| kp | 1.5 × 10⁻² W/cm·K |
| Z | 2.67 × 10⁻³ K⁻¹ |

TECs with the maximum G of 1.179 are available with a surface area of 55 mm x 55 mm and contain 31 couples. Four of these TECs can be mounted on top of the heat sink, which has a mounting area of 12.75 cm x 12.75 cm. Figure 6.30 shows an engineering drawing of this TEC with relevant dimensions and parameters.


Figure 6.30: CP 5-31-06L from MELCOR with Imax = 60 A, Qmax = 125 W, Vmax = 3.75 V, ΔTmax = 67 °C, and number of couples N = 31.

Table 6-12 gives the material parameters at 296 K. The temperature dependent variation in the parameters is given by the equations below, as specified by MELCOR Corporation, with Tave in kelvin.

$$\alpha(T_{ave}) = (22224.0 + 930.6\,T_{ave} - 0.9905\,T_{ave}^2) \times 10^{-9}\ \text{V/°C}$$ [6.29]

$$\rho(T_{ave}) = (5112.0 + 163.4\,T_{ave} + 0.6279\,T_{ave}^2) \times 10^{-8}\ \Omega\text{-cm}$$ [6.30]

$$k(T_{ave}) = (62605.0 - 277.7\,T_{ave} + 0.4131\,T_{ave}^2) \times 10^{-6}\ \text{W/cm·K}$$ [6.31]
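The three polynomial fits are straightforward to evaluate in code. The following sketch implements them as functions of the average temperature in kelvin and derives Z = α²/(ρk) from them; the function names are illustrative. At 296 K the fits give values close to, though not identical with, the rounded entries of Table 6-12.

```python
def seebeck_v_per_k(t_ave_k):
    """Per-leg Seebeck coefficient magnitude [V/K] from the polynomial fit."""
    return (22224.0 + 930.6 * t_ave_k - 0.9905 * t_ave_k ** 2) * 1e-9


def resistivity_ohm_cm(t_ave_k):
    """Electrical resistivity [ohm*cm] from the polynomial fit."""
    return (5112.0 + 163.4 * t_ave_k + 0.6279 * t_ave_k ** 2) * 1e-8


def conductivity_w_per_cm_k(t_ave_k):
    """Thermal conductivity [W/(cm*K)] from the polynomial fit."""
    return (62605.0 - 277.7 * t_ave_k + 0.4131 * t_ave_k ** 2) * 1e-6


def figure_of_merit(t_ave_k):
    """Z = alpha^2 / (rho * k) [1/K] evaluated from the three fits."""
    a = seebeck_v_per_k(t_ave_k)
    return a ** 2 / (resistivity_ohm_cm(t_ave_k) * conductivity_w_per_cm_k(t_ave_k))


for t in (273.15, 296.0, 320.0):
    print(f"T = {t:6.2f} K: alpha = {seebeck_v_per_k(t):.3e} V/K, "
          f"rho = {resistivity_ohm_cm(t):.3e} ohm*cm, "
          f"k = {conductivity_w_per_cm_k(t):.3e} W/cm.K, "
          f"Z = {figure_of_merit(t):.3e} 1/K")
```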

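To take the heat sink and the contact resistance into account, ΔT can be substituted by

$$\Delta T = T_a + Q_h\,(R_{hs} + R_c) - T_c$$ [6.32]

where Ta is the ambient temperature, Qh is the total heat rejected by the cooler, Rhs is the thermal resistance of the heat sink, and Rc is the contact resistance between the cooler and the heat sink.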

This makes the equation for qc iterative as the hot plate temperature depends on the temperature rise of the heat sink which in turn depends on the heat rejected by the cooler. Therefore, the TEC equations were analyzed independently of the heat sink resistance, for the amount of cooling possible with a given cooler.

Results

All the equations provided above were entered into a spreadsheet with the parameters of the chosen TEC, CP 5-31-06L. From the initial modeling of the device and package thermal resistance, the temperature rise of the junction relative to the TEC cold face was about 25 °C. This suggested a temperature range of 0 °C - 10 °C at the TEC cold face (Tc) to keep the device junction (Tj) in the 25 °C - 35 °C range. The temperature of the hot face (Th) depended on the resistance of the external heat sink and the resistance of the bond between the heat sink and the cooler. Since Th was still unknown, the total amount of heat rejected was calculated for different values of ΔT = Th - Tc. The plots for three different cold side values in the 0 °C - 10 °C range are shown in Figure 6.31. The plots have a parabolic shape and show that for every ΔT value there are two possible values of the input current or power. The cooler is run at the smaller of the two currents to minimize the rejected heat. The left half of the plot is detailed in Figure 6.32. These values are also given in Table 6-13 for calculation purposes. Shaded rows in Table 6-13 provide a few operating points for cooler operation, keeping in mind the cooling required for the package.
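The two possible currents for a given ΔT follow from the fact that the cold-side heat balance is quadratic in I. The sketch below solves that quadratic for one couple, assuming the per-couple form q_c = 2(αITc − I²ρ/2G − kGΔT), constant 296 K material values from Table 6-12, G = 1.179 in cm, and an equal share of the 220 W load among four 31-couple modules. These are simplifying assumptions, so the numbers are illustrative and are not expected to reproduce Table 6-13.

```python
import math

ALPHA = 2.0e-4        # per-leg Seebeck coefficient magnitude [V/K]
RHO = 1.0e-3          # resistivity [ohm*cm]
K = 1.5e-2            # thermal conductivity [W/(cm*K)]
G = 1.179             # geometry factor, assumed to be in cm
N_COUPLES = 4 * 31    # assumption: four modules, load shared equally


def q_cold(i_a, t_c_k, delta_t_k):
    """Net heat absorbed at the cold junction of one couple [W]."""
    return 2.0 * (ALPHA * i_a * t_c_k - i_a ** 2 * RHO / (2.0 * G) - K * G * delta_t_k)


def input_power(i_a, delta_t_k):
    """Electrical power supplied to one couple [W]."""
    return 2.0 * (ALPHA * i_a * delta_t_k + i_a ** 2 * RHO / G)


def currents_for_load(q_couple_w, t_c_k, delta_t_k):
    """Both currents that absorb q_couple_w, or None if the load cannot be met."""
    a = RHO / (2.0 * G)
    b = -ALPHA * t_c_k
    c = K * G * delta_t_k + q_couple_w / 2.0
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    return ((-b - root) / (2.0 * a), (-b + root) / (2.0 * a))


q_couple = 220.0 / N_COUPLES
t_c, delta_t = 273.15, 20.0
i_low, i_high = currents_for_load(q_couple, t_c, delta_t)

rejected_low = N_COUPLES * (q_couple + input_power(i_low, delta_t))
rejected_high = N_COUPLES * (q_couple + input_power(i_high, delta_t))
print(f"I = {i_low:.1f} A rejects {rejected_low:.0f} W; "
      f"I = {i_high:.1f} A rejects {rejected_high:.0f} W")
```

Both roots pump the same heat from the cold side, but the larger current dissipates far more Joule heat, which is why the smaller root is the sensible operating point. When the discriminant is negative, the requested load cannot be pumped across that ΔT at any current.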

Figure 6.31: Plot of delta T vs. heat rejected at cold side temperature Tc = 0C, 5C, and 10C, and input heat Qc =220 W.

Figure 6.32: Delta T vs. heat rejected at cold side temperature Tc = 0C, 5C, and 10 C, and input heat Qc = 220 W.

Table 6-13: Rejected power vs. delta T for Qc = 220 W.

| ΔT [°C] | Heat Rejected (Tc = 0 °C) [W] | Heat Rejected (Tc = 5 °C) [W] | Heat Rejected (Tc = 10 °C) [W] |
|---|---|---|---|
| 0 | 306.67 | 303.10 | 299.85 |
| 5 | 335.56 | 330.31 | 325.57 |
| 10 | 370.36 | 362.86 | 356.18 |
| 15 | 412.34 | 401.81 | 392.55 |
| 20 | 463.49 | 448.72 | 435.98 |
| 25 | 527.07 | 506.07 | 488.39 |
| 30 | 609.23 | 578.09 | 552.91 |
| 35 | 724.69 | 673.5 | 635.33 |
| 40 | 944.78 | 817.34 | 748.91 |
| 45 | | 981.77 | 947.18 |

The heat sink [Knic91] is rated at 600 W capacity at 150 CFM air flow at room temperature, with a thermal resistance of 0.041 °C/W. This can be raised to 700 W - 750 W by increasing the air flow and/or lowering the ambient temperature. It has an aluminum top mounting area of 127.5 mm x 127.5 mm and a fin area of 1 m². The full assembly has a plenum and a blower to pull air through the heat sink. Thus, the maximum temperature difference possible with the selected thermoelectric cooler, keeping the heat dissipation capacity of the heat sink in mind, is about 35 °C, at which point the cooler rejects about 725 W. The actual operating point will depend on the ambient temperature and the total resistance from the cooler hot side to the ambient air, and is determined later.

Several things can be done to increase the performance of these coolers. Normally, they are made with alumina face plates. Aluminum nitride and beryllia plates are other options that increase the thermal spreading capability of the face plates and lower the overall resistance. These devices need DC power for operation, and linear or switching power supplies are available for these applications. The power supply output should have low ripple. MELCOR recommends limiting this ripple to less than 10% to keep the performance loss below 1%. One fact to note in this design is the inadequacy of a multi-stage cooler. Multi-stage units can improve the coefficient of performance, but they need more surface area on the hot side, which was not possible in this case. Multi-stage cooling is useful for achieving larger temperature differences. Another idea is to use a TEC as a heat spreader by assembling elements with different values of G, with the higher values of G in the center where most of the heat is being absorbed and lower values of G in the outer areas where little heat is being absorbed.

Thermal Path in the Package and Material Selection

The major heat conduction path is vertically downward in the package, from the back side of the chips, as shown in Figure 6.33. Negligible power is conducted from the top of the chip through the polyimide and copper wires to the ambient, and this path is ignored for the purpose of making conservative design decisions. A possible scheme for assembling this package is illustrated in Figure 6.34, with the thermal interfaces between different blocks shown in bold lines.

Figure 6.33 : Electrical and heat extraction paths are in opposite directions.

Figure 6.34: Cross-section of the thermal structure.

This arrangement shows a first level assembly of chips on three separate substrates and placement of the combined module on a heat spreader. The heat spreader spreads the heat on top of a 2x2 thermo-electric cooler array mounted on top of a heat sink. The problem is lowering the interfacial thermal resistance and matching the thermal coefficients of expansion across different interfaces. The constraint of maintaining the repair/rework capability of the modules and matching the coefficients of expansion removed the possibility of soldering at all the interfaces. The design matrix for die-attach is given in Table 6-14.

Table 6-14: Design matrix for die attach

| Die Attach | Thermal Conductivity | Rework | Comments |
|---|---|---|---|
| Epoxy | Low | Difficult | Compliant |
| Thermoplastics | Low | Yes | Compliant |
| Solder | High | No | Stiff; needs backside metal |

Thermoplastics are the easiest to rework, but their disadvantage is low thermal conductivity. In the current scheme the chips are attached to silvar shims with epoxy (84-1LMI - a trademark of Ablestick). Silvar keeps the fragile GaAs dice in slight compression after processing [Micr95]. The shims are attached to an AlN substrate with another layer of epoxy. The high thermal conductivity of aluminum nitride and GE's prior experience of using it with GaAs chips made the choice of substrate relatively easy. Other substrates considered are given in Table 6-15.

Table 6-15: Design matrix for substrates

| Substrate | Thermal Conductivity [W/m·K] | TCE [10⁻⁶/°C] | Sizes Available [in. x in. x in.] | Comments |
|---|---|---|---|---|
| Alumina (96%) | 20 | 6.8 | 4 x 4 x 0.3 | Poor heat spreader |
| AlN | 250 | 4.3 | 4 x 4 x 0.3 | Good heat spreader |
| AlSiC | 160 | 6.5-9 | 4 x 4 x 0.3 | Good heat spreader |
| Diamond | 1200 | 23 | 4 x 4 x 1 mm | Excellent heat spreading |

The heat from the AlN substrate is spread on top of the thermoelectric cooler array using a heat spreader. The substrate needs to be placed on top of this spreader using a reworkable interface. The thermoelectric cooler array is mounted on top of the Al heat sink using a reworkable and pliant material. Selection of this material is described later in the section on thermal calculation. Thus, the resistance to the heat flow comprises the thermal resistances offered by the chip, epoxy, shims, epoxy, AlN, thermal interface, heat spreader, thermal interface, thermoelectric cooler, thermal interface, and the heat sink. The selection of the heat spreader and this interface material is done in the next section.

Modeling of a Heat Spreader

The thermoelectric cooler array requires a uniform loading on its top face. This is accomplished by placing a heat spreader between the package and the cooler to spread the heat out on all sides of the package. Three-dimensional modeling is required for determining this spreading resistance. In the case of simple geometries, a good correlation can be obtained on the basis of spreading angles. The spreading angle increases with increasing thermal conductivity, as shown in Figure 6.35 [Kuhl92]. As shown in Figure 6.36, the heat dissipated by the package is spread out using a heat spreader with maximum base dimensions equal to the area of the cooler. The spreading angles for common substrate materials and the dimensions of some of the possible heat spreaders, with the resulting temperature difference across them, are given in Table 6-16.

Figure 6.35: Thermal spreading vs. thermal conductivity [Kuhl92].

Figure 6.36: Top view of the package along with the heat spreading.

Table 6-16: Performance of different heat spreaders with Q = 221.1 W.

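| Material | Th. Cond. [W/m·K] | TCE [10⁻⁶/°C] | Spread Angle [°] | Horizontal Spread [mm] | Thickness [mm] | R [°C/W] | ΔT [°C] |
|---|---|---|---|---|---|---|---|
| Al | 220 | 23 | 56 | 39 | 26.3 | 0.0167 | 3.70 |
| AlN | 250 | 4.3 | 62 | 39 | 20.7 | 0.0116 | 2.55 |
| Cu | 398 | 17 | 70 | 39 | 14.2 | 0.0050 | 1.10 |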
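The entries of Table 6-16 appear consistent with two simple relations: the horizontal spread equals the spreader thickness times the tangent of the spreading angle, and the temperature difference equals the heat load times the spreading resistance. The sketch below assumes these relations and takes the angles, thicknesses, and resistances directly from the table; it does not attempt to derive the resistance values themselves.

```python
import math

Q_W = 221.1  # heat load through the spreader [W] (Table 6-16)

# material: (spread angle [deg], thickness [mm], spreading resistance [°C/W])
spreaders = {
    "Al": (56.0, 26.3, 0.0167),
    "AlN": (62.0, 20.7, 0.0116),
    "Cu": (70.0, 14.2, 0.0050),
}


def horizontal_spread_mm(thickness_mm, angle_deg):
    """Lateral spreading distance reached at the base of the spreader."""
    return thickness_mm * math.tan(math.radians(angle_deg))


def temperature_drop_c(q_w, r_c_per_w):
    """Temperature difference across a thermal resistance."""
    return q_w * r_c_per_w


results = {}
for name, (angle, thickness, resistance) in spreaders.items():
    results[name] = (
        horizontal_spread_mm(thickness, angle),
        temperature_drop_c(Q_W, resistance),
    )
    print(f"{name}: spread = {results[name][0]:.1f} mm, dT = {results[name][1]:.2f} °C")
```

Seen this way, a higher conductivity material reaches the same 39 mm horizontal spread with a thinner plate, which is why the copper spreader is both the thinnest and the one with the lowest temperature difference.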

Thermal Resistance Calculations

The temporary nature of the testing process required a thermal interface which would provide high thermal conductivity, good compliance to both microscopic and macroscopic irregularities, ease of application, and reworkability. Some options with their advantages and disadvantages are given in Table 6-17.

Table 6-17: Options for thermal interfaces.

| Thermal Interface | Advantages | Disadvantages |
|---|---|---|
| Solder | High k | low reworkability; low compliancy |
| Epoxy | reworkable; compliant | low k |
| Thermal Grease | reworkable; compliant | low k |
| Thermal pads | reworkable; compliant | low k; needs pressure and temperature |

Commercially available thermal pads were looked at as a possible medium as their thermal conductivity is as good as thermal grease and they are able to conform to irregular surfaces. Using these pads also removes the problem of handling greases. These pads are characterized better by their thermal impedance values. In addition to a material's ability to conduct heat, this quantity also denotes the material's ability to conform to irregular surfaces minimizing contact resistance. Some of the commonly available materials are given in Table 6-18. T-pli205 (a trademark of Thermagon) is a conformable elastomeric material and provides a low thermal impedance at a clamping pressure of 10 psi. Since it doesn't need any phase changing temperature for best operation it was selected over other thermal pads.

Table 6-18: Commercially available thermal interface pads.

| Name | k [W/m·K] | Th. Imp. [°C-in²/W] | Comments |
|---|---|---|---|
| Kon-Dux™ (Aavid) | | 0.17 | graphite based |
| Softface™ (Bergquist) | 3.5 | 0.07 | supplied on a polyester film; can be transferred using hot stamping |
| Hi-Flow™ (Bergquist) | 1 | 0.05 | filled polymer; dries at 27 °C; flows at 43 °C; low pressure operation |
| Q-Pad3™ (Bergquist) | 2.4 | 0.1 | pressure sensitive; graphite imbedded in a polymer matrix |
| Thermflow T705™ (Chomerics) | 0.72 | 0.06 (5 psi) | thin, dry film; softens at 50 °C |
| T-pli205™ (Thermagon) | 6.6 | 0.07 (10 psi) | 0.125 mm thickness; elastomer |

The schematic of the whole thermal structure is given in Figure 6.37, with the thermal resistance values in Table 6-19. The instruction decoder chip is used for the chip level resistance calculations because it is the chip with the highest power density. One-dimensional heat flow is assumed through the thin epoxy, shim, and substrate layers. Aluminum is assumed as the heat spreader material. The module is covered with thermal insulation to stop feedback of heat from the cooler's hot plate to its cold plate and transfer of heat from the ambient to the module.

Figure 6.37: Schematic of the thermal structure.

Table 6-19: Thermal calculations (ID Chip - 7.6 x 8.7 mm2 - 12 W).

| Resistor | Thermal Conductivity [W/m·K] | Thickness (x) [µm] | Area (A) [mm²] | Temperature ΔT [°C] | Resistance R [°C/W] |
|---|---|---|---|---|---|
| R1 (Chip+Epoxy)* | 46 | 100 | 7.6 x 8.7 | 18.4* | 1.53 |
| R2 (Silvar Shim) | 153 | 250 | 7.6 x 8.7 | 0.18 | 0.024 |
| R3 (Epoxy: 84-1LMI) | 3.93 | 25 | 7.6 x 8.7 | 1.15 | 0.096 |
| R4 (AlN) | 250 | 895 | 2101 | 0.37 | 0.0017 |
| R5 (T-pli205) | 6.6 | 125 | 2101 | 4.75 | 0.0215 |
| R6 (Heat Spreader) | 220 | 26300 | 12100 | 3.70 | 0.0166 |
| R7 (ESP7359)** | 11.6 | 100 | 12100 | 0.157 | 0.00071 |
| Total (Heat input = 221.2 W) | | | | 28.7 | |

* - From device level simulations **ESP7359 - Diamond Epoxy from AI Technology.
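Several of the resistances in Table 6-19 can be reproduced with the one-dimensional slab formula R = x/(kA), and the T-pli205 value is consistent with dividing its thermal impedance of 0.07 °C-in²/W (Table 6-18) by the contact area. The sketch below assumes these two relations; R1 and R6 come from the device level simulations and the spreader analysis respectively and are not recomputed here. The function names are illustrative.

```python
MM2_PER_IN2 = 645.16  # square millimetres per square inch


def slab_resistance(thickness_um, k_w_per_m_k, area_mm2):
    """One-dimensional conduction resistance x / (k * A) in °C/W."""
    return (thickness_um * 1e-6) / (k_w_per_m_k * area_mm2 * 1e-6)


def pad_resistance(impedance_c_in2_per_w, area_mm2):
    """Resistance of an interface pad specified by its thermal impedance."""
    return impedance_c_in2_per_w / (area_mm2 / MM2_PER_IN2)


chip_area = 7.6 * 8.7  # instruction decoder chip footprint [mm^2]

r2 = slab_resistance(250, 153, chip_area)   # silvar shim
r3 = slab_resistance(25, 3.93, chip_area)   # epoxy 84-1LMI
r4 = slab_resistance(895, 250, 2101)        # AlN substrate
r5 = pad_resistance(0.07, 2101)             # T-pli205 pad at 10 psi
r7 = slab_resistance(100, 11.6, 12100)      # ESP7359 adhesive

# Sum of the tabulated temperature differences from junction to cooler cold face.
tabulated_dt = [18.4, 0.18, 1.15, 0.37, 4.75, 3.70, 0.157]
total_rise = sum(tabulated_dt)

print(f"R2={r2:.4f} R3={r3:.4f} R4={r4:.5f} R5={r5:.4f} R7={r7:.5f} °C/W")
print(f"Total rise = {total_rise:.1f} °C")
```

The chip level layers dominate the total because they carry heat through the small footprint of a single chip, whereas the module level layers spread the full 221.2 W over a much larger area.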

Therefore, the temperature rise on top of the cooler is 28.7 °C. The heat sink is attached to the cooler with ESP7359 (diamond filled epoxy) as shown in Figure 6.38. The desired degrees of freedom are the ability to independently vary both the temperature difference across the cooler plates and the ambient temperature. Fixing the temperature difference across the cooler plates demands a particular ambient temperature because of the finite resistance between the cooler hot face and the ambient. Hence, there is only one degree of freedom available in this case. Table 6-20 gives the ambient temperature needed to obtain a particular cold side temperature. The most promising numbers are those with high enough ambient temperatures and low enough cold side temperatures. The best operating points to keep the junction temperature in the range of 25 °C - 40 °C are shown as shaded rows. The entire thermal network with the junction temperature at 28.7 °C is shown in Figure 6.39. An isometric view of the proposed assembly of the structure is shown in Figure 6.40.

Table 6-20: Possible cooling scenarios with the heat sink.

| ΔT [°C] | Heat Sink R [°C/W] | Adhesive Resistance [°C/W] | Cold Side Temperature [°C] | Heat Rejected [W] | Ambient Temp. [°C] | Junc. Temp. [°C] |
|---|---|---|---|---|---|---|
| 35 | 0.041 | 0.00071 | 0 | 724.6 | 4.77 | 28.7 |
| 35 | 0.041 | 0.00071 | 5 | 673.5 | 11.90 | 33.7 |
| 35 | 0.041 | 0.00071 | 10 | 635.3 | 18.50 | 38.7 |
| 30 | 0.041 | 0.00071 | 0 | 609.2 | 4.59 | 28.7 |
| 30 | 0.041 | 0.00071 | 5 | 578.0 | 10.89 | 33.7 |
| 30 | 0.041 | 0.00071 | 10 | 552.9 | 16.93 | 38.7 |
| 25 | 0.041 | 0.00071 | 0 | 527.0 | 3.01 | 28.7 |
| 25 | 0.041 | 0.00071 | 5 | 506.0 | 8.89 | 33.7 |
| 25 | 0.041 | 0.00071 | 10 | 488.3 | 14.63 | 38.7 |
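The ambient and junction temperatures in Table 6-20 follow from a simple series resistance calculation: the hot face sits at Tc + ΔT, the ambient must be lower than the hot face by the rejected heat times the heat sink and adhesive resistances, and the junction sits 28.7 °C above the cold face. The sketch below assumes exactly these relations, with illustrative function names, and reproduces the tabulated values to within rounding.

```python
R_SINK = 0.041        # heat sink resistance [°C/W]
R_ADHESIVE = 0.00071  # ESP7359 adhesive resistance [°C/W]
JUNCTION_RISE = 28.7  # junction rise above the cooler cold face [°C] (Table 6-19)


def required_ambient_c(t_cold_c, delta_t_c, heat_rejected_w):
    """Ambient temperature needed to hold the hot face at t_cold + delta_t."""
    t_hot = t_cold_c + delta_t_c
    return t_hot - heat_rejected_w * (R_SINK + R_ADHESIVE)


def junction_temperature_c(t_cold_c):
    """Junction temperature for a given cold face temperature."""
    return t_cold_c + JUNCTION_RISE


# (delta T [°C], cold side [°C], heat rejected [W]) taken from Table 6-20
cases = [(35, 0, 724.6), (35, 5, 673.5), (35, 10, 635.3), (25, 0, 527.0)]

for delta_t, t_cold, q_rej in cases:
    t_amb = required_ambient_c(t_cold, delta_t, q_rej)
    t_j = junction_temperature_c(t_cold)
    print(f"dT = {delta_t} °C, Tc = {t_cold} °C: "
          f"ambient = {t_amb:.2f} °C, junction = {t_j:.1f} °C")
```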

Figure 6.38: The resistance network from the cooler hot side to the ambient.

Figure 6.39 : Equivalent thermal network with resistance values and temperature differences across the resistors for a junction temperature of 28.7C and an ambient of 4.77C.

Figure 6.40: An isometric view of the heat sink, cooler array, and the package.

Modeling of the Combined Structure

AutoTherm, a finite element analysis tool for the thermal characterization of PCBs and multichip modules [Ment91], was used to model the temperature distribution on the chip surface and to model the package and heat spreader together. This tool has been reported to be accurate to within 5-6% [Wolf91]. The assumption of uniform heat flux over the chip surfaces gives the temperature distribution shown in Figure 6.41. The maximum temperature rise is about 25 °C over the base temperature.

Figure 6.41 : Temperature distribution on top of the chips (Max temp = 25C).

Since the power densities are not uniform on the chips, a non-uniform power density distribution was modeled with the register files and the cache blocks dissipating more heat than the logic blocks. The resulting temperature distribution is shown in Figure 6.42 with a maximum temperature of 27 C.

Figure 6.42 : Temperature distribution on top of the chips: non-uniform case (Max temp = 27 C).

There are several ways of arranging the external cooling required for this package. Figure 6.43 shows a couple of such configurations based on [Naka86]. Configuration (a) is possible in CII due to its use of liquid cooling in room air-conditioners. Configuration (b) is possible at GE facilities.

(a)

(b)

Figure 6.43: Heat dissipation schemes.

Stress Management

The TCE mismatches of the materials in this structure are minimized to prevent failure due to thermally induced forces and stresses during assembly. Since the structure will operate in a 30 - 40 °C window near room temperature, the temperature excursions will be small. Still, there is a chance of fatigue failure due to thermal cycling.

Monitoring Temperature

The chip temperature can be measured by several methods, such as measuring the voltage drop across an on-chip diode, liquid crystal thermography, or infrared imaging [Tumm89]. All these methods have their own limitations and do not have a resolution suitable for sub-micron features on a chip. Most of the temperature spike is in the emitter region, and the temperature falls sharply away from the emitter. Diode drops can be used to indicate the average temperature near a junction. The diodes can be calibrated by pulsed measurements, so that they do not heat up, first with the chip turned off and later with the chip turned on. The diode can also be used to estimate the temperature by dissipating the same amount of power in it and employing rapid-fire measurements. Alternatively, the chip temperature can be determined by measuring the temperature at several points on the support structure of the package and back-calculating the temperature on the chip surface. For this purpose, several K-type thermocouples are attached with thermally conducting epoxy to the heat sink and thermoelectric coolers.

Summary

Temperature presents another limitation in designs with increased heat flux, as the devices operate in a limited temperature range [Ryma89]. More modeling effort is needed on the device temperature variations and the peak junction temperature. This will become increasingly important because of the sensitivity of gate delays to temperature. The design of an appropriate cooling system for the package was challenging because of the constraints of air cooling and cost. Peltier junction based coolers can provide the requisite assistance to these air-cooled packages. Increasing the power requirements of a chip will increase the total cost of a computer due to its increasing cooling costs. This is why optimal circuit designs that keep power dissipation to a minimum are increasingly important.