A 32-Word by 32-Bit Three-Port Bipolar Register File Implemented Using a SiGe HBT BiCMOS Technology

 

By

 

Samuel A. Steidl

A Thesis Submitted to the Graduate

Faculty of Rensselaer Polytechnic Institute

in Partial Fulfillment of the

Requirements for the Degree of

DOCTOR OF PHILOSOPHY

Major Subject: Electrical Engineering

The original of the complete thesis is on file

in the Rensselaer Polytechnic Institute Library

 

 

 

 

Examining Committee:

            John F. McDonald, Thesis Advisor

            Kenneth Rose, Member

            Gary J. Saulnier, Member

            Toh-Ming Lu, Member

            Eugene J. Rymaszewski, Member

 

 

 

Rensselaer Polytechnic Institute

Troy, New York

 

May 2001

(For Graduation August 2001)

© Copyright 2001

by

Samuel A. Steidl

All Rights Reserved

TABLE OF CONTENTS

 

LIST OF TABLES

LIST OF FIGURES

ACKNOWLEDGEMENTS

ABSTRACT

Chapter 1   Introduction and Historical Review

1.1     Introduction

1.2     Technology

1.3     Historical review of memory designs

1.4     ECL and CML circuit design

Chapter 2   Register File Design

2.1     Register file overview

2.2     Memory cell design

2.3     Bit line driver design

2.4     Read address decoder design

2.5     Read word line driver design

2.6     Write address decoder and word line driver design

2.7     Sense amplifier design

2.7.1     Version 1

2.7.2     Version 2

2.8     Output latch design

2.9     Comparator design

2.10   Current source design

2.11   Reference voltage generator design

2.12   Device count

2.13   Register file layout

Chapter 3   Test Chip Design

3.1     Register file test chip overview

3.2     Address counter design

3.3     Data rotator design

3.4     Write enable pulse generator design

3.5     Sampling latch design

3.6     Viewing register file test chip signals

3.7     Pad receiver designs

3.8     Pad driver design

3.9     Voltage-controlled oscillator design

3.10   Register file test chip device count

3.11   Register file test chip layout

3.12   Register file test chip alterations for the second fabrication run

3.13   Linear feedback shift register design

3.14   Ring oscillator test chip design

3.15   Voltage-controlled oscillator test chip design

Chapter 4   SiGe Technology Performance Analysis

4.1     Simulation methodology

4.2     Logic swing in CML and ECL circuits

4.3     Buffer propagation delay using CML, ECL, and CMOS circuits

4.4     Interconnect performance

Chapter 5   Auxiliary Test Chip Performance Analysis

5.1     Test methodology

5.2     Ring oscillator performance

5.3     Voltage-controlled oscillator test chip performance

5.3.1     High frequency voltage-controlled oscillator performance

5.3.2     Medium frequency voltage-controlled oscillator performance

5.3.3     Voltage-controlled oscillator test chip power consumption

Chapter 6   Register File Test Chip Performance Analysis

6.1     Voltage-controlled oscillator performance

6.2     Read address counter performance

6.3     Write address counter performance

6.4     Data rotator performance

6.5     Data LFSR performance

6.6     Write enable pulse generator performance

6.7     Clock skew

Chapter 7   Register File Performance Analysis

7.1     Simulation methodology

7.2     Read access timing

7.3     Write access timing

7.4     Read after write access time performance

7.5     Register file power dissipation

7.6     Register file test chip power dissipation

7.7     Summary of measured register file results

Chapter 8   Future Directions

8.1     Register file improvements

8.2     Register file test chip improvements

8.3     BiCMOS register file design

8.4     Pipelined register file design

CONCLUSIONS

REFERENCES

APPENDIX A - Miscellaneous Schematics

APPENDIX B - Cell Artwork

APPENDIX C - Wafer 2 Register File Test Chip Netlist

 


LIST OF TABLES

 

Table 1.1: Performance summary of some current Si CMOS technologies.

Table 1.2: Performance summary of some current InP HEMT and GaAs MESFET technologies.

Table 1.3: Performance summary of some current bipolar and BiCMOS technologies.

Table 1.4: Performance summary of some current SRAM designs. The write access time is typically reported as the time the write enable pulse must be asserted.

Table 2.1: Sense amplifier steady state bias conditions computed using analytical methods and SPICE simulations assuming a sense amplifier bias current of 30 mA.

Table 2.2: Sense amplifier steady state bias conditions computed using analytical methods and SPICE simulations assuming a sense amplifier bias current of 200 mA.

Table 2.3: Resistor values used in some current source reference generators to achieve the specified current level. Current sources using the reference generators are assumed to have RE values equal to REREF. Values generated using Equation 8 assume VBE is 0.9 V.

Table 2.4: Resistor values used in some current sources to achieve the specified current level when IOx is not equal to IREF. Values obtained using Equation 8 assume VBE is 0.9 V.

Table 2.5: Number of devices used in the circuits comprising the register file.

Table 3.1: List of register file test chip pads and their functions.

Table 3.2: Output pad signal as a function of SELA3 and SELA2.

Table 3.3: Output pad signal as a function of SELB3, SELB2, SELA3, and SELA2.

Table 3.4: Column selection as a function of SELC3, SELC2, SELB3, and SELB2.

Table 3.5: Number of devices used in the circuits comprising the register file test chip.

Table 3.6: List of shift register latches that can be XOR’ed and fed back to the shift register input to produce LFSRs of different sizes. Shift register latches are numbered from 0 to N-1, where latch 0 is at the beginning of the shift register.

Table 4.1: Capacitance extraction results for metal 1 wires.

Table 4.2: Capacitance extraction results for metal 2 wires.

Table 4.3: Capacitance extraction results for metal 3 wires.

Table 4.4: Linear regression results (slope and intercept) for propagation delay as a function of wire length using the 97 models assuming the buffer in question is driving one load buffer.

Table 4.5: Linear regression results (slope and intercept) for propagation delay as a function of wire length using the 99 models assuming the buffer in question is driving one load buffer.

Table 4.6: Coefficients for a quadratic approximation of the propagation delay as a function of wire length using the 97 models assuming the buffer in question is driving one load buffer.

Table 4.7: Coefficients for a quadratic approximation of the propagation delay as a function of wire length using the 99 models assuming the buffer in question is driving one load buffer.

Table 5.1: Simulated and measured ring oscillator output frequencies and corresponding buffer propagation delays using the 97 and 99 models. The percent error figures represent the error of the simulations with capacitive wire parasitics in comparison with the measured results of the first and second fabrication runs.

Table 5.2: List of current levels used by circuits within the ring oscillator test chip.

Table 5.3: Power dissipation of the ring oscillator test chip. The average power dissipation and estimated standard deviation of the measured test chips are listed for both fabrication runs. The percent error is the variation of the designed power dissipation from the average measured power dissipation for each fabrication run.

Table 5.4: Simulated and measured HFVCOb frequency and peak-to-peak amplitude ranges using the 97 and 99 models. The minimum and maximum amplitudes are represented where two results are listed under peak-to-peak amplitude. The percent error figures represent the error of the simulations with capacitive wire parasitics in comparison with the measured results.

Table 5.5: Simulated and measured MFVCO frequency and peak-to-peak amplitude ranges using the 97 and 99 models. The percent error figures represent the error of the simulations with capacitive wire parasitics in comparison with the measured results.

Table 5.6: List of current levels used by circuits within the VCO test chip.

Table 5.7: Power dissipation of the VCO test chip with frequency control voltages set to 0 V. The percent error is the variation of the designed power dissipation from the average measured power dissipation.

Table 6.1: Simulated and measured RFTC output waveform frequency and peak-to-peak amplitude ranges generated by the on-chip VCO using the 97 and 99 models. The amplitudes are listed for signals at the highest and lowest frequencies. The percent error figures represent the error of the simulations with capacitive wire parasitics in comparison with the measured results.

Table 6.2: Simulated delay required for the next state of each read address counter bit to reach the output of a master latch with respect to the rising edge of the clock signal driving the master latch. The frequency specified is the inverse of the largest delay for a particular column, which is an estimate of the maximum clock frequency for the read address counter without a malfunction according to simulations.

Table 6.3: Simulated and measured maximum clock frequencies at which the read address counters will function properly. The listed percent error is the error of the simulated results including capacitive wire parasitics with respect to the measured results on each wafer.

Table 6.4: Simulated delay required to compute the next state value for each bit of the read address counters with respect to the rising edge of the clock signals driving the counters. The frequency specified is the inverse of the largest delay for a particular column, which represents an estimate of the maximum clock frequency for the read address counter without a malfunction according to simulations.

Table 6.5: Simulated delay required for the next state of each write address counter bit to reach the output of a master latch with respect to the rising edge of the clock signal driving the master latch. The frequency specified is the inverse of the largest delay for a particular column, which represents an estimate of the maximum clock frequency for the write address counter without a malfunction according to simulations.

Table 6.6: Simulated and measured maximum clock frequencies at which the write address counters will function correctly. The listed percent error is the error of the simulated results including capacitive wire parasitics with respect to the measured results.

Table 6.7: Simulated delay required to compute the next state value for each bit of the write address counter with respect to the rising edge of the clock signals driving the counter. The frequency specified is the inverse of the largest delay for a particular column, which represents an estimate of the maximum clock frequency for the write address counter without a malfunction according to simulations.

Table 6.8: Simulated propagation delay for the slave latches of the data rotator from the rising clock edge to when the new data appears at the latch output nodes. The frequency specified is the inverse of twice the largest delay for a particular column, which represents a theoretical limit of the clock frequency for the data rotator without a malfunction according to simulations.

Table 6.9: Simulated delay required for the next state of each data rotator bit to reach the output of a master latch with respect to the rising edge of the clock signal driving the master latch. The frequency specified is the inverse of the largest delay for a particular column, which is an estimate of the maximum clock frequency for the data rotator without a malfunction according to simulations.

Table 6.10: Simulated maximum clock frequencies at which the data rotator will function correctly.

Table 6.11: Simulated propagation delay for the slave latches of the data LFSR from the rising clock edge to when the new data appears at the latch output nodes. The frequency specified is the inverse of twice the largest delay for a particular column, which represents a theoretical limit of the clock frequency for the data LFSR without a malfunction according to simulations.

Table 6.12: Simulated delay required for the next state of each data LFSR bit to reach the output of a master latch with respect to the rising edge of the clock signal driving the master latch. The frequency specified is the inverse of the largest delay for a particular column, which is an estimate of the maximum clock frequency for the data LFSR without a malfunction according to simulations.

Table 6.13: Simulated and measured maximum clock frequencies at which the data LFSR will function correctly. The listed percent error is the error of the simulated results including capacitive wire parasitics with respect to the measured results.

Table 6.14: Simulated propagation delay from the multiplexer that selects between the external and on-chip clock to write enable input to the register file as well as the slowest write address input and data input to the register file on the test chips. Also listed are the available address and data setup times based on this data.

Table 6.15: Summary of delays between the master on-chip clock signal and the clock signals for the read address counters and sampling latches, as well as the corresponding skew between the read address clocks and sampling latch clocks for a particular port.

Table 7.1: Summary of simulated propagation delays that comprise the total read access time for both read port A and read port B of the wafer 1 register file.

Table 7.2: Summary of simulated propagation delays that comprise the total read access time for both read port A and read port B of the wafer 2 register file.

Table 7.3: Measured read access time for both port A and B of the register file on four wafer 1 die. Four columns were measured on each die, and the average and estimated standard deviation for the read access times are listed, as well as the minimum and maximum read access times.

Table 7.4: Measured read access time for both port A and B of the register file on four wafer 2 die. Four columns were measured on each die, and the average and estimated standard deviation for the read access times are listed, as well as the minimum and maximum read access times.

Table 7.5: Summary of average read access times for the register file based on both simulated and measured results. The listed percent error is the error of the simulated results including capacitive wire parasitics with respect to the measured results.

Table 7.6: Summary of simulated propagation delays that comprise the total write access propagation delay for the register file on both wafer 1 and wafer 2.

Table 7.7: Simulated parameters that comprise the register file write access time, and also determine the pipeline clock period for write operations, using optimistic assumptions.

Table 7.8: Simulated parameters that comprise the register file write access time, and also determine the pipeline clock period for write operations, using conservative assumptions.

Table 7.9: Measured minimum write enable pulse width and write access time for the register file on four wafer 2 die.

Table 7.10: Summary of average minimum write enable pulse widths and write access times for the register file based on both simulated and measured results. The listed percent error is the error of the simulated results including capacitive wire parasitics with respect to the measured results.

Table 7.11: Summary of the simulated propagation delays that comprise the total read after write access time for both read port A and read port B of the register file using optimistic assumptions for the address setup time.

Table 7.12: Summary of simulated propagation delays that comprise the total read after write access time for both read port A and read port B of the register file using conservative assumptions for the address setup time.

Table 7.13: Measured read after write access times for both port A and B of the register file on four wafer 2 die. The read after write access time was determined for the column with the slowest measured read access time on each die, which is also listed for the sake of comparison.

Table 7.14: Summary of the average read after write access times for the register file based on both simulated and measured results. The listed percent error is the error of the simulated results including capacitive wire parasitics with respect to the measured results.

Table 7.15: Current levels used in the wafer 1 register file circuits.

Table 7.16: Current levels used in the wafer 2 register file circuits.

Table 7.17: Current levels used by the RFTC circuits that test the wafer 1 register file.

Table 7.18: Current levels used by the RFTC circuits that test the wafer 2 register file.

Table 7.19: Applied voltage to wafer 1 and wafer 2 register file test chips during measurements along with the consumed current and resulting power dissipation.

Table 7.20: Power dissipation of the register file test chip by design and according to measurements. The percent error is the variation of the designed power dissipation from the average measured power dissipation.

Table 8.1: Summary of simulated propagation delays that comprise the total read access time for the BiCMOS register file. Simulations do not include wire parasitics.

Table 8.2: Write word line driver and memory cell write access propagation delays for the BiCMOS register file simulated without wire parasitics. Also included are equivalent propagation delays for the bipolar register file. The estimated write access propagation delay based on the change in these two parameters is also listed.

Table 8.3: Estimates of parameters that comprise the BiCMOS register file write access time using simulations without wire parasitics. These estimates assume the write address decoder can be connected directly to the write word line drivers. The listed percent increase is with respect to the simulation results determined for the bipolar register file.

Table 8.4: Summary of simulated propagation delays that comprise the total stage 1 and stage 2 read access propagation delays for the pipelined register file.

Table 8.5: Summary of simulated propagation delays that comprise the stage 1 and stage 2 write access propagation delays for the pipelined register file.

Table 8.6: Parameters that determine the pipelined register file clock period for write operations. The parameters were determined from simulations without wire parasitics.

Table 8.7: Summary of simulated propagation delays that comprise the stage 1 and stage 2 write access propagation delays for the pipelined register file. The stage 2 critical path in this case involves the input data.

Table 8.8: Parameters that determine the pipelined register file clock period for write operations. The parameters were determined from simulations without wire parasitics that include the input data critical path.

Table 8.9: Summary of the propagation delays that comprise the stage 1 and stage 2 read after write propagation delays for the pipelined register file based on simulations without wire parasitics.

 


LIST OF FIGURES

 

Figure 1.1: CML buffer circuit.

Figure 1.2: Schematics for a 2-input AND function implemented in CML.

Figure 1.3: 2-input OR function implemented using a CML current tree.

Figure 1.4: Schematics for a 2-input XOR function implemented using CML.

Figure 1.5: Schematics for a D-latch implemented in CML.

Figure 1.6: Level 2 and level 3 ECL buffers.

Figure 2.1: Register file block diagram.

Figure 2.2: Three-port memory cell schematics.

Figure 2.3: Bit line driver schematics.

Figure 2.4: A portion of the read address decoder schematics. Included is the stage one 2-bit decoder, a portion of the stage two decoder, and some of the read word line drivers.

Figure 2.5: Schematics for the stage one 2-bit decoder CML buffer driving wired-OR input devices.

Figure 2.6: Schematics for the stage one 3-bit decoder CML buffer driving wired-OR input devices.

Figure 2.7: Schematics for a 2-input single-ended ECL NOR gate used in the read address decoders.

Figure 2.8: Read word line driver schematics.

Figure 2.9: A portion of the write address decoder schematics. Included is the stage one 2-bit decoder, a portion of the stage two decoder, and some of the write word line drivers.

Figure 2.10: Write enable circuit schematics.

Figure 2.11: Schematics for an ECL NOR gate used in the second stage of the write address decoder.

Figure 2.12: Write word line driver schematics.

Figure 2.13: Schematics for a common-base sense amplifier.

Figure 2.14: Sense amplifier schematics.

Figure 2.15: Propagation delay of the memory cell and sense amplifier as a function of the value of each sense amplifier bias current source.

Figure 2.16: Steady state differential voltage across a pair of read bit lines as a function of the value of each sense amplifier bias current source.

Figure 2.17: Output latch schematics.

Figure 2.18: Gate level schematics for a comparator circuit to determine whether a read address matches the write address and a write is enabled.

Figure 2.19: Current source implementation.

Figure 2.20: Circuit used to estimate the current source output resistance.

Figure 2.21: Reference voltage generator schematics.

Figure 2.22: Memory cell layout (north facing left).

Figure 2.23: Register file layout.

Figure 3.1: Test chip block diagram.................................................................................... 56

Figure 3.2: Read address counter gate level schematics....................................................... 58

Figure 3.3: Write address counter gate level schematics...................................................... 59

Figure 3.4: Data rotator gate level schematics..................................................................... 60

Figure 3.5: Write enable pulse generator schematics........................................................... 61

Figure 3.6: Schematics for a pad receiver with level 2 outputs............................................. 64

Figure 3.7: Schmitt trigger receiver schematics.................................................................... 65

Figure 3.8: Pad driver schematics....................................................................................... 68

Figure 3.9: Voltage-controlled delay element (VCDE) schematics....................................... 69

Figure 3.10: VCO analog frequency select pad receiver schematics.................................... 70

Figure 3.11: Voltage-controlled oscillator (VCO) schematics.............................................. 71

Figure 3.12: Register file test chip layout............................................................................. 73

Figure 3.13: Register file test chip pad arrangement............................................................. 74

Figure 3.14: Register file test chip die micrograph............................................................... 75

Figure 3.15: Second register file test chip pad arrangement................................................. 77

Figure 3.16: 8-bit data rotator/6-bit data LFSR gate level schematics.................................. 80

Figure 3.17: Ring oscillator schematics............................................................................... 80

Figure 3.18: Ring oscillator test chip layout (north facing right)............................................. 81

Figure 3.19: Ring oscillator test chip pad arrangement......................................................... 82

Figure 3.20: Ring oscillator test chip die micrograph (north facing right)............................... 83

Figure 3.21: High frequency voltage-controlled oscillator (HFVCO) schematics.................. 83

Figure 3.22: Illustration of XOR operations on the HFVCO VCDE output waveforms to produce frequency multiplication..................................................................................................................................... 84

Figure 3.23: Balanced 2-input XOR gate schematics.......................................................... 86

Figure 3.24: Medium frequency VCO schematics............................................................... 88

Figure 3.25: VCO test chip layout (north facing right)......................................................... 89

Figure 3.26: VCO test chip pad arrangement...................................................................... 89

Figure 3.27: VCO test chip die micrograph (north facing right)............................................ 90

Figure 4.1: IC as a function of Vid for a current switch......................................................... 94

Figure 4.2: Buffer chain used to determine the propagation delay of an ECL or CML buffer. 96

Figure 4.3: Propagation delay through a differential CML buffer as a function of the steady state current under various loading conditions using the 97 models.......................................................................... 97

Figure 4.4: Propagation delay through a differential CML buffer as a function of the steady state current under various loading conditions using the 99 models.......................................................................... 98

Figure 4.5: Propagation delay through a level 2 differential ECL buffer as a function of the steady state emitter follower current under various loading conditions using the 97 models.......................................... 99

Figure 4.6: Propagation delay through a level 2 differential ECL buffer as a function of the steady state emitter follower current under various loading conditions using the 99 models........................................ 100

Figure 4.7: Propagation delay through a level 3 differential ECL buffer as a function of the steady state emitter follower current under various loading conditions using the 97 models........................................ 102

Figure 4.8: Propagation delay through a level 3 differential ECL buffer as a function of the steady state emitter follower current under various loading conditions using the 99 models........................................ 103

Figure 4.9: Propagation delay (tplh) through a static CMOS inverter as a function of the NFET gate width under various loading conditions using the 97 models........................................................................ 104

Figure 4.10: Propagation delay (tphl) through a static CMOS inverter as a function of the NFET gate width under various loading conditions using the 97 models........................................................................ 105

Figure 4.11: Propagation delay (tplh) through a static CMOS inverter as a function of the NFET gate width under various loading conditions using the 99 models........................................................................ 106

Figure 4.12: Propagation delay (tphl) through a static CMOS inverter as a function of the NFET gate width under various loading conditions using the 99 models........................................................................ 107

Figure 4.13: Level 2 buffer propagation delay as a function of the length of the wires the buffer must drive using the 97 models. The buffer also drives a single load buffer. Wire parasitics were computed using the analytical capacitance extraction method....................................................................................................... 112

Figure 4.14: Buffer propagation delay as a function of the length of metal 2 wire the buffer must drive using the 97 models. The buffer also drives a single load buffer. Wire parasitics are computed using the analytical capacitance extraction method....................................................................................................................... 115

Figure 4.15: Level 2 buffer propagation delay as a function of the length of the wires the buffer must drive using the 97 models. The buffer also drives a single load buffer. Wire parasitics were computed using the analytical RC extraction method....................................................................................................................... 116

Figure 4.16: Level 2 buffer propagation delay as a function of the length of the wires the buffer must drive using the 97 models. The buffer also drives a single load buffer. Wire parasitics were computed using the numerical RC and capacitance extraction methods................................................................................... 117

Figure 4.17: Buffer propagation delay as a function of the length of metal 2 wire the buffer must drive using the 97 models. The buffer also drives a single load buffer. Wire parasitics are computed using the analytical RC extraction method. 120

Figure 4.18: Buffer propagation delay as a function of the length of metal 2 wire the buffer must drive using the 97 models. The buffer also drives a single load buffer. Wire parasitics are computed using the numerical RC and capacitance extraction methods...................................................................................................... 121

Figure 5.1: Simulated waveforms showing the level 1 ring oscillator response at a pair of buffer output nodes within the ring and at the pad output.................................................................................................. 125

Figure 5.2: Simulated waveforms showing the level 2 ring oscillator response at a pair of buffer output nodes within the ring and at the pad output.................................................................................................. 125

Figure 5.3: Simulated waveforms showing the level 3 ring oscillator response at a pair of buffer output nodes within the ring and at the pad output.................................................................................................. 126

Figure 5.4: Measured waveforms from a ring oscillator test chip. They are the level 1, level 2, and level 3 ring oscillator output waveforms, respectively................................................................................... 126

Figure 5.5: Level 1 buffer propagation delay as a function of supply voltage using the 99 models.          129

Figure 5.6: Level 2 buffer propagation delay as a function of supply voltage using the 99 models.          130

Figure 5.7: Level 3 buffer propagation delay as a function of supply voltage using the 99 models.          131

Figure 5.8: HFVCOb simulation results without wire parasitics using the 99 models. The control voltage is 0.2 V.         134

Figure 5.9: HFVCOb simulation results using the 99 models and including capacitive wire parasitics. The control voltage is 0.2 V......................................................................................................................... 136

Figure 5.10: Simulated HFVCOb pad driver output waveforms with the frequency control voltage set to -1 V and 0.2 V, respectively, using the 99 models................................................................................. 137

Figure 5.11: Simulated HFVCOb pad driver output waveforms with the frequency control voltage set to -1 V and 0.9 V, respectively, using the 97 models................................................................................. 138

Figure 5.12: Measured HFVCOb pad driver output waveforms with the frequency control voltage set to -1 V and 1 V, respectively................................................................................................................ 139

Figure 5.13: HFVCOb output frequency as a function of the applied control voltage using the 97 models. Measured results for two VCOs are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics.................................................................................................................... 142

Figure 5.14: HFVCOb output frequency as a function of the applied control voltage using the 99 models. Measured results for two VCOs are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics.................................................................................................................... 143

Figure 5.15: Peak-to-peak amplitude of the HFVCOb output signal as a function of the applied control voltage using the 97 models. Measured results for two VCOs are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics................................................................................ 144

Figure 5.16: Peak-to-peak amplitude of the HFVCOb output signal as a function of the applied control voltage using the 99 models. Measured results for two VCOs are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics................................................................................ 145

Figure 5.17: HFVCOa output signal peak-to-peak amplitude as a function of the applied control voltage using the 97 models. Measured results for two VCOs are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics................................................................................ 146

Figure 5.18: HFVCOa output signal peak-to-peak amplitude as a function of the applied control voltage using the 99 models. Measured results for two VCOs are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics................................................................................ 147

Figure 5.19: Phase noise of HFVCOb as a function of the offset frequency from the carrier frequency, which is 12 GHz................................................................................................................................... 148

Figure 5.20: HFVCOb output signal frequency as a function of the supply voltage using the 97 models. Measured results for one VCO with a control voltage of 1 V are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics, all receiving a control voltage of 0.9 V..................... 150

Figure 5.21: HFVCOb output signal frequency as a function of the supply voltage using the 99 models. Measured results for one VCO with a control voltage of 1 V are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics, all receiving a control voltage of 0.2 V..................... 151

Figure 5.22: HFVCOb output signal peak-to-peak amplitude as a function of the supply voltage using the 97 models. Measured results for one VCO with a control voltage of 1 V are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics, each receiving a control voltage of 0.9 V.     152

Figure 5.23: HFVCOb output signal peak-to-peak amplitude as a function of the supply voltage using the 99 models. Measured results for one VCO with a control voltage of 1 V are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics, each receiving a control voltage of 0.2 V.     153

Figure 5.24: Results of the MFVCO simulations using the 99 models and including capacitive wire parasitics. The waveforms are two VCDE output pairs, the phase detector output pairs, and the pad driver output. The control voltage is 0.2 V...................................................................................................................... 154

Figure 5.25: Simulated MFVCO pad driver output waveforms using the 97 and 99 models with the control set to 0.9 V and 0.2 V, respectively............................................................................................... 155

Figure 5.26: Measured MFVCO pad driver output waveforms with the control set to –1.1 V and 1 V, respectively.     155

Figure 5.27: MFVCO output signal frequency as a function of the applied control voltage using the 97 models. Measured results for two VCOs are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics.................................................................................................................... 158

Figure 5.28: MFVCO output frequency as a function of the applied control voltage using the 99 models. Measured results for two VCOs are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics.................................................................................................................... 160

Figure 5.29: Peak-to-peak amplitude of the MFVCO output signal as a function of the applied control voltage using the 97 models. Measured results for two VCOs are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics................................................................................ 161

Figure 5.30: Peak-to-peak amplitude of the MFVCO output signal as a function of the applied control voltage using the 99 models. Measured results for two VCOs are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics................................................................................ 162

Figure 5.31: Phase noise of the MFVCO as a function of the offset frequency from the carrier frequency, which is 6.0 GHz................................................................................................................................... 163

Figure 5.32: MFVCO output signal frequency as a function of the supply voltage using the 97 models. Measured results for one VCO with a control voltage of 1 V are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics, all receiving a control voltage of 0.9 V..................................... 164

Figure 5.33: MFVCO output signal frequency as a function of the supply voltage using the 99 models. Measured results for one VCO with a control voltage of 1 V are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics, all receiving a control voltage of 0.2 V..................................... 165

Figure 5.34: MFVCO output signal peak-to-peak amplitude as a function of the supply voltage using the 97 models. Measured results for one VCO with a control voltage of 1 V are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics, each receiving a control voltage of 0.9 V.     166

Figure 5.35: MFVCO output signal peak-to-peak amplitude as a function of the supply voltage using the 99 models. Measured results for one VCO with a control voltage of 1 V are shown, as well as simulated results without wire parasitics, and with capacitive and RC wire parasitics, each receiving a control voltage of 0.2 V.     167

Figure 6.1: Results of RFTC VCO simulations using the 99 models and including capacitive wire parasitics. The frequency control voltage is 0.3 V............................................................................................... 172

Figure 6.2: Simulated RFTC pad driver output waveforms generated by the on-chip VCO using the 99 models. The on-chip VCO is simulated with the analog frequency control voltage set to -1 V and 0.3 V, respectively, and the digital frequency control voltage set high. Simulations include capacitive wire parasitics........... 173

Figure 6.3: Measured RFTC output pad driver waveforms generated by the on-chip VCO with the analog frequency control voltage set to –1.6 V and 1.3 V, respectively, and the digital frequency control voltage set high.       174

Figure 6.4: Simulated RFTC pad driver output waveforms generated by the on-chip VCO using the 99 models. The on-chip VCO is simulated with the analog frequency control voltage set to -1 V and 0.3 V, respectively, and with the digital frequency control voltage set low. Simulations include capacitive wire parasitics........... 175

Figure 6.5: Measured RFTC output pad driver waveforms generated by the on-chip VCO with the analog frequency control voltage set to –1.6 V and 1.3 V, respectively, and the digital frequency control voltage set low.        176

Figure 6.6: RFTC output signal frequency generated by the on-chip VCO as a function of the analog control voltage using the 97 models. Simulated results without wire parasitics, as well as with capacitive wire parasitics, are shown with the digital control voltage set high (fast) and with the digital control voltage set low (slow)... 179

Figure 6.7: RFTC output signal frequency generated by the on-chip VCO as a function of the analog control voltage using the 99 models. Simulated results without wire parasitics, as well as with capacitive wire parasitics, are shown with the digital control voltage set high (fast) and with the digital control voltage set low (slow)... 180

Figure 6.8: Results of RFTC clocking simulations with an external clock, using the 99 models and including capacitive wire parasitics. The voltage scale for the external clock waveform is 1 V/div........................ 182

Figure 6.9: RFTC read address counter A simulation results using the 99 models and including capacitive wire parasitics................................................................................................................................... 184

Figure 6.10: Measured RFTC output waveforms from wafer 2 with read address counters A and B selected for viewing, respectively................................................................................................................ 185

Figure 6.11: Measured RFTC output waveforms from wafer 2 with read address counters A and B selected for viewing, respectively................................................................................................................ 186

Figure 6.12: Measured RFTC output waveform from wafer 2 with read address counter A selected for viewing. The RFTC is clocked at a moderate frequency (1.4 GHz).................................................. 187

Figure 6.13: A measured RFTC output waveform from wafer 2 with read address counter B selected for viewing. At 5.41 GHz, the clock frequency is only 16 times greater than the most significant counter bit, which has a frequency of 338 MHz.......................................................................................................................... 187

Figure 6.14: RFTC read address counter A simulation results illustrating a counter malfunction. The simulation includes capacitive wire parasitics and uses the 99 models........................................................ 188

Figure 6.15: General timing diagram illustrating critical delays in a state machine that limit the maximum clock frequency that can be successfully applied.......................................................................................... 189

Figure 6.16: Simulated and measured maximum clock frequencies at which the read address counters will function properly as a function of the supply voltage............................................................................... 195

Figure 6.17: RFTC simulation results that illustrate shifting a bit through the read address counters in scan mode. The simulation includes capacitive wire parasitics and uses the 99 models............................ 198

Figure 6.18: Measured RFTC output waveforms from wafer 1 with read address counter B selected for viewing. The RFTC is in scan mode and a waveform approximately half the frequency of the scan clock is driving the Shift In pad................................................................................................................................... 199

Figure 6.19: RFTC write address counter simulation results using the 99 models and including capacitive wire parasitics extracted from the wafer 1 layout................................................................................ 201

Figure 6.20: Measured RFTC output waveforms from wafer 1 and wafer 2 with the write address counter selected for viewing. The wafer 1 waveform is erroneous in that its frequency is double the expected value while the wafer 2 waveform appears to be correct.................................................................................. 202

Figure 6.21: RFTC write address counter simulation results illustrating a counter malfunction. The simulation includes capacitive wire parasitics extracted from the wafer 2 layout and uses the 99 models..... 204

Figure 6.22: Simulated and measured maximum clock frequencies at which the wafer 2 write address counter functions properly as a function of the supply voltage.................................................................. 211

Figure 6.23: RFTC simulation results that illustrate shifting a bit through the write address counter and data rotator in scan mode. The simulation includes capacitive wire parasitics and uses the 99 models.......... 212

Figure 6.24: Measured RFTC output waveforms from wafer 1 with the write address counter selected for viewing. The RFTC is in scan mode and a waveform approximately half the frequency of the scan clock is driving the Shift In pad................................................................................................................................... 213

Figure 6.25: RFTC data rotator simulation results using the 99 models and including capacitive wire parasitics extracted from the wafer 1 layout............................................................................................... 216

Figure 6.26: RFTC data rotator simulation results illustrating a rotator malfunction. The simulation includes capacitive wire parasitics and uses the 99 models................................................................................ 218

Figure 6.27: Simulated maximum clock frequencies at which the data rotator will function properly as a function of the supply voltage............................................................................................................. 223

Figure 6.28: Measured RFTC output waveforms from wafer 1 with the data rotator selected for viewing. The RFTC is in scan mode and a waveform approximately half the frequency of the scan clock is driving the Shift In pad.    224

Figure 6.29: RFTC data LFSR simulation results using the 99 models and including capacitive wire parasitics extracted from the wafer 2 layout............................................................................................... 227

Figure 6.30: Measured RFTC output waveforms from wafer 2 with the data LFSR selected for viewing.          228

Figure 6.31: RFTC data LFSR simulation results illustrating an LFSR malfunction. The simulation uses the 99 models and includes capacitive wire parasitics extracted from the wafer 2 layout............................. 229

Figure 6.32: A measured RFTC output waveform from wafer 2 with the data LFSR selected for viewing. The waveform exhibits signs that the clock frequency is too high for correct operation of the LFSR..... 230

Figure 6.33: Simulated and measured maximum clock frequencies at which the data LFSR will function properly as a function of the supply voltage...................................................................................... 236

Figure 6.34: RFTC simulation results that illustrate shifting a bit through the write address counter and data LFSR in scan mode. The simulations include capacitive wire parasitics and use the 99 models............ 237

Figure 7.1: Wafer 2 register file read port A read access simulation results using the 97 models and including capacitive wire parasitics. The scale for the word line currents is 50 mA/div while the scale for the bit line currents is 1 mA/div................................................................................................................................... 246

Figure 7.2: Wafer 1 RFTC output signals with register file read ports A and B selected for viewing. The shortest pulse widths for the port A signal are 600 ps, indicating a 300 ps read access time for this column. The shortest pulse widths for the port B signal are 660 ps, indicating a 330 ps read access time for this column.... 248

Figure 7.3: Wafer 2 RFTC output signal with register file read port A selected for viewing. The left-hand figure indicates a correctly read 32-bit pattern from one of the columns. The shortest pulse widths are 580 ps, indicating a 290 ps read access time for the selected column. The right-hand figure demonstrates errors in reading out the 32-bit pattern from the same column using a slightly higher clock frequency..................................................... 249

Figure 7.4: Wafer 2 RFTC output signal with register file read port B selected for viewing shown with different time scales. The shortest pulse widths are 630 ps, indicating a 320 ps read access time for the selected column. 250

Figure 7.5: Simulated and measured wafer 1 register file port A column read access times as a function of the supply voltage....................................................................................................................... 259

Figure 7.6: Simulated and measured wafer 2 register file port A column read access times as a function of the supply voltage....................................................................................................................... 260

Figure 7.7: Simulated and measured wafer 1 register file port B column read access times as a function of the supply voltage....................................................................................................................... 261

Figure 7.8: Simulated and measured wafer 2 register file port B column read access times as a function of the supply voltage....................................................................................................................... 262

Figure 7.9: Wafer 2 register file write access simulation results using the 97 models and including capacitive wire parasitics. The scale for the word line currents is 50 mA/div......................................................... 264

Figure 7.10: Timing diagram for an optimistic register file write access. Values were determined through simulations using the 97 models and including capacitive wire parasitics.................................................. 269

Figure 7.11: Timing diagram for a conservative register file write access. Values were determined through simulations using the 97 models and including capacitive wire parasitics.................................................. 271

Figure 7.12: Wafer 2 register file read port A read after write access simulation results using the 97 models and including capacitive wire parasitics............................................................................................. 280

Figure 7.13: Wafer 2 RFTC output signals during a read after write operation. The left-hand figure is a 63-bit pattern read out of one of the columns through read port A while in write mode. The right-hand figure is the match signal for port A, indicating that read after write operations are occurring. The shortest pulse widths for the left-hand signal are 680 ps, indicating a 340 ps read after write access time for the selected column........................ 282

Figure 7.14: Wafer 2 RFTC output signal with register file read port B selected for viewing. The left-hand figure illustrates a correctly read 63-bit pattern from one of the columns. The shortest pulse widths are 740 ps, indicating a 370 ps read after write access time for the selected column. The right-hand figure illustrates errors in reading out the 63-bit pattern from the same column using a slightly higher clock frequency........................................ 283

Figure 8.1: Timing diagram illustrating a potential problem in accurately sampling the register file output data to determine the read access time using the RFTC.

Figure 8.2: Wafer 1 RFTC output signal with register file read port A selected for viewing. The left-hand figure illustrates a correctly read 32-bit pattern from one of the columns. The shortest pulse widths are 190 ps, which would seem to indicate a 96 ps read access time for the selected column. The right-hand figure illustrates errors in reading out the 32-bit pattern from the same column using a slightly higher clock frequency.

Figure 8.3: Wafer 1 RFTC output signal with register file read port A selected for viewing. The left-hand figure illustrates a correctly read 32-bit pattern from one of the columns. The shortest pulse widths are 660 ps, which indicates a 330 ps read access time for the selected column. The right-hand figure illustrates errors in reading out the 32-bit pattern from the same column using a slightly higher clock frequency.

Figure 8.4: CMOS single-port static memory cell schematics.

Figure 8.5: CMOS three-port static memory cell schematics.

Figure 8.6: Schematics for a 2-input single-ended ECL NOR gate used in the read address decoders of a BiCMOS register file.

Figure 8.7: Diagram illustrating a two-stage pipelining scheme for the register file.

Figure 8.8: Schematics for a latch with a 2-input single-ended CML NOR input. This latch is the master of a master-slave latch for the second stage of either address decoder of the pipelined register file.

Figure 8.9: Schematics for a latch with a single-ended level 3 output. This latch is used as the slave in a master-slave latch for the second stage of the read address decoder for the pipelined register file.

Figure 8.10: Schematics for a latch with a differential level 3 output. This latch is used as the slave in a master-slave latch for the second stage of the write address decoder for the pipelined register file.

Figure 8.11: Timing diagram illustrating operation of the pipelined register file.

Figure 8.12: Relationship between the input data and pipeline clock for a write operation with optimistic assumptions using the pipelined register file. Results are simulated using the 99 models without wire parasitics.

Figure 8.13: Relationship between the input data and pipeline clock for a write operation with conservative assumptions using the pipelined register file. Results are simulated using the 99 models without wire parasitics.

Figure A.1: Schematics for a 2-input AND function implemented in CML. QD is used to lower the voltage across Q2b when it is cut off to prevent breakdown of the device.

Figure A.2: Schematics for a 2-to-1 multiplexer implemented in CML.

Figure A.3: Schematics for a 3-input AND function implemented in CML.

Figure A.4: Schematics for a D-latch with a 2-to-1 multiplexer to select the D-input implemented in CML.

Figure A.5: Schematics for a 2-to-1 multiplexer with a 2-input XOR function at one of the multiplexer inputs. The circuit is implemented in CML.

Figure A.6: Schematics for a 2-input XOR function with a 2-input AND function at one of the XOR function inputs. The circuit is implemented in CML.

Figure A.7: 4-to-1 multiplexer implemented using a CML current tree.

Figure B.1: Sense amplifier layout (north facing right).

Figure B.2: Write bit line driver layout (north facing left).

Figure B.3: Output master latch layout (north facing right).

Figure B.4: Output slave latch layout (north facing right).

Figure B.5: CML buffer layout (north facing right).

Figure B.6: Stage one 2-bit decoder CML buffer.

Figure B.7: Stage one 3-bit decoder CML buffer layout (north facing right).

Figure B.8: Second stage of the read address decoder and read word line driver layout for the A and B read ports.

Figure B.9: Write enable circuit layout.

Figure B.10: Second stage of the write address decoder and write word line driver layout.

Figure B.11: Typical layout for a reference voltage generator for a 1 mA current source.

Figure B.12: Single-ended signal reference voltage generator layout.

Figure B.13: ECL buffer layout.

Figure B.14: Layout for a 2-input AND function implemented in ECL.

Figure B.15: Layout for a 2-input XOR function implemented in ECL.

Figure B.16: Layout for a D-latch with level 1, level 2, and level 3 outputs implemented in ECL.

Figure B.17: Voltage-controlled delay element (VCDE) layout.

Figure B.18: Balanced 2-input XOR gate layout.

Figure B.19: Layout for a 2-input AND function implemented in CML.

Figure B.20: Layout for a 2-to-1 multiplexer implemented in CML.

Figure B.21: Layout for a 3-input AND function with level-3 outputs implemented in ECL.

Figure B.22: Layout for a D-latch with a 2-to-1 multiplexer to select the D-input implemented in CML.

Figure B.23: Layout for a 2-to-1 multiplexer with a 2-input XOR function at one of the multiplexer inputs. The circuit is implemented in CML.

Figure B.24: Layout for a 2-input XOR function with a 2-input AND function at one of the XOR function inputs. The circuit is implemented in CML.

Figure B.25: Layout for a 4-to-1 multiplexer with level-2 outputs implemented in ECL.

Figure B.26: Layout for a pad receiver with level 2 outputs.

Figure B.27: Schmitt trigger receiver layout.

Figure B.28: Pad driver layout.

Figure B.29: VCO analog frequency select pad receiver schematics.

 


ACKNOWLEDGEMENTS

 

            I would like to thank the Defense Advanced Research Projects Agency (DARPA) for sponsoring my research by providing space on two fabrication runs of a SiGe HBT BiCMOS technology from International Business Machines (IBM). I would also like to thank DARPA, Rensselaer Polytechnic Institute (RPI), Intel Foundation, IBM, and Sierra Monolithics, Inc. for their support of my education in the form of research assistantships, teaching assistantships, and fellowships. I would also like to thank my thesis advisor, John F. McDonald, the other members of my doctoral committee, Kenneth Rose, Gary J. Saulnier, Toh-Ming Lu, and Eugene J. Rymaszewski, my research colleagues, Hans Greub, Robert Philhower, Kyung-Suc Nah, James Loy, Xin Zhang, John Van Etten, Clifford Maier, Atul Garg, Peter Campbell, Steven Carlough, Matthew Ernest, Thomas Krawczyk, Liyong Wang, and Peter Curran, and the faculty and staff of the Electrical, Computer, and Systems Engineering Department for their support and assistance. Finally, I would like to thank my family and friends for their support throughout my stay here at RPI. This work is dedicated to the memory of my brother, Joseph R. Steidl.


ABSTRACT

 

            A 32-word by 32-bit bipolar register file with 2 read ports and 1 write port is described. This register file was implemented using a SiGe heterojunction bipolar transistor (HBT) BiCMOS technology. This technology supports an HBT with an fT of 48 GHz and an fmax of 69 GHz. A test chip was designed to determine the on-chip register file performance in a pipelined system. Two iterations of the design were fabricated. The two 5-bit counters on the test chip used to generate read addresses operated using average maximum clock frequencies of 4.3 GHz and 5.1 GHz on the first iteration, and 5.8 GHz and 5.0 GHz on the second iteration. The 5-bit counter on the test chip used to generate write addresses operated using an average maximum clock frequency of 4.0 GHz on the second iteration, but did not operate correctly on the first iteration. The 6-bit linear feedback shift register (LFSR) on the test chip used to generate input data operated using an average maximum clock frequency of 5.5 GHz on the second iteration.

            The best measured die on the first iteration has a read access time of 330 ps for port A, while the port B read access time is 350 ps, based on four measured columns for each port. The write access time for this register file is unknown, while the estimated power dissipation is 6.8 W using a 5 V supply. Some of the column read access times were much lower than the worst case column access time for a particular die, however, such as the 280 ps read access time on one of the columns of the best measured die on the first iteration. The best measured die on the second iteration has a read access time of 350 ps for port A, while the port B read access time is 360 ps, again based on four measured columns for each port. This die has a read after write access time of 320 ps for port A, and a read after write access time of 350 ps for port B. The write access time for this register file is 340 ps, with a write enable pulse width of 170 ps, while the estimated power dissipation is 4.7 W using a 4.5 V supply. Some of the column read access times were also much lower than the worst case column access time for a particular die on the second iteration, such as the 290 ps read access time on one of the columns of another measured die on the second iteration.

            For the average maximum clock frequency of the read address counters, the simulation results varied between 13% and 35% of the measured results on the first iteration, while on the second iteration, the simulation results varied between 0.3% and 25% of the measured results, depending on the counter and which models were used. For the average maximum clock frequency of the write address counter, the simulation results varied between 44% and 57% of the measured results on the second iteration, depending on which models were used. For the average maximum clock frequency of the data LFSR, the simulation results varied between 52% and 72% of the measured results on the second iteration, again depending on which models were used. For the average read access times, simulation results on the first iteration of the register file were within 8% of the measured results for both ports, while on the second iteration, the simulation results were within 13% of the measured results for both ports. For the average read after write access times, simulation results on the second iteration of the register file varied between 16% and 57% of the measured results, depending on which models and assumptions were used. For the average write access times, simulation results on the second iteration of the register file varied between 6% and 62% of the measured results, also depending on which models and assumptions were used. For the average minimum write enable pulse width, simulation results for the second iteration of the register file varied between 24% and 69% of the measured results, again depending on which models and assumptions were used.

 


 

Chapter 1   Introduction and Historical Review

 

Memory design is an important aspect in the creation of complex digital circuits. Memory circuits provide a means of storing data that is important in the flow of digital computations. This data includes input values, intermediate results, final results, state information, as well as many other forms of data important for specific digital calculations. Large memory circuits provide a means of storing data in a much smaller area than a number of circuits designed to store single bits of data. A register file is a memory structure that serves to provide storage for the registers described in the architecture of a particular computer processor. These registers store results of arithmetic and logical computations, memory addresses, and other state information necessary for the operation of the programs running on the processor. Hence, a register file is a type of scratch pad for information that is needed by the processor to execute a particular series of instructions. To prevent the register file from limiting the performance of the processor, the register file must be large and yet be able to access data as quickly as possible. Since the access time of a memory structure tends to increase with its size, these two goals are diametrically opposed. Therefore, designing a memory structure that is both large and has a fast access time to improve computer processor performance is a challenge that faces designers.

1.1        Introduction

            A register file is a memory structure that stores the values of the registers used in a computer processor. These registers serve as a temporary storage location for data that is being processed by the processor. This data includes intermediate and final results computed by the processor, data from the cache being used in these computations, memory addresses of data being accessed by the processor, and data for specialized hardware operations specific to certain registers. Use of these registers is fundamental to achieving high performance from a processor because the registers can be accessed in a very short, well-defined period of time. Access to main memory, on the other hand, tends to be much slower and more unpredictable, depending on how many levels of cache need to be accessed to obtain or store the desired data. Therefore, data is typically loaded into registers from main memory, computations are performed on this data with the intermediate and then final results written into the register file, and finally the results are stored in main memory. Being able to access data from the register file at fast rates is critical in designing a processor with good performance. Depending on the design, the processor speed may even be limited by the register file access times.

            Typically, a register file is designed as a static random access memory (SRAM). An SRAM is advantageous over a dynamic random access memory (DRAM) in that the SRAM access time is typically faster than that of the DRAM. This is because active devices are used to set the bit lines in an SRAM design, while the bit lines are set using a stored charge on a capacitor in a DRAM design. In addition, an SRAM memory cell is designed to store a value indefinitely, while charge leaks from the storage capacitor of a DRAM memory cell, which, therefore, needs to be periodically refreshed. The refresh cycle may impact the processor’s ability to access data when required, which would complicate the processor design and slow down its performance.

            The goal of this research project is to design a register file for use in simple reduced instruction set computing (RISC) processors operating at high frequencies, and other high performance systems. Even in the simplest of RISC processors, arithmetic operations typically require three register file accesses. Two of these accesses are required to read the operands for a calculation from the register file, while the third access is required to write the results back into the register file. One would like to fetch both operands simultaneously to minimize the cycle time. Also, since RISC processors are typically pipelined to improve performance, one would like to write the result of a previous instruction into the register file simultaneously with the access of the operands for a more recent instruction. In this way, one avoids delays associated with waiting for the write operation to complete before the read accesses can occur. Therefore, it was determined that the register file should have two read ports and one write port that operate simultaneously. The register file was chosen to be 32 bits wide to operate within a 32-bit processor. A depth of 32 words was decided for the register file since this is typical for current 32-bit RISC processors [1]. An SRAM design was chosen to provide higher performance and avoid the necessity of refreshing the memory.
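            The port arrangement described above can be illustrated with a small behavioral model. The following Python sketch is not taken from the thesis: the class name, method name, and the zero initial contents are hypothetical, and it assumes that both reads sample the contents held before the write of the same cycle takes effect. The behavior of the actual hardware when a read address matches the write address is not modeled here.

```python
class ThreePortRegisterFile:
    """Behavioral sketch of a register file with two read ports and one write port."""

    def __init__(self, words=32, bits=32):
        self.mask = (1 << bits) - 1      # keep stored values within the word width
        self.cells = [0] * words         # all registers start at zero (assumption)

    def cycle(self, read_a, read_b, write_addr=None, write_data=0):
        """Perform both reads and an optional write in one cycle."""
        # Assumption: reads sample the contents held before this cycle's write.
        out_a = self.cells[read_a]
        out_b = self.cells[read_b]
        if write_addr is not None:
            self.cells[write_addr] = write_data & self.mask
        return out_a, out_b


rf = ThreePortRegisterFile()
rf.cycle(0, 0, write_addr=1, write_data=7)   # load register 1
rf.cycle(0, 0, write_addr=2, write_data=5)   # load register 2
# Fetch both operands while the result of an earlier instruction is written to register 3.
a, b = rf.cycle(1, 2, write_addr=3, write_data=12)
r3, _ = rf.cycle(3, 0)
print(a, b, r3)
```

            In this sketch a single call to cycle() stands for one pipeline cycle, so the operand fetch for a newer instruction and the write-back of an older one never wait on each other.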

1.2        Technology

            The register file was implemented using a SiGe heterojunction bipolar transistor (HBT) BiCMOS technology [2]-[4]. The HBT in this technology was reported to have a unity gain frequency (fT) of 48 GHz and a unity power gain frequency (fmax) of 69 GHz [2]. Reported stage delays of ECL ring oscillators using the SiGe HBT reached as low as 17.5 ps with a switch current of 2.2 mA [2]. An average yield of 78.1% was reported on a yield array of 4000 HBT’s [3].

            The primary reason for choosing an HBT technology over a standard bipolar technology is that the emitter-base heterojunction in the HBT is designed to improve the performance of the HBT. This is because the two materials forming the emitter-base heterojunction have different bandgaps, indicating that the two materials have different intrinsic carrier concentrations as well. Therefore, assuming the dopant concentrations in the base and emitter are similar to those of a standard bipolar device, and also assuming the densities of states in the conduction and valence bands of the two materials forming the heterojunction are similar, the device current gain (β) of the HBT will increase as an exponential function of the difference in the bandgaps of the two semiconductor materials forming the heterojunction [5]. Depending on how much the two material bandgaps differ, however, the current gain of the HBT may become limited by the base transport factor [6]. For this as well as other reasons, it is usually prudent to trade off the improvement in device beta for other performance improvements by increasing the dopant level in the base of an HBT (often such that it is greater than the dopant level in the emitter). Even under these conditions, the HBT still exhibits a reasonably high current gain. However, the increase in the base dopant levels decreases the base resistance and increases the Early voltage of the HBT. This results in faster device switching speeds and less current variation as a function of the applied base and collector voltages.
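            The exponential dependence of the current gain on the bandgap difference can be written compactly. The expression below is the commonly quoted textbook form and is not reproduced from the thesis. It assumes equal doping and equal densities of states in the two devices being compared, and the symbols ΔE_g (bandgap difference between emitter and base), k (Boltzmann constant), and T (absolute temperature) are introduced here for illustration.

```latex
\[
  \frac{\beta_{\mathrm{HBT}}}{\beta_{\mathrm{BJT}}} \approx \exp\!\left(\frac{\Delta E_g}{kT}\right)
\]
```

            As a rough illustration, at room temperature kT is about 0.0259 eV, so a hypothetical bandgap difference of 0.1 eV would correspond to a gain improvement factor of roughly 48 under these idealized assumptions.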

            The processing steps to fabricate a SiGe HBT used by IBM are summarized in [4]. The SiGe HBT collector is made by first forming an n+ subcollector on a p+ substrate with a p- epitaxial layer, and then growing an n- epitaxial layer. A SiGe alloy epitaxial layer is grown using a UHV/CVD process step to form the base of the HBT. The extrinsic base region is implanted in a manner such that it is self aligned to the intrinsic base region. A polysilicon layer is then deposited and doped to form the emitter. The SiGe layer in the HBT is graded, resulting in a reduction in the bandgap of the base from emitter to collector. This is advantageous in that it creates an electric field in the base that accelerates the electrons, reducing the base transit time. In this way the switching performance of the SiGe HBT is improved.

            The CMOS devices provided in the IBM technology feature a minimum drawn gate length of 0.5 µm, resulting in an effective gate length of 0.35 µm [2]. The devices are fabricated on a p-substrate, using n-wells for PMOS devices. They are formed using advanced techniques such as the incorporation of nitride sidewalls and Ti-salicide over polysilicon gates to improve device performance [4]. The reported CMOS ring oscillator stage delay was 95 ps with a 3.3 V supply [2].

L [µm]    W [µm]    ID [mA]    VDS [V]    fT [GHz]    fmax [GHz]    Ref.
0.14      20x10     6          -          40          -             8
0.25      5x10      9          2.5        40          38            9
0.2       100       125        2          23          46            10
0.5       -         -          2.5        25          66            11
0.5       100       7          2.5        25          60            12
0.75      10x24     10         1          12.9        30            13

Table 1.1: Performance summary of some current Si CMOS technologies.

            To justify the choice of the IBM SiGe HBT BiCMOS technology for use in RISC processor design as well as in other high speed digital systems, it is necessary to compare this technology with current Si CMOS technologies. This is because CMOS technologies are used in the vast majority of these types of applications. The performance of several modern Si CMOS technologies [7]-[13] is listed in Table 1.1. The measured fT of the chosen SiGe HBT is 20% higher than the highest measured fT of any of the listed CMOS processes, and the measured fmax of the SiGe HBT is 4.5% higher than the highest measured fmax of any of the listed CMOS technologies. Therefore, despite the fact that the chosen SiGe HBT BiCMOS process was developed before most of the listed CMOS processes, the figures of merit indicate that the chosen SiGe HBT is competitive in terms of switching performance with all the listed CMOS technologies.
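            The two percentages quoted above follow directly from the values in Table 1.1 and the reported HBT figures of merit. The short Python sketch below reproduces the arithmetic; the variable and function names are hypothetical, and table entries marked "-" are simply omitted from the lists.

```python
# Figures of merit in GHz: HBT values from [2], CMOS values from Table 1.1.
HBT_FT, HBT_FMAX = 48.0, 69.0
CMOS_FT = [40.0, 40.0, 23.0, 25.0, 25.0, 12.9]
CMOS_FMAX = [38.0, 46.0, 66.0, 60.0, 30.0]   # the entry marked "-" is omitted


def margin_percent(value, reference):
    """Percentage by which value exceeds reference."""
    return 100.0 * (value - reference) / reference


ft_margin = margin_percent(HBT_FT, max(CMOS_FT))
fmax_margin = margin_percent(HBT_FMAX, max(CMOS_FMAX))
print(f"fT margin:   {ft_margin:.1f}%")
print(f"fmax margin: {fmax_margin:.1f}%")
```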

Device    L [µm]    W [µm]    ID [mA]    VDS [V]    fT [GHz]    fmax [GHz]    Ref.
HEMT      0.03      100       44         0.9        350         -             14
HEMT      0.1       2x50      -          -          195         -             15
HEMT      0.8       100       55         3          40          80            16
MESFET    0.06      150       41         1.1        168         -             17
MESFET    0.11      100       23         1.5        123         -             18
MESFET    0.15      150       -          1.5        72          -             19

Table 1.2: Performance summary of some current InP HEMT and GaAs MESFET technologies.

            It is also beneficial to compare the performance of the chosen SiGe HBT BiCMOS technology with FET technologies other than MOS. The fT and/or fmax of some modern InP HEMT and GaAs MESFET processes [14]-[19], listed in Table 1.2, illustrate that the performance of modern compound semiconductor devices typically exceeds the performance of the HBT’s in the chosen SiGe technology. However, there are many disadvantages with compound semiconductor technologies. The main disadvantage is that the material properties of these compounds make processing more difficult [20], typically resulting in lower device yield. Although yield numbers are not provided in [14]-[19], it is reasonable to assume that the HEMT yields found in the InP technologies are far lower than the HBT yields found in the chosen SiGe technology. Also, the MESFET yields found in the GaAs technologies are at best equal to the HBT yields found in the chosen SiGe process. The chosen SiGe HBT BiCMOS technology provides Si MOSFET yields of over 1 million devices [21], however, allowing the design of much more complex systems using these devices to implement the less speed critical components. A secondary disadvantage of compound semiconductor processes involves manufacturing costs. This is due in part to the fact that these processes are typically produced on 3-in or 4-in wafers [17]-[20], while the chosen SiGe process is produced on 8-in wafers [3]. Also, epitaxial layers in the InP HEMT’s are often grown using MBE or VPE processes [15],[16], while the SiGe layer in the chosen SiGe process is grown using a UHV/CVD process [3].

Tech.    L [µm]    W [µm]    IC [mA/µm²]    VCE [V]    fT [GHz]    fmax [GHz]    Ref.
SiGe     0.5       2.5       1.6            -          48          69            2
SiGe     0.35      3.55      4.83           -          130         -             22
SiGe     0.14      1.5       7.14           1          92          108           23
SiGe     1         1x4       1.25           2          50          90            24
SiGe     0.5       10        1.8            2.5        46          71            25
SiGe     0.2       1.7       0.53           1          40          70            26
Si       0.28      10        2.14           3          51          60            27
Si       0.4       1.2       1.25           2.5        52          33            9
Si       0.2       0.7       2.14           1.0        40          -             28
Si       0.2       1.0       -              -          25          -             29
Si       0.35      6         0.9            3          24          50            30
InP      2         10        2              -          235         -             31
InP      1         3         2.7            -          215         195           32
InP      0.8       8         2              1.25       165         140           33
InP      1.5       30x4      0.6            2          88          140           34
GaAs     1.6       4.6       1.63           1.25, 3    102         224           35
GaAs     2         2         0.6            -          65          75            36

Table 1.3: Performance summary of some current bipolar and BiCMOS technologies.

            It is also beneficial to compare the performance of the chosen SiGe HBT BiCMOS technology with other HBT technologies. The performance of some modern HBT technologies [2],[22]-[36] is listed in Table 1.3. From this data, one finds that the HBT performance of the chosen SiGe technology matches or exceeds the performance of the BJT’s in the Si technologies. This is expected due to the bipolar device performance enhancements that are possible with an HBT structure, as discussed above. The HBT performance of the chosen SiGe technology does not exceed the HBT performance of any of the listed compound semiconductor technologies. These compound semiconductor technologies share the same drawbacks as the compound semiconductor FET technologies, however. When compared with other SiGe technologies, the chosen SiGe technology is also not the best technology in terms of performance. However, only one of the listed SiGe technologies other than the chosen SiGe technology is a BiCMOS technology [25], and the performance of its HBT’s is comparable to that of the HBT’s in the chosen SiGe technology. This does not mean that it is impossible to incorporate CMOS devices into the other SiGe technologies, but the HBT performance in these technologies may suffer as a result. Finally, it must be conceded that not all the technologies listed in Table 1.1 through Table 1.3 are available for commercial use. This limited the technology choices for use in implementing the register file.

1.3        Historical review of memory designs

            Because of the importance of memory circuits in digital applications, research has been conducted on numerous memory designs of various sizes and access capabilities, using a number of different technologies. The results of these research projects can be compared to determine the effect that these factors have on the performance of a memory circuit. Since the register file design described in this document is a type of SRAM, the comparisons among previous research efforts are restricted to SRAM designs. The key parameters of several SRAM designs [37]-[52] are listed in Table 1.4.

            One of the most competitive designs found, implemented in a Si BiCMOS technology [37], had a reported access time of 0.3 ns. To maximize performance, only bipolar devices were used in the read access critical path. This explains the disparity between the read and write access times for this design. A slow write access time with respect to the read access time for the register file reported in this document is not desirable since the write access time will then limit the speed of the processor for which it was designed. Significantly slower performance than in the aforementioned SRAM design was reported in other listed Si BiCMOS designs, as Table 1.4 illustrates. Some of these designs, however, provide much greater storage capacity. The design most similar in terms of functionality to the proposed register file is a Si BiCMOS 32-word by 32-bit three-port register file [41],[42]. Although this design uses a technology with feature sizes similar to that of the SiGe HBT BiCMOS technology used in the register file design described here (0.6 µm gate length for the MOS devices), the register file design described here achieves faster access times (see Chapter 7). The Si bipolar designs listed in Table 1.4 have access times on the same order of magnitude as the Si BiCMOS designs. These SRAM’s were designed using older technologies, however, and some have a larger storage capacity as well, which explains in part why they were unable to outperform the BiCMOS designs.

Technology            Size          Ports         Read time          Write time    Ref.
Si BiCMOS             32 x 20 b     1 ro, 1 wo    0.3 ns             < 1 ns        37
Si BiCMOS             8K x 9 b      1 rw          0.65 ns            0.80 ns       38
Si BiCMOS             2K x 32 b     1 rw          0.9 ns             -             39
Si BiCMOS             2K x 16 b     1 rw          1.0 ns             3 ns          40
Si BiCMOS             32 x 32 b     2 ro, 1 wo    1.3 ns             < 1.0 ns      41, 42
AlGaAs/GaAs HBT       32 x 8 b      1 rw          0.22 ns            -             43, 44
Josephson junction    256 x 4 b     1 rw          0.5-0.52 ns        -             45
Si CMOS               16 x 64 b     2 ro, 1 wo    0.64 ns            < 1.6 ns      46
GaAs MESFET           32 x 32 b     1 rw          1.0-2.3 ns         1.0 ns        47
GaAs HEMT             8K x 8 b      1 rw          1.2 ns             < 1 ns        48
Si Bipolar            256 x 4 b     1 rw          0.85 ns            1 ns          49
Si Bipolar            512 x 10 b    1 rw          0.85 ns            -             50, 51
Si Bipolar            1K x 16 b     1 rw, 1 ro    1.2 ns, 0.95 ns    2.7 ns        52

Table 1.4: Performance summary of some current SRAM designs. The write access time is typically reported as the time the write enable pulse must be asserted.

            Si CMOS is another technology used to implement a high performance register file, as illustrated in the design of a 16-word by 64-bit three-port register file [46] summarized in Table 1.4. This design has an access time that rivals most of the BiCMOS designs listed in Table 1.4, although many of the BiCMOS designs have a much higher storage capacity. Yet other designs were implemented using compound semiconductor technologies, such as an AlGaAs/GaAs HBT 32-word by 8-bit register file [43],[44]. This design has a measured read access time of 0.22 ns, which is the fastest reported access time found. However, the AlGaAs/GaAs register file has a very small storage capacity (one fourth that of the register file described here), and only one port. Other GaAs technologies used to implement SRAM’s include a MESFET technology [47] and a HEMT technology [48]. Although the access times of the listed HEMT and MESFET SRAM’s are comparable, the storage capacity of the HEMT SRAM is much larger than that of the MESFET SRAM. The access times of both SRAM’s, however, are not especially fast in comparison with access times achieved using other technologies. A Josephson junction technology was also used to implement a 256-word by 4-bit SRAM with a read access time of 0.5 ns [45]. Although the access time of this SRAM was significantly faster than that of any other SRAM reported at the time, subsequent developments in other semiconductor technologies have allowed the design of SRAM’s with similar read access times.
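            The storage capacity comparisons in this section can be checked with a few lines of arithmetic on the sizes listed in Table 1.4. The Python sketch below is illustrative only, and its function and variable names are hypothetical.

```python
def capacity_bits(words, bits_per_word):
    """Total storage of a memory organized as words x bits_per_word."""
    return words * bits_per_word


this_work = capacity_bits(32, 32)          # register file described here
algaas_gaas_hbt = capacity_bits(32, 8)     # register file of [43],[44]
si_cmos = capacity_bits(16, 64)            # register file of [46]
josephson = capacity_bits(256, 4)          # SRAM of [45]

ratio = algaas_gaas_hbt / this_work
print(this_work, algaas_gaas_hbt, si_cmos, josephson, ratio)
```

            The ratio of 0.25 corresponds to the "one fourth" figure quoted for the AlGaAs/GaAs design, while the Si CMOS and Josephson junction designs hold the same number of bits as the register file described here.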

1.4        ECL and CML circuit design

            A number of different digital circuit families are available for use in bipolar technologies. The most well known families include resistor-transistor logic (RTL), transistor-transistor logic (TTL), integrated injection logic (IIL), emitter-coupled logic (ECL), and current-mode logic (CML). One of the most important factors that influence the choice of a circuit family for a particular design is switching speed. Two important factors that determine the switching speed of a particular circuit family are the speed at which devices within the circuit can be switched and the speed at which the circuit can drive its load between logic levels. In a bipolar device, conduction of current from emitter to collector is facilitated by the charge that accumulates in the base region in excess of the thermal equilibrium value. Therefore, the switching speed of a bipolar device is governed by the amount of excess charge that must be transferred between the base region of the device and its surrounding environment, as well as the speed at which this charge can be displaced. Most of the above mentioned digital circuit families function by allowing one or more of the bipolar devices to saturate in one or both of the logic states. This is disadvantageous in that a significantly larger amount of charge must accumulate in the base region of a bipolar device to cause saturation. This is because in saturation both the base-emitter and base-collector diodes are forward biased, producing an excess accumulation of charge in the base region for both diodes. For this reason, circuit families that allow devices to saturate tend to have slower switching speeds because of the increased amount of charge that must be transported to and from the base region of the devices to switch the devices on and off. 
ECL and CML circuits require the transistors to operate only in the forward active and cut-off regions, however, thereby requiring a greatly reduced amount of charge transport to switch the devices on and off. In addition, the output voltage swing of an ECL or CML circuit required to produce valid input levels for another such circuit is generally small relative to the voltage swing of other logic families. This is advantageous in that, because of the smaller voltage swing, only a relatively small amount of charge must be transported between the driver and the load to produce the required swing of the driver output voltage that signals a change in the logic value of the driver. Therefore, the time required to change the output voltage level will be shorter than in a comparable circuit with a larger output signal swing. Because of their comparatively fast switching speeds, ECL and CML circuits were chosen as the basis for the design of the register file.

            ECL and CML circuits are based on the core circuit shown in Figure 1.1, which can be referred to as a CML buffer [5],[53]-[61]. CML circuits differ from ECL circuits in that a CML circuit has no emitter followers driving the output nodes. Instead, the output nodes are at the collector resistors. The CML circuit in Figure 1.1 operates by steering the current drawn by the current source through one or both of the active devices in the circuit. The majority of this current is drawn through the resistors attached to the collectors of the devices, producing a voltage drop between the resistors proportional to the difference in current drawn by the transistors. In this way, the amount and proportion of current flowing through the devices is translated into a set of voltage levels that appear at the circuit outputs.

Figure 1.1: CML buffer circuit.

            The circuit in Figure 1.1 is generally driven in such a way that, in steady state, the vast majority of the current from the current source flows through only one of the two devices, while the other remains essentially cut off [5],[53]-[61]. For this reason, the two devices are often referred to as a current switch. Assuming the collector resistors are sized properly, this type of operation results in a significant voltage difference between the two output nodes of the circuit. Usually, the higher of the two voltage levels is defined as true while the lower of the two voltage levels is defined as false. Regardless of the complexity of the circuit, both ECL and CML circuits are designed so that significant current is conducted through only one of the two collector resistors in steady state. Therefore, the voltage levels of the two output nodes will always be at opposite extremes. This implies that the logical output and its complement are available simultaneously in ECL and CML circuits.

            ECL and CML circuits can be driven in either a single-ended or a differential manner [55]-[61]. The difference between the two driving techniques lies in the way input voltages are applied to the base nodes of the devices in a current switch. In the case of single-ended ECL and CML circuits, one of the device base nodes (such as node Ib in Figure 1.1) is tied to a reference voltage, while the other base node is connected to one of the two outputs of another ECL or CML circuit. The choice of the output node depends on whether or not the signal is to be inverted. The output voltage levels of a single-ended ECL or CML driver are designed such that the voltage level corresponding to true is significantly greater than the reference voltage while the voltage level corresponding to false is significantly less than the reference voltage. Since the emitters of the two current switch transistors are coupled, the difference in voltage between the applied signal and the reference voltage becomes the difference in voltages applied across the base-emitter junctions of the devices. The total current through both devices must equal the current drawn by the current source. Given the exponential relationship between the collector current (IC) and the base-emitter voltage (VBE) found in bipolar devices [6], only a small difference in voltage levels between the base inputs of a pair of transistors with coupled emitters is required to force the majority of the current drawn by the current source to flow through only one of the two devices. Therefore, given that the input signal voltage generated by the circuit driving the current switch differs significantly from the reference voltage in either direction (depending on the logic value being transmitted), most of the current drawn by the current source will flow through only one of the two devices. Consequently, only one of the collector resistors will conduct significant current as well. 
This means that, given the collector resistors are correctly sized, a significant difference in the voltage levels between the complementary output nodes will exist. Therefore, either output node is capable of driving other current switches in a single-ended manner. In this way, the input signal is transmitted through the CML buffer, which propagates the logic value or its complement to other ECL and CML circuits in the required manner.
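
            The effect of the exponential IC-VBE relationship on current steering can be illustrated numerically. The following is a minimal sketch, assuming two matched devices, negligible base current, and a thermal voltage of about 25.85 mV (a room-temperature value assumed here, not taken from this work). The function name steering_fraction is hypothetical.

```python
import math

def steering_fraction(delta_v, v_t=0.02585):
    """Fraction of the current source current carried by the device whose
    base is higher by delta_v volts.

    Assumes matched devices with I_C proportional to exp(V_BE / V_T) and
    coupled emitters, so I_1 / I_2 = exp(delta_v / V_T). The default v_t is
    an assumed room-temperature thermal voltage.
    """
    return 1.0 / (1.0 + math.exp(-delta_v / v_t))

if __name__ == "__main__":
    # Illustrative input differences in volts (not design values).
    for delta_v in (0.0, 0.05, 0.125, 0.25):
        share = 100.0 * steering_fraction(delta_v)
        print(f"delta V = {delta_v:5.3f} V: {share:.3f} % of the current")
```

            Under these assumptions, an input difference of a few multiples of the thermal voltage is already enough to steer nearly all of the current through one device, which is why small signal swings suffice.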

            For differential ECL and CML circuits, the reference voltage is eliminated. Instead, both output nodes of the driving differential ECL or CML circuit are connected to the base nodes of the devices in one or more of the current switches in the receiver circuit. Therefore, given that the differential voltage between the output nodes of the driving circuit is large enough to insure that VBE across one of the two devices is significantly larger than that of the other device, as before, most of the current will flow through only one of the devices. Again, this indicates that only one of the collector resistors will conduct significant current, producing a differential output voltage between the output nodes of the buffer that is comparable to the applied differential input voltage. In this way, as with the single-ended CML buffer, the differential CML buffer is able to propagate a given logic value to other ECL and CML circuits. The inverse of the input signal can also be propagated simply by reversing the connections between the output nodes of the CML buffer and the input nodes of a particular current switch in a load circuit. This forces the current, if any is available, to flow through the opposite device in the load current switch. Therefore, as with single-ended ECL and CML circuits, complementary output logic values are also available in differential ECL and CML circuits.

            For a digital circuit family to be useful, it must not only be able to propagate logic values, but compute logical operations as well. Logical computations can be performed using ECL and CML circuits by forming current trees [55]-[57],[59]-[61]. An ECL or CML current tree consists of one or more current switches interconnected to form a circuit that evaluates a binary tree graph. Each current switch corresponds to a node in the tree. The coupled emitter connection of a current switch corresponds to the edge that leads toward the root of the tree, while the collector connections correspond to edges that lead toward the leaf nodes of the tree. The coupled emitters of the root pair are connected to a current source while the dangling collectors of the leaf current switches are connected to either of two collector resistors. The base leads of the current switch nodes are input nodes that must be driven by other ECL or CML gates. When driven properly, each pair of devices acts as a switch that allows current to flow from one collector lead through the corresponding device to the coupled emitter lead, as described above. To clarify the circuit operation, consider the current as a flow of electrons that leaves the current source and travels through the current switches of the tree until one of the two collector resistors is reached. The electron current first encounters the root current switch, where it is directed through one of the two devices, depending on the differential voltage applied to the base leads of these devices, and exits the node through the corresponding collector lead. The electron current continues traveling up through the tree in this manner, traversing a path directed by the current switches based on the applied input voltages to the circuit, until a collector node is reached that leads to a collector resistor. 
The current causes a voltage drop across this resistor, resulting in a voltage drop on the corresponding output node of the circuit. Therefore, the path the current takes through the current tree determines which collector resistor conducts current. In this way, the logical output of the circuit is determined. At each relevant current switch, from the root node to a leaf node, a decision is made as to which path the electron current should take on its way to the destination based on input data from other ECL and CML circuits. Therefore, a current tree evaluates a set of input logic values in the same manner as a binary decision tree.
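
            The decision tree view of a current tree can be expressed as a short behavioral model. The sketch below is illustrative only: the class Switch, the functions conducting_resistor and outputs, and the leaf labels "RC" (attached to output O1) and "RCb" (attached to output O1b) are names assumed here. Each current switch is reduced to an ideal two-way switch, and the 2-input AND tree is used as the example.

```python
from dataclasses import dataclass
from typing import Dict, Union

@dataclass
class Switch:
    """One current switch: 'signal' steers the current to one of two branches."""
    signal: str
    if_true: "Node"    # branch taken when the signal is true
    if_false: "Node"   # branch taken when the complement is true

# A leaf is the name of the collector resistor that the current finally reaches.
Node = Union[Switch, str]

def conducting_resistor(root: Node, inputs: Dict[str, bool]) -> str:
    """Follow the current from the root switch up to a collector resistor."""
    node = root
    while isinstance(node, Switch):
        node = node.if_true if inputs[node.signal] else node.if_false
    return node

def outputs(root: Node, inputs: Dict[str, bool]) -> Dict[str, bool]:
    """The resistor that conducts pulls its output node low (false)."""
    resistor = conducting_resistor(root, inputs)
    return {"O1": resistor != "RC", "O1b": resistor != "RCb"}

# 2-input AND tree: the root switch is driven by I2, the upper switch by I1.
AND_TREE = Switch("I2", if_true=Switch("I1", if_true="RCb", if_false="RC"),
                  if_false="RC")

if __name__ == "__main__":
    for i2 in (False, True):
        for i1 in (False, True):
            print(i1, i2, outputs(AND_TREE, {"I1": i1, "I2": i2}))
```

            Only one root-to-leaf path conducts for any input combination, so exactly one output node is pulled low and the two outputs are always complementary.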

Figure 1.2: Schematics for a 2-input AND function implemented in CML.

            An example of a CML current tree, shown in Figure 1.2, implements a 2-input AND function (O1 = I1 * I2) [59]-[61]. If the input signals to this circuit are single-ended, each barred input must be connected to a reference voltage. However, if the input signals are differential, for each current switch, one wire of a particular signal pair is connected to each of the inputs. The operation of the circuit is easily explained using decision tree analysis. One begins at the current source and follows the electron current to the first current switch. If signal I2b is true (i.e. the voltage level of I2b is greater than that of I2), current flows through device Q2b and the voltage at node O1 drops due to the current flow through RC, indicating O1 is false. Note that since no current is flowing through Q2, no current can be flowing through either Q1 or Q1b. This implies that no voltage drop exists across RCb, indicating that O1b is true. Therefore, in this case the O1 and O1b signal values are complementary and consistent with the expected result given I2 is false. If I2 is true, on the other hand, current flows through Q2. Now one must examine the other current switch to determine the flow of current. If I1b is true, current flows through Q1b, and again a voltage drop is observed at node O1. Therefore, O1 is again false, which is consistent with the above equation. If I1 is true when I2 is true, however, current flows through Q1 instead, and the voltage at node O1b drops due to current flow through RCb, indicating that O1b is now false. Current is no longer flowing through RC in this case, so no voltage drop is observed at node O1, indicating O1 is now true. In this way, it has been demonstrated that the circuit shown in Figure 1.2 implements a 2-input AND function.

Figure 1.3: 2-input OR function implemented using a CML current tree.

            Because signal inversions in ECL and CML circuits are accomplished without any additional hardware, it can be easily demonstrated that the circuit shown in Figure 1.2 also implements a 2-input OR function. Using De Morgan’s theorem, the expression O1 = I1 + I2 is found to be equivalent to O1b = I1b * I2b. This conversion indicates that an OR function can be computed with an AND operator if the input terms are inverted prior to computing the AND operation and the result is then inverted as well. Therefore, by simply altering the labels of the input and output nodes of the CML AND circuit shown in Figure 1.2, the requisite inversions are performed, resulting in the 2-input OR gate shown in Figure 1.3. The types of logic functions that can be implemented with ECL and CML current trees are not limited to simple AND and OR functions. Any logical function that can be represented by a binary decision tree with a depth that is no greater than the number of allowed levels in the current trees for a given design environment can be implemented by mapping the decision tree into an ECL or CML current tree.
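
            The relabeling argument can be checked with a small behavioral sketch. Each signal is modeled as a pair (true wire, complement wire), so that inversion is nothing more than swapping the two wires. The function names cml_and, invert, and cml_or are hypothetical and are used only for illustration.

```python
def cml_and(i1, i2):
    """Behavioral differential AND; each signal is a (wire, complement) pair."""
    value = i1[0] and i2[0]
    return (value, not value)

def invert(pair):
    """Inversion needs no hardware: swap the two wires of the pair."""
    return (pair[1], pair[0])

def cml_or(i1, i2):
    """OR obtained from the same AND circuit by relabeling inputs and outputs."""
    return invert(cml_and(invert(i1), invert(i2)))

def as_pair(value):
    """Build a differential pair from a logic value."""
    return (value, not value)

if __name__ == "__main__":
    for a in (False, True):
        for b in (False, True):
            print(a, b, cml_or(as_pair(a), as_pair(b))[0])
```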

Figure 1.4: Schematics for a 2-input XOR function implemented using CML.

            A current tree implementing a 2-input exclusive-OR (XOR) function (O1 = I1 * I2b + I1b * I2) [55],[57]-[61] is shown in Figure 1.4. The current switch consisting of Q2 and Q2b steers current between the other two current switches, depending on the value of I2 and I2b. If I2 is true, current flows through Q1 when I1 is true, creating a voltage drop that indicates O1 is false. However, if I1 is false while I2 is true, current flows through Q1b, allowing the voltage at O1 to approach VCC, which indicates O1 is true. If I2 is false, on the other hand, current flows through Q0b when I1 is false, creating a voltage drop that indicates O1 is false. However, if I1 is true while I2 is false, current flows through Q0, allowing the voltage at O1 to approach VCC, which indicates O1 is true. Put simply, the current switch consisting of Q2 and Q2b selects which of the remaining current switches controls the current flowing through the collector resistors, allowing the output voltage to be set differently by I1 and I1b depending on the value of I2 and I2b. Note that a 2-input multiplexer can be implemented with this circuit if the upper current switches are driven by different signals, i.e. IA1 and IA1b for Q1 and Q1b, and IB1 and IB1b for Q0 and Q0b.
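
            The XOR and multiplexer behavior can be sketched with the same ideal-switch view. In the model below, the device names follow the description above, while the function names two_level_tree, xor_gate, and mux are assumed for illustration. The multiplexer variant assumes that the wires of the IA1 pair are swapped so that the selected input appears non-inverted at O1, which is a relabeling chosen here and not a detail taken from Figure 1.4.

```python
def two_level_tree(i2, upper_q1, upper_q0):
    """Ideal-switch model of the two-level tree.

    i2 selects the upper switch: Q1/Q1b (driven by upper_q1) when true,
    Q0/Q0b (driven by upper_q0) when false. Returns (O1, conducting device).
    """
    if i2:
        device = "Q1" if upper_q1 else "Q1b"
    else:
        device = "Q0" if upper_q0 else "Q0b"
    # Q1 and Q0b pull node O1 low; Q1b and Q0 leave O1 near VCC.
    o1 = device in ("Q1b", "Q0")
    return o1, device

def xor_gate(i1, i2):
    """Both upper switches are driven by I1."""
    return two_level_tree(i2, i1, i1)[0]

def mux(select, ia1, ib1):
    """Assumed wiring: the IA1 pair is swapped so O1 follows IA1 when selected."""
    return two_level_tree(select, not ia1, ib1)[0]

if __name__ == "__main__":
    for i2 in (False, True):
        for i1 in (False, True):
            print(i1, i2, xor_gate(i1, i2))
```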

Figure 1.5: Schematics for a D-latch implemented in CML.

            In logic design, it is often necessary to have a circuit that can store a logic value. This type of circuit, generally referred to as a latch, can be implemented using an ECL or CML circuit such as the circuit shown in Figure 1.5 [55],[57],[59]-[61]. When W2 is true, the latch is in write mode, allowing current to flow through Q3 and the current switch consisting of Q1 and Q1b. Therefore, in write mode, the value presented by D1 and D1b will be replicated at Q1 and Q1b. When W2b becomes true, the current is redirected through Q3b and the current switch consisting of Q2 and Q2b. The base lead for each device in this current switch is connected to an output node of the latch, which is also connected to the collector lead of the opposite current switch device. In this way, positive feedback is employed to maintain the voltage levels at the output nodes that were set in write mode. That is, if D1 and Q1 are true when W2b becomes true, current switches from flowing through Q3 and Q1 to flowing through Q3b and Q2. This current continues to flow through RCb, however, insuring that the voltage at the base of Q2b is sufficiently lower than the voltage at the base of Q2. This prevents Q2b from conducting significant current. Therefore, the current flow through Q2 will be maintained as long as W2b is true, storing a true value in the latch. If D1 and Q1 are false while W2 is true, similar reasoning can be used to show that current flow is transferred from Q3 and Q1b to Q3b and Q2b when W2b becomes true, requiring Q1 to remain false while W2b is true.
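
            The write and hold modes of the latch can be summarized by a purely behavioral sketch. The class name CmlLatch and its step method are hypothetical, and the model ignores all analog behavior: it captures only that the output follows D1 while W2 is true and is held by the feedback pair while W2b is true.

```python
class CmlLatch:
    """Behavioral D-latch: transparent while W2 is true, holding otherwise."""

    def __init__(self, initial=False):
        self.q1 = initial  # stored value seen at output Q1

    def step(self, w2, d1):
        """Apply one set of inputs and return the output pair (Q1, Q1b)."""
        if w2:
            # Write mode: current flows through the input current switch,
            # so the output follows the data input.
            self.q1 = d1
        # Hold mode (W2b true): the feedback current switch keeps the
        # previously written value, regardless of D1.
        return self.q1, not self.q1

if __name__ == "__main__":
    latch = CmlLatch()
    print(latch.step(w2=True, d1=True))    # write a true value
    print(latch.step(w2=False, d1=False))  # hold while the data input changes
```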

            For the current tree circuits to perform properly, it is necessary to insure that none of the devices enter saturation. For this reason, attention must be paid to the common mode voltage level of each input signal pair, whether differential or a combination of a single-ended signal and a reference voltage. The common mode voltage levels of the input pairs to two current switches in which the emitter leads of one current switch (the upper current switch) are connected to one of the collector leads of the second current switch (the lower current switch) must differ by a sufficient amount to prevent saturation of the lower current switch when current is flowing through devices in both current switches. This is because the devices conducting current in the two current switches develop voltage drops across their base-emitter junctions. This means that, for example, if the two current switches are driven by input signals with the same common mode voltage levels, the collector voltage of the conducting device in the lower current switch, which in this case is equal to the voltage of the emitters of the upper current switch, will try to drop significantly below the base voltage of the conducting device in the lower current switch. Therefore, the conducting device in the lower current switch saturates, and performance suffers.

            To solve this problem, emitter follower stages are connected to the output nodes of the CML current tree circuits, as illustrated in Figure 1.6, creating ECL circuits [55]-[61]. The emitter follower stages serve to shift the common mode voltage of an output signal pair by one or more VBE drops. The common mode output voltage of a particular ECL or CML circuit is often categorized by assigning a level number to the signal pair based on the number of VBE drops between one of the signal outputs of the current tree and an emitter follower output. Commonly, a higher level number indicates a larger number of VBE drops. A level 1 designation is assigned to an output signal pair connected directly to the current tree collector resistors. As mentioned before, this is a CML circuit. However, the designers may choose to exclude CML circuits, in which case emitter follower stages exist in all the circuits. Under these conditions, the level 1 designation may be given to signals in which one VBE drop exists between the output signal pair and the output of the current tree portion of the circuit. Each additional VBE drop adds an additional level to the output signal pair. The additional VBE drops are created by inserting diodes (formed by shorting the base and collector of a transistor) between the top device (whose base is connected to a collector resistor) and the emitter follower current source. When driving inputs of a current tree, it is necessary to insure that, for any two adjacent current switches (that is, pairs in which the emitter leads of one pair are connected to the collector lead of a device in the other pair), the upper current switch is driven by an input signal pair that is at least one level less than the level of the input signal pair driving the lower current switch. 
In this way, the emitter voltage of the devices in the upper current switch (and, therefore, the collector voltage of the conducting device in the lower current switch) at worst will be about equal to the base voltage of the lower device. Therefore, saturation of the conducting device in the lower pair is avoided. The emitter follower output stages also tend to lower the output impedance of the driving circuit. This improves the ability to drive load circuits and interconnect. For this reason, CML circuits are often omitted in ECL style designs.
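
            The level rule can be checked with simple arithmetic. The sketch below assumes an illustrative VBE of 0.85 V (not a value taken from this work), takes the high output of a level 1 signal to be VCC, and considers the worst case in which both conducting devices have their bases at the high level. The function names high_level and lower_device_vbc are hypothetical.

```python
V_BE = 0.85  # assumed base-emitter drop of a conducting device, in volts

def high_level(level, vcc=0.0, v_be=V_BE):
    """High output voltage of a signal at the given level.

    Level 1 is taken directly at the collector resistors, and each
    additional level adds one V_BE drop.
    """
    return vcc - (level - 1) * v_be

def lower_device_vbc(upper_level, lower_level, v_be=V_BE):
    """Base-collector voltage of the conducting device in the lower switch.

    The collector of the lower device sits at the coupled emitters of the
    upper switch, one V_BE below the higher upper-switch base voltage.
    A value near +V_BE means the base-collector diode is forward biased.
    """
    upper_emitters = high_level(upper_level, v_be=v_be) - v_be
    lower_base = high_level(lower_level, v_be=v_be)
    return lower_base - upper_emitters

if __name__ == "__main__":
    for upper, lower in ((1, 1), (1, 2), (1, 3)):
        vbc = lower_device_vbc(upper, lower)
        print(f"upper level {upper}, lower level {lower}: VBC = {vbc:+.2f} V")
```

            With equal levels the base-collector junction of the lower device sees a full forward VBE, whereas a one-level separation brings it to about zero, consistent with the rule stated above.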

Figure 1.6: Level 2 and level 3 ECL buffers.

 


 

Chapter 2   Register File Design

 

The main emphasis in the design of the register file was to minimize the access times. This has been the prime motivation for the choice of technology used to design the register file. Having chosen a bipolar technology, it is important to use circuit design techniques that produce minimal switching times within the devices. For this reason, circuit designs employing current steering techniques, such as CML and ECL logic circuits, form the basis of the register file design. These types of circuits bias devices in either the forward active or cutoff regions, which minimizes the device switching times since the devices do not become saturated. Other factors such as power consumption and area requirements were considered as well to produce a design that is practical for high performance digital products.

2.1        Register file overview

            As previously mentioned, the register file has a size of 32 words by 32 bits that can be simultaneously accessed by two read ports and a single write port. A block diagram for the register file, shown in Figure 2.1, illustrates that the register file contains seven distinct types of functional blocks. These are the memory cell array, the read address decoders and word line drivers, the write address decoder and word line drivers, the bit line drivers, the sense amplifiers, the output latches, and the comparators. The memory cell array is responsible for storing the 1024 bits of data. It is arranged in a grid of 32 rows by 32 columns of memory cells. When any of the ports access the memory cell array, the read or write operation is performed on every memory cell in the selected row simultaneously. Each read address decoder is responsible for decoding a 5-bit address to determine which of the 32 rows is selected for each read operation. The 64 read word line drivers are responsible for driving the read word lines accordingly. The write enable circuit is responsible for determining whether or not a write operation is to take place. If so, the 5-bit write address is decoded to select the memory cell row in which new data will be written. The 32 write word line drivers are responsible for driving the write word lines accordingly. The bit line drivers are responsible for providing input data to the memory cells during write operations. There are 32 bit line drivers, each of which supplies one bit of input data to a particular column of memory cells. The 64 sense amplifiers are responsible for detecting the values on the read bit lines and translating them into differential voltages corresponding to two data words. The comparators are used to detect whether or not a write operation is occurring for a row that is being read by either read port. 
The 64 output latches are responsible for selecting and storing the appropriate data from either the sense amplifiers or the bit line drivers based on the results of the comparators. A more detailed explanation of the functions of the blocks within the register file is provided below.
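
            The interaction of the blocks can be summarized by a behavioral sketch. The model below is an assumption-laden illustration: the class name RegisterFile, the cycle method, and the single-cycle timing are hypothetical, and it assumes that when a comparator detects a match between a read address and the write address, the output latch takes the data supplied by the bit line drivers in place of the sense amplifier data.

```python
class RegisterFile:
    """Behavioral 32-word by 32-bit register file: two read ports, one write port."""

    WORDS = 32
    BITS = 32

    def __init__(self):
        self.rows = [0] * self.WORDS  # memory cell array, one integer per row

    def cycle(self, read_a, read_b, write_enable=False, write_addr=0, write_data=0):
        """Perform two reads and an optional write; return (port A, port B) data."""
        write_data &= (1 << self.BITS) - 1  # keep 32 bits
        results = []
        for address in (read_a, read_b):
            # Comparator: is this row also the target of the write?
            if write_enable and address == write_addr:
                results.append(write_data)          # data from the bit line drivers
            else:
                results.append(self.rows[address])  # data from the sense amplifiers
        if write_enable:
            self.rows[write_addr] = write_data
        return tuple(results)

if __name__ == "__main__":
    rf = RegisterFile()
    rf.cycle(0, 1, write_enable=True, write_addr=5, write_data=0x12345678)
    print([hex(word) for word in rf.cycle(5, 0)])
```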

Figure 2.1: Register file block diagram.

2.2        Memory cell design

            The memory cell, shown in Figure 2.2, consists of four current switches and two collector resistors [62]. During a write operation, 17.2 mA of current flows through WW while no current flows through WWb. About 0.54 mA of the current flowing through WW is directed through either QD or QDb, depending on the differential voltage applied between WB and WBb by the corresponding bit line driver. The current through QD or QDb produces a differential voltage between MC and MCb. The magnitude of this voltage is determined primarily by the amount of current flowing through WW that is directed through either QD or QDb and the size of the collector resistors, which in this case results in a voltage swing of 0.25 V. At the end of the write operation, the current flowing through WW is redirected through WWb. About 0.54 mA of current flows through either QF or QFb, depending on whether MCb or MC is at a higher potential as a result of the write operation. The positive feedback configuration of the devices QF and QFb maintains the differential voltage between MC and MCb for as long as current is flowing through WWb. In this way, the memory cell stores data in a manner similar to that of a D-latch (see Chapter 1). However, the current switch that directs current between QD and QDb or QF and QFb based on whether or not a write is asserted is not found in the memory cell. Choosing the current level used in the memory cell is a compromise between providing as much current as possible to write new values into a row of memory cells quickly and maintaining reasonable word line metal widths without violating electromigration rules. Also, keeping the power dissipation to a reasonable level was a factor in the decision. The propagation delay in writing a value into the memory cell can be defined as the interval from the point in time when the differential current between WW and WWb reaches zero to the point in time when the differential voltage between MC and MCb reaches zero. 
Simulation results show that this propagation delay is 36 ps.
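
            The current and voltage figures quoted above are related by simple arithmetic, sketched below. The word line current, row width, and voltage swing are the values given in the text. The collector resistance is not stated here, so the value computed by the sketch is only an estimate, assuming the full cell current flows through one collector resistor and base currents are neglected. The function names are hypothetical.

```python
WORD_LINE_CURRENT_MA = 17.2  # write word line current, from the text
CELLS_PER_ROW = 32           # memory cells sharing one word line, from the text
CELL_SWING_V = 0.25          # differential swing between MC and MCb, from the text

def per_cell_current_ma(total_ma=WORD_LINE_CURRENT_MA, cells=CELLS_PER_ROW):
    """Share of the word line current available to each memory cell."""
    return total_ma / cells

def implied_collector_resistance_ohm(swing_v=CELL_SWING_V, cell_current_ma=None):
    """Estimated collector resistance (an inference, not a stated design value)."""
    if cell_current_ma is None:
        cell_current_ma = per_cell_current_ma()
    return swing_v / (cell_current_ma * 1e-3)

if __name__ == "__main__":
    print(f"current per cell: {per_cell_current_ma():.4f} mA")
    print(f"implied collector resistance: {implied_collector_resistance_ohm():.0f} ohm")
```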

            When a memory cell row is selected for a read operation through read port A, 17.2 mA of current flows through RAW. For a particular memory cell, about 0.54 mA of this current flows through either QRA or QRAb, depending on whether MCb or MC is at a higher potential, which in turn causes most of this current to flow through either RAB or RABb. The value stored in the memory cell is determined by the sense amplifier connected to RAB and RABb, based on the current flowing in these two bit lines. The value stored in the memory cell can be read simultaneously through port B using similar methods. The propagation delay in reading a value from the memory cell can be defined as the interval from the point in time when the differential current between the RAW word lines of two memory cell rows (one that is finishing a read operation and another that is beginning a read operation) reaches zero to the point in time when the differential current between the bit lines RAB and RABb reaches zero. Simulation results show that this propagation delay is 56 ps.

Figure 2.2: Three-port memory cell schematics.

2.3        Bit line driver design

            The bit line driver, shown in Figure 2.3, is simply an ECL buffer that drives nodes WB and WBb at level 2 for a particular column of memory cells, producing a differential voltage across these lines that corresponds to the value to be written into one of the memory cells in the column. A total of 32 bit line drivers are required for the register file. This allows WB and WBb of each of the 32 memory cell array columns to be driven simultaneously, providing 32 bits of data for storage in the register file within a single write operation. The 34 current switch loads that each bit line driver must drive present a considerable amount of loading for the circuit. For this reason, larger emitter follower devices were employed, conducting 2 mA of current to provide better driving capability in the bit line driver. Simulation results show that the propagation delay of the bit line driver under loaded conditions is 69 ps.

Figure 2.3: Bit line driver schematics.

2.4        Read address decoder design

            The read address decoder is similar to the design used in a BiCMOS register file [41],[42]. Each read address is decoded in two stages, as shown in Figure 2.4. In the first stage, wired-OR techniques are used to decode the lower two bits. This decoder uses two CML buffers to drive four wired-OR lines. A CML buffer driving the input devices to four wired-OR lines is illustrated in Figure 2.5. Each wired-OR line operates in a manner similar to that of an emitter follower. However, instead of having a single bipolar device with its base driven by a CML buffer and biased using a current source connected to the emitter, there are two emitter-coupled bipolar devices, each with its base driven by a separate CML buffer, and both biased by a single current source connected to the emitters. Therefore, the voltage level on the wired-OR line is determined by the higher of the two input voltage levels to the base terminals of the devices, assuming they differ. If both input voltage levels are high, both devices are forward biased, and a high level voltage at level 2 is observed on the wired-OR line. Likewise, if both input voltages are low, both devices are forward biased, and a low level voltage at level 2 is observed on the wired-OR line. If only one of the input voltage levels to the two devices is high, however, the device receiving the high voltage is forward biased while the other device is operating in the cutoff region. Therefore, the voltage across the base-emitter junction of the forward biased device produces a high level voltage at level 2 on the wired-OR line. In this way, the output of a wired-OR line is an OR function of the input signals applied to the emitter-coupled devices on the wired-OR line.

Figure 2.4: A portion of the read address decoder schematics. Included is the stage one 2-bit decoder, a portion of the stage two decoder, and some of the read word line drivers.

Figure 2.5: Schematics for the stage one 2-bit decoder CML buffer driving wired-OR input devices.

            Since the output signals of the stage one 2-bit decoder are single-ended, the output voltage swing of these signals for this circuit is designed to be 0.5 V. Simulations show the propagation delay through the CML buffer from A to B is 35 ps. The total propagation delay through the stage one decoder is the interval from the time when the differential voltage between A and Ab reaches zero to the time when the rising or falling voltage on one of the wired-OR lines reaches the reference voltage level used by the stage two decoder (see below). Because the output voltages of the stage one decoder are single-ended, the propagation delay through the circuit when an output voltage is rising (tplh) is not equal to the propagation delay when the output voltage is falling (tphl). For this circuit, the simulated value of tphl is 114 ps and tplh is 80 ps.

            The CML buffers drive the wired-OR lines for the stage one 2-bit decoder in a manner such that two wired-OR lines receive an inverted version and two wired-OR lines receive a non-inverted version of each address bit, as Figure 2.4 illustrates. The arrangement is such that wired-OR line 0 receives a non-inverted signal from both CML buffers, wired-OR line 1 receives a non-inverted signal from CML buffer 1 and an inverted signal from CML buffer 0, wired-OR line 2 receives an inverted signal from CML buffer 1 and a non-inverted signal from CML buffer 0, and wired-OR line 3 receives an inverted signal from both CML buffers. Therefore, since exactly one wired-OR line receives a low voltage on both input terminals, it is the only one of the four wired-OR lines to produce a low output signal. The wired-OR line that produces a low output voltage indicates the decoded value of the 2-bit decoder. Each of the other three wired-OR lines receives a high voltage on one or both of its input terminals, resulting in a high signal output for each of these lines.
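
            The wiring pattern of the stage one 2-bit decoder can be verified with a short logical model. The sketch below assumes that CML buffer 0 handles the less significant address bit and CML buffer 1 the more significant one. The function name stage_one_2bit is hypothetical, and voltage levels are reduced to true (high) and false (low).

```python
def stage_one_2bit(a1, a0):
    """Return the levels of wired-OR lines 0 to 3 (True = high, False = low).

    Line n receives the inverted copy of every address bit that is 1 in the
    binary code of n and the non-inverted copy of every bit that is 0, so
    only the line whose number equals the address sees two low inputs.
    """
    lines = []
    for n in range(4):
        from_buffer_1 = (not a1) if (n >> 1) & 1 else a1
        from_buffer_0 = (not a0) if n & 1 else a0
        lines.append(from_buffer_1 or from_buffer_0)  # wired-OR of both inputs
    return lines

if __name__ == "__main__":
    for address in range(4):
        a1, a0 = bool(address >> 1), bool(address & 1)
        print(address, stage_one_2bit(a1, a0))
```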

            The stage one 3-bit decoder decodes the upper three bits of an address in a manner similar to that of the stage one 2-bit decoder. However, this decoder decodes a 3-bit address onto eight wired-OR lines. There are three CML buffers in this case and each wired-OR line contains three emitter-coupled input devices, each of which has a base connection to one of the CML buffers. Each CML buffer drives four wired-OR lines from both the inverting and non-inverting output nodes, as Figure 2.6 illustrates. The pattern is such that for a given three-bit address pattern, the wired-OR line corresponding to the encoded pattern is driven low, while all seven other lines are driven high. Therefore, a 000 input pattern forces wired-OR line 0 low, a 001 input pattern forces wired-OR line 1 low, a 010 input pattern forces wired-OR line 2 low, and so on up to a 111 input pattern, which forces wired-OR line 7 low. The output voltage swing of this stage one decoder is also designed to be 0.5 V. The simulated propagation delay through the CML buffer from A to B is 42 ps in this case. For the entire stage one 3-bit decoder, tphl is 54 ps and tplh is 58 ps according to simulations.

Figure 2.6: Schematics for the stage one 3-bit decoder CML buffer driving wired-OR input devices.

            The second stage of each read address decoder consists of 32 single-ended 2-input ECL NOR gates. This well known NOR implementation, shown in Figure 2.7, operates in a manner similar to the ECL buffer circuit described above [53],[54],[58]. Since the input signals are single-ended, a static reference signal, VREF, is required. VREF must be midway between the expected high and low input signal voltage levels. When both input signals are at a lower voltage level than VREF, ICS flows almost entirely through QREF, allowing R1 to be pulled up to VCC. If either input signal is at a higher voltage level than VREF, however, ICS flows through either Qh or Ql, causing R1 to be pulled low. An emitter follower is provided to produce a level 3 output signal to properly bias the corresponding read word line driver. One input of each NOR gate is connected to one of the set of four wired-OR lines from the stage one 2-bit decoder while the other input is connected to one of the set of eight wired-OR lines from the stage one 3-bit decoder, as illustrated in Figure 2.4. There are exactly 32 ways to choose unique pairs from the two sets, providing a unique pair for each NOR gate. Since only one of the wired-OR lines in each set is low for any given address, only one ECL NOR gate will receive two low signals as input. This NOR gate produces a high output signal, indicating that the corresponding row is selected, while all the other NOR gates produce low output signals.
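
            The complete two-stage decode can be modeled logically in a few lines. The following sketch is illustrative: the function names wired_or_decode and read_decode are hypothetical, and the assignment of wired-OR line pairs to rows (lower two bits select the 2-bit decoder line, upper three bits select the 3-bit decoder line) is assumed here for the purpose of the example.

```python
def wired_or_decode(address_bits):
    """Stage one: one wired-OR line per code; only the addressed line is low.

    address_bits[k] is address bit k (least significant bit first). The line
    for a given code receives the inverted copy of each bit that is 1 in the
    code, and the line level is the OR of its inputs.
    """
    lines = []
    for code in range(1 << len(address_bits)):
        inputs = [(not bit) if (code >> k) & 1 else bit
                  for k, bit in enumerate(address_bits)]
        lines.append(any(inputs))
    return lines

def read_decode(address):
    """Return 32 NOR gate outputs; exactly one is high for a 5-bit address."""
    bits = [bool((address >> k) & 1) for k in range(5)]
    low_lines = wired_or_decode(bits[:2])   # four lines from the 2-bit decoder
    high_lines = wired_or_decode(bits[2:])  # eight lines from the 3-bit decoder
    # Stage two: one 2-input NOR gate per unique pair of lines.
    return [not (low_lines[row & 3] or high_lines[row >> 2]) for row in range(32)]

if __name__ == "__main__":
    selected = read_decode(19)
    print(selected.index(True))
```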

Figure 2.7: Schematics for a 2-input single-ended ECL NOR gate used in the read address decoders.
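
The pairing of the two sets of wired-OR lines can be checked with a short logic-level model. This is a sketch under stated assumptions: the names `one_low` and `decode_row` are hypothetical, the lower two address bits are assumed to feed the 2-bit decoder (the text states only that the 3-bit decoder takes the upper three bits), and the NOR gate that pairs 2-bit line `lo` with 3-bit line `hi` is assumed to drive row 4*hi + lo.

```python
def one_low(value: int, width: int) -> list[int]:
    """Active-low one-of-N code: only line `value` is low (0)."""
    return [0 if index == value else 1 for index in range(2 ** width)]


def decode_row(address: int) -> list[int]:
    """Return the 32 NOR-gate outputs (1 = row selected) for a 5-bit address."""
    low_lines = one_low(address & 0b11, 2)            # four wired-OR lines
    high_lines = one_low((address >> 2) & 0b111, 3)   # eight wired-OR lines
    outputs = []
    for hi in range(8):
        for lo in range(4):
            # A 2-input NOR is high only when both of its inputs are low.
            outputs.append(int(not (low_lines[lo] or high_lines[hi])))
    return outputs


selected = decode_row(0b10110)
print(selected.index(1), sum(selected))  # 22 1
```

Because exactly one line in each set is low, exactly one of the 4 x 8 = 32 pairings sees two low inputs, so exactly one NOR output is high for every address.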

            The output voltage swing of the stage two decoder is designed to be 0.5 V. Although the output signal of this circuit is single-ended, the voltage swing could have been designed to be as low as that of any standard circuit with a differential output. This is because the output voltage does not swing about a fixed reference voltage to produce a differential input voltage for a receiver. Instead, the output voltages of a number of the gates are compared to determine which gate has a high output voltage. Therefore, the effective input differential voltage to a circuit driven by the ECL NOR gates is the difference in voltage between the output voltage of the NOR gate that selects the row to be read and the output voltage of any other NOR gate. To obtain this same effective input differential voltage with a single ECL NOR gate and a fixed reference voltage, the output voltage swing of the NOR gate would have to be doubled. Nevertheless, a 0.5 V swing was chosen for the ECL NOR gate in the stage two decoder instead of a 0.25 V swing to ensure proper operation of the read word drivers. The primary reason was to minimize the current in the 31 read word drivers that are receiving low level voltages. Also, in the event of mismatch in the output voltage swings of the various stage two decoders and because of voltage drops in the metal lines that couple the emitters of the read word drivers, the increased voltage swing ensures that the read word drivers behave properly. The emitter follower devices of the stage two decoder NOR gates were doubled in size and the current through each emitter follower was doubled as well to improve the drive capability of the circuit because of the large size of the read word line driver devices. According to simulations, tplh is 55 ps and tphl is 46 ps for the read address decoder ECL NOR gate.

2.5        Read word line driver design

            Each read word line driver consists of a single large device capable of handling as much as 18 mA of current. This is necessary to drive a read port in every memory cell for a single row. The base of each device is driven by the corresponding NOR gate from the stage two decoder. The emitters of all 32 read word line driver devices for a given port are connected to a single current source as shown in Figure 2.4 and Figure 2.8. Since only one of the NOR gates is producing a high output signal, the VBE of the read word driver device connected to this gate is significantly larger than that of the other read word line driver devices. Therefore, nearly all the current sunk by the current source flows through the selected read word line driver device, while the other read word line driver devices are cut off. Since the collector of each read word line driver device is connected to the appropriate read word line of the corresponding row of memory cells, current flows only through the selected read word line of each read port. In this way, each memory cell in a selected row is able to drive its stored value onto the appropriate set of bit lines while memory cells in rows that are not selected for this read operation do not significantly affect the state of the bit lines. The current source draws 17.2 mA, providing about 0.54 mA for each bit line of a particular read port. This value was chosen as the result of a compromise between providing as much current as possible to drive the bit lines effectively and maintaining reasonable word line metal widths without violating electromigration rules. Also, keeping the power dissipation down to a reasonable level was a factor in the decision. The propagation delay through the read word line driver, measured as the interval from the point in time when the differential voltage of the two changing input signals reaches zero to the point in time when the differential current of the two changing output signals reaches zero, is 5.5 ps.
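
The current figures above, and the benefit of a larger base voltage swing, can be illustrated numerically. The sketch below is a simplified estimate, not a substitute for the SPICE results: it assumes ideal exponential device behavior, a temperature of 300 K, identical devices, and no voltage drops in the emitter metal lines. The function names are hypothetical.

```python
import math

K_B_OVER_Q = 8.617333e-5  # Boltzmann constant over electron charge, in V/K


def thermal_voltage(temp_k: float = 300.0) -> float:
    """kT/q in volts."""
    return K_B_OVER_Q * temp_k


def selected_fraction(swing_v: float, n_drivers: int = 32,
                      temp_k: float = 300.0) -> float:
    """Fraction of the shared source current carried by the selected driver.

    The selected device has its base `swing_v` above the other n-1 bases,
    so each unselected device carries exp(-swing/VT) times the selected
    device's current (ideal emitter-coupled devices, assumed here).
    """
    leak = (n_drivers - 1) * math.exp(-swing_v / thermal_voltage(temp_k))
    return 1.0 / (1.0 + leak)


per_bit_line_ma = 17.2 / 32           # 32 bit-line pairs share 17.2 mA
print(round(per_bit_line_ma, 2))      # 0.54
print(selected_fraction(0.5))         # essentially 1
print(selected_fraction(0.25))        # roughly 0.998
```

Under these idealized assumptions, a 0.25 V swing would leave a small fraction of a percent of the source current in the 31 unselected drivers, while a 0.5 V swing makes that leakage negligible; mismatch and metal voltage drops, which the model ignores, would widen the gap.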

Figure 2.8: Read word line driver schematics.

2.6        Write address decoder and word line driver design

            The write address decoder, illustrated in Figure 2.9, operates in a manner similar to that of the read address decoder with two notable exceptions. The first is the addition of a write enable circuit, which prevents the decoder from selecting any row of memory cells for a write operation unless the write enable signal is asserted. This circuit, shown in Figure 2.10, consists of a CML buffer with differential inputs and a single-ended output that drives four additional input devices to the set of four wired-OR lines used in the stage one 2-bit decoder. Therefore, this circuit adds a third input to each of the four wired-OR lines of the stage one 2-bit decoder. Because the CML buffer in the write enable circuit drives the same value onto all four wired-OR lines, when the buffer input is high, indicating a write operation is enabled, a low value is placed on each wired-OR line. The low values do not alter the states of the four wired-OR lines, allowing the first stage of the decoder to operate in the same manner as in the read address decoder. However, when the CML buffer input is low, indicating no write operation is to take place, a high value is placed on one of the inputs of each wired-OR line. In this case all four wired-OR lines are forced high, which ensures that each of the NOR gates in the second stage of the decoder will receive a high value on one of its input terminals, and hence, produce a low output signal. Therefore, when no write operation is enabled, each row of memory cells is driven in a manner that preserves the values currently stored in the memory cells. As with the stage one decoder circuits, the output voltage swing of this circuit is designed to be 0.5 V. Because the CML buffer driving the emitter follower devices has a single-ended output, the propagation delays on high-to-low and low-to-high transitions are different. According to simulations, tphl is 44 ps while tplh is 43 ps for the CML buffer. These simulations show that, for the entire stage one decoder, tphl is 99 ps while tplh is 81 ps.
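
The effect of the write enable input on the four wired-OR lines can be summarized with a logic-level sketch. The function name `write_stage_one_lines` is hypothetical, and the model treats each wired-OR line as a simple OR of its three inputs (two address inputs and the write enable input).

```python
def write_stage_one_lines(addr2: int, write_enable: bool) -> list[int]:
    """Levels (1 = high, 0 = low) of the four wired-OR lines in the
    stage one 2-bit decoder of the write address decoder."""
    lines = []
    for index in range(4):
        # The two address inputs hold the line high unless the address
        # field matches this line's index.
        address_inputs_high = (addr2 & 0b11) != index
        # The third input is low when a write is enabled and high otherwise.
        enable_input_high = not write_enable
        lines.append(int(address_inputs_high or enable_input_high))
    return lines


print(write_stage_one_lines(2, True))   # [1, 1, 0, 1]
print(write_stage_one_lines(2, False))  # [1, 1, 1, 1]
```

With the write disabled every line is high, so every second-stage NOR gate sees at least one high input and no row is selected.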

Figure 2.9: A portion of the write address decoder schematics. Included is the stage one 2-bit decoder, a portion of the stage two decoder, and some of the write word line drivers.

            The second way in which the write address decoder differs from the read address decoder is in the modifications to the stage two decoder ECL NOR gates. This was necessary to accommodate the write word line drivers. Each ECL NOR gate for the write address decoder provides a differential output, which is necessary to drive a write word line driver. Schematics for the write address decoder ECL NOR gate are shown in Figure 2.11. The write word line driver, shown in Figure 2.12, consists of a pair of emitter-coupled devices with emitters attached to a current source. When the NOR gate output is false, no write operation is occurring. This means that WD3b is at a higher voltage than WD3, causing current to flow through QR to drive the word line WWb. This allows current to flow through either QF or QFb in the memory cells of the row driven by this particular write word line driver, allowing the stored values in these memory cells to be maintained. During a write operation for a particular row, the NOR gate output becomes true. At this point, WD3 is at a higher voltage than WD3b, causing current to flow through QW to drive the word line WW. This causes current to flow through either QD or QDb in the memory cells of the row driven by this particular write word line driver, allowing the values driven onto the write bit lines to be written into these memory cells.

Figure 2.10: Write enable circuit schematics.

            The current source in each write word line driver draws 17.2 mA of current to supply all the memory cells of a particular row with the required current through either WW or WWb, as described above. Because of the large amount of current handled by the write word drivers, very large devices are required. To drive these devices more effectively, the device sizes and current flowing through the emitter followers of the ECL NOR gate were doubled. The voltage swing of the write address decoder ECL NOR gate is designed to be 0.5 V for reasons similar to those described for the read address decoder ECL NOR gate. According to simulations, tphl is 45 ps and tplh is 44 ps for the write address decoder ECL NOR gate. The propagation delay through the write word line driver, measured as the interval between the point in time when the input differential voltage becomes zero and the point in time when the output differential current becomes zero, is 10 ps according to simulations.

Figure 2.11: Schematics for an ECL NOR gate used in the second stage of the write address decoder.

Figure 2.12: Write word line driver schematics.

2.7        Sense amplifier design

2.7.1        Version 1

            An active sense amplifier is used to convert the differential current on a pair of read bit lines into an output differential voltage. Of course, this could have been accomplished simply by using a pair of pull-up resistors on the bit lines. However, since the bit lines must traverse the entire memory cell array and are connected to a number of devices, they have a large parasitic capacitance. This means that, using the pull-up resistors, a significant lag in the change in differential voltage across the bit lines would be observed as a result of a change in the differential bit line currents. For this reason, a common-base sense amplifier, as shown in Figure 2.13, is often used [63]. In this circuit, QRB and QRBb provide the current that flows through the bit lines. The majority of this current also flows through the collector resistors, producing a differential voltage across O and Ob that is proportional to the difference in current flowing on the two read bit lines. Although this type of sense amplifier provides no improvement in transimpedance over that of the pull-up resistors, it isolates the bit lines from the differential output voltage. Therefore, the capacitance on each sense amplifier output node is significantly reduced with respect to the capacitance on each read bit line. The sense amplifier current sources provide a small amount of current to allow QRB and QRBb to remain forward biased regardless of which bit line is conducting current. Therefore, the VBE of the device conducting current from one of the bit lines is not much larger than the VBE of the device that is only conducting current from one of the current sources. This means that only a small voltage shift on the bit lines is required to produce the desired response in QRB and QRBb. These devices, in turn, can rapidly produce a change in the output differential voltage due to the lower capacitance on nodes O and Ob with respect to the capacitance on the bit lines. This occurs much more quickly than if pull-up resistors attached to the bit lines were used to produce an identical output differential voltage directly on the bit lines.

Figure 2.13: Schematics for a common-base sense amplifier.

            A problem with the sense amplifier shown in Figure 2.13 is that it is susceptible to changes in the common-mode voltage on the bit lines due to noise. If RB and RBb both rise due to noise coupling, for instance, VBE across each sense amplifier device is decreased, reducing the current flow through these devices. This leads to a reduction in the differential voltage at the output of the sense amplifier, which can lead to further noise problems with this signal pair. This problem can be alleviated somewhat by increasing the bias currents in the sense amplifier to improve the common-mode rejection. However, a better alternative is to use a cross-coupled cascode sense amplifier, shown in Figure 2.14, which has much better common-mode rejection [57]. This circuit functions in a manner similar to the sense amplifier shown in Figure 2.13. That is, the majority of the current flowing through the read bit lines is provided by QRB and QRBb in this case as well, producing a differential voltage across O and Ob that is proportional to the difference in current flowing on the two read bit lines. Rather than using a fixed bias voltage for QRB and QRBb, however, a cross-coupled diode circuit is used to allow the voltages on the bit lines to influence the biasing. This scheme improves the common mode rejection of the sense amplifier. For instance, if RB and RBb both rise due to noise coupling in this case, the cross-coupled diodes force the bias voltages at the base of both QRB and QRBb to rise as well, resulting in no significant change in the voltage across the base-emitter junctions of these devices. Therefore, there is no significant change in the current flowing through QRB and QRBb, and as a result, no significant change in the output differential voltage of the sense amplifier.

Figure 2.14: Sense amplifier schematics.

            Since the voltage of every read bit line is biased about one VBE drop below VCC by the sense amplifiers, it is necessary to insure that the memory cell current switches that direct current between the read bit lines are driven at level 2 to avoid saturation. Although this could be done by connecting TW of the memory cells to VCC and using emitter followers to level shift the voltages at nodes MC and MCb to level 2, it would require an additional four devices in every memory cell and result in significantly larger power dissipation. Instead, a single diode was connected between VCC and TW for each row of memory cells to shift the level of MC and MCb to level 2 for every memory cell. This solution only requires 32 diodes capable of handling the current drawn by the write word line drivers. Because MC and MCb are at level 2, the memory cell devices QF and QFb are driven at level 2. For this reason, it is necessary for the write word line drivers to receive level 3 inputs to prevent devices in the write word line driver from saturating. Because the memory cell current switches that direct current between the read bit lines are driven at level 2, it is also necessary for each read word line driver to receive a level 3 input to prevent the read word line driver devices from saturating. Finally, since MC and MCb are at level 2 and the write word line drivers receive level 3 inputs, it is necessary to drive the memory cell devices QD and QDb at level 2 by the bit line drivers. Driving QD and QDb at level 1 would saturate the devices, while driving them at level 3 would saturate devices in the write word line drivers.

            SPICE simulations were performed to determine the optimal value for the sense amplifier bias current sources. These simulations computed the propagation delay through a memory cell and sense amplifier as well as the bit line voltage swing as a function of the value of each sense amplifier bias current source. The memory cell propagation delay varies as a function of the sense amplifier biasing since the sense amplifier bias determines the bit line swing and, therefore, the loading characteristics presented to the memory cells in the corresponding column. The propagation delay of the sense amplifier is the interval between the point in time when the differential current on the bit lines becomes zero and the point in time when the differential output voltage of the sense amplifier becomes zero. The simulation results, shown in Figure 2.15, indicate that the propagation delay through the memory cell and sense amplifier decreases as the current levels in the sense amplifier bias current sources are increased. This trend is observed in simulations including capacitive wire parasitics, as well as in simulations without wire parasitics. As Figure 2.16 indicates, the steady state differential voltage across the read bit lines also decreases as the current levels in the sense amplifier bias current sources are increased. This decrease directly translates into a decrease in the propagation delays through the memory cell and sense amplifier since a decrease in the voltage swing on the bit lines results in a decrease in the amount of charge transfer between the bit lines. This allows a memory cell to switch the differential voltage on the bit lines more quickly, resulting in a lower propagation delay through the affected circuits. A value of about 90 mV was chosen as the bit line voltage swing. Although bit line voltage swings as low as 30 mV have been used with this style of sense amplifier [57], a higher value was chosen here to reduce the risk of noise-related problems in the circuit. From Figure 2.15, the sense amplifier bias current sources should have a value of about 30 µA to produce a 90 mV voltage swing on the bit lines. Using this bias value, the propagation delay through the memory cell is 56 ps for a read operation, as mentioned earlier, and the propagation delay through the sense amplifier is 41 ps.

Figure 2.15: Propagation delay of the memory cell and sense amplifier as a function of the value of each sense amplifier bias current source.

Figure 2.16: Steady state differential voltage across a pair of read bit lines as a function of the value of each sense amplifier bias current source.

            Determining the current flow through each of the devices in the sense amplifier analytically under steady state conditions is not trivial. In performing the derivation, it is assumed that the collector current through a device as a function of its VBE can be expressed as shown in Equation 14 (see Chapter 4), and that βDC approaches infinity. If one assumes that current is flowing along the RB bit line, by inspection of Figure 2.14,

 

[Equation 1], and

[Equation 2].

Also by inspection of Figure 2.14, one finds that

 

[Equation 3], and

[Equation 4],

assuming RD and RDb are equal. Substituting Equation 14 for each VBE variable and eliminating IQD and IQDb produces

 

[Equation 5] and

[Equation 6].

Although a closed form solution cannot be determined for either ICQRB or ICQRBb, from the above equations, numerical methods can be used to find solutions for these variables when IRB, IBIAS, and RD are specified. Having computed ICQRB and ICQRBb, it is then possible to compute the steady state differential voltage across the bit lines, which is

 

[Equation 7].

            Using Equations 1 through 7 and SPICE, the values for the current through the devices in the sense amplifier shown in Figure 2.14 and the resulting bit line differential voltage were determined. The results, listed in Table 2.1, show that, although the currents in three of the devices were predicted accurately with analytical methods, the large error in ICQRBb produces a significant error in the estimation of ΔVRB. Despite the error in the determination of ΔVRB using analytical methods, however, the result is a reasonable estimate of the differential bit line voltage. Computing the propagation delay through the sense amplifier analytically is much more complicated, requiring SPICE simulations to determine this result.

 

            Analytic     SPICE       Error
ICQRB       0.49 mA      0.49 mA     1.6%
ICQRBb      3.8 µA       2.4 µA      61%
IQD         26 µA        28 µA       5.3%
IQDb        73 µA        74 µA       2.3%
ΔVRB        -114 mV      -93 mV      23%

Table 2.1: Sense amplifier steady state bias conditions computed using analytical methods and SPICE simulations assuming a sense amplifier bias current of 30 µA.

2.7.2        Version 2

            An opportunity to fabricate a second version of the register file became available approximately one year after the tape-out of the first fabrication run. The register file design was made a bit more aggressive on this run by lowering the voltage swing on the read bit lines. Based on the simulation results shown in Figure 2.15, the bit line voltage swing was reduced to about 40 mV, resulting in an increase in the value of each sense amplifier bias current source to 200 µA. Simulations show this decreases the memory cell propagation delay to 47 ps and decreases the sense amplifier propagation delay to 36 ps. The risk of noise problems that may occur if the bit line voltage swing is decreased below 40 mV and the increased power required as a result of the increase in the sense amplifier bias current sources above 200 µA were the primary reasons why the bias current for the sense amplifiers was not increased further.

 

            Analytic     SPICE       Error
ICQRB       0.56 mA      0.55 mA     1.9%
ICQRBb      49 µA        48 µA       2.1%
IQD         15 µA        17 µA       13%
IQDb        18 µA        21 µA       13%
ΔVRB        -51 mV       -44 mV      15%

Table 2.2: Sense amplifier steady state bias conditions computed using analytical methods and SPICE simulations assuming a sense amplifier bias current of 200 µA.

            Using Equations 1 through 7 and SPICE, the values for the current through the devices in the sense amplifier shown in Figure 2.14 with 200 µA bias current sources and the resulting bit line differential voltage were determined as well. The results, listed in Table 2.2, show that the prediction of ICQRBb using the analytical model is improved in the second design. This is probably due to the fact that the larger value of ICQRBb in the second design caused the associated device to behave more like an ideal transistor. This may also be the reason why the prediction of the diode currents IQD and IQDb was less accurate using the analytical method in the second design. The overall error in the prediction of device currents was lower in the second design, however, resulting in a more accurate prediction of the bit line voltage swing for the second design using analytical methods.

2.8        Output latch design

            A set of output latches is provided for each read port to capture the register file output. Normally this output comes from the sense amplifiers. However, when a write operation is occurring in a row that is also being read by one of the read ports, the normal read access is delayed while new data is written into the memory cells. This worst case scenario will limit the read access time of the register file under practical circumstances since the required read access time for most designs must be met under all circumstances. To prevent the read access time from being limited by this special case, it is possible to bypass the memory cells in this case and send the data on the write bit lines directly to the output latches as well as the memory cells. To implement this scheme, a 2-to-1 multiplexer is required for each output latch to select whether the data to be stored in the latch is coming from a sense amplifier or a bit line driver. This multiplexer can be integrated into the output latch current tree [57],[59]-[61], resulting in the output latch circuit shown in Figure 2.17. This circuit operates in a manner similar to that of the D-latch described above, with the exception that when W3 is true, indicating a value is being written into the latch, the differential voltage between the select line pair M2 and M2b determines whether I1 or I0 is written. I0 receives data from a sense amplifier while I1 receives data from a pair of write bit lines through a CML buffer (see Figure 1.1). The output latch is designed to have an output voltage swing of 0.25 V. The simulated propagation delay from D to O is 28 ps, while a change in M2 that causes a change in O has a simulated propagation delay of 37 ps.

Figure 2.17: Output latch schematics.
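
The function of the multiplexed output latch can be summarized with a behavioral model. This is a logic-level sketch only: the class name `OutputLatch` is hypothetical, and the select polarity (M2 true selects I1, the bypass data) is an assumption, since the text states only that M2 and M2b choose between I1 and I0.

```python
class OutputLatch:
    """Behavioral model of a D-latch with an integrated 2-to-1 multiplexer."""

    def __init__(self) -> None:
        self.q = 0  # stored output value O

    def update(self, w3: bool, m2: bool, i0: int, i1: int) -> int:
        """Apply one set of input levels and return the latch output.

        w3: latch is transparent (a value is being written) when true.
        m2: selects I1 (write bit line data) when true, I0 (sense
            amplifier data) when false; this polarity is assumed.
        """
        if w3:
            self.q = i1 if m2 else i0
        return self.q


latch = OutputLatch()
print(latch.update(w3=True, m2=False, i0=1, i1=0))   # 1: sense amplifier data
print(latch.update(w3=True, m2=True, i0=1, i1=0))    # 0: bypass data
print(latch.update(w3=False, m2=False, i0=1, i1=1))  # 0: value is held
```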

2.9        Comparator design

            To determine whether the data written into the output latches of a read port should come from the sense amplifiers of that port or the bit line drivers, it is necessary to compare the read address of that port with the write address to determine whether or not they match. It is also necessary to check the write enable signal to determine whether or not a write operation is occurring. The circuit shown in Figure 2.18 provides the gate level schematics for a comparator that is capable of determining whether a pair of 5-bit addresses match and also whether a write operation is occurring. Five exclusive-NOR (XNOR) gates are used to compare each pair of bits from the two addresses. Three AND gates are used to determine whether or not all five XNOR gate output values are true (indicating the two addresses match) and whether or not a write operation is enabled. In the case where the two addresses match and a write operation is enabled, the circuit output is true. All the logic gates in the comparator are differential ECL or CML circuits. Two buffers are driven in parallel by the output of the last AND gate since the comparator must broadcast its result to 32 output latch select lines. Using the two buffers, the fanout of each buffer is only 16 loads.

Figure 2.18: Gate level schematics for a comparator circuit to determine whether a read address matches the write address and a write is enabled.
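
The comparator function can be sketched at the gate level as follows. The function name `bypass_select` is hypothetical, and the grouping of the six signals into two 3-input AND gates followed by a 2-input AND gate is an assumption; the text states only that three AND gates combine the five XNOR outputs with the write enable signal.

```python
def bypass_select(read_addr: int, write_addr: int, write_enable: bool) -> bool:
    """True when a 5-bit read address matches the write address and a
    write operation is enabled."""
    # Five XNOR gates, one per address bit pair.
    xnor = [not (((read_addr >> k) & 1) ^ ((write_addr >> k) & 1))
            for k in range(5)]
    # Three AND gates (grouping assumed for illustration).
    and_a = xnor[0] and xnor[1] and xnor[2]
    and_b = xnor[3] and xnor[4] and write_enable
    return bool(and_a and and_b)


print(bypass_select(0b10110, 0b10110, True))   # True
print(bypass_select(0b10110, 0b10110, False))  # False
print(bypass_select(0b10110, 0b10111, True))   # False
```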

2.10    Current source design

            Each current source for the register file and test circuits is essentially a current mirror with emitter degeneration resistors, as Figure 2.19 illustrates [64]. The reference generator circuit provides a reference voltage (VREF) that is applied to the base of each current source device (QCSx). Assuming RE and REREF are equal, the current flowing through each current source device is nearly identical to the reference current (IREF) through QREF. The value of IREF (and hence IOx) is determined for the most part by the value of RREF. Assuming the β of each HBT approaches infinity and RE is equal to REREF, the value of RREF required to achieve the desired current flow (IREF) through the current sources is

[Equation 8].

Since VBE varies as a logarithmic function of IC in a bipolar device and the value of VBE is small relative to the power supply voltage, using an average value of VBE for a forward biased HBT is generally sufficient to obtain a reasonable estimate of RREF. The purpose of QBCC is to diminish the amount of base current from the current sources that flows through RREF. This reduces the change in IREF as a result of varying the number of current sources driven by the reference circuit by a factor of β+1. In this way, the addition of QBCC reduces the dependence of the current flowing through the current sources on the number of current sources driven by the reference circuit [64],[65]. Values of resistors used in the current source reference generators to provide current sources at a number of different current levels are listed in Table 2.3. The values for RREF generated using Equation 8, assuming the VBE values are 0.9 V, were used for the most part in the current source reference generator designs. Some modifications were made based on SPICE simulation results.

Figure 2.19: Current source implementation.

Current   REREF    RREF      Equ. 8       Current   REREF    RREF      Equ. 8
100 µA    4 kΩ     25 kΩ     23 kΩ        2 mA      200 Ω    1.2 kΩ    1.2 kΩ
900 µA    440 Ω    2.6 kΩ    2.6 kΩ       4 mA      100 Ω    590 Ω     580 Ω
1 mA      400 Ω    2.3 kΩ    2.3 kΩ

Table 2.3: Resistor values used in some current source reference generators to achieve the specified current level. Current sources using the reference generators are assumed to have RE values equal to REREF. Values generated using Equation 8 assume VBE is 0.9 V.
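
The kind of estimate that Equation 8 provides can be illustrated with a short calculation. The sketch below rests on two assumptions that are not stated in this section: that the reference branch consists of RREF in series with two base-emitter drops (QBCC and QREF) and REREF, and that the total supply voltage across this branch is 4.5 V. The function name `rref_estimate` and both parameter defaults are hypothetical.

```python
def rref_estimate(i_ref: float, re_ref: float,
                  v_supply: float = 4.5, v_be: float = 0.9) -> float:
    """Estimate RREF (ohms) for a desired reference current (amperes).

    Assumed branch: supply -> RREF -> VBE(QBCC) -> VBE(QREF) -> REREF.
    v_supply = 4.5 V is an assumed value, not taken from the text.
    """
    return (v_supply - 2.0 * v_be - i_ref * re_ref) / i_ref


# (IREF in A, REREF in ohms) pairs taken from Table 2.3.
for i_ref, re_ref in [(100e-6, 4e3), (900e-6, 440.0), (1e-3, 400.0),
                      (2e-3, 200.0), (4e-3, 100.0)]:
    print(f"{i_ref * 1e3:5.2f} mA -> {rref_estimate(i_ref, re_ref):8.0f} ohm")
```

Under these assumptions the estimates fall within a few percent of the "Equ. 8" column of Table 2.3, which suggests, but does not confirm, that the assumed branch and supply voltage are close to those used in the design.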

            Some mismatch in the current flowing through each current source with respect to the reference current is observed since, in general, VOx does not equal VCQ. Assuming a particular current source device is not saturated, the variance in current through the current source from the reference current is a function of the current source output resistance and the difference between VOx and VCQ. One would like a large current source output resistance in order to produce only a small variation in the current flowing through the current source as a function of the variation in the voltage applied across the current source. The purpose of the emitter degeneration resistor in a particular current source is to improve the output resistance of the current source. In a simple current mirror without emitter degeneration resistors, the output resistance is approximately equal to the output resistance of the bipolar device (ro) used to implement the current source. By constructing a small signal circuit for a current source in Figure 2.19, as shown in Figure 2.20, using a hybrid pi model for the HBT and assuming an ideal VREF [65], it can be demonstrated that the output resistance of the current source increases to

 

[Equation 9].

 

Figure 2.20: Circuit used to estimate the current source output resistance.

            In a typical current source design of this type, the voltage drop across RE is less than the VBE of a forward biased HBT. Assuming a voltage drop of VBE is available, a more sophisticated current source such as a cascode or Wilson current source can be used [64],[65], resulting in a much greater improvement in the current source output resistance. Since the voltage drop across RE is relatively small, RE is typically much smaller than rπ, resulting in a simplification of the current source output resistance [64],[65], producing

[Equation 10].

The product IO·RE is approximately the voltage dropped across RE. Therefore, if this value is on the order of VBE for a forward biased HBT, it is much greater than kT/q, resulting in a significant improvement in the current source output resistance with respect to the current mirror output resistance. For the register file current sources, RE is typically chosen to produce a voltage drop of 0.4 V across RE. This value is significantly lower than the VBE of a forward biased device, yet allows a substantial increase in the current source output resistance.
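
As a rough numerical illustration of this improvement, the following worked estimate assumes that the simplified output resistance takes the standard textbook form for an emitter-degenerated bipolar current source with RE much smaller than rπ, and that kT/q is about 25.9 mV near room temperature. Both are assumptions made here; the 0.4 V drop is the design value quoted above.

```latex
R_{out} \approx r_o \left( 1 + g_m R_E \right)
        = r_o \left( 1 + \frac{I_O R_E}{kT/q} \right)
\quad\Rightarrow\quad
\frac{R_{out}}{r_o} \approx 1 + \frac{0.4\ \mathrm{V}}{25.9\ \mathrm{mV}} \approx 16
```

Under these assumptions, the degenerated current source would have an output resistance roughly 16 times that of a simple current mirror built from the same device.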

            In the event that RE is not equal to REREF in the current source implementation shown in Figure 2.19, IOx is no longer equal to IREF. It is convenient to use different values for IOx and IREF when the desired value of IOx is either very large or small. For very large values of IOx, using a small value for IREF reduces power consumption. For very small values of IOx, a larger value for IREF is convenient to avoid using large resistors in the current source reference voltage generator. To determine the relationship between IOx and IREF, it is necessary to equate the sum of VBEQCSx and VRE with the sum of VBEQREF and VREREF [64]. By equating these terms and substituting current expressions for each voltage term (using Equation 14 for each VBE term), after some algebraic manipulation, one finds that

[Equation 11],

given “a” is the area of an emitter stripe and assuming β approaches infinity. The simplest way to design a current source with a value of IREF that differs from IOx is to use the ratio of IOx to IREF in choosing the device parameters. That is, one should set aOx / aREF equal to IOx / IREF, so that both devices operate at the same current density and the logarithmic term in Equation 11 approaches zero, as it does when IOx is the same as IREF and aOx is the same as aREF. Also, one should set REREF / RE equal to IOx / IREF to allow the same voltage drop across both resistors. Using these relationships, the desired ratio of IOx to IREF will be obtained according to Equation 11.
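
The design rule above can be checked numerically. The sketch below assumes that Equation 14 has the usual exponential form, IC = a·JS·exp(VBE / (m·kT/q)), so that equating the two voltage sums gives IOx·RE + m·VT·ln(IOx/aOx) = IREF·REREF + m·VT·ln(IREF/aREF), with the saturation current density JS cancelling. That form, the 300 K thermal voltage, and the function name `solve_output_current` are assumptions; the component values are those listed for the 17.2 mA source with an 8.6 mA reference.

```python
import math

VT_300K = 0.025852  # kT/q at 300 K in volts (assumed temperature)


def solve_output_current(i_ref: float, re_ref: float, r_e: float,
                         a_ref: float, a_out: float,
                         m: float = 1.0, vt: float = VT_300K) -> float:
    """Solve the assumed voltage balance for IOx by bisection (amperes)."""

    def mismatch(i_out: float) -> float:
        # Positive when the output branch voltage exceeds the reference
        # branch voltage; increases monotonically with i_out.
        return (i_out * r_e + m * vt * math.log(i_out / a_out)
                - i_ref * re_ref - m * vt * math.log(i_ref / a_ref))

    lo, hi = 1e-9, 1.0
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisect on a logarithmic scale
        if mismatch(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)


i_out = solve_output_current(i_ref=8.6e-3, re_ref=46.5, r_e=23.3,
                             a_ref=4.5, a_out=9.0)
print(f"{i_out * 1e3:.2f} mA")  # close to the 17.2 mA target
```

Because the area ratio and the resistor ratio both equal the current ratio in this case, the logarithmic term nearly vanishes and the result is insensitive to the assumed value of m.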

            The component values of three current sources in which IREF is less than IOx are listed in Table 2.4. These current sources, which are for the pad driver, the write word line drivers, and the read word line drivers, were designed using a lower value of IREF to save power since all three circuits draw a large amount of current. The reference current for the write word line drivers is twice that of the read word line drivers because the write word line driver current source reference voltage generator drives 32 current sources while the read word line driver current source reference voltage generator only drives one current source. Therefore, more current was used in the write word line current source reference voltage generator to make it less sensitive to fluctuations in the current drawn through the reference voltage line by the current sources. Note that the resistor and device size ratios mentioned above have been used, for the most part, in obtaining the desired ratio of IOx to IREF. A minimum length of 2.75 µm was required for the HBTs with two emitter stripes to produce layouts that meet the design rules, which is why some of the device size ratios are not well matched to the current ratios. The resistor values that are predicted by Equation 8 are higher than the resistor values predicted by SPICE, possibly because of a poor estimate of the VBE values.

IOx     | IREF   | RE     | REREF  | aOx   | aREF     | RREF  | RREF (Equ. 8)
8 mA    | 4 mA   | 50 Ω   | 100 Ω  | 5 µm² | 2.75 µm² | 587 Ω | 628 Ω
17.2 mA | 8.6 mA | 23.3 Ω | 46.5 Ω | 9 µm² | 4.5 µm²  | 245 Ω | 267 Ω
17.2 mA | 4.3 mA | 23.3 Ω | 76.2 Ω | 9 µm² | 2.75 µm² | 440 Ω | 552 Ω

Table 2.4: Resistor values used in some current sources to achieve the specified current level when IOx is not equal to IREF. Values obtained using Equation 8 assume VBE is 0.9 V.
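How closely each design in Table 2.4 follows the ratio rule can be checked with a few lines of arithmetic. The sketch below only restates the table entries and divides them; the variable and function names are hypothetical, and the ratios are dimensionless, so the units of the table entries do not matter.

```python
# Rows of Table 2.4: (IOx, IREF, RE, REREF, aOx, aREF).
# Currents in amperes, resistances in ohms, areas in the units of the table.
TABLE_2_4 = [
    (8.0e-3, 4.0e-3, 50.0, 100.0, 5.0, 2.75),
    (17.2e-3, 8.6e-3, 23.3, 46.5, 9.0, 4.5),
    (17.2e-3, 4.3e-3, 23.3, 76.2, 9.0, 2.75),
]


def ratios(row):
    """Return (current ratio, resistor ratio, emitter area ratio) for one row."""
    i_ox, i_ref, r_e, r_eref, a_ox, a_ref = row
    return i_ox / i_ref, r_eref / r_e, a_ox / a_ref


if __name__ == "__main__":
    for row in TABLE_2_4:
        current, resistor, area = ratios(row)
        print(f"IOx/IREF = {current:.2f}  REREF/RE = {resistor:.2f}  aOx/aREF = {area:.2f}")
```

In the first and third rows the area ratio falls short of the current ratio, which is consistent with the minimum device length noted above.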

            Because of the small bias currents in the sense amplifier, it is inconvenient to design a current source with a reference current equal to the current drawn by the current sources. This is because the value of RREF under these conditions becomes prohibitively large, thereby wasting chip real estate. For this reason, current sources similar in design to a Widlar current source were used to bias the sense amplifiers [64],[65]. A Widlar current source is like the current source shown in Figure 2.19, with the exception that REREF is zero. Although a traditional Widlar current source does not contain QBCC, but instead provides a short circuit between the base and collector of QREF, the sense amplifier current sources employ QBCC in the reference voltage generators. The relationship between IREF and IOx in the Widlar current source can be derived from Equation 11 by setting REREF equal to zero and solving for IREF, yielding

 

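$$I_{REF} = I_{Ox}\,\frac{a_{REF}}{a_{Ox}}\,\exp\!\left(\frac{q\,I_{Ox}R_{E}}{m\,k\,T}\right). \qquad (12)$$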

            Since 1.6 kΩ resistors are already used in the sense amplifier for RD and RDb, the value of RE was chosen to be 1.6 kΩ for the sense amplifier bias current sources as well. This is because it is the largest resistor that fits well in the sense amplifier layout. Using Equation 12, IREF for the sense amplifier bias current sources is determined to be 200 µA assuming m is 1, or 100 µA assuming m is 1.5 (see Chapter 4). The actual value of IREF used was 460 µA, which produced a value of 31 µA for the sense amplifier current sources according to SPICE. Based on the SPICE simulations, RREF was set to 6.2 kΩ to produce this value for IREF. Therefore, the transistor behavior modeling is critical in determining the correct value for IREF to produce the desired value of IOx for this type of current source. If the transistor is not modeled well analytically, and the wrong value of m is chosen, significant variation from the expected value of IOx, as a result of the poor choice of IREF, will occur.

            As mentioned earlier, the bias current sources of the sense amplifiers were adjusted for the second register file fabrication run. Since altering the layout of the sense amplifier cells would be difficult, the current through the bias current sources was adjusted solely through modifications to the bias current source reference voltage generators. This was accomplished by adjusting IREF to 0.9 mA for each reference voltage generator and by adding an emitter degeneration resistor (REREF). The value of REREF was chosen such that the bias current sources draw 0.2 mA. Using Equation 11 and assuming m is 1.5, REREF was computed to be 290 Ω. Assuming m is 1, REREF was computed to be 310 Ω. According to SPICE, assuming IREF is 0.9 mA, IOx is 0.2 mA, and RE is 1.6 kΩ, REREF must be 340 Ω. Therefore, transistor modeling is important in correctly determining the appropriate resistor values for this current source as well. Using Equation 8 and assuming VBE is 0.9 V, RREF was determined to be 2.7 kΩ to allow IREF to be 0.9 mA. The value of RREF required to allow IREF to equal 0.9 mA is also 2.7 kΩ according to SPICE.
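As a rough cross-check of the analytical figures above, the sketch below solves for REREF under an assumed form of Equation 11, namely IOx·RE = IREF·REREF + m·(kT/q)·ln(IREF/IOx). This form presumes equal emitter areas and a thermal voltage of about 25.85 mV (roughly 300 K). Both the equation form and the temperature are assumptions made for illustration, since Equation 11 is not restated in this section, and the function name is hypothetical.

```python
import math

V_T = 0.02585  # assumed thermal voltage kT/q in volts (about 300 K)


def reref_for_target(i_ref, i_ox, r_e, m, v_t=V_T):
    """Solve the assumed form of Equation 11 for REREF, in ohms.

    Assumed form (equal emitter areas):
        i_ox * r_e = i_ref * reref + m * v_t * ln(i_ref / i_ox)
    """
    return (i_ox * r_e - m * v_t * math.log(i_ref / i_ox)) / i_ref


if __name__ == "__main__":
    for m in (1.5, 1.0):
        reref = reref_for_target(i_ref=0.9e-3, i_ox=0.2e-3, r_e=1.6e3, m=m)
        print(f"m = {m}: REREF is roughly {reref:.0f} ohms")
```

Under these assumptions the results land close to the analytical values quoted above for the two values of m, and both remain below the SPICE value, which illustrates the sensitivity to the transistor model.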

2.11    Reference voltage generator design

            The schematics for a reference voltage generator that provides the reference voltage for a set of 32 single-ended ECL NOR gates are shown in Figure 2.21. QREF, QBCC, RREF, and RE form a current source reference voltage generator, as described above. This circuit provides VREF for the current sources of the first stage of a particular decoder, as well as the current sources involving devices QCCS and QECS in Figure 2.21. This is done to match the current flow through QCCS to the current flow through the current sources in the stage one decoder as closely as possible. The diode QD produces a voltage drop to decrease the voltage applied across the current source so that it more closely matches the voltage drop across each current source in the corresponding stage one decoder. This further improves the match in current flow between the two types of circuits.

Figure 2.21: Reference voltage generator schematics.

            The current source involving QCCS continuously draws current through two pull-up resistors connected in parallel, each of the same value as the pull-up resistors in the stage one decoders. Therefore, the voltage at CSW1 should be about midway between the high and low voltages observed at B and Bb in the stage one decoders. CSW1 drives an emitter follower circuit whose output is the reference voltage for the single-ended ECL NOR gates. The purpose of the emitter follower is to mimic the operation of a wired-OR line. That is, it is designed to produce a voltage drop between CSW1 and VREFCS similar to the VBE drop from the wired-OR line input (B or Bb) to the wired-OR line output. Two parallel HBT’s are used in the emitter follower since on average, a wired-OR line with a high output is driven high by two input devices which in turn are driven by CML buffers in the stage one decoder. Using this design, the value of VREF is approximately midway between the high and low signal voltage levels from the stage one decoders that drive the single-ended ECL NOR gates.
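The midpoint claim follows from simple arithmetic. As a sketch, let $I$ be the current drawn by the QCCS current source and $R_P$ the value of one pull-up resistor. Both symbols are introduced here for illustration only, and the sketch assumes that base currents are negligible and that a stage one decoder output steers all of its current through a single pull-up resistor when low:

```latex
\begin{aligned}
V_{high} &\approx V_{CC}, \qquad V_{low} \approx V_{CC} - I R_P, \\
V_{CSW1} &\approx V_{CC} - I \left( R_P \parallel R_P \right)
          = V_{CC} - \tfrac{1}{2} I R_P
          = \tfrac{1}{2} \left( V_{high} + V_{low} \right).
\end{aligned}
```

The emitter follower then shifts this level down by roughly one VBE, just as the wired-OR line shifts the signal levels.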

2.12    Device count

            The register file uses a total of 11,365 HBT’s and 3,767 resistors. Table 2.5 provides a summary of the number of devices used by each group of circuits within the register file. The majority of the devices are used in the memory cells. Most devices are the minimum size allowed by the design rules to minimize the space required for the register file. This not only conserves chip real estate, but helps to minimize wire lengths, which should improve chip performance as well. Devices were made larger than the minimum size only when required to handle large currents, as in the case of the devices in the write word line drivers, for instance.

Circuits                 | HBTs | Resistors | Circuits                     | HBTs | Resistors
memory cells             | 8224 | 2048      | read stage 2 decoders        | 448  | 192
write word drivers       | 96   | 64        | write bit line drivers       | 224  | 160
read word drivers        | 70   | 8         | feed forward buffers         | 192  | 192
sense amplifiers         | 384  | 384       | write stage 2 decoders       | 320  | 160
stage 1 decoders         | 141  | 45        | output latches               | 960  | 320
write enable decoder     | 7    | 2         | comparators                  | 188  | 88
wired-OR current sources | 36   | 36        | reference voltage generators | 75   | 67

Table 2.5: Number of devices used in the circuits comprising the register file.

2.13    Register file layout

            The register file memory cell layout, shown in Figure 2.22, determines, to a large extent, the size and framework of the overall register file layout. It has dimensions of 20 µm by 37 µm. The size of the memory cell is large compared to similar memory cells designed using CMOS devices. For instance, using a 0.6 µm BiCMOS technology, a functionally equivalent three-port memory cell was designed with dimensions of 19.8 µm by 15.1 µm [41],[42]. The area of the bipolar memory cell is about 2.5 times larger than that of this CMOS memory cell. One factor that affects the size of the bipolar memory cell is the amount of steady state current that must be delivered to maintain the stored data in the memory cells, as well as the steady state current that must be delivered to perform read and write operations (17.2 mA in each case). Since this current is delivered by one or more word lines for a particular operation, the word lines were implemented in metal 3. This is because metal 3 is thicker than the other metal layers and, therefore, has a lower resistance as a function of the length of a wire segment for a given width when compared to metal 1 and metal 2 wires. Even using metal 3 for the word lines, however, the width of each wire was still fairly wide in order to prevent a high voltage drop over the length of each wire and to satisfy the electromigration rules. For this reason, the metal 3 word lines were the limiting factor in minimizing the memory cell length in the north-south direction.

Figure 2.22: Memory cell layout (north facing left).

            Another factor that affected the size of the memory cell is the size of each SiGe HBT. Since the metal 2 bit lines only carry a small amount of current (about 0.54 mA), they can be minimally sized and, therefore, have a minimal impact on the width of the memory cell in the east-west direction. Instead, the widths of the HBT’s and lengths of the resistors limit the size of the memory cell in the east-west direction. Also, although the memory cell size is limited by the metal 3 word lines in the north-south direction, the lengths of the HBT’s and widths of the resistors would have made it difficult to make the memory cell any smaller in the north-south direction had the metal 3 word lines not been the limiting factor. Therefore, the bipolar memory cell is 2.5 times larger than the cited CMOS memory cell largely because a SiGe HBT with an emitter area of 0.5 µm by 1 µm has a total area of about 6.5 µm by 4.7 µm, while a typical CMOS device in the cited CMOS memory cell with a gate area of about 0.6 µm by 3 µm has a total area of about 3 µm by 3 µm. The large aspect ratio of the bipolar memory cell (about 1.9 to 1) is due to the fact that the memory cell size is constrained in the north-south direction by the metal 3 word lines, while the HBT’s can be placed under the metal 3 lines in a manner that does not impact the length of the memory cell in the north-south direction, and yet does not require nearly as much space in the east-west direction.
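The size comparisons in this section reduce to a few products and quotients. The sketch below reworks them from the dimensions quoted in the text; the constant and function names are hypothetical, and all lengths are taken in the same unit, so the ratios are dimensionless.

```python
# Dimensions as quoted in the text, all in the same length unit.
BIPOLAR_CELL = (20.0, 37.0)
CMOS_CELL = (19.8, 15.1)
HBT_FOOTPRINT = (6.5, 4.7)
CMOS_DEVICE_FOOTPRINT = (3.0, 3.0)


def area(dims):
    """Area of a rectangle given as (side, side)."""
    width, height = dims
    return width * height


cell_ratio = area(BIPOLAR_CELL) / area(CMOS_CELL)
device_ratio = area(HBT_FOOTPRINT) / area(CMOS_DEVICE_FOOTPRINT)
aspect_ratio = max(BIPOLAR_CELL) / min(BIPOLAR_CELL)

if __name__ == "__main__":
    print(f"memory cell area ratio (bipolar / CMOS): {cell_ratio:.2f}")
    print(f"device footprint ratio (HBT / CMOS):     {device_ratio:.2f}")
    print(f"bipolar cell aspect ratio:               {aspect_ratio:.2f}")
```

The cell area ratio comes out just under 2.5, while the single-device footprint ratio is somewhat larger, which fits the argument that device size is a major contributor to the cell area.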

            The layout for the entire register file is shown in Figure 2.23. The overall register file layout dimensions are about 1.0 mm by 1.8 mm. The memory cells occupy the majority of this area. Since the word lines run from east to west, it is convenient to put the word line drivers on the east and west sides of the memory cell array. Since each write word line driver requires three devices capable of handling 17.2 mA of current, while each read word line driver only requires one such device, the write word line drivers were placed on the west side of the memory cell array, while both read word line drivers were placed on the east side of the array. The diode driving the top word line was also placed on the east side of the memory cell array. This was the most convenient way to fit the large devices along the sides of the memory cell array. The write address decoder was placed west of the write word line drivers, while the read address decoder was placed east of the read word line drivers to minimize the length of the wires connecting the circuits. For this reason, the write address input lines and the write enable line are located on the west side of the register file, while the read address input lines are located on the east side of the register file.

Figure 2.23: Register file layout.

            The read port A sense amplifiers and output latches, as well as the bit line drivers, are located north of the memory cell array, while the read port B sense amplifiers and output latches, as well as the comparators, are located south of the array. Therefore, the data input lines and the read port A data output lines are located at the north side of the register file, while the read port B data output lines are located at the south side of the register file. The circuitry that provides output data for the two read ports is located on opposite sides of the register file because the narrow width of the memory cells made it nearly impossible to place both sets of circuitry on the same side of the register file in a convenient manner. Both comparators were placed on the same side of the register file to minimize the length of the write address input lines, which drive both the write address decoder and both comparators. The length of the read address input lines, which drive both a read address decoder and one of the comparators, on the other hand, was not significantly affected by this choice of placement for the comparators.

 


 

Chapter 3   Test Chip Design

 

A circuit is of no use to the designer if it cannot be adequately tested using the equipment available to the designer. In the case of the register file circuit, both performance testing and functional testing are necessary to properly evaluate the design. Given the limited test equipment available, test circuitry was required on the register file chip to perform the necessary tests. In addition to the register file test chip, two other test chips were designed to evaluate the performance of the SiGe HBT BiCMOS technology using some simple circuits.

3.1        Register file test chip overview

            The purpose of the register file test chip (RFTC) is to provide a means of testing both the functionality and performance of the register file. The testing scheme is similar to a scheme developed to test a 2-Kb RAM implemented using an AlGaAs/GaAs HBT technology [66],[67]. A block diagram of the test chip is shown in Figure 3.1. During normal operation, three 5-bit counters supply the register file with repeating patterns of sequential addresses for read and write operations. A rotator supplies the register file columns with a repeating 8-bit pattern that is used as input data for write operations. Using these circuits, data can be written into the register file and then read using either read port to determine whether or not the register file is functioning properly. A scan mode also exists in which the counters and rotator are linked in a scan chain that allows data to be serially loaded into the counters and rotator. This is useful for providing specific patterns of input data for the rotator as well as for offsetting the addresses of the counters by a set amount. Table 3.1 lists the register file test chip I/O pads, along with descriptions of their functions.

Figure 3.1: Test chip block diagram.

            Each address counter is designed as a 5-bit state machine. This means that each counter is composed of a 5-bit register, which stores the state of the counter, and a next state decoder, which computes the new state to load into the register during the next clock cycle [58]. Edge-triggered latches are required in the register implementation for the state machine to function properly. These types of latches capture and store data on each rising or falling clock edge. Without this capability, data appearing at the outputs of the register can propagate back through the next state decoder and alter the original data written into the latches before the write operation of the original data has completed. One method of approximating edge-triggered behavior is to use what is known as a master-slave latch. This latch is simply two D-latches connected in series, where the clock driving the first latch (the master) is inverted with respect to the clock driving the second latch (the slave). If the clocks are arranged such that the slave latch writes when the clock is high and the master latch writes when the clock is low, data to be latched into the master-slave latch is first stored in the master when the clock is low. When the clock becomes high, data stored in the master is then written into the slave, and the master stops writing new data. Therefore, if the input data changes after the initial rising edge of the clock, the data is not passed on to the slave, resulting in the preservation of the data written into the slave at the rising edge of the clock. At the falling edge of the clock, data can again be written into the master, but at this point the slave has stopped writing data. Therefore, the new data is not written into the slave until the next rising clock edge. A falling edge-triggered master-slave latch can be made simply by allowing the master to write when the clock is high and allowing the slave to write when the clock is low.
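The master-slave behavior described above can be illustrated with a small zero-delay behavioral model. This is only a sketch of the logical behavior, not of the CML circuit; the class names are hypothetical, and the model ignores propagation delay and setup or hold times.

```python
class DLatch:
    """Level-sensitive D latch: transparent while enable is true."""

    def __init__(self):
        self.q = 0

    def update(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveLatch:
    """Rising edge-triggered behavior built from two D latches."""

    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d, clk):
        # The master writes while the clock is low, the slave while it is high.
        master_q = self.master.update(d, enable=not clk)
        return self.slave.update(master_q, enable=bool(clk))


def run(samples):
    """Apply a list of (data, clock) samples and return the slave outputs."""
    latch = MasterSlaveLatch()
    return [latch.update(d, clk) for d, clk in samples]


if __name__ == "__main__":
    # Data falls to 0 while the clock is still high; the output keeps the
    # value captured at the rising edge until the next rising edge.
    print(run([(1, 0), (1, 1), (0, 1), (0, 0), (0, 1)]))
```

Swapping the two enable conditions gives the falling edge-triggered variant mentioned at the end of the paragraph.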

Pad Name       | Description
Analog Control | Selects VCO frequency within selected band.
Clock Select   | Selects between VCO and external clock.
External Clock | Clock for shift operations. Can be used in test mode as well.
External Write | Enables an asynchronous write operation.
Output Select  | Selects signal that is driven off-chip for viewing.
Scope Output   | Produces an output signal to be viewed on an oscilloscope.
Scan           | Selects between scan and test modes of operation.
Shift In       | Provides data for the scan chain and selects the VCO frequency band.
Write Delay    | Selects clock delay to optimize synchronous write operations.
Write Enable   | Enables synchronous write based on the selected clock.
Write Select   | Selects between external and synchronous write signals.

Table 3.1: List of register file test chip pads and their functions.

3.2        Address counter design

            The gate level schematics for the read address counter, shown in Figure 3.2, contain five rising edge-triggered master-slave latches, which store the state of the counter. The next state decoder is simple conceptually. During normal operation, for each bit, if every lower order bit is high, the bit should change its state at the next rising clock edge. In all other cases, the bit should keep the same state. For the least significant bit, this implies that this bit should change state on every rising clock edge. This algorithm will produce a 32-cycle repeating pattern from 0 to 31 using a five-bit binary format, which is stored in the master-slave latches. The next state logic is implemented using XOR and AND gates, some of which are combined in a single current tree. The XOR gates serve as programmable inverters. That is, assuming one input terminal of the XOR gate is the data and the other input terminal is the control, when the control input terminal is low, the data passes through the XOR gate unaffected. However, when the control input terminal is high, the data going through the XOR gate is inverted. Therefore, an XOR gate can be used to change the state of a particular stored bit, which is fed into the XOR gate data input terminal, given the AND of all the lower order stored bits produces a high value, which is fed into the XOR control input terminal. Note that each master-slave latch has a 2-to-1 multiplexer built into the master latch. This allows input data from the next state decoder to be selected during normal operation to provide a counting operation, or data from the previous latch in the scan chain to be selected during scan mode. In scan mode, data shifts through the address counters from least significant bit to most significant bit. Since the read address counters must drive the read address decoders of the register file, level 2 outputs were included on every slave latch. The write address counter, shown in Figure 3.3, is identical to the read address counters with the exception that a level 3 output is provided as well to drive the register file comparators.

Figure 3.2: Read address counter gate level schematics.

Figure 3.3: Write address counter gate level schematics.
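The next state rule (toggle a bit when every lower order bit is high) can be sketched in a few lines. This is a behavioral illustration only; the function names are hypothetical, and the list index 0 is taken to be the least significant bit.

```python
def next_state(bits):
    """Return the next counter state; bits[0] is the least significant bit."""
    result = []
    all_lower_high = 1  # AND of all lower order bits (always 1 for the LSB)
    for bit in bits:
        # The XOR acts as a programmable inverter controlled by the AND term.
        result.append(bit ^ all_lower_high)
        all_lower_high &= bit
    return result


def to_int(bits):
    """Interpret a bit list (LSB first) as an unsigned integer."""
    return sum(bit << position for position, bit in enumerate(bits))


def count_sequence(width=5):
    """Step a zeroed counter through 2**width clocks.

    Returns the visited values and the value reached after the last clock.
    """
    state = [0] * width
    visited = []
    for _ in range(2 ** width):
        visited.append(to_int(state))
        state = next_state(state)
    return visited, to_int(state)


if __name__ == "__main__":
    visited, wrapped = count_sequence()
    print(visited)
    print("value after the last clock:", wrapped)
```

With five bits the model visits every 5-bit address once before wrapping around to its starting value.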

3.3        Data rotator design

            The data rotator, shown in Figure 3.4, operates like a shift register. In scan mode, data is shifted from one latch to the next on every rising clock edge. This is also true in normal operation. However, in normal operation, the first latch in the chain stores data previously held by the last latch in the chain, allowing the data stored in the last latch to be shifted into the first latch. In this way, an 8-bit pattern that is shifted into the data rotator in scan mode will repeat itself in normal mode every eight cycles. Each slave latch provides a level 2 output to drive the register file bit line drivers.

Figure 3.4: Data rotator gate level schematics.
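The two modes of the rotator can be modeled as list operations. The sketch below assumes eight latches, as implied by the 8-bit pattern; the function names are hypothetical.

```python
def rotate(latches):
    """One clock in normal mode: the first latch takes the last latch's data."""
    return [latches[-1]] + latches[:-1]


def scan_shift(latches, shift_in):
    """One clock in scan mode: the first latch takes the serial input."""
    return [shift_in] + latches[:-1]


def load_pattern(pattern):
    """Shift a pattern into an initially cleared rotator, one bit per clock."""
    latches = [0] * len(pattern)
    for bit in pattern:
        latches = scan_shift(latches, bit)
    return latches


if __name__ == "__main__":
    state = load_pattern([1, 0, 1, 1, 0, 0, 1, 0])
    start = list(state)
    for _ in range(8):
        state = rotate(state)
    print("pattern repeats after eight clocks:", state == start)
```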

3.4        Write enable pulse generator design

            The write enable pulse generator, shown in Figure 3.5, produces the write enable signal that is used by the register file. When the write select signal is high, the write enable signal used by the register file is simply a delayed version of an externally applied write signal. When the write select signal is low, however, the write enable signal used by the register file is a pulsed signal that is derived from the test chip clock when the write enable signal is high. To generate the synchronized signal, an AND operation is performed between the write enable signal and the test chip clock, producing a square wave output similar to the test chip clock when the write enable signal is high. When the write enable signal is low, however, the write enable signal used by the register file is always low. The latch prevents runt write enable pulses from being produced at the AND gate output by ensuring that the write enable pulse at the AND gate input only has a transition immediately before the clock signal at the other AND gate input is asserted.

Figure 3.5: Write enable pulse generator schematics.

            The output of the multiplexer that selects between the external write signal and internally generated write signal drives a chain of CML buffers. The purpose of the buffers is to provide additional delay for the write enable pulse used by the register file to ensure that the appropriate address and data setup times are observed before a write operation is performed. The second and last buffers in the series drive a 2-input multiplexer that allows the amount of delay to be selected, depending on the value of the write delay signal.

3.5        Sampling latch design

            A set of 32 slave latches is provided for each read port. These latches, with schematics shown in Figure 1.5, receive data from the register file output latches. By clocking the latches such that the output latches write in data when the clock is high and the slave latches write in data when the clock is low, effectively the pairs of latches act as falling edge-triggered master-slave latches. This means that, after new read addresses are presented on each rising clock edge by the read address counters, the corresponding data read out of the register file read ports is latched on every falling clock edge. Therefore, assuming the proper data is latched within the time period allowed by the clock, the register file read access time is less than or equal to one half of the clock period. Because master-slave latches are used to capture the data read out of the register file, late arriving data cannot appear at the output of the slave latches until the next falling clock edge. Without the slave latches, after missing the window between the first rising clock edge and the subsequent falling clock edge, late data may become visible at the output terminals of the output latches at the next rising clock edge, making it more difficult to interpret the test results when trying to determine the minimum read access time.
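The bound on the read access time follows directly from the clock period. A minimal sketch is given below; it assumes a 50% duty cycle clock, which the text does not state, and the function name and the example frequency are purely illustrative.

```python
def max_read_access_time(clock_hz, duty_cycle=0.5):
    """Upper bound on read access time, in seconds, implied by a passing test.

    Addresses change on the rising edge and data is latched on the falling
    edge, so the access must complete within the high phase of the clock.
    The 50% default duty cycle is an assumption.
    """
    return duty_cycle / clock_hz


if __name__ == "__main__":
    example_hz = 1.0e9  # hypothetical test clock frequency
    bound_ps = max_read_access_time(example_hz) * 1e12
    print(f"access time bound: {bound_ps:.0f} ps")
```

Raising the clock frequency until the latched data becomes incorrect therefore brackets the minimum read access time.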

3.6        Viewing register file test chip signals

            Because of limited chip probing capability, only one output pad is available on the test chip. For this reason, a tree of multiplexers is used to select data to view from a particular register file column through either read port. The test chip design only allows one to observe the output of one half of the register file columns from each read port. Four 4-to-1 multiplexers are used to select the output of four columns from the 16 available columns for each read port. Another 4-to-1 multiplexer is used to select the output of one of the four selected columns for each read port. Other signals that can be observed include the most significant bit of each counter and the last rotator bit, which are selected using a 4-to-1 multiplexer. This multiplexer also receives input data from a 2-to-1 multiplexer, which is used to select between the clock and the read port A match signal for observation. Finally, a 4-to-1 multiplexer is used to select between the above-mentioned signals for observation through the pad driver. The manner in which these signals are selected is illustrated in Figure 3.1. Six select signals obtained from input pads control the selection through the multiplexer tree. The manner in which these signals determine the output signal selection is summarized in Table 3.2 through Table 3.4.

SELA3 | SELA2 | Description
0     | 0     | Clock or Match
0     | 1     | Read Port A
1     | 0     | Read Port B
1     | 1     | Counters or Rotator

Table 3.2: Output pad signal as a function of SELA3 and SELA2.

SELB3 | SELB2 | SELA3=0,SELA2=0 | SELA3=0,SELA2=1 | SELA3=1,SELA2=0 | SELA3=1,SELA2=1
0     | 0     | Clock           | Column          | Column          | Rotator
0     | 1     | Clock           | Column          | Column          | Read Counter B
1     | 0     | Match A         | Column          | Column          | Write Counter
1     | 1     | Match A         | Column          | Column          | Read Counter A

Table 3.3: Output pad signal as a function of SELB3, SELB2, SELA3, and SELA2.

SELB3,SELB2,SELC3,SELC2 | Column | SELB3,SELB2,SELC3,SELC2 | Column
0000                    | 0      | 1000                    | 16
0001                    | 2      | 1001                    | 18
0010                    | 4      | 1010                    | 20
0011                    | 5      | 1011                    | 21
0100                    | 8      | 1100                    | 24
0101                    | 10     | 1101                    | 26
0110                    | 12     | 1110                    | 28
0111                    | 13     | 1111                    | 29

Table 3.4: Column selection as a function of SELC3, SELC2, SELB3, and SELB2.
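The selection summarized in Table 3.2 through Table 3.4 can be written as a small lookup. The sketch below is one reading of those tables, not a description of the multiplexer circuits themselves. It assumes that the column entry of Table 3.3 applies to both read ports, and that the clock or match choice depends only on SELB3, as the table rows suggest. The function names and the returned strings are hypothetical.

```python
# Column offset within a group of eight, indexed by (SELC3, SELC2), per Table 3.4.
COLUMN_OFFSETS = {(0, 0): 0, (0, 1): 2, (1, 0): 4, (1, 1): 5}

# Counter or rotator choice, indexed by (SELB3, SELB2), per Table 3.3.
COUNTER_SIGNALS = {
    (0, 0): "Rotator",
    (0, 1): "Read Counter B",
    (1, 0): "Write Counter",
    (1, 1): "Read Counter A",
}


def selected_column(selb3, selb2, selc3, selc2):
    """Register file column observed for the given select bits (Table 3.4)."""
    return 16 * selb3 + 8 * selb2 + COLUMN_OFFSETS[(selc3, selc2)]


def output_signal(sela3, sela2, selb3, selb2, selc3, selc2):
    """Describe the signal routed to the output pad."""
    if (sela3, sela2) == (0, 0):
        return "Match A" if selb3 else "Clock"
    if (sela3, sela2) == (1, 1):
        return COUNTER_SIGNALS[(selb3, selb2)]
    port = "A" if (sela3, sela2) == (0, 1) else "B"
    column = selected_column(selb3, selb2, selc3, selc2)
    return f"Read Port {port}, column {column}"


if __name__ == "__main__":
    print(output_signal(0, 1, 0, 1, 1, 1))
    print(output_signal(1, 1, 1, 0, 0, 0))
```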

3.7        Pad receiver designs

            The circuit design for a level 2 pad receiver that accepts a single-ended signal from an input pad is shown in Figure 3.6. To provide protection against damage to the bipolar device connected to the pad, large diodes, known as electrostatic discharge (ESD) devices, are connected between the pad and VCC, and between the pad and VEE. The diodes are reverse biased under normal conditions and therefore have little influence on the normal circuit behavior. However, in the event of a large electrostatic build-up on the pad, the charge is shunted to either VCC or VEE through one of these diodes.

Figure 3.6: Schematics for a pad receiver with level 2 outputs.

            The input pad drives an emitter follower, which in turn drives one side of a single-ended ECL buffer. The emitter follower devices are much larger than the minimum size devices to make the device connected to the pad much more resistant to damage from electrostatic discharge. Another emitter follower circuit is used to generate a reference voltage (VREF) for a number of pad receiver circuits. Its input is tied to VCC while its output drives the other current switch terminal of the ECL buffer. Therefore, when the input pad voltage is VCC, both pad receiver output voltage levels are about equal. When the pad input voltage becomes significantly greater than VCC, the output differential voltage of the pad receiver indicates a high logic value. When the pad input voltage becomes significantly less than VCC, however, the output differential voltage of the pad receiver indicates a low logic value. If the input is left floating, a diode connected between VCC and the input pad is turned on, making the input pad voltage a diode drop below VCC. Therefore, the pad receiver produces a differential voltage corresponding to a low logic value when the input pad is left floating. For pad receivers requiring output levels other than level 2, the output emitter follower stages can be altered or removed. The input impedance of a pad receiver circuit is fairly high under normal conditions since the impedance looking into the base of the emitter follower input device as well as the impedance looking into the reverse biased diodes is high. This does not cause any serious problems when the pad is driven by a 50 Ω line, however, since the signals that drive the pad receiver circuits are control signals that are essentially static under normal testing conditions.

Figure 3.7: Schmitt trigger receiver schematics.

            Not all the test chip input signals are static. Signals such as SCAN and the external clock need fast rising and falling edges without hazards to minimize logic errors that result from distortions of these signals. For this reason, a Schmitt trigger receiver is used to produce a waveform with fast rise and fall times even if the input pad waveform rise and fall times are relatively slow. Also, through hysteresis, the Schmitt trigger receiver does not respond to most noise on the input signal that would cause logic hazards using a normal pad receiver. A circuit with hysteresis switches from a low state to a high state at a higher input threshold voltage than that which causes the circuit to switch from a high state to a low state. Therefore, after the input threshold is reached and the circuit switches to a high state, for instance, a large amount of noise is required to reach the threshold voltage required to switch the circuit output back to the low state.
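The effect of hysteresis on a noisy input can be illustrated with a behavioral comparator that uses two thresholds. This is a generic sketch rather than a model of the receiver circuit; the class name and the threshold values (given relative to VCC) are hypothetical.

```python
class HysteresisComparator:
    """Two-threshold comparator: switches high at v_rise, low at v_fall."""

    def __init__(self, v_rise, v_fall, state=False):
        if v_rise <= v_fall:
            raise ValueError("v_rise must be greater than v_fall")
        self.v_rise = v_rise
        self.v_fall = v_fall
        self.state = state

    def update(self, v_in):
        if not self.state and v_in >= self.v_rise:
            self.state = True
        elif self.state and v_in <= self.v_fall:
            self.state = False
        return self.state


if __name__ == "__main__":
    # Hypothetical thresholds of +0.1 V and -0.1 V relative to VCC.
    receiver = HysteresisComparator(v_rise=0.1, v_fall=-0.1)
    samples = [-0.3, 0.05, 0.12, 0.02, -0.05, -0.15]
    print([receiver.update(v) for v in samples])
```

Once the output has switched high, the dips to 0.02 V and -0.05 V do not switch it back; only the sample below the lower threshold does.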

            The implementation of the Schmitt trigger receiver, shown in Figure 3.7, is a differential feedback amplifier based on a Cherry/Hooper amplifier [68]-[70]. This Schmitt trigger implementation is similar to a design used in a bipolar RISC processor [59]. A 50 Ω resistor is connected from VCC to the input pad to match the input impedance of the pad to the transmission line impedance. Also, diodes for protection against electrostatic discharge are provided for the Schmitt trigger receiver. As with the pad receiver circuits, the Schmitt trigger receiver input is referenced to VCC. When the input pad voltage is significantly lower than VCC, the voltage at O2 is about 0.3 V below the voltage at O2b since most of the current from ICS2 is flowing through RC. O2P is about 0.8 V below O2Pb, however, because most of the current from ICS1 flows through RFB. As the input pad voltage increases to VCC, O2P still remains about 0.3 V below O2Pb, which is a large enough differential voltage to ensure that RCb is not conducting significant current. This means that O2 is still about 0.3 V below O2b, as when the input pad voltage was well below VCC. No significant change in the current through RC and RCb occurs until the voltage at O2P approaches the voltage at O2Pb. This does not take place until well over half the current flowing through RFBb is redirected through RFB. For this to occur, the input pad voltage must be well above VCC. As the input pad voltage proceeds above this point, current is redirected abruptly from RC to RCb. This is because changes in the voltages across RC and RCb directly influence the voltage levels at O2P and O2Pb, providing positive feedback that aids in redirecting current from RC to RCb. Because of the circuit symmetry, it is easy to demonstrate that the same effect is observed as the input pad voltage is lowered from a point well above VCC. That is, the voltage must be lowered below VCC, at which point the Schmitt trigger output will switch back to its original state. In this way, the asymmetric switching of the Schmitt trigger prevents noise on the signal from being amplified into logic hazards. Also, the positive feedback between the output and input of the second current switch produces fast rise and fall times in the Schmitt trigger output that can be many orders of magnitude shorter than the input pad voltage rise or fall time.

3.8        Pad driver design

            The output pad driver, shown in Figure 3.8, is used to drive a single-ended signal off-chip onto a 50 Ω line. This design is similar to a differential output pad driver used in a bipolar RISC processor [59],[60]. The driver consists of a pair of emitter followers that drive an emitter-coupled pair. The collector of one of the emitter-coupled pair devices is connected to VCC while the other is connected to the output pad. This allows current to either be drawn from the 50 Ω line to produce a low output signal, or drawn from VCC to produce a high output signal on the 50 Ω line. The current source for the emitter-coupled pair draws 8 mA, which produces a low voltage of 0.4 V below VCC, assuming the line is properly terminated. Since no current is flowing through the 50 Ω line when the output signal is high, the nominal high voltage on the line is VCC. A diode is connected between VCC and the output pad to provide a path for current to flow in the event the output pad is left unconnected or unterminated. Use of the diode allows the output impedance of the driver to remain high while the line is terminated, since this device is reverse biased in this case. This is advantageous in that the voltage swing on the output pad is larger than if a collector resistor is used. Although a 50 Ω collector resistor would make the output impedance of the driver match the transmission line impedance, it would cut the voltage swing at the output pad in half since the effective impedance under steady state conditions is then 25 Ω. Matching the output impedance was not considered a priority, however, since the oscilloscope was expected to terminate the transmission line adequately. Therefore, no significant reflections of the incident waves at the oscilloscope were expected, making reflections of these waves at the output pad inconsequential. Large diodes are connected between VCC and the output pad, and between VEE and the output pad to prevent damage to the driver devices due to electrostatic discharge.

Figure 3.8: Pad driver schematics.
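The output levels quoted above follow from Ohm's law. The short Python sketch below reproduces that arithmetic for the diode-clamped driver and for the rejected 50 Ω collector-resistor alternative. The function name and the treatment of the collector resistor as a simple parallel resistance are illustrative assumptions, not taken from the design files.

```python
def pad_low_level_drop(tail_current_a, line_ohms, collector_ohms=None):
    """Steady-state voltage drop below VCC at the pad for a low output."""
    if collector_ohms is None:
        # Diode variant: the reverse-biased diode carries no current,
        # so the full tail current flows through the terminated line.
        effective_ohms = line_ohms
    else:
        # Resistor variant: the collector resistor appears in parallel with the line.
        effective_ohms = line_ohms * collector_ohms / (line_ohms + collector_ohms)
    return tail_current_a * effective_ohms


drop_with_diode = pad_low_level_drop(8e-3, 50.0)           # about 0.4 V
drop_with_resistor = pad_low_level_drop(8e-3, 50.0, 50.0)  # about 0.2 V
print(f"diode: {drop_with_diode:.2f} V, collector resistor: {drop_with_resistor:.2f} V")
```

The second call shows the halved swing that motivated the use of the diode.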

3.9        Voltage-controlled oscillator design

            A voltage-controlled oscillator (VCO) is present on the register file test chip to generate the clock directly on the chip. The test chip VCO is essentially a ring oscillator containing buffer stages with adjustable propagation delays [44],[71]-[74]. The voltage-controlled delay element (VCDE), shown in Figure 3.9, operates much like an ECL buffer [71]. The upper current switch, composed of Q2 and Q2b, receives a standard level 2 differential input signal pair, while the lower current switch, composed of Q3 and Q3b, receives a differential control voltage input signal pair. The differential voltage between CTL and CTLb determines the amount of current from ICSh that flows through Q3, and, therefore, flows through the upper current switch. RED and REDb are emitter degeneration resistors that increase the differential voltage required to direct current between Q3 and Q3b. In this way, the use of the emitter degeneration resistors produces a more linear distribution of current between the two devices as a function of the input differential voltage that is also less sensitive to the magnitude of the differential voltage. Because of this, the emitter degeneration resistors improve the control over the current flowing through Q3 and Q3b as a function of the input differential voltage, making it easier to adjust the delay through the VCDE. The majority of this current, along with most of the current drawn by ICSl, flows through the upper current switch where, depending on whether the input signal is high or low, it is directed through either SD1 and RC or SD1b and RCb. The propagation delay through the VCDE after a change in the input differential voltage at I and Ib is inversely proportional to the amount of current flowing through the current switch. Therefore, as the differential control voltage is adjusted, the amount of current flowing through the upper current switch is altered, causing the propagation delay through the upper current switch to change.

Figure 3.9: Voltage-controlled delay element (VCDE) schematics.

            As the delay through the VCDE is adjusted, one would like to have the magnitude of the output voltage swing remain constant. Since the delay through the VCDE is adjusted by altering the current flow through the upper current switch, if collector resistors are used to determine the output voltage swing of the VCDE, the magnitude of the output voltage swing will vary as the delay is adjusted. For this reason, the Schottky diodes SD1 and SD1b are used to determine the magnitude of the output voltage swing of the VCDE. This is because the change in voltage across a Schottky diode as a function of a change in current flowing through the Schottky diode is logarithmic in nature. Therefore, less variation in the magnitude of the output voltage swing of the VCDE as a function of its delay is observed with the addition of the Schottky diodes.
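To give a feel for the logarithmic behavior described above, the following Python sketch compares the change in voltage across an ideal diode with the change across a resistor for the same change in current. The ideal diode equation, the thermal voltage of about 25.85 mV, the unity ideality factor, the 500 Ω resistor, and the 0.5 mA to 1 mA operating points are all assumptions chosen for illustration; they are not device parameters from this design.

```python
import math

THERMAL_VOLTAGE = 0.02585  # volts, roughly room temperature (assumed)


def diode_swing_change(current_from_a, current_to_a, ideality=1.0):
    """Change in forward voltage of an ideal diode when its current changes."""
    return ideality * THERMAL_VOLTAGE * math.log(current_to_a / current_from_a)


def resistor_swing_change(current_from_a, current_to_a, resistance_ohms):
    """Change in voltage across a resistor for the same change in current."""
    return (current_to_a - current_from_a) * resistance_ohms


# Hypothetical operating point: the switch current doubles from 0.5 mA to 1 mA.
diode_delta = diode_swing_change(0.5e-3, 1.0e-3)
resistor_delta = resistor_swing_change(0.5e-3, 1.0e-3, 500.0)
print(f"diode: {diode_delta * 1e3:.1f} mV, resistor: {resistor_delta * 1e3:.1f} mV")
```

Under these assumed values, doubling the current shifts the diode voltage by a few tens of millivolts at most, while the resistor voltage doubles.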

            The CTL and CTLb inputs of all the VCDE’s are driven by a pad receiver circuit similar to a pad receiver with level 3 outputs. However, this pad receiver contains emitter degeneration resistors, as Figure 3.10 illustrates, to produce a smaller and more linear variation in the output differential voltage as a function of changes in the input voltage from the pad. This further improves the control of current flowing through Q3 and Q3b of each VCDE as a function of the pad voltage, making it easier to adjust the delay through the VCDE’s.

Figure 3.10: VCO analog frequency select pad receiver schematics.

            As the schematics in Figure 3.11 show, the VCO contains six VCDE’s arranged in a ring. Since there is a single inversion in the ring of buffers, there are no stable states and only one metastable state in which the ring can exist in steady state. This state occurs when the differential output voltage of every VCDE is zero. This state is not stable since any noise pulse across a pair of differential lines is sensed by a VCDE, resulting in an amplified version of the noise at the VCDE output. This differential noise pulse is then further amplified by the next VCDE and continues to gain strength through each VCDE stage. Because of the ring structure, the amplified signal supersedes the noise on the original differential lines and continues to gain magnitude, although the signal is inverted at this point. Eventually the amplitude of the pulse is the maximum allowed by the VCDE’s. In this way, a steady oscillating state is reached in which each VCDE output differential voltage alternately switches between high and low states. The time span between successive state changes is equal to the total propagation delay of the six VCDE’s since this is the time it takes for a transition to propagate around the ring. Therefore, the period of the waveform is equal to twice the propagation delay of all six VCDE’s. Because there are six VCDE’s, the output waveforms of adjacent VCDE’s are 30° apart in phase.

Figure 3.11: Voltage-controlled oscillator (VCO) schematics.
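The timing relationships stated above can be written down directly. The sketch below is a minimal illustration, assuming identical stage delays; the 25 ps stage delay is a hypothetical value, not a measured or simulated result.

```python
def ring_period(n_stages, stage_delay_s):
    """Period of a ring with one net inversion: a transition must circulate twice."""
    return 2 * n_stages * stage_delay_s


def adjacent_phase_deg(n_stages):
    """Phase lag between adjacent stage outputs: one stage delay out of one period."""
    return 360.0 / (2 * n_stages)


period = ring_period(6, 25e-12)  # hypothetical 25 ps per VCDE
print(f"period = {period * 1e12:.0f} ps, adjacent phase = {adjacent_phase_deg(6):.0f} degrees")
```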

            One of the VCDE’s in the VCO drives a level 2 ECL buffer as well as the next VCDE in the ring to provide an output waveform. The ECL buffer drives the clock of a master-slave latch as well as one of the data inputs of a 2-to-1 multiplexer. An inverted version of the output of the master-slave latch is fed back to the master-slave latch data input. This means that for any particular state that the slave latch is storing, the opposite state will be written on the next rising clock edge. Therefore the output of the master-slave latch is a waveform of half the frequency of the VCO waveform clocking the master-slave latch. The master-slave latch output drives the second data input of the 2-to-1 multiplexer, which provides a selection between the faster and slower waveforms. To allow the test chip to be clocked externally, an additional 2-to-1 multiplexer is used to select between the VCO clock and an external clock via a Schmitt trigger receiver. The external clock also serves as a scan clock and always supersedes the VCO clock in scan mode.
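The divide-by-two behavior of the master-slave latch with inverted feedback can be checked with a small behavioral model. This is only a logic-level sketch: the sampled clock list and the function names are illustrative assumptions, and no circuit delays are modeled.

```python
def divide_by_two(clock_samples):
    """Toggle the stored bit on every rising clock edge (inverted output fed back)."""
    stored = 0
    previous = clock_samples[0]
    output = []
    for sample in clock_samples:
        if previous == 0 and sample == 1:
            stored ^= 1  # the opposite state is written on each rising edge
        output.append(stored)
        previous = sample
    return output


def count_rising_edges(samples):
    """Count 0-to-1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a == 0 and b == 1)


clock = [0, 1] * 16
half_rate = divide_by_two(clock)
print(count_rising_edges(clock), count_rising_edges(half_rate))
```

The output waveform has half as many rising edges as the clock over the same interval, which is the slower waveform offered to the 2-to-1 multiplexer.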

3.10    Register file test chip device count

            A total of 1,911 additional HBT’s and 880 additional resistors are used in the circuits designed to test the register file, resulting in a total device count of 13,276 HBT’s and 4,647 resistors for the register file test chip. Table 3.5 provides a summary of the number of devices used by each group of circuits. A significant number of these devices are used in the counters and the data latches, which sample register file data in conjunction with the register file output latches. As with the register file, most HBT’s used in the testing circuits are the minimum size allowed by the design rules. The main exception is in the pad driver, where devices with two long emitter stripes are used to provide enough current to drive a 50 Ω line. The devices connected to the pads in the pad receivers are also longer than the minimum size to make them less susceptible to electrostatic discharge. Other devices used on the test chip include special diodes to prevent electrostatic discharge from damaging the devices connected to the pads and Schottky diodes in the VCDE’s to maintain constant voltage swings as the currents through the cells are varied.

Circuits          HBT’s   Resistors     Circuits                        HBT’s   Resistors
data latches        448         192     output multiplexers               191          41
read counters       332         116     VCO                                98          70
write counter       172          58     I/O pads                          195         120
data rotator        148          64     clock distribution                144          80
write generator      60          37     reference voltage generators       50          51
miscellaneous        73          51     register file                   11365        3767

Table 3.5: Number of devices used in the circuits comprising the register file test chip.
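As a consistency check, the per-circuit counts in Table 3.5 can be summed and compared against the totals quoted in the text. The sketch below does only that; the dictionary layout and function name are illustrative.

```python
# (HBTs, resistors) per circuit group, copied from Table 3.5.
TEST_CIRCUITS = {
    "data latches": (448, 192),
    "output multiplexers": (191, 41),
    "read counters": (332, 116),
    "VCO": (98, 70),
    "write counter": (172, 58),
    "I/O pads": (195, 120),
    "data rotator": (148, 64),
    "clock distribution": (144, 80),
    "write generator": (60, 37),
    "reference voltage generators": (50, 51),
    "miscellaneous": (73, 51),
}
REGISTER_FILE = (11365, 3767)


def tally(groups):
    """Return the total (HBTs, resistors) over a collection of (HBTs, resistors) pairs."""
    groups = list(groups)
    return sum(h for h, _ in groups), sum(r for _, r in groups)


test_totals = tally(TEST_CIRCUITS.values())
chip_totals = tally(list(TEST_CIRCUITS.values()) + [REGISTER_FILE])
print(test_totals, chip_totals)
```

The sums agree with the 1,911 HBT’s and 880 resistors stated for the test circuits and with the 13,276 HBT’s and 4,647 resistors stated for the whole chip.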

3.11    Register file test chip layout

            The layout for the register file test chip, shown in Figure 3.12, occupies an area of 2.6 mm by 2.2 mm. The write enable pulse generator and the write address counter are located west of the register file, while the read address counters are located east of the register file. The data rotator and the read port A column select multiplexers are located north of the register file, while the read port B column select multiplexers are located south of the register file. The locations of these circuits were chosen to provide the shortest possible connections between the register file I/O lines and the test circuit I/O lines. The RFTC VCO and additional multiplexers and clock buffers are located east of the read address counters. The location of the RFTC VCO was chosen to minimize the length of the wires from the clock buffers to the read address counters since these are the most critical test circuits. Also, this location allows similar wire lengths between the clock buffers and the sampling latches for read ports A and B.

Figure 3.12: Register file test chip layout.

            The I/O pads and corresponding I/O circuits are located on the west, east and south edges of the test chip. The function of each pad, according to its location on the register file test chip, is illustrated in Figure 3.13. The pads on the east and west sides of the RFTC are designed for use with 10-pin probes containing six signal, two power, and two ground connections. The pads on the south side of the chip are designed for use with either ground-signal-ground probes or needle probes if the signal frequencies are low. The pitch between all adjacent pads is 150 µm since the probes used for testing support this pad pitch.

Figure 3.13: Register file test chip pad arrangement.

            A standard cell approach was used to implement the test circuits on the register file test chip. This method allowed a cell to be laid out once and then used in more than one location if necessary. Since the test circuits are designed to operate much faster than the expected register file operation, any performance impact on the test circuits due to the standard cell approach should not impact the register file testing. A die micrograph of a fabricated register file test chip is shown in Figure 3.14. It is difficult to distinguish many of the chip features that can be seen in the RFTC layout since metal 3 covers most of the chip surface. Features hidden under metal 3 are not visible, unlike in older technologies, since the contours of these features have been removed through planarization techniques.

Figure 3.14: Register file test chip die micrograph.

3.12    Register file test chip alterations for the second fabrication run

            As mentioned in Chapter 2, an opportunity to fabricate a second version of the register file test chip was provided about a year after the fabrication of the first test chip. During the testing of the first version of the test chip, some problems became apparent (see Chapter 6). Alterations were made to address these problems.

            The first problem that was addressed was the erratic behavior of the write address counter. When viewing the top bit of the write address counter, the variable nature of the waveform made it difficult to trigger on the oscilloscope. When triggering was obtained, the frequency of the resulting waveform was twice the expected frequency (1/16 of the clock frequency instead of 1/32 of the clock frequency) for the range of clock frequencies that produced viewable results. Simulations performed on this circuit predicted normal behavior for the counter, indicating that there was probably a timing problem in the circuit. Since the circuit is a simple state machine and the buffer that drives the write counter is on the opposite side of the register file, it was hypothesized that the differential voltage between the two signals representing the clock was degraded while traversing the distance to the write counter. This degradation was suspected to interfere with the normal simultaneous clocking of the master-slave latches in the write counter, producing an unexpected, inconsistent output from the top bit of the counter. To remedy the problem, an additional buffer was placed between the original buffer and the write address counter latches. This buffer is located near the write address counter. In this way, the loading of the original buffer that drove the write counter is reduced to one buffer, allowing it to more easily drive the parasitics associated with the wires. This cleaner signal is fed through the new buffer, which can amplify the signal if there is still degradation and drive the write address counter latches with lower wire parasitics.

            The second problem that occurred was the inability to scan in a pattern for the data rotator and have the pattern rotate through the circuit when the test chip was switched back into test mode. Although a waveform was observed at the rotator output and the write address counter output (which feeds the rotator in scan mode) that was at the same frequency as the waveform driving the scan in pad, when switching the scan pad low, the output of the rotator was either stuck at one or stuck at zero. This is inconsistent with the simulation results. One possible explanation for this problem is that instead of acting as a shift register in scan mode, the data bit at the scan input to the rotator propagates through all the rotator latches on a single rising clock edge. Therefore, when the test chip is put back into test mode, the output is either always high or always low. The rotator was laid out, however, such that the latches are clocked in the opposite order from that in which the data traverses the latches. That is to say, although the latch clock inputs are all driven by the same driver, the wires connecting the latch clocks go from the clock driver to the last latch in the chain, to the second last latch in the chain, etc., until the first latch in the chain is reached. Therefore, if there is any clock skew, the last latch will be clocked first, receiving the old data from the second last latch before that latch is clocked. The second last latch will receive the old data from the third last latch in the same manner. This effect should ripple down the shift register, allowing each latch to receive the appropriate data from the previous latch without producing the suspected propagation of the scan input data through all the latches on a single clock edge.
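The effect of the clock routing order can be illustrated with a worst-case behavioral model in which the skew between successive latches is assumed to exceed the latch delay, so that each latch sees whatever its predecessor holds at the instant its own clock edge arrives. This assumption, the eight-latch list, and the function name are illustrative only.

```python
def clock_once_with_skew(latches, scan_in, clock_order):
    """Apply one clock edge that reaches the latches one at a time in clock_order."""
    state = list(latches)
    for index in clock_order:
        # Each latch captures the present output of the previous latch in the chain.
        state[index] = scan_in if index == 0 else state[index - 1]
    return state


start = [0] * 8
data_order = list(range(8))           # clock reaches latch 0 first
reverse_order = list(reversed(data_order))  # clock reaches the last latch first

print(clock_once_with_skew(start, 1, data_order))
print(clock_once_with_skew(start, 1, reverse_order))
```

With the clock arriving in data order, the scan input races through every latch on a single edge; with the reverse order used in the layout, the chain shifts by exactly one position.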

Figure 3.15: Second register file test chip pad arrangement.

            Two alterations were made to remedy the problem with the rotator. First, two buffers were placed in parallel between the original buffer driving the data rotator clock and the data rotator latches. These buffers were placed near the rotator, and each drives four of the rotator’s master-slave latches in parallel. This serves to decrease the loading on the original buffer driving the rotator, which is of particular concern since the rotator is not near the original clock buffer. Two buffers were chosen to drive the rotator latches since this scheme reduces the loading of these buffers to a level below that of the read address counter clock buffers. This is important since the read address counters are known to work. To balance the loading of the two buffers as much as possible without adversely affecting the performance of the circuit, one of the buffers drives latches zero, one, six, and seven, while the other buffer drives latches two through five. In this way, each buffer drives four consecutive rotator latches and, since latch two is not far from latch zero, the wiring parasitics each buffer must drive are comparable. This reduces the clock skew between the latches driven by the two buffers. The second alteration to the rotator is a redesign that allows the rotator to also function as a linear feedback shift register (LFSR). This allows the automatic generation of pseudo-random test patterns that serve as input data for the register file. An additional pad was required to select whether the new circuit behaves as a rotator or LFSR. The new pad layout for the revised test chip is shown in Figure 3.15.

3.13    Linear feedback shift register design

            An LFSR is a shift register with feedback to allow it to produce a sequential bit pattern that is pseudo-random in nature [72]. The feedback signal, which is the XOR of the last latch output and one or more other latch outputs from the shift register, is fed into the input of the first latch of the shift register. Therefore, the LFSR requires no input data. The feedback is designed to allow the LFSR to visit each of the possible states except state 0, where all the latches store signals representing false. Since the XOR of any number of false inputs is a false output, an LFSR with all latches in the false state will continuously shift false values through the shift register, remaining in state 0 indefinitely. Therefore, an LFSR with N latches will have 2^N - 1 states that it visits normally. This means that a repeating pattern of 2^N - 1 bits can be obtained by using the output of any of the LFSR latches. Of course the patterns at the outputs of the latches will be identical, although skewed in time. Table 3.6 tabulates possible sets of latch outputs that can be XOR’ed to produce an LFSR out of a shift register for a number of different size shift registers.

            For the register file test chip, eight sources of input data for the 32 register file data inputs are still desired. However, a 255 bit LFSR pattern would be difficult to view on an oscilloscope for verification purposes. In addition, an 8-bit LFSR requires four latch outputs to be XOR’ed and fed back to the shift register input, which may adversely impact the LFSR performance. A 6-bit LFSR, however, provides a 63-bit pattern, which is of sufficient length to produce 32-bit pseudo-random patterns that can be written into all the register file addressable locations for a given bit within the 32-bit data word. Therefore, any given register file column will store a 32-bit portion of the 63-bit LFSR pattern after 32 consecutive write operations, assuming the write address counter produces sequential addresses. For these reasons, a 6-bit LFSR was chosen to be integrated with the 8-bit data rotator.

Size   XOR’ed bits     Size   XOR’ed bits     Size   XOR’ed bits     Size   XOR’ed bits
1      0               5      1, 4            9      3, 8            13     0, 2, 3, 12
2      0, 1            6      0, 5            10     2, 9            14     0, 10, 11, 13
3      0, 2            7      0, 6            11     1, 10           15     0, 14
4      0, 3            8      0, 4, 5, 7      12     2, 3, 6, 11     16     1, 2, 4, 15

Table 3.6: List of shift register latches that can be XOR’ed and fed back to the shift register input to produce LFSR’s of different sizes. Shift register latches are numbered from 0 to N-1, where latch 0 is at the beginning of the shift register.
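The entries of Table 3.6 can be exercised with a short behavioral model that counts how many clocks a shift register takes to return to its starting state. The sketch below assumes the numbering convention given in the caption (latch 0 at the beginning, feedback into latch 0) and an all-ones starting state; the function and variable names are illustrative.

```python
# Feedback taps per register size, copied from Table 3.6.
TABLE_3_6 = {
    1: (0,), 2: (0, 1), 3: (0, 2), 4: (0, 3),
    5: (1, 4), 6: (0, 5), 7: (0, 6), 8: (0, 4, 5, 7),
    9: (3, 8), 10: (2, 9), 11: (1, 10), 12: (2, 3, 6, 11),
    13: (0, 2, 3, 12), 14: (0, 10, 11, 13), 15: (0, 14), 16: (1, 2, 4, 15),
}


def lfsr_period(size, taps):
    """Number of clocks before an LFSR started in the all-ones state repeats."""
    start = [1] * size
    state = list(start)
    for steps in range(1, 2 ** size + 1):
        feedback = 0
        for tap in taps:
            feedback ^= state[tap]
        # Shift by one position and load the feedback bit into latch 0.
        state = [feedback] + state[:-1]
        if state == start:
            return steps
    return None  # no repeat found within 2**size clocks


for size, taps in TABLE_3_6.items():
    period = lfsr_period(size, taps)
    print(size, taps, period, period == 2 ** size - 1)
```

The last column printed indicates whether a given entry reaches the full 2^N - 1 states under this model.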

            The schematics for the data rotator/LFSR are shown in Figure 3.16. The circuit contains eight shift register latches. As with the original data rotator, the first latch in the shift register chain contains an integrated multiplexer. In scan mode, this multiplexer selects the scan in data from the write address counter, as before. However, in test mode, the multiplexer selects data from another multiplexer with an integrated XOR gate feeding one of its inputs. Therefore, in test mode, if the rotate option is selected, the second multiplexer selects data from the last latch on the scan chain, allowing the data in the shift register to cycle through the latches indefinitely, producing a rotating data pattern. However, if the LFSR option is selected, the second multiplexer selects the XOR of the first and sixth latches on the scan chain, producing an LFSR state machine using the first six latches in the shift register. In this case the last two latches in the shift register simply produce shifted versions of the 63-bit LFSR pattern. The 63-bit LFSR pattern is 101010110011011101101001001110001011110010100011000010000011111. In the event the LFSR starts up in a state where all the latches store a logic zero, the LFSR will remain in this state indefinitely as long as the test chip is in test mode. The test chip can be placed in scan mode, however, to allow a new state to be shifted into the LFSR. This will allow the LFSR to function properly when the test chip is returned to test mode.

Figure 3.16: 8-bit data rotator/6-bit data LFSR gate level schematics.
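A logic-level model of the combined circuit helps to check the two test-mode behaviors described above. The sketch below models only the eight latches and the feedback selection; scan mode is omitted. The all-ones starting state and the choice of the first latch as the observation point are assumptions, under which the model reproduces the 63-bit pattern quoted in the text.

```python
PATTERN = "101010110011011101101001001110001011110010100011000010000011111"


def step(latches, lfsr_mode):
    """Advance the 8-latch chain by one clock in test mode."""
    if lfsr_mode:
        feedback = latches[0] ^ latches[5]  # XOR of the first and sixth latches
    else:
        feedback = latches[7]               # rotate: last latch feeds the first
    return [feedback] + latches[:-1]


def observe(latches, lfsr_mode, n_clocks, latch_index=0):
    """Record the output of one latch over n_clocks clock cycles."""
    state = list(latches)
    bits = []
    for _ in range(n_clocks):
        bits.append(str(state[latch_index]))
        state = step(state, lfsr_mode)
    return "".join(bits)


lfsr_bits = observe([1] * 8, lfsr_mode=True, n_clocks=63)
rotate_bits = observe([1, 0, 0, 0, 0, 0, 0, 0], lfsr_mode=False, n_clocks=16)
print(lfsr_bits == PATTERN, rotate_bits)
```

Calling step([0] * 8, True) returns the all-zero state again, which is the lock-up condition that scan mode is used to escape.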

3.14    Ring oscillator test chip design

            Additional test chips were designed to test the performance of the SiGe devices. One of these test chips, the ring oscillator test chip (ROTC), contains three ring oscillators. The purpose of a ring oscillator is to provide a waveform that enables one to determine the propagation delay of a buffer. Each ring oscillator produces an oscillating waveform under the same principle as the register file test chip VCO. Unlike the VCO, however, the delays of the ring oscillator stages are not adjustable. Instead, the stages are implemented using simple CML or ECL buffers such as those shown in Figure 1.1 and Figure 1.6.

Figure 3.17: Ring oscillator schematics.

            On the ROTC, the three ring oscillators use 31 level 1 CML, level 2 ECL, or level 3 ECL buffers, respectively, arranged in a ring structure, as shown in the ring oscillator schematics in Figure 3.17. A large number of stages must be used to ensure that the buffers in the ring oscillator reach a steady state value for a period of time between transitions. This is necessary to allow the buffer propagation delay to be determined assuming the buffer in question was producing a steady state output before the input transition occurred. An additional level 2 ECL buffer and a pad driver are provided to view the ring oscillator output. Note that a single inversion exists in the ring oscillator buffer ring to eliminate the stable states for the ring, causing oscillations at each ring oscillator stage output in a manner similar to that of the VCO ring stages. However, in this case the phase difference in the output waveforms of successive stages is 5.8°. The period of the ring oscillator output can be thought of as the time required for each of the buffer stages to perform a low-to-high transition and a high-to-low transition. Therefore, the propagation delay of each buffer is

 

t_{pd} = \frac{1}{2 s f},     ( 13 )

where f is the ring oscillator frequency and s is the number of ring oscillator stages. The data obtained from a ring oscillator is useful, therefore, in predicting the propagation delay of a buffer driving a single load.
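Since the period equals the time for each of the s stages to make one rising and one falling transition, the per-buffer delay is one period divided by 2s. The helper below applies this to a measured frequency; the 1 GHz value is purely hypothetical and is not a result from this work.

```python
def buffer_delay(frequency_hz, n_stages):
    """Average propagation delay per buffer from the ring oscillator frequency."""
    return 1.0 / (2.0 * n_stages * frequency_hz)


delay = buffer_delay(1.0e9, 31)  # hypothetical 1 GHz output from a 31-stage ring
print(f"{delay * 1e12:.2f} ps per buffer")
```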

Figure 3.18: Ring oscillator test chip layout (north facing right).

            The layout for the ring oscillator test chip, shown in Figure 3.18, occupies an area of 1.5 mm by 0.43 mm. The level 1 ring oscillator is at the north end of the chip, the level 2 ring oscillator is in the middle of the chip, and the level 3 ring oscillator is at the south end of the chip. The level 1 ring oscillator is composed of standard cells laid out in two rows, allowing the ring structure to be created using relatively short signal wires between the buffers, even at the row boundaries. The level 2 and level 3 ring oscillators are composed of standard cells laid out in four rows, with connections between the rows made to provide a serpentine-like path for a signal through the buffers. The buffer at the end of the last row for each ring oscillator connects back to the first buffer of the first row. This scheme minimizes the signal wire lengths between the rows.

Figure 3.19: Ring oscillator test chip pad arrangement.

            A row of ten pads with a 150 µm pitch between adjacent pads is provided on the ring oscillator test chip for compatibility with a 10-pin probe containing six signal, two power, and two ground connections. Only three of the signal pads are used, however, since only one output signal is required for each ring oscillator. Note that this chip is longer than it needed to be to fit the layout for the three ring oscillators, but is instead bounded by the length required to accommodate the pad arrangement for the 10-pin probe. The location and function of each of the pads on the ring oscillator test chip is shown in Figure 3.19. A die micrograph of a fabricated ring oscillator test chip is shown in Figure 3.20. A ring oscillator test chip was submitted for the second fabrication run as well. This ring oscillator test chip is identical to the first ring oscillator test chip with the exception that the guard ring is spaced further from the pads in the second design. This is to prevent any of the pads from becoming inadvertently shorted to the guard ring, which is tied to VEE. This became a concern while testing the ring oscillator test chip from the first fabrication run, although no cases of a short were actually observed. The dimensions of the ring oscillator test chip for the second fabrication run are 1.5 mm by 0.46 mm.

Figure 3.20: Ring oscillator test chip die micrograph (north facing right).

3.15    Voltage-controlled oscillator test chip design

            A test chip was also designed containing high frequency voltage-controlled oscillator (HFVCO) circuits similar to the register file test chip VCO. Each HFVCO only employs four VCDE’s in the ring, however, and uses XOR gates to multiply the frequency of the signals produced by the HFVCO ring [44],[71]-[74], as the schematics for the HFVCO, shown in Figure 3.21, illustrate. The HFVCO VCDE and analog frequency control pad receiver schematics are identical to those used in the register file test chip VCO. Therefore, as with the register file test chip VCO, the HFVCO ring provides a periodic signal with a frequency that can be varied by adjusting the voltage on the analog frequency control pad. The difference in phase between the output waveforms of adjacent VCDE’s is 45° in the HFVCO, however.

Figure 3.21: High frequency voltage-controlled oscillator (HFVCO) schematics.

            An XOR operation can multiply the frequency of a digital signal by two if another version of the signal is available that is 90° out of phase with the first signal, and each signal has a 50% duty cycle. Two signals of this type are said to be in quadrature. The multiplication occurs because the quadrature signals alternate between having the same logic value and opposite logic values over equal intervals spanning one-fourth the signal period. Since an XOR operation produces different logic values depending on whether the two input logic values are the same or different, the logic value at the output of an XOR gate will change four times each period in response to the input quadrature signals. Therefore, the resulting output signal has a frequency twice that of the input quadrature signals. This type of frequency multiplication is illustrated in Figure 3.22.

Figure 3.22: Illustration of XOR operations on the HFVCO VCDE output waveforms to produce frequency multiplication.

            Since the difference in phase between output waveforms of adjacent VCDE’s is 45°, the difference in phase between output waveforms of nonadjacent VCDE’s (S0 and S2, or S1 and S3) is 90°. Therefore, XOR operations can be performed on S0 and S2, as well as S1 and S3 to produce two waveforms with double the frequency of the original waveforms (D0 and D1), as shown in Figure 3.22. Since the difference in phase between output waveforms of adjacent VCDE’s is 45°, the phase difference between D0 and D1 is 90°. This is because the time interval between edges of the output waveforms of adjacent VCDE’s is equal to the time interval between the edges D0 and D1. Therefore, because the frequency of D0 and D1 is double the frequency of the VCDE output waveforms, the phase difference between D0 and D1 is necessarily 90°. Since the phase difference between D0 and D1 is 90°, an XOR gate can also be used to double the frequency of these two signals, producing a signal (T0) with a frequency that is four times the frequency of the VCDE output waveforms, as shown in Figure 3.22.
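The two stages of frequency multiplication can be checked with ideal square waves. The sketch below models the four VCDE outputs as 50% duty cycle waveforms, each lagging the previous one by 45°, and counts rising edges over one core period. The sample count per period and the function names are illustrative assumptions; no gate delays are modeled.

```python
PERIOD = 80  # samples per core period (assumed; any multiple of 8 works)


def stage(i, k):
    """Ideal output of VCDE i at sample k; each stage lags the previous by 45 degrees."""
    return 1 if (k - i * PERIOD // 8) % PERIOD < PERIOD // 2 else 0


def d0(k):
    return stage(0, k) ^ stage(2, k)


def d1(k):
    return stage(1, k) ^ stage(3, k)


def t0(k):
    return d0(k) ^ d1(k)


def rising_edges(signal):
    """Count 0-to-1 transitions of signal over one core period."""
    return sum(1 for k in range(PERIOD) if signal(k - 1) == 0 and signal(k) == 1)


print(rising_edges(lambda k: stage(0, k)), rising_edges(d0), rising_edges(t0))
```

In this model D1 is a copy of D0 delayed by one eighth of the core period, which is one quarter of the doubled period, so D0 and D1 are again in quadrature.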

            From Figure 3.21, one notes that the loading of the various circuits in the HFVCO appears balanced. That is, each VCDE drives another VCDE and an XOR gate, while each of the stage one XOR gates drives the stage two XOR gate. This is an important characteristic of the HFVCO design. If the loading of the VCDE’s is not equal, the delay through each of the VCDE’s will differ, resulting in phase shifts between adjacent VCDE output signals that deviate from 45°. Since these signals drive the first set of XOR gates, the phase error in these signals causes the signals produced by the XOR gates to have duty cycles that deviate from 50%. Because these signals drive the stage two XOR gate, the output signal of this circuit will have even more severe variations in its duty cycle. Also, depending on the exact nature of the phase error, the output signal of the stage two XOR gate will probably not have matched consecutive periods. That is, consecutive periods of the waveform may not have the same duty cycle, or even the same period. Therefore, it is important to maintain balanced loading among the members of the various groups of circuits in the VCO to prevent distortion of the output waveform.
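The sensitivity of the duty cycle to phase error can be quantified for ideal waveforms. For two 50% duty cycle square waves separated by a phase between 0° and 180°, the XOR output is high for a fraction of the time equal to that phase divided by 180°. The function below is a sketch of that relationship under the assumption of ideal square waves; the 9° error is a hypothetical value.

```python
def xor_duty_cycle(phase_deg):
    """Fraction of time the XOR of two ideal 50% duty cycle square waves is high."""
    phase = phase_deg % 360.0
    if phase > 180.0:
        phase = 360.0 - phase  # a lag beyond 180 degrees is equivalent to a smaller lead
    return phase / 180.0


ideal = xor_duty_cycle(90.0)         # quadrature inputs
skewed = xor_duty_cycle(90.0 + 9.0)  # hypothetical 9 degree phase error
print(f"ideal: {ideal:.0%}, with 9 degree error: {skewed:.0%}")
```

A phase error of a few degrees at the XOR inputs therefore appears directly as a duty cycle error at the XOR output, which the next XOR stage then sees as a phase error of its own.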

            The standard implementation of a CML XOR gate, shown in Figure 1.4, does not present the same load to the drivers on each of its two input terminals. Instead, I1 connects to two current switches, while I2 only connects to one current switch. Also, I2 must be driven one level below I1. In addition, the propagation delay from I1 to O1 is different than the propagation delay from I2 to O1. For these reasons, using this XOR gate as a frequency multiplier for the high frequency VCO would create distortion in the VCO output waveform. An XOR gate circuit implementation with matched loading on the input terminals and matched propagation delays between input terminals and the output terminal [44],[73]-[74], shown in Figure 3.23, uses techniques that differ from the normal current tree logic described above. In this circuit, the nodes labeled X are each set

Figure 3.23: Balanced 2-input XOR gate schematics.

to one of three voltage levels. If I1 and I2 are both true, X1 drops to the lowest voltage level while X2 is pulled up to the highest voltage level since current is conducting through Q1b and Q2b. If I1 and I2 are false, however, X2 drops to the lowest voltage level while X1 is pulled up to the highest voltage level since current is conducting through Q1 and Q2. If I1 and I2 have different logic values, X1 and X2 are both set to a voltage midway between the high and low voltage levels since current is conducting through either Q1 and Q2b or Q1b and Q2. If I1 and I2 have the same logic value, on the other hand, X3 and X4 are both set to a voltage midway between the high and low voltage levels since current is conducting through either Q3 and Q4 or Q3b and Q4b. If I1 and I2 have different logic values, however, then either X3 is pulled up to the highest voltage level and X4 drops to the lowest voltage level, or X4 is pulled up to the highest voltage level and X3 drops to the lowest voltage level, depending on whether Q3b and Q4 or Q3b and Q4 are conducting current, respectively. Therefore, exactly one of the X nodes will have a high voltage level under steady state conditions. Emitter followers are used to shift the voltage of the X nodes down one level, which prevents Q5A through Q5D from becoming saturated, since the emitter follower outputs drive the base terminals of these devices. Because the collectors of Q5A and Q5B are both connected to O1, if current is flowing through either device, the voltage at O1 drops below VCC. Otherwise, the collector resistor attached to O1 pulls the node to VCC. Similarly, the voltage at O1b drops below VCC when current is conducting through either Q5C or Q5D. Otherwise the voltage at O1b is pulled up to VCC by the attached collector resistor. This means that if either X1 or X2 has a high voltage level, the voltage at O1 will be lower than at O1b, indicating that I1 and I2 have the same logic value. 
Otherwise, if either X3 or X4 has a high voltage level, the voltage at O1 will be higher than at O1b, indicating that I1 and I2 have different logic values. This is the definition of an XOR operation. Emitter followers are used to shift the voltage at O1 and O1b to level 2.
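
The steady state behavior described above can be summarized in a short behavioral model. The following Python sketch is only an illustration of the logic, not a circuit simulation: the function name `balanced_xor` and the symbolic level names are assumptions introduced here, and the sketch does not attempt to say which of X3 or X4 goes high for a given pair of differing inputs.

```python
def balanced_xor(i1: bool, i2: bool) -> dict:
    """Behavioral summary of the balanced XOR gate (hypothetical helper).

    Levels are symbolic: 'high', 'mid', or 'low'.
    """
    if i1 == i2:
        # Same inputs: X1/X2 split high/low while X3/X4 both sit at the midpoint.
        x1, x2 = ("low", "high") if i1 else ("high", "low")
        x1_or_x2_high = True
    else:
        # Different inputs: X1/X2 both sit at the midpoint; X3 or X4 is high.
        x1 = x2 = "mid"
        x1_or_x2_high = False
    # O1 is pulled below O1b whenever X1 or X2 is high (inputs equal),
    # so O1 above O1b encodes the XOR of the inputs.
    o1_above_o1b = not x1_or_x2_high
    return {"X1": x1, "X2": x2, "xor": o1_above_o1b}


for a in (False, True):
    for b in (False, True):
        print(a, b, balanced_xor(a, b))
```

The model reports a true output only when the inputs differ, which matches the XOR definition given in the text.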

            Three versions of the VCO were designed for the VCO test chip. One version, designated HFVCOa, is implemented as shown in Figure 3.21, using the pad driver schematics found in Figure 3.8. Since the bandwidth of this pad driver is severely limited by the bandwidth of the ESD device (see Chapter 5), a second version of the HFVCO, designated HFVCOb, was designed that is identical to HFVCOa, with the exception that the ESD device was omitted from the pad driver circuit. This version of the HFVCO should produce a higher amplitude output signal for a given frequency, assuming the pad driver devices are not damaged by static electricity. The third version of the VCO, designated the medium frequency voltage-controlled oscillator (MFVCO), differs from the HFVCO in that only one level of frequency multiplication is employed. In other respects, the design is similar to that of the HFVCO.

Figure 3.24: Medium frequency VCO schematics.

            The schematics for the MFVCO are shown in Figure 3.24. Note that two XOR gates are used in the MFVCO design even though only one is required to multiply the core frequency by a factor of two and provide input for the pad driver. The purpose of the second XOR gate is to balance the loading of the VCDE’s so that the delay through each VCDE as a function of the input differential control voltage is equal. This will minimize the distortion in the MFVCO output signal. The pad driver for the MFVCO uses ESD devices since the protection they provide seemed more important than the limitation on the MFVCO bandwidth that is imposed by their use.

            The layout for the VCO test chip, shown in Figure 3.25, occupies an area of 1.5 mm by 0.51 mm. HFVCOb is on the north side of the chip, HFVCOa is in the middle of the chip, and the MFVCO is on the south side of the chip. All the VCO cells are laid out as standard cells. The VCDE’s are arranged in two rows of two cells to minimize the lengths of the signal wires between the cells. The two XOR gates with connections to the VCDE’s are located on either side of the VCDE ring, while the final XOR gate in each HFVCO is located between the VCDE core and the pad cells. A row of ten pads with a 150 µm pitch between adjacent pads is provided on the VCO test chip for compatibility with a 10-pin probe containing six signal, two power, and two ground connections. Two signal pins are used for each VCO to provide connections to the control input and VCO output. Note that the chip is longer than required to fit the layout of the three VCO’s; its length is instead set by the pad arrangement needed to accommodate the 10-pin probe. The location and function of each of the pads on the VCO test chip is shown in Figure 3.26. A die micrograph of a fabricated VCO test chip is shown in Figure 3.27.

Figure 3.25: VCO test chip layout (north facing right).

 

 

 

Figure 3.26: VCO test chip pad arrangement.

Figure 3.27: VCO test chip die micrograph (north facing right).

 


 

Chapter 4   SiGe Technology Performance Analysis

 

To design circuits that function well in digital and mixed-signal systems, it is necessary to understand the performance advantages and limitations of various classes of circuits. Once these advantages and limitations have been determined, design methodologies can be formulated that help the designer make appropriate choices without having to simulate every possible permutation of a particular circuit. In this chapter, both steady state and transient analyses are performed on CML, ECL, and CMOS circuits to determine the performance of buffers and inverters designed in these circuit families as a function of the various design parameters associated with these circuits. These parameters include current levels, device sizes, logic swings, and loading. Based on the performance data, design methodologies are developed for designing a particular gate, as well as choosing the appropriate gate for optimal performance given the particular design environment. These methodologies streamline the design process by providing a method of making design decisions based on quantitative data without an overwhelming number of simulations for each design decision.

4.1        Simulation methodology

            All simulations of the register file test chip and other test structures were performed using HSpice, a version of SPICE from Avant!, with device models provided by the chip manufacturer. Although there were numerous updates of the device models during the course of the design cycle, results of simulations using the models available at the time of the first fabrication run (97 models) and models currently available (99 models) are presented. Three types of simulations were performed. The first type, referred to as simulations without parasitics, is performed in SPICE using netlists generated from schematics. The second and third types, referred to as simulations including capacitive and RC wire parasitics, respectively, are performed in SPICE using netlists generated from physical layouts. The netlists are generated from layouts using Cadence extraction tools configured for the SiGe technology. These extraction tools detect devices based on the layers in the layout and determine connectivity using the interconnect layer information. In addition, for the simulations including wire capacitance, the parasitic capacitance is determined for each net and included in the SPICE netlist. The capacitance for each net is determined using the length and area of each section of wire to compute the corresponding area and fringe capacitance to nearby layers above and below the section. For the RC simulations, the resistance of metal layers marked with a special layer is determined in addition to the metal capacitance. RC pi models are used to simulate the combined effects of the resistance and capacitance. These models represent the total capacitance of a particular section of a net as two capacitors. A resistor that represents the total resistance of this section of the net separates the capacitors. The nets are sectioned automatically by the extraction tools based on such factors as changes in metal layers using vias and “t” intersections of metal. 
For the RC simulations, nearly all the nets were marked to provide complete RC information for the SPICE simulations. Because of time constraints, simulations including capacitive and RC wire parasitics were not performed before either tape out. Therefore, the data from these simulations was not available to aid in design optimization.
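
To make the RC pi representation concrete, the following Python sketch builds pi sections for an unbranched wire and estimates its delay. Several things here are assumptions rather than details from the extraction flow: the capacitance of each section is split into two equal halves, the wire is driven by an ideal source, the Elmore approximation is used as a stand-in for a full SPICE transient, and the resistance and capacitance values are hypothetical.

```python
def pi_section(r_total, c_total):
    """Return (c_near, r_series, c_far) for one wire section.

    Assumes the section capacitance is split equally between the two ends.
    """
    return c_total / 2.0, r_total, c_total / 2.0


def elmore_delay(sections, c_load=0.0):
    """Elmore delay (seconds) at the far end of an unbranched chain.

    sections: list of (r_total_ohms, c_total_farads), ordered from driver to load.
    The chain is assumed to be driven by an ideal voltage source, so the
    near-end capacitor of the first section does not contribute.
    """
    delay = 0.0
    for i, (r, c) in enumerate(sections):
        # Capacitance downstream of this resistor: its own far-end half,
        # everything in the later sections, and the load.
        downstream = c / 2.0 + sum(cj for _, cj in sections[i + 1:]) + c_load
        delay += r * downstream
    return delay


# Hypothetical wire: two sections of 100 ohms and 20 fF each, 5 fF load.
wire = [(100.0, 20e-15), (100.0, 20e-15)]
print(pi_section(*wire[0]))
print(elmore_delay(wire, c_load=5e-15))
```

Splitting a net into more sections, as the extraction tools do at vias and intersections, refines the approximation of the distributed wire.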

4.2        Logic swing in CML and ECL circuits

            Unlike with logic families such as CMOS, the high and low output voltages of a particular logic circuit (VOH and VOL) are not approximately equal to the supply voltage and ground, respectively, in ECL and CML circuits. Instead, they are independent quantities determined by the circuit design. Choosing a proper value for VOH - VOL is important for the ECL and CML circuits to perform well. If the value is too small, the distribution of current between the two devices in a current switch may be such that the device that should be conducting the majority of the current may not be conducting significantly more current than the device that should be nearly cut off. In this case, given a current tree circuit with multiple levels, it is possible that the circuit may produce a pair of output voltages in which the value of the voltage representing the true signal is less than the value of the voltage representing the false signal. In addition, if VOH - VOL is too small, noise in the system may be sufficient to alter the voltage levels of signals enough to temporarily change the logic values associated with these signals. If VOH - VOL is too large, on the other hand, problems with performance can result. The main problem is that the larger the output voltage swing, the greater the amount of charge that has to be transferred to switch between VOH and VOL. Therefore, a greater voltage swing tends to result in a greater switching time in ECL and CML circuits due to the increased time required to transfer the necessary charge. If the output voltage swing is large enough, it may even cause current switch devices driven by the highest level inputs to saturate, further increasing the amount of charge transfer required to switch between logic states. In this way, the switching time of the circuit is increased even further.

            To analyze the current flow through a current switch (such as the one in the circuit found in Figure 1.1) as a function of the applied input differential voltage in a more quantitative manner [64],[65], one begins by assuming both devices are operating in the forward active region. The collector current of a forward biased bipolar device as a function of VBE is often approximated as

 

$$I_{C} = I_{S}\, e^{V_{BE}/(m V_{T})}.$$

( 14 )

Using this equation, it is possible to divide the collector current of the first device by that of the second device in order to express the quotient of the collector currents in terms of the differential voltage applied between the base nodes of the two devices, producing

 

$$\frac{I_{C1}}{I_{C1b}} = e^{V_{id}/(m V_{T})},$$

( 15 )

where Vid is the applied voltage between the base nodes of the two current switch devices. Assuming current is flowing through the current switch, the current drawn from the coupled emitters is constant in steady state. Therefore,

 

$$\frac{\beta_{DC}+1}{\beta_{DC}}\left(I_{C1} + I_{C1b}\right) = I_{CS},$$

( 16 )

where ICS is the constant current that is drawn through the current switch. Solving for IC1 results in

 

$$I_{C1} = \frac{\beta_{DC}}{\beta_{DC}+1}\,\frac{I_{CS}}{1 + e^{-V_{id}/(m V_{T})}},$$

( 17 )

while solving for IC1b results in

 

$$I_{C1b} = \frac{\beta_{DC}}{\beta_{DC}+1}\,\frac{I_{CS}}{1 + e^{V_{id}/(m V_{T})}}.$$

( 18 )

Examining these two equations, it is apparent that for a large positive value of Vid, IC1 is nearly ICS and IC1b is nearly zero, while for a large negative value of Vid, the situation is reversed. Also, when Vid is zero, IC1 and IC1b are equal. Since the change in IC1 and IC1b is an exponential function of the change in Vid, the magnitude of Vid does not need to be very large for either IC1 or IC1b to approach ICS. This means that the choice of VOH - VOL does not have to be very large to ensure that, in steady state, most of the current flowing through a current switch is flowing through only one of the two devices.

Figure 4.1: IC as a function of Vid for a current switch.

            IC1 and IC1b are plotted as a function of the applied differential voltage to the current switch using data obtained from the above equations as well as data obtained from SPICE simulations using the 97 models. ICS was chosen to be 1 mA while m was chosen to be 1 for the calculations. The plot of the resulting data, shown in Figure 4.1, confirms that the current is evenly divided between the two current switch devices when Vid is zero. Also, from this plot it is apparent that, using either the analytic or SPICE data, the majority of ICS flows through only one current switch device as the magnitude of Vid approaches 0.2 V. Note that the analytic results are not heavily dependent on the value of βDC. However, the slopes of the transfer characteristics are determined primarily by the ideality factor (m) within the exponential term. By adjusting m to 1.5, a much closer match is obtained between the analytic and SPICE results. Based on this data, the minimum design value of Vid for the ECL and CML circuits was chosen to be 0.25 V. This indicates that circuits with differential inputs require VOH - VOL to be 0.25 V while circuits with single-ended inputs require VOH - VOL to be 0.5 V.
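
The analytic current split can be evaluated numerically with a few lines of Python. This is a minimal sketch that assumes the standard exponential collector-current model with an ideality factor m, a thermal voltage of about 25.85 mV (room temperature is an assumption), and an optional common-base gain factor derived from a DC current gain; the function name and the default values are illustrative and are not taken from the thesis.

```python
import math

V_T = 0.02585  # thermal voltage in volts near 300 K (assumed temperature)


def switch_currents(v_id, i_cs=1e-3, m=1.0, beta_dc=None):
    """Return (I_C1, I_C1b) in amperes for a differential input v_id in volts.

    beta_dc=None ignores base current (alpha = 1); otherwise the collector
    currents are scaled by alpha = beta_dc / (beta_dc + 1).
    """
    alpha = 1.0 if beta_dc is None else beta_dc / (beta_dc + 1.0)
    i_total = alpha * i_cs
    i_c1 = i_total / (1.0 + math.exp(-v_id / (m * V_T)))
    i_c1b = i_total - i_c1
    return i_c1, i_c1b


for m in (1.0, 1.5):
    for v_id in (0.0, 0.1, 0.2, 0.25):
        i_c1, i_c1b = switch_currents(v_id, m=m)
        print(f"m={m} Vid={v_id:.2f} V  IC1={i_c1 * 1e3:.4f} mA  IC1b={i_c1b * 1e3:.4f} mA")
```

With these assumptions, more than 99% of the 1 mA tail current flows through one device at a Vid of 0.25 V for either value of m, which is consistent with the choice of 0.25 V as the minimum design value.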

4.3        Buffer propagation delay using CML, ECL, and CMOS circuits

            SPICE simulations were also performed using the 97 and 99 models to determine the performance of the SiGe devices in digital circuits. Differential ECL and CML circuits were simulated using the 97 and 99 models to determine the propagation delay through a buffer as a function of the steady state current flowing through the buffer and as a function of the number of loads driven by the buffer, given a 0.25 V output voltage swing. One way to determine this in SPICE is to simulate a chain of buffers in which two pulses that are the inverse of each other are applied to the input nodes of the first buffer in the chain. The propagation delay is observed through a buffer near the end of the chain. This delay is computed as the difference between the points in time when the buffer input and output differential voltages reach zero as the pulse propagates through the buffer. The buffers following this buffer serve as a load while those before shape the input pulse into something more typical of what the buffer under observation would be exposed to in normal operation. For these simulations, five buffers in series are used to shape the input pulse while two in series are used as a load for the buffer under observation, as illustrated in Figure 4.2. All eight buffers use the same current source reference voltage generator. Ring oscillator simulations (see Chapter 5) were used to verify that this simulation methodology is effective for determining propagation delays.
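
The delay measurement described above, the time between the zero crossings of the input and output differential voltages, can be sketched in Python as follows. The sampled waveforms are hypothetical stand-ins for SPICE output, the function names are introduced here for illustration, and linear interpolation between samples is an assumption about how the crossing time is located.

```python
def zero_crossing(times, values):
    """Time of the first zero crossing of `values`, by linear interpolation."""
    for k in range(len(values) - 1):
        t0, v0 = times[k], values[k]
        t1, v1 = times[k + 1], values[k + 1]
        if v0 == 0.0:
            return t0
        if v0 * v1 < 0.0:
            # Sign change between samples: interpolate to the crossing point.
            return t0 + (t1 - t0) * (-v0) / (v1 - v0)
    if values and values[-1] == 0.0:
        return times[-1]
    raise ValueError("no zero crossing found")


def propagation_delay(times, vin_diff, vout_diff):
    """Delay between the input and output differential zero crossings."""
    return zero_crossing(times, vout_diff) - zero_crossing(times, vin_diff)


# Hypothetical samples: time in ps, differential voltages in volts.
t = [0.0, 10.0, 20.0, 30.0, 40.0]
vin = [-0.25, -0.25, 0.25, 0.25, 0.25]
vout = [-0.25, -0.25, -0.25, 0.25, 0.25]
print(propagation_delay(t, vin, vout))
```

Because only the sign change is used, the same routine applies to an inverting stage, where the output crosses zero in the opposite direction.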

Figure 4.2: Buffer chain used to determine the propagation delay of an ECL or CML buffer.

            One set of buffer chain simulations was performed using differential CML buffers (with schematics shown in Figure 1.1) containing devices with emitter lengths of 1 µm to determine the propagation delay of a CML buffer as a function of the steady state current through the current switches in the buffers. The results of the differential CML buffer simulations using the 97 models, shown in Figure 4.3, demonstrate that when driving a single load buffer, the propagation delay of a differential CML buffer approaches 12 ps as the steady state current through the buffers is increased. The propagation delay increases significantly, however, as the number of loads is increased or as the current flowing through the buffers is decreased. That is, as the current level is decreased to 0.25 mA, the buffer propagation delay increases nonlinearly to nearly 19 ps, which is a 54% increase. On the other hand, if the number of loads the buffer is driving increases to 32, the buffer propagation delay increases to nearly 104 ps, over 8.6 times the delay with one load. Combining the two effects, decreasing the current level of the CML buffer to 0.25 mA and increasing the number of load buffers to 32 results in a propagation delay of nearly 270 ps, over 22 times the delay using 1 mA of current and driving one load buffer. This indicates that increasing the current to the device limit of 1 mA produces the lowest buffer propagation delays. This is because increasing the buffer current level allows the buffer to transfer charge more quickly, which facilitates more rapid device switching. The lowest power-delay product is achieved using a lower current level, however, because of the nonlinear decrease in propagation delay as a function of buffer current. This is an important consideration to prevent overheating of circuits in large designs due to an excess dissipation of power. 
The lowest power-delay product for the CML buffers occurs when the buffers are using less than 0.25 mA, regardless of the number of load buffers. In the RFTC design, CML gates were typically designed to use 1 mA of current since achieving low gate propagation delay was a greater priority than conserving power.
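
The trade-off between delay and power-delay product can be illustrated with the approximate delays quoted above for the 97 models. The Python sketch below assumes a fixed supply voltage, so that power is proportional to the buffer current and the product of current and delay serves as a relative power-delay figure; the dictionary layout and function name are illustrative, and the delays are the rounded values from the text rather than raw simulation data.

```python
# Approximate CML buffer delays (ps) quoted in the text for the 97 models,
# keyed by (buffer current in mA, number of load buffers).
cml_delay_ps = {
    (1.0, 1): 12.0,
    (0.25, 1): 19.0,
    (1.0, 32): 104.0,
    (0.25, 32): 270.0,
}


def relative_pdp(current_ma, delay_ps):
    """Relative power-delay product in mA*ps, assuming a fixed supply voltage."""
    return current_ma * delay_ps


pdp = {key: relative_pdp(key[0], delay) for key, delay in cml_delay_ps.items()}
for (current_ma, loads), value in sorted(pdp.items()):
    print(f"{current_ma} mA, {loads} load(s): relative PDP = {value}")
```

For both loading conditions the 0.25 mA buffer has the smaller relative power-delay product even though its delay is longer, which is the trend described in the text.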

Figure 4.3: Propagation delay through a differential CML buffer as a function of the steady state current under various loading conditions using the 97 models.

            The differential CML buffer chain simulations were repeated using the 99 models when they became available. The trends observed in the results, shown in Figure 4.4, are similar to those observed in Figure 4.3 using the 97 models. This is indicated by the fact that the CML buffer propagation delay when the buffer current is 0.25 mA and the CML buffer is driving one load is 20 ps, which is only 5.8% larger than the delay incurred using the 97 models. If the number of load buffers is increased to 32, at 0.25 mA, the CML buffer propagation delay increases to 270 ps, which is identical to the delay observed using the 97 models. When driving 32 loads with a buffer current of 1 mA, the propagation delay decreases to 100 ps, which is only a 3.7% decrease from the delay observed using the 97 models. However, when a CML buffer with a current level of 1 mA is driving only one load, the propagation delay is 15 ps, which is 22% higher than the delay observed using the 97 models. These results demonstrate that at higher current levels, the CML buffer is slower, but less sensitive to loading using the 99 models when compared with results using the 97 models, while at lower current levels, the CML buffer behaves similarly using either set of models. Again, the power-delay product of the CML buffer reaches its minimum value at a current level below 0.25 mA.

Figure 4.4: Propagation delay through a differential CML buffer as a function of the steady state current under various loading conditions using the 99 models.

            Based on the results shown in Figure 4.3, a steady state current of 1 mA was chosen for the current switches in the level 2 and level 3 differential ECL buffers with current switch devices having 1 µm emitter lengths. Buffer chain simulations were performed using level 2 differential ECL buffers with devices having 1 µm emitter lengths as a function of the steady state current through the emitter followers of the buffers. Simulation results for the level 2 differential ECL buffer using the 97 models, shown in Figure 4.5, also indicate a propagation delay of 12 ps at an emitter follower current of 1 mA. However, the propagation delay of the level 2 buffer only increases to 14 ps as the emitter follower current is reduced to 0.25 mA, which is an 18% increase in delay over the level 2 buffer with 1 mA emitter follower current levels. This is much lower than the 54% increase in delay observed in the CML buffer as its current was reduced from 1 mA to 0.25 mA using the same models.

Figure 4.5: Propagation delay through a level 2 differential ECL buffer as a function of the steady state emitter follower current under various loading conditions using the 97 models.

            When the number of buffer loads driven by a level 2 buffer with emitter follower current levels of 1 mA is increased to 32, the buffer propagation delay increases to 63 ps, which is 5.5 times greater than the propagation delay when driving only one buffer load. This is an improvement over the driving capability of the CML buffer, whose delay increases to 104 ps when driving 32 load buffers. Combining the two effects, the propagation delay of a level 2 buffer with emitter follower currents of 0.25 mA and driving 32 load buffers increases to 94 ps, which is 8.2 times greater than the delay observed using 1 mA emitter follower currents and driving one buffer load. This is over 2.8 times smaller than the 270 ps propagation delay observed when using a CML buffer with a 0.25 mA current level to drive 32 load buffers. The lowest power-delay product for the level 2 ECL buffer occurs when the emitter follower current levels are about 0.25 mA if the buffer is driving eight or fewer buffers. When driving 16 to 32 load buffers, the power-delay product is lowest at emitter follower current levels of about 0.5 mA. Note also that the delay begins to increase as the current through each emitter follower increases beyond 1 mA, especially when driving a large number of buffers. This is most likely caused by device performance degradation at the high current density in the devices when active. The unity gain frequency (fT) of the HBT, an indicator of device performance, is also degraded at high current levels [2].
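
One way to compare the driving capability of the two buffer types is the average delay added per additional load. The Python sketch below uses the rounded 97-model delays quoted in the text at 1 mA and assumes, purely for illustration, that delay grows linearly between 1 and 32 loads; the function name is hypothetical.

```python
def delay_per_added_load(delay_1_ps, delay_n_ps, n_loads):
    """Average delay (ps) added per extra load between 1 and n_loads loads.

    Assumes a linear dependence of delay on the number of loads.
    """
    return (delay_n_ps - delay_1_ps) / (n_loads - 1)


# Rounded delays from the text at 1 mA using the 97 models.
cml = delay_per_added_load(12.0, 104.0, 32)   # CML buffer
ecl2 = delay_per_added_load(12.0, 63.0, 32)   # level 2 ECL buffer
print(f"CML: {cml:.2f} ps/load, level 2 ECL: {ecl2:.2f} ps/load")
```

Under this linear assumption the level 2 ECL buffer adds roughly half as much delay per load as the CML buffer, in line with the improved driving capability noted above.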

Figure 4.6: Propagation delay through a level 2 differential ECL buffer as a function of the steady state emitter follower current under various loading conditions using the 99 models.

            The level 2 differential ECL buffer chain simulations were also repeated using the 99 models. The trends observed in the results, shown in Figure 4.6, are similar to the trends observed in Figure 4.5 using the 97 models. However, unlike with the CML buffer simulations, there is only a good match between the simulation results using the two sets of models when the emitter follower current is low and the level 2 ECL buffer is driving a large number of loads. For instance, a level 2 ECL buffer with emitter follower current levels of 0.25 mA that is driving 32 load buffers has a propagation delay of 95 ps, which is 1.2% higher than the result obtained using the 97 models. When the level 2 ECL buffer with emitter follower current levels of 0.25 mA is driving only one load, however, the propagation delay is 17 ps, which is 22% larger than the delay observed using the 97 models. These results exemplify the fact that, similar to the CML buffer with a high current level, the level 2 ECL buffer has a longer propagation delay when lightly loaded, but is less sensitive to loading effects when using the 99 models as compared with the 97 models. The pattern holds at higher current levels as well. For instance, a level 2 ECL buffer with an emitter follower current level of 1 mA that is driving one buffer load has a propagation delay of 14 ps, which is 26% larger than the delay observed using the 97 models. However, when the level 2 ECL buffer is driving 32 load buffers, the propagation delay is 72 ps, which is only 14% larger than the delay observed using the 97 models. The power-delay product reaches a minimum value when the level 2 ECL buffer emitter follower current levels are about 0.25 mA if the buffer is driving 8 or fewer load buffers, or if there are 32 load buffers. When driving 16 load buffers, the power-delay product reaches a minimum value when the level 2 ECL buffer emitter follower current levels are about 0.5 mA.

            A set of buffer chain simulations was also performed using level 3 differential ECL buffers with devices having 1 µm emitter lengths as a function of the steady state current through the emitter followers of the buffers. Simulation results for the level 3 ECL buffers using the 97 models, shown in Figure 4.7, indicate a trend similar to that of the level 2 ECL buffers. However, the propagation delay is greater for the level 3 ECL buffers under similar simulation conditions. For instance, at an emitter follower current level of 1 mA, a level 3 ECL buffer has a propagation delay of 14 ps when driving one load buffer, which is 18% greater than the level 2 ECL buffer propagation delay under similar conditions. When driving 32 load buffers, the level 3 ECL buffer propagation delay increases to 77 ps, which is 23% larger than the level 2 ECL buffer propagation delay under similar conditions. With an emitter follower current of 0.25 mA and driving either 1 or 32 buffer loads, the level 3 ECL buffer propagation delays are 17 ps and 110 ps, respectively, which are 28% and 17% larger than the respective propagation delays of the level 2 ECL buffer under similar conditions. Therefore, the level 3 ECL buffer propagation delay is more sensitive to loading than the level 2 ECL buffer when both buffers use high emitter follower current levels and less sensitive to loading when both buffers use low emitter follower current levels. Since the level 3 ECL buffer propagation delay when driving 32 loads is much lower than the propagation delay of the CML buffer, but higher than the CML buffer propagation delay when driving only one load buffer, the level 3 ECL buffer has less sensitivity to loading than the CML buffer. The power-delay product reaches a minimum value for the level 3 ECL buffer when the emitter follower current levels are about 0.25 mA if driving 1 to 4 or 32 load buffers. 
If driving 8 to 16 load buffers, the level 3 ECL buffer power-delay product reaches a minimum value when the emitter follower current levels are about 0.5 mA.

Figure 4.7: Propagation delay through a level 3 differential ECL buffer as a function of the steady state emitter follower current under various loading conditions using the 97 models.

            The level 3 differential ECL buffer chain simulations were also repeated using the 99 models. The trends observed in the results, shown in Figure 4.8, are similar to the trends observed in Figure 4.7 using the 97 models. As with the level 2 ECL buffer simulations using the 99 models, the level 3 ECL buffer simulations using the 99 models show greater propagation delays under lightly loaded conditions, but less sensitivity to loading as the number of buffer loads is increased when compared to simulations using the 97 models. For instance, with an emitter follower current of 0.25 mA and driving either 1 or 32 buffer loads, the level 3 ECL buffer propagation delays using the 99 models are 21 ps and 112 ps, respectively, which are 19% and 1.5% larger than the respective propagation delays of the level 3 ECL buffer using the 97 models. Also, with an emitter follower current of 1 mA and driving either 1 or 32 buffer loads, the level 3 ECL buffer propagation delays using the 99 models are 17 ps and 87 ps, respectively, which are 24% and 13% larger than the respective propagation delays of the level 3 ECL buffer using the 97 mo