There are two kinds of differential output bipolar logic circuits used in the present work. The nomenclature used for these circuits in the current literature is not clear. The commonly used terms ECL (Emitter Coupled Logic) and CML (Current Mode Logic) are identified by the "authorities" in the field as distinctions without difference, in ways that have been found to be more confusing than useful. It becomes evident that a system of nomenclature is needed to successfully discuss the design of circuits relevant to the present research.
The common denominator between these logic families is a source whose current is steered to pull down one of two (or several) output lines that have pullups, differentiating asserted and nonasserted outputs. In light of this shared construction, these circuits might be generically labeled Current Steering Logic (CSL). The current is steered via emitter-coupled transistors. The proportion of current through the transistors is exponentially dependent on the difference in base-emitter voltages. For a modest voltage swing, e.g. 250 mV, essentially the one transistor with the highest base voltage will be "on", conducting all of the current, and the other(s) will be "off", conducting no current. There are two ways of arranging the current switch. One way, shown in Figure 4-1, calls for one or more transistors driven by single-ended input lines, and with their collectors as well as their emitters tied together, to be associated with a transistor whose base is held at a constant reference voltage. This is a Single-Ended Reference Current Switch (SERCS). Alternatively, the two lines of a differential pair may be used to drive the bases of a pair of transistors tied at their emitters. This is a Differential Current Switch (DCS) [EICH91], shown in Figure 4-2.
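The exponential dependence can be made concrete with the ideal bipolar differential pair relation, in which the fraction of the source current carried by one transistor is 1 / (1 + exp(-dV / V_T)). The following is a minimal sketch under that ideal assumption (no emitter resistance, matched devices). The function name `steered_fraction` and the thermal voltage of about 25.85 mV are illustrative assumptions, not values taken from this work.

```python
import math

V_T = 0.02585  # thermal voltage kT/q in volts near 300 K (assumed value)

def steered_fraction(delta_v, v_t=V_T):
    """Fraction of the source current carried by the transistor whose
    base is delta_v volts above the other base (ideal emitter-coupled pair)."""
    return 1.0 / (1.0 + math.exp(-delta_v / v_t))

# Print the current split for a few differential swings, including 250 mV.
for swing_mv in (0, 60, 125, 250):
    frac = steered_fraction(swing_mv / 1000.0)
    print(f"{swing_mv:4d} mV -> {100 * frac:.4f}% of the current")
```

Under these assumptions a 250 mV difference steers more than 99.99% of the current into one transistor, which is the sense in which one device is "on" and the other "off".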
The fanout of each signal for the ripple carry cell internal to each select stage is one. A signal either goes to the next ripple carry cell, or it is an output from the adder. The carry lines output from the multiplexers, however, are much more heavily loaded. They must drive the carry multiplexer at the end of the next stage, as well as a sum multiplexer for every bit of that stage. In most cases, it is the carry line that is on the critical path, so it is acceptable to allow the sum multiplexers to be driven by a slower signal. This is accomplished by using a buffer gate to drive the sum multiplexers. The output of a carry multiplexer then drives only two loads. The only case where this is not true is for the last stage, where, before precise consideration of parasitics, the carry and sums should come out at about the same time. With the buffering, however, and the loading of the line selecting the sums being the greatest for this stage, the sum on the last bit actually has the longest delay. This may need to be handled as a special case.
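The carry select organization discussed above can be sketched behaviorally. The model below is only an illustration: each stage ripples twice, once for each assumed carry in, and the incoming stage carry picks both the sums and the carry passed on. The stage widths are hypothetical (only the five-bit first stage is stated in this work), and the names `ripple` and `carry_select_add` are invented for the example.

```python
def ripple(a_bits, b_bits, cin):
    """Ripple-carry add over lists of bits (LSB first); returns (sum_bits, cout)."""
    sums, c = [], cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ c)
        c = (a & b) | (c & (a | b))
    return sums, c

def carry_select_add(a, b, cin=0, widths=(5, 6, 7, 7, 7)):
    """Carry select addition; `widths` are assumed stage sizes summing to 32."""
    n = sum(widths)
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    result, carry, pos = 0, cin, 0
    for w in widths:
        sa, sb = a_bits[pos:pos + w], b_bits[pos:pos + w]
        s0, c0 = ripple(sa, sb, 0)      # chain assuming carry in = 0
        s1, c1 = ripple(sa, sb, 1)      # chain assuming carry in = 1
        chosen = s1 if carry else s0    # sum multiplexers for this stage
        for i, bit in enumerate(chosen):
            result |= bit << (pos + i)
        carry = c1 if carry else c0     # carry multiplexer at end of stage
        pos += w
    return result, carry

print(carry_select_add(0xFFFFFFFF, 1))
```

In this model the selected carry is consumed by every sum multiplexer of the next stage plus the next carry multiplexer, which is the loading that the buffer gate is meant to relieve.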
For each bit, the following signals are needed: two carry outs, a carry select, an ALU function and a sum. Five current trees are needed per bit: two carry generators, a carry selector, an ALU function generator and a sum generator.
For the first bit of each stage, the static carry inputs lead to simplification of the carry generation and selection circuits. A statically driven series-gated current switch can be replaced with one short and one open connection oriented according to the static signal desired. Therefore the corresponding current switch can be eliminated, reducing circuit size. The two carry generators become just an AND and an OR. The carry select loses the full multiplexer, and is a function of just the select and the clear.
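The reduction of the two carry generators to an AND and an OR follows from the full-adder carry (majority) function with its carry input tied off. A quick exhaustive check, using a hypothetical helper `carry_out` that is not part of the design itself:

```python
def carry_out(a, b, cin):
    """Full-adder carry: the majority of the three inputs."""
    return (a & b) | (a & cin) | (b & cin)

for a in (0, 1):
    for b in (0, 1):
        assert carry_out(a, b, 0) == (a & b)  # carry in tied low  -> AND
        assert carry_out(a, b, 1) == (a | b)  # carry in tied high -> OR
```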
The layout of a monolithic ALU as a single macro cell was achieved with three smaller macro cells. Two cells contain the complete functionality for one bit, akin to a bitslice design: the "MID" cell with the full circuitry for use in the middle of a carry select stage, and the "HEAD" cell with reduced carry generation for use at the beginning of a select stage. Each of these cells contains two carry generators, a carry selector, an ALU function generator and a sum generator. The data flow was generally considered to be
The carry generators are placed near the "top" to reduce any interconnect delay on the operands. In both cells, the carry generators are side by side and define the cell width, reducing interconnect delay on the carries as well. The carry selector is placed below that on the right, again reducing interconnect distance on the carries.
The ALU function generator is placed to the left of that. Since the sum generator depends on both the carry selector and the ALU function generator, it is located below those two units, with the sum exiting out the bottom of the cell. In addition, the MID cell contains a "sneak path" for the selected carry for use in computing ALU overflow. Since overflow depends on the last carry and the next-to-the-last carry, the last MID cell in each stage has an external connection to this carry, and the last stage can supply the carry for the overflow computation.
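The dependence of overflow on the last two carries can be illustrated with the usual two's complement rule, in which overflow is the exclusive OR of the carry into the most significant bit and the carry out of it. The sketch below assumes two's complement operands; the function name and the 8-bit default width are chosen only to keep the example short.

```python
def add_with_overflow(a, b, width=8):
    """Return (sum, overflow) for a two's complement add of `width` bits."""
    mask = (1 << width) - 1
    low_mask = mask >> 1
    # Next-to-the-last carry: the carry into the most significant bit.
    carry_into_msb = ((a & low_mask) + (b & low_mask)) >> (width - 1)
    total = (a & mask) + (b & mask)
    # Last carry: the carry out of the most significant bit.
    carry_out = total >> width
    return total & mask, bool(carry_into_msb ^ carry_out)

print(add_with_overflow(127, 1))    # largest positive value plus one
print(add_with_overflow(0xFF, 1))   # -1 plus 1
```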
The CMUX cell contains the carry multiplexer at the end of every stage. The multiplexer is located at the top of the cell, where the carries come in on the right and the selected carry goes out the left. The CMUX cell also contains buffers for the control signals for the next stage. In addition, a superbuffer drives a copy of the selected carry signal that drives the carry selectors in the HEAD and MID cells, separate from the carry that drives the next carry multiplexer. These buffers are used to reduce the loading due to fanout. The control signals drive all 32 bits, which is quite a large fanout and needs to be split up. The superbuffer for the copy of the selected carry also has a lower sensitivity to capacitance than a regular buffer, reducing the performance hit caused by the increasing number of loads per stage. This has the added benefit of loading each carry multiplexer similarly with a superbuffer and another carry multiplexer, leading to similar delays for each.
There is an additional "START" cell that drives the control signals and carry in to the first stage. Conceivably, the control signals could be driven from the left, by buffers in the CMUX for that stage as opposed to buffers in some cell previous to that stage. However, there would still be the need for the superbuffer for the copy of the carry in to the first stage.
Figure 4-4 illustrates the combination of these cells into a carry select stage. Shown is the low end of the complete ALU, focusing on the five-bit-wide first stage.

A SPICE extraction of the layout, with interconnect capacitance, was used to simulate the design to determine delays through the circuit for addition. CLR is held low to activate the carry path, and AND and XOR are low and high respectively to produce an addition. With the A operand bits low, and the high-order bit of the B operand high, toggling either the least significant bit of B or Cin will produce a change on Cout.
Some representative timings taken from the SPICE output trace are shown in Table 4-I, while a complete output trace is shown in Figure 4-5. The longest path through the carry chain is from the b0 input to the cout output; thus, the delay of the adder could be characterized as approximately 283 ps.

As a fully differential CSL gate maps directly to a binary decision tree, the set of possible operations is very rich. However, since each series-gated level in general provides only one input and finite supply voltages limit the number of series-gated levels, the fanin is strictly limited. In most cases, only one input per series-gated level is allowed. On the other hand, with a fully single-ended CSL gate a new input may be added just by tying another transistor in the emitter/collector tied arrangement, but this only produces simple operations.
The equations for creating carry lookahead require a certain amount of complexity that cannot be achieved with just a single-ended CSL gate. Using differential CSL imposes a tighter fanin limit, which increases the number of levels of lookahead needed to cover the full operand width and reduces the speed of the adder. What is needed is a hybrid logic which mixes width with complexity.


The equation for pseudo-carry lookahead is of the form $g_2 + p_2 g_1 + p_2 p_1 g_0$. Figure 4-6 shows in (a) that fully differential CSL requires three levels to compute $g_1 + p_1 g_0$ and in (b) five levels for $g_2 + p_2 g_1 + p_2 p_1 g_0$. By bringing in all of the $g$ signals single-ended, as in Figure 4-7, the circuit can be reconfigured to generate $g_2 + p_2 g_1 + p_2 p_1 g_0$ in only three levels. This means the depth of the carry tree will be a function of $\log_3 n$, as opposed to $\log_2 n$, in the operand width $n$ for the same supply voltage.
Two levels of series-gated DCS at the base of the lookahead CSL tree are driven by the $p$ inputs to create a three-way selection. The level above that contains three SERCS driven by the $g$ inputs. With both $p$ inputs asserted, all three $g$ inputs are acceptable; with only the high-order $p$ input asserted, the upper two $g$ inputs are acceptable; with no $p$ inputs asserted, only the high-order $g$ input is acceptable.
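The three-way selection can be checked with a small behavioral model. The sketch below assumes the lookahead term has the standard three-group form g2 + p2 g1 + p2 p1 g0, with the two selecting inputs named p2 and p1 and the three single-ended inputs named g2, g1 and g0. These names, and the function `hybrid_lookahead`, are assumptions made for illustration rather than the notation of the circuit itself.

```python
from itertools import product

def hybrid_lookahead(p2, p1, g2, g1, g0):
    """Two selecting levels decide which single-ended inputs may assert the output."""
    if p2 and p1:
        acceptable = (g2, g1, g0)   # both selects asserted: all three accepted
    elif p2:
        acceptable = (g2, g1)       # only high-order select: upper two accepted
    else:
        acceptable = (g2,)          # high-order select low: only g2 accepted
    return int(any(acceptable))

# Compare against the assumed sum-of-products form for all 32 input patterns.
for p2, p1, g2, g1, g0 in product((0, 1), repeat=5):
    expected = g2 | (p2 & g1) | (p2 & p1 & g0)
    assert hybrid_lookahead(p2, p1, g2, g1, g0) == expected
```

The model uses three decision levels (two selects plus one wired group of single-ended inputs) for a five-input function, which is the saving the hybrid gate is meant to provide.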


The first layer of the carry tree, driven directly by the operand bits, cannot utilize this gate and will cover groups of two bits, not three. The depth of the tree can then be calculated as 4 gates for a 32-bit operand: a two-bit first layer followed by three-way layers spans 2, 6, 18 and 54 bits. This tree is shown in Figure 4-8. One additional gate can cover final sum computation and latching, resulting in a projected 5-gate-deep adder.
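The depth figure can be reproduced by counting how many bits each layer of the tree spans. The sketch below assumes a 32-bit operand (the width quoted for the ALU control signals) and a tree whose first layer merges two bits while every later layer merges a fixed number of spans from the layer below. The helper name `tree_depth` is hypothetical.

```python
def tree_depth(width, first_group, group):
    """Layers needed when the first layer covers `first_group` bits and each
    later layer merges `group` spans from the layer below."""
    span, depth = first_group, 1
    while span < width:
        span *= group
        depth += 1
    return depth

hybrid = tree_depth(32, 2, 3)        # spans 2, 6, 18, 54
differential = tree_depth(32, 2, 2)  # spans 2, 4, 8, 16, 32
print(hybrid, differential)
```

Under these assumptions the three-way gate saves one layer over a purely two-way tree at 32 bits, and adding the final sum and latch gate gives the projected total.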