Leading up to this candidacy proposal, preliminary research activities have been conducted in the formal algebraic background of binary addition methods and construction of some test circuitry to demonstrate the feasible speeds of these methods. However this does not yet comprise a manufacturable adder or a well-founded claim to such an adder's speed.
An no-cost oportunity for fabrication of a partial pseudo-carry look-ahead carry tree test structure for 32-bit operands arose early in this research, and such a test structure was designed. This circuit generates the highest order pseudo-carry for 32-bit operands and feeds that back as an input, creating the ring oscillator as shown in Figure 5-1. This circuit is intended as a proof of speed of the pseudo-carry circuit. The measured results will be compared to back-annotated circuit simulation to examine the accuracy of the modeling and identify potential problem area that may affect future designs. The dependence of the delay of the various signal paths on the variables of supply voltage and temperature must be measured, and the resulting data analyzed to provide justifiable claims as to the performance of these circuits.
There are two carry-tree test structures, each of which consists a carry-tree with a static 1 applied to the A operand and a static 0 applied to the B operand. The carry tree is made to oscillate by inverting the carry out and then applying that signal to one of the operand bits. Each test structure introduces the feedback signal into a small set of different points in the carry-tree which are selectable. This allows multiple paths through the tree to be exercised.
The test structure is based around the carry tree that generates the 32nd carry extracted from a complete set of distribution trees for 32-bit addition. It contains 5-gate deep signal paths, comprising of 3 mixed-logic look-ahead gates,1 single-ended input gate, and 1 differential gate. This corresponds to the paths that would exists in the carry distirbution of the full adder. The unloaded simulation results in Figure 5-2 show the expected performance of the test structure.
These results will be compared to back-annotated circuit simulations to conducted to show the estimated dependence on loading, temperature and supply. The results of these comparisons will be used to project the actual performance of constructed adders
The mathematics behind the construction of carry look-ahead and similar circuits can be mapped to a parallel prefix problem. A parallel prefix computation is defined as the generation of the sequence comprised of all prefixes of a series-like computation using some associative operator. More specifically:
A well-developed theory of parallel prefixes exists, but not all connections with carry generation have been explored in full. Of specific interest are parallel prefixes using ternary operators, which are the analog of the three-way look-ahead gates described above.
As a basis for construction of complete carry trees, constructive solutions for the ternary prefix problem must be found. Critical to the appropriateness of these constructions will be the relationship of circuit depth to bounds on fan-in and fan-out. Parallel prefix computation is often considered in tandem with threshold circuits, with fan-in loosely bounded or even unbounded. Unbounded fan-in is of course impractical in physical circuits design, and the impact of fan-in was touched upon in the discussion of the three-way look-ahead gate in See Mixed logic look-ahead gate.
The circuitry designed so far is not an adder. It is only a carry tree, and only for the most significant carry. A complete tree producing 32 outputs must be designed. Naïve implementations tend to create very high fan-out for carries out of large blocks, up to the size of the following blocks. The variable delay introduced by such fan-out is not acceptable. Designs for regular carry look-ahead trees all use only binary look-ahead gates, like in Figure 5-3. A new tree design which can utilize two and three input look-ahead gates while maintaining a minimal fan-out is needed. This will involve computation of carry-generation equations, based on the ternary prefix problem, for all bit positions that are constrained by the form of the 3-input look-ahead gate.
Optimization of the bit ordering is also required. The loading of a gate's output is not only a function of fan-out but also of length of interconnect to each of those fan-out destinations. A carry look-ahead tree combines inputs from widely separated bits at each node, increasing at each level. Both the total load contributed by interconnect and imbalance of the interconnect on separate fan-out branches increase the delay of an output. If a strictly ordinal layout need not be enforced, large separations may be narrowed at the expense of widening short paths resulting in more balanced wire delays. Sensitivity of delay to the two variables of total and unbalanced loading will have to be analyzed, and heuristic for comparing different bit ordering needs to be generated from that short of creating a layout for every possible order.
The mixed single- and double-ended look-ahead gate is a core piece of circuitry for this pseudo-carry look-ahead adder. Design decisions thus far have sometimes been made primarily on the depth of the logical design, with only general consideration of the impact on the altered circuit design and thus far none for area. The necessity to design the carry tree to limit fan out has already been discussed. The effect of fan-out, fan-in, and tail current on gate delay need to be defined more precisely. The required noise margins for single-ended inputs in the given technology need to be analyzed to determine the minimum input swing needed, with the goal of reducing rise time by reducing output swing. The drive capabilities of straight level-1 outputs and the sensitivity of delay to interconnect and device loading need to be compared level-2 emitter-follower outputs, specifically relating to driving the single-ended inputs with emitter follower outputs. If these can be driven on level-1, specifically on the connections from the first to second layer where the interconnects will be shortest due to the small range that the blocks cover, the gate could be expanded to four inputs from three or could possibly be moved to -3.6 V.
The series-gated input level selection also provides opportunity for improving both fan-in (functional complexity) and power. The series-gated levels have thus far been separated by 1 drop. If they are instead separated by one-half of that amount the number of available levels for a given supply voltage increase or the needed supply voltage for a given number of inputs decreases. Though the gate may slow down, the question remains whether this could be offset by reducing the depth of the circuit with the increased fan-in. Noise margins and sensitivity to loading will have to be calculated to determine if and when such a gate would provide an advantage in the carry look-ahead tree.
The focus of this research thus far has been entirely on 32-bit addition. The work of the research group many areas, at least one of which is finding the need for 64-bit arithmetic. The methods developed for minimal delay adders need to be generalized for arbitrary widths for use in other researchers projects.
The pseudo-carry logic developed in this research changes the relative merits of look-ahead adders versus other adder types. It is well known that carry look-ahead decreases delay at the expense of increased area, and in some opinions at the expense of increased complexity as well. Comparisons of the new trade-offs between delay, circuit size, routing complexity, and regularity should be drawn up, to demonstrate the place of the new design in the arena of available binary adders. Also of interest are the merits of hybid, where the assets of different adder are combined to lessen their liabilities [TYAGI93].
Along those lines, identification of tasks to which the new adder would be well suited would help to demonstrate the advantages of the design. For example, the circuit depth needed for a 32-bit pseudo-carry look-ahead adder applies to operands up to 54 bits. This is wide enough to accept the significantly of IEEE 754 double precision floating point numbers[IEEE85]. This suggest the suitability of the pseudo-carry look-ahead when fast floating point operations are needed.