When designing a carry select adder, the obvious question is where to divide the select stages. The first idea in dividing the bits is to recognize the goal of parallelization of the groups and divide bits evenly into blocks [WEST93]. The delay for the entire adder would then be addition time for bits plus the delay through multiplexers to select the correct outputs of ever select stage.

Further optimization recognizes the multiplexers used to select the proper output from each stage themselves incur a signal delay. Since the selection of an output for a stage takes as an input a previous stage's multiplexer output, it may be a larger stage as it has longer to complete; time-wise it may lengthen by the delay of a multiplexer. While carry select with even sized stages is scales linearly (although at a rate scaled by a constant from the ripple carry time), increasing each stage can be shown to scale with the square root of the size of the adder.

Varying the size of each stage of a carry select adder, as opposed to the simple method of equal stage sizes, can reduce the critical path delay. Optimally, the carry in should arrive just as the carry out possibilities have been generated, and these signals should enter the carry out multiplexer at the same time. Therefore, each stage should be slightly shorter than its successor. How much shorter depends upon the relative delay through the multiplexer as compared to that of the "gate" generating the lower level carry. These are typically considered the same "gate delay", but depending on the technology can not only be very different but can also vary significantly due to circuit context.1

If is the number of bits in stage , is the delay time across the "carry" gate, is the delay time across a multiplexer, and is the number of stages, then the delay of the carry chain is

is true2.

If a constant step in size is used between each stage, then it follows that the step size should be

If B is the total number of bits in the adder, then (considering that the final stage may not need to be cN to meet size requirements)

Solving the latter expression for c1 yields

Substituting into the following expression for total delay

results in this expression for
in terms of
, B and s:

is required to be an integer, and is typically considered as an integer as well, so if we consider and as the ceiling3 of approximate variables

Since B and s are given, the minimization of delay is solely a function of N. At this minimum, we know that

Since the term is positive, the inequality will be true for between the roots of the equality. So finding the roots

The lowest value of may be chosen4, given from above, by:

Given these values for and , can then be found.

325 |
||

No Possible Sequence6 |

In a facile design of a carry select adder, s would be assumed to be 1. Given that, and B=32, a table of td/tg versus N could be calculated as shown in Table 3-I.

In the semiconductor technology and digital logic family being discussed considered, s can actually be up to 2 [NAH91]. If s=2, one can surmise by inspection that these slow multiplexers will make additional stages "costly". This changes our stage arrangement as shown in Table 3-II.

327 |
||

No Possible Sequence8 |

The pseudo-carry [LING66, LING81] is based on carry look ahead addition. Carry look-ahead addition, sometimes referred to as "parallel addition", creates a carry for a bit by combining signals generated at each preceding bit. This can be performed by wide fan-in gates, or by a tree-like structure where the look-ahead combines larger and larger blocks. The pseudo-carry improves on that by distinguishing limiting and non-limiting delay paths, and moving delay from the limiting paths to the non-limiting paths.

In the context of carry look-ahead, the computation of a carry out can be reformulated in term of the propagate( ) and generate( ) signals:

The term can be further expanded:

If in fact is computed with an inclusive or as is shown here(in some methods, it is generated by an exclusive or), an identity exists

Thus we can factor the out of the equation for :

The remainder of the terms is known as the "pseudo-carry", . The term can then moved into the equation for Sn:

The term, identified as , can be computed in parallel with the term as well as the term resulting in the expression

The question remains, however, what has been gained by this manipulation? The key is the reduction in fan in from to for a given block size to cover. Specifically in the case of the current-steering logic gates being considered, increasing complexity severely decreases the available fan in for a single gate. Furthermore, by inspection neither or will exceed the delay for . Thus delay on a critical path has been traded for delay on non-critical paths.

The generalized equations for group pseudo-carry can be derived in the same manner as those for group generates . Given the expression for carry in terms of propagates and generates

we can factor those lesser terms into group generates and propagates of the form by the following rules:

- Each term of will also contain every higher ordinal subgroup propagate than the subgroup generate for that term. The term with the highest ordinal subgroup will thus contain no subgroup propagates.

For example, the four-bit carry

can be broken into two groups in this manner:

9,

Likewise, group pseudo-carries can be made by adding a rule to factor out a .

One can compute higher-level terms from those of sub-blocks in the same manner as for carry select. Note that due to the factoring which creates the pseudo-carry, the terms are one bit position behind the corresponding terms.

In general, the pseudo-carry look-ahead tree is expanded by:

and in the case of a three-input look-ahead gate10

The carry-out of a 32-bit adder can thus be generated from the 32-bit pseudo-carry tree:

2. I.e. the longest delay is through the first stage and all of the carry multiplexers, as opposed to coming through one of the intermediate stages. One can apply this statement without loss of generality. Consider the largest k such that k is not greater than N and

The previous stages do not impinge on the critical path delay and we need only consider the trailing subset of the adder stages as an adder in its own right, with N-k+1 stages and where it is true that

3. The ceiling of a number is the least integer not less than that number, e.g. if the representation of ceiling is then .

4. With the lowest applicable N, the fewest stages are used, which probably result in an easier circuit to build. So selecting the lowest N is not exactly arbitrary.

6. There is no sequence of length seven that also maintains the step size. Whereas and , the delay for the nine stage adder would be at least 10 , it is obviously not a consideration here.

8. There is no sequence of length seven that also maintains the step size. Whereas and , the delay for the nine stage adder would be at least 15 , it is obviously not a consideration here.

9. The symbols for group generate and propagate are and , where is the size of the group and is the starting position. Note that

10. See Mixed logic look-ahead gate discusses the significance of the three-input look-ahead gate.