Introduction and Historical Review
As the clock speeds of state-of-the-art digital systems approach the gigahertz (GHz) range, one of the important issues that must be addressed for a reliable synchronous digital system is the skew of the system clock. Clock skew increases machine cycle time, which lowers system performance. Since clock skew does not scale down automatically with the reduction in machine cycle time, skew control becomes critical as the operating frequency of the system clock is increased.
There are two sources of clock skew. One is the delay mismatch in the physical distribution of the clock signals. The topology of the logic layout may result in different delays to the different physical points of the clock distribution. The other source of skew arises during the normal operation of any system. Temperature variations due to local heating across a system result in delay variations as a function of time. In other words, a given clock signal distribution may experience delay variations over the duration of system operation.
The clock deskew scheme developed and presented in this thesis provides a novel approach to control chip-to-chip skew present in the distribution of the system clock. This scheme is especially well suited for the skew created in the clock distribution system. The scheme requires a Phase-Locked Loop (PLL) circuit for every chip that requires a clock signal. It is also applicable to a system with multiple clock phases under the special restrictions outlined in Chapter 2.
The deskew scheme has been developed as part of an ARPA-sponsored project to build a 1 ns computer, the Fast Reduced Instruction Set Computer version G (F-RISC/G), in Rockwell GaAs/AlGaAs HBT technology. Multichip module (MCM) package technology has been chosen as the vehicle for assembling the chips comprising the F-RISC/G, which was partitioned due to the yield limitations of the experimental GaAs HBT "emitters up" fabrication process. A Phase-Locked Loop (PLL) circuit developed for an implementation of the scheme is discussed in Chapter 3. It has been designed in differential logic wherever suitable, both for lower noise sensitivity and for I/O signal-level compatibility with external circuits. An architectural feature of the F-RISC/G requires multiple phases in the system clock; four phases, to be more specific. The clock phase generator circuit presented in Chapter 4 efficiently generates the four phases and is compatible with the deskew scheme. The actual system clock frequency required for the generation of the four phases is only twice the desired system speed; for example, a 2 GHz system clock is sufficient for the generation of four 1 GHz clock phases in the F-RISC microprocessor. In Chapter 5, a test vehicle design and suitable test plans are presented.
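The arithmetic behind the 2 GHz figure can be checked with a short sketch. It assumes, purely for illustration, that the four phases are evenly spaced and that each one starts on a rising or falling edge of the master clock; the function name and this edge assignment are hypothetical and are not taken from the generator circuit of Chapter 4.

```python
from fractions import Fraction

def four_phase_timing_ps(master_hz):
    """Return (phase period, phase start offsets) in picoseconds.

    Assumption (not from the thesis): the four phases are evenly spaced and
    each starts on a rising or falling edge of the master clock.
    """
    master_period_ps = Fraction(10**12, master_hz)
    edge_spacing_ps = master_period_ps / 2       # both clock edges are usable
    phase_period_ps = 4 * edge_spacing_ps        # four edges span one phase period
    offsets_ps = [k * edge_spacing_ps for k in range(4)]
    return phase_period_ps, offsets_ps

period_ps, offsets_ps = four_phase_timing_ps(2_000_000_000)
print(period_ps, offsets_ps)
```

Under these assumptions a 2 GHz master clock has edges every 250 ps, which is one quarter of the 1 ns period of a 1 GHz phase.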
The second part of the thesis is devoted to the design of a 32x8-bit register file, which is an integral part of F-RISC/G. The two main design objectives were speed and size. Because of the low yield expectations, there was a strong incentive to reduce the final design to its simplest possible form. The GaAs/AlGaAs transistors have an fT in excess of 50 GHz in the Rockwell baseline process. The chips fabricated with an experimental process, however, had an fT of only 30 GHz, and their test results indicated an access time of 500 ps for the register file design presented in this thesis. The design and description of the register file circuits, along with the test results, are documented in Chapter 6. The target access time is under 200 ps. Several factors contributed to the discrepancy in speed: an extra nitride layer with a high dielectric constant was added during the fabrication process, and parasitic capacitances were underestimated (Appendix B). Also shown is an updated register file design whose SPICE-simulated access time is under 200 ps.
The need for a clock deskew scheme is closely tied to the nature of skew itself. It then becomes natural to ask, "What is skew and why is it necessary to make it as small as possible?" The nature of skew and its potentially detrimental effects on system operation are explored in the following discussion.
In a synchronous system, it is important that every element in the system receive its clock edge at precisely the same time, or at least at a time specified by the circuit design. Skew arises when nonideal delays (propagation delay, for example) cause clock edges to arrive at times other than those specified in the design. Fig. 1.1 shows the general model of a digital system. Proper synchronous operation results when the CLK0 and CLKN active edges occur at the same instant. In that case, the combinational logic network outputs new values for S'(0) … S'(N) based on the current values of S(0) … S(N) and the control signals. A certain amount of time after the arrival of the clock edge, S(0) … S(N) assume new values equal to the old values of S'(0) … S'(N). The new outputs of the flip-flops then cause the combinational logic network to respond by generating new values of S'(0) … S'(N). The combinational logic outputs may experience hazards and delays caused by the finite propagation time of the gates. In other words, the values of S'(0) … S'(N) may be momentarily false but will eventually settle to the proper values in accordance with the logic of the combinational network.
Fig. 1.1. General model of a digital system
Suppose now that CLKN is delayed with respect to the rest of the clocks, CLK0 … CLKN-1, which are assumed to be ideal for the sake of a simplified argument. Signals S(0) … S(N-1) change upon simultaneously receiving the clock edges of CLK0 … CLKN-1. The combinational logic responds to the new set of inputs by generating a new set of outputs. However, the values of S'(0) … S'(N) may be momentarily false, possibly due to the existence of hazards. Suppose further that the delayed CLKN edge is received by the Nth flip-flop during this period of instability. The consequence is that the Nth flip-flop would hold a false value. This scenario can occur whenever the clock edges do not arrive simultaneously at all clock inputs, and it demonstrates the detrimental effect that clock skew can have on synchronous digital systems.
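The scenario above can be expressed as a simple timing check. The following is a minimal sketch, not a model of any F-RISC/G circuit: the function name, the three delay parameters, and the numbers in the example are hypothetical and are chosen only to make the window of instability concrete.

```python
def samples_unstable_data(skew_ps, clk_to_q_ps, logic_min_ps, logic_max_ps):
    """Return True if flip-flop N, clocked skew_ps late, samples S'(N)
    while it may be momentarily false.

    All parameters are illustrative assumptions:
      clk_to_q_ps  - delay from the clock edge to a change on S(0) ... S(N-1)
      logic_min_ps - shortest path through the combinational logic
      logic_max_ps - longest path through the combinational logic
    """
    window_start = clk_to_q_ps + logic_min_ps   # earliest moment S'(N) can change
    window_end = clk_to_q_ps + logic_max_ps     # moment S'(N) is guaranteed settled
    return window_start <= skew_ps < window_end

# Hypothetical numbers: 30 ps clock-to-output, 20 ps to 120 ps of logic delay.
for skew in (0, 40, 80):
    print(skew, samples_unstable_data(skew, 30, 20, 120))
```

With these numbers S'(N) may be false between 50 ps and 150 ps after the undelayed clock edge, so a CLKN edge that is 80 ps late falls inside the window while one that is 40 ps late does not.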
Fig. 1.1 illustrates problems that can arise even when using edge-triggered flip-flops. As a practical matter, the technology employed for F-RISC/G cannot afford the luxury of the circuitry of an edge-triggered flip-flop. Instead, either latches or Master-Slave (MS) flip-flops are used. When the write signal is active, these memory elements are even more vulnerable than edge-triggered flip-flops. For the latch, when the clock is low, improper data changes can pass through to the outputs. Similar effects can disturb the Master stage of MS flip-flops. Consequently, it is even more critical that clock arrival times be precisely controlled when using these more realistic memory elements [UNGER86].
Hazards are spurious outputs, or glitches, of combinational circuits and are common in digital systems. They can be caused by circuit propagation delays and interconnection delays. The propagation delay of a circuit is the time between a change in an input signal and the corresponding change in the output signal. Real gates cannot change instantaneously upon a change in an input, since their operation relies on the movement of holes and electrons within some physical material, be it Si or GaAs. Hence, there will always be circuit propagation delays, however small they may be. The exact delay is a complex function of the input waveforms, temperature, output loading, operating power, logic family, and other parameters. The interconnection delays are the propagation delays of the wires carrying signals between gates or between chips. This form of delay becomes significant relative to the machine cycle time at high system clock frequencies, such as 2 GHz. These two types of delay can cause hazards in the following manner.
Fig. 1.2 shows an inverter, a signal applied to its input, and the resulting output signal. It illustrates the propagation delay present in an inverter. Now consider the simple circuit shown in Fig. 1.3, which contains the same inverter shown in Fig. 1.2. It is well known from Boolean algebra that the output of the circuit should be F = X · X′ = 0 regardless of the logic value of X, which predicts that the output is always low. However, the timing diagram of the circuit, also shown in the same figure, illustrates that the output has a spurious high voltage: a hazard.
Fig. 1.2. Propagation delay present in an inverter
Fig. 1.3. Example of a hazard
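The hazard of Fig. 1.3 can be reproduced with a small discrete-time simulation. This is a minimal sketch under stated assumptions: time advances in unit steps, the inverter has a fixed delay of a whole number of steps, and the AND gate is treated as ideal with zero delay. The function name and the waveform are illustrative only.

```python
def simulate_and_with_inverter(x_wave, inverter_delay):
    """Simulate F = X AND (NOT X) with a delayed inverter.

    x_wave is a list of 0/1 samples of X, one per time step.
    Assumptions: fixed inverter delay in whole steps, ideal (zero-delay) AND gate.
    """
    f_wave = []
    for t, x in enumerate(x_wave):
        # The inverter output reflects X as it was inverter_delay steps ago.
        x_past = x_wave[t - inverter_delay] if t >= inverter_delay else x_wave[0]
        x_bar = 1 - x_past
        f_wave.append(x & x_bar)
    return f_wave

x = [0] * 5 + [1] * 5 + [0] * 5           # X rises at t = 5 and falls at t = 10
f = simulate_and_with_inverter(x, inverter_delay=2)
print(f)  # [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```

The output is high for exactly the two steps during which the inverter has not yet responded to the rising edge of X. With a delay of zero the output stays low throughout, in agreement with the Boolean prediction.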
1.3. Survey of Existing Clock Deskew Methods
The existing methods for distributing and deskewing clock signals in high-speed digital systems can be categorized into three major groups. A representative method for each group is presented in the following. The first method makes use of a chain of inverters with fine gate delays [FOUTS92]. A block diagram of a typical scheme is shown in Fig. 1.4. A master clock signal propagates down the chain of inverter gates. A multiplexer then selects one of the several taps along the inverter chain. The tap chosen, and thus the amount of delay that the clock encounters, is determined by an associated set of select control signals to the multiplexer. The delay adjustments are made in integer multiples of a gate delay, so the delay resolution is limited to one gate delay. There is no provision for continuous adaptive deskewing in this method. Instead, sets of select control signals are sent to each of the multiplexers to compensate for the clock skews, which are measured externally. Hence, there is no on-the-fly means of adjusting for additional skews that arise in the course of normal operation, for example from temperature variations due to local heating.
The second method for deskewing clock signals incorporates a PLL and variable delay elements, as shown in Fig. 1.5 [GREUBPAT]. The PLL in the scheme sets the appropriate delay for each of the delay elements according to the value of the digital code latched into the D/A converters. Appropriate codes are obtained after the initial skews are measured externally. While this scheme is not limited to the coarse gate-delay resolution of the first method, it also lacks a provision for continuous adaptive deskewing, making it vulnerable to the skews caused by temperature variations during normal operation.
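The resolution limit of the first method can be illustrated with a short sketch of the tap selection. It assumes, for illustration only, that tap k adds exactly k gate delays and ignores the polarity inversion of individual inverters; the function name and the numbers are hypothetical and are not taken from [FOUTS92].

```python
def select_tap(required_delay_ps, gate_delay_ps, n_taps):
    """Pick the multiplexer tap whose delay is closest to the required delay.

    Assumption: tap k (k = 0 .. n_taps - 1) adds exactly k gate delays.
    Returns (tap index, residual skew in ps left uncorrected).
    """
    ideal = required_delay_ps / gate_delay_ps
    tap = min(max(round(ideal), 0), n_taps - 1)   # clamp to the available taps
    residual_ps = required_delay_ps - tap * gate_delay_ps
    return tap, residual_ps

# Hypothetical example: 130 ps of measured skew, 25 ps gate delay, 8 taps.
print(select_tap(130, 25, 8))  # (5, 5)
```

Because only whole gate delays are available, a residual skew remains after the adjustment, here 5 ps, and the selection stays fixed until the skew is measured again.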
Fig. 1.4. Deskew scheme with chain of delay gates (Fouts)
Fig. 1.5. Deskew scheme with manual adjustment (Greub)
Fig. 1.6. Deskew scheme with PLLs (Johnson)
The third method, shown in Fig. 1.6, also makes use of PLLs [JOHNSO88]. Of the three methods it is the most similar to the proposed scheme, but it has an important difference. In this method, a PLL is present in each chip that requires a clock signal. Upon power-up, each PLL adjusts so that it locks to the system clock presented to all chips. The outputs of the PLLs are supposed to be the deskewed clock signals. A key underlying assumption of this scheme makes it unsuitable for a high-speed digital system: the skew in the distribution of the master clock to each of the PLLs is assumed to be negligible. However, this is precisely the type of skew that is of concern in a high-speed digital system and that the deskew scheme presented in this thesis is intended to compensate. The third method would be suitable for low-speed systems with machine cycle times in the tens of nanoseconds, where the skew in the clock distribution is indeed negligible in most cases.
[GARDNE80, HEIN88, JEONG87, KURITA91, RANSIJIN91, SOYUER89, SOYUER90, and WARE89]