Chapter Seven

In the design of a VLSI chip or cell, layout is often seen as a formality which influences the system performance but is manageable. This is typically true for most designs, but at the high end of the design spectrum, layout can have a profound effect upon overall performance. As signal frequencies rise and process dimensions fall, parasitic capacitance, inductance, and resistance increase in importance relative to circuit performance. Although circuits can often be redesigned for higher performance, improving the interconnection performance is typically achieved through the development of new materials with better properties and their integration into the fabrication process. Because this is often beyond the scope of the designer’s responsibilities, the only alternative is to optimize the layout.

In order to produce high-quality custom layouts, it is important to develop and follow detailed methodologies. To this end, I have developed a methodology that realizes custom cell layouts with highly-matched parasitics. It’s also important to examine the standard cell design method because it is used in nearly all large designs today. Although standard-cell designs sacrifice performance in exchange for ease of design, within the F-RISC/G design methodology there are opportunities for improvement that would increase system speed and reduce chip area. This chapter discusses the methodology for matched custom cell design and the optimizations available within the standard cell F-RISC design methodology.

There are several different steps involved in the design process. The initial specification defines the design, the creation of a high-level architecture determines the overall shape, the schematics define how the architecture will be realized, and the layout is the implementation of the circuit on chip. There are many ways to accomplish the first three steps, but the fourth step is typically more defined. It is the implementation step where the physical dimensions of the circuit and the fabrication process finally have a direct impact upon the design. This section will examine several implementation methods and their impact upon the design cycle and performance.

There are many different methods available for design, but these can be broken into four broad categories: standard cell, custom cell, gate array and macro cell [DILLI88, WESTE93]. Each of these methods has distinct characteristics that are useful depending upon the design goals specified. Because of their unique strengths and weaknesses, one method alone is typically not sufficient for an entire design, so most modern designs will probably use two or more of these methods. Figure 7.1 depicts the characteristics of each design method relative to each other.

In the custom method, all circuits are placed and routed independently with no reuse. The macro cell methodology uses a library of pre-designed, non-primitive cells that are then placed and routed individually. The standard cell method draws from a library of primitive cells and places restrictions upon cell dimensions and placement within the layout. The gate array design method is similar to both the custom and standard cell approaches. Gate arrays consist of a large number of flexible cells arranged in a matrix from which circuits are created by routing wires between the cells.

The standard cell method is probably the most widely used of all because it has the flexibility to support a wide variety of applications but also allows for easier redesign and modification. The standard cell approach is based upon a library of predefined cells from which one is chosen based upon logical function, power, area, and so on. The components of a standard cell library are typically basic building blocks of digital design, such as logic gates, memory elements, and signal buffers. This one-to-one relationship between schematic components and library cells makes the process of translating the design from schematic to layout easier since synthesis or compilation is not necessary. Only a simple mapping between the components in the schematic and the cells in the library is required.

The macro cell design technique (also known as semi-custom design) is similar to the standard cell technique but provides more flexibility during the layout phase. A cell may contain several logical functions, and cells may be grouped together to form larger cells. This freedom to group logical functions into one cell can result in better performance because the designer can examine the circuit from a wider perspective. The downside is an increase in design time due to the irregularity of the macro cells and the difficulty in reworking the design. For time-critical components of a design, macro-cell design is typically the method of choice. It allows the designer to improve performance of the block as a whole while avoiding the complexity of a full-custom design.

Full custom design is typically marked by the scarcity or entire lack of subcells. Placement and routing are performed at the device level rather than at the cell level, hence every component is placed and routed individually with little or no reuse. This form of design offers the best possible performance, but it is extremely difficult and time-consuming, especially for large circuits. For this reason, the macrocell method is typically preferred for larger high-speed circuits.

Gate arrays are a combination of standard-cell and custom design methodologies. A gate array is basically just a large array of logical gates. Circuits are implemented by wiring the gates together to form a logical function. This type of design is typically used for prototyping or fast-turnaround designs because only wiring information is required and the interconnect-less chips may be fabricated earlier and stockpiled until needed. Performance does suffer because of the non-optimal design, placement and routing of the design blocks.

The F-RISC/G processor uses a blend of the standard cell and macro-cell methods. The majority of each chip uses standard cell design, primarily for logical functions. Macro-cell design is evident in critical components such as the register file, the cache blocks within the cache RAM chip, the four-phase clock generator, and the carry-chain in the datapath chip. Because these circuits are on a critical path, they require higher performance than standard cell methods could provide. Of the test structures fabricated, only the testchip uses standard cell design. The testchip register file and the VCO chip both use macro-cell design.

The F-RISC/G project uses a CMOS CAD tool in which the standard-cell design parameters are dictated by both the routing program and the process technology. In this section, we examine these specifications and how they affect the performance of the final design. Based upon this analysis, a new design paradigm is proposed which could maintain the same functionality but with lower cost in terms of chip area.

The Rockwell process restricts all but the top (metal-3) metallization level from overlapping devices. As a result, more aggressive standard cell design techniques such as "over-the-cell" routing become impractical and all cells must have connectors placed on the cell periphery. Furthermore, the F-RISC/G CAD tools require that signal connectors appear on both sides of the cell, effectively creating numerous internal signal feedthroughs that frequently are not used. Figure 7.2 contains the layout of a standard cell with four input signals and one output signal, each of which must connect to both the top and bottom of the cell boundary.

The impact of the F-RISC/G standard cell specifications appears in the layout of standard cells using the Rockwell process. Because all signal connectors must appear at the top and bottom of a cell, the layout of the cell may require more area. There are many signal routes that take advantage of these "free" feedthroughs within a cell, but as shown in Table 7.1, there are even more situations in which the feedthroughs are never used. If all of the unused feedthroughs were removed and the standard cells compacted, it is possible that a significant amount of chip area could be saved. While this method would require additional external feedthroughs to compensate for those removed from the standard cells, the total area saved from each cell could more than offset the increase in feedthrough area. The difference in area depends upon 1) the reduction in cell area when feedthroughs are removed, and 2) the area of an external feedthrough.
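
The trade-off between the two terms above can be illustrated with a minimal sketch. The function name and all numbers below are hypothetical placeholders chosen for illustration; they are not F-RISC/G measurements.

```python
def net_area_saved(cell_savings, instance_counts, added_feedthroughs, feedthrough_area):
    """Net standard cell area saved after removing forced feedthroughs.

    cell_savings       -- area saved per instance of each cell type (term 1 in the text)
    instance_counts    -- number of instances of each cell type
    added_feedthroughs -- external feedthroughs the router must add to compensate
    feedthrough_area   -- area of one external feedthrough (term 2 in the text)
    """
    saved = sum(cell_savings[name] * instance_counts[name] for name in cell_savings)
    cost = added_feedthroughs * feedthrough_area
    return saved - cost


# Hypothetical values in arbitrary area units.
savings = {"and_gate": 30.0, "buffer": 16.0}
counts = {"and_gate": 10, "buffer": 5}
net = net_area_saved(savings, counts, added_feedthroughs=20, feedthrough_area=6.0)
print(net)  # a positive result means compaction outweighs the added feedthroughs
```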

Table 7.1 - Feedthrough utilization statistics for F-RISC/G core architecture chips

The actual feedthrough usage is highly dependent upon router performance and technology, so the numbers reported here should not be taken as a bound for all designs but rather as an indication of how imposed feedthroughs can affect performance. An analysis of feedthrough-usage will provide insight into the impact of the F-RISC/G design methodology upon performance.

In order to determine the information shown in Table 7.1, a program called CELLSTATS has been developed. CELLSTATS parses through a chip design file and records the number of cell instances along with the number of connections to the upper / lower connectors and the utilized feedthroughs (both total and itemized utilization according to connector type). Statistics are generated for every cell used in the design but are also grouped into larger categories (such as AND gates, signal buffers, etc.) in order to increase the statistical population.
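
The kind of bookkeeping described above can be sketched as follows. This is not the CELLSTATS source: the input record format, the class prefixes, and the rule that a feedthrough counts as used when both the top and bottom connectors of one signal on one instance are wired are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical name prefixes used to group cells into larger classes.
CLASS_PREFIXES = {"an": "AND", "xr": "XOR", "mx": "MUX", "buf": "buffer"}


def cell_class(cell_name):
    """Map a cell name to a class using the longest matching prefix."""
    matches = [p for p in CLASS_PREFIXES if cell_name.startswith(p)]
    return CLASS_PREFIXES[max(matches, key=len)] if matches else "other"


def tally(connections):
    """Tally cell instances, connector sides and used feedthroughs.

    connections -- iterable of (instance_id, cell_name, signal, signal_class, side),
                   one entry per routed connector, with side "N" (top) or "S" (bottom).
    """
    sides_seen = defaultdict(set)   # (instance, signal) -> sides that are wired
    cell_of = {}                    # instance -> cell name
    class_of_signal = {}            # (instance, signal) -> signal class
    side_totals = Counter()
    for inst, cell, signal, sig_class, side in connections:
        cell_of[inst] = cell
        sides_seen[(inst, signal)].add(side)
        class_of_signal[(inst, signal)] = sig_class
        side_totals[side] += 1

    instances = Counter(cell_of.values())
    instances_by_class = Counter(cell_class(c) for c in cell_of.values())
    # Assumption: a feedthrough is "used" when both sides of a signal are wired.
    used_ft = Counter(
        class_of_signal[key] for key, sides in sides_seen.items() if sides == {"N", "S"}
    )
    return instances, instances_by_class, side_totals, used_ft


records = [
    ("u1", "an31h", "a", "input", "N"),
    ("u1", "an31h", "a", "input", "S"),   # both sides wired: feedthrough used
    ("u1", "an31h", "y", "output", "S"),
    ("u2", "buf1h", "a", "input", "N"),
]
instances, by_class, side_totals, used_ft = tally(records)
```

Dividing the used feedthrough counts by the number of available feedthroughs per signal class would give utilization figures of the kind discussed in this section.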

There are three chips in the F-RISC/G processor which use standard cell design, namely the cache controller, datapath and instruction decoder. In order to establish an estimate of the area wasted by the forced-feedthrough methodology, each chip was analyzed with the CELLSTATS tool.

Each chip uses a large number of standard cells but the distribution of cell instances for each chip is fairly equal. Figure 7.3 shows the distribution for each of the three chips and Figure 7.4 shows the total distribution for all of the chips. Most cells are used on average 7 times in a design but a few may be used 30 or more times. The relatively low average usage of each cell presents a problem when estimating the effect of layout changes upon the entire chip. To generate an estimate of chip area for different layout techniques, a subset of cells must be redesigned in order to determine the impact of the changes. Obviously the cells which are used often become candidates for inclusion. As for the remaining cells, because the distribution among all of the chips is relatively flat, it is difficult to select cells for the subset. It is simply impractical to generate new cell layouts for all of them (each cell redesign can require up to one day of effort by an experienced designer).

Figure 7.3 - Distribution of standard cell instances for F-RISC/G core chips

Fortunately, the 60 to 90 instances typically used in a design are often variations of nine basic cells: AND gates, XOR gates, multiplexers, buffers, clock-buffers, super-buffers, data latches, master-slave latches and level-shifters. The distribution of cell types for each chip is shown in Figure 7.5 while Figure 7.6 depicts the distribution for all chips. The buffers, clock buffers and super buffers were separated into three classes because their physical layouts have significantly different areas and they generally have unique performance specifications.

In Table 7.2 the feedthrough utilization per cell for the F-RISC/G designs is shown. While a fairly high number of cells use at least one feedthrough, very few use multiple feedthroughs. The majority of the feedthroughs are used by input signals due to the relatively higher number of inputs (there is typically only one clock, select, or output signal per cell).

Figure 7.7 shows the distribution of utilized feedthroughs in each chip among signal classes. The clear majority of utilized feedthroughs are input signals while the clock and output signals are roughly tied for second place.

Figure 7.8 shows the utilization of available feedthroughs broken down by signal type. On average, less than 20% of the available feedthroughs of any type are used. The best utilization occurs for the clock and select signals because they are typically distributed to several points in the circuit (increasing the possibility that a feedthrough could be used) and have only one feedthrough available per cell. The input and output signals have lower utilization because these signals may often connect only two points and thus may never require a feedthrough.

Since there is low feedthrough utilization on the global level, the chips were analyzed in more detail to determine if certain classes of cells or even specific cells exhibited unusually high utilization. The cell library was partitioned into nine classes and then analyzed in terms of feedthrough utilization per signal class in order to provide information on the efficiency of the input, output, select and clock feedthroughs.

As in the general feedthrough analysis, clock and select signals generally had much higher feedthrough utilization than the input or output signals. Because input feedthroughs are so numerous, their low utilization is particularly costly in terms of wasted area. Since the three chips in F-RISC/G have significantly different functionality, it is difficult to establish trends between the designs.

Figure 7.9 - Detailed utilization of available signal feedthrough per signal class for each chip

The redesign of the cell library cannot proceed blindly without considering the connectivity of the existing designs. Because the feedthroughs provide connectors at both the top and bottom of the cell, the router is free to select whichever position is more convenient. In order to determine if any preference for north or south connectors exists, CELLSTATS tracks totals for all of the connectors as well as different classes of connectors (see Table 7.3).

Clearly there is little or no preference for one position over another, hence there is no advantage to moving connectors individually to any particular side. However, the distribution of connectors between both sides of the cell should be somewhat balanced in order to accommodate the original distribution (presumably the router will produce approximately the same characteristics for non-feedthrough standard cell designs).

Based upon the cell classes defined earlier and the frequency with which some individual cells are used, a subset of cells has been redesigned in order to estimate the area savings from removing the forced-feedthrough constraint. Because only a few cells are being redesigned, the area of cells that are not in the subset must be estimated. Many of the cells not in the subset are simply variations of other cells that have been redesigned, hence a fairly accurate estimate of their area can be made. Variations typically involve power levels or signal levels. The power level of a cell is determined primarily by the source resistor value and establishes the switching performance. As shown in Figure 7.10, lower power cells have larger resistor values and can often require more area.

Figure 7.10 - Low, medium and high power versions of a data latch standard cell

Similarly, cells that vary in terms of signal levels may have different areas due to changed connectivity or the inclusion of level-shifting circuitry. In Figure 7.11, three versions of an AND gate are shown. The first version has level 1 outputs, the second has level 2 and the third level 3. Level shifting is accomplished through the use of emitter follower circuits (on the left side of the second and third cells).

The mapping process between the subset and other cells is based upon the cell type (AND, XOR, buffer, etc.). If more than one cell of a particular type is in the subset, cells outside of the subset are mapped to the closest match based upon their names. Once a cell has been mapped into the subset, its area must be estimated. In order to maintain the same power rail pitch, redesigned cells keep the original height so only the cell widths change. All of the cells in the subset provide level-1 output signals. For cells that produce a level-2 or level-3 output, an emitter follower circuit is appended to the cell and typically requires a fixed amount of area.

In order to estimate the area for a level-2/level-3 output cell from a redesigned level-1 output cell, the change in width for the redesigned cell is subtracted from the width of the existing level-2/level-3 cell. This allows the savings for the body of the cell to be applied to the new circuit and still incorporate the effect of the level-shifting circuitry. Some cells may be smaller than the corresponding cell in the subset, hence estimating the redesigned area by subtracting the width change may lead to inaccurate results. In these cases, the cell is reduced by the same ratio as the one in the subset.
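
The two estimation rules above can be written as a short sketch. The function name and the example widths are illustrative assumptions; only the subtract-or-scale logic comes from the text.

```python
def estimate_redesigned_width(cell_width, ref_original_width, ref_new_width):
    """Estimate the redesigned width of a cell mapped to a redesigned reference cell.

    If the cell is at least as wide as the reference, subtract the reference's
    absolute width change so that extra circuitry (such as a level shifter)
    keeps its full width. If the cell is narrower than the reference, scale it
    by the reference's width ratio instead.
    """
    if cell_width >= ref_original_width:
        return cell_width - (ref_original_width - ref_new_width)
    return cell_width * ref_new_width / ref_original_width


# Illustrative reference cell: original width 84, redesigned width 56.
wider = estimate_redesigned_width(96, 84, 56)     # subtract the 28-unit change
narrower = estimate_redesigned_width(60, 84, 56)  # scale by 56/84
print(wider, narrower)
```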

Over 28 cells have been analyzed in order to determine the area saving as a result of removing forced feedthroughs. The cells included in the redesign span the entire range of cell classes and also include some subdivisions within larger cell classes. Table 7.4 below shows the area savings for a number of the redesigned standard cells. In order to obtain the most savings possible, an aggressive redesign philosophy was adopted which favored minimal layout area over parasitic capacitance or yield considerations.

Cell Name	Cell Type	Original Size (W x H)	New Size (W x H)	Area Savings
an31h	an	84 µm x 163 µm	50 µm x 163 µm	40.5%
ant22h	ant	84 µm x 163 µm	56 µm x 163 µm	33.3%
buf1h	buf1h	48 µm x 163 µm	32 µm x 163 µm	33.3%
buf1l	buf1l	60 µm x 163 µm	38 µm x 163 µm	36.7%
cbuf2h	cbuf	144 µm x 163 µm	134 µm x 163 µm	6.9%
dlt1h	dl	84 µm x 163 µm	53 µm x 163 µm	36.9%
ls12h	ls12	72 µm x 163 µm	58 µm x 163 µm	19.4%
ls13h	ls13	84 µm x 163 µm	64 µm x 163 µm	23.8%
ls21h	ls21h	36 µm x 163 µm	26 µm x 163 µm	27.8%
mlt3h	ml	180 µm x 163 µm	151 µm x 163 µm	16.1%
mlt1h	mlt1	132 µm x 163 µm	108 µm x 163 µm	18.2%
mx41	mx	155 µm x 163 µm	103 µm x 163 µm	33.5%
mxt21h	mxt	84 µm x 163 µm	53 µm x 163 µm	36.9%
sbuf	sbuf2h	84 µm x 163 µm	60 µm x 163 µm	28.6%
xrt21h	xr	84 µm x 163 µm	51 µm x 163 µm	39.3%
		*Average area savings:*		*28.8%*

Table 7.4 - Area savings for various redesigned standard cells
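
Because every cell keeps the same height, the area savings in the table reduce to a ratio of widths. The short check below recomputes the savings and their average from the listed widths; the variable and function names are illustrative only.

```python
# (original width, redesigned width) for each row of the table above.
widths = {
    "an31h": (84, 50), "ant22h": (84, 56), "buf1h": (48, 32),
    "buf1l": (60, 38), "cbuf2h": (144, 134), "dlt1h": (84, 53),
    "ls12h": (72, 58), "ls13h": (84, 64), "ls21h": (36, 26),
    "mlt3h": (180, 151), "mlt1h": (132, 108), "mx41": (155, 103),
    "mxt21h": (84, 53), "sbuf": (84, 60), "xrt21h": (84, 51),
}


def savings_percent(original, new):
    """Area savings in percent for a cell whose height is unchanged."""
    return 100.0 * (original - new) / original


per_cell = {name: savings_percent(w0, w1) for name, (w0, w1) in widths.items()}
average = sum(per_cell.values()) / len(per_cell)
print(round(per_cell["an31h"], 1), round(average, 1))
```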

Not all of the cells outside of the redesigned subset are accurately estimated using the values above. Drawing from comparisons between the area estimates above and actual redesigned layouts, it appears that the values may be too aggressive and not reflect the actual savings for other cells. Two more sets of data for redesigned cells have been generated which use less aggressive area estimates. The first set (Table 7.5) uses the average of all the redesigned versions of a cell rather than the most favorable. This presents a more realistic estimate for the area of an average redesigned cell. The second set (Table 7.6) uses the average of the most aggressive redesign and the original area in order to represent a lower bound upon the expected area savings due to cell redesign.

Cell Name	Cell Type	Original Size (W x H)	New Size (W x H)	Area Savings
an31h	an	84 µm x 163 µm	58 µm x 163 µm	31.0%
ant22h	ant	84 µm x 163 µm	56 µm x 163 µm	33.3%
buf1h	buf1h	48 µm x 163 µm	32 µm x 163 µm	33.3%
buf1l	buf1l	60 µm x 163 µm	38 µm x 163 µm	36.7%
cbuf2h	cbuf	144 µm x 163 µm	134 µm x 163 µm	6.9%
dlt1h	dl	84 µm x 163 µm	56 µm x 163 µm	33.3%
ls12h	ls12	72 µm x 163 µm	58 µm x 163 µm	19.4%
ls13h	ls13	84 µm x 163 µm	68 µm x 163 µm	19.0%
ls13l	ls13l	84 µm x 163 µm	74 µm x 163 µm	11.9%
ls21h	ls21h	36 µm x 163 µm	26 µm x 163 µm	27.8%
mlt3h	ml	180 µm x 163 µm	151 µm x 163 µm	16.1%
mlt1h	mlt1	132 µm x 163 µm	114 µm x 163 µm	13.6%
mx41	mx	155 µm x 163 µm	120 µm x 163 µm	22.6%
mxt21h	mxt	84 µm x 163 µm	60 µm x 163 µm	28.6%
sbuf	sbuf2h	84 µm x 163 µm	60 µm x 163 µm	28.6%
xrt21h	xr	84 µm x 163 µm	60 µm x 163 µm	28.6%
		*Average area savings:*		*24.4%*

Table 7.5 - Less aggressive area savings for various redesigned standard cells

Cell Name	Cell Type	Original Size (W x H)	New Size (W x H)	Area Savings
an31h	an	84 µm x 163 µm	66 µm x 163 µm	21.4%
ant22h	ant	84 µm x 163 µm	70 µm x 163 µm	16.7%
buf1h	buf1h	48 µm x 163 µm	40 µm x 163 µm	16.7%
buf1l	buf1l	60 µm x 163 µm	49 µm x 163 µm	18.3%
cbuf2h	cbuf	144 µm x 163 µm	134 µm x 163 µm	6.9%
dlt1h	dl	84 µm x 163 µm	59 µm x 163 µm	29.8%
ls12h	ls12	72 µm x 163 µm	65 µm x 163 µm	9.7%
ls13h	ls13	84 µm x 163 µm	68 µm x 163 µm	19.0%
ls13l	ls13l	84 µm x 163 µm	74 µm x 163 µm	11.9%
ls21h	ls21h	36 µm x 163 µm	26 µm x 163 µm	27.8%
mlt3h	ml	180 µm x 163 µm	151 µm x 163 µm	16.1%
mlt1h	mlt1	132 µm x 163 µm	114 µm x 163 µm	13.6%
mx41	mx	156 µm x 163 µm	148 µm x 163 µm	5.1%
mxt21h	mxt	84 µm x 163 µm	60 µm x 163 µm	28.6%
sbuf	sbuf2h	84 µm x 163 µm	60 µm x 163 µm	28.6%
xrt21h	xr	84 µm x 163 µm	60 µm x 163 µm	28.6%
		*Average area savings:*		*18.7%*

Table 7.6 - Least aggressive area savings for various redesigned standard cells

Based upon the redesigned cells and the three sets of area estimates for the remaining cells, several projections of the total standard cell area for each chip have been generated. In order to account for the contribution of each cell instance, the projections were generated by estimating the redesigned area for every cell in a chip and multiplying the estimate by the number of instances. This effectively provides a weighting mechanism that includes the effect of every instance in the chip and should increase the accuracy of the estimate. Because only cell instances are used in the estimate, the effect of external feedthroughs (i.e. tracks inserted between standard cells by the router) is not included. While this permits the comparison of the "true" standard cell areas, it is not acceptable with regard to the total chip area estimate. Consequently, as described in the next section, the external feedthroughs are included in the estimate for the overall chip dimensions.

Figure 7.12 - Estimated overall standard cell area savings for F-RISC/G core architecture chips

As shown in Figure 7.12, the resulting total area savings are somewhat less than the average cell area reduction for each set of data. The decrease in the projected total standard cell area savings relative to the average savings is shown below in Table 7.7. Nearly all of the projections for the entire standard cell area reduction are below the average cell reduction values, indicating that the uneven distribution of cell instances has a definite effect upon the total area reduction.

Table 7.7 - Decrease in area savings for projected standard cell area reduction compared to average cell area reduction

In order to quantify the effect of multiple cell instances, the standard cell area savings were estimated using only one instance of each cell per chip (see Figure 7.13). It was expected that flattening the cell distribution would bring the total area savings closer to the average cell area reduction numbers but, as shown in Table 7.8, this is clearly not the case. While overall the cache controller estimates remained about the same, the instruction decoder and datapath numbers are clearly worse. The datapath chip appears to have more cell types with below average area savings (which lower the average savings, hence the lower normalized savings) but the higher quantities of above-average cells compensate to some extent and pull the average up. This analysis indicates that the average area reduction for the entire cell library is a poor indicator of the total standard cell area savings and that the relative distribution of cells is just as important in determining the area savings.
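
The instance-weighted projection and its flattened variant can be sketched as follows. The cell list is hypothetical, and areas are represented by widths since the cell height is unchanged; the example only illustrates why the total savings need not equal the average of the per-cell percentages.

```python
def total_savings_percent(cells, flatten=False):
    """Projected total standard cell area savings in percent.

    cells   -- list of (instance_count, original_area, redesigned_area)
    flatten -- if True, count each cell type once instead of once per instance
    """
    original = redesigned = 0.0
    for count, area_before, area_after in cells:
        weight = 1 if flatten else count
        original += weight * area_before
        redesigned += weight * area_after
    return 100.0 * (original - redesigned) / original


# Hypothetical chip: a heavily used cell with large savings and a rarely
# used cell with small savings.
cells = [(30, 84, 50), (2, 144, 134)]
weighted = total_savings_percent(cells)
flattened = total_savings_percent(cells, flatten=True)
mean_of_percentages = sum(100.0 * (a - b) / a for _, a, b in cells) / len(cells)
print(round(weighted, 1), round(flattened, 1), round(mean_of_percentages, 1))
```

Even in the flattened case the total is dominated by the larger cells, so it differs from the simple mean of the per-cell percentages.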

Table 7.8 - Decrease in area savings for flattened standard cell area reduction compared to average cell area reduction

Although savings projections have been made for individual cells, average cell area and total standard cell areas, it is not entirely clear how these numbers translate into overall chip area savings. In order to establish the chip-level area reduction, the relationship between the standard cell area and total chip area must be established, including the effect of external feedthroughs.

For the three F-RISC/G chips, both the standard cell rows and horizontal routing tracks determine the height. Since the cell redesign focused solely upon removal of vertical forced feedthroughs, the cell heights were not affected and thus the chip height should have little dependence upon the cell area savings. In contrast, the chip width is determined in large part by the cell widths and would benefit directly from any reduction in the average horizontal dimension.

Because the chip width is also dependent upon the vertical routing tracks and the pad frame width, the absolute reduction in standard cell width is directly transferred to the top level but the percentage decrease in area is obviously smaller. As shown in Eq. 1 below, the reduction in the standard cell block width (ΔW_SC) is calculated by subtracting the averaged total width of external feedthroughs per row (A_avgFT*W_FT) from the original block width (W_original) in order to get the width of the standard cells alone. The result is multiplied by the average area reduction factor (only the standard cells will be reduced) to get the absolute reduction in the standard cell width.
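
Written out from the description above, the relation takes the following form. The symbol R_avg for the average area reduction factor is an assumed name, since the text does not give one; the remaining symbols are those defined in the paragraph.

```latex
\Delta W_{SC} = \left( W_{original} - A_{avgFT} \, W_{FT} \right) \times R_{avg}
\qquad (1)
```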

With the reduction of the widest standard cell block in absolute terms, the reduced chip width can be determined simply by subtracting the change in width (ΔW_SC) from the overall chip width. Table 7.9 below shows the projected area savings for the chips using the three reduction factors described earlier.
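
A minimal numerical sketch of this calculation follows. The function name and every input value are hypothetical and are not taken from the F-RISC/G chips; the sketch keeps the number of external feedthroughs unchanged.

```python
def reduced_chip_width(chip_width, block_width, avg_ft_per_row, ft_width, reduction):
    """Return (reduced chip width, percent reduction in chip width).

    chip_width     -- overall chip width
    block_width    -- original width of the widest standard cell block
    avg_ft_per_row -- average number of external feedthroughs per row
    ft_width       -- width of one external feedthrough
    reduction      -- average area reduction factor, as a fraction (e.g. 0.25)
    """
    # Width occupied by standard cells alone, excluding external feedthroughs.
    cell_width = block_width - avg_ft_per_row * ft_width
    # Absolute reduction in the standard cell block width.
    delta = cell_width * reduction
    new_width = chip_width - delta
    percent = 100.0 * delta / chip_width
    return new_width, percent


# Hypothetical dimensions in arbitrary length units.
new_width, percent = reduced_chip_width(
    chip_width=8000, block_width=6000, avg_ft_per_row=50, ft_width=12, reduction=0.25
)
print(new_width, percent)
```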

Table 7.9 - Reduced chip width (W’) and percent reduction based upon estimated standard cell area savings and no additional feedthroughs

The estimates in Table 7.9 above are based upon maintaining the same number of external feedthroughs after the standard cell internal forced-feedthroughs have been eliminated. While this may present a rough lower bound for the area when the cells are redesigned, it is not entirely accurate in the sense that some of the internal cell feedthroughs in the original layout were used but not included in the average feedthrough estimate. Presumably, when the redesigned cells are used, more external feedthroughs will be added to the design to route signals that previously used the internal feedthroughs. To account for these feedthroughs, the average number of utilized internal feedthroughs per row has been added to the external feedthrough count. As a result, the average number of feedthroughs per row has increased and produced a decrease in the area savings for the new design paradigm. Table 7.10 below contains the estimates of chip widths when additional external feedthroughs are added.

Table 7.10 - Reduced chip width (W’) and percent reduction based upon estimated standard cell area savings and additional external feedthroughs

Although the savings have been reduced somewhat, the difference between the estimates is rather small. Furthermore, the small reduction in the savings indicates that the addition of feedthroughs to the design has a much smaller effect than the advantages of removing the forced feedthroughs from every cell.

During the course of the F-RISC/G project, it became apparent that the quality of layout was highly dependent upon the designers’ skill, experience and patience. Within the context of a large high-speed design, a set of heuristics for creating high-quality layouts is essential. For high-speed designs, any hesitation in the switching performance of a differential pair detracts from the overall performance.

In order to minimize disruptions and ensure smooth switching characteristics, techniques have been developed which assist the designer in producing layouts with highly matched parasitic capacitance and resistance. These techniques exploit the inherent symmetry of differential circuits but also allow for asymmetric elements. Matched capacitance is an important quality for high-speed circuits. A layout has matched capacitance if the parasitic capacitances of every node and its companion node are nearly equal. Layouts with a high degree of matched capacitance experience less skew and better performance at higher frequencies.
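
The matched-capacitance criterion can be expressed as a simple check over extracted node capacitances. The 5% tolerance, the node names and the capacitance values below are assumptions for illustration; the text does not specify what "nearly equal" means numerically.

```python
def mismatched_pairs(node_caps, pairs, tolerance=0.05):
    """Return (node, companion, relative difference) for poorly matched pairs.

    node_caps -- parasitic capacitance per node (positive values, any common unit)
    pairs     -- list of (node, companion node) tuples from the differential circuit
    tolerance -- assumed maximum allowed relative difference
    """
    bad = []
    for node, companion in pairs:
        c1, c2 = node_caps[node], node_caps[companion]
        relative = abs(c1 - c2) / max(c1, c2)
        if relative > tolerance:
            bad.append((node, companion, relative))
    return bad


# Hypothetical extracted capacitances for two differential pairs.
caps = {"q": 10.0, "qb": 10.2, "d": 8.0, "db": 9.5}
bad = mismatched_pairs(caps, [("q", "qb"), ("d", "db")])
print([(a, b) for a, b, _ in bad])
```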

A common problem among almost every high-speed design is signal skew, but differential designs can also exhibit intra-signal skew, i.e. skew between the two signals which compose the differential signal. While typically not significant for most designs, at the high end of the design spectrum intra-signal skew can become a limiting factor. Intra-signal skew typically arises from asymmetric capacitance, inductance and/or resistance as seen by each half of a differential pair. When one side of the signal is more heavily loaded than the other, it may exhibit a slower response time and affect the switching of the circuit. At the extreme, both sides of the differential pair may have the same value and thereby force the output state of a differential switch into an unknown state for some finite amount of time.

One advantage of differential circuitry is an inherent symmetry due to the underlying current-switch foundation. This symmetry may span the entire circuit or exist only within a local region and can be characterized by an imaginary axis about which the circuit is symmetrical. Identifying and exploiting circuit axes of symmetry may produce layouts with a high degree of matched capacitance. Due to the large number of axes within a complex differential circuit, a methodology is required for identifying axes and proceeding with the layout design.

Although it may not occur as often, symmetry can exist at levels of abstraction beyond the bottom circuit level. Extending the concept of axes of symmetry to higher levels of the hierarchy will help balance interconnection between subcells and keep skew low.

There are numerous axes of symmetry available in a differential circuit simply because the circuit is inherently two-sided. There are primarily two types of symmetry axes: current-switch and critical-path.

Because current switches are composed of only two devices connected in a standard way, they provide a natural localized axis of symmetry between them. These axes provide a reference plane for creating a symmetrical layout. When current switches are "stacked" in a cascaded manner (i.e. a current tree), the symmetry of current switches at the base of the tree extends upwards towards the top level of switches. For example, the current-switch tree in Figure 7.14 below has three axes of symmetry, two of which (axes 2 and 3) are on the lowest level of hierarchy while the remaining axis is on the top level. Axes 2 and 3 are local axes that exist only within the current switch while axis 1 is a global axis of symmetry.

As current-switches are stacked, axes from the bottom of the tree may extend into the upper levels and possibly to the top level of hierarchy, forming global axes of symmetry. These axes may also be identified along the critical path through the circuit. For example, the high-speed XOR used in the VCO depends upon equal delay between the two input signals, creating an axis of symmetry along the critical path. Figure 7.15 depicts both current-switch (A2-A5) and critical-path (A1, A6) axes of symmetry within the XOR. A close examination of axes A1 and A6 reveals that they are actually one and the same but appear to be separate due to the drawing of the circuit.

The first step in the layout design process should be the development of a floorplan for the entire system under consideration. This map will be used during the cell layout to determine rough estimates for connector positions. The floorplan is developed in a hierarchical process beginning with the top-level of the design and progressing down towards the lower levels. The last level of hierarchy is the actual layout of cells which follows a somewhat different methodology and is not included in the system floorplanning process.

The development of a system floorplan must take into account rough estimates of subsystem and subcell area. These may be generated based upon certain statistics such as device count, power, or simply by designer experience. The layout of each cell then becomes somewhat constrained to the floorplan specifications and may require the redesign of the floorplan if the area requirements cannot be reasonably achieved.

The layout of a cell begins with the specifications from the system floorplan, namely cell area and connector locations. All axes of symmetry should be identified along with their relative position in the axes hierarchy in order to apply them at the appropriate point in the layout process.

After the initial placement of connectors, the layout process follows the critical path through the cell. The cell area should be partitioned (as with the system floorplan) based upon the major axes of symmetry. Partitioning continues within each subpartition using the remaining axes of symmetry until the entire hierarchy has been traversed. At this point, the positioning of devices and interconnection may begin and should be straightforward. After the critical path has been placed and routed, non-critical circuits (e.g. bias circuitry) may be added to the layout.
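The hierarchical partitioning described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool flow used for F-RISC/G: the `Region` and `Axis` types, the `split` and `partition` functions, the cell dimensions, and the arrangement of the three axes (one major axis with one local axis in each half, all of them vertical) are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle: lower-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float


@dataclass(frozen=True)
class Axis:
    """One node of a (hypothetical) axes-of-symmetry hierarchy."""
    name: str
    vertical: bool = True            # True: split left/right, False: bottom/top
    first: Optional["Axis"] = None   # axis applied inside the left/bottom half
    second: Optional["Axis"] = None  # axis applied inside the right/top half


def split(region: Region, vertical: bool) -> Tuple[Region, Region]:
    """Cut a region into two equal, mirror-image halves."""
    if vertical:
        half = region.w / 2
        return (Region(region.x, region.y, half, region.h),
                Region(region.x + half, region.y, half, region.h))
    half = region.h / 2
    return (Region(region.x, region.y, region.w, half),
            Region(region.x, region.y + half, region.w, half))


def partition(region: Region, axis: Optional[Axis],
              path: str = "") -> List[Tuple[str, Region]]:
    """Partition along the major axis first, then recurse into each half
    with the remaining axes until the hierarchy has been traversed."""
    if axis is None:
        return [(path, region)]
    a, b = split(region, axis.vertical)
    label = f"{path}/{axis.name}" if path else axis.name
    return (partition(a, axis.first, label + ":a") +
            partition(b, axis.second, label + ":b"))


# Assumed example: a 100 x 40 cell, a global axis 1, and local axes 2 and 3.
cell = Region(0, 0, 100, 40)
tree = Axis("axis1", first=Axis("axis2"), second=Axis("axis3"))
for label, leaf in partition(cell, tree):
    print(label, leaf)
```

Under these assumptions the sketch yields four equal strips, mirrored first about the major axis and then about each local axis. Device placement and interconnection would then proceed inside each strip, starting with the critical path.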

During the initial floorplanning stages and the identification of axes of symmetry, it is important to take note of any special conditions that should be considered during the layout process. Such conditions may affect the selection of axes of symmetry in order to accommodate certain requirements. For example, consider the high-speed XOR schematic shown in Figure 7.15. A quick analysis can identify several axes of symmetry, but the selection of the major axis can dramatically affect the finished layout. Axes A1 and A6 both appear to be equally valid major axes of symmetry; however, selecting axis A6 could dramatically increase the length of the input lines. In order to reduce the input capacitance, several circuit variations should be explored based upon the floorplan specifications. Different orientations of cells and/or current switches will often have drastically different effects upon the layout elsewhere in the system.
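One way to compare candidate major axes is to estimate the input line lengths that each choice implies and to rank the variations accordingly. The sketch below assumes Manhattan routing from each input connector to the device pin it drives. The function names, net names, variation labels, and all coordinates are invented for illustration and are not taken from Figure 7.15.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def input_lengths(connectors, pins):
    """Estimated length of each input line, keyed by net name."""
    return {net: manhattan(connectors[net], pins[net]) for net in connectors}


def rank_variations(connectors, variations):
    """Sort layout variations by total input length, then by the length
    mismatch between the inputs (shorter and better matched comes first)."""
    scored = []
    for name, pins in variations.items():
        lengths = input_lengths(connectors, pins)
        total = sum(lengths.values())
        mismatch = max(lengths.values()) - min(lengths.values())
        scored.append((total, mismatch, name))
    return [name for _, _, name in sorted(scored)]


# Hypothetical connector positions taken from a floorplan.
connectors = {"A": (0, 10), "B": (0, 30)}

# Hypothetical input-pin positions resulting from two choices of major axis.
variations = {
    "major_axis_A1": {"A": (20, 10), "B": (20, 30)},
    "major_axis_A6": {"A": (20, 35), "B": (60, 5)},
}

print(rank_variations(connectors, variations))
```

With these made-up coordinates the first variation has the shorter and better-matched input lines, so it is ranked first. A real comparison would use the connector positions from the floorplan and extracted or estimated wire capacitance rather than raw length.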

In order to assist designers in producing layouts with better capacitance matching, two sets of physical layout heuristics have been developed, one for the global level and one for the local level. The global, or system-level, heuristics are aimed at identifying symmetry within the upper levels of the design and orienting the critical path to obtain a lower overall length. The local heuristics are focused upon the actual cell layouts, which may exhibit more symmetry due to the use of differential logic.

As VLSI circuits continue to increase in speed, the physical design methodology will become more important. This is true for all types of designs, from standard cells to full custom, and will continue as long as the increase in device speeds outpaces improvements in VLSI interconnection technology. With every increase in transistor performance, a greater percentage of the circuit delay becomes attributable to the interconnect. In this chapter, standard cell and full-custom physical design methodologies were investigated. Like nearly all processors today, F-RISC/G is a standard cell design and consequently has reduced performance but increased ease-of-design.

The forced-feedthrough requirement imposed by the CAD tools has further and unnecessarily reduced the performance of the processor. It increases both the overall chip area and the interconnection lengths, and it often leaves unused metal "stubs" within the cells, which add capacitance. Estimates based upon redesigned standard cells indicate overall chip area savings ranging from 8.9% to 13.3%, depending upon the specific F-RISC/G chip and the assumptions regarding redesigned cell area projections. With these reductions and further improvements from process metallization changes, the chip areas could be significantly reduced, which would translate directly into shorter interconnection delays and lower parasitic capacitance.

Not all circuits can use a standard cell implementation, usually because the performance or area requirements are too stringent. For these designs, semi- or full-custom layout is the preferred physical design process. While full-custom layout can produce circuits that operate closer to the upper design limits, special care must be taken to wring out the last drops of performance. To this end, techniques have been developed which produce layouts with closely-matched parasitic capacitance for differential circuits. Starting with the top level of the design, the techniques move through the circuit hierarchy and focus upon the critical path through both the entire circuit and each block along the path. These techniques were developed during the high-speed voltage-controlled oscillator design and applied throughout.

F-RISC/G Chip	Subcell count	Subcell instances	Cells with utilized FTs	Total available FTs	Total utilized FTs
Cache cont.	500	58	203 (40.6%)	1624	263 (14.4%)
Datapath	591	84	242 (41.0%)	1989	325 (16.3%)
Instruction dec.	638	89	258 (40.4%)	1968	313 (15.9%)

Chip	Utilized FTs	Multiple FTs	Input FTs	Output FTs	Clock FTs	Select FTs
Cache Cont.	40.6%	10.4%	23.0%	9.8%	8.8%	5.8%
Datapath	41.0%	11.3%	20.3%	11.7%	13.2%	5.1%
Instruction Dec.	40.4%	7.8%	25.2%	11.0%	8.5%	1.3%

Chip	Total Connections (North)	Total Connections (South)	Input Connections (North)	Input Connections (South)	Output Connections (North)	Output Connections (South)	Other Connections (North)	Other Connections (South)
Cache Cont.	440	431	358	342	281	265	135	161
Datapath	522	517	370	386	319	336	223	195
Instruction Dec.	571	568	423	441	350	354	162	154

Chip Name	Aggressive	Less Aggressive	Least Aggressive
Cache Controller	20.5%	16.2%	17.6%
Datapath	22.3%	22.6%	19.9%
Instruction Decoder	12.5%	6.6%	-0.5%

Chip Name	Aggressive	Less Aggressive	Least Aggressive
Cache Controller	14.7%	16.8%	20.1%
Datapath	36.5%	37.1%	38.5%
Instruction Decoder	16.1%	13.0%	7.5%

Chip Name	Original Width (W)	Std. Cell Block W (max)	Aggressive Estimates - W’ (% Reduction)	Less Aggressive Estimates - W’ (% Reduction)	Least Aggressive Estimates - W’ (% Reduction)
Cache Controller	9016 µm	6815 µm	7734 µm (14.2%)	7890 µm (12.5%)	8163 µm (9.5%)
Datapath	8005 µm	5614 µm	6966 µm (13.0%)	7125 µm (11.0%)	7315 µm (8.6%)
Instruction Decoder	7214 µm	4697 µm	6213 µm (13.9%)	6297 µm (12.7%)	6418 µm (11.0%)
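The percentage reductions in the table above follow from the original width W and the estimated width W’ as 100 × (W − W’) / W. The short check below recomputes them from the tabulated widths. The function name and the dictionary layout are illustrative only, and because only the ratio matters, the widths are entered as plain numbers in the table's own units.

```python
def percent_reduction(original, reduced):
    """Percentage by which `reduced` is smaller than `original`."""
    return 100.0 * (original - reduced) / original


# Chip name -> (original width W, [aggressive, less aggressive, least aggressive W'])
widths = {
    "Cache Controller": (9016, [7734, 7890, 8163]),
    "Datapath": (8005, [6966, 7125, 7315]),
    "Instruction Decoder": (7214, [6213, 6297, 6418]),
}

for chip, (w, estimates) in widths.items():
    reductions = [round(percent_reduction(w, e), 1) for e in estimates]
    print(f"{chip}: {reductions}")
```

Rounded to one decimal place, the computed values agree with the percentages listed in the table.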

Chip Name	Average # of FTs Per Row	Average # of Added FTs Per Row	Aggressive Estimates - W’ (% Reduction)	Less Aggressive Estimates - W’ (% Reduction)	Least Aggressive Estimates - W’ (% Reduction)
Cache Controller	121.0	28.0	7815 µm (13.3%)	7959 µm (11.7%)	8216 µm (8.9%)
Datapath	99.3	22.8	7030 µm (12.2%)	7179 µm (10.3%)	7358 µm (8.1%)
Instruction Decoder	57.9	14.9	6258 µm (13.3%)	6339 µm (12.1%)	6453 µm (10.6%)