Cache RAM Characterization and Optimization
The F-RISC/G processor uses a Harvard architecture in which separate on-chip caches are used for instructions and data. There are 16 cache memory chips in the F-RISC/G processor, 8 per cache. Each chip uses four copies of a memory block that is nearly the same circuit as the register file but operates at lower power levels. Consequently, when the register file required changes to meet the performance goals, the cache RAM block did as well. Fortunately, the performance specifications were less stringent than those for the register file, which made the redesign process much less difficult. One major problem did arise during the redesign, caused by the prolonged WRITE operation of the cache controller, but it was mitigated by adjusting the internal voltage swings. Finally, a novel circuit for the Read/Write Logic was developed which improved the WRITE recovery time.
In addition to the register file, the cache RAM has been extensively modified and retuned. Many of the layout changes from the register file have been incorporated into the cache blocks, but the optimization process was carried out separately because of the different performance and area constraints. The cache RAM block has 32 rows of 16 bits apiece and a maximum allowable power consumption of approximately 1.5 W. Four of these blocks are used in each cache RAM chip, so the block power is multiplied by a factor of 4 in the actual implementation.
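Multiplying out the quoted figures gives an upper bound on the power the RAM blocks alone may draw. The sketch below is a rough budget check under stated assumptions: it treats 1.5 W as a per-block ceiling, uses the 4 blocks per chip and 8 chips per cache given in the text, and ignores all other circuitry on the chips, so it is not a measured chip or system power.

```python
# Upper-bound RAM-block power budget implied by the figures in the text.
# Counts only the RAM blocks; other circuits on each cache chip are ignored.
BLOCK_POWER_MAX_W = 1.5   # approximate maximum per RAM block (from the text)
BLOCKS_PER_CHIP = 4       # four RAM blocks per cache RAM chip
CHIPS_PER_CACHE = 8       # eight chips per cache
CACHES = 2                # separate instruction and data caches (Harvard)


def ram_block_power_budget():
    """Return the RAM-block power ceiling per chip, per cache, and for both caches (W)."""
    per_chip = BLOCK_POWER_MAX_W * BLOCKS_PER_CHIP
    per_cache = per_chip * CHIPS_PER_CACHE
    both_caches = per_cache * CACHES
    return per_chip, per_cache, both_caches


if __name__ == "__main__":
    chip, cache, total = ram_block_power_budget()
    print(f"per chip: {chip} W, per cache: {cache} W, both caches: {total} W")
```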
The access time was originally specified as below 450 ps; however, problems in the design of the cache controller chip tightened this requirement to under 400 ps. The register file and cache RAM block circuit topologies are essentially the same, but they differ in the threshold voltage generator and address line driver circuits, which trade off power and performance. In addition, most of the resistor values differ, reflecting the different power and speed targets of the two circuits.
The original cache RAM block had a READ access time of 487.8 ps based upon SPICE simulations with the 2-sided device model and the anisotropic reduced interlevel dielectric (rILD) interconnection model. As with the register file, the access time was roughly evenly divided between the address/word driver circuits (250.8 ps) and the memory cell/sense-amplifier section (237.0 ps), although the balance has shifted towards the address/word drivers because of the doubled word line length. The intrinsic delay (i.e., the circuit delay without parasitic capacitance) is 293.1 ps. As shown in Figure 5.1, the sensitivity of the cache RAM block to capacitance is, not surprisingly, roughly the same as that of the register file.
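The delay figures above can be combined to show how the access time splits between the two sections and how much of it is attributable to parasitics. This minimal sketch assumes the parasitic contribution is simply the total access time minus the intrinsic delay, which is an approximation rather than a separate simulation result.

```python
# Delay figures quoted in the text for the original cache RAM block (ps).
READ_ACCESS = 487.8   # total READ access time
ADDR_WORD = 250.8     # address/word driver portion
CELL_SENSE = 237.0    # memory cell/sense-amplifier portion
INTRINSIC = 293.1     # delay with parasitic capacitance removed


def breakdown():
    """Return the driver share, cell/sense share, parasitic delay (ps), and parasitic share."""
    driver_share = ADDR_WORD / READ_ACCESS
    cell_share = CELL_SENSE / READ_ACCESS
    parasitic_ps = READ_ACCESS - INTRINSIC   # assumes delays combine additively
    return driver_share, cell_share, parasitic_ps, parasitic_ps / READ_ACCESS


if __name__ == "__main__":
    d, c, p_ps, p_frac = breakdown()
    print(f"address/word drivers: {d:.1%}")
    print(f"cell/sense amplifier: {c:.1%}")
    print(f"parasitic delay:      {p_ps:.1f} ps ({p_frac:.1%})")
```

With these numbers the drivers account for about 51% of the access time and roughly 40% of the total delay comes from parasitic capacitance.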
Figure 5.1 - Cache RAM block READ sensitivity to address, bit, and word line parasitic capacitance
Circuit Optimization and Modification
Many of the changes made in the register file were applied to the cache RAM block with varying degrees of success. As few modifications were made as possible in order to reduce the redesign time and keep the circuit simple.
As with the register file, a resistor was placed between the bitlines to provide passive equalization and improve the switching performance (see Figure 4.21 and Figure 4.22). A sensitivity analysis (Figure 5.2) was performed to determine a suitable value for the equalization resistor. The current through the resistor drops significantly between 1 kΩ and 3 kΩ. A value of 3 kΩ was selected because it lies at the break point of both the current and access time plots.
Figure 5.2 - Cache RAM block sensitivity to bitline equalization resistor
During the cache RAM block optimization process, the bitline current proved to be one of the most important parameters in terms of performance. Since the bitlines have the largest parasitic capacitance in the circuit but do not exhibit any RC effects, their delay is simply a charge-control problem: the time to swing a bitline is set by its capacitance, the voltage swing, and the available current. The original cache RAM block design provided bitline currents on the order of 0.5 mA. Because the devices are limited to 2.0 mA, there is plenty of room for improvement. In the register file, by contrast, the bitline currents were already high, so increasing them significantly was not permissible.
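The charge-control argument can be written as t = C·ΔV/I: with no RC term, the bitline swing time falls in inverse proportion to the current. The capacitance and swing below are hypothetical placeholders chosen only to illustrate the scaling; they are not F-RISC/G design values.

```python
def bitline_swing_time(c_bitline_f, delta_v, i_bitline_a):
    """Charge-control estimate t = C * dV / I for a bitline driven by a constant
    current (no RC term, matching the assumption in the text)."""
    if i_bitline_a <= 0:
        raise ValueError("bitline current must be positive")
    return c_bitline_f * delta_v / i_bitline_a


# Hypothetical values for illustration only; not F-RISC/G design data.
C_BITLINE_F = 0.5e-12   # assumed bitline capacitance (0.5 pF)
SWING_V = 0.2           # assumed bitline voltage swing (200 mV)

if __name__ == "__main__":
    for i_ma in (0.5, 1.0, 1.5, 2.0):
        t_ps = bitline_swing_time(C_BITLINE_F, SWING_V, i_ma * 1e-3) * 1e12
        print(f"I = {i_ma:.1f} mA -> t = {t_ps:.0f} ps")
```

Whatever the actual capacitance, raising the current from 0.5 mA to 1.5 mA cuts this component of the delay by a factor of three, which is why the bitline current is such an effective knob.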
Figure 5.3 shows the relationships among READ access time, bitline current, and the bitline current source resistor. As the bitline current increases, the access time decreases drastically and drops below 400 ps at a resistor value of approximately 300 Ω (1.5 mA). Increasing the current further would provide additional gains in performance, but the RAM block power dissipation cannot exceed 1.5 W.
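A quick consistency check relates the resistor value to the bitline current. This sketch assumes the current source behaves like a fixed voltage across its resistor (I = V_R/R) and infers V_R from the single quoted operating point (300 Ω at 1.5 mA); V_R is an inferred quantity, not a documented design value, and the real source will deviate from this ideal.

```python
# Assumption: the bitline current source holds a fixed voltage V_R across its
# resistor, so I = V_R / R. V_R is inferred from the one operating point quoted
# in the text (300 ohm -> 1.5 mA); it is not a documented design value.
R_QUOTED_OHM = 300.0
I_QUOTED_A = 1.5e-3
V_R = R_QUOTED_OHM * I_QUOTED_A   # inferred drop across the resistor (about 0.45 V)

I_DEVICE_MAX_A = 2.0e-3           # device current limit quoted in the text


def bitline_current(r_ohm):
    """Bitline current (A) for a given current source resistor under the fixed-V_R model."""
    return V_R / r_ohm


def min_resistor_for_limit(i_max_a=I_DEVICE_MAX_A):
    """Smallest resistor (ohm) that keeps the current at or below the device limit."""
    return V_R / i_max_a


if __name__ == "__main__":
    for r in (800.0, 500.0, 300.0):
        print(f"R = {r:.0f} ohm -> I = {bitline_current(r) * 1e3:.2f} mA")
    print(f"smallest R within the 2.0 mA limit: {min_resistor_for_limit():.0f} ohm")
```

Under this model the 800 Ω reference used in Figures 5.4 and 5.5 gives about 0.56 mA, consistent with the "order of 0.5 mA" of the original design.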
Figure 5.3 - Cache RAM block sensitivity to bitline current source resistor
The percentage of total circuit power consumed by the bitlines is shown in Figure 5.4 along with the corresponding bitline current. As the bitline current (and thus the total system power) grows, the bitlines account for an increasingly large proportion of the total circuit dissipation.
Figure 5.4 - Relation between cache RAM block bitline current and total power (relative to a bitline current source resistor of 800 Ω)
Figure 5.5 shows the relationship between the change in total system power and the reduction in access time due to the increased bitline current. While the access time improves by 26%, the total power dissipation rises by only 16%. However, the nonlinear tail of the current curve in Figure 5.4 and the percentage increase in power in Figure 5.5 suggest that further gains in access time will become more expensive in terms of power.
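The "more expensive" observation can be made quantitative by computing, between consecutive points of a bitline-current sweep, the percent access-time reduction bought per percent of added power. The function below is a hypothetical helper; the three-point sweep in the usage example is synthetic and invented only to exercise it, while the overall ratio uses the 26% and 16% figures from the text.

```python
def marginal_tradeoff(access_ps, power_w):
    """For consecutive points of a bitline-current sweep, return the percent
    access-time reduction obtained per percent of added power."""
    if len(access_ps) != len(power_w) or len(access_ps) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    ratios = []
    for k in range(len(access_ps) - 1):
        gain_pct = 100.0 * (access_ps[k] - access_ps[k + 1]) / access_ps[k]
        cost_pct = 100.0 * (power_w[k + 1] - power_w[k]) / power_w[k]
        if cost_pct <= 0:
            raise ValueError("power must increase between sweep points")
        ratios.append(gain_pct / cost_pct)
    return ratios


# Overall figure from the text: 26 % faster for 16 % more power.
OVERALL_RATIO = 26.0 / 16.0

if __name__ == "__main__":
    # Synthetic sweep, invented purely to exercise the function.
    print(marginal_tradeoff([500.0, 400.0, 360.0], [1.0, 1.1, 1.3]))
    print(f"overall: {OVERALL_RATIO:.2f} % speed-up per % power")
```

A ratio that falls from one sweep step to the next indicates diminishing returns, which is the trend the text reads from Figures 5.4 and 5.5.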
Figure 5.5 - Comparison of cache RAM block READ access time improvement to % increase in power (relative to a bitline current source resistor of 800 Ω)
Cache RAM Block Summary
The performance characteristics for the cache RAM block are shown below in Table 5.1 and the breakdown of power among the circuit components is shown in Figure 5.6.
| Parameter | Value |
|---|---|
| READ access time (static, dynamic) | 371.0 ps, 338.2 ps |
| Dimensions | 1.37 mm x 2.02 mm |
Table 5.1 - Cache RAM block performance and characteristics
Figure 5.6 - Power dissipation breakdown for cache RAM block
Problem with Sustained WRITE Operations
One unanticipated problem which arose was the effect of sustained WRITE operations. During a WRITE, the internal memory cell levels are overwritten by forcing the bitlines to extreme voltages. Since the memory cell and read/write logic are connected to the bitlines in a "wired-OR" configuration, the high base node within a selected memory cell can override the bitline low voltage supplied by the read/write logic and force the bitline high. Because the other bitline is also forced high by the read/write logic, both bitlines can be set high and the actual output value of the circuit will become indeterminate.
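The failure can be illustrated with an idealized wired-OR model in which each bitline simply follows the highest of its drivers. This is a minimal sketch: the voltages are hypothetical levels relative to an arbitrary reference, and the helper names are illustrative only. It shows the case where the selected cell's high base node sits on the bitline that the read/write logic is trying to pull low.

```python
def wired_or(*levels_v):
    """Idealized wired-OR of emitter-follower drivers: the bitline follows the highest one."""
    return max(levels_v)


def write_bitlines(rw_high_v, rw_low_v, cell_high_v, cell_low_v):
    """Bitline levels while the R/W logic drives BL high and BL_bar low, with the
    selected cell's high base node on the BL_bar side."""
    bl = wired_or(rw_high_v, cell_low_v)
    bl_bar = wired_or(rw_low_v, cell_high_v)
    return bl, bl_bar


# Hypothetical levels in volts, relative to an arbitrary reference.
BL, BL_BAR = write_bitlines(rw_high_v=1.0, rw_low_v=0.2, cell_high_v=1.0, cell_low_v=0.3)

if __name__ == "__main__":
    print(f"BL = {BL} V, BL_bar = {BL_BAR} V, differential = {BL - BL_BAR} V")
```

When the cell's high level reaches the read/write logic's high level, both bitlines sit at the same potential and the differential vanishes, which is the indeterminate condition described above.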
This condition never arises in typical usage of the register file because there is at least one READ cycle between successive WRITEs. Unfortunately, the cache RAMs do not have this convenient constraint. Because of the asynchronous design of the F-RISC/G level-1 (L1) cache, WRITE mode must be maintained whenever data is requested from the level-2 (L2) cache. When the data arrives, it is written into the L1 cache RAM blocks and then passed through to the datapath chip. If the new data differs from the previous bus values, the bitlines will have valid signal levels and the WRITE operation will proceed normally. However, during the sustained WRITE the internal nodes have had time to switch fully and can charge both bitlines high, so a problem may occur if some bit values sent from the L2 cache are the same as the previous values.
Mechanics of the WRITE Operation
In order to resolve this problem, the actual mechanisms by which new values are written into the register file must be understood. As mentioned previously, there are two distinct mechanisms which may change the state of a memory cell, namely turning off the "on" device by raising the bitline potential or forcing the "off" device to turn on by dropping the bitline voltage. Although both methods are used in the register file to obtain higher performance, only one is actually necessary to set the state. Consequently, the slower cache RAM block may use only one of the methods and still meet the performance specifications.
To avoid applying the same potential to both bitlines, the high voltage applied by the read/write logic must be reduced so that a significant bitline voltage differential remains during sustained WRITEs. When the selected memory cell drives a bitline high during a WRITE, the static potential of that bitline can then exceed the potential of the other bitline set by the Read/Write Logic, thereby providing an acceptable minimum potential difference.
The read/write logic has two modes of operation (READ and WRITE) and sets the bitlines accordingly. In READ mode, the circuit attempts to set both bitlines to the same voltage (see nodes A and B in Figure 5.7). Because this potential (VA/VB) lies between the memory cell high (VC) and low (VD) potentials, the memory cell sets the higher bitline while the Read/Write Logic establishes the lower bitline voltage. In WRITE mode, however, the circuit forces one bitline high and the other low. To solve the sustained WRITE problem, the high voltage of the read/write logic must be reduced. This allows the high voltage from the memory cells to rise significantly above the read/write logic high value and provide a differential signal on the bitlines.
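The same idealized wired-OR model (each bitline follows its highest driver) can illustrate both the READ behavior and the fix. All voltages below are hypothetical, chosen only to respect the ordering VD < VA = VB < VC described in the text; they are not read from Figure 5.7, and the sketch models only the resulting levels, not the pull-up current mechanism of Figure 5.8.

```python
def wired_or(*levels_v):
    """Idealized wired-OR: each bitline follows its highest driver."""
    return max(levels_v)


def bitlines(rw_bl_v, rw_bl_bar_v, cell_bl_side_v, cell_bl_bar_side_v):
    """Return (BL, BL_bar) given the R/W logic outputs and the cell node on each side."""
    return wired_or(rw_bl_v, cell_bl_side_v), wired_or(rw_bl_bar_v, cell_bl_bar_side_v)


# Hypothetical levels (V, arbitrary reference); not taken from Figure 5.7.
V_C, V_D = 1.0, 0.3     # memory cell high and low potentials
V_READ = 0.6            # V_A = V_B in READ mode, between V_D and V_C
V_WRITE_LOW = 0.2

# READ: the cell sets the higher bitline, the R/W logic sets the lower one.
READ = bitlines(V_READ, V_READ, V_C, V_D)

# Sustained WRITE with the cell's high node on the BL_bar side.
ORIGINAL = bitlines(1.0, V_WRITE_LOW, V_D, V_C)   # R/W high level equal to V_C
MODIFIED = bitlines(0.8, V_WRITE_LOW, V_D, V_C)   # R/W high level reduced below V_C

if __name__ == "__main__":
    for name, (bl, bl_bar) in (("READ", READ), ("original WRITE", ORIGINAL),
                               ("modified WRITE", MODIFIED)):
        print(f"{name:15s} BL = {bl:.2f} V  BL_bar = {bl_bar:.2f} V  diff = {bl_bar - bl:+.2f} V")
```

With the read/write high level at VC the two bitlines coincide, while lowering it leaves the cell-driven bitline clearly above the other one, restoring a usable differential.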
The READ and WRITE voltages are established by the current flowing through the circuit and the pull-up resistor values. Because the logic is symmetric, the WRITE voltages have the same deviation from the READ voltage but in opposite directions. As a result, the high WRITE voltage cannot be reduced without diminishing the low voltage as well and thereby slowing the WRITE operation. The high and low WRITE voltages could be reduced by shifting down the midrange READ reference voltage but this in turn would increase the READ access times.
Figure 5.7 - Original and modified Read/Write Logic voltage swings
To compensate, the high value from the read/write logic was "reduced" by shifting the midrange READ voltage upwards. This effectively reduced the high WRITE voltage and increased the low potential. The READ voltage was shifted by reducing the current through the pull-up resistors using an additional current path in parallel with the READ side of the circuit (shown in Figure 5.8).
(a) Register file Read/Write Logic (b) Cache RAM block Read/Write Logic
Figure 5.8 - Comparison of register file and cache RAM block Read/Write Logic circuits
A 32x16 cache RAM macro has been developed with an access time of 371.0 ps and a WRITE time of 225.8 ps. The cache macro has twice as many elements as the register file but consumes only 1.5 W, or 25% less power. The access time of the cache RAM was significantly improved by increasing the bitline current, at a relatively low cost in total power dissipation. One major problem, caused by the prolonged WRITE operation of the cache controller, was uncovered during the redesign process and fixed. In addition, the cache RAM block uses a novel read/write logic circuit with a reduced high bitline voltage level during WRITE operations, which improves the recovery time for a subsequent READ.
Redesigned Cache RAM Statistics
| Parameter | Value |
|---|---|
| Dimensions | 1367 µm x 1805 µm |
| Organization | 32 rows x 8 bits |
| READ access time | |
| WRITE access time | |
| Device count | 2622 transistors, 1050 diodes |
Table 5.2 - Redesigned Cache RAM statistics