CHAPTER 7

TESTING SCHEME

Introduction

A comprehensive testing scheme is necessary to increase the yield of a multichip package. Usually, chips are tested before insertion into a package, interconnect layers are inspected and repaired after every layer, and the final assembly is inspected and functionally verified before a fully working package is declared. A number of test schemes are used for these steps [John87], such as automatic and manual optical inspection, capacitive probing, acoustic microscopy, x-ray and laser inspection, voltage-contrast e-beam [Keez94], and electrical in-circuit testing schemes such as built-in self-test (BIST), level-sensitive scan, and boundary scan. Electrical testing schemes are attractive because of their low cost and the ability to incorporate the scheme directly into a design. Most of the F-RISC/G package testing is done using in-circuit electrical testing schemes.

All the F-RISC/G chips use boundary scan testing for functional and at-speed verification. Boundary scan was preferred because it uses only a fraction of the resources required by a BIST scheme [Phil93]. This chip-level scheme was easily adapted to test a collection of chips together at the package level. The testing is done at several points during the package fabrication phase. The electrical testing scheme is used in conjunction with the optical inspection of the interconnect at GE to fully verify the package. This chapter describes the complete test scheme with emphasis on package-level testing.

Test Constraints and Methodology

Two types of tests are devised to verify the chips and the package: functional and at-speed. The main constraints in the design of these tests are

These guidelines shaped the testing scheme described in the rest of this chapter. Figure 7.1 illustrates the overall test process. Tests will be done both at RPI and GE with different types of test equipment at different stages, as shown in Table 7-1. The first stage is to identify Known-Good Dice (KGD). Once enough dice are accumulated to populate a package, they are forwarded to GE for packaging. The KGD identification is repeated after the dice are placed on the substrate; if a die fails due to the pick-n-place operation, it can be replaced at this step. The first layer of insulator and metal is put down next and all the chip pads are brought up. The dice are then tested individually again with boundary scan tests. Any fault here requires debelting the tape on that sub-module and possibly the loss of all the dice on that module. After that, all the layers are put down and the chips are tested for functionality with the scan chains and for speed by program execution.

Figure 7.1: Test process flow.

Table 7-1: Test matrix.

Test Stage | Test Scheme | Location | Equipment
Bare die | Boundary scan | RPI | High-speed probes, probe station
After pick-n-place | Boundary scan | RPI | High-speed probes, probe station
After first tape | Boundary scan | RPI | High-speed probes, probe station
From second to last tape | Optical inspection | GE | -
After module completion | Boundary scan, program control | RPI | Test jig with heat sink

Bare Die Testing

All the chips employ boundary scan schemes to comprehensively test their functionality. This has been discussed in detail in [Phil93][Maie96] but is reviewed here with additions and corrections. A boundary scan test scheme places a scan cell between the core logic and an I/O pad and connects all the scan cells into a single chain, as shown in Figure 7.2. The input to the first scan cell and the output of the last scan cell are brought out as separate control pads on the chip periphery and are used to move data in and out of the chip serially during testing. Additional signals control the testing status - bare chip, chip on package, or normal chip operation - and confirm operation of the chip at speed.

Figure 7.2: Simple schematic of boundary scan scheme.

There are twelve signals in total, as shown in Figure 7.3, supplied using two six-channel probes. All the signals are described later in Table 7-2 and Table 7-3. Additional power probes are put down on the remaining two sides of the chips. The G-P-G power probe and the S-S-P-G-S-S-G-P-S-S signal probe can handle a maximum of 1.0 A and 0.5 A respectively [Casc81]. A set of 50 Ω G-S/S-G probes, mounted on a flexible arm, is used to test any individual driver on the pad periphery.

In the standard boundary scan testing scheme, test vectors are serially shifted into the scan cells and applied to the core logic. The data is collected from the scan cells connected to the outputs of the core logic after a clock cycle. The added at-speed testing feature of the circuitry allows the presentation of the test vector and observation of the circuit response with an individually variable delay from the start of one of the four clock phases. The variable delays can be adjusted from 0 to 280 ps in steps of 40 ps.
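The serial shift mechanics described above can be sketched in a few lines. This is a minimal behavioral model, assuming nothing about the real scan cells; the chain length and bit values are illustrative, not taken from the F-RISC/G chips:

```python
# Minimal model of a serial boundary scan path: one bit enters at SCAN_IN
# per scan clock, and the oldest bit leaves at SCAN_OUT.

def scan_shift(chain, scan_in_bits):
    """Shift bits serially through the chain; returns the bits that emerge
    at SCAN_OUT, oldest (closest to the output) first."""
    scan_out = []
    for bit in scan_in_bits:
        chain.insert(0, bit)          # new bit enters the first scan cell
        scan_out.append(chain.pop())  # last cell's bit leaves the chain
    return scan_out

# A 6-cell chain holding a captured response; shifting a fresh test vector
# in pushes the old response out, one bit per scan clock.
chain = [1, 0, 1, 1, 0, 0]
response = scan_shift(chain, [0, 0, 0, 0, 0, 0])
print(response)  # [0, 0, 1, 1, 0, 1]
```

After six scan clocks the chain holds the new vector and the previous capture has been fully recovered at SCAN_OUT.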

Figure 7.3 : Probe sites on the ID, DP, and CC chips - left; and RAM chips - right.

Figure 7.4 : Test setup for single chip testing.

Submodule Testing

Chip testing on the package is done in three phases. In the first phase, the standard suite of boundary scan tests is executed after the chips are placed on the submodule substrates. At this point all the chips can be tested at speed. The idea behind the first phase is shown on the left in Figure 7.5. In the second phase, the tests are done after gluing down the first insulator layer and depositing the first metal layer; the schematic of this phase is shown on the right in Figure 7.5. This brings all the pads up to the first metal layer, and the standard boundary scan test scheme is applied again to test the integrity of all the chips. The module top view at this stage is shown in Figure 7.7. If all the chips pass all the tests, the whole module is declared functional and is ready for integration. In the third phase, the chips are tested after all the interconnect is built and the three modules are integrated together.

Figure 7.5: Chips are tested after placing them on the substrate: left; after first metal layer: right.

Figure 7.6: Test setup for submodule testing.

Figure 7.7: Top view of the package after the first tape layer is put down showing the test pads.

Interconnections Testing During Fabrication

The interconnects are tested optically at GE, after building each layer, to achieve maximum possible yield for the whole interconnect structure. Any defect at this stage can be reworked without debelting the whole tape. This testing is complemented later by testing the wires electrically using the scan chains.

Final Module Testing

Final module verification will proceed in the order

The final package I/O is shown in Figure 7.8 with all the scan chains. All these tests will be conducted on a separate test jig described later.

Figure 7.8: Scan chains on the package.

Interconnection Testing

One round of interconnect integrity checking will be made at GE by optical inspection. Another way of testing these interconnections is to use the scan chains shown in Figure 7.8. First, all the output latches of the chips in a scan chain are loaded with test vectors. Next, these chips are clocked and the data is transmitted over the wires to the receivers on other chips, as shown in Figure 7.9. The test vectors are latched at these receivers and scanned out to be compared with the original vectors. Depending on the degree of failure, the interconnect can either be reworked or discarded and built up again. This method doesn't provide delay information for the wires unless the test is conducted by carefully deskewing the clocks to the two chips. A delay indication for the interconnect is obtained by using a pair of unused pads on the RAM chips to create a ring oscillator on the package.

Figure 7.9: Schematic of the interconnections testing scheme.

Boundary Scan Testing

As mentioned before, the scan chains of a few chips are connected together in a daisy chain on the package to facilitate testing of a few chips together with a considerable reduction in testing time. The SCAN_IN and SCAN_OUT signals of a group of chips are daisy chained and all the control signals are broadcast to that group. The following chains were made on the package

1. ID, DP0, DP1, DP2, ICC, and DCC.

2. IM0, IM2, IM4, IM5, IM6, and IM7

3. DM0, DM2, DM4, DM5, DM6, DM7

4. IM3

5. IM1, DM3, and DM1

6. DP3

The boundary scan chain itself can be tested by setting it into oscillation with an off-chip (during chip testing) or off-MCM (during MCM testing) inverter. The scan clock input in this case is a slow, free-running clock, and the whole boundary scan line acts as a long multichip shift register.

The memory chips use a different boundary scan scheme than the ID, DP, and CC chips, though in principle the two are the same. The signals available for both schemes on the respective chips are given in Table 7-2 and Table 7-3. The signals shared among all the chips in a chain are shown lightly shaded in the tables; signals shown in a darker shade need to be supplied individually to each chip. Table 7-2 shows that only INP_SEL needs to be supplied individually for testing MCM wires. SCAN is a control signal requiring a sharp rise time. The scan clock (SC) to the boundary scan cells is daisy chained in the reverse order to the data, as shown in Figure 7.10, to remove any possibility of a race condition due to excessive skew.

Figure 7.10: Movement of data and scan clock in opposite direction.

The TEST signal is internally pulled high, selecting TEST mode. Once the chips are tested and placed on the MCM, it is pulled low externally. The cache RAM chips don't use the deskewed clock and therefore need an external clock to run the internal state machine for testing. The HSCLK signals are daisy chained by providing another driver beside the HSCLK receiver to send the output on to the next chip. This active daisy chaining, as compared to passive daisy chaining, preserves sharp rise times.

Another measure taken to provide maximum flexibility for testing was to bring all the configuration signals for the datapath chips to the package edge. These bits personalize the datapath chips so that they act as DP0/DP1/DP2/DP3. By changing these bits, any datapath chip can emulate any other datapath chip, improving the ability to debug a logic fault. They can also be used to test the middle module alone for possible at-speed verification by converting the DP2 chip to DP3.

Table 7-2 : Signals available on DP, ID, and CC for boundary scan testing.

Signal | Probe Site | Comments
VIEWA | 1 | Depends on SEL, high-speed (output)
SEL | 1 | Selects either VIEWA or VIEWB (input)
VIEWB | 1 | Depends on SEL, high-speed (output)
START | 1 | Start signal for scanning (input)
TEST | 1 | Enables testing mode (input)
SC | 1 | Scan clock (input)
SCAN_IN | 2 | Scan-in data (input)
INP_SEL | 2 | Selects input mode (input)
SCAN | 2 | Control signal (input)
SCAN_OUT | 2 | Scanned-out data (output)
CLK | 2 | Clock (input)
SYNC | 2 | Sync signal (input) - initializes the 4 phases

Table 7-3 : Signals available on CR for boundary scan testing.

Signal | Probe Site | Comments
CNTRSYN | 1 | Counter high bit for scope (output)
SCAN_IN | 1 | Serial input port for test vectors (input)
SS | 1 | Selects single-shot/continuous mode (input)
SCAN | 1 | Daisy chained (input)
SCAN_OUT | 1 | Serial output port (output)
W_DEL | 1 | Delayed write line (output)
SC | 2 | Scan clock (input)
ANALOG | 2 | Delay of the write line (input)
CHSEL1 | 2 | Select line for SCOPE output (input)
CHSEL0 | 2 | Select line for SCOPE output (input)
SCOPE | 2 | High-speed data for scope (output)
HSCLK | 2 | High-speed clock (input)
TEST | - | Pulled high internally

System Testing

A full system test will be done when all the chips and submodules are verified as functional at their intended speed. A simple block diagram of the test setup is given in Figure 7.11. Since the secondary and main memories are missing from the package, a technique for testing the system with the on-board cache memory alone was devised. This memory is loaded via the scan chains and the system is booted to read from it. Since the L1 cache is very small, small programs are used to demonstrate at-speed operation of the system. These programs are listed at the end of this chapter.

Figure 7.11: Block diagram of the full system test.

An approximate arrangement of the electrical support system is shown in Figure 7.12. Power and control signals are supplied by two-sided custom flex cables from Advanced Circuit Technology, Inc., Nashua, NH. These cables are custom manufactured to supply 10-100 A of current depending on the length and voltage drop specifications.

Figure 7.12: Top view of the system under test with flex connections to a PCB supplying power and control signals. The high speed signals are taken out directly by a coax connector.

There will be four cables, each no longer than 4-6 inches, one on each of the four sides, connected to a surrounding PCB with removable connectors such as cinch buttons. The PCB will be custom designed to carry low-inductance power planes with enough bypass capacitance. Table 7-4 gives a short list of required parts for this custom PCB.

Table 7-4: Description of parts.

Part | Quantity | Comments
DC-DC Converter | 4 | Standard
Cu 2-sided Custom Flex | 4 | From ACT
Bypass Capacitors | - | Board level

Supplying Power

The total power requirements of the setup are given in Table 7-5. Switching power supplies have an efficiency of more than 80%, while linear supplies are about 50% efficient. Linear supplies, on the other hand, show much less ripple at the output. A good power supply also has overvoltage, overcurrent, and overtemperature protection and can operate in series or parallel with similar supplies. These supplies can be controlled from the front panel or via a GPIB controller. The power comes to the board at a high DC voltage to reduce losses in the power cables and to keep the noisy power supply away from the module under test. This requires dc-to-dc converters in the test box.

Table 7-5: Estimate of the total power requirements during testing.

Unit Name | Power Required
Package | 240 W
Thermo-electric coolers | 473 W
Total | 713 W
Blower | 208 V 60 Hz / 220 V 50 Hz, 1 PH
Switching power supply | 713 * 1.25 = 891 W
Linear power supply | 713 * 2 = 1426 W
Oscilloscope | 180 W
Total (switching power supply) | 891 + 180 = 1071 W
Total (linear power supply) | 1426 + 180 = 1606 W
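The supply-sizing arithmetic in Table 7-5 follows directly from the quoted efficiencies. A minimal check, assuming the 1.25x and 2x derating factors correspond to the 80% and 50% efficiencies stated above:

```python
# Reproduce the supply-sizing numbers of Table 7-5 from the efficiencies:
# a switching supply delivers 713 W while drawing 713 / 0.8 W from the wall.

package_w = 240      # W, module power
coolers_w = 473      # W, thermo-electric coolers
scope_w   = 180      # W, oscilloscope

load_w      = package_w + coolers_w   # 713 W delivered to the module
switching_w = round(load_w / 0.80)    # 891 W drawn (80% efficient)
linear_w    = round(load_w / 0.50)    # 1426 W drawn (50% efficient)

print(switching_w + scope_w)  # 1071 W total with a switching supply
print(linear_w + scope_w)     # 1606 W total with a linear supply
```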

Supplying clock signals

The clock to the package is supplied via a 50 Ω surface-mount mini-SMA Hirose connector from a Weinschel Engineering Model 432A/438A 0.01-4 GHz benchtop tunable oscillator. This clock drives the input of a clock tree which generates 10 synchronized clocks. One of these clock outputs is available via a coax connector for external monitoring. Another option was to use a surface-mount clock oscillator such as those available from Mini-Circuits. The problem is that these oscillators work over a narrow range, such as 1.5-1.91 GHz, so the package could not be run at any other frequency.

Powering Up Strategy

The following steps are followed while powering up the system:

  1. Connect the heat sink blower and the package power supply in series to prevent accidental destruction of the package in case of a blower failure.
  2. Switch on the power supply and record the power supply current and voltage for each module. Compare these readings against the expected numbers. A good match between these numbers gives some confidence in further tests.
  3. If the system passes static test do the boundary scan test with a simple scan in and scan out. This will test all the interconnections. Passing this test implies normal working of the external interface.
  4. Load a simple program and test the cache first.
  5. Load the cache in single shot mode and pull the test signal low.
  6. If the cache is found OK test the instruction decoder chip next. This will confirm the fetch cycle.
  7. Datapath comes last in this cycle. Testing the datapath finishes a full confirmation of the system.
  8. Use the handheld infrared imager and attached thermocouples to get the temperature readings.

At-Speed Simulation and Testing

The processor in its current form doesn't support at-speed connections to the second-level (L2) cache memory. It can run instructions that are scanned serially into the on-module primary instruction cache to verify the 2 GHz, 4-phase operation claims in the demonstration tests. The capacity of the primary instruction cache is 2 KB, equivalent to 512 32-bit instruction words. Small specialized programs were developed to fit into this memory and show the processor running at speed. The speed-critical parts of the processor and their specifications are given in Table 7-6.

Table 7-6: Speed critical circuits in F-RISC/G.

Circuit | Speed
Adder | < 1000 ps
Register File | 200 ps
L1 Cache | 750 ps

Thus, showing the processor running at speed requires instructions exercising all the circuit macros given in Table 7-6. Normally, this can be done by running a few instructions and checking the resulting state of the processor. One good way of defining this state is to look at the resulting contents of the program counter, register file, and cache, i.e., all the storage bits. This would require placing many high-speed control signals on the package and would still lack clear proof of the processor speed. Therefore, a dynamic method of testing the processor speed was devised by recognizing the importance of the carry-out signal from the most significant bit of the 32-bit adder. The connectivity and routing of this signal are shown in Figure 7.13.

Figure 7.13: Illustration of carry chain test output.

This bit can be made to change every cycle by executing an add instruction inside a branching loop, as shown at the end of this section. If the processor is running at a 1 GHz clock, this bit toggles at 500 MHz. Since the carry-out bit from DP3 is unused on the processor, it is routed out to a connector for observation without adding extra loading on the net that would slow its rise time. It can also be programmed to show different sequences, ruling out the possibility of a fluke signal.
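The mechanism can be sketched in a few lines. This models only the 32-bit add and its carry out, not the processor pipeline; with R0 = 0xFFFFFFFF, adding 1 overflows the adder (COUT = 1) while adding 0 does not:

```python
# Why the add loop toggles DP3's carry out: interleaved with the branch
# cycle (which leaves COUT at 0), the +1 / +0 / +1 loop body produces the
# alternating 1010... pattern on the CARRYOUT pin.

MASK = 0xFFFFFFFF

def add32(a, b):
    """32-bit add; returns (result, carry_out)."""
    s = a + b
    return s & MASK, (s >> 32) & 1

r0 = 0xFFFFFFFF
print([add32(r0, imm)[1] for imm in (1, 0, 1)])  # [1, 0, 1]
```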

Since the chips were designed by multiple designers, a way of combining and simulating the chip netlists - containing the net delays - in a hierarchical manner was devised. The processor was broken into two clearly divided sections for timing verification: core logic and cache memory. The core-logic timing will be described by Steven Carlough in his thesis. The cache timing involves the instruction fetch and the data load/store timings. The rest of the sections describe the simulation technique and the simulated cache timings along with the system boot sequence. The simulations were done using the qsim tool in the Compass Design Automation tool suite. One strange problem with these simulations was an incompatibility between the cache memory chip netlist and the netlists of the rest of the architecture chips, which kept crashing the simulations. The company was consulted extensively, but the exact source of the error remained undetermined.

Simulation Technique

The top level simulations were done in the following manner:

  1. Create a symbol for each chip with the pin names representing the chip I/O pads. A separate symbol is needed to represent these chips because the normal schematic of the chips doesn't have the full timing information. The new symbol makes it possible to call the backannotated netlist (.NLE) of each chip directly. Four symbols, representing the ID/DP/CC/CR chips, are generated here.
  2. Create a symbol for a simple differential delay element.
  3. Create a top-level schematic using the symbols generated in 1 and 2 above. The delay element is inserted into every net segment and is used to represent net delays.
  4. Generate a top-level .NLS netlist from logicasst.
  5. Replace NLS with NLE in all the M and I statements in the .NLS generated in step 4 (only for the architecture chips). This change makes it possible to call the backannotated .NLE files of the chips.
  6. Flatten this .NLS as follows:
  7. Step 6 above gives a flattened .NLE, but in the process it loses all the D statements. Therefore, all the D statements are put back from the original chip .NLE netlists into the new flattened .NLE using a script.
  8. The .NLE generated above is loaded into qsim and supplied with test vectors.
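Steps 5 and 7 lend themselves to simple scripts. The sketch below assumes a simplified statement syntax (lines beginning with "M", "I", or "D"), which may differ from the actual Compass netlist format:

```python
# Hedged sketch of netlist post-processing steps 5 and 7: retarget M and I
# statements from .NLS to .NLE, and carry D statements over from the
# original chip netlists into the flattened file.

def patch_module_refs(nls_lines, arch_chips=("ID", "DP", "CC")):
    """Step 5: point M and I statements for architecture chips at the
    backannotated .NLE files."""
    out = []
    for line in nls_lines:
        if line.startswith(("M ", "I ")) and any(c in line for c in arch_chips):
            line = line.replace(".NLS", ".NLE")
        out.append(line)
    return out

def restore_d_statements(flat_lines, chip_nle_lines):
    """Step 7: append the D statements lost during flattening."""
    return flat_lines + [l for l in chip_nle_lines if l.startswith("D ")]

nls = ["M U1 ID.NLS", "I U2 DP.NLS", "C n1 NET1 VSS 0 0 164"]
print(patch_module_refs(nls))
# ['M U1 ID.NLE', 'I U2 DP.NLE', 'C n1 NET1 VSS 0 0 164']
```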

An additional delay element was added to every interchip net in the top-level schematic, as shown in Figure 7.14. The input and output nets of this element were assigned the delay value extracted from the package layout. These delay values were provided in a .sim file, each represented as an equivalent capacitance value. An example fragment is shown in Figure 7.15 with the delays shown for a section of the data bus.

Figure 7.14: Splitting of nets to insert MCM delay.

C nn DBUSO[0] VSS 0 0 164

C nn DBUSO[1] VSS 0 0 164

C nn DBUSO[2] VSS 0 0 172

C nn DBUSO[3] VSS 0 0 172

C nn DBUSO[4] VSS 0 0 168

C nn DBUSO[5] VSS 0 0 168

C nn DBUSO[6] VSS 0 0 170

C nn DBUSO[7] VSS 0 0 170

C nn DBUSO[8] VSS 0 0 177

Figure 7.15: MCM delays in ps represented as equivalent net capacitance.

A daisy-chain net was split up in the fashion shown in Figure 7.16 and the delays of each branch were inserted as separate nets. Daisy-chain nets were common in the instruction decoder-datapath broadcast and cache cycles. An example daisy-chain net is shown in Figure 7.17. For example, the delay of the IOCNTRL0[1] signal from the ID to the DCC chip is the sum of the ICCIOCNTRL0[1] (176 ps) and DCCIOCNTRL0[1] (95 ps) delays.

Figure 7.16: Splitting scheme for daisy chains.

# IOCNTRL[1:0] NET - Goes from ID to ICC to DCC

C nn ICCIOCNTRL0[1] VSS 0 0 176

C nn ICCIOCNTRL0[0] VSS 0 0 176

C nn DCCIOCNTRL0[1] VSS 0 0 95

C nn DCCIOCNTRL0[0] VSS 0 0 95

Figure 7.17: Representation of IOCONTROL [1:0] net.
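Summing branch delays can be automated by parsing the trailing field of each .sim entry. A small sketch using the IOCNTRL0[1] branches above:

```python
# End-to-end delay of a daisy-chained signal is the sum of its per-branch
# entries in the .sim file; the trailing field of each 'C' line is the
# delay in ps (encoded as an equivalent capacitance).

def parse_delay(sim_line):
    """Extract the trailing delay value (ps) from a 'C nn NET VSS 0 0 d' line."""
    return int(sim_line.split()[-1])

branches = ["C nn ICCIOCNTRL0[1] VSS 0 0 176",
            "C nn DCCIOCNTRL0[1] VSS 0 0 95"]
print(sum(parse_delay(b) for b in branches))  # 271 ps, ID to DCC
```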

Instruction Fetch Cycle

The instruction fetch cycle is allocated two pipeline stages [Phil93] and is completed within 2 ns. The instruction cache controller (ICC) keeps a copy of the program counter, called the remote program counter (RPC), during normal instruction execution and issues a 9-bit address every cycle to the primary instruction cache [Maie96]. The eight IM chips receive this address and each puts 4 bits onto the instruction bus after 750 ps.

The critical path in instruction fetch is exercised when a branch is taken. The DP chip sends out the branch target address at phase 1, which is latched at the ICC on phase 3. The ICC immediately sends the lower 9 bits of this address to the IM chips and simultaneously processes the address for a cache hit or miss. Since all the programs to be run on F-RISC/G are small enough to fit inside the instruction cache, an instruction miss does not occur in the demonstration code. The address is latched at the instruction memory and the data becomes available after 750 ps at its output latches. The instruction word is sampled by the instruction decoder at phase 1 of the following cycle. The instruction fetch cycle and its delay components are shown in Figure 7.18 and Table 7-7.

Figure 7.18: Schematic of instruction fetch operation.

Table 7-7: Components of the instruction fetch cycle.

Component | Minimum Delay [ps] | Maximum Delay [ps]
DP Driver I/P to DP Driver O/P | 136 (ø1) | 162 (ø1)
DP Driver O/P to ICC Receiver I/P | 59 | 246
ICC Receiver I/P to ICC Latch O/P | 130 | 130
ICC Latch O/P to ICC Driver O/P | 70 | 70
ICC Driver O/P to IM Address I/P | 63 | 303
IM Address I/P to IM Data O/P | 750 | 750
IM Data O/P to ID Receiver I/P | 59 | 175
Total | 1267 | 1836
Specified Delay | 2000
Margin | 57.85% | 8.93%

The datapath address bus output becomes valid 162 ps after phase 1 in the worst case. Immediately, the BRANCH signal is asserted to tell the instruction cache controller to branch. The instruction cache controller loads the address at its inputs into its local program counter and also sends it out to the instruction RAMs. The IM chips receive the address at different instants due to their physical separation and output the data 750 ps after receiving the address. The data is latched at the instruction decoder at the start of phase 1. Bits I3..I7 are needed 100 ps before phase 1 and are therefore latched 100 ps early. For these bits the cycle time is 1900 ps, which is still within the 5% margin.
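The totals and margins of Table 7-7 can be rechecked mechanically; margin here is the slack relative to the path total, (specified - total) / total:

```python
# Recompute the totals and margins of the instruction fetch path (Table 7-7).

fetch_min = [136, 59, 130, 70, 63, 750, 59]     # ps, minimum delays
fetch_max = [162, 246, 130, 70, 303, 750, 175]  # ps, maximum delays
specified = 2000                                 # ps, two pipeline stages

lo, hi = sum(fetch_min), sum(fetch_max)
print(lo, hi)                                    # 1267 1836
print(round(100 * (specified - lo) / lo, 2))     # 57.85 (% margin, best case)
print(round(100 * (specified - hi) / hi, 2))     # 8.93  (% margin, worst case)
```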

Data Load Cycle

The data load path is 250 ps longer than the instruction fetch path [Maie96]; in the case of a load, the data is therefore transferred from the data memory chips to the datapath chips in 2250 ps. This path is shown schematically in Figure 7.19 and the minimum and maximum delay paths are given in Table 7-8.

Figure 7.19: Schematic of data load cycle.

Table 7-8: Components of the data load cycle.

Component | Minimum Delay [ps] | Maximum Delay [ps]
DP Driver I/P to O/P | 70 | 70
DP Driver O/P to DCC I/P | 58 | 156
DCC I/P to DCC O/P Latch | 130 | 130
DCC O/P Latch to DCC Driver O/P | 70 | 70
DCC Driver O/P to DM Address I/P | 95 | 330
DM I/P to O/P | 750 | 750
DM O/P to DP I/P | 54 | 175
DP I/P to DP Latch | 80 | 80
Total | 1307 | 1819
Specified Delay | 2250
Margin | 72.1% | 23.69%

The worst-case margin in the data load case is 23.69%, while the best-case margin is 72.1%.

Data Store Cycle

The data store operation begins with the generation of the data to be stored during the DE stage [Phil93]. This data is held in two pipeline registers until it is transferred to the cache during the D1 stage. The instruction decoder signals the data cache controller by asserting WDC long before the target address is supplied by the datapath in the D1 stage. The data cache controller latches the address from the datapath so that the datapath doesn't have to keep it stable. By the time this address reaches the data memory, the datapath chips have latched the data at their outputs. This data is latched by the data memory on receipt of the DINLTCH signal from the data cache controller. The path is shown schematically in Figure 7.20.

Figure 7.20: Schematic of the data store operation.

The timings of the store operation closely follow the load timings, as shown in Table 7-9. There is no inherent cycle in this operation, in contrast to the load and fetch cycles, so satisfying the timings is much easier.

Table 7-9: Components of the store operation.

Component | Minimum Delay [ps] | Maximum Delay [ps]
DP Driver I/P to O/P | 70 | 70
DP Driver O/P to DCC I/P | 58 | 156
DCC I/P to DCC O/P Latch | 130 | 130
DCC O/P Latch to DCC Driver O/P | 70 | 70
DCC Driver O/P to DM Address I/P | 95 | 330
DP O/P to DM I/P | 59 | 180

System Boot Sequence

The module is initialized by asserting the SYNC and RESET signals from the clock generator. This initializes all the state machines and latches in the instruction decoder, datapath, and cache controller chips. At this point the processor executes the boot routine given in Figure 7.23 to initialize the on-board primary cache. The state diagram in Figure 7.21 depicts the system operation. The boot routine fills the cache to validate its contents.

Figure 7.21: State diagram of the processor operation.

When power is applied to the processor and a RESET signal is sent to the instruction decoder, it generates a processor reset interrupt and branches to address 20 hex. When this address is requested from the instruction cache controller, it is forced as a miss by asserting the INIT signal on the chip from outside. Subsequently, one cache line starting at address 20 hex is fetched from memory. The instruction at 20 hex is a BRANCH to a LOAD instruction residing at the same cache-line address in main memory, so another miss occurs and that line is fetched. From then on, 32 LOADs are issued to validate the data cache. After the data cache, the remaining instruction cache is validated, and then a BRANCH is issued to the location of the real program. The instruction decoder needs a pair of LOAD and STORE instructions right after the first BRANCH instruction to settle into normal mode.
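The repetitive LOAD portion of the boot listing can be generated programmatically. This is an illustrative generator, not the actual assembler input; the stride and instruction syntax follow Figure 7.23:

```python
# Generate the cache-validation LOADs of the boot routine: one LOAD per
# cache line, with addresses stepping by 0x20 as in the boot listing.

def validation_loads(n_lines, stride=0x20, start=0x20):
    return ["LOAD R0=addr[%X]/IOCTRL=3" % (start + i * stride)
            for i in range(n_lines)]

for line in validation_loads(4):
    print(line)
# LOAD R0=addr[20]/IOCTRL=3
# LOAD R0=addr[40]/IOCTRL=3
# LOAD R0=addr[60]/IOCTRL=3
# LOAD R0=addr[80]/IOCTRL=3
```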

There is a fixed relationship between the SYNC signal and the master clock: SYNC must be asserted 100 ps after a clock edge. This synchronizes all the chips by bringing up their clocks in the same phases, as shown in Figure 7.22. By the time RESET is lowered, the IMM bits at the output of the ID go to 0020 hex. The condition code remains at 1111. The rest of the signals from the ID stabilize to 0 or 1, except MBYA and MBYB, which remain unknown. ABUS goes to 0020 hex within the first cycle. After 2 cycles the CC bits at the ID output go to 1110. When this condition code reaches DP3 after 240 ps, it triggers BRAOUT and BRAOUT2 at its output. The branch signal goes from DP3 to the DP0-2, ID, and ICC chips. The ICC receives the branch signal and starts fetching the instruction at the address given by the address bus.

Figure 7.22: Timing diagram for processor startup.

BRANCH @LABEL

@LABEL LOAD R0 = addr[0]/IOCTRL=3

NOOP

NOOP

STORE addr[1]=R0

NOOP

LOAD R0=addr[20]/IOCTRL=3

LOAD R0=addr[40]/IOCTRL=3

LOAD R0=addr[60]/IOCTRL=3

LOAD R0=addr[80]/IOCTRL=3

LOAD R0=addr[A0]/IOCTRL=3

LOAD R0=addr[C0]/IOCTRL=3

LOAD R0=addr[E0]/IOCTRL=3

LOAD R0=addr[100]/IOCTRL=3

LOAD R0=addr[120]/IOCTRL=3

LOAD R0=addr[140]/IOCTRL=3

LOAD R0=addr[160]/IOCTRL=3

LOAD R0=addr[180]/IOCTRL=3

LOAD R0=addr[1A0]/IOCTRL=3

LOAD R0=addr[1C0]/IOCTRL=3

LOAD R0=addr[1E0]/IOCTRL=3

Figure 7.23 : Boot up sequence.

LOAD R0=addr[200]/IOCTRL=3

LOAD R0=addr[220]/IOCTRL=3

LOAD R0=addr[240]/IOCTRL=3

LOAD R0=addr[260]/IOCTRL=3

LOAD R0=addr[280]/IOCTRL=3

LOAD R0=addr[2A0]/IOCTRL=3

LOAD R0=addr[2C0]/IOCTRL=3

LOAD R0=addr[2E0]/IOCTRL=3

LOAD R0=addr[300]/IOCTRL=3

LOAD R0=addr[320]/IOCTRL=3

LOAD R0=addr[340]/IOCTRL=3

LOAD R0=addr[360]/IOCTRL=3

LOAD R0=addr[380]/IOCTRL=3

LOAD R0=addr[3A0]/IOCTRL=3

LOAD R0=addr[3C0]/IOCTRL=3

LOAD R0=addr[3E0]/IOCTRL=3

[481 LOAD instructions]

BRA [OS starting address] /CC=1 /RTN=0

NOOP

NOOP

Figure 7.24 : Boot up sequence (contd.)

At-Speed Carry Chain Testing

Simple methods of showing the raw speed of the processor have a greater chance of working. Toggling the carry-out bit exercises scanning, loading, fetching, branching, and execution. The following simple program produces a 10101010... sequence at the DP3-CARRYOUT pin with a frequency of 500 MHz, a quarter of the frequency supplied by the master clock, and is a good indicator of the processor speed. It was assembled and simulated using the asg compiler and the friscsim simulator. The program listing is attached along with the object file. The maximum speed of this bit is 500 MHz, as shown in Figure 7.25.

Figure 7.25: Output of the CARRY OUT bit at 500 MHz.

; Carryout.fs - This program generates a 101010... seq. at DP3-COUT pin suitable to be viewed on the MCM.

;

; 2/27/97 - Atul

;

; Description

; Line 1,2: Load ffffffff into register R0.

; Line 3 : Branch to the same pc location and execute next 3 instructions (COUT Bit= 0)

; Line 4 : Add 1 to R0 and store the result in R1 (COUT Bit = 1).

; Line 5 : Add 0 to R0 and store the result in R1 (COUT Bit = 0).

; Line 6 : Add 1 to R0 and store the result in R1 (COUT Bit = 1).

; After that the execution goes back to Line 3 (branch instruction).

Assembly Listing

addi r0 = 0 + 0xffff

@0000 4000FFFF ('h65555555AAAAAAAA) ADDI R0=0 + 0xFFFF /NOAT

addi r0 = r0 + 0xffff /ldh

@0001 4204FFFF ('h65595565AAAAAAAA) ADDI R0=R0 + 0xFFFF /LDH /NOAT

branch _1 pc = pc + 0x0000 /ex=3 /sq=0

@0002 24170000 ('h5965566A55555555) BRANCH _1 PC=PC + 0x0000 /LAT=7

add r1 = r0 + 1

@0003 C2082001 ('hA559559559555556) ADD R1=R0 + 0x01 /NOAT

add r1 = r0 + 0

@0004 C2082000 ('hA559559559555555) ADD R1=R0 + 0x00 /NOAT

add r1 = r0 + 1

@0005 C2082001 ('hA559559559555556) ADD R1=R0 + 0x01 /NOAT

ASSEMBLER SYMBOL TABLE:

Table 7-10: Trace of the test routine.

Instruction | Register R0 | Register R1 | COUT
ADDI R0=0 + FFFF | 0000FFFF | uuuuuuuu | 0
ADDI R0=R0 + FFFF | FFFFFFFF | uuuuuuuu | 0
ADD R1=R0 + 1 | FFFFFFFF | 00000000 | 1
ADD R1=R0 + 0 | FFFFFFFF | FFFFFFFF | 0

Summary

A comprehensive test scheme was designed to test all the chips and the final package with minimal equipment and manpower. The scheme is based on boundary scan and includes additional at-speed tests. A high-speed 2 GHz clock is supplied either via an on-board deskew chip or via length-controlled clock transmission lines. A boot-up routine and test programs were generated to guide the testing process.