Central points:
* => Will not be covered in exams.
Chap 1:
1.1 Introduction (Computer Abstractions and Technology)
- Computers have led to a third revolution for civilization
- In the past, performance meant minimizing use of memory
- Understanding the hierarchical nature of memories and the parallel nature of processors is of fundamental importance
1.2 Below your program
- Computer Instructions and data are represented as bits or binary numbers
- Computer hardware executes machine language instructions
- Assembly language and assemblers help humans by symbolically representing machine instructions or simple collections of machine instructions
- High level languages and compilers make programming easier
- Software exists in hierarchical layers
- Systems software is closer to the hardware
1.3 Under the covers
- here are many different types of computer components:
input, output, memory, datapath and control, network
- to cope with complexity we rely on the principle of abstraction
- an abstraction omits unneeded detail
- one crucial abstraction is that of the instruction set architecture
- it provides a standardized interface between hardware and low level software
- Different implementations of the same architecture provide different performance
- binary compatibility is extraordinarily important in the marketplace
1.4 Integrated Circuits: Fueling innovation:
- technology has evolved from the vacuum tube to the transistor to the IC to VLSI
- the industry has consistently quadrupled capacity every three years
- changes in technology drive changes in organization
- chips are manufactured in a series of steps
1.5 Real Stuff*: manufacturing pentium chips
- computer designers must know both hardware and software technologies to build competitive systems
- silicon is a medium in which computer designers work
- fewer dies mean higher costs (Pentium pro vs. Pentium)
1.6 Fallacies and Pitfalls
- Fallacy: Computers have been built in the same old-fashioned way for far too long, and this antiquated model of computation is running out of steam
- Pitfall: Ignoring the inexorable progress of hardware while planning a new machine
Check whether you know all the key terms (see key terms at the end of chapter)
============================================================================
Chap 2: Performance
2.1 Introduction
- Performance means different things to different people
- two important measures of performance are throughput and response time
- for most of the text, performance will be defined as the reciprocal of execution time
- results of performance comparisons are reported using ratios (eg: A is n times faster than B)
2.2 Measuring performance
- Execution time can be difficult to define (i.e. user CPU time, system CPU time, elapsed time)
- We will focus on user CPU time
2.3 Relating the metrics
- CPU time = (cycles/program)x (sec/cycle)
- CPU time = (cycles/program) / (cycles/sec)
- CPI is the average number of clock cycles each instruction takes to execute
- CPU time = (cycles/instruction/ x (instructions/program) x (cycles/sec)
- Performance is always relative to a specific program
2.4 Choosing programs to evaluate performance
- The best programs to use for benchmarks are real applications (MY program)
- Companies try to improve performance on benchmarks using very specialized compiler optimizations, which can sometimes lead to erroneous or misleading performance comparisons
- The guiding principle in reporting performance measurements should be reproducibility
2.5 Comparing and summarizing performance
- total execution time is a consistent summary of performance assuming a workload has been defined
- a weighted arithmetic mean can be calculated based on a workload assumption
2.6 Real stuff: the SPEC95 benchmarks and performance of recent processors*
- For a given instruction set architecture, performance increases come from
- increases in clock rate
- improvements in processor organization that lower CPI and
- compiler enhancements that lower the instruction count or generate instructions with lower average CPI
- SPECint95 and SPECfp95 performance ratings for Pentium and Pentium pro processors at different clock rates emphasize these points
2.7 Fallacies and pitfalls:
- Pitfall: Expecting the improvement in one aspect of a machine to increase performance by an amount proportional to the size of the improvement (Amdahl's law)
- Fallacy: Hardware-independent metrics predict performance
- Pitfall: Using instructions per second as a performance metric
- Fallacy: Synthetic benchmarks predict performance
- Pitfall: Using the arithmetic mean of normalized execution times to predict performance
- Fallacy: The geometric mean of execution time ratios is proportional to total execution time
Check whether you know all the key terms (see key terms at the end of chapter)
============================================================================
Chap 3:
3.1 Introduction
- The words of a machine's language are instructions and its vocabulary is its instruction set
- The text uses an instruction set, MIPS which is used by NEC, Nintendo, Silicon graphics and Sony, and is typical of instruction sets designed since the 1980s
3.2 Operations of the Computer Hardware:
- Simplicity favors regularity (eg: 3 operands for arithmetic)
- The translation from C to MIPS assembly language is performed by the compiler
3.3 Operations of the computer Hardware (contd.)
- Operands of arithmetic instructions are registers
- Smaller is faster (MIPS has 32 registers each with 32 bits)
- Data must be transferred from and to memory (i.e. loads and stores)
- Words must always start at byte addresses that are multiples of 4
- Details for 4 MIPS instructions are presented in the text
3.4 Representing Instructions in the Computer
- The layout of the machine instructions is called the instruction format
- Details of the R- and I- format are presented
- Good design demands a compromise, which is why there is more than just one format, whereas the instruction length is fixed
- The stored program concept underlies today's computers
3.5 Instructions for making decisions
- MIPS contain 2 important decision making instructions (beq, bne)
- Familiar control constructs in C can be created with these instructions and the slt instruction (for comparisons other than equality)
3.6 Supporting procedures in computer hardware
- to execute a procedure, six steps must be performed
- the jal and jr instructions provide support for procedure call and return
- MIPS software allocates specific registers for procedure calls (also see conventions)
- A stack is used to support nested and recursive procedures
3.7 Beyond numbers
- characters are represented using ASCII
- MIPS instructions for loading and storing bytes can be used to help perform string manipulations
3.8 Other styles of MIPS addressing:
- Immediate instructions provide support for specifying 16-bit constants as part of the instructions and use the I-format (addi, slti)
- Support for immediates is provided to help make the common case fast
- The lui instruction (along with addi/ori) is used to put a 32-bit constant into a register
- The PC-relative addressing is used for branches
- The jump instructions are used for unconditional branches
- There are 5 MIPS addressing modes: register, displacement, immediate, PC-relative and pseudo-direct
3.9 Starting a Program*
- A compiler is used to transform a C program into assembly language
- Pseudoinstructions are part of assembly language, but not actually part of the instruction set
- An assembler is used to transform assembly language into an object file
- A linker is used to combine object files to produce an executable
- A loader is used to run an executable
3.10 An example to put it all together
- A complete procedure for 'swap' and 'sorting' is presented
3.11 Arrays vs. pointers*
- An array-based version and a pointer-based version of a procedure for clearing memory are presented
- Modern compilers will optimize array-based code to produce code similar to that derived from pointer-based code
3.12 Real stuff: PowerPC and 80x86 Instructions *
- Additional addressing modes supported by the PowerPC (indexed and update) are example of increasing the complexity of the instruction set
- The 80x86 architecture evolved by committee over a long tie span and under the constraint of providing compatibility. It is thus difficult to explain and impossible to love.
- The 80x86 architecture is very complicated (eg, encodings from 1 to 17 bytes) but its most frequent instructions are not difficult to implement and thus performance has increasingly been improved.
3.13 Fallacies and Pitfalls
- Fallacy: More powerful instructions mean higher performance
- Fallacy: Write in assembly language to obtain highest performance
- Pitfall: Forgetting that sequential word addresses in machines with byte addressing do not differ by one
- Pitfall: Using a pointer to an automatic variable outside its defining procedure
Check whether you know all the key terms (see key terms at the end of chapter)
============================================================================
Chap 4: Arithmetic for computers
4.1 Introduction
- Bits have no inherent meaning. We must decide how to use them to represent numbers
4.2 Signed and Unsigned numbers
- Binary representations for numbers are efficient for computers
- Bits in MIPS words are number from right to left (0 = least significant bit
and 31 is the most significant bit)
- Signed numbers are best represented using 2's complement
- sign extension is used to convert 16-bit two's complement numbers into 32-bit two's
complement numbers
- Two's complement numbers can be negated by inverting all of their bits and adding 1
- Long binary numbers are often represented using hexadecimal
4.3 Addition and subtraction
- Overflow occurs if we don't have enough bits to represent the result
- MIPS detects overflow and for some instructions generates an exception
- Unsigned instructions do not generate exceptions
if overflow takes place (addu, sltu etc)
4.4 Logical Operations
- Instructions which operate on fields or bits of words are useful
- Useful operations include shifting, ANDing, ORing. MIPS provides
instructions for these
4.5 Constructing an ALU
- An ALU performs selected operations upon its inputs
- A 1-bit ALU to support AND, OR and addition operations can be easily
constructed using primitive gates and a multiplexor
- We can use 32 1-bit ALUs to construct a 32-bit ALU
- Subtraction is performed by inverting all of the b-inputs
to the ALU and then setting the initial carry in to 1. All this is 2's
complement -- recall you negate a number in 2's complement.
- The ALU can be tailored to support the MIPS slt and branch instruction
- Hardware executes in parallel
- Carry-lookahead adders (which rely on a reorganization of the logical equations based upon
propagate/generate) are faster than ripple carry adders. The key idea is to exploit parallelism by identifying subequations that can be calculated independent of the carryin.
4.6 Multiplication
- A simple algorithm for multiplication results in an efficient hardware implementation
by successively improving upon the obvious design
- Booth's algorithm provides an elegant approach to signed multiplication and has an efficient
implementation*
- MIPS provides instructions to perform 32-bit multiplication and places the result
in a 64-bit register pair
4.7 Division
- A simple algorithm for division results in an efficient hardware implementation
by successively improving upon the obvious design
- MIPS provides instructions to perform division that are similar in nature
to multiplication because the same hardware can be used for both operations
4.8 Floating point
- The IEEE 754 floating point representation
- Floating point operations can be performed by performing operations on the underlying representation
MIPS provides support for floating point instructions
- The need for accurate arithmetic lead to the development of the IEEE 754 standard
4.9 Real Stuff: Floating Point in the PowerPC and the 80x86 *
- The 80x86 floating point architecture is completely different from all other architectures in the world
4.10 Fallacies and Pitfalls:
- Fallacy: floating point addition is associative
- Fallacy: just as a left shift instruction can replace an integer multiply power of two, a right shift is the same as an integer division by a power of two
- Only theoretical mathematicians care about floating point accuracy (see the story on the Pentium bug)
Check whether you know all the key terms (see key terms at the end of chapter)
============================================================================
Chap 5: The Processor: Data path and control
5.1 Introduction
- We will be designing an implementation of a processor that supports the core MIPS instruction set (lw, sw, add, beq, j)
- We will rely on the use of combinational and sequential logic elements in constructing our implementations
- We will assume an edge triggered clocking methodology
5.2 Building a Datapath
- The basic components of our design include instruction and data memory, a program counter, adders, and ALU and the register file.
- We can look each instruction individually and see which functional units it uses and the path of data flow between them
5.3 A simple implementation scheme
- Using multiplexors, we can tie together all of the required datapaths for all of the instructions
- The value of the control signals is based only on the instruction being executed and thus can be implemented using combinational logic
- the cycle time of the single cycle implementation is determined by the worst case path delay for all the instructions
- A machine with more powerful instructions would result in an imbalance in the amount of time required and thus the single cycle implementation is not very practical
5.4 A multicycle implementation
- If we break each instruction into a series of steps and perform one step every cycle we can create a multicycle implementation
- In doing so, we can reuse functional units such as memory and the ALU
- Registers are needed to store values between cycles
- A finite state machine must be used for control because the values of the control signals now depend upon which step we are performing