Performance
===========

GOALS:
------
Scenario:
  1. Program 1 is compiled into 100 instructions on machine A and 150
     instructions on machine B. Is machine B faster than machine A?
     Ans: it depends!
  2. Is Intel's 400 MHz processor 4/3 times faster than its 300 MHz
     processor?

TOP DOWN (BIG PICTURE):
  - Performance is measured through "metrics" and influenced by "factors".
    One factor alone (like clock speed) does not paint the complete picture
    of performance. The choice of metrics and the relative effect of each
    factor are the key to performance.
  - More importantly, a designer should understand the effect of each
    design decision on the key factors that combine to form the metric.
    Point: Economics: demand drives supply. Technology changes must be
    directed at improving user-perceived metrics. Architecture improvements
    or performance improvements can achieve this.

BOTTOM UP (Building blocks):
  - Black box, parameters/factors (input), metrics/effects (output)
  - The black box, which is the technology, applies a function to the
    factors to yield the metrics. New technology is good if it yields a
    better function.

Concepts:
---------
- Metrics, parameters

  Eg1: Intel has released a 400 MHz processor. It should run 4/3 times
       faster than its 300 MHz counterpart, so I should rush and pay
       >> 4/3 times the price for it ....

  Eg2: Our program takes 10s to run on computer A, which has a 400 MHz
       clock. User demand = program runs in 6s. Designer discovers a new
       technology called the D-law which says that the clock rate can be
       upped, but it will cause 1.2 times more clock cycles for the
       program. What is the minimum clock rate required to get the desired
       speedup?

  TOP DOWN:
  - Time (response) is the ultimate metric from a user's point of view.
  - Throughput or utilization is the ultimate metric from a system
    manager's (provider's) point of view.
  - When we deal with Patterson/Hennessy we will talk about time. When we
    deal with Operating Systems (manager program), we will deal with
    throughput as well.
  - Note that changes which lead to better response time for a given pgm
    (workload) usually lead to better throughput, but not vice versa.

  BOTTOM UP:
  - Time = function(instructions, clock cycles per instruction,
    clock cycle time)
  - Execution time = Instrns/program * Average time per instruction
  - Execution time = I/pgm * CPI * clock cycle time.
    The formula is easy to remember if you check that the units match
    after the multiplication. Except for cycle time, all variables are
    program dependent!
  - Clock rate increases: hardware technology
  - Processor organization: lower average CPI
  - Compiler enhancements: generate a lower instruction count, or
    instructions with lower CPI.
  - Design goals: lower execution time => lower any one of the components
    independently, or lower the product!
  - Beware of results that analyze the effect of each factor alone!
  - Manufacturers can fairly easily implement an architecture with a faster
    clock cycle, but its performance could be worse!
  - Understanding architecture/design => understanding what impact
    design/architecture choices have on each component
  - Relative performance is usually reported as ratios:
    "A is n times faster than B"

- Workloads, Benchmarks:

  Eg: What does it take for the computer to speed up MY program?
      Problem: infinite number of "MY" programs. What is a representative
      "MY" program?

  TOP DOWN:
  - Benchmarks led to marketing and technical games where designers and
    marketeers optimized or touted the new design to beat specific
    benchmarks {(Perceived) performance guides design decisions !!}.
    Eg specialized compilers to give better results for a benchmark etc.
  - Performance guides purchase decisions
  - Specific performance (on MY pgm) depends upon the program being
    executed. Benchmarks are the next best thing.

  BOTTOM UP:
  - Principle = reproducibility
  - Representativeness
  - Solution: a set of programs together may form the benchmark. Then comes
    the tricky problem of summarizing the results.
  - SPEC uses normalized ratios and summarizes them using the geometric
    mean of the ratios.
  - Pitfalls of geometric mean, arithmetic mean etc.
  - Note that a ratio is not the same as a time: the units do not match.
    But ratios can show one technology/machine being better than another
    for a set of programs.

- What should a designer keep in mind about performance?
  Ans: Amdahl's law corollary => make the common case fast

  Eg: Program runs in 100s. Multiplies = 80% of program. M's law of
      multiplication allows the designer M to scale multiplication
      wonderfully. Now, I am a user and I need to make my dear program
      5 times faster. How much speedup of multiply instructions should M
      achieve to allow me to reach my overall speedup goal?

  Top Down:
  - The performance enhancement possible with a given improvement is
    limited by the amount that the improved feature is used.
  - Moore's law of improved transistor density had a profound impact
    because it affected everything. A broadly affecting technology
    (eg Internet) is something to watch for!
  - Rule of thumb: optimize the common case!

  Bottom up:
  - Exn time after improvement =
      (Exn time affected by improvement / Amount of improvement)
      + Unaffected Exn time

========================================================================================
Concepts checklist:
-------------------
- Amdahl's law: impact of performance issues on computer design
- Benchmark
- Clock cycle
- CPI = Clock cycles per instruction (or sometimes expressed per class of
  instructions)
- Factors or parameters
- Instruction/program
- Metrics or effects
  - Time (execution) is the ultimate metric
  - Throughput/Utilization used by system managers
- Workload
========================================================================================
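
Worked examples (sketch):
-------------------------
The three formulas in these notes (execution time = I/pgm * CPI * clock cycle time, the geometric mean of normalized ratios, and the Amdahl's law time equation) can be checked numerically. The Python below is a minimal sketch, not part of the original notes: the function names are hypothetical, and the two benchmark ratios passed to geometric_mean are made-up illustration data, not SPEC results. The D-law and multiplication inputs are the numbers from the examples above.

```python
import math


def execution_time(instruction_count, cpi, clock_rate_hz):
    """Execution time = I/pgm * CPI * clock cycle time (cycle time = 1 / clock rate)."""
    return instruction_count * cpi / clock_rate_hz


def min_clock_rate(old_time_s, old_clock_hz, cycle_growth, target_time_s):
    """Lowest clock rate (Hz) that meets target_time_s when a design change
    multiplies the program's clock cycle count by cycle_growth."""
    old_cycles = old_time_s * old_clock_hz   # cycles = time * clock rate
    new_cycles = old_cycles * cycle_growth
    return new_cycles / target_time_s        # rate = cycles / time


def geometric_mean(ratios):
    """Geometric mean of normalized performance ratios (needs Python 3.8+)."""
    return math.prod(ratios) ** (1.0 / len(ratios))


def required_speedup(total_time_s, affected_time_s, overall_speedup):
    """Amdahl's law solved for the 'amount of improvement' n:
        total / overall_speedup = affected / n + unaffected
    Returns None when no finite n can reach the overall goal."""
    unaffected = total_time_s - affected_time_s
    budget = total_time_s / overall_speedup - unaffected  # time left for the improved part
    if budget <= 0:
        return None
    return affected_time_s / budget


# D-law example from the notes: 10 s at 400 MHz, 1.2x the cycles, 6 s target.
print(min_clock_rate(10, 400e6, 1.2, 6) / 1e6, "MHz")

# Hypothetical ratios (illustration only, not measured data).
print(geometric_mean([2.0, 8.0]))

# Multiplication example from the notes: 100 s total, 80 s of multiplies, 5x goal.
print(required_speedup(100, 80, 5))
```

With the numbers from the notes, the D-law call gives a minimum clock rate of about 800 MHz (4.8e9 cycles in 6 s). The Amdahl call returns None: the unaffected 20 s already equals the 100 s / 5 = 20 s target, so no finite multiply speedup reaches 5x. A 4x goal, by contrast, would need a 16x multiply speedup.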