Principles of Computer Architecture

Chapter Contents
10.1 Quantitative Analyses of Program Execution
10.2 From CISC to RISC
10.3 Pipelining the Datapath
10.4 Overlapping Register Windows
10.5 Multiple Instruction Issue (Superscalar) Machines – The PowerPC
10.6 Case Study: The PowerPC™ 601 as a Superscalar Architecture
10.7 VLIW Machines
10.8 Case Study: The Intel IA-64 (Merced) Architecture
10.9 Parallel Architecture
10.10 Case Study: Parallel Processing in the Sega Genesis

Instruction Frequency
• Frequency of occurrence of instruction types for a variety of languages. The percentages do not sum to 100 due to roundoff. (Adapted from Knuth, D. E., An Empirical Study of FORTRAN Programs, Software: Practice and Experience, 1, 105-133, 1971.)

Complexity of Assignments
• Percentages showing complexity of assignments and procedure calls. (Adapted from Tanenbaum, A., Structured Computer Organization, 4/e, Prentice Hall, Upper Saddle River, New Jersey, 1999.)

Speedup and Efficiency
• Speedup S is the ratio of the time needed to execute a program without an enhancement to the time required with an enhancement.
• Time T is computed as the instruction count IC times the number of cycles per instruction CPI times the cycle time t.
• Substituting T into the speedup percentage calculation above yields:

Example
• Estimate the speedup obtained by replacing a CPU having an average CPI of 5 with another CPU having an average CPI of 3.5, with the clock period increased from 100 ns to 120 ns.
• The previous equation becomes:

Four-Stage Instruction Pipeline

Pipeline Behavior
• Pipeline behavior during a memory reference and during a branch.

Filling the Load Delay Slot
• SPARC code, (a) with a nop inserted, and (b) with srl migrated to nop position.

Call-Return Behavior
• Call-return behavior as a function of nesting depth and time. (Adapted from Stallings, W., Computer Organization and Architecture: Designing for Performance, 4/e, Prentice Hall, Upper Saddle River, 1996.)

SPARC Registers
• User view of RISC I registers.

Overlapping Register Windows

Example: Compiled C Program
• Source code for C program to be compiled with gcc.

gcc Generated SPARC Code

gcc Generated SPARC Code (cont'd)

Effect of Compiler Optimization
• SPARC code generated with the -O optimization flag:

The PowerPC 601 Architecture

128-Bit IA-64 Instruction Word

Parallel Speedup and Amdahl's Law
• In the context of parallel processing, speedup can be computed:
• Amdahl's law, for p processors and a fraction f of unparallelizable code:
• For example, if f = 10% of the operations must be performed sequentially, then speedup can be no greater than 10 regardless of how many processors are used:

Efficiency and Throughput
• Efficiency is the ratio of speedup to the number of processors used. For a speedup of 5.3 with 10 processors, the efficiency is:
• Throughput is a measure of how much computation is achieved over time, and is of special concern for I/O bound and pipelined applications.
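The quantitative relations on these slides can be checked numerically. The following is a minimal Python sketch, not taken from the slides. All function names are hypothetical. The slide equations are not reproduced in the text, so two forms are assumptions: speedup as a percentage is taken to be (T_wo - T_w) / T_w × 100, and Amdahl's law is written in its standard form S = 1 / (f + (1 - f) / p).

```python
def exec_time(ic, cpi, cycle_time):
    """T = IC x CPI x t, as defined on the Speedup and Efficiency slide."""
    return ic * cpi * cycle_time


def speedup_ratio(t_without, t_with):
    """Speedup as the ratio of time without the enhancement to time with it."""
    return t_without / t_with


def speedup_percent(t_without, t_with):
    """Speedup expressed as a percentage improvement (assumed form)."""
    return (t_without - t_with) / t_with * 100.0


def amdahl_speedup(f, p):
    """Standard Amdahl's law: f is the sequential fraction, p the processor count."""
    return 1.0 / (f + (1.0 - f) / p)


def efficiency(speedup, p):
    """Efficiency is speedup divided by the number of processors used."""
    return speedup / p


if __name__ == "__main__":
    # CPI example: the instruction count is assumed unchanged, so it cancels.
    ic = 1
    t_old = exec_time(ic, 5.0, 100.0)   # CPI 5, 100 ns clock period
    t_new = exec_time(ic, 3.5, 120.0)   # CPI 3.5, 120 ns clock period
    print(f"speedup ratio:   {speedup_ratio(t_old, t_new):.3f}")
    print(f"speedup percent: {speedup_percent(t_old, t_new):.1f}%")

    # Amdahl's law with f = 10% sequential code.
    for p in (10, 100, 1_000_000):
        print(f"p = {p:>7}: speedup = {amdahl_speedup(0.10, p):.3f}")

    # Efficiency for the slide's figures: speedup 5.3 on 10 processors.
    print(f"efficiency: {efficiency(5.3, 10):.2f}")
```

Under these assumptions the CPI example works out to roughly a 19% improvement, and with f = 0.10 the Amdahl speedup for 10 processors is about 5.3, which matches the figure used in the efficiency bullet and gives an efficiency of 0.53. As p grows the speedup approaches, but never reaches, 10.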
For the case of a four-stage pipeline that remains filled, in which each pipeline stage completes its task in 10 ns, the average time to complete an operation is 10 ns even though it takes 40 ns to execute any one operation. The overall throughput for this situation is then one operation per 10 ns, or 100 million operations per second.

Flynn Taxonomy
• Classification of architectures according to the Flynn taxonomy: (a) SISD; (b) SIMD; (c) MIMD; (d) MISD.

Network Topologies
• Network topologies: (a) crossbar; (b) bus; (c) ring; (d) mesh; (e) star; (f) tree; (g) perfect shuffle; (h) hypercube.

Crossbar
• Internal organization of a crossbar.

Crosspoint Settings
• (a) Crosspoint settings for connections 0 → 3 and 3 → 0; (b) adjusted settings to accommodate connection 1 → 1.

Three-Stage Clos Network

12-Channel Three-Stage Clos Network with n = p = 6

12-Channel Three-Stage Clos Network with n = p = 2

12-Channel Three-Stage Clos Network with n = p = 4

12-Channel Three-Stage Clos Network with n = p = 3

C function computes (x² + y²) × y²

Dependency Graph
• (a) Control sequence for C program; (b) dependency graph for C program.

Matrix Multiplication
• (a) Problem setup for Ax = b; (b) equations for computing the b_i.

Matrix Multiplication Dependency Graph

The Connection Machine CM-1
• Block diagram of the CM-1. (Adapted from Hillis, W. D., The Connection Machine, The MIT Press, 1985.)

CM-1 Router Network
• A four-space hypercube for the router network.
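The structure of a four-space hypercube can be made concrete with a short Python sketch. It assumes the usual convention that each of the 16 nodes carries a 4-bit address and is linked to the nodes whose addresses differ in exactly one bit. The function names are hypothetical, and the dimension-order routing shown is a common textbook technique used here for illustration, not a description of the CM-1 router's actual algorithm.

```python
def neighbors(node, dims=4):
    """Return the nodes adjacent to `node`: one per dimension, each
    obtained by flipping a single address bit."""
    return [node ^ (1 << d) for d in range(dims)]


def route(src, dst, dims=4):
    """Dimension-order routing (illustrative): correct the differing
    address bits one at a time, lowest dimension first."""
    path = [src]
    current = src
    for d in range(dims):
        bit = 1 << d
        if (current ^ dst) & bit:   # addresses differ in dimension d
            current ^= bit          # hop across that dimension
            path.append(current)
    return path


if __name__ == "__main__":
    print("neighbors of 0000:", [format(n, "04b") for n in neighbors(0b0000)])
    print("route 0000 -> 1011:", [format(n, "04b") for n in route(0b0000, 0b1011)])
```

The number of hops on such a route equals the number of bit positions in which the source and destination addresses differ, so no two nodes in a four-space hypercube are more than four hops apart.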
CM-1 Processing Element

The Connection Machine CM-5

Partitions on the CM-5

Fat Tree

Parallel Processing in Sega Genesis
• External view of the Sega Genesis home video game system.

Sega Genesis Architecture