Advanced Computer Architecture - Lecture 6: Instruction set principles (Cont'd). This lecture will cover the following: ISA performance analysis, fallacies and pitfalls; DSP media operations; ISA performance; putting it all together; media and signal processing operations;...
CS 704
Advanced Computer Architecture
Lecture 6
Instruction Set Principles
(ISA Performance Analysis, Fallacies and Pitfalls)
Prof Dr M Ashraf Chughtai
Recap: Lecture 5
- Places of source and destinations
- Place of next instruction
- Instruction word length
- Variable Length
- Fixed length
- Hybrid – variable fixed
- Categories of Hybrid length
4, 3, 2, 1 and 0 address format
Recap: Lecture 5 … Cont’d
- Comparison of hybrid instruction word formats
- The minimum number of memory bytes is required in the case of the 1-address (accumulator) format
- The maximum is required for the 4-address format
- MIPS instruction word format
- MIPS is a RISC, 64-bit load/store architecture with fixed-length (32-bit) instructions
- It supports:
- 8-, 16-, 32- and 64-bit operands
- R-type, I-type and J-type instruction formats
- Arithmetic and logic operations
Media and Signal Processing Operands
Graphic applications deal with 2D and 3D images
The 3D data type is called a vertex
A vertex structure has four components (x, y, z and w)
Vertex values are usually 32-bit floating-point values
DSP adds fixed point to the data types, with the binary point just to the right of the sign bit
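A minimal sketch of this fixed-point format, assuming a 16-bit word (often called Q15, a name the slide does not use): the top bit is the sign and the binary point sits immediately to its right, so the remaining 15 bits are fraction bits and the representable range is [-1, 1). The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed 16-bit fixed-point format: 1 sign bit, binary point right after
 * it, 15 fraction bits. Represents values in [-1.0, 1.0). */
typedef int16_t q15_t;

/* Convert a double to fixed point by scaling by 2^15 and truncating toward
 * zero. The caller must keep x in [-1.0, 1.0); no saturation is done here. */
static q15_t q15_from_double(double x) {
    return (q15_t)(x * 32768.0);
}

/* Convert fixed point back to double by dividing by 2^15. */
static double q15_to_double(q15_t q) {
    return q / 32768.0;
}
```

For example, 0.5 is stored as 16384 (0x4000) and -1.0 as -32768 (0x8000). The value +1.0 itself cannot be represented, which is the usual price of putting the binary point right after the sign bit.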
3D Data Type
Images are filled with pixels, each consisting of four 8-bit channels (red, green, blue and alpha)
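To make the pixel layout concrete, here is a small sketch that packs four 8-bit channels into one 32-bit word. The channel order (R in the most significant byte, A in the least) and the function names are assumptions for illustration. Real pixel formats differ in their ordering.

```c
#include <stdint.h>

/* Pack four 8-bit channels into one 32-bit pixel.
 * Assumed order: R in bits 31..24, G in 23..16, B in 15..8, A in 7..0. */
static uint32_t pack_rgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a) {
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) |
           ((uint32_t)b << 8)  |  (uint32_t)a;
}

/* Extract channel i (0 = R, 1 = G, 2 = B, 3 = A) from a packed pixel. */
static uint8_t rgba_channel(uint32_t pixel, int i) {
    return (uint8_t)(pixel >> (24 - 8 * i));
}
```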
Media and Signal Processing Operations
Media data are usually much narrower than the 64-bit data word of modern processors
A 64-bit register can therefore hold four 16-bit data values, so that the 64-bit ALU can perform four 16-bit operations (say, additions) in a single clock cycle
Media and Signal Processing Operations
The ALU must prevent the carry from propagating between the four 16-bit partitions of the 64-bit ALU
Such operations are called Single-Instruction Multiple-Data (SIMD) or vector operations
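The partitioned add can be imitated in software with a well-known trick often called "SIMD within a register". This technique is swapped in here for illustration and is not the hardware design itself. It works by clearing the top bit of each 16-bit lane before a single 64-bit add, which guarantees that no carry crosses a partition boundary. The top bits are then restored with XOR. The function and macro names are hypothetical.

```c
#include <stdint.h>

/* Mask selecting the most significant bit of each of the four 16-bit lanes. */
#define LANE_MSB 0x8000800080008000ULL

/* Add four packed 16-bit lanes (each modulo 2^16) with one 64-bit add.
 * With each lane's top bit cleared, a lane's 15-bit sum can carry into its
 * own bit 15 but never into the next lane. XOR then folds in the two
 * operands' original top bits, giving the correct bit 15 for every lane. */
static uint64_t add4x16(uint64_t a, uint64_t b) {
    uint64_t low = (a & ~LANE_MSB) + (b & ~LANE_MSB); /* carries stay in-lane */
    return low ^ ((a ^ b) & LANE_MSB);                /* restore each lane's MSB */
}
```

For example, adding 0x0001 to a lane holding 0xFFFF yields 0x0000 in that lane, and the neighbouring lane is left unchanged. This is exactly the carry blocking the partitioned hardware ALU provides.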
Multimedia Operations
Some graphics applications use 32-bit floating-point operations
Paired-single operations allow a single instruction to launch two 32-bit operations on operands found side by side in a double-precision register
Such SIMD instructions are found in recent computers
Summary of SIMD instructions in recent computers
See the table given in Fig. 2.17 (page 110)
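Multimedia Operations
The operations common across the five architectures are all fixed-width, performing multiple narrow operations on either a 64-bit or 128-bit ALU
The narrow operand sizes are denoted:
B: byte
H: half word
W: word
8B: eight bytes (a double word) in a single instruction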
Digital Signal Processing Issues
DSP Operations
Saturating Add/Sub
A DSP cannot take an exception on overflow, otherwise it may miss an event; therefore, it uses saturating arithmetic.
If the result is too large to be represented, it is set to the largest representable number, based on the sign of the result.
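The following is a sketch of 16-bit saturating add and subtract, assuming two's-complement 16-bit operands. The 32-bit intermediate cannot overflow, so the clamp only has to choose between the largest and the most negative representable value according to the sign of the result. The names are hypothetical.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate into the 16-bit range instead of wrapping. */
static int16_t clamp16(int32_t v) {
    if (v > INT16_MAX) return INT16_MAX;   /* positive overflow -> largest value */
    if (v < INT16_MIN) return INT16_MIN;   /* negative overflow -> most negative */
    return (int16_t)v;
}

/* Saturating add: compute exactly in 32 bits, then clamp. */
static int16_t sat_add16(int16_t a, int16_t b) {
    return clamp16((int32_t)a + (int32_t)b);
}

/* Saturating subtract: same idea for a - b. */
static int16_t sat_sub16(int16_t a, int16_t b) {
    return clamp16((int32_t)a - (int32_t)b);
}
```

For example, 30000 + 10000 saturates to 32767 instead of wrapping around to a negative value. No overflow exception is raised.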
DSP Operations
Result Rounding
IEEE 754 defines several rounding modes for rounding a wide accumulator result into a narrower one; DSPs select the appropriate mode to round the result
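The slide refers to IEEE 754 rounding modes. As a fixed-point analogue, the sketch below multiplies two Q15 values into a wider Q30 accumulator and then narrows the result back to Q15 in two ways: by truncation or by rounding to nearest. This is an illustration only, not a documented DSP behavior, and the Q15/Q30 formats and all names are assumptions.

```c
#include <stdint.h>

/* Multiply two Q15 values; the exact product is a Q30 value in 32 bits. */
static int32_t q15_mul_wide(int16_t a, int16_t b) {
    return (int32_t)a * (int32_t)b;
}

/* Narrow Q30 -> Q15 by truncation: discard the 15 low-order bits.
 * Assumes >> on a negative int32_t is an arithmetic shift (true on
 * mainstream compilers; implementation-defined before C23). */
static int32_t narrow_truncate(int32_t acc) {
    return acc >> 15;
}

/* Narrow Q30 -> Q15 rounding to nearest (ties toward +infinity):
 * add half the weight of the new least significant bit, then shift. */
static int32_t narrow_round(int32_t acc) {
    return (acc + (1 << 14)) >> 15;
}
```

The narrowed value is returned as `int32_t` because (-1) x (-1) = +1 does not fit in Q15. A real DSP would also saturate that case.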
ISA Performance
The interaction of the compiler and the high-level language affects how a program uses an ISA
The optimizations performed by compilers can be classified as follows:
Classification of Performance Optimization
- High-level optimization: often done on the source, with the output fed to later optimization passes
- Local optimization: done within a straight-line code fragment (basic block)
- Global optimization: extends the optimization across branches
- Register allocation: associates registers with operands
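Local optimization can be illustrated at the source level with common subexpression elimination inside a single basic block. The two hypothetical functions below compute the same result, but the second reuses the product `a * b` instead of computing it twice. This reuse is safe because, within a straight-line block, no branch or store can change `a` or `b` between the two uses.

```c
/* Hypothetical basic block before local optimization: a * b is computed twice. */
static int before_cse(int a, int b, int c) {
    int x = a * b + c;
    int y = a * b - c;
    return x * y;
}

/* The same block after common subexpression elimination: the shared
 * product is computed once, kept in a temporary, and reused. */
static int after_cse(int a, int b, int c) {
    int t = a * b;
    int x = t + c;
    int y = t - c;
    return x * y;
}
```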
Impact of Compiler Technology
- The interaction of the compiler and the high-level language affects how a program uses an ISA
- Here, two important questions are:
1: How are variables allocated?
2: How many registers are needed to allocate variables appropriately?
- These questions are addressed by considering the three areas in which high-level languages allocate data
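Three areas of data allocation
1: Stack
- It is used to allocate local variables; the stack grows on procedure call and shrinks on return
- Objects on the stack are primarily scalars (single variables) rather than arrays and are addressed relative to the stack pointer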
Trang 19Three areas of data allocation … Cont’d:
2: Global Data Area
- It is used to allocate statically declared objects
such as global variables and constants
- These objects are mostly arrays and other
aggregate data structures
- Register allocation is relatively less effective
for global variables
- Global variables are often aliased: there are multiple ways to address them, which makes it illegal to keep them in registers
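The aliasing problem can be shown with a small C sketch (all names hypothetical). When a function writes through a pointer that might point at a global variable, the compiler cannot simply keep that global in a register across the store, because the store may change it.

```c
/* Hypothetical global variable, visible to every function in the file. */
int global_value = 0;

/* Store to the global, store through p, then read the global back.
 * Because p may point to global_value (an alias), the compiler cannot keep
 * global_value in a register across the store *p = 5; it must reload it. */
static int read_after_store(int *p) {
    global_value = 1;
    *p = 5;               /* changes global_value if p == &global_value */
    return global_value;  /* 5 if p aliases the global, otherwise 1 */
}
```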
Three areas of data allocation … Cont’d:
ISA Performance … Cont’d
MIPS floating-point instructions manipulate the floating-point registers and indicate whether the operation is to be performed in single or double precision
MOV.S copies a single-precision register to another register of the same type
MOV.D copies a double-precision register to another register of the same type
MIPS Floating-point Operations … Cont’d
To improve the performance of graphics routines, MIPS64 offers paired-single instructions, which perform two 32-bit floating-point operations, one on each half of the 64-bit floating-point register
Examples:
ADD.PS SUB.PS MUL.PS
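As an illustration only, the behavior of ADD.PS and MUL.PS can be modeled in C by viewing a 64-bit floating-point register as a pair of single-precision values. The struct and function names are hypothetical. They model the semantics of the paired operations, not the MIPS64 encoding or exact register layout.

```c
/* A 64-bit floating-point register viewed as two single-precision halves.
 * This struct is an illustrative model, not a MIPS definition. */
typedef struct {
    float lo;  /* lower 32-bit half */
    float hi;  /* upper 32-bit half */
} paired_single;

/* Model of ADD.PS: one instruction, two independent 32-bit additions. */
static paired_single add_ps(paired_single a, paired_single b) {
    paired_single r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

/* Model of MUL.PS: one instruction, two independent 32-bit multiplications. */
static paired_single mul_ps(paired_single a, paired_single b) {
    paired_single r = { a.lo * b.lo, a.hi * b.hi };
    return r;
}
```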
MAC/VU-Advanced
Putting it All Together
Early designs were constrained in their instruction sets by the hardware technology of that time
Stack architectures then became popular, viewed as being a good match for high-level languages
Later, the main concern of architects was to reduce software cost, which produced high-level-language architectures such as the VAX
Putting it All Together Cont’d
A return to simpler architectures took place due to more sophisticated compiler technology
New architectural features were then introduced; these include:
Putting it All Together Cont’d
1990s Architectures
1: Address size doubles: 32-bit to 64-bit
2: Optimization of conditional branches via conditional execution, e.g., conditional move
3: Optimization of cache performance via prefetch, which increased the role of the memory hierarchy in the performance of computers
4: Multimedia support
5: Faster floating-point instructions
6: Long instruction words
Concluding the Instruction set Principles
Three pillars of Computer Architecture
Hardware, Software and Instruction Set
Instruction Set: interface between hardware and software
Taxonomy of Instruction Set:
Stack, Accumulator and General Purpose Register
Types and Size of Operands:
Types: Integer, FP and Character
Size: Half word, word, double word
Classification of operations: Arithmetic, data transfer, control and support
Concluding the Instruction set Principles … Cont’d
Operand Addressing Modes
Immediate, register, direct (absolute) and
Indirect
Classification of Indirect Addressing
Register, indexed, relative (i.e with
Special Addressing Modes
Auto-increment, auto-decrement and scaled
Control Instruction Addressing modes
Branch, jump and procedure call/return
Concluding the Instruction set Principles … Cont’d
Instruction encoding
- Essential elements of computer instructions:
type of operands, places of source and destinations and place of next instruction
- Instruction word length
Variable, fixed length and hybrid
- Hybrid length taxonomy
4, 3, 2, 1 and 0 address format
- Comparison of hybrid instruction word format
The minimum number of memory bytes is required in the case of the 1-address (accumulator) format, and the maximum in the case of the 4-address format
Concluding the Instruction set Principles … Cont’d
MIPS Instruction word format
- MIPS is a RISC, 64-bit load/store architecture with fixed-length (32-bit) instructions
- It supports:
- 8-, 16-, 32- and 64-bit operands
- R-type, I-type and J-type instruction formats
- Arithmetic and logic operations
- Data transfer operations
- Control flow operations
Concluding the Instruction set Principles … Cont’d
Multimedia and Digital Signal Processing Operands
- Graphic applications deal with 2D and 3D images
- DSP adds fixed point to the data types – binary point
just to the right of the sign-bit
Multimedia and Digital Signal Processing operations
- All are fixed-width operations, performing multiple narrow operations on either a 64-bit or 128-bit ALU
- The narrow operand sizes: B (byte), H (half word), W (word) and 8B (eight bytes, i.e. a double word)
Multimedia and Digital Signal Processing issues
Result Rounding
Concluding the Instruction set Principles … Cont’d
ISA Performance
Role of Compiler: The interaction of the compiler and high-level languages significantly affects how a program uses an ISA
Allah Hafiz (goodbye) and Assalam-u-Alaikum (peace be upon you)
Practice Problems
Quantitative Principles [Lecture 2-3]
… B and C) of instructions. The clock cycles per instruction (CPI) for each type of instruction are as follows:
Solution to Practice Problem 1
Result: Sequence 2 executes fewer instructions
Compute the CPU clock cycles for each sequence:
CPU Clock Cycles for sequence 1 = 2x2 + 3x4 + 4x1 = 20 cycles
CPU Clock Cycles for sequence 2 = 2x3 + 3x2 + 4x4 = 28 cycles
Result: Sequence 1 is faster
CPI for sequence 1 = 20/7 ≈ 2.86
CPI for sequence 2 = 28/6 ≈ 4.67
Result: Sequence 2, which executes fewer instructions, has a higher CPI and more total clock cycles, and is therefore slower
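The cycle and CPI arithmetic can be checked with a short sketch. Because the original CPI table is not reproduced here, the sketch assumes that the shared first factors 2, 3 and 4 are the CPIs of classes A, B and C, and that the second factors are the instruction counts. This reading matches the sequence-1 line: 2 + 4 + 1 = 7 instructions and 20 cycles. The function names are hypothetical.

```c
/* CPU clock cycles = sum over instruction classes of CPI_i * count_i. */
static int cpu_cycles(const int cpi[], const int count[], int n) {
    int cycles = 0;
    for (int i = 0; i < n; i++)
        cycles += cpi[i] * count[i];
    return cycles;
}

/* Average CPI = total clock cycles / total instruction count.
 * The caller must pass at least one instruction (total > 0). */
static double average_cpi(const int cpi[], const int count[], int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += count[i];
    return (double)cpu_cycles(cpi, count, n) / total;
}
```

With `cpi = {2, 3, 4}` and sequence-1 counts `{2, 4, 1}`, this gives 20 cycles and an average CPI of 20/7 ≈ 2.86.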
Practice Problems
Instruction Set Principles [Lecture 4-5]