Advanced Computer Architecture - Lecture 6: Instruction set principles (Cont'd)

This lecture covers: ISA performance analysis, fallacies and pitfalls; DSP media operations; ISA performance; putting it all together; media and signal processing operations; ...

CS 704

Advanced Computer Architecture

Lecture 6

Instruction Set Principles

(ISA Performance Analysis, Fallacies and Pitfalls)

Prof Dr M Ashraf Chughtai

- Places of source and destinations

- Place of next instruction

- Instruction word length

- Variable Length

- Fixed length

- Hybrid – variable fixed

- Categories of hybrid length: 4-, 3-, 2-, 1- and 0-address formats

Recap: Lecture 5 … Cont’d

- Comparison of hybrid instruction word formats

- The minimum number of memory bytes is required by the 1-address (accumulator) format

- The maximum is required by the 4-address format

- MIPS instruction word format

- MIPS is a RISC, fixed-length, 64-bit load/store architecture

- It supports:

- 8-, 16-, 32- and 64-bit operands

- R-type, I-type and J-type instruction formats

- Arithmetic and logic operations

Media and Signal Processing Operands

- Graphics applications deal with 2D and 3D images

- The 3D data type is called a vertex

- A vertex structure has four components

- Vertex values are usually 32-bit floating-point values

- DSPs add fixed point to the data types, with the binary point just to the right of the sign bit
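
To illustrate the fixed-point format described above, here is a minimal Python sketch. It assumes a 16-bit word, so that 15 fraction bits follow the sign bit (a layout commonly called Q15). The word size and the helper names `to_q15` and `from_q15` are assumptions made for this example, not taken from the slides.

```python
Q15_FRACTION_BITS = 15            # assumption: 16-bit word = 1 sign bit + 15 fraction bits
Q15_SCALE = 1 << Q15_FRACTION_BITS
Q15_MAX = Q15_SCALE - 1           # largest encodable integer, 32767
Q15_MIN = -Q15_SCALE              # smallest encodable integer, -32768


def to_q15(x: float) -> int:
    """Encode a real number in [-1, 1) as a signed 16-bit fixed-point integer."""
    if not -1.0 <= x < 1.0:
        raise ValueError("Q15 can only represent values in [-1, 1)")
    # Scale, round to the nearest integer, and clamp so values just below 1.0
    # do not round up past the largest representable code.
    return max(Q15_MIN, min(Q15_MAX, int(round(x * Q15_SCALE))))


def from_q15(q: int) -> float:
    """Decode a Q15 integer back to a real number."""
    return q / Q15_SCALE


print(to_q15(0.5))        # 16384
print(from_q15(-16384))   # -0.5
```

Because the binary point sits immediately after the sign bit, every representable value is a fraction between -1 and just under +1.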

3D Data Type

- When a triangle is rendered, it is filled with pixels

- Pixels are usually 32 bits, consisting of four 8-bit channels

Media and Signal Processing Operations

- Data for multimedia operations is usually much narrower than the 64-bit data word of modern processors

- A 64-bit word can hold four 16-bit data values, so that the 64-bit ALU can perform four 16-bit operations (say, an add operation) in a single clock cycle

Media and Signal Processing Operations

- The hardware must prevent the carry between the four 16-bit partitions of the 64-bit ALU

- Such operations are called Single-Instruction Multiple-Data (SIMD) or vector operations
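
The idea of blocking the carry between 16-bit partitions can be sketched in software. The Python example below emulates a partitioned add on a 64-bit word by adding the low 15 bits of each lane separately and then patching in the top bit, so no carry crosses a lane boundary. The function names (`pack16`, `unpack16`, `packed_add16`) are hypothetical, and real hardware does this by cutting the carry chain rather than by masking.

```python
LANES = 4
LANE_BITS = 16
HIGH_BITS = 0x8000_8000_8000_8000   # top bit of every 16-bit lane
LOW_BITS = 0x7FFF_7FFF_7FFF_7FFF    # remaining 15 bits of every lane


def pack16(values):
    """Pack four 16-bit values into one 64-bit word (lane 0 in the low bits)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFFFF) << (LANE_BITS * i)
    return word


def unpack16(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (LANE_BITS * i)) & 0xFFFF for i in range(LANES)]


def packed_add16(a, b):
    """Add four 16-bit lanes at once; each lane wraps around independently."""
    # Adding only the low 15 bits of each lane cannot overflow into the next lane.
    low_sum = (a & LOW_BITS) + (b & LOW_BITS)
    # The top bit of each lane is a XOR b XOR carry-in, with no carry-out kept.
    return low_sum ^ ((a ^ b) & HIGH_BITS)


a = pack16([1, 0xFFFF, 0x7FFF, 2])
b = pack16([1, 1, 1, 3])
print(unpack16(packed_add16(a, b)))   # [2, 0, 32768, 5]
```

A plain 64-bit add of the same two words would let the overflow of lane 1 leak into lane 2, which is exactly the carry the partitioned ALU suppresses.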

Multimedia Operations

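- Most graphics multimedia applications use 32-bit floating-point operations

- Some computers double the peak performance of single-precision operations by allowing a single instruction to launch two 32-bit operations on operands found side by side in a double-precision register

- SIMD instructions are found in recent computers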

Summary of SIMD instructions in recent computers

(Table given in Fig 2.17, page 110)

Multimedia Operations

common across the five architectures

- All are fixed-width operations, performing multiple narrow operations on either a 64-bit or 128-bit ALU

- The narrow operations are B (byte), H (half word), W (word) and 8B (double word)

Digital Signal Processing Issues

DSP Operations

Saturating Add/Sub

- A DSP cannot ignore overflow, otherwise it may miss an event; therefore, it uses saturating arithmetic

- If the result is too large to be represented, it is set to the largest representable number, based on the sign of the result
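
The difference between saturating and ordinary wrap-around arithmetic can be seen in a short sketch. The Python example below assumes signed 16-bit operands; the width and the function names `saturating_add16` and `wrapping_add16` are assumptions chosen for illustration.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def saturating_add16(a: int, b: int) -> int:
    """Add two signed 16-bit values, clamping the result to the 16-bit range."""
    total = a + b
    return max(INT16_MIN, min(INT16_MAX, total))


def wrapping_add16(a: int, b: int) -> int:
    """Add two signed 16-bit values the way two's-complement hardware wraps."""
    total = (a + b) & 0xFFFF
    return total - 0x10000 if total >= 0x8000 else total


print(saturating_add16(30000, 10000))    # 32767
print(wrapping_add16(30000, 10000))      # -25536
print(saturating_add16(-30000, -10000))  # -32768
```

With wrap-around, a large positive sum suddenly appears as a negative number; saturation instead pins the result at the nearest representable extreme.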

DSP Operations

Result Rounding

- Just as IEEE 754 has several rounding modes, there are several ways to round the wider accumulator into a narrower one; DSPs select the appropriate mode to round the result
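
As a small illustration of narrowing a wide accumulator, the Python sketch below compares two possible modes: plain truncation and round-to-nearest (halves rounded up). The accumulator layout (16 fraction bits being dropped) and the function names are assumptions for this example; actual DSPs may offer other modes.

```python
def narrow_truncate(acc: int, shift: int) -> int:
    """Drop the low `shift` bits of the accumulator without rounding."""
    return acc >> shift


def narrow_round_nearest(acc: int, shift: int) -> int:
    """Drop the low `shift` bits, rounding to nearest (halves round up)."""
    # Adding half of the discarded range before shifting implements the rounding.
    return (acc + (1 << (shift - 1))) >> shift


acc = 0x0001_8000   # 1.5 when the low 16 bits are treated as the fraction
print(narrow_truncate(acc, 16))        # 1
print(narrow_round_nearest(acc, 16))   # 2
```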

ISA Performance

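- The interaction of compilers and high-level languages significantly affects how a program uses an ISA

- The optimizations performed by modern compilers can be classified as follows: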

Classification of Performance Optimization

- High-level optimization: often done on the source, with the output fed to the later optimization passes

- Local optimization: done within a straight-line code fragment (basic block)

- Global optimization: extends the optimization across branches

- Register allocation: associates registers with operands
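
To make the notion of a local optimization concrete, here is a minimal Python sketch of one such transformation, common subexpression elimination, applied to a single basic block. The instruction encoding (tuples of destination, operation and two operands), the `copy` pseudo-operation and the function name are all hypothetical, and the sketch assumes every name is assigned at most once within the block.

```python
def eliminate_common_subexpressions(block):
    """Replace repeated computations inside one basic block with copies.

    `block` is a list of (dest, op, left, right) tuples. Assumption: each
    name is assigned at most once, so a recorded value never goes stale.
    """
    available = {}   # (op, left, right) -> name that already holds the value
    optimized = []
    for dest, op, left, right in block:
        key = (op, left, right)
        if key in available:
            # Same operation on the same operands: reuse the earlier result.
            optimized.append((dest, "copy", available[key], None))
        else:
            available[key] = dest
            optimized.append((dest, op, left, right))
    return optimized


block = [
    ("t1", "add", "a", "b"),
    ("t2", "mul", "t1", "c"),
    ("t3", "add", "a", "b"),   # recomputes a + b
]
for instruction in eliminate_common_subexpressions(block):
    print(instruction)
```

The pass never looks beyond the straight-line fragment it is given, which is what distinguishes it from a global optimization that works across branches.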

Impact of Compiler Technology

- The interaction of the compiler and the high-level language affects how a program uses an ISA

- Here, two important questions are:

1: How are variables allocated?

2: How many registers are needed to allocate variables appropriately?

- These questions are addressed by looking at the three areas in which a high-level language allocates data

Three areas of data allocation

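1: Stack

- It is used to allocate local variables, and it grows and shrinks on procedure call and return

- Objects on the stack are mostly scalars (single variables) rather than arrays, and are addressed relative to the stack pointer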

Three areas of data allocation … Cont’d:

2: Global Data Area

- It is used to allocate statically declared objects, such as global variables and constants

- These objects are mostly arrays and other aggregate data structures

- Register allocation is relatively less effective for global variables

- Global variables may be aliased: there are multiple ways to refer to their addresses, which makes it illegal to keep them in registers

Three areas of data allocation … Cont’d:

ISA Performance … Cont’d

- Floating-point instructions operate on the floating-point registers

- Operations can be performed on single precision or double precision

- MOV.S copies a single-precision register to another of the same type

- MOV.D copies a double-precision register to another of the same type

MIPS Floating-point Operations … Cont’d

- To get greater performance for graphics routines, MIPS64 offers paired-single instructions

- These perform two 32-bit floating-point operations, one on each half of the 64-bit floating-point register

Examples: ADD.PS, SUB.PS, MUL.PS
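
The behaviour of a paired-single add can be modelled in a few lines. The Python sketch below treats a 64-bit integer as a register holding two 32-bit floats and adds the halves independently, in the spirit of ADD.PS. The helper names (`pack_ps`, `unpack_ps`, `add_ps`) and the choice of which half is called "upper" are assumptions for this illustration, not a description of the MIPS64 encoding.

```python
import struct


def pack_ps(upper: float, lower: float) -> int:
    """Place two single-precision floats side by side in one 64-bit value."""
    return struct.unpack("<Q", struct.pack("<2f", lower, upper))[0]


def unpack_ps(register: int):
    """Return the (upper, lower) single-precision halves of a 64-bit value."""
    lower, upper = struct.unpack("<2f", struct.pack("<Q", register))
    return upper, lower


def add_ps(a: int, b: int) -> int:
    """Model of a paired-single add: each half is added independently."""
    a_upper, a_lower = unpack_ps(a)
    b_upper, b_lower = unpack_ps(b)
    # Repacking rounds each sum back to single precision.
    return pack_ps(a_upper + b_upper, a_lower + b_lower)


x = pack_ps(1.5, 2.0)
y = pack_ps(0.25, 4.0)
print(unpack_ps(add_ps(x, y)))   # (1.75, 6.0)
```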

MAC/VU-Advanced

Putting it All Together

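- The earliest architectures were limited in their instruction sets by the hardware technology of that time

- In the 1960s, stack architectures became popular, viewed as being a good match for high-level languages

- In the 1970s, the main concern of architects was to reduce the software cost, which produced high-level architectures such as the VAX machine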

Putting it All Together Cont’d

- In the 1980s, a return to simpler architectures took place due to sophisticated compiler technology

- In the 1990s, several changes were introduced; these include:

Putting it All Together Cont’d

1990s Architectures

1: Address size doubles, from 32-bit to 64-bit

2: Optimization of conditional branches via conditional execution, e.g. conditional move

3: Optimization of cache performance via prefetch, which increased the role of the memory hierarchy in the performance of computers

4: Multimedia support

5: Faster floating-point instructions

6: Long Instruction Word

Concluding the Instruction set Principles

- Three pillars of computer architecture: hardware, software and instruction set

- Instruction set: interface between hardware and software

- Taxonomy of instruction sets: stack, accumulator and general-purpose register

- Types and sizes of operands:

- Types: integer, FP and character

- Sizes: half word, word, double word

- Classification of operations: arithmetic, data transfer, control and support

Concluding the Instruction set Principles … Cont’d

- Operand addressing modes: immediate, register, direct (absolute) and indirect

- Classification of indirect addressing: register, indexed, relative (i.e. with ...)

- Special addressing modes: auto-increment, auto-decrement and scaled

- Control instruction addressing modes: branch, jump and procedure call/return

Instruction encoding

- Essential elements of computer instructions: type of operands, places of source and destinations, and place of next instruction

- Instruction word length: variable, fixed and hybrid

- Hybrid length taxonomy: 4-, 3-, 2-, 1- and 0-address formats

- Comparison of hybrid instruction word formats: the minimum number of memory bytes is required by the 1-address (accumulator) format and the maximum by the 4-address format

MIPS Instruction word format

- MIPS is a RISC, fixed-length, 64-bit load/store architecture

- It supports:

- 8-, 16-, 32- and 64-bit operands

- R-type, I-type and J-type instruction formats

- Arithmetic and logic operations

- Data transfer operations

- Control flow operations

Multimedia and Digital Signal Processing Operands

- Graphics applications deal with 2D and 3D images

- DSPs add fixed point to the data types, with the binary point just to the right of the sign bit

Multimedia and Digital Signal Processing Operations

- All are fixed-width operations, performing multiple narrow operations on either a 64-bit or 128-bit ALU

- The narrow operations are B (byte), H (half word), W (word) and 8B (double word)

Multimedia and Digital Signal Processing Issues

- Result rounding

ISA Performance

- Role of compiler: the interaction of the compiler and high-level languages significantly affects how a program uses an ISA

-Allah Hafiz

and Asalm-u-Alacum

Practice Problems

Quantitative Principles [Lecture 2-3]

B and C) of instructions The clock cycles per instruction (CPI) for each type of instruction is as follows:

Solution to Practice Problem 1

Result: Sequence 2 executes fewer instructions

cycles for each sequence

CPU Clock Cycles for sequence 1 = 2x2 + 3x4 + 4x1 = 20 cycles

CPU Clock Cycles for sequence 2 = 2x3 + 3x2 + 4x4 = 28 cycles

Result: Sequence 1 is faster

CPI for sequence 1 = 20/7 = 2.86

CPI for sequence 2 = 28/6 = 4.67

Result: Sequence 2 which has fewer instructions has higher CPI, thus is slower
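
The two cycle totals in this solution follow from summing, over the instruction classes, the class CPI times the number of instructions of that class. The Python sketch below recomputes only those totals. It assumes that in each product the first factor is the class CPI (2, 3 and 4 for classes A, B and C) and the second is the instruction count; the problem statement is not reproduced here, so that reading, the per-class counts and all names in the code are assumptions.

```python
# Assumed CPI per instruction class, read from the first factor of each product.
CPI = {"A": 2, "B": 3, "C": 4}

# Assumed instruction counts per class, read from the second factor of each product.
SEQUENCE_1 = {"A": 2, "B": 4, "C": 1}
SEQUENCE_2 = {"A": 3, "B": 2, "C": 4}


def cpu_clock_cycles(counts, cpi):
    """Total cycles = sum over classes of (CPI of the class x instruction count)."""
    return sum(cpi[cls] * n for cls, n in counts.items())


print(cpu_clock_cycles(SEQUENCE_1, CPI))   # 20
print(cpu_clock_cycles(SEQUENCE_2, CPI))   # 28
```

Since both sequences run at the same clock rate, the sequence with fewer total cycles is the faster one.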

Instruction Set Principles [Lecture 4-5]

Title	Instruction Set Principles (ISA Performance Analysis, Fallacies and Pitfalls)
Instructor	Prof. Dr. M. Ashraf Chughtai
Institution	MAC/VU
Major	Advanced Computer Architecture
Type	Lecture
Year	Fall 2023
