Computer architecture Part IV Data Path and Control


Part IV

Data Path and Control


About This Presentation

This presentation is intended to support the use of the textbook Computer Architecture: From Microprocessors to Supercomputers, Oxford University Press, 2005, ISBN 0-19-515455-X. It is updated regularly by the author as part of his teaching of the upper-division course ECE 154, Introduction to Computer Architecture, at the University of California, Santa Barbara. Instructors can use these slides freely in classroom teaching and for other educational purposes. Any other use is strictly prohibited. © Behrooz Parhami

Edition: First. Released July 2003; revised July 2004, July 2005, Mar 2006, Feb 2007.

A Few Words About Where We Are Headed

Performance = 1 / Execution time simplified to 1 / CPU execution time

CPU execution time = Instructions × CPI / (Clock rate)

Performance = Clock rate / ( Instructions × CPI )
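As a rough illustration of how these three quantities trade off, the relations above can be evaluated directly. This is only a sketch: the function names, the workload of 1 million instructions, and the two sample design points (CPI = 1 at 125 MHz, CPI = 4 at 500 MHz) are assumptions chosen for illustration.

```python
def cpu_execution_time(instructions, cpi, clock_rate_hz):
    """CPU execution time = Instructions x CPI / Clock rate (in seconds)."""
    return instructions * cpi / clock_rate_hz


def performance(instructions, cpi, clock_rate_hz):
    """Performance = Clock rate / (Instructions x CPI)."""
    return clock_rate_hz / (instructions * cpi)


# Hypothetical workload and design points (assumptions, not data from the slides).
n = 1_000_000
t_low_cpi = cpu_execution_time(n, cpi=1, clock_rate_hz=125e6)   # slow clock, CPI = 1
t_high_clock = cpu_execution_time(n, cpi=4, clock_rate_hz=500e6)  # fast clock, CPI = 4

print(t_low_cpi, t_high_clock)
print(performance(n, 1, 125e6), performance(n, 4, 500e6))
```

With these assumed numbers the two design points come out equal, which is the tension the slide describes: a higher clock rate helps only if CPI does not grow by the same factor.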

Define an instruction set; make it simple enough to require a small number of cycles and allow a high clock rate, but not so simple that we need many instructions, even for very simple tasks (Chap 5-8)

Design hardware for CPI = 1; seek improvements with CPI > 1 (Chap 13-14)

Design ALU for arithmetic & logic ops (Chap 9-12)

Try to achieve CPI = 1 with a clock that is as high as that for CPI > 1

IV Data Path and Control

Topics in This Part

Chapter 13 Instruction Execution Steps

Chapter 14 Control Unit Synthesis

Chapter 15 Pipelined Data Paths

Chapter 16 Pipeline Performance Limits

Design a simple computer (MicroMIPS) to learn about:

• Data path – part of the CPU where data signals flow

• Control unit – guides data signals through data path

• Pipelining – a way of achieving greater performance


13 Instruction Execution Steps

A simple computer executes instructions one at a time

• Fetches an instruction from the loc pointed to by PC

• Interprets and executes the instruction, then repeats

Topics in This Chapter

13.1 A Small Set of Instructions
13.2 The Instruction Execution Unit
13.3 A Single-Cycle Data Path
13.4 Branching and Jumping
13.5 Deriving the Control Signals
13.6 Performance of the Single-Cycle Design

13.1 A Small Set of Instructions

Fig 13.1 MicroMIPS instruction formats and naming of the various fields.

[Figure 13.1: a 32-bit instruction word shown in the R, I, and J formats; the I format carries a 16-bit operand / offset field.]

Seven R-format ALU instructions (add, sub, slt, and, or, xor, nor)

Six I-format ALU instructions (lui, addi, slti, andi, ori, xori)

Two I-format memory access instructions (lw, sw)

Three I-format conditional branch instructions (bltz, beq, bne)

Four unconditional jump instructions (j, jr, jal, syscall)

We will refer to this diagram later
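The field naming can be made concrete with a small decoder. This is a sketch that assumes MicroMIPS uses the standard MIPS bit positions (op in bits 31-26, rs, rt, rd, and sh as 5-bit fields, fn in bits 5-0, imm in bits 15-0, jta in bits 25-0); the figure itself is not reproduced in this text, and the function name `decode_fields` is hypothetical.

```python
def decode_fields(instr):
    """Split a 32-bit instruction word into every field of the R, I, and J formats.

    Bit positions follow the standard MIPS layout (an assumption here).
    """
    return {
        "op": (instr >> 26) & 0x3F,   # opcode, 6 bits
        "rs": (instr >> 21) & 0x1F,   # source register 1, 5 bits
        "rt": (instr >> 16) & 0x1F,   # source register 2 or destination, 5 bits
        "rd": (instr >> 11) & 0x1F,   # destination register (R format), 5 bits
        "sh": (instr >> 6) & 0x1F,    # shift amount (R format), 5 bits
        "fn": instr & 0x3F,           # opcode extension (R format), 6 bits
        "imm": instr & 0xFFFF,        # operand / offset (I format), 16 bits
        "jta": instr & 0x3FFFFFF,     # jump target address (J format), 26 bits
    }


# Encode "add rd=8, rs=16, rt=17" by hand (op = 0, fn = 32), then decode it.
word = (0 << 26) | (16 << 21) | (17 << 16) | (8 << 11) | (0 << 6) | 32
fields = decode_fields(word)
print(fields["op"], fields["rs"], fields["rt"], fields["rd"], fields["fn"])
```

All fields are extracted unconditionally, which mirrors hardware: the wires for every field exist for every instruction, and the control unit decides which ones matter.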

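Table 13.1

Class             Instruction               Usage            op   fn
Copy              Load upper immediate      lui rt,imm       15
Arithmetic        Add                       add rd,rs,rt      0   32
                  Subtract                  sub rd,rs,rt      0   34
                  Set less than             slt rd,rs,rt      0   42
                  Add immediate             addi rt,rs,imm    8
                  Set less than immediate   slti rt,rs,imm   10
Logic             AND                       and rd,rs,rt      0   36
                  OR                        or rd,rs,rt       0   37
                  XOR                       xor rd,rs,rt      0   38
                  NOR                       nor rd,rs,rt      0   39
                  AND immediate             andi rt,rs,imm   12
                  OR immediate              ori rt,rs,imm    13
                  XOR immediate             xori rt,rs,imm   14
Memory access     Load word                 lw rt,imm(rs)    35
                  Store word                sw rt,imm(rs)    43
Control transfer  Jump                      j L               2
                  Jump register             jr rs             0    8
                  Branch less than 0        bltz rs,L         1
                  Branch equal              beq rs,rt,L       4
                  Branch not equal          bne rs,rt,L       5
                  Jump and link             jal L             3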

13.2 The Instruction Execution Unit

Fig 13.2 Abstract view of the instruction execution unit for MicroMIPS

For naming of instruction fields, see Fig 13.1.

[Figure 13.2 blocks: next-address logic, instruction cache, register file, ALU, data cache, and control. Labels that survive from the figure: "12 A/L, lui, lw, sw"; "j, jal"; "syscall"; "22 instructions".]

13.3 A Single-Cycle Data Path

Fig 13.3 Key elements of the single-cycle MicroMIPS data path

[Figure 13.3 blocks: next-address logic, instruction cache, register file, and data cache; surviving labels include a 16-bit field, "Register input", "Data out", and "Func".]

An ALU for MicroMIPS

Fig 10.19 A multifunction ALU with 8 control signals (2 for function class,

[Figure 10.19 labels: Ovfl and Zero outputs; Func control input; a 1-bit select (0 or 1); logic function codes AND = 00, OR = 01, XOR = 10, NOR = 11.]
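The 2-bit logic function code can be illustrated with a small model of the logic unit alone. This is a minimal sketch, assuming 32-bit operands; the name `logic_unit` is hypothetical, and the rest of the ALU (adder, shifter, function-class selection) is left out.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits


def logic_unit(x, y, logic_fn):
    """Select one of four bitwise operations with the 2-bit code from the figure."""
    if logic_fn == 0b00:
        return x & y                 # AND
    if logic_fn == 0b01:
        return x | y                 # OR
    if logic_fn == 0b10:
        return x ^ y                 # XOR
    if logic_fn == 0b11:
        return ~(x | y) & MASK32     # NOR, masked back to 32 bits
    raise ValueError("logic_fn must be a 2-bit value (0..3)")


for code in range(4):
    print(format(code, "02b"), format(logic_unit(0b1100, 0b1010, code), "032b"))
```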


13.4 Branching and Jumping

Fig 13.4 Next-address logic for MicroMIPS (see top part of Fig 13.3)

[Figure 13.4 labels: BrType, BrTrue, IncrPC, NextPC; buses carrying the 30 MSBs of 32-bit values (bits 31:2), the 26-bit jta field, the 16-bit immediate, and the 4 MSBs of the PC.]

(PC)31:28 | jta when the instruction is j or jal
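The next-address choices can be sketched in a few lines. This is a simplified model working in byte addresses rather than the 30-bit word addresses of the figure. It assumes that a taken branch goes to the incremented PC plus 4 × the sign-extended immediate, and that jr loads the PC from a register value; the `kind` strings and the function name are hypothetical, and syscall is omitted.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Interpret a 16-bit field as a signed two's-complement value."""
    return imm - 0x10000 if imm & 0x8000 else imm


def next_pc(pc, kind, imm=0, jta=0, rs_value=0, br_true=False):
    """Return the next PC for one instruction (sketch under the stated assumptions)."""
    incr_pc = (pc + 4) & MASK32
    if kind == "jump":
        # j or jal: (PC)31:28 | jta, with two 0 bits appended for byte addressing
        return (pc & 0xF0000000) | ((jta & 0x3FFFFFF) << 2)
    if kind == "jr":
        return rs_value & MASK32
    if kind == "branch" and br_true:
        return (incr_pc + 4 * sign_extend16(imm)) & MASK32
    return incr_pc  # sequential execution, including untaken branches


print(hex(next_pc(0x00400000, "other")))
print(hex(next_pc(0x10000008, "jump", jta=0x3)))
print(hex(next_pc(0x00400010, "branch", imm=0xFFFF, br_true=True)))
```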


13.5 Deriving the Control Signals

Table 13.2 Control signals for the single-cycle MicroMIPS implementation.

[Table 13.2 rows (partial): OR, XOR, NOR, AND immediate, OR immediate, XOR immediate, Load word, Store word, Jump, Jump register, Branch on less than 0, Branch on equal, Branch on not equal, Jump and link.]

Control Signals in the Single-Cycle Data Path

Fig 13.3 Key elements of the single-cycle MicroMIPS data path

[Instruction decoder outputs (partial): sltiInst, andiInst, oriInst, xoriInst, luiInst]

Control Signal Generation

Auxiliary signals identifying instruction classes

arithInst = addInst ∨ subInst ∨ sltInst ∨ addiInst ∨ sltiInst

logicInst = andInst ∨ orInst ∨ xorInst ∨ norInst ∨ andiInst ∨ oriInst ∨ xoriInst

immInst = luiInst ∨ addiInst ∨ sltiInst ∨ andiInst ∨ oriInst ∨ xoriInst

Example logic expressions for control signals

RegWrite = luiInst ∨ arithInst ∨ logicInst ∨ lwInst ∨ jalInst

ALUSrc = immInst ∨ lwInst ∨ swInst

Add′Sub = subInst ∨ sltInst ∨ sltiInst

DataRead = lwInst

PCSrc0 = jInst ∨ jalInst ∨ syscallInst
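The example expressions above translate almost line for line into code. This is a sketch, not the full control unit: it covers only the five signals listed, takes the decoded instruction as a mnemonic string (a hypothetical interface standing in for the one-hot decoder outputs), and writes Add′Sub as `AddSub`.

```python
def control_signals(inst):
    """Evaluate the example single-cycle control expressions for one mnemonic."""
    def one_of(*names):
        # Stands in for OR-ing the corresponding one-hot decoder outputs.
        return inst in names

    # Auxiliary signals identifying instruction classes
    arith_inst = one_of("add", "sub", "slt", "addi", "slti")
    logic_inst = one_of("and", "or", "xor", "nor", "andi", "ori", "xori")
    imm_inst = one_of("lui", "addi", "slti", "andi", "ori", "xori")

    return {
        "RegWrite": one_of("lui") or arith_inst or logic_inst or one_of("lw", "jal"),
        "ALUSrc": imm_inst or one_of("lw", "sw"),
        "AddSub": one_of("sub", "slt", "slti"),
        "DataRead": one_of("lw"),
        "PCSrc0": one_of("j", "jal", "syscall"),
    }


for mnemonic in ("add", "lw", "sw", "jal"):
    print(mnemonic, control_signals(mnemonic))
```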


Putting It All Together

[Figure: the complete single-cycle MicroMIPS design, combining the data path of Fig 13.3, the ALU of Fig 10.19, the next-address logic of Fig 13.4, and the instruction decoder with its control logic.]

13.6 Performance of the Single-Cycle Design

An example combinational-logic data path to compute z := (u + v)(w – x) / y

Add/Sub latency: 2 ns

Multiply latency: 6 ns

Divide latency: 15 ns

Total latency: 23 ns

Beginning with inputs u, v, w, x, and y stored in registers, the entire computation can be completed in ≅25 ns, allowing 1 ns each for register readout and write.

Note that the divider gets its correct inputs after ≅9 ns, but this won’t cause a problem if we allow enough total time.
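The latency figures above follow from adding delays along the longest path. A minimal sketch of that arithmetic is shown here, with hypothetical variable names; the addition u + v and the subtraction w – x are taken to run in parallel, so together they cost one add/sub delay.

```python
ADD_SUB_NS = 2
MULTIPLY_NS = 6
DIVIDE_NS = 15
REG_READ_NS = 1
REG_WRITE_NS = 1

# u + v and w - x proceed side by side, so the first level costs one add/sub delay.
first_level = max(ADD_SUB_NS, ADD_SUB_NS)

# Critical path through the combinational logic: add/sub -> multiply -> divide.
logic_latency = first_level + MULTIPLY_NS + DIVIDE_NS

# Register readout before the logic, register write after it.
total_latency = REG_READ_NS + logic_latency + REG_WRITE_NS

# Time at which the divider first sees the correct product at its input.
divider_inputs_ready = REG_READ_NS + first_level + MULTIPLY_NS

print(logic_latency, total_latency, divider_inputs_ready)  # 23 25 9
```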


Performance Estimation for Single-Cycle MicroMIPS

Fig 13.6 The MicroMIPS data path unfolded (by depicting the register write step as a separate block) so as to better visualize the critical-path latencies


How Good is Our Single-Cycle Design?

Clock rate of 125 MHz is not impressive

How does this compare with current processors on the market?

Not bad, where latency is concerned

A 2.5 GHz processor with 20 or so pipeline stages has a latency of about 0.4 ns/cycle × 20 cycles = 8 ns, which matches the 8 ns cycle of the 125 MHz single-cycle design

Throughput, however, is much better for the pipelined processor:

Up to 20 times better with single issue

Perhaps up to 100 times better with multiple issue
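The latency and throughput comparison can be checked with a few lines of arithmetic. This is a sketch under idealized assumptions: the pipelined processor completes one instruction per cycle with no stalls, and the variable names are hypothetical.

```python
single_cycle_clock_hz = 125e6   # single-cycle MicroMIPS
pipelined_clock_hz = 2.5e9      # comparison processor
pipeline_stages = 20

# Latency: time for one instruction to pass through the processor.
single_cycle_latency_ns = 1e9 / single_cycle_clock_hz
pipelined_latency_ns = pipeline_stages * 1e9 / pipelined_clock_hz

# Throughput ratio with single issue, assuming one instruction completes
# per cycle in the pipeline (an idealization that ignores stalls).
throughput_ratio = pipelined_clock_hz / single_cycle_clock_hz

print(single_cycle_latency_ns, pipelined_latency_ns, throughput_ratio)
```

The two latencies come out the same, while the throughput ratio equals the number of pipeline stages, which is where the "up to 20 times" figure comes from.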


14 Control Unit Synthesis

The control unit for the single-cycle design is memoryless

• Problematic when instructions vary greatly in complexity

• Multiple cycles needed when resources must be reused

14.1 A Multicycle Implementation
14.2 Choosing the Clock Cycle
14.3 The Control State Machine
14.4 Performance of the Multicycle Design
14.5 Microprogramming
14.6 Exception Handling

[Figure: four instructions taking 3, 5, 3, and 4 clock cycles in the multicycle design; "Time saved" marks the gain over giving every instruction the same long cycle.]

A Multicycle Data Path

Fig 14.2 Abstract view of a multicycle instruction execution unit for MicroMIPS For naming of instruction fields, see Fig 13.1

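[Figure 14.2 blocks: cache, control, register file, and ALU; signals shown include op, fn, jta, imm, the rs, rt, rd register specifiers, the register contents (rs) and (rt), and the cache address.]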

Multicycle Data Path with Control Signals Shown

Fig 14.3 Key elements of the multicycle MicroMIPS data path

Three major changes relative to the single-cycle data path:

1 Instruction & data

2

Corrections are

shown in red


14.2 Clock Cycle and Control Signals

Control signal         0             1             2            3
JumpAddr               jta           SysCallAddr
PCSrc1, PCSrc0         Jump addr     x reg         z reg        ALU out
PCWrite                Don’t write   Write
MemRead                Don’t read    Read
MemWrite               Don’t write   Write
ALUSrcX                PC            x reg
ALUSrcY1, ALUSrcY0     4             y reg         imm          4 × imm
LogicFn1, LogicFn0     AND           OR            XOR          NOR
IRWrite                Don’t write   Write
RegWrite               Don’t write   Write
RegDst1, RegDst0       rt            rd            $31
RegInSrc1, RegInSrc0   Data reg      z reg         PC
FnClass1, FnClass0     lui           Set less      Arithmetic   Logic
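The ALUSrcX and ALUSrcY encodings describe two multiplexers in front of the ALU. A minimal sketch of that selection follows; the function names are hypothetical, and sign-extending the 16-bit immediate before use is an assumption.

```python
def sign_extend16(imm):
    """Interpret a 16-bit field as a signed two's-complement value."""
    return imm - 0x10000 if imm & 0x8000 else imm


def alu_operands(pc, x_reg, y_reg, imm, alu_src_x, alu_src_y):
    """Pick the two ALU inputs from the ALUSrcX and ALUSrcY codes."""
    x = (pc, x_reg)[alu_src_x]                  # 0: PC, 1: x reg
    se_imm = sign_extend16(imm)
    y = (4, y_reg, se_imm, 4 * se_imm)[alu_src_y]  # 0: 4, 1: y reg, 2: imm, 3: 4 x imm
    return x, y


# ALUSrcX = 0, ALUSrcY = 0 selects PC and 4 (used to increment the PC).
print(alu_operands(100, 7, 9, 3, alu_src_x=0, alu_src_y=0))
# ALUSrcX = 0, ALUSrcY = 3 selects PC and 4 x imm (used for a branch address).
print(alu_operands(100, 7, 9, 3, alu_src_x=0, alu_src_y=3))
```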


Table 14.2 Execution cycles for multicycle MicroMIPS

Fetch: … write it into instruction register, increment PC
    Inst′Data = 0, MemRead = 1, IRWrite = 1, ALUSrcX = 0, ALUSrcY = 0, ALUFunc = ‘+’, PCSrc = 3, PCWrite = 1

Decode: … registers, compute branch address and save in z register
    ALUSrcX = 0, ALUSrcY = 3, ALUFunc = ‘+’

ALU type: Perform ALU operation and …
    ALUFunc: Varies

Load/Store: Add base and offset values, …
    ALUFunc = ‘+’

Branch: If (x reg) = ≠ < (y reg), set PC …
    ALUFunc = ‘−’, PCSrc = 2, PCWrite = ALUZero or …

Jump: …
    PCSrc = 0 or 1, PCWrite = 1

ALU type: Write back z reg into rd
    RegDst = 1, RegInSrc = 1, RegWrite = 1

Store: Copy y reg into memory
    Inst′Data = 1, MemWrite = 1

14.3 The Control State Machine

Fig 14.4 The control state machine for multicycle MicroMIPS

[Figure 14.4 labels: Cycle 1, Cycle 2, Cycle 3; speculative calculation of branch address; branches based on instruction; paths for ALU-type and jump/branch instructions.]

State 2: ALUSrcX = 1, ALUSrcY = 2, ALUFunc = ‘+’
State 3: Inst′Data = 1, MemRead = 1
State 4: RegDst = 0, RegInSrc = 0, RegWrite = 1
State 5: ALUSrcX = 1, ALUSrcY = 1, ALUFunc = ‘−’, JumpAddr = %, PCSrc = @, PCWrite = #
State 6: Inst′Data = 1, MemWrite = 1
State 7: ALUSrcX = 1, ALUSrcY = 1 or 2, ALUFunc = Varies
State 8: RegDst = 0 or 1, RegInSrc = 1, RegWrite = 1

Notes for State 5:
% 0 for j or jal, 1 for syscall, don’t-care for other instr’s
@ 0 for j, jal, and syscall, 1 for jr, 2 for branches
# 1 for j, jr, jal, and syscall, ALUZero (′) for beq (bne), bit 31 of ALUout for bltz
For jal, RegDst = 2, RegInSrc = 1, RegWrite = 1

Note for State 7:
ALUFunc is determined based on the op and fn fields
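The state sequencing can be sketched as a next-state function. The transition structure here is inferred from the per-state signal settings and the cycle counts rather than read off the figure: state 0 fetches, state 1 decodes, states 2, 3, 4 handle lw, states 2, 6 handle sw, states 7, 8 handle ALU-type instructions, and state 5 handles jumps and branches. The class names passed in are hypothetical.

```python
def next_state(state, inst_class):
    """Return the next control state (sketch; transitions are an inferred assumption)."""
    if state == 0:                       # fetch
        return 1
    if state == 1:                       # decode, then branch on instruction class
        return {"lw": 2, "sw": 2, "alu": 7, "jump_branch": 5}[inst_class]
    if state == 2:                       # address computed: read or write memory
        if inst_class == "lw":
            return 3
        if inst_class == "sw":
            return 6
        raise ValueError("state 2 is reached only by lw or sw")
    if state == 3:                       # memory read done: write the register
        return 4
    if state == 7:                       # ALU operation done: write the register
        return 8
    if state in (4, 5, 6, 8):            # last state of every instruction
        return 0
    raise ValueError("unknown state")


def trace(inst_class):
    """List the states one instruction passes through, starting from fetch."""
    states = [0]
    while True:
        nxt = next_state(states[-1], inst_class)
        if nxt == 0:
            return states
        states.append(nxt)


for cls in ("lw", "sw", "alu", "jump_branch"):
    print(cls, trace(cls), len(trace(cls)), "cycles")
```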


State and Instruction Decoding

Fig 14.5 State and instruction decoders for multicycle MicroMIPS

[Figure 14.5 decoder outputs (partial): jrInst, norInst, sltInst, orInst, xorInst, sltiInst, andiInst, oriInst, xoriInst, luiInst]

Control Signal Generation

Certain control signals depend only on the control state

ALUSrcX = ControlSt2 ∨ ControlSt5 ∨ ControlSt7

RegWrite = ControlSt4 ∨ ControlSt8

Auxiliary signals identifying instruction classes

addsubInst = addInst ∨ subInst ∨ addiInst

logicInst = andInst ∨ orInst ∨ xorInst ∨ norInst ∨ andiInst ∨ oriInst ∨ xoriInst

Logic expressions for ALU control signals

Add′Sub = ControlSt5 ∨ (ControlSt7 ∧ subInst)

FnClass1 = ControlSt7′ ∨ addsubInst ∨ logicInst

FnClass0 = ControlSt7 ∧ (logicInst ∨ sltInst ∨ sltiInst)

LogicFn1 = ControlSt7 ∧ (xorInst ∨ xoriInst ∨ norInst)

LogicFn0 = ControlSt7 ∧ (orInst ∨ oriInst ∨ norInst)
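The ALU control expressions above depend on both the control state and the decoded instruction, which a short sketch makes easy to experiment with. The interface (a state number and a mnemonic string in place of one-hot ControlSt and Inst signals) is a hypothetical simplification, and Add′Sub is written as `AddSub`. The expressions are transcribed as given.

```python
def alu_control(state, inst):
    """Evaluate the multicycle ALU control expressions for one state and mnemonic."""
    st5 = state == 5
    st7 = state == 7

    # Auxiliary signals identifying instruction classes
    addsub_inst = inst in ("add", "sub", "addi")
    logic_inst = inst in ("and", "or", "xor", "nor", "andi", "ori", "xori")

    fn_class1 = (not st7) or addsub_inst or logic_inst
    fn_class0 = st7 and (logic_inst or inst in ("slt", "slti"))
    return {
        "AddSub": st5 or (st7 and inst == "sub"),
        "FnClass": 2 * fn_class1 + fn_class0,   # 0 lui, 1 set less, 2 arithmetic, 3 logic
        "LogicFn1": st7 and inst in ("xor", "xori", "nor"),
        "LogicFn0": st7 and inst in ("or", "ori", "nor"),
    }


print(alu_control(0, "lw"))    # outside state 7 the ALU is set to arithmetic
print(alu_control(7, "nor"))   # logic class with both LogicFn bits set
print(alu_control(7, "slt"))   # set-less class
```

Outside state 7 the FnClass1 term forces the arithmetic class, which is what the fetch, decode, and address-calculation states need for their additions.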


14.4 Performance of the Multicycle Design

Fig 13.6 The MicroMIPS data path unfolded (by depicting the register write step as a separate block) so as to better visualize the critical-path latencies


How Good is Our Multicycle Design?

Clock rate of 500 MHz is better than the 125 MHz of the single-cycle design, but still unimpressive

How does the performance compare with current processors on the market?

Not bad, where latency is concerned

A 2.5 GHz processor with 20 or so pipeline stages has a latency of about 0.4 ns/cycle × 20 cycles = 8 ns

Throughput, however, is much better for the pipelined processor:

Up to 20 times better with single issue

Perhaps up to 100 times better with multiple issue
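To see what the 500 MHz clock buys, per-instruction latencies and an average CPI can be estimated. This is a sketch: the cycle counts (3 for jumps and branches, 4 for ALU-type and store, 5 for load) are taken from the multicycle execution steps, while the instruction mix is a made-up assumption used only to show the calculation.

```python
CLOCK_HZ = 500e6
cycle_ns = 1e9 / CLOCK_HZ   # 2 ns per cycle

# Cycles per instruction class in the multicycle design.
CYCLES = {"alu": 4, "load": 5, "store": 4, "jump_branch": 3}

# Hypothetical instruction mix (fractions sum to 1); not data from the slides.
MIX = {"alu": 0.5, "load": 0.2, "store": 0.1, "jump_branch": 0.2}

latency_ns = {cls: cycles * cycle_ns for cls, cycles in CYCLES.items()}
average_cpi = sum(MIX[cls] * CYCLES[cls] for cls in CYCLES)
average_time_ns = average_cpi * cycle_ns

print(latency_ns)
print(average_cpi, average_time_ns)
```

Under this assumed mix the average instruction time lands close to the 8 ns of the single-cycle design, which illustrates why the faster clock alone does not translate into a proportional speedup.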


14.5 Microprogramming

[Figure: the control state machine of Fig 14.4, repeated, with Cycles 1 through 5 marked. State 0 (start): Inst′Data = 0, MemRead = 1, IRWrite = 1, ALUSrcX = 0, ALUSrcY = 0, ALUFunc = ‘+’, PCSrc = 3, PCWrite = 1.]

The control state machine resembles

Fig 14.6 Possible 22-bit microinstruction format for MicroMIPS

[Figure 14.6 field groups: PC control; cache control (including IRWrite); register control (RegInSrc, RegDst, RegWrite); ALU inputs (ALUSrcX, ALUSrcY); ALU function (FnType, LogicFn); sequence control (2 bits).]

The Control State Machine as a Microprogram

Fig 14.4 The control state machine for multicycle MicroMIPS

[Figure 14.4 repeated, annotated to show how states map to microinstructions: one state is decomposed into 2 substates, and two others into multiple substates.]

Symbolic Names for Microinstruction Field Values

Table 14.3 Microinstruction field values and their symbolic names

The default value for each unspecified field is the all 0s bit pattern.

Field name Possible field values and their symbolic names


Control Unit for Microprogramming

Fig 14.7 Microprogrammed control unit for MicroMIPS

[Figure 14.7 labels: microprogram memory or PLA; data; sequence control; multiway branch; 64 entries in each table; "andi:" as a sample microprogram entry point.]

fetch:   PCnext, CacheFetch           # State 0 (start)
         PC + 4imm, μPCdisp1          # State 1
         rt ← z, μPCfetch             # State 8 (lui)
         rd ← z, μPCfetch             # State 8 (add)
         rd ← z, μPCfetch             # State 8 (sub)
         rd ← z, μPCfetch             # State 8 (slt)
         rt ← z, μPCfetch             # State 8 (addi)
         rt ← z, μPCfetch             # State 8 (slti)
         rd ← z, μPCfetch             # State 8 (and)
         rd ← z, μPCfetch             # State 8 (or)
         rd ← z, μPCfetch             # State 8 (xor)
         rd ← z, μPCfetch             # State 8 (nor)
         rt ← z, μPCfetch             # State 8 (andi)
         rt ← z, μPCfetch             # State 8 (ori)
         rt ← z, μPCfetch             # State 8 (xori)
lwsw1:   x + imm, μPCdisp2            # State 2
         rt ← Data, μPCfetch          # State 4
sw2:     CacheStore, μPCfetch         # State 6
j1:      PCjump, μPCfetch             # State 5 (j)
jr1:     PCjreg, μPCfetch             # State 5 (jr)
branch1: PCbranch, μPCfetch           # State 5 (branch)
jal1:    PCjump, $31 ← PC, μPCfetch   # State 5 (jal)

14.6 Exception Handling

Exceptions and interrupts alter the normal program flow

Examples of exceptions (things that can go wrong):

• ALU operation leads to overflow (incorrect result is obtained)

• Opcode field holds a pattern not representing a legal operation

• Cache error-code checker deems an accessed word invalid

• Sensor signals a hazardous condition (e.g., overheating)

Exception handler is an OS program that takes care of the problem

• Derives correct result of overflowing computation, if possible

• Invalid operation may be a software-implemented instruction

Interrupts are similar, but usually have external causes (e.g., I/O)

[Figure: the control state machine of Fig 14.4 extended with two exception states, entered on an illegal operation or an overflow.]

State 9: IntCause = 1, CauseWrite = 1, ALUSrcX = 0, ALUSrcY = 0, ALUFunc = ‘−’, EPCWrite = 1, JumpAddr = 1, PCSrc = 0, PCWrite = 1

State 10: IntCause = 0, CauseWrite = 1, ALUSrcX = 0, ALUSrcY = 0, ALUFunc = ‘−’, EPCWrite = 1, JumpAddr = 1, PCSrc = 0, PCWrite = 1
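The signal settings of the two exception states can be read as a small register-transfer step. The sketch below models that step under stated assumptions: EPC receives the ALU output (PC − 4, since ALUSrcX = 0 selects the PC, ALUSrcY = 0 selects 4, and ALUFunc is ‘−’), the Cause register receives IntCause, and the PC is loaded with SysCallAddr (JumpAddr = 1, PCSrc = 0). The function name and the handler address used in the example are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def take_exception(pc, int_cause, syscall_addr):
    """Model the register updates made by an exception state (a sketch)."""
    epc = (pc - 4) & MASK32    # EPCWrite = 1: save the ALU output PC - 4
    cause = int_cause          # CauseWrite = 1: record which exception occurred
    new_pc = syscall_addr      # PCWrite = 1 with JumpAddr = 1, PCSrc = 0
    return {"EPC": epc, "Cause": cause, "PC": new_pc}


# Hypothetical values: the PC was already incremented past the faulting instruction.
result = take_exception(pc=0x00400014, int_cause=1, syscall_addr=0x80000080)
print({name: hex(value) for name, value in result.items()})
```

Subtracting 4 undoes the PC increment done in the fetch cycle, so EPC points at the instruction that caused the exception.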


15 Pipelined Data Paths

Pipelining is now used in even the simplest of processors

• Same principles as assembly lines in manufacturing

• Unlike in assembly lines, instructions not independent

15.1 Pipelining Concepts
15.2 Pipeline Stalls or Bubbles
15.3 Pipeline Timing and Performance
15.4 Pipelined Data Path Design
15.5 Pipelined Control
15.6 Optimal Pipelining

Title	Data Path and Control
Author	Behrooz Parhami
School	University of California, Santa Barbara
Major	Computer Architecture
Type	presentation
Year of publication	2007
City	Santa Barbara