Computer Architecture, Vo Tan Phuong, Chapter 04-1: Single Cycle Processor (sinhvienzone.com)


Page 1

Vo Tan Phuong

Page 3

2013

 Designing a Processor: Step-by-Step

 Datapath Components and Clocking

 Assembling an Adequate Datapath

 Controlling the Execution of Instructions

 The Main Controller and ALU Controller

 Drawback of the single-cycle processor design

Page 4

 Recall, performance is determined by:

 Instruction count

 Clock cycles per instruction (CPI)

 Clock cycle time

 Processor design will affect

 Clock cycles per instruction

 Clock cycle time

 Single cycle datapath and control design:

 Advantage: One clock cycle per instruction

 Disadvantage: long cycle time

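The three factors listed above multiply into total execution time. The following is a minimal Python sketch of that relationship. The instruction count and cycle times are made-up illustrative values, not figures from the lecture, and `cpu_time_ps` is a hypothetical helper name.

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ps

# Hypothetical program of 1000 instructions.
# Single-cycle design: CPI is exactly 1, but the cycle is long.
single_cycle = cpu_time_ps(1000, 1, 900)
# Hypothetical alternative design: more cycles per instruction, shorter cycle.
alternative = cpu_time_ps(1000, 4, 200)
print(single_cycle, alternative)
```

The sketch only shows why a CPI of 1 does not by itself make a design fast: the long cycle time can outweigh it.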

Page 5

 Analyze instruction set => datapath requirements

 The meaning of each instruction is given by the register transfers

 Datapath must include storage elements for ISA registers

 Datapath must support each register transfer

 Select datapath components and clocking methodology

 Assemble datapath meeting the requirements

 Analyze implementation of each instruction

 Determine the setting of control signals for register transfer

 Assemble the control logic

Page 6

 All instructions are 32 bits wide

 Three instruction formats: R-type, I-type, and J-type

 Op (6 bits): opcode of the instruction

 Rs, Rt, Rd (5 bits each): source and destination register numbers

 sa (5 bits): shift amount used by shift instructions

 funct (6 bits): function field for R-type instructions

 immediate (16 bits): immediate value or address offset

 immediate (26 bits): target address of the jump instruction

R-type: Op(6) | Rs(5) | Rt(5) | Rd(5) | sa(5) | funct(6)

I-type: Op(6) | Rs(5) | Rt(5) | immediate(16)

J-type: Op(6) | immediate(26)
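As an illustration of the three formats, here is a small Python sketch that splits a 32-bit instruction word into the fields listed above. It assumes the fields are packed left to right from bit 31 in the order shown, and `decode` is a hypothetical helper name. Which fields are meaningful depends on the format.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op":    (instr >> 26) & 0x3F,   # bits 31..26
        "rs":    (instr >> 21) & 0x1F,   # bits 25..21
        "rt":    (instr >> 16) & 0x1F,   # bits 20..16
        "rd":    (instr >> 11) & 0x1F,   # bits 15..11 (R-type only)
        "sa":    (instr >> 6) & 0x1F,    # bits 10..6  (R-type only)
        "funct": instr & 0x3F,           # bits 5..0   (R-type only)
        "imm16": instr & 0xFFFF,         # bits 15..0  (I-type only)
        "imm26": instr & 0x3FFFFFF,      # bits 25..0  (J-type only)
    }

# slt $3, $1, $2 : op = 0, rs = 1, rt = 2, rd = 3, sa = 0, funct = 0x2a
word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x2A
fields = decode(word)
print(fields)
```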

Page 7

 Only a subset of the MIPS instructions are considered

 ALU instructions (R-type): add, sub, and, or, xor, slt

 Immediate instructions (I-type): addi, slti, andi, ori, xori

 Load and Store (I-type): lw, sw

 Branch (I-type): beq, bne

 Jump (J-type): j

 This subset does not include all the integer instructions

 But sufficient to illustrate design of datapath and control

 Concepts used to implement the MIPS subset are used to construct a broad spectrum of computers

Page 8

Instruction          Meaning            Encoding (op: 6 bits; rs, rt, rd: 5 bits each; im16: 16 bits)
slt  rd, rs, rt      set on less than   op = 0    | rs | rt | rd | sa = 0 | funct = 0x2a
addi rt, rs, im16    add immediate      op = 0x08 | rs | rt | im16
slti rt, rs, im16    slt immediate      op = 0x0a | rs | rt | im16
andi rt, rs, im16    and immediate      op = 0x0c | rs | rt | im16
ori  rt, rs, im16    or immediate       op = 0x0d | rs | rt | im16
xori rt, rs, im16    xor immediate      op = 0x0e | rs | rt | im16
lw   rt, im16(rs)    load word          op = 0x23 | rs | rt | im16
sw   rt, im16(rs)    store word         op = 0x2b | rs | rt | im16
beq  rs, rt, im16    branch if equal    op = 0x04 | rs | rt | im16
bne  rs, rt, im16    branch not equal   op = 0x05 | rs | rt | im16

Page 9

 RTL is a description of data flow between registers

 RTL gives a meaning to the instructions

 All instructions are fetched from memory at address PC

Instruction   RTL Description
ORI           Reg(Rt) ← Reg(Rs) | zero_ext(Im16); PC ← PC + 4
LW            Reg(Rt) ← MEM[Reg(Rs) + sign_ext(Im16)]; PC ← PC + 4
SW            MEM[Reg(Rs) + sign_ext(Im16)] ← Reg(Rt); PC ← PC + 4
BEQ           if (Reg(Rs) == Reg(Rt)) PC ← PC + 4 + 4 × sign_ext(Im16)
              else PC ← PC + 4
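The register transfers above can be read as a tiny interpreter. The Python sketch below applies them to a state holding the PC, 32 registers, and a word-addressed dictionary standing in for memory. The state layout and the `step` function are illustrative assumptions, not part of the lecture, and the sketch does not guard writes to register 0.

```python
MASK32 = 0xFFFFFFFF

def sign_ext(imm16):
    """Interpret a 16-bit field as a signed integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def zero_ext(imm16):
    """Interpret a 16-bit field as an unsigned integer."""
    return imm16 & 0xFFFF

def step(state, name, rs, rt, imm16):
    """Apply one instruction's register transfers to the state in place."""
    reg, mem = state["reg"], state["mem"]
    next_pc = state["pc"] + 4
    if name == "ori":
        reg[rt] = reg[rs] | zero_ext(imm16)
    elif name == "lw":
        reg[rt] = mem.get((reg[rs] + sign_ext(imm16)) & MASK32, 0)
    elif name == "sw":
        mem[(reg[rs] + sign_ext(imm16)) & MASK32] = reg[rt]
    elif name == "beq":
        if reg[rs] == reg[rt]:
            next_pc = state["pc"] + 4 + 4 * sign_ext(imm16)
    else:
        raise ValueError(name)
    state["pc"] = next_pc & MASK32

state = {"pc": 0, "reg": [0] * 32, "mem": {}}
step(state, "ori", 0, 1, 0x00FF)   # Reg(1) = Reg(0) | 0x00FF
step(state, "sw", 0, 1, 16)        # MEM[16] = Reg(1)
step(state, "lw", 0, 2, 16)        # Reg(2) = MEM[16]
step(state, "beq", 1, 2, 0xFFFF)   # equal, offset -1: PC = 12 + 4 - 4
print(state["pc"], state["reg"][2])
```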

Page 10

 R-type

Fetch operands: data1 ← Reg(Rs), data2 ← Reg(Rt)

Execute operation: ALU_result ← func(data1, data2)

Write ALU result: Reg(Rd) ← ALU_result

Next PC address: PC ← PC + 4

 I-type

Fetch operands: data1 ← Reg(Rs), data2 ← Extend(imm16)

Execute operation: ALU_result ← op(data1, data2)

Write ALU result: Reg(Rt) ← ALU_result

Next PC address: PC ← PC + 4

 BEQ

Fetch operands: data1 ← Reg(Rs), data2 ← Reg(Rt)

Equality: zero ← subtract(data1, data2)

Branch: if (zero) PC ← PC + 4 + 4 × sign_ext(imm16) else PC ← PC + 4

Page 11

 LW

Fetch instruction: Instruction ← MEM[PC]

Fetch base register: base ← Reg(Rs)

Calculate address: address ← base + sign_extend(imm16)

Read memory: data ← MEM[address]

Write register Rt: Reg(Rt) ← data

Next PC address: PC ← PC + 4

 SW

Fetch registers: base ← Reg(Rs), data ← Reg(Rt)

Calculate address: address ← base + sign_extend(imm16)

Write memory: MEM[address] ← data

Next PC address: PC ← PC + 4

 J

Target PC address: target ← PC[31:28] || Imm26 || ‘00’ (|| denotes concatenation)

Page 12

 Memory

 Registers

 Read source register Rs

 Read source register Rt

 Write destination register Rt or Rd

 Program counter PC register and Adder to increment PC

 Sign and Zero extender for immediate constant

 ALU for executing instructions


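Page 14

[Figure: memory block diagram with ports Address, Data_in, and Data_out (32 bits each), control inputs MemRead and MemWrite, and clk; the only surviving block label is Instruction Memory]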

Page 15

 Register

 Similar to the D-type Flip-Flop

 n-bit input and output

 Write Enable (WE):

 Enable / disable writing of register

 Negated (0): Data_Out will not change

 Asserted (1): Data_Out will become Data_In after clock edge

 Edge triggered Clocking

 Register output is modified at clock edge

[Figure: n-bit register with inputs Data_In (n bits), Clock, and Write Enable (WE), and output Data_Out (n bits)]

Page 16

 Register File consists of 32 × 32-bit registers

 Two registers read and one written in a cycle

 Registers are selected by:

 RA selects register to be read on BusA

 RB selects register to be read on BusB

 RW selects the register to be written

 Clock input

 The clock input is used ONLY during write operation

 During read, register file behaves as a combinational logic block

 RA or RB valid => BusA or BusB valid after access time

[Figure: Register File block with select inputs RW, RA, and RB]
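The register file behaviour described above (combinational reads, clocked write) can be modelled in a few lines of Python. This is a behavioural sketch only: the class and method names are hypothetical, the write enable is passed in as `reg_write`, and access time is not modelled.

```python
class RegisterFile:
    """32 x 32-bit registers: two combinational read ports, one clocked write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        # Reads need no clock: BusA and BusB simply follow RA and RB.
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, reg_write):
        # The write happens only at the clock edge, and only if enabled.
        if reg_write:
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=42, reg_write=1)
rf.clock_edge(rw=6, bus_w=99, reg_write=0)   # write disabled: register 6 unchanged
bus_a, bus_b = rf.read(5, 6)
print(bus_a, bus_b)
```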

Page 17

[Figure: register file implementation, with the WE input shown on each register; R0 is not used]

Page 18

 Allow multiple sources to drive a single bus

 Two Inputs:

 Data_in

 Enable

 One Output: Data_out

 If (Enable) Data_out = Data_in, else Data_out = high-impedance state (output is disconnected)

 Tri-state buffers can be used to build multiplexors
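To make the last point concrete, the Python sketch below models a tri-state buffer and builds a 2-to-1 multiplexor from two of them. `None` stands in for the high-impedance state, and the function names are hypothetical.

```python
def tri_state(data_in, enable):
    """Return data_in when enabled, else None to stand for high impedance."""
    return data_in if enable else None

def mux2(d0, d1, sel):
    """2-to-1 multiplexor: exactly one buffer drives the shared output."""
    outputs = [tri_state(d0, sel == 0), tri_state(d1, sel == 1)]
    driven = [v for v in outputs if v is not None]
    assert len(driven) == 1, "bus must have exactly one driver"
    return driven[0]

print(mux2(7, 9, 0), mux2(7, 9, 1))
```

The select signal enables one buffer and disables the other, so the two outputs can share one wire without conflict.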

Page 19

ALU Selection

SLT: the ALU does a SUB and checks the sign and overflow
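One common way to combine the sign and overflow bits is to XOR them: the signed comparison a < b is true exactly when the sign of a - b differs from the overflow flag. The Python sketch below assumes that rule and treats a and b as 32-bit two's complement bit patterns; the function name is illustrative.

```python
def slt(a, b):
    """Signed a < b for 32-bit two's complement bit patterns a and b."""
    diff = (a - b) & 0xFFFFFFFF
    sign_a, sign_b, sign_d = a >> 31, b >> 31, diff >> 31
    # Subtraction overflows when the operands have different signs and
    # the sign of the result differs from the sign of a.
    overflow = (sign_a != sign_b) and (sign_d != sign_a)
    return int(bool(sign_d) != overflow)

print(slt(1, 2), slt(0x80000000, 1))
```

Checking the sign alone would give the wrong answer for cases such as the most negative value minus 1, where the subtraction overflows.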

Page 20

 Instruction memory needs only provide read access

 Because datapath does not write instructions

 Behaves as combinational logic for read

 Address selects Instruction after access time

 Data Memory is used for load and store

 The Clock synchronizes the write operation

 Separate instruction and data memories

 Later, we will replace them with caches

[Figure: Data Memory block with Address and Data_in inputs (32 bits) and MemWrite and MemRead control inputs]

Page 21

 Clocks are needed in sequential logic to decide when a state element (register) should be updated

 To ensure correctness, a clocking methodology defines when data can be written and read

 Data must be valid and stable before arrival of the clock edge

 Edge-triggered clocking allows a register to be read and written during the same clock cycle

Page 22

 With edge-triggered clocking, the clock cycle must be long enough to accommodate the path from one register through the combinational logic to another register

 Ts: setup time, during which the input to a register must be stable before arrival of the clock edge

 Th: hold time, during which the input to a register must hold after arrival of the clock edge

 Hold time (Th) is normally satisfied since Tclk-q > Th

Page 23

 Clock skew arises because the clock signal uses different paths with slightly different delays to reach state elements

 Clock skew is the difference in absolute time between when two storage elements see a clock edge

 With clock skew, the clock cycle time is increased

 Clock skew is reduced by balancing the clock delays

Tcycle ≥ Tclk-q + Tmax_combinational + Tsetup + Tskew
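A short Python sketch of the cycle-time constraint follows. The delay values are hypothetical and chosen only to show how skew adds directly to the minimum clock period; `min_cycle_time` is an illustrative helper name.

```python
def min_cycle_time(t_clk_q, t_max_comb, t_setup, t_skew=0):
    """Smallest clock period satisfying the constraint (one time unit throughout)."""
    return t_clk_q + t_max_comb + t_setup + t_skew

# Hypothetical delays in picoseconds.
no_skew = min_cycle_time(t_clk_q=30, t_max_comb=800, t_setup=20)
with_skew = min_cycle_time(t_clk_q=30, t_max_comb=800, t_setup=20, t_skew=50)
print(no_skew, with_skew)
```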


Page 25

 We can now assemble the datapath from its components

 For instruction fetching, we need …

 Program Counter (PC) register

 Instruction Memory

 Adder for incrementing PC

The least significant 2 bits of the PC are ‘00’ since PC is a multiple of 4

Datapath does not handle branch or jump instructions

[Figure: instruction fetch datapath: PC register (clk) addressing the Instruction Memory, with an adder producing the next PC; a second version increments the upper 30 bits of PC by 1]

Page 26

 Control signals: ALUCtrl, RegWrite

R-type format: Op(6) | Rs(5) | Rt(5) | Rd(5) | sa(5) | funct(6)

BusA and BusB provide the data inputs to the ALU

The ALU result is connected to BusW

[Figure: datapath for R-type instructions, including the Instruction Memory (Address, Instruction)]

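Page 27

 Control signals (ALUCtrl, RegWrite)

 ALUCtrl is derived from the Op field

I-type format: Op(6) | Rs(5) | Rt(5) | immediate(16)

Rt selects the register to write, not Rd

[Figure: datapath for I-type instructions, including the Instruction Memory (Address, Instruction) and clk; one label fragment reads "PC and Rt"]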

Page 28

 Control signals

 ALUCtrl is derived from either the Op or the funct field

 RegDst selects the register destination as either Rt or Rd

A mux selects RW as either Rt or Rd

Another mux selects the second ALU input as either the data on BusB or the extended immediate

[Figure: combined datapath for R-type and I-type ALU instructions; labels: ALUCtrl, RegWrite, Instruction Memory]

Page 29

For R-type ALU instructions, RegDst is ‘1’ to select Rd on RW and ALUSrc is ‘0’ to select BusB as the second ALU input

For I-type ALU instructions, RegDst is ‘0’ to select Rt on RW and ALUSrc is ‘1’ to select the extended immediate as the second ALU input

[Figures: the two datapaths, with the active part of each shown in green]

Page 30

 Two types of extensions

 Zero-extension for unsigned constants

 Sign-extension for signed constants

 Control signal ExtOp indicates type of extension

 Extender Implementation: wiring and one AND gate

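The "wiring and one AND gate" implementation can be sketched in Python as follows. The upper 16 bits of the result are all copies of one bit, ExtOp AND the sign bit of Imm16. The function name and the integer encoding of ExtOp (1 for sign extension, 0 for zero extension) are assumptions made for the sketch.

```python
def extend(imm16, ext_op):
    """Extend a 16-bit immediate to 32 bits; ext_op=1 sign-extends, 0 zero-extends."""
    sign_bit = (imm16 >> 15) & 1
    upper_bit = ext_op & sign_bit              # the single AND gate
    upper = 0xFFFF if upper_bit else 0x0000    # wiring: the same bit copied 16 times
    return (upper << 16) | (imm16 & 0xFFFF)

print(hex(extend(0x8000, 1)), hex(extend(0x8000, 0)))
```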

Page 31

Adding Data Memory to Datapath

 A data memory is added for load and store instructions

 Additional control signals: MemRead, MemWrite, MemtoReg

BusB is connected to Data_in of Data Memory for store instructions

A third mux selects the data on BusW as either the ALU result or the memory Data_out

[Figure: datapath with Instruction Memory and Data Memory (Address, Data_in, Data_out)]

Page 32

Control settings for a load (lw):

RegDst = ‘0’ selects Rt as destination register

RegWrite = ‘1’ to enable writing of register file

MemtoReg = ‘1’ places the data read from memory on BusW

ExtOp = 1 to sign-extend Immediate16 to 32 bits

Clock edge updates PC and Register Rt

Page 33

Control settings for a store (sw):

RegDst = ‘X’ because no register is written

RegWrite = ‘0’ to disable writing of register file

MemtoReg = ‘X’ because it does not matter what data is put on BusW

ExtOp = 1 to sign-extend Immediate16 to 32 bits

Clock edge updates PC and Data Memory

Page 34

 Additional Control Signals: J, Beq, Bne

Next PC logic computes the jump or branch target instruction address

[Figure: full datapath with Next PC block; labeled signals: Zero, PCSrc, Beq, Bne, J, ALUCtrl, RegWrite, ExtOp, RegDst, ALUSrc, MemWrite, MemtoReg; labeled blocks: Instruction Memory, Data Memory (Address, Data_in, Data_out)]

Page 35

Imm16 is sign-extended to 30 bits

Jump target address: upper 4 bits of PC are concatenated with Imm26

PCSrc = J + (Beq · Zero) + (Bne · ¬Zero), where + is OR, · is AND, and ¬ is NOT

[Figure: Next PC logic with inputs Imm26 (26 bits), Beq, Bne, J, and Zero]
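The next-PC selection can be sketched in Python as below. The sketch works on 32-bit byte addresses rather than the 30-bit word addresses of the figure, reads the Bne term as taken when Zero is 0, and uses hypothetical function names; control inputs are the integers 0 or 1.

```python
MASK32 = 0xFFFFFFFF

def sign_ext16(imm16):
    """Interpret a 16-bit field as a signed integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc(pc, imm16, imm26, j, beq, bne, zero):
    """Return (PCSrc, next PC) for byte address pc and 0/1 control inputs."""
    pc_src = j | (beq & zero) | (bne & (1 - zero))
    if j:
        # Upper 4 bits of PC, then Imm26, then '00'.
        target = (pc & 0xF0000000) | (imm26 << 2)
    else:
        # Branch target: PC + 4 + 4 x sign_ext(Imm16).
        target = (pc + 4 + 4 * sign_ext16(imm16)) & MASK32
    return pc_src, (target if pc_src else (pc + 4) & MASK32)

print(next_pc(0x100, 3, 0, j=0, beq=1, bne=0, zero=1))
```

When PCSrc is 0 the function falls through to PC + 4, which matches the behaviour for every instruction that is not a jump or a taken branch.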

Page 36

[Figure: datapath with control signal settings; only the labels Instruction Memory, Address, Data_in, Data_out, MemWrite, MemtoReg, and "= 0" survive]

Page 37

Control settings for a branch (beq, bne):

RegWrite, MemRead, and MemWrite are 0

Either Beq = 1 or Bne = 1, depending on the opcode

Clock edge updates the PC register only

ALUSrc = 0 to select the value on BusB

ALUCtrl = SUB to generate the Zero flag

Next PC outputs the branch target address

PCSrc = 1 if the branch is taken


Page 39

Main Control Input:

 6-bit opcode field from instruction

Main Control Output:

 Control signals for the datapath

ALU Control Input:

 6-bit opcode field from instruction

 6-bit function field from instruction

ALU Control Output:

 ALUCtrl signal for the ALU

[Figure: Main Control and ALU Control blocks connected to the datapath and Instruction Memory]

Page 40

[Figure: single-cycle datapath with control; labels: Main Control (Op; J, Beq, Bne; MemtoReg; MemRead; MemWrite; ExtOp), ALU Ctrl (ALUop, func), Instruction Memory, Data Memory (Address, Data_in, Data_out), clk]

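Page 41

Main Control Signals

 RegWrite = 1: the destination register is written with the data value on BusW

 ALUSrc = 0: second ALU operand comes from the second register file output (BusB)

 ALUSrc = 1: second ALU operand comes from the extended 16-bit immediate

 MemRead = 1: Data_out ← Memory[address]

 MemWrite = 1: Memory[address] ← Data_in

 Beq, Bne = 1: PC ← branch target address, if branch is taken

 J = 0: PC ← PC + 4; J = 1: PC ← Jump target address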

Page 42

 X is a don’t care (can be 0 or 1), used to minimize logic

Op    RegDst   RegWrite   ExtOp    ALUSrc   Beq   Bne   J   MemRead   MemWrite   MemtoReg
ori   0 = Rt   1          0=zero   1=Imm    0     0     0   0         0          0

Page 43

RegDst = R-type

RegWrite = ¬(sw + beq + bne + j)

ExtOp = ¬(andi + ori + xori)

ALUSrc = ¬(R-type + beq + bne)

(+ is OR and ¬ is NOT. Example: for ori, RegWrite = 1, ExtOp = 0, and ALUSrc = 1.)
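The four logic equations can be checked with a short Python sketch. It reads RegWrite, ExtOp, and ALUSrc as the complements of the listed sums, which is what the ori row of the truth table (RegWrite = 1, ExtOp = 0, ALUSrc = 1) requires. Instructions are identified by mnemonic rather than by decoded opcode bits, and `main_control` is a hypothetical name.

```python
R_TYPE = {"add", "sub", "and", "or", "xor", "slt"}

def main_control(instr):
    """Evaluate the four logic equations for an instruction mnemonic."""
    r_type = instr in R_TYPE
    return {
        "RegDst":   int(r_type),
        "RegWrite": int(instr not in {"sw", "beq", "bne", "j"}),
        "ExtOp":    int(instr not in {"andi", "ori", "xori"}),
        "ALUSrc":   int(not (r_type or instr in {"beq", "bne"})),
    }

print(main_control("ori"))
print(main_control("sw"))
```

Where the truth table has a don't care (X), the equations still produce a concrete 0 or 1; either value is acceptable in those positions.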

Page 44

Inputs: Op (6 bits), funct (6 bits)

Output: ALUCtrl (4-bit encoding)

The 4-bit ALUCtrl is encoded according to the ALU implementation


Page 46

 Long cycle time

 All instructions take as much time as the slowest instruction

ALU:   Instruction Fetch → Decode / Reg Read → ALU → Reg Write

Load:  Instruction Fetch → Decode / Reg Read → Compute Address → Memory Read → Reg Write (longest delay)

Store: Instruction Fetch → Decode / Reg Read → Compute Address → Memory Write

Page 47

[Figure: timing diagram over one clock cycle (Clk): the PC changes from Old PC to New PC, and after the Data Memory access time the memory output changes from the old value to the data from DM]

Page 48

 Long cycle time: long enough for the slowest instruction

… + Data Memory Access Time

+ Delay through MemtoReg Mux

+ Setup Time for Register File Write + Clock Skew

 Cycle time is longer than needed for other instructions

 Therefore, single cycle processor design is not used in practice

Page 49

 Break instruction execution into five steps

 Instruction fetch

 Instruction decode, register read, target address for jump/branch

 Execution, memory address calculation, or branch outcome

 Memory access or ALU instruction completion

 Load instruction completion

 One clock cycle per step (clock cycle is reduced)

 First 2 steps are the same for all instructions

Clock cycles per instruction class: ALU & Store: 4, Branch: 3

Page 50

 Assume the following operation times for components:

 Instruction and data memories: 200 ps

 ALU and adders: 180 ps

 Decode and Register file access (read or write): 150 ps

 Ignore the delays in PC, mux, extender, and wires

 Which of the following would be faster and by how much?

 Single-cycle implementation for all instructions

 Multicycle implementation optimized for every class of instructions

 Assume the following instruction mix:

 40% ALU, 20% Loads, 10% stores, 20% branches, & 10% jumps

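Page 51

Instruction Class   Instruction Memory   Register Read   ALU Operation   Data Memory   Register Write   Total
ALU                 200                  150             180                           150              680 ps
Load                200                  150             180             200           150              880 ps
Store               200                  150             180             200                            730 ps
Branch              200                  150             180                                            530 ps

Note on the branch ALU step: compare and write PC

Single-cycle clock cycle = 880 ps, determined by the longest delay (load instruction)

Speedup of multicycle over single-cycle = 880 ps / (3.8 × 200 ps) = 880 / 760 = 1.16, where 3.8 is the average CPI and 200 ps is the multicycle clock cycle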
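The arithmetic behind this comparison can be reproduced with the Python sketch below. Component delays and the instruction mix come from the example's stated assumptions. The cycle counts for loads (5) and jumps (2) are assumptions of the sketch: they do not survive in the text, but together with 4 for ALU and store and 3 for branch they give the average CPI of 3.8 used above. The jump path is left out of the single-cycle table because it does not affect the maximum.

```python
# Component delays in ps, from the assumptions stated for this example.
MEM, ALU, REG = 200, 180, 150

# Components on each class's path in the single-cycle datapath.
paths = {
    "alu":    [MEM, REG, ALU, REG],
    "load":   [MEM, REG, ALU, MEM, REG],
    "store":  [MEM, REG, ALU, MEM],
    "branch": [MEM, REG, ALU],
}
single_cycle_ps = max(sum(p) for p in paths.values())   # slowest class sets the clock

# Multicycle: one step per cycle, so the slowest component sets the clock.
multi_cycle_ps = max(MEM, ALU, REG)
mix = {"alu": 0.40, "load": 0.20, "store": 0.10, "branch": 0.20, "jump": 0.10}
cycles = {"alu": 4, "load": 5, "store": 4, "branch": 3, "jump": 2}  # load and jump assumed
avg_cpi = sum(mix[c] * cycles[c] for c in mix)

speedup = single_cycle_ps / (avg_cpi * multi_cycle_ps)
print(single_cycle_ps, round(avg_cpi, 2), round(speedup, 2))
```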

Date posted: 28/01/2020, 23:10
