Ho Chi Minh City (TP.HCM)
2013
dce
COMPUTER ARCHITECTURE CE2013
Faculty of Computer Science and Engineering
Department of Computer Engineering
Vo Tan Phuong
http://www.cse.hcmut.edu.vn/~vtphuong
Chapter 4
Single-cycle & Pipeline Processor
Single-Cycle Processor Overview
[Figure: single-cycle datapath. Components: Next PC, Instruction Memory, Registers (RA, RB, RW, BusA, BusB, BusW), ALU, Data Memory, Main Control, and ALU Ctrl. Instruction fields: Rs, Rt, Rd, Imm16, Imm26. Control signals: RegDst, RegWrite, ExtOp, ALUSrc, J, Beq, Bne, MemRead, MemWrite, MemtoReg, PCSrc, ALUop.]
Exercise 1
Fill in the values of the control signals for the following instructions:
a. slt $t0, $s0, $zero
b. bne $t0, $zero, exit_label
Answer table to fill in, one row per instruction (a and b):
RegDst | RegWrite | ExtOp | ALUSrc | Beq | Bne | J | MemRead | MemWrite | MemtoReg
Exercise 2
• We wish to add the instruction jalr (jump and link register) to the single-cycle datapath. Add any necessary datapath and control signals and draw the resulting datapath. Show the values of the control signals that control the execution of the jalr instruction.
• The jump and link register instruction is described
below:
Exercise 2
• One solution:
(Comment: JReg means Jump Register; RA means Return Address)
Exercise 2
• The main control signals for the JALR instruction are the same as for other R-type instructions, such as ADD and SUB. These control signals are shown in the table below:
• The ALU Control signals for the JALR instruction are shown below: JReg = 1 and RA = 1. ALUCtrl is a don't care.
Exercise 3
We want to compare the performance of a single-cycle CPU design with a multi-cycle CPU. Suppose we add the multiply and divide instructions. The operation times are as follows:
o Instruction memory access time = 190 ps, Data memory access time = 190 ps
o Register file read access time = 150 ps, Register file write access time = 150 ps
o ALU delay for basic instructions = 190 ps, ALU delay for multiply or divide = 550 ps
Ignore the other delays in the multiplexers, control unit, sign-extension, etc.
Assume the following instruction mix: 30% ALU, 15% multiply & divide, 15% load, 15% store, 15% branch, and 10% jump.
a. What is the total delay for each instruction class and the clock cycle for the single-cycle CPU design?
b. Assume we fix the clock cycle at 200 ps for a multi-cycle CPU. What is the CPI for each instruction class, and what is the speedup over the single-cycle design with its fixed-length clock cycle?
Exercise 3
a. Total delay for each instruction class:
Clock cycle = max delay = 1040 ps
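The per-class delay table from the slide is not reproduced in this text, so the following is only a minimal sketch of how the 1040 ps figure can be checked. The unit delays come from the exercise statement. The list of units that each instruction class passes through (the `PATH` table) is an assumption, in particular the choice to count only the instruction fetch for a jump.

```python
# Unit delays in picoseconds, taken from the exercise statement.
DELAY_PS = {
    "imem": 190,        # instruction memory access
    "reg_read": 150,    # register file read
    "alu": 190,         # ALU, basic instructions
    "alu_muldiv": 550,  # ALU, multiply or divide
    "dmem": 190,        # data memory access
    "reg_write": 150,   # register file write
}

# Assumption (not from the slides): units on the critical path of each class.
PATH = {
    "ALU": ["imem", "reg_read", "alu", "reg_write"],
    "Multiply & Divide": ["imem", "reg_read", "alu_muldiv", "reg_write"],
    "Load": ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "Store": ["imem", "reg_read", "alu", "dmem"],
    "Branch": ["imem", "reg_read", "alu"],
    "Jump": ["imem"],
}


def total_delay_ps(units):
    """Sum the delays of the units an instruction passes through."""
    return sum(DELAY_PS[u] for u in units)


delays = {cls: total_delay_ps(units) for cls, units in PATH.items()}

# A single-cycle CPU must fit the slowest instruction in one clock cycle.
clock_cycle_ps = max(delays.values())

for cls, d in delays.items():
    print(f"{cls}: {d} ps")
print(f"Clock cycle = {clock_cycle_ps} ps")
```

Under these assumptions the multiply and divide class is the slowest (190 + 150 + 550 + 150 ps), which is what sets the clock cycle regardless of how the jump path is counted.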
Exercise 3
b. CPI for each instruction class:
CPI for Basic ALU = 4 cycles
CPI for Multiply & Divide = 6 cycles (ALU takes 3 cycles)
CPI for Load = 5 cycles
CPI for Store = 4 cycles
CPI for Branch = 3 cycles
CPI for Jump = 2 cycles
Average CPI = 0.3 * 4 + 0.15 * 6 + 0.15 * 5 + 0.15 * 4 + 0.15 * 3 + 0.1 * 2 = 4.1
Speedup of multi-cycle over single-cycle = (1040 * 1) / (200 * 4.1) = 1.27
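The arithmetic above can be reproduced with a short script. This is a sketch only: the instruction mix, the per-class CPI values, and the two clock periods are the ones given in the exercise, while the variable names are illustrative.

```python
import math

CLOCK_PS = 200          # multi-cycle clock period
SINGLE_CYCLE_PS = 1040  # single-cycle clock period from part (a)

# Instruction mix and per-class CPI, as given in the exercise and solution.
MIX = {"ALU": 0.30, "Multiply & Divide": 0.15, "Load": 0.15,
       "Store": 0.15, "Branch": 0.15, "Jump": 0.10}
CPI = {"ALU": 4, "Multiply & Divide": 6, "Load": 5,
       "Store": 4, "Branch": 3, "Jump": 2}

# A 550 ps multiply/divide needs this many 200 ps cycles in the ALU.
muldiv_alu_cycles = math.ceil(550 / CLOCK_PS)

# Weighted average CPI over the instruction mix.
average_cpi = sum(MIX[c] * CPI[c] for c in MIX)

# Time per instruction = clock period * CPI; single-cycle CPI is 1.
speedup = (SINGLE_CYCLE_PS * 1) / (CLOCK_PS * average_cpi)

print(f"ALU cycles for multiply/divide = {muldiv_alu_cycles}")
print(f"Average CPI = {average_cpi:.2f}")
print(f"Speedup = {speedup:.2f}")
```

The speedup is modest because the multi-cycle design needs 4.1 cycles per instruction on average, which offsets most of the gain from the shorter 200 ps clock.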
Exercise 4
• Identify all the RAW data dependencies in the following code. Which dependencies are data hazards that will be resolved by forwarding? Which dependencies are data hazards that will cause a stall? Using a graphical representation of the pipeline, show the forwarding paths and stalled cycles, if any.
add $3, $4, $2
sub $5, $3, $1
lw $6, 200($3)
add $7, $3, $6
Exercise 4
• RAW dependencies:
add $3, $4, $2 and sub $5, $3, $1 (forwarding)
add $3, $4, $2 and lw $6, 200($3) (forwarding)
lw $6, 200($3) and add $7, $3, $6 (stall 1 cycle, then forwarding)
add $3, $4, $2 and add $7, $3, $6 (read from the register file, no forwarding needed)
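The list above can be cross-checked with a small script. This is a sketch under stated assumptions, not the method used in the slides: the instructions are encoded by hand as (text, destination, sources) tuples, and `classify` assumes a 5-stage pipeline with full forwarding and a register file that is written before it is read within the same cycle. It uses the static distance between instructions and does not model how a stall shifts later instructions.

```python
# Each instruction: (text, destination register, source registers).
PROGRAM = [
    ("add $3, $4, $2", "$3", ["$4", "$2"]),
    ("sub $5, $3, $1", "$5", ["$3", "$1"]),
    ("lw $6, 200($3)", "$6", ["$3"]),
    ("add $7, $3, $6", "$7", ["$3", "$6"]),
]


def classify(producer_text, distance):
    """Assumed hazard rules; distance = consumer index - producer index."""
    if distance >= 3:
        # The producer has already written back when the consumer reads.
        return "from register file"
    if producer_text.startswith("lw") and distance == 1:
        # Load data is only available after the memory stage.
        return "stall 1 cycle, then forwarding"
    return "forwarding"


def raw_dependencies(program):
    """Return (producer, consumer, register, resolution) for each RAW pair."""
    deps = []
    for j, (consumer_text, _, sources) in enumerate(program):
        for src in sources:
            # Walk backwards to the most recent earlier writer of src.
            for i in range(j - 1, -1, -1):
                producer_text, dest, _ = program[i]
                if dest == src:
                    kind = classify(producer_text, j - i)
                    deps.append((producer_text, consumer_text, src, kind))
                    break
    return deps


for producer, consumer, reg, kind in raw_dependencies(PROGRAM):
    print(f"{producer} -> {consumer} on {reg}: {kind}")
```

The script reports the same four dependencies as the list above: three on $3 produced by the first add, and one on $6 produced by the lw.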
Exercise 5
• We have a program of 10^6 instructions in the format of “lw, add, lw, add, …”. The add instruction depends only on the lw instruction right before it. The lw instruction also depends only on the add instruction right before it. If this program is executed on the 5-stage MIPS pipeline:
a. Without forwarding, what would be the actual CPI?
It takes 6 cycles on average to complete one LW and one ADD.
1 cycle (to complete LW) + 2 cycles (bubbles) + 1 cycle (to complete ADD) + 2 cycles (bubbles) = 6 cycles
So, it takes 6 cycles to complete 2 instructions.
Average CPI = 6/2 = 3
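The cycle count above generalizes to any repeating instruction pattern. The sketch below uses a hypothetical helper, `average_cpi`, that takes the number of bubbles inserted after each instruction of the pattern. It ignores the few cycles needed to fill the pipeline, which are negligible for 10^6 instructions.

```python
def average_cpi(bubbles_after):
    """Average CPI of a repeating pattern.

    bubbles_after[i] is the number of stall cycles (bubbles) inserted
    after the i-th instruction of the pattern. Each instruction itself
    completes in 1 cycle once the pipeline is full.
    """
    instructions = len(bubbles_after)
    cycles = sum(1 + bubbles for bubbles in bubbles_after)
    return cycles / instructions


# Without forwarding: 2 bubbles after the lw and 2 bubbles after the add.
cpi_no_forwarding = average_cpi([2, 2])
print(cpi_no_forwarding)  # 3.0
```

The same helper applies to other bubble counts, for example when forwarding removes some of the stalls.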
b. With forwarding, what would be the actual CPI?
It takes only 3 cycles on average to complete one LW and one ADD.
1 cycle (to complete LW) + 1 cycle (bubble) + 1 cycle (to complete ADD) = 3 cycles
So, it takes 3 cycles to complete 2 instructions.