Advanced Computer Architecture - Lecture 45: Putting it all together

Advanced Computer Architecture - Lecture 45: Putting it all together. This lecture will cover the following: introduction and quantitative principles; instruction set architecture; computer hardware design; instruction level parallelism – dynamic; instruction level parallelism – static; memory hierarchy system;...


Today’s Topics

- Module 1: Introduction
- Module 2: Instruction Set Architecture
- Module 3: Computer Hardware Design
- Module 4: Instruction Level Parallelism – Dynamic
- Module 5: Instruction Level Parallelism – Static
- Module 6: Memory Hierarchy System
- Module 7: Multiprocessing
- Module 8: I/O Systems
- Module 9: Networks and Clusters

MAC/VU-Advanced

Computer Architecture Lecture 45 Putting It All together(2) 2


Module 1:

Introduction and Quantitative Principles

We started this course by distinguishing computer organization from computer architecture.

Architecture refers to those attributes of a computer that are visible to the programmer or compiler writer, e.g., the instruction set, memory addressing techniques and I/O mechanisms.


The architecture is the same for all members of a processor family, whereas the organization that implements that architecture may differ from one member of the family to another.


Module 1: Introduction …

Computer Development

We also introduced computer development from the academic and commercial perspectives.

Academically, modern computer development had its infancy in 1944-49, when John von Neumann introduced the concept of the stored-program computer, referred to as the Electronic Discrete Variable Automatic Computer (EDVAC).


Computer Development

Commercially, the first machine was built by the Eckert-Mauchly Computer Corporation in 1949.

In 1971, Intel introduced the first cheap microprocessor, the 4004, followed by the 80x86 series.

In 1998, more than 350 million microprocessors with different instruction set architectures were in use; this number had risen to more than a billion by 2006.


Computer Generations

Technological developments, from vacuum tubes to VLSI circuits, dynamic memory and network technology, gave birth to four different generations of computers.

This course has viewed computer architecture from four perspectives:

- Processor design
- Memory hierarchy
- Input/output and storage
- Multiprocessor and network interconnection


Module 1: Quantitative Principles

The key to quantitative analysis of the effectiveness of an entire computing system is the performance of its hardware and software.

In this respect, we discussed:

- Price-performance design
- CPU performance metrics
- CPU benchmark suites


Module 1: Price-Performance Design

The issue of cost-performance is a complex one.

At one extreme, the designer of a high-performance computer may give little importance to cost in achieving the performance goal.

At the other extreme, the designer of a low-cost machine may sacrifice performance to some extent.

Price-performance design lies between these extremes: the designer balances cost, and hence price, versus performance.


Module 1: CPU Benchmark Suites

To compare the performance of two machines, a user can simply compare the execution time of the same workload running on both machines.

In practice, users want to know how well a machine will perform on their workload without running their own programs on it.

This is accomplished by evaluating the machine with a set of benchmarks, i.e., programs specifically chosen to measure performance.


Module 1: CPU Benchmark Suites

Five levels of programs are used as benchmarks:

1. Real applications – scientific programs used to evaluate the performance of a machine
2. Modified applications – real applications with certain blocks modified to focus on the desired aspects of the application
3. Kernels – small key pieces extracted from real programs
4. Toy benchmarks – small codes normally used as beginning programming assignments
5. Synthetic benchmarks – small sections of artificially created programs


Module 1: Quantitative Principles of Performance Measurement

Quantitatively, the performance of a system can be enhanced by speeding up a fraction of the system, based on the principle of favoring the common case.

Amdahl’s Law is the basis for measuring this performance enhancement. It defines the speedup due to an enhancement E that accelerates a fraction F of the task as:


Module 1: Amdahl's Law

$$\text{Speedup}(E) = \frac{\text{Ex Time without Enhancement}}{\text{Ex Time with Enhancement}}$$
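As a hedged illustration of Amdahl's Law, the sketch below computes the overall speedup when a fraction F of the task is accelerated. The speedup factor of the enhanced part, called S here, and the function name `amdahl_speedup` are assumptions introduced for this example and do not appear in the lecture. The code uses the standard form Speedup = 1 / ((1 - F) + F / S).

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction F of the task runs S times faster.

    fraction_enhanced (F) and speedup_enhanced (S) are illustrative names.
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    # New execution time relative to the old one:
    # the untouched part (1 - F) plus the enhanced part F shortened by S.
    relative_time = (1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced
    return 1.0 / relative_time


if __name__ == "__main__":
    # Accelerating 40% of the task tenfold.
    print(amdahl_speedup(0.4, 10.0))
```

Accelerating 40% of a task tenfold yields an overall speedup of only about 1.56, because the untouched 60% dominates the new execution time.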


Module 2:

Instruction Set Architecture

The three pillars of computer architecture are:

- hardware
- instruction set
- software

The hardware runs the software, and the instruction set is the interface between the hardware and the software.

In discussing instruction set architecture, the focus of our discussion has been:


Module 2: Taxonomy of Instruction Set

The taxonomy of instruction sets was defined as:

- Stack architecture
- Accumulator architecture
- General-purpose register architecture
  - Register-memory
  - Register-register (load/store)
  - Memory-memory architecture (obsolete)
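The difference between these classes can be sketched by computing C = A + B on three tiny interpreters. This is only an illustration: the mnemonics (`push`, `load`, `add`, `store`), the register names and the function names are hypothetical and do not correspond to any real instruction set.

```python
def run_stack(mem):
    """Stack architecture: operands are implicit, taken from the top of the stack."""
    stack = []
    program = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
    for op, *args in program:
        if op == "push":
            stack.append(mem[args[0]])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "pop":
            mem[args[0]] = stack.pop()
    return len(program)


def run_accumulator(mem):
    """Accumulator architecture: one operand is implicitly the accumulator."""
    acc = 0
    program = [("load", "A"), ("add", "B"), ("store", "C")]
    for op, addr in program:
        if op == "load":
            acc = mem[addr]
        elif op == "add":
            acc += mem[addr]
        elif op == "store":
            mem[addr] = acc
    return len(program)


def run_load_store(mem):
    """Register-register (load/store): only loads and stores touch memory."""
    regs = {}
    program = [
        ("load", "R1", "A"),
        ("load", "R2", "B"),
        ("add", "R3", "R1", "R2"),
        ("store", "R3", "C"),
    ]
    for op, *args in program:
        if op == "load":
            regs[args[0]] = mem[args[1]]
        elif op == "add":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "store":
            mem[args[1]] = regs[args[0]]
    return len(program)


if __name__ == "__main__":
    for machine in (run_stack, run_accumulator, run_load_store):
        memory = {"A": 2, "B": 3}
        count = machine(memory)
        print(machine.__name__, "instructions:", count, "C =", memory["C"])
```

Each function returns the number of instructions it executed, which makes the trade-off visible: the accumulator version is the shortest, while the load/store version needs more instructions but keeps every arithmetic operand in a register.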


Module 2: Types of Operands and Operations

Arithmetic, data transfer, control and support operations


Module 2: Types of Operand Addressing Modes

Operand addressing modes: immediate, register, direct (absolute) and indirect

Classification of indirect addressing: register, indexed, relative (i.e., with displacement) and memory

Special addressing modes: auto-increment, auto-decrement and scaled

Control instruction addressing modes: branch, jump and procedure call/return
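The memory addressing modes listed above differ only in how the effective address is formed. The following sketch shows one plausible way to compute it; the mode names, the parameter names (`base`, `index`, `disp`, `scale`) and the function itself are assumptions made for illustration, not definitions taken from the lecture.

```python
def effective_address(mode, regs, mem, base=None, index=None, disp=0, scale=1):
    """Return the memory address an operand refers to under a given mode.

    regs maps register names to values; mem maps addresses to values.
    Auto-increment and auto-decrement also update the base register.
    """
    if mode == "direct":              # address is given in the instruction
        return disp
    if mode == "register_indirect":   # address is held in a register
        return regs[base]
    if mode == "displacement":        # register plus constant offset
        return regs[base] + disp
    if mode == "indexed":             # sum of two registers
        return regs[base] + regs[index]
    if mode == "memory_indirect":     # register points to a memory word holding the address
        return mem[regs[base]]
    if mode == "scaled":              # offset + base + index * element size
        return disp + regs[base] + regs[index] * scale
    if mode == "auto_increment":      # use the address, then advance the register
        address = regs[base]
        regs[base] += scale
        return address
    if mode == "auto_decrement":      # step the register back, then use it
        regs[base] -= scale
        return regs[base]
    raise ValueError(f"unknown addressing mode: {mode}")


if __name__ == "__main__":
    registers = {"R1": 100, "R2": 3}
    memory = {100: 200}
    print(effective_address("scaled", registers, memory,
                            base="R1", index="R2", disp=16, scale=4))
```

Immediate and register operands are not covered because they never form a memory address: the value comes from the instruction itself or directly from a register.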


Module 3: Computer Hardware design

- Basic building blocks of a computer
- Sub-systems of CPU: datapath and control
- Processor design steps
- Processor design parameters
- Hardware design process
- Timing signals
- Uni-bus, 2-bus and 3-bus structures
- 3-bus-based single-cycle datapath


Sub-systems of Central Processing Unit

At a “higher level”, a CPU can be viewed as consisting of two sub-systems:

- Datapath: the path that facilitates the transfer of information from one part of the system (register, memory or I/O) to another
- Control: the hardware that generates signals to control the sequence of steps and direct the flow of information through the datapath


Module 3: Datapath Implementations

The datapath is the arithmetic organ of von Neumann’s stored-program computer.

Its implementations are based on the concepts of single-cycle, multiple-cycle and pipelined architecture.


Module 3: Datapath Implementation

It consists of registers, internal buses,

arithmetic units and shifters

Each register in the register file has:

- a load control line that enables data load to


Module 3: Single/Multiple Cycle Approach

In the single-cycle implementation, the cycle time is set to accommodate the longest instruction, the load instruction.

In the multiple-cycle implementation, the cycle time is set to accommodate the longest step, the memory read/write.

Consequently, the cycle time of the single-cycle implementation can be five times longer than that of the multiple-cycle implementation.
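The relation between the two cycle times can be illustrated with a small calculation. The step latencies and the step lists per instruction class below are assumed values chosen for this sketch; the lecture gives no concrete numbers.

```python
# Assumed latency of each step in nanoseconds (illustrative values only).
STEP_LATENCY_NS = {
    "fetch": 2.0,
    "decode": 1.0,
    "execute": 2.0,
    "memory": 2.0,
    "writeback": 1.0,
}

# Assumed steps used by each instruction class.
INSTRUCTION_STEPS = {
    "load": ["fetch", "decode", "execute", "memory", "writeback"],
    "store": ["fetch", "decode", "execute", "memory"],
    "alu": ["fetch", "decode", "execute", "writeback"],
    "branch": ["fetch", "decode", "execute"],
}


def single_cycle_time_ns():
    """Cycle time must fit the slowest whole instruction (here the load)."""
    return max(
        sum(STEP_LATENCY_NS[step] for step in steps)
        for steps in INSTRUCTION_STEPS.values()
    )


def multi_cycle_time_ns():
    """Cycle time must fit only the slowest single step."""
    return max(STEP_LATENCY_NS.values())


def multi_cycle_instruction_time_ns(kind):
    """Each step of the instruction takes one (short) cycle."""
    return len(INSTRUCTION_STEPS[kind]) * multi_cycle_time_ns()


if __name__ == "__main__":
    print("single-cycle clock:", single_cycle_time_ns(), "ns")
    print("multi-cycle clock:", multi_cycle_time_ns(), "ns")
    print("branch, multi-cycle:", multi_cycle_instruction_time_ns("branch"), "ns")
```

With these assumed latencies the single-cycle clock is four times longer than the multi-cycle clock; the factor reaches five only when all five steps of the load take the same time.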


Module 3: Pipelined Datapath

Pipelining is a fundamental concept in which an instruction is completed in multiple steps using distinct resources.

It exploits the capabilities of the datapath by starting the next instruction while still working on the current one.

The pipelined datapath may encounter three types of hazards: structural, data and control.


Module 3: Pipeline Hazards

Structural hazards occur when the same resource is accessed by more than one instruction at the same time; e.g., a single memory port or a single register write port.

They can be removed either by using multiple resources or by inserting a stall.

A stall degrades pipeline performance.


Module 3: Pipeline Hazards

Data hazards occur when an attempt is made to read invalid data, i.e., a value that an earlier instruction has not yet written.

Data hazards can be removed by using stall and forwarding techniques.
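The effect of stalling versus forwarding can be sketched for a read-after-write dependence. The model below assumes a classic five-stage pipeline (fetch, decode, execute, memory, write-back) whose register file is written in the first half of a cycle and read in the second half; these assumptions, and the function name, are introduced for this example and are not stated in the lecture.

```python
def raw_stall_cycles(distance, producer_is_load, forwarding):
    """Stall cycles seen by an instruction that reads a register written
    by an earlier instruction `distance` slots before it (1 = adjacent).

    Assumed model: five-stage pipeline, split-cycle register file.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if forwarding:
        # An ALU result is forwarded straight to the next instruction.
        # A loaded value is only available after the memory stage, so an
        # immediately following consumer still waits one cycle.
        return 1 if (producer_is_load and distance == 1) else 0
    # Without forwarding the consumer must decode in the cycle in which
    # the producer writes back, i.e. three slots behind the producer.
    return max(0, 3 - distance)


if __name__ == "__main__":
    print("adjacent ALU pair, no forwarding:", raw_stall_cycles(1, False, False))
    print("adjacent ALU pair, forwarding:", raw_stall_cycles(1, False, True))
    print("load followed by its consumer, forwarding:", raw_stall_cycles(1, True, True))
```

Under this model, forwarding removes all stalls between dependent ALU instructions, while a load followed immediately by its consumer still costs one stall cycle.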

Control hazards occur when an attempt is

made to branch prior to the evaluation of the condition

Four ways to handle control hazards


Module 3: Four Ways to Handle Control Hazards

1: Stall until branch direction is clear
2: Predict branch not taken: execute successor instructions in sequence
3: Predict branch taken


Module 4: Instruction Level Parallelism

A simple pipeline performs in-order execution.

To enhance the performance of the pipeline, we want to begin execution of an instruction as soon as its data operands are available, i.e., out-of-order execution.

Out-of-order execution may introduce data hazards of type WAR and WAW.

Instruction level parallelism can be exploited by hardware or by software.
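To make the hazard types concrete, the sketch below classifies the potential hazards between two instructions given in program order. The tuple representation `(destination, sources)` and the function name are assumptions made for this illustration. RAW is included alongside the WAR and WAW types named above because it is the hazard that already exists with in-order execution.

```python
def classify_hazards(first, second):
    """Return the set of potential data hazards between two instructions.

    Each instruction is (destination_register, source_registers), and
    `first` precedes `second` in program order.
    """
    first_dest, first_sources = first
    second_dest, second_sources = second
    hazards = set()
    if first_dest is not None and first_dest in second_sources:
        hazards.add("RAW")  # second reads what first writes
    if second_dest is not None and second_dest in first_sources:
        hazards.add("WAR")  # second overwrites what first still has to read
    if first_dest is not None and first_dest == second_dest:
        hazards.add("WAW")  # both write the same register
    return hazards


if __name__ == "__main__":
    divide = ("F0", ("F2", "F4"))    # F0 <- F2 / F4
    subtract = ("F2", ("F6", "F8"))  # F2 <- F6 - F8
    print(classify_hazards(divide, subtract))
```

In the usage example the second instruction overwrites F2, which the first one still has to read, so reordering the two would be unsafe without further measures.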


Module 4: Instruction Level Parallelism

In SW parallelism, the dependences are defined by the program; they result in hazards if the HW cannot resolve them.

HW techniques for exploiting ILP work even when a dependence cannot be determined at compile time.

These hardware techniques for exploiting ILP are referred to as dynamic scheduling techniques.


Dynamic scheduling is accomplished by dividing the ID stage into two parts:

- Issue: instructions are issued in order
- Read operands: operands are read out of order

Structural and data dependencies are checked at the ID stage.

This facilitates out-of-order execution, which results in out-of-order completion.


Recap: ILP- Dynamic Scheduling


We discussed scoreboarding and Tomasulo’s algorithm as the basic concepts for dynamic scheduling in integer and floating-point datapaths.

The structures implementing these concepts facilitate out-of-order execution to minimize the impact of data dependencies and thus avoid data hazards without stalling.


Tomasulo’s approach aims at high performance without special compilers.

Here, the control and buffers are distributed with the function units (FU).

Register operands in instructions are replaced by values or pointers to reservation stations (RS); i.e., the registers are renamed.

Unlike the scoreboard, Tomasulo’s approach can have multiple loads outstanding.
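The effect of register renaming can be sketched in isolation. The code below is a simplification and not Tomasulo's algorithm itself: every result simply gets a fresh tag (`T0`, `T1`, ...), a hypothetical stand-in for the reservation-station pointer, and later readers are redirected to the newest tag. The function name and the instruction format `(dest, src1, src2)` are assumptions made for this example.

```python
from itertools import count


def rename_registers(program):
    """Rename destinations so that only true (RAW) dependences remain.

    program is a list of (dest, src1, src2) tuples of architectural
    register names in program order.
    """
    fresh_tags = count()
    latest = {}  # architectural register -> tag of its newest pending value
    renamed = []
    for dest, *sources in program:
        # Sources read the newest tag if an earlier instruction produces them.
        new_sources = tuple(latest.get(src, src) for src in sources)
        new_dest = f"T{next(fresh_tags)}"
        latest[dest] = new_dest
        renamed.append((new_dest, *new_sources))
    return renamed


if __name__ == "__main__":
    code = [
        ("F0", "F2", "F4"),    # DIV  F0 <- F2 / F4
        ("F6", "F0", "F8"),    # ADD  F6 <- F0 + F8
        ("F8", "F10", "F14"),  # SUB  F8 <- F10 - F14  (WAR on F8 with ADD)
        ("F6", "F10", "F8"),   # MUL  F6 <- F10 * F8   (WAW on F6 with ADD)
    ]
    for instruction in rename_registers(code):
        print(instruction)
```

After renaming, the subtract writes `T2` instead of F8 and the multiply writes `T3` instead of F6, so the WAR and WAW conflicts with the add disappear while the true dependences (add on `T0`, multiply on `T2`) are preserved.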


We also discussed branch-prediction techniques and different types of branch predictors, which are used to reduce the number of stalls due to control hazards.
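One common predictor type is the 2-bit saturating counter. The sketch below is a minimal model of it, offered as an example of a dynamic branch predictor rather than as the specific design covered in the lecture; the table size, the indexing by program counter and all names are assumptions.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by the branch address.

    Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
    """

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries  # assumed start: weakly not taken

    def _index(self, pc):
        # Drop the two low bits (assumed 4-byte instructions), then wrap.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, pc, outcomes):
    """Run one branch through the predictor and count wrong predictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


if __name__ == "__main__":
    # A loop branch: taken nine times, then falls through; executed twice.
    loop_outcomes = ([True] * 9 + [False]) * 2
    print(count_mispredictions(TwoBitPredictor(), 0x40, loop_outcomes))
```

Because two consecutive wrong guesses are needed to flip the prediction, the loop exit costs only one misprediction per pass and the next entry into the loop is still predicted taken.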

The concept of multiple instruction issue was discussed in detail.

This concept is used to reduce the CPI to less than one, thus enhancing the performance of the processor.


We studied extensions to Tomasulo’s structure that include hardware-based speculation.

Speculation assumes that a branch is correctly predicted; instructions may thus execute out of order but commit in order, once it is confirmed that the speculation was correct and no exceptions exist.


The major hardware-based techniques studied are summarized here:

Technique | Hazard type whose stalls are reduced
… bypass | Data hazard stalls
… and branch scheduling | Control hazard stalls
… scheduling (scoreboarding) | … hazard stalls from …


Dynamic Scheduling Techniques Cont’d

Technique | Hazard type whose stalls are reduced
Dynamic scheduling with renaming (Tomasulo’s approach) | Stalls from data hazards, from anti-dependences and from output dependences


Module 5: Static Approach for ILP

Processors that issue multiple instructions per cycle are rated as high-performance processors.

These processors exist in a variety of flavors, such as:


Module 5: Static Approach for ILP

Superscalar processors exploit ILP using static as well as dynamic scheduling approaches.

VLIW processors, on the other hand, exploit ILP using static scheduling only.

The major software scheduling techniques discussed for reducing data and control stalls are as follows:


Introduction to Static Scheduling in ILP

- Basic Compiler Data


Module 6: Memory Hierarchy System

Here, we discussed how the gap between the speed of the processor and that of the storage devices (SRAM, DRAM and disk) is increasing with time.

We studied that, in order to obtain high-speed storage at the cheapest cost per byte, different types of memory modules are organized in a hierarchy based on the:

- Concept of caching
- Principle of locality


The principle of locality states that, to obtain the data or instructions of a program, the processor accesses, at any instant of time, a relatively small portion of the address space, held in the fastest memory closest to the processor.

There are two different types of locality:

- Temporal locality is locality in time: an item that has been referenced tends to be referenced again soon
- Spatial locality is locality in space: items whose addresses are close to a referenced item tend to be referenced soon
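Both kinds of locality show up directly in the hit rate of a small cache. The sketch below simulates a direct-mapped cache; the cache geometry (8 blocks of 4 words), the use of word addresses and the function name are assumptions chosen only to keep the example short.

```python
def hit_rate(addresses, num_blocks=8, block_size=4):
    """Fraction of accesses that hit in a direct-mapped cache.

    addresses are word addresses; a block holds block_size consecutive
    words, and each block number maps to exactly one cache line.
    """
    if not addresses:
        raise ValueError("addresses must not be empty")
    lines = [None] * num_blocks  # block number currently held by each line
    hits = 0
    for address in addresses:
        block_number = address // block_size
        line = block_number % num_blocks
        if lines[line] == block_number:
            hits += 1
        else:
            lines[line] = block_number  # miss: fetch the whole block
    return hits / len(addresses)


if __name__ == "__main__":
    print("repeated word (temporal):", hit_rate([0] * 10))
    print("sequential words (spatial):", hit_rate(list(range(32))))
    print("stride of one block (no locality):", hit_rate(list(range(0, 128, 4))))
```

Re-reading one word misses only once, and a sequential scan misses once per block and then hits on its neighbours, whereas a stride equal to the block size touches a new block on every access and never hits.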


The concept of caching states that a small amount of the fastest and most expensive storage be used as a staging area or temporary place to:

- store a frequently used subset of the data or instructions from the relatively cheaper, larger and slower memory; and
- avoid having to go to the main memory every time this information is needed

The performance of a cache is limited by different types of penalties.


Then we talked about four options to improve cache performance.

These options are to reduce:

- the miss penalty
- the miss rate
- the miss penalty or miss rate via parallelism
- the time to hit in the cache
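These options map onto the terms of the standard average memory access time formula, AMAT = hit time + miss rate × miss penalty. The formula is not quoted in this part of the lecture, so the sketch below, including its function name and the sample numbers, should be read as an assumed illustration.

```python
def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty (all times in cycles)."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be in [0, 1]")
    return hit_time + miss_rate * miss_penalty


if __name__ == "__main__":
    baseline = average_memory_access_time(1.0, 0.05, 100.0)
    lower_miss_rate = average_memory_access_time(1.0, 0.025, 100.0)
    lower_penalty = average_memory_access_time(1.0, 0.05, 50.0)
    print(baseline, lower_miss_rate, lower_penalty)
```

With the sample numbers, halving either the miss rate or the miss penalty brings the average access time from about 6 cycles down to about 3.5 cycles, while reducing the hit time lowers the cost of every access.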

Title	Putting It All Together
Instructor	Prof. Dr. M. Ashraf Chughtai
School	MAC/VU
Major	Advanced Computer Architecture
Type	Lecture
