
Advanced Computer Architecture - Lecture 30: Memory hierarchy design


DOCUMENT INFORMATION

Basic information

Title: Memory Hierarchy Design
Instructor: Prof. Dr. M. Ashraf Chughtai
Institution: MAC/VU
Subject: Advanced Computer Architecture
Type: Lecture
Pages: 51
Size: 1.35 MB

Contents

Advanced Computer Architecture - Lecture 30: Memory hierarchy design. This lecture will cover the following: cache performance enhancement; reducing miss rate; classification of cache misses; reducing cache miss rate; way prediction and pseudo-associativity; compiler optimization;...

Page 1

CS 704

Advanced Computer Architecture

Lecture 30

Memory Hierarchy Design

Cache Performance Enhancement

(Reducing Miss Rate)

Page 2

Today’s Topics

Recap: Reducing Miss Penalty

Classification of Cache Misses

Reducing Cache Miss Rate

Summary

MAC/VU - Advanced Computer Architecture, Lecture 30: Memory Hierarchy (6)

Page 3

Recap: Improving Cache Performance

Four ways to improve cache performance:

Reduce the miss penalty

Reduce the miss rate

Reduce the miss penalty or miss rate via parallelism

Reduce the time to hit in the cache

Page 4

Recap: Reducing Miss Penalty

Multilevel Caches

Critical Word first and Early Restart

Priority to Read Misses Over writes

Merging Write Buffers

Victim Caches

Page 5

Recap: Reducing Miss Penalty

'Multilevel Caches': the more the merrier. Adding a second (or third) level of cache between the first-level cache and main memory reduces the penalty of a first-level miss.

Page 6

Recap: Reducing Miss Penalty

'Critical Word First and Early Restart' (intolerance)

Reduces the miss penalty by fetching the requested word first and letting the processor resume as soon as it arrives.

Page 7

Recap: Reducing Miss Penalty

'Priority to Read Misses over Writes' (favoritism)

Serves read misses before completing pending writes.

Page 8

Recap: Reducing Miss Penalty

'Merging Write Buffers' (acquaintance)

'Victim Caches' (salvage)

Page 9

Recap: Reducing Miss Penalty

Multilevel caches reduce the miss penalty; they can also reduce the miss rate.

Next: the classification of cache misses and methods to reduce the miss rate.

Page 11

Cache Misses - Classification

Cache misses fall into three categories (the three Cs): compulsory misses (the first access to a block), capacity misses (the cache cannot contain all the blocks the program needs), and conflict misses (too many blocks map to the same set).

Page 12

Cache Misses - Classification

Conflict Miss

Occurs when too many blocks map to the same cache set (or, in a direct-mapped cache, the same location), so blocks are evicted and later re-fetched.

Page 13

Cache Misses - Classification

Page 14

Cache Misses - Classification

Page 15

Reducing Miss Rate

1. Larger Block Size

2. Larger Caches

3. Higher Associativity

4. Way Prediction and Pseudo-associativity

5. Compiler Optimization

Page 16

1: Larger Block Size

Larger blocks reduce the miss rate by exploiting spatial locality: each miss brings in more adjacent data or instructions.

Page 17

1: Larger Block Size

In a small cache, larger blocks may increase the miss rate, since fewer blocks fit in the cache and conflict misses rise; larger blocks also increase the miss penalty.

Page 18

1: Larger Block Size

Page 19

1: Larger Block Size

Assumptions:

The memory system takes 80 clock cycles of overhead, then delivers 16 bytes every 2 clock cycles

Hit time is 1 clock cycle

Page 20

1: Larger Block Size: Solution

Average memory access time

= Hit time + Miss rate x Miss penalty

For the 4 KB cache with 32-byte blocks, the miss rate = 7.24%

Miss penalty = 80 + (2 x 2) = 84 clock cycles

Average memory access time = 1 + 0.0724 x 84 ≈ 7.08 clock cycles
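The arithmetic above can be checked with a short script. This is a sketch: the 80-cycle overhead, 16-bytes-per-2-cycles transfer rate, 1-cycle hit time, and 7.24% miss rate are the slide's numbers, but the helper names are mine.

```python
# AMAT model from the slide: AMAT = hit_time + miss_rate * miss_penalty.
# Assumed memory system: 80 cycles of overhead, then 16 bytes delivered
# every 2 clock cycles (the example's parameters).

def miss_penalty(block_size_bytes, overhead=80, bytes_per_beat=16, cycles_per_beat=2):
    # cycles to fetch one block: fixed overhead plus transfer time
    beats = block_size_bytes / bytes_per_beat
    return overhead + beats * cycles_per_beat

def amat(hit_time, miss_rate, penalty):
    return hit_time + miss_rate * penalty

penalty = miss_penalty(32)           # 80 + 4 = 84 cycles for a 32-byte block
print(penalty)                       # 84.0
print(amat(1, 0.0724, penalty))      # ≈ 7.08 cycles
```

Varying `block_size_bytes` reproduces the block-size trade-off: the penalty term grows linearly with block size while the miss rate falls only up to a point.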

Page 21

1: Larger Block Size: Solution

Page 22

1: Larger Block Size: Solution

(See Table 5.18, p. 428: miss rate versus block size for a range of cache sizes.)

Page 23

1: Larger Block Size: Solution

– The best block size depends on the latency and bandwidth of the lower-level memory

– High latency and high bandwidth favor larger blocks: each miss fetches many more bytes for only a small increase in miss penalty

Page 24

2: Larger Cache Size

A larger cache reduces capacity misses. In 2001, large caches were typical of 2nd-level and 3rd-level caches.

Drawbacks:

Longer hit time

Higher cost (and access time)

Page 27

3: Higher Associativity

Higher associativity reduces conflict misses. Two rules of thumb: an 8-way set-associative cache is, for practical purposes, about as effective as a fully associative one, and a direct-mapped cache of size N has about the same miss rate as a 2-way set-associative cache of size N/2. The drawback is a longer hit (clock cycle) time.

Page 28

4: Way Prediction and Pseudo-associativity

These techniques combine the fast hit time of direct-mapped caches with the lower conflict misses of set-associative caches.

Page 29

4: Way Prediction and Pseudo-associativity

Way Prediction

Extra bits are kept in the cache to predict which block in the set the next access will hit.

Steps:

1. Extra prediction bits select the block to try first.

Page 30

4: Way Prediction and Pseudo-associativity

The prediction can be 2-way or 4-way.

2. The multiplexer is set early to the predicted block, so only a single tag comparison is made in that clock cycle.

Page 31

4: Way Prediction and Pseudo-associativity

3. On a misprediction, the other blocks are checked for matches in subsequent clock cycles.

Example: the Alpha 21264 uses way prediction in its 2-way set-associative instruction cache. A correct prediction gives a latency of 1 clock cycle; a misprediction costs 3 clock cycles.
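The payoff of way prediction can be sketched with a tiny latency model. The 1-cycle and 3-cycle figures are the Alpha 21264 numbers from the slide; the 85% prediction accuracy below is an assumed value for illustration, not from the lecture.

```python
# Average hit latency of a way-predicted cache (a sketch, not the
# lecture's exact model): a correct prediction hits in 1 cycle, a
# misprediction pays 3 cycles (Alpha 21264 figures).

def avg_hit_latency(pred_accuracy, fast_hit=1, slow_hit=3):
    # weighted average over correctly and incorrectly predicted hits
    return pred_accuracy * fast_hit + (1 - pred_accuracy) * slow_hit

print(avg_hit_latency(0.85))  # assumed 85% accuracy -> about 1.3 cycles
```

The model makes the design point explicit: the technique only wins while prediction accuracy keeps the average close to the direct-mapped hit time.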

Page 32

4: Way Prediction and Pseudo-associativity

Pseudo-associative (also called column-associative) caches

The cache is accessed as in a direct-mapped cache. On a miss, a second cache entry, the "pseudo-set", is checked for the block before going to the lower level of the hierarchy.

Page 33

4: Way Prediction and Pseudo-associativity

Performance of pseudo-associative caches

There are now two hit times: a fast hit in the primary location and a "slower hit" when the block is found in the pseudo-set; too many slow hits would degrade performance.
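One common way to form the pseudo-set index, assumed here for illustration (the slide does not spell out the mapping), is to invert the most significant bit of the cache index:

```python
# Locating the "pseudo-set" entry to probe after a direct-mapped miss.
# Assumption: the pseudo-set is found by inverting the most-significant
# bit of the index, a common choice for column-associative caches.

def pseudo_index(index, index_bits):
    return index ^ (1 << (index_bits - 1))

# With a 64-set cache (6 index bits), set 5 probes set 37 second:
print(pseudo_index(5, 6))   # 37
print(pseudo_index(37, 6))  # 5 (the mapping is its own inverse)
```

Because the mapping is an involution, two conflicting blocks can each live in the other's pseudo-set, which is what recovers 2-way-like behavior.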

Page 35

5: Compiler Optimization

For instructions, code can be reordered without affecting correctness:

– Reordering procedures

– Using cache-line alignment

Trang 36

The cache-line alignment method decreases instruction cache misses by placing each entry point at the beginning of a cache block.

Page 39

5: Compiler Optimization: Loop Interchange

• Consider a program with nested loops that accesses data in non-sequential order for j (0 to 100) and in sequential order for i (0 to 5000); exchanging the nesting of the loops makes the code access the data in the order in which it is stored.
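The interchange can be sketched with the slide's loop bounds. In a row-major layout, swapping the loops turns a strided traversal into a sequential one; nested Python lists stand in for the 2-D array here, and the array name `x` is illustrative.

```python
# Loop interchange: same computation, different traversal order.
# x is a 5000 x 100 array stored row by row (row-major).
N_I, N_J = 5000, 100
x = [[1] * N_J for _ in range(N_I)]

# Before: j outer, i inner. Consecutive accesses are a whole row apart,
# so the spatial locality of each fetched cache block is wasted.
for j in range(N_J):
    for i in range(N_I):
        x[i][j] = 2 * x[i][j]

# After interchange: i outer, j inner. Accesses walk each row in the
# order it is stored, so every word of a fetched cache block is used.
for i in range(N_I):
    for j in range(N_J):
        x[i][j] = 2 * x[i][j]
```

Both versions compute the same result; only the memory access order, and hence the miss rate, differs.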

Page 43

5: Compiler Optimization:

Using Blocking

Arrays may be stored in 'row major order' (row-by-row) or 'column major order' (column-by-column). Blocking helps when an algorithm accesses arrays by both rows and columns, as in the iteration pattern of matrix multiplication.

Page 45

Whereas, if the cache can hold one N x N matrix and one row of N elements, then the full matrix Z and one (the i-th) row of Y can stay in the cache.

Page 46

5: Compiler Optimization:

Using Blocking

B is chosen such that one row of B elements and one B x B matrix can fit in the cache. This ensures that the y and z blocks are resident in the cache.

Let us look at the modified code, in which the two inner loops now compute in steps of size B (the blocking factor) rather than over the full N x N size of arrays X and Z.
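The blocked version described above can be sketched as follows. The small N and B are chosen for illustration only; the loop structure with steps of size B is the point.

```python
# Blocked (tiled) matrix multiply X = Y * Z: the inner loops work on
# B x B tiles so the active pieces of Y and Z stay resident in cache.
N, B = 8, 4  # illustration sizes; B is the blocking factor

Y = [[i + j for j in range(N)] for i in range(N)]
Z = [[i - j for j in range(N)] for i in range(N)]
X = [[0] * N for _ in range(N)]

for jj in range(0, N, B):          # tile of columns of Z (and X)
    for kk in range(0, N, B):      # tile of the shared dimension
        for i in range(N):
            for j in range(jj, min(jj + B, N)):
                r = 0
                for k in range(kk, min(kk + B, N)):
                    r += Y[i][k] * Z[k][j]
                X[i][j] += r

# sanity check against the straightforward triple loop
X2 = [[sum(Y[i][k] * Z[k][j] for k in range(N)) for j in range(N)]
      for i in range(N)]
assert X == X2
```

Instead of streaming entire rows and columns through the cache, each B x B tile of Z and each B-element strip of Y is reused many times before being evicted.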

Page 47

5: Compiler Optimization:

Using Blocking

Page 49

The way-prediction technique checks one section of the cache for a hit and then, on a miss, checks the rest of the cache.

The final technique, compiler optimization (loop interchange and blocking), is a software approach to optimizing cache performance.

Next time we will talk about the way to enhance performance by having the processor …

Page 50

Example: Average Memory Access Time vs. Miss Rate

Example: assume the clock cycle time (CCT) is 1.10 for 2-way, 1.12 for 4-way, and 1.14 for 8-way set-associative caches, relative to the CCT of a direct-mapped cache.
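The trade-off the example points at can be sketched numerically. Only the CCT factors come from the slide; the miss rates and the 25-cycle miss penalty below are assumed values for illustration.

```python
# AMAT vs associativity using the slide's clock-cycle-time scaling.
# Miss rates and the miss penalty are assumed, not the lecture's table.
cct = {1: 1.00, 2: 1.10, 4: 1.12, 8: 1.14}            # hit time vs direct-mapped
miss_rate = {1: 0.098, 2: 0.076, 4: 0.071, 8: 0.070}  # assumed values
miss_penalty = 25                                     # assumed, clock cycles

for ways in (1, 2, 4, 8):
    amat = cct[ways] + miss_rate[ways] * miss_penalty
    print(f"{ways}-way: AMAT = {amat:.2f} clock cycles")
```

With numbers like these, higher associativity wins until the slower clock outweighs the shrinking miss-rate benefit, which is exactly the comparison the example sets up.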

Page 51

Allah Hafiz

Posted: 05/07/2022, 11:55
