Advanced Computer Architecture - Lecture 28: Memory Hierarchy Design. This lecture covers cache design and policies: block placement and identification, replacement policies, cache write strategy, write-miss policy, and the memory hierarchy designer's concerns.
Page 1
CS 704
Advanced Computer Architecture
Lecture 28
Memory Hierarchy Design
(Cache Design and Policies)
Prof. Dr. M. Ashraf Chughtai
Page 2
Today's Topics
Recap: Cache Addressing Techniques
Placement and Replacement Policies
Cache Write Strategy
Cache Performance Enhancement
Summary
Page 3
Recap: Block Size Trade-off
Impact of block size on cache performance, and categories of cache design
The trade-off of block size versus the miss rate, miss penalty, and average access time, the basic CPU performance metrics
Page 4
Recap: Block Size Trade-off
– A larger block size reduces the miss rate, but if the block size is too big relative to the cache size, the miss rate goes back up; and
– the miss penalty goes up as the block size increases; and
– combining these two parameters gives the third parameter, the average access time.
Page 5
Recap: Cache Organizations
Based on the block placement policy, we studied three cache organizations.
Page 6
Recap: Cache Organizations
– Direct mapped, where each block has only one place it can appear in the cache (a source of conflict misses);
– Fully associative, where any block of main memory can be placed anywhere in the cache; and
– Set associative, which allows a block to be placed in a set of places in the cache.
Page 7
Memory Hierarchy Designer's Concerns
Block placement: Where can a block be
placed in the upper level?
Block identification: How is a block found if
it is in the upper level?
Block replacement: Which block should be replaced on a miss?
Write strategy: What happens on a write?
Page 8
Block Placement Policy
Fully associative: a block can be placed anywhere in the upper level (cache).
E.g., block 12 from main memory can be placed at block 2, block 6, or any of the 8 block locations in the cache.
Page 9
Block Placement Policy
Set associative: a block can be placed anywhere within one set in the upper level (cache).
The set number in the upper level is given as:
(Block number) MOD (Number of sets)
E.g., an 8-block, 2-way set-associative cache has 4 sets [0-3] of two blocks each; therefore block 12 or block 16 of main memory can go anywhere in set 0, as 12 MOD 4 = 0 and 16 MOD 4 = 0.
Similarly, block 14 can be placed at either location of set 2, as 14 MOD 4 = 2.
Page 10
Block Placement Policy
Direct mapped (1-way associative): a block can be placed at only one specific location in the upper level (cache).
The location in the cache is given by:
(Block number) MOD (Number of cache blocks)
E.g., block 12 or block 20 can be placed only at location 4 in a cache of 8 blocks, as:
12 MOD 8 = 4
20 MOD 8 = 4
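The two placement rules above can be sketched in a few lines of Python, using the same block numbers as the slides' examples:

```python
def direct_mapped_index(block_no, num_cache_blocks):
    # Direct mapped: the block can go to exactly one cache location.
    return block_no % num_cache_blocks

def set_index(block_no, num_sets):
    # Set associative: the block maps to one set and may use any way in it.
    return block_no % num_sets

# Examples from the slides: an 8-block cache, also organized 2-way (4 sets).
print(direct_mapped_index(12, 8))  # 4
print(direct_mapped_index(20, 8))  # 4
print(set_index(12, 4))            # 0
print(set_index(16, 4))            # 0
print(set_index(14, 4))            # 2
```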
Page 11
Block Identification
How is a block found if it is in the upper level? Tag/Block:
A TAG is associated with each block frame; the TAG gives the block address.
All the possible TAGs where a block may be placed are checked in parallel.
A valid bit is used to identify whether the block contains correct data.
– There is no need to check the index or the block offset.
Page 12
Block Identification: Direct Mapped
[Figure: a 1 KB direct-mapped cache with 32-byte blocks, addressed from a 4 GB (32-bit address) lower-level main memory; bits 4-0 of the address are the byte select (e.g., 0x00), bits 9-5 the 5-bit cache index, and bits 31-10 the cache tag.]
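The address breakdown in the figure can be reproduced with a short sketch, assuming the slide's parameters (1 KB direct-mapped cache, 32-byte blocks); the example address is hypothetical:

```python
def split_address(addr, block_size=32, cache_size=1024):
    # Assumed parameters from the slide: 1 KB cache, 32-byte blocks.
    num_blocks = cache_size // block_size         # 32 blocks
    offset_bits = block_size.bit_length() - 1     # 5-bit byte select
    index_bits = num_blocks.bit_length() - 1      # 5-bit cache index
    byte_select = addr & (block_size - 1)
    index = (addr >> offset_bits) & (num_blocks - 1)
    tag = addr >> (offset_bits + index_bits)      # remaining upper bits
    return tag, index, byte_select

# A hypothetical 32-bit address, purely for illustration:
print(split_address(0x00000124))  # (0, 9, 4)
```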
Page 13
Block Identification
[Figure: a cache with a 4-bit cache index (address bits 8-5), a byte select (bits 4-0), and a cache tag (bits 31-9); each block holds Byte 0, Byte 1, ..., with Byte 32, Byte 33, ... starting the next block.]
Page 14
Block Replacement Policy
In case of a cache miss, a new block needs to be brought in.
If the block locations defined by the block placement policy are all filled, then an existing block has to be evicted, based on:
– the cache mapping; and
– some block replacement policy.
Page 15
Block Replacement Policy
For the direct-mapped cache, block replacement is very simple, as a block can be placed at only one location, given by:
(Block No.) MOD (Number of cache blocks)
There are three commonly used schemes for fully and set-associative mapped caches.
These policies are:
Page 16
Block Replacement Policy
Random: replace any block
– It is the simplest and easiest to implement.
– The candidates for replacement are randomly selected.
– Some designers use pseudo-random block numbers.
Page 17
Block Replacement Policy
Least Recently Used (LRU): replace the block that was either never used or was used longest ago
– It reduces the chances of throwing out information that may be needed soon.
– Here, the access time and the number of times a block is accessed are recorded.
– The block replaced is the one that has not been used for the longest time.
– E.g., if the blocks are accessed in the sequence 0, 2, 3, 0, 4, 3, 0, 1, 8, 0, the victim to replace is block 2, the least recently used block.
Page 18
Block Replacement Policy
First-in, First-out (FIFO): the block placed in the cache first is thrown out first; e.g., if the blocks are accessed in the sequence 2, 3, 4, 5, 3, 4, then to bring a new block into the cache, block 2 will be thrown out, as it is the oldest block in the sequence.
FIFO is used as an approximation to LRU, as LRU can be complicated to compute.
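A minimal sketch of how the two victims in the examples above are chosen (plain Python over an access sequence, not a full cache model):

```python
def lru_victim(access_seq):
    # LRU: evict the block whose most recent access is the oldest.
    last_use = {}
    for time, block in enumerate(access_seq):
        last_use[block] = time
    return min(last_use, key=last_use.get)

def fifo_victim(access_seq):
    # FIFO: evict the block that entered the cache earliest.
    entered = []
    for block in access_seq:
        if block not in entered:
            entered.append(block)
    return entered[0]

print(lru_victim([0, 2, 3, 0, 4, 3, 0, 1, 8, 0]))  # 2
print(fifo_victim([2, 3, 4, 5, 3, 4]))             # 2
```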
Page 19
Block Replacement Policy: Conclusion
[Table: miss rates of the replacement policies at 2-way, 4-way, and 8-way associativity.]
Page 20
Write Strategy
A cache block must not be overwritten unless main memory is up to date:
– Multiple CPUs may have individual caches
– I/O may address main memory directly
Memory is accessed for both read and write purposes.
Page 21
Write Strategy Cont'd
The instruction cache accesses are all reads.
Instruction issue dominates the cache traffic, as writes are typically 10% of the cache accesses.
Furthermore, in the data cache, writes are 10%-20% of the overall memory accesses.
Page 22
Write Strategy Cont'd
In order to optimize the cache performance, according to Amdahl's law, we make the common case fast.
Fortunately, the common case, i.e., the cache read, is easy to make fast:
– Reads can be optimized by performing the tag checking and the data transfer in parallel.
Thus, the read performance of the cache is good.
Page 23
Write Strategy Cont'd
However, in case of a cache write, modification of the cache contents cannot begin until the tag is checked for an address hit.
Therefore, the cache write cannot proceed in parallel with the tag checking.
Another complication is that the processor specifies the size of the write, which is usually only a portion of the block.
Page 24
Write Strategy Cont'd
There are two basic write policies:
– Write back — The information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.
– Write through — The information is written to both the block in the cache and to the block in the lower-level memory.
Page 25
Write Strategy: Pros and Cons of Each
Write back:
– No write to the lower level for repeated writes to the cache.
– A dirty bit is commonly used to indicate the status, i.e., whether the cache block is modified (dirty) or not modified (clean).
– Reduces the memory-bandwidth requirements, and hence the memory power requirements.
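The dirty-bit mechanism above can be sketched as follows (hypothetical names, a single block frame, no tag matching or replacement policy):

```python
class CacheBlock:
    def __init__(self):
        self.valid = False
        self.dirty = False
        self.tag = None
        self.data = None

def write(block, tag, data):
    # Repeated writes only touch the cache; the block is marked dirty.
    block.tag, block.data = tag, data
    block.valid, block.dirty = True, True

def evict(block, memory):
    # On replacement, write back to memory only if the block is dirty.
    if block.valid and block.dirty:
        memory[block.tag] = block.data
    block.valid = block.dirty = False

memory = {}
b = CacheBlock()
write(b, 12, "AAAA")   # first write: cache only
write(b, 12, "BBBB")   # repeated write: still no memory traffic
evict(b, memory)       # a single write-back on replacement
print(memory)          # {12: 'BBBB'}
```

However many times the block is written, the lower level sees only one write, at eviction; this is the bandwidth saving named above.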
Page 26
Write Strategy: Pros and Cons of Each
Write through:
– Simplifies the replacement procedure: the block is always clean, so unlike the write-back strategy, read misses cannot result in writes to the lower level.
– Always combined with write buffers so that the CPU does not wait for the lower-level memory.
– Simplifies data coherency, as the next lower level has the most recent copy (we will discuss this later).
Page 27
Write Buffer for Write Through
[Figure: processor writes go through the cache into a write buffer, which drains to the DRAM.]
Page 28
Write Buffer for Write Through
The write buffer is just a FIFO; a typical number of entries is 4.
Once the data is written into the write buffer, and assuming a cache hit, the CPU is done with the write.
The memory controller will then move the write buffer's contents to the real memory behind the scenes.
Page 29
Write Buffer for Write Through
The DRAM cycle time sets the upper limit on how frequently you can write to the main memory.
The write buffer works as long as the frequency of stores, with respect to time, is not too high, i.e.,
Store frequency << 1 / (DRAM write cycle time)
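The condition above can be checked numerically; the DRAM cycle time below is a hypothetical figure, purely for illustration:

```python
def buffer_keeps_up(stores_per_second, dram_write_cycle_s):
    # The buffer drains only if stores arrive slower than DRAM can retire them.
    return stores_per_second < 1.0 / dram_write_cycle_s

# Hypothetical: DRAM write cycle = 100 ns -> service rate = 10 million writes/s
print(buffer_keeps_up(1e6, 100e-9))   # True: 1M stores/s is sustainable
print(buffer_keeps_up(20e6, 100e-9))  # False: the write buffer saturates
```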
Page 30
Write Buffer for Write Through
If the stores are too close together, or the CPU is so much faster than the DRAM cycle time, you can end up overflowing the write buffer, and the CPU must stall and wait.
A memory system designer's nightmare is when the store frequency with respect to time approaches 1 over the DRAM write cycle time, i.e.,
Store frequency -> 1 / (DRAM write cycle time)
We call this write buffer saturation.
Page 31
Write Buffer Saturation
In that case, it does NOT matter how big you make the write buffer; the write buffer will still overflow, because you are simply feeding things into it faster than you can empty it.
There are two solutions to this problem: install a second-level (L2) cache between the write buffer and the main memory, or replace this write-through cache with a write-back cache.
Page 32
Write Buffer Saturation
[Figure: Processor -> Cache -> Write Buffer -> L2 Cache -> DRAM; a second-level cache is inserted between the write buffer and the main memory.]
Page 33
Write-Miss Policy
No-write allocate: usually the write misses do not affect the cache; rather, the block is modified only in the lower-level memory.
Page 34
Write-Miss Policy
The blocks stay out of the cache in no-write allocate until the program tries to read the blocks, but
the blocks that are only written will still be in the cache with write allocate.
Let us discuss this with the help of an example.
Let's look at our 1 KB direct-mapped cache again.
Page 35
[Figure: the 1 KB direct-mapped cache again, with cache tag, cache index, and byte select (e.g., 0x00) fields; Byte 0 and Byte 1 start the first block, Byte 32 and Byte 33 the next.]
Page 36
Write-Miss Policy
Assume we do a 16-bit write to memory location 0x000000 that causes a cache miss in our 1 KB direct-mapped cache with 32-byte blocks.
After we write the cache tag into the cache and write the 16-bit data into Byte 0 and Byte 1, do we have to read the rest of the block (Byte 2, Byte 3, ... Byte 31) from memory?
If we do read the rest of the block in, it is called write allocate.
Page 37
Write-Miss Policy
The principle of spatial locality implies that we are likely to access them soon.
But the type of access we are going to do is likely to be another write.
Page 38
Write-Miss Policy
So even if we do read in the data, we may end up overwriting it anyway; so it is a common practice to NOT read in the rest of the block on a write miss.
If you don't bring in the rest of the block, or, to use the more technical term, write not allocate, you had better have some way to tell the processor that the rest of the block is no longer valid.
Page 39
No-write allocate versus write allocate: Example
Let us consider a fully associative write-back cache with cache entries that start empty.
Consider the following sequence of five memory operations and find the number of hits and misses when using no-write allocate versus write allocate.
Page 40
No-write allocate versus write allocate: Example
The sequence of memory operations:
Write Mem[100]
Write Mem[100]
Read Mem[200]
Write Mem[200]
Write Mem[100]
Page 41
No-write allocate versus write allocate: Example
With no-write allocate, the first two writes will result in MISSES.
Address [200] is also not in the cache, so the read is also a miss.
The subsequent write to [200] is a hit.
The last write to [100] is still a miss.
The result is 4 MISSes and 1 HIT.
Page 42
No-write allocate versus write allocate: Example
For the write-allocate policy:
The first accesses to [100] and [200] are MISSES.
The rest are HITS, as [100] and [200] are both found in the cache.
The result is 2 MISSes and 3 HITs.
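The two tallies above can be reproduced with a small sketch of a fully associative cache that starts empty (a hypothetical helper, not code from the lecture):

```python
def count_hits_misses(ops, write_allocate):
    # ops: list of ('R' or 'W', address); fully associative cache, starts empty.
    cache = set()
    hits = misses = 0
    for op, addr in ops:
        if addr in cache:
            hits += 1
        else:
            misses += 1
            # Reads always allocate; writes allocate only under write allocate.
            if op == 'R' or write_allocate:
                cache.add(addr)
    return hits, misses

ops = [('W', 100), ('W', 100), ('R', 200), ('W', 200), ('W', 100)]
print(count_hits_misses(ops, write_allocate=False))  # (1, 4)
print(count_hits_misses(ops, write_allocate=True))   # (3, 2)
```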
Page 44
No-write allocate versus write allocate: Conclusion
Write-through caches often use no-write allocate; the reason is that even if there is a subsequent write to the block, the write must still go to the lower-level memory.
Page 45
Allah Hafiz and Assalam-u-Alaikum