Advanced Computer Architecture - Lecture 28: Memory Hierarchy Design. This lecture covers cache design and policies: block placement and identification, replacement policies, cache write strategy, write-miss policy, and the memory hierarchy designer's concerns.
Page 1
CS 704
Advanced Computer Architecture
Lecture 28
Memory Hierarchy Design
(Cache Design and Policies)
Prof. Dr. M. Ashraf Chughtai
Page 2
Today's Topics
Recap: Cache Addressing Techniques
Placement and Replacement Policies
Cache Write Strategy
Cache Performance Enhancement
Summary
Page 3
Recap: Block Size Trade-off
Impact of block size on cache performance, and categories of cache design
The trade-off of block size versus the miss rate, miss penalty, and average access time, the basic CPU performance metrics
Page 4
Recap: Block Size Trade-off
– A larger block size reduces the miss rate, but if the block size is too big relative to the cache size, the miss rate goes back up; and
– the miss penalty goes up as the block size increases; and
– combining these two parameters gives the third parameter, the average access time.
Page 5
Recap: Cache Organizations
Based on the block placement policy, we studied three cache organizations.
Page 6
Recap: Cache Organizations
– Direct mapped, where each block has only one place it can appear in the cache (a source of conflict misses);
– Fully associative, where any block of main memory can be placed anywhere in the cache; and
– Set associative, which allows a block to be placed in a set of places in the cache.
Page 7
Memory Hierarchy Designer's Concerns
Block placement: Where can a block be
placed in the upper level?
Block identification: How is a block found if
it is in the upper level?
Block replacement: Which block should be replaced on a miss?
Write strategy: What happens on a write?
Page 8
Block Placement Policy
Fully associative: a block can be placed anywhere in the upper level (cache).
E.g., block 12 from main memory can be placed at block 2, block 6, or any of the 8 block locations in the cache.
Page 9
Block Placement Policy
Set associative: a block can be placed anywhere within one set in the upper level (cache).
The set number in the upper level is given as:
(Block number) MOD (Number of sets)
E.g., an 8-block, 2-way set-associative cache has 4 sets [0-3] of two blocks each; therefore block 12 or block 16 of main memory can go anywhere in set 0, as 12 MOD 4 = 0 and 16 MOD 4 = 0.
Similarly, block 14 can be placed at either location of set 2, as 14 MOD 4 = 2.
Page 10
Block Placement Policy
Direct mapped (1-way associative): a block can be placed at only one specific location in the upper level (cache).
The location in the cache is given by:
(Block number) MOD (Number of cache blocks)
E.g., block 12 or block 20 can be placed only at location 4 in a cache of 8 blocks, as:
12 MOD 8 = 4
20 MOD 8 = 4
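The two placement rules above can be sketched in a few lines of Python, using the same block numbers as the slides' examples:

```python
def direct_mapped_index(block_no, num_cache_blocks):
    # Direct mapped: the block can go to exactly one cache location.
    return block_no % num_cache_blocks

def set_index(block_no, num_sets):
    # Set associative: the block maps to one set and may use any way in it.
    return block_no % num_sets

# Examples from the slides: an 8-block cache, also organized 2-way (4 sets).
print(direct_mapped_index(12, 8))  # 4
print(direct_mapped_index(20, 8))  # 4
print(set_index(12, 4))            # 0
print(set_index(16, 4))            # 0
print(set_index(14, 4))            # 2
```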
Page 11
Block Identification
How is a block found if it is in the upper level? Tag/Block:
A TAG is associated with each block frame; the TAG gives the block address.
All the possible TAGs where a block may be placed are checked in parallel.
A valid bit is used to identify whether the block contains correct data.
– There is no need to check the index or the block offset.
Page 12
Block Identification: Direct Mapped
[Figure: a 1 KB direct-mapped cache with 32-byte blocks, addressed from a 4 GB (32-bit address) lower-level main memory; bits 4-0 of the address are the byte select (e.g., 0x00), bits 9-5 the 5-bit cache index, and bits 31-10 the cache tag.]
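The address breakdown in the figure can be reproduced with a short sketch, assuming the slide's parameters (1 KB direct-mapped cache, 32-byte blocks); the example address is hypothetical:

```python
def split_address(addr, block_size=32, cache_size=1024):
    # Assumed parameters from the slide: 1 KB cache, 32-byte blocks.
    num_blocks = cache_size // block_size         # 32 blocks
    offset_bits = block_size.bit_length() - 1     # 5-bit byte select
    index_bits = num_blocks.bit_length() - 1      # 5-bit cache index
    byte_select = addr & (block_size - 1)
    index = (addr >> offset_bits) & (num_blocks - 1)
    tag = addr >> (offset_bits + index_bits)      # remaining upper bits
    return tag, index, byte_select

# A hypothetical 32-bit address, purely for illustration:
print(split_address(0x00000124))  # (0, 9, 4)
```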
Page 13
Block Identification
[Figure: a cache with a 4-bit cache index (address bits 8-5), a byte select (bits 4-0), and a cache tag (bits 31-9); each block holds Byte 0, Byte 1, ..., with Byte 32, Byte 33, ... starting the next block.]
Page 14
Block Replacement Policy
In case of a cache miss, a new block needs to be brought in.
If the block locations defined by the block placement policy are all filled, then an existing block has to be evicted, based on:
– the cache mapping; and
– some block replacement policy.
Page 15
Block Replacement Policy
For the direct-mapped cache, block replacement is very simple, as a block can be placed at only one location, given by:
(Block No.) MOD (Number of cache blocks)
There are three commonly used schemes for fully and set-associative mapped caches.
These policies are:
Page 16
Block Replacement Policy
Random: replace any block
– It is the simplest and easiest to implement.
– The candidates for replacement are randomly selected.
– Some designers use pseudo-random block numbers.
Page 17
Block Replacement Policy
Least Recently Used (LRU): replace the block that was either never used or was used longest ago
– It reduces the chances of throwing out information that may be needed soon.
– Here, the access time and the number of times a block is accessed are recorded.
– The block replaced is the one that has not been used for the longest time.
– E.g., if the blocks are accessed in the sequence 0, 2, 3, 0, 4, 3, 0, 1, 8, 0, the victim to replace is block 2, the least recently used block.
Page 18
Block Replacement Policy
First-in, First-out (FIFO): the block placed in the cache first is thrown out first; e.g., if the blocks are accessed in the sequence 2, 3, 4, 5, 3, 4, then to bring a new block into the cache, block 2 will be thrown out, as it is the oldest block in the sequence.
FIFO is used as an approximation to LRU, as LRU can be complicated to compute.
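A minimal sketch of how the two victims in the examples above are chosen (plain Python over an access sequence, not a full cache model):

```python
def lru_victim(access_seq):
    # LRU: evict the block whose most recent access is the oldest.
    last_use = {}
    for time, block in enumerate(access_seq):
        last_use[block] = time
    return min(last_use, key=last_use.get)

def fifo_victim(access_seq):
    # FIFO: evict the block that entered the cache earliest.
    entered = []
    for block in access_seq:
        if block not in entered:
            entered.append(block)
    return entered[0]

print(lru_victim([0, 2, 3, 0, 4, 3, 0, 1, 8, 0]))  # 2
print(fifo_victim([2, 3, 4, 5, 3, 4]))             # 2
```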
Page 19
Block Replacement Policy: Conclusion
[Table: miss rates of the replacement policies at 2-way, 4-way, and 8-way associativity.]
Page 20
Write Strategy
A cache block must not be overwritten unless main memory is up to date:
– Multiple CPUs may have individual caches
– I/O may address main memory directly
Memory is accessed for both read and write purposes.
Page 21
Write Strategy Cont'd
The instruction cache accesses are all reads.
Instruction issue dominates the cache traffic, as writes are typically 10% of the cache accesses.
Furthermore, in the data cache, writes are 10%-20% of the overall memory accesses.
Page 22
Write Strategy Cont'd
In order to optimize the cache performance, according to Amdahl's law, we make the common case fast.
Fortunately, the common case, i.e., the cache read, is easy to make fast:
– Reads can be optimized by performing the tag checking and the data transfer in parallel.
Thus, the read performance of the cache is good.
Page 23
Write Strategy Cont'd
However, in case of a cache write, modification of the cache contents cannot begin until the tag is checked for an address hit.
Therefore, the cache write cannot proceed in parallel with the tag checking.
Another complication is that the processor specifies the size of the write, which is usually only a portion of the block.
Page 24
Write Strategy Cont'd
There are two basic write policies:
– Write back — The information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.
– Write through — The information is written to both the block in the cache and to the block in the lower-level memory.
Page 25
Write Strategy: Pros and Cons of Each
Write back:
– No write to the lower level for repeated writes to the cache.
– A dirty bit is commonly used to indicate the status, i.e., whether the cache block is modified (dirty) or not modified (clean).
– Reduces the memory-bandwidth requirements, and hence the memory power requirements.
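The dirty-bit mechanism above can be sketched as follows (hypothetical names, a single block frame, no tag matching or replacement policy):

```python
class CacheBlock:
    def __init__(self):
        self.valid = False
        self.dirty = False
        self.tag = None
        self.data = None

def write(block, tag, data):
    # Repeated writes only touch the cache; the block is marked dirty.
    block.tag, block.data = tag, data
    block.valid, block.dirty = True, True

def evict(block, memory):
    # On replacement, write back to memory only if the block is dirty.
    if block.valid and block.dirty:
        memory[block.tag] = block.data
    block.valid = block.dirty = False

memory = {}
b = CacheBlock()
write(b, 12, "AAAA")   # first write: cache only
write(b, 12, "BBBB")   # repeated write: still no memory traffic
evict(b, memory)       # a single write-back on replacement
print(memory)          # {12: 'BBBB'}
```

However many times the block is written, the lower level sees only one write, at eviction; this is the bandwidth saving named above.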
Page 26
Write Strategy: Pros and Cons of Each
Write through:
– Simplifies the replacement procedure: the block is always clean, so unlike the write-back strategy, read misses cannot result in writes to the lower level.
– Always combined with write buffers so that the CPU does not wait for the lower-level memory.
– Simplifies data coherency, as the next lower level has the most recent copy (we will discuss this later).
Page 27
Write Buffer for Write Through
[Figure: processor writes go through the cache into a write buffer, which drains to the DRAM.]
Page 28
Write Buffer for Write Through
The write buffer is just a FIFO; a typical number of entries is 4.
Once the data is written into the write buffer, and assuming a cache hit, the CPU is done with the write.
The memory controller will then move the write buffer's contents to the real memory behind the scenes.
Page 29
Write Buffer for Write Through
The DRAM cycle time sets the upper limit on how frequently you can write to the main memory.
The write buffer works as long as the frequency of stores, with respect to time, is not too high, i.e.,
Store frequency << 1 / (DRAM write cycle time)
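The condition above can be checked numerically; the DRAM cycle time below is a hypothetical figure, purely for illustration:

```python
def buffer_keeps_up(stores_per_second, dram_write_cycle_s):
    # The buffer drains only if stores arrive slower than DRAM can retire them.
    return stores_per_second < 1.0 / dram_write_cycle_s

# Hypothetical: DRAM write cycle = 100 ns -> service rate = 10 million writes/s
print(buffer_keeps_up(1e6, 100e-9))   # True: 1M stores/s is sustainable
print(buffer_keeps_up(20e6, 100e-9))  # False: the write buffer saturates
```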
Page 30
Write Buffer for Write Through
If the stores are too close together, or the CPU is so much faster than the DRAM cycle time, you can end up overflowing the write buffer, and the CPU must stall and wait.
A memory system designer's nightmare is when the store frequency with respect to time approaches 1 over the DRAM write cycle time, i.e.,
Store frequency -> 1 / (DRAM write cycle time)
We call this write buffer saturation.
Page 31
Write Buffer Saturation
In that case, it does NOT matter how big you make the write buffer; the write buffer will still overflow, because you are simply feeding things into it faster than you can empty it.
There are two solutions to this problem: install a second-level (L2) cache between the write buffer and the main memory, or replace this write-through cache with a write-back cache.
Page 32
Write Buffer Saturation
[Figure: Processor -> Cache -> Write Buffer -> L2 Cache -> DRAM; a second-level cache is inserted between the write buffer and the main memory.]
Page 33
Write-Miss Policy
No-write allocate: usually the write misses do not affect the cache; rather, the block is modified only in the lower-level memory.
Page 34
Write-Miss Policy
The blocks stay out of the cache in no-write allocate until the program tries to read the blocks, but
the blocks that are only written will still be in the cache with write allocate.
Let us discuss this with the help of an example.
Let's look at our 1 KB direct-mapped cache again.
Page 35
[Figure: the 1 KB direct-mapped cache again, with cache tag, cache index, and byte select (e.g., 0x00) fields; Byte 0 and Byte 1 start the first block, Byte 32 and Byte 33 the next.]
Page 36
Write-Miss Policy
Assume we do a 16-bit write to memory location 0x000000 that causes a cache miss in our 1 KB direct-mapped cache with 32-byte blocks.
After we write the cache tag into the cache and write the 16-bit data into Byte 0 and Byte 1, do we have to read the rest of the block (Byte 2, Byte 3, ... Byte 31) from memory?
If we do read the rest of the block in, it is called write allocate.
Page 37
Write-Miss Policy
The principle of spatial locality implies that we are likely to access them soon.
But the type of access we are going to do is likely to be another write.
Page 38
Write-Miss Policy
So even if we do read in the data, we may end up overwriting it anyway; so it is a common practice to NOT read in the rest of the block on a write miss.
If you don't bring in the rest of the block, or, to use the more technical term, write not allocate, you had better have some way to tell the processor that the rest of the block is no longer valid.
Page 39
No-write allocate versus write allocate: Example
Let us consider a fully associative write-back cache with cache entries that start empty.
Consider the following sequence of five memory operations and find the number of hits and misses when using no-write allocate versus write allocate.
Page 40
No-write allocate versus write allocate: Example
The sequence of memory operations:
Write Mem[100]
Write Mem[100]
Read Mem[200]
Write Mem[200]
Write Mem[100]
Page 41
No-write allocate versus write allocate: Example
With no-write allocate, the first two writes will result in MISSES.
Address [200] is also not in the cache, so the read is also a miss.
The subsequent write to [200] is a hit.
The last write to [100] is still a miss.
The result is 4 MISSes and 1 HIT.
Page 42
No-write allocate versus write allocate: Example
For the write-allocate policy:
The first accesses to [100] and [200] are MISSES.
The rest are HITS, as [100] and [200] are both found in the cache.
The result is 2 MISSes and 3 HITs.
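The two tallies above can be reproduced with a small sketch of a fully associative cache that starts empty (a hypothetical helper, not code from the lecture):

```python
def count_hits_misses(ops, write_allocate):
    # ops: list of ('R' or 'W', address); fully associative cache, starts empty.
    cache = set()
    hits = misses = 0
    for op, addr in ops:
        if addr in cache:
            hits += 1
        else:
            misses += 1
            # Reads always allocate; writes allocate only under write allocate.
            if op == 'R' or write_allocate:
                cache.add(addr)
    return hits, misses

ops = [('W', 100), ('W', 100), ('R', 200), ('W', 200), ('W', 100)]
print(count_hits_misses(ops, write_allocate=False))  # (1, 4)
print(count_hits_misses(ops, write_allocate=True))   # (3, 2)
```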
Page 44
No-write allocate versus write allocate: Conclusion
Write-through caches often use no-write allocate; the reason is that even if there is a subsequent write to the block, the write must still go to the lower-level memory.
Page 45
Allah Hafiz and Assalam-u-Alaikum