Advanced Computer Architecture - Lecture 26: Memory hierarchy design. This lecture will cover the following: concept of caching and principle of locality; concept of cache memory; principle of locality; cache addressing techniques; RAM vs. cache transaction; temporal locality; spatial locality;...
CS 704
Advanced Computer Architecture
Lecture 26
Memory Hierarchy Design
(Concept of Caching and Principle of Locality)
Prof Dr M Ashraf Chughtai
Today’s Topics
Recap: Storage trends and memory hierarchy
Concept of Cache Memory
Principle of Locality
Cache Addressing Techniques
RAM vs Cache Transaction
Summary
MAC/VU-Advanced
Computer Architecture Lecture 26 Memory Hierarchy (2) 2
Recap: Storage Devices
Design features of semiconductor
Recap: Speed and Cost per byte
SRAM
hold moderately large amount of data and instructions
– Disk storage is slowest and
cheapest
data and instructions
Recap: CPU-Memory Access-Time
and Disk with respect to the speed of processor, as compared to that of the SRAM, is increasing very fast with
time
CPU-Memory Gap … Cont’d
[Figure: access-time trends on a logarithmic axis (1 to 10,000); plotted series include SRAM access time and CPU cycle time]
Memory Hierarchy Principles
The speed of DRAM and CPU
complement each other
Organize memory in hierarchy,
based on the Concept of Caching; and
– Principle of Locality
1: Concept of Caching
A cache is a staging area or temporary place used to:
– store a frequently used subset of the data or instructions from the relatively cheaper, larger and slower memory; and
– avoid having to go to the main memory every time this information is needed
Caching and Memory Hierarchy
A memory device of a different type is used at each device level k:
– the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1
– programs tend to access the data or instructions at level k more often than they access the data at level k+1
Caching and Memory Hierarchy
– Storage at level k+1 can be slower,
but larger and cheaper per bit
A large pool of memory that costs as much as the cheap storage at the
highest level (near the bottom in hierarchy)
serves data or instructions at the rate
of the fast storage at the lowest level
(near the top in hierarchy)
Examples of Caching in the Hierarchy
Where Cached          What Cached            Latency (cycles)   Managed By
CPU registers         4-byte word            0                  Compiler
On-Chip TLB           Address translations   0                  Hardware
On-Chip L1            32-byte block          1                  Hardware
Off-Chip L2           32-byte block          10                 Hardware
Main memory           4-KB page              100                Hardware + OS
Main memory           -                      100                OS
Local disk            -                      10,000,000         AFS/NFS client
Local disk            Web pages              10,000,000         Web browser
Remote server disks   -                      1,000,000,000      Web proxy server
2: Principle of Locality
Programs access a relatively small portion of the address space at any instant of time
Principle of Locality
[Figure: library shelves labelled Electronics, Computers, Chemistry, Civil Engg. and Electrical Engg.]
We select 4 books, 2 each of Electronics and Computers, and place them on a small table for fast access
Types of Locality
Temporal Spatial
Temporal locality is the locality in time
which says if an item is referenced, it will tend to be referenced again soon.
A well-written program tends to reuse data and instructions which are:
– either near those they have used recently
– or that were recently referenced themselves
Principle of Locality
– Spatial locality: Items with nearby addresses (i.e., nearby in space) should be located at the same level, as they tend to be referenced close together in time
– Temporal locality: Recently referenced items (i.e., referenced close in time) should be placed at the same memory level, as they are likely to be referenced in the near future
Locality Example: Program
Locality Example
Spatial Locality:
All the array elements a[i], the data, are referenced in succession, one at each loop iteration, so all the array elements should be located at the same level
All the instructions of the loop are referenced repeatedly in sequence and should therefore be located at the same level
Locality Example
Temporal Locality:
The data item sum is referenced in each iteration; i.e., recently referenced data is referenced again in each iteration
The instructions of the loop, sum += a[i], are cycled through repeatedly
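The loop discussed in this example can be sketched as follows. This is only an illustration in Python: the names sum_array and sum_ are assumptions made for the sketch, and a Python list only approximates the contiguous array a of the example.

```python
def sum_array(a):
    """Add up the elements of a, visiting them in index order."""
    sum_ = 0                  # temporal locality: sum_ is reused in every iteration
    for i in range(len(a)):   # temporal locality: the same loop instructions repeat
        sum_ += a[i]          # spatial locality: a[0], a[1], ... are neighbours
    return sum_


total = sum_array([1, 2, 3, 4])
```

The accumulator and the loop instructions are touched again and again (temporal locality), while the array elements are touched one after another in address order (spatial locality).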
Based on Locality Principle
How does the memory hierarchy work?
― the memory hierarchy will keep the
more recently accessed data items
closer to the processor because
chances are the processor will
access them again soon
Based on Locality Principle
How does the memory hierarchy work?
NOT ONLY do we move the item that has just been accessed to the upper level, but we also move the data items that are adjacent to it
Hierarchy List
Register File Level 0 Datapath
Main memory Level 3 System Board DRAM
Disk cache Level 4 Disk drive
Disk Level 5 Magnetic disk
Optical Level 6 CDs etc- bulk storage
Tape Level 7 Huge cheapest Storage
Intel Processor Cache
80386 – no on chip cache
80486 – 8 KB on-chip cache, using 16-byte lines
Pentium (all versions)
– two on chip L1 caches
– Data & instructions
Cache Devices
A cache device is a small SRAM which is made directly accessible to the processor; and
DRAM, which is accessible by the cache as well as by the user or programmer, is placed at the next higher level as the Main Memory
Larger storage, such as disk, is placed away from the main memory
Cache Organization
Main Memory
Caching in a Memory Hierarchy
The larger, slower device at level k+1 is partitioned into blocks (say 16 blocks)
The smaller, faster device at level k caches a subset of the blocks (say 4 blocks) from level k+1
Cache Organization
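Cache Addressing – Direct Addressing
Level k: 4 blocks, addressed by a 2-bit code: zz
Level k+1: 16 blocks, addressed by a 4-bit code
The n-th block from level k+1 is placed in block (n mod 4) at level k, so each row below lists the level k+1 blocks that share one level k block:
0000 0100 1000 1100
0001 0101 1001 1101
0010 0110 1010 1110
0011 0111 1011 1111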
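The grouping of block addresses can be reproduced with a few lines of code. The sketch below assumes the placement rule stated later in the lecture, block n goes to block (n mod 4); the constant and function names are chosen for illustration only.

```python
NUM_CACHE_BLOCKS = 4     # level k holds 4 blocks, addressed by a 2-bit code
NUM_MEMORY_BLOCKS = 16   # level k+1 holds 16 blocks, addressed by a 4-bit code


def cache_slot(n):
    """Return the level k block that holds block n of level k+1."""
    return n % NUM_CACHE_BLOCKS


# Group the 4-bit block addresses by the level k block they map to.
placement = {slot: [] for slot in range(NUM_CACHE_BLOCKS)}
for n in range(NUM_MEMORY_BLOCKS):
    placement[cache_slot(n)].append(format(n, "04b"))

for slot, blocks in placement.items():
    print(format(slot, "02b"), blocks)
```

Each printed row shows one 2-bit level k address followed by the four level k+1 blocks whose two low-order bits match it.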
Memory Hierarchy Terminology
[Figure: the processor exchanges data (To Processor, From Processor) with the Upper Level Memory, which holds Blk X; the Lower Level Memory holds Blk Y]
Memory Hierarchy Terminology
Hit: the data the processor wants to access appears in some block in the upper level (example: Block X)
Hit Rate: the fraction of memory accesses that are found in the upper level (i.e., HIT)
Hit Time: the time to access the upper level, which consists of
(i) RAM access time
(ii) Time to determine if this is hit or miss
Memory Hierarchy Terminology … Cont’d
Miss: data needed by the processor is not found in the upper level and has to be retrieved from a block in the lower level (Block Y)
Miss Rate = 1 – (Hit Rate)
Miss Penalty: the time
(i) to replace a block in the upper level
(ii) to deliver the block to the processor
Recommendation: Hit Time must be much smaller than Miss Penalty; otherwise there is no need for a memory hierarchy
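The recommendation can be illustrated numerically. The sketch below assumes the commonly used relation average access time = hit time + miss rate × miss penalty, where the hit rate is the fraction of accesses found in the upper level. The slides do not state this relation, and the cycle counts used are hypothetical.

```python
def average_access_time(hit_time, hit_rate, miss_penalty):
    """Average access time in cycles, assuming every access pays the hit
    time and a miss additionally pays the miss penalty."""
    miss_rate = 1.0 - hit_rate
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers: a 1-cycle hit, a 100-cycle miss penalty.
with_good_locality = average_access_time(hit_time=1, hit_rate=0.95, miss_penalty=100)
with_poor_locality = average_access_time(hit_time=1, hit_rate=0.50, miss_penalty=100)
```

With a high hit rate the average stays close to the hit time; as the hit rate drops, the miss penalty dominates, which is why the hierarchy only pays off when most accesses hit.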
Request 14
– Object d is transferred
to CPU
Request 12
Cache Miss
Program needs object A, which is stored in some block C, say block 12, at level k+1
– Block C (block 12 from level k+1) is not at level k, so this is a cache miss
– Hence, the level k cache must fetch it from level k+1; and
– transfer object A to the CPU
Placement and Replacement Policies
– If the level k cache is full, then some current block must be replaced (evicted)
– The replacement policy defines which block should be evicted
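Types of misses
Cold (compulsory) miss
– occurs at the beginning of the cache access, when the cache is empty
Capacity miss
– occurs when the set of active cache blocks (working set) is larger than the cache
Conflict miss
– occurs when the level k cache is large enough, but multiple data objects all map to the same level k block.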
Conflict Miss: Example … Cont’d
If the placement policy is based on direct addressing:
– Block n at level k+1 must be placed in block (n mod 4) at level k
In this case, referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time, as 8 mod 4 = 0, so blocks 0 and 8 compete for the same level k block
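A small simulation makes the conflict visible. This is a minimal sketch assuming a 4-block level k cache with the (n mod 4) placement rule; the function name count_misses and the second reference sequence are introduced here for illustration.

```python
def count_misses(references, num_slots=4):
    """Simulate a level k cache with (n mod num_slots) placement and
    return the number of misses for a sequence of block references."""
    slots = [None] * num_slots      # block number currently held by each slot
    misses = 0
    for n in references:
        slot = n % num_slots        # placement policy: block n -> n mod 4
        if slots[slot] != n:        # slot is empty or holds another block
            misses += 1
            slots[slot] = n         # fetch block n, evicting the old block
    return misses


conflict = count_misses([0, 8, 0, 8, 0, 8])   # blocks 0 and 8 share slot 0
friendly = count_misses([0, 9, 0, 9, 0, 9])   # blocks 0 and 9 use slots 0 and 1
```

In the first sequence all six references miss even though three of the four slots stay empty; in the second, only the first two (cold) references miss.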
Cache Design
We have observed that more than one block from the level k+1 memory (say the main memory), which has N blocks, may be placed at the same location in the level-k memory (say the cache), which has M blocks; block n is placed at location n MOD M
Hence, a tag must be associated with each block in the level-k (cache) memory to identify its position in the level k+1 memory (main memory)
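The role of the tag can be sketched in code. The example below assumes M = 4 and takes the tag of block n to be n // M, the quotient that the MOD operation discards; this choice of tag is an assumption for illustration, not something the slide specifies.

```python
M = 4                 # number of blocks in the level-k memory (cache)
tags = [None] * M     # tag stored with each cache block; None means empty


def access(n):
    """Access block n of the level k+1 memory.

    Returns True on a hit and False on a miss (loading the block on a miss).
    """
    location = n % M          # where block n must be placed
    tag = n // M              # distinguishes the blocks that share a location
    if tags[location] == tag:
        return True
    tags[location] = tag      # miss: load block n and record its tag
    return False


results = [access(n) for n in (1, 5, 5, 1)]
```

Blocks 1 and 5 both map to location 1, and only the stored tag tells the cache which of them is currently present.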
Direct Mapping Example
The 16 MB main memory has a 24-bit address bus
It is organized in 32-bit (4-byte) blocks
The 16 K word (64 KB) cache requires a 16-bit address and an 8-bit tag
Direct Mapping Address Structure
24-bit address
– 2-bit word identifier (4-byte block)
– 22-bit block identifier for the main memory, split into
  – 8-bit tag (= 22 – 14)
  – 14-bit slot or line or index value for the cache
No two blocks in the same line have the same Tag field
Check contents of cache by finding line and checking Tag
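The field boundaries above translate directly into shifts and masks. The following sketch splits a 24-bit address into its tag, line and word fields; the function name and the sample address 0x16339C are hypothetical and serve only to exercise the arithmetic.

```python
WORD_BITS = 2     # 2-bit word identifier (4-byte block)
LINE_BITS = 14    # 14-bit slot or line or index value
TAG_BITS = 8      # 8-bit tag (= 22 - 14)


def split_address(addr):
    """Split a 24-bit address into (tag, line, word) fields."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word


tag, line, word = split_address(0x16339C)
```

To check whether this address is cached, the hardware would select the line given by the line field and compare the tag stored there with the tag field.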
Direct Mapping Cache Organization
Let us consider another example with realistic numbers:
Assume we have a 1 KB direct-mapped cache with a block size of 32 bytes
In other words, each block associated with a cache tag will have 32 bytes in it (Row 1).
[Figure: cache array with a Valid Bit and a Cache Tag per row and 32 data bytes per row, from Byte 0 to Byte 1023; rows are selected by the Line Number or Index]
Address Translation – Direct Mapped Cache
Assume a level k+1 main memory of 4 GB with a block size of 32 bytes, and a level k cache of 1 KB
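[Figure: 32-bit address divided into Cache Tag (bits 31 to 10, stored as part of the cache “state”), Cache Index (bits 9 to 5) and Byte Select (bits 4 to 0), with example field values 0x01 and, for Byte Select, 0x00; the cache array holds a Valid Bit, a Cache Tag and 32 data bytes per row, from Byte 0 to Byte 1023]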
Cache Design
With a block size of 32 bytes, the 5 least significant bits of the address are used as the byte select within the cache block.
Since the cache size is 1 KB, the upper 32 minus 10 bits, or 22 bits, of the address are stored as the cache tag
The rest of the address bits in the middle, that is, bits 5 through 9, are used as the Cache Index to select the proper cache block entry
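The three field widths follow from the cache and block sizes, as the sketch below shows by deriving them and decoding one address. The function name decode and the sample address 0x00014020 are assumptions made for illustration.

```python
ADDRESS_BITS = 32
BLOCK_SIZE = 32       # bytes per cache block
CACHE_SIZE = 1024     # 1 KB direct-mapped cache

NUM_BLOCKS = CACHE_SIZE // BLOCK_SIZE            # 32 cache block entries
OFFSET_BITS = BLOCK_SIZE.bit_length() - 1        # 5 bits of byte select
INDEX_BITS = NUM_BLOCKS.bit_length() - 1         # 5 bits of cache index
TAG_BITS = ADDRESS_BITS - INDEX_BITS - OFFSET_BITS   # 22 bits of cache tag


def decode(addr):
    """Split a 32-bit address into (tag, index, byte_select)."""
    byte_select = addr & (BLOCK_SIZE - 1)              # bits 0 through 4
    index = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1)   # bits 5 through 9
    tag = addr >> (OFFSET_BITS + INDEX_BITS)           # bits 10 through 31
    return tag, index, byte_select


fields = decode(0x00014020)
```

The bit_length trick works here because both sizes are powers of two; for other sizes the field widths would need a different derivation.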