CS 704
Advanced Computer Architecture
Lecture 32
Memory Hierarchy Design
(Main and Virtual Memories)
Prof Dr M Ashraf Chughtai
Today’s Topics
Recap: Memory Hierarchy and Cache performance
Main Memory Performance
Virtual Memory Performance
Summary
MAC/VU-Advanced
Computer Architecture Lec 32 Memory Hierarchy Design (8) 2
Design goal of the memory system
Cost as low as that of the cheapest memory, and speed as fast as that of the fastest memory
Recap: Memory Hierarchy
Top of the hierarchy: the fastest, smallest and most costly memories
Bottom of the hierarchy: the slowest, biggest and cheapest memories
Recap: Memory Hierarchy
– Average access speed
– Cost
– Cheapest technology
Semiconductor memories
Static and Dynamic RAMs
Upper levels in the memory hierarchy
Recap: Memory Hierarchy
The caches use Static Random Access Memory (SRAM)
Cache and main memory are organized in equal-sized blocks
Word transfer
Block transfer
The CPU requests the contents of a main memory location
Word transfer is fast
Recap: Cache Design
Organization of main memory
Source for the caches
Destination for virtual memory
Main Memory Organization
DRAM logical organization (4 Mbit)
[Figure: 4 Mbit DRAM logical organization, showing the memory array, column decoder, and sense amps & I/O]
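As a rough illustration of this logical organization, a 4 Mbit DRAM holds 2^22 one-bit cells. The sketch below assumes a square 2048 x 2048 array whose 22-bit cell address is sent in two halves over shared (multiplexed) address pins: first the row half with RAS (row address strobe), then the column half with CAS (column address strobe). The array shape, the choice of the upper bits as the row, and the helper name `split_dram_address` are assumptions for illustration, not taken from the slide.

```python
# Sketch: split a 4 Mbit DRAM cell address into row and column halves.
# Assumption: a square 2048 x 2048 array with multiplexed row/column address pins.
ROW_BITS = 11                      # 2**11 = 2048 rows (assumed)
COL_BITS = 11                      # 2**11 = 2048 columns (assumed)
TOTAL_BITS = ROW_BITS + COL_BITS   # 22 address bits -> 2**22 cells = 4 Mbit


def split_dram_address(addr: int) -> tuple[int, int]:
    """Return (row, column) for a cell address; the row is taken from the upper bits."""
    if not 0 <= addr < (1 << TOTAL_BITS):
        raise ValueError("address outside the 4 Mbit array")
    row = addr >> COL_BITS                 # sent first, with RAS
    col = addr & ((1 << COL_BITS) - 1)     # sent second, with CAS
    return row, col


if __name__ == "__main__":
    print(1 << TOTAL_BITS)                            # 4194304 cells
    print(split_dram_address(2049))                   # (1, 1)
    print(split_dram_address((1 << TOTAL_BITS) - 1))  # (2047, 2047)
```

Multiplexing the two halves needs only 11 address pins instead of 22, at the cost of sending each address in two steps.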
Main Memory Performance
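Fast page mode
– Optimizes sequential access within an open DRAM row
Synchronous DRAM (SDRAM)
– Avoids handshaking: a clock synchronizes the DRAM with the memory controller
Double Data Rate (DDR) DRAM
– Transmits data on both the rising and falling edges of the clock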
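To see why DDR matters for bandwidth, the peak rate of a DRAM interface is roughly clock rate x transfers per clock x bytes per transfer. The sketch below assumes a hypothetical 100 MHz clock and a 64-bit data bus, and the function name is illustrative. It computes peak transfer rate only and ignores row-access latency.

```python
# Sketch: peak DRAM interface bandwidth; the figures used are hypothetical examples.
def peak_bandwidth_mb_s(clock_mhz: float, bus_bits: int, transfers_per_clock: int) -> float:
    """Peak rate in MB/s (1 MB = 10**6 bytes): clock x transfers per clock x bytes per transfer."""
    return clock_mhz * transfers_per_clock * (bus_bits / 8)


if __name__ == "__main__":
    # Assumed example: 100 MHz clock, 64-bit data bus.
    sdram = peak_bandwidth_mb_s(100, 64, 1)  # SDRAM: one transfer per clock
    ddr = peak_bandwidth_mb_s(100, 64, 2)    # DDR: transfers on both clock edges
    print(sdram, ddr)                        # 800.0 1600.0
```

At the same clock rate, DDR doubles the peak rate simply by transferring data on both clock edges.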
I/O and multiprocessors
Low-latency memory
Multiprocessors demand higher bandwidth
Second-level caches with larger block sizes
Main Memory Performance
The most commonly used techniques are:
Improving Main Memory Performance
1: Wider Main Memory
[Figure: wider main memory organization (labels: Main Memory, Wider L2 Cache, L1 cache)]
1: Wider Main Memory: Example
4-word (i.e., 32-byte) block, so 8 bytes per word
– Time to send address = 4 clock cycles
– Time to send the data word = 4 clock cycles
– Access time per word = 56 clock cycles
Miss Penalty = No. of transfers x [time to send address + time to send data word + access time per word]
(No. of transfers = no. of words for a 1-word-wide memory)
1: Wider Main Memory
1: For the 1-word-wide organization
Miss Penalty = 4 x (4 + 4 + 56) = 4 x 64 = 256 clock cycles
Memory bandwidth = bytes per clock cycle
= 32/256 = 1/8 byte/cycle
2: For the 4-word-wide organization
Miss Penalty = 1 x (4 + 4 + 56) = 64 clock cycles; and
Memory bandwidth = 32/64 = 1/2 byte/cycle
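The miss-penalty model in this example can be checked with a few lines of code. The sketch below encodes the slide's figures (4 cycles to send the address, 56 cycles of access time, 4 cycles per data transfer, a block of 4 words of 8 bytes each) and assumes the memory width evenly divides the block; the function name `wide_memory` is hypothetical.

```python
# Sketch of the slide's timing model for a wider main memory and bus.
SEND_ADDR = 4     # clock cycles to send the address
ACCESS = 56       # clock cycles of access time per word
SEND_WORD = 4     # clock cycles for one bus-width data transfer
BLOCK_WORDS = 4   # words per cache block
WORD_BYTES = 8    # 32-byte block / 4 words


def wide_memory(width_words: int) -> tuple[int, float]:
    """Return (miss penalty in cycles, bandwidth in bytes/cycle) for a memory width_words wide."""
    if BLOCK_WORDS % width_words != 0:
        raise ValueError("width must divide the block size")
    transfers = BLOCK_WORDS // width_words  # memory transactions per block
    penalty = transfers * (SEND_ADDR + SEND_WORD + ACCESS)
    bandwidth = BLOCK_WORDS * WORD_BYTES / penalty
    return penalty, bandwidth


if __name__ == "__main__":
    print(wide_memory(1))  # (256, 0.125)
    print(wide_memory(4))  # (64, 0.5)
```

A 2-word-wide memory falls in between: 2 x 64 = 128 clock cycles and 32/128 = 0.25 byte/cycle.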
1: Wider Main Memory: Demerits
[Figure: wider main memory organization (labels: Main Memory, Wider L2 Cache, L1 cache)]
2: Interleaved Memory
– Bank 0 holds all words whose address MOD 4 = 0
– Bank 1 holds all words whose address MOD 4 = 1
– Bank 2 holds all words whose address MOD 4 = 2
– Bank 3 holds all words whose address MOD 4 = 3
Bank 0: word addresses 0, 4, 8, ...
Bank 1: word addresses 1, 5, 9, ...
Bank 2: word addresses 2, 6, ...
Bank 3: word addresses 3, 7, ...
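The bank assignment above is the word address modulo the number of banks, and the word's position inside its bank is the quotient. A minimal sketch, with hypothetical helper names:

```python
# Sketch: word-interleaved mapping of word addresses onto 4 banks.
NUM_BANKS = 4


def bank_of(word_address: int) -> tuple[int, int]:
    """Return (bank number, word index within that bank)."""
    return word_address % NUM_BANKS, word_address // NUM_BANKS


def bank_contents(num_words: int) -> dict[int, list[int]]:
    """List which word addresses land in each bank."""
    banks: dict[int, list[int]] = {b: [] for b in range(NUM_BANKS)}
    for addr in range(num_words):
        bank, _ = bank_of(addr)
        banks[bank].append(addr)
    return banks


if __name__ == "__main__":
    print(bank_contents(12))
    # {0: [0, 4, 8], 1: [1, 5, 9], 2: [2, 6, 10], 3: [3, 7, 11]}
```

Because consecutive words fall in different banks, a sequential block fetch touches every bank once.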
Bandwidth Calculation:
Bandwidth of a 4-way (4-bank) interleaved memory, using the same timing model as for the wider memory
The miss penalty for the 4-way interleaved memory is:
Miss Penalty = time to send address + time to access + number of banks x time to send data
= 4 + 56 + 4 x 4 = 76 clock cycles
Bandwidth = 32/76 ≈ 0.4 byte per clock cycle
For comparison, the 1-word-wide memory gives Bandwidth = 32/256 = 1/8 = 0.125 byte per clock cycle
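The interleaved calculation differs from the 1-word-wide case only in that the address is sent once and the four 56-cycle accesses overlap, so only the data transfers are serialized on the one-word bus. A sketch under that same timing model (the helper name is hypothetical):

```python
# Sketch: miss penalty of a word-interleaved memory under the slide's timing model.
SEND_ADDR = 4     # clock cycles to send the address (once, to all banks)
ACCESS = 56       # access time; the banks access their words in parallel
SEND_WORD = 4     # clock cycles to send each word over the one-word bus
BLOCK_BYTES = 32


def interleaved_miss_penalty(num_banks: int) -> int:
    """Address sent once, accesses overlapped, then one bus transfer per bank."""
    return SEND_ADDR + ACCESS + num_banks * SEND_WORD


if __name__ == "__main__":
    penalty = interleaved_miss_penalty(4)
    print(penalty, round(BLOCK_BYTES / penalty, 3))  # 76 0.421
```

Interleaving thus recovers most of the benefit of the 4-word-wide memory (64 cycles) while keeping a one-word bus.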
3: Independent Memory Banks
Memory banks offer independent accesses, which benefit:
– Multiprocessors
– I/O
– A CPU with hit under n misses
– Non-blocking caches
3: Independent Memory Banks
Superbank
3: Independent Memory Banks
– An input device may use one controller and one bank
– A cache read may use another, and
– A cache write may use still another
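The point of separate controllers is that requests to different banks do not wait for each other. The toy model below assumes a hypothetical fixed access time of 10 clock cycles per bank, serves requests in list order, and lets each bank handle one request at a time. The request names mirror the input device, cache read and cache write above; `finish_times` is a hypothetical helper.

```python
# Sketch: requests to independent banks, each with its own controller.
# Assumptions: fixed access time per bank, requests served in list order,
# and a bank handles one request at a time.
ACCESS_TIME = 10  # clock cycles (hypothetical)


def finish_times(requests: list[tuple[str, int, int]]) -> dict[str, int]:
    """requests: (name, bank, issue_cycle); returns the cycle each request completes."""
    bank_free_at: dict[int, int] = {}
    done: dict[str, int] = {}
    for name, bank, issue in requests:
        start = max(issue, bank_free_at.get(bank, 0))  # wait only if this bank is busy
        bank_free_at[bank] = start + ACCESS_TIME
        done[name] = start + ACCESS_TIME
    return done


if __name__ == "__main__":
    separate = [("input device", 0, 0), ("cache read", 1, 0), ("cache write", 2, 0)]
    same = [("input device", 0, 0), ("cache read", 0, 0), ("cache write", 0, 0)]
    print(finish_times(separate))  # all finish at cycle 10
    print(finish_times(same))      # 10, 20, 30: they serialize on bank 0
```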
– Using memory banks
– Making memory and its bus wider
– Doing both
How many banks should there be?
Summary: Main Memory Bandwidth Enhancement
This decision is essential to ensure that, if memory is accessed sequentially (e.g., when processing an array), then by the time the CPU tries to read a second word from a bank, that bank's first access has finished.
Otherwise, the CPU will return to the original bank before the bank has the next word ready.
Summary: Main Memory Bandwidth Enhancement
8 banks, each 64 bits wide
Access time of 10 clock cycles
– Clock cycle 1: the banks begin their accesses
– Bank 0 delivers its word after 10 clock cycles
– After those 10 clock cycles, bank 0 would start fetching the next desired word, while the CPU reads the other 7 banks sequentially until the 18th clock cycle
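Summary: Main Memory Bandwidth Enhancement
– At the 18th clock cycle, the CPU returns to bank 0
– Bank 0 is still busy with its second access
– So the CPU cannot start fetching the next word from bank 0
– It must wait until clock cycle 20
– Each bank again needs 10 clock cycles for its next access
Hence: Number of banks ≥ number of clock cycles to access a word in a bank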
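The rule above can be checked with a small simulation of sequential reads. Under the assumptions stated in the comments (all banks start at cycle 0, the CPU takes one word per cycle in address order, and a bank starts its next access in the cycle its word is taken), 8 banks with a 10-cycle access time reproduce the wait at the 18th clock cycle, while 10 banks never stall. The function is a hypothetical sketch, not the lecture's own model.

```python
# Sketch: sequential reads across interleaved banks (hypothetical timing model).
# Assumptions: every bank starts its first access at cycle 0; the CPU takes at most
# one word per clock cycle, in address order; a bank begins its next access in the
# same cycle that the CPU takes its current word.


def sequential_read(num_banks: int, access_time: int, num_words: int) -> tuple[int, int]:
    """Return (cycle after the last word is taken, stall cycles after the first word)."""
    start = [0] * num_banks  # cycle at which each bank began its current access
    t = 0                    # current clock cycle of the CPU
    stalls = 0
    for i in range(num_words):
        bank = i % num_banks
        ready = start[bank] + access_time
        if ready > t:          # the bank's word is not ready yet: wait for it
            if i > 0:          # the first word's latency is not counted as a stall
                stalls += ready - t
            t = ready
        start[bank] = t        # bank starts fetching its next word
        t += 1                 # CPU moves on to the next bank
    return t, stalls


if __name__ == "__main__":
    print(sequential_read(8, 10, 16))   # (28, 2): bank 0 wanted at cycle 18, ready at 20
    print(sequential_read(10, 10, 20))  # (30, 0): 10 banks >= 10 cycles, no stalls
```

With 8 banks the CPU loses 2 cycles on every pass through the banks; adding banks until their number reaches the access time removes the stalls entirely.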
Virtual Memory System
Increasing gap
High cost of main memory
Physical DRAM as a cache for the disk
Single level store
Virtual Memory System … Cont’d
Single-level storage
Virtual Memory System
Manages two levels of the memory hierarchy:
Main memory and secondary storage
Divides memory into segments called pages
Virtual Memory System … Cont’d
Page
Block
Contiguous pages
Virtual Memory System … Cont’d
[Figure: a program's contiguous virtual address space (pages A, B, C, D at virtual addresses 0, 4K, 8K, 12K, ...) mapped to 4K page frames in physical main memory (physical addresses 0, 4K, ..., 40K) or to disk]
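The mapping in the figure can be sketched as a page-table lookup: split the virtual address into a virtual page number and an offset within the 4K page, look up the physical page, and keep the offset. The page-table contents and names below are hypothetical; a missing entry stands for a page that is not in main memory.

```python
# Sketch: virtual-to-physical translation with 4 KB pages (as in the figure).
# The page-table contents are hypothetical.
PAGE_SIZE = 4 * 1024


class PageFault(Exception):
    """The referenced virtual page is not in main memory (e.g. it is on disk)."""


def translate(page_table: dict[int, int], vaddr: int) -> int:
    vpn, offset = divmod(vaddr, PAGE_SIZE)  # virtual page number, offset within page
    if vpn not in page_table:
        raise PageFault(f"virtual page {vpn} not resident")
    return page_table[vpn] * PAGE_SIZE + offset  # same offset in the physical page


if __name__ == "__main__":
    page_table = {0: 4, 1: 6, 2: 1}  # virtual page -> physical page (hypothetical)
    print(translate(page_table, 0x1010))  # 24592 (page 1, offset 16 -> 6 * 4096 + 16)
    try:
        translate(page_table, 3 * PAGE_SIZE)  # virtual page 3 is not mapped
    except PageFault as fault:
        print("page fault:", fault)
```

Only the page number is translated; the offset passes through unchanged, which is why pages can be placed in any physical frame.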
Virtual Memory: Attributes
– Protection
– Relocation
Virtual Memory: Attributes … Cont’d
Page Tables
Process i:
[Figure: page table with columns Physical Addr, Read?, Write?]
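Protection with the Read?/Write? bits can be sketched as a check performed alongside translation. Each process has its own table, so it can only reach the physical pages its own entries name. The entry layout, class names and table contents below are hypothetical; looking up an unmapped page raises `KeyError` here, which would correspond to a page fault in a real system.

```python
# Sketch: per-process page table entries with Read?/Write? permission bits.
# Entry layout, names and contents are hypothetical.
from dataclasses import dataclass


@dataclass
class PageTableEntry:
    physical_page: int
    read: bool
    write: bool


class ProtectionFault(Exception):
    """The access type is not permitted for this page."""


def check_access(page_table: dict[int, PageTableEntry], vpn: int, is_write: bool) -> int:
    """Return the physical page if the access is allowed, otherwise raise ProtectionFault."""
    entry = page_table[vpn]  # KeyError for an unmapped page (not handled here)
    allowed = entry.write if is_write else entry.read
    if not allowed:
        kind = "write" if is_write else "read"
        raise ProtectionFault(f"{kind} to virtual page {vpn} denied")
    return entry.physical_page


if __name__ == "__main__":
    process_i = {
        0: PageTableEntry(physical_page=9, read=True, write=True),   # data page
        1: PageTableEntry(physical_page=4, read=True, write=False),  # read-only page
    }
    print(check_access(process_i, 1, is_write=False))  # 4
    try:
        check_access(process_i, 1, is_write=True)
    except ProtectionFault as fault:
        print(fault)  # write to virtual page 1 denied
```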
Virtual Memory: Attributes … Cont’d
Relocation
– Simplifies loading of a program
– Allows a program to be placed anywhere in main memory
Cache versus Virtual Memory
– Page or segment is used for block
– Page fault or address fault is used for miss
– The CPU produces a virtual address
– Virtual addresses are translated to main memory (physical) addresses
Cache versus Virtual Memory
[Figure: the virtual address is mapped through the page table to a physical address in main memory]
Cache versus Virtual Memory
– Replacement on cache miss
– Page fault
– The size of the processor address determines the size of virtual memory
– Cache size is independent of the processor address size
Cache versus Virtual Memory
– Secondary storage
– Lower-level backing store for main memory
– The file system also occupies space on secondary storage
Issues of Virtual Memory Design
performance
size
Typical System with Virtual Memory
The CPU generates the virtual address
The operating system manages a lookup table (page table) that records:
– The location of the page or segment
– The mapping of virtual addresses to physical addresses
Page Faults (like “Cache Misses”)
– The current process is suspended
– The OS has full control over placement
[Figure: page fault, showing the CPU, memory and the page table]
Servicing a Page Fault: 3 Steps
[Figure: processor (Reg), memory, I/O controller and disks]
(1) Initiate block read: the processor signals the I/O controller to read the page from disk
(2) DMA transfer: the I/O controller copies the block from disk directly into memory
(3) Read done: the I/O controller signals that the read has completed
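The three steps can be walked through with a toy model. Everything in it is an assumption for illustration: 4-byte pages, a dict standing in for the disk, a `bytearray` for main memory, a list of free frames, and a plain copy standing in for the DMA transfer. It has no replacement policy, so it only works while free frames remain; all function names are hypothetical.

```python
# Sketch: servicing a page fault in the three steps on the slide (toy model).
PAGE_SIZE = 4  # tiny pages so the example stays readable (assumption)


def service_page_fault(vpn: int, page_table: dict[int, int], memory: bytearray,
                       disk: dict[int, bytes], free_frames: list[int]) -> None:
    # The faulting process is suspended while the OS services the fault.
    frame = free_frames.pop()  # OS has full control over placement: any free frame will do
    print(f"(1) initiate block read: disk page {vpn} -> frame {frame}")
    base = frame * PAGE_SIZE
    memory[base:base + PAGE_SIZE] = disk[vpn]  # (2) DMA transfer, modeled as a plain copy
    print("(2) DMA transfer done")
    page_table[vpn] = frame
    print("(3) read done: I/O controller signals completion, OS resumes the process")


def load_byte(vaddr: int, page_table: dict[int, int], memory: bytearray,
              disk: dict[int, bytes], free_frames: list[int]) -> int:
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:  # page fault (like a cache miss)
        service_page_fault(vpn, page_table, memory, disk, free_frames)
    return memory[page_table[vpn] * PAGE_SIZE + offset]


if __name__ == "__main__":
    disk = {0: b"ABCD", 1: b"EFGH"}
    memory = bytearray(2 * PAGE_SIZE)  # two page frames
    page_table: dict[int, int] = {}
    free_frames = [1, 0]
    print(chr(load_byte(5, page_table, memory, disk, free_frames)))  # faults, then prints F
    print(chr(load_byte(6, page_table, memory, disk, free_frames)))  # hit: prints G
```

The second access hits because the first one left the page resident, just as a cache block stays resident after a miss.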
Main memory design
Methods to improve the bandwidth of main memory
Concept of Virtual Memory
Servicing the page fault in Virtual Memory
Allah Hafiz (Goodbye)