Advanced Computer Architecture - Lecture 40: Input/Output systems. This lecture will cover the following: RAID and I/O system design; redundant array of inexpensive disks; I/O benchmarks; I/O system design; service accomplishment; service interruption; network attached storages and reliability;...
CS 704
Advanced Computer Architecture
Lecture 40
Input Output Systems
(RAID and I/O System Design)
Prof Dr M Ashraf Chughtai
MAC/VU-Advanced Computer Architecture Lecture 40 Input/Output System (3)
Last time we compared the performance of disk storage and flash memory
We noticed that flash is six times faster than the disk for reads, and the disk is six times faster than flash for data writes
– Then we discussed the trends in I/O interconnects: networks, channels, and backplanes
– Networks offer a message-based narrow pathway for distributed processors over long distances
The backplanes offer a memory-mapped wide pathway for centralized processing over short distances
The interconnects are implemented via buses
The buses are classified in two major categories: the I/O bus and the CPU-Memory bus
The channels are implemented using I/O buses, and the backplanes using CPU-Memory buses
Then we discussed the bus transaction protocols, which specify the sequence of events and timing requirements in transferring information as synchronous or asynchronous communication
We also discussed bus arbitration protocols ― the protocols by which a device that wishes to communicate reserves the bus when multiple devices need bus access
Here, we noticed that the bus arbitration
schemes usually try to balance two factors:
Bus priority: the device with the highest priority should be serviced first
Fairness: every device that wants to use the bus is guaranteed to get the bus eventually
The three bus arbitration schemes are:
Daisy Chain Arbitration
Centralized Parallel Arbitration
Distributed Arbitration
Storage I/O Performance
Now having discussed the basic types of
storage devices and the ways to interconnect them to the CPU, we are going to look into the ways to evaluate the performance of storage I/O systems
We know that the prime objective of a storage device is to remember the original information; if the device crashes and loses that information, it is not reliable
The reliability of a system can be improved
by using the following four methods
Reliability Improvement
Fault Avoidance – prevent fault occurrence by
construction
Fault Tolerance – providing service complying
with the service specification
by redundancy
Error Removal – minimizing the presence of
errors by verification
Error Forecasting – to estimate the presence,
creation and consequence
of errors by evaluation
Reliability, availability and dependability
The performance of storage I/Os is measured
in terms of its reliability, availability and
dependability
These terminologies were defined by Laprie in the paper entitled 'Dependable Computing and Fault Tolerance: Concepts and Terminology', published in the Digest of Papers of the 15th Annual Symposium on Fault-Tolerant Computing (1985)
Laprie defined dependability as the quality of delivered service such that reliance can
justifiably be placed on this service;
where the service delivered by a system is its observed actual behavior and the system
failure occurs when actual behavior deviates from the specified behavior
Note that a user perceives a system alternating between two states of delivered service; these states are:
Service Accomplishment – service is
delivered as specified and
Service Interruption – delivered service is different from the specified service
Quantifying the transitions between service accomplishment and service interruption is the measure of the dependability
Dependability is measured in terms of:
module reliability, which is the measure of continuous service accomplishment;
Measuring Reliability
and, module availability, which is the
measure of the swinging between the accomplishment and interruption states
of delivered service
Now before we discuss the reliable and
dependable designs of the storage I/O let us
understand the terminologies used to measure reliability, availability and dependability
The reliability of a module is the measure of
the time to failure from a reference initial
instant
Measuring Reliability … Cont’d
In other words, we can say the Mean Time To Failure (MTTF) of a storage module, such as a disk, is the measure of its reliability; and
The reciprocal of the MTTF is the rate of
failure; and
the service interruption is measured as the
Mean Time To Repair (MTTR)
Now let us understand, with the help of an example, how we can use these terminologies to measure the availability of a disk subsystem
Measuring Reliability: Example
Consider a disk subsystem comprising the following components
For the given MTTF values of each component, find the system failure rate and hence the system MTTF
Reliability Example … Cont’d
10 disks, each with MTTF = 1,000,000 Hrs
1 SCSI controller with MTTF = 500,000 Hrs
1 SCSI cable with MTTF = 1,000,000 Hrs
1 power supply with MTTF = 200,000 Hrs
1 fan with MTTF = 200,000 Hrs
Solution:
System Failure Rate = 10 × (1/1,000,000) + 1/500,000 + 1/1,000,000 + 1/200,000 + 1/200,000
= 23/1,000,000 per hour
System MTTF = 1/Failure Rate = 1,000,000/23
≈ 43,500 Hrs ≈ 5 years
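The solution above can be checked with a short Python sketch; the component list and MTTF values are those of the example, while the script itself is ours:

```python
# Failure rates of components in series add: lambda_sys = sum(n_i / MTTF_i).
# MTTF values in hours, taken from the example above.
components = [
    (10, 1_000_000),  # 10 disks
    (1, 500_000),     # SCSI controller
    (1, 1_000_000),   # SCSI cable
    (1, 200_000),     # power supply
    (1, 200_000),     # fan
]

failure_rate = sum(n / mttf for n, mttf in components)  # failures per hour
system_mttf = 1 / failure_rate

print(f"Failure rate = {failure_rate * 1_000_000:.0f} per million hours")  # 23
print(f"System MTTF  = {system_mttf:.0f} hours (~{system_mttf / 8760:.0f} years)")
```

Note that 1,000,000/23 is 43,478 hours; the lecture's 43,500 is a rounded figure.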
The availability of a module is the measure of the service accomplishment with respect to the swinging between the two states of accomplishment and interruption
The module availability, therefore, can be quantified as the ratio of the MTTF to the Mean Time Between Failures – MTBF (which is equal to the sum of MTTF and MTTR); i.e.,
Availability = MTTF / (MTTF +MTTR)
= MTTF / MTBF
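The availability formula can be expressed as a small helper; this is a sketch, and the example MTTR of 24 hours is an assumed value, not from the lecture:

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the module delivers its specified service:
    Availability = MTTF / (MTTF + MTTR) = MTTF / MTBF."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Illustrative (assumed) values: a disk with MTTF of 1,000,000 hours
# that takes 24 hours to replace and restore is available almost always.
print(availability(1_000_000, 24))
```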
Network Attached Storages and Reliability
Last time we discussed the disk storages and their interface to the processor using channel and backplane interconnects; and talked about the impact of disk storages and interconnects on the overall performance of the complete computing system
Today we will discuss the network interconnects used to interface multiple processors that are located over long distances and need high-performance storage service
Network Attached Storages and Reliability
A network provides well-defined physical and logical interfaces; i.e., it interconnects separate CPU and storage systems
The networks are capable of sustaining high-bandwidth transfers, and the file-server operating system supports remote file access
Hence, the reliability of network-attached storages is critical, and very high dependability is demanded of them
Network Attached Storage
[Slide: trends enabling network-attached storage]
– Decreasing disk diameters
– Increasing network bandwidth: 3 Mb/s » 10 Mb/s » 50 Mb/s » 100 Mb/s » 1 Gb/s » 10 Gb/s
– Network file services: networks capable of sustaining high-bandwidth transfers
– Network provides well-defined physical and logical interfaces: separate CPU and storage system; OS structures supporting remote file access
Network Attached Storages and Reliability
So, to improve both the availability and performance of storage systems, disk arrays were introduced, which contain many low-cost disks
The bandwidth of a disk array is improved by employing many small disk drives; and
the throughput is increased by having many small arms on small (3.5" – 1.8") disk drives rather than one long arm on a larger (14" – 24") disk; and
Manufacturing Advantages of Disk Arrays
[Slide: disk product families – a disk array needs only one disk design (3.5"), versus four disk designs for conventional systems]
Replace a Small Number of Large Disks with a Large Number of Small Disks! (1988 Disks)
[Slide: 1988 comparison of large disks versus an array of small drives (e.g., 11 W, 1.5 MB/s each); disk arrays have potential for large gains in throughput]
Network Attached Storages and Reliability
Simply spreading the data over many disks forces accesses to go to several disks and hence improves the throughput
The drawback of an array with more devices is that the dependability, and hence the reliability, decreases – generally N devices have 1/N the reliability of one device
Array Reliability: Example
Reliability of N disks = Reliability of 1 Disk ÷ N
Disk system MTTF = 50,000 Hours ÷ 70 disks
≈ 700 hours
Drops from 6 years to 1 month!
However, the dependability can be improved by adding redundant disks to the array to tolerate faults
Arrays without redundancy are too unreliable to be useful
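A minimal sketch of the 1/N reliability rule, using the example's numbers (the function name is ours):

```python
def array_mttf(disk_mttf_hours: float, n_disks: int) -> float:
    """MTTF of an array with no redundancy: any single disk failure
    loses data, so the array's failure rate is N times that of one disk."""
    return disk_mttf_hours / n_disks

# The example above: 70 disks, each with MTTF = 50,000 hours.
mttf = array_mttf(50_000, 70)
print(f"{mttf:.0f} hours ~= {mttf / (24 * 30):.1f} months")
```

The exact value is about 714 hours, which the lecture rounds to 700.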
Subsystem Organization
[Slide: a host adapter connects the host to an array controller, which drives several single-board disk controllers in small-format devices; striping software is off-loaded from the host to the array controller]
Redundant Arrays of Disks
In a disk array, files are "striped" across multiple spindles
Adding redundant disks to achieve high fault tolerance yields high data availability
Here, if a disk fails, its contents are reconstructed from data redundantly stored in the array
However, the drawbacks of redundant disks are:
Capacity penalty to store the redundant data
Bandwidth penalty to update it
System-Level Availability
[Slide: a fully dual-redundant organization – duplicated host I/O controllers and array controllers, with redundant paths down to the recovery groups of disks]
Goal: No Single Points of Failure
With duplicated paths, higher performance can be obtained when there are no failures
Redundant Arrays of Disks
These systems are known as RAID:
Redundant Array of Inexpensive Disks or
Redundant Array of Independent Disks
There exist several different approaches
to include redundant disks in the disk
array
These approaches are usually classified by a numerical value which identifies the RAID level
Each of these techniques has a different level of fault tolerance and a different overhead in redundant disks
Redundant Arrays of Disks
The fault tolerance and overhead in redundant disks, for RAID with 8 disks of user data, is as given below:

Level  Technique                               Fault tolerance  Check disks
0      No Redundancy                           0                0
1      Mirrored                                1                8
2      Memory-Style ECC                        1                4
3      Bit-Interleaved Parity                  1                1
4      Block-Interleaved Parity                1                1
5      Block-Interleaved Distributed Parity    1                1
6      P+Q Redundancy                          2                2
RAID 0 – Non-Redundant Striped
RAID 0 is a disk array without any redundant disk
However, here the data is striped across a set of disks, which makes the collection appear to the software as a single large disk
Note that the name RAID 0 is a misnomer, as there is no redundant disk; but since data striping is used, it is normally referred to as RAID
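As an illustration of striping, a minimal sketch, assuming block-level striping with a stripe unit of one block (the function is hypothetical, not from the lecture):

```python
def raid0_locate(logical_block: int, n_disks: int):
    """Map a logical block number to (disk index, block offset on that
    disk) for simple block-level striping; stripe unit = one block."""
    return logical_block % n_disks, logical_block // n_disks

# With 4 disks, consecutive logical blocks rotate across the array,
# so a large sequential read can be serviced by all disks in parallel.
for b in range(8):
    print(b, raid0_locate(b, 4))
```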
RAID 1: Disk Mirroring/Shadowing
Each disk is fully duplicated onto its "shadow"; targeted for high I/O rates
Whenever data are written to one disk, those data are also written to the redundant disk, forming a recovery group
RAID 1: Disk Mirroring/Shadowing
If a disk fails, the system just goes to the mirror, so the data survive provided only one disk of a mirrored pair fails
It is the most expensive solution: 100% capacity overhead
One logical write = two physical writes
If data worth 4 disks is to be striped and stored on 8 disks, there are two ways to stripe the data
RAID 1: Disk Mirroring/Shadowing
Note that, since 2001, there has been no commercial implementation of RAID 2, so we will not discuss this technique
RAID 3: Bit-Interleaved Parity Disk
Rather than having a complete copy of the
original disk, we can achieve desired
dependability by adding enough redundant
information to restore the lost information on
failure
RAID 3 uses one extra disk, called the parity disk, that holds the check information used in case of failure
RAID 3 acts logically as a single high-capacity, high-transfer-rate disk
The arms are synchronized logically and the spindles rotationally
RAID 3: Bit-Interleaved Parity Disk
[Slide: example bit patterns interleaved across the disks of the array]
RAID 3: Bit-Interleaved Parity Disk
Here, every read or write access goes to all the disks
For every read access, the parity is computed across the recovery group to protect against hard disk failures
Note that for the RAID 3 shown here, there is a 33% capacity cost for parity
However, wider arrays reduce the capacity cost, but decrease expected availability and increase reconstruction time
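The parity mechanism can be sketched in a few lines of Python, assuming the usual XOR parity (the helper names are ours):

```python
from functools import reduce

def parity(blocks):
    """XOR parity across the data disks, computed bytewise."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def reconstruct(surviving_blocks, parity_block):
    """Rebuild the contents of the one failed disk: the XOR of the
    surviving data blocks and the parity block."""
    return parity([*surviving_blocks, parity_block])

data = [b"\xC9", b"\x93", b"\x30"]   # three data disks, one byte each
p = parity(data)

# Lose disk 1; its contents come back from the survivors plus parity.
assert reconstruct([data[0], data[2]], p) == data[1]
```

The same XOR relationship underlies RAID 4 and RAID 5; only the placement and granularity of the parity differ.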
RAID 4: Block-Interleaved Parity and RAID 5: Distributed Block-Interleaved Parity
Both the RAID 4 and RAID 5 levels use the same ratio of data disks to parity disks as RAID 3, but they access data differently
The distribution of data in RAID 4 versus RAID 5 is shown here
In block-interleaved parity RAID 4, a parity block is associated with each row of data blocks, and the parity is kept on a single dedicated disk, identical to RAID 3
So it supports a mixture of small reads and small writes, and large reads and large writes
RAID 4: Block-Interleaved Parity
[Slide: data blocks laid out across the disk columns with logical disk addresses increasing along each stripe of stripe units; the parity block of each stripe (e.g., P5 for D20–D23) resides on the dedicated parity disk]
RAID 4: Block-Interleaved Parity and RAID 5: Distributed Block-Interleaved Parity
However, one drawback of this system is that the parity disk must be updated on every write, which is a bottleneck for back-to-back writes
This bottleneck is resolved in distributed block-interleaved parity RAID 5, where the parity blocks are distributed among all the disks
Note from the RAID 5 organization shown here that the parity associated with each row of data blocks is no longer restricted …
RAID 5: Distributed Block-Interleaved Parity
[Slide: RAID 5 layout in which the parity block of each stripe of stripe units rotates across the disks]
RAID 4: Block-Interleaved Parity and RAID 5: Distributed Block-Interleaved Parity
… to a single disk
Hence, this organization allows multiple writes to occur simultaneously, as long as the stripe units are not located on the same disk
For example:
the 1st write, to block 8, must also access its parity block P2 (i.e., two reads from two disks – the 1st and 3rd disks); and
MAC/VU-Advanced
Computer Architecture Lecture 40 Input / Output System (3) 41
RAID 4 and RAID 5
the 2nd write, to block 5, implies an update of P1 (i.e., two reads from two disks – the 2nd and 4th disks)
Thus, the two writes can occur at the same time, in parallel
Whereas, if you look into the organization of RAID 4, both P1 and P2 are on the same disk (the 5th disk), so the parity disk would be a bottleneck and the two writes could not proceed simultaneously
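The difference in parity placement can be sketched as follows; the RAID 5 rotation shown is one common style of layout, assumed here for illustration, and the function is ours:

```python
def parity_disk(stripe: int, n_disks: int, raid_level: int) -> int:
    """Return the disk (0-based) holding the parity block of a stripe.
    RAID 4: always the last, dedicated parity disk.
    RAID 5 (assumed rotation): parity moves one disk to the left
    on each successive stripe, so it is spread over all disks."""
    if raid_level == 4:
        return n_disks - 1
    return (n_disks - 1 - stripe) % n_disks

# With 5 disks, writes touching different stripes hit different parity
# disks under RAID 5, so they can proceed in parallel; under RAID 4
# every such write queues for the single parity disk.
print([parity_disk(s, 5, 4) for s in range(4)])  # [4, 4, 4, 4]
print([parity_disk(s, 5, 5) for s in range(4)])  # [4, 3, 2, 1]
```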
RAID 4 and RAID 5
In RAID 4 and RAID 5, the parity is stored as blocks and is associated with a set of data
blocks
In RAID 3, every access goes to all the disks, while levels 4 and 5 use smaller accesses, which allow independent accesses to occur in parallel
In RAID 4 and RAID 5, the error-detection information in each sector is checked independently for 'small reads' to see whether the data in one sector are correct