Advanced Computer Architecture - Lecture 43: Networks and clusters. This lecture will cover the following: internetworks; cluster; case studies; OSI layers; Transmission Control Protocol/Internet Protocol (TCP/IP); non-standard connections; division of memory;...
CS 704
Advanced Computer Architecture
Lecture 43
Networks and Clusters
(Internetworks and Clusters)
Prof Dr M Ashraf Chughtai
Recap:
In our last two lectures on Networks and
Clusters we discussed:
The formation of generic interconnection
networks and their categorization, the
network communication model, performance, media, software, protocols, subnets and
network topologies
Here, we noticed that a generic interconnection network comprises: computer nodes, the H/W and S/W interface, links to the interconnection
network and the communication subnet
The interconnect communication model
shows that two machines are connected via
two unidirectional wires with a FIFO (queue) at the end to hold the data
The communication software separates the
header and trailer from the message and
identifies the request, reply, their
acknowledgments and error checking codes
The communication protocols suggest the
sequence of steps for reliable communication
We also discussed:
the properties and performance of interconnect network media or links – the unshielded twisted pair (UTP), coaxial cable and fiber optics
the formation of bus-based and switch-based
communication subnets and introduced the
network topologies
The bus-based communication subnets share a common medium, where arbitration is the
bottleneck
The alternative to sharing media is to use a switch
that provides a dedicated line to each destination; this facilitates point-to-point
communication much faster than the shared
media
The switch-based networks are classified as
centralized and distributed switch networks
Here the routing, to establish an interconnection between two nodes at a time, depends on the
addressing style: source-based routing or
destination-based routing
– Bandwidth – number or length of messages
passing per sec.
– Degree – number of links connected to a node
– Diameter – maximum number of nodes (hops) between a source and a destination; this is indeed the measure of
maximum latency
– Bisection – the imaginary line that divides the
interconnect into roughly two equal parts, each having half the nodes
– Bisection Bandwidth – the volume of
communication allowed between any two
halves of the network with an equal number of nodes
Last time, we discussed an intermediate class
of network interconnect – the Multistage Switch
network
It is built from a number of large switch boxes,
each containing a number of small crossbar switches
The performance of the Multi-stage switch lies
between the performance of non-blocking crossbar and bus-based networks
Following the discussion on centralized switch topologies we studied the distributed-switch
interconnects, which are categorized as
fully-connected or partially-connected, and
symmetric or asymmetric interconnects
The distributed-switch interconnect topologies, such as linear array, ring, 2D mesh/torus and hypercube, were studied
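The topology measures recapped above (degree, diameter, number of links, bisection) can be checked on small examples. The following is a minimal sketch, assuming a 64-node ring, an 8 x 8 2D torus and a 6-dimensional hypercube; the function names are illustrative only, the diameter is counted in hops (links traversed), and the bisection is counted for one particular even split, which for these regular topologies matches the usual bisection figure.

```python
from collections import deque


def ring(n):
    """Each node links to its two neighbours on the ring."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def torus_2d(side):
    """side x side mesh with wrap-around links in both dimensions."""
    adj = {}
    for r in range(side):
        for c in range(side):
            adj[r * side + c] = {
                ((r - 1) % side) * side + c,
                ((r + 1) % side) * side + c,
                r * side + (c - 1) % side,
                r * side + (c + 1) % side,
            }
    return adj


def hypercube(dim):
    """Nodes are dim-bit numbers; neighbours differ in exactly one bit."""
    return {i: {i ^ (1 << b) for b in range(dim)} for i in range(1 << dim)}


def degree(adj):
    return max(len(neigh) for neigh in adj.values())


def link_count(adj):
    # Every link appears in two adjacency sets.
    return sum(len(neigh) for neigh in adj.values()) // 2


def diameter(adj):
    """Longest shortest path, in hops, found by BFS from every node."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best


def cut_links(adj):
    """Links crossing the split {0..n/2-1} | {n/2..n-1}."""
    half = set(range(len(adj) // 2))
    return sum(1 for u in half for v in adj[u] if v not in half)


if __name__ == "__main__":
    for name, adj in [("ring", ring(64)),
                      ("2D torus", torus_2d(8)),
                      ("hypercube", hypercube(6))]:
        print(name, degree(adj), link_count(adj), diameter(adj), cut_links(adj))
```

Under these assumptions the hypercube offers the largest bisection and the smallest diameter of the three, at the price of the most links per node.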
We also discussed the relative cost and
performance of these topologies, based on the bisection bandwidth and number of links for 64 nodes network; which is shown in the
Today’s Topics: Internetworking
So far we have been talking about the design
styles, topologies and performance of
interconnection networks
Now we are going to talk about the connection
of two or more interconnection networks,
called Internetworking; the Internet is a typical example of Internetworking
Internetworking deals with making computers on independent and incompatible
networks communicate reliably and efficiently
Internetworking relies on communication
standards to convert information from one kind
of network to another
These standards are composed of a hierarchy of layers, where each layer is responsible for a
portion of the overall communication
Each computer, network and switch
implements its layers of the standards, called
Protocol Families or Protocol Suites, which
allow applications to work with any
interconnection
OSI: 7-Layer Model
The Open Systems Interconnection (OSI) 7-layer model, developed by the International Organization for Standardization (ISO), describes a network as a series of layers, with the
Application layer at the top (i.e., layer 7) and the
Physical layer at the bottom (layer 1), and the
presentation, session, transport, network and data link layers in between, as layer-6 down to layer-2, respectively
In the OSI model, layer-7, the Application layer, is used for applications specifically written to run over the network, e.g., Network File System (NFS)
OSI Layers
The layer-6, Presentation layer, translates from application to network format and vice versa
The layer-5, Session Layer, establishes,
maintains and ends the sessions across the
network
The layer-4, Transport Layer, provides
additional connection handling below the session layer; the example protocol is the Transmission Control Protocol (TCP)
OSI Layers
The layer-3, Network Layer, translates network addresses and names to their physical addresses, e.g., a computer name to a Media Access Control (MAC) address; the layer-3 protocol is referred
to as the Internet Protocol or IP
The layer-2, Data Link layer, turns packets into raw bits and, at the receiving end, turns bits into packets; the example protocol is Ethernet
The layer-1, Physical Layer, transmits the raw bit-stream over the physical cable/media; IEEE 802 is a typical example of a physical layer protocol
TCP/IP Families
The protocol family divides the responsibilities among the layers, with each layer offering
services needed by the layer above
The Transmission Control Protocol/Internet
Protocol - TCP/IP is the most popular
TCP/IP Families
… layer and removing at the receiving layer
The original message, from the top layer,
includes a header and trailer sent by the lower-level protocol
The next-lower protocol in turn adds its own header (and possibly trailer) to the message, and so on
If the message is too large for a particular
layer, then it is broken into smaller messages; this division of the message and addition of header and trailer continues till the message descends to the physical transmission media
The message is then sent to the destination
Each level of the protocol family at the receiving
end, from the bottom to the top layer, checks the message at its level, removes its header and trailer, and passes it on to the next higher level
The message is rebuilt by putting the pieces
together
TCP/IP Families
This nesting of protocol layers is referred to
as the Protocol Stack, as it reflects the Last-In First-Out nature of the addition and removal of the header and trailer
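The Last-In First-Out behaviour of the protocol stack can be illustrated with a minimal sketch. It assumes three layers with made-up textual headers and trailers (the tags `<TCP>`, `<IP>` and `<ETH>` are purely illustrative and are not real header formats), and it leaves out the splitting of large messages.

```python
# Hypothetical layer labels, listed from the top of the stack to the bottom.
LAYERS = ["TCP", "IP", "ETH"]


def send(message: bytes) -> bytes:
    """Descend the stack: each layer wraps what it receives from above."""
    frame = message
    for name in LAYERS:
        frame = f"<{name}>".encode() + frame + f"</{name}>".encode()
    return frame


def receive(frame: bytes) -> bytes:
    """Ascend the stack: the last header added is the first one removed."""
    for name in reversed(LAYERS):
        header = f"<{name}>".encode()
        trailer = f"</{name}>".encode()
        if not (frame.startswith(header) and frame.endswith(trailer)):
            raise ValueError(f"layer {name}: malformed frame")
        frame = frame[len(header):len(frame) - len(trailer)]
    return frame


if __name__ == "__main__":
    wire = send(b"hi")
    print(wire)            # outermost wrapper belongs to the lowest layer
    print(receive(wire))   # original message recovered at the top layer
```

The lowest layer's header ends up outermost on the wire, so the receiver must strip headers in the reverse of the order in which the sender added them.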
A typical TCP/IP datagram, containing header and message, is depicted here
Fig 8.27 pp 835 Text book
TCP/IP Families
The standard IP and TCP headers are 20 bytes each, stacked as shown
However, the header length can optionally be
increased, which is specified by the length field (L)
The length of the whole datagram is identified
by a separate field ‘Length’ in the IP header, while the TCP header includes this information in the
‘sequence number’ field
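The 20-byte header sizes and the header-length fields can be made concrete with a minimal sketch using Python's standard `struct` module. The field layout follows the standard IPv4 and TCP header formats without options; the addresses, ports and other field values are arbitrary placeholders, and the checksums are left as zero, so this is not a datagram that a real network stack would accept.

```python
import struct

# Standard option-free header layouts in network byte order.
IP_FMT = "!BBHHHBBH4s4s"   # version/IHL, TOS, total length, id, flags/frag,
                           # TTL, protocol, checksum, source, destination
TCP_FMT = "!HHIIBBHHH"     # ports, sequence, ack, data offset, flags,
                           # window, checksum, urgent pointer


def build_datagram(payload: bytes) -> bytes:
    ip_len = struct.calcsize(IP_FMT)     # 20 bytes
    tcp_len = struct.calcsize(TCP_FMT)   # 20 bytes
    total = ip_len + tcp_len + len(payload)
    version_ihl = (4 << 4) | (ip_len // 4)   # header length in 32-bit words
    ip = struct.pack(IP_FMT, version_ihl, 0, total, 0, 0, 64, 6, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    data_offset = (tcp_len // 4) << 4        # TCP header length, also in words
    tcp = struct.pack(TCP_FMT, 1234, 80, 0, 0, data_offset, 0, 65535, 0, 0)
    return ip + tcp + payload


def parse_lengths(datagram: bytes):
    """Return (IP header bytes, TCP header bytes, total datagram bytes)."""
    version_ihl, _tos, total = struct.unpack("!BBH", datagram[:4])
    ip_header_len = (version_ihl & 0x0F) * 4
    tcp_header_len = (datagram[ip_header_len + 12] >> 4) * 4
    return ip_header_len, tcp_header_len, total


if __name__ == "__main__":
    print(parse_lengths(build_datagram(b"hello")))
```

Both header-length fields count 32-bit words, which is why a value of 5 corresponds to the standard 20-byte header.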
As a detailed discussion of TCP/IP is
beyond the scope of this course on Computer
Architecture,
we are leaving this discussion here and are going
to talk about ‘clusters’, the last topic of our study of ‘Networks and Clusters’,
and indeed the last topic of this course on Advanced
Computer Architecture
However, students interested in further study
of Internetworks may consult the literature and books
on Computer Networks and Internetworking
Clusters – System Area Networks
The coordinated use of interconnected
computers in a machine room is referred to as the cluster or System Area Network
Massively parallel machine providing high
bandwidth can be built from off-the-shelf
components, instead of depending on the
custom machines or networks
A cluster, i.e., a collection or bunch of desk-top computers and disks, offers a low cost
computing infrastructure that can tackle very large problems and applications, such as
databases, file servers, Web servers,
simulation, and multiprogramming and batch processing
Clusters Performance Challenges
Let us talk about these challenges one by one
Non-Standard Connections: As you know, multiprocessors are usually connected using the memory bus, which offers high bandwidth and low latency
In contrast, clusters are connected
using the I/O bus of the computer, and thus have large conflicts at high speed
Division of Memory: A large single program
running on a cluster of N machines requires N independent memory units and N copies of the
operating system; on the other hand,
a shared-address multiprocessor allows a single program to use almost all the memory in the computer
Clusters Performance Advantages
However, contrary to these challenges,
clusters have advantages in respect of
dependability and scalability
The weakness of separate memories for
program size in the case of a cluster, as discussed earlier, is indeed a strength in terms of system availability and expandability (or, say,
scalability)
Furthermore, as the cluster consists of
independent computers connected through a LAN, and the cluster software is a layer that
runs on top of the local operating system,
it is easier, as compared to the multiprocessor,
to replace any computer without bringing down the whole cluster; hence a cluster offers high dependability and scalability
Furthermore, it is easier to expand a cluster;
therefore, it is attractive to World Wide Web service providers
Cluster Design Examples
In order to study the practical aspects of cluster
designs, we are going to discuss different
cluster designs comprising 32 processors, 32
GB DRAM, and 32 or 64 disks
For the different cluster designs, let us consider P-III processors operating at clock rates of 700
MHz and 1000 MHz, which include a large L2 cache
ranging from 256 KB to 1 MB
However, note that due to the larger die size, the
price of the processor chip with a 1 MB cache is double
that of the chip with a 256 KB cache
Cluster Design Examples … Cont’d
In cluster design, the higher price of the chip matters little; the objective is to minimize cost for the desired performance target
We are considering following four cases:
1 Cost of cluster hardware with local disk
2 Cost of cluster hardware with disk over SAN
(system or storage area network)
3 Cost of cluster options that is more realistic
4 Cost and Performance of a cluster for
transaction processing
Cluster Design Examples … Cont’d
Example 1
In order to discuss the first case, Cost of cluster hardware with local disk, let us consider three logical organizations of clusters:
a) Uniprocessor cluster
b) 2-way SMP (Symmetric shared-memory Multiprocessor) cluster
c) 8-way SMP cluster
Uniprocessor Cluster Design
The Uniprocessor cluster organization, shown
here, consists of 32 xSeries 300 computers (for 32 processors)
Fig 8.34 a pp 846
As the maximum memory for this computer is 1.5
GB, it easily allows the desired 32 GB (1 GB x 32) of
memory
As each computer has 2 disk drives of 36.4
GB each, it yields 32 x 2 x 36.4 = 2329.6 GB, i.e., about 2330 GB
The organization uses the built-in slots for
storage, so each computer can accept its own Gbit host adapter; hence 32 cables are available for the 1 Gbit Ethernet switch
However, as the switch has 30 ports, 2 switches are used
These two switches are connected together with
4 cables, leaving 56 ports for 32 computers
The standard rack is 19” x 30” x 72” [W x D x H] and can accommodate the 32 uniprocessor
computers, which need 34 rack units (32 for the computers and 2 for the switches)
This design is cost effective
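The arithmetic behind the uniprocessor design can be reproduced with a minimal sketch. The constants are the figures quoted in the slides; the one rack unit per computer and per switch is inferred from the "32 for computers and 2 for switches" breakdown, and the names are illustrative only.

```python
import math

COMPUTERS = 32            # one processor per xSeries 300 computer
DISKS_PER_COMPUTER = 2
DISK_GB = 36.4
SWITCH_PORTS = 30
RU_PER_COMPUTER = 1       # inferred from the slide's rack-unit breakdown
RU_PER_SWITCH = 1         # inferred from the slide's rack-unit breakdown


def uniprocessor_cluster():
    """Disk capacity, switch count and rack space for the 32-computer design."""
    disk_gb = COMPUTERS * DISKS_PER_COMPUTER * DISK_GB
    # One cable per computer; a 30-port switch cannot take all 32.
    switches = math.ceil(COMPUTERS / SWITCH_PORTS)
    rack_units = COMPUTERS * RU_PER_COMPUTER + switches * RU_PER_SWITCH
    return {"disk_gb": disk_gb, "switches": switches, "rack_units": rack_units}


if __name__ == "__main__":
    print(uniprocessor_cluster())
```

The sketch does not model the inter-switch cables, so it says nothing about how many ports remain free.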
2-way SMP Cluster Design
If we use the 2-processor xSeries
330 computer, as shown here, everything is halved
Fig 8.34 (b) pp 846
Here, 32 processors need only 16 computers, and a
single 30-port switch suffices as there are only 16
cables to be interfaced
Furthermore, the rack space needed is 18 RU instead of 44 RU, which is less than half the standard size
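How the number of computers, the memory per computer and the number of network cables change with the SMP width can be sketched as follows. This is a minimal illustration assuming the fixed targets of 32 processors and 32 GB of memory used throughout these examples, with one network cable per computer; the function name is hypothetical.

```python
def size_cluster(processors_per_computer, total_processors=32, total_memory_gb=32):
    """Computers, memory per computer and cables for a given SMP width."""
    computers = total_processors // processors_per_computer
    return {
        "computers": computers,
        "memory_per_computer_gb": total_memory_gb / computers,
        "network_cables": computers,   # assumption: one cable per computer
    }


if __name__ == "__main__":
    for way in (1, 2, 8):
        print(f"{way}-way:", size_cluster(way))
```

Wider SMP nodes mean fewer computers and fewer cables, while each computer must hold a larger share of the 32 GB.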
8-way SMP Cluster Design
With the 8-processor xSeries 370 computer, as
shown here, only 4 computers are used, as they
contain 4 x 8 = 32 processors
Fig 8.34 (c) pp 846
The maximum memory is 32 GB but we need only
8 GB per computer, and for 4 computers an
8-port switch is sufficient
However, at 2 disks per computer, the 4 computers can hold only 8 disks, with a maximum capacity of 73.4 GB per disk; hence we need an expansion storage box, which can hold up to 14 disks; and 2 racks are
Comparison of 3-Cluster Designs
The price of the three clusters with a total of 32
processors, 32 GB of memory and 2.3 terabytes of
disk is shown here
Fig 8.35 pp 848
Note that the network cost decreases as the size
of the SMP increases, because the memory
buses supply more of the inter-processor
communication
Furthermore, four 8-way SMP computers cost more than 32 Uniprocessor computers
Cluster Design Examples … Cont’d
Example 2
Now let us discuss the 2nd case, Cost of cluster hardware using SAN (storage area network) for disks
In the previous example we kept the disks local to the
computers, which reduces the cost and space
However, it poses the following problems for the operator
1: No protection against single disk failure
Cluster Design Examples … Cont’d
A disk failure therefore brings the system down
To overcome this problem, a RAID controller with a Fiber Channel Arbitrated Loop (FC-AL) is used as the storage area network (SAN)
In this case all the SCSI disks are replaced by
FC-AL disks behind the RAID storage server
Note that FC-AL can be connected in a loop
with up to 127 devices
Cluster Design Examples … Cont’d
The price comparison for the three clusters,
using SAN, is shown here
Cluster Design Examples … Cont’d
However, the software (database) cost and the
hardware maintenance cost (the cost of an operator
to keep the machine running) have not been
considered
Cluster Design Examples … Cont’d
The other costs include the cost of backup tapes and the cost of space to house the servers
A complete comparison of the earlier three
clusters, including the other costs, is shown here
Fig 8.39 pp 852
It shows that the 2-way SMP using SAN is lowest in
total price
However, over the 3 years, the cost of the operator will
be more than the cost of the hardware; so we must reduce the purchase cost of the old computers to reduce the overall cost
Cluster Design Examples … Cont’d
Example 4
Now let us discuss the 4th case, cluster design
for transaction processing, shown here
Fig 8.40 pp 853
The cluster has 32 P-III processors, using the same IBM computer as the basic building block that was employed in the earlier designs