Chapter 15 - Distributed operating systems. This chapter discusses important features of these components and the manner in which these features influence the computation speedup, reliability, and performance that can be achieved in a distributed system.
PROPRIETARY MATERIAL. © 2007 The McGraw-Hill Companies, Inc. All rights reserved. No part of this PowerPoint slide may be displayed, reproduced or distributed
– Thus, each node performs some OS functions
* Data used by an OS function may be spread across many computers
* Non-local data is accessed through the network
Distributed Systems
• Distributed systems consist of four components
– Individual computer systems
– Network connecting the computer systems
– Distributed computations
– Distributed operating system
We discuss the basics of these four components in this chapter
Benefits of distributed systems
• Distributed systems provide five key benefits
* Cost of enhancing a capability is proportional to the additional capability desired
Made possible by open system standards
Nodes in a distributed system
• Nodes can be of different types
– Workstation
* Has a single CPU and single user
– Minicomputer
* Has a single CPU but many users
It is also called a process pool node
– Cluster
* A group of nodes that work together in an integrated manner
Operating systems for a distributed system
• The OS must provide
– Resource sharing across boundaries of systems
– Computation speed-up of applications
– Reliability
– Good performance of the distributed system
• Two kinds of operating systems
– Network operating systems
* Only provide resource sharing
– Distributed operating systems
* Integrate functioning of individual computers
A network operating system
• The network OS layer exists between a process and the kernel of OS
• It recognizes requests for access to remote resources and implements them
• It passes other requests to the kernel
Distributed Operating Systems
• Differences with a conventional OS
– A distributed OS integrates functioning of individual computers and scatters processes of an application to various nodes
* Achieves computation speed-up and resource efficiency
* Helps in providing reliability
– Examples
* Windows cluster
Node manager detects faults, failover manager provides reliability
* Sun Cluster software
Global process management and a distributed file system enable process migration when a failure occurs
* Amoeba distributed OS
Reliable interprocess communication
• Communication between processes takes place through the network. It raises the following issues
– Naming of processes
* Processes should be able to find each other’s network addresses
Address of a process is a pair (<host_name>, <process_id>)
The domain name service (DNS) is a distributed service for obtaining the IP address of each computer
* A name server is a generic name for this arrangement
– Reliability of communication
* Interprocess messages may be lost due to congestion in the network
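The naming step above can be sketched as a lookup that turns a (<host_name>, <process_id>) pair into a network address. This is a minimal sketch, assuming a hypothetical in-memory table in place of a real name server; the host names and the 192.0.2.x documentation addresses are illustrative only.

```python
# Hypothetical name-server table: host name -> IP address (illustrative values).
NAME_SERVER = {
    "alpha": "192.0.2.10",
    "beta": "192.0.2.20",
}


def resolve(process_address):
    """Map a (host_name, process_id) pair to an (ip_address, process_id) pair."""
    host_name, process_id = process_address
    if host_name not in NAME_SERVER:
        raise LookupError("unknown host: %s" % host_name)
    return NAME_SERVER[host_name], process_id
```

For example, `resolve(("alpha", 42))` looks up the host part only; the process id is passed through unchanged.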
Interprocess Communication (IPC) Protocols
• Processes in a distributed application communicate through messages sent using an interprocess communication (IPC) protocol
– IPC protocol is made reliable through three features
IPC Protocols
• An IPC protocol specifies actions to be performed in the sender and destination processes of a message
– A reliable protocol guarantees delivery of a message
* It has at-least-once or exactly-once properties
– A blocking protocol blocks the sender of a message until the message is delivered
* This action simplifies the protocol and reduces its memory requirements
Blocking version of Request-reply-acknowledgment (RRA) protocol
• Sender site copies the message in a buffer, sends it and blocks the sender
• Receiver saves reply in buffer and sends it; resends if duplicate request received
• Sender sends an acknowledgment (ack) of the reply
• Timeouts and retransmissions occur in both sender and receiver
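The four steps above can be sketched as a small single-process simulation. This is a sketch under stated assumptions, not the protocol as specified in the chapter: the `LossyChannel`, `Receiver`, and `blocking_send` names are hypothetical, message loss is made deterministic so the run is repeatable, a "timeout" is modeled as a lost message, and the reply is simply the upper-cased request.

```python
class LossyChannel:
    """Hypothetical channel that drops the messages whose send index is listed."""

    def __init__(self, drop_indices):
        self.drop_indices = set(drop_indices)
        self.sent = 0

    def deliver(self, message):
        index = self.sent
        self.sent += 1
        return None if index in self.drop_indices else message


class Receiver:
    def __init__(self):
        self.reply_buffer = {}  # (sender, seq) -> saved reply
        self.executions = 0     # how many times a request was actually processed

    def on_request(self, sender, seq, payload):
        key = (sender, seq)
        if key not in self.reply_buffer:      # first copy: compute and save the reply
            self.executions += 1
            self.reply_buffer[key] = payload.upper()
        return self.reply_buffer[key]         # duplicate request: resend saved reply

    def on_ack(self, sender, seq):
        self.reply_buffer.pop((sender, seq), None)  # ack releases the buffer


def blocking_send(sender, seq, payload, receiver, channel, max_tries=5):
    """Send a request and stay in the loop (blocked) until a reply arrives."""
    request_buffer = payload                  # sender keeps a copy for retransmission
    for attempt in range(1, max_tries + 1):
        request = channel.deliver((sender, seq, request_buffer))
        if request is None:
            continue                          # request lost: time out and retransmit
        reply = channel.deliver(receiver.on_request(*request))
        if reply is None:
            continue                          # reply lost: time out and retransmit
        if channel.deliver("ack") is not None:
            receiver.on_ack(sender, seq)      # a lost ack leaves the buffer in place
        return reply, attempt
    raise TimeoutError("no reply after %d tries" % max_tries)
```

With a channel that drops the first request and the first reply, the sender needs three attempts, yet the receiver processes the request only once because the retransmitted request is answered from the saved reply.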
Non-blocking version of Request-reply (RR) protocol
• Sender site buffers the request and sends it; sender is not blocked
• Receiver computes a reply if not a duplicate, buffers and also sends to sender
• Timeout and retransmission can occur in the sender
• Sender is interrupted when a reply is received to its message
Simplification due to idempotency
• Idempotency simplifies IPC protocols
– Definition: An idempotent computation is one that yields the same result if recomputed
* i := 5 is idempotent
* i := i + 1 is not idempotent
– A duplicate message can be reprocessed if its processing involves idempotent computations
* Duplicate messages need not be discarded!
It simplifies an IPC protocol
However, may make it slower
* Read / write operations on files are idempotent
Distributed file systems can omit discarding of duplicates
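The distinction can be illustrated with a file-like buffer. This is a minimal sketch with hypothetical helper names: a write that names its offset explicitly plays the role of `i := 5`, while an append plays the role of `i := i + 1`. It assumes a file write carries an explicit offset; that assumption is what makes reprocessing a duplicate harmless here.

```python
def write_at(buf, offset, data):
    """Idempotent: repeating the same write leaves the same contents."""
    end = offset + len(data)
    if len(buf) < end:
        buf.extend(b"\x00" * (end - len(buf)))  # grow the buffer if needed
    buf[offset:end] = data


def append(buf, data):
    """Not idempotent: every repetition adds the data again."""
    buf.extend(data)


def process(handler, initial, args, copies):
    """Apply the same message `copies` times, as if duplicates were reprocessed."""
    buf = bytearray(initial)
    for _ in range(copies):
        handler(buf, *args)
    return bytes(buf)
```

Processing the `write_at` message once or twice gives the same buffer, so the duplicate need not be detected; processing the `append` message twice changes the result.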
IPC protocols
• Q: Analyze the RRA and RR protocols and determine
– their buffering requirements
* RRA
Destination site needs one buffer for each sender process
Releases buffer on receiving ack or next request
* RR
Sender may send more requests before receiving ack
Destination site needs many buffers for same sender process
When can a buffer be released?
– their semantics
* Both RRA and RR: Basically at-least-once semantics
* Provide exactly once semantics if duplicate requests are discarded
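The difference between the two semantics can be sketched with a server whose operation is not idempotent. This is a sketch only: the `CounterServer` class, the (sender, sequence number) key used to recognize duplicates, and the retransmission pattern in `run` are all assumptions made for illustration.

```python
class CounterServer:
    """Hypothetical server whose request handler increments a counter."""

    def __init__(self, discard_duplicates):
        self.discard_duplicates = discard_duplicates
        self.count = 0
        self.saved_replies = {}  # (sender, seq) -> reply already sent

    def handle(self, sender, seq):
        key = (sender, seq)
        if self.discard_duplicates and key in self.saved_replies:
            return self.saved_replies[key]   # duplicate: do not recompute
        self.count += 1                      # non-idempotent processing
        self.saved_replies[key] = self.count
        return self.count


def run(server):
    # Request 1 arrives twice because the sender timed out and retransmitted it.
    for seq in (1, 1, 2):
        server.handle("p1", seq)
    return server.count
```

Without discarding, the two distinct requests are processed three times (at-least-once); with discarding, each is processed exactly once.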
Distributed computation paradigms
• A distributed application may organize data in several
* Parts of data are kept in different nodes
A distributed computation paradigm provides effective support for distributed computations
Modes of accessing remote data
• Data in a distant node can be accessed in three ways
– Remote data access
* Data is accessed over the network
* Slows down operation of the application
– Data migration
* Data is moved to the site of the application
* May complicate management of replicated data
– Process migration
* An application process is moved to the site of the data
Distributed Computation Paradigms
• Features of three paradigms
– Client–server computing
* A client invokes the server, the server provides a service
* Used for remote data access; not suitable for distributed computing
– Remote procedure call (RPC)
* The remote procedure is installed by a system administrator and registered with a name server
* Provides exactly-once semantics; used for distributed computing
– Remote evaluation
* Program uses the statement: At <node> eval <code_segment>
* Compiler makes provision to transfer <code_segment> to <node>
* <code_segment> is executed at remote node and results returned
RPC implementation
• Client stub marshals the parameters, converts them to machine-independent form, and consults the name server to find the location of the remote procedure
• Server stub extracts parameters, invokes procedure, packs the results
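The division of work between the two stubs can be sketched in one process. This is a sketch under stated assumptions: JSON stands in for the machine-independent form, a dictionary named `NAME_SERVER` stands in for the name server, and a direct function call stands in for the network. All names are hypothetical.

```python
import json

# Hypothetical name server: procedure name -> server stub (stands in for a location).
NAME_SERVER = {}


def register(name, procedure):
    """Install a remote procedure and register its server stub under `name`."""

    def server_stub(request_bytes):
        request = json.loads(request_bytes.decode("utf-8"))    # extract parameters
        result = procedure(*request["args"])                   # invoke the procedure
        return json.dumps({"result": result}).encode("utf-8")  # pack the results

    NAME_SERVER[name] = server_stub


def client_stub(name, *args):
    """Marshal the arguments, locate the procedure, and return the unpacked result."""
    request_bytes = json.dumps({"proc": name, "args": list(args)}).encode("utf-8")
    server_stub = NAME_SERVER[name]            # consult the name server
    reply_bytes = server_stub(request_bytes)   # stands in for send and receive
    return json.loads(reply_bytes.decode("utf-8"))["result"]


def add(a, b):
    return a + b


register("add", add)
```

A caller writes `client_stub("add", 2, 3)` as if it were a local call; only byte strings cross the boundary between the two stubs.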
• Case studies
– Sun RPC
* Designed for client–server computing; has at-least-once semantics
* Interface language (XDR) and interface compiler (rpcgen)
Rpcgen produces stubs, remote procedure and a header file
Remote procedure accepts a single parameter
– Java RMI
* The server creates a remote object whose methods offer services
* Services are registered with the rmiregistry name server
* Client consults rmiregistry for a service, obtains a handle for it
* A serializable object can be passed as a parameter; the service can invoke its methods (resembles remote evaluation)
Types of networks
• The LAN is confined to a laboratory, a building or a cluster of buildings
• The WAN connects geographically distant nodes
Network topologies
• The star topology has a single point of failure
• Bidirectional ring can tolerate link failures, but not failure of intermediate nodes
• Fully and partially connected topologies offer tolerance of link and node failures
* A token circulates over the ring, has a free / busy flag
* Station transmits when it sees token with free flag
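The token-passing rule in the two lines above can be sketched as a step-by-step simulation. This is a simplified sketch, not a full token ring: it assumes (hypothetically) that a station marks the token busy when it transmits, that the frame travels once around the ring, and that the sending station then resets the flag to free and passes the token on.

```python
def simulate_ring(num_stations, pending, steps):
    """Move the token one station per step; return frames that completed a trip.

    `pending` maps a station number to the list of frames it wants to send.
    """
    token = {"busy": False, "src": None, "frame": None}
    completed = []
    position = 0
    for _ in range(steps):
        if token["busy"]:
            if token["src"] == position:
                # The frame has gone round the ring: the sender frees the token.
                completed.append((position, token["frame"]))
                token = {"busy": False, "src": None, "frame": None}
        elif pending.get(position):
            # The station sees a free token and has a frame to send.
            token = {"busy": True, "src": position,
                     "frame": pending[position].pop(0)}
        position = (position + 1) % num_stations
    return completed
```

In a three-station ring where stations 0 and 2 each have one frame, station 2 has to wait until station 0's frame has completed its trip and the token is free again.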
– Asynchronous transfer mode (ATM) technology
* Cell (i.e., packet) size is 53 bytes: a compromise between data and audio applications
* Uses virtual paths: specific bandwidth is reserved on physical links
* Virtual channel is given specific bandwidth in a virtual path
An ATM switch
• Virtual path id is ‘translated’ by the switch: VPI should be unique only in a link
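The translation step can be sketched as a table lookup keyed by the incoming link and the VPI seen on that link. This is a sketch only: the `AtmSwitch` class, the port numbers, and the dictionary representation of a cell are hypothetical.

```python
class AtmSwitch:
    """Hypothetical switch that rewrites the VPI of each cell it forwards."""

    def __init__(self):
        # (incoming port, incoming VPI) -> (outgoing port, outgoing VPI)
        self.table = {}

    def add_path(self, in_port, in_vpi, out_port, out_vpi):
        self.table[(in_port, in_vpi)] = (out_port, out_vpi)

    def forward(self, in_port, cell):
        out_port, out_vpi = self.table[(in_port, cell["vpi"])]
        # Copy the cell with the VPI that is valid on the outgoing link.
        return out_port, {**cell, "vpi": out_vpi}
```

Two incoming links may both use VPI 5 because the lookup key includes the port; the switch assigns them different VPIs on a shared outgoing link.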
Connection strategies
• A connection strategy
– decides when, and for how long, to set up a connection
* A connection is a data path between processes
– It is also called a switching technique
– It influences communication efficiency and throughput of links
– Three connection strategies
Connection strategies
(a) All messages between the processes use the same connection
(b) A connection is set up for each message
(c) A connection is set up for each packet in a message
Routing strategies
• The routing function decides which network path would be used by a connection
– It enables the system to adapt to changing traffic patterns
– Three routing strategies
Routing strategies
(a) Same path is used for communication between all processes in a pair of nodes
(b) A path is chosen for communication between a pair of processes
(c) A path is chosen for each message or each packet
Network Protocols
• A network protocol is a set of rules and conventions used to implement communication
– It addresses four concerns
* Naming of sites
* Efficient name resolution
* Communication efficiency
* Handling of faults
– It consists of a hierarchy of protocols that address specific concerns. Hence it is called a protocol stack
* The ISO protocol consists of 7 protocols
* The TCP / IP protocol consists of 4 protocols
The ISO protocol
• The ISO protocol consists of 7 protocol layers
– Physical layer
* Provides electrical mechanisms for bit transmission
– Data link layer
* Collects bits into frames, performs error detection and flow control
The ISO protocol
• The ISO protocol (contd)
– Session layer
* Initiates and terminates sessions between processes
* Provides for restart and recovery
Operation of the ISO protocol stack
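The operation of a layered stack can be sketched as header wrapping on the sending side and unwrapping in reverse order on the receiving side. This is a simplified sketch: the seven layer names are those of the ISO model, but the bracketed text headers are hypothetical, and every layer (including the physical layer) is shown adding a header purely for uniformity.

```python
# Layers listed from the top of the stack down to the physical layer.
LAYERS = ["application", "presentation", "session", "transport",
          "network", "data link", "physical"]


def send(message):
    """Pass the message down the stack; each layer prepends its own header."""
    unit = message
    for layer in LAYERS:
        unit = "[%s]%s" % (layer, unit)
    return unit


def receive(unit):
    """Pass the unit up the stack; each layer strips the header of its peer."""
    for layer in reversed(LAYERS):
        header = "[%s]" % layer
        if not unit.startswith(header):
            raise ValueError("missing %s header" % layer)
        unit = unit[len(header):]
    return unit
```

The outermost header belongs to the lowest layer, so it is the first one removed at the destination, and `receive(send(m))` returns the original message.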
The transmission control protocol / internet protocol (TCP / IP) stack
• IP is a connectionless, unreliable protocol for communication between hosts
• TCP is a connection-oriented, reliable protocol
• UDP is a connectionless, unreliable protocol
The TCP / IP protocol stack
• Features of the protocols
– TCP
* Connection-oriented, reliable protocol
* Uses a virtual circuit between processes, retransmits on time-out
* Employs flow control so that the sender does not send messages faster than the receiver can accept them
It controls retransmission overhead
– UDP
* Connection-less, unreliable protocol
* Used in multi-media applications and video conferencing because loss of packets is tolerable
– Higher layer protocols
* File transfer, SMTP (e-mail), remote log in, etc
Network bandwidth and latency
• Bandwidth
– Rate at which data is transferred over the network
* Depends on link capacities, error rates and delays
• Latency
– Elapsed time between sending and receiving of a byte
* Processing time in protocol layers and delays due to network congestion contribute to it
* Typically computed for the first of the bytes to be transferred
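The way the two measures combine can be sketched with a first-order estimate. This is an assumed simplification, not a formula from the chapter: the time to transfer a message is taken as the latency of the first byte plus the message size divided by the bandwidth, and the function name and units are illustrative.

```python
def transfer_time(size_bytes, bandwidth_bytes_per_s, latency_s):
    """Estimate transfer time as first-byte latency plus size / bandwidth."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return latency_s + size_bytes / bandwidth_bytes_per_s
```

Under this estimate a 1,000,000-byte message over a 10,000,000 bytes/s link with 0.05 s latency takes about 0.15 s, while a 100-byte message takes barely more than the 0.05 s latency, so latency dominates for short messages.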
Modeling a distributed system
• A distributed system is modeled as a graph S = (N, E), where N is the set of nodes and E is the set of edges
– Two kinds of graph models are used
* In a physical model, each node is a computer system and each edge is a communication link
* In a logical model, each node is a process and each edge is an interprocess communication channel
It is used to model a distributed computation
Uses of system models
• System models are used to determine properties of a system or computation
– Impact of faults
* Minimum number of faults that would partition the system
– Resiliency
* k-resiliency: k is the largest number of faults which a system can withstand without disruption in its functioning
– Latency between nodes
* Minimum latency depends on minimum number of links on a path between two nodes
– Cost of sending information to all nodes
* Number of messages needed
* Used to determine complexity of algorithms
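Two of these properties can be computed directly on a physical model S = (N, E). This is a minimal sketch with hypothetical function names: it considers link faults only, uses breadth-first search for the minimum number of links on a path, and finds the smallest partitioning set of link faults by brute force, which is practical only for small graphs.

```python
from collections import deque
from itertools import combinations


def min_links(nodes, edges, src, dst):
    """Minimum number of links on a path from src to dst, or None if unreachable."""
    adjacent = {n: set() for n in nodes}
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    distance = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return distance[node]
        for nxt in adjacent[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return None


def is_connected(nodes, edges):
    nodes = list(nodes)
    return all(min_links(nodes, edges, nodes[0], n) is not None for n in nodes)


def min_link_faults_to_partition(nodes, edges):
    """Smallest number of failed links that disconnects the graph (brute force)."""
    for k in range(1, len(edges) + 1):
        for failed in combinations(edges, k):
            remaining = [e for e in edges if e not in failed]
            if not is_connected(nodes, remaining):
                return k
    return None
```

On a four-node ring, two link faults are needed to partition the system, whereas in a star a single link fault cuts off a node.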
Design issues in distributed operating systems
• Distributed nature of the computing environment raises five significant issues
– Transparency of resources and services
* Transparency: resource names should not depend on their locations
* It simplifies access from different locations
– Distribution of control functions
* OS functions are performed in many nodes to ensure reliability
– System performance
* Load balancing is performed to obtain resource efficiency
* Special techniques like file caching are used for scalability
Design issues in distributed operating systems
• Design issues (contd)
– Reliability
* Redundancy of resources is exploited to provide fault tolerance
* Special techniques like two phase commit are used
– Security
* An intruder may corrupt interprocess messages over the network
* Special techniques are used for message security and authentication
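The two phase commit technique named under Reliability can be sketched in its basic form. This is a sketch under stated assumptions: the `Participant` class and its `can_commit` flag are hypothetical, and coordinator failures, timeouts, and logging, which a real implementation has to handle, are left out.

```python
class Participant:
    """Hypothetical participant that votes in phase 1 and obeys the decision."""

    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit       # the participant's vote

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2: the coordinator sends the same decision to every participant.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return "commit" if decision else "abort"
```

A single negative vote forces every participant to abort, so all nodes end in the same state.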