Chapter 13 - Synchronization and scheduling in multiprocessor operating systems. This chapter discusses the different kinds of multiprocessor systems and describes how the OS achieves high throughput and fast response: it structures its kernel so that many CPUs can execute kernel code in parallel, and it uses special techniques for synchronizing and scheduling processes.
Advantages of multiprocessors
• Multiprocessor architectures provide three advantages
– High throughput
* CPUs can service many processes in parallel
– Computation speed-up
* An application may finish early because its processes may be serviced in parallel
– Graceful degradation
* Fault in one CPU does not halt the multiprocessor system
Classification of multiprocessor systems
• Multiprocessor systems are classified according to the manner in which CPUs access memory units
– Uniform memory access (UMA) architecture
* All CPUs can access the entire memory in an identical manner
* Also called symmetrical multiprocessor (SMP) architecture
– Non-uniform memory access (NUMA) architecture
* Nodes have their own memories, called local memories
* The CPUs in one node can access the local memory of the node faster than the memory of another node
– No-remote-memory-access (NORMA) architecture
* CPUs can access memory units of other nodes only over the network
• Throughput depends on the interconnection network
Interconnection networks
• Common CPU–memory interconnection networks
– Bus
* Low cost, high expandability, reasonable access speeds
* Only one CPU–memory conversation can be in progress at any time
– Cross-bar switch
* CPUs connected along one direction, memory units along another
* High cost, low expandability, high access speeds
* Many conversations can be in progress at any time
– Multistage interconnection network (MIN)
* Hybrid between a bus and a cross-bar switch
* Each stage consists of many 2 x 2 switches
* A path is selected through the stages to reach a memory unit
SMP architectures
• A cache coherence protocol ensures that copies of data in caches and memory are mutually consistent
SMP architectures
• Scalability
– Performance of the system should vary linearly with the number of CPUs in it
* Bus: not scalable because the bus becomes a bottleneck
* Cross-bar switch: scalable at low traffic densities; however, the switch cost does not grow linearly with the number of CPUs
NUMA architecture
• A node consists of CPUs, local memory units, and an I/O system connected by a local bus
• It also contains a remote cache and a global port connected to a high-speed network
• Hardware ensures coherence between the local caches and the remote cache
Multiprocessor operating systems
• Presence of multiple CPUs affects the operating system’s method of functioning
– Kernel structure
* Issue: Permit CPUs to execute kernel code in parallel
Provides reliability and increases responsiveness to interrupts
– Process synchronization
* Issue: Use presence of multiple CPUs to reduce overhead of switching between processes and reduce synchronization delays
Implement synchronization through looping rather than blocking
– Process scheduling
* Provide computation speed-up
* Ensure high cache-hit ratios
Kernel structure
• Relevant features of an SMP architecture
– Any CPU can initiate I/O on any device
– A CPU can communicate with other CPUs through an inter-processor interrupt (IPI) and shared memory
• Features of an SMP kernel
– Kernel data structures have to be locked (see next slide)
– An interrupt may lead to shuffling of processes on CPUs
* C1 sends an IPI to C2 and asks it to perform scheduling
– Kernel provides graceful degradation
* System can function even when CPUs fail
Locking of data structures in an SMP kernel
• Granularity of locks influences the parallelism that is possible in the system
• Each priority queue may have its own lock
• Deadlocks can be prevented by employing ranking of locks (similar to resource ranking for deadlock prevention)
NUMA kernel
• Local and non-local memories have different access times
– Each node has a separate kernel
* A process accesses only local memories
* Scheduling a process on the same CPU each time provides a high cache-hit ratio
– Application regions are used for providing good application performance
* An application region is a resource partition and a kernel
* It is used to service one application
– However, accesses to non-local memories are expensive
* They span domains of many kernels
Process synchronization
• Uses a radically different approach to synchronization
– Many CPUs exist in the system
* Synchronization through looping does not lead to priority inversion
– Synchronization is performed through a synchronization lock
* Scalability of a lock is important
Performance of an application using the lock should be independent of the number of processes and CPUs
* CPU should be able to handle interrupts while the process executing on it is involved in synchronization
* Two special locks called spin locks and sleep locks are employed
Queued, spin and sleep locks in multiprocessor operating systems
SLIC bus for process synchronization
• SLIC consists of a special 64-bit register in each CPU; each bit is a lock
• If a bit is set, a CPU trying to set it spins on it
• If a bit is not set and a CPU tries to set it, corresponding bits in other CPUs are set
• Races are resolved through hardware
• When a CPU resets a bit, corresponding bits in all CPUs are reset
A scalable software solution for process synchronization
• A primary lock is a conventional lock; a shadow lock exists in local memory of a CPU
• A CPU creates a shadow lock and spins over it if it cannot set the primary lock
• A queue for the primary lock keeps track of all its shadow locks
Process scheduling
• Two techniques
– Affinity-based scheduling
* Parts of the address space of a process get loaded into a CPU’s cache, which ensures high cache-hit ratios; the process is said to have an affinity for that CPU
* Hence that CPU should be favoured for executing the process
– Synchronization-conscious scheduling
* Processes that use spin locks: schedule them at the same time; this is called co-scheduling or gang scheduling
* Processes that interact through messages: schedule them at different times
Process shuffling
• Pi’s I/O operation completes; hence C1 is switched to Pi and C2 is switched to Pj
Scheduling in Mach
• Each processor set is assigned a subset of threads
• Time slice on a CPU is inversely proportional to the number of threads
• Each processor also has a local queue of threads; these threads are executed only on it
• A process can issue hints to the scheduler: discouraging and hands-off hints
Linux and Windows
• Linux
– Locking granularity became finer with later releases
– Starting Linux 2.6, the kernel is preemptible
– Reader–writer spin lock and sequence lock for scalability
– Hard affinity for a CPU can be specified; soft affinity favours the CPU on which the process last ran
– Load balancing is performed
• Windows
– Hard and soft affinities
– n–1 CPUs execute high priority processes; one CPU executes all other processes
– Spin locks are used on kernel data structures
– A thread holding a spin lock is never preempted