Chapter 13 - Synchronization and scheduling in multiprocessor operating systems. This chapter discusses the different kinds of multiprocessor systems and describes how the OS achieves high throughput and fast response: it structures its kernel so that many CPUs can execute kernel code in parallel, and it uses special techniques for synchronizing and scheduling processes.
Advantages of multiprocessors
• Multiprocessor architectures provide three advantages
– High throughput
* CPUs can service many processes in parallel
– Computation speed-up
* An application may finish early because its processes may be serviced in parallel
– Graceful degradation
* Fault in one CPU does not halt the multiprocessor system
Classification of multiprocessor systems
• Multiprocessor systems are classified according to the manner in which CPUs access memory units
– Uniform memory access (UMA) architecture
* All CPUs can access the entire memory in an identical manner
* Also called symmetrical multiprocessor (SMP) architecture
– Non-uniform memory access (NUMA) architecture
* Nodes have their own memories, called local memories
* The CPUs in one node can access the local memory of the node faster than the memory of another node
– No-remote-memory-access (NORMA) architecture
* CPUs can access memory units of other nodes only over the network
• Throughput depends on the interconnection network
Interconnection networks
• Common CPU–memory interconnection networks
– Bus
* Low cost, high expandability, reasonable access speeds
* Only one CPU–memory conversation can be in progress at any time
– Cross-bar switch
* CPUs connected along one direction, memory units along another
* High cost, low expandability, high access speeds
* Many conversations can be in progress at any time
– Multistage interconnection network (MIN)
* Hybrid between a bus and a cross-bar switch
* Each stage consists of many 2 x 2 switches
* A path is selected through the stages to reach a memory unit
SMP architectures
• A cache coherence protocol ensures that copies of data in caches and memory are mutually consistent
SMP architectures
• Scalability
– Performance of the system should vary linearly with the number of CPUs in it
* Bus: not scalable because the bus becomes a bottleneck
* Cross-bar switch: scalable at low traffic densities; however, the switch cost does not grow linearly with the number of CPUs
NUMA architecture
• A node consists of CPUs, local memory units, and an I/O system connected by a local bus
• It also contains a remote cache and a global port connected to a high-speed network
• Hardware ensures coherence between the local caches and the remote cache
Multiprocessor operating systems
• Presence of multiple CPUs affects the operating system’s method of functioning
– Kernel structure
* Issue: Permit CPUs to execute kernel code in parallel
Provides reliability and increases responsiveness to interrupts
– Process synchronization
* Issue: Use presence of multiple CPUs to reduce overhead of switching between processes and reduce synchronization delays
Implement synchronization through looping rather than blocking
– Process scheduling
* Provide computation speed-up
* Ensure high cache-hit ratios
Kernel structure
• Relevant features of an SMP architecture
– Any CPU can initiate I/O on any device
– A CPU can communicate with other CPUs through an inter-processor interrupt (IPI) and shared memory
• Features of an SMP kernel
– Kernel data structures have to be locked (see next slide)
– An interrupt may lead to shuffling of processes on CPUs
* C1 sends an IPI to C2 and asks it to perform scheduling
– Kernel provides graceful degradation
* System can function even when CPUs fail
Locking of data structures in an SMP kernel
• Granularity of locks influences the parallelism that is possible in the system
• Each priority queue may have its own lock
• Deadlocks can be prevented by employing ranking of locks (similar to resource ranking for deadlock prevention)
NUMA kernel
• Local and non-local memories have different access times
– Each node has a separate kernel
* A process accesses only local memories
* Scheduling a process on the same CPU each time provides a high cache-hit ratio
– Application regions are used for providing good application performance
* An application region is a resource partition and a kernel
* It is used to service one application
– However, accesses to non-local memories are expensive
* They span domains of many kernels
Process synchronization
• Uses a radically different approach to synchronization
– Many CPUs exist in the system
* Synchronization through looping does not lead to priority inversion
– Synchronization is performed through a synchronization lock
* Scalability of a lock is important
Performance of an application using the lock should be independent of the number of processes and CPUs
* CPU should be able to handle interrupts while the process executing on it is involved in synchronization
* Two special locks called spin locks and sleep locks are employed
Queued, spin and sleep locks in multiprocessor operating systems
SLIC bus for process synchronization
• SLIC consists of a special 64-bit register in each CPU; each bit is a lock
• If a bit is set, a CPU trying to set it spins on it
• If a bit is not set and a CPU tries to set it, corresponding bits in other CPUs are set
• Races are resolved through hardware
• When a CPU resets a bit, corresponding bits in all CPUs are reset
A scalable software solution for process synchronization
• A primary lock is a conventional lock; a shadow lock exists in local memory of a CPU
• A CPU creates a shadow lock and spins over it if it cannot set the primary lock
• A queue for the primary lock keeps track of all its shadow locks
Process scheduling
• Two techniques
– Affinity-based scheduling
* Parts of the address space of a process get loaded into a CPU’s cache, which ensures high cache-hit ratios; the process is said to have an affinity for that CPU
* Hence that CPU should be favoured for executing the process
– Synchronization-conscious scheduling
* Processes that use spin locks: schedule them at the same time; this is called co-scheduling or gang scheduling
* Processes that interact through messages: schedule them at different times
Process shuffling
• Pi’s I/O operation completes; hence C1 is switched to Pi and C2 is switched to Pj
Scheduling in Mach
• Each processor set is assigned a subset of threads
• Time slice on a CPU is inversely proportional to the number of threads
• Each processor also has a local queue of threads; these threads are executed only on it
• A process can issue hints to the scheduler: discouraging and hands-off hints
Linux and Windows
• Linux
– Locking granularity became finer with later releases
– Starting Linux 2.6, the kernel is preemptible
– Reader–writer spin lock and sequence lock for scalability
– Hard affinity for a CPU can be specified; soft affinity favours the CPU on which the process last ran
– Load balancing is performed
• Windows
– Hard and soft affinities
– n–1 CPUs execute high priority processes; one CPU executes all other processes
– Spin locks are used on kernel data structures
– A thread holding a spin lock is never preempted