William Stallings Copyright 2008
This document is an extract from
Operating Systems: Internals and Design Principles, Sixth Edition
2.8 LINUX
History
Linux started out as a UNIX variant for the IBM PC (Intel 80386) architecture. Linus Torvalds, a Finnish student of computer science, wrote the initial version. Torvalds posted an early version of Linux on the Internet in 1991. Since then, a number of people, collaborating over the Internet, have contributed to the development of Linux, all under the control of Torvalds. Because Linux is free and the source code is available, it became an early alternative to other UNIX workstations, such as those offered by Sun Microsystems and IBM. Today, Linux is a full-featured UNIX system that runs on all of these platforms and more, including Intel Pentium and Itanium, and the Motorola/IBM PowerPC.
WINDOWS/LINUX COMPARISON
Windows, POSIX, and, originally, OS/2

Windows                                   Linux
32-bit program address space              16-bit program address space
Mbytes of physical memory                 Kbytes of physical memory
Micro-controller based I/O devices        State-machine based I/O devices
Client/Server distributed computing       Standalone interactive systems
Large, diverse user populations           Small number of friendly users

Compare these with today's environment:
64-bit addresses
Gbytes of physical memory
Virtual memory, Virtual Processors
Multiprocessor (64-128)
High-speed internet/intranet, Web Services
Single user, but vulnerable to hackers worldwide

Although both Windows and Linux have adapted to changes in the environment, the original design environments (i.e., in 1989 and 1973) heavily influenced the design choices:
Unit of concurrency: threads vs. processes [address space, uniprocessor]
Process creation: CreateProcess() vs. fork() [address space, swapping]
I/O: Async vs. sync [swapping, I/O devices]
Security: Discretionary Access vs. uid/gid [user populations]
System structure
Windows: Modular core kernel, with explicit publishing of data structures and interfaces by components
Linux: Monolithic kernel
Windows: Three layers:
• Hardware Abstraction Layer manages processor, interrupt, DMA, BIOS details
• Kernel Layer manages CPU scheduling, interrupts, and synchronization
• Executive Layer implements the major OS functions in a fully threaded, mostly [...]
[...] discarded after boot. Much kernel code and data is pageable. Non-pageable kernel code and data uses large pages for TLB efficiency
CHAPTER 2 / OPERATING SYSTEM OVERVIEW
Windows: File systems, networking, devices are loadable/unloadable drivers (dynamic link libraries) using the extensible I/O system interfaces
Linux: Extensive support for loading/unloading kernel modules, such as device drivers and file systems

Windows: Dynamically loaded drivers can provide both pageable and non-pageable sections
Linux: Modules cannot be paged, but can be unloaded

Windows: Namespace root is virtual with file systems mounted underneath; types of system objects easily extended, and leverage unified naming, referencing, lifetime management, security, and handle-based synchronization
Linux: Namespace is rooted in a file system; adding new named system objects requires file system changes or mapping onto device model

Windows: OS personalities implemented as user-mode subsystems. Native NT APIs are based on the general kernel handle/object architecture and allow cross-process manipulation of virtual memory, threads, and other kernel objects
Linux: Implements a POSIX-compatible, UNIX-like interface; kernel API is far simpler than Windows; can understand various types of executables

Windows: Discretionary Access Controls, privileges, auditing
Linux: User/group IDs; capabilities similar to NT privileges can also be associated with processes
Key to the success of Linux has been the availability of free software packages under the auspices of the Free Software Foundation (FSF). FSF's goal is stable, platform-independent software that is free, high quality, and embraced by the user community. FSF's GNU project provides tools for software developers, and the GNU Public License (GPL) is the FSF seal of approval. Torvalds used GNU tools in developing his kernel, which he then released under the GPL. Thus, the Linux distributions that you see today are the product of FSF's GNU project, Torvalds' individual effort, and many collaborators all over the world.

In addition to its use by many individual programmers, Linux has now made significant penetration into the corporate world. This is not only because of the free software, but also because of the quality of the Linux kernel. Many talented programmers have contributed to the current version, resulting in a technically impressive product. Moreover, Linux is highly modular and easily configured. This makes it easy to squeeze optimal performance from a variety of hardware platforms. Plus, with the source code available, vendors can tweak applications and utilities to meet specific requirements. Throughout this book, we will provide details of Linux kernel internals based on the most recent version, Linux 2.6.
that runs as a single process with a single address space. All the functional components of the kernel have access to all of its internal data structures and routines. If changes are made to any portion of a typical monolithic OS, all the modules and routines must be relinked and reinstalled and the system rebooted before the changes can take effect. As a result, any modification, such as adding a new device driver or file system function, is difficult. This problem is especially acute for Linux, for which development is global and done by a loosely associated group of independent programmers.

Although Linux does not use a microkernel approach, it achieves many of the potential advantages of this approach by means of its particular modular architecture. Linux is structured as a collection of modules, a number of which can be automatically loaded and unloaded on demand. These relatively independent blocks are referred to as loadable modules [GOYE99]. In essence, a module is an object file whose code can be linked to and unlinked from the kernel at runtime. Typically, a module implements some specific function, such as a file system, a device driver, or some other feature of the kernel's upper layer. A module does not execute as its own process or thread, although it can create kernel threads for various purposes as necessary. Rather, a module is executed in kernel mode on behalf of the current process.

Thus, although Linux may be considered monolithic, its modular structure overcomes some of the difficulties in developing and evolving the kernel. The Linux loadable modules have two important characteristics:

modules, dependencies between modules can be defined. This has two benefits:

1. Code common to a set of similar modules (e.g., drivers for similar hardware) can be moved into a single module, reducing replication.
Figure 2.17 Example List of Linux Kernel Modules
• usecount: Module usage counter. The counter is incremented when an operation involving the module's functions is started and decremented when the operation terminates.
• flags: Module flags.
• nsyms: Number of exported symbols.
• ndeps: Number of referenced modules.
• *syms: Pointer to this module's symbol table.
• *deps: Pointer to list of modules that are referenced by this module.
• *refs: Pointer to list of modules that use this module.

The symbol table defines those symbols controlled by this module that are used elsewhere. Figure 2.17 shows that the VFAT module was loaded after the FAT module and that the VFAT module is dependent on the FAT module.
Figure 2.18, taken from [MOSB02], shows the main components of the Linux kernel as implemented on an IA-64 architecture (e.g., Intel Itanium). The figure shows several processes running on top of the kernel. Each box indicates a separate process, while each squiggly line with an arrowhead represents a thread of execution.

Footnote: In Linux, there is no distinction between the concepts of processes and threads. However, multiple threads in Linux can be grouped together in such a way that, effectively, you can have a single process comprising multiple threads. These matters are discussed in Chapter 4.
• Signals: The kernel uses signals to call into a process. For example, signals are used to notify a process of certain faults, such as division by zero. Table 2.6 gives a few examples of signals.

Table 2.6 Some Linux Signals
SIGSEGV     Segmentation violation
SIGWINCH    Window size changed
• System calls: The system call is the means by which a process requests a specific kernel service. There are several hundred system calls, which can be roughly grouped into six categories: file system, process, scheduling, interprocess communication, socket (networking), and miscellaneous. Table 2.7 defines a few examples in each category.
File system related
open                 Open and possibly create a file or device
read                 Read from file descriptor

Process related
exit                 Terminate the calling process
setuid               Set user identity of the current process
getpid               Get process identification
ptrace               Provides a means by which a parent process may observe and control the execution of another process and examine and change its core image and registers

Scheduling related
sched_setscheduler   Sets both the scheduling policy (e.g., FIFO) and the associated parameters for the process pid

Interprocess communication (IPC) related
msgrcv               A message buffer structure is allocated to receive a message. The system call then reads a message from the message queue specified by msqid into the newly created message buffer

Socket (networking) related
bind                 Assigns the local IP address and port for a socket. Returns 0 for success and -1 for error
connect              Establishes a connection between the given socket and the remote socket
gethostname          Returns local host name
send                 Sends the bytes contained in the buffer pointed to by *msg over the given socket
setsockopt           Sets the options on a socket

Miscellaneous
create_module        Attempts to create a loadable module entry and reserve the kernel memory that will be needed to hold the module
fsync                Copies all in-core parts of a file to disk and waits until the device reports that all parts are on stable storage
query_module         Requests information related to loadable modules from the kernel
time                 Returns the time in seconds since January 1, 1970
vhangup              Simulates a hangup on the current terminal. This call arranges for other users to have a "clean" tty at login time
• Processes and scheduler: Creates, manages, and schedules processes.
• Virtual memory: Allocates and manages virtual memory for processes.
• File systems: Provides a global, hierarchical namespace for files, directories, and other file-related objects and provides file system functions.
• Network protocols: Supports the Sockets interface to users for the TCP/IP protocol suite.
• Character device drivers: Manages devices that require the kernel to send or receive data one byte at a time, such as terminals, modems, and printers.
• Block device drivers: Manages devices that read and write data in blocks, such as various forms of secondary memory (magnetic disks, CD-ROMs, etc.).
• Interrupts: Handles interrupts from peripheral devices.
4.6 LINUX PROCESS AND THREAD MANAGEMENT
WINDOWS/LINUX COMPARISON

Windows: Processes are containers for the user-mode address space, a general handle mechanism for referencing kernel objects, and threads; threads run in a process, and are the schedulable entities
Linux: Processes are both containers and the schedulable entities; processes can share address space and system resources, making processes effectively usable as threads

Windows: Processes are created by discrete steps which construct the container for a new program and the first thread; a fork()-like native API exists, but is only used for POSIX compatibility
Linux: Processes are created by making virtual copies with fork() and then overwriting with exec() to run a new program

Windows: Process handle table used to uniformly reference kernel objects (representing processes, threads, memory sections, synchronization, I/O devices, drivers, open files, network connections, timers, kernel transactions)
Linux: Kernel objects referenced by ad hoc collection of APIs and mechanisms, including file descriptors for open files and sockets and PIDs for processes and process groups

Windows: Up to 16 million handles on kernel objects are supported per process
Linux: [...] open files/sockets are supported per process

Windows: Kernel is fully multithreaded, with kernel preemption enabled on all systems in the original design
Linux: Few kernel processes used, and kernel preemption is a recent feature

Windows: Many system services implemented using client/server computing, including the OS personality subsystems that run in user mode and communicate using remote-procedure calls
Linux: Most services are implemented in the kernel, with the exception of many networking functions
• Identifiers: Each process has a unique process identifier and also has user and group identifiers. A group identifier is used to assign resource access privileges.
• Times and timers: Includes process creation time and the amount of processor time so far consumed by the process. A process may also have associated one or more interval timers. A process defines an interval timer by means of a system call; as a result, a signal is sent to the process when the timer expires. A timer may be single use or periodic.
• File system: Includes pointers to any files opened by this process, as well as pointers to the current and the root directories for this process.
• Address space: Defines the virtual address space assigned to this process.
• Processor-specific context: The registers and stack information that constitute the context of this process.

Figure 4.18 shows the execution states of a process. These are as follows:

• Running: This state value corresponds to two states. A Running process is either executing or it is ready to execute.
• Interruptible: This is a blocked state, in which the process is waiting for an event, such as the end of an I/O operation, the availability of a resource, or a signal from another process.
• Uninterruptible: This is another blocked state. The difference between this and the Interruptible state is that in an Uninterruptible state, a process is waiting directly on hardware conditions and therefore will not handle any signals.

written with a set of user-level library functions, the most popular of which is
Figure 4.18 Linux Process/Thread Model
known as pthread (POSIX thread) libraries, with all of the threads mapping into a single kernel-level process. We have seen that modern versions of UNIX offer kernel-level threads. Linux provides a unique solution in that it does not recognize a distinction between threads and processes. Using a mechanism similar to the lightweight processes of Solaris, user-level threads are mapped into kernel-level processes. Multiple user-level threads that constitute a single user-level process are mapped into Linux kernel-level processes that share the same group ID. This enables these processes to share resources such as files and memory and to avoid the need for a context switch when the scheduler switches among processes in the same group.

A new process is created in Linux by copying the attributes of the current process. A new process can be cloned so that it shares resources, such as files, signal handlers, and virtual memory. When the two processes share the same virtual memory, they function as threads within a single process. However, no separate type of data structure is defined for a thread. In place of the usual fork() command, processes are created in Linux using the clone() command. This command includes a set of flags as arguments, defined in Table 4.5. The traditional fork() system call is implemented by Linux as a clone() system call with all of the clone flags cleared.
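The "no sharing" behavior of a plain fork() can be observed from user space. The following is a minimal sketch, assuming a POSIX system; the function name value_seen_by_parent_after_fork is hypothetical and chosen only for illustration. The child overwrites a variable, and the parent still sees the old value because the child works on its own copy of the address space.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the text): fork a child that overwrites a
 * variable, then report the value the parent still sees.
 * Returns -1 if fork() or waitpid() fails. */
int value_seen_by_parent_after_fork(void)
{
    volatile int shared = 1;           /* lives in the caller's address space */

    pid_t pid = fork();
    if (pid < 0)
        return -1;                     /* fork failed */

    if (pid == 0) {
        shared = 2;                    /* modifies only the child's copy */
        _exit(0);                      /* exit the child immediately */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)  /* wait until the child has finished */
        return -1;

    return shared;                     /* unchanged in the parent */
}
```

Had the child been created so that it shares virtual memory with its parent, as the text describes for thread-like processes, the parent would observe the child's write instead.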
CHAPTER 4 / THREADS, SMP, AND MICROKERNELS

Table 4.5 Linux clone() flags

CLONE_CLEARID     Clear the task ID.
CLONE_DETACHED    The parent does not want a SIGCHLD signal sent on exit.
CLONE_FILES       Shares the table that identifies the open files.
CLONE_FS          Shares the table that identifies the root directory and the current working directory, as well as the value of the bit mask used to mask the initial file permissions of a new file.
CLONE_IDLETASK    Set PID to zero, which refers to an idle task. The idle task is employed when all available tasks are blocked waiting for resources.
CLONE_NEWNS       Create a new namespace for the child.
CLONE_PARENT      Caller and new task share the same parent process.
CLONE_PTRACE      If the parent process is being traced, the child process will also be traced.
CLONE_SETTID      Write the TID back to user space.
CLONE_SETTLS      Create a new TLS for the child.
CLONE_SIGHAND     Shares the table that identifies the signal handlers.
CLONE_SYSVSEM     Shares System V SEM_UNDO semantics.
CLONE_THREAD      Inserts this process into the same thread group of the parent. If this flag is true, it implicitly enforces CLONE_PARENT.
Linux includes all of the concurrency mechanisms found in other UNIX systems, such as SVR4, including pipes, messages, shared memory, and signals. In addition, Linux 2.6 includes a rich set of concurrency mechanisms specifically intended for use when a thread is executing in kernel mode. That is, these are mechanisms used within the kernel to provide concurrency in the execution of kernel code. This section examines the Linux kernel concurrency mechanisms.
Atomic Operations
Linux provides a set of operations that guarantee atomic operations on a variable. These operations can be used to avoid simple race conditions. An atomic operation executes without interruption and without interference. On a uniprocessor system, a thread performing an atomic operation cannot be interrupted once the operation has started until the operation is finished. In addition, on a multiprocessor system, the variable being operated on is locked from access by other threads until this operation is completed.

Two types of atomic operations are defined in Linux: integer operations, which operate on an integer variable, and bitmap operations, which operate on one bit in a bitmap (Table 6.3). These operations must be implemented on any architecture that implements Linux. For some architectures, there are corresponding assembly language instructions for the atomic operations. On other architectures, an operation that locks the memory bus is used to guarantee that the operation is atomic.

For atomic integer operations, a special data type is used, atomic_t. The atomic integer operations can be used only on this data type, and no other operations are allowed on this data type. [LOVE04] lists the following advantages for these restrictions:

1. The atomic operations are never used on variables that might in some circumstances be unprotected from race conditions.
2. Variables of this data type are protected from improper use by nonatomic operations.
3. The compiler cannot erroneously optimize access to the value (e.g., by using an alias rather than the correct memory address).
4. This data type serves to hide architecture-specific differences in its implementation.

A typical use of the atomic integer data type is to implement counters.

The atomic bitmap operations operate on one of a sequence of bits at an arbitrary memory location indicated by a pointer variable. Thus, there is no equivalent to the atomic_t data type needed for atomic integer operations.

Atomic operations are the simplest of the approaches to kernel synchronization. More complex locking mechanisms can be built on top of them.
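The kernel's atomic_t API cannot be called from an ordinary program, but the idea of an opaque counter type that is touched only through atomic operations can be sketched in user space with C11 <stdatomic.h>. In the sketch below, my_atomic_t and the my_atomic_* functions are hypothetical stand-ins that loosely mirror the operations described in the text; they are not the kernel implementation.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-in for the kernel's atomic_t: the counter is
 * wrapped in a struct so it can only be used through the functions below. */
typedef struct { atomic_int counter; } my_atomic_t;

/* Set the value of v to integer i. */
static inline void my_atomic_set(my_atomic_t *v, int i)
{
    atomic_store(&v->counter, i);
}

/* Read the integer value of v. */
static inline int my_atomic_read(my_atomic_t *v)
{
    return atomic_load(&v->counter);
}

/* Subtract i from v. */
static inline void my_atomic_sub(int i, my_atomic_t *v)
{
    atomic_fetch_sub(&v->counter, i);
}

/* Add i to v; return 1 if the result is negative, 0 otherwise. */
static inline int my_atomic_add_negative(int i, my_atomic_t *v)
{
    /* atomic_fetch_add returns the old value, so old + i is the new value. */
    return (atomic_fetch_add(&v->counter, i) + i) < 0;
}
```

Wrapping the integer in a struct reflects the second and third advantages listed above: plain arithmetic on the value does not compile, so every access goes through an atomic operation.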
Spinlocks
The most common technique used for protecting a critical section in Linux is the spinlock. Only one thread at a time can acquire a spinlock. Any other thread attempting to
CHAPTER 6 / CONCURRENCY: DEADLOCK AND STARVATION
Table 6.3 Linux Atomic Operations

Atomic Integer Operations
void atomic_set(atomic_t *v, int i)           Set the value of v to integer i
void atomic_sub(int i, atomic_t *v)           Subtract i from v
int atomic_add_negative(int i, atomic_t *v)   Add i to v; return 1 if the result is negative; return 0 otherwise (used for implementing semaphores)

Atomic Bitmap Operations
void clear_bit(int nr, void *addr)            Clear bit nr in the bitmap pointed to by addr
void change_bit(int nr, void *addr)           Invert bit nr in the bitmap pointed to by addr
int test_and_set_bit(int nr, void *addr)      Set bit nr in the bitmap pointed to by addr; return the old bit value
int test_and_clear_bit(int nr, void *addr)    Clear bit nr in the bitmap pointed to by addr; return the old bit value
int test_and_change_bit(int nr, void *addr)   Invert bit nr in the bitmap pointed to by addr; return the old bit value
int test_bit(int nr, void *addr)              Return the value of bit nr in the bitmap pointed to by addr
be very short, say on the order of less than two context changes.

The basic form of use of a spinlock is the following:

    spin_lock(&lock)
    /* critical section */
    spin_unlock(&lock)
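The lock/critical-section/unlock pattern above can be imitated in user space. The following is a minimal sketch built on the C11 atomic_flag type, not the kernel's spinlock_t; the names my_spinlock_t, my_spin_lock, my_spin_trylock, my_spin_unlock, and locked_increment are hypothetical and exist only for this illustration.

```c
#include <stdatomic.h>

/* Hypothetical user-space spinlock built on a C11 atomic_flag. */
typedef struct { atomic_flag locked; } my_spinlock_t;

#define MY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Acquire the lock, spinning (busy-waiting) until it becomes available. */
static inline void my_spin_lock(my_spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;  /* spin: the flag was already set, so another thread holds the lock */
}

/* Try once to acquire the lock; return nonzero on success, zero otherwise. */
static inline int my_spin_trylock(my_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Release the lock. */
static inline void my_spin_unlock(my_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

static my_spinlock_t counter_lock = MY_SPINLOCK_INIT;
static long counter;

/* Basic form of use: lock, short critical section, unlock. */
long locked_increment(void)
{
    my_spin_lock(&counter_lock);
    long v = ++counter;            /* critical section */
    my_spin_unlock(&counter_lock);
    return v;
}
```

Because a waiting thread burns processor time while it spins, this pattern only makes sense when the critical section is very short, which matches the guidance in the text.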
6.8 / LINUX KERNEL CONCURRENCY MECHANISMS

Table 6.4 Linux Spinlocks

void spin_lock(spinlock_t *lock)            Acquires the specified lock, spinning if needed until it is available
void spin_lock_irq(spinlock_t *lock)        Like spin_lock, but also disables interrupts on the local processor
void spin_lock_bh(spinlock_t *lock)         Like spin_lock, but also disables the execution of all bottom halves
void spin_unlock(spinlock_t *lock)          Releases given lock
void spin_unlock_irq(spinlock_t *lock)      Releases given lock and enables local interrupts
void spin_unlock_irqrestore(spinlock_t *lock, unsigned long flags)   Releases given lock and restores local interrupts to given previous state
void spin_lock_init(spinlock_t *lock)       Initializes given spinlock
int spin_trylock(spinlock_t *lock)          Tries to acquire specified lock; returns nonzero if the lock is acquired and zero otherwise
int spin_is_locked(spinlock_t *lock)        Returns nonzero if lock is currently held and zero otherwise
• irq: If interrupts are always enabled, then this spinlock should be used.
• irqsave: If it is not known if interrupts will be enabled or disabled at the time of execution, then this version should be used. When a lock is acquired, the current state of interrupts on the local processor is saved, to be restored when the lock is released.

The plain spinlock is used if the programmer knows that the protected data is not accessed by an interrupt handler or bottom half. Otherwise, the appropriate nonplain spinlock is used.
Spinlocks are implemented differently on a uniprocessor system versus a multiprocessor system. For a uniprocessor system, the following considerations apply. If kernel preemption is turned off, so that a thread executing in kernel mode cannot be interrupted, then the locks are deleted at compile time; they are not needed. If kernel preemption is enabled, which does permit interrupts, then the spinlocks again compile away (that is, no test of a spinlock memory location occurs) but are simply implemented as code that enables/disables interrupts. On a multiple processor system, the spinlock is compiled into code that does in fact test the spinlock location. The use of the spinlock mechanism in a program allows it to be independent of whether it is executed on a uniprocessor or multiprocessor system.
Reader-Writer Spinlock  The reader-writer spinlock is a mechanism that allows a greater degree of concurrency within the kernel than the basic spinlock. The reader-writer spinlock allows multiple threads to have simultaneous access to the same data structure for reading only but gives exclusive access to the spinlock for a thread that intends to update the data structure. Each reader-writer spinlock consists of a 24-bit reader counter and an unlock flag, with the following interpretation:

Counter      Flag   Interpretation
0            0      Spinlock has been acquired for writing by one thread
n (n > 0)    0      Spinlock has been acquired for reading by n threads
Semaphores
At the user level, Linux provides a semaphore interface corresponding to that in UNIX SVR4. Internally, Linux provides an implementation of semaphores for its own use. That is, code that is part of the kernel can invoke kernel semaphores. These kernel semaphores cannot be accessed directly by the user program via system calls. They are implemented as functions within the kernel and are thus more efficient than user-visible semaphores.

Linux provides three types of semaphore facilities in the kernel: binary semaphores, counting semaphores, and reader-writer semaphores.

Binary and Counting Semaphores  The binary and counting semaphores defined in Linux 2.6 (Table 6.5) have the same functionality as described for such

Table 6.5 Linux Semaphores

Traditional Semaphores
void sema_init(struct semaphore *sem, int count)    Initializes the dynamically created semaphore to the given count
void init_MUTEX(struct semaphore *sem)              Initializes the dynamically created semaphore with a count of 1 (initially unlocked)
void init_MUTEX_LOCKED(struct semaphore *sem)       Initializes the dynamically created semaphore with a count of 0 (initially locked)
void down(struct semaphore *sem)                    Attempts to acquire the given semaphore, entering uninterruptible sleep if semaphore is unavailable
int down_interruptible(struct semaphore *sem)       Attempts to acquire the given semaphore, entering interruptible sleep if semaphore is unavailable; returns -EINTR value if a signal other than the result of an up operation is received
int down_trylock(struct semaphore *sem)             Attempts to acquire the given semaphore, and returns a nonzero value if semaphore is unavailable
void up(struct semaphore *sem)                      Releases the given semaphore

Reader-Writer Semaphores
void init_rwsem(struct rw_semaphore *rwsem)         Initializes the dynamically created semaphore with a count of 1
Linux provides three versions of the down (semWait) operation.

1. The down function corresponds to the traditional semWait operation. That is, the thread tests the semaphore and blocks if the semaphore is not available. The thread will awaken when a corresponding up operation on this semaphore occurs. Note that this function name is used for an operation on either a counting semaphore or a binary semaphore.

2. The down_interruptible function allows the thread to receive and respond to a kernel signal while being blocked on the down operation. If the thread is woken up by a signal, the down_interruptible function increments the count value of the semaphore and returns an error code known in Linux as -EINTR. This alerts the thread that the invoked semaphore function has aborted. In effect, the thread has been forced to "give up" the semaphore. This feature is useful for device drivers and other services in which it is convenient to override a semaphore operation.

3. The down_trylock function makes it possible to try to acquire a semaphore without being blocked. If the semaphore is available, it is acquired. Otherwise, this function returns a nonzero value without blocking the thread.
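The non-blocking behavior of down_trylock can be modeled in user space. The sketch below is a hypothetical counting semaphore written with C11 atomics; my_sem_t, my_sema_init, my_down_trylock, and my_up are illustrative names, and the blocking variants (down, down_interruptible) are deliberately left out because they need a scheduler to put the thread to sleep.

```c
#include <stdatomic.h>

/* Hypothetical user-space counting semaphore; only the non-blocking
 * "trylock" style acquire is modeled here. */
typedef struct { atomic_int count; } my_sem_t;

/* Initialize the semaphore to the given count. */
static inline void my_sema_init(my_sem_t *s, int count)
{
    atomic_init(&s->count, count);
}

/* Try to acquire the semaphore without blocking.
 * Returns 0 if it was acquired, nonzero if it was unavailable. */
static inline int my_down_trylock(my_sem_t *s)
{
    int c = atomic_load(&s->count);
    while (c > 0) {
        /* Attempt to move the count from c to c - 1; on failure the
         * compare-exchange reloads c with the current value and we retry. */
        if (atomic_compare_exchange_weak(&s->count, &c, c - 1))
            return 0;
    }
    return 1;  /* count was zero: unavailable, return without waiting */
}

/* Release the semaphore. */
static inline void my_up(my_sem_t *s)
{
    atomic_fetch_add(&s->count, 1);
}
```

Initializing the count to 1 gives the binary (mutex-like) case, while a larger initial count gives a counting semaphore, matching the note above that one function name serves both kinds.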
Reader-Writer Semaphores  The reader-writer semaphore divides users into readers and writers; it allows multiple concurrent readers (with no writers) but only a single writer (with no concurrent readers). In effect, the semaphore functions as a counting semaphore for readers but a binary semaphore (MUTEX) for writers. Table 6.5 shows the basic reader-writer semaphore operations. The reader-writer semaphore uses uninterruptible sleep, so there is only one version of each of the down operations.
Barriers
In some architectures, compilers and/or the processor hardware may reorder memory accesses in source code to optimize performance. These reorderings are done to optimize the use of the instruction pipeline in the processor. The reordering algorithms contain checks to ensure that data dependencies are not violated. For example, the code:

memory barrier facility. Table 6.6 lists the most important functions that are defined

rmb()        Prevents loads from being reordered across the barrier
wmb()        Prevents stores from being reordered across the barrier
mb()         Prevents loads and stores from being reordered across the barrier
barrier()    Prevents the compiler from reordering loads or stores across the barrier
smp_rmb()    On SMP provides a rmb() and on UP provides a barrier()
smp_wmb()    On SMP provides a wmb() and on UP provides a barrier()
smp_mb()     On SMP provides a mb() and on UP provides a barrier()
for this facility. The rmb() operation insures that no reads occur across the barrier defined by the place of the rmb() in the code. Similarly, the wmb() operation insures that no writes occur across the barrier defined by the place of the wmb() in the code. The mb() operation provides both a load and store barrier.

Two important points to note about the barrier operations:

1. The barriers relate to machine instructions, namely loads and stores. Thus the higher-level language instruction a = b involves both a load (read) from location b and a store (write) to location a.

2. The rmb, wmb, and mb operations dictate the behavior of both the compiler and the processor. In the case of the compiler, the barrier operation dictates that the compiler not reorder instructions during the compile process. In the case of the processor, the barrier operation dictates that any instructions pending in the pipeline before the barrier must be committed for execution before any instructions encountered after the barrier.

The barrier() operation is a lighter-weight version of the mb() operation, in that it only controls the compiler's behavior. This would be useful if it is known that the processor will not perform undesirable reorderings. For example, the Intel x86 processors do not reorder writes.

The smp_rmb, smp_wmb, and smp_mb operations provide an optimization for code that may be compiled on either a uniprocessor (UP) or a symmetric multiprocessor (SMP). These instructions are defined as the usual memory barriers for an SMP, but for a UP, they are all treated only as compiler barriers. The smp_ operations are useful in situations in which the data dependencies of concern will only arise in an SMP context.
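The kernel barrier macros are not available to ordinary programs, but the ordering problem they solve can be sketched with C11 fences. In the hypothetical example below, publish and try_consume are illustrative names; the release fence plays a role roughly comparable to a store barrier placed between writing the data and setting the flag, and the acquire fence a role roughly comparable to a load barrier on the reading side. This is an analogy under those assumptions, not a description of how the kernel macros are implemented.

```c
#include <stdatomic.h>

static int payload;        /* ordinary data written before the flag */
static atomic_int ready;   /* flag; static storage, so it starts at 0 */

/* Writer side: store the data, then set the flag. The release fence keeps
 * the store to payload from being reordered after the store to ready. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side: check the flag, then read the data. The acquire fence keeps
 * the load of payload from being reordered before the load of ready.
 * Returns 1 and fills *out if data was published, 0 otherwise. */
int try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```

Without the two fences, the compiler or the processor would be free to reorder the independent accesses to payload and ready, and a reader on another processor could see the flag set before the data is visible.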
Linux Virtual Memory

Virtual Memory Addressing  Linux makes use of a three-level page table structure, consisting of the following types of tables (each individual table is the size of one page):

• Page directory: An active process has a single page directory that is the size of one page. Each entry in the page directory points to one page of the page middle directory. The page directory must be in main memory for an active process.
• Page middle directory: The page middle directory may span multiple pages. Each entry in the page middle directory points to one page in the page table.
• Page table: The page table may also span multiple pages. Each page table entry refers to one virtual page of the process.

To use this three-level page table structure, a virtual address in Linux is viewed as consisting of four fields (Figure 8.25). The leftmost (most significant) field is used as an index into the page directory. The next field serves as an index into the page middle directory. The third field serves as an index into the page table. The fourth field gives the offset within the selected page of memory.
CHAPTER 8 / VIRTUAL MEMORY

Figure 8.25 Address Translation in Linux Virtual Memory Scheme

The Linux page table structure is platform independent and was designed to accommodate the 64-bit Alpha processor, which provides hardware support for three levels of paging. With 64-bit addresses, the use of only two levels of pages on the Alpha would result in very large page tables and directories. The 32-bit Pentium/x86 architecture has a two-level hardware paging mechanism. The Linux software accommodates the two-level scheme by defining the size of the page middle directory as one. Note that all references to an extra level of indirection are optimized away at compile time, not at run time. Therefore, there is no performance overhead for using the generic three-level design on platforms which support only two levels in hardware.
buddy system is used. The kernel maintains a list of contiguous page frame groups of fixed size; a group may consist of 1, 2, 4, 8, 16, or 32 page frames. As pages are allocated and deallocated in main memory, the available groups are split and merged using the buddy algorithm.
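Two small calculations sit at the heart of a buddy scheme with these group sizes: choosing the smallest group that satisfies a request, and finding the "buddy" group with which a freed group can be merged. The sketch below shows both under stated assumptions; BUDDY_MAX_ORDER, order_for_pages, and buddy_of are hypothetical names, and the code illustrates the general buddy algorithm rather than the kernel's allocator.

```c
/* Largest group is 2^5 = 32 page frames, as in the text. */
#define BUDDY_MAX_ORDER 5

/* Return the smallest order k such that a group of 2^k frames can hold
 * n frames, or -1 if n is zero or larger than the largest group. */
int order_for_pages(unsigned n)
{
    if (n == 0)
        return -1;
    for (int order = 0; order <= BUDDY_MAX_ORDER; order++)
        if ((1u << order) >= n)
            return order;
    return -1;
}

/* Given the first frame number of a group of 2^order frames (aligned to its
 * own size), return the first frame number of its buddy. The two buddies
 * differ only in bit number 'order' of the frame number. */
unsigned long buddy_of(unsigned long first_frame, int order)
{
    return first_frame ^ (1ul << order);
}
```

For example, a request for 3 frames is served from a group of 4; if no group of 4 is free, a group of 8 is split into two buddies of 4. When a group is freed and its buddy is also free, the two are merged back into a group of the next larger size.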
Page Replacement Algorithm  The Linux page replacement algorithm is based on the clock algorithm described in Section 8.2 (see Figure 8.16). In the simple clock algorithm, a use bit and a modify bit are associated with each page in main memory. In the Linux scheme, the use bit is replaced with an 8-bit age variable. Each time that a page is accessed, the age variable is incremented. In the background, Linux periodically sweeps through the global page pool and decrements the age variable for each page as it rotates through all the pages in main memory. A page with an age of 0 is an "old" page that has not been referenced in some time and is the best candidate for replacement. The larger the value of age, the more frequently