The following paper was originally published in the
Proceedings of the USENIX Annual Technical Conference
Monterey, California, USA, June 6-11, 1999
The Pebble Component-Based Operating System
Eran Gabber, Christopher Small, John Bruno, José Brustoloni, and Avi Silberschatz
Lucent Technologies—Bell Laboratories
© 1999 by The USENIX Association. All Rights Reserved. Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.
For more information about the USENIX Association:
Phone: 1 510 528 8649    FAX: 1 510 548 5738
Email: office@usenix.org    WWW: http://www.usenix.org
The Pebble Component-Based Operating System
Eran Gabber, Christopher Small, John Bruno†, José Brustoloni and Avi Silberschatz
Information Sciences Research Center Lucent Technologies—Bell Laboratories
600 Mountain Ave.
Murray Hill, NJ 07974 {eran, chris, jbruno, jcb, avi}@research.bell-labs.com
†Also affiliated with the University of California at Santa Barbara
Abstract
Pebble is a new operating system designed with the goals of flexibility, safety, and performance. Its architecture combines a set of features heretofore not found in a single system, including (a) a minimal privileged-mode nucleus, responsible for switching between protection domains, (b) implementation of all system services by replaceable user-level components with minimal privileges (including the scheduler and all device drivers) that run in separate protection domains enforced by hardware memory protection, and (c) generation of code specialized for each possible cross-domain transfer. The combination of these techniques results in a system with extremely inexpensive cross-domain calls that makes it well suited both for efficiently specializing the operating system on a per-application basis and for supporting modern component-based applications.
1 Introduction

A new operating system project should address a real problem that is not currently being addressed; constructing yet another general-purpose POSIX- or Windows32-compliant system that runs standard applications is not a worthwhile goal in and of itself. The Pebble operating system was designed with the goal of providing flexibility, safety, and high performance to applications in ways that are not addressed by standard desktop operating systems.
Flexibility is important for specialized systems, often referred to as embedded systems. The term is a misnomer, however, as embedded systems run not just on microcontrollers in cars and microwaves, but also on high-performance general-purpose processors found in routers, laser printers, and hand-held computing devices.
Safety is important when living in today's world of mobile code and component-based applications. Although safe languages such as Java [Gosling96] and Limbo [Dorward97] can be used for many applications, hardware memory protection is important when code is written in unsafe languages such as C and C++.
High performance cannot be sacrificed to provide safety and flexibility. History has shown us that systems are chosen primarily for their performance characteristics; safety and flexibility almost always come in second place. Any system structure added to support flexibility and safety cannot come at a significant decrease in performance; if possible, a new system should offer better performance than existing systems.
Early in the project, the designers of Pebble decided that to maximize system flexibility, Pebble would run as little code as possible in its privileged-mode nucleus. If a piece of functionality could be run at user level, it was removed from the nucleus. This approach makes it easy to replace, layer, and offer alternative versions of operating system services.
operat-Each user-level component runs in its own protection
domain, isolated by means of hardware memory
protec-tion All communication between protection domains isdone by means of a generalization of interrupt handlers,
termed portals Only if a portal exists between
protec-tion domain A and protecprotec-tion domain B can A invoke aservice offered by B Because each protection domain
has its own portal table, by restricting the set of portals
available to a protection domain, threads in that domainare efficiently isolated from services to which theyshould not have access
Portals are not only the basis for flexibility and safety in Pebble, they are also the key to its high performance. Specialized, tamper-proof code can be generated for each portal, using a simple interface definition language. Portal code can thus be optimized for its portal, saving and restoring the minimum necessary state, or encapsulating and compiling out demultiplexing decisions and run-time checks.
The remainder of this paper is structured as follows. In Section 2 we discuss related work. In Section 3 we describe the architecture of Pebble, and in Section 4 we discuss the portal mechanism and its uses in more detail. Section 5 covers several key implementation issues of Pebble. Section 6 introduces the idea of implementing a protected, application-transparent "sandbox" via portal interposition, and shows the performance overhead of such a sandbox. Section 7 compares the performance of Pebble and OpenBSD on our test hardware, a MIPS R5000 processor. Section 8 reviews the current status of Pebble and discusses our plans for future work. We summarize in Section 9, and include a short code example that implements the sandbox discussed in Section 6.
2 Related Work

Pebble has the same general structure as classical microkernel operating systems such as Mach [Acetta86], Chorus [Rozer88], and Windows NT [Custer92], consisting of a privileged-mode kernel and a collection of user-level servers. Pebble's protected-mode nucleus is much smaller and has fewer responsibilities than the kernels of these systems, and in that way is much more like the L4 microkernel [Liedtke95]. L4 and Pebble share a common philosophy of running as little code in privileged mode as possible. Where L4 implements IPC and minimal virtual memory management in privileged mode, Pebble's nucleus includes only code to transfer threads from one protection domain to another and a small number of support functions that require kernel mode.
Mach provides a facility to intercept system calls and service them at user level [Golub90]. Pebble's portal mechanism, which was designed for high-performance cross-protection-domain transfer, can be used in a similar way, taking an existing application component and interposing one or more components between the application component and the services it uses.
Pebble's architecture is closer in spirit to the nested process architecture of Fluke [Ford96]. Fluke provides an architecture in which virtual operating systems can be layered, with each layer affecting only the performance of the subset of the operating system interface it implements. For example, the presence of multiple virtual memory management "nesters" (e.g., to provide demand paging, distributed shared memory, and persistence) would have no effect on the cost of invoking file system operations such as read and write. The Fluke model requires that system functionality be replaced in groups; a memory management nester must implement all of the functions in the virtual memory interface specification. Pebble portals can be replaced piecemeal, which permits finer-grained extensibility.
The Exokernel model [Engler95, Kaashoek97] attempts to "exterminate all OS abstractions," with the privileged-mode kernel in charge of protecting resources, but leaving resource abstraction to user-level application code. As with the Exokernel approach, Pebble moves the implementation of resource abstractions to user level, but unlike the Exokernel, Pebble provides a set of abstractions, implemented by user-level operating system components. Pebble OS components can be added or replaced, allowing alternate OS abstractions to coexist with or override the default set.
coex-Pebble can use the interposition technique discussed inSection 6 to wrap a “sandbox” around untrusted code.Several extensible operating system projects have stud-ied the use of software techniques, such as safe lan-
guages (e.g., Spin [Bershad95]) and software fault isolation (e.g., VINO [Seltzer96]), for this purpose.
Where software techniques require faith in the safety of
a compiler, interpreter, or software fault isolation tool, asandbox implemented by portal interposition and hard-ware memory protection provides isolation at the hard-ware level, which may be simpler to verify thansoftware techniques
Philosophically, the Pebble approach to sandboxing is akin to that provided by the Plan 9 operating system [Pike90]. In Plan 9, nearly all resources are modeled as files, and each process has its own file name space. By restricting the name space of a process, it can be effectively isolated from resources to which it should not have access. In contrast with Plan 9, Pebble can restrict access to any service, not just those represented by files.
Pebble applies techniques developed by Bershad et al. [Bershad89], Massalin [Massalin92], and Pu et al. [Pu95] to improve the performance of IPC. Bershad's results showed that IPC data size tends to be either very small (which fits into registers) or large (which is passed by sharing memory pages). Massalin's work on the Synthesis project and, more recently, work by Pu et al. on the Synthetix project studied the generation of specialized code to improve performance.
Pebble was inspired by the SPACE project [Probert91]. Many of the concepts and much of the terminology of the project come from Probert's work; e.g., SPACE provided us with the idea of cross-domain communication as a generalization of interrupt handling.
The Spring kernel [Mitchell94] provided cross-protection-domain calls via doors, which are similar to Pebble's portals. However, Spring's doors are used only for implementing operations on objects, and do not include general-purpose parameter manipulations.
The Kea system [Veitch96] is very similar to Pebble. It provides protection domains, inter-domain calls via portals, and portal remapping. However, Kea's portals do not perform general parameter manipulations as Pebble's do. Parameter manipulations, such as sharing memory pages, are essential for efficient communication between components.
The MMLite system [Helander98] is a component-based system that provides a wide selection of object-oriented components that are assembled into an application system. MMLite's components are space efficient. However, MMLite does not use any memory protection, and all components execute in the same protection domain.
Like Dijkstra's THE system [Dijkstra68], Pebble hides the details of interrupts from higher-level components and uses only semaphores for synchronization.
Some CISC processors provide a single instruction that performs a full context switch. A notable example is the Intel x86 task switch via a call gate [Intel94]. However, this instruction takes more than 100 machine cycles.
3 Pebble Architecture

The Pebble philosophy consists of the following four key ideas.

The privileged-mode nucleus is as small as possible. If something can be run at user level, it is.
The privileged-mode nucleus is responsible only for switching between protection domains. In a perfect world, Pebble would include only one privileged-mode instruction, which would transfer control from one protection domain to the next. By minimizing the work done in privileged mode, we reduce both the amount of privileged code and the time needed to perform essential privileged-mode services.
The operating system is built from fine-grained replaceable components, isolated through the use of hardware memory protection. The functionality of the operating system is implemented by trusted user-level components. The components can be replaced, augmented, or layered. The architecture of Pebble is based around the availability of hardware memory protection; Pebble, as described here, requires a memory management unit.
The cost of transferring a thread from one protection domain to another should be small enough that there is no performance-related reason to co-locate services. It has been demonstrated that the cost of using hardware memory protection on the Intel x86 can be made extremely small [Liedtke97], and we believe that if it can be done on the x86, it can be done anywhere. Our results bear us out: Pebble can perform a one-way IPC in 114 machine cycles on a MIPS R5000 processor (see Section 7 for details).
Transferring a thread between protection domains is done by a generalization of hardware interrupt handling, termed portal traversal. Portal code is generated dynamically and performs portal-specific actions. Hardware interrupts, IPC, and the Pebble equivalent of system calls are all handled by the portal mechanism. Pebble generates specialized code for each portal to improve run-time efficiency. Portals are discussed in more detail in the following section.
3.1 Protection Domains, Portals and Threads
Each component runs in its own protection domain (PD). A protection domain consists of a set of pages, represented by a page table, and a set of portals, which are generalized interrupt handlers, stored in the protection domain's portal table. A protection domain may share both pages and portals with other protection domains. Figure 1 illustrates the Pebble architecture.

Figure 1. Pebble architecture. Arrows denote portal traversals. On the right, an interrupt causes a device driver's semaphore to be incremented, unblocking the device driver's thread (see Section 3.6).
Portals are used to handle both hardware interrupts and software traps and exceptions. The existence of a portal from PD A to PD B means that a thread running in PD A can invoke a specific entry point of PD B (and then return). Associated with each portal is code to transfer a thread from the invoking domain to the invoked domain. Portal code copies arguments, changes stacks, and maps pages shared between the domains. Portal code is specific to its portal, which allows several important optimizations to be performed (described below).

Portals are usually generated in pairs. The call portal transfers control from domain PD A to PD B, and the return portal allows PD B to return to PD A. In the following discussion we omit the return portal for brevity.
Portals are generated when certain resources are created (e.g., semaphores) and when clients connect to servers (e.g., when files are opened). Some portals are created at system initialization time (e.g., interrupt and exception handling portals).
A scheduling priority, a stack, and a machine context are associated with each Pebble thread. When a thread traverses a portal, no scheduling decision is made; the thread continues to run, with the same priority, in the invoked protection domain. Once the thread executes in the invoked domain, it may access all of the resources available in the invoked domain, and it can no longer access the resources of the invoking domain. Several threads may execute in the same protection domain at the same time, which means that they share the same portal table and all other resources.
As part of a portal traversal, the portal code can manipulate the page tables of the invoking and/or invoked protection domains. This most commonly occurs when a thread wishes to map, for the duration of the IPC, a region of memory belonging to the invoking protection domain into the virtual address space of the invoked protection domain; this gives the thread a window into the address space of the invoking protection domain while running in the invoked protection domain. When the thread returns, the window is closed.
Such a memory window can be used to save the cost of copying data between protection domains. Variations include windows that remain open (to share pages between protection domains), windows that transfer pages from the invoking domain to the invoked domain (to implement tear-away write), and windows that transfer pages from the invoked domain to the invoker (to implement tear-away read).
Note that although the portal code may modify VM data structures, only the VM manager and the portal manager (which generates portal code) share knowledge of these data structures. The Pebble nucleus itself is oblivious to them.
3.2 Safety
Pebble implements a safe execution environment by a combination of hardware memory protection, which prevents access to memory outside the protection domain, and limits on access to the domain's portal table. A protection domain may access only the portals it inherited from its parent and new portals that were generated on its behalf by the portal manager. The portal manager may restrict access to new portals in conjunction with the name server. A protection domain cannot transfer a portal it has in its portal table to an unrelated domain. Moreover, the parent domain may intercept all of its child's portal calls, including calls that indirectly manipulate the child's portal table, as described in Section 6.
3.3 Server Components
As part of the Pebble philosophy, system services are provided by operating system server components, which run in user-mode protection domains. Unlike applications, server components are trusted, so they may be granted limited privileges not afforded to application components. For example, the scheduler runs with interrupts disabled, device drivers have device registers mapped into their memory regions, and the portal manager may add portals to protection domains (a protection domain cannot modify its portal table directly).

There are many advantages to implementing services at user level. First, from a software engineering standpoint, we are guaranteed that a server component will use only the exported interface of other components. Second, because each server component is given only the privileges that it needs to do its job, a programming error in one component will not directly affect other components. If a critical component fails (e.g., VM), the system as a whole will be affected, but a bug in the console device driver will not overwrite page tables.

Additionally, as user-level servers can be interrupted at any time, this approach has the possibility of offering lower interrupt latency. Given that server components run at user level (including interrupt-driven threads), they can use blocking synchronization primitives, which simplifies their design. This is in contrast with handlers that run at interrupt level, which must not block, and which require careful coding to synchronize with the upper parts of device drivers.
3.4 The Portal Manager
The portal manager is the operating system component responsible for instantiating and managing portals. It is privileged in that it is the only component that is permitted to modify portal tables.
Portal instantiation is a two-step process. First, the server (which can be a Pebble system component or an application component) registers the portal with the portal manager, specifying the entry point, the interface definition, and the name of the portal. Second, a client component requests that a portal with a given name be opened. The portal manager may call the name server to identify the portal and to verify that the client is permitted to open the portal. If the name server approves the access, the portal manager generates the code for the portal and installs the portal in the client's portal table. The portal number of the newly generated portal is returned to the client. A client may also inherit a portal from its parent as the result of a domain_fork(), as described in Section 4.5.
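As a concrete illustration, the two-step instantiation might look as follows from the server and client sides. This is a minimal sketch: portal_register() and portal_open() are hypothetical names (the paper does not give the exact registration API), while invoke_portal() and the specification string are described in Sections 4.2 and 4.4.

    /* Server side: register an entry point under a name, together
     * with its interface definition (see Section 4.4).  The names
     * portal_register() and portal_open() are assumptions. */
    void serve_read(int fd, char *buf, int n);  /* server entry point */

    void server_init(void)
    {
        /* "smcwi": shared stack, minimal save, constant, window, word */
        portal_register("file.read", serve_read, "smcwi");
    }

    /* Client side: opening the portal causes the portal manager to
     * consult the name server, generate the portal code, and install
     * it in this domain's portal table, returning its index. */
    static char buf[4096];

    void client_init(void)
    {
        int read_portal = portal_open("file.read");
        invoke_portal(read_portal, buf, sizeof buf);
    }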
To invoke the portal, a thread running in the client loads the portal number into a register and traps to the nucleus. The trap handler uses the portal number as an index into the portal table and jumps to the code associated with the portal. The portal code transfers the thread from the invoking protection domain to the invoked protection domain and returns to user level. As stated above, a portal transfer does not involve the scheduler in any way. (Section 5.4 describes the only exception to this rule.)
Portal interfaces are written using a (tiny) interface definition language, as described in Section 4.4. Each portal argument may be processed or transformed by portal code. The transformation may involve a function of the nucleus state, such as inserting the identity of the calling thread or the current time. It may also involve other servers. For example, a portal argument may specify the address of a memory window to be mapped into the receiver's address space; this transformation requires the manipulation of data structures in the virtual memory server.
The design of the portal mechanism presents the following conflict: on one hand, in order to be efficient, the argument transformation code in the portal may need access to private data structures of a trusted server (e.g., the virtual memory system); on the other hand, trusted servers should be allowed to keep their internal data representations private.

The solution we advocate is to allow trusted servers, such as the virtual memory manager, to register argument transformation code templates with the portal manager. (Portals registered by untrusted services are required to use the standard argument types.) When the portal manager instantiates a portal that uses such an argument, the appropriate type-specific code is generated as part of the portal. This technique allows portal code to be both efficient (by inlining code that transforms arguments) and encapsulated (by allowing servers to keep their internal representations private). Although portal code that runs in kernel mode has access to server-specific data structures, these data structures cannot be accessed by other servers. The portal manager currently supports argument transformation code for a single trusted server, the virtual memory server.
3.5 Scheduling and Synchronization
Because inter-thread synchronization is intrinsically a scheduling activity, synchronization is managed entirely by the user-level scheduler. When a thread creates a semaphore, two portals (for P and V) that transfer control to the scheduler are added to its portal table. When a thread in the domain invokes P, the thread is transferred to the scheduler; if the P succeeds, the scheduler returns. If the P fails, the scheduler marks the thread as blocked and schedules another thread. A V operation works analogously; if the operation unblocks a thread that has higher priority than the invoker, the scheduler can block the invoking thread and run the newly awakened one.
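A sketch of what this looks like from the application's side follows; the sem_t layout is an assumption, and sem_wait()/sem_post() are the names used by the example in Section 6.1.

    /* Each semaphore is represented in its creator's domain by two
     * portal indices installed by the scheduler (an assumed layout). */
    typedef struct {
        int p_portal;   /* index of the P portal in the portal table */
        int v_portal;   /* index of the V portal */
    } sem_t;

    void sem_wait(sem_t *s)
    {
        /* Transfers the calling thread to the scheduler.  If the count
         * is positive the scheduler decrements it and returns at once;
         * otherwise it marks the thread blocked and runs another. */
        invoke_portal(s->p_portal);
    }

    void sem_post(sem_t *s)
    {
        /* Transfers to the scheduler; if a higher-priority thread is
         * unblocked, it runs before the invoker resumes. */
        invoke_portal(s->v_portal);
    }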
3.6 Device Drivers and Interrupt Handling
Each hardware device in the system has an associated semaphore used to communicate between the interrupt dispatcher component and the device driver component for the specific device.

In the portal table of each protection domain there are entries for the portals that correspond to the machine's hardware interrupts. The Pebble nucleus includes a short trampoline function that handles all exceptions and interrupts. This code first determines the portal table of the current thread and then transfers control to the address taken from the corresponding entry in this portal table. The nucleus is oblivious to the specific semantics of the portal that is being invoked. The portal that handles the interrupt starts by saving the processor state on the invocation stack (see Section 5.1); it then switches to the interrupt stack and jumps to the interrupt dispatcher. In other words, this mechanism converts interrupts into portal calls.
The interrupt dispatcher determines which device generated the interrupt and performs a V operation on the device's semaphore. Typically, the device driver will have left a thread blocked on that semaphore. The V operation unblocks this thread, and if the now-runnable thread has higher priority than the currently running thread, it gains control of the CPU and the interrupt is handled immediately. Typically, the priority of the interrupt handling threads corresponds to the hardware interrupt priority, in order to support nested interrupts. The priority of the interrupt handling threads is higher than that of all other threads, to ensure short handling latencies. In this way, Pebble unifies interrupt priority with thread priority, and handles both in the scheduler. A pictorial example of this process is found in Figure 1.
Note that Pebble invokes the interrupt dispatcher promptly for all interrupts, including low-priority ones. However, the interrupt handling thread is scheduled only if its priority is higher than that of the currently running thread.
Only a small portion of Pebble runs with interrupts disabled, namely portal code, the interrupt dispatcher, and the scheduler. This is necessary to avoid race conditions due to nested exceptions.
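The interplay between the interrupt dispatcher and a driver thread can be sketched as follows, reusing the sem_wait()/sem_post() sketch from Section 3.5. All names here are illustrative (the paper does not show the dispatcher's code); decode_interrupt() and service_device() stand in for device-specific logic.

    #define NDEVICES 16                 /* illustrative */
    extern sem_t device_sem[NDEVICES];  /* one semaphore per device */
    extern int  decode_interrupt(int cause);
    extern void service_device(int dev);

    /* Entered through the interrupt portal, with interrupts disabled
     * and the thread already switched to its interrupt stack. */
    void interrupt_dispatcher(int cause)
    {
        int dev = decode_interrupt(cause);

        /* V unblocks the driver's thread; if that thread has higher
         * priority than the interrupted one, the scheduler runs it
         * immediately, so the interrupt is serviced at once. */
        sem_post(&device_sem[dev]);
    }

    /* The driver's interrupt-driven thread, running at user level
     * and free to use blocking primitives. */
    void driver_thread(int dev)
    {
        for (;;) {
            sem_wait(&device_sem[dev]);   /* sleep until an interrupt */
            service_device(dev);          /* device-specific work */
        }
    }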
3.7 Low and Consistent Interrupt Latency
Pebble provides low and consistent interrupt latency by design, since most servers (all except the interrupt dispatcher and the scheduler) run with interrupts enabled. The interrupt-disabled execution path in Pebble is short, since portal code contains no loops, and the interrupt dispatcher and the scheduler are optimized for speed. User code cannot increase the length of the longest interrupt-disabled path, and thus cannot increase the interrupt latency. In previous work we gave details of the interrupt handling mechanism in Pebble, along with measurements of the interrupt latency on machines with differing memory hierarchies [Bruno99]. In particular, the interrupt latency on the MIPS R5000 processor used in this paper is typically 1200-1300 cycles from the exception until the scheduling of the user-level handling thread.
3.8 Non-Stop Systems
Non-stop (or high-availability) systems are characterized by the ability to run continuously over extended periods of time and to support dynamic updates. For example, some systems, such as telephone switches, are expected to run for years without unscheduled downtime. Pebble is especially suited for these systems, since most system functionality may be replaced dynamically by loading new servers and modifying portal tables. The only component that cannot be replaced is the nucleus, which provides only minimal functionality.
4 Uses of Portals

Portals are used for multiple purposes in Pebble. In this section we describe a few of their applications.
4.1 Interposition and Layering
One technique for building a flexible system is to factor it into components with orthogonal functionality that can be composed in arbitrary ways. For example, distributed shared memory or persistent virtual memory can be implemented as a layer on top of a standard virtual memory service. Altered semantics can also be offered by layering: the binary interface of one operating system can be emulated on another by intercepting the system calls made by an application written for the emulated system and implementing them through the use of native system calls.
The portal mechanism supports this development methodology very nicely. Because the portal mechanism is used uniformly throughout the system, and a portal performs a user-level to user-level transfer, service components can be designed to both accept and use the same set of portals.
For example, the primary task of a virtual memory manager is to accept requests for pages from its clients and service them by obtaining the pages from the backing store. When a client requests a page, the virtual memory manager reads the page from the backing store and returns it to the client via a memory window operation. A standard virtual memory service implementation would support just this protocol, and would typically be configured with a user application as its client and the file system as its backing store server.
However, the backing store could be replaced with a distributed shared memory (DSM) server, which would have the same interface as the virtual memory manager: it would accept page requests from its client, obtain the pages from its backing store (although in this case the backing store for a page might be the local disk or another remote DSM server), and return the page to its client via a memory window operation. By implementing the DSM server using the standard virtual memory interface, it can be layered between the VM and the file system. Other services, such as persistent virtual memory and transactional memory, can be added this way as well.
When a page fault takes place, the faulting address is used to determine which portal to invoke. Typically a single VM fault handler is registered for the entire range of an application's heap, but this need not be the case. For example, a fault on a page in a shared memory region should be handled differently than a fault on a page in a private memory region. By assigning different portals to subranges of a protection domain's address space, different virtual memory semantics can be supported for each range, as the sketch below illustrates.
4.2 Portals Can Encapsulate State
Because portal code is trusted, is specific to its portal, and can have private data, portal code can encapsulate state associated with the portal that need not be exposed to either endpoint. The state of the invoking thread is a trivial example of this: portal code saves the thread's registers on the invocation stack (see Section 5.1) and restores them when the thread returns. On the flip side, data used only by the invoked protection domain can be embedded in the portal, where the invoker cannot view or manipulate it. Because the portal code cannot be modified by the invoking protection domain, the invoked protection domain is assured that the values passed to it are valid. This technique frequently allows run-time demultiplexing and data validation code to be removed from the code path.
As an example, in Pebble, portals take the place of file descriptors. An open() call creates four portals in the invoking protection domain, one each for reading, writing, seeking, and closing. The code for each portal has embedded in it a pointer to the control block for the file. To read the file, the client domain invokes the read portal; the portal code loads the control block pointer into a register and transfers control directly to the specific routine for reading the underlying object (disk file, socket, etc.). No file handle verification needs to be done, as the client is never given a file handle; nor does any demultiplexing or branching based on the type of the underlying object need to be done, as the appropriate read routine for the underlying object is invoked directly by the portal code. In this way, portals permit run-time checks to be "compiled out," shortening the code path.
To be more concrete, the open() call generates four consecutive portals in the caller's portal table. open() returns a file descriptor, which corresponds to the index of the first of the four portals. The read(), write(), seek(), and close() calls are implemented by library routines, which invoke the appropriate portals, as seen in Figure 2; invoke_portal() invokes the portal that is specified in its first argument. (Note that the portal code of read and write may map the buffer argument in a memory window to avoid data copying.)
4.3 Short-Circuit Portals
In some cases the amount of work done by a portal traversal to a server is so small that the portal code itself can implement the service. A short-circuit portal is one that does not actually transfer the invoking thread to a new protection domain, but instead performs the requested action inline, in the portal code. Examples include simple "system calls" to get the current thread's ID and to read the high-resolution cycle counter. The TLB miss handler (which is in software on the MIPS architecture, the current platform for Pebble) is also implemented as a short-circuit portal.

Currently, semaphore synchronization primitives are implemented by the scheduler and necessitate portal traversals even if the operation does not block. However, these primitives are good candidates for implementation as hybrid portals. When a P operation is done, if the semaphore's value is positive (and thus the invoking thread will not block), the only work needed is to decrement the semaphore, so there is no reason for the thread to transfer to the scheduler. The portal code could decrement the semaphore directly and then return. Only in the case where the semaphore's value is zero, and the thread will block, does the calling thread need to transfer to the scheduler. Similarly, a V operation on a semaphore with a non-negative value (i.e., no threads are blocked waiting for the semaphore) could be performed in a handful of instructions in the portal code itself, as sketched below.
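A sketch of the proposed hybrid P portal, written in C for readability; real portal code would be generated machine code running in kernel mode, and the names here are illustrative.

    struct sem { int count; };              /* illustrative layout */
    extern void transfer_to_scheduler_p(struct sem *s);

    void p_portal(struct sem *s)
    {
        if (s->count > 0) {
            s->count--;     /* fast path: decrement in the portal and
                               return without a domain transfer */
            return;
        }
        /* Slow path: the thread must block, so complete an ordinary
         * portal traversal to the scheduler. */
        transfer_to_scheduler_p(s);
    }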
Although these optimizations are small ones (domain transfer takes only a few hundred cycles), operations that are on the critical path can benefit from even these small savings.

Figure 2. Implementing file descriptors with portals.

    read(fd, buf, n)          invoke_portal(fd, buf, n)
    write(fd, buf, n)         invoke_portal(fd+1, buf, n)
    seek(fd, offset, whence)  invoke_portal(fd+2, offset, whence)
    close(fd)                 invoke_portal(fd+3)
4.4 Portal Specification
The portal specification is a string that describes the behavior of the portal. It controls the generation of portal code by the portal manager. The portal specification includes the calling conventions of the portal: which registers are saved, whether the invoking domain shares a stack with the invoked domain, and how each argument is processed.

The first character in the specification encodes the portal's stack manipulation. For example, "s" denotes that the invoking domain shares its stack with the invoked domain; "n" denotes that the invoked domain allocates a new stack. The second character specifies the amount of processor state that is saved or restored. For example, "m" denotes that only minimal state is saved, and that the invoking domain trusts the invoked domain to obey the C calling convention; "p" denotes that partial state is saved, and that the invoking domain does not trust the invoked domain to retain the values of the registers required by the C calling convention. The rest of the specification is a sequence of single-character function codes that specify the handling of the corresponding parameters. For example, the template "smcwi" specifies a shared stack, saving minimal state, passing a constant in the first parameter, passing a one-page memory window in the second parameter, and passing a word without transformation in the third parameter. This template is used by the read and write portals.
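As a worked example, two specification strings used in the measurements of Section 7 decode as follows, with the character meanings defined above:

    /*  "npcwii" (untrusted server, window argument):
     *     n - invoked domain allocates a new stack
     *     p - partial state save (invoked domain is not trusted)
     *     c - first parameter is a constant embedded in the portal
     *     w - second parameter is a one-page memory window
     *     i - third parameter is a word, passed untransformed
     *     i - fourth parameter is a word, passed untransformed
     *
     *  "smcii" (trusted server, integer arguments only):
     *     s - invoking domain shares its stack with the invoked one
     *     m - minimal state save (invoked domain obeys C conventions)
     *     c - constant;  i - word;  i - word
     */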
4.5 Portal Manipulations
As described earlier, portals are referred to by their index in the local portal table. A portal that is available in a particular portal table cannot be exported to other protection domains using this index, and a protection domain may access only the portals in its portal table. These properties are the basis for Pebble's safety. When a thread calls fork(), it creates a new thread that executes in the same protection domain as the parent. When a thread calls domain_fork(), it creates a new protection domain that has a copy of the parent domain's portal table. The parent may modify the child's portal table to allow portal interposition, which is described in Section 6.
5 Implementation Issues

In this section we discuss some of the more interesting implementation details of Pebble.
5.1 Nucleus Data Structures
The Pebble nucleus maintains only a handful of data structures, which are illustrated in Figure 3. Each thread is associated with a Thread data structure. It contains pointers to the thread's current portal table, user stack, interrupt stack, and invocation stack. The user stack is the normal stack that is used by user-mode code. The interrupt stack is used whenever an interrupt or exception occurs while the thread is executing. The interrupt portal switches to the interrupt stack, saves state on the invocation stack, and calls the interrupt dispatcher server.

The invocation stack keeps track of portal traversals and processor state. The portal call code saves the invoking domain's state on this stack. It also saves the address of the corresponding return portal on the invocation stack. The portal return code restores the state from this stack.

The portal table pointer in the Thread data structure points to the portal table of the domain that the thread is currently executing in. It is changed by the portal call and restored by the portal return.

Figure 3. Pebble nucleus data structures.
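In C, the per-thread state described above might be declared roughly as follows; the field names and types are assumptions based on Figure 3, not the actual Pebble declarations.

    struct thread {
        struct portal_entry *portal_table;  /* table of the domain the
                                               thread is executing in;
                                               switched on portal call,
                                               restored on return */
        void *user_stack;                   /* normal user-mode stack */
        void *interrupt_stack;              /* used when an interrupt
                                               or exception occurs */
        struct invocation_frame *invocation_stack;
                                            /* saved processor state and
                                               return-portal addresses,
                                               one frame per outstanding
                                               portal call */
    };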
5.2 Virtual Memory and Cache
The virtual memory manager is responsible for maintaining the page tables, which are accessed by the TLB miss handler and by the memory window manipulation code in portals. The virtual memory manager is the only component that has access to the entire physical memory. The current implementation of Pebble does not support demand-paged virtual memory.

The Pebble implementation takes advantage of the MIPS tagged memory architecture. Each protection domain is allocated a unique ASID (address space identifier), which avoids TLB and cache flushes during context switches. Portal calls and returns also load the mapping of the current stack into TLB entry 0 to avoid an otherwise certain TLB miss.
On the flip side, Pebble components run in separate protection domains in user mode, which necessitates careful memory allocation and cache flushes whenever a component must commit values to physical memory. For example, the portal manager must generate portal code so that it is placed in contiguous physical memory.
5.3 Memory Windows
The portal code that opens a memory window updates an access data structure that contains a vector of counters, one counter for each protection domain in the system. The vector is indexed by the ASID of the corresponding domain. The counter keeps track of the number of portal traversals into the corresponding domain that passed this page in a memory window. It is incremented by one on each portal call and decremented by one on each portal return. The page is accessible if the counter that corresponds to the domain is greater than zero. We must use counters, and not bit values, for maintaining page access rights, since the same page may be handed to the same domain by multiple concurrent threads.

The page table contains a pointer to the corresponding access data structure, if any. Only shared pages have a dedicated access data structure.
The portal code does not load the TLB with the mapping of the memory window page. Rather, the TLB miss handler consults this counter vector in order to verify access rights to the page. This arrangement saves time if the shared window is passed on to another domain without being touched by the current domain. The portal return code must remove the corresponding TLB entry when the counter reaches zero. A sketch of this bookkeeping follows.
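This is a minimal sketch, assuming an access structure with a per-ASID counter vector; NDOMAINS and the helper names are illustrative.

    #define NDOMAINS 64                /* illustrative */

    struct access {
        int count[NDOMAINS];   /* indexed by ASID: portal traversals
                                  into that domain that passed this
                                  page in a memory window */
    };

    extern void tlb_remove(int asid, void *va);

    /* Run by portal call code when a window argument is passed. */
    void window_open(struct access *a, int asid)
    {
        /* A counter, not a bit: the same page may be handed to the
         * same domain by multiple concurrent threads. */
        a->count[asid]++;
    }

    /* Run by portal return code. */
    void window_close(struct access *a, int asid, void *va)
    {
        if (--a->count[asid] == 0)
            tlb_remove(asid, va);  /* drop the entry the TLB miss
                                      handler may have installed */
    }

    /* Consulted by the TLB miss handler to verify access rights. */
    int window_accessible(struct access *a, int asid)
    {
        return a->count[asid] > 0;
    }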
5.4 Stack Manipulations
A portal call may use stack sharing, which requires no stack manipulation: the invoked domain simply uses the current thread's stack.

If the portal call requires a new stack, it obtains one from the invoked domain's stack queue. In this case, the invoked protection domain must pre-allocate one or more stacks and notify the portal manager to place them in the domain's stack queue. The portal call dequeues a new stack from the invoked domain's stack queue; if the stack queue is empty, the portal calls the scheduler and waits until a stack becomes available. The portal return enqueues the released stack back on the stack queue. If any threads are waiting for a stack, the portal return calls the scheduler to pick the first waiting thread and allow it to proceed in its portal code.

The portal that calls the interrupt dispatcher after an interrupt switches to the interrupt stack, which is always available in every thread.
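The stack handling on the portal call and return paths can be sketched as follows; the queue operations and scheduler hooks are assumed names.

    struct domain;   /* opaque here */

    extern void *stack_dequeue(struct domain *d);
    extern void  stack_enqueue(struct domain *d, void *stk);
    extern int   stack_waiters(struct domain *d);
    extern void *scheduler_wait_for_stack(struct domain *d);
    extern void  scheduler_wake_stack_waiter(struct domain *d);

    /* Portal call path for an "n" (new stack) portal. */
    void *portal_get_stack(struct domain *invoked)
    {
        void *stk = stack_dequeue(invoked);   /* pre-allocated stacks */
        if (stk == NULL)                      /* queue empty: wait in */
            stk = scheduler_wait_for_stack(invoked); /* the scheduler */
        return stk;
    }

    /* Portal return path: release the stack and wake a waiter. */
    void portal_put_stack(struct domain *invoked, void *stk)
    {
        stack_enqueue(invoked, stk);
        if (stack_waiters(invoked))
            scheduler_wake_stack_waiter(invoked); /* first waiter
                                                     resumes its
                                                     portal code */
    }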
5.5 Footprint
The Pebble nucleus and the essential components (interrupt dispatcher, scheduler, portal manager, real-time clock, console driver, and the idle task) can fit into about 70 pages (8 KB each). Pebble does not yet support shared libraries, which causes code duplication among components. Each user thread has three stacks (user, interrupt, and invocation), which require three pages, although the interrupt and invocation stacks could be placed on the same page to reduce memory consumption. In addition, fixed-size pages inherently waste memory. This could be alleviated on segmented architectures.
6 Portal Interposition

An important aspect of a component-based system is the ability to interpose code between any client and its servers. The interposed code can modify the operation of the server, enforce safety policies, enable logging and error recovery services, or even implement protocol stacks and other layered system services.

Pebble implements low-overhead interposition by modifying the portal table of the controlled domain. Since all interactions between the domain and its surroundings are implemented by portal traversals, it is possible to place the controlled domain in a comprehensive sandbox by replacing the domain's portal table. All of the original portals are replaced with portal stubs, which transfer to the interposed controlling domain. The controlling domain intercepts each portal traversal that takes place, performs whatever actions it deems necessary, and then calls the original portal. Portal stubs pass their parameters in the same way as the original portals, which is necessary to maintain the semantics of the parameter passing (e.g., windows). In fact, portal stubs are regular portals that pass the corresponding portal index in their first argument. The controlling domain does not have to be aware of the particular semantics of the intercepted portals; it can implement a transparent sandbox by passing portal parameters verbatim.
The top diagram of Figure 4 illustrates the configuration of the original portal table without interposition, where the domain calls its servers directly. The bottom diagram shows the operation of portal interposition. In this case, all of the portals in the controlled domain call the controlling domain, which makes the calls to the servers.

Figure 4. Original portal configuration (above) and with portal interposition (below).
However, one-time modification of the controlled domain's portal table is not enough. Many servers create new portals dynamically in their client's portal table and then return the index of the newly created portal to the client. Since the controlling domain calls the server, the server creates these new portals in the controlling domain's table. The controlling domain is notified by the portal manager whenever a new portal is created in its portal table. The notification portal completes the process by creating a portal stub in the controlled domain's table with the same index as in the controlling domain's table. The portal stub calls the controlling domain and passes its parameters in the same way as the original portal. In this way, the controlling domain implements a robust sandbox around the controlled domain, without actually understanding the semantics of the controlled domain's portals.
A few comments about this interposition mechanism are in order. First, the controlled domain cannot detect that its portals are diverted, nor can it thwart the interposition in any way. This mechanism is similar to Unix I/O redirection, in which a child process accesses standard file descriptors (e.g., 0, 1, and 2) that are redirected by the parent process. Second, portal interposition is more comprehensive than Unix I/O redirection, since we can control all interactions between the controlled domain and its environment. Third, interposition can be recursive: a controlling domain interposes on the portals of a child domain, which does the same to its child, ad infinitum. The last comment deals with the semantics of certain system services, like fork() and sbrk(), which change the internal state of the calling domain; these are somewhat tricky to implement in the face of transparent interposition. We have had to make special accommodations to allow the controlling domain to issue them on behalf of the controlled domain.
6.1 Implementing a Transparent Sandbox by Portal Interposition
The Appendix contains a code excerpt from a program that implements a transparent sandbox around its child domain. The program counts the number of times each portal was called by the child domain, and completes all child portal traversals by calling the appropriate server. It is a fully functional program; we omitted only error handling code, due to space constraints. When run on our test hardware (see Section 7, below), the overhead of this process is 1511 machine cycles for one iteration (two sem_wait() and two sem_post()), which is roughly twice the execution time of the original code without interposition.

The program starts by calling portal_notify(), which registers the routine notify() with the portal manager. Any modification to the calling domain's portal table will call notify() immediately, even before the portal that caused it has returned.

portal_notify() is necessary to handle any portal call that the parent executed on behalf of the child which created a new portal in the parent's portal table. This new portal should also be replicated in the child's portal table to ensure correct operation. This situation occurs in the example when the parent executes sem_create() on behalf of the child.
The notify() routine receives the template of the newly created portal and its position in the portal table. It creates a portal in the child's portal table at the same position. The portal's template is modified to pass the portal number as the first argument.

The program proceeds to create a child domain by domain_fork(). The child starts with a copy of the parent's portal table. However, all of the entries in the child's portal table now point at the intercept() routine in the parent domain. The first argument to the intercept() routine is the index of the called portal in the portal table. This routine increments the counters and then performs the required action by invoking the portal with the same index in the parent domain; invoke_portal() lets applications invoke a specific portal in the caller's portal table. The intercept() routine assumes that portals have no more than five parameters.
The child domain executes the measure() routine, which measures the execution time of a semaphore ping-pong between two threads in the same domain. The hrtime() function returns the current value of the high-resolution timer, which is incremented every two machine cycles. measure() creates two semaphores by calling sem_create(). The scheduler creates two new portals for each semaphore in the parent domain, which calls notify() to create the corresponding stubs in the child domain's portal table.
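The overall shape of the program is roughly as follows. This is a reconstruction from the prose above, not the Appendix code itself; the signatures of portal_notify(), domain_fork(), notify(), and intercept(), and the helper create_stub_portal(), are assumptions.

    #define MAX_PORTALS 256

    static long counter[MAX_PORTALS];      /* calls per portal index */

    extern void portal_notify(void (*f)(int, const char *));
    extern int  domain_fork(long (*f)(int, long, long, long, long, long));
    extern long invoke_portal(int index, ...);
    extern void create_stub_portal(int index, const char *template);
    extern void measure(void);

    /* Called by the portal manager whenever a new portal appears in
     * our table, before the portal that caused it has returned. */
    static void notify(int index, const char *template)
    {
        /* Replicate the portal as a stub at the same index in the
         * child's table; the stub's template is modified so that it
         * passes its own index as the first argument. */
        create_stub_portal(index, template);
    }

    /* Every portal call made by the child lands here. */
    static long intercept(int index, long a1, long a2, long a3,
                          long a4, long a5)
    {
        counter[index]++;
        /* Complete the traversal via the portal with the same index
         * in our own table; at most five parameters, as in the
         * paper's example. */
        return invoke_portal(index, a1, a2, a3, a4, a5);
    }

    int main(void)
    {
        portal_notify(notify);        /* watch our own portal table */
        if (domain_fork(intercept) == 0) {  /* assumed fork-like */
            measure();                /* child: runs in the sandbox */
            return 0;
        }
        /* Parent: fields the child's portal calls via intercept(). */
        return 0;
    }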
7 Performance

In this section we measure the performance of Pebble and, where possible, compare it with OpenBSD running on the same hardware. The test hardware is an Algorithmics P-5064 board, which includes a 166 MHz MIPS R5000 processor with 32 KB instruction + 32 KB data level-one cache (two-way set associative), a one-megabyte integrated level-two cache, and 64 MB of memory. We ran version 2.4 of OpenBSD.
Times were measured using the high-resolution on-chip timer, which is incremented every two clock cycles. All results are presented in terms of elapsed machine cycles, not elapsed time, as our tests generally fit into the level-one or level-two cache. As long as cache memory speed scales with processor speed, cycle-based results will remain meaningful. To convert cycle counts to elapsed time, multiply by the cycle time (6 ns).
As the code size of Pebble is very small, and the associativity of the level-one cache is low (two-way), the performance of Pebble is very dependent on how code and data are placed in the cache. Out of a sense of fairness, in our experiments we specifically made no attempt to control cache layout. We believe that with careful tuning of the cache layout we could reduce the number of cache misses and conflicts; given the performance results we have seen to date, we have felt little need to go to this effort.
The context switch, pipe latency, and semaphore latency tests were adapted from the hBench:OS test suite [Brown98]. All tests on Pebble were run 10,000 times. The context switch and pipe latency times presented for OpenBSD are the 80% trimmed mean (excluding the smallest 10% and largest 10% of the measurements) of twenty runs of 10,000 iterations, as per the hBench:OS measurement methodology. In all cases the standard deviation for the Pebble measurements was less than 1%.
7.1 IPC
A naive implementation of inter-process communication (IPC) emulates the behavior of a remote procedure call (RPC), marshalling all arguments into a buffer, copying the buffer from the invoking protection domain to the invoked protection domain, unmarshalling the arguments, and then calling the server function. Several common optimizations can be performed that greatly improve the performance of IPC.

First, the amount of data transmitted in an IPC follows a bimodal distribution [Bershad89]; either a small number of bytes is sent (in which case they can be passed in registers) or a large number of bytes is sent (in which case it may make more sense to transfer the data using virtual memory mapping operations).
In this test we measure the cost of performing an IPC when all data fits into registers, when a one-page memory window is passed to the invoked domain (but the invoked domain does not access the page), and when the one-page memory window is written by the invoked domain. Because virtual memory and the TLB are managed in software on the MIPS, the memory management unit is not involved when passing a memory window if the window is never used, although there is some additional portal overhead. When the window is used in the invoked domain, a TLB fault takes place, and the memory management unit comes into play. Moreover, the portal code may have to remove the resulting TLB entry on return.
Simply measuring the per-leg cost of an IPC between two domains does not tell the entire story. In a system that has been factored into components, we may find that a client request to service A causes A to make a request of A′, A′ to make a request of A′′, and so on, until the initial request is finally satisfied. For example, a client page fault generates a request to its VM service, which then makes a request of the file system, which in turn makes a request of the disk driver to bring the page into memory. Although simple IPC between two protection domains must be cheap, it is also critical that performance not drop precipitously when a cascade of IPCs takes place.
In this test we measure the time to perform an IPC to the same domain and return (A→A→A), the time required to perform an IPC to a second domain and return (A→B→A), an IPC involving three domains (A→B→C→B→A), and so on, up to a total of eight domains. We used the portal specifications "npciii" (no window) and "npcwii" (with memory window), which means that a new stack was allocated on the call and reclaimed on the return. Also, all processor registers that must be preserved across calls according to the C calling convention were saved on the call and restored on the return. (See Section 4.4 for a description of portal specifications.) The results are presented as the per-leg (one-way) time, in cycles.
As a point of comparison, we include the time required to perform a "null" short-circuit portal traversal (user level → nucleus → user level). This is the Pebble equivalent of a "null" system call, and can be thought of as the minimum time required to enter and leave the nucleus. The results of these tests are found in Table 1. In all cases parameters are passed only in registers, not on the stack.
We see that the time per leg with no window and with an unused window remains roughly constant as the number of domains traversed increases, at about 114 and 135 cycles respectively; the overhead of passing a window through a portal is thus 21 machine cycles. The time per leg increases above four domains due to cache contention. When the memory window is used, the cost increases by about 50 cycles, which is the time required to handle a TLB fault and then remove the TLB entry on return from the IPC. The one outlier is the single-domain case, in which there is no TLB fault at all; this is because the page is already mapped in the domain (as there is only one domain).
An optimization can be performed if the invoking domain trusts the invoked domain (as would be the case with an application invoking a system service). The two can share a stack, saving the costs of allocating a stack from a pool in the invoked protection domain and copying data to the new stack. Also, no additional processor registers are saved on the call, since the invoking domain trusts the invoked domain to save and restore those registers. We used the portal specifications "smciii" and "smcwii". Even in the tested case, where no data is passed on the stack, this optimization has a significant performance benefit, as seen in Table 2. The savings measured here are about 20 cycles, which reduces the per-leg time by 17%. In addition, by sharing stacks between invoking and invoked protection domains, the number of stacks, and hence the amount of memory, needed by the system is decreased, which is an absolute good.
Pebble's IPC time is slightly higher than that of Aegis, an exokernel, on MIPS processors [Engler95]. Aegis performs a minimal one-way protected control transfer in about 36 cycles on MIPS R2000 and R3000 processors, and performs a null system call without a stack in about 40 cycles. Pebble's IPC takes longer because it maintains an invocation stack, which enables easy scheduling of the thread.
Table 1. IPC in Pebble, new stack and partial save. All times in CPU cycles, the mean of 10,000 runs.

Table 2. IPC in Pebble, shared stack and minimal save. All times in CPU cycles, the mean of 10,000 runs.
7.2 Context Switch
As described above, portal traversal does not involve a scheduling decision. In this section we show the cost of a context switch in Pebble.

We measured Pebble's context switch cost in two ways, first using Pebble's explicit yield primitive, and then by passing a one-byte token around a ring of pipes. The latter test was derived from hBench:OS, and was used to compare the performance of Pebble with OpenBSD. In both cases a number of protection domains, each with a single thread, are arranged in a ring and scheduled in turn. Measurements are found in Table 3.
We see that the cost of an explicit yield increases with the number of protection domains, up to a certain point, and then levels off. As the work done by the scheduler in this case is independent of the number of processes (it simply selects the next thread from the ready queue), the increase in time is due to cache effects: as we grow out of the level-one cache, we rely more on the level-two cache, to the point where we are running almost entirely out of the level-two cache (at six protection domains). We would expect to see a similar jump at the point where we begin to overflow the one-megabyte level-two cache.
The OpenBSD pipe test shows similar behavior, leveling off at four protection domains and roughly 2200 machine cycles.
7.3 Pipe Latency
This test measures the time required to pass a single byte through pipes connecting a ring of processes. Each value represents the time to transfer one byte between two adjacent processes, and includes the context switch time. By measuring the time required to transmit a single byte, we capture the overhead associated with using pipes; the more data that is sent, the more the data copy time masks the pipe costs. Results are found in Table 4.

We see that, as with the context switch times, the OpenBSD pipe time increases up to five domains and then levels off. The difference between the numbers in Table 4 and Table 3 gives the time required to transfer data through a pipe on each system. On OpenBSD the pipe overhead is roughly 2000 cycles; on Pebble it is approximately half that.

7.4 Semaphore Acquire/Release
This test is very similar to the test in Section 7.3, but instead of pipes we use semaphores. A number of processes are arranged in a ring and are synchronized by means of n semaphores. Each process performs a V operation on its right semaphore and then a P operation on its left semaphore. Each value in the table represents the time to release a semaphore in process p and acquire it in process (p + 1) mod n around a ring of n processes, including the context switch time. Results are found in Table 5.

When there are two processes, the difference between Pebble and OpenBSD is roughly 1500 cycles, 1000 cycles of which can be attributed to the difference in context switch times. As the number of domains (and thus semaphores) increases, the difference widens. Because Pebble's semaphores are a highly optimized key system primitive and OpenBSD's semaphores are not, we believe that this is due to a restriction in the implementation of OpenBSD semaphores, and is not a reflection of the difference in system structure.
Table 3. Context switch times (explicit yield, Pebble pipe, OpenBSD pipe), Pebble vs. OpenBSD. In CPU cycles, the mean of at least 10,000 runs.

Table 4. Pipe latency, Pebble vs. OpenBSD. In CPU cycles, the mean of at least 10,000 runs.

Table 5. Semaphore acquire/release latency (Pebble semaphore, OpenBSD semaphore). In CPU cycles, the mean of at least 10,000 runs.
7.5 Portal Generation
Table 6 shows the portal generation time for two typical portals. This is the time measured by an application program, including all overheads incurred by the portal manager. The first portal (with specification “smcii”) is typically used to call a trusted server with only integer arguments. The second portal (with specification “npcwi”) is typically used to call an untrusted server with a memory window argument. See Section 4.4 for additional explanations of portal specifications.

Table 6 indicates that portal generation is relatively fast. An examination of the portal manager reveals that portal generation time includes a large fixed overhead for interpretation of the specification string and for cache flushing. We can reduce this time by employing various techniques used for run-time code generation, e.g., the techniques used by VCODE [Engler96].

Table 6. Portal generation time, in total cycles and cycles per instruction.
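As an illustration of where that fixed overhead comes from, the sketch below walks a specification string and selects a code template per character, then flushes the instruction cache. The letter meanings in the switch are assumptions made for the example ('i' for an integer argument and 'w' for a memory window are suggested by the text above; the rest are invented placeholders); Section 4.4 defines the real encoding, and a real portal manager emits machine instructions rather than printing.

    /* Illustrative front end for portal generation: interpret a
     * specification string, one handling step per character. */
    #include <stdio.h>

    static void emit(const char *what)
    {
        /* A real portal manager appends machine code to the portal's
         * code buffer here; we just narrate the step. */
        printf("  emit: %s\n", what);
    }

    static int generate_portal(const char *spec)
    {
        printf("portal spec \"%s\":\n", spec);
        for (const char *c = spec; *c; c++) {
            switch (*c) {
            case 'i': emit("pass integer argument in a register"); break;
            case 'w': emit("map memory window into callee domain"); break;
            case 's': emit("share caller stack with callee"); break;      /* assumed */
            case 'n': emit("allocate new stack in callee"); break;        /* assumed */
            case 'm': emit("minimal register save"); break;               /* assumed */
            case 'p': emit("partial register save"); break;               /* assumed */
            case 'c': emit("save callee-saved registers"); break;         /* assumed */
            default:
                fprintf(stderr, "unknown spec char '%c'\n", *c);
                return -1;
            }
        }
        emit("flush instruction cache");  /* part of the fixed overhead */
        return 0;
    }

    int main(void)
    {
        generate_portal("smcii");   /* trusted server, integer arguments */
        generate_portal("npcwi");   /* untrusted server, window argument */
        return 0;
    }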
8 Status and Future Work
The Pebble nucleus and a small set of servers (scheduler, portal manager, interrupt dispatcher, and minimal VM) and devices (console and clock) currently run on MIPS-based single-board computers from Algorithmics. We support both the P-4032 (with a QED RM5230 processor) and the P-5064 (with IDT R5000 or QED RM7000 processors). We are currently porting Ethernet and SCSI device drivers to Pebble.
Next we plan to port Pebble to the Intel x86 to verify that Pebble's mechanisms and performance advantages are indeed architecture independent. We also plan to implement a demand-paged virtual memory system. Building a high-performance VM system for Pebble is a challenge, since the servers cannot (and should not) share data structures freely. We also plan to port a TCP/IP stack to Pebble and compare its performance with similar user-level protocol stacks.
In addition to the Intel x86 port, we plan to port Pebble to a symmetric multiprocessor and to an embedded processor such as the StrongARM. We also plan to investigate the support that various processor architectures provide for component-based systems such as Pebble.
9 Conclusions
Pebble provides a new engineering trade-off for the construction of efficient component-based systems, using hardware memory management to enforce protection domain boundaries, and reducing the cross-domain transfer time by synthesizing custom portal code. Pebble enhances flexibility by maintaining a private portal table for each domain. This table can be used to provide different implementations of system services, servers, and portal interposition for each domain. In addition, portal interposition allows running untrusted code in a robust sandbox with an acceptable overhead while using unsafe languages such as C.
Having a small nucleus with minimal functionality enhances system modularity, and enables non-stop systems to modify their behavior by integrating new servers on the fly.
In this paper we showed that Pebble is much faster than OpenBSD for a limited set of system-related microbenchmarks. Pebble's efficiency does not stem from clever, highly optimized low-level code; rather, it is a natural consequence of custom portal synthesis, judicious manipulation of processor state at portal traversals, encapsulation of state in portal code, and direct transfer of control from clients to their servers without scheduler intervention.
Pebble can be used to build systems that are more flexible than, as safe as, and higher-performing than conventionally constructed systems.
References
[Bershad89] B. Bershad, T. Anderson, E. Lazowska, H. Levy, “Lightweight Remote Procedure Call,” Proc. 12th SOSP, pp. 102–113 (1989).
[Bershad95] B. Bershad, S. Savage, P. Pardyak, E. Sirer, M. Fiuczynski, D. Becker, C. Chambers, S. Eggers, “Extensibility, Safety, and Performance in the SPIN Operating System,” Proc. 15th SOSP (1995).
[Brown98] A. Brown, M. Seltzer, “Operating System Benchmarking in the Wake of lmbench: A Case Study of the Performance of NetBSD on the Intel x86 Architecture,” Proc. 1997 SIGMETRICS, pp. 214–224 (1997).
[Bruno99] J. Bruno, J. Brustoloni, E. Gabber, A. Silberschatz, C. Small, “Pebble: A Component-Based Operating System for Embedded Applications,” Proc. USENIX Workshop on Embedded Systems, Cambridge, MA (1999).
[Custer92] H. Custer, Inside Windows NT, Microsoft Press, Redmond, WA (1992).
[Dijkstra68] E. W. Dijkstra, “The Structure of the “THE” Multiprogramming System,” CACM, Volume 11, Number 5, pp. 341–346 (1968).
[Dorward97] S. Dorward, R. Pike, D. Presotto, D. Ritchie, H. Trickey, P. Winterbottom, “Inferno,” Proc. IEEE Compcon 97, pp. 241–244 (1997).
[Engler95] D. Engler, M. Frans Kaashoek, J. O'Toole Jr., “Exokernel: An Operating System Architecture for Application-Level Resource Management,” Proc. 15th SOSP, pp. 251–266 (1995).
[Engler96] D. Engler, “VCODE: A Retargetable, Extensible, Very Fast Dynamic Code Generation System,” Proc. Conference on Programming Language Design and Implementation (PLDI'96), pp. 160–170 (1996).
[Ford96] B. Ford, M. Hibler, J. Lepreau, P. Tullmann, G. Back, S. Clawson, “Microkernels Meet Recursive Virtual Machines,” Proc. 2nd OSDI, pp. 137–151 (1996).
[Golub90] D. Golub, R. Dean, A. Forin, R. Rashid, “UNIX as an Application Program,” Proc. 1990 Summer USENIX, pp. 87–96 (1990).
[Gosling96] J. Gosling, B. Joy, G. Steele, The Java Language Specification, Addison-Wesley, Reading, MA (1996).
[Helander98] J. Helander, A. Forin, “MMLite: A Highly Componentized System Architecture,” Proc. 8th ACM SIGOPS European Workshop, Sintra, Portugal (1998).
[Intel94] Intel Corp., Pentium Family User's Manual Volume 3: Architecture and Programming Manual (1994).
[Kaashoek97] M. F. Kaashoek, D. Engler, G. Ganger, H. Briceño, R. Hunt, D. Mazières, T. Pinckney, “Application Performance and Flexibility on Exokernel Systems,” Proc. 16th SOSP, pp. 52–65 (1997).
[Liedtke95] J. Liedtke, “On Micro-Kernel Construction,” Proc. 15th SOSP, pp. 237–250 (1995).
[Liedtke97] J. Liedtke, K. Elphinstone, S. Schönberg, H. Härtig, G. Heiser, N. Islam, T. Jaeger, “Achieved IPC Performance,” Proc. 6th HotOS, pp. 28–31 (1997).
[Massalin92] H. Massalin, Synthesis: An Efficient Implementation of Fundamental Operating System Services, Ph.D. thesis, Columbia University Department of Computer Science, New York, NY (1992).
[Mitchell94] J. G. Mitchell et al., “An Overview of the Spring System,” Proc. Compcon Spring 1994, pp. 122–131 (1994).
[Pike90] R. Pike, D. Presotto, K. Thompson, H. Trickey, “Plan 9 from Bell Labs,” Proc. Summer 1990 UKUUG Conf., pp. 1–9 (1990).
[Probert91] D. Probert, J. Bruno, M. Karaorman, “SPACE: A New Approach to Operating System Abstractions,” Proc. Intl. Workshop on Object Orientation in Operating Systems (IWOOOS), pp. 133–137 (1991). Also available online at ftp.cs.ucsb.edu/pub/papers/space/iwooos91.ps.gz.
[Pu95] C. Pu, T. Autrey, A. Black, C. Consel, C. Cowan, J. Inouye, L. Kethana, J. Walpole, K. Zhang, “Optimistic Incremental Specialization: Streamlining a Commercial Operating System,” Proc. 15th SOSP, pp. 314–324 (1995).
[Rozier88] M. Rozier, V. Abrossimov, F. Armand, I. Boule, M. Gien, M. Guillemont, F. Herrmann, C. Kaiser, S. Langlois, P. Leonard, W. Neuhauser, “Chorus Distributed Operating System,” Computing Systems 1(4), pp. 305–370 (1988).
[Seltzer96] M. Seltzer, Y. Endo, C. Small, K. Smith, “Dealing With Disaster: Surviving Misbehaved Kernel Extensions,” Proc. 2nd OSDI, pp. 213–227 (1996).
[Veitch96] A. C. Veitch, N. C. Hutchinson, “Kea - A Dynamically Extensible and Configurable Operating System Kernel,” Proc. 3rd Conference on Configurable and Distributed Systems (ICCDS'96), Annapolis, Maryland (1996).