Database Recovery Techniques
In this chapter we discuss some of the techniques that can be used for database recovery from failures. We have already discussed the different causes of failure, such as system crashes and transaction errors, in Section 17.1.4. We have also covered many of the concepts that are used by recovery processes, such as the system log and commit points, in Section 17.2.
We start Section 19.1 with an outline of a typical recovery procedure and a categorization of recovery algorithms, and then discuss several recovery concepts, including write-ahead logging, in-place versus shadow updates, and the process of rolling back (undoing) the effect of an incomplete or failed transaction. In Section 19.2, we present recovery techniques based on deferred update, also known as the NO-UNDO/REDO technique. In Section 19.3, we discuss recovery techniques based on immediate update; these include the UNDO/REDO and UNDO/NO-REDO algorithms. We discuss the technique known as shadowing or shadow paging, which can be categorized as a NO-UNDO/NO-REDO algorithm, in Section 19.4. An example of a practical DBMS recovery scheme, called ARIES, is presented in Section 19.5. Recovery in multidatabases is briefly discussed in Section 19.6. Finally, techniques for recovery from catastrophic failure are discussed in Section 19.7.
Our emphasis is on conceptually describing several different approaches to recovery. For descriptions of recovery features in specific systems, the reader should consult the bibliographic notes and the user manuals for those systems. Recovery techniques are often intertwined with the concurrency control mechanisms: certain recovery techniques are best used with specific concurrency control methods. We will attempt to discuss recovery concepts independently of concurrency control mechanisms, but we will discuss the circumstances under which a particular recovery mechanism is best used with a certain concurrency control protocol.
19.1 RECOVERY CONCEPTS

19.1.1 Recovery Outline and Categorization of Recovery Algorithms

Recovery from transaction failures usually means restoring the database to the most recent consistent state just before the time of failure. To do this, the system must keep information about the changes that were applied to data items by the various transactions; this information is typically kept in the system log. A typical strategy for recovery can be summarized as follows:

1. If there is extensive damage to a wide portion of the database due to catastrophic failure, such as a disk crash, the recovery method restores a past copy of the database that was backed up to archival storage (typically tape) and reconstructs a more current state by reapplying or redoing the operations of committed transactions from the backed-up log, up to the time of failure.
2. When the database is not physically damaged but has become inconsistent due to noncatastrophic failures of types 1 through 4 of Section 17.1.4, the strategy is to reverse any changes that caused the inconsistency by undoing some operations. It may also be necessary to redo some operations in order to restore a consistent state of the database, as we shall see. In this case we do not need a complete archival copy of the database. Rather, the entries kept in the online system log are consulted during recovery.
Conceptually, we can distinguish two main techniques for recovery from noncatastrophic transaction failures: (1) deferred update and (2) immediate update. The deferred update techniques do not physically update the database on disk until after a transaction reaches its commit point; then the updates are recorded in the database. Before reaching commit, all transaction updates are recorded in the local transaction workspace (or buffers). During commit, the updates are first recorded persistently in the log and then written to the database. If a transaction fails before reaching its commit point, it will not have changed the database in any way, so UNDO is not needed. It may be necessary to REDO the effect of the operations of a committed transaction from the log, because their effect may not yet have been recorded in the database. Hence, deferred update is also known as the NO-UNDO/REDO algorithm. We discuss this technique in Section 19.2.
In the immediate update techniques, the database may be updated by some operations of a transaction before the transaction reaches its commit point. However, these operations are typically recorded in the log on disk by force-writing before they are applied to the database, making recovery still possible. If a transaction fails after recording some changes in the database but before reaching its commit point, the effect of its operations on the database must be undone; that is, the transaction must be rolled back. In the general case of immediate update, both undo and redo may be required during recovery. This technique, known as the UNDO/REDO algorithm, requires both operations, and is used most often in practice. A variation of the algorithm where all updates are recorded in the database before a transaction commits requires undo only, so it is known as the UNDO/NO-REDO algorithm. We discuss these techniques in Section 19.3.
19.1.2 Caching (Buffering) of Disk Blocks
The recovery process is often closely intertwined with operating system functions, in particular the buffering and caching of disk pages in main memory. Typically, one or more disk pages that include the data items to be updated are cached into main memory buffers and then updated in memory before being written back to disk. The caching of disk pages is traditionally an operating system function, but because of its importance to the efficiency of recovery procedures, it is handled by the DBMS by calling low-level operating system routines.
In general, it is convenient to consider recovery in terms of the database disk pages (blocks). Typically a collection of in-memory buffers, called the DBMS cache, is kept under the control of the DBMS for this purpose. A directory for the cache is used to keep track of which database items are in the buffers.¹ This can be a table of <disk page address, buffer location> entries. When the DBMS requests action on some item, it first checks the cache directory to determine whether the disk page containing the item is in the cache. If it is not, the item must be located on disk, and the appropriate disk pages are copied into the cache. It may be necessary to replace (or flush) some of the cache buffers to make space available for the new item. Some page-replacement strategy from operating systems, such as least recently used (LRU) or first-in-first-out (FIFO), can be used to select the buffers for replacement.
Associated with each buffer in the cache is a dirty bit, which can be included in the directory entry, to indicate whether or not the buffer has been modified. When a page is first read from the database disk into a cache buffer, the cache directory is updated with the new disk page address, and the dirty bit is set to 0 (zero). As soon as the buffer is modified, the dirty bit for the corresponding directory entry is set to 1 (one). When the buffer contents are replaced (flushed) from the cache, the contents must first be written back to the corresponding disk page only if the dirty bit is 1. Another bit, called the pin-unpin bit, is also needed: a page in the cache is pinned (bit value 1) if it cannot be written back to disk as yet.
Two main strategies can be employed when flushing a modified buffer back to disk. The first strategy, known as in-place updating, writes the buffer back to the same original disk location, thus overwriting the old value of any changed data items on disk.² Hence, a single copy of each database disk block is maintained. The second strategy, known as shadowing, writes an updated buffer at a different disk location, so multiple versions of data items can be maintained. In general, the old value of the data item before updating is called the before image (BFIM), and the new value after updating is called the after image (AFIM). In shadowing, both the BFIM and the AFIM can be kept on disk; hence, it is not strictly necessary to maintain a log for recovery. We briefly discuss recovery based on shadowing in Section 19.4.

1. This is somewhat similar to the concept of page tables used by the operating system.
2. In-place updating is used in most systems in practice.
19.1.3 Write-Ahead Logging, Steal/No-Steal, and Force/No-Force

When in-place updating is used, it is necessary to use a log for recovery, and we need to distinguish between two types of log entry information included for a write command: (1) the information needed for UNDO and (2) the information needed for REDO. A REDO-type log entry includes the new value (AFIM) of the item written by the operation, since this is needed to redo the effect of the operation from the log (by setting the item value in the database to its AFIM). The UNDO-type log entries include the old value (BFIM) of the item, since this is needed to undo the effect of the operation from the log (by setting the item value in the database back to its BFIM). In an UNDO/REDO algorithm, both types of log entries are combined. In addition, when cascading rollback is possible, read_item entries in the log are considered to be UNDO-type entries (see Section 19.1.5).
As mentioned, the DBMS cache holds the cached database disk blocks, which include not only data blocks but also index blocks and log blocks from the disk. When a log record is written, it is stored in the current log block in the DBMS cache. The log is simply a sequential (append-only) disk file, and the DBMS cache may contain several log blocks (for example, the last n log blocks) that will be written to disk. When an update to a data block stored in the DBMS cache is made, an associated log record is written to the last log block in the DBMS cache. With the write-ahead logging approach, the log blocks that contain the associated log records for a particular data block update must first be written to disk before the data block itself can be written back to disk.
Standard DBMS recovery terminology includes the terms steal/no-steal and force/no-force, which specify when a page from the database can be written to disk from the cache:

1. If a cache page updated by a transaction cannot be written to disk before the transaction commits, this is called a no-steal approach. The pin-unpin bit indicates if a page cannot be written back to disk. Otherwise, if the protocol allows writing an updated buffer before the transaction commits, it is called steal. Steal is used when the DBMS cache (buffer) manager needs a buffer frame for another transaction and the buffer manager replaces an existing page that had been updated but whose transaction has not committed.

2. If all pages updated by a transaction are immediately written to disk when the transaction commits, this is called a force approach. Otherwise, it is called no-force.
The deferred update recovery scheme in Section 19.2 follows a no-steal approach. However, typical database systems employ a steal/no-force strategy. The advantage of steal is that it avoids the need for a very large buffer space to store all updated pages in memory. The advantage of no-force is that an updated page of a committed transaction may still be in the buffer when another transaction needs to update it, thus eliminating the I/O cost to read that page again from disk. This may provide a substantial saving in the number of I/O operations when a specific page is updated heavily by multiple transactions.
To permit recovery when in-place updating is used, the appropriate entries required for recovery must be permanently recorded in the log on disk before changes are applied to the database. For example, consider the following write-ahead logging (WAL) protocol for a recovery algorithm that requires both UNDO and REDO:

1. The before image of an item cannot be overwritten by its after image in the database on disk until all UNDO-type log records for the updating transaction, up to this point in time, have been force-written to disk.

2. The commit operation of a transaction cannot be completed until all the REDO-type and UNDO-type log records for that transaction have been force-written to disk.
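A minimal sketch of how a buffer manager could enforce rule 1 of this protocol follows (this is our illustration, not the book's; LogManager, BufferManager, and their methods are invented names). Rule 2 would be enforced analogously by force-writing the transaction's log records at commit.

    # Sketch: a data page may be flushed only after every log record
    # describing updates to it is safely on disk.

    class LogManager:
        def __init__(self):
            self.records = []          # in-memory log tail
            self.flushed_up_to = -1    # index of last record forced to disk

        def append(self, record):
            self.records.append(record)
            return len(self.records) - 1   # position of this record in the log

        def force(self, up_to):
            # Force-write log records [0 .. up_to] to stable storage.
            self.flushed_up_to = max(self.flushed_up_to, up_to)

    class BufferManager:
        def __init__(self, log):
            self.log = log
            self.last_log_pos = {}     # page id -> position of its latest log record

        def update_page(self, page_id, undo_redo_record):
            # Log first: record the BFIM/AFIM before touching the page.
            self.last_log_pos[page_id] = self.log.append(undo_redo_record)
            # ... apply the change to the cached page here ...

        def flush_page(self, page_id):
            # WAL rule 1: force the log up to this page's latest record
            # before the data page itself may be written back to disk.
            pos = self.last_log_pos.get(page_id, -1)
            if pos > self.log.flushed_up_to:
                self.log.force(pos)
            # ... now write the data page back to disk ...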
To facilitate the recovery process, the DBMS recovery subsystem may need to maintain a number of lists related to the transactions being processed in the system. These include a list of active transactions that have started but not committed as yet, and possibly lists of all committed and aborted transactions since the last checkpoint (see the next section). Maintaining these lists makes the recovery process more efficient.
19.1.4 Checkpoints in the System Log and Fuzzy Checkpointing
Another type of entry in the log is called a checkpoint.³ A [checkpoint] record is written into the log periodically, at the point when the system writes out to the database on disk all DBMS buffers that have been modified. As a consequence, all transactions that have their [commit,T] entries in the log before a [checkpoint] entry do not need to have their WRITE operations redone in case of a system crash, since all their updates will be recorded in the database on disk during checkpointing.
The recovery manager of a DBMS must decide at what intervals to take a checkpoint. The interval may be measured in time, say, every m minutes, or in the number t of committed transactions since the last checkpoint, where the values of m and t are system parameters. Taking a checkpoint consists of the following actions:

1. Suspend execution of transactions temporarily.
2. Force-write all main memory buffers that have been modified to disk.
3. Write a [checkpoint] record to the log, and force-write the log to disk.
4. Resume executing transactions.
3. The term checkpoint has been used to describe more restrictive situations in some systems, such as DB2. It has also been used in the literature to describe entirely different concepts.
As a consequence of step 2, a checkpoint record in the log may also include additional information, such as a list of active transaction ids, and the locations (addresses) of the first and most recent (last) records in the log for each active transaction. This can facilitate undoing transaction operations in the event that a transaction must be rolled back.
The time needed to force-write all modified memory buffers may delay transaction processing because of step 1. To reduce this delay, it is common to use a technique called fuzzy checkpointing in practice. In this technique, the system can resume transaction processing after the [checkpoint] record is written to the log, without having to wait for step 2 to finish. However, until step 2 is completed, the previous [checkpoint] record should remain valid. To accomplish this, the system maintains a pointer to the valid checkpoint, which continues to point to the previous [checkpoint] record in the log. Once step 2 is concluded, that pointer is changed to point to the new checkpoint in the log.
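The pointer switch can be sketched as follows (our illustration, not the book's; the log is modeled as a plain list, flush_buffer is a callable, and CheckpointPointer stands in for the on-disk pointer to the valid checkpoint record):

    # Sketch of fuzzy checkpointing: the valid-checkpoint pointer advances
    # only after all modified buffers have reached disk.

    class CheckpointPointer:
        def __init__(self):
            self.valid_checkpoint = None   # log position of the valid [checkpoint]

        def advance(self, log_position):
            self.valid_checkpoint = log_position  # assumed atomic on disk

    def fuzzy_checkpoint(log, modified_buffers, pointer, flush_buffer):
        log.append("[checkpoint]")
        new_position = len(log) - 1
        # Transaction processing resumes here; the previous checkpoint
        # record stays valid while step 2 runs in the background.
        for buf in modified_buffers:       # step 2: force-write modified buffers
            flush_buffer(buf)
        # Step 2 is done: only now switch the pointer to the new checkpoint.
        pointer.advance(new_position)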
19.1.5 Transaction Rollback
If a transaction fails for whatever reason after updating the database, it may be necessary to roll back the transaction. If any data item values have been changed by the transaction and written to the database, they must be restored to their previous values (BFIMs). The UNDO-type log entries are used to restore the old values of data items that must be rolled back.
If a transaction T is rolled back, any transaction S that has, in the interim, read the value of some data item X written by T must also be rolled back. Similarly, once S is rolled back, any transaction R that has read the value of some data item Y written by S must also be rolled back; and so on. This phenomenon is called cascading rollback, and it can occur when the recovery protocol ensures recoverable schedules but does not ensure strict or cascadeless schedules (see Section 17.4.2). Cascading rollback, understandably, can be quite complex and time-consuming. That is why almost all recovery mechanisms are designed so that cascading rollback is never required.
Figure 19.1 shows an example where cascading rollback is required. The read and write operations of three individual transactions are shown in Figure 19.1a. Figure 19.1b shows the system log at the point of a system crash for a particular execution schedule of these transactions. The values of data items A, B, C, and D, which are used by the transactions, are shown to the right of the system log entries. We assume that the original item values, shown in the first line, are A = 30, B = 15, C = 40, and D = 20. At the point of system failure, transaction T3 has not reached its conclusion and must be rolled back. The WRITE operations of T3, marked by a single * in Figure 19.1b, are the T3 operations that are undone during transaction rollback. Figure 19.1c graphically shows the operations of the different transactions along the time axis.

We must now check for cascading rollback. From Figure 19.1c we see that transaction T2 reads the value of item B that was written by transaction T3; this can also be determined by examining the log. Because T3 is rolled back, T2 must now be rolled back, too. The WRITE operations of T2, marked by ** in the log, are the ones that are undone. Note that only write_item operations need to be undone during transaction rollback; read_item operations are recorded in the log only to determine whether cascading rollback of additional transactions is necessary.
*T3 is rolled back because it did not reach its commit point.
**T2 is rolled back because it reads the value of item B written by T3.

FIGURE 19.1 Illustrating cascading rollback (a process that never occurs in strict or cascadeless schedules). (a) The read and write operations of three transactions. (b) System log at point of crash. (c) Operations before the crash.
In practice, cascading rollback of transactions is never required, because practical recovery methods guarantee cascadeless or strict schedules. Hence, there is also no need to record any read_item operations in the log, because these are needed only for determining cascading rollback.
19.2 RECOVERY TECHNIQUES BASED ON DEFERRED UPDATE
The idea behind deferred update techniques is to defer or postpone any actual updates to the database until the transaction completes its execution successfully and reaches its commit point.⁴ During transaction execution, the updates are recorded only in the log and in the cache buffers. After the transaction reaches its commit point and the log is force-written to disk, the updates are recorded in the database. If a transaction fails before reaching its commit point, there is no need to undo any operations, because the transaction has not affected the database on disk in any way. Although this may simplify recovery, it cannot be used in practice unless transactions are short and each transaction changes few items. For other types of transactions, there is the potential for running out of buffer space, because transaction changes must be held in the cache buffers until the commit point.

4. Hence deferred update can generally be characterized as a no-steal approach.
We can state a typical deferred update protocol as follows:

1. A transaction cannot change the database on disk until it reaches its commit point.
2. A transaction does not reach its commit point until all its update operations are recorded in the log and the log is force-written to disk.

Notice that step 2 of this protocol is a restatement of the write-ahead logging (WAL) protocol. Because the database is never updated on disk until after the transaction commits, there is never a need to UNDO any operations. Hence, this is known as the NO-UNDO/REDO recovery algorithm. REDO is needed in case the system fails after a transaction commits but before all its changes are recorded in the database on disk. In this case, the transaction operations are redone from the log entries.
Usually, the method of recovery from failure is closely related to the concurrency control method in multiuser systems. First we discuss recovery in single-user systems, where no concurrency control is needed, so that we can understand the recovery process independently of any concurrency control method. We then discuss how concurrency control may affect the recovery process.
19.2.1 Recovery Using Deferred Update in a Single-User Environment
In such an environment, the recovery algorithm can be rather simple. The algorithm RDU_S (Recovery using Deferred Update in a Single-user environment) uses a REDO procedure, given subsequently, for redoing certain write_item operations; it works as follows:

PROCEDURE RDU_S: Use two lists of transactions: the committed transactions since the last checkpoint, and the active transactions (at most one transaction will fall in this category, because the system is single-user). Apply the REDO operation to all the write_item operations of the committed transactions from the log, in the order in which they were written to the log. Restart the active transactions.

The REDO procedure is defined as follows:

REDO(WRITE_OP): Redoing a write_item operation WRITE_OP consists of examining its log entry [write_item,T,X,new_value] and setting the value of item X in the database to new_value, which is the after image (AFIM).
The REDO operation is required to be idempotent; that is, executing it over and over is equivalent to executing it just once. In fact, the whole recovery process should be idempotent. This is because, if the system were to fail during the recovery process, the next recovery attempt might REDO certain write_item operations that had already been redone during the first recovery process. The result of recovery from a system crash during recovery should be the same as the result of recovering when there is no crash during recovery.
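The following sketch (ours, not the book's, with the log modeled as a list of tuples) illustrates RDU_S and makes the idempotence argument concrete: running the procedure a second time sets every item to the same AFIM again, leaving the database unchanged.

    # Sketch of the RDU_S algorithm and its REDO procedure.

    def redo(database, write_op):
        # [write_item, T, X, new_value]: set X to its after image (AFIM).
        _, _, item, new_value = write_op
        database[item] = new_value   # idempotent: repeating has no further effect

    def rdu_s(database, log, committed, active):
        """Recovery using Deferred Update in a Single-user environment."""
        for entry in log:            # scan forward, in the order written
            if entry[0] == "write_item" and entry[1] in committed:
                redo(database, entry)
        return active                # active transactions are simply restarted

    # Running rdu_s twice over the same log leaves the database in the same
    # state, which is why a crash during recovery is harmless.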
Notice that the only transaction in the active list will have had no effect on the database because of the deferred update protocol, and it is ignored completely by the recovery process, because none of its operations were reflected in the database on disk. However, this transaction must now be restarted, either automatically by the recovery process or manually by the user.
Figure 19.2 shows an example of recovery in a single-user environment, where the first failure occurs during execution of transaction T2, as shown in Figure 19.2b. The recovery process will redo the [write_item,T1,D,20] entry in the log by resetting the value of item D to 20 (its new value). The [write,T2,...] entries in the log are ignored by the recovery process, because T2 is not committed. If a second failure occurs during recovery from the first failure, the same recovery process is repeated from start to finish, with identical results.
[write_item,T2,D,25] ← system crash
The [write_item,...] operations of T1 are redone.
T2 log entries are ignored by the recovery process.

FIGURE 19.2 An example of recovery using deferred update in a single-user environment. (a) The READ and WRITE operations of two transactions. (b) The system log at the point of crash.
19.2.2 Deferred Update with Concurrent Execution in a Multiuser Environment
For multiuser systems with concurrency control, the recovery process may be more complex, depending on the protocols used for concurrency control. In many cases, the concurrency control and recovery processes are interrelated. In general, the greater the degree of concurrency we wish to achieve, the more time-consuming the task of recovery becomes.

Consider a system in which concurrency control uses strict two-phase locking, so the locks on items remain in effect until the transaction reaches its commit point. After that, the locks can be released. This ensures strict and serializable schedules. Assuming that [checkpoint] entries are included in the log, a possible recovery algorithm for this case, which we call RDU_M (Recovery using Deferred Update in a Multiuser environment), is given next. This procedure uses the REDO procedure defined earlier.
PROCEDURE RDU_M (WITH CHECKPOINTS): Use two lists of transactions maintained by the system: the committed transactions T since the last checkpoint (commit list), and the active transactions T' (active list). REDO all the WRITE operations of the committed transactions from the log, in the order in which they were written into the log. The transactions that are active and did not commit are effectively canceled and must be resubmitted.
Figure 19.3 shows a possible schedule of executing transactions. When the checkpoint was taken at time t1, transaction T1 had committed, whereas transactions T3 and T4 had not. Before the system crash at time t2, T3 and T2 were committed but not T4 and T5. According to the RDU_M method, there is no need to redo the write_item operations of transaction T1, or of any transactions committed before the last checkpoint time t1. The write_item operations of T2 and T3 must be redone, however, because both transactions reached their commit points after the last checkpoint. Recall that the log is force-written before committing a transaction. Transactions T4 and T5 are ignored: they are effectively canceled or rolled back, because none of their write_item operations were recorded in the database under the deferred update protocol. We will refer to Figure 19.3 later to illustrate other recovery protocols.
We can make the NO-UNDO/REDO recovery algorithm more efficient by noting that, if a data item X has been updated more than once by committed transactions since the last checkpoint, as indicated in the log entries, it is only necessary to REDO the last update of X from the log during recovery. The other updates would be overwritten by this last REDO in any case. In this case, we start from the end of the log; then, whenever an item is redone, it is added to a list of redone items. Before REDO is applied to an item, the list is checked; if the item appears on the list, it is not redone again, since its last value has already been recovered.
If a transaction is aborted for any reason (say, by the deadlock detection method), it is simply resubmitted, since it has not changed the database on disk. A drawback of the method described here is that it limits the concurrent execution of transactions, because all items remain locked until the transaction reaches its commit point. In addition, it may require excessive buffer space to hold all updated items until the transactions commit. The method's main benefit is that transaction operations never need to be undone, for two reasons:
1. A transaction does not record any changes in the database on disk until after it reaches its commit point, that is, until it completes its execution successfully. Hence, a transaction is never rolled back because of failure during transaction execution.

2. A transaction will never read the value of an item that is written by an uncommitted transaction, because items remain locked until a transaction reaches its commit point. Hence, no cascading rollback will occur.
Figure 19.4 shows an example of recovery for a multiuser system that utilizes the recovery and concurrency control method just described.
19.2.3 Transaction Actions That Do Not Affect the Database
In general, a transaction will have actions that do not affect the database, such as generating and printing messages or reports from information retrieved from the database. If a transaction fails before completion, we may not want the user to get these reports, since the transaction has failed to complete. If such erroneous reports are produced, part of the recovery process would have to inform the user that these reports are wrong, since the user may take an action based on these reports that affects the database. Hence, such reports should be generated only after the transaction reaches its commit point. A common method of dealing with such actions is to issue the commands that generate the reports but keep them as batch jobs, which are executed only after the transaction reaches its commit point. If the transaction fails, the batch jobs are canceled.
T1: read_item(A); read_item(D); write_item(D)
T2: read_item(B); write_item(B); read_item(D); write_item(D)
T3: read_item(A); write_item(A); read_item(C); write_item(C)
T4: read_item(B); write_item(B); read_item(A); write_item(A)
[write_item,T2,D,25] ← system crash
*T2 and T3 are ignored because they did not reach their commit points.
**T4 is redone because its commit point is after the last system checkpoint.

FIGURE 19.4 An example of recovery using deferred update with concurrent transactions. (a) The READ and WRITE operations of four transactions. (b) System log at the point of crash.
19.3 RECOVERY TECHNIQUES BASED ON IMMEDIATE UPDATE
In these techniques, when a transaction issues an update command, the database can be updated "immediately," without any need to wait for the transaction to reach its commit point. In these techniques, however, an update operation must still be recorded in the log (on disk) before it is applied to the database, using the write-ahead logging protocol, so that we can recover in case of failure.
Provisions must be made for undoing the effect of update operations that have been applied to the database by a failed transaction. This is accomplished by rolling back the transaction and undoing the effect of the transaction's write_item operations. Theoretically, we can distinguish two main categories of immediate update algorithms. If the recovery technique ensures that all updates of a transaction are recorded in the database on disk before the transaction commits, there is never a need to REDO any operations of committed transactions. This is called the UNDO/NO-REDO recovery algorithm. On the other hand, if the transaction is allowed to commit before all its changes are written to the database, we have the most general case, known as the UNDO/REDO recovery algorithm. This is also the most complex technique. Next, we discuss two examples of UNDO/REDO algorithms and leave it as an exercise for the reader to develop the UNDO/NO-REDO variation. In Section 19.5, we describe a more practical approach known as the ARIES recovery technique.
19.3.1 UNDO/REDO Recovery Based on Immediate Update in a Single-User Environment
In a single-user system, if a failure occurs, the executing (active) transaction at the time of failure may have recorded some changes in the database. The effect of all such operations must be undone. The recovery algorithm RIU_S (Recovery using Immediate Update in a Single-user environment) uses the REDO procedure defined earlier, as well as the UNDO procedure defined below.
PROCEDURE RIU_S

1. Use two lists of transactions maintained by the system: the committed transactions since the last checkpoint and the active transactions (at most one transaction will fall in this category, because the system is single-user).

2. Undo all the write_item operations of the active transaction from the log, using the UNDO procedure described below.

3. Redo the write_item operations of the committed transactions from the log, in the order in which they were written in the log, using the REDO procedure described earlier.
The UNDO procedure is defined as follows:

UNDO(WRITE_OP): Undoing a write_item operation WRITE_OP consists of examining its log entry [write_item,T,X,old_value,new_value] and setting the value of item X in the database to old_value, which is the before image (BFIM). Undoing a number of write_item operations from one or more transactions from the log must proceed in the reverse order from the order in which the operations were written in the log.
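A sketch of UNDO and of rolling back one transaction (ours, not the book's) shows why the reverse order matters: if the transaction wrote the same item twice, scanning backward ensures that the earliest old value, the true BFIM, is the one finally restored.

    # Sketch of the UNDO procedure and reverse-order rollback.

    def undo(database, write_op):
        # [write_item, T, X, old_value, new_value]: restore the BFIM.
        _, _, item, old_value, _ = write_op
        database[item] = old_value

    def rollback(database, log, txn):
        # Walk the log backward so that, if txn wrote X twice, the
        # earliest old_value (the true BFIM) is the one that survives.
        for entry in reversed(log):
            if entry[0] == "write_item" and entry[1] == txn:
                undo(database, entry)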
19.3.2 UNDO/REDO Recovery Based on Immediate Update with Concurrent Execution
When concurrent execution is permitted, the recovery process again depends on the protocols used for concurrency control. The procedure RIU_M (Recovery using Immediate Updates for a Multiuser environment) outlines a recovery algorithm for concurrent transactions with immediate update. Assume that the log includes checkpoints and that the concurrency control protocol produces strict schedules, as, for example, the strict two-phase locking protocol does. Recall that a strict schedule does not allow a transaction to read or write an item unless the transaction that last wrote the item has committed (or aborted and rolled back). However, deadlocks can occur in strict two-phase locking, thus requiring abort and UNDO of transactions. For a strict schedule, UNDO of an operation requires changing the item back to its old value (BFIM).

PROCEDURE RIU_M

1. Use two lists of transactions maintained by the system: the committed transactions since the last checkpoint and the active transactions.

2. Undo all the write_item operations of the active (uncommitted) transactions, using the UNDO procedure. The operations should be undone in the reverse of the order in which they were written into the log.
3. Redo all the write_item operations of the committed transactions from the log, in the order in which they were written into the log.
As we discussed in Section 19.2.2, step 3 is more efficiently done by starting from the end of the log and redoing only the last update of each item X. Whenever an item is redone, it is added to a list of redone items and is not redone again. A similar procedure can be devised to improve the efficiency of step 2.
19.4 SHADOW PAGING
This recovery scheme does not require the use of a log in a single-user environment. In a multiuser environment, a log may be needed for the concurrency control method. Shadow paging considers the database to be made up of a number of fixed-size disk pages (or disk blocks), say n, for recovery purposes. A directory with n entries⁵ is constructed, where the ith entry points to the ith database page on disk. The directory is kept in main memory if it is not too large, and all references (reads or writes) to database pages on disk go through it. When a transaction begins executing, the current directory, whose entries point to the most recent or current database pages on disk, is copied into a shadow directory. The shadow directory is then saved on disk while the current directory is used by the transaction. During transaction execution, the shadow directory is never modified. When a write_item operation is performed, a new copy of the modified database page is created, but the old copy of that page is not overwritten. Instead, the new page is written elsewhere, on some previously unused disk block. The current directory entry is modified to point to the new disk block, whereas the shadow directory is not modified and continues to point to the old unmodified disk block. Figure 19.5 illustrates the concepts of shadow and current directories. For pages updated by the transaction, two versions are kept. The old version is referenced by the shadow directory, and the new version by the current directory.
To recover from a failure during transaction execution, it is sufficient to free the modified database pages and to discard the current directory. The state of the database before transaction execution is available through the shadow directory, and that state is recovered by reinstating the shadow directory. The database thus is returned to its state prior to the transaction that was executing when the crash occurred, and any modified pages are discarded. Committing a transaction corresponds to discarding the previous shadow directory. Since recovery involves neither undoing nor redoing data items, this technique can be categorized as a NO-UNDO/NO-REDO technique for recovery.

5. The directory is similar to the page table maintained by the operating system for each process.

FIGURE 19.5 An example of shadow paging, showing the current directory (updated) and the shadow directory (not updated) pointing into the database disk blocks (pages).
In a multiuser environment with concurrent transactions, logs and checkpoints must be incorporated into the shadow paging technique. One disadvantage of shadow paging is that the updated database pages change location on disk. This makes it difficult to keep related database pages close together on disk without complex storage management strategies. Furthermore, if the directory is large, the overhead of writing shadow directories to disk as transactions commit is significant. A further complication is how to handle garbage collection when a transaction commits: the old pages referenced by the shadow directory that have been updated must be released and added to a list of free pages for future use, since they are no longer needed after the transaction commits. Another issue is that the operation to migrate between current and shadow directories must be implemented as an atomic operation.
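The core mechanics of the two directories can be sketched as follows (our illustration; ShadowPagedDB and its methods are invented names, and the disk is modeled as a dictionary from block numbers to page contents):

    # Sketch of shadow paging with a current and a shadow directory.

    class ShadowPagedDB:
        def __init__(self, disk, directory):
            self.disk = disk
            self.current = list(directory)  # entries point to current pages
            self.shadow = None
            self.next_free_block = max(disk) + 1

        def begin_transaction(self):
            self.shadow = list(self.current)  # saved on disk; never modified

        def write_item(self, page_no, new_page_data):
            # Write the new version to a previously unused block; the old
            # page, still referenced by the shadow directory, is untouched.
            block = self.next_free_block
            self.next_free_block += 1
            self.disk[block] = new_page_data
            self.current[page_no] = block

        def commit(self):
            # Making `current` the directory of record discards the shadow
            # directory; the superseded page versions become garbage.
            self.shadow = None

        def abort(self):
            # Reinstate the shadow directory: NO-UNDO/NO-REDO recovery.
            self.current = list(self.shadow)
            self.shadow = None

As the text notes, a real implementation would also have to write the shadow directory itself to disk and make the commit-time directory switch atomic.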
19.5 THE ARIES RECOVERY ALGORITHM
We now describe the ARIES algorithm as an example of a recovery algorithm used in database systems. ARIES uses a steal/no-force approach for writing, and it is based on three concepts: (1) write-ahead logging, (2) repeating history during redo, and (3) logging changes during undo. We already discussed write-ahead logging in Section 19.1.3. The second concept, repeating history, means that ARIES will retrace all actions of the database system prior to the crash to reconstruct the database state when the crash occurred. Transactions that were uncommitted at the time of the crash (active transactions) are undone. The third concept, logging during undo, prevents ARIES from repeating completed undo operations if a failure occurs during recovery, causing a restart of the recovery process.
The ARIES recovery procedure consists of three main steps: (1) analysis, (2) REDO, and (3) UNDO. The analysis step identifies the dirty (updated) pages in the buffer⁶ and the set of transactions active at the time of the crash. The appropriate point in the log where the REDO operation should start is also determined. The REDO phase actually reapplies updates from the log to the database. Generally, the REDO operation is applied only to committed transactions. However, in ARIES, this is not the case. Certain information in the ARIES log provides the start point for REDO, from which REDO operations are applied until the end of the log is reached. In addition, information stored by ARIES and in the data pages allows ARIES to determine whether the operation to be redone has actually been applied to the database and hence need not be reapplied. Thus, only the necessary REDO operations are applied during recovery. Finally, during the UNDO phase, the log is scanned backward and the operations of transactions that were active at the time of the crash are undone in reverse order. The information needed for ARIES to accomplish its recovery procedure includes the log, the Transaction Table, and the Dirty Page Table. In addition, checkpointing is used. These two tables are maintained by the transaction manager and written to the log during checkpointing.
In ARIES, every log record has an associated log sequence number (LSN) that is monotonically increasing and indicates the address of the log record on disk. Each LSN corresponds to a specific change (action) of some transaction. In addition, each data page will store the LSN of the latest log record corresponding to a change for that page. A log record is written for any of the following actions: updating a page (write), committing a transaction (commit), aborting a transaction (abort), undoing an update (undo), and ending a transaction (end). The need for including the first three actions in the log has been discussed, but the last two need some explanation. When an update is undone, a compensation log record is written in the log. When a transaction ends, whether by committing or aborting, an end log record is written.

Common fields in all log records include: (1) the previous LSN for that transaction, (2) the transaction ID, and (3) the type of log record. The previous LSN is important because it links the log records (in reverse order) for each transaction. For an update (write) action, additional fields in the log record include: (4) the page ID for the page that includes the item, (5) the length of the updated item, (6) its offset from the beginning of the page, (7) the before image of the item, and (8) its after image.
6. The actual buffers may be lost during a crash, since they are in main memory. Additional tables stored in the log during checkpointing (Dirty Page Table, Transaction Table) allow ARIES to identify this information.
Besides the log, two tables are needed for efficient recovery: the Transaction Table and the Dirty Page Table, which are maintained by the transaction manager. When a crash occurs, these tables are rebuilt in the analysis phase of recovery. The Transaction Table contains an entry for each active transaction, with information such as the transaction ID, transaction status, and the LSN of the most recent log record for the transaction. The Dirty Page Table contains an entry for each dirty page in the buffer, which includes the page ID and the LSN corresponding to the earliest update to that page.
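To make these layouts concrete, here is a minimal sketch (ours, not drawn from any ARIES implementation; the field names follow the lists above, but the concrete representation is invented for illustration):

    # Sketch of ARIES-style log records and the two recovery tables.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class LogRecord:
        lsn: int                  # monotonically increasing, addresses the record
        prev_lsn: Optional[int]   # links this transaction's records in reverse
        txn_id: str
        rec_type: str             # "write", "commit", "abort", "undo" (CLR), "end"
        # The remaining fields apply only to update (write) records:
        page_id: Optional[int] = None
        length: Optional[int] = None
        offset: Optional[int] = None
        before_image: Optional[bytes] = None
        after_image: Optional[bytes] = None

    # Transaction Table: one entry per active transaction.
    transaction_table = {
        "T1": {"status": "in progress", "last_lsn": 3},
    }

    # Dirty Page Table: page ID -> LSN of the earliest update to that page.
    dirty_page_table = {
        7: 1,
    }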
Checkpointing in ARIES consists of the following: (1) writing a begin_checkpoint record to the log, (2) writing an end_checkpoint record to the log, and (3) writing the LSN of the begin_checkpoint record to a special file. This special file is accessed during recovery to locate the last checkpoint information. With the end_checkpoint record, the contents of both the Transaction Table and the Dirty Page Table are appended to the end of the log. To reduce the cost, fuzzy checkpointing is used so that the DBMS can continue to execute transactions during checkpointing (see Section 19.1.4). In addition, the contents of the DBMS cache do not have to be flushed to disk during checkpoint, since the Transaction Table and Dirty Page Table, which are appended to the log on disk, contain the information needed for recovery. Notice that if a crash occurs during checkpointing, the special file will refer to the previous checkpoint, which is used for recovery.
After a crash, the ARIES recovery manager takes over. Information from the last checkpoint is first accessed through the special file. The analysis phase starts at the begin_checkpoint record and proceeds to the end of the log. When the end_checkpoint record is encountered, the Transaction Table and Dirty Page Table are accessed (recall that these tables were written in the log during checkpointing). During analysis, the log records being analyzed may cause modifications to these two tables. For instance, if an end log record is encountered for a transaction T in the Transaction Table, then the entry for T is deleted from that table. If some other type of log record is encountered for a transaction T', then an entry for T' is inserted into the Transaction Table, if not already present, and the last LSN field is modified. If the log record corresponds to a change for page P, then an entry is made for page P (if not present in the table) and the associated LSN field is modified. When the analysis phase is complete, the necessary information for REDO and UNDO has been compiled in the tables.
The REDO phase follows next. To reduce the amount of unnecessary work, ARIES starts redoing at a point in the log where it knows (for sure) that previous changes to dirty pages have already been applied to the database on disk. It can determine this by finding the smallest LSN, M, of all the dirty pages in the Dirty Page Table, which indicates the log position where ARIES needs to start the REDO phase. Any changes corresponding to an LSN < M, for redoable transactions, must have already been propagated to disk or already been overwritten in the buffer; otherwise, those dirty pages with that LSN would be in the buffer (and the Dirty Page Table). So, REDO starts at the log record with LSN = M and scans forward to the end of the log. For each change recorded in the log, the REDO algorithm verifies whether or not the change has to be reapplied. For example, if a change recorded in the log pertains to page P that is not in the Dirty Page Table, then this change is already on disk and need not be reapplied. Or, if a change recorded in the log (with LSN = N, say) pertains to page P and the Dirty Page Table contains an entry for P with an LSN greater than N, then the change is already present. If neither of these two conditions holds, page P is read from disk and the LSN stored on that page, LSN(P), is compared with N. If N ≤ LSN(P), then the change has already been applied and the page need not be rewritten to disk.
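The REDO test just described can be sketched as follows (our illustration; dirty_page_table maps a page ID to its recorded LSN, and the helper names are invented):

    # Sketch of the ARIES REDO test for one logged update.

    def maybe_redo(update, dirty_page_table, read_page, apply_update):
        page_id, n = update.page_id, update.lsn
        if page_id not in dirty_page_table:
            return            # change already on disk; nothing to do
        if dirty_page_table[page_id] > n:
            return            # a later flush already captured this change
        page = read_page(page_id)        # only now is the page fetched
        if n <= page.page_lsn:
            return            # the page itself shows the change was applied
        apply_update(page, update)       # redo the update and stamp the page
        page.page_lsn = n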
Once the REDO phase is finished, the database is in the exact state that it was in when the crash occurred. The set of active transactions, called the undo_set, has been identified in the Transaction Table during the analysis phase. Now the UNDO phase proceeds by scanning backward from the end of the log and undoing the appropriate actions. A compensation log record is written for each action that is undone. The UNDO phase reads backward in the log until every action of the transactions in the undo_set has been undone. When this is completed, the recovery process is finished and normal processing can begin again.
Consider the recovery example shown in Figure 19.6. There are three transactions: T1, T2, and T3. T1 updates page C, T2 updates pages B and C, and T3 updates page A. Figure 19.6a shows the partial contents of the log, and Figure 19.6b shows the contents of the Transaction Table and Dirty Page Table. Now, suppose that a crash occurs at this point.
Since a checkpoint has occurred, the address of the associated begin_checkpoint record is retrieved, which is location 4. The analysis phase starts from location 4 and proceeds until it reaches the end of the log. The end_checkpoint record would contain the Transaction Table and Dirty Page Table of Figure 19.6b, and the analysis phase will further reconstruct these tables. When the analysis phase encounters log record 6, a new entry for transaction T3 is made in the Transaction Table and a new entry for page A is made in the Dirty Page Table. After log record 8 is analyzed, the status of transaction T2 is changed to committed in the Transaction Table. Figure 19.6c shows the two tables after the analysis phase.

For the REDO phase, the smallest LSN in the Dirty Page Table is 1. Hence the REDO will start at log record 1 and proceed with the REDO of updates. The LSNs {1, 2, 6, 7}, corresponding to the updates for pages C, B, A, and C, respectively, are not less than the LSNs of those pages (as shown in the Dirty Page Table). So those data pages will be read again and the updates reapplied from the log (assuming the actual LSNs stored on those data pages are less than the corresponding log entries). At this point, the REDO phase is finished and the UNDO phase starts. From the Transaction Table (Figure 19.6c), UNDO is applied only to the active transaction T3. The UNDO phase starts at log entry 6 (the last update for T3) and proceeds backward in the log. The backward chain of updates for transaction T3 (only log record 6 in this example) is followed and undone.
19.6 RECOVERY IN MULTIDATABASE SYSTEMS

So far, we have implicitly assumed that a transaction accesses a single database. In some cases, a single transaction, called a multidatabase transaction, may require access to multiple databases. These databases may even be stored on different types of DBMSs; for example, some DBMSs may be relational, whereas others are object-oriented, hierarchical, or network DBMSs. In such a case, each DBMS involved in the multidatabase transaction may have its own recovery technique and transaction manager, separate from those of the other DBMSs. This situation is somewhat similar to the case of a distributed database management system (see Chapter 25), where parts of the database reside at different sites that are connected by a communication network.
To maintain the atomicity of a multidatabase transaction, it is necessary to have a two-level recovery mechanism. A global recovery manager, or coordinator, is needed to maintain information needed for recovery, in addition to the local recovery managers and the information they maintain (log, tables). The coordinator usually follows a protocol called the two-phase commit protocol, whose two phases can be stated as follows:
• Phase 1: When all participating databases signal the coordinator that the part of the multidatabase transaction involving each has concluded, the coordinator sends a message "prepare for commit" to each participant to get ready for committing the transaction. Each participating database receiving that message will force-write all log records and needed information for local recovery to disk and then send a "ready to commit" or "OK" signal to the coordinator. If the force-writing to disk fails or the local transaction cannot commit for some reason, the participating database sends a "cannot commit" or "not OK" signal to the coordinator. If the coordinator does not receive a reply from a database within a certain timeout interval, it assumes a "not OK" response.
• Phase 2: If all participating databases reply "OK," and the coordinator's vote is also "OK," the transaction is successful, and the coordinator sends a "commit" signal for the transaction to the participating databases. Because all the local effects of the transaction and the information needed for local recovery have been recorded in the logs of the participating databases, recovery from failure is now possible. Each participating database completes transaction commit by writing a [commit] entry for the transaction in the log and permanently updating the database if needed. On the other hand, if one or more of the participating databases or the coordinator have a "not OK" response, the transaction has failed, and the coordinator sends a message to "roll back" or UNDO the local effect of the transaction to each participating database. This is done by undoing the transaction operations, using the log.
The net effect of the two-phase commit protocol is that either all participating databases commit the effect of the transaction or none of them do. In case any of the participants, or the coordinator, fails, it is always possible to recover to a state where either the transaction is committed or it is rolled back. A failure during or before Phase 1 usually requires the transaction to be rolled back, whereas a failure during Phase 2 means that a successful transaction can recover and commit.
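The coordinator's side of the protocol can be sketched as follows (our illustration, not the book's; participants are assumed to expose prepare, commit, and rollback operations, and a missed reply surfaces as a TimeoutError):

    # Sketch of a two-phase commit coordinator.

    def two_phase_commit(participants, coordinator_vote="OK"):
        # Phase 1: ask every participant to prepare (force-write its log).
        votes = []
        for p in participants:
            try:
                votes.append(p.prepare())   # returns "OK" or "not OK"
            except TimeoutError:
                votes.append("not OK")      # no reply counts as "not OK"
        # Phase 2: commit only if every vote, including the coordinator's, is OK.
        if coordinator_vote == "OK" and all(v == "OK" for v in votes):
            for p in participants:
                p.commit()                  # write [commit] entry, apply updates
            return "committed"
        for p in participants:
            p.rollback()                    # undo local effects using the log
        return "rolled back"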
19.7 DATABASE BACKUP AND RECOVERY FROM CATASTROPHIC FAILURES
So far, all the techniques we have discussed apply to noncatastrophic failures. A key assumption has been that the system log is maintained on the disk and is not lost as a result of the failure. Similarly, the shadow directory must be stored on disk to allow recovery when shadow paging is used. The recovery techniques we have discussed use the entries in the system log or the shadow directory to recover from failure by bringing the database back to a consistent state.
The recovery manager of a DBMS must also be equipped to handle more catastrophic failures such as disk crashes. The main technique used to handle such crashes is database backup. The whole database and the log are periodically copied onto a cheap storage medium such as magnetic tape. In case of a catastrophic system failure, the latest backup copy can be reloaded from the tape to the disk, and the system can be restarted.
To avoid losing all the effects of transactions that have been executed since the last backup, it is customary to back up the system log at more frequent intervals than full database backup, by periodically copying it to magnetic tape. The system log is usually substantially smaller than the database itself and hence can be backed up more frequently. Thus users do not lose all transactions they have performed since the last database backup. All committed transactions recorded in the portion of the system log that has been backed up to tape can have their effect on the database redone. A new log is started after each database backup. Hence, to recover from disk failure, the database is first recreated on disk from its latest backup copy on tape. Following that, the effects of all the committed transactions whose operations have been recorded in the backed-up copies of the system log are reconstructed.
19.8 SUMMARY

In this chapter we discussed the techniques for recovery from transaction failures. The main goal of recovery is to ensure the atomicity property of a transaction. If a transaction fails before completing its execution, the recovery mechanism has to make sure that the transaction has no lasting effects on the database. We first gave an informal outline of the recovery process and then discussed system concepts for recovery. These included a discussion of caching, in-place updating versus shadowing, before and after images of a data item, UNDO versus REDO recovery operations, steal/no-steal and force/no-force policies, system checkpointing, and the write-ahead logging protocol.
Next we discussed two different approaches to recovery: deferred update and immediate update. Deferred update techniques postpone any actual updating of the database on disk until a transaction reaches its commit point. The transaction force-writes the log to disk before recording the updates in the database. This approach, when used with certain concurrency control methods, is designed never to require transaction rollback, and recovery simply consists of redoing the operations of transactions committed after the last checkpoint from the log. The disadvantage is that too much buffer space may be needed, since updates are kept in the buffers and are not applied to disk until a transaction commits. Deferred update can lead to a recovery algorithm known as NO-UNDO/REDO. Immediate update techniques may apply changes to the database on disk before the transaction reaches a successful conclusion. Any changes applied to the database must first be recorded in the log and force-written to disk so that these operations can be undone if necessary. We also gave an overview of a recovery algorithm for immediate update known as UNDO/REDO. Another algorithm, known as UNDO/NO-REDO, can also be developed for immediate update if all transaction actions are recorded in the database before commit.
We discussed the shadow paging technique for recovery, which keeps track of old database pages by using a shadow directory. This technique, which is classified as NO-UNDO/NO-REDO, does not require a log in single-user systems but still needs the log for multiuser systems. We also presented ARIES, a specific recovery scheme used in some of IBM's relational database products. We then discussed the two-phase commit protocol, which is used for recovery from failures involving multidatabase transactions. Finally, we discussed recovery from catastrophic failures, which is typically done by backing up the database and the log to tape. The log can be backed up more frequently than the database, and the backup log can be used to redo operations starting from the last database backup.
Review Questions

19.4 How are buffering and caching techniques used by the recovery subsystem?

19.5 What are the before image (BFIM) and after image (AFIM) of a data item? What is the difference between in-place updating and shadowing, with respect to their handling of BFIM and AFIM?

19.6 What are UNDO-type and REDO-type log entries?

19.7 Describe the write-ahead logging protocol.

19.8 Identify three typical lists of transactions that are maintained by the recovery subsystem.

19.9 What is meant by transaction rollback? What is meant by cascading rollback? Why do practical recovery methods use protocols that do not permit cascading rollback? Which recovery techniques do not require any rollback?

19.10 Discuss the UNDO and REDO operations and the recovery techniques that use each.

19.11 Discuss the deferred update technique of recovery. What are the advantages and disadvantages of this technique? Why is it called the NO-UNDO/REDO method?

19.12 How can recovery handle transaction operations that do not affect the database, such as the printing of reports by a transaction?

19.13 Discuss the immediate update recovery technique in both single-user and multiuser environments. What are the advantages and disadvantages of immediate update?

19.14 What is the difference between the UNDO/REDO and the UNDO/NO-REDO algorithms for recovery with immediate update? Develop the outline for an UNDO/NO-REDO algorithm.

19.15 Describe the shadow paging recovery technique. Under what circumstances does it not require a log?

19.16 Describe the three phases of the ARIES recovery method.

19.17 What are log sequence numbers (LSNs) in ARIES? How are they used? What information do the Dirty Page Table and Transaction Table contain? Describe how fuzzy checkpointing is used in ARIES.

19.18 What do the terms steal/no-steal and force/no-force mean with regard to buffer management for transaction processing?

19.19 Describe the two-phase commit protocol for multidatabase transactions.

19.20 Discuss how recovery from catastrophic failures is handled.
Exercises
19.21 Suppose that the system crashes before the [read_item,T3,A] entry is written to the log in Figure 19.1b. Will that make any difference in the recovery process?
19.22 Suppose that the system crashes before the [write_item,T2,D,25,26] entry is written to the log in Figure 19.1b. Will that make any difference in the recovery process?
19.23 Figure 19.7 shows the log corresponding to a particular schedule at the point of a system crash for four transactions T1, T2, T3, and T4. Suppose that we use the immediate update protocol with checkpointing. Describe the recovery process from the system crash. Specify which transactions are rolled back, which operations in the log are redone and which (if any) are undone, and whether any cascading rollback takes place.
19.24 Suppose that we use the deferred update protocol for the example in Figure 19.7. Show how the log would be different in the case of deferred update by removing the unnecessary log entries; then describe the recovery process, using your modified log. Assume that only REDO operations are applied, and specify which operations in the log are redone and which are ignored.
19.25 How does checkpointing in ARIES differ from checkpointing as described in Section 19.1.4?
19.26 How are log sequence numbers used by ARIES to reduce the amount of REDO work needed for recovery? Illustrate with an example using the information shown in Figure 19.6. You can make your own assumptions as to when a page is written to disk.
FIGURE 19.7 An example schedule and its corresponding log; the log ends with the entry [write_item, T2, D, 15, 25], at which point the system crash occurs.
19.27 What implications would a no-steal/force buffer management policy have on checkpointing and recovery?
Choose the correct answer for each of the following multiple-choice questions:
19.28 Incremental logging with deferred updates implies that the recovery system must necessarily
a. store the old value of the updated item in the log
b. store the new value of the updated item in the log
c. store both the old and new value of the updated item in the log
d. store only the Begin Transaction and Commit Transaction records in the log
19.29 The write-ahead logging (WAL) protocol simply means that
a. the writing of a data item should be done ahead of any logging operation
b. the log record for an operation should be written before the actual data is written
c. all log records should be written before a new transaction begins execution
d. the log never needs to be written to disk
19.30 In case of transaction failure under a deferred update incremental logging scheme, which of the following will be needed:
a. an undo operation
b. a redo operation
c. an undo and redo operation
d. none of the above
19.31 For incremental logging with immediate updates, a log record for a transaction would contain:
a. a transaction name, data item name, old value of item, new value of item
b. a transaction name, data item name, old value of item
c. a transaction name, data item name, new value of item
d. a transaction name and a data item name
19.32 For correct behavior during recovery, undo and redo operations must be
a. commutative
b. associative
c. idempotent
d. distributive
19.33 When a failure occurs, the log is consulted and each operation is either undone or redone. This is a problem because
a. searching the entire log is time consuming
b. many redos are unnecessary
c. both (a) and (b)
d. none of the above
19.34 When using a log-based recovery scheme, it might improve performance as well as providing a recovery mechanism by
a. writing the log records to disk when each transaction commits
b. writing the appropriate log records to disk during the transaction's execution
c. waiting to write the log records until multiple transactions commit and writing them as a batch
d. never writing the log records to disk
19.35 There is a possibility of a cascading rollback when
a. a transaction writes items that have been written only by a committed transaction
19.36 To cope with media (disk) failures, it is necessary
a. for the DBMS to only execute transactions in a single-user environment
b. to keep a redundant copy of the database
c. to never abort a transaction
d. all of the above
19.37 If the shadowing approach is used for flushing a data item back to disk, then
a. the item is written to disk only after the transaction commits
b. the item is written to a different location on disk
c. the item is written to disk before the transaction commits
d. the item is written to the same disk location from which it was read
Selected Bibliography
The books by Bernstein et al. (1987) and Papadimitriou (1986) are devoted to the theory and principles of concurrency control and recovery. The book by Gray and Reuter (1993) is an encyclopedic work on concurrency control, recovery, and other transaction-processing issues.
Verhofstad (1978) presents a tutorial and survey of recovery techniques in database systems. Categorizing algorithms based on their UNDO/REDO characteristics is discussed in Haerder and Reuter (1983) and in Bernstein et al. (1983). Gray (1978) discusses recovery, along with other system aspects of implementing operating systems for databases. The shadow paging technique is discussed in Lorie (1977), Verhofstad (1978), and Reuter (1980). Gray et al. (1981) discuss the recovery mechanism in SYSTEM R. Lockeman and Knutsen (1968), Davies (1972), and Bjork (1973) are early papers that discuss recovery. Chandy et al. (1975) discuss transaction rollback. Lilien and Bhargava (1985) discuss the concept of integrity block and its use to improve the efficiency of recovery.

Recovery using write-ahead logging is analyzed in Jhingran and Khedkar (1992) and is used in the ARIES system (Mohan et al. 1992a). More recent work on recovery includes compensating transactions (Korth et al. 1990) and main memory database recovery (Kumar 1991). The ARIES recovery algorithms (Mohan et al. 1992) have been quite successful in practice. Franklin et al. (1992) discusses recovery in the EXODUS system. Two recent books by Kumar and Hsu (1998) and Kumar and Son (1998) discuss recovery in detail and contain descriptions of recovery methods used in a number of existing relational database products.
OBJECT AND OBJECT-RELATIONAL DATABASES

Concepts for Object Databases
In this chapter and the next, we discuss object-oriented data models and database systems.1 Traditional data models and systems, such as relational, network, and hierarchical, have been quite successful in developing the database technology required for many traditional business database applications. However, they have certain shortcomings when more complex database applications must be designed and implemented, for example, databases for engineering design and manufacturing (CAD/CAM and CIM2), scientific experiments, telecommunications, geographic information systems, and multimedia.3 These newer applications have requirements and characteristics that differ from those of traditional business applications, such as more complex structures for objects, longer-duration transactions, new data types for storing images or large textual items, and the need to define nonstandard application-specific operations. Object-oriented databases were proposed to meet the needs of these more complex applications. The object-oriented approach offers the flexibility to handle some of these requirements without
being limited by the data types and query languages available in traditional database systems. A key feature of object-oriented databases is the power they give the designer to specify both the structure of complex objects and the operations that can be applied to these objects.

1. These databases are often referred to as Object Databases and the systems are referred to as Object Database Management Systems (ODBMS). However, because this chapter discusses many general object-oriented concepts, we will use the term object-oriented instead of just object.
2. Computer-Aided Design/Computer-Aided Manufacturing and Computer-Integrated Manufacturing.
3. Multimedia databases must store various types of multimedia objects, such as video, audio, images, graphics, and documents (see Chapter 24).
Another reason for the creation of object-oriented databases is the increasing use of object-oriented programming languages in developing software applications. Databases are now becoming fundamental components in many software systems, and traditional databases were difficult to use with object-oriented software applications that are developed in an object-oriented programming language such as C++, SMALLTALK, or JAVA. Object-oriented databases are designed so they can be directly, or seamlessly, integrated with software that is developed using object-oriented programming languages.

The need for additional data modeling features has also been recognized by relational DBMS vendors, and newer versions of relational systems are incorporating many of the features that were proposed for object-oriented databases. This has led to systems that are characterized as object-relational or extended relational DBMSs (see Chapter 22). The latest version of the SQL standard for relational DBMSs includes some of these features.
Although many experimental prototypes and commercial object-oriented database systems have been created, they have not found widespread use because of the popularity of relational and object-relational systems. The experimental prototypes included the ORION system developed at MCC,4 OPENOODB at Texas Instruments, the IRIS system at Hewlett-Packard laboratories, the ODE system at AT&T Bell Labs,5 and the ENCORE/ObServer project at Brown University. Commercially available systems included GEMSTONE/OPAL of GemStone Systems, ONTOS of Ontos, Objectivity of Objectivity Inc., Versant of Versant Object Technology, ObjectStore of Object Design, ARDENT of ARDENT Software,6 and POET of POET Software. These represent only a partial list of the experimental prototypes and commercial object-oriented database systems that were created.
As commercial object-oriented DBMSs became available, the need for a standard model and language was recognized. Because the formal procedure for approval of standards normally takes a number of years, a consortium of object-oriented DBMS vendors and users, called ODMG,7 proposed a standard that is known as the ODMG-93 standard, which has since been revised. We will describe some features of the ODMG standard in Chapter 21.
Object-oriented databases have adopted many of the concepts that were developed originally for object-oriented programming languages.8 In Section 20.1, we examine the origins of the object-oriented approach and discuss how it applies to database systems. Then, in Sections 20.2 through 20.6, we describe the key concepts utilized in many object-oriented database systems.
4. Microelectronics and Computer Technology Corporation, Austin, Texas.
5. Now called Lucent Technologies.
6. Formerly O2 of O2 Technology.
7. Object Database Management Group.
8. Similar concepts were also developed in the fields of semantic data modeling and knowledge representation.
Section 20.2 discusses object identity, object structure, and type constructors. Section 20.3 presents the concepts of encapsulation of operations and definition of methods as part of class declarations, and also discusses the mechanisms for storing objects in a database by making them persistent. Section 20.4 describes type and class hierarchies and inheritance in object-oriented databases, and Section 20.5 provides an overview of the issues that arise when complex objects need to be represented and stored. Section 20.6 discusses additional concepts, including polymorphism, operator overloading, dynamic binding, multiple and selective inheritance, and versioning and configuration of objects.
This chapter presents the general concepts of object-oriented databases, whereas Chapter 21 will present the ODMG standard. The reader may skip Sections 20.5 and 20.6 of this chapter if a less detailed introduction to the topic is desired.
20.1 OVERVIEW OF OBJECT-ORIENTED
CONCEPTS
This section gives a quick overview of the history and main concepts of object-oriented databases, or OODBs for short. The OODB concepts are then explained in more detail in Sections 20.2 through 20.6. The term object-oriented, abbreviated OO or O-O, has its origins in OO programming languages, or OOPLs. Today OO concepts are applied in the areas of databases, software engineering, knowledge bases, artificial intelligence, and computer systems in general. OOPLs have their roots in the SIMULA language, which was proposed in the late 1960s. In SIMULA, the concept of a class groups together the internal data structure of an object in a class declaration. Subsequently, researchers proposed the concept of abstract data type, which hides the internal data structures and specifies all possible external operations that can be applied to an object, leading to the concept of encapsulation. The programming language SMALLTALK, developed at Xerox PARC9 in the 1970s, was one of the first languages to explicitly incorporate additional OO concepts, such as message passing and inheritance. It is known as a pure OO programming language, meaning that it was explicitly designed to be object-oriented. This contrasts with hybrid OO programming languages, which incorporate OO concepts into an already existing language. An example of the latter is C++, which incorporates OO concepts into the popular C programming language.
An object typically has two components: state (value) and behavior (operations). Hence, it is somewhat similar to a program variable in a programming language, except that it will typically have a complex data structure as well as specific operations defined by the programmer.10 Objects in an OOPL exist only during program execution and are hence called transient objects. An OO database can extend the existence of objects so that they are stored permanently, and hence the objects persist beyond program termination and can be retrieved later and shared by other programs.
In other words, OO databases store persistent objects permanently on secondary storage, and allow the sharing of these objects among multiple programs and applications. This requires the incorporation of other well-known features of database management systems, such as indexing mechanisms, concurrency control, and recovery. An OO database system interfaces with one or more OO programming languages to provide persistent and shared object capabilities.

9. Palo Alto Research Center, Palo Alto, California.
10. Objects have many other characteristics, as we discuss in the rest of this chapter.
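As a small illustration of the transient/persistent distinction, consider the following sketch; it is purely illustrative, with Python's shelve module standing in for an object store and all names invented.

    import shelve

    class Employee:
        def __init__(self, name, salary):
            self.name, self.salary = name, salary

    # First program run: the object is transient until written to the store.
    with shelve.open("objstore") as store:
        e = Employee("John B. Smith", 30000)   # dies with the program ...
        store["oid-1001"] = e                  # ... unless made persistent

    # A later program run retrieves the same object by its key.
    with shelve.open("objstore") as store:
        e2 = store["oid-1001"]
        print(e2.name)                         # -> John B. Smith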
One goal of OO databases is to maintain a direct correspondence between real-world and database objects so that objects do not lose their integrity and identity and can easily be identified and operated upon. Hence, OO databases provide a unique system-generated object identifier (OID) for each object. We can compare this with the relational model, where each relation must have a primary key attribute whose value identifies each tuple uniquely. In the relational model, if the value of the primary key is changed, the tuple will have a new identity, even though it may still represent the same real-world object. Alternatively, a real-world object may have different names for key attributes in different relations, making it difficult to ascertain that the keys represent the same object (for example, the object identifier may be represented as EMP_ID in one relation and as SSN in another).

Another feature of OO databases is that objects may have an object structure of arbitrary complexity in order to contain all of the necessary information that describes the object. In contrast, in traditional database systems, information about a complex object is often scattered over many relations or records, leading to loss of direct correspondence between a real-world object and its database representation.
The internal structure of an object in OOPLs includes the specification of instance variables, which hold the values that define the internal state of the object. Hence, an instance variable is similar to the concept of an attribute in the relational model, except that instance variables may be encapsulated within the object and thus are not necessarily visible to external users. Instance variables may also be of arbitrarily complex data types. Object-oriented systems allow definition of the operations or functions (behavior) that can be applied to objects of a particular type. In fact, some OO models insist that all operations a user can apply to an object must be predefined. This forces a complete encapsulation of objects. This rigid approach has been relaxed in most OO data models for several reasons. First, the database user often needs to know the attribute names so they can specify selection conditions on the attributes to retrieve specific objects. Second, complete encapsulation implies that any simple retrieval requires a predefined operation, thus making ad hoc queries difficult to specify on the fly.
To encourage encapsulation, an operation is defined in two parts. The first part, called the signature or interface of the operation, specifies the operation name and arguments (or parameters). The second part, called the method or body, specifies the implementation of the operation. Operations can be invoked by passing a message to an object, which includes the operation name and the parameters. The object then executes the method for that operation. This encapsulation permits modification of the internal structure of an object, as well as the implementation of its operations, without the need to disturb the external programs that invoke these operations. Hence, encapsulation provides a form of data and operation independence (see Chapter 2).
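The following sketch (our own, with invented names) shows this vocabulary in code: the signature is the operation's name and parameters, the method is its hidden implementation, and a client invokes the operation by sending a message rather than touching the state directly.

    class Account:
        def __init__(self, balance):
            self._balance = balance    # internal state, hidden by convention

        def deposit(self, amount):     # signature: deposit(amount)
            # Method (body): can be reimplemented freely without disturbing
            # the external programs that invoke the operation.
            if amount <= 0:
                raise ValueError("deposit must be positive")
            self._balance += amount

        def get_balance(self):         # retrieval operation
            return self._balance

    a = Account(100)
    a.deposit(50)                      # "sending a message" to the object
    print(a.get_balance())             # -> 150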
Another key concept in OO systems is that of type and class hierarchies and inheritance. This permits specification of new types or classes that inherit much of their structure and/or operations from previously defined types or classes. Hence, specification of object types can
proceed systematically. This makes it easier to develop the data types of a system incrementally, and to reuse existing type definitions when creating new types of objects.
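For instance, in the following sketch (the names are our own), a new class inherits the structure and operations of an existing one and extends both, which is how type definitions can be reused and developed incrementally.

    class Person:
        def __init__(self, name):
            self.name = name

        def describe(self):
            return f"Person {self.name}"

    class Employee(Person):            # inherits structure and operations
        def __init__(self, name, salary):
            super().__init__(name)
            self.salary = salary       # added structure

        def describe(self):            # refined operation
            return f"Employee {self.name}, salary {self.salary}"

    print(Employee("Smith", 30000).describe())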
One problem in early OO database systems involved representing relationships among objects. The insistence on complete encapsulation in early OO data models led to the argument that relationships should not be explicitly represented, but should instead be described by defining appropriate methods that locate related objects. However, this approach does not work very well for complex databases with many relationships, because it is useful to identify these relationships and make them visible to users. The ODMG standard has recognized this need and it explicitly represents binary relationships via a pair of inverse references, that is, by placing the OIDs of related objects within the objects themselves, and maintaining referential integrity, as we shall describe in Chapter 21.
Some OO systems provide capabilities for dealing with multiple versions of the same object, a feature that is essential in design and engineering applications. For example, an old version of an object that represents a tested and verified design should be retained until the new version is tested and verified. A new version of a complex object may include only a few new versions of its component objects, whereas other components remain unchanged. In addition to permitting versioning, OO databases should also allow for schema evolution, which occurs when type declarations are changed or when new types or relationships are created. These two features are not specific to OODBs and should ideally be included in all types of DBMSs.11
Another OO concept is operator overloading, which refers to an operation's ability to be applied to different types of objects; in such a situation, an operation name may refer to several distinct implementations, depending on the type of objects it is applied to. This feature is also called operator polymorphism. For example, an operation to calculate the area of a geometric object may differ in its method (implementation), depending on whether the object is of type triangle, circle, or rectangle. This may require the use of late binding of the operation name to the appropriate method at run-time, when the type of object to which the operation is applied becomes known.
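The area example can be sketched as follows (the code is our own illustration): one operation name, area, has a distinct method per object type, and the right implementation is bound only at run-time, when the type of the receiving object is known.

    import math

    class Rectangle:
        def __init__(self, w, h):
            self.w, self.h = w, h
        def area(self):
            return self.w * self.h

    class Circle:
        def __init__(self, r):
            self.r = r
        def area(self):
            return math.pi * self.r ** 2

    class Triangle:
        def __init__(self, base, height):
            self.base, self.height = base, height
        def area(self):
            return 0.5 * self.base * self.height

    shapes = [Rectangle(3, 4), Circle(1.0), Triangle(6, 2)]
    areas = [s.area() for s in shapes]   # late binding picks each method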
This section provided an overview of the main concepts of OO databases. In Sections 20.2 through 20.6, we discuss these concepts in more detail.
20.2 OBJECT IDENTITY, OBJECT STRUCTURE, AND TYPE CONSTRUCTORS
In this section we first discuss the concept of object identity, and then we present the typical structuring operations for defining the structure of the state of an object. These structuring operations are often called type constructors. They define basic data-structuring operations that can be combined to form complex object structures.
11. Several schema evolution operations, such as ALTER TABLE, are already defined in the relational SQL standard (see Section 8.3).
20.2.1 Object Identity

An OO database system provides a unique identity to each independent object stored in the database. This unique identity is typically implemented via a unique, system-generated object identifier, or OID. The value of an OID is not visible to the external user, but it is used internally by the system to identify each object uniquely and to create and manage inter-object references. The OID can be assigned to program variables of the appropriate type when needed.
The main property required of an OID is that it be immutable; that is, the OID value of a particular object should not change. This preserves the identity of the real-world object being represented. Hence, an OO database system must have some mechanism for generating OIDs and preserving the immutability property. It is also desirable that each OID be used only once; that is, even if an object is removed from the database, its OID should not be assigned to another object. These two properties imply that the OID should not depend on any attribute values of the object, since the value of an attribute may be changed or corrected. It is also generally considered inappropriate to base the OID on the physical address of the object in storage, since the physical address can change after a physical reorganization of the database. However, some systems do use the physical address as OID to increase the efficiency of object retrieval. If the physical address of the object changes, an indirect pointer can be placed at the former address, which gives the new physical location of the object. It is more common to use long integers as OIDs and then to use some form of hash table to map the OID value to the current physical address of the object in storage.
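A minimal sketch of this last scheme follows (all names are assumed): OIDs are integers handed out by a counter that never reuses a value, and a hash table maps each OID to the object's current physical address, so the object can move on disk without its identity changing.

    class OIDManager:
        def __init__(self):
            self._next_oid = 1
            self._directory = {}        # hash table: OID -> physical address

        def new_oid(self, physical_address):
            oid = self._next_oid
            self._next_oid += 1         # counter only grows: OIDs never reused
            self._directory[oid] = physical_address
            return oid

        def locate(self, oid):
            return self._directory[oid]         # current address of the object

        def relocate(self, oid, new_address):
            self._directory[oid] = new_address   # address changes; the OID does not

    mgr = OIDManager()
    oid = mgr.new_oid(physical_address=0x4F00)
    mgr.relocate(oid, 0x8A00)            # a reorganization moves the object
    print(oid, hex(mgr.locate(oid)))     # the identity is unchanged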
Some early OO data models required that everything, from a simple value to a complex object, be represented as an object; hence, every basic value, such as an integer, string, or Boolean value, has an OID. This allows two basic values to have different OIDs, which can be useful in some cases. For example, the integer value 50 can be used sometimes to mean a weight in kilograms and at other times to mean the age of a person. Then, two basic objects with distinct OIDs could be created, but both objects would represent the integer value 50. Although useful as a theoretical model, this is not very practical, since it may lead to the generation of too many OIDs. Hence, most OO database systems allow for the representation of both objects and values. Every object must have an immutable OID, whereas a value has no OID and just stands for itself. Hence, a value is typically stored within an object and cannot be referenced from other objects. In some systems, complex structured values can also be created without having a corresponding OID if needed.
20.2.2 Object Structure
In OO databases, the state (current value) of a complex object may be constructed from other objects (or other values) by using certain type constructors. One formal way of representing such objects is to view each object as a triple (i, c, v), where i is a unique object identifier (the OID), c is a type constructor12 (that is, an indication of how the object state is constructed), and v is the object state (or current value). The data model will typically include several type constructors. The three most basic constructors are atom, tuple, and set. Other commonly used constructors include list, bag, and array. The atom constructor is used to represent all basic atomic values, such as integers, real numbers, character strings, Booleans, and any other basic data types that the system supports directly.

12. This is different from the constructor operation that is used in C++ and other OOPLs to create new objects.
The object state v of an object (i, c, v) is interpreted based on the constructor c. If c = atom, the state (value) v is an atomic value from the domain of basic values supported by the system. If c = set, the state v is a set of object identifiers {i1, i2, ..., in}, which are the OIDs for a set of objects that are typically of the same type. If c = tuple, the state v is a tuple of the form <a1:i1, a2:i2, ..., an:in>, where each aj is an attribute name13 and each ij is an OID. If c = list, the value v is an ordered list [i1, i2, ..., in] of OIDs of objects of the same type. A list is similar to a set except that the OIDs in a list are ordered, and hence we can refer to the first, second, or jth object in a list. For c = array, the state of the object is a single-dimensional array of object identifiers. The main difference between array and list is that a list can have an arbitrary number of elements whereas an array typically has a maximum size. The difference between set and bag14 is that all elements in a set must be distinct whereas a bag can have duplicate elements.
This model of objects allows arbitrary nesting of the set, list, tuple, and other constructors. The state of an object that is not of type atom will refer to other objects by their object identifiers. Hence, the only case where an actual value appears is in the state of an object of type atom.15
The type constructors set, list, array, and bag are called collection types (or bulk types), to distinguish them from basic types and tuple types. The main characteristic of a collection type is that the state of the object will be a collection of objects that may be unordered (such as a set or a bag) or ordered (such as a list or an array). The tuple type constructor is often called a structured type, since it corresponds to the struct construct in the C and C++ programming languages.
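Before the worked example below, here is a small executable rendering of the (i, c, v) model; it is our own sketch, with OIDs written as strings for readability. Every object is a triple of OID, constructor, and state, and any non-atomic state refers to other objects only through their OIDs.

    objects = {}                                  # OID -> (constructor, state)

    def new_object(oid, constructor, state):
        objects[oid] = (constructor, state)
        return oid

    new_object("i1", "atom", "Houston")
    new_object("i2", "atom", "Bellaire")
    new_object("i5", "atom", "Research")
    new_object("i7", "set", {"i1", "i2"})         # unordered set of OIDs
    new_object("i8", "tuple", {"DNAME": "i5",     # attribute name -> OID
                               "LOCATIONS": "i7"})
    new_object("i9", "list", ["i2", "i1"])        # ordered list of OIDs

Example 1 below instantiates the same idea in the notation of the text.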
EXAMPLE 1: A Complex Object
We now represent some objects from the relational database shown in Figure 5.6, using the preceding model, where an object is defined by a triple (OID, type constructor, state) and the available type constructors are atom, set, and tuple. We use i1, i2, i3, ... to stand for unique system-generated object identifiers. Consider the following objects:
o1 = (i1, atom, 'Houston')
o2 = (i2, atom, 'Bellaire')
o3 = (i3, atom, 'Sugarland')
o4 = (i4, atom, 5)
o5 = (i5, atom, 'Research')
o6 = (i6, atom, '1988-05-22')
o7 = (i7, set, {i1, i2, i3})
o8 = (i8, tuple, <DNAME:i5, DNUMBER:i4, MGR:i9, LOCATIONS:i7, EMPLOYEES:i10, PROJECTS:i11>)
o9 = (i9, tuple, <MANAGER:i12, MANAGER_START_DATE:i6>)
o10 = (i10, set, {i12, i13, i14})
o11 = (i11, set, {i15, i16, i17})
o12 = (i12, tuple, <FNAME:i18, MINIT:i19, LNAME:i20, SSN:i21, ..., SALARY:i26, SUPERVISOR:i27, DEPT:i8>)

13. Also called an instance variable name in OO terminology.
14. Also called a multiset.
15. As we noted earlier, it is not practical to generate a unique system identifier for every value, so real systems allow for both OIDs and structured values, which can be structured by using the same type constructors as objects, except that a value does not have an OID.
The first six objects (o1 through o6) listed here represent atomic values. There will be many similar objects, one for each distinct constant atomic value in the database.16 Object o7 is a set-valued object that represents the set of locations for department 5; the set {i1, i2, i3} refers to the atomic objects with values {'Houston', 'Bellaire', 'Sugarland'}. Object o8 is a tuple-valued object that represents department 5 itself, and has the attributes DNAME, DNUMBER, MGR, LOCATIONS, and so on. The first two attributes DNAME and DNUMBER have atomic objects o5 and o4 as their values. The MGR attribute has a tuple object o9 as its value, which in turn has two attributes. The value of the MANAGER attribute is the object whose OID is i12, which represents the employee 'John B. Smith' who manages the department, whereas the value of MANAGER_START_DATE is another atomic object whose value is a date. The value of the EMPLOYEES attribute of o8 is a set object with OID = i10, whose value is the set of object identifiers for the employees who work for the DEPARTMENT (objects i12, plus i13 and i14, which are not shown). Similarly, the value of the PROJECTS attribute of o8 is a set object with OID = i11, whose value is the set of object identifiers for the projects that are controlled by department number 5 (objects i15, i16, and i17, which are not shown). The object whose OID = i12 represents the employee 'John B. Smith' with all its atomic attributes (FNAME, MINIT, LNAME, SSN, ..., SALARY, referencing the atomic objects i18, i19, i20, i21, ..., i26, respectively, which are not shown), plus SUPERVISOR, which references the employee object with OID = i27 (this represents 'James E. Borg', who supervises 'John B. Smith' but is not shown), and DEPT, which references the department object with OID = i8 (this represents department number 5, where 'John B. Smith' works).
In this model, an object can be represented as a graph structure that can be constructed by recursively applying the type constructors. The graph representing an object oi can be constructed by first creating a node for the object oi itself. The node for oi is labeled with the OID and the object constructor c. We also create a node in the graph for each basic atomic value. If an object oi has an atomic value, we draw a directed arc from the node representing oi to the node representing its basic value. If the object value is constructed, we draw directed arcs from the object node to a node that represents the constructed value. Figure 20.1 shows the graph for the example DEPARTMENT object o8 given earlier.

16. These atomic objects are the ones that may cause a problem, due to the use of too many object identifiers, if this model is implemented directly.
The preceding model permits two types of definitions in a comparison of the states of two objects for equality. Two objects are said to have identical states (deep equality) if the graphs representing their states are identical in every respect, including the OIDs at every level. Another, weaker definition of equality is when two objects have equal states (shallow equality). In this case, the graph structures must be the same, and all the corresponding atomic values in the graphs should also be the same. However, some corresponding internal nodes in the two graphs may have objects with different OIDs.
EXAMPLE 2: Identical Versus Equal Objects

An example can illustrate the difference between the two definitions for comparing object states for equality. Consider the following objects o1, o2, o3, o4, o5, and o6:
o1 = (i1, tuple, <a1:i4, a2:i6>)
o2 = (i2, tuple, <a1:i5, a2:i6>)
o3 = (i3, tuple, <a1:i4, a2:i6>)
o4 = (i4, atom, 10)
o5 = (i5, atom, 10)
o6 = (i6, atom, 20)
The objects o1 and o2 have equal states, since their states at the atomic level are the same but the values are reached through distinct objects o4 and o5. However, the states of objects o1 and o3 are identical, even though the objects themselves are not, because they have distinct OIDs. Similarly, although the states of o4 and o5 are identical, the actual objects o4 and o5 are equal but not identical, because they have distinct OIDs.
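These two notions can be made precise in code. The sketch below (ours, not from the text) stores the objects of Example 2 as OID -> (constructor, state) entries; identical states compare the stored states directly, OIDs included, while equal states recursively compare structure and atomic values only.

    objects = {
        "i1": ("tuple", {"a1": "i4", "a2": "i6"}),
        "i2": ("tuple", {"a1": "i5", "a2": "i6"}),
        "i3": ("tuple", {"a1": "i4", "a2": "i6"}),
        "i4": ("atom", 10),
        "i5": ("atom", 10),
        "i6": ("atom", 20),
    }

    def identical_states(x, y):
        # Deep equality: the state graphs match in every respect, OIDs included.
        return objects[x] == objects[y]

    def equal_states(x, y):
        # Shallow equality: same structure and atomic values; internal OIDs
        # are allowed to differ.
        (cx, vx), (cy, vy) = objects[x], objects[y]
        if cx != cy:
            return False
        if cx == "atom":
            return vx == vy
        if cx == "tuple":
            return vx.keys() == vy.keys() and all(
                equal_states(vx[a], vy[a]) for a in vx)
        return False   # set/list/array/bag omitted in this sketch

    print(identical_states("i1", "i3"))   # True: same state, OIDs and all
    print(identical_states("i1", "i2"))   # False: a1 refers to i4 vs. i5
    print(equal_states("i1", "i2"))       # True: both reach values 10 and 20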
20.2.3 Type Constructors
An object definition language (ODL)17 that incorporates the preceding type constructors can be used to define the object types for a particular database application. In Chapter 21, we shall describe the standard ODL of ODMG, but we first introduce the concepts gradually in this section using a simpler notation. The type constructors can be used to define the data structures for an OO database schema. In Section 20.3 we will see how to incorporate the definition of operations (or methods) into the OO schema. Figure 20.2 shows how we may declare Employee and Department types corresponding to the object instances shown in Figure 20.1. In Figure 20.2, the Date type is defined as a tuple rather than an atomic value as in Figure 20.1. We use the keywords tuple, set, and list for the type constructors, and the available standard data types (integer, string, float, and so on) for atomic types.

17. This would correspond to the DDL (Data Definition Language) of the database system (see Chapter 2).

FIGURE 20.1 Representation of a DEPARTMENT complex object as a graph.
FIGURE 20.2 Specifying the object types Employee, Date, and Department using type constructors.
Attributes that refer to other objects, such as dept of Employee or projects of Department, are basically references to other objects and hence serve to represent relationships among the object types. For example, the attribute dept of Employee is of type Department, and hence is used to refer to a specific Department object (where the Employee works). The value of such an attribute would be an OID for a specific Department object. A binary relationship can be represented in one direction, or it can have an inverse reference. The latter representation makes it easy to traverse the relationship in both directions. For example, the attribute employees of Department has as its value a set of references (that is, a set of OIDs) to objects of type Employee; these are the employees who work for the department. The inverse is the reference attribute dept of Employee. We will see in Chapter 21 how the ODMG standard allows inverses to be explicitly declared as relationship attributes to ensure that inverse references are consistent.
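A sketch of such a pair of inverse references follows (our own rendering, with direct Python references standing in for OIDs): each side stores a reference to the other, and the two updates are performed together so that the inverses stay consistent.

    class Department:
        def __init__(self, dname):
            self.dname = dname
            self.employees = set()   # set of references to Employee objects

    class Employee:
        def __init__(self, name):
            self.name = name
            self.dept = None         # single reference to a Department

    def hire(emp, dept):
        emp.dept = dept              # one direction of the relationship ...
        dept.employees.add(emp)      # ... and its inverse, kept consistent

    research = Department("Research")
    smith = Employee("John B. Smith")
    hire(smith, research)
    print(smith.dept.dname)                      # traverse in one direction
    print({e.name for e in research.employees})  # and in the other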
20.3 ENCAPSULATION OF OPERATIONS, METHODS, AND PERSISTENCE
The concept of encapsulation is one of the main characteristics of OO languages and systems. It is also related to the concepts of abstract data types and information hiding in programming languages. In traditional database models and systems, this concept was not
applied, since it is customary to make the structure of database objects visible to users and external programs. In these traditional models, a number of standard database operations are applicable to objects of all types. For example, in the relational model, the operations for selecting, inserting, deleting, and modifying tuples are generic and may be applied to any relation in the database. The relation and its attributes are visible to users and to external programs that access the relation by using these operations.
20.3.1 Specifying Object Behavior via Class Operations
The concepts of information hiding and encapsulation can be applied to database objects. The main idea is to define the behavior of a type of object based on the operations that can be externally applied to objects of that type. The internal structure of the object is hidden, and the object is accessible only through a number of predefined operations. Some operations may be used to create (insert) or destroy (delete) objects; other operations may update the object state; and others may be used to retrieve parts of the object state or to apply some calculations. Still other operations may perform a combination of retrieval, calculation, and update. In general, the implementation of an operation can be specified in a general-purpose programming language that provides flexibility and power in defining the operations.
The external users of the object are only made aware of the interface of the object type, which defines the name and arguments (parameters) of each operation. The implementation is hidden from the external users; it includes the definition of the internal data structures of the object and the implementation of the operations that access these structures. In OO terminology, the interface part of each operation is called the signature, and the operation implementation is called a method. Typically, a method is invoked by sending a message to the object to execute the corresponding method. Notice that, as part of executing a method, a subsequent message to another object may be sent, and this mechanism may be used to return values from the objects to the external environment or to other objects.
For database applications, the requirement that all objects be completely encapsulated is too stringent. One way of relaxing this requirement is to divide the structure of an object into visible and hidden attributes (instance variables). Visible attributes may be directly accessed for reading by external operators, or by a high-level query language. The hidden attributes of an object are completely encapsulated and can be accessed only through predefined operations. Most OODBMSs employ high-level query languages for accessing visible attributes. In Chapter 21, we will describe the OQL query language that is proposed as a standard query language for OODBs.
In most cases, operations that update the state of an object are encapsulated. This is a way of defining the update semantics of the objects, given that in many OO data models, few integrity constraints are predefined in the schema. Each type of object has its integrity constraints programmed into the methods that create, delete, and update the objects by explicitly writing code to check for constraint violations and to handle exceptions. In such cases, all update operations are implemented by encapsulated operations. More recently, the ODL for the ODMG standard allows the specification of some common
constraints such as keys and inverse relationships (referential integrity) so that the system can automatically enforce these constraints (see Chapter 21).
The term class is often used to refer to an object type definition, along with the definitions of the operations for that type.18 Figure 20.3 shows how the type definitions of Figure 20.2 may be extended with operations to define classes. A number of operations are declared for each class, and the signature (interface) of each operation is included in the class definition. A method (implementation) for each operation must be defined elsewhere, using a programming language. Typical operations include the object constructor operation, which is used to create a new object, and the destructor operation, which is used to destroy an object. A number of object modifier operations can be declared to modify the states (values) of various attributes of an object.
define class Employee:
    type tuple( fname: string;
                ... );
    ...

define class Department:
    type tuple( dname: string;
                ... );
    operations
        assign_emp(e: Employee): boolean;
        (* adds an employee to the department *)
        remove_emp(e: Employee): boolean;
        (* removes an employee from the department *)
end Department;

FIGURE 20.3 Adding operations to the definitions of Employee and Department.
18. This definition of class is similar to how it is used in the popular C++ programming language. The ODMG standard uses the word interface in addition to class (see Chapter 21). In the EER model, the term class was used to refer to an object type, along with the set of all objects of that type (see Chapter 4).