
Chapter 18 Concurrency Control Techniques (conclusion of the Selected Bibliography)

... and Bassiouni (1988). Papadimitriou and Kanellakis (1979) and Bernstein and Goodman (1983) discuss multiversion techniques. Multiversion timestamp ordering was proposed in Reed (1978, 1983), and multiversion two-phase locking is discussed in Lai and Wilkinson (1984). A method for multiple locking granularities was proposed in Gray et al. (1975), and the effects of locking granularities are analyzed in Ries and Stonebraker (1977). Bhargava and Reidl (1988) presents an approach for dynamically choosing among various concurrency control and recovery methods. Concurrency control methods for indexes are presented in Lehman and Yao (1981) and in Shasha and Goodman (1988). A performance study of various B+-tree concurrency control algorithms is presented in Srinivasan and Carey (1991).

Other recent work on concurrency control includes semantic-based concurrency control (Badrinath and Ramamritham, 1992), transaction models for long-running activities (Dayal et al., 1991), and multilevel transaction management (Hasse and Weikum, 1991).

Chapter 19 Database Recovery Techniques

In this chapter we discuss some of the techniques that can be used for database recovery from failures. We have already discussed the different causes of failure, such as system crashes and transaction errors, in Section 17.1.4. We have also covered many of the concepts that are used by recovery processes, such as the system log and commit points, in Section 17.2.

We start Section 19.1 with an outline of typical recovery procedures and a categorization of recovery algorithms, and then discuss several recovery concepts, including write-ahead logging, in-place versus shadow updates, and the process of rolling back (undoing) the effect of an incomplete or failed transaction. In Section 19.2, we present recovery techniques based on deferred update, also known as the NO-UNDO/REDO technique. In Section 19.3, we discuss recovery techniques based on immediate update; these include the UNDO/REDO and UNDO/NO-REDO algorithms. We discuss the technique known as shadowing or shadow paging, which can be categorized as a NO-UNDO/NO-REDO algorithm, in Section 19.4. An example of a practical DBMS recovery scheme, called ARIES, is presented in Section 19.5. Recovery in multidatabases is briefly discussed in Section 19.6. Finally, techniques for recovery from catastrophic failure are discussed in Section 19.7.

Our emphasis is on conceptually describing several different approaches to recovery. For descriptions of recovery features in specific systems, the reader should consult the bibliographic notes and the user manuals for those systems. Recovery techniques are often intertwined with the concurrency control mechanisms; certain recovery techniques are best used with specific concurrency control methods. We will attempt to discuss recovery concepts independently of concurrency control mechanisms, but we will discuss the circumstances under which a particular recovery mechanism is best used with a certain concurrency control protocol.

19.1 RECOVERY CONCEPTS

1. If there is extensive damage to a wide portion of the database due to catastrophic failure, such as a disk crash, the recovery method restores a past copy of the database that was backed up to archival storage (typically tape) and reconstructs a more current state by reapplying or redoing the operations of committed transactions from the backed-up log, up to the time of failure.

2. When the database is not physically damaged but has become inconsistent due to noncatastrophic failures of types 1 through 4 of Section 17.1.4, the strategy is to reverse any changes that caused the inconsistency by undoing some operations. It may also be necessary to redo some operations in order to restore a consistent state of the database, as we shall see. In this case we do not need a complete archival copy of the database. Rather, the entries kept in the online system log are consulted during recovery.

Conceptually, we can distinguish two main techniques for recovery from noncatastrophic transaction failures: (1) deferred update and (2) immediate update. The deferred update techniques do not physically update the database on disk until after a transaction reaches its commit point; then the updates are recorded in the database. Before reaching commit, all transaction updates are recorded in the local transaction workspace (or buffers). During commit, the updates are first recorded persistently in the log and then written to the database. If a transaction fails before reaching its commit point, it will not have changed the database in any way, so UNDO is not needed. It may be necessary to REDO the effect of the operations of a committed transaction from the log, because their effect may not yet have been recorded in the database. Hence, deferred update is also known as the NO-UNDO/REDO algorithm. We discuss this technique in Section 19.2.

In the immediate update techniques, the database may be updated by some operations of a transaction before the transaction reaches its commit point. However, these operations are typically recorded in the log on disk by force writing before they are applied to the database, making recovery still possible. If a transaction fails after recording some changes in the database but before reaching its commit point, the effect of its operations on the database must be undone; that is, the transaction must be rolled back. In the general case of immediate update, both undo and redo may be required during recovery. This technique, known as the UNDO/REDO algorithm, requires both operations, and is used most often in practice. A variation of the algorithm where all updates are recorded in the database before a transaction commits requires undo only, so it is known as the UNDO/NO-REDO algorithm. We discuss these techniques in Section 19.3.

19.1.2 Caching (Buffering) of Disk Blocks

The recovery process is often closely intertwined with operating system functions, in particular, the buffering and caching of disk pages in main memory. Typically, one or more disk pages that include the data items to be updated are cached into main memory buffers and then updated in memory before being written back to disk. The caching of disk pages is traditionally an operating system function, but because of its importance to the efficiency of recovery procedures, it is handled by the DBMS by calling low-level operating system routines.

In general, it is convenient to consider recovery in terms of the database disk pages (blocks). Typically a collection of in-memory buffers, called the DBMS cache, is kept under the control of the DBMS for the purpose of holding these buffers. A directory for the cache is used to keep track of which database items are in the buffers.¹ This can be a table of <disk page address, buffer location> entries. When the DBMS requests action on some item, it first checks the cache directory to determine whether the disk page containing the item is in the cache. If it is not, then the item must be located on disk, and the appropriate disk pages are copied into the cache. It may be necessary to replace (or flush) some of the cache buffers to make space available for the new item. Some page-replacement strategy from operating systems, such as least recently used (LRU) or first-in-first-out (FIFO), can be used to select the buffers for replacement.

Associated with each buffer in the cache is a dirty bit, which can be included in the directory entry, to indicate whether or not the buffer has been modified. When a page is first read from the database disk into a cache buffer, the cache directory is updated with the new disk page address, and the dirty bit is set to 0 (zero). As soon as the buffer is modified, the dirty bit for the corresponding directory entry is set to 1 (one). When the buffer contents are replaced (flushed) from the cache, the contents must first be written back to the corresponding disk page only if its dirty bit is 1. Another bit, called the pin-unpin bit, is also needed: a page in the cache is pinned (bit value 1) if it cannot be written back to disk as yet.
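To make this bookkeeping concrete, here is a minimal Python sketch of a cache directory whose entries carry a dirty bit and a pin-unpin bit. It is illustrative only; all class and method names are invented, and a real DBMS buffer manager is far more elaborate.

# Minimal sketch of a DBMS cache directory with dirty and pin-unpin bits.
# All names are hypothetical; this is not any real system's interface.

class CacheEntry:
    def __init__(self, page_addr, contents):
        self.page_addr = page_addr   # disk page address
        self.contents = contents     # cached copy of the page
        self.dirty = 0               # 0 when first read in; 1 once modified
        self.pinned = False          # pinned pages cannot be flushed yet

class DBMSCache:
    def __init__(self, disk, capacity):
        self.disk = disk             # dict: page_addr -> page contents
        self.capacity = capacity
        self.directory = {}          # <disk page address, buffer entry>

    def fetch(self, page_addr):
        """Return the cached page, reading it from disk on a miss."""
        if page_addr not in self.directory:
            if len(self.directory) >= self.capacity:
                self._replace_one()
            # On first read, the dirty bit is set to 0 (zero).
            self.directory[page_addr] = CacheEntry(page_addr, self.disk[page_addr])
        return self.directory[page_addr]

    def update(self, page_addr, new_contents):
        entry = self.fetch(page_addr)
        entry.contents = new_contents
        entry.dirty = 1              # buffer modified: set its dirty bit to 1

    def _replace_one(self):
        # FIFO-like victim selection over unpinned buffers only.
        for addr in list(self.directory):
            entry = self.directory[addr]
            if not entry.pinned:
                if entry.dirty == 1:          # write back only if modified
                    self.disk[addr] = entry.contents
                del self.directory[addr]
                return
        raise RuntimeError("all buffers pinned; cannot replace")

disk = {"P1": "old-P1", "P2": "old-P2"}
cache = DBMSCache(disk, capacity=1)
cache.update("P1", "new-P1")
cache.fetch("P2")                    # forces replacement; dirty P1 is flushed
assert disk["P1"] == "new-P1"

The flush-on-replacement step writes a buffer back to its disk page only when its dirty bit is 1, and a pinned buffer is never chosen as a replacement victim, mirroring the rules described above.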

Two main strategies can be employed when flushing a modified buffer back to disk. The first strategy, known as in-place updating, writes the buffer back to the same original disk location, thus overwriting the old value of any changed data items on disk.² Hence, a single copy of each database disk block is maintained. The second strategy, known as shadowing, writes an updated buffer at a different disk location, so multiple versions of data items can be maintained.

1. This is somewhat similar to the concept of page tables used by the operating system.

2. In-place updating is used in most systems in practice.

In general, the old value of the data item before updating is called the before image (BFIM), and the new value after updating is called the after image (AFIM). In shadowing, both the BFIM and the AFIM can be kept on disk; hence, it is not strictly necessary to maintain a log for recovering. We briefly discuss recovery based on shadowing in Section 19.4.

19.1.3 Write-Ahead Logging, Steal/No-Steal, and Force/No-Force

When in-place updating is used, we need to distinguish between two types of log entry information included for a write command: (1) the information needed for UNDO and (2) that needed for REDO. A REDO-type log entry includes the new value (AFIM) of the item written by the operation, since this is needed to redo the effect of the operation from the log (by setting the item value in the database to its AFIM). The UNDO-type log entries include the old value (BFIM) of the item, since this is needed to undo the effect of the operation from the log (by setting the item value in the database back to its BFIM). In an UNDO/REDO algorithm, both types of log entries are combined. In addition, when cascading rollback is possible, read_item entries in the log are considered to be UNDO-type entries (see Section 19.1.5).

As mentioned, the DBMS cache holds the cached database disk blocks, which include not only data blocks but also index blocks and log blocks from the disk. When a log record is written, it is stored in the current log block in the DBMS cache. The log is simply a sequential (append-only) disk file, and the DBMS cache may contain several log blocks (for example, the last n log blocks) that will be written to disk. When an update to a data block stored in the DBMS cache is made, an associated log record is written to the last log block in the DBMS cache. With the write-ahead logging approach, the log blocks that contain the associated log records for a particular data block update must first be written to disk before the data block itself can be written back to disk.

Standard DBMS recovery terminology includes the terms steal/no-steal and force/no-force, which specify when a page from the database can be written to disk from the cache:

1. If a cache page updated by a transaction cannot be written to disk before the transaction commits, this is called a no-steal approach. The pin-unpin bit indicates if a page cannot be written back to disk. Otherwise, if the protocol allows writing an updated buffer before the transaction commits, it is called steal. Steal is used when the DBMS cache (buffer) manager needs a buffer frame for another transaction and the buffer manager replaces an existing page that had been updated but whose transaction has not committed.

2. If all pages updated by a transaction are immediately written to disk when the transaction commits, this is called a force approach. Otherwise, it is called no-force.

The deferred update recovery scheme in Section 19.2 follows a no-steal approach. However, typical database systems employ a steal/no-force strategy. The advantage of steal is that it avoids the need for a very large buffer space to store all updated pages in memory. The advantage of no-force is that an updated page of a committed transaction may still be in the buffer when another transaction needs to update it, thus eliminating the I/O cost to read that page again from disk. This may provide a substantial saving in the number of I/O operations when a specific page is updated heavily by multiple transactions.

To permit recovery when in-place updating is used, the appropriate entries required for recovery must be permanently recorded in the log on disk before changes are applied to the database. For example, consider the following write-ahead logging (WAL) protocol for a recovery algorithm that requires both UNDO and REDO:

1. The before image of an item cannot be overwritten by its after image in the database on disk until all UNDO-type log records for the updating transaction, up to this point in time, have been force-written to disk.

2. The commit operation of a transaction cannot be completed until all the REDO-type and UNDO-type log records for that transaction have been force-written to disk.
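The following Python fragment is a schematic rendering of rule 1 (names and structures are invented, not part of any real system): before a buffer's before image is overwritten on disk, the log is force-written up to the page's most recent UNDO-type record.

# Schematic write-ahead logging check: a dirty buffer may overwrite its
# before image on disk only after the relevant log records reach disk.

class Log:
    def __init__(self):
        self.records = []        # log records appended in memory
        self.flushed_upto = 0    # number of records force-written to disk

    def append(self, record):
        self.records.append(record)
        return len(self.records)           # 1-based position of this record

    def force(self, upto):
        """Force-write all log records up to position 'upto' to disk."""
        self.flushed_upto = max(self.flushed_upto, upto)

class BufferManager:
    def __init__(self, log):
        self.log = log
        self.last_log_pos = {}   # page -> position of its latest log record

    def update_page(self, page, txn, old, new):
        pos = self.log.append(("write_item", txn, page, old, new))
        self.last_log_pos[page] = pos      # log record written BEFORE any flush

    def flush_page(self, page):
        # WAL rule 1: force the log up to this page's last record first.
        need = self.last_log_pos.get(page, 0)
        if self.log.flushed_upto < need:
            self.log.force(need)
        # ... now it is safe to overwrite the before image on disk ...

log = Log()
bm = BufferManager(log)
bm.update_page("P1", "T1", old=30, new=35)
bm.flush_page("P1")
assert log.flushed_upto == 1   # the UNDO/REDO record reached disk first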

To facilitate the recovery process, the DBMS recovery subsystem may need to maintain a number of lists related to the transactions being processed in the system. These include a list of active transactions that have started but not committed as yet, and possibly lists of all committed and aborted transactions since the last checkpoint (see the next section). Maintaining these lists makes the recovery process more efficient.

19.1.4 Checkpoints in the System Log and Fuzzy Checkpointing

Another type of entry in the log is called a checkpoint.³ A [checkpoint] record is written into the log periodically at the point when the system writes out to the database on disk all DBMS buffers that have been modified. As a consequence of this, all transactions that have their [commit, T] entries in the log before a [checkpoint] entry do not need to have their WRITE operations redone in case of a system crash, since all their updates will be recorded in the database on disk during checkpointing.

The recovery manager of a DBMS must decide at what intervals to take a checkpoint. The interval may be measured in time (say, every m minutes) or in the number t of committed transactions since the last checkpoint, where the values of m or t are system parameters. Taking a checkpoint consists of the following actions:

1. Suspend execution of transactions temporarily.

2. Force-write all main memory buffers that have been modified to disk.

3. Write a [checkpoint] record to the log, and force-write the log to disk.

4. Resume executing transactions.

3. The term checkpoint has been used to describe more restrictive situations in some systems, such as DB2. It has also been used in the literature to describe entirely different concepts.

As a consequence of step 2, a checkpoint record in the log may also include additional information, such as a list of active transaction ids, and the locations (addresses) of the first and most recent (last) records in the log for each active transaction. This can facilitate undoing transaction operations in the event that a transaction must be rolled back.

The time needed to force-write all modified memory buffers may delay transaction processing because of step 1. To reduce this delay, it is common to use a technique called fuzzy checkpointing in practice. In this technique, the system can resume transaction processing after the [checkpoint] record is written to the log without having to wait for step 2 to finish. However, until step 2 is completed, the previous [checkpoint] record should remain valid. To accomplish this, the system maintains a pointer to the valid checkpoint, which continues to point to the previous [checkpoint] record in the log. Once step 2 is concluded, that pointer is changed to point to the new checkpoint in the log.
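As a rough illustration, the sketch below (hypothetical names, heavily simplified) models the fuzzy variant: the new [checkpoint] record is written first, and the pointer to the valid checkpoint is advanced only after the modified buffers have been flushed.

# Simplified fuzzy checkpointing: transactions may resume as soon as the
# [checkpoint] record is logged; the "valid checkpoint" pointer moves
# only after step 2 (flushing modified buffers) completes.

class RecoveryManager:
    def __init__(self):
        self.log = []
        self.valid_checkpoint = None   # log position of last valid checkpoint

    def fuzzy_checkpoint(self, dirty_buffers, flush):
        self.log.append("[checkpoint]")
        new_ckpt = len(self.log) - 1   # position of the new record
        # Transaction processing may resume here, before flushing is done;
        # until then, the previous checkpoint remains the valid one.
        for buf in dirty_buffers:      # step 2 proceeds in the background
            flush(buf)
        # Only once step 2 is complete does the pointer advance.
        self.valid_checkpoint = new_ckpt

rm = RecoveryManager()
rm.fuzzy_checkpoint(["B1", "B2"], flush=lambda b: None)
assert rm.log[rm.valid_checkpoint] == "[checkpoint]"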

19.1.5 Transaction Rollback

If a transaction fails for whatever reason after updating the database, it may be necessary to roll back the transaction. If any data item values have been changed by the transaction and written to the database, they must be restored to their previous values (BFIMs). The UNDO-type log entries are used to restore the old values of data items that must be rolled back.

If a transaction T is rolled back, any transaction S that has, in the interim, read the value of some data item X written by T must also be rolled back. Similarly, once S is rolled back, any transaction R that has read the value of some data item Y written by S must also be rolled back; and so on. This phenomenon is called cascading rollback, and can occur when the recovery protocol ensures recoverable schedules but does not ensure strict or cascadeless schedules (see Section 17.4.2). Cascading rollback, understandably, can be quite complex and time-consuming. That is why almost all recovery mechanisms are designed such that cascading rollback is never required.

Figure 19.1 shows an example where cascading rollback is required. The read and write operations of three individual transactions are shown in Figure 19.1a. Figure 19.1b shows the system log at the point of a system crash for a particular execution schedule of these transactions. The values of data items A, B, C, and D, which are used by the transactions, are shown to the right of the system log entries. We assume that the original item values, shown in the first line, are A = 30, B = 15, C = 40, and D = 20. At the point of system failure, transaction T3 has not reached its conclusion and must be rolled back. The WRITE operations of T3, marked by a single * in Figure 19.1b, are the T3 operations that are undone during transaction rollback. Figure 19.1c graphically shows the operations of the different transactions along the time axis.

We must now check for cascading rollback. From Figure 19.1c we see that transaction T2 reads the value of item B that was written by transaction T3; this can also be determined by examining the log. Because T3 is rolled back, T2 must now be rolled back, too. The WRITE operations of T2, marked by ** in the log, are the ones that are undone. Note that only write_item operations need to be undone during transaction rollback; read_item operations are recorded in the log only to determine whether cascading rollback of additional transactions is necessary.

FIGURE 19.1 Illustrating cascading rollback (a process that never occurs in strict or cascadeless schedules). (a) The read and write operations of three transactions. (b) System log at point of crash. (c) Operations before the crash. [Figure not reproduced; its annotations: *T3 is rolled back because it did not reach its commit point. **T2 is rolled back because it reads the value of item B written by T3.]

In practice, cascading rollback of transactions is never required because practical recovery methods guarantee cascadeless or strict schedules. Hence, there is also no need to record any read_item operations in the log, because these are needed only for determining cascading rollback.

19.2 RECOVERY TECHNIQUES BASED ON DEFERRED UPDATE

The idea behind deferred update techniques is to defer or postpone any actual updates to the database until the transaction completes its execution successfully and reaches its commit point.⁴ During transaction execution, the updates are recorded only in the log and in the cache buffers. After the transaction reaches its commit point and the log is force-written to disk, the updates are recorded in the database. If a transaction fails before reaching its commit point, there is no need to undo any operations, because the transaction has not affected the database on disk in any way. Although this may simplify recovery, it cannot be used in practice unless transactions are short and each transaction changes few items. For other types of transactions, there is the potential for running out of buffer space because transaction changes must be held in the cache buffers until the commit point.

We can state a typical deferred update protocol as follows:

1. A transaction cannot change the database on disk until it reaches its commit point.

2. A transaction does not reach its commit point until all its update operations are recorded in the log and the log is force-written to disk.

Notice that step 2 of this protocol is a restatement of the write-ahead logging (WAL) protocol. Because the database is never updated on disk until after the transaction commits, there is never a need to UNDO any operations. Hence, this is known as the NO-UNDO/REDO recovery algorithm. REDO is needed in case the system fails after a transaction commits but before all its changes are recorded in the database on disk. In this case, the transaction operations are redone from the log entries.

Usually, the method of recovery from failure is closely related to the concurrency control method in multiuser systems. First we discuss recovery in single-user systems, where no concurrency control is needed, so that we can understand the recovery process independently of any concurrency control method. We then discuss how concurrency control may affect the recovery process.

19.2.1 Recovery Using Deferred Update in a Single-User Environment

In such an environment, the recovery algorithm can be rather simple. The algorithm RDU_S (Recovery using Deferred Update in a Single-user environment) uses a REDO procedure, given subsequently, for redoing certain write_item operations; it works as follows:

PROCEDURE RDU_S: Use two lists of transactions: the committed transactions since the last checkpoint, and the active transactions (at most one transaction will fall in this category, because the system is single-user). Apply the REDO operation to all the write_item operations of the committed transactions from the log in the order in which they were written to the log. Restart the active transactions.

4. Hence deferred update can generally be characterized as a no-steal approach.

The REDO procedure is defined as follows:

REDO(WRITE_OP): Redoing a write_item operation WRITE_OP consists of examining its log entry [write_item, T, X, new_value] and setting the value of item X in the database to new_value, which is the after image (AFIM).

The REDO operation is required to be idempotent, that is, executing it over and over is equivalent to executing it just once. In fact, the whole recovery process should be idempotent. This is so because, if the system were to fail during the recovery process, the next recovery attempt might REDO certain write_item operations that had already been redone during the first recovery process. The result of recovery from a system crash during recovery should be the same as the result of recovering when there is no crash during recovery!
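The idempotence requirement is easy to see in code. In the small sketch below (which assumes the [write_item, T, X, new_value] log entry format used in this section, with everything else invented), REDO simply installs the after image, so applying it twice leaves the database exactly as applying it once does.

# REDO installs the after image (AFIM); running it repeatedly has the
# same effect as running it once, which makes recovery restartable.

def redo(database, log_entry):
    op, txn, item, new_value = log_entry
    assert op == "write_item"
    database[item] = new_value     # set item X to its AFIM

db = {"D": 15}
entry = ("write_item", "T1", "D", 20)
redo(db, entry)
redo(db, entry)                    # a crash during recovery reruns REDO
assert db == {"D": 20}             # result identical to a single pass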

Notice that the only transaction in the active list will have had no effect on the database because of the deferred update protocol, and it is ignored completely by the recovery process because none of its operations were reflected in the database on disk. However, this transaction must now be restarted, either automatically by the recovery process or manually by the user.

Figure 19.2 shows an example of recovery in a single-user environment, where the first failure occurs during execution of transaction T2, as shown in Figure 19.2b. The recovery process will redo the [write_item, T1, D, 20] entry in the log by resetting the value of item D to 20 (its new value). The [write, T2, ...] entries in the log are ignored by the recovery process because T2 is not committed. If a second failure occurs during recovery from the first failure, the same recovery process is repeated from start to finish, with identical results.

FIGURE 19.2 An example of recovery using deferred update in a single-user environment. (a) The READ and WRITE operations of two transactions. (b) The system log at the point of crash. [Figure not reproduced; its annotations: the log ends with [write_item, T2, D, 25] at the system crash; the [write_item, ...] operations of T1 are redone, and the T2 log entries are ignored by the recovery process.]

19.2.2 Deferred Update with Concurrent Execution in a Multiuser Environment

For multiuser systems with concurrency control, the recovery process may be more complex, depending on the protocols used for concurrency control. In many cases, the concurrency control and recovery processes are interrelated. In general, the greater the degree of concurrency we wish to achieve, the more time-consuming the task of recovery becomes.

Consider a system in which concurrency control uses strict two-phase locking, so the locks on items remain in effect until the transaction reaches its commit point. After that, the locks can be released. This ensures strict and serializable schedules. Assuming that [checkpoint] entries are included in the log, a possible recovery algorithm for this case, which we call RDU_M (Recovery using Deferred Update in a Multiuser environment), is given next. This procedure uses the REDO procedure defined earlier.

PROCEDURE RDU_M (WITH CHECKPOINTS): Use two lists of transactions maintained by the system: the committed transactions T since the last checkpoint (commit list), and the active transactions T' (active list). REDO all the WRITE operations of the committed transactions from the log, in the order in which they were written into the log. The transactions that are active and did not commit are effectively canceled and must be resubmitted.

Figure 19.3 shows a possible schedule of executing transactions. When the checkpoint was taken at time t1, transaction T1 had committed, whereas transactions T3 and T4 had not. Before the system crash at time t2, T3 and T2 were committed but not T4 and T5. According to the RDU_M method, there is no need to redo the write_item operations of transaction T1, or of any transactions committed before the last checkpoint time t1. The write_item operations of T2 and T3 must be redone, however, because both transactions reached their commit points after the last checkpoint. Recall that the log is force-written before committing a transaction. Transactions T4 and T5 are ignored: They are effectively canceled or rolled back because none of their write_item operations were recorded in the database under the deferred update protocol. We will refer to Figure 19.3 later to illustrate other recovery protocols.

We can make the NO-UNDO/REDO recovery algorithm more efficient by noting that, if a data item X has been updated, as indicated in the log entries, more than once by committed transactions since the last checkpoint, it is only necessary to REDO the last update of X from the log during recovery. The other updates would be overwritten by this last REDO in any case. In this case, we start from the end of the log; then, whenever an item is redone, it is added to a list of redone items. Before REDO is applied to an item, the list is checked; if the item appears on the list, it is not redone again, since its last value has already been recovered.
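A small sketch of this optimization follows (helper names are invented): scanning the log backward while remembering which items have already been redone guarantees that each item is set exactly once, to its final committed value.

# NO-UNDO/REDO optimization: scan the log backwards and REDO only the
# last committed update of each item since the last checkpoint.

def redo_backwards(database, log, committed):
    redone = set()                     # items whose last value is restored
    for op, txn, item, new_value in reversed(log):
        if op != "write_item" or txn not in committed:
            continue                   # ignore uncommitted transactions
        if item in redone:
            continue                   # earlier updates would be overwritten
        database[item] = new_value     # install the final committed AFIM
        redone.add(item)

log = [
    ("write_item", "T2", "B", 12),
    ("write_item", "T3", "A", 30),
    ("write_item", "T2", "B", 18),     # only this update of B is redone
]
db = {"A": 0, "B": 0}
redo_backwards(db, log, committed={"T2", "T3"})
assert db == {"A": 30, "B": 18}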

If a transaction is aborted for any reason (say, by the deadlock detection method), it is simply resubmitted, since it has not changed the database on disk. A drawback of the method described here is that it limits the concurrent execution of transactions because all items remain locked until the transaction reaches its commit point. In addition, it may require excessive buffer space to hold all updated items until the transactions commit. The method's main benefit is that transaction operations never need to be undone, for two reasons:

1. A transaction does not record any changes in the database on disk until after it reaches its commit point, that is, until it completes its execution successfully. Hence, a transaction is never rolled back because of failure during transaction execution.

2. A transaction will never read the value of an item that is written by an uncommitted transaction, because items remain locked until a transaction reaches its commit point. Hence, no cascading rollback will occur.

Figure 19.4 shows an example of recovery for a multiuser system that utilizes the recovery and concurrency control method just described.

19.2.3 Transaction Actions That Do Not Affect the Database

In general, a transaction will have actions that do not affect the database, such as generating and printing messages or reports from information retrieved from the database. If a transaction fails before completion, we may not want the user to get these reports, since the transaction has failed to complete. If such erroneous reports are produced, part of the recovery process would have to inform the user that these reports are wrong, since the user may take an action based on these reports that affects the database. Hence, such reports should be generated only after the transaction reaches its commit point. A common method of dealing with such actions is to issue the commands that generate the reports but keep them as batch jobs, which are executed only after the transaction reaches its commit point. If the transaction fails, the batch jobs are canceled.

FIGURE 19.4 An example of recovery using deferred update with concurrent transactions. (a) The READ and WRITE operations of four transactions. (b) System log at the point of crash. [Figure not reproduced; its annotations: the log ends with [write_item, T2, D, 25] at the system crash; T2 and T3 are ignored because they did not reach their commit points, and the transaction that committed after the last system checkpoint is redone.]

19.3 RECOVERY TECHNIQUES BASED ON IMMEDIATE UPDATE

In these techniques, when a transaction issues an update command, the database can be updated "immediately," without any need to wait for the transaction to reach its commit point. In these techniques, however, an update operation must still be recorded in the log (on disk) before it is applied to the database, using the write-ahead logging protocol, so that we can recover in case of failure.

Provisions must be made for undoing the effect of update operations that have been applied to the database by a failed transaction. This is accomplished by rolling back the transaction and undoing the effect of the transaction's write_item operations. Theoretically, we can distinguish two main categories of immediate update algorithms. If the recovery technique ensures that all updates of a transaction are recorded in the database on disk before the transaction commits, there is never a need to REDO any operations of committed transactions. This is called the UNDO/NO-REDO recovery algorithm. On the other hand, if the transaction is allowed to commit before all its changes are written to the database, we have the most general case, known as the UNDO/REDO recovery algorithm. This is also the most complex technique. Next, we discuss two examples of UNDO/REDO algorithms and leave it as an exercise for the reader to develop the UNDO/NO-REDO variation. In Section 19.5, we describe a more practical approach known as the ARIES recovery technique.

19.3.1 UNDO/REDO Recovery Based on Immediate Update in a Single-User Environment

In a single-user system, if a failure occurs, the executing (active) transaction at the time of failure may have recorded some changes in the database. The effect of all such operations must be undone. The recovery algorithm RIU_S (Recovery using Immediate Update in a Single-user environment) uses the REDO procedure defined earlier, as well as the UNDO procedure defined below.

PROCEDURE RIU_S

1. Use two lists of transactions maintained by the system: the committed transactions since the last checkpoint and the active transactions (at most one transaction will fall in this category, because the system is single-user).

2. Undo all the write_item operations of the active transaction from the log, using the UNDO procedure described below.

3. Redo the write_item operations of the committed transactions from the log, in the order in which they were written in the log, using the REDO procedure described earlier.

The UNDO procedure is defined as follows:

UNDO(WRITE_OP): Undoing a write_item operation WRITE_OP consists of examining its log entry [write_item, T, X, old_value, new_value] and setting the value of item X in the database to old_value, which is the before image (BFIM). Undoing a number of write_item operations from one or more transactions from the log must proceed in the reverse order from the order in which the operations were written in the log.
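A minimal sketch of this rollback loop is shown below (the log layout follows the five-field UNDO-type entries above; everything else is invented): the failed transaction's writes are undone in the reverse of log order, restoring each before image.

# Rolling back a transaction: restore each BFIM, scanning the log in
# the REVERSE of the order in which the operations were written.

def undo_transaction(database, log, txn):
    for op, t, item, old_value, new_value in reversed(log):
        if op == "write_item" and t == txn:
            database[item] = old_value    # set item X back to its BFIM

log = [
    ("write_item", "T3", "B", 15, 12),
    ("write_item", "T3", "B", 12, 18),    # undone first (reverse order)
]
db = {"B": 18}
undo_transaction(db, log, "T3")
assert db["B"] == 15                      # original value restored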

19.3.2 UNDO/REDO Recovery Based on Immediate Update with Concurrent Execution

When concurrent execution is permitted, the recovery process again depends on the protocols used for concurrency control. The procedure RIU_M (Recovery using Immediate Updates for a Multiuser environment) outlines a recovery algorithm for concurrent transactions with immediate update. Assume that the log includes checkpoints and that the concurrency control protocol produces strict schedules, as, for example, the strict two-phase locking protocol does. Recall that a strict schedule does not allow a transaction to read or write an item unless the transaction that last wrote the item has committed (or aborted and rolled back). However, deadlocks can occur in strict two-phase locking, thus requiring abort and UNDO of transactions. For a strict schedule, UNDO of an operation requires changing the item back to its old value (BFIM).

PROCEDURE RIU_M

1. Use two lists of transactions maintained by the system: the committed transactions since the last checkpoint and the active transactions.

2. Undo all the write_item operations of the active (uncommitted) transactions from the log, using the UNDO procedure. The operations should be undone in the reverse of the order in which they were written into the log.

3. Redo all the write_item operations of the committed transactions from the log, in the order in which they were written into the log.

As we discussed in Section 19.2.2, step 3 is more efficiently done by starting from the end of the log and redoing only the last update of each item X. Whenever an item is redone, it is added to a list of redone items and is not redone again. A similar procedure can be devised to improve the efficiency of step 2.

19.4 SHADOW PAGING

This recovery scheme does not require the use of a log in a single-user environment. In a multiuser environment, a log may be needed for the concurrency control method. Shadow paging considers the database to be made up of a number of fixed-size disk pages (or disk blocks), say, n, for recovery purposes. A directory with n entries⁵ is constructed, where the ith entry points to the ith database page on disk. The directory is kept in main memory if it is not too large, and all references (reads or writes) to database pages on disk go through it. When a transaction begins executing, the current directory, whose entries point to the most recent or current database pages on disk, is copied into a shadow directory. The shadow directory is then saved on disk while the current directory is used by the transaction. During transaction execution, the shadow directory is never modified. When a write_item operation is performed, a new copy of the modified database page is created, but the old copy of that page is not overwritten. Instead, the new page is written elsewhere, on some previously unused disk block. The current directory entry is modified to point to the new disk block, whereas the shadow directory is not modified and continues to point to the old unmodified disk block. Figure 19.5 illustrates the concepts of shadow and current directories. For pages updated by the transaction, two versions are kept. The old version is referenced by the shadow directory, and the new version by the current directory.

To recover from a failure during transaction execution, it is sufficient to free the modified database pages and to discard the current directory. The state of the database before transaction execution is available through the shadow directory, and that state is recovered by reinstating the shadow directory. The database thus is returned to its state prior to the transaction that was executing when the crash occurred, and any modified pages are discarded. Committing a transaction corresponds to discarding the previous shadow directory. Since recovery involves neither undoing nor redoing data items, this technique can be categorized as a NO-UNDO/NO-REDO technique for recovery.

5. The directory is similar to the page table maintained by the operating system for each process.

FIGURE 19.5 An example of shadow paging, showing the database disk blocks (pages), the current directory, and the shadow directory (not updated). [Figure not reproduced.]
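The following Python sketch (illustrative only; all names invented) captures the core mechanism: the shadow directory is frozen at transaction start, writes go to previously unused blocks through the current directory, commit discards the shadow directory, and recovery reinstates it.

# Shadow paging in miniature: two directories map page numbers to disk
# blocks; updated pages are written to previously unused blocks.

class ShadowPagedDB:
    def __init__(self, pages):
        self.blocks = dict(enumerate(pages))              # block_no -> contents
        self.current = {i: i for i in range(len(pages))}  # page -> block
        self.next_free = len(pages)
        self.shadow = None

    def begin(self):
        self.shadow = dict(self.current)    # copy, then never modify it

    def write_item(self, page, contents):
        new_block = self.next_free          # old block is NOT overwritten
        self.next_free += 1
        self.blocks[new_block] = contents
        self.current[page] = new_block      # only the current directory moves

    def commit(self):
        self.shadow = None                  # discard the previous shadow

    def recover(self):
        self.current = dict(self.shadow)    # reinstate the shadow directory

db = ShadowPagedDB(["a", "b"])
db.begin()
db.write_item(0, "a'")
db.recover()                                # crash before commit
assert db.blocks[db.current[0]] == "a"      # state before the transaction

Since recovery here is just a directory swap, nothing is undone or redone item by item, which is exactly why the technique counts as NO-UNDO/NO-REDO.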

In a multiuser environment with concurrent transactions, logs and checkpoints must be incorporated into the shadow paging technique. One disadvantage of shadow paging is that the updated database pages change location on disk. This makes it difficult to keep related database pages close together on disk without complex storage management strategies. Furthermore, if the directory is large, the overhead of writing shadow directories to disk as transactions commit is significant. A further complication is how to handle garbage collection when a transaction commits. The old pages referenced by the shadow directory that have been updated must be released and added to a list of free pages for future use. These pages are no longer needed after the transaction commits. Another issue is that the operation to migrate between current and shadow directories must be implemented as an atomic operation.

19.5 THE ARIES RECOVERY ALGORITHM

We now describe the ARIES algorithm as an example of a recovery algorithm used in database systems. ARIES uses a steal/no-force approach for writing, and it is based on three concepts: (1) write-ahead logging, (2) repeating history during redo, and (3) logging changes during undo. We already discussed write-ahead logging in Section 19.1.3. The second concept, repeating history, means that ARIES will retrace all actions of the database system prior to the crash to reconstruct the database state when the crash occurred. Transactions that were uncommitted at the time of the crash (active transactions) are undone. The third concept, logging during undo, will prevent ARIES from repeating the completed undo operations if a failure occurs during recovery, which causes a restart of the recovery process.

The ARIES recovery procedure consists of three main steps: (1) analysis, (2) REDO, and (3) UNDO. The analysis step identifies the dirty (updated) pages in the buffer⁶ and the set of transactions active at the time of the crash. The appropriate point in the log where the REDO operation should start is also determined. The REDO phase actually reapplies updates from the log to the database. Generally, the REDO operation is applied to only committed transactions. However, in ARIES, this is not the case. Certain information in the ARIES log will provide the start point for REDO, from which REDO operations are applied until the end of the log is reached. In addition, information stored by ARIES and in the data pages will allow ARIES to determine whether the operation to be redone has actually been applied to the database and hence need not be reapplied. Thus only the necessary REDO operations are applied during recovery. Finally, during the UNDO phase, the log is scanned backwards and the operations of transactions that were active at the time of the crash are undone in reverse order. The information needed for ARIES to accomplish its recovery procedure includes the log, the Transaction Table, and the Dirty Page Table. In addition, checkpointing is used. These two tables are maintained by the transaction manager and written to the log during checkpointing.

In ARIES, every log record has an associated log sequence number (LSN) that is monotonically increasing and indicates the address of the log record on disk. Each LSN corresponds to a specific change (action) of some transaction. In addition, each data page will store the LSN of the latest log record corresponding to a change for that page. A log record is written for any of the following actions: updating a page (write), committing a transaction (commit), aborting a transaction (abort), undoing an update (undo), and ending a transaction (end). The need for including the first three actions in the log has been discussed, but the last two need some explanation. When an update is undone, a compensation log record is written in the log. When a transaction ends, whether by committing or aborting, an end log record is written.

Common fields in all log records include: (1) the previous LSN for that transaction, (2) the transaction ID, and (3) the type of log record. The previous LSN is important because it links the log records (in reverse order) for each transaction. For an update (write) action, additional fields in the log record include: (4) the page ID for the page that includes the item, (5) the length of the updated item, (6) its offset from the beginning of the page, (7) the before image of the item, and (8) its after image.

6. The actual buffers may be lost during a crash, since they are in main memory. Additional tables stored in the log during checkpointing (Dirty Page Table, Transaction Table) allow ARIES to identify this information, as discussed below.

Besides the log, two tables are needed for efficient recovery: the Transaction Table and the Dirty Page Table, which are maintained by the transaction manager. When a crash occurs, these tables are rebuilt in the analysis phase of recovery. The Transaction Table contains an entry for each active transaction, with information such as the transaction ID, transaction status, and the LSN of the most recent log record for the transaction. The Dirty Page Table contains an entry for each dirty page in the buffer, which includes the page ID and the LSN corresponding to the earliest update to that page.

Checkpointing in ARIES consists of the following: (1) writing a begin_checkpoint record to the log, (2) writing an end_checkpoint record to the log, and (3) writing the LSN of the begin_checkpoint record to a special file. This special file is accessed during recovery to locate the last checkpoint information. With the end_checkpoint record, the contents of both the Transaction Table and Dirty Page Table are appended to the end of the log. To reduce the cost, fuzzy checkpointing is used so that the DBMS can continue to execute transactions during checkpointing (see Section 19.1.4). In addition, the contents of the DBMS cache do not have to be flushed to disk during checkpoint, since the Transaction Table and Dirty Page Table, which are appended to the log on disk, contain the information needed for recovery. Notice that if a crash occurs during checkpointing, the special file will refer to the previous checkpoint, which is used for recovery.

After a crash, the ARIES recovery manager takes over. Information from the last checkpoint is first accessed through the special file. The analysis phase starts at the begin_checkpoint record and proceeds to the end of the log. When the end_checkpoint record is encountered, the Transaction Table and Dirty Page Table are accessed (recall that these tables were written in the log during checkpointing). During analysis, the log records being analyzed may cause modifications to these two tables. For instance, if an end log record is encountered for a transaction T in the Transaction Table, then the entry for T is deleted from that table. If some other type of log record is encountered for a transaction T', then an entry for T' is inserted into the Transaction Table, if not already present, and the last LSN field is modified. If the log record corresponds to a change for page P, then an entry would be made for page P (if not present in the table) and the associated LSN field would be modified. When the analysis phase is complete, the necessary information for REDO and UNDO has been compiled in the tables.
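The sketch below compresses the analysis pass into a few lines of Python (the tuple-based log layout and all names are invented, and transaction status tracking is omitted): it rebuilds the Transaction Table and the Dirty Page Table by scanning forward from the checkpoint.

# ARIES analysis pass in miniature: rebuild the Transaction Table and
# Dirty Page Table by scanning forward from the checkpoint.

def analysis(log, ckpt_txn_table, ckpt_dirty_pages, start):
    txn_table = dict(ckpt_txn_table)      # txn -> last LSN
    dirty_pages = dict(ckpt_dirty_pages)  # page -> EARLIEST update LSN
    for lsn, txn, rec_type, page in log[start:]:
        if rec_type == "end":
            txn_table.pop(txn, None)      # transaction finished: drop entry
            continue
        txn_table[txn] = lsn              # insert entry / refresh last LSN
        if rec_type == "write" and page is not None:
            dirty_pages.setdefault(page, lsn)   # keep the earliest LSN
    return txn_table, dirty_pages

log = [
    (5, "T2", "write", "B"),
    (6, "T3", "write", "A"),     # new entries for T3 and for page A
    (7, "T2", "write", "C"),
    (8, "T2", "commit", None),
]
tt, dp = analysis(log, {"T1": 1, "T2": 2}, {"C": 1, "B": 2}, start=0)
assert "T3" in tt and dp["A"] == 6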

The REDO phase follows next. To reduce the amount of unnecessary work, ARIES starts redoing at a point in the log where it knows (for sure) that previous changes to dirty pages have already been applied to the database on disk. It can determine this by finding the smallest LSN, M, of all the dirty pages in the Dirty Page Table, which indicates the log position where ARIES needs to start the REDO phase. Any changes corresponding to an LSN < M, for redoable transactions, must have already been propagated to disk or already been overwritten in the buffer; otherwise, those dirty pages with that LSN would be in the buffer (and the Dirty Page Table). So, REDO starts at the log record with LSN = M and scans forward to the end of the log. For each change recorded in the log, the REDO algorithm would verify whether or not the change has to be reapplied. For example, if a change recorded in the log pertains to page P that is not in the Dirty Page Table, then this change is already on disk and need not be reapplied. Or, if a change recorded in the log (with LSN = N, say) pertains to page P and the Dirty Page Table contains an entry for P with LSN greater than N, then the change is already present. If neither of these two conditions hold, page P is read from disk and the LSN stored on that page, LSN(P), is compared with N. If N < LSN(P), then the change has been applied and the page need not be rewritten to disk.
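These three tests translate directly into code. In the hedged sketch below (invented structures), a change logged with LSN = N for page P is reapplied only when none of the tests shows that it has already reached disk.

# ARIES REDO filter: a change recorded in the log (with LSN = N) for
# page P is reapplied only if none of three "already applied" tests holds.

def maybe_redo(n, page, dirty_page_table, read_page_lsn, apply_change):
    if page not in dirty_page_table:
        return False               # change is already on disk
    if dirty_page_table[page] > n:
        return False               # page's earliest dirtying is later
    if n < read_page_lsn(page):
        return False               # the page itself records a newer LSN
    apply_change(page)             # change must be reapplied from the log
    return True

# Page A was dirtied at LSN 6 and its on-disk copy carries LSN 0, so the
# update logged at LSN 6 must be redone; a change to page X (not in the
# Dirty Page Table) is skipped.
dpt = {"A": 6}
assert maybe_redo(6, "A", dpt, lambda p: 0, lambda p: None) is True
assert maybe_redo(3, "X", dpt, lambda p: 0, lambda p: None) is False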

Once the REDO phase is finished, the database is in the exact state that it was in when the crash occurred. The set of active transactions, called the undo_set, has been identified in the Transaction Table during the analysis phase. Now, the UNDO phase proceeds by scanning backward from the end of the log and undoing the appropriate actions. A compensating log record is written for each action that is undone. The UNDO reads backward in the log until every action of the set of transactions in the undo_set has been undone. When this is completed, the recovery process is finished and normal processing can begin again.

Consider the recovery example shown in Figure 19.6. There are three transactions: T1, T2, and T3. T1 updates page C, T2 updates pages B and C, and T3 updates page A. Figure 19.6a shows the partial contents of the log, and Figure 19.6b shows the contents of the Transaction Table and Dirty Page Table. Now, suppose that a crash occurs at this point.

FIGURE 19.6 An example of recovery in ARIES. (a) The log at the point of crash. (b) The Transaction Table and Dirty Page Table at the time of checkpoint. (c) The Transaction Table and Dirty Page Table after the analysis phase. [Figure not reproduced.]

Since a checkpoint has occurred, the address of the associated begin_checkpoint record is retrieved, which is location 4. The analysis phase starts from location 4 until it reaches the end. The end_checkpoint record would contain the Transaction Table and Dirty Page Table in Figure 19.6b, and the analysis phase will further reconstruct these tables. When the analysis phase encounters log record 6, a new entry for transaction T3 is made in the Transaction Table and a new entry for page A is made in the Dirty Page Table. After log record 8 is analyzed, the status of transaction T2 is changed to committed in the Transaction Table. Figure 19.6c shows the two tables after the analysis phase.

For the REDO phase, the smallest LSN in the Dirty Page Table is 1. Hence the REDO will start at log record 1 and proceed with the REDO of updates. The LSNs {1, 2, 6, 7}, corresponding to the updates for pages C, B, A, and C, respectively, are not less than the LSNs of those pages (as shown in the Dirty Page Table). So those data pages will be read again and the updates reapplied from the log (assuming the actual LSNs stored on those data pages are less than the corresponding log entries). At this point, the REDO phase is finished and the UNDO phase starts. From the Transaction Table (Figure 19.6c), UNDO is applied only to the active transaction T3. The UNDO phase starts at log entry 6 (the last update for T3) and proceeds backward in the log. The backward chain of updates for transaction T3 (only log record 6 in this example) is followed and undone.

19.6 RECOVERY IN MULTIDATABASE SYSTEMS

So far, we have implicitly assumed that a transaction accesses a single database. In some cases a single transaction, called a multidatabase transaction, may require access to multiple databases. These databases may even be stored on different types of DBMSs; for example, some DBMSs may be relational, whereas others are object-oriented, hierarchical, or network DBMSs. In such a case, each DBMS involved in the multidatabase transaction may have its own recovery technique and transaction manager separate from those of the other DBMSs. This situation is somewhat similar to the case of a distributed database management system (see Chapter 25), where parts of the database reside at different sites that are connected by a communication network.

To maintain the atomicity of a multidatabase transaction, it is necessary to have a two-level recovery mechanism. A global recovery manager, or coordinator, is needed to maintain information needed for recovery, in addition to the local recovery managers and the information they maintain (log, tables). The coordinator usually follows a protocol called the two-phase commit protocol, whose two phases can be stated as follows:

• Phase 1: When all participating databases signal the coordinator that the part of the multidatabase transaction involving each has concluded, the coordinator sends a message "prepare for commit" to each participant to get ready for committing the transaction. Each participating database receiving that message will force-write all log records and needed information for local recovery to disk and then send a "ready to commit" or "OK" signal to the coordinator. If the force-writing to disk fails or the local transaction cannot commit for some reason, the participating database sends a "cannot commit" or "not OK" signal to the coordinator. If the coordinator does not receive a reply from a database within a certain timeout interval, it assumes a "not OK" response.

• Phase 2: If all participating databases reply "OK," and the coordinator's vote is also "OK," the transaction is successful, and the coordinator sends a "commit" signal for the transaction to the participating databases. Because all the local effects of the transaction and information needed for local recovery have been recorded in the logs of the participating databases, recovery from failure is now possible. Each participating database completes transaction commit by writing a [commit] entry for the transaction in the log and permanently updating the database if needed. On the other hand, if one or more of the participating databases or the coordinator have a "not OK" response, the transaction has failed, and the coordinator sends a message to "roll back" or UNDO the local effect of the transaction to each participating database. This is done by undoing the transaction operations, using the log.
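A bare-bones sketch of the coordinator's side of the protocol is given below (the interfaces are invented; real coordinators add persistent state, timeouts, and retries). A missing or failed reply in Phase 1 is treated as a "not OK" vote, and any single "not OK" forces a global rollback.

# Two-phase commit, coordinator side: commit only if every participant
# force-writes its log and votes OK; otherwise roll everything back.

def two_phase_commit(participants):
    # Phase 1: ask every participating database to prepare.
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())   # force-writes log, returns True/False
        except Exception:               # no reply counts as "not OK"
            votes.append(False)
    # Phase 2: unanimous OK means commit everywhere; otherwise roll back.
    if all(votes):
        for p in participants:
            p.commit()                  # write [commit] entry, apply updates
        return "committed"
    for p in participants:
        p.rollback()                    # UNDO local effects using the log
    return "rolled back"

class Participant:
    def __init__(self, ok=True):
        self.ok, self.state = ok, "active"
    def prepare(self):
        return self.ok                  # "ready to commit" or "cannot commit"
    def commit(self):
        self.state = "committed"
    def rollback(self):
        self.state = "rolled back"

assert two_phase_commit([Participant(), Participant()]) == "committed"
assert two_phase_commit([Participant(), Participant(ok=False)]) == "rolled back"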

The net effect of the two-phase commit protocol is that either all participating databases commit the effect of the transaction or none of them do. In case any of the participants, or the coordinator, fails, it is always possible to recover to a state where either the transaction is committed or it is rolled back. A failure during or before Phase 1 usually requires the transaction to be rolled back, whereas a failure during Phase 2 means that a successful transaction can recover and commit.

19.7 DATABASE BACKUP AND RECOVERY FROM CATASTROPHIC FAILURES

So far, all the techniques we have discussed apply to noncatastrophic failures. A key assumption has been that the system log is maintained on the disk and is not lost as a result of the failure. Similarly, the shadow directory must be stored on disk to allow recovery when shadow paging is used. The recovery techniques we have discussed use the entries in the system log or the shadow directory to recover from failure by bringing the database back to a consistent state.

The recovery manager of a DBMS must also be equipped to handle more catastrophic failures such as disk crashes. The main technique used to handle such crashes is that of database backup. The whole database and the log are periodically copied onto a cheap storage medium such as magnetic tapes. In case of a catastrophic system failure, the latest backup copy can be reloaded from the tape to the disk, and the system can be restarted.

To avoid losing all the effects of transactions that have been executed since the last backup, it is customary to back up the system log at more frequent intervals than full database backup by periodically copying it to magnetic tape. The system log is usually substantially smaller than the database itself and hence can be backed up more frequently. Thus users do not lose all transactions they have performed since the last database backup. All committed transactions recorded in the portion of the system log that has been backed up to tape can have their effect on the database redone. A new log is started after each database backup. Hence, to recover from disk failure, the database is first recreated on disk from its latest backup copy on tape. Following that, the effects of all the committed transactions whose operations have been recorded in the backed-up copies of the system log are reconstructed.

19.8 SUMMARY

In this chapter we discussed the techniques for recovery from transaction failures. The main goal of recovery is to ensure the atomicity property of a transaction. If a transaction fails before completing its execution, the recovery mechanism has to make sure that the transaction has no lasting effects on the database. We first gave an informal outline for a recovery process and then discussed system concepts for recovery. These included a discussion of caching, in-place updating versus shadowing, before and after images of a data item, UNDO versus REDO recovery operations, steal/no-steal and force/no-force policies, system checkpointing, and the write-ahead logging protocol.

Next we discussed two different approaches to recovery: deferred update and immediate update. Deferred update techniques postpone any actual updating of the database on disk until a transaction reaches its commit point. The transaction force-writes the log to disk before recording the updates in the database. This approach, when used with certain concurrency control methods, is designed never to require transaction rollback, and recovery simply consists of redoing the operations of transactions committed after the last checkpoint from the log. The disadvantage is that too much buffer space may be needed, since updates are kept in the buffers and are not applied to disk until a transaction commits. Deferred update can lead to a recovery algorithm known as NO-UNDO/REDO. Immediate update techniques may apply changes to the database on disk before the transaction reaches a successful conclusion. Any changes applied to the database must first be recorded in the log and force-written to disk so that these operations can be undone if necessary. We also gave an overview of a recovery algorithm for immediate update known as UNDO/REDO. Another algorithm, known as UNDO/NO-REDO, can also be developed for immediate update if all transaction actions are recorded in the database before commit.

We discussed the shadow paging technique for recovery, which keeps track of old database pages by using a shadow directory. This technique, which is classified as NO-UNDO/NO-REDO, does not require a log in single-user systems but still needs the log for multiuser systems. We also presented ARIES, a specific recovery scheme used in some of IBM's relational database products. We then discussed the two-phase commit protocol, which is used for recovery from failures involving multidatabase transactions. Finally, we discussed recovery from catastrophic failures, which is typically done by backing up the database and the log to tape. The log can be backed up more frequently than the database, and the backup log can be used to redo operations starting from the last database backup.
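The essence of shadow paging can be sketched in a few lines of Python. This is a deliberately simplified, single-user illustration (page allocation, disk layout, and garbage collection are ignored): updates go to fresh pages referenced by a current directory while the shadow directory still points to the old pages, so committing is a single directory swap.

class ShadowPagedDB:
    """Minimal sketch of NO-UNDO/NO-REDO shadow paging (single user)."""
    def __init__(self, pages):
        self.pages = dict(pages)            # page_id -> contents
        self.shadow = {p: p for p in pages} # shadow directory (on disk)
        self.current = dict(self.shadow)    # current directory

    def write(self, page_id, contents):
        new_id = max(self.pages) + 1        # copy-on-write: use a fresh page
        self.pages[new_id] = contents
        self.current[page_id] = new_id      # the old page is untouched

    def commit(self):
        self.shadow = dict(self.current)    # one atomic directory swap

    def abort(self):
        self.current = dict(self.shadow)    # discard new pages; no undo needed

db = ShadowPagedDB({0: 'old A', 1: 'old B'})
db.write(0, 'new A')     # page 2 now holds 'new A'; page 0 is the shadow copy
db.commit()              # the updated directory becomes permanent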


Review Questions

19.4 How are buffering and caching techniques used by the recovery subsystem?
19.5 What are the before image (BFIM) and after image (AFIM) of a data item? What is the difference between in-place updating and shadowing, with respect to their handling of BFIM and AFIM?
19.6 What are UNDO-type and REDO-type log entries?
19.7 Describe the write-ahead logging protocol.
19.8 Identify three typical lists of transactions that are maintained by the recovery subsystem.
19.9 What is meant by transaction rollback? What is meant by cascading rollback? Why do practical recovery methods use protocols that do not permit cascading rollback? Which recovery techniques do not require any rollback?
19.10 Discuss the UNDO and REDO operations and the recovery techniques that use each.
19.11 Discuss the deferred update technique of recovery. What are the advantages and disadvantages of this technique? Why is it called the NO-UNDO/REDO method?
19.12 How can recovery handle transaction operations that do not affect the database, such as the printing of reports by a transaction?
19.13 Discuss the immediate update recovery technique in both single-user and multiuser environments. What are the advantages and disadvantages of immediate update?
19.14 What is the difference between the UNDO/REDO and the UNDO/NO-REDO algorithms for recovery with immediate update? Develop the outline for an UNDO/NO-REDO algorithm.
19.15 Describe the shadow paging recovery technique. Under what circumstances does it not require a log?
19.16 Describe the three phases of the ARIES recovery method.
19.17 What are log sequence numbers (LSNs) in ARIES? How are they used? What information do the Dirty Page Table and Transaction Table contain? Describe how fuzzy checkpointing is used in ARIES.
19.18 What do the terms steal/no-steal and force/no-force mean with regard to buffer management for transaction processing?
19.19 Describe the two-phase commit protocol for multidatabase transactions.
19.20 Discuss how recovery from catastrophic failures is handled.


Exercises

19.21 Suppose that the system crashes before the [read_item, T3, A] entry is written to the log in Figure 19.1b. Will that make any difference in the recovery process?
19.22 Suppose that the system crashes before the [write_item, T2, D, 25, 26] entry is written to the log in Figure 19.1b. Will that make any difference in the recovery process?
19.23 Figure 19.7 shows the log corresponding to a particular schedule at the point of a system crash for four transactions T1, T2, T3, and T4. Suppose that we use the immediate update protocol with checkpointing. Describe the recovery process from the system crash. Specify which transactions are rolled back, which operations in the log are redone and which (if any) are undone, and whether any cascading rollback takes place.
19.24 Suppose that we use the deferred update protocol for the example in Figure 19.7. Show how the log would be different in the case of deferred update by removing the unnecessary log entries; then describe the recovery process, using your modified log. Assume that only REDO operations are applied, and specify which operations in the log are redone and which are ignored.
19.25 How does checkpointing in ARIES differ from checkpointing as described in Section 19.1.4?
19.26 How are log sequence numbers used by ARIES to reduce the amount of REDO work needed for recovery? Illustrate with an example using the information shown in Figure 19.6. You can make your own assumptions as to when a page is written to disk.

[write_item, T2, D, 15, 25] ← system crash

FIGURE 19.7 An example schedule and its corresponding log


19.27 What implications would a no-steal/force buffer management policy have on checkpointing and recovery?

Choose the correct answer for each of the following multiple-choice questions:

19.28 Incremental logging with deferred updates implies that the recovery system must necessarily
a. store the old value of the updated item in the log.
b. store the new value of the updated item in the log.
c. store both the old and new value of the updated item in the log.
d. store only the Begin Transaction and Commit Transaction records in the log.
19.29 The write-ahead logging (WAL) protocol simply means that
a. the writing of a data item should be done ahead of any logging operation.
b. the log record for an operation should be written before the actual data is written.
c. all log records should be written before a new transaction begins execution.
d. the log never needs to be written to disk.
19.30 In case of transaction failure under a deferred update incremental logging scheme, which of the following will be needed:
a. an undo operation.
b. a redo operation.
c. an undo and redo operation.
d. none of the above.
19.31 For incremental logging with immediate updates, a log record for a transaction would contain:
a. a transaction name, data item name, old value of item, new value of item.
b. a transaction name, data item name, old value of item.
c. a transaction name, data item name, new value of item.
d. a transaction name and a data item name.

19.32 For correct behavior during recovery, undo and redo operations must be
19.33 When a failure occurs, the log is consulted and each operation is either undone or redone. This is a problem because
a. searching the entire log is time consuming.
b. many redos are unnecessary.
c. both (a) and (b).
d. none of the above.

19.34 When using a log-based recovery scheme, it might improve performance as well as providing a recovery mechanism by
a. writing the log records to disk when each transaction commits.
b. writing the appropriate log records to disk during the transaction's execution.
c. waiting to write the log records until multiple transactions commit and writing them as a batch.
d. never writing the log records to disk.


19.35 There is a possibility of a cascading rollback when
a. a transaction writes items that have been written only by a committed transaction.

19.36 To cope with media (disk) failures, it is necessary
a. for the DBMS to only execute transactions in a single-user environment.
b. to keep a redundant copy of the database.
c. to never abort a transaction.
d. all of the above.

19.37 If the shadowing approach is used for flushing a data item back to disk, then
a. the item is written to disk only after the transaction commits.
b. the item is written to a different location on disk.
c. the item is written to disk before the transaction commits.
d. the item is written to the same disk location from which it was read.

Selected Bibliography

The books by Bernstein et al. (1987) and Papadimitriou (1986) are devoted to the theory and principles of concurrency control and recovery. The book by Gray and Reuter (1993) is an encyclopedic work on concurrency control, recovery, and other transaction-processing issues.

Verhofstad (1978) presents a tutorial and survey of recovery techniques in database systems. Categorizing algorithms based on their UNDO/REDO characteristics is discussed in Haerder and Reuter (1983) and in Bernstein et al. (1983). Gray (1978) discusses recovery, along with other system aspects of implementing operating systems for databases. The shadow paging technique is discussed in Lorie (1977), Verhofstad (1978), and Reuter (1980). Gray et al. (1981) discuss the recovery mechanism in SYSTEM R. Lockeman and Knutsen (1968), Davies (1972), and Bjork (1973) are early papers that discuss recovery. Chandy et al. (1975) discuss transaction rollback. Lilien and Bhargava (1985) discuss the concept of integrity block and its use to improve the efficiency of recovery.

Recovery using write-ahead logging is analyzed in Jhingran and Khedkar (1992) and is used in the ARIES system (Mohan et al. 1992a). More recent work on recovery includes compensating transactions (Korth et al. 1990) and main memory database recovery (Kumar 1991). The ARIES recovery algorithms (Mohan et al. 1992) have been quite successful in practice. Franklin et al. (1992) discusses recovery in the EXODUS system. Two recent books by Kumar and Hsu (1998) and Kumar and Son (1998) discuss recovery in detail and contain descriptions of recovery methods used in a number of existing relational database products.


OBJECT AND OBJECT-RELATIONAL DATABASES


Concepts for Object Databases

In this chapter and the next, we discuss object-oriented data models and database systems.1 Traditional data models and systems, such as relational, network, and hierarchical, have been quite successful in developing the database technology required for many traditional business database applications. However, they have certain shortcomings when more complex database applications must be designed and implemented, for example, databases for engineering design and manufacturing (CAD/CAM and CIM),2 scientific experiments, telecommunications, geographic information systems, and multimedia.3 These newer applications have requirements and characteristics that differ from those of traditional business applications, such as more complex structures for objects, longer-duration transactions, new data types for storing images or large textual items, and the need to define nonstandard application-specific operations. Object-oriented databases were proposed to meet the needs of these more complex applications. The object-oriented approach offers the flexibility to handle some of these requirements without

1. These databases are often referred to as Object Databases and the systems are referred to as Object Database Management Systems (ODBMS). However, because this chapter discusses many general object-oriented concepts, we will use the term object-oriented instead of just object.

2. Computer-Aided Design/Computer-Aided Manufacturing and Computer-Integrated Manufacturing.

3. Multimedia databases must store various types of multimedia objects, such as video, audio, images, graphics, and documents (see Chapter 24).



being limited by the data types and query languages available in traditional database systems. A key feature of object-oriented databases is the power they give the designer to specify both the structure of complex objects and the operations that can be applied to these objects.

Another reason for the creation of object-oriented databases is the increasing use of object-oriented programming languages in developing software applications. Databases are now becoming fundamental components in many software systems, and traditional databases were difficult to use with object-oriented software applications that are developed in an object-oriented programming language such as C++, SMALLTALK, or JAVA. Object-oriented databases are designed so they can be directly, or seamlessly, integrated with software that is developed using object-oriented programming languages. The need for additional data modeling features has also been recognized by relational DBMS vendors, and newer versions of relational systems are incorporating many of the features that were proposed for object-oriented databases. This has led to systems that are characterized as object-relational or extended relational DBMSs (see Chapter 22). The latest version of the SQL standard for relational DBMSs includes some of these features.

Although many experimental prototypes and commercial object-oriented database systems have been created, they have not found widespread use because of the popularity of relational and object-relational systems. The experimental prototypes included the ORION system developed at MCC,4 OPENOODB at Texas Instruments, the IRIS system at Hewlett-Packard laboratories, the ODE system at AT&T Bell Labs,5 and the ENCORE/ObServer project at Brown University. Commercially available systems included GEMSTONE/OPAL of GemStone Systems, ONTOS of Ontos, Objectivity of Objectivity Inc., Versant of Versant Object Technology, ObjectStore of Object Design, ARDENT of ARDENT Software,6 and POET of POET Software. These represent only a partial list of the experimental prototypes and commercial object-oriented database systems that were created.

As commercial object-oriented DBMSs became available, the need for a standard model and language was recognized. Because the formal procedure for approval of standards normally takes a number of years, a consortium of object-oriented DBMS vendors and users, called ODMG,7 proposed a standard that is known as the ODMG-93 standard, which has since been revised. We will describe some features of the ODMG standard in Chapter 21.

Object-oriented databases have adopted many of the concepts that were developed originally for object-oriented programming languages.8 In Section 20.1, we examine the origins of the object-oriented approach and discuss how it applies to database systems. Then, in Sections 20.2 through 20.6, we describe the key concepts utilized in many object-

4. Microelectronics and Computer Technology Corporation, Austin, Texas.

5. Now called Lucent Technologies.

6. Formerly O2 of O2 Technology.

7. Object Database Management Group.

8. Similar concepts were also developed in the fields of semantic data modeling and knowledge representation.


oriented database systems. Section 20.2 discusses object identity, object structure, and type constructors. Section 20.3 presents the concepts of encapsulation of operations and definition of methods as part of class declarations, and also discusses the mechanisms for storing objects in a database by making them persistent. Section 20.4 describes type and class hierarchies and inheritance in object-oriented databases, and Section 20.5 provides an overview of the issues that arise when complex objects need to be represented and stored. Section 20.6 discusses additional concepts, including polymorphism, operator overloading, dynamic binding, multiple and selective inheritance, and versioning and configuration of objects.

This chapter presents the general concepts of object-oriented databases, whereas Chapter 21 will present the ODMG standard. The reader may skip Sections 20.5 and 20.6 of this chapter if a less detailed introduction to the topic is desired.

20.1 OVERVIEW OF OBJECT-ORIENTED CONCEPTS

This section gives a quick overview of the history and main concepts of object-oriented databases, or OODBs for short. The OODB concepts are then explained in more detail in Sections 20.2 through 20.6. The term object-oriented, abbreviated by OO or O-O, has its origins in OO programming languages, or OOPLs. Today OO concepts are applied in the areas of databases, software engineering, knowledge bases, artificial intelligence, and computer systems in general. OOPLs have their roots in the SIMULA language, which was proposed in the late 1960s. In SIMULA, the concept of a class groups together the internal data structure of an object in a class declaration. Subsequently, researchers proposed the concept of abstract data type, which hides the internal data structures and specifies all possible external operations that can be applied to an object, leading to the concept of encapsulation. The programming language SMALLTALK, developed at Xerox PARC9 in the 1970s, was one of the first languages to explicitly incorporate additional OO concepts, such as message passing and inheritance. It is known as a pure OO programming language, meaning that it was explicitly designed to be object-oriented. This contrasts with hybrid OO programming languages, which incorporate OO concepts into an already existing language. An example of the latter is C++, which incorporates OO concepts into the popular C programming language.

An object typically has two components: state (value) and behavior (operations). Hence, it is somewhat similar to a program variable in a programming language, except that it will typically have a complex data structure as well as specific operations defined by the programmer.10 Objects in an OOPL exist only during program execution and are hence called transient objects. An OO database can extend the existence of objects so that they are stored permanently, and hence the objects persist beyond program termination and can be retrieved later and shared by other programs. In other words, OO databases store

9. Palo Alto Research Center, Palo Alto, California.

10. Objects have many other characteristics, as we discuss in the rest of this chapter.


persistent objects permanently on secondary storage, and allow the sharing of these objects among multiple programs and applications. This requires the incorporation of other well-known features of database management systems, such as indexing mechanisms, concurrency control, and recovery. An OO database system interfaces with one or more OO programming languages to provide persistent and shared object capabilities.

One goal of OO databases is to maintain a direct correspondence between real-world and database objects so that objects do not lose their integrity and identity and can easily be identified and operated upon. Hence, OO databases provide a unique system-generated object identifier (OID) for each object. We can compare this with the relational model, where each relation must have a primary key attribute whose value identifies each tuple uniquely. In the relational model, if the value of the primary key is changed, the tuple will have a new identity, even though it may still represent the same real-world object. Alternatively, a real-world object may have different names for key attributes in different relations, making it difficult to ascertain that the keys represent the same object (for example, the object identifier may be represented as EMP_ID in one relation and as SSN in another).

Another feature of OO databases is that objects may have an object structure of arbitrary complexity in order to contain all of the necessary information that describes the object. In contrast, in traditional database systems, information about a complex object is often scattered over many relations or records, leading to loss of direct correspondence between a real-world object and its database representation.

The internal structure of an object in OOPLs includes the specification of instance variables, which hold the values that define the internal state of the object. Hence, an instance variable is similar to the concept of an attribute in the relational model, except that instance variables may be encapsulated within the object and thus are not necessarily visible to external users. Instance variables may also be of arbitrarily complex data types. Object-oriented systems allow definition of the operations or functions (behavior) that can be applied to objects of a particular type. In fact, some OO models insist that all operations a user can apply to an object must be predefined. This forces a complete encapsulation of objects. This rigid approach has been relaxed in most OO data models for several reasons. First, the database user often needs to know the attribute names so they can specify selection conditions on the attributes to retrieve specific objects. Second, complete encapsulation implies that any simple retrieval requires a predefined operation, thus making ad hoc queries difficult to specify on the fly.

To encourage encapsulation, an operation is defined in two parts. The first part, called the signature or interface of the operation, specifies the operation name and arguments (or parameters). The second part, called the method or body, specifies the implementation of the operation. Operations can be invoked by passing a message to an object, which includes the operation name and the parameters. The object then executes the method for that operation. This encapsulation permits modification of the internal structure of an object, as well as the implementation of its operations, without the need to disturb the external programs that invoke these operations. Hence, encapsulation provides a form of data and operation independence (see Chapter 2).

Another key concept in OO systems is that of type and class hierarchies and inheritance. This permits specification of new types or classes that inherit much of their structure and/or operations from previously defined types or classes. Hence, specification of object types can


proceed systematically. This makes it easier to develop the data types of a system incrementally, and to reuse existing type definitions when creating new types of objects.
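As a simple illustration in Python (the class names here are hypothetical, not from any particular OODB), a new class can inherit both structure and operations from a previously defined one and add its own:

class Person:                          # a previously defined type
    def __init__(self, name, ssn):
        self.name = name
        self.ssn = ssn

    def display(self):                 # an operation defined once ...
        return f"{self.name} ({self.ssn})"

class Employee(Person):                # ... and inherited by the new type
    def __init__(self, name, ssn, salary):
        super().__init__(name, ssn)    # reuse the inherited structure
        self.salary = salary           # add structure specific to Employee

e = Employee('Smith', '123456789', 40000.0)
print(e.display())                     # the inherited operation works unchanged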

One problem in early OO database systems involved representing relationships among objects. The insistence on complete encapsulation in early OO data models led to the argument that relationships should not be explicitly represented, but should instead be described by defining appropriate methods that locate related objects. However, this approach does not work very well for complex databases with many relationships, because it is useful to identify these relationships and make them visible to users. The ODMG standard has recognized this need and it explicitly represents binary relationships via a pair of inverse references, that is, by placing the OIDs of related objects within the objects themselves, and maintaining referential integrity, as we shall describe in Chapter 21.

Some OO systems provide capabilities for dealing with multiple versions of the same object, a feature that is essential in design and engineering applications. For example, an old version of an object that represents a tested and verified design should be retained until the new version is tested and verified. A new version of a complex object may include only a few new versions of its component objects, whereas other components remain unchanged. In addition to permitting versioning, OO databases should also allow for schema evolution, which occurs when type declarations are changed or when new types or relationships are created. These two features are not specific to OODBs and should ideally be included in all types of DBMSs.11

Another OO concept is operator overloading, which refers to an operation's ability to be applied to different types of objects; in such a situation, an operation name may refer to several distinct implementations, depending on the type of objects it is applied to. This feature is also called operator polymorphism. For example, an operation to calculate the area of a geometric object may differ in its method (implementation), depending on whether the object is of type triangle, circle, or rectangle. This may require the use of late binding of the operation name to the appropriate method at run-time, when the type of object to which the operation is applied becomes known.
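A minimal Python sketch of this example (our own class and method names, assuming nothing beyond the standard library) shows how the single operation name area is bound to a different method at run-time according to each object's type:

import math

class Triangle:
    def __init__(self, base, height):
        self.base, self.height = base, height
    def area(self):                    # one implementation of 'area'
        return 0.5 * self.base * self.height

class Circle:
    def __init__(self, radius):
        self.radius = radius
    def area(self):                    # another implementation
        return math.pi * self.radius ** 2

class Rectangle:
    def __init__(self, width, height):
        self.width, self.height = width, height
    def area(self):                    # a third implementation
        return self.width * self.height

# Late binding: the method executed depends on each object's run-time type.
for shape in [Triangle(3, 4), Circle(1), Rectangle(2, 5)]:
    print(type(shape).__name__, shape.area())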

This section provided an overview of the main concepts of OO databases. In Sections 20.2 through 20.6, we discuss these concepts in more detail.

20.2 OBJECT IDENTITY, OBJECT STRUCTURE, AND TYPE CONSTRUCTORS

In this section we first discuss the concept of object identity, and then we present the typical structuring operations for defining the structure of the state of an object. These structuring operations are often called type constructors. They define basic data-structuring operations that can be combined to form complex object structures.

11. Several schema evolution operations, such as ALTER TABLE, are already defined in the relational SQL standard (see Section 8.3).


20.2.1 Object Identity

An OO database system provides a unique identity to each independent object stored in the database. This unique identity is typically implemented via a unique, system-generated object identifier, or OID. The value of an OID is not visible to the external user, but it is used internally by the system to identify each object uniquely and to create and manage inter-object references. The OID can be assigned to program variables of the appropriate type when needed.

The main property required of an OID is that it be immutable; that is, the OID value of a particular object should not change. This preserves the identity of the real-world object being represented. Hence, an OO database system must have some mechanism for generating OIDs and preserving the immutability property. It is also desirable that each OID be used only once; that is, even if an object is removed from the database, its OID should not be assigned to another object. These two properties imply that the OID should not depend on any attribute values of the object, since the value of an attribute may be changed or corrected. It is also generally considered inappropriate to base the OID on the physical address of the object in storage, since the physical address can change after a physical reorganization of the database. However, some systems do use the physical address as OID to increase the efficiency of object retrieval. If the physical address of the object changes, an indirect pointer can be placed at the former address, which gives the new physical location of the object. It is more common to use long integers as OIDs and then to use some form of hash table to map the OID value to the current physical address of the object in storage.

Some early OO data models required that everything, from a simple value to a complex object, be represented as an object; hence, every basic value, such as an integer, string, or Boolean value, has an OID. This allows two basic values to have different OIDs, which can be useful in some cases. For example, the integer value 50 can be used sometimes to mean a weight in kilograms and at other times to mean the age of a person. Then, two basic objects with distinct OIDs could be created, but both objects would represent the integer value 50. Although useful as a theoretical model, this is not very practical, since it may lead to the generation of too many OIDs. Hence, most OO database systems allow for the representation of both objects and values. Every object must have an immutable OID, whereas a value has no OID and just stands for itself. Hence, a value is typically stored within an object and cannot be referenced from other objects. In some systems, complex structured values can also be created without having a corresponding OID if needed.
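A minimal sketch of such a scheme in Python, assuming a simple counter for OID generation and a dictionary standing in for the hash table (none of this is code from an actual OODBMS), might look as follows:

import itertools

class OIDManager:
    """Illustrative sketch: immutable, never-reused OIDs mapped to
    (possibly changing) physical addresses via a hash table."""
    def __init__(self):
        self._next_oid = itertools.count(1)   # OIDs are never reused
        self._address_of = {}                 # OID -> physical address

    def new_object(self, physical_address):
        oid = next(self._next_oid)            # independent of attribute values
        self._address_of[oid] = physical_address
        return oid                            # the OID itself never changes

    def relocate(self, oid, new_address):
        # The object may move (e.g., after reorganization); only the
        # mapping changes, never the OID.
        self._address_of[oid] = new_address

    def locate(self, oid):
        return self._address_of[oid]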

20.2.2 Object Structure

In OO databases, the state (current value) of a complex object may be constructed from other objects (or other values) by using certain type constructors. One formal way of representing such objects is to view each object as a triple (i, c, v), where i is a unique object identifier (the OID), c is a type constructor12 (that is, an indication of how the object state is

12. This is different from the constructor operation that is used in C++ and other OOPLs to create new objects.


constructed), and v is the object state (or current value). The data model will typically include several type constructors. The three most basic constructors are atom, tuple, and set. Other commonly used constructors include list, bag, and array. The atom constructor is used to represent all basic atomic values, such as integers, real numbers, character strings, Booleans, and any other basic data types that the system supports directly.

The object state v of an object (i, c, v) is interpreted based on the constructor c. If c = atom, the state (value) v is an atomic value from the domain of basic values supported by the system. If c = set, the state v is a set of object identifiers {i1, i2, ..., in}, which are the OIDs for a set of objects that are typically of the same type. If c = tuple, the state v is a tuple of the form <a1:i1, a2:i2, ..., an:in>, where each aj is an attribute name13 and each ij is an OID. If c = list, the value v is an ordered list [i1, i2, ..., in] of OIDs of objects of the same type. A list is similar to a set except that the OIDs in a list are ordered, and hence we can refer to the first, second, or jth object in a list. For c = array, the state of the object is a single-dimensional array of object identifiers. The main difference between array and list is that a list can have an arbitrary number of elements whereas an array typically has a maximum size. The difference between set and bag14 is that all elements in a set must be distinct whereas a bag can have duplicate elements.

This model of objects allows arbitrary nesting of the set, list, tuple, and other constructors. The state of an object that is not of type atom will refer to other objects by their object identifiers. Hence, the only case where an actual value appears is in the state of an object of type atom.15

The type constructors set, list, array, and bag are called collection types (or bulk types), to distinguish them from basic types and tuple types. The main characteristic of a collection type is that the state of the object will be a collection of objects that may be unordered (such as a set or a bag) or ordered (such as a list or an array). The tuple type constructor is often called a structured type, since it corresponds to the struct construct in the C and C++ programming languages.

EXAMPLE 1: A Complex Object

We now represent some objects from the relational database shown in Figure 5.6, using the preceding model, where an object is defined by a triple (OID, type constructor, state) and the available type constructors are atom, set, and tuple. We use i1, i2, i3, ... to stand for unique system-generated object identifiers. Consider the following objects:

o1 = (i1, atom, 'Houston')
o2 = (i2, atom, 'Bellaire')
o3 = (i3, atom, 'Sugarland')

13. Also called an instance variable name in OO terminology.

14. Also called a multiset.

15. As we noted earlier, it is not practical to generate a unique system identifier for every value, so real systems allow for both OIDs and structured values, which can be structured by using the same type constructors as objects, except that a value does not have an OID.


o4 = (i4, atom, 5)
o5 = (i5, atom, 'Research')
o6 = (i6, atom, '1988-05-22')
o7 = (i7, set, {i1, i2, i3})
o8 = (i8, tuple, <DNAME:i5, DNUMBER:i4, MGR:i9, LOCATIONS:i7, EMPLOYEES:i10, PROJECTS:i11>)
o9 = (i9, tuple, <MANAGER:i12, MANAGER_START_DATE:i6>)
o10 = (i10, set, {i12, i13, i14})
o11 = (i11, set, {i15, i16, i17})
o12 = (i12, tuple, <FNAME:i18, MINIT:i19, LNAME:i20, SSN:i21, ..., SALARY:i26, SUPERVISOR:i27, DEPT:i8>)

The first six objects (o1-o6) listed here represent atomic values. There will be many similar objects, one for each distinct constant atomic value in the database.16 Object o7 is a set-valued object that represents the set of locations for department 5; the set {i1, i2, i3} refers to the atomic objects with values {'Houston', 'Bellaire', 'Sugarland'}. Object o8 is a tuple-valued object that represents department 5 itself, and has the attributes DNAME, DNUMBER, MGR, LOCATIONS, and so on. The first two attributes DNAME and DNUMBER have atomic objects o5 and o4 as their values. The MGR attribute has a tuple object o9 as its value, which in turn has two attributes. The value of the MANAGER attribute is the object whose OID is i12, which represents the employee 'John B. Smith' who manages the department, whereas the value of MANAGER_START_DATE is another atomic object whose value is a date. The value of the EMPLOYEES attribute of o8 is a set object with OID = i10, whose value is the set of object identifiers for the employees who work for the DEPARTMENT (objects i12, plus i13 and i14, which are not shown). Similarly, the value of the PROJECTS attribute of o8 is a set object with OID = i11, whose value is the set of object identifiers for the projects that are controlled by department number 5 (objects i15, i16, and i17, which are not shown). The object whose OID = i12 represents the employee 'John B. Smith' with all its atomic attributes (FNAME, MINIT, LNAME, SSN, ..., SALARY, which reference the atomic objects i18, i19, i20, i21, ..., i26, respectively (not shown)), plus SUPERVISOR, which references the employee object with OID = i27 (this represents 'James E. Borg', who supervises 'John B. Smith' but is not shown), and DEPT, which references the department object with OID = i8 (this represents department number 5, where 'John B. Smith' works).

In this model, an object can be represented as a graph structure that can be constructed by recursively applying the type constructors. The graph representing an object oi can be constructed by first creating a node for the object oi itself. The node for oi is labeled with the OID and the object constructor c. We also create a node in the graph for each basic atomic

16. These atomic objects are the ones that may cause a problem, due to the use of too many object identifiers, if this model is implemented directly.


value. If an object oi has an atomic value, we draw a directed arc from the node representing oi to the node representing its basic value. If the object value is constructed, we draw directed arcs from the object node to a node that represents the constructed value. Figure 20.1 shows the graph for the example DEPARTMENT object o8 given earlier.

The preceding model permits two types of definitions in a comparison of the states of two objects for equality. Two objects are said to have identical states (deep equality) if the graphs representing their states are identical in every respect, including the OIDs at every level. Another, weaker definition of equality is when two objects have equal states (shallow equality). In this case, the graph structures must be the same, and all the corresponding atomic values in the graphs should also be the same. However, some corresponding internal nodes in the two graphs may have objects with different OIDs.

EXAMPLE 2: Identical Versus Equal Objects

An example can illustrate the difference between the two definitions for comparing object states for equality. Consider the following objects o1, o2, o3, o4, o5, and o6:

o1 = (i1, tuple, <a1:i4, a2:i6>)
o2 = (i2, tuple, <a1:i5, a2:i6>)
o3 = (i3, tuple, <a1:i4, a2:i6>)
o4 = (i4, atom, 10)
o5 = (i5, atom, 10)
o6 = (i6, atom, 20)

The objects o1 and o2 have equal states, since their states at the atomic level are the same but the values are reached through distinct objects o4 and o5. However, the states of objects o1 and o3 are identical, even though the objects themselves are not, because they have distinct OIDs. Similarly, although the states of o4 and o5 are identical, the actual objects o4 and o5 are equal but not identical, because they have distinct OIDs.
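Using the same dictionary transcription as in the sketch after Example 1, the two comparisons can be written out directly; this is our own formulation of the definitions above, not code from any system:

objects = {
    'i1': ('tuple', {'a1': 'i4', 'a2': 'i6'}),
    'i2': ('tuple', {'a1': 'i5', 'a2': 'i6'}),
    'i3': ('tuple', {'a1': 'i4', 'a2': 'i6'}),
    'i4': ('atom', 10),
    'i5': ('atom', 10),
    'i6': ('atom', 20),
}

def identical_states(x, y):
    # Deep equality: the state graphs match, including internal OIDs.
    cx, vx = objects[x]
    cy, vy = objects[y]
    if cx != cy:
        return False
    if cx == 'atom':
        return vx == vy
    return vx == vy          # same attribute names mapped to the very same OIDs

def equal_states(x, y):
    # Shallow equality: same structure and atomic values; internal OIDs
    # are allowed to differ.
    cx, vx = objects[x]
    cy, vy = objects[y]
    if cx != cy:
        return False
    if cx == 'atom':
        return vx == vy
    return vx.keys() == vy.keys() and all(
        equal_states(vx[a], vy[a]) for a in vx)

print(equal_states('i1', 'i2'), identical_states('i1', 'i2'))   # True False
print(equal_states('i1', 'i3'), identical_states('i1', 'i3'))   # True True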

20.2.3 Type Constructors

An object definition language (ODL)17 that incorporates the preceding type constructors can be used to define the object types for a particular database application. In Chapter 21, we shall describe the standard ODL of ODMG, but we first introduce the concepts gradually in this section using a simpler notation. The type constructors can be used to define the data structures for an OO database schema. In Section 20.3 we will see how to incorporate the definition of operations (or methods) into the OO schema. Figure 20.2 shows how we may declare Employee and Department types corresponding to the object instances shown

17. This would correspond to the DDL (Data Definition Language) of the database system (see Chapter 2).


FIGURE 20.1 Representation of a DEPARTMENT complex object as a graph

in Figure 20.1. In Figure 20.2, the Date type is defined as a tuple rather than an atomic value as in Figure 20.1. We use the keywords tuple, set, and list for the type constructors, and the available standard data types (integer, string, float, and so on) for atomic types.


FIGURE 20.2 Specifying the object types Employee, Date, and Department using type constructors

Attributes that refer to other objects, such as dept of Employee or projects of Department, are basically references to other objects and hence serve to represent relationships among the object types. For example, the attribute dept of Employee is of type Department, and hence is used to refer to a specific Department object (where the Employee works). The value of such an attribute would be an OID for a specific Department object. A binary relationship can be represented in one direction, or it can have an inverse reference. The latter representation makes it easy to traverse the relationship in both directions. For example, the attribute employees of Department has as its value a set of references (that is, a set of OIDs) to objects of type Employee; these are the employees who work for the department. The inverse is the reference attribute dept of Employee. We will see in Chapter 21 how the ODMG standard allows inverses to be explicitly declared as relationship attributes to ensure that inverse references are consistent.
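A hedged Python sketch of this pattern (the attribute lists are assumptions modeled on the example, not a transcription of the figures) shows one way to keep the dept/employees inverse references consistent:

class Department:
    def __init__(self, dname, dnumber):
        self.dname = dname
        self.dnumber = dnumber
        self.locations = set()      # set constructor
        self.employees = set()      # set of references to Employee objects

class Employee:
    def __init__(self, fname, lname, ssn, dept):
        self.fname = fname
        self.lname = lname
        self.ssn = ssn
        self.dept = dept            # reference to a Department object
        dept.employees.add(self)    # maintain the inverse reference

research = Department('Research', 5)
e = Employee('John', 'Smith', '123456789', research)
assert e in research.employees and e.dept is research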

20.3 ENCAPSULATION OF OPERATIONS, METHODS, AND PERSISTENCE

The concept of encapsulation is one of the main characteristics of OO languages and systems. It is also related to the concepts of abstract data types and information hiding in programming languages. In traditional database models and systems, this concept was not


applied, since it is customary to make the structure of database objects visible to users and external programs. In these traditional models, a number of standard database operations are applicable to objects of all types. For example, in the relational model, the operations for selecting, inserting, deleting, and modifying tuples are generic and may be applied to any relation in the database. The relation and its attributes are visible to users and to external programs that access the relation by using these operations.

20.3.1 Specifying Object Behavior via Class Operations

The concepts of information hiding and encapsulation can be applied to database objects. The main idea is to define the behavior of a type of object based on the operations that can be externally applied to objects of that type. The internal structure of the object is hidden, and the object is accessible only through a number of predefined operations. Some operations may be used to create (insert) or destroy (delete) objects; other operations may update the object state; and others may be used to retrieve parts of the object state or to apply some calculations. Still other operations may perform a combination of retrieval, calculation, and update. In general, the implementation of an operation can be specified in a general-purpose programming language that provides flexibility and power in defining the operations.

The external users of the object are only made aware of the interface of the object type, which defines the name and arguments (parameters) of each operation. The implementation is hidden from the external users; it includes the definition of the internal data structures of the object and the implementation of the operations that access these structures. In OO terminology, the interface part of each operation is called the signature, and the operation implementation is called a method. Typically, a method is invoked by sending a message to the object to execute the corresponding method. Notice that, as part of executing a method, a subsequent message to another object may be sent, and this mechanism may be used to return values from the objects to the external environment or to other objects.

For database applications, the requirement that all objects be completely encapsulated is too stringent. One way of relaxing this requirement is to divide the structure of an object into visible and hidden attributes (instance variables). Visible attributes may be directly accessed for reading by external operators, or by a high-level query language. The hidden attributes of an object are completely encapsulated and can be accessed only through predefined operations. Most OODBMSs employ high-level query languages for accessing visible attributes. In Chapter 21, we will describe the OQL query language that is proposed as a standard query language for OODBs.

In most cases, operations that update the state of an object are encapsulated. This is a way of defining the update semantics of the objects, given that in many OO data models, few integrity constraints are predefined in the schema. Each type of object has its integrity constraints programmed into the methods that create, delete, and update the objects by explicitly writing code to check for constraint violations and to handle exceptions. In such cases, all update operations are implemented by encapsulated operations. More recently, the ODL for the ODMG standard allows the specification of some common


constraints such as keys and inverse relationships (referential integrity) so that the system can automatically enforce these constraints (see Chapter 21).

The term class is often used to refer to an object type definition, along with the definitions of the operations for that type.18 Figure 20.3 shows how the type definitions of Figure 20.2 may be extended with operations to define classes. A number of operations are declared for each class, and the signature (interface) of each operation is included in the class definition. A method (implementation) for each operation must be defined elsewhere, using a programming language. Typical operations include the object constructor operation, which is used to create a new object, and the destructor operation, which is used to destroy an object. A number of object modifier operations can

define class Employee
    type tuple ( fname: string;
                 ... );
    ...
end Employee;

define class Department
    type tuple ( dname: string;
                 ... );
    ...
    assign_emp(e: Employee): boolean;
    (* adds an employee to the department *)
    remove_emp(e: Employee): boolean;
    (* removes an employee from the department *)
end Department;

FIGURE 20.3 Adding operations to the definitions of Employee and Department

18. This definition of class is similar to how it is used in the popular C++ programming language. The ODMG standard uses the word interface in addition to class (see Chapter 21). In the EER model, the term class was used to refer to an object type, along with the set of all objects of that type (see Chapter 4).
