Database Systems: The Complete Book - P11


1. START, the set of transactions that have started but not yet completed validation. For each transaction T in this set, the scheduler maintains START(T), the time at which T started.

2. VAL, the set of transactions that have been validated but not yet finished the writing of phase 3. For each transaction T in this set, the scheduler maintains both START(T) and VAL(T), the time at which T validated. Note that VAL(T) is also the time at which T is imagined to execute in the hypothetical serial order of execution.

3. FIN, the set of transactions that have completed phase 3. For these transactions T, the scheduler records START(T), VAL(T), and FIN(T), the time at which T finished. In principle this set grows, but as we shall see, we do not have to remember transaction T if FIN(T) < START(U) for every active transaction U (i.e., for every U in START or VAL). The scheduler may thus periodically purge the FIN set to keep its size from growing beyond bounds.
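The purge condition for FIN can be sketched in a few lines of Python. The dictionary layout and the name `purge_fin` are illustrative assumptions, not something the text prescribes. The sketch keeps a finished transaction T unless FIN(T) is smaller than START(U) for every active U.

```python
def purge_fin(fin, active_starts):
    """Return the FIN entries that the scheduler must still remember.

    fin           -- dict: transaction name -> FIN(T) (hypothetical layout)
    active_starts -- START(U) for every U currently in START or VAL
    """
    if not active_starts:
        # Assumption: with no active transactions, nothing in FIN can
        # matter to a transaction that starts later, so all is purged.
        return {}
    earliest = min(active_starts)
    # T may be forgotten only if FIN(T) < START(U) for every active U,
    # which is the same as FIN(T) < the earliest active start time.
    return {t: f for t, f in fin.items() if f >= earliest}
```

For instance, with finish times 5 and 12 and active transactions that started at times 10 and 15, only the transaction that finished at time 12 is kept.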

18.9.2 The Validation Rules

If maintained by the scheduler, the information of Section 18.9.1 is enough for it to detect any potential violation of the assumed serial order of the transactions, that is, the order in which the transactions validate. To understand the rules, let us first consider what can be wrong when we try to validate a transaction T.

Figure 18.43: T cannot validate if an earlier transaction is now writing something that T should have read
[Timeline: U start, T start, U validated, T validating; T reads X, U writes X]

1. Suppose there is a transaction U such that:

(a) U is in VAL or FIN; that is, U has validated.

(b) FIN(U) > START(T); that is, U did not finish before T started. (Note that if U is in VAL, then U has not yet finished when T validates. In that case, FIN(U) is technically undefined. However, we know it must be larger than START(T) in this case.)

(c) RS(T) ∩ WS(U) is not empty; in particular, let it contain database element X.

Then it is possible that U wrote X after T read X. This situation is shown in Fig. 18.43. To interpret the figure, note that the dotted lines connect the events in real time with the time at which they would have occurred had transactions been executed at the moment they validated. Since we don't know whether or not T got to read U's value, we must roll back T to avoid a risk that the actions of T and U will not be consistent with the assumed serial order.

2. Suppose there is a transaction U such that:

(a) U is in VAL; i.e., U has successfully validated.

(b) FIN(U) > VAL(T); that is, U did not finish before T entered its validation phase.

(c) WS(T) ∩ WS(U) ≠ ∅; in particular, let X be in both write sets.

Then the potential problem is as shown in Fig. 18.44. T and U must both write values of X, and if we let T validate, it is possible that it will write X before U does. Since we cannot be sure, we roll back T to make sure it does not violate the assumed serial order in which it follows U.

Figure 18.44: T cannot validate if it could then write something ahead of an earlier transaction
[Timeline: U validated, T validating, U finish; T writes X, U writes X]

The two problems described above are the only situations in which a write by T could be physically unrealizable. In Fig. 18.43, if U finished before T started, then surely T would read the value of X that either U or some later transaction wrote. In Fig. 18.44, if U finished before T validated, then surely U wrote X before T did. We may thus summarize these observations with the following rule for validating a transaction T:

Check that RS(T) ∩ WS(U) = ∅ for any previously validated U that did not finish before T started, i.e., if FIN(U) > START(T).


Check that WS(T) ∩ WS(U) = ∅ for any previously validated U that did not finish before T validated, i.e., if FIN(U) > VAL(T).
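The two checks can be written out as a short Python sketch. The `Txn` record, the function name `can_validate`, and the concrete times are assumptions chosen for illustration. The read and write sets are those quoted in Example 18.29, except RS(U), which is assumed.

```python
from dataclasses import dataclass
from typing import Optional

INF = float("inf")

@dataclass
class Txn:
    name: str
    start: float                  # START(T)
    rs: frozenset                 # read set RS(T)
    ws: frozenset                 # write set WS(T)
    val: Optional[float] = None   # VAL(T); None until validated
    fin: Optional[float] = None   # FIN(T); None until phase 3 completes

def can_validate(t, now, validated):
    """Apply both checks to T, which tries to validate at time `now`.

    validated -- transactions that validated earlier (members of VAL or FIN)
    """
    for u in validated:
        # An unfinished U has no FIN(U) yet; treat it as larger than any time.
        fin_u = INF if u.fin is None else u.fin
        if fin_u > t.start and t.rs & u.ws:
            return False          # T may have missed a value written by U
        if fin_u > now and t.ws & u.ws:
            return False          # T might write ahead of U
    return True

# Hypothetical times consistent with the order of events in Example 18.29.
# FIN values are filled in ahead of time for brevity.
U = Txn("U", start=2, rs=frozenset("B"), ws=frozenset("D"), val=3, fin=6)
T = Txn("T", start=1, rs=frozenset("AB"), ws=frozenset("AC"), val=5, fin=9)
V = Txn("V", start=4, rs=frozenset("B"), ws=frozenset("DE"), val=8, fin=11)
W = Txn("W", start=7, rs=frozenset("AD"), ws=frozenset("AC"))

print(can_validate(T, 5, [U]))         # True
print(can_validate(V, 8, [U, T]))      # True
print(can_validate(W, 10, [U, T, V]))  # False
```

Treating a missing finish time as infinity mirrors the observation that FIN(U) is technically undefined while U is still in VAL, but is certainly larger than any time seen so far.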

Example 18.29: Figure 18.45 shows a time line during which four transactions T, U, V, and W attempt to execute and validate. The read and write sets for each transaction are indicated on the diagram. T starts first, although U is the first to validate.

Figure 18.45: Four transactions and their validation

1. Validation of U: When U validates there are no other validated transactions, so there is nothing to check. U validates successfully and writes a value for database element D.

2. Validation of T: When T validates, U is validated but not finished. Thus, we must check that neither the read nor write set of T has anything in common with WS(U) = {D}. Since RS(T) = {A, B} and WS(T) = {A, C}, both checks are successful, and T validates.

3. Validation of V: When V validates, U is validated and finished, and T is validated but not finished. Also, V started before U finished. Thus, we must compare both RS(V) and WS(V) against WS(T), but only RS(V) needs to be compared against WS(U). We find:

RS(V) ∩ WS(T) = {B} ∩ {A, C} = ∅

WS(V) ∩ WS(T) = {D, E} ∩ {A, C} = ∅

RS(V) ∩ WS(U) = {B} ∩ {D} = ∅

Thus, V also validates successfully.

4. Validation of W: When W validates, we find that U finished before W started, so no comparison between W and U is performed. T is finished before W validates but did not finish before W started, so we compare only RS(W) with WS(T). V is validated but not finished, so we need to compare both RS(W) and WS(W) with WS(V). These tests are:

RS(W) ∩ WS(T) = {A, D} ∩ {A, C} = {A}

RS(W) ∩ WS(V) = {A, D} ∩ {D, E} = {D}

WS(W) ∩ WS(V) = {A, C} ∩ {D, E} = ∅

Since the intersections are not all empty, W is not validated. Rather, W is rolled back and does not write values for A or C.

You may have been concerned with a tacit notion that validation takes place in a moment, or indivisible instant of time. For example, we imagine that we can decide whether a transaction U has already validated before we start to validate transaction T. Could U perhaps finish validating while we are validating T?

If we are running on a uniprocessor system, and there is only one scheduler process, we can indeed think of validation and other actions of the scheduler as taking place in an instant of time. The reason is that if the scheduler is validating T, then it cannot also be validating U, so all during the validation of T, the validation status of U cannot change.

If we are running on a multiprocessor, and there are several scheduler processes, then it might be that one is validating T while the other is validating U. If so, then we need to rely on whatever synchronization mechanism the multiprocessor system provides to make validation an atomic action.

18.9.3 Comparison of Three Concurrency-Control Mechanisms

The three approaches to serializability that we have considered (locks, timestamps, and validation) each have their advantages. First, they can be compared for their storage utilization:

Locks: Space in the lock table is proportional to the number of database elements locked.


Timestamps: In a naive implementation, space is needed for read- and write-times with every database element, whether or not it is currently accessed. However, a more careful implementation will treat all timestamps that are prior to the earliest active transaction as "minus infinity" and not record them. In that case, we can store read- and write-times in a table analogous to a lock table, in which only those database elements that have been accessed recently are mentioned at all.

Validation: Space is used for timestamps and read/write sets for each currently active transaction, plus a few more transactions that finished after some currently active transaction began.

Thus, the amount of space used by each approach is approximately proportional to the sum over all active transactions of the number of database elements the transaction accesses. Timestamping and validation may use slightly more space because they keep track of certain accesses by recently committed transactions that a lock table would not record. A potential problem with validation is that the write set for a transaction must be known before the writes occur (but after the transaction's local computation has been completed).

We can also compare the methods for their effect on the ability of transactions to complete without delay. The performance of the three methods depends on whether interaction among transactions (the likelihood that a transaction will access an element that is also being accessed by a concurrent transaction) is high or low.

Locking delays transactions but avoids rollbacks, even when interaction is high. Timestamps and validation do not delay transactions, but can cause them to roll back, which is a more serious form of delay and also wastes resources.

If interference is low, then neither timestamps nor validation will cause many rollbacks, and they may be preferable to locking because they generally have lower overhead than a locking scheduler.

When a rollback is necessary, timestamps catch some problems earlier than validation, which always lets a transaction do all its internal work before considering whether the transaction must roll back.

Exercise 18.9.1: In the following sequences of events, we use R_i(X) to mean "transaction T_i starts, and its read set is the list of database elements X." Also, V_i means "T_i attempts to validate," and W_i(X) means that "T_i finishes, and its write set was X." Tell what happens when each sequence is processed by a validation-based scheduler.

* a) R1(A,B); R2(B,C); V1; R3(C,D); V3; W1(A); V2; W2(A); W3(B);

b) R1(A,B); R2(B,C); V1; R3(C,D); V3; W1(A); V2; W2(A); W3(D);

c) R1(A,B); R2(B,C); V1; R3(C,D); V3; W1(C); V2; W2(A); W3(D);

d) R1(A,B); R2(B,C); R3(C); V1; V2; V3; W1(A); W2(B); W3(C);

e) R1(A,B); R2(B,C); R3(C); V1; V2; V3; W1(C); W2(B); W3(A);

f) R1(A,B); R2(B,C); R3(C); V1; V2; V3; W1(A); W2(C); W3(B);

18.10 Summary of Chapter 18

+ Consistent Database States: Database states that obey whatever implied or declared constraints the designers intended are called consistent. It is essential that operations on the database preserve consistency, that is, they turn one consistent database state into another.

+ Consistency of Concurrent Transactions: It is normal for several transactions to have access to a database at the same time. Transactions, run in isolation, are assumed to preserve consistency of the database. It is the job of the scheduler to assure that concurrently operating transactions also preserve the consistency of the database.

+ Schedules: Transactions are broken into actions, mainly reading and writing from the database. A sequence of these actions from one or more transactions is called a schedule.

+ Serial Schedules: If transactions execute one at a time, the schedule is said to be serial.

+ Serializable Schedules: A schedule that is equivalent in its effect on the database to some serial schedule is said to be serializable. Interleaving of actions from transactions is possible in a serializable schedule that [...] at least one of which actions is a write.

+ Precedence Graphs: An easy test for conflict-serializability is to construct a precedence graph for the schedule. Nodes correspond to transactions, and there is an arc T → U if some action of T in the schedule conflicts with a later action of U. A schedule is conflict-serializable if and only if the precedence graph is acyclic.


+ Locking: The most common approach to assuring serializable schedules is to lock database elements before accessing them, and to release the lock after finishing access to the element. Locks on an element prevent other transactions from accessing the element.

+ Two-Phase Locking: Locking by itself does not assure serializability. However, two-phase locking, in which all transactions first enter a phase where they only acquire locks, and then enter a phase where they only release locks, will guarantee serializability.

+ Lock Modes: To avoid locking out transactions unnecessarily, systems usually use several lock modes, with different rules for each mode about when a lock can be granted. Most common is the system with shared locks for read-only access and exclusive locks for accesses that include writing.

+ Compatibility Matrices: A compatibility matrix is a useful summary of when it is legal to grant a lock in a certain lock mode, given that there may be other locks, in the same or other modes, on the same element.

+ Update Locks: A scheduler can allow a transaction that plans to read and then write an element first to take an update lock, and later to upgrade the lock to exclusive. Update locks can be granted when there are already shared locks on the element, but once there, an update lock prevents other locks from being granted on that element.

+ Increment Locks: For the common case where a transaction wants only to add or subtract a constant from an element, an increment lock is suitable. Increment locks on the same element do not conflict with each other, although they conflict with shared and exclusive locks.

+ Locking Elements With a Granularity Hierarchy: When both large and small elements (relations, disk blocks, and tuples, perhaps) may need to be locked, a warning system of locks enforces serializability. Transactions place intention locks on large elements to warn other transactions that they plan to access one or more of its subelements.

+ Locking Elements Arranged in a Tree: If database elements are only accessed by moving down a tree, as in a B-tree index, then a non-two-phase locking strategy can enforce serializability. The rules require a lock to be held on the parent while obtaining a lock on the child, although the lock on the parent can then be released and additional locks taken later.

+ Optimistic Concurrency Control: Instead of locking, a scheduler can assume transactions will be serializable, and abort a transaction if some potentially nonserializable behavior is seen. This approach, called optimistic, is divided into timestamp-based and validation-based scheduling.


+ Timestamp-Based Schedulers: This type of scheduler assigns timestamps to transactions as they begin. Database elements have associated read- and write-times, which are the timestamps of the transactions that most recently performed those actions. If an impossible situation, such as a read by one transaction of a value that was written in that transaction's future, is detected, the violating transaction is rolled back, i.e., aborted and restarted.

+ Validation-Based Schedulers: These schedulers validate transactions after they have read everything they need, but before they write. Transactions that have read, or will write, an element that some other transaction is in the process of writing will have an ambiguous result, so the transaction is not validated. A transaction that fails to validate is rolled back.

+ Multiversion Timestamps: A common technique in practice is for read-only transactions to be scheduled by timestamps, but with multiple versions, where a write of an element does not overwrite earlier values of that element until all transactions that could possibly need the earlier value have finished. Writing transactions are scheduled by conventional locks.
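The precedence-graph test mentioned in the summary can be sketched in Python. The schedule encoding, a list of (transaction, action, element) triples with actions "r" and "w", and both function names are assumptions made for this illustration.

```python
def precedence_graph(schedule):
    """Return the set of arcs (T, U) of the precedence graph."""
    arcs = set()
    for i, (t, op1, x) in enumerate(schedule):
        for (u, op2, y) in schedule[i + 1:]:
            # Two actions conflict if they come from different transactions,
            # touch the same element, and at least one of them is a write.
            if t != u and x == y and "w" in (op1, op2):
                arcs.add((t, u))
    return arcs

def is_conflict_serializable(schedule):
    """True if and only if the precedence graph is acyclic."""
    arcs = precedence_graph(schedule)
    nodes = {t for t, _, _ in schedule}
    succ = {n: [v for (u, v) in arcs if u == n] for n in nodes}
    state = {n: 0 for n in nodes}   # 0 = unvisited, 1 = on stack, 2 = done

    def has_cycle(n):
        state[n] = 1
        for m in succ[n]:
            if state[m] == 1 or (state[m] == 0 and has_cycle(m)):
                return True
        state[n] = 2
        return False

    return not any(state[n] == 0 and has_cycle(n) for n in nodes)
```

For example, the hypothetical schedule r1(A); w2(A); w1(A) yields arcs in both directions between T1 and T2, so it is not conflict-serializable.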

18.11 References for Chapter 18

The book [6] is an important source for material on scheduling, as well as locking. [3] is another important source. Two recent surveys of concurrency control are [12] and [11].

Probably the most significant paper in the field of transaction processing is [4] on two-phase locking. The warning protocol for hierarchies of granularity is from [5]. Non-two-phase locking for trees is from [10]. The compatibility matrix was introduced to study behavior of lock modes in [7].

Timestamps as a concurrency control method appeared in [2] and [1]. Scheduling by validation is from [8]. The use of multiple versions was studied by [9].

1. P. A. Bernstein and N. Goodman, "Timestamp-based algorithms for concurrency control in distributed database systems," Proc. Intl. Conf. on Very Large Databases (1980), pp. 285-300.

2. P. A. Bernstein, N. Goodman, J. B. Rothnie, Jr., and C. H. Papadimitriou, "Analysis of serializability in SDD-1: a system of distributed databases (the fully redundant case)," IEEE Trans. on Software Engineering SE-4:3 (1978), pp. 154-168.

3. P. A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in Database Systems, Addison-Wesley, Reading MA, 1987.

4. K. P. Eswaran, J. N. Gray, R. A. Lorie, and I. L. Traiger, "The notions of consistency and predicate locks in a database system," Comm. ACM 19:11 (1976), pp. 624-633.


5. J. N. Gray, F. Putzolo, and I. L. Traiger, "Granularity of locks and degrees of consistency in a shared data base," in G. M. Nijssen (ed.), Modelling in Data Base Management Systems, North Holland, Amsterdam, 1976.

6. J. N. Gray and A. Reuter, Transaction Processing: Concepts and Techniques, Morgan-Kaufmann, San Francisco, 1993.

7. H. F. Korth, "Locking primitives in a database system," J. ACM 30:1 (1983), pp. 55-79.

8. H.-T. Kung and J. T. Robinson, "Optimistic concurrency control," ACM Trans. on Database Systems 6:2 (1981), pp. 312-326.

9. C. H. Papadimitriou and P. C. Kanellakis, "On concurrency control by multiple versions," ACM Trans. on Database Systems 9:1 (1984), pp. 89-

This chapter also includes an introduction to distributed databases. We focus on how to lock elements that are distributed among several sites, perhaps with replicated copies. We also consider how the decision to commit or abort a transaction can be made when the transaction itself involves actions at several sites.

Finally, we consider the problems that arise due to "long transactions." There are applications, such as CAD systems or "workflow" systems, in which human and computer processes interact, perhaps over a period of days. These systems, like short-transaction systems such as banking or airline reservations, need to preserve consistency of the database state. However, the concurrency-control methods discussed in Chapter 18 do not work reasonably when locks are held for days, or decisions to validate are based on events that happened days in the past.

19.1 Serializability and Recoverability

In Chapter 17 we discussed the creation of a log and its use to recover the database state when a system crash occurs. We introduced the view of database computation in which values move between nonvolatile disk, volatile main memory, and the local address space of transactions. The guarantee the various logging methods give is that, should a crash occur, they make it possible to reconstruct the actions of the committed transactions (and only the committed transactions) on the disk copy of the database. A logging system makes no attempt to support serializability; it will blindly reconstruct a database state, even if it is the result of a nonserializable schedule of actions. In fact, commercial database systems do not always insist on serializability, and in some systems, serializability is enforced only on explicit request of the user.

On the other hand, Chapter 18 talked about serializability only. Schedulers designed according to the principles of that chapter may do things that the log manager cannot tolerate. For instance, there is nothing in the serializability definition that forbids a transaction with a lock on an element A from writing a new value of A into the database before committing, and thus violating a rule of the logging policy. Worse, a transaction might write into the database and then abort without undoing the write, which could easily result in an inconsistent database state, even though there is no system crash and the scheduler theoretically maintains serializability.

19.1.1 The Dirty-Data Problem

Recall from Section 8.6.5 that data is "dirty" if it has been written by a transaction that is not yet committed. The dirty data could appear either in the buffers, or on disk, or both; either can cause trouble.

Figure 19.1: T1 writes dirty data and then aborts

Example 19.1: Let us reconsider the serializable schedule from Fig. 18.13, but suppose that after reading B, T1 has to abort for some reason. Then the sequence of events is as in Fig. 19.1. After T1 aborts, the scheduler releases the lock on B that T1 obtained; that step is essential, or else the lock on B would be unavailable to any other transaction, forever.

However, T2 has now read data that does not represent a consistent state of the database. That is, T2 read the value of A that T1 changed, but read the value of B that existed prior to T1's actions. It doesn't matter in this case whether or not the value 125 for A that T1 created was written to disk; T2 gets that value from a buffer, regardless. As a result of reading an inconsistent state, T2 leaves the database (on disk) with an inconsistent state, where A ≠ B.

The problem in Fig. 19.1 is that A written by T1 is dirty data, whether it is in a buffer or on disk. The fact that T2 read A and used it in its own calculation makes T2's actions questionable. As we shall see in Section 19.1.2, it is necessary, if such a situation is allowed to occur, to abort and roll back T2 as well as T1.

Figure 19.2: T1 has read dirty data from T2 and must abort when T2 does

Example 19.2: Now, consider Fig. 19.2, which shows a sequence of actions under a timestamp-based scheduler as in Section 18.8. However, we imagine that this scheduler does not use the commit bit that was introduced in Section 18.8.1. Recall that the purpose of this bit is to prevent a value that was written by an uncommitted transaction from being read by another transaction. Thus, when T1 reads B at the second step, there is no commit-bit check to tell T1 to delay. T1 can proceed and could even write to disk and commit; we have not shown further details of what T1 does.

Eventually, T2 tries to write C in a physically unrealizable way, and T2 aborts. The effect of T2's prior write of B is cancelled; the value and write-time of B are reset to what they were before T2 wrote. Yet T1 has been allowed to use this cancelled value of B and can do anything with it, such as using it to compute new values of A, B, and/or C and writing them to disk. Thus T1, having read a dirty value of B, can cause an inconsistent database state. Note that, had the commit bit been recorded and used, the read r1(B) at step (2) would have been delayed, and not allowed to occur until after T2 aborted and the value of B had been restored to its previous (presumably committed) value.

19.1.2 Cascading Rollback

As we see from the examples above, if dirty data is available to transactions, then we sometimes have to perform a cascading rollback. That is, when a transaction T aborts, we must determine which transactions have read data written by T, abort them, and recursively abort any transactions that have read data written by an aborted transaction. That is, we must find each transaction U that read dirty data written by T, abort U, find any transaction V that read dirty data from U, abort V, and so on. To cancel the effect of an aborted transaction, we can use the log, if it is one of the types (undo or undo/redo) that provides former values. We may also be able to restore the data from the disk copy of the database, if the effect of the dirty data has not migrated to disk. These approaches are considered in the next section.

As we have noted, a timestamp-based scheduler with a commit bit prevents a transaction that may have read dirty data from proceeding, so there is no possibility of cascading rollback with such a scheduler. A validation-based scheduler avoids cascading rollback, because writing to the database (even in buffers) occurs only after it is determined that the transaction will commit.
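The recursive search for transactions to abort amounts to a graph traversal. The Python sketch below assumes the scheduler keeps a hypothetical `reads_from` map from each writer to the transactions that read its dirty data. That bookkeeping and the function name `cascade` are illustrative, not taken from the text.

```python
from collections import deque

def cascade(aborted, reads_from):
    """Return every transaction to roll back when `aborted` aborts.

    reads_from -- dict: writer -> set of transactions that read dirty
                  data written by that writer (hypothetical bookkeeping)
    """
    to_abort = {aborted}
    queue = deque([aborted])
    while queue:
        t = queue.popleft()
        # Anyone who read dirty data from an aborted transaction must
        # also abort, and so must their readers in turn.
        for reader in reads_from.get(t, ()):
            if reader not in to_abort:
                to_abort.add(reader)
                queue.append(reader)
    return to_abort
```

If U read from T and V read from U, then `cascade("T", {"T": {"U"}, "U": {"V"}})` collects T, U, and V, while unrelated transactions are left alone.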

19.1.3 Recoverable Schedules

In order for any of the logging methods we have discussed in Chapter 17 to allow recovery, the set of transactions that are regarded as committed after recovery must be consistent. That is, if a transaction T1 is, after recovery, regarded as committed, and T1 used a value written by T2, then T2 must also remain committed after recovery. Thus, we define:

A schedule is recoverable if each transaction commits only after each transaction from which it has read has committed.

Example 19.3: In this and several subsequent examples of schedules with read- and write-actions, we shall use c_i for the action "transaction T_i commits."

Here is an example of a recoverable schedule:

In schedule S2, T2 must precede T1 in a serial order because of the writing of A, but T1 must precede T2 because of the writing and reading of B.

Finally, observe the following variation on S1, which is serializable but not recoverable:

In schedule S3, T1 precedes T2, but their commitments occur in the wrong order.

If before a crash, the commit record for T2 reached disk, but the commit record for T1 did not, then regardless of whether undo, redo, or undo/redo logging were used, T2 would be committed after recovery, but T1 would not.

In order for recoverable schedules to be truly recoverable under any of the three logging methods, there is one additional assumption we must make regarding schedules:

The log's commit records reach disk in the order in which they are written.

As we observed in Example 19.3 concerning schedule S3, should it be possible for commit records to reach disk in the wrong order, then consistent recovery might be impossible. We return to and exploit this principle in Section 19.1.6.
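The definition of a recoverable schedule can be checked mechanically. The sketch below is one possible Python rendering. The triple encoding (action, transaction, element), the function name, and the two sample schedules are assumptions made for illustration. In both samples T2 reads B after T1 has written it.

```python
def is_recoverable(schedule):
    """Check that each transaction commits only after every transaction
    from which it has read has committed.

    schedule -- list of (op, txn, element) with op in {"r", "w", "c"};
                the element is None for a commit (assumed encoding)
    """
    last_writer = {}    # element -> transaction that wrote it last
    read_from = {}      # reader -> set of writers it has read from
    committed = set()
    for op, t, x in schedule:
        if op == "w":
            last_writer[x] = t
        elif op == "r":
            w = last_writer.get(x)
            if w is not None and w != t:
                read_from.setdefault(t, set()).add(w)
        elif op == "c":
            if not read_from.get(t, set()).issubset(committed):
                return False      # t commits before one of its sources
            committed.add(t)
    return True

# Hypothetical schedules: T2 reads B written by T1.
in_order = [("w", 1, "A"), ("w", 1, "B"), ("w", 2, "A"),
            ("r", 2, "B"), ("c", 1, None), ("c", 2, None)]
wrong_order = in_order[:4] + [("c", 2, None), ("c", 1, None)]
print(is_recoverable(in_order), is_recoverable(wrong_order))  # True False
```

The check looks only at the order of commit actions in the schedule. It relies on the assumption stated above that commit records reach disk in the order in which they are written.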

19.1.4 Schedules That Avoid Cascading Rollback

Recoverable schedules sometimes require cascading rollback. For instance, if after the first four steps of schedule S1 in Example 19.3 T1 had to roll back, it would be necessary to roll back T2 as well. To guarantee the absence of cascading rollback, we need a stronger condition than recoverability. We say that:

A schedule avoids cascading rollback (or "is an ACR schedule") if transactions may read only values written by committed transactions.

Put another way, an ACR schedule forbids the reading of dirty data. As for recoverable schedules, we assume that "committed" means that the log's commit record has reached disk.

Example 19.4: The schedules of Example 19.3 are not ACR. In each case, T2 reads B from the uncommitted transaction T1. However, consider:

Now, T2 reads B only after T1, the transaction that last wrote B, has committed and its log record has been written to disk. Thus, schedule S4 is ACR, as well as recoverable.

Notice that should a transaction such as T2 read a value written by T1 after T1 commits, then surely T2 either commits or aborts after T1 commits. Thus:

Every ACR schedule is recoverable.
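The ACR condition is a slightly stricter scan over the same kind of schedule. The Python sketch below uses an assumed triple encoding (action, transaction, element). The function name and the two sample schedules are hypothetical illustrations in which T1 writes B and T2 reads it.

```python
def is_acr(schedule):
    """Check that every read sees only values written by transactions
    that have already committed (no dirty reads).

    schedule -- list of (op, txn, element) with op in {"r", "w", "c"}
    """
    last_writer = {}    # element -> transaction that wrote it last
    committed = set()
    for op, t, x in schedule:
        if op == "w":
            last_writer[x] = t
        elif op == "r":
            w = last_writer.get(x)
            # Reading one's own write is not a dirty read.
            if w is not None and w != t and w not in committed:
                return False
        elif op == "c":
            committed.add(t)
    return True

# Hypothetical schedules: the first delays r2(B) until T1 has committed.
late_read = [("w", 1, "A"), ("w", 1, "B"), ("w", 2, "A"),
             ("c", 1, None), ("r", 2, "B"), ("c", 2, None)]
early_read = [("w", 1, "A"), ("w", 1, "B"), ("w", 2, "A"),
              ("r", 2, "B"), ("c", 1, None), ("c", 2, None)]
print(is_acr(late_read), is_acr(early_read))  # True False
```

The second schedule commits in a recoverable order but still lets T2 read uncommitted data, which is exactly the gap between recoverable and ACR schedules.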


Our prior discussion applies to schedules that are generated by any kind of scheduler. In the common case that the scheduler is lock-based, there is a simple and commonly used way to guarantee that there are no cascading rollbacks:

Strict Locking: A transaction must not release any exclusive locks (or other locks, such as increment locks that allow values to be changed) until the transaction has either committed or aborted, and the commit or abort log record has been flushed to disk.

A schedule of transactions that follow the strict-locking rule is called a strict schedule. Two important properties of these schedules are:

1. Every strict schedule is ACR. The reason is that a transaction T2 cannot read a value of element X written by T1 until T1 releases any exclusive lock (or similar lock that allows X to be changed). Under strict locking, the release does not occur until after commit.

2. Every strict schedule is serializable. To see why, observe that a strict schedule is equivalent to the serial schedule in which each transaction runs instantaneously at the time it commits.

With these observations, we can now picture the relationships among the different kinds of schedules we have seen so far. The containments are suggested in Fig. 19.3.

Figure 19.3: Containments and noncontainments among classes of schedules

Clearly, in a strict schedule, it is not possible for a transaction to read dirty data, since data written to a buffer by an uncommitted transaction remains locked until the transaction commits. However, we still have the problem of fixing the data in buffers when a transaction aborts, since these changes must have their effects cancelled. How difficult it is to fix buffered data depends on whether database elements are blocks or something smaller. We shall consider each.

Rollback for Blocks

If the lockable database elements are blocks, then there is a simple rollback method that never requires us to use the log. Suppose that a transaction T has obtained an exclusive lock on block A, written a new value for A in a buffer, and then had to abort. Since A has been locked since T wrote its value, no other transaction has read A. It is easy to restore the old value of A provided the following rule is followed:

Blocks written by uncommitted transactions are pinned in main memory; that is, their buffers are not allowed to be written to disk.

In this case, we "roll back" T when it aborts by telling the buffer manager to ignore the value of A. That is, the buffer occupied by A is not written anywhere, and its buffer is added to the pool of available buffers. We can be sure that the value of A on disk is the most recent value written by a committed transaction, which is exactly the value we want A to have.
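A toy model of the pinning rule is sketched below in Python. The `BufferPool` class and its methods are hypothetical. Locking and logging are left out, and the sketch assumes that strict locking already keeps other transactions away from a block while it is dirty.

```python
class BufferPool:
    """Toy buffer manager in which dirty blocks stay pinned until commit."""

    def __init__(self, disk):
        self.disk = disk      # block name -> last committed value
        self.buffers = {}     # block name -> (value, writer), pinned

    def write(self, txn, block, value):
        # The new value lives only in a pinned buffer for now.
        self.buffers[block] = (value, txn)

    def read(self, block):
        if block in self.buffers:
            return self.buffers[block][0]
        return self.disk[block]

    def commit(self, txn):
        # Unpin the writer's blocks and let them reach disk.
        for block, (value, writer) in list(self.buffers.items()):
            if writer == txn:
                self.disk[block] = value
                del self.buffers[block]

    def abort(self, txn):
        # Rollback: drop the buffers; the disk copy was never touched.
        for block, (_, writer) in list(self.buffers.items()):
            if writer == txn:
                del self.buffers[block]
```

After `write("T", "A", 125)` followed by `abort("T")`, a read of A again returns whatever value the disk held, with no use of the log.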

There is also a simple rollback method if we are using a multiversion system as in Sections 18.8.5 and 18.8.6. We must again assume that blocks written by uncommitted transactions are pinned in memory. Then, we simply remove the value of A that was written by T from the list of available values of A. Note that because T was a writing transaction, its value of A was locked from the time the value was written to the time it aborted (assuming the timestamp/lock scheme of Section 18.8.6 is used).

Rollback for Small Database Elements

When lockable database elements are fractions of a block (e.g., tuples or objects), then the simple approach to restoring buffers that have been modified by aborted transactions will not work. The problem is that a buffer may contain data changed by two or more transactions; if one of them aborts, we still must preserve the changes made by the other. We have several choices when we must restore the old value of a small database element A that was written by the transaction that has aborted:

1. We can read the original value of A from the database stored on disk and modify the buffer contents appropriately.

2. If the log is an undo or undo/redo log, then we can obtain the former value from the log itself. The same code used to recover from crashes may be used for "voluntary" rollbacks as well.

3. We can keep a separate main-memory log of the changes made by each transaction, preserved for only the time that transaction is active. The old value can be found from this "log."

None of these approaches is ideal. The first surely involves a disk access. The second (examining the log) might not involve a disk access if the relevant


996 CHAPTER 19. MORE ABOUT TRANSACTION MANAGEMENT

When is a Transaction Really Committed?

The subtlety of group commit reminds us that a completed transaction can be in several different states between when it finishes its work and when it is truly "committed," in the sense that under no circumstances, including the occurrence of a system failure, will the effect of that transaction be lost. As we noted in Chapter 17, it is possible for a transaction to finish its work and even write its COMMIT record to the log in a main-memory buffer, yet have the effect of that transaction lost if there is a system crash and the COMMIT record has not yet reached disk. Moreover, we saw in Section 17.5 that even if the COMMIT record is on disk but not yet backed up in the archive, a media failure can cause the transaction to be undone and its effect to be lost.

In the absence of failure, all these states are equivalent, in the sense that each transaction will surely advance from being finished to having its effects survive even a media failure. However, when we need to take failures and recovery into account, it is important to recognize the differences among these states, which otherwise could all be referred to informally as "committed."

portion of the log is still in a buffer. However, it could also involve extensive examination of portions of the log on disk, searching for the update record that tells the correct former value. The last approach does not require disk accesses, but may consume a large fraction of memory for the main-memory "logs."
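The third choice, a main-memory log kept per active transaction, can be sketched as follows. All names here (`UndoLog`, `write`, the dictionary layout of a block) are assumptions made for illustration, and `None` is used as a marker for "the tuple did not exist before."

```python
from collections import defaultdict


class UndoLog:
    """In-memory log of old values, kept only while a transaction is active."""

    def __init__(self):
        self.entries = defaultdict(list)   # txn -> [(block, key, old value)]

    def record(self, txn, block, key, old):
        self.entries[txn].append((block, key, old))

    def rollback(self, txn, buffers):
        # Restore old values in reverse order of the writes.
        for block, key, old in reversed(self.entries.pop(txn, [])):
            if old is None:                # the element did not exist before
                buffers[block].pop(key, None)
            else:
                buffers[block][key] = old

    def forget(self, txn):
        # Called at commit: the old values are no longer needed.
        self.entries.pop(txn, None)


def write(log, buffers, txn, block, key, value):
    """Write one small element, remembering its former value first."""
    log.record(txn, block, key, buffers[block].get(key))
    buffers[block][key] = value


buffers = {"B1": {"t1": 1, "t2": 2}}       # one block holding two tuples
log = UndoLog()
write(log, buffers, "T1", "B1", "t1", 100)
write(log, buffers, "T2", "B1", "t2", 200)
log.rollback("T1", buffers)                # T2's change to the block survives
```

The memory cost mentioned in the text shows up directly: `entries` grows with every write of every active transaction until `rollback` or `forget` is called.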

19.1.6 Group Commit

Under some circumstances, we can avoid reading dirty data even if we do not flush every commit record on the log to disk immediately. As long as we flush log records in the order that they are written, we can release locks as soon as the commit record is written to the log in a buffer.

Example 19.5: Suppose transaction T1 writes X, finishes, writes its COMMIT record on the log, but the log record remains in a buffer. Even though T1 has not committed in the sense that its commit record can survive a crash, we shall release T1's locks. Then T2 reads X and "commits," but its commit record, which follows that of T1, also remains in a buffer. Since we are flushing log records in the order written, T2 cannot be perceived as committed by a recovery manager (because its commit record reached disk) unless T1 is also perceived as committed. Thus, there are three cases that the recovery manager could find:

1. Neither T1 nor T2 has its commit record on disk. Then both are aborted by the recovery manager, and the fact that T2 read X from an uncommitted transaction is irrelevant.

2. T1 is committed, but T2 is not. There is no problem for two reasons: T2 did not read X from an uncommitted transaction, and it aborted anyway, with no effect on the database.

3. Both are committed. Then the read of X by T2 was not dirty.

On the other hand, suppose that the buffer containing T2's commit record got flushed to disk (say because the buffer manager decided to use the buffer for something else), but the buffer containing T1's commit record did not. If there is a crash at that point, it will look to the recovery manager that T1 did not commit, but T2 did. The effect of T2 will be permanently reflected in the database, but this effect was based on the dirty read of X by T2.

Our conclusion from Example 19.5 is that we can release locks earlier than the time that the transaction's commit record is flushed to disk. This policy, often called group commit, is:

- Do not release locks until the transaction finishes, and the commit log record at least appears in a buffer.

- Flush log blocks in the order that they were created.

Group commit, like the policy of requiring "recoverable schedules" as discussed in Section 19.1.3, guarantees that there is never a read of dirty data.
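The two rules of group commit can be sketched with a toy log whose buffer is flushed strictly as a prefix. The class `GroupCommitLog` and its method names are hypothetical, and only exclusive locks are modeled.

```python
class GroupCommitLog:
    """Toy model of group commit: release locks once COMMIT is buffered."""

    def __init__(self):
        self.buffer = []   # log records not yet on disk, in creation order
        self.disk = []     # log records that have reached disk
        self.locks = {}    # element -> transaction holding its lock

    def lock(self, txn, element):
        holder = self.locks.get(element)
        if holder not in (None, txn):
            raise RuntimeError(f"{element} is locked by {holder}")
        self.locks[element] = txn

    def commit(self, txn):
        # Rule 1: the commit record must at least be in the buffer
        # before any lock of txn is released.
        self.buffer.append(("COMMIT", txn))
        for element in [e for e, t in self.locks.items() if t == txn]:
            del self.locks[element]

    def flush(self, n=None):
        # Rule 2: only a prefix of the buffer is flushed, so records
        # reach disk in the order in which they were created.
        n = len(self.buffer) if n is None else n
        self.disk.extend(self.buffer[:n])
        del self.buffer[:n]

    def committed_after_crash(self):
        # A crash loses the buffer; only records on disk count.
        return [t for kind, t in self.disk if kind == "COMMIT"]


log = GroupCommitLog()
log.lock("T1", "X")
log.commit("T1")       # T1's locks are released; COMMIT is only buffered
log.lock("T2", "X")    # T2 may now read X
log.commit("T2")
log.flush(1)           # a crash here would find T1 committed and T2 not
```

Because `flush` can only move a prefix to disk, the bad case of Example 19.5 (T2's commit record on disk without T1's) cannot arise in this model.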

19.1.7 Logical Logging

We saw in Section 19.1.5 that dirty reads are easier to fix up when the unit of locking is the block or page. However, there are at least two problems presented when database elements are blocks.

1. All logging methods require either the old or new value of a database element, or both, to be recorded in the log. When the change to a block is small, e.g., a rewritten attribute of one tuple, or an inserted or deleted tuple, then there is a great deal of redundant information written on the log.

2. The requirement that the schedule be recoverable, releasing its locks only after commit, can inhibit concurrency severely. For example, recall our discussion in Section 18.7.1 of the advantage of early lock release as we access data through a B-tree index. If we require that locks be held until commit, then this advantage cannot be obtained, and we effectively allow only one writing transaction to access a B-tree at any time.

Both these concerns motivate the use of logical logging, where only the changes to the blocks are described. There are several degrees of complexity, depending on the nature of the change.


1. A small number of bytes of the database element are changed, e.g., the update of a fixed-length field. This situation can be handled in a straightforward way, where we record only the changed bytes and their positions. Example 19.6 will show this situation and an appropriate form of update record.

2. The change to the database element is simply described, and easily restored, but it has the effect of changing most or all of the bytes in the database element. One common situation, discussed in Example 19.7, is when a variable-length field is changed and much of its record, and even other records, must slide within the block. The new and old values of the block look very different unless we realize and indicate the simple cause.

3. ... Example 19.8, take up the matter of B-trees, a logical structure represented by database elements that are disk blocks, to illustrate this complex form of logical logging.

Example 19.6: Suppose database elements are blocks that each contain a set of tuples from some relation. We can express the update of an attribute by a log record that says something like "tuple t had its attribute a changed from value v1 to v2." An insertion of a new tuple into empty space on the block can be expressed as "a tuple t with value (a1, a2, ..., ak) was inserted beginning at offset position p." Unless the attribute changed or the tuple inserted are comparable in size to a block, the amount of space taken by these records will be much smaller than the entire block. Moreover, they serve for both undo and redo operations.

Notice that both these operations are idempotent; if you perform them several times on a block, the result is the same as performing them once. Likewise, their implied inverses, where the value of t[a] is restored from v2 back to v1, or the tuple t is removed, are also idempotent. Thus, records of these types can be used for recovery in exactly the same way that update log records were used throughout Chapter 17. □
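A record of the first kind ("tuple t had its attribute a changed from v1 to v2") might look like the sketch below. The class name `AttrUpdate` and the representation of a block as a dictionary of tuples are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttrUpdate:
    """Logical log record: tuple_id had attr changed from old to new."""
    tuple_id: str
    attr: str
    old: object
    new: object

    def redo(self, block):
        # Assigning the new value is idempotent: repeating it is harmless.
        block[self.tuple_id][self.attr] = self.new

    def undo(self, block):
        # The implied inverse is idempotent for the same reason.
        block[self.tuple_id][self.attr] = self.old


block = {"t": {"a": 1, "b": "x"}}      # a block holding one tuple
rec = AttrUpdate("t", "a", old=1, new=2)
rec.redo(block)
rec.redo(block)                        # same result as a single redo
```

Idempotence holds here because both operations assign a value recorded in the log; a record that said "add 1 to t[a]" would not have this property.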

Example 19.7: Again assume database elements are blocks holding tuples, but the tuples have some variable-length fields. If a change to a field such as was described in Example 19.6 occurs, we may have to slide large portions of the block to make room for a longer field, or to preserve space if a field becomes smaller. In extreme cases, we could have to create an overflow block (recall Section 12.5) to hold part of the contents of the original block, or we could remove an overflow block if a shorter field allows us to combine the contents of two blocks into one.

As long as the block and its overflow block(s) are considered part of one database element, then it is straightforward to use the old and/or new value of the changed field to undo or redo the change. However, the block-plus-overflow-block(s) must be thought of as holding certain tuples at a "logical" level. We may not even be able to restore the bytes of these blocks to their original state after an undo or redo, because there may have been reorganization of the blocks due to other changes that varied the length of other fields. However, if we think of a database element as being a collection of blocks that together represent certain tuples, then a redo or undo can indeed restore the logical "state" of the element. □

However, it may not be possible, as we suggested in Example 19.7, to treat blocks as expandable through the mechanism of overflow blocks. We may thus be able to undo or redo actions only at a level higher than blocks. The next example discusses the important case of B-tree indexes, where the management of blocks does not permit overflow blocks, and we must think of undo and redo as occurring at the logical level of the B-tree itself, rather than the blocks.

Example 19.8: Let us consider the problem of logical logging for B-tree nodes. Instead of writing the old and/or new value of an entire node (block) on the log, we write a short record that describes the change. These changes include:

1. Insertion or deletion of a key/pointer pair for a child.

2. Change of the key associated with a pointer.

3. Splitting or merging of nodes.

Each of these changes can be indicated with a short log record. Even the splitting operation requires only telling where the split occurs, and where the new nodes are. Likewise, merging requires only a reference to the nodes involved, since the manner of merging is determined by the B-tree management algorithms used.

Using logical update records of these types allows us to release locks earlier than would otherwise be required for a recoverable schedule. The reason is that dirty reads of B-tree blocks are never a problem for the transaction that reads them, provided its only purpose is to use the B-tree to locate the data the transaction needs to access.

For instance, suppose that transaction T reads a leaf node N, but the transaction U that last wrote N later aborts, and some change made to N (e.g., the insertion of a new key/pointer pair into N due to an insertion of a tuple by U) needs to be undone. If T has also inserted a key/pointer pair into N, then it is not possible to restore N to the way it was before U modified it. However, the effect of U on N can be undone; in this example we would delete the key/pointer pair that U had inserted. The resulting N is not the same as that which existed before U operated; it has the insertion made by T. However, there is no database inconsistency, since the B-tree as a whole continues to reflect only the changes made by committed transactions. That is, we have restored the B-tree at a logical level, but not at the physical level.

If the logical actions are idempotent, i.e., they can be repeated any number of times without harm, then we can recover easily using a logical log. For instance, we discussed in Example 19.6 how a tuple insertion could be represented in the logical log by the tuple and the place within a block where the tuple was placed. If we write that tuple in the same place two or more times, then it is as if we had written it once. Thus, when recovering, should we need to redo a transaction that inserted a tuple, we can repeat the insertion into the proper block at the proper place, without worrying whether we had already inserted that tuple.

In contrast, consider a situation where tuples can move around within blocks or between blocks, as in Examples 19.7 and 19.8. Now, we cannot associate a particular place into which a tuple is to be inserted; the best we can do is place in the log an action such as "the tuple t was inserted somewhere on block B." If we need to redo the insertion of t during recovery, we may wind up with two copies of t in block B. Worse, we may not know whether the block B with the first copy of t made it to disk. Another transaction writing to another database element on block B may have caused a copy of B to be written to disk, for example.

To disambiguate situations such as this when we recover using a logical log, a technique called log sequence numbers has been developed.

- Each log record is given a number one greater than that of the previous log record.¹ Thus, a typical logical log record has the form <L, T, A, B>, where:

  - L is the log sequence number, an integer.

  - T is the transaction involved.

  - A is the action performed by T, e.g., "insert of tuple t."

  - B is the block on which the action was performed.

- For each action, there is a compensating action that logically undoes the action. As discussed in Example 19.8, the compensating action may not restore the database to exactly the same state S it would have been in had the action never occurred, but it restores the database to a state that is logically equivalent to S. For instance, the compensating action for "insert tuple t" is "delete tuple t."

¹ Eventually the log sequence numbers must restart at 0, but the time between restarts of the sequence is so large that no ambiguity can occur.

19.1 SERIALIZABILITY AND RECOVERABILITY

- If a transaction T aborts, then for each action performed on the database by T, the compensating action is performed, and the fact that this action was performed is also recorded in the log.

- Each block maintains, in its header, the log sequence number of the last action that affected that block.

Suppose now that we need to use the logical log to recover after a crash. Here is an outline of the steps to take.

1. Our first step is to reconstruct the state of the database at the time of the crash, including blocks whose current values were in buffers and therefore got lost. To do so:

(a) Find the most recent checkpoint on the log, and determine from it the set of transactions that were active at that time.

(b) For each log entry <L, T, A, B>, compare the log sequence number N on block B with the log sequence number L for this log record. If N < L, then redo action A; that action was never performed on block B. However, if N ≥ L, then do nothing; the effect of A was already felt by B.

(c) For each log entry that informs us that a transaction T started, committed, or aborted, adjust the set of active transactions accordingly.

2. The set of transactions that remain active when we reach the end of the log must be aborted. To do so:

(a) Scan the log again, this time from the end back to the previous checkpoint. Each time we encounter a record <L, T, A, B> for a transaction T that must be aborted, perform the compensating action for A on block B, and record in the log the fact that that compensating action was performed.

(b) If we must abort a transaction that began prior to the most recent checkpoint (i.e., that transaction was on the active list for the checkpoint), then continue back in the log until the start-records for all such transactions have been found.

(c) Write abort-records in the log for each of the transactions we had to abort.
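The outline above can be sketched in a few lines. This is a simplified illustration, not a full recovery manager. The record layout, the `recover` function, and the restriction to insert/delete actions on a set of tuples are assumptions of the sketch. It also assumes the log passed in reaches back far enough to contain every action of the transactions that must be aborted, so step 2(b) needs no extra scan.

```python
def apply(action, block):
    """Perform a logical action ("insert" or "delete", tuple) on a block."""
    kind, t = action
    if kind == "insert":
        block["tuples"].add(t)
    else:
        block["tuples"].discard(t)


def compensate(action):
    """The compensating action: delete undoes insert and vice versa."""
    kind, t = action
    return ("delete" if kind == "insert" else "insert", t)


def recover(log, blocks, active_at_checkpoint=()):
    """Redo by LSN comparison, then undo the still-active transactions.

    log:    list of ("START"|"COMMIT"|"ABORT", T) or (L, T, action, B)
    blocks: block id -> {"lsn": N, "tuples": set}, as found on disk
    Returns the log extended with compensation and abort records.
    """
    active = set(active_at_checkpoint)
    next_lsn = 1 + max((r[0] for r in log if isinstance(r[0], int)), default=0)

    # Step 1: reconstruct the state at the time of the crash.
    for rec in log:
        if rec[0] == "START":
            active.add(rec[1])
        elif rec[0] in ("COMMIT", "ABORT"):
            active.discard(rec[1])
        else:
            lsn, _, action, b = rec
            if blocks[b]["lsn"] < lsn:      # N < L: action never reached B
                apply(action, blocks[b])
                blocks[b]["lsn"] = lsn

    # Step 2: scan backward, compensating actions of active transactions.
    appended = []
    for rec in reversed(log):
        if isinstance(rec[0], int) and rec[1] in active:
            _, txn, action, b = rec
            comp = compensate(action)
            apply(comp, blocks[b])
            blocks[b]["lsn"] = next_lsn
            appended.append((next_lsn, txn, comp, b))
            next_lsn += 1
    appended.extend(("ABORT", t) for t in sorted(active))
    return log + appended


log = [("START", "T1"), (1, "T1", ("insert", "t"), "B"), ("COMMIT", "T1"),
       ("START", "T2"), (2, "T2", ("insert", "u"), "B")]
blocks = {"B": {"lsn": 1, "tuples": {"t"}}}   # only LSN 1 had reached disk
new_log = recover(log, blocks)
```

In this run the insertion with sequence number 2 is redone because block B carries number 1, and it is then compensated because T2 never committed.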

* Exercise 19.1.1: Consider all ways to insert locks (of a single type only, as in Section 18.3) into the sequence of actions

so that the transaction T1 is:

a) Two-phase locked, and strict.

b) Two-phase locked, but not strict.

Exercise 19.1.2: Suppose that each of the sequences of actions below is followed by an abort action for transaction T1. Tell which transactions need to be rolled back.

* a) r1(A); r2(B); w1(B); w2(C); r3(B); r3(C); w3(D);

b) r1(A); w1(B); r2(B); w2(C); r3(C); w3(D);

c) r2(A); r3(A); r1(A); w1(B); r2(B); r3(B); w2(C); r3(C);

d) r2(A); r3(A); r1(A); w1(B); r3(B); w2(C); r3(C);

Exercise 19.1.3: Consider each of the sequences of actions in Exercise 19.1.2, but now suppose that all three transactions commit and write their commit record on the log immediately after their last action. However, a crash occurs, and a tail of the log was not written to disk before the crash and is therefore lost. Tell, depending on where the lost tail of the log begins:

i. What transactions could be considered uncommitted?

ii. Are any dirty reads created during the recovery process? If so, what transactions need to be rolled back?

iii. What additional dirty reads could have been created if the portion of the log lost was not a tail, but rather some portions in the middle?

! Exercise 19.1.4: Consider the following two transactions:

T1: w1(A); w1(B); r1(C); c1;

T2: w2(A); r2(B); w2(C); c2;

* a) How many schedules of T1 and T2 are recoverable?

b) Of these, how many are ACR schedules?

c) How many are both recoverable and serializable?

d) How many are both ACR and serializable?

Exercise 19.1.5: Give an example of an ACR schedule with shared and exclusive locks that is not strict.

19.2 View Serializability

Recall our discussion in Section 18.1.4 of how our true goal in the design of a scheduler is to allow only schedules that are serializable. We also saw how differences in what operations transactions apply to the data can affect whether or not a given schedule is serializable. We also learned in Section 18.2 that schedulers normally enforce "conflict-serializability," which guarantees serializability regardless of what the transactions do with their data.

However, there are weaker conditions than conflict-serializability that also guarantee serializability. In this section we shall consider one such condition, called "view-serializability." Intuitively, view-serializability considers all the connections between transactions T and U such that T writes a database element whose value U reads. The key difference between view- and conflict-serializability appears when a transaction T writes a value A that no other transaction reads (because some other transaction later writes its own value for A). In that case, the wT(A) action can be placed in certain other positions of the schedule (where A is likewise never read) that would not be permitted under the definition of conflict-serializability. In this section, we shall define view-serializability precisely and give a test for it.

19.2.1 View Equivalence

Suppose we have two schedules S1 and S2 of the same set of transactions. Imagine that there is a hypothetical transaction T0 that wrote initial values for each database element read by any transaction in the schedules, and another hypothetical transaction Tf that reads every element written by one or more transactions after each schedule ends. Then for every read action ri(A) in one of the schedules, we can find the write action wj(A) that most closely preceded the read in question. We say Tj is the source of the read action ri(A). Note that transaction Tj could be the hypothetical initial transaction T0, and Ti could be Tf.

If for every read action in one of the schedules, its source is the same in the other schedule, we say that S1 and S2 are view-equivalent. Surely, view-equivalent schedules are truly equivalent; they each do the same when executed on any one database state. If a schedule S is view-equivalent to a serial schedule, we say S is view-serializable.

Example 19.9: Consider the schedule S defined by:

T1:              r1(A)         w1(B)
T2: r2(B) w2(A)                       w2(B)
T3:                     r3(A)                w3(B)

Notice that we have separated the actions of each transaction vertically, to indicate better which transaction does what; you should read the schedule from left-to-right, as usual.

In S, both T1 and T2 write values of B that are lost; only the value of B written by T3 survives to the end of the schedule and is "read" by the hypothetical transaction Tf. S is not conflict-serializable. To see why, first note that T2 writes A before T1 reads A, so T2 must precede T1 in a hypothetical conflict-equivalent serial schedule. The fact that the action w1(B) precedes w2(B) also forces T1 to precede T2 in any conflict-equivalent serial schedule. Yet neither w1(B) nor w2(B) has any long-term effect on the database. It is these sorts of irrelevant writes that view-serializability is able to ignore, when determining the true constraints on an equivalent serial schedule.

More precisely, let us consider the sources of all the reads in S:

1. The source of r2(B) is T0, since there is no prior write of B in S.

2. The source of r1(A) is T2, since T2 most recently wrote A before the read.

3. Likewise, the source of r3(A) is T2.

4. The source of the hypothetical read of A by Tf is T2.

5. The source of the hypothetical read of B by Tf is T3, the last writer of B.

Of course, T0 appears before all real transactions in any schedule, and Tf appears after all transactions. If we order the real transactions (T2, T1, T3), then the sources of all reads are the same as in schedule S. That is, T2 reads B, and surely T0 is the previous "writer." T1 reads A, but T2 already wrote A, so the source of r1(A) is T2, as in S. T3 also reads A, but since the prior T2 wrote A, that is the source of r3(A), as in S. Finally, the hypothetical Tf reads A and B, but the last writers of A and B in the schedule (T2, T1, T3) are T2 and T3 respectively, also as in S. We conclude that S is a view-serializable schedule, and the schedule represented by the order (T2, T1, T3) is a view-equivalent schedule.
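The definition of view equivalence translates directly into a comparison of read sources. The sketch below is illustrative only. The function names are hypothetical. The interleaving chosen for S is an assumption that is consistent with the five sources listed in Example 19.9. Each transaction is assumed to read a given element at most once.

```python
from itertools import permutations


def sources(schedule):
    """Return the set of (reader, element, source) triples of a schedule.

    schedule is a list of (transaction, "r" or "w", element).  "T0" stands
    for the hypothetical initial writer, "Tf" for the hypothetical final
    reader of every element written by some transaction.
    """
    last_writer = {}
    result = set()
    for txn, op, x in schedule:
        if op == "r":
            result.add((txn, x, last_writer.get(x, "T0")))
        else:
            last_writer[x] = txn
    for x, writer in last_writer.items():
        result.add(("Tf", x, writer))
    return result


def serial(schedule, order):
    """The serial schedule that runs the transactions in the given order."""
    return [a for t in order for a in schedule if a[0] == t]


def view_equivalent(s1, s2):
    return sources(s1) == sources(s2)


# Assumed interleaving for the schedule S of Example 19.9.
S = [("T2", "r", "B"), ("T2", "w", "A"), ("T1", "r", "A"), ("T3", "r", "A"),
     ("T1", "w", "B"), ("T2", "w", "B"), ("T3", "w", "B")]

equivalent_orders = [order for order in permutations(["T1", "T2", "T3"])
                     if view_equivalent(S, serial(S, order))]
```

Trying every serial order is exponential in the number of transactions, so this brute-force check is useful only for small examples such as this one.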

19.2.2 Polygraphs and the Test for View-Serializability

There is a generalization of the precedence graph, which we used to test conflict serializability in Section 18.2.2, that reflects all the precedence constraints required by the definition of view serializability. We define the polygraph for a schedule to consist of the following:

1. A node for each transaction and additional nodes for the hypothetical transactions T0 and Tf.

2. For each action ri(X) with source Tj, place an arc from Tj to Ti.

3. Suppose Tj is the source of a read ri(X), and Tk is another writer of X. It is not allowed for Tk to intervene between Tj and Ti, so it must appear either before Tj or after Ti. We represent this condition by an arc pair (shown dashed) from Tk to Tj and from Ti to Tk. Intuitively, one or the other of an arc pair is "real," but we don't care which, and when we try to make the polygraph acyclic, we can pick whichever of the pair helps to make it acyclic. However, there are important special cases where the arc pair becomes a single arc:

(a) If Tj is T0, then it is not possible for Tk to appear before Tj, so we use an arc Ti → Tk in place of the arc pair.

(b) If Ti is Tf, then Tk cannot follow Ti, so we use an arc Tk → Tj in place of the arc pair.

Figure 19.4: Beginning of polygraph for Example 19.10

Example 19.10: Consider the schedule S from Example 19.9. We show in Fig. 19.4 the beginning of the polygraph for S, where only the nodes and the arcs from rule (2) have been placed. We have also indicated the database element causing each arc. That is, A is passed from T2 to T1, T3, and Tf, while B is passed from T0 to T2 and from T3 to Tf.

Now, we must consider what transactions might interfere with each of these five connections by writing the same element between them. These potential interferences are ruled out by the arc pairs from rule (3), although as we shall see, in this example each of the arc pairs involves a special case and becomes a single arc.

Consider the arc T2 → T1 based on element A. The only writers of A are T0 and T2, and neither of them can get in the middle of this arc, since T0 cannot move its position, and T2 is already an end of the arc. Thus, no additional arcs are needed. A similar argument tells us no additional arcs are needed to keep writers of A outside the arcs T2 → T3 and T2 → Tf.

Now consider the arcs based on B. Note that T0, T1, T2, and T3 all write B. Consider the arc T0 → T2 first. T1 and T3 are other writers of B; T0 and T2 also write B, but as we saw, the arc ends cannot cause interference, so we need not consider them. As we cannot place T1 between T0 and T2, in principle we need the arc pair (T1 → T0, T2 → T1). However, nothing can precede T0, so the option T1 → T0 is not possible. We may in this special case just add the arc T2 → T1 to the polygraph. But this arc is already there because of A, so in effect, we make no change to the polygraph to keep T1 outside the arc T0 → T2.

We also cannot place T3 between T0 and T2. Similar reasoning tells us to add the arc T2 → T3, rather than an arc pair. However, this arc too is already in the polygraph because of A, so we make no change.

Next, consider the arc T3 → Tf. Since T0, T1, and T2 are other writers of B, we must keep them each outside the arc. T0 cannot be moved between T3 and Tf, but T1 or T2 could. Since neither could be moved after Tf, we must constrain T1 and T2 to appear before T3. There is already an arc T2 → T3, but we must add to the polygraph the arc T1 → T3. This change is the only arc we must add to the polygraph, whose final set of arcs is shown in Fig. 19.5.

Figure 19.5: Complete polygraph for Example 19.10

Example 19.11: In Example 19.10, all the arc pairs turned out to be single arcs as a special case. Figure 19.6 is an example of a schedule of four transactions where there is a true arc pair in the polygraph.

be added. As we saw in Example 19.10, there are several simplifications we can make. When avoiding interference with the arc Tj → Ti, the only transactions that need be considered as Tk (the transaction that cannot be in the middle) are:

- Writers of an element that caused this arc Tj → Ti,

- But not T0 or Tf, which can never be Tk, and

- Not Ti or Tj, the ends of the arc itself.

With these rules in mind, let us consider the arcs due to database element A, which is written by T0, T3, and T4. We need not consider T0 at all. T3 must not get between T4 → Tf, so we add arc T3 → T4; remember that the other arc in the pair, Tf → T3, is not an option. Likewise, T3 must not get between T0 → T1 or T0 → T2, which results in the arcs T1 → T3 and T2 → T3.

Figure 19.7: Beginning of polygraph for Example 19.11

Now, consider the fact that T4 also must not get in the middle of an arc due to A. It is an end of T4 → Tf, so that arc is irrelevant. T4 must not get between T0 → T1 or T0 → T2, which results in the arcs T1 → T4 and T2 → T4.

Next, let us consider the arcs due to B, which is written by T0, T1, and T4. Again we need not consider T0. The only arcs due to B are T1 → T2, T1 → T4, and T4 → Tf. T1 cannot get in the middle of the first two, but the third requires arc T1 → T4.

T4 can get in the middle of T1 → T2 only. This arc has neither end at T0 or Tf, so it really requires an arc pair: (T4 → T1, T2 → T4). We show this arc pair, as well as all the other arcs added, in Fig. 19.8.

Next, consider the writers of C: T0 and T1. As before, T0 cannot present a problem. Also, T1 is part of every arc due to C, so it cannot get in the middle. Similarly, D is written only by T0 and T2, so we can determine that no more arcs are necessary. The final polygraph is thus the one in Fig. 19.8.

19.2.3 Testing for View-Serializability

Since we must choose only one of each arc pair, we can find an equivalent serial order for schedule S if and only if there is some selection from each arc pair that turns S's polygraph into an acyclic graph. To see why, notice that if there is such an acyclic graph, then any topological sort of the graph gives an order in which no writer may appear between a reader and its source, and every writer appears before its readers. Thus, the reader-source connections in the serial order are exactly the same as in S; the two schedules are view-equivalent, and therefore S is view-serializable.

Figure 19.8: Complete polygraph for Example 19.11

Conversely, if S is view-serializable, then there is a view-equivalent serial order S'. Every arc pair (Tk → Tj, Ti → Tk) in S's polygraph must have Tk either before Tj or after Ti in S'; otherwise the writing by Tk breaks the connection from Tj to Ti, which means that S and S' are not view-equivalent. Likewise, every arc in the polygraph must be respected by the transaction order of S'. We conclude that there is a choice of arcs from each arc pair that makes the polygraph into a graph for which the serial order S' is consistent with each arc of the graph. Thus, this graph is acyclic.

Example 19.12: Consider the polygraph of Fig. 19.5. It is already a graph, and it is acyclic. The only topological order is (T2, T1, T3), which is therefore a view-equivalent serial order for the schedule of Example 19.10.

Now consider the polygraph of Fig. 19.8. We must consider each choice from the one arc pair. If we choose T4 → T1, then there is a cycle. However, if we choose T2 → T4, the result is an acyclic graph. The sole topological order for this graph is (T1, T2, T3, T4). This order yields a view-equivalent serial order and shows that the original schedule is view-serializable. □
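The polygraph rules and the acyclicity test can be sketched as follows. The function names and the schedule representation are assumptions of this sketch, the interleaving used for the schedule of Example 19.9 is likewise assumed, and the search simply tries every selection from the arc pairs, which is exponential in the number of pairs.

```python
from itertools import product


def polygraph(schedule):
    """Build (arcs, arc_pairs) from a list of (txn, "r"/"w", element)."""
    writers, last, reads = {}, {}, []
    for t, op, x in schedule:
        if op == "w":
            writers.setdefault(x, set()).add(t)
            last[x] = t
        else:
            reads.append((last.get(x, "T0"), t, x))      # (source, reader, X)
    reads += [(w, "Tf", x) for x, w in last.items()]     # final reads by Tf
    arcs, pairs = set(), []
    for src, reader, x in reads:
        if src == reader:          # a transaction reading its own write
            continue
        arcs.add((src, reader))                          # rule (2)
        for k in writers.get(x, set()) - {src, reader}:  # rule (3)
            if src == "T0":
                arcs.add((reader, k))                    # special case (a)
            elif reader == "Tf":
                arcs.add((k, src))                       # special case (b)
            else:
                pairs.append(((k, src), (reader, k)))    # a true arc pair
    return arcs, pairs


def acyclic(arcs):
    """Repeatedly remove nodes of in-degree 0; acyclic iff all get removed."""
    nodes = {n for arc in arcs for n in arc}
    indegree = {n: 0 for n in nodes}
    for _, v in arcs:
        indegree[v] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for a, b in arcs:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return removed == len(nodes)


def view_serializable(schedule):
    arcs, pairs = polygraph(schedule)
    # Pick one arc from every pair; succeed if any selection is acyclic.
    return any(acyclic(arcs | set(choice)) for choice in product(*pairs))


# Assumed interleaving for the schedule of Example 19.9.
S = [("T2", "r", "B"), ("T2", "w", "A"), ("T1", "r", "A"), ("T3", "r", "A"),
     ("T1", "w", "B"), ("T2", "w", "B"), ("T3", "w", "B")]
arcs, pairs = polygraph(S)
```

For this schedule every arc pair collapses to a single arc, matching Example 19.10, so `pairs` is empty and the test reduces to one acyclicity check.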

Exercise 19.2.1: Draw the polygraph and find all view-equivalent serial orders for the following schedules:

- In Section 18.4.3 we saw how the ability to upgrade locks from shared to exclusive can cause a deadlock, because each transaction holds a shared lock on the same element and wants to upgrade the lock.

There are two broad approaches to dealing with deadlock. We can detect deadlocks and fix them, or we can manage transactions in such a way that deadlocks are never able to form.

19.3.1 Deadlock Detection by Timeout

When a deadlock exists, it is generally impossible to repair the situation so that all transactions involved can proceed. Thus, at least one of the transactions will have to be rolled back: aborted and restarted.

The simplest way to detect and resolve deadlocks is with a timeout. Put a limit on how long a transaction may be active, and if a transaction exceeds this time, roll it back. For example, in a simple transaction system, where typical transactions execute in milliseconds, a timeout of one minute would affect only transactions that are caught in a deadlock. If some transactions are more complex, however, we might want the timeout to occur after a longer interval.

Notice that when one transaction involved in the deadlock times out, it releases its locks or other resources. Thus, there is a chance that the other transactions involved in the deadlock will complete before reaching their timeout limits. However, since transactions involved in a deadlock are likely to have started at approximately the same time (or else, one would have completed before another started), it is also possible that spurious timeouts of transactions that are no longer involved in a deadlock will occur.
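
The timeout rule can be sketched in a few lines of Python. This is only an illustration under assumptions of our own: the function name `find_timed_out`, the dictionary of start times, and the sample numbers are hypothetical and do not come from any particular system.

```python
def find_timed_out(start_times, now, timeout):
    """Return the transactions active longer than `timeout` seconds, oldest first.

    start_times maps a (hypothetical) transaction name to its start time.
    """
    expired = [(start, txn) for txn, start in start_times.items()
               if now - start > timeout]
    return [txn for start, txn in sorted(expired)]

# Two deadlocked transactions that started half a second apart, and a newcomer.
active = {"T1": 0.0, "T2": 0.5, "T3": 59.0}
victims = find_timed_out(active, now=61.0, timeout=60.0)
```

Here both T1 and T2 exceed the limit at the same check, even though rolling back T1 alone might have let T2 finish. That is the spurious-timeout effect described above.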

19.3.2 The Waits-For Graph

Deadlocks that are caused by transactions waiting for locks held by another can be addressed by a waits-for graph, indicating which transactions are waiting for locks held by another transaction. This graph can be used either to detect deadlocks after they have formed or to prevent deadlocks from ever forming. We shall assume the latter, which requires us to maintain the waits-for graph at all times, refusing to allow an action that creates a cycle in the graph.

Recall from Section 18.5.2 that a lock table maintains for each database element X a list of the transactions that are waiting for locks on X, as well as transactions that currently hold locks on X. The waits-for graph has a node for each transaction that currently holds a lock or is waiting for one. There is an arc from node (transaction) T to node U if there is some database element A such that:

1. U holds a lock on A,

2. T is waiting for a lock on A, and

3. T cannot get a lock on A in its desired mode unless U first releases its lock on A. [3]

If there are no cycles in the waits-for graph, then each transaction can eventually complete. There will be at least one transaction waiting for no other transaction, and this transaction surely can complete. At that time, there will be at least one other transaction that is not waiting, which can complete, and so on.

However, if there is a cycle, then no transaction in the cycle can ever make progress, so there is a deadlock. Thus, a strategy for deadlock avoidance is to roll back any transaction that makes a request that would cause a cycle in the waits-for graph.

Example 19.13: Suppose we have the following four transactions, each of which reads one element and writes another:

[3] In common situations, such as shared and exclusive locks, every waiting transaction will have to wait until all current lock holders release their locks, but there are examples of systems of lock modes where a transaction can get its lock after only some of the current locks are released; see Exercise 19.3.6.

We use a simple locking system with only one lock mode, although the same effect would be noted if we were to use a shared/exclusive system and took locks in the appropriate mode: shared for a read and exclusive for a write.

5) l2(A); Denied

8) l1(B); Denied

Figure 19.9: Beginning of a schedule with a deadlock

Figure 19.9 shows the beginning of a schedule of these four transactions. In the first four steps, each transaction obtains a lock on the element it wants to read. At step (5), T2 tries to lock A, but the request is denied because T1 already has a lock on A. Thus, T2 waits for T1, and we draw an arc from the node for T2 to the node for T1.

Figure 19.10: Waits-for graph after step (7) of Fig. 19.9

Similarly, at step (6) T3 is denied a lock on C because of T2, and at step (7) T4 is denied a lock on A because of T1. The waits-for graph at this point is as shown in Fig. 19.10. There is no cycle in this graph.

At step (8), T1 must wait for the lock on B held by T3. If we allowed T1 to wait, then there would be a cycle in the waits-for graph involving T1, T2, and T3, as suggested by Fig. 19.11. Since they are each waiting for another to finish, none can make progress, and therefore there is a deadlock involving these three transactions. Incidentally, T4 could not finish either, although it is not in the cycle, because T4's progress depends on T1 making progress.

CHAPTER 19. MORE ABOUT TRANSACTION MANAGEMENT

Figure 19.11: Waits-for graph with a cycle caused by step (8) of Fig. 19.9

Figure 19.12: Waits-for graph after T1 is rolled back

Since we roll back any transaction that would cause a cycle, T1 must be rolled back, yielding the waits-for graph of Fig. 19.12. T1 relinquishes its lock on A, which may be given to either T2 or T4. Suppose it is given to T2. Then T2 can complete, whereupon it relinquishes its locks on A and C. Now T3, which needs a lock on C, and T4, which needs a lock on A, can both complete. At some time, T1 is restarted, but it cannot get locks on A and B until T2, T3, and T4 have completed.
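
The avoidance policy of this section can be sketched as follows. This is a minimal Python illustration, not a description of how any real lock manager is built; the class `WaitsForGraph` and its method names are our own assumptions. Before adding an arc from a waiting transaction to a lock holder, we check whether the holder can already reach the waiter, since that is exactly the case in which the new arc would close a cycle.

```python
class WaitsForGraph:
    """Waits-for graph that refuses any arc that would create a cycle."""

    def __init__(self):
        self.arcs = {}  # transaction -> set of transactions it waits for

    def _reaches(self, start, target):
        # Depth-first search along existing arcs from start toward target.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.arcs.get(node, ()))
        return False

    def request_wait(self, waiter, holder):
        """Add arc waiter -> holder; return False if the waiter must roll back."""
        if self._reaches(holder, waiter):
            return False  # the arc would close a cycle
        self.arcs.setdefault(waiter, set()).add(holder)
        return True

    def remove(self, txn):
        """Drop a transaction that rolled back or finished."""
        self.arcs.pop(txn, None)
        for targets in self.arcs.values():
            targets.discard(txn)

# Replay the waits of Example 19.13: steps (5), (6), (7), and (8).
g = WaitsForGraph()
granted = [g.request_wait(w, h)
           for w, h in [("T2", "T1"), ("T3", "T2"), ("T4", "T1"), ("T1", "T3")]]
g.remove("T1")  # T1 is rolled back and releases its lock on A
```

The last request, T1 waiting for T3, is the one refused, and after T1 is removed the only remaining arc is from T3 to T2, as in Fig. 19.12.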

19.3.3 Deadlock Prevention by Ordering Elements

Now let us consider several more methods for deadlock prevention. The first requires us to order database elements in some arbitrary but fixed order. For instance, if database elements are blocks, we could order them lexicographically by their physical address. Recall from Section 8.3.4 that the physical address of a block is normally represented by a sequence of bytes describing its location within the storage system.

If every transaction is required to request locks on elements in order (a condition that is not realistic in most applications), then there can be no deadlock due to transactions waiting for locks. For suppose $T_2$ is waiting for a lock on $A_1$ held by $T_1$; $T_3$ is waiting for a lock on $A_2$ held by $T_2$, and so on, while $T_n$ is waiting for a lock on $A_{n-1}$ held by $T_{n-1}$, and $T_1$ is waiting for a lock on $A_n$ held by $T_n$. Since $T_2$ has a lock on $A_2$ but is waiting for $A_1$, it must be that $A_2 < A_1$ in the order of elements. Similarly, $A_i < A_{i-1}$ for $i = 3, 4, \ldots, n$. But since $T_1$ has a lock on $A_1$ while it is waiting for $A_n$, it also follows that $A_1 < A_n$. We now have $A_1 < A_n < A_{n-1} < \cdots < A_2 < A_1$, which is impossible, since it implies $A_1 < A_1$.

Example 19.14: Let us suppose elements are ordered alphabetically. Then if the four transactions of Example 19.13 are to lock elements in alphabetical order, T2 and T4 must be rewritten to lock elements in the opposite order. Thus, the four transactions are now:

Figure 19.13 shows what happens if the transactions execute with the same timing as Fig. 19.9. T1 begins and gets a lock on A. T2 tries to begin next by getting a lock on A, but must wait for T1. Then T3 begins by getting a lock on B, but T4 is unable to begin because it too needs a lock on A, for which it must wait.

Figure 19.13: Locking elements in alphabetical order prevents deadlock

Since T2 is stalled, it cannot proceed, and following the order of events in Fig. 19.9, T3 gets a turn next. It is able to get its lock on C, whereupon it completes at step (6). Now, with T3's locks on B and C released, T1 is able to complete, which it does at step (8). At this point, the lock on A becomes available, and we suppose that it is given on a first-come-first-served basis to T2. Then, T2 can get both locks that it needs and completes at step (11). Finally, T4 can get its locks and completes.
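
To make the ordering rule concrete, here is a small Python sketch. The helper names `lock_order` and `violates_order` are hypothetical. The lock sequences are inferred from the discussion of Examples 19.13 and 19.14, and T4's first element, called D here, is an assumption, since the text only says that T4 later needs A.

```python
def lock_order(elements):
    """Return the elements in the fixed global order (alphabetical here)."""
    return sorted(set(elements))

def violates_order(requests):
    """True if some lock request names an element that precedes the previous one."""
    return any(later < earlier for earlier, later in zip(requests, requests[1:]))

# Lock requests per transaction, in the order each transaction issues them.
# "D" for T4 is an assumed name, not taken from the text.
original = {"T1": ["A", "B"], "T2": ["C", "A"], "T3": ["B", "C"], "T4": ["D", "A"]}

out_of_order = sorted(t for t, req in original.items() if violates_order(req))
rewritten = {t: lock_order(req) for t, req in original.items()}
```

Under these assumptions, T2 and T4 are the two transactions that must be rewritten, which agrees with Example 19.14.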

19.3.4 Detecting Deadlocks by Timestamps

We can detect deadlocks by maintaining the waits-for graph, as we discussed in Section 19.3.2. However, this graph can be large, and analyzing it for cycles each time a transaction has to wait for a lock can be time-consuming. An alternative to maintaining the waits-for graph is to associate with each transaction a timestamp. This timestamp:

Is for deadlock detection only; it is not the same as the timestamp used for concurrency control in Section 18.8, even if timestamp-based concurrency control is in use.

In particular, if a transaction is rolled back, it restarts with a new, later concurrency timestamp, but its timestamp for deadlock detection never changes.

The timestamp is used when a transaction T has to wait for a lock that is held by another transaction U. Two different things happen, depending on whether T or U is older (has the earlier timestamp). There are two different policies that can be used to manage transactions and detect deadlocks.

1. The Wait-Die Scheme:

(a) If T is older than U (i.e., the timestamp of T is smaller than U's timestamp), then T is allowed to wait for the lock(s) held by U.

(b) If U is older than T, then T "dies": it is rolled back.

2. The Wound-Wait Scheme:

(a) If T is older than U, it "wounds" U. Usually, the "wound" is fatal: U must roll back and relinquish to T the lock(s) that T needs from U. There is an exception if, by the time the "wound" takes effect, U has already finished and released its locks. In that case, U survives and need not be rolled back.

(b) If U is older than T, then T waits for the lock(s) held by U.
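
The two policies reduce to a comparison of timestamps, which the following Python sketch spells out. The function names and the returned strings are our own illustrative choices, and we assume that timestamps are distinct, with a smaller value meaning an older transaction.

```python
def wait_die(t_ts, u_ts):
    """T (timestamp t_ts) requests a lock held by U (timestamp u_ts)."""
    # An older requester may wait; a younger requester dies.
    return "T waits" if t_ts < u_ts else "T rolls back"

def wound_wait(t_ts, u_ts, u_finished=False):
    """Same situation under wound-wait; u_finished models the stated exception."""
    if t_ts < u_ts:
        # An older requester wounds the holder, unless the holder already finished.
        return "U survives" if u_finished else "U rolls back"
    return "T waits"

older_requests = (wait_die(1, 2), wound_wait(1, 2))
younger_requests = (wait_die(2, 1), wound_wait(2, 1))
```

In both schemes it is always the younger of the two transactions that is rolled back; the schemes differ only in which of the two situations allows waiting.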

Example 19.15: Let us consider the wait-die scheme using the transactions of Example 19.14. We shall assume that T1, T2, T3, T4 is the order of times; i.e., T1 is the oldest transaction. We also assume that when a transaction rolls back, it does not restart soon enough to become active before the other transactions finish.

Figure 19.14 shows a possible sequence of events under the wait-die scheme. T1 gets the lock on A first. When T2 asks for a lock on A, it dies, because T1 is older than T2. In step (3), T3 gets a lock on B, but in step (4), T4 asks for a lock on A and dies because T1, the holder of the lock on A, is older than T4. Next, T3 gets its lock on C and completes. When T1 continues, it finds the lock on B available and also completes at step (8).

Figure 19.14: Actions of transactions detecting deadlock under the wait-die scheme

Now, the two transactions that rolled back, T2 and T4, start again. Their timestamps, as far as deadlock is concerned, do not change: T2 is still older than T4. However, we assume that T4 restarts first, at step (9), and when the older transaction T2 requests a lock on A at step (10), it is forced to wait, but does not abort. T4 completes at step (12), and then T2 is allowed to run to completion, as shown in the last three steps.

Why Timestamp-Based Deadlock Detection Works

We claim that in either the wait-die or wound-wait scheme, there can be no cycle in the waits-for graph, and hence no deadlock. Suppose otherwise; that is, there is a cycle such as T1 → T2 → T3 → T1. One of the transactions is the oldest, say T2.

In the wait-die scheme, you can only wait for younger transactions. Thus, it is not possible that T1 is waiting for T2, since T2 is surely older than T1. In the wound-wait scheme, you can only wait for older transactions. Thus, there is no way T2 could be waiting for the younger T3. We conclude that the cycle cannot exist, and therefore there is no deadlock.

Example 19.16: Next, let us consider the same transactions running under the wound-wait policy, as shown in Fig. 19.15. As in Fig. 19.14, T1 begins by locking A. When T2 requests a lock on A at step (2), it waits, since T1 is older than T2. After T3 gets its lock on B at step (3), T4 is also made to wait for the lock on A.

Then, suppose that T1 continues at step (5) with its request for the lock on B. That lock is already held by T3, but T1 is older than T3. Thus, T1 "wounds" T3. Since T3 is not yet finished, the wound is fatal: T3 relinquishes its lock and rolls back. Thus, T1 is able to complete.

When T1 makes the lock on A available, suppose it is given to T2, which is then able to proceed. After T2, the lock is given to T4, which proceeds to completion. Finally, T3 restarts and completes without interference.

19.3.5 Comparison of Deadlock-Management Methods

In both the wait-die and wound-wait schemes, older transactions kill off newer transactions. Since transactions restart with their old timestamp, eventually each transaction becomes the oldest in the system and is sure to complete. This guarantee, that every transaction eventually completes, is called no starvation. Notice that other schemes described in this section do not necessarily prevent starvation.

We should also consider the advantages and disadvantages of both wound-wait and wait-die when compared with a straightforward construction and use of the waits-for graph. The important points are:

Both wound-wait and wait-die are easier to implement than a system that maintains or periodically constructs the waits-for graph. The disadvantage of constructing the waits-for graph is even more extreme when the database is distributed, and the waits-for graph must be constructed from a collection of lock tables at different sites. See Section 19.6 for a discussion.

Using the waits-for graph minimizes the number of times we must abort a transaction because of deadlock. We never abort a transaction unless there really is a deadlock. On the other hand, either wound-wait or wait-die will sometimes roll back a transaction when there was no deadlock, and no deadlock would have occurred had the transaction been allowed to survive.

19.3.6 Exercises for Section 19.3

Exercise 19.3.1: For each of the sequences of actions below, assume that shared locks are requested immediately before each read action, and exclusive locks are requested immediately before every write action. Also, unlocks occur immediately after the final action that a transaction executes. Tell what actions are denied, and whether deadlock occurs. Also tell how the waits-for graph evolves during the execution of the actions. If there are deadlocks, pick a transaction to abort, and show how the sequence of actions continues.

Exercise 19.3.2: For each of the action sequences in Exercise 19.3.1, tell what happens under the wound-wait deadlock avoidance system. Assume the order of deadlock-timestamps is the same as the order of subscripts for the transactions, that is, T1, T2, T3, T4. Also assume that transactions that need to restart do so in the order that they were rolled back.

Exercise 19.3.3: For each of the action sequences in Exercise 19.3.1, tell what happens under the wait-die deadlock avoidance system. Make the same assumptions as in Exercise 19.3.2.

! Exercise 19.3.4: Can one have a waits-for graph with a cycle of length n, but no smaller cycle, for any integer n > 1? What about n = 1, i.e., a loop on a node?

!! Exercise 19.3.5: One approach to avoiding deadlocks is to require each transaction to announce all the locks it wants at the beginning, and to either grant all those locks or deny them all and make the transaction wait. Does this approach avoid deadlocks due to locking? Either explain why, or give an example of a deadlock that can arise.

! Exercise 19.3.6: Consider the intention-locking system of Section 18.6. Describe how to construct the waits-for graph for this system of lock modes. Especially, consider the possibility that a database element A is locked by different transactions in modes IS and also either S or IX. If a request for a lock on A has to wait, what arcs do we draw?

*! Exercise 19.3.7: In Section 19.3.5 we pointed out that deadlock-detection methods other than wound-wait and wait-die do not necessarily prevent starvation, where a transaction is repeatedly rolled back and never gets to finish. Give an example of how using the policy of rolling back any transaction that would cause a cycle can lead to starvation. Does requiring that transactions request locks on elements in a fixed order necessarily prevent starvation? What about timeouts as a deadlock-resolution mechanism?

19.4 Distributed Databases

We shall now consider the elements of distributed database systems. In a distributed system, there are many, relatively autonomous processors that may participate in database operations. Distributed databases offer several opportunities:

1. Since many machines can be brought to bear on a problem, the opportunities for parallelism and speedy response to queries are increased.

One important reason to distribute data is that the organization is itself distributed among many sites, and the sites each have data that is germane primarily to that site. Some examples are:

1. A bank may have many branches. Each branch (or the group of branches in a given city) will keep a database of accounts maintained at that branch (or city). Customers can choose to bank at any branch, but will normally bank at "their" branch, where their account data is stored. The bank may also have data that is kept in the central office, such as employee records and policies such as current interest rates. Of course, a backup of the records at each branch is also stored, probably in a site that is neither a branch office nor the central office.

2. A chain of department stores may have many individual stores. Each store (or a group of stores in one city) has a database of sales at that store and inventory at that store. There may also be a central office with data about employees, chain-wide inventory, credit-card customers, and information about suppliers such as unfilled orders and what each is owed. In addition, there may be a copy of all the stores' sales data in a "data warehouse," which is used to analyze and predict sales through ad-hoc queries issued by analysts; see Section 20.4.

3. A digital library may consist of a consortium of universities that each hold on-line books and other documents. Search at any site will examine the catalog of documents available at all sites and deliver an electronic copy of the document to the user if any site holds it.

In some cases, what we might think of logically as a single relation has been partitioned among many sites. For example, the chain of stores might be imagined to have a single sales relation, such as

Sales(item, date, price, purchaser)

Factors in Communication Cost

As bandwidth cost drops rapidly, one might wonder whether communication cost needs to be considered when designing a distributed database system. Now, certain kinds of data are among the largest objects managed electronically, so even with very cheap communication, the cost of sending a terabyte-sized piece of data cannot be ignored. However, communication cost generally involves not only the shipping of the bits, but several layers of protocol that prepare the data for shipping, reconstitute them at the receiving end, and manage the communication. These protocols each require substantial computation. While computation is also getting cheaper, the computation needed to perform the communication is likely to remain significant, compared to the needs for conventional, single-processor execution of key database operations.

However, this relation does not exist physically. Rather, it is the union of a number of relations with the same schema, one at each of the stores in the chain. These local relations are called fragments, and the partitioning of a logical relation into physical fragments is called horizontal decomposition of the relation Sales. We regard the partition as "horizontal" because we may visualize a single Sales relation with its tuples separated by horizontal lines into the sets of tuples at each store.

In other situations, a distributed database appears to have partitioned a relation "vertically," by decomposing what might be one logical relation into two or more, each with a subset of the attributes, and with each relation at a different site. For instance, if we want to find out which sales at the Boston store were made to customers who are more than 90 days in arrears on their credit-card payments, it would be useful to have a relation (or view) that included the item, date, and purchaser information from Sales, along with the date of the last credit-card payment by that purchaser. However, in the scenario we are describing, this relation is decomposed vertically, and we would have to join the credit-card-customer relation at the central headquarters with the fragment of Sales at the Boston store.

Rather, a transaction consists of communicating transaction components, each at a different site and communicating with the local scheduler and logger. Two important issues that must thus be looked at anew are:

1. How do we manage the commit/abort decision when a transaction is distributed? What happens if one component of the transaction wants to abort the whole transaction, while others encountered no problem and want to commit? We discuss a technique called "two-phase commit" in Section 19.5; it allows the decision to be made properly and also frequently allows sites that are up to operate even if some other site(s) have failed.

2. How do we assure serializability of transactions that involve components at several sites? We look at locking in particular, in Section 19.6, and see how local lock tables can be used to support global locks on database elements and thus support serializability of transactions in a distributed environment.

19.4.3 Data Replication

One important advantage of a distributed system is the ability to replicate data, that is, to make copies of the data at different sites. One motivation is that if a site fails, there may be other sites that can provide the same data that was at the failed site. A second use is in improving the speed of query answering by making a copy of needed data available at the sites where queries are initiated. For example:

1. A bank may make copies of current interest-rate policy available at each branch, so a query about rates does not have to be sent to the central office.

2. A chain store may keep copies of information about suppliers at each store, so local requests for information about suppliers (e.g., the manager needs the phone number of a supplier to check on a shipment) can be handled without sending messages to the central office.

3. A digital library may temporarily cache a copy of a popular document at a school where students have been assigned to read the document.

However, there are problems that must be faced when data is replicated.

a) How do we keep copies identical? In essence, an update to a replicated data element becomes a distributed transaction that updates all copies.

b) How do we decide where and how many copies to keep? The more copies, the more effort is required to update, but the easier queries become. For example, a relation that is rarely updated might have copies everywhere for maximum efficiency, while a frequently updated relation might have only one or two copies.

c) What happens when there is a communication failure in the network, and different copies of the same data have the opportunity to evolve separately and must then be reconciled when the network reconnects?


The existence of distributed data also affects the complexity and options available in the design of a physical query plan (see Section 16.7). Among the issues that must be decided as we choose a physical plan are:

1. If there are several copies of a needed relation R, from which do we get the value of R?

2. If we apply an operator, say join, to two relations R and S, we have several options and must choose one. Some of the possibilities are:

(a) We can copy S to the site of R and do the computation there.

(b) We can copy R to the site of S and do the computation there.

(c) We can copy both R and S to a third site and do the computation at that site.

Which is best depends on several factors, including which site has available processing cycles and whether the result of the operation will be combined with data at a third site. For example, if we are computing (R ⋈ S) ⋈ T, we may choose to ship both R and S to the site of T and take both joins there.

If a relation R is in fragments $R_1, R_2, \ldots, R_n$ distributed among several sites, we should also replace a use of R in the query by a use of

$R_1 \cup R_2 \cup \cdots \cup R_n$

as we select a logical query plan. The query may then allow us to simplify the expression significantly. For instance, if the $R_i$'s each represent fragments of the Sales relation, each associated with a single store, then a query about sales at the Boston store might allow us to remove all $R_i$'s except the fragment for Boston from the union.
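
The idea of treating a logical relation as a union of fragments, and then pruning the union, can be sketched in Python as follows. Everything here is hypothetical: the store names other than Boston, the sample tuples, and the function `sales` are made up for illustration, and a real optimizer would work on query plans rather than in-memory lists.

```python
# One fragment of Sales(item, date, price, purchaser) per store (made-up tuples).
fragments = {
    "Boston": [("toothbrush", "2001-03-01", 2.50, "Ann")],
    "Denver": [("soap", "2001-03-02", 1.25, "Bob")],
}

def sales(store=None):
    """Logical Sales relation: the union of all fragments.

    If the query names a single store, every other fragment is dropped
    from the union before any tuples are read.
    """
    sites = [store] if store is not None else list(fragments)
    return [t for site in sites for t in fragments[site]]

all_sales = sales()
boston_sales = sales("Boston")
```

The second call touches only the Boston fragment, which is the simplification the text describes.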

*!! Exercise 19.4.1: The following exercise will allow you to address some of the problems that come up when deciding on a replication strategy for data. Suppose there is a relation R that is accessed from n sites. The ith site issues $q_i$ queries about R and $u_i$ updates to R per second, for $i = 1, 2, \ldots, n$. The cost of executing a query if there is a copy of R at the site issuing the query is c, while if there is no copy there, and the query must be sent to some remote site, then the cost is 10c. The cost of executing an update is d for the copy of R at the issuing site and 10d for every copy of R that is not at the issuing site. As a function of these parameters, how would you choose, for large n, a set of sites at which to replicate R?

In this section, we shall address the problem of how a distributed transaction that has components at several sites can execute atomically. The next section discusses another important property of distributed transactions: executing them serializably. We shall begin with an example that illustrates the problems that might arise.

Example 19.17: Consider our example of a chain of stores mentioned in Section 19.4. Suppose a manager of the chain wants to query all the stores, find the inventory of toothbrushes at each, and issue instructions to move toothbrushes from store to store in order to balance the inventory. The operation is done by a single global transaction T that has component Ti at the ith store and a component T0 at the office where the manager is located. The sequence of activities performed by T is summarized below:

1. Component T0 is created at the site of the manager.

2. T0 sends messages to all the stores instructing them to create components Ti.

3. Each Ti executes a query at store i to discover the number of toothbrushes in inventory and reports this number to T0.

4. T0 takes these numbers and determines, by some algorithm we shall not discuss, what shipments of toothbrushes are desired. T0 then sends messages such as "store 10 should ship 500 toothbrushes to store 7" to the appropriate stores (stores 7 and 10 in this instance).

5. Stores receiving instructions update their inventory and perform the shipments.

There are a number of things that could go wrong in Example 19.17, and many of these result in violations of the atomicity of T. That is, some of the actions comprising T get executed, but others do not. Mechanisms such as logging and recovery, which we assume are present at each site, will assure that each Ti is executed atomically, but do not assure that T itself is atomic.

Example 19.18: Suppose a bug in the algorithm to redistribute toothbrushes might cause store 10 to be instructed to ship more toothbrushes than it has. T10 will therefore abort, and no toothbrushes will be shipped from store 10; neither will the inventory at store 10 be changed. However, T7 detects no problems and commits at store 7, updating its inventory to reflect the supposedly shipped toothbrushes. Now, not only has T failed to execute atomically (since T10 never completes), but it has left the distributed database in an inconsistent state: the toothbrush inventory does not equal the number of toothbrushes on hand.

Another source of problems is the possibility that a site will fail or be disconnected from the network while the distributed transaction is running.

Example 19.19: Suppose T10 replies to T0's first message by telling its inventory of toothbrushes. However, the machine at store 10 then crashes, and the instructions from T0 are never received by T10. Can distributed transaction T ever commit? What should T10 do when its site recovers?

19.5.2 Two-Phase Commit

In order to avoid the problems suggested in Section 19.5.1, distributed DBMS's use a complex protocol for deciding whether or not to commit a distributed transaction. In this section, we shall describe the basic idea behind these protocols, called two-phase commit. By making a global decision about committing, each component of the transaction will commit, or none will. As usual, we assume that the atomicity mechanisms at each site assure that either the local component commits or it has no effect on the database state at that site; i.e., components of the transaction are atomic. Thus, by enforcing the rule that either all components of a distributed transaction commit or none does, we make the distributed transaction itself atomic.

Several salient points about the two-phase commit protocol follow:

In a two-phase commit, we assume that each site logs actions at that site, but there is no global log.

We also assume that one site, called the coordinator, plays a special role in deciding whether or not the distributed transaction can commit. For example, the coordinator might be the site at which the transaction originates, such as the site of T0 in the examples of Section 19.5.1.

The two-phase commit protocol involves sending certain messages between the coordinator and the other sites. As each message is sent, it is logged at the sending site, to aid in recovery should it be necessary.

With these points in mind, we can describe the two phases in terms of the messages sent between sites.

Phase I

In phase 1 of the two-phase commit, the coordinator for a distributed transaction T decides when to attempt to commit T. Presumably the attempt to commit occurs after the component of T at the coordinator site is ready to commit, but in principle the steps must be carried out even if the coordinator's component wants to abort (but with obvious simplifications, as we shall see). The coordinator polls all the sites with components of the transaction T to determine their wishes regarding the commit/abort decision.

(Footnote: Do not confuse two-phase commit with two-phase locking. They are independent ideas designed to solve different problems.)

1 T h e coordinator places a log record < P r e p a r e T > on the log a t its site

2 The coordinator sends to each component's site (in principle including itself) the message p r e p a r e T

3 Each site receiving the message p r e p a r e T decides whether t o commit or abort its component of T The site can delay if the component has not yet completed its activity, but must eventually send a response

4. If a site wants to commit its component, it must enter a state called precommitted. Once in the precommitted state, the site cannot abort its component of T without a directive to do so from the coordinator. The following steps are done to become precommitted:

(a) Perform whatever steps are necessary to be sure the local component of T will not have to abort, even if there is a system failure followed by recovery at the site. Thus, not only must all actions associated with the local T be performed, but the appropriate actions regarding the log must be taken so that T will be redone rather than undone in a recovery. The actions depend on the logging method, but surely the log records associated with actions of the local T must be flushed to disk.

(b) Place the record <Ready T> on the local log and flush the log to disk.

(c) Send to the coordinator the message ready T.

However, the site does not commit its component of T at this time; it must wait for phase 2.

5. If, instead, the site wants to abort its component of T, then it logs the record <Don't commit T> and sends the message don't commit T to the coordinator. It is safe to abort the component at this time, since T will surely abort if even one component wants to abort.
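The participant's side of phase 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `Site`, its two in-memory lists standing in for the log buffer and the on-disk log, and the flag `wants_commit` are all hypothetical names, not part of the protocol description.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A participant site; two lists stand in for its log (hypothetical)."""
    name: str
    log: list = field(default_factory=list)       # records not yet on disk
    disk_log: list = field(default_factory=list)  # records flushed to disk

    def flush(self):
        # Stand-in for forcing the tail of the log to stable storage.
        self.disk_log.extend(self.log)
        self.log.clear()

    def on_prepare(self, t, wants_commit):
        """Handle the message `prepare T` and return the reply message."""
        if wants_commit:
            # Step 4(a): log records for the local actions of T reach disk.
            self.flush()
            # Step 4(b): log the Ready record and flush before replying.
            self.log.append(("Ready", t))
            self.flush()
            # Step 4(c): reply ready, but do not commit; wait for phase 2.
            return ("ready", t)
        # Step 5: log the Don't-commit record; a local abort is already safe.
        self.log.append(("Don't commit", t))
        return ("don't commit", t)
```

The point of the ordering is that the reply is produced only after the Ready record is on disk, so a site that crashes after answering can still discover, on recovery, that it is precommitted.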

The messages of phase 1 are summarized in Fig. 19.16.

Phase II

The second phase begins when responses ready or don't commit are received from each site by the coordinator. However, it is possible that some site fails to respond: it may be down, or it has been disconnected by the network. In that case, after a suitable timeout period, the coordinator will treat the site as if it had sent don't commit.


Figure 19.16: Messages in phase 1 of two-phase commit (the coordinator sends prepare; each site replies ready or don't commit)

1. If the coordinator has received ready T from all components of T, then it decides to commit T. The coordinator (a) logs <Commit T> at its site, and (b) sends message commit T to all sites involved in T.

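2. If the coordinator has received don't commit T from one or more sites, it decides to abort T. The coordinator (a) logs <Abort T> at its site, and (b) sends message abort T to all sites involved in T.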

Figure 19.17: Messages in phase 2 of two-phase commit
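The coordinator's decision in phase 2 can be sketched as follows. The function name `coordinator_decide` and the dictionary `responses` are hypothetical; a site absent from `responses` models one that timed out. The abort branch is assumed to mirror the commit branch, as the <Abort T> log records discussed in Section 19.5.3 suggest.

```python
def coordinator_decide(t, sites, responses, coordinator_log):
    """Decide commit or abort for T from the phase-1 responses.

    `responses` maps a site name to "ready" or "don't commit"; a site
    missing from the map is treated as having sent "don't commit".
    """
    votes = [responses.get(site, "don't commit") for site in sites]
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    # The decision is logged at the coordinator before any message goes out.
    coordinator_log.append(("Commit" if decision == "commit" else "Abort", t))
    # One (destination, message, transaction) triple per involved site.
    return [(site, decision, t) for site in sites]
```

A single missing or negative vote is enough to force abort, which is why a site may safely abort its component as soon as it sends don't commit.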

19.5.3 Recovery of Distributed Transactions

At any time during the two-phase commit process, a site may fail. We need to make sure that what happens when the site recovers is consistent with the global decision that was made about a distributed transaction T. There are several cases to consider, depending on the last log entry for T:

1. If the last log record for T was <Commit T>, then T must have been committed by the coordinator. Depending on the log method used, it may be necessary to redo the component of T at the recovering site.

2. If the last log record is <Abort T>, then similarly we know that the global decision was to abort T. If the log method requires it, we undo the component of T at the recovering site.

3. If the last log record is <Don't commit T>, then the site knows that the global decision must have been to abort T. If necessary, effects of T on the local database are undone.

4. The hard case is when the last log record for T is <Ready T>. Now, the recovering site does not know whether the global decision was to commit or abort T. This site must communicate with at least one other site to find out the global decision for T. If the coordinator is up, the site can ask the coordinator. If the coordinator is not up at this time, some other site may be asked to consult its log to find out what happened to T. In the worst case, no other site can be contacted, and the local component of T must be kept active until the commit/abort decision is determined.

5. It may also be the case that the local log has no records about T that come from actions of the two-phase commit protocol. If so, then the recovering site may unilaterally decide to abort its component of T, which is consistent with all logging methods. It is possible that the coordinator already detected a timeout from the failed site and decided to abort T. If the failure was brief, T may still be active at other sites, but it will never be inconsistent if the recovering site decides to abort its component of T and responds with don't commit T if later polled in phase 1.
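The five cases above amount to a lookup on the last two-phase-commit record found for T. The sketch below is illustrative only; the function name and the action labels "redo", "undo", "ask", and "abort" are hypothetical shorthand, and whether a redo or undo is actually needed still depends on the logging method.

```python
def recovery_action(last_record):
    """Map the last two-phase-commit log record for T to a recovery action.

    `last_record` is "Commit", "Abort", "Don't commit", "Ready", or None
    when the local log holds no protocol record for T.
    """
    actions = {
        "Commit": "redo",        # case 1: global decision was commit
        "Abort": "undo",         # case 2: global decision was abort
        "Don't commit": "undo",  # case 3: T cannot have committed globally
        "Ready": "ask",          # case 4: must consult another site
        None: "abort",           # case 5: free to abort unilaterally
    }
    return actions[last_record]
```

Only the "Ready" case requires communication; in every other case the local log alone determines what the recovering site should do.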

The above analysis assumes that the failed site is not the coordinator. When the coordinator fails during a two-phase commit, new problems arise. First, the surviving participant sites must either wait for the coordinator to recover or elect a new coordinator. Since the coordinator could be down for an indefinite period, there is good motivation to elect a new leader, at least after a brief waiting period to see if the coordinator comes back up.

The matter of leader election is in its own right a complex problem of distributed systems, beyond the scope of this book. However, a simple method will work in most situations. For instance, we may assume that all participant sites have unique identifying numbers; IP addresses will work in many situations. Each participant sends messages announcing its availability as leader to all the other sites, giving its identifying number. After a suitable length of time, each participant acknowledges as the new coordinator the lowest-numbered site from which it has heard, and sends messages to that effect to all the other sites. If all sites receive consistent messages, then there is a unique choice for new coordinator, and everyone knows about it. If there is inconsistency, or a surviving site has failed to respond, that too will be universally known, and the election starts over.
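One round of this simple election can be sketched as below. The names `elect`, `expected_sites`, and `heard_by` are hypothetical, and the sketch assumes that each site counts its own number among the candidates, a detail the description above leaves open.

```python
def elect(expected_sites, heard_by):
    """Run one election round; return the new coordinator or None.

    `heard_by[s]` is the set of site numbers whose announcements reached
    site s. A site absent from `heard_by` failed to respond. None means
    the election must start over.
    """
    choices = {}
    for site in expected_sites:
        if site not in heard_by:
            return None  # a surviving site failed to respond
        # Each site acknowledges the lowest-numbered site it knows of
        # (assumption: its own number is included).
        choices[site] = min(heard_by[site] | {site})
    if len(set(choices.values())) != 1:
        return None  # inconsistent acknowledgements
    return next(iter(choices.values()))
```

For example, with sites 3, 5, and 7 all hearing one another, every site acknowledges 3; if site 5 misses the announcement from site 3, it acknowledges itself and the round is repeated.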

Now, the new leader polls the sites for information about each distributed transaction T. Each site reports the last record on its log concerning T, if there is one. The possible cases are:

1. Some site has <Commit T> on its log. Then the original coordinator must have wanted to send commit T messages everywhere, and it is safe to commit T.

2. Similarly, if some site has <Abort T> on its log, then the original coordinator must have decided to abort T, and it is safe for the new coordinator to order that action.

3. Suppose now that no site has <Commit T> or <Abort T> on its log, but at least one site does not have <Ready T> on its log. Then, since actions are logged before the corresponding messages are sent, we know that the old coordinator never received ready T from this site and therefore could not have decided to commit. It is safe for the new coordinator to decide to abort T.

4. The hard case is when there is no <Commit T> or <Abort T> to be found, but every surviving site has <Ready T>. Now, we cannot be sure whether the old coordinator found some reason to abort T or not; it could have decided to do so because of actions at its own site, or because of a don't commit T message from another failed site, for example. Or the old coordinator may have decided to commit T and already committed its local component of T. Thus, the new coordinator is not able to decide whether to commit or abort T and must wait until the original coordinator recovers. In real systems, the database administrator has the ability to intervene and manually force the waiting transaction components to finish. The result is a possible loss of atomicity, but the person executing the blocked transaction will be notified to take some appropriate compensating action.
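The new coordinator's reasoning over the four cases can be sketched as a single function. The name `new_coordinator_decision` and the list `reports` (one entry per surviving site, holding that site's last log record for T, or None if it has none) are hypothetical.

```python
def new_coordinator_decision(reports):
    """Decide the fate of T from the surviving sites' last log records.

    Each entry of `reports` is "Commit", "Abort", "Ready",
    "Don't commit", or None. Returns "commit", "abort", or "wait".
    """
    if "Commit" in reports:
        return "commit"  # case 1: the old coordinator decided to commit
    if "Abort" in reports:
        return "abort"   # case 2: the old coordinator decided to abort
    if any(r != "Ready" for r in reports):
        return "abort"   # case 3: some site never sent ready
    return "wait"        # case 4: blocked until the old coordinator recovers
```

The "wait" outcome is the blocking behavior of two-phase commit: when every survivor is precommitted and no decision record exists, nothing in the surviving logs distinguishes a commit decision from an abort decision.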

! Exercise 19.5.1: Consider a transaction T initiated at a home computer that asks bank B to transfer $10,000 from an account at B to an account at another bank C.

* a) What are the components of distributed transaction T? What should the components at B and C do?

b) What can go wrong if there is not $10,000 in the account at B?

c) What can go wrong if one or both banks' computers crash, or if the network is disconnected?

d) If one of the problems suggested in (c) occurs, how could the transaction resume correctly when the computers and network resume operation?

Exercise 19.5.2: In this exercise, we need a notation for describing sequences of messages that can take place during a two-phase commit. Let (i, j, M) mean that site i sends the message M to site j, where the value of M and its meaning can be P (prepare), R (ready), D (don't commit), C (commit), or A (abort). We shall discuss a simple situation in which site 0 is the coordinator, but not otherwise part of the transaction, and sites 1 and 2 are the components. For instance, the following is one possible sequence of messages that could take place during a successful commit of the transaction:

* a) Give an example of a sequence of messages that could occur if site 1 wants to commit and site 2 wants to abort.

*! b) How many possible sequences of messages such as the above are there, if the transaction successfully commits?

! c) If site 1 wants to commit, but site 2 does not, how many sequences of messages are there, assuming no failures occur?

! d) If site 1 wants to commit, but site 2 is down and does not respond to messages, how many sequences are there?

!! Exercise 19.5.3: Using the notation of Exercise 19.5.2, suppose the sites are a coordinator and n other sites that are the transaction components. As a function of n, how many sequences of messages are there if the transaction successfully commits?

19.6 Distributed Locking

In this section we shall see how to extend a locking scheduler to an environment where transactions are distributed and consist of components at several sites. We assume that lock tables are managed by individual sites, and that the component of a transaction at a site can only request a lock on the data elements at that site.

When data is replicated, we must arrange that the copies of a single element X are changed in the same way by each transaction. This requirement introduces a distinction between locking the logical database element X and locking one or more of the copies of X. In this section, we shall offer a cost model for distributed locking algorithms that applies to both replicated and nonreplicated data. However, before introducing the model, let us consider an obvious (and sometimes adequate) solution to the problem of maintaining locks in a distributed database: centralized locking.
