1. START, the set of transactions that have started but not yet completed validation. For each transaction T in this set, the scheduler maintains START(T), the time at which T started.
2. VAL, the set of transactions that have been validated but not yet finished the writing of phase 3. For each transaction T in this set, the scheduler maintains both START(T) and VAL(T), the time at which T validated. Note that VAL(T) is also the time at which T is imagined to execute in the hypothetical serial order of execution.
3. FIN, the set of transactions that have completed phase 3. For these transactions T, the scheduler records START(T), VAL(T), and FIN(T), the time at which T finished. In principle this set grows, but as we shall see, we do not have to remember transaction T if FIN(T) < START(U) for every active transaction U (i.e., for every U in START or VAL). The scheduler may thus periodically purge the FIN set to keep its size from growing beyond bounds.
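The purge condition for the FIN set can be sketched in a few lines. This is a minimal illustration, not the book's implementation: the `Txn` record and the `purge_fin` function are hypothetical names, and times are plain numbers. It reads the condition as "T finished before every active transaction started".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Txn:
    # Hypothetical record of the three times the scheduler tracks.
    name: str
    start: float
    val: Optional[float] = None   # None until the transaction validates
    fin: Optional[float] = None   # None until the transaction finishes


def purge_fin(fin_set: List[Txn], active: List[Txn]) -> List[Txn]:
    """Return the finished transactions that must still be remembered.

    `active` holds every transaction currently in START or VAL.
    A finished T can be dropped when FIN(T) < START(U) for every active U.
    """
    if not active:
        # Any transaction that starts later starts after every FIN time.
        return []
    earliest_start = min(u.start for u in active)
    return [t for t in fin_set if t.fin >= earliest_start]
```

Comparing against the earliest active start time is enough, because FIN(T) is below every START(U) exactly when it is below the smallest one.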
18.9.2 The Validation Rules
If maintained by the scheduler, the information of Section 18.9.1 is enough for it to detect any potential violation of the assumed serial order of the transactions, that is, the order in which the transactions validate. To understand the rules, let us first consider what can be wrong when we try to validate a transaction T.
Figure 18.43: T cannot validate if an earlier transaction is now writing something that T should have read. (Time line, in order: U start, T start, U validated, T validating; T reads X and U writes X.)
1. Suppose there is a transaction U such that:
(a) U is in VAL or FIN; that is, U has validated.
(b) FIN(U) > START(T); that is, U did not finish before T started. (Note that if U is in VAL, then U has not yet finished when T validates. In that case, FIN(U) is technically undefined. However, we know it must be larger than START(T) in this case.)
(c) RS(T) ∩ WS(U) is not empty; in particular, let it contain database element X.
Then it is possible that U wrote X after T read X. This situation is shown in Fig. 18.43. To interpret the figure, note that the dotted lines connect the events in real time with the time at which they would have occurred had transactions been executed at the moment they validated. Since we don't know whether or not T got to read U's value, we must rollback T to avoid a risk that the actions of T and U will not be consistent with the assumed serial order.
2. Suppose there is a transaction U such that:
(a) U is in VAL; i.e., U has successfully validated.
(b) FIN(U) > VAL(T); that is, U did not finish before T entered its validation phase.
(c) WS(T) ∩ WS(U) ≠ ∅; in particular, let X be in both write sets.
Then the potential problem is as shown in Fig. 18.44. T and U must both write values of X, and if we let T validate, it is possible that it will write X before U does. Since we cannot be sure, we rollback T to make sure it does not violate the assumed serial order in which it follows U.
Figure 18.44: T cannot validate if it could then write something ahead of an earlier transaction. (Time line, in order: U validated, T validating, U finish; T writes X and U writes X.)
The two problems described above are the only situations in which a write by T could be physically unrealizable. In Fig. 18.43, if U finished before T started, then surely T would read the value of X that either U or some later transaction wrote. In Fig. 18.44, if U finished before T validated, then surely U wrote X before T did. We may thus summarize these observations with the following rule for validating a transaction T:
Check that RS(T) ∩ WS(U) = ∅ for any previously validated U that did not finish before T started, i.e., if FIN(U) > START(T).
Check that WS(T) ∩ WS(U) = ∅ for any previously validated U that did not finish before T validated, i.e., if FIN(U) > VAL(T).
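The two checks of the validation rule can be sketched as a single function. This is an illustrative sketch under stated assumptions: `TxnInfo` and `can_validate` are hypothetical names, times are plain numbers, and a validated transaction that has not yet finished is given FIN = infinity. That choice matches the footnote's remark that FIN(U) is then larger than any time of interest.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class TxnInfo:
    # Hypothetical per-transaction record kept by the scheduler.
    name: str
    start: float
    read_set: FrozenSet[str]
    write_set: FrozenSet[str]
    fin: Optional[float] = None   # None: validated but not yet finished


def can_validate(t: TxnInfo, val_time: float,
                 validated: Iterable[TxnInfo]) -> bool:
    """Apply both validation checks to T against previously validated U's."""
    for u in validated:
        fin_u = float("inf") if u.fin is None else u.fin
        # Check 1: U did not finish before T started, so T may have
        # missed a value of U's that it should have read.
        if fin_u > t.start and t.read_set & u.write_set:
            return False
        # Check 2: U did not finish before T validates, so T could
        # write a common element ahead of U.
        if fin_u > val_time and t.write_set & u.write_set:
            return False
    return True
```

A transaction for which `can_validate` returns `False` would be rolled back, as described above.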
Example 18.29: Figure 18.45 shows a time line during which four transactions T, U, V, and W attempt to execute and validate. The read and write sets for each transaction are indicated on the diagram. T starts first, although U is the first to validate.
Figure 18.45: Four transactions and their validation
1. Validation of U: When U validates there are no other validated transactions, so there is nothing to check. U validates successfully and writes a value for database element D.
2. Validation of T: When T validates, U is validated but not finished. Thus, we must check that neither the read nor write set of T has anything in common with WS(U) = {D}. Since RS(T) = {A, B} and WS(T) = {A, C}, both checks are successful, and T validates.
3. Validation of V: When V validates, U is validated and finished, and T is validated but not finished. Also, V started before U finished. Thus, we must compare both RS(V) and WS(V) against WS(T), but only RS(V) needs to be compared against WS(U). We find:
RS(V) ∩ WS(T) = {B} ∩ {A, C} = ∅
WS(V) ∩ WS(T) = {D, E} ∩ {A, C} = ∅
RS(V) ∩ WS(U) = {B} ∩ {D} = ∅
Thus, V also validates successfully.
You may have been concerned with a tacit notion that validation takes place in a moment, or indivisible instant of time. For example, we imagine that we can decide whether a transaction U has already validated before we start to validate transaction T. Could U perhaps finish validating while we are validating T?
If we are running on a uniprocessor system, and there is only one scheduler process, we can indeed think of validation and other actions of the scheduler as taking place in an instant of time. The reason is that if the scheduler is validating T, then it cannot also be validating U, so all during the validation of T, the validation status of U cannot change.
If we are running on a multiprocessor, and there are several scheduler processes, then it might be that one is validating T while the other is validating U. If so, then we need to rely on whatever synchronization mechanism the multiprocessor system provides to make validation an atomic action.
4. Validation of W: When W validates, we find that U finished before W started, so no comparison between W and U is performed. T is finished before W validates but did not finish before W started, so we compare only RS(W) with WS(T). V is validated but not finished, so we need to compare both RS(W) and WS(W) with WS(V). These tests are:
RS(W) ∩ WS(T) = {A, D} ∩ {A, C} = {A}
RS(W) ∩ WS(V) = {A, D} ∩ {D, E} = {D}
WS(W) ∩ WS(V) = {A, C} ∩ {D, E} = ∅
Since the intersections are not all empty, W is not validated. Rather, W is rolled back and does not write values for A or C.
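The intersections in Example 18.29 can be recomputed mechanically with Python sets. The sketch below is only a check of the arithmetic: the dictionaries `RS`, `WS`, and `checks` are illustrative names, the read set of U is omitted because the example never uses it, and which pairs to compare is taken from the prose rather than derived from the figure's time line.

```python
# Read and write sets as given in the example.
RS = {"T": {"A", "B"}, "V": {"B"}, "W": {"A", "D"}}
WS = {"U": {"D"}, "T": {"A", "C"}, "V": {"D", "E"}, "W": {"A", "C"}}

# For each validating transaction, the pairs of sets the text compares.
checks = {
    "T": [(RS["T"], WS["U"]), (WS["T"], WS["U"])],
    "V": [(RS["V"], WS["T"]), (WS["V"], WS["T"]), (RS["V"], WS["U"])],
    "W": [(RS["W"], WS["T"]), (RS["W"], WS["V"]), (WS["W"], WS["V"])],
}


def validates(name: str) -> bool:
    """A transaction validates only if every compared pair is disjoint."""
    return all(not (mine & theirs) for mine, theirs in checks[name])


for name in ("T", "V", "W"):
    overlaps = [mine & theirs for mine, theirs in checks[name]]
    print(name, "validates" if validates(name) else "is rolled back", overlaps)
```

T and V have only empty intersections, while W overlaps WS(T) on A and WS(V) on D, which is why W is rolled back.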
18.9.3 Comparison of Three Concurrency-Control Mechanisms
The three approaches to serializability that we have considered (locks, timestamps, and validation) each have their advantages. First, they can be compared for their storage utilization:
Locks: Space in the lock table is proportional to the number of database elements locked.
Timestamps: In a naive implementation, space is needed for read- and write-times with every database element, whether or not it is currently accessed. However, a more careful implementation will treat all timestamps that are prior to the earliest active transaction as "minus infinity" and not record them. In that case, we can store read- and write-times in a table analogous to a lock table, in which only those database elements that have been accessed recently are mentioned at all.
Validation: Space is used for timestamps and read/write sets for each currently active transaction, plus a few more transactions that finished after some currently active transaction began.
Thus, the amount of space used by each approach is approximately proportional to the sum over all active transactions of the number of database elements the transaction accesses. Timestamping and validation may use slightly more space because they keep track of certain accesses by recently committed transactions that a lock table would not record. A potential problem with validation is that the write set for a transaction must be known before the writes occur (but after the transaction's local computation has been completed).
We can also compare the methods for their effect on the ability of transactions to complete without delay. The performance of the three methods depends on whether interaction among transactions (the likelihood that a transaction will access an element that is also being accessed by a concurrent transaction) is high or low.
Locking delays transactions but avoids rollbacks, even when interaction is high. Timestamps and validation do not delay transactions, but can cause them to rollback, which is a more serious form of delay and also wastes resources.
If interference is low, then neither timestamps nor validation will cause many rollbacks, and may be preferable to locking because they generally have lower overhead than a locking scheduler.
When a rollback is necessary, timestamps catch some problems earlier than validation, which always lets a transaction do all its internal work before considering whether the transaction must rollback.
Exercise 18.9.1: In the following sequences of events, we use Ri(X) to mean "transaction Ti starts, and its read set is the list of database elements X." Also, Vi means "Ti attempts to validate," and Wi(X) means that "Ti finishes, and its write set was X." Tell what happens when each sequence is processed by a validation-based scheduler.
* a) R1(A,B); R2(B,C); V1; R3(C,D); V3; W1(A); V2; W2(A); W3(B);
b) R1(A,B); R2(B,C); V1; R3(C,D); V3; W1(A); V2; W2(A); W3(D);
c) R1(A,B); R2(B,C); V1; R3(C,D); V3; W1(C); V2; W2(A); W3(D);
d) R1(A,B); R2(B,C); R3(C); V1; V2; V3; W1(A); W2(B); W3(C);
e) R1(A,B); R2(B,C); R3(C); V1; V2; V3; W1(C); W2(B); W3(A);
f) R1(A,B); R2(B,C); R3(C); V1; V2; V3; W1(A); W2(C); W3(B);
18.10 Summary of Chapter 18
+ Consistent Database States: Database states that obey whatever implied or declared constraints the designers intended are called consistent. It is essential that operations on the database preserve consistency, that is, they turn one consistent database state into another.
+ Consistency of Concurrent Transactions: It is normal for several transactions to have access to a database at the same time. Transactions, run in isolation, are assumed to preserve consistency of the database. It is the job of the scheduler to assure that concurrently operating transactions also preserve the consistency of the database.
+ Schedules: Transactions are broken into actions, mainly reading and writing from the database. A sequence of these actions from one or more transactions is called a schedule.
+ Serial Schedules: If transactions execute one at a time, the schedule is said to be serial.
+ Serializable Schedules: A schedule that is equivalent in its effect on the database to some serial schedule is said to be serializable. Interleaving of actions from transactions is possible in a serializable schedule that
at least one of which actions is write.
+ Precedence Graphs: An easy test for conflict-serializability is to construct a precedence graph for the schedule. Nodes correspond to transactions, and there is an arc T → U if some action of T in the schedule conflicts with a later action of U. A schedule is conflict-serializable if and only if the precedence graph is acyclic.
+ Locking: The most common approach to assuring serializable schedules is to lock database elements before accessing them, and to release the lock after finishing access to the element. Locks on an element prevent other transactions from accessing the element.
+ Two-Phase Locking: Locking by itself does not assure serializability. However, two-phase locking, in which all transactions first enter a phase where they only acquire locks, and then enter a phase where they only release locks, will guarantee serializability.
+ Lock Modes: To avoid locking out transactions unnecessarily, systems usually use several lock modes, with different rules for each mode about when a lock can be granted. Most common is the system with shared locks for read-only access and exclusive locks for accesses that include writing.
+ Compatibility Matrices: A compatibility matrix is a useful summary of when it is legal to grant a lock in a certain lock mode, given that there may be other locks, in the same or other modes, on the same element.
+ Update Locks: A scheduler can allow a transaction that plans to read and then write an element first to take an update lock, and later to upgrade the lock to exclusive. Update locks can be granted when there are already shared locks on the element, but once there, an update lock prevents other locks from being granted on that element.
+ Increment Locks: For the common case where a transaction wants only to add or subtract a constant from an element, an increment lock is suitable. Increment locks on the same element do not conflict with each other, although they conflict with shared and exclusive locks.
+ Locking Elements With a Granularity Hierarchy: When both large and small elements (relations, disk blocks, and tuples, perhaps) may need to be locked, a warning system of locks enforces serializability. Transactions place intention locks on large elements to warn other transactions that they plan to access one or more of its subelements.
+ Locking Elements Arranged in a Tree: If database elements are only accessed by moving down a tree, as in a B-tree index, then a non-two-phase locking strategy can enforce serializability. The rules require a lock to be held on the parent while obtaining a lock on the child, although the lock on the parent can then be released and additional locks taken later.
+ Optimistic Concurrency Control: Instead of locking, a scheduler can assume transactions will be serializable, and abort a transaction if some potentially nonserializable behavior is seen. This approach, called optimistic, is divided into timestamp-based and validation-based scheduling.
+ Timestamp-Based Schedulers: This type of scheduler assigns timestamps to transactions as they begin. Database elements have associated read- and write-times, which are the timestamps of the transactions that most recently performed those actions. If an impossible situation, such as a read by one transaction of a value that was written in that transaction's future, is detected, the violating transaction is rolled back, i.e., aborted and restarted.
+ Validation-Based Schedulers: These schedulers validate transactions after they have read everything they need, but before they write. Transactions that have read, or will write, an element that some other transaction is in the process of writing will have an ambiguous result, so the transaction is not validated. A transaction that fails to validate is rolled back.
+ Multiversion Timestamps: A common technique in practice is for read-only transactions to be scheduled by timestamps, but with multiple versions, where a write of an element does not overwrite earlier values of that element until all transactions that could possibly need the earlier value have finished. Writing transactions are scheduled by conventional locks.
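The precedence-graph test summarized above lends itself to a short sketch. The representation is an assumption made for illustration: a schedule is a list of hypothetical `(transaction, op, element)` triples with `op` equal to `"r"` or `"w"`, and two actions conflict when they come from different transactions, touch the same element, and at least one of them is a write.

```python
from typing import Dict, List, Set, Tuple

Action = Tuple[str, str, str]   # (transaction, "r" or "w", element)


def precedence_graph(schedule: List[Action]) -> Dict[str, Set[str]]:
    """Add an arc T -> U when an action of T conflicts with a later action of U."""
    graph: Dict[str, Set[str]] = {t: set() for t, _, _ in schedule}
    for i, (t, op1, x) in enumerate(schedule):
        for u, op2, y in schedule[i + 1:]:
            if t != u and x == y and "w" in (op1, op2):
                graph[t].add(u)
    return graph


def is_acyclic(graph: Dict[str, Set[str]]) -> bool:
    """Depth-first search; reaching a node still on the stack means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for succ in graph[node]:
            if color[succ] == GRAY:
                return False
            if color[succ] == WHITE and not visit(succ):
                return False
        color[node] = BLACK
        return True

    return all(color[n] != WHITE or visit(n) for n in graph)


def is_conflict_serializable(schedule: List[Action]) -> bool:
    return is_acyclic(precedence_graph(schedule))
```

For example, the schedule r1(A); w2(A); r2(B); w1(B) yields arcs in both directions between T1 and T2, so it is not conflict-serializable.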
18.11 References for Chapter 18
The book [6] is an important source for material on scheduling, as well as locking. [3] is another important source. Two recent surveys of concurrency control are [12] and [11].
Probably the most significant paper in the field of transaction processing is [4] on two-phase locking. The warning protocol for hierarchies of granularity is from [5]. Non-two-phase locking for trees is from [10]. The compatibility matrix was introduced to study behavior of lock modes in [7].
Timestamps as a concurrency control method appeared in [2] and [1]. Scheduling by validation is from [8]. The use of multiple versions was studied by [9].
1. P. A. Bernstein and N. Goodman, "Timestamp-based algorithms for concurrency control in distributed database systems," Proc. Intl. Conf. on Very Large Databases (1980), pp. 285-300.
2. P. A. Bernstein, N. Goodman, J. B. Rothnie, Jr., and C. H. Papadimitriou, "Analysis of serializability in SDD-1: a system of distributed databases (the fully redundant case)," IEEE Trans. on Software Engineering SE-4:3 (1978), pp. 154-168.
3. P. A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in Database Systems, Addison-Wesley, Reading MA, 1987.
4. K. P. Eswaran, J. N. Gray, R. A. Lorie, and I. L. Traiger, "The notions of consistency and predicate locks in a database system," Comm. ACM 19:11 (1976), pp. 624-633.
5. J. N. Gray, F. Putzolo, and I. L. Traiger, "Granularity of locks and degrees of consistency in a shared data base," in G. M. Nijssen (ed.), Modelling in Data Base Management Systems, North Holland, Amsterdam, 1976.
6. J. N. Gray and A. Reuter, Transaction Processing: Concepts and Techniques, Morgan-Kaufmann, San Francisco, 1993.
7. H. F. Korth, "Locking primitives in a database system," J. ACM 30:1 (1983), pp. 55-79.
8. H.-T. Kung and J. T. Robinson, "Optimistic concurrency control," ACM Trans. on Database Systems 6:2 (1981), pp. 312-326.
9. C. H. Papadimitriou and P. C. Kanellakis, "On concurrency control by multiple versions," ACM Trans. on Database Systems 9:1 (1984), pp. 89-
This chapter also includes an introduction to distributed databases. We focus on how to lock elements that are distributed among several sites, perhaps with replicated copies. We also consider how the decision to commit or abort a transaction can be made when the transaction itself involves actions at several sites.
Finally, we consider the problems that arise due to "long transactions." There are applications, such as CAD systems or "workflow" systems, in which human and computer processes interact, perhaps over a period of days. These systems, like short-transaction systems such as banking or airline reservations, need to preserve consistency of the database state. However, the concurrency-control methods discussed in Chapter 18 do not work reasonably when locks are held for days, or decisions to validate are based on events that happened days in the past.
19.1 Serializability and Recoverability
In Chapter 17 we discussed the creation of a log and its use to recover the database state when a system crash occurs. We introduced the view of database computation in which values move between nonvolatile disk, volatile main-memory, and the local address space of transactions. The guarantee the various
logging methods give is that, should a crash occur, it will be possible to reconstruct the actions of the committed transactions (and only the committed transactions) on the disk copy of the database. A logging system makes no attempt to support serializability; it will blindly reconstruct a database state, even if it is the result of a nonserializable schedule of actions. In fact, commercial database systems do not always insist on serializability, and in some systems, serializability is enforced only on explicit request of the user.
On the other hand, Chapter 18 talked about serializability only. Schedulers designed according to the principles of that chapter may do things that the log manager cannot tolerate. For instance, there is nothing in the serializability definition that forbids a transaction with a lock on an element A from writing a new value of A into the database before committing, and thus violating a rule of the logging policy. Worse, a transaction might write into the database and then abort without undoing the write, which could easily result in an inconsistent database state, even though there is no system crash and the scheduler theoretically maintains serializability.
19.1.1 The Dirty-Data Problem
Recall from Section 8.6.5 that data is "dirty" if it has been written by a transaction that is not yet committed. The dirty data could appear either in the buffers, or on disk, or both; either can cause trouble.
Figure 19.1: T1 writes dirty data and then aborts
Example 19.1: Let us reconsider the serializable schedule from Fig. 18.13, but suppose that after reading B, T1 has to abort for some reason. Then the sequence of events is as in Fig. 19.1. After T1 aborts, the scheduler releases the lock on B that T1 obtained; that step is essential, or else the lock on B would be unavailable to any other transaction, forever.
However, T2 has now read data that does not represent a consistent state of the database. That is, T2 read the value of A that T1 changed, but read the value of B that existed prior to T1's actions. It doesn't matter in this case whether or not the value 125 for A that T1 created was written to disk; T2 gets that value from a buffer, regardless. As a result of reading an inconsistent state, T2 leaves the database (on disk) with an inconsistent state, where A ≠ B.
The problem in Fig. 19.1 is that A written by T1 is dirty data, whether it is in a buffer or on disk. The fact that T2 read A and used it in its own calculation makes T2's actions questionable. As we shall see in Section 19.1.2, it is necessary, if such a situation is allowed to occur, to abort and roll back T2 as well as T1.
Figure 19.2: T1 has read dirty data from T2 and must abort when T2 does
Example 19.2: Now, consider Fig. 19.2, which shows a sequence of actions under a timestamp-based scheduler as in Section 18.8. However, we imagine that this scheduler does not use the commit bit that was introduced in Section 18.8.1. Recall that the purpose of this bit is to prevent a value that was written by an uncommitted transaction from being read by another transaction. Thus, when T1 reads B at the second step, there is no commit-bit check to tell T1 to delay. T1 can proceed and could even write to disk and commit; we have not shown further details of what T1 does.
Eventually, T2 tries to write C in a physically unrealizable way, and T2 aborts. The effect of T2's prior write of B is cancelled; the value and write-time of B are reset to what they were before T2 wrote. Yet T1 has been allowed to use this cancelled value of B and can do anything with it, such as using it to compute new values of A, B, and/or C and writing them to disk. Thus T1, having read a dirty value of B, can cause an inconsistent database state. Note that, had the commit bit been recorded and used, the read r1(B) at step (2) would have
been delayed, and not allowed to occur until after T2 aborted and the value of
B had been restored to its previous (presumably committed) value
19.1.2 Cascading Rollback
As we see from the examples above, if dirty data is available to transactions, then we sometimes have to perform a cascading rollback. That is, when a transaction T aborts, we must determine which transactions have read data written by T, abort them, and recursively abort any transactions that have read data written by an aborted transaction. That is, we must find each transaction U that read dirty data written by T, abort U, find any transaction V that read dirty data from U, abort V, and so on. To cancel the effect of an aborted transaction, we can use the log, if it is one of the types (undo or undo/redo) that provides former values. We may also be able to restore the data from the disk copy of the database, if the effect of the dirty data has not migrated to disk. These approaches are considered in the next section.
As we have noted, a timestamp-based scheduler with a commit bit prevents a transaction that may have read dirty data from proceeding, so there is no possibility of cascading rollback with such a scheduler. A validation-based scheduler avoids cascading rollback, because writing to the database (even in buffers) occurs only after it is determined that the transaction will commit.
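The recursive search for transactions to abort is a reachability computation. The sketch below assumes a hypothetical bookkeeping structure, `reads_from`, that maps each transaction to the set of transactions that read dirty data it wrote; how a real system would maintain that information is not addressed here.

```python
from collections import deque
from typing import Dict, Set


def cascading_aborts(aborted: str, reads_from: Dict[str, Set[str]]) -> Set[str]:
    """Return every transaction that must abort when `aborted` does.

    reads_from[w] is the set of transactions that read dirty data
    written by w (an assumed structure, used for illustration only).
    """
    to_abort = {aborted}
    queue = deque([aborted])
    while queue:
        writer = queue.popleft()
        for reader in reads_from.get(writer, ()):
            if reader not in to_abort:
                to_abort.add(reader)
                queue.append(reader)   # its own readers must be examined too
    return to_abort


# T's dirty data was read by U, and U's by V; Y read only from X.
dirty_reads = {"T": {"U"}, "U": {"V"}, "X": {"Y"}}
print(sorted(cascading_aborts("T", dirty_reads)))
```

Here aborting T forces U and V to abort as well, while X and Y are unaffected.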
19.1.3 Recoverable Schedules
In order for any of the logging methods we have discussed in Chapter 17 to allow recovery, the set of transactions that are regarded as committed after recovery must be consistent. That is, if a transaction T1 is, after recovery, regarded as committed, and T1 used a value written by T2, then T2 must also remain committed after recovery. Thus, we define:
A schedule is recoverable if each transaction commits only after each transaction from which it has read has committed.
Example 19.3: In this and several subsequent examples of schedules with read- and write-actions, we shall use ci for the action "transaction Ti commits."
Here is an example of a recoverable schedule:
In schedule S2, T2 must precede T1 in a serial order because of the writing of A, but T1 must precede T2 because of the writing and reading of B.
Finally, observe the following variation on S1, which is serializable but not recoverable:
In schedule S3, T1 precedes T2, but their commitments occur in the wrong order.
If before a crash, the commit record for T2 reached disk, but the commit record for T1 did not, then regardless of whether undo, redo, or undo/redo logging were used, T2 would be committed after recovery, but T1 would not.
In order for schedules to be truly recoverable under any of the three logging methods, there is one additional assumption we must make regarding schedules:
The log's commit records reach disk in the order in which they are written.
As we observed in Example 19.3 concerning schedule S3, should it be possible for commit records to reach disk in the wrong order, then consistent recovery might be impossible. We return to and exploit this principle in Section 19.1.6.
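The definition of a recoverable schedule can be tested mechanically. The following sketch makes several assumptions for illustration: a schedule is a list of hypothetical `(op, transaction, element)` triples with `op` in `"r"`, `"w"`, `"c"`; a read is taken to come from the most recent writer of the element; and aborts are not modeled. The two sample schedules are made up for this sketch and are not claimed to be the schedules of Example 19.3.

```python
from typing import Dict, List, Optional, Set, Tuple

Step = Tuple[str, str, Optional[str]]   # (op, transaction, element or None)


def is_recoverable(schedule: List[Step]) -> bool:
    """Each transaction commits only after every transaction it read from."""
    last_writer: Dict[str, str] = {}
    reads_from: Dict[str, Set[str]] = {}
    committed: Set[str] = set()
    for op, txn, elem in schedule:
        if op == "w":
            last_writer[elem] = txn
        elif op == "r":
            writer = last_writer.get(elem)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif op == "c":
            if not reads_from.get(txn, set()) <= committed:
                return False   # txn commits before a transaction it read from
            committed.add(txn)
    return True


# T2 reads B from T1; only the commit order differs between the two.
in_order = [("w", "T1", "A"), ("w", "T1", "B"), ("w", "T2", "A"),
            ("r", "T2", "B"), ("c", "T1", None), ("c", "T2", None)]
wrong_order = in_order[:4] + [("c", "T2", None), ("c", "T1", None)]
print(is_recoverable(in_order), is_recoverable(wrong_order))
```

The check says nothing about serializability; as the text notes, the two properties are independent.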
19.1.4 Schedules That Avoid Cascading Rollback
Recoverable schedules sometimes require cascading rollback. For instance, if after the first four steps of schedule S1 in Example 19.3 T1 had to roll back, it would be necessary to roll back T2 as well. To guarantee the absence of cascading rollback, we need a stronger condition than recoverability. We say that:
A schedule avoids cascading rollback (or "is an ACR schedule") if transactions may read only values written by committed transactions.
Put another way, an ACR schedule forbids the reading of dirty data. As for recoverable schedules, we assume that "committed" means that the log's commit record has reached disk.
Example 19.4: The schedules of Example 19.3 are not ACR. In each case, T2 reads B from the uncommitted transaction T1. However, consider:
Now, T2 reads B only after T1, the transaction that last wrote B, has committed, and its log record has been written to disk. Thus, schedule S4 is ACR, as well as recoverable.
Notice that should a transaction such as T2 read a value written by T1 after T1 commits, then surely T2 either commits or aborts after T1 commits. Thus:
Every ACR schedule is recoverable.
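The ACR condition is stricter than recoverability because it constrains reads rather than commits. A minimal sketch follows, under stated assumptions: a schedule is a list of hypothetical `(op, transaction, element)` triples with `op` in `"r"`, `"w"`, `"c"`, a commit step stands for the commit record reaching disk, and aborts are not modeled. The sample schedules are made up for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Step = Tuple[str, str, Optional[str]]   # (op, transaction, element or None)


def is_acr(schedule: List[Step]) -> bool:
    """True if no transaction reads a value written by an uncommitted one."""
    last_writer: Dict[str, str] = {}
    committed: Set[str] = set()
    for op, txn, elem in schedule:
        if op == "w":
            last_writer[elem] = txn
        elif op == "r":
            writer = last_writer.get(elem)
            if writer is not None and writer != txn and writer not in committed:
                return False   # dirty read
        elif op == "c":
            committed.add(txn)
    return True


# T2 reads B before T1 commits (dirty read), then after T1 commits.
dirty = [("w", "T1", "A"), ("w", "T1", "B"), ("w", "T2", "A"),
         ("r", "T2", "B"), ("c", "T1", None), ("c", "T2", None)]
clean = [("w", "T1", "A"), ("w", "T1", "B"), ("w", "T2", "A"),
         ("c", "T1", None), ("r", "T2", "B"), ("c", "T2", None)]
print(is_acr(dirty), is_acr(clean))
```

The first schedule is recoverable but not ACR, which illustrates why recoverable schedules can still require cascading rollback.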
Our prior discussion applies to schedules that are generated by any kind of scheduler. In the common case that the scheduler is lock-based, there is a simple and commonly used way to guarantee that there are no cascading rollbacks:
Strict Locking: A transaction must not release any exclusive locks (or other locks, such as increment locks, that allow values to be changed) until the transaction has either committed or aborted, and the commit or abort log record has been flushed to disk.
A schedule of transactions that follow the strict-locking rule is called a strict schedule. Two important properties of these schedules are:
1. Every strict schedule is ACR. The reason is that a transaction T2 cannot read a value of element X written by T1 until T1 releases any exclusive lock (or similar lock that allows X to be changed). Under strict locking, the release does not occur until after commit.
2. Every strict schedule is serializable. To see why, observe that a strict schedule is equivalent to the serial schedule in which each transaction runs instantaneously at the time it commits.
With these observations, we can now picture the relationships among the different kinds of schedules we have seen so far. The containments are suggested in Fig. 19.3.
Figure 19.3: Containments and noncontainments among classes of schedules
Clearly, in a strict schedule it is not possible for a transaction to read dirty data, since data written to a buffer by an uncommitted transaction remains locked until the transaction commits. However, we still have the problem of fixing the data in buffers when a transaction aborts, since these changes must have their effects cancelled. How difficult it is to fix buffered data depends on whether database elements are blocks or something smaller. We shall consider each.
Rollback for Blocks
If the lockable database elements are blocks, then there is a simple rollback method that never requires us to use the log. Suppose that a transaction T has obtained an exclusive lock on block A, written a new value for A in a buffer, and then had to abort. Since A has been locked since T wrote its value, no other transaction has read A. It is easy to restore the old value of A provided the following rule is followed:
Blocks written by uncommitted transactions are pinned in main memory; that is, their buffers are not allowed to be written to disk.
In this case, we "roll back" T when it aborts by telling the buffer manager to ignore the value of A. That is, the buffer occupied by A is not written anywhere, and its buffer is added to the pool of available buffers. We can be sure that the value of A on disk is the most recent value written by a committed transaction, which is exactly the value we want A to have.
There is also a simple rollback method if we are using a multiversion system as in Sections 18.8.5 and 18.8.6. We must again assume that blocks written by uncommitted transactions are pinned in memory. Then, we simply remove the value of A that was written by T from the list of available values of A. Note that because T was a writing transaction, its value of A was locked from the time the value was written to the time it aborted (assuming the timestamp/lock scheme of Section 18.8.6 is used).
R o l l b a c k f o r Small D a t a b a s e E1ement.s When lockable database elenlcnts are fractions of a block (e.g., tuples or oh-
~ e c t s ) then the sinlple appioach to restori~lg buffels that have been ~ n o d ~ f i e d hl- aborted transactions nil1 not uoik The p ~ o h l e ~ n is that a buffer may contain data changed by t ~ v o or more transactions: if one of them aboits, Tve still nlust plesesve tlie changes made by the other \ l e have several choices \vhen we must restore thc old value of a small database element A that n-as written by the tlansaction that has a11ortt.d
1. We can read the original value of A from the database stored on disk and modify the buffer contents appropriately.
2. If the log is an undo or undo/redo log, then we can obtain the former value from the log itself. The same code used to recover from crashes may be used for "voluntary" rollbacks as well.
3. We can keep a separate main-memory log of the changes made by each transaction, preserved for only the time that transaction is active. The old value can be found from this "log."
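The third choice can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `UndoBuffer`, its methods, and the use of a dictionary `db` to stand for buffered database elements are all hypothetical, and the sketch relies on the element being locked by its writer until commit or abort, so that no other transaction overwrites it in between.

```python
class UndoBuffer:
    """Hypothetical per-transaction main-memory log of overwritten values."""

    def __init__(self):
        # txn -> list of (element, old_value), in the order the writes happened
        self.changes = {}

    def write(self, db, txn, element, new_value):
        # Remember the value being overwritten, then apply the write in the buffer.
        # Assumption: None is never a legitimate stored value, so it can mean "absent".
        self.changes.setdefault(txn, []).append((element, db.get(element)))
        db[element] = new_value

    def rollback(self, db, txn):
        # Restore old values in reverse order, then discard this transaction's log.
        for element, old in reversed(self.changes.pop(txn, [])):
            if old is None:
                db.pop(element, None)
            else:
                db[element] = old

    def commit(self, txn):
        # The log is kept only while the transaction is active.
        self.changes.pop(txn, None)


# Two transactions change different small elements held in the same buffer.
db = {"A": 1, "B": 2}
undo = UndoBuffer()
undo.write(db, "T", "A", 10)
undo.write(db, "U", "B", 20)
undo.rollback(db, "T")   # T aborts; U's change to B must be preserved
```

After the rollback, A has its old value again while the change made by U is untouched, which is exactly the property the block-level approach could not provide.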
None of these approaches is ideal. The first surely involves a disk access. The second (examining the log) might not involve a disk access if the relevant portion of the log is still in a buffer. However, it could also involve extensive examination of portions of the log on disk, searching for the update record that tells the correct former value. The last approach does not require disk accesses, but may consume a large fraction of memory for the main-memory "logs."

When is a Transaction Really Committed?
The subtlety of group commit reminds us that a completed transaction can be in several different states between when it finishes its work and when it is truly "committed," in the sense that under no circumstances, including the occurrence of a system failure, will the effect of that transaction be lost. As we noted in Chapter 17, it is possible for a transaction to finish its work and even write its COMMIT record to the log in a main-memory buffer, yet have the effect of that transaction lost if there is a system crash and the COMMIT record has not yet reached disk. Moreover, we saw in Section 17.5 that even if the COMMIT record is on disk but not yet backed up in the archive, a media failure can cause the transaction to be undone and its effect to be lost.
In the absence of failure, all these states are equivalent, in the sense that each transaction will surely advance from being finished to having its effects survive even a media failure. However, when we need to take failures and recovery into account, it is important to recognize the differences among these states, which otherwise could all be referred to informally as "committed."
Under some circumstances, we can avoid reading dirty data even if we do not flush every commit record on the log to disk immediately. As long as we flush log records in the order that they are written, we can release locks as soon as the commit record is written to the log in a buffer.
Example 19.5: Suppose transaction T1 writes X, finishes, writes its COMMIT record on the log, but the log record remains in a buffer. Even though T1 has not committed in the sense that its commit record can survive a crash, we shall release T1's locks. Then T2 reads X and "commits," but its commit record, which follows that of T1, also remains in a buffer. Since we are flushing log records in the order written, T2 cannot be perceived as committed by a recovery manager (because its commit record reached disk) unless T1 is also perceived as committed. Thus, there are three cases that the recovery manager could find:
1. Neither T1 nor T2 has its commit record on disk. Then both are aborted by the recovery manager, and the fact that T2 read X from an uncommitted transaction is irrelevant.
2. T1 is committed, but T2 is not. There is no problem for two reasons: T2 did not read X from an uncommitted transaction, and it aborted anyway, with no effect on the database.
3. Both are committed. Then the read of X by T2 was not dirty.
On the other hand, suppose that the buffer containing T2's commit record got flushed to disk (say because the buffer manager decided to use the buffer for something else), but the buffer containing T1's commit record did not. If there is a crash at that point, it will look to the recovery manager that T1 did not commit, but T2 did. The effect of T2 will be permanently reflected in the database, but this effect was based on the dirty read of X by T2. □
Our conclusion from Example 19.5 is that we can release locks earlier than the time that the transaction's commit record is flushed to disk. This policy, often called group commit, is:
- Do not release locks until the transaction finishes, and the commit log record at least appears in a buffer.
- Flush log blocks in the order that they were created.
Group commit, like the policy of requiring "recoverable schedules" as discussed in Section 19.1.3, guarantees that there is never a read of dirty data.
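The effect of the second rule can be checked with a toy model of Example 19.5. The following Python sketch is an illustration only; the `Log` class, the record format, and the function names are assumptions made for the example, not part of any real system. It enumerates every point at which a crash could interrupt an in-order flush and records which transactions the recovery manager would regard as committed.

```python
class Log:
    """Toy log: records accumulate in a buffer and reach disk only in order."""

    def __init__(self):
        self.buffer = []   # records not yet on disk, oldest first
        self.disk = []     # records that would survive a crash

    def append(self, record):
        self.buffer.append(record)

    def flush(self, n):
        # Flush the n oldest buffered records, preserving creation order.
        self.disk.extend(self.buffer[:n])
        del self.buffer[:n]


def committed_after_crash(log):
    # The recovery manager sees only COMMIT records that reached disk.
    return {txn for kind, txn in log.disk if kind == "COMMIT"}


def run(flushed_before_crash):
    log = Log()
    log.append(("COMMIT", "T1"))   # T1 finishes; its locks may now be released
    log.append(("COMMIT", "T2"))   # T2 read X from T1, then finished
    log.flush(flushed_before_crash)
    return committed_after_crash(log)


# One outcome for each possible crash point: 0, 1, or 2 records on disk.
outcomes = [run(n) for n in range(3)]
```

The three outcomes correspond to the three cases of Example 19.5. The problematic state, in which T2 appears committed but T1 does not, never arises as long as flushing respects the order of creation.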
19.1.7 Logical Logging
We saw in Section 19.1.5 that dirty reads are easier to fix up when the unit of locking is the block or page. However, there are at least two problems presented when database elements are blocks.
1. All logging methods require either the old or new value of a database element, or both, to be recorded in the log. When the change to a block is small, e.g., a rewritten attribute of one tuple, or an inserted or deleted tuple, then there is a great deal of redundant information written on the log.
2. The requirement that the schedule be recoverable, releasing its locks only after commit, can inhibit concurrency severely. For example, recall our discussion in Section 18.7.1 of the advantage of early lock release as we access data through a B-tree index. If we require that locks be held until commit, then this advantage cannot be obtained, and we effectively allow only one writing transaction to access a B-tree at any time.
Both these concerns motivate the use of logical logging, where only the changes to the blocks are described. There are several degrees of complexity, depending on the nature of the change.
1. A small number of bytes of the database element are changed, e.g., the update of a fixed-length field. This situation can be handled in a straightforward way, where we record only the changed bytes and their positions. Example 19.6 will show this situation and an appropriate form of update record.
2. The change to the database element is simply described, and easily restored, but it has the effect of changing most or all of the bytes in the database element. One common situation, discussed in Example 19.7, is when a variable-length field is changed and much of its record, and even other records, must slide within the block. The new and old values of the block look very different unless we realize and indicate the simple cause of the change.
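3. The change can be undone or redone only at a logical level higher than the blocks themselves. We shall, in Example 19.8, take up the matter of B-trees, a logical structure represented by database elements that are disk blocks, to illustrate this complex form of logical logging.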
Example 19.6: Suppose database elements are blocks that each contain a set of tuples from some relation. We can express the update of an attribute by a log record that says something like "tuple t had its attribute a changed from value v1 to v2." An insertion of a new tuple into empty space on the block can be expressed as "a tuple t with value (a1, a2, ..., ak) was inserted beginning at offset position p." Unless the attribute changed or the tuple inserted are comparable in size to a block, the amount of space taken by these records will be much smaller than the entire block. Moreover, they serve for both undo and redo operations.
Notice that both these operations are idempotent; if you perform them several times on a block, the result is the same as performing them once. Likewise, their implied inverses, where the value of t[a] is restored from v2 back to v1, or the tuple t is removed, are also idempotent. Thus, records of these types can be used for recovery in exactly the same way that update log records were used throughout Chapter 17. □
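To make the idempotence claim concrete, here is a minimal Python sketch of the two kinds of record from Example 19.6. The record layout (`UpdateRecord`, `InsertRecord`) and the representation of a block as a dictionary of tuples or as a `bytearray` are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class UpdateRecord:
    """'Tuple t had its attribute a changed from value v1 to v2.'"""
    tuple_id: str
    attr: str
    old: object
    new: object


@dataclass
class InsertRecord:
    """'A tuple with the given bytes was inserted beginning at offset p.'"""
    offset: int
    data: bytes


def redo_update(block, rec):
    block[rec.tuple_id][rec.attr] = rec.new    # sets a value, so repeatable


def undo_update(block, rec):
    block[rec.tuple_id][rec.attr] = rec.old    # the implied inverse


def redo_insert(page, rec):
    # Overwrite a fixed slice of the page; the page length does not change.
    page[rec.offset:rec.offset + len(rec.data)] = rec.data


# Applying each record twice leaves the same state as applying it once.
block = {"t": {"a": 1}}
upd = UpdateRecord("t", "a", old=1, new=2)
redo_update(block, upd)
once = {k: dict(v) for k, v in block.items()}
redo_update(block, upd)

page = bytearray(16)
ins = InsertRecord(offset=4, data=b"abc")
redo_insert(page, ins)
page_once = bytes(page)
redo_insert(page, ins)
```

Because each operation assigns a value at a known position rather than, say, incrementing a counter or appending to a list, repetition during recovery is harmless.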
Example 19.7: Again assume database elements are blocks holding tuples, but the tuples have some variable-length fields. If a change to a field such as was described in Example 19.6 occurs, we may have to slide large portions of the block to make room for a longer field, or to preserve space if a field becomes smaller. In extreme cases, we could have to create an overflow block (recall Section 12.5) to hold part of the contents of the original block, or we could remove an overflow block if a shorter field allows us to combine the contents of two blocks into one.
As long as the block and its overflow block(s) are considered part of one database element, then it is straightforward to use the old and/or new value of the changed field to undo or redo the change. However, the block-plus-overflow-block(s) must be thought of as holding certain tuples at a "logical" level. We may not even be able to restore the bytes of these blocks to their original state after an undo or redo, because there may have been reorganization of the blocks due to other changes that varied the length of other fields. However, if we think of a database element as being a collection of blocks that together represent certain tuples, then a redo or undo can indeed restore the logical "state" of the element. □
However, it may not be possible, as we suggested in Example 19.7, to treat blocks as expandable through the mechanism of overflow blocks. We may thus be able to undo or redo actions only at a level higher than blocks. The next example discusses the important case of B-tree indexes, where the management of blocks does not permit overflow blocks, and we must think of undo and redo as occurring at the logical level of the B-tree itself, rather than the blocks.
Example 19.8: Let us consider the problem of logical logging for B-tree nodes. Instead of writing the old and/or new value of an entire node (block) on the log, we write a short record that describes the change. These changes include:
1. Insertion or deletion of a key/pointer pair for a child.
2. Change of the key associated with a pointer.
3. Splitting or merging of nodes.
Each of these changes can be indicated with a short log record. Even the splitting operation requires only telling where the split occurs, and where the new nodes are. Likewise, merging requires only a reference to the nodes involved, since the manner of merging is determined by the B-tree management algorithms used.
Using logical update records of these types allows us to release locks earlier than would otherwise be required for a recoverable schedule. The reason is that dirty reads of B-tree blocks are never a problem for the transaction that reads them, provided its only purpose is to use the B-tree to locate the data the transaction needs to access.
For instance, suppose that transaction T reads a leaf node N, but the transaction U that last wrote N later aborts, and some change made to N (e.g., the insertion of a new key/pointer pair into N due to an insertion of a tuple by U) needs to be undone. If T has also inserted a key/pointer pair into N, then it is not possible to restore N to the way it was before U modified it. However, the effect of U on N can be undone; in this example we would delete the key/pointer pair that U had inserted. The resulting N is not the same as that which existed before U operated; it has the insertion made by T. However, there is no database inconsistency, since the B-tree as a whole continues to reflect only the changes made by committed transactions. That is, we have restored the B-tree at a logical level, but not at the physical level. □
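The situation in Example 19.8 can be imitated with a leaf node modeled as a sorted list of key/pointer pairs. This Python sketch is a simplification under stated assumptions: the helper names are hypothetical, and splitting, merging, and locking are ignored entirely.

```python
import bisect


def insert_pair(leaf, key, ptr):
    # Logical action "insert key/pointer pair"; keeps the leaf sorted by key.
    bisect.insort(leaf, (key, ptr))


def delete_pair(leaf, key, ptr):
    # Compensating action for insert_pair.
    leaf.remove((key, ptr))


leaf = [(10, "p10"), (30, "p30")]
before_u = list(leaf)            # physical state before U touched the node

insert_pair(leaf, 20, "p20")     # U inserts a pair
insert_pair(leaf, 25, "p25")     # T inserts a pair into the same node
delete_pair(leaf, 20, "p20")     # U aborts: undo U's effect logically
```

The leaf now differs from `before_u`, since it still contains the pair inserted by T, yet nothing written by the aborted U remains. That is what restoring the node "at a logical level, but not at the physical level" amounts to in this small model.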
If the logical actions are idempotent, i.e., they can be repeated any number of times without harm, then we can recover easily using a logical log. For instance, we discussed in Example 19.6 how a tuple insertion could be represented in the logical log by the tuple and the place within a block where the tuple was placed. If we write that tuple in the same place two or more times, then it is as if we had written it once. Thus, when recovering, should we need to redo a transaction that inserted a tuple, we can repeat the insertion into the proper block at the proper place, without worrying whether we had already inserted that tuple.
In contrast, consider a situation where tuples can move around within blocks or between blocks, as in Examples 19.7 and 19.8. Now, we cannot associate a particular place into which a tuple is to be inserted; the best we can do is place in the log an action such as "the tuple t was inserted somewhere on block B." If we need to redo the insertion of t during recovery, we may wind up with two copies of t in block B. Worse, we may not know whether the block B with the first copy of t made it to disk. Another transaction writing to another database element on block B may have caused a copy of B to be written to disk, for example.
To disambiguate situations such as this when we recover using a logical log, a technique called log sequence numbers has been developed.
- Each log record is given a number one greater than that of the previous log record.¹ Thus, a typical logical log record has the form <L, T, A, B>, where:
  - L is the log sequence number, an integer.
  - T is the transaction involved.
  - A is the action performed by T, e.g., "insert of tuple t."
  - B is the block on which the action was performed.
- For each action, there is a compensating action that logically undoes the action. As discussed in Example 19.8, the compensating action may not restore the database to exactly the same state S it would have been in had the action never occurred, but it restores the database to a state that is logically equivalent to S. For instance, the compensating action for "insert tuple t" is "delete tuple t."
¹ Eventually the log sequence numbers must restart at 0, but the time between restarts of the sequence is so large that no ambiguity can occur.
- If a transaction T aborts, then for each action performed on the database by T, the compensating action is performed, and the fact that this action was performed is also recorded in the log.
- Each block maintains, in its header, the log sequence number of the last action that affected that block.
Suppose now that we need to use the logical log to recover after a crash. Here is an outline of the steps to take.
1. Our first step is to reconstruct the state of the database at the time of the crash, including blocks whose current values were in buffers and therefore got lost. To do so:
(a) Find the most recent checkpoint on the log, and determine from it the set of transactions that were active at that time.
(b) For each log entry <L, T, A, B>, compare the log sequence number N on block B with the log sequence number L for this log record. If N < L, then redo action A; that action was never performed on block B. However, if N ≥ L, then do nothing; the effect of A was already felt by B.
(c) For each log entry that informs us that a transaction T started, committed, or aborted, adjust the set of active transactions accordingly.
2. The set of transactions that remain active when we reach the end of the log must be aborted. To do so:
(a) Scan the log again, this time from the end back to the previous checkpoint. Each time we encounter a record <L, T, A, B> for a transaction T that must be aborted, perform the compensating action for A on block B and record in the log the fact that that compensating action was performed.
(b) If we must abort a transaction that began prior to the most recent checkpoint (i.e., that transaction was on the active list for the checkpoint), then continue back in the log until the start-records for all such transactions have been found.
(c) Write abort-records in the log for each of the transactions we had to abort.
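The outline above can be sketched in Python as follows. This is a simplified illustration, not a complete recovery manager: the record formats, the `recover` and `apply` names, and the representation of a block as a dictionary with an `lsn` header and a list of tuples are assumptions. It further assumes that the log passed in begins at a checkpoint with no active transactions, so step 2(b) never has to look further back.

```python
COMPENSATE = {"insert": "delete", "delete": "insert"}


def apply(block, lsn, action):
    """Perform a logical action on a block and stamp the block with the LSN."""
    kind, t = action
    if kind == "insert":
        block["tuples"].append(t)   # "somewhere on the block": not idempotent
    else:
        block["tuples"].remove(t)
    block["lsn"] = lsn


def recover(log, blocks):
    """log: oldest-first list of (status, txn) and (L, T, A, B) records.
    blocks: block id -> {"lsn": N, "tuples": [...]} as found on disk."""
    active, last_lsn = set(), 0
    # Step 1: scan forward, redoing an action only if the block has not seen it.
    for rec in log:
        if len(rec) == 2:                       # START, COMMIT, or ABORT
            status, txn = rec
            if status == "START":
                active.add(txn)
            else:
                active.discard(txn)
        else:
            lsn, txn, action, b = rec
            if blocks[b]["lsn"] < lsn:          # N < L: redo
                apply(blocks[b], lsn, action)
            last_lsn = lsn
    # Step 2: scan backward, compensating actions of still-active transactions.
    appended = []
    for rec in reversed(log):
        if len(rec) == 4 and rec[1] in active:
            _, txn, (kind, t), b = rec
            last_lsn += 1
            comp = (COMPENSATE[kind], t)
            apply(blocks[b], last_lsn, comp)
            appended.append((last_lsn, txn, comp, b))   # log the compensation
    appended.extend(("ABORT", txn) for txn in sorted(active))
    return log + appended


# Block B reached disk after T1's insert (LSN 1) but before T2's (LSN 2).
blocks = {"B": {"lsn": 1, "tuples": ["t"]}}
log = [("START", "T1"), (1, "T1", ("insert", "t"), "B"), ("COMMIT", "T1"),
       ("START", "T2"), (2, "T2", ("insert", "u"), "B")]
new_log = recover(log, blocks)
```

In this run the insertion of t is not repeated, because the block's header already carries LSN 1, while the insertion of u is redone and then compensated because T2 never committed.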
* Exercise 19.1.1: Consider all ways to insert locks (of a single type only, as in Section 18.3) into the sequence of actions
so that the transaction T1 is:
a) Two-phase locked, and strict.
b) Two-phase locked, but not strict.
Exercise 19.1.2: Suppose that each of the sequences of actions below is followed by an abort action for transaction T1. Tell which transactions need to be rolled back.
* a) r1(A); r2(B); w1(B); w2(C); r3(B); r3(C); w3(D);
b) r1(A); w1(B); r2(B); w2(C); r3(C); w3(D);
c) r2(A); r3(A); r1(A); w1(B); r2(B); r3(B); w2(C); r3(C);
d) r2(A); r3(A); r1(A); w1(B); r3(B); w2(C); r3(C);
Exercise 19.1.3: Consider each of the sequences of actions in Exercise 19.1.2, but now suppose that all three transactions commit and write their commit record on the log immediately after their last action. However, a crash occurs, and a tail of the log was not written to disk before the crash and is therefore lost. Tell, depending on where the lost tail of the log begins:
i. What transactions could be considered uncommitted?
ii. Are any dirty reads created during the recovery process? If so, what transactions need to be rolled back?
iii. What additional dirty reads could have been created if the portion of the log lost was not a tail, but rather some portions in the middle?
! Exercise 19.1.4: Consider the following two transactions:
T1: w1(A); w1(B); r1(C); c1;
T2: w2(A); r2(B); w2(C); c2;
* a) How many schedules of T1 and T2 are recoverable?
b) Of these, how many are ACR schedules?
c) How many are both recoverable and serializable?
d) How many are both ACR and serializable?
Exercise 19.1.5: Give an example of an ACR schedule with shared and exclusive locks that is not strict.
19.2 View Serializability
Recall our discussion in Section 18.1.4 of how our true goal in the design of a scheduler is to allow only schedules that are serializable. We also saw how differences in what operations transactions apply to the data can affect whether or not a given schedule is serializable. We also learned in Section 18.2 that schedulers normally enforce "conflict serializability," which guarantees serializability regardless of what the transactions do with their data.
However, there are weaker conditions than conflict-serializability that also guarantee serializability. In this section we shall consider one such condition, called "view-serializability." Intuitively, view-serializability considers all the connections between transactions T and U such that T writes a database element whose value U reads. The key difference between view- and conflict-serializability appears when a transaction T writes a value A that no other transaction reads (because some other transaction later writes its own value for A). In that case, the wT(A) action can be placed in certain other positions of the schedule (where A is likewise never read) that would not be permitted under the definition of conflict-serializability. In this section, we shall define view-serializability precisely and give a test for it.
19.2.1 View Equivalence
Suppose we have two schedules S1 and S2 of the same set of transactions. Imagine that there is a hypothetical transaction T0 that wrote initial values for each database element read by any transaction in the schedules, and another hypothetical transaction Tf that reads every element written by one or more transactions after each schedule ends. Then for every read action ri(A) in one of the schedules, we can find the write action wj(A) that most closely preceded the read in question. We say Tj is the source of the read action ri(A). Note that transaction Tj could be the hypothetical initial transaction T0, and Ti could be Tf.
If for every read action in one of the schedules, its source is the same in the other schedule, we say that S1 and S2 are view-equivalent. Surely, view-equivalent schedules are truly equivalent; they each do the same when executed on any one database state. If a schedule S is view-equivalent to a serial schedule, we say S is view-serializable.
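The definition translates directly into a short computation. In the Python sketch below, a schedule is a list of (transaction, operation, element) triples; this encoding and the function names are assumptions for illustration, and the comparison assumes each transaction reads a given element at most once. The sample schedule is one that is consistent with the description of S in Example 19.9; the exact position of r3(A) relative to the actions of T1 is an assumption and does not affect the sources.

```python
def read_sources(schedule):
    """Return sorted (reader, element, source) triples, including Tf's reads."""
    last_writer = {}
    sources = []
    for txn, op, elem in schedule:
        if op == "r":
            # No prior write means the value came from the hypothetical T0.
            sources.append((txn, elem, last_writer.get(elem, "T0")))
        else:
            last_writer[elem] = txn
    # Tf reads every element written by some transaction, after the schedule ends.
    for elem, writer in last_writer.items():
        sources.append(("Tf", elem, writer))
    return sorted(sources)


def view_equivalent(s1, s2):
    return read_sources(s1) == read_sources(s2)


S = [("T2", "r", "B"), ("T2", "w", "A"), ("T1", "r", "A"), ("T3", "r", "A"),
     ("T1", "w", "B"), ("T2", "w", "B"), ("T3", "w", "B")]

# Serial schedules in the orders (T2, T1, T3) and (T1, T2, T3).
serial_213 = [("T2", "r", "B"), ("T2", "w", "A"), ("T2", "w", "B"),
              ("T1", "r", "A"), ("T1", "w", "B"),
              ("T3", "r", "A"), ("T3", "w", "B")]
serial_123 = [("T1", "r", "A"), ("T1", "w", "B"),
              ("T2", "r", "B"), ("T2", "w", "A"), ("T2", "w", "B"),
              ("T3", "r", "A"), ("T3", "w", "B")]
```

Under these assumptions `view_equivalent(S, serial_213)` holds, while `view_equivalent(S, serial_123)` does not, because in the latter order T1 would read A from T0 rather than from T2.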
Example 19.9: Consider the schedule S defined by:
T1:                 r1(A)         w1(B)
T2:  r2(B)  w2(A)                        w2(B)
T3:                        r3(A)                w3(B)
Notice that we have separated the actions of each transaction vertically, to indicate better which transaction does what; you should read the schedule from left-to-right, as usual.
In S, both T1 and T2 write values of B that are lost; only the value of B written by T3 survives to the end of the schedule and is "read" by the hypothetical transaction Tf. S is not conflict-serializable. To see why, first note that T2 writes A before T1 reads A, so T2 must precede T1 in a hypothetical conflict-equivalent serial schedule. The fact that the action w1(B) precedes w2(B) also forces T1 to precede T2 in any conflict-equivalent serial schedule. Yet neither w1(B) nor w2(B) has any long-term effect on the database. It is these sorts of irrelevant writes that view-serializability is able to ignore, when determining the true constraints on an equivalent serial schedule.
More precisely, let us consider the sources of all the reads in S:
1. The source of r2(B) is T0, since there is no prior write of B in S.
2. The source of r1(A) is T2, since T2 most recently wrote A before the read.
3. Likewise, the source of r3(A) is T2.
4. The source of the hypothetical read of A by Tf is T2.
5. The source of the hypothetical read of B by Tf is T3, the last writer of B.
Of course, T0 appears before all real transactions in any schedule, and Tf appears after all transactions. If we order the real transactions (T2, T1, T3), then the sources of all reads are the same as in schedule S. That is, T2 reads B, and surely T0 is the previous "writer." T1 reads A, but T2 already wrote A, so the source of r1(A) is T2, as in S. T3 also reads A, but since the prior T2 wrote A, that is the source of r3(A), as in S. Finally, the hypothetical Tf reads A and B, but the last writers of A and B in the schedule (T2, T1, T3) are T2 and T3 respectively, also as in S. We conclude that S is a view-serializable schedule, and the schedule represented by the order (T2, T1, T3) is a view-equivalent schedule. □
19.2.2 Polygraphs and the Test for View-Serializability
There is a generalization of the precedence graph, which we used to test conflict serializability in Section 18.2.2, that reflects all the precedence constraints required by the definition of view serializability. We define the polygraph for a schedule to consist of the following:
1. A node for each transaction and additional nodes for the hypothetical transactions T0 and Tf.
2. For each action ri(X) with source Tj, place an arc from Tj to Ti.
3. Suppose Tj is the source of a read ri(X), and Tk is another writer of X. It is not allowed for Tk to intervene between Tj and Ti, so it must appear either before Tj or after Ti. We represent this condition by an arc pair (shown dashed) from Tk to Tj and from Ti to Tk. Intuitively, one or the other of an arc pair is "real," but we don't care which, and when we try to make the polygraph acyclic, we can pick whichever of the pair helps to make it acyclic. However, there are important special cases where the arc pair becomes a single arc:
(a) If Tj is T0, then it is not possible for Tk to appear before Tj, so we use an arc Ti → Tk in place of the arc pair.
(b) If Ti is Tf, then Tk cannot follow Ti, so we use an arc Tk → Tj in place of the arc pair.
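The three rules can be sketched as a small Python function. The encoding of a schedule as (transaction, operation, element) triples and the name `polygraph` are assumptions made for this illustration; reads by a transaction of its own write are simply skipped. The sample schedule is one consistent with the description of S in Example 19.9.

```python
def polygraph(schedule):
    """Return (arcs, pairs): single arcs and arc pairs of the polygraph."""
    last_writer, writers, reads = {}, {}, []
    for txn, op, elem in schedule:
        if op == "r":
            reads.append((last_writer.get(elem, "T0"), txn, elem))
        else:
            last_writer[elem] = txn
            writers.setdefault(elem, set()).add(txn)
    for elem, w in last_writer.items():
        reads.append((w, "Tf", elem))          # Tf reads every written element

    arcs, pairs = set(), set()
    for src, reader, elem in reads:
        if src == reader:
            continue                           # a transaction reading its own write
        arcs.add((src, reader))                # rule 2: source -> reader
        # Rule 3: every other writer Tk of the element must stay outside the arc.
        for k in writers.get(elem, set()) - {src, reader}:
            if src == "T0":
                arcs.add((reader, k))          # case (a): Tk must follow the reader
            elif reader == "Tf":
                arcs.add((k, src))             # case (b): Tk must precede the source
            else:
                pairs.add(((k, src), (reader, k)))
    return arcs, pairs


S = [("T2", "r", "B"), ("T2", "w", "A"), ("T1", "r", "A"), ("T3", "r", "A"),
     ("T1", "w", "B"), ("T2", "w", "B"), ("T3", "w", "B")]
arcs, pairs = polygraph(S)
```

For this schedule every potential arc pair falls under one of the two special cases, so `pairs` comes out empty and the polygraph is an ordinary graph.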
Figure 19.4: Beginning of polygraph for Example 19.10
Example 19.10: Consider the schedule S from Example 19.9. We show in Fig. 19.4 the beginning of the polygraph for S, where only the nodes and the arcs from rule (2) have been placed. We have also indicated the database element causing each arc. That is, A is passed from T2 to T1, T3, and Tf, while B is passed from T0 to T2 and from T3 to Tf.
Now, we must consider what transactions might interfere with each of these five connections by writing the same element between them. These potential interferences are ruled out by the arc pairs from rule (3), although as we shall see, in this example each of the arc pairs involves a special case and becomes a single arc.
Consider the arc T2 → T1 based on element A. The only writers of A are T0 and T2, and neither of them can get in the middle of this arc, since T0 cannot move its position, and T2 is already an end of the arc. Thus, no additional arcs are needed. A similar argument tells us no additional arcs are needed to keep writers of A outside the arcs T2 → T3 and T2 → Tf.
Now consider the arcs based on B. Note that T0, T1, T2, and T3 all write B. Consider the arc T0 → T2 first. T1 and T3 are other writers of B; T0 and T2 also write B, but as we saw, the arc ends cannot cause interference, so we need not consider them. As we cannot place T1 between T0 and T2, in principle we need the arc pair (T1 → T0, T2 → T1). However, nothing can precede T0, so the option T1 → T0 is not possible. We may in this special case just add the arc T2 → T1 to the polygraph. But this arc is already there because of A, so in effect, we make no change to the polygraph to keep T1 outside the arc T0 → T2.
We also cannot place T3 between T0 and T2. Similar reasoning tells us to add the arc T2 → T3, rather than an arc pair. However, this arc too is already in the polygraph because of A, so we make no change.
Next, consider the arc T3 → Tf. Since T0, T1, and T2 are other writers of B, we must keep them each outside the arc. T0 cannot be moved between T3 and Tf, but T1 or T2 could. Since neither could be moved after Tf, we must constrain T1 and T2 to appear before T3. There is already an arc T2 → T3, but we must add to the polygraph the arc T1 → T3. This change is the only arc we must add to the polygraph, whose final set of arcs is shown in Fig. 19.5. □
Figure 19.5: Complete polygraph for Example 19.10
Example 19.11: In Example 19.10, all the arc pairs turned out to be single arcs as a special case. Figure 19.6 is an example of a schedule of four transactions where there is a true arc pair in the polygraph.
... be added. As we saw in Example 19.10, there are several simplifications we can make. When avoiding interference with the arc Tj → Ti, the only transactions that need be considered as Tk (the transaction that cannot be in the middle) are:
- Writers of an element that caused this arc Tj → Ti,
- But not T0 or Tf, which can never be Tk, and
- Not Ti or Tj, the ends of the arc itself.
With these rules in mind, let us consider the arcs due to database element A, which is written by T0, T3, and T4. We need not consider T0 at all. T3 must not get between T4 → Tf, so we add arc T3 → T4; remember that the other arc in the pair, Tf → T3, is not an option. Likewise, T3 must not get between T0 → T1 or T0 → T2, which results in the arcs T1 → T3 and T2 → T3.
Figure 19.7: Beginning of polygraph for Example 19.11
Now, consider the fact that T4 also must not get in the middle of an arc due to A. It is an end of T4 → Tf, so that arc is irrelevant. T4 must not get between T0 → T1 or T0 → T2, which results in the arcs T1 → T4 and T2 → T4.
Next, let us consider the arcs due to B, which is written by T0, T1, and T4. Again we need not consider T0. The only arcs due to B are T1 → T2, T1 → T4, and T4 → Tf. T1 cannot get in the middle of the first two, but the third requires arc T1 → T4.
T4 can get in the middle of T1 → T2 only. This arc has neither end at T0 or Tf, so it really requires an arc pair: (T4 → T1, T2 → T4). We show this arc pair, as well as all the other arcs added, in Fig. 19.8.
Next, consider the writers of C: T0 and T1. As before, T0 cannot present a problem. Also, T1 is part of every arc due to C, so it cannot get in the middle. Similarly, D is written only by T0 and T2, so we can determine that no more arcs are necessary. The final polygraph is thus the one in Fig. 19.8. □
Figure 19.8: Complete polygraph for Example 19.11
19.2.3 Testing for View-Serializability
Since we must choose only one of each arc pair, we can find an equivalent serial order for schedule S if and only if there is some selection from each arc pair that turns S's polygraph into an acyclic graph. To see why, notice that if there is such an acyclic graph, then any topological sort of the graph gives an order in which no writer may appear between a reader and its source, and every writer appears before its readers. Thus, the reader-source connections in the serial order are exactly the same as in S; the two schedules are view-equivalent, and therefore S is view-serializable.
Conversely, if S is view-serializable, then there is a view-equivalent serial order S'. Every arc pair (Tk → Tj, Ti → Tk) in S's polygraph must have Tk either before Tj or after Ti in S'; otherwise the writing by Tk breaks the connection from Tj to Ti, which means that S and S' are not view-equivalent. Likewise, every arc in the polygraph must be respected by the transaction order of S'. We conclude that there is a choice of arcs from each arc pair that makes the polygraph into a graph for which the serial order S' is consistent with each arc of the graph. Thus, this graph is acyclic.
Example 19.12: Consider the polygraph of Fig. 19.5. It is already a graph, and it is acyclic. The only topological order is (T2, T1, T3), which is therefore a view-equivalent serial order for the schedule of Example 19.10.
Now consider the polygraph of Fig. 19.8. We must consider each choice from the one arc pair. If we choose T4 → T1, then there is a cycle. However, if we choose T2 → T4, the result is an acyclic graph. The sole topological order for this graph is (T1, T2, T3, T4). This order yields a view-equivalent serial order and shows that the original schedule is view-serializable. □
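The test can be sketched as a brute-force search over the choices from each arc pair, using the topological sorter in Python's standard `graphlib` module (Python 3.9 or later). The function name `serial_orders` is hypothetical. The arcs below are those enumerated in the text of Example 19.11 for elements A and B; the arcs due to C and D are not listed there and are omitted, which is an assumption of this illustration. The search is exponential in the number of arc pairs, so it is suitable only for small examples.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import product


def serial_orders(nodes, arcs, pairs):
    """Yield (choice, order) for each selection from the arc pairs that
    leaves the graph acyclic; order is one topological sort of that graph."""
    for choice in product(*pairs):
        preds = {n: set() for n in nodes}
        for a, b in list(arcs) + list(choice):
            preds[b].add(a)                     # a must come before b
        try:
            yield choice, list(TopologicalSorter(preds).static_order())
        except CycleError:
            continue                            # this selection creates a cycle


nodes = ["T0", "T1", "T2", "T3", "T4", "Tf"]
arcs = [("T0", "T1"), ("T0", "T2"), ("T4", "Tf"),    # sources, element A
        ("T1", "T2"), ("T1", "T4"),                  # sources, element B
        ("T3", "T4"), ("T1", "T3"), ("T2", "T3"),    # keep T3 outside arcs on A
        ("T2", "T4")]                                # keep T4 outside arcs on A
pairs = [(("T4", "T1"), ("T2", "T4"))]               # the one true arc pair

results = list(serial_orders(nodes, arcs, pairs))
```

Only the selection T2 → T4 survives, and the resulting order places the real transactions as (T1, T2, T3, T4), in agreement with Example 19.12.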
Exercise 19.2.1: Draw the polygraph and find all view-equivalent serial orders for the following schedules:
In Section 18.4.3 we saw how the ability to upgrade locks from shared to exclusive can cause a deadlock, because each transaction holds a shared lock on the same element and wants to upgrade the lock.
There are two broad approaches to dealing with deadlock. We can detect deadlocks and fix them, or we can manage transactions in such a way that deadlocks are never able to form.
19.3.1 Deadlock Detection by Timeout
When a deadlock exists, it is generally impossible to repair the situation so that all transactions involved can proceed. Thus, at least one of the transactions will have to be rolled back, that is, aborted and restarted.
The simplest way to detect and resolve deadlocks is with a timeout. Put a limit on how long a transaction may be active, and if a transaction exceeds this time, roll it back. For example, in a simple transaction system, where typical transactions execute in milliseconds, a timeout of one minute would affect only transactions that are caught in a deadlock. If some transactions are more complex, we might want the timeout to occur after a longer interval, however.
Notice that when one transaction involved in the deadlock times out, it releases its locks or other resources. Thus, there is a chance that the other transactions involved in the deadlock will complete before reaching their timeout limits. However, since transactions involved in a deadlock are likely to have started at approximately the same time (or else, one would have completed before another started), it is also possible that spurious timeouts of transactions that are no longer involved in a deadlock will occur.
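A timeout policy of this kind needs little more than a table of start times. The Python sketch below is only an illustration; the class `TimeoutMonitor` and its methods are hypothetical, and the clock is passed in as a parameter so that the example can be run with a simulated clock instead of waiting for real time to pass.

```python
import time


class TimeoutMonitor:
    """Track how long each transaction has been active."""

    def __init__(self, limit_seconds, clock=time.monotonic):
        self.limit = limit_seconds
        self.clock = clock
        self.started = {}            # txn -> time at which it began

    def begin(self, txn):
        self.started[txn] = self.clock()

    def finish(self, txn):
        # Called on commit or after a rollback.
        self.started.pop(txn, None)

    def expired(self):
        # Transactions that have exceeded the limit and should be rolled back.
        now = self.clock()
        return [t for t, s in self.started.items() if now - s > self.limit]


# Simulated clock: T1 starts at time 0, T2 at time 30; we check at time 70.
now = [0.0]
monitor = TimeoutMonitor(60, clock=lambda: now[0])
monitor.begin("T1")
now[0] = 30.0
monitor.begin("T2")
now[0] = 70.0
victims = monitor.expired()
```

At time 70 only T1 has exceeded the one-minute limit. Once T1 is rolled back and releases its locks, T2 may be able to finish before its own limit is reached, as the text observes.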
19.3.2 The Waits-For Graph
Deadlocks that are caused by transactions waiting for locks held by another can be addressed by a waits-for graph, indicating which transactions are waiting for locks held by another transaction. This graph can be used either to detect deadlocks after they have formed or to prevent deadlocks from ever forming. We shall assume the latter, which requires us to maintain the waits-for graph at all times, refusing to allow an action that creates a cycle in the graph.
Recall from Section 18.5.2 that a lock table maintains for each database element X a list of the transactions that are waiting for locks on X, as well as transactions that currently hold locks on X. The waits-for graph has a node for each transaction that currently holds a lock or is waiting for one. There is an arc from node (transaction) T to node U if there is some database element A such that:
1. U holds a lock on A,
2. T is waiting for a lock on A, and
3. T cannot get a lock on A in its desired mode unless U first releases its lock on A.³
If there are no cycles in the waits-for graph, then each transaction can eventually complete. There will be at least one transaction waiting for no other transaction, and this transaction surely can complete. At that time, there will be at least one other transaction that is not waiting, which can complete, and so on.
However, if there is a cycle, then no transaction in the cycle can ever make progress, so there is a deadlock. Thus, a strategy for deadlock avoidance is to roll back any transaction that makes a request that would cause a cycle in the waits-for graph.
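The avoidance strategy can be sketched in Python as follows. This is a minimal illustration, not a scheduler: the class WaitsForGraph and its method names are hypothetical, and we assume the lock manager tells us which lock holder blocks each waiting transaction. A new arc from waiter to holder closes a cycle exactly when the waiter is already reachable from the holder, so that is the test we make before allowing the wait.

```python
from collections import defaultdict


class WaitsForGraph:
    def __init__(self):
        # arcs[T] is the set of transactions that T is waiting for
        self.arcs = defaultdict(set)

    def _reaches(self, src, dst):
        """Depth-first search: is there a path of waits-for arcs from src to dst?"""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.arcs.get(node, ()))
        return False

    def request_wait(self, waiter, holder):
        """Add the arc waiter -> holder unless it would create a cycle.

        Returns True if the waiter may wait, False if it must be rolled back.
        """
        if self._reaches(holder, waiter):
            return False
        self.arcs[waiter].add(holder)
        return True

    def remove(self, txn):
        """Drop a finished or rolled-back transaction from the graph."""
        self.arcs.pop(txn, None)
        for holders in self.arcs.values():
            holders.discard(txn)


g = WaitsForGraph()
print(g.request_wait("T2", "T1"))  # T2 may wait for T1
print(g.request_wait("T3", "T2"))  # T3 may wait for T2
print(g.request_wait("T1", "T3"))  # would close the cycle T1, T3, T2: refused
```

Checking reachability on every blocked request keeps the graph acyclic at all times, at the price of a graph search each time a transaction has to wait.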
Example 19.13: Suppose we have the following four transactions, each of which reads one element and writes another:
We use a simple locking system with only one lock mode, although the same effect would be noted if we were to use a shared/exclusive system and took locks in the appropriate mode: shared for a read and exclusive for a write.
³ In common situations, such as shared and exclusive locks, every waiting transaction will have to wait until all current lock holders release their locks, but there are examples of systems of lock modes where a transaction can get its lock after only some of the current locks are released; see Exercise 19.3.6.
Figure 19.9: Beginning of a schedule with a deadlock (of the schedule table, only the two denied requests survive: l2(A) at step (5) and l1(B) at step (8))
In Fig. 19.9 is the beginning of a schedule of these four transactions. In the first four steps, each transaction obtains a lock on the element it wants to read. At step (5), T2 tries to lock A, but the request is denied because T1 already has a lock on A. Thus, T2 waits for T1, and we draw an arc from the node for T2 to the node for T1.
Figure 19.10: Waits-for graph after step (7) of Fig. 19.9
Similarly, at step (6) T3 is denied a lock on C because of T2, and at step (7) T4 is denied a lock on A because of T1. The waits-for graph at this point is as shown in Fig. 19.10. There is no cycle in this graph.
At step (8), T1 must wait for the lock on B held by T3. If we allowed T1 to wait, then there would be a cycle in the waits-for graph involving T1, T2, and T3, as suggested by Fig. 19.11. Since they are each waiting for another to finish, none can make progress, and therefore there is a deadlock involving these three transactions. Incidentally, T4 could not finish either, although it is not in the cycle, because T4's progress depends on T1 making progress.
Figure 19.11: Waits-for graph with a cycle caused by step (8) of Fig. 19.9
Figure 19.12: Waits-for graph after T1 is rolled back
Since we roll back any transaction that would cause a cycle, T1 must be rolled back, yielding the waits-for graph of Fig. 19.12. T1 relinquishes its lock on A, which may be given to either T2 or T4. Suppose it is given to T2. Then T2 can complete, whereupon it relinquishes its locks on A and C. Now T3, which needs a lock on C, and T4, which needs a lock on A, can both complete. At some time, T1 is restarted, but it cannot get locks on A and B until T2, T3, and T4 have completed.
19.3.3 Deadlock Prevention by Ordering Elements
Now let us consider several more methods for deadlock prevention. The first requires us to order database elements in some arbitrary but fixed order. For instance, if database elements are blocks, we could order them lexicographically by their physical address. Recall from Section 8.3.4 that the physical address of a block is normally represented by a sequence of bytes describing its location within the storage system.
If every transaction is required to request locks on elements in order (a condition that is not realistic in most applications), then there can be no deadlock due to transactions waiting for locks. For suppose $T_2$ is waiting for a lock on $A_1$ held by $T_1$; $T_3$ is waiting for a lock on $A_2$ held by $T_2$, and so on, while $T_n$ is waiting for a lock on $A_{n-1}$ held by $T_{n-1}$, and $T_1$ is waiting for a lock on $A_n$ held by $T_n$. Since $T_2$ has a lock on $A_2$ but is waiting for $A_1$, it must be that $A_2 < A_1$ in the order of elements. Similarly, $A_i < A_{i-1}$ for $i = 3, 4, \ldots, n$. But since $T_1$ has a lock on $A_1$ while it is waiting for $A_n$, it also follows that $A_1 < A_n$. We now have $A_1 < A_n < A_{n-1} < \cdots < A_2 < A_1$, which is impossible, since it implies $A_1 < A_1$.
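As a small illustration of the ordering rule, the following Python sketch rewrites each transaction's lock requests into one fixed global order. It is a sketch under stated assumptions: the helper lock_order is hypothetical, the order is simply alphabetical, and the elements listed for T1, T2, and T3 are the ones the discussion of Fig. 19.9 says they lock.

```python
def lock_order(elements):
    """Return the elements a transaction needs, in the fixed global order.

    The global order here is alphabetical; any fixed total order would do.
    """
    return sorted(set(elements))


# Order in which each transaction originally wanted its locks.
wanted = {
    "T1": ["A", "B"],
    "T2": ["C", "A"],
    "T3": ["B", "C"],
}

plans = {txn: lock_order(elems) for txn, elems in wanted.items()}
for txn, plan in plans.items():
    print(txn, "requests locks in the order", plan)
```

Only T2 changes its request order. Once every transaction follows its plan, a transaction that holds one element can wait only for an element later in the order, which is what rules out the cycle in the argument above.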
Example 19.14: Let us suppose elements are ordered alphabetically. Then, if the four transactions of Example 19.13 are to lock elements in alphabetical order, T2 and T4 must be rewritten to lock elements in the opposite order. Thus, the four transactions are now:
Figure 19.13 shows what happens if the transactions execute with the same timing as Fig. 19.9. T1 begins and gets a lock on A. T2 tries to begin next by getting a lock on A, but must wait for T1. Then T3 begins by getting a lock on B, but T4 is unable to begin because it too needs a lock on A, for which it must wait.
Figure 19.13: Locking elements in alphabetical order prevents deadlock
Since T2 is stalled, it cannot proceed, and following the order of events in Fig. 19.9, T3 gets a turn next. It is able to get its lock on C, whereupon it completes at step (6). Now, with T3's locks on B and C released, T1 is able to complete, which it does at step (8). At this point, the lock on A becomes available, and we suppose that it is given on a first-come-first-served basis to T2. Then, T2 can get both locks that it needs and completes at step (11). Finally, T4 can get its locks and completes.
19.3.4 Detecting Deadlocks by Timestamps
We can detect deadlocks by maintaining the waits-for graph, as we discussed in Section 19.3.2. However, this graph can be large, and analyzing it for cycles each time a transaction has to wait for a lock can be time-consuming. An alternative to maintaining the waits-for graph is to associate with each transaction a timestamp. This timestamp:
• Is for deadlock detection only; it is not the same as the timestamp used for concurrency control in Section 18.8, even if timestamp-based concurrency control is in use.
• In particular, if a transaction is rolled back, it restarts with a new, later concurrency timestamp, but its timestamp for deadlock detection never changes.
The timestamp is used when a transaction T has to wait for a lock that is held by another transaction U. Two different things happen, depending on whether T or U is older (has the earlier timestamp). There are two different policies that can be used to manage transactions and detect deadlocks.
1. The Wait-Die Scheme:
(a) If T is older than U (i.e., the timestamp of T is smaller than U's timestamp), then T is allowed to wait for the lock(s) held by U.
(b) If U is older than T, then T "dies": it is rolled back.
2. The Wound-Wait Scheme:
(a) If T is older than U, it "wounds" U. Usually, the "wound" is fatal: U must roll back and relinquish to T the lock(s) that T needs from U. There is an exception if, by the time the "wound" takes effect, U has already finished and released its locks. In that case, U survives and need not be rolled back.
(b) If U is older than T, then T waits for the lock(s) held by U.
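The two policies amount to a comparison of deadlock timestamps, as the following Python sketch shows. It is only an illustration: the function names and the returned strings are our own, timestamps are assumed to be distinct numbers with smaller meaning older, and T is the transaction requesting a lock that U holds.

```python
def wait_die(ts_t, ts_u):
    """Wait-die: what happens to requester T when holder U blocks it."""
    if ts_t < ts_u:            # T is older than U
        return "T waits"
    return "T dies"            # T is younger: it is rolled back


def wound_wait(ts_t, ts_u, u_finished=False):
    """Wound-wait: what happens when requester T is blocked by holder U."""
    if ts_t < ts_u:            # T is older than U, so T wounds U
        if u_finished:         # U already released its locks and survives
            return "U survives"
        return "U rolls back"  # the usual, fatal wound
    return "T waits"           # T is younger: it waits for U


print(wait_die(2, 1))    # younger T2 asks for a lock held by older T1
print(wound_wait(1, 3))  # older T1 asks for a lock held by younger T3
```

In both functions, whenever a transaction is rolled back it is the younger of the two.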
Example 19.15: Let us consider the wait-die scheme using the transactions of Example 19.14. We shall assume that T1, T2, T3, T4 is the order of times; i.e., T1 is the oldest transaction. We also assume that when a transaction rolls back, it does not restart soon enough to become active before the other transactions finish.
Figure 19.14 shows a possible sequence of events under the wait-die scheme. T1 gets the lock on A first. When T2 asks for a lock on A, it dies, because T1 is older than T2. In step (3), T3 gets a lock on B, but in step (4), T4 asks for a lock on A and dies because T1, the holder of the lock on A, is older than T4. Next, T3 gets its lock on C and completes. When T1 continues, it finds the lock on B available and also completes at step (8).
Now, the two transactions that rolled back, T2 and T4, start again. Their timestamps, as far as deadlock is concerned, do not change: T2 is still older than T4. However, we assume that T4 restarts first, at step (9), and when the older transaction T2 requests a lock on A at step (10), it is forced to wait but does not abort. T4 completes at step (12), and then T2 is allowed to run to completion, as shown in the last three steps.
Figure 19.14: Actions of transactions detecting deadlock under the wait-die scheme
Fragment of a schedule table in which T2 waits rather than dies, so apparently from Fig. 19.15: 1) l1(A); r1(A); 2) l2(A); Waits 3) l3(B); r3(B);

Why Timestamp-Based Deadlock Detection Works
We claim that in either the wait-die or wound-wait scheme, there can be no cycle in the waits-for graph, and hence no deadlock. Suppose otherwise; that is, there is a cycle such as T1 → T2 → T3 → T1. One of the transactions is the oldest, say T2.
In the wait-die scheme, you can only wait for younger transactions. Thus, it is not possible that T1 is waiting for T2, since T2 is surely older than T1. In the wound-wait scheme, you can only wait for older transactions. Thus, there is no way T2 could be waiting for the younger T3. We conclude that the cycle cannot exist, and therefore there is no deadlock.
Example 19.16: Next, let us consider the same transactions running under the wound-wait policy, as shown in Fig. 19.15. As in Fig. 19.14, T1 begins by locking A. When T2 requests a lock on A at step (2), it waits, since T1 is older than T2. After T3 gets its lock on B at step (3), T4 is also made to wait for the lock on A.
Then, suppose that T1 continues at step (5) with its request for the lock on B. That lock is already held by T3, but T1 is older than T3. Thus, T1 "wounds" T3. Since T3 is not yet finished, the wound is fatal: T3 relinquishes its lock and rolls back. Thus, T1 is able to complete.
When T1 makes the lock on A available, suppose it is given to T2, which is then able to proceed. After T2, the lock is given to T4, which proceeds to completion. Finally, T3 restarts and completes without interference.
19.3.5 Comparison of Deadlock-Management Methods
In both the wait-die and wound-wait schemes, older transactions kill off newer transactions. Since transactions restart with their old timestamp, eventually each transaction becomes the oldest in the system and is sure to complete. This guarantee, that every transaction eventually completes, is called no starvation. Notice that other schemes described in this section do not necessarily prevent starvation.
We should also consider the advantages and disadvantages of both wound-wait and wait-die when compared with a straightforward construction and use of the waits-for graph. The important points are:
• Both wound-wait and wait-die are easier to implement than a system that maintains or periodically constructs the waits-for graph. The disadvantage of constructing the waits-for graph is even more extreme when the database is distributed, and the waits-for graph must be constructed from a collection of lock tables at different sites. See Section 19.6 for a discussion.
• Using the waits-for graph minimizes the number of times we must abort a transaction because of deadlock. We never abort a transaction unless there really is a deadlock. On the other hand, either wound-wait or wait-die will sometimes roll back a transaction when there was no deadlock, and no deadlock would have occurred had the transaction been allowed to survive.
19.3.6 Exercises for Section 19.3
Exercise 19.3.1: For each of the sequences of actions below, assume that shared locks are requested immediately before each read action, and exclusive locks are requested immediately before every write action. Also, unlocks occur immediately after the final action that a transaction executes. Tell what actions are denied, and whether deadlock occurs. Also tell how the waits-for graph evolves during the execution of the actions. If there are deadlocks, pick a transaction to abort, and show how the sequence of actions continues.
Exercise 19.3.2: For each of the action sequences in Exercise 19.3.1, tell what happens under the wound-wait deadlock avoidance system. Assume the order of deadlock-timestamps is the same as the order of subscripts for the transactions, that is, T1, T2, T3, T4. Also assume that transactions that need to restart do so in the order that they were rolled back.
Exercise 19.3.3: For each of the action sequences in Exercise 19.3.1, tell what happens under the wait-die deadlock avoidance system. Make the same assumptions as in Exercise 19.3.2.
! Exercise 19.3.4: Can one have a waits-for graph with a cycle of length n, but no smaller cycle, for any integer n > 1? What about n = 1, i.e., a loop on a node?
!! Exercise 19.3.5: One approach to avoiding deadlocks is to require each transaction to announce all the locks it wants at the beginning, and to either grant all those locks or deny them all and make the transaction wait. Does this approach avoid deadlocks due to locking? Either explain why, or give an example of a deadlock that can arise.
! Exercise 19.3.6: Consider the intention-locking system of Section 18.6. Describe how to construct the waits-for graph for this system of lock modes. Especially, consider the possibility that a database element A is locked by different transactions in modes IS and also either S or IX. If a request for a lock on A has to wait, what arcs do we draw?
*! Exercise 19.3.7: In Section 19.3.5 we pointed out that deadlock-detection methods other than wound-wait and wait-die do not necessarily prevent starvation, where a transaction is repeatedly rolled back and never gets to finish. Give an example of how using the policy of rolling back any transaction that would cause a cycle can lead to starvation. Does requiring that transactions request locks on elements in a fixed order necessarily prevent starvation? What about timeouts as a deadlock-resolution mechanism?
19.4 Distributed Databases
We shall now consider the elements of distributed database systems. In a distributed system, there are many, relatively autonomous processors that may participate in database operations. Distributed databases offer several opportunities:
1. Since many machines can be brought to bear on a problem, the opportunities for parallelism and speedy response to queries are increased.
One important reason to distribute data is that the organization is itself distributed among many sites, and the sites each have data that is germane primarily to that site. Some examples are:
1. A bank may have many branches. Each branch (or the group of branches in a given city) will keep a database of accounts maintained at that branch (or city). Customers can choose to bank at any branch, but will normally bank at "their" branch, where their account data is stored. The bank may also have data that is kept in the central office, such as employee records and policies such as current interest rates. Of course, a backup of the records at each branch is also stored, probably in a site that is neither a branch office nor the central office.
2. A chain of department stores may have many individual stores. Each store (or a group of stores in one city) has a database of sales at that store and inventory at that store. There may also be a central office with data about employees, chain-wide inventory, credit-card customers, and information about suppliers such as unfilled orders and what each is owed. In addition, there may be a copy of all the stores' sales data in a "data warehouse," which is used to analyze and predict sales through ad-hoc queries issued by analysts; see Section 20.4.
3. A digital library may consist of a consortium of universities that each hold on-line books and other documents. Search at any site will examine the catalog of documents available at all sites and deliver an electronic copy of the document to the user if any site holds it.
In some cases, what we might think of logically as a single relation has been partitioned among many sites. For example, the chain of stores might be imagined to have a single sales relation, such as
Sales(item, date, price, purchaser)
However, this relation does not exist physically. Rather, it is the union of a number of relations with the same schema, one at each of the stores in the chain. These local relations are called fragments, and the partitioning of a logical relation into physical fragments is called horizontal decomposition of the relation Sales. We regard the partition as "horizontal" because we may visualize a single Sales relation with its tuples separated by horizontal lines into the sets of tuples at each store.

Factors in Communication Cost
As bandwidth cost drops rapidly, one might wonder whether communication cost needs to be considered when designing a distributed database system. Now, certain kinds of data are among the largest objects managed electronically, so even with very cheap communication the cost of sending a terabyte-sized piece of data cannot be ignored. However, communication cost generally involves not only the shipping of the bits, but several layers of protocol that prepare the data for shipping, reconstitute them at the receiving end, and manage the communication. These protocols each require substantial computation. While computation is also getting cheaper, the computation needed to perform the communication is likely to remain significant, compared to the needs for conventional, single-processor execution of key database operations.
In other situations, a distributed database appears to have partitioned a relation "vertically," by decomposing what might be one logical relation into two or more, each with a subset of the attributes, and with each relation at a different site. For instance, if we want to find out which sales at the Boston store were made to customers who are more than 90 days in arrears on their credit-card payments, it would be useful to have a relation (or view) that included the item, date, and purchaser information from Sales, along with the date of the last credit-card payment by that purchaser. However, in the scenario we are describing, this relation is decomposed vertically, and we would have to join the credit-card-customer relation at the central headquarters with the fragment of Sales at the Boston store.
Rather, a transaction consists of communicating transaction components, each at a different site and communicating with the local scheduler and logger. Two important issues that must thus be looked at anew are:
1. How do we manage the commit/abort decision when a transaction is distributed? What happens if one component of the transaction wants to abort the whole transaction, while others encountered no problem and want to commit? We discuss a technique called "two-phase commit" in Section 19.5; it allows the decision to be made properly and also frequently allows sites that are up to operate even if some other site(s) have failed.
2. How do we assure serializability of transactions that involve components at several sites? We look at locking in particular, in Section 19.6, and see how local lock tables can be used to support global locks on database elements and thus support serializability of transactions in a distributed environment.
19.4.3 Data Replication
One important advantage of a distributed system is the ability to replicate data, that is, to make copies of the data at different sites. One motivation is that if a site fails, there may be other sites that can provide the same data that was at the failed site. A second use is in improving the speed of query answering by making a copy of needed data available at the sites where queries are initiated. For example:
1. A bank may make copies of current interest-rate policy available at each branch, so a query about rates does not have to be sent to the central office.
2. A chain store may keep copies of information about suppliers at each store, so local requests for information about suppliers (e.g., the manager needs the phone number of a supplier to check on a shipment) can be handled without sending messages to the central office.
3. A digital library may temporarily cache a copy of a popular document at a school where students have been assigned to read the document.
However, there are problems that must be faced when data is replicated.
a) How do we keep copies identical? In essence, an update to a replicated data element becomes a distributed transaction that updates all copies.
b) How do we decide where and how many copies to keep? The more copies, the more effort is required to update, but the easier queries become. For example, a relation that is rarely updated might have copies everywhere for maximum efficiency, while a frequently updated relation might have only one or two copies.
c) What happens when there is a communication failure in the network, and different copies of the same data have the opportunity to evolve separately and must then be reconciled when the network reconnects?
The existence of distributed data also affects the complexity and options available in the design of a physical query plan (see Section 16.7). Among the issues that must be decided as we choose a physical plan are:
1. If there are several copies of a needed relation R, from which do we get the value of R?
2. If we apply an operator, say join, to two relations R and S, we have several options and must choose one. Some of the possibilities are:
(a) We can copy S to the site of R and do the computation there.
(b) We can copy R to the site of S and do the computation there.
(c) We can copy both R and S to a third site and do the computation at that site.
Which is best depends on several factors, including which site has available processing cycles and whether the result of the operation will be combined with data at a third site. For example, if we are computing $(R \bowtie S) \bowtie T$, we may choose to ship both R and S to the site of T and take both joins there.
If a relation R is in fragments $R_1, R_2, \ldots, R_n$ distributed among several sites, we should also replace a use of R in the query by a use of
$R_1 \cup R_2 \cup \cdots \cup R_n$
as we select a logical query plan. The query may then allow us to simplify the expression significantly. For instance, if the $R_i$'s each represent fragments of Sales, each associated with a single store, then a query about sales at the Boston store might allow us to remove all $R_i$'s except the fragment for Boston from the union.
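The simplification can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the fragments are plain in-memory lists keyed by store name, the rows are invented sample tuples following the schema Sales(item, date, price, purchaser), and the function sales is a hypothetical stand-in for the query processor.

```python
# One fragment of Sales per store; the rows are made-up sample data.
fragments = {
    "Boston": [("toothbrush", "day 1", 2.50, "Ann")],
    "Chicago": [("soap", "day 1", 1.25, "Bob")],
    "Denver": [("comb", "day 2", 0.99, "Cal")],
}


def sales(fragments, store=None):
    """Evaluate a query over the logical relation Sales.

    With no store given, Sales is the union of all fragments. A query that
    names one store needs only that store's fragment, so the other terms of
    the union are dropped and their sites are never contacted.
    """
    sites = list(fragments) if store is None else [store]
    return [row for site in sites for row in fragments[site]]


print(len(sales(fragments)))       # rows in the union of all fragments
print(sales(fragments, "Boston"))  # only the Boston fragment is read
```

The saving comes from the fact that a dropped fragment never has to be shipped across the network.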
*!! Exercise 19.4.1: The following exercise will allow you to address some of the problems that come up when deciding on a replication strategy for data. Suppose there is a relation R that is accessed from n sites. The ith site issues $q_i$ queries about R and $u_i$ updates to R per second, for i = 1, 2, ..., n. The cost of executing a query if there is a copy of R at the site issuing the query is c, while if there is no copy there, and the query must be sent to some remote site, then the cost is 10c. The cost of executing an update is d for the copy of R at the issuing site and 10d for every copy of R that is not at the issuing site. As a function of these parameters, how would you choose, for large n, a set of sites at which to replicate R?
In this section, we shall address the problem of how a distributed transaction that has components at several sites can execute atomically. The next section discusses another important property of distributed transactions: executing them serializably. We shall begin with an example that illustrates the problems that might arise.
Example 19.17: Consider our example of a chain of stores mentioned in Section 19.4. Suppose a manager of the chain wants to query all the stores, find the inventory of toothbrushes at each, and issue instructions to move toothbrushes from store to store in order to balance the inventory. The operation is done by a single global transaction T that has component Ti at the ith store and a component T0 at the office where the manager is located. The sequence of activities performed by T is summarized below:
1. Component T0 is created at the site of the manager.
2. T0 sends messages to all the stores instructing them to create components Ti.
3. Each Ti executes a query at store i to discover the number of toothbrushes in inventory and reports this number to T0.
4. T0 takes these numbers and determines, by some algorithm we shall not discuss, what shipments of toothbrushes are desired. T0 then sends messages such as "store 10 should ship 500 toothbrushes to store 7" to the appropriate stores (stores 7 and 10 in this instance).
5. Stores receiving instructions update their inventory and perform the shipments.
There are a number of things that could go wrong in Example 19.17, and many of these result in violations of the atomicity of T. That is, some of the actions comprising T get executed, but others do not. Mechanisms such as logging and recovery, which we assume are present at each site, will assure that each Ti is executed atomically, but do not assure that T itself is atomic.
Example 19.18: Suppose a bug in the algorithm to redistribute toothbrushes might cause store 10 to be instructed to ship more toothbrushes than it has. T10 will therefore abort, and no toothbrushes will be shipped from store 10; neither will the inventory at store 10 be changed. However, T7 detects no problems and commits at store 7, updating its inventory to reflect the supposedly shipped toothbrushes. Now, not only has T failed to execute atomically (since T10 never completes), but it has left the distributed database in an inconsistent state: the toothbrush inventory does not equal the number of toothbrushes on hand.
Another source of problems is the possibility that a site will fail or be disconnected from the network while the distributed transaction is running.
Example 19.19: Suppose T10 replies to T0's first message by telling its inventory of toothbrushes. However, the machine at store 10 then crashes, and the instructions from T0 are never received by T10. Can distributed transaction T ever commit? What should T10 do when its site recovers?
19.5.2 Two-Phase Commit
In order to avoid the problems suggested in Section 19.5.1, distributed DBMS's use a complex protocol for deciding whether or not to commit a distributed transaction. In this section, we shall describe the basic idea behind these protocols, called two-phase commit. By making a global decision about committing, each component of the transaction will commit, or none will. As usual, we assume that the atomicity mechanisms at each site assure that either the local component commits or it has no effect on the database state at that site; i.e., components of the transaction are atomic. Thus, by enforcing the rule that either all components of a distributed transaction commit or none does, we make the distributed transaction itself atomic.
Several salient points about the two-phase commit protocol follow:
• In a two-phase commit, we assume that each site logs actions at that site, but there is no global log.
• We also assume that one site, called the coordinator, plays a special role in deciding whether or not the distributed transaction can commit. For example, the coordinator might be the site at which the transaction originates, such as the site of T0 in the examples of Section 19.5.1.
• The two-phase commit protocol involves sending certain messages between the coordinator and the other sites. As each message is sent, it is logged at the sending site, to aid in recovery should it be necessary.
With these points in mind, we can describe the two phases in terms of the messages sent between sites.
Phase I
In phase 1 of the two-phase commit, the coordinator for a distributed transaction T decides when to attempt to commit T. Presumably the attempt to commit occurs after the component of T at the coordinator site is ready to commit, but in principle the steps must be carried out even if the coordinator's component wants to abort (but with obvious simplifications, as we shall see). The coordinator polls all the sites with components of the transaction T to determine their wishes regarding the commit/abort decision. (Do not confuse two-phase commit with two-phase locking. They are independent ideas designed to solve different problems.)
1. The coordinator places a log record <Prepare T> on the log at its site.
2. The coordinator sends to each component's site (in principle including itself) the message prepare T.
3. Each site receiving the message prepare T decides whether to commit or abort its component of T. The site can delay if the component has not yet completed its activity, but must eventually send a response.
4. If a site wants to commit its component, it must enter a state called precommitted. Once in the precommitted state, the site cannot abort its component of T without a directive to do so from the coordinator. The following steps are done to become precommitted:
(a) Perform whatever steps are necessary to be sure the local component of T will not have to abort, even if there is a system failure followed by recovery at the site. Thus, not only must all actions associated with the local T be performed, but the appropriate actions regarding the log must be taken so that T will be redone rather than undone in a recovery. The actions depend on the logging method, but surely the log records associated with actions of the local T must be flushed to disk.
(b) Place the record <Ready T> on the local log and flush the log to disk.
(c) Send to the coordinator the message ready T.
However, the site does not commit its component of T at this time; it must wait for phase 2.
5. If, instead, the site wants to abort its component of T, then it logs the record <Don't commit T> and sends the message don't commit T to the coordinator. It is safe to abort the component at this time, since T will surely abort if even one component wants to abort.
The messages of phase 1 are summarized in Fig. 19.16.
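The message flow of phase 1 can be sketched in Python as follows. This is a simplified illustration under our own assumptions: the function names are hypothetical, each log is a plain list standing in for a log that is flushed to disk, messages are returned as strings rather than sent over a network, and the local work of step 4(a) is reduced to a boolean can_commit.

```python
def start_phase1(txn, sites, coordinator_log):
    """Coordinator side: log <Prepare T>, then address prepare T to every site."""
    coordinator_log.append(f"<Prepare {txn}>")
    return {site: f"prepare {txn}" for site in sites}


def handle_prepare(txn, can_commit, local_log):
    """Site side: react to prepare T and return the reply for the coordinator.

    `can_commit` stands for the outcome of step 4(a): the local component
    has done its work and its log records are safely on disk.
    """
    if can_commit:
        local_log.append(f"<Ready {txn}>")     # the site is now precommitted
        return f"ready {txn}"
    local_log.append(f"<Don't commit {txn}>")  # safe to abort locally
    return f"don't commit {txn}"


coordinator_log, log7, log10 = [], [], []
outgoing = start_phase1("T", ["store7", "store10"], coordinator_log)
replies = {
    "store7": handle_prepare("T", True, log7),
    "store10": handle_prepare("T", False, log10),  # as in Example 19.18
}
print(outgoing)
print(replies)
```

Note that the log record is written before the reply is produced, mirroring the rule that each message is logged at the sending site.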
Phase II
The second phase begins when responses ready or don't commit are received from each site by the coordinator. However, it is possible that some site fails to respond; it may be down, or it has been disconnected by the network. In that case, after a suitable timeout period, the coordinator will treat the site as if it had sent don't commit.
Figure 19.16: Messages in phase 1 of two-phase commit (the coordinator sends prepare; each site replies ready or don't commit)
1. If the coordinator has received ready T from all components of T, then it decides to commit T. The coordinator
(a) Logs <Commit T> at its site, and
(b) Sends message commit T to all sites involved in T.
2. If the coordinator has received don't commit T from one or more sites, it
(a) Logs <Abort T> at its site, and
(b) Sends abort T messages to all sites involved in T.
Figure 19.17: Messages in phase 2 of two-phase commit
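The coordinator's phase 2 decision can be sketched as a small function. The names coordinator_decide, replies, and log are hypothetical. The abort branch assumes, by symmetry with the commit case and in line with the <Abort T> records discussed under recovery, that the coordinator logs <Abort T> and sends abort T. A site that timed out is simply missing from replies and is treated as if it had sent don't commit.

```python
def coordinator_decide(t, sites, replies, log):
    """Decide the fate of T and return the messages to send.

    sites   -- all sites involved in T
    replies -- dict: site -> "ready" or "don't commit"; a site that
               timed out has no entry (hypothetical representation)
    log     -- the coordinator's log, modelled as a list
    """
    # A missing reply counts as don't commit, as after a timeout.
    votes = [replies.get(site, "don't commit") for site in sites]
    if all(vote == "ready" for vote in votes):
        log.append(("Commit", t))   # log before sending any message
        decision = "commit"
    else:
        log.append(("Abort", t))    # assumed symmetric to the commit case
        decision = "abort"
    return [(site, decision, t) for site in sites]
```

Logging before sending matters: the recovery analysis relies on the log showing any decision whose message might already have gone out.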
19.5.3 Recovery of Distributed Transactions
At any time during the two-phase commit process, a site may fail. We need to make sure that what happens when the site recovers is consistent with the global decision that was made about a distributed transaction T. There are several cases to consider, depending on the last log entry for T:
1. If the last log record for T was <Commit T>, then T must have been committed by the coordinator. Depending on the log method used, it may be necessary to redo the component of T at the recovering site.
2. If the last log record is <Abort T>, then similarly we know that the global decision was to abort T. If the log method requires it, we undo the component of T at the recovering site.
3. If the last log record is <Don't commit T>, then the site knows that the global decision must have been to abort T. If necessary, effects of T on the local database are undone.
4. The hard case is when the last log record for T is <Ready T>. Now, the recovering site does not know whether the global decision was to commit or abort T. This site must communicate with at least one other site to find out the global decision for T. If the coordinator is up, the site can ask the coordinator. If the coordinator is not up at this time, some other site may be asked to consult its log to find out what happened to T. In the worst case, no other site can be contacted, and the local component of T must be kept active until the commit/abort decision is determined.
5. It may also be the case that the local log has no records about T that come from actions of the two-phase commit protocol. If so, then the recovering site may unilaterally decide to abort its component of T, which is consistent with all logging methods. It is possible that the coordinator already detected a timeout from the failed site and decided to abort T. If the failure was brief, T may still be active at other sites, but it will never be inconsistent if the recovering site decides to abort its component of T and responds with don't commit T if later polled in phase 1.
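The five recovery cases amount to a lookup on the last two-phase-commit log record for T. The sketch below is illustrative only: the function name and the returned action strings are hypothetical, and None is used to mean that the log holds no protocol record for T.

```python
def recovery_action(last_record):
    """Map the last 2PC log record for T to the recovering site's action.

    last_record -- "Commit", "Abort", "Don't commit", "Ready", or None
                   when the log has no 2PC record for T (assumed encoding)
    """
    if last_record == "Commit":
        return "redo if the logging method requires it"
    if last_record in ("Abort", "Don't commit"):
        # Either way the global decision must have been to abort.
        return "undo if the logging method requires it"
    if last_record == "Ready":
        # The hard case: the site cannot decide on its own.
        return "ask the coordinator or another site; keep T active"
    if last_record is None:
        return "abort unilaterally; answer don't commit if polled"
    raise ValueError(f"unexpected log record: {last_record!r}")
```

Only the "Ready" case requires communication; every other case can be resolved from the local log alone.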
The above analysis assumes that the failed site is not the coordinator. When the coordinator fails during a two-phase commit, new problems arise. First, the surviving participant sites must either wait for the coordinator to recover or elect a new coordinator. Since the coordinator could be down for an indefinite period, there is good motivation to elect a new leader, at least after a brief waiting period to see if the coordinator comes back up.
The matter of leader election is in its own right a complex problem of distributed systems, beyond the scope of this book. However, a simple method will work in most situations. For instance, we may assume that all participant sites have unique identifying numbers; IP addresses will work in many situations. Each participant sends messages announcing its availability as leader to all the other sites, giving its identifying number. After a suitable length of time, each participant acknowledges as the new coordinator the lowest-numbered site from which it has heard, and sends messages to that effect to all the other sites. If all sites receive consistent messages, then there is a unique choice for new coordinator, and everyone knows about it. If there is inconsistency, or a surviving
site has failed to respond, that too will be universally known, and the election starts over.
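One round of this simple election can be sketched as follows. The function elect and the heard mapping are hypothetical; the sketch models only the consistency check on acknowledgments, not timeouts or message delivery, and it assumes each site counts its own announcement.

```python
def elect(heard):
    """Run one election round.

    heard -- dict: site id -> set of site ids whose announcements that
             site received (hypothetical representation)
    Returns the agreed new coordinator, or None if the acknowledgments
    are inconsistent and the election must start over.
    """
    # Each site acknowledges the lowest-numbered site it has heard from,
    # counting itself.
    choices = {site: min(set(ids) | {site}) for site, ids in heard.items()}
    winners = set(choices.values())
    if len(winners) == 1:
        return winners.pop()
    return None  # inconsistency (or no sites at all): start over
```

For example, if sites 3, 5, and 7 all hear one another, every site acknowledges 3; if site 5 and site 7 never hear from 3, the choices disagree and the round is repeated.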
Now, the new leader polls the sites for information about each distributed transaction T. Each site reports the last record on its log concerning T, if there is one. The possible cases are:
1. Some site has <Commit T> on its log. Then the original coordinator must have wanted to send commit T messages everywhere, and it is safe to commit T.
2. Similarly, if some site has <Abort T> on its log, then the original coordinator must have decided to abort T, and it is safe for the new coordinator to order that action.
3. Suppose now that no site has <Commit T> or <Abort T> on its log, but at least one site does not have <Ready T> on its log. Then, since actions are logged before the corresponding messages are sent, we know that the old coordinator never received ready T from this site and therefore could not have decided to commit. It is safe for the new coordinator to decide to abort T.
4. The hard case is when there is no <Commit T> or <Abort T> to be found, but every surviving site has <Ready T>. Now, we cannot be sure whether the old coordinator found some reason to abort T or not; it could have decided to do so because of actions at its own site, or because of a don't commit T message from another failed site, for example. Or the old coordinator may have decided to commit T and already committed its local component of T. Thus, the new coordinator is not able to decide whether to commit or abort T and must wait until the original coordinator recovers. In real systems, the database administrator has the ability to intervene and manually force the waiting transaction components to finish. The result is a possible loss of atomicity, but the person executing the blocked transaction will be notified to take some appropriate compensating action.
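The new coordinator's four cases can be sketched as a decision over the reports it collects. The function name, the last_records mapping, and the "blocked" result are hypothetical; None again stands for a site whose log has no record concerning T.

```python
def new_coordinator_decision(last_records):
    """Decide T's fate from the surviving sites' reports.

    last_records -- dict: surviving site -> last log record for T
                    ("Commit", "Abort", "Ready", "Don't commit", or None)
    Returns "commit", "abort", or "blocked".
    """
    kinds = list(last_records.values())
    if "Commit" in kinds:
        return "commit"   # case 1: the old coordinator decided to commit
    if "Abort" in kinds:
        return "abort"    # case 2: the old coordinator decided to abort
    if any(kind != "Ready" for kind in kinds):
        # Case 3: some site never logged <Ready T>, so the old
        # coordinator cannot have decided to commit.
        return "abort"
    # Case 4: every surviving site is precommitted; wait for the
    # original coordinator (or for manual intervention).
    return "blocked"
```

The "blocked" outcome is what makes two-phase commit a blocking protocol: no choice is provably safe until the original coordinator's log can be consulted.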
! Exercise 19.5.1: Consider a transaction T initiated at a home computer that asks bank B to transfer $10,000 from an account at B to an account at another bank C.
* a) What are the components of distributed transaction T? What should the components at B and C do?
b) What can go wrong if there is not $10,000 in the account at B?
c) What can go wrong if one or both banks' computers crash, or if the network is disconnected?
d) If one of the problems suggested in (c) occurs, how could the transaction resume correctly when the computers and network resume operation?
Exercise 19.5.2: In this exercise, we need a notation for describing sequences of messages that can take place during a two-phase commit. Let (i, j, M) mean that site i sends the message M to site j, where the value of M and its meaning can be P (prepare), R (ready), D (don't commit), C (commit), or A (abort). We shall discuss a simple situation in which site 0 is the coordinator, but not otherwise part of the transaction, and sites 1 and 2 are the components. For instance, the following is one possible sequence of messages that could take place during a successful commit of the transaction:
* a) Give an example of a sequence of messages that could occur if site 1 wants to commit and site 2 wants to abort.
*! b) How many possible sequences of messages such as the above are there, if the transaction successfully commits?
! c) If site 1 wants to commit, but site 2 does not, how many sequences of messages are there, assuming no failures occur?
! d) If site 1 wants to commit, but site 2 is down and does not respond to messages, how many sequences are there?
!! Exercise 19.5.3: Using the notation of Exercise 19.5.2, suppose the sites are a coordinator and n other sites that are the transaction components. As a function of n, how many sequences of messages are there if the transaction successfully commits?
19.6 Distributed Locking
In this section we shall see how to extend a locking scheduler to an environment where transactions are distributed and consist of components at several sites. We assume that lock tables are managed by individual sites, and that the component of a transaction at a site can only request a lock on the data elements at that site.
When data is replicated, we must arrange that the copies of a single element S are changed in the same way by each transaction. This requirement introduces a distinction between locking the logical database element S and locking one or more of the copies of S. In this section, we shall offer a cost model for distributed locking algorithms that applies to both replicated and nonreplicated data. However, before introducing the model, let us consider an obvious (and sometimes adequate) solution to the problem of maintaining locks in a distributed database - centralized locking.