An Introduction to Database Systems, 8th Edition - C. J. Date - Solutions Manual, Episode 2, Part 3



SELECT P.P#, P.PNAME, P.COLOR, P.WEIGHT, P.CITY
FROM   P
ORDER  BY P# ;

eof := FALSE ;
EXEC SQL START TRANSACTION ;
EXEC SQL OPEN CP ;
DO WHILE ( NOT eof ) ;
   DO count := 1 TO 10 ;
      EXEC SQL FETCH CP INTO :P#, ... ;
      IF SQLSTATE = '02000' THEN
         DO ;
            EXEC SQL CLOSE CP ;
            EXEC SQL COMMIT ;
            eof := TRUE ;
         END DO ;
      ELSE print P#, ... ;
      END IF ;
   END DO ;
   EXEC SQL DELETE FROM P WHERE P.P# = :P# ;
   EXEC SQL COMMIT AND CHAIN ;
END DO ;

A blow-by-blow comparison of the two solutions is left as a subsidiary exercise.


Chapter 16

Principal Sections

• Three concurrency problems

• Locking

• The three concurrency problems revisited

• Deadlock

• Serializability

• Recovery revisited

• Isolation levels

• Intent locking

• Dropping ACID

• SQL facilities

General Remarks

Very intuitive introduction: Two independently acting agents* can get in each other's way (i.e., interfere with each other). Think of, e.g., two people both trying to use the bathroom at the same time in the morning. The solution to the problem is to introduce a mechanism (door locks) and a protocol for using that mechanism (lock the bathroom door if you don't want to be disturbed).

──────────

* I'm not using the term "agent" here in any special technical sense; in particular, not in the formal sense of Chapter 21.

──────────

By analogy with intuitive examples such as the foregoing, concurrency control in transaction processing systems has traditionally been based on a mechanism called locking (though of course the locks involved are software constructs, not hardware) and a protocol ("the two-phase locking protocol") for using that mechanism. Moreover, most systems still typically rely on locking right up to this day, a fact that explains the emphasis on locking in the body of the chapter. However, certain nonlocking schemes are described in the annotation to several of the references in the "References and Bibliography" section.


16.2 Three Concurrency Problems

The three classic problems: lost updates, uncommitted dependencies, and inconsistent analysis. The examples are straightforward. Observe that the lost update problem can occur in two different ways. Note: Uncommitted dependencies are also called dirty reads, and inconsistent analysis is also called nonrepeatable read (though this latter term is sometimes taken to include the phantom problem also). Mention conflict terminology: RR ⇒ no problem; RW ⇒ inconsistent analysis / nonrepeatable read; WR ⇒ uncommitted dependency; WW ⇒ lost update.
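For classroom use, the lost update problem can be made concrete with a tiny simulation. The sketch below is an illustration only: the `run` helper, the operation tuples, and the two transactions (each adding an amount to a single item A) are assumptions made for the example, not taken from the text. It replays a serial schedule and then an interleaving in which both transactions read A before either one writes it.

```python
def run(schedule, initial):
    """Replay a schedule of (txn, op, arg) steps against a single item A.

    'read' copies A into the transaction's private workspace;
    'write' stores (workspace value + arg) back into A.
    """
    a = initial
    local = {}
    for txn, op, arg in schedule:
        if op == "read":
            local[txn] = a
        elif op == "write":
            a = local[txn] + arg
        else:
            raise ValueError(f"unknown operation: {op}")
    return a


# Serial execution: T1 adds 10, then T2 adds 20.
serial = [("T1", "read", None), ("T1", "write", 10),
          ("T2", "read", None), ("T2", "write", 20)]

# Interleaved execution: both read before either writes,
# so T2's write overwrites T1's update (a WW conflict).
interleaved = [("T1", "read", None), ("T2", "read", None),
               ("T1", "write", 10), ("T2", "write", 20)]

print(run(serial, 100))       # 130
print(run(interleaved, 100))  # 120: T1's update is lost
```

The interleaved run ends with A = 120 rather than 130, which matches neither serial order of the two transactions.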

16.3 Locking

Discuss only exclusive (X) and shared (S) locks at this stage. Carefully distinguish between the mechanism and the protocol (beginners often get confused over which is which; both are needed!). Explain that the whole business is usually implicit in practice.
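To help separate mechanism from protocol, it can be useful to show the mechanism on its own. The following is a minimal sketch, assuming a toy in-memory lock table; the class name `LockTable` and its methods are hypothetical, and a refused request simply returns False rather than putting the requester into a wait state. The protocol (when a transaction must ask for S or X, and how long it must hold the lock) is deliberately not part of this code.

```python
class LockTable:
    """Toy lock mechanism: records which transactions hold S or X on each item."""

    def __init__(self):
        self.held = {}  # item -> {txn: mode}

    def request(self, txn, item, mode):
        """Return True if the lock is granted, False if txn would have to wait."""
        if mode not in ("S", "X"):
            raise ValueError("mode must be 'S' or 'X'")
        holders = self.held.setdefault(item, {})
        others = {t: m for t, m in holders.items() if t != txn}
        if mode == "S":
            # S is compatible only with other S locks.
            compatible = all(m == "S" for m in others.values())
        else:
            # X is compatible with nothing held by another transaction.
            compatible = not others
        if not compatible:
            return False
        # Keep the stronger mode if txn already holds X.
        if holders.get(txn) != "X":
            holders[txn] = mode
        return True

    def release_all(self, txn):
        """Drop every lock held by txn (e.g., at end of transaction)."""
        for holders in self.held.values():
            holders.pop(txn, None)


locks = LockTable()
print(locks.request("T1", "A", "S"))  # True
print(locks.request("T2", "A", "S"))  # True: S is compatible with S
print(locks.request("T2", "A", "X"))  # False: T1 still holds S on A
locks.release_all("T1")
print(locks.request("T2", "A", "X"))  # True: T2 upgrades from S to X
```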

16.4 The Three Concurrency Problems Revisited

Self-explanatory.

16.5 Deadlock

Mostly self-explanatory. Explain the Wait-For Graph (it isn't discussed in detail in the text because it's fairly obvious, not to say trivial; see the answer to Exercise 16.4). Detection vs. avoidance vs. timeout (perhaps skip avoidance).

16.6 Serializability

A given interleaved execution (= schedule) is considered to be correct if and only if it is equivalent to some serial execution (= schedule); thus, there might be several different but equally correct overall outcomes.

Discuss the two-phase locking theorem (important!) and the two-phase locking protocol.

If A and B are any two transactions in some serializable schedule, then either B can see A's output or A can see B's.


If transaction A is not two-phase, it's always possible to construct some other transaction B that can run interleaved with A in such a way as to produce an overall schedule that's not serializable and not correct. Real systems typically do allow transactions that aren't two-phase (see the next section but one), but allowing such a transaction (T, say) amounts to a gamble that no interfering transaction will ever coexist with T in the system. Such gambles aren't recommended! (Personally, I really question whether isolation levels lower than the maximum would ever have been supported if we'd started out with a good understanding of the importance of the integrity issue in the first place. See Section 16.8.)
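The two-phase property itself is easy to check mechanically, which may help when presenting it: a transaction is two-phase if it never acquires a lock after it has released one. The sketch below assumes a transaction's locking activity is given as a list of ("lock" | "unlock", item) pairs; that representation and the name `is_two_phase` are illustrative choices, not anything from the text.

```python
def is_two_phase(operations):
    """Return True if no lock is acquired after the first release."""
    released = False
    for op, _item in operations:
        if op == "unlock":
            released = True
        elif op == "lock" and released:
            return False  # growing again after shrinking has begun
    return True


# Growing phase (locks), then shrinking phase (unlocks): two-phase.
print(is_two_phase([("lock", "A"), ("lock", "B"),
                    ("unlock", "A"), ("unlock", "B")]))  # True

# Releases A and then acquires B: not two-phase.
print(is_two_phase([("lock", "A"), ("unlock", "A"),
                    ("lock", "B"), ("unlock", "B")]))    # False
```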

16.7 Recovery Revisited

This section could be skipped. If not, explain the concept of an unrecoverable schedule, plus the sufficient conditions for recoverable and cascade-free schedules.

16.8 Isolation Levels

(Begin quote)

Serializability guarantees isolation in the ACID sense. One direct and very desirable consequence is that if all schedules are serializable, then the application programmer writing the code for a given transaction A need pay absolutely no attention at all to the fact that some other transaction B might be executing in the system at the same time. However, it can be argued that the protocols used to guarantee serializability reduce the degree of concurrency or overall system throughput to unacceptable levels. In practice, therefore, systems usually support a variety of levels of "isolation" (in quotes because any level lower than the maximum means the transaction isn't truly isolated from others after all, as we'll soon see).

(End quote)

As this extract indicates, I think the concept of "isolation levels" is and always was a logical mistake. But it has to be covered. The only safe level is the highest (no interference at all), called repeatable read in DB2 and SERIALIZABLE (a misnomer) in SQL:1999. Cursor stability (this is the DB2 term; the SQL:1999 equivalent is READ COMMITTED) should also be discussed, however.* Perhaps mention U locks (partly to illustrate the point that X and S locks, though the commonest perhaps, aren't the only kind).


──────────

* I remark in passing that DB2 now supports the same four isolation levels as the SQL standard does, albeit under different names: RR or repeatable read ("SERIALIZABLE"), RS or read stability ("REPEATABLE READ"), CS or cursor stability ("READ COMMITTED"), and UR or uncommitted read ("READ UNCOMMITTED"). The terms in parentheses are the standard ones. Incidentally, DB2 allows these various options to be specified at the level of specific database accesses (i.e., individual SELECT, UPDATE, etc., statements).

──────────

Stress the point that if transaction T operates at less than the maximum isolation level, then we can no longer guarantee that T, if running concurrently with other transactions, will transform a "correct" (consistent) state of the database into another such state. A system that supports any isolation level lower than the maximum should provide some explicit concurrency control facilities (e.g., an explicit LOCK statement) to allow users to guarantee safety for themselves in the absence of such a guarantee from the system itself. DB2 does provide such facilities but the standard doesn't. (In fact, the standard doesn't mention locks, as such, at all, and deliberately so. The idea is to allow an implementation to use some nonlocking scheme if it wants to.)

Explain phantoms and the basic idea (only) of predicate locking. Mention access path locking as an implementation of predicate locking.

16.9 Intent Locking

This is good stuff (unlike the isolation level stuff!). Discuss locking granularity and corresponding tradeoffs. Conflict detection requires intent locks: intent shared (IS), intent exclusive (IX), and shared intent exclusive (SIX). Discuss the intent locking protocol (simplified version only; the full version is explained in the annotation to reference [16.10]). Mention lock precedence and lock escalation.
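When presenting the five lock modes together, it may help to have the compatibility rules in executable form. The table below is the compatibility matrix for X, SIX, IX, S, and IS as it is conventionally given in the locking literature; treat it as a sketch to be checked against the matrix in the chapter, not as a transcription of it. The names `COMPATIBLE` and `compatible` are illustrative.

```python
MODES = ["X", "SIX", "IX", "S", "IS"]

# COMPATIBLE[held] is the set of modes that a different transaction
# may still be granted on the same object while `held` is in force.
COMPATIBLE = {
    "X":   set(),
    "SIX": {"IS"},
    "IX":  {"IX", "IS"},
    "S":   {"S", "IS"},
    "IS":  {"SIX", "IX", "S", "IS"},
}


def compatible(held, requested):
    """Return True if `requested` can be granted alongside `held`."""
    return requested in COMPATIBLE[held]


# Print the matrix row by row (Y = compatible, N = conflict).
print("     " + " ".join(f"{m:>3}" for m in MODES))
for held in MODES:
    row = " ".join(f"{'Y' if compatible(held, r) else 'N':>3}" for r in MODES)
    print(f"{held:>4} {row}")
```

Note that the relation is symmetric, so it doesn't matter which of the two transactions asked first.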

16.10 Dropping ACID

This section offers some fresh and slightly skeptical (unorthodox, contentious) observations on the question of the so-called ACID properties of transactions. You might want to skip it.


Review the intended meaning of "the ACID properties" (C for correctness, not consistency, though). We now propose to deconstruct these concepts; in fact, I believe we've all been sold a bill of goods, slightly, in this area, especially with respect to "consistency" or "correctness."

Begin by taking care of some unfinished business: Explain why we believe all constraint checking has to be immediate (for detailed arguments, see the book). Critical role of multiple assignment.

Now discuss the ACID properties per se (in the order C-I-D-A). Follow the arguments in the book.

• With respect to "C": Could it have been that transaction theory was worked out before we had a clear notion of consistency? (Yes, I think so.) Note the quotes from the Gray & Reuter book! Note too this text from the discussion in the chapter: "[If] the C in ACID stands for consistency, then in a sense the property is trivial; if it stands for correctness, then it's unenforceable. Either way, therefore, the property is essentially meaningless, at least from a formal standpoint."

• With regard to "I": The original term was "degrees of consistency." Not the happiest of names! Data is either consistent or it isn't. (Quote from the annotation to reference [16.11].)

• With regard to "D": Makes sense only if there's no nesting, but nesting is desirable "for at least three reasons: intra-transaction parallelism, intra-transaction recovery control, and system modularity" [15.15].

• With regard to "A": Multiple assignment again!

In sum: A makes sense only because we don't have multiple assignment (but we need multiple assignment, and we already have it partially, even in SQL, and we're going to get more of it in SQL:2003); C is only a desideratum, it can't be guaranteed; the same is true for I; and D makes sense only without nesting, but we want nesting. To quote the conclusion of this section in the book, then:

(Begin quote)

We conclude that, overall, the transaction concept is important more from a pragmatic point of view than it is from a theoretical one. Please understand that this conclusion mustn't be taken as disparaging! We have nothing but respect for the many elegant and useful results obtained from over 25 years of transaction management research. We're merely observing that we now have a better understanding of some of the assumptions on which that research has been based: a better appreciation of integrity constraints in particular, plus a recognition of the need to support multiple assignment as a primitive operator. Indeed, it would be surprising if a change in assumptions didn't lead to a change in conclusions.

(End quote)

16.11 SQL Facilities

No explicit locking, but SQL does support isolation levels (discuss options on START TRANSACTION; recall that REPEATABLE READ in the SQL standard is not the same thing as "repeatable read" in DB2). Explain SQL's definitions of dirty read, nonrepeatable read, and phantoms (are they the same as the definitions given in the body of the chapter?). Is the SQL support broken? See references [16.2] and [16.14].
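A compact way to summarize the SQL side for students is a table of which phenomena each standard isolation level still permits, together with the DB2 names noted under Section 16.8. The sketch below encodes the usual reading of the SQL standard's table (dirty read, nonrepeatable read, phantom); the dictionary and function names are hypothetical, and the code says nothing about how any particular product actually implements the levels.

```python
# Standard isolation levels, weakest first.
LEVELS = ["READ UNCOMMITTED", "READ COMMITTED", "REPEATABLE READ", "SERIALIZABLE"]

# Phenomena each level still permits, per the usual reading of the SQL standard.
PERMITTED = {
    "READ UNCOMMITTED": {"dirty read", "nonrepeatable read", "phantom"},
    "READ COMMITTED":   {"nonrepeatable read", "phantom"},
    "REPEATABLE READ":  {"phantom"},
    "SERIALIZABLE":     set(),
}

# DB2 names for the same four levels.
DB2_TO_SQL = {
    "RR": "SERIALIZABLE",
    "RS": "REPEATABLE READ",
    "CS": "READ COMMITTED",
    "UR": "READ UNCOMMITTED",
}


def weakest_level_preventing(phenomena):
    """Return the lowest standard level that permits none of `phenomena`."""
    wanted = set(phenomena)
    # SERIALIZABLE permits nothing, so the search always succeeds.
    return next(level for level in LEVELS if not (PERMITTED[level] & wanted))


print(weakest_level_preventing({"dirty read"}))          # READ COMMITTED
print(weakest_level_preventing({"nonrepeatable read"}))  # REPEATABLE READ
print(PERMITTED[DB2_TO_SQL["RR"]])                       # set(): DB2 "repeatable read" is SERIALIZABLE
```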

References and Bibliography

References [16.1], [16.3], [16.7-16.8], [16.13], [16.15-16.17], and [16.20] discuss approaches to concurrency control that are wholly or partly based on something other than locking.

Answers to Exercises

16.1 See Section 16.6.

16.2 For a precise statement of the two-phase locking protocol and the two-phase locking theorem, see Section 16.6. For an explanation of how two-phase locking deals with RW, WR, and WW conflicts, see Sections 16.2-16.4.

16.3

a. There are six possible correct results, corresponding to the six possible serial schedules:

   Initially : A = 0
   T1-T2-T3  : A = 1
   T1-T3-T2  : A = 2
   T2-T1-T3  : A = 1
   T2-T3-T1  : A = 2
   T3-T1-T2  : A = 4
   T3-T2-T1  : A = 3

Of course, the six possible correct results aren't all distinct. As a matter of fact, it so happens in this particular example that the possible correct results are all independent of the initial state of the database, owing to the nature of transaction T3.

b. There are 90 possible distinct schedules. We can represent the possibilities as follows. (Ri, Rj, Rk stand for the three RETRIEVE operations R1, R2, R3, not necessarily in that order; similarly, Up, Uq, Ur stand for the three UPDATE operations U1, U2, U3, again not necessarily in that order.)

   Ri-Rj-Rk-Up-Uq-Ur : 3 * 2 * 1 * 3 * 2 * 1 = 36 possibilities
   Ri-Rj-Up-Rk-Uq-Ur : 3 * 2 * 2 * 1 * 2 * 1 = 24 possibilities
   Ri-Rj-Up-Uq-Rk-Ur : 3 * 2 * 2 * 1 * 1 * 1 = 12 possibilities
   Ri-Up-Rj-Rk-Uq-Ur : 3 * 1 * 2 * 1 * 2 * 1 = 12 possibilities
   Ri-Up-Rj-Uq-Rk-Ur : 3 * 1 * 2 * 1 * 1 * 1 =  6 possibilities
                                                ──
                                        TOTAL = 90 combinations
                                                ══
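The count of 90 can be confirmed by brute force. The sketch below assumes only what the answer to part b already assumes, namely that each transaction Ti consists of a RETRIEVE Ri followed by an UPDATE Ui; it enumerates every ordering of the six operations, keeps those in which each Ri precedes its own Ui, and groups them by R/U pattern. The helper names are illustrative.

```python
from collections import Counter
from itertools import permutations

OPS = ["R1", "U1", "R2", "U2", "R3", "U3"]


def is_valid(schedule):
    """Each transaction's RETRIEVE must come before its own UPDATE."""
    return all(schedule.index(f"R{i}") < schedule.index(f"U{i}") for i in (1, 2, 3))


schedules = [s for s in permutations(OPS) if is_valid(s)]

# Group by the pattern of R's and U's, e.g. ('R1','R2','U1',...) -> 'RRU...'.
by_pattern = Counter("".join(op[0] for op in s) for s in schedules)

for pattern, count in sorted(by_pattern.items()):
    print(pattern, count)
print("TOTAL", len(schedules))  # TOTAL 90
```

The five patterns that appear (RRRUUU, RRURUU, RRUURU, RURRUU, RURURU) carry 36, 24, 12, 12, and 6 schedules respectively, agreeing with the hand count.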

c. Yes. For example, the schedule R1-R2-R3-U3-U2-U1 produces the same result (one) as two of the six possible serial schedules (Exercise: Check this statement), and thus happens to be "correct" for the given initial value of zero. But it must be clearly understood that this "correctness" is a mere fluke, and results purely from the fact that the initial data value happened to be zero and not something else. As a counterexample, consider what would happen if the initial value of A were ten instead of zero. Would the schedule R1-R2-R3-U3-U2-U1 shown above still produce one of the genuinely correct results? (What are the genuinely correct results in this case?) If not, then that schedule isn't serializable.
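The check suggested in part c can be run mechanically. The sketch below is built on an assumption that should be stated loudly: the exercise's transaction definitions aren't reproduced in these notes, so the code takes T1 to add 1 to A, T2 to double A, and T3 to set A to 1, which is consistent with the six serial results listed in part a. Under that assumption, each Ri records the value of A that Ti sees, and each Ui writes a new value computed from what Ti saw.

```python
from itertools import permutations

# ASSUMED update rules (consistent with the serial results in part a).
UPDATE = {
    1: lambda seen: seen + 1,  # T1: add 1 to A
    2: lambda seen: seen * 2,  # T2: double A
    3: lambda seen: 1,         # T3: set A to 1
}


def run(schedule, initial):
    """Execute a schedule such as ['R1', 'U1', ...] and return the final A."""
    a = initial
    seen = {}
    for op in schedule:
        kind, txn = op[0], int(op[1])
        if kind == "R":
            seen[txn] = a                # Ti retrieves the current value
        else:
            a = UPDATE[txn](seen[txn])   # Ti updates based on what it saw
    return a


def serial_results(initial):
    """Final values of A produced by the six serial schedules."""
    results = set()
    for order in permutations((1, 2, 3)):
        schedule = [f"{kind}{txn}" for txn in order for kind in "RU"]
        results.add(run(schedule, initial))
    return results


fluke = ["R1", "R2", "R3", "U3", "U2", "U1"]
for initial in (0, 10):
    print(initial, run(fluke, initial), sorted(serial_results(initial)))
# 0 1 [1, 2, 3, 4]
# 10 11 [1, 2, 3, 4]
```

With A initially ten, the schedule yields 11, which is not among the serial results, so the "correctness" seen with an initial value of zero was indeed a fluke.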

d. Yes. For example, the schedule R1-R3-U1-U3-R2-U2 is serializable (it's equivalent to the serial schedule T1-T3-T2), but it cannot be produced if T1, T2, and T3 all obey the two-phase locking protocol. For, under that protocol, operation R3 will acquire an S lock on A on behalf of transaction T3; operation U1 in transaction T1 will thus not be able to proceed until that lock has been released, and that won't happen until transaction T3 terminates (in fact, transactions T3 and T1 will deadlock when operation U3 is reached).

This exercise illustrates very clearly the following important point. Given a set of transactions and an initial state of the database, (a) let ALL be the set of all possible schedules involving those transactions; (b) let "CORRECT" be the set of all schedules that do at least produce a correct final state from the given initial state; (c) let SERIALIZABLE be the set of all guaranteed correct (i.e., serializable) schedules; and (d) let PRODUCIBLE be the set of all schedules producible under the two-phase locking protocol. Then, in general,

ALL ⊇ "CORRECT" ⊇ SERIALIZABLE ⊇ PRODUCIBLE

16.4 At time tn no transactions are doing any useful work at all! There's one deadlock, involving transactions T2, T3, T9, and T8; in addition, T4 is waiting for T9, T12 is waiting for T4, and T10 and T11 are both waiting for T12. We can represent the situation by means of a graph (the Wait-For Graph), in which the nodes represent transactions and a directed edge from node Ti to node Tj indicates that Ti is waiting for Tj (see the figure below). Edges are labeled with the name of the database item and level of lock they're waiting for.

╔════════════════════════════════════════════════════════════════╗
║                  T10         T11                               ║
║            A (X)  └─────┬─────┘  C (X)                         ║
║                         ▼                                      ║
║                        T12                                     ║
║                 D (X)   │                                      ║
║                         ▼                                      ║
║                         T4                                     ║
║                 G (S)   │                                      ║
║                         ▼      H (X)                           ║
║                         T9 ────────────► T8                    ║
║                         ▲                │ E (S)               ║
║                 G (S)   │                ▼                     ║
║                         T3 ◄──────────── T2                    ║
║                                F (X)                           ║
╚════════════════════════════════════════════════════════════════╝
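Deadlock detection on a Wait-For Graph amounts to looking for a cycle. The sketch below encodes the graph for this exercise as an adjacency list and runs a depth-first search over it; the edges outside the cycle come straight from the text of the answer, while the direction of the four edges inside the cycle is read off the arrowheads in the figure, so treat that part as a transcription to be double-checked. The function name `find_cycle` is illustrative.

```python
# Ti -> list of transactions that Ti is waiting for.
WAITS_FOR = {
    "T10": ["T12"],
    "T11": ["T12"],
    "T12": ["T4"],
    "T4":  ["T9"],
    "T9":  ["T8"],
    "T8":  ["T2"],
    "T2":  ["T3"],
    "T3":  ["T9"],
}


def find_cycle(graph):
    """Return one cycle as a list of nodes, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # Back edge: everything from nxt onward on the path is a cycle.
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


print(find_cycle(WAITS_FOR))  # ['T9', 'T8', 'T2', 'T3']
```

The transactions outside the returned cycle (T4, T12, T10, T11) aren't deadlocked themselves, but they can't proceed until the deadlock is broken.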

16.5 Isolation level CS has the same effect as isolation level RR on the problems of Figs. 16.1-16.3. (Note, however, that this statement does not apply to CS as implemented in DB2, thanks to DB2's use of U locks in place of S locks [4.21].) As for the inconsistent analysis problem (Fig. 16.4): Isolation level CS doesn't solve this problem; transaction A must execute under RR in order to retain its locks until end-of-transaction, for otherwise it'll still produce the wrong answer. (Alternatively, of course, A could lock the entire accounts relvar via some explicit lock request, if the system supports such an operation. This solution would work under both CS and RR isolation levels.)


16.6 See Section 16.9. Note in particular that the formal definitions are given by the lock type compatibility matrix (Fig. 16.13).

16.7 See Section 16.9.

16.8 See the annotation to reference [16.10].

16.9 The three concurrency problems identified in Section 16.2 were lost update, uncommitted dependency, and inconsistent analysis. Of these three:

• Lost updates: The SQL implementation is required to guarantee (in all circumstances) that lost updates never occur.

• Uncommitted dependency: This is just another name for dirty read.

• Inconsistent analysis: This term covers both nonrepeatable read and phantoms.

16.10 The following brief description is based on one given in reference [15.6]. First of all, the system must keep:

1. For each data object, a stack of committed versions (each stack entry giving a value for the object and the ID of the transaction that established that value; i.e., each stack entry essentially consists of a pointer to the relevant entry in the log). The stack is in reverse chronological sequence, with the most recent entry being on the top.

2. A list of transaction IDs for all committed transactions (the commit list).

When a transaction starts executing, the system gives it a private copy of the commit list. Read operations on an object are directed to the most recent version of the object produced by a transaction on that private list. Write operations, by contrast, are directed to the actual current data object (which is why write/write conflict testing is still necessary). When the transaction commits, the system updates the commit list and the data object version stacks appropriately.
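The scheme just described can be sketched in a few lines. What follows is a toy model under stated assumptions, not the design from the reference: the class and method names are invented, versions are kept in memory rather than as log pointers, rollback isn't modeled, and the write/write conflict test (refuse a write if the current value was written by a transaction that isn't on the writer's private commit list) is one plausible rule chosen for the illustration.

```python
class MultiVersionStore:
    """Toy multi-version store: version stacks plus a commit list."""

    def __init__(self):
        self.versions = {}     # object -> [(value, txn_id), ...], newest last
        self.commit_list = []  # IDs of committed transactions, in commit order
        self.current = {}      # object -> (value, txn_id) of the latest write

    def begin(self, txn_id):
        # Each transaction gets a private copy of the commit list.
        return Transaction(self, txn_id, set(self.commit_list))


class Transaction:
    def __init__(self, store, txn_id, visible):
        self.store = store
        self.txn_id = txn_id
        self.visible = visible  # private commit list
        self.written = set()

    def read(self, name):
        # Most recent committed version produced by a transaction on the private list.
        for value, writer in reversed(self.store.versions.get(name, [])):
            if writer in self.visible:
                return value
        return None

    def write(self, name, value):
        # Writes go to the actual current object, so conflicts must be tested.
        holder = self.store.current.get(name)
        if holder and holder[1] != self.txn_id and holder[1] not in self.visible:
            raise RuntimeError(f"write/write conflict on {name} with {holder[1]}")
        self.store.current[name] = (value, self.txn_id)
        self.written.add(name)

    def commit(self):
        # Push this transaction's values onto the version stacks, then publish it.
        for name in self.written:
            self.store.versions.setdefault(name, []).append(self.store.current[name])
        self.store.commit_list.append(self.txn_id)


store = MultiVersionStore()
t1 = store.begin("T1"); t1.write("A", 1); t1.commit()
t2 = store.begin("T2")                    # T2's private list contains only T1
t3 = store.begin("T3"); t3.write("A", 2); t3.commit()
print(t2.read("A"))                 # 1: T3 is not on T2's private commit list
print(store.begin("T4").read("A"))  # 2: T4 started after T3 committed
```

Readers never wait in this model: T2 keeps seeing the version of A that was committed when it started, even though T3 has since committed a newer one.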

Title: Concurrency
Author: C. J. Date
Subject: Database Systems
Type: Solutions Manual
Year of publication: 2003
