Isolation actually becomes more complicated in practice, because one transaction may or may not actually see the data inserted, updated, or deleted by another transaction. This will be dealt with in detail in the section on isolation levels.
32.2.4 Durability
The database is stored on durable media, so that if the database program is destroyed, the database itself persists. Furthermore, the database can be restored to a consistent state when the database system is restored. Log files and backup procedures figure into this property, as well as disk writes done during processing.
This is all well and good if you have just one user accessing the database at a time. But one of the reasons you have a database system is that you also have multiple users who want to access it at the same time in their own sessions. This leads us to concurrency control.
32.3 Concurrency Control
Concurrency control is the part of transaction handling that deals with the way multiple users access the shared database without running into each other, like a traffic light system. One way to avoid any problems is to allow only one user in the database at a time. The only problem with that solution is that the other users are going to get lousy response time. Can you seriously imagine doing that with a bank teller machine system or an airline reservation system, where tens of thousands of users are waiting to get into the system at the same time?
32.3.1 The Five Phenomena
If all you do is execute queries against the database, then the ACID properties hold. The trouble occurs when two or more transactions want to change the database at the same time. In the SQL model, there are five ways that one transaction can affect another:

P0 (Dirty Write): Transaction T1 modifies a data item. Another transaction, T2, then further modifies that data item before T1 performs a COMMIT or ROLLBACK. If T1 or T2 then performs a ROLLBACK, it is unclear what the correct data value should be. The reason dirty writes are bad is that they can violate database consistency. Assume there is a constraint between x and y (e.g., x = y), and T1 and T2 each maintain the consistency of the constraint if run alone. However, the constraint can easily be violated if the two transactions write x and y in different orders, which can only happen if there are dirty writes.

P1 (Dirty Read): Transaction T1 modifies a row. Transaction T2 then reads that row before T1 performs a COMMIT WORK. If T1 then performs a ROLLBACK WORK, T2 will have read a row that was never committed, and that may thus be considered to have never existed.

P2 (Nonrepeatable Read): Transaction T1 reads a row. Transaction T2 then modifies or deletes that row and performs a COMMIT WORK. If T1 then attempts to reread the row, it may receive the modified value or discover that the row has been deleted.

P3 (Phantom): Transaction T1 reads the set of rows that satisfy some <search condition>. Transaction T2 then executes statements that generate one or more rows that satisfy the <search condition> used by transaction T1. If transaction T1 then repeats the initial read with the same <search condition>, it obtains a different collection of rows.

P4 (Lost Update): The lost update anomaly occurs when transaction T1 reads a data item, then T2 updates the data item (possibly based on a previous read), and then T1 (based on its earlier read value) updates the data item and COMMITs. T2's update is lost.
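To make P4 concrete, here is a minimal sketch of a lost update; the Accounts table, its columns, and the starting balance are hypothetical, and the interleaving of the two sessions is shown in the comments:

 -- Both sessions read the same starting balance of 100.
 -- T1: SELECT balance FROM Accounts WHERE acct_nbr = 123;  -- sees 100
 -- T2: SELECT balance FROM Accounts WHERE acct_nbr = 123;  -- sees 100
 -- T2: UPDATE Accounts SET balance = 100 + 50 WHERE acct_nbr = 123;
 -- T2: COMMIT WORK;                                    -- balance = 150
 -- T1: UPDATE Accounts SET balance = 100 - 30 WHERE acct_nbr = 123;
 -- T1: COMMIT WORK;                                    -- balance = 70
 -- T2's deposit of 50 has vanished; the correct final balance is 120.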
These phenomena are not always bad things. If the database is being used only for queries, without any changes being made during the workday, then none of these problems will occur. The database system will run much faster if you do not have to try to protect yourself from them. They are also acceptable when changes are being made under certain circumstances.

Imagine that I have a table of all the cars in the world. I want to execute a query to find the average age of drivers of red sport cars. This query will take some time to run, and during that time, cars will be crashed, bought and sold, new cars will be built, and so forth. But I accept a situation with the five phenomena listed above, because the average age of the information will not change that much from the time I start the query to the time it finishes. Changes after the second decimal place really don't matter.
However, you don't want any of these phenomena to occur in a database where the husband makes a deposit to a joint account and his wife makes a withdrawal. This leads us to the transaction isolation levels. The original ANSI model included only P1, P2, and P3. The other definitions first appeared in Microsoft Research Technical Report MSR-TR-95-51, "A Critique of ANSI SQL Isolation Levels," by Hal Berenson, Phil Bernstein, Jim Gray, Jim Melton, Elizabeth O'Neil, and Patrick O'Neil (1995).
32.3.2 The Isolation Levels
In standard SQL, the user gets to set the isolation level of the transactions in his session. The isolation level avoids some of the phenomena we just talked about and gives other information to the database. The syntax for the <set transaction statement> is as follows:
SET TRANSACTION <transaction mode list>
<transaction mode> ::=
<isolation level>
| <transaction access mode>
| <diagnostics size>
<diagnostics size> ::= DIAGNOSTICS SIZE <number of conditions>
<transaction access mode> ::= READ ONLY | READ WRITE
<isolation level> ::= ISOLATION LEVEL <level of isolation>
<level of isolation> ::=
READ UNCOMMITTED | READ COMMITTED | REPEATABLE READ | SERIALIZABLE
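
As a usage sketch, a session that only runs reports might issue something like this (the DIAGNOSTICS SIZE value is arbitrary):

 SET TRANSACTION READ ONLY,
     ISOLATION LEVEL READ COMMITTED,
     DIAGNOSTICS SIZE 5;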
The optional <diagnostics size> clause tells the database to set up a list for error messages of a given size. This is a Standard SQL feature, so you might not have it in your particular product. The reason is that a single statement can have several errors in it, and the engine is supposed to find them all and report them in the diagnostics area via a GET DIAGNOSTICS statement in the host program.
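For example, a host program might inspect the diagnostics area like this; the host variable names are hypothetical:

 GET DIAGNOSTICS :err_count = NUMBER;       -- how many conditions were raised
 GET DIAGNOSTICS EXCEPTION 1                -- details of the first condition
     :sql_state = RETURNED_SQLSTATE,
     :msg_text  = MESSAGE_TEXT;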
The <transaction access mode> clause explains itself. The READ ONLY option means that this is a query and lets the SQL engine know
that it can relax a bit. The READ WRITE option lets the SQL engine know that rows might be changed, and that it has to watch out for the five phenomena.
The important clause, which is implemented in most current SQL products, is the <isolation level> clause. The isolation level of a transaction defines the degree to which the operations of one transaction are affected by concurrent transactions. The isolation level of a transaction is SERIALIZABLE by default, but the user can explicitly set it in the <set transaction statement>.

The isolation levels each guarantee that each transaction will be executed completely or not at all, and that no updates will be lost. When the SQL engine detects the inability to guarantee the serializability of two or more concurrent transactions, or detects unrecoverable errors, it may initiate a ROLLBACK WORK statement on its own.
Let's take a look at a table (Table 32.1) of the isolation levels and the initial three phenomena (P1, P2, and P3). A "Yes" means that the phenomenon is possible under that isolation level:
Table 32.1 Isolation Levels and the Initial Three Phenomena

 Isolation Level     P1    P2    P3
 ===================================
 SERIALIZABLE        No    No    No
 REPEATABLE READ     No    No    Yes
 READ COMMITTED      No    Yes   Yes
 READ UNCOMMITTED    Yes   Yes   Yes
The SERIALIZABLE isolation level is guaranteed to produce the same results that the concurrent transactions would have had if they had been done in some serial order. A serial execution is one in which each transaction executes to completion before the next transaction begins. The users act as if they are standing in a line waiting to get complete access to the database.

The REPEATABLE READ isolation level is guaranteed to maintain the same image of the database to the user during his session.

The READ COMMITTED isolation level will let transactions in this session see rows that other transactions commit while this session is running.

The READ UNCOMMITTED isolation level will let transactions in this session see rows that other transactions create without necessarily committing while this session is running.
Regardless of the isolation level of the transaction, phenomena P1, P2, and P3 shall not occur during the implied reading of schema definitions performed on behalf of executing a statement, the checking of integrity constraints, and the execution of referential actions associated with referential constraints. We do not want the schema itself changing on users.
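As a sketch of how a weaker level shows up in practice, here is phenomenon P3 (the phantom) slipping through REPEATABLE READ; the Orders table and its columns are hypothetical, and whether the phantom actually appears depends on how your product implements the level:

 SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
 SELECT COUNT(*) FROM Orders
  WHERE order_amt > 100.00;    -- T1 sees n rows
 -- Meanwhile another session inserts a qualifying row and COMMITs:
 -- INSERT INTO Orders VALUES (..., 500.00); COMMIT WORK;
 SELECT COUNT(*) FROM Orders
  WHERE order_amt > 100.00;    -- T1 may now see (n + 1) rows: a phantom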
32.3.3 CURSOR STABILITY Isolation Level
The CURSOR STABILITY isolation level extends READ COMMITTED locking behavior for SQL cursors by adding a new read action for FETCH from a cursor and requiring that a lock be held on the current item of the cursor. The lock is held until the cursor moves or is closed, possibly by a commit. Naturally, the fetching transaction can update the row, and in that case a write lock will be held on the row until the transaction COMMITs, even after the cursor moves on with a subsequent FETCH. This makes CURSOR STABILITY stronger than READ COMMITTED and weaker than REPEATABLE READ.

CURSOR STABILITY is widely implemented by SQL products to prevent lost updates for rows read via a cursor. READ COMMITTED, in some systems, is actually the stronger CURSOR STABILITY. The ANSI standard allows this.
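A minimal sketch of the pattern CURSOR STABILITY protects, in embedded SQL style; the Personnel table, its columns, and the host variables are hypothetical:

 DECLARE PayRaise CURSOR FOR
  SELECT emp_nbr, salary
    FROM Personnel
     FOR UPDATE OF salary;

 OPEN PayRaise;
 FETCH PayRaise INTO :emp_nbr, :salary;  -- read lock held on this row

 UPDATE Personnel                        -- upgrades to a write lock,
    SET salary = salary * 1.05           -- held until the COMMIT
  WHERE CURRENT OF PayRaise;

 CLOSE PayRaise;
 COMMIT WORK;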
The SQL standards do not say how you are to achieve these results. However, there are two basic classes of concurrency control methods: optimistic and pessimistic. Within those two classes, each vendor will have his own implementation.
32.4 Pessimistic Concurrency Control
Pessimistic concurrency control is based on the idea that transactions are expected to conflict with each other, so we need to design a system to avoid the problems before they start.

All pessimistic concurrency control schemes use locks. A lock is a flag placed in the database that gives exclusive access to a schema object to one user. Imagine an airplane toilet door, with its "occupied" sign.

The differences among the schemes are the level of locking they use; setting those flags on and off costs time and resources. If you lock the whole database, then you will have, in effect, a serial batch processing system, since only one transaction at a time is active. In practice, you would do this only for
system maintenance work on the whole database. If you lock at the table level, performance can suffer because users must wait for the most common tables to become available. However, there are transactions that do involve the whole table, and this lock level will use only one flag.

If you lock the table at the row level, then other users can get to the rest of the table and you will have the best possible shared access. You will also have a huge number of flags to process, and performance will suffer. This approach is generally not practical.
Page locking is in between table and row locking. This approach puts a lock on subsets of rows within the table that include the desired values. The name comes from the fact that this lock level is usually implemented with pages of physical disk storage. Performance depends on the statistical distribution of data in physical storage, but it is generally a good compromise.
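Explicit locking is not part of standard SQL, but many products expose it with a statement along these lines; the exact syntax and lock mode names vary by vendor, and the Personnel table is hypothetical:

 -- Take the whole table pessimistically before a batch update
 -- (product-specific; shown in a common dialect form):
 LOCK TABLE Personnel IN EXCLUSIVE MODE;
 UPDATE Personnel SET salary = salary * 1.05;
 COMMIT WORK;   -- the lock is released at the end of the transaction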
32.5 SNAPSHOT Isolation: Optimistic Concurrency
Optimistic concurrency control is based on the idea that transactions are not very likely to conflict with each other, so we need to design a system to handle the problems as exceptions after they actually occur.
In Snapshot Isolation, each transaction reads data from a snapshot of the (committed) data as of the time the transaction started, called its start_timestamp. This time may be any time before the transaction's first read. A transaction running in Snapshot Isolation is never blocked attempting a read, because it is working on its private copy of the data. But this means that at any time, each data item might have multiple versions, created by active and committed transactions.
When the transaction T1 is ready to commit, it gets a commit_timestamp, which is later than any existing start_timestamp or commit_timestamp. The transaction successfully COMMITs only if no other transaction T2 with a commit_timestamp in T1's execution interval [start_timestamp, commit_timestamp] wrote data that T1 also wrote. Otherwise, T1 will ROLLBACK. This "first committer wins" strategy prevents lost updates (phenomenon P4). When T1 COMMITs, its changes become visible to all transactions whose start_timestamps are larger than T1's commit_timestamp.
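A small worked trace of the rule, with made-up timestamp values:

 -- T1: start_timestamp = 100, reads x from its snapshot
 -- T2: start_timestamp = 101, reads x from its snapshot
 -- T2: writes x, gets commit_timestamp = 102, COMMITs first
 -- T1: writes x, asks for commit_timestamp = 103
 --     T2 committed inside T1's interval [100, 103] and wrote x,
 --     which T1 also wrote, so T1 must ROLLBACK: first committer wins.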
Snapshot Isolation is nonserializable because a transaction's reads come at one instant and the writes at another. We assume we have several transactions working on the same data and a constraint that (x + y) should be positive. Each transaction that writes a new value for x and y is expected to maintain the constraint. While T1 and T2 both act properly in isolation, the constraint fails to hold when you put them together. The possible problems are:

A5 (Data Item Constraint Violation): Suppose C() is a database constraint between two data items, x and y, in the database. Here are two anomalies arising from constraint violation:

A5A (Read Skew): Suppose transaction T1 reads x, and then a second transaction T2 updates x and y to new values and COMMITs. Now, if T1 reads y, it may see an inconsistent state, and therefore produce an inconsistent state as output. A nonrepeatable read (P2) is a degenerate form of Read Skew, where x = y. More typically, a transaction reads two different but related items (e.g., referential integrity).

A5B (Write Skew): Suppose T1 reads x and y, which are consistent with constraint C, and then a T2 reads x and y, writes x, and COMMITs. Then T1 writes y. If there were a constraint between x and y, it might be violated. As an example, consider a constraint at a bank, where account balances are allowed to go negative as long as the sum of commonly held balances remains nonnegative, with an anomaly arising as in history H5 of the Berenson et al. paper.
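Here is a minimal sketch of A5B under Snapshot Isolation, using a hypothetical Accounts table and the bank rule that the two balances must sum to a nonnegative value:

 -- Constraint: x + y >= 0. Initially x = 70 and y = 80.
 -- T1: reads x = 70, y = 80 from its snapshot
 -- T2: reads x = 70, y = 80 from its snapshot
 -- T1: UPDATE Accounts SET balance = balance - 100
 --      WHERE acct_nbr = 'x';   -- x becomes -30; sum looks fine to T1
 -- T2: UPDATE Accounts SET balance = balance - 100
 --      WHERE acct_nbr = 'y';   -- y becomes -20; sum looks fine to T2
 -- T1: COMMIT WORK;  T2: COMMIT WORK;
 -- The write sets are disjoint, so "first committer wins" stops neither,
 -- and the final state has x + y = -50, violating the constraint.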
Clearly, neither A5A nor A5B could arise in histories where P2 is precluded, since both A5A and A5B have T2 write a data item that has been previously read by an uncommitted T1. Thus, phenomena A5A and A5B are only useful for distinguishing isolation levels below REPEATABLE READ in strength.

The ANSI SQL definition of REPEATABLE READ, in its strictest interpretation, captures a degenerate form of row constraints, but misses the general concept. To be specific, locking REPEATABLE READ on Table 2 provides protection from row constraint violations, but the ANSI SQL definition of Table 1, forbidding anomalies A1 and A2, does not. Snapshot Isolation, however, is surprisingly strong, even stronger than READ COMMITTED.
This approach predates databases by decades. It was implemented manually in the central records department of companies when they started storing data on microfilm. You do not get the actual microfilm; instead, they make a timestamped photocopy for you. You take the copy to your desk, mark it up, and return it to the central records department. The Central Records clerk timestamps your updated document, photographs it, and adds it to the end of the roll of microfilm.
But what if user number two also went to the central records department and got a timestamped photocopy of the same document? The Central Records clerk has to look at both timestamps and make a decision. If the first user attempts to put his updates into the database while the second user is still working on his copy, then the clerk has to either hold the first copy and wait for the second copy to show up, or return the copy to the first user. When both copies are in hand, the clerk stacks the copies on top of each other, holds them up to the light, and looks to see if there are any conflicts. If both updates can be made to the database, he does so. If there are conflicts, he must either have rules for resolving the problems or he has to reject both transactions. This is a kind of row-level locking, done after the fact.
32.6 Logical Concurrency Control
Logical concurrency control is based on the idea that the machine can analyze the predicates in the queue of waiting queries and processes on a purely logical level, and then determine which of the statements can be allowed to operate on the database at the same time.
Clearly, all SELECT statements can operate at the same time, since they do not change the data. After that, it is tricky to determine which statements conflict with each other. For example, one pair of UPDATE statements on two separate tables might be allowed only in a certain order because of PRIMARY KEY and FOREIGN KEY constraints. Another pair of UPDATE statements on the same tables might be disallowed because they modify the same rows and leave different final states in them. However, a third pair of UPDATE statements on the same tables might be allowed because they modify different rows and have no conflicts with each other, as sketched below.
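A sketch of the last two cases, with a hypothetical Personnel table:

 -- These two UPDATEs touch disjoint rows, so a logical analyzer
 -- could let them run at the same time:
 UPDATE Personnel SET salary = salary * 1.10 WHERE dept_name = 'sales';
 UPDATE Personnel SET salary = salary * 1.05 WHERE dept_name = 'admin';

 -- These two modify the same rows, and the final salary depends on
 -- which one runs first, so they must be serialized:
 UPDATE Personnel SET salary = salary + 100.00 WHERE dept_name = 'sales';
 UPDATE Personnel SET salary = salary * 2.00 WHERE dept_name = 'sales';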
There is also the problem of having statements waiting too long in the queue to be executed. This is a version of livelock, which we discuss in the next section. The usual solution is to assign a priority number to each waiting transaction and then decrement that priority number when it has been waiting for a certain length of time. Eventually, every transaction will arrive at priority one and be able to go ahead of any other transaction.

This approach also allows you to enter transactions at a higher priority than the transactions in the queue. While it is possible to create a livelock this way, it is not a problem, and it lets you bump less important jobs in favor of more important jobs, such as payroll checks.
32.7 Deadlock and Livelocks
It is possible for a user to fail to complete a transaction for reasons other than the hardware failing. A deadlock is a situation where two or more users hold resources that the others need, and neither party will surrender the objects to which they have locks. To make this more concrete, imagine that both user A and user B need tables X and Y. User A gets a lock on table X, and user B gets a lock on table Y. They both sit and wait for their missing resource to become available; it never happens. The common solution for a deadlock is for the DBA to kill one or more of the sessions involved and roll back their work.
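A minimal sketch of the classic interleaving; TableX, TableY, and their columns are hypothetical:

 -- Session A: UPDATE TableX SET col_x = 1;   -- A locks TableX
 -- Session B: UPDATE TableY SET col_y = 1;   -- B locks TableY
 -- Session A: UPDATE TableY SET col_y = 2;   -- A waits on B's lock
 -- Session B: UPDATE TableX SET col_x = 2;   -- B waits on A's lock:
 --                                           -- deadlock; neither can proceed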
In a livelock, a user is waiting for a resource, but never gets it because other users keep grabbing it before he gets a chance. None of the other users holds onto the resource permanently, as in a deadlock, but as a group they never free it. To make this more concrete, imagine that user A needs all of table X. But a hundred other users are always updating table X, so that user A cannot find a page in the table without a lock on it. He sits and waits for all the pages to become available; it never happens in time.
The DBA can, again, kill one or more of the sessions involved and roll back their work. In some systems, he can raise the priority of the livelocked session so that it can seize the resources as they become available.

None of this is trivial, and each database system will have its own version of transaction processing and concurrency control. This should not be of great concern to the applications programmer, but should be the responsibility of the DBA. But it is nice to know what happens under the covers.
CHAPTER 33
Optimizing SQL
THERE IS NO SET of rules for writing code that will take the best advantage of every query optimizer on every SQL product. The query optimizers depend on the underlying architecture and are simply too different for universal rules; however, we can make some general statements. Just remember that you have to test code. What would improve performance in one SQL implementation might have no effect in another, or make the performance worse.
There are two kinds of optimizers: cost-based and rule-based. A rule-based optimizer (such as Oracle before version 7.0) looks at the syntax of the query and plans how to execute the query without considering the size of the tables or the statistical distribution of the data. It will parse a query and execute it in the order in which it was written, perhaps doing some reorganization of the query into an equivalent form using some syntax rules. Basically, it is no optimizer at all.
A cost-based optimizer looks at both the query and the statistical data about the database itself before deciding the best way to execute the query. These decisions involve whether to use indexes, whether to use hashing, which tables to bring into main storage, what sorting technique to use, and so forth. Most of the time (but not all!), it will make better decisions than a human programmer would have, simply because it has more information.