Joe Celko's SQL for Smarties: Advanced SQL Programming


212 CHAPTER 8: TABLE OPERATIONS

the searched deletion uses a WHERE clause like the search condition in a SELECT statement.

8.1.1 The DELETE FROM Clause

The syntax for a searched deletion statement is:

<delete statement: searched> ::=
    DELETE FROM <table name>
    [WHERE <search condition>]

The DELETE FROM clause simply gives the name of the updatable table or view to be changed. Notice that no correlation name is allowed in the DELETE FROM clause. The SQL model for an alias table name is that the engine effectively creates a new table with that new name and populates it with rows identical to the base table or updatable view from which it was built. If you had a correlation name, you would be deleting from this system-created temporary table, and it would vanish at the end of the statement. The base table would never have been touched.

For this discussion, we will assume the user doing the deletion has applicable DELETE privileges for the table. The positioned deletion removes the row in the base table that is the source of the current cursor row. The syntax is:

<delete statement: positioned> ::=
    DELETE FROM <table name>
    WHERE CURRENT OF <cursor name>

Cursors in SQL are generally more expensive than nonprocedural code and, despite the existence of the Standard, they vary widely in current implementations. If you have a properly designed table with a key, you should be able to avoid them in a DELETE FROM statement.

8.1.2 The WHERE Clause

The most important thing to remember about the WHERE clause is that it is optional. If there is no WHERE clause, all rows in the table are deleted. The table structure still exists, but there are no rows.

Most, but not all, interactive SQL tools will give the user a warning when he or she is about to do this and ask for confirmation. Unless you want to clear out the table, immediately do a ROLLBACK to restore it; if you COMMIT or have set the tool to automatically commit the work, then the data is pretty much gone. The DBA will have to do something to save you. And don't feel badly about doing it at least once while you are learning SQL.
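A minimal sketch of the accident and the recovery, using Python's built-in sqlite3 module as a stand-in for an interactive SQL tool. The two-column Personnel table and its rows are made up for illustration. The point is only that a DELETE FROM without a WHERE clause empties the table while leaving its structure in place, and that a ROLLBACK issued before any COMMIT brings the rows back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, for illustration only.
conn.execute("CREATE TABLE Personnel (emp_nbr TEXT PRIMARY KEY, iq INTEGER NOT NULL)")
conn.executemany("INSERT INTO Personnel VALUES (?, ?)", [("Able", 101), ("Baker", 105)])
conn.commit()  # the rows are now safely committed


def row_count():
    """Number of rows currently visible in Personnel."""
    return conn.execute("SELECT COUNT(*) FROM Personnel").fetchone()[0]


# No WHERE clause: every row qualifies. sqlite3 opens a transaction implicitly.
conn.execute("DELETE FROM Personnel")
after_delete = row_count()    # the table still exists and can be queried

conn.rollback()               # undo the uncommitted deletion
after_rollback = row_count()

print(after_delete, after_rollback)
```

Had the script called conn.commit() instead of conn.rollback(), the rows would have been gone for good, which is the situation the text warns about.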

Because we wish to remove a subset of rows all at once, we cannot simply scan the table one row at a time and remove each qualifying row as it is encountered. The way most SQL implementations do a deletion is with two passes on the table. The first pass marks all of the candidate rows that meet the WHERE clause condition. This is also when most products check to see if the deletion will violate any constraints. The most common violations involve trying to remove a value that is referenced by a foreign key ("Hey, we still have orders for those pink lawn flamingoes; you cannot drop them from inventory yet!"). But other constraints, such as CREATE ASSERTION statements or CHECK() constraints, can also cause a ROLLBACK.

After the subset is validated, the second pass removes it, either immediately or by marking the rows so that a housekeeping routine can later reclaim the storage space. Then any further housekeeping, such as updating indexes, is done last.

The important point is that while the rows are being marked, the entire table is still available for the WHERE condition to use. In many if not most cases, this two-pass method does not make any difference in the results. The WHERE clause is usually a fairly simple predicate that references constants or relationships among the columns of a row. For example, we could clear out some Personnel with this deletion:

DELETE FROM Personnel
 WHERE iq <= 100; -- constant in simple predicate

or:

DELETE FROM Personnel
 WHERE hat_size = iq; -- uses columns in the same row

A good optimizer could recognize that these predicates do not depend on the table as a whole, and would use a single scan for them. The two passes make a difference when the table references itself. Let's fire employees with IQs that are below average for their departments.

DELETE FROM Personnel
 WHERE iq < (SELECT AVG(P1.iq)
               FROM Personnel AS P1 -- must have correlation name
              WHERE Personnel.dept_nbr = P1.dept_nbr);

We have the following data:

Personnel
emp_nbr    dept_nbr  iq
========================
'Able'     'Acct'    101
'Baker'    'Acct'    105
'Charles'  'Acct'    106
'Henry'    'Mkt'     101
'Celko'    'Mkt'     170
'Popkin'   'HR'      120

If this were done one row at a time, we would first go to Accounting and find the average IQ, (101 + 105 + 106)/3.0 = 104, and fire Able. Then we would move sequentially down the table, and again find the average IQ, (105 + 106)/2.0 = 105.5, and fire Baker. Only Charles would escape the downsizing.

Now sort the table a little differently, so that the rows are visited in reverse alphabetic order. We first read Charles's IQ and compute the average for Accounting, (101 + 105 + 106)/3.0 = 104, and retain Charles. Then we would move sequentially down the table, with the average IQ unchanged, so we also retain Baker. Able, however, is downsized when that row comes up.
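The arithmetic above can be replayed with a small simulation. This is only a sketch in plain Python, not how any SQL engine is actually written: two_pass_delete marks the candidates against the intact table and then removes them, while row_at_a_time_delete (a deliberately naive, hypothetical routine) recomputes the department average after every removal. The function names are invented for this illustration; the data is the Personnel table shown above.

```python
from statistics import mean

# Sample rows from the text: (emp_nbr, dept_nbr, iq).
PERSONNEL = [
    ("Able", "Acct", 101), ("Baker", "Acct", 105), ("Charles", "Acct", 106),
    ("Henry", "Mkt", 101), ("Celko", "Mkt", 170), ("Popkin", "HR", 120),
]


def dept_avg(rows, dept):
    """Average IQ of the rows currently present in the given department."""
    return mean(iq for _, d, iq in rows if d == dept)


def two_pass_delete(rows):
    """Pass 1 marks rows against the intact table; pass 2 removes the marked rows."""
    marked = {name for name, dept, iq in rows if iq < dept_avg(rows, dept)}
    return [row for row in rows if row[0] not in marked]


def row_at_a_time_delete(rows, visit_order):
    """Naive deletion: each row is tested against the table as it stands right now."""
    remaining = list(rows)
    for row in visit_order:
        _, dept, iq = row
        if iq < dept_avg(remaining, dept):
            remaining.remove(row)
    return remaining


def names(rows):
    return sorted(name for name, _, _ in rows)


set_result = names(two_pass_delete(PERSONNEL))
forward_result = names(row_at_a_time_delete(PERSONNEL, PERSONNEL))
reverse_result = names(row_at_a_time_delete(PERSONNEL, list(reversed(PERSONNEL))))

print("two passes:          ", set_result)
print("one row, top-down:   ", forward_result)
print("one row, bottom-up:  ", reverse_result)
```

The top-down scan fires Baker as well as Able, while the bottom-up scan happens to agree with the two-pass result. That dependence on visiting order is exactly what the two-pass model rules out.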

It might be worth noting that early versions of DB2 would delete rows in the sequential order in which they appear in physical storage. Sybase's SQL Anywhere (née WATCOM SQL) has an optional ORDER BY clause that sorts the table, then does a sequential deletion on the table. This feature can be used to force a sequential deletion in cases where order does not matter, thus optimizing the statement by saving a second pass over the table. But it also can give the desired results in situations where you would otherwise have to use a cursor and a host language.

Anders Altberg, Johannes Becher, and I tested different versions of a DELETE statement whose goal was to remove all but one row of a group. The column dup_cnt is a count of the duplicates of that row in the original table. The three statements tested were:

D1:

DELETE FROM Test
 WHERE EXISTS (SELECT *
                 FROM Test AS T1
                WHERE T1.dup_id = Test.dup_id
                  AND T1.dup_cnt < Test.dup_cnt);

D2:

DELETE FROM Test
 WHERE dup_cnt > (SELECT MIN(T1.dup_cnt)
                    FROM Test AS T1
                   WHERE T1.dup_id = Test.dup_id);

D3:

BEGIN ATOMIC
INSERT INTO WorkingTable(dup_id, min_dup_cnt)
SELECT dup_id, MIN(dup_cnt)
  FROM Test
 GROUP BY dup_id;

DELETE FROM Test
 WHERE dup_cnt > (SELECT min_dup_cnt
                    FROM WorkingTable
                   WHERE WorkingTable.dup_id = Test.dup_id);
END;

Their relative execution speeds in one SQL desktop product were:

D1    3.20 seconds
D2   31.22 seconds
D3    0.17 seconds

Without seeing the execution plans, I would guess that statement D1 went to an index for the EXISTS() test and returned TRUE on the first item it found. On the other hand, D2 scanned each subset in the partitioning of Test by dup_id to find the MIN() over and over. Finally, the D3 version simply does a JOIN on simple scalar columns. With full SQL-92, you could write D3 as:

D3-2:

DELETE FROM Test
 WHERE dup_cnt >
       (SELECT min_dup_cnt
          FROM (SELECT dup_id, MIN(dup_cnt)
                  FROM Test
                 GROUP BY dup_id) AS WorkingTable(dup_id, min_dup_cnt)
         WHERE WorkingTable.dup_id = Test.dup_id);

Having said all of this, the faster way to remove redundant duplicates is most often with a CURSOR that does a full table scan.
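The timings cannot be reproduced without the original product, but it is easy to check that the three formulations keep the same rows. The sketch below runs them against SQLite through Python's sqlite3 module, with a few assumptions: the Test table is reduced to the two columns the statements mention, the sample rows are invented, D1 is written with the outer column qualified as Test.dup_cnt, and D3 builds its working table with CREATE TEMP TABLE ... AS instead of a separate INSERT INTO. It says nothing about relative speed.

```python
import sqlite3

D1 = ["""DELETE FROM Test
          WHERE EXISTS (SELECT *
                          FROM Test AS T1
                         WHERE T1.dup_id = Test.dup_id
                           AND T1.dup_cnt < Test.dup_cnt)"""]

D2 = ["""DELETE FROM Test
          WHERE dup_cnt > (SELECT MIN(T1.dup_cnt)
                             FROM Test AS T1
                            WHERE T1.dup_id = Test.dup_id)"""]

D3 = ["""CREATE TEMP TABLE WorkingTable AS
         SELECT dup_id, MIN(dup_cnt) AS min_dup_cnt
           FROM Test
          GROUP BY dup_id""",
      """DELETE FROM Test
          WHERE dup_cnt > (SELECT min_dup_cnt
                             FROM WorkingTable
                            WHERE WorkingTable.dup_id = Test.dup_id)"""]

# Invented sample data: three copies in group 1, two copies in group 2.
SAMPLE = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]


def survivors(statements):
    """Load a fresh copy of Test, run the statements, return the rows left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Test (dup_id INTEGER NOT NULL, dup_cnt INTEGER NOT NULL)")
    conn.executemany("INSERT INTO Test VALUES (?, ?)", SAMPLE)
    for stmt in statements:
        conn.execute(stmt)
    rows = conn.execute(
        "SELECT dup_id, dup_cnt FROM Test ORDER BY dup_id, dup_cnt").fetchall()
    conn.close()
    return rows


results = {name: survivors(stmts) for name, stmts in (("D1", D1), ("D2", D2), ("D3", D3))}
for name, rows in results.items():
    print(name, rows)
```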

8.1.3 Deleting Based on Data in a Second Table

The WHERE clause can be as complex as you wish. This means you can have subqueries that use other tables. For example, to remove customers who have paid their bills from the Deadbeats table, you can use a correlated EXISTS predicate, thus:

DELETE FROM Deadbeats
 WHERE EXISTS (SELECT *
                 FROM Payments AS P1
                WHERE Deadbeats.cust_nbr = P1.cust_nbr
                  AND P1.amtpaid >= Deadbeats.amtdue);

The scope rules from SELECT statements also apply to the WHERE clause of a DELETE FROM statement, but it is a good idea to qualify all of the column names.
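Here is the Deadbeats deletion run end to end, as a sketch against SQLite via Python's sqlite3 module. The column types and the sample rows are assumptions made for the example; only the table names, column names, and the DELETE statement itself come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column types and rows are assumed for illustration.
    CREATE TABLE Deadbeats (cust_nbr INTEGER PRIMARY KEY, amtdue INTEGER NOT NULL);
    CREATE TABLE Payments  (cust_nbr INTEGER NOT NULL, amtpaid INTEGER NOT NULL);
    INSERT INTO Deadbeats VALUES (1, 100), (2, 50), (3, 75);
    INSERT INTO Payments  VALUES (1, 100), (2, 20);  -- customer 3 has paid nothing
""")

# Every column is qualified, as the text recommends.
conn.execute("""
    DELETE FROM Deadbeats
     WHERE EXISTS (SELECT *
                     FROM Payments AS P1
                    WHERE Deadbeats.cust_nbr = P1.cust_nbr
                      AND P1.amtpaid >= Deadbeats.amtdue)
""")

still_owing = [row[0] for row in
               conn.execute("SELECT cust_nbr FROM Deadbeats ORDER BY cust_nbr")]
print(still_owing)
```

Customer 1 paid in full and leaves the table; customer 2 paid too little and customer 3 paid nothing, so both stay.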

8.1.4 Deleting within the Same Table

SQL allows a DELETE FROM statement to use columns, constants, and aggregate functions drawn from the table itself. For example, it is perfectly all right to remove everyone who is below average in a class with this statement:

DELETE FROM Students
 WHERE grade < (SELECT AVG(grade) FROM Students);
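A quick check of that statement, sketched with Python's sqlite3 module. The student_name column and the three rows are invented for the example. The average is taken over the whole class before anyone is removed, so only the one student below 70 goes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns and rows; only Students and grade appear in the text.
conn.execute("CREATE TABLE Students (student_name TEXT PRIMARY KEY, grade INTEGER NOT NULL)")
conn.executemany("INSERT INTO Students VALUES (?, ?)",
                 [("Ann", 50), ("Bob", 70), ("Cy", 90)])   # class average is 70

conn.execute("DELETE FROM Students WHERE grade < (SELECT AVG(grade) FROM Students)")

remaining = conn.execute(
    "SELECT student_name, grade FROM Students ORDER BY student_name").fetchall()
print(remaining)
```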

But the DELETE FROM clause does not allow a correlation name on its table, so not all WHERE clauses that could be written as part of a SELECT statement will work in a DELETE FROM statement. For example, a self-join on the working table in a subquery is impossible.

DELETE FROM Personnel AS B1 -- correlation name is INVALID SQL
 WHERE Personnel.boss_nbr = B1.emp_nbr
   AND Personnel.salary > B1.salary;

There are ways to work around this. One trick is to build a VIEW of the table and use the VIEW instead of a correlation name. Consider the problem of finding all employees who are now earning more than their boss and deleting them. The employee table being used has a column for the employee's identification number, emp_nbr, and another column for the boss's employee identification number, boss_nbr.

CREATE VIEW Bosses
AS SELECT emp_nbr, salary FROM Personnel;

DELETE FROM Personnel
 WHERE EXISTS (SELECT *
                 FROM Bosses AS B1
                WHERE Personnel.boss_nbr = B1.emp_nbr
                  AND Personnel.salary > B1.salary);

Simply using the Personnel table in the subquery will not work. We need an outer reference in the WHERE clause to the Personnel table in the subquery, and we cannot get that if the Personnel table is in the subquery. Such views should be as small as possible, so that the SQL engine can materialize them in main storage.
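The view trick can be tried out as follows. This is a sketch on SQLite through Python's sqlite3 module; the three-row Personnel table and its column types are assumptions, and the example only shows that the statement built on the Bosses view runs and removes the overpaid subordinate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sample data is invented: employee 1 is the boss of 2 and 3.
    CREATE TABLE Personnel
    (emp_nbr  INTEGER PRIMARY KEY,
     boss_nbr INTEGER,              -- NULL for the top boss
     salary   INTEGER NOT NULL);
    INSERT INTO Personnel VALUES (1, NULL, 1000), (2, 1, 1200), (3, 1, 800);

    -- The view carries only the two columns the comparison needs.
    CREATE VIEW Bosses
    AS SELECT emp_nbr, salary FROM Personnel;
""")

conn.execute("""
    DELETE FROM Personnel
     WHERE EXISTS (SELECT *
                     FROM Bosses AS B1
                    WHERE Personnel.boss_nbr = B1.emp_nbr
                      AND Personnel.salary > B1.salary)
""")

kept = [row[0] for row in
        conn.execute("SELECT emp_nbr FROM Personnel ORDER BY emp_nbr")]
print(kept)
```

Employee 2 earns more than boss 1 and is deleted; the boss, whose boss_nbr is NULL, never matches the predicate.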

Redundant Duplicates in a Table

Redundant duplicates are unneeded copies of a row in a table. You most often get them because you did not put a UNIQUE constraint on the table and then you inserted the same data twice. Removing the extra copies from a table in SQL is much harder than you would think. In fact, if the rows are exact duplicates, you cannot do it with a simple DELETE FROM statement. Removing redundant duplicates involves saving one of them while deleting the other(s). But if SQL has no way to tell them apart, it will delete all rows that were qualified by the WHERE clause. Another problem is that the deletion of a row from a base table can trigger referential actions, which can have unwanted side effects.

For example, if there is a referential integrity constraint that says a deletion in Table1 will cascade and delete matching rows in Table2, removing redundant duplicates from Table1 can leave me with no matching rows in Table2. Yet I still have a referential integrity rule that says there must be at least one match in Table2 for the single row I preserved in Table1. SQL allows constraints to be deferrable or nondeferrable, so you might be able to suspend the referential actions that the transaction below would cause:

BEGIN
INSERT INTO WorkingTable -- use DISTINCT to kill duplicates
SELECT DISTINCT * FROM MessedUpTable;

DELETE FROM MessedUpTable; -- clean out messed-up table

INSERT INTO MessedUpTable -- put working table into it
SELECT * FROM WorkingTable;

DROP TABLE WorkingTable; -- get rid of working table
END;
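A runnable sketch of that copy-out, clear, copy-back sequence, using SQLite through Python's sqlite3 module. The two columns of MessedUpTable and its rows are invented, and the working table is created on the spot with CREATE TEMP TABLE ... AS, which is an assumption of this example (the text presumes WorkingTable already exists). No foreign keys are declared here, so the referential-action problem described above does not arise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table with no UNIQUE constraint, loaded carelessly.
    CREATE TABLE MessedUpTable (a INTEGER, b TEXT);
    INSERT INTO MessedUpTable
    VALUES (1, 'x'), (1, 'x'), (2, 'y'), (2, 'y'), (2, 'y'), (3, 'z');
""")

with conn:  # commits if every step succeeds, rolls back on an exception
    # use DISTINCT to kill duplicates
    conn.execute("CREATE TEMP TABLE WorkingTable AS SELECT DISTINCT * FROM MessedUpTable")
    # clean out messed-up table
    conn.execute("DELETE FROM MessedUpTable")
    # put working table into it
    conn.execute("INSERT INTO MessedUpTable SELECT * FROM WorkingTable")
    # get rid of working table
    conn.execute("DROP TABLE WorkingTable")

cleaned = conn.execute("SELECT a, b FROM MessedUpTable ORDER BY a, b").fetchall()
print(cleaned)
```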

Removal of Redundant Duplicates with ROWID

Leonard C. Medel came up with several interesting ways to delete redundant duplicate rows from a table in an Oracle database.

Let’s assume that we have a table:

CREATE TABLE Personnel
(emp_id INTEGER NOT NULL,
 name CHAR(30) NOT NULL,
 ...);

The classic Oracle “delete dups” solution is the statement:

DELETE FROM Personnel
 WHERE ROWID < (SELECT MAX(P1.ROWID)
                  FROM Personnel AS P1
                 WHERE P1.dup_id = Personnel.dup_id
                   AND P1.name = Personnel.name
                   AND ...);

The column, or more properly pseudo-column, ROWID is based on the physical location of a row in storage. It can change after a user session but not during the session. It is the fastest possible physical access method into an Oracle table, because it goes directly to the physical address of the data. It is also a complete violation of Dr. Codd's rules, which require that the physical representation of the data be hidden from the users.

Doing a quick test on a 100,000-row table, Mr. Medel achieved a nearly tenfold improvement with these two alternatives. In English, the first alternative is to find the highest ROWID for each group of one or more duplicate rows, and then delete every row except the one with the highest ROWID.

DELETE FROM Personnel
 WHERE ROWID
       IN (SELECT P2.ROWID
             FROM Personnel AS P2,
                  (SELECT P3.dup_id, P3.name,
                          MAX(P3.ROWID) AS max_rowid
                     FROM Personnel AS P3
                    GROUP BY P3.dup_id, P3.name, ...)
                  AS P4
            WHERE P2.ROWID <> P4.max_rowid
              AND P2.dup_id = P4.dup_id
              AND P2.name = P4.name);

Notice that the GROUP BY clause needs all the columns in the table. The second approach is to notice that the set of all rows in the table minus the set of rows we want to keep defines the set of rows to delete. This gives us the following statement:

DELETE FROM Personnel
 WHERE ROWID
       IN (SELECT P2.ROWID
             FROM Personnel AS P2
           EXCEPT
           SELECT MAX(P3.ROWID)
             FROM Personnel AS P3
            GROUP BY P3.dup_id, P3.name, ...);

Both of these approaches are faster than the short, classic version because they avoid a correlated subquery expression in the WHERE clause.
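The set-difference idea can be tried without an Oracle installation, with one loud caveat: the sketch below uses SQLite's rowid, which is an integer row key and not a physical address, so it is only a stand-in for Oracle's ROWID and tells you nothing about Oracle performance. The two-column Personnel table and its rows are assumed for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed two-column table; SQLite assigns rowid 1..6 in insertion order.
    CREATE TABLE Personnel (dup_id INTEGER NOT NULL, name CHAR(30) NOT NULL);
    INSERT INTO Personnel
    VALUES (1, 'Able'), (1, 'Able'), (2, 'Baker'),
           (1, 'Able'), (2, 'Baker'), (3, 'Charles');
""")

# All rows minus the rows to keep (highest rowid per group) = rows to delete.
# The GROUP BY lists every column of the table.
conn.execute("""
    DELETE FROM Personnel
     WHERE rowid
           IN (SELECT P2.rowid
                 FROM Personnel AS P2
               EXCEPT
               SELECT MAX(P3.rowid)
                 FROM Personnel AS P3
                GROUP BY P3.dup_id, P3.name)
""")

kept_rows = conn.execute(
    "SELECT rowid, dup_id, name FROM Personnel ORDER BY rowid").fetchall()
print(kept_rows)
```

One row per (dup_id, name) group survives, and in each group it is the copy that was inserted last.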

8.1.5 Deleting in Multiple Tables without Referential Integrity

There is no way to directly delete rows from more than one table in a single DELETE FROM statement. There are two approaches to removing related rows from multiple tables. One is to use a temporary table of the deletion values; the other is to use referential integrity actions. For the purposes of this section, let us assume that we have a database with an Orders table and an Inventory table. Our business rule is that when something is out of stock, we delete it from all the orders.

Assume that no referential integrity constraints have been declared at all. First, create a temporary table of the products to be deleted based on your search criteria, then use that table in a correlated subquery to remove rows from each table involved.

CREATE MODULE Foobar
CREATE LOCAL TEMPORARY TABLE Discontinue
(part_nbr INTEGER NOT NULL UNIQUE)
ON COMMIT DELETE ROWS;

PROCEDURE CleanInventory( )
BEGIN ATOMIC
INSERT INTO Discontinue
SELECT DISTINCT part_nbr -- pick out the items to be removed
  FROM ...
 WHERE ...; -- using whatever criteria you require

DELETE FROM Orders
 WHERE part_nbr IN (SELECT part_nbr FROM Discontinue);

DELETE FROM Inventory
 WHERE part_nbr IN (SELECT part_nbr FROM Discontinue);

COMMIT WORK;
END;
END MODULE;

In the Standard SQL model, the temporary table is persistent in the schema, but its content is not. TEMPORARY tables are always empty at the start of a session, and they always appear to belong only to the user of the session. The GLOBAL option means that each application gets one copy of the table for all the modules, while LOCAL would limit the scope to the module in which it is declared.
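Few products implement CREATE MODULE, so here is the same pattern as a sketch in Python with the sqlite3 module. Several things are assumptions: the qty_on_hand column and the "out of stock means zero on hand" rule stand in for the search criteria the text leaves open, the sample rows are invented, and an explicit DELETE FROM Discontinue imitates ON COMMIT DELETE ROWS, which SQLite does not offer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data; no referential integrity is declared.
    CREATE TABLE Inventory (part_nbr INTEGER PRIMARY KEY, qty_on_hand INTEGER NOT NULL);
    CREATE TABLE Orders    (order_nbr INTEGER NOT NULL, part_nbr INTEGER NOT NULL);
    CREATE TEMP TABLE Discontinue (part_nbr INTEGER NOT NULL UNIQUE);
    INSERT INTO Inventory VALUES (10, 5), (20, 0), (30, 0);
    INSERT INTO Orders    VALUES (1, 10), (1, 20), (2, 30), (3, 10);
""")


def clean_inventory(conn):
    """Remove out-of-stock parts from Orders and Inventory in one transaction."""
    with conn:  # COMMIT on success, ROLLBACK if any statement raises
        # pick out the items to be removed (assumed criterion: nothing on hand)
        conn.execute("""INSERT INTO Discontinue
                        SELECT DISTINCT part_nbr
                          FROM Inventory
                         WHERE qty_on_hand = 0""")
        conn.execute("""DELETE FROM Orders
                         WHERE part_nbr IN (SELECT part_nbr FROM Discontinue)""")
        conn.execute("""DELETE FROM Inventory
                         WHERE part_nbr IN (SELECT part_nbr FROM Discontinue)""")
        # stand-in for ON COMMIT DELETE ROWS
        conn.execute("DELETE FROM Discontinue")


clean_inventory(conn)
orders_left = conn.execute(
    "SELECT order_nbr, part_nbr FROM Orders ORDER BY order_nbr, part_nbr").fetchall()
parts_left = [row[0] for row in conn.execute("SELECT part_nbr FROM Inventory")]
print(orders_left, parts_left)
```

Both deletions read the same Discontinue list, so the two tables cannot disagree about which parts were removed.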

8.2 INSERT INTO Statement

The INSERT INTO statement is the only way to get new data into a base table. In practice, there are always other tools for loading large amounts of data into a table, but they are very vendor-dependent.

8.2.1 INSERT INTO Clause

The syntax for INSERT INTO is:

<insert statement> ::=
    INSERT INTO <table name>
      <insert columns and source>

<insert columns and source> ::=
    [(<insert column list>)] <query expression>
  | VALUES <table value constructor list>
  | DEFAULT VALUES

<table value constructor list> ::=
    <row value constructor> [{<comma> <row value constructor>}...]

<row value constructor> ::=
    <row value constructor element>
  | <left paren> <row value constructor list> <right paren>
  | <row subquery>

<row value constructor list> ::=
    <row value constructor element>
      [{<comma> <row value constructor element>}...]

<row value constructor element> ::=
    <value expression> | NULL | DEFAULT

The two basic forms of an INSERT INTO are a table constant (usually a single row) insertion and a query insertion. The table constant insertion is done with a VALUES() clause. The list of insert values usually consists of constants or explicit NULLs, but in theory they could be almost any expression, including scalar SELECT subqueries.
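The forms just described can be seen side by side in the sketch below, run on SQLite through Python's sqlite3 module. The Geniuses table, its defaults, and all the rows are invented for the example. SQLite accepts a multi-row VALUES list, a scalar subquery inside VALUES, a query insertion, and DEFAULT VALUES; it does not accept the DEFAULT keyword as a list element, so that part of the grammar is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables for illustration.
    CREATE TABLE Personnel (emp_nbr TEXT NOT NULL, dept_nbr TEXT NOT NULL, iq INTEGER);
    CREATE TABLE Geniuses  (emp_nbr TEXT DEFAULT 'Unknown', iq INTEGER DEFAULT 100);

    -- table constant insertion, single row
    INSERT INTO Personnel (emp_nbr, dept_nbr, iq)
    VALUES ('Able', 'Acct', 101);

    -- table constant insertion, several rows, one with an explicit NULL
    INSERT INTO Personnel (emp_nbr, dept_nbr, iq)
    VALUES ('Celko', 'Mkt', 170), ('Popkin', 'HR', NULL);

    -- a scalar SELECT subquery used as one of the values
    INSERT INTO Personnel (emp_nbr, dept_nbr, iq)
    VALUES ('Baker', 'Acct', (SELECT MAX(iq) FROM Personnel));

    -- query insertion
    INSERT INTO Geniuses (emp_nbr, iq)
    SELECT emp_nbr, iq FROM Personnel WHERE iq > 150;

    -- every column takes its declared default
    INSERT INTO Geniuses DEFAULT VALUES;
""")

geniuses = conn.execute("SELECT emp_nbr, iq FROM Geniuses ORDER BY emp_nbr").fetchall()
print(geniuses)
```

Baker receives the highest IQ on file at the time of the insert (the NULL is ignored by MAX), so both Baker and Celko qualify for the query insertion.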
