CHAPTER 8: TABLE OPERATIONS
the searched deletion uses a WHERE clause like the search condition in a
SELECT statement.
8.1.1 The DELETE FROM Clause
The syntax for a searched deletion statement is:
<delete statement: searched> ::=
    DELETE FROM <table name>
    [WHERE <search condition>]
The DELETE FROM clause simply gives the name of the updatable table or view to be changed. Notice that no correlation name is allowed in the DELETE FROM clause. The SQL model for an alias table name is that the engine effectively creates a new table with that new name and populates it with rows identical to the base table or updatable view from which it was built. If you had a correlation name, you would be deleting from this system-created temporary table, and it would vanish at the end of the statement. The base table would never have been touched.

For this discussion, we will assume the user doing the deletion has applicable DELETE privileges for the table.

The positioned deletion removes the row in the base table that is the source of the current cursor row. The syntax is:
<delete statement: positioned> ::=
    DELETE FROM <table name>
    WHERE CURRENT OF <cursor name>
Cursors in SQL are generally more expensive than nonprocedural code and, despite the existence of the Standard, they vary widely in current implementations. If you have a properly designed table with a key, you should be able to avoid them in a DELETE FROM statement.
8.1.2 The WHERE Clause
The most important thing to remember about the WHERE clause is that it is optional. If there is no WHERE clause, all rows in the table are deleted. The table structure still exists, but there are no rows.

Most, but not all, interactive SQL tools will give the user a warning when he or she is about to do this and ask for confirmation. Unless you want to clear out the table, immediately do a ROLLBACK to restore it; if you COMMIT or have set the tool to automatically commit the work, then the data is pretty much gone. The DBA will have to do something to save you. And don’t feel bad about doing it at least once while you are learning SQL.
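The rescue by ROLLBACK can be tried safely on a scratch database. What follows is a minimal sketch, assuming SQLite driven through Python's standard sqlite3 module as a stand-in for an interactive SQL tool. The two-row Personnel table is invented for the illustration, and other products may handle transactions differently.

```python
import sqlite3


def delete_all_then_rollback():
    # isolation_level=None leaves transaction control to us, so the
    # BEGIN and ROLLBACK below are exactly what gets executed.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE Personnel (emp_nbr TEXT PRIMARY KEY, iq INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO Personnel VALUES (?, ?)", [("Able", 101), ("Baker", 105)]
    )

    conn.execute("BEGIN")
    conn.execute("DELETE FROM Personnel")  # no WHERE clause: every row qualifies
    during = conn.execute("SELECT COUNT(*) FROM Personnel").fetchone()[0]

    conn.execute("ROLLBACK")  # undo the deletion before it is committed
    after = conn.execute("SELECT COUNT(*) FROM Personnel").fetchone()[0]
    conn.close()
    return during, after
```

The function returns the row count seen inside the transaction and the count after the ROLLBACK. The table itself survives the DELETE in both cases; only its rows come and go.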
Because we wish to remove a subset of rows all at once, we cannot simply scan the table one row at a time and remove each qualifying row as it is encountered. The way most SQL implementations do a deletion is with two passes on the table. The first pass marks all of the candidate rows that meet the WHERE clause condition. This is also when most products check to see if the deletion will violate any constraints. The most common violations involve trying to remove a value that is referenced by a foreign key (“Hey, we still have orders for those pink lawn flamingoes; you cannot drop them from inventory yet!”). But other constraints, such as the CHECK() constraints in CREATE ASSERTION statements, can also cause a ROLLBACK.

After the subset is validated, the second pass removes it, either immediately or by marking the rows so that a housekeeping routine can later reclaim the storage space. Then any further housekeeping, such as updating indexes, is done last.
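The validation step can be observed in a small experiment. This sketch assumes SQLite through Python's sqlite3 module, with foreign key enforcement switched on explicitly because SQLite leaves it off by default. The Inventory and Orders tables and their rows are invented for the example, and the exact error reported will differ from product to product.

```python
import sqlite3


def try_delete_referenced_rows():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # assumption: SQLite-specific switch
    conn.execute(
        "CREATE TABLE Inventory (part_nbr INTEGER PRIMARY KEY, descr TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE Orders ("
        " order_nbr INTEGER PRIMARY KEY,"
        " part_nbr INTEGER NOT NULL REFERENCES Inventory (part_nbr))"
    )
    conn.execute("INSERT INTO Inventory VALUES (1, 'pink lawn flamingo')")
    conn.execute("INSERT INTO Inventory VALUES (2, 'garden gnome')")
    conn.execute("INSERT INTO Orders VALUES (100, 1)")  # flamingo is still on order
    conn.commit()

    try:
        # Both rows are candidates, but one of them is referenced by Orders.
        conn.execute("DELETE FROM Inventory")
        rejected = False
    except sqlite3.IntegrityError:
        conn.rollback()
        rejected = True

    remaining = conn.execute("SELECT COUNT(*) FROM Inventory").fetchone()[0]
    conn.close()
    return rejected, remaining
```

The point of interest is that the statement fails as a unit: the unreferenced gnome row is not removed either, because the subset is validated before anything is deleted.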
The important point is that while the rows are being marked, the entire table is still available for the WHERE condition to use. In many if not most cases, this two-pass method does not make any difference in the results. The WHERE clause is usually a fairly simple predicate that references constants or relationships among the columns of a row. For example, we could clear out some Personnel with this deletion:
DELETE FROM Personnel
 WHERE iq <= 100; -- constant in simple predicate
or:
DELETE FROM Personnel
 WHERE hat_size = iq; -- uses columns in the same row
A good optimizer could recognize that these predicates do not depend on the table as a whole, and would use a single scan for them. The two passes make a difference when the table references itself. Let’s fire employees with IQs that are below average for their departments.
DELETE FROM Personnel
 WHERE iq < (SELECT AVG(P1.iq)
               FROM Personnel AS P1 -- must have correlation name
              WHERE Personnel.dept_nbr = P1.dept_nbr);
We have the following data:
Personnel
emp_nbr    dept_nbr  iq
========================
'Able'     'Acct'    101
'Baker'    'Acct'    105
'Charles'  'Acct'    106
'Henry'    'Mkt'     101
'Celko'    'Mkt'     170
'Popkin'   'HR'      120
If this were done one row at a time, we would first go to Accounting and find the average IQ, (101 + 105 + 106)/3.0 = 104, and fire Able. Then we would move sequentially down the table, and again find the average IQ, (105 + 106)/2.0 = 105.5, and fire Baker. Only Charles would escape the downsizing.

Now sort the table a little differently, so that the rows are visited in reverse alphabetic order. We first read Charles’s IQ and compute the average for Accounting, (101 + 105 + 106)/3.0 = 104, and retain Charles. Then we would move sequentially down the table, with the average IQ unchanged, so we also retain Baker. Able, however, is downsized when that row comes up.
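The order dependence is easy to check by simulation. The sketch below is plain Python rather than SQL, and it models the two strategies on the sample rows. The function names are invented for this illustration and do not correspond to any product's internals.

```python
from statistics import mean

PERSONNEL = [
    ("Able", "Acct", 101), ("Baker", "Acct", 105), ("Charles", "Acct", 106),
    ("Henry", "Mkt", 101), ("Celko", "Mkt", 170), ("Popkin", "HR", 120),
]


def dept_avg(rows, dept):
    """Average IQ of the rows currently in the given department."""
    return mean(iq for _, d, iq in rows if d == dept)


def delete_row_at_a_time(rows, visit_order):
    """Faulty model: each deletion takes effect before the next row is tested."""
    table = list(rows)
    for row in visit_order:
        _, dept, iq = row
        if iq < dept_avg(table, dept):
            table.remove(row)
    return sorted(name for name, _, _ in table)


def delete_two_pass(rows):
    """Set-oriented model: mark against the unchanged table, then remove."""
    marked = [r for r in rows if r[2] < dept_avg(rows, r[1])]
    return sorted(r[0] for r in rows if r not in marked)


forward = delete_row_at_a_time(PERSONNEL, PERSONNEL)
backward = delete_row_at_a_time(PERSONNEL, list(reversed(PERSONNEL)))
two_pass = delete_two_pass(PERSONNEL)
```

In the forward scan Baker is lost along with Able, while the reversed scan keeps Baker. The two-pass version always compares each row against the original department averages, so its survivors do not depend on the order in which the rows happen to be visited.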
It might be worth noting that early versions of DB2 would delete rows in the sequential order in which they appear in physical storage. Sybase’s SQL Anywhere (née WATCOM SQL) has an optional ORDER BY clause that sorts the table, then does a sequential deletion on the table. This feature can be used to force a sequential deletion in cases where order does not matter, thus optimizing the statement by saving a second pass over the table. But it also can give the desired results in situations where you would otherwise have to use a cursor and a host language.
Anders Altberg, Johannes Becher, and I tested different versions of a DELETE statement whose goal was to remove all but one row of a group. The column dup_cnt is a count of the duplicates of that row in the original table. The three statements tested were:
D1:
DELETE FROM Test
 WHERE EXISTS (SELECT *
                 FROM Test AS T1
                WHERE T1.dup_id = Test.dup_id
                  AND T1.dup_cnt < Test.dup_cnt);
D2:
DELETE FROM Test
WHERE dup_cnt > (SELECT MIN(T1.dup_cnt)
FROM Test AS T1
WHERE T1.dup_id = Test.dup_id);
D3:
BEGIN ATOMIC
INSERT INTO WorkingTable(dup_id, min_dup_cnt)
SELECT dup_id, MIN(dup_cnt)
FROM Test
GROUP BY dup_id;
DELETE FROM Test
WHERE dup_cnt > (SELECT min_dup_cnt
FROM WorkingTable
WHERE WorkingTable.dup_id = Test.dup_id);
END;
Their relative execution speeds in one SQL desktop product were:
D1   3.20 seconds
D2  31.22 seconds
D3   0.17 seconds
Without seeing the execution plans, I would guess that statement D1 went to an index for the EXISTS() test and returned TRUE on the first item it found. On the other hand, D2 scanned each subset in the partitioning of Test by dup_id to find the MIN() over and over. Finally, the D3 version simply does a JOIN on simple scalar columns. With full SQL-92, you could write D3 as:
D3-2:
DELETE FROM Test
 WHERE dup_cnt >
       (SELECT min_dup_cnt
          FROM (SELECT dup_id, MIN(dup_cnt)
                  FROM Test
                 GROUP BY dup_id) AS WorkingTable(dup_id, min_dup_cnt)
         WHERE WorkingTable.dup_id = Test.dup_id);
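Whatever their speeds, the formulations should agree on which rows survive. Here is a minimal sketch, assuming SQLite through Python's sqlite3 module and a six-row Test table invented for the purpose. D1 is written with the outer column qualified as Test.dup_cnt, and the derived-table version names its computed column with AS inside the subquery, because SQLite does not accept a column list after a derived table's correlation name. The sketch says nothing about timing.

```python
import sqlite3

ROWS = [(1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (3, 2)]

STATEMENTS = {
    "D1": """DELETE FROM Test
              WHERE EXISTS (SELECT *
                              FROM Test AS T1
                             WHERE T1.dup_id = Test.dup_id
                               AND T1.dup_cnt < Test.dup_cnt)""",
    "D2": """DELETE FROM Test
              WHERE dup_cnt > (SELECT MIN(T1.dup_cnt)
                                 FROM Test AS T1
                                WHERE T1.dup_id = Test.dup_id)""",
    "D3-2": """DELETE FROM Test
                WHERE dup_cnt >
                      (SELECT min_dup_cnt
                         FROM (SELECT dup_id, MIN(dup_cnt) AS min_dup_cnt
                                 FROM Test
                                GROUP BY dup_id) AS WorkingTable
                        WHERE WorkingTable.dup_id = Test.dup_id)""",
}


def survivors(delete_sql):
    """Load a fresh copy of Test, run one DELETE, and return what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Test (dup_id INTEGER NOT NULL, dup_cnt INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO Test VALUES (?, ?)", ROWS)
    conn.execute(delete_sql)
    rows = conn.execute(
        "SELECT dup_id, dup_cnt FROM Test ORDER BY dup_id, dup_cnt"
    ).fetchall()
    conn.close()
    return rows


results = {name: survivors(sql) for name, sql in STATEMENTS.items()}
```

Each entry of results holds the rows left by one statement; all three keep only the row with the smallest dup_cnt in each dup_id group.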
Having said all of this, the fastest way to remove redundant duplicates is most often with a CURSOR that does a full table scan.
8.1.3 Deleting Based on Data in a Second Table
The WHERE clause can be as complex as you wish. This means you can have subqueries that use other tables. For example, to remove customers who have paid their bills from the Deadbeats table, you can use a correlated EXISTS predicate, thus:
DELETE FROM Deadbeats
 WHERE EXISTS (SELECT *
                 FROM Payments AS P1
                WHERE Deadbeats.cust_nbr = P1.cust_nbr
                  AND P1.amtpaid >= Deadbeats.amtdue);
The scope rules from SELECT statements also apply to the WHERE clause of a DELETE FROM statement, but it is a good idea to qualify all of the column names.
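To see which customers the correlated EXISTS predicate actually removes, the statement can be run against a few rows. This is a sketch under stated assumptions: SQLite through Python's sqlite3 module, integer amounts, and sample customers invented for the example.

```python
import sqlite3


def clear_paid_deadbeats():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Deadbeats (cust_nbr INTEGER PRIMARY KEY, amtdue INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE Payments (cust_nbr INTEGER NOT NULL, amtpaid INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO Deadbeats VALUES (?, ?)", [(1, 100), (2, 250), (3, 75)]
    )
    # Customer 1 paid in full, customer 2 made two partial payments,
    # and customer 3 has paid nothing at all.
    conn.executemany(
        "INSERT INTO Payments VALUES (?, ?)", [(1, 100), (2, 50), (2, 100)]
    )
    conn.execute(
        """DELETE FROM Deadbeats
            WHERE EXISTS (SELECT *
                            FROM Payments AS P1
                           WHERE Deadbeats.cust_nbr = P1.cust_nbr
                             AND P1.amtpaid >= Deadbeats.amtdue)"""
    )
    left = [r[0] for r in conn.execute("SELECT cust_nbr FROM Deadbeats ORDER BY cust_nbr")]
    conn.close()
    return left
```

Only customer 1 is removed. Note that the predicate tests each payment row on its own, so a customer who settles a debt in several smaller payments stays in Deadbeats under this statement.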
8.1.4 Deleting within the Same Table
SQL allows a DELETE FROM statement to use columns, constants, and aggregate functions drawn from the table itself. For example, it is perfectly all right to remove everyone who is below average in a class with this statement:
DELETE FROM Students
 WHERE grade < (SELECT AVG(grade) FROM Students);
But the DELETE FROM clause does not allow a correlation name on its table, so not all WHERE clauses that could be written as part of a SELECT statement will work in a DELETE FROM statement. For example, a self-join on the working table in a subquery is impossible.
DELETE FROM Personnel AS B1 -- correlation name is INVALID SQL
 WHERE Personnel.boss_nbr = B1.emp_nbr
   AND Personnel.salary > B1.salary;
There are ways to work around this. One trick is to build a VIEW of
the table and use the VIEW instead of a correlation name. Consider the
problem of finding all employees who are now earning more than their
boss and deleting them. The employee table being used has a column for
the employee’s identification number, emp_nbr, and another column for
the boss’s employee identification number, boss_nbr.
CREATE VIEW Bosses
AS SELECT emp_nbr, salary FROM Personnel;
DELETE FROM Personnel
WHERE EXISTS (SELECT *
FROM Bosses AS B1
WHERE Personnel.boss_nbr = B1.emp_nbr
AND Personnel.salary > B1.salary);
Simply using the Personnel table in the subquery will not work. Inside
the subquery, we need an outer reference to the Personnel table named in
the DELETE FROM clause, and we cannot get that if the subquery also names
Personnel, because the inner name hides the outer one. Such views should
be as small as possible, so that the SQL engine can materialize them in
main storage.
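The VIEW workaround can be exercised on a handful of rows. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module; the four employees and their salaries are invented, with employee 1 acting as everyone's boss.

```python
import sqlite3


def fire_overpaid_subordinates():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE Personnel
           (emp_nbr INTEGER PRIMARY KEY,
            boss_nbr INTEGER,              -- NULL for the top boss
            salary INTEGER NOT NULL)"""
    )
    # The view gives the subquery a second name for the same table.
    conn.execute("CREATE VIEW Bosses AS SELECT emp_nbr, salary FROM Personnel")
    conn.executemany(
        "INSERT INTO Personnel VALUES (?, ?, ?)",
        [(1, None, 100), (2, 1, 120), (3, 1, 130), (4, 1, 90)],
    )
    conn.execute(
        """DELETE FROM Personnel
            WHERE EXISTS (SELECT *
                            FROM Bosses AS B1
                           WHERE Personnel.boss_nbr = B1.emp_nbr
                             AND Personnel.salary > B1.salary)"""
    )
    left = [r[0] for r in conn.execute("SELECT emp_nbr FROM Personnel ORDER BY emp_nbr")]
    conn.close()
    return left
```

Employees 2 and 3 earn more than their boss and are removed; the boss and employee 4 remain. Because the subquery reads the table under the name Bosses, the reference to Personnel.boss_nbr is unambiguous.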
Redundant Duplicates in a Table
Redundant duplicates are unneeded copies of a row in a table. You most often get them because you did not put a UNIQUE constraint on the table and then you inserted the same data twice. Removing the extra copies from a table in SQL is much harder than you would think. In fact, if the rows are exact duplicates, you cannot do it with a simple DELETE FROM statement. Removing redundant duplicates involves saving one of them while deleting the other(s). But if SQL has no way to tell them apart, it will delete all rows that were qualified by the WHERE clause. Another problem is that the deletion of a row from a base table can trigger referential actions, which can have unwanted side effects.

For example, if there is a referential integrity constraint that says a deletion in Table1 will cascade and delete matching rows in Table2, removing redundant duplicates from Table1 can leave me with no matching rows in Table2. Yet I still have a referential integrity rule that says there must be at least one match in Table2 for the single row I preserved in Table1. SQL allows constraints to be deferrable or nondeferrable, so you might be able to suspend the referential actions that the transaction below would cause:
BEGIN
INSERT INTO WorkingTable -- use DISTINCT to kill duplicates
SELECT DISTINCT * FROM MessedUpTable;
DELETE FROM MessedUpTable; -- clean out messed-up table
INSERT INTO MessedUpTable -- put working table into it
SELECT * FROM WorkingTable;
DROP TABLE WorkingTable; -- get rid of working table
END;
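The working-table approach can be sketched end to end as follows, assuming SQLite through Python's sqlite3 module. The two-column layout of MessedUpTable and its rows are invented for the example, and an explicit BEGIN and COMMIT stand in for the BEGIN ... END block.

```python
import sqlite3


def dedupe_via_working_table():
    # isolation_level=None: the BEGIN and COMMIT below are issued by hand.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE MessedUpTable (a INTEGER NOT NULL, b TEXT NOT NULL)")
    conn.execute("CREATE TABLE WorkingTable (a INTEGER NOT NULL, b TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO MessedUpTable VALUES (?, ?)",
        [(1, "x"), (1, "x"), (2, "y"), (1, "x"), (2, "y"), (3, "z")],
    )

    conn.execute("BEGIN")
    # DISTINCT keeps exactly one copy of each duplicated row.
    conn.execute("INSERT INTO WorkingTable SELECT DISTINCT * FROM MessedUpTable")
    conn.execute("DELETE FROM MessedUpTable")  # clean out the messed-up table
    conn.execute("INSERT INTO MessedUpTable SELECT * FROM WorkingTable")
    conn.execute("DROP TABLE WorkingTable")  # get rid of the working table
    conn.execute("COMMIT")

    rows = conn.execute("SELECT a, b FROM MessedUpTable ORDER BY a, b").fetchall()
    conn.close()
    return rows
```

Running all four steps inside one transaction means that a failure partway through leaves the original rows in place rather than an emptied table.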
Removal of Redundant Duplicates with ROWID
Leonard C. Medel came up with several interesting ways to delete redundant duplicate rows from a table in an Oracle database.
Let’s assume that we have a table:
CREATE TABLE Personnel
(emp_id INTEGER NOT NULL,
 name CHAR(30) NOT NULL,
 ...);
The classic Oracle “delete dups” solution is the statement:
DELETE FROM Personnel
 WHERE ROWID < (SELECT MAX(P1.ROWID)
                  FROM Personnel AS P1
                 WHERE P1.dup_id = Personnel.dup_id
                   AND P1.name = Personnel.name
                   AND ...);
The column, or more properly pseudo-column, ROWID is based on the physical location of a row in storage. It can change after a user session but not during the session. It is the fastest possible physical access method into an Oracle table, because it goes directly to the physical address of the data. It is also a complete violation of Dr. Codd’s rules, which require that the physical representation of the data be hidden from the users.

Doing a quick test on a 100,000-row table, Mr. Medel achieved a nearly tenfold improvement with these two alternatives. In English, the first alternative is to find the highest ROWID for each group of one or more duplicate rows, and then delete every row except the one with the highest ROWID.
DELETE FROM Personnel
WHERE ROWID
IN (SELECT P2.ROWID
FROM Personnel AS P2,
(SELECT P3.dup_id, P3.name,
MAX(P3.ROWID) AS max_rowid
FROM Personnel AS P3
GROUP BY P3.dup_id, P3.name, ...)
AS P4
WHERE P2.ROWID <> P4.max_rowid
AND P2.dup_id = P4.dup_id
AND P2.name = P4.name);
Notice that the GROUP BY clause needs all the columns in the table. The second approach is to notice that the set of all rows in the table minus the set of rows we want to keep defines the set of rows to delete. This gives us the following statement:
DELETE FROM Personnel
WHERE ROWID
IN (SELECT P2.ROWID
FROM Personnel AS P2
EXCEPT
SELECT MAX(P3.ROWID)
FROM Personnel AS P3
GROUP BY P3.dup_id, P3.name, ...);
Both of these approaches are faster than the short, classic version because they avoid a correlated subquery expression in the WHERE clause.
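The set-difference form can be tried outside Oracle, with a caveat. This sketch assumes SQLite through Python's sqlite3 module, where rowid is an integer key that SQLite assigns to ordinary tables. It is only a stand-in for Oracle's ROWID, which is a physical address, so nothing here bears on Oracle's performance. The two-column table and its rows are invented for the example.

```python
import sqlite3


def dedupe_with_rowid():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Personnel (dup_id INTEGER NOT NULL, name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO Personnel VALUES (?, ?)",
        [(1, "Able"), (1, "Able"), (2, "Baker"), (1, "Able"), (2, "Baker")],
    )
    # All rowids, minus the highest rowid of each duplicate group,
    # gives the set of rows to delete. The GROUP BY lists every column.
    conn.execute(
        """DELETE FROM Personnel
            WHERE rowid IN (SELECT P2.rowid
                              FROM Personnel AS P2
                            EXCEPT
                            SELECT MAX(P3.rowid)
                              FROM Personnel AS P3
                             GROUP BY P3.dup_id, P3.name)"""
    )
    rows = conn.execute(
        "SELECT dup_id, name FROM Personnel ORDER BY dup_id, name"
    ).fetchall()
    conn.close()
    return rows
```

The subquery is not correlated with the outer Personnel table, so it can be evaluated once for the whole statement, which is the property the text credits for the speedup.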
8.1.5 Deleting in Multiple Tables without Referential Integrity
There is no way to directly delete rows from more than one table in a single DELETE FROM statement. There are two approaches to removing related rows from multiple tables. One is to use a temporary table of the deletion values; the other is to use referential integrity actions. For the purposes of this section, let us assume that we have a database with an Orders table and an Inventory table. Our business rule is that when something is out of stock, we delete it from all the orders.

Assume that no referential integrity constraints have been declared at all. First, create a temporary table of the products to be deleted based on your search criteria, then use that table in a correlated subquery to remove rows from each table involved.
CREATE MODULE Foobar
CREATE LOCAL TEMPORARY TABLE Discontinue
(part_nbr INTEGER NOT NULL UNIQUE)
ON COMMIT DELETE ROWS;
PROCEDURE CleanInventory(...)
BEGIN ATOMIC
INSERT INTO Discontinue
SELECT DISTINCT part_nbr -- pick out the items to be removed
  FROM ...
 WHERE ...; -- using whatever criteria you require
DELETE FROM Orders
 WHERE part_nbr IN (SELECT part_nbr FROM Discontinue);
DELETE FROM Inventory
 WHERE part_nbr IN (SELECT part_nbr FROM Discontinue);
COMMIT WORK;
END;
END MODULE;
In the Standard SQL model, the temporary table is persistent in the schema, but its content is not. TEMPORARY tables are always empty at the start of a session, and they always appear to belong only to the user of the session. The GLOBAL option means that each application gets one copy of the table for all the modules, while LOCAL would limit the scope to the module in which it is declared.
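The temporary-table technique can be approximated in a product without SQL modules. The sketch below assumes SQLite through Python's sqlite3 module. SQLite has CREATE TEMP TABLE but neither CREATE MODULE nor ON COMMIT DELETE ROWS, so the table is emptied by hand, and the qty_on_hand column and the "quantity is zero" criterion are hypothetical stand-ins for the elided search condition.

```python
import sqlite3


def clean_inventory():
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE Inventory (part_nbr INTEGER PRIMARY KEY, qty_on_hand INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE Orders (order_nbr INTEGER NOT NULL, part_nbr INTEGER NOT NULL)"
    )
    conn.execute("CREATE TEMP TABLE Discontinue (part_nbr INTEGER NOT NULL UNIQUE)")
    conn.executemany("INSERT INTO Inventory VALUES (?, ?)", [(1, 0), (2, 5), (3, 0)])
    conn.executemany(
        "INSERT INTO Orders VALUES (?, ?)", [(100, 1), (100, 2), (101, 3), (102, 2)]
    )

    conn.execute("BEGIN")
    # Hypothetical criterion: a part with nothing on hand is out of stock.
    conn.execute(
        "INSERT INTO Discontinue SELECT DISTINCT part_nbr FROM Inventory WHERE qty_on_hand = 0"
    )
    conn.execute("DELETE FROM Orders WHERE part_nbr IN (SELECT part_nbr FROM Discontinue)")
    conn.execute("DELETE FROM Inventory WHERE part_nbr IN (SELECT part_nbr FROM Discontinue)")
    conn.execute("DELETE FROM Discontinue")  # stands in for ON COMMIT DELETE ROWS
    conn.execute("COMMIT")

    inventory = conn.execute("SELECT part_nbr, qty_on_hand FROM Inventory ORDER BY part_nbr").fetchall()
    orders = conn.execute("SELECT order_nbr, part_nbr FROM Orders ORDER BY order_nbr, part_nbr").fetchall()
    conn.close()
    return inventory, orders
```

The list of parts is fixed once, before either DELETE runs, so both tables are purged of the same part numbers even though the second DELETE changes the Inventory table from which the list was drawn.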
8.2 INSERT INTO Statement
The INSERT INTO statement is the only way to get new data into a base table. In practice, there are always other tools for loading large amounts of data into a table, but they are very vendor-dependent.
8.2.1 INSERT INTO Clause
The syntax for INSERT INTO is:
<insert statement> ::=
    INSERT INTO <table name>
      <insert columns and source>

<insert columns and source> ::=
    [(<insert column list>)]
      <query expression>
    | VALUES <table value constructor list>
    | DEFAULT VALUES

<table value constructor list> ::=
    <row value constructor> [{<comma> <row value constructor>}...]

<row value constructor> ::=
    <row value constructor element>
    | <left paren> <row value constructor list> <right paren>
    | <row subquery>

<row value constructor list> ::=
    <row value constructor element>
      [{<comma> <row value constructor element>}...]

<row value constructor element> ::=
    <value expression> | NULL | DEFAULT
The two basic forms of an INSERT INTO are a table constant (usually
a single row) insertion and a query insertion. The table constant
insertion is done with a VALUES() clause. The list of insert values usually consists of constants or explicit NULLs, but in theory they could
be almost any expression, including scalar SELECT subqueries.
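The forms of the statement can be compared side by side in a short experiment. This is a sketch, assuming SQLite through Python's sqlite3 module; the table Foo and its column defaults are invented for the illustration. SQLite accepts DEFAULT VALUES for a whole row but not the DEFAULT keyword as a single element of a VALUES list, so that element is not shown.

```python
import sqlite3


def insert_forms():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Foo (a INTEGER DEFAULT 0, b TEXT DEFAULT 'none')")

    # Table constant insertion: a single row, then two rows with an explicit NULL.
    conn.execute("INSERT INTO Foo (a, b) VALUES (1, 'one')")
    conn.execute("INSERT INTO Foo (a, b) VALUES (2, 'two'), (3, NULL)")

    # Every column takes its declared default.
    conn.execute("INSERT INTO Foo DEFAULT VALUES")

    # Query insertion: the rows come from a query expression.
    conn.execute("INSERT INTO Foo (a, b) SELECT a + 10, b FROM Foo WHERE a = 1")

    # A scalar SELECT subquery used as one element of the VALUES list.
    conn.execute("INSERT INTO Foo (a, b) VALUES ((SELECT MAX(a) FROM Foo) + 1, 'next')")

    rows = conn.execute("SELECT a, b FROM Foo ORDER BY a").fetchall()
    conn.close()
    return rows
```

The last statement reads the largest existing value of a, which is 11 at that point, and inserts the next one, showing that an element of the list need not be a constant.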