Trang 1232 CHAPTER 8: TABLE OPERATIONS
CREATE TABLE Foo (col_a CHAR(1) NOT NULL, col_b INTEGER NOT NULL);
INSERT INTO Foo VALUES ('A', 0),('B', 0),('C', 0);
CREATE TABLE Bar (col_a CHAR(1) NOT NULL, col_b INTEGER NOT NULL);
INSERT INTO Bar VALUES ('A', 1), ('A', 2),('B', 1), ('C', 1);
You run this proprietary UPDATE with a FROM clause:
UPDATE Foo SET Foo.col_b = Bar.col_b FROM Foo INNER JOIN Bar
ON Foo.col_a = Bar.col_a;
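The result of the UPDATE cannot be determined. The 'A' row in Foo matches two rows in Bar, so both col_b = 1 and col_b = 2 are candidates, and the value that ends up in the column will depend upon either the order of insertion (if there are no clustered indexes present) or the order of clustering (but only if the cluster is not fragmented).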
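To see why no single answer exists, here is a small Python model (not SQL) of one hypothetical way an engine could execute that UPDATE: it applies the assignment once per joined row, so the last match it happens to scan wins. Real products differ (some keep the first match instead), so the helper `update_from` is an illustrative assumption, not any vendor's algorithm.

```python
from itertools import permutations

# The Foo and Bar sample data from the text: col_a -> col_b, and Bar's rows.
FOO = {"A": 0, "B": 0, "C": 0}
BAR = [("A", 1), ("A", 2), ("B", 1), ("C", 1)]

def update_from(foo, joined_rows):
    """Hypothetical engine: run SET Foo.col_b = Bar.col_b for every joined
    row, in the order given, so the last matching Bar row wins."""
    result = dict(foo)
    for col_a, col_b in joined_rows:
        if col_a in result:          # plays the role of ON Foo.col_a = Bar.col_a
            result[col_a] = col_b
    return result

# Try every order in which the engine might scan Bar and collect the outcomes.
outcomes = {tuple(sorted(update_from(FOO, order).items()))
            for order in permutations(BAR)}
for outcome in sorted(outcomes):
    print(outcome)
```

Two distinct final states appear, differing only in the value stored for 'A', which is exactly the ambiguity described above.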
8.5 MERGE Statement
SQL:2003 added a single statement to mimic a common magnetic tape file system “merge and insert” procedure. The business logic, in pseudocode, is like this:
FOR EACH row IN the Transactions table
DO IF working row NOT IN Master table
   THEN INSERT working row INTO the Master table;
   ELSE UPDATE Master table
        SET Master table columns to the Transactions table values
        WHERE they meet the matching criteria;
   END IF;
END FOR;
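The same logic can be sketched as the classic sequential merge that tape systems used. This is a minimal Python sketch, assuming both files are lists of (key, value) pairs already sorted on the key and that master keys are unique; `tape_merge` is a hypothetical name. A matched key updates the master record, an unmatched key is inserted, and untouched master records are copied through.

```python
def tape_merge(master, transactions):
    """Return a new sorted master file after applying the transactions.

    Both inputs are lists of (key, value) tuples sorted by key. Master keys
    are unique; if several transactions share a key, the last one wins.
    """
    result = []
    i = 0
    for key, value in transactions:
        # Copy forward every master record whose key sorts before this one.
        while i < len(master) and master[i][0] < key:
            result.append(master[i])
            i += 1
        if i < len(master) and master[i][0] == key:
            result.append((key, value))      # matched: UPDATE the master record
            i += 1
        elif result and result[-1][0] == key:
            result[-1] = (key, value)        # repeated transaction key: overwrite
        else:
            result.append((key, value))      # not matched: INSERT a new record
    result.extend(master[i:])                # copy the rest of the master file
    return result

print(tape_merge([(1, "a"), (3, "c"), (5, "e")],
                 [(2, "B"), (3, "C"), (6, "F")]))
```

Because both inputs are sorted, each file is read once from front to back, which is why the tapes had to be sorted on the same key first.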
In the 1950s, we would sort the transaction tape(s) and the Master tape on the same key, read each one looking for a match, and then perform whatever logic was needed. In its simplest form, the MERGE statement looks like this:
MERGE INTO <table name> [AS [<correlation name>]]
USING <table reference> ON <search condition>
{WHEN [NOT] MATCHED [AND <search condition>]
THEN <modification operation>}
[ELSE IGNORE]
You will notice that the use of a correlation name in the MERGE INTO clause is in complete violation of the principle that a correlation name effectively creates a temporary table. There are several other places where SQL:2003 destroyed the original SQL language model, but you do not have to write irregular syntax in all cases.
After a row is matched (or not) to the target table, you can add more <search condition>s in the WHEN clauses. The <modification operation> clause can include insertion, update, or delete operations that follow the same rules as those single statements. This approach can hide complex programming logic in a single statement.
Let’s assume that we have a table of Personnel salary changes at the branch office, called PersonnelChanges. Here is a MERGE statement that will take the contents of the PersonnelChanges table and merge them with the Personnel table. Both of them use emp_nbr as the key. This is a typical, but very simple, use of MERGE INTO:
MERGE INTO Personnel
USING (SELECT emp_nbr, salary, bonus, comm
FROM PersonnelChanges) AS C
ON Personnel.emp_nbr = C.emp_nbr
WHEN MATCHED
THEN UPDATE
SET (Personnel.salary, Personnel.bonus, Personnel.comm) = (C.salary, C.bonus, C.comm)
WHEN NOT MATCHED
THEN INSERT
(Personnel.emp_nbr, Personnel.salary, Personnel.bonus, Personnel.comm)
VALUES (C.emp_nbr, C.salary, C.bonus, C.comm);
Think about it for a minute. If there is a match, then all you can do is update the row. If there is no match, then all you can do is insert the new row.
There are proprietary versions of this statement and other options. In particular, look for the term “upsert” in the literature. These statements are most often used for adding data to a data warehouse.
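As one example of such a proprietary upsert, SQLite (3.24 and later) has no MERGE but offers INSERT ... ON CONFLICT ... DO UPDATE. The sketch below runs the Personnel example through Python's standard sqlite3 module; the table layouts and sample rows are hypothetical, and the conflict target only works because emp_nbr is declared as the PRIMARY KEY.

```python
import sqlite3

def merge_personnel(conn: sqlite3.Connection) -> None:
    # SQLite's upsert: insert each change, or update the row whose emp_nbr
    # already exists. "excluded" names the row that failed to insert.
    # The WHERE 1 is required by SQLite to avoid a parsing ambiguity
    # between a SELECT's join syntax and the ON CONFLICT clause.
    conn.execute("""
        INSERT INTO Personnel (emp_nbr, salary, bonus, comm)
        SELECT emp_nbr, salary, bonus, comm
          FROM PersonnelChanges
         WHERE 1
        ON CONFLICT (emp_nbr) DO UPDATE
           SET salary = excluded.salary,
               bonus  = excluded.bonus,
               comm   = excluded.comm
    """)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Personnel
      (emp_nbr INTEGER PRIMARY KEY, salary INTEGER, bonus INTEGER, comm INTEGER);
    CREATE TABLE PersonnelChanges
      (emp_nbr INTEGER PRIMARY KEY, salary INTEGER, bonus INTEGER, comm INTEGER);
    INSERT INTO Personnel VALUES (1, 1000, 0, 0), (2, 2000, 0, 0);
    INSERT INTO PersonnelChanges VALUES (2, 2500, 100, 0), (3, 1500, 0, 50);
""")
merge_personnel(conn)
rows = conn.execute("SELECT * FROM Personnel ORDER BY emp_nbr").fetchall()
print(rows)   # [(1, 1000, 0, 0), (2, 2500, 100, 0), (3, 1500, 0, 50)]
```

Employee 2 is updated and employee 3 is inserted, mirroring the WHEN MATCHED and WHEN NOT MATCHED branches, but the syntax is SQLite's own and is not portable.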
If you do not have this statement, you can get the same effect from this block of pseudocode:
BEGIN ATOMIC
UPDATE T1
   SET (a, b, c, ...)
       = (SELECT a, b, c, ...
            FROM T2
           WHERE T1.somekey = T2.somekey)
 WHERE EXISTS
       (SELECT *
          FROM T2
         WHERE T1.somekey = T2.somekey);
INSERT INTO T1
SELECT *
  FROM T2
 WHERE NOT EXISTS
       (SELECT *
          FROM T1
         WHERE T1.somekey = T2.somekey);
END;
For performance, do the UPDATE first, then the INSERT INTO. If you do the INSERT INTO first, all the rows just inserted will be affected by the UPDATE as well.
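The UPDATE-then-INSERT block can be tried out in SQLite through Python's sqlite3 module. This is a sketch under assumptions: T1 and T2 get a hypothetical three-column layout (somekey, a, b), a single Python transaction stands in for BEGIN ATOMIC ... END, and the column list is written out per column because row-valued SET is not available everywhere.

```python
import sqlite3

def emulate_merge(conn: sqlite3.Connection) -> None:
    """UPDATE the matched rows first, then INSERT the unmatched ones."""
    with conn:  # commits both statements together, or rolls both back on error
        # Step 1: rows of T1 that have a partner in T2 take T2's values.
        conn.execute("""
            UPDATE T1
               SET a = (SELECT T2.a FROM T2 WHERE T2.somekey = T1.somekey),
                   b = (SELECT T2.b FROM T2 WHERE T2.somekey = T1.somekey)
             WHERE EXISTS (SELECT * FROM T2 WHERE T2.somekey = T1.somekey)
        """)
        # Step 2: rows of T2 with no partner in T1 are inserted.
        conn.execute("""
            INSERT INTO T1 (somekey, a, b)
            SELECT somekey, a, b
              FROM T2
             WHERE NOT EXISTS (SELECT * FROM T1 WHERE T1.somekey = T2.somekey)
        """)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (somekey INTEGER PRIMARY KEY, a TEXT, b TEXT);
    CREATE TABLE T2 (somekey INTEGER PRIMARY KEY, a TEXT, b TEXT);
    INSERT INTO T1 VALUES (1, 'old', 'old'), (2, 'old', 'old');
    INSERT INTO T2 VALUES (2, 'new', 'new'), (3, 'new', 'new');
""")
emulate_merge(conn)
rows = conn.execute("SELECT * FROM T1 ORDER BY somekey").fetchall()
print(rows)   # [(1, 'old', 'old'), (2, 'new', 'new'), (3, 'new', 'new')]
```

Note that the NOT EXISTS subquery must look in T1, the target; looking in T2 would never find a missing row.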
CHAPTER 9
Comparison or Theta Operators
Dr. Codd introduced the term “theta operators” in his early papers to refer to what a programmer would have called comparison predicate operators. The large number of data types in SQL makes doing comparisons a little harder than in other programming languages. Values of one data type have to be promoted to values of the other data type before the comparison can be done. The available data types are implementation- and hardware-dependent, so read the manuals for your product.
The comparison operators are overloaded and will work for <numeric>, <character>, and <datetime> data types. The symbols and meanings for comparison operators are shown in Table 9.1.
Table 9.1 Symbols and Meanings for Comparison Operators
operator  numeric        character                   datetime
==================================================================
<         less than      collates before             earlier than
=         equal          collates equal to           same time as
>         greater than   collates after              later than
<=        at most        collates before or equals   no later than
<>        not equal      not the same as             not the same time as
>=        at least       collates after or equals    no earlier than
You will also see != or ~= for “not equal to” in some older SQL implementations. These symbols are borrowed from the C and PL/I programming languages, respectively, and have never been part of standard SQL. It is a bad habit to use them, since it destroys the portability of your code and makes it harder to read.
9.1 Converting Data Types
Numeric data types are all mutually comparable and mutually assignable. If an assignment will result in a loss of the most significant digits, an exception condition is raised. If the least significant digits are lost, the implementation defines what rounding or truncating has occurred and does not report an exception condition. Most often, one value is converted to the same data type as the other, and then the comparison is done in the usual way. The chosen data type is the “higher” of the two, using the following ordering: SMALLINT, INTEGER, BIGINT, DECIMAL, NUMERIC, REAL, FLOAT, DOUBLE PRECISION.
Floating-point hardware will often affect comparisons for REAL, FLOAT, and DOUBLE PRECISION numbers. There is no good way to avoid this, since it is not always reasonable to use DECIMAL or NUMERIC in their place. A host language will probably use the same floating-point hardware, so at least the errors will be consistent across the application.
CHARACTER and CHARACTER VARYING data types are comparable if and only if they are taken from the same character repertoire. That means that ASCII characters cannot be compared to graphics characters, English cannot be compared to Arabic, and so on. In most implementations, this is not a problem, because the database has only one repertoire.
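Returning to the numeric rules, the promotion ordering and the floating-point caveat can be illustrated in Python. This is only an analogy: `comparison_type` is a hypothetical helper that encodes the ordering quoted in the text, and Python's `float` is IEEE 754 binary floating point, which is assumed (not guaranteed) to match what your SQL product uses for REAL, FLOAT, and DOUBLE PRECISION.

```python
from decimal import Decimal

# The ordering quoted in the text, from "lowest" to "highest".
NUMERIC_ORDER = ["SMALLINT", "INTEGER", "BIGINT", "DECIMAL", "NUMERIC",
                 "REAL", "FLOAT", "DOUBLE PRECISION"]

def comparison_type(t1: str, t2: str) -> str:
    """Pick the type both operands are converted to: the higher of the two."""
    return max(t1, t2, key=NUMERIC_ORDER.index)

# Binary floating point cannot represent 0.1 exactly, so this sum misses 0.3...
float_equal = (0.1 + 0.2 == 0.3)
# ...while exact decimal arithmetic compares the way a person would expect.
decimal_equal = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

print(comparison_type("INTEGER", "REAL"))   # REAL
print(float_equal, decimal_equal)           # False True
```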
The comparison takes the shorter of the two strings and pads it with spaces. The strings are then compared position by position from left to right, using the collating sequence for the repertoire (ASCII or EBCDIC, in most cases).
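The pad-then-compare rule can be sketched in a few lines of Python. The function name is hypothetical, and Python's code-point order stands in for the repertoire's collating sequence, which is an assumption; a real product may use a different collation.

```python
def pad_space_compare(x: str, y: str) -> int:
    """Compare two strings as described above: pad the shorter one with
    spaces, then compare position by position from left to right.
    Returns -1, 0, or 1. Code-point order stands in for the collation."""
    width = max(len(x), len(y))
    x, y = x.ljust(width), y.ljust(width)   # pad with trailing spaces
    for cx, cy in zip(x, y):
        if cx != cy:
            return -1 if cx < cy else 1     # first differing position decides
    return 0

print(pad_space_compare("abc", "abc   "))  # 0: trailing spaces do not matter
print(pad_space_compare("ab", "abc"))      # -1: the padded space sorts before 'c'
```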
Temporal (or <datetime>, as they are called in the standard) data types are mutually assignable only if the source and target of the assignment have the same <datetime> fields. That is, you cannot compare a date and a time.
The CAST() operator can do explicit type conversions before you do a comparison. Table 9.2 shows the valid combinations of source and target data types in Standard SQL. Y means that the combination is syntactically valid without restriction; M indicates that the combination is valid subject to other syntax rules; and N indicates that the combination is not valid. The codes mean yes, maybe, and no in English.
Table 9.2 Valid Combinations of Source and Target Data Types in Standard SQL
<value expr> | <cast target>
             | EN AN VC FC VB FB D  T  TS YM DT
=============+==================================
EN           | Y  Y  Y  Y  N  N  N  N  N  M  M
AN           | Y  Y  Y  Y  N  N  N  N  N  N  N
C            | Y  Y  M  M  Y  Y  Y  Y  Y  Y  Y
B            | N  N  Y  Y  Y  Y  N  N  N  N  N
D            | N  N  Y  Y  N  N  Y  N  Y  N  N
T            | N  N  Y  Y  N  N  N  Y  Y  N  N
TS           | N  N  Y  Y  N  N  Y  Y  Y  N  N
YM           | M  N  Y  Y  N  N  N  N  N  Y  N
DT           | M  N  Y  Y  N  N  N  N  N  N  Y
In Table 9.2,
EN = Exact Numeric
AN = Approximate Numeric
C = Character (Fixed- or Variable-length)
FC = Fixed-length Character
VC = Variable-length Character
B = Bit String (Fixed- or Variable-length)
FB = Fixed-length Bit String
VB = Variable-length Bit String
D = Date
T = Time
TS = Timestamp
YM = Year-Month Interval
DT = Day-Time Interval
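A tool that checks casts ahead of time could encode Table 9.2 as a lookup. This Python sketch transcribes the table row by row; `cast_rule` is a hypothetical helper, and "maybe" still requires checking the other syntax rules the text mentions.

```python
# Target columns of Table 9.2, in order.
TARGETS = ["EN", "AN", "VC", "FC", "VB", "FB", "D", "T", "TS", "YM", "DT"]

# Source type -> codes for each target, transcribed from Table 9.2.
CAST_TABLE = {
    "EN": "Y Y Y Y N N N N N M M",
    "AN": "Y Y Y Y N N N N N N N",
    "C":  "Y Y M M Y Y Y Y Y Y Y",
    "B":  "N N Y Y Y Y N N N N N",
    "D":  "N N Y Y N N Y N Y N N",
    "T":  "N N Y Y N N N Y Y N N",
    "TS": "N N Y Y N N Y Y Y N N",
    "YM": "M N Y Y N N N N N Y N",
    "DT": "M N Y Y N N N N N N Y",
}
MEANING = {"Y": "yes", "M": "maybe", "N": "no"}

def cast_rule(source: str, target: str) -> str:
    """Look up whether CAST(<source> AS <target>) is valid per Table 9.2."""
    codes = CAST_TABLE[source].split()
    return MEANING[codes[TARGETS.index(target)]]

print(cast_rule("TS", "D"))   # yes: a timestamp can be cast to a date
print(cast_rule("D", "T"))    # no: a date cannot become a time
```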
9.2 Row Comparisons in SQL
Standard SQL generalized the theta operators so they would work on row expressions and not just on scalars. This feature is not yet popular, but it is very handy for situations where a key is made from more than one column, and so forth. This makes SQL more orthogonal, and it has an intuitive feel to it. Take three row constants:
A = (10, 20, 30, 40);
B = (10, NULL, 30, 40);
C = (10, NULL, 30, 100);
It seems reasonable to define a row comparison as valid only when the data types of each corresponding column in the rows are union-compatible. If they are not, the comparison is an error and should be reported as one.
It also seems reasonable to define the result of the comparison as the ANDed results of comparing each pair of corresponding columns with the same operator. That is, (A = B) becomes:
((10, 20, 30, 40) = (10, NULL, 30, 40));
becomes:
((10 = 10) AND (20 = NULL) AND (30 = 30) AND (40 = 40))
becomes:
(TRUE AND UNKNOWN AND TRUE AND TRUE);
becomes:
(UNKNOWN);
This seems to be reasonable and conforms to the idea that a NULL is a missing value that we expect to resolve at a future date, so we cannot draw a conclusion about this comparison just yet. Now consider the comparison (A = C), which becomes:
((10, 20, 30, 40) = (10, NULL, 30, 100));
becomes:
((10 = 10) AND (20 = NULL) AND (30 = 30) AND (40 = 100));
becomes:
(TRUE AND UNKNOWN AND TRUE AND FALSE);
becomes:
(FALSE);
There is no way to pick a value for column 2 of row C such that the UNKNOWN result will change to TRUE, because the fourth column is always FALSE. This leaves you with a situation that is not very intuitive. The first case can resolve to TRUE or FALSE, but the second case can only go to FALSE.
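The column-by-column AND used in these two derivations can be replayed in Python. In this sketch `None` stands in for both NULL and UNKNOWN, and the helpers `sql_eq`, `sql_and`, and `naive_row_eq` are hypothetical names for SQL's three-valued `=` and AND.

```python
from functools import reduce
from typing import Optional

Truth = Optional[bool]   # True, False, or None for UNKNOWN

def sql_eq(x, y) -> Truth:
    """Scalar '=': a NULL (None) on either side gives UNKNOWN."""
    if x is None or y is None:
        return None
    return x == y

def sql_and(p: Truth, q: Truth) -> Truth:
    """Three-valued AND: FALSE dominates, then UNKNOWN, then TRUE."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True

def naive_row_eq(rx, ry) -> Truth:
    """Row '=' as the AND of the column-by-column comparisons."""
    return reduce(sql_and, (sql_eq(x, y) for x, y in zip(rx, ry)), True)

A = (10, 20, 30, 40)
B = (10, None, 30, 40)
C = (10, None, 30, 100)
print(naive_row_eq(A, B))   # None, i.e. UNKNOWN
print(naive_row_eq(A, C))   # False: the fourth column settles it
```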
Standard SQL decided that the theta operators would work as shown in the rules below. The expression RX <comp op> RY is shorthand for a row RX compared to a row RY; likewise, RXi means the ith column in the row RX. The results are still TRUE, FALSE, or UNKNOWN, if there is no error in type matching. The rules favor solid tests for TRUE or FALSE, using UNKNOWN as a last resort.
The idea of these rules is the same principle that you would use to compare words alphabetically. As you read the columns from left to right, match them by position and compare each one. This is how it would work if you were alphabetizing words.
The rules are:
1. RX = RY is TRUE if and only if RXi = RYi for all i.
2. RX <> RY is TRUE if and only if RXi <> RYi for some i.
3. RX < RY is TRUE if and only if, for some n, RXi = RYi for all i < n and RXn < RYn.
4. RX > RY is TRUE if and only if, for some n, RXi = RYi for all i < n and RXn > RYn.
5. RX <= RY is TRUE if and only if RX = RY or RX < RY.
6. RX >= RY is TRUE if and only if RX = RY or RX > RY.
7. RX = RY is FALSE if and only if RX <> RY is TRUE.
8. RX <> RY is FALSE if and only if RX = RY is TRUE.
9. RX < RY is FALSE if and only if RX >= RY is TRUE.
10. RX > RY is FALSE if and only if RX <= RY is TRUE.
11. RX <= RY is FALSE if and only if RX > RY is TRUE.
12. RX >= RY is FALSE if and only if RX < RY is TRUE.
13. RX <comp op> RY is UNKNOWN if and only if RX <comp op> RY is neither TRUE nor FALSE.
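Rules 1, 3, 7, 9, and 13 can be turned into a small Python sketch for row equality and row less-than. As before, `None` stands for both NULL and UNKNOWN, both rows are assumed to have the same degree, and `row_eq` and `row_lt` are hypothetical names; the other operators follow from these through rules 2, 4 to 6, and 8 to 12.

```python
from typing import Optional, Sequence

Truth = Optional[bool]   # True, False, or None for UNKNOWN

def row_eq(rx: Sequence, ry: Sequence) -> Truth:
    """Rules 1, 7, 13: FALSE if some pair is known to differ,
    TRUE if every pair is known equal, otherwise UNKNOWN."""
    pairs = list(zip(rx, ry))
    if any(x is not None and y is not None and x != y for x, y in pairs):
        return False
    if any(x is None or y is None for x, y in pairs):
        return None
    return True

def row_lt(rx: Sequence, ry: Sequence) -> Truth:
    """Rules 3, 9, 13, read left to right as in alphabetizing: the first
    unequal pair decides; a NULL before that point makes it UNKNOWN."""
    for x, y in zip(rx, ry):
        if x is None or y is None:
            return None
        if x != y:
            return x < y
    return False   # all columns equal: RX >= RY is TRUE, so RX < RY is FALSE

print(row_eq((10, 20, 30, 40), (10, None, 30, 100)))  # False, as derived above
print(row_lt((1, None, 3), (2, 0, 0)))                # True: column 1 decides
print(row_lt((1, None, 3), (1, 5, 0)))                # None: the NULL matters
```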
The negations are defined so that the NOT operator will still have its usual properties. Notice that a NULL in a row gives an UNKNOWN result in a comparison only when the other columns do not already settle the answer. Consider this expression:
(a, b, c) < (x, y, z)
which becomes:
((a < x)
OR ((a = x) AND (b < y))
OR ((a = x) AND (b = y) AND (c < z)))
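SQLite (3.15 and later) supports row values natively, so the expansion above can be checked against a real engine from Python's sqlite3 module. The helper names are hypothetical; SQLite returns 1, 0, or None (NULL) for TRUE, FALSE, and UNKNOWN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def row_lt_native(a, b, c, x, y, z):
    """(a, b, c) < (x, y, z) using SQLite's own row-value comparison."""
    return conn.execute("SELECT (?, ?, ?) < (?, ?, ?)",
                        (a, b, c, x, y, z)).fetchone()[0]

def row_lt_expanded(a, b, c, x, y, z):
    """The same test written out as the OR of ANDs shown above."""
    sql = """SELECT (:a < :x)
                 OR ((:a = :x) AND (:b < :y))
                 OR ((:a = :x) AND (:b = :y) AND (:c < :z))"""
    return conn.execute(sql, dict(a=a, b=b, c=c, x=x, y=y, z=z)).fetchone()[0]

for args in [(1, 2, 3, 1, 2, 4), (1, None, 3, 2, 0, 0), (1, None, 3, 1, 5, 0)]:
    print(args, row_lt_native(*args), row_lt_expanded(*args))
```

In the second case the first column already decides the answer, so the NULL is harmless; in the third it is not, and both forms return NULL.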
The standard allows a single-row expression of any sort, including a single-row subquery, on either side of a comparison. Likewise, the BETWEEN predicate can use row expressions in any position in Standard SQL.
CHAPTER 10
Valued Predicates
Valued predicates is my term for a set of related unary Boolean predicates that test for the logical value or NULL value of their operands.
IS NULL has always been part of SQL, but the logical IS predicate was new to SQL-92 and is not well implemented at this time.
10.1 IS NULL Predicate
The IS NULL predicate is a test for a NULL value in a column with the syntax:
<null predicate> ::= <row value constructor> IS [NOT] NULL
It is the only way to test whether an expression is NULL, and it has been in SQL-86 and all later versions of the standard. The SQL-92 standard extended it to accept a <row value constructor> instead of a single column or scalar expression, as we saw in Section 9.2. This extended version will start showing up in implementations when other row expressions are allowed. If all the values in row R are the NULL value, then R IS NULL is TRUE; otherwise, it is FALSE. If none of the values in R is the NULL value, R IS NOT NULL is TRUE; otherwise, it is FALSE. The case where the row is a mix of NULL and