Joe Celko's SQL for Smarties - Advanced SQL Programming

232 CHAPTER 8: TABLE OPERATIONS

CREATE TABLE Foo (col_a CHAR(1) NOT NULL, col_b INTEGER NOT NULL);

INSERT INTO Foo VALUES ('A', 0),('B', 0),('C', 0);

CREATE TABLE Bar (col_a CHAR(1) NOT NULL, col_b INTEGER NOT NULL);

INSERT INTO Bar VALUES ('A', 1), ('A', 2),('B', 1), ('C', 1);

You run this proprietary UPDATE with a FROM clause:

UPDATE Foo
   SET Foo.col_b = Bar.col_b
  FROM Foo INNER JOIN Bar
    ON Foo.col_a = Bar.col_a;

The result of the UPDATE cannot be determined. Row 'A' in Foo matches two rows in Bar, so its col_b may end up as either 1 or 2. The value of the column will depend upon either the order of insertion (if there are no clustered indexes present) or the order of clustering (but only if the cluster is not fragmented).
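One portable way to avoid the ambiguity is to state explicitly which Bar row should win. The sketch below runs the Foo and Bar tables through Python's sqlite3 module, used here only as a convenient harness, and replaces the join with a scalar subquery. Choosing MAX() as the tie-breaking rule and the function name deterministic_update are assumptions made for illustration; any aggregate or filter that yields one value per key would serve.

```python
import sqlite3

def deterministic_update():
    """Update Foo from Bar with an explicit rule for duplicate matches."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Foo (col_a CHAR(1) NOT NULL, col_b INTEGER NOT NULL);
        INSERT INTO Foo VALUES ('A', 0), ('B', 0), ('C', 0);
        CREATE TABLE Bar (col_a CHAR(1) NOT NULL, col_b INTEGER NOT NULL);
        INSERT INTO Bar VALUES ('A', 1), ('A', 2), ('B', 1), ('C', 1);
    """)
    # Assumption: when several Bar rows match, the largest col_b wins.
    conn.execute("""
        UPDATE Foo
           SET col_b = (SELECT MAX(Bar.col_b)
                          FROM Bar
                         WHERE Bar.col_a = Foo.col_a)
         WHERE EXISTS (SELECT *
                         FROM Bar
                        WHERE Bar.col_a = Foo.col_a)
    """)
    rows = conn.execute(
        "SELECT col_a, col_b FROM Foo ORDER BY col_a").fetchall()
    conn.close()
    return rows
```

Because the subquery returns exactly one value per key, the result no longer depends on physical row order.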

SQL:2003 added a single statement to mimic a common magnetic tape file system “merge and insert” procedure. The business logic, in pseudocode, is like this:

FOR EACH row IN the Transactions table
DO IF working row NOT IN Master table
   THEN INSERT working row INTO the Master table;
   ELSE UPDATE Master table
           SET Master table columns to the Transactions table values
         WHERE they meet a matching criteria;
   END IF;
END FOR;
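The row-at-a-time logic above can be sketched in a few lines of Python. This is only an illustration of the control flow, not of how a database executes MERGE; the function name merge_rows and the use of a dictionary keyed on the matching column are assumptions of the sketch.

```python
def merge_rows(master, transactions):
    """Apply each transaction row to master: insert if absent, else update.

    Both arguments are hypothetical in-memory stand-ins for tables:
    master maps a key to a dict of column values, and transactions is a
    list of (key, dict of column values) pairs.
    """
    for key, values in transactions:
        if key not in master:
            master[key] = dict(values)   # INSERT the working row
        else:
            master[key].update(values)   # UPDATE the matching row
    return master
```

For example, merging [(1, {"salary": 150}), (2, {"salary": 200})] into {1: {"salary": 100}} updates key 1 and inserts key 2.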

8.5 MERGE Statement 233

In the 1950s, we would sort the transaction tape(s) and Master tape on the same key, read each one looking for a match, then perform whatever logic was needed. In its simplest form, the MERGE statement looks like this:

MERGE INTO <table name> [[AS] <correlation name>]
USING <table reference> ON <search condition>
{WHEN [NOT] MATCHED [AND <search condition>]
 THEN <modification operation>} ...
[ELSE IGNORE]

You will notice that the use of a correlation name in the MERGE INTO clause is in complete violation of the principle that a correlation name effectively creates a temporary table. There are several other places where SQL:2003 destroyed the original SQL language model, but you do not have to write irregular syntax in all cases.

After a row is matched (or not) to the target table, you can add more <search condition>s in the WHEN clauses. The <modification operation> clause can include insertion, update, or delete operations that follow the same rules as those single statements. This approach can hide complex programming logic in a single statement.

Let’s assume that we have a table of Personnel salary changes at the branch office in a table called PersonnelChanges. Here is a MERGE statement that will take the contents of the PersonnelChanges table and merge them with the Personnel table. Both of them use emp_nbr as the key. This is a typical, but very simple, use of MERGE INTO:

MERGE INTO Personnel
USING (SELECT emp_nbr, salary, bonus, comm
         FROM PersonnelChanges) AS C
   ON Personnel.emp_nbr = C.emp_nbr
 WHEN MATCHED
 THEN UPDATE
      SET (Personnel.salary, Personnel.bonus, Personnel.comm)
          = (C.salary, C.bonus, C.comm)
 WHEN NOT MATCHED
 THEN INSERT
      (Personnel.emp_nbr, Personnel.salary, Personnel.bonus, Personnel.comm)
      VALUES (C.emp_nbr, C.salary, C.bonus, C.comm);

Think about it for a minute. If there is a match, then all you can do is update the row. If there is no match, then all you can do is insert the new row.

There are proprietary versions of this statement and other options. In particular, look for the term “upsert” in the literature. These statements are most often used for adding data to a data warehouse.

If you do not have this statement, you can get the same effect from this block of pseudocode:

BEGIN ATOMIC
UPDATE T1
   SET (a, b, c, ...)
       = (SELECT a, b, c, ...
            FROM T2
           WHERE T1.somekey = T2.somekey)
 WHERE EXISTS
       (SELECT *
          FROM T2
         WHERE T1.somekey = T2.somekey);

INSERT INTO T1
SELECT *
  FROM T2
 WHERE NOT EXISTS
       (SELECT *
          FROM T1
         WHERE T1.somekey = T2.somekey);
END;

For performance, first do the UPDATE, then the INSERT INTO. If you INSERT INTO first, all rows just inserted will be affected by the UPDATE as well.
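The UPDATE-then-INSERT pattern can be tried out on the Personnel example with Python's sqlite3 module. What follows is a sketch under several assumptions: the column types, the sample rows, and the names merge_personnel and demo are invented for illustration, and each column is set from its own scalar subquery rather than with a row constructor so that the statement runs on older SQLite versions.

```python
import sqlite3

def merge_personnel(conn):
    """Emulate MERGE: UPDATE the matches, then INSERT the non-matches."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("""
            UPDATE Personnel
               SET salary = (SELECT C.salary FROM PersonnelChanges AS C
                              WHERE C.emp_nbr = Personnel.emp_nbr),
                   bonus  = (SELECT C.bonus FROM PersonnelChanges AS C
                              WHERE C.emp_nbr = Personnel.emp_nbr),
                   comm   = (SELECT C.comm FROM PersonnelChanges AS C
                              WHERE C.emp_nbr = Personnel.emp_nbr)
             WHERE EXISTS (SELECT * FROM PersonnelChanges AS C
                            WHERE C.emp_nbr = Personnel.emp_nbr)
        """)
        conn.execute("""
            INSERT INTO Personnel (emp_nbr, salary, bonus, comm)
            SELECT C.emp_nbr, C.salary, C.bonus, C.comm
              FROM PersonnelChanges AS C
             WHERE NOT EXISTS (SELECT * FROM Personnel AS P
                                WHERE P.emp_nbr = C.emp_nbr)
        """)

def demo():
    """Build two small tables (hypothetical data) and merge them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Personnel (emp_nbr INTEGER PRIMARY KEY,
            salary INTEGER, bonus INTEGER, comm INTEGER);
        CREATE TABLE PersonnelChanges (emp_nbr INTEGER PRIMARY KEY,
            salary INTEGER, bonus INTEGER, comm INTEGER);
        INSERT INTO Personnel VALUES (1, 100, 10, 1), (2, 200, 20, 2);
        INSERT INTO PersonnelChanges VALUES (2, 250, 25, 3), (3, 300, 30, 4);
    """)
    merge_personnel(conn)
    rows = conn.execute(
        "SELECT * FROM Personnel ORDER BY emp_nbr").fetchall()
    conn.close()
    return rows
```

In this sample data, employee 1 is untouched, employee 2 is updated, and employee 3 is inserted.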

CHAPTER 9: Comparison or Theta Operators

Dr. Codd introduced the term “theta operators” in his early papers to refer to what a programmer would have called comparison predicate operators. The large number of data types in SQL makes doing comparisons a little harder than in other programming languages. Values of one data type have to be promoted to values of the other data type before the comparison can be done. The available data types are implementation- and hardware-dependent, so read the manuals for your product.

The comparison operators are overloaded and will work for <numeric>, <character>, and <datetime> data types. The symbols and meanings for comparison operators are shown in Table 9.1.

Table 9.1 Symbols and Meanings for Comparison Operators

operator   numeric        character                     datetime
===========================================================================
<        : less than      (collates before)             (earlier than)
=        : equal          (collates equal to)           (same time as)
>        : greater than   (collates after)              (later than)
<=       : at most        (collates before or equals)   (no later than)
<>       : not equal      (not the same as)             (not the same time as)
>=       : at least       (collates after or equals)    (no earlier than)

You will also see != or ~= for “not equal to” in some older SQL implementations. These symbols are borrowed from the C and PL/I programming languages, respectively, and have never been part of standard SQL. It is a bad habit to use them, since it destroys the portability of your code and makes it harder to read.

9.1 Converting Data Types

Numeric data types are all mutually comparable and mutually assignable. If an assignment will result in a loss of the most significant digits, an exception condition is raised. If the least significant digits are lost, the implementation defines what rounding or truncating has occurred and does not report an exception condition. Most often, one value is converted to the same data type as the other, and then the comparison is done in the usual way. The chosen data type is the “higher” of the two, using the following ordering: SMALLINT, INTEGER, BIGINT, DECIMAL, NUMERIC, REAL, FLOAT, DOUBLE PRECISION.

Floating-point hardware will often affect comparisons for REAL, FLOAT, and DOUBLE PRECISION numbers. There is no good way to avoid this, since it is not always reasonable to use DECIMAL or NUMERIC in their place. A host language will probably use the same floating-point hardware, so at least errors will be constant across the application.

CHARACTER and CHARACTER VARYING data types are comparable if and only if they are taken from the same character repertoire. That means that ASCII characters cannot be compared to graphics characters, English cannot be compared to Arabic, and so on. In most implementations, this is not a problem, because the database has only one repertoire.

The comparison takes the shorter of the two strings and pads it with spaces. The strings are then compared position by position from left to right, using the collating sequence for the repertoire (ASCII or EBCDIC, in most cases).
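The pad-and-compare procedure just described can be sketched in Python. The function name pad_space_compare is hypothetical, and Python's code-point ordering stands in for the collating sequence, which is an assumption; a real product applies whatever collation is in effect.

```python
def pad_space_compare(left, right):
    """Compare two strings after padding the shorter one with spaces.

    Returns -1, 0, or 1. Python's code-point order stands in for the
    collating sequence, which is an assumption of this sketch.
    """
    width = max(len(left), len(right))
    a = left.ljust(width)    # pad on the right with spaces
    b = right.ljust(width)
    for ch_a, ch_b in zip(a, b):   # position by position, left to right
        if ch_a != ch_b:
            return -1 if ch_a < ch_b else 1
    return 0
```

Under this rule 'abc' and 'abc  ' compare as equal, because the trailing spaces only fill out the shorter string.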

Temporal (or <datetime>, as they are called in the standard) data types are mutually assignable only if the source and target of the assignment have the same <datetime> fields. That is, you cannot compare a date and a time.

The CAST() operator can do explicit type conversions before you do a comparison. Table 9.2 shows the valid combinations of source and target data types in Standard SQL. Y means that the combination is syntactically valid without restriction; M indicates that the combination is valid subject to other syntax rules; and N indicates that the combination is not valid. The codes mean yes, maybe, and no in English.

Table 9.2 Valid Combinations of Source and Target Data Types in Standard SQL

<value expr> | EN  AN  VC  FC  VB  FB  D   T   TS  YM  DT
===============================================================
EN           | Y   Y   Y   Y   N   N   N   N   N   M   M
AN           | Y   Y   Y   Y   N   N   N   N   N   N   N
C            | Y   Y   M   M   Y   Y   Y   Y   Y   Y   Y
B            | N   N   Y   Y   Y   Y   N   N   N   N   N
D            | N   N   Y   Y   N   N   Y   N   Y   N   N
T            | N   N   Y   Y   N   N   N   Y   Y   N   N
TS           | N   N   Y   Y   N   N   Y   Y   Y   N   N
YM           | M   N   Y   Y   N   N   N   N   N   Y   N
DT           | M   N   Y   Y   N   N   N   N   N   N   Y

In Table 9.2,

EN = Exact Numeric

AN = Approximate Numeric

C = Character (Fixed- or Variable-length)

FC = Fixed-length Character

VC = Variable-length Character

B = Bit String (Fixed- or Variable-length)

FB = Fixed-length Bit String

VB = Variable-length Bit String

D = Date

T = Time

TS = Timestamp

YM = Year-Month Interval

DT = Day-Time Interval
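Table 9.2 is easy to encode as a lookup, which can be handy when checking a batch of CAST() expressions by hand. The sketch below transcribes the rows exactly as printed; the names TARGETS, CAST_TABLE, and cast_validity are hypothetical, and a given product may accept or reject combinations differently from the standard.

```python
# Target type codes, in the column order of Table 9.2.
TARGETS = ["EN", "AN", "VC", "FC", "VB", "FB", "D", "T", "TS", "YM", "DT"]

# Rows transcribed from Table 9.2: source type -> one code per target type.
CAST_TABLE = {
    "EN": "YYYYNNNNNMM",
    "AN": "YYYYNNNNNNN",
    "C":  "YYMMYYYYYYY",
    "B":  "NNYYYYNNNNN",
    "D":  "NNYYNNYNYNN",
    "T":  "NNYYNNNYYNN",
    "TS": "NNYYNNYYYNN",
    "YM": "MNYYNNNNNYN",
    "DT": "MNYYNNNNNNY",
}

def cast_validity(source, target):
    """Return 'Y', 'M', or 'N' for casting a source type to a target type."""
    return CAST_TABLE[source][TARGETS.index(target)]
```

For instance, cast_validity("D", "T") gives 'N', matching the rule that a date cannot be converted to a time.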

9.2 Row Comparisons in SQL

Standard SQL generalized the theta operators so they would work on row expressions and not just on scalars. This feature is not yet popular, but it is very handy for situations where a key is made from more than one column, and so forth. This makes SQL more orthogonal, and it has an intuitive feel to it. Take three row constants:

A = (10, 20, 30, 40);

B = (10, NULL, 30, 40);

C = (10, NULL, 30, 100);

It seems reasonable to define a row comparison as valid only when the data types of each corresponding column in the rows are union-compatible. If not, the operation is an error and should report a warning.

It also seems reasonable to define the result of the comparison as the ANDed results of each corresponding column using the same operator. That is, (A = B) becomes:

((10, 20, 30, 40) = (10, NULL, 30, 40));

becomes:

((10 = 10) AND (20 = NULL) AND (30 = 30) AND (40 = 40))

becomes:

(TRUE AND UNKNOWN AND TRUE AND TRUE);

becomes:

(UNKNOWN);

This seems to be reasonable and conforms to the idea that a NULL is a missing value that we expect to resolve at a future date, so we cannot draw a conclusion about this comparison just yet. Now consider the comparison (A = C), which becomes:

((10, 20, 30, 40) = (10, NULL, 30, 100));

becomes:

((10 = 10) AND (20 = NULL) AND (30 = 30) AND (40 = 100));

becomes:

(TRUE AND UNKNOWN AND TRUE AND FALSE);

becomes:

(FALSE);

There is no way to pick a value for column 2 of row C such that the UNKNOWN result will change to TRUE, because the fourth column is always FALSE. This leaves you with a situation that is not very intuitive. The first case can resolve to TRUE or FALSE, but the second case can only go to FALSE.
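The two reductions above can be replayed with a small model of SQL's three-valued logic. In this sketch None stands for both NULL and UNKNOWN, and the helper names tv_and, tv_eq, and row_eq_anded are hypothetical; it models only the column-by-column AND definition discussed here, not any product's implementation.

```python
def tv_and(p, q):
    """Three-valued AND; None stands for UNKNOWN."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True

def tv_eq(x, y):
    """Scalar equality; a NULL (None) operand gives UNKNOWN."""
    if x is None or y is None:
        return None
    return x == y

def row_eq_anded(rx, ry):
    """AND together the column-by-column equality results."""
    result = True
    for x, y in zip(rx, ry):
        result = tv_and(result, tv_eq(x, y))
    return result

A = (10, 20, 30, 40)
B = (10, None, 30, 40)
C = (10, None, 30, 100)
```

Here row_eq_anded(A, B) yields None (UNKNOWN), while row_eq_anded(A, C) yields False, because a single FALSE term dominates the AND regardless of the UNKNOWN.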

Standard SQL decided that the theta operators would work as shown in the rules below. The expression RX <comp op> RY is shorthand for a row RX compared to a row RY; likewise, RXi means the ith column in the row RX. The results are still TRUE, FALSE, or UNKNOWN, if there is no error in type matching. The rules favor solid tests for TRUE or FALSE, using UNKNOWN as a last resort.

The idea of these rules is the same principle that you would use to compare words alphabetically. As you read the columns from left to right, match them by position and compare each one. This is how it would work if you were alphabetizing words.

The rules are:

1. RX = RY is TRUE if and only if RXi = RYi for all i.

2. RX <> RY is TRUE if and only if RXi <> RYi for some i.

3. RX < RY is TRUE if and only if RXi = RYi for all i < n and RXn < RYn for some n.

4. RX > RY is TRUE if and only if RXi = RYi for all i < n and RXn > RYn for some n.

5. RX <= RY is TRUE if and only if RX = RY or RX < RY.

6. RX >= RY is TRUE if and only if RX = RY or RX > RY.

7. RX = RY is FALSE if and only if RX <> RY is TRUE.

8. RX <> RY is FALSE if and only if RX = RY is TRUE.

9. RX < RY is FALSE if and only if RX >= RY is TRUE.

10. RX > RY is FALSE if and only if RX <= RY is TRUE.

11. RX <= RY is FALSE if and only if RX > RY is TRUE.

12. RX >= RY is FALSE if and only if RX < RY is TRUE.

13. RX <comp op> RY is UNKNOWN if and only if RX <comp op> RY is neither TRUE nor FALSE.

The negations are defined so that the NOT operator will still have its usual properties. Notice that a NULL in a row may give an UNKNOWN result in a comparison. Consider this expression:

(a, b, c) < (x, y, z)

which becomes:

((a < x)

OR ((a = x) AND (b < y))

OR ((a = x) AND (b = y) AND (c < z)))
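The expansion above generalizes to rows of any width. The following Python sketch evaluates it with three-valued logic, using None for NULL and UNKNOWN; the function names are hypothetical, and the code follows the OR-of-ANDs expansion shown here rather than any particular product's evaluation order.

```python
import operator

def tv_and(p, q):
    """Three-valued AND; None stands for UNKNOWN."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True

def tv_or(p, q):
    """Three-valued OR; None stands for UNKNOWN."""
    if p is True or q is True:
        return True
    if p is None or q is None:
        return None
    return False

def tv_cmp(op, x, y):
    """Scalar comparison; a NULL (None) operand gives UNKNOWN."""
    if x is None or y is None:
        return None
    return op(x, y)

def row_lt(rx, ry):
    """Evaluate rx < ry as the left-to-right expansion shown above."""
    result = False
    prefix_equal = True   # AND of (RXi = RYi) over all earlier columns
    for x, y in zip(rx, ry):
        term = tv_and(prefix_equal, tv_cmp(operator.lt, x, y))
        result = tv_or(result, term)
        prefix_equal = tv_and(prefix_equal, tv_cmp(operator.eq, x, y))
    return result
```

Note that row_lt((1, None, 3), (2, 0, 0)) is True: the first column already decides the comparison, so the NULL in the second column never matters. By contrast, row_lt((1, None, 3), (1, 5, 0)) is None, because the outcome hinges on the NULL.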

The standard allows a single-row expression of any sort, including a single-row subquery, on either side of a comparison. Likewise, the BETWEEN predicate can use row expressions in any position in Standard SQL.

CHAPTER 10: Valued Predicates

“Valued predicates” is my term for a set of related unary Boolean predicates that test for the logical value or NULL value of their operands.

IS NULL has always been part of SQL, but the logical IS predicate was new to SQL-92, and is not well implemented at this time.

10.1 IS NULL Predicate

The IS NULL predicate is a test for a NULL value in a column with the syntax:

<null predicate> ::= <row value constructor> IS [NOT] NULL

It is the only way to test to see if an expression is NULL or not, and it has been in SQL-86 and all later versions of the standard. The SQL-92 standard extended it to accept <row value constructor>, instead of a single column or scalar expression, as we saw in Section 9.2. This extended version will start showing up in implementations when other row expressions are allowed. If all the values in row R are the NULL value, then R IS NULL is TRUE; otherwise, it is FALSE. If none of the values in R are the NULL value, R IS NOT NULL is TRUE; otherwise, it is FALSE. The case where the row is a mix of NULL and
