Joe Celko's SQL for Smarties: Advanced SQL Programming


However, I could write:

SELECT sup_id, sup_name, (SELECT COUNT(*) FROM Orders WHERE Suppliers.sup_id = Orders.sup_id) FROM Suppliers;

instead of writing:

SELECT Suppliers.sup_id, sup_name, COUNT(Orders.sup_id)
FROM Suppliers LEFT OUTER JOIN Orders
ON Suppliers.sup_id = Orders.sup_id
GROUP BY Suppliers.sup_id, sup_name;
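To check that the two forms agree, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in SQL engine. The supplier and order rows are hypothetical. It also shows why the outer-join form has to count a column from Orders: COUNT(*) counts the NULL-padded row of a supplier with no orders as 1, while the scalar subquery returns 0.

```python
import sqlite3

# Hypothetical sample data: supplier 'C' has no orders at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers (sup_id CHAR(2) NOT NULL PRIMARY KEY,
                        sup_name CHAR(10) NOT NULL);
CREATE TABLE Orders (order_nbr INTEGER NOT NULL PRIMARY KEY,
                     sup_id CHAR(2),
                     order_amt INTEGER);
INSERT INTO Suppliers VALUES ('A', 'Acme'), ('B', 'Bolt'), ('C', 'Cogs');
INSERT INTO Orders VALUES (1, 'A', 100), (2, 'A', 50), (3, 'B', 75);
""")

# Scalar subquery in the SELECT list: an empty set of orders counts as 0.
scalar = conn.execute("""
    SELECT sup_id, sup_name,
           (SELECT COUNT(*) FROM Orders
             WHERE Suppliers.sup_id = Orders.sup_id)
      FROM Suppliers
     ORDER BY sup_id;
""").fetchall()


def outer_join_counts(count_expr):
    """Run the LEFT OUTER JOIN form with the given COUNT expression."""
    return conn.execute(f"""
        SELECT Suppliers.sup_id, sup_name, {count_expr}
          FROM Suppliers LEFT OUTER JOIN Orders
               ON Suppliers.sup_id = Orders.sup_id
         GROUP BY Suppliers.sup_id, sup_name
         ORDER BY Suppliers.sup_id;
    """).fetchall()


# COUNT(column) skips the generated NULL; COUNT(*) counts the padded row.
joined = outer_join_counts("COUNT(Orders.sup_id)")
joined_star = outer_join_counts("COUNT(*)")

print(scalar)       # [('A', 'Acme', 2), ('B', 'Bolt', 1), ('C', 'Cogs', 0)]
print(joined)       # same rows as scalar
print(joined_star)  # [('A', 'Acme', 2), ('B', 'Bolt', 1), ('C', 'Cogs', 1)]
```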

17.2.2 NULLs and OUTER JOINs

The NULLs generated by the OUTER JOIN can occur in columns derived from source table columns that have been declared to be NOT NULL. Even if you tried to avoid all the problems with NULLs by making every column in every table of your database schema NOT NULL, they could still occur in OUTER JOIN and OLAP function results. However, a table can have NULLs and still be used in an OUTER JOIN. Consider different JOINs on the following two tables, which have NULLs in the common column:

T1              T2
a    x          b    x
=========       =========
1    'r'        7    'r'
2    'v'        8    's'
3    NULL       9    NULL

A natural INNER JOIN on column x can only match those values that are equal to each other. But NULLs do not match anything, not even other NULLs. Thus, there is one row in the result, on the value 'r' in column x in both tables.


T1 INNER JOIN T2 ON (T1.x = T2.x)

a T1.x b T2.x

========================

1 'r' 7 'r'

Now do a LEFT OUTER JOIN on the tables, which will preserve table T1, and you get:

T1 LEFT OUTER JOIN T2 ON (T1.x = T2.x)

a T1.x b T2.x

===========================

1 'r' 7 'r'

2 'v' NULL NULL

3 NULL NULL NULL
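Both results can be reproduced with a small sketch, again using Python's sqlite3 module as a stand-in engine with the two sample tables above. The first query confirms that NULL = NULL is not TRUE (SQLite reports the UNKNOWN result as NULL, which Python shows as None), which is why the NULL rows never pair up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (a INTEGER, x CHAR(1));
CREATE TABLE T2 (b INTEGER, x CHAR(1));
INSERT INTO T1 VALUES (1, 'r'), (2, 'v'), (3, NULL);
INSERT INTO T2 VALUES (7, 'r'), (8, 's'), (9, NULL);
""")

# Comparing two NULLs yields UNKNOWN, returned to Python as None.
null_eq = conn.execute("SELECT NULL = NULL;").fetchone()[0]

inner = conn.execute("""
    SELECT T1.a, T1.x, T2.b, T2.x
      FROM T1 INNER JOIN T2 ON (T1.x = T2.x)
     ORDER BY T1.a;
""").fetchall()

# T1 is preserved: unmatched T1 rows come back padded with NULLs.
left = conn.execute("""
    SELECT T1.a, T1.x, T2.b, T2.x
      FROM T1 LEFT OUTER JOIN T2 ON (T1.x = T2.x)
     ORDER BY T1.a;
""").fetchall()

print(null_eq)  # None
print(inner)    # [(1, 'r', 7, 'r')]
print(left)     # [(1, 'r', 7, 'r'), (2, 'v', None, None), (3, None, None, None)]
```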

Again, there are no surprises. The original INNER JOIN row is still in the results. The other two rows of T1 that were not in the equi-JOIN do show up in the results, and the columns derived from table T2 are filled with NULLs. The RIGHT OUTER JOIN would behave the same way, with the roles of the tables reversed. The problems start with the FULL OUTER JOIN, which looks like this:

T1 FULL OUTER JOIN T2 ON (T1.x = T2.x)

a T1.x b T2.x

========================

1 'r' 7 'r'

2 'v' NULL NULL

3 NULL NULL NULL

NULL NULL 8 's'

NULL NULL 9 NULL

The way this result is constructed is worth explaining in detail.

First, do an INNER JOIN on T1 and T2, using the ON clause condition, and put those rows (if any) in the results. Then all rows in T1 that could not be joined are padded out with NULLs in the columns derived from T2 and inserted into the results. Finally, take the rows in T2 that could not be joined, pad them out with NULLs, and insert them into the results.

The bad news is that the original tables cannot be reconstructed from an OUTER JOIN. Look at the results of the FULL OUTER JOIN, which we will call R1, and SELECT the first two columns from it:


SELECT T1.a, T1.x FROM R1

a     x
=========
1     'r'
2     'v'
3     NULL
NULL  NULL
NULL  NULL

The created NULLs remain and cannot be differentiated from the original NULLs. But you cannot throw out those duplicate rows, because they may be in the original table T1.
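The three-step construction of R1 can be written out as a UNION ALL of three queries, which is a common way to get a FULL OUTER JOIN from a product that lacks one. This is a sketch in Python's sqlite3 module, not the Standard's definition; projecting the T1 columns back out of R1 then shows the two extra all-NULL rows.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (a INTEGER, x CHAR(1));
CREATE TABLE T2 (b INTEGER, x CHAR(1));
INSERT INTO T1 VALUES (1, 'r'), (2, 'v'), (3, NULL);
INSERT INTO T2 VALUES (7, 'r'), (8, 's'), (9, NULL);
""")

R1 = conn.execute("""
    -- Step 1: the INNER JOIN rows.
    SELECT T1.a, T1.x, T2.b, T2.x
      FROM T1 INNER JOIN T2 ON T1.x = T2.x
    UNION ALL
    -- Step 2: T1 rows with no match, padded with NULLs for the T2 columns.
    SELECT T1.a, T1.x, NULL, NULL
      FROM T1
     WHERE NOT EXISTS (SELECT * FROM T2 WHERE T1.x = T2.x)
    UNION ALL
    -- Step 3: T2 rows with no match, padded with NULLs for the T1 columns.
    SELECT NULL, NULL, T2.b, T2.x
      FROM T2
     WHERE NOT EXISTS (SELECT * FROM T1 WHERE T1.x = T2.x);
""").fetchall()

# Project the T1 columns out of R1, keeping duplicates (a multiset).
projection = Counter((a, x) for (a, x, _b, _x2) in R1)

print(len(R1))                   # 5
print(projection[(None, None)])  # 2
```

Nothing in the projected rows tells the two generated (NULL, NULL) rows apart from rows that might have been stored in T1.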

17.2.3 NATURAL versus Searched OUTER JOINs

It is worth mentioning in passing that Standard SQL has a NATURAL LEFT OUTER JOIN, but it is not implemented in most current versions of SQL. Even those that have the syntax are actually creating an ON clause with equality tests, like the examples we have been using in this chapter.

A NATURAL JOIN has only one copy of the common column pairs in its result. The searched OUTER JOIN has both of the original columns, with their table-qualified names. The NATURAL JOIN has to have a correlation name for the result table to identify the shared columns. We can build a NATURAL LEFT OUTER JOIN by using the COALESCE() function to combine the common column pairs into a single column and put the results into a VIEW where the columns can be properly named, thus:

CREATE VIEW NLOJ12 (x, a, b)

AS SELECT COALESCE(T1.x, T2.x), T1.a, T2.b FROM T1 LEFT OUTER JOIN T2 ON T1.x = T2.x;

NLOJ12

x     a    b
==============
'r'   1    7
'v'   2    NULL
NULL  3    NULL
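As a rough check, the sketch below creates the view in SQLite (through Python's sqlite3 module) and compares it with SQLite's own NATURAL LEFT OUTER JOIN, which that product does accept. Both return the same rows, and SELECT * on the NATURAL JOIN lists the shared column x only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (a INTEGER, x CHAR(1));
CREATE TABLE T2 (b INTEGER, x CHAR(1));
INSERT INTO T1 VALUES (1, 'r'), (2, 'v'), (3, NULL);
INSERT INTO T2 VALUES (7, 'r'), (8, 's'), (9, NULL);

-- Searched LEFT OUTER JOIN, with the common columns merged by COALESCE().
CREATE VIEW NLOJ12 (x, a, b)
AS SELECT COALESCE(T1.x, T2.x), T1.a, T2.b
     FROM T1 LEFT OUTER JOIN T2 ON T1.x = T2.x;
""")

view_rows = conn.execute("SELECT x, a, b FROM NLOJ12 ORDER BY a;").fetchall()

# SQLite's built-in NATURAL LEFT OUTER JOIN, for comparison.
natural_rows = conn.execute(
    "SELECT x, a, b FROM T1 NATURAL LEFT OUTER JOIN T2 ORDER BY a;"
).fetchall()

# Column names produced by SELECT * on the NATURAL JOIN.
natural_cols = [d[0] for d in conn.execute(
    "SELECT * FROM T1 NATURAL LEFT OUTER JOIN T2;"
).description]

print(view_rows)     # [('r', 1, 7), ('v', 2, None), (None, 3, None)]
print(natural_cols)  # ['a', 'x', 'b']
```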

Unlike the NATURAL JOINs, the searched OUTER JOIN does not have to use a simple one-column equality as the JOIN search condition. The search condition can have several predicates, use other comparisons, and so forth. For example,

T1 LEFT OUTER JOIN T2 ON (T1.x < T2.x)

a T1.x b T2.x

===================================

1 'r' 8 's'

2 'v' NULL NULL

3 NULL NULL NULL

as compared to:

T1 LEFT OUTER JOIN T2 ON (T1.x > T2.x)

a T1.x b T2.x

========================================

1 'r' NULL NULL

2 'v' 7 'r'

2 'v' 8 's'

3 NULL NULL NULL
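Both non-equi joins can be run as-is in SQLite through Python's sqlite3 module. In this sketch the helper `left_join` is a hypothetical name, and it pastes a fixed condition string into the ON clause. Character strings compare by code point here, so 'r' < 's' < 'v'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (a INTEGER, x CHAR(1));
CREATE TABLE T2 (b INTEGER, x CHAR(1));
INSERT INTO T1 VALUES (1, 'r'), (2, 'v'), (3, NULL);
INSERT INTO T2 VALUES (7, 'r'), (8, 's'), (9, NULL);
""")


def left_join(condition):
    """LEFT OUTER JOIN T1 to T2 on a fixed (trusted) search condition."""
    return conn.execute(f"""
        SELECT T1.a, T1.x, T2.b, T2.x
          FROM T1 LEFT OUTER JOIN T2 ON ({condition})
         ORDER BY T1.a, T2.b;
    """).fetchall()


less = left_join("T1.x < T2.x")
greater = left_join("T1.x > T2.x")

print(less)     # [(1, 'r', 8, 's'), (2, 'v', None, None), (3, None, None, None)]
print(greater)  # [(1, 'r', None, None), (2, 'v', 7, 'r'), (2, 'v', 8, 's'), (3, None, None, None)]
```

Note that a preserved row can appear more than once when several rows of the other table satisfy the condition, as row 2 does in the second result.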

Again, so much of current OUTER JOIN behavior is vendor-specific that the programmer should experiment with his own particular product to see what actually happens.

17.2.4 Self OUTER JOINs

There is no rule that forbids an OUTER JOIN on the same table. In fact, this kind of self-join is a good trick for “flattening” a normalized table into a horizontal report. To illustrate the method, start with a table defined as:

CREATE TABLE Credits

(student_nbr INTEGER NOT NULL,

course_name CHAR(8) NOT NULL,

PRIMARY KEY (student_nbr, course_name));

This table represents student IDs and a course name for each class they have taken. However, our rules say that students cannot get credit for CS-102 until they have taken its prerequisite, CS-101; they cannot get credit for CS-103 until they have taken its prerequisite, CS-102; and so forth. Let’s first load the table with some sample values.

Notice that student 1 has both courses, student 2 has only the first of the series, and student 3 jumped ahead in the sequence and therefore cannot get credit for his CS-102 course until he goes back and takes CS-101 as a prerequisite.

Credits

student_nbr  course_name
========================
1            'CS-101'
1            'CS-102'
2            'CS-101'
3            'CS-102'

What we want is basically a histogram (bar chart) for each student, showing how far he or she has gone in his or her degree program. Assume that we are only looking at two courses; the result of the desired query might look like this (NULL is used to represent a missing value):

(1, 'CS-101', 'CS-102')
(2, 'CS-101', NULL)

Clearly, this will need a self-JOIN, since the last two columns come from the same table, Credits. You have to give correlation names to both uses of the Credits table in the OUTER JOIN operator when you construct a self OUTER JOIN, just as you would with any other self-JOIN, thus:

SELECT C1.student_nbr, C1.course_name, C2.course_name
FROM Credits AS C1
LEFT OUTER JOIN Credits AS C2
ON C1.student_nbr = C2.student_nbr
AND C2.course_name = 'CS-102'
WHERE C1.course_name = 'CS-101'; -- a test on the preserved table C1 belongs in WHERE, not ON
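A small sketch in SQLite (via Python's sqlite3 module) loads the sample rows and runs the self OUTER JOIN with the C1 filter in the WHERE clause, which returns exactly the two desired rows. For contrast, it also runs a version with every test in the ON clause. A condition on the preserved table inside ON cannot remove any of that table's rows, so all four C1 rows come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Credits
(student_nbr INTEGER NOT NULL,
 course_name CHAR(8) NOT NULL,
 PRIMARY KEY (student_nbr, course_name));
INSERT INTO Credits VALUES
 (1, 'CS-101'), (1, 'CS-102'), (2, 'CS-101'), (3, 'CS-102');
""")

# Filter the preserved table (C1) in WHERE; filter the unpreserved one (C2) in ON.
report = conn.execute("""
    SELECT C1.student_nbr, C1.course_name, C2.course_name
      FROM Credits AS C1
           LEFT OUTER JOIN Credits AS C2
           ON C1.student_nbr = C2.student_nbr
              AND C2.course_name = 'CS-102'
     WHERE C1.course_name = 'CS-101'
     ORDER BY C1.student_nbr;
""").fetchall()

# All tests in ON: every C1 row is still preserved.
on_only = conn.execute("""
    SELECT C1.student_nbr, C1.course_name, C2.course_name
      FROM Credits AS C1
           LEFT OUTER JOIN Credits AS C2
           ON C1.student_nbr = C2.student_nbr
              AND C1.course_name = 'CS-101'
              AND C2.course_name = 'CS-102'
     ORDER BY C1.student_nbr, C1.course_name;
""").fetchall()

print(report)        # [(1, 'CS-101', 'CS-102'), (2, 'CS-101', None)]
print(len(on_only))  # 4
```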

17.2.5 Two or More OUTER JOINs

Some relational purists feel that every operator should have an inverse, and therefore they do not like the OUTER JOIN. Others feel that the created NULLs are fundamentally different from the explicit NULLs in a base table and should have a special token. SQL uses its general-purpose NULLs and leaves things at that. Getting away from theory, you will also find that vendors have often done strange things with the ways their products work.

A major problem is that OUTER JOIN operators do not have the same properties as INNER JOIN operators. The order in which OUTER JOINs are executed will change the results (a mathematician would say that they are not associative). To show some of the problems that can come up when you have more than two tables, let us use three very simple two-column tables. Notice that some of the column values match and some do not match, but the three tables have all possible pairs of column names in them.

CREATE TABLE T1 (a INTEGER NOT NULL, b INTEGER NOT NULL); INSERT INTO T1 VALUES (1, 2);

CREATE TABLE T2 (a INTEGER NOT NULL, c INTEGER NOT NULL); INSERT INTO T2 VALUES (1, 3);

CREATE TABLE T3 (b INTEGER NOT NULL, c INTEGER NOT NULL); INSERT INTO T3 VALUES (2, 100);

Now let’s try some of the possible orderings of the three tables in a chain of LEFT OUTER JOINs. The problem is that a table can be preserved or unpreserved in the immediate JOIN and in the opposite state in the containing JOIN.

SELECT T1.a, T1.b, T3.c

FROM ((T1 NATURAL LEFT OUTER JOIN T2)

NATURAL LEFT OUTER JOIN T3);

Result

a b c

===========

1 2 NULL

SELECT T1.a, T1.b, T3.c

FROM ((T1 NATURAL LEFT OUTER JOIN T3)

NATURAL LEFT OUTER JOIN T2);

Result

a b c

===========

1 2 100


SELECT T1.a, T1.b, T3.c
FROM ((T2 NATURAL LEFT OUTER JOIN T3)
NATURAL LEFT OUTER JOIN T1);

Result

a b c

===========

NULL NULL NULL

Even worse, the choice of column in the SELECT list can change the output. Instead of displaying T3.c, use T2.c and you will get:

SELECT T1.a, T1.b, T2.c FROM ((T2 NATURAL LEFT OUTER JOIN T3) NATURAL LEFT OUTER JOIN T1);

Result

a b c

===========

NULL NULL 3
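The orderings above can be replayed in SQLite through Python's sqlite3 module. Standard SQL does not allow qualifying the shared columns of a NATURAL JOIN, and products differ on what a name such as T3.c means there. For that reason, this sketch makes an assumption: it spells each NATURAL LEFT OUTER JOIN out as an explicit ON clause that equates every shared column name, taking each shared column from the left-hand table that supplies it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (a INTEGER NOT NULL, b INTEGER NOT NULL); INSERT INTO T1 VALUES (1, 2);
CREATE TABLE T2 (a INTEGER NOT NULL, c INTEGER NOT NULL); INSERT INTO T2 VALUES (1, 3);
CREATE TABLE T3 (b INTEGER NOT NULL, c INTEGER NOT NULL); INSERT INTO T3 VALUES (2, 100);
""")

queries = {
    # (T1 NATURAL LEFT OUTER JOIN T2) NATURAL LEFT OUTER JOIN T3
    "T1-T2-T3": """SELECT T1.a, T1.b, T3.c
                     FROM T1 LEFT OUTER JOIN T2 ON T1.a = T2.a
                          LEFT OUTER JOIN T3 ON T1.b = T3.b AND T2.c = T3.c""",
    # (T1 NATURAL LEFT OUTER JOIN T3) NATURAL LEFT OUTER JOIN T2
    "T1-T3-T2": """SELECT T1.a, T1.b, T3.c
                     FROM T1 LEFT OUTER JOIN T3 ON T1.b = T3.b
                          LEFT OUTER JOIN T2 ON T1.a = T2.a AND T3.c = T2.c""",
    # (T2 NATURAL LEFT OUTER JOIN T3) NATURAL LEFT OUTER JOIN T1
    "T2-T3-T1": """SELECT T1.a, T1.b, T3.c
                     FROM T2 LEFT OUTER JOIN T3 ON T2.c = T3.c
                          LEFT OUTER JOIN T1 ON T2.a = T1.a AND T3.b = T1.b""",
    # Same FROM clause, but displaying T2.c instead of T3.c
    "T2-T3-T1, T2.c": """SELECT T1.a, T1.b, T2.c
                     FROM T2 LEFT OUTER JOIN T3 ON T2.c = T3.c
                          LEFT OUTER JOIN T1 ON T2.a = T1.a AND T3.b = T1.b""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}

for name, rows in results.items():
    print(name, rows)
# T1-T2-T3 [(1, 2, None)]
# T1-T3-T2 [(1, 2, 100)]
# T2-T3-T1 [(None, None, None)]
# T2-T3-T1, T2.c [(None, None, 3)]
```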

17.2.6 OUTER JOINs and Aggregate Functions

At the start of this chapter, we had a table of orders and a table of suppliers, which were to be used to build a report to tell us how much business we did with each supplier. The query that will do this is:

SELECT Suppliers.sup_id, sup_name, SUM(order_amt)
FROM Suppliers LEFT OUTER JOIN Orders
ON Suppliers.sup_id = Orders.sup_id
GROUP BY Suppliers.sup_id, sup_name;

Some suppliers’ totals include credits for returned merchandise, so that our total business with them worked out to zero dollars. Each supplier with which we did no business will have a NULL in its order_amt column in the OUTER JOIN. The usual rules for aggregate functions with NULL values apply: SUM() drops the NULLs and, with no values left, returns NULL, so these suppliers will show a NULL total rather than zero. Write COALESCE(SUM(order_amt), 0) if the report should show a zero for them. It is also possible to use a function inside an aggregate function, so you could write SUM(COALESCE(T1.x, T2.x)) for the common column pairs.

If you have replaced that NULL with a zero and still need to tell the difference between a true sum of zero and the result of a NULL in an OUTER JOIN, use the MIN() or MAX() function on the questionable column. These functions both return a NULL result when every input is NULL, while a supplier whose orders and credits net out to zero still has a non-NULL maximum. An expression built on MAX() can then print a message, for example CASE WHEN MAX(order_amt) IS NULL THEN 'No Orders' ELSE CAST(SUM(order_amt) AS VARCHAR(12)) END.

Likewise, these functions could be used in a HAVING clause, but that would defeat the purpose of an OUTER JOIN.
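The sketch below lets you check the aggregate results for each kind of supplier. It uses SQLite through Python's sqlite3 module, with hypothetical rows: 'B' has an order and a matching credit, and 'C' has no orders. In SQLite, as in Standard SQL, SUM() over nothing but NULLs returns NULL, and MAX() separates the supplier with no orders from the one whose business nets to zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Suppliers (sup_id CHAR(2) NOT NULL PRIMARY KEY,
                        sup_name CHAR(10) NOT NULL);
CREATE TABLE Orders (order_nbr INTEGER NOT NULL PRIMARY KEY,
                     sup_id CHAR(2),
                     order_amt INTEGER);
INSERT INTO Suppliers VALUES ('A', 'Acme'), ('B', 'Bolt'), ('C', 'Cogs');
-- 'B' has an order and an offsetting credit; 'C' has no orders at all.
INSERT INTO Orders VALUES (1, 'A', 100), (2, 'A', 50), (3, 'B', 75), (4, 'B', -75);
""")

rows = conn.execute("""
    SELECT Suppliers.sup_id,
           SUM(order_amt),               -- NULL when every order_amt is NULL
           COALESCE(SUM(order_amt), 0),  -- zero for both 'B' and 'C'
           MAX(order_amt),               -- NULL only for the padded 'C' row
           CASE WHEN MAX(order_amt) IS NULL THEN 'No Orders'
                ELSE CAST(SUM(order_amt) AS VARCHAR(12)) END
      FROM Suppliers LEFT OUTER JOIN Orders
           ON Suppliers.sup_id = Orders.sup_id
     GROUP BY Suppliers.sup_id, sup_name
     ORDER BY Suppliers.sup_id;
""").fetchall()

for row in rows:
    print(row)
# ('A', 150, 150, 100, '150')
# ('B', 0, 0, 75, '0')
# ('C', None, 0, None, 'No Orders')
```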

17.2.7 FULL OUTER JOIN

The FULL OUTER JOIN is a mix of the LEFT and RIGHT OUTER JOINs, with preserved rows constructed from both tables. The statement takes two tables and puts them in one result table. Again, this is easier to explain with an example than with a formal definition. It is also a way to show how to form a query that will perform the same function. Using Suppliers and Orders again, we find that we have suppliers with whom we have done no business, but we also have orders for which we have not decided on suppliers. To get all orders and all suppliers in one result table, we could use the SQL-89 query:

SELECT Suppliers.sup_id, sup_name, order_amt -- regular INNER JOIN
FROM Suppliers, Orders
WHERE Suppliers.sup_id = Orders.sup_id
UNION ALL
SELECT sup_id, sup_name, CAST (NULL AS INTEGER) -- preserved rows of LEFT JOIN
FROM Suppliers
WHERE NOT EXISTS (SELECT *
                  FROM Orders
                  WHERE Suppliers.sup_id = Orders.sup_id)
UNION ALL
SELECT CAST (NULL AS CHAR(2)), CAST (NULL AS CHAR(10)),
       order_amt -- preserved rows of RIGHT JOIN
FROM Orders
WHERE NOT EXISTS (SELECT *
                  FROM Suppliers
                  WHERE Suppliers.sup_id = Orders.sup_id);

The same thing in Standard SQL would be:


SELECT Suppliers.sup_id, sup_name, order_amt
FROM Orders FULL OUTER JOIN Suppliers
ON (Suppliers.sup_id = Orders.sup_id);

The FULL OUTER JOIN is not used as much as a LEFT or RIGHT OUTER JOIN. When you are doing a report, it is usually done from a viewpoint that leads to preserving only one side of the JOIN. That is, you might ask “What suppliers got no business from us?” or ask “What orders have not been assigned a supplier?” but a combination of the two questions is not likely to be in the same report.

17.2.8 WHERE Clause OUTER JOIN Operators

As we have seen, SQL engines that use special operators in the WHERE clause for OUTER JOIN syntax get strange results. But even with the Standard SQL syntax for OUTER JOINs, the programmer has to be careful in the WHERE clause to qualify the JOIN columns of the same name, to be sure that he picks up the preserved column. Both of these are legal queries:

SELECT * FROM T1 LEFT OUTER JOIN T2
ON T1.a = T2.a WHERE T1.a = 15;

versus

SELECT * FROM T1 LEFT OUTER JOIN T2
ON T1.a = T2.a WHERE T2.a = 15;

However, the second one will reject the rows with generated NULLs in them. If that is what you wanted, then why bother with an OUTER JOIN in the first place?
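A minimal sketch of the difference, using SQLite through Python's sqlite3 module. The table columns b and c and the rows are hypothetical: the value 15 exists only in T1, so the only row for it is one with generated NULLs. Filtering on the preserved column keeps that row, and filtering on the unpreserved column throws it away.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (a INTEGER NOT NULL, b INTEGER);
CREATE TABLE T2 (a INTEGER NOT NULL, c INTEGER);
-- Hypothetical rows: 15 exists only in T1; 16 exists in both tables.
INSERT INTO T1 VALUES (15, 1), (16, 2);
INSERT INTO T2 VALUES (16, 3);
""")

# WHERE on the preserved table's column.
preserved = conn.execute("""
    SELECT * FROM T1 LEFT OUTER JOIN T2
        ON T1.a = T2.a WHERE T1.a = 15;
""").fetchall()

# WHERE on the unpreserved table's column: the generated NULL fails the test.
unpreserved = conn.execute("""
    SELECT * FROM T1 LEFT OUTER JOIN T2
        ON T1.a = T2.a WHERE T2.a = 15;
""").fetchall()

print(preserved)    # [(15, 1, None, None)]
print(unpreserved)  # []
```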

There is also a UNION JOIN in the SQL-92 Standard. It has no ON clause: every row of each table appears in the result, padded out with NULLs in the columns that come from the other table, much like a FULL OUTER JOIN whose search condition is never TRUE. No product had implemented it as of 2005.

Figure 17.1 shows the various JOINs.


17.3 Old versus New JOIN Syntax

One of the classics of software engineering is a short paper by the late Edsger Dijkstra entitled “Go To Statement Considered Harmful” (Dijkstra 1968, pp. 147-148). Dijkstra argued for dropping the GOTO statement from programming languages in favor of what we now call structured programming.

One of his observations was that programs that used blocks, WHILE loops, and IF-THEN-ELSE statements were easier to read and maintain. Programs that jumped around via GOTO statements were harder to follow, because the execution path could have arrived at a statement label from anywhere in the code.

With the SQL-92 Standard, we added a set of infixed join operators to SQL, making the syntax closer to the way that relational algebra looks. The infixed OUTER JOIN syntax was meant to replace several different vendor options, which all had different syntax and semantics. It was absolutely needed.

But while we were fixing that problem, we also added a few more options because they were easy to define. Most of them have not been

Figure 17.1 SQL JOIN functions.
