Joe Celko's SQL for Smarties - Advanced SQL Programming


CHAPTER 17: THE SELECT STATEMENT

implemented in actual products yet, and nobody seems to be missing the OUTER UNION or CORRESPONDING clause.

The INNER JOIN operator did get to be popular. This was fairly easy to implement, since vendors only had to extend the parser without having to add more functionality. Additionally, it is a binary operator, and programmers are used to binary operators; add, subtract, multiply, and divide are all binary operators. E-R diagrams use lines between tables to show a relational schema.

But this leads to a linear approach to problem solving that might not be such a good thing in SQL. Consider this statement, which would have been written in the traditional syntax as:

SELECT a, b, c FROM Foo, Bar, Flub WHERE Foo.y BETWEEN Bar.x AND Flub.z;

With the infixed syntax, I can write this same statement in any of several ways. For example:

SELECT *
  FROM Foo
       INNER JOIN
       Bar ON Foo.y >= Bar.x
       INNER JOIN
       Flub ON Foo.y <= Flub.z;
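The equivalence of the two notations can be checked on a toy data set. The following is a minimal sketch using Python's standard sqlite3 module; the column layout (a and y in Foo, b and x in Bar, c and z in Flub) and the sample rows are assumptions made only for illustration, since the text does not define the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layouts and rows; the original text does not define them.
conn.executescript("""
    CREATE TABLE Foo  (a INTEGER, y INTEGER);
    CREATE TABLE Bar  (b INTEGER, x INTEGER);
    CREATE TABLE Flub (c INTEGER, z INTEGER);
    INSERT INTO Foo  VALUES (1, 5), (2, 12);
    INSERT INTO Bar  VALUES (10, 3), (20, 8);
    INSERT INTO Flub VALUES (100, 6), (200, 15);
""")

# Traditional syntax: one three-way BETWEEN predicate in the WHERE clause.
traditional = conn.execute("""
    SELECT a, b, c
      FROM Foo, Bar, Flub
     WHERE Foo.y BETWEEN Bar.x AND Flub.z
     ORDER BY a, b, c
""").fetchall()

# Infixed syntax: the same predicate split across two pairwise joins.
infixed = conn.execute("""
    SELECT a, b, c
      FROM Foo
           INNER JOIN Bar  ON Foo.y >= Bar.x
           INNER JOIN Flub ON Foo.y <= Flub.z
     ORDER BY a, b, c
""").fetchall()

print(traditional == infixed)  # both forms return the same rows
```

Both queries return the same four rows here, but only the first one shows the BETWEEN relationship at a glance.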

Humans tend to see things that are close together as a unit or as having a relationship. The extra reserved words in the infixed notation tend to work against that perception.

The infixed notation invites a programmer to add one table at a time to the chain of joins. First I built and tested the Foo-Bar join, and when I was happy with the results, I added Flub. “Step-wise” program refinement was one of the mantras of structured programming.

But look at the code; can you see that there is a BETWEEN relationship among the three tables? It is not easy, is it? In effect, you see only pairs of tables and not the whole problem. SQL is an “all-at-once” set-oriented language, not a “step-wise” language.

Technically, the SQL engine is supposed to perform the infixed joins in left to right order as they appear in the FROM clause. It is free to rearrange the order of the joins, if the rearrangement does not change the results. Order of execution does not make a difference with INNER JOINs, but it is very important with OUTER JOINs.
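A small experiment makes the point about OUTER JOINs concrete. This is a sketch under stated assumptions: the tables A, B, and C, their single column k, and the rows are hypothetical, and the nested join is written as a derived table so that it runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-column tables used only to show the effect.
conn.executescript("""
    CREATE TABLE A (k INTEGER);
    CREATE TABLE B (k INTEGER);
    CREATE TABLE C (k INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1), (2);
    INSERT INTO C VALUES (1);
""")

# (A LEFT OUTER JOIN B) INNER JOIN C: the inner join is applied last
# and discards the row for k = 2.
outer_first = conn.execute("""
    SELECT A.k, B.k, C.k
      FROM A
           LEFT OUTER JOIN B ON A.k = B.k
           INNER JOIN C ON B.k = C.k
     ORDER BY A.k
""").fetchall()

# A LEFT OUTER JOIN (B INNER JOIN C): the outer join is applied last
# and preserves the row for k = 2, padded with NULLs.
inner_first = conn.execute("""
    SELECT A.k, BC.b_k, BC.c_k
      FROM A
           LEFT OUTER JOIN
           (SELECT B.k AS b_k, C.k AS c_k
              FROM B INNER JOIN C ON B.k = C.k) AS BC
           ON A.k = BC.b_k
     ORDER BY A.k
""").fetchall()

print(outer_first)  # [(1, 1, 1)]
print(inner_first)  # [(1, 1, 1), (2, None, None)]
```

The same three tables and the same two predicates give different answers, depending only on which join is done last.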

Another problem is that many SQL programmers do not fully understand the rules for the scope of names. If an infixed join is given a derived table name, then all of the table names inside it are hidden from containing expressions. For example, this will fail:

SELECT a, b, c -- wrong!
  FROM (Foo
        INNER JOIN
        Bar ON Foo.y >= Bar.x) AS Foobar (x, y)
       INNER JOIN
       Flub ON Foo.y <= Flub.z;

It fails because the table name Foo is not available to the second INNER JOIN. However, this will work:

SELECT a, b, c
  FROM (Foo
        INNER JOIN
        Bar ON Foo.y >= Bar.x) AS Foobar (x, y)
       INNER JOIN
       Flub ON Foobar.y <= Flub.z;

If you start nesting lots of derived table expressions, you can force an order of execution in the query. It is generally not a good idea to try to outguess the optimizer.

So far, I have shown fully qualified column names. It is a good programming practice, but it is not required. Assume that Foo and Bar both have a column named w. These statements will produce an ambiguous name error:

SELECT a, b, c
  FROM Foo
       INNER JOIN
       Bar ON y >= x
       INNER JOIN
       Flub ON y <= w;

SELECT a, b, c
  FROM Foo, Bar, Flub
 WHERE y BETWEEN x AND w;

But this statement will work from inside the parentheses first, and then do the outermost INNER JOIN last:

SELECT a, b, c
  FROM Foo
       INNER JOIN
       (Bar INNER JOIN Flub ON y <= w)
       ON y >= x;

If Bar did not have a column named w, then the parser would go to the next containing expression, find Foo.w, and use it.

As an aside, there is a myth among new SQL programmers that the join conditions must be in the ON clause, and the search argument predicates (SARGs) must be in the WHERE clause. It is a nice programming style and isolates the search arguments to one location for easy changes. But it is not a requirement.
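For INNER JOINs, the placement of a predicate really is a matter of style. The sketch below, which uses Python's sqlite3 module with hypothetical Foo and Bar tables and rows, puts the same search argument in the WHERE clause, in the ON clause, and in a traditional comma join. Note that the freedom shown here applies to INNER JOINs; with OUTER JOINs, moving a predicate between ON and WHERE can change the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows, used only for this comparison.
conn.executescript("""
    CREATE TABLE Foo (a INTEGER, y INTEGER);
    CREATE TABLE Bar (b INTEGER, x INTEGER);
    INSERT INTO Foo VALUES (1, 5), (2, 12);
    INSERT INTO Bar VALUES (10, 3), (20, 8);
""")

# Join condition in ON, search argument in WHERE (the usual style).
sarg_in_where = conn.execute("""
    SELECT a, b
      FROM Foo INNER JOIN Bar ON Foo.y >= Bar.x
     WHERE Bar.x > 5
     ORDER BY a, b
""").fetchall()

# Both predicates in the ON clause.
sarg_in_on = conn.execute("""
    SELECT a, b
      FROM Foo INNER JOIN Bar ON Foo.y >= Bar.x AND Bar.x > 5
     ORDER BY a, b
""").fetchall()

# Both predicates in the WHERE clause of a traditional join.
all_in_where = conn.execute("""
    SELECT a, b
      FROM Foo, Bar
     WHERE Foo.y >= Bar.x AND Bar.x > 5
     ORDER BY a, b
""").fetchall()

print(sarg_in_where == sarg_in_on == all_in_where)
```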

Am I against infixed joins? No, but they are a bit more complicated than they first appear, and if there are some OUTER JOINs in the mix, things can be very complicated. Just be careful with the new toys, kids.

17.5 JOINs by Function Calls

JOINs can also be done inside functions that relate columns from one or more tables in their parameters. This is easier to explain with an actual example, from John Botibol of Deverill plc in Dorset, U.K. His problem was how to “flatten” legacy data stored in a flat file database into a relational format for a data warehouse. The data included a vast amount of demographic information on people, related to their subjects of interest. The subjects of interest were selected from a list; some subjects required just one answer, and others allowed multiple selections. The problem was that the data for multiple selections was stored as a string with a one or a zero in positional places to indicate “interested” or “not interested” in that item. The actual list of products was stored in another file as a list. Thus, for one person we might have something like ‘101110’ together with a list like 1 = Bananas, 2 = Apples, 3 = Bread, 4 = Fish, 5 = Meat, 6 = Butter, if the subject area was foods.

The data was first moved into working tables like this:

CREATE TABLE RawSurvey

(rawkey INTEGER NOT NULL PRIMARY KEY,

rawstring CHAR(20) NOT NULL);

CREATE TABLE SurveyList

(survey_id INTEGER NOT NULL PRIMARY KEY,

surveytext CHAR(30) NOT NULL);

There were always the correct number of ones and zeros for the number of question options in any group (thus, in this case, the answer strings always have six characters), and the list was in the correct order to match the positions in the string. The data had to be ported into SQL, which meant that each survey had to be broken down into a row for each response:

CREATE TABLE Surveys
(survey_id INTEGER NOT NULL,
 surveytext CHAR(30) NOT NULL,
 ticked INTEGER DEFAULT 0 NOT NULL
   CONSTRAINT tick_mark
   CHECK (ticked IN (0, 1)),
 PRIMARY KEY (survey_id, surveytext));

This table can be loaded with the query:

INSERT INTO Surveys(survey_id, surveytext, ticked)

SELECT rawkey, surveytext,

SUBSTRING(rawstring FROM survey_id FOR 1)

FROM RawSurvey, SurveyList;

The tables are joined in the SUBSTRING() function, instead of with a theta operator. The SUBSTRING() function returns an empty string if survey_id goes beyond the end of the string. The query will always return a number of rows that is equal to or less than the number of characters in rawstring. The technique will adjust itself correctly for any number of possible survey answers.

In the real problem, the table SurveyList always contained exactly the right number of entries for the length of the string to be exploded, and the string to be exploded always had exactly the right number of characters, so you did not need a WHERE clause to check for bad data.
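The load can be tried out on the ‘101110’ food example. This is a sketch under stated assumptions: it runs on SQLite through Python's sqlite3 module, so it uses SQLite's substr(string, start, length) in place of the Standard SQL SUBSTRING(... FROM ... FOR ...) form, and it adds an explicit CAST to INTEGER, which is my addition rather than part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RawSurvey
    (rawkey INTEGER NOT NULL PRIMARY KEY,
     rawstring CHAR(20) NOT NULL);

    CREATE TABLE SurveyList
    (survey_id INTEGER NOT NULL PRIMARY KEY,
     surveytext CHAR(30) NOT NULL);

    CREATE TABLE Surveys
    (survey_id INTEGER NOT NULL,
     surveytext CHAR(30) NOT NULL,
     ticked INTEGER DEFAULT 0 NOT NULL CHECK (ticked IN (0, 1)),
     PRIMARY KEY (survey_id, surveytext));

    INSERT INTO RawSurvey VALUES (1, '101110');
    INSERT INTO SurveyList VALUES
      (1, 'Bananas'), (2, 'Apples'), (3, 'Bread'),
      (4, 'Fish'), (5, 'Meat'), (6, 'Butter');

    -- The join happens inside substr(): survey_id picks the position.
    INSERT INTO Surveys (survey_id, surveytext, ticked)
    SELECT rawkey, surveytext,
           CAST(substr(rawstring, survey_id, 1) AS INTEGER)
      FROM RawSurvey, SurveyList;
""")

# Map each answer text to its exploded flag.
ticked = dict(conn.execute("SELECT surveytext, ticked FROM Surveys"))
print(ticked)
```

Each position in the string becomes one row, so the single raw row produces six rows in Surveys.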

The UNION JOIN was defined in Standard SQL, but I know of no SQL product that has implemented it. As the name implies, it is a cross between a UNION and a FULL OUTER JOIN. The definition followed easily from the other infixed JOIN operators. The syntax has no searched clause:

<table expression 1> UNION JOIN <table expression 2>

The statement takes two dissimilar tables and puts them into one result table. It preserves all the rows from both tables and does not try to consolidate them. Columns that do not exist in one table are simply padded out with NULLs in the result rows. Columns with the same names in the tables have to be renamed differently in the result. It is equivalent to:

<table expression 1>
FULL OUTER JOIN <table expression 2>
ON 1 = 2;

Any searched expression that is always FALSE will work. As an example of this, you might want to combine the medical records of male and female patients into one table with this query:

SELECT *
  FROM (SELECT 'male', prostate FROM Males)
       UNION JOIN
       (SELECT 'female', pregnancy FROM Females);

to get a result table like this:

Result
male    prostate  female    pregnancy
======================================
'male'  no        NULL      NULL
'male'  no        NULL      NULL
'male'  yes       NULL      NULL
NULL    NULL      'female'  no
NULL    NULL      'female'  yes
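Since no product implements UNION JOIN, the result above has to be produced some other way. The sketch below does not use UNION JOIN or the FULL OUTER JOIN ... ON 1 = 2 form; it swaps in a plain UNION ALL with explicit NULL padding, which gives the same shape of result. It runs on SQLite through Python's sqlite3 module, and the patient_id column and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# patient_id is a hypothetical key; the text only names the other columns.
conn.executescript("""
    CREATE TABLE Males   (patient_id INTEGER PRIMARY KEY, prostate TEXT);
    CREATE TABLE Females (patient_id INTEGER PRIMARY KEY, pregnancy TEXT);
    INSERT INTO Males   VALUES (1, 'no'), (2, 'no'), (3, 'yes');
    INSERT INTO Females VALUES (4, 'no'), (5, 'yes');
""")

# Emulation: each branch pads the other table's columns with NULLs.
rows = conn.execute("""
    SELECT 'male' AS male, prostate, NULL AS female, NULL AS pregnancy
      FROM Males
    UNION ALL
    SELECT NULL, NULL, 'female', pregnancy
      FROM Females
""").fetchall()

for row in rows:
    print(row)
```

UNION ALL is used instead of UNION so that the two identical ('male', 'no') rows are both preserved, as they are in the result table.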

Frédéric Brouard came up with a nice trick for writing a similar join; that is, a join on one table, say a basic table of student data, with either a table of data particular to domestic students or another table of data particular to foreign students, based on the value of a parameter. This differs from a true UNION JOIN in that it must have a “root” table to use for the outer joins.

CREATE TABLE Students
(student_nbr INTEGER NOT NULL PRIMARY KEY,
 student_type CHAR(1) NOT NULL DEFAULT 'D'
   CHECK (student_type IN ('D', 'F', ...)),
 ...);

CREATE TABLE DomesticStudents
(student_nbr INTEGER NOT NULL PRIMARY KEY
   REFERENCES Students(student_nbr),
 ...);

CREATE TABLE ForeignStudents
(student_nbr INTEGER NOT NULL PRIMARY KEY
   REFERENCES Students(student_nbr),
 ...);

SELECT Students.*, DomesticStudents.*, ForeignStudents.*
  FROM Students
       LEFT OUTER JOIN
       DomesticStudents
       ON CASE Students.student_type
          WHEN 'D' THEN 1 ELSE NULL END = 1
       LEFT OUTER JOIN
       ForeignStudents
       ON CASE Students.student_type
          WHEN 'F' THEN 1 ELSE NULL END = 1;
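Here is a runnable sketch of the trick on SQLite, through Python's sqlite3 module. Several things are assumptions of mine rather than part of the original: the home_state and visa_type columns, the sample rows, and the extra student_nbr equality in each ON clause, which I added so that each student is matched only with his or her own subtype row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students
    (student_nbr INTEGER NOT NULL PRIMARY KEY,
     student_type CHAR(1) NOT NULL DEFAULT 'D'
       CHECK (student_type IN ('D', 'F')));

    CREATE TABLE DomesticStudents
    (student_nbr INTEGER NOT NULL PRIMARY KEY
       REFERENCES Students(student_nbr),
     home_state CHAR(2) NOT NULL);   -- hypothetical column

    CREATE TABLE ForeignStudents
    (student_nbr INTEGER NOT NULL PRIMARY KEY
       REFERENCES Students(student_nbr),
     visa_type CHAR(2) NOT NULL);    -- hypothetical column

    INSERT INTO Students VALUES (1, 'D'), (2, 'F');
    -- Student 2 has a stale domestic row that the CASE must ignore.
    INSERT INTO DomesticStudents VALUES (1, 'TX'), (2, 'NY');
    INSERT INTO ForeignStudents VALUES (2, 'F1');
""")

rows = conn.execute("""
    SELECT S.student_nbr, S.student_type, D.home_state, F.visa_type
      FROM Students AS S
           LEFT OUTER JOIN DomesticStudents AS D
           ON CASE S.student_type WHEN 'D' THEN 1 ELSE NULL END = 1
              AND D.student_nbr = S.student_nbr   -- added key match
           LEFT OUTER JOIN ForeignStudents AS F
           ON CASE S.student_type WHEN 'F' THEN 1 ELSE NULL END = 1
              AND F.student_nbr = S.student_nbr   -- added key match
     ORDER BY S.student_nbr
""").fetchall()

print(rows)  # [(1, 'D', 'TX', None), (2, 'F', None, 'F1')]
```

The CASE expression yields NULL for the wrong student type, so the ON clause is never TRUE for that branch and the outer join pads the columns with NULLs, even when a stray row exists in the other subtype table.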

We can relate two tables together based on quantities in each of them. The simplest example is filling customer orders from our inventories at various stores. To make life easier, let’s assume that we have only one product, we process orders in increasing customer_id order, and we draw from store inventory by increasing store_id.

CREATE TABLE Inventory (store_id INTEGER NOT NULL PRIMARY KEY, item_qty INTEGER NOT NULL CHECK (item_qty >= 0));

INSERT INTO Inventory (store_id, item_qty) VALUES (10, 2),(20, 3), (30, 2);

CREATE TABLE Orders (customer_id CHAR(5) NOT NULL PRIMARY KEY, item_qty INTEGER NOT NULL CHECK (item_qty > 0));

INSERT INTO Orders (customer_id, item_qty) VALUES ('Bill', 4), ('Fred', 2);

What we want to do is fill Bill’s order for four units by taking two units from store 10 and two units from store 20. Next we process Fred’s order with the one unit left in store 20 and one unit from store 30.

SELECT I.store_id, O.customer_id,
       (CASE WHEN O.end_running_qty <= I.end_running_qty
             THEN O.end_running_qty
             ELSE I.end_running_qty END
      - CASE WHEN O.start_running_qty >= I.start_running_qty
             THEN O.start_running_qty
             ELSE I.start_running_qty END)
       AS items_consumed_tally
  FROM (SELECT I1.store_id,
               SUM(I2.item_qty) - I1.item_qty,
               SUM(I2.item_qty)
          FROM Inventory AS I1, Inventory AS I2
         WHERE I2.store_id <= I1.store_id
         GROUP BY I1.store_id, I1.item_qty)
       AS I (store_id, start_running_qty, end_running_qty)
       INNER JOIN
       (SELECT O1.customer_id,
               SUM(O2.item_qty) - O1.item_qty,
               SUM(O2.item_qty) AS end_running_qty
          FROM Orders AS O1, Orders AS O2
         WHERE O2.customer_id <= O1.customer_id
         GROUP BY O1.customer_id, O1.item_qty)
       AS O (customer_id, start_running_qty, end_running_qty)
       ON O.start_running_qty < I.end_running_qty
          AND O.end_running_qty > I.start_running_qty
 ORDER BY store_id, customer_id;

This can also be done with the new SQL-99 OLAP operators.
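The self-join version of the packing query can be checked against the sample Inventory and Orders rows. This is a sketch that runs on SQLite through Python's sqlite3 module; because SQLite does not accept a column list after a derived table name, the running-total columns are named with AS inside each subquery instead. That renaming is my adaptation, not the original notation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Inventory
    (store_id INTEGER NOT NULL PRIMARY KEY,
     item_qty INTEGER NOT NULL CHECK (item_qty >= 0));
    INSERT INTO Inventory (store_id, item_qty)
    VALUES (10, 2), (20, 3), (30, 2);

    CREATE TABLE Orders
    (customer_id CHAR(5) NOT NULL PRIMARY KEY,
     item_qty INTEGER NOT NULL CHECK (item_qty > 0));
    INSERT INTO Orders (customer_id, item_qty)
    VALUES ('Bill', 4), ('Fred', 2);
""")

# Each subquery turns quantities into a half-open running range;
# overlapping ranges tell us which store fills which order.
rows = conn.execute("""
    SELECT I.store_id, O.customer_id,
           (CASE WHEN O.end_running_qty <= I.end_running_qty
                 THEN O.end_running_qty ELSE I.end_running_qty END
          - CASE WHEN O.start_running_qty >= I.start_running_qty
                 THEN O.start_running_qty ELSE I.start_running_qty END)
           AS items_consumed_tally
      FROM (SELECT I1.store_id,
                   SUM(I2.item_qty) - I1.item_qty AS start_running_qty,
                   SUM(I2.item_qty) AS end_running_qty
              FROM Inventory AS I1, Inventory AS I2
             WHERE I2.store_id <= I1.store_id
             GROUP BY I1.store_id, I1.item_qty) AS I
           INNER JOIN
           (SELECT O1.customer_id,
                   SUM(O2.item_qty) - O1.item_qty AS start_running_qty,
                   SUM(O2.item_qty) AS end_running_qty
              FROM Orders AS O1, Orders AS O2
             WHERE O2.customer_id <= O1.customer_id
             GROUP BY O1.customer_id, O1.item_qty) AS O
           ON O.start_running_qty < I.end_running_qty
              AND O.end_running_qty > I.start_running_qty
     ORDER BY I.store_id, O.customer_id
""").fetchall()

for row in rows:
    print(row)
# (10, 'Bill', 2)
# (20, 'Bill', 2)
# (20, 'Fred', 1)
# (30, 'Fred', 1)
```

The amount consumed is the length of the overlap of the two ranges: the smaller of the two end points minus the larger of the two start points.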

17.8 Dr. Codd’s T-Join

Dr. E. F. Codd introduced a set of new theta operators, called T-operators, which were based on the idea of a best-fit or approximate equality (Codd 1990). The algorithm for the operators is easier to understand with an example modified from Dr. Codd (Codd 1990). The problem is to assign the classes to the available classrooms. We want (class_size < room_size) to be true after the assignments are made. This will allow us a few empty seats in each room for late students. We can do this in one of two ways. The first way is to sort the tables in ascending order by classroom size and the number of students in a class. We start with the following tables:

CREATE TABLE Rooms

(room_nbr CHAR(2) PRIMARY KEY,

room_size INTEGER NOT NULL);

CREATE TABLE Classes

(class_nbr CHAR(2) PRIMARY KEY,

class_size INTEGER NOT NULL);

These tables have the following rows in them:

Classes
class_nbr  class_size
=====================
'c1'       80
'c2'       70
'c3'       65
'c4'       55
'c5'       50
'c6'       40

Rooms
room_nbr  room_size
===================
'r1'      70
'r2'      40
'r3'      50
'r4'      85
'r5'      30
'r6'      65
'r7'      55

The goal of the T-Join problem is to assign a class that is smaller than the classroom given it (class_size < room_size). Dr. Codd gives two approaches to the problem.

1. Ascending Order Algorithm: Sort both tables into ascending order. Reading from the top of the Rooms table, match each class with the first room that will fit.

Classes                   Rooms
class_nbr class_size      room_nbr room_size
====================      ===================
'c6'      40              'r5'     30
'c5'      50              'r2'     40
'c4'      55              'r3'     50
'c3'      65              'r7'     55
'c2'      70              'r6'     65
'c1'      80              'r1'     70
                          'r4'     85

Results
class_nbr class_size room_nbr room_size
========================================
'c2'      70         'r4'     85
'c3'      65         'r1'     70
'c4'      55         'r6'     65
'c5'      50         'r7'     55
'c6'      40         'r3'     50

2. Descending Order Algorithm: Sort both tables into descending order. Reading from the top of the Classes table, match each class with the first room that will fit.

Classes Rooms

class_nbr class_size room_nbr room_size

===================== ===================

'c1' 80 'r4' 85

'c2' 70 'r1' 70

'c3' 65 'r6' 65

'c4' 55 'r7' 55

'c5' 50 'r3' 50

'c6' 40 'r2' 40

'r5' 30

Results

class_nbr class_size room_nbr room_size

=========================================

'c1' 80 'r4' 85

'c3' 65 'r1' 70

'c4' 55 'r6' 65

'c5' 50 'r7' 55

'c6' 40 'r3' 50
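The two procedures can be simulated outside SQL to confirm the result tables. This is a minimal procedural sketch in Python, not Dr. Codd's own formulation: the function name first_fit and the rule that an assigned room is removed from further consideration are my reading of the two descriptions.

```python
# Sample data from the Classes and Rooms tables.
classes = {"c1": 80, "c2": 70, "c3": 65, "c4": 55, "c5": 50, "c6": 40}
rooms = {"r1": 70, "r2": 40, "r3": 50, "r4": 85, "r5": 30, "r6": 65, "r7": 55}


def first_fit(classes, rooms, descending):
    """Assign each class to the first unused room with class_size < room_size.

    Both tables are read in the same sort order (hypothetical helper).
    """
    class_list = sorted(classes.items(), key=lambda kv: kv[1], reverse=descending)
    free_rooms = sorted(rooms.items(), key=lambda kv: kv[1], reverse=descending)
    assignment = {}
    for class_nbr, class_size in class_list:
        for i, (room_nbr, room_size) in enumerate(free_rooms):
            if class_size < room_size:
                assignment[class_nbr] = room_nbr
                del free_rooms[i]  # a room can hold only one class
                break
    return assignment


ascending = first_fit(classes, rooms, descending=False)
descending = first_fit(classes, rooms, descending=True)

print(ascending)   # c1 is left without a room
print(descending)  # c2 is left without a room
```

With this data, the ascending pass leaves class c1 unassigned, while the descending pass leaves c2 unassigned.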

Notice that the answers are different! Dr. Codd has never given a definition in relational algebra of the T-Join, so I propose that we need one. Informally, for each class, we want the smallest room that will hold it, while maintaining the T-Join condition. Or for each room, we want the largest class that will fill it, while maintaining the T-Join condition. These can be two different things, so you must decide which table is the driver. But either way, I advocate a “best fit” over Codd’s “first fit” approach.

In effect, the Swedish and Croatian solutions given later in this section use my definition instead of Dr. Codd’s; the Colombian solution is true to the algorithmic approach.

Other theta conditions can be used in place of the “less than” shown here. If “less than or equal” is used, all the classes are assigned to a room in this case, but not in all cases. This is left to the reader as an exercise. The first attempts in standard SQL are versions grouped by queries. They can, however, produce some rows that would be left out of the answers Dr. Codd was expecting. The first JOIN can be written as
