Joe Celko's SQL for Smarties: Advanced SQL Programming


312 CHAPTER 16: QUANTIFIED SUBQUERY PREDICATES

When you get a single FALSE result, the whole predicate is FALSE. As long as <table expression> has cardinality greater than zero and all non-NULL values, you will get a result of TRUE or FALSE.

That sounds reasonable so far. Now let EmptyTable be an empty table (no rows, cardinality zero) and NullTable be a table with only NULLs in its rows (cardinality greater than zero). The rules for SQL say that <value expression> <comp op> ALL NullTable always returns UNKNOWN, and likewise <value expression> <comp op> ANY NullTable always returns UNKNOWN. This makes sense, because every row comparison test in the expansion would return UNKNOWN, so the series of OR and AND operators would behave in the usual way.

However, <value expression> <comp op> ALL EmptyTable always returns TRUE, and <value expression> <comp op> ANY EmptyTable always returns FALSE. Most people have no trouble seeing why the ANY predicate works that way; you cannot find a match, so the result is FALSE. But most people have lots of trouble seeing why the ALL predicate is TRUE. This convention is called existential import, and I have just discussed it in Chapter 15. If I were to walk into a bar and announce that I can beat any pink elephant in the bar, that would be a true statement. The fact that there are no pink elephants in the bar merely shows that the problem is reduced to the minimum case.

If this seems unnatural, then convert the ALL and ANY predicates into EXISTS predicates and look at the way that this rule preserves the properties that:

1. (∀x P(x)) = (¬∃x (¬P(x)))

2. (∃x P(x)) = ¬(∀x ¬P(x))

The Table1.x <comp op> ALL (SELECT y FROM Table2 WHERE <search condition>) predicate converts to:

NOT EXISTS (SELECT *
              FROM Table1, Table2
             WHERE NOT (Table1.x <comp op> Table2.y)
               AND <search condition>)

The Table1.x <comp op> ANY (SELECT y FROM Table2 WHERE <search condition>) predicate converts to:

EXISTS (SELECT *
          FROM Table1, Table2
         WHERE Table1.x <comp op> Table2.y
           AND <search condition>)
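As a rough check of the empty-table rule, the sketch below runs the two EXISTS forms against an empty Table2 using Python's built-in sqlite3 module. The table layouts and the value 10 are assumptions made up for illustration. The predicates are written directly in their EXISTS forms because SQLite, as far as I know, does not accept ALL or ANY in front of a subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (x INTEGER NOT NULL);
    CREATE TABLE Table2 (y INTEGER);      -- deliberately left empty
    INSERT INTO Table1 VALUES (10);
""")

# x > ALL (SELECT y FROM Table2): no row of Table2 fails the comparison.
all_form = conn.execute("""
    SELECT NOT EXISTS (SELECT *
                         FROM Table2
                        WHERE NOT (Table1.x > Table2.y))
      FROM Table1
""").fetchone()[0]

# x > ANY (SELECT y FROM Table2): some row of Table2 passes the comparison.
any_form = conn.execute("""
    SELECT EXISTS (SELECT *
                     FROM Table2
                    WHERE Table1.x > Table2.y)
      FROM Table1
""").fetchone()[0]

print(all_form, any_form)
```

SQLite reports truth values as 1 and 0, so the ALL form comes back as 1 and the ANY form as 0 for the empty table. The EXISTS forms are two-valued, so this sketch does not reproduce the UNKNOWN result described for NullTable.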

Of the two quantified predicates, the <comp op> ALL predicate is used more. The ANY predicate is more easily replaced and more naturally written with an EXISTS() predicate or an IN() predicate. In fact, the standard defines the IN() predicate as shorthand for = ANY and the NOT IN() predicate as shorthand for <> ALL, which is how most people would construct them in English.

The <comp op> ALL predicate is probably the more useful of the two, since it cannot be written in terms of an IN() predicate. The trick with it is to make sure that its subquery defines the set of values in which you are interested. For example, to find the authors whose books all sell for $19.95 or more, you could write:

SELECT *
  FROM Authors AS A1
 WHERE 19.95
       <= ALL (SELECT price
                 FROM Books AS B1
                WHERE A1.author_name = B1.author_name);

The best way to think of this is to reverse the usual English sentence “Show me all x that are y” in your mind so that it says “y is the value of all x” instead.
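The author query can be tried out with a small sketch like the one below. It assumes hypothetical Authors and Books rows and uses Python's sqlite3 module; because SQLite, as far as I know, lacks the ALL quantifier, the predicate is written in its NOT EXISTS form (no book by the author is priced below $19.95).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data, not from the book.
    CREATE TABLE Authors (author_name TEXT NOT NULL PRIMARY KEY);
    CREATE TABLE Books
    (title TEXT NOT NULL,
     author_name TEXT NOT NULL,
     price REAL NOT NULL);

    INSERT INTO Authors VALUES ('Able'), ('Baker'), ('Carter');
    INSERT INTO Books VALUES
        ('A1', 'Able', 19.95), ('A2', 'Able', 25.00),
        ('B1', 'Baker', 9.99), ('B2', 'Baker', 30.00);
""")

# Same intent as "19.95 <= ALL (SELECT price ...)", in NOT EXISTS form.
rows = conn.execute("""
    SELECT A1.author_name
      FROM Authors AS A1
     WHERE NOT EXISTS (SELECT *
                         FROM Books AS B1
                        WHERE B1.author_name = A1.author_name
                          AND NOT (19.95 <= B1.price))
     ORDER BY A1.author_name
""").fetchall()

names = [r[0] for r in rows]
print(names)
```

Able qualifies because both books cost at least $19.95, and Baker is rejected by the $9.99 title. Carter, who has no books at all, also qualifies, which is the empty-table rule for ALL showing up in practice.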

16.3 The ALL Predicate and Extrema Functions

It is counterintuitive at first that these two predicates are not the same in SQL:

x >= (SELECT MAX(y) FROM Table1)

x >= ALL (SELECT y FROM Table1)
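A minimal sketch of how the two predicates can diverge once the column holds a NULL. SQLite evaluates the MAX() form; the ALL form is emulated in Python by a hypothetical helper, ge_all(), that applies three-valued logic, since SQLite, as far as I know, does not accept ALL in front of a subquery. The table contents and the value x = 2 are assumptions for illustration.

```python
import sqlite3

def ge_all(x, values):
    """Emulate x >= ALL (values): True, False, or None for UNKNOWN."""
    results = [None if y is None else x >= y for y in values]
    if any(r is False for r in results):
        return False          # one FALSE makes the whole predicate FALSE
    if any(r is None for r in results):
        return None           # no FALSE, but at least one UNKNOWN
    return True               # all TRUE, or no rows at all

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (y INTEGER);
    INSERT INTO Table1 VALUES (1), (2), (NULL);
""")

x = 2
max_y = conn.execute("SELECT MAX(y) FROM Table1").fetchone()[0]
ys = [row[0] for row in conn.execute("SELECT y FROM Table1")]

via_max = x >= max_y        # MAX() ignored the NULL
via_all = ge_all(x, ys)     # the NULL row makes the result UNKNOWN

print(via_max, via_all)
```

With these rows the MAX() form is TRUE while the emulated ALL form is UNKNOWN (None in Python), so a WHERE clause would keep the row in the first case and drop it in the second.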

But you have to remember the rules for the extrema functions: they drop out all the NULLs before returning the greatest or least values. The ALL predicate does not drop NULLs, so you can get them in the results. However, if you know that there are no NULLs in a column, or are willing to drop the NULLs yourself, then you can use the ALL predicate to construct single queries to do work that would otherwise be done by two queries. For example, we could use the table of products and store managers we used earlier in this chapter and find which manager handles the largest number of products. To do this, we would first construct a grouped VIEW and group it again:

CREATE VIEW TotalProducts (manager_name, product_tally)
AS SELECT manager_name, COUNT(*)
     FROM Stores
    GROUP BY manager_name;

SELECT manager_name
  FROM TotalProducts
 WHERE product_tally = (SELECT MAX(product_tally)
                          FROM TotalProducts);

But Alex Dorfman found a single query solution instead:

SELECT manager_name, COUNT(*)
  FROM Stores
 GROUP BY manager_name
HAVING COUNT(*) + 1 > ALL (SELECT DISTINCT COUNT(*)
                             FROM Stores
                            GROUP BY manager_name);

The use of the SELECT DISTINCT in the subquery is to guarantee that we do not get duplicate rows when two managers handle the same number of products. You can also add a WHERE dept IS NOT NULL clause to the subquery to get the effect of a true MAX() aggregate function.
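The grouped VIEW approach can be exercised with a sketch along these lines, using Python's sqlite3 module. The Stores layout and its rows are assumptions, since the original table is defined earlier in the chapter. The second statement is not Dorfman's query: it is a single-query variant built on MAX() over a derived table, chosen only because SQLite, as far as I know, cannot run the ALL quantifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout: one row per product a manager handles.
    CREATE TABLE Stores
    (manager_name TEXT NOT NULL,
     product_name TEXT NOT NULL);

    INSERT INTO Stores VALUES
        ('Ann', 'hammer'), ('Ann', 'saw'), ('Ann', 'drill'),
        ('Bob', 'nails'),
        ('Cal', 'glue'), ('Cal', 'tape'), ('Cal', 'paint');

    CREATE VIEW TotalProducts (manager_name, product_tally)
    AS SELECT manager_name, COUNT(*)
         FROM Stores
        GROUP BY manager_name;
""")

# Two-step version: query the grouped view against its own maximum.
via_view = conn.execute("""
    SELECT manager_name
      FROM TotalProducts
     WHERE product_tally = (SELECT MAX(product_tally)
                              FROM TotalProducts)
     ORDER BY manager_name
""").fetchall()

# Single statement: the grouping is repeated inside a derived table.
single_query = conn.execute("""
    SELECT manager_name, COUNT(*)
      FROM Stores
     GROUP BY manager_name
    HAVING COUNT(*) = (SELECT MAX(tally)
                         FROM (SELECT COUNT(*) AS tally
                                 FROM Stores
                                GROUP BY manager_name) AS T)
     ORDER BY manager_name
""").fetchall()

print(via_view, single_query)
```

Both statements return Ann and Cal, who tie at three products each. Ties come back as multiple rows in either form.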

16.4 The UNIQUE Predicate

The UNIQUE predicate is a test for the absence of duplicate rows in a subquery. The UNIQUE keyword is also used as a table or column constraint, and this predicate is used to define that constraint. The UNIQUE column constraint is implemented in many SQL implementations with a CREATE UNIQUE INDEX <indexname> ON <table>(<column list>) statement hidden under the covers. The syntax for this predicate is:

<unique predicate> ::= UNIQUE <table subquery>

If any two rows in the subquery are equal to each other, the predicate is FALSE. However, the definition in the standard is worded in the negative, so that NULLs get the benefit of the doubt. The query can be written as a NOT EXISTS predicate that counts rows, thus:

NOT EXISTS (SELECT <column list>
              FROM <subquery>
             WHERE (<column list>) IS NOT NULL
             GROUP BY <column list>
            HAVING COUNT(*) > 1);

An empty subquery is always TRUE, since you cannot find two rows, and therefore duplicates do not exist. This makes sense on the face of it.

NULLs are easier to explain with an example: say a table with only two rows, ('a', 'b') and ('a', NULL). The first columns of each row are non-NULL and are equal to each other, so we have a match so far. The second column in the second row is NULL and cannot compare to anything, so we skip the second column pair and go with what we have, and the test is TRUE. This is giving the NULLs the benefit of the doubt, since the NULL in the second row could become 'b' some day and give us a duplicate row.

Now consider the case where the subquery has two rows, ('a', NULL) and ('a', NULL). The predicate is still TRUE, because the NULLs do not test equal or unequal to each other, not because we are making NULLs equal to each other.

As you can see, it is a good idea to avoid NULLs in UNIQUE constraints.
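The cases above can be checked with a small sketch that emulates the UNIQUE predicate as a row-counting NOT EXISTS test. The helper is_unique(), the two-column table T, and its column names c1 and c2 are all hypothetical; the row test (<column list>) IS NOT NULL is spelled out column by column so that SQLite can run it.

```python
import sqlite3

# UNIQUE emulated as: no fully non-NULL row value occurs more than once.
UNIQUE_TEST = """
    SELECT NOT EXISTS (SELECT c1, c2
                         FROM T
                        WHERE c1 IS NOT NULL
                          AND c2 IS NOT NULL
                        GROUP BY c1, c2
                       HAVING COUNT(*) > 1)
"""

def is_unique(rows):
    """Load rows into a scratch table and evaluate the emulated predicate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (c1 TEXT, c2 TEXT)")
    conn.executemany("INSERT INTO T VALUES (?, ?)", rows)
    result = conn.execute(UNIQUE_TEST).fetchone()[0]
    conn.close()
    return bool(result)

print(is_unique([]))                            # empty subquery
print(is_unique([("a", "b"), ("a", None)]))     # NULL gets the benefit of the doubt
print(is_unique([("a", None), ("a", None)]))    # NULLs never test equal
print(is_unique([("a", "b"), ("a", "b")]))      # a genuine duplicate
```

Only the last call reports a duplicate; the first three return True, in line with the discussion of empty subqueries and NULLs.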

CHAPTER 17: THE SELECT STATEMENT

The good news about SQL is that the programmer only needs to learn the SELECT statement to do almost all his work. The bad news is that the statement can have so many nested clauses that it looks like a Victorian novel! The SELECT statement is used to query the database. It combines one or more tables, can do some calculations, and finally puts the results into a result table that can be passed on to the host language.

I have not spent much time on the simple one-table SELECT statements you see in introductory books. I am assuming that the readers are experienced SQL programmers and got enough of those queries when they were learning SQL.

17.1 SELECT and JOINs

There is an order to the execution of the clauses of an SQL SELECT statement that does not seem to be covered in most beginning SQL books. It explains why some things work in SQL and others do not.

17.1.1 One-Level SELECT Statement

The simplest possible SELECT statement is just “SELECT * FROM Sometable;” which returns the entire table as it stands. You can actually write this as “TABLE Sometable” in Standard SQL, but nobody seems to use that syntax. Though the syntax rules say that all you need are the SELECT and FROM clauses, in practice there is almost always a WHERE clause.

Let’s look at the SELECT statement in detail. The syntax for the statement is:

SELECT [ALL | DISTINCT] <scalar expression list>
  FROM <table expression>
[WHERE <search condition>]
[GROUP BY <grouping column list>]
[HAVING <group condition>];

The order of execution is as follows:

1. Execute the FROM <table expression> clause and construct the working result table defined in that clause. The FROM can have all sorts of other table expressions, but the point is that they return a working table as a result. We will get into the details of those expressions later, with particular attention to the JOIN operators.

   The result table preserves the order of the tables, and the order of the columns within each, in the result. The result table is different from other tables in that each column retains the table name from which it was derived. Thus if table A and table B both have a column named x, there will be a column A.x and a column B.x in the results of the FROM clause. No product actually uses a CROSS JOIN to construct the intermediate table, because the working table would get too large too fast. For example, a 1,000-row table and a 1,000-row table would CROSS JOIN to get a 1,000,000-row working table. This is just the conceptual model we use to describe behavior.

2. If there is a WHERE clause, apply the search condition in it to each row of the FROM clause result table. The rows that test TRUE are retained; the rows that test FALSE or UNKNOWN are deleted from the working set.

   The WHERE clause is where the action is. The predicate can be quite complex and have nested subqueries. The syntax of a subquery is a SELECT statement, which is inside parentheses; failure to use parentheses is a common error for new SQL programmers. Subqueries are where the original SQL got the name “Structured English Query Language”: the ability to nest SELECT statements was the “structured” part. We will deal with those in another section.

3. If there is a GROUP BY <grouping column list> clause, execute it next. It uses the FROM and WHERE clause working table and breaks these rows into groups where the columns in the <grouping column list> all have the same value. NULLs are treated as if they were all equal to each other, and form their own group. Each group is then reduced to a single row in a new result table that replaces the old one.

   Each row represents information about its group. Standard SQL does not allow you to use the name of a calculated column such as “(salary + commission) AS total_pay” in the GROUP BY clause, because that column is computed and named in the SELECT clause of this query. It does not exist yet. However, you will find products that allow it because they create a result table first, using names in the SELECT clause, then fill the result table with rows created by the query. There are ways to get the same result by using VIEWs and derived table expressions, which we will discuss later.

   Only four things make sense as group characteristics: the columns that define it, the aggregate functions that summarize group characteristics, function calls and constants, and expressions built from those three things.

4. If there is a HAVING clause, apply it to each of the groups. The groups that test TRUE are retained; the groups that test FALSE or UNKNOWN are deleted. If there is no GROUP BY clause, the HAVING clause treats the whole table as a single group. It is not true that there must be a GROUP BY clause.

   Standard SQL prohibits correlated queries in a HAVING clause, but there are workarounds that use derived tables.

   The <group condition> must apply to columns in the grouped working table or to group properties, not to the individual rows that originally built the group. Aggregate functions used in the HAVING clause usually appear in the SELECT clause, but that is not part of the standard. Nor does the SELECT clause have to include all the grouping columns.

5. Finally, apply the SELECT clause to the result table. If a column does not appear in the <scalar expression list>, it is dropped from the final results. Expressions can be constants or column names, or they can be calculations made from constants, columns, functions, and scalar subqueries.

   If the SELECT clause has the DISTINCT option, redundant duplicate rows are deleted from the final result table. The phrase “redundant duplicate” means that one copy of the row is retained. If the SELECT clause has the explicit ALL option or is missing the [ALL | DISTINCT] option, then all duplicate rows are preserved in the final results table. (Frankly, although it is legal syntax, nobody really uses the SELECT ALL option.) Finally, the results are returned.
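Two of the rules in the steps above are easy to observe directly: WHERE keeps only the rows that test TRUE, and GROUP BY places all the NULLs in one group. The sketch below uses Python's sqlite3 module with a hypothetical Personnel table; the names and values are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data, not from the book.
    CREATE TABLE Personnel
    (emp_name TEXT NOT NULL,
     dept TEXT,
     salary INTEGER);

    INSERT INTO Personnel VALUES
        ('Ann', 'Sales', 100),
        ('Bob', NULL, 200),
        ('Cy', NULL, 300),
        ('Di', 'Sales', NULL);
""")

# Step 2: Di's NULL salary makes the test UNKNOWN, so her row is dropped.
kept = [r[0] for r in conn.execute("""
    SELECT emp_name
      FROM Personnel
     WHERE salary > 50
     ORDER BY emp_name
""")]

# Step 3: the two NULL departments form a single group of their own.
groups = dict(conn.execute("""
    SELECT dept, COUNT(*)
      FROM Personnel
     GROUP BY dept
"""))

print(kept, groups)
```

Di is missing from the WHERE result even though her salary was never shown to be 50 or less, and the grouped result has one entry keyed by NULL (None in Python) with a count of two.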

Let us carry an example out in detail, with a two-table join.

SELECT sex, COUNT(*), AVG(age), (MAX(age) - MIN(age)) AS age_range
  FROM Students, Gradebook
 WHERE grade = 'A'
   AND Students.stud_nbr = Gradebook.stud_nbr
 GROUP BY sex
HAVING COUNT(*) > 3;

The two starting tables look like this:

CREATE TABLE Students
(stud_nbr INTEGER NOT NULL PRIMARY KEY,
 stud_name CHAR(10) NOT NULL,
 sex CHAR(1) NOT NULL,
 age INTEGER NOT NULL);

Students
stud_nbr  stud_name  sex  age
=============================
1         'Smith'    'M'  16
2         'Smyth'    'F'  17
3         'Smoot'    'F'  16
4         'Adams'    'F'  17
5         'Jones'    'M'  16
6         'Celko'    'M'  17
7         'Vennor'   'F'  16
8         'Murray'   'M'  18

CREATE TABLE Gradebook
(stud_nbr INTEGER NOT NULL PRIMARY KEY
    REFERENCES Students(stud_nbr),
 grade CHAR(1) NOT NULL);

Gradebook
stud_nbr  grade
===============
1         'A'
2         'B'
3         'C'
4         'D'
5         'A'
6         'A'
7         'A'
8         'A'

The CROSS JOIN in the FROM clause looks like this:

Cross Join working table
Students                      | Gradebook
stud_nbr  stud_name  sex  age | stud_nbr  grade
===============================================
1         'Smith'    'M'  16  | 1         'A'
1         'Smith'    'M'  16  | 2         'B'
1         'Smith'    'M'  16  | 3         'C'
1         'Smith'    'M'  16  | 4         'D'
1         'Smith'    'M'  16  | 5         'A'
1         'Smith'    'M'  16  | 6         'A'
1         'Smith'    'M'  16  | 7         'A'
1         'Smith'    'M'  16  | 8         'A'
2         'Smyth'    'F'  17  | 1         'A'
2         'Smyth'    'F'  17  | 2         'B'
2         'Smyth'    'F'  17  | 3         'C'
2         'Smyth'    'F'  17  | 4         'D'
2         'Smyth'    'F'  17  | 5         'A'
2         'Smyth'    'F'  17  | 6         'A'
2         'Smyth'    'F'  17  | 7         'A'
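For readers who want to follow along, the example can be loaded and run with Python's sqlite3 module. The schema, rows, and query below are the ones given in the text; only the Python wrapper and the CROSS JOIN row count are additions, and SQLite stands in here for whatever SQL product you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students
    (stud_nbr INTEGER NOT NULL PRIMARY KEY,
     stud_name CHAR(10) NOT NULL,
     sex CHAR(1) NOT NULL,
     age INTEGER NOT NULL);

    CREATE TABLE Gradebook
    (stud_nbr INTEGER NOT NULL PRIMARY KEY
        REFERENCES Students(stud_nbr),
     grade CHAR(1) NOT NULL);

    INSERT INTO Students VALUES
        (1, 'Smith', 'M', 16), (2, 'Smyth', 'F', 17),
        (3, 'Smoot', 'F', 16), (4, 'Adams', 'F', 17),
        (5, 'Jones', 'M', 16), (6, 'Celko', 'M', 17),
        (7, 'Vennor', 'F', 16), (8, 'Murray', 'M', 18);

    INSERT INTO Gradebook VALUES
        (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'),
        (5, 'A'), (6, 'A'), (7, 'A'), (8, 'A');
""")

# Size of the conceptual working table built by the FROM clause.
cross_rows = conn.execute(
    "SELECT COUNT(*) FROM Students CROSS JOIN Gradebook"
).fetchone()[0]

# The full query from the text.
result = conn.execute("""
    SELECT sex, COUNT(*), AVG(age), (MAX(age) - MIN(age)) AS age_range
      FROM Students, Gradebook
     WHERE grade = 'A'
       AND Students.stud_nbr = Gradebook.stud_nbr
     GROUP BY sex
    HAVING COUNT(*) > 3
""").fetchall()

print(cross_rows, result)
```

The conceptual working table has 8 x 8 = 64 rows. After the WHERE clause, five students with an 'A' remain (four male, one female), so only the male group survives HAVING COUNT(*) > 3, with an average age of 16.75 and an age range of 2.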
