CHAPTER 16: QUANTIFIED SUBQUERY PREDICATES
When you get a single FALSE result, the whole predicate is FALSE. As long as <table expression> has cardinality greater than zero and all non-NULL values, you will get a result of TRUE or FALSE.
That sounds reasonable so far. Now let EmptyTable be an empty table (no rows, cardinality zero) and NullTable be a table with only NULLs in its rows (cardinality greater than zero). The rules for SQL say that <value expression> <comp op> ALL NullTable always returns UNKNOWN, and likewise <value expression> <comp op> ANY NullTable always returns UNKNOWN. This makes sense, because every row comparison test in the expansion would return UNKNOWN, so the series of OR and AND operators would behave in the usual way.
However, <value expression> <comp op> ALL EmptyTable always returns TRUE, and <value expression> <comp op> ANY EmptyTable always returns FALSE. Most people have no trouble seeing why the ANY predicate works that way; you cannot find a match, so the result is FALSE. But most people have lots of trouble seeing why the ALL predicate is TRUE. This convention is called existential import, and I have just discussed it in Chapter 15. If I were to walk into a bar and announce that I can beat any pink elephant in the bar, that would be a true statement. The fact that there are no pink elephants in the bar merely shows that the problem is reduced to the minimum case.
If this seems unnatural, then convert the ALL and ANY predicates into EXISTS predicates and look at the way that this rule preserves the properties that:
1. (∀x P(x)) = (¬ ∃x (¬P(x)))
2. (∃x P(x)) = ¬ (∀x ¬P(x))
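The Table1.x <comp op> ALL (SELECT y FROM Table2 WHERE <search condition>) predicate converts to the following, where the negation falls on the comparison rather than on the search condition, as property 1 requires:
NOT EXISTS (SELECT * FROM Table1, Table2 WHERE <search condition> AND NOT (Table1.x <comp op> Table2.y))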
The Table1.x <comp op> ANY (SELECT y FROM Table2 WHERE <search condition>) predicate converts to:
EXISTS (SELECT *
FROM Table1, Table2
WHERE Table1.x <comp op> Table2.y
AND <search condition>)
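The empty-table rule can be checked through these EXISTS forms. The following is a minimal sketch in Python using the standard sqlite3 module; SQLite does not implement the quantified ALL and ANY comparisons, so only the EXISTS rewrites are run. The table name EmptyTable follows the text, while the column y, the value 5, and the choice of < as the comparison are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmptyTable (y INTEGER)")  # no rows are ever inserted

# 5 < ALL EmptyTable, written as "no counterexample exists"
all_result = conn.execute(
    "SELECT NOT EXISTS (SELECT * FROM EmptyTable WHERE NOT (5 < y))"
).fetchone()[0]

# 5 < ANY EmptyTable, written as "some witness exists"
any_result = conn.execute(
    "SELECT EXISTS (SELECT * FROM EmptyTable WHERE 5 < y)"
).fetchone()[0]

print(all_result, any_result)  # 1 0, that is, TRUE and FALSE
```

With no rows to test, there is no counterexample for ALL and no witness for ANY, which is exactly the pink elephant argument.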
Of the two quantified predicates, the <comp op> ALL predicate is used more. The ANY predicate is more easily replaced and more naturally written with an EXISTS() predicate or an IN() predicate. In fact, the standard defines the IN() predicate as shorthand for = ANY and the NOT IN() predicate as shorthand for <> ALL. Note that this is not <> ANY, which is how most people would construct it in English.
The <comp op> ALL predicate is probably the more useful of the two, since it cannot be written in terms of an IN() predicate. The trick with it is to make sure that its subquery defines the set of values in which you are interested. For example, to find the authors whose books all sell for more than $19.95, you could write:
SELECT *
FROM Authors AS A1
WHERE 19.95
< ALL (SELECT price
FROM Books AS B1
WHERE A1.author_name = B1.author_name);
The best way to think of this is to reverse the usual English sentence “Show me all x that are y” in your mind so that it says “y is the value of all x” instead.
16.3 The ALL Predicate and Extrema Functions
It is counterintuitive at first that these two predicates are not the same in SQL:
x >= (SELECT MAX(y) FROM Table1)
x >= ALL (SELECT y FROM Table1)
But you have to remember the rules for the extrema functions: they drop out all the NULLs before returning the greatest or least values. The ALL predicate does not drop NULLs, so a NULL in the subquery result takes part in the comparison and can make the predicate UNKNOWN. However, if you know that there are no NULLs in a column, or are willing to drop the NULLs yourself, then you can use the ALL predicate to construct single queries to do work that would otherwise be done by two queries. For example, we could use the table of products and store managers we used earlier in this chapter and find which manager handles the largest number of products. To do this, we would first construct a grouped VIEW and then query it against its own maximum:
CREATE VIEW TotalProducts (manager_name, product_tally)
AS SELECT manager_name, COUNT(*)
FROM Stores
GROUP BY manager_name;
SELECT manager_name
FROM TotalProducts
WHERE product_tally = (SELECT MAX(product_tally) FROM TotalProducts);
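The view-based version can be tried end to end. This is a small sketch in Python with the standard sqlite3 module; the Stores layout (manager_name, product_name) and the sample rows are assumptions made for illustration, since the chapter's earlier table is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout and data: one row per product a manager handles.
    CREATE TABLE Stores (manager_name TEXT NOT NULL, product_name TEXT NOT NULL);
    INSERT INTO Stores VALUES
        ('Alice', 'Widget'), ('Alice', 'Gadget'), ('Alice', 'Gizmo'),
        ('Bob', 'Sprocket'), ('Bob', 'Flange'),
        ('Carol', 'Bracket');

    CREATE VIEW TotalProducts (manager_name, product_tally)
    AS SELECT manager_name, COUNT(*)
         FROM Stores
        GROUP BY manager_name;
""")

# Compare each manager's tally against the largest tally in the view.
top_managers = conn.execute("""
    SELECT manager_name
      FROM TotalProducts
     WHERE product_tally = (SELECT MAX(product_tally) FROM TotalProducts)
""").fetchall()

print(top_managers)  # [('Alice',)]
```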
But Alex Dorfman found a single query solution instead:
SELECT manager_name, COUNT(*)
FROM Stores
GROUP BY manager_name
HAVING COUNT(*) + 1 > ALL (SELECT DISTINCT COUNT(*)
FROM Stores
GROUP BY manager_name);
The use of the SELECT DISTINCT in the subquery is to guarantee that we do not get duplicate rows when two managers handle the same number of products. You can also add a WHERE dept IS NOT NULL clause to the subquery to get the effect of a true MAX() aggregate function.
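The reason NULLs have to be filtered out before ALL behaves like MAX() can be seen in a small model of SQL's three-valued logic. The sketch below is plain Python, not SQL; the function names ge_all and ge_max are hypothetical, and None stands in for both NULL and UNKNOWN.

```python
def ge_all(x, ys):
    """Model of x >= ALL (ys): FALSE beats UNKNOWN, UNKNOWN beats TRUE."""
    saw_unknown = False
    for y in ys:
        if y is None:
            saw_unknown = True      # comparison with NULL is UNKNOWN
        elif x < y:
            return False            # a single FALSE decides the result
    return None if saw_unknown else True


def ge_max(x, ys):
    """Model of x >= (SELECT MAX(y) ...): MAX drops NULLs first."""
    non_null = [y for y in ys if y is not None]
    if not non_null:
        return None                 # MAX of no values is NULL, so UNKNOWN
    return x >= max(non_null)


print(ge_max(3, [1, 2, None]), ge_all(3, [1, 2, None]))  # True None
print(ge_max(3, []), ge_all(3, []))                      # None True
```

The two forms disagree in both directions: a NULL makes the ALL form UNKNOWN where MAX() says TRUE, and an empty table makes the MAX() form UNKNOWN where ALL says TRUE.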
16.4 The UNIQUE Predicate
The UNIQUE predicate is a test for the absence of duplicate rows in a subquery. The UNIQUE keyword is also used as a table or column constraint, and this predicate is used to define that constraint. The UNIQUE column constraint is implemented in many SQL implementations with a CREATE UNIQUE INDEX <indexname> ON <table>(<column list>) statement hidden under the covers. The syntax for this predicate is:
<unique predicate> ::= UNIQUE <table subquery>
If any two rows in the subquery are equal to each other, the predicate is FALSE. However, the definition in the standard is worded in the negative, so that NULLs get the benefit of the doubt. Because the predicate is TRUE only when no duplicate group can be found, the query can be written as a NOT EXISTS predicate that counts rows, thus:
NOT EXISTS (SELECT <column list>
FROM <subquery>
WHERE (<column list>) IS NOT NULL
GROUP BY <column list>
HAVING COUNT(*) > 1);
An empty subquery is always TRUE, since you cannot find two rows, and therefore duplicates do not exist. This makes sense on the face of it.
NULLs are easier to explain with an example: say a table with only two rows, ('a', 'b') and ('a', NULL). The first columns of each row are non-NULL and are equal to each other, so we have a match so far. The second column in the second row is NULL and cannot compare to anything, so we skip the second column pair and go with what we have, and the test is TRUE. This is giving the NULLs the benefit of the doubt, since the NULL in the second row could become 'b' some day and give us a duplicate row.
Now consider the case where the subquery has two rows, ('a', NULL) and ('a', NULL). The predicate is still TRUE, because the NULLs do not test equal or unequal to each other, not because we are making NULLs equal to each other.
As you can see, it is a good idea to avoid NULLs in UNIQUE constraints.
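These cases can be checked with the row-counting form of the predicate. The following is a sketch in Python with the standard sqlite3 module; SQLite has no UNIQUE predicate, so the test is expressed as "no group of fully non-NULL rows occurs more than once". The helper name is_unique, the table T, and its columns c1 and c2 are hypothetical.

```python
import sqlite3


def is_unique(rows):
    """Emulate UNIQUE (subquery) for a two-column table holding `rows`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (c1 TEXT, c2 TEXT)")
    conn.executemany("INSERT INTO T VALUES (?, ?)", rows)
    # Rows containing any NULL are skipped, so they can never form a duplicate.
    (result,) = conn.execute("""
        SELECT NOT EXISTS (SELECT c1, c2
                             FROM T
                            WHERE c1 IS NOT NULL AND c2 IS NOT NULL
                            GROUP BY c1, c2
                           HAVING COUNT(*) > 1)
    """).fetchone()
    conn.close()
    return bool(result)


print(is_unique([]))                          # True
print(is_unique([("a", "b"), ("a", None)]))   # True
print(is_unique([("a", None), ("a", None)]))  # True
print(is_unique([("a", "b"), ("a", "b")]))    # False
```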
CHAPTER 17
The SELECT Statement
THE GOOD NEWS ABOUT SQL is that the programmer only needs to learn the SELECT statement to do almost all his work. The bad news is that the statement can have so many nested clauses that it looks like a Victorian novel! The SELECT statement is used to query the database. It combines one or more tables, can do some calculations, and finally puts the results into a result table that can be passed on to the host language.
I have not spent much time on the simple one-table SELECT statements you see in introductory books. I am assuming that the readers are experienced SQL programmers and got enough of those queries when they were learning SQL.
17.1 SELECT and JOINs
There is an order to the execution of the clauses of an SQL SELECT statement that does not seem to be covered in most beginning SQL books. It explains why some things work in SQL and others do not.
17.1.1 One-Level SELECT Statement
The simplest possible SELECT statement is just “SELECT * FROM Sometable;” which returns the entire table as it stands. You can actually write this as “TABLE Sometable” in Standard SQL, but nobody seems to use that syntax. Though the syntax rules say that all you need are the SELECT and FROM clauses, in practice there is almost always a WHERE clause.
Let’s look at the SELECT statement in detail. The syntax for the statement is:
SELECT [ALL | DISTINCT] <scalar expression list>
FROM <table expression>
[WHERE <search condition>]
[GROUP BY <grouping column list>]
[HAVING <group condition>];
The order of execution is as follows:
1. Execute the FROM <table expression> clause and construct the working result table defined in that clause. The FROM can have all sorts of other table expressions, but the point is that they return a working table as a result. We will get into the details of those expressions later, with particular attention to the JOIN operators.
The result table preserves the order of the tables, and the order of the columns within each, in the result. The result table is different from other tables in that each column retains the table name from which it was derived. Thus if table A and table B both have a column named x, there will be a column A.x and a column B.x in the results of the FROM clause. No product actually uses a CROSS JOIN to construct the intermediate table; the working table would get too large too fast. For example, a 1,000-row table and a 1,000-row table would CROSS JOIN to get a 1,000,000-row working table. This is just the conceptual model we use to describe behavior.
2. If there is a WHERE clause, apply the search condition in it to each row of the FROM clause result table. The rows that test TRUE are retained; the rows that test FALSE or UNKNOWN are deleted from the working set.
The WHERE clause is where the action is. The predicate can be quite complex and have nested subqueries. The syntax of a subquery is a SELECT statement, which is inside parentheses; failure to use parentheses is a common error for new SQL programmers. Subqueries are where the original SQL got the name “Structured English Query Language”: the ability to nest SELECT statements was the “structured” part. We will deal with those in another section.
3. If there is a GROUP BY <grouping column list> clause, execute it next. It uses the FROM and WHERE clause working table and breaks these rows into groups where the columns in the <grouping column list> all have the same value. NULLs are treated as if they were all equal to each other, and form their own group. Each group is then reduced to a single row in a new result table that replaces the old one.
Each row represents information about its group. Standard SQL does not allow you to use the name of a calculated column such as “(salary + commission) AS total_pay” in the GROUP BY clause, because that column is computed and named in the SELECT clause of this query. It does not exist yet. However, you will find products that allow it because they create a result table first, using names in the SELECT clause, then fill the result table with rows created by the query. There are ways to get the same result by using VIEWs and derived table expressions, which we will discuss later.
Only four things make sense as group characteristics: the columns that define it, the aggregate functions that summarize group characteristics, function calls and constants, and expressions built from those three things.
4. If there is a HAVING clause, apply it to each of the groups. The groups that test TRUE are retained; the groups that test FALSE or UNKNOWN are deleted. If there is no GROUP BY clause, the HAVING clause treats the whole table as a single group. It is not true that there must be a GROUP BY clause.
Standard SQL prohibits correlated queries in a HAVING clause, but there are workarounds that use derived tables.
The <group condition> must apply to columns in the grouped working table or to group properties, not to the individual rows that originally built the group. Aggregate functions used in the HAVING clause usually appear in the SELECT clause, but that is not part of the standard. Nor does the SELECT clause have to include all the grouping columns.
5. Finally, apply the SELECT clause to the result table. If a column does not appear in the <expression list>, it is dropped from the final results. Expressions can be constants or column names, or they can be calculations made from constants, columns, functions, and scalar subqueries.
If the SELECT clause has the DISTINCT option, redundant duplicate rows are deleted from the final result table. The phrase “redundant duplicate” means that one copy of the row is retained. If the SELECT clause has the explicit ALL option or is missing the [ALL | DISTINCT] option, then all duplicate rows are preserved in the final results table. (Frankly, although it is legal syntax, nobody really uses the SELECT ALL option.) Finally, the results are returned.
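Two of the rules in this list are easy to observe directly: a WHERE clause discards rows that test UNKNOWN as well as FALSE, and GROUP BY puts all the NULLs into a single group. The sketch below uses Python's standard sqlite3 module; the Personnel table, its columns, and its rows are hypothetical and chosen only to exercise those two rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data with NULLs in both columns.
    CREATE TABLE Personnel (dept TEXT, salary INTEGER);
    INSERT INTO Personnel VALUES
        ('A', 10), ('A', NULL), (NULL, 5), (NULL, 7), ('B', 20);
""")

# Step 2: salary 5 tests FALSE and the NULL salary tests UNKNOWN; both go.
kept_salaries = sorted(
    row[0] for row in conn.execute("SELECT salary FROM Personnel WHERE salary > 6")
)
print(kept_salaries)  # [7, 10, 20]

# Step 3: the two NULL departments form one group of their own.
group_sizes = dict(
    conn.execute("SELECT dept, COUNT(*) FROM Personnel GROUP BY dept").fetchall()
)
print(group_sizes == {None: 2, "A": 2, "B": 1})  # True
```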
Let us carry an example out in detail, with a two-table join:
SELECT sex, COUNT(*), AVG(age), (MAX(age) - MIN(age)) AS age_range
FROM Students, Gradebook
WHERE grade = 'A'
AND Students.stud_nbr = Gradebook.stud_nbr
GROUP BY sex
HAVING COUNT(*) > 3;
The two starting tables look like this:
CREATE TABLE Students
(stud_nbr INTEGER NOT NULL PRIMARY KEY,
stud_name CHAR(10) NOT NULL,
sex CHAR(1) NOT NULL,
age INTEGER NOT NULL);
Students
stud_nbr stud_name sex age
===============================
1 'Smith' 'M' 16
2 'Smyth' 'F' 17
3 'Smoot' 'F' 16
4 'Adams' 'F' 17
5 'Jones' 'M' 16
6 'Celko' 'M' 17
7 'Vennor' 'F' 16
8 'Murray' 'M' 18
CREATE TABLE Gradebook
(stud_nbr INTEGER NOT NULL PRIMARY KEY
REFERENCES Students(stud_nbr),
grade CHAR(1) NOT NULL);
Gradebook
stud_nbr grade
=================
1 'A'
2 'B'
3 'C'
4 'D'
5 'A'
6 'A'
7 'A'
8 'A'
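The two tables and the example query can be loaded into SQLite to watch the working table shrink clause by clause. This is a sketch in Python with the standard sqlite3 module; the data and the query are the ones shown above, while the helper name scalar and the intermediate counting queries are additions made for illustration. A real engine will not materialize these stages, as noted in step 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students
    (stud_nbr INTEGER NOT NULL PRIMARY KEY,
     stud_name CHAR(10) NOT NULL,
     sex CHAR(1) NOT NULL,
     age INTEGER NOT NULL);
    CREATE TABLE Gradebook
    (stud_nbr INTEGER NOT NULL PRIMARY KEY
       REFERENCES Students(stud_nbr),
     grade CHAR(1) NOT NULL);
    INSERT INTO Students VALUES
        (1, 'Smith', 'M', 16), (2, 'Smyth', 'F', 17), (3, 'Smoot', 'F', 16),
        (4, 'Adams', 'F', 17), (5, 'Jones', 'M', 16), (6, 'Celko', 'M', 17),
        (7, 'Vennor', 'F', 16), (8, 'Murray', 'M', 18);
    INSERT INTO Gradebook VALUES
        (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'),
        (5, 'A'), (6, 'A'), (7, 'A'), (8, 'A');
""")

JOINED = """FROM Students, Gradebook
            WHERE grade = 'A'
              AND Students.stud_nbr = Gradebook.stud_nbr"""


def scalar(sql):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]


cross_rows = scalar("SELECT COUNT(*) FROM Students, Gradebook")  # FROM only
where_rows = scalar("SELECT COUNT(*) " + JOINED)                 # after WHERE
groups = sorted(conn.execute("SELECT sex, COUNT(*) " + JOINED + " GROUP BY sex"))
final = conn.execute(
    "SELECT sex, COUNT(*), AVG(age), (MAX(age) - MIN(age)) AS age_range "
    + JOINED + " GROUP BY sex HAVING COUNT(*) > 3"
).fetchall()

print(cross_rows, where_rows)  # 64 5
print(groups)                  # [('F', 1), ('M', 4)]
print(final)                   # [('M', 4, 16.75, 2)]
```

The 64-row cross product drops to 5 rows after the WHERE clause, those rows fall into two groups, and only the group with more than three members survives the HAVING clause.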
The CROSS JOIN in the FROM clause looks like this:
Cross Join working table
Students Gradebook
stud_nbr stud_name sex age | stud_nbr grade
====================================================
1 'Smith' 'M' 16 | 1 'A'
1 'Smith' 'M' 16 | 2 'B'
1 'Smith' 'M' 16 | 3 'C'
1 'Smith' 'M' 16 | 4 'D'
1 'Smith' 'M' 16 | 5 'A'
1 'Smith' 'M' 16 | 6 'A'
1 'Smith' 'M' 16 | 7 'A'
1 'Smith' 'M' 16 | 8 'A'
2 'Smyth' 'F' 17 | 1 'A'
2 'Smyth' 'F' 17 | 2 'B'
2 'Smyth' 'F' 17 | 3 'C'
2 'Smyth' 'F' 17 | 4 'D'
2 'Smyth' 'F' 17 | 5 'A'
2 'Smyth' 'F' 17 | 6 'A'
2 'Smyth' 'F' 17 | 7 'A'