Joe Celko's SQL for Smarties - Advanced SQL Programming P34


302 CHAPTER 15: EXISTS() PREDICATE

SELECT P1.emp_name, ' was born on a day without a famous New Yorker!'
  FROM Personnel AS P1
 WHERE P1.birthday
       NOT IN (SELECT C1.birthday
                 FROM Celebrities AS C1
                WHERE C1.birth_city = 'New York');

and you would think that the EXISTS version would be:

SELECT P1.emp_name, ' was born on a day without a famous New Yorker!'
  FROM Personnel AS P1
 WHERE NOT EXISTS
       (SELECT *
          FROM Celebrities AS C1
         WHERE C1.birth_city = 'New York'
           AND C1.birthday = P1.birthday);

Assume that Gloria Glamour is our only New Yorker and we still do not know her birthday. The subquery will be empty for every employee in the NOT EXISTS predicate version, because her NULL birthday will not test equal to the known employee birthdays. That means that the NOT EXISTS predicate will return TRUE and we will get every employee back, each reported as born on a day without a famous New Yorker. But now look at the IN predicate version, which will have a single NULL in the subquery result. This predicate will be equivalent to NOT (Personnel.birthday = NULL), which is always UNKNOWN, and we will get no employees back.
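The difference is easy to reproduce. The following is a minimal sketch using Python's built-in sqlite3 module; the column types, the two sample employees ('Alice' and 'Bob'), and the choice of SQLite are assumptions made for illustration and are not part of the original schema.

```python
import sqlite3

# In-memory database; table layouts are simplified assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Personnel
    (emp_name TEXT NOT NULL PRIMARY KEY,
     birthday TEXT NOT NULL);

    CREATE TABLE Celebrities
    (emp_name TEXT NOT NULL PRIMARY KEY,
     birth_city TEXT NOT NULL,
     birthday TEXT);  -- NULL means the birthday is unknown

    INSERT INTO Personnel VALUES ('Alice', '1970-01-15'), ('Bob', '1982-06-30');
    INSERT INTO Celebrities VALUES ('Gloria Glamour', 'New York', NULL);
""")

# NOT IN version: the subquery yields a single NULL, so the predicate
# is UNKNOWN for every employee and no rows qualify.
not_in_names = [row[0] for row in conn.execute("""
    SELECT P1.emp_name
      FROM Personnel AS P1
     WHERE P1.birthday
           NOT IN (SELECT C1.birthday
                     FROM Celebrities AS C1
                    WHERE C1.birth_city = 'New York')
     ORDER BY P1.emp_name
""")]

# NOT EXISTS version: the correlated subquery is empty for every
# employee, so NOT EXISTS is TRUE and every row qualifies.
not_exists_names = [row[0] for row in conn.execute("""
    SELECT P1.emp_name
      FROM Personnel AS P1
     WHERE NOT EXISTS
           (SELECT *
              FROM Celebrities AS C1
             WHERE C1.birth_city = 'New York'
               AND C1.birthday = P1.birthday)
     ORDER BY P1.emp_name
""")]

print(not_in_names)      # no employees
print(not_exists_names)  # every employee
```

Other SQL products should behave the same way as long as they follow the Standard SQL rules for NULLs in IN predicates.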

Likewise, you cannot, in general, transform the quantified comparison predicates into EXISTS predicates, because of the possibility of NULL values. Remember that x <> ALL <subquery> is shorthand for x NOT IN <subquery>, and x = ANY <subquery> is shorthand for x IN <subquery>, so this will not surprise you.

In general, the EXISTS predicates will run faster than the IN predicates. The problem is in deciding whether to build the query or the subquery first; the optimal approach depends on the size and distribution of values in each, and that cannot usually be known until runtime.

15.2 EXISTS and INNER JOINs

The [NOT] EXISTS predicate is almost always used with a correlated subquery. Very often the subquery can be “flattened” into a JOIN, which will frequently run faster than the original query. Our sample query can be converted into:

SELECT P1.emp_name, ' has the same birthday as a famous person!'

FROM Personnel AS P1, Celebrities AS C1

WHERE P1.birthday = C1.birthday;

The advantage of the JOIN version is that it allows us to show columns from both tables. We should make the query more informative by rewriting it:

SELECT P1.emp_name, ' has the same birthday as ', C1.emp_name
  FROM Personnel AS P1, Celebrities AS C1
 WHERE P1.birthday = C1.birthday;

This new query could be written with an EXISTS() predicate, but that is a waste of resources:

SELECT P1.emp_name, ' has the same birthday as ', C1.emp_name
  FROM Personnel AS P1, Celebrities AS C1
 WHERE EXISTS
       (SELECT *
          FROM Celebrities AS C2
         WHERE P1.birthday = C2.birthday
           AND C1.emp_name = C2.emp_name);

15.3 NOT EXISTS and OUTER JOINs

The NOT EXISTS version of this predicate is almost always used with a correlated subquery. Very often the subquery can be “flattened” into an OUTER JOIN, which will frequently run faster than the original query. Our other sample query was:

SELECT P1.emp_name, ' was born on a day without a famous New Yorker!'
  FROM Personnel AS P1
 WHERE NOT EXISTS
       (SELECT *
          FROM Celebrities AS C1
         WHERE C1.birth_city = 'New York'
           AND C1.birthday = P1.birthday);


Which we can replace with:

SELECT P1.emp_name, ' was born on a day without a famous New Yorker!'
  FROM Personnel AS P1
       LEFT OUTER JOIN
       Celebrities AS C1
       ON C1.birth_city = 'New York'
          AND C1.birthday = P1.birthday
 WHERE C1.emp_name IS NULL;

This is assuming that we know each and every celebrity name in the Celebrities table. If the column in the WHERE clause could have NULLs in its base table, then we could not prune out the generated NULLs. The test for NULL should always be on (a column of) the primary key, which cannot be NULL. Relating this back to the example, how could a celebrity be a celebrity with an unknown name? Even The Unknown Comic had a name (“The Unknown Comic”).
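As a rough check that the two forms agree when the tested column is a key, here is a small sketch using Python's sqlite3 module. The sample rows and the decision to make emp_name the primary key of Celebrities are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Personnel
    (emp_name TEXT NOT NULL PRIMARY KEY,
     birthday TEXT NOT NULL);

    -- emp_name is assumed to be the key, so it can never be NULL
    CREATE TABLE Celebrities
    (emp_name TEXT NOT NULL PRIMARY KEY,
     birth_city TEXT NOT NULL,
     birthday TEXT);

    INSERT INTO Personnel VALUES ('Alice', '1970-01-15'), ('Bob', '1982-06-30');
    INSERT INTO Celebrities
    VALUES ('Gloria Glamour', 'New York', '1970-01-15'),
           ('Hank Hollywood', 'Los Angeles', '1982-06-30');
""")

# Correlated NOT EXISTS form.
not_exists_names = [row[0] for row in conn.execute("""
    SELECT P1.emp_name
      FROM Personnel AS P1
     WHERE NOT EXISTS
           (SELECT *
              FROM Celebrities AS C1
             WHERE C1.birth_city = 'New York'
               AND C1.birthday = P1.birthday)
     ORDER BY P1.emp_name
""")]

# Flattened form: unmatched employees are padded with NULLs, and the
# IS NULL test on the key column keeps only those padded rows.
outer_join_names = [row[0] for row in conn.execute("""
    SELECT P1.emp_name
      FROM Personnel AS P1
           LEFT OUTER JOIN
           Celebrities AS C1
           ON C1.birth_city = 'New York'
              AND C1.birthday = P1.birthday
     WHERE C1.emp_name IS NULL
     ORDER BY P1.emp_name
""")]

print(not_exists_names, outer_join_names)
```

Alice shares a birthday with a New Yorker and drops out of both results; Bob matches only a Los Angeles celebrity, so both queries keep him.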

15.4 EXISTS() and Quantifiers

Formal logic makes use of quantifiers that can be applied to propositions. The two forms are “For all x, P(x)” and “For some x, P(x)”. The first is written as ∀ (an inverted uppercase A) and the second is written as ∃ (a reversed uppercase E), if you want to look up formulas in a textbook. The quantifiers put into symbols such statements as “all men are mortal” or “some Cretans are liars” so they can be manipulated.

The big question more than 100 years ago was that of existential import in formal logic. Everyone agreed that saying “all men are mortal” implies that “no men are not mortal,” but does it also imply that “some men are mortal,” that is, that we have to have at least one man who is mortal? Existential import lost the battle, and the modern convention is that “All men are mortal” has the same meaning as “There are no men who are immortal,” but does not imply that any men exist at all. This is the convention followed in the design of SQL.

Consider the statement “some salesmen are liars” and the way we would write it with the EXISTS() predicate in SQL:

EXISTS(SELECT *


FROM Personnel AS P1, Liars AS L1

WHERE P1.job = 'Salesman'

AND P1.emp_name = L1.emp_name);

If we are more cynical about salesmen, we might want to formulate the predicate “all salesmen are liars” with the EXISTS predicate in SQL, using the transform rule just discussed:

NOT EXISTS(SELECT *

FROM Personnel AS P1

WHERE P1.job = 'Salesman'

AND P1.emp_name

NOT IN

(SELECT L1.emp_name

FROM Liars AS L1));

That says, informally, “there are no salesmen who are not liars” in English. In this case, the IN predicate can be changed into a JOIN, which should improve performance and be a bit easier to read.
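The lack of existential import can be seen by evaluating the NOT EXISTS predicate against an empty table. The sketch below uses Python's sqlite3 module; the helper name all_salesmen_are_liars, the column types, and the sample names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Personnel
    (emp_name TEXT NOT NULL PRIMARY KEY,
     job TEXT NOT NULL);

    CREATE TABLE Liars
    (emp_name TEXT NOT NULL PRIMARY KEY);
""")

def all_salesmen_are_liars(conn):
    """Evaluate 'there are no salesmen who are not liars'."""
    row = conn.execute("""
        SELECT NOT EXISTS
               (SELECT *
                  FROM Personnel AS P1
                 WHERE P1.job = 'Salesman'
                   AND P1.emp_name
                       NOT IN (SELECT L1.emp_name
                                 FROM Liars AS L1))
    """).fetchone()
    return bool(row[0])

# No salesmen at all: the statement is vacuously TRUE.
with_no_salesmen = all_salesmen_are_liars(conn)

# One salesman, and he is a known liar: still TRUE.
conn.execute("INSERT INTO Personnel VALUES ('Sam', 'Salesman')")
conn.execute("INSERT INTO Liars VALUES ('Sam')")
with_one_lying_salesman = all_salesmen_are_liars(conn)

# A second salesman who is not in Liars: now FALSE.
conn.execute("INSERT INTO Personnel VALUES ('Tom', 'Salesman')")
with_one_honest_salesman = all_salesmen_are_liars(conn)

print(with_no_salesmen, with_one_lying_salesman, with_one_honest_salesman)
```

Liars.emp_name is declared NOT NULL here on purpose; a NULL in that column would bring back the NOT IN problem described earlier in the chapter.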

15.5 EXISTS() and Referential Constraints

Standard SQL was designed so that the declarative referential constraints could be expressed as EXISTS() predicates in a CHECK() clause. For example:

CREATE TABLE Addresses

(addressee_name CHAR(25) NOT NULL PRIMARY KEY,

street_loc CHAR(25) NOT NULL,

city_name CHAR(20) NOT NULL,

state_code CHAR(2) NOT NULL

REFERENCES ZipCodeData(state_code),

...);

could be written as:

CREATE TABLE Addresses
(addressee_name CHAR(25) NOT NULL PRIMARY KEY,
 street_loc CHAR(25) NOT NULL,
 city_name CHAR(20) NOT NULL,
 state_code CHAR(2) NOT NULL,
 CONSTRAINT valid_state_code
   CHECK (EXISTS (SELECT *
                    FROM ZipCodeData AS Z1
                   WHERE Z1.state_code = Addresses.state_code)),
 ...);

There is no advantage to this expression for the DBA, since you cannot attach referential actions with the CHECK() constraint. However, an SQL database can use the same mechanisms in the SQL compiler for both constructions.

15.6 EXISTS and Three-Valued Logic

This example is due to an article by Lee Fesperman at FirstSQL, using Chris Date’s “SupplierParts” table with three rows:

CREATE TABLE SupplierParts
(sup_nbr CHAR(2) NOT NULL PRIMARY KEY,
 part_nbr CHAR(2) NOT NULL,
 qty INTEGER CHECK (qty > 0));

sup_nbr  part_nbr  qty
======================
'S1'     'P1'      NULL
'S2'     'P1'      200
'S3'     'P1'      1000

The row (‘S1’, ‘P1’, NULL) means that supplier ‘S1’ supplies part ‘P1’ but we do not know what quantity he has.

The query we wish to answer is “Find suppliers of part ‘P1’, but not in a quantity of 1000 on hand.” The correct answer is ‘S2’. All suppliers in the table supply ‘P1’, but we do know that ‘S3’ supplies the part in quantity 1000, and we do not know in what quantity ‘S1’ supplies the part. The only supplier we can select for certain is ‘S2’.

An SQL query to retrieve this result would be:

SELECT spx.sup_nbr
  FROM SupplierParts AS spx
 WHERE spx.part_nbr = 'P1'
   AND 1000
       NOT IN (SELECT spy.qty
                 FROM SupplierParts AS spy
                WHERE spy.sup_nbr = spx.sup_nbr
                  AND spy.part_nbr = 'P1');

According to Standard SQL, this query should return only ‘S2’, but when we transform the query into an equivalent version, using EXISTS instead, we obtain:

SELECT spx.sup_nbr

FROM SupplierParts AS spx

WHERE spx.part_nbr = 'P1'

AND NOT EXISTS

(SELECT *

FROM SupplierParts AS spy

WHERE spy.sup_nbr = spx.sup_nbr

AND spy.part_nbr = 'P1'

AND spy.qty = 1000);

Which will return (‘S1’, ‘S2’). You can argue that this is the wrong answer because we do not definitely know whether or not ‘S1’ supplies ‘P1’ in quantity 1000. The EXISTS() predicate will return TRUE or FALSE, even in situations where a subquery’s predicate returns an UNKNOWN (i.e., NULL = 1000).

The solution is to modify the predicate that deals with the quantity in the subquery to explicitly say whether or not you want to give the “benefit of the doubt” to the NULL. You have several alternatives:

1. (spy.qty = 1000) IS NOT FALSE

This uses the new predicates in Standard SQL for testing logical values. Frankly, this is confusing to read and worse to maintain.

2. (spy.qty = 1000 OR spy.qty IS NULL)

This uses another test predicate, but the optimizer can probably use any index on the qty column.

3. (COALESCE(spy.qty, 1000) = 1000)

This is portable and easy to maintain. The only disadvantage is that some SQL products might not be able to use an index on the qty column, because it is in an expression.

The real problem is that the query was formed with a double negative in the form of a NOT EXISTS and an implicit IS NOT FALSE condition. The problem stems from the fact that the EXISTS() predicate is one of the few two-value predicates in SQL, and that (NOT (NOT UNKNOWN)) = UNKNOWN.
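The whole SupplierParts example can be replayed with a short script. This is a sketch using Python's sqlite3 module; the helper suppliers_without_1000 is a hypothetical name, and the first alternative (IS NOT FALSE) is left out because support for that syntax varies between products and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SupplierParts
    (sup_nbr CHAR(2) NOT NULL PRIMARY KEY,
     part_nbr CHAR(2) NOT NULL,
     qty INTEGER CHECK (qty > 0));

    INSERT INTO SupplierParts
    VALUES ('S1', 'P1', NULL), ('S2', 'P1', 200), ('S3', 'P1', 1000);
""")

# The NOT IN formulation: UNKNOWN for 'S1', so only 'S2' comes back.
not_in_result = [row[0] for row in conn.execute("""
    SELECT spx.sup_nbr
      FROM SupplierParts AS spx
     WHERE spx.part_nbr = 'P1'
       AND 1000
           NOT IN (SELECT spy.qty
                     FROM SupplierParts AS spy
                    WHERE spy.sup_nbr = spx.sup_nbr
                      AND spy.part_nbr = 'P1')
     ORDER BY spx.sup_nbr
""")]

def suppliers_without_1000(qty_predicate):
    """Run the NOT EXISTS formulation with a given quantity test."""
    sql = f"""
        SELECT spx.sup_nbr
          FROM SupplierParts AS spx
         WHERE spx.part_nbr = 'P1'
           AND NOT EXISTS
               (SELECT *
                  FROM SupplierParts AS spy
                 WHERE spy.sup_nbr = spx.sup_nbr
                   AND spy.part_nbr = 'P1'
                   AND {qty_predicate})
         ORDER BY spx.sup_nbr
    """
    return [row[0] for row in conn.execute(sql)]

# Naive transformation: the NULL row never matches, so 'S1' slips in.
naive_result = suppliers_without_1000("spy.qty = 1000")

# Alternatives 2 and 3: treat the NULL as a possible 1000.
or_null_result = suppliers_without_1000(
    "(spy.qty = 1000 OR spy.qty IS NULL)")
coalesce_result = suppliers_without_1000(
    "(COALESCE(spy.qty, 1000) = 1000)")

print(not_in_result, naive_result, or_null_result, coalesce_result)
```

Both rewritten predicates bring the NOT EXISTS query back in line with the NOT IN version, which is the point of making the treatment of the NULL explicit.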

For another approach based on Dr. Codd’s second relational model, visit www.FirstSQL.com and read some of the white papers by Lee Fesperman. He used the two NULLs Codd proposed to develop a product.

CHAPTER 16: Quantified Subquery Predicates

A quantifier is a logical operator that states the quantity of objects for which a statement is TRUE. This is a logical quantity, not a numeric quantity; it relates a statement to the whole set of possible objects. In everyday life, you see statements like “There is only one mouthwash that stops dinosaur breath,” “All doctors drive Mercedes,” or “Some people got rich investing in cattle futures,” which are quantified.

The first statement, about the mouthwash, is a uniqueness quantifier. If there were two or more products that could save us from dinosaur breath, the statement would be FALSE. The second statement has what is called a universal quantifier, since it deals with all doctors; find one exception and the statement is FALSE. The last statement has an existential quantifier, since it asserts that one or more people exist who got rich on cattle futures; find one example and the statement is TRUE.

SQL has forms of these quantifiers that are not quite like those in formal logic. They are based on extending the use of comparison predicates to allow result sets to be quantified, and they use SQL’s three-valued logic, so they do not return just TRUE or FALSE.


16.1 Scalar Subquery Comparisons

Standard SQL allows both scalar and row comparisons, but most queries use only scalar expressions. If a subquery returns a single-row, single-column result table, it is treated as a scalar value in Standard SQL in virtually any place a scalar could appear. For example, to find out if we have any teachers who are more than one year older than the students, I could write:

SELECT T1.teacher_name
  FROM Teachers AS T1
 WHERE T1.birthday
       > (SELECT MAX(S1.birthday) - INTERVAL '365' DAY
            FROM Students AS S1);

In this case, the scalar subquery will be run only once and reduced to a constant value by the optimizer before scanning the Teachers table.

A correlated subquery is more complex, because it will have to be executed for each value from the containing query. For example, to find which suppliers have sent us fewer than 100 parts, we would use this query. Notice how the SUM(quantity) has to be computed for each supplier number, sup_nbr:

SELECT sup_nbr, sup_name FROM Suppliers

WHERE 100 > (SELECT SUM(quantity) FROM Shipments WHERE Shipments.sup_nbr = Suppliers.sup_nbr);

If a scalar subquery returns a NULL, we have rules for handling comparison with NULLs. But what if it returns an empty result, such as for a supplier that has not shipped us anything? In Standard SQL, the empty result table is converted to a NULL of the appropriate data type.
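A consequence worth seeing in action is that a supplier with no shipments does not show up as having sent "fewer than 100 parts." The sketch below uses Python's sqlite3 module; the sample suppliers and quantities are invented, and the COALESCE variant is an assumption about what a reader might actually want rather than something the text prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Suppliers
    (sup_nbr TEXT NOT NULL PRIMARY KEY,
     sup_name TEXT NOT NULL);

    CREATE TABLE Shipments
    (sup_nbr TEXT NOT NULL REFERENCES Suppliers(sup_nbr),
     quantity INTEGER NOT NULL);

    INSERT INTO Suppliers VALUES ('S1', 'Acme'), ('S2', 'Bolt'), ('S3', 'Cog');
    INSERT INTO Shipments VALUES ('S1', 30), ('S1', 40), ('S2', 500);
""")

# A scalar subquery over an empty result yields NULL (None in Python).
empty_scalar = conn.execute("""
    SELECT (SELECT quantity FROM Shipments WHERE sup_nbr = 'S3')
""").fetchone()[0]

# The query from the text: 'S3' has no shipments, so the comparison
# 100 > NULL is UNKNOWN and the row is not returned.
under_100 = conn.execute("""
    SELECT sup_nbr, sup_name
      FROM Suppliers
     WHERE 100 > (SELECT SUM(quantity)
                    FROM Shipments
                   WHERE Shipments.sup_nbr = Suppliers.sup_nbr)
     ORDER BY sup_nbr
""").fetchall()

# Variant that counts "no shipments" as zero parts.
under_100_with_zero = conn.execute("""
    SELECT sup_nbr, sup_name
      FROM Suppliers
     WHERE 100 > (SELECT COALESCE(SUM(quantity), 0)
                    FROM Shipments
                   WHERE Shipments.sup_nbr = Suppliers.sup_nbr)
     ORDER BY sup_nbr
""").fetchall()

print(empty_scalar, under_100, under_100_with_zero)
```

Whether the silent supplier belongs in the answer is a business decision; the COALESCE form simply makes that decision visible in the query.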

In Standard SQL, you can place scalar or row subqueries on either side of a comparison predicate as long as they return comparable results. But you must be aware of the rules for row comparisons. For example, the following query will find the product manager who has more of his product at the stores than in the warehouse:

SELECT manager_name, product_nbr
  FROM Stores AS S1
 WHERE (SELECT SUM(qty)
          FROM Warehouses AS W1
         WHERE S1.product_nbr = W1.product_nbr)
     < (SELECT SUM(qty)
          FROM RetailStores AS R1
         WHERE S1.product_nbr = R1.product_nbr);

Here is a programming tip: the main problem with writing these queries is getting a result with more than one row in it. You can guarantee uniqueness in several ways. An aggregate function on an ungrouped table will always be a single value. A JOIN with the containing query based on a key will always be a single value.

16.2 Quantifiers and Missing Data

The quantified predicates are used with subquery expressions to compare a single value to those of the subquery, and take the general form <value expression> <comp op> <quantifier> <subquery>. The predicate "<value expression> <comp op> [ANY|SOME] <table expression>" is equivalent to taking each row, s, (assume that they are numbered from 1 to n) of <table expression> and testing "<value expression> <comp op> s" with ORs between the expanded expressions:

((<value expression> <comp op> s1)
 OR (<value expression> <comp op> s2)
 ...
 OR (<value expression> <comp op> sn))

When you get a single TRUE result, the whole predicate is TRUE. As long as <table expression> has cardinality greater than zero and one non-NULL value, you will get a result of TRUE or FALSE. The keyword SOME is the same as ANY, and the choice is just a matter of style and readability. Likewise, "<value expression> <comp op> ALL <table expression>" takes each row, s, of <table expression> and tests <value expression> <comp op> s with ANDs between the expanded expressions:

((<value expression> <comp op> s1)
 AND (<value expression> <comp op> s2)
 ...
 AND (<value expression> <comp op> sn))
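The OR and AND expansions can be modeled directly. The following is an illustrative sketch in plain Python, not how any SQL engine is implemented: None stands in for UNKNOWN, and the function names sql_compare, sql_any, and sql_all are hypothetical. The empty-table results (FALSE for ANY, TRUE for ALL) follow the Standard SQL rules.

```python
import operator
from typing import Callable, Iterable, Optional

Truth = Optional[bool]  # True, False, or None for UNKNOWN


def sql_compare(op: Callable[[object, object], bool],
                left: object, right: object) -> Truth:
    """A comparison involving a NULL (None) is UNKNOWN."""
    if left is None or right is None:
        return None
    return op(left, right)


def sql_any(op: Callable[[object, object], bool],
            value: object, rows: Iterable[object]) -> Truth:
    """<value> <op> ANY <rows>: three-valued OR over every row."""
    result: Truth = False          # OR over zero rows is FALSE
    for s in rows:
        truth = sql_compare(op, value, s)
        if truth is True:
            return True            # one TRUE decides the whole OR
        if truth is None:
            result = None          # remember that an UNKNOWN was seen
    return result


def sql_all(op: Callable[[object, object], bool],
            value: object, rows: Iterable[object]) -> Truth:
    """<value> <op> ALL <rows>: three-valued AND over every row."""
    result: Truth = True           # AND over zero rows is TRUE
    for s in rows:
        truth = sql_compare(op, value, s)
        if truth is False:
            return False           # one FALSE decides the whole AND
        if truth is None:
            result = None
    return result


# 5 = ANY (1, 5, NULL): a TRUE is found, so the NULL does not matter.
any_with_match = sql_any(operator.eq, 5, [1, 5, None])
# 5 = ANY (1, NULL): no TRUE, and the NULL leaves the result UNKNOWN.
any_without_match = sql_any(operator.eq, 5, [1, None])
# 5 > ALL (1, NULL): no FALSE, but the NULL leaves the result UNKNOWN.
all_with_null = sql_all(operator.gt, 5, [1, None])
# 5 > ALL (9, NULL): one FALSE is enough.
all_with_false = sql_all(operator.gt, 5, [9, None])
```

This also shows why x <> ALL behaves like NOT IN: a single NULL among the rows turns an otherwise TRUE result into UNKNOWN.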
