CHAPTER 14: THE [NOT] IN() PREDICATE
SELECT * FROM JohnsBook AS J1 WHERE NOT EXISTS (SELECT * FROM QualityGuide AS Q1 WHERE Q1.restaurant_name = J1.restaurant_name);
The reason the second version will probably run faster is that it can test for existence using the indexes on both tables, while the NOT IN() version has to test all the values in the subquery table for inequality. Many SQL implementations will construct a temporary table from the IN() predicate subquery, and that temporary table will not have any indexes. The temporary table can also have duplicates and a random ordering of its rows, so the SQL engine has to do a full-table scan.
14.2 Replacing ORs with the IN() Predicate
A simple trick that beginning SQL programmers often miss is that an IN() predicate can often replace a set of ORed predicates. For example:
SELECT * FROM QualityControlReport WHERE test_1 = 'passed'
OR test_2 = 'passed'
OR test_3 = 'passed'
OR test_4 = 'passed';
can be rewritten as:
SELECT * FROM QualityControlReport WHERE 'passed' IN (test_1, test_2, test_3, test_4);
The reason this is difficult to see is that programmers get used to thinking of the IN() list as either a subquery or a simple list of constants. They miss the fact that the IN() predicate list can be a list of expressions. The optimizer would have handled each of the original predicates separately, but it has to handle the IN() predicate as a single item, which can change the order of evaluation. This might or might not be faster than the list of ORed predicates for a particular query. This formulation might also cause the predicate to become nonindexable; you should check the indexability rules of your particular DBMS.
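To see that the two forms return the same rows, here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for "your DBMS". The report_id column and all the rows are invented for illustration; row 4 has NULL test results to show that both forms reject it the same way. The sketch checks only equivalence of results, not which form runs faster in any product.

```python
import sqlite3

# In-memory database; table name from the text, data invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE QualityControlReport "
    "(report_id INTEGER PRIMARY KEY, test_1 TEXT, test_2 TEXT, test_3 TEXT, test_4 TEXT)"
)
conn.executemany(
    "INSERT INTO QualityControlReport VALUES (?, ?, ?, ?, ?)",
    [
        (1, "passed", "failed", "failed", "failed"),
        (2, "failed", "failed", "failed", "failed"),
        (3, "failed", "failed", "failed", "passed"),
        (4, None, "failed", None, "failed"),  # NULLs: UNKNOWN in both forms
    ],
)

# The original chain of ORed predicates.
ored = conn.execute(
    "SELECT report_id FROM QualityControlReport "
    "WHERE test_1 = 'passed' OR test_2 = 'passed' "
    "OR test_3 = 'passed' OR test_4 = 'passed' "
    "ORDER BY report_id"
).fetchall()

# The IN() form: the list holds column references, not constants.
in_list = conn.execute(
    "SELECT report_id FROM QualityControlReport "
    "WHERE 'passed' IN (test_1, test_2, test_3, test_4) "
    "ORDER BY report_id"
).fetchall()

print(ored)     # [(1,), (3,)]
print(in_list)  # [(1,), (3,)]
```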
14.3 NULLs and the IN() Predicate
The [NOT] IN() predicate behaves in surprising ways when there is a NULL in its subquery. Consider these two tables:
CREATE TABLE Table1 (x INTEGER);
INSERT INTO Table1 VALUES (1), (2), (3), (4);
CREATE TABLE Table2 (x INTEGER);
INSERT INTO Table2 VALUES (1), (NULL), (2);
Now execute the query:
SELECT *
FROM Table1
WHERE x NOT IN (SELECT x FROM Table2);
Let’s work it out step by painful step:
SELECT * FROM Table1 WHERE x NOT IN (1, NULL, 2);
SELECT * FROM Table1 WHERE NOT (x IN (1, NULL, 2));
SELECT * FROM Table1 WHERE NOT ((x = 1) OR (x = NULL) OR (x = 2));
SELECT * FROM Table1
WHERE ((x <> 1) AND (x <> NULL) AND (x <> 2));
SELECT * FROM Table1 WHERE ((x <> 1) AND UNKNOWN AND (x <> 2));
SELECT * FROM Table1 WHERE UNKNOWN;
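The step-by-step reduction above can be checked mechanically. The following is a toy model of SQL's three-valued logic, not any product's implementation: Python's None stands for both the NULL value and the UNKNOWN truth value, and the helper names sql_ne, sql_and, and not_in are made up for this sketch. It expands x NOT IN (v1, v2, ...) into a chain of <> tests joined by AND, exactly as the text does.

```python
from typing import Iterable, Optional

# None plays two roles here: the NULL value and the UNKNOWN truth value.
Truth = Optional[bool]


def sql_ne(a: Optional[int], b: Optional[int]) -> Truth:
    """a <> b is UNKNOWN if either side is NULL."""
    if a is None or b is None:
        return None
    return a != b


def sql_and(p: Truth, q: Truth) -> Truth:
    """Three-valued AND: FALSE wins over UNKNOWN, UNKNOWN wins over TRUE."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True


def not_in(x: Optional[int], values: Iterable[Optional[int]]) -> Truth:
    """x NOT IN (v1, v2, ...) as (x <> v1) AND (x <> v2) AND ..."""
    result: Truth = True  # NOT IN against an empty list is TRUE
    for v in values:
        result = sql_and(result, sql_ne(x, v))
    return result


table1 = [1, 2, 3, 4]
subquery = [1, None, 2]  # the values of Table2.x

for x in table1:
    print(x, not_in(x, subquery))
# 1 False
# 2 False
# 3 None
# 4 None

# WHERE keeps a row only when its search condition is TRUE.
kept = [x for x in table1 if not_in(x, subquery) is True]
print(kept)  # []
```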
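The search condition can never be TRUE, so the query returns no rows at all, even though the values 3 and 4 do not appear in Table2. A single NULL in the subquery is enough to empty the result.
Now try this with another set of tables: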
CREATE TABLE Table3 (x INTEGER);
INSERT INTO Table3 VALUES (1), (2), (NULL), (4);
CREATE TABLE Table4 (x INTEGER);
INSERT INTO Table4 VALUES (1), (3), (2);
Let’s work out the same query, with Table3 in the outer query and Table4 in the subquery, step by painful step again:
SELECT * FROM Table3 WHERE x NOT IN (1, 3, 2);
SELECT * FROM Table3 WHERE NOT (x IN (1, 3, 2));
SELECT * FROM Table3
WHERE NOT ((x = 1) OR (x = 3) OR (x = 2));
SELECT * FROM Table3 WHERE ((x <> 1) AND (x <> 3) AND (x <> 2));
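Now substitute each row's value of x in turn; each line below stands for the test applied to one row of Table3:
SELECT * FROM Table3 WHERE ((1 <> 1) AND (1 <> 3) AND (1 <> 2)) -- FALSE
UNION ALL
SELECT * FROM Table3 WHERE ((2 <> 1) AND (2 <> 3) AND (2 <> 2)) -- FALSE
UNION ALL
SELECT * FROM Table3 WHERE ((CAST(NULL AS INTEGER) <> 1) AND (CAST(NULL AS INTEGER) <> 3) AND (CAST(NULL AS INTEGER) <> 2)) -- UNKNOWN
UNION ALL
SELECT * FROM Table3 WHERE ((4 <> 1) AND (4 <> 3) AND (4 <> 2)); -- TRUE
Only the row with x = 4 tests TRUE, so the result is the single row (4). A NULL in the outer table costs only its own row; it is a NULL in the subquery that empties the entire result.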
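Both examples can be run as written on a real engine. This minimal sketch uses SQLite through Python's sqlite3 module, on the assumption that SQLite's handling of NOT IN with NULLs follows the Standard SQL rules worked out above. The table definitions and rows are the ones given in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (x INTEGER);
INSERT INTO Table1 VALUES (1), (2), (3), (4);
CREATE TABLE Table2 (x INTEGER);
INSERT INTO Table2 VALUES (1), (NULL), (2);
CREATE TABLE Table3 (x INTEGER);
INSERT INTO Table3 VALUES (1), (2), (NULL), (4);
CREATE TABLE Table4 (x INTEGER);
INSERT INTO Table4 VALUES (1), (3), (2);
""")

# NULL in the subquery: every row tests FALSE or UNKNOWN, never TRUE.
null_in_subquery = conn.execute(
    "SELECT x FROM Table1 WHERE x NOT IN (SELECT x FROM Table2)"
).fetchall()

# NULL only in the outer table: just that one row is lost.
null_in_outer = conn.execute(
    "SELECT x FROM Table3 WHERE x NOT IN (SELECT x FROM Table4)"
).fetchall()

print(null_in_subquery)  # []
print(null_in_outer)     # [(4,)]
```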
14.4 IN() Predicate and Referential Constraints
The IN() predicate is often used in the CHECK() clause on a table. The usual form is a list of values that are legal for a column, such as:
CREATE TABLE Addresses
(addressee_name CHAR(25) NOT NULL PRIMARY KEY,
street_loc CHAR(25) NOT NULL,
city_name CHAR(20) NOT NULL,
state_code CHAR(2) NOT NULL
CONSTRAINT valid_state_code
CHECK (state_code IN ('AL', 'AK', ...)),
...);
This method works fine with a small list of values, but it has problems with a longer list. It is very important to arrange the values in the order in which they are most likely to match the two-letter state_code, to speed up the search.
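A minimal sketch of this CHECK() constraint, run on SQLite via Python's sqlite3. The full list of codes is elided in the text, so the three-code list and the rows here are invented. The sketch shows only that an illegal code is rejected; it does not measure the ordering advice, which matters only in products that search the list sequentially (MySQL, for example, documents that it sorts a list of constants and uses a binary search).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Abbreviated, invented list of legal codes for the sketch.
conn.execute("""
CREATE TABLE Addresses
(addressee_name CHAR(25) NOT NULL PRIMARY KEY,
 street_loc CHAR(25) NOT NULL,
 city_name CHAR(20) NOT NULL,
 state_code CHAR(2) NOT NULL
   CONSTRAINT valid_state_code
   CHECK (state_code IN ('CA', 'TX', 'NY')))
""")

# A legal code is accepted.
conn.execute("INSERT INTO Addresses VALUES ('Smith', '1 Main St', 'Austin', 'TX')")

# An illegal code violates the CHECK() and the row is refused.
try:
    conn.execute("INSERT INTO Addresses VALUES ('Jones', '2 Elm St', 'Nowhere', 'ZZ')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
print(conn.execute("SELECT COUNT(*) FROM Addresses").fetchone()[0])  # 1
```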
In Standard SQL a constraint can reference other tables, so you could write the same constraint as:
CREATE TABLE Addresses
(addressee_name CHAR(25) NOT NULL PRIMARY KEY,
street_loc CHAR(25) NOT NULL,
city_name CHAR(20) NOT NULL,
state_code CHAR(2) NOT NULL
CONSTRAINT valid_state_code
CHECK (state_code
IN (SELECT state_code
FROM ZipCodes AS Z1
WHERE Z1.state_code = Addresses.state_code)),
...);
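The advantage of this is that you can change the ZipCodes table and thereby change the effect of the constraint on the Addresses table. This is fine for adding more data in the outer reference (e.g., Quebec joins the United States and gets the code ‘QB’), but it has a bad effect when you try to delete data in the outer reference (e.g., California secedes from the Union, and every row in Addresses with state_code ‘CA’ is now invalid).
As a rule of thumb, use the IN() predicate in a CHECK() constraint when the list is short, static, and unique to one table. When the list is short and static but shared by several tables, put the IN() predicate in a CHECK() constraint on a domain. When the list is long or changes often, or when several schema objects (views, stored procedures, etc.) reference the values, put the values in a separate table and use a REFERENCES clause. A separate table can have an index, and that makes a big difference in searching and doing joins.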
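SQLite does not allow subqueries in CHECK() constraints (many other products have the same restriction), so this sketch uses the look-up-table alternative from the rule of thumb instead: a REFERENCES clause to a hypothetical StateCodes table. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. The data is invented. Adding a code widens the rule with no DDL change, while deleting a code that is still in use is refused rather than leaving invalid rows behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks REFERENCES only when on

# StateCodes is a hypothetical look-up table for this sketch.
conn.executescript("""
CREATE TABLE StateCodes (state_code CHAR(2) NOT NULL PRIMARY KEY);
INSERT INTO StateCodes VALUES ('CA'), ('TX'), ('NY');

CREATE TABLE Addresses
(addressee_name CHAR(25) NOT NULL PRIMARY KEY,
 street_loc CHAR(25) NOT NULL,
 city_name CHAR(20) NOT NULL,
 state_code CHAR(2) NOT NULL REFERENCES StateCodes (state_code));

INSERT INTO Addresses VALUES ('Smith', '1 Main St', 'Fresno', 'CA');
""")

# Adding a code to the look-up table widens the rule; no ALTER TABLE needed.
conn.execute("INSERT INTO StateCodes VALUES ('QB')")
conn.execute("INSERT INTO Addresses VALUES ('Tremblay', '3 Rue Main', 'Montreal', 'QB')")

# A code that is not in the look-up table is refused.
try:
    conn.execute("INSERT INTO Addresses VALUES ('Jones', '2 Elm St', 'Nowhere', 'ZZ')")
    bad_code_rejected = False
except sqlite3.IntegrityError:
    bad_code_rejected = True

# Deleting a code still in use is refused instead of orphaning 'CA' rows.
try:
    conn.execute("DELETE FROM StateCodes WHERE state_code = 'CA'")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

print(bad_code_rejected, delete_blocked)  # True True
```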
14.5 IN() Predicate and Scalar Queries
The list of an IN() predicate can contain any scalar expression. This includes scalar subqueries, but most people do not seem to know that this is possible. For example, given tables that model warehouses, trucking centers, and so forth, we can find whether we have a product, identified by its UPC code, somewhere in the enterprise:
SELECT P.upc
FROM Picklist AS P
WHERE P.upc
IN ((SELECT upc FROM Warehouse AS W WHERE W.upc = P.upc),
(SELECT upc FROM TruckCenter AS T WHERE T.upc = P.upc),
(SELECT upc FROM Garbage AS G WHERE G.upc = P.upc));
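Each scalar subquery must return at most one row. A subquery that finds no match returns an empty result, which becomes a NULL in the list and simply fails to match. The alternative to this is usually a chain of ORed EXISTS() predicates.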
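The Picklist query also runs on SQLite. This minimal sketch uses Python's sqlite3 with invented UPC values; the one-column tables are assumptions, since the text does not give their definitions. UPC 0002 is in none of the three locations, so all three scalar subqueries return NULL for it and the row is rejected. One difference from Standard SQL: SQLite quietly uses the first row when a scalar subquery returns several, where the Standard raises an error, so the tables here keep upc unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical one-column tables; the text does not show their definitions.
conn.executescript("""
CREATE TABLE Picklist (upc CHAR(4) NOT NULL PRIMARY KEY);
CREATE TABLE Warehouse (upc CHAR(4) NOT NULL PRIMARY KEY);
CREATE TABLE TruckCenter (upc CHAR(4) NOT NULL PRIMARY KEY);
CREATE TABLE Garbage (upc CHAR(4) NOT NULL PRIMARY KEY);
INSERT INTO Picklist VALUES ('0001'), ('0002'), ('0003');
INSERT INTO Warehouse VALUES ('0001');
INSERT INTO TruckCenter VALUES ('0003');
""")

# Each list element is a correlated scalar subquery; no match yields NULL.
found = conn.execute("""
SELECT P.upc
  FROM Picklist AS P
 WHERE P.upc
       IN ((SELECT upc FROM Warehouse AS W WHERE W.upc = P.upc),
           (SELECT upc FROM TruckCenter AS T WHERE T.upc = P.upc),
           (SELECT upc FROM Garbage AS G WHERE G.upc = P.upc))
 ORDER BY P.upc
""").fetchall()

print(found)  # [('0001',), ('0003',)]
```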
CHAPTER 15
EXISTS() Predicate
The EXISTS() predicate tests for a nonempty set. If there are any rows in its subquery, it is TRUE; otherwise, it is FALSE.
<exists predicate> ::= EXISTS <table subquery>
The <table subquery> is always enclosed in parentheses to avoid problems in the grammar during parsing.
In SQL-89, the rules stated that the subquery had to have a SELECT clause with one column or a *. If the SELECT * option was used, the database engine would (in theory) pick one column and use it. This fiction was needed because SQL-89 defined subqueries as having only one column.
Some early SQL implementations would work better with the EXISTS(SELECT <column> ...), EXISTS(SELECT <constant> ...), or EXISTS(SELECT * ...) versions of the predicate. Today, there is no difference among the three forms in the major products, so EXISTS(SELECT * ...) is the preferred form.
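A quick check that the three forms are interchangeable, using SQLite via Python's sqlite3; the table names and the single row are invented. The only celebrity row has a NULL birthday, yet EXISTS(SELECT birthday ...) is still TRUE: EXISTS() counts rows without looking at their values, and it returns TRUE or FALSE, never UNKNOWN. SQLite reports these as 1 and 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Celebrities (celeb_name TEXT NOT NULL, birthday DATE);
INSERT INTO Celebrities VALUES ('Ms. Glamour', NULL);
CREATE TABLE NoCelebrities (celeb_name TEXT NOT NULL, birthday DATE);
""")

forms = [
    "SELECT EXISTS (SELECT birthday FROM Celebrities)",  # <column> form
    "SELECT EXISTS (SELECT 1 FROM Celebrities)",         # <constant> form
    "SELECT EXISTS (SELECT * FROM Celebrities)",         # * form
]
results = [conn.execute(q).fetchone()[0] for q in forms]
print(results)  # [1, 1, 1]

# An empty subquery gives FALSE (0), not UNKNOWN (None).
empty = conn.execute("SELECT EXISTS (SELECT * FROM NoCelebrities)").fetchone()[0]
print(empty)  # 0
```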
Indexes are very useful for EXISTS() predicates because they can be searched while the base table is left alone completely. For example, suppose we want to find all employees who were born on the same day as any famous person. The query could be:
SELECT P1.emp_name, ' has the same birthday as a famous person!' FROM Personnel AS P1
WHERE EXISTS (SELECT * FROM Celebrities AS C1 WHERE P1.birthday = C1.birthday);
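If there is an index on the birthday column of Celebrities, the EXISTS() predicate can look up that value in the index. If the value is in the index, the predicate is TRUE and there is no need to read the Celebrities table itself. If it is not in the index, the predicate is FALSE and there is still no need to read the table. This is fast, because indexes are smaller than their tables and are structured for very fast searching. However, if Celebrities has no index on its birthday column, the query may have to look at every row to see if there is a birthday that matches the current employee’s birthday. There are some tricks that a good optimizer can use to speed things up in this situation.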
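The index argument can be illustrated with a toy model in plain Python; it is an analogy, not how any database engine lays out its indexes. A sorted list probed with bisect stands in for a B-tree index, and a linear scan with any() stands in for reading the table; any() also stops at the first match, as EXISTS() may. The dates and function names are invented for the sketch.

```python
from bisect import bisect_left
from datetime import date

# Invented data: the Celebrities.birthday column in table (arrival) order.
celebrity_birthdays = [date(1982, 11, 30), date(1960, 5, 1), date(1975, 2, 14)]

# Stand-in for an index: the same keys kept in sorted order.
birthday_index = sorted(celebrity_birthdays)


def exists_by_scan(birthday: date) -> bool:
    """No index: read rows until one matches; any() stops at the first hit."""
    return any(b == birthday for b in celebrity_birthdays)


def exists_by_index(birthday: date) -> bool:
    """With an index: a binary search answers without reading the 'table'."""
    i = bisect_left(birthday_index, birthday)
    return i < len(birthday_index) and birthday_index[i] == birthday


employee_birthdays = [date(1975, 2, 14), date(1990, 7, 4)]
print([exists_by_index(b) for b in employee_birthdays])  # [True, False]
print([exists_by_scan(b) for b in employee_birthdays])   # [True, False]
```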
15.1 EXISTS and NULLs
NULLs deserve special attention when you use EXISTS(), so it is worth reviewing what they are and how they behave.
Think of them as being like a brown paper bag: you know that something is inside because you lifted it, but you do not know exactly what that something is. For example, suppose we want to find all the employees who were not born on the same day as a famous person. This can be answered with the negation of the original query, like this:
SELECT P1.emp_name, ' was born on a day without a famous person!' FROM Personnel AS P1
WHERE NOT EXISTS (SELECT * FROM Celebrities AS C1 WHERE P1.birthday = C1.birthday);
But assume that among the celebrities we have a movie star who will not admit her age, so her birthday is NULL. You might argue that Ms. Glamour should match everyone, since any employee could turn out to share her birthday when some tabloid newspaper finally gets a copy of her birth certificate. SQL does not work that way: since we do not know her birthday yet, she will not match anyone. Work out the subquery for her row in the usual way to convince yourself:
WHERE NOT EXISTS
(SELECT *
FROM Celebrities
WHERE P1.birthday = NULL);
becomes:
WHERE NOT EXISTS
(SELECT *
FROM Celebrities
WHERE UNKNOWN);
becomes:
WHERE TRUE;
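The subquery's search condition tests UNKNOWN because of the NULL comparison, and therefore fails whenever we look at Ms. Glamour's row. Her NULL birthday never matches anyone, so it can never make the NOT EXISTS() predicate FALSE.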
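The same result shows up on a real engine. This minimal sketch runs the NOT EXISTS() query on SQLite via Python's sqlite3; the employee names, the second celebrity, and the dates are invented, and dates are stored as ISO-8601 text. Bob, who shares a birthday with no one, is still returned even though Ms. Glamour's birthday is NULL, while Alice shares a birthday with the other celebrity and is excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Personnel (emp_name TEXT NOT NULL PRIMARY KEY, birthday DATE);
INSERT INTO Personnel VALUES ('Alice', '1975-02-14'), ('Bob', '1990-07-04');
CREATE TABLE Celebrities (celeb_name TEXT NOT NULL PRIMARY KEY, birthday DATE);
INSERT INTO Celebrities VALUES ('Star', '1975-02-14'), ('Ms. Glamour', NULL);
""")

# Ms. Glamour's row tests UNKNOWN, so it never joins the subquery result.
rows = conn.execute("""
SELECT P1.emp_name
  FROM Personnel AS P1
 WHERE NOT EXISTS (SELECT * FROM Celebrities AS C1
                    WHERE P1.birthday = C1.birthday)
""").fetchall()

print(rows)  # [('Bob',)]
```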
Since the subquery compares employees to famous people on only one column, the query can be rewritten as:
SELECT P1.emp_name, ' was born on a day without a famous person!' FROM Personnel AS P1
WHERE P1.birthday NOT IN
(SELECT C1.birthday
FROM Celebrities AS C1);
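The NOT IN() rewrite matches the NOT EXISTS() version only when C1.birthday contains no NULLs. This sketch runs it on SQLite via Python's sqlite3 with invented data (one celebrity whose birthday matches Alice's, plus Ms. Glamour with a NULL birthday): the NOT IN() version returns no rows at all, for the reasons worked out in section 14.3, and filtering the NULLs out of the subquery brings back the result NOT EXISTS() gives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Personnel (emp_name TEXT NOT NULL PRIMARY KEY, birthday DATE);
INSERT INTO Personnel VALUES ('Alice', '1975-02-14'), ('Bob', '1990-07-04');
CREATE TABLE Celebrities (celeb_name TEXT NOT NULL PRIMARY KEY, birthday DATE);
INSERT INTO Celebrities VALUES ('Star', '1975-02-14'), ('Ms. Glamour', NULL);
""")

# The rewrite: one NULL in the subquery empties the result.
not_in_rows = conn.execute("""
SELECT P1.emp_name
  FROM Personnel AS P1
 WHERE P1.birthday NOT IN (SELECT C1.birthday FROM Celebrities AS C1)
""").fetchall()

# Excluding NULLs from the subquery makes NOT IN() agree with NOT EXISTS().
filtered_rows = conn.execute("""
SELECT P1.emp_name
  FROM Personnel AS P1
 WHERE P1.birthday NOT IN (SELECT C1.birthday FROM Celebrities AS C1
                            WHERE C1.birthday IS NOT NULL)
""").fetchall()

print(not_in_rows)    # []
print(filtered_rows)  # [('Bob',)]
```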
However, consider a more complex version of the same query, where
would be: