INSERT dbo.One(OnePK, Thing1) VALUES (2, 'New Thing');
INSERT dbo.One(OnePK, Thing1) VALUES (3, 'Red Thing');
INSERT dbo.One(OnePK, Thing1) VALUES (4, 'Blue Thing');
INSERT dbo.Two(TwoPK, OnePK, Thing2) VALUES (1, 0, 'Plane');
INSERT dbo.Two(TwoPK, OnePK, Thing2) VALUES (2, 2, 'Train');
INSERT dbo.Two(TwoPK, OnePK, Thing2) VALUES (3, 3, 'Car');
INSERT dbo.Two(TwoPK, OnePK, Thing2) VALUES (4, NULL, 'Cycle');
FIGURE 10-9
The Red Thing Blue Thing example has data to view every type of join.
An inner join between table One and table Two will return only the two matching rows:
SELECT Thing1, Thing2 FROM dbo.One
INNER JOIN dbo.Two
ON One.OnePK = Two.OnePK;
Result:
Thing1          Thing2
--------------- ---------------
New Thing       Train
Red Thing       Car
A left outer join will extend the inner join and include the rows from table One without a match:
SELECT Thing1, Thing2 FROM dbo.One
LEFT OUTER JOIN dbo.Two
ON One.OnePK = Two.OnePK;
All the rows are now returned from table One, but two rows from table Two are still missing:
Thing1          Thing2
--------------- ---------------
Old Thing       NULL
New Thing       Train
Red Thing       Car
Blue Thing      NULL
A full outer join will retrieve every row from both tables, regardless of a match between the tables:
SELECT Thing1, Thing2
FROM dbo.One
FULL OUTER JOIN dbo.Two
ON One.OnePK = Two.OnePK;
The plane and cycle from table Two are now listed along with every row from table One:
Thing1          Thing2
--------------- ---------------
Old Thing       NULL
New Thing       Train
Red Thing       Car
Blue Thing      NULL
NULL            Plane
NULL            Cycle
As this example shows, full outer joins are an excellent tool for finding all the data, even bad data. Set difference queries, explored later in this chapter, build on outer joins to zero in on bad data.
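As a sketch of how a full outer join exposes unmatched data, the following self-contained batch copies the One/Two sample rows into table variables (the names @One and @Two are an assumption, chosen so the real tables are left alone) and labels each joined row by where it came from. It also assumes Old Thing's OnePK is 1, since that INSERT is not shown above; any value other than 0, 2, or 3 gives the same result.

```sql
-- Table variable copies of the One/Two sample data.
-- Assumption: Old Thing uses OnePK = 1 (its INSERT is not shown in the text).
DECLARE @One TABLE (OnePK INT, Thing1 VARCHAR(15));
DECLARE @Two TABLE (TwoPK INT, OnePK INT, Thing2 VARCHAR(15));

INSERT @One (OnePK, Thing1)
VALUES (1, 'Old Thing'), (2, 'New Thing'), (3, 'Red Thing'), (4, 'Blue Thing');
INSERT @Two (TwoPK, OnePK, Thing2)
VALUES (1, 0, 'Plane'), (2, 2, 'Train'), (3, 3, 'Car'), (4, NULL, 'Cycle');

-- Tag every row of the full outer join with the side it came from.
-- A NULL key on one side means that side had no matching row.
SELECT O.Thing1, T.Thing2,
  CASE
    WHEN O.OnePK IS NULL THEN 'Only in Two'
    WHEN T.TwoPK IS NULL THEN 'Only in One'
    ELSE 'Matched'
  END AS RowSource
FROM @One AS O
  FULL OUTER JOIN @Two AS T
    ON O.OnePK = T.OnePK;
```

With this data the labels come out as two Matched rows, two Only in One rows (Old Thing and Blue Thing), and two Only in Two rows (Plane and Cycle).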
Placing the conditions within outer joins
When working with inner joins, a condition has the same effect whether it’s in the JOIN clause or the WHERE clause, but that’s not the case with outer joins:
■ When the condition is in the JOIN clause, SQL Server includes all rows from the outer table and then uses the condition to include rows from the second table.
■ When the restriction is placed in the WHERE clause, the join is performed and then the WHERE clause is applied to the joined rows.
The following two queries demonstrate the effect of the placement of the condition.
In the first query, the left outer join includes all rows from table One and then joins those rows from table Two where OnePK is equal in both tables and Thing1’s value is New Thing. The result is all the rows from table One, and rows from table Two that meet both join restrictions:
SELECT Thing1, Thing2
FROM dbo.One
LEFT OUTER JOIN dbo.Two
ON One.OnePK = Two.OnePK AND One.Thing1 = 'New Thing';
Result:
Thing1          Thing2
--------------- ---------------
Old Thing       NULL
New Thing       Train
Red Thing       NULL
Blue Thing      NULL
The second query first performs the left outer join without the AND condition, producing the same four rows as the plain left outer join shown earlier. The WHERE clause then restricts that result to those rows where Thing1 is equal to New Thing. The net effect is the same as when an inner join was used (but it might take more execution time):
SELECT Thing1, Thing2 FROM dbo.One
LEFT OUTER JOIN dbo.Two
ON One.OnePK = Two.OnePK
WHERE One.Thing1 = 'New Thing';
Result:
Thing1          Thing2
--------------- ---------------
New Thing       Train
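The two queries above restrict the outer table, One. The placement rule matters even more when the condition is on the second table, Two. The following sketch uses hypothetical table variable copies of the sample data (again assuming Old Thing is OnePK 1) and puts a test on Thing2 first in the ON clause and then in the WHERE clause:

```sql
-- Table variable copies of the sample data; Old Thing's OnePK = 1 is an assumption.
DECLARE @One TABLE (OnePK INT, Thing1 VARCHAR(15));
DECLARE @Two TABLE (TwoPK INT, OnePK INT, Thing2 VARCHAR(15));
INSERT @One (OnePK, Thing1)
VALUES (1, 'Old Thing'), (2, 'New Thing'), (3, 'Red Thing'), (4, 'Blue Thing');
INSERT @Two (TwoPK, OnePK, Thing2)
VALUES (1, 0, 'Plane'), (2, 2, 'Train'), (3, 3, 'Car'), (4, NULL, 'Cycle');

-- Test on Two inside the ON clause: all four One rows survive,
-- and Thing2 is NULL wherever the key match or the test fails.
SELECT O.Thing1, T.Thing2
FROM @One AS O
  LEFT OUTER JOIN @Two AS T
    ON O.OnePK = T.OnePK
   AND T.Thing2 = 'Train';
-- 4 rows: Old Thing/NULL, New Thing/Train, Red Thing/NULL, Blue Thing/NULL

-- Same test in the WHERE clause: it runs after the join, and the NULL
-- Thing2 values fail it, so only the matched row remains.
SELECT O.Thing1, T.Thing2
FROM @One AS O
  LEFT OUTER JOIN @Two AS T
    ON O.OnePK = T.OnePK
WHERE T.Thing2 = 'Train';
-- 1 row: New Thing/Train
```

Because NULL never equals 'Train', moving a test on the second table into the WHERE clause discards the unmatched rows the outer join just added, so the query behaves like an inner join.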
Multiple outer joins
Coding a query with multiple outer joins can be tricky. Typically, the order of data sources in the FROM clause doesn’t matter, but here it does. The key is to code them in a sequential chain. Think through it this way:
1. Grab all the customers regardless of whether they’ve placed any orders.
2. Then grab all the orders regardless of whether they’ve shipped.
3. Then grab all the ship details.
When chaining multiple outer joins, stick to left outer joins, as mixing left and right outer joins becomes very confusing very fast. Be sure to unit test the query with a small sample set of data to ensure that the outer join chain is correct.
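A minimal sketch of that customer, order, and ship-detail chain follows. The @Customers, @Orders, and @ShipDetails table variables and their rows are hypothetical, invented only to show how each LEFT OUTER JOIN preserves everything produced by the step before it:

```sql
-- Hypothetical tables for illustration only.
DECLARE @Customers TABLE (CustomerID INT, Name VARCHAR(20));
DECLARE @Orders TABLE (OrderID INT, CustomerID INT);
DECLARE @ShipDetails TABLE (ShipID INT, OrderID INT, ShipDate DATE);

INSERT @Customers (CustomerID, Name) VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT @Orders (OrderID, CustomerID) VALUES (10, 1), (11, 1), (12, 2);
INSERT @ShipDetails (ShipID, OrderID, ShipDate) VALUES (100, 10, '20080115');

-- Step 1: every customer; step 2: their orders, if any; step 3: ship details, if any.
SELECT C.Name, O.OrderID, S.ShipDate
FROM @Customers AS C
  LEFT OUTER JOIN @Orders AS O
    ON C.CustomerID = O.CustomerID
  LEFT OUTER JOIN @ShipDetails AS S
    ON O.OrderID = S.OrderID
ORDER BY C.Name, O.OrderID;
-- 4 rows: Ann/10/2008-01-15, Ann/11/NULL, Bob/12/NULL, Cy/NULL/NULL
```

If the second join were changed to an INNER JOIN, only Ann's shipped order 10 would survive: the inner join discards every row without a ship detail, including the NULL rows the first outer join created for Cy.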
Self-Joins
A self-join is a join that refers back to the same table. This type of unary relationship is often used to extract data from a reflexive (also called a recursive) relationship, such as organizational charts (employee to boss). Think of a self-join as a table being joined with a temporary copy of itself.
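The organizational-chart case can be sketched with a hypothetical @Employee table variable in which BossID points back to EmployeeID in the same table. The table is referenced twice under two aliases, and a left outer join keeps the top of the chart, whose BossID is NULL:

```sql
-- Hypothetical org chart: BossID refers to EmployeeID in the same table.
DECLARE @Employee TABLE (EmployeeID INT, Name VARCHAR(20), BossID INT NULL);
INSERT @Employee (EmployeeID, Name, BossID)
VALUES (1, 'Dana', NULL), (2, 'Eli', 1), (3, 'Fay', 1), (4, 'Gus', 2);

-- Emp is the employee row; Boss is the second reference to the same table.
SELECT Emp.Name AS Employee, Boss.Name AS Boss
FROM @Employee AS Emp
  LEFT OUTER JOIN @Employee AS Boss
    ON Emp.BossID = Boss.EmployeeID
ORDER BY Emp.EmployeeID;
-- Dana/NULL, Eli/Dana, Fay/Dana, Gus/Eli
```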
The Family sample database uses two self-joins between a child and his or her parents, as shown in the database diagram in Figure 10-10. The mothers and fathers are also people, of course, and are listed in the same table. They link back to their parents, and so on. The sample database is populated with five fictitious generations that can be used for sample queries.
FIGURE 10-10
The database diagram of the Family database includes two unary relationships (children to parents) on the left and a many-to-many unary relationship (husband to wife) on the right.
The key to constructing a self-join is to include a second reference to the table using a table alias. Once the table is available twice to the SELECT statement, the self-join functions much like any other join. Switching over to the Family sample database, the following query locates the children of Audry Halloway; the dbo.Person table is referenced twice, once with the table alias Child and once with the table alias Mother:
USE Family;
SELECT Child.PersonID, Child.FirstName,
Child.MotherID, Mother.PersonID
FROM dbo.Person AS Child
INNER JOIN dbo.Person AS Mother
ON Child.MotherID = Mother.PersonID
WHERE Mother.LastName = 'Halloway'
AND Mother.FirstName = 'Audry';
The query uses the Person table twice. The first reference (aliased as Child) is joined with the second reference (aliased as Mother), which is restricted by the WHERE clause to only Audry Halloway. Only the rows with a MotherID that points back to Audry will be included in the inner join. Audry’s PersonID is 6 and her children are as follows:
PersonID FirstName MotherID PersonID
While the previous query adequately demonstrates a self-join, it would be more useful if the mother weren’t hard-coded in the WHERE clause, and if more information were provided about each birth, as follows:
SELECT CONVERT(NVARCHAR(15), C.DateOfBirth, 1) AS Date,
C.FirstName AS Name, C.Gender AS G,
ISNULL(F.FirstName + ' ' + F.LastName, ' * unknown *') AS Father,
M.FirstName + ' ' + M.LastName AS Mother
FROM dbo.Person AS C
LEFT OUTER JOIN dbo.Person AS F
ON C.FatherID = F.PersonID
INNER JOIN dbo.Person AS M
ON C.MotherID = M.PersonID
ORDER BY C.DateOfBirth;
This query makes three references to the Person table: the child, the father, and the mother, with mnemonic one-letter aliases. The result is a better listing:
Date     Name    G Father           Mother
-------- ------- - ---------------- ------------------
5/19/22  James   M James Halloway   Kelly Halloway
8/05/28  Audry   F Bryan Miller     Karen Miller
8/19/51  Melanie F James Halloway   Audry Halloway
8/30/53  James   M James Halloway   Audry Halloway
2/12/58  Dara    F James Halloway   Audry Halloway
3/13/61  Corwin  M James Halloway   Audry Halloway
3/13/65  Cameron M Richard Campbell Elizabeth Campbell
.
For more ideas about working with hierarchies and self-joins, refer to Chapter 17, “Traversing Hierarchies.”
Cross (Unrestricted) Joins
The cross join, also called an unrestricted join, is a pure relational algebra multiplication of the two source tables. Without a join condition restricting the result set, the result set includes every possible combination of rows from the data sources. Each row in data set one is matched with every row in data set two; for example, if the first data source has five rows and the second data source has four rows, a cross join between them would result in 20 rows. This type of result set is referred to as a Cartesian product.
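The five-by-four arithmetic can be checked directly. This sketch uses two inline row sets built with table value constructors (VALUES in the FROM clause, available since SQL Server 2008); the row contents are arbitrary placeholders:

```sql
-- A has five rows, B has four rows, and there is no join condition.
SELECT COUNT(*) AS CartesianRows
FROM (VALUES (1), (2), (3), (4), (5)) AS A (n)
  CROSS JOIN (VALUES ('w'), ('x'), ('y'), ('z')) AS B (c);
-- returns 20 (5 x 4)
```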
Using the One/Two sample tables, a cross join is constructed in Management Studio by omitting the join condition between the two tables, as shown in Figure 10-11.
FIGURE 10-11
A graphical representation of a cross join is simply two tables without a join condition.
In code, this type of join is specified by the keywords CROSS JOIN and the lack of an ON condition:
SELECT Thing1, Thing2
FROM dbo.One
CROSS JOIN dbo.Two;
The result of a join without restriction is that every row in table One matches with every row from table Two:
Thing1          Thing2
--------------- ---------------
Old Thing       Plane
New Thing       Plane
Red Thing       Plane
Blue Thing      Plane
Old Thing       Train
New Thing       Train
Red Thing       Train
Blue Thing      Train
Old Thing       Car
New Thing       Car
Red Thing       Car
Blue Thing      Car
Old Thing       Cycle
New Thing       Cycle
Red Thing       Cycle
Blue Thing      Cycle
Sometimes cross joins are the result of someone forgetting to draw the join in a graphical query tool; however, they are useful for populating databases with sample data, or for creating empty “pigeonhole” rows for population during a procedure.
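For the pigeonhole use, a sketch might create one empty slot for every combination of two small lists before a procedure fills in the values. The @Store, @Month, and @SalesSlot table variables are hypothetical:

```sql
-- Hypothetical lists: three stores and four months.
DECLARE @Store TABLE (StoreID INT);
DECLARE @Month TABLE (MonthNum INT);
INSERT @Store (StoreID) VALUES (1), (2), (3);
INSERT @Month (MonthNum) VALUES (1), (2), (3), (4);

-- Hypothetical target; Amount stays NULL until a later step fills it in.
DECLARE @SalesSlot TABLE (StoreID INT, MonthNum INT, Amount MONEY NULL);

-- No join condition: every store is paired with every month.
INSERT @SalesSlot (StoreID, MonthNum, Amount)
SELECT S.StoreID, M.MonthNum, NULL
FROM @Store AS S
  CROSS JOIN @Month AS M;

SELECT COUNT(*) AS Slots FROM @SalesSlot;  -- 12 (3 stores x 4 months)
```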
Understanding how a cross join multiplies data is also useful when studying relational division, the inverse of relational multiplication. Relational division requires subqueries, so it’s explained in the next chapter.
Exotic Joins
Nearly all joins are based on a condition of equality between the primary key of a primary table and the foreign key of a secondary table, which is why the inner join is sometimes called an equi-join. Although it’s commonplace to base a join on a single equal condition, it is not a requirement. The condition between the two columns is not necessarily equal, nor is the join limited to one condition.
The ON condition of the join is in reality nothing more than a WHERE condition restricting the product of the two joined data sets. WHERE-clause conditions may be very flexible and powerful, and the same is true of join conditions. This understanding of the ON condition enables the use of three powerful techniques: θ (theta) joins, multiple-condition joins, and non-key joins.
Multiple-condition joins
If a join is nothing more than a condition between two data sets, then it makes sense that multiple conditions are possible at the join. In fact, multiple-condition joins and θ joins go hand-in-hand. Without the ability to use multiple-condition joins, θ joins would be of little value.
If the database schema uses natural primary keys, then there are probably tables with composite primary keys, which means queries must use multiple-condition joins.
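A sketch of such a composite-key join follows. The @Invoice and @InvoiceLine table variables are hypothetical; an invoice is assumed to be identified by the pair (Region, InvoiceNo), so both columns must appear in the join:

```sql
-- Hypothetical natural composite key: (Region, InvoiceNo).
DECLARE @Invoice TABLE (
  Region CHAR(2), InvoiceNo INT, Customer VARCHAR(20),
  PRIMARY KEY (Region, InvoiceNo));
DECLARE @InvoiceLine TABLE (Region CHAR(2), InvoiceNo INT, LineNum INT, Item VARCHAR(20));

INSERT @Invoice (Region, InvoiceNo, Customer)
VALUES ('NE', 1, 'Ann'), ('SW', 1, 'Bob');
INSERT @InvoiceLine (Region, InvoiceNo, LineNum, Item)
VALUES ('NE', 1, 1, 'Bolt'), ('NE', 1, 2, 'Nut'), ('SW', 1, 1, 'Gear');

-- Both key columns are matched; joining on InvoiceNo alone would pair
-- every line with both invoice 1s (6 rows instead of 3).
SELECT I.Customer, L.Item
FROM @Invoice AS I
  INNER JOIN @InvoiceLine AS L
    ON I.Region = L.Region
   AND I.InvoiceNo = L.InvoiceNo;
-- 3 rows: Ann/Bolt, Ann/Nut, Bob/Gear
```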
Join conditions can refer to any table in the FROM clause, enabling interesting three-way joins:
FROM A
INNER JOIN B
ON A.col = B.col
INNER JOIN C
ON B.col = C.col
AND A.col = C.col;
The first query in the previous section, “Placing the Conditions within Outer Joins,” was a multiple-condition join.
θ (theta) joins
A theta join (depicted throughout as θ) is a join based on a non-equal ON condition. In relational theory, conditional operators (=, >, <, >=, <=, <>) are called θ operators. While the equals condition is technically a θ operator, it is commonly used, so only joins with conditions other than equal are referred to as θ joins.
The condition may be set within Management Studio’s Query Designer using the join Properties dialog, as previously shown in Figure 10-7.
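A common θ join is a range lookup, where a value is matched to the band that contains it rather than to an equal key. The @Score and @Grade table variables below are hypothetical, and the join uses only the >= and <= operators:

```sql
-- Hypothetical scores and grade bands.
DECLARE @Score TABLE (Student VARCHAR(10), Score INT);
DECLARE @Grade TABLE (Grade CHAR(1), MinScore INT, MaxScore INT);

INSERT @Score (Student, Score) VALUES ('Ann', 93), ('Bob', 78), ('Cy', 85);
INSERT @Grade (Grade, MinScore, MaxScore) VALUES ('A', 90, 100), ('B', 80, 89), ('C', 70, 79);

-- No equality anywhere: each score joins to the band it falls inside.
SELECT S.Student, S.Score, G.Grade
FROM @Score AS S
  INNER JOIN @Grade AS G
    ON S.Score >= G.MinScore
   AND S.Score <= G.MaxScore
ORDER BY S.Student;
-- Ann/93/A, Bob/78/C, Cy/85/B
```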
Non-key joins
Joins are not limited to primary and foreign keys. The join can match a row in one data source with a row in another data source using any column, as long as the columns share compatible data types and the data match.
For example, an inventory allocation system would use a non-key join to find products that are expected to arrive from the supplier before the customer’s required ship date. A non-key join between the PurchaseOrder and OrderDetail tables with a condition between PO.DateExpected and OD.DateRequired will filter the join to those products that can be allocated to the customer’s orders. The following code demonstrates the non-key join (this is not in a sample database):
SELECT OD.OrderID, OD.ProductID, PO.POID
FROM OrderDetail AS OD
INNER JOIN PurchaseOrder AS PO
ON OD.ProductID = PO.ProductID
AND OD.DateRequired > PO.DateExpected;
When working with inner joins, non-key join conditions can be placed in the WHERE clause or in the JOIN. Because the conditions compare similar values between two joined tables, I often place these conditions in the JOIN portion of the FROM clause, rather than the WHERE clause. The choice depends on whether you view the conditions as a part of creating the record set upon which the rest of the SQL SELECT statement is acting, or as a filtering task that follows the FROM clause. Either way, the query-optimization plan is identical, so use the method that is most readable and seems most logical to you. Note that when constructing outer joins, the placement of the condition in the JOIN or in the WHERE clause yields different results, as explained earlier in the section “Placing the Conditions within Outer Joins.”
Asking the question “Who are twins?” of the Family sample database uses all three exotic join techniques in the join between person and twin. The join contains three conditions. The Person.PersonID <> Twin.PersonID condition is a θ join that prevents a person from being considered his or her own twin. The join condition on MotherID, while a foreign key, is nonstandard because it is being joined with another foreign key. The DateOfBirth condition is definitely a non-key join condition:
SELECT Person.FirstName + ' ' + Person.LastName AS Person,
Twin.FirstName + ' ' + Twin.LastName AS Twin,
Person.DateOfBirth
FROM dbo.Person
INNER JOIN dbo.Person AS Twin
ON Person.PersonID <> Twin.PersonID
AND Person.MotherID = Twin.MotherID
AND Person.DateOfBirth = Twin.DateOfBirth;
The following is the same query, this time with the exotic join condition moved to the WHERE clause. Not surprisingly, SQL Server’s Query Optimizer produces the exact same query execution plan for each query:
SELECT Person.FirstName + ' ' + Person.LastName AS Person,
Twin.FirstName + ' ' + Twin.LastName AS Twin,
Person.DateOfBirth
FROM dbo.Person
INNER JOIN dbo.Person AS Twin
ON Person.MotherID = Twin.MotherID
AND Person.DateOfBirth = Twin.DateOfBirth
WHERE Person.PersonID <> Twin.PersonID;
Result:
Person          Twin            DateOfBirth
--------------- --------------- ------------------------
Abbie Halloway  Allie Halloway  1979-010-14 00:00:00.000
Allie Halloway  Abbie Halloway  1979-010-14 00:00:00.000
The difficult query scenarios at the end of the next chapter also demonstrate exotic joins, which are often used with subqueries.
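Because the PersonID condition uses <>, each pair of twins appears twice, once from each twin’s point of view, as the Abbie/Allie result shows. Switching that θ condition to < keeps only the row where the first person has the lower PersonID, so each pair is listed once. The sketch below uses a small hypothetical @Person table variable rather than the Family database, and shows only first names for brevity:

```sql
-- Hypothetical people, not the Family sample data.
DECLARE @Person TABLE (PersonID INT, FirstName VARCHAR(15), MotherID INT, DateOfBirth DATE);
INSERT @Person (PersonID, FirstName, MotherID, DateOfBirth)
VALUES (1, 'Ann', 9, '19800101'),
       (2, 'Ava', 9, '19800101'),  -- same mother and birth date as Ann
       (3, 'Ben', 9, '19820505'),  -- same mother, different birth date
       (4, 'Cal', 8, '19800101');  -- same birth date, different mother

SELECT P.FirstName AS Person, T.FirstName AS Twin, P.DateOfBirth
FROM @Person AS P
  INNER JOIN @Person AS T
    ON P.PersonID < T.PersonID         -- theta condition: each pair only once
   AND P.MotherID = T.MotherID         -- foreign key joined to foreign key
   AND P.DateOfBirth = T.DateOfBirth;  -- non-key condition
-- one row: Ann, Ava, 1980-01-01
```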
Set Difference Queries
A query type that’s useful for analyzing the correlation between two data sets is a set difference query, sometimes called a left (or right) anti-semi join, which finds the difference between the two data sets based on the conditions of the join. In relational algebra terms, it removes the divisor from the dividend, leaving the difference. This type of query is the inverse of an inner join. Informally, it’s called a find unmatched rows query.
Set difference queries are great for locating out-of-place data or data that doesn’t match, such as rows that are in data set one but not in data set two (see Figure 10-12).
FIGURE 10-12
The set difference query finds data that is outside the intersection of the two data sets.
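The find-unmatched-rows idea can also be written without an outer join, as a NOT EXISTS subquery. This is a different technique from the outer-join approach developed below, shown only for comparison; the sketch uses table variable copies of the sample data, with Old Thing assumed to be OnePK 1:

```sql
-- Table variable copies of the sample data; Old Thing's OnePK = 1 is an assumption.
DECLARE @One TABLE (OnePK INT, Thing1 VARCHAR(15));
DECLARE @Two TABLE (TwoPK INT, OnePK INT, Thing2 VARCHAR(15));
INSERT @One (OnePK, Thing1)
VALUES (1, 'Old Thing'), (2, 'New Thing'), (3, 'Red Thing'), (4, 'Blue Thing');
INSERT @Two (TwoPK, OnePK, Thing2)
VALUES (1, 0, 'Plane'), (2, 2, 'Train'), (3, 3, 'Car'), (4, NULL, 'Cycle');

-- Anti-semi join: keep a row from @One only if no row in @Two shares its OnePK.
SELECT O.Thing1
FROM @One AS O
WHERE NOT EXISTS (SELECT *
                  FROM @Two AS T
                  WHERE T.OnePK = O.OnePK);
-- Old Thing, Blue Thing
```

A NOT IN (SELECT OnePK FROM @Two) version would return no rows at all with this data under the default ANSI_NULLS ON setting, because Cycle’s NULL OnePK makes every NOT IN comparison unknown; NOT EXISTS does not have that trap.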
Left set difference query
A left set difference query finds all the rows on the left side of the join without a match on the right side of the join.
Using the One and Two sample tables, the following query locates all rows in table One without a match in table Two, removing set two (the divisor) from set one (the dividend). The result will be the rows from set one that do not have a match in set two.
The outer join already includes the rows outside the intersection, so to construct a set difference query use an OUTER JOIN with an IS NULL restriction on the second data set’s primary key. This will return all the rows from table One that do not have a match in table Two:
USE tempdb;
SELECT Thing1, Thing2
FROM dbo.One
LEFT OUTER JOIN dbo.Two
ON One.OnePK = Two.OnePK
WHERE Two.TwoPK IS NULL;
Table One’s difference is as follows: