FIGURE 10-4
Building an inner join within Management Studio’s Query Designer
Because joins pull together data from two data sets, it makes sense that SQL needs to know how to
match up rows from those sets. SQL Server merges the rows by matching a value common to both
tables. Typically, a primary key value from one table is being matched with a foreign key value from the
secondary table. Whenever a row from the first table matches a row from the second table, the two rows
are merged into a new row containing data from both tables.
The following code sample joins the Tour (secondary) and BaseCamp (primary) tables from the Cape
Hatteras Adventures sample database. The ON clause specifies the common data:
USE CHA2;
SELECT Tour.Name, Tour.BaseCampID, BaseCamp.BaseCampID, BaseCamp.Name
FROM dbo.Tour INNER JOIN dbo.BaseCamp
ON Tour.BaseCampID = BaseCamp.BaseCampID;
The query begins with the Tour table. For every Tour row, SQL Server will attempt to identify
matching BaseCamp rows by comparing the BaseCampID columns in both tables. The Tour table rows and
BaseCamp table rows that match will be merged into a new result.
Number of rows returned
In the preceding query, every row in both the Tour and BaseCamp tables had a match. No rows were
excluded from the join. However, in real life this is seldom the case. Depending upon the number of
matching rows from each data source and the type of join, it’s possible to decrease or increase the final
number of rows in the result set.
To see how joins can alter the number of rows returned, look at the Contact and [Order] tables of
the OBXKites database. The initial row count of contacts is 21, yet when the customers are matched
with their orders, the row count changes to 10. The following code sample compares the two queries
and their respective results:
USE OBXKites;
SELECT ContactCode, LastName
FROM dbo.Contact;

SELECT ContactCode, OrderNumber
FROM dbo.Contact INNER JOIN dbo.[Order]
ON [Order].ContactID = Contact.ContactID
ORDER BY ContactCode;
Results from both queries:
108 Hanks
Joins can appear to multiply rows. If a row on one side of the join matches with several rows on the
other side of the join, the result will include a row for every match. In the preceding query, some
contacts (Smith, Adams, and Reagan) are listed multiple times because they have multiple orders.
Joins also eliminate rows. Only contacts 101 through 106 have matching orders. The rest of the contacts
are excluded from the join because they have no matching orders.
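Both effects can be reproduced on a small scale. The following is a minimal sketch that uses two hypothetical table variables (@Contact and @Orders, filled with made-up rows rather than the OBXKites data) so that it can run in any database:

```sql
-- Hypothetical data for illustration only; not the OBXKites sample data.
DECLARE @Contact TABLE (ContactID INT PRIMARY KEY, LastName VARCHAR(20));
DECLARE @Orders TABLE (OrderID INT PRIMARY KEY, ContactID INT);

INSERT @Contact (ContactID, LastName)
VALUES (1, 'Smith'), (2, 'Adams'), (3, 'Hanks');

INSERT @Orders (OrderID, ContactID)
VALUES (10, 1), (11, 1), (12, 2);

-- Smith matches two orders and is listed twice (rows appear to multiply).
-- Hanks matches no orders and is dropped (rows are eliminated).
SELECT C.LastName, O.OrderID
FROM @Contact AS C
INNER JOIN @Orders AS O
ON O.ContactID = C.ContactID
ORDER BY C.LastName, O.OrderID;
```

With this data the three contacts and three orders produce three joined rows: one for Adams and two for Smith.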
ANSI SQL 89 joins
A join is really nothing more than the act of selecting data from two tables for which a condition of
equality exists between common columns. Join conditions in the ON clause are similar to WHERE clauses.
In fact, before ANSI SQL 92 standardized the JOIN ON syntax, ANSI SQL 89 joins (also called legacy
style joins, old style joins, or even grandpa joins) accomplished the same task by listing the tables within
the FROM clause and specifying the join condition in the WHERE clause.
The previous sample join between Contact and [Order] could be written as an ANSI 89 join as
follows:
SELECT Contact.ContactCode, [Order].OrderNumber
FROM dbo.Contact, dbo.[Order]
WHERE [Order].ContactID = Contact.ContactID
ORDER BY ContactCode;
Best Practice
Always code joins using the ANSI 92 style. ANSI 92 joins are cleaner, easier to read, and easier to debug
than ANSI 89 style joins, which leads to improved data integrity and decreases maintenance costs. With
ANSI 89 style joins it’s possible to get the wrong result unless they are coded very carefully. ANSI 89 style outer
joins are deprecated in SQL Server 2008, so any ANSI 89 outer joins will generate an error.
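One way an ANSI 89 style join can quietly return the wrong result is a forgotten join condition. The sketch below assumes two hypothetical table variables (@A and @B, not part of any sample database) and compares the row counts:

```sql
-- Hypothetical tables for illustration only.
DECLARE @A TABLE (ID INT);
DECLARE @B TABLE (ID INT);

INSERT @A (ID) VALUES (1), (2), (3);
INSERT @B (ID) VALUES (1), (2), (3);

-- ANSI 89 style with the WHERE condition forgotten: the statement is still
-- valid, but it returns every combination of rows (3 x 3 = 9).
SELECT COUNT(*) AS RowsReturned
FROM @A AS A, @B AS B;

-- ANSI 92 style: INNER JOIN requires an ON clause, so leaving out the
-- join condition is a syntax error instead of a silent wrong answer.
SELECT COUNT(*) AS RowsReturned
FROM @A AS A
INNER JOIN @B AS B
ON A.ID = B.ID;
```

The first query counts nine rows and the second counts three, which is the kind of difference that is easy to miss when the tables are large.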
Multiple data source joins
As some of the examples have already demonstrated, a SELECT statement isn’t limited to one or two
data sources (tables, views, CTEs, subqueries, etc.); a SQL Server SELECT statement may refer to up to
256 data sources. That’s a lot of joins.
Because SQL is a declarative language, the order of the data sources is not important for inner joins.
(The query optimizer will decide the best order to actually process the query based on the indexes
available and the data in the tables.) Multiple joins may be combined in multiple paths, or even circular
patterns (A joins B joins C joins A). Here’s where a large whiteboard and a consistent development style
really pay off.
The following query (first shown in Figure 10-5 and then worked out in code) answers the question
“Who purchased kites?” The answer must involve five tables:
FIGURE 10-5
Answering the question “Who purchased kites?” using Management Studio’s Query Designer
1. The Contact table for the “who”
2. The [Order] table for the “purchased”
3. The OrderDetail table for the “purchased”
4. The Product table for the “kites”
5. The ProductCategory table for the “kites”
The following SQL SELECT statement begins with the “who” portion of the question and specifies the
join tables and conditions as it works through the required tables. The query that is shown graphically
in Management Studio (refer to Figure 10-5) is listed as raw SQL in the following code sample. Notice
how the WHERE clause restricts the ProductCategory table rows and yet affects the contacts selected:
USE OBXKites;
SELECT LastName, FirstName, ProductName
FROM dbo.Contact C
INNER JOIN dbo.[Order] O
ON C.ContactID = O.ContactID
INNER JOIN dbo.OrderDetail OD
ON O.OrderID = OD.OrderID
INNER JOIN dbo.Product P
ON OD.ProductID = P.ProductID
INNER JOIN dbo.ProductCategory PC
ON P.ProductCategoryID = PC.ProductCategoryID
WHERE ProductCategoryName = 'Kite'
ORDER BY LastName, FirstName;
To summarize the main points about inner joins:
■ They only match rows with a common value
■ The order of the data sources is unimportant
■ They can appear to multiply rows
■ The newer ANSI 92 style is the best way to write them
Outer Joins
Whereas an inner join contains only the intersection of the two data sets, an outer join extends the inner
join by adding the nonmatching data from the left or right data set, as illustrated in Figure 10-6.
Outer joins solve a significant problem for many queries by including all the data regardless of a match.
The common customer-order query demonstrates this problem well. If the requirement is to build a
query that lists all customers plus their recent orders, only an outer join can retrieve every customer
whether the customer has placed an order or not. An inner join between customers and orders would
miss every customer who did not place a recent order.
Depending on the nullability of the keys and the presence of rows on both sides of the join, it’s easy to
write a query that misses rows from one side or the other of the join. I’ve even seen this error in
third-party ISV application code. To avoid this data integrity error, know your
schema well and always unit test your queries against a small data set with known answers.
FIGURE 10-6
An outer join includes not only rows from the two data sources with a match, but also unmatched
rows from outside the intersection
[Venn diagram: Data Set A and Data Set B overlap in the Common Intersection; regions are labeled Left Outer Join and Right Outer Join]
Some of the data in the result set produced by an outer join will look just like the data from an inner
join. There will be data in columns that come from each of the data sources, but any rows from the
outer-join table that do not have a match in the other side of the join will return data only from the
outer-join table. In this case, columns from the other data source will have null values.
A Join Analogy
When I teach how to build queries, I sometimes use the following story to explain the different types of
joins. Imagine a pilgrim church in the seventeenth century, segmented by gender. The men all sit on
one side of the church and the women on the other. Some of the men and women are married, and some
are single. Now imagine that each side of the church is a database table and the various combinations of
people that leave the church represent the different types of joins.
If all the married couples stood up, joined hands, and left the church, that would be an inner join between
the men and women. The result set leaving the church would include only matched pairs.
If all the men stood, and those who were married held hands with their wives and they left as a group, that
would be a left outer join. The line leaving the church would include some couples and some bachelors.
Likewise, if all women and their husbands left the church, that would be a right outer join. All the bachelors
would be left alone in the church.
A full outer join (covered later in this chapter) would be everyone leaving the church, but only the married
couples could hold hands.
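The church analogy can be tried out directly in T-SQL. The following sketch is only an illustration: the @Men and @Women table variables, their columns, and the names in them are all invented for this example.

```sql
-- Hypothetical congregation: two married couples, one bachelor, one single woman.
DECLARE @Women TABLE (WomanID INT PRIMARY KEY, Name VARCHAR(20));
DECLARE @Men TABLE (ManID INT PRIMARY KEY, Name VARCHAR(20), WifeID INT NULL);

INSERT @Women (WomanID, Name)
VALUES (1, 'Abigail'), (2, 'Mary'), (3, 'Ruth');

INSERT @Men (ManID, Name, WifeID)
VALUES (1, 'John', 1), (2, 'William', 2), (3, 'Miles', NULL);

-- Inner join: only the married couples leave (2 rows).
SELECT M.Name AS Man, W.Name AS Woman
FROM @Men AS M
INNER JOIN @Women AS W
ON M.WifeID = W.WomanID;

-- Left outer join: every man leaves, with his wife if he has one (3 rows).
SELECT M.Name AS Man, W.Name AS Woman
FROM @Men AS M
LEFT OUTER JOIN @Women AS W
ON M.WifeID = W.WomanID;

-- Right outer join: every woman leaves, with her husband if she has one (3 rows).
SELECT M.Name AS Man, W.Name AS Woman
FROM @Men AS M
RIGHT OUTER JOIN @Women AS W
ON M.WifeID = W.WomanID;

-- Full outer join: everyone leaves; only the couples are paired (4 rows).
SELECT M.Name AS Man, W.Name AS Woman
FROM @Men AS M
FULL OUTER JOIN @Women AS W
ON M.WifeID = W.WomanID;
```

In the outer-join results, the unmatched person appears with a null in the column that would have held the spouse.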
Using the Query Designer to create outer joins
When building queries using the Query Designer, the join type can be changed from the default, inner
join, to an outer join via either the context menu or the properties of the join, as shown in Figure
10-7. The Query Designer does an excellent job of illustrating the types of joins with the join symbol (as
previously detailed in Table 10-1).
FIGURE 10-7
The join Properties window displays the join columns, and is used to set the join condition (=, >, <,
etc.) and add the left or right side of an outer join (all rows from Product, all rows from OrderDetail)
T-SQL code and outer joins
In SQL code, an outer join is declared by the keywords LEFT OUTER or RIGHT OUTER before the JOIN
(technically, the keyword OUTER is optional):
SELECT * FROM Table1
LEFT|RIGHT [OUTER] JOIN Table2
ON Table1.column = Table2.column;
Several keywords (such as INNER, OUTER, or AS) in SQL are optional or may be abbreviated. Even where
shorter syntax is allowed, explicitly stating the intent by spelling out the full syntax improves the readability of the code.
There’s no trick to telling the difference between left and right outer joins. In code, left or right refers to
the table that will be included regardless of the match. The outer-join table (sometimes called the
driving table) is typically listed first, so left outer joins are more common than right outer joins. I suspect
any confusion between left and right outer joins is caused by the use of graphical-query tools to build
joins, because left and right refer to the table’s listing in the SQL text, and the tables’ positions in the
graphical-query tool are moot.
Best Practice
When coding outer joins, always order your data sources so you can write left outer joins. Don’t use
right outer joins, and never mix left outer joins and right outer joins.
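Any right outer join can be turned into a left outer join by swapping the order of the two data sources. A brief sketch, assuming two hypothetical table variables (@Contact and @Orders) with made-up rows:

```sql
-- Hypothetical data for illustration only.
DECLARE @Contact TABLE (ContactID INT PRIMARY KEY, LastName VARCHAR(20));
DECLARE @Orders TABLE (OrderID INT PRIMARY KEY, ContactID INT);

INSERT @Contact (ContactID, LastName) VALUES (1, 'Smith'), (2, 'Hanks');
INSERT @Orders (OrderID, ContactID) VALUES (10, 1);

-- Right outer join: the table whose rows are all kept is listed second.
SELECT C.LastName, O.OrderID
FROM @Orders AS O
RIGHT OUTER JOIN @Contact AS C
ON O.ContactID = C.ContactID;

-- Equivalent left outer join: the same table is simply listed first.
SELECT C.LastName, O.OrderID
FROM @Contact AS C
LEFT OUTER JOIN @Orders AS O
ON O.ContactID = C.ContactID;
```

Both queries return the same two rows, one for Smith with order 10 and one for Hanks with a null OrderID. The second form reads in the same direction as the data flows, which is the reason for preferring it.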
To modify the previous contact-order query so that it returns all contacts regardless of any orders,
changing the join type from inner to left outer is all that’s required, as follows:
SELECT ContactCode, OrderNumber
FROM dbo.Contact
LEFT OUTER JOIN dbo.[Order]
ON [Order].ContactID = Contact.ContactID
ORDER BY ContactCode;
The left outer join will include all rows from the Contact table and matching rows from the [Order]
table. The abbreviated result of the query is as follows:
Because contacts 107 and 108 do not have corresponding rows in the [Order] table, the columns from
the [Order] table return a null for those rows.
Earlier versions of SQL Server extended the ANSI SQL 89 legacy join syntax with an outer-join
condition. While this syntax worked through SQL Server 2000, it has since been deprecated.
Having said that, SQL Server supports backward compatibility, so if the database compatibility level is
set to 80 (SQL Server 2000), then the ANSI 89 style outer joins still work.
Outer joins and optional foreign keys
Outer joins are often employed when a secondary table has a foreign-key constraint to the primary table
and permits nulls in the foreign key column. The presence of this optional foreign key means that if the
secondary row refers to a primary row, then the primary row must exist. However, it’s perfectly valid for
the secondary row to refrain from referring to the primary table at all.
Another example of an optional foreign key is an order alert or priority column. Many order rows will
not have an alert or special-priority status. However, those that do must point to a valid row in the
order-priority table.
The OBX Kite store uses a similar order-priority scheme, so reporting all the orders with their optional
priorities requires an outer join:
SELECT OrderNumber, OrderPriorityName
FROM dbo.[Order]
LEFT OUTER JOIN dbo.OrderPriority
ON [Order].OrderPriorityID = OrderPriority.OrderPriorityID;
The left outer join retrieves all the orders and any matching priorities. The OBXKites_Populate.sql
script sets two orders to rush priority.
The adjacency pairs pattern (also called reflexive, recursive, or self-join relationships, covered in
Chapter 17, “Traversing Hierarchies”) also uses optional foreign keys. In the Family sample database,
the MotherID and FatherID are both foreign keys that refer to the PersonID of the mother or
father. The optional foreign key allows persons to be entered without their father and mother already
in the database; but if a value is entered in the MotherID or FatherID columns, then the data must
point to valid persons in the database.
functions. You’ll find that covered in Chapter 25, “Building User-Defined Functions.”
Full outer joins
A full outer join returns all the data from both data sets regardless of the intersection, as shown in
Figure 10-8. It is functionally the same as taking the results from a left outer join and the results from a
right outer join, and unioning them together (unions are explained later in this chapter).
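The equivalence can be checked with a small experiment. The sketch below assumes two hypothetical table variables (@A and @B) whose rows are invented for the example; because UNION removes duplicate rows, the comparison is exact only when the joined rows are themselves unique, as they are here.

```sql
-- Hypothetical data: ID 1 exists only in @A, ID 3 only in @B, ID 2 in both.
DECLARE @A TABLE (ID INT PRIMARY KEY, ThingA VARCHAR(10));
DECLARE @B TABLE (ID INT PRIMARY KEY, ThingB VARCHAR(10));

INSERT @A (ID, ThingA) VALUES (1, 'a1'), (2, 'a2');
INSERT @B (ID, ThingB) VALUES (2, 'b2'), (3, 'b3');

-- Full outer join: one matched row plus one unmatched row from each side.
SELECT A.ID AS A_ID, A.ThingA, B.ID AS B_ID, B.ThingB
FROM @A AS A
FULL OUTER JOIN @B AS B
ON A.ID = B.ID;

-- Left outer join and right outer join combined; UNION collapses the
-- matched row that both halves return, leaving the same three rows.
SELECT A.ID AS A_ID, A.ThingA, B.ID AS B_ID, B.ThingB
FROM @A AS A
LEFT OUTER JOIN @B AS B
ON A.ID = B.ID
UNION
SELECT A.ID AS A_ID, A.ThingA, B.ID AS B_ID, B.ThingB
FROM @A AS A
RIGHT OUTER JOIN @B AS B
ON A.ID = B.ID;
```

Both statements return three rows, with nulls filling the columns of whichever side has no match.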
FIGURE 10-8
The full outer join returns all the data from both data sets, matching the rows where it can and filling
in the holes with nulls
[Venn diagram: the Full Outer Join covers Data Set A, Data Set B, and their Common Intersection]
In real life, referential integrity reduces the need for a full outer join because every row from the
secondary table should have a match in the primary table (depending on the optionality of the foreign key),
so left outer joins are typically sufficient. Full outer joins are most useful for cleaning up data that has
not had the benefit of clean constraints to filter out bad data.
Red thing blue thing
The following example is a mock-up of such a situation and compares the full outer join with an inner
and a left outer join. Table One is the primary table. Table Two is a secondary table with a foreign key
that refers to table One. There’s no foreign-key constraint, so there may be some nonmatches for the
outer join to find:
CREATE TABLE dbo.One (
OnePK INT,
Thing1 VARCHAR(15)
);
CREATE TABLE dbo.Two (
TwoPK INT,
OnePK INT,
Thing2 VARCHAR(15)
);
The sample data includes rows that would normally break referential integrity. As illustrated in
Figure 10-9, the foreign key values (OnePK) for the plane and the cycle in table Two do not have a match
in table One; and two of the rows in table One do not have related secondary rows in table Two. The
following batch inserts the eight sample data rows:
INSERT dbo.One(OnePK, Thing1)
VALUES (1, 'Old Thing');