relational division query would list only those students who passed the required courses and no others.
A relational division with a remainder, also called an approximate divide, would list all the students who
passed the required courses and include students who passed any additional courses. Of course, that
example is both practical and academic.
Relational division is more complex than a join. A join simply finds any matches between two data sets.
Relational division finds exact matches between two data sets. Joins/subqueries and relational division
solve different types of questions. For example, the following questions apply to the sample databases
and compare the two methods:
■ Joins/subqueries:
  ■ CHA2: Who has ever gone on a tour?
  ■ CHA2: Who lives in the same region as a base camp?
  ■ CHA2: Who has attended any event in his or her home region?
■ Exact relational division:
  ■ CHA2: Who has gone on every tour in his or her home state but no tours outside it?
  ■ OBXKites: Who has purchased every kite but nothing else?
  ■ Family: Which women (widows or divorcees) have married the same husbands as each other, but no other husbands?
■ Relational division with remainders:
  ■ CHA2: Who has gone on every tour in his or her home state, and possibly other tours as well?
  ■ OBXKites: Who has purchased every kite and possibly other items as well?
  ■ Family: Which women have married the same husbands and may have married other men as well?
Relational division with a remainder
Relational division with a remainder essentially extracts the quotient while allowing some leeway for
rows that meet the criteria but contain additional data as well. In real-life situations this type of division
is typically more useful than an exact relational division.
The previous OBXKites sales question (“Who has purchased every kite and possibly other items as
well?”) is a good one to use to demonstrate relational division. Because it takes five tables to go from
contact to product category, and because the question refers to the join between OrderDetail and
Product, this question involves enough complexity that it simulates a real-world relational-database
problem.
The toy category serves as a good example category because it contains only two toys and no one has
purchased a toy in the sample data, so the query will answer the question “Who has purchased at least
one of every toy sold by OBXKites?” (Yes, my kids volunteered to help test this query.)
First, the following data will mock up a scenario in the OBXKites database. The only toys are
ProductCode 1049 and 1050. The OBXKites database uses unique identifiers for primary keys and
therefore uses stored procedures for all inserts. The first Order and OrderDetail inserts will list the
stored procedure parameters so the following stored procedure calls are easier to understand:
USE OBXKites;
DECLARE @OrderNumber INT;
The first person, ContactCode 110, orders exactly all toys:
EXEC pOrder_AddNew
  @ContactCode = '110',
  @EmployeeCode = '120',
  @LocationCode = 'CH',
  @OrderDate = '2002/6/1',
  @OrderNumber = @OrderNumber OUTPUT;
EXEC pOrder_AddItem
  @OrderNumber = @OrderNumber,
  @Code = '1049',
  @NonStockProduct = NULL,
  @Quantity = 12,
  @UnitPrice = NULL,
  @ShipRequestDate = '2002/6/1',
  @ShipComment = NULL;
EXEC pOrder_AddItem
  @OrderNumber, '1050', NULL, 3, NULL, NULL, NULL;
The second person, ContactCode 111, orders exactly all toys, and toy 1050 twice:
EXEC pOrder_AddNew
  '111', '119', 'JR', '2002/6/1', @OrderNumber OUTPUT;
EXEC pOrder_AddItem
  @OrderNumber, '1049', NULL, 6, NULL, NULL, NULL;
EXEC pOrder_AddItem
  @OrderNumber, '1050', NULL, 6, NULL, NULL, NULL;
EXEC pOrder_AddNew
  '111', '119', 'JR', '2002/6/1', @OrderNumber OUTPUT;
EXEC pOrder_AddItem
  @OrderNumber, '1050', NULL, 6, NULL, NULL, NULL;
The third person, ContactCode 112, orders all toys plus some other products:
EXEC pOrder_AddNew
  '112', '119', 'JR', '2002/6/1', @OrderNumber OUTPUT;
EXEC pOrder_AddItem
  @OrderNumber, '1049', NULL, 6, NULL, NULL, NULL;
EXEC pOrder_AddItem
  @OrderNumber, '1050', NULL, 5, NULL, NULL, NULL;
EXEC pOrder_AddItem
  @OrderNumber, '1001', NULL, 5, NULL, NULL, NULL;
EXEC pOrder_AddItem
  @OrderNumber, '1002', NULL, 5, NULL, NULL, NULL;
The fourth person, ContactCode 113, orders one toy:
EXEC pOrder_AddNew
EXEC pOrder_AddItem
  @OrderNumber, '1049', NULL, 6, NULL, NULL, NULL;
In other words, only customers 110 and 111 order all the toys and nothing else. Customer 112
purchases all the toys, as well as some kites. Customer 113 is an error check because she bought only one
toy.
At least a couple of methods exist for coding a relational-division query. The original method, proposed
by Chris Date, involves using nested correlated subqueries to locate rows in and out of the sets. A more
direct method has been popularized by Joe Celko: It involves comparing the row count of the dividend
and divisor data sets.
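The nested correlated subquery approach can be sketched as a double NOT EXISTS: "list the contacts for whom there is no toy that they have not ordered." The following is only a minimal illustration and is not taken from the OBXKites script. The table variables @Ordered and @Toy are hypothetical stand-ins for the five-table join, and the multi-row VALUES syntax assumes SQL Server 2008 or later:

```sql
-- Hypothetical stand-ins for the OBXKites tables (assumption, not the real schema)
DECLARE @Toy TABLE (ProductCode INT NOT NULL PRIMARY KEY);
DECLARE @Ordered TABLE (ContactCode INT NOT NULL, ProductCode INT NOT NULL);

INSERT INTO @Toy (ProductCode) VALUES (1049), (1050);
INSERT INTO @Ordered (ContactCode, ProductCode)
  VALUES (110, 1049), (110, 1050),               -- every toy
         (112, 1049), (112, 1050), (112, 1001),  -- every toy plus a kite
         (113, 1049);                            -- only one toy

-- Keep a contact only if no toy exists that the contact has not ordered
SELECT DISTINCT o.ContactCode
  FROM @Ordered AS o
  WHERE NOT EXISTS
    (SELECT *
       FROM @Toy AS t
       WHERE NOT EXISTS
         (SELECT *
            FROM @Ordered AS o2
            WHERE o2.ContactCode = o.ContactCode
              AND o2.ProductCode = t.ProductCode));
```

With this sample data the query returns contacts 110 and 112. Contact 113 drops out because toy 1050 has no matching order row, and contact 112 stays because extra products are allowed in a division with a remainder.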
Basically, Celko’s solution is to rephrase the question as “For whom is the number of toys ordered equal
to the number of toys available?”
The query is asking two questions. The outer query will group the orders with toys for each contact,
and the subquery will count the number of products in the toy product category. The outer query’s
HAVING clause will then compare the distinct count of contact products ordered that are toys against
the count of products that are toys:
-- Is number of toys ordered
SELECT Contact.ContactCode
  FROM dbo.Contact
    JOIN dbo.[Order]
      ON Contact.ContactID = [Order].ContactID
    JOIN dbo.OrderDetail
      ON [Order].OrderID = OrderDetail.OrderID
    JOIN dbo.Product
      ON OrderDetail.ProductID = Product.ProductID
    JOIN dbo.ProductCategory
      ON Product.ProductCategoryID = ProductCategory.ProductCategoryID
  WHERE ProductCategory.ProductCategoryName = 'Toy'
  GROUP BY Contact.ContactCode
  HAVING COUNT(DISTINCT Product.ProductCode) =
    -- equal to number of toys available?
    (SELECT COUNT(ProductCode)
       FROM dbo.Product
         JOIN dbo.ProductCategory
           ON Product.ProductCategoryID = ProductCategory.ProductCategoryID
       WHERE ProductCategory.ProductCategoryName = 'Toy');
Result:
ContactCode
-----------
110
111
112
Some techniques in the previous query, namely GROUP BY, HAVING, and COUNT(), are
explained in the next chapter, “Aggregating Data.”
Exact relational division
Exact relational division finds exact matches without any remainder. It takes the basic question of
relational division with remainder and tightens the method so that the divisor will have no extra rows that
cause a remainder.
In practical terms it means that the example question now asks, “Who has ordered only every toy?”
If you address this query with a modified form of Joe Celko’s method, the pseudocode becomes “For
whom is the number of toys ordered equal to the number of toys available, and also equal to the total
number of products ordered?” If a customer has ordered additional products other than toys, then the
third part of the question eliminates that customer from the result set.
The SQL code contains two primary changes to the previous query. One, the outer query must find
both the number of toys ordered and the number of all products ordered. It does this by finding the
toys purchased in a derived table and joining the two data sets. Two, the HAVING clause must be
modified to compare the number of toys available with both the number of toys purchased and the
number of all products purchased, as follows:
-- Exact Relational Division
-- Is number of all products ordered
SELECT Contact.ContactCode
  FROM dbo.Contact
    JOIN dbo.[Order]
      ON Contact.ContactID = [Order].ContactID
    JOIN dbo.OrderDetail
      ON [Order].OrderID = OrderDetail.OrderID
    JOIN dbo.Product
      ON OrderDetail.ProductID = Product.ProductID
    JOIN dbo.ProductCategory P1
      ON Product.ProductCategoryID = P1.ProductCategoryID
    JOIN
      -- and number of toys ordered
      (SELECT Contact.ContactCode, Product.ProductCode
         FROM dbo.Contact
           JOIN dbo.[Order]
             ON Contact.ContactID = [Order].ContactID
           JOIN dbo.OrderDetail
             ON [Order].OrderID = OrderDetail.OrderID
           JOIN dbo.Product
             ON OrderDetail.ProductID = Product.ProductID
           JOIN dbo.ProductCategory
             ON Product.ProductCategoryID = ProductCategory.ProductCategoryID
         WHERE ProductCategory.ProductCategoryName = 'Toy'
      ) ToysOrdered
      ON Contact.ContactCode = ToysOrdered.ContactCode
  GROUP BY Contact.ContactCode
  HAVING COUNT(DISTINCT Product.ProductCode) =
    -- equal to number of toys available?
    (SELECT COUNT(ProductCode)
       FROM dbo.Product
         JOIN dbo.ProductCategory
           ON Product.ProductCategoryID = ProductCategory.ProductCategoryID
       WHERE ProductCategory.ProductCategoryName = 'Toy')
    -- AND equal to the total number of any product ordered?
    AND COUNT(DISTINCT ToysOrdered.ProductCode) =
    (SELECT COUNT(ProductCode)
       FROM dbo.Product
         JOIN dbo.ProductCategory
           ON Product.ProductCategoryID = ProductCategory.ProductCategoryID
       WHERE ProductCategory.ProductCategoryName = 'Toy');
The result is a list of contacts for whom the number of toys purchased (2) and the total number of
products purchased (2) both equal the number of toys available (2):
ContactCode
-----------
110
111
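The same count comparison can be tried on a much smaller scale. The sketch below is an illustration under stated assumptions, not the OBXKites query: @Ordered and @Toy are hypothetical table variables, and a LEFT JOIN with two distinct counts replaces the derived-table join used above:

```sql
-- Hypothetical stand-ins for the OBXKites tables (assumption, not the real schema)
DECLARE @Toy TABLE (ProductCode INT NOT NULL PRIMARY KEY);
DECLARE @Ordered TABLE (ContactCode INT NOT NULL, ProductCode INT NOT NULL);

INSERT INTO @Toy (ProductCode) VALUES (1049), (1050);
INSERT INTO @Ordered (ContactCode, ProductCode)
  VALUES (110, 1049), (110, 1050),
         (111, 1049), (111, 1050), (111, 1050),               -- toy 1050 twice
         (112, 1049), (112, 1050), (112, 1001), (112, 1002),  -- toys plus kites
         (113, 1049);                                         -- only one toy

SELECT o.ContactCode
  FROM @Ordered AS o
    LEFT JOIN @Toy AS t
      ON t.ProductCode = o.ProductCode
  GROUP BY o.ContactCode
  -- number of distinct toys ordered equals number of toys available
  HAVING COUNT(DISTINCT t.ProductCode) = (SELECT COUNT(*) FROM @Toy)
  -- and number of distinct products of any kind ordered equals it too
     AND COUNT(DISTINCT o.ProductCode) = (SELECT COUNT(*) FROM @Toy);
```

With this sample data only contacts 110 and 111 are returned. Dropping the second HAVING condition turns the query back into a relational division with a remainder, which also returns contact 112.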
Composable SQL
Composable SQL, also called select from output or DML table source (in SQL Server BOL), is the ability
to pass data from an insert, update, or delete’s output clause to an outer query. This is a very powerful
new way to build subqueries, and it can significantly reduce the amount of code and improve the
performance of code that needs to write to one table, and then, based on that write, write to another table.
To track the evolution of composable SQL (illustrated in Figure 11-3), SQL Server has always had
DML triggers, which include the inserted and deleted virtual tables. Essentially, these are a view to the
DML modification that fired the triggers. The deleted table holds the before image of the data, and the
inserted table holds the after image.
Since SQL Server 2005, any DML statement that modifies data (INSERT, UPDATE, DELETE, MERGE)
can have an optional OUTPUT clause that can SELECT from the virtual inserted and deleted tables. The
OUTPUT clause can pass the data to the client or insert it directly into a table.
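As a minimal sketch of the OUTPUT clause on its own, the following batch captures the before and after images of an update in a table variable. The table variables @Department and @Audit and their contents are hypothetical and exist only for this illustration:

```sql
-- Hypothetical tables for illustration only
DECLARE @Department TABLE (Name VARCHAR(50) NOT NULL, GroupName VARCHAR(50) NOT NULL);
DECLARE @Audit TABLE (oldvalue VARCHAR(50), newvalue VARCHAR(50));

INSERT INTO @Department (Name, GroupName)
  VALUES ('Sales', 'Sales and Marketing');

UPDATE @Department
  SET GroupName = 'Output Clause Test'
  OUTPUT deleted.GroupName,   -- before image
         inserted.GroupName   -- after image
    INTO @Audit (oldvalue, newvalue)
  WHERE Name = 'Sales';

SELECT oldvalue, newvalue
  FROM @Audit;
```

Omitting the INTO line sends the same two columns to the client as a result set instead of storing them.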
The inserted and deleted virtual tables are covered in Chapter 26, “Creating DML
Triggers,” and the output clause is detailed in Chapter 15, “Modifying Data.”
In SQL Server 2008, composable SQL can place a DML statement and its OUTPUT clause in a
subquery and then select from that subquery. The primary benefit of composable SQL, as opposed to just
using the OUTPUT clause to insert into a table, is that OUTPUT clause data may be further filtered and
manipulated by the outer query.
FIGURE 11-3
Composable SQL is an evolution of the inserted and deleted tables
[Figure: for DML (Insert, Update, Delete, Merge), SQL 2000 provides the inserted and deleted tables; SQL 2005 adds Output to the client, table variable, temp tables, and tables; SQL 2008 adds Select From Output in a subquery]
The following script first creates a table and then has a composable SQL query. The subquery has an
UPDATE command with an OUTPUT clause. The OUTPUT clause passes the oldvalue and newvalue
columns to the outer query. The outer query filters out TestData and then inserts it into the CompSQL
table:
CREATE TABLE CompSQL (oldvalue varchar(50), newvalue varchar(50));

INSERT INTO CompSQL (oldvalue, newvalue)
  SELECT oldvalue, newvalue
    FROM
      (UPDATE HumanResources.Department
         SET GroupName = 'Composable SQL Test'
         OUTPUT Deleted.GroupName AS oldvalue,
                Inserted.GroupName AS newvalue
         WHERE Name = 'Sales') Q;

SELECT oldvalue, newvalue
  FROM CompSQL
  WHERE newvalue <> 'TestData';
Result:
oldvalue             newvalue
-------------------  -------------------
Sales and Marketing  Composable SQL Test
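To illustrate the filtering benefit described earlier, the outer query can carry its own WHERE clause against the OUTPUT columns. This is a sketch under assumptions: dbo.DemoPrice and dbo.DemoPriceAudit are hypothetical tables created only for the example, and the audit table deliberately has no triggers or foreign keys:

```sql
-- Hypothetical demo tables; the names are illustrative only
CREATE TABLE dbo.DemoPrice (
  ProductCode INT NOT NULL PRIMARY KEY,
  Price MONEY NOT NULL);
CREATE TABLE dbo.DemoPriceAudit (
  ProductCode INT NOT NULL,
  OldPrice MONEY NOT NULL,
  NewPrice MONEY NOT NULL);

INSERT INTO dbo.DemoPrice (ProductCode, Price)
  VALUES (1049, 8.00), (1050, 12.00);

-- Raise every price by 50 percent, but audit only the rows
-- whose price crossed the 15.00 mark
INSERT INTO dbo.DemoPriceAudit (ProductCode, OldPrice, NewPrice)
  SELECT ProductCode, OldPrice, NewPrice
    FROM (UPDATE dbo.DemoPrice
            SET Price = Price * 1.5
            OUTPUT inserted.ProductCode,
                   deleted.Price AS OldPrice,
                   inserted.Price AS NewPrice) AS Q
    WHERE Q.OldPrice < 15.00
      AND Q.NewPrice >= 15.00;

SELECT ProductCode, OldPrice, NewPrice
  FROM dbo.DemoPriceAudit;
```

Both rows in dbo.DemoPrice are updated, but only product 1050 (12.00 to 18.00) reaches the audit table. The WHERE clause of the outer query limits what is inserted, not what is updated.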
Note several restrictions on composable SQL:
■ The update DML in the subquery must modify a local table and cannot be a partitioned view.
■ The composable SQL query cannot include nested composable SQL, aggregate functions, subqueries, ranking functions, full-text features, user-defined functions that perform data access, or the textptr function.
■ The target table must be a local base table with no triggers, no foreign keys, and no merge replication or updatable subscriptions for transactional replication.
Summary
While the basic nuts and bolts of subqueries may appear simple, they open a world of possibilities, as
they enable you to build complex nested queries that pull and twist data into the exact shape that is
needed to solve a difficult problem. As you continue to play with subqueries, I think you’ll agree that
herein lies the power of SQL, and if you’re still developing primarily with the GUI tools, this might
provide the catalyst to move you to developing SQL using the query text editor.
A few key points from this chapter:
■ Simple subqueries are executed once and the results are inserted into the outer query.
■ Subqueries can be used in nearly every portion of the query, not just as derived tables.
■ Correlated subqueries refer to the outer query, so they can’t be executed by themselves. Conceptually, the outer query is executed and the results are passed to the correlated subquery, which is executed once for every row in the outer query.
■ You don’t need to memorize how to code relational division; just remember that if you need to join not on any row but every row, then relational division is the set-based solution to do the job.
■ Composable SQL is useful if you need to write to multiple tables from a single transaction, but there are plenty of limitations.
The previous chapters established the foundation for working with SQL, covering the SELECT
statement, expressions, joins, and unions, while this chapter expanded the SELECT with powerful subqueries
and CTEs. If you’re reading through this book sequentially, congratulations: you are now over the
hump of learning SQL. If you can master relational algebra and subqueries, the rest is a piece of cake.
The next chapter continues to describe the repertoire of data-retrieval techniques with aggregation
queries, where using subqueries pays off.
Aggregating Data
IN THIS CHAPTER
Calculating sums and averages
Statistical analysis
Grouping data within a query
Solving aggravating aggregation problems
Generating cumulative totals
Building crosstab queries with the case, pivot, and dynamic methods
The Information Architecture Principle in Chapter 2 implies that
information, not just data, is an asset. Turning raw lists of keys and data into
useful information often requires summarizing data and grouping it in
meaningful ways. While summarization and analysis can certainly be performed
with other tools, such as Reporting Services, Analysis Services, or an external tool
such as SAS, SQL is a set-based language, and a fair amount of summarizing and
grouping can be performed very well within the SQL SELECT statement.
SQL excels at calculating sums, max values, and averages for the entire data set
or for segments of data. In addition, SQL queries can create cross-tabulations,
commonly known as pivot tables.
Simple Aggregations
The premise of an aggregate query is that instead of returning all the selected
rows, SQL Server returns a single row of computed values that summarizes the
original data set, as illustrated in Figure 12-1. More complex aggregate queries
can slice the selected rows into subsets and then summarize every subset.
The types of aggregate calculations range from totaling the data to performing
basic statistical operations.
It’s important to note that in the logical order of the SQL query, the aggregate
functions (indicated by the Summing function in the diagram) occur following
the FROM clause and the WHERE filters. This means that the data can be
assembled and filtered prior to being summarized without needing to use a subquery,
although sometimes a subquery is still needed to build more complex aggregate
queries (as detailed later in the “Aggravating Queries” section in this chapter).
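The ordering can be seen in a small sketch. The table variable @Sales and its rows are hypothetical and are used only to keep the example self-contained; the WHERE filter removes the North row before COUNT() and SUM() are evaluated, so no subquery is needed:

```sql
-- Hypothetical data for illustration only
DECLARE @Sales TABLE (Region VARCHAR(10) NOT NULL, Amount INT NULL);

INSERT INTO @Sales (Region, Amount)
  VALUES ('South', 10), ('South', 20), ('North', 5);

-- FROM and WHERE run first, then the aggregates summarize what is left
SELECT COUNT(*) AS SouthRows,     -- 2
       SUM(Amount) AS SouthTotal  -- 30
  FROM @Sales
  WHERE Region = 'South';
```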
What’s New with Query Aggregations?
Microsoft continues to evolve T-SQL’s ability to aggregate data. SQL Server 2005 included the capability
to roll your own aggregate functions using the .NET CLR. SQL Server 2008 expands this feature by
removing the 8,000-byte limit on intermediate results for CLR user-defined aggregate functions.
The most significant enhancement to query aggregation in SQL Server 2008 is the ability to use grouping sets
to further define the CUBE and ROLLUP functions with the GROUP BY clause.
WITH ROLLUP and WITH CUBE have been deprecated, as they are non-ISO-compliant syntax for special
cases of the ISO-compliant syntax. They are replaced with the new, more powerful, syntax for ROLLUP and
CUBE.
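As a brief sketch of the syntax change, the three queries below should produce the same rows: one total per category plus a grand total row. The table variable @Sales is hypothetical, and the second and third forms assume SQL Server 2008 or later:

```sql
-- Hypothetical data for illustration only
DECLARE @Sales TABLE (Category CHAR(1) NOT NULL, Amount INT NOT NULL);

INSERT INTO @Sales (Category, Amount)
  VALUES ('X', 10), ('X', 20), ('Y', 5);

-- Older, deprecated form
SELECT Category, SUM(Amount) AS Total
  FROM @Sales
  GROUP BY Category WITH ROLLUP;

-- ISO-compliant ROLLUP
SELECT Category, SUM(Amount) AS Total
  FROM @Sales
  GROUP BY ROLLUP (Category);

-- The same grouping spelled out as grouping sets
SELECT Category, SUM(Amount) AS Total
  FROM @Sales
  GROUP BY GROUPING SETS ((Category), ());
```

In each result the grand total row shows NULL in the Category column.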
FIGURE 12-1
The aggregate function produces a single row result from a data set
[Figure: Data Source(s), From, Where, Col(s)/Expr(s), Summing, Single Row]
Basic aggregations
SQL includes a set of aggregate functions, listed in Table 12-1, which can be used as expressions in the
SELECT statement to return summary data.
ON the WEBSITE
The code examples for this chapter use a small table called RawData. The code to create and populate
this data set is at the beginning of the chapter’s script. You can
download the script from www.SQLServerBible.com.
CREATE TABLE RawData (
  RawDataID INT NOT NULL IDENTITY PRIMARY KEY,
  Region VARCHAR(10) NOT NULL,
  Category CHAR(1) NOT NULL,
  Amount INT NULL,
  SalesDate DATE NOT NULL );
TABLE 12-1
Basic Aggregate Functions
Aggregate Function | Data Type Supported | Description
sum() | Numeric | Totals all the non-null values in the column.
avg() | Numeric | Averages all the non-null values in the column. The result has the same data type as the input, so the input is often converted to a higher precision, such as avg(cast(col as float)).
min() | Numeric, string, datetime | Returns the smallest number, the first datetime, or the first string (according to the current collation) from the column.
max() | Numeric, string, datetime | Returns the largest number, the last datetime, or the last string (according to the current collation) from the column.
count[_big](*) | Any data type (row-based) | Performs a simple count of all the rows in the result set, up to 2,147,483,647. The count_big() variation uses the bigint data type and can handle up to 2^63 - 1 rows.
count[_big]([distinct] column) | Any data type (row-based) | Performs a simple count of all the rows with non-null values in the column in the result set, up to 2,147,483,647. The distinct option eliminates duplicate values. Will not count blobs.
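Two behaviors noted in the table, the handling of nulls and the data type returned by avg(), are easy to check with a small sketch. The table variable @Raw is hypothetical and holds only three rows, one of them null:

```sql
-- Hypothetical data for illustration only
DECLARE @Raw TABLE (Amount INT NULL);

INSERT INTO @Raw (Amount)
  VALUES (1), (2), (NULL);

SELECT COUNT(*) AS AllRows,                    -- 3: counts rows, nulls included
       COUNT(Amount) AS NonNullRows,           -- 2: the null Amount is skipped
       SUM(Amount) AS Total,                   -- 3
       AVG(Amount) AS IntAvg,                  -- 1: integer result of 3 / 2
       AVG(CAST(Amount AS FLOAT)) AS FloatAvg  -- 1.5
  FROM @Raw;
```

SQL Server may also report a warning that a null value was eliminated by an aggregate; the results are unaffected.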
This simple aggregate query counts the number of rows in the table and totals the Amount column. In
lieu of returning the actual rows from the RawData table, the query returns the summary row with the
row count and total. Therefore, even though there are 24 rows in the RawData table, the result is a
single row:
SELECT COUNT(*) AS Count,
SUM(Amount) AS [Sum]
FROM RawData;
Result: