The CASE expression method’s query execution plan is identical to the plan generated by the PIVOT
method:
SELECT CASE GROUPING(Category)
         WHEN 0 THEN Category
         WHEN 1 THEN 'All Categories'
       END AS Category,
       SUM(CASE WHEN Region = 'MidWest' THEN Amount ELSE 0 END) AS MidWest,
       SUM(CASE WHEN Region = 'NorthEast' THEN Amount ELSE 0 END) AS NorthEast,
       SUM(CASE WHEN Region = 'South' THEN Amount ELSE 0 END) AS South,
       SUM(CASE WHEN Region = 'West' THEN Amount ELSE 0 END) AS West,
       SUM(Amount) AS Total
  FROM RawData
  GROUP BY ROLLUP (Category)
  ORDER BY COALESCE(Category, 'ZZZZ');
Result:
Category  MidWest  NorthEast  South  West  Total
--------  -------  ---------  -----  ----  -----
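To make the CASE technique easy to verify by hand, here is a minimal sketch that runs against a hypothetical three-row table variable rather than the chapter's RawData table. The @RawData name and its values are invented for illustration only. It also shows one behavior worth knowing: because of the ELSE 0, a category with no rows for a region reports 0, where PIVOT would report NULL.

```sql
-- Illustration only: @RawData and its rows are made up for this sketch.
DECLARE @RawData TABLE (
  Category CHAR(1)     NOT NULL,
  Region   VARCHAR(10) NOT NULL,
  Amount   INT         NOT NULL
);

INSERT INTO @RawData (Category, Region, Amount)
VALUES ('X', 'South', 10),
       ('X', 'West',   5),
       ('Y', 'South', 20);

-- One SUM(CASE ...) per crosstab column; rows that do not match the
-- region contribute 0 to that column.
SELECT Category,
       SUM(CASE WHEN Region = 'South' THEN Amount ELSE 0 END) AS South,
       SUM(CASE WHEN Region = 'West'  THEN Amount ELSE 0 END) AS West,
       SUM(Amount) AS Total
  FROM @RawData
  GROUP BY Category
  ORDER BY Category;

-- Rows returned:
--   X  10  5  15
--   Y  20  0  20
```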
Dynamic crosstab queries
The rows of a crosstab query are generated dynamically by the aggregation at runtime;
however, in both the PIVOT method and the CASE expression method, the crosstab columns (region
in this example) must be hard-coded in the SQL statement.
The only way to create a crosstab query with dynamic columns is to determine the columns at execution
time and assemble a dynamic SQL command to execute the crosstab query. While it could be done with
a cursor, the following example uses a multiple-assignment variable SELECT to build the list of regions
in the @SQLStr variable. A little string manipulation to assemble the pivot statement and an sp_executesql
command complete the job:
DECLARE @SQLStr NVARCHAR(1024);
-- build a comma-delimited list of the distinct regions
SELECT @SQLStr = COALESCE(@SQLStr + ',', '') + [a].[Column]
  FROM (SELECT DISTINCT Region AS [Column]
          FROM RawData) AS a;
-- wrap the region list in the PIVOT statement
SET @SQLStr = 'SELECT Category, ' + @SQLStr
  + ' FROM (Select Category, Region, Amount from RawData) sq '
  + ' PIVOT (Sum (Amount) FOR Region IN ('
  + @SQLStr + ')) AS pt';
PRINT @SQLStr;
EXEC sp_executesql @SQLStr;
Result:
SELECT Category, MidWest,NorthEast,South,West FROM (Select
Category, Region, Amount from RawData) sq PIVOT (Sum (Amount)
FOR Region IN (MidWest,NorthEast,South,West)) AS pt
Category  MidWest  NorthEast  South  West
--------  -------  ---------  -----  ----
This example is only to demonstrate the technique for building a dynamic crosstab query.
Anytime you’re working with dynamic SQL, be sure to guard against SQL injection, which
is discussed in Chapter 29, “Dynamic SQL and Code Generation.”
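As one possible hardening step, the region names can be passed through QUOTENAME() before they are concatenated into the statement. The following is a sketch only, not the chapter's own listing: it assumes the same RawData table, and the @Cols variable is a name introduced here for illustration. QUOTENAME() wraps each value in square brackets and escapes any closing bracket inside it, so a region value cannot terminate the identifier early.

```sql
-- Sketch only: assumes the chapter's RawData (Category, Region, Amount) table.
-- @Cols is a hypothetical helper variable, not part of the original example.
DECLARE @Cols   NVARCHAR(MAX),
        @SQLStr NVARCHAR(MAX);

-- Build a delimited column list such as [MidWest],[NorthEast],...
-- NULL regions are skipped because concatenating NULL would reset the list.
SELECT @Cols = COALESCE(@Cols + N',', N'') + QUOTENAME(a.Region)
  FROM (SELECT DISTINCT Region
          FROM RawData
          WHERE Region IS NOT NULL) AS a;

-- Assemble the PIVOT statement around the quoted column list.
SET @SQLStr = N'SELECT Category, ' + @Cols
  + N' FROM (SELECT Category, Region, Amount FROM RawData) AS sq'
  + N' PIVOT (SUM(Amount) FOR Region IN (' + @Cols + N')) AS pt;';

PRINT @SQLStr;
EXEC sp_executesql @SQLStr;
```

Quoting the identifiers also lets the crosstab cope with region names that contain spaces or reserved words, which the unquoted version would reject.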
An Analysis Services cube is basically a dynamic crosstab query on steroids. For more
about designing these high-performance interactive cubes, turn to Chapter 71, “Building
Multidimensional Cubes with Analysis Services.”
Unpivot
The inverse of a crosstab query is the UNPIVOT command, which is extremely useful for normalizing
denormalized data. Starting with a table that looks like the result of a crosstab, the UNPIVOT command
will twist the data back to a normalized list. Of course, UNPIVOT can only normalize the data
supplied to it, so if the pivoted data is an aggregate summary, that’s all that will be normalized. The details
that created the aggregate summary won’t magically reappear.
The following script sets up a table populated with crosstab data:
IF OBJECT_ID('PTable') IS NOT NULL
  DROP TABLE PTable;
go
SELECT Category, MidWest, NorthEast, South, West
  INTO PTable
  FROM (SELECT Category, MidWest, NorthEast, South, West
          FROM (SELECT Category, Region, Amount
                  FROM RawData) sq
          PIVOT
            (SUM(Amount) FOR Region IN (MidWest, NorthEast, South, West)) AS pt
       ) AS Q;
SELECT * FROM PTable;
Result:
Category  MidWest  NorthEast  South  West
--------  -------  ---------  -----  ----
The UNPIVOT command can now pick apart the PTable data and convert it back into a normalized
form:
SELECT *
  FROM PTable
  UNPIVOT (Measure FOR Region IN (South, NorthEast, MidWest, West)) AS sq;
Result:
Category  Measure  Region
--------  -------  ------
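The following minimal sketch shows UNPIVOT on a hypothetical two-row crosstab held in a table variable, so the output can be checked by hand. The @Crosstab name and its values are invented for illustration. It also demonstrates a detail that is easy to miss: UNPIVOT leaves out crosstab cells that are NULL, so the normalized list can have fewer rows than rows times columns.

```sql
-- Illustration only: @Crosstab and its values are made up for this sketch.
-- The unpivoted columns must share one data type (INT here).
DECLARE @Crosstab TABLE (
  Category CHAR(1) NOT NULL,
  MidWest  INT     NULL,
  South    INT     NULL
);

INSERT INTO @Crosstab (Category, MidWest, South)
VALUES ('X', 10, 20),
       ('Y', 30, NULL);

-- Each non-NULL cell becomes one (Category, Region, Measure) row.
SELECT Category, Region, Measure
  FROM @Crosstab
  UNPIVOT (Measure FOR Region IN (MidWest, South)) AS up
  ORDER BY Category, Region;

-- Rows returned (the NULL cell for Y/South is omitted):
--   X  MidWest  10
--   X  South    20
--   Y  MidWest  30
```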
Cumulative Totals (Running Sums)
There are numerous reasons for calculating cumulative totals, or running sums, in a database, such as
account balances and inventory quantity on hand, to name only two. Of course, it’s easy to just pump
the data to a reporting tool and let the report control calculate the running sum, but those calculations
are then lost. It’s much better to calculate the cumulative total in the database and then report from
consistent numbers.
Cumulative totals are one area that defies the norm for SQL. As a rule, SQL excels at working with sets,
but calculating a cumulative total for a set of data is based on comparing individual rows, so an iterative
row-based cursor solution performs much better than a set-based operation.
Correlated subquery solution
First, here’s the set-based solution. For every row in the outer query, the correlated subquery sums
every row from the first row up to that row. The first row sums from the first row to the first row. The
second row sums from the first row to the second row. The third row sums from the first row to the third
row, and so on until the hundred-thousandth row sums from the first row to the hundred-thousandth row.
For a small set this solution works well enough, but as the data set grows, the correlated subquery
method slows down much faster than the row count rises (the total work grows roughly with the square of
the number of rows), which is why whenever someone is blogging about this cool
solution, the sample code tends to have a TOP(100) in the SELECT:
USE AdventureWorks2008;
SET NOCOUNT ON;
SELECT OuterQuery.SalesOrderID, OuterQuery.TotalDue,
       (SELECT SUM(InnerQuery.TotalDue)
          FROM Sales.SalesOrderHeader AS InnerQuery
          WHERE InnerQuery.SalesOrderID
                  <= OuterQuery.SalesOrderID) AS CT
  FROM Sales.SalesOrderHeader AS OuterQuery
  ORDER BY OuterQuery.SalesOrderID;
On Maui (my Dell 6400 notebook), the best time achieved for that query was 2 minutes, 19 seconds to
process 31,465 rows. Youch!
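To see the mechanics on a scale small enough to check by hand, here is a minimal sketch of the same correlated subquery pattern against a hypothetical three-row table variable. The @Orders name and its values are invented for illustration and are not part of AdventureWorks2008.

```sql
-- Illustration only: @Orders and its rows are made up for this sketch.
DECLARE @Orders TABLE (
  OrderID  INT   NOT NULL PRIMARY KEY,
  TotalDue MONEY NOT NULL
);

INSERT INTO @Orders (OrderID, TotalDue)
VALUES (1, 10.00),
       (2, 25.00),
       (3,  5.00);

-- For each outer row, the subquery re-sums every row up to and
-- including that row: 1 row, then 2 rows, then 3 rows.
SELECT o.OrderID, o.TotalDue,
       (SELECT SUM(i.TotalDue)
          FROM @Orders AS i
          WHERE i.OrderID <= o.OrderID) AS CT
  FROM @Orders AS o
  ORDER BY o.OrderID;

-- CT column: 10.00, 35.00, 40.00
```

With three rows the subquery touches 1 + 2 + 3 = 6 rows in total; that repeated re-summing is what makes the approach so slow on large tables.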
T-SQL cursor solution
With this solution, the cursor fetches the next row, does a quick add, and updates the value in the row.
It is therefore doing more work than the previous SELECT, because it writes the cumulative total value
back to the table.
The first couple of statements add a CumulativeTotal column and make sure the table isn’t
fragmented. From there, the cursor runs through the update:
USE AdventureWorks;
SET NOCOUNT ON;
ALTER TABLE Sales.SalesOrderHeader
  ADD CumulativeTotal MONEY NOT NULL
    CONSTRAINT dfSalesOrderHeader DEFAULT(0);
ALTER INDEX ALL ON Sales.SalesOrderHeader
  REBUILD WITH (FILLFACTOR = 100, SORT_IN_TEMPDB = ON);
go
-- new batch, so the statements below compile against the new column
DECLARE
  @SalesOrderID INT,
  @TotalDue MONEY,
  @CumulativeTotal MONEY = 0;
DECLARE cRun CURSOR STATIC FOR
  SELECT SalesOrderID, TotalDue
    FROM Sales.SalesOrderHeader
    ORDER BY SalesOrderID;
OPEN cRun;
-- prime the cursor
FETCH cRun INTO @SalesOrderID, @TotalDue;
WHILE @@FETCH_STATUS = 0
  BEGIN
    SET @CumulativeTotal += @TotalDue;
    UPDATE Sales.SalesOrderHeader
      SET CumulativeTotal = @CumulativeTotal
      WHERE SalesOrderID = @SalesOrderID;
    -- fetch next
    FETCH cRun INTO @SalesOrderID, @TotalDue;
  END;
CLOSE cRun;
DEALLOCATE cRun;
go
SELECT SalesOrderID, TotalDue, CumulativeTotal
  FROM Sales.SalesOrderHeader
  ORDER BY OrderDate, SalesOrderID;
go
ALTER TABLE Sales.SalesOrderHeader DROP CONSTRAINT dfSalesOrderHeader;
ALTER TABLE Sales.SalesOrderHeader DROP COLUMN CumulativeTotal;
The T-SQL cursor, even with the additional update work, pwned the set-based solution with an
execution time of 15 seconds! w00t! That’s nearly an order of magnitude faster. Go cursor!
Multiple assignment variable solution
Another solution was posted on my blog (http://tinyurl.com/ajs3tr) in response to a screencast
I did on cumulative totals and cursors.
The multiple assignment variable accumulates data in a variable iteratively during a set-based operation.
It’s fast: the following multiple assignment variable solves the cumulative total problem in about one
second:
DECLARE @CumulativeTotal MONEY = 0;
UPDATE Sales.SalesOrderHeader
  SET @CumulativeTotal = CumulativeTotal
                       = @CumulativeTotal + ISNULL(TotalDue, 0);
With SQL Server 2008, the multiple assignment variable seems to respect the ORDER BY clause, so I’m
cautiously optimistic about using this solution. However, it’s not documented or supported by Microsoft,
so if the order is critical, and it certainly is to a cumulative totals problem, then I recommend the T-SQL
cursor solution. If you do choose the multiple assignment variable solution, be sure to test it thoroughly
with every new service pack.
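For experimenting with the technique without altering AdventureWorks, here is a minimal sketch against a hypothetical temporary table. The #Orders name, its columns, and its values are invented for illustration. The clustered primary key is included on the assumption that the update tends to follow the clustered index; as noted above, that ordering is undocumented, so the expected totals in the comments are what an in-order pass would produce, not a guarantee.

```sql
-- Illustration only: #Orders and its rows are made up for this sketch.
CREATE TABLE #Orders (
  OrderID         INT   NOT NULL PRIMARY KEY CLUSTERED,
  TotalDue        MONEY NULL,
  CumulativeTotal MONEY NOT NULL DEFAULT (0)
);

INSERT INTO #Orders (OrderID, TotalDue)
VALUES (1, 10.00),
       (2, 25.00),
       (3, NULL),
       (4,  5.00);

DECLARE @CumulativeTotal MONEY = 0;

-- For each row, add TotalDue (NULL counts as 0) to the variable and
-- write the new running value into the row in the same assignment.
UPDATE #Orders
  SET @CumulativeTotal = CumulativeTotal
                       = @CumulativeTotal + ISNULL(TotalDue, 0);

-- If the rows were processed in OrderID order, CumulativeTotal would
-- read 10, 35, 35, 40. Verify this rather than assume it.
SELECT OrderID, TotalDue, CumulativeTotal
  FROM #Orders
  ORDER BY OrderID;

DROP TABLE #Orders;
```

Comparing the result against a cursor or correlated subquery run on the same data is a reasonable way to test the ordering after each service pack.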
Summary
SQL Server excels in aggregate functions, with the proverbial rich suite of features, and it is very
capable of calculating sums and aggregates to suit nearly any need. From the simple COUNT() aggregate
function to the complex dynamic crosstab query and the new PIVOT command, these query methods
enable you to create powerful data analysis queries for impressive reports. The most important points to
remember about aggregation are as follows:
■ Aggregate queries generate a single summary row, so every column has to be an aggregate
function.
■ There’s no performance difference between COUNT(*) and COUNT(pk).
■ Aggregate functions, such as COUNT(column) and AVG(column), ignore nulls, which can be
a good thing, and a reason why nulls make life easier for the database developer.
■ GROUP BY queries divide the data source into several segmented data sets and then generate a
summary row for each group. For GROUP BY queries, the GROUP BY columns can and should
be in the column list.
■ In the logical flow of the query, the GROUP BY occurs after the FROM clause and the WHERE
clause, so when coding the query, get the data properly selected and then add the GROUP BY.
■ Complex aggregations (e.g., nested aggregations) often require CTEs or subqueries. Design
the query from the inside out; that is, design the aggregate subquery first and then add the
outer query.
■ GROUP BY’s ROLLUP and CUBE options have a new syntax, and they can be as powerful as
Analysis Services cubes.
■ There are several ways to code a crosstab query. I recommend using a GROUP BY and CASE
expressions, rather than the PIVOT syntax.
■ Dynamic crosstabs are possible only with dynamic SQL.
■ Calculating cumulative totals (running sums) is one of the few problems best solved by
a cursor.
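The point about nulls is easy to confirm with a small experiment. The following sketch uses a hypothetical three-row table variable (the @T name and its values are invented for illustration) to compare COUNT(*), COUNT(column), and AVG(column) when one value is NULL.

```sql
-- Illustration only: @T and its rows are made up for this sketch.
DECLARE @T TABLE (
  ID     INT NOT NULL PRIMARY KEY,
  Amount INT NULL
);

INSERT INTO @T (ID, Amount)
VALUES (1, 10),
       (2, NULL),
       (3, 30);

-- COUNT(*) counts rows; COUNT(Amount) and AVG(Amount) skip the NULL.
SELECT COUNT(*)      AS AllRows,          -- 3
       COUNT(Amount) AS NonNullAmounts,   -- 2
       AVG(Amount)   AS AvgAmount         -- 20, i.e. (10 + 30) / 2
  FROM @T;
```

Had the NULL been stored as 0 instead, the average would have been 13 (integer division of 40 by 3), which is why the null-skipping behavior matters.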
The next chapter continues working with summary data using the windowing and ranking technology of
the OVER() clause.
Windowing and Ranking
IN THIS CHAPTER
Creating an independent sort of the result set
Grouping result sets
Calculating ranks, row numbers, and ntiles
Have you ever noticed the hidden arrow in the FedEx logo? Once you
know that it’s there, it’s obvious, but in an informal poll of FedEx
drivers, not one of them was aware of the arrow. Sometimes, just seeing
things from a different perspective can help clarify the picture.
That’s what SQL’s windowing and ranking does: the windowing (using the
OVER() clause) provides a new perspective on the data. The ranking functions
then use that perspective to provide additional ways to manipulate the query
results.
Windowing and ranking are similar to the last chapter’s aggregate queries, but
they belong in their own chapter because they work with an independent sort
order separate from the query’s ORDER BY clause, and should be thought of as
a different technology than traditional aggregate queries.
Windowing
Before the ranking functions can be applied to the query, the window must be
established. Even though the SQL query syntax places these two steps together,
logically it’s easier to think through the window and then add the ranking
function.
Referring back to the logical sequence of the query in Chapter 8, “Introducing
Basic Query Flow,” the OVER() clause occurs in the latter half of the logical flow
of the query in step 6 after the column expressions and ORDER BY but before any
verbs (OUTPUT, INSERT, UPDATE, DELETE, or UNION).
What’s New with Windowing and Ranking?
The functionality was introduced in SQL Server 2005, and I had hoped it would be expanded for 2008.
Windowing and ranking hold so much potential, and there’s much more functionality in the ANSI SQL
specification, but unfortunately, there’s nothing new with windowing and ranking in SQL Server 2008.
All the examples in this chapter use the AdventureWorks2008 sample database.
The OVER() clause
The OVER() clause creates a new window on the data (think of it as a new perspective, or
independent ordering, of the rows), which may or may not be the same as the sort order of the ORDER
BY clause. In a way, the windowing capability creates an alternate flow to the query with its own sort
order and ranking functions, as illustrated in Figure 13-1. The results of the windowing and ranking are
passed back into the query before the ORDER BY clause.
FIGURE 13-1
The windowing and ranking functions can be thought of as a parallel query process with an
independent sort order.
[Diagram: Windowing and Ranking Query Flow. Boxes: Data Source(s), From, Where, Col(s)/Expr(s), Order By, Predicate; parallel branch: Windowing Sort, Ranking Functions]
The basic syntax is OVER(ORDER BY columns). The columns may be any available column or
expression, just like the ORDER BY clause; but unlike the ORDER BY clause, the OVER() clause won’t
accept a column ordinal position, e.g., 1, 2. Also, like the ORDER BY clause, it can be ascending (ASC),
the default, or descending (DESC); and it can be sorted by multiple columns.
The window’s sort order will take advantage of indexes and can be very fast, even if the sort order is
different from the main query’s sort order.
In the following query, the OVER() clause creates a separate view to the data sorted by OrderDate
(ignore the ROW_NUMBER() function for now):
USE AdventureWorks2008;
SELECT ROW_NUMBER() OVER(ORDER BY OrderDate) as RowNumber,
SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE SalesPersonID = 280
ORDER BY RowNumber;
Result (abbreviated, and note that OrderDate does not include time information, so the results might
vary within a given date):
RowNumber  SalesOrderID  OrderDate
---------  ------------  ---------
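As a variation on the query above, the following sketch sorts the window by two columns, one of them descending, while the query itself is returned in a different order. It assumes the same AdventureWorks2008 table and salesperson filter. Adding SalesOrderID as a second window sort column is a suggestion made here, not part of the original example; it breaks ties between orders that share an OrderDate, so the row numbers come out the same on every run.

```sql
-- Sketch only: same table and filter as the query above.
USE AdventureWorks2008;

SELECT ROW_NUMBER()
         OVER(ORDER BY OrderDate DESC,  -- newest orders numbered first
                       SalesOrderID)    -- tie-breaker within a date
         AS RowNumber,
       SalesOrderID, OrderDate
  FROM Sales.SalesOrderHeader
  WHERE SalesPersonID = 280
  ORDER BY SalesOrderID;                -- query sort differs from window sort
```

Because the result is ordered by SalesOrderID while the numbering follows OrderDate descending, the RowNumber column will generally not appear in sequence, which is the independent sort order at work.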
Partitioning within the window
The OVER() clause normally creates a single sort order, but it can divide the windowed data into
partitions, which are similar to groups in an aggregate GROUP BY query. This is dramatically powerful
because the ranking functions will be able to restart with every partition.
The next query example uses the OVER() clause to create a sort order of the query results by
OrderDate, and then partition the data by YEAR() and MONTH(). Notice that the syntax is the
opposite of the logical flow: the PARTITION BY goes before the ORDER BY within the OVER()
clause:
SELECT ROW_NUMBER()
OVER(Partition By
Year(OrderDate), Month(OrderDate)
ORDER BY OrderDate) as RowNumber, SalesOrderID, OrderDate
FROM Sales.SalesOrderHeader
WHERE SalesPersonID = 280
ORDER BY OrderDate;
Result (abbreviated):
RowNumber  SalesOrderID  OrderDate
---------  ------------  ---------
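To watch the row numbers restart with each partition, here is a minimal sketch against a hypothetical five-row table variable. The @Orders name and its dates are invented for illustration and are not AdventureWorks2008 data. OrderID is added to the window sort as a tie-breaker, because two of the sample orders share a date.

```sql
-- Illustration only: @Orders and its rows are made up for this sketch.
DECLARE @Orders TABLE (
  OrderID   INT  NOT NULL PRIMARY KEY,
  OrderDate DATE NOT NULL
);

INSERT INTO @Orders (OrderID, OrderDate)
VALUES (1, '20080105'),
       (2, '20080120'),
       (3, '20080203'),
       (4, '20080203'),
       (5, '20080215');

-- Numbering restarts at 1 for every year/month partition.
SELECT ROW_NUMBER()
         OVER(PARTITION BY YEAR(OrderDate), MONTH(OrderDate)
              ORDER BY OrderDate, OrderID) AS RowNumber,
       OrderID, OrderDate
  FROM @Orders
  ORDER BY OrderDate, OrderID;

-- RowNumber column: 1, 2 (January), then 1, 2, 3 (February)
```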