Here, the fourth quarter is included in the result despite the lack of data for the fourth quarter for 2009.
The GROUP BY ALL includes the fourth quarter because there is data for the fourth quarter for 2008:
SELECT DATEPART(qq, SalesDate) AS [Quarter], Count(*) as Count,
Sum(Amount) as [Sum], Avg(Amount) as [Avg]
FROM RawData WHERE Year(SalesDate) = 2009
GROUP BY ALL DATEPART(qq, SalesDate);
Result:
The real problem with the GROUP BY ALL solution is that it depends on data being present in the
table but outside the current WHERE clause filter. If fourth-quarter data didn’t exist for some year
other than 2009, the query would not have listed the fourth quarter at all.
A better way to list every group in a GROUP BY is to left outer join from a known, complete set of
values. In the following case, the ValueList subquery sets up a list of quarters. The LEFT OUTER JOIN
includes all the rows from the ValueList subquery and matches them with any rows returned by the
aggregate query:
SELECT ValueList.Quarter, Agg.[Count],
Agg.[Sum], Agg.[Avg]
FROM ( VALUES (1),
(2), (3), (4) ) AS ValueList (Quarter)
LEFT JOIN (SELECT DATEPART(qq, SalesDate) AS [Quarter],
COUNT(*) AS Count, SUM(Amount) AS [Sum], AVG(Amount) AS [Avg]
FROM RawData WHERE YEAR(SalesDate) = 2009 GROUP BY DATEPART(qq, SalesDate)) Agg
ON ValueList.Quarter = Agg.Quarter ORDER BY ValueList.Quarter ;
In my testing, the fixed values list solution is slightly faster than the deprecated GROUP BY ALL solution.
Nesting aggregations
Aggregated data is often useful, and it can be even more useful to perform secondary
aggregations on aggregated data. For example, an aggregate query can easily SUM() each category and
year/quarter within a subquery, but which category has the max value for each year/quarter? An obvious
MAX(SUM()) doesn’t work because there’s not enough information to tell SQL Server how to nest the
aggregation groupings.
Solving this problem requires a subquery to create a record set from the first aggregation, and an outer
query to perform the second level of aggregation. For example, the following query sums by quarter
and category, and then the outer query uses a MAX() to determine which sum is the greatest for each
quarter:
Select Y,Q, Max(Total) as MaxSum
FROM ( -- Calculate Sums
SELECT Category, Year(SalesDate) as Y,
DatePart(q,SalesDate) as Q, Sum(Amount) as Total
FROM RawData GROUP BY Category, Year(SalesDate), DatePart(q,SalesDate)
) AS sq
GROUP BY Y,Q
ORDER BY Y,Q;
If it’s easier to read, here’s the same query using common table expressions (CTEs) instead of a derived
table subquery:
WITH sq AS ( -- Calculate Sums
SELECT Category, YEAR(SalesDate) AS Y,
DATEPART(q, SalesDate) AS Q, SUM(Amount) AS Total
FROM RawData
GROUP BY Category, YEAR(SalesDate), DATEPART(q, SalesDate))
SELECT Y, Q, MAX(Total) AS MaxSum
FROM sq
GROUP BY Y, Q;
Including detail descriptions
While it’s nice to report the MAX(SUM()) of 147 for the first quarter of 2006, who wants to manually
look up which category matches that sum? The next logical step is to include descriptive information
about the aggregate data. To add descriptive information for the detail columns, join with a subquery on
the detail values:
SELECT MaxQuery.Y, MaxQuery.Q, AllQuery.Category, MaxQuery.MaxSum AS MaxSum
FROM ( -- Find Max Sum Per Year/Quarter
SELECT Y, Q, MAX(Total) AS MaxSum
FROM ( -- Calculate Sums
SELECT Category, YEAR(SalesDate) AS Y, DATEPART(q, SalesDate) AS Q, SUM(Amount) AS Total
FROM RawData
GROUP BY Category, YEAR(SalesDate), DATEPART(q, SalesDate)) AS sq
GROUP BY Y, Q
) AS MaxQuery
INNER JOIN ( -- All Data Query
SELECT Category, YEAR(SalesDate) AS Y, DATEPART(q, SalesDate) AS Q, SUM(Amount) AS Total
FROM RawData
GROUP BY Category, YEAR(SalesDate), DATEPART(q, SalesDate)
) AS AllQuery
ON MaxQuery.Y = AllQuery.Y
AND MaxQuery.Q = AllQuery.Q
AND MaxQuery.MaxSum = AllQuery.Total
ORDER BY MaxQuery.Y, MaxQuery.Q;
Result:
While the query appears complex at first glance, it’s actually just an extension of the preceding query
(the derived table with the alias of MaxQuery).
The second subquery (with the alias of AllQuery) finds the sum of every category and year/quarter.
Joining MaxQuery with AllQuery on the sum and year/quarter locates the category and
returns the descriptive value along with the detail data.
In this case, the CTE solution really starts to pay off, as the subquery doesn’t have to be repeated. The
following query is exactly equivalent to the preceding one (same results, same execution plan, same
performance), but shorter, easier to understand, and cheaper to maintain:
WITH AllQuery AS
( -- All Data Query
SELECT Category,
YEAR(SalesDate) AS Y, DATEPART(qq, SalesDate) AS Q, SUM(Amount) AS Total
FROM RawData
GROUP BY Category,
YEAR(SalesDate), DATEPART(qq, SalesDate))
SELECT MaxQuery.Y, MaxQuery.Q, AllQuery.Category, MaxQuery.MaxSum
FROM ( -- Find Max Sum Per Year/Quarter
SELECT Y, Q, MAX(Total) AS MaxSum
FROM AllQuery
GROUP BY Y, Q ) AS MaxQuery
INNER JOIN AllQuery
ON MaxQuery.Y = AllQuery.Y
AND MaxQuery.Q = AllQuery.Q
AND MaxQuery.MaxSum = AllQuery.Total
ORDER BY MaxQuery.Y, MaxQuery.Q;
Another alternative is to use the ranking functions and the OVER() clause, introduced in SQL Server
2005. The Ranked CTE refers to the AllQuery CTE. The following query produces the same result
and is slightly more efficient:
WITH
AllQuery AS
( -- All Data Query
SELECT Category,
YEAR(SalesDate) AS Y, DATEPART(qq, SalesDate) AS Q, SUM(Amount) AS Total
FROM RawData
GROUP BY Category,
YEAR(SalesDate), DATEPART(qq, SalesDate)),
Ranked AS
( -- All data ranked after summing
SELECT Category, Y, Q, Total,
RANK() OVER (PARTITION BY Y, Q ORDER BY Total DESC) AS rn
FROM AllQuery)
SELECT Y, Q, Category, Total AS MaxSum
FROM Ranked
WHERE rn = 1 -- keep only the top-ranked category in each year/quarter
ORDER BY Y, Q;
Ranking functions and the OVER() clause are explained in the next chapter, "Windowing and Ranking."
OLAP in the Park
While Reporting Services can easily add subtotals and totals without any extra work by the query, and
Analysis Services builds beautiful cubes, such feats of data contortion are not exclusive to OLAP tools.
The relational engine can take a lap in that park as well.
The ROLLUP and CUBE extensions to GROUP BY generate OLAP-type summaries of the data with
subtotals and totals. The columns to be totaled are defined similarly to how grouping sets can define GROUP
BY columns.
The older non-ANSI standard WITH ROLLUP and WITH CUBE are deprecated. The syntax still works for now, but they will be removed from a future version of SQL Server. This section covers only the newer syntax, which is much cleaner and offers more control. I think you’ll like it.
ROLLUP and CUBE generate subtotals and grand totals as separate rows, and
supply a null in the rolled-up GROUP BY column to mark each subtotal or grand total row. ROLLUP generates
subtotal rows for the GROUP BY columns hierarchically, from left to right. CUBE extends the capabilities by
generating subtotal rows for every combination of the GROUP BY columns. ROLLUP and CUBE queries also
automatically generate a grand total row.
A special GROUPING() function returns 1 when the row is a subtotal or grand total row for the column passed to it, and 0 otherwise.
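One way to see exactly which rows each extension adds is to spell it out as grouping sets. The following is a minimal sketch, not one of the chapter's own listings: the Demo CTE and its four rows are hypothetical stand-ins for RawData, kept inline so the statement can run on its own (it assumes SQL Server 2008 or later).

```sql
-- Demo is a hypothetical stand-in for RawData; the rows are made up.
WITH Demo (Category, Region, Amount) AS
  (SELECT Category, Region, Amount
     FROM (VALUES ('X', 'South', 10),
                  ('X', 'West',  20),
                  ('Y', 'South', 30),
                  ('Y', 'West',  40)) AS v (Category, Region, Amount))
SELECT Category, Region, SUM(Amount) AS Amount,
       GROUPING(Category) AS IsCategoryTotal,  -- 1 when Category is rolled up
       GROUPING(Region)   AS IsRegionTotal     -- 1 when Region is rolled up
  FROM Demo
  GROUP BY GROUPING SETS (
      (Category, Region),   -- detail rows
      (Category),           -- subtotal per category
      ()                    -- grand total
  )                         -- same rows as GROUP BY ROLLUP(Category, Region)
  ORDER BY GROUPING(Category), Category, GROUPING(Region), Region;
```

Adding a fourth set, (Region), to the list gives the rows that GROUP BY CUBE(Category, Region) would produce.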
Rollup subtotals
The ROLLUP option, placed in the GROUP BY clause, instructs SQL Server to generate an additional
total row. In this example, the GROUPING() function is used by a CASE expression to convert the total
row to something understandable:
SELECT GROUPING(Category) AS 'Grouping', Category,
CASE GROUPING(Category) WHEN 0 THEN Category WHEN 1 THEN 'All Categories'
END AS CategoryRollup, SUM(Amount) AS Amount
FROM RawData
GROUP BY ROLLUP(Category);
Result:
Grouping Category CategoryRollup Amount
-------- -------- -------------- ------
...
0        Z        Z              215
1        NULL     All Categories 946
The previous example had one column in the GROUP BY ROLLUP(), but just as the GROUP BY can
organize by multiple columns, so can the GROUP BY ROLLUP().
The next example builds a more detailed summary of the data, with subtotals for each grouping of
category and region:
SELECT
CASE GROUPING(Category)
WHEN 0 THEN Category
WHEN 1 THEN 'All Categories'
END AS Category,
CASE GROUPING(Region)
WHEN 0 THEN Region
WHEN 1 THEN 'All Regions'
END AS Region,
SUM(Amount) AS Amount
FROM RawData
GROUP BY ROLLUP(Category, Region)
Result:
...
All Categories All Regions 946
But wait, there’s more. Multiple columns can be combined into a single grouping level. The following
query places Category and Region in parentheses inside the ROLLUP parentheses and thus treats
each combination of category and region as a single group:
SELECT
CASE GROUPING(Category)
WHEN 0 THEN Category
WHEN 1 THEN 'All Categories'
END AS Category,
CASE GROUPING(Region)
WHEN 0 THEN Region
WHEN 1 THEN 'All Regions'
END AS Region,
SUM(Amount) AS Amount
FROM RawData
GROUP BY ROLLUP((Category, Region))
Result:
...
All Categories All Regions 946
Cube queries
A cube query is the next logical progression beyond a rollup query: It adds subtotals for every possible
grouping in a multidimensional manner, just like Analysis Services. Using the same example, the
rollup query had subtotals for each category; the cube query has subtotals for each category and each
region:
SELECT
CASE GROUPING(Category) WHEN 0 THEN Category WHEN 1 THEN 'All Categories'
END AS Category,
CASE GROUPING(Region) WHEN 0 THEN Region WHEN 1 THEN 'All Regions'
END AS Region,
SUM(Amount) AS Amount
FROM RawData R
GROUP BY CUBE(Category, Region)
ORDER BY COALESCE(R.Category, 'ZZZZ'), COALESCE(R.Region, 'ZZZZ')
Result:
...
X South 165
All Categories MidWest 145
All Categories NorthEast 236
All Categories South 485
All Categories West 80
All Categories All Regions 946
Building Crosstab Queries
Crosstab queries take the power of the previous cube query and give it more impact. Although an
aggregate query can GROUP BY multiple columns, the result is still columnar and less than perfect for
scanning numbers quickly. The cross-tabulation, or crosstab, query pivots the second GROUP BY column
(or dimension) values counterclockwise 90 degrees and turns them into the crosstab columns, as shown
in Figure 12-4. The limitation, of course, is that while a columnar GROUP BY query can have multiple
aggregate functions, a crosstab query has difficulty displaying more than a single measure.
The term crosstab query describes the result set, not the method of creating the crosstab, because there
are multiple programmatic methods for generating a crosstab query, some better than others. The
following sections describe ways to create the same result.
Pivot method
Microsoft introduced the PIVOT method for coding crosstab queries with SQL Server 2005. The pivot
method deviates from the normal logical query flow by performing the aggregate GROUP BY function
and generating the crosstab results as a data source within the FROM clause.
If you think of PIVOT as a table-valued function that’s used as a data source, then it accepts two
parameters. The first parameter is the aggregate function for the crosstab’s values. The second
parameter lists the pivoted columns. In the following example, the aggregate function sums the Amount
column, and the pivoted columns are the regions. Because PIVOT is part of the FROM clause, the data
set needs a named range or table alias:
SELECT Category, MidWest, NorthEast, South, West
FROM RawData
PIVOT
(SUM(Amount)
FOR Region IN (South, NorthEast, MidWest,West)
) AS pt
FIGURE 12-4
Pivoting the second GROUP BY column creates a crosstab query. Here, the previous GROUP
BY cube query’s region values are pivoted to become the crosstab query columns.
Result:
...
Y NULL NULL 72 NULL
The result is not what was expected! This doesn’t look at all like the crosstab result shown in
Figure 12-4. That’s because the PIVOT function used every column provided to it. Because Amount
and Region are specified, it assumed that every remaining column should be used for the GROUP BY,
so it grouped by Category and SalesDate. There’s no way to explicitly define the GROUP BY for the
PIVOT. It uses an implicit GROUP BY.
The solution is to use a subquery to select only the columns that should be submitted to the PIVOT
command:
SELECT Category, MidWest, NorthEast, South, West
FROM (SELECT Category, Region, Amount
FROM RawData) sq
PIVOT
(SUM(Amount)
FOR Region IN (MidWest, NorthEast, South, West)
) AS pt
Result:
Now the result looks closer to the crosstab result in Figure 12-4.
Case expression method
The CASE expression method starts with a normal GROUP BY query generating a row for each value
in the GROUP BY column. Adding a ROLLUP function to the GROUP BY adds a nice grand totals row to
the crosstab.
To generate the crosstab columns, a CASE expression filters the data summed by the aggregate
function. For example, if the region is "south," then the SUM() will see the amount value, but if the
region isn’t "south," then the CASE expression passes a 0 to the SUM() function. It’s beautifully
simple.
The CASE expression has three clear advantages over the PIVOT method, making it easier both to code
and to maintain:
■ The GROUP BY is explicit. There’s no guessing which columns will generate the rows.
■ The crosstab columns are defined only once.
■ It’s easy to add a grand totals row.
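The pattern described above can be sketched as follows. This is an illustration under stated assumptions rather than the chapter's own listing: the Demo CTE and its rows are hypothetical stand-ins for RawData, and only two region columns are shown to keep it short.

```sql
-- Demo is a hypothetical stand-in for RawData; the rows are made up.
WITH Demo (Category, Region, Amount) AS
  (SELECT Category, Region, Amount
     FROM (VALUES ('X', 'South', 10),
                  ('X', 'West',  20),
                  ('Y', 'South', 30),
                  ('Y', 'West',  40)) AS v (Category, Region, Amount))
SELECT CASE GROUPING(Category)
         WHEN 0 THEN Category
         WHEN 1 THEN 'All Categories'
       END AS Category,
       -- Each crosstab column sums only the rows for its own region;
       -- rows from other regions contribute 0.
       SUM(CASE WHEN Region = 'South' THEN Amount ELSE 0 END) AS South,
       SUM(CASE WHEN Region = 'West'  THEN Amount ELSE 0 END) AS West,
       SUM(Amount) AS Total
  FROM Demo
  GROUP BY ROLLUP(Category)   -- explicit grouping plus a grand total row
  ORDER BY GROUPING(Category), Category;
```

Adding a region means adding one more SUM(CASE ...) line to the select list; the GROUP BY does not change.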