Microsoft SQL Server 2008 Study Guide, Part 35


Here, the fourth quarter is included in the result despite the lack of fourth-quarter data for 2009. The GROUP BY ALL includes the fourth quarter because there is data for the fourth quarter of 2008:

SELECT DATEPART(qq, SalesDate) AS [Quarter], Count(*) as Count,

Sum(Amount) as [Sum], Avg(Amount) as [Avg]

FROM RawData WHERE Year(SalesDate) = 2009

GROUP BY ALL DATEPART(qq, SalesDate);


The real problem with the GROUP BY ALL solution is that it depends on data being present in the table but outside the current WHERE clause filter. If fourth-quarter data didn't exist for some year other than 2009, the query would not list the fourth quarter at all.

A better solution to listing all data in a GROUP BY is to left outer join with a known set of complete data. In the following case, the ValueList subquery sets up a list of quarters. The LEFT OUTER JOIN includes all the rows from the ValueList subquery and matches up any rows with values from the aggregate query:

SELECT ValueList.Quarter, Agg.[Count], Agg.[Sum], Agg.[Avg]
FROM (VALUES (1), (2), (3), (4)) AS ValueList (Quarter)
LEFT JOIN (SELECT DATEPART(qq, SalesDate) AS [Quarter],
                  COUNT(*) AS [Count],
                  SUM(Amount) AS [Sum],
                  AVG(Amount) AS [Avg]
           FROM RawData
           WHERE YEAR(SalesDate) = 2009
           GROUP BY DATEPART(qq, SalesDate)) AS Agg
  ON ValueList.Quarter = Agg.Quarter
ORDER BY ValueList.Quarter;


In my testing, the fixed values list solution is slightly faster than the deprecated GROUP BY ALL solution.

Nesting aggregations

Aggregated data is often useful, and it can be even more useful to perform secondary aggregations on aggregated data. For example, an aggregate query can easily SUM() each category and year/quarter within a subquery, but which category has the max value for each year/quarter? An obvious MAX(SUM()) doesn't work because there's not enough information to tell SQL Server how to nest the aggregation groupings.

Solving this problem requires a subquery to create a record set from the first aggregation, and an outer query to perform the second level of aggregation. For example, the following query sums by quarter and category, and then the outer query uses a MAX() to determine which sum is the greatest for each quarter:

Select Y,Q, Max(Total) as MaxSum

FROM ( -- Calculate Sums

SELECT Category, Year(SalesDate) as Y,

DatePart(q,SalesDate) as Q, Sum(Amount) as Total

FROM RawData GROUP BY Category, Year(SalesDate), DatePart(q,SalesDate)

) AS sq

GROUP BY Y,Q

ORDER BY Y,Q;

If it’s easier to read, here’s the same query using common table expressions (CTEs) instead of a derived

table subquery:

WITH sq AS ( -- Calculate Sums
  SELECT Category, YEAR(SalesDate) AS Y,
         DATEPART(q, SalesDate) AS Q, SUM(Amount) AS Total
  FROM RawData
  GROUP BY Category, YEAR(SalesDate), DATEPART(q, SalesDate))
SELECT Y, Q, MAX(Total) AS MaxSum

FROM sq

GROUP BY Y, Q;


Including detail descriptions

While it's nice to report the MAX(SUM()) of 147 for the first quarter of 2006, who wants to manually look up which category matches that sum? The next logical step is to include descriptive information about the aggregate data. To add descriptive information for the detail columns, join with a subquery on the detail values:

SELECT MaxQuery.Y, MaxQuery.Q, AllQuery.Category, MaxQuery.MaxSum AS MaxSum
FROM ( -- Find Max Sum Per Year/Quarter
      SELECT Y, Q, MAX(Total) AS MaxSum
      FROM ( -- Calculate Sums
            SELECT Category, YEAR(SalesDate) AS Y,
                   DATEPART(q, SalesDate) AS Q, SUM(Amount) AS Total
            FROM RawData
            GROUP BY Category, YEAR(SalesDate), DATEPART(q, SalesDate)) AS sq
      GROUP BY Y, Q
     ) AS MaxQuery
INNER JOIN ( -- All Data Query
      SELECT Category, YEAR(SalesDate) AS Y,
             DATEPART(q, SalesDate) AS Q, SUM(Amount) AS Total
      FROM RawData
      GROUP BY Category, YEAR(SalesDate), DATEPART(q, SalesDate)
     ) AS AllQuery
  ON MaxQuery.Y = AllQuery.Y
 AND MaxQuery.Q = AllQuery.Q
 AND MaxQuery.MaxSum = AllQuery.Total
ORDER BY MaxQuery.Y, MaxQuery.Q;


While the query appears complex at first glance, it's actually just an extension of the preceding query, which appears here as the subquery with the table alias of MaxQuery.


The second subquery (with the alias of AllQuery) finds the sum of every category and year/quarter. Joining MaxQuery with AllQuery on the sum and year/quarter is used to locate the category and return the descriptive value along with the detail data.

In this case, the CTE solution really starts to pay off, as the subquery doesn't have to be repeated. The following query is exactly equivalent to the preceding one (same results, same execution plan, same performance), but shorter, easier to understand, and cheaper to maintain:

WITH AllQuery AS
( -- All Data Query
  SELECT Category,
         YEAR(SalesDate) AS Y, DATEPART(qq, SalesDate) AS Q, SUM(Amount) AS Total
  FROM RawData
  GROUP BY Category,
           YEAR(SalesDate), DATEPART(qq, SalesDate))
SELECT MaxQuery.Y, MaxQuery.Q, AllQuery.Category, MaxQuery.MaxSum
FROM ( -- Find Max Sum Per Year/Quarter
      SELECT Y, Q, MAX(Total) AS MaxSum
      FROM AllQuery
      GROUP BY Y, Q ) AS MaxQuery
INNER JOIN AllQuery

ON MaxQuery.Y = AllQuery.Y

AND MaxQuery.Q = AllQuery.Q

AND MaxQuery.MaxSum = AllQuery.Total

ORDER BY MaxQuery.Y, MaxQuery.Q;

Another alternative is to use the ranking functions and the OVER() clause, introduced in SQL Server 2005. The Ranked CTE refers to the AllQuery CTE. The following query produces the same result and is slightly more efficient:

WITH
AllQuery AS
( -- All Data Query
  SELECT Category,
         YEAR(SalesDate) AS Y, DATEPART(qq, SalesDate) AS Q, SUM(Amount) AS Total
  FROM RawData
  GROUP BY Category,
           YEAR(SalesDate), DATEPART(qq, SalesDate)),
Ranked AS
( -- All data ranked after summing
  SELECT Category, Y, Q, Total,
         RANK() OVER (PARTITION BY Y, Q ORDER BY Total DESC) AS rn
  FROM AllQuery)
SELECT Y, Q, Category, Total AS MaxSum
FROM Ranked
WHERE rn = 1  -- keep only the top-ranked category in each year/quarter
ORDER BY Y, Q;

Ranking functions and the OVER() clause are explained in the next chapter, "Windowing and Ranking."

OLAP in the Park

While Reporting Services can easily add subtotals and totals without any extra work by the query, and Analysis Services builds beautiful cubes, such feats of data contortion are not exclusive to OLAP tools. The relational engine can take a lap in that park as well.

The ROLLUP and CUBE extensions to GROUP BY generate OLAP-type summaries of the data with subtotals and totals. The columns to be totaled are defined similarly to how grouping sets can define GROUP BY columns.

The older non-ANSI standard WITH ROLLUP and WITH CUBE are deprecated. The syntax still works for now, but they will be removed from a future version of SQL Server. This section covers only the newer syntax; it's much cleaner and offers more control. I think you'll like it.
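To make the difference between the two syntaxes concrete, here is a minimal side-by-side sketch. It assumes the same RawData table with Category and Amount columns used throughout this section; both statements should return the same category rows plus a grand total row.

```sql
-- Deprecated, non-ANSI form: the option trails the GROUP BY column list.
SELECT Category, SUM(Amount) AS Amount
FROM RawData
GROUP BY Category WITH ROLLUP;

-- Newer form covered in this section: the columns are arguments to ROLLUP().
SELECT Category, SUM(Amount) AS Amount
FROM RawData
GROUP BY ROLLUP(Category);
```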

The ROLLUP and CUBE options generate subtotals and grand totals as separate rows, and supply a null in the GROUP BY column to indicate the grand total. ROLLUP generates subtotal and total rows for the GROUP BY columns. CUBE extends the capabilities by generating subtotal rows for every GROUP BY column. ROLLUP and CUBE queries also automatically generate a grand total row.

A special GROUPING() function returns 1 (true) for a column when the row is a subtotal or grand total row for that column, and 0 otherwise.
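One way to see exactly which rows ROLLUP and CUBE add is to spell each one out as an explicit list of grouping sets. The following is a sketch under the assumption that RawData has Category, Region, and Amount columns; each statement is meant to return the same rows as the ROLLUP or CUBE call named in its comment.

```sql
-- Same rows as GROUP BY ROLLUP(Category, Region):
-- detail rows, one subtotal per category, and one grand total.
SELECT Category, Region, SUM(Amount) AS Amount
FROM RawData
GROUP BY GROUPING SETS ((Category, Region), (Category), ());

-- Same rows as GROUP BY CUBE(Category, Region):
-- everything above plus one subtotal per region.
SELECT Category, Region, SUM(Amount) AS Amount
FROM RawData
GROUP BY GROUPING SETS ((Category, Region), (Category), (Region), ());
```

The empty parentheses are the grand total grouping, and the only difference between the two lists is the extra (Region) set.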

Rollup subtotals

The ROLLUP option, placed after the GROUP BY keywords, instructs SQL Server to generate an additional total row. In this example, the GROUPING() function is used by a CASE expression to convert the total row to something understandable:

SELECT GROUPING(Category) AS 'Grouping',
       Category,
       CASE GROUPING(Category)
         WHEN 0 THEN Category
         WHEN 1 THEN 'All Categories'
       END AS CategoryRollup,
       SUM(Amount) AS Amount
FROM RawData
GROUP BY ROLLUP(Category);

Result:

Grouping  Category  CategoryRollup  Amount
--------  --------  --------------  ------
...
0         Z         Z               215
1         NULL      All Categories  946

The previous example had one column in the GROUP BY ROLLUP(), but just as the GROUP BY can organize by multiple columns, so can the GROUP BY ROLLUP().

The next example builds a more detailed summary of the data, with subtotals for each grouping of category and region:

SELECT

CASE GROUPING(Category)

WHEN 0 THEN Category

WHEN 1 THEN 'All Categories'

END AS Category,

CASE GROUPING(Region)

WHEN 0 THEN Region

WHEN 1 THEN 'All Regions'

END AS Region,

SUM(Amount) AS Amount

FROM RawData

GROUP BY ROLLUP(Category, Region)

Result:

Category        Region       Amount
--------------  -----------  ------
...
All Categories  All Regions  946

But wait, there's more. Multiple columns can be combined into a single grouping level. The following query places Category and Region in parentheses inside the ROLLUP parentheses and thus treats each combination of category and region as a single group:

SELECT
  CASE GROUPING(Category)
    WHEN 0 THEN Category
    WHEN 1 THEN 'All Categories'
  END AS Category,
  CASE GROUPING(Region)
    WHEN 0 THEN Region
    WHEN 1 THEN 'All Regions'
  END AS Region,
  COUNT(*) AS Count
FROM RawData
GROUP BY ROLLUP((Category, Region))


Cube queries

A cube query is the next logical progression beyond a rollup query: It adds subtotals for every possible grouping in a multidimensional manner, just like Analysis Services. Using the same example, the rollup query had subtotals for each category; the cube query has subtotals for each category and each region:

SELECT
  CASE GROUPING(Category)
    WHEN 0 THEN Category
    WHEN 1 THEN 'All Categories'
  END AS Category,
  CASE GROUPING(Region)
    WHEN 0 THEN Region
    WHEN 1 THEN 'All Regions'
  END AS Region,
  SUM(Amount) AS Amount  -- the result rows below are amount totals (they add up to 946), not row counts
FROM RawData R
GROUP BY CUBE(Category, Region)
ORDER BY Coalesce(R.Category, 'ZZZZ'), Coalesce(R.Region, 'ZZZZ')

Result:

Category        Region     Amount
--------------  ---------  ------
...
X               South      165
...
All Categories  MidWest    145
All Categories  NorthEast  236
All Categories  South      485
All Categories  West       80
...

Building Crosstab Queries

Crosstab queries take the power of the previous cube query and give it more impact. Although an aggregate query can GROUP BY multiple columns, the result is still columnar and less than perfect for scanning numbers quickly. The cross-tabulation, or crosstab, query pivots the second GROUP BY column (or dimension) values counterclockwise 90 degrees and turns them into the crosstab columns, as shown in Figure 12-4. The limitation, of course, is that while a columnar GROUP BY query can have multiple aggregate functions, a crosstab query has difficulty displaying more than a single measure.

The term crosstab query describes the result set, not the method of creating the crosstab, because there are multiple programmatic methods for generating a crosstab query, some better than others. The following sections describe ways to create the same result.

Pivot method

Microsoft introduced the PIVOT method for coding crosstab queries with SQL Server 2005. The pivot method deviates from the normal logical query flow by performing the aggregate GROUP BY function and generating the crosstab results as a data source within the FROM clause.

If you think of PIVOT as a table-valued function that's used as a data source, then it accepts two parameters. The first parameter is the aggregate function for the crosstab's values. The second parameter lists the pivoted columns. In the following example, the aggregate function sums the Amount column, and the pivoted columns are the regions. Because PIVOT is part of the FROM clause, the data set needs a named range or table alias:

SELECT Category, MidWest, NorthEast, South, West

FROM RawData

PIVOT

(SUM(Amount)

FOR Region IN (South, NorthEast, MidWest,West)

) AS pt


FIGURE 12-4

Pivoting the second GROUP BY column creates a crosstab query. Here, the previous GROUP BY CUBE query's region values are pivoted to become the crosstab query columns.

[Figure callout: Region values pivot into columns]

Result:

Category  MidWest  NorthEast  South  West
--------  -------  ---------  -----  ----
...
Y         NULL     NULL       72     NULL

The result is not what was expected! This doesn't look at all like the crosstab result shown in Figure 12-4. That's because the PIVOT function used every column provided to it. Because the Amount and Region are specified, it assumed that every remaining column should be used for the GROUP BY, so it grouped by Category and SalesDate. There's no way to explicitly define the GROUP BY for the PIVOT. It uses an implicit GROUP BY.
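To picture the implicit grouping, it can help to write out by hand roughly what PIVOT does when it is handed RawData directly. This is only an illustrative sketch, and it assumes that RawData contains just the Category, Region, SalesDate, and Amount columns; any other column in the table would join the implicit GROUP BY as well.

```sql
-- Hand-written approximation of PIVOT over the whole RawData table.
-- Every column not named in the PIVOT (here Category and SalesDate)
-- ends up in the grouping, so there is one row per category per sales date.
SELECT Category, SalesDate,
       SUM(CASE WHEN Region = 'MidWest'   THEN Amount END) AS MidWest,
       SUM(CASE WHEN Region = 'NorthEast' THEN Amount END) AS NorthEast,
       SUM(CASE WHEN Region = 'South'     THEN Amount END) AS South,
       SUM(CASE WHEN Region = 'West'      THEN Amount END) AS West
FROM RawData
GROUP BY Category, SalesDate;
```

Because the CASE expressions have no ELSE branch, a region with no sales for a given category and date comes back as NULL, which is why a row of mostly NULL values appears in the unexpected result.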

The solution is to use a subquery to select only the columns that should be submitted to the PIVOT command:

SELECT Category, MidWest, NorthEast, South, West

FROM (SELECT Category, Region, Amount

FROM RawData) sq PIVOT

(SUM(Amount)

FOR Region IN (MidWest, NorthEast, South, West)

) AS pt


Now the result looks closer to the crosstab result in Figure 12-4.

Case expression method

The CASE expression method starts with a normal GROUP BY query generating a row for each value in the GROUP BY column. Adding a ROLLUP function to the GROUP BY adds a nice grand totals row to the crosstab.

To generate the crosstab columns, a CASE expression filters the data summed by the aggregate function. For example, if the region is "south," then the SUM() will see the amount value, but if the region isn't "south," then the CASE expression passes a 0 to the SUM() function. It's beautifully simple.

The CASE expression has three clear advantages over the PIVOT method, making it easier both to code and to maintain:

■ The GROUP BY is explicit. There's no guessing which columns will generate the rows.

■ The crosstab columns are defined only once.

■ It's easy to add a grand totals row.
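The description above can be sketched as follows. This is a minimal illustration rather than a definitive listing: it assumes the RawData table and the four region values (MidWest, NorthEast, South, West) used in the PIVOT examples, and the CategoryRollup and Total column names are chosen here purely for illustration.

```sql
-- CASE expression crosstab: one row per category, one column per region.
SELECT CASE GROUPING(Category)
         WHEN 0 THEN Category
         ELSE 'All Categories'          -- label for the grand totals row
       END AS CategoryRollup,
       -- Each column sums Amount only for its own region and 0 otherwise.
       SUM(CASE WHEN Region = 'MidWest'   THEN Amount ELSE 0 END) AS MidWest,
       SUM(CASE WHEN Region = 'NorthEast' THEN Amount ELSE 0 END) AS NorthEast,
       SUM(CASE WHEN Region = 'South'     THEN Amount ELSE 0 END) AS South,
       SUM(CASE WHEN Region = 'West'      THEN Amount ELSE 0 END) AS West,
       SUM(Amount) AS Total             -- row total across all regions
FROM RawData
GROUP BY ROLLUP(Category)               -- explicit grouping plus a grand totals row
ORDER BY GROUPING(Category), Category;  -- grand totals row sorts last
```

Each region is named in exactly one place, and the grouping column is stated outright in the GROUP BY, which is where the advantages listed above come from.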
