Microsoft SQL Server 2008 Study Guide, Part 36



The CASE expression method's query execution plan is identical to the plan generated by the PIVOT method:

SELECT CASE GROUPING(Category)
         WHEN 0 THEN Category
         WHEN 1 THEN 'All Categories'
       END AS Category,
  SUM(CASE WHEN Region = 'MidWest' THEN Amount ELSE 0 END) AS MidWest,
  SUM(CASE WHEN Region = 'NorthEast' THEN Amount ELSE 0 END) AS NorthEast,
  SUM(CASE WHEN Region = 'South' THEN Amount ELSE 0 END) AS South,
  SUM(CASE WHEN Region = 'West' THEN Amount ELSE 0 END) AS West,
  SUM(Amount) AS Total
FROM RawData
GROUP BY ROLLUP (Category)
ORDER BY COALESCE(Category, 'ZZZZ');

Result:

Category MidWest NorthEast South West Total
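The query above depends on the chapter's RawData table. To check the CASE method and the ROLLUP grand-total row by hand, here is a minimal sketch against a hypothetical temporary table, #RawData, holding four made-up rows and only three regions; the table names #RawData and #Crosstab and all values are assumptions for illustration.

```sql
-- Hypothetical sample data; the chapter's real RawData table is built earlier in the chapter.
IF OBJECT_ID('tempdb..#RawData') IS NOT NULL DROP TABLE #RawData;
CREATE TABLE #RawData (
    Category VARCHAR(20) NOT NULL,
    Region   VARCHAR(20) NOT NULL,
    Amount   INT         NOT NULL
);
INSERT INTO #RawData (Category, Region, Amount)
VALUES ('X', 'MidWest', 10), ('X', 'South', 5),
       ('Y', 'West', 7),     ('Y', 'MidWest', 3);

IF OBJECT_ID('tempdb..#Crosstab') IS NOT NULL DROP TABLE #Crosstab;

-- One SUM(CASE ...) per hard-coded region; ROLLUP adds a grand-total row,
-- which is the row where GROUPING(Category) = 1.
SELECT CASE GROUPING(Category)
         WHEN 0 THEN Category
         WHEN 1 THEN 'All Categories'
       END AS Category,
       SUM(CASE WHEN Region = 'MidWest' THEN Amount ELSE 0 END) AS MidWest,
       SUM(CASE WHEN Region = 'South'   THEN Amount ELSE 0 END) AS South,
       SUM(CASE WHEN Region = 'West'    THEN Amount ELSE 0 END) AS West,
       SUM(Amount) AS Total
INTO #Crosstab
FROM #RawData
GROUP BY ROLLUP (Category);

SELECT Category, MidWest, South, West, Total FROM #Crosstab;
```

With these rows, category X should show 10, 5, 0, and 15; Y should show 3, 0, 7, and 10; and the All Categories row should show 13, 5, 7, and 25.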

Dynamic crosstab queries

The rows of a crosstab query are generated dynamically by the aggregation at runtime; however, in both the PIVOT method and the CASE expression method, the crosstab columns (region, in this example) must be hard-coded in the SQL statement.

The only way to create a crosstab query with dynamic columns is to determine the columns at execution time and assemble a dynamic SQL command to execute the crosstab query. While it could be done with a cursor, the following example uses a multiple-assignment variable SELECT to create the list of regions in the @SQLStr variable. A little string manipulation to assemble the PIVOT statement and an sp_executesql command completes the job:

DECLARE @SQLStr NVARCHAR(1024);

SELECT @SQLStr = COALESCE(@SQLStr + ',', '') + [a].[Column]
FROM (SELECT DISTINCT Region AS [Column]
      FROM RawData) AS a;

SET @SQLStr = 'SELECT Category, ' + @SQLStr
  + ' FROM (Select Category, Region, Amount from RawData) sq '
  + ' PIVOT (Sum (Amount) FOR Region IN ('
  + @SQLStr + ')) AS pt';

PRINT @SQLStr;

EXEC sp_executesql @SQLStr;

Result:

SELECT Category, MidWest,NorthEast,South,West FROM (Select

Category, Region, Amount from RawData) sq PIVOT (Sum (Amount)

FOR Region IN (MidWest,NorthEast,South,West)) AS pt

Category MidWest NorthEast South West


This example is only to demonstrate the technique for building a dynamic crosstab query. Any time you're working with dynamic SQL, be sure to guard against SQL injection, which is discussed in Chapter 29, "Dynamic SQL and Code Generation."
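Because the dynamic crosstab splices Region values directly into the statement text, one guard worth knowing about is the built-in QUOTENAME() function, which wraps a value in square brackets and escapes any embedded ]. The sketch below is a hedged variation on the example above, not the chapter's own code: it uses a hypothetical #RawData temp table whose 'North East' region contains a space, a value that would produce a syntax error in the unquoted version. It handles identifier quoting only; the broader injection discussion remains in Chapter 29.

```sql
-- Hypothetical sample rows; 'North East' deliberately contains a space.
IF OBJECT_ID('tempdb..#RawData') IS NOT NULL DROP TABLE #RawData;
CREATE TABLE #RawData (
    Category VARCHAR(20) NOT NULL,
    Region   VARCHAR(20) NOT NULL,
    Amount   INT         NOT NULL
);
INSERT INTO #RawData (Category, Region, Amount)
VALUES ('X', 'North East', 4), ('X', 'South', 6), ('Y', 'South', 1);

DECLARE @Cols NVARCHAR(MAX), @SQLStr NVARCHAR(MAX);

-- Build the column list; QUOTENAME() turns North East into [North East]
-- and escapes any ']' inside a value, so the value stays inside the identifier.
SELECT @Cols = COALESCE(@Cols + N',', N'') + QUOTENAME(a.Region)
FROM (SELECT DISTINCT Region FROM #RawData) AS a;

-- The local temp table is visible inside sp_executesql's nested scope.
SET @SQLStr = N'SELECT Category, ' + @Cols
    + N' FROM (SELECT Category, Region, Amount FROM #RawData) AS sq'
    + N' PIVOT (SUM(Amount) FOR Region IN (' + @Cols + N')) AS pt;';

PRINT @SQLStr;
EXEC sp_executesql @SQLStr;
```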

An Analysis Services cube is basically a dynamic crosstab query on steroids. For more about designing these high-performance interactive cubes, turn to Chapter 71, "Building Multidimensional Cubes with Analysis Services."

Unpivot

The inverse of a crosstab query is the UNPIVOT command, which is extremely useful for normalizing denormalized data. Starting with a table that looks like the result of a crosstab, the UNPIVOT command will twist the data back to a normalized list. Of course, UNPIVOT can only normalize the data supplied to it, so if the pivoted data is an aggregate summary, that's all that will be normalized. The details that created the aggregate summary won't magically reappear.

The following script sets up a table populated with crosstab data:

IF OBJECT_ID('PTable') IS NOT NULL
  DROP TABLE PTable;
GO

SELECT Category, MidWest, NorthEast, South, West
INTO PTable
FROM (SELECT Category, MidWest, NorthEast, South, West
      FROM (SELECT Category, Region, Amount
            FROM RawData) sq
      PIVOT
      (SUM(Amount) FOR Region IN (MidWest, NorthEast, South, West)) AS pt
     ) AS Q;

SELECT * FROM PTable;

Result:

Category MidWest NorthEast South West

The UNPIVOT command can now pick apart the PTable data and convert it back into a normalized form:

SELECT * FROM PTable
UNPIVOT (Measure FOR Region IN (South, NorthEast, MidWest, West)) AS sq;

Result:

Category Measure Region
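A small self-contained round trip makes the "only the data supplied" point concrete. This sketch assumes a hypothetical #PTable with two region columns and made-up values; one cell is deliberately NULL, and UNPIVOT does not produce a row for a NULL value, so three rows come back rather than four.

```sql
-- Hypothetical crosstab-shaped data; Y has no South value.
IF OBJECT_ID('tempdb..#PTable') IS NOT NULL DROP TABLE #PTable;
CREATE TABLE #PTable (
    Category VARCHAR(20) NOT NULL,
    MidWest  INT NULL,
    South    INT NULL
);
INSERT INTO #PTable (Category, MidWest, South)
VALUES ('X', 10, 5), ('Y', 3, NULL);

IF OBJECT_ID('tempdb..#Normalized') IS NOT NULL DROP TABLE #Normalized;

-- Each region column becomes a (Region, Measure) row; the NULL cell is dropped.
SELECT Category, Region, Measure
INTO #Normalized
FROM #PTable
UNPIVOT (Measure FOR Region IN (MidWest, South)) AS sq;

SELECT Category, Region, Measure FROM #Normalized ORDER BY Category, Region;
```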

Cumulative Totals (Running Sums)

There are numerous reasons for calculating cumulative totals, or running sums, in a database, such as account balances and inventory quantity on hand, to name only two. Of course, it's easy to just pump the data to a reporting tool and let the report control calculate the running sum, but those calculations are then lost. It's much better to calculate the cumulative total in the database and then report from consistent numbers.

Cumulative totals are one area that defies the norm for SQL. As a rule, SQL excels at working with sets, but calculating a cumulative total for a set of data is based on comparing individual rows, so an iterative row-based cursor solution performs much better than a set-based operation.


Correlated subquery solution

First, here's the set-based solution. The correlated subquery sums every row, from the first row to every row in the outer query. The first row sums from the first row to the first row. The second row sums from the first row to the second row. The third row sums from the first row to the third row, and so on, until the hundred-thousandth row sums from the first row to the hundred-thousandth row.

For a small set this solution works well enough, but as the data set grows, the correlated subquery method becomes dramatically slower, because the total number of rows summed grows with the square of the row count. That's why whenever someone blogs about this cool solution, the sample code tends to have a TOP(100) in the SELECT:

USE AdventureWorks2008;

SET NOCOUNT ON;

SELECT OuterQuery.SalesOrderID, OuterQuery.TotalDue,
  (SELECT SUM(InnerQuery.TotalDue)
   FROM Sales.SalesOrderHeader AS InnerQuery
   WHERE InnerQuery.SalesOrderID
     <= OuterQuery.SalesOrderID) AS CT
FROM Sales.SalesOrderHeader AS OuterQuery
ORDER BY OuterQuery.SalesOrderID;
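To see why the cost climbs so quickly, it helps to run the same correlated subquery on a handful of rows. In this sketch, #Orders and its four values are hypothetical. Row n of the outer query sums n inner rows, so n rows cost about n(n+1)/2 inner-row reads: 10 for these four rows, and roughly 495 million for the 31,465 rows timed below.

```sql
-- Hypothetical miniature of Sales.SalesOrderHeader.
IF OBJECT_ID('tempdb..#Orders') IS NOT NULL DROP TABLE #Orders;
CREATE TABLE #Orders (
    OrderID  INT   NOT NULL PRIMARY KEY,
    TotalDue MONEY NOT NULL
);
INSERT INTO #Orders (OrderID, TotalDue)
VALUES (1, 10), (2, 20), (3, 5), (4, 15);

IF OBJECT_ID('tempdb..#Running') IS NOT NULL DROP TABLE #Running;

-- For each outer row, the subquery re-sums every row up to and including it.
SELECT o.OrderID, o.TotalDue,
       (SELECT SUM(i.TotalDue)
        FROM #Orders AS i
        WHERE i.OrderID <= o.OrderID) AS CT
INTO #Running
FROM #Orders AS o;

SELECT OrderID, TotalDue, CT FROM #Running ORDER BY OrderID;
```

The CT column should read 10, 30, 35, and 50.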

On Maui (my Dell 6400 notebook), the best time achieved for that query was 2 minutes, 19 seconds to process 31,465 rows. Youch!

T-SQL cursor solution

With this solution, the cursor fetches the next row, does a quick add, and updates the value in the row. Therefore, it's doing more work than the previous SELECT: it's writing the cumulative total value back to the table.

The first couple of statements add a CumulativeTotal column and make sure the table isn't fragmented. From there, the cursor runs through the update:

USE AdventureWorks;

SET NOCOUNT ON;

ALTER TABLE Sales.SalesOrderHeader
  ADD CumulativeTotal MONEY NOT NULL
    CONSTRAINT dfSalesOrderHeader DEFAULT(0);

ALTER INDEX ALL ON Sales.SalesOrderHeader
  REBUILD WITH (FILLFACTOR = 100, SORT_IN_TEMPDB = ON);
GO

DECLARE
  @SalesOrderID INT,
  @TotalDue MONEY,
  @CumulativeTotal MONEY = 0;

DECLARE cRun CURSOR STATIC FOR
  SELECT SalesOrderID, TotalDue
  FROM Sales.SalesOrderHeader
  ORDER BY SalesOrderID;

OPEN cRun;

-- prime the cursor
FETCH cRun INTO @SalesOrderID, @TotalDue;

WHILE @@FETCH_STATUS = 0
BEGIN
  SET @CumulativeTotal += @TotalDue;

  UPDATE Sales.SalesOrderHeader
    SET CumulativeTotal = @CumulativeTotal
    WHERE SalesOrderID = @SalesOrderID;

  -- fetch next
  FETCH cRun INTO @SalesOrderID, @TotalDue;
END;

CLOSE cRun;
DEALLOCATE cRun;
GO

SELECT SalesOrderID, TotalDue, CumulativeTotal
FROM Sales.SalesOrderHeader
ORDER BY OrderDate, SalesOrderID;
GO

ALTER TABLE Sales.SalesOrderHeader DROP CONSTRAINT dfSalesOrderHeader;
ALTER TABLE Sales.SalesOrderHeader DROP COLUMN CumulativeTotal;

The T-SQL cursor with the additional update functionality pwned the set-based solution with an execution time of 15 seconds! w00t! That's nearly an order of magnitude difference (139 seconds versus 15). Go cursor!
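If you'd rather not add a column to the sample database, the same cursor pattern can be tried on a hypothetical temporary table, #Orders, with four made-up rows. The sketch declares the cursor LOCAL so the name cRun can't collide with a global cursor left over from the script above.

```sql
-- Hypothetical table; CumulativeTotal starts at 0 and is filled in by the cursor.
IF OBJECT_ID('tempdb..#Orders') IS NOT NULL DROP TABLE #Orders;
CREATE TABLE #Orders (
    OrderID         INT   NOT NULL PRIMARY KEY,
    TotalDue        MONEY NOT NULL,
    CumulativeTotal MONEY NOT NULL DEFAULT (0)
);
INSERT INTO #Orders (OrderID, TotalDue)
VALUES (1, 10), (2, 20), (3, 5), (4, 15);

DECLARE @OrderID INT, @TotalDue MONEY, @CumulativeTotal MONEY = 0;

-- STATIC works from a snapshot, so the UPDATEs below don't disturb the cursor.
DECLARE cRun CURSOR LOCAL STATIC FOR
    SELECT OrderID, TotalDue FROM #Orders ORDER BY OrderID;

OPEN cRun;
FETCH NEXT FROM cRun INTO @OrderID, @TotalDue;      -- prime the cursor
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @CumulativeTotal += @TotalDue;                -- add this row to the running sum
    UPDATE #Orders
      SET CumulativeTotal = @CumulativeTotal
      WHERE OrderID = @OrderID;
    FETCH NEXT FROM cRun INTO @OrderID, @TotalDue;  -- fetch next
END;
CLOSE cRun;
DEALLOCATE cRun;

SELECT OrderID, TotalDue, CumulativeTotal FROM #Orders ORDER BY OrderID;
```

The CumulativeTotal column should end up as 10, 30, 35, and 50, matching the correlated subquery.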

Multiple assignment variable solution

Another solution was posted on my blog (http://tinyurl.com/ajs3tr) in response to a screencast I did on cumulative totals and cursors.

The multiple assignment variable accumulates data in a variable iteratively during a set-based operation. It's fast: the following multiple assignment variable solves the cumulative total problem in about one second:

DECLARE @CumulativeTotal MONEY = 0;

-- requires the CumulativeTotal column added in the cursor example
UPDATE Sales.SalesOrderHeader
  SET @CumulativeTotal = CumulativeTotal
    = @CumulativeTotal + ISNULL(TotalDue, 0);

With SQL Server 2008, the multiple assignment variable seems to respect the ORDER BY clause, so I'm cautiously optimistic about using this solution. However, it's not documented or supported by Microsoft, so if the order is critical, and it certainly is to a cumulative totals problem, then I recommend the T-SQL cursor solution. If you do choose the multiple assignment variable solution, be sure to test it thoroughly with every new service pack.

Summary

SQL Server excels in aggregate functions, with the proverbial rich suite of features, and it is very capable of calculating sums and aggregates to suit nearly any need. From the simple COUNT() aggregate function to the complex dynamic crosstab query and the new PIVOT command, these query methods enable you to create powerful data analysis queries for impressive reports. The most important points to remember about aggregation are as follows:

■ Aggregate queries generate a single summary row, so every column has to be an aggregate function.

■ There's no performance difference between COUNT(*) and COUNT(pk).

■ Aggregate functions, such as COUNT(column) and AVG(column), ignore nulls, which can be a good thing, and a reason why nulls make life easier for the database developer.

■ GROUP BY queries divide the data source into several segmented data sets and then generate a summary row for each group. For GROUP BY queries, the GROUP BY columns can and should be in the column list.

■ In the logical flow of the query, the GROUP BY occurs after the FROM clause and the WHERE clause, so when coding the query, get the data properly selected and then add the GROUP BY.

■ Complex aggregations (e.g., nested aggregations) often require CTEs or subqueries. Design the query from the inside out; that is, design the aggregate subquery first and then add the outer query.

■ GROUP BY's ROLLUP and CUBE options have a new syntax, and they can be as powerful as Analysis Services cubes.

■ There are several ways to code a crosstab query. I recommend using GROUP BY and CASE expressions, rather than the PIVOT syntax.

■ Dynamic crosstabs are possible only with dynamic SQL.

■ Calculating cumulative totals (running sums) is one of the few problems best solved by a cursor.
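The point about nulls is easy to verify. In this sketch, #Scores and its three rows are hypothetical: COUNT(*) and COUNT on the primary key both return 3, while COUNT(Score) skips the NULL, and AVG(Score) divides the sum of 30 by the two non-null rows rather than by three.

```sql
-- Hypothetical table with one NULL score.
IF OBJECT_ID('tempdb..#Scores') IS NOT NULL DROP TABLE #Scores;
CREATE TABLE #Scores (
    ID    INT NOT NULL PRIMARY KEY,
    Score INT NULL
);
INSERT INTO #Scores (ID, Score) VALUES (1, 10), (2, NULL), (3, 20);

IF OBJECT_ID('tempdb..#Agg') IS NOT NULL DROP TABLE #Agg;

SELECT COUNT(*)     AS AllRows,    -- counts rows, nulls or not
       COUNT(ID)    AS PkRows,     -- a primary key is never NULL
       COUNT(Score) AS ScoreRows,  -- the NULL score is skipped
       AVG(Score)   AS AvgScore    -- (10 + 20) / 2
INTO #Agg
FROM #Scores;

SELECT AllRows, PkRows, ScoreRows, AvgScore FROM #Agg;
```

Expect 3, 3, 2, and 15.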

The next chapter continues working with summary data using the windowing and ranking technology of the OVER() clause.


Windowing and Ranking

IN THIS CHAPTER

Creating an independent sort of the result set

Grouping result sets

Calculating ranks, row numbers, and ntiles

Have you ever noticed the hidden arrow in the FedEx logo? Once you know that it's there, it's obvious, but in an informal poll of FedEx drivers, not one of them was aware of the arrow. Sometimes, just seeing things in a different perspective can help clarify the picture.

That's what SQL's windowing and ranking do: the windowing (using the OVER() clause) provides a new perspective on the data. The ranking functions then use that perspective to provide additional ways to manipulate the query results.

Windowing and ranking are similar to the last chapter's aggregate queries, but they belong in their own chapter because they work with an independent sort order separate from the query's ORDER BY clause, and should be thought of as a different technology than traditional aggregate queries.

Windowing

Before the ranking functions can be applied to the query, the window must be established. Even though the SQL query syntax places these two steps together, logically it's easier to think through the window and then add the ranking function.

Referring back to the logical sequence of the query in Chapter 8, "Introducing Basic Query Flow," the OVER() clause occurs in the latter half of the logical flow of the query in step 6, after the column expressions and ORDER BY but before any verbs (OUTPUT, INSERT, UPDATE, DELETE, or UNION).


What’s New with Windowing and Ranking?

The functionality was introduced in SQL Server 2005, and I had hoped it would be expanded for 2008. Windowing and ranking hold so much potential, and there's much more functionality in the ANSI SQL specification, but unfortunately, there's nothing new with windowing and ranking in SQL Server 2008.

All the examples in this chapter use the AdventureWorks2008 sample database.

The Over() clause

The OVER() clause creates a new window on the data (think of it as a new perspective, or independent ordering, of the rows), which may or may not be the same as the sort order of the ORDER BY clause. In a way, the windowing capability creates an alternate flow to the query with its own sort order and ranking functions, as illustrated in Figure 13-1. The results of the windowing and ranking are passed back into the query before the ORDER BY clause.

FIGURE 13-1

The windowing and ranking functions can be thought of as a parallel query process with an independent sort order.

[Diagram: Windowing and Ranking Query Flow. Labels: Data Source(s), From, Where, Col(s)/Expr(s), Order By, Predicate, Windowing Sort, Ranking Functions.]

The basic syntax is OVER(ORDER BY columns). The columns may be any available column or expression, just like the ORDER BY clause; but unlike the ORDER BY clause, the OVER() clause won't accept a column ordinal position (e.g., 1, 2). Also, like the ORDER BY clause, it can be ascending (ASC), the default, or descending (DESC); and it can be sorted by multiple columns.

The window's sort order will take advantage of indexes and can be very fast, even if the sort order is different from the main query's sort order.


In the following query, the OVER() clause creates a separate view to the data sorted by OrderDate (ignore the ROW_NUMBER() function for now):

USE AdventureWorks2008;

SELECT ROW_NUMBER() OVER(ORDER BY OrderDate) as RowNumber,

SalesOrderID, OrderDate

FROM Sales.SalesOrderHeader

WHERE SalesPersonID = 280

ORDER BY RowNumber;

Result (abbreviated, and note that OrderDate does not include time information, so the results might vary within a given date):

RowNumber SalesOrderID OrderDate

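Each OVER() clause carries its own sort order, independent of the others and of the outer ORDER BY. The following sketch numbers the same rows two ways, by OrderDate ascending and by TotalDue descending (with SalesOrderID as a tiebreaker), then returns them in SalesOrderID order. The #Orders temp table and its four rows are hypothetical stand-ins for Sales.SalesOrderHeader.

```sql
-- Hypothetical orders; dates and amounts are made up.
IF OBJECT_ID('tempdb..#Orders') IS NOT NULL DROP TABLE #Orders;
CREATE TABLE #Orders (
    SalesOrderID INT   NOT NULL PRIMARY KEY,
    OrderDate    DATE  NOT NULL,
    TotalDue     MONEY NOT NULL
);
INSERT INTO #Orders (SalesOrderID, OrderDate, TotalDue)
VALUES (101, '20080105', 50), (102, '20080103', 20),
       (103, '20080210', 70), (104, '20080207', 10);

IF OBJECT_ID('tempdb..#Ranked') IS NOT NULL DROP TABLE #Ranked;

-- Two windows, two independent sorts; a multi-column, DESC window order is allowed.
SELECT ROW_NUMBER() OVER(ORDER BY OrderDate)                   AS ByDate,
       ROW_NUMBER() OVER(ORDER BY TotalDue DESC, SalesOrderID) AS ByAmount,
       SalesOrderID, OrderDate, TotalDue
INTO #Ranked
FROM #Orders;

-- The outer sort differs from both window sorts.
SELECT ByDate, ByAmount, SalesOrderID, OrderDate, TotalDue
FROM #Ranked
ORDER BY SalesOrderID;
```

In SalesOrderID order, ByDate should read 2, 1, 4, 3 and ByAmount should read 2, 3, 1, 4.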

Partitioning within the window

The OVER() clause normally creates a single sort order, but it can divide the windowed data into partitions, which are similar to groups in an aggregate GROUP BY query. This is dramatically powerful because the ranking functions will be able to restart with every partition.

The next query example uses the OVER() clause to create a sort order of the query results by OrderDate, and then partition the data by YEAR() and MONTH(). Notice that the syntax is the opposite of the logical flow: the PARTITION BY goes before the ORDER BY within the OVER() clause:

SELECT ROW_NUMBER()

OVER(Partition By

Year(OrderDate), Month(OrderDate)

ORDER BY OrderDate) as RowNumber, SalesOrderID, OrderDate

FROM Sales.SalesOrderHeader

WHERE SalesPersonID = 280

ORDER BY OrderDate;

Result (abbreviated):

RowNumber SalesOrderID OrderDate

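To watch ROW_NUMBER() restart at each partition boundary, here is a small sketch using the same PARTITION BY YEAR(), MONTH() pattern on a hypothetical #Orders temp table with two January and three February orders.

```sql
-- Hypothetical orders spanning two months.
IF OBJECT_ID('tempdb..#Orders') IS NOT NULL DROP TABLE #Orders;
CREATE TABLE #Orders (
    SalesOrderID INT  NOT NULL PRIMARY KEY,
    OrderDate    DATE NOT NULL
);
INSERT INTO #Orders (SalesOrderID, OrderDate)
VALUES (101, '20080105'), (102, '20080103'),
       (103, '20080210'), (104, '20080207'), (105, '20080220');

IF OBJECT_ID('tempdb..#Numbered') IS NOT NULL DROP TABLE #Numbered;

-- Numbering starts over at 1 for each year/month partition.
SELECT ROW_NUMBER() OVER(PARTITION BY YEAR(OrderDate), MONTH(OrderDate)
                         ORDER BY OrderDate) AS RowNumber,
       SalesOrderID, OrderDate
INTO #Numbered
FROM #Orders;

SELECT RowNumber, SalesOrderID, OrderDate FROM #Numbered ORDER BY OrderDate;
```

January should number 102 and 101 as 1 and 2; February should restart and number 104, 103, and 105 as 1, 2, and 3.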
