Microsoft SQL Server 2008 study guide, part 37 (PDF)



4 43877 2001-08-01 00:00:00.000

As expected, the windowed sort (in this case, the RowNumber column) restarts with every new month.

Ranking Functions

The windowing capability (the OVER() clause) by itself doesn’t create any query output columns; that’s where the ranking functions come into play:

■ row_number

■ rank

■ dense_rank

■ ntile

Just to be explicit, the ranking functions all require the windowing function (the OVER() clause).

All the normal aggregate functions (SUM(), MIN(), MAX(), COUNT(*), and so on) can also be used with the OVER() clause as windowed aggregates.

Row_number() function

The ROW_NUMBER() function generates an on-the-fly auto-incrementing integer according to the sort order of the OVER() clause. It’s similar to Oracle’s ROWNUM pseudocolumn.

The row number function simply numbers the rows in the query result; there’s absolutely no correlation with any physical address or absolute row number. This is important because in a relational database, row position, number, and order have no meaning. It also means that as rows are added to or deleted from the underlying data source, the row numbers for the query results will change. In addition, if there are sets of rows with the same values in all ordering columns, then their order is undefined, so their row numbers may change between two executions even if the underlying data does not change.
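To see why the ordering columns matter, here is a minimal sketch. It uses SQLite (version 3.25 or later, which supports ROW_NUMBER() and the other ranking functions) through Python's built-in sqlite3 module as a stand-in for SQL Server. The SalesOrderHeader table and its rows are hypothetical. Two orders share a date, so SalesOrderID is added to the OVER() clause's ORDER BY as a unique tie-breaker. That makes the numbering repeatable from one execution to the next.

```python
import sqlite3

# Hypothetical sample orders; 43659 and 43660 share a date, creating a tie.
ORDERS = [
    (43659, "2001-07-01"),
    (43660, "2001-07-01"),
    (43661, "2001-07-02"),
    (43662, "2001-07-03"),
]


def numbered_orders(conn):
    """Number orders by date, breaking date ties with the unique SalesOrderID."""
    sql = """
        SELECT ROW_NUMBER() OVER (ORDER BY OrderDate, SalesOrderID) AS RowNumber,
               SalesOrderID, OrderDate
        FROM SalesOrderHeader
        ORDER BY RowNumber
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SalesOrderHeader (SalesOrderID INTEGER PRIMARY KEY, OrderDate TEXT)")
conn.executemany("INSERT INTO SalesOrderHeader VALUES (?, ?)", ORDERS)
for row in numbered_orders(conn):
    print(row)
```

Without SalesOrderID in the ORDER BY, the two orders dated 2001-07-01 could legitimately swap row numbers 1 and 2 between executions.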

One common practical use of the ROW_NUMBER() function is to filter by the row number values for pagination. For example, a query that easily produces rows 21–40 would be useful for returning the second page of data for a web page. Just be aware that the rows in the pages may change as the underlying data changes between page requests. To keep the pages stable, a typical solution grabs the data from a temp table.

It would seem that the natural way to build a row number pagination query would be to simply add the OVER() clause and ROW_NUMBER() function to the WHERE clause:

SELECT ROW_NUMBER() OVER(ORDER BY OrderDate, SalesOrderID) as RowNumber, SalesOrderID
FROM Sales.SalesOrderHeader
WHERE SalesPersonID = 280
AND ROW_NUMBER() OVER(ORDER BY OrderDate, SalesOrderID)
Between 21 AND 40
ORDER BY RowNumber;

Result:

Msg 4108, Level 15, State 1, Line 4

Windowed functions can only appear in the SELECT or ORDER BY clauses.

Because the WHERE clause occurs very early in the query processing (often in the query operation that actually reads the data from the data source) and the OVER() clause occurs late in the query processing, the WHERE clause doesn’t yet know about the windowed sort of the data or the ranking function. The WHERE clause can’t possibly filter by the generated row number.

There is a simple solution: Embed the windowing and ranking functionality in a subquery or common

table expression:

SELECT RowNumber, SalesOrderID, OrderDate, SalesOrderNumber
FROM (
  SELECT ROW_NUMBER() OVER(ORDER BY OrderDate, SalesOrderID) as RowNumber, *
  FROM Sales.SalesOrderHeader
  WHERE SalesPersonID = 280
  ) AS Q
WHERE RowNumber BETWEEN 21 AND 40
ORDER BY RowNumber;

Result:

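The same pattern, written with a common table expression instead of a derived table, is sketched below. It runs against SQLite (3.25 or later) via Python's sqlite3 module rather than SQL Server. The table, the sample rows, and the page_of_orders helper are hypothetical. The page bounds are passed as query parameters so the same query can return any page.

```python
import sqlite3


def page_of_orders(conn, salesperson_id, first_row, last_row):
    """Return rows first_row..last_row (1-based) of one salesperson's orders."""
    # ROW_NUMBER() is computed inside the CTE, so the outer WHERE can filter on it.
    sql = """
        WITH Numbered AS (
            SELECT ROW_NUMBER() OVER (ORDER BY OrderDate, SalesOrderID) AS RowNumber,
                   SalesOrderID, OrderDate
            FROM SalesOrderHeader
            WHERE SalesPersonID = ?
        )
        SELECT RowNumber, SalesOrderID, OrderDate
        FROM Numbered
        WHERE RowNumber BETWEEN ? AND ?
        ORDER BY RowNumber
    """
    return conn.execute(sql, (salesperson_id, first_row, last_row)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesOrderHeader ("
    "SalesOrderID INTEGER PRIMARY KEY, SalesPersonID INTEGER, OrderDate TEXT)"
)
# Hypothetical data: 50 orders for salesperson 280 and 10 for salesperson 281.
orders = [(1000 + i, 280, f"2004-01-{i % 28 + 1:02d}") for i in range(50)]
orders += [(2000 + i, 281, "2004-01-01") for i in range(10)]
conn.executemany("INSERT INTO SalesOrderHeader VALUES (?, ?, ?)", orders)

page2 = page_of_orders(conn, 280, 21, 40)
print(len(page2), page2[0][0], page2[-1][0])  # 20 21 40
```

The outer WHERE clause filters the CTE's result, so the row numbers already exist, logically, by the time the BETWEEN filter is applied.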


The second query in this chapter, in the “Partitioning within the Window” section, showed how grouping the sort order of the window generated row numbers that started over with every new partition.
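Here is a minimal sketch of that restart behavior. It uses SQLite (3.25 or later) through Python's sqlite3 module as a stand-in for SQL Server. The Orders table and its OrderMonth column are hypothetical. PARTITION BY splits the rows into one window per month, and ROW_NUMBER() starts again at 1 in each window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (SalesOrderID INTEGER PRIMARY KEY, OrderMonth INTEGER)")
# Hypothetical orders spread over two months.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(1, 7), (2, 7), (3, 7), (4, 8), (5, 8)],
)

# PARTITION BY restarts the numbering for each month; ORDER BY sorts rows within it.
result = conn.execute(
    """
    SELECT OrderMonth,
           ROW_NUMBER() OVER (PARTITION BY OrderMonth ORDER BY SalesOrderID) AS RowNumber,
           SalesOrderID
    FROM Orders
    ORDER BY OrderMonth, RowNumber
    """
).fetchall()
for row in result:
    print(row)
```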

Rank() and dense_rank() functions

The RANK() and DENSE_RANK() functions return values as if the rows were competing according to the windowed sort order. Any ties are grouped together with the same ranked value. For example, if Frank and Jim both tied for third place, then they would both receive a rank() value of 3.

Using sales data from AdventureWorks2008, there are ties for least-sold products, which makes it a good table to play with RANK() and DENSE_RANK(). ProductIDs 943 and 911 tie for third place, and ProductIDs 927 and 898 tie for fourth or fifth place, depending on how ties are counted:

Least Sold Products:

SELECT ProductID, COUNT(*) as 'count'
FROM Sales.SalesOrderDetail
GROUP BY ProductID
ORDER BY COUNT(*);

Result (abbreviated):

ProductID  count
...

Examining the sales data using windowing and the RANK() function returns the ranking values:

SELECT ProductID, SalesCount,
  RANK() OVER (ORDER BY SalesCount) as [Rank],
  DENSE_RANK() OVER (ORDER BY SalesCount) as [DenseRank]
FROM (SELECT ProductID, COUNT(*) as SalesCount
      FROM Sales.SalesOrderDetail
      GROUP BY ProductID
      ) AS Q
ORDER BY [Rank];

Result (abbreviated):

ProductID  SalesCount  Rank  DenseRank
...
898        9           5     4
...

This example perfectly demonstrates the difference between RANK() and DENSE_RANK(). RANK() counts each tie as a ranked row. In this example, Product IDs 943 and 911 both tie for third place but consume the third and fourth rows in the ranking, placing ProductID 927 in fifth place.

DENSE_RANK() handles ties differently. Tied rows only consume a single value in the ranking, so the next rank is the next place in the ranking order. No ranks are skipped. In the previous query, ProductID 927 is in fourth place using DENSE_RANK().
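The tie handling can be reproduced on a few rows. This sketch runs in SQLite (3.25 or later) via Python's sqlite3 module rather than SQL Server. The product IDs and sales counts are made up for illustration and are not taken from AdventureWorks2008. The alias SalesRank is used instead of Rank only to keep it visually distinct from the function name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProductSales (ProductID INTEGER PRIMARY KEY, SalesCount INTEGER)")
# Hypothetical counts: a tie at third place, then another tie right after it.
conn.executemany(
    "INSERT INTO ProductSales VALUES (?, ?)",
    [(101, 2), (102, 4), (103, 6), (104, 6), (105, 9), (106, 9)],
)

ranked = conn.execute(
    """
    SELECT ProductID, SalesCount,
           RANK()       OVER (ORDER BY SalesCount) AS SalesRank,
           DENSE_RANK() OVER (ORDER BY SalesCount) AS DenseRank
    FROM ProductSales
    ORDER BY SalesRank, ProductID
    """
).fetchall()
for row in ranked:
    print(row)
```

RANK() skips from 3 to 5 after the tie, while DENSE_RANK() continues with 4, matching the behavior described for ProductIDs 927 and 898.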

Just as with the ROW_NUMBER() function, RANK() and DENSE_RANK() can be used with a partitioned OVER() clause. The previous example could be partitioned by product category to rank product sales within each category.
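Here is a sketch of a partitioned ranking. It again uses SQLite (3.25 or later) through Python's sqlite3 module as a stand-in for SQL Server. The ProductSales table, its Category column, and the counts are hypothetical. Ranking by SalesCount DESC puts the best seller first within each category. The tie in Bikes shows that RANK() restarts per partition while still sharing values for ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProductSales (ProductID INTEGER PRIMARY KEY, Category TEXT, SalesCount INTEGER)"
)
# Hypothetical products in two categories.
conn.executemany(
    "INSERT INTO ProductSales VALUES (?, ?, ?)",
    [(1, "Bikes", 50), (2, "Bikes", 30), (3, "Bikes", 30),
     (4, "Helmets", 80), (5, "Helmets", 20)],
)

# PARTITION BY Category gives each category its own ranking, starting at 1.
by_category = conn.execute(
    """
    SELECT Category, ProductID, SalesCount,
           RANK() OVER (PARTITION BY Category ORDER BY SalesCount DESC) AS CategoryRank
    FROM ProductSales
    ORDER BY Category, CategoryRank, ProductID
    """
).fetchall()
for row in by_category:
    print(row)
```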

Ntile() function

The fourth ranking function organizes the rows into n groups, called tiles, and returns the tile number. For example, if the result set has ten rows, then NTILE(5) would split the ten rows into five equally sized tiles with two rows in each tile, in the order of the OVER() clause’s ORDER BY.

If the number of rows is not evenly divisible by the number of tiles, then the lower-numbered tiles get the extra rows. For example, for 74 rows and 10 tiles, the first 4 tiles get 8 rows each, and tiles 5 through 10 get 7 rows each. This can skew the results for smaller data sets. For example, 15 rows into 10 tiles would place 10 rows in the lower five tiles and only five rows in the upper five tiles. But for larger data sets (splitting a few hundred rows into 100 tiles, for example), it works great.

This rule also applies if there are fewer rows than tiles. The rows are not spread across all tiles; instead, the tiles are filled until the rows are consumed. For example, if five rows are split using NTILE(10), the result set would not use tiles 1, 3, 5, 7, and 9, but instead show tiles 1, 2, 3, 4, and 5.
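The distribution rule can be modeled in a few lines of plain Python. This is a sketch of the documented behavior, not SQL Server's implementation. The helper names ntile_sizes and ntile_numbers are hypothetical. It reproduces the 74-row, 15-row, and five-rows-into-ten-tiles examples from the text.

```python
def ntile_sizes(row_count, tiles):
    """Rows per tile: every tile gets row_count // tiles rows, and the
    lowest-numbered tiles each take one of the leftover rows."""
    base, extra = divmod(row_count, tiles)
    return [base + 1 if tile < extra else base for tile in range(tiles)]


def ntile_numbers(row_count, tiles):
    """Tile number (1-based) for each row, in the window's sort order."""
    numbers = []
    for tile, size in enumerate(ntile_sizes(row_count, tiles), start=1):
        numbers.extend([tile] * size)
    return numbers


print(ntile_sizes(74, 10))   # [8, 8, 8, 8, 7, 7, 7, 7, 7, 7]
print(ntile_sizes(15, 10))   # [2, 2, 2, 2, 2, 1, 1, 1, 1, 1]
print(ntile_numbers(5, 10))  # [1, 2, 3, 4, 5]
```

Because empty tiles come last, fewer rows than tiles simply leaves the upper tiles unused rather than spreading the rows out.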

A common real-world example of NTILE() is the percentile scoring used in college entrance exams. The following query first calculates the sales count of each AdventureWorks2008 product in the subquery. The outer query then uses the OVER() clause to sort by the sales count, and NTILE(100) to calculate the percentile according to the sales count:

SELECT ProductID, SalesCount,
  NTILE(100) OVER (ORDER BY SalesCount) as Percentile
FROM (SELECT ProductID, COUNT(*) as SalesCount
      FROM Sales.SalesOrderDetail
      GROUP BY ProductID
      ) AS Q
ORDER BY Percentile DESC;

ProductID  SalesCount  Percentile
...
873        3354        99
...

Like the other three ranking functions, NTILE() can be used with a partitioned OVER() clause. Similar to the ranking example, the previous example could be partitioned by product category to generate percentiles within each category.

Aggregate Functions

SQL query functions all fit together like a magnificent puzzle. A fine example is how windowing can use not only the four ranking functions (ROW_NUMBER(), RANK(), DENSE_RANK(), and NTILE()) but also the standard aggregate functions: COUNT(*), MIN(), MAX(), and so on, which were covered in the last chapter.

I won’t rehash the aggregate functions here, and usually the aggregate functions will fit well within a normal aggregate query. Here’s an example of using the SUM() aggregate function in a window to calculate the total sales order count for each product subcategory, and then, using that result from the window, calculate the percentage of sales orders for each product within its subcategory:

SELECT ProductID, Product, SalesCount,
  NTILE(100) OVER (ORDER BY SalesCount) as Percentile,
  SubCat,
  CAST(CAST(SalesCount AS NUMERIC(9,2))
    / SUM(SalesCount) OVER(Partition BY SubCat)
    * 100 AS NUMERIC (4,1)) AS PercOfSubCat
FROM (SELECT P.ProductID, P.[Name] AS Product,
        PSC.NAME AS SubCat, COUNT(*) as SalesCount
      FROM Sales.SalesOrderDetail AS SOD
        JOIN Production.Product AS P
          ON SOD.ProductID = P.ProductID
        JOIN Production.ProductSubcategory PSC
          ON P.ProductSubcategoryID = PSC.ProductSubcategoryID
      GROUP BY PSC.NAME, P.[Name], P.ProductID
      ) Q
ORDER BY Percentile DESC;

ProductID  Product                  SalesCount  Percentile  SubCat             PercOfSubCat
---------  -----------------------  ----------  ----------  -----------------  ------------
870        Water Bottle - 30 oz     4688        100         Bottles and Cages  55.6
921        Mountain Tire Tube       3095        99          Tires and Tubes    17.7
873        Patch Kit/8 Patches      3354        99          Tires and Tubes    19.2
707        Sport-100 Helmet, Red    3083        98          Helmets            33.6
711        Sport-100 Helmet, Blue   3090        98          Helmets            33.7
708        Sport-100 Helmet, Black  3007        97          Helmets            32.8
922        Road Tire Tube           2376        97          Tires and Tubes    13.6
878        Fender Set - Mountain    2121        96          Fenders            100.0
871        Mountain Bottle Cage     2025        96          Bottles and Cages  24.0
...
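The core of the query above is a windowed SUM() that repeats each subcategory's total on every row. The sketch below shows just that piece in SQLite (3.25 or later) via Python's sqlite3 module, with a hypothetical ProductSales table. Multiplying by 100.0 forces decimal division, which plays the same role as the CAST to NUMERIC in the T-SQL version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProductSales (ProductID INTEGER PRIMARY KEY, SubCat TEXT, SalesCount INTEGER)"
)
# Hypothetical counts for two subcategories (totals: Bottles 400, Tubes 200).
conn.executemany(
    "INSERT INTO ProductSales VALUES (?, ?, ?)",
    [(1, "Bottles", 300), (2, "Bottles", 100), (3, "Tubes", 50), (4, "Tubes", 150)],
)

# SUM(...) OVER (PARTITION BY SubCat) returns the subcategory total on every row,
# so each product's share is computed without a separate GROUP BY query.
shares = conn.execute(
    """
    SELECT ProductID, SubCat, SalesCount,
           ROUND(100.0 * SalesCount / SUM(SalesCount) OVER (PARTITION BY SubCat), 1)
               AS PercOfSubCat
    FROM ProductSales
    ORDER BY ProductID
    """
).fetchall()
for row in shares:
    print(row)
```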

Summary

Windowing, an extremely powerful technology that creates an independent sort of the query results, supplies the sort order for the ranking functions, which calculate row numbers, ranks, dense ranks, and n-tiles. When coding a complex query that makes the data twist and shout, creative use of windowing and ranking can be the difference between solving the problem in a single query or resorting to temp tables and code.

The key point to remember is that the OVER() clause generates the sort order for the ranking functions.

This chapter wraps up the set of chapters that explain how to query the data. The next chapters finish up the part on SELECT by showing how to package queries into reusable views, and how to add insert, update, delete, and merge verbs to queries to modify data.

(In case you haven’t checked yet and still need to know: The hidden arrow in the FedEx logo is

between the E and the X.)


Projecting Data Through Views

IN THIS CHAPTER
Planning views wisely
Creating views with Management Studio or DDL
Updating through views
Performance and views
Nesting views
Security through views
Synonyms

A view is the saved text of a SQL SELECT statement that may be referenced as a data source within a query, similar to how a subquery can be used as a data source. No more, no less. A view can’t be executed by itself; it must be used within a query.

Views are sometimes described as “virtual tables.” This isn’t an accurate description because views don’t store any data. Like any other SQL query, views merely refer to the data stored in tables.
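A quick way to confirm that a view stores no data is to change the base table and query the view again. This sketch uses SQLite through Python's sqlite3 module as a stand-in for SQL Server. The Product table and vProduct view are hypothetical. SQLite keeps the view's saved SELECT text in its sqlite_master catalog, much as SQL Server exposes a view's definition through sys.sql_modules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT, ListPrice REAL)")
conn.execute("INSERT INTO Product VALUES (1, 'Water Bottle', 4.99)")

# The view is only saved SELECT text; it holds no rows of its own.
conn.execute("CREATE VIEW vProduct AS SELECT ProductID, Name FROM Product")

before = conn.execute("SELECT COUNT(*) FROM vProduct").fetchone()[0]
conn.execute("INSERT INTO Product VALUES (2, 'Patch Kit', 2.29)")
after = conn.execute("SELECT COUNT(*) FROM vProduct").fetchone()[0]

# The stored definition is plain SQL text in the schema catalog.
definition = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'view' AND name = 'vProduct'"
).fetchone()[0]
print(before, after)  # 1 2
print(definition)
```

The view sees the new row immediately because each query against it re-reads the Product table.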

With this in mind, it’s important to fully understand how views work, the pros and cons of using views, and the best place to use views within your project architecture.

Why Use Views?

While there are several opinions on the use of views, ranging from total abstinence to overuse, the Information Architecture Principle (from Chapter 2, “Smart Database Design”) serves as a guide for their most appropriate use. The principle states that “information must be made readily available in a usable format for daily operations and analysis by individuals, groups, and processes.”

Presenting data in a more usable format is precisely what views do best.

Based on the premise that views are best used to increase data integrity and ease of

writing ad hoc queries, and not as a central part of a production application, here

are some ideas for building ad hoc query views:

■ Use views to denormalize or flatten complex joins and hide any surrogate keys used to link data within the database schema. A well-designed view invites the user to get right to the data of interest.

■ Save complex aggregate queries as views. Even power users will appreciate a well-crafted aggregate query saved as a view.


Best Practice

Views are an important part of the abstraction puzzle; I recommend being intentional in their use. Some developers are enamored with views and use them as the primary abstraction layer for their databases. They create layers of nested views, or stored procedures that refer to views. This practice serves no valid purpose, creates confusion, and requires needless overhead. The best database abstraction layer is a single layer of stored procedures that directly refer to tables, or sometimes user-defined functions (see Chapter 28, “Building out the Data Abstraction Layer”).

Instead, use views only to support ad hoc queries and reports. For queries that are run occasionally, views perform well even when compared with stored procedures.

Data within a normalized database is rarely organized in a readily available format. Building ad hoc queries that extract the correct information from a normalized database is a challenge for most end-users. A well-written view can hide the complexity and present the correct data to the user.

■ Use aliases to change cryptic column names to recognizable column names. Just as the SQL SELECT statement can use column or table aliases to modify the names of columns or tables, these features may be used within a view to present a more readable record set to the user.

■ Include only the columns of interest to the user. When columns that don’t concern users are left out of the view, the view is easier to query. The columns that are included in the view are called projected columns, meaning they project only the selected data from the entire underlying table.

■ Plan generic, dynamic views that will have long, useful lives. Single-purpose views quickly become obsolete and clutter the database. Build the view with the intention that it will be used with a WHERE clause to select a subset of data. The view should return all the rows if the user does not supply a WHERE restriction. For example, the vEventList view returns all the events; the user should use a WHERE clause to select the local events, or the events in a certain month.

■ If a view is needed to return a restricted set of data, such as the next month’s events, then the view should calculate the next month so that it will continue to function over time. Hard-coding values such as a month number or name would be poor practice.

■ If the view selects data from a range, then consider writing it as a user-defined function (see Chapter 25, “Building User-Defined Functions”), which can accept parameters.

■ Consolidate data from across a complex environment. Queries that need to collect data from across multiple servers are simplified by encapsulating the union of data from multiple servers within a view. This is one case where basing several reports, and even stored procedures, on a view improves the stability, integrity, and maintainability of the system.
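Several of these guidelines fit in one small example. The sketch below uses SQLite through Python's sqlite3 module rather than SQL Server, so the date logic uses SQLite's date() modifiers instead of T-SQL functions such as DATEADD(). The Evt table, its column names, and vNextMonthEvents are hypothetical; vEventList borrows its name from the text. The generic view renames cryptic columns, leaves out an internal column, and relies on the caller's WHERE clause. vNextMonthEvents computes next month at query time instead of hard-coding it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base table with cryptic column names and an internal-only column.
conn.execute(
    "CREATE TABLE Evt (EvtID INTEGER PRIMARY KEY, EvtNm TEXT, EvtDt TEXT,"
    " LocCd TEXT, InternalNote TEXT)"
)
conn.executemany(
    "INSERT INTO Evt (EvtNm, EvtDt, LocCd, InternalNote) VALUES (?, ?, ?, ?)",
    [("Spring Gala", "2009-04-18", "LOCAL", "n/a"),
     ("Trade Show", "2009-05-02", "REMOTE", "n/a"),
     ("Picnic", "2009-05-23", "LOCAL", "n/a")],
)
# One event on the 15th of next month, relative to today's (UTC) date.
conn.execute(
    "INSERT INTO Evt (EvtNm, EvtDt, LocCd, InternalNote) VALUES ('Planning Meeting', "
    "date('now', 'start of month', '+1 month', '+14 days'), 'REMOTE', 'n/a')"
)

# Generic view: readable aliases, only the projected columns, no WHERE of its own.
conn.execute(
    "CREATE VIEW vEventList AS "
    "SELECT EvtNm AS EventName, EvtDt AS EventDate, LocCd AS Location FROM Evt"
)
# Restricted view: "next month" is recalculated every time the view is queried.
conn.execute(
    "CREATE VIEW vNextMonthEvents AS "
    "SELECT EvtNm AS EventName, EvtDt AS EventDate FROM Evt "
    "WHERE EvtDt >= date('now', 'start of month', '+1 month') "
    "AND EvtDt < date('now', 'start of month', '+2 months')"
)

columns = [d[0] for d in conn.execute("SELECT * FROM vEventList").description]
local_events = conn.execute(
    "SELECT EventName FROM vEventList WHERE Location = 'LOCAL' ORDER BY EventDate"
).fetchall()
next_month = conn.execute("SELECT EventName FROM vNextMonthEvents").fetchall()
print(columns)       # ['EventName', 'EventDate', 'Location']
print(local_events)  # [('Spring Gala',), ('Picnic',)]
print(next_month)    # [('Planning Meeting',)]
```

vNextMonthEvents reads the base table directly rather than vEventList, in keeping with the advice against nesting views.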


Using Views for Column-Level Security

One of the basic relational operators is projection: the ability to expose specific columns. One primary advantage of views is their natural capacity to project a predefined set of columns. Here’s where theory becomes practical. A view can project columns on a need-to-know basis and hide columns that are sensitive (e.g., payroll and credit card data), irrelevant, or confusing for the purpose of the view.

SQL Server supports column-level security, and it’s a powerful feature. The problem is that ad hoc queries made by users who don’t understand the schema very well will often run into security errors. I recommend implementing SQL Server column-level security, and then also using views to shield users from ever encountering the security. Grant users read permission on only the views, and restrict access to the physical tables (see Chapter 50, “Authorizing Securables”).

I’ve seen databases that only use views for column-level security without any SQL Server–enforced security. This is woefully inadequate and will surely be penalized by any serious security audit.

The goal when developing views is twofold: to enable users to get to the data easily and to protect the data from the users. By building views that provide the correct data, you are preventing erroneous or inaccurate queries and misinterpretation.

There are other advanced forms of views.

Distributed partitioned views, or federated databases, divide very large tables across multiple smaller tables or separate servers to improve performance. The partitioned view then spans the multiple tables or servers, thus sharing the query load across more disk spindles. These are covered in Chapter 68, “Partitioning.”
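The logical shape of a partitioned view is a UNION ALL over member tables that each hold one range of rows. This sketch uses SQLite through Python's sqlite3 module, and the yearly Orders tables and vOrders view are hypothetical. In SQL Server, the CHECK constraints on the partitioning column are what let the optimizer skip member tables that cannot contain matching rows. SQLite does not use CHECK constraints that way, so this only illustrates the structure, not the performance benefit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical member tables, one per year; each CHECK constraint fixes its range.
for year in (2008, 2009):
    conn.execute(
        f"CREATE TABLE Orders{year} ("
        "OrderID INTEGER PRIMARY KEY, OrderDate TEXT NOT NULL "
        f"CHECK (OrderDate >= '{year}-01-01' AND OrderDate < '{year + 1}-01-01'))"
    )
conn.execute("INSERT INTO Orders2008 VALUES (1, '2008-06-15')")
conn.execute("INSERT INTO Orders2009 VALUES (2, '2009-02-10'), (3, '2009-11-30')")

# The partitioned view stitches the member tables back together with UNION ALL.
conn.execute(
    """
    CREATE VIEW vOrders AS
    SELECT OrderID, OrderDate FROM Orders2008
    UNION ALL
    SELECT OrderID, OrderDate FROM Orders2009
    """
)

all_orders = conn.execute("SELECT OrderID FROM vOrders ORDER BY OrderID").fetchall()
orders_2009 = conn.execute(
    "SELECT OrderID FROM vOrders WHERE OrderDate >= '2009-01-01' ORDER BY OrderID"
).fetchall()
print(all_orders, orders_2009)  # [(1,), (2,), (3,)] [(2,), (3,)]

# The CHECK constraint rejects a row that belongs in a different member table.
try:
    conn.execute("INSERT INTO Orders2008 VALUES (9, '2009-03-01')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```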

Indexed views are a powerful feature that actually materializes the data, storing the results of the view in a clustered index on disk, so in this sense it’s not a pure view. Like any view, it can select data from multiple data sources. Think of the indexed view as a covering index but with greater control: you can include data from multiple data sources, and you don’t have to include the clustered index keys. The index may then be referenced when executing queries, regardless of whether the view is in the query (the optimizer does this automatically in Enterprise Edition; other editions use the index only when the query references the view with the NOEXPAND hint), so the name is slightly confusing.

Because designing an indexed view is more like designing an indexing structure than creating a view, I’ve included indexed views in Chapter 64, “Indexing Strategies.”

The Basic View

Using SQL Server Management Studio, views may be created, modified, executed, and included within other queries, using either the Query Designer or the DDL code within the Query Editor.
