4 43877 2001-08-01 00:00:00.000
As expected, the windowed sort (in this case, the RowNumber column) restarts with every new month.
Ranking Functions
The windowing capability (the OVER() clause) by itself doesn’t create any query output columns; that’s
where the ranking functions come into play:
■ row_number
■ rank
■ dense_rank
■ ntile
Just to be explicit, the ranking functions all require the windowing (OVER()) clause.
All the normal aggregate functions, such as SUM(), MIN(), MAX(), and COUNT(*), can also be
used as windowed functions.
Row_number() function
The ROW_NUMBER() function generates an on-the-fly auto-incrementing integer according to the sort
order of the OVER() clause. It’s similar to Oracle’s RowNum column.
The row number function simply numbers the rows in the query result; there’s absolutely no
correlation with any physical address or absolute row number. This is important because in a relational
database, row position, number, and order have no meaning. It also means that as rows are added or
deleted from the underlying data source, the row numbers for the query results will change. In addition,
if there are sets of rows with the same values in all ordering columns, then their order is undefined, so
their row numbers may change between two executions even if the underlying data does not change.
One common practical use of the ROW_NUMBER() function is to filter by the row number values for
pagination. For example, a query that easily produces rows 21–40 would be useful for returning the
second page of data for a web page. Just be aware that the rows in the pages may change; typically,
this grabs data from a temp table.
It would seem that the natural way to build a row number pagination query would be to simply add the
OVER() clause and ROW_NUMBER() function to the WHERE clause:
SELECT ROW_NUMBER() OVER(ORDER BY OrderDate, SalesOrderID) AS RowNumber, SalesOrderID
  FROM Sales.SalesOrderHeader
  WHERE SalesPersonID = 280
    AND ROW_NUMBER() OVER(ORDER BY OrderDate, SalesOrderID)
      BETWEEN 21 AND 40
  ORDER BY RowNumber;
Result:
Msg 4108, Level 15, State 1, Line 4
Windowed functions can only appear in the SELECT or ORDER BY clauses.
Because the WHERE clause occurs very early in the query processing (often in the query operation that
actually reads the data from the data source) and the OVER() clause occurs late in the query
processing, the WHERE clause doesn’t yet know about the windowed sort of the data or the ranking function.
The WHERE clause can’t possibly filter by the generated row number.
There is a simple solution: Embed the windowing and ranking functionality in a subquery or common
table expression:
SELECT RowNumber, SalesOrderID, OrderDate, SalesOrderNumber
  FROM (
    SELECT ROW_NUMBER() OVER(ORDER BY OrderDate, SalesOrderID) AS RowNumber, *
      FROM Sales.SalesOrderHeader
      WHERE SalesPersonID = 280
  ) AS Q
  WHERE RowNumber BETWEEN 21 AND 40
  ORDER BY RowNumber;
Result (abbreviated):
RowNumber   SalesOrderID  OrderDate  SalesOrderNumber
...
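The same pagination can also be written with a common table expression, the other option mentioned above. The following is a minimal sketch rather than the only way to do it; the CTE name NumberedOrders and the @PageNumber and @PageSize variables are illustrative assumptions that do not appear in the original example.

```sql
-- Page through one salesperson's orders using a CTE instead of a subquery.
-- NumberedOrders, @PageNumber, and @PageSize are hypothetical names.
DECLARE @PageNumber INT = 2,
        @PageSize   INT = 20;

WITH NumberedOrders AS (
    SELECT ROW_NUMBER() OVER(ORDER BY OrderDate, SalesOrderID) AS RowNumber,
           SalesOrderID, OrderDate, SalesOrderNumber
      FROM Sales.SalesOrderHeader
      WHERE SalesPersonID = 280
)
SELECT RowNumber, SalesOrderID, OrderDate, SalesOrderNumber
  FROM NumberedOrders
  -- Page 2 with 20 rows per page resolves to rows 21 through 40.
  WHERE RowNumber BETWEEN (@PageNumber - 1) * @PageSize + 1
                      AND @PageNumber * @PageSize
  ORDER BY RowNumber;
```

Because the row number is calculated inside the CTE, the outer WHERE clause sees it as an ordinary column and can filter on it.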
The second query in this chapter, in the “Partitioning within the Window” section, showed how
grouping the sort order of the window generated row numbers that started over with every new partition.
Rank() and dense_rank() functions
The RANK() and DENSE_RANK() functions return values as if the rows were competing according to
the windowed sort order. Any ties are grouped together with the same ranked value. For example, if
Frank and Jim both tied for third place, then they would both receive a RANK() value of 3.
Using sales data from AdventureWorks2008, there are ties for least sold products, which makes it a
good table to play with RANK() and DENSE_RANK(). ProductIDs 943 and 911 tie for third place,
and ProductIDs 927 and 898 tie for fourth or fifth place depending on how ties are counted:
Least Sold Products:
SELECT ProductID, COUNT(*) AS [count]
  FROM Sales.SalesOrderDetail
  GROUP BY ProductID
  ORDER BY COUNT(*);
Result (abbreviated):
ProductID   count
...
Examining the sales data using windowing and the RANK() function returns the ranking values:
SELECT ProductID, SalesCount,
    RANK() OVER (ORDER BY SalesCount) AS [Rank],
    DENSE_RANK() OVER (ORDER BY SalesCount) AS [DenseRank]
  FROM (SELECT ProductID, COUNT(*) AS SalesCount
          FROM Sales.SalesOrderDetail
          GROUP BY ProductID
       ) AS Q
  ORDER BY [Rank];
Result (abbreviated):
ProductID   SalesCount  Rank  DenseRank
---------   ----------  ----  ---------
898         9           5     4
...
This example perfectly demonstrates the difference between RANK() and DENSE_RANK(). RANK()
counts each tie as a ranked row. In this example, ProductIDs 943 and 911 both tie for third place
but consume the third and fourth row in the ranking, placing ProductID 927 in fifth place.
DENSE_RANK() handles ties differently. Tied rows only consume a single value in the ranking, so
the next rank is the next place in the ranking order. No ranks are skipped. In the previous query,
ProductID 927 is in fourth place using DENSE_RANK().
Just as with the ROW_NUMBER() function, RANK() and DENSE_RANK() can be used with a partitioned
OVER() clause. The previous example could be partitioned by product category to rank product sales
within each category.
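As a rough sketch of that partitioned ranking, the query below restarts both ranks for every product subcategory. It partitions by ProductSubcategoryID rather than by category because, in AdventureWorks2008, that column sits directly on Production.Product; treating the subcategory as the grouping level is an assumption made here for brevity.

```sql
-- Rank product sales within each subcategory.
-- PARTITION BY restarts RANK() and DENSE_RANK() for every ProductSubcategoryID.
SELECT ProductSubcategoryID, ProductID, SalesCount,
       RANK()       OVER (PARTITION BY ProductSubcategoryID
                          ORDER BY SalesCount) AS [Rank],
       DENSE_RANK() OVER (PARTITION BY ProductSubcategoryID
                          ORDER BY SalesCount) AS [DenseRank]
  FROM (SELECT P.ProductSubcategoryID, P.ProductID, COUNT(*) AS SalesCount
          FROM Sales.SalesOrderDetail AS SOD
            JOIN Production.Product AS P
              ON SOD.ProductID = P.ProductID
          GROUP BY P.ProductSubcategoryID, P.ProductID
       ) AS Q
  ORDER BY ProductSubcategoryID, [Rank];
```

Each subcategory now has its own first place, and ties are resolved independently within each partition.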
Ntile() function
The fourth ranking function organizes the rows into n groups, called tiles, and returns the tile
number. For example, if the result set has ten rows, then NTILE(5) would split the ten rows into five
equally sized tiles with two rows in each tile in the order of the OVER() clause’s ORDER BY.
If the number of rows is not evenly divisible by the number of tiles, then the first tiles each get one extra row.
For example, for 74 rows and 10 tiles, the first 4 tiles get 8 rows each, and tiles 5 through 10 get
7 rows each. This can skew the results for smaller data sets. For example, 15 rows into 10 tiles would
place 10 rows in the lower five tiles and only five rows in the upper five tiles. But for larger data
sets (splitting a few hundred rows into 100 tiles, for example) it works great.
This rule also applies if there are fewer rows than tiles. The rows are not spread across all tiles; instead,
the tiles are filled until the rows are consumed. For example, if five rows are split using NTILE(10),
the result set would not use tiles 1, 3, 5, 7, and 9, but instead show tiles 1, 2, 3, 4, and 5.
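The distribution rule is easy to check directly. The sketch below builds 74 numbered rows, splits them with NTILE(10), and counts the rows in each tile. Using sys.all_objects as a convenient row source is an assumption of this example (any table with at least 74 rows would do), and the CTE names are hypothetical.

```sql
-- Generate 74 rows, tile them into 10 groups, and count the rows per tile.
WITH Numbers AS (
    SELECT TOP (74) ROW_NUMBER() OVER (ORDER BY object_id) AS n
      FROM sys.all_objects          -- assumed row source with more than 74 rows
      ORDER BY object_id
),
Tiled AS (
    SELECT n, NTILE(10) OVER (ORDER BY n) AS Tile
      FROM Numbers
)
SELECT Tile, COUNT(*) AS RowsInTile
  FROM Tiled
  GROUP BY Tile
  ORDER BY Tile;
```

The result shows tiles 1 through 4 with 8 rows each and tiles 5 through 10 with 7 rows each. Changing TOP (74) to TOP (5) demonstrates the fewer-rows-than-tiles case: only tiles 1 through 5 appear.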
A common real-world example of NTILE() is the percentile scoring used in college entrance exams.
The following query first calculates the AdventureWorks2008 products’ sales quantity in the
subquery. The outer query then uses the OVER() clause to sort by the sales count, and NTILE(100)
to calculate the percentile according to the sales count:
SELECT ProductID, SalesCount,
NTILE(100) OVER (ORDER BY SalesCount) as Percentile
FROM (SELECT ProductID, COUNT(*) as SalesCount
FROM Sales.SalesOrderDetail GROUP BY ProductID
) AS Q
ORDER BY Percentile DESC;
Result (abbreviated):
ProductID   SalesCount  Percentile
---------   ----------  ----------
873         3354        99
...
Like the other three ranking functions, NTILE() can be used with a partitioned OVER() clause.
Similar to the ranking example, the previous example could be partitioned by product category to generate
percentiles within each category.
Aggregate Functions
SQL query functions all fit together like a magnificent puzzle. A fine example is how windowing
can use not only the four ranking functions (ROW_NUMBER(), RANK(), DENSE_RANK(), and
NTILE()) but also the standard aggregate functions: COUNT(*), MIN(), MAX(), and so on, which
were covered in the last chapter.
I won’t rehash the aggregate functions here, and usually the aggregate functions will fit well within a
normal aggregate query, but here’s an example of using the SUM() aggregate function in a window to
calculate the total sales order count for each product subcategory, and then, using that result from the
window, calculate the percentage of sales orders for each product within its subcategory:
SELECT ProductID, Product, SalesCount,
    NTILE(100) OVER (ORDER BY SalesCount) AS Percentile,
    SubCat,
    CAST(CAST(SalesCount AS NUMERIC(9,2))
      / SUM(SalesCount) OVER(PARTITION BY SubCat)
      * 100 AS NUMERIC (4,1)) AS PercOfSubCat
  FROM (SELECT P.ProductID, P.[Name] AS Product,
            PSC.[Name] AS SubCat, COUNT(*) AS SalesCount
          FROM Sales.SalesOrderDetail AS SOD
            JOIN Production.Product AS P
              ON SOD.ProductID = P.ProductID
            JOIN Production.ProductSubcategory AS PSC
              ON P.ProductSubcategoryID = PSC.ProductSubcategoryID
          GROUP BY PSC.[Name], P.[Name], P.ProductID
       ) AS Q
  ORDER BY Percentile DESC;
Result (abbreviated):
ProductID  Product                  SalesCount  Percentile  SubCat             PercOfSubCat
---------  -----------------------  ----------  ----------  -----------------  ------------
870        Water Bottle - 30 oz     4688        100         Bottles and Cages  55.6
921        Mountain Tire Tube       3095        99          Tires and Tubes    17.7
873        Patch Kit/8 Patches      3354        99          Tires and Tubes    19.2
707        Sport-100 Helmet, Red    3083        98          Helmets            33.6
711        Sport-100 Helmet, Blue   3090        98          Helmets            33.7
708        Sport-100 Helmet, Black  3007        97          Helmets            32.8
922        Road Tire Tube           2376        97          Tires and Tubes    13.6
878        Fender Set - Mountain    2121        96          Fenders            100.0
871        Mountain Bottle Cage     2025        96          Bottles and Cages  24.0
...
Summary
Windowing, an extremely powerful technology that creates an independent sort of the query
results, supplies the sort order for the ranking functions, which calculate row numbers, ranks, dense
ranks, and n-tiles. When coding a complex query that makes the data twist and shout, creative use of
windowing and ranking can be the difference between solving the problem in a single query or resorting
to temp tables and code.
The key point to remember is that the OVER() clause generates the sort order for the ranking functions.
This chapter wraps up the set of chapters that explain how to query the data. The next chapters finish
up the part on select by showing how to package queries into reusable views, and add insert, update,
delete, and merge verbs to queries to modify data.
(In case you haven’t checked yet and still need to know: The hidden arrow in the FedEx logo is
between the E and the X.)
Projecting Data Through Views
IN THIS CHAPTER
Planning views wisely
Creating views with Management Studio or DDL
Updating through views
Performance and views
Nesting views
Security through views
Synonyms
A view is the saved text of a SQL SELECT statement that may be referenced
as a data source within a query, similar to how a subquery can be used as
a data source; no more, no less. A view can’t be executed by itself; it
must be used within a query.
Views are sometimes described as “virtual tables.” This isn’t an accurate
description because views don’t store any data. Like any other SQL query, views merely
refer to the data stored in tables.
With this in mind, it’s important to fully understand how views work, the pros
and cons of using views, and the best place to use views within your project
architecture.
Why Use Views?
While there are several opinions on the use of views, ranging from total
abstinence to overuse, the Information Architecture Principle (from Chapter 2, “Smart
Database Design”) serves as a guide for their most appropriate use. The principle
states that “information must be made readily available in a usable format for
daily operations and analysis by individuals, groups, and processes.”
Presenting data in a more usable format is precisely what views do best.
Based on the premise that views are best used to increase data integrity and ease of
writing ad hoc queries, and not as a central part of a production application, here
are some ideas for building ad hoc query views:
■ Use views to denormalize or flatten complex joins and hide any
surrogate keys used to link data within the database schema. A well-designed
view invites the user to get right to the data of interest.
■ Save complex aggregate queries as views. Even power users will
appreciate a well-crafted aggregate query saved as a view.
Best Practice
Views are an important part of the abstraction puzzle; I recommend being intentional in their use. Some
developers are enamored with views and use them as the primary abstraction layer for their databases.
They create layers of nested views, or stored procedures that refer to views. This practice serves no valid
purpose, creates confusion, and requires needless overhead. The best database abstraction layer is a single
layer of stored procedures that directly refer to tables, or sometimes user-defined functions (see Chapter 28,
“Building out the Data Abstraction Layer”).
Instead, use views only to support ad hoc queries and reports. For queries that are run occasionally, views
perform well even when compared with stored procedures.
Data within a normalized database is rarely organized in a readily available format. Building ad hoc
queries that extract the correct information from a normalized database is a challenge for most end-users. A
well-written view can hide the complexity and present the correct data to the user.
■ Use aliases to change cryptic column names to recognizable column names. Just as the SQL SELECT statement can use column or table aliases to modify the names of columns or tables, these features may be used within a view to present a more readable record set to the user.
■ Include only the columns of interest to the user. When columns that don’t concern users are left out of the view, the view is easier to query. The columns that are included in the view are called projected columns, meaning they project only the selected data from the entire underlying table.
■ Plan generic, dynamic views that will have long, useful lives. Single-purpose views quickly become obsolete and clutter the database. Build the view with the intention that it will be used with a WHERE clause to select a subset of data. The view should return all the rows if the user does not supply a WHERE restriction. For example, the vEventList view returns all the events; the user should use a WHERE clause to select the local events, or the events in a certain month.
■ If a view is needed to return a restricted set of data, such as the next month’s events, then the view should calculate the next month so that it will continue to function over time. Hard-coding values such as a month number or name would be poor practice.
■ If the view selects data from a range, then consider writing it as a user-defined function (see Chapter 25, “Building User-Defined Functions”), which can accept parameters.
■ Consolidate data from across a complex environment. Queries that need to collect data from across multiple servers are simplified by encapsulating the union of data from multiple servers within a view. This is one case where basing several reports, and even stored procedures, on a view improves the stability, integrity, and maintainability of the system.
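Several of these ideas can be sketched against the AdventureWorks2008 Sales.SalesOrderHeader table used earlier. The view names vOrderList and vOrdersDueNextMonth and the column aliases are hypothetical, chosen only for illustration. GO is the batch separator understood by Management Studio and sqlcmd, needed because CREATE VIEW must be the first statement in its batch.

```sql
-- Hypothetical ad hoc query view: aliases give friendlier column names,
-- and only the columns of interest are projected.
CREATE VIEW dbo.vOrderList
AS
SELECT SOH.SalesOrderNumber AS OrderNumber,
       SOH.OrderDate,
       SOH.DueDate,
       SOH.TotalDue         AS OrderTotal,
       SOH.SalesPersonID
  FROM Sales.SalesOrderHeader AS SOH;
GO

-- The view returns every order; the user narrows it with a WHERE clause.
SELECT OrderNumber, OrderDate, OrderTotal
  FROM dbo.vOrderList
  WHERE SalesPersonID = 280;
GO

-- Hypothetical restricted view: "next month" is calculated at run time
-- rather than hard-coded, so the view keeps working over time.
CREATE VIEW dbo.vOrdersDueNextMonth
AS
SELECT SOH.SalesOrderNumber AS OrderNumber,
       SOH.OrderDate,
       SOH.DueDate
  FROM Sales.SalesOrderHeader AS SOH
  -- First day of next month (inclusive) up to the first day of the month after.
  WHERE SOH.DueDate >= DATEADD(month, DATEDIFF(month, 0, GETDATE()) + 1, 0)
    AND SOH.DueDate <  DATEADD(month, DATEDIFF(month, 0, GETDATE()) + 2, 0);
GO
```

The second view reads the table directly instead of selecting from vOrderList, in keeping with the advice against nesting views.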
Using Views for Column-Level Security
One of the basic relational operators is projection, the ability to expose specific columns. One primary
advantage of views is their natural capacity to project a predefined set of columns. Here’s where theory
becomes practical. A view can project columns on a need-to-know basis and hide columns that are sensitive
(e.g., payroll and credit card data), irrelevant, or confusing for the purpose of the view.
SQL Server supports column-level security, and it’s a powerful feature. The problem is that ad hoc queries
made by users who don’t understand the schema very well will often run into security errors. I recommend
implementing SQL Server column-level security, and then also using views to shield users from ever
encountering the security. Grant users read permission from only the views, and restrict access to the
physical tables (see Chapter 50, “Authorizing Securables”).
I’ve seen databases that only use views for column-level security without any SQL Server–enforced security.
This is woefully inadequate and will surely be penalized by any serious security audit.
The goal when developing views is two-fold: to enable users to get to the data easily and to protect the
data from the users. By building views that provide the correct data, you are preventing erroneous or
inaccurate queries and misinterpretation.
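A minimal sketch of that combination follows. The role AdHocReaders and the view vEmployeeDirectory are hypothetical names, and the choice of which HumanResources.Employee columns count as sensitive is an assumption made for illustration. The sketch also assumes the view and the table have the same owner, so that ownership chaining lets the role read through the view without any permission on the table.

```sql
-- Hypothetical role for ad hoc query users.
CREATE ROLE AdHocReaders;
GO

-- Hypothetical view that projects only non-sensitive columns.
CREATE VIEW dbo.vEmployeeDirectory
AS
SELECT BusinessEntityID, JobTitle, HireDate
  FROM HumanResources.Employee;
GO

-- Read permission is granted on the view only.
GRANT SELECT ON dbo.vEmployeeDirectory TO AdHocReaders;

-- SQL Server-enforced column-level security on the physical table:
-- the sensitive columns (an assumed selection) are explicitly denied.
DENY SELECT ON HumanResources.Employee (NationalIDNumber, BirthDate)
  TO AdHocReaders;
GO
```

Members of the role query the view and never encounter a permission error, while the sensitive columns remain protected by a server-enforced rule rather than by the view alone.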
There are other advanced forms of views.
Distributed partitioned views, or federated databases, divide very large tables across multiple smaller
tables or separate servers to improve performance. The partitioned view then spans the multiple tables
or servers, thus sharing the query load across more disk spindles. These are covered in Chapter 68,
“Partitioning.”
Indexed views are a powerful feature that actually materializes the data, storing the results of the view
in a clustered index on disk, so in this sense it’s not a pure view. Like any view, it can select data from
multiple data sources. Think of the indexed view as a covering index but with greater control: you
can include data from multiple data sources, and you don’t have to include the clustered index keys.
The index may then be referenced when executing queries, regardless of whether the view is in the
query, so the name is slightly confusing.
Because designing an indexed view is more like designing an indexing structure than creating a view, I’ve included indexed views in Chapter 64, “Indexing Strategies.”
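For orientation only, an indexed view might be sketched as follows. The view and index names are hypothetical, and the example assumes the session SET options that indexed views require (the Management Studio defaults) are in effect.

```sql
-- Hypothetical indexed view: per-product sales counts, materialized on disk.
-- SCHEMABINDING and two-part table names are required for indexed views,
-- and a grouped indexed view must include COUNT_BIG(*).
CREATE VIEW dbo.vProductSalesCount
WITH SCHEMABINDING
AS
SELECT ProductID, COUNT_BIG(*) AS SalesCount
  FROM Sales.SalesOrderDetail
  GROUP BY ProductID;
GO

-- Creating the unique clustered index is what materializes the view's data.
CREATE UNIQUE CLUSTERED INDEX IX_vProductSalesCount
  ON dbo.vProductSalesCount (ProductID);
GO
```

Until the clustered index is created, the view behaves like any other view and stores nothing.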
The Basic View
Using SQL Server Management Studio, views may be created, modified, executed, and included within
other queries, using either the Query Designer or the DDL code within the Query Editor.