CHAPTER 43 Transact-SQL Programming Guidelines, Tips, and Tricks
select @rowcnt = @@ROWCOUNT, @error = @@ERROR
if @rowcnt = 0
    print 'no rows updated'
if @error <> 0
    raiserror ('Update of titles failed', 16, 1)
return
NOTE
Error processing was improved in SQL Server 2005 with the introduction of the
TRY...CATCH construct in T-SQL. It provides a much more robust method of error
handling than checking @@ERROR for error conditions. The TRY...CATCH construct is
discussed in more detail later in this chapter.
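As a rough illustration of the difference, the @@ERROR check shown earlier could be
restructured with TRY...CATCH along the following lines. This is only a sketch, not one
of the chapter's listings: the UPDATE statement, its WHERE filter, and the @msg variable
are assumptions made for illustration.

```sql
-- Sketch only: the UPDATE and its filter are hypothetical.
BEGIN TRY
    UPDATE dbo.titles
       SET price = price * 1.10
     WHERE pub_id = '9999';

    -- @@ROWCOUNT is still the way to detect that nothing was updated
    IF @@ROWCOUNT = 0
        PRINT 'no rows updated';
END TRY
BEGIN CATCH
    -- Control transfers here for most run-time errors raised in the TRY block
    DECLARE @msg nvarchar(2048);
    SET @msg = ERROR_MESSAGE();
    RAISERROR ('Update of titles failed: %s', 16, 1, @msg);
END CATCH;
```

The error check no longer has to follow each statement, because any qualifying error in
the TRY block is routed to the single CATCH block.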
De-Duping Data with Ranking Functions
One common problem encountered with imported data is unexpected duplicate data rows,
especially if the data is being consolidated from multiple sources. In previous versions of
SQL Server, de-duping the data often involved the use of cursors and temp tables. Since
the introduction of the ROW_NUMBER ranking function and common table expressions in
SQL Server 2005, you are able to de-dupe data with a single statement.
To demonstrate this approach, Listing 43.27 shows how to create an authors_import table
and populate it with some duplicate rows.
LISTING 43.27 Script to Create and Populate the authors_import Table
USE bigpubs2008
GO
CREATE TABLE dbo.authors_import(
au_id dbo.id NOT NULL,
au_lname varchar(30) NOT NULL,
au_fname varchar(20) NOT NULL)
go
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
VALUES('681-61-9588', 'Ahlberg', 'Allan')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
VALUES('739-35-5165', 'Ahlberg', 'Janet')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
VALUES('499-84-5672', 'Alexander', 'Lloyd')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
VALUES('499-84-5672', 'Alexander', 'Lloyd')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
VALUES('432-31-3829', 'Bate', 'W Jackson')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
VALUES('432-31-3829', 'Bate', 'W Jackson')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
VALUES('432-31-3829', 'Bate', 'W Jackson')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
VALUES('437-99-3329', 'Bauer', 'Caroline Feller')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
VALUES('378-33-9373', 'Benchley', 'Nathaniel')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
VALUES('378-33-9373', 'Benchley', 'Nate')
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
VALUES('409-56-7008', 'Bennet', 'Abraham')
GO
You can see in the data for Listing 43.27 that there are two duplicates for au_id
499-84-5672 and three for au_id 432-31-3829. To start identifying the duplicates, you can
write a query using the ROW_NUMBER() function to generate a unique row ID for each data
row, as shown in Listing 43.28.
LISTING 43.28 Using the ROW_NUMBER() Function to Generate Unique Row IDs
SELECT ROW_NUMBER() OVER (ORDER BY au_id, au_lname, au_fname) AS ROWID, *
FROM dbo.authors_import
go
ROWID au_id       au_lname  au_fname
----- ----------- --------- ---------------
1     378-33-9373 Benchley  Nate
2     378-33-9373 Benchley  Nathaniel
3     409-56-7008 Bennet    Abraham
4     432-31-3829 Bate      W Jackson
5     432-31-3829 Bate      W Jackson
6     432-31-3829 Bate      W Jackson
7     437-99-3329 Bauer     Caroline Feller
8     499-84-5672 Alexander Lloyd
9     499-84-5672 Alexander Lloyd
10    681-61-9588 Ahlberg   Allan
11    739-35-5165 Ahlberg   Janet
Now you can use the query shown in Listing 43.28 to build a common table expression to
find the duplicate rows. In this case, we keep the first row found. To make sure it works
correctly, write the query first as a SELECT statement to verify that it is identifying the
correct rows, as shown in Listing 43.29.
LISTING 43.29 Using a Common Table Expression to Identify Duplicate Rows
WITH authors_import AS
(SELECT ROW_NUMBER() OVER (ORDER BY au_id, au_lname, au_fname) AS ROWID, *
FROM dbo.authors_import)
select * FROM authors_import WHERE ROWID NOT IN
(SELECT MIN(ROWID) FROM authors_import
GROUP BY au_id,au_fname, au_lname);
GO
ROWID au_id       au_lname  au_fname
----- ----------- --------- ---------------
5     432-31-3829 Bate      W Jackson
6     432-31-3829 Bate      W Jackson
9     499-84-5672 Alexander Lloyd
Now you simply change the final SELECT statement in Listing 43.29 into a DELETE
statement, and it removes the duplicate rows from authors_import:
WITH authors_import AS
(SELECT ROW_NUMBER() OVER (ORDER BY au_id, au_lname, au_fname) AS ROWID, *
FROM dbo.authors_import)
delete FROM authors_import WHERE ROWID NOT IN
(SELECT MIN(ROWID) FROM authors_import
GROUP BY au_id,au_fname, au_lname);
GO
select * from authors_import
go
au_id       au_lname  au_fname
----------- --------- ---------------
681-61-9588 Ahlberg   Allan
739-35-5165 Ahlberg   Janet
499-84-5672 Alexander Lloyd
432-31-3829 Bate      W Jackson
437-99-3329 Bauer     Caroline Feller
378-33-9373 Benchley  Nathaniel
378-33-9373 Benchley  Nate
409-56-7008 Bennet    Abraham
If you want to retain the last duplicate record and delete the previous ones, you can
replace the MIN function with the MAX function in the DELETE statement.
What counts as a duplicate is determined by the columns specified in the GROUP BY clause
of the subquery. Notice that there are still two records for au_id 378-33-9373 remaining
in the final record set. The duplicates removed were based on au_id, au_lname, and
au_fname. Because the first name is different for each of the two instances of au_id
378-33-9373, both Nathaniel Benchley and Nate Benchley remain in the authors_import
table. If you remove au_fname from the GROUP BY clause, only the row with the lowest
ROWID for that au_id would remain. With the ordering used in Listing 43.28, that is the
row for Nate Benchley (ROWID 1), and the row for Nathaniel Benchley would be removed.
However, this result may or may not be desirable. You would probably want to resolve the
disparity between Nathaniel and Nate and confirm manually that they are duplicate rows
before deleting them. Running the query in Listing 43.29 with au_fname removed from the
GROUP BY clause helps you better determine what your final record set would look like.
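A commonly used variation on this technique, shown here as a sketch rather than as one of
the chapter's listings, lets ROW_NUMBER() do the grouping itself with a PARTITION BY
clause. Each set of rows sharing the same au_id, au_lname, and au_fname is numbered from
1, so any row numbered higher than 1 is a duplicate. The CTE name numbered_authors and the
column alias dup_seq are illustrative names, not part of the original example.

```sql
-- Sketch: number the rows within each duplicate group, then delete all but the first.
WITH numbered_authors AS
(
    SELECT au_id, au_lname, au_fname,
           ROW_NUMBER() OVER (PARTITION BY au_id, au_lname, au_fname
                              ORDER BY au_id) AS dup_seq
    FROM dbo.authors_import
)
DELETE FROM numbered_authors
WHERE dup_seq > 1;
GO
```

As with the MIN(ROWID) approach, the columns that define a duplicate are easy to change:
removing au_fname from the PARTITION BY list would treat the two Benchley rows as
duplicates. Running the statement first with SELECT * in place of DELETE shows which rows
would be removed.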
In Case You Missed It: New Transact-SQL Features in SQL Server 2005
SQL Server 2005 introduced some new features and changes to the Transact-SQL (T-SQL)
language:
The xml data type
The max specifier for the varchar and varbinary data types
TOP enhancements
The OUTPUT clause
Common table expressions (CTEs)
Ranking functions
PIVOT and UNPIVOT
The APPLY operator
TRY...CATCH logic for error handling
The TABLESAMPLE clause
NOTE
Unless stated otherwise, all examples in this chapter make use of tables in the
bigpubs2008 database.
The xml Data Type
SQL Server 2005 introduced a new xml data type that supports storing XML documents
and fragments in database columns or variables. The xml data type can be used with local
variable declarations, as the output of user-defined functions, as input parameters to
stored procedures and functions, and much more. The results of a FOR XML statement can
easily be stored in a column, stored procedure parameter, or local variable. XML data is
stored in an internal binary format and can be up to 2GB in size. XML instances stored in
xml columns can contain up to 128 levels of nesting.
xml columns can also be used to store code files such as XSLT, XSD, XHTML, and any
other well-formed content. These files can then be retrieved by user-defined functions
written in managed code hosted by SQL Server. (See Chapter 45, “SQL Server and the .NET
Framework,” for a full review of SQL Server managed hosting.)
For more information and detailed examples on using the xml data type, see Chapter 47,
“Using XML in SQL Server 2008.”
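As a brief illustration of storing FOR XML output in a local variable, consider the
following sketch. It assumes that the bigpubs2008 database contains an authors table with
au_id, au_lname, and au_fname columns; the element names author and authors and the
variable name @author_xml are chosen here purely for illustration.

```sql
-- Sketch: capture FOR XML output in an xml variable, then read a value back out.
DECLARE @author_xml xml;

SET @author_xml = (SELECT TOP (3) au_id, au_lname, au_fname
                   FROM dbo.authors
                   ORDER BY au_lname
                   FOR XML PATH('author'), ROOT('authors'), TYPE);

-- Return the whole document
SELECT @author_xml AS authors_xml;

-- Extract the last name from the first <author> element with the value() method
SELECT @author_xml.value('(/authors/author/au_lname)[1]', 'varchar(40)') AS first_lname;
GO
```

The TYPE directive makes the subquery return an xml value rather than a character string,
so no conversion is needed on assignment.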
The max Specifier
In SQL Server 2000, the most data that could be stored in a varchar, nvarchar, or
varbinary column was 8,000 bytes. If you needed to store a larger value in a single
column, you had to use the large object (LOB) data types: text, ntext, or image. The main
disadvantage of using the LOB data types is that they cannot be used in many places
where varchar or varbinary data types can be used (for example, as local variables, as
arguments to SQL Server string manipulation functions such as REPLACE, and in string
concatenation operations).
SQL Server 2005 introduced the max specifier for varchar and varbinary data types. This
specifier expands the storage capabilities of the varchar and varbinary data types to store
up to 2^31-1 bytes of data, which is the same maximum size as the text and image data types.
The main difference is that these large value data types can be used just like the smaller
varchar, nvarchar, and varbinary data types. The large value data types can be used in
functions where LOB objects cannot (such as the REPLACE function), as data types for
Transact-SQL variables, and in string concatenation operations. They can also be used in
the DISTINCT, ORDER BY, and GROUP BY clauses of a SELECT statement as well as in
aggregates, joins, and subqueries.
The following example shows a local variable being defined using the varchar(max) data
type:
declare @maxvar varchar(max)
go
However, a similar variable cannot be defined using the text data type:
declare @textvar text
go
Msg 2739, Level 16, State 1, Line 2
The text, ntext, and image data types are invalid for local variables.
The remaining examples in this section make use of the following table to demonstrate
the differences between a varchar(max) column and a text column:
create table maxtest (maxcol varchar(max),
textcol text)
go
-- populate the columns with some sample data
insert maxtest
select replicate('1234567890', 1000), replicate('1234567890', 1000)
go
In the following example, you can see that the substring function works with both
varchar(max) and text data types:
select substring (maxcol, 1, 10),
substring (textcol, 1, 10)
from maxtest
go
maxcol     textcol
---------- ----------
1234567890 1234567890
However, in this example, you can see that while a varchar(max) column can be used for
string concatenation, the text data type cannot:
select substring('xxx' + maxcol, 1, 10) from maxtest
go
----------
xxx1234567
select substring('xxx' + textcol, 1, 10) from maxtest
go
Msg 402, Level 16, State 1, Line 1
The data types varchar and text are incompatible in the add operator.
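If an existing text column cannot be changed right away, one possible workaround (a
sketch, not part of the original example set) is to convert the column explicitly to
varchar(max) inside the expression, which makes the concatenation legal:

```sql
-- Sketch: CAST the text column to varchar(max) before concatenating.
select substring('xxx' + cast(textcol as varchar(max)), 1, 10) from maxtest
go
```

With the sample data inserted into maxtest earlier, this query returns xxx1234567, the
same value produced by the varchar(max) column.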
With the introduction of the max specifier, the large value data types are able to store
data with the same maximum size as the LOB data types, but with the ability to be used
just as their smaller varchar, nvarchar, and varbinary counterparts. It is recommended that
the max data types be used instead of the LOB data types because the LOB data types are
deprecated and are expected to be removed in a future release of SQL Server.
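The following sketch illustrates a large value variable being passed to string functions.
The variable name @big is an assumption made for illustration. Note the CAST on the
argument to REPLICATE: when the input is not a max type, REPLICATE truncates its result at
8,000 bytes, so the cast is what allows the result to grow to 10,000 characters.

```sql
-- Sketch: build a 10,000-character value and manipulate it with REPLACE.
DECLARE @big varchar(max);

-- Casting the input to varchar(max) avoids truncation at 8,000 bytes
SET @big = REPLICATE(CAST('1234567890' AS varchar(max)), 1000);

SELECT LEN(@big)                   AS len_before,  -- 10000
       LEN(REPLACE(@big, '0', '')) AS len_after;   -- 9000 (one '0' removed per 10 characters)
GO
```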
TOP Enhancements
The TOP clause allows you to specify the number or percentage of rows to be returned by a
SELECT statement. SQL Server 2005 introduced the capability for the TOP clause to also be
used in INSERT, UPDATE, and DELETE statements. The syntax was also enhanced to allow
the use of a numeric expression for the number value rather than having to use a
hard-coded number.
The syntax for the TOP clause is as follows:
SELECT [TOP (numeric_expression) [PERCENT] [WITH TIES]] ...
FROM table_name ... [ORDER BY ...]
DELETE [TOP (numeric_expression) [PERCENT]] FROM table_name ...
UPDATE [TOP (numeric_expression) [PERCENT]] table_name SET ...
INSERT [TOP (numeric_expression) [PERCENT]] INTO table_name ...
Specifying the TOP value without parentheses is supported in SELECT queries only for
backward compatibility. The parentheses around the expression are always required when
TOP is used in UPDATE, INSERT, or DELETE statements.
If you do not specify the PERCENT option, the numeric expression must be implicitly
convertible to the bigint data type. If you specify the PERCENT option, the numeric
expression must be implicitly convertible to float and fall within the range of 0 to 100.
The WITH TIES option with the ORDER BY clause is supported only with SELECT statements.
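To illustrate the PERCENT option with a numeric expression, the following sketch returns
the most expensive 10 percent of the rows in the titles table. The variable name @pct and
the choice of 10 percent are assumptions made for illustration.

```sql
-- Sketch: TOP with PERCENT and a variable as the numeric expression.
DECLARE @pct float;
SET @pct = 10;   -- must fall within the range 0 to 100

SELECT TOP (@pct) PERCENT title_id, price
FROM dbo.titles
ORDER BY price DESC;
GO
```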
The following example shows the use of a local variable as the numeric expression for the
TOP clause to limit the number of rows returned by a SELECT statement:
declare @rows int
select @rows = 5
select top (@rows) * from sales
go
stor_id ord_num              ord_date                qty payterms title_id
------- -------------------- ----------------------- --- -------- --------
6380    6871                 2007-09-14 00:00:00.000 5   Net 60   BU1032
6380    722a                 2007-09-13 00:00:00.000 3   Net 60   PS2091
6380    ONFFFFFFFFFFFFFFFFFF 2007-08-09 00:00:00.000 852 Net 30   FI1980
7066    A2976                2006-05-24 00:00:00.000 50  Net 30   PC8888
7066    ONAAAAAAAAAA         2007-01-13 00:00:00.000 948 Net 60   CH2480
Allowing the use of a numeric expression rather than a constant for the TOP command is
especially useful when the number of requested rows is passed as a parameter to a stored
procedure or function. When you use a subquery as the numeric expression, it must be
self-contained; it cannot refer to columns of a table in the outer query. Using a
self-contained subquery allows you to more easily develop queries for dynamic requests,
such as “calculate the average number of titles published per month and return that many
titles which were most recently published”:
SELECT TOP(SELECT COUNT(*)/DATEDIFF(month, MIN(pubdate), MAX(pubdate))
FROM titles)
title_id, pub_id, pubdate
FROM titles
ORDER BY pubdate DESC
go
title_id pub_id pubdate
-------- ------ -----------------------
CH9009   9903   2009-05-31 00:00:00.000
PC9999   1389   2009-03-31 00:00:00.000
FI0375   9901   2008-09-24 00:00:00.000
DR4250   9904   2008-09-21 00:00:00.000
BI4785   9914   2008-09-20 00:00:00.000
BI0194   9911   2008-09-19 00:00:00.000
BI3224   9905   2008-09-18 00:00:00.000
FI0435   9917   2008-09-17 00:00:00.000
FI0792   9907   2008-09-13 00:00:00.000
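The stored procedure case mentioned previously might look something like the following
sketch. The procedure name get_recent_titles, the parameter name @rows, and its default
value are hypothetical and do not appear in bigpubs2008.

```sql
-- Sketch: pass the requested row count to TOP as a stored procedure parameter.
CREATE PROCEDURE dbo.get_recent_titles
    @rows int = 10          -- number of titles to return
AS
BEGIN
    SET NOCOUNT ON;

    SELECT TOP (@rows) title_id, pub_id, pubdate
    FROM dbo.titles
    ORDER BY pubdate DESC;
END;
GO

-- Return the five most recently published titles
EXEC dbo.get_recent_titles @rows = 5;
GO
```

Before TOP accepted an expression, a procedure like this typically had to fall back on
SET ROWCOUNT or dynamic SQL to honor a caller-supplied row limit.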
NOTE
Be aware that the TOP keyword does not necessarily speed up a query if the query also
contains an ORDER BY clause. Unless an index can supply the rows in the requested order,
the entire result set must be sorted before the top N rows in the ordered result set are
returned.
When using the TOP keyword, you can also add the WITH TIES option to specify that
additional rows should be returned from the result set if duplicate values of the columns
specified in the ORDER BY clause exist within the last values returned. The WITH TIES
option can be specified only if an ORDER BY clause is specified. The following query
returns the top four most expensive books:
SELECT TOP 4 price, title
FROM titles
ORDER BY price DESC
go
price   title
------- -----------------------------------
17.1675 But Is It User Friendly?
17.0884 Is Anger the Enemy?
15.9329 Emotional Security: A New Algorithm
15.894  You Can Combat Computer Stress!
If you use WITH TIES, you can see that there is an additional row with the same price
(15.894) as the last row returned by the previous query:
SELECT TOP 4 WITH TIES price, title
FROM titles
ORDER BY price DESC
go
price   title
------- -----------------------------------
17.1675 But Is It User Friendly?
17.0884 Is Anger the Enemy?
15.9329 Emotional Security: A New Algorithm
15.894  The Gourmet Microwave
15.894  You Can Combat Computer Stress!
In versions of SQL Server prior to 2005, if you wanted to limit the number of rows affected
by an UPDATE statement or a DELETE statement, you had to use the SET ROWCOUNT statement:
set rowcount 100
DELETE sales where ord_date < (select dateadd(year, 1, min(ord_date)) from sales)
set rowcount 0
SET ROWCOUNT often was used in this way to allow backing up and pruning of the transaction
log during a purge process and also to prevent lock escalation. The problem with SET
ROWCOUNT is that it applies to the entire current user session. You have to remember to set
the rowcount back to 0 to be sure you don't limit the rows affected by subsequent
statements. With TOP, you can more easily specify the desired number of rows for each
individual statement:
DELETE top (100) sales
where ord_date < (select dateadd(year, 1, min(ord_date)) from sales)
UPDATE top (100) titles
set royalty = royalty * 1.25
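For the purge scenario described previously, DELETE TOP can be placed in a loop so that
the rows are removed in small batches. The following is a minimal sketch; the @cutoff
variable is an assumption made for illustration, and the batch size of 100 simply mirrors
the preceding examples.

```sql
-- Sketch: purge the oldest year of sales in batches of 100 rows.
DECLARE @cutoff datetime;

-- Compute the cutoff once so that it does not shift as rows are deleted
SELECT @cutoff = DATEADD(year, 1, MIN(ord_date))
FROM dbo.sales;

WHILE 1 = 1
BEGIN
    DELETE TOP (100) FROM dbo.sales
    WHERE ord_date < @cutoff;

    -- Stop when a pass finds nothing left to delete
    IF @@ROWCOUNT = 0
        BREAK;

    -- A transaction log backup could be taken here between batches
END;
GO
```

Because the limit is part of the DELETE statement itself, there is no session-level
setting to reset when the loop finishes.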
You may be thinking that using TOP in INSERT statements is not really necessary because
you can always specify it in a SELECT query, as shown in Listing 43.30.
LISTING 43.30 Limiting Rows for Insert with TOP in a SELECT Statement
CREATE TABLE top_sales
(stor_id char(4),
ord_num varchar(20),
ord_date datetime NOT NULL,
qty smallint NOT NULL,
payterms varchar(12) ,
title_id dbo.tid NOT NULL)
go
insert top_sales
select top 100 * from sales
where qty > 1700
order by qty desc
However, you may find using the TOP clause in an INSERT statement useful when
inserting the result of an EXEC command or the result of a UNION operation, as shown in
Listing 43.31.
LISTING 43.31 Using TOP in an Insert with a UNION ALL Query
insert top (50) into top_sales
select stor_id, ord_num, ord_date, qty, payterms, title_id from sales
where qty >= 1800
union all
select stor_id, ord_num, ord_date, qty, payterms, title_id from sales_big
where qty >= 1800
order by qty desc
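The EXEC case can be sketched in a similar way. The procedure get_large_sales and its
@min_qty parameter are hypothetical and are created here only so that the example is
self-contained; the top_sales table is the one created in Listing 43.30.

```sql
-- Hypothetical procedure that returns rows shaped like the sales table
CREATE PROCEDURE dbo.get_large_sales
    @min_qty smallint
AS
    SELECT stor_id, ord_num, ord_date, qty, payterms, title_id
    FROM dbo.sales
    WHERE qty >= @min_qty;
GO

-- TOP caps how many of the rows returned by the procedure are inserted
INSERT TOP (50) INTO top_sales (stor_id, ord_num, ord_date, qty, payterms, title_id)
EXEC dbo.get_large_sales @min_qty = 1800;
GO
```

Here TOP is the only convenient place to apply the limit, because the SELECT statement
inside the procedure is not available to the caller for modification.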
When a TOP (n) clause is used with DELETE, UPDATE, or INSERT, the selection of rows on
which the operation is performed is not guaranteed. If you want the TOP (n) clause to
operate on rows in a meaningful chronological order, you must use TOP together with
ORDER BY in a subselect statement. The following query deletes the 10 rows of the
sales_big table that have the earliest order dates:
delete from sales_big
where sales_id in (select top 10 sales_id
from sales_big order by ord_date)
To ensure that only 10 rows are deleted, the column specified in the subselect statement
(sales_id) must be the primary key of the table. Using a nonkey column in the subselect
statement could result in the deletion of more than 10 rows if the specified column
matched duplicate values.
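Another way to express the same delete, offered here as an alternative sketch rather than
as the approach used in this chapter, is to put TOP and ORDER BY in a common table
expression and delete from the CTE directly. The CTE name oldest_sales is illustrative.

```sql
-- Sketch: delete exactly the 10 rows returned by the ordered TOP query.
WITH oldest_sales AS
(
    SELECT TOP (10) sales_id, ord_date
    FROM dbo.sales_big
    ORDER BY ord_date
)
DELETE FROM oldest_sales;
GO
```

Because the DELETE operates on the rows returned by the CTE rather than on values matched
through an IN list, it removes no more than 10 rows even when no unique key column is
available. If several rows tie on ord_date at the cutoff, which of the tied rows are
deleted is still not guaranteed.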
NOTE
SQL Server Books Online states that when you use TOP (n) with INSERT, UPDATE, and
DELETE operations, the rows affected should be a random selection of the TOP (n) rows
from the underlying table. In practice, this behavior has not been observed. Using TOP
(n) with INSERT, UPDATE, and DELETE appears to affect only the first n matching rows.
However, because the row selection is not guaranteed, it is still recommended that you
use TOP together with ORDER BY in a subselect to ensure the expected result.
The OUTPUT Clause
By default, the execution of a DML statement such as INSERT, UPDATE, or DELETE does not
produce any results that indicate which rows changed. The most you can do is check
@@ROWCOUNT to determine the number of rows affected.
In SQL Server 2005, the INSERT, UPDATE, and DELETE statements were enhanced to support
an OUTPUT clause to be able to identify the actual rows affected by the DML statement. The
OUTPUT clause allows you to return data from a modification statement (INSERT, UPDATE, or
DELETE). This data can be returned as a result set to the caller or returned into a table
variable or an output table. To capture information on the affected rows, the OUTPUT clause
provides access to the inserted and deleted virtual tables that are normally accessible