Microsoft SQL Server 2008 R2 Unleashed


CHAPTER 43 Transact-SQL Programming Guidelines, Tips, and Tricks

select @rowcnt = @@ROWCOUNT, @error = @@ERROR
if @rowcnt = 0
    print 'no rows updated'
if @error <> 0
    raiserror ('Update of titles failed', 16, 1)
return

NOTE

Error processing was improved in SQL Server 2005 with the introduction of the TRY...CATCH construct in T-SQL. It provides a much more robust method of error handling than checking @@ERROR for error conditions. The TRY...CATCH construct is discussed in more detail later in this chapter.
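As a preview, here is a minimal sketch of the same update check written with TRY...CATCH. It runs against a hypothetical temp table, #titles_demo, not the real titles table. It also writes the caught error to a hypothetical #error_log table so you can inspect the result. In production code you would more likely re-raise the error from the CATCH block than only print a message.

```sql
-- Hypothetical stand-in for the titles table; the CHECK constraint lets us force an error.
CREATE TABLE #titles_demo (
    title_id varchar(6) NOT NULL PRIMARY KEY,
    price    money      NOT NULL CHECK (price >= 0)
);
INSERT INTO #titles_demo (title_id, price) VALUES ('BU1032', 19.99);

-- Hypothetical log table used only to make the caught error visible.
CREATE TABLE #error_log (error_number int, error_message nvarchar(4000));

BEGIN TRY
    -- This UPDATE violates the CHECK constraint (error 547), so control passes
    -- to the CATCH block without an explicit @@ERROR test after the statement.
    UPDATE #titles_demo SET price = -1 WHERE title_id = 'BU1032';

    IF @@ROWCOUNT = 0
        PRINT 'no rows updated';
END TRY
BEGIN CATCH
    -- ERROR_NUMBER() and ERROR_MESSAGE() return values only inside a CATCH block.
    INSERT INTO #error_log (error_number, error_message)
    VALUES (ERROR_NUMBER(), ERROR_MESSAGE());

    PRINT 'Update of titles failed';
END CATCH;
```

Because the failed UPDATE never changes the row, the price stays at 19.99 and #error_log holds a single entry for error 547.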

De-Duping Data with Ranking Functions

One common problem encountered with imported data is unexpected duplicate data rows, especially if the data is being consolidated from multiple sources. In previous versions of SQL Server, de-duping the data often involved the use of cursors and temp tables. Since the introduction of the ROW_NUMBER ranking function and common table expressions in SQL Server 2005, you can de-dupe data with a single statement.

To demonstrate this approach, Listing 43.27 shows how to create an authors_import table and populate it with some duplicate rows.

LISTING 43.27 Script to Create and Populate the authors_import Table

USE bigpubs2008
GO
CREATE TABLE dbo.authors_import(
    au_id    dbo.id      NOT NULL,
    au_lname varchar(30) NOT NULL,
    au_fname varchar(20) NOT NULL)
go
INSERT INTO dbo.authors_import(au_id, au_lname, au_fname)
VALUES ('681-61-9588', 'Ahlberg', 'Allan'),
       ('739-35-5165', 'Ahlberg', 'Janet'),
       ('499-84-5672', 'Alexander', 'Lloyd'),
       ('499-84-5672', 'Alexander', 'Lloyd'),
       ('432-31-3829', 'Bate', 'W Jackson'),
       ('432-31-3829', 'Bate', 'W Jackson'),
       ('432-31-3829', 'Bate', 'W Jackson'),
       ('437-99-3329', 'Bauer', 'Caroline Feller'),
       ('378-33-9373', 'Benchley', 'Nathaniel'),
       ('378-33-9373', 'Benchley', 'Nate'),
       ('409-56-7008', 'Bennet', 'Abraham')
GO

You can see in the data for Listing 43.27 that there are two copies of the row for au_id 499-84-5672 and three copies of the row for au_id 432-31-3829. To start identifying the duplicates, you can write a query using the ROW_NUMBER() function to generate a unique row ID for each data row, as shown in Listing 43.28.

LISTING 43.28 Using the ROW_NUMBER() Function to Generate Unique Row IDs

SELECT ROW_NUMBER() OVER (ORDER BY au_id, au_lname, au_fname) AS ROWID, *

FROM dbo.authors_import

go

ROWID au_id       au_lname  au_fname
----- ----------- --------- ---------------
1     378-33-9373 Benchley  Nate
2     378-33-9373 Benchley  Nathaniel
3     409-56-7008 Bennet    Abraham
4     432-31-3829 Bate      W Jackson
5     432-31-3829 Bate      W Jackson
6     432-31-3829 Bate      W Jackson
7     437-99-3329 Bauer     Caroline Feller
8     499-84-5672 Alexander Lloyd
9     499-84-5672 Alexander Lloyd
10    681-61-9588 Ahlberg   Allan
11    739-35-5165 Ahlberg   Janet

Now you can use the query shown in Listing 43.28 to build a common table expression to find the duplicate rows. In this case, we keep the first row found. To make sure it works correctly, write the query first as a SELECT statement to verify that it is identifying the correct rows, as shown in Listing 43.29.


LISTING 43.29 Using a Common Table Expression to Identify Duplicate Rows

WITH authors_import AS

(SELECT ROW_NUMBER() OVER (ORDER BY au_id, au_lname, au_fname) AS ROWID, *

FROM dbo.authors_import)

select * FROM authors_import WHERE ROWID NOT IN

(SELECT MIN(ROWID) FROM authors_import

GROUP BY au_id,au_fname, au_lname);

GO

ROWID au_id       au_lname  au_fname
----- ----------- --------- ---------
5     432-31-3829 Bate      W Jackson
6     432-31-3829 Bate      W Jackson
9     499-84-5672 Alexander Lloyd

Now you simply change the final SELECT statement in Listing 43.29 into a DELETE statement, and it removes the duplicate rows from authors_import:

WITH authors_import AS

(SELECT ROW_NUMBER() OVER (ORDER BY au_id, au_lname, au_fname) AS ROWID, *

FROM dbo.authors_import)

delete FROM authors_import WHERE ROWID NOT IN

(SELECT MIN(ROWID) FROM authors_import

GROUP BY au_id,au_fname, au_lname);

GO

select * from authors_import

go

au_id       au_lname  au_fname
----------- --------- ---------------
681-61-9588 Ahlberg   Allan
739-35-5165 Ahlberg   Janet
499-84-5672 Alexander Lloyd
432-31-3829 Bate      W Jackson
437-99-3329 Bauer     Caroline Feller
378-33-9373 Benchley  Nathaniel
378-33-9373 Benchley  Nate
409-56-7008 Bennet    Abraham

If you want to retain the last duplicate record and delete the previous ones, you can replace the MIN function with the MAX function in the DELETE statement.

Notice that the uniqueness of the duplication is determined by the columns specified in the GROUP BY clause of the subquery. Two records for au_id 378-33-9373 still remain in the final record set because the duplicates removed were based on au_id, au_lname, and au_fname. The first name is different for each of the two instances of au_id 378-33-9373, so both Nathaniel Benchley and Nate Benchley remain in the authors_import table. If you remove au_fname from the GROUP BY clause, only the row with the lowest ROWID for au_id 378-33-9373 would remain. The ROW_NUMBER() function orders by au_fname, so that row is Nate Benchley (ROWID 1 in Listing 43.28), and Nathaniel Benchley would be removed. However, this result may or may not be desirable. You would probably want to resolve the disparity between Nathaniel and Nate and confirm manually that they are duplicate rows before deleting them. Running the query in Listing 43.29 with au_fname removed from the GROUP BY clause shows you what your final record set would look like.
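A common variation on the same idea works differently from Listing 43.29. It numbers the rows within each group of duplicates using PARTITION BY and then deletes every row numbered above 1, so no GROUP BY subquery is needed. The sketch below uses a hypothetical temp table, #authors_dupes, loaded with part of the sample data. As in the GROUP BY version, the column list (here in PARTITION BY) defines what counts as a duplicate.

```sql
-- Hypothetical copy of some of the authors_import sample rows.
CREATE TABLE #authors_dupes (
    au_id    varchar(11) NOT NULL,
    au_lname varchar(30) NOT NULL,
    au_fname varchar(20) NOT NULL
);
INSERT INTO #authors_dupes (au_id, au_lname, au_fname)
VALUES ('499-84-5672', 'Alexander', 'Lloyd'),
       ('499-84-5672', 'Alexander', 'Lloyd'),
       ('432-31-3829', 'Bate', 'W Jackson'),
       ('432-31-3829', 'Bate', 'W Jackson'),
       ('432-31-3829', 'Bate', 'W Jackson'),
       ('378-33-9373', 'Benchley', 'Nathaniel'),
       ('378-33-9373', 'Benchley', 'Nate');

-- Number the rows 1, 2, 3, ... within each set of identical (au_id, au_lname, au_fname) values.
-- ORDER BY (SELECT NULL) is enough here because the rows in a group are identical,
-- so it does not matter which one is kept.
WITH numbered AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY au_id, au_lname, au_fname
                              ORDER BY (SELECT NULL)) AS rn
    FROM #authors_dupes
)
DELETE FROM numbered
WHERE rn > 1;   -- keeps exactly one row per group

SELECT au_id, au_lname, au_fname FROM #authors_dupes;
```

Four rows remain: one Alexander row, one Bate row, and both Benchley rows, because Nathaniel and Nate differ in au_fname.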

In Case You Missed It: New Transact-SQL Features in SQL Server 2005

SQL Server 2005 introduced some new features and changes to the Transact-SQL (T-SQL) language:

The xml data type
The max specifier for the varchar and varbinary data types
TOP enhancements
The OUTPUT clause
Common table expressions (CTEs)
Ranking functions
PIVOT and UNPIVOT
The APPLY operator
TRY...CATCH logic for error handling
The TABLESAMPLE clause

NOTE

Unless stated otherwise, all examples in this chapter make use of tables in the bigpubs2008 database.

The xml Data Type

SQL Server 2005 introduced a new xml data type that supports storing XML documents and fragments in database columns or variables. The xml data type can be used with local variable declarations, as the output of user-defined functions, as input parameters to stored procedures and functions, and much more. The results of a FOR XML statement can easily be stored in a column, stored procedure parameter, or local variable. XML data is stored in an internal binary format and can be up to 2GB in size. XML instances stored in xml columns can contain up to 128 levels of nesting.


You can also use xml columns to store code files such as XSLT, XSD, XHTML, and any other well-formed content. These files can then be retrieved by user-defined functions written in managed code hosted by SQL Server. (See Chapter 45, “SQL Server and the .NET Framework,” for a full review of SQL Server managed hosting.)

For more information and detailed examples on using the xml data type, see Chapter 47, “Using XML in SQL Server 2008.”
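As a small illustration, the following sketch stores the result of a FOR XML query in an xml variable and then in an xml column. It then reads the stored value back with the xml type's value() method. The temp tables #xml_authors and #xml_store are hypothetical. FOR XML PATH with ROOT and TYPE is just one of several ways to shape the XML.

```sql
-- Hypothetical source table for the example.
CREATE TABLE #xml_authors (au_id varchar(11) NOT NULL, au_lname varchar(30) NOT NULL);
INSERT INTO #xml_authors (au_id, au_lname)
VALUES ('681-61-9588', 'Ahlberg'), ('739-35-5165', 'Ahlberg');

-- TYPE makes FOR XML return the xml data type rather than a character string.
DECLARE @authors xml;
SET @authors = (SELECT au_id, au_lname
                FROM #xml_authors
                ORDER BY au_id
                FOR XML PATH('author'), ROOT('authors'), TYPE);

-- The same value can be stored in an xml column.
CREATE TABLE #xml_store (doc_id int IDENTITY(1,1) PRIMARY KEY, doc xml NOT NULL);
INSERT INTO #xml_store (doc) VALUES (@authors);

-- value() evaluates an XQuery expression that must return a single value.
SELECT doc_id,
       doc.value('count(/authors/author)', 'int')             AS author_count,
       doc.value('(/authors/author/au_id)[1]', 'varchar(11)') AS first_au_id
FROM #xml_store;
```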

The max Specifier

In SQL Server 2000, the most data that could be stored in a varchar, nvarchar, or varbinary column was 8,000 bytes. If you needed to store a larger value in a single column, you had to use the large object (LOB) data types: text, ntext, or image. The main disadvantage of using the LOB data types is that they cannot be used in many places where varchar or varbinary data types can be used (for example, as local variables, as arguments to SQL Server string manipulation functions such as REPLACE, and in string concatenation operations).

SQL Server 2005 introduced the max specifier for the varchar, nvarchar, and varbinary data types. This specifier expands the storage capabilities of these data types to store up to 2^31-1 bytes of data, which is the same maximum size as the text and image data types. The main difference is that these large value data types can be used just like the smaller varchar, nvarchar, and varbinary data types. The large value data types can be used in functions where LOB objects cannot (such as the REPLACE function), as data types for Transact-SQL variables, and in string concatenation operations. They can also be used in the DISTINCT, ORDER BY, and GROUP BY clauses of a SELECT statement as well as in aggregates, joins, and subqueries.

The following example shows a local variable being defined using the varchar(max) data type:

declare @maxvar varchar(max)

go

However, a similar variable cannot be defined using the text data type:

declare @textvar text

go

Msg 2739, Level 16, State 1, Line 2

The text, ntext, and image data types are invalid for local variables.

The remaining examples in this section make use of the following table to demonstrate the differences between a varchar(max) column and a text column:

create table maxtest (maxcol varchar(max),
                      textcol text)
go
-- populate the columns with some sample data
insert maxtest
select replicate('1234567890', 1000), replicate('1234567890', 1000)
go

In the following example, you can see that the substring function works with both the varchar(max) and text data types:

select substring (maxcol, 1, 10),

substring (textcol, 1, 10)

from maxtest

go

maxcol     textcol
---------- ----------
1234567890 1234567890

However, in this example, you can see that while a varchar(max) column can be used for string concatenation, the text data type cannot:

select substring('xxx' + maxcol, 1, 10) from maxtest
go

----------
xxx1234567

select substring('xxx' + textcol, 1, 10) from maxtest

go

Msg 402, Level 16, State 1, Line 1

The data types varchar and text are incompatible in the add operator.

With the introduction of the max specifier, the large value data types can store data up to the same maximum size as the LOB data types, yet they can be used just like their smaller varchar, nvarchar, and varbinary counterparts. It is recommended that you use the max data types instead of the LOB data types because the text, ntext, and image data types are deprecated and will be removed in a future release of SQL Server.
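The following sketch passes a varchar(max) variable holding more than 8,000 characters to REPLACE and SUBSTRING. It also shows a pitfall worth knowing: REPLICATE truncates its result at 8,000 bytes unless its input is already varchar(max), so the literal is cast first. The temp table #max_demo is hypothetical and exists only to capture the results.

```sql
DECLARE @big varchar(max);

-- Cast the input so that REPLICATE can return more than 8,000 bytes.
SET @big = REPLICATE(CAST('1234567890' AS varchar(max)), 1000);  -- 10,000 characters

-- String functions such as REPLACE accept varchar(max) values directly.
SET @big = REPLACE(@big, '0', 'X');

CREATE TABLE #max_demo (
    label      varchar(20) NOT NULL,
    char_count bigint      NOT NULL,
    tail       varchar(10) NULL
);

INSERT INTO #max_demo (label, char_count, tail)
VALUES ('varchar(max) input', LEN(@big), SUBSTRING(@big, 9991, 10));

-- Without the cast, the input is a short varchar literal and the result stops at 8,000 characters.
INSERT INTO #max_demo (label, char_count, tail)
VALUES ('varchar input', LEN(REPLICATE('1234567890', 1000)), NULL);

SELECT label, char_count, tail FROM #max_demo;
```

The varchar(max) row reports 10,000 characters ending in 123456789X. The uncast row reports 8,000.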

TOP Enhancements

The TOP clause allows you to specify the number or percentage of rows to be returned by a SELECT statement. SQL Server 2005 introduced the capability for the TOP clause to also be used in INSERT, UPDATE, and DELETE statements. The syntax was also enhanced to allow the use of a numeric expression for the number value rather than having to use a hard-coded number.


The syntax for the TOP clause is as follows:

SELECT [TOP (numeric_expression) [PERCENT] [WITH TIES]] ...
FROM table_name [ORDER BY ...]
DELETE [TOP (numeric_expression) [PERCENT]] FROM table_name ...
UPDATE [TOP (numeric_expression) [PERCENT]] table_name SET ...
INSERT [TOP (numeric_expression) [PERCENT]] INTO table_name ...

Using TOP without parentheses around the numeric expression is supported in SELECT queries only for backward compatibility. The parentheses around the expression are always required when TOP is used in UPDATE, INSERT, or DELETE statements.

If you do not specify the PERCENT option, the numeric expression must be implicitly convertible to the bigint data type. If you specify the PERCENT option, the numeric expression must be implicitly convertible to float and fall within the range of 0 to 100. The WITH TIES option with the ORDER BY clause is supported only with SELECT statements.

The following example shows the use of a local variable as the numeric expression for the TOP clause to limit the number of rows returned by a SELECT statement:

declare @rows int

select @rows = 5

select top (@rows) * from sales

go

stor_id ord_num              ord_date                qty payterms title_id
------- -------------------- ----------------------- --- -------- --------
6380    6871                 2007-09-14 00:00:00.000 5   Net 60   BU1032
6380    722a                 2007-09-13 00:00:00.000 3   Net 60   PS2091
6380    ONFFFFFFFFFFFFFFFFFF 2007-08-09 00:00:00.000 852 Net 30   FI1980
7066    A2976                2006-05-24 00:00:00.000 50  Net 30   PC8888
7066    ONAAAAAAAAAA         2007-01-13 00:00:00.000 948 Net 60   CH2480
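A variable can also supply the percentage when PERCENT is specified. In that case the value must be implicitly convertible to float and fall between 0 and 100. In the sketch below, the #price_demo table and its ten rows are hypothetical. Twenty percent of 10 rows is exactly 2, so the query copies the two highest-priced rows into #top_prices.

```sql
-- Hypothetical sample data: ten titles with distinct prices.
CREATE TABLE #price_demo (title varchar(40) NOT NULL, price money NOT NULL);
INSERT INTO #price_demo (title, price)
VALUES ('Title A', 10.00), ('Title B', 12.50), ('Title C', 9.99),
       ('Title D', 20.00), ('Title E', 15.00), ('Title F', 7.25),
       ('Title G', 11.00), ('Title H', 18.75), ('Title I', 5.00),
       ('Title J', 14.00);

-- With PERCENT, the expression is treated as a float percentage (0 through 100).
DECLARE @pct float;
SET @pct = 20;

SELECT TOP (@pct) PERCENT title, price
INTO #top_prices
FROM #price_demo
ORDER BY price DESC;

SELECT title, price FROM #top_prices;
```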

Allowing the use of a numeric expression rather than a constant for the TOP command is especially useful when the number of requested rows is passed as a parameter to a stored procedure or function. When you use a subquery as the numeric expression, it must be self-contained; it cannot refer to columns of a table in the outer query. A self-contained subquery makes it easier to develop queries for dynamic requests, such as “calculate the average number of titles published per month and return that many of the most recently published titles”:

SELECT TOP(SELECT COUNT(*)/DATEDIFF(month, MIN(pubdate), MAX(pubdate))

FROM titles)

title_id, pub_id, pubdate

FROM titles

ORDER BY pubdate DESC

go


title_id pub_id pubdate
-------- ------ -----------------------
CH9009   9903   2009-05-31 00:00:00.000
PC9999   1389   2009-03-31 00:00:00.000
FI0375   9901   2008-09-24 00:00:00.000
DR4250   9904   2008-09-21 00:00:00.000
BI4785   9914   2008-09-20 00:00:00.000
BI0194   9911   2008-09-19 00:00:00.000
BI3224   9905   2008-09-18 00:00:00.000
FI0435   9917   2008-09-17 00:00:00.000
FI0792   9907   2008-09-13 00:00:00.000

NOTE

Be aware that the TOP keyword does not necessarily speed up a query that also contains an ORDER BY clause. Unless an index already returns the rows in ORDER BY sequence, the entire qualifying result set has to be read and sorted before the top N rows in the ordered result set can be returned.

When using the TOP keyword, you can also add the WITH TIES option to specify that additional rows should be returned from the result set if duplicate values of the columns specified in the ORDER BY clause exist within the last values returned. The WITH TIES option can be specified only if an ORDER BY clause is specified. The following query returns the top four most expensive books:

SELECT TOP 4 price, title

FROM titles

ORDER BY price DESC

go

price   title
------- -----------------------------------
17.1675 But Is It User Friendly?
17.0884 Is Anger the Enemy?
15.9329 Emotional Security: A New Algorithm
15.894  You Can Combat Computer Stress!

If you use WITH TIES, you can see that there is an additional row with the same price

(15.894) as the last row returned by the previous query:

SELECT TOP 4 WITH TIES price, title

FROM titles

ORDER BY price DESC

go


price   title
------- -----------------------------------
17.1675 But Is It User Friendly?
17.0884 Is Anger the Enemy?
15.9329 Emotional Security: A New Algorithm
15.894  The Gourmet Microwave
15.894  You Can Combat Computer Stress!

In versions of SQL Server prior to 2005, if you wanted to limit the number of rows affected by an UPDATE statement or a DELETE statement, you had to use the SET ROWCOUNT statement:

set rowcount 100

DELETE sales where ord_date < (select dateadd(year, 1, min(ord_date)) from sales)

set rowcount 0

SET ROWCOUNT often was used in this way to allow backing up and pruning of the transaction log during a purge process and also to prevent lock escalation. The problem with SET ROWCOUNT is that it applies to the entire current user session. You have to remember to set the rowcount back to 0 to be sure you don’t limit the rows affected by subsequent statements. With TOP, you can more easily specify the desired number of rows for each individual statement:

DELETE top (100) sales

where ord_date < (select dateadd(year, 1, min(ord_date)) from sales)

UPDATE top (100) titles

set royalty = royalty * 1.25
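Because the TOP expression can be a variable, you can write the purge scenario described above as a loop that deletes one fixed-size batch at a time. Each DELETE stays a short transaction, and there is no session-wide SET ROWCOUNT to reset afterward. The sketch below is a minimal illustration against a hypothetical #purge_demo table holding one row per day starting January 1, 2007. The batch size of 50 is arbitrary.

```sql
-- Hypothetical table standing in for a large table such as sales_big.
CREATE TABLE #purge_demo (
    id       int IDENTITY(1,1) PRIMARY KEY,
    ord_date datetime NOT NULL
);

-- Load 250 rows: one order per day from 2007-01-01 through 2007-09-07.
DECLARE @i int;
SET @i = 0;
WHILE @i < 250
BEGIN
    INSERT INTO #purge_demo (ord_date) VALUES (DATEADD(day, @i, '20070101'));
    SET @i = @i + 1;
END;

-- Purge rows dated before 2007-07-01 in batches of @batch rows.
DECLARE @batch int, @deleted int;
SET @batch = 50;
SET @deleted = 1;

WHILE @deleted > 0
BEGIN
    DELETE TOP (@batch) FROM #purge_demo
    WHERE ord_date < '20070701';

    SET @deleted = @@ROWCOUNT;  -- becomes 0 once no qualifying rows remain
    -- In a real purge job, a transaction log backup might be run here between batches.
END;
```

The 181 rows dated January through June are removed in four batches (50, 50, 50, and 31), and the 69 later rows remain.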

You may be thinking that using TOP in INSERT statements is not really necessary because you can always specify it in a SELECT query, as shown in Listing 43.30.

LISTING 43.30 Limiting Rows for Insert with TOP in a SELECT Statement

CREATE TABLE top_sales

(stor_id char(4),

ord_num varchar(20),

ord_date datetime NOT NULL,

qty smallint NOT NULL,

payterms varchar(12) ,

title_id dbo.tid NOT NULL)

go

insert top_sales

select top 100 * from sales

where qty > 1700

order by qty desc

However, you may find using the TOP clause in an INSERT statement useful when inserting the result of an EXEC command or the result of a UNION operation, as shown in Listing 43.31.


LISTING 43.31 Using TOP in an Insert with a UNION ALL Query

insert top (50) into top_sales

select stor_id, ord_num, ord_date, qty, payterms, title_id from sales

where qty >= 1800

union all

select stor_id, ord_num, ord_date, qty, payterms, title_id from sales_big

where qty >= 1800

order by qty desc

When a TOP (n) clause is used with DELETE, UPDATE, or INSERT, the selection of rows on which the operation is performed is not guaranteed. If you want the TOP (n) clause to operate on rows in a meaningful chronological order, you must use TOP together with ORDER BY in a subselect statement. The following query deletes the 10 rows of the sales_big table that have the earliest order dates:

delete from sales_big

where sales_id in (select top 10 sales_id

from sales_big order by ord_date)

To ensure that only 10 rows are deleted, the column specified in the subselect statement (sales_id) must uniquely identify each row, as the primary key of the table does. Using a nonkey column in the subselect statement could result in the deletion of more than 10 rows if the specified column contains duplicate values.
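A different technique gives the same guarantee: put TOP and ORDER BY in a common table expression and delete through it. The DELETE then removes exactly the rows the CTE returns, so no key column is needed to match rows back to the table. The sketch below uses a hypothetical #orders_demo table and deletes its two earliest orders.

```sql
-- Hypothetical table standing in for sales_big.
CREATE TABLE #orders_demo (
    order_id int      NOT NULL PRIMARY KEY,
    ord_date datetime NOT NULL
);
INSERT INTO #orders_demo (order_id, ord_date)
VALUES (1, '20080105'), (2, '20080101'), (3, '20080103'),
       (4, '20080102'), (5, '20080104');

-- The CTE selects the two rows with the earliest ord_date;
-- deleting from the CTE deletes exactly those rows from the base table.
WITH oldest AS (
    SELECT TOP (2) order_id, ord_date
    FROM #orders_demo
    ORDER BY ord_date
)
DELETE FROM oldest;

SELECT order_id, ord_date FROM #orders_demo ORDER BY ord_date;
```

Orders 2 and 4 (January 1 and 2) are removed, leaving orders 3, 5, and 1.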

NOTE

SQL Server Books Online states that when you use TOP (n) with INSERT, UPDATE, and DELETE operations, the rows affected should be a random selection of the TOP (n) rows from the underlying table. In practice, this behavior has not been observed. Using TOP (n) with INSERT, UPDATE, and DELETE appears to affect only the first n matching rows. However, because the row selection is not guaranteed, it is still recommended that you use TOP together with ORDER BY in a subselect to ensure the expected result.

The OUTPUT Clause

By default, the execution of a DML statement such as INSERT, UPDATE, or DELETE does not produce any results that indicate which rows changed, apart from @@ROWCOUNT, which you can check to determine the number of rows affected.

In SQL Server 2005, the INSERT, UPDATE, and DELETE statements were enhanced to support an OUTPUT clause to be able to identify the actual rows affected by the DML statement. The OUTPUT clause allows you to return data from a modification statement (INSERT, UPDATE, or DELETE). This data can be returned as a result set to the caller or returned into a table variable or an output table. To capture information on the affected rows, the OUTPUT clause provides access to the inserted and deleted virtual tables that are normally accessible
