Microsoft SQL Server 2008 R2 Unleashed- P172 pot

Temp Tables Versus Table Variables Versus Common Table Expressions SQL Server 2008 provides multiple options for working with temporary result sets in T-SQL code:... Common table express

Trang 1

UNION Versus UNION ALL Performance

You should use UNION ALLinstead of UNIONif there is no need to eliminate duplicate result

rows from the result sets being combined with the UNIONoperator The UNIONstatement

has to combine the result sets into a worktable to remove any duplicate rows from the

result set UNION ALLsimply concatenates the result sets together, without the overhead of

putting them into a worktable to remove duplicate rows

Use IF EXISTS Instead of SELECT COUNT(*)

You should use IF EXISTSinstead of SELECT COUNT(*)when checking only for the

exis-tence of any matching data values and when determining the number of matching rows is

not required IF EXISTSstops the processing of the selectquery as soon as the first

matching row is found, whereas SELECT COUNT(*)continues searching until all matches

are found, wasting I/O and CPU cycles For example, you could replace

if (SELECT count(*) FROM dbo.sales WHERE stor_id = ‘6380’) > 0

with an IF EXISTScheck similar to

if exists (SELECT * FROM dbo.sales WHERE stor_id = ‘6380’)

Avoid Unnecessary ORDER BY or DISTINCT Clauses

When a T-SQL query contains an ORDER BYorDISTINCTclause, a worktable is often

required to process the final result of the query if it cannot determine that the rows will

already be retrieved in the desired sort order or that a unique key in the result makes the

rows distinct If a query requires a worktable, that adds extra overhead and I/O to put the

results into the worktable in tempdband do the sorting necessary to order the results or to

eliminate duplicate rows This can result in extended processing time for the query, which

can delay the time it takes for the final result to be returned to the client application

If it is not absolutely necessary for the rows returned to the application to be in a specific

order (for example, returning rows to a gridcontrol where the contents can be re-sorted

by any column in the gridcontrol itself), you should leave off the ORDER BYclause in

your queries

Likewise, you should not arbitrarily include the DISTINCTclause in all your queries unless

it is absolutely necessary to eliminate any duplicate rows from the result set

Temp Tables Versus Table Variables Versus Common Table

Expressions

SQL Server 2008 provides multiple options for working with temporary result sets in

T-SQL code:

Temporary tables

Table variables

Trang 2

Derived tables

Common table expressions

One of the questions you may consider is “Which method should I use and when?”

Whether you use a temporary table, table variable, derived table, or common table

expres-sion depends, in part, on how often and for how long you intend to use it This section

provides some general recommendations to consider

You should use table variables instead of temporary tables in stored procedures whenever

possible or feasible Table variables are memory resident and do not incur the I/O

over-head and system table and I/O contention that can occur in tempdbwith normal

tempo-rary tables However, remember that table variables exist only for the duration of the SQL

batch or stored procedure in which they are defined

In SQL Server 2005 and later, you also have the option of using derived tables or common

table expressions in your queries to generate and hold intermediate result sets that can be

further processed by the main query A derived table is a subquery contained in aFROM

clause that can be referred to by an alias and used as a table in the query Derived tables

and common table expressions can be thought of as sort of dynamic views that exist only

for the duration of the query Derived tables are handy if you don’t need to use a result

set more than once in multiple queries You should consider using derived tables or

common table expressions when possible to completely avoid the use of table variables or

temporary tables, especially if the temporary table or table variable is used only once by a

single query

NOTE

For more information on common table expressions and how to use them, see the

sec-tion “Common Table Expressions” later in this chapter

You should generally consider using temporary tables only when you need to share data

between an application and stored procedures or between stored procedures Also, if the

temporary result set is going to be very large (that is, larger than can be held in SQL

Server cache memory), you should consider storing it in a temporary table rather than a

table variable

NOTE

In SQL Server 2008, you can define tabledata types, which makes it possible to pass

table variables to stored procedures as table parameters, so temp tables aren’t the

only way to share data between stored procedures However, there are some

limita-tions: primarily, the contents of a table parameter passed to a stored procedure are

read only and cannot be modified within the stored procedure If you want to share

data between stored procedures and have the ability to add, remove, or modify rows in

any of the stored procedures, temporary tables are still the best solution

Trang 3

If you need to use temporary tables, you can follow these general guidelines to help

improve their performance:

Select only the columns actually required by the subsequent SQL statements into the

temp table (that is, avoid using select *) This helps reduce the size of the temp

table, thereby reducing the number of writes to tempdband also speeding up access

of the data within the temp table because more rows will fit on a data page,

reduc-ing the number of data pages that need to be accessed by the query

Select only the rows needed by the subsequent queries, again to help limit the size

of the temp table and reduce the amount of I/O in tempdb

If the temporary table will be accessed multiple times by queries using search

argu-ments (SARGs), consider creating an index on the temporary table if it can be used

to speed up the queries against the temp table and reduce I/O Of course, this option

should be considered only if the time and I/O saved by having an index on the

tem-porary table significantly exceeds the time and I/O required to create the index

Avoid Unnecessary Function Executions

If you call a SQL Server function (for example, suser_name(),getdate()) repeatedly

within a procedure or in T-SQL code, you should consider using a local variable to hold

the value returned by the function and use the local variable repeatedly throughout your

SQL statements rather than repeatedly executing the SQL Server function This saves CPU

cycles within your stored procedure and T-SQL code

NOTE

For additional performance-related query recommendations related specifically to how

queries are optimized, see Chapter 35

Cursors and Performance

In contrast to most other programming languages, SQL is a set-based processing language

You retrieve sets of rows, update sets of rows, and delete sets of rows The set of rows

affected is determined by the search conditions specified in the query Unfortunately,

most programmers are used to doing record-oriented operations on data and often want to

apply the same technique to SQL Server data Admittedly, at times, processing rows as a

single result set with a single query can seem difficult or impossible However, because of

the performance implications, cursors should not be used just because it’s easier to

program that way

NOTE

SQL Server 2008 introduces the newMERGEstatement, which provides another

set-ori-ented option for processing a set of input rows and making a row-by-row determination

which rows to ignore or which to insert, update, or delete in the target table For more

information on using the MERGEstatement, see Chapter 42, “What’s New for

Transact-SQL in Transact-SQL Server 2008.”

Trang 4

When to Use Cursors

Application performance can sometimes be slow due to the improper use of cursors You

should always try to write your T-SQL code so SQL Server can perform what it is good at:

set-based operations It makes little sense to have an advanced relational database

manage-ment system (RDBMS) and use it only for one-row-at-a-time retrievals For example, many

update operations performed using cursors can be performed with a single UPDATE

state-ment using the CASEexpression Consider the cursor shown in Listing 43.5

LISTING 43.5 Updating the titles Table by Using a Cursor

/* This is a SQL script to update book prices dependent on current price and

ytd_sales */

/*declare cursor*/

declare titles_curs cursor for

select ytd_sales, price from dbo.titles

for update of price

declare @ytd_sales int, @price money

open titles_curs

fetch next from titles_curs into @ytd_sales, @price

if (@@fetch_status = -1)

begin

print ‘No books found’

close titles_curs

deallocate titles_curs

return

end

while (@@fetch_status = 0)

begin

if @ytd_sales < 500

update titles set price = @price * 75

where current of titles_curs

else

if @price > $15

update titles set price = @price * 9

else

update titles set price = @price * 1.15

fetch next from titles_curs into @ytd_sales, @price

end

if (@@fetch_status = -2)

raiserror (‘Attempt to fetch a row failed’, 16, 1)

close titles_curs

deallocate titles_curs

Trang 5

This cursor can be replaced with a simple, single UPDATEstatement, using the CASE

expres-sion, as shown in Listing 43.6

LISTING 43.6 Thetitles Cursor Example Performed with a Single UPDATE Statement Using

theCASE Expression

update titles

set price = case when ytd_sales < 500 then price *.75

when price > $15 then price * 90 else price * 1.15

end

The advantages with this approach are significant performance improvement and much

cleaner and simpler code In testing the performance of the single update versus the cursor

using the bigpubs2008database, the cursor required on average around 100 milliseconds

(ms) to complete The single updatestatement required, on average, about 10 ms (Your

results may vary depending on hardware capabilities.) Although both of these completed

within a subsecond response time, consider that the cursor took 10 times longer to

complete than the single update Factor that out over hundreds of thousands or millions

of rows, and you could be looking at a significant performance difference

Why is the cursor so much slower? Well, for one thing, a table scan performed by an

UPDATE, aDELETE, or aSELECTuses internal, compiled C code to loop through the result

set A cursor uses interpreted SQL code In addition, with a cursor, you are performing

multiple lines of code per row retrieved Thetitlescursor example is a relatively simple

one; it performs one or two conditional checks and a single update per row, but it is still

three or four times slower Because of the overhead required to process cursors, set-oriented

operations typically run much faster, even if multiple passes of the table are required

Although set-oriented operations are almost always faster than cursor operations, the one

possible disadvantage of using a single update is locking concurrency Even though a

single update runs faster than a cursor, while it is running, the single update might end up

locking the entire table for an extended period of time This would prevent other users

from accessing the table during the update If concurrent access to the table is more

important than the time it takes for the update to complete, you might want to consider

using a cursor A cursor locks the data only a row at a time instead of locking the entire

table (as long as each row is committed individually and the entire cursor is not in a

transaction)

Another situation in which you might want to consider using cursors is for scrolling

appli-cations when the result sets can be quite large Consider a customer service application

The customer representative might need to pull up a list of cases and case contacts

associ-ated with a customer If the result sets are small, you can just pull the entire result set

down into a list box and let the user scroll through them and not need to use a cursor

Trang 6

However, if thousands of rows of information are likely, you might want to pull back only

a block of rows at a time, especially if the user needs to look at only a few of the rows to

get the information he or she needs It probably wouldn’t be worth pulling back all that

data across the network just for a few rows

In this type of situation, you might want to use a scrollable API server cursor This way,

you can retrieve the appropriate number of rows to populate the list box and then use the

available scrolling options to quickly fetch to the bottom of the list, using the LASTor

ABSOLUTEn options, or you can go backward or forward by using the RELATIVEoption

NOTE

You need to be careful using the scrollable API server cursor approach in a multitier

environment Many multitier architectures include a middle data layer that often uses

connection sharing for multiple clients, and the users are typically assigned any

avail-able connection when they need to access SQL Server Users do not necessarily use

the same connection each time Therefore, if a user created a cursor in one

connec-tion, the next time the user submitted a fetch through the data layer, he or she might

get a different connection, and the cursor will not be available

One solution for this problem is to go back to retrieving the entire result set down to

the client application Another possible solution is to use a global temp table as a

type of homemade insensitive cursor to hold the result set and grab the data from the

global temp table in chunks, as needed With the temp table approach, you need to

make sure a sequential key is on the table so you can quickly grab the block of rows

you need You need to be aware of the potential impact ontempdbperformance and

the size requirements oftempdbif the result sets are large and you have many

concur-rent users

As a general rule, you should use cursors only as a last resort when no set-oriented

solu-tion is feasible If you have decided that a cursor is the appropriate solusolu-tion, you should

try to make it as efficient as possible by limiting the number of commands to be executed

within the cursor loop as much as possible Also, you should try to keep the cursor

processing on the server side within stored procedures If you will be performing multiple

fetches over the network (for example, to support a scrolling application), you should use

an API server cursor You should avoid using client-side cursors that will be performing

many cursor operations in the client application; otherwise, you will find your application

making excessive requests to the server, and the volume of network roundtrips will make

for a sloth-like application

Variable Assignment in UPDATE Statements

One commonly overlooked feature in T-SQL is the ability to assign values to local

vari-ables in the SETclause of the UPDATEstatement This capability can help improve query

performance by reducing locking and CPU contention and reducing the number of

state-ments required in a T-SQL batch

Trang 7

The simplified syntax of the SETclause for assigning values to variables is as follows:

SET

{ column_name = { expression | DEFAULT | NULL }

| @variable = expression

| @variable = column = expression [ , n ]

} [ , n ]

One common use of variable assignment in UPDATEstatements is when you have a table

that is used for storing and generating unique key values To demonstrate this, you can

create the keygentable and populate it as shown in Listing 43.7

LISTING 43.7 Creating and Populating the keygen Table

create table keygen (keytype char(1), keyval int)

go

insert keygen(keytype, keyval) values (‘x’, 1)

go

The typical approach often used to perform the task of retrieving a key value and updating

thekeygentable to generate the next key is to issue a SELECTstatement and UPDATE

state-ment within a transaction Listing 43.8 shows an example of this

LISTING 43.8 Retrieving and Updating keyval with SELECT and UPDATE

begin tran

declare @newkey int

Select current keyval into @newkey

select @newkey = keyval

from keygen (XLOCK)

where keytype = ‘x’

update keygen

set keyval = keyval + 1

commit

select @newkey as newkey

go

newkey

-1

Trang 8

TIP

In Listing 43.8, the XLOCKhint is specified in the SELECTstatement This prevents two

separate user processes from running this T-SQL batch at the same time and both

acquiring the same keyval With the XLOCKhint, only one of the processes can acquire

an exclusive lock, and the other process waits until the lock is released and acquires

the next keyval

The use of XLOCKis definitely preferable to HOLDLOCKbecause the use of HOLDLOCKin

this type of scenario often leads to a deadlock situation

By using variable assignment in an UPDATEstatement, you can eliminate the SELECT

state-ment altogether and capture the keyvalin the same statement you use to update the

keygentable, as shown in Listing 43.9

LISTING 43.9 Using Variable Assignment in an UPDATE to Update and Retrieve keyval

declare @newkey int

update keygen

set keyval = keyval + 1,

@newkey = keyval

go

newkey

-2

Notice that the value assigned to the local variable using the syntax shown in Listing 43.9

is the value of the keyvalcolumn prior to the update If you prefer to assign the value of

the column after the column is updated, you use the @variable = column = expression

syntax, as shown in Listing 43.10

LISTING 43.10 Using Variable Assignment in an UPDATE to Update and Retrieve keyval

After Update

declare @newkey int

update keygen

set @newkey = keyval = keyval + 1

go

Trang 9

newkey

-4

You need to be aware that the variable assignment is performed for every row that qualifies

in the update The resulting value of the local variable is the value of the last row updated

Another use for variable assignment in UPDATEstatements is to accumulate the sum of a

column into a local variable for all the rows being updated The alternative approach

would be to use a cursor, as shown in Listing 43.11

LISTING 43.11 Using a Cursor to Accumulate a Sum of a Column for Each Row Updated

declare c1 cursor for

select isnull(ytd_sales, 0)

from titles where type = ‘business’

for update of price

go

declare @ytd_sales_total int,

@ytd_sales int

select @ytd_sales_total = 0

open c1

fetch c1 into @ytd_sales

while @@fetch_status = 0

begin

update titles

set price = price

where current of c1

select @ytd_sales_total = @ytd_sales_total + @ytd_sales

fetch c1 into @ytd_sales

end

select @ytd_sales_total as ytd_sales_total

close c1

deallocate c1

go

ytd_sales_total

-30788

By using variable assignment in an UPDATEstatement, you can replace the cursor in Listing

43.11 with a single UPDATEstatement, as shown in Listing 43.12

Trang 10

LISTING 43.12 Using Variable Assignment in an UPDATE Statement to Accumulate a Sum of

a Column for Each Row Updated

declare @ytd_sales_total int

set @ytd_sales_total = 0

update titles

set price = price,

@ytd_sales_total = @ytd_sales_total + isnull(ytd_sales, 0)

where type = ‘business’

select @ytd_sales_total as ytd_sales_total

go

ytd_sales_total

-30788

As you can see from the examples presented in this chapter, using variable assignment in

UPDATEstatements results in much more concise and efficient T-SQL code than using

cursors or other alternatives When your code is more concise and consistent, it will run

faster, requiring fewer CPU resources Also, faster, more efficient code reduces the amount

of time locks are held, which reduces the chance for locking contention, which also helps

improve overall application performance

T-SQL Tips and Tricks

The following sections provide some general tips and tricks to help you get the most from

your T-SQL code

Date Calculations

Occasionally, you may find that you need to start with a date value and use it to calculate

some other date For example, your SQL code might need to determine what date is the

first day of the month or last day of the month As you may know, working with the

datetimedata type in SQL Server can be a bit of a challenge You probably already know

how to use the datepart()function to extract specific components of a date (for example,

year, month, day) You can then use those components along with a number of functions

to calculate a date that you might need This section provides some examples of

algo-rithms you can use to generate some commonly needed date values

TheDATEDIFFfunction calculates the difference between two dates, where the difference is

based on an interval, such as hours, days, weeks, months, years, and so on The DATEADD

function calculates a date by adding an interval of time to a date In this case, the

inter-vals of time are the same as those used by the DATEDIFFfunction Using the DATEADDand

DATEDIFFfunctions to calculate specific dates requires thinking outside the box a bit to

Định dạng
Số trang	10
Dung lượng	192,32 KB