UNION Versus UNION ALL Performance
You should use UNION ALL instead of UNION if there is no need to eliminate duplicate result
rows from the result sets being combined with the UNION operator. The UNION operator
has to combine the result sets into a worktable to remove any duplicate rows from the
result set. UNION ALL simply concatenates the result sets together, without the overhead of
putting them into a worktable to remove duplicate rows.
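As a minimal illustration, the sketch below uses two small table variables (hypothetical stand-ins for real tables) that share one value. UNION returns two rows because it has to eliminate the duplicate, whereas UNION ALL returns all three rows without any duplicate-removal step.

```sql
-- Hypothetical sample data: '6380' appears in both inputs
DECLARE @store_list_a TABLE (stor_id char(4));
DECLARE @store_list_b TABLE (stor_id char(4));
INSERT INTO @store_list_a (stor_id) VALUES ('6380'), ('7066');
INSERT INTO @store_list_b (stor_id) VALUES ('6380');

DECLARE @union_rows int, @union_all_rows int;

-- UNION must find and remove the duplicate '6380': 2 rows
SELECT @union_rows = COUNT(*)
FROM (SELECT stor_id FROM @store_list_a
      UNION
      SELECT stor_id FROM @store_list_b) AS u;

-- UNION ALL simply appends the second result set to the first: 3 rows
SELECT @union_all_rows = COUNT(*)
FROM (SELECT stor_id FROM @store_list_a
      UNION ALL
      SELECT stor_id FROM @store_list_b) AS ua;

SELECT @union_rows AS union_rows, @union_all_rows AS union_all_rows;
```

If duplicates cannot occur (or do not matter), UNION ALL returns the same information without paying for the duplicate check.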
Use IF EXISTS Instead of SELECT COUNT(*)
You should use IF EXISTS instead of SELECT COUNT(*) when checking only for the
existence of any matching data values and when determining the number of matching rows is
not required. IF EXISTS stops the processing of the SELECT query as soon as the first
matching row is found, whereas SELECT COUNT(*) continues searching until all matches
are found, wasting I/O and CPU cycles. For example, you could replace
if (SELECT count(*) FROM dbo.sales WHERE stor_id = '6380') > 0
with an IF EXISTS check similar to
if exists (SELECT * FROM dbo.sales WHERE stor_id = '6380')
Avoid Unnecessary ORDER BY or DISTINCT Clauses
When a T-SQL query contains an ORDER BY or DISTINCT clause, a worktable is often
required to process the final result of the query if SQL Server cannot determine that the rows will
already be retrieved in the desired sort order or that a unique key in the result makes the
rows distinct. If a query requires a worktable, that adds extra overhead and I/O to put the
results into the worktable in tempdb and do the sorting necessary to order the results or to
eliminate duplicate rows. This can result in extended processing time for the query, which
can delay the time it takes for the final result to be returned to the client application.
If it is not absolutely necessary for the rows returned to the application to be in a specific
order (for example, returning rows to a grid control where the contents can be re-sorted
by any column in the grid control itself), you should leave off the ORDER BY clause in
your queries.
Likewise, you should not arbitrarily include the DISTINCT clause in all your queries unless
it is absolutely necessary to eliminate any duplicate rows from the result set.
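The following sketch, which uses a hypothetical table variable loosely modeled on the titles table, contrasts a DISTINCT that does real work (the repeated type value must be collapsed) with one that would be redundant because the primary key already makes every row unique.

```sql
-- Hypothetical sample data loosely modeled on the titles table
DECLARE @titles TABLE
(
    title_id varchar(6) PRIMARY KEY,
    type     char(12)   NOT NULL
);
INSERT INTO @titles (title_id, type)
VALUES ('BU1032', 'business'), ('BU1111', 'business'), ('PC1035', 'popular_comp');

DECLARE @type_count int, @title_count int;

-- DISTINCT is required here: 'business' appears twice and must be collapsed
SELECT @type_count = COUNT(*)
FROM (SELECT DISTINCT type FROM @titles) AS t;

-- DISTINCT would add nothing here: title_id is the primary key,
-- so the rows are already unique and the keyword can simply be left off
SELECT @title_count = COUNT(*)
FROM (SELECT title_id, type FROM @titles) AS t;

SELECT @type_count AS distinct_types, @title_count AS titles;
```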
Temp Tables Versus Table Variables Versus Common Table Expressions
SQL Server 2008 provides multiple options for working with temporary result sets in
T-SQL code:
Temporary tables
Table variables
Derived tables
Common table expressions
One of the questions you may consider is “Which method should I use and when?”
Whether you use a temporary table, table variable, derived table, or common table
expression depends, in part, on how often and for how long you intend to use it. This section
provides some general recommendations to consider.
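You should use table variables instead of temporary tables in stored procedures whenever
possible or feasible. Table variables generally incur less overhead than normal temporary
tables: they cause fewer stored procedure recompilations and require less locking and
logging, which reduces the system table and I/O contention that can occur in tempdb with
temporary tables. Table variables are not purely memory resident, however; like temporary
tables, they are allocated in tempdb and can be written to disk when memory is needed.
Also remember that table variables exist only for the duration of the SQL batch or stored
procedure in which they are defined.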
You also have the option of using derived tables or, in SQL Server 2005 and later, common
table expressions in your queries to generate and hold intermediate result sets that can be
further processed by the main query. A derived table is a subquery contained in a FROM
clause that can be referred to by an alias and used as a table in the query. Derived tables
and common table expressions can be thought of as sort of dynamic views that exist only
for the duration of the query. Derived tables are handy if you don’t need to use a result
set more than once in multiple queries. You should consider using derived tables or
common table expressions when possible to completely avoid the use of table variables or
temporary tables, especially if the temporary table or table variable is used only once by a
single query.
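For example, the following sketch computes per-store quantity totals once as a derived table and once as a common table expression; in both cases the intermediate result exists only for the duration of its query. The @sales table variable and its rows are hypothetical sample data.

```sql
-- Hypothetical sample data
DECLARE @sales TABLE (stor_id char(4), qty int);
INSERT INTO @sales (stor_id, qty) VALUES ('6380', 5), ('6380', 3), ('7066', 50);

-- Derived table: a subquery in the FROM clause, referred to by the alias t
SELECT t.stor_id, t.total_qty
FROM (SELECT stor_id, SUM(qty) AS total_qty
      FROM @sales
      GROUP BY stor_id) AS t
WHERE t.total_qty > 10;

-- Common table expression: the same logic, named before the main query.
-- The statement preceding WITH must be terminated with a semicolon.
WITH store_totals (stor_id, total_qty) AS
(
    SELECT stor_id, SUM(qty)
    FROM @sales
    GROUP BY stor_id
)
SELECT stor_id, total_qty
FROM store_totals
WHERE total_qty > 10;
```

Both queries return the single store ('7066') whose total exceeds 10, and neither needs a temporary table or table variable to hold the intermediate totals.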
NOTE
For more information on common table expressions and how to use them, see the
section “Common Table Expressions” later in this chapter.
You should generally consider using temporary tables only when you need to share data
between an application and stored procedures or between stored procedures. Also, if the
temporary result set is going to be very large (that is, larger than can be held in SQL
Server cache memory), you should consider storing it in a temporary table rather than a
table variable.
NOTE
In SQL Server 2008, you can define table data types (user-defined table types), which make it
possible to pass table variables to stored procedures as table-valued parameters, so temp tables
aren’t the only way to share data between stored procedures. However, there are some
limitations: primarily, the contents of a table parameter passed to a stored procedure are
read-only (the parameter must be declared READONLY) and cannot be modified within the
stored procedure. If you want to share data between stored procedures and have the ability
to add, remove, or modify rows in any of the stored procedures, temporary tables are still
the best solution.
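The following minimal sketch shows the SQL Server 2008 syntax for a table-valued parameter. The type dbo.StoreIdList and the procedure dbo.usp_count_stores are hypothetical names chosen for illustration. The READONLY keyword on the parameter is what enforces the read-only limitation.

```sql
-- Hypothetical user-defined table type
CREATE TYPE dbo.StoreIdList AS TABLE (stor_id char(4) PRIMARY KEY);
GO
-- Hypothetical procedure that receives a table as a parameter
CREATE PROCEDURE dbo.usp_count_stores
    @stores dbo.StoreIdList READONLY   -- READONLY is required for table-valued parameters
AS
BEGIN
    SET NOCOUNT ON;
    -- Reading the parameter is allowed...
    SELECT COUNT(*) AS store_count FROM @stores;
    -- ...but an INSERT, UPDATE, or DELETE against @stores would be rejected.
END;
GO
-- Caller: declare a variable of the table type, fill it, and pass it
DECLARE @list dbo.StoreIdList;
INSERT INTO @list (stor_id) VALUES ('6380'), ('7066');
EXEC dbo.usp_count_stores @stores = @list;
```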
If you need to use temporary tables, you can follow these general guidelines to help
improve their performance:
Select only the columns actually required by the subsequent SQL statements into the
temp table (that is, avoid using select *). This helps reduce the size of the temp
table, thereby reducing the number of writes to tempdb and also speeding up access
of the data within the temp table because more rows will fit on a data page,
reducing the number of data pages that need to be accessed by the query.
Select only the rows needed by the subsequent queries, again to help limit the size
of the temp table and reduce the amount of I/O in tempdb.
If the temporary table will be accessed multiple times by queries using search
arguments (SARGs), consider creating an index on the temporary table if it can be used
to speed up the queries against the temp table and reduce I/O. Of course, this option
should be considered only if the time and I/O saved by having an index on the
temporary table significantly exceeds the time and I/O required to create the index.
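A minimal sketch of these guidelines follows. It assumes a titles-like source with title_id, type, price, ytd_sales, and a wide notes column; here that source is a small hypothetical table variable so the example is self-contained, and the temp table and index names are also hypothetical. Only the needed columns and rows are copied, and an index is created because the temp table is then searched on price more than once.

```sql
-- Hypothetical source data standing in for dbo.titles
DECLARE @titles TABLE
(
    title_id  varchar(6) PRIMARY KEY,
    type      char(12),
    price     money,
    ytd_sales int,
    notes     varchar(200)   -- wide column that the later queries do not need
);
INSERT INTO @titles (title_id, type, price, ytd_sales, notes)
VALUES ('BU1032', 'business',     19.99, 4095, NULL),
       ('BU1111', 'business',     11.95, 3876, NULL),
       ('PC1035', 'popular_comp', 22.95, 8780, NULL);

-- Copy only the columns and rows that later statements use (no SELECT *)
SELECT title_id, price, ytd_sales
INTO #business_titles
FROM @titles
WHERE type = 'business';

-- Worthwhile only if the index is used by several subsequent searches
CREATE INDEX ix_business_titles_price ON #business_titles (price);

DECLARE @over_15 int, @under_15 int;
SELECT @over_15  = COUNT(*) FROM #business_titles WHERE price > $15;
SELECT @under_15 = COUNT(*) FROM #business_titles WHERE price <= $15;

DROP TABLE #business_titles;
```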
Avoid Unnecessary Function Executions
If you call a SQL Server function (for example, suser_name(), getdate()) repeatedly
within a procedure or in T-SQL code, you should consider using a local variable to hold
the value returned by the function and use the local variable repeatedly throughout your
SQL statements rather than repeatedly executing the SQL Server function. This saves CPU
cycles within your stored procedure and T-SQL code.
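A minimal sketch of this technique: each function is called once, and the local variables are reused afterward. The @audit table variable is hypothetical and exists only for illustration. A side benefit is that every statement sees exactly the same timestamp.

```sql
-- Call each function once and keep the result in a local variable
DECLARE @now datetime, @login sysname;
SET @now   = GETDATE();
SET @login = SUSER_NAME();

-- Hypothetical audit table used only for this illustration
DECLARE @audit TABLE (event_name varchar(20), event_time datetime, login_name sysname);

-- Reuse the variables instead of calling GETDATE() and SUSER_NAME() in every statement
INSERT INTO @audit (event_name, event_time, login_name) VALUES ('batch start', @now, @login);
INSERT INTO @audit (event_name, event_time, login_name) VALUES ('batch end',   @now, @login);

SELECT event_name, event_time, login_name FROM @audit;
```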
NOTE
For additional query performance recommendations that relate specifically to how
queries are optimized, see Chapter 35.
Cursors and Performance
In contrast to most other programming languages, SQL is a set-based processing language.
You retrieve sets of rows, update sets of rows, and delete sets of rows. The set of rows
affected is determined by the search conditions specified in the query. Unfortunately,
most programmers are used to doing record-oriented operations on data and often want to
apply the same technique to SQL Server data. Admittedly, at times, processing rows as a
single result set with a single query can seem difficult or impossible. However, because of
the performance implications, cursors should not be used just because it’s easier to
program that way.
NOTE
SQL Server 2008 introduces the new MERGE statement, which provides another
set-oriented option for processing a set of input rows and making a row-by-row determination
of which rows to ignore and which to insert, update, or delete in the target table. For more
information on using the MERGE statement, see Chapter 42, “What’s New for
Transact-SQL in SQL Server 2008.”
When to Use Cursors
Application performance can sometimes be slow due to the improper use of cursors. You
should always try to write your T-SQL code so SQL Server can perform what it is good at:
set-based operations. It makes little sense to have an advanced relational database
management system (RDBMS) and use it only for one-row-at-a-time retrievals. For example, many
update operations performed using cursors can be performed with a single UPDATE
statement using the CASE expression. Consider the cursor shown in Listing 43.5.
LISTING 43.5 Updating the titles Table by Using a Cursor
/* This is a SQL script to update book prices dependent on current price and
ytd_sales */
/* declare cursor */
declare titles_curs cursor for
    select ytd_sales, price from dbo.titles
    for update of price
declare @ytd_sales int, @price money
open titles_curs
fetch next from titles_curs into @ytd_sales, @price
if (@@fetch_status = -1)
begin
    print 'No books found'
    close titles_curs
    deallocate titles_curs
    return
end
while (@@fetch_status = 0)
begin
    if @ytd_sales < 500
        update titles set price = @price * .75
            where current of titles_curs
    else
        if @price > $15
            update titles set price = @price * .9
                where current of titles_curs
        else
            update titles set price = @price * 1.15
                where current of titles_curs
    fetch next from titles_curs into @ytd_sales, @price
end
if (@@fetch_status = -2)
    raiserror ('Attempt to fetch a row failed', 16, 1)
close titles_curs
deallocate titles_curs
This cursor can be replaced with a simple, single UPDATE statement, using the CASE
expression, as shown in Listing 43.6.
LISTING 43.6 The titles Cursor Example Performed with a Single UPDATE Statement Using
the CASE Expression
update titles
set price = case when ytd_sales < 500 then price * .75
                 when price > $15 then price * .90
                 else price * 1.15
            end
The advantages of this approach are a significant performance improvement and much
cleaner and simpler code. In testing the performance of the single update versus the cursor
using the bigpubs2008 database, the cursor required on average around 100 milliseconds
(ms) to complete. The single update statement required, on average, about 10 ms. (Your
results may vary depending on hardware capabilities.) Although both of these completed
within a subsecond response time, consider that the cursor took 10 times longer to
complete than the single update. Factor that out over hundreds of thousands or millions
of rows, and you could be looking at a significant performance difference.
Why is the cursor so much slower? Well, for one thing, a table scan performed by an
UPDATE, a DELETE, or a SELECT uses internal, compiled C code to loop through the result
set. A cursor uses interpreted SQL code. In addition, with a cursor, you are performing
multiple lines of code per row retrieved. The titles cursor example is a relatively simple
one; it performs one or two conditional checks and a single update per row, but it was still
roughly 10 times slower in the test just described. Because of the overhead required to process cursors, set-oriented
operations typically run much faster, even if multiple passes of the table are required.
Although set-oriented operations are almost always faster than cursor operations, the one
possible disadvantage of using a single update is locking concurrency. Even though a
single update runs faster than a cursor, while it is running, the single update might end up
locking the entire table for an extended period of time. This would prevent other users
from accessing the table during the update. If concurrent access to the table is more
important than the time it takes for the update to complete, you might want to consider
using a cursor. A cursor locks the data only a row at a time instead of locking the entire
table (as long as each row is committed individually and the entire cursor is not in a
transaction).
Another situation in which you might want to consider using cursors is for scrolling
applications when the result sets can be quite large. Consider a customer service application.
The customer representative might need to pull up a list of cases and case contacts
associated with a customer. If the result sets are small, you can just pull the entire result set
down into a list box and let the user scroll through them and not need to use a cursor.
However, if thousands of rows of information are likely, you might want to pull back only
a block of rows at a time, especially if the user needs to look at only a few of the rows to
get the information he or she needs. It probably wouldn’t be worth pulling back all that
data across the network just for a few rows.
In this type of situation, you might want to use a scrollable API server cursor. This way,
you can retrieve the appropriate number of rows to populate the list box and then use the
available scrolling options to quickly fetch to the bottom of the list, using the LAST or
ABSOLUTE n options, or you can go backward or forward by using the RELATIVE option.
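API server cursors are opened through the client database API rather than in T-SQL, but a Transact-SQL SCROLL cursor supports the same fetch options, so the following sketch uses one to show their effect. The @cases table variable is hypothetical sample data standing in for a customer's cases.

```sql
-- Hypothetical list of case IDs for one customer
DECLARE @cases TABLE (case_id int PRIMARY KEY);
INSERT INTO @cases (case_id) VALUES (1), (2), (3), (4), (5);

DECLARE @case_id int;

-- LOCAL: visible only in this batch; SCROLL: allows all fetch directions;
-- STATIC: works from a snapshot copy of the result set
DECLARE case_curs CURSOR LOCAL SCROLL STATIC FOR
    SELECT case_id FROM @cases ORDER BY case_id;

OPEN case_curs;
FETCH LAST        FROM case_curs INTO @case_id;   -- jump to the bottom: 5
FETCH ABSOLUTE 2  FROM case_curs INTO @case_id;   -- second row: 2
FETCH RELATIVE 2  FROM case_curs INTO @case_id;   -- two rows forward: 4
FETCH PRIOR       FROM case_curs INTO @case_id;   -- one row back: 3
CLOSE case_curs;
DEALLOCATE case_curs;

SELECT @case_id AS current_case_id;
```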
NOTE
You need to be careful using the scrollable API server cursor approach in a multitier
environment. Many multitier architectures include a middle data layer that often uses
connection sharing for multiple clients, and the users are typically assigned any
available connection when they need to access SQL Server. Users do not necessarily use
the same connection each time. Therefore, if a user created a cursor in one
connection, the next time the user submitted a fetch through the data layer, he or she might
get a different connection, and the cursor will not be available.
One solution for this problem is to go back to retrieving the entire result set down to
the client application. Another possible solution is to use a global temp table as a
type of homemade insensitive cursor to hold the result set and grab the data from the
global temp table in chunks, as needed. With the temp table approach, you need to
make sure a sequential key is on the table so you can quickly grab the block of rows
you need. You need to be aware of the potential impact on tempdb performance and
the size requirements of tempdb if the result sets are large and you have many
concurrent users.
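A minimal sketch of the global temp table approach might look like the following. It assumes a hypothetical table named ##case_results (in practice, the name would need to be unique per user) and a block size of 10 rows; the IDENTITY column provides the sequential key used to fetch each block.

```sql
-- Hypothetical global temp table holding a snapshot of one user's result set
CREATE TABLE ##case_results
(
    row_num int IDENTITY(1, 1) PRIMARY KEY,   -- sequential key used for paging
    case_id int NOT NULL
);

-- Populate the snapshot with 25 hypothetical case IDs
DECLARE @i int;
SET @i = 1;
WHILE @i <= 25
BEGIN
    INSERT INTO ##case_results (case_id) VALUES (1000 + @i);
    SET @i = @i + 1;
END;

-- Fetch one block of rows on demand; block 2 is rows 11 through 20
DECLARE @block_size int, @block_no int;
SET @block_size = 10;
SET @block_no   = 2;

SELECT row_num, case_id
FROM ##case_results
WHERE row_num BETWEEN (@block_no - 1) * @block_size + 1 AND @block_no * @block_size
ORDER BY row_num;

-- When the user has finished paging, drop the table to free space in tempdb:
-- DROP TABLE ##case_results;
```

Because any connection can read a global temp table, the next fetch works even if the data layer hands the user a different connection.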
As a general rule, you should use cursors only as a last resort when no set-oriented
solution is feasible. If you have decided that a cursor is the appropriate solution, you should
try to make it as efficient as possible by limiting the number of commands to be executed
within the cursor loop as much as possible. Also, you should try to keep the cursor
processing on the server side within stored procedures. If you will be performing multiple
fetches over the network (for example, to support a scrolling application), you should use
an API server cursor. You should avoid using client-side cursors that will be performing
many cursor operations in the client application; otherwise, you will find your application
making excessive requests to the server, and the volume of network roundtrips will make
for a sloth-like application.
Variable Assignment in UPDATE Statements
One commonly overlooked feature in T-SQL is the ability to assign values to local
variables in the SET clause of the UPDATE statement. This capability can help improve query
performance by reducing locking and CPU contention and reducing the number of
statements required in a T-SQL batch.
The simplified syntax of the SET clause for assigning values to variables is as follows:
SET
    { column_name = { expression | DEFAULT | NULL }
    | @variable = expression
    | @variable = column = expression [ ,...n ]
    } [ ,...n ]
One common use of variable assignment in UPDATE statements is when you have a table
that is used for storing and generating unique key values. To demonstrate this, you can
create the keygen table and populate it as shown in Listing 43.7.
LISTING 43.7 Creating and Populating the keygen Table
create table keygen (keytype char(1), keyval int)
go
insert keygen (keytype, keyval) values ('x', 1)
go
The typical approach often used to perform the task of retrieving a key value and updating
the keygen table to generate the next key is to issue a SELECT statement and an UPDATE
statement within a transaction. Listing 43.8 shows an example of this.
LISTING 43.8 Retrieving and Updating keyval with SELECT and UPDATE
begin tran
declare @newkey int
-- Select current keyval into @newkey
select @newkey = keyval
from keygen with (XLOCK)
where keytype = 'x'
update keygen
set keyval = keyval + 1
where keytype = 'x'
commit
select @newkey as newkey
go
newkey
-----------
1
TIP
In Listing 43.8, the XLOCK hint is specified in the SELECT statement. This prevents two
separate user processes from running this T-SQL batch at the same time and both
acquiring the same keyval. With the XLOCK hint, only one of the processes can acquire
an exclusive lock, and the other process waits until the lock is released and acquires
the next keyval.
The use of XLOCK is definitely preferable to HOLDLOCK because the use of HOLDLOCK in
this type of scenario often leads to a deadlock situation.
By using variable assignment in an UPDATE statement, you can eliminate the SELECT
statement altogether and capture the keyval in the same statement you use to update the
keygen table, as shown in Listing 43.9.
LISTING 43.9 Using Variable Assignment in an UPDATE to Update and Retrieve keyval
declare @newkey int
update keygen
set keyval = keyval + 1,
    @newkey = keyval
where keytype = 'x'
select @newkey as newkey
go
newkey
-----------
2
Notice that the value assigned to the local variable using the syntax shown in Listing 43.9
is the value of the keyval column prior to the update. If you prefer to assign the value of
the column after the column is updated, you use the @variable = column = expression
syntax, as shown in Listing 43.10.
LISTING 43.10 Using Variable Assignment in an UPDATE to Update and Retrieve keyval
After Update
declare @newkey int
update keygen
set @newkey = keyval = keyval + 1
where keytype = 'x'
select @newkey as newkey
go
newkey
-----------
4
You need to be aware that the variable assignment is performed for every row that qualifies
in the update. The resulting value of the local variable is the value of the last row updated.
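To see this behavior, the following sketch updates every row of a hypothetical three-row copy of the keygen table. @@ROWCOUNT confirms that all three rows were updated, but @newkey ends up holding only one row's new value. Because SQL Server does not guarantee the order in which qualifying rows are processed, you should not rely on which row that is.

```sql
-- Hypothetical three-row version of the keygen table
DECLARE @keygen TABLE (keytype char(1) PRIMARY KEY, keyval int);
INSERT INTO @keygen (keytype, keyval) VALUES ('x', 10), ('y', 20), ('z', 30);

DECLARE @newkey int, @rows int;

-- No WHERE clause: all three rows qualify, and @newkey is assigned once per row
UPDATE @keygen
SET @newkey = keyval = keyval + 1;

SET @rows = @@ROWCOUNT;   -- 3

-- @newkey holds 11, 21, or 31: the value from whichever row was updated last
SELECT @rows AS rows_updated, @newkey AS newkey;
```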
Another use for variable assignment in UPDATE statements is to accumulate the sum of a
column into a local variable for all the rows being updated. The alternative approach
would be to use a cursor, as shown in Listing 43.11.
LISTING 43.11 Using a Cursor to Accumulate a Sum of a Column for Each Row Updated
declare c1 cursor for
    select isnull(ytd_sales, 0)
    from titles where type = 'business'
    for update of price
go
declare @ytd_sales_total int,
        @ytd_sales int
select @ytd_sales_total = 0
open c1
fetch c1 into @ytd_sales
while @@fetch_status = 0
begin
    update titles
    set price = price
    where current of c1
    select @ytd_sales_total = @ytd_sales_total + @ytd_sales
    fetch c1 into @ytd_sales
end
select @ytd_sales_total as ytd_sales_total
close c1
deallocate c1
go
ytd_sales_total
---------------
30788
By using variable assignment in an UPDATE statement, you can replace the cursor in Listing
43.11 with a single UPDATE statement, as shown in Listing 43.12.
LISTING 43.12 Using Variable Assignment in an UPDATE Statement to Accumulate a Sum of
a Column for Each Row Updated
declare @ytd_sales_total int
set @ytd_sales_total = 0
update titles
set price = price,
    @ytd_sales_total = @ytd_sales_total + isnull(ytd_sales, 0)
where type = 'business'
select @ytd_sales_total as ytd_sales_total
go
ytd_sales_total
---------------
30788
As you can see from the examples presented in this chapter, using variable assignment in
UPDATE statements results in much more concise and efficient T-SQL code than using
cursors or other alternatives. More concise, efficient code typically runs faster and
requires fewer CPU resources. Also, faster, more efficient code reduces the amount
of time locks are held, which reduces the chance for locking contention, which also helps
improve overall application performance.
T-SQL Tips and Tricks
The following sections provide some general tips and tricks to help you get the most from
your T-SQL code.
Date Calculations
Occasionally, you may find that you need to start with a date value and use it to calculate
some other date. For example, your SQL code might need to determine what date is the
first day of the month or last day of the month. As you may know, working with the
datetime data type in SQL Server can be a bit of a challenge. You probably already know
how to use the datepart() function to extract specific components of a date (for example,
year, month, day). You can then use those components along with a number of functions
to calculate a date that you might need. This section provides some examples of
algorithms you can use to generate some commonly needed date values.
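As a minimal sketch of this component-based approach (the @d variable and its value are hypothetical), the following code uses DATEPART to extract the date components and then steps back to the first day of the month and forward to the last day. It uses the SQL Server 2008 date data type, which avoids having to deal with a time portion.

```sql
DECLARE @d date;
SET @d = '20080215';   -- unambiguous yyyymmdd literal

-- Extract the individual components
DECLARE @year int, @month int, @day int;
SET @year  = DATEPART(year, @d);    -- 2008
SET @month = DATEPART(month, @d);   -- 2
SET @day   = DATEPART(day, @d);     -- 15

DECLARE @first_of_month date, @last_of_month date;
-- Subtract (day - 1) days to get back to the first of the month: 2008-02-01
SET @first_of_month = DATEADD(day, 1 - @day, @d);
-- Add one month to the first of the month, then back up one day: 2008-02-29
SET @last_of_month = DATEADD(day, -1, DATEADD(month, 1, @first_of_month));

SELECT @year AS yr, @month AS mo, @first_of_month AS first_of_month,
       @last_of_month AS last_of_month;
```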
The DATEDIFF function calculates the difference between two dates, where the difference is
based on an interval, such as hours, days, weeks, months, years, and so on. The DATEADD
function calculates a date by adding an interval of time to a date. In this case, the
intervals of time are the same as those used by the DATEDIFF function. Using the DATEADD and
DATEDIFF functions to calculate specific dates requires thinking outside the box a bit to