Visit us on the World Wide Web at www.sqlservercentral.com
A technical journal for the SQLServerCentral.com and PASS communities
A publication of The Central Publishing Group
Frank Scafidi Rob Anderson
Typesetting and Layout:
Subscriptions and address changes:
For subscription and address changes, email subscriptions@sqlservercentral.com. For renewals, you can extend your subscription at www.sqlservercentral.com/store.
Feedback:
editor@sqlserverstandard.com
Copyright
Unless otherwise noted, all programming code and articles in this issue are the exclusive copyright of The Central Publishing Group. Permission to photocopy for internal or personal use is granted to the purchaser of the magazine.
SQL Server Standard is an independent publication and is not affiliated with Microsoft Corporation. Microsoft Corporation is not responsible in any way for the editorial policy or other contents of this publication. SQL Server, ADO.NET, Windows, Windows NT, Windows 2000 and Visual Studio are registered trademarks of Microsoft Corporation. Rather than put a trademark symbol in each occurrence of other trademarked names, we state that we are using the names only in an editorial fashion with no intention of infringement of the trademark. Although all reasonable attempts are made to ensure accuracy, the publisher does not assume any liability for errors or omissions anywhere in this publication. It is the reader's responsibility to ensure that the procedures are acceptable in the reader's environment and that proper backup is created before implementing any procedures.
SQLServerCentral.com Staff:
Brian Knight, President
Steve Jones, Chief Operating Officer
Andy Warren, Chief Technology Officer
You can reach the SQL Server Standard at:
To that end we've included a look at a variety of performance related topics. We have a great article on disk contention with multiple tasks running. While the article looks at scheduled jobs that may conflict, any large processes, scheduled or not, might have similar issues. Greg Gonzalez, architect of sqlSentry, has written a fantastic reference about your disk system and one that you should use to examine the periodic slow performance of any server, looking for overlapping processes.
We also have noted author Rahul Sharma's look at locking, blocking, and deadlocks. This is one that will probably teach anyone something about this fundamental database process; I know I learned a couple things when reading it.
We also examine the performance of GUIDs, a topic that I have not seen anything about, despite the fact that Microsoft pushes their use. Sean McCown presents his research and some benchmarks on their use in comparison with integers and the identity property.
We have a couple of security related topics as well this time. One is a very detailed look at the various ways you can discover all those hidden SQL Servers on your network using various tools, written by Alan Miner, as well as a good introduction to SQL Injection from Dinesh Asanka. Our best wishes to Dinesh, his family, and friends as they cope with the tsunami damage in Sri Lanka. He's OK, but there is still a lot to deal with and get past.
Lastly we have Randy Dyess of www.transactsql.com with a fantastic explanation of why you should have clustered indexes on your tables. He's taken a look at the performance impacts of forwarding pointers, something else that doesn't seem to ever have been tackled on the web.
This has been an interesting issue and one that's definitely taught me a thing or two. Hopefully you'll enjoy it and take something away as well that you can use to make your systems run a little smoother.
And your phone a little quieter.
Steve Jones
TABLE OF CONTENTS
This Isn’t An Issue That Is Likely To Go Away Soon
An examination of disk contention and performance with multiple jobs running at the same time.
By Greg Gonzalez
LOCKING, BLOCKING AND DEADLOCKS •13
It Is Important To Know And Understand How To Maintain The Logical Unit Of Work
A detailed explanation of locks and blocks that can occur on a SQL Server. An examination of causes and potential ways to avoid issues. By Rahul Sharma
DISCOVERING SQL SERVERS •20
My General Search Strategy Is To Find All The MSS Candidates
Using A Variety Of Sources And Techniques.
A look at finding and identifying SQL Servers on your networks using a variety of tools. By Alan Miner
So, In Short, Don’t Let Your Developers Use GUID’S
Some analysis on the performance impacts (bad) of using GUIDs for a primary key vs. integers with the identity property. By Sean McCown
WHAT'S WRONG WITH GUIDS •25
Even If Uniqueidentifiers Aren't Unique, They're A Damned Sight More Unique Than Integers.
An alternative point of view on why GUIDs have a time and place in your database.
FORWARDING POINTERS •26
Logically, Forwarding Pointers Should Mean That
SQL Server Has To Read Extra Data Pages
A detailed examination of clustered indexes and forwarding pointers with the performance impact on data retrieval. By Randy Dyess
IS YOUR DATABASE SECURE? •28
They Are Like Guerrilla War Fighters; One Tiny Fault Is More Than Enough
For Them To Create A Mess
A look at SQL Injection potential problems in your applications. By Dinesh Asanka
PASS •31
A featured interview with Rony Ross. By Steve Mong
SCHEDULED JOBS AND DISK CONTENTION
By: Greg Gonzalez
InterCerve, Inc.
Introduction
Heavy contention for disk resources can dramatically impact SQL Server performance, and SQL Server Agent jobs can be some of the biggest offenders. In this article I'll cover how and why jobs and job collisions can cause disk contention, how to isolate the sources of disk contention using Windows performance counters and analyze the data using some simple formulas, and then I'll present a process to reduce contention in general via "leveling" your job schedules.
Jobs and Disk Contention
It's important to remember that disk contention is only one type of resource contention that can affect SQL Server performance, but it is a significant one. Likewise, the role jobs can play in this regard is significant enough to merit the focus of this article.
As you know, jobs can perform all kinds of operations, including but not limited to database maintenance activities (index rebuilds, backups, integrity checks, etc.), ETL (import/export) processes such as those using DTS and BCP, and many other operations that tend to read and write large amounts of data to or from disk. In an ideal world every database would have separate disk controllers and disk arrays to handle its data files, index files, transaction logs, backups, etc. But disk hardware is costly, and multiple servers plus database server licenses are often out of the question, so this is often the exception rather than the rule, and it seems as if there are never enough disk resources to go around. If that's the case in your environment, and your jobs are doing work outside of SQL Server and utilizing the same physical disk resources used by your database files, major performance headaches can result.
In addition to disk "resource sharing", compounding the problem is that the native tools make it all too easy to create SQLAgent jobs with overlapping, or "colliding", schedules. For example, over time you may end up with:
• a transaction log backup running every hour on the hour
• a DTS import which runs every 30 minutes
• a data archive job running every 15 minutes
• an index defrag job which runs nightly at 4am
In this case we have several recurring collisions:
Collision Recurrence | Job Collisions | Collisions per Day | # of Distinct Collisions | Distinct Coll. per Day
Every 30 minutes | Data Archive, DTS Import | 48 | 1 | 48
Every hour | Data Archive, DTS Import, Trans Log | 24 | 3 | 72
Every 24 hours | Data Archive, DTS Import, Trans Log, Index Defrag | 1 | 6 | 6
Total Distinct Collisions per Day: 99

Table 1: Job Collisions Example
Collisions per Day: The total number of times the jobs will collide each day based upon their schedules.
# of Distinct Collisions: This is the total number of distinct collisions for each combination of jobs. For example, every hour the Data Archive job will collide with the DTS Import job and the Transaction Log job (2 collisions), and the DTS Import job will collide with the Transaction Log job (1 collision), for a total of 3 distinct collisions.
Distinct Collisions per Day: Collisions per Day multiplied by # of Distinct Collisions.
Total Distinct Collisions per Day: This is calculated by summing Distinct Collisions per Day (126), then eliminating duplicates by backing out the collisions which have already been accounted for in one of the previous collision combinations: 48 + (24*2) + (1*3) = 99.
So with only four jobs, we actually have a total of 99 distinct collisions per day! Needless to say, most SQL Servers have more than 4 jobs, so it's likely your collision total is higher than this. In addition, SQL Server 2005 actually introduces the concept of "shared schedules", where multiple jobs can reuse the exact same schedules! That said, this isn't an issue that is likely to go away anytime soon.
So why are schedule collisions a problem? Because, for the reasons mentioned above, the result is often disk contention, which leads directly to the phenomenon where the aggregate duration and performance impact of simultaneously processed tasks will be greater than the duration and impact of the same tasks processed independently.
Disk contention happens because, when reading from and writing to multiple files and disk sectors on the same physical disk resources simultaneously, the operating system and disk subsystem have to do a lot of extra work. Extra seeks and platter rotations result, during which time no read/write activity can occur. To put it another way, disk controllers and disks have limited throughput, so when the subsystem is overloaded requests end up being queued, which causes disk transfer delays. As a result:
• Jobs can't achieve their optimal runtimes. This can lead to system slowdowns and maintenance window overruns, among other problems.
• Application-related DML activity takes longer, manifesting in delays for end users.
• If the contention is severe enough, "buffer latch timeout" and other errors can occur.
• Because of all of the extra ongoing work, the lifespan of your disk resources can be dramatically shortened.
If nothing is done about it, what can result is a compounding effect where everything tends to run slower, and you end up with frustrated users as well as premature hardware/software upgrades, because you aren't able to get the most out of available resources.
Measuring Disk Contention
Part of the problem in isolating disk contention issues is that it can be a challenge using the available Windows performance counters to determine exactly what is happening with disk performance, isolate the processes involved, and figure out what they are doing. This is because many of the disk counters are "general" in nature, in that they reflect the total activity on a server or disk.
What is needed is a way to interpret the general counter data in order to determine the source of the activity. Fortunately, with some of the counters we can isolate activity directly related to the various SQL Server processes. From there we can calculate percentages of activity related to SQL Server as well as other processes. With these counters and a few simple formulas, you can gain greater insight into the counter data to determine if disk bottlenecks are causing performance problems for your SQL Servers.
Simulating Contention
If you're like me, your databases have only grown larger over time, but your maintenance windows haven't. This has heightened the need to optimize backup performance to ensure backups always run as quickly as possible and avoid contending with other activities on the SQL Server. Products such as compressed backup software have become one of a DBA's best tools to combat backup size and speed issues. But even with compressed backups, disk contention can still be an issue.
So for these tests we will look at how a database backup job can fight with other jobs for disk resources, ending up with less than optimal performance for all processes involved. For each test I'll use a SQL Agent job which performs a standard non-compressed database backup to a local disk, a common scenario. Then to create the disk contention conditions, I'll combine the backup with other write-intensive jobs, for a total of four separate tests.
1. Database backup job only.
2. Database backup job, plus an "Archive" job performing heavy DML activity. The job simulates an "archive" function by copying approximately 50MB of data in 700,000 rows using an INSERT/SELECT. This occurs in another database on the same SQL Server.
3. Database backup job, plus a "File Copy" job. This involved using a job with a CmdExec step and xcopy to copy a 62MB ASCII file from a network drive to the local disk, as commonly occurs in preparation for an import into SQL Server. No import will be performed, as what we are trying to simulate here is write activity outside of SQL Server.
4. Database backup job, Archive job, and File Copy job. This is a combination of all tests.
Testing Notes:
• The same 1.2GB database was backed up to disk each time.
• The database backup, database files, and file copy all use the same local disk resources.
• The previous backup file was purged from disk prior to each test to keep available disk space constant and minimize any effects of fragmentation and split IOs.
• The Recovery Model for the database was set to "Simple" to avoid unexpected log flushes during the test, which can have a big effect on write activity. Although not a realistic approach for most real world scenarios, it will make the results a bit easier to read.
• The buffer and procedure caches were purged prior to each test.
• Verification of the backup was not performed. Verification will incur significant read activity right after the backup finishes, since the entire backup file must be read from disk and examined. Although this is a best practice and it can certainly have disk performance implications, we are focusing on disk write activity for these tests.
• All other processes and services which can incur heavy disk IOs were stopped prior to the tests. This was done so that, for the purposes of our tests, the File Copy will account for most of the non-SQL Server disk activity.
The Performance Counters
For each test, I fired up System Monitor using the four counters below for the server where SQL Server is performing the backup. System Monitor was started immediately after the backup started and stopped immediately after the backup finished, to avoid skewing of the averages by low counter readings at either end.
SQL Server:Databases: Backup/Restore Throughput/sec
The total bytes transferred to disk by the backup operation. With this counter we can see the total throughput for all backups, or for a specific database as we will be doing.
Physical Disk: Avg Disk Write Queue Length
The average number of queued read and write requests during the interval. If you see this counter spike to over 2 per disk spindle while the backup is running, it's a strong indicator that the backup and/or other activity may be overloading the disk subsystem, causing incoming requests to be queued.
Process: IO Write Bytes/sec [sqlservr]
The total bytes being written to the disk for the SQL Server process (sqlservr.exe). This is the only counter that will give us insight into the total amount of write activity related to SQL Server. For optimal backup speed this counter should almost mirror the Backup/Restore Throughput/sec counter values while the backup is running. If it is considerably higher than the Backup/Restore Throughput/sec counter and you are seeing queued write requests during the same time period, it's a good indicator that the backup process is contending with other SQL Server database-related operations hitting the same disk resources.
Physical Disk: Disk Write Bytes/sec
The total bytes being written to the disk per second. Ideally this counter should also mirror the Backup/Restore Throughput/sec counter values. If it is considerably higher than the Backup/Restore Throughput/sec counter and you are seeing queued write requests during the same time period, it may indicate that the backup process is contending with database-related write operations, write activity from ETL processes, or write activity from other processes outside of SQL Server.
Keep in mind that although we have focused on the "write" counters here, controllers and disks have limited throughput, so read activity can and will directly affect write performance and vice versa. Additionally, some of the read activity you'll see during a backup is incurred by SQL Server reading data pages from disk
that aren't in cache and loading them into the backup buffer. In other words, backups don't always just write data to disk. That said, you'll usually want to inspect some of the corresponding "read" counters as well.
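If you prefer to sample the SQL Server-side counters from a query window rather than System Monitor, they are also exposed through the master.dbo.sysperfinfo virtual table in SQL Server 2000. This is only a hedged alternative: the Physical Disk and Process counters are operating system counters and are not available this way, and the per-second counters are stored as cumulative values, so you need two samples and must divide the difference by the elapsed seconds.

SELECT object_name, counter_name, instance_name, cntr_value
FROM   master.dbo.sysperfinfo
WHERE  counter_name = 'Backup/Restore Throughput/sec'
-- take a second sample a known number of seconds later, then divide the
-- change in cntr_value by the elapsed seconds to get the per-second rate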
Other Important Disk Counters
There are some other performance counters which we didn't use in our tests, but which can provide valuable insight into disk-related performance issues.
Physical Disk: Avg Disk sec/Transfer
The average number of seconds it takes for each read and write to disk. This counter can be a good measure of how much slow disk performance is manifesting itself in slow performance for end users. Since the counter is typically a fraction of a second, it's easiest to use it as a relative measure. For example, if it's tripling from .05 to .15 during the backup and you see a high percentage of activity related to SQL Server database operations using the above formulas, it may mean end user queries are taking 3 times as long.
Physical Disk: Split IO/Sec
The number of times per second a disk IO was split into multiple IOs. High readings for this counter usually indicate that the disk is fragmented, which can directly affect the rate at which data is written to and read from disk, and lead to queued requests.
Test Results
Figure 1: Backup Only
Note that the backup throughput, total write bytes, and SQL Server write bytes counters are almost perfectly in synch, and the queue length is fairly constant. This is the ideal scenario for optimal backup performance.
Figure 3: Backup + File Copy
The File Copy job started at the point where the total disk write bytes and the other counters diverged towards the left side of the graph, and completed when they converged again in the middle. You can see that both the backup throughput and SQL Server process write bytes dipped but stayed perfectly in synch, indicating that there was no other significant SQL Server-related write activity at the time.
Figure 4: Backup + Archive + File Copy
This graph demonstrates a combination of the effects from Tests 2 and 3. The area left of center where backup throughput, total write bytes, and SQL Server process write bytes drop and diverge at the same time represents backup activity, along with high levels of SQL Server and non-SQL Server write activity occurring simultaneously. This is the worst-case scenario for good backup performance. It's no surprise that during this period queue length spiked to the highest levels of any of the tests.
Figure 2: Backup + Archive
Here you can see that when the Archive job and its DML ran (a single large INSERT/SELECT), backup throughput dipped dramatically and queue length spiked. This occurred because SQL Server was writing to the backup file and the database's data files at the same time, causing contention. Also note that total disk write bytes and SQL Server process write bytes stayed in synch, indicating that this was the only heavy write activity occurring on the disk at the time.
Interpreting the Results
Now, on to the fun stuff! Here are the formulas we’ll use to gauge
the percentage of write activity related to SQL Server and other
processes:
Variable | Performance Counter
b | SQL Server:Databases: Backup/Restore Throughput/sec
d | Physical Disk: Disk Write Bytes/sec
p | Process: IO Write Bytes/sec [sqlservr]

The percentage of write activity related to the backup process (bp):

    bp = b / d

The percentage of write activity from SQL Server operations other than the backup (sp):

    sp = (p - b) / d

The percentage of write activity related to all other processes (op):

    op = (d - p) / d
For these tests b represents Backup/Restore Throughput/sec. However, depending on what you are trying to measure, b can be substituted for or combined with any of these other SQL Server:Databases counters:
• Bulk Copy Throughput/sec (multiply by 1024 since the output is kB)
• DBCC Logical Scan Bytes/sec
• Log Bytes Flushed/sec
• Shrink Data Movement Bytes/sec
Perhaps the most critical counter here is Process: IO Write Bytes/sec. This is because, as mentioned previously, it is the only counter that gives us insight into the total amount of write activity related to the SQL Server process (sqlservr.exe).
NOTE: The description for Process: IO Write Bytes/sec says that it also includes data from "network" operations as well. In my testing I have not been able to confirm this to be true for the SQL Server process, as it only appears to report activity from disk writes. If this were true, we would not be able to use it to accurately isolate disk writes related to SQL Server.
For each test, at the completion of the backup job the "Average" figures for each counter were recorded from Performance Monitor (Table 2), and the formulas above were applied to determine the percentage of write activity for each type of job (Table 3). Keep in mind that in the real world we may not be able to say that the Archive job was responsible for all of the DML activity, but in this case I know that there were no other significant DML operations occurring on the SQL Server at the time.
Table 4 lists the total bytes written during each test, broken down by category. This was calculated by multiplying the average bytes per second figures by the total time in seconds for each test (effectively the duration of the backup job). Note how the total bytes written for each category matches closely with the respective file sizes (see Testing Notes). This data is not really critical for our purposes since we already know the file sizes, but it does demonstrate that the "average" data provided by the performance counters is highly accurate, and lends some additional validation to the test results.
Table 2: Averages for Each Performance Counter
Test | Jobs | Average Disk Write Bytes/sec | Average Backup Bytes/sec | Average SQL Write Bytes/sec | Avg Write Queue Length
4 | Backup + Archive + File Copy | 11,868,675 | 10,857,362 | 11,294,352 | 2.7

Table 3: Percentage of the Total Bytes Transferred
(columns: Test | Jobs | Backup Job | Archive Job | File Copy Job, as % of Total Bytes Written)

Table 4: Total Bytes Written by Category
Test | Jobs | Total Bytes Written | Backup | DML | non-SQL Server
2 | Backup + Archive | 1,302,721,683 | 1,249,620,151 | 50,876,681 | 2,224,851
3 | Backup + File Copy | 1,310,577,528 | 1,242,809,672 | 668,304 | 67,099,552
4 | Backup + Archive + File Copy | 1,376,766,300 | 1,259,453,992 | 50,690,840 | 66,621,468
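As a quick worked example, applying the formulas to the Test 4 averages in Table 2 (d = 11,868,675, b = 10,857,362, p = 11,294,352) gives approximately:

    bp = 10,857,362 / 11,868,675 ≈ 0.915, about 91.5% of write activity from the backup
    sp = (11,294,352 - 10,857,362) / 11,868,675 ≈ 0.037, about 3.7% from other SQL Server (DML) activity
    op = (11,868,675 - 11,294,352) / 11,868,675 ≈ 0.048, about 4.8% from processes outside SQL Server (the file copy)

These derived percentages also line up with the byte totals for the same test in Table 4.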
After each test the actual job durations were recorded (Table 6). The highlighted cells show the percentage increase in duration for each job/test combination.
Perhaps the most enlightening measure is the "Combined Duration" column, which reflects the aggregate change in duration for all jobs in the test. Note that for Test 4 (Backup + Archive + File Copy), combined durations increased by 67% over the combined optimal durations. Also note that this happened as a result of increasing the total bytes written by less than 9%! (See Table 3, Test 4.) This represents a clear illustration of how even a relatively small amount of disk contention can have a big impact on performance, and prevent jobs from achieving their optimal runtimes.
Resolving Schedule Contention
At this point we've covered how you can go about measuring disk contention in the context of scheduled jobs. You can use the techniques described above whenever you aren't sure whether or not particular jobs are competing for resources. However, monitoring for contention is not always the first place I'd recommend you start. If you can prevent contention from ever happening in the first place, then why not go that route first?
At the risk of sounding melodramatic, when you think about ways
to reduce contention, it may be helpful to remember that old quote
which goes something like, “Have the serenity to accept the things
you can’t change, the courage to change the things you can, and
the wisdom to know the difference.”
In the DBA world, the things you likely can't change are the demands put on your SQL Servers by end users; i.e., you can't really control when or how often they are going to run that massive report query that brings the server to its knees and defies all attempts at optimization.
You may have already guessed what you can change: job schedules and the associated load incurred by job collisions, of course! You, the DBA, are usually in complete control of many, if not all, of the job schedules. And if you aren't in complete control because of some special business requirements, you still likely have some influence on when and/or how often the jobs run.
So, if you can muster a little courage you can usually reduce or eliminate much of the contention that's directly related to colliding jobs. Here's an approach you can take to optimize (or "level") a server's schedule:
1. Make a spreadsheet with all of the jobs on the server, with the associated schedule information. You can query the system tables for this (msdb..sysjobs and msdb..sysjobschedules) and use the undocumented stored procedure msdb..sp_get_schedule_description to generate the schedule descriptions (a sketch of such a query appears after Table 7), but depending on how many jobs you have it may be easier to go into each job's properties and view its schedule description(s), then copy it into the spreadsheet.
2. Add two columns to the spreadsheet: "Notes" and "Adjusted Schedule". This is where you'll record any special scheduling requirements for future reference, and the schedule changes if applicable.
3. Record the duration statistics for each job in the spreadsheet. Use the query in Listing 1 to convert the duration information in msdb..sysjobhistory to minutes. The Duration_68Pct and Duration_95Pct fields reflect 1 and 2 standard deviations from the average runtime respectively. In layman's terms, this means that 68% of the runtimes will fall within the Duration_68Pct value, and 95% of the runtimes will fall within Duration_95Pct. These can be more valuable measures than the simple average or maximum durations when you are determining the appropriate spacing between jobs in the following steps. Note that sometimes the 95% value will be higher than the maximum, in which case you may want to give more weight to the maximum.
4. Recurring, non-recurring, and "maintenance window" jobs need to be analyzed a bit differently, so separate the jobs into 3 groups, then sort each group by "start time":
a. Recurring jobs. These are the jobs that run multiple times a day.
b. Maintenance window jobs. These are jobs that should only run during your defined maintenance window.
c. Non-recurring jobs. These jobs run daily or less frequently, but fall outside of your maintenance window.
Table 6: Duration Changes for each Job
Columns: Jobs | Optimal Duration (sec) | then Duration (sec), Change (sec), % Change for the Combined (All Jobs), Backup, Archive, and File Copy jobs
Backup + Archive: 105 | 128, +23, +22% | 107, +12, +13% | 21, +11, +52%
Backup + Archive + File Copy: 117 | 195, +78, +67% | 116, +21, +22% | 22, +12, +55% | 57, +45, +79%

Table 5: Optimal Durations for each Job
Optimal Job Duration (sec) for: Combination (All Jobs)* | Backup Job | Archive Job | File Copy Job
5. Now it's time to record the schedule adjustments in the spreadsheet. You'll want to firm up the schedules for the maintenance window and non-recurring groups first by following these steps:
a. Take a look at any jobs with special scheduling requirements, and make any necessary adjustments. For example, a data warehousing job that must run every day at 4am.
b. Next, consider jobs with dependencies on other jobs and make any needed ordering adjustments.
c. Now take a look at the average, maximum, 68% and 95% duration values for each job, and adjust the schedule spacing accordingly to ensure there is adequate room between the jobs to avoid any overlap.
6. Now that the non-recurring jobs are settled, you can focus on the recurring jobs. Chances are many of the recurring jobs start at 12:00:00am, which is the default. Look at the recurrence intervals for those jobs with the same start times as well as their duration statistics, and determine when and how often they will collide. Next adjust the start times to stagger the jobs as best as possible to avoid the collisions. See Table 7 for an example scenario.
Note only two adjustments were made:
a. The start time for Job 1, which runs every 1 minute, was moved back 30 seconds. This is because it normally runs around 30 seconds or less, so if we start it 30 seconds past the minute, the chances of it colliding with any other short-running jobs starting on the minute will be much reduced.
Most importantly perhaps, this includes collisions with Job 2, since it runs every 15 minutes. Note Job 2's start time was left at 12:00:00am, and because it runs for less than 30 seconds 68% of the time, most of the time it won't collide with Job 1. It will still collide with Job 1 sometimes when it runs long, but since Job 1 runs every minute this really can't be avoided. If Job 1 ran less frequently than every minute, even every 2 minutes, we could adjust its start time to 12:01:30am, which would avoid collisions with Job 2 95% of the time or more.
b. The start time for Job 3 was moved back to 12:02:00am, meaning it will run at :02 and :32 minutes after the hour. This will effectively prevent it from colliding with Job 2 four times per hour. Also, since we moved Job 1 back 30 seconds, most of the contention between it and Job 3 will be avoided.
This was a relatively simple example. If you have more than 3 frequently recurring jobs, things can get more complicated very quickly, especially if you have one or more jobs which run every minute. For that reason it's usually a good idea to avoid jobs that run every minute if at all possible, since they effectively close any gaps in the schedule that you could otherwise "fill" with other recurring jobs.
7. There's one more step that you can take to minimize contention caused by recurring jobs, and that's during your maintenance window. For those recurring jobs that absolutely don't have to run during the maintenance window, you can "split" their schedules. This can be done one of two ways, as listed below (a T-SQL sketch of the first approach follows step 8). For these examples we'll use a daily maintenance window between 3:00am and 5:00am.
a. Add a second schedule. (See Figure 5.) First, change the end time for the original schedule to the start time of the maintenance window, 3:00am in this case, and leave the original start time alone. Next add a second schedule with the same recurrence frequency as the first schedule, and for its start time use 5:00am.
NOTE: If you have any existing "collision avoidance" logic as covered in Step 6, for the second schedule's start time be sure to add the delta between midnight and the first schedule's start time. For example, if the first schedule starts at 12:00:30am, use 5:00:30am as the second schedule's start time. (This assumes your maintenance window ends on the hour; if not, further adjustments to the second schedule's start time may be needed.)
b. Split the original schedule. (See Figure 6.) Believe it or not, this can be done by using the start of the maintenance window as the end time (3:00am in this case), and the end of the maintenance window as the start time (5:00am). When you do this, SQLAgent will automatically run the job from midnight until the maintenance window start, then start back up again when it's over. The downsides of this approach are that it can be a bit more difficult to read the schedule, and also that it doesn't work for more than one schedule split, which can be needed when intensive jobs outside of the maintenance window are involved. However, it does work with both SQL Server 7.0 and 2000!
8. Finally, take the schedule adjustments from the spreadsheet and apply them to the jobs. You may also want to restart SQLAgent when you're finished; I have seen cases where it doesn't automatically pick up every schedule change. Don't forget to save the spreadsheet to a safe place, so you can refer back to it whenever adding new jobs to the server. Also, since runtimes will inevitably change over time, I'd recommend performing a periodic review of the job duration statistics and comparing them to those recorded in the spreadsheet. If any have changed dramatically, you may need to make further adjustments to keep contention to a minimum.
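As an illustration of option 7a ("add a second schedule"), the two schedules can also be scripted with msdb..sp_add_jobschedule rather than created through Enterprise Manager. This is only a hedged sketch: the job and schedule names are placeholders, a 3:00am-5:00am maintenance window is assumed, and the start/end times use SQL Agent's HHMMSS integer format.

-- schedule 1: run every 15 minutes from midnight until the window opens
EXEC msdb.dbo.sp_add_jobschedule
     @job_name = 'Data Archive',              -- placeholder job name
     @name = 'Every 15 min (before window)',
     @freq_type = 4,                          -- daily
     @freq_interval = 1,
     @freq_subday_type = 4,                   -- minutes
     @freq_subday_interval = 15,
     @active_start_time = 0,                  -- 12:00:00am
     @active_end_time = 30000                 -- 3:00:00am

-- schedule 2: resume after the window closes
EXEC msdb.dbo.sp_add_jobschedule
     @job_name = 'Data Archive',
     @name = 'Every 15 min (after window)',
     @freq_type = 4,
     @freq_interval = 1,
     @freq_subday_type = 4,
     @freq_subday_interval = 15,
     @active_start_time = 50000,              -- 5:00:00am
     @active_end_time = 235959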
Table 7: Adjusting recurring job schedules
Columns: Job | Original Start Time | Original End Time | Adjusted Start Time | Adjusted End Time | Recurrence Interval (minutes) | Duration (minutes): Avg, 68%, 95%, Max
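For step 1, a starting point for the spreadsheet can be pulled straight from msdb. The following is a hedged sketch against the SQL Server 2000 msdb schema (in SQL Server 2005 the schedule columns move to a separate shared-schedules table); it lists each job with its raw schedule settings, which you can decode or paste next to the generated schedule descriptions.

SELECT j.name                    AS job_name,
       s.name                    AS schedule_name,
       s.enabled,
       s.freq_type,              -- 4 = daily, 8 = weekly, 16 = monthly
       s.freq_subday_type,       -- 4 = minutes, 8 = hours
       s.freq_subday_interval,
       s.active_start_time,      -- HHMMSS integer, e.g. 30 = 12:00:30am
       s.active_end_time
FROM   msdb.dbo.sysjobs j
       INNER JOIN msdb.dbo.sysjobschedules s ON s.job_id = j.job_id
ORDER BY s.active_start_time, j.name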
Trang 12Listing 1: Calculates job duration statistics in minutes
USE msdb
SELECT sysjobs.name,
COUNT(*)
AS RecCt, MIN(run_duration)
AS MinDuration, CAST(AVG(run_duration) AS decimal(9,2))
AS AvgDuration, CAST(AVG(run_duration) + STDEVP(run_duration) AS decimal(9,2))
AS Duration_68Pct, CAST(AVG(run_duration) + (STDEVP(run_duration) * 2)
AS decimal(9,2))
AS Duration_95Pct, MAX(run_duration)
AS MaxDuration FROM (
SELECT job_id,
CAST((LEFT(run_duration, 3) * 60 + SUBSTRING(run_duration, 4, 2) + CAST(RIGHT(run_duration, 2) AS decimal(2,0)) / 60)
AS decimal(9,2))
AS run_duration FROM ( SELECT job_id,
REPLICATE(‘ ‘, (7 - LEN(run_duration))) + CAST(run_duration AS varchar(7))
AS run_duration FROM sysjobhistory
WHERE sysjobhistory.step_id = 0 ) t1
Conclusion
I've demonstrated how heavy contention for disk resources incurred by scheduled jobs can have a significant impact on SQL Server performance, since this is one of the most common and easily controllable culprits. It's important to note, however, that the phenomenon where contention causes everything to take longer than it would otherwise is not isolated to disk resources. For example, just as idle time causes this with disks, in the case of CPU resources it can be high context switching. Your initial approach should be the same in almost every case: first eliminate the overlap wherever possible, before simply upgrading hardware as a solution to performance problems.
Hopefully I have armed you with some new techniques which will help you effectively identify and combat both job-related and general contention issues that may be impacting your performance.
Figure 5: Schedule-splitting with 2 schedules
Figure 6: Schedule-splitting with a single schedule
Greg Gonzalez
Greg is the architect of sqlSentry, a visual job scheduling and notification management system for SQL Server. He is also the founder of InterCerve, a leading Microsoft focused hosting and development services firm, and the company behind sqlSentry. Greg has been working with SQL Server for over 10 years.
LOCKING, BLOCKING AND DEADLOCKS
By: Rahul Sharma

Locking is a natural part of any OLTP application. However, if the design of the applications and transactions is not done correctly, you can run into severe blocking issues that can manifest themselves as severe performance and scalability issues by resulting in contention on resources. Controlling blocking in an application is a matter of correct application design, correct transaction architecture, a correct set of parameter settings, and testing your application under a heavy load with volume data to make sure that the application scales well. The primary focus of this article is OLTP applications, and we will focus on locking and blocking in applications and how to resolve the blocking conflicts.
Transactions
A transaction is essentially a sequence of operation that is
per-formed as a single logical unit of work, and that logical unit of work
must adhere to the ACID properties We, as programmers, are
responsible for starting and ending transactions at points that
enforce the logical consistency of the data The ANSI standards
state the Isolation Levels for these transactions and it is the
respon-sibility of an enterprise database system, such as SQL Server, to
pro-vide mechanisms ensuring the physical integrity of each
transac-tion SQL Server provides:
• Locking facilities that preserve transaction isolation.
• Logging facilities that ensure transaction durability. Even if the server hardware, operating system, or SQL Server itself fails, SQL Server uses the transaction logs, upon restart, to automatically roll back any incomplete transactions to the point of the system failure.
• Transaction management features that enforce transaction atomicity and consistency. After a transaction has started, it must be successfully completed, or SQL Server undoes all of the data modifications made since the transaction started.
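To make the "logical unit of work" concrete, here is a minimal sketch of an explicit transaction with the kind of error checking discussed later in this article. The table and column names are placeholders.

BEGIN TRANSACTION

UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 1
IF @@ERROR <> 0
BEGIN
    ROLLBACK TRANSACTION   -- undo the partial work; the unit of work is all or nothing
    RETURN
END

UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountID = 2
IF @@ERROR <> 0
BEGIN
    ROLLBACK TRANSACTION
    RETURN
END

COMMIT TRANSACTION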
Locking prevents users from reading data being changed by other users, and prevents multiple users from changing the same data at the same time. If locking is not used, data within the database may become logically incorrect, and queries executed against that data may produce unexpected results. Although SQL Server enforces locking automatically, you can design applications that are more efficient by understanding and customizing locking in your applications.
How locking is implemented decides how much concurrency (along with performance and scalability) is allowed in the application. It is important to know and understand how to maintain the logical unit of work and correctly manage the locks in the application code. Due to poor application design, incorrect settings on the server, or poorly written transactions, locks can conflict with other locks, leading to a high number of waits, slowing down the response of the system and resulting in a non-scalable solution.
Difference between Blocking and Deadlocks
Many people confuse blocking with deadlocks. Blocking and deadlocks are two very different occurrences. Blocking occurs due to one transaction locking the resources that another transaction wants to read or modify, usually when one connection holds a lock and a second connection requires a conflicting lock type. This forces the second connection to wait, blocked on the first. Any connection can block any other connection, regardless of from where they emanate. Most blocking conflicts are temporary in nature, and will resolve themselves eventually unless you have hung transactions.
Deadlocks are much worse than blocking. A deadlock occurs when the first transaction has locks on the resources that the second transaction wants to modify, and the second transaction has locks on the resources that the first transaction intends to modify. So, a deadlock is much like an infinite loop: if you let it go, it will continue indefinitely. Fortunately, SQL Server has a built-in algorithm for resolving deadlocks. It will choose one of the deadlock participants and roll back its transaction, sending the user the following message:
"Your transaction (process ID #x) was deadlocked on {lock | communication buffer | thread} resources with another process and has been chosen as the deadlock victim. Rerun your transaction."
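The classic pattern looks something like the following, run from two connections at roughly the same time (the table and column names are placeholders). Each session acquires its first lock successfully and then blocks waiting for the other's, at which point SQL Server detects the cycle and rolls one of them back.

-- Session 1
BEGIN TRAN
UPDATE dbo.Accounts SET Balance = Balance - 10 WHERE AccountID = 1   -- locks Accounts row 1
-- ...meanwhile Session 2 runs its first UPDATE...
UPDATE dbo.Orders SET Status = 'Paid' WHERE OrderID = 5              -- now waits on Session 2
COMMIT TRAN

-- Session 2
BEGIN TRAN
UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderID = 5           -- locks Orders row 5
UPDATE dbo.Accounts SET Balance = Balance + 10 WHERE AccountID = 1   -- waits on Session 1: deadlock
COMMIT TRAN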
com-Causes of Blocking
Here are the common causes of blocking:
• De-normalized data-model design:
More often than not, blocking problems are due to poor application design and data-model design. A transactional database model should be highly normalized. There are several normalization rules that you should adhere to when designing your database. We won't go into the details of normalization, but to summarize the concept: you should not keep any redundant data in your database. Transactional databases should not have any repeating columns, and each piece of data should be stored only once. That way, the transactions modify lean tables and release locks quickly. Typically, adhering closely to the third normal form works fine, with a little de-normalization done at times to improve performance.
appli-• Lack of properly designed indexes:
The lack of appropriate indexes can often cause blocking lems as well If indexes are missing, then SQL Server might decide
prob-to acquire a table lock for the duration of your transaction Therest of the connections will be blocked until your transaction iscommitted or rolled back For the queries that are written using
“SELECT…WITH (UPDLOCK)”, or the Update statements & Deletestatements, make sure that you have verified the execution plan
to ensure that the access will based be based on indexes,preferably via an index seek operation Design your indexescarefully to avoid lock escalations as well
• Bad transaction design:
Poorly written transactions are by far the most common cause of blocking. Here are some scenarios that should be avoided when designing transactions:
a) Transactions that ask for an input value from an interface in the middle of a transaction. Imagine the user decides to take a break in the middle of the transaction. The transaction will hold the locks until the user inputs the value. Therefore, never ask for user input inside a transaction.
b) Keep your transactions as small as possible so that you do not hold locks for a long time. This becomes especially important in the case of SQL Server (and its default isolation level of READ COMMITTED), where readers (selects) block writers and writers (DML statements like DELETE, UPDATE and INSERT) block readers.
c) Submitting queries that have long execution times. A long-running query can block other queries. For example, a DELETE or UPDATE operation that affects many rows can acquire many locks that, whether or not they escalate to a table lock, block other queries. For this reason, you generally do not want to mix long-running decision support queries and online transaction processing (OLTP) queries on the same database. The solution is to look for ways to optimize the query by changing indexes, breaking a large, complex query into simpler queries, or running the query during off hours or on a separate computer.
d) One reason queries can be long-running, and hence cause blocking, is if they inappropriately use cursors. Cursors can be a convenient method for navigating through a result set, but using them may be slower than set-oriented queries. So, whenever possible, try avoiding the use of cursors and make use of a more set-based approach.
e) Cancelled queries: This is one of the very common reasons for seeing locks/blocks in the system. When a query is cancelled by the application (for example, by using the SQLCancel function when using ODBC, or because of a query timeout/lock timeout), the application also needs to issue the required number of rollback and/or commit statements. Canceling a query, or a failed/timed-out query in a transaction, does not mean that the transaction will be automatically rolled back or committed. All locks acquired within the transaction are retained after the query is canceled. Subsequent transactions executed under the same connection are treated as nested transactions, so all the locks acquired in these completed transactions are not released. This problem repeats with all the transactions executed from the same connection until a ROLLBACK is executed. As a result, a large number of locks are held, users are blocked, and transactions are lost, which results in data that is different from what you expect. Applications must properly manage transaction nesting levels by committing or rolling back canceled transactions.
f) Transactions should be designed such that the end user is not allowed to enter bad data for the fields; i.e., do not design an application that allows users to fill in edit boxes that generate a long-running query. For example, do not design an application that allows certain fields to be left blank, or a wildcard to be entered, as this may cause the application to submit a query with an excessive running time, thereby causing a blocking problem. These can be avoided by using client-side code to check the values.
g) If the SET LOCK_TIMEOUT value is set very high, then wait times will increase and hence blocks will increase. Make sure that you are using reasonable values for this SET option and have logic in place in the application to handle the 1222 error that arises because of the lock timeout.
h) If the application is not using parameterized queries, then every SQL statement will get parsed, compiled and executed each time, unless it is a very simple SQL statement, in which case auto-parameterization is done by SQL Server. In most cases, however, SQL Server will not be able to auto-parameterize the statements, and hence the time taken for the SQLs will be greater since the execution plan cannot be re-used. That can result in longer waits and latches. A well-designed OLTP application (OLAP is different) should always make use of parameterized queries (also known as bind variables usage) to parse and compile such queries once and execute them many times (a sketch appears after this list of scenarios).
i) Nested transactions, savepoints and proper error checks: Make sure that you are checking @@TRANCOUNT. In the case of nested transactions, the transaction is either committed or rolled back based on the action taken at the end of the outermost transaction. If the outer transaction is committed, the inner nested transactions are also committed. If the outer transaction is rolled back, then all inner transactions are also rolled back, regardless of whether or not the inner transactions were individually committed. So, do not assume that the inner transaction results are saved even if the outermost transaction fails.
Use savepoints only in situations where errors are unlikely to occur. The use of a savepoint to roll back part of a transaction in the case of an infrequent error can be more efficient than having each transaction test to see if an update is valid before making the update. Updates and rollbacks are expensive operations, so savepoints are effective only if the probability of encountering the error is low and the cost of checking the validity of an update beforehand is relatively high.
In T-SQL code, have proper error checks in the code after every statement. Depending upon the error, the transaction may or may not abort, and the code should take care of those scenarios.
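A hedged sketch of the parameterized approach mentioned in item h), using sp_executesql so the plan is compiled once and reused. The table, columns, and parameter value are placeholders.

DECLARE @CustomerID int
SET @CustomerID = 42

EXEC sp_executesql
     N'SELECT OrderID, OrderDate
       FROM dbo.Orders
       WHERE CustomerID = @CustomerID',
     N'@CustomerID int',
     @CustomerID = @CustomerID   -- the same plan is reused for every customer value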
• Bad use of locking hints or query hints:
Inappropriate use of locking hints can be yet another cause of blocking. If you force SQL Server to acquire 50,000 row-level locks, your transaction might have to wait until other transactions complete and that many locks are available.
The most commonly used locking hints are ROWLOCK, UPDLOCK, NOLOCK and READPAST. Be very careful when you are using them, and understand how your application works before starting to use them.
The other query hints like "FAST n", "FORCE ORDER" etc., essentially the join hints, index hints, view hints and table hints, need to be used judiciously. You should know and test all flows in the application with volume data (and multi-user scenarios) before putting that code into production.
• Configuration options for the instance:
Most often, the default options for lock configuration are fine, so these should be considered only when everything else has been done.
a) Locks option: Use the locks option to set the maximum number of available locks, thereby limiting the amount of memory SQL Server uses for locks. The default setting is 0, which allows SQL Server to allocate and deallocate locks dynamically based on changing system requirements. When the server is started with locks set to 0, the lock manager allocates two percent of the memory allocated to SQL Server to an initial pool of lock structures. As the pool of locks is exhausted, additional locks are allocated. The dynamic lock pool does not allocate more than 40 percent of the memory allocated to SQL Server. Each lock consumes 96 bytes of memory, hence increasing this value can require an increase in the amount of memory dedicated to the server. (A script to check the current values appears after item e below.)
b) Customizing locking for indexes: You can change the lock escalation behavior for indexes by using the sp_indexoption procedure, for example to disallow page-level locks.
c) Query wait option: Memory-intensive queries, such as those involving sorting and hashing, are queued when there is not enough memory available to run the query. The query times out after a set amount of time calculated by SQL Server (25 times the estimated cost of the query) or the time amount specified by the non-negative value of the query wait option. A transaction containing the waiting query may hold locks while the query waits for memory. Decreasing the query wait time lowers the probability of such deadlocks. Eventually, a waiting query will be terminated and the transaction locks released. However, increasing the maximum wait time may increase the amount of time for the query to be terminated. Changes to this option are not typically recommended and should be made only when absolutely necessary.
d) Memory configuration: Typically, you would want to let SQL Server manage memory using the dynamic memory management configuration. If at all you have to play with the "min server memory" and "max server memory" options, do so judiciously. If enough memory is not available, then the memory needs of the lock manager will not be satisfied, leading to waits and blocks.
e) Other advanced configuration options to look into are "max degree of parallelism", "query governor cost limit" and "AWE". The discussion of those is out of scope for this article. In a future article, I will cover the advanced options and how they affect SQL Server configuration.
• Badly written queries:
There is really no substitute for a well-written application. Please make sure that you are using well-tuned SQL queries in your application. Run them through a benchmark database with a reasonable and representative amount of data, and test with different conditions. Trace the application code using SQL Server Profiler, use the SET commands in Query Analyzer to look into the execution plan and the I/O associated with the SQLs, and tune them.
• Usage of incorrect isolation levels:
Understanding the most appropriate isolation level for your application is important, for both concurrency and performance, while still maintaining the appropriate level of accuracy. The concept of isolation level is not new; in fact, details regarding the ANSI specifications for isolation can be found at www.ansi.org, and the current specification to review is ANSI INCITS 135-1992 (R1998).
Isolation Levels

Isolation Level | Dirty Read (Possible Phenomena) | Non-repeatable Read (Possible Phenomena) | Phantom (Possible Phenomena)
Read uncommitted | Yes | Yes | Yes
Read committed | No | Yes | Yes
Repeatable read | No | No | Yes
Serializable | No | No | No

The application usage for each of the above varies based on the desired level of "correctness" and the trade-off chosen in performance and administrative overhead.

Isolation Level and Application Best Suited

Isolation Level | Best Suited For An Application When:
Read uncommitted | The application does not require absolute accuracy of data (and could get a larger/smaller number than the final value) and wants performance of OLTP operations above all else. No version store, no locks acquired, no locks are honored. Data accuracy of queries in this isolation may see uncommitted changes.
Read committed | The application does not require point-in-time consistency for long-running aggregations or long-running queries, yet wants data values which are read to be only transactionally consistent. The application does not want the overhead of the version store at the trade-off of potential incorrectness for long-running queries because of non-repeatable reads.
Repeatable read | The application requires absolute accuracy for long-running multi-statement transactions and must hold all requested data from other modifications until the transaction completes. The application requires consistency for all data which is read repeatedly within this transaction and requires that no other modifications are allowed; this can impact concurrency in a multi-user system if other transactions are attempting to update data that has been locked by the reader. This is best when the application is relying on consistent data and plans to modify it later within the same transaction.
Serializable | The application requires absolute accuracy for long-running multi-statement transactions and must hold all requested data from other modifications until the transaction completes. Additionally, the transactions are requesting sets of data and not just singleton rows. Each of the sets must produce the same output at each request within the transaction, and with modifications expected, no other users can modify the data which has been read, and new rows must be prevented from entering the set. This is best when the application is relying on consistent data, plans to modify it later within the same transaction, and requires absolute accuracy and data consistency, even at the end of the transaction (within the active data).
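The isolation level is set per connection and stays in effect until it is changed. For example, a batch that needs repeatable results within a transaction might do the following (the table name is a placeholder):

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ

BEGIN TRAN
SELECT COUNT(*) FROM dbo.Orders WHERE OrderDate >= '20050101'
-- ...other statements that must see the same rows...
COMMIT TRAN

SET TRANSACTION ISOLATION LEVEL READ COMMITTED   -- back to the default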
• Deciding the concurrency model:
Deciding which concurrency model to use for your application is very critical. You need to fully understand what each concurrency model does and in what scenarios it can and should be used, before deciding on what changes you need to make in the application.
a) Pessimistic concurrency model:
Pessimistic concurrency control locks resources as they are required, for the duration of a transaction. Unless deadlocks occur, a transaction is assured of successful completion. Under a pessimistic concurrency control-based system, locks are used to prevent users from modifying data in a way that affects other users. Once a lock has been applied, other users cannot perform actions that would conflict with the lock until the owner releases it. This level of control is used in environments where there is high contention for data and where the cost of protecting the data using locks is less than the cost of rolling back transactions if/when concurrency conflicts occur.
When pessimistic locking (the ANSI standard for transaction isolation) is used, applications typically exhibit blocking. Simultaneous data access requests from readers and writers within transactions request conflicting locks; this is entirely normal and, provided the blocking is short lived, not a significant performance bottleneck. This can change on systems under stress, as any increase in the time taken to process a transaction (for example, delays caused by over-utilized system resources such as disk I/O, RAM or CPU, as well as delays caused by poorly written transactions such as those with user interaction) can have a disproportional impact on blocking: the longer the transaction takes to execute, the longer locks are held and the greater the likelihood of blocking.
b) Optimistic concurrency model:
Optimistic concurrency control works on the assumption that resource conflicts between multiple users are unlikely (but not impossible), and allows transactions to execute without locking any resources. Only when attempting to change data are resources checked to determine if any conflicts have occurred. If a conflict occurs, the application must read the data and attempt the change again. Under an optimistic concurrency control-based system, users do not lock data when they read it. Instead, when an update is performed the system checks to see if another user changed the data after it was read. If another user updated the data, an error is raised. Typically, the user receiving the error rolls back the transaction, resubmits (application/environment dependent) and/or starts over. This is called optimistic concurrency because it is mainly used in environments where there is low contention for data, and where the cost of occasionally rolling back a transaction outweighs the costs of locking data when read.
Optimistic concurrency can be implemented in SQL Server either by using a rowversion (timestamp) data type or by using an integer column for the row versioning. While updating the record, the client session updates based on the primary key column(s) and the integer column's old value, and also increments that value by 1. The client reads the row with the current value of the column but does not maintain any locks in the database. At some later time, when the client wants to update the row, it must ensure that no other client has updated that record in the interim; that is done by including the old value in the WHERE clause. This second part of the WHERE clause provides the "locking": if some other client has updated the record, the WHERE clause will fail to pick up any rows. The client then uses this signal (a zero-row update) as an indication of lock failure and can choose to re-read the data or ask the end user to redo their work. If contention for the data is low, this approach is well suited to your business, and it provides the highest degree of concurrency of the models discussed here.
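A minimal T-SQL sketch of this pattern is shown below; the dbo.Customers table, its integer RowVer column, and the variable values are hypothetical.

-- Illustrative only: table, column and variable names are assumptions.
DECLARE @CustomerID int, @NewName nvarchar(100), @OldRowVer int;
SELECT @CustomerID = 42, @NewName = N'New company name';

-- Read the row and remember the current version value; no locks are held
-- once the SELECT completes.
SELECT @OldRowVer = RowVer
FROM dbo.Customers
WHERE CustomerID = @CustomerID;

-- Later, update only if nobody has changed the row in the interim.
UPDATE dbo.Customers
SET CustomerName = @NewName,
    RowVer = RowVer + 1
WHERE CustomerID = @CustomerID
  AND RowVer = @OldRowVer;        -- the old-value check provides the "locking"

IF @@ROWCOUNT = 0
BEGIN
    -- Zero rows updated: another session changed (or deleted) the row.
    -- Re-read the data or ask the user to redo their work.
    RAISERROR('The record was modified by another user.', 16, 1);
END

If the rowversion (timestamp) data type is used instead of an integer, SQL Server generates a new value automatically on every update, so the explicit RowVer = RowVer + 1 is not needed.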
c) Disconnected/”logical lock” Model:
Besides these two concurrency options, which are used 99.99% of the time in applications, people sometimes come up with different ways of implementing concurrency control in their applications. For instance, using logical locks: one connection's transaction updates a column in a table with an "in-use" value and updates a datetime column with the time the update was made. Before another connection tries to make updates to that record, it checks that column for the special "in-use" value and checks how long the record has been in use (current system date minus the datetime column value). Based on a threshold (typically a time-out setting) for how long that lock has existed, it either overrides the lock or just gives back a lock time-out message after a specified wait interval.
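For illustration only (its drawbacks are discussed next), a sketch of such a logical lock might look like the following; the dbo.WorkItems table, the LockedBy/LockedAt columns, and the 5-minute threshold are all assumptions.

-- Try to take the "logical lock": succeed if the row is free or if the
-- previous lock is older than the (arbitrary) 5-minute threshold.
DECLARE @WorkItemID int, @UserName sysname;
SELECT @WorkItemID = 101, @UserName = SUSER_SNAME();

UPDATE dbo.WorkItems
SET LockedBy = @UserName,
    LockedAt = GETDATE()
WHERE WorkItemID = @WorkItemID
  AND (LockedBy IS NULL
       OR DATEDIFF(minute, LockedAt, GETDATE()) > 5);

IF @@ROWCOUNT = 0
BEGIN
    -- Another user holds a non-expired logical lock on this record.
    RAISERROR('Record is currently in use by another user.', 16, 1);
END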
I have seen people use this approach in applications where the UI is designed to capture too much user data at once, and where, according to them, optimistic or pessimistic concurrency is ruled out. The design of the screens themselves is the real issue in such scenarios; people do not realize that UI applications do not work the same way as typical connected client-server applications.
Such an approach is fraught with danger: until the threshold is reached, the record is simply unavailable, which in itself violates a basic rule of transaction design (transactions should not be spread across a long period of time), not to mention the effect this approach has on concurrency in a heavily loaded, multi-user application. Whenever you see such an application scenario, you can re-design it to use optimistic concurrency and scripting logic.
• Orphaned Sessions:
An orphaned session is a session that remains open on the server side after the client has disconnected. Do not confuse orphaned sessions with orphaned users. Orphaned users are created when a database is backed up and restored to another system that does not have a corresponding user account configured. Orphaned sessions occur when the client is unable to free the network connections it is holding when it terminates. If the client terminates cleanly, Windows closes the connection and notifies SQL Server. If SQL Server is processing a client command, it will detect the closed connection when it ends the session. Client applications that crash or have their processes killed (for example, from Task Manager) are cleaned up immediately by Windows NT, rarely resulting in an orphaned session.
One common cause of orphaned sessions arises when a client computer loses power unexpectedly, or is powered off without performing a proper shutdown. Orphaned sessions can also occur due to a "hung" application that never completely terminates, resulting in a dead connection. Windows will not know that the connection is dead and will continue to report it as active to SQL Server. SQL Server, in turn, keeps the session open and continues to wait for a command from the client.
Issues with Orphaned Sessions:
Open sessions take up one of the SQL Server network connections. The maximum number of connections is limited by the number of server CALs; therefore, orphaned sessions may prevent other clients from connecting.
Typically, a more important issue is that open sessions use server resources and may have open cursors, temporary tables, or locks. These locks may block other connections from performing useful work, and can sometimes result in a major "pile up" of locks. In severe cases, it can appear that SQL Server has stopped working.
Resolutions:
sysprocesses (or stored procedures such as sp_who/sp_who2) reports information on existing server sessions. Possible orphaned sessions can be identified if the status of a process is "awaiting command" and the interval of time found by subtracting last_batch from GETDATE() is longer than usual for that process. If the session's hostname is known to be down, it is orphaned.
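As a sketch, such a check might look like the following; the 60-minute idle threshold is an arbitrary assumption.

-- Idle sessions ("AWAITING COMMAND") that have not submitted a batch for a
-- while are candidates for being orphaned.
SELECT spid, hostname, program_name, loginame, last_batch, status, cmd
FROM master..sysprocesses
WHERE cmd = 'AWAITING COMMAND'
  AND DATEDIFF(minute, last_batch, GETDATE()) > 60;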
Windows checks inactive sessions periodically to ensure they are still active. If a session does not respond, it is closed and SQL Server is notified. The frequency of the checking depends on the network protocol and registry settings. However, by default, Windows NT only performs a check every one to two hours, depending on the protocol used. These configuration settings can be changed in the registry.
To close an orphaned SQL Server session, use the KILL command. All resources held by the session are then released. If orphaned sessions become a problem, registry settings can be changed on Windows to increase the frequency with which clients are checked to verify they are active. Changing these settings affects other application connections as well, so the impact should be considered carefully before making any changes.
Resolving issues through the Query Analyzer:
At times, the orphaned session could be created by the application, and this session will be holding locks on some tables; other users will then not be able to query or write to these tables. The download contains code that you can use to find the orphaned sessions and resolve the issues. A spid value of -2 is an indicator of connectionless, or orphaned, transactions. You can identify these in sp_lock (spid column), sp_who or sp_who2 (blk column), syslockinfo, and the sysprocesses table.
There are two things that you may notice:
1.) The spid value of -2. In this case you will have to use KILL UOW to kill the process. UOW is the Unit of Work of the DTC transaction and can be obtained from the syslockinfo table.
2.) The last statement that was being executed by the process is sp_cursorunprepare or sp_unprepare, which are API cursor calls. The situation in this case could be that the application forgot to clean up the session after an error condition, hence leaving an orphaned session on the server. In this case, you will have to use KILL spid to terminate the process (spid is the system process id for that particular process), and then fix the application code to close out the session in case of an error.
You can also use KILL spid/UOW WITH STATUSONLY to check the status of the KILL statement.
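Putting these together, the sequence might look like this; the UOW value and the spid shown are placeholders only.

-- Orphaned DTC transactions show up with a spid of -2 in syslockinfo.
SELECT DISTINCT req_transactionUOW
FROM master..syslockinfo
WHERE req_spid = -2;

-- Kill by unit of work, pasting in the GUID returned above (this one is a placeholder).
KILL 'D5499C66-E398-45CA-BF7E-DC9C194B48CF';
KILL 'D5499C66-E398-45CA-BF7E-DC9C194B48CF' WITH STATUSONLY;

-- For an ordinary orphaned session, kill by spid and check progress
-- (61 is only an example spid).
KILL 61;
KILL 61 WITH STATUSONLY;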
SQL Server caveats for the Oracle DBA:
Unlike Oracle, SQL Server 2000 does not implement a multi-version concurrency model (this is going to change in SQL Server 2005, code-named "Yukon"). In SQL Server 2000, readers and writers block each other in the default READ COMMITTED transaction isolation level. This comes as a big surprise to Oracle DBAs, and it is one of the reasons why applications written with Oracle in mind exhibit significant concurrency issues when ported to SQL Server 2000. Understanding the transaction and locking architecture in SQL Server 2000, and modifying the application code accordingly, is needed in order to scale such an application on SQL Server.
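A quick way to see the difference, assuming a hypothetical dbo.Orders table, is to run the following from two separate connections:

-- Connection 1: modify a row and leave the transaction open.
BEGIN TRAN;
UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderID = 1;
-- (no COMMIT yet)

-- Connection 2: under the default READ COMMITTED level in SQL Server 2000,
-- this SELECT blocks until connection 1 commits or rolls back; in Oracle the
-- reader would simply see the last committed value.
SELECT Status FROM dbo.Orders WHERE OrderID = 1;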
I will be covering the new isolation levels in SQL Server 2005, Snapshot and Read Committed Snapshot, in another article; they make porting such an application to SQL Server a breeze.
Troubleshooting & Resolving blocking:
Most of the discussion in this article so far has covered what the blocking problems are, what causes them, and how to mitigate such issues at design time. If proper considerations are taken at design time, you can develop a very robust system that scales very well.
However, regardless of that, you will run into some blocking issues once the application is deployed under heavy load and heavy multi-user scenarios. That is just the nature of the beast. Also, if you inherit an application in which good design considerations were not adhered to, or if you are a consultant who has been asked to find and fix the problems, then you need to know how to detect application/database locking issues and how to resolve them. The next few paragraphs detail such scripts and give you links to some Microsoft KB articles that will help you in detecting such issues.
Detecting Issues:
You should use SQL Server Profiler and T-SQL scripts in order to detect and log the locking and blocking issues in your environment.
a) T-SQL scripts:
• syslockinfo
• sp_who/sp_who2
• system objects used to get meta-data information, such as sysobjects, syscolumns, etc.
• the DBCC INPUTBUFFER command
There is an sp_blocker_pss80 procedure published by the Microsoft PSS team for this, and there are many variants of it used by DBAs around the world. Here is that link:
http://support.microsoft.com/default.aspx?scid=kb;en-us;271509
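For example, a quick check for blocked sessions against sysprocesses might look like this (the spid passed to DBCC INPUTBUFFER is only an example):

-- Sessions whose "blocked" column is non-zero are waiting on the spid it names.
SELECT spid, blocked, waittime, lastwaittype, waitresource,
       cmd, hostname, program_name, loginame
FROM master..sysprocesses
WHERE blocked <> 0;

-- Inspect the last batch submitted by the blocking session
-- (substitute the spid reported in the "blocked" column above).
DBCC INPUTBUFFER (53);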
b) SQL Server Profiler:
A sample Profiler template is shown below and can be used as the starting point, with additional filters added as per your application needs. The same template can be used for deadlocks as well, if you want to run it for an extended time and know that you will be able to trace the deadlock event in case it happens (deadlocks will be covered in part II of this article). Modify this trace template to select/remove the events and data columns as per your application needs.
Always run the SQL Server Profiler trace from a client machine rather than running it directly on the production server. Be aware that Profiler is just a GUI tool and the trace is really a server-side trace; it just gives you good visibility into the data in a GUI format. You can also script server-side traces for detecting issues; I had written an article before (link) that shows how to use server-side scripting for traces. Once you are done with the trace, you can save the output of the trace file into a SQL Server table and directly query the data from that table to diagnose the flow of events and the issues. Alternatively, you can also directly query the trace files using the fn_trace_gettable function (look up BOL for more information on it).
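For instance, assuming a trace file saved to a hypothetical path:

-- Load the saved trace file into a table and query it (SQL Server 2000 syntax).
SELECT *
INTO dbo.BlockingTrace
FROM ::fn_trace_gettable('C:\Traces\Blocking.trc', default);

SELECT StartTime, SPID, Duration, TextData
FROM dbo.BlockingTrace
ORDER BY StartTime;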
Resolving blocking issues:
a) Application Considerations: If you deduce that the blocking is caused by a poorly written transaction, try to rewrite it. Often, a single transaction might bring the entire application to its knees. Other times, you'll have to review many (or all) stored procedures of your application before you can resolve the problems.
If you see many table locks in your database, you might want to evaluate your indexing strategy. The blocking problems can often be resolved by adding appropriate