SQL Server Standard



Visit us on the World Wide Web at www.sqlservercentral.com

A technical journal for the SQLServerCentral.com and PASS communities


A publication of The Central Publishing Group

Typesetting and Layout:
Frank Scafidi, Rob Anderson

Subscriptions and address changes:

For subscription and address changes, email subscriptions@sqlservercentral.com. For renewals, you can extend your subscription at www.sqlservercentral.com/store.

Feedback:

editor@sqlserverstandard.com

Copyright

Unless otherwise noted, all programming code and articles in this issue are the exclusive copyright of The Central Publishing Group. Permission to photocopy for internal or personal use is granted to the purchaser of the magazine.

SQL Server Standard is an independent publication and is not affiliated with Microsoft Corporation. Microsoft Corporation is not responsible in any way for the editorial policy or other contents of this publication. SQL Server, ADO.NET, Windows, Windows NT, Windows 2000, and Visual Studio are registered trademarks of Microsoft Corporation. Rather than put a trademark symbol in each occurrence of other trademarked names, we state that we are using the names only in an editorial fashion with no intention of infringement of the trademark. Although all reasonable attempts are made to ensure accuracy, the publisher does not assume any liability for errors or omissions anywhere in this publication. It is the reader's responsibility to ensure that the procedures are acceptable in the reader's environment and that a proper backup is created before implementing any procedures.

SQLServerCentral.com Staff:

Brian Knight, President

Steve Jones, Chief Operating Officer

Andy Warren, Chief Technology Officer

You can reach the SQL Server Standard at:

To that end we've included a look at a variety of performance-related topics. We have a great article on disk contention with multiple tasks running. While the article looks at scheduled jobs that may conflict, any large processes, scheduled or not, might have similar issues. Greg Gonzalez, architect of sqlSentry, has written a fantastic reference about your disk system, and one that you should use to examine the periodic slow performance of any server, looking for overlapping processes.

We also have noted author Rahul Sharma's look at locking, blocking, and deadlocks. This is one that will probably teach anyone something about this fundamental database process; I know I learned a couple of things when reading it.

We also examine the performance of GUIDs, a topic that I have not seen anything about, despite the fact that Microsoft pushes their use. Sean McCown presents his research and some benchmarks on their use in comparison with integers and the identity property.

We have a couple of security-related topics as well this time: one very detailed look at the various ways you can discover all those hidden SQL Servers on your network using various tools, written by Alan Miner, as well as a good introduction to SQL injection from Dinesh Asanka. Our best wishes to Dinesh, his family, and friends as they cope with the tsunami damage in Sri Lanka. He's OK, but there is still a lot to deal with and get past.

Lastly we have Randy Dyess of www.transactsql.com with a fantastic explanation of why you should have clustered indexes on your tables. He's taken a look at the performance impacts of forwarding pointers, something else that doesn't seem to ever have been tackled on the web.

This has been an interesting issue and one that's definitely taught me a thing or two. Hopefully you'll enjoy it and take something away as well that you can use to make your systems run a little smoother.

And your phone a little quieter.

Steve Jones


TABLE OF CONTENTS

SCHEDULED JOBS AND DISK CONTENTION • 6

This Isn't An Issue That Is Likely To Go Away Soon

An examination of disk contention and performance with multiple jobs running at the same time. By Greg Gonzalez

LOCKING, BLOCKING AND DEADLOCKS • 13

It Is Important To Know And Understand How To Maintain The Logical Unit Of Work

A detailed explanation of locks and blocks that can occur on a SQL Server, with an examination of causes and potential ways to avoid issues. By Rahul Sharma

DISCOVERING SQL SERVERS • 20

My General Search Strategy Is To Find All The MSS Candidates Using A Variety Of Sources And Techniques.

A look at finding and identifying SQL Servers on your networks using a variety of tools. By Alan Miner

WHAT'S WRONG WITH GUIDs • 25

So, In Short, Don't Let Your Developers Use GUIDs

Some analysis on the performance impacts (bad) of using GUIDs for a primary key versus integers with the identity property. By Sean McCown

Even If Uniqueidentifiers Aren't Unique, They're A Damned Sight More Unique Than Integers.

An alternative point of view on why GUIDs have a time and place in your database.

FORWARDING POINTERS • 26

Logically, Forwarding Pointers Should Mean That SQL Server Has To Read Extra Data Pages

A detailed examination of clustered indexes and forwarding pointers with the performance impact on data retrieval. By Randy Dyess

IS YOUR DATABASE SECURE? • 28

They Are Like Guerrilla War Fighters; One Tiny Fault Is More Than Enough For Them To Create A Mess

A look at SQL injection potential problems in your applications. By Dinesh Asanka

PASS • 31

A featured interview with Rony Ross. By Steve Mong


SCHEDULED JOBS AND DISK CONTENTION

By: Greg Gonzalez

InterCerve, Inc.

Introduction

Heavy contention for disk resources can dramatically impact SQL Server performance, and SQL Server Agent jobs can be some of the biggest offenders. In this article I'll cover how and why jobs and job collisions can cause disk contention, how to isolate the sources of disk contention using Windows performance counters and analyze the data using some simple formulas, and then I'll present a process to reduce contention in general via "leveling" your job schedules.

Jobs and Disk Contention

It's important to remember that disk contention is only one type of resource contention that can affect SQL Server performance, but it is a significant one. Likewise, the role jobs can play in this regard is significant enough to merit the focus of this article.

As you know, jobs can perform all kinds of operations, including but not limited to database maintenance activities (index rebuilds, backups, integrity checks, etc.), ETL (import/export) processes such as those using DTS and BCP, and many other operations that tend to read and write large amounts of data to or from disk. In an ideal world every database would have separate disk controllers and disk arrays to handle its data files, index files, transaction logs, backups, etc. But disk hardware is costly, and multiple servers plus database server licenses are often out of the question, so this is often the exception rather than the rule, and it seems as if there are never enough disk resources to go around. If that's the case in your environment and your jobs are doing work outside of SQL Server and utilizing the same physical disk resources used by your database files, major performance headaches can result.

In addition to disk "resource sharing", compounding the problem is that the native tools make it all too easy to create SQL Agent jobs with overlapping, or "colliding", schedules. For example, over time you may end up with:

• a transaction log backup running every hour on the hour

• a DTS import which runs every 30 minutes

• a data archive job running every 15 minutes

• an index defrag job which runs nightly at 4am

In this case we have several recurring collisions:

Collision Recurrence | Job Collisions | Collisions per Day | # of Distinct Collisions | Distinct Coll. per Day
Every 24 hours | Data Archive, DTS Import, Trans Log, Index Defrag | 1 | 6 | 6
Total Distinct Collisions per Day: 99

Table 1: Job Collisions Example

Collisions per Day: The total number of times the jobs will collide each day based upon their schedules.

# of Distinct Collisions: This is the total number of distinct collisions for each combination of jobs. For example, every hour the Data Archive job will collide with the DTS Import job and the Transaction Log job (2 collisions), and the DTS Import job will collide with the Transaction Log job (1 collision), for a total of 3 distinct collisions.

Distinct Collisions per Day: Collisions per Day multiplied by # of Distinct Collisions.

Total Distinct Collisions per Day: This is calculated by summing Distinct Collisions per Day (126), then eliminating duplicates by backing out the collisions which have already been accounted for in one of the previous collision combinations: 48 + (24*2) + (1*3) = 99.

So with only four jobs, we actually have a total of 99 distinct collisions per day! Needless to say, most SQL Servers have more than 4 jobs, so it's likely your collision total is higher than this. In addition, SQL Server 2005 actually introduces the concept of "shared schedules", where multiple jobs can reuse the exact same schedules! That said, this isn't an issue that is likely to go away anytime soon.

So why are schedule collisions a problem? Because, for the reasons mentioned above, the result is often disk contention, which leads directly to the phenomenon where the aggregate duration and performance impact of simultaneously processed tasks will be greater than the duration and impact of the same tasks processed independently.

Disk contention happens because when reading from and writing to multiple files and disk sectors on the same physical disk resources simultaneously, the operating system and disk subsystem have to do a lot of extra work. Extra seeks and platter rotations result, during which time no read/write activity can occur. To put it another way, disk controllers and disks have limited throughput, so when the subsystem is overloaded requests end up being queued, which causes disk transfer delays. As a result:

• Jobs can't achieve their optimal runtimes. This can lead to system slowdowns and maintenance window overruns, among other problems.

• Application-related DML activity takes longer, manifesting in delays for end users.


• If the contention is severe enough, "buffer latch timeout" and other errors can occur.

• Because of all of the extra ongoing work, the lifespan of your disk resources can be dramatically shortened.

If nothing is done about it, what can result is a compounding effect where everything tends to run slower, and you end up with frustrated users as well as premature hardware/software upgrades because you aren't able to get the most out of available resources.

Measuring Disk Contention

Part of the problem in isolating disk contention issues is that it can be a challenge using the available Windows performance counters to determine exactly what is happening with disk performance, isolate the processes involved, and figure out what they are doing. This is because many of the disk counters are "general" in nature in that they reflect the total activity on a server or disk.

What is needed is a way to interpret the general counter data in order to determine the source of the activity. Fortunately, with some of the counters we can isolate activity directly related to the various SQL Server processes. From there we can calculate percentages of activity related to SQL Server as well as other processes. With these counters and a few simple formulas, you can gain greater insight into the counter data to determine if disk bottlenecks are causing performance problems for your SQL Servers.

Simulating Contention

If you're like me, your databases have only grown larger over time, but your maintenance windows haven't. This has heightened the need to optimize backup performance to ensure backups always run as quickly as possible and avoid contending with other activities on the SQL Server. Products such as compressed backup software have become one of a DBA's best tools to combat backup size and speed issues. But even with compressed backups, disk contention can still be an issue.

So for these tests we will look at how a database backup job can fight with other jobs for disk resources, ending up with less than optimal performance for all processes involved. For each test I'll use a SQL Agent job which performs a standard non-compressed database backup to a local disk, a common scenario. Then to create the disk contention conditions, I'll combine the backup with other write-intensive jobs, for a total of four separate tests:

1. Database backup job only.

2. Database backup job, plus an "Archive" job performing heavy DML activity. The job simulates an "archive" function by copying approximately 50MB of data in 700,000 rows using an INSERT/SELECT. This occurs in another database on the same SQL Server.

3. Database backup job, plus a "File Copy" job. This involved using a job with a CmdExec step and xcopy to copy a 62MB ASCII file from a network drive to the local disk, as commonly occurs in preparation for an import into SQL Server. No import will be performed, as what we are trying to simulate here is write activity outside of SQL Server.

4. Database backup job, Archive job, and File Copy job. This is a combination of all tests.
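For reference, here is a minimal T-SQL sketch of the kind of job steps these tests describe; the database, table, and path names are hypothetical stand-ins rather than the objects used in the original tests.

-- Hypothetical backup step (Test 1): a standard non-compressed backup to a local disk.
BACKUP DATABASE SalesDB
TO DISK = N'D:\Backups\SalesDB.bak'
WITH INIT   -- overwrite the previous backup file

-- Hypothetical "Archive" step (Test 2): heavy DML via a single INSERT...SELECT
-- copying rows into an archive table in another database on the same server.
INSERT INTO ArchiveDB.dbo.OrdersArchive (OrderID, CustomerID, OrderDate, Amount)
SELECT OrderID, CustomerID, OrderDate, Amount
FROM SalesDB.dbo.Orders
WHERE OrderDate < DATEADD(month, -6, GETDATE())

-- The "File Copy" step (Test 3) was a CmdExec job step running xcopy,
-- i.e. write activity that occurs entirely outside of SQL Server.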

Testing Notes:

• The same 1.2GB database was backed up to disk each time.

• The database backup, database files, and file copy all use the same local disk resources.

• The previous backup file was purged from disk prior to each test to keep available disk space constant and minimize any effects of fragmentation and split I/Os.

• The Recovery Model for the database was set to "Simple" to avoid unexpected log flushes during the test, which can have a big effect on write activity. Although not a realistic approach for most real-world scenarios, it will make the results a bit easier to read.

• The buffer and procedure caches were purged prior to each test.

• Verification of the backup was not performed. Verification will incur significant read activity right after the backup finishes, since the entire backup file must be read from disk and examined. Although this is a best practice and it can certainly have disk performance implications, we are focusing on disk write activity for these tests.

• All other processes and services which can incur heavy disk I/Os were stopped prior to the tests. This was done so that for the purposes of our tests the File Copy will account for most of the non-SQL Server disk activity.

The Performance Counters

For each test, I fired up System Monitor using the four counters below for the server where SQL Server is performing the backup. System Monitor was started immediately after the backup started and stopped immediately after the backup finished, to avoid skewing of the averages by low counter readings at either end.

SQL Server:Databases: Backup/Restore Throughput/sec

The total bytes transferred to disk by the backup operation. With this counter we can see the total throughput for all backups or for a specific database, as we will be doing.

Physical Disk: Avg. Disk Write Queue Length

The average number of queued read and write requests during the interval. If you see this counter spike to over 2 per disk spindle while the backup is running, it's a strong indicator that the backup and/or other activity may be overloading the disk subsystem, causing incoming requests to be queued.

Process: IO Write Bytes/sec [sqlservr]

The total bytes being written to the disk for the SQL Server process (sqlservr.exe). This is the only counter that will give us insight into the total amount of write activity related to SQL Server. For optimal backup speed this counter should almost mirror the Backup/Restore Throughput/sec counter values while the backup is running. If it is considerably higher than the Backup/Restore Throughput/sec counter and you are seeing queued write requests during the same time period, it's a good indicator that the backup process is contending with other SQL Server database-related operations hitting the same disk resources.

Physical Disk: Disk Write Bytes/sec

The total bytes being written to the disk per second. Ideally this counter should also mirror the Backup/Restore Throughput/sec counter values. If it is considerably higher than the Backup/Restore Throughput/sec counter and you are seeing queued write requests during the same time period, it may indicate that the backup process is contending with database-related write operations, write activity from ETL processes, or write activity from other processes outside of SQL Server.

Keep in mind that although we have focused on the "write" counters here, controllers and disks have limited throughput, so read activity can and will directly affect write performance and vice versa. Additionally, some of the read activity you'll see during a backup is incurred by SQL Server reading data pages from disk


that aren't in cache and loading them into the backup buffer. In other words, backups don't always just write data to disk. That said, you'll usually want to inspect some of the corresponding "read" counters as well.

Other Important Disk Counters

There are some other performance counters which we didn't use in our tests, but which can provide valuable insight into disk-related performance issues.

Physical Disk: Avg. Disk sec/Transfer

The average number of seconds it takes for each read and write to disk. This counter can be a good measure of how much slow disk performance is manifesting itself in slow performance for end users. Since the counter is typically a fraction of a second, it's easiest to use it as a relative measure. For example, if it's tripling from .05 to .15 during the backup and you see a high percentage of activity related to SQL Server database operations using the formulas in this article, it may mean end user queries are taking 3 times as long.

Physical Disk: Split IO/Sec

The number of times per second a disk I/O was split into multiple I/Os. High readings for this counter usually indicate that the disk is fragmented, which can directly affect the rate at which data is written to and read from disk, and lead to queued requests.

Test Results

Figure 1: Backup Only

Note that the backup throughput, total write bytes, and SQL Server write bytes counters are almost perfectly in synch, and the queue length is fairly constant. This is the ideal scenario for optimal backup performance.

Figure 3: Backup + File Copy

The File Copy job started at the point where the total disk write bytes and the other counters diverged towards the left side of the graph, and completed when they converged again in the middle. You can see that both the backup throughput and SQL Server process write bytes dipped but stayed perfectly in synch, indicating that there was no other significant SQL Server-related write activity at the time.

Figure 4: Backup + Archive + File Copy

This graph demonstrates a combination of the effects from Tests 2 and 3. The area left of center where backup throughput, total write bytes, and SQL Server process write bytes drop and diverge at the same time represents backup activity, along with high levels of SQL Server and non-SQL Server write activity occurring simultaneously. This is the worst-case scenario for good backup performance. It's no surprise that during this period queue length spiked to the highest levels of any of the tests.

Figure 2: Backup + Archive

Here you can see that when the Archive job and its DML ran (a single large INSERT/SELECT), backup throughput dipped dramatically and queue length spiked. This occurred because SQL Server was writing to the backup file and the database's data files at the same time, causing contention. Also note that total disk write bytes and SQL Server process write bytes stayed in synch, indicating that this was the only heavy write activity occurring on the disk at the time.


Interpreting the Results

Now, on to the fun stuff! Here are the formulas we'll use to gauge the percentage of write activity related to SQL Server and other processes:

Variable | Performance Counter
b | SQL Server:Databases: Backup/Restore Throughput/sec
d | Physical Disk: Disk Write Bytes/sec
p | Process: IO Write Bytes/sec [sqlservr]

The percentage of write activity related to the backup process (bp):

bp = b / d

The percentage of write activity from SQL Server operations other than the backup (sp):

sp = (p - b) / d

The percentage of write activity related to all other processes (op):

op = (d - p) / d

For these tests, b represents Backup/Restore Throughput/sec. However, depending on what you are trying to measure, b can be substituted for or combined with any of these other SQL Server:Databases counters:

• Bulk Copy Throughput/sec (multiply by 1024 since the output is in kB)

• DBCC Logical Scan Bytes/sec

• Log Bytes Flushed/sec

• Shrink Data Movement Bytes/sec

Perhaps the most critical counter here is Process: IO Write Bytes/sec. This is because, as mentioned previously, it is the only counter that gives us insight into the total amount of write activity related to the SQL Server process (sqlservr.exe).

NOTE: The description for Process: IO Write Bytes/sec says that it also includes data from "network" operations as well. In my testing I have not been able to confirm this to be true for the SQL Server process, as it only appears to report activity from disk writes. If it were true, we would not be able to use it to accurately isolate disk writes related to SQL Server.

For each test, at the completion of the backup job the "Average" figures for each counter were recorded from Performance Monitor (Table 2), and the formulas above were applied to determine the percentage of write activity for each type of job (Table 3). Keep in mind that in the real world we may not be able to say that the Archive job was responsible for all of the DML activity, but in this case I know that there were no other significant DML operations occurring on the SQL Server at the time.

Table 4 lists the total bytes written during each test, broken down by category. This was calculated by multiplying the average bytes per second figures by the total time in seconds for each test (effectively the duration of the backup job). Note how the total bytes written for each category matches closely with the respective file sizes (see Testing Notes). This data is not really critical for our purposes since we already know the file sizes, but it does demonstrate that the "average" data provided by the performance counters is highly accurate, and lends some additional validation to the test results.

Test | Jobs | Average Disk Write Bytes/sec | Average Backup Bytes/sec | Average SQL Write Bytes/sec | Avg. Write Queue Length
4 | Backup + Archive + File Copy | 11,868,675 | 10,857,362 | 11,294,352 | 2.7

Table 2: Averages for Each Performance Counter
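As a rough worked example (an approximation rather than a quote from Table 3), plugging the Test 4 averages from Table 2 into the formulas above gives:

bp = b / d = 10,857,362 / 11,868,675, or about 91.5% of the write activity from the backup
sp = (p - b) / d = (11,294,352 - 10,857,362) / 11,868,675, or about 3.7% from other SQL Server writes (the Archive job's DML)
op = (d - p) / d = (11,868,675 - 11,294,352) / 11,868,675, or about 4.8% from processes outside of SQL Server (the File Copy)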

Test | Jobs | Total | Backup | DML | Non-SQL
2 | Backup + Archive | 1,302,721,683 | 1,249,620,151 | 50,876,681 | 2,224,851
3 | Backup + File Copy | 1,310,577,528 | 1,242,809,672 | 668,304 | 67,099,552
4 | Backup + Archive + File Copy | 1,376,766,300 | 1,259,453,992 | 50,690,840 | 66,621,468

Table 4: Total Bytes Written by Category

Table 3: Percentage of the Total Bytes Transferred (bytes written and % of total bytes written for the Backup, Archive, and File Copy jobs, and for SQL Server overall)


After each test the actual job durations were recorded (Table 6). The highlighted cells show the percentage increase in duration for each job/test combination.

Perhaps the most enlightening measure is the "Combined Duration" column, which reflects the aggregate change in duration for all jobs in the test. Note that for Test 4 (Backup + Archive + Copy), combined durations increased by 67% over the combined optimal durations. Also note that this happened as a result of increasing the total bytes written by less than 9%! (See Table 3, Test 4.) This represents a clear illustration of how even a relatively small amount of disk contention can have a big impact on performance, and prevent jobs from achieving their optimal runtimes.

Resolving Schedule Contention

At this point we've covered how you can go about measuring disk contention in the context of scheduled jobs. You can use the techniques described above whenever you aren't sure whether or not particular jobs are competing for resources. However, monitoring for contention is not always the first place I'd recommend you start; if you can prevent contention from ever happening in the first place, then why not go that route first?

At the risk of sounding melodramatic, when you think about ways to reduce contention, it may be helpful to remember that old quote which goes something like, "Have the serenity to accept the things you can't change, the courage to change the things you can, and the wisdom to know the difference."

In the DBA world, the things you likely can't change are the demands put on your SQL Servers by end users; i.e., you can't really control when or how often they are going to run that massive report query that brings the server to its knees and defies all attempts at optimization.

You may have already guessed what you can change: job schedules and the associated load incurred by job collisions, of course! You, the DBA, are usually in complete control of many, if not all, of the job schedules. And if you aren't in complete control because of some special business requirements, you still likely have some influence on when and/or how often the jobs run.

So, if you can muster a little courage you can usually reduce or eliminate much of the contention that's directly related to colliding jobs. Here's an approach you can take to optimize (or "level") a server's schedule:

1. Make a spreadsheet with all of the jobs on the server with the associated schedule information. You can query the system tables for this (msdb..sysjobs and msdb..sysjobschedules) and use the undocumented stored procedure msdb..sp_get_schedule_description to generate the schedule descriptions, but depending on how many jobs you have it may be easier to go into each job's properties and view its schedule description(s), then copy it into the spreadsheet (a query sketch for pulling the schedule data appears after Table 7).

2. Add two columns to the spreadsheet: "Notes" and "Adjusted Schedule". This is where you'll record any special scheduling requirements for future reference, and the schedule changes if applicable.

3. Record the duration statistics for each job in the spreadsheet. Use the query in Listing 1 to convert the duration information in msdb..sysjobhistory to minutes. The Duration_68Pct and Duration_95Pct fields reflect 1 and 2 standard deviations from the average runtime respectively. In layman's terms, this means that 68% of the runtimes will fall within the Duration_68Pct value, and 95% of the runtimes will fall within Duration_95Pct. These can be more valuable measures than the simple average or maximum durations when you are determining the appropriate spacing between jobs in the following steps. Note that sometimes the 95% value will be higher than the maximum, in which case you may want to give more weight to the maximum.

4. Recurring, non-recurring, and "maintenance window" jobs need to be analyzed a bit differently, so separate the jobs into 3 groups, then sort each group by "start time":

a. Recurring jobs. These are the jobs that run multiple times a day.

b. Maintenance window jobs. These are jobs that should only run during your defined maintenance window.

c. Non-recurring jobs. These jobs run daily or less frequently, but fall outside of your maintenance window.

Test Jobs | Combination (All Jobs): Optimal (sec) / Duration (sec) / Change (sec) / % Change | Backup Job: Duration (sec) / Change (sec) / % Change | Archive Job: Duration (sec) / Change (sec) / % Change | File Copy Job: Duration (sec) / Change (sec) / % Change
Backup + Archive | 105 / 128 / +23 / +22% | 107 / +12 / +13% | 21 / +11 / +52% | (not run)
Backup + Archive + File Copy | 117 / 195 / +78 / +67% | 116 / +21 / +22% | 22 / +12 / +55% | 57 / +45 / +79%

Table 6: Duration Changes for each Job

Table 5: Optimal Durations for each job (the optimal job duration in seconds for the Backup, Archive, and File Copy jobs)


5. Now it's time to record the schedule adjustments in the spreadsheet. You'll want to firm up the schedules for the maintenance window and non-recurring groups first by following these steps:

a. Take a look at any jobs with special scheduling requirements, and make any necessary adjustments. For example, a data warehousing job that must run every day at 4am.

b. Next, consider jobs with dependencies on other jobs and make any needed ordering adjustments.

c. Now take a look at the average, maximum, 68% and 95% duration values for each job, and adjust the schedule spacing accordingly to ensure there is adequate room between the jobs to avoid any overlap.

6. Now that the non-recurring jobs are settled, you can focus on the recurring jobs. Chances are many of the recurring jobs start at 12:00:00am, which is the default. Look at the recurrence intervals for those jobs with the same start times as well as their duration statistics, and determine when and how often they will collide. Next adjust the start times to stagger the jobs as best as possible to avoid the collisions. See Table 7 for an example scenario.

Note only two adjustments were made:

a. The start time for Job 1, which runs every 1 minute, was moved back 30 seconds. This is because it normally runs around 30 seconds or less, so if we start it 30 seconds past the minute, the chances of it colliding with any other short-running jobs starting on the minute will be much reduced.

Most importantly perhaps, this includes collisions with Job 2, since it runs every 15 minutes. Note Job 2's start time was left at 12:00:00am, and because it runs for less than 30 seconds 68% of the time, most of the time it won't collide with Job 1. It will still collide with Job 1 sometimes when it runs long, but since Job 1 runs every minute this really can't be avoided. If Job 1 ran less frequently than every minute, even every 2 minutes, we could adjust its start time to 12:01:30am, which would avoid collisions with Job 2 95% of the time or more.

b. The start time for Job 3 was moved back to 12:02:00am, meaning it will run at :02 and :32 minutes after the hour. This will effectively prevent it from colliding with Job 2 four times per hour. Also, since we moved Job 1 back 30 seconds, most of the contention between it and Job 3 will be avoided.

This was a relatively simple example. If you have more than 3 frequently recurring jobs things can get more complicated very quickly, especially if you have one or more jobs which run every minute. For that reason it's usually a good idea to avoid jobs that run every minute if at all possible, since they effectively close any gaps in the schedule that you can "fill" with other recurring jobs.

7. There's one more step that you can take to minimize contention caused by recurring jobs, and that's during your maintenance window. For those recurring jobs that absolutely don't have to run during the maintenance window, you can "split" their schedules. This can be done one of two ways, as listed below (a T-SQL sketch of the two-schedule approach appears after Listing 1). For these examples we'll use a daily maintenance window between 3:00am and 5:00am.

a. Add a second schedule. (See Figure 5.) First, change the end time for the original schedule to the start time of the maintenance window, 3:00am in this case, and leave the original start time alone. Next add the second schedule with the same recurrence frequency as the first schedule, and for its start time use 5:00am.

NOTE: If you have any existing "collision avoidance" logic as covered in Step 6, for the second schedule's start time be sure to add the delta between midnight and the first schedule's start time. For example, if the first schedule starts at 12:00:30am, use 5:00:30am as the second schedule's start time. (This assumes your maintenance window ends on the hour; if not, further adjustments to the second schedule's start time may be needed.)

b. Split the original schedule. (See Figure 6.) Believe it or not, this can be done by using the start of the maintenance window as the end time (3:00am in this case), and the end of the maintenance window as the start time (5:00am). When you do this, SQL Agent will automatically run the job from midnight until the maintenance window start, then start back up again when it's over. The downsides of this approach are that it can be a bit more difficult to read the schedule, and also that it doesn't work for more than one schedule split, which can be needed when intensive jobs outside of the maintenance window are involved. However, it does work with both SQL Server 7.0 and 2000!

8. Finally, take the schedule adjustments from the spreadsheet and apply them to the jobs. You may also want to restart SQL Agent when you're finished; I have seen cases where it doesn't automatically pick up every schedule change. Don't forget to save the spreadsheet to a safe place, so you can refer back to it whenever adding new jobs to the server. Also, since runtimes will inevitably change over time, I'd recommend performing a periodic review of the job duration statistics and comparing them to those recorded in the spreadsheet. If any have changed dramatically, you may need to make further adjustments to keep contention to a minimum.

Job | Original Start Time | Original End Time | Adjusted Start Time | Adjusted End Time | Recurrence Interval (minutes) | Duration (minutes): Avg / 68% / 95% / Max

Table 7: Adjusting recurring job schedules
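For step 1, a minimal sketch of a schedule-inventory query follows. It assumes the SQL Server 2000 layout of msdb, where the schedule detail columns live directly in sysjobschedules; adjust as needed for your version.

-- Rough schedule inventory for step 1 (SQL Server 2000-style msdb schema assumed).
USE msdb

SELECT j.name               AS job_name,
       s.name               AS schedule_name,
       s.enabled,
       s.freq_type,             -- 4 = daily, 8 = weekly, etc.
       s.freq_interval,
       s.freq_subday_type,      -- 4 = minutes, 8 = hours
       s.freq_subday_interval,
       s.active_start_time,     -- HHMMSS stored as an integer
       s.active_end_time
FROM sysjobs j
JOIN sysjobschedules s ON s.job_id = j.job_id
ORDER BY j.name, s.active_start_time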


Listing 1: Calculates job duration statistics in minutes

USE msdb

SELECT sysjobs.name,
       COUNT(*)                                             AS RecCt,
       MIN(run_duration)                                    AS MinDuration,
       CAST(AVG(run_duration) AS decimal(9,2))              AS AvgDuration,
       CAST(AVG(run_duration) + STDEVP(run_duration)
            AS decimal(9,2))                                AS Duration_68Pct,
       CAST(AVG(run_duration) + (STDEVP(run_duration) * 2)
            AS decimal(9,2))                                AS Duration_95Pct,
       MAX(run_duration)                                    AS MaxDuration
FROM (
       -- Convert the padded HHMMSS string into minutes.
       SELECT job_id,
              CAST((LEFT(run_duration, 3) * 60
                    + SUBSTRING(run_duration, 4, 2)
                    + CAST(RIGHT(run_duration, 2) AS decimal(2,0)) / 60)
                   AS decimal(9,2)) AS run_duration
       FROM (
              -- Pad run_duration (an integer in HHMMSS form) to 7 characters.
              SELECT job_id,
                     REPLICATE(' ', (7 - LEN(run_duration)))
                         + CAST(run_duration AS varchar(7)) AS run_duration
              FROM sysjobhistory
              WHERE sysjobhistory.step_id = 0
            ) t1
     ) t2
-- Join back to sysjobs for the job name and aggregate per job.
JOIN sysjobs ON sysjobs.job_id = t2.job_id
GROUP BY sysjobs.name
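As a companion to step 7a, here is a rough sketch of creating the two "split" schedules with sp_add_jobschedule. The job name, schedule names, and the every-15-minutes recurrence are made-up values for illustration; the parameter conventions (freq_type 4 = daily, freq_subday_type 4 = minutes, times as HHMMSS integers) are those of SQL Server Agent.

-- Hypothetical example: run every 15 minutes, but skip a 3:00am-5:00am maintenance window.
EXEC msdb.dbo.sp_add_jobschedule
     @job_name = N'Data Archive',
     @name = N'Every 15 min - before maintenance window',
     @freq_type = 4, @freq_interval = 1,
     @freq_subday_type = 4, @freq_subday_interval = 15,
     @active_start_time = 000000,   -- 12:00:00am
     @active_end_time   = 030000    -- stop at 3:00:00am

EXEC msdb.dbo.sp_add_jobschedule
     @job_name = N'Data Archive',
     @name = N'Every 15 min - after maintenance window',
     @freq_type = 4, @freq_interval = 1,
     @freq_subday_type = 4, @freq_subday_interval = 15,
     @active_start_time = 050000,   -- resume at 5:00:00am
     @active_end_time   = 235959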

Conclusion

I've demonstrated how heavy contention for disk resources incurred by scheduled jobs can have a significant impact on SQL Server performance, since this is one of the most common and easily controllable culprits. It's important to note, however, that the phenomenon where contention causes everything to take longer than it would otherwise is not isolated to disk resources. For example, just as idle time causes this with disks, in the case of CPU resources it can be high context switching. Your initial approach should be the same in most every case: first eliminate the overlap wherever possible, before simply upgrading hardware as a solution to performance problems.

Hopefully I have armed you with some new techniques which will help you effectively identify and combat both job-related and general contention issues that may be impacting your performance.

Figure 5: Schedule-splitting with 2 schedules

Figure 6: Schedule-splitting with a single schedule

Greg Gonzalez

Greg is the architect of sqlSentry, a visual job scheduling and notification management system for SQL Server. He is also the founder of InterCerve, a leading Microsoft-focused hosting and development services firm, and the company behind sqlSentry. Greg has been working with SQL Server for over 10 years.

LOCKING, BLOCKING AND DEADLOCKS

By: Rahul Sharma

Locking is a natural part of any OLTP application. However, if the design of the applications and transactions is not done correctly, you can run into severe blocking issues that can manifest themselves as severe performance and scalability issues by resulting in contention on resources. Controlling blocking in an application is a matter of correct application design, correct transaction architecture, a correct set of parameter settings, and testing your application under a heavy load with volume data to make sure that the application scales well. The primary focus of this article is OLTP applications, and we will focus on locking and blocking in applications and how to resolve the blocking conflicts.

Transactions

A transaction is essentially a sequence of operations that is performed as a single logical unit of work, and that logical unit of work must adhere to the ACID properties. We, as programmers, are responsible for starting and ending transactions at points that enforce the logical consistency of the data. The ANSI standards state the isolation levels for these transactions, and it is the responsibility of an enterprise database system, such as SQL Server, to provide mechanisms ensuring the physical integrity of each transaction. SQL Server provides:

• Locking facilities that preserve transaction isolation.

• Logging facilities that ensure transaction durability. Even if the server hardware, operating system, or SQL Server itself fails, SQL Server uses the transaction logs, upon restart, to automatically roll back any incomplete transactions to the point of the system failure.

• Transaction management features that enforce transaction atomicity and consistency. After a transaction has started, it must be successfully completed, or SQL Server undoes all of the data modifications made since the transaction started.
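For illustration, here is a minimal sketch of a single logical unit of work using the @@ERROR checking idiom of the SQL Server 2000 era; the table and column names are hypothetical.

-- Hypothetical funds-transfer batch: both updates succeed or neither does.
BEGIN TRAN

UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 1
IF @@ERROR <> 0
BEGIN
    ROLLBACK TRAN   -- undo the partial work and stop
    RETURN
END

UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountID = 2
IF @@ERROR <> 0
BEGIN
    ROLLBACK TRAN
    RETURN
END

COMMIT TRAN         -- the unit of work becomes durable only at this point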

Locking prevents users from reading data being changed by other users, and prevents multiple users from changing the same data at the same time. If locking is not used, data within the database may become logically incorrect, and queries executed against that data may produce unexpected results. Although SQL Server enforces locking automatically, you can design applications that are more efficient by understanding and customizing locking in your applications.

How locking is implemented decides how much concurrency (along with performance and scalability) is allowed in the application. It is important to know and understand how to maintain the logical unit of work and correctly manage the locks in the application code. Due to poor application design, incorrect settings on the server, or poorly written transactions, locks can conflict with other locks, leading to a high number of waits and thus slowing down the response of the system and resulting in a non-scalable solution.

Difference between Blocking and Deadlocks

Many people confuse blocking with deadlocks. Blocking and deadlocks are two very different occurrences. Blocking occurs due to one transaction locking the resources that the other transaction wants to read or modify, usually when one connection holds a lock and a second connection requires a conflicting lock type. This forces the second connection to wait, blocked on the first. Any connection can block any other connection, regardless of where they emanate from. Most blocking conflicts are temporary in nature, and will resolve themselves eventually unless you have hung transactions.

Deadlocks are much worse than blocking. A deadlock occurs when the first transaction has locks on the resources that the second transaction wants to modify, and the second transaction has locks on the resources that the first transaction intends to modify. So, a deadlock is much like an infinite loop: if you let it go, it will continue indefinitely. Fortunately, SQL Server has a built-in algorithm for resolving deadlocks. It will choose one of the deadlock participants and roll back its transaction, sending the user the following message:

"Your transaction (process ID #x) was deadlocked on {lock | communication buffer | thread} resources with another process and has been chosen as the deadlock victim. Rerun your transaction."
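To make the difference concrete, here is a minimal sketch of how a deadlock can arise; the table, column, and key values are hypothetical. Run the two batches from two separate connections, pausing each after its first UPDATE.

-- Connection 1
BEGIN TRAN
UPDATE dbo.Accounts SET Balance = Balance - 10 WHERE AccountID = 1
-- ...pause here, then run the second statement...
UPDATE dbo.Accounts SET Balance = Balance + 10 WHERE AccountID = 2
COMMIT TRAN

-- Connection 2 (locks the same rows in the opposite order)
BEGIN TRAN
UPDATE dbo.Accounts SET Balance = Balance - 10 WHERE AccountID = 2
-- ...pause here, then run the second statement...
UPDATE dbo.Accounts SET Balance = Balance + 10 WHERE AccountID = 1
COMMIT TRAN

-- Each session ends up waiting on a lock the other holds; SQL Server detects the
-- cycle and rolls one session back with error 1205 (the deadlock victim message above).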

Causes of Blocking

Here are the common causes of blocking:

• De-normalized data-model design:

More often than not, blocking problems are due to poor application design and data-model design. A transactional database model should be highly normalized. There are several normalization rules that you should adhere to when designing your database. We won't go into the details of normalization, but to summarize the concept: you should not keep any redundant data in your database. Transactional databases should not have any repeating columns, and each piece of data should be stored only once. That way, the transactions modify lean tables and release locks quickly. Typically, adhering closely to the third normal form works fine, with a little de-normalization done at times to improve performance.

appli-• Lack of properly designed indexes:

The lack of appropriate indexes can often cause blocking lems as well If indexes are missing, then SQL Server might decide

prob-to acquire a table lock for the duration of your transaction Therest of the connections will be blocked until your transaction iscommitted or rolled back For the queries that are written using

“SELECT…WITH (UPDLOCK)”, or the Update statements & Deletestatements, make sure that you have verified the execution plan

to ensure that the access will based be based on indexes,preferably via an index seek operation Design your indexescarefully to avoid lock escalations as well

• Bad transaction design:

Poorly written transactions are by far the most common cause of blocking. Here are some scenarios that should be avoided when designing transactions:

a) Transactions that ask for an input value from an interface in the middle of a transaction. Imagine the user decides to take a break in the middle of the transaction; the transaction will hold its locks until the user inputs the value. Therefore, never ask for user input inside a transaction.

b) Keep your transactions as small as possible so that you do not hold locks for a long time. This becomes especially important in the case of SQL Server (and the default isolation level of READ COMMITTED), wherein readers (selects) block writers and writers (DML statements like DELETE, UPDATE and INSERT) block readers.

c) Submitting queries that have long execution times. A long-running query can block other queries. For example, a DELETE or UPDATE operation that affects many rows can acquire many locks that, whether or not they escalate to a table lock, block other queries. For this reason, you generally do not want to mix long-running decision support queries and online transaction processing (OLTP) queries on the same database. The solution is to look for ways to optimize the query by changing indexes, breaking a large, complex query into simpler queries, or running the query during off hours or on a separate computer.

d) One reason queries can be long-running, and hence cause blocking, is if they inappropriately use cursors. Cursors can be a convenient method for navigating through a result set, but using them may be slower than set-oriented queries. So, whenever possible, try avoiding the use of cursors and make use of a more set-based approach.

e) Cancelled queries: This is one of the very common reasons for seeing locks/blocks in the system. When a query is cancelled by the application (for example, by using the SQLCancel function when using ODBC, or because of a query timeout/lock timeout), the application also needs to issue the required number of rollback and/or commit statements. Canceling a query, or a failed/timed-out query in a transaction, does not mean that the transaction will be automatically rolled back or committed. All locks acquired within the transaction are retained after the query is canceled. Subsequent transactions executed under the same connection are treated as nested transactions, so all the locks acquired in these completed transactions are not released. This problem repeats with all the transactions executed from the same connection until a ROLLBACK is executed. As a result, a large number of locks are held, users are blocked, and transactions are lost, which results in data that is different from what you expect. Applications must properly manage transaction nesting levels by committing or rolling back canceled transactions.

f) Transactions should be designed such that the end user is not allowed to enter bad data for the fields; i.e., do not design an application that allows users to fill in edit boxes that generate a long-running query. For example, do not design an application that allows certain fields to be left blank or a wildcard to be entered, as this may cause the application to submit a query with an excessive running time, thereby causing a blocking problem. These can be avoided by using client-side code to check the values.

g) If the SET LOCK_TIMEOUT value is set very high, then wait times will increase and hence blocks will increase. Make sure that you are using reasonable values for this SET option and have logic in place in the application to handle the 1222 error that arises because of the lock timeout.

h) If the application is not using parameterized queries, then every SQL statement will get parsed, compiled and executed each time, unless it is a very simple SQL statement, in which case auto-parameterization is done by SQL Server. However, in most cases it will not be able to do auto-parameterization of the SQL statements, and hence the time taken for the SQLs will be greater since the execution plan cannot be re-used. That can result in longer waits and latches. A well-designed OLTP application (OLAP is different) should always make use of parameterized queries (also known as bind variables usage) to parse and compile such queries once and execute them many times.

i) Nested transactions, savepoints and proper error checks: Make sure that you are checking @@TRANCOUNT. In the case of nested transactions, the transaction is either committed or rolled back based on the action taken at the end of the outermost transaction. If the outer transaction is committed, the inner nested transactions are also committed. If the outer transaction is rolled back, then all inner transactions are also rolled back, regardless of whether or not the inner transactions were individually committed. So, do not assume that the inner transaction results are saved even if the outermost transaction fails.

Use savepoints only in situations where errors are unlikely to occur. The use of a savepoint to roll back part of a transaction in the case of an infrequent error can be more efficient than having each transaction test to see if an update is valid before making the update. Updates and rollbacks are expensive operations, so savepoints are effective only if the probability of encountering the error is low and the cost of checking the validity of an update beforehand is relatively high.

In T-SQL code, have proper error checks after every statement. Depending upon the error, the transaction may or may not abort, and the code should take care of those scenarios.

• Bad use of locking hints or query hints:

Inappropriate use of locking hints can be yet another cause of blocking. If you force SQL Server to acquire 50,000 row-level locks, your transaction might have to wait until other transactions complete and that many locks are available.

The most commonly used locking hints are ROWLOCK, UPDLOCK, NOLOCK and READPAST. Be very careful when you are using them and understand how your application works before starting to use them.

The other query hints like "FAST n", "FORCE ORDER" etc., essentially the join hints, index hints, view hints and table hints, need to be used judiciously. You should know and test all flows in the application with volume data (and multi-user scenarios) before putting that code into production.


• Configuration options for the instance:

Most often, the default options for lock configuration are fine, so these should be considered only when everything else has been done.

a) Locks option: Use the locks option to set the maximum number of available locks, thereby limiting the amount of memory SQL Server uses for locks. The default setting is 0, which allows SQL Server to allocate and deallocate locks dynamically based on changing system requirements. When the server is started with locks set to 0, the lock manager allocates two percent of the memory allocated to SQL Server to an initial pool of lock structures. As the pool of locks is exhausted, additional locks are allocated. The dynamic lock pool does not allocate more than 40 percent of the memory allocated to SQL Server. Each lock consumes 96 bytes of memory, hence increasing this value can require an increase in the amount of memory dedicated to the server.

b) Customizing locking for indexes: You can change the lock escalation behavior for indexes by using the sp_indexoption procedure, for example to disallow page-level locks.

c) Query wait option: Memory-intensive queries, such as those involving sorting and hashing, are queued when there is not enough memory available to run the query. The query times out after a set amount of time calculated by SQL Server (25 times the estimated cost of the query) or the time amount specified by the non-negative value of the query wait. A transaction containing the waiting query may hold locks while the query waits for memory. Decreasing the query wait time lowers the probability of such deadlocks; eventually, a waiting query will be terminated and the transaction locks released. However, increasing the maximum wait time may increase the amount of time for the query to be terminated. Changes to this option are not typically recommended and should be made only when absolutely necessary.

d) Memory configuration: Typically, you would want to let SQL Server manage memory using the dynamic memory management configuration. If at all you have to play with the "min server memory" and "max server memory" options, do so judiciously. If enough memory is not available, then the memory needs of the lock manager will not be satisfied, leading to waits and blocks.

e) Other advanced configuration options to look into are "max degree of parallelism", "query governor cost limit", and "AWE". The discussion of those is out of scope for this article; in a future article, I will cover the advanced options and how they affect SQL Server configuration.

• Badly written queries:

There is really no substitute for a well-written application. Please make sure that you are using well-tuned SQL queries in your application. Run them through a benchmark database with a reasonable and representative amount of data, and test with different conditions. Trace the application code using SQL Server Profiler, and use the SET commands in Query Analyzer to look into the execution plan and the I/O associated with the SQLs and tune them.

• Usage of incorrect isolation levels:

Understanding the most appropriate isolation level for your application is important, for both concurrency and performance, while still maintaining the appropriate level of accuracy. The concept of isolation level is not new; in fact, details regarding the ANSI specifications for isolation can be found at www.ansi.org, and the current specification to review is ANSI INCITS 135-1992 (R1998).

Isolation Level | Dirty Read (Possible Phenomena) | Non-repeatable Read (Possible Phenomena) | Phantom (Possible Phenomena)
Read uncommitted | Yes | Yes | Yes
Read committed | No | Yes | Yes
Repeatable read | No | No | Yes
Serializable | No | No | No

Isolation Levels

The application usage for each of the above varies based on the desired level of "correctness" and the trade-off chosen in performance and administrative overhead.


• Deciding the concurrency model:

Deciding which concurrency model to use for your application is very critical. You need to fully understand what each concurrency model does and in what scenarios they can and should be used before deciding on what changes you need to make in the application.

a) Pessimistic concurrency model:

Pessimistic concurrency control locks resources as they are required, for the duration of a transaction. Unless deadlocks occur, a transaction is assured of successful completion. Under a pessimistic concurrency control-based system, locks are used to prevent users from modifying data in a way that affects other users. Once a lock has been applied, other users cannot perform actions that would conflict with the lock until the owner releases it. This level of control is used in environments where there is high contention for data and where the cost of protecting the data using locks is less than the cost of rolling back transactions if/when concurrency conflicts occur.

When pessimistic locking (the ANSI standard for transaction isolation) is used, applications typically exhibit blocking. Simultaneous data access requests from readers and writers within transactions request conflicting locks; this is entirely normal and, provided the blocking is short-lived, not a significant performance bottleneck. This can change on systems under stress, as any increase in the time taken to process a transaction (for example, delays caused by over-utilized system resources such as disk I/O, RAM or CPU, as well as delays caused by poorly written transactions such as those with user interaction) can have a disproportionate impact on blocking: the longer the transaction takes to execute, the longer locks are held and the greater the likelihood of blocking.

b) Optimistic concurrency model:

Optimistic concurrency control works on the assumption that resource conflicts between multiple users are unlikely (but not impossible), and allows transactions to execute without locking any resources. Only when attempting to change data are resources checked to determine if any conflicts have occurred. If a conflict occurs, the application must read the data and attempt the change again. Under an optimistic concurrency control-based system, users do not lock data when they read it. Instead, when an update is performed the system checks to see if another user changed the data after it was read. If another user updated the data, an error is raised. Typically, the user receiving the error rolls back the transaction, resubmits (application/environment dependent) and/or starts over. This is called optimistic concurrency because it is mainly used in environments where there is low contention for data, and where the cost of occasionally rolling back a transaction outweighs the costs of locking data when read.

Isolation Level | Best Suited For An Application When:

Read uncommitted | The application does not require absolute accuracy of data (and could get a larger/smaller number than the final value) and wants performance of OLTP operations above all else. No version store, no locks acquired, no locks are honored. Queries in this isolation level may see uncommitted changes.

Read committed | The application does not require point-in-time consistency for long-running aggregations or long-running queries, yet wants data values which are read to be only transactionally consistent. The application does not want the overhead of the version store at the trade-off of potential incorrectness for long-running queries because of non-repeatable reads.

Repeatable read | The application requires absolute accuracy for long-running multi-statement transactions and must hold all requested data from other modifications until the transaction completes. The application requires consistency for all data which is read repeatedly within this transaction and requires that no other modifications are allowed; this can impact concurrency in a multi-user system if other transactions are attempting to update data that has been locked by the reader. This is best when the application is relying on consistent data and plans to modify it later within the same transaction.

Serializable | The application requires absolute accuracy for long-running multi-statement transactions and must hold all requested data from other modifications until the transaction completes. Additionally, the transactions are requesting sets of data and not just singleton rows. Each of the sets must produce the same output at each request within the transaction, and with modifications expected, other users must be prevented not only from modifying the data which has been read but also from adding new rows to the set. This is best when the application is relying on consistent data, plans to modify it later within the same transaction, and requires absolute accuracy and data consistency, even at the end of the transaction (within the active data).

Isolation Level and Application Best Suited

Trang 17

the cost of occasionally rolling back a transaction

out-weighs the costs of locking data when read
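As a quick illustration of how an application opts into one of these isolation levels, here is a minimal T-SQL sketch (the Orders table and Status column are hypothetical, not from the article):

-- Run the query under a stricter isolation level, then return the
-- session to the default READ COMMITTED afterwards.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRAN;
    -- Rows read here keep their shared locks until COMMIT, so repeated
    -- reads inside this transaction return the same values.
    SELECT COUNT(*) FROM Orders WHERE Status = 'Open';
COMMIT TRAN;

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;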

Optimistic concurrency can be implemented in SQL Server either by using the rowversion (timestamp) data type or by using an integer column for the row versioning. When updating the record, the client session updates based on the primary key column(s) and the integer column's old value, and also increments that value by 1.

The client reads the row, noting the current value of the version column, but does not hold any locks in the database. At some later time, when the client wants to update the row, it must ensure that no other client has updated the record in the interim; that is done by including the old value in the WHERE clause of the UPDATE statement. That second part of the WHERE clause provides the "locking". If some other client has updated the record, the WHERE clause will fail to pick up any rows. The client then uses this signal (zero rows updated) as an indication of lock failure and can choose to re-read the data or ask the end user to re-do their work. If contention for the data is low, this approach will be well suited to your business, and it provides the most concurrency.
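A minimal sketch of this optimistic update pattern follows; the Customers table and RowVer column names are hypothetical:

-- 1. Read the row and remember the version value (no locks are held afterwards).
DECLARE @CustomerID int, @OldRowVer int;
SET @CustomerID = 42;

SELECT @OldRowVer = RowVer
FROM Customers
WHERE CustomerID = @CustomerID;

-- 2. Later, attempt the update; it succeeds only if nobody changed the row in between.
UPDATE Customers
SET Phone = '555-0100',
    RowVer = RowVer + 1
WHERE CustomerID = @CustomerID
  AND RowVer = @OldRowVer;

-- 3. Zero rows affected signals a conflict: re-read the data or ask the
--    user to redo their work.
IF @@ROWCOUNT = 0
    PRINT 'Optimistic concurrency conflict - the row was modified by another session.';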

c) Disconnected/”logical lock” Model:

Besides these two concurrency options, which are used 99.99% of the time in applications, people sometimes come up with different ways of implementing concurrency control. For instance, one connection "logically locks" a record by updating a column in the table with an "in-use" value and setting a datetime column to the time the update was made. Before another connection tries to modify that record, it checks the column for the special "in-use" value and checks how long the record has been in use (the current system date minus the datetime column value). Based on a threshold (typically a time-out setting) for how long that lock has existed, it either overrides the lock or returns a lock timeout message after a specified wait interval.
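For illustration only, such a "logical lock" might be taken with something like the following sketch (the Documents table, the InUseBy/InUseSince columns and the 10-minute threshold are assumptions, and the pattern itself is not recommended):

DECLARE @DocumentID int, @UserName sysname;
SET @DocumentID = 1;
SET @UserName = SUSER_SNAME();

-- Take the "logical lock" only if the row is free or the previous
-- holder has exceeded the assumed 10-minute threshold.
UPDATE Documents
SET InUseBy = @UserName,
    InUseSince = GETDATE()
WHERE DocumentID = @DocumentID
  AND (InUseBy IS NULL
       OR DATEDIFF(minute, InUseSince, GETDATE()) > 10);

-- Zero rows affected means another user still "holds" the record.
IF @@ROWCOUNT = 0
    RAISERROR ('Record is in use by another user.', 16, 1);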

I have seen people use this approach in applications where the UI is designed to capture too much user data at once and, according to them, optimistic or pessimistic concurrency is therefore ruled out. The design of the screens themselves is the real issue in such scenarios; people don't realize that disconnected UI applications do not work the same way as typical connected client-server applications.

Such an approach is fraught with danger: until the threshold is reached, the record is unavailable to everyone else, and the approach itself violates the basic rule that transactions should not be spread out over a period of time – not to mention the effect it has on concurrency in a heavily loaded, multi-user application. Whenever you see such an application scenario, you can re-design it to use optimistic concurrency and scripting logic.

• Orphaned Sessions:

An orphaned session is a session that remains open on the server side after the client has disconnected. Do not confuse orphaned sessions with orphaned users. Orphaned users are created when a database is backed up and restored to another system that does not have a corresponding user account configured. Orphaned sessions occur when the client is unable to free the network connections it is holding when it terminates. If the client terminates cleanly, Windows closes the connection and notifies SQL Server. If SQL Server is processing a client command, it will detect the closed connection when it ends the session. Client applications that crash or have their processes killed (for example, from Task Manager) are cleaned up immediately by Windows NT, rarely resulting in an orphaned session.

One common cause of orphaned sessions arises when a client computer loses power unexpectedly, or is powered off without performing a proper shutdown. Orphaned sessions can also occur due to a "hung" application that never completely terminates, resulting in a dead connection. Windows will not know that the connection is dead and will continue to report the connection as active to SQL Server. SQL Server, in turn, keeps the session open and continues to wait for a command from the client.

Issues with Orphaned Sessions:

Open sessions take up one of the SQL Server network connections. The maximum number of connections is limited by the number of server CALs; therefore, orphaned sessions may prevent other clients from connecting.

Typically, a more important issue is that open sessions use server resources and may have open cursors, temporary tables, or locks. These locks may block other connections from performing useful work, and can sometimes be the source of a major "pile up" of locks. In severe cases, it can appear that SQL Server has stopped working.

Resolutions:

sysprocesses (or stored procedures such as sp_who/sp_who2) reports information on existing server sessions. Possible orphaned sessions can be identified if the status of a process is "awaiting command" and the interval of time found by subtracting last_batch from GETDATE() is longer than usual for the process. If the session hostname is known to be down, the session is orphaned.
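A query along these lines can surface candidate orphaned sessions (the 60-minute threshold is an arbitrary assumption; adjust it to what is normal for your workload):

-- Sessions that have been sitting idle, awaiting a command, for over an hour.
SELECT spid,
       hostname,
       loginame,
       program_name,
       last_batch,
       DATEDIFF(minute, last_batch, GETDATE()) AS idle_minutes
FROM master.dbo.sysprocesses
WHERE cmd = 'AWAITING COMMAND'
  AND DATEDIFF(minute, last_batch, GETDATE()) > 60
ORDER BY last_batch;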

Windows checks inactive sessions periodically to ensure they are active. If a session does not respond, it is closed and SQL Server is notified. The frequency of the checking depends on the network protocol and registry settings. However, by default, Windows NT only performs a check every one or two hours, depending on the protocol used. These configuration settings can be changed in the registry.

To close an orphaned SQL Server session, use the KILL command. All resources held by the session are then released. If orphaned sessions become a problem, registry settings can be changed on Windows to increase the frequency with which clients are checked to verify they are active. Changing these settings affects other application connections as well, so the impact should be considered before making any changes.

Resolving issues through the Query Analyzer:

At times, the orphaned session could be created by the application, and this session will be holding locks on some tables; other users will then be unable to query or write to those tables. The download contains code that you can use to find the orphaned sessions and resolve the issues. A spid value of –2 is an indicator of connectionless, or orphaned, transactions. You can identify this in sp_lock (spid column), sp_who or sp_who2 (blk column), syslockinfo and the sysprocesses table.

There are two things that you may notice:

1.) The spid value is –2. In this case you will have to use KILL UOW to kill the process. UOW is the Unit of Work of the DTC transaction and can be obtained from the syslockinfo table.

2.) The last statement that was being executed by the process is sp_cursorunprepare or sp_unprepare, which are API cursor calls. The situation in this case could be that the application forgot to clean up the session in an error condition, hence leaving an orphaned session lying on the server. In this case, you will have to use KILL spid to terminate the process (spid is the system process id for that particular process), and then fix the application code to close out the session in case of an error.

You can also use KILL spid/UOW WITH STATUSONLY to check the status of the kill statement.
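As a rough sketch, the UOW for an orphaned DTC transaction can be pulled from syslockinfo and passed to KILL (the GUID below is only a placeholder):

-- Find units of work held by orphaned DTC transactions (spid = -2).
SELECT DISTINCT req_transactionUOW
FROM master.dbo.syslockinfo
WHERE req_spid = -2;

-- Kill the orphaned transaction by its UOW, then check on the rollback.
KILL 'D5499C66-E398-45CA-BF7F-DC9F19B7633A';
KILL 'D5499C66-E398-45CA-BF7F-DC9F19B7633A' WITH STATUSONLY;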

SQL Server caveats for the Oracle DBA:

Unlike Oracle, SQL Server 2000 does not implement a multi-version concurrency model (this is going to change in SQL Server 2005, code-named "Yukon"). In SQL Server 2000, readers and writers block each other in the default READ COMMITTED transaction isolation level. This comes as a big surprise to Oracle DBAs, and it is one of the reasons why applications that were written with Oracle in mind exhibit significant concurrency issues when they are ported to SQL Server 2000. Understanding the transaction and lock architecture in SQL Server 2000, and modifying the application code accordingly, is needed in order to scale such an application on SQL Server.

I will be covering the new isolation levels in SQL Server 2005 – Snapshot and Read Committed Snapshot – in another article; they make porting such an application to SQL Server a breeze.

Troubleshooting & Resolving blocking:

Most of this article so far has covered what blocking problems are, what causes them, and how to mitigate such issues at design time. If proper considerations are taken at design time, you can develop a very robust system that scales very well.

However, regardless of that, you will run into some blocking issues when the application is deployed under heavy load and heavy multi-user scenarios. That is just the nature of the beast. Also, if you are inheriting an application in which good design considerations were not adhered to, or if you are a consultant who has been asked to find and fix the problems, then you need to know how to detect application/database locking issues and how to resolve them. The next few paragraphs detail such scripts and give you links to some Microsoft KB articles that will help you in detecting such issues.

Detecting Issues:

You should use SQL Server Profiler and T-SQL scripts to detect and log the locking and blocking issues in your application.

a) T-SQL scripts: Useful sources include:

• Syslockinfo
• Sp_who/Sp_who2
• System objects for metadata information, such as sysobjects and syscolumns
• The DBCC INPUTBUFFER command

There is also an sp_blocker_pss80 procedure published by the Microsoft PSS team, and there are many variants of it used by DBAs around the world. Here is the link: http://support.microsoft.com/default.aspx?scid=kb;en-us;271509
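As one possible starting point (a sketch, not the PSS procedure itself), a query like this lists blocked sessions along with who is blocking them:

-- Show each blocked session and the spid it is waiting on.
SELECT spid        AS blocked_spid,
       blocked     AS blocking_spid,
       waittime    AS wait_ms,
       lastwaittype,
       hostname,
       program_name
FROM master.dbo.sysprocesses
WHERE blocked <> 0;

-- Then inspect the last statement sent by the blocking session, for example:
-- DBCC INPUTBUFFER(<blocking_spid>);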

b) SQL Server Profiler:

A sample Profiler trace template can be used as the starting point, with additional filters added as per your application needs. The same template can be used for deadlocks as well, if you want to run it for an extended time and know that you will be able to trace the deadlock event if it happens (deadlocks will be covered in part II of this article). Modify the trace template to select/remove the events and data columns as per your application needs.

Always run the SQL Server Profiler trace from a client machine rather than running it directly on the production server. Be aware that Profiler is just a GUI tool and the trace is really a server-side trace; it just gives you good visibility into the data in a GUI format. You can also script server-side traces for detecting issues. I had written an article before (link) that shows how to use server-side scripting for traces. Once you are done with the trace, you can save the output of the trace file into a SQL Server table and directly query the data from that table to diagnose the flow of events and the issues. Alternatively, you can also query the trace files directly using the fn_trace_gettable function (look up BOL for more information on it).
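For example, a saved trace file can be loaded into a table and queried roughly like this (the file path and table name are placeholders, and the query assumes the trace captured the TextData and Duration columns):

-- Load the trace file into a table for ad-hoc analysis.
SELECT *
INTO dbo.BlockingTrace
FROM ::fn_trace_gettable('C:\Traces\blocking.trc', default);

-- Longest-running statements captured by the trace.
SELECT TOP 20 TextData, Duration, StartTime, SPID
FROM dbo.BlockingTrace
ORDER BY Duration DESC;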

Resolving blocking issues:

a) Application Considerations: If you deduce that the blocking is caused by a poorly written transaction, try to rewrite it. Often, a single transaction might bring the entire application to its knees; other times, you will have to review many (or all) of your application's stored procedures before you can resolve the problems.

If you see many table locks in your database, you might want to evaluate your indexing strategy. Blocking problems can often be resolved by adding appropriate indexes.
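As a hypothetical illustration (the Orders table, CustomerID column and index name are assumptions, not from the article), an index that lets a modification seek rather than scan reduces the number of rows it has to lock:

-- Without an index on CustomerID, this update scans the table and may lock
-- far more rows (or escalate to a table lock) than it actually needs.
CREATE INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID);

UPDATE dbo.Orders
SET Status = 'Closed'
WHERE CustomerID = 42;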
