Performance and Tuning Design Guidelines
We outline some of the major performance and tuning design guidelines here. There are, of course, many more, but if you at least consider and apply the ones outlined here, you should end up with a decently performing SQL Server implementation. As we have described previously, performance and tuning should first be “designed in” to your SQL Server implementation. Many of the guidelines discussed here can be adopted easily in this way. However, when you put off performance and tuning until later, you have fewer options to apply and see less performance improvement when you do make changes.
Remember, addressing performance and tuning is like peeling an onion. And, for this reason, we present our guidelines in that way—layer by layer. This approach helps provide you with a great reference point for each layer and a list you can check off as you develop your SQL Server–based implementation. Just ask yourself whether you have considered the specific layer guidelines when you are dealing with that layer. Also, several chapters take you through the full breadth and depth of options and techniques introduced in many of these guidelines. We point you to those chapters as we outline the guidelines.
Hardware and Operating System Guidelines
Let’s start with the salient hardware and operating system guidelines that you should be considering:
Hardware/Physical Server:
Server sizing/CPUs—Physical (or virtual) servers that will host a SQL Server instance should be roughly sized to handle the maximum processing load plus 35% more CPUs (and you should always round up). As an example, for a workload that you anticipate may be fully handled by a four-CPU server configuration, we recommend automatically increasing the number of CPUs to six. We also always leave at least one CPU for the operating system. So, if six CPUs are on the server, you should allocate only five to SQL Server to use. You can find details on configuring CPUs in Chapter 55, “Configuring, Tuning, and Optimizing SQL Server Options,” and details on monitoring CPU utilization in Chapter 39, “Monitoring SQL Server Performance.”
Memory—The amount of memory you might need is often directly related to the amount of data you need to be in the cache to achieve 100% or near-100% cache hit ratios. This, of course, yields higher overall performance. We don’t believe there is such a thing as too much memory for SQL Server, but we do recognize that some memory must be left to the operating system to handle OS-level processing, connections, and so on. So, in general, you should make 90% of memory available to SQL Server and 10% to the OS. You can find details on configuring memory in Chapter 49 and details on monitoring memory utilization in Chapter 39.
Disk/SAN/NAS/RAID—Your disk subsystem can be a major contributor to performance degradation if not handled properly. We recognize that there are many different options available here. We generally try to have some separate devices on different I/O channels so that disk I/O isolation techniques can be used. This means that you isolate heavy I/O away from other heavy I/O activity; otherwise, disk head contention causes massive slowdowns in physical I/O. When you use SAN/NAS storage, much of the storage is just logical drives that are heavily cached. This type of situation limits the opportunity to spread out heavy I/O, but the caching layers often alleviate that problem. In general, RAID 10 is great for high update activity, and RAID 5 is great for mostly read-only activity. You can find more information on RAID and storage options in Chapter 38, “Database Design and Performance.”
Operating System:
Page file location—When physical memory is exceeded, paging occurs to the page file. You need to make sure that the page file is not located on one of your database disk locations; otherwise, performance of the whole server degrades rapidly.
Processes’ priority—You should never lower the SQL Server processes in priority or set them to run in the background. You should always have them set as high as possible.
Memory—As mentioned earlier, you should make sure that at least 10% of memory is available to the OS for all its housekeeping, connection handling, process threads, and so on.
OS version—You should make sure you are using the most recent version of the operating system you can and have updated it with the latest patches or service packs. Also, often you must remove other software on your server, such as specialized virus protection. We have lost track of the number of SQL Server implementations we have found that had some third-party virus software installed (and enabled) on them, with all files and communication to the server being intercepted by the virus scans. Rely on Microsoft Windows and your firewalls for this protection rather than a third-party virus solution that gets in the way of SQL Server. If your organization requires some type of virus protection on the server, at least disable scanning of the database device files.
Network:
Packet sizes/traffic—With broader bandwidth and faster network adapters (typically at least 1Gbps now), we recommend you utilize larger packet sizes to accommodate your heavier-traffic SQL Server instances. Packets of 8KB and larger are easily handled now. Information on configuring the SQL Server packet size is available in Chapter 49.
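A larger packet size can be set through sp_configure; the sketch below raises it to 8KB. The specific value is illustrative, and you should verify the benefit against your own workload before changing a production instance:

```sql
-- Raise the network packet size (in bytes) for heavy-traffic instances;
-- requires the advanced options to be visible first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'network packet size', 8192;
RECONFIGURE;
```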
Routers/switches/balancers—Depending on whether you are using SQL clustering or have multitiered application servers, you likely should utilize some type of load balancing at the network level to spread out connections from the network and avoid bottlenecks.
SQL Server Instance Guidelines
Next comes the SQL Server instance itself and the critical items that must be considered:
SQL Server configuration—We do not list many of the SQL Server instance options here, but many of the default options are more than sufficient to deal with most SQL Server implementations. See Chapter 49 for information on all the available options.
SQL Server device allocations—Devices should be treated with care and not overallocated. SQL databases utilize files and devices as their underlying allocation from the operating system. You do not want dozens and dozens of smaller files or devices for each database. Having all these files or devices makes them harder to administer, move, and manipulate. We often come into a SQL Server implementation and simplify the device allocations before we do any other work on the database. At a minimum, you should create data devices and log devices so that you can easily isolate (separate) them.
tempdb database—Perhaps the most misunderstood SQL Server shared resource is tempdb. The general guideline for tempdb is to minimize explicit usage (overusage) of it by limiting temp table creation, sorts, queries using the DISTINCT clause, and so on. Otherwise, you are creating a hot spot in your SQL Server instance that is mostly not in your control. You might find it hard to believe, but indexing, table design, and even not executing certain SQL statements can have a huge impact on what gets done in tempdb and have a huge effect on performance. And, of course, you need to isolate tempdb away from all other databases. For additional information on placing and monitoring tempdb, see Chapters 38 and 39.
master database—There is one simple guideline here: protect the master database at all costs. This means frequent backups and isolation of master away from all other databases.
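The frequent-backup part of this guideline is a one-liner; the backup path below is illustrative:

```sql
-- Back up master regularly; the disk path is an assumption for illustration
BACKUP DATABASE master
TO DISK = 'E:\SQLBackups\master_full.bak'
WITH INIT, CHECKSUM;
```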
model database—It seems harmless enough, but all databases in SQL Server utilize the model database as their base allocation template. We recommend you tailor this for your particular environment.
Memory—The best way to utilize and allocate memory to SQL Server depends on a number of factors. One is how many other SQL Server instances are running on the same physical server. Another is what type of SQL Server–based application it is: heavy update versus heavy reads. And yet another is how much of your application has been written with stored procedures, triggers, and so on. In general, you want to give as much of the OS memory to SQL Server as you can. But this amount should never exceed 90% of the available memory at the OS level. You don’t want SQL Server or the OS to start thrashing via the page file or competing against each other for memory. Also, when more than one SQL Server instance is on the same physical server, you need to divide the memory correctly for each. Don’t pit them against each other. More information on configuring and monitoring SQL Server memory is available in Chapters 39 and 49.
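Dividing memory between instances is done by capping each one with the max server memory option. The sketch below assumes a 64GB server running two instances, with each capped so that together they leave roughly 10% for the OS; the values are illustrative only:

```sql
-- Run on each instance with a cap sized for that instance's workload
-- (28GB here; a second instance might get the rest, leaving ~10% for the OS)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 28672;
RECONFIGURE;
```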
Database-Level Guidelines
Database allocations—When more than one database is being managed by a single SQL Server instance, we like to use an approach of putting database files for heavily used databases on the same drives as lightly used databases. In other words, pair big with small, not big with big. This approach is termed reciprocal database pairing. You should also not have too many databases on a single SQL Server instance. If the server fails, so do all the applications that were using the databases managed by this one SQL Server instance. It’s all about risk mitigation. Remember the adage “never put all your eggs in one basket.”
Databases have two primary file allocations: one for their data portion and the other for their transaction log portion. You should always isolate these file allocations from each other onto separate disk subsystems with separate I/O channels if possible. The transaction log is a hot spot for highly volatile applications (those that have frequent update activity). Isolate, isolate, and isolate some more. There is also a notion of something called reciprocal database device location. More information is available on this issue in Chapters 38 and 39.
You need to size your database files large enough to avoid database file fragmentation. Heavily fragmented database files can lead to excessive file I/O within the operating system and poor I/O performance. For example, if you know your database is going to grow to 500GB, size your database files at 500GB from the start so that the operating system can allocate a contiguous 500GB file. In addition, be sure to disable the Auto-Shrink database option. Allowing your database files to continuously grow and shrink also leads to excessive file fragmentation as file space is allocated and deallocated in small chunks.
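The data/log isolation and pre-sizing guidance above can be sketched in a single CREATE DATABASE statement. The database name, drive letters, and sizes are illustrative:

```sql
-- Data and log pre-sized on separate drives (separate I/O channels assumed),
-- with Auto-Shrink disabled to avoid grow/shrink file fragmentation
CREATE DATABASE Sales
ON PRIMARY
    (NAME = Sales_data, FILENAME = 'E:\SQLData\Sales_data.mdf',
     SIZE = 500GB, FILEGROWTH = 10GB)
LOG ON
    (NAME = Sales_log, FILENAME = 'F:\SQLLogs\Sales_log.ldf',
     SIZE = 50GB, FILEGROWTH = 5GB);

ALTER DATABASE Sales SET AUTO_SHRINK OFF;
```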
Database backup/recovery/administration—You should create a database backup and recovery schedule that matches the database’s update volatility and recovery point objective. All too often a set schedule is used when, in fact, it is not the schedule that should drive how often you do backups or how fast you must recover from failure.
Table Design Guidelines
Table designs—Given the massively increased CPU, memory, and disk I/O speeds that now exist, you should use a general guideline to create as “normalized” a table design as is humanly possible. No longer is it necessary to massively denormalize for performance. Most normalized table designs are easily supported by SQL Server. Normalized table designs ensure that data has high integrity and low overall redundant data maintenance. See Dr. E. F. Codd’s original work on relational database design (The Relational Model for Database Management: Version 2, Addison-Wesley, 1990). Denormalize for performance as a last resort! For more information on normalization and denormalization techniques, see Chapter 38.
NOTE
Too often, we have seen attempts by developers and database designers to guess at the performance problems they expect to encounter, denormalizing the database design before any real performance testing has even been done. This, more often than not, results in an unnecessarily, and sometimes excessively, denormalized database design. Overly denormalized databases require creating additional code to maintain the denormalized data, and this often ends up creating more performance problems than it attempts to solve, not to mention the greater potential for data integrity issues when data is heavily denormalized. It is always best to start with as normalized a database as possible, and begin testing early in the development process with real data volumes to identify potential areas where denormalization may be necessary for performance reasons. Then, and only when absolutely necessary, you can begin to look at areas in your table design where denormalization may provide a performance benefit.
Data types—You must be consistent! In other words, you need to take the time to make sure you have the same data type definitions for columns that will be joined and/or come from the same data domain—int to int, and so on. Often, the use of user-defined data types goes a long way toward standardizing the underlying data types across tables and databases. This is a very strong method of ensuring consistency.
Defaults—Defaults can help greatly in providing valid data values in columns that are common or that have been specified as mandatory (NOT NULL). Defaults are tied to the column and are consistently applied, regardless of the application that touches the table.
Check constraints—Check constraints can also be useful if you need to have checks of data values as part of your table definition. Again, this is a consistency capability at the column level that guarantees that only correct data ends up in the column. Let us add a word of warning, though: you have to be aware of the insert and update errors that can occur in your application from invalid data values that don’t meet the check constraints.
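The default and check-constraint guidelines above can be combined in one table definition. The table and constraint names here are illustrative:

```sql
-- A column default and a check constraint enforced at the column level,
-- consistently applied no matter which application touches the table
CREATE TABLE dbo.Orders (
    OrderID   int IDENTITY PRIMARY KEY,
    OrderDate datetime NOT NULL DEFAULT (getdate()),
    Quantity  int NOT NULL CONSTRAINT CK_Orders_Qty CHECK (Quantity > 0)
);
-- An INSERT with Quantity = 0 fails the check constraint with an error
-- the application must be prepared to handle
```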
Triggers—Often, triggers are used to maintain denormalized data, custom audit logs, and referential integrity. Triggers are often used when you want certain behavior to occur when updates, inserts, and deletes occur, regardless of where they are initiated from. Triggers can result in cascading changes to related (dependent) tables or failures to perform modifications because of restrictions. Keep in mind that triggers add overhead to even the simplest of data modification operations in your database and are a classic item to look at for performance issues. You should implement triggers sparingly and implement only triggers that are “appropriate” for the level of integrity or activity required by your applications, and no more than is necessary. Also, you need to be careful to keep the code within your triggers as efficient as possible so the impact on your data modifications is kept to a minimum. For more information on coding and using triggers, see Chapter 30, “Creating and Managing Triggers.”
Primary keys/foreign keys—For OLTP and normalized table designs, you need to utilize explicit primary key and foreign key constraints where possible. For many read-only tables, you may not even have to specify a primary key or foreign key at all. In fact, you will often be penalized with poorer load times or bulk update performance on tables that are used mostly as lookup tables. SQL Server must invoke and enforce integrity constraints if they are defined. If you don’t absolutely need them (such as with read-only tables), don’t specify them.
Table allocations—When creating tables, you should consider using the fill factor (free space) options (when you have a clustered index) to correspond to the volatility of the updates, inserts, and deletes that will be occurring in the table. Fill factor leaves free space in the index and data pages, allowing room for subsequent inserts without incurring a page split. You should avoid page splits as much as possible because they increase the I/O cost of insert and update operations. For more information on fill factor and page splits, see Chapter 34, “Data Structures, Indexes, and Performance.”
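A fill factor is supplied when the index is built; the table name and the 80% value below are illustrative and should be tuned to the table's actual update rate:

```sql
-- Leave 20% free space in the clustered index pages of a volatile table
-- to absorb subsequent inserts without page splits
CREATE CLUSTERED INDEX CIX_Orders_OrderDate
ON dbo.Orders (OrderDate)
WITH (FILLFACTOR = 80);
```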
Table partitioning—It can be extremely powerful to segregate a table’s data into physical partitions that are accessed via some natural subsetting such as date or key range. Queries that can take advantage of partitions can help reduce I/O by searching only the appropriate partitions rather than the entire table. For more information on table partitioning, see Chapters 24, “Creating and Managing Tables,” and 34.
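Partitioning by a date range takes three steps: a partition function defining the boundaries, a partition scheme mapping partitions to filegroups, and a table created on the scheme. The boundary dates, names, and the single-filegroup mapping below are all illustrative:

```sql
-- Yearly boundaries; RANGE RIGHT puts each boundary date in the later partition
CREATE PARTITION FUNCTION pfOrderYear (datetime)
AS RANGE RIGHT FOR VALUES ('2011-01-01', '2012-01-01', '2013-01-01');

-- All partitions on PRIMARY here for simplicity; production schemes
-- typically spread partitions across filegroups
CREATE PARTITION SCHEME psOrderYear
AS PARTITION pfOrderYear ALL TO ([PRIMARY]);

-- Date-range queries against this table can touch a single partition
CREATE TABLE dbo.OrderHistory (
    OrderID   int NOT NULL,
    OrderDate datetime NOT NULL
) ON psOrderYear (OrderDate);
```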
Purge/archive strategy—You should anticipate the growth of your tables and determine whether a purge/archive strategy will be needed. If you need to archive or purge data from large tables that are expected to continue to grow, it is best to plan for archiving and purging from the beginning. Many times, your archive/purge method may require modifications to your table design to support an efficient archive/purge method. In addition, if you are archiving data to improve performance of your OLTP applications, but the historical data needs to be maintained for reporting purposes, this also often requires incorporating the historical data into your database and application design. It is much easier to build an archive/purge method into your database and application from the start than to have to retrofit something back into an existing system. Performance of the archive/purge process often is better when it’s planned from the beginning as well.
Indexing Guidelines
In general, you need to be sure not to overindex your tables, especially tables that require good performance for data modifications! Common mistakes include creating redundant indexes on primary keys that already have primary key constraints defined or creating multiple indexes with the same set of leading columns. You should understand when an index is required based on need, not just the desire to have an index. Also, you should make sure that the indexes you define have sufficient cardinality to be useful for your queries. In most performance and tuning engagements that we do, we spend a good portion of our time removing indexes or redefining them correctly to better support the queries being executed against the tables. For more information on defining useful indexes and how queries are optimized, see Chapters 34 and 35, “Understanding Query Optimization.”
Following are some indexing guidelines:
Have an indexing strategy that matches the database/table usage; this is paramount. Do not index OLTP tables with a DSS indexing strategy, and vice versa.
For composite indexes, try to keep the more selective columns leftmost in the index.
Be sure to index columns used in joins. Joins are processed inefficiently if no index exists on the columns specified in the join.
Tailor your indexes for your most critical queries and transactions. You cannot index for every possible query that might be run against your tables. However, your applications will perform better if you can identify your critical and most frequently executed queries and design indexes to support them.
Avoid indexes on columns that have poor selectivity. The Query Optimizer is not likely to use them, so they would simply take up space and add unnecessary overhead during inserts, updates, and deletes.
Use clustered indexes when you need to keep your data rows physically sorted in a specific column order. If your data is growing sequentially or is primarily accessed in a particular order (such as range retrievals by date), a clustered index allows you to achieve this more efficiently.
Use nonclustered indexes to provide quicker direct access to data rows than a table scan when searching for data values not defined in your clustered index. Create nonclustered indexes wisely. You can often add a few other data columns to the nonclustered index (at the end of the index definition) to help satisfy SQL queries completely in the index (and not have to read the data page and incur extra I/O). This is termed “covering your query”: all query columns can be satisfied from the index structure.
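The INCLUDE clause is one way to build such a covering index; the table and column names below are illustrative:

```sql
-- The INCLUDE columns are stored only at the leaf level, so the query
-- below can be answered entirely from the index, with no data-page reads
CREATE NONCLUSTERED INDEX IX_Orders_CustID_Cover
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, TotalAmount);

SELECT CustomerID, OrderDate, TotalAmount
FROM dbo.Orders
WHERE CustomerID = 42;
```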
Consider specifying a clustered index fill factor (free space) value to minimize page splits for volatile tables. Keep in mind, however, that the fill factor is lost over time as rows are added to the table and pages fill up. You might need to implement a database maintenance job that runs periodically to rebuild your indexes and reapply the fill factor to the data and index pages.
Be extremely aware of the table/index statistics that the optimizer has available to it. When your table has changed by more than 20% from updates, inserts, or deletes, the data distribution can be affected quite a bit, and the optimizer’s decisions can change greatly. You’ll often want to ensure that the Auto-Update Statistics option is enabled for your databases to help ensure that index statistics are kept up-to-date as your data changes.
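Both the automatic option and a manual refresh are one-liners; the database and table names below are illustrative:

```sql
-- Let SQL Server refresh statistics automatically as data changes
ALTER DATABASE Sales SET AUTO_UPDATE_STATISTICS ON;

-- Or refresh a table's statistics explicitly, such as after a large bulk load
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```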
View Design Guidelines
In general, you can have as many views as you want. Views are not tables and do not take up any storage space (unless you create an index on the view). They are merely an abstraction for convenience. Except for indexed views, views do not store any data; the results of a view are materialized at the time the query is run against the view, and the data is retrieved from the underlying tables. Views can be used to hide complex queries, can be used to control data access, and can be used in the same place as a table in the FROM clause of any SQL statement.
Following are some view design guidelines:
Use views to hide tables that change their structure often. By using views to provide a stable data access view to your application, you can greatly reduce programming changes.
Utilize views to control security and control access to table data at the data value level.
Be careful of overusing views containing complex multitable queries, especially code that joins such views together. When the query is materialized, what may appear as a simple join between two or three views can result in an expensive join between numerous tables, sometimes including joins to a single table multiple times.
Use indexed views to dramatically improve performance for data accesses done via views. Essentially, SQL Server creates an indexed lookup via the view to the underlying table’s data. There is storage and overhead associated with these views, so be careful when you utilize this performance feature. Although indexed views can help improve the performance of SELECT statements, they add overhead to INSERT, UPDATE, and DELETE statements because the rows in the indexed view need to be maintained as data rows are modified, similar to the maintenance overhead of indexes.
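An indexed view is created with SCHEMABINDING and then materialized by a unique clustered index. The table and columns below are illustrative; note that indexed views carry additional requirements (two-part names, COUNT_BIG(*) alongside aggregates, and specific SET options at creation time):

```sql
-- Materialize a per-customer sales aggregate as an indexed view
CREATE VIEW dbo.vSalesByCustomer
WITH SCHEMABINDING
AS
SELECT CustomerID,
       SUM(TotalAmount) AS TotalSales,
       COUNT_BIG(*)     AS OrderCount   -- required when the view aggregates
FROM dbo.Orders
GROUP BY CustomerID;
GO

-- The unique clustered index is what physically stores the view's rows
CREATE UNIQUE CLUSTERED INDEX IXV_SalesByCustomer
ON dbo.vSalesByCustomer (CustomerID);
```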
For more information on creating and using views, see Chapter 27, “Creating and
Managing Views.”
Transact-SQL Guidelines
Overall, how you write your Transact-SQL (T-SQL) code can have one of the greatest impacts on your SQL Server performance. Regardless of how well you’ve optimized your server configuration and database design, poorly written and inefficient SQL code still results in poor performance. The following sections list some general guidelines to help you write efficient, faster-performing code.
General T-SQL Coding Guidelines
Use IF EXISTS instead of SELECT COUNT(*) when checking only for the existence of any matching data values. IF EXISTS stops the processing of the SELECT query as soon as the first matching row is found, whereas SELECT COUNT(*) continues searching until all matches are found, wasting I/O and CPU cycles.
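The two forms side by side (table and predicate are illustrative):

```sql
-- Stops at the first matching row
IF EXISTS (SELECT * FROM dbo.Orders WHERE CustomerID = 42)
    PRINT 'Customer has orders';

-- Counts every matching row just to test for existence
IF (SELECT COUNT(*) FROM dbo.Orders WHERE CustomerID = 42) > 0
    PRINT 'Customer has orders';
```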
Using EXISTS/NOT EXISTS in a subquery is preferable to IN/NOT IN for sets that are queried. As the potential target size of the set used in the IN gets larger, the performance benefit increases.
Avoid unnecessary ORDER BY or DISTINCT clauses. Unless the Query Optimizer determines that the rows will be returned in sorted order or all rows are unique, these operations require a worktable for processing the results, which incurs extra overhead and I/O. Avoid these operations if it is not imperative for the rows to be returned in a specific order or if it’s not necessary to eliminate duplicate rows.
Use UNION ALL instead of UNION if you do not need to eliminate duplicate result rows from the result sets being combined with the UNION operator. The UNION statement has to combine the result sets into a worktable to remove any duplicate rows from the result set. UNION ALL simply concatenates the result sets together, without the overhead of putting them into a worktable to remove duplicate rows.
Use table variables instead of temporary tables whenever possible or feasible. Table variables are memory resident and do not incur the I/O overhead and system table and I/O contention that can occur in tempdb with normal temporary tables.
If you need to use temporary tables, keep them as small as possible so they are created and populated more quickly, use less memory, and incur less I/O. Select only the required columns rather than using SELECT *, and retrieve only the rows from the base table that you actually need to reference. The smaller the temporary table, the faster it is to create and access.
If a temporary table is of sufficient size and will be accessed multiple times, it is often cost effective to create an index on it on the column(s) that will be referenced in the search arguments (SARGs) of queries against the temporary table. Do this only if the time it takes to create the index plus the time the queries take to run using the index is less than the total time it takes the queries against the temporary table to run without the index.
Avoid unnecessary function executions. If you call a SQL Server function (for example, getdate()) repeatedly within T-SQL code, consider using a local variable to hold the value returned by the function and use the local variable repeatedly throughout your SQL statements rather than repeatedly executing the SQL Server function. This saves CPU cycles within your T-SQL code.
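A minimal sketch of this pattern (the tables are illustrative):

```sql
-- Call getdate() once and reuse the captured value
DECLARE @now datetime;
SET @now = getdate();

UPDATE dbo.Orders     SET ModifiedDate = @now WHERE OrderID = 1001;
UPDATE dbo.OrderAudit SET AuditDate    = @now WHERE OrderID = 1001;
```

Reusing the variable also guarantees every statement sees the same timestamp, which repeated getdate() calls do not.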
Try to use set-oriented operations instead of cursor operations whenever possible and feasible. SQL Server is optimized for set-oriented operations, so they are almost always faster than cursor operations performing the same task. However, one potential exception to this rule is when performing a large set-oriented operation leads to locking concurrency issues. Even though a single update runs faster than a cursor, while it is running, the single update might end up locking the entire table, or large portions of the table, for an extended period of time. This would prevent other users from accessing the table during the update. If concurrent access to the table is more important than the time it takes for the update itself to complete, you might want to consider using a cursor.
Consider using the MERGE statement introduced in SQL Server 2008 when you need to perform multiple modifications against a table (UPDATE, INSERT, or DELETE) because it enables you to perform these operations in a single pass of the table rather than performing a separate pass for each operation.
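A typical insert-or-update ("upsert") in one pass, with illustrative table and column names:

```sql
-- One scan of the target handles both the updates and the inserts
MERGE dbo.Customers AS target
USING dbo.CustomerStaging AS source
    ON target.CustomerID = source.CustomerID
WHEN MATCHED THEN
    UPDATE SET target.Name = source.Name
WHEN NOT MATCHED THEN
    INSERT (CustomerID, Name)
    VALUES (source.CustomerID, source.Name);   -- MERGE requires a terminating semicolon
```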
Consider using the OUTPUT clause to return results from INSERT, UPDATE, or DELETE statements rather than having to perform a separate lookup against the table.
Use search arguments that can be effectively optimized by the Query Optimizer. Try to avoid using any negative logic in your SARGs (for example, !=, <>, NOT IN) or performing operations on, or applying functions to, the columns in the SARG. Avoid using expressions in your SARGs where the search value cannot be evaluated until runtime (such as local variables, functions, and aggregations in subqueries) because the optimizer cannot accurately determine the number of matching rows; it doesn’t have a value to compare against the histogram values during query optimization. Consider putting such queries into stored procedures and passing in the value of the expression as a parameter, because SQL Server evaluates the value of a parameter prior to optimizing the stored procedure.
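A common case of a function applied to a SARG column, and its sargable rewrite (table and dates illustrative):

```sql
-- Non-sargable: the function on the column prevents an index seek on OrderDate
SELECT OrderID FROM dbo.Orders
WHERE YEAR(OrderDate) = 2012;

-- Sargable rewrite: the bare column can be matched against an index
SELECT OrderID FROM dbo.Orders
WHERE OrderDate >= '2012-01-01' AND OrderDate < '2013-01-01';
```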
Avoid data type mismatches on join columns.
Avoid writing large complex queries whenever possible. Complex queries with a large number of tables and join conditions can take a long time to optimize. It may not be possible for the Query Optimizer to analyze the entire set of plan alternatives, and it is possible that a suboptimal query plan could be chosen. Typically, if a query involves more than 12 tables, it is likely that the Query Optimizer will have to rely on heuristics and shortcuts to generate a query plan and may miss some optimal strategies.
For more tips and information on coding effective and efficient queries, see Chapters 43, “Transact-SQL Programming Guidelines, Tips, and Tricks,” and 35.
Stored Procedure Guidelines
Use stored procedures for SQL execution from your applications. Stored procedure execution can be more efficient than ad hoc SQL due to reduced network traffic and query plan caching for stored procedures.
Use stored procedures to make your database something of a “black box” as far as your application code is concerned. If all database access is managed through stored procedures, the applications are shielded from possible changes to the underlying database structures. You can simply modify the existing stored procedures to reflect changes to the database structures without requiring any changes to the front-end application code.
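A thin procedure of this kind might look as follows; the names and columns are illustrative:

```sql
-- The application calls this procedure and never references dbo.Orders directly
CREATE PROCEDURE dbo.GetCustomerOrders
    @CustomerID int   -- matches the CustomerID column's data type exactly
AS
BEGIN
    SET NOCOUNT ON;
    SELECT OrderID, OrderDate, TotalAmount
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
END;
```

If the underlying table is later restructured, only the procedure body changes; every caller keeps working unchanged.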
Ensure that your parameter data types match the column data types they are being compared against to avoid data type mismatches and poor query optimization.