Microsoft® SQL Server™ 2005 Performance Optimization and Tuning Handbook
Amsterdam • Boston • Heidelberg • London • New York • Oxford
Paris • San Diego • San Francisco • Singapore • Sydney • Tokyo

Digital Press is an imprint of Elsevier
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA
Linacre House, Jordan Hill, Oxford OX2 8DP, UK
Copyright © 2007, Elsevier Inc. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Permissions may be sought directly from Elsevier's Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: permissions@elsevier.com. You may also complete your request online via the Elsevier homepage (http://elsevier.com), by selecting "Support & Contact," then "Copyright and Permission," and then "Obtaining Permissions."

Recognizing the importance of preserving what has been written, Elsevier prints its books on acid-free paper whenever possible.

Library of Congress Cataloging-in-Publication Data
Application Submitted

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
Contents at a Glance

Introduction
…
11 Architectural Performance Options and Choices
…
Contents

Introduction

1 Performance and SQL Server 2005
1.1 Partitioning tables and indexes
1.2 Building indexes online
1.3 Transact SQL improvements
1.4 Adding the .NET Framework
1.5 Trace and replay objects
1.6 Monitoring resource consumption with SQL OS
1.7 Establishing baseline metrics
1.8 Start using the GUI tools
1.8.1 SQL Server Management Studio
1.8.2 SQL Server Configuration Manager
1.8.3 Database Engine Tuning Advisor
1.8.4 SQL Server Profiler
1.8.5 Business Intelligence Development Studio
1.9 Availability and scalability
1.10 Other useful stuff

2 Logical Database Design for Performance
2.1 Introducing logical database design for performance
2.2 Commercial normalization techniques
2.2.1 Referential integrity
2.2.2 Primary and foreign keys
2.2.3 Business rules in a relational database model
2.2.4 Alternate indexes
2.3 Denormalization for performance
2.3.1 What is denormalization?
2.3.2 Denormalizing the already normalized
2.3.2.1 Multiple table joins (more than two tables)
2.3.2.2 Multiple table joins finding a few fields
2.3.2.3 The presence of composite keys
2.3.2.4 One-to-one relationships
2.3.2.5 Denormalize static tables
2.3.2.6 Reconstructing collection lists
2.3.2.7 Removing tables with common fields
2.3.2.8 Reincorporating transitive dependencies
2.3.3 Denormalizing by context
2.3.3.1 Copies of single fields across tables
2.3.3.2 Summary fields in parent tables
2.3.3.3 Separating data by activity and application requirements
2.3.3.4 Local application caching
2.3.4 Denormalizing and special purpose objects
2.4 Extreme denormalization in data warehouses
2.4.1 The dimensional data model
2.4.1.1 What is a star schema?
2.4.1.2 What is a snowflake schema?
2.4.2 Data warehouse data model design basics
2.4.2.1 Dimension tables
2.4.2.3 Other factors to consider during design

3.1 Introducing physical database design
3.2 Data volume analysis
3.3 Transaction analysis
3.4 Hardware environment considerations

4.3 Increasing the size of a database
4.4 Decreasing the size of a database
4.4.1 The autoshrink database option
4.4.2 Shrinking a database in the SQL Server …
4.4.3 Shrinking a database using DBCC statements
4.5 Modifying filegroup properties
4.6 Setting database options
4.7 Displaying information about databases
4.8 System tables used in database configuration
4.11 Looking into database pages
4.12 Pages for space management
4.13 Partitioning tables into physical chunks
4.13.1 Types of partitions
4.13.2 Creating a range partition
4.13.3 Creating an even distribution partition
4.14 The BankingDB database

… Framework (SQL-DMF)
5.9 Dropping and renaming indexes
5.10 Displaying information about indexes
5.10.1 The system stored procedure sp_helpindex
5.10.2 The system table sysindexes
5.10.3 Using metadata functions to obtain information …
5.10.4 The DBCC statement DBCC SHOWCONTIG
5.11 Creating indexes on views
5.12 Creating indexes with computed columns
5.13 Using indexes to retrieve data
5.13.1 Retrieving a single row
5.13.2 Retrieving a range of rows
5.13.3 Covered queries
5.13.4 Retrieving a single row with a clustered index on …

6.1 The SELECT statement
6.1.1 Filtering with the WHERE clause
6.1.2 Sorting with the ORDER BY clause
6.1.2.1 Overriding WHERE with ORDER BY
6.1.3 Grouping result sets
6.1.3.1 Sorting with the GROUP BY clause
6.1.3.2 Using DISTINCT
6.1.3.3 The HAVING clause
6.2.1 Data type conversions
6.3 Comparison conditions
6.3.1 Equi, anti, and range
6.3.2 LIKE pattern matching
6.4.3 How to tune a join
6.5 Using subqueries for efficiency
6.5.1 Correlated versus non-correlated subqueries
6.5.2 IN versus EXISTS
6.5.3 Nested subqueries
6.5.4 Advanced subquery joins
6.6 Specialized metadata objects
6.7 Procedures in Transact SQL

7.1 When is a query optimized?
7.2 The steps in query optimization
7.4.7.5 Multiple non-clustered indexes present
7.5 Join order selection
7.6 How joins are processed
7.6.1 Nested loops joins

8.1 Text-based query plans and statistics
8.1.1 SET SHOWPLAN_TEXT { ON | OFF }
8.1.2 SET SHOWPLAN_ALL { ON | OFF }
8.1.3 SET SHOWPLAN_XML { ON | OFF }
8.1.4 SET STATISTICS PROFILE { ON | OFF }
8.1.5 SET STATISTICS IO { ON | OFF }
8.1.6 SET STATISTICS TIME { ON | OFF }
8.1.7 SET STATISTICS XML { ON | OFF }
8.2 Query plans in Management Studio
8.2.1 Statistics and cost-based optimization
8.3 Hinting to the optimizer
… manipulation language statements
8.4.2 Temporary tables
8.4.3 Forcing recompilation
8.4.4 Aging stored procedures from cache
8.5 Non-stored procedure plans
8.6 The syscacheobjects system table

9.1 SQL Server and CPU
9.1.1 An overview of Windows and CPU utilization
9.1.2 How SQL Server uses CPU
9.1.2.2 Use of symmetric multiprocessing systems
9.1.2.4 Query parallelism
9.1.3 Investigating CPU bottlenecks
9.1.4 Solving problems with CPU
9.2 SQL Server and memory
9.2.1 An overview of Windows virtual memory management
9.2.2 How SQL Server uses memory
9.2.2.1 Configuring memory for SQL Server
9.2.3 Investigating memory bottlenecks
9.2.4 Solving problems with memory
9.3 SQL Server and disk I/O
9.3.1 An overview of Windows and disk I/O
9.3.2 How SQL Server uses disk I/O
9.3.2.1 An overview of the data cache
9.3.2.2 Keeping tables and indexes in cache
9.3.2.3 Read-ahead scans
9.3.2.4 Shrinking database files
9.3.3 Investigating disk I/O bottlenecks
9.3.4 Solving problems with disk I/O

10.1 Why a locking protocol?
10.2 The SQL Server locking protocol
10.2.1 Shared and exclusive locks
10.2.2 Row-, page-, and table-level locking
10.2.2.1 When are row-level locks used?
10.2.2.2 When are table-level locks used?
10.2.3 Lock timeouts
10.4.4 More modified locking behavior

11 Architectural Performance Options and Choices

12.1 System stored procedures
12.2 System monitor, performance logs, and alerts
12.3 SQL Server 2005 Management Studio
12.3.1 Client statistics
12.3.2 The SQL Server Profiler
12.3.2.1 What events can be traced?
12.3.2.2 What information is collected?
12.3.2.3 Filtering information
12.3.2.4 Creating an SQL Server profiler trace
12.3.2.5 Creating traces with stored procedures
12.3.3 Database Engine Tuning Advisor
12.4 SQL OS and resource consumption
Introduction

What is the goal of tuning an SQL Server database? The goal is to improve performance until acceptable levels are reached. Acceptable levels can be defined in a number of ways. For a large online transaction processing (OLTP) application the performance goal might be to provide sub-second response time for critical transactions and to provide a response time of less than two seconds for 95 percent of the other main transactions. For some systems, typically batch systems, acceptable performance might be measured in throughput. For example, a settlement system may define acceptable performance in terms of the number of trades settled per hour. For an overnight batch suite acceptable performance might be that it must finish before the business day starts.

Whatever the system, designing for performance should start early in the design process and continue after the application has gone live. Performance tuning is not a one-off process but an iterative process during which response time is measured, tuning performed, and response time measured again.
There is no right way to design a database; there are a number of possible approaches, and all of these may be perfectly valid. It is sometimes said that performance tuning is an art, not a science. This may be true, but it is important to undertake performance tuning experiments with the same kind of rigorous, controlled conditions under which scientific experiments are performed. Measurements should be taken before and after any modification, and modifications should be made one at a time, so it can be established which modification, if any, resulted in an improvement or degradation.

What areas should the database designer concentrate on? The simple answer to this question is that the database designer should concentrate on those areas that will return the most benefit. In my experience, for most database designs I have worked with, large gains are typically made in the area of query and index design. As we shall see later in this book, inappropriate indexes and badly written queries, as well as some other contributing factors, can negatively influence the query optimizer such that it chooses an inefficient strategy.

To give you some idea of the gains to be made in this area, I was once asked to look at a query that joined a number of large tables together. The query was abandoned after it had not completed within 12 hours. The addition of an index, in conjunction with a modification to the query, meant the query now completed in less than eight minutes! This magnitude of gain cannot be achieved just by purchasing more hardware or by twiddling with some arcane SQL Server configuration option. A database designer or administrator's time is always limited, so make the best use of it! The other main area where gains can be dramatic is lock contention. Removing lock bottlenecks in a system with a large number of users can have a huge impact on response times.
Now, some words of caution when chasing performance problems. If users phone up to tell you that they are getting poor response times, do not immediately jump to conclusions about what is causing the problem. Circle at a high altitude first. Having made sure that you are about to monitor the correct server, use the System Monitor to look at the CPU, disk subsystem, and memory use. Are there any obvious bottlenecks? If there are, then look for the culprit. Everyone blames the database, but it could just as easily be someone running his or her favorite game! If there are no obvious bottlenecks, and the CPU, disk, and memory counters in the System Monitor are lower than usual, then that might tell you something. Perhaps the network is sluggish, or there is lock contention. Also be aware of the fact that some bottlenecks hide others. A memory bottleneck often manifests itself as a disk bottleneck.

There is no substitute for knowing your own server and knowing the normal range of System Monitor counters. Establish trends. Measure a set of counters regularly, and then, when someone comments that the system is slow, you can wave a graph in front of him or her showing that it isn't!

Special thanks are due to Craig Mullins for his work on the technical editing of this book.
So, when do we start to worry about performance? As soon as possible, of course! We want to take the logical design and start to look at how we should transform it into an efficient physical design.

Gavin Powell can be contacted at the following email address: ezpowell@ezpowell.com
1 Performance and SQL Server 2005
1.1 Partitioning tables and indexes

Partitioning lets you split large chunks of data into much more manageable smaller physical chunks of disk space. The intention is to reduce I/O activity. For example, let's say you have a table with 10 million rows and you only want to read 1 million rows to compile an analytical report. If the table is divided into 10 partitions, and your 1 million rows are contained in a single partition, then you get to read 1 million rows as opposed to 10 million rows. On that scale you can get quite a serious difference in I/O activity for a single report.
SQL Server 2005 allows for table partitioning and index partitioning. What this means is that you can create a table as a partitioned table, defining specifically where each physical chunk of the table or index resides. SQL Server 2000 partitioning was essentially manual partitioning, using multiple tables distributed across multiple SQL Server computers. Then a view (partition view) was created to overlay those tables across the servers. In other words, a query required access to a view, which contained a query, not data. SQL Server 2005 table partitions contain real physical rows. Physically partitioning tables and indexes has a number of benefits:

• Data can be read from a single partition at once, cutting down enormously on performance-hogging I/O.
• Data can be accessed from multiple partitions in parallel, which speeds things up, depending on how many processors a server platform has.
• Different partitions can be managed separately, without having to interfere with the entire table.
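As a minimal sketch of the SQL Server 2005 syntax (the table, column, and boundary values here are illustrative assumptions, not examples from this book), a range-partitioned table is built in three steps: a partition function, a partition scheme, and the table itself:

    -- 1. Partition function: maps values of a column to partitions by range
    CREATE PARTITION FUNCTION pfOrderYear (DATETIME)
    AS RANGE RIGHT FOR VALUES ('2003-01-01', '2004-01-01', '2005-01-01');

    -- 2. Partition scheme: maps each partition to a filegroup
    --    (ALL TO ([PRIMARY]) keeps the sketch simple; production systems
    --    would normally spread partitions across separate filegroups/disks)
    CREATE PARTITION SCHEME psOrderYear
    AS PARTITION pfOrderYear ALL TO ([PRIMARY]);

    -- 3. The table is created on the scheme, partitioned by order_date
    CREATE TABLE Orders
    (
        order_id   INT NOT NULL,
        order_date DATETIME NOT NULL
    )
    ON psOrderYear (order_date);

A query restricted to a single year then touches only that year's partition, which is exactly the reduction in I/O activity described above.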
1.2 Building indexes online
Building an index online allows the table being indexed to remain accessible during the index creation process. Creating or regenerating an index for a very large table can consume a considerable period of time (hours, even days). Without online index building, creating an index puts the table offline. If that table is crucial to the running of a computer system, then you have down time. The result was usually that indexes were not created, or never regenerated. Even the most versatile BTree indexes can sometimes require rebuilding to increase their performance. Constant data manipulation activity on a table (record inserts, updates, and deletions) can cause a BTree index to deteriorate over time. Online index building is crucial to the constant uptime required by modern databases for popular websites.
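As a minimal sketch (the index and table names are illustrative; online operations are an Enterprise Edition feature), online building is requested through the ONLINE index option:

    -- Create an index while the table stays available to readers and writers
    CREATE NONCLUSTERED INDEX ix_orders_date
    ON Orders (order_date)
    WITH (ONLINE = ON);

    -- An existing, deteriorated index can likewise be rebuilt online
    ALTER INDEX ix_orders_date ON Orders
    REBUILD WITH (ONLINE = ON);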
1.3 Transact SQL improvements

Transact SQL provides programmable access to SQL Server. Programmable access means that Transact SQL allows you to construct stored code blocks in the database, such as stored procedures, triggers, and functions. These code blocks have direct access to other database objects, most significantly tables, where query and data manipulation commands can be executed directly; and the code blocks are executed on the database server. New capabilities added to Transact SQL in SQL Server 2005 are as follows:

• Error handling
• Recursive queries
• Better query writing capabilities
There is also something new to SQL Server 2005 called Multiple Active Result Sets (MARS). MARS allows more than a single set of rows for a single connection. In other words, a second query can be submitted to SQL Server while the result set of a first query is still being returned from database server to client application.

The overall result of the Transact SQL enhancements in SQL Server 2005 is increased performance of code, better written code, and more versatility. Better written code can ultimately make for better performing applications in general.
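Two of those new capabilities can be sketched briefly: structured error handling with TRY...CATCH, and recursive queries via common table expressions (the table and column names below are illustrative assumptions):

    -- Structured error handling, new to Transact SQL in SQL Server 2005
    BEGIN TRY
        BEGIN TRANSACTION;
        UPDATE Accounts SET balance = balance - 100 WHERE account_id = 1;
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        SELECT ERROR_NUMBER() AS error_number, ERROR_MESSAGE() AS error_message;
    END CATCH;

    -- A recursive query: walk an employee reporting hierarchy top-down
    WITH EmployeeTree AS
    (
        SELECT employee_id, manager_id
        FROM Employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.employee_id, e.manager_id
        FROM Employees e
        JOIN EmployeeTree t ON e.manager_id = t.employee_id
    )
    SELECT * FROM EmployeeTree;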
1.4 Adding the .NET Framework
You can use programming languages other than just Transact SQL and embed code into SQL Server as .NET Framework executables. These programming languages can leverage existing personnel skills. Perhaps more importantly, some tasks can be written in programming languages more appropriate to the task at hand. For example, a language like C# can be used, letting a programmer take advantage of the speed of compiled executable code.

Overall, you get support for languages not inherently part of SQL Server (Transact SQL). You get faster and easier development. You get to use Web Services and XML (with native XML capabilities using XML data types). The result is faster development, better development, and hopefully better overall database performance in the long run.
The result you get is something called managed code. Managed code is code executed by the .NET Framework. As already stated, managed code can be written using all sorts of programming languages. Different programming languages have different benefits. For example, C is fast and efficient, whereas Visual Basic is easier to write code with but executes more slowly. Additionally, the .NET Framework has tremendous built-in functionality. .NET is much, much more versatile and powerful than Transact SQL.

There is much to be said for placing executable code into a database, on a database server such as SQL Server. There is also much to be said against this practice. Essentially, the more metadata and logic you add to a database, the more business logic you add to a database. In my experience, adding too much business logic to a database can cause performance problems in the long run. After all, application development languages cater to number crunching and other tasks. Why put intensive, non-data-access processing into a database? The database system has enough to do just keeping your data up to date and available.

Managed code also compiles to native code, or native form in SQL Server, immediately prior to execution. So, it should execute a little faster because it executes in a form amenable to best performance in SQL Server.
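On the database side, hosting managed code boils down to registering a compiled assembly and wrapping one of its static methods, roughly as follows (the assembly, class, and method names are hypothetical):

    -- CLR integration is off by default in SQL Server 2005
    EXEC sp_configure 'clr enabled', 1;
    RECONFIGURE;

    -- Register a compiled .NET assembly inside the database
    CREATE ASSEMBLY StringUtilities
    FROM 'C:\assemblies\StringUtilities.dll'
    WITH PERMISSION_SET = SAFE;

    -- Expose a static method of the assembly as an ordinary SQL function
    CREATE FUNCTION dbo.ReverseString (@input NVARCHAR(4000))
    RETURNS NVARCHAR(4000)
    AS EXTERNAL NAME StringUtilities.[StringUtilities.StringFunctions].ReverseString;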
SQL Server 2005 includes a new management object model called SQL Management Objects (SMO). The SMO has its basis in the .NET Framework. The new graphical SQL Server Management Studio is written using the SMO.
1.5 Trace and replay objects
Tracing is the process of producing large amounts of log entry information during normal database operations. However, it might be prudent not to choose tracing as a first option for solving a performance issue. Tracing can hurt performance simply because it generates lots of data. The point of producing trace files is to aid in finding errors or performance bottlenecks which cannot be deciphered by more readily available means.

So, tracing quite literally produces trace information. Replay allows replay of the actions that generated those trace events. So, you could replay a sequence of events against a SQL Server, without actually changing any data, and reproduce the unpleasant performance problem. And then you could try to reanalyze the problem, try to decipher it, and try to resolve or improve it.
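Chapter 12 returns to tracing in detail, but as a minimal hedged sketch, a server-side trace is created with system stored procedures along these lines (the file path is a placeholder; event 12 is SQL:BatchCompleted, and columns 1 and 13 are TextData and Duration):

    DECLARE @TraceID INT, @maxfilesize BIGINT, @on BIT;
    SET @maxfilesize = 50;  -- megabytes per rollover file
    SET @on = 1;

    -- Option 2 = TRACE_FILE_ROLLOVER: a new file is started at the size limit
    EXEC sp_trace_create @TraceID OUTPUT, 2, N'C:\traces\perf_trace',
         @maxfilesize, NULL;

    -- Collect the statement text and duration of each completed batch
    EXEC sp_trace_setevent @TraceID, 12, 1, @on;
    EXEC sp_trace_setevent @TraceID, 12, 13, @on;

    -- Status 1 = start the trace
    EXEC sp_trace_setstatus @TraceID, 1;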
1.6 Monitoring resource consumption with SQL OS

SQL OS is a new tool for SQL Server 2005, which lives between an SQL Server database and the underlying Windows operating system (OS). The operating system manages, runs, and accesses computer hardware on your database server, such as CPU, memory, disk I/O, and even tasks and scheduling. SQL OS allows a direct picture into the hardware side of SQL Server and how the database is perhaps abusing that hardware and operating system. The idea is to view the hardware and the operating system from within an SQL Server 2005 database.
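That view is exposed through a family of dynamic management views named sys.dm_os_*. A minimal sketch of two of them:

    -- Which resource waits have cost the most time since the last restart?
    SELECT TOP (10) wait_type, waiting_tasks_count, wait_time_ms
    FROM sys.dm_os_wait_stats
    ORDER BY wait_time_ms DESC;

    -- One row per CPU scheduler; a consistently high runnable_tasks_count
    -- suggests CPU pressure
    SELECT scheduler_id, current_tasks_count, runnable_tasks_count
    FROM sys.dm_os_schedulers
    WHERE scheduler_id < 255;  -- exclude hidden system schedulers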
1.7 Establishing baseline metrics

A baseline is a setting established by a database administrator, either written down on paper or, preferably, stored in a database (generated by the database). This baseline establishes an acceptable standard of performance. If a baseline is exceeded, then the database is deemed to have a performance problem. A metric is essentially a measure of something. The result is many metrics, with established acceptable baseline values. If one or more metric baselines are exceeded, then there is deemed to be one or more performance problems. Additionally, each metric can be exceeded for a previously established reason, based on what the metric is. So, if a table, with its indexes, has an established baseline value of 10 bytes per minute of I/O activity, and suddenly that value jumps up to 10 gigabytes per minute, there is probably a performance problem.
An established baseline metric is a measure of normal or acceptable activity.
Metric baselines have more significance (there are more metrics) in SQL Server 2005 than in SQL Server 2000. The overall effect is that an SQL Server 2005 database is now more easily monitored, and the prospect of some automated tuning activities becomes more practical in the long term. SQL Server 2005 has added over 70 additional baseline measures applicable to the performance of an SQL Server database. These new baseline metrics cover areas such as memory usage, locking activities, scheduling, network usage, transaction management, and disk I/O activity.
The obvious answer to a situation such as this (the sudden jump in I/O activity described above) is that a key index has been dropped, corrupted, or has deteriorated. Or a query could be doing something unexpected, such as reading all rows in a very large table.
Using metrics and their established baseline or expected values, one can perform a certain amount of automated monitoring and detection of performance problems.

Baseline metrics are essentially statistical values collected for a set of metrics. A metric is a measure of some activity in a database.
The most effective method of gathering those expected metric values is to collect multiple values, and then aggregate and average them. And thus the term statistic applies, because a statistic is an aggregate or average value resulting from a sample of multiple values. So, when some activity veers away from previously established statistics, you know that there could be some kind of performance problem; the larger the variation, the larger the potential problem.
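As a minimal sketch of that collection process (the baseline table and the choice of counters are illustrative assumptions), performance counters can be sampled from inside the database on a schedule and aggregated later:

    -- A simple repository for baseline samples
    CREATE TABLE BaselineSample
    (
        sample_time  DATETIME NOT NULL DEFAULT GETDATE(),
        counter_name NCHAR(128) NOT NULL,
        cntr_value   BIGINT NOT NULL
    );

    -- Run on a schedule; averaging many samples yields the baseline statistic
    INSERT INTO BaselineSample (counter_name, cntr_value)
    SELECT counter_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE counter_name IN (N'Page life expectancy', N'Batch Requests/sec');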
Baseline metrics should be gathered in the following activity sectors:

• High load: Peak times (highest database activity).
• Low load: Off-peak times (lowest database activity).
• Batch activity: Batch processing times, such as during backup processing and heavy reporting or extraction cycles.
• Downtime: How long it takes to back up, restore, and recover is something the executive management will always have to detail to clients. This equates to uptime and potential downtime.
Some very generalized categories of metric baseline measurement are as follows:

• Applications database access: The most common performance problems are caused by poorly built queries and locking, or hot blocks (conflict caused by too much concurrency on the same data). In computer jargon, concurrency means lots of users accessing and changing the same data all at the same time. If there are too many concurrent users, ultimately any relational database has its limitations on what it can manage efficiently.
• Internalized database activity: Statistics must not only be present but also kept up to date. When a query reads a table, it uses what's called an optimizer process to make an educated guess at what it should do. If a table has 1 million rows, plus an index, and a query seeks 1 record, the optimizer will tell the query to read the index. The optimizer uses statistics to compare the 1 record required with the 1 million rows available. Without the optimizer, 1 million rows would be read to find 1 record. Without statistics the optimizer cannot even hazard a guess and will probably read everything. If statistics are out of date, such that the optimizer thinks the table has 2 rows when there are really 1 million, then the optimizer will likely guess very badly.
• Internalized database structure: Too much business logic, such as stored procedures, or a highly overnormalized table structure, can ultimately cause overloading of a database, slowing performance because the database is just a little too top-heavy.
• Database configuration: An OLTP database accesses a few rows at a time. It often uses indexes, depending on table size, and will pass very small amounts of data across network and telephone cables. So, an OLTP database can be specifically configured to use lots of memory (things like caching on client computers and middle-tier servers, such as web and application servers), plus very little I/O. A data warehouse, on the other hand, produces a small number of very large transactions, with low memory usage, enormous amounts of I/O, and lots of throughput processing. So, a data warehouse doesn't care too much about memory but wants the fastest access to disk possible, plus lots of localized (LAN) network bandwidth. An OLTP database uses all hardware resources, whereas a data warehouse uses mainly I/O.
• Hardware resource usage: This is really very similar to the point above under database configuration, except that hardware can be improved upon. In some circumstances beefing up hardware will solve performance issues. For example, an OLTP database server needs plenty of memory, whereas a data warehouse does well with fast disks, and perhaps multiple CPUs with partitioning for rapid parallel processing. Beefing up hardware doesn't always help. Sometimes increasing CPU speed and number, or increasing onboard memory, can only hide performance problems until a database grows in physical size, or there are more users; the problem still exists. For example, poor query coding and indexing in an OLTP database will always cause performance problems, no matter how much money is spent on hardware. Sometimes hardware solutions are easier and cheaper, but often only a stopgap solution.
• Network design and configuration: Network bandwidth and bottlenecks can cause problems sometimes, but this is something rarely seen in commercial environments because the network engineers are usually prepared for potential bandwidth requirements.
The above categories are most often the culprits of the biggest performance issues. There are other possibilities, but they are rare and don't really warrant mentioning at this point. Additionally, the most frequent and exacerbating causes of performance problems are usually the most obvious ones, and more often than not something to do with the people maintaining and using the software, inadequate software, or inadequate hardware. Hardware is usually the easiest problem to fix. Fixing software is more expensive, depending on the location of errors in database or application software. Persuading users to use your applications and database the way you want is either a matter of expensive training, or of developers having built software without enough of a human-use (user-friendly) perspective in mind.
1.8 Start using the GUI tools

Traditionally, many database administrators still utilize command-line tools because they perceive them as being more grassroots and thus easier to use. Sometimes these administrators are correct. I am as guilty of this as anyone else. However, as in any profession, new gadgets are often frowned upon due to simple resistance to change and a desire to deal with tools and methods which are familiar. The new GUI tools appearing in many relational databases these days are just too good to miss.
1.8.1 SQL Server Management Studio
The SQL Server Management Studio is a new tool used to manage all the facets of an SQL Server, including multiple databases, tables, indexes, fields, and data types, anything you can think of. Figure 1.1 shows a sample view of the SQL Server Management Studio tool in SQL Server 2005.

SQL Server Management Studio is a fully integrated, multi-task oriented screen (console) that can be used to manage all aspects of an SQL Server installation, including direct access to metadata and business logic, integration, analysis, reports, notification, scheduling, and XML, among other facets of SQL Server architecture. Additionally, queries and scripting can be constructed, tested, and executed. Scripting also includes versioning control (multiple historical versions of the same piece of code allow for backtracking). It can also be used for very easy general database maintenance.
SQL Server Management Studio is in reality wholly constructed using something called SQL Management Objects (SMO). SMO is essentially a very large group of predefined objects, built-in and reusable, which can be used to access all functionality of an SQL Server database. SMO is written using the object-oriented and highly versatile .NET Framework. Database administrators and programmers can use SMO objects to create their own customized procedures, for instance, to automate something like daily backup processing.

[Figure 1.1: SQL Server Management Studio]
SMO is an SQL Server 2005 updated and more reliable version of Distributed Management Objects (DMO), as seen in versions of SQL Server prior to SQL Server 2005.

1.8.2 SQL Server Configuration Manager
The SQL Server 2005 Configuration Manager tool allows access to the operating system level. This includes services such as configuration for client application access to an SQL Server database, as well as access to database server services running on a Windows server. This is all shown in Figure 1.2.

[Figure 1.2: SQL Server Configuration Manager]
1.8.3 Database Engine Tuning Advisor
The SQL Server 2005 Database Engine Tuning Advisor tool is just that: a tuning advisor used to assess options for tuning the performance of an SQL Server database. The tool includes both a graphical user interface in Windows and a command-line utility called dta.exe.
This book will focus on the GUI tools, as they are becoming more prominent in recent versions of all relational databases.

The SQL Server 2005 Database Engine Tuning Advisor includes tools from SQL Server 2000, such as the Index Tuning Wizard. However, SQL Server 2005 is very much enhanced, catering to more scenarios and producing more sensible recommendations. In the past, recommendations have been basic at best, and sometimes even wildly incorrect. Also now included are more object types, differentiating between clustered and non-clustered indexing, plus indexing for views, and of course partitioning and parallel processing.
The Database Engine Tuning Advisor is backwardly compatible with previous versions of SQL Server.
New features provided by the SQL Server 2005 Database Engine Tuning Advisor tool are as follows:

• Multiple databases: Multiple databases can be accessed at the same time.
• More object types: As already stated, more object types can be tuned. This includes XML, XML data types, and partitioning recommendations. There is also more versatility in choosing what to tune and what to recommend for tuning. Figure 1.3 shows available options for differing object types allowed to be subjected to analysis. And there are also some advanced tuning options, as shown in Figure 1.4.
• Time period workload analysis: Workloads can be analyzed over set time periods, thus isolating peak times, off-peak times, and so on. Figure 1.5 shows analysis, allowance of time period settings, as well as application and evaluation of recommendations made by the tool.
• Tuning log entries: A log file contains a record of events which the Database Engine Tuning Advisor cannot tune automatically. This log can be used by a database administrator to attempt manual tuning if appropriate.
• Negligible-size test database copy: The Database Engine Tuning Advisor can create a duplicate test copy of a production environment, in order to offload performance tuning test processing. Most importantly, the test database created does not copy data. The only thing copied is the state of a database without the actual data. This is actually very easy for a relational database like SQL Server. All that is copied are objects, such as tables and indexes, plus statistics of those objects. Typical table statistics include record counts and physical size. This allows a process such as the optimizer to accurately estimate how to execute a query.
• What-if scenarios: A database administrator can create a configuration and scenario and subject it to the Database Engine Tuning Advisor. The advisory tool can give a response as to the possible effects of the specified configuration changes. In other words, you can experiment with changes, and get an estimation of their impact, without making those changes in a production environment.
1.8.4 SQL Server Profiler
The SQL Server Profiler tool was available in SQL Server 2000 but has some improvements in SQL Server 2005. The improvements apply to the recording of events which have happened in the database, and to the ability to replay those recordings. The replay feature allows repetition of problematic scenarios which are difficult to resolve.
Essentially, the SQL Server Profiler is a direct window into trace files. Traces for any relational database contain a record of some, most, or even all activities in a database. Trace files can include general table and indexing statistics as well. The performance issue with trace files themselves is that tracing can be relatively intensive, depending on how tracing is configured. Sometimes too much tracing can affect overall database performance, and sometimes even quite drastically. Tracing is usually a last resort, but also a very powerful option when it comes to tracking down the reason for performance problems and bottlenecks.
per-There are a number of things new to SQL Server 2005 for SQL ServerProfiler:
Trace file replays: Rollover trace files can be replayed Figure 1.6
shows various options that can be set for tracing, rollover, and quent tracing entry replay
subse- XML: The profiler tool has more flexibility by allowing for various
definitions using XML
Query plans in XML: Query plans can be stored as XML allowing
for viewing without database access
Trace entries as XML: Trace file entries can be stored as XML
allow-ing for viewallow-ing without database access
Figure 1.6
SQL Server Profiler
options
Trang 3114 1.8 Start using the GUI tools
Analysis Services: SQL Server Profiler now allows for tracing of
Analysis Services (SQL Server data warehousing) and Integration vices
Ser- Various other things: Aggregate views of trace results and
Perfor-mance Monitor counters matched with SQL Server database events.The Windows Performance Monitor tool is shown in Figure 1.7
1.8.5 Business Intelligence Development Studio
This tool is used to build something called Business Intelligence (BI) objects. The BI Development Studio is a new SQL Server 2005 tool used to manage projects for development. This tool allows for integration of various aspects of SQL Server databases, including analysis, integration, and reporting services. This tool doesn't really do much for database performance in general, but it can help to speed up development, and make development a cleaner and better-coded process. In the long term, better-built applications perform better.
1.9 Availability and scalability
Availability means that an SQL Server database will have less down time and will be less likely to irritate your source of income: your customers. Scalability means you can now service more customers with SQL Server 2005. Availability and scalability are improved in SQL Server 2005 by the addition and enhancement of the following:
• Data mirroring: Additional hot standby databases. This is called database mirroring in SQL Server.
• Clustering: Clustering is introduced, which is not the same thing as a hot standby. A hot standby takes over from a primary database in the event that the primary database fails. Standby is purely failover capability. A clustered environment provides more capacity and uptime by allowing connections and requests to be serviced by more than one computer in a cluster of computers. Many computers work in a cluster, in concert with each other. A cluster of multiple SQL Server databases effectively becomes a single database spread across multiple locally located computers: just a much larger and more powerful database. Essentially, clustering provides higher capacity, and speed through mirrored and parallel database access, in addition to just failover potential. When failover occurs in a clustered environment, the failed node simply no longer services the needs of the entire cluster, whereas a hot standby is a switch from one server to another.
• Replication: Replication is enhanced in SQL Server 2005. Workload can be spread across multiple, distributed, replicated databases. Also, the addition of a graphical Replication Monitor tool eases management of replication and distributed databases.
• Snapshot flashbacks: A snapshot is a picture of a database, frozen at a specific point in time. This allows users to go back in time and look at data as it was at that point in the past. The performance benefit is that the availability of old data sets, in static databases, allows queries to be executed against multiple sets of the same data (see the sketch after this list).
• Backup and restore: Improved restoration using snapshots to enhance restore after crash recovery, giving partial, general online access during the recovery process.
• Bulk imports: This is improved in SQL Server 2005.
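As a minimal sketch of a database snapshot (the snapshot and file names are illustrative; NAME must match the logical name of the source database's data file):

    CREATE DATABASE BankingDB_snap_20070101
    ON
    (
        NAME = BankingDB_data,  -- logical name of the source data file
        FILENAME = 'C:\snapshots\BankingDB_20070101.ss'
    )
    AS SNAPSHOT OF BankingDB;

Reports can then run against the frozen snapshot while OLTP activity continues against BankingDB, and RESTORE DATABASE ... FROM DATABASE_SNAPSHOT can revert the source database to that point in time.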
1.10 Other useful stuff
Other useful stuff introduced in SQL Server 2005 offers the potential of improving the speed of development and software quality. Improvements include the following:
• Native XML data types: Native XML allows the storage of XML documents in their entirety inside an SQL Server database. The term native implies that a stored XML document is not only stored as the textual data of the XML document, but also includes the XML structure and metadata meaning. In other words, XML data types are directly accessible from the database as fully executable XML documents. The result is the inclusion of all the power, versatility, and performance of XML in general. All of it! Essentially, XML data types allow direct access to XQuery, SOAP, XML data manipulation languages, XSD: anything and everything XML. Also included are specifics of XML exclusive to SQL Server [1]. XML is the eXtensible Markup Language. There is a whole host of stuff added to SQL Server 2005 with the introduction of XML data types. You can even create specific XML indexes, indexing stored XML data types and XML documents (see the sketch after this list).
• Service broker notification: This helps performance enormously because multiple applications are often tied together in eCommerce architectures. The Service Broker part essentially organizes messages between different things. The notification part knows what to send, and where. For example, a user purchases a book online at the Amazon website. What happens? Transactions are placed into multiple different types of databases:
  • stock inventory databases
  • shipping detail databases
  • payment processing, such as credit cards or a provider like PayPal
  • data warehouse archives
  • accounting databases
The different targets for data messages are really dependent on the size of the online operation. The larger the retailer, the more distributed their architecture becomes. This stuff is just too big to manage all in one place.
• New data modeling techniques: A Unified Dimensional Model (UDM) used for OLAP and analysis in data warehouse environments.
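As the sketch promised under native XML data types (the table and element names are illustrative assumptions), an xml column, an XML index, and an XQuery retrieval look like this:

    CREATE TABLE PurchaseOrders
    (
        order_id  INT NOT NULL PRIMARY KEY,
        order_doc XML NOT NULL  -- native XML data type
    );

    -- XML indexing starts with a primary XML index on the column
    CREATE PRIMARY XML INDEX ixx_order_doc ON PurchaseOrders (order_doc);

    -- XQuery against the stored document via the value() method
    SELECT order_doc.value('(/order/customer/@id)[1]', 'int') AS customer_id
    FROM PurchaseOrders;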
[1] Gavin Powell, Beginning XML Databases, Wiley, November 2006, ISBN 0471791202.
2 Logical Database Design for Performance
In database parlance, logical design is the human-perceivable organization of the slots into which data is put. These are the tables, the fields, the data types, and the relationships between tables. Physical design is the underlying file structure, within the operating system, out of which a database is built as a whole. This chapter covers logical database design. Physical design is covered in the next chapter.

This book is about performance. Some knowledge of logical relational database design, plus general underlying operating system and file system structure and functioning, is assumed. Let's begin with logical database design, with database performance foremost in mind.
2.1 Introducing logical database design for performance

Logical database design for relational databases can be divided into a number of distinct areas:

• Normalization: A sequence of steps by which a relational database model is both created and improved upon. The steps involved in the normalization process are called normal forms. Each normal form improves on the previous one. The objective is to remove redundancy and duplication, reduce the chance of inconsistencies in data, and increase the precision of relationships between different data sets within a database.
• Denormalization: Does the opposite of normalization by undoing normal forms. It thus reintroduces some redundancy and duplication, increases the potential for inconsistencies in data, and so on. Denormalization is typically implemented to help increase performance. The extreme in denormalization helps to create the specialized data models found in data warehouses.
• Object design: The advent of object databases was first expected to introduce a new competitor to relational database technology. This did not happen. What did happen was that relational databases absorbed some aspects of object modeling, in many cases helping to enhance the relational database model into what is now known as an object-relational database model.
Objects stored in object-relational databases typically do not enhance relational database performance, but rather enhance functionality and versatility.
To find out more about normalization [1] and object design [2] you will have to read other books, as these topics are both very comprehensive all by themselves. There simply isn't enough space in this book. This book deals with performance tuning. So, let's begin with the topic of normalization, and how it can be both used (or not used) to help enhance the performance of a relational database in general.
So, how do we go about tuning a relational database model? What does normalization have to do with tuning? There are a few simple guidelines to follow and some things to watch out for:

• Normalization optimizes data modification at the possible expense of data retrieval. Denormalization is just the opposite, optimizing data retrieval at the expense of data modification.
• Too little normalization can lead to too much duplication. The result could be a database that is bigger than it should be, resulting in more disk I/O. Then again, disk space is cheap compared with processor and memory power.
• Incorrect normalization is often made obvious by convoluted and complex application code.
• Too much normalization leads to overcomplex SQL code which can be difficult, if not impossible, to tune. Be very careful implementing beyond 3rd normal form in a commercial environment.
• Too many tables results in bigger joins, which makes for slower queries.
• Quite often databases are designed without foreknowledge of applications. The data model could be built on a purely theoretical basis. Later in the development cycle, applications may have difficulty mating to a highly granular (highly normalized) data model. One possible answer is that both development and administration people should be involved in data modeling. Busy commercial development projects rarely have spare time to ensure that absolutely everything is taken into account. It's just too expensive. It should be acceptable to alter the data model, at least during the development process, possibly substantially. Most of the problems with relational database model tuning are normalization related.
• Normalization should be simple because it is simple! Don't overcomplicate it. Normalization is somewhat based on mathematical set theory, which is very simple mathematics.
• Watch out for excessive use of outer joins in SQL code. This could mean that your data model is too granular. You could have overused the higher normal forms. Higher normal forms are rarely needed in the name of efficiency, but rather preferred in the name of perfection, and possibly overapplication of business rules into database definitions. Sometimes excessive use of outer joins might be akin to: Go and get this. Oh! Go and get that too, because this doesn't quite cover it.
The other side of normalization is of course denormalization. In many cases, denormalization is the undoing of normalization. Normalization is performed by the application of normal form transformations. In other cases, denormalization is performed through the application of numerous specialized tricks. Let's begin with some basic rules for normalization in a modern commercial environment.

2.2 Commercial normalization techniques
The terms modern and commercial imply a lack of traditional normalization. This also means that techniques used in a commercial environment are likely to bear only a vague resemblance to what you were taught about normalization in college or university. In busy commercial environments, relational database models tend to contradict the mathematical purity of a highly normalized table structure. Purity is often sacrificed for the sake of performance, particularly with respect to queries. This is often because commercial implementations tend to do things that an academic would never dream of, such as mixing the small transactions of an OLTP database with the large transactions of a data warehouse. Some academics may not think too highly of a data warehouse dimensional model, which is essentially denormalized up the gazoo! Each approach has its role to play in the constant dance of trying to get things right and turn a profit.
2.2.1 Referential integrity
How referential integrity and tuning are related is twofold:

• Implement referential integrity? Yes. Too many problems and issues can arise if you don't.
• How to implement referential integrity? Use built-in database constraints if possible. Do not use triggers or application coding. Triggers can especially hurt performance. Application coding can cause much duplication when distributed. Triggers are event driven, and thus by their very definition cannot contain transaction termination commands (COMMIT and ROLLBACK commands). The result is that their overuse can produce a huge mess, with no transaction termination commands.
Some things to remember:

• Always index foreign keys: This helps to avoid locking contention on small tables when referential integrity is validated. Why? Without indexes on foreign key fields, every referential integrity check against a foreign key must read the entire table, having no index to read instead. Unfavorable results can be hot blocking on small tables and too much I/O activity for large tables (see the sketch after this list).

Note: A hot block is a section of physical disk or memory with excessive activity: more than the software or hardware can handle.

• Avoid generic tables: A table within a table. In some older database models, a single centralized table was used to store system information; for example, sequence numbers for surrogate keys, or system codes. This is a very bad idea. Hot blocking on tables like this can completely kill performance in even a low-concurrency multi-user environment.
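A minimal sketch of an indexed foreign key, using the company/division relationship from Figure 2.1 (the column definitions are illustrative):

    CREATE TABLE Company
    (
        company_id   INT NOT NULL PRIMARY KEY,
        company_name NVARCHAR(100) NOT NULL
    );

    CREATE TABLE Division
    (
        division_id INT NOT NULL PRIMARY KEY,
        company_id  INT NOT NULL,
        CONSTRAINT fk_division_company
            FOREIGN KEY (company_id) REFERENCES Company (company_id)
    );

    -- SQL Server does not index a foreign key automatically; create one explicitly
    CREATE NONCLUSTERED INDEX ix_division_company ON Division (company_id);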
2.2.2 Primary and foreign keys
Using surrogates for primary and foreign keys can help to improve performance. A surrogate key is a field added to a table, usually an integer sequence counter, giving a unique value for a record in a table. It is also totally unrelated to the content of the record, other than providing uniqueness for that record.

Primary and foreign keys can use natural values: effectively, names or codes for values. For example, in Figure 2.1, primary keys and foreign keys are created on the names of companies, divisions, departments, and employees. These values are easily identifiable to the human eye but are lengthy and complex string values as far as a computer is concerned. People do not check referential integrity of records; the relational database model is supposed to do that.

The data model in Figure 2.1 could use coded values for names, making values shorter. However, years ago, coded values were often used for names in order to facilitate easier typing and selection of static values, not for the purpose of efficiency in a data model. For example, it is much easier to type USA (or select it from a pick list) rather than United States of America when typing in a client address.
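A minimal sketch of a surrogate key, revisiting the Company table from the previous sketch, uses the IDENTITY property (the names remain illustrative): the short integer key carries the relationships, while the natural name is kept as a unique alternate key:

    CREATE TABLE Company
    (
        company_id   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key
        company_name NVARCHAR(100) NOT NULL UNIQUE            -- natural value
    );

Joins and referential integrity checks then compare short integers rather than lengthy strings.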
The primary and foreign keys, denoted in Figure 2.1 as PK and FK respectively, are what apply referential integrity. In Figure 2.1, in order for a division to exist it must be part of a company. Additionally, a company cannot be removed if it has an existing division. What referential integrity does is verify that these conditions exist whenever changes are attempted to any of these tables. If a violation occurs, an error will be returned. It is also pos…

[Figure 2.1: Natural value keys]