
Microsoft® SQL Server™ 2005 Performance Optimization and Tuning Handbook


DOCUMENT INFORMATION

Title: Microsoft® SQL Server™ 2005 Performance Optimization and Tuning Handbook
Authors: Ken England, Gavin Powell
Publisher: Digital Press, Elsevier
Subject: Performance optimization and tuning of Microsoft® SQL Server™ 2005
Type: Handbook
Year of publication: 2007
City: Amsterdam
Pages: 517
File size: 5.78 MB



Amsterdam • Boston • Heidelberg • London • New York • Oxford
Paris • San Diego • San Francisco • Singapore • Sydney • Tokyo

Digital Press is an imprint of Elsevier

30 Corporate Drive, Suite 400, Burlington, MA 01803, USA

Linacre House, Jordan Hill, Oxford OX2 8DP, UK

Copyright © 2007, Elsevier Inc. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, E-mail: permissions@elsevier.com. You may also complete your request online via the Elsevier homepage (http://elsevier.com), by selecting “Support & Contact,” then “Copyright and Permission,” and then “Obtaining Permissions.”

Recognizing the importance of preserving what has been written, Elsevier prints its books on acid-free paper whenever possible.

Library of Congress Cataloging-in-Publication Data

Application Submitted

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

Contents at a Glance

Introduction
11 Architectural Performance Options and Choices

Contents

Introduction

1.1 Partitioning tables and indexes
1.2 Building indexes online
1.3 Transact SQL improvements
1.4 Adding the .NET Framework
1.5 Trace and replay objects
1.6 Monitoring resource consumption with SQL OS
1.7 Establishing baseline metrics
1.8 Start using the GUI tools
1.8.1 SQL Server Management Studio
1.8.2 SQL Server Configuration Manager
1.8.3 Database Engine Tuning Advisor
1.8.4 SQL Server Profiler
1.8.5 Business Intelligence Development Studio
1.9 Availability and scalability

2.1 Introducing logical database design for performance
2.2 Commercial normalization techniques
2.2.1 Referential integrity
2.2.2 Primary and foreign keys
2.2.3 Business rules in a relational database model
2.2.4 Alternate indexes
2.3 Denormalization for performance
2.3.1 What is denormalization?
2.3.2 Denormalizing the already normalized
2.3.2.1 Multiple table joins (more than two tables)
2.3.2.2 Multiple table joins finding a few fields
2.3.2.3 The presence of composite keys
2.3.2.4 One-to-one relationships
2.3.2.5 Denormalize static tables
2.3.2.6 Reconstructing collection lists
2.3.2.7 Removing tables with common fields
2.3.2.8 Reincorporating transitive dependencies
2.3.3 Denormalizing by context
2.3.3.1 Copies of single fields across tables
2.3.3.2 Summary fields in parent tables
2.3.3.3 Separating data by activity and application requirements
2.3.3.4 Local application caching
2.3.4 Denormalizing and special purpose objects
2.4 Extreme denormalization in data warehouses
2.4.1 The dimensional data model
2.4.1.1 What is a star schema?
2.4.1.2 What is a snowflake schema?
2.4.2 Data warehouse data model design basics
2.4.2.1 Dimension tables
2.4.2.3 Other factors to consider during design

3.1 Introducing physical database design
3.2 Data volume analysis
3.3 Transaction analysis
3.4 Hardware environment considerations

4.3 Increasing the size of a database
4.4 Decreasing the size of a database
4.4.1 The autoshrink database option
4.4.2 Shrinking a database in the SQL Server …
4.4.3 Shrinking a database using DBCC statements
4.5 Modifying filegroup properties
4.6 Setting database options
4.7 Displaying information about databases
4.8 System tables used in database configuration
4.11 Looking into database pages
4.12 Pages for space management
4.13 Partitioning tables into physical chunks
4.13.1 Types of partitions
4.13.2 Creating a range partition
4.13.3 Creating an even distribution partition
4.14 The BankingDB database

… Framework (SQL-DMF)
5.9 Dropping and renaming indexes
5.10 Displaying information about indexes
5.10.1 The system stored procedure sp_helpindex
5.10.2 The system table sysindexes
5.10.3 Using metadata functions to obtain information …
5.10.4 The DBCC statement DBCC SHOWCONTIG
5.11 Creating indexes on views
5.12 Creating indexes with computed columns
5.13 Using indexes to retrieve data
5.13.1 Retrieving a single row
5.13.2 Retrieving a range of rows
5.13.3 Covered queries
5.13.4 Retrieving a single row with a clustered index on …

6.1 The SELECT statement
6.1.1 Filtering with the WHERE clause
6.1.2 Sorting with the ORDER BY clause
6.1.2.1 Overriding WHERE with ORDER BY
6.1.3 Grouping result sets
6.1.3.1 Sorting with the GROUP BY clause
6.1.3.2 Using DISTINCT
6.1.3.3 The HAVING clause
6.2.1 Data type conversions
6.3 Comparison conditions
6.3.1 Equi, anti, and range
6.3.2 LIKE pattern matching
6.4.3 How to tune a join
6.5 Using subqueries for efficiency
6.5.1 Correlated versus non-correlated subqueries
6.5.2 IN versus EXISTS
6.5.3 Nested subqueries
6.5.4 Advanced subquery joins
6.6 Specialized metadata objects
6.7 Procedures in Transact SQL

7.1 When is a query optimized?
7.2 The steps in query optimization
7.4.7.5 Multiple non-clustered indexes present
7.5 Join order selection
7.6 How joins are processed
7.6.1 Nested loops joins

8.1 Text-based query plans and statistics
8.1.1 SET SHOWPLAN_TEXT { ON | OFF }
8.1.2 SET SHOWPLAN_ALL { ON | OFF }
8.1.3 SET SHOWPLAN_XML { ON | OFF }
8.1.4 SET STATISTICS PROFILE { ON | OFF }
8.1.5 SET STATISTICS IO { ON | OFF }
8.1.6 SET STATISTICS TIME { ON | OFF }
8.1.7 SET STATISTICS XML { ON | OFF }
8.2 Query plans in Management Studio
8.2.1 Statistics and cost-based optimization
8.3 Hinting to the optimizer
… manipulation language statements
8.4.2 Temporary tables
8.4.3 Forcing recompilation
8.4.4 Aging stored procedures from cache
8.5 Non-stored procedure plans
8.6 The syscacheobjects system table

9.1 SQL Server and CPU
9.1.1 An overview of Windows and CPU utilization
9.1.2 How SQL Server uses CPU
9.1.2.2 Use of symmetric multiprocessing systems
9.1.2.4 Query parallelism
9.1.3 Investigating CPU bottlenecks
9.1.4 Solving problems with CPU
9.2 SQL Server and memory
9.2.1 An overview of Windows virtual memory management
9.2.2 How SQL Server uses memory
9.2.2.1 Configuring memory for SQL Server
9.2.3 Investigating memory bottlenecks
9.2.4 Solving problems with memory
9.3 SQL Server and disk I/O
9.3.1 An overview of Windows and disk I/O
9.3.2 How SQL Server uses disk I/O
9.3.2.1 An overview of the data cache
9.3.2.2 Keeping tables and indexes in cache
9.3.2.3 Read-ahead scans
9.3.2.4 Shrinking database files
9.3.3 Investigating disk I/O bottlenecks
9.3.4 Solving problems with disk I/O

10.1 Why a locking protocol?
10.2 The SQL Server locking protocol
10.2.1 Shared and exclusive locks
10.2.2 Row-, page-, and table-level locking
10.2.2.1 When are row-level locks used?
10.2.2.2 When are table-level locks used?
10.2.3 Lock timeouts
10.4.4 More modified locking behavior

12.1 System stored procedures
12.2 System monitor, performance logs, and alerts
12.3 SQL Server 2005 Management Studio
12.3.1 Client statistics
12.3.2 The SQL Server Profiler
12.3.2.1 What events can be traced?
12.3.2.2 What information is collected?
12.3.2.3 Filtering information
12.3.2.4 Creating an SQL Server profiler trace
12.3.2.5 Creating traces with stored procedures
12.3.3 Database Engine Tuning Advisor
12.4 SQL OS and resource consumption

Introduction

What is the goal of tuning an SQL Server database? The goal is to improve performance until acceptable levels are reached. Acceptable levels can be defined in a number of ways. For a large online transaction processing (OLTP) application the performance goal might be to provide sub-second response time for critical transactions and to provide a response time of less than two seconds for 95 percent of the other main transactions. For some systems, typically batch systems, acceptable performance might be measured in throughput. For example, a settlement system may define acceptable performance in terms of the number of trades settled per hour. For an overnight batch suite acceptable performance might be that it must finish before the business day starts.

Whatever the system, designing for performance should start early in the design process and continue after the application has gone live. Performance tuning is not a one-off process but an iterative process during which response time is measured, tuning performed, and response time measured again.

There is no right way to design a database; there are a number of possible approaches and all these may be perfectly valid. It is sometimes said that performance tuning is an art, not a science. This may be true, but it is important to undertake performance tuning experiments with the same kind of rigorous, controlled conditions under which scientific experiments are performed. Measurements should be taken before and after any modification, and these should be made one at a time so it can be established which modification, if any, resulted in an improvement or degradation.

What areas should the database designer concentrate on? The simple answer to this question is that the database designer should concentrate on those areas that will return the most benefit. In my experience, for most database designs I have worked with, large gains are typically made in the area of query and index design. As we shall see later in this book, inappropriate indexes and badly written queries, as well as some other contributing factors, can negatively influence the query optimizer such that it chooses an inefficient strategy.

To give you some idea of the gains to be made in this area, I once was asked to look at a query that joined a number of large tables together. The query was abandoned after it had not completed within 12 hours. The addition of an index in conjunction with a modification to the query meant the query now completed in less than eight minutes! This magnitude of gain cannot be achieved just by purchasing more hardware or by twiddling with some arcane SQL Server configuration option. A database designer or administrator’s time is always limited, so make the best use of it! The other main area where gains can be dramatic is lock contention. Removing lock bottlenecks in a system with a large number of users can have a huge impact on response times.

Now, some words of caution when chasing performance problems. If users phone up to tell you that they are getting poor response times, do not immediately jump to conclusions about what is causing the problem. Circle at a high altitude first. Having made sure that you are about to monitor the correct server, use the System Monitor to look at the CPU, disk subsystem, and memory use. Are there any obvious bottlenecks? If there are, then look for the culprit. Everyone blames the database, but it could just as easily be someone running his or her favorite game! If there are no obvious bottlenecks, and the CPU, disk, and memory counters in the System Monitor are lower than usual, then that might tell you something. Perhaps the network is sluggish or there is lock contention. Also be aware of the fact that some bottlenecks hide others. A memory bottleneck often manifests itself as a disk bottleneck.

There is no substitute for knowing your own server and knowing the normal range of System Monitor counters. Establish trends. Measure a set of counters regularly, and then, when someone comments that the system is slow, you can wave a graph in front of him or her showing that it isn’t!

Also, there are special thanks to be made to Craig Mullins for his work on technical editing of this book.

So, when do we start to worry about performance? As soon as possible, of course! We want to take the logical design and start to look at how we should transform it into an efficient physical design.

Gavin Powell can be contacted at the following email address:

ezpowell@ezpowell.com


Performance and SQL Server 2005

1.1 Partitioning tables and indexes

Partitioning lets you split large chunks of data into smaller, much more manageable physical chunks of disk space. The intention is to reduce I/O activity. For example, let’s say you have a table with 10 million rows and you only want to read 1 million rows to compile an analytical report. If the table is divided into 10 partitions, and your 1 million rows are contained in a single partition, then you get to read 1 million rows as opposed to 10 million rows. On that scale you can get quite a serious difference in I/O activity for a single report.

SQL Server 2005 allows for table partitioning and index partitioning. What this means is that you can create a table as a partitioned table, defining specifically where each physical chunk of the table or index resides. SQL Server 2000 partitioning was essentially manual partitioning, using multiple tables, distributed across multiple SQL Server computers. Then a view (partition view) was created to overlay those tables across the servers. In other words, a query required access to a view, which contained a query, not data. SQL Server 2005 table partitions contain real physical rows. Physically partitioning tables and indexes has a number of benefits (a syntax sketch follows this list):

• Data can be read from a single partition at once, cutting down enormously on performance-hogging I/O.

• Data can be accessed from multiple partitions in parallel, which gets things done at double the speed, depending on how many processors a server platform has.

• Different partitions can be managed separately, without having to interfere with the entire table.
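As a quick taste of the range partitioning described above (covered properly later; see sections 4.13.2 and 4.13.3 in the contents), here is a minimal sketch. The table, column, and boundary values are hypothetical, and a real deployment would normally map each partition to its own filegroup rather than everything to PRIMARY:

-- Hypothetical range partitioning of an orders table by order date
CREATE PARTITION FUNCTION pfOrderYear (datetime)
AS RANGE RIGHT FOR VALUES ('2003-01-01', '2004-01-01', '2005-01-01');

-- Map every partition to the PRIMARY filegroup (separate filegroups
-- per partition would spread the I/O across disks)
CREATE PARTITION SCHEME psOrderYear
AS PARTITION pfOrderYear ALL TO ([PRIMARY]);

-- Rows are routed to partitions by the value of order_date
CREATE TABLE orders
(
    order_id   int      NOT NULL,
    order_date datetime NOT NULL
)
ON psOrderYear (order_date);

A query that filters on order_date can then touch only the partitions holding the requested date range, which is exactly the 1-million-versus-10-million-row saving described above.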


1.2 Building indexes online

Building an index online allows the table indexed against to be accessed during the index creation process. Creating or regenerating an index for a very large table can consume a considerable period of time (hours, days). Without online index building, creating an index puts a table offline. If that table is crucial to the running of a computer system, then you have down time. The result was usually that indexes were not created, or never regenerated. Even the most versatile BTree indexes can sometimes require rebuilding to increase their performance. Constant data manipulation activity on a table (record inserts, updates, and deletions) can cause a BTree index to deteriorate over time. Online index building is crucial to the constant uptime required by modern databases for popular websites.
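A minimal sketch of the syntax (the index and table names are hypothetical; online index operations are an Enterprise Edition feature in SQL Server 2005):

-- Build a new index while the table stays available for reads and writes
CREATE NONCLUSTERED INDEX ix_orders_customer
ON orders (customer_id)
WITH (ONLINE = ON);

-- An existing, deteriorated index can also be rebuilt online
ALTER INDEX ix_orders_customer ON orders
REBUILD WITH (ONLINE = ON);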

1.3 Transact SQL improvements

Transact SQL provides programmable access to SQL Server. Programmable access means that Transact SQL allows you to construct database-stored code blocks, such as stored procedures, triggers, and functions. These code blocks have direct access to other database objects—most significantly tables, where query and data manipulation commands can be executed directly in the stored code blocks; and code blocks are executed on the database server. New capabilities added to Transact SQL in SQL Server 2005 are as follows:

• Error handling

• Recursive queries

• Better query writing capabilities

There is also something new to SQL Server 2005 called Multiple Active Result Sets (MARS). MARS allows for more than a single set of rows for a single connection. In other words, a second query can be submitted to a SQL Server while the result set of a first query is still being returned from database server to client application.

The overall result of Transact SQL enhancements to SQL Server 2005 is increased performance of code, better written code, and more versatility. Better written code can ultimately make for better performing applications in general.
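The sketch below shows two of these new capabilities—TRY...CATCH error handling and a recursive query written as a common table expression. The tables and columns are hypothetical:

-- Structured error handling, new in SQL Server 2005
BEGIN TRY
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER() AS error_number,
           ERROR_MESSAGE() AS error_message;
END CATCH;

-- A recursive common table expression walking an employee hierarchy
WITH org (employee_id, manager_id, depth) AS
(
    SELECT employee_id, manager_id, 0
    FROM employees WHERE manager_id IS NULL
    UNION ALL
    SELECT e.employee_id, e.manager_id, org.depth + 1
    FROM employees e JOIN org ON e.manager_id = org.employee_id
)
SELECT employee_id, manager_id, depth FROM org;

MARS itself is enabled from the client side; in ADO.NET, for example, by adding MultipleActiveResultSets=True to the connection string.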

1.4 Adding the .NET Framework

You can use programming languages other than just Transact SQL and embed code into SQL Server as .NET Framework executables. These programming languages can leverage existing personnel skills. Perhaps more importantly, some tasks can be written in programming languages more appropriate to the task at hand. For example, a language like C# can be used, letting a programmer take advantage of the enormous speed advantages of writing executable code using the C programming language.

Overall, you get support for languages not inherently part of SQL Server (Transact SQL). You get faster and easier development. You get to use Web Services and XML (with native XML capabilities using XML data types). The result is faster development, better development, and hopefully better overall database performance in the long run.

The result you get is something called managed code. Managed code is code executed by the .NET Framework. As already stated, managed code can be written using all sorts of programming languages. Different programming languages have different benefits. For example, C is fast and efficient, where Visual Basic is easier to write code with but executes slower. Additionally, the .NET Framework has tremendous built-in functionality. .NET is much, much more versatile and powerful than Transact SQL.

There is much to be said for placing executable code into a database, on a database server such as SQL Server. There is also much to be said against this practice. Essentially, the more metadata and logic you add to a database, the more business logic you add to a database. In my experience, adding too much business logic to a database can cause performance problems in the long run. After all, application development languages cater to number crunching and other tasks. Why put intensive, non-data access processing into a database? The database system has enough to do just in keeping your data up to date and available.

Managed code also compiles to native code, or native form in SQL Server, immediately prior to execution. So, it should execute a little faster because it executes in a form which is amenable to best performance in SQL Server.
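On the T-SQL side, hooking a compiled .NET assembly into the database looks roughly like the following sketch; the assembly path, class, and method names are hypothetical:

-- CLR integration is off by default in SQL Server 2005
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;

-- Register the compiled assembly inside the database
CREATE ASSEMBLY MathUtils FROM 'C:\assemblies\MathUtils.dll';

-- Expose a static method of the assembly as an ordinary T-SQL function
CREATE FUNCTION dbo.fn_complex_calc (@x float)
RETURNS float
AS EXTERNAL NAME MathUtils.[MathUtils.Calculator].ComplexCalc;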

SQL Server 2005 includes a new management object model called SQL Management Objects (SMO). The SMO has a basis in the .NET Framework. The new graphical SQL Server Management Studio is written using the SMO.

1.5 Trace and replay objects

Tracing is the process of producing large amounts of log entry information during the process of normal database operations. However, it might be prudent to not choose tracing as a first option for solving a performance issue. Tracing can hurt performance simply because it generates lots of data. The point of producing trace files is to aid in finding errors or performance bottlenecks which cannot be deciphered by more readily available means.

So, tracing quite literally produces trace information. Replay allows replay of actions that generated those trace events. So, you could replay a sequence of events against a SQL Server, without actually changing any data, and reproduce the unpleasant performance problem. And then you could try to reanalyze the problem, try to decipher it, and try to resolve or improve it.
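Server-side traces of this kind are driven by a small set of system stored procedures (covered later; see section 12.3.2.5 in the contents). The sketch below is bare-bones, with the file path and event choice purely illustrative:

DECLARE @trace_id int;

-- Create a trace writing to a file (options = 0: no rollover options)
EXEC sp_trace_create @trace_id OUTPUT, 0, N'C:\traces\perf_trace';

-- Capture the TextData column (1) of SQL:BatchCompleted events (12)
DECLARE @on bit;
SET @on = 1;
EXEC sp_trace_setevent @trace_id, 12, 1, @on;

-- Start the trace (status 1 = start, 0 = stop, 2 = close)
EXEC sp_trace_setstatus @trace_id, 1;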

1.6 Monitoring resource consumption with SQL OS

SQL OS is a new tool for SQL Server 2005, which lives between an SQL Server database and the underlying Windows operating system (OS). The operating system manages, runs, and accesses computer hardware on your database server, such as CPU, memory, disk I/O, and even tasks and scheduling. SQL OS allows a direct picture into the hardware side of SQL Server and how the database is perhaps abusing that hardware and operating system. The idea is to view the hardware and the operating system from within an SQL Server 2005 database.
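That picture is exposed through a family of dynamic management views prefixed sys.dm_os_; two quick examples, offered as a sketch of the idea:

-- Scheduler pressure: how many tasks are waiting per CPU scheduler
SELECT scheduler_id, current_tasks_count, runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255;   -- exclude internal/hidden schedulers

-- Where the server has spent its time waiting
SELECT TOP 10 wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;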

1.7 Establishing baseline metrics

A baseline is a setting established by a database administrator, perhaps written on paper, but preferably stored in a database (generated by the database). This baseline establishes an acceptable standard of performance. If a baseline is exceeded then the database is deemed to have a performance problem. A metric is essentially a measure of something. The result is many metrics, with established acceptable baseline values. If one or more metric baselines are exceeded then there is deemed to be one or more performance problems. Additionally, each metric can be exceeded for a previously established reason, based on what the metric is. So, if a table, with its indexes, has an established baseline value of 10 bytes per minute of I/O activity, and suddenly that value jumps up to 10 gigabytes per minute—there is probably a performance problem.

An established baseline metric is a measure of normal or acceptable activity.

Metric baselines have more significance (there are more metrics) in SQL Server 2005 than in SQL Server 2000. The overall effect is that an SQL Server 2005 database is now more easily monitored, and the prospect of some automated tuning activities becomes more practical in the long term. SQL Server 2005 has added over 70 additional baseline measures applicable to performance of an SQL Server database. These new baseline metrics cover areas such as memory usage, locking activities, scheduling, network usage, transaction management, and disk I/O activity.

The obvious answer to a situation such as this is that a key index is dropped, corrupt, or deteriorated. Or a query could be doing something unexpected, such as reading all rows in a very large table.

Using metrics and their established baseline or expected values, one can perform a certain amount of automated monitoring and detection of performance problems.

Baseline metrics are essentially statistical values collected for a set of metrics. A metric is a measure of some activity in a database. The most effective method of gathering those expected metric values is to collect multiple values—and then aggregate and average them. And thus the term statistic applies, because a statistic is an aggregate or average value resulting from a sample of multiple values. So, when some activity veers away from previously established statistics, you know that there could be some kind of performance problem—the larger the variation, the larger the potential problem.
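A minimal sketch of the idea—periodically snapshot a few counters into a home-grown baseline table and average them later; the table design and counter selection are illustrative only:

-- A home-grown repository for baseline samples
CREATE TABLE dbo.baseline_metrics
(
    captured_at  datetime      NOT NULL DEFAULT GETDATE(),
    object_name  nvarchar(128) NOT NULL,
    counter_name nvarchar(128) NOT NULL,
    cntr_value   bigint        NOT NULL
);

-- Sample two commonly watched counters (run on a schedule)
INSERT INTO dbo.baseline_metrics (object_name, counter_name, cntr_value)
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Page life expectancy', 'Batch Requests/sec');

Averaging the stored samples per counter gives exactly the kind of statistic described above, against which current activity can be compared.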

Baseline metrics should be gathered in the following activity sectors:

• High load: Peak times (highest database activity)

• Low load: Off-peak times (lowest database activity)

• Batch activity: Batch processing time, such as during backup processing and heavy reporting or extraction cycles

• Downtime: How long it takes to backup, restore, and recover is something the executive management will always have to detail to clients. This equates to uptime and potential downtime.

Some very generalized category areas of metric baseline measurement are as follows:

• Applications database access: The most common performance problems are caused by poorly built queries and locking or hot blocks (conflict caused by too much concurrency on the same data). In computer jargon, concurrency means lots of users accessing and changing the same data all at the same time. If there are too many concurrent users, ultimately any relational database has its limitations on what it can manage efficiently.

• Internalized database activity: Statistics must not only be present but also kept up to date (a statistics-maintenance sketch follows this list). When a query reads a table, it uses what’s called an optimizer process to make a wild guess at what it should do. If a table has 1 million rows, plus an index, and a query seeks 1 record, the optimizer will tell the query to read the index. The optimizer uses statistics to compare 1 record required, within 1 million rows available. Without the optimizer, 1 million rows will be read to find 1 record. Without the statistics the optimizer cannot even hazard a guess and will probably read everything. If statistics are out of date, where the optimizer thinks the table has 2 rows but there are really 1 million, then the optimizer will likely guess very badly.

• Internalized database structure: Too much business logic, such as stored procedures or a highly overnormalized table structure, can ultimately cause overloading of a database, slowing performance because a database is just a little too top heavy.

• Database configuration: An OLTP database accesses a few rows at a time. It often uses indexes, depending on table size, and will pass very small amounts of data across network and telephone cables. So, an OLTP database can be specifically configured to use lots of memory—things like caching on client computers and middle-tier servers (web and application servers), plus very little I/O. A data warehouse on the other hand produces a small number of very large transactions, with low memory usage, enormous amounts of I/O, and lots of throughput processing. So, a data warehouse doesn’t care too much about memory but wants the fastest access to disk possible, plus lots of localized (LAN) network bandwidth. An OLTP database uses all hardware resources and a data warehouse uses mainly I/O.

• Hardware resource usage: This is really very similar to the above point under database configuration, except that hardware can be improved upon. In some circumstances beefing up hardware will solve performance issues. For example, an OLTP database server needs plenty of memory, whereas a data warehouse does well with fast disks, and perhaps multiple CPUs with partitioning for rapid parallel processing. Beefing up hardware doesn’t always help. Sometimes increasing CPU speed and number, or increasing onboard memory, can only hide performance problems until a database grows in physical size, or there are more users—the problem still exists. For example, poor query coding and indexing in an OLTP database will always cause performance problems, no matter how much money is spent on hardware. Sometimes hardware solutions are easier and cheaper, but often only a stopgap solution.

• Network design and configuration: Network bandwidth and bottlenecks can cause problems sometimes, but this is something rarely seen in commercial environments because the network engineers are usually prepared for potential bandwidth requirements.
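For the second category above, keeping statistics present and current is straightforward; a brief sketch (the table and database names are illustrative):

-- Refresh statistics for one table, or for the whole database
UPDATE STATISTICS dbo.orders;
EXEC sp_updatestats;

-- Or simply let SQL Server create and maintain statistics itself
ALTER DATABASE BankingDB SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE BankingDB SET AUTO_UPDATE_STATISTICS ON;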

The above categories are most often the culprits of the biggest performance issues. There are other possibilities, but they are rare and don’t really warrant mentioning at this point. Additionally, the most frequent and exacerbating causes of performance problems are usually the most obvious ones, and more often than not something to do with the people maintaining and using the software, inadequate software, or inadequate hardware. Hardware is usually the easiest problem to fix. Fixing software is more expensive, depending on location of errors in database or application software. Persuading users to use your applications and database the way you want is either a matter of expensive training, or developers having built software without enough of a human use (user friendly) perspective in mind.

1.8 Start using the GUI tools

Traditionally, many database administrators still utilize command-line tools because they perceive them as being more grassroots and, thus, easier to use. Sometimes these administrators are correct. I am as guilty of this as is anyone else. However, as in any profession, new gadgets are often frowned upon due to simple resistance to change and a desire to deal with tools and methods which are familiar. The new GUI tools appearing in many relational databases these days are just too good to miss.


1.8.1 SQL Server Management Studio

The SQL Server Management Studio is a new tool used to manage all the facets of an SQL Server, including multiple databases, tables, indexes, fields, and data types—anything you can think of. Figure 1.1 shows a sample view of the SQL Server Management Studio tool in SQL Server 2005.

Figure 1.1 SQL Server Management Studio

SQL Server Management Studio is a fully integrated, multi-task oriented screen (console) that can be used to manage all aspects of an SQL Server installation, including direct access to metadata and business logic, integration, analysis, reports, notification, scheduling, and XML, among other facets of SQL Server architecture. Additionally, queries and scripting can be constructed, tested, and executed. Scripting also includes versioning control (multiple historical versions of the same piece of code allow for backtracking). It can also be used for very easy general database maintenance.

SQL Server Management Studio is in reality wholly constructed using something called SQL Management Objects (SMO). SMO is essentially a very large group of predefined objects, built-in and reusable, which can be used to access all functionality of a SQL Server database. SMO is written using the object-oriented and highly versatile .NET Framework. Database administrators and programmers can use SMO objects in order to create their own customized procedures, for instance, to automate something like daily backup processing.

SMO is an SQL Server 2005 updated and more reliable version of Distributed Management Objects (DMO), as seen in versions of SQL Server prior to SQL Server 2005.

1.8.2 SQL Server Configuration Manager

The SQL Server 2005 Configuration Manager tool allows access to the operating system level. This includes services such as configuration for client application access to an SQL Server database, as well as access to database server services running on a Windows server. This is all shown in Figure 1.2.

Figure 1.2 SQL Server Configuration Manager

1.8.3 Database Engine Tuning Advisor

The SQL Server 2005 Database Engine Tuning Advisor tool is just that, a tuning advisor used to assess options for tuning the performance of an SQL Server database. This tool includes both a graphical user interface in Windows and a command-line tool called dta.exe.

This book will focus on the GUI tools as they are becoming more prominent in recent versions of all relational databases.

The SQL Server 2005 Database Engine Tuning Advisor includes other tools from SQL Server 2000, such as the Index Tuning Wizard. However, SQL Server 2005 is very much enhanced to cater to more scenarios and more sensible recommendations. In the past, recommendations have been basic at best, and even wildly incorrect. Also now included are more object types, including differentiating between clustered and non-clustered indexing, plus indexing for views, and of course partitioning and parallel processing.

The Database Engine Tuning Advisor is backwardly compatible with previous versions of SQL Server.

New features provided by the SQL Server 2005 Database Engine Tuning Advisor tool are as follows:

• Multiple databases: Multiple databases can be accessed at the same time.

• More object types: As already stated, more object types can be tuned. This includes XML, XML data types, and partitioning recommendations. There is also more versatility in choosing what to tune and what to recommend for tuning. Figure 1.3 shows available options for differing object types allowed to be subjected to analysis. And there are also some advanced tuning options, as shown in Figure 1.4.

• Time period workload analysis: Workloads can be analyzed over set time periods, thus isolating peak times, off-peak times, and so on. Figure 1.5 shows analysis, allowance of time period settings, as well as application and evaluation of recommendations made by the tool.

• Tuning log entries: A log file containing a record of events which the Database Engine Tuning Advisor cannot tune automatically. This log can be used by a database administrator to attempt manual tuning if appropriate.

• Negligible size test database copy: The Database Engine Tuning Advisor can create a duplicate test copy of a production environment, in order to offload performance tuning testing processing. Most importantly, the test database created does not copy data. The only thing copied is the state of a database without the actual data. This is actually very easy for a relational database like SQL Server. All that is copied are objects, such as tables and indexes, plus statistics of those objects. Typical table statistics include record counts and physical size. This allows a process such as the optimizer to accurately estimate how to execute a query.

• What-if scenarios: A database administrator can create a configuration and scenario and subject it to the Database Engine Tuning Advisor. The advisory tool can give a response as to the possible effects of specified configuration changes. In other words, you can experiment with changes, and get an estimation of their impact, without making those changes in a production environment.

1.8.4 SQL Server Profiler

The SQL Server Profiler tool was available in SQL Server 2000 but has some improvements in SQL Server 2005. Improvements apply to the recording of things or events which have happened in the database, and the ability to replay those recordings. The replay feature allows repetition of problematic scenarios which are difficult to resolve.

Essentially, the SQL Server Profiler is a direct window into trace files. Traces for any relational database contain a record of some, most, or even all activities in a database. Trace files can also include general table and indexing statistics as well. The performance issue related to trace files themselves is that tracing can be relatively intensive, depending on how tracing is configured. Sometimes too much tracing can affect overall database performance, and sometimes even quite drastically. Tracing is usually a last resort but also a very powerful option when it comes to tracking down the reason for performance problems and bottlenecks.

There are a number of things new to SQL Server 2005 for SQL Server Profiler:

• Trace file replays: Rollover trace files can be replayed. Figure 1.6 shows various options that can be set for tracing, rollover, and subsequent tracing entry replay.

Figure 1.6 SQL Server Profiler options

• XML: The profiler tool has more flexibility by allowing for various definitions using XML.

• Query plans in XML: Query plans can be stored as XML, allowing for viewing without database access.

• Trace entries as XML: Trace file entries can be stored as XML, allowing for viewing without database access.

• Analysis Services: SQL Server Profiler now allows for tracing of Analysis Services (SQL Server data warehousing) and Integration Services.

• Various other things: Aggregate views of trace results, and Performance Monitor counters matched with SQL Server database events. The Windows Performance Monitor tool is shown in Figure 1.7.

1.8.5 Business Intelligence Development Studio

This tool is used to build something called Business Intelligence (BI) objects. The BI Development Studio is a new SQL Server 2005 tool used to manage projects for development. This tool allows for integration of various aspects of SQL Server databases, including analysis, integration, and reporting services. This tool doesn’t really do much for database performance in general, but moreover can help to speed up development, and make development a cleaner and better-coded process. In the long term, better-built applications perform better.

1.9 Availability and scalability

Availability means that an SQL Server database will have less down time and is less likely to irritate your source of income—your customers. Scalability means you can now service more customers with SQL Server 2005. Availability and scalability are improved in SQL Server 2005 by the addition and enhancement of the following:

• Data mirroring: Additional hot standby databases. This is called database mirroring in SQL Server.

• Clustering: Clustering is introduced, which is not the same thing as a hot standby. A hot standby takes over from a primary database in the event that the primary database fails. Standby is purely failover ability. A clustered environment provides more capacity and up-time by allowing connections and requests to be serviced by more than one computer in a cluster of computers. Many computers work in a cluster, in concert with each other. A cluster of multiple SQL Server databases effectively becomes a single database spread across multiple locally located computers—just a much larger and more powerful database. Essentially, clustering provides higher capacity, speed through mirrored and parallel database access, in addition to just failover potential. When failover occurs in a clustered environment the failed node is simply no longer servicing the needs of the entire cluster, whereas a hot standby is a switch from one server to another.

• Replication: Replication is enhanced in SQL Server 2005. Workload can be spread across multiple, distributed, replicated databases. Also, the addition of a graphical Replication Monitor tool eases management of replication and distributed databases.

• Snapshot flashbacks: A snapshot is a picture of a database, frozen at a specific point in time. This allows users to go back in time, and look at data at that point in time in the past. The performance benefit is that the availability of old data sets, in static databases, allows queries to be executed against multiple sets of the same data (a snapshot sketch follows this list).

• Backup and restore: Improved restoration using snapshots to enhance restore after crash recovery, and giving partial, general online access during the recovery process.

• Bulk imports: This is improved in SQL Server 2005.
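Creating such a snapshot is a single statement; a sketch, with the database and file names hypothetical (the logical name must match the source database's data file):

-- Freeze an image of BankingDB at this instant
CREATE DATABASE BankingDB_snap
ON ( NAME = BankingDB_data,
     FILENAME = 'C:\snapshots\BankingDB_snap.ss' )
AS SNAPSHOT OF BankingDB;

-- Reporting queries can now read the frozen image while the
-- source database continues to take updates
SELECT COUNT(*) FROM BankingDB_snap.dbo.accounts;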

1.10 Other useful stuff

Other useful stuff introduced in SQL Server 2005 offers the potential of improving the speed of development and software quality. Improvements include the following:

• Native XML data types: Native XML allows the storage of XML documents in their entirety inside an SQL Server database. The term native implies that the stored XML document is not only stored as the textual data of the XML document, but it also includes the browser-interpretive XML structure and metadata meaning. In other words, XML data types are directly accessible from the database as fully executable XML documents. The result is the inclusion of all the power, versatility, and performance of XML in general—ALL OF IT! Essentially, XML data types allow direct access to XQuery, SOAP, XML data manipulation languages, XSD—anything and everything XML. Also included are specifics of XML exclusive to SQL Server [1].

• XML is the eXtensible Markup Language. There is a whole host of stuff added to SQL Server 2005 with the introduction of XML data types. You can even create specific XML indexes, indexing stored XML data type and XML documents (a sketch follows this list).

• Service broker notification: This helps performance enormously because multiple applications are often tied together in eCommerce architectures. The Service Broker part essentially organizes messages between different things. The notification part knows what to send, and where. For example, a user purchases a book online at the Amazon website. What happens? Transactions are placed into multiple different types of databases:

  • stock inventory databases
  • shipping detail databases
  • payment processing such as credit cards or a provider like Paypal
  • data warehouse archives
  • accounting databases

  The different targets for data messages are really dependent on the size of the online operation. The larger the retailer, the more distributed their architecture becomes. This stuff is just too big to manage all in one place.

• New data modeling techniques: A Unified Dimensional Model (UDM) used for OLAP and analysis in data warehouse environments.
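A small sketch of the native XML data type and an XML index (the table and document contents are hypothetical):

-- A table holding whole XML documents in a native xml column
CREATE TABLE dbo.book_docs
(
    doc_id int PRIMARY KEY,
    doc    xml NOT NULL
);

INSERT INTO dbo.book_docs
VALUES (1, '<book><title>Tuning Handbook</title></book>');

-- XQuery executed directly against the stored document
SELECT doc.query('/book/title') FROM dbo.book_docs;

-- An XML index to speed up queries into the documents
CREATE PRIMARY XML INDEX ix_book_docs_doc ON dbo.book_docs (doc);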

1. Beginning XML Databases, Gavin Powell, Nov 2006, ISBN: 0471791202, Wiley


Logical Database Design for Performance

In database parlance, logical design is the human-perceivable organization of the slots into which data is put. These are the tables, the fields, data types, and the relationships between tables. Physical design is the underlying file structure, within the operating system, out of which a database is built as a whole. This chapter covers logical database design. Physical design is covered in the next chapter.

This book is about performance. Some knowledge of logical relational database design, plus general underlying operating system and file system structure, and file system functioning is assumed. Let’s begin with logical database design, with database performance foremost in mind.

2.1 Introducing logical database design for performance

Logical database design for relational databases can be divided into a number of distinct areas:

• Normalization: A sequence of steps by which a relational database model is both created and improved upon. The sequence of steps involved in the normalization process is called normal forms. Each normal form improves the previous one. The objective is to remove redundancy and duplication, plus reduce the chance of inconsistencies in data and increase the precision of relationships between different data sets within a database.

• Denormalization: Does the opposite of normalization by undoing normal forms. It thus reintroduces some redundancy and duplication, plus increases the potential for inconsistencies in data, and so on. Denormalization is typically implemented to help increase performance. The extreme in denormalization helps to create specialized data models in data warehouses.

• Object Design: The advent of object databases was first expected to introduce a new competitor to relational database technology. This did not happen. What did happen was that relational databases have absorbed some aspects of object modeling, in many cases helping to enhance the relational database model into what is now known as an object-relational database model.

Objects stored in object-relational databases typically do not enhance relational database performance, but rather enhance functionality and versatility.

To find out more about normalization [1] and object design [2] you will have to read other books, as these topics are both very comprehensive all by themselves. There simply isn’t enough space in this book. This book deals with performance tuning. So, let’s begin with the topic of normalization, and how it can be both used (or not used) to help enhance the performance of a relational database in general.

So, how do we go about tuning a relational database model? What does normalization have to do with tuning? There are a few simple guidelines to follow and some things to watch out for:

• Normalization optimizes data modification at the possible expense of data retrieval. Denormalization is just the opposite, optimizing data retrieval at the expense of data modification.

• Too little normalization can lead to too much duplication. The result could be a database that is bigger than it should be, resulting in more disk I/O. Then again, disk space is cheap compared with processor and memory power.

• Incorrect normalization is often made obvious by convoluted and complex application code.

• Too much normalization leads to overcomplex SQL code which can be difficult, if not impossible, to tune. Be very careful implementing beyond 3rd normal form in a commercial environment.

• Too many tables results in bigger joins, which makes for slower queries.

• Quite often databases are designed without forehand knowledge of applications. The data model could be built on a purely theoretical basis. Later in the development cycle, applications may have difficulty mating to a highly granular data model (highly normalized). One possible answer is that both development and administration people should be involved in data modeling. Busy commercial development projects rarely have spare time to ensure that absolutely everything is taken into account. It’s just too expensive. It should be acceptable to alter the data model at least during the development process, possibly substantially. Most of the problems with relational database model tuning are normalization related.

• Normalization should be simple because it is simple! Don’t overcomplicate it. Normalization is somewhat based on mathematical set theory, which is very simple mathematics.

• Watch out for excessive use of outer joins in SQL code. This could mean that your data model is too granular. You could have overused the higher normal forms. Higher normal forms are rarely needed in the name of efficiency, but rather preferred in the name of perfection, and possibly overapplication of business rules into database definitions. Sometimes excessive use of outer joins might be akin to: Go and get this. Oh! Go and get that too because this doesn’t quite cover it.

The other side of normalization is of course denormalization. In many cases, denormalization is the undoing of normalization. Normalization is performed by the application of normal form transformations. In other cases, denormalization is performed through the application of numerous specialized tricks. Let’s begin with some basic rules for normalization in a modern commercial environment.

2.2 Commercial normalization techniques

The terms modern and commercial imply a lack of traditional normalization. This also means that techniques used in a commercial environment are likely to bear only a vague resemblance to what you were taught about normalization in college or university. In busy commercial environments, relational database models tend to contradict the mathematical purity of a highly normalized table structure. Purity is often sacrificed for the sake of performance, particularly with respect to queries. This is often because commercial implementations tend to do things that an academic would never dream of, such as mix small transactions of an OLTP database with large transactions of a data warehouse. Some academics may not think too highly of a data warehouse dimensional model, which is essentially denormalized up the gazoo! Each approach has its role to play in the constant dance of trying to get things right and turn a profit.

2.2.1 Referential integrity

How referential integrity and tuning are related is twofold:

• Implement Referential Integrity? Yes. Too many problems and issues can arise if not.

• How to Implement Referential Integrity? Use built-in database constraints if possible. Do not use triggers or application coding. Triggers can especially hurt performance. Application coding can cause much duplication when distributed. Triggers are event driven, and thus by their very definition cannot contain transaction termination commands (COMMIT and ROLLBACK commands). The result is that their overuse can result in a huge mess, with no transaction termination commands.

Some things to remember:

• Always Index Foreign Keys: This helps to avoid locking contention on small tables when referential integrity is validated. Why? Without indexes on foreign key fields, every referential integrity check against a foreign key will read the entire table because there is no index to use. Unfavorable results can be hot blocking on small tables and too much I/O activity for large tables (an index sketch follows this list).

Note: A hot block is a section of physical disk or memory with excessive activity—more than the software or hardware can handle.

• Avoid Generic Tables: A table within a table. In some older database models, a single centralized table was used to store system information; for example, sequence numbers for surrogate keys, or system codes. This is a very bad idea. Hot blocking on tables like this can completely kill performance in even a low-concurrency multi-user environment.
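A sketch of the first point (the companies parent table is assumed to already exist; all names are hypothetical):

-- SQL Server does not index foreign key columns automatically
CREATE TABLE dbo.divisions
(
    division_id   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    company_id    int NOT NULL
        REFERENCES dbo.companies (company_id),
    division_name varchar(50) NOT NULL
);

-- Index the foreign key so referential integrity checks do not scan
CREATE NONCLUSTERED INDEX ix_divisions_company
ON dbo.divisions (company_id);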


2.2.2 Primary and foreign keys

Using surrogates for primary and foreign keys can help to improve performance. A surrogate key is a field added to a table, usually an integer sequence counter, giving a unique value for a record in a table. It is also totally unrelated to the content of a record, other than just uniqueness for that record.

Primary and foreign keys can use natural values, effectively names or codes for values. For example, in Figure 2.1, primary keys and foreign keys are created on the names of companies, divisions, departments, and employees. These values are easily identifiable to the human eye but are lengthy and complex string values as far as a computer is concerned. People do not check referential integrity of records; the relational database model is supposed to do that.

The data model in Figure 2.1 could use coded values for names, making values shorter. However, years ago, coded values were often used for names in order to facilitate easier typing and selection of static values—not for the purpose of efficiency in a data model. For example, it is much easier to type USA (or select it from a pick list), rather than United States of America, when typing in a client address.
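In SQL Server terms, the usual surrogate is an IDENTITY column; a minimal sketch (names hypothetical), keeping the natural name unique but out of the key:

CREATE TABLE dbo.companies
(
    company_id   int IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate key
    company_name varchar(100) NOT NULL UNIQUE             -- natural value
);

Joins and foreign keys then carry the short integer company_id instead of the lengthy company name.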

The primary and foreign keys, denoted in Figure 2.1 as PK and FK respectively, are what apply referential integrity. In Figure 2.1, in order for a division to exist it must be part of a company. Additionally, a company cannot be removed if it has an existing division. What referential integrity does is verify that these conditions exist whenever changes are attempted to any of these tables. If a violation occurs an error will be returned. It is also pos-

Figure 2.1 Natural value keys
