Creating the Physical Database Schema

IN THIS CHAPTER
■ Creating the database files
■ Creating tables
■ Creating primary and foreign keys
■ Configuring constraints
■ Creating the user data columns
■ Documenting the database schema
■ Creating indexes
The longer I work with databases, the more convinced I become that the real
magic is in the physical schema design. No aspect of application
development has more potential to derail an application, or to enable it to soar,
than the physical schema, not even indexing. (This idea is crucial to my view
of Smart Database Design as expressed in Chapter 2, ‘‘Data Architecture.’’)
The primary features of the application are designed at the data schema level.
If the data schema supports a feature, then the code will readily bring the feature
to life; but if the feature is not designed into the tables, then the client application
can jump through as many hoops as you can code and it will never work right.
The logical database schema, discussed in Chapter 3, ‘‘Relational Database
Design,’’ is a necessary design step to ensure that the business requirements
are well understood. However, a logical design has never stored or served up
any data.
In contrast, the physical database schema is an actual data store that must meet
the Information Architecture Principle’s call to make information ‘‘readily available
in a usable format for daily operations and analysis by individuals, groups, and
processes.’’ It’s the physical design that meets the database objectives of usability,
scalability, integrity, and extensibility.
This chapter first discusses designing the physical database schema and then
focuses on the actual implementation of the physical design:
■ Creating the database files
■ Creating the tables
■ Creating the primary and foreign keys
■ Creating the data columns
■ Adding data-integrity constraints
Similarly, while this chapter covers the syntax and mechanics of creating indexes, Chapter 64, ‘‘Indexing
Strategies,’’ explores how to fine-tune indexes for performance.
Part III, ‘‘Beyond Relational,’’ has three chapters related to database design. Chapters 17–19 discuss
modeling and working with data that’s not traditionally thought of as relational: hierarchical data, spatial
data, XML data, full-text searches, and storing BLOBs using Filestream.
Implementing a database often involves partitioning the data, which is covered in Chapter 68,
‘‘Partitioning.’’
And whereas this chapter focuses on building relational databases, Chapter 70, ‘‘BI Design,’’ covers
creating data warehouses.
What’s New with the Physical Schema?
SQL Server 2008 supports several new data types, and an entirely new way of storing data in the data
pages: sparse columns. Read on!
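As a quick taste, a column is declared sparse right in the table definition. Here’s a minimal sketch; the table and column names are hypothetical, not from a sample database:

CREATE TABLE ProductAttribute (
 ProductAttributeID INT IDENTITY(1,1) PRIMARY KEY,
 ProductID INT NOT NULL,
 -- Sparse columns store NULLs for free, at a small storage
 -- cost for the non-NULL values, so rarely populated
 -- attributes are good candidates:
 Voltage VARCHAR(20) SPARSE NULL,
 ScreenSize VARCHAR(20) SPARSE NULL
 );

Sparse columns must be nullable, which fits their purpose: they pay off when the overwhelming majority of rows hold NULL for that column.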
Designing the Physical Database Schema
If there’s one area where I believe the SQL Server community is lacking in skills and emphasis, it’s
translating a logical design into a decent physical schema.
When creating the physical design, the design team should begin with a clean logical design and/or
well-understood and documented business rules, and then brainstorm until a simple, flexible design
emerges: one that performs excellently, is extensible, has appropriate data integrity, and is
usable by all those who will consume the data. I firmly believe it’s not a question of compromising one
database attribute for another; all these goals can be met by a well-designed, elegant physical schema.
Translating the logical database schema into a physical database schema may involve the following
changes:
■ Converting complex logical designs into simpler, more agile table structures
■ Converting logical many-to-many relationships to two physical one-to-many relationships with
an associative, or junction, table (see the sketch after this list)
■ Converting logical composite primary keys to surrogate (computer-generated) single-column primary keys
■ Converting the business rules into constraints or triggers, or, better yet, into data-driven designs
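To make the second and third conversions concrete, here is a minimal sketch using hypothetical Student and Class tables. The junction table StudentClass implements the many-to-many relationship as two one-to-many relationships, with a surrogate primary key standing in for the logical composite key:

CREATE TABLE Student (
 StudentID INT IDENTITY(1,1) PRIMARY KEY,
 StudentName VARCHAR(50) NOT NULL
 );
CREATE TABLE Class (
 ClassID INT IDENTITY(1,1) PRIMARY KEY,
 ClassName VARCHAR(50) NOT NULL
 );
-- The junction table: each row pairs one student with one class
CREATE TABLE StudentClass (
 StudentClassID INT IDENTITY(1,1) PRIMARY KEY, -- surrogate key
 StudentID INT NOT NULL REFERENCES Student(StudentID),
 ClassID INT NOT NULL REFERENCES Class(ClassID),
 -- The logical composite key survives as a unique constraint:
 CONSTRAINT uqStudentClass UNIQUE (StudentID, ClassID)
 );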
Logical to physical options
Every project team develops the physical database schema drawing from these two disciplines (logical
data modeling and physical schema design) in one of the following possible combinations:
■ A logical database schema is designed and then implemented without the benefit of
physical schema development. This plan is a sure way to develop a slow and unwieldy database
schema. The application code will be frustrating to write, and the code will not be able to
overcome the performance limitations of the design.
■ A logical database schema is developed to ensure that the business requirements are
understood. Based on the logical design, the database development team then develops a physical
database schema. This method can result in a fast, usable schema.
Developing the schema in two stages is a good plan if the development team is large enough
that one team designs and collects the business requirements while another team
develops the physical database schema. Make sure that having a completed logical database
schema does not squelch the team’s creativity as the physical database schema is designed.
■ The third combination of logical and physical design methodologies combines the two into
a single development step, as the database development team develops a physical database
schema directly from the business requirements. This method can work well provided that
the design team fully understands logical database modeling, physical database modeling, and
advanced query design.
The key task in designing a physical database schema is brainstorming multiple possible designs, each
of which meets the user requirements and ensures data integrity. Each design is evaluated based on its
simplicity, the performance of possible query paths, its flexibility, and its maintainability.
Refining the data patterns
The key to simplicity is refining the entity definition with a lot of team brainstorming so that each table
does more work, rearranging the data patterns until an elegant and simple pattern emerges. This is
where a broad repertoire of database experience aids the design process.
Often the solution is to view the data from multiple angles, finding the commonality between them.
Users are too close to the data, and they seldom correctly identify the true entities. What a user might
see as multiple entities, a database design team might model as a single entity with dynamic roles
(see the sketch below). Combining this quest for simplicity with some data-driven design methods can
yield normalized databases with higher data integrity, more flexibility and agility, and dramatically fewer tables.
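For example, where users describe separate customer, vendor, and employee lists, the design team might model one Contact entity whose roles are data. A hypothetical sketch:

CREATE TABLE Contact (
 ContactID INT IDENTITY(1,1) PRIMARY KEY,
 ContactName VARCHAR(100) NOT NULL
 );
-- Roles are rows of data, not separate tables, so adding
-- a new role requires no schema change:
CREATE TABLE ContactRole (
 ContactRoleID INT IDENTITY(1,1) PRIMARY KEY,
 ContactID INT NOT NULL REFERENCES Contact(ContactID),
 RoleName VARCHAR(25) NOT NULL -- 'Customer', 'Vendor', 'Employee'
 );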
Designing for performance
A normalized logical database design without the benefit of physical database schema optimization will
perform poorly, because the logical design alone doesn’t consider performance. Issues such as lock
contention, composite keys, excessive joins for common queries, and table structures that are difficult to
update are just some of the problems that a logical design might bring to the database.
Designing for performance is greatly influenced by the simplicity or complexity of the design. Each
unnecessary complexity requires additional code and extra joins, and breeds even more complexity.
A popular myth is that the primary task of translating a logical design into a physical schema is
denormalization. Denormalization, purposefully breaking the normal forms, is the technique of duplicating
data within the database to make it easier to retrieve. Interestingly, the Microsoft Word spell checker
suggests replacing ‘‘denormalization’’ with ‘‘demoralization.’’ Within the context of a transactional, OLTP
database, I couldn’t agree more.
Normalization is described in Chapter 3, ‘‘Relational Database Design.’’
Consider some examples of denormalizing a data structure: including the customer name in an [Order] table
would enable retrieving the customer name when querying an order without joining to the Customer
table. Or, including the CustomerID in a ShipDetail table would enable joining directly from
the ShipDetail table to the Customer table while bypassing the OrderDetail and [Order]
tables. Both of these examples violate normalization because the attributes don’t depend on the
primary key.
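In code, the first example might look like the following sketch (hypothetical tables; every denormalized copy also needs code to keep it synchronized with the Customer table):

-- Duplicate the customer name into the order itself:
ALTER TABLE [Order]
 ADD CustomerName VARCHAR(100) NULL;

-- The name can now be retrieved without joining to Customer:
SELECT OrderID, OrderDate, CustomerName
 FROM [Order];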
Some developers regularly denormalize portions of the database in an attempt to improve performance.
While it might seem that this would improve performance because it reduces the number of joins, I
have found that in practice the additional code (procedures, triggers, constraints, etc.) required to keep
the data consistent, or to renormalize data for use in set-based queries, actually costs performance. In
my consulting practice, I’ve tested a normalized design vs. a denormalized design several times. In every
case the normalized design was about 15% faster than the denormalized design.
Best Practice
There’s a common saying in the database field: ‘‘Normalize ‘til it hurts, then denormalize ‘til it works.’’
Poppycock! That’s a saying of data modelers who don’t know how to design an efficient physical schema.
I never denormalize apart from the two cases identified here as responsible denormalization:
■ Denormalize aggregate data, such as account balances or inventory-on-hand quantities within OLTP databases, for performance, even though such data could be calculated from the inventory transaction table or the account transaction ledger table. These aggregates may be maintained using a trigger or a persisted computed column (a trigger sketch follows this list).
■ If the data is not original and is primarily there for OLAP or reporting purposes, data consistency is not the primary concern. For performance, denormalization is then a wise move.
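As an illustration of the first case, a trigger on a hypothetical InventoryTransaction table might maintain the on-hand quantity. This is only a sketch of the technique (a production version would also handle updates and deletes):

CREATE TRIGGER trgInventoryTransactionInsert
 ON InventoryTransaction
 AFTER INSERT
AS
BEGIN
 SET NOCOUNT ON;
 -- Fold the newly inserted transactions into the
 -- denormalized aggregate column:
 UPDATE Inventory
 SET QuantityOnHand = Inventory.QuantityOnHand + i.Quantity
 FROM Inventory
 JOIN (SELECT InventoryID, SUM(Quantity) AS Quantity
 FROM inserted
 GROUP BY InventoryID) AS i
 ON Inventory.InventoryID = i.InventoryID;
END;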
The architecture of the databases, and which databases or tables are being used for which purpose, are
the driving factors in any decision to denormalize a part of the database.
If the database requires both OLTP and OLAP, the best solution might just be to create a few tables that
duplicate data for their own distinct purposes. The OLTP side might need its own tables to maintain the
data, but the reporting side might need that same data in a single, wide, fast table from which it can
retrieve data without any joins or locking concerns. The trick is to correctly populate the denormalized
data in a timely manner.
Indexed views are basically denormalized clustered indexes. Chapter 64, ‘‘Indexing
Strategies,’’ discusses setting up an indexed view. Chapter 70, ‘‘BI Design,’’ includes advice on
creating a denormalized reporting database and data warehouse.
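As a preview of Chapter 64, an indexed view is a schema-bound view materialized by a unique clustered index. Here is a minimal sketch with hypothetical tables; note that indexed views carry restrictions, such as requiring COUNT_BIG(*) alongside GROUP BY and non-nullable expressions in SUM:

CREATE VIEW dbo.vCustomerOrderTotals
 WITH SCHEMABINDING
AS
 SELECT O.CustomerID,
 SUM(OD.Quantity * OD.Price) AS OrderTotal,
 COUNT_BIG(*) AS RowCnt -- required with GROUP BY
 FROM dbo.[Order] AS O
 JOIN dbo.OrderDetail AS OD
 ON O.OrderID = OD.OrderID
 GROUP BY O.CustomerID;
GO
-- The unique clustered index materializes (denormalizes)
-- the aggregate to disk:
CREATE UNIQUE CLUSTERED INDEX ivCustomerOrderTotals
 ON dbo.vCustomerOrderTotals (CustomerID);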
Designing for extensibility
Maintenance over the life of the application will cost significantly more than the initial development.
Therefore, during the initial development process, a primary objective should be making the physical
design, code, and data as easy as possible to maintain. The following techniques may reduce
the cost of database maintenance:
■ Enforce a strong T-SQL based abstraction layer
■ Always normalize the schema
■ Data-driven designs are more flexible, and therefore more extensible, than rigid designs
■ Use a consistent naming convention
■ Avoid data structures that are overly complex, as well as unwieldy data structures, when
simpler data structures will suffice
■ Develop with scripts instead of using Management Studio’s UI
■ Enforce the data integrity constraints from the beginning. Polluted data is a bear to clean up
after even a short time of loose data-integrity rules (see the sketch after this list)
■ Develop the core feature first, and once that’s working, then add the bells and whistles
■ Document not only how the procedure works, but also why it works
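For instance, enforcing integrity from the beginning is often just a few extra lines in the CREATE TABLE script. A hypothetical sketch:

CREATE TABLE Account (
 AccountID INT IDENTITY(1,1) PRIMARY KEY,
 AccountCode CHAR(6) NOT NULL UNIQUE,
 Balance MONEY NOT NULL
 CONSTRAINT ckAccountBalance CHECK (Balance >= 0),
 Created DATETIME NOT NULL
 CONSTRAINT dfAccountCreated DEFAULT GETDATE()
 );

Declaring the constraints with the table costs nothing now; retrofitting them onto polluted data later means finding and fixing every violating row first.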
Creating the Database
The database is the physical container for all database schemas, data, and server-side programming. SQL
Server’s database is a single logical unit, even though it may exist in several files.
Database creation is one of those areas in which SQL Server requires little administrative work, but you
may decide instead to fine-tune the database files with more sophisticated techniques.
The Create DDL command
Creating a database using the default parameters is very simple. The following data definition language
(DDL) command is taken from the Cape Hatteras Adventures sample database:
CREATE DATABASE CHA2;
The CREATE command will create a data file with the name provided and an .mdf file extension, as well
as a transaction log with an .ldf extension.
By default, the files are created in the server’s default data and log locations, which can be changed in the Database Settings page of the Server Properties dialog.
While these defaults might be acceptable for a sample or development database, they are sorely
inadequate for a production database. Better alternatives are explained as the CREATE DATABASE command is
covered.
Using Object Explorer, creating a new database requires only that the database name be entered in
the New Database form, as shown in Figure 20-1. Use the New Database menu command from the
Databases node’s context menu to open the New Database form.
FIGURE 20-1
The simplest way to create a new database is by entering the database name in Object Explorer’s
New Database page.
The New Database page includes several individual subpages: the General, Options, and Filegroups
pages, as shown in Table 20-1. For existing databases, the Files, Permissions, Extended Properties,
Mirroring, and Log Shipping pages are added to the Database Properties page (not shown).
TABLE 20-1
Database Property Pages

General
 New Database: Create a new database, setting the name, owner, collation, recovery model, full-text indexing, and data file properties.
 Existing Database: View (read-only) general properties: name, last backup, size, collation, recovery model, full-text indexing, and database files.

Filegroups
 New Database: View and modify filegroup information.
 Existing Database: View and modify filegroup information.

Options
 New Database: View and modify database options such as auto shrink, ANSI settings, page verification method, and single-user access.
 Existing Database: View and modify database options such as auto shrink, ANSI settings, page verification method, and single-user access.

Permissions
 New Database: n/a
 Existing Database: View and modify server roles, users, and permissions. See Chapter 50, ‘‘Authorizing Securables,’’ for more details.

Extended Properties
 New Database: n/a
 Existing Database: View and modify extended properties.

Mirroring
 New Database: n/a
 Existing Database: View and configure database mirroring, covered in Chapter 47, ‘‘Mirroring.’’

Transaction Log Shipping
 New Database: n/a
 Existing Database: View and configure log shipping, covered in Chapter 46, ‘‘Log Shipping.’’
Database-file concepts
A database consists of two files (or two sets of files): the data file and the transaction log. The data file
contains all system and user tables, indexes, views, stored procedures, user-defined functions, triggers,
and security permissions. The write-ahead transaction log is central to SQL Server’s design: all updates
to the data file are first written and verified in the transaction log, ensuring that all data updates are
written to two places.
Never store the transaction log on the same disk subsystem as the data file. For the sake
of the transactional-integrity ACID properties and the recoverability of the database, it’s
critical that a failing disk subsystem not be able to take out both the data file and the transaction file.
The transaction log also plays a key role in the backup and recovery plan, as discussed in Chapter 41, ‘‘Recovery Planning.’’ How SQL Server uses the transaction log within transactions is covered in Chapter 66, ‘‘Managing Transactions, Locking, and
Blocking.’’
Configuring file growth
Prior to SQL Server version 7, the data files required manual size adjustment to handle additional data
Fortunately, for about a decade now, SQL Server can automatically grow thanks to the following options
(see Figure 20-2):
■ Enable Autogrowth: As the database begins to hold more data, the file size must grow. If autogrowth is not enabled, an observant DBA will have to manually adjust the size. If autogrowth is enabled, SQL Server automatically adjusts the size according to one of the following growth parameters:
■ In percent: When the data file needs to grow, this option will expand it by the percent specified. Growing by percent is the best option for smaller databases. With very large files, this option may add too much space in one operation and hurt performance while the data file is being resized. For example, adding 10 percent to a 5GB data file will add 500MB;
writing 500MB could take a while.
■ In megabytes: When the data file needs to grow, this option will add the specified number of megabytes to the file. Growing by a fixed size is a good option for larger data files.
Best Practice
The default setting is to grow the data file by 1MB. Autogrow events require database locks, which
severely impact performance. Imagine a database that grows by a couple of gigabytes: it will have to
endure 2,048 tiny autogrow events. On the other hand, a very large autogrowth event will consume more time.
The best solution is to turn autogrow off and manually increase the file size during the database maintenance
window. However, if that’s not expedient, then I recommend setting autogrow to a reasonable mid-size
growth. For my open-source O/R DBMS product, Nordic, I set both the initial size and the autogrow to
100MB.
■ Maximum file size: Setting a maximum size can prevent the data file or transaction log file from filling the entire disk subsystem, which would cause trouble for the operating system.
The maximum size for a data file is 16 terabytes, and log files are limited to 2 terabytes. This does not
limit the size of the database, because a database can include multiple files.
FIGURE 20-2
With Management Studio’s New Database form, a new database is configured for automatic file
growth and a maximum size of 20GB.
Automatic file growth can be specified in code by adding the file options to the CREATE DATABASE
DDL command. File size can be specified in kilobytes (KB), megabytes (MB), gigabytes (GB), or
terabytes (TB); megabytes is the default. File growth can be set to a size or a percent. The following
code creates the NewDB database with an initial data-file size of 10MB, a maximum size of 200GB, and
a file growth of 100MB. The transaction log file is initially 5MB, with a maximum size of 10GB and a
growth of 100MB:
CREATE DATABASE NewDB
ON
PRIMARY
 (NAME = NewDB,
 FILENAME = 'c:\SQLData\NewDB.mdf',
 SIZE = 10MB,
 MAXSIZE = 200GB,
 FILEGROWTH = 100)
LOG ON
 (NAME = NewDBLog,
 FILENAME = 'd:\SQLLog\NewDBLog.ldf',
 SIZE = 5MB,
 MAXSIZE = 10GB,
 FILEGROWTH = 100);
All the code in this chapter and all the sample databases are available for download from
the book’s website, www.sqlserverbible.com. In addition, there are extensive queries
using the catalog views that relate to this chapter.
If autogrowth is not enabled, then the files require manual adjustment if they are to handle additional
data. File size can be adjusted in Management Studio by editing it in the Database Properties form.
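The same manual adjustment can be scripted, which fits the develop-with-scripts practice mentioned earlier. A minimal sketch that grows the NewDB data file created above:

ALTER DATABASE NewDB
 MODIFY FILE
 (NAME = NewDB, -- the logical file name
 SIZE = 500MB); -- must be larger than the current size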
An easy way to determine the files and file sizes from code is to query the
sys.database_files catalog view, which reports on the current database (sys.master_files covers every database on the server).
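For example, the following sketch reports each file’s size and growth settings for the current database; note that sys.database_files stores size as a count of 8KB pages:

SELECT name,
 physical_name,
 size * 8 / 1024 AS size_mb, -- size is in 8KB pages
 max_size,
 growth,
 is_percent_growth
 FROM sys.database_files;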