Relational Database Design
IN THIS CHAPTER
Introducing entities, tuples, and attributes
Conceptual diagramming vs. SQL DDL
Avoiding normalization over-complexity
Choosing the right database design pattern
Ensuring data integrity
Exploring alternative patterns
Normal forms
I play jazz guitar — well, I used to play before life became so busy.
(You can listen to some of my MP3s on my ‘‘about’’ page on
www.sqlserverbible.com.) There are some musicians who can
hear a song and then play it; I’m not one of those. I can feel the rhythm, but
I have to work through the chords and figure them out almost mathematically
before I can play anything but a simple piece. To me, building chords and chord
progressions is like drawing geometric patterns on the guitar neck using the frets
and strings.
Music theory encompasses the scales, chords, and progressions used to make
music. Every melody, harmony, rhythm, and song draws from music theory. For
some musicians there’s just a feeling that the song sounds right. For those who
make music their profession, they understand the theory behind why a song feels
right. Great musicians have both the feel and the theory in their music.
Designing databases is similar to playing music. Databases are designed by
combining the right patterns to correctly model a specific solution to a problem.
Normalization is the theory that shapes the design. There’s both the mathematical
theory of relational algebra and the intuitive feel of an elegant database.
Designing databases is both science and art.
Database Basics
The purpose of a database is to store the information required by an organization.
Any means of collecting and organizing data is a database. Prior to the
Information Age, information was primarily stored on cards, in file folders, or in ledger
books. Before the adding machine, offices employed dozens of workers who spent
all day adding columns of numbers and double-checking the math of others. The
job title of those who had that exciting career was computer.
Author’s Note
Welcome to the second of five chapters that deal with database design. Although they’re spread out in
the table of contents, they weave a consistent theme that good design yields great performance:
■ Chapter 2, ‘‘Data Architecture,’’ provides an overview of data architecture
■ This chapter details relational database theory
■ Chapter 20, ‘‘Creating the Physical Database Schema,’’ discusses the DDL layer of
database design and development
■ Partitioning the physical layer is covered in Chapter 68, ‘‘Partitioning.’’
■ Designing data warehouses for business intelligence is covered in Chapter 70,
‘‘BI Design.’’
There’s more to this chapter than the standard ‘‘Intro to Normalization.’’ This chapter draws on the lessons
I’ve learned over the years and has a few original ideas.
This chapter covers a book’s worth of material (which is why I rewrote it three times), but I tried to concisely
summarize the main ideas. The chapter opens with an introduction to database design terms and concepts.
Then I present the same concepts from three perspectives: first with the common patterns, then with my
custom Layered Design concept, and lastly with the normal forms. I’ve tried to make the chapter flow, but
each of these ideas is easier to comprehend after you understand the other two, so if you have the time, read
the chapter twice to get the most out of it.
As the number crunching began to be handled by digital machines, human labor, rather than being
eliminated, shifted to other tasks. Analysts, programmers, managers, and IT staff have replaced the
human ‘‘computers’’ of days gone by.
Speaking of old computers, I collect abacuses, and I know how to use them too — it keeps me in touch with the roots of computing On my office wall is a very cool nineteenth-century Russian abacus.
Benefits of a digital database
The Information Age and the relational database brought several measurable benefits to organizations:
■ Increased data consistency and better enforcement of business rules
■ Improved sharing of data, especially across distances
■ Improved ability to search for and retrieve information
■ Improved generation of comprehensive reports
■ Improved ability to analyze data trends
The general theme is that a computer database originally didn’t save time in the entry of data, but rather
in the retrieval of data and in the quality of the data retrieved. However, with automated data collection
in manufacturing, bar codes in retailing, databases sharing more data, and consumers placing their own
orders on the Internet, the effort required to enter the data has also decreased.
The previous chapter’s sidebar titled ‘‘Planning Data Stores’’ discusses different types or
styles of databases. This chapter presents the relational database design principles and
patterns used to develop operational, or OLTP (online transaction processing), databases.
Some of the relational principles and patterns may apply to other types of databases, but databases that
are not used for first-generation data (such as most BI, reporting databases, data warehouses, or
reference data stores) do not necessarily benefit from normalization.
In this chapter, when I use the term ‘‘database,’’ I’m referring exclusively to a relational, OLTP-style
database.
Tables, rows, columns
A relational database collects related, or common, data in a single list. For example, all the product
information may be listed in one table and all the customers in another table.
A table appears similar to a spreadsheet and is constructed of columns and rows. The appeal (and the
curse) of the spreadsheet is its informal development style, which makes it easy to modify and add to
as the design matures. In fact, managers tend to store critical information in spreadsheets, and many
databases started as informal spreadsheets.
In both a spreadsheet and a database table, each row is an item in the list and each column is a specific
piece of data concerning that item, so each cell should contain a single piece of data about a single item.
Whereas a spreadsheet tends to be free-flowing and loose in its design, database tables should be very
consistent in terms of the meaning of the data in a column. Because row and column consistency is so
important to a database table, the design of the table is critical.
Over the years, different development styles have referred to these concepts with various terms,
listed in Table 3-1.
TABLE 3-1
Comparing Database Terms

Development Style                   The List of Common Items            An Item in the List          A Piece of Information in the List
Spreadsheet                         Spreadsheet/worksheet/named range   Row                          Column/cell
Relational algebra/logical design   Entity, or relation                 Tuple (rhymes with couple)   Attribute
Object-oriented design              Class                               Object instance              Property
SQL Server developers generally refer to database elements as tables, rows, and columns when discussing
the SQL DDL layer or physical schema, and sometimes use the terms entity, tuple, and attribute when
discussing the logical design. The rest of this book uses the SQL terms (table, row, column), but this
chapter is devoted to the theory behind the design, so I also use the relational algebra terms (entity,
tuple, and attribute).
Database design phases
Traditionally, data modeling has been split into two phases, the logical design and the physical design;
but Louis Davidson and I have been co-presenting at conferences on the topic of database design and
I’ve become convinced that Louis is right when he defines three phases to database design. To avoid
confusion with the traditional terms, I’m defining them as follows:
■ Conceptual Model: The first phase digests the organizational requirements and identifies the
entities, their attributes, and their relationships.
The conceptual model is great for understanding, communicating, and verifying the organization’s requirements. The diagramming method should be easily understood by all the stakeholders — the subject-matter experts, the development team, and management.
At this layer, the design is implementation independent: It could end up on Oracle, SQL Server, or even Access. Some designers refer to this as the ‘‘logical model.’’
■ SQL DDL Layer: This phase concentrates on performance without losing the fidelity of the
logical model as it applies the design to a specific version of a database engine — SQL Server
2008, for example — generating the DDL for the actual tables, keys, and attributes. Typically, the SQL DDL layer generalizes some entities and replaces some natural keys with surrogate computer-generated keys.
The SQL DDL layer might look very different from the conceptual model.
■ Physical Layer: The implementation phase considers how the data will be physically stored
on the disk subsystems using indexes, partitioning, and materialized views. Changes made to this layer won’t affect how the data is accessed, only how it’s stored on the disk.
The physical layer ranges from simple, for small databases (under 20GB), to complex, with multiple filegroups, indexed views, and data routing partitions.
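To make the phases concrete, here is a minimal sketch of how a conceptual Customer entity might land in the SQL DDL layer. The table and column names are hypothetical, not from a specific design in this book; the point is the pattern of adding a surrogate key while keeping the natural key as a unique constraint:

```sql
-- Hypothetical mapping of a conceptual entity to SQL Server DDL.
-- The SQL DDL layer adds a surrogate identity key; the natural key
-- from the conceptual model survives as a unique constraint, so the
-- fidelity of the logical model is preserved.
CREATE TABLE dbo.Customer (
    CustomerID   INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key
    CustomerCode CHAR(10)     NOT NULL UNIQUE,           -- natural key from the conceptual model
    CustomerName NVARCHAR(50) NOT NULL
);
```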
This chapter focuses on designing the conceptual model, with a brief look at normalization followed by
a repertoire of database patterns.
Implementing a database without working through the SQL DDL layer design phase is a certain path to a poorly performing database. I’ve seen far too many database purists who didn’t care to learn SQL Server implement conceptual designs only to blame SQL Server for the horrible
performance.
The SQL DDL layer is covered in Chapter 20, ‘‘Creating the Physical Database Schema.’’
Tuning the physical layer is discussed in Chapters 64, ‘‘Indexing Strategies,’’ and 68,
‘‘Partitioning.’’
In 1970, Dr. Edgar F. Codd published ‘‘A Relational Model of Data for Large Shared Data Banks’’ and
became the father of the relational database. During the 1970s Codd wrote a series of papers that defined
the concept of database normalization. He wrote his famous ‘‘Codd’s 12 Rules’’ in 1985 to define what
constitutes a relational database and to defend the relational database from software vendors who
were falsely claiming to be relational. Since that time, others have amended and refined the concept of
normalization.
The primary purpose of normalization is to improve the data integrity of the database by reducing or
eliminating modification anomalies that can occur when the same fact is stored in multiple locations
within the database.
Duplicate data raises all sorts of interesting problems for inserts, updates, and deletes. For example, if
the product name is stored in the order detail table, and the product name is edited, should every order
details row be updated? If so, is there a mechanism to ensure that the edit to the product name
propagates down to every duplicate entry of the product name? If data is stored in multiple locations, is it
safe to read just one of those locations without double-checking other locations? Normalization prevents
these kinds of modification anomalies.
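The order detail example above can be sketched in DDL. This is a minimal illustration with hypothetical table and column names: the product name is stored once, in the product table, and the order detail rows carry only a foreign key, so an edit touches a single location:

```sql
-- Hypothetical sketch: storing each fact in one place.
-- The product name lives only in Product; OrderDetail references it
-- by key, so renaming a product is a single-row update with no risk
-- of stale duplicate copies.
CREATE TABLE dbo.Product (
    ProductID   INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ProductName NVARCHAR(50) NOT NULL
);

CREATE TABLE dbo.OrderDetail (
    OrderDetailID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    OrderID       INT NOT NULL,
    ProductID     INT NOT NULL
        REFERENCES dbo.Product(ProductID), -- the name is not duplicated here
    Quantity      INT NOT NULL
);
```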
Besides the primary goal of consistency and data integrity, there are several other very good reasons to
normalize an OLTP relational database:
■ Performance: Duplicate data requires extra code to perform extra writes, maintain
consistency, and manipulate data into a set when reading data. On my last large production contract
(several terabytes, OLTP, 35K transactions per second), I tested a normalized version of the
database vs. a denormalized version. The normalized version was 15% faster. I’ve found similar
results in other databases over the years.
Normalization also reduces locking contention and improves multiple-user concurrency.
■ Development costs: While it may take longer to design a normalized database, it’s easier to
work with a normalized database and it reduces development costs.
■ Usability: By placing columns in the correct table, it’s easier to understand the database and
easier to write correct queries.
■ Extensibility: A non-normalized database is often more complex and therefore more difficult
to modify.
The three ‘‘Rules of One’’
Normalization is well defined as normal forms — specific rules that address specific potential
errors in the design (there’s a whole section on normal forms later in this chapter). But I don’t design a
database with errors and then normalize the errors away; I follow normalization from the beginning to
the conclusion of the design process. That’s why I prefer to think of normalization as positively stated
principles.
When I teach normalization I open with the three ‘‘Rules of One,’’ which summarize normalization from
a positive point of view. One type of item is represented by one entity (table). The key to designing
a schema that avoids update anomalies is to ensure that each single fact in real life is modeled by a
single data point in the database. Three principles define a single data point:
■ One group of similar things is represented by one entity (table).
■ One thing is represented by one tuple (row).
■ One descriptive fact about the thing is represented by one attribute (column).
Grok these three simple rules and you’ll be a long way toward designing a properly normalized
database.
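The three rules can be read directly off a table definition. This hypothetical sketch is my own illustration, not a schema from the book:

```sql
-- Hypothetical illustration of the three Rules of One:
--   one group of similar things -> one entity (the Part table),
--   one thing                   -> one tuple (one row per part),
--   one descriptive fact        -> one attribute (one column per fact).
CREATE TABLE dbo.Part (
    PartID   INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- identifies one thing (one row)
    PartName NVARCHAR(50) NOT NULL,   -- one fact about the part
    WeightKg DECIMAL(8,2)     NULL    -- another single fact, in its own column
);
```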
Normalization As Story
The Time Traveler’s Wife, by Audrey Niffenegger, is one of my favorite books. Without giving away the
plot or any spoilers, it’s an amazing sci-fi romance story. She moves through time conventionally, while
he bounces uncontrollably through time and space. Even though the plot is more complex than the average
novel, I love how Ms. Niffenegger weaves every detail together into an intricate flow. Every detail fits and
builds the characters and the story.
In some ways, a database is like a good story. The plot of the story is in the data model, and the data
represents the characters and the details. Normalization is the grammar of the database.
When two writers tell the same story, each crafts the story differently. There’s no single correct way to tell a
story. Likewise, there may be multiple ways to model the database. There’s no single correct way to model
a database — as long as the database contains all the information needed to extract the story and it follows
the normalized grammar rules, the database will work. (Don’t take this to mean that any design might be
a correct design. While there may be multiple correct designs, there are many more incorrect designs.) A
corollary is that just as some books read better than others, so do some database schemas flow well, while
other database designs are difficult to query.
As with writing a novel, the foundation of data modeling is careful observation, an understanding of reality,
and clear thinking. Based on those insights, the data modeler constructs a logical system — a new virtual
world — that models a slice of reality. Therefore, how the designer views reality and identifies entities and
their interactions will influence the design of the virtual world. Like postmodernism, there’s no single
correct representation, only the viewpoint of the author/designer.
Identifying entities
The first step to designing a database conceptual diagram is to identify the entities (tables). Because any
entity represents only one type of thing, it takes several entities together to represent an entire process
or organization.
Entities are usually discovered from several sources:
■ Examining existing documents (order forms, registration forms, patient files, reports)
■ Interviews with subject-matter experts
■ Diagramming the process flow
At this early stage the goal is to simply collect a list of possible entities and their facts. Some of the
entities will be obvious nouns, such as customers, products, flights, materials, and machines.
Other entities will be verbs: shipping, processing, assembling parts to build a product. Verbs may be
entities, or they may indicate a relationship between two entities.
The goal is to simply collect all the possible entities and their attributes. At this early stage, it’s also
useful to document as many known relationships as possible, even if those relationships will be edited
several times.
Generalization
Normalization has a reputation of creating databases that are complex and unwieldy. It’s true that some
database schemas are far too complex, but I don’t believe normalization, by itself, is the root cause.
I’ve found that the difference between elegant databases that are a joy to query and overly complex
designs that make you want to polish your resume is the data modeler’s view of entities.
When identifying entities, there’s a continuum, illustrated in Figure 3-1, ranging from a broad
all-inclusive view to a very specific, narrow definition of the entity.
FIGURE 3-1
Entities can be identified along a continuum, from overly generalized with a single table, to overly
specific with too many tables.
[Figure: a continuum from an overly simple design with one table, through the data-driven sweet spot (fewer tables, easier to extend), to an overly complex design with many specific tables]
The overly simple view groups together entities that are in fact different types of things, e.g., storing
machines, products, and processes in a single entity. This approach might risk data integrity for two
reasons. First, it’s difficult to enforce referential integrity (foreign key constraints) because the primary
key attempts to represent multiple types of items. Second, these designs tend to merge entities with
different attributes, which means that many of the attributes (columns) won’t apply to various rows
and will simply be left null. Many nullable columns means the data will probably be sparsely filled and
inconsistent.
At the other extreme, the overly specific view segments entities that could be represented by a single
entity into multiple entities, e.g., splitting different types of subassemblies and finished products into
multiple different entities. This type of design risks flexibility and usability:
■ The additional tables create additional work at every layer of the software.
■ Database relationships become more complex because what could have been a single
relationship is now multiple relationships. For example, instead of relating an assembly process
between any part, the assembly relationship must now relate with multiple types of parts.
■ The database has now hard-coded the specific types of similar entities, making it very difficult
to add another similar type of entity. Using the manufacturing example again, if there’s an entity for every type of subassembly, then adding another type of subassembly means changes
at every level of the software.
The sweet spot in the middle generalizes, or combines, similar entities into single entities. This approach
creates a more flexible and elegant database design that is easier to query and extend:
■ Look for entities with similar attributes, or entities that share some attributes.
■ Look for types of entities that might have an additional similar entity added in the future.
■ Look for entities that might be summarized together in reports.
When designing a generalized entity, two techniques are essential:
■ Use a lookup entity to organize the types of entities. For the manufacturing example, a
SubassemblyType attribute would serve the purpose of organizing the parts by subassembly type. Typically, this would be a foreign key to a SubassemblyType entity.
■ Typically, the different entity types that could be generalized together do have some differences
(which is why a purist view would want to segment them). Employing the supertype/subtype pattern (discussed in the ‘‘Data Design Patterns’’ section) solves this dilemma perfectly.
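The two techniques above can be sketched together in DDL. The table and column names here are hypothetical stand-ins for the manufacturing example: a lookup entity classifies the generalized parts, and a subtype table holds the attributes that apply only to subassemblies:

```sql
-- Hypothetical sketch of a lookup entity plus the supertype/subtype
-- pattern. Common attributes live in the generalized Part supertype;
-- attributes unique to subassemblies live in a subtype table that
-- shares the supertype's primary key (a one-to-one relationship).
CREATE TABLE dbo.SubassemblyType (            -- lookup entity
    SubassemblyTypeID INT NOT NULL PRIMARY KEY,
    TypeName NVARCHAR(50) NOT NULL
);

CREATE TABLE dbo.Part (                       -- generalized supertype
    PartID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    SubassemblyTypeID INT NOT NULL
        REFERENCES dbo.SubassemblyType(SubassemblyTypeID),
    PartName NVARCHAR(50) NOT NULL
);

CREATE TABLE dbo.SubassemblyPart (            -- subtype: one row per subassembly part
    PartID INT NOT NULL PRIMARY KEY
        REFERENCES dbo.Part(PartID),          -- shares the supertype's key
    AssemblyMinutes INT NOT NULL              -- attribute specific to subassemblies
);
```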
I’ve heard from some that generalization sounds like denormalization — it’s not. When generalizing, it’s
critical that the entities comply with all the rules of normalization.
Generalized databases tend to be data-driven, have fewer tables, and are easier to extend. I was once
asked to optimize a database design that was modeled by a very specific-style data modeler. His design
had 78 entities; mine had 18 and covered more features. For which would you rather write stored
procedures?
On the other hand, be careful to merge entities only when they actually do share a root meaning in
the data. Don’t merge unlike entities just to save programming. The result will be more complex
programming.
Best Practice
Granted, knowing when to generalize and when to segment can be an art form and requires a repertoire of
database experience, but generalization is the buffer against database over-complexity, and consciously
working at understanding generalization is the key to becoming an excellent data modeler.
In my seminars I use an extreme example of specific vs. generalized design, asking groups of three to
four attendees to model the database in two ways: first using an overly specific data modeling technique,
and then modeling the database trying to hit the generalization sweet spot.
Assume your team has been contracted to develop a database for a cruise ship’s activity director — think
Julie McCoy, the cruise director on the Love Boat.
The cruise offers a lot of activities: tango dance lessons, tweetups, theater, scuba lessons, hang-gliding,
off-boat excursions, authentic Hawaiian luau, hula-dancing lessons, swimming lessons, Captain’s dinners,
aerobics, and the ever-popular shark-feeding scuba trips. These various activities have differing
requirements, are offered multiple times throughout the cruise, and some are held at different locations. A
passenger entity already exists; you’re expected to extend the database with new entities to handle activities
but still use the existing passenger entity.
In the seminars, the specialized designs often have an entity for every activity, every time an activity is
offered, activities at different locations, and even activity requirements. I believe the maximum number
of entities by a seminar group is 36. Admittedly, it’s an extreme example for illustration purposes, but
I’ve seen database designs in production using this style.
Each group’s generalized design tends to be similar to the one shown in Figure 3-2. A generalized
activity entity stores all activities and descriptions of their requirements organized by activity type. The
ActivityTime entity has one tuple (row) for every instance or offering of an activity, so if hula-dance
lessons are offered three times, there will be three tuples in this entity.
FIGURE 3-2
A generalized cruise activity design can easily accommodate new activities and locations.
[Figure: the generalized design, relating ActivityType, Activity, ActivityTime, and SignUp entities]
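One way the generalized cruise design might look in DDL is sketched below. The column names are my own hypothetical choices (the figure shows only the entities), and PassengerID is assumed to reference the pre-existing passenger entity:

```sql
-- Hypothetical DDL for the generalized cruise activity design.
CREATE TABLE dbo.ActivityType (       -- lookup entity organizing activities
    ActivityTypeID INT NOT NULL PRIMARY KEY,
    TypeName NVARCHAR(50) NOT NULL
);

CREATE TABLE dbo.Activity (           -- one row per kind of activity
    ActivityID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ActivityTypeID INT NOT NULL
        REFERENCES dbo.ActivityType(ActivityTypeID),
    ActivityName NVARCHAR(50) NOT NULL,
    Requirements NVARCHAR(200) NULL
);

CREATE TABLE dbo.ActivityTime (       -- one row per offering of an activity
    ActivityTimeID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ActivityID INT NOT NULL
        REFERENCES dbo.Activity(ActivityID),
    StartTime DATETIME NOT NULL,
    Location NVARCHAR(50) NULL
);

CREATE TABLE dbo.SignUp (             -- associates a passenger with an offering
    ActivityTimeID INT NOT NULL
        REFERENCES dbo.ActivityTime(ActivityTimeID),
    PassengerID INT NOT NULL,         -- FK to the existing passenger entity
    PRIMARY KEY (ActivityTimeID, PassengerID)
);
```

Adding a brand-new activity, or a new offering of an existing one, is then a data change (an insert), not a schema change — which is what makes the generalized design easy to extend.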
Primary keys
Perhaps the most important concept of an entity (table) is that it has a primary key — an attribute or
set of attributes that can be used to uniquely identify the tuple (row). Every entity must have a primary
key; without a primary key, it’s not a valid entity.
By definition, a primary key must be unique and must have a value (not null).
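Both single-column and multi-column (composite) primary keys satisfy that definition. This minimal sketch uses hypothetical airline tables for illustration:

```sql
-- Hypothetical examples of declaring a primary key. In both cases the
-- key columns must be unique across the table and must be NOT NULL.
CREATE TABLE dbo.Airport (
    AirportCode CHAR(3) NOT NULL PRIMARY KEY   -- natural single-column key
);

CREATE TABLE dbo.FlightLeg (
    FlightNumber INT NOT NULL,
    LegNumber    INT NOT NULL,
    CONSTRAINT PK_FlightLeg
        PRIMARY KEY (FlightNumber, LegNumber)  -- composite key: the pair is unique
);
```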