
Microsoft SQL Server 2008 Tutorial, Part 11 (PDF)



To use the standard organization chart as an example, each tuple in the employee entity represents one employee. Each employee reports to a supervisor who is also listed in the employee entity. The ReportsToID foreign key points to the supervisor’s primary key.

Because EmployeeID is a primary key and ReportsToID is a foreign key, the relationship cardinality is one-to-many, as shown in Figure 3-12. One manager may have several direct reports, but each employee may have only one manager.

FIGURE 3-12

The reflexive, or recursive, relationship is a one-to-many relationship between two tuples of the same entity. This shows the organization chart for members of the Adventure Works IT department.

Contact

Primary Key: ContactID    Foreign Key: ReportsToID
Ken Sánchez               <NULL>
Jean Trenary              Ken Sánchez
Stephanie Conroy          Jean Trenary
François Ajenstat         Jean Trenary
Dan Wilson                Jean Trenary
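The reflexive relationship in Figure 3-12 can be sketched as a single self-referencing table. This is a minimal sketch, not the book's code: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, the numeric ContactID values are assumptions, and the helper names direct_reports and chain_of_command are hypothetical. SQLite spells the recursive query WITH RECURSIVE, whereas SQL Server writes a recursive common table expression with plain WITH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# One entity; ReportsToID points back at the same table's primary key.
conn.execute("""
    CREATE TABLE Contact (
        ContactID   INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL,
        ReportsToID INTEGER REFERENCES Contact (ContactID)  -- NULL at the top
    )
""")
conn.executemany(
    "INSERT INTO Contact (ContactID, Name, ReportsToID) VALUES (?, ?, ?)",
    [
        (1, "Ken Sánchez", None),      # ID values are assumed for illustration
        (2, "Jean Trenary", 1),
        (3, "Stephanie Conroy", 2),
        (4, "François Ajenstat", 2),
        (5, "Dan Wilson", 2),
    ],
)

def direct_reports(manager_name):
    """One manager, many reports: self-join from employee to manager."""
    rows = conn.execute(
        """
        SELECT e.Name
        FROM Contact AS e
        JOIN Contact AS m ON e.ReportsToID = m.ContactID
        WHERE m.Name = ?
        ORDER BY e.ContactID
        """,
        (manager_name,),
    ).fetchall()
    return [name for (name,) in rows]

def chain_of_command(name):
    """Walk upward from one contact to the top of the chart."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain (ContactID, Name, ReportsToID, depth) AS (
            SELECT ContactID, Name, ReportsToID, 0
            FROM Contact WHERE Name = ?
            UNION ALL
            SELECT c.ContactID, c.Name, c.ReportsToID, chain.depth + 1
            FROM Contact AS c
            JOIN chain ON c.ContactID = chain.ReportsToID
        )
        SELECT Name FROM chain ORDER BY depth
        """,
        (name,),
    ).fetchall()
    return [n for (n,) in rows]

print(direct_reports("Jean Trenary"))
print(chain_of_command("Dan Wilson"))
```

Because each row carries only one ReportsToID, the schema itself enforces the rule that an employee has at most one manager.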

A bill of materials is a more complex form of the recursive pattern because a part may be built from several source parts, and the part may be used to build several parts in the next step of the manufacturing process, as illustrated in Figure 3-13.

FIGURE 3-13

The conceptual diagram of a many-to-many recursive relationship shows multiple cardinality at each end of the relationship.

[Diagram: the Part entity, related to itself with a many-to-many relationship]

An associative entity is required to resolve the many-to-many relationship between the component parts being used and the part being assembled. In the MaterialSpecification sample database, the BoM (bill of materials) associative entity has two foreign keys that both point to the Part entity, as shown in Figure 3-14. The first foreign key points to the part being built. The second foreign key points to the source parts.

FIGURE 3-14

The physical implementation of the many-to-many reflexive relationship must include an associative entity to resolve the many-to-many relationship, just like the many-to-many two-entity relationship.

[Figure: the Part entity (Primary Key: ContactID) holds Widget, Super Widget, Part A, Part B, Part C, Thing1, and Bolt; the BoM entity pairs them through two foreign keys, AssemblyID and ComponentID]

In the sample data, Part A is constructed from two parts (a Thing1 and a bolt) and is used in the assembly of two parts (Widget and SuperWidget).

The first foreign key points to the material being built. The second foreign key points to the source material.
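The bill-of-materials pattern can be sketched with two tables. This is a minimal sketch under stated assumptions: it runs on Python's sqlite3 module rather than SQL Server, the PartID column name and the helper functions components_of and used_in are hypothetical, and only the Part A relationships described in the text are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Part (
        PartID INTEGER PRIMARY KEY,      -- assumed surrogate key name
        Name   TEXT NOT NULL UNIQUE
    );
    -- Associative entity: both foreign keys point to Part.
    CREATE TABLE BoM (
        AssemblyID  INTEGER NOT NULL REFERENCES Part (PartID),  -- part being built
        ComponentID INTEGER NOT NULL REFERENCES Part (PartID),  -- source part
        PRIMARY KEY (AssemblyID, ComponentID)
    );
""")

conn.executemany(
    "INSERT INTO Part (Name) VALUES (?)",
    [(p,) for p in ["Widget", "SuperWidget", "Part A", "Thing1", "Bolt"]],
)
# (assembly, component) pairs taken from the sample data described in the text.
conn.executemany(
    """
    INSERT INTO BoM (AssemblyID, ComponentID)
    SELECT a.PartID, c.PartID
    FROM Part AS a, Part AS c
    WHERE a.Name = ? AND c.Name = ?
    """,
    [
        ("Widget", "Part A"),
        ("SuperWidget", "Part A"),
        ("Part A", "Thing1"),
        ("Part A", "Bolt"),
    ],
)

def components_of(assembly):
    """Source parts that go into the named assembly."""
    rows = conn.execute(
        """
        SELECT c.Name
        FROM BoM
        JOIN Part AS a ON a.PartID = BoM.AssemblyID
        JOIN Part AS c ON c.PartID = BoM.ComponentID
        WHERE a.Name = ?
        ORDER BY c.Name
        """,
        (assembly,),
    ).fetchall()
    return [n for (n,) in rows]

def used_in(component):
    """Assemblies that consume the named part."""
    rows = conn.execute(
        """
        SELECT a.Name
        FROM BoM
        JOIN Part AS a ON a.PartID = BoM.AssemblyID
        JOIN Part AS c ON c.PartID = BoM.ComponentID
        WHERE c.Name = ?
        ORDER BY a.Name
        """,
        (component,),
    ).fetchall()
    return [n for (n,) in rows]

print(components_of("Part A"))
print(used_in("Part A"))
```

The same BoM row answers both questions; only the direction of the join changes.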

Entity-Value Pairs Pattern

Every couple of months, I hear about data modelers working with the entity-value pairs pattern, also known as the entity-attribute-value (EAV) pattern, sometimes called the generic pattern or property bag/property table pattern, illustrated in Figure 3-15. In the SQL Server 2000 Bible, I called it the “dynamic/relational pattern.”


FIGURE 3-15

The entity-value pairs pattern is a simple design with only four tables: class/type, attribute/column, object/item, and value. The value table stores every value for every attribute for every item in one long list.

[Diagram: four entities: Class (Category), Object (Item), Attribute (Property), and Value]

This design can be popular when applications require dynamic attributes. Sometimes it’s used as an OO DBMS physical design within an RDBMS product. It’s also gaining popularity with cloud databases.

At first blush, the entity-value pairs pattern is attractive, novel, and appealing. It offers unlimited logical design alterations without any physical schema changes: the ultimate flexible, extensible design.

But there are problems. Many problems:

■ The entity-value pairs pattern lacks data integrity, specifically data typing. The data type is the most basic data constraint. The basic entity-value pairs pattern stores every value in a single nvarchar or sql_variant column and ignores data typing. One option that I wouldn’t recommend is to create a value table for each data type. While this adds data typing, it certainly complicates the code.

■ It’s difficult to query the entity-value pairs pattern. I’ve seen two solutions. The most common method is hard-coding .NET code to extract and normalize the data. Another option is to code-gen a table-valued UDF or crosstab view for each class/type to extract the data and return a normalized data set. This has the advantage of being usable in normal SQL queries, but performance and inserts/updates remain difficult. Either solution defeats the dynamic goal of the pattern.

■ Perhaps the greatest complaint against the entity-value pairs pattern is that it’s nearly impossible to enforce referential integrity.

Can the value-pairs pattern be an efficient, practical solution? I doubt it. I continue to hear of projects using this pattern that initially look promising and then fail under the weight of querying once it’s fully populated.

Nonetheless, someday I’d like to build out a complete EAV code-gen tool and test it under a heavy load, just for the fun of it.
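The typing and querying problems described above are easy to reproduce. The following is a minimal sketch, assuming Python's sqlite3 module in place of SQL Server; the table name AttrValue (for the value table), the Days attribute, and its values are hypothetical and exist only to show an untyped value slipping in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Class     (ClassID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Attribute (AttributeID INTEGER PRIMARY KEY,
                            ClassID INTEGER NOT NULL REFERENCES Class (ClassID),
                            Name TEXT NOT NULL);
    CREATE TABLE Object    (ObjectID INTEGER PRIMARY KEY,
                            ClassID INTEGER NOT NULL REFERENCES Class (ClassID));
    -- Every value for every attribute of every object, stored as text.
    CREATE TABLE AttrValue (ObjectID INTEGER NOT NULL REFERENCES Object (ObjectID),
                            AttributeID INTEGER NOT NULL REFERENCES Attribute (AttributeID),
                            ValueText TEXT,
                            PRIMARY KEY (ObjectID, AttributeID));

    INSERT INTO Class     VALUES (1, 'Tour');
    INSERT INTO Attribute VALUES (1, 1, 'Name'), (2, 1, 'Days');
    INSERT INTO Object    VALUES (1, 1), (2, 1);
    INSERT INTO AttrValue VALUES
        (1, 1, 'Amazon Trek'),
        (1, 2, '14'),
        (2, 1, 'Gauley River Rafting'),
        (2, 2, 'two');   -- hypothetical bad value: nothing enforces a number
""")

# A crosstab query must hard-code one CASE expression per attribute,
# so adding an attribute means rewriting (or regenerating) the query.
pivot = conn.execute("""
    SELECT o.ObjectID,
           MAX(CASE WHEN a.Name = 'Name' THEN v.ValueText END) AS Name,
           MAX(CASE WHEN a.Name = 'Days' THEN v.ValueText END) AS Days
    FROM Object AS o
    JOIN AttrValue AS v ON v.ObjectID = o.ObjectID
    JOIN Attribute AS a ON a.AttributeID = v.AttributeID
    GROUP BY o.ObjectID
    ORDER BY o.ObjectID
""").fetchall()

# Data typing has to be re-implemented in application code.
untyped_days = [days for (_, _, days) in pivot if not days.isdigit()]

print(pivot)
print(untyped_days)
```

The schema never changes when a new attribute is added, but every query that returns a normal row shape does.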


Database design layers

I’ve observed that every database can be visualized as three layers: domain integrity (lookup) layer, business visible layer, and supporting layer, as drawn in Figure 3-16.

FIGURE 3-16

Visualizing the database as three layers can be useful when designing the conceptual diagram and coding the SQL DDL implementation.

• Domain Integrity: lookup tables
• Business Entities (Visible): objects the user can describe
• Supporting Entities: associative tables

While you are designing the conceptual diagram, visualizing the database as three layers can help organize the entities and clarify the design. When the database design moves into the SQL DDL implementation phase, the database design layers become critical in optimizing the primary keys for performance.

The center layer contains those entities that the client or subject-matter expert would readily recognize and understand. These are the main work tables that contain working data such as transaction, account, or contact information. When a user enters data on a daily basis, these are the tables hit by the insert and update. I refer to this layer as the visible layer or the business entity layer.

Above the business entity layer is the domain integrity layer. This top layer has the entities used for validating foreign key values. These tables may or may not be recognizable by the subject-matter expert or a typical end-user. The key point is that they are used only to maintain the list of what’s legal for a foreign key, and they are rarely updated once initially populated.

Below the visible layer live the tables that are a mystery to the end-user. Associative tables used to materialize a many-to-many logical relationship are a perfect example of a supporting table. Like the visible layer, these tables are often heavily updated.

Normal Forms

Taking a detailed look at the normal forms moves this chapter into a more formal study of relational database design.

Contrary to popular opinion, the forms are not a progressive methodology, but they do represent a progressive level of compliance. Technically, you can’t be in 2NF until 1NF has been met. Don’t plan on designing an entity and moving it through first normal form to second normal form, and so on. Each normal form is simply a different type of data integrity fault to be avoided.


First normal form (1NF)

The first normal form means the data is in an entity format, such that the following three conditions are met:

■ Every unit of data is represented within scalar attributes. A scalar value is a value “capable of being represented by a point on a scale,” according to Merriam-Webster.

Every attribute must contain one unit of data, and each unit of data must fill one attribute. Designs that embed multiple pieces of information within an attribute violate the first normal form. Likewise, if multiple attributes must be combined in some way to determine a single unit of data, then the attribute design is incomplete.

■ All data must be represented in unique attributes. Each attribute must have a unique name and a unique purpose. An entity should have no repeating attributes. If the attributes repeat, or the entity is very wide, then the object is too broadly designed.

A design that repeats attributes, such as an order entity that includes item1, item2, and item3 attributes to hold multiple line items, violates the first normal form.

■ All data must be represented within unique tuples. If the entity design requires or permits duplicate tuples, that design violates the first normal form.

If the design requires multiple tuples to represent a single item, or multiple items are represented by a single tuple, then the table violates first normal form.

For an example of the first normal form in action, consider the listing of base camps and tours from the Cape Hatteras Adventures database. Table 3-3 shows base camp data in a model that violates the first normal form. The repeating tour attribute is not unique.

TABLE 3-3

Violating the First Normal Form

Ashville          Appalachian Trail          Blue Ridge Parkway Hike
Cape Hatteras     Outer Banks Lighthouses
Ft. Lauderdale    Amazon Trek
West Virginia     Gauley River Rafting

To redesign the data model so that it complies with the first normal form, resolve the repeating group of tour attributes into a single unique attribute, as shown in Table 3-4, and then move any multiple values to a unique tuple. The BaseCamp entity contains a unique tuple for each base camp, and the Tour entity’s BaseCampID refers to the primary key in the BaseCamp entity.


TABLE 3-4

Conforming to the First Normal Form

Gauley River Rafting
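The redesign described above, from repeating tour attributes to one tuple per tour, can be sketched as follows. This is a minimal sketch assuming Python's sqlite3 module instead of SQL Server; the TourID and Name columns and the wide_rows variable are hypothetical, while the base camp and tour names come from Table 3-3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE BaseCamp (
        BaseCampID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Tour (
        TourID     INTEGER PRIMARY KEY,
        BaseCampID INTEGER NOT NULL REFERENCES BaseCamp (BaseCampID),
        Name       TEXT NOT NULL UNIQUE
    );
""")

# The 1NF-violating shape: one row per base camp with repeating tour columns.
wide_rows = [
    ("Ashville", "Appalachian Trail", "Blue Ridge Parkway Hike"),
    ("Cape Hatteras", "Outer Banks Lighthouses", None),
    ("Ft. Lauderdale", "Amazon Trek", None),
    ("West Virginia", "Gauley River Rafting", None),
]

# Resolve the repeating group: one BaseCamp tuple, one Tour tuple per tour.
for camp, *tours in wide_rows:
    camp_id = conn.execute(
        "INSERT INTO BaseCamp (Name) VALUES (?)", (camp,)
    ).lastrowid
    for tour in tours:
        if tour is not None:
            conn.execute(
                "INSERT INTO Tour (BaseCampID, Name) VALUES (?, ?)",
                (camp_id, tour),
            )

tours_by_camp = conn.execute("""
    SELECT b.Name, t.Name
    FROM BaseCamp AS b
    JOIN Tour AS t ON t.BaseCampID = b.BaseCampID
    ORDER BY b.Name, t.Name
""").fetchall()

for row in tours_by_camp:
    print(row)
```

A base camp can now offer any number of tours without adding columns, and a search for a tour looks in exactly one attribute.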

Another example of a data structure that desperately needs to adhere to the first normal form is a corporate product code that embeds the department, model, color, size, and so forth within the code. I’ve even seen product codes that were so complex they included digits to signify the syntax for the following digits.

In a theoretical sense, this type of design is wrong because the attribute isn’t a scalar value. In practical terms, it has the following problems:

■ Using a digit or two for each data element means that the database will soon run out of possible data values.

■ Databases don’t index based on the internal values of a string, so searches require scanning the entire table and parsing each value.

■ Business rules are difficult to code and enforce.

Entities with non-scalar attributes need to be completely redesigned so that each individual data element has its own attribute. Smart keys may be useful for humans, but a smart key is best generated by combining data from the tables.
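One way to keep a human-readable smart key without storing it is to generate it from scalar columns. This is a minimal sketch, assuming Python's sqlite3 module; the Product table, its columns, the code format, and the sample values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each data element gets its own scalar attribute.
    CREATE TABLE Product (
        ProductID  INTEGER PRIMARY KEY,
        Department TEXT NOT NULL,
        Model      TEXT NOT NULL,
        Color      TEXT NOT NULL,
        Size       TEXT NOT NULL
    );
    -- A scalar column can be indexed; a fragment of a code string cannot.
    CREATE INDEX IX_Product_Color ON Product (Color);

    -- The smart key is derived for display, never stored or parsed.
    CREATE VIEW ProductWithCode AS
        SELECT ProductID, Department, Model, Color, Size,
               Department || '-' || Model || '-' || Color || '-' || Size
                   AS ProductCode
        FROM Product;
""")
conn.executemany(
    "INSERT INTO Product (Department, Model, Color, Size) VALUES (?, ?, ?, ?)",
    [
        ("OUT", "TENT4", "GRN", "L"),   # hypothetical sample rows
        ("OUT", "TENT4", "BLU", "L"),
    ],
)

# Filter on the scalar attribute, then show the generated code.
green_codes = [
    code
    for (code,) in conn.execute(
        "SELECT ProductCode FROM ProductWithCode WHERE Color = ? ORDER BY ProductID",
        ("GRN",),
    )
]
print(green_codes)
```

If the code format ever changes, only the view changes; the stored data stays scalar.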

Second normal form (2NF)

The second normal form ensures that each attribute does in fact describe the entity. It’s a dependency issue. Does the attribute depend on, or describe, the item identified by the primary key?

If the entity’s primary key is a single value, this isn’t too difficult. Composite primary keys can sometimes get into trouble with the second normal form if the attributes aren’t dependent on every attribute in the primary key. If an attribute depends on one of the primary key attributes but not the other, that is a partial dependency, which violates the second normal form.

An example of a data model that violates the second normal form is one in which the base camp phone number is added to the BaseCampTour entity, as shown in Table 3-5. Assume that the primary key (PK) is a composite of both the BaseCamp and the Tour, and that the phone number is a permanent phone number for the base camp, not a phone number assigned for each tour.

TABLE 3-5

Violating the Second Normal Form

The problem with this design is that the phone number is an attribute of the base camp but not the tour, so the PhoneNumber attribute is only partially dependent on the entity’s primary key.

An obvious practical problem with this design is that updating the phone number requires either updating multiple tuples or risking having two phone numbers for the same base camp.

The solution is to remove the partially dependent attribute from the entity with the composite key, and create an entity with a unique primary key for the base camp, as shown in Table 3-6. This new entity is then an appropriate location for the dependent attribute.

TABLE 3-6

Conforming to the Second Normal Form

West Virginia Gauley River Rafting


The PhoneNumber attribute is now fully dependent on the entity’s primary key. Each phone number is stored in only one location, and no partial dependencies exist.
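The second-normal-form fix can be sketched as two tables, with the phone number moved to the entity whose key it fully depends on. This is a minimal sketch, assuming Python's sqlite3 module in place of SQL Server; the phone numbers are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- PhoneNumber depends on the whole (single-column) key of BaseCamp.
    CREATE TABLE BaseCamp (
        BaseCamp    TEXT PRIMARY KEY,
        PhoneNumber TEXT NOT NULL
    );
    -- The composite-key entity keeps only what depends on both key parts.
    CREATE TABLE BaseCampTour (
        BaseCamp TEXT NOT NULL REFERENCES BaseCamp (BaseCamp),
        Tour     TEXT NOT NULL,
        PRIMARY KEY (BaseCamp, Tour)
    );
""")
conn.execute("INSERT INTO BaseCamp VALUES (?, ?)", ("Ashville", "555-0100"))
conn.executemany(
    "INSERT INTO BaseCampTour VALUES (?, ?)",
    [
        ("Ashville", "Appalachian Trail"),
        ("Ashville", "Blue Ridge Parkway Hike"),
    ],
)

# Changing the number touches exactly one tuple, however many tours exist.
cur = conn.execute(
    "UPDATE BaseCamp SET PhoneNumber = ? WHERE BaseCamp = ?",
    ("555-0199", "Ashville"),
)
rows_updated = cur.rowcount

# Every tour sees the same, single phone number through the join.
phones = conn.execute("""
    SELECT DISTINCT b.PhoneNumber
    FROM BaseCampTour AS t
    JOIN BaseCamp AS b ON b.BaseCamp = t.BaseCamp
""").fetchall()

print(rows_updated, phones)
```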

Third normal form (3NF)

The third normal form checks for transitive dependencies. A transitive dependency is similar to a partial dependency in that they both refer to attributes that are not fully dependent on a primary key. A dependency is transitive when attribute1 is dependent on attribute2, which is dependent on the primary key.

The second normal form is violated when an attribute depends on part of the key. The third normal form is violated when the attribute does depend on the key but also depends on another non-key attribute.

The key phrase when describing third normal form is that every attribute “must provide a fact about the key, the whole key, and nothing but the key.”

Just as with the second normal form, the third normal form is resolved by moving the non-dependent attribute to a new entity.

Continuing with the Cape Hatteras Adventures example, a guide is assigned as the lead guide responsible for each base camp. The BaseCampGuide attribute belongs in the BaseCamp entity, but it is a violation of the third normal form if other information describing the guide is stored in the base camp, as shown in Table 3-7.

TABLE 3-7

Violating the Third Normal Form

Base Camp Entity

The DateOfHire describes the guide, not the base camp, so the hire-date attribute is not directly dependent on the BaseCamp entity’s primary key. The DateOfHire’s dependency is transitive: it depends on the key only by going through the non-key LeadGuide attribute.

Creating a Guide entity and moving its attributes to the new entity resolves the violation of the third normal form and cleans up the logical design, as demonstrated in Table 3-8.


TABLE 3-8

Conforming to the Third Normal Form
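The third-normal-form split can be sketched as a Guide entity referenced from BaseCamp. This is a minimal sketch, assuming Python's sqlite3 module rather than SQL Server; the guide name, the hire date, and the choice to have one guide lead two base camps are hypothetical and serve only to show that the hire date is stored once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Facts about the guide live with the guide's own key.
    CREATE TABLE Guide (
        Guide      TEXT PRIMARY KEY,
        DateOfHire TEXT NOT NULL
    );
    -- BaseCamp keeps only the reference to its lead guide.
    CREATE TABLE BaseCamp (
        BaseCamp  TEXT PRIMARY KEY,
        LeadGuide TEXT NOT NULL REFERENCES Guide (Guide)
    );
""")
conn.execute("INSERT INTO Guide VALUES (?, ?)", ("Guide A", "2001-01-15"))
conn.executemany(
    "INSERT INTO BaseCamp VALUES (?, ?)",
    [("Ashville", "Guide A"), ("Cape Hatteras", "Guide A")],
)

# The hire date is stored in exactly one tuple...
guide_rows = conn.execute("SELECT COUNT(*) FROM Guide").fetchone()[0]

# ...and is still available for every base camp through the join.
camp_guides = conn.execute("""
    SELECT b.BaseCamp, g.Guide, g.DateOfHire
    FROM BaseCamp AS b
    JOIN Guide AS g ON g.Guide = b.LeadGuide
    ORDER BY b.BaseCamp
""").fetchall()

print(guide_rows, camp_guides)
```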

Best Practice

If the entity has a good primary key and every attribute is scalar and fully dependent on the primary key, then the logical design is in the third normal form. Most database designs stop at the third normal form.

The additional forms prevent problems with more complex logical designs. If you tend to work with mind-bending modeling problems and develop creative solutions, then understanding the advanced forms will prove useful.

The Boyce-Codd normal form (BCNF)

The Boyce-Codd normal form occurs between the third and fourth normal forms, and it handles a problem with an entity that has multiple candidate keys. One of the candidate keys is chosen as the primary key and the others become alternate keys. For example, a person might be uniquely identified by his or her social security number (ssn), employee number, and driver’s license number. If the ssn is the primary key, then the employee number and driver’s license number are the alternate keys.

The Boyce-Codd normal form simply stipulates that in such a case every attribute must describe every candidate key. If an attribute describes one of the candidate keys but not another candidate key, then the entity violates BCNF.

Fourth normal form (4NF)

The fourth normal form deals with problems created by complex composite primary keys. If two independent attributes are brought together to form a primary key along with a third attribute but the two attributes don’t really uniquely identify the entity without the third attribute, then the design violates the fourth normal form.


For example, assume the following conditions:

1. The BaseCamp and the base camp’s LeadGuide were used as a composite primary key.

2. An Event and the Guide were brought together as a primary key.

3. Because both used a guide, all three were combined into a single entity.

The preceding example violates the fourth normal form.

The fourth normal form is used to help identify entities that should be split into separate entities. Usually this is only an issue if large composite primary keys have brought too many disparate objects into a single entity.

Fifth normal form (5NF)

The fifth normal form provides the method for designing complex relationships that involve multiple (three or more) entities. A three-way or ternary relationship, if properly designed, is in the fifth normal form. The cardinality of any of the relationships could be one or many. What makes it a ternary relationship is the number of related entities.

As an example of a ternary relationship, consider a manufacturing process that involves an operator, a machine, and a bill of materials. From one point of view, this could be an operation entity with three foreign keys. Alternately, it could be thought of as a ternary relationship with additional attributes.

Just like a two-entity many-to-many relationship, a ternary relationship requires a resolution entity in the physical schema design to resolve the many-to-many relationship into multiple artificial one-to-many relationships; but in this case the resolution entity has three or more foreign keys.

In such a complex relationship, the fifth normal form requires that each entity, if separated from the ternary relationship, remains a proper entity without any loss of data.
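The operator, machine, and bill-of-materials example can be sketched as a resolution entity with three foreign keys. This is a minimal sketch, assuming Python's sqlite3 module in place of SQL Server; every table name, column name, and sample row is hypothetical, and the sketch shows only the physical shape of a ternary relationship, not a proof that a given design is in fifth normal form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Each entity stands on its own, independent of the relationship.
    CREATE TABLE Operator        (OperatorID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Machine         (MachineID  INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE BillOfMaterials (BoMID      INTEGER PRIMARY KEY, Name TEXT NOT NULL);

    -- Resolution entity: three foreign keys, one per related entity.
    CREATE TABLE Operation (
        OperatorID INTEGER NOT NULL REFERENCES Operator (OperatorID),
        MachineID  INTEGER NOT NULL REFERENCES Machine (MachineID),
        BoMID      INTEGER NOT NULL REFERENCES BillOfMaterials (BoMID),
        PRIMARY KEY (OperatorID, MachineID, BoMID)
    );

    INSERT INTO Operator        VALUES (1, 'Operator 1');
    INSERT INTO Machine         VALUES (1, 'Machine 1');
    INSERT INTO BillOfMaterials VALUES (1, 'BoM 1');
""")

# A valid operation references one existing row in each of the three entities.
conn.execute("INSERT INTO Operation VALUES (1, 1, 1)")

# An operation that names a machine that does not exist is rejected.
try:
    conn.execute("INSERT INTO Operation VALUES (1, 99, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

operation_count = conn.execute("SELECT COUNT(*) FROM Operation").fetchone()[0]
print(rejected, operation_count)
```

Dropping the Operation table would lose only the relationship; the operator, machine, and bill-of-materials rows remain complete entities.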

It’s commonly stated that third normal form is enough. Boyce-Codd, fourth, and fifth normal forms may be complex, but violating them can cause severe problems. It’s not a matter of more entities vs. fewer entities; it’s a matter of properly aligned attributes and keys.

As I mentioned earlier in this chapter, Louis Davidson (aka Dr. SQL) and I co-present a session at conferences on database design. I recommend his book Pro SQL Server 2008 Relational Database Design and Implementation (Apress, 2008).

Summary

Relational database design, covered in Chapter 2, showed why the database physical schema is critical to the database’s performance. This chapter looked at the theory behind the logical correctness of the database design and the many patterns used to assemble a database schema.

■ There are three phases in database design: the conceptual (diagramming) phase, the SQL DDL (create table) phase, and the physical layer (partition and file location) phase. Databases designed with only the conceptual phase perform poorly.

Date posted: 04/07/2014, 09:20
