To use the standard organization chart as an example, each tuple in the employee entity represents one employee. Each employee reports to a supervisor who is also listed in the employee entity. The ReportsToID foreign key points to the supervisor’s primary key.
Because EmployeeID is a primary key and ReportsToID is a foreign key, the relationship cardinality is one-to-many, as shown in Figure 3-12. One manager may have several direct reports, but each employee may have only one manager.
FIGURE 3-12
The reflexive, or recursive, relationship is a one-to-many relationship between two tuples of the same entity. This shows the organization chart for members of the Adventure Works IT department.
Contact
Primary Key: ContactID    Foreign Key: ReportsToID
Ken Sánchez               <NULL>
Jean Trenary              Ken Sánchez
Stephanie Conroy          Jean Trenary
François Ajenstat         Jean Trenary
Dan Wilson                Jean Trenary
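The reflexive relationship in Figure 3-12 can be sketched as a single table whose foreign key points back at its own primary key. The following is a minimal illustration using Python's built-in sqlite3 module rather than SQL Server; the numeric ContactID values and the direct_reports helper are assumptions made for the example, not part of the sample database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# One entity; ReportsToID refers to another tuple of the same entity.
conn.execute("""
    CREATE TABLE Contact (
        ContactID   INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL,
        ReportsToID INTEGER REFERENCES Contact(ContactID)  -- NULL at the top of the chart
    )
""")

# ContactID values are hypothetical; names and reporting lines follow Figure 3-12.
conn.executemany(
    "INSERT INTO Contact (ContactID, Name, ReportsToID) VALUES (?, ?, ?)",
    [
        (1, "Ken Sánchez", None),
        (2, "Jean Trenary", 1),
        (3, "Stephanie Conroy", 2),
        (4, "François Ajenstat", 2),
        (5, "Dan Wilson", 2),
    ],
)


def direct_reports(manager_name):
    """Return the names of contacts who report directly to the given manager."""
    rows = conn.execute(
        """
        SELECT e.Name
        FROM Contact AS e
        JOIN Contact AS m ON e.ReportsToID = m.ContactID
        WHERE m.Name = ?
        ORDER BY e.ContactID
        """,
        (manager_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(direct_reports("Jean Trenary"))
```

The self-join treats the same table once as the employee side (e) and once as the manager side (m), which is how the one-to-many cardinality shows up in a query.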
A bill of materials is a more complex form of the recursive pattern because a part may be built from several source parts, and the part may be used to build several parts in the next step of the manufacturing process, as illustrated in Figure 3-13.
FIGURE 3-13
The conceptual diagram of a many-to-many recursive relationship shows multiple cardinality at each end of the relationship.
[Diagram: the Part entity related to itself.]
An associative entity is required to resolve the many-to-many relationship between the component parts being used and the part being assembled. In the MaterialSpecification sample database, the BoM (bill of materials) associative entity has two foreign keys that both point to the Part entity, as shown in Figure 3-14. The first foreign key points to the part being built. The second foreign key points to the source parts.
FIGURE 3-14
The physical implementation of the many-to-many reflexive relationship must include an associative entity to resolve the many-to-many relationship, just like the many-to-many two-entity relationship.
[Diagram: the Part entity (Part A, Part B, Part C, Widget, Super Widget, Thing1, Bolt) and the BoM associative entity, whose AssemblyID and ComponentID foreign keys both point to Part.]
In the sample data, Part A is constructed from two parts (a Thing1 and a bolt) and is used in the assembly of two parts (Widget and SuperWidget).
The first foreign key points to the material being built. The second foreign key points to the source material.
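As a rough sketch of the associative entity described above, the following uses Python's sqlite3 module to build a Part table and a BoM table with two foreign keys into Part. Only the relationships stated in the text for Part A are loaded. The PartID column name and the two helper functions are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Part (
        PartID INTEGER PRIMARY KEY,   -- hypothetical surrogate key
        Name   TEXT NOT NULL UNIQUE
    );
    -- Associative entity: both foreign keys point to Part.
    CREATE TABLE BoM (
        AssemblyID  INTEGER NOT NULL REFERENCES Part(PartID),  -- the part being built
        ComponentID INTEGER NOT NULL REFERENCES Part(PartID),  -- a source part
        PRIMARY KEY (AssemblyID, ComponentID)
    );
""")

parts = ["Part A", "Thing1", "Bolt", "Widget", "SuperWidget"]
conn.executemany("INSERT INTO Part (Name) VALUES (?)", [(p,) for p in parts])
part_id = dict(conn.execute("SELECT Name, PartID FROM Part"))

# (assembly, component) pairs taken from the sample-data description.
bom_rows = [
    ("Part A", "Thing1"),
    ("Part A", "Bolt"),
    ("Widget", "Part A"),
    ("SuperWidget", "Part A"),
]
conn.executemany(
    "INSERT INTO BoM (AssemblyID, ComponentID) VALUES (?, ?)",
    [(part_id[a], part_id[c]) for a, c in bom_rows],
)


def components_of(assembly):
    """Names of the source parts used to build the given part."""
    rows = conn.execute(
        """
        SELECT c.Name
        FROM BoM
        JOIN Part AS a ON a.PartID = BoM.AssemblyID
        JOIN Part AS c ON c.PartID = BoM.ComponentID
        WHERE a.Name = ?
        ORDER BY c.Name
        """,
        (assembly,),
    ).fetchall()
    return [name for (name,) in rows]


def used_in(component):
    """Names of the parts that are assembled from the given part."""
    rows = conn.execute(
        """
        SELECT a.Name
        FROM BoM
        JOIN Part AS a ON a.PartID = BoM.AssemblyID
        JOIN Part AS c ON c.PartID = BoM.ComponentID
        WHERE c.Name = ?
        ORDER BY a.Name
        """,
        (component,),
    ).fetchall()
    return [name for (name,) in rows]


print(components_of("Part A"), used_in("Part A"))
```

Reading BoM from the AssemblyID side answers "what is this built from," and reading it from the ComponentID side answers "where is this used."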
Entity-Value Pairs Pattern
Every couple of months, I hear about data modelers working with the entity-value pairs pattern, also known as the entity-attribute-value (EAV) pattern, sometimes called the generic pattern or property bag/property table pattern, illustrated in Figure 3-15. In the SQL Server 2000 Bible, I called it the “dynamic/relational pattern.”
FIGURE 3-15
The entity-value pairs pattern is a simple design with only four tables: class/type, attribute/column, object/item, and value. The value table stores every value for every attribute for every item in one long list.
[Diagram: Class (Category), Object (Item), Attribute (Property), and Value tables.]
This design can be popular when applications require dynamic attributes. Sometimes it’s used as an OO DBMS physical design within an RDBMS product. It’s also gaining popularity with cloud databases.
At first blush, the entity-value pairs pattern is attractive, novel, and appealing. It offers unlimited logical design alterations without any physical schema changes: the ultimate flexible, extensible design.
But there are problems. Many problems.
■ The entity-value pairs pattern lacks data integrity, specifically data typing. The data type is the most basic data constraint. The basic entity-value pairs pattern stores every value in a single nvarchar or sql_variant column and ignores data typing. One option that I wouldn’t recommend is to create a value table for each data type. While this adds data typing, it certainly complicates the code.
■ It’s difficult to query the entity-value pairs pattern. I’ve seen two solutions. The most common method is hard-coding .NET code to extract and normalize the data. Another option is to code-gen a table-valued UDF or crosstab view for each class/type to extract the data and return a normalized data set. This has the advantage of being usable in normal SQL queries, but performance and inserts/updates remain difficult. Either solution defeats the dynamic goal of the pattern.
■ Perhaps the greatest complaint against the entity-value pairs pattern is that it’s nearly impossible to enforce referential integrity.
Can the value-pairs pattern be an efficient, practical solution? I doubt it. I continue to hear of projects using this pattern that initially look promising and then fail under the weight of querying once it’s fully populated.
Nonetheless, someday I’d like to build out a complete EAV code-gen tool and test it under a heavy load, just for the fun of it.
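To make the typing and querying problems concrete, here is a small sketch of the four-table design using Python's sqlite3 module. The table and column names (Class, Attribute, Item, ItemValue, ValueText) and the MaxGuests attribute are hypothetical, chosen only for the illustration. It is not a recommendation of the pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Class     (ClassID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Attribute (AttributeID INTEGER PRIMARY KEY,
                            ClassID INTEGER NOT NULL REFERENCES Class(ClassID),
                            Name TEXT NOT NULL);
    CREATE TABLE Item      (ItemID INTEGER PRIMARY KEY,
                            ClassID INTEGER NOT NULL REFERENCES Class(ClassID));
    -- Every value for every attribute of every item lands in one text column.
    CREATE TABLE ItemValue (ItemID INTEGER NOT NULL REFERENCES Item(ItemID),
                            AttributeID INTEGER NOT NULL REFERENCES Attribute(AttributeID),
                            ValueText TEXT,
                            PRIMARY KEY (ItemID, AttributeID));

    INSERT INTO Class VALUES (1, 'Tour');
    INSERT INTO Attribute VALUES (1, 1, 'Name'), (2, 1, 'MaxGuests');
    INSERT INTO Item VALUES (1, 1), (2, 1);
    INSERT INTO ItemValue VALUES
        (1, 1, 'Gauley River Rafting'), (1, 2, '12'),
        (2, 1, 'Amazon Trek'),          (2, 2, 'a dozen');  -- nothing rejects this
""")

# A hand-written crosstab is needed to get back to one row per item.
# The attribute names are hard-coded, which defeats the "dynamic" goal.
tours = conn.execute("""
    SELECT i.ItemID,
           MAX(CASE WHEN a.Name = 'Name'      THEN v.ValueText END) AS Name,
           MAX(CASE WHEN a.Name = 'MaxGuests' THEN v.ValueText END) AS MaxGuests
    FROM Item AS i
    JOIN ItemValue AS v ON v.ItemID = i.ItemID
    JOIN Attribute AS a ON a.AttributeID = v.AttributeID
    GROUP BY i.ItemID
    ORDER BY i.ItemID
""").fetchall()
print(tours)
```

Note that 'a dozen' is accepted for what is meant to be a numeric attribute, and that adding a new attribute means rewriting the crosstab query.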
Database design layers
I’ve observed that every database can be visualized as three layers: domain integrity (lookup) layer, business visible layer, and supporting layer, as drawn in Figure 3-16.
FIGURE 3-16
Visualizing the database as three layers can be useful when designing the conceptual diagram and coding the SQL DDL implementation.
• Domain Integrity: lookup tables
• Business Entities (Visible): objects the user can describe
• Supporting Entities: associative tables
While you are designing the conceptual diagram, visualizing the database as three layers can help organize the entities and clarify the design. When the database design moves into the SQL DDL implementation phase, the database design layers become critical in optimizing the primary keys for performance.
The center layer contains those entities that the client or subject-matter expert would readily recognize and understand. These are the main work tables that contain working data such as transaction, account, or contact information. When a user enters data on a daily basis, these are the tables hit by the insert and update. I refer to this layer as the visible layer or the business entity layer.
Above the business entity layer is the domain integrity layer. This top layer has the entities used for validating foreign key values. These tables may or may not be recognizable by the subject-matter expert or a typical end-user. The key point is that they are used only to maintain the list of what’s legal for a foreign key, and they are rarely updated once initially populated.
Below the visible layer live the tables that are a mystery to the end-user. Associative tables used to materialize a many-to-many logical relationship are a perfect example of a supporting table. Like the visible layer, these tables are often heavily updated.
Normal Forms
Taking a detailed look at the normal forms moves this chapter into a more formal study of relational database design.
Contrary to popular opinion, the forms are not a progressive methodology, but they do represent a progressive level of compliance. Technically, you can’t be in 2NF until 1NF has been met. Don’t plan on designing an entity and moving it through first normal form to second normal form, and so on. Each normal form is simply a different type of data integrity fault to be avoided.
First normal form (1NF)
The first normal form means the data is in an entity format, such that the following three conditions are met:
■ Every unit of data is represented within scalar attributes. A scalar value is a value “capable of being represented by a point on a scale,” according to Merriam-Webster.
Every attribute must contain one unit of data, and each unit of data must fill one attribute. Designs that embed multiple pieces of information within an attribute violate the first normal form. Likewise, if multiple attributes must be combined in some way to determine a single unit of data, then the attribute design is incomplete.
■ All data must be represented in unique attributes. Each attribute must have a unique name and a unique purpose. An entity should have no repeating attributes. If the attributes repeat, or the entity is very wide, then the object is too broadly designed.
A design that repeats attributes, such as an order entity that includes item1, item2, and item3 attributes to hold multiple line items, violates the first normal form.
■ All data must be represented within unique tuples. If the entity design requires or permits duplicate tuples, that design violates the first normal form.
If the design requires multiple tuples to represent a single item, or multiple items are represented by a single tuple, then the table violates first normal form.
For an example of the first normal form in action, consider the listing of base camps and tours from the Cape Hatteras Adventures database. Table 3-3 shows base camp data in a model that violates the first normal form. The repeating tour attribute is not unique.
TABLE 3-3
Violating the First Normal Form
BaseCamp          Tour1                      Tour2
Ashville          Appalachian Trail          Blue Ridge Parkway Hike
Cape Hatteras     Outer Banks Lighthouses
Ft. Lauderdale    Amazon Trek
West Virginia     Gauley River Rafting
To redesign the data model so that it complies with the first normal form, resolve the repeating group of tour attributes into a single unique attribute, as shown in Table 3-4, and then move any multiple values to a unique tuple. The BaseCamp entity contains a unique tuple for each base camp, and the Tour entity’s BaseCampID refers to the primary key in the BaseCamp entity.
TABLE 3-4
Conforming to the First Normal Form
Gauley River Rafting
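The redesign described above can be sketched with Python's sqlite3 module: the repeating tour attributes become tuples in a Tour entity whose BaseCampID refers back to BaseCamp. The base camp and tour names come from Table 3-3. The TourID column, the generated key values, and the tours_for helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE BaseCamp (
        BaseCampID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Tour (
        TourID     INTEGER PRIMARY KEY,   -- hypothetical surrogate key
        BaseCampID INTEGER NOT NULL REFERENCES BaseCamp(BaseCampID),
        Name       TEXT NOT NULL
    );
""")

# The wide, repeating-attribute shape of Table 3-3: one row per base camp,
# with however many tour values happen to be present.
wide_rows = [
    ("Ashville", ["Appalachian Trail", "Blue Ridge Parkway Hike"]),
    ("Cape Hatteras", ["Outer Banks Lighthouses"]),
    ("Ft. Lauderdale", ["Amazon Trek"]),
    ("West Virginia", ["Gauley River Rafting"]),
]

# Move each repeated value to its own tuple in Tour.
for camp, tours in wide_rows:
    cur = conn.execute("INSERT INTO BaseCamp (Name) VALUES (?)", (camp,))
    conn.executemany(
        "INSERT INTO Tour (BaseCampID, Name) VALUES (?, ?)",
        [(cur.lastrowid, tour) for tour in tours],
    )


def tours_for(camp):
    """Return the tour names that belong to one base camp."""
    rows = conn.execute(
        """
        SELECT t.Name
        FROM Tour AS t
        JOIN BaseCamp AS b ON b.BaseCampID = t.BaseCampID
        WHERE b.Name = ?
        ORDER BY t.TourID
        """,
        (camp,),
    ).fetchall()
    return [name for (name,) in rows]


print(tours_for("Ashville"))
```

With this shape, a base camp can gain a third or fourth tour by adding a tuple, with no change to the table definition.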
Another example of a data structure that desperately needs to adhere to the first normal form is a corporate product code that embeds the department, model, color, size, and so forth within the code. I’ve even seen product codes that were so complex they included digits to signify the syntax for the following digits.
In a theoretical sense, this type of design is wrong because the attribute isn’t a scalar value. In practical terms, it has the following problems:
■ Using a digit or two for each data element means that the database will soon run out of possible data values.
■ Databases don’t index based on the internal values of a string, so searches require scanning the entire table and parsing each value.
■ Business rules are difficult to code and enforce.
Entities with non-scalar attributes need to be completely redesigned so that each individual data element has its own attribute. Smart keys may be useful for humans, but it is best if they are generated by combining data from the tables.
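One way to read that advice, sketched with Python's sqlite3 module: store each element in its own scalar attribute and derive the human-readable code from them in a view. The Product table, its columns, the ProductCode view, the code format, and the sample values are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each data element gets its own scalar attribute.
    CREATE TABLE Product (
        ProductID  INTEGER PRIMARY KEY,
        Department TEXT NOT NULL,
        Model      TEXT NOT NULL,
        Color      TEXT NOT NULL,
        Size       TEXT NOT NULL
    );
    -- Individual elements can be indexed, unlike pieces of a string.
    CREATE INDEX IX_Product_Color ON Product(Color);

    -- The smart key is generated from the columns, never stored or parsed.
    CREATE VIEW ProductCode AS
        SELECT ProductID,
               Department || '-' || Model || '-' || Color || '-' || Size AS Code
        FROM Product;

    INSERT INTO Product (Department, Model, Color, Size)
        VALUES ('OUT', 'TENT2', 'GRN', 'L');
""")

code = conn.execute(
    "SELECT Code FROM ProductCode WHERE ProductID = 1"
).fetchone()[0]

# Searching by one element is an ordinary indexed predicate.
green_ids = [pid for (pid,) in conn.execute(
    "SELECT ProductID FROM Product WHERE Color = 'GRN'"
)]
print(code, green_ids)
```

Because the code is derived, changing a product's color updates one scalar value and the displayed code follows automatically.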
Second normal form (2NF)
The second normal form ensures that each attribute does in fact describe the entity. It’s a dependency issue. Does the attribute depend on, or describe, the item identified by the primary key?
If the entity’s primary key is a single value, this isn’t too difficult. Composite primary keys can sometimes get into trouble with the second normal form if the attributes aren’t dependent on every attribute in the primary key. If an attribute depends on one of the primary key attributes but not the other, that is a partial dependency, which violates the second normal form.
An example of a data model that violates the second normal form is one in which the base camp phone number is added to the BaseCampTour entity, as shown in Table 3-5. Assume that the primary key (PK) is a composite of both the BaseCamp and the Tour, and that the phone number is a permanent phone number for the base camp, not a phone number assigned for each tour.
TABLE 3-5
Violating the Second Normal Form
The problem with this design is that the phone number is an attribute of the base camp but not the tour, so the PhoneNumber attribute is only partially dependent on the entity’s primary key.
An obvious practical problem with this design is that updating the phone number requires either updating multiple tuples or risking having two phone numbers for the same base camp.
The solution is to remove the partially dependent attribute from the entity with the composite keys, and create an entity with a unique primary key for the base camp, as shown in Table 3-6. This new entity is then an appropriate location for the dependent attribute.
TABLE 3-6
Conforming to the Second Normal Form
West Virginia Gauley River Rafting
The PhoneNumber attribute is now fully dependent on the entity’s primary key. Each phone number is stored in only one location, and no partial dependencies exist.
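A minimal sketch of the second-normal-form fix, using Python's sqlite3 module: PhoneNumber lives in a BaseCamp entity with a single-attribute key, while BaseCampTour keeps the composite key. The phone numbers are made-up placeholders, since the table contents are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- The phone number depends on the base camp alone, so it lives here.
    CREATE TABLE BaseCamp (
        BaseCamp    TEXT PRIMARY KEY,
        PhoneNumber TEXT NOT NULL
    );
    -- Composite primary key; nothing here depends on only part of it.
    CREATE TABLE BaseCampTour (
        BaseCamp TEXT NOT NULL REFERENCES BaseCamp(BaseCamp),
        Tour     TEXT NOT NULL,
        PRIMARY KEY (BaseCamp, Tour)
    );
""")

# '555-0100' and '555-0199' are placeholder numbers for illustration only.
conn.execute("INSERT INTO BaseCamp VALUES ('Ashville', '555-0100')")
conn.executemany(
    "INSERT INTO BaseCampTour VALUES ('Ashville', ?)",
    [("Appalachian Trail",), ("Blue Ridge Parkway Hike",)],
)

# Changing the phone number now touches exactly one tuple.
updated = conn.execute(
    "UPDATE BaseCamp SET PhoneNumber = '555-0199' WHERE BaseCamp = 'Ashville'"
).rowcount

phones = conn.execute("""
    SELECT t.Tour, b.PhoneNumber
    FROM BaseCampTour AS t
    JOIN BaseCamp AS b ON b.BaseCamp = t.BaseCamp
    ORDER BY t.Tour
""").fetchall()
print(updated, phones)
```

Every tour for the base camp sees the new number through the join, so the two-numbers-for-one-camp anomaly cannot occur.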
Third normal form (3NF)
The third normal form checks for transitive dependencies. A transitive dependency is similar to a partial dependency in that they both refer to attributes that are not fully dependent on a primary key. A dependency is transitive when attribute1 is dependent on attribute2, which is dependent on the primary key.
The second normal form is violated when an attribute depends on part of the key. The third normal form is violated when the attribute does depend on the key but also depends on another non-key attribute.
The key phrase when describing third normal form is that every attribute “must provide a fact about the key, the whole key, and nothing but the key.”
Just as with the second normal form, the third normal form is resolved by moving the non-dependent attribute to a new entity.
Continuing with the Cape Hatteras Adventures example, a guide is assigned as the lead guide responsible for each base camp. The BaseCampGuide attribute belongs in the BaseCamp entity; but it is a violation of the third normal form if other information describing the guide is stored in the base camp, as shown in Table 3-7.
TABLE 3-7
Violating the Third Normal Form
Base Camp Entity
The DateOfHire describes the guide, not the base camp, so the hire-date attribute is not directly dependent on the BaseCamp entity’s primary key. The DateOfHire’s dependency is transitive (it describes the key and a non-key attribute) in that it goes through the LeadGuide attribute.
Creating a Guide entity and moving its attributes to the new entity resolves the violation of the third normal form and cleans up the logical design, as demonstrated in Table 3-8.
TABLE 3-8
Conforming to the Third Normal Form
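The third-normal-form fix can be sketched the same way with Python's sqlite3 module: DateOfHire moves to a Guide entity, and BaseCamp keeps only the LeadGuide foreign key. The guide name and hire dates below are placeholders invented for the illustration, as is the hire_date_for helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Facts about the guide live with the guide.
    CREATE TABLE Guide (
        Guide      TEXT PRIMARY KEY,
        DateOfHire TEXT NOT NULL
    );
    -- The base camp records only which guide leads it.
    CREATE TABLE BaseCamp (
        BaseCamp  TEXT PRIMARY KEY,
        LeadGuide TEXT NOT NULL REFERENCES Guide(Guide)
    );
""")

# 'Guide A' and both dates are placeholder values.
conn.execute("INSERT INTO Guide VALUES ('Guide A', '2001-05-01')")
conn.executemany(
    "INSERT INTO BaseCamp VALUES (?, 'Guide A')",
    [("Ashville",), ("Cape Hatteras",)],
)


def hire_date_for(base_camp):
    """Look up the lead guide's hire date by going through LeadGuide."""
    row = conn.execute(
        """
        SELECT g.DateOfHire
        FROM BaseCamp AS b
        JOIN Guide AS g ON g.Guide = b.LeadGuide
        WHERE b.BaseCamp = ?
        """,
        (base_camp,),
    ).fetchone()
    return row[0] if row else None


# Correcting the hire date is a single-tuple update, however many
# base camps the guide leads.
conn.execute("UPDATE Guide SET DateOfHire = '2001-06-01' WHERE Guide = 'Guide A'")
print(hire_date_for("Ashville"), hire_date_for("Cape Hatteras"))
```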
Best Practice
If the entity has a good primary key and every attribute is scalar and fully dependent on the primary key, then the logical design is in the third normal form. Most database designs stop at the third normal form.
The additional forms prevent problems with more complex logical designs. If you tend to work with mind-bending modeling problems and develop creative solutions, then understanding the advanced forms will prove useful.
The Boyce-Codd normal form (BCNF)
The Boyce-Codd normal form occurs between the third and fourth normal forms, and it handles a problem with an entity that has multiple candidate keys. One of the candidate keys is chosen as the primary key and the others become alternate keys. For example, a person might be uniquely identified by his or her social security number (ssn), employee number, and driver’s license number. If the ssn is the primary key, then the employee number and driver’s license number are the alternate keys.
The Boyce-Codd normal form simply stipulates that in such a case every attribute must describe every candidate key. If an attribute describes one of the candidate keys but not another candidate key, then the entity violates BCNF.
Fourth normal form (4NF)
The fourth normal form deals with problems created by complex composite primary keys. If two independent attributes are brought together to form a primary key along with a third attribute but the two attributes don’t really uniquely identify the entity without the third attribute, then the design violates the fourth normal form.
For example, assume the following conditions:
1. The BaseCamp and the base camp’s LeadGuide were used as a composite primary key.
2. An Event and the Guide were brought together as a primary key.
3. Because both used a guide, all three were combined into a single entity.
The preceding example violates the fourth normal form.
The fourth normal form is used to help identify entities that should be split into separate entities. Usually this is only an issue if large composite primary keys have brought too many disparate objects into a single entity.
Fifth normal form (5NF)
The fifth normal form provides the method for designing complex relationships that involve multiple (three or more) entities. A three-way or ternary relationship, if properly designed, is in the fifth normal form. The cardinality of any of the relationships could be one or many. What makes it a ternary relationship is the number of related entities.
As an example of a ternary relationship, consider a manufacturing process that involves an operator, a machine, and a bill of materials. From one point of view, this could be an operation entity with three foreign keys. Alternatively, it could be thought of as a ternary relationship with additional attributes.
Just like a two-entity many-to-many relationship, a ternary relationship requires a resolution entity in the physical schema design to resolve the many-to-many relationship into multiple artificial one-to-many relationships; but in this case the resolution entity has three or more foreign keys.
In such a complex relationship, the fifth normal form requires that each entity, if separated from the ternary relationship, remains a proper entity without any loss of data.
It’s commonly stated that third normal form is enough. Boyce-Codd, fourth, and fifth normal forms may be complex, but violating them can cause severe problems. It’s not a matter of more entities vs. fewer entities; it’s a matter of properly aligned attributes and keys.
As I mentioned earlier in this chapter, Louis Davidson (aka Dr. SQL) and I co-present a session at conferences on database design. I recommend his book Pro SQL Server 2008 Relational Database Design and Implementation (Apress, 2008).
Summary
Relational database design, covered in Chapter 2, showed why the database physical schema is critical to the database’s performance. This chapter looked at the theory behind the logical correctness of the database design and the many patterns used to assemble a database schema.
■ There are three phases in database design: the conceptual (diagramming) phase, the SQL DDL (create table) phase, and the physical layer (partition and file location) phase. Databases designed with only the conceptual phase perform poorly.