Database Modeling & Design Fourth Edition- P25 doc

6 Normalization his chapter focuses on the fundamentals of normal forms for rela-tional databases and the database design step that normalizes the candidate tables [step IId of the datab

Trang 1

6 Normalization

his chapter focuses on the fundamentals of normal forms for rela-tional databases and the database design step that normalizes the candidate tables [step II(d) of the database design life cycle] It also inves-tigates the equivalence between the conceptual data model (e.g., the ER model) and normal forms for tables As we go through the examples in this chapter, it should become obvious that good, thoughtful design of a conceptual model will result in databases that are either already normal-ized or can be easily normalnormal-ized with minor changes This truth illus-trates the beauty of the conceptual modeling approach to database design, in that the experienced relational database designer will develop

a natural gravitation toward a normalized model from the beginning For most database practitioners, Sections 6.1 through 6.4 cover the critical normalization needed for everyday use, through Boyce-Codd normal form (BCNF) Section 6.5 covers the higher normal forms of mostly theoretical interest; however, we do show the equivalency between higher normal forms and ternary relationships in the ER model and UML for the interested reader

6.1 Fundamentals of Normalization

Relational database tables, whether they are derived from ER or UML models, sometimes suffer from some rather serious problems in terms of

T

Trang 2

performance, integrity and maintainability For example, when the entire database is defined as a single large table, it can result in a large amount of redundant data and lengthy searches for just a small number

of target rows It can also result in long and expensive updates, and dele-tions in particular can result in the elimination of useful data as an unwanted side effect

Such a situation is shown in Figure 6.1, where products, salespersons, customers, and orders are all stored in a single table called Sales In this table, we see that certain product and customer information is stored redundantly, wasting storage space Certain queries, such as “Which cus-tomers ordered vacuum cleaners last month?” would require a search of the entire table Also, updates such as changing the address of the cus-tomer Dave Bachmann would require changing many rows Finally, deleting an order by a valued customer such as Qiang Zhu (who bought

an expensive computer), if that is his only outstanding order, deletes the only copy of his address and credit rating as a side effect Such informa-tion may be difficult (or sometimes impossible) to recover These prob-lems also occur for situations in which the database has already been set

up as a collection of many tables, but some of the tables are still too large

If we had a method of breaking up such a large table into smaller tables so that these types of problems would be eliminated, the database would be much more efficient and reliable Classes of relational database schemes or table definitions, called normal forms, are commonly used to accomplish this goal The creation of a normal form database table is called normalization Normalization is accomplished by analyzing the

Figure 6.1 Single table database

DVD player

Plymouth product_name

Sales

order_no cust_name cust_addr credit date

1-3-03 4-15-05 9-12-04 12-5-04 5-10-05 8-3-02 10-1-04 4-19-99 1-4-04 3-15-04

sales_name Carl Bloch Ted Hanss Dick Phillips Fred Remley

R Metz Paul Basile Carl Bloch Carl Bloch Dick Phillips Fred Remley

6 10 8 3 7 8 7 8 6 6

Austin Ann Arbor Detroit Chicago Mumbai Chicago Detroit Austin East Lansing

1458 2730 2460 519 1986 1817 1865 1885 1943 2315

Dave Bachmann Qiang Zhu Mike Stolarchuck Peter Honeyman Charles Antonelli C.V Ravishankar Charles Antonelli Betsy Karmeisool Dave Bachmann Sakti Pramanik

vacuum cleaner

computer

refrigerator

radio

CD player

vacuum cleaner

refrigerator

television

Trang 3

interdependencies among individual attributes associated with those tables and taking projections (subsets of columns) of larger tables to form smaller ones

Let us first review the basic normal forms, which have been well established in the relational database literature and in practice

6.1.1 First Normal Form

Definition A table is in first normal form (1NF) if and only if all

col-umns contain only atomic values, that is, each column can have only one value for each row in the table

Relational database tables, such as the Sales table illustrated in Figure

6.1, have only atomic values for each row and for each column Such tables are considered to be in first normal form, the most basic level of normalized tables

To better understand the definition for 1NF, it helps to know the

dif-ference between a domain, an attribute, and a column A domain is the

set of all possible values for a particular type of attribute, but may be used for more than one attribute For example, the domain of people’s names is the underlying set of all possible names that could be used for either customer-name or salesperson-name in the database table in Fig-ure 6.1 Each column in a relational table represents a single attribute, but in some cases more than one column may refer to different attributes from the same domain When this occurs, the table is still in 1NF because the values in the table are still atomic In fact, standard SQL assumes only atomic values and a relational table is by default in 1NF A nice explanation of this is given in Muller [1999]

6.1.2 Superkeys, Candidate Keys, and Primary Keys

A table in 1NF often suffers from data duplication, update performance, and update integrity problems, as noted above To understand these issues better, however, we must define the concept of a key in the

con-text of normalized tables A superkey is a set of one or more attributes,

which, when taken collectively, allows us to identify uniquely an entity

or table Any subset of the attributes of a superkey that is also a superkey,

and not reducible to another superkey, is called a candidate key A primary

key is selected arbitrarily from the set of candidate keys to be used in an index for that table

Trang 4

As an example, in Figure 6.2 a composite of all the attributes of the table forms a superkey because duplicate rows are not allowed in the relational model Thus, a trivial superkey is formed from the composite

of all attributes in a table Assuming that each department address (dept_addr) in this table is single valued, we can conclude that the com-posite of all attributes except dept_addr is also a superkey Looking at smaller and smaller composites of attributes and making realistic assumptions about which attributes are single valued, we find that the composite (report_no, author_id) uniquely determines all the other attributes in the table and is therefore a superkey However, neither report_no nor author_id alone can determine a row uniquely, and the composite of these two attributes cannot be reduced and still be a super-key Thus, the composite (report_no, author_id) becomes a candidate key Since it is the only candidate key in this table, it also becomes the primary key

A table can have more than one candidate key If, for example, in Figure 6.2, we had an additional column for author_ssn, and the com-posite of report_no and author_ssn uniquely determine all the other attributes of the table, then both (report_no, author_id) and (report_no, author_ssn) would be candidate keys The primary key would then be an arbitrary choice between these two candidate keys

Other examples of multiple candidate keys can be seen in Figure 5.5

(see Chapter 5) In Figure 5.5a the table uses_notebook has three

candi-date keys: (emp_id, project_name), (emp_id, notebook_no), and

(project_name, notebook_no); and in Figure 5.5b the table assigned_to

has two candidate keys: (emp_id, loc_name) and (emp_id, project_name) Figures 5.5c and 5.5d each have only a single candidate key

Figure 6.2 Report table

5789

design report_no

Report

editor dept_no dept_name dept_addr author_id author_name

mantei bolton koenig fry umar koenig

author_addr cs-tor mathrev mathrev folkstone prise mathrev

53 44 71 26 38 71

design design analysis analysis analysis

argus1 argus1 argus1 argus2 argus2 argus2

woolf woolf woolf koenig koenig koenig

15 15 15 27 27 27

4216

5789

Trang 5

6.1.3 Second Normal Form

To explain the concept of second normal form (2NF) and higher, we introduce the concept of functional dependence, which was briefly described in Chapter 2 The property of one or more attributes that uniquely determine the value of one or more other attributes is called

functional dependence Given a table (R), a set of attributes (B) is

function-ally dependent on another set of attributes (A) if, at each instant of time, each A value is associated with only one B value Such a functional dependence is denoted by A -> B In the preceding example from Figure 6.2, let us assume we are given the following functional dependencies

for the table report:

report: report_no -> editor, dept_no

dept_no -> dept_name, dept_addr author_id -> author_name, author_addr

Definition A table is in second normal form (2NF) if and only if it is

in 1NF and every nonkey attribute is fully dependent on the primary key An attribute is fully dependent on the primary key if it is on the right side of an FD for which the left side is either the primary key itself or something that can be derived from the primary key using the transitivity of FDs

An example of a transitive FD in report is the following:

report_no -> dept_no dept_no -> dept_name Therefore we can derive the FD (report_no -> dept_name), since dept_name is transitively dependent on report_no

Continuing our example, the composite key in Figure 6.2, (report_no, author_id), is the only candidate key and is therefore the primary key However, there exists one FD (dept_no -> dept_name, dept_addr) that has no component of the primary key on the left side, and two FDs (report_no -> editor, dept_no and author_id -> author_name, author_addr) that contain one component of the primary key on the left

side, but not both components As such, report does not satisfy the

condition for 2NF for any of the FDs

Định dạng
Số trang	5
Dung lượng	184,72 KB