1. Trang chủ
  2. » Công Nghệ Thông Tin

Joe Celko s SQL for Smarties - Advanced SQL Programming P11 potx

10 259 0
Tài liệu đã được kiểm tra trùng lặp

Đang tải... (xem toàn văn)

THÔNG TIN TÀI LIỆU

Thông tin cơ bản

Định dạng
Số trang 10
Dung lượng 132,35 KB

Các công cụ chuyển đổi và chỉnh sửa cho tài liệu này

Nội dung

72 CHAPTER 2: NORMALIZATION section_id CHAR1 NOT NULL, grade CHAR1 NOT NULL; CREATE TABLE Students student_name CHAR 25 NOT NULL PRIMARY KEY, major CHAR10 NOT NULL; A common misundersta

Trang 1

72 CHAPTER 2: NORMALIZATION

section_id CHAR(1) NOT NULL, grade CHAR(1) NOT NULL);

CREATE TABLE Students (student_name CHAR (25) NOT NULL PRIMARY KEY, major CHAR(10) NOT NULL);

A common misunderstanding about relational theory is that 3NF tables have no transitive dependencies As indicated above, if X → Y, X does not have to be a key if Y is part of a candidate key We still have a transitive dependency in the example—(room_nbr, time_period) → (course_name, section_id)—but since the right side of the dependency

is a key, it is technically in 3NF This table structure still has unreasonable behavior, though: several courses can be assigned to the same room at the same time

Another form of transitive dependency is a computed column For example:

CREATE TABLE Stuff (width INTEGER NOT NULL, length INTEGER NOT NULL, height INTEGER NOT NULL, volume INTEGER NOT NULL CHECK (width * length * height = volume), PRIMARY KEY (width, length, height));

The volume column is determined by the other three columns, so any change to one of the three columns will require a change to the volume column You can use a VIEW to create computed columns

Elementary Key Normal Form (EKNF) is a subtle enhancement on 3NF

By definition, EKNF tables are also in 3NF This happens when there is more than one unique composite key and they overlap Such cases can cause redundant information in the overlapping column(s) For example, in the following table, let’s assume that a subject title is also a unique identifier for a given subject in the following table:

CREATE TABLE Enrollment (student_id INTEGER NOT NULL,

Trang 2

2.6 Boyce-Codd Normal Form (BCNF) 73

course_code CHAR(6) NOT NULL,

course_name VARCHAR(15) NOT NULL,

PRIMARY KEY (student_id, course_name)

, UNIQUE (student_id, course_name) alternative key

);

Enrollment

student_id course_name course_name

=======================================

1 'CS-100' 'ER Diagrams'

1 'CS-114' 'Database Design'

2 'CS-114' 'Database Design'

This table, although it is in 3NF, violates EKNF The primary key of the above table is the combination of (student_id, course_name)

However, we can also see an alternative key (student_id, course_name)

as well The above schema could result in update and deletion

anomalies, because values of both course_name and course_name tend

to be repeated for a given subject

The following schema is a decomposition of the above table to satisfy EKNF:

CREATE TABLE Subjects

(course_name CHAR(6) NOT NULL PRIMARY KEY,

course_name VARCHAR(15) NOT NULL);

CREATE TABLE Enrollment

(student_id INTEGER NOT NULL,

course_name CHAR(6) NOT NULL,

PRIMARY KEY (student_id, course_name));

For reasons that will become obvious in the following section, most designers do not ensure a table is in EKNF, as they will move directly on

to Boyce-Codd Normal Form after ensuring that a schema is in 3NF

Thus, EKNF is included here only for reasons of historical accuracy and completeness

A table is in BCNF when for all nontrivial FDs (X → A), X is a superkey for the whole schema A superkey is a unique set of columns that

Trang 3

74 CHAPTER 2: NORMALIZATION

identify each row in a table, but you can remove some columns from it and it will still be a key Informally, a superkey is carrying extra weight

BCNF is the normal form that actually removes all transitive dependencies A table is in BCNF if for all (X → Y), X is a key—period

We can go to this normal form just by adding another key with UNIQUE (room, time_period) constraint clause to the table Classes

There are some other interesting and useful “higher” normal forms, but they are outside of the scope of this discussion In our example, we have removed all of the important anomalies with BCNF

3NF was concerned with the relationship between key and nonkey columns However, a column can often play both roles Consider a table for computing salesmen’s bonus gifts, which has for each salesman his base salary, the number of sales points he has won in a contest, and the bonus gift awarded for that combination of salary range and points For example, we might give a fountain pen to a beginning salesman with a base pay rate somewhere between $15,000 and $20,000 and 100 sales points, but give a car to a master salesman whose salary is between

$30,000 and $60,000 and who has 200 points The functional dependencies are, therefore:

(paystep, points) -> gift gift -> points

Let’s start with a table that has all the data in it and normalize it

Gifts Salary_amt points gift

==========================

15000 100 'Pencil'

17000 100 'Pen'

30000 200 'Car'

31000 200 'Car'

32000 200 'Car'

This schema is in 3NF, but it has problems You cannot insert a new gift into our offerings and points unless you have a salary to go with it If you remove any sales points, you lose information about the gifts and salaries (e.g., only people in the $30,000 range can win a car) And, finally, a change in the gifts for a particular point score would have to affect all the rows within the same pay step This table needs to be broken apart into two tables:

Trang 4

2.7 Fourth Normal Form (4NF) 75

PayGifts

Salary_amt gift

=================

15000 'Pencil'

17000 'Pen'

30000 'Car'

31000 'Car'

32000 'Car'

GiftsPoints

gift points

===============

'Pencil' 100

'Pen' 100

'Car' 200

Fourth Normal Form (4NF) makes use of multivalued dependencies

The problem it solves is that the table has too many of them For

example, consider a table of departments, their projects, and the parts

they stock The MVDs in the table would be:

dept_name ->> jobs

dept_name ->> parts

Assume that dept_name ‘d1’ works on jobs ‘j1’ and ‘j2’ with parts ‘p1’

and ‘p2’; that dept_name ‘d2’ works on jobs ‘j3’, ‘j4’, and ‘j5’ with parts

‘p2’ and ‘p4’; and that dept_name ‘d3’ works only on job ‘j2’ with parts

‘p5’ and ‘p6’ The table would look like this:

dept job part

=================

'd1' 'j1' 'p1'

'd1' 'j1' 'p2'

'd1' 'j2' 'p1'

'd1' 'j2' 'p2'

'd2' 'j3' 'p2'

'd2' 'j3' 'p4'

'd2' 'j4' 'p2'

Trang 5

If you want to add a part to a dept_name, you must create more than one new row Likewise, removing a part or a job from a row can destroy information Updating a part or job name will also require multiple rows

to be changed

The solution is to split this table into two tables, one with (dept_name, jobs) in it and one with (dept_name, parts) in it The definition of 4NF is that we have no more than one MVD in a table If a table is in 4NF, it is also in BCNF

Fifth Normal Form (5NF), also called the Join-Projection Normal Form

or the Projection-Join Normal Form, is based on the idea of a lossless JOIN or the lack of a join-projection anomaly This problem occurs

when you have an n-way relationship, where n > 2 A quick check for

5NF is to see if the table is in 3NF and all the candidate keys are single columns

As an example of the problems solved by 5NF, consider a table of house notes that records the buyer, the seller, and the lender:

HouseNotes buyer seller lender

==================================

'Smith' 'Jones' 'National Bank' 'Smith' 'Wilson' 'Home Bank' 'Nelson' 'Jones' 'Home Bank'

This table is a three-way relationship, but because many CASE tools allow only binary relationships, it might have to be expressed in an E-R (entity-relationship) diagram as three binary relationships, which would generate CREATE TABLE statements leading to these tables:

Trang 6

2.8 Fifth Normal Form (5NF) 77

BuyerLender

buyer lender

=============================

'Smith' 'National Bank'

'Smith' 'Home Bank'

'Nelson' 'Home Bank'

SellerLender

seller lender

=======================

'Jones' 'National Bank'

'Wilson' 'Home Bank'

'Jones' 'Home Bank'

BuyerSeller

buyer seller

================

'Smith' 'Jones'

'Smith' 'Wilson'

'Nelson' 'Jones'

The trouble is that when you try to assemble the original information

by joining pairs of these three tables together as follows:

SELECT BS.buyer, SL.seller, BL.lender

FROM BuyerLender AS BL,

SellerLender AS SL,

BuyerSeller AS BS

WHERE BL.buyer = BS.buyer

AND BL.lender = SL.lender

AND SL.seller = BS.seller;

you will recreate all the valid rows in the original table, such as (‘Smith’,

‘Jones’, ‘National Bank’), but there will also be false rows, such as (‘Smith’, ‘Jones’, ‘Home Bank’), which were not part of the original table This is called a join-projection anomaly

There are also strong JPNFs and overstrong JPNFs, which make use

of JOIN dependencies (JDs for short) Unfortunately, there is no systematic way to find a JPNF or 4NF schema, because the problem is known to be NP-complete

Trang 7

schema from a set of FDs Most CASE tools that produce an E-R diagram will find only one of them, and go no further However, if you use an ORM (Object Role Model) tool properly, the schema will be in 5NF I suggest strongly that you get any of the books by Terry Halpin on this

technique: Information Modeling and Relational Databases from Conceptual Analysis to Logical Design, ISBN 1-55860-672-6; Database Modeling with Microsoft® Visio for Enterprise Architects, ISBN 1-55860-919-9.

Ronald Fagin defined Domain/Key Normal Form (DKNF) in 1981 as a schema that has all of the domain constraints and functional

dependencies enforced There is not yet a general algorithm that will always generate the DKNF solution, given a set of constraints We can, however, determine DKNF in many special cases, and it is a good guide

to writing DDL in the real world

Let’s back up a bit and look at the mathematical model under normalization An FD has a defined system of axioms that can be used in normalization problems These six axioms, known as Armstrong’s axioms, are given below:

Reflexive: X → X

Augmentation: If X → Y then XZ → Y

Union: If (X → Y and X → Z) then X → YZ

Decomposition: If X → Y and Z a subset of Y, then X → Z

Transitivity: If (X → Y and Y → Z) then X → Z

Pseudotransitivity: If (X → Y and YZ → W) then XZ → W

They make good sense if you just look at them, which is something

we like in a set of axioms In the real world, the FDs are the business rules we are trying to model

Trang 8

2.9 Domain-Key Normal Form (DKNF) 79

In the normalization algorithm for 3NF (developed by P A Berstein, 1976) we use the axioms to get rid of redundant FDs For example, if we are given:

A -> B

A -> C

B -> C

DB -> E

DAF -> E

A → C is redundant, because it can be derived from A → B and B →

C with transitivity Also, DAF → E is redundant, because it can be derived from DB → E and A → B with transitivity (which gives us DA → E) and augmentation (which then allows DAF → E) What we would like

to find is the smallest set of FDs from which we can generate all of the given rules This is called a nonredundant cover For the FDs above, one cover would be:

A -> B

B -> C

DB -> E

Once we do this, Berstein shows that we can just create a table for each of the FDs where A, B, and DB are the respective keys We have taken it easy so far, but now it’s time for a challenge

As an example of a schema with multiple 3NF tables, here is a problem that was used in a demonstration by DBStar Corporation (now Evoke Software) The company used it as an example in a demonstration that comes with their CASE tool

We are given an imaginary and simplified airline that has a database for scheduling flights and pilots Most of the relationships are obvious things Flights have only one departure time and one destination They can get a different pilot and can be assigned to a different gate each day

of the week The FDs for the database are given below:

1) flight -> destination

2) flight -> hour

3) (day, flight) -> gate

4) (day, flight) -> pilot

5) (day, hour, pilot) -> gate

6) (day, hour, pilot) -> flight

Trang 9

A purist, looking at this collection of FDs, will be bothered by the redundancies in this list But in the real world, when you interview people, they do not speak to you in a minimal set; they state things that they know to be true in their situation In fact, they very often leave out relationships that they consider too obvious to mention

Your problem is to find 3NF or stronger database schemas in these FDs You have to be careful! You have to have all of the columns, obviously, but your answer could be in 3NF and still ignore some of the FDs For example, this will not work:

CREATE TABLE PlannedSchedule (flight, destination, hour, PRIMARY KEY (flight));

CREATE TABLE ActualSchedule (day, flight, gate, pilot, PRIMARY KEY (day, flight));

If we apply the Union axiom to some of the FDs, we get:

(day, hour, gate) -> (destination, flight, pilot) (day, hour, pilot) -> (destination, flight, gate)

This example says that the user has required that if we are given a day, an hour, and a gate we should be able to determine a unique flight for that day, hour, and gate We should also be able to determine a unique flight given a day, hour, and pilot

Given the PlannedSchedule and ActualSchedule tables, you cannot produce views where either of the two constraints we just mentioned is enforced If the query “What flight does pilot X have on day Y and hour Z?” gives you more than one answer, it violates the FDs and common sense Here is an example of a schema that is allowable in this proposed schema, but is undesirable given our constraints:

PlannedSchedule flight hour destination =========================

118 17:00 Dallas

Trang 10

2.9 Domain-Key Normal Form (DKNF) 81

123 13:00 Omaha

155 17:00 Los Angeles

171 13:00 New York

666 13:00 Dis

ActualSchedule

day flight pilot gate

========================

Wed 118 Tom 12A

Wed 155 Tom 13B

Wed 171 Tom 12A

Thu 123 John 12A

Thu 155 John 12A

Thu 171 John 13B

The constraints mean that we should be able to find a unique answer

to each of the following questions and not lose any information when inserting and deleting data:

1 Which flight is leaving from gate 12A on Thursdays at 13:00? This question looks fine until you realize that you don’t know any further information about flight 666, which was not required to have anything about its day or pilot in the alSchedule table Likewise, I can add a flight to the Actu-alSchedule table that has no information in the PlannedSchedule table

2 Which pilot is assigned to the flight that leaves gate 12A on Thursdays at 13:00? This has the same problem as before

3 What is the destination of the flight in query 1 and 2? This has the same problem as before

4 What gate is John leaving from on Thursdays at 13:00?

5 Where is Tom flying to on Wednesdays at 17:00?

6 What flight is assigned to Tom on Wednesdays at 17:00?

It might help if we gave an example showing how one of the FDs in the problem can be derived using the axioms of FD calculus, just like you would do a geometry proof:

Ngày đăng: 06/07/2014, 09:20

TỪ KHÓA LIÊN QUAN