A little more science! The Principle of Orthogonal Design: Let A and B be any two base relvars* in the database. Then there must
not exist nonloss decompositions of A and B into A1, ..., Am and B1, ..., Bn (respectively) such that some projection Ai in the set A1, ..., Am and some projection Bj in the set B1, ..., Bn have
overlapping meanings. (This version of the principle subsumes the simpler version, because one nonloss decomposition that always
exists for relvar R is the identity projection of R, i.e., the projection of R over all of its attributes.)
──────────
* Recall that, from the user's point of view, all relvars are
base ones (apart from views defined as mere shorthands); i.e., the principle applies to the design of all "expressible" databases,
not just to the "real" database: The Principle of Database
Relativity at work once again. Of course, analogous remarks apply
to the principles of normalization also.
──────────
It's predicates, not names, that represent data semantics.
Mention "orthogonal decomposition" (this will be relevant when
we get to distributed databases in Chapter 21).
Violating The Principle of Orthogonal Design in fact violates The Information Principle! The principle is just formalized
common sense, of course (like the principles of further
normalization). Remind students of the relevance of the principle
to updating union, intersection, and difference views (Chapter 10).
13.7 Other Normal Forms
You're welcome to skip this section. If you do cover it, note that there's some confusion in the literature over exactly what DK/NF is (see, e.g., "The Road to Normalization," by Douglas W. Hubbard and Joe Celko, DBMS, April 1994). Note: After I first
wrote these notes, the topic of DK/NF came up on the website
www.dbdebunk.com. I've attached my response to that question as
an appendix to this chapter of the manual.
References and Bibliography
Copyright (c) 2003 C. J. Date page 13.8
Reference [13.15] is a classic and should be distributed to
students if at all possible.
The annotation to reference [13.14] says this: "The two
embedded MVDs [in relvar CTXD] would have to be stated as
additional, explicit constraints on the relvar. The details are
left as an exercise." Answer:
CONSTRAINT EMVD_ON_CTXD
CTXD { COURSE, TEACHER, TEXT } =
CTXD { COURSE, TEACHER } JOIN CTXD { COURSE, TEXT } ;
Note that this constraint is much harder to state in SQL, because
SQL doesn't support relational comparisons! Here it is in SQL:
CREATE ASSERTION EMVD_ON_CTXD CHECK
( NOT EXISTS ( SELECT DISTINCT COURSE, TEACHER, TEXT
               FROM CTXD AS CTXD1
               WHERE NOT EXISTS
                   ( SELECT DISTINCT COURSE, TEACHER, TEXT
                     FROM ( SELECT DISTINCT COURSE, TEACHER, TEXT
                            FROM ( SELECT DISTINCT COURSE, TEACHER
                                   FROM CTXD ) AS POINTLESS1
                            NATURAL JOIN
                                 ( SELECT DISTINCT COURSE, TEXT
                                   FROM CTXD ) AS POINTLESS2 ) AS CTXD2
                     WHERE CTXD1.COURSE  = CTXD2.COURSE
                     AND   CTXD1.TEACHER = CTXD2.TEACHER
                     AND   CTXD1.TEXT    = CTXD2.TEXT ) )
  AND
  NOT EXISTS ( SELECT DISTINCT COURSE, TEACHER, TEXT
               FROM ( SELECT DISTINCT COURSE, TEACHER, TEXT
                      FROM ( SELECT DISTINCT COURSE, TEACHER
                             FROM CTXD ) AS POINTLESS1
                      NATURAL JOIN
                           ( SELECT DISTINCT COURSE, TEXT
                             FROM CTXD ) AS POINTLESS2 ) AS CTXD2
               WHERE NOT EXISTS
                   ( SELECT DISTINCT COURSE, TEACHER, TEXT
                     FROM CTXD AS CTXD1
                     WHERE CTXD1.COURSE  = CTXD2.COURSE
                     AND   CTXD1.TEACHER = CTXD2.TEACHER
                     AND   CTXD1.TEXT    = CTXD2.TEXT ) ) ) ;
You might want to discuss this SQL formulation in detail.
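For experimenting with the constraint itself, here is a minimal Python sketch. It is not Tutorial D or SQL. A relation is modeled as a set of tuples, and EMVD_ON_CTXD is evaluated by comparing the projection of CTXD over {COURSE, TEACHER, TEXT} with the join of the two smaller projections. CTXD's fourth attribute is represented by a hypothetical placeholder named D, and all sample values are illustrative only. The sketch also shows why the dependency is only embedded: for the same data, the corresponding MVD over all of CTXD fails.

```python
# Minimal relational helpers (a sketch, not a DBMS): a relation is a frozenset
# of tuples, and each tuple is a frozenset of (attribute, value) pairs.
def rel(heading, *rows):
    return frozenset(frozenset(zip(heading, row)) for row in rows)

def project(r, attrs):
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)

def join(r, s):
    """Natural join: combine every pair of tuples that agree on common attributes."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return frozenset(out)

CTX_ATTRS = {"COURSE", "TEACHER", "TEXT"}

def emvd_on_ctxd(ctxd):
    # CTXD { COURSE, TEACHER, TEXT } = CTXD { COURSE, TEACHER } JOIN CTXD { COURSE, TEXT }
    return project(ctxd, CTX_ATTRS) == join(project(ctxd, {"COURSE", "TEACHER"}),
                                            project(ctxd, {"COURSE", "TEXT"}))

def mvd_on_whole_ctxd(ctxd):
    # The unembedded MVD COURSE ->-> TEACHER | { TEXT, D } over all of CTXD.
    return ctxd == join(project(ctxd, {"COURSE", "TEACHER"}),
                        project(ctxd, {"COURSE", "TEXT", "D"}))

HEADING = ("COURSE", "TEACHER", "TEXT", "D")  # D is a hypothetical placeholder
good = rel(HEADING,
           ("Physics", "Green", "Mechanics", 5),
           ("Physics", "Green", "Optics", 3),
           ("Physics", "Brown", "Mechanics", 4),
           ("Physics", "Brown", "Optics", 2))
bad = good - rel(HEADING, ("Physics", "Brown", "Optics", 2))

print(emvd_on_ctxd(good), mvd_on_whole_ctxd(good), emvd_on_ctxd(bad))
# prints: True False False
```

The first result shows the embedded MVD holding. The second shows that the same dependency does not hold as an MVD on CTXD as a whole, which is why it has to be stated as an explicit constraint.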
Answers to Exercises
13.1 Here first is the MVD for relvar CTX (algebraic version):
CONSTRAINT CTX_MVD CTX = CTX { COURSE, TEACHER } JOIN
                         CTX { COURSE, TEXT } ;
Calculus version:
CONSTRAINT CTX_MVD CTX =
         { CTXX.COURSE, CTXX.TEACHER, CTXY.TEXT }
         WHERE CTXX.COURSE = CTXY.COURSE ;
CTXX and CTXY are range variables ranging over CTX.
Second, here is the JD for relvar SPJ (algebraic version):
CONSTRAINT SPJ_JD SPJ = SPJ { S#, P# } JOIN
                        SPJ { P#, J# } JOIN
                        SPJ { J#, S# } ;
Calculus version:
CONSTRAINT SPJ_JD SPJ =
         { SPJX.S#, SPJY.P#, SPJZ.J# }
         WHERE SPJX.P# = SPJY.P#
         AND   SPJY.J# = SPJZ.J#
         AND   SPJZ.S# = SPJX.S# ;
SPJX, SPJY, and SPJZ are range variables ranging over SPJ.
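To check the algebraic versions against concrete data, the following minimal Python sketch treats each relation as a set of tuples. It tests CTX_MVD and SPJ_JD by comparing a relation with the join of its projections. The sample values are illustrative assumptions. The SPJ samples show the familiar case where three tuples violate the JD and a fourth tuple restores it.

```python
# A relation is a frozenset of tuples; a tuple is a frozenset of (attribute, value) pairs.
def rel(heading, *rows):
    return frozenset(frozenset(zip(heading, row)) for row in rows)

def project(r, attrs):
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)

def join(r, s):
    """Natural join of two relations."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return frozenset(out)

def ctx_mvd(ctx):
    # CTX = CTX { COURSE, TEACHER } JOIN CTX { COURSE, TEXT }
    return ctx == join(project(ctx, {"COURSE", "TEACHER"}),
                       project(ctx, {"COURSE", "TEXT"}))

def spj_jd(spj):
    # SPJ = SPJ { S#, P# } JOIN SPJ { P#, J# } JOIN SPJ { J#, S# }
    return spj == join(join(project(spj, {"S#", "P#"}),
                            project(spj, {"P#", "J#"})),
                       project(spj, {"J#", "S#"}))

ctx = rel(("COURSE", "TEACHER", "TEXT"),
          ("Physics", "Green", "Mechanics"), ("Physics", "Green", "Optics"),
          ("Physics", "Brown", "Mechanics"), ("Physics", "Brown", "Optics"))

spj_bad = rel(("S#", "P#", "J#"), ("S1", "P1", "J2"), ("S1", "P2", "J1"), ("S2", "P1", "J1"))
spj_ok = spj_bad | rel(("S#", "P#", "J#"), ("S1", "P1", "J1"))

print(ctx_mvd(ctx), spj_jd(spj_ok), spj_jd(spj_bad))  # prints: True True False
```

For spj_bad, the three-way join produces the extra tuple (S1, P1, J1), so the relation is not equal to the join of its projections.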
13.2 Note first that R contains every a value paired with every b value, and further that the set of all a values in R, S say, is the same as the set of all b values in R. Loosely speaking,
therefore, the body of R is equal to the Cartesian product of set
S with itself; more precisely, R is equal to the Cartesian product
of its projections R{A} and R{B}. R thus satisfies the following MVDs (which are not trivial, please note, since they're certainly
not satisfied by all binary relvars):
{ } →→ A | B
Equivalently, R satisfies the JD *{A,B} (remember that join
degenerates to Cartesian product when there are no common
attributes). It follows that R isn't in 4NF, and it can be
nonloss-decomposed into its projections on A and B.* R is,
however, in BCNF (it's all key), and it satisfies no nontrivial FDs.
──────────
* Those projections will have identical bodies, of course. For that reason, it might be better to define just one of them as a
base relvar, and define R as a view over that base relvar (the
Cartesian product of that base relvar with itself, loosely
speaking).
──────────
Note: R also satisfies the MVDs
A →→ B | { }
and
B →→ A | { }
However, these MVDs are trivial, since they're satisfied by every binary relvar R with attributes A and B.
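The argument can be checked on a small sample with the minimal Python sketch below. It assumes a hypothetical shared value set S = {1, 2, 3} and models relations as sets of tuples. It confirms three things: R equals the join (here the Cartesian product) of R{A} and R{B]; a binary relation that isn't such a product fails the same test, so the MVD is nontrivial; and neither A → B nor B → A holds.

```python
# A relation is a frozenset of tuples; a tuple is a frozenset of (attribute, value) pairs.
def rel(heading, *rows):
    return frozenset(frozenset(zip(heading, row)) for row in rows)

def project(r, attrs):
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)

def join(r, s):
    """Natural join; with no common attributes it degenerates to Cartesian product."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return frozenset(out)

def fd_holds(r, x, y):
    """True if every pair of tuples that agree on x also agree on y."""
    seen = {}
    for t in r:
        d = dict(t)
        lhs = tuple(sorted((a, d[a]) for a in x))
        rhs = tuple(sorted((a, d[a]) for a in y))
        if seen.setdefault(lhs, rhs) != rhs:
            return False
    return True

def empty_mvd_holds(r):
    # { } ->-> A | B, equivalently the JD *{ A, B }
    return r == join(project(r, {"A"}), project(r, {"B"}))

S = [1, 2, 3]  # hypothetical shared set of a and b values
R = rel(("A", "B"), *[(a, b) for a in S for b in S])
not_product = rel(("A", "B"), (1, 1), (2, 2))

print(empty_mvd_holds(R), empty_mvd_holds(not_product))      # True False
print(fd_holds(R, {"A"}, {"B"}), fd_holds(R, {"B"}, {"A"}))  # False False
```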
13.3 First we introduce three relvars
REP { REP#, ... }
KEY { REP# }
AREA { AREA#, ... }
KEY { AREA# }
PRODUCT { PROD#, ... }
KEY { PROD# }
with the obvious interpretation. Second, we can represent the relationship between sales representatives and sales areas by a relvar
RA { REP#, AREA# }
KEY { REP#, AREA# }
and the relationship between sales representatives and products by
a relvar
RP { REP#, PROD# }
KEY { REP#, PROD# }
(both of these relationships are many-to-many).
Next, we're told that every product is sold in every area. So
if we introduce a relvar
AP { AREA#, PROD# }
KEY { AREA#, PROD# }
to represent the relationship between areas and products, then we have the constraint (let's call it C) that
AP = AREA { AREA# } JOIN PRODUCT { PROD# }
Notice that constraint C implies that relvar AP isn't in 4NF (see Exercise 13.2). In fact, relvar AP doesn't give us any
information that can't be obtained from the other relvars; to be precise, we have
AP { AREA# } = AREA { AREA# }
and
AP { PROD# } = PRODUCT { PROD# }
But let's assume for the moment that relvar AP is included in our
design anyway.
No two representatives sell the same product in the same area.
In other words, given an {AREA#,PROD#} combination, there's
exactly one responsible sales representative (REP#), so we can
introduce a relvar
APR { AREA#, PROD#, REP# }
KEY { AREA#, PROD# }
in which (to make the FD explicit)
{ AREA#, PROD# } → REP#
(of course, specification of the combination {AREA#,PROD#} as a key is sufficient to express this FD). Now, however, relvars RA,
RP, and AP are all redundant, since they're all projections of
APR; they can therefore all be dropped. In place of constraint C,
we now need constraint C1:
APR { AREA#, PROD# } = AREA { AREA# } JOIN PRODUCT { PROD# }
This constraint must be stated separately and explicitly (it isn't
"implied by keys").
Also, since every representative sells all of that
representative's products in all of that representative's areas,
we have the additional constraint C2 on relvar APR:
REP# →→ AREA# | PROD#
(a nontrivial MVD; relvar APR isn't in 4NF). Again, the constraint must be stated separately and explicitly.
Thus the final design consists of the relvars REP, AREA,
PRODUCT, and APR, together with the constraints C1 and C2:
CONSTRAINT C1 APR { AREA#, PROD# } =
              AREA { AREA# } JOIN PRODUCT { PROD# } ;
CONSTRAINT C2 APR =
              APR { REP#, AREA# } JOIN APR { REP#, PROD# } ;
This exercise illustrates very clearly the point that, in
general, the normalization discipline is adequate to represent
some semantic aspects of a given problem (basically, dependencies
that are implied by keys, where by "dependencies" we mean FDs,
MVDs, or JDs), but explicit statement of additional dependencies might also be needed for other aspects, and some aspects can't be represented in terms of such dependencies at all. It also
illustrates the point (once again) that it isn't always desirable
to normalize "all the way" (relvar APR is in BCNF but not in 4NF).
Note: As a subsidiary exercise, you might like to consider
whether a design involving RVAs might be appropriate for the
problem under consideration. Might such a design mean that some
of the comments in the previous paragraph no longer apply?
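To see how the final design's constraints behave, here is a minimal Python sketch that checks the key {AREA#, PROD#}, C1, and C2 against sample values of APR. The area, product, and representative identifiers are hypothetical. The single-attribute relations AREA_IDS and PRODUCT_IDS stand in for AREA { AREA# } and PRODUCT { PROD# }.

```python
# A relation is a frozenset of tuples; a tuple is a frozenset of (attribute, value) pairs.
def rel(heading, *rows):
    return frozenset(frozenset(zip(heading, row)) for row in rows)

def project(r, attrs):
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)

def join(r, s):
    """Natural join of two relations."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return frozenset(out)

HEADING = ("AREA#", "PROD#", "REP#")
AREA_IDS = rel(("AREA#",), ("A1",), ("A2",))     # stands in for AREA { AREA# }
PRODUCT_IDS = rel(("PROD#",), ("P1",), ("P2",))  # stands in for PRODUCT { PROD# }

def key_holds(apr):
    # KEY { AREA#, PROD# }: no two tuples share an {AREA#, PROD#} value.
    return len(project(apr, {"AREA#", "PROD#"})) == len(apr)

def c1(apr):
    # APR { AREA#, PROD# } = AREA { AREA# } JOIN PRODUCT { PROD# }
    return project(apr, {"AREA#", "PROD#"}) == join(AREA_IDS, PRODUCT_IDS)

def c2(apr):
    # APR = APR { REP#, AREA# } JOIN APR { REP#, PROD# }, i.e. REP# ->-> AREA# | PROD#
    return apr == join(project(apr, {"REP#", "AREA#"}), project(apr, {"REP#", "PROD#"}))

good = rel(HEADING, ("A1", "P1", "R1"), ("A2", "P1", "R1"),
                    ("A1", "P2", "R2"), ("A2", "P2", "R2"))
# R2 sells P1 in A2 and works in A1, so C2 would require R2 to sell P1 in A1
# too, but R1 is the representative for P1 in A1.
bad = rel(HEADING, ("A1", "P1", "R1"), ("A2", "P1", "R2"),
                   ("A1", "P2", "R2"), ("A2", "P2", "R2"))

for name, apr in (("good", good), ("bad", bad)):
    print(name, key_holds(apr), c1(apr), c2(apr))
# good True True True
# bad True True False
```

The "bad" value satisfies both the key and C1, but it fails C2. This illustrates why C2 has to be stated explicitly: it isn't implied by the key.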
13.4 The revision is straightforward: all that's necessary is to replace the references to FDs and BCNF by analogous references to MVDs and 4NF, thus:
1. Initialize D to contain just R.
2. For each non4NF relvar T in D, execute Steps 3 and 4.
3. Let X →→ Y be an MVD for T that violates the requirements
for 4NF.
4. Replace T in D by two of its projections, that over X and Y and that over all attributes except those in Y.
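You can try out the revised procedure with the minimal Python sketch below. It illustrates the idea under simplifying assumptions and is not a design tool. The candidate MVDs are supplied by hand (a hypothetical `mvds` list of (X, Y) pairs). Both the MVD test and the superkey test are evaluated against a single sample relation value, rather than derived from declared dependencies. A sample can therefore only suggest that an MVD holds; it can never prove it. The step numbers in the comments refer to Steps 1 to 4 of the procedure.

```python
# A relation body is a frozenset of tuples; a tuple is a frozenset of (attribute, value) pairs.
def project(body, attrs):
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in body)

def join(r, s):
    """Natural join of two relation bodies."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return frozenset(out)

def violates_4nf(heading, body, x, y):
    """Step 3 test: X ->-> Y is nontrivial, holds in this value, and X is no superkey."""
    if not y or x | y == heading:  # trivial MVDs never violate 4NF
        return False
    holds = body == join(project(body, x | y), project(body, heading - y))
    is_superkey = len(project(body, x)) == len(body)
    return holds and not is_superkey

def decompose_4nf(heading, body, mvds):
    d = [(frozenset(heading), body)]  # Step 1: D contains just R
    changed = True
    while changed:  # Step 2: repeat while some T in D is not in 4NF
        changed = False
        for i, (h, b) in enumerate(d):
            for x, y in mvds:
                x = frozenset(x)
                if not x <= h:
                    continue
                y = (frozenset(y) & h) - x  # restrict the candidate MVD to T's heading
                if violates_4nf(h, b, x, y):  # Step 3
                    xy, rest = x | y, h - y
                    # Step 4: replace T by its projections over X u Y and over all but Y
                    d[i:i + 1] = [(xy, project(b, xy)), (rest, project(b, rest))]
                    changed = True
                    break
            if changed:
                break
    return d

CTX_HEADING = ("COURSE", "TEACHER", "TEXT")
ctx = frozenset(frozenset(zip(CTX_HEADING, row)) for row in [
    ("Physics", "Green", "Mechanics"), ("Physics", "Green", "Optics"),
    ("Physics", "Brown", "Mechanics"), ("Physics", "Brown", "Optics")])
result = decompose_4nf(CTX_HEADING, ctx, [({"COURSE"}, {"TEACHER"})])
print(sorted(sorted(h) for h, _ in result))
# [['COURSE', 'TEACHER'], ['COURSE', 'TEXT']]
```

Because each MVD used in Step 4 is nontrivial, both replacement headings are strictly smaller than T's, so the loop always terminates.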
13.5 This is a "cyclic constraint" example. The following design
is suitable:
REP { REP#, ... }
KEY { REP# }
AREA { AREA#, ... }
KEY { AREA# }
PRODUCT { PROD#, ... }
KEY { PROD# }
RA { REP#, AREA# }
KEY { REP#, AREA# }
AP { AREA#, PROD# }
KEY { AREA#, PROD# }
PR { PROD#, REP# }
KEY { PROD#, REP# }
Also, the user needs to be informed that the join of RA, AP, and
PR does not involve any "connection trap":
CONSTRAINT NO_TRAP
( RA JOIN AP JOIN PR ) { REP#, AREA# } = RA AND
( RA JOIN AP JOIN PR ) { AREA#, PROD# } = AP AND
( RA JOIN AP JOIN PR ) { PROD#, REP# } = PR ;
Note: As with Exercise 13.3, you might like to consider
whether a design involving RVAs might be appropriate for the
problem under consideration.
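The NO_TRAP constraint can be exercised with the minimal Python sketch below. It computes RA JOIN AP JOIN PR and checks that each of the three projections gives back the corresponding relvar. The identifiers are hypothetical sample values. The second data set is a hypothetical variant in which R2 works in A1 and A1 stocks P1, but R2 doesn't sell P1. In that case the pair (R2, A1) is lost from the three-way join, so the constraint fails.

```python
# A relation is a frozenset of tuples; a tuple is a frozenset of (attribute, value) pairs.
def rel(heading, *rows):
    return frozenset(frozenset(zip(heading, row)) for row in rows)

def project(r, attrs):
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)

def join(r, s):
    """Natural join of two relations."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return frozenset(out)

def no_trap(ra, ap, pr):
    # Each binary relvar must be recoverable as a projection of RA JOIN AP JOIN PR.
    j = join(join(ra, ap), pr)
    return (project(j, {"REP#", "AREA#"}) == ra and
            project(j, {"AREA#", "PROD#"}) == ap and
            project(j, {"PROD#", "REP#"}) == pr)

ra = rel(("REP#", "AREA#"), ("R1", "A1"), ("R2", "A1"))
ap = rel(("AREA#", "PROD#"), ("A1", "P1"), ("A1", "P2"))
pr = rel(("PROD#", "REP#"), ("P1", "R1"), ("P2", "R2"))
print(no_trap(ra, ap, pr))  # True

ap2 = rel(("AREA#", "PROD#"), ("A1", "P1"))
pr2 = rel(("PROD#", "REP#"), ("P1", "R1"))
print(no_trap(ra, ap2, pr2))  # False
```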
13.6 Perhaps surprisingly, the design does conform to
normalization principles! First, SX and SY are both in 5NF.
Second, the original suppliers relvar can be reconstructed by
joining SX and SY back together. Third, neither SX nor SY is
redundant in that reconstruction process. Fourth, SX and SY are independent in Rissanen's sense.
Despite the foregoing observations, the design is very bad, of course; to be specific, it involves some obviously undesirable redundancy. But the design isn't bad because it violates the
principles of normalization; rather, it's bad because it violates
The Principle of Orthogonal Design, as explained in Section 13.6.
Thus, we see that following the principles of normalization is
necessary but not sufficient to ensure a good design. We also see
that (as stated in Section 13.6) the principles of normalization
and The Principle of Orthogonal Design complement each other, in a
sense.
Appendix (DK/NF)
This appendix consists (apart from this introductory paragraph) of the text, slightly edited here, of a message posted on the website
www.dbdebunk.com in May 2003. It's my response to a question from
someone I'll refer to here as Victor.
(Begin quote)
Victor has "trouble understanding domain-key normal form
(DK/NF)." I don't blame him; there's certainly been some serious nonsense published on this topic in the trade press and elsewhere. Let me see if I can clarify matters.
DK/NF is best thought of as a straw man (sorry, straw person).
It was introduced by Ron Fagin in his paper "A Normal Form for
Relational Databases that Is Based on Domains and Keys," ACM TODS
6, No. 3 (September 1981). As Victor says (more or less), Fagin defines a relvar R to be in DK/NF if and only if every constraint
on R is a logical consequence of what he (Fagin) calls the domain constraints and key constraints on R. Here:
• A domain constraint (better called an attribute
constraint) is simply a constraint to the effect that a given
attribute A of R takes its values from some given domain D.
• A key constraint is simply a constraint to the effect that a
given set A, B, ..., C of attributes of R constitutes a key for R.
Thus, if R is in DK/NF, then it is sufficient to enforce the
domain and key constraints for R, and all constraints on R will be
enforced automatically. And enforcing those domain and key
constraints is, of course, very simple (most DBMS products do it already). To be specific, enforcing domain constraints just means checking that attribute values are always values from the
applicable domain (i.e., values of the right type); enforcing key constraints just means checking that key values are unique.
The trouble is, lots of relvars aren't in DK/NF in the first
place. For example, suppose there's a constraint on R to the
effect that R must contain at least ten tuples. Then that
constraint is certainly not a consequence of the domain and key
constraints that apply to R, and so R isn't in DK/NF. The sad
fact is, not all relvars can be reduced to DK/NF; nor do we know
the answer to the question "Exactly when can a relvar be so
reduced?"
Now, it's true that Fagin proves in his paper that if relvar R
is in DK/NF, then R is automatically in 5NF (and hence 4NF, BCNF,
etc.) as well. However, it's wrong to think of DK/NF as another step in the progression from 1NF to 2NF to ... to 5NF, because 5NF
is always achievable, but DK/NF is not.
It's also wrong to say there are "no normal forms higher than
DK/NF." In recent work of my own, documented in the book Temporal Data and the Relational Model, by myself with Hugh Darwen and
Nikos Lorentzos (Morgan Kaufmann, 2003), my coworkers and I have
come up with a new sixth normal form, 6NF. 6NF is higher than 5NF
(all 6NF relvars are in 5NF, but the converse isn't true);
moreover, 6NF is always achievable, but it isn't implied by DK/NF.
In other words, there are relvars in DK/NF that aren't in 6NF. A trivial example is:
EMP { EMP#, DEPT#, SALARY } KEY { EMP# }
(with the obvious semantics)
Victor also asks: "If a [relvar] has an atomic primary key
and is in 3NF, is it automatically in DK/NF?" No. If the EMP
relvar just shown is subject to the constraint that there must be
at least ten employees, then EMP is in 3NF (and in fact 5NF) but not DK/NF. (Incidentally, this example also answers another of Victor's questions: "Can [we] give an example of a [relvar]
that's in 5NF but not in DK/NF?") Note: I'm assuming here
that the term "atomic key" means what would more correctly be
called a simple key (meaning it doesn't involve more than one
attribute). I'm also assuming that the relvar in question has
just one key, which we might harmlessly regard as the "primary"
key. If either of these assumptions is invalid, the answer to the original question is probably "no" even more strongly!
The net of all of the above is that DK/NF is (at least at the time of writing) a concept that's of some considerable theoretical interest but not yet of much practical ditto. The reason is that, while it would be nice if all relvars in the database were in
DK/NF, we know that goal is impossible to achieve in general, nor
do we know when it is possible. For practical purposes, stick to
5NF (and 6NF). Hope this helps!
(End quote)
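The enforcement point made in the quoted message can be illustrated with a minimal Python sketch. The class below is a hypothetical in-memory stand-in for the EMP relvar shown in the message, with assumed attribute types. It enforces only domain (attribute) constraints and the key constraint. The separate "at least ten employees" check represents a constraint that those two kinds of checks never catch, which is exactly why such a relvar isn't in DK/NF.

```python
class ConstraintViolation(Exception):
    pass

class EmpRelvar:
    """Hypothetical in-memory stand-in for EMP { EMP#, DEPT#, SALARY } KEY { EMP# }."""
    TYPES = {"EMP#": str, "DEPT#": str, "SALARY": int}  # assumed attribute types
    KEY = ("EMP#",)

    def __init__(self):
        self.body = set()

    def insert(self, tup):
        # Domain (attribute) constraints: correct heading, values of the right type.
        if set(tup) != set(self.TYPES):
            raise ConstraintViolation("wrong heading")
        for attr, typ in self.TYPES.items():
            if not isinstance(tup[attr], typ):
                raise ConstraintViolation(f"{attr} must be a {typ.__name__}")
        # Key constraint: key values must be unique.
        key = tuple(tup[a] for a in self.KEY)
        if any(tuple(dict(t)[a] for a in self.KEY) == key for t in self.body):
            raise ConstraintViolation(f"duplicate key {key}")
        self.body.add(frozenset(tup.items()))

def at_least_ten(body):
    # Not a consequence of the domain and key constraints above, so nothing in
    # insert() enforces it; it would have to be checked separately.
    return len(body) >= 10

emp = EmpRelvar()
for i in range(3):
    emp.insert({"EMP#": f"E{i}", "DEPT#": "D1", "SALARY": 50000})

try:
    emp.insert({"EMP#": "E0", "DEPT#": "D2", "SALARY": 60000})
except ConstraintViolation as e:
    print(e)  # duplicate key ('E0',)

print(len(emp.body), at_least_ten(emp.body))  # 3 False
```

Every insert passes the domain and key checks, yet the relvar still violates the cardinality constraint. No combination of those checks can detect that violation.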
Chapter 14
Principal Sections
• The overall approach
• The E/R model
• E/R diagrams
• DB design with the E/R model
• A brief analysis
General Remarks
The field of "semantic modeling" encompasses more than just
database design, but for obvious reasons the emphasis in this
chapter is on database design aspects (though the first two
sections do consider the wider perspective briefly, and so does the annotation to several of the references at the end of the
chapter). The chapter shouldn't be skipped, but portions of it might be skipped. You could also beef up the treatment of "E/R modeling" if you like.
Let me repeat the following remarks from the preface to this manual:
You could also read Chapter 14 earlier if you like, possibly right after Chapter 4. Many instructors like to treat the entity/relationship material much earlier than I do. For that reason I've tried to make Chapter 14 more or less
self-contained, so that it can be read "early" if you like
And the expanded version of these remarks from the preface to the book itself:
Some reviewers of earlier editions complained that database design issues were treated too late. But it's my feeling that students aren't ready to design databases properly or to
appreciate design issues fully until they have some
understanding of what databases are and how they're used; in other words, I believe it's important to spend some time on the relational model and related matters before exposing the student to design questions. Thus, I still believe Part III
is in the right place. (That said, I do recognize that many instructors prefer to treat the entity/relationship material much earlier. To that end, I've tried to make Chapter 14 more