An Introduction to Database Systems, 8th Edition - C. J. Date - Solutions Manual, Episode 1, Part 6


( P WHERE COLOR = COLOR ('Purple') ) { P# }

PER SP { S#, P# }

• Again stress the usefulness of WITH in breaking complex expressions down into "step-at-a-time" ones. Also stress the fact that using WITH does not sacrifice nonprocedurality.

Note: The discussion of projection in Section 7.4 includes the following question: Why can't any attribute be mentioned more than once in the attribute name commalist? The answer, of course, is that the commalist is supposed to denote a set of attributes, and attribute names in the result must therefore be unique.

The discussion under Example 7.5.5 includes the following

text:

(Begin quote)

The purpose of the condition SA < SB is twofold:

• It eliminates pairs of supplier numbers of the form (x,x).

• It guarantees that the pairs (x,y) and (y,x) won't both appear.

(End quote)

The example might thus be used, if desired, to introduce the

concepts of:

• Reflexivity: A binary relation R{A,B} is reflexive if and only if A and B are of the same type and the tuple {A:x,B:x} appears in R for all applicable values x of that common type.

• Symmetry: A binary relation R{A,B} is symmetric if and only if A and B are of the same type and whenever the tuple {A:x,B:y} appears in R, then the tuple {A:y,B:x} also appears in R.

See, e.g., reference [24.1] for further discussion.
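If it helps to make the two definitions concrete in class, here is a minimal Python sketch. It is a simplification of Example 7.5.5, not a transcription of it: the supplier numbers are made up, and the relation is taken to be all ordered pairs of them (the city condition is ignored).

```python
# Hypothetical supplier numbers, used purely for illustration.
suppliers = {"S1", "S2", "S3"}

# All ordered pairs (SA, SB): this relation is both reflexive and symmetric.
all_pairs = {(a, b) for a in suppliers for b in suppliers}

def is_reflexive(pairs, domain):
    """True if (x, x) appears for every x in the domain."""
    return all((x, x) in pairs for x in domain)

def is_symmetric(pairs):
    """True if (y, x) appears whenever (x, y) does."""
    return all((y, x) in pairs for (x, y) in pairs)

# The restriction SA < SB drops every (x, x) and keeps one of each (x, y)/(y, x).
restricted = {(a, b) for (a, b) in all_pairs if a < b}

print(is_reflexive(all_pairs, suppliers), is_symmetric(all_pairs))
print(sorted(restricted))
```

After the restriction the relation is neither reflexive nor symmetric, which is exactly the twofold purpose quoted above.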

7.6 What's the Algebra for?

Dispel the popular misconception that the algebra (or the calculus) is just for queries. Note in particular that the algebra or calculus is fundamentally required in order to be able to express integrity constraints, which is why Chapters 7 and 8 precede Chapter 9.


Regarding relational completeness: The point is worth making that, once Codd had defined this notion of linguistic power, it really became incumbent on the designer of any database language either to ensure that the language in question was at least that powerful or to have a really good justification for not doing so. And there really isn't any good justification. This fact is a cogent criticism of several nonrelational database languages, including object ones in particular and, I strongly suspect, XML query languages (see Chapter 27).

Regarding primitive operators: The "RISC algebra" A is worth a mention.

The section includes the following inline exercise: The

expression

( ( SP JOIN S ) WHERE P# = P# ( 'P2' ) ) { SNAME }

can be transformed into the logically equivalent, but probably more efficient, expression

( ( SP WHERE P# = P# ( 'P2' ) ) JOIN S ) { SNAME }

In what sense is the second expression probably more efficient? Why only "probably"?

Answer: The second expression performs the restriction before the join. Loosely speaking, therefore, it reduces the size of the input to the join, meaning there's less data to be scanned to do the join, and the result of the join is smaller as well. In fact, the second expression might allow the result of the join to be kept in main memory, while the first might not; thus, there could be orders of magnitude difference in performance between the two expressions.

On the other hand, recall that the relational model has nothing to say regarding physical storage. Thus, for example, the join of SP and S might be physically stored as a single file, in which case the first expression might perform better. Also, there will be little performance difference between the two expressions anyway if the relations are small (as an extreme case, consider what happens if they're both empty).
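The "less input to the join" point can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: relations are modelled as sets of frozensets of (attribute, value) pairs, the helper names are invented, and the data is a cut-down sample. It says nothing about how a real optimizer or storage layer behaves.

```python
def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def join(r, s):
    """Natural join: pair up tuples that agree on every common attribute."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out

def restrict(r, pred):
    """Restriction: keep the tuples for which pred holds."""
    return {t for t in r if pred(dict(t))}

def project(r, attrs):
    """Projection over the named attributes (duplicates vanish because r is a set)."""
    return {frozenset((a, v) for (a, v) in t if a in attrs) for t in r}

# Cut-down sample data (illustrative values only).
S = rel({"S#": "S1", "SNAME": "Smith"}, {"S#": "S2", "SNAME": "Jones"},
        {"S#": "S3", "SNAME": "Blake"})
SP = rel({"S#": "S1", "P#": "P1"}, {"S#": "S1", "P#": "P2"},
         {"S#": "S2", "P#": "P1"}, {"S#": "S2", "P#": "P2"},
         {"S#": "S3", "P#": "P2"})

def is_p2(t):
    return t["P#"] == "P2"

joined_first = join(SP, S)                  # the join sees every SP tuple
first = project(restrict(joined_first, is_p2), {"SNAME"})

restricted_first = restrict(SP, is_p2)      # the join sees only the P2 tuples
second = project(join(restricted_first, S), {"SNAME"})

print(first == second, len(joined_first), len(restricted_first))
```

Both formulations give the same supplier names; the only difference visible in this toy model is the size of the intermediate result fed to (or produced by) the join.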

This might be a good place to digress for a couple of minutes to explain why duplicate tuples inhibit optimization! A detailed example and discussion can be found in reference [6.6]. That same paper also refutes the claim that a "tuple-bag algebra" is "just as respectable" (and in particular just as optimizable) as the relational algebra.


7.7 Further Points

Explain associativity and commutativity briefly and show which operators are associative and which commutative. Discuss some of the implications. Note: One such implication, not explicitly mentioned in the book, is that we can legitimately talk about (e.g.) the join of any number of relations (i.e., such an expression does have a well-defined unique meaning).

Also explain the specified equivalences, especially the ones involving TABLE_DEE. Introduce the terms "identity restriction," etc. Define "joins" (etc.) of one relation and of no relations at all. I wouldn't bother to get into the specifics of why the join of no relations and the intersection of no relations aren't the same! But if you're interested, an explanation can be found in Chapter 1 of reference [23.4]. See also Exercise 7.10.
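For anyone who wants to see these points run, the following Python sketch checks commutativity and associativity of join on a tiny sample and shows TABLE_DEE acting as the identity. The representation (sets of frozensets of attribute/value pairs), the helper names, and the data are all assumptions for illustration; note in particular that this toy model carries no headings, so it cannot distinguish empty relations of different types.

```python
from functools import reduce

def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def join(r, s):
    """Natural join: pair up tuples that agree on every common attribute."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out

DEE = {frozenset()}  # no attributes, exactly one (empty) tuple

def join_all(relations):
    """Join of any number of relations; with no operands the result is DEE."""
    return reduce(join, relations, DEE)

# Illustrative sample data.
S = rel({"S#": "S1", "CITY": "London"}, {"S#": "S2", "CITY": "Paris"})
SP = rel({"S#": "S1", "P#": "P1"}, {"S#": "S2", "P#": "P2"}, {"S#": "S2", "P#": "P1"})
P = rel({"P#": "P1", "CITY": "London"}, {"P#": "P2", "CITY": "Paris"})

print(join(S, SP) == join(SP, S))                    # commutative on this sample
print(join(join(S, SP), P) == join(S, join(SP, P)))  # associative on this sample
print(join(S, DEE) == S, join_all([]) == DEE, join_all([S]) == S)
```

Because the grouping and order of operands don't matter, `join_all` can fold over a list of any length, which is the sense in which "the join of any number of relations" is well defined.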

7.8 Additional Operators

Regarding semijoin: It's worth noting that semijoin is often more directly useful in practice than join is! A typical query is "Get suppliers who supply part P2." Using SEMIJOIN:

S SEMIJOIN ( SP WHERE P# = P# ( 'P2' ) )

Without SEMIJOIN:

( S JOIN ( SP WHERE P# = P# ( 'P2' ) ) )

{ S#, SNAME, STATUS, CITY }

It might be helpful to point out that the SQL analog refers to table S (only) in the SELECT and FROM clauses and mentions table

SP only in the WHERE clause:

SELECT *

FROM S

WHERE S# IN

( SELECT S#

FROM SP WHERE P# = 'P2' ) ;

In a sense, this SQL expression corresponds more directly to the semijoin formulation than to the join one.

Analogous remarks apply to semidifference.
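The equivalence between the two formulations can be checked with a short Python sketch. The relation representation, helper names, and sample rows below are assumptions chosen for illustration, not anything prescribed by the book.

```python
def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def matches(t, u):
    """True if tuples t and u agree on every attribute they have in common."""
    dt, du = dict(t), dict(u)
    return all(dt[a] == du[a] for a in dt.keys() & du.keys())

def join(r, s):
    """Natural join."""
    return {frozenset({**dict(t), **dict(u)}.items())
            for t in r for u in s if matches(t, u)}

def project(r, attrs):
    """Projection over the named attributes."""
    return {frozenset((a, v) for (a, v) in t if a in attrs) for t in r}

def semijoin(r, s):
    """r SEMIJOIN s: the tuples of r that match at least one tuple of s."""
    return {t for t in r if any(matches(t, u) for u in s)}

# Illustrative sample data.
S = rel({"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
        {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
        {"S#": "S5", "SNAME": "Adams", "STATUS": 30, "CITY": "Athens"})
SP = rel({"S#": "S1", "P#": "P1"}, {"S#": "S1", "P#": "P2"}, {"S#": "S2", "P#": "P2"})

sp_p2 = {t for t in SP if dict(t)["P#"] == "P2"}

with_semijoin = semijoin(S, sp_p2)
without_semijoin = project(join(S, sp_p2), {"S#", "SNAME", "STATUS", "CITY"})
print(with_semijoin == without_semijoin)
```

Note that `semijoin` never builds the wider joined tuples at all, which mirrors the way the SQL version mentions SP only inside the WHERE clause.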


Regarding extend: EXTEND is one of the most useful operators of all. Consider the query "Get parts and their weight in grams for parts whose gram weight exceeds 10000" (recall that part weights are given in pounds). Relational algebra formulation:*

( EXTEND P ADD ( WEIGHT * 454 ) AS GMWT )

WHERE GMWT > WEIGHT ( 10000.0 )

Conventional SQL analog (note the repeated subexpression):

SELECT P.*, ( WEIGHT * 454 ) AS GMWT

FROM P

WHERE ( WEIGHT * 454 ) > 10000.0 ;

The name GMWT cannot be used in the WHERE clause because it's the name of a column of the result table, and the WHERE clause is (conceptually) evaluated before that result table is formed.

──────────

* The discussion of EXTEND in the book asks what the type of the result of the expression WEIGHT * 454 is. As this formulation suggests, the answer is, obviously enough, WEIGHT once again. However, if we assume (as we're supposed to) that WEIGHT values are given in pounds, then the result of WEIGHT * 454 presumably has to be interpreted as a weight in pounds, too, not as a weight in grams! Clearly something strange is going on here. See the discussion of units of measure in Chapter 5, Section 5.4.

──────────

As this example suggests, the SQL idea that all queries must be expressed as a projection (SELECT) of a restriction (WHERE) of a product (FROM) is really much too rigid, and of course there's no such limitation in the relational algebra: operations can be combined in arbitrary ways and executed in arbitrary sequences.
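The "extend first, then restrict on the new attribute" shape of the algebraic formulation can be mimicked in a few lines of Python. This is only a sketch: relations are plain lists of dicts, the `extend` helper is invented for the purpose, and part P9 is a made-up row added so that the restriction has something to return.

```python
def extend(r, name, fn):
    """EXTEND: add attribute `name` to every tuple, computed from that tuple by fn."""
    return [{**t, name: fn(t)} for t in r]

# Weights in pounds; P9 is a hypothetical part, not from the book's sample data.
P = [
    {"P#": "P1", "WEIGHT": 12.0},
    {"P#": "P2", "WEIGHT": 17.0},
    {"P#": "P9", "WEIGHT": 25.0},
]

# Step 1: compute GMWT once. Step 2: restrict on the name GMWT directly.
with_grams = extend(P, "GMWT", lambda t: t["WEIGHT"] * 454)
heavy = [t for t in with_grams if t["GMWT"] > 10000.0]

print(heavy)
```

Because the restriction is applied to the already extended relation, the expression WEIGHT * 454 appears only once, in contrast with the conventional SQL analog.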

Note: It's true that the SQL standard would now allow the repetition of the subexpression to be avoided as follows:

SELECT P#, PNAME, COLOR, WEIGHT, CITY, GMWT

FROM ( SELECT P.*, ( WEIGHT * 454 ) AS GMWT

FROM P ) AS POINTLESS

WHERE GMWT > 10000.0 ;

(The specification AS POINTLESS is pointless but is required by SQL's syntax rules; see reference [4.20].) However, not all SQL products permit subqueries in the FROM clause at the time of writing. Note too that a select-item of the form "P.*" in the outer SELECT clause would be illegal in this formulation! See reference [4.20] for further discussion of this point also.

Note: The subsection on EXTEND is also the place where the aggregate operators COUNT, SUM, etc., are first mentioned. Observe the important differences (both syntactic and semantic) in the treatment of such operators between Tutorial D and SQL. Note too the aggregate operators ALL and ANY, both of which operate on arguments consisting of boolean values; ALL returns TRUE if and only if all arguments evaluate to TRUE, ANY returns TRUE if and only if any argument does.

Regarding summarize: As the book says, please note that a <summarize add> is not the same thing as an <aggregate operator invocation>. An <aggregate operator invocation> is a scalar expression and can appear wherever a scalar selector invocation (in particular, a scalar literal) can appear. A <summarize add>, by contrast, is merely a SUMMARIZE operand; it's not a scalar expression, it has no meaning outside the context of SUMMARIZE, and in fact it can't appear outside that context. Note the two forms of SUMMARIZE (PER and BY).

Regarding tclose: Don't go into much detail. The operator is mentioned here mainly for completeness. Do note, though, that it really is a new primitive: it can't be defined in terms of operators we've already discussed. (Explain why? See the answer to Exercise 8.7 in the next chapter.)
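If a concrete picture of TCLOSE is wanted, here is a minimal Python sketch. It assumes a binary relation represented as a set of (x, y) pairs, and the part numbers are invented; it is one straightforward way to compute a transitive closure, not the book's definition.

```python
def tclose(pairs):
    """Transitive closure of a binary relation given as a set of (x, y) pairs."""
    closure = set(pairs)
    while True:
        # Compose the current closure with itself: (x, y) and (y, z) give (x, z).
        new = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
        if new <= closure:      # nothing new was found, so we are done
            return closure
        closure |= new

# Hypothetical "immediate component" pairs.
structure = {("P1", "P2"), ("P2", "P3"), ("P3", "P4")}
print(sorted(tclose(structure)))
```

The loop repeats until no new pairs appear, so the number of compose-and-union steps depends on the data rather than being fixed in advance; that is at least suggestive of why no single fixed expression over the earlier operators does the job.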

7.9 Grouping and Ungrouping

This section could either be deferred or assigned as background reading.* Certainly the remarks on reversibility shouldn't be gone into too closely on a first pass. Perhaps just say that since we allow relation-valued attributes, we need a way of mapping between relations with such attributes and relations without them, and that's what GROUP and UNGROUP are for. Show an ungrouped relation and its grouped counterpart; that's probably sufficient.

──────────

* The article "What Does First Normal Form Really Mean?" (already mentioned in Chapter 6 of this manual) is relevant.

──────────
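One way to show "an ungrouped relation and its grouped counterpart" is the following Python sketch. The representation (tuples as frozensets of attribute/value pairs, with a nested frozenset standing in for a relation-valued attribute), the function names, and the sample rows are all assumptions for illustration.

```python
def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def group(r, attrs, new_name):
    """GROUP: gather the `attrs` part of the tuples into a relation-valued attribute."""
    groups = {}
    for t in r:
        key = frozenset((a, v) for (a, v) in t if a not in attrs)
        inner = frozenset((a, v) for (a, v) in t if a in attrs)
        groups.setdefault(key, set()).add(inner)
    return {key | {(new_name, frozenset(inner))} for key, inner in groups.items()}

def ungroup(r, name):
    """UNGROUP: flatten the relation-valued attribute `name` back into plain tuples."""
    out = set()
    for t in r:
        d = dict(t)
        inner_relation = d.pop(name)
        for inner in inner_relation:
            out.add(frozenset(d.items()) | inner)
    return out

# Illustrative sample data.
SP = rel({"S#": "S1", "P#": "P1", "QTY": 300},
         {"S#": "S1", "P#": "P2", "QTY": 200},
         {"S#": "S2", "P#": "P1", "QTY": 300})

grouped = group(SP, {"P#", "QTY"}, "PQ")
print(len(grouped))                   # one tuple per supplier
print(ungroup(grouped, "PQ") == SP)   # ungrouping restores SP in this case
```

The result of `group` is itself just a set of tuples, i.e., a relation, whose PQ values happen to be relations.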


Note clearly that "grouping" as described here is not the same thing as the GROUP BY operation in SQL: it returns a relation (with a relation-valued attribute), not an SQL-style "grouped table." In fact, SQL's GROUP BY violates the relational closure property.

Relations with relation-valued attributes are not "NF² relations"! In fact, it's hard to say exactly what "NF² relations" are; the concept doesn't seem too coherent when you really poke into it. (Certainly we don't need all of the additional operators, and additional complexity, that "NF² relations" seem to involve.)

Answers to Exercises

7.1 The only operators whose definitions don't rely on tuple equality are restrict, Cartesian product, extend, and ungroup. (Even these cases are debatable, as a matter of fact.)

7.2 The trap is that the join involves the CITY attributes as well as the S# and P# attributes. The result looks like this:

┌────┬───────┬────────┬────────┬────┬─────┬───────┬───────┬────────┐
│ S# │ SNAME │ STATUS │ CITY   │ P# │ QTY │ PNAME │ COLOR │ WEIGHT │
├════┼───────┼────────┼────────┼════┼─────┼───────┼───────┼────────┤
│ S1 │ Smith │     20 │ London │ P1 │ 300 │ Nut   │ Red   │   12.0 │
│ S1 │ Smith │     20 │ London │ P4 │ 200 │ Screw │ Red   │   14.0 │
│ S1 │ Smith │     20 │ London │ P6 │ 100 │ Cog   │ Red   │   19.0 │
│ S2 │ Jones │     10 │ Paris  │ P2 │ 400 │ Bolt  │ Green │   17.0 │
│ S3 │ Blake │     30 │ Paris  │ P2 │ 200 │ Bolt  │ Green │   17.0 │
│ S4 │ Clark │     20 │ London │ P4 │ 200 │ Screw │ Red   │   14.0 │
└────┴───────┴────────┴────────┴────┴─────┴───────┴───────┴────────┘

7.3 2ⁿ. This count includes the identity projection (i.e., the projection over all n attributes), which yields a result identical to the original relation r, and the nullary projection (i.e., the projection over no attributes at all), which yields TABLE_DUM if the original relation r is empty and TABLE_DEE otherwise.
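The count can be confirmed by enumeration. The Python sketch below is illustrative only: it assumes a four-attribute heading (so n = 4), models relations as sets of frozensets of attribute/value pairs, and uses invented helper names.

```python
from itertools import combinations

def all_attribute_subsets(heading):
    """Every subset of the heading; each one determines a distinct projection."""
    attrs = sorted(heading)
    return [set(c) for k in range(len(attrs) + 1) for c in combinations(attrs, k)]

def project(r, attrs):
    """Projection over the named attributes."""
    return {frozenset((a, v) for (a, v) in t if a in attrs) for t in r}

heading = {"S#", "SNAME", "STATUS", "CITY"}
subsets = all_attribute_subsets(heading)
print(len(subsets) == 2 ** len(heading))

# A one-tuple sample relation with that heading.
r = {frozenset({"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"}.items())}

DEE = {frozenset()}   # no attributes, one tuple
DUM = set()           # no attributes, no tuples

print(project(r, heading) == r)        # identity projection
print(project(r, set()) == DEE)        # nullary projection, r nonempty
print(project(set(), set()) == DUM)    # nullary projection, r empty
```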

7.4 INTERSECT and TIMES are both special cases of JOIN, so we can ignore them here. The commutativity of UNION and JOIN is obvious from the definitions, which are symmetric in the two relations concerned. We can show that UNION is associative as follows. Let t be a tuple. Then:*

t ∈ A UNION (B UNION C) iff t ∈ A OR t ∈ (B UNION C),

i.e., iff t ∈ A OR (t ∈ B OR t ∈ C),

i.e., iff (t ∈ A OR t ∈ B) OR t ∈ C,

i.e., iff t ∈ (A UNION B) OR t ∈ C,

i.e., iff t ∈ (A UNION B) UNION C.

Note the appeal in the third line to the associativity of OR.

──────────

* The shorthand "iff" stands for "if and only if."

──────────

The proof that JOIN is associative is analogous.

7.5 We omit the verifications, which are straightforward. The answer to the last part of the exercise is b SEMIJOIN a.

7.6 JOIN is discussed in Section 7.4. INTERSECT can be defined as follows:

A INTERSECT B ≡ A MINUS ( A MINUS B )

or (equally well)

A INTERSECT B ≡ B MINUS ( B MINUS A )

These equivalences, though valid, are slightly unsatisfactory, since A INTERSECT B is symmetric in A and B and the other two expressions aren't. Here by contrast is a symmetric equivalent:

( A MINUS ( A MINUS B ) ) UNION ( B MINUS ( B MINUS A ) )

Note: Given that A and B must be of the same type, we also have:

A INTERSECT B ≡ A JOIN B
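The MINUS-based definitions are easy to spot-check with Python's built-in sets, where `-` plays the role of MINUS, `|` of UNION, and `&` of INTERSECT. The two sample relations below are made up, and their tuples are written as plain Python tuples of the same type.

```python
# Two sample relations of the same type (illustrative values).
A = {("S1", "London"), ("S2", "Paris"), ("S3", "Paris")}
B = {("S2", "Paris"), ("S3", "Paris"), ("S4", "Athens")}

via_a = A - (A - B)                            # A MINUS ( A MINUS B )
via_b = B - (B - A)                            # B MINUS ( B MINUS A )
symmetric = (A - (A - B)) | (B - (B - A))      # the symmetric formulation

print(via_a == via_b == symmetric == (A & B))
```

A check on one pair of relations is of course not a proof, but it makes a handy sanity test when presenting the equivalences.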

As for DIVIDEBY, we have:

A DIVIDEBY B PER C ≡ A { X }

MINUS ( ( A { X } TIMES B { Y } )

MINUS C { X, Y } ) { X }

Here X is the set of attributes common to A and C and Y is the set of attributes common to B and C.

Note: DIVIDEBY as just defined is actually a generalization of the version defined in the body of the chapter (though it's still a Small Divide [7.4]), inasmuch as we assumed previously that A had no attributes apart from X, B had no attributes apart from Y, and C had no attributes apart from X and Y. The foregoing generalization would allow, e.g., the query "Get supplier numbers for suppliers who supply all parts" to be expressed more simply as just

S DIVIDEBY P PER SP

instead of (as previously) as

S { S# } DIVIDEBY P { P# } PER SP { S#, P# }
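The definition of the generalized DIVIDEBY translates almost symbol for symbol into Python. In the sketch below the representation, helper names, and sample rows are assumptions for illustration; X and Y are passed in explicitly because this toy model of relations doesn't carry headings.

```python
def rel(*rows):
    """Build a relation: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def project(r, attrs):
    """Projection over the named attributes."""
    return {frozenset((a, v) for (a, v) in t if a in attrs) for t in r}

def times(r, s):
    """Cartesian product of relations with no attributes in common."""
    return {t | u for t in r for u in s}

def divideby(a, b, c, x, y):
    """A DIVIDEBY B PER C, following the definition term by term."""
    ax = project(a, x)
    return ax - project(times(ax, project(b, y)) - project(c, x | y), x)

# Illustrative data; S and P carry extra attributes to exercise the generalization.
S = rel({"S#": "S1", "SNAME": "Smith"}, {"S#": "S2", "SNAME": "Jones"},
        {"S#": "S3", "SNAME": "Blake"})
P = rel({"P#": "P1", "PNAME": "Nut"}, {"P#": "P2", "PNAME": "Bolt"})
SP = rel({"S#": "S1", "P#": "P1", "QTY": 300},
         {"S#": "S1", "P#": "P2", "QTY": 200},
         {"S#": "S2", "P#": "P1", "QTY": 300})

all_parts_suppliers = divideby(S, P, SP, {"S#"}, {"P#"})
print(all_parts_suppliers == rel({"S#": "S1"}))
```

Only S1 supplies both sample parts, so only S1 survives: S2 is missing P2, and S3 supplies nothing at all.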

7.7 The short answer is no. Codd's original DIVIDEBY did satisfy the property that

( a TIMES b ) DIVIDEBY b ≡ a

so long as b is nonempty (what happens otherwise?). However:

• Codd's DIVIDEBY was a dyadic operator; our DIVIDEBY is triadic, and hence can't possibly satisfy a similar property.

• In any case, even with Codd's DIVIDEBY, dividing a by b and then forming the Cartesian product of the result with b will yield a relation that might be identical to a, but is more likely to be some proper subset of a:

( A DIVIDEBY B ) TIMES B ⊆ A

Codd's DIVIDEBY is thus more analogous to integer division in ordinary arithmetic (i.e., it ignores the remainder).
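The integer-division analogy can be demonstrated with a small Python sketch of a dyadic divide. This is an assumption-laden simplification: the dividend is a set of (x, y) pairs, the divisor a set of y values, and the function is my own rendering of the usual "for all y in b" reading of Codd's operator, with made-up data.

```python
def times(a, b):
    """Cartesian product of a set of x values with a set of y values."""
    return {(x, y) for x in a for y in b}

def codd_divide(a, b):
    """The x values of a that are paired in a with every y value in b."""
    xs = {x for (x, _) in a}
    return {x for x in xs if all((x, y) in a for y in b)}

suppliers = {"S1", "S2"}
parts = {"P1", "P2"}

# ( a TIMES b ) DIVIDEBY b gives back a when b is nonempty.
print(codd_divide(times(suppliers, parts), parts) == suppliers)

# Going the other way loses the "remainder": S2's lone shipment disappears.
sp = {("S1", "P1"), ("S1", "P2"), ("S2", "P1")}
back = times(codd_divide(sp, parts), parts)
print(back < sp)   # a proper subset of sp
```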

7.8 We can say that TABLE_DEE (DEE for short) is the analog of 1 with respect to multiplication in ordinary arithmetic because

r TIMES DEE ≡ DEE TIMES r ≡ r

for all relations r (in other words, DEE is the identity with respect to TIMES and, more generally, with respect to JOIN). However, there's no relation that behaves with respect to TIMES in a way that is exactly analogous to the way that 0 behaves with respect to multiplication, but the behavior of TABLE_DUM (DUM for short) is somewhat reminiscent of the behavior of 0, inasmuch as

r TIMES DUM ≡ DUM TIMES r ≡ an empty relation with the same heading as r

for all relations r.


We turn now to the effect of the algebraic operators on DEE and DUM. We note first that the only relations that are of the same type as DEE and DUM are DEE and DUM themselves. We have:

UNION │ DEE DUM     INTERSECT │ DEE DUM     MINUS │ DEE DUM
──────┼────────     ──────────┼────────     ──────┼────────
DEE   │ DEE DEE     DEE       │ DEE DUM     DEE   │ DUM DEE
DUM   │ DEE DUM     DUM       │ DUM DUM     DUM   │ DUM DUM

In the case of MINUS, the first operand is shown at the left and the second at the top (for the other operators, of course, the operands are interchangeable). Notice how reminiscent these tables are of the truth tables for OR, AND, and AND NOT, respectively; of course, the resemblance isn't a coincidence.
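The correspondence with the truth tables can be verified mechanically. In this Python sketch DEE is modelled as the set containing just the empty tuple and DUM as the empty set; that encoding, and the function names, are assumptions for illustration.

```python
DEE = {frozenset()}   # no attributes, one (empty) tuple: plays the role of TRUE
DUM = set()           # no attributes, no tuples: plays the role of FALSE

def truth(r):
    """Map DEE to True and DUM to False."""
    return r == DEE

def matches_truth_tables():
    """Check UNION/INTERSECT/MINUS on DEE and DUM against OR/AND/AND NOT."""
    for a in (DEE, DUM):
        for b in (DEE, DUM):
            if truth(a | b) != (truth(a) or truth(b)):
                return False
            if truth(a & b) != (truth(a) and truth(b)):
                return False
            if truth(a - b) != (truth(a) and not truth(b)):
                return False
    return True

print(matches_truth_tables())
```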

As for restrict and project, we have:

• Any restriction of DEE yields DEE if the restriction condition evaluates to TRUE, DUM if it evaluates to FALSE.

• Any restriction of DUM yields DUM.

• Projection of any relation over no attributes yields DUM if the original relation is empty, DEE otherwise. In particular, projection of DEE or DUM, necessarily over no attributes at all, returns its input.

For extend and summarize, we have:

• Extending DEE or DUM to add a new attribute yields a relation of degree one and the same cardinality as its input.

• Summarizing DEE or DUM (necessarily by no attributes at all) yields a relation of degree one and the same cardinality as its input.

Note: We omit consideration of DIVIDEBY, SEMIJOIN, and SEMIMINUS because they're not primitive. TCLOSE is irrelevant (it applies to binary relations only). We also omit consideration of GROUP and UNGROUP for obvious reasons.

7.9 No!

7.10 INTERSECT is defined only if its operand relations are all of the same type, while no such limitation applies to JOIN. It follows that, when there are no operands at all, we can define the result for JOIN generically, but we can't do the same for INTERSECT: we can define the result only for specific INTERSECT operations (i.e., INTERSECT operations that are specific to some particular relation type). In fact, when we say that INTERSECT is a special case of JOIN, what we really mean is that every specific INTERSECT is a special case of some specific JOIN. Let S_JOIN be such a specific JOIN. Then S_JOIN and JOIN aren't the same operator, and it's reasonable to say that the S_JOIN and the JOIN of no relations at all give different results.

7.11 In every case the result is a relation of degree one. If r is nonempty, all four expressions return a one-tuple relation containing the cardinality n of r. If r is empty, expressions a and c both return an empty result, while expressions b and d both return a one-tuple relation containing zero (the cardinality of r).

7.12 Relation r has the same cardinality as SP and the same heading, except that it has one additional attribute, X, which is relation-valued. The relations that are values of X have degree zero (i.e., they are nullary relations); furthermore, each of those relations is TABLE_DEE, not TABLE_DUM, because every tuple sp in SP effectively includes the 0-tuple as its value for that subtuple of sp that corresponds to the empty set of attributes. Thus, each tuple in r effectively consists of the corresponding tuple from SP extended with the X value TABLE_DEE.

The expression r UNGROUP X yields the original SP relation again.

7.13 J

7.14 J WHERE CITY = 'London'

7.15 ( SPJ WHERE J# = J# ( 'J1' ) ) { S# }

7.16 SPJ WHERE QTY ≥ QTY ( 300 ) AND QTY ≤ QTY ( 750 )

7.17 P { COLOR, CITY }

7.18 ( S JOIN P JOIN J ) { S#, P#, J# }

7.19 ( ( ( S RENAME CITY AS SCITY ) TIMES

( P RENAME CITY AS PCITY ) TIMES

( J RENAME CITY AS JCITY ) )

WHERE SCITY ≠ PCITY

OR PCITY ≠ JCITY

OR JCITY ≠ SCITY ) { S#, P#, J# }

7.20 ( ( ( S RENAME CITY AS SCITY ) TIMES

( P RENAME CITY AS PCITY ) TIMES

( J RENAME CITY AS JCITY ) )

WHERE SCITY ≠ PCITY

AND PCITY ≠ JCITY
