Database Modeling & Design, Fourth Edition



CHAPTER 4 Requirements Analysis and Conceptual Data Modeling

At this point we have sufficient commonality between schemas to attempt a merge. In schemas 1 and 2.2 we have two sets of common entities, Department and Topic-area. Other entities do not overlap and must appear intact in the superimposed, or merged, schema. The merged schema, schema 3, is shown in Figure 4.7a. Because the common entities are truly equivalent, the merge has no bad side effects from relationships that involve those entities in one schema but not in the other. (Such a relationship, which remains intact, exists in schema 1 between Topic-area and Report, for example.) If true equivalence cannot be established, the merge may not be possible in the existing form.
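The merge step can be pictured as superimposing two graphs. The Python sketch below is a minimal illustration that rests on a few assumptions. A schema is reduced to a set of entity names plus a set of named binary relationships. The designer passes in the entities already confirmed to be equivalent. The names `Schema`, `Relationship`, and `merge` are hypothetical. So are the simplified contents of schema 1 and schema 2.2, including the relationship endpoints, which are not read off Figure 4.7.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    name: str
    left: str
    right: str


@dataclass
class Schema:
    entities: set = field(default_factory=set)
    relationships: set = field(default_factory=set)


def merge(s1: Schema, s2: Schema, equivalent: set) -> Schema:
    """Superimpose two view schemas into one merged schema.

    `equivalent` holds the entity names the designer has confirmed mean the
    same thing in both views. A shared name without that confirmation is
    treated as a conflict, because the merge may not be possible as-is.
    """
    unconfirmed = (s1.entities & s2.entities) - set(equivalent)
    if unconfirmed:
        raise ValueError(f"equivalence not established for {sorted(unconfirmed)}")
    # Common entities collapse into a single node; every other entity and every
    # relationship, including ones present in only one view, is carried over intact.
    return Schema(s1.entities | s2.entities, s1.relationships | s2.relationships)


# Hypothetical, simplified stand-ins for schema 1 and schema 2.2.
schema1 = Schema(
    {"Report", "Department", "Topic-area", "Contractor"},
    {
        Relationship("contains", "Topic-area", "Report"),
        Relationship("publishes", "Department", "Report"),
        Relationship("written-for", "Report", "Contractor"),
    },
)
schema2_2 = Schema(
    {"Publication", "Department", "Topic-area"},
    {
        Relationship("includes", "Publication", "Topic-area"),
        Relationship("has", "Department", "Publication"),
    },
)

schema3 = merge(schema1, schema2_2, equivalent={"Department", "Topic-area"})
print(sorted(schema3.entities))  # ['Contractor', 'Department', 'Publication', 'Report', 'Topic-area']
```

Raising an error for an unconfirmed shared name mirrors the caveat above: without true equivalence, the sketch refuses to merge rather than silently collapsing two different concepts.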

Figure 4.7 View integration: the merged schema
(a) Schema 3, the result of merging schema 1 and schema 2.2
(b) Schema 3.1, new generalization
(c) Schema 3.2, elimination of redundant relationships

In Figure 4.7a, there is some redundancy between Publication and Report in terms of their relationships with Department and Topic-area. Such a redundancy can be eliminated if there is a supertype/subtype relationship between Publication and Report. That is in fact the case here, because Publication is a generalization of Report. In schema 3.1 (Figure 4.7b) we see the introduction of this generalization from Report to Publication. Then in schema 3.2 (Figure 4.7c) we see that the redundant relationships between Report and Department and Topic-area have been dropped. The attribute “title” has been eliminated as an attribute of Report in Figure 4.7c because “title” already appears as an attribute of Publication at a higher level of abstraction; “title” is inherited by the subtype Report.
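The two clean-up steps from schema 3.1 to schema 3.2 can be sketched in the same way. This is a hypothetical illustration, not the book's procedure. Attribute removal relies on inheritance along a subtype-to-supertype map. Relationship removal uses a deliberately naive rule: a subtype relationship is dropped when its partner entity is already related to the supertype. In a real design the analyst must confirm that the two relationships mean the same thing. The entity attributes and relationship endpoints below are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    name: str
    left: str
    right: str


@dataclass
class Schema:
    attributes: dict                                # entity name -> set of attribute names
    relationships: set = field(default_factory=set)
    supertype: dict = field(default_factory=dict)   # subtype name -> supertype name


def partners(relationships, entity):
    """Entities directly related to `entity`."""
    return {r.right if r.left == entity else r.left
            for r in relationships if entity in (r.left, r.right)}


def drop_redundancy(schema: Schema) -> Schema:
    attributes = {e: set(a) for e, a in schema.attributes.items()}
    relationships = set(schema.relationships)
    for sub, sup in schema.supertype.items():
        # Attributes the supertype already has are inherited, so remove them from the subtype.
        attributes[sub] -= attributes[sup]
        # Naive rule: a subtype relationship to an entity that the supertype is also
        # related to is treated as redundant (the designer must confirm the meaning matches).
        covered = partners(relationships, sup)
        relationships = {
            r for r in relationships
            if not (sub in (r.left, r.right)
                    and (r.right if r.left == sub else r.left) in covered)
        }
    return Schema(attributes, relationships, dict(schema.supertype))


# Hypothetical contents for schema 3.1.
schema3_1 = Schema(
    attributes={"Publication": {"title"}, "Report": {"title"}, "Department": {"name"},
                "Topic-area": {"name"}, "Contractor": {"name", "address"}},
    relationships={
        Relationship("includes", "Publication", "Topic-area"),
        Relationship("has", "Department", "Publication"),
        Relationship("contains", "Topic-area", "Report"),
        Relationship("publishes", "Department", "Report"),
        Relationship("written-for", "Report", "Contractor"),
    },
    supertype={"Report": "Publication"},
)
schema3_2 = drop_redundancy(schema3_1)
print(sorted(r.name for r in schema3_2.relationships))  # ['has', 'includes', 'written-for']
```

The function returns a new schema and leaves the input untouched. This keeps schema 3.1 available if, as discussed below, performance later argues for restoring a dropped relationship.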

The final schema, in Figure 4.7c, expresses completeness because all the original concepts (report, publication, topic area, department, and contractor) are kept intact. It expresses minimality because of three changes: the transformation of “dept-name” from an attribute in schema 1 to an entity and attribute in schema 2.2; the merger of schema 1 and schema 2.2 to form schema 3; and the elimination both of “title” as an attribute of Report and of Report's relationships with Topic-area and Department. Finally, it expresses understandability in that the final schema actually has more meaning than the individual original schemas.
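The completeness and minimality arguments can also be turned into mechanical checks. The sketch below is a partial, hypothetical check. Completeness is tested as "every original concept is still an entity." Minimality only looks for inherited attributes and duplicated subtype relationships, not for every kind of redundancy. Understandability has no mechanical test and is left out. The function names and schema contents are assumptions.

```python
def related(relationships, entity):
    """Entities directly related to `entity`; relationships are (name, a, b) tuples."""
    return {b if a == entity else a for _, a, b in relationships if entity in (a, b)}


def quality_problems(original_concepts, attributes, relationships, supertype):
    """Return a list of completeness/minimality problems (empty if none found)."""
    problems = []
    # Completeness: every concept from the original views is still represented.
    missing = set(original_concepts) - set(attributes)
    if missing:
        problems.append(f"incomplete: missing {sorted(missing)}")
    # Minimality (partial check): nothing a subtype inherits is stated again on it.
    for sub, sup in supertype.items():
        repeated = attributes[sub] & attributes[sup]
        if repeated:
            problems.append(f"not minimal: {sub} repeats inherited attributes {sorted(repeated)}")
        shared = related(relationships, sub) & related(relationships, sup)
        if shared:
            problems.append(f"not minimal: {sub} and {sup} are both related to {sorted(shared)}")
    return problems


# Hypothetical contents for the final schema 3.2.
concepts = {"Report", "Publication", "Topic-area", "Department", "Contractor"}
attrs = {"Publication": {"title"}, "Report": set(), "Topic-area": {"name"},
         "Department": {"name"}, "Contractor": {"name", "address"}}
rels = {("includes", "Publication", "Topic-area"), ("has", "Department", "Publication"),
        ("written-for", "Report", "Contractor")}
print(quality_problems(concepts, attrs, rels, {"Report": "Publication"}))  # []
```

Run against a schema 3.1 that still has the repeated “title” and the redundant Report relationships, the same function reports two minimality problems.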

The view integration process is one of continual refinement and reevaluation. Minimality is also not always the most efficient way to proceed. Suppose, for example, that eliminating the redundant relationships “publishes” and/or “contains” between schema 3.1 and schema 3.2 makes certain queries take excessively long. It may then be better from a performance viewpoint to leave them in. This decision could be made during the analysis of the transactions on the database or during the testing phase of the fully implemented database.

4.5 Entity Clustering for ER Models

This section presents the concept of entity clustering, which abstracts the ER schema to such a degree that the entire schema can appear on a single sheet of paper or a single computer screen. This has happy consequences for the end user and database designer in terms of developing a mutual understanding of the database contents and formally documenting the conceptual model.

An entity cluster is the result of a grouping operation on a collection of entities and relationships. Entity clustering is potentially useful for designing large databases. When a database or information structure is large and has many interconnections among its components, its semantics may be very difficult to understand and manage, especially for end users or managers. In an ER diagram with 1,000 entities, the overall structure will probably not be very clear, even to a well-trained database analyst. Clustering is therefore important because it provides a method to organize a conceptual database schema into layers of abstraction, and it supports the different views of a variety of end users.

4.5.1 Clustering Concepts

One should think of grouping as an operation that combines entities and their relationships to form a higher-level construct. The result of a grouping operation on simple entities is called an entity cluster. A grouping operation on entity clusters, or on combinations of elementary entities and entity clusters, results in a higher-level entity cluster. The highest-level entity cluster, representing the entire database conceptual schema, is called the root entity cluster.
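A cluster hierarchy is naturally a tree whose leaves are elementary entities. The sketch below is a hypothetical representation, and it omits relationships for brevity. The function `group` plays the role of the grouping operation, and `elementary_entities` recovers the level-1 entities behind any cluster. Grouping the Report cluster with the remaining entities of Figure 4.8 into a single root cluster named "Root" is an assumed extra step, used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str


@dataclass(frozen=True)
class EntityCluster:
    name: str
    members: tuple  # the Entity and/or EntityCluster objects that were grouped


def group(name, *members):
    """Grouping operation: combine entities and/or clusters into a higher-level cluster."""
    return EntityCluster(name, tuple(members))


def elementary_entities(node):
    """Expand a cluster back into the elementary (level-1) entities it abstracts."""
    if isinstance(node, Entity):
        return {node.name}
    names = set()
    for member in node.members:
        names |= elementary_entities(member)
    return names


# Report dominates R-sec, R-abbr, and Author; the root cluster is hypothetical.
report = group("Report", Entity("Report"), Entity("R-sec"), Entity("R-abbr"), Entity("Author"))
root = group("Root", report, Entity("Department"), Entity("Contractor"), Entity("Project"))
print(sorted(elementary_entities(root)))
# ['Author', 'Contractor', 'Department', 'Project', 'R-abbr', 'R-sec', 'Report']
```

Because the root cluster contains every elementary entity, expanding it reproduces the whole conceptual schema. Expanding an inner cluster gives only its own portion of the schema.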

Figure 4.8a illustrates the concept of entity clustering in a simple case. The (elementary) entities R-sec (report section), R-abbr (report abbreviation), and Author are naturally bound to (dominated by) the entity Report. The entities Department, Contractor, and Project are not dominated. (Note that to avoid unnecessary detail, we do not include the attributes of entities in the diagrams.) In Figure 4.8b, the dark-bordered box around the entity Report and the entities it dominates defines the entity cluster Report. The dark-bordered box is called the EC box, to represent the idea of an entity cluster. In general, the name of the entity cluster need not be the same as the name of any internal entity; however, when there is a single dominant entity, the names are often the same. The EC box number in the lower-right corner is a clustering-level number used to keep track of the sequence in which clustering is done. The number 2.1 signifies that the entity cluster Report is the first entity cluster at level 2. Note that all the original entities are considered to be at level 1.
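Clustering-level numbers can be generated as clusters are formed. This sketch assumes a rule the text does not state explicitly: a cluster's level is one more than the highest level among its members. The number after the dot counts the clusters formed so far at that level. The class name and the second and third clustering steps are hypothetical.

```python
from collections import defaultdict


class ClusterNumbering:
    """Hands out clustering-level numbers such as "2.1" as clusters are formed.

    Assumption: a cluster sits one level above its highest-level member, and
    elementary entities are at level 1.
    """

    def __init__(self):
        self.level_of = {}              # cluster name -> level
        self.formed = defaultdict(int)  # level -> clusters formed so far at that level

    def level(self, name):
        return self.level_of.get(name, 1)  # anything not yet clustered is an original entity

    def form_cluster(self, name, members):
        # Levels are read before `name` is recorded, so a cluster may reuse the
        # name of its dominant entity (as the Report cluster does).
        lvl = 1 + max(self.level(m) for m in members)
        self.formed[lvl] += 1
        self.level_of[name] = lvl
        return f"{lvl}.{self.formed[lvl]}"


numbering = ClusterNumbering()
print(numbering.form_cluster("Report", ["Report", "R-sec", "R-abbr", "Author"]))  # 2.1
# Hypothetical further steps, not taken from the figure:
print(numbering.form_cluster("Organization", ["Department", "Contractor"]))      # 2.2
print(numbering.form_cluster("Root", ["Report", "Organization", "Project"]))     # 3.1
```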

The higher-level abstraction, the entity cluster, must maintain the same relationships between entities inside and outside the entity cluster as occur between the same entities in the lower-level diagram. Thus, the names of entities inside the entity cluster should appear just outside the EC box, along the path of their direct relationship to the related entities outside the box. This maintains consistent interfaces (relationships), as shown in Figure 4.8b. For simplicity, we modify this rule slightly: if the relationship is between an external entity and the dominant internal entity (for which the entity cluster is named), the entity cluster name need not be repeated outside the EC box. Thus, in Figure 4.8b, we could drop the name Report in both places it occurs outside the Report box, but we must retain the name Author, which is not the name of the entity cluster.
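The interface rule, including the simplified exception for the dominant entity, can be expressed as a filter over the lower-level relationships. In this hypothetical sketch, a relationship belongs to the interface when exactly one of its endpoints lies inside the EC box. The label shown outside the box is the internal entity's name, or `None` when that entity is the dominant one. The relationship names and endpoints for Figure 4.8a are guesses, not a transcription.

```python
def cluster_interface(relationships, members, dominant):
    """List the relationships that cross an EC box, and the label shown outside it.

    relationships: (name, entity_a, entity_b) tuples from the lower-level diagram
    members: names of the entities inside the EC box
    dominant: the internal entity the cluster is named after
    Returns (relationship, external entity, label) triples; label is None when the
    simplified rule lets the cluster name be omitted.
    """
    interface = []
    for name, a, b in relationships:
        if (a in members) == (b in members):
            continue  # wholly inside or wholly outside the box: not part of the interface
        inner, outer = (a, b) if a in members else (b, a)
        label = None if inner == dominant else inner
        interface.append((name, outer, label))
    return interface


# Hypothetical relationships for Figure 4.8a (names and endpoints are assumptions).
rels = [
    ("has", "Department", "Report"),
    ("does", "Contractor", "Report"),
    ("does", "Author", "Project"),
    ("has", "Report", "R-sec"),
    ("in", "R-abbr", "Report"),
    ("has", "Report", "Author"),
]
report_box = {"Report", "R-sec", "R-abbr", "Author"}
for row in cluster_interface(rels, report_box, dominant="Report"):
    print(row)
# ('has', 'Department', None)
# ('does', 'Contractor', None)
# ('does', 'Project', 'Author')
```

With these assumed relationships, only Author keeps a label outside the box. This matches the text: the name Report can be dropped in both places, while Author must be retained.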

4.5.2 Grouping Operations

Grouping operations are the fundamental components of the entity clustering technique. They define what collections of entities and relationships comprise higher-level objects, the entity clusters. The operations are heuristic in nature and include (see Figure 4.9):

Figure 4.8 Entity clustering concepts
(a) ER model before clustering
(b) ER model after clustering, with the EC box for Report (entity cluster) numbered 2.1
