CHAPTER 4 Requirements Analysis and Conceptual Data Modeling
At this point we have sufficient commonality between schemas to attempt a merge. In schemas 1 and 2.2 we have two sets of common entities, Department and Topic-area. Other entities do not overlap and must appear intact in the superimposed, or merged, schema. The merged schema, schema 3, is shown in Figure 4.7a. Because the common entities are truly equivalent, there are no bad side effects of the merge due to existing relationships involving those entities in one schema and not in the other. (Such a relationship that remains intact exists in schema 1 between Topic-area and Report, for example.) If true equivalence cannot be established, the merge may not be possible in the existing form.
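The merge step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Schema` and `Relationship` types and the `merge` function are hypothetical names, and the assignment of relationships to schema 1 and schema 2.2 is inferred from the labels in Figure 4.7 rather than stated in the text. Entities with the same name are treated as truly equivalent, so the merge is a simple superimposition.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    name: str
    left: str   # entity on one side of the relationship
    right: str  # entity on the other side


@dataclass
class Schema:
    entities: set[str] = field(default_factory=set)
    relationships: set[Relationship] = field(default_factory=set)


def merge(a: Schema, b: Schema) -> Schema:
    """Superimpose two schemas; identically named entities are assumed equivalent."""
    return Schema(a.entities | b.entities, a.relationships | b.relationships)


# Assumed contents of schema 1 and schema 2.2 (read off the figure labels).
schema_1 = Schema(
    {"Report", "Contractor", "Department", "Topic-area"},
    {
        Relationship("publishes", "Department", "Report"),
        Relationship("contains", "Topic-area", "Report"),
        Relationship("written-for", "Report", "Contractor"),
    },
)
schema_2_2 = Schema(
    {"Publication", "Department", "Topic-area"},
    {
        Relationship("has", "Department", "Publication"),
        Relationship("includes", "Topic-area", "Publication"),
    },
)

common = schema_1.entities & schema_2_2.entities  # the shared entities
schema_3 = merge(schema_1, schema_2_2)
```

Every relationship from either input survives in `schema_3`, including the one between Topic-area and Report that exists only in schema 1. Deciding whether two entities are in fact equivalent remains a judgment for the designer; the sketch simply takes it as given.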
Figure 4.7 View integration: the merged schema
(a) Schema 3, the result of merging schema 1 and schema 2.2
(b) Schema 3.1, new generalization
(c) Schema 3.2, elimination of redundant relationships
In Figure 4.7, there is some redundancy between Publication and Report in terms of the relationships with Department and Topic-area. Such a redundancy can be eliminated if there is a supertype/subtype relationship between Publication and Report, which does in fact occur in this case because Publication is a generalization of Report. In schema 3.1 (Figure 4.7b) we see the introduction of this generalization from Report to Publication. Then in schema 3.2 (Figure 4.7c) we see that the redundant relationships between Report and Department and Topic-area have been dropped. The attribute “title” has been eliminated as an attribute of Report in Figure 4.7c because “title” already appears as an attribute of Publication at a higher level of abstraction; “title” is inherited by the subtype Report.
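The elimination of redundancy after the generalization is introduced can be illustrated with a small Python sketch. It rests on assumptions that go beyond the text: relationships are encoded as hypothetical `(name, entity, entity)` tuples read off the figure labels, only the “title” attribute is modeled, and a subtype relationship is judged redundant whenever the supertype is already related to the same partner entity. That last rule is a mechanical stand-in for what is really a semantic decision by the designer.

```python
def partners(entity, relationships):
    """Return the entities directly related to `entity`."""
    related = set()
    for _, a, b in relationships:
        if a == entity:
            related.add(b)
        elif b == entity:
            related.add(a)
    return related


def eliminate_redundancy(attributes, relationships, supertype_of):
    """Drop subtype attributes and relationships already provided by the supertype."""
    attrs = {entity: set(names) for entity, names in attributes.items()}
    rels = set(relationships)
    for sub, sup in supertype_of.items():
        # Attributes of the supertype are inherited, so remove them from the subtype.
        attrs[sub] -= attributes[sup]
        # Assumed heuristic: a subtype relationship to a partner that the
        # supertype already reaches is treated as redundant.
        inherited = partners(sup, relationships)
        rels = {
            r for r in rels
            if not (sub in r[1:] and (set(r[1:]) - {sub}) & inherited)
        }
    return attrs, rels


# Assumed encoding of the schema with the generalization in place.
attributes = {"Publication": {"title"}, "Report": {"title"}}
relationships = {
    ("has", "Department", "Publication"),
    ("includes", "Topic-area", "Publication"),
    ("publishes", "Department", "Report"),
    ("contains", "Topic-area", "Report"),
    ("written-for", "Report", "Contractor"),
}
supertype_of = {"Report": "Publication"}

new_attributes, new_relationships = eliminate_redundancy(
    attributes, relationships, supertype_of
)
```

Under these assumptions, “publishes” and “contains” are removed, “written-for” is kept because Publication has no relationship with Contractor, and Report no longer carries its own “title”.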
The final schema, in Figure 4.7c, expresses completeness because all the original concepts (report, publication, topic area, department, and contractor) are kept intact. It expresses minimality because of the transformation of “dept-name” from attribute in schema 1 to entity and attribute in schema 2.2, and the merger between schema 1 and schema 2.2 to form schema 3, and because of the elimination of “title” as an attribute of Report and of Report relationships with Topic-area and Department. Finally, it expresses understandability in that the final schema actually has more meaning than the individual original schemas.
The view integration process is one of continual refinement and reevaluation. It should also be noted that minimality may not always be the most efficient way to proceed. If, for example, the elimination of the redundant relationships “publishes” and/or “contains” from schema 3.1 to 3.2 causes the time required to perform certain queries to be excessively long, it may be better from a performance viewpoint to leave them in. This decision could be made during the analysis of the transactions on the database or during the testing phase of the fully implemented database.
4.5 Entity Clustering for ER Models
This section presents the concept of entity clustering, which abstracts the ER schema to such a degree that the entire schema can appear on a single sheet of paper or a single computer screen. This has happy consequences for the end user and database designer in terms of developing a mutual understanding of the database contents and formally documenting the conceptual model.
An entity cluster is the result of a grouping operation on a collection of entities and relationships. Entity clustering is potentially useful for designing large databases. When the scale of a database or information structure is large and includes a large number of interconnections among its different components, it may be very difficult to understand the semantics of such a structure and to manage it, especially for the end users or managers. In an ER diagram with 1,000 entities, the overall structure will probably not be very clear, even to a well-trained database analyst. Clustering is therefore important because it provides a method to organize a conceptual database schema into layers of abstraction, and it supports the different views of a variety of end users.
4.5.1 Clustering Concepts
One should think of grouping as an operation that combines entities and their relationships to form a higher-level construct. The result of a grouping operation on simple entities is called an entity cluster. A grouping operation on entity clusters, or on combinations of elementary entities and entity clusters, results in a higher-level entity cluster. The highest-level entity cluster, representing the entire database conceptual schema, is called the root entity cluster.
Figure 4.8a illustrates the concept of entity clustering in a simple case where (elementary) entities R-sec (report section), R-abbr (report abbreviation), and Author are naturally bound to (dominated by) the entity Report; and entities Department, Contractor, and Project are not dominated. (Note that to avoid unnecessary detail, we do not include the attributes of entities in the diagrams.) In Figure 4.8b, the dark-bordered box around the entity Report and the entities it dominates defines the entity cluster Report. The dark-bordered box is called the EC box to represent the idea of an entity cluster. In general, the name of the entity cluster need not be the same as the name of any internal entity; however, when there is a single dominant entity, the names are often the same. The EC box number in the lower-right corner is a clustering-level number used to keep track of the sequence in which clustering is done. The number 2.1 signifies that the entity cluster Report is the first entity cluster at level 2. Note that all the original entities are considered to be at level 1.
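The grouping operation and the clustering-level number can be sketched as a small data structure. The names `Entity`, `EntityCluster`, and `Clusterer` are hypothetical, and the sketch assumes that a cluster sits one level above its highest-level member and that clusters at a given level are numbered in the order they are formed; the text itself fixes only that elementary entities are at level 1 and that 2.1 denotes the first cluster at level 2.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    level: int = 1  # all original (elementary) entities are at level 1


@dataclass(frozen=True)
class EntityCluster:
    name: str
    members: tuple  # elementary entities and/or lower-level clusters
    level: int
    seq: int        # order of creation within this level

    @property
    def label(self) -> str:
        """Clustering-level number shown in the corner of the EC box."""
        return f"{self.level}.{self.seq}"


class Clusterer:
    """Forms entity clusters and tracks the sequence of clustering per level."""

    def __init__(self):
        self._count = defaultdict(int)

    def group(self, name, members):
        # Assumption: the new cluster is one level above its highest member.
        level = max(m.level for m in members) + 1
        self._count[level] += 1
        return EntityCluster(name, tuple(members), level, self._count[level])


clusterer = Clusterer()
report = clusterer.group(
    "Report",
    [Entity("Report"), Entity("R-sec"), Entity("R-abbr"), Entity("Author")],
)
root = clusterer.group(
    "Root",
    [report, Entity("Department"), Entity("Contractor"), Entity("Project")],
)
print(report.label)  # 2.1
print(root.label)    # 3.1
```

Here `root` plays the role of the root entity cluster: grouping the Report cluster with the remaining elementary entities yields a single higher-level construct that represents the whole schema.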
The higher-level abstraction, the entity cluster, must maintain the same relationships between entities inside and outside the entity cluster as occur between the same entities in the lower-level diagram. Thus, the entity names inside the entity cluster should appear just outside the EC box along the path of their direct relationship to the appropriately related entities outside the box, maintaining consistent interfaces (relationships) as shown in Figure 4.8b. For simplicity, we modify this rule slightly: If the relationship is between an external entity and the dominant internal entity (for which the entity cluster is named), the entity cluster name need not be repeated outside the EC box. Thus, in Figure 4.8b, we could drop the name Report in both places it occurs outside the Report box, but we must retain the name Author, which is not the name of the entity cluster.
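The labeling rule for relationships that cross the EC box can be sketched as follows. The function `boundary_labels` is a hypothetical helper, and the relationship list is an assumption inferred from the labels in Figure 4.8 (in particular, which entities “has”, “does”, and “in” connect is not stated in the text). A label of `None` means the simplified rule applies and no name needs to appear outside the box.

```python
def boundary_labels(cluster_name, internal, relationships):
    """List (relationship, external entity, label) for each relationship crossing the EC box.

    The label is the internal entity's name, or None when that entity is the
    dominant one for which the cluster is named.
    """
    crossing = []
    for name, a, b in relationships:
        inside = [e for e in (a, b) if e in internal]
        if len(inside) != 1:
            continue  # relationship lies entirely inside or entirely outside the box
        inner = inside[0]
        outer = b if inner == a else a
        label = None if inner == cluster_name else inner
        crossing.append((name, outer, label))
    return crossing


# Assumed relationships of the lower-level diagram (Figure 4.8a).
relationships = [
    ("has", "Department", "Report"),
    ("does", "Contractor", "Report"),
    ("does", "Author", "Project"),
    ("has", "Report", "R-sec"),
    ("has", "Report", "R-abbr"),
    ("in", "Author", "Report"),
]
internal = {"Report", "R-sec", "R-abbr", "Author"}

labels = boundary_labels("Report", internal, relationships)
for relationship, external, label in labels:
    print(relationship, external, label)
```

Under these assumptions, the relationships to Department and Contractor need no label because they attach to the dominant entity Report, while the relationship to Project must carry the name Author.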
4.5.2 Grouping Operations
Grouping operations are the fundamental components of the entity clustering technique. They define what collections of entities and relationships constitute higher-level objects, the entity clusters. The operations are heuristic in nature and include (see Figure 4.9):
Figure 4.8 Entity clustering concepts
(a) ER model before clustering
(b) ER model after clustering