Advanced Database Technology and Design
Mario Piattini Oscar Díaz Editors
Artech House Boston London www.artechhouse.com
Library of Congress Cataloging-in-Publication Data
Advanced database technology and design / Mario G. Piattini, Oscar Díaz, editors.
p. cm. (Artech House computing library)
Includes bibliographical references and index.
ISBN 0-89006-395-8 (alk. paper)
1. Database management. 2. Database design. I. Piattini, Mario, 1966–
II. Díaz, Oscar. III. Series.
QA76.9.D3 A3435 2000
CIP
British Library Cataloguing in Publication Data
Advanced database technology and design (Artech House computing library)
1. Databases 2. Database design
I. Piattini, Mario G. II. Díaz, Oscar
005.74
ISBN 1-58053-469-4
Cover design by Igor Valdman
© 2000 ARTECH HOUSE, INC.
685 Canton Street
Norwood, MA 02062
All rights reserved. Printed and bound in the United States of America. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.
All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. Artech House cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.
International Standard Book Number: 0-89006-395-8
Library of Congress Catalog Card Number: 00-055842
10 9 8 7 6 5 4 3 2 1
The Impact on DBs
1.3.3 Nontraditional Applications
1.4 Research and Market Trends
1.4.2 Distribution and Integration
1.4.3 Functionality and Intelligence
1.5 Maturity of DB Technology
Selected Bibliography

2 An Introduction to Conceptual Modeling of Information Systems
2.1 The Functions of an Information System
2.1.1 The Memory Function
2.1.2 The Informative Function
2.1.3 The Active Function
2.1.4 Examples of ISs
2.2 Conceptual Modeling
2.2.1 Conceptual Schema of the State
2.2.2 Information Base
2.2.3 Conceptual Schema of the Behavior
2.2.4 Integrity Constraints
2.2.5 Derivation Rules
2.3 Abstract Architecture of an IS
2.4 Requirements Engineering
2.5 Desirable Properties of Conceptual Schemas
Selected Bibliography

Part II: Advanced Technologies

3.4.3 Supporting Causal Business Policies Through
3.4.4 Active Behavior
3.5 Implementation Issues
3.5.1 Active Rules in Oracle
3.5.2 Active Rules in Use
3.5.3 Standardizing Active Behavior in SQL: 1999
3.6 Rule Maintenance

4.2.4 Deductive Versus Relational Databases
4.3 Query Processing
4.3.1 Bottom-Up Query Evaluation
4.3.2 Top-Down Query Evaluation
4.4 Update Processing
4.4.1 Change Computation
4.4.2 View Updating
4.4.3 Integrity Constraint Enforcement
4.4.4 A Common Framework for Database Updating

5.3 What's the Problem?
5.3.1 Semitemporalizing Suppliers and Parts
5.3.2 Fully Temporalizing Suppliers and Parts
5.5 Interval Types
5.6 Scalar Operators on Intervals
5.7 Aggregate Operators on Intervals
5.8 Relational Operators Involving Intervals
5.9 Constraints Involving Intervals
5.10 Update Operators Involving Intervals
5.11 Database Design Considerations
5.11.1 Horizontal Decomposition
5.11.2 Vertical Decomposition
5.12 Further Points

6.3 Contrasting the Major Features of Pure Relational and Object-Oriented Databases
6.4 Drawbacks of Pure Relational and Object-Oriented
6.5 Technology Issues: Enabling Object Functionality in the Relational World
6.5.2 Collection Types
6.5.3 Encapsulation
6.5.4 Polymorphism
6.5.5 Inheritance
6.6 ORDBMS: A Closer Look at Characteristics in the Physical Implementation
6.7 Design Issues: Capturing the Essence of the Object-Relational Paradigm
6.8 An Object-Relational Example
6.9 The ABC Corporation Example
Selected Bibliography

7 Object-Oriented Database Systems
7.1 Introduction and Motivation
7.2 Basic Concepts of the Object-Oriented Data Model
7.2.1 Objects and Object Identifiers
7.2.2 Aggregation
7.2.4 Classes and Instantiation Mechanisms
7.2.5 Inheritance
7.3 Graphical Notation and Example
7.4 ODMG Standard
7.4.1 Objects and Literals
7.4.2 Types: Classes and Interfaces
7.4.3 Subtypes and Inheritance

8 Multimedia Database Management Systems
8.4.1 Main Achievements of MM-DBMS Technology
8.4.2 Commercial Products and Research Prototypes
8.4.3 Further Directions and Trends

Relational DDBs
9.3.2 Top-Down Design of Relational DDBs
9.3.3 Bottom-Up Design of Heterogeneous DDBs
9.4 Distributed Query Processing
9.4.1 Query Processing in Relational DDBs
9.4.2 Query Processing in Heterogeneous DDBs
9.5 Distributed Transaction Management
9.5.1 Distributed Concurrency Control
9.5.2 Distributed Commit
9.5.3 Distributed Recovery
9.5.4 Transaction Management in Heterogeneous DDBs
9.6 Current Trends and Challenges
9.6.1 Alternative Transaction Models
9.6.2 Mediator Architectures
9.6.3 Databases and the World Wide Web

10.5.2 Wireless Medium
10.5.3 Portability of Mobile Elements
10.6 Impact of Mobile Computing on Data Management
10.6.1 Transactions
10.6.2 Data Dissemination by Broadcasting
10.6.3 Query Processing
10.6.5 Database Interfaces
10.7 Communication Models and Agents
10.7.1 Communication Models
10.8 Mobile Computer Design Features for Accessing

12.3.1 DBMS Architecture
12.3.2 Components and DBMS Architecture
12.3.3 Typology of Component DBMSs
12.4 Component Database Models
12.4.1 Plug-In Components
12.4.2 Middleware DBMS
12.4.3 Service-Oriented DBMSs
12.4.4 Configurable DBMSs
12.4.5 Categories of Component DBMS Models
12.5 Development of Component DBMSs and Their
12.5.1 Database Design for CDBMSs
12.5.2 Development of CDBMS Components
12.6 Related Work: The Roots of CDBMSs

Part III: Advanced Design Issues

13 CASE Tools: Computer Support for Conceptual
13.1 Introduction to CASE Tools
13.1.1 Functional Classification of CASE Tools
13.1.2 Communication Between CASE Tools
13.2 A CASE Framework for Database Design
13.3 Conceptual Design Tools
13.3.1 The Choice of the Conceptual Model
13.3.2 Conceptual Modeling Tools
13.3.3 Verification and Validation Tools
13.3.4 Conceptual Design by Schema Integration
13.3.5 Conceptual Design Based Upon Reusable
13.3.6 Conclusion on the Conceptual Level
13.4 Logical Design Tools
13.4.1 Fundamentals of Relational Design
13.4.2 Functional Dependency Acquisition
13.4.3 Mapping From Conceptual Schema to Logical Schema
13.4.4 Concluding Remarks on the Logical Design
Selected Bibliography

14 Database Quality
14.1 Introduction
14.2 Data Model Quality
14.2.1 Quality Factors
14.2.2 Stakeholders
14.2.3 Quality Concepts
14.2.4 Improvement Strategies
14.2.5 Quality Metrics
14.3 Data Quality
14.3.1 Management Issues
14.3.2 Design Issues
Selected Bibliography

About the Authors
Since computers were introduced to automate organization management, information system evolution has influenced data management considerably. Applications demand more and more services from information stored in computing systems. These new services impose more stringent conditions on the currently prevailing client/server architectures and relational database management systems (DBMSs). For the purpose of this book, those demands can be arranged along three aspects, namely:

Enhancements on the structural side. The tabular representation of data has proved to be suitable for applications, such as insurance and banking, that have to process large volumes of well-formatted data. However, newer applications such as computer-aided manufacturing or geographic information systems have a tough job attempting to fit more elaborate structures into flat records. Moreover, the SQL92 types are clearly insufficient to tackle time or multimedia concerns.

Improvements on the behavioral side. Data are no longer the only aspect to be shared. Code can, and must, be shared. DBMS providers are striving to make their products evolve from data servers to code servers. The introduction of rules to support active and deductive capabilities and the inclusion of user-defined data types are now part of that trend.

Architectural issues. New applications need access to heterogeneous and distributed data, require a higher throughput (e.g., a large number of transactions in e-commerce applications), or need to share code. The client/server architecture cannot always meet those new demands.
This book aims to provide a gentle and application-oriented introduction to those topics. Motivation and application-development considerations, rather than state-of-the-art research, are the main focus. Examples are extensively used in the text, and a brief selected reading section appears at the end of each chapter for readers who want more information. Special attention is given to the design issues raised by the new trends.

The book is structured as follows:
Part I: Fundamentals
Chapter 1 gives an overview of the evolution of DBMS and how its history has been a continuous effort to meet the increasing demands of the applications. Chapter 2 provides a gentle introduction to the key concepts of conceptual modeling.
Part II: Advanced Technologies
This part presents technological and design issues that we need to face to address new application requirements. The first two chapters deal with rule management: Chapter 3 covers active database systems, and Chapter 4 deductive ones. Chapter 5 examines the concepts of temporal databases and the problems of time management. Chapters 6 and 7 discuss two different ways of introducing object orientation in database technology: the more evolutionary one (object-relational DBMSs) and the more revolutionary one (object-oriented DBMSs). Chapter 8 discusses the issues related to multimedia databases and their management. Chapters 9 and 10 present distributed and mobile DBMSs, respectively. Chapter 11 focuses on security concerns by discussing secure DBMSs. Chapter 12 introduces a new approach to DBMS implementation: component DBMSs.
Part III: Advanced Design Issues
Part III looks at two topics that are necessary for obtaining databases of a certain level of quality. Chapter 13 examines various concepts associated with computer-aided database design that claim to be an effective way to improve database design. Chapter 14 concentrates on considering quality issues in database design and implementation.

As for the audience, the book is targeted to senior undergraduates and graduate students. Thus, it is mainly a textbook. However, database professionals and application developers can also find a gentle introduction to these topics and useful hints for their job. The prerequisites for understanding the book are a basic knowledge of relational databases and software engineering. Some knowledge of object-oriented technology and networks is desirable.
We would like to thank Artech House, especially Viki Williams, and Marcela Genero of UCLM for their support during the preparation of this book.

It is our hope that the efforts made by the distinct authors to provide a friendly introduction to their respective areas of expertise will make the reader's journey along the database landscape more pleasant.

Mario Piattini
Oscar Díaz
August 2000
Part I: Fundamentals
1 Evolution and Trends of Database Technology

The history of databases (DBs) dates from the mid-1960s. DB technology has proved to be exceptionally productive and of great economic impact. In fact, today, the DB market exceeds $8 billion, with an 8% annual growth rate (IDC forecast). Databases have become a first-order strategic product as the basis of Information Systems (IS), and support management and decision making.

This chapter studies from a global perspective the current problems that led to the next generation of DBs.¹ The next four sections examine the past, that is, the evolution of DB (Section 1.2); the troubles and challenges facing current DBs, including changes in the organizations and changes in the type of applications (Section 1.3); the current research and market trends based on the performance, functionality, and distribution dimensions (Section 1.4); and the maturity level of the technology (Section 1.5).
¹ Development and tendencies in DB technology are too complicated to sum up in a few pages. This chapter presents one approach, but the authors are aware that some aspects that are important to us may not be significant to other experts and vice versa. In spite of that, we think it would be interesting for the reader to have a global view of the emergence and development of DB, the problems that have to be solved, and DB trends.
1.2 Database Evolution
In the initial stages of computing, data were stored in file systems. The problems associated with the use of such systems (redundancy, maintenance, security, the great dependence between data and applications, and, mainly, rigidity) gave rise to a new technology for the management of stored data: databases. The first generation of DB management systems (DBMSs) evolved over time, and some of the problems with files were solved. Other problems, however, persisted, and the relational model was proposed to correct them. With that model, the second generation of DBs was born. The difficulties in designing the DBs effectively brought about design methodologies based on data models.
1.2.1 Historical Overview: First and Second DB Generations
Ever since computers were introduced to automate organization management, IS evolution has considerably influenced data management. IS demands more and more services from information stored in computing systems. Gradually, the focus of computing, which had previously concentrated on processing, shifted from process-oriented to data-oriented systems, where data play an important role for software engineers. Today, many IS design problems center around data modeling and structuring.
After the rigid file systems of the initial stages of computing, the first generation of DB products was born in the 1960s and early 1970s. Database systems can be considered intermediaries between the physical devices where data are stored and the users (human beings) of the data. DBMSs are the software tools that enable the management (definition, creation, maintenance, and use) of large amounts of interrelated data stored in computer-accessible media. The early DBMSs, which were based on hierarchical and network (Codasyl) models, provided logical organization of data in trees and graphs. IBM's IMS, General Electric's IDS (later Bull's), Univac's DMS 1100, Cincom's Total, MRI's System 2000, and Cullinet's (now Computer Associates') IDMS are some of the well-known representatives of this generation. Although efficient, this type of product used procedural languages, did not have real physical or logical independence, and was very limited in its flexibility. In spite of that, DBMSs were an important advance compared to file systems.
IBM's addition of data communication facilities to its IMS software gave rise to the first large-scale database/data communication (DB/DC) system, in which many users access the DB through a communication network. Since then, access to DBs through communication networks has been offered by commercially available DBMSs.
C. W. Bachman played a pioneering role in the development of network DB systems (IDS product and Codasyl DataBase Task Group, or DBTG, proposals). In his paper "The Programmer as Navigator" (Bachman's lecture on the occasion of his receiving the 1973 Turing award), Bachman describes the process of traveling through the DB; the programmer has to follow explicit paths in search of one piece of data, going from record to record [1].
The DBTG model is based on the data structure diagrams [2], which are also known as Bachman's diagrams. In the model, the links between record types, called Codasyl sets, are always one occurrence of one record type to many, that is, a functional link. In its 1978 specifications [3], Codasyl also proposed a data definition language (DDL) at three levels (schema DDL, subschema DDL, and internal DDL) and a procedural (prescriptive) data manipulation language (DML).
Hierarchical links and Codasyl sets are physically implemented via pointers. That implementation, together with the functional constraints of those links and sets, is the cause of the principal weaknesses of the systems based on those models: little flexibility of such physical structures, data/application dependence, and complexity of their navigational languages. Nevertheless, those same pointers are precisely the reason for their efficiency, one of the great strengths of the products.
In 1969–1970, Dr. E. F. Codd proposed the relational model [4], which was considered an elegant mathematical theory (a "toy" for certain experts) without any possibility of efficient implementation in commercial products. In 1970, few people imagined that, in the 1980s, the relational model would become mandatory (a "decoy") for the promotion of DBMSs.

Relational products like Oracle, DB2, Ingres, Informix, Sybase, and so on are considered the second generation of DBs. These products have more physical and logical independence, greater flexibility, and declarative query languages (users indicate what they want without describing how to get it) that deal with sets of records and can be automatically optimized, although their DML and host language are not integrated. With relational DBMSs (RDBMSs), organizations have more facilities for data distribution. RDBMSs provide not only better usability but also a more solid theoretical foundation.
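The contrast between navigational and declarative access can be illustrated with a minimal sketch. It is not taken from any of the products named above: the supplier and shipment data, the `next` link field, and both function names are hypothetical, and Python's standard `sqlite3` module merely stands in for a relational DBMS. The first function walks a one-to-many chain record by record, in the spirit of a Codasyl set; the second states the desired result in SQL and leaves the access path to the system.

```python
import sqlite3

# Hypothetical data: each supplier record "owns" a chain of shipment records,
# mimicking a one-to-many Codasyl set implemented with explicit links.
suppliers = {
    "S1": {"name": "Smith", "first_shipment": 0},
    "S2": {"name": "Jones", "first_shipment": 2},
}
# Each shipment stores the position of the next member of its set (None = end).
shipments = [
    {"part": "P1", "qty": 300, "next": 1},
    {"part": "P2", "qty": 200, "next": None},
    {"part": "P1", "qty": 100, "next": None},
]


def navigational_total(supplier_id):
    """Follow the links record by record, as a navigational program would."""
    total = 0
    cursor = suppliers[supplier_id]["first_shipment"]
    while cursor is not None:
        total += shipments[cursor]["qty"]
        cursor = shipments[cursor]["next"]
    return total


def declarative_total(supplier_id):
    """State what is wanted in SQL and let the DBMS choose the access path."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE shipment (supplier TEXT, part TEXT, qty INTEGER)")
    conn.executemany(
        "INSERT INTO shipment VALUES (?, ?, ?)",
        [("S1", "P1", 300), ("S1", "P2", 200), ("S2", "P1", 100)],
    )
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM shipment WHERE supplier = ?",
        (supplier_id,),
    ).fetchone()
    conn.close()
    return total
```

Both functions return the same total for a given supplier; the difference is that the navigational version breaks if the link structure changes, whereas the SQL version depends only on the table definition.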
Unlike network models, the relational model is value-oriented and does not support object identity. (There is an important tradeoff between object identity and declarativeness.) Because Codasyl DBTG and IMS support object identity, some authors have placed them in the object-oriented DB class. As Ullman asserts: "Many would disagree with our use of the term 'object-oriented' when applied to the first two languages: the Codasyl DBTG language, which was the origin of the network model, and IMS, an early database system using the hierarchical model. However, these languages support object identity, and thus present significant problems and significant advantages when compared with relational languages" [5].
After initial resistance to relational systems, mainly due to performance problems, these products have now achieved such wide acceptance that the network products have almost disappeared from the market. In spite of the advantages of the relational model, it must be recognized that the relational products are not exempt from difficulties. Perhaps one of the greatest demands on RDBMSs is the support of increasingly complex data types; null values, recursive queries, and scarce support for integrity rules and for domains (or abstract data types) are other weaknesses of relational systems. Some of those problems probably will be solved in the next version of Structured Query Language (SQL), SQL:1999 (previously SQL3) [6].
In the 1970s, the great debate on the relative merits of Codasyl and relational models served to compare both classes of models and to obtain a better understanding of their strengths and weaknesses.
During the late 1970s and in the 1980s, research work (and, later, industrial applications) focused on query optimization, high-level languages, the normalization theory, physical structures for stored relations, buffer and memory management algorithms, indexing techniques (variations of B-trees), distributed systems, data dictionaries, transaction management, and so on. That work allowed efficient and secure on-line transactional processing (OLTP) environments (in the first DB generation, DBMSs were oriented toward batch processing). In the 1980s, the SQL language was also standardized (SQL/ANS 86 was approved by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) in 1986), and today, every RDBMS offers SQL.

Many of the DB technology advances at that time were founded on two elements: reference models and data models (see Figure 1.1) [7]. ISO and ANSI proposals on reference models [8–10] have positively influenced not only theoretical research but also practical applications, especially in DB development methodologies. In most of those reference models, two main concepts can be found: the well-known three-level architecture (external, logical, and internal layers), also proposed by Codasyl in 1978, and the recursive data description. The separation between the logical description of data and its physical implementation (data/application independence) was always an important objective in DB evolution, and the three-level architecture, together with the relational data model, was a major step in that direction.
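As a rough illustration of the three-level architecture, the sketch below uses Python's standard `sqlite3` module as a stand-in DBMS. The table, view, and index names are hypothetical and chosen only for this example: a base table plays the role of the logical level, a view plays the role of one external schema, and an index represents a change at the internal level.

```python
import sqlite3


def build_demo():
    """Create a tiny in-memory database with a logical table and an external view."""
    conn = sqlite3.connect(":memory:")
    # Logical level: the table definition shared by all applications.
    conn.execute(
        "CREATE TABLE employee "
        "(id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?, ?)",
        [
            (1, "Ana", "sales", 3000),
            (2, "Luis", "sales", 2800),
            (3, "Eva", "research", 3500),
        ],
    )
    # External level: one user group sees names and departments, not salaries.
    conn.execute("CREATE VIEW staff_directory AS SELECT name, dept FROM employee")
    return conn


def directory(conn, dept):
    """Query written against the external view only."""
    rows = conn.execute(
        "SELECT name FROM staff_directory WHERE dept = ? ORDER BY name", (dept,)
    )
    return [name for (name,) in rows]


def add_index(conn):
    """Internal level: a physical tuning decision invisible to the query above."""
    conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
```

Calling `directory(conn, "sales")` before and after `add_index(conn)` yields the same answer, which is the point of data/application independence: the physical design changed, while the application's query and its result did not.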
In terms of data models, the relational model has influenced research agendas for many years and is supported by most of the current products. Recently, other DBMSs have appeared that implement other models, most of which are based on object-oriented principles.²
Three key factors can be identified in the evolution of DBs: theoretical basis (resulting from researchers' work), products (developed by vendors), and practical applications (requested by users). Those three factors have been present throughout the history of DB, but the equilibrium among them has changed. What began as a product technology demanded by users' needs in the 1960s became a vendor industry during the 1970s and 1980s. In the 1970s, the relational model marked the consideration of DB as a research technology, a consideration that still persists. In general, users' needs have always influenced the evolution of DB technology, but especially so in the last decade.
Today, we are witnessing an extraordinary development of DB technology. Areas that were once exclusive to research laboratories and centers are appearing in the latest DBMS releases: World Wide Web, multimedia, active, object-oriented, secure, temporal, parallel, and multidimensional DBs.
[Figure 1.1: diagram not reproduced; recoverable labels: Reference models (ISO, ANSI); Relational; Object-oriented]
² An IDC forecast in 1997 indicated that object-oriented DBMSs would not exceed 5% of the whole DB market.
Table 1.1 summarizes the history of DBs (years are approximate because of the big gaps that sometimes existed between theoretical research, the appearance of the resulting prototypes, and when the corresponding products were offered in the market).

Table 1.1 Database Evolution

1960    First DB products (DBOM, IMS, IDS, Total, IDMS)
        Codasyl standards
1970    Relational model
        RDBMS prototypes
        Relational theoretical works
        Three-level architecture (ANSI and Codasyl)
        E/R model
        First relational market products
1980    Distributed DBs
        CASE tools
        SQL standard (ANSI, ISO)
        Object-oriented DB manifesto

        SQL/MM

1.2.2 Evolution of DB Design Methodologies³

DB modeling is a complex problem that deals with the conception, comprehension, structure, and description of the real world (universe of discourse), through the creation of schemata, based on the abstraction processes and models. The use of methodologies that guide the designer in the process of obtaining the different schemata is essential. Some methodologies offer only vague indications or are limited to proposing some heuristics. Other methodologies establish well-defined stages (e.g., the schemata transformation process from the entity relationship (E/R) model to the relational model [11–13]) and even formalize theories (e.g., the normalization process introduced by Codd in 1970 [4] and developed in many other published papers⁴).

Database design also evolved according to the evolution of DBMSs and data models. When data models with more expressive power were born, DBMSs were capable of incorporating more semantics, and physical and logical designs started to be distinguished from each other as well. With the appearance of the relational model, DB design focused, especially in the academic field, on the normalization theory. ANSI architecture, with its three levels, also had a considerable influence on the evolution of design methodologies. It helped to differentiate the phases of DB design. In 1976, the E/R model proposed by Chen [14, 15] introduced a new phase in DB design: conceptual modeling (discussed in Chapters 2 and 14). This stage constitutes the most abstract level, closer to the universe of discourse than to its computer implementation and independent of the DBMSs. In conceptual modeling, the semantics of the universe of discourse have to be understood and represented in the DB schema through the facilities the model provides. As Saltor [16] said, a greater semantic level helps to solve different problems, such as federated IS engineering, workflow, transaction management, concurrency control, security, confidentiality, and schemata evolution.

³ In considering the contents of this book and the significance of DB design, we thought it appropriate to dedicate a part of this first chapter to presenting the evolution of DB design.
⁴ The normalization theory (or dependency theory) has greatly expanded over the past years, and there are a lot of published works on the subject. For that reason, we refer only to the first paper by Codd introducing the first three normal forms. Readers who want to get into the subject should consult Kent's work "A Simple Guide to Five Normal Forms in Relational Database Theory" (CACM, 26 (2), 1983), which presents a simple, intuitive characterization of the normal forms.

Database design is usually divided into three stages: conceptual design, logical design, and physical design.

• The objective of conceptual design is to obtain a good representation of the enterprise data resources, independent of the implementation level as well as the specific needs of each user or application. It is based on conceptual or object-oriented models.
• The objective of logical design is to transform the conceptual schema by adapting it to the data model implemented by the DBMS to be used (usually relational). In this stage, a logical schema and the most important user views are obtained.
• The objective of physical design is to achieve the most efficient implementation of the logical schema in the physical devices of the computer.
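The three stages listed above can be sketched on a toy example. Everything in the block is an assumption made for illustration: the `Entity` and `Relationship` classes, the department/employee schema, and the simple mapping rule (a one-to-many relationship becomes a foreign key in the "many" table) are not the book's methodology, and SQLite is used only to check that the generated statements are valid.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Conceptual design: an entity with a key and descriptive attributes."""
    name: str
    key: str
    attributes: list = field(default_factory=list)


@dataclass
class Relationship:
    """A one-to-many relationship: one `parent` is linked to many `child`."""
    name: str
    parent: Entity
    child: Entity


def to_logical(entities, relationships):
    """Logical design: map the conceptual schema to relational tables."""
    columns = {
        e.name: [f"{e.key} TEXT PRIMARY KEY"] + [f"{a} TEXT" for a in e.attributes]
        for e in entities
    }
    for r in relationships:
        # A one-to-many relationship becomes a foreign key in the child table.
        columns[r.child.name].append(
            f"{r.parent.key} TEXT REFERENCES {r.parent.name}({r.parent.key})"
        )
    return [f"CREATE TABLE {t} ({', '.join(cols)})" for t, cols in columns.items()]


def to_physical(relationships):
    """Physical design: one possible tuning choice, an index per foreign key."""
    return [
        f"CREATE INDEX idx_{r.child.name}_{r.parent.key} "
        f"ON {r.child.name} ({r.parent.key})"
        for r in relationships
    ]


def build_database(statements):
    """Run the generated statements in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    for statement in statements:
        conn.execute(statement)
    return conn


# Hypothetical conceptual schema: a department employs many employees.
department = Entity("department", "dept_id", ["name"])
employee = Entity("employee", "emp_id", ["name"])
works_in = Relationship("works_in", department, employee)

logical_schema = to_logical([department, employee], [works_in])
physical_schema = to_physical([works_in])
```

The conceptual objects say nothing about tables or indexes; those appear only in the later stages, so the same conceptual schema could be mapped to a different data model or tuned differently without being rewritten.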
During the last few years, there have been many attempts to offer a more systematic approach to solving design problems. In the mid-1980s, one of those attempts was design automation through the use of computer-aided software/system engineering (CASE) tools (see Chapter 13). CASE tools contributed to spreading the applications of conceptual modeling and relaunching DB design methodologies. While it is true that some CASE tools adopted more advanced approaches, many continued to be simple drawing tools. At times, they do not even have methodological support or are not strict enough in their application. As a result, designers cannot find the correct path to do their job [17]. Furthermore, the models the tools generally support are logical models that usually include too many physical aspects, in spite of the fact that the graphic notation used is a subset of the E/R model.

New (object-oriented) analysis and design techniques, which at first focused on programming languages and more recently on DBs [18, 19], have appeared in the last decade. Those methodologies (the Booch method, object-oriented software engineering (OOSE), object modeling technique (OMT), unified method, fusion method, Shlaer-Mellor method, and Coad-Yourdon method, to name some important examples) are mainly distinguished by the life cycle phase on which they focus most and the approach adopted in each phase (object-oriented or functional) [20]. A common characteristic is that they generally are event driven.
The IDEA methodology [21], as a recent methodological approach, is an innovative object-oriented methodology driven by DB technology. It takes a data-centered approach, in which the data design is performed first, followed by the application design.
1.3 The New DB Generation
Many nontraditional applications still do not use DB technology because
of the special requirements for such a category of applications The current