FUNDAMENTALS OF DATABASE SYSTEMS, Fourth Edition (Part 10)


• Marketing, advertising, retailing, entertainment, and travel: There are virtually no limits to using multimedia information in these applications, from effective sales presentations to virtual tours of cities and art galleries. The film industry has already shown the power of special effects in creating animations and synthetically designed animals, aliens, and special effects. The use of predesigned stored objects in multimedia databases will expand the range of these applications.

• Real-time control and monitoring: Coupled with active database technology (see Chapter 24), multimedia presentation of information can be a very effective means for monitoring and controlling complex tasks such as manufacturing operations, nuclear power plants, patients in intensive care units, and transportation systems.

Commercial Systems for Multimedia Information Management. There are no DBMSs designed for the sole purpose of multimedia data management, and therefore there are none that have the range of functionality required to fully support all of the multimedia information management applications that we discussed above. However, several DBMSs today support multimedia data types; these include Informix Dynamic Server, DB2 Universal Database (UDB) of IBM, Oracle 9 and 10, CA-JASMINE, Sybase, and ODB II. All of these DBMSs have support for objects, which is essential for modeling a variety of complex multimedia objects. One major problem with these systems is that the "blades, cartridges, and extenders" for handling multimedia data are designed in a very ad hoc manner. The functionality is provided without much apparent attention to scalability and performance. There are products available that operate either stand-alone or in conjunction with other vendors' systems to allow retrieval of image data by content. They include Virage, Excalibur, and IBM's QBIC. Operations on multimedia need to be standardized. The MPEG-7 and other standards are addressing some of these issues.

29.2.5 Selected Bibliography on Multimedia Databases

Multimedia database management is becoming a very heavily researched area with several industrial projects on the way. Grosky (1994, 1997) provides two excellent tutorials on the topic. Pazandak and Srivastava (1995) provide an evaluation of database systems related to the requirements of multimedia databases. Grosky et al. (1997) contains contributed articles including a survey on content-based indexing and retrieval by Jagadish (1997). Faloutsos et al. (1994) also discuss a system for image querying by content. Li et al. (1998) introduce image modeling in which an image is viewed as a hierarchical structured complex object with both semantics and visual properties. Nwosu et al. (1996) and Subramanian and Jajodia (1997) have written books on the topic.

Lassila (1998) discusses the need for metadata for accessing multimedia information on the Web; the semantic Web effort is summarized in Fensel (2000). Khan (2000) did a dissertation on ontology-based information retrieval. Uschold and Gruninger (1996) is a good resource on ontologies. Corcho et al. (2003) compare ontology languages and discuss methodologies to build ontologies. Multimedia content analysis, indexing, and filtering are discussed in Dimitrova (1999). A survey of content-based multimedia retrieval is provided by Yoshitaka and Ichikawa (1999).

Chapter 29 Emerging Database Technologies and Applications

The following WWW references may be consulted for additional information:

CA-JASMINE (Multimedia ODBMS): http://www.cai.com/products/iasmine.htm

Excalibur technologies: http://www.excalib.com

Virage, Inc. (Content-based image retrieval): http://www.virage.com

IBM's QBIC (Query by Image Content) product:

29.3 GEOGRAPHIC INFORMATION SYSTEMS

Geographic information systems (GIS) are used to collect, model, store, and analyze information describing physical properties of the geographical world. The scope of GIS broadly encompasses two types of data: (1) spatial data, originating from maps, digital images, administrative and political boundaries, roads, and transportation networks, as well as physical data such as rivers, soil characteristics, climatic regions, and land elevations; and (2) nonspatial data, such as socio-economic data (like census counts), economic data, and sales or marketing information. GIS is a rapidly developing domain that offers highly innovative approaches to meet some challenging technical demands.

29.3.1 GIS Applications

It is possible to divide GISs into three categories: (1) cartographic applications, (2) digital terrain modeling applications, and (3) geographic objects applications. Figure 29.3 summarizes these categories.

In cartographic and terrain modeling applications, variations in spatial attributes are captured: for example, soil characteristics, crop density, and air quality. In geographic objects applications, objects of interest are identified from a physical domain: for example, power plants, electoral districts, property parcels, product distribution districts, and city landmarks. These objects are related with pertinent application data, which may be, for this specific example, power consumption, voting patterns, property sales volumes, product sales volume, and traffic density.

The first two categories of GIS applications require a field-based representation, whereas the third category requires an object-based one. The cartographic approach involves special functions that can include the overlapping of layers of maps to combine attribute data that will allow, for example, the measuring of distances in three-dimensional space and the reclassification of data on the map. Digital terrain modeling requires a digital representation of parts of the earth's surface using land elevations at sample points that are connected to yield a surface model such as a three-dimensional net (connected lines in 3D) showing the surface terrain. It requires functions of interpolation between observed points as well as visualization. In object-based geographic applications, additional spatial functions are needed to deal with data related to roads, physical pipelines, communication cables, power lines, and such. For example, for a given region, comparable maps can be used for comparison at various points of time to show changes in certain data such as locations of roads, cables, buildings, and streams.

[Figure 29.3, partially recovered. Surviving entries: Earth science resource studies; Civil engineering and military evaluation. Under Geographic Objects Applications: Car navigation systems; Geographic market analysis; Utility distribution and consumption; Consumer product and services: economic analysis.]

FIGURE 29.3 A possible classification of GIS applications. (Adapted from Adam and Gangopadhyay (1997).)

The functional requirements of the GIS applications above translate into the following database requirements.

Data Modeling and Representation. GIS data can be broadly represented in two formats: (1) vector and (2) raster. Vector data represents geometric objects such as points, lines, and polygons. Thus a lake may be represented as a polygon, and a river by a series of line segments. Raster data is characterized as an array of points, where each point represents the value of an attribute for a real-world location. Informally, raster images are n-dimensional arrays where each entry is a unit of the image and represents an attribute. Two-dimensional units are called pixels, while three-dimensional units are called voxels. Three-dimensional elevation data is stored in a raster-based digital elevation model (DEM) format. Another format, the triangular irregular network (TIN), is a topological vector-based approach that models surfaces by connecting sample points as vertices of triangles and has a point density that may vary with the roughness of the terrain. Rectangular grids (or elevation matrices) are two-dimensional array structures. In digital terrain modeling (DTM), the model also may be used by substituting the elevation with some attribute of interest such as population density or air temperature. GIS data often includes a temporal structure in addition to a spatial structure. For example, traffic flow or average vehicular speeds in traffic may be measured every 60 seconds at a set of points in a roadway network.
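As a rough illustration of the two formats, the following Python sketch stores a lake first as a vector polygon and then as a raster grid derived from it. The coordinates, the cell size, and the helper names (polygon_area, point_in_polygon, rasterize) are assumptions made for this example; they do not come from any particular GIS product.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def polygon_area(vertices: List[Point]) -> float:
    """Area of a simple polygon (vector form) via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def point_in_polygon(p: Point, vertices: List[Point]) -> bool:
    """Ray casting: toggle on every edge crossed by a ray going right from p."""
    x, y = p
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(vertices: List[Point], width: int, height: int,
              cell: float = 1.0) -> List[List[int]]:
    """Raster form: grid[row][col] is 1 when the cell center lies in the polygon."""
    grid = []
    for r in range(height):
        row = []
        for c in range(width):
            center = ((c + 0.5) * cell, (r + 0.5) * cell)
            row.append(1 if point_in_polygon(center, vertices) else 0)
        grid.append(row)
    return grid

# Hypothetical lake: a 3 x 2 rectangle stored as four polygon vertices.
lake = [(1.0, 1.0), (4.0, 1.0), (4.0, 3.0), (1.0, 3.0)]
grid = rasterize(lake, width=5, height=4)
```

The vector form keeps only four vertices and gives an exact area, whereas the raster form stores one value per cell and approximates the same lake at the chosen resolution.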

Data Analysis. GIS data undergoes various types of analysis. For example, in applications such as soil erosion studies, environmental impact studies, or hydrological runoff simulations, DTM data may undergo various types of geomorphometric analysis: measurements such as slope values, gradients (the rate of change in altitude), aspect (the compass direction of the gradient), profile convexity (the rate of change of gradient), plan convexity (the convexity of contours), and other parameters. When GIS data is used for decision support applications, it may undergo aggregation and expansion operations using data warehousing, as we discussed in Section 28.3. In addition, geometric operations (to compute distances, areas, volumes), topological operations (to compute overlaps, intersections, shortest paths), and temporal operations (to compute interval-based or event-based queries) are involved. Analysis involves a number of temporal and spatial operations, which were discussed in Chapter 24.
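Two of the geomorphometric measures named above, slope and aspect, can be sketched on a small elevation grid. This is a minimal sketch under stated assumptions: row 0 of the grid is taken to be the northern edge, columns increase eastward, derivatives are estimated by central differences, and aspect is reported as the bearing of the downslope direction. Real packages differ in these conventions, and the function name slope_aspect is ours.

```python
import math
from typing import List, Optional, Tuple

def slope_aspect(dem: List[List[float]], r: int, c: int,
                 cell: float) -> Tuple[float, Optional[float]]:
    """Slope (degrees) and aspect (compass bearing, degrees) at an interior cell.

    Assumes row 0 is the northern edge and columns increase eastward.
    """
    # Central differences: rate of change of elevation toward east and north.
    dz_east = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell)
    dz_north = (dem[r - 1][c] - dem[r + 1][c]) / (2 * cell)
    slope = math.degrees(math.atan(math.hypot(dz_east, dz_north)))
    if dz_east == 0 and dz_north == 0:
        return slope, None  # flat cell: aspect is undefined
    # Bearing of the downslope direction, measured clockwise from north.
    aspect = math.degrees(math.atan2(-dz_east, -dz_north)) % 360
    return slope, aspect

# Hypothetical terrain that rises 10 units per 10-unit cell toward the east.
dem = [[0.0, 10.0, 20.0],
       [0.0, 10.0, 20.0],
       [0.0, 10.0, 20.0]]
slope, aspect = slope_aspect(dem, 1, 1, cell=10.0)
```

For this grid the surface climbs one unit per unit of distance eastward, so the slope is 45 degrees and the downslope direction faces west.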

Data Integration. GISs must integrate both vector and raster data from a variety of sources. Sometimes edges and regions are inferred from a raster image to form a vector model, or conversely, raster images such as aerial photographs are used to update vector models. Several coordinate systems such as Universal Transverse Mercator (UTM), latitude/longitude, and local cadastral systems are used to identify locations. Data originating from different coordinate systems requires appropriate transformations. Major public sources of geographic data, including the TIGER files maintained by the U.S. Department of Commerce, are used for road maps by many Web-based map drawing tools (e.g., http://maps.yahoo.com). Often there are high-accuracy, attribute-poor maps that have to be merged with low-accuracy, attribute-rich maps. This is done with a process called "rubber-banding," where the user defines a set of control points in both maps and the transformation of the low-accuracy map is accomplished by lining up the control points. A major integration issue is to create and maintain attribute information (such as air quality or traffic flow), which can be related to and integrated with appropriate geographical information over time as both evolve.
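The idea of lining up control points can be illustrated with a deliberately simplified sketch. Here a single affine transformation is fitted exactly through three control-point pairs; production rubber-banding typically uses many control points and piecewise or least-squares fits, so this is a stand-in for the technique rather than a description of any product. The function names and the sample coordinates are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]

def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3 x 3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def _solve3(rows: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3 x 3 linear system with Cramer's rule."""
    d = _det3(rows)
    if d == 0:
        raise ValueError("control points are collinear")
    solution = []
    for k in range(3):
        m = [list(row) for row in rows]
        for i in range(3):
            m[i][k] = rhs[i]  # replace column k by the right-hand side
        solution.append(_det3(m) / d)
    return solution

def fit_affine(src: List[Point], dst: List[Point]) -> Callable[[Point], Point]:
    """Return the affine map that sends each of three src points to its dst point."""
    rows = [[x, y, 1.0] for x, y in src]
    a = _solve3(rows, [p[0] for p in dst])  # x' = a0*x + a1*y + a2
    b = _solve3(rows, [p[1] for p in dst])  # y' = b0*x + b1*y + b2

    def transform(p: Point) -> Point:
        return (a[0] * p[0] + a[1] * p[1] + a[2],
                b[0] * p[0] + b[1] * p[1] + b[2])
    return transform

# Control points picked in the low-accuracy map (src) and the accurate map (dst).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(10.0, 20.0), (12.0, 20.0), (10.0, 23.0)]
to_accurate = fit_affine(src, dst)
moved = to_accurate((1.0, 1.0))
```

Every other feature of the low-accuracy map can then be passed through to_accurate so that its attributes line up with the accurate geometry.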

Data Capture. The first step in developing a spatial database for cartographic modeling is to capture the two-dimensional or three-dimensional geographical information in digital form, a process that is sometimes impeded by source map characteristics such as resolution, type of projection, map scales, cartographic licensing, diversity of measurement techniques, and coordinate system differences. Spatial data can also be captured from remote sensors in satellites such as Landsat, NOAA, and Advanced Very High Resolution Radiometer (AVHRR), as well as SPOT HRV (High Resolution Visible Range Instrument); such data is free of interpretive bias and very accurate. For digital terrain modeling, data capture methods range from manual to fully automated. Ground surveys are the traditional approach and the most accurate, but they are very time consuming. Other techniques include photogrammetric sampling and digitizing cartographic documents.

29.3.3 Specific GIS Data Operations

GIS applications are conducted through the use of special operators such as the following:

1. Interpolation: This process derives elevation data for points at which no samples have been taken. It includes computation at single points, computation for a rectangular grid or along a contour, and so forth. Most interpolation methods are based on triangulation that uses the TIN method for interpolating elevations inside the triangle based on those of its vertices.

2. Interpretation: Digital terrain modeling involves the interpretation of operations on terrain data such as editing, smoothing, reducing details, and enhancing. Additional operations involve patching or zipping the borders of triangles (in TIN data), and merging, which implies combining overlapping models and resolving conflicts among attribute data. Conversions among grid models, contour models, and TIN data are involved in the interpretation of the terrain.

3. Proximity analysis: Several classes of proximity analysis include computations of "zones of interest" around objects, such as the determination of a buffer around a car on a highway. Shortest path algorithms using 2D or 3D information are an important class of proximity analysis.

4. Raster image processing: This process can be divided into two categories: (1) map algebra, which is used to integrate geographic features on different map layers to produce new maps algebraically; and (2) digital image analysis, which deals with analysis of a digital image for features such as edge detection and object detection. Detecting roads in a satellite image of a city is an example of the latter.

5. Analysis of networks: Networks occur in GIS in many contexts that must be analyzed and may be subjected to segmentations, overlays, and so on. Network overlay refers to a type of spatial join where a given network (for example, a highway network) is joined with a point database (for example, incident locations) to yield, in this case, a profile of high-incident roadways.

6. Visualization: A crucial function in GIS is related to visualization: the graphical display of terrain information and the appropriate representation of application attributes to go with it. Major visualization techniques include (1) contouring through the use of isolines, spatial units of lines or arcs of equal attribute values; (2) hillshading, an illumination method used for qualitative relief depiction using varied light intensities for individual facets of the terrain model; and (3) perspective displays, three-dimensional images of terrain model facets using perspective projection methods from computer graphics. These techniques impose cartographic data and other three-dimensional objects on terrain data, providing animated scene renderings such as those in flight simulations and animated movies.

Other Database Functionality. The functionality of a GIS database is also subject to other considerations.

• Extensibility: GISs are required to be extensible to accommodate a variety of constantly evolving applications and corresponding data types. If a standard DBMS is used, it must allow a core set of data types with a provision for defining additional types and methods for those types.

• Data quality control: As in many other applications, quality of source data is of paramount importance for providing accurate results to queries. This problem is particularly significant in the GIS context because of the variety of data, sources, and measurement techniques involved and the absolute accuracy expected by applications users.

Such requirements clearly illustrate that standard RDBMSs or ODBMSs do not meet the special needs of GIS. It is therefore necessary to design systems that support the vector and raster representations and the spatial functionality as well as the required DBMS features. A popular GIS software package called ARC/INFO, which is not a DBMS but integrates RDBMS functionality in the INFO part of the system, is briefly discussed in the subsection that follows. More systems are likely to be designed in the future to work with relational or object databases that will contain some of the spatial and most of the nonspatial information.
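The interpolation operator described in this section, which estimates an elevation inside a TIN triangle from the elevations of its vertices, can be sketched with barycentric weights. This is one common way to interpolate linearly over a triangle; the section does not say which formula a given GIS uses, and the function name and sample vertices below are assumptions.

```python
from typing import Tuple

Vertex = Tuple[float, float, float]  # (x, y, elevation)

def interpolate_in_triangle(p: Tuple[float, float],
                            a: Vertex, b: Vertex, c: Vertex) -> float:
    """Linear interpolation of elevation at p inside triangle abc."""
    x, y = p
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:
        raise ValueError("degenerate triangle")
    # Barycentric weights: each is 1 at its own vertex and 0 on the opposite edge.
    wa = ((b[1] - c[1]) * (x - c[0]) + (c[0] - b[0]) * (y - c[1])) / det
    wb = ((c[1] - a[1]) * (x - c[0]) + (a[0] - c[0]) * (y - c[1])) / det
    wc = 1.0 - wa - wb
    if min(wa, wb, wc) < 0:
        raise ValueError("point lies outside the triangle")
    return wa * a[2] + wb * b[2] + wc * c[2]

# Hypothetical TIN triangle with surveyed elevations at its three vertices.
a = (0.0, 0.0, 100.0)
b = (10.0, 0.0, 200.0)
c = (0.0, 10.0, 300.0)
z = interpolate_in_triangle((2.0, 2.0), a, b, c)
```

The three vertices above lie on the plane z = 100 + 10x + 20y, so the interpolated value at (2, 2) should come out close to 160.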

29.3.4 An Example of GIS Software: ARC/INFO

ARC/INFO, a popular GIS software package launched in 1981 by Environmental System Research Institute (ESRI), uses the arc node model to store spatial data. A geographic layer, called a coverage in ARC/INFO, consists of three primitives: (1) nodes (points), (2) arcs (similar to lines), and (3) polygons. The arc is the most important of the three and stores a large amount of topological information. An arc has a start node and an end node (and it therefore has direction too). In addition, the polygons to the left and the right of the arc are also stored along with each arc. As there is no restriction on the shape of the arc, shape points that have no topological information are also stored along with each arc. The database managed by the INFO RDBMS thus consists of three required tables: (1) node attribute table (NAT), (2) arc attribute table (AAT), and (3) polygon attribute table (PAT). Additional information can be stored in separate tables and joined with any of these three tables.

The NAT contains an internal ID for the node, a user-specified ID, the coordinates of the node, and any other information associated with that node (e.g., names of the intersecting roads at the node). The AAT contains an internal ID for the arc, a user-specified ID, the internal ID of the start and end nodes, the internal ID of the polygons to the left and the right, a series of coordinates of shape points (if any), the length of the arc, and any other data associated with the arc (e.g., the name of the road the arc represents). The PAT contains an internal ID for the polygon, a user-specified ID, the area of the polygon, the perimeter of the polygon, and any other associated data (e.g., name of the county the polygon represents).

Typical spatial queries are related to adjacency, containment, and connectivity. The arc node model has enough information to satisfy all three types of queries, but the RDBMS is not ideally suited for this type of querying. A simple example will highlight the number of times a relational database has to be queried to extract adjacency information. Assume that we are trying to determine whether two polygons, A and B, are adjacent to each other. We would have to exhaustively look at the entire AAT to determine whether there is an edge that has A on one side and B on the other. The search cannot be limited to the edges of either polygon as we do not explicitly store all the arcs that make a polygon in the PAT. Storing all the arcs in the PAT would be redundant because all the information is already there in the AAT.
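The adjacency test just described can be sketched against a toy arc attribute table. The sketch uses Python's built-in sqlite3 module rather than INFO, and the table name aat, its column names, and the sample rows are assumptions chosen for illustration; they are not the actual ARC/INFO schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE aat (
        arc_id     INTEGER PRIMARY KEY,  -- internal ID of the arc
        from_node  INTEGER,              -- internal ID of the start node
        to_node    INTEGER,              -- internal ID of the end node
        left_poly  TEXT,                 -- polygon to the left of the arc
        right_poly TEXT                  -- polygon to the right of the arc
    )""")
conn.executemany(
    "INSERT INTO aat VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 2, "A", "B"),
     (2, 2, 3, "A", "C"),
     (3, 3, 1, "A", "outside"),
     (4, 2, 4, "C", "B"),
     (5, 4, 5, "D", "B")])

def adjacent(conn: sqlite3.Connection, p: str, q: str) -> bool:
    """True if some arc has p on one side and q on the other.

    With no index on the polygon columns, every arc may have to be examined.
    """
    row = conn.execute(
        """SELECT 1 FROM aat
           WHERE (left_poly = ? AND right_poly = ?)
              OR (left_poly = ? AND right_poly = ?)
           LIMIT 1""",
        (p, q, q, p)).fetchone()
    return row is not None
```

Calling adjacent(conn, "A", "B") succeeds on the first arc, whereas adjacent(conn, "A", "D") returns False only after the whole table has been considered, which is the cost the paragraph above points out.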

ESRI has released Arc/Storm (Arc Store Manager), which allows multiple users to use the same GIS, handles distributed databases, and integrates with other commercial RDBMSs like ORACLE, INFORMIX, and SYBASE. While it offers many performance and functional advantages over ARC/INFO, it is essentially an RDBMS embedded within a GIS.

29.3.5 Problems and Future Issues in GIS

GIS is an expanding application area of databases, reflecting an explosion in the number of end users using digitized maps, terrain data, space images, weather data, and traffic information support data. As a consequence, an increasing number of problems related to GIS applications has been generated and will need to be solved:

1. New architectures: GIS applications will need a new client-server architecture that will benefit from existing advances in RDBMS and ODBMS technology. One possible solution is to separate spatial from nonspatial data and to manage the latter entirely by a DBMS. Such a process calls for appropriate modeling and integration as both types of data evolve. Commercial vendors find that it is more viable to keep a small number of independent databases with an automatic posting of updates across them. Appropriate tools for data transfer, change management, and workflow management will be required.

2. Versioning and object life-cycle approach: Because of constantly evolving geographical features, GISs must maintain elaborate cartographic and terrain data, a management problem that might be eased by incremental updating coupled with update authorization schemes for different levels of users. Under the object life-cycle approach, which covers the activities of creating, destroying, and modifying objects as well as promoting versions into permanent objects, a complete set of methods may be predefined to control these activities for GIS objects.

3. Data standards: Because of the diversity of representation schemes and models, formalization of data transfer standards is crucial for the success of GIS. The international standardization body (ISO TC211) and the European standards body (CEN TC278) are now in the process of debating relevant issues, among them conversion between vector and raster data for fast query performance.

4. Matching applications and data structures: Looking again at Figure 29.3, we see that a classification of GIS applications is based on the nature and organization of data. In the future, systems covering a wide range of functions, from market analysis and utilities to car navigation, will need boundary-oriented data and functionality. On the other hand, applications in environmental science, hydrology, and agriculture will require more area-oriented and terrain model data. It is not clear that all this functionality can be supported by a single general-purpose GIS. The specialized needs of GISs will require that general-purpose DBMSs be enhanced with additional data types and functionality before full-fledged GIS applications can be supported.

5. Lack of semantics in data structures: This is evident especially in maps. Information such as highway and road crossings may be difficult to determine based on the stored data. One-way streets are also hard to represent in the present GISs. Transportation CAD systems have incorporated such semantics into GIS.

29.3.6 Selected Bibliography for GIS

There are a number of books written on GIS. Adam and Gangopadhyay (1997) and Laurini and Thompson (1992) focus on GIS database and information management problems. Kemp (1993) gives an overview of GIS issues and data sources. Huxhold (1991) gives an introduction to Urban GIS. Maguire et al. (1991) have a very good collection of GIS-related papers. Antenucci (1998) presents a discussion of the GIS technologies. Shekhar and Chawla (2002) discuss issues and approaches to spatial data management, which is at the core of all GIS. Demers (2002) is another recent book on the fundamentals of GIS. Bossomaier and Green (2002) is a primer on GIS operations, languages, metadata paradigms, and standards. Peng and Tsou (2003) discuss Internet GIS, which includes a suite of emerging new technologies aimed at making GIS more mobile, powerful, and flexible, as well as better able to share and communicate geographic information. The TIGER files for road data in the United States are managed by the U.S. Department of Commerce (1993). Laser-Scan's Web site (http://www.lsl.co.uk/papers) is a good source of information.

Environmental System Research Institute (ESRI) has an excellent library of GIS books for all levels at http://www.esri.com. The GIS terminology is defined at http://www.esri.com/library/glossary/glossary.html. The University of Edinburgh maintains a GIS WWW resource list at http://www.geo.ed.ac.uk/home/giswww.html.

29.4 GENOME DATA MANAGEMENT

29.4.1 Biological Sciences and Genetics

The biological sciences encompass an enormous variety of information. Environmental science gives us a view of how species live and interact in a world filled with natural phenomena. Biology and ecology study particular species. Anatomy focuses on the overall structure of an organism, documenting the physical aspects of individual bodies. Traditional medicine and physiology break the organism into systems and tissues and strive to collect information on the workings of these systems and the organism as a whole. Histology and cell biology delve into the tissue and cellular levels and provide knowledge about the inner structure and function of the cell. This wealth of information that has been generated, classified, and stored for centuries has only recently become a major application of database technology.

Genetics has emerged as an ideal field for the application of information technology. In a broad sense, it can be thought of as the construction of models based on information about genes (which can be defined as basic units of heredity) and populations, and the seeking out of relationships in that information. The study of genetics can be divided into three branches: (1) Mendelian genetics, (2) molecular genetics, and (3) population genetics. Mendelian genetics is the study of the transmission of traits between generations. Molecular genetics is the study of the chemical structure and function of genes at the molecular level. Population genetics is the study of how genetic information varies across populations of organisms.

Molecular genetics provides a more detailed look at genetic information by allowing researchers to examine the composition, structure, and function of genes. The origins of molecular genetics can be traced to two important discoveries. The first occurred in 1869 when Friedrich Miescher discovered nuclein and its primary component, deoxyribonucleic acid (DNA). In subsequent research DNA and a related compound, ribonucleic acid (RNA), were found to be composed of nucleotides (a sugar, a phosphate, and a base, which combined to form nucleic acid) linked into long polymers via the sugar and phosphate. The second discovery was the demonstration in 1944 by Oswald Avery that DNA was indeed the molecular substance carrying genetic information. Genes were thus shown to be composed of chains of nucleic acids arranged linearly on chromosomes and to serve three primary functions: (1) replicating genetic information between generations, (2) providing blueprints for the creation of polypeptides, and (3) accumulating changes, thereby allowing evolution to occur. Watson and Crick found the double-helix structure of DNA in 1953, which gave molecular genetics research a new direction.[6] Discovery of DNA and its structure is hailed as probably the most important biological work of the last 100 years, and the field it opened may be the scientific frontier for the next 100. In 1962, Watson, Crick, and Wilkins won the Nobel Prize for physiology/medicine for this breakthrough.[7]

29.4.2 Characteristics of Biological Data

Biological data exhibits many special characteristics that make management of biological information a particularly challenging problem. We will thus begin by summarizing the characteristics related to biological information, focusing on a multidisciplinary field called bioinformatics that has emerged, with graduate degree programs now in place in several universities. Bioinformatics addresses information management of genetic information with special emphasis on DNA sequence analysis. It needs to be broadened into a wider scope to harness all types of biological information: its modeling, storage, retrieval, and management. Moreover, applications of bioinformatics span design of targets for drugs, study of mutations and related diseases, anthropological investigations on migration patterns of tribes, and therapeutic treatments.

Characteristic 1: Biological data is highly complex when compared with most other domains or applications. Definitions of such data must thus be able to represent a complex substructure of data as well as relationships and to ensure that no information is lost during biological data modeling. The structure of biological data often provides an additional context for interpretation of the information. Biological information systems must be able to represent any level of complexity in any data schema, relationship, or schema substructure, not just hierarchical, binary, or table data. As an example, MITOMAP is a database documenting the human mitochondrial genome.[8] This single genome is a small, circular piece of DNA encompassing information about 16,569 nucleotide bases; 52 gene loci encoding messenger RNA, ribosomal RNA, and transfer RNA; 1000 known population variants; over 60 known disease associations; and a limited set of knowledge on the complex molecular interactions of the biochemical energy-producing pathway of oxidative phosphorylation. As might be expected, its management has encountered a large number of problems; we have been unable to use the traditional RDBMS or ODBMS approaches to capture all aspects of the data.

6. See Nature, 171:737, 1953.

7. http://www.pbs.org/wgbh/aso/databank/entries/doS3dn.html

Characteristic 2: The amount and range of variability in data is high. Hence, biological systems must be flexible in handling data types and values. With such a wide range of possible data values, placing constraints on data types must be limited since this may exclude unexpected values (e.g., outlier values) that are particularly common in the biological domain. Exclusion of such values results in a loss of information. In addition, frequent exceptions to biological data structures may require a choice of data types to be available for a given piece of data.

Characteristic 3: Schemas in biological databases change at a rapid pace. Hence, for improved information flow between generations or releases of databases, schema evolution and data object migration must be supported. The ability to extend the schema, a frequent occurrence in the biological setting, is unsupported in most relational and object database systems. Presently systems such as GenBank rerelease the entire database with new schemas once or twice a year rather than incrementally changing the system as changes become necessary. Such an evolutionary database would provide a timely and orderly mechanism for following changes to individual data entities in biological databases over time. This sort of tracking is important for biological researchers to be able to access and reproduce previous results.

Characteristic 4: Representations of the same data by different biologists will likely be different (even when using the same system). Hence, mechanisms for "aligning" different biological schemas or different versions of schemas should be supported. Given the complexity of biological data, there are a multitude of ways of modeling any given entity, with the results often reflecting the particular focus of the scientist. While two individuals may produce different data models if asked to interpret the same entity, these models will likely have numerous points in common. In such situations, it would be useful to biological investigators to be able to run queries across these common points. By linking data elements in a network of schemas, this could be accomplished.

Characteristic 5: Most users of biological data do not require write access to the database; read-only access is adequate. Write access is limited to privileged users called curators. For example, the database created as part of the MITOMAP project has on average more than 15,000 users per month on the Internet. There are fewer than twenty non-curator-generated submissions to MITOMAP every month. In other words, the number of users requiring write access is small. Users generate a wide variety of read-access patterns into the database, but these patterns are not the same as those seen in traditional relational databases. User-requested ad hoc searches demand indexing of often unexpected combinations of data instance classes.

8. Details of MITOMAP and its information complexity can be seen in Kogelnik et al. (1997, 1998) and at http://www.mitomap.org

Characteristic 6: Most biologists are not likely to have any knowledge of the internal structure of the database or about schema design. Biological database interfaces should display information to users in a manner that is applicable to the problem they are trying to address and that reflects the underlying data structure. Biological users usually know which data they require, but they have no technical knowledge of the data structure or how a DBMS represents the data. They rely on technical users to provide them with views into the database. Relational schemas fail to provide cues or any intuitive information to the user regarding the meaning of their schema. Web interfaces in particular often provide preset search interfaces, which may limit access into the database. However, if these interfaces are generated directly from database structures, they are likely to produce a wider possible range of access, although they may not guarantee usability.

Characteristic 7: The context of data gives added meaning for its use in biological applications. Hence, context must be maintained and conveyed to the user when appropriate. In addition, it should be possible to integrate as many contexts as possible to maximize the interpretation of a biological data value. Isolated values are of less use in biological systems. For example, the sequence of a DNA strand is not particularly useful without additional information describing its organization, function, and such. A single nucleotide on a DNA strand, for example, seen in context with nondisease-causing DNA strands, could be seen as a causative element for sickle cell anemia.

Characteristic 8: Defining and representing complex queries is extremely important to the biologist. Hence, biological systems must support complex queries. Without any knowledge of the data structure (see Characteristic 6), average users cannot construct a complex query across data sets on their own. Thus, in order to be truly useful, systems must provide some tools for building these queries. As mentioned previously, many systems provide predefined query templates.

Characteristic 9: Users of biological information often require access to "old" values of the data, particularly when verifying previously reported results. Hence, changes to the values of data in the database must be supported through a system of archives. Access to both the most recent version of a data value and its previous version is important in the biological domain. Investigators consistently want to query the most up-to-date data, but they must also be able to reconstruct previous work and reevaluate prior and current information. Consequently, values that are about to be updated in a biological database cannot simply be thrown away.
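The archiving requirement in Characteristic 9 can be illustrated with a small sketch. This is only one possible design, not something the text prescribes: the class name `ArchivedValue` and its methods are hypothetical, and versions are identified by a simple integer counter rather than by any scheme used in a real biological database.

```python
class ArchivedValue:
    """Hypothetical container that archives every version of a data value.

    Updates never overwrite: each new value is appended, so earlier
    versions stay available for verifying previously reported results.
    """

    def __init__(self, initial):
        self._versions = [initial]  # version 0 is the original value

    def update(self, new_value):
        """Record a new value and return its version number."""
        self._versions.append(new_value)
        return len(self._versions) - 1

    def current(self):
        """Most recent value, which is what most queries want."""
        return self._versions[-1]

    def version(self, n):
        """Value as it stood at version n (0 = original)."""
        return self._versions[n]


# Illustrative data only: a short made-up sequence fragment.
seq = ArchivedValue("ACGT")
seq.update("ACGA")
print(seq.current(), seq.version(0))
```

Keeping the old versions addressable by a stable number is what lets an investigator rerun an earlier analysis against exactly the data it originally used.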

All of these characteristics clearly point to the fact that today's DBMSs do not fully cater to the requirements of complex biological data. A new direction in database management systems is necessary.

9. See Kogelnik et al. (1997, 1998) for further details.

29.4.3 The Human Genome Project and Existing Biological Databases

The term genome is defined as the total genetic information that can be obtained about an entity. The human genome, for example, generally refers to the complete set of genes required to create a human being, estimated to be more than 30,000 genes spread over 23 pairs of chromosomes, with an estimated 3 to 4 billion nucleotides. The goal of the Human Genome Project (HGP) has been to obtain the complete sequence (the ordering of the bases) of those nucleotides. A rough draft of the entire human genome sequence was announced in June 2000, and the 13-year effort will end in the year 2003 with the completion of the human genetic sequence. In isolation, the human DNA sequence is not particularly useful. The sequence can, however, be combined with other data and used as a powerful tool to help address questions in genetics, biochemistry, medicine, anthropology, and agriculture.

In the existing genome databases, the focus has been on "curating" (or collecting with some initial scrutiny and quality check) and classifying information about genome sequence data. In addition to the human genome, numerous organisms such as E. coli, Drosophila, and C. elegans have been investigated. We will briefly discuss some of the existing database systems that are supporting or have grown out of the Human Genome Project.

GenBank. The preeminent DNA sequence database in the world today is GenBank, maintained by the National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM). It was established in 1978 as a central repository for DNA sequence data. Since then it has expanded somewhat in scope to include expressed sequence tag data, protein sequence data, three-dimensional protein structure, taxonomy, and links to the biomedical literature (MEDLINE). As of release 135.0 in April 2003, GenBank contains over 31 billion nucleotide bases of more than 24 million sequences from over 100,000 species, with roughly 1400 new organisms being added each month. The database size in flat file format is over 100 GB uncompressed and has been doubling every 15 months. Through international collaboration with the European Molecular Biology Laboratory (EMBL) in the U.K. and the DNA Data Bank of Japan (DDBJ), data are exchanged among the three sites on a daily basis. The mirroring of sequence data at the three sites affords fast access to this data to scientists in various geographical parts of the world.

While it is a complex, comprehensive database, the scope of its coverage is focused on human sequences and links to the literature. Other limited data sources (e.g., three-dimensional structure and OMIM, discussed below) have been added recently by reformatting the existing OMIM and PDB databases and redesigning the structure of the GenBank system to accommodate these new data sets.

The system is maintained as a combination of flat files, relational databases, and files containing Abstract Syntax Notation One (ASN.1), a syntax for defining data structures developed for the telecommunications industry. Each GenBank entry is assigned a unique identifier by the NCBI. Updates are assigned a new identifier, with the identifier of the original entity remaining unchanged for archival purposes. Older references to an entity thus do not inadvertently indicate a new and possibly inappropriate value. The most current concepts also receive a second set of unique identifiers (UIDs), which mark the most up-to-date form of a concept while allowing older versions to be accessed via their original identifier.

The average user of the database is not able to access the structure of the data directly for querying or other functions, although complete snapshots of the database are available for export in a number of formats, including ASN.1. The query mechanism provided is via the Entrez application (or its World Wide Web version), which allows keyword, sequence, and GenBank UID searching through a static interface.

The Genome Database (GDB). Created in 1989, the Genome Database (GDB) is a catalog of human gene mapping data; gene mapping is a process that associates a piece of information with a particular location on the human genome. The degree of precision of this location on the map depends upon the source of the data, but it is usually not at the level of individual nucleotide bases. GDB data includes data describing primarily map information (distance and confidence limits) and Polymerase Chain Reaction (PCR) probe data (experimental conditions, PCR primers, and reagents used). More recently, efforts have been made to add data on mutations linked to genetic loci, cell lines used in experiments, DNA probe libraries, and some limited polymorphism and population data.

The GDB system is built around SYBASE, a commercial relational DBMS, and its data are modeled using standard Entity-Relationship techniques (see Chapters 3 and 4). The implementors of GDB have noted difficulties in using this model to capture more than simple map and probe data. In order to improve data integrity and to simplify the programming for application writers, GDB distributes a Database Access Toolkit. However, most users use a Web interface to search the ten interlinked data managers. Each manager keeps track of the links (relationships) for one of the ten tables within the GDB system. As with GenBank, users are given only a very high-level view of the data at the time of searching and thus cannot easily make use of any knowledge gleaned from the structure of the GDB tables. Search methods are most useful when users are simply looking for an index into map or probe data. Exploratory ad hoc searching of the database is not encouraged by present interfaces. Integration of the database structures of GDB and OMIM (see below) was never fully established.

Online Mendelian Inheritance in Man. Online Mendelian Inheritance in Man (OMIM) is an electronic compendium of information on the genetic basis of human disease. Begun in hard-copy form by Victor McKusick in 1966 with 1500 entries, it was converted to a full-text electronic form between 1987 and 1989 by the GDB. In 1991 its administration was transferred from Johns Hopkins University to the NCBI, and the entire database was converted to NCBI's GenBank format. Today it contains more than 14,000 entries.

OMIM covers material on five disease areas based loosely on organs and systems. Any morphological, biochemical, behavioral, or other properties under study are referred to as the phenotype of an individual (or a cell). Mendel realized that genes can exist in numerous different forms known as alleles. A genotype refers to the actual allelic composition of an individual.

The structure of the phenotype and genotype entries contains textual data loosely structured as general descriptions, nomenclature, modes of inheritance, variations, gene structure, mapping, and numerous lesser categories. The full-text entries were converted to an ASN.1 structured format when OMIM was transferred to the NCBI. This greatly improved the ability to link OMIM data to other databases, and it also provided a rigorous structure for the data. However, the basic form of the database remained difficult to modify.

EcoCyc. The Encyclopedia of Escherichia coli Genes and Metabolism (EcoCyc) is a recent experiment in combining information about the genome and the metabolism of E. coli K-12. The database was created in 1996 as a collaboration between Stanford Research Institute and the Marine Biological Laboratory. It catalogs and describes the known genes of E. coli, the enzymes encoded by those genes, and the biochemical reactions catalyzed by each enzyme and their organization into metabolic pathways. In so doing, EcoCyc spans the sequence and function domains of genomic information. It contains 1283 compounds with 965 structures as well as lists of bonds and atoms, molecular weights, and empirical formulas. It contains 3038 biochemical reactions described using 269 data classes.

An object-oriented data model was first used to implement the system, with data stored on Ocelot, a frame knowledge representation system. EcoCyc data was arranged in a hierarchy of object classes based on the observations that (1) the properties of a reaction are independent of an enzyme that catalyzes it, and (2) an enzyme has a number of properties that are "logically distinct" from its reactions.

EcoCyc provides two methods of querying: (1) direct (via predefined queries) and (2) indirect (via hypertext navigation). Direct queries are performed using menus and dialogs that can initiate a large but finite set of queries. No navigation of the actual data structures is supported. In addition, no mechanism for evolving the schema is documented.

Table 29.1 summarizes the features of the major genome-related databases, as well as the HGMDB and ACEDB databases. Some additional protein databases exist; they contain information about protein structures. Prominent protein databases include SWISS-PROT at the University of Geneva, Protein Data Bank (PDB) at Brookhaven National Laboratory, and Protein Identification Resource (PIR) at the National Biomedical Research Foundation.

Over the past ten years, there has been an increasing interest in the applications of databases in biology and medicine. GenBank, GDB, and OMIM have been created as central repositories of certain types of biological data but, while extremely useful, they do not yet cover the complete spectrum of the Human Genome Project data. However, efforts are under way around the world to design new tools and techniques that will alleviate the data management problem for the biological scientists and medical researchers.

Gene Ontology. We already explained the concept of ontologies in Section 29.2.3 in the context of modeling of multimedia information. The Gene Ontology (GO) Consortium was formed in 1998 as a collaboration among three model organism databases: FlyBase, Mouse Genome Informatics (MGI), and Saccharomyces or yeast Genome Database (SGD). Its goal is to produce a structured, precisely defined, common, controlled vocabulary for describing the roles of genes and gene products in any organism. With the completion of genome sequencing of many species, it has been observed that a large fraction of genes among organisms display similarity in biological roles, and biologists have acknowledged that there is likely to be a single limited universe of genes and proteins that are conserved in most or all living cells. On the other hand, genome data is increasing exponentially and there is no uniform way to interpret and conceptualize the shared biological elements. Gene Ontology makes possible the annotation of gene products using a common vocabulary based on their shared biological attributes and interoperability between genomic databases.

The GO Consortium has developed three ontologies: molecular function, biological process, and cellular component, to describe attributes of genes, gene products, or gene product groups. Molecular function is defined as the biochemical activity of a gene product. Biological process refers to a biological objective to which the gene or gene product contributes. Cellular component refers to the place in the cell where a gene product is active. Each ontology comprises a set of well-defined vocabularies of terms and relationships. The terms are organized in the form of directed acyclic graphs (DAGs), in which a term node may have multiple parents and multiple children. A child term can be an instance of (is a) or a part of its parent. In the latest release of the GO database, there are over 13,000 terms and more than 18,000 relationships between terms. The annotation of gene products is operated independently by each of the collaborating databases. A subset of the annotations is included in the GO database, which contains over 1,386,000 gene products and 5,244,000 associations between gene products and GO terms.

TABLE 29.1 SUMMARY OF THE MAJOR GENOME-RELATED DATABASES

Genbank DNA/RNA Text files Flat-file/ASN.1 Schema brows- Text, numeric,
ing to other dbs
OMIM Disease Index cards/text Flat-file/ASN.1 Unstructured, Text
dbs
complex objects, linking
to other dbs
HGMDB Sequence and Flat file- Flat-file- Schema expan- Text
sequence application application sion/evolution,
dbs
evolution

The Gene Ontology was implemented using MySQL, an open source relational DBMS, and a monthly database release is available in SQL and XML formats. A set of tools and libraries, written in C, Java, Perl, XML, and so on, is available for database access and development of applications. Web-based and stand-alone GO browsers are available from the GO Consortium.
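The DAG organization of GO terms described above, in which a term may have several parents through "is a" or "part of" links, can be sketched as follows. This is a minimal illustration and not the GO Consortium's actual schema or tooling: the class `TermGraph`, its methods, and the term names used in the example are all hypothetical.

```python
from collections import defaultdict


class TermGraph:
    """Hypothetical DAG of ontology terms; edges point from child to parent."""

    RELATIONS = ("is_a", "part_of")

    def __init__(self):
        # child term -> list of (parent term, relation)
        self.parents = defaultdict(list)

    def add_edge(self, child, parent, relation):
        if relation not in self.RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.parents[child].append((parent, relation))

    def ancestors(self, term):
        """All terms reachable by following parent links (any relation)."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for parent, _relation in self.parents.get(current, []):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Made-up term names, for illustration only (not real GO identifiers).
g = TermGraph()
g.add_edge("nucleus", "organelle", "is_a")    # first parent
g.add_edge("nucleus", "cell", "part_of")      # second parent
g.add_edge("organelle", "cell", "part_of")
print(sorted(g.ancestors("nucleus")))
```

Because a node can have more than one parent, the traversal tracks visited terms so that a shared ancestor such as "cell" is reported once.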

29.4.4 Selected Bibliography for Genome Databases

Bioinformatics has become a popular area of research in recent years, and many workshops and conferences are being organized around this topic. Robbins (1993) gives a good overview, while Frenkel (1991) surveys the human genome project with its special role in bioinformatics at large. Cutticchia et al. (1993), Benson et al. (2002), and Pearson et al. (1994) are references on GDB, GenBank, and OMIM. In an international collaboration among GenBank (USA), DNA Data Bank of Japan (DDBJ) (http://www.ddbj.nig.ac.jp/E-mail/homology.html), and European Molecular Biology Laboratory (EMBL) (Stoesser G, 2003), data are exchanged amongst the collaborating databases on a daily basis to achieve optimal synchronization. Wheeler et al. (2000) discuss the various tools that currently allow users access and analysis of the data available in the databases.

Wallace (1995) has been a pioneer in mitochondrial genome research, which deals with a specific part of the human genome; the sequence and organizational details of this area appear in Anderson et al. (1981). Recent work in Kogelnik et al. (1997, 1998) and Kogelnik (1998) addresses the development of a generic solution to the data management problem in biological sciences by developing a prototype solution. Apweiler et al. (2003) review the core bioinformatics resources maintained at the European Bioinformatics Institute (EBI) (such as SWISS-PROT + TrEMBL) and summarize important issues of database management of such resources. They discuss three main types of databases: Sequence Databases such as the DDBJ/EMBL/GenBank Nucleotide Sequence Database; Secondary Databases such as PROSITE, PRINTS, and Pfam; and Integrated Databases such as InterPro, which integrates data from six major protein signature databases (Pfam, PRINTS, ProDom, PROSITE, SMART, and TIGRFAMs).

The European Bioinformatics Institute Macromolecular Structure Database (MSD), which is a relational database (http://www.ebi.ac.uk/msd) (Boutselakis et al., 2003), is designed to be a single access point for protein and nucleic acid structures and related information. The database is derived from Protein Data Bank (PDB) entries. The search database contains an extensive set of derived properties, goodness-of-fit indicators, and links to other EBI databases including InterPro, GO, and SWISS-PROT, together with links to SCOP, CATH, PFAM, and PROSITE. Karp (1996) discusses the problems of interlinking the variety of databases mentioned in this section. He defines two types of links: those that integrate the data and those that relate the data between databases. These were used to design the EcoCyc database.

Some of the important web links include the following: The Human Genome sequence information can be found at http://www.ncbi.nlm.nih.gov/genome/seq/. The MITOMAP database developed in Kogelnik (1998) can be accessed at http://www.mitomap.org/. The biggest protein database, SWISS-PROT, can be accessed from http://expasy.hcuge.ch/sprot/. The ACEDB database information is available at http://probe.nalusda.gov:8080/acedocs/.

Appendix A: Alternative Diagrammatic Notations for ER Models

Figure A.1 shows a number of different diagrammatic notations for representing ER and EER model concepts. Unfortunately, there is no standard notation: different database design practitioners prefer different notations. Similarly, various CASE (computer-aided software engineering) tools and OOA (object-oriented analysis) methodologies use various notations. Some notations are associated with models that have additional concepts and constraints beyond those of the ER and EER models described in Chapters 3 and 24, while other models have fewer concepts and constraints. The notation we used in Chapter 3 is quite close to the original notation for ER diagrams, which is still widely used. We discuss some alternate notations here.

Figure A.1(a) shows different notations for displaying entity types/classes, attributes, and relationships. In Chapters 3 and 24, we used the symbols marked (i) in Figure A.1(a), namely, rectangle, oval, and diamond. Notice that symbol (ii) for entity types/classes, symbol (ii) for attributes, and symbol (ii) for relationships are similar, but they are used by different methodologies to represent three different concepts. The straight line symbol (iii) for representing relationships is used by several tools and methodologies.

Figure A.1(b) shows some notations for attaching attributes to entity types. We used notation (i). Notation (ii) uses the third notation (iii) for attributes from Figure A.1(a). The last two notations in Figure A.1(b), (iii) and (iv), are popular in OOA methodologies and in some CASE tools. In particular, the last notation displays both the attributes and the methods of a class, separated by a horizontal line.


[Figure A.1, parts (a) and (b): (a) entity type/class symbols, attribute symbols, and relationship symbols; (b) notations for attaching the attributes Ssn, Name, and Address to the entity type EMPLOYEE.]

Figure A.1(c) shows various notations for representing the cardinality ratio of binary relationships. We used notation (i) in Chapters 3 and 24. Notation (ii), known as the chicken feet notation, is quite popular. Notation (iv) uses the arrow as a functional reference (from the N to the 1 side) and resembles our notation for foreign keys in the relational model (see Figure 7.7); notation (v), used in Bachman diagrams, uses the arrow in the reverse direction (from the 1 to the N side). For a 1:1 relationship, (ii) uses a straight line without any chicken feet; (iii) makes both halves of the diamond white; and (iv) places arrowheads on both sides. For an M:N relationship, (ii) uses chicken feet at both ends of the line; (iii) makes both halves of the diamond black; and (iv) does not display any arrowheads.

Figure A.1(d) shows several variations for displaying (min, max) constraints, which are used to display both cardinality ratio and total/partial participation. Notation (ii) is the alternative notation we used in Figure 3.15 and discussed in Section 3.7.4. Recall that our notation specifies the constraint that each entity must participate in at least min and at most max relationship instances. Hence, for a 1:1 relationship, both max values are 1; and for M:N, both max values are n. A min value greater than 0 (zero) specifies total participation (existence dependency). In methodologies that use the straight line for displaying relationships, it is common to reverse the positioning of the (min, max) constraints, as shown in (iii). Another popular technique, which follows the same positioning as (iii), is to display the min as o ("oh" or circle, which stands for zero) or as | (vertical dash, which stands for 1), and to display the max as | (vertical dash, which stands for 1) or as chicken feet (which stands for n), as shown in (iv).
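The meaning of a (min, max) constraint, that each entity must take part in at least min and at most max relationship instances, can be checked mechanically. The sketch below is illustrative only: the function name `participation_violations`, the representation of relationship instances as dictionaries, and the sample EMPLOYEE/DEPARTMENT data are assumptions, and `None` is used here to stand for a max of n.

```python
from collections import Counter


def participation_violations(entities, instances, role, min_card, max_card=None):
    """Return entities whose participation count breaks (min_card, max_card).

    entities:  iterable of entity identifiers playing `role`
    instances: list of dicts mapping role name -> entity identifier
    max_card:  upper bound, or None for "n" (unbounded)
    """
    counts = Counter(inst[role] for inst in instances)
    bad = []
    for entity in entities:
        n = counts[entity]  # Counter yields 0 for entities that never appear
        if n < min_card or (max_card is not None and n > max_card):
            bad.append(entity)
    return bad


# Hypothetical data: a WORKS_FOR relationship between employees and departments.
employees = ["e1", "e2", "e3"]
departments = ["d1", "d2"]
works_for = [
    {"employee": "e1", "department": "d1"},
    {"employee": "e2", "department": "d1"},
]

# Employee side (1, 1): e3 has no department, so total participation fails.
print(participation_violations(employees, works_for, "employee", 1, 1))
# Department side (1, n): d2 has no employees.
print(participation_violations(departments, works_for, "department", 1))
```

A min of 0 would make both checks pass, which corresponds to partial participation.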

Figure A.1(e) shows some notations for displaying specialization/generalization. We used notation (i) in Chapter 14, where a d in the circle specifies that the subclasses (S1, S2, and S3) are disjoint and an o specifies overlapping subclasses. Notation (ii) uses G (for generalization) to specify disjoint, and Gs to specify overlapping; some notations use the solid arrow, while others use the empty arrow (shown at the side). Notation (iii) uses a triangle pointing toward the superclass, and notation (v) uses a triangle pointing toward the subclasses; it is also possible to use both notations in the same methodology, with (iii) indicating generalization and (v) indicating specialization. Notation (iv) places the boxes representing subclasses within the box representing the superclass. Of the notations based on (vi), some use a single-lined arrow, and others use a double-lined arrow (shown at the side).

The notations shown in Figure A.1 show only some of the diagrammatic symbols that have been used or suggested for displaying database conceptual schemes. Other notations, as well as various combinations of the preceding, have also been used. It would be useful to establish a standard that everyone would adhere to, in order to prevent misunderstandings and reduce confusion.

Appendix C: Parameters of Disks

The most important disk parameter is the time required to locate an arbitrary disk block, given its block address, and then to transfer the block between the disk and a main memory buffer. This is the random access time for accessing a disk block. There are three time components to consider:

1. Seek time (s): This is the time needed to mechanically position the read/write head on the correct track for movable-head disks. (For fixed-head disks, it is the time needed to electronically switch to the appropriate read/write head.) For movable-head disks this time varies, depending on the distance between the current track under the read/write head and the track specified in the block address. Usually, the disk manufacturer provides an average seek time in milliseconds. The typical range of average seek time is 10 to 60 msec. This is the main "culprit" for the delay involved in transferring blocks between disk and memory.

2. Rotational delay (rd): Once the read/write head is at the correct track, the user must wait for the beginning of the required block to rotate into position under the read/write head. On the average, this takes about the time for half a revolution of the disk, but it actually ranges from immediate access (if the start of the required block is in position under the read/write head right after the seek) to a full disk revolution (if the start of the required block just passed the read/write head after the seek). If the speed of disk rotation is p revolutions per minute (rpm), then the average rotational delay rd is given by

rd = (1/2)*(1/p) min = (60*1000)/(2*p) msec

A typical value for p is 10,000 rpm, which gives a rotational delay of rd = 3 msec. For fixed-head disks, where the seek time is negligible, this component causes the greatest delay in transferring a disk block.

3. Block transfer time (btt): Once the read/write head is at the beginning of the required block, some time is needed to transfer the data in the block. This block transfer time depends on the block size, the track size, and the rotational speed. If the transfer rate for the disk is tr bytes/msec and the block size is B bytes, then

btt = B/tr msec

If we have a track size of 50 Kbytes and p is 3600 rpm, the transfer rate in bytes/msec is

tr = (50*1000)/(60*1000/3600) = 3000 bytes/msec

In this case, btt = B/3000 msec, where B is the block size in bytes.
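The three formulas above translate directly into code. The following is a small sketch that reuses the symbols p, tr, and B from the text; the function names are illustrative choices, not part of any library.

```python
def rotational_delay_msec(p_rpm):
    """Average rotational delay rd = (60*1000)/(2*p) msec for p rpm."""
    return (60 * 1000) / (2 * p_rpm)


def transfer_rate_bytes_per_msec(track_size_bytes, p_rpm):
    """tr = track size divided by the time of one revolution in msec."""
    revolution_msec = (60 * 1000) / p_rpm
    return track_size_bytes / revolution_msec


def block_transfer_time_msec(block_size_bytes, tr):
    """btt = B/tr msec."""
    return block_size_bytes / tr


# The worked values from the text: p = 10,000 rpm, and a 50-Kbyte track at 3600 rpm.
print(rotational_delay_msec(10_000))
print(transfer_rate_bytes_per_msec(50 * 1000, 3600))
```

With the text's figures these evaluate to a delay of 3 msec and a transfer rate of about 3000 bytes/msec (floating-point arithmetic may show a tiny rounding difference in the second value).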

The average time needed to find and transfer a block, given its block address, is estimated by

(s + rd + btt) msec

This holds for either reading or writing a block. The principal method of reducing this time is to transfer several blocks that are stored on one or more tracks of the same cylinder; then the seek time is required only for the first block. To transfer consecutively k noncontiguous blocks that are on the same cylinder, we need approximately

s + (k * (rd + btt)) msec

In this case, we need two or more buffers in main storage, because we are continuously reading or writing the k blocks, as we discussed in Section 4.3. The transfer time per block is reduced even further when consecutive blocks on the same track or cylinder are transferred. This eliminates the rotational delay for all but the first block, so the estimate for transferring k consecutive blocks is

s + rd + (k * btt) msec
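To see how much the placement of blocks matters, the three estimates can be compared side by side. This sketch uses the text's symbols s, rd, btt, and k; the function names and the sample parameter values (s = 10 msec, rd = 3 msec, btt = 1 msec, k = 20) are assumptions chosen only for illustration.

```python
def random_block_time(s, rd, btt):
    """One block at an arbitrary address: s + rd + btt."""
    return s + rd + btt


def same_cylinder_noncontiguous_time(s, rd, btt, k):
    """k noncontiguous blocks on one cylinder: s + k*(rd + btt)."""
    return s + k * (rd + btt)


def consecutive_blocks_time(s, rd, btt, k):
    """k consecutive blocks on the same track or cylinder: s + rd + k*btt."""
    return s + rd + k * btt


# Assumed sample values in msec; not taken from any particular disk.
s, rd, btt, k = 10, 3, 1, 20
print(k * random_block_time(s, rd, btt))            # k independent random accesses
print(same_cylinder_noncontiguous_time(s, rd, btt, k))
print(consecutive_blocks_time(s, rd, btt, k))
```

Under these assumed values, the seek is paid once in the second case and both the seek and the rotational delay are paid once in the third, which is why clustering related blocks is the principal way to cut access time.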

A more accurate estimate for transferring consecutive blocks takes into account the interblock gap (see Section 5.2.1), which includes the information that enables the read/write head to determine which block it is about to read. Usually, the disk manufacturer provides a bulk transfer rate (btr) that takes the gap size into account when reading consecutively stored blocks. If the gap size is G bytes, then

btr = (B/(B + G)) * tr bytes/msec

The bulk transfer rate is the rate of transferring useful bytes in the data blocks. The disk read/write head must go over all bytes on a track as the disk rotates, including the bytes in the interblock gaps, which store control information but not real data. When the bulk transfer rate is used, the time needed to transfer the useful data in one block out of several consecutive blocks is B/btr. Hence, the estimated time to read k blocks consecutively stored on the same cylinder becomes

s + rd + (k * (B/btr)) msec
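The bulk transfer rate and the resulting read-time estimate can be sketched in the same way. The symbols B, G, tr, btr, s, rd, and k follow the text; the function names and the sample values (B = 512 bytes, G = 128 bytes, s = 10 msec, rd = 3 msec, k = 20) are assumptions for illustration, with tr = 3000 bytes/msec taken from the earlier worked example.

```python
def bulk_transfer_rate(B, G, tr):
    """btr = (B / (B + G)) * tr, the rate of moving useful bytes only."""
    return (B / (B + G)) * tr


def consecutive_read_time(s, rd, k, B, btr):
    """Estimated time for k consecutive blocks: s + rd + k * (B / btr)."""
    return s + rd + k * (B / btr)


# Assumed block and gap sizes in bytes; tr from the 50-Kbyte, 3600-rpm example.
B, G, tr = 512, 128, 3000
btr = bulk_transfer_rate(B, G, tr)
print(btr)
print(consecutive_read_time(10, 3, 20, B, btr))
```

Since B/btr equals (B + G)/tr, the estimate simply charges each block for the time the head spends passing over its gap as well as its data.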

Another parameter of disks is the rewrite time. This is useful in cases when we read a block from the disk into a main memory buffer, update the buffer, and then write the buffer back to the same disk block on which it was stored. In many cases, the time required to update the buffer in main memory is less than the time required for one disk revolution. If we know that the buffer is ready for rewriting, the system can keep the disk heads on the same track, and during the next disk revolution the updated buffer is rewritten back to the disk block. Hence, the rewrite time Trw is usually estimated to be the time needed for one disk revolution:

Trw = 2 * rd msec

To summarize, here is a list of the parameters we have discussed and the symbols we

use for them:

seek time: s msec

rotational delay: rd msec

block transfer time: btt msec

rewrite time: Trw msec

transfer rate: tr bytes/msec

bulk transfer rate: btr bytes/msec

block size: B bytes

interblock gap size: G bytes

Appendix D: Overview of the QBE Language

The Query-By-Example (QBE) language is important because it is one of the first graphical query languages with minimum syntax developed for database systems. It was developed at IBM Research and is available as an IBM commercial product as part of the QMF (Query Management Facility) interface option to DB2. The language was also implemented in the PARADOX DBMS, and is related to a point-and-click type interface in the ACCESS DBMS (see Chapter 10). It differs from SQL in that the user does not have to specify a structured query explicitly; rather, the query is formulated by filling in templates of relations that are displayed on a monitor screen. Figure 9.5 shows how these templates may look for the database of Figure 7.6. The user does not have to remember the names of attributes or relations, because they are displayed as part of these templates. In addition, the user does not have to follow any rigid syntax rules for query specification; rather, constants and variables are entered in the columns of the templates to construct an example related to the retrieval or update request. QBE is related to the domain relational calculus, as we shall see, and its original specification has been shown to be relationally complete.

In QBE, retrieval queries are specified by filling in one or more rows in the templates of the tables. For a single relation query, we enter either constants or example elements (a QBE term) in the columns of the template of that relation. An example element stands for a domain variable and is specified as an example value preceded by the underscore character (_). Additionally, a P. prefix (called the P dot operator) is entered in certain columns to indicate that we would like to print (or display) values in those columns for our result. The constants specify values that must be exactly matched in those columns.

FIGURE D.1 The relational schema of Figure 7.6 as it may be displayed by QBE.

For example, consider the query Q0: "Retrieve the birthdate and address of John B. Smith." We show in Figures 9.6(a) through 9.6(d) how this query can be specified in a progressively more terse form in QBE. In Figure 9.6(a) an example of an employee is presented as the type of row that we are interested in. By leaving John B. Smith as constants in the FNAME, MINIT, and LNAME columns, we are specifying an exact match in those columns. All the rest of the columns are preceded by an underscore indicating that they are domain variables (example elements). The P. prefix is placed in the BDATE and ADDRESS columns to indicate that we would like to output value(s) in those columns.

Q0 can be abbreviated as shown in Figure 9.6(b). There is no need to specify example values for columns in which we are not interested. Moreover, because example values are completely arbitrary, we can just specify variable names for them, as shown in Figure 9.6(c). Finally, we can also leave out the example values entirely, as shown in Figure 9.6(d), and just specify a P. under the columns to be retrieved.

To see how retrieval queries in QBE are similar to the domain relational calculus, compare Figure 9.6(d) with Q0 (simplified) in domain calculus, which is as follows:

Q0 : {uv | EMPLOYEE(qrstuvwxyz) and q='John' and r='B' and s='Smith'}

We can think of each column in a QBE template as an implicit domain variable; hence, FNAME corresponds to the domain variable q, MINIT corresponds to r, ..., and DNO corresponds to z. In the QBE query, the columns with P. correspond to variables specified to the left of the bar in domain calculus, whereas the columns with constant values correspond to domain variables with equality selection conditions on them. The condition EMPLOYEE(qrstuvwxyz) and the existential quantifiers are implicit in the QBE query because the template corresponding to the EMPLOYEE relation is used.
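The correspondence between the template columns and the domain variables can be made concrete with a small sketch. The Python fragment below is only an illustration, not part of QBE: it evaluates the calculus expression for Q0 as a set comprehension over tuples. The sample rows are hypothetical, and the positions assumed for t, w, x, and y (SSN, SEX, SALARY, SUPERSSN) are an assumption; only q, r, s, u, v, and z are fixed by the discussion above.

```python
# Hypothetical sample rows. The ten positions follow the variables q..z:
# q=FNAME, r=MINIT, s=LNAME, u=BDATE, v=ADDRESS, z=DNO come from the text;
# t=SSN, w=SEX, x=SALARY, y=SUPERSSN are assumed for illustration.
EMPLOYEE = [
    ("John", "B", "Smith", "123456789", "1965-01-09",
     "731 Fondren, Houston, TX", "M", 30000, "333445555", 5),
    ("Franklin", "T", "Wong", "333445555", "1955-12-08",
     "638 Voss, Houston, TX", "M", 40000, "888665555", 5),
]


def q0(employee):
    """Evaluate {uv | EMPLOYEE(qrstuvwxyz) and q='John' and r='B' and s='Smith'}."""
    return {
        (u, v)                                   # the P. columns (left of the bar)
        for (q, r, s, t, u, v, w, x, y, z) in employee   # EMPLOYEE(qrstuvwxyz)
        if q == "John" and r == "B" and s == "Smith"     # constants in the template
    }


if __name__ == "__main__":
    print(q0(EMPLOYEE))
```

Unpacking each row into ten names plays the role of the implicit existential quantifiers: every column gets a variable, but only u and v are returned.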

In QBE, the user interface first allows the user to choose the tables (relations) needed to formulate a query by displaying a list of all relation names. The templates for the chosen relations are then displayed. The user moves to the appropriate columns in the templates and specifies the query. Special function keys were provided to move among templates and perform certain functions.

We now give examples to illustrate basic facilities of QBE. Comparison operators other than = (such as > or ≥) may be entered in a column before typing a constant value. For example, the query Q0A: "List the social security numbers of employees who work more than 20 hours per week on project number 1," can be specified as shown in Figure 9.7(a). For more complex conditions, the user can ask for a condition box, which is created by pressing a particular function key. The user can then type the complex condition.¹ For example, the query Q0B, "List the social security numbers of employees who work more than 20 hours per week on either project 1 or project 2," can be specified as shown in Figure 9.7(b).

Some complex conditions can be specified without a condition box. The rule is that all conditions specified on the same row of a relation template are connected by the and logical connective (all must be satisfied by a selected tuple), whereas conditions specified on distinct rows are connected by or (at least one must be satisfied). Hence, Q0B can also be specified, as shown in Figure 9.7(c), by entering two distinct rows in the template.
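The row rule can be checked against SQL counterparts of Q0A and Q0B. The following is a minimal sketch using Python's sqlite3 module; the WORKS_ON column names PNO and HOURS and all sample rows are assumptions made for illustration (only ESSN is named in the text). Conditions on one template row become a conjunction in a single WHERE clause, and two template rows become a UNION of two such conjunctions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout of WORKS_ON; the rows are hypothetical.
conn.execute("CREATE TABLE WORKS_ON (ESSN TEXT, PNO INTEGER, HOURS REAL)")
conn.executemany(
    "INSERT INTO WORKS_ON VALUES (?, ?, ?)",
    [("111", 1, 32.5), ("111", 2, 7.5), ("222", 2, 25.0),
     ("333", 3, 40.0), ("444", 1, 10.0)],
)

# Q0A: both conditions sit on one template row, so they are ANDed.
q0a = conn.execute(
    "SELECT DISTINCT ESSN FROM WORKS_ON "
    "WHERE PNO = 1 AND HOURS > 20 ORDER BY ESSN"
).fetchall()

# Q0B in condition-box style: one row plus a compound condition.
q0b_box = conn.execute(
    "SELECT DISTINCT ESSN FROM WORKS_ON "
    "WHERE HOURS > 20 AND (PNO = 1 OR PNO = 2) ORDER BY ESSN"
).fetchall()

# Q0B in two-row style: each row is a conjunction, and the rows are ORed,
# which corresponds to a UNION of the two single-row queries.
q0b_rows = conn.execute(
    "SELECT ESSN FROM WORKS_ON WHERE PNO = 1 AND HOURS > 20 "
    "UNION "
    "SELECT ESSN FROM WORKS_ON WHERE PNO = 2 AND HOURS > 20 "
    "ORDER BY ESSN"
).fetchall()

if __name__ == "__main__":
    print(q0a, q0b_box, q0b_rows)
```

On this sample data the two formulations of Q0B return the same employees, which is the equivalence the row rule asserts.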

Now consider query Q0C: "List the social security numbers of employees who work on both project 1 and project 2"; this cannot be specified as in Figure 9.8(a), which lists those who work on either project 1 or project 2. The example variable _ES will bind itself to ESSN values in <-, 1, -> tuples as well as to those in <-, 2, -> tuples. Figure 9.8(b) shows how Q0C can be specified correctly.

1. Negation with the ¬ symbol is not allowed in a condition box.

FIGURE D.5 Illustrating JOIN and result relations in QBE. (a) The query Q1. (b) The query Q8.

A join operation is specified in QBE by using the same variable² in the columns to be joined. For example, the query Q1: "List the name and address of all employees who work for the 'Research' department," can be specified as shown in Figure 9.9(a). Any number of joins can be specified in a single query. We can also specify a result table to display the result of the join query, as shown in Figure 9.9(a); this is needed if the result includes attributes from two or more relations. If no result table is specified, the system provides the query result in the columns of the various relations, which may make it difficult to interpret. Figure 9.9(a) also illustrates the feature of QBE for specifying that all attributes of a relation should be retrieved, by placing the P. operator under the relation name in the relation template.

To join a table with itself, we specify different variables to represent the different references to the table. For example, query Q8, "For each employee retrieve the employee's first and last name as well as the first and last name of his or her immediate supervisor," can be specified as shown in Figure 9.9(b), where the variables starting with E refer to an employee and those starting with S refer to a supervisor.
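For readers who think in SQL, a shared example variable behaves like an equality join condition, and using two sets of variables on the same template behaves like two aliases of one table. The sketch below runs SQL counterparts of Q1 and Q8 through Python's sqlite3 module. The column names DNAME, DNUMBER, SSN, and SUPERSSN and all sample rows are assumptions for illustration; they are not taken from the figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, cut-down versions of the two relations; rows are hypothetical.
conn.executescript("""
CREATE TABLE DEPARTMENT (DNAME TEXT, DNUMBER INTEGER PRIMARY KEY);
CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN TEXT PRIMARY KEY,
                       ADDRESS TEXT, SUPERSSN TEXT, DNO INTEGER);
INSERT INTO DEPARTMENT VALUES ('Research', 5), ('Headquarters', 1);
INSERT INTO EMPLOYEE VALUES
  ('John', 'Smith', '111', '731 Fondren', '222', 5),
  ('Franklin', 'Wong', '222', '638 Voss', '333', 5),
  ('James', 'Borg', '333', '450 Stone', NULL, 1);
""")

# Q1: the same variable in EMPLOYEE.DNO and DEPARTMENT.DNUMBER is a join.
q1 = conn.execute("""
    SELECT E.FNAME, E.LNAME, E.ADDRESS
    FROM EMPLOYEE AS E JOIN DEPARTMENT AS D ON E.DNO = D.DNUMBER
    WHERE D.DNAME = 'Research'
    ORDER BY E.LNAME
""").fetchall()

# Q8: E-variables and S-variables are two references to EMPLOYEE,
# linked by the variable shared between SUPERSSN and SSN.
q8 = conn.execute("""
    SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
    FROM EMPLOYEE AS E JOIN EMPLOYEE AS S ON E.SUPERSSN = S.SSN
    ORDER BY E.LNAME
""").fetchall()

if __name__ == "__main__":
    print(q1)
    print(q8)
```

The SELECT list plays the role of the result table: it gathers attributes from both references into one relation, so the output does not have to be read from the separate templates.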

D.2 GROUPING, AGGREGATION, AND DATABASE MODIFICATION IN QBE

Next, consider the types of queries that require grouping or aggregate functions. A grouping operator G. can be specified in a column to indicate that tuples should be grouped by the value of that column. Common functions can be specified, such as AVG., SUM., CNT. (count), MAX., and MIN. In QBE the functions AVG., SUM., and CNT. are applied to distinct values within a group in the default case. If we want these functions to apply to all values, we must use the prefix ALL. This convention is different in SQL, where the default is to apply a function to all values.

2. A variable is called an example element in QBE manuals.

Figure 9.10(a) shows query Q23, which counts the number of distinct salary values in the EMPLOYEE relation. Query Q23A (Figure 9.10(b)) counts all salary values, which is the same as counting the number of employees. Figure 9.10(c) shows Q24, which retrieves each department number and the number of employees and average salary within each department; hence, the DNO column is used for grouping, as indicated by the G. function. Several of the operators G., P., and ALL. can be specified in a single column. Figure 9.10(d) shows query Q26, which displays each project name and the number of employees working on it for projects on which more than two employees work.
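Because the QBE and SQL defaults differ, it can help to see the two counts side by side. The sketch below gives SQL counterparts of Q23, Q23A, and Q24 through Python's sqlite3 module; the cut-down EMPLOYEE table and its rows are hypothetical. In SQL the distinct count must be requested explicitly with DISTINCT, whereas in QBE it is the all-values count that needs the ALL. prefix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, cut-down EMPLOYEE relation; rows are hypothetical.
conn.execute("CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, SALARY REAL, DNO INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("111", 30000, 5), ("222", 40000, 5), ("333", 30000, 4), ("444", 55000, 1)],
)

# Q23: CNT. in QBE counts distinct values by default; SQL needs DISTINCT.
q23 = conn.execute("SELECT COUNT(DISTINCT SALARY) FROM EMPLOYEE").fetchone()[0]

# Q23A: CNT.ALL. in QBE counts every value; this is the SQL default.
q23a = conn.execute("SELECT COUNT(SALARY) FROM EMPLOYEE").fetchone()[0]

# Q24: G. on DNO corresponds to GROUP BY DNO.
q24 = conn.execute(
    "SELECT DNO, COUNT(*), AVG(SALARY) FROM EMPLOYEE GROUP BY DNO ORDER BY DNO"
).fetchall()

if __name__ == "__main__":
    print(q23, q23a)
    print(q24)
```

Two employees share a salary in the sample data, so the distinct count is smaller than the number of employees.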

QBE has a negation symbol, ¬, which is used in a manner similar to the NOT EXISTS function in SQL. Figure 9.11 shows query Q6, which lists the names of employees who have no dependents. The negation symbol ¬ says that we will select values of the _SX variable from the EMPLOYEE relation only if they do not occur in the DEPENDENT relation. The same effect can be produced by placing a ¬ _SX in the ESSN column.
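Since the text compares ¬ with NOT EXISTS, a SQL counterpart of Q6 may make the comparison concrete. This is a minimal sketch using Python's sqlite3 module; the DEPENDENT_NAME column, the cut-down EMPLOYEE table, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, cut-down relations; rows are hypothetical.
conn.executescript("""
CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN TEXT PRIMARY KEY);
CREATE TABLE DEPENDENT (ESSN TEXT, DEPENDENT_NAME TEXT);
INSERT INTO EMPLOYEE VALUES ('John', 'Smith', '111'), ('Joyce', 'English', '222');
INSERT INTO DEPENDENT VALUES ('111', 'Alice');
""")

# Q6: keep an employee only if no DEPENDENT tuple carries the same SSN value,
# which is the role the negated row plays in the QBE template.
q6 = conn.execute("""
    SELECT E.FNAME, E.LNAME
    FROM EMPLOYEE AS E
    WHERE NOT EXISTS (SELECT 1 FROM DEPENDENT AS D WHERE D.ESSN = E.SSN)
    ORDER BY E.LNAME
""").fetchall()

if __name__ == "__main__":
    print(q6)
```

The correlated subquery shares E.SSN with the outer query in the same way that the negated DEPENDENT row shares the _SX variable with the EMPLOYEE row.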

FIGURE D.7 Illustrating negation by the query Q6.

Although the QBE language as originally proposed was shown to support the equivalent of the EXISTS and NOT EXISTS functions of SQL, the QBE implementation in QMF (under the DB2 system) does not provide this support. Hence, the QMF version of QBE, which we discuss here, is not relationally complete. Queries such as Q3, "Find employees who work on all projects controlled by department 5," cannot be specified.

There are three QBE operators for modifying the database: I. for insert, D. for delete, and U. for update. The insert and delete operators are specified in the template column under the relation name, whereas the update operator is specified under the columns to be updated. Figure 9.12(a) shows how to insert a new EMPLOYEE tuple. For deletion, we first enter the D. operator and then specify the tuples to be deleted by a condition (Figure 9.12(b)). To update a tuple, we specify the U. operator under the attribute name, followed by the new value of the attribute. We should also select the tuple or tuples to be updated in the usual way. Figure 9.12(c) shows an update request to increase the salary of 'John Smith' by 10 percent and also to reassign him to department number 4.
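The three operators map naturally onto the SQL INSERT, DELETE, and UPDATE statements. The sketch below shows one possible SQL counterpart of each through Python's sqlite3 module. The cut-down EMPLOYEE table, the inserted and deleted tuples, and the SSN values are all hypothetical; only the update (a 10 percent raise for John Smith and a move to department 4) follows the request described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, cut-down EMPLOYEE relation; rows are hypothetical.
conn.execute(
    "CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN TEXT PRIMARY KEY, "
    "SALARY REAL, DNO INTEGER)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)",
    [("John", "Smith", "111", 30000, 5), ("Alicia", "Zelaya", "222", 25000, 4)],
)

# I. under the relation name: insert one new tuple (hypothetical values).
conn.execute("INSERT INTO EMPLOYEE VALUES ('Pat', 'Example', '333', 37000, 4)")

# D. under the relation name, with a condition selecting the tuples to delete.
conn.execute("DELETE FROM EMPLOYEE WHERE SSN = '222'")

# U. under SALARY and DNO; the name constants select the tuple to update.
conn.execute(
    "UPDATE EMPLOYEE SET SALARY = SALARY * 1.1, DNO = 4 "
    "WHERE FNAME = 'John' AND LNAME = 'Smith'"
)

rows = conn.execute(
    "SELECT SSN, SALARY, DNO FROM EMPLOYEE ORDER BY SSN"
).fetchall()

if __name__ == "__main__":
    for row in rows:
        print(row)
```

As in the template, the update names the selected tuple and the new values in one request; the WHERE clause corresponds to the constants entered in the non-updated columns.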

QBE also has data definition capabilities. The tables of a database can be specified interactively, and a table definition can also be updated by adding, renaming, or removing a column. We can also specify various characteristics for each column, such as whether it is a key of the relation, what its data type is, and whether an index should be created on that field. QBE also has facilities for view definition, authorization, storing query definitions for later use, and so on.

QBE does not use the "linear" style of SQL; rather, it is a "two-dimensional" language, because users specify a query moving around the full area of the screen. Tests on users have shown that QBE is easier to learn than SQL, especially for nonspecialists. In this sense, QBE was the first user-friendly "visual" relational database language.

FIGURE D.8 Modifying the database in QBE. (a) Insertion. (b) Deletion. (c) Update in QBE.

More recently, numerous other user-friendly interfaces have been developed for commercial database systems. The use of menus, graphics, and forms is now becoming quite common. Visual query languages, which are still not so common, are likely to be offered with commercial relational databases in the future.


