Database Management Systems, Part 10


Additional Topics

developing transactions with the ACID properties. In addition to providing a uniform interface to the services of different resource managers, a TP monitor also routes transactions to the appropriate resource managers. Finally, a TP monitor ensures that an application behaves as a transaction by implementing concurrency control, logging, and recovery functions, and by exploiting the transaction processing capabilities of the underlying resource managers.

TP monitors are used in environments where applications require advanced features such as access to multiple resource managers; sophisticated request routing (also called workflow management); assigning priorities to transactions and doing priority-based load-balancing across servers; and so on. A DBMS provides many of the functions supported by a TP monitor in addition to processing queries and database updates efficiently. A DBMS is appropriate for environments where the wealth of transaction management capabilities provided by a TP monitor is not necessary and, in particular, where very high scalability (with respect to transaction processing activity) and interoperability are not essential.

The transaction processing capabilities of database systems are improving continually. For example, many vendors offer distributed DBMS products today in which a transaction can execute across several resource managers, each of which is a DBMS. Currently, all the DBMSs must be from the same vendor; however, as transaction-oriented services from different vendors become more standardized, distributed, heterogeneous DBMSs should become available. Eventually, perhaps, the functions of current TP monitors will also be available in many DBMSs; for now, TP monitors provide essential infrastructure for high-end transaction processing environments.

28.1.2 New Transaction Models

Consider an application such as computer-aided design, in which users retrieve large design objects from a database and interactively analyze and modify them. Each transaction takes a long time (minutes or even hours, whereas the TPC benchmark transactions take under a millisecond), and holding locks this long affects performance. Further, if a crash occurs, undoing an active transaction completely is unsatisfactory, since considerable user effort may be lost. Ideally we want to be able to restore most of the actions of an active transaction and resume execution. Finally, if several users are concurrently developing a design, they may want to see changes being made by others without waiting until the end of the transaction that changes the data.

To address the needs of long-duration activities, several refinements of the transaction concept have been proposed. The basic idea is to treat each transaction as a collection of related subtransactions. Subtransactions can acquire locks, and the changes made by a subtransaction become visible to other transactions after the subtransaction ends (and before the main transaction of which it is a part commits). In multilevel transactions, locks held by a subtransaction are released when the subtransaction ends. In nested transactions, locks held by a subtransaction are assigned to the parent (sub)transaction when the subtransaction ends. These refinements to the transaction concept have a significant effect on concurrency control and recovery algorithms.
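The difference between the two models in how locks are handled when a subtransaction ends can be sketched in a few lines of Python. This is only an illustrative sketch: the Txn class, the end_subtransaction function, and the item names are hypothetical, and a real lock manager would also track lock modes and waiting transactions.

```python
class Txn:
    """A (sub)transaction that records the items it has locked."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.locks = set()

    def acquire(self, item):
        self.locks.add(item)


def end_subtransaction(sub, model):
    """Dispose of sub's locks according to the transaction model.

    model == "multilevel": the locks are released outright.
    model == "nested": the locks are handed to the parent (sub)transaction.
    Returns the set of items that became free for other transactions.
    """
    held, sub.locks = sub.locks, set()
    if model == "multilevel":
        return held
    if model == "nested":
        sub.parent.locks |= held
        return set()
    raise ValueError(f"unknown model: {model}")
```

In the multilevel case the returned set is non-empty, so other transactions can lock those items before the main transaction commits; in the nested case nothing is freed until the parent itself ends.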

be placed. A soft deadline means the value of the transaction decreases after the deadline, eventually going to zero. For example, in a DBMS designed to monitor some activity (e.g., a complex reactor), a transaction that looks up the current reading of a sensor must be executed within a short time, say, one second. The longer it takes to execute the transaction, the less useful the reading becomes. In a real-time DBMS, the goal is to maximize the value of executed transactions, and the DBMS must prioritize transactions, taking their deadlines into account.
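A soft deadline can be modeled as a value function of the completion time. The sketch below assumes a linear decline over a hypothetical grace period and uses earliest-deadline-first ordering as one simple way to prioritize; both choices are illustrative assumptions, not something the text prescribes.

```python
def soft_deadline_value(finish, deadline, full_value=1.0, grace=1.0):
    """Value of a transaction that finishes at time `finish` (in seconds).

    Full value up to the deadline, then a linear drop to zero over `grace`
    seconds. The linear shape is an assumption made for illustration.
    """
    if finish <= deadline:
        return full_value
    late = finish - deadline
    return max(0.0, full_value * (1.0 - late / grace))


def earliest_deadline_first(txns):
    """Order (name, deadline) pairs by deadline; one simple priority policy."""
    return sorted(txns, key=lambda t: t[1])
```

For the sensor example, a reading delivered half a second after a one-second deadline would retain half its value under these assumptions, and nothing once the grace period has passed.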

28.2 INTEGRATED ACCESS TO MULTIPLE DATA SOURCES

As databases proliferate, users want to access data from more than one source. For example, if several travel agents market their travel packages through the Web, customers would like to look at packages from different agents and compare them. A more traditional example is that large organizations typically have several databases, created (and maintained) by different divisions such as Sales, Production, and Purchasing. While these databases contain much common information, determining the exact relationship between tables in different databases can be a complicated problem. For example, prices in one database might be in dollars per dozen items, while prices in another database might be in dollars per item. The development of XML DTDs (see Section 22.3.3) offers the promise that such semantic mismatches can be avoided if all parties conform to a single standard DTD. However, there are many legacy databases and most domains still do not have agreed-upon DTDs; the problem of semantic mismatches will be frequently encountered for the foreseeable future.

Semantic mismatches can be resolved and hidden from users by defining relational views over the tables from the two databases. Defining a collection of views to give a group of users a uniform presentation of relevant data from multiple databases is called semantic integration. Creating views that mask semantic mismatches in a natural manner is a difficult task and has been widely studied. In practice, the task is made harder by the fact that the schemas of existing databases are often poorly


in Chapter 5. Alternatively, the integrating views can be materialized and stored in a data warehouse, as discussed in Chapter 23. Queries can then be executed over the warehoused data without accessing the source DBMSs at run-time.
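The dollars-per-dozen versus dollars-per-item mismatch mentioned in this section can be hidden behind a view. The following sketch uses Python's sqlite3 module with a single in-memory database standing in for two separate source databases; the table names, column names, and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables standing in for two divisional databases.
    CREATE TABLE sales_prices (item TEXT, dollars_per_dozen REAL);
    CREATE TABLE purchasing_prices (item TEXT, dollars_per_item REAL);
    INSERT INTO sales_prices VALUES ('horseshoe', 24.0);
    INSERT INTO purchasing_prices VALUES ('bridle', 15.0);

    -- The integrating view converts every price to dollars per item.
    CREATE VIEW unified_prices AS
        SELECT item, dollars_per_dozen / 12.0 AS dollars_per_item,
               'sales' AS source
        FROM sales_prices
        UNION ALL
        SELECT item, dollars_per_item, 'purchasing' AS source
        FROM purchasing_prices;
""")

# Users query the view and never see the per-dozen representation.
rows = conn.execute(
    "SELECT item, dollars_per_item, source FROM unified_prices ORDER BY item"
).fetchall()
```

Materializing such a view in a warehouse would amount to storing the result of this query rather than recomputing it against the sources each time.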

28.3 MOBILE DATABASES

The availability of portable computers and wireless communications has created a new breed of nomadic database users. At one level these users are simply accessing a database through a network, which is similar to distributed DBMSs. At another level the network as well as data and user characteristics now have several novel properties, which affect basic assumptions in many components of a DBMS, including the query engine, transaction manager, and recovery manager:

Users are connected through a wireless link whose bandwidth is ten times less than Ethernet and 100 times less than ATM networks. Communication costs are therefore significantly higher in proportion to I/O and CPU costs.

Users' locations are constantly changing, and mobile computers have a limited battery life. Therefore, the true communication costs reflect connection time and battery usage in addition to bytes transferred, and change constantly depending on location. Data is frequently replicated to minimize the cost of accessing it from different locations.

As a user moves around, data could be accessed from multiple database servers within a single transaction. The likelihood of losing connections is also much greater than in a traditional network. Centralized transaction management may therefore be impractical, especially if some data is resident at the mobile computers. We may in fact have to give up on ACID transactions and develop alternative notions of consistency for user programs.

The price of main memory is now low enough that we can buy enough main memory to hold the entire database for many applications; with 64-bit addressing, modern CPUs also have very large address spaces. Some commercial systems now have several gigabytes of main memory. This shift prompts a reexamination of some basic DBMS


a bottleneck. To minimize this problem, rather than commit each transaction as it completes, we can collect completed transactions and commit them in batches; this is called group commit. Recovery algorithms can also be optimized since pages rarely have to be written out to make room for other pages.

The implementation of in-memory operations has to be optimized carefully since disk accesses are no longer the limiting factor for performance.

A new criterion must be considered while optimizing queries, namely the amount of space required to execute a plan. It is important to minimize the space overhead because exceeding available physical memory would lead to swapping pages to disk (through the operating system's virtual memory mechanisms), greatly slowing down execution.

Page-oriented data structures become less important (since pages are no longer the unit of data retrieval), and clustering is not important (since the cost of accessing any region of main memory is uniform).
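Group commit, mentioned above, can be illustrated with a small buffer that forces commit records to stable storage only once a batch has accumulated. This is a minimal sketch: the GroupCommitLog class, its batch_size parameter, and the force callback are hypothetical names, and a real system would also flush on a timer so that a lone transaction is not delayed indefinitely.

```python
class GroupCommitLog:
    """Buffers commit records and forces them to stable storage in batches."""

    def __init__(self, batch_size, force):
        self.batch_size = batch_size
        self.force = force      # callable that writes a list of records
        self.pending = []

    def commit(self, txn_id):
        """Queue a commit record; flush once the batch is full."""
        self.pending.append(txn_id)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write all pending commit records with a single forced write."""
        if self.pending:
            self.force(list(self.pending))
            self.pending.clear()


# Seven commits with a batch size of three need only three forced writes.
writes = []
log = GroupCommitLog(batch_size=3, force=writes.append)
for txn_id in range(7):
    log.commit(txn_id)
log.flush()
```

The trade-off is that a transaction is not durable until its batch has been flushed, so its completion cannot be acknowledged before then.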

In an object-relational DBMS, users can define ADTs with appropriate methods, which is an improvement over an RDBMS. Nonetheless, supporting just ADTs falls short of what is required to deal with very large collections of multimedia objects, including audio, images, free text, text marked up in HTML or variants, sequence data, and videos. Illustrative applications include NASA's EOS project, which aims to create a repository of satellite imagery, the Human Genome project, which is creating databases of genetic information such as GenBank, and NSF/DARPA's Digital Libraries project, which aims to put entire libraries into database systems and then make them accessible through computer networks. Industrial applications such as collaborative development of engineering designs also require multimedia database management, and are being addressed by several vendors.

We outline some applications and challenges in this area:

Content-based retrieval: Users must be able to specify selection conditions based on the contents of multimedia objects. For example, users may search for images using queries such as: “Find all images that are similar to this image” and “Find all images that contain at least three airplanes.” As images are inserted into the database, the DBMS must analyze them and automatically extract features that will help answer such content-based queries. This information can then be used to search for images that satisfy a given query, as discussed in Chapter 26. As another example, users would like to search for documents of interest using information retrieval techniques and keyword searches. Vendors are moving towards incorporating such techniques into DBMS products. It is still not clear how these domain-specific retrieval and search techniques can be combined effectively with traditional DBMS queries. Research into abstract data types and ORDBMS query processing has provided a starting point, but more work is needed.

Managing repositories of large objects: Traditionally, DBMSs have concentrated on tables that contain a large number of tuples, each of which is relatively small. Once multimedia objects such as images, sound clips, and videos are stored in a database, individual objects of very large size have to be handled efficiently. For example, compression techniques must be carefully integrated into the DBMS environment. As another example, distributed DBMSs must develop techniques to efficiently retrieve such objects. Retrieval of multimedia objects in a distributed system has been addressed in limited contexts, such as client-server systems, but in general remains a difficult problem.

Video-on-demand: Many companies want to provide video-on-demand services that enable users to dial into a server and request a particular video. The video must then be delivered to the user's computer in real time, reliably and inexpensively. Ideally, users must be able to perform familiar VCR functions such as fast-forward and reverse. From a database perspective, the server has to contend with specialized real-time constraints; video delivery rates must be synchronized at the server and at the client, taking into account the characteristics of the communication network.

Geographic Information Systems (GIS) contain spatial information about cities, states, countries, streets, highways, lakes, rivers, and other geographical features, and support applications to combine such spatial information with non-spatial data. As discussed in Chapter 26, spatial data is stored in either raster or vector formats. In addition, there is often a temporal dimension, as when we measure rainfall at several locations over time. An important issue with spatial data sets is how to integrate data from multiple sources, since each source may record data using a different coordinate system to identify locations.

Now let us consider how spatial data in a GIS is analyzed. Spatial information is most naturally thought of as being overlaid on maps. Typical queries include “What cities lie on I-94 between Madison and Chicago?” and “What is the shortest route from Madison to St. Louis?” These kinds of queries can be addressed using the techniques

In addition, many applications involve interpolating measurements at certain locations across an entire region to obtain a model, and combining overlapping models. For example, if we have measured rainfall at certain locations, we can use the TIN approach to triangulate the region with the locations at which we have measurements being the vertices of the triangles. Then, we use some form of interpolation to estimate the rainfall at points within triangles. Interpolation, triangulation, map overlays, visualizations of spatial data, and many other domain-specific operations are supported in GIS products such as ESRI Systems' ARC-Info. Thus, while spatial query processing techniques as discussed in Chapter 26 are an important part of a GIS product, considerable additional functionality must be incorporated as well. How best to extend ORDBMS systems with this additional functionality is an important problem yet to be resolved. Agreeing upon standards for data representation formats and coordinate systems is another major challenge facing the field.
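One common form of interpolation within a triangle is linear (barycentric) interpolation, in which each vertex measurement is weighted by how close the query point is to that vertex. The sketch below assumes this particular scheme, which the text does not mandate; the function name and argument layout are hypothetical.

```python
def interpolate_in_triangle(p, a, b, c):
    """Linearly interpolate a measurement at point p inside triangle a, b, c.

    p is an (x, y) pair; a, b, and c are (x, y, value) vertices, for example
    locations at which rainfall was measured.
    """
    (x, y), (x1, y1, v1), (x2, y2, v2), (x3, y3, v3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    # Barycentric weights: non-negative and summing to 1 inside the triangle.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        raise ValueError("point lies outside the triangle")
    return w1 * v1 + w2 * v2 + w3 * v3
```

At a vertex the estimate equals the measurement taken there, and along an edge it depends only on the two endpoints of that edge, so estimates agree across neighboring triangles of the triangulation.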

Currently available DBMSs provide little support for queries over ordered collections of records, or sequences, and over temporal data. Typical sequence queries include “Find the weekly moving average of the Dow Jones Industrial Average,” and “Find the first five consecutively increasing temperature readings” (from a trace of temperature observations). Such queries can be easily expressed and often efficiently executed by systems that support query languages designed for sequences. Some commercial SQL systems now support such SQL extensions.
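Both sequence queries depend on the order of the records, which is why they are awkward to express over unordered tables. Outside the DBMS, over an already ordered list, they can be sketched as follows; the function names are hypothetical, and the window length for a "weekly" average (five trading days or seven calendar days) is left as a parameter.

```python
def moving_average(values, window):
    """Trailing moving average; one output value per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def first_increasing_run(values, length):
    """Return the first `length` consecutive strictly increasing values, or None."""
    run_start = 0
    for i in range(len(values)):
        if i > 0 and values[i] <= values[i - 1]:
            run_start = i           # the increasing run is broken; start over
        if i - run_start + 1 == length:
            return values[run_start : i + 1]
    return None
```

Calling first_increasing_run(readings, 5) on a temperature trace corresponds to the second query quoted above.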

The first example is also a temporal query. However, temporal queries involve more than just record ordering. For example, consider the following query: “Find the longest interval in which the same person managed two different departments.” If the period during which a given person managed a department is indicated by two fields from and to, we have to reason about a collection of intervals, rather than a sequence of records. Further, temporal queries require the DBMS to be aware of the anomalies associated with calendars (such as leap years). Temporal extensions are likely to be incorporated in future versions of the SQL standard.
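The interval reasoning needed for the manager query can be sketched as a pairwise intersection of periods. This is an illustration only: the row layout (manager, dept, start, end) and the function name are hypothetical, the end dates are assumed to be inclusive, and each pair of rows is considered in isolation. Python's date type takes care of calendar anomalies such as leap years when two dates are subtracted.

```python
from datetime import date
from itertools import combinations


def longest_dual_management(rows):
    """rows: (manager, dept, start, end) tuples with inclusive dates.

    Returns (manager, start, end) for the longest period in which one person
    managed two different departments, or None if there is no such period.
    """
    best = None
    for (m1, d1, s1, e1), (m2, d2, s2, e2) in combinations(rows, 2):
        if m1 != m2 or d1 == d2:
            continue
        start, end = max(s1, s2), min(e1, e2)
        if start > end:
            continue                # the two periods do not overlap
        if best is None or (end - start) > (best[2] - best[1]):
            best = (m1, start, end)
    return best
```

Comparing all pairs is quadratic in the number of rows; it is meant to show the interval logic, not an efficient evaluation strategy.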

A distinct and important class of sequence data consists of DNA sequences, which are being generated at a rapid pace by the biological community. These are in fact closer to sequences of characters in text than to time sequences as in the above examples. The field of biological information management and analysis has become very popular in recent years, and is called bioinformatics. Biological data, such as DNA sequence data, is characterized by complex structure and numerous relationships among data elements, many overlapping and incomplete or erroneous data fragments (because experimentally collected data from several groups, often working on related problems, is stored in the databases), a need to frequently change the database schema itself as new kinds of relationships in the data are discovered, and the need to maintain several versions of data for archival and reference.

28.8 INFORMATION VISUALIZATION

As computers become faster and main memory becomes cheaper, it becomes increasingly feasible to create visual presentations of data, rather than just text-based reports. Data visualization makes it easier for users to understand the information in large complex datasets. The challenge here is to make it easy for users to develop visual presentations of their data and to interactively query such presentations. Although a number of data visualization tools are available, efficient visualization of large datasets presents many challenges.

The need for visualization is especially important in the context of decision support; when confronted with large quantities of high-dimensional data and various kinds of data summaries produced by using analysis tools such as SQL, OLAP, and data mining algorithms, the information can be overwhelming. Visualizing the data, together with the generated summaries, can be a powerful way to sift through this information and spot interesting trends or patterns. The human eye, after all, is very good at finding patterns. A good framework for data mining must combine analytic tools to process data, and bring out latent anomalies or trends, with a visualization environment in which a user can notice these patterns and interactively drill down to the original data for further analysis.

The database area continues to grow vigorously, both in terms of technology and in terms of applications. The fundamental reason for this growth is that the amount of information stored and processed using computers is growing rapidly. Regardless of the nature of the data and its intended applications, users need database management systems and their services (concurrent access, crash recovery, easy and efficient querying, etc.) as the volume of data increases. As the range of applications is broadened, however, some shortcomings of current DBMSs become serious limitations. These problems are being actively studied in the database research community.

The coverage in this book provides a good introduction, but is not intended to cover all aspects of database systems. Ample material is available for further study, as this

Determining which entities are the same across different databases is a difficult problem; it is an example of a semantic mismatch. Resolving such mismatches has been addressed in many papers, including [362, 412, 558, 576]. [329] is an overview of theoretical work in this area. Also see the bibliographic notes for Chapter 21 for references to related work on multidatabases, and see the notes for Chapter 2 for references to work on view integration.

[260] is an early paper on main memory databases. [345, 89] describe the Dali main memory storage manager. [359] surveys visualization idioms designed for large databases, and [291] discusses visualization for data mining.

Visualization systems for databases include DataSpace [515], DEVise [424], IVEE [23], the Mineset suite from SGI, Tioga [27], and VisDB [358]. In addition, a number of general tools are available for data visualization.

Querying text repositories has been studied extensively in information retrieval; see [545] for a recent survey. This topic has generated considerable interest in the database community recently because of the widespread use of the Web, which contains many text sources. In particular, HTML documents have some structure if we interpret links as edges in a graph. Such documents are examples of semistructured data; see [2] for a good overview. Recent papers on queries over the Web include [2, 384, 457, 493].

See [501] for a survey of multimedia issues in database management. There has been much recent interest in database issues in a mobile computing environment, for example, [327, 337]. See [334] for a collection of articles on this subject. [639] contains several articles that cover all aspects of temporal databases. The use of constraints in databases has been actively investigated in recent years; [356] is a good overview. Geographic Information Systems have also been studied extensively; [511] describes the Paradise system, which is notable for its scalability.

The book [695] contains detailed discussions of temporal databases (including the TSQL2 language, which is influencing the SQL standard), spatial and multimedia databases, and uncertainty in databases. Another SQL extension to query sequence data, called SRQL, is proposed in [532].


A DATABASE DESIGN CASE STUDY: THE INTERNET SHOP

Advice for software developers and horse racing enthusiasts: Avoid hacks.

—Anonymous

We now present an illustrative, ‘cradle-to-grave’ design example. DBDudes Inc., a well-known database consulting firm, has been called in to help Barns and Nobble (B&N) with their database design and implementation. B&N is a large bookstore specializing in books on horse racing, and they've decided to go online. DBDudes first verify that B&N is willing and able to pay their steep fees and then schedule a lunch meeting (billed to B&N, naturally) to do requirements analysis.

The owner of B&N has thought about what he wants and offers a concise summary:

“I would like my customers to be able to browse my catalog of books and to place orders over the Internet. Currently, I take orders over the phone. I have mostly corporate customers who call me and give me the ISBN number of a book and a quantity. I then prepare a shipment that contains the books they have ordered. If I don't have enough copies in stock, I order additional copies and delay the shipment until the new copies arrive; I want to ship a customer's entire order together. My catalog includes all the books that I sell. For each book, the catalog contains its ISBN number, title, author, purchase price, sales price, and the year the book was published. Most of my customers are regulars, and I have records with their name, address, and credit card number. New customers have to call me first and establish an account before they can use my Web site.

On my new Web site, customers should first identify themselves by their unique customer identification number. Then they should be able to browse my catalog and to place orders online.”

DBDudes's consultants are a little surprised by how quickly the requirements phase was completed (it usually takes them weeks of discussions, and many lunches and dinners, to get this done) but return to their offices to analyze this information.


In the conceptual design step, DBDudes develops a high-level description of the data in terms of the ER model. Their initial design is shown in Figure A.1. Books and customers are modeled as entities and are related through orders that customers place. Orders is a relationship set connecting the Books and Customers entity sets. For each order, the following attributes are stored: quantity, order date, and ship date. As soon as an order is shipped, the ship date is set; until then the ship date is set to null, indicating that this order has not been shipped yet.

DBDudes has an internal design review at this point, and several questions are raised. To protect their identities, we will refer to the design team leader as Dude 1 and the design reviewer as Dude 2:

Dude 2: What if a customer places two orders for the same book on the same day?

Dude 1: The first order is handled by creating a new Orders relationship and the second order is handled by updating the value of the quantity attribute in this relationship.

Dude 2: What if a customer places two orders for different books on the same day?

Dude 1: No problem. Each instance of the Orders relationship set relates the customer to a different book.

Dude 2: Ah, but what if a customer places two orders for the same book on different days?

Dude 1: We can use the attribute order_date of the Orders relationship to distinguish the two orders.

Dude 2: Oh no you can't. The attributes of Customers and Books must jointly contain a key for Orders. So this design does not allow a customer to place orders for the same book on different days.

Dude 1: Yikes, you're right. Oh well, B&N probably won't care; we'll see.

DBDudes decides to proceed with the next phase, logical database design.

Using the standard approach discussed in Chapter 3, DBDudes maps the ER diagram shown in Figure A.1 to the relational model, generating the following tables:

CREATE TABLE Books ( isbn CHAR(10),

Figure A.1 ER Diagram of the Initial Design (entity sets Books, with attributes isbn, title, author, price, year_published, and qty_in_stock, and Customers, with attributes cid, cname, address, and cardnum, connected by the relationship set Orders, with attributes qty, order_date, and ship_date)

CREATE TABLE Orders ( isbn CHAR(10),
                      cid INTEGER,
                      qty INTEGER,
                      order_date DATE,
                      ship_date DATE,
                      PRIMARY KEY (isbn, cid),
                      FOREIGN KEY (isbn) REFERENCES Books,
                      FOREIGN KEY (cid) REFERENCES Customers )

CREATE TABLE Customers ( cid INTEGER,
                         cname CHAR(80),
                         address CHAR(200),
                         cardnum CHAR(16),
                         PRIMARY KEY (cid),
                         UNIQUE (cardnum) )

The design team leader, who is still brooding over the fact that the review exposed a flaw in the design, now has an inspiration. The Orders table contains the field order_date, and the key for the table contains only the fields isbn and cid. Because of this, a customer cannot order the same book on different days, a restriction that was not intended. Why not add the order_date attribute to the key for the Orders table? This would eliminate the unwanted restriction:

CREATE TABLE Orders ( isbn CHAR(10),
                      ...
                      PRIMARY KEY (isbn, cid, order_date),
                      ... )
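The effect of the key choice can be checked directly. The sketch below uses Python's sqlite3 module with a simplified Orders table: foreign keys are omitted, the qty column is assumed to be an INTEGER, and the sample ISBN and dates are made up. It tries to record the same customer ordering the same book on two different days.

```python
import sqlite3


def can_reorder_on_later_day(primary_key):
    """Try to order the same book twice, on different days, under `primary_key`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Orders (isbn CHAR(10), cid INTEGER, qty INTEGER, "
        "order_date DATE, ship_date DATE, "
        f"PRIMARY KEY ({primary_key}))"
    )
    insert = "INSERT INTO Orders VALUES ('0123456789', 1, ?, ?, NULL)"
    conn.execute(insert, (2, "2000-03-01"))
    try:
        conn.execute(insert, (1, "2000-03-02"))
        return True
    except sqlite3.IntegrityError:
        return False                # the key rejected the second order
    finally:
        conn.close()
```

With the key (isbn, cid) the second insert is rejected, which is exactly the restriction Dude 2 pointed out; with order_date added to the key it is accepted.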

The reviewer, Dude 2, is not entirely happy with this solution, which he calls a ‘hack’. He points out that there is no natural ER diagram that reflects this design, and stresses the importance of the ER diagram as a design document. Dude 1 argues that while Dude 2 has a point, it is important to present B&N with a preliminary design and get feedback; everyone agrees with this, and they go back to B&N.

The owner of B&N now brings up some additional requirements that he did not mention during the initial discussions: “Customers should be able to purchase several different books in a single order. For example, if a customer wants to purchase three copies of ‘The English Teacher’ and two copies of ‘The Character of Physical Law,’ the customer should be able to place a single order for both books.”

The design team leader, Dude 1, asks how this affects the shipping policy. Does B&N still want to ship all books in an order together? The owner of B&N explains their shipping policy: “As soon as we have enough copies of an ordered book we ship it, even if an order contains several books. So it could happen that the three copies of ‘The English Teacher’ are shipped today because we have five copies in stock, but that ‘The Character of Physical Law’ is shipped tomorrow, because we currently have only one copy in stock and another copy arrives tomorrow. In addition, my customers could place more than one order per day, and they want to be able to identify the orders they placed.”

The DBDudes team thinks this over and identifies two new requirements: first, it must be possible to order several different books in a single order, and second, a customer must be able to distinguish between several orders placed the same day. To accommodate these requirements, they introduce a new attribute into the Orders table called ordernum, which uniquely identifies an order and therefore the customer placing the order. However, since several books could be purchased in a single order, ordernum and isbn are both needed to determine qty and ship_date in the Orders table.

Orders are assigned order numbers sequentially, and orders that are placed later have higher order numbers. If several orders are placed by the same customer on a single day, these orders have different order numbers and can thus be distinguished. The SQL DDL statement to create the modified Orders table is given below:

CREATE TABLE Orders ( ordernum INTEGER,


Next, DBDudes analyzes the set of relations for possible redundancy. The Books relation has only one key (isbn), and no other functional dependencies hold over the table. Thus, Books is in BCNF. The Customers relation has the key (cid), and since a credit card number uniquely identifies its card holder, the functional dependency cardnum → cid also holds. Since cid is a key, cardnum is also a key. No other dependencies hold, and so Customers is also in BCNF.

DBDudes has already identified the pair ⟨ordernum, isbn⟩ as the key for the Orders table. In addition, since each order is placed by one customer on one specific date, the following two functional dependencies hold:

ordernum → cid, and ordernum → order_date

The experts at DBDudes conclude that Orders is not even in 3NF. (Can you see why?) They decide to decompose Orders into the following two relations:

Orders(ordernum, cid, order_date), and

Orderlists(ordernum, isbn, qty, ship_date)

The resulting two relations, Orders and Orderlists, are both in BCNF, and the decomposition is lossless-join since ordernum is a key for (the new) Orders. The reader is invited to check that this decomposition is also dependency-preserving. For completeness, we give the SQL DDL for the Orders and Orderlists relations below:

CREATE TABLE Orders ( ordernum INTEGER,
                      cid INTEGER,
                      order_date DATE,
                      PRIMARY KEY (ordernum),
                      FOREIGN KEY (cid) REFERENCES Customers )

CREATE TABLE Orderlists ( ordernum INTEGER,
                          isbn CHAR(10),
                          qty INTEGER,
                          ship_date DATE,
                          PRIMARY KEY (ordernum, isbn),
                          FOREIGN KEY (isbn) REFERENCES Books )
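The normal-form claims in this step can be checked mechanically with attribute closures. The sketch below is a simplified check under stated assumptions: it considers only the functional dependencies mentioned in the text (with the key dependency written out explicitly), applies an FD to a relation only when all of its attributes appear in that relation, and uses hypothetical function names.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the functional dependencies `fds`.

    fds is a list of (lhs, rhs) pairs of attribute sets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Nontrivial FDs over `relation` whose left side is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        applies = lhs <= relation and rhs <= relation
        is_superkey = (closure(lhs, fds) & relation) == relation
        if applies and not rhs <= lhs and not is_superkey:
            bad.append((lhs, rhs))
    return bad


fds = [
    ({"ordernum", "isbn"}, {"qty", "ship_date"}),
    ({"ordernum"}, {"cid"}),
    ({"ordernum"}, {"order_date"}),
]
old_orders = {"ordernum", "isbn", "cid", "order_date", "qty", "ship_date"}
new_orders = {"ordernum", "cid", "order_date"}
orderlists = {"ordernum", "isbn", "qty", "ship_date"}
```

Running bcnf_violations(old_orders, fds) reports the two dependencies on ordernum alone. Since neither cid nor order_date is part of the key ⟨ordernum, isbn⟩, these same dependencies also violate 3NF, while both decomposed relations come back with no violations.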

Figure A.2 shows an updated ER diagram that reflects the new design. Note that DBDudes could have arrived immediately at this diagram if they had made Orders an entity set instead of a relationship set right at the beginning. But at that time they did not understand the requirements completely, and it seemed natural to model Orders as a relationship set. This iterative refinement process is typical of real-life database design processes. As DBDudes has learned over time, it is rare to achieve an initial design that is not changed as a project progresses.

Figure A.2 ER Diagram Reflecting the Final Design (entity sets Customers, Orders, and Books; relationship sets Place_Order and Order_List)

The DBDudes team celebrates the successful completion of logical database design and schema refinement by opening a bottle of champagne and charging it to B&N. After recovering from the celebration, they move on to the physical design phase.

A.5 PHYSICAL DATABASE DESIGN

Next, DBDudes considers the expected workload. The owner of the bookstore expects most of his customers to search for books by ISBN number before placing an order. Placing an order involves inserting one record into the Orders table and inserting one or more records into the Orderlists relation. If a sufficient number of books is available, a shipment is prepared and a value for the ship_date in the Orderlists relation is set. In addition, the available quantities of books in stock change all the time, since orders are placed that decrease the quantity available and new books arrive from suppliers and increase the quantity available.

The DBDudes team begins by considering searches for books by ISBN. Since isbn is a key, an equality query on isbn returns at most one record. Thus, in order to speed up queries from customers who look for books with a given ISBN, DBDudes decides to build an unclustered hash index on isbn.

Next, they consider updates to book quantities. To update the qty_in_stock value for a book, we must first search for the book by ISBN; the index on isbn speeds this up. Since the qty_in_stock value for a book is updated quite frequently, DBDudes also considers partitioning the Books relation vertically into the following two relations:

BooksQty(isbn, qty), and

BooksRest(isbn, title, author, price, year_published).

Unfortunately, this vertical partition would slow down another very popular query: equality search on ISBN to retrieve full information about a book would require a join between BooksQty and BooksRest. So DBDudes decides not to vertically partition Books.
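The trade-off can be seen in a small sqlite3 sketch of the two partitions. The column types and the sample row are hypothetical placeholders, not values from the B&N catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BooksQty (isbn CHAR(10) PRIMARY KEY, qty INTEGER);
    CREATE TABLE BooksRest (isbn CHAR(10) PRIMARY KEY, title TEXT, author TEXT,
                            price REAL, year_published INTEGER);
    -- Hypothetical sample row, split across the two partitions.
    INSERT INTO BooksQty VALUES ('0123456789', 5);
    INSERT INTO BooksRest VALUES ('0123456789', 'Sample Title',
                                  'Sample Author', 10.0, 1999);
""")
isbn = "0123456789"

# A stock update touches only the narrow BooksQty table ...
conn.execute("UPDATE BooksQty SET qty = qty - 3 WHERE isbn = ?", (isbn,))

# ... but retrieving full book information now needs a join of both partitions.
full = conn.execute(
    "SELECT r.isbn, r.title, r.author, r.price, r.year_published, q.qty "
    "FROM BooksRest r JOIN BooksQty q ON q.isbn = r.isbn "
    "WHERE r.isbn = ?",
    (isbn,),
).fetchone()
```

Whether the cheaper updates outweigh the extra join depends on the mix of queries; for B&N, lookups of full book information are popular enough that the partition does not pay off.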

DBDudes thinks it is likely that customers will also want to search for books by title and by author, and decides to add unclustered hash indexes on title and author. These indexes are inexpensive to maintain because the set of books rarely changes, even though the quantity in stock for a book changes often.

Next, they consider the Customers relation. A customer is first identified by the unique customer identification number. Thus, the most common queries on Customers are equality queries involving the customer identification number, and DBDudes decides to build a clustered hash index on cid to achieve maximum speedup for this query.

Moving on to the Orders relation, they see that it is involved in two queries: insertion of new orders and retrieval of existing orders. Both queries involve the ordernum attribute as search key, and so they decide to build an index on it. What type of index should this be, a B+ tree or a hash index? Since order numbers are assigned sequentially and thus correspond to the order date, sorting by ordernum effectively sorts by order date as well. Thus DBDudes decides to build a clustered B+ tree index on ordernum. Although the operational requirements that have been mentioned until now favor neither a B+ tree nor a hash index, B&N will probably want to monitor daily activities, and the clustered B+ tree is a better choice for such range queries. Of course, this means that retrieving all orders for a given customer could be expensive for customers with many orders, since clustering by ordernum precludes clustering by other attributes, such as cid.

The Orderlists relation mostly involves insertions, with an occasional update of a shipment date or a query to list all components of a given order. If Orderlists is kept sorted on ordernum, all insertions are appends at the end of the relation and thus very efficient. A clustered B+ tree index on ordernum maintains this sort order and also speeds up retrieval of all items for a given order. To update a shipment date, we need to search for a tuple by ordernum and isbn. The index on ordernum helps here as well. Although an index on ⟨ordernum, isbn⟩ would be better for this purpose, insertions would not be as efficient as with an index on just ordernum; DBDudes therefore decides to index Orderlists on just ordernum.


A.5.1 Tuning the Database

We digress from our discussion of the initial design to consider a problem that arises several months after the launch of the B&N site. DBDudes is called in and told that customer enquiries about pending orders are being processed very slowly. B&N has become very successful, and the Orders and Orderlists tables have grown huge.

Thinking further about the design, DBDudes realizes that there are two types of orders: completed orders, for which all books have already shipped, and partially completed orders, for which some books are yet to be shipped. Most customer requests to look up an order involve partially completed orders, which are a small fraction of all orders. DBDudes therefore decides to horizontally partition both the Orders table and the Orderlists table by ordernum. This results in four new relations: NewOrders, OldOrders, NewOrderlists, and OldOrderlists.

An order and its components are always in exactly one pair of relations, and we can determine which pair, old or new, by a simple check on ordernum. Queries involving that order can therefore always be evaluated using only the relevant relations. Some queries are now slower, such as those asking for all of a customer’s orders, since they require us to search two sets of relations. However, these queries are infrequent and their performance is acceptable.
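A minimal sketch of this routing, written in Python with the standard sqlite3 module, is given here. The text does not say what the "simple check on ordernum" is, so the sketch assumes a single hypothetical boundary value CUTOFF: order numbers at or above it live in the New relations, and the rest in the Old relations. The table contents and the function names are likewise made up for illustration.

```python
import sqlite3

# Hypothetical boundary: order numbers >= CUTOFF are stored in NewOrders.
CUTOFF = 1000

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OldOrders (ordernum INTEGER PRIMARY KEY, cid INTEGER);
    CREATE TABLE NewOrders (ordernum INTEGER PRIMARY KEY, cid INTEGER);
    INSERT INTO OldOrders VALUES (17, 1), (420, 2);
    INSERT INTO NewOrders VALUES (1000, 1), (1001, 3);
""")

def orders_table(ordernum):
    """The 'simple check on ordernum' that picks the relevant partition."""
    return "NewOrders" if ordernum >= CUTOFF else "OldOrders"

def find_order(ordernum):
    """Look up one order; only one of the two partitions is searched."""
    table = orders_table(ordernum)  # one of two fixed names, never user input
    return conn.execute(
        f"SELECT ordernum, cid FROM {table} WHERE ordernum = ?", (ordernum,)
    ).fetchone()

def orders_for_customer(cid):
    """The slower query: both partitions have to be searched."""
    return conn.execute(
        """SELECT ordernum FROM OldOrders WHERE cid = ?
           UNION ALL
           SELECT ordernum FROM NewOrders WHERE cid = ?
           ORDER BY ordernum""",
        (cid, cid),
    ).fetchall()
```

A lookup such as find_order(1001) reads only NewOrders, whereas orders_for_customer(1) must combine rows from both relations, which mirrors the cost shift described above.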

Returning to our discussion of the initial design phase, recall that DBDudes completed physical database design. Next, they address security. There are three groups of users: customers, employees, and the owner of the book shop. (Of course, there is also the database administrator, who has universal access to all data and who is responsible for regular operation of the database system.)

The owner of the store has full privileges on all tables. Customers can query the Books table and can place orders online, but they should not have access to other customers’ records or to other customers’ orders. DBDudes restricts access in two ways. First, they design a simple Web page with several forms similar to the page shown in Figure 22.1 in Chapter 22. This allows customers to submit a small collection of valid requests without giving them the ability to directly access the underlying DBMS through an SQL interface. Second, they use the security features of the DBMS to limit access to sensitive data.

The Web page allows customers to query the Books relation by ISBN, name of the author, and title of a book. The Web page also has two buttons. The first button retrieves a list of all of the customer’s orders that are not completely fulfilled yet. The second button displays a list of all completed orders for that customer. Note that customers cannot specify actual SQL queries through the Web; they can only fill in some parameters in a form to instantiate an automatically generated SQL query. All queries that are generated through form input have a WHERE clause that includes the cid attribute value of the current customer, and evaluation of the queries generated by the two buttons requires knowledge of the customer identification number. Since all users have to log on to the Web site before browsing the catalog, the business logic (discussed in Section A.7) must maintain state information about a customer (i.e., the customer identification number) during the customer’s visit to the Web site.
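One way to picture such a form-instantiated query is the following Python sketch, again using sqlite3 as a stand-in DBMS. The query text is fixed by the application, and the only value filled in is the cid taken from the logged-in customer's session state. The session dictionary, the constant PENDING_ORDERS_SQL, and the function pending_orders are hypothetical names introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE NewOrders (ordernum INTEGER PRIMARY KEY, cid INTEGER);
    INSERT INTO NewOrders VALUES (1000, 1), (1001, 2), (1002, 1);
""")

# Fixed query template; the customer never supplies SQL text, only a value.
PENDING_ORDERS_SQL = (
    "SELECT ordernum FROM NewOrders WHERE cid = ? ORDER BY ordernum"
)

def pending_orders(session):
    """Query behind the first button: the current customer's open orders.

    `session` is a hypothetical dict kept by the business logic; the cid
    comes from the login state, not from a form field.
    """
    cid = session["cid"]
    # The cid is bound as a parameter, so it is treated as a value and is
    # never parsed as part of the SQL statement.
    return [row[0] for row in conn.execute(PENDING_ORDERS_SQL, (cid,))]
```

Because the cid is bound as a parameter, a customer sees only rows whose cid matches the session, and a malformed value cannot change the shape of the query.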

The second step is to configure the database to limit access according to each user group’s need to know. DBDudes creates a special customer account that has the following privileges:

SELECT ON Books, NewOrders, OldOrders, NewOrderlists, OldOrderlists
INSERT ON NewOrders, OldOrders, NewOrderlists, OldOrderlists

Employees should be able to add new books to the catalog, update the quantity of a book in stock, revise customer orders if necessary, and update all customer information except the credit card information. In fact, employees should not even be able to see a customer’s credit card number. Thus, DBDudes creates the following view:

CREATE VIEW CustomerInfo (cid, cname, address)
AS SELECT C.cid, C.cname, C.address
FROM Customers C

They give the employee account the following privileges:

SELECT ON CustomerInfo, Books, NewOrders, OldOrders, NewOrderlists, OldOrderlists
INSERT ON CustomerInfo, Books, NewOrders, OldOrders, NewOrderlists, OldOrderlists
UPDATE ON CustomerInfo, Books, NewOrders, OldOrders, NewOrderlists, OldOrderlists
DELETE ON Books, NewOrders, OldOrders, NewOrderlists, OldOrderlists
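The effect of the CustomerInfo view can be checked with a short Python sketch. It uses SQLite through the sqlite3 module, which has no GRANT statement, so only the view itself is illustrated and the privilege lists above are not. The cardnum column name, the column types, and the sample row are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- cardnum is an assumed name for the credit card column.
    CREATE TABLE Customers (cid INTEGER PRIMARY KEY, cname TEXT,
                            address TEXT, cardnum TEXT);
    INSERT INTO Customers
        VALUES (1, 'Ann Reader', '12 Elm St', '0000-0000-0000-0000');

    CREATE VIEW CustomerInfo (cid, cname, address)
        AS SELECT C.cid, C.cname, C.address
        FROM Customers C;
""")

# What an employee restricted to the view would see.
cursor = conn.execute("SELECT * FROM CustomerInfo")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
```

The view exposes cid, cname, and address only, so an account that has privileges on CustomerInfo but none on Customers has no way to read the card number.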

In addition, there are security issues when the user first logs on to the Web site using the customer identification number. Sending the number unencrypted over the Internet is a security hazard, and a secure protocol such as SSL should be used.

There are companies such as CyberCash and DigiCash that offer electronic commerce payment solutions, even including ‘electronic’ cash. Discussion of how to incorporate such techniques into the Web site is outside the scope of this book.


DBDudes now moves on to the implementation of the application layer and considers alternatives for connecting the DBMS to the World-Wide Web (see Chapter 22). DBDudes notes the need for session management. For example, users who log in to the site, browse the catalog, and then select books to buy do not want to re-enter their customer identification number. Session management has to extend to the whole process of selecting books, adding them to a shopping cart, possibly removing books from the cart, and then checking out and paying for the books.

DBDudes then considers whether Web pages for books should be static or dynamic. If there is a static Web page for each book, then we need an extra database field in the Books relation that points to the location of the file. Even though this enables special page designs for different books, it is a very labor-intensive solution. DBDudes convinces B&N to dynamically assemble the Web page for a book from a standard template instantiated with information about the book in the Books relation.

This leaves DBDudes with one final decision, namely how to connect applications to the DBMS. They consider the two main alternatives that we presented in Section 22.2: CGI scripts versus an application server infrastructure. If they use CGI scripts, they would have to encode session management logic, which is not an easy task. If they use an application server, they can make use of all the functionality that the application server provides. Thus, they recommend that B&N implement server-side processing using an application server.

B&N, however, refuses to pay for an application server and decides that for their purposes CGI scripts are fine. DBDudes accepts B&N’s decision and proceeds to build the following pieces:

The top-level HTML pages that allow users to navigate the site, and various forms that allow users to search the catalog by ISBN, author name, or title. An example page containing a search form is shown in Figure 22.1 in Chapter 22. In addition to the input forms, DBDudes must develop appropriate presentations for the results.

The logic to track a customer session. Relevant information must either be stored in a server-side data structure or be cached in the customer’s browser using a mechanism like cookies. Cookies are pieces of information that a Web server can store in a user’s Web browser. Whenever the user generates a request, the browser passes along the stored information, thereby enabling the Web server to ‘remember’ what the user did earlier.

The scripts that process the user requests. For example, a customer can use a form called ‘Search books by title’ to type in a title and search for books with that title. The CGI interface communicates with a script that processes the request. An example of such a script written in Perl using DBI for data access is shown in Figure 22.4 in Chapter 22.
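The session-tracking piece can be sketched as follows. This is not the Perl code referred to above; it is an illustrative Python sketch that combines the two options mentioned in the list, keeping the session state in a server-side dictionary and storing only an opaque session id in a cookie. The names SESSIONS, start_session, session_from_request, and the cookie name sid are assumptions made for the example.

```python
import secrets
from http.cookies import SimpleCookie

# Server-side data structure: session id -> state for one customer visit.
SESSIONS = {}

def start_session(cid):
    """Called after login; returns the value for a Set-Cookie header."""
    sid = secrets.token_hex(16)               # unguessable session id
    SESSIONS[sid] = {"cid": cid, "cart": []}  # state kept on the server
    cookie = SimpleCookie()
    cookie["sid"] = sid
    cookie["sid"]["httponly"] = True          # hide the cookie from page scripts
    return cookie["sid"].OutputString()

def session_from_request(cookie_header):
    """Recover the session from the Cookie header the browser sends back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sid")
    return SESSIONS.get(morsel.value) if morsel else None
```

A script handling a later request would call session_from_request with the incoming Cookie header and read the cid and shopping cart from the returned dictionary, so the customer never has to re-enter the identification number.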

For completeness, we remark that if B&N had agreed to use an application server, DBDudes would have had the following tasks:

As in the CGI-based architecture, they would have to design top-level pages that allow customers to navigate the Web site as well as various search forms and result presentations.

Assuming that DBDudes selects a Java-based application server, they have to write Java Servlets to process form-generated requests. Potentially, they could reuse existing (possibly commercially available) JavaBeans. They can use JDBC as a database interface; examples of JDBC code can be found in Section 5.10. Instead of programming Servlets, they could resort to Java Server Pages and annotate pages with special JSP markup tags. An example of a Web page that includes JSP commands is shown in Section 22.2.1.

If DBDudes selects an application server that uses proprietary markup tags, they have to develop Web pages by using such tags. An example using Cold Fusion markup tags can be found in Section 22.2.1.

Our discussion thus far covers only the ‘client interface’, the part of the Web site that is exposed to B&N’s customers. DBDudes also needs to add applications that allow the employees and the shop owner to query and access the database and to generate summary reports of business activities.

This completes our discussion of Barns and Nobble. While this study describes only a small part of a real problem, we saw that a design even at this scale involved nontrivial tradeoffs. We would like to emphasize again that database design is an iterative process and that it is therefore very important not to lock oneself down early on in a fixed model that is too inflexible to accommodate a changing environment. Welcome to the exciting world of database management!

B THE MINIBASE SOFTWARE

Practice is the best of all instructors

—Publius Syrus, 42 B.C

Minibase is a small relational DBMS, together with a suite of visualization tools, that has been developed for use with this book. While the book makes no direct reference to the software and can be used independently, Minibase offers instructors an opportunity to design a variety of hands-on assignments, with or without programming. To see an online description of the software, visit this URL:

http://www.cs.wisc.edu/~dbbook/minibase.html

The software is available freely through ftp. By registering themselves as users at the URL for the book, instructors can receive prompt notification of any major bug reports and fixes. Sample project assignments, which elaborate upon some of the briefly sketched ideas in the project-based exercises at the end of chapters, can be seen at

http://www.cs.wisc.edu/~dbbook/minihwk.html

Instructors should consider making small modifications to each assignment to discourage undesirable ‘code reuse’ by students; assignment handouts formatted using LaTeX are available by ftp. Instructors can also obtain solutions to these assignments by contacting the authors (raghu@cs.wisc.edu, johannes@cs.cornell.edu).

Minibase is intended to supplement the use of a commercial DBMS such as Oracle or Sybase in course projects, not to replace it. While a commercial DBMS is ideal for SQL assignments, it does not allow students to understand how the DBMS works. Minibase is intended to address the latter issue; the subset of SQL that it supports is intentionally kept small, and students should also be asked to use a commercial DBMS for writing SQL queries and programs. Minibase is provided on an as-is basis with no warranties or restrictions for educational or personal use. It includes the following:


Code for a small single-user relational DBMS, including a parser and query optimizer for a subset of SQL, and components designed to be (re)written by students as project assignments: heap files, buffer manager, B+ trees, sorting, and joins.

Graphical visualization tools to aid in students’ exploration and understanding of the behavior of the buffer management, B+ tree, and query optimization components of the system. There is also a graphical tool to refine a relational database design using normalization.

Several assignments involving the use of Minibase are described below. Each of these has been tested in a course already, but the details of how Minibase is set up might vary at your school, so you may have to modify the assignments accordingly. If you plan to use these assignments, you are advised to download and try them at your site well in advance of handing them to students. We have done our best to test and document these assignments and the Minibase software, but bugs undoubtedly persist. Please report bugs at this URL:

http://www.cs.wisc.edu/~dbbook/minibase.comments.html

I hope that users will contribute bug fixes, additional project assignments, and extensions to Minibase. These will be made publicly available through the Minibase site, together with pointers to the authors.

B.2.1 Overview of Programming Projects

In several assignments, students are asked to rewrite a component of Minibase. The book provides the necessary background for all of these assignments, and the assignment handout provides additional system-level details. The online HTML documentation provides an overview of the software, in particular the component interfaces, and can be downloaded and installed at each school that uses Minibase. The projects listed below should be assigned after covering the relevant material from the indicated chapter.

Buffer manager (Chapter 7): Students are given code for the layer that manages space on disk and supports the concept of pages with page ids. They are asked to implement a buffer manager that brings requested pages into memory if they are not already there. One variation of this assignment could use different replacement policies. Students are asked to assume a single-user environment, with no concurrency control or recovery management.

HF page (Chapter 7): Students must write code that manages records on a page using a slot-directory page format to keep track of records on a page. Possible variants include fixed-length versus variable-length records and other ways to keep track of records on a page.

Heap files (Chapter 7): Using the HF page and buffer manager code, students are asked to implement a layer that supports the abstraction of files of unordered pages, that is, heap files.

B+ trees (Chapter 9): This is one of the more complex assignments. Students have to implement a page class that maintains records in sorted order within a page and implement the B+ tree index structure to impose a sort order across several leaf-level pages. Indexes store ⟨key, record-pointer⟩ pairs in leaf pages, and data records are stored separately (in heap files). Similar assignments can easily be created for Linear Hashing or Extendible Hashing index structures.

External sorting (Chapter 11): Building upon the buffer manager and heap file layers, students are asked to implement external merge-sort. The emphasis is on minimizing I/O, rather than on the in-memory sort used to create sorted runs.

Sort-merge join (Chapter 12): Building upon the code for external sorting, students are asked to implement the sort-merge join algorithm. This assignment can be easily modified to create assignments that involve other join algorithms.

Index nested-loop join (Chapter 12): This assignment is similar to the sort-merge join assignment, but relies on B+ tree (or other indexing) code, instead of sorting code.
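To give a feel for the scale of the first project in this list, here is a minimal buffer manager sketch in Python. It is not Minibase code and does not follow the Minibase component interfaces; it assumes a single-user setting, models the disk layer as a hypothetical dictionary from page id to page contents, and uses LRU as one possible replacement policy. The class and method names are illustrative.

```python
from collections import OrderedDict

class BufferManager:
    """Toy buffer pool with pin counts, dirty flags, and LRU replacement."""

    def __init__(self, disk, num_frames):
        self.disk = disk              # hypothetical disk layer: page id -> bytes
        self.num_frames = num_frames
        # page id -> [data, pin_count, dirty]; least recently used entry first
        self.frames = OrderedDict()
        self.hits = 0

    def pin(self, page_id):
        """Bring the page into the pool if needed and pin it."""
        if page_id in self.frames:
            self.hits += 1
            self.frames.move_to_end(page_id)    # mark as most recently used
        else:
            if len(self.frames) >= self.num_frames:
                self._evict()
            self.frames[page_id] = [self.disk[page_id], 0, False]
        frame = self.frames[page_id]
        frame[1] += 1
        return frame[0]

    def unpin(self, page_id, new_data=None):
        """Release one pin; passing new_data marks the page dirty."""
        frame = self.frames[page_id]
        if frame[1] == 0:
            raise ValueError("page is not pinned")
        if new_data is not None:
            frame[0], frame[2] = new_data, True
        frame[1] -= 1

    def _evict(self):
        """Evict the least recently used unpinned page, writing it if dirty."""
        victim = next(
            (pid for pid, frame in self.frames.items() if frame[1] == 0), None
        )
        if victim is None:
            raise RuntimeError("all frames are pinned")
        data, _, dirty = self.frames.pop(victim)
        if dirty:
            self.disk[victim] = data
```

Swapping in a different policy, as the assignment variation suggests, would mean changing only how _evict chooses its victim.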

B.2.2 Overview of Nonprogramming Assignments

Four assignments that do not require students to write any code (other than SQL, in one assignment) are also available.

Optimizer exercises (Chapter 13): The Minibase optimizer visualizer offers a flexible tool to explore how a typical relational query optimizer works. It accepts single-block SQL queries (including some queries that cannot be executed in Minibase, such as queries involving grouping and aggregate operators). Students can inspect and modify synthetic catalogs, add and drop indexes, enable or disable different join algorithms, enable or disable index-only evaluation strategies, and see the effect of such changes on the plan produced for a given query. All (sub)plans generated by an iterative System R style optimizer can be viewed, ordered by the iteration in which they are generated, and details on a given plan can be obtained readily. All interaction with the optimizer visualizer is through a GUI and requires no programming.

The assignment introduces students to this tool and then requires them to answer questions involving specific catalogs, queries, and plans generated by controlling various parameters.


Buffer manager viewer (Chapter 12): This viewer lets students visualize how pages are moved in and out of the buffer pool, their status (e.g., dirty bit, pin count) while in the pool, and some statistics (e.g., number of hits). The assignment requires students to generate traces by modifying some trace-generation code (provided) and to answer questions about these traces by using the visualizer to look at them. While this assignment can be used after covering Chapter 7, deferring it until after Chapter 12 enables students to examine traces that are representative of different relational operations.

B+ tree viewer (Chapter 9): This viewer lets students see a B+ tree as it is modified through insert and delete statements. The assignment requires students to work with trace files, answer questions about them, and generate operation traces (i.e., a sequence of inserts and deletes) that create specified kinds of trees.

Normalization tool (Chapter 15): The normalization viewer is a tool for normalizing relational tables. It supports the concept of a refinement session, in which a schema is decomposed repeatedly and the resulting decomposition tree is then saved. For a given schema, a user might consider several alternative decompositions (more precisely, decomposition trees), and each of these can be saved as a refinement session. Refinement sessions are a very flexible and convenient mechanism for trying out several alternative decomposition strategies. The normalization assignment introduces students to this tool and asks design-oriented questions involving the use of the tool.

Assignments that require students to evaluate various components can also be developed. For example, students can be asked to compare different join methods, different index methods, and different buffer management policies.

The Minibase software was inspired by Minirel, a small relational DBMS developed by David DeWitt for instructional use. Minibase was developed by a large number of dedicated students over a long time, and the design was guided by Mike Carey and R. Ramakrishnan. See the online documentation for more on Minibase’s history.

Trang 25

[1] R Abbott and H Garcia-Molina Scheduling real-time transactions: a performance

evaluation ACM Transactions on Database Systems, 17(3), 1992.

[2] S Abiteboul Querying semi-structured data In Intl Conf on Database Theory, 1997 [3] S Abiteboul, R Hull, and V Vianu Foundations of Databases Addison-Wesley, 1995 [4] S Abiteboul and P Kanellakis Object identity as a query language primitive In Proc.

ACM SIGMOD Conf on the Management of Data, 1989.

[5] S Abiteboul and V Vianu Regular path queries with constraints In Proc ACM Symp.

on Principles of Database Systems, 1997.

[6] K Achyutuni, E Omiecinski, and S Navathe Two techniques for on-line index

mod-ification in shared nothing parallel databases In Proc ACM SIGMOD Conf on the

Management of Data, 1996.

[7] S Adali, K Candan, Y Papakonstantinou, and V Subrahmanian Query caching and

optimization in distributed mediator systems In Proc ACM SIGMOD Conf on the

[8] M E Adiba Derived relations: A unified mechanism for views, snapshots and

dis-tributed data In Proc Intl Conf on Very Large Databases, 1981.

[9] S Agarwal, R Agrawal, P Deshpande, A Gupta, J Naughton, R Ramakrishnan, and

S Sarawagi On the computation of multidimensional aggregates In Proc Intl Conf.

on Very Large Databases, 1996.

[10] D Agrawal and A El Abbadi The generalized tree quorum protocol: an efficient

approach for managing replicated data ACM Transactions on Database Systems, 17(4),

1992

[11] D Agrawal, A El Abbadi, and R Jeffers Using delayed commitment in locking

pro-tocols for real-time databases In Proc ACM SIGMOD Conf on the Management of

Data, 1992.

[12] R Agrawal, M Carey, and M Livny Concurrency control performance-modeling:

Alternatives and implications In Proc ACM SIGMOD Conf on the Management of

Data, 1985.

[13] R Agrawal and D DeWitt Integrated concurrency control and recovery

mecha-nisms: Design and performance evaluation ACM Transactions on Database Systems,

10(4):529–564, 1985

[14] R Agrawal and N Gehani ODE (Object Database and Environment): The language

and the data model In Proc ACM SIGMOD Conf on the Management of Data, 1989.

[15] R Agrawal, J E Gehrke, D Gunopulos, and P Raghavan Automatic subspace

clus-tering of high dimensional data for data mining In Proc ACM SIGMOD Conf on

[16] R Agrawal, T Imielinski, and A Swami Database mining: A performance perspective

IEEE Transactions on Knowledge and Data Engineering, 5(6):914–925, December 1993.

[17] R Agrawal, H Mannila, R Srikant, H Toivonen, and A I Verkamo Fast Discovery

of Association Rules In U M Fayyad, G Piatetsky-Shapiro, P Smyth, and R

Uthu-rusamy, editors, Advances in Knowledge Discovery and Data Mining, chapter 12, pages

307–328 AAAI/MIT Press, 1996

847

Trang 26

848 Database Management Systems

[18] R Agrawal, G Psaila, E Wimmers, and M Zaot Querying shapes of histories In

Proc Intl Conf on Very Large Databases, 1995.

[19] R Agrawal and J Shafer Parallel mining of association rules IEEE Transactions on

Knowledge and Data Engineering, 8(6):962–969, 1996.

[20] R Agrawal and R Srikant Mining sequential patterns In Proc IEEE Intl Conf on

Data Engineering, 1995.

[21] R Agrawal, P Stolorz, and G Piatetsky-Shapiro, editors Proc Intl Conf on

Knowl-edge Discovery and Data Mining AAAI Press, 1998.

[22] R Ahad, K BapaRao, and D McLeod On estimating the cardinality of the projection

of a database relation ACM Transactions on Database Systems, 14(1):28–40, 1989.

[23] C Ahlberg and E Wistrand IVEE: an information visualization exploration

environ-ment In Intl Symp on Information Visualization, 1995.

[24] A Aho, C Beeri, and J Ullman The theory of joins in relational databases ACM

Transactions on Database Systems, 4(3):297–314, 1979.

[25] A Aho, J Hopcroft, and J Ullman The Design and Analysis of Computer Algorithms.

Addison-Wesley, 1983

[26] A Aho, Y Sagiv, and J Ullman Equivalences among relational expressions SIAM

Journal of Computing, 8(2):218–246, 1979.

[27] A Aiken, J Chen, M Stonebraker, and A Woodruff Tioga-2: A direct manipulation

database visualization environment In Proc IEEE Intl Conf on Data Engineering,

1996

[28] A Aiken, J Widom, and J Hellerstein Static analysis techniques for predicting the

behavior of active database rules ACM Transactions on Database Systems, 20(1):3–41,

1995

[29] E Anwar, L Maugis, and U Chakravarthy A new perspective on rule support for

object-oriented databases In Proc ACM SIGMOD Conf on the Management of Data,

1993

[30] K Apt, H Blair, and A Walker Towards a theory of declarative knowledge In

Foundations of Deductive Databases and Logic Programming J Minker (ed.), Morgan

Kaufmann, 1988

[31] W Armstrong Dependency structures of database relationships In Proc IFIP

Congress, 1974.

[32] G Arocena and A O Mendelzon WebOQL: restructuring documents, databases and

webs In Proc Intl Conf on Data Engineering, 1988.

[33] M Astrahan, M Blasgen, D Chamberlin, K Eswaran, J Gray, P Griffiths, W King,

R Lorie, P McJones, J Mehl, G Putzolu, I Traiger, B Wade, and V Watson System

R: A relational approach to database management ACM Transactions on Database

Systems, 1(2):97–137, 1976.

[34] M Atkinson, P Bailey, K Chisholm, P Cockshott, and R Morrison An approach to

persistent programming In Readings in Object-Oriented Databases eds S.B Zdonik

and D Maier, Morgan Kaufmann, 1990

[35] M Atkinson and O Buneman Types and persistence in database programming

lan-guages ACM Computing Surveys, 19(2):105–190, 1987.

[36] R Attar, P Bernstein, and N Goodman Site initialization, recovery, and back-up in a

distributed database system IEEE Transactions on Software Engineering, 10(6):645–

650, 1983

[37] P Atzeni, L Cabibbo, and G Mecca Isalog: a declarative language for complex objects

with hierarchies In Proc IEEE Intl Conf on Data Engineering, 1993.

[38] P Atzeni and V De Antonellis Relational Database Theory Benjamin-Cummings,

1993

Trang 27

REFERENCES 849

[39] P Atzeni, G Mecca, and P Merialdo To weave the web In Proc Intl Conf Very

Large Data Bases, 1997.

[40] R Avnur, J Hellerstein, B Lo, C Olston, B Raman, V Raman, T Roth, and K Wylie

Control: Continuous output and navigation technology with refinement online In Proc.

[41] D Badal and G Popek Cost and performance analysis of semantic integrity validation

methods In Proc ACM SIGMOD Conf on the Management of Data, 1979.

[42] A Badia, D Van Gucht, and M Gyssens Querying with generalized quantifiers In

Applications of Logic Databases ed R Ramakrishnan, Kluwer Academic, 1995.

[43] I Balbin, G Port, K Ramamohanarao, and K Meenakshi Efficient bottom-up

compu-tation of queries on stratified databases Journal of Logic Programming, 11(3):295–344,

1991

[44] I Balbin and K Ramamohanarao A generalization of the differential approach to

recursive query evaluation Journal of Logic Programming, 4(3):259–262, 1987 [45] F Bancilhon, C Delobel, and P Kanellakis Building an Object-Oriented Database

System Morgan Kaufmann, 1991.

[46] F Bancilhon and S Khoshafian A calculus for complex objects Journal of Computer

and System Sciences, 38(2):326–340, 1989.

[47] F Bancilhon, D Maier, Y Sagiv, and J Ullman Magic sets and other strange ways

to implement logic programs In ACM Symp on Principles of Database Systems, 1986.

[48] F Bancilhon and R Ramakrishnan An amateur’s introduction to recursive query

processing strategies In Proc ACM SIGMOD Conf on the Management of Data,

1986

[49] F Bancilhon and N Spyratos Update semantics of relational views ACM Transactions

on Database Systems, 6(4):557–575, 1981.

[50] E Baralis, S Ceri, and S Paraboschi Modularization techniques for active rules

design ACM Transactions on Database Systems, 21(1):1–29, 1996.

[51] R Barquin and H Edelstein Planning and Designing the Data Warehouse

Prentice-Hall, 1997

[52] C Batini, S Ceri, and S Navathe Database Design: An Entity Relationship Approach.

Benjamin/Cummings Publishers, 1992

[53] C Batini, M Lenzerini, and S Navathe A comparative analysis of methodologies for

database schema integration ACM Computing Surveys, 18(4):323–364, 1986.

[54] D Batory, J Barnett, J Garza, K Smith, K Tsukuda, B Twichell, and T Wise

GENESIS: an extensible database management system In Readings in Object-Oriented

Databases eds S.B Zdonik and D Maier, Morgan Kaufmann, 1990.

[55] B Baugsto and J Greipsland Parallel sorting methods for large data volumes on a

hypercube database computer In Proc Intl Workshop on Database Machines, 1989.

[56] R Bayer and E McCreight Organization and maintenance of large ordered indexes

Acta Informatica, 1(3):173–189, 1972.

[57] R Bayer and M Schkolnick Concurrency of operations on B-trees Acta Informatica,

9(1):1–21, 1977

[58] M Beck, D Bitton, and W Wilkinson Sorting large files on a backend multiprocessor

IEEE Transactions on Computers, 37(7):769–778, 1988.

[59] N Beckmann, H.-P Kriegel, R Schneider, and B Seeger The r∗ tree: An efficient and

robust access method for points and rectangles In Proc ACM SIGMOD Conf on the

[60] C Beeri, R Fagin, and J Howard A complete axiomatization of functional and

mul-tivalued dependencies in database relations In Proc ACM SIGMOD Conf on the

Trang 28

[61] C Beeri and P Honeyman Preserving functional dependencies SIAM Journal of

Computing, 10(3):647–656, 1982.

[62] C Beeri and T Milo A model for active object-oriented database In Proc Intl Conf.

[63] C Beeri, S Naqvi, R Ramakrishnan, O Shmueli, and S Tsur Sets and negation in

a logic database language (LDL1) In ACM Symp on Principles of Database Systems,

[67] S Berchtold, C Bohm, and H.-P Kriegel The pyramid-tree: Breaking the curse of

dimensionality In ACM SIGMOD Conf on the Management of Data, 1998.

[68] P Bernstein Synthesizing third normal form relations from functional dependencies

ACM Transactions on Database Systems, 1(4):277–298, 1976.

[69] P Bernstein, B Blaustein, and E Clarke Fast maintenance of semantic integrity

assertions using redundant aggregate data In Proc Intl Conf on Very Large Databases,

1980

[70] P Bernstein and D Chiu Using semi-joins to solve relational queries Journal of the

ACM, 28(1):25–40, 1981.

[71] P Bernstein and N Goodman Timestamp-based algorithms for concurrency control in

distributed database systems In Proc Intl Conf on Very Large Databases, 1980.

[72] P Bernstein and N Goodman Concurrency control in distributed database systems

ACM Computing Surveys, 13(2):185–222, 1981.

[73] P Bernstein and N Goodman Power of natural semijoins SIAM Journal of Computing,

10(4):751–771, 1981

[74] P Bernstein and N Goodman Multiversion concurrency control—theory and

algo-rithms ACM Transactions on Database Systems, 8(4):465–483, 1983.

[75] P Bernstein, N Goodman, E Wong, C Reeve, and J Rothnie Query processing in

a system for distributed databases (SDD-1) ACM Transactions on Database Systems,

6(4):602–625, 1981

[76] P Bernstein, V Hadzilacos, and N Goodman Concurrency Control and Recovery in

Database Systems Addison-Wesley, 1987.

[77] P Bernstein and E Newcomer Principles of Transaction Processing Morgan

Kaufmann, 1997

[78] P Bernstein, D Shipman, and J Rothnie Concurrency control in a system for

distributed databases (SDD-1) ACM Transactions on Database Systems, 5(1):18–51, 1980.

[79] P Bernstein, D Shipman, and W Wong Formal aspects of serializability in database

concurrency control IEEE Transactions on Software Engineering, 5(3):203–216, 1979.

[80] K Beyer, J Goldstein, R Ramakrishnan, and U Shaft When is nearest neighbor

meaningful? In IEEE International Conference on Database Theory, 1999.

[81] K Beyer and R Ramakrishnan Bottom-up computation of sparse and iceberg cubes

In Proc ACM SIGMOD Conf on the Management of Data, 1999.

[82] B Bhargava (ed.) Concurrency Control and Reliability in Distributed Systems Van

Nostrand Reinhold, 1987

[83] A Biliris The performance of three database storage structures for managing large

objects In Proc ACM SIGMOD Conf on the Management of Data, 1992.

[84] J Biskup and B Convent A formal view integration method In Proc ACM SIGMOD

Conf on the Management of Data, 1986.

[85] J Biskup, U Dayal, and P Bernstein Synthesizing independent database schemas In

Proc ACM SIGMOD Conf on the Management of Data, 1979.

[86] D Bitton and D DeWitt Duplicate record elimination in large data files ACM Transactions on Database Systems, 8(2):255–265, 1983.

[87] J Blakeley, P.-A Larson, and F Tompa Efficiently updating materialized views In

[88] M Blasgen and K Eswaran On the evaluation of queries in a database system Technical report, IBM FJ (RJ1745), San Jose, 1975

[89] P Bohannon, D Leinbaugh, R Rastogi, S Seshadri, A Silberschatz, and S Sudarshan

Logical and physical versioning in main memory databases In Proc Intl Conf on

Very Large Databases, 1997.

[90] R Boyce and D Chamberlin SEQUEL: a structured English query language In Proc.

[91] P S Bradley and U M Fayyad Refining initial points for K-Means clustering In Proc.

Intl Conf on Machine Learning, pages 91–99 Morgan Kaufmann, San Francisco, CA,

1998

[92] P S Bradley, U M Fayyad, and C Reina Scaling clustering algorithms to large

databases In Proc Intl Conf on Knowledge Discovery and Data Mining, 1998

[93] K Bratbergsengen Hashing methods and relational algebra operations In Proc Intl.

Conf on Very Large Databases, 1984.

[94] L Breiman, J H Friedman, R A Olshen, and C J Stone Classification and Regression

Trees Wadsworth, Belmont, 1984.

[95] Y Breitbart, H Garcia-Molina, and A Silberschatz Overview of multidatabase

transaction management In Proc Intl Conf on Very Large Databases, 1992.

[96] Y Breitbart, A Silberschatz, and G Thompson Reliable transaction management in

a multidatabase system In Proc ACM SIGMOD Conf on the Management of Data,

1990

[97] Y Breitbart, A Silberschatz, and G Thompson An approach to recovery management

in a multidatabase system In Proc Intl Conf on Very Large Databases, 1992.

[98] S Brin, R Motwani, and C Silverstein Beyond market baskets: Generalizing

association rules to correlations In Proc ACM SIGMOD Conf on the Management of Data,

1997

[99] S Brin and L Page The anatomy of a large-scale hypertextual web search engine In

Proceedings of 7th World Wide Web Conference, 1998.

[100] T Brinkhoff, H Kriegel, and R Schneider Comparison of approximations of complex objects used for approximation-based query processing in spatial database systems In

Proc IEEE Intl Conf on Data Engineering, 1993.

[101] K Brown, M Carey, and M Livny Goal-oriented buffer management revisited In

[102] F Bry Towards an efficient evaluation of general queries: Quantifier and disjunction

processing revisited In Proc ACM SIGMOD Conf on the Management of Data, 1989.

[103] F Bry and R Manthey Checking consistency of database constraints: a logical basis

In Proc Intl Conf on Very Large Databases, 1986.

[104] O Buneman and E Clemons Efficiently monitoring relational databases ACM

Transactions on Database Systems, 4(3), 1979.

[105] O Buneman, S Naqvi, V Tannen, and L Wong Principles of programming with

complex objects and collection types Theoretical Computer Science, 149(1):3–48, 1995.

[106] P Buneman, S Davidson, G Hillebrand, and D Suciu A query language and

optimization techniques for unstructured data In Proc ACM SIGMOD Conf on Management

the Management of Data, 1999.

[109] M Carey, D DeWitt, M Franklin, N Hall, M McAuliffe, J Naughton, D Schuh,

M Solomon, C Tan, O Tsatalos, S White, and M Zwilling Shoring up persistent

applications In Proc ACM SIGMOD Conf on the Management of Data, 1994.

[110] M Carey, D DeWitt, G Graefe, D Haight, J Richardson, D Schuh, E Shekita, and

S Vandenberg The EXODUS Extensible DBMS project: An overview In Readings in

Object-Oriented Databases S.B Zdonik and D Maier (eds.), Morgan Kaufmann, 1990.

[111] M Carey, D DeWitt, and J Naughton The OO7 benchmark In Proc ACM

SIGMOD Conf on the Management of Data, 1993.

[112] M Carey, D DeWitt, J Naughton, M Asgarian, J Gehrke, and D Shah The BUCKY

object-relational benchmark In Proc ACM SIGMOD Conf on the Management of

Data, 1997.

[113] M Carey, D DeWitt, J Richardson, and E Shekita Object and file management in

the Exodus extensible database system In Proc Intl Conf on Very Large Databases,

1986

[114] M Carey and D Kossman On saying “Enough Already!” in SQL In Proc ACM

[115] M Carey and D Kossman Reducing the braking distance of an SQL query engine In

[116] M Carey and M Livny Conflict detection tradeoffs for replicated data ACM

Transactions on Database Systems, 16(4), 1991.

[117] M Casanova, L Tucherman, and A Furtado Enforcing inclusion dependencies and

referential integrity In Proc Intl Conf on Very Large Databases, 1988.

[118] M Casanova and M Vidal Towards a sound view integration methodology In ACM

Symp on Principles of Database Systems, 1983.

[119] S Castano, M Fugini, G Martella, and P Samarati Database Security

Addison-Wesley, 1995

[120] R Cattell The Object Database Standard: ODMG-93 (Release 1.1) Morgan

Kaufmann, 1994

[121] S Ceri, P Fraternali, S Paraboschi, and L Tanca Active rule management in Chimera

In Active Database Systems J Widom and S Ceri (eds.), Morgan Kaufmann, 1996

[122] S Ceri, G Gottlob, and L Tanca Logic Programming and Databases Springer Verlag,

1990

[123] S Ceri and G Pelagatti Distributed Database Design: Principles and Systems.

McGraw-Hill, 1984

[124] S Ceri and J Widom Deriving production rules for constraint maintenance In Proc.

Intl Conf on Very Large Databases, 1990.

[125] F Cesarini, M Missikoff, and G Soda An expert system approach for database

application tuning Data and Knowledge Engineering, 8:35–55, 1992.

[126] U Chakravarthy Architectures and monitoring techniques for active databases: An

evaluation Data and Knowledge Engineering, 16(1):1–26, 1995.

[127] U Chakravarthy, J Grant, and J Minker Logic-based approach to semantic query

optimization ACM Transactions on Database Systems, 15(2):162–207, 1990.

[128] D Chamberlin Using the New DB2 Morgan Kaufmann, 1996.

[129] D Chamberlin, M Astrahan, M Blasgen, J Gray, W King, B Lindsay, R Lorie,

J Mehl, T Price, P Selinger, M Schkolnick, D Slutz, I Traiger, B Wade, and R Yost

A history and evaluation of System R Communications of the ACM, 24(10):632–646,

1981

[130] D Chamberlin, M Astrahan, K Eswaran, P Griffiths, R Lorie, J Mehl, P Reisner, and B Wade Sequel 2: A unified approach to data definition, manipulation, and

control IBM Journal of Research and Development, 20(6):560–575, 1976.

[131] A Chandra and D Harel Structure and complexity of relational queries J Computer

and System Sciences, 25:99–128, 1982.

[132] A Chandra and P Merlin Optimal implementation of conjunctive queries in relational

databases In Proc ACM SIGACT Symp on Theory of Computing, 1977.

[133] M Chandy, L Haas, and J Misra Distributed deadlock detection ACM Transactions

on Computer Systems, 1(3):144–156, 1983.

[134] C Chang and D Leu Multi-key sorting as a file organization scheme when queries are

not equally likely In Proc Intl Symp on Database Systems for Advanced Applications,

1989

[135] D Chang and D Harkey Client/server data access with Java and XML John Wiley

and Sons, 1998

[136] D Chatziantoniou and K Ross Groupwise processing of relational queries In Proc.

Intl Conf on Very Large Databases, 1997.

[137] S Chaudhuri and U Dayal An overview of data warehousing and OLAP technology

SIGMOD Record, 26(1):65–74, 1997.

[138] S Chaudhuri and V Narasayya An efficient cost-driven index selection tool for

Microsoft SQL Server In Proc Intl Conf on Very Large Databases, 1997.

[139] S Chaudhuri and K Shim Optimizing queries with aggregate views In Intl Conf.

on Extending Database Technology, 1996.

[140] S Chaudhuri and K Shim Optimization of queries with user-defined predicates In

[141] J Cheiney, P Faudemay, R Michel, and J Thevenin A reliable parallel backend using

multiattribute clustering and select-join operator In Proc Intl Conf on Very Large

Databases, 1986.

[142] C Chen and N Roussopoulos Adaptive database buffer management using query

feedback In Proc Intl Conf on Very Large Databases, 1993.

[143] C Chen and N Roussopoulos Adaptive selectivity estimation using query feedback

[144] P M Chen, E K Lee, G A Gibson, R H Katz, and D A Patterson RAID:

high-performance, reliable secondary storage ACM Computing Surveys, 26(2):145–185, June

1994

[145] P P Chen The entity-relationship model—toward a unified view of data ACM

Transactions on Database Systems, 1(1):9–36, 1976.

[146] D Childs Feasibility of a set theoretical data structure—a general structure based on

a reconstructed definition of relation Proc Tri-annual IFIP Conference, 1968.

[147] D Chimenti, R Gamboa, R Krishnamurthy, S Naqvi, S Tsur, and C Zaniolo The LDL

system prototype IEEE Transactions on Knowledge and Data Engineering, 2(1):76–90,

[150] H.-T Chou and D DeWitt An evaluation of buffer management strategies for relational

database systems In Proc Intl Conf on Very Large Databases, 1985.

[151] P Chrysanthis and K Ramamritham Acta: a framework for specifying and

reasoning about transaction structure and behavior In Proc ACM SIGMOD Conf on the

[152] F Chu, J Halpern, and P Seshadri Least expected cost query optimization: An

exercise in utility ACM Symp on Principles of Database Systems, 1999.

[153] F Civelek, A Dogac, and S Spaccapietra An expert system approach to view definition

and integration In Proc Entity-Relationship Conference, 1988.

[154] R Cochrane, H Pirahesh, and N Mattos Integrating triggers and declarative

constraints in SQL database systems In Proc Intl Conf on Very Large Databases, 1996

[155] CODASYL Report of the CODASYL Data Base Task Group ACM, 1971.

[156] E Codd A relational model of data for large shared data banks Communications of

the ACM, 13(6):377–387, 1970.

[157] E Codd Further normalization of the data base relational model In Data Base

Systems ed R Rustin, Prentice-Hall, 1972.

[158] E Codd Relational completeness of data base sub-languages In Data Base Systems.

ed R Rustin, Prentice-Hall, 1972

[159] E Codd Extending the database relational model to capture more meaning ACM

Transactions on Database Systems, 4(4):397–434, 1979.

[160] E Codd Twelve rules for on-line analytic processing Computerworld, April 13 1995.

[161] L Colby, T Griffin, L Libkin, I Mumick, and H Trickey Algorithms for deferred view

maintenance In Proc ACM SIGMOD Conf on the Management of Data, 1996.

[162] L Colby, A Kawaguchi, D Lieuwen, I Mumick, and K Ross Supporting multiple

view maintenance policies: Concepts, algorithms, and performance analysis In Proc.

[163] D Comer The ubiquitous B-tree ACM Computing Surveys, 11(2):121–137, 1979.

[164] D Connolly, editor XML Principles, Tools and Techniques O’Reilly & Associates,

Sebastopol, USA, 1997

[165] D Copeland and D Maier Making SMALLTALK a database system In Proc ACM

[166] G Cornell and K Abdali CGI Programming With Java Prentice-Hall, 1998.

[167] C Date A critique of the SQL database language ACM SIGMOD Record, 14(3):8–54,

1984

[168] C Date Relational Database: Selected Writings Addison-Wesley, 1986.

[169] C Date An Introduction to Database Systems (6th ed.) Addison-Wesley, 1995

[170] C Date and H Darwen A Guide to the SQL Standard (3rd ed.) Addison-Wesley, 1993.

[171] C Date and R Fagin Simple conditions for guaranteeing higher normal forms in

relational databases ACM Transactions on Database Systems, 17(3), 1992.

[172] C Date and D McGoveran A Guide to Sybase and SQL Server Addison-Wesley, 1993

[173] U Dayal and P Bernstein On the updatability of relational views In Proc Intl Conf.

[174] U Dayal and P Bernstein On the correct translation of update operations on relational

views ACM Transactions on Database Systems, 7(3), 1982.

[175] P DeBra and J Paredaens Horizontal decompositions for handling exceptions to FDs

In Advances in Database Theory, eds H Gallaire, J Minker, and J-M Nicolas, Plenum

Press, 1984

[176] J Deep and P Holfelder Developing CGI applications with Perl Wiley, 1996.

[177] C Delobel Normalization and hierarchical dependencies in the relational data model

[178] D Denning Secure statistical databases with random sample queries ACM

Transactions on Database Systems, 5(3):291–315, 1980.

[179] D E Denning Cryptography and Data Security Addison-Wesley, 1982.

[180] M Derr, S Morishita, and G Phipps The glue-nail deductive database system: Design,

implementation, and evaluation VLDB Journal, 3(2):123–160, 1994.

[181] A Deshpande An implementation for nested relational databases Technical report, PhD thesis, Indiana University, 1989

[182] P Deshpande, K Ramasamy, A Shukla, and J F Naughton Caching multidimensional

queries using chunks In Proc ACM SIGMOD Intl Conf on Management of Data, 1998

[183] O e a Deux The story of O2 IEEE Transactions on Knowledge and Data Engineering,

2(1), 1990

[184] D DeWitt, H.-T Chou, R Katz, and A Klug Design and implementation of the

Wisconsin Storage System Software Practice and Experience, 15(10):943–962, 1985.

[185] D DeWitt, R Gerber, G Graefe, M Heytens, K Kumar, and M Muralikrishna

Gamma—a high performance dataflow database machine In Proc Intl Conf on Very

Large Databases, 1986.

[186] D DeWitt and J Gray Parallel database systems: The future of high-performance

database systems Communications of the ACM, 35(6):85–98, 1992.

[187] D DeWitt, R Katz, F Olken, L Shapiro, M Stonebraker, and D Wood

Implementation techniques for main memory databases In Proc ACM SIGMOD Conf on the

[188] D DeWitt, J Naughton, and D Schneider Parallel sorting on a shared-nothing

architecture using probabilistic splitting In Proc Conf on Parallel and Distributed

Information Systems, 1991.

[189] D DeWitt, J Naughton, D Schneider, and S Seshadri Practical skew handling in

parallel joins In Proc Intl Conf on Very Large Databases, 1992.

[190] O Diaz, N Paton, and P Gray Rule management in object-oriented databases: A

uniform approach In Proc Intl Conf on Very Large Databases, 1991.

[191] S Dietrich Extension tables: Memo relations in logic programming In Proc Intl.

Symp on Logic Programming, 1987.

[192] D Donjerkovic and R Ramakrishnan Probabilistic optimization of top n queries In

[193] W Du and A Elmagarmid Quasi-serializability: a correctness criterion for global

concurrency control in interbase In Proc Intl Conf on Very Large Databases, 1989.

[194] W Du, R Krishnamurthy, and M.-C Shan Query optimization in a heterogeneous

DBMS In Proc Intl Conf on Very Large Databases, 1992.

[195] R C Dubes and A Jain Clustering methodologies in exploratory data analysis,

Advances in Computers Academic Press, New York, 1980.

[196] N Duppel Parallel SQL on TANDEM’s NonStop SQL IEEE COMPCON, 1989

[197] H Edelstein The challenge of replication, Parts 1 and 2 DBMS: Database and Client-

Server Solutions, 1995.

[198] W Effelsberg and T Haerder Principles of database buffer management ACM

Transactions on Database Systems, 9(4):560–595, 1984.

[199] M H Eich A classification and comparison of main memory database recovery

techniques In Proc IEEE Intl Conf on Data Engineering, 1987.

[200] A Eisenberg and J Melton SQL:1999, formerly known as SQL3 ACM SIGMOD Record,

28(1):131–138, 1999

[201] A El Abbadi Adaptive protocols for managing replicated distributed databases In

IEEE Symp on Parallel and Distributed Processing, 1991.

[202] A El Abbadi, D Skeen, and F Cristian An efficient, fault-tolerant protocol for

replicated data management In ACM Symp on Principles of Database Systems, 1985

[203] C Ellis Concurrency in Linear Hashing ACM Transactions on Database Systems,

12(2):195–217, 1987

[204] A Elmagarmid Database Transaction Models for Advanced Applications Morgan

Kaufmann, 1992

[205] A Elmagarmid, J Jing, W Kim, O Bukhres, and A Zhang Global committability

in multidatabase systems IEEE Transactions on Knowledge and Data Engineering,

8(5):816–824, 1996

[206] A Elmagarmid, A Sheth, and M Liu Deadlock detection algorithms in distributed

database systems In Proc IEEE Intl Conf on Data Engineering, 1986.

[207] R Elmasri and S Navathe Object integration in database design In Proc IEEE Intl.

Conf on Data Engineering, 1984.

[208] R Elmasri and S Navathe Fundamentals of Database Systems (2nd ed.)

Benjamin-Cummings, 1994

[209] R Epstein Techniques for processing of aggregates in relational database systems. Technical report, UC-Berkeley, Electronics Research Laboratory, M798, 1979

[210] R Epstein, M Stonebraker, and E Wong Distributed query processing in a relational

data base system In Proc ACM SIGMOD Conf on the Management of Data, 1978.

[211] M Ester, H.-P Kriegel, J Sander, and X Xu A density-based algorithm for

discovering clusters in large spatial databases with noise In Proc Intl Conf on Knowledge

Discovery in Databases and Data Mining, 1995.

[212] M Ester, H.-P Kriegel, and X Xu A database interface for clustering in large spatial

databases In Proc Intl Conf on Knowledge Discovery in Databases and Data Mining,

1995

[213] K Eswaran and D Chamberlin Functional specification of a subsystem for data base

integrity In Proc Intl Conf on Very Large Databases, 1975.

[214] K Eswaran, J Gray, R Lorie, and I Traiger The notions of consistency and predicate

locks in a data base system Communications of the ACM, 19(11):624–633, 1976.

[215] R Fagin Multivalued dependencies and a new normal form for relational databases

[216] R Fagin Normal forms and relational database operators In Proc ACM SIGMOD

[217] R Fagin A normal form for relational databases that is based on domains and keys

[218] R Fagin, J Nievergelt, N Pippenger, and H Strong Extendible Hashing—a fast access

method for dynamic files ACM Transactions on Database Systems, 4(3), 1979

[219] C Faloutsos Access methods for text ACM Computing Surveys, 17(1):49–74, 1985

[220] C Faloutsos Searching Multimedia Databases by Content Kluwer Academic, 1996.

[221] C Faloutsos and S Christodoulakis Signature files: An access method for documents

and its analytical performance evaluation ACM Transactions on Office Information

Systems, 2(4):267–288, 1984.

[222] C Faloutsos and H Jagadish On B-Tree indices for skewed distributions In Proc Intl.

[223] C Faloutsos, R Ng, and T Sellis Predictive load control for flexible buffer allocation

[224] C Faloutsos, M Ranganathan, and Y Manolopoulos Fast subsequence matching

in time-series databases In Proc ACM SIGMOD Conf on the Management of Data,

1994

[225] C Faloutsos and S Roseman Fractals for secondary key retrieval In ACM Symp on

Principles of Database Systems, 1989.

[226] M Fang, N Shivakumar, H Garcia-Molina, R Motwani, and J D Ullman Computing

iceberg queries efficiently In Proc Intl Conf On Very Large Data Bases, 1998.

[227] U Fayyad, G Piatetsky-Shapiro, and P Smyth The KDD process for extracting useful

knowledge from volumes of data Communications of the ACM, 39(11):27–34, 1996

[228] U Fayyad, G Piatetsky-Shapiro, P Smyth, and R Uthurusamy Advances in Knowledge

Discovery and Data Mining MIT Press, 1996.

[229] U Fayyad and E Simoudis Data mining and knowledge discovery: Tutorial notes In

Intl Joint Conf on Artificial Intelligence, 1997.

[230] U M Fayyad, G Piatetsky-Shapiro, P Smyth, and R Uthurusamy, editors Advances

in Knowledge Discovery and Data Mining AAAI/MIT Press, 1996.

[231] U M Fayyad and R Uthurusamy, editors Proc Intl Conf on Knowledge Discovery

and Data Mining AAAI Press, 1995.

[232] M Fernandez, D Florescu, J Kang, A Y Levy, and D Suciu STRUDEL: A Web site

management system In Proc ACM SIGMOD Conf on Management of Data, 1997.

[233] M Fernandez, D Florescu, A Y Levy, and D Suciu A query language for a Web-site

management system SIGMOD Record (ACM Special Interest Group on Management

of Data), 26(3):4–11, 1997.

[234] S Finkelstein, M Schkolnick, and P Tiberio Physical database design for relational

databases IBM Research Review RJ5034, 1986.

[235] D Fishman, D Beech, H Cate, E Chow, T Connors, J Davis, N Derrett, C Hoch,

W Kent, P Lyngbaek, B Mahbod, M.-A Neimat, T Ryan, and M.-C Shan Iris: An

object-oriented database management system ACM Transactions on Office Information

Systems, 5(1):48–69, 1987.

[236] C Fleming and B von Halle Handbook of Relational Database Design Addison-Wesley,

1989

[237] D Florescu, A Y Levy, and A O Mendelzon Database techniques for the

World-Wide Web: A survey SIGMOD Record (ACM Special Interest Group on Management

of Data), 27(3):59–74, 1998.

[238] F Fotouhi and S Pramanik Optimal secondary storage access sequence for performing

relational join IEEE Transactions on Knowledge and Data Engineering, 1(3):318–328,

1989

[239] W B Frakes and R Baeza-Yates, editors Information Retrieval: Data Structures and

Algorithms Prentice-Hall, 1992.

[240] P Franaszek, J Robinson, and A Thomasian Concurrency control for high contention

environments ACM Transactions on Database Systems, 17(2), 1992.

[241] M Franklin Concurrency control and recovery In Handbook of Computer Science,

A.B Tucker (ed.), CRC Press, 1996.

[242] M Franklin, M Carey, and M Livny Local disk caching for client-server database

systems In Proc Intl Conf on Very Large Databases, 1993.

[243] M Franklin, B Jonsson, and D Kossman Performance tradeoffs for client-server query

processing In Proc ACM SIGMOD Conf on the Management of Data, 1996.

[244] P Fraternali and L Tanca A structured approach for the definition of the semantics

of active databases ACM Transactions on Database Systems, 20(4):414–471, 1995

[245] M W Freeston The BANG file: A new kind of Grid File In Proc ACM SIGMOD

[246] J Freytag A rule-based view of query optimization In Proc ACM SIGMOD Conf on

the Management of Data, 1987.

[247] O Friesen, A Lefebvre, and L Vieille VALIDITY: Applications of a DOOD system

In Intl Conf on Extending Database Technology, 1996.

[248] J Fry and E Sibley Evolution of data-base management systems ACM Computing

Surveys, 8(1):7–42, 1976.

[249] T Fukuda, Y Morimoto, S Morishita, and T Tokuyama Mining optimized association

rules for numeric attributes In ACM Symp on Principles of Database Systems, 1996.

[250] A Furtado and M Casanova Updating relational views In Query Processing in

Database Systems eds W Kim, D.S Reiner and D.S Batory, Springer-Verlag, 1985.

[251] S Fushimi, M Kitsuregawa, and H Tanaka An overview of the systems software

of a parallel relational database machine: Grace In Proc Intl Conf on Very Large

Databases, 1986.

[252] V Gaede and O Guenther Multidimensional access methods Computing Surveys,

30(2):170–231, 1998

[253] H Gallaire, J Minker, and J.-M Nicolas (eds.) Advances in Database Theory, Vols 1

and 2 Plenum Press, 1984.

[254] H Gallaire and J Minker (eds.) Logic and Data Bases Plenum Press, 1978.

[255] S Ganguly, W Hasan, and R Krishnamurthy Query optimization for parallel

execution In Proc ACM SIGMOD Conf on the Management of Data, 1992.

[256] R Ganski and H Wong Optimization of nested SQL queries revisited In Proc ACM

[257] V Ganti, J E Gehrke, and R Ramakrishnan Cactus–clustering categorical data using

summaries In Proc ACM Intl Conf on Knowledge Discovery in Databases, 1999.

[258] V Ganti, R Ramakrishnan, J E Gehrke, A Powell, and J French Clustering large

datasets in arbitrary metric spaces In Proc IEEE Intl Conf Data Engineering, 1999

[259] H Garcia-Molina and D Barbara How to assign votes in a distributed system Journal

of the ACM, 32(4), 1985.

[260] H Garcia-Molina, R Lipton, and J Valdes A massive memory system machine IEEE

Transactions on Computers, C33(4):391–399, 1984.

[261] H Garcia-Molina and G Wiederhold Read-only transactions in a distributed database

[262] E Garfield Citation analysis as a tool in journal evaluation Science, 178(4060):471–

479, 1972

[263] A Garg and C Gotlieb Order preserving key transformations ACM Transactions on

Database Systems, 11(2):213–234, 1986.

[264] J E Gehrke, V Ganti, R Ramakrishnan, and W.-Y Loh Boat: Optimistic decision

tree construction In Proc ACM SIGMOD Conf on Management of Data, 1999.

[265] J E Gehrke, R Ramakrishnan, and V Ganti Rainforest: a framework for fast decision

tree construction of large datasets In Proc Intl Conf on Very Large Databases, 1998

[266] S P Ghosh Data Base Organization for Data Management (2nd ed.) Academic Press,

1986

[267] D Gibson, J M Kleinberg, and P Raghavan Clustering categorical data: An approach

based on dynamical systems In Proceedings of the 24th International Conference on

Very Large Databases, pages 311–323, New York City, New York, August 24-27 1998.

[268] D Gibson, J M Kleinberg, and P Raghavan Inferring web communities from link

topology In Proc ACM Conf on Hypertext, Structural Queries, 1998.

[269] G A Gibson Redundant Disk Arrays: Reliable, Parallel Secondary Storage An ACM

Distinguished Dissertation 1991 MIT Press, 1992

[270] D Gifford Weighted voting for replicated data In ACM Symp on Operating Systems

Principles, 1979.

[271] C F Goldfarb and P Prescod The XML Handbook Prentice-Hall PTR, 1998.

[272] R Goldman and J Widom DataGuides: enabling query formulation and optimization

in semistructured databases In Proc Intl Conf on Very Large Data Bases, pages

436–445, 1997

[273] J Goldstein, R Ramakrishnan, U Shaft, and J.-B Yu Processing queries by linear

constraints In Proc ACM Symposium on Principles of Database Systems, 1997.

[274] G Graefe Encapsulation of parallelism in the Volcano query processing system In

[275] G Graefe Query evaluation techniques for large databases ACM Computing Surveys,

25(2), 1993

[276] G Graefe, R Bunker, and S Cooper Hash joins and hash teams in Microsoft SQL Server:

[277] G Graefe and D DeWitt The Exodus optimizer generator In Proc ACM SIGMOD

[278] G Graefe and K Ward Dynamic query optimization plans In Proc ACM SIGMOD

[279] M Graham, A Mendelzon, and M Vardi Notions of dependency satisfaction Journal

of the ACM, 33(1):105–129, 1986.

[280] G Grahne The Problem of Incomplete Information in Relational Databases

Springer-Verlag, 1991

[281] J Gray Notes on data base operating systems In Operating Systems: An Advanced

Course eds Bayer, Graham, and Seegmuller, Springer-Verlag, 1978.

[282] J Gray The transaction concept: Virtues and limitations In Proc Intl Conf on Very

Large Databases, 1981.

[283] J Gray Transparency in its place—the case against transparent access to geographically

distributed data Tandem Computers, TR-89-1, 1989.

[284] J Gray The Benchmark Handbook: for Database and Transaction Processing Systems.

Morgan Kaufmann, 1991

[285] J Gray, A Bosworth, A Layman, and H Pirahesh Data cube: A relational aggregation

operator generalizing group-by, cross-tab and sub-totals In Proc IEEE Intl Conf on

Data Engineering, 1996.

[286] J Gray, R Lorie, G Putzolu, and I Traiger Granularity of locks and degrees of

consistency in a shared data base In Proc of IFIP Working Conf on Modelling of

Data Base Management Systems, 1977.

[287] J Gray, P McJones, M Blasgen, B Lindsay, R Lorie, G Putzolu, T Price, and

I Traiger The recovery manager of the System R database manager ACM Computing

Surveys, 13(2):223–242, 1981.

[288] J Gray and A Reuter Transaction Processing: Concepts and Techniques Morgan

Kaufmann, 1992

[289] P Gray Logic, Algebra, and Databases John Wiley, 1984.

[290] P Griffiths and B Wade An authorization mechanism for a relational database system

[291] G Grinstein Visualization and data mining In Intl Conf on Knowledge Discovery

in Databases, 1996.

[292] S Guha, R Rastogi, and K Shim Cure: an efficient clustering algorithm for large

databases In Proc ACM SIGMOD Conf on Management of Data, 1998.

[293] A Gupta and I Mumick Materialized Views: Techniques, Implementations, and

Applications MIT Press, 1999.

[294] A Gupta, I Mumick, and V Subrahmanian Maintaining views incrementally In Proc.

[295] A Guttman R-trees: a dynamic index structure for spatial searching In Proc ACM

[296] L Haas, W Chang, G Lohman, J McPherson, P Wilms, G Lapis, B Lindsay, H

Pirahesh, M Carey, and E Shekita Starburst mid-flight: As the dust clears IEEE

Transactions on Knowledge and Data Engineering, 2(1), 1990.

[297] L M Haas and A Tiwary, editors SIGMOD 1998, Proceedings of the ACM SIGMOD

International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA ACM Press, 1998.

[298] P Haas, J Naughton, S Seshadri, and L Stokes Sampling-based estimation of the

number of distinct values of an attribute In Proc Intl Conf on Very Large Databases,

1995

[299] P Haas and A Swami Sampling-based selectivity estimation for joins using augmented

frequent value statistics In Proc IEEE Intl Conf on Data Engineering, 1995.

[300] T Haerder and A Reuter Principles of transaction oriented database recovery—a

taxonomy ACM Computing Surveys, 15(4), 1982.

[301] U Halici and A Dogac Concurrency control in distributed databases through time

intervals and short-term locks IEEE Transactions on Software Engineering, 15(8):994–

1003, 1989

[302] M Hall Core Web Programming: HTML, Java, CGI, & Javascript Prentice-Hall,

1997

[303] P Hall Optimization of a simple expression in a relational data base system IBM

Journal of Research and Development, 20(3):244–257, 1976.

[304] G Hamilton, R Cattell, and M Fisher JDBC Database Access With Java: A Tutorial

and Annotated Reference Java Series Addison-Wesley, 1997.

[305] M Hammer and D McLeod Semantic integrity in a relational data base system In

[306] J Han and Y Fu Discovery of multiple-level association rules from large databases

[307] D Hand Construction and Assessment of Classification Rules John Wiley & Sons,

Chichester, England, 1997

[308] E Hanson A performance analysis of view materialization strategies In Proc ACM

[309] E Hanson Rule condition testing and action execution in Ariel In Proc ACM SIGMOD

[310] V Harinarayan, A Rajaraman, and J Ullman Implementing data cubes efficiently In

[311] J Haritsa, M Carey, and M Livny On being optimistic about real-time constraints

In ACM Symp on Principles of Database Systems, 1990.

[312] J Harrison and S Dietrich Maintenance of materialized views in deductive databases:

An update propagation approach In Proc Workshop on Deductive Databases, 1992

[313] D Heckerman Bayesian networks for knowledge discovery In Advances in Knowledge

Discovery and Data Mining eds U.M Fayyad, G Piatetsky-Shapiro, P Smyth, and R.

Uthurusamy, MIT Press, 1996

[314] D Heckerman, H Mannila, D Pregibon, and R Uthurusamy, editors Proceedings of the

Third International Conference on Knowledge Discovery and Data Mining (KDD-97).

AAAI Press, 1997

[315] J. Hellerstein. Optimization and execution techniques for queries with expensive methods. Ph.D. thesis, University of Wisconsin-Madison, 1995.

[316] J. Hellerstein, P. Haas, and H. Wang. Online aggregation. In Proc. ACM SIGMOD

[317] J. Hellerstein, J. Naughton, and A. Pfeffer. Generalized search trees for database systems. In Proc. Intl. Conf. on Very Large Databases, 1995.

[318] J. M. Hellerstein, E. Koutsoupias, and C. H. Papadimitriou. On the analysis of indexing schemes. In Proc. ACM Symposium on Principles of Database Systems, pages 249–256, 1997.

[319] R. Himmeroeder, G. Lausen, B. Ludaescher, and C. Schlepphorst. On a declarative semantics for Web queries. Lecture Notes in Computer Science, 1341:386–398, 1997.

[320] C.-T. Ho, R. Agrawal, N. Megiddo, and R. Srikant. Range queries in OLAP data cubes.

[321] S. Holzner. XML Complete. McGraw-Hill, 1998.

[322] D. Hong, T. Johnson, and U. Chakravarthy. Real-time transaction scheduling: A cost conscious approach. In Proc. ACM SIGMOD Conf. on the Management of Data, 1993.

[323] W. Hong and M. Stonebraker. Optimization of parallel query execution plans in XPRS. In Proc. Intl. Conf. on Parallel and Distributed Information Systems, 1991.

[324] W.-C. Hou and G. Ozsoyoglu. Statistical estimators for aggregate relational algebra queries. ACM Transactions on Database Systems, 16(4), 1991.

[325] H. Hsiao and D. DeWitt. A performance study of three high availability data replication strategies. In Proc. Intl. Conf. on Parallel and Distributed Information Systems, 1991.

[326] J. Huang, J. Stankovic, K. Ramamritham, and D. Towsley. Experimental evaluation of real-time optimistic concurrency control schemes. In Proc. Intl. Conf. on Very Large Databases, 1991.

[327] Y. Huang, A. Sistla, and O. Wolfson. Data replication for mobile computers. In Proc.

[328] Y. Huang and O. Wolfson. A competitive dynamic data replication algorithm. In Proc. IEEE CS IEEE Intl. Conf. on Data Engineering, 1993.

[329] R. Hull. Managing semantic heterogeneity in databases: A theoretical perspective. In ACM Symp. on Principles of Database Systems, 1997.

[330] R. Hull and R. King. Semantic database modeling: Survey, applications, and research issues. ACM Computing Surveys, 19(3):201–260, 1987.

[331] R. Hull and J. Su. Algebraic and calculus query languages for recursively typed complex objects. Journal of Computer and System Sciences, 47(1):121–156, 1993.

[332] R. Hull and M. Yoshikawa. ILOG: declarative creation and manipulation of object-identifiers. In Proc. Intl. Conf. on Very Large Databases, 1990.

[333] J. Hunter. Java Servlet Programming. O’Reilly & Associates, Inc., 1998.

[334] T. Imielinski and H. Korth (eds.). Mobile Computing. Kluwer Academic, 1996.

[335] T. Imielinski and W. Lipski. Incomplete information in relational databases. Journal of the ACM, 31(4):761–791, 1984.

[336] T. Imielinski and H. Mannila. A database perspective on knowledge discovery. Communications of the ACM, 39(11):58–64, 1996.

[337] T. Imielinski, S. Viswanathan, and B. Badrinath. Energy efficient indexing on air. In

[338] Y. Ioannidis. Query optimization. In Handbook of Computer Science. ed. A.B. Tucker, CRC Press, 1996.

[339] Y. Ioannidis and S. Christodoulakis. Optimal histograms for limiting worst-case error propagation in the size of join results. ACM Transactions on Database Systems, 1993.

[340] Y. Ioannidis and Y. Kang. Randomized algorithms for optimizing large join queries. In

[341] Y. Ioannidis and Y. Kang. Left-deep vs. bushy trees: An analysis of strategy spaces and its implications for query optimization. In Proc. ACM SIGMOD Conf. on the

[342] Y. Ioannidis, R. Ng, K. Shim, and T. Sellis. Parametric query processing. In Proc. Intl.

[343] Y. Ioannidis and R. Ramakrishnan. Containment of conjunctive queries: Beyond relations as sets. ACM Transactions on Database Systems, 20(3):288–324, 1995.

[344] Y. E. Ioannidis. Universality of serial histograms. In Proc. Intl. Conf. on Very Large Databases, 1993.

[345] H. Jagadish, D. Lieuwen, R. Rastogi, A. Silberschatz, and S. Sudarshan. Dali: a high performance main-memory storage manager. In Proc. Intl. Conf. on Very Large Databases, 1994.

[346] A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice-Hall, 1988.

[347] S. Jajodia and D. Mutchler. Dynamic voting algorithms for maintaining the consistency of a replicated database. ACM Transactions on Database Systems, 15(2):230–280, 1990.

[348] S. Jajodia and R. Sandhu. Polyinstantiation integrity in multilevel relations. In Proc. IEEE Symp. on Security and Privacy, 1990.

[349] M. Jarke and J. Koch. Query optimization in database systems. ACM Computing Surveys, 16(2):111–152, 1984.

[350] K. S. Jones and P. Willett, editors. Readings in Information Retrieval. Multimedia Information and Systems. Morgan Kaufmann Publishers, 1997.

[351] J. Jou and P. Fischer. The complexity of recognizing 3nf schemes. Information Processing Letters, 14(4):187–190, 1983.

[352] R. J. B. Jr. Efficiently mining long patterns from databases. In Haas and Tiwary [297].

[353] N. Kabra and D. J. DeWitt. Efficient mid-query re-optimization of sub-optimal query execution plans. In Proc. ACM SIGMOD Intl. Conf. on Management of Data, 1998.

[354] Y. Kambayashi, M. Yoshikawa, and S. Yajima. Query processing for distributed databases using generalized semi-joins. In Proc. ACM SIGMOD Conf. on the Management of Data, 1982.

[355] P. Kanellakis. Elements of relational database theory. In Handbook of Theoretical Computer Science. ed. J. Van Leeuwen, Elsevier, 1991.

[356] P. Kanellakis. Constraint programming and database languages: A tutorial. In ACM Symp. on Principles of Database Systems, 1995.

[357] L. Kaufman and P. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley and Sons, 1990.

[358] D. Keim and H.-P. Kriegel. VisDB: a system for visualizing large databases. In Proc.

[359] D. Keim and H.-P. Kriegel. Visualization techniques for mining large databases: A comparison. IEEE Transactions on Knowledge and Data Engineering, 8(6):923–938, 1996.

[360] A. Keller. Algorithms for translating view updates to database updates for views involving selections, projections, and joins. ACM Symp. on Principles of Database Systems, 1985.

[361] W. Kent. Data and Reality, Basic Assumptions in Data Processing Reconsidered. North-Holland, 1978.

[362] W. Kent, R. Ahmed, J. Albert, M. Ketabchi, and M.-C. Shan. Object identification in multi-database systems. In IFIP Intl. Conf. on Data Semantics, 1992.
