Additional Topics 823
developing transactions with the ACID properties. In addition to providing a uniform interface to the services of different resource managers, a TP monitor also routes transactions to the appropriate resource managers. Finally, a TP monitor ensures that an application behaves as a transaction by implementing concurrency control, logging, and recovery functions, and by exploiting the transaction processing capabilities of the underlying resource managers.
TP monitors are used in environments where applications require advanced features such as access to multiple resource managers; sophisticated request routing (also called workflow management); assigning priorities to transactions and doing priority-based load-balancing across servers; and so on. A DBMS provides many of the functions supported by a TP monitor in addition to processing queries and database updates efficiently. A DBMS is appropriate for environments where the wealth of transaction management capabilities provided by a TP monitor is not necessary and, in particular, where very high scalability (with respect to transaction processing activity) and interoperability are not essential.
The transaction processing capabilities of database systems are improving continually. For example, many vendors offer distributed DBMS products today in which a transaction can execute across several resource managers, each of which is a DBMS. Currently, all the DBMSs must be from the same vendor; however, as transaction-oriented services from different vendors become more standardized, distributed, heterogeneous DBMSs should become available. Eventually, perhaps, the functions of current TP monitors will also be available in many DBMSs; for now, TP monitors provide essential infrastructure for high-end transaction processing environments.
28.1.2 New Transaction Models
Consider an application such as computer-aided design, in which users retrieve large design objects from a database and interactively analyze and modify them. Each transaction takes a long time (minutes or even hours, whereas the TPC benchmark transactions take under a millisecond), and holding locks this long affects performance. Further, if a crash occurs, undoing an active transaction completely is unsatisfactory, since considerable user effort may be lost. Ideally we want to be able to restore most of the actions of an active transaction and resume execution. Finally, if several users are concurrently developing a design, they may want to see changes being made by others without waiting until the end of the transaction that changes the data.
To address the needs of long-duration activities, several refinements of the transaction concept have been proposed. The basic idea is to treat each transaction as a collection of related subtransactions. Subtransactions can acquire locks, and the changes made by a subtransaction become visible to other transactions after the subtransaction ends (and before the main transaction of which it is a part commits). In multilevel transactions, locks held by a subtransaction are released when the subtransaction ends. In nested transactions, locks held by a subtransaction are assigned to the parent (sub)transaction when the subtransaction ends. These refinements to the transaction concept have a significant effect on concurrency control and recovery algorithms.
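The difference between the two models lies in what happens to a subtransaction's locks when it ends. The following is a minimal sketch of that one rule only, not of a full lock manager; the `Transaction` class and the `end_subtransaction` function are hypothetical names introduced for illustration.

```python
class Transaction:
    """A (sub)transaction that simply remembers which items it has locked."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.locks = set()

    def acquire(self, item):
        self.locks.add(item)


def end_subtransaction(sub, model):
    """Dispose of sub's locks when it ends.

    model is "multilevel" (locks are released) or "nested" (locks are
    handed to the parent). Returns the set of locks that became free.
    """
    if model == "multilevel":
        released = set(sub.locks)
    elif model == "nested":
        sub.parent.locks |= sub.locks   # parent inherits the locks
        released = set()
    else:
        raise ValueError(f"unknown model: {model}")
    sub.locks.clear()
    return released


parent = Transaction("T")
child = Transaction("T1", parent)
child.acquire("design_object_17")
freed = end_subtransaction(child, "nested")
```

Under the nested model nothing is freed and the parent now holds the lock; calling the same function with `"multilevel"` would instead return the lock as released.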
be placed. A soft deadline means the value of the transaction decreases after the deadline, eventually going to zero. For example, in a DBMS designed to monitor some activity (e.g., a complex reactor), a transaction that looks up the current reading of a sensor must be executed within a short time, say, one second. The longer it takes to execute the transaction, the less useful the reading becomes. In a real-time DBMS, the goal is to maximize the value of executed transactions, and the DBMS must prioritize transactions, taking their deadlines into account.
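One way to picture a soft deadline is as a value function over completion time. The sketch below assumes a linear decay after the deadline; the text does not prescribe any particular shape, and the function name and the `decay_window` parameter are hypothetical.

```python
def soft_deadline_value(finish_time, deadline, full_value=1.0, decay_window=1.0):
    """Value of a transaction that completes at finish_time (in seconds).

    Assumption: full value up to the deadline, then a linear drop that
    reaches zero decay_window seconds after the deadline.
    """
    if finish_time <= deadline:
        return full_value
    overrun = finish_time - deadline
    remaining_fraction = max(0.0, 1.0 - overrun / decay_window)
    return full_value * remaining_fraction
```

A scheduler that tries to maximize total value could, for instance, compare this quantity across waiting transactions when deciding which one to run next.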
28.2 INTEGRATED ACCESS TO MULTIPLE DATA SOURCES
As databases proliferate, users want to access data from more than one source. For example, if several travel agents market their travel packages through the Web, customers would like to look at packages from different agents and compare them. A more traditional example is that large organizations typically have several databases, created (and maintained) by different divisions such as Sales, Production, and Purchasing. While these databases contain much common information, determining the exact relationship between tables in different databases can be a complicated problem. For example, prices in one database might be in dollars per dozen items, while prices in another database might be in dollars per item. The development of XML DTDs (see Section 22.3.3) offers the promise that such semantic mismatches can be avoided if all parties conform to a single standard DTD. However, there are many legacy databases and most domains still do not have agreed-upon DTDs; the problem of semantic mismatches will be frequently encountered for the foreseeable future.
Semantic mismatches can be resolved and hidden from users by defining relational views over the tables from the two databases. Defining a collection of views to give a group of users a uniform presentation of relevant data from multiple databases is called semantic integration. Creating views that mask semantic mismatches in a natural manner is a difficult task and has been widely studied. In practice, the task is made harder by the fact that the schemas of existing databases are often poorly
in Chapter 5. Alternatively, the integrating views can be materialized and stored in a data warehouse, as discussed in Chapter 23. Queries can then be executed over the warehoused data without accessing the source DBMSs at run-time.
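The dollars-per-dozen versus dollars-per-item mismatch mentioned earlier can be hidden behind a view. The sketch below uses Python's standard `sqlite3` module with both tables in one in-memory database, which sidesteps the real difficulty of reaching two separate DBMSs; the table, column, and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical Sales table: prices recorded per dozen items.
    CREATE TABLE sales_items (item TEXT PRIMARY KEY, price_per_dozen REAL);
    -- Hypothetical Purchasing table: prices recorded per item.
    CREATE TABLE purchasing_items (item TEXT PRIMARY KEY, price_per_item REAL);

    INSERT INTO sales_items VALUES ('bolt', 6.0);
    INSERT INTO purchasing_items VALUES ('nut', 0.25);

    -- Integrating view: every price is presented in dollars per item.
    CREATE VIEW unified_prices AS
        SELECT item, price_per_dozen / 12.0 AS price_per_item, 'Sales' AS source
        FROM sales_items
        UNION ALL
        SELECT item, price_per_item, 'Purchasing'
        FROM purchasing_items;
""")

rows = conn.execute(
    "SELECT item, price_per_item, source FROM unified_prices ORDER BY item"
).fetchall()
```

Users who query `unified_prices` see one consistent unit and never need to know which source used which convention.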
28.3 MOBILE DATABASES
The availability of portable computers and wireless communications has created a new breed of nomadic database users. At one level, these users are simply accessing a database through a network, which is similar to distributed DBMSs. At another level, the network as well as data and user characteristics now have several novel properties, which affect basic assumptions in many components of a DBMS, including the query engine, transaction manager, and recovery manager:
Users are connected through a wireless link whose bandwidth is ten times less than Ethernet and 100 times less than ATM networks. Communication costs are therefore significantly higher in proportion to I/O and CPU costs.
Users’ locations are constantly changing, and mobile computers have a limited battery life. Therefore, the true communication costs reflect connection time and battery usage in addition to bytes transferred, and change constantly depending on location. Data is frequently replicated to minimize the cost of accessing it from different locations.
As a user moves around, data could be accessed from multiple database servers within a single transaction. The likelihood of losing connections is also much greater than in a traditional network. Centralized transaction management may therefore be impractical, especially if some data is resident at the mobile computers. We may in fact have to give up on ACID transactions and develop alternative notions of consistency for user programs.
The price of main memory is now low enough that we can buy enough main memory to hold the entire database for many applications; with 64-bit addressing, modern CPUs also have very large address spaces. Some commercial systems now have several gigabytes of main memory. This shift prompts a reexamination of some basic DBMS
a bottleneck. To minimize this problem, rather than commit each transaction as it completes, we can collect completed transactions and commit them in batches; this is called group commit. Recovery algorithms can also be optimized since pages rarely have to be written out to make room for other pages.
The implementation of in-memory operations has to be optimized carefully since disk accesses are no longer the limiting factor for performance.
A new criterion must be considered while optimizing queries, namely the amount of space required to execute a plan. It is important to minimize the space overhead because exceeding available physical memory would lead to swapping pages to disk (through the operating system’s virtual memory mechanisms), greatly slowing down execution.
Page-oriented data structures become less important (since pages are no longer the unit of data retrieval), and clustering is not important (since the cost of accessing any region of main memory is uniform).
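The group commit idea from the list above can be sketched as a log that forces commit records to stable storage once per batch rather than once per transaction. This is a toy model under stated assumptions: the disk write is only simulated by a counter, and the class name and `batch_size` parameter are hypothetical.

```python
class GroupCommitLog:
    """Toy log that forces buffered commit records in batches."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = []      # commit records not yet forced to disk
        self.durable = []      # commit records already forced
        self.flush_count = 0   # number of simulated log writes

    def commit(self, txn_id):
        """Buffer a commit record; force the log once the batch is full."""
        self.pending.append(txn_id)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """One simulated disk write covers every pending commit record."""
        if not self.pending:
            return
        self.durable.extend(self.pending)
        self.pending.clear()
        self.flush_count += 1


log = GroupCommitLog(batch_size=4)
for txn_id in range(10):
    log.commit(txn_id)
log.flush()  # force the final, partially filled batch
```

Ten transactions become durable with three log writes instead of ten. A real system would also bound how long a completed transaction may wait for its batch to fill.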
In an object-relational DBMS, users can define ADTs with appropriate methods, which is an improvement over an RDBMS. Nonetheless, supporting just ADTs falls short of what is required to deal with very large collections of multimedia objects, including audio, images, free text, text marked up in HTML or variants, sequence data, and videos. Illustrative applications include NASA’s EOS project, which aims to create a repository of satellite imagery, the Human Genome project, which is creating databases of genetic information such as GenBank, and NSF/DARPA’s Digital Libraries project, which aims to put entire libraries into database systems and then make them accessible through computer networks. Industrial applications such as collaborative development of engineering designs also require multimedia database management, and are being addressed by several vendors.
We outline some applications and challenges in this area:
Content-based retrieval: Users must be able to specify selection conditions based on the contents of multimedia objects. For example, users may search for images using queries such as: “Find all images that are similar to this image” and “Find all images that contain at least three airplanes.” As images are inserted into the database, the DBMS must analyze them and automatically extract features that will help answer such content-based queries. This information can then be used to search for images that satisfy a given query, as discussed in Chapter 26. As another example, users would like to search for documents of interest using information retrieval techniques and keyword searches. Vendors are moving towards incorporating such techniques into DBMS products. It is still not clear how these domain-specific retrieval and search techniques can be combined effectively with traditional DBMS queries. Research into abstract data types and ORDBMS query processing has provided a starting point, but more work is needed.
Managing repositories of large objects: Traditionally, DBMSs have concentrated on tables that contain a large number of tuples, each of which is relatively small. Once multimedia objects such as images, sound clips, and videos are stored in a database, individual objects of very large size have to be handled efficiently. For example, compression techniques must be carefully integrated into the DBMS environment. As another example, distributed DBMSs must develop techniques to efficiently retrieve such objects. Retrieval of multimedia objects in a distributed system has been addressed in limited contexts, such as client-server systems, but in general remains a difficult problem.
Video-on-demand: Many companies want to provide video-on-demand services that enable users to dial into a server and request a particular video. The video must then be delivered to the user’s computer in real time, reliably and inexpensively. Ideally, users must be able to perform familiar VCR functions such as fast-forward and reverse. From a database perspective, the server has to contend with specialized real-time constraints; video delivery rates must be synchronized at the server and at the client, taking into account the characteristics of the communication network.
Geographic Information Systems (GIS) contain spatial information about cities, states, countries, streets, highways, lakes, rivers, and other geographical features, and support applications to combine such spatial information with non-spatial data. As discussed in Chapter 26, spatial data is stored in either raster or vector formats. In addition, there is often a temporal dimension, as when we measure rainfall at several locations over time. An important issue with spatial data sets is how to integrate data from multiple sources, since each source may record data using a different coordinate system to identify locations.
Now let us consider how spatial data in a GIS is analyzed. Spatial information is most naturally thought of as being overlaid on maps. Typical queries include “What cities lie on I-94 between Madison and Chicago?” and “What is the shortest route from Madison to St. Louis?” These kinds of queries can be addressed using the techniques
In addition, many applications involve interpolating measurements at certain locations across an entire region to obtain a model, and combining overlapping models. For example, if we have measured rainfall at certain locations, we can use the TIN approach to triangulate the region with the locations at which we have measurements being the vertices of the triangles. Then, we use some form of interpolation to estimate the rainfall at points within triangles. Interpolation, triangulation, map overlays, visualizations of spatial data, and many other domain-specific operations are supported in GIS products such as ESRI Systems’ ARC-Info. Thus, while spatial query processing techniques as discussed in Chapter 26 are an important part of a GIS product, considerable additional functionality must be incorporated as well. How best to extend ORDBMS systems with this additional functionality is an important problem yet to be resolved. Agreeing upon standards for data representation formats and coordinate systems is another major challenge facing the field.
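The interpolation step within one triangle of a TIN can be illustrated with linear (barycentric) interpolation, which is one common choice; the text only says "some form of interpolation," so this is an assumption. Building the triangulation itself is omitted, and the function name is hypothetical.

```python
def interpolate_in_triangle(point, vertices, values):
    """Linearly interpolate a measurement at `point` inside one triangle.

    vertices: three (x, y) locations where measurements were taken.
    values:   the three measurements (e.g., rainfall) at those vertices.
    """
    (x1, y1), (x2, y2), (x3, y3) = vertices
    x, y = point
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    # Barycentric weights: each is 1 at its own vertex and 0 at the others.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -1e-12:
        raise ValueError("point lies outside the triangle")
    return w1 * values[0] + w2 * values[1] + w3 * values[2]


stations = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]   # hypothetical gauge locations
rainfall = [10.0, 20.0, 30.0]                     # hypothetical readings
estimate = interpolate_in_triangle((1.0, 1.0), stations, rainfall)
```

At a vertex the estimate equals the measured value, and elsewhere it varies linearly across the triangle.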
Currently available DBMSs provide little support for queries over ordered collections of records, or sequences, and over temporal data. Typical sequence queries include “Find the weekly moving average of the Dow Jones Industrial Average,” and “Find the first five consecutively increasing temperature readings” (from a trace of temperature observations). Such queries can be easily expressed and often efficiently executed by systems that support query languages designed for sequences. Some commercial SQL systems now support such SQL extensions.
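Both example queries depend on record order, which is what makes them awkward in a set-oriented language. The sketch below expresses them procedurally over an in-memory list; it is not the syntax of any sequence query language, and the function names are hypothetical.

```python
from collections import deque


def moving_average(values, window):
    """Yield the mean of each full trailing window of `window` values."""
    buf = deque()
    total = 0.0
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            total -= buf.popleft()
        if len(buf) == window:
            yield total / window


def first_increasing_run(values, length):
    """Return the first `length` consecutive strictly increasing values, or None."""
    run_start = 0
    for i in range(len(values)):
        if i > 0 and values[i] <= values[i - 1]:
            run_start = i          # the increasing run is broken; restart here
        if i - run_start + 1 == length:
            return values[run_start:i + 1]
    return None


averages = list(moving_average([1, 2, 3, 4, 5], window=3))
run = first_increasing_run([5, 3, 4, 6, 2, 7, 8, 9], length=3)
```

Each function makes a single pass over the sequence, which hints at why sequence-aware systems can often execute such queries efficiently.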
The first example is also a temporal query. However, temporal queries involve more than just record ordering. For example, consider the following query: “Find the longest interval in which the same person managed two different departments.” If the period during which a given person managed a department is indicated by two fields from and to, we have to reason about a collection of intervals, rather than a sequence of records. Further, temporal queries require the DBMS to be aware of the anomalies associated with calendars (such as leap years). Temporal extensions are likely to be incorporated in future versions of the SQL standard.
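The management query can be sketched as a pairwise interval-overlap computation. This is a simplified reading that compares two management periods at a time; the record layout `(person, dept, from, to)` follows the text, while the function name and sample data are hypothetical. Python's `datetime.date` takes care of calendar details such as leap years.

```python
from datetime import date
from itertools import combinations


def longest_shared_interval(records):
    """records: list of (person, dept, from_date, to_date) tuples.

    Returns (person, start, end) for the longest period during which one
    person managed two different departments, or None if there is none.
    """
    best = None
    for a, b in combinations(records, 2):
        if a[0] != b[0] or a[1] == b[1]:
            continue                      # need same person, different departments
        start = max(a[2], b[2])
        end = min(a[3], b[3])
        if start > end:
            continue                      # the two periods do not overlap
        if best is None or (end - start) > (best[2] - best[1]):
            best = (a[0], start, end)
    return best


managed = [
    ("ann", "toys", date(2020, 1, 1), date(2020, 12, 31)),
    ("ann", "books", date(2020, 6, 1), date(2021, 6, 1)),
    ("bob", "toys", date(2021, 1, 1), date(2021, 3, 1)),
    ("bob", "tools", date(2021, 2, 1), date(2021, 2, 15)),
]
result = longest_shared_interval(managed)
```

The computation works on intervals rather than on record order, which is the distinction the paragraph above draws between temporal and sequence queries.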
A distinct and important class of sequence data consists of DNA sequences, which are being generated at a rapid pace by the biological community. These are in fact closer to sequences of characters in text than to time sequences as in the above examples. The field of biological information management and analysis has become very popular in recent years, and is called bioinformatics. Biological data, such as DNA sequence data, is characterized by complex structure and numerous relationships among data elements, many overlapping and incomplete or erroneous data fragments (because experimentally collected data from several groups, often working on related problems, is stored in the databases), a need to frequently change the database schema itself as new kinds of relationships in the data are discovered, and the need to maintain several versions of data for archival and reference.
28.8 INFORMATION VISUALIZATION
As computers become faster and main memory becomes cheaper, it becomes increasingly feasible to create visual presentations of data, rather than just text-based reports. Data visualization makes it easier for users to understand the information in large complex datasets. The challenge here is to make it easy for users to develop visual presentations of their data and to interactively query such presentations. Although a number of data visualization tools are available, efficient visualization of large datasets presents many challenges.
The need for visualization is especially important in the context of decision support; when users are confronted with large quantities of high-dimensional data and various kinds of data summaries produced by using analysis tools such as SQL, OLAP, and data mining algorithms, the information can be overwhelming. Visualizing the data, together with the generated summaries, can be a powerful way to sift through this information and spot interesting trends or patterns. The human eye, after all, is very good at finding patterns. A good framework for data mining must combine analytic tools to process data, and bring out latent anomalies or trends, with a visualization environment in which a user can notice these patterns and interactively drill down to the original data for further analysis.
The database area continues to grow vigorously, both in terms of technology and in terms of applications. The fundamental reason for this growth is that the amount of information stored and processed using computers is growing rapidly. Regardless of the nature of the data and its intended applications, users need database management systems and their services (concurrent access, crash recovery, easy and efficient querying, etc.) as the volume of data increases. As the range of applications is broadened, however, some shortcomings of current DBMSs become serious limitations. These problems are being actively studied in the database research community.
The coverage in this book provides a good introduction, but is not intended to cover all aspects of database systems. Ample material is available for further study, as this
Determining which entities are the same across different databases is a difficult problem; it is an example of a semantic mismatch. Resolving such mismatches has been addressed in many papers, including [362, 412, 558, 576]. [329] is an overview of theoretical work in this area. Also see the bibliographic notes for Chapter 21 for references to related work on multidatabases, and see the notes for Chapter 2 for references to work on view integration.
[260] is an early paper on main memory databases. [345, 89] describe the Dali main memory storage manager. [359] surveys visualization idioms designed for large databases, and [291] discusses visualization for data mining.
Visualization systems for databases include DataSpace [515], DEVise [424], IVEE [23], the Mineset suite from SGI, Tioga [27], and VisDB [358]. In addition, a number of general tools are available for data visualization.
Querying text repositories has been studied extensively in information retrieval; see [545] for a recent survey. This topic has generated considerable interest in the database community recently because of the widespread use of the Web, which contains many text sources. In particular, HTML documents have some structure if we interpret links as edges in a graph. Such documents are examples of semistructured data; see [2] for a good overview. Recent papers on queries over the Web include [2, 384, 457, 493].
See [501] for a survey of multimedia issues in database management. There has been much recent interest in database issues in a mobile computing environment, for example, [327, 337]. See [334] for a collection of articles on this subject. [639] contains several articles that cover all aspects of temporal databases. The use of constraints in databases has been actively investigated in recent years; [356] is a good overview. Geographic Information Systems have also been studied extensively; [511] describes the Paradise system, which is notable for its scalability.
The book [695] contains detailed discussions of temporal databases (including the TSQL2 language, which is influencing the SQL standard), spatial and multimedia databases, and uncertainty in databases. Another SQL extension to query sequence data, called SRQL, is proposed in [532].
A DATABASE DESIGN CASE STUDY: THE INTERNET SHOP
Advice for software developers and horse racing enthusiasts: Avoid hacks.
—Anonymous
We now present an illustrative, ‘cradle-to-grave’ design example. DBDudes Inc., a well-known database consulting firm, has been called in to help Barns and Nobble (B&N) with their database design and implementation. B&N is a large bookstore specializing in books on horse racing, and they’ve decided to go online. DBDudes first verify that B&N is willing and able to pay their steep fees and then schedule a lunch meeting (billed to B&N, naturally) to do requirements analysis.
The owner of B&N has thought about what he wants and offers a concise summary:
“I would like my customers to be able to browse my catalog of books and to place orders over the Internet. Currently, I take orders over the phone. I have mostly corporate customers who call me and give me the ISBN number of a book and a quantity. I then prepare a shipment that contains the books they have ordered. If I don’t have enough copies in stock, I order additional copies and delay the shipment until the new copies arrive; I want to ship a customer’s entire order together. My catalog includes all the books that I sell. For each book, the catalog contains its ISBN number, title, author, purchase price, sales price, and the year the book was published. Most of my customers are regulars, and I have records with their name, address, and credit card number. New customers have to call me first and establish an account before they can use my Web site.
On my new Web site, customers should first identify themselves by their unique customer identification number. Then they should be able to browse my catalog and to place orders online.”
DBDudes’s consultants are a little surprised by how quickly the requirements phase was completed (it usually takes them weeks of discussions, and many lunches and dinners, to get this done), but they return to their offices to analyze this information.
In the conceptual design step, DBDudes develop a high level description of the data in terms of the ER model. Their initial design is shown in Figure A.1. Books and customers are modeled as entities and are related through orders that customers place. Orders is a relationship set connecting the Books and Customers entity sets. For each order, the following attributes are stored: quantity, order date, and ship date. As soon as an order is shipped, the ship date is set; until then the ship date is set to null, indicating that this order has not been shipped yet.
DBDudes has an internal design review at this point, and several questions are raised. To protect their identities, we will refer to the design team leader as Dude 1 and the design reviewer as Dude 2:
Dude 2: What if a customer places two orders for the same book on the same day?
Dude 1: The first order is handled by creating a new Orders relationship and the second order is handled by updating the value of the quantity attribute in this relationship.
Dude 2: What if a customer places two orders for different books on the same day?
Dude 1: No problem. Each instance of the Orders relationship set relates the customer to a different book.
Dude 2: Ah, but what if a customer places two orders for the same book on different days?
Dude 1: We can use the attribute order_date of the Orders relationship to distinguish the two orders.
Dude 2: Oh no you can’t. The attributes of Customers and Books must jointly contain a key for Orders. So this design does not allow a customer to place orders for the same book on different days.
Dude 1: Yikes, you’re right. Oh well, B&N probably won’t care; we’ll see.
DBDudes decides to proceed with the next phase, logical database design.
Using the standard approach discussed in Chapter 3, DBDudes maps the ER diagram shown in Figure A.1 to the relational model, generating the following tables:
CREATE TABLE Books ( isbn CHAR(10),
                     ...
                     PRIMARY KEY (isbn) )
CREATE TABLE Orders ( isbn CHAR(10),
                      cid INTEGER,
                      qty INTEGER,
                      order_date DATE,
                      ship_date DATE,
                      PRIMARY KEY (isbn,cid),
                      FOREIGN KEY (isbn) REFERENCES Books,
                      FOREIGN KEY (cid) REFERENCES Customers )
CREATE TABLE Customers ( cid INTEGER,
                         cname CHAR(80),
                         address CHAR(200),
                         cardnum CHAR(16),
                         PRIMARY KEY (cid),
                         UNIQUE (cardnum) )
Figure A.1 ER Diagram of the Initial Design
[Entity sets Books (isbn, title, author, price, year_published, qty_in_stock) and Customers (cid, cname, address, cardnum), connected by the relationship set Orders (qty, order_date, ship_date).]
The design team leader, who is still brooding over the fact that the review exposed a flaw in the design, now has an inspiration. The Orders table contains the field order_date and the key for the table contains only the fields isbn and cid. Because of this, a customer cannot order the same book on different days, a restriction that was not intended. Why not add the order_date attribute to the key for the Orders table? This would eliminate the unwanted restriction:
CREATE TABLE Orders ( isbn CHAR(10),
                      ...
                      PRIMARY KEY (isbn,cid,order_date), ... )
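The effect of the two candidate keys can be checked directly. The sketch below uses Python's standard `sqlite3` module as a stand-in DBMS and tries to record two orders by the same customer for the same book on different days; the helper name and the sample ISBN and dates are hypothetical, and `qty` is assumed to be an integer column.

```python
import sqlite3


def accepts_reorder(primary_key):
    """Return True if the same customer can order the same book on two days."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"""
        CREATE TABLE Orders (
            isbn CHAR(10), cid INTEGER, qty INTEGER,
            order_date DATE, ship_date DATE,
            PRIMARY KEY ({primary_key}))""")
    conn.execute(
        "INSERT INTO Orders VALUES ('0000000001', 1, 2, '2000-03-01', NULL)")
    try:
        conn.execute(
            "INSERT INTO Orders VALUES ('0000000001', 1, 1, '2000-03-02', NULL)")
        return True
    except sqlite3.IntegrityError:
        return False   # the key constraint rejected the second order
    finally:
        conn.close()


initial_design = accepts_reorder("isbn, cid")
widened_key = accepts_reorder("isbn, cid, order_date")
```

With the key (isbn, cid) the second order is rejected, and with order_date added to the key it is accepted. The widened key would still reject two orders for the same book on the same day.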
The reviewer, Dude 2, is not entirely happy with this solution, which he calls a ‘hack’. He points out that there is no natural ER diagram that reflects this design, and stresses the importance of the ER diagram as a design document. Dude 1 argues that while Dude 2 has a point, it is important to present B&N with a preliminary design and get feedback; everyone agrees with this, and they go back to B&N.
The owner of B&N now brings up some additional requirements that he did not mention during the initial discussions: “Customers should be able to purchase several different books in a single order. For example, if a customer wants to purchase three copies of ‘The English Teacher’ and two copies of ‘The Character of Physical Law,’ the customer should be able to place a single order for both books.”
The design team leader, Dude 1, asks how this affects the shipping policy. Does B&N still want to ship all books in an order together? The owner of B&N explains their shipping policy: “As soon as we have enough copies of an ordered book we ship it, even if an order contains several books. So it could happen that the three copies of ‘The English Teacher’ are shipped today because we have five copies in stock, but that ‘The Character of Physical Law’ is shipped tomorrow, because we currently have only one copy in stock and another copy arrives tomorrow. In addition, my customers could place more than one order per day, and they want to be able to identify the orders they placed.”
The DBDudes team thinks this over and identifies two new requirements: first, it must be possible to order several different books in a single order, and second, a customer must be able to distinguish between several orders placed the same day. To accommodate these requirements, they introduce a new attribute into the Orders table called ordernum, which uniquely identifies an order and therefore the customer placing the order. However, since several books could be purchased in a single order, ordernum and isbn are both needed to determine qty and ship_date in the Orders table.
Orders are assigned order numbers sequentially and orders that are placed later have higher order numbers. If several orders are placed by the same customer on a single day, these orders have different order numbers and can thus be distinguished. The SQL DDL statement to create the modified Orders table is given below:
CREATE TABLE Orders ( ordernum INTEGER,
                      ...
                      PRIMARY KEY (ordernum, isbn), ... )
Next, DBDudes analyzes the set of relations for possible redundancy. The Books relation has only one key (isbn), and no other functional dependencies hold over the table. Thus, Books is in BCNF. The Customers relation has the key (cid), and since a credit card number uniquely identifies its card holder, the functional dependency cardnum → cid also holds. Since cid is a key, cardnum is also a key. No other dependencies hold, and so Customers is also in BCNF.
DBDudes has already identified the pair ⟨ordernum, isbn⟩ as the key for the Orders table. In addition, since each order is placed by one customer on one specific date, the following two functional dependencies hold:
ordernum → cid, and ordernum → order_date
The experts at DBDudes conclude that Orders is not even in 3NF. (Can you see why?) They decide to decompose Orders into the following two relations:
Orders(ordernum, cid, order_date), and
Orderlists(ordernum, isbn, qty, ship_date)
The resulting two relations, Orders and Orderlists, are both in BCNF, and the decomposition is lossless-join since ordernum is a key for (the new) Orders. The reader is invited to check that this decomposition is also dependency-preserving. For completeness, we give the SQL DDL for the Orders and Orderlists relations below:
CREATE TABLE Orders ( ordernum INTEGER,
                      cid INTEGER,
                      order_date DATE,
                      PRIMARY KEY (ordernum),
                      FOREIGN KEY (cid) REFERENCES Customers )
CREATE TABLE Orderlists ( ordernum INTEGER,
                          isbn CHAR(10),
                          qty INTEGER,
                          ship_date DATE,
                          PRIMARY KEY (ordernum, isbn),
                          FOREIGN KEY (isbn) REFERENCES Books )
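The lossless-join claim can be spot-checked on a small instance: project the undecomposed Orders table onto the two new schemas, join the projections on ordernum, and compare with the original. The sketch below does this with Python's standard `sqlite3` module; the name `WideOrders` for the undecomposed table and all sample rows are hypothetical. A check on one instance illustrates the property but does not prove it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical instance of Orders before decomposition.
    CREATE TABLE WideOrders (
        ordernum INTEGER, isbn CHAR(10), cid INTEGER, qty INTEGER,
        order_date DATE, ship_date DATE,
        PRIMARY KEY (ordernum, isbn));
    INSERT INTO WideOrders VALUES (1, 'isbn-A', 7, 3, '2000-03-01', '2000-03-01');
    INSERT INTO WideOrders VALUES (1, 'isbn-B', 7, 2, '2000-03-01', NULL);
    INSERT INTO WideOrders VALUES (2, 'isbn-A', 9, 1, '2000-03-01', NULL);

    -- The two projections that make up the decomposition.
    CREATE TABLE Orders AS
        SELECT DISTINCT ordernum, cid, order_date FROM WideOrders;
    CREATE TABLE Orderlists AS
        SELECT ordernum, isbn, qty, ship_date FROM WideOrders;
""")

original = conn.execute("""
    SELECT ordernum, isbn, cid, qty, order_date, ship_date
    FROM WideOrders ORDER BY ordernum, isbn""").fetchall()

rejoined = conn.execute("""
    SELECT L.ordernum, L.isbn, O.cid, L.qty, O.order_date, L.ship_date
    FROM Orderlists L JOIN Orders O ON O.ordernum = L.ordernum
    ORDER BY L.ordernum, L.isbn""").fetchall()

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
```

The join returns exactly the original rows, and the customer and order date of order 1 are now stored once instead of once per book.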
Figure A.2 shows an updated ER diagram that reflects the new design. Note that DBDudes could have arrived immediately at this diagram if they had made Orders an entity set instead of a relationship set right at the beginning. But at that time they did not understand the requirements completely, and it seemed natural to model Orders as a relationship set. This iterative refinement process is typical of real-life database design processes. As DBDudes has learned over time, it is rare to achieve an initial design that is not changed as a project progresses.
Figure A.2 ER Diagram Reflecting the Final Design
[Entity sets Books, Customers (cid, cname, address, cardnum), and Orders (ordernum, order_date), with relationship sets Place_Order and Order_List (qty, ship_date).]
The DBDudes team celebrates the successful completion of logical database design and schema refinement by opening a bottle of champagne and charging it to B&N. After recovering from the celebration, they move on to the physical design phase.
A.5 PHYSICAL DATABASE DESIGN
Next, DBDudes considers the expected workload. The owner of the bookstore expects most of his customers to search for books by ISBN number before placing an order. Placing an order involves inserting one record into the Orders table and inserting one or more records into the Orderlists relation. If a sufficient number of books is available, a shipment is prepared and a value for the ship_date in the Orderlists relation is set. In addition, the available quantities of books in stock change all the time, since orders are placed that decrease the quantity available and new books arrive from suppliers and increase the quantity available.
The DBDudes team begins by considering searches for books by ISBN. Since isbn is a key, an equality query on isbn returns at most one record. Thus, in order to speed up queries from customers who look for books with a given ISBN, DBDudes decides to build an unclustered hash index on isbn.
Next, they consider updates to book quantities. To update the qty_in_stock value for a book, we must first search for the book by ISBN; the index on isbn speeds this up. Since the qty_in_stock value for a book is updated quite frequently, DBDudes also considers partitioning the Books relation vertically into the following two relations:
BooksQty(isbn, qty), and
BooksRest(isbn, title, author, price, year_published).
Unfortunately, this vertical partition would slow down another very popular query: Equality search on ISBN to retrieve full information about a book would require a join between BooksQty and BooksRest. So DBDudes decides not to vertically partition Books.
DBDudes thinks it is likely that customers will also want to search for books by title and by author, and decides to add unclustered hash indexes on title and author. These indexes are inexpensive to maintain because the set of books is rarely changed even though the quantity in stock for a book changes often.
Next, they consider the Customers relation. A customer is first identified by the unique customer identification number. Thus, the most common queries on Customers are equality queries involving the customer identification number, and DBDudes decides to build a clustered hash index on cid to achieve maximum speedup for this query.
Moving on to the Orders relation, they see that it is involved in two queries: insertion of new orders and retrieval of existing orders. Both queries involve the ordernum attribute as search key, and so they decide to build an index on it. What type of index should this be, a B+ tree or a hash index? Since order numbers are assigned sequentially and thus correspond to the order date, sorting by ordernum effectively sorts by order date as well. Thus DBDudes decides to build a clustered B+ tree index on ordernum. Although the operational requirements that have been mentioned until now favor neither a B+ tree nor a hash index, B&N will probably want to monitor daily activities, and the clustered B+ tree is a better choice for such range queries. Of course, this means that retrieving all orders for a given customer could be expensive for customers with many orders, since clustering by ordernum precludes clustering by other attributes, such as cid.
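The difference between the two index types on a range query can be sketched as follows. This is only an in-memory analogy, not a B+ tree implementation: a Python list sorted on ordernum stands in for the leaf level of the clustered B+ tree, a dictionary stands in for the hash index, and the order data is hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (ordernum, cid) records, kept sorted on ordernum as a
# clustered B+ tree would keep them.
orders = [(n, n % 3) for n in range(100, 120)]
keys = [ordernum for ordernum, _ in orders]

def range_scan(lo, hi):
    """Find the first qualifying record, then read neighbors contiguously."""
    start = bisect_left(keys, lo)
    end = bisect_right(keys, hi)
    return orders[start:end]

# A hash index supports equality probes only.
hash_index = {ordernum: (ordernum, cid) for ordernum, cid in orders}

def range_via_hash(lo, hi):
    """One separate probe per candidate ordernum; no use of sort order."""
    return [hash_index[k] for k in range(lo, hi + 1) if k in hash_index]
```

For example, `range_scan(105, 108)` locates order 105 once and then reads the next three records in place, whereas `range_via_hash(105, 108)` needs four independent probes. On disk, the first pattern corresponds to a few sequential page reads and the second to scattered ones.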
The Orderlists relation mostly involves insertions, with an occasional update of a shipment date or a query to list all components of a given order. If Orderlists is kept sorted on ordernum, all insertions are appends at the end of the relation and thus very efficient. A clustered B+ tree index on ordernum maintains this sort order and also speeds up retrieval of all items for a given order. To update a shipment date, we need to search for a tuple by ordernum and isbn. The index on ordernum helps here as well. Although an index on ⟨ordernum, isbn⟩ would be better for this purpose, insertions would not be as efficient as with an index on just ordernum; DBDudes therefore decides to index Orderlists on just ordernum.
A.5.1 Tuning the Database
We digress from our discussion of the initial design to consider a problem that arises several months after the launch of the B&N site. DBDudes is called in and told that customer enquiries about pending orders are being processed very slowly. B&N has become very successful, and the Orders and Orderlists tables have grown huge.
Thinking further about the design, DBDudes realizes that there are two types of orders: completed orders, for which all books have already shipped, and partially completed orders, for which some books are yet to be shipped. Most customer requests to look up an order involve partially completed orders, which are a small fraction of all orders. DBDudes therefore decides to horizontally partition both the Orders table and the Orderlists table by ordernum. This results in four new relations: NewOrders, OldOrders, NewOrderlists, and OldOrderlists.
An order and its components are always in exactly one pair of relations (and we can determine which pair, old or new, by a simple check on ordernum), and queries involving that order can always be evaluated using only the relevant relations. Some queries are now slower, such as those asking for all of a customer’s orders, since they require us to search two sets of relations. However, these queries are infrequent and their performance is acceptable.
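A minimal sketch of this routing logic is shown below. The text does not say how the boundary between old and new orders is chosen or maintained, so the fixed `CUTOFF` value is an assumption made purely for illustration, as are the function names and sample orders; dictionaries stand in for the four relations.

```python
# Hypothetical cutoff: every order numbered below it has fully shipped.
CUTOFF = 1000

# Each partition holds an Orders part and an Orderlists part, keyed by ordernum.
partitions = {
    "old": {"orders": {}, "orderlists": {}},
    "new": {"orders": {}, "orderlists": {}},
}

def partition_for(ordernum):
    """The simple check on ordernum that picks the pair of relations."""
    return "old" if ordernum < CUTOFF else "new"

def add_order(ordernum, cid, items):
    part = partitions[partition_for(ordernum)]
    part["orders"][ordernum] = cid
    part["orderlists"][ordernum] = list(items)

def lookup_order(ordernum):
    """Touches exactly one pair of relations."""
    part = partitions[partition_for(ordernum)]
    return part["orders"].get(ordernum), part["orderlists"].get(ordernum, [])

def orders_for_customer(cid):
    """Must search both pairs, which is why this query became slower."""
    return sorted(
        ordernum
        for part in partitions.values()
        for ordernum, owner in part["orders"].items()
        if owner == cid
    )

add_order(17, cid=1, items=["isbn-a"])
add_order(1005, cid=1, items=["isbn-b", "isbn-c"])
add_order(1006, cid=2, items=["isbn-a"])
```

Looking up order 1005 consults only the small "new" pair, while listing all orders of customer 1 has to visit both pairs.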
Returning to our discussion of the initial design phase, recall that DBDudes completed physical database design. Next, they address security. There are three groups of users: customers, employees, and the owner of the book shop. (Of course, there is also the database administrator who has universal access to all data and who is responsible for regular operation of the database system.)
The owner of the store has full privileges on all tables. Customers can query the Books table and can place orders online, but they should not have access to other customers’ records nor to other customers’ orders. DBDudes restricts access in two ways. First, they design a simple Web page with several forms similar to the page shown in Figure 22.1 in Chapter 22. This allows customers to submit a small collection of valid requests without giving them the ability to directly access the underlying DBMS through an SQL interface. Second, they use the security features of the DBMS to limit access to sensitive data.
The Web page allows customers to query the Books relation by ISBN number, name of the author, and title of a book. The Web page also has two buttons. The first button retrieves a list of all of the customer’s orders that are not completely fulfilled yet. The second button displays a list of all completed orders for that customer. Note that customers cannot specify actual SQL queries through the Web; they can only fill in some parameters in a form to instantiate an automatically generated SQL query. All queries that are generated through form input have a WHERE clause that includes the cid attribute value of the current customer, and evaluation of the queries generated by the two buttons requires knowledge of the customer identification number. Since all users have to log on to the Web site before browsing the catalog, the business logic (discussed in Section A.7) must maintain state information about a customer (i.e., the customer identification number) during the customer’s visit to the Web site.
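The query behind the first button might look like the following sketch, which again uses Python's sqlite3 module as a stand-in for the real DBMS. The column names of NewOrders, the sample rows, and the function name are assumptions for illustration. The point is that the cid value is supplied by the session state rather than typed by the customer, and that it is bound as a parameter instead of being pasted into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE NewOrders (ordernum INTEGER PRIMARY KEY, cid INTEGER, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO NewOrders VALUES (?, ?, ?)",
    [(1, 10, "1999-01-05"), (2, 20, "1999-01-06"), (3, 10, "1999-01-07")],
)

def pending_orders(session_cid):
    """Generated query for the 'unfulfilled orders' button.

    session_cid comes from the logged-in session; the WHERE clause
    always restricts the result to that customer's rows.
    """
    return conn.execute(
        "SELECT ordernum, order_date FROM NewOrders WHERE cid = ? ORDER BY ordernum",
        (session_cid,),
    ).fetchall()
```

Calling `pending_orders(10)` returns only the two orders of customer 10; no form input can widen the query to another customer's rows.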
The second step is to configure the database to limit access according to each user group’s need to know. DBDudes creates a special customer account that has the following privileges:
SELECT ON Books, NewOrders, OldOrders, NewOrderlists, OldOrderlists
INSERT ON NewOrders, OldOrders, NewOrderlists, OldOrderlists
Employees should be able to add new books to the catalog, update the quantity of a book in stock, revise customer orders if necessary, and update all customer information except the credit card information. In fact, employees should not even be able to see a customer’s credit card number. Thus, DBDudes creates the following view:
CREATE VIEW CustomerInfo (cid, cname, address)
AS SELECT C.cid, C.cname, C.address
FROM Customers C
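The effect of the view can be checked with a short sketch in Python's sqlite3 module. The cardnum column name, the column types, and the sample customer are assumptions for illustration. SQLite has no user accounts or GRANT statement, so the sketch shows only what the view exposes, not the privilege assignment itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (cid INTEGER PRIMARY KEY, cname TEXT,
                            address TEXT, cardnum TEXT);
    CREATE VIEW CustomerInfo (cid, cname, address)
        AS SELECT C.cid, C.cname, C.address
        FROM Customers C;
""")
conn.execute(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    (1, "Sample Customer", "1 Sample Street", "0000-0000-0000-0000"),
)

# An employee who is given access only to the view sees three columns.
cursor = conn.execute("SELECT * FROM CustomerInfo")
view_columns = [column[0] for column in cursor.description]
rows = cursor.fetchall()
```

Here `view_columns` contains cid, cname, and address but not the credit card number, which stays reachable only through the base table.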
They give the employee account the following privileges:
SELECT ON CustomerInfo, Books, NewOrders, OldOrders, NewOrderlists, OldOrderlists
INSERT ON CustomerInfo, Books, NewOrders, OldOrders, NewOrderlists, OldOrderlists
UPDATE ON CustomerInfo, Books, NewOrders, OldOrders, NewOrderlists, OldOrderlists
DELETE ON Books, NewOrders, OldOrders, NewOrderlists, OldOrderlists
In addition, there are security issues when the user first logs on to the Web site using the customer identification number. Sending the number unencrypted over the Internet is a security hazard, and a secure protocol such as SSL should be used.
There are companies such as CyberCash and DigiCash that offer electronic commerce payment solutions, even including ‘electronic’ cash. Discussion of how to incorporate such techniques into the Web site is outside the scope of this book.
DBDudes now moves on to the implementation of the application layer and considers alternatives for connecting the DBMS to the World-Wide Web (see Chapter 22).
DBDudes notes the need for session management. For example, users who log in to the site, browse the catalog, and then select books to buy do not want to re-enter their customer identification number. Session management has to extend to the whole process of selecting books, adding them to a shopping cart, possibly removing books from the cart, and then checking out and paying for the books.
DBDudes then considers whether Web pages for books should be static or dynamic. If there is a static Web page for each book, then we need an extra database field in the Books relation that points to the location of the file. Even though this enables special page designs for different books, it is a very labor intensive solution. DBDudes convinces B&N to dynamically assemble the Web page for a book from a standard template instantiated with information about the book in the Books relation.
This leaves DBDudes with one final decision, namely how to connect applications to the DBMS. They consider the two main alternatives that we presented in Section 22.2: CGI scripts versus using an application server infrastructure. If they use CGI scripts, they would have to encode session management logic, which is not an easy task. If they use an application server, they can make use of all the functionality that the application server provides. Thus, they recommend that B&N implement server-side processing using an application server.
B&N, however, refuses to pay for an application server and decides that for their purposes CGI scripts are fine. DBDudes accepts B&N’s decision and proceeds to build the following pieces:
The top level HTML pages that allow users to navigate the site, and various forms that allow users to search the catalog by ISBN, author name, or book title. An example page containing a search form is shown in Figure 22.1 in Chapter 22. In addition to the input forms, DBDudes must develop appropriate presentations for the results.
The logic to track a customer session. Relevant information must either be stored in a server-side data structure or be cached in the customer’s browser using a mechanism like cookies. Cookies are pieces of information that a Web server can store in a user’s Web browser. Whenever the user generates a request, the browser passes along the stored information, thereby enabling the Web server to ‘remember’ what the user did earlier.
The scripts that process the user requests. For example, a customer can use a form called ‘Search books by title’ to type in a title and search for books with that title. The CGI interface communicates with a script that processes the request. An example of such a script written in Perl using DBI for data access is shown in Figure 22.4 in Chapter 22.
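The session-tracking piece in the list above can be sketched with Python's standard http.cookies module. This combines the two options the text mentions: the browser holds only an opaque session identifier in a cookie, and the customer identification number and shopping cart live in a server-side table. The cookie name `session_id`, the function names, and the in-memory `sessions` dictionary are all assumptions for illustration; a CGI deployment would need to persist the table between requests.

```python
import secrets
from http.cookies import SimpleCookie

# Server-side session table: session id -> state for that customer visit.
sessions = {}

def start_session(cid):
    """Called after a successful login; returns the header to send back."""
    session_id = secrets.token_hex(16)
    sessions[session_id] = {"cid": cid, "cart": []}
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["path"] = "/"
    return cookie.output(header="Set-Cookie:")

def resume_session(cookie_header):
    """Called on every later request with the browser's Cookie header value."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    if morsel is None:
        return None
    return sessions.get(morsel.value)
```

On each request the script recovers the cid from the session, so the customer never has to re-enter it while browsing, filling the cart, and checking out.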
For completeness, we remark that if B&N had agreed to use an application server, DBDudes would have had the following tasks:
As in the CGI-based architecture, they would have to design top level pages that allow customers to navigate the Web site as well as various search forms and result presentations.
Assuming that DBDudes selects a Java-based application server, they have to write Java Servlets to process form-generated requests. Potentially, they could reuse existing (possibly commercially available) JavaBeans. They can use JDBC as a database interface; examples of JDBC code can be found in Section 5.10. Instead of programming Servlets, they could resort to Java Server Pages and annotate pages with special JSP markup tags. An example of a Web page that includes JSP commands is shown in Section 22.2.1.
If DBDudes selects an application server that uses proprietary markup tags, they have to develop Web pages by using such tags. An example using Cold Fusion markup tags can be found in Section 22.2.1.
Our discussion thus far only covers the ‘client-interface’, the part of the Web site that is exposed to B&N’s customers. DBDudes also needs to add applications that allow the employees and the shop owner to query and access the database and to generate summary reports of business activities.
This completes our discussion of Barns and Nobble. While this study only describes a small part of a real problem, we saw that a design even at this scale involved nontrivial tradeoffs. We would like to emphasize again that database design is an iterative process and that therefore it is very important not to lock oneself down early on in a fixed model that is too inflexible to accommodate a changing environment. Welcome to the exciting world of database management!
B THE MINIBASE SOFTWARE
Practice is the best of all instructors.
—Publius Syrus, 42 B.C
Minibase is a small relational DBMS, together with a suite of visualization tools, that has been developed for use with this book. While the book makes no direct reference to the software and can be used independently, Minibase offers instructors an opportunity to design a variety of hands-on assignments, with or without programming. To see an online description of the software, visit this URL:
http://www.cs.wisc.edu/~dbbook/minibase.html
The software is available freely through ftp. By registering themselves as users at the URL for the book, instructors can receive prompt notification of any major bug reports and fixes. Sample project assignments, which elaborate upon some of the briefly sketched ideas in the project-based exercises at the end of chapters, can be seen at
http://www.cs.wisc.edu/~dbbook/minihwk.html
Instructors should consider making small modifications to each assignment to discourage undesirable ‘code reuse’ by students; assignment handouts formatted using LaTeX are available by ftp. Instructors can also obtain solutions to these assignments by contacting the authors (raghu@cs.wisc.edu, johannes@cs.cornell.edu).
Minibase is intended to supplement the use of a commercial DBMS such as Oracle or Sybase in course projects, not to replace it. While a commercial DBMS is ideal for SQL assignments, it does not allow students to understand how the DBMS works. Minibase is intended to address the latter issue; the subset of SQL that it supports is intentionally kept small, and students should also be asked to use a commercial DBMS for writing SQL queries and programs. Minibase is provided on an as-is basis with no warranties or restrictions for educational or personal use. It includes the following:
Code for a small single-user relational DBMS, including a parser and query optimizer for a subset of SQL, and components designed to be (re)written by students as project assignments: heap files, buffer manager, B+ trees, sorting, and joins.
Graphical visualization tools to aid in students’ exploration and understanding of the behavior of the buffer management, B+ tree, and query optimization components of the system. There is also a graphical tool to refine a relational database design using normalization.
Several assignments involving the use of Minibase are described below. Each of these has been tested in a course already, but the details of how Minibase is set up might vary at your school, so you may have to modify the assignments accordingly. If you plan to use these assignments, you are advised to download and try them at your site well in advance of handing them to students. We have done our best to test and document these assignments, and the Minibase software, but bugs undoubtedly persist. Please report bugs at this URL:
http://www.cs.wisc.edu/~dbbook/minibase.comments.html
I hope that users will contribute bug fixes, additional project assignments, and extensions to Minibase. These will be made publicly available through the Minibase site, together with pointers to the authors.
B.2.1 Overview of Programming Projects
In several assignments, students are asked to rewrite a component of Minibase. The book provides the necessary background for all of these assignments, and the assignment handout provides additional system-level details. The online HTML documentation provides an overview of the software, in particular the component interfaces, and can be downloaded and installed at each school that uses Minibase. The projects listed below should be assigned after covering the relevant material from the indicated chapter.
Buffer manager (Chapter 7): Students are given code for the layer that manages space on disk and supports the concept of pages with page ids. They are asked to implement a buffer manager that brings requested pages into memory if they are not already there. One variation of this assignment could use different replacement policies. Students are asked to assume a single-user environment, with no concurrency control or recovery management.
HF page (Chapter 7): Students must write code that manages records on a page using a slot-directory page format to keep track of records on a page. Possible variants include fixed-length versus variable-length records and other ways to keep track of records on a page.
Heap files (Chapter 7): Using the HF page and buffer manager code, students are asked to implement a layer that supports the abstraction of files of unordered pages, that is, heap files.
B+ trees (Chapter 9): This is one of the more complex assignments. Students have to implement a page class that maintains records in sorted order within a page and implement the B+ tree index structure to impose a sort order across several leaf-level pages. Indexes store ⟨key, record-pointer⟩ pairs in leaf pages, and data records are stored separately (in heap files). Similar assignments can easily be created for Linear Hashing or Extendible Hashing index structures.
External sorting (Chapter 11): Building upon the buffer manager and heap file layers, students are asked to implement external merge-sort. The emphasis is on minimizing I/O, rather than on the in-memory sort used to create sorted runs.
Sort-merge join (Chapter 12): Building upon the code for external sorting, students are asked to implement the sort-merge join algorithm. This assignment can be easily modified to create assignments that involve other join algorithms.
Index nested-loop join (Chapter 12): This assignment is similar to the sort-merge join assignment, but relies on B+ tree (or other indexing) code, instead of sorting code.
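To give a feel for the external sorting project in the list above, here is a small in-memory simulation of external merge-sort. It is not Minibase code and does not use the Minibase interfaces: Python lists stand in for runs stored in heap files, and the parameters `run_size` (records that fit in memory when creating runs) and `fan_in` (runs merged at once) are names chosen for this sketch. It counts passes because each pass over the data corresponds to reading and writing every page once.

```python
import heapq

def merge_pass(runs, fan_in):
    """Merge groups of up to fan_in sorted runs into longer sorted runs."""
    return [
        list(heapq.merge(*runs[i:i + fan_in]))
        for i in range(0, len(runs), fan_in)
    ]

def external_sort(records, run_size, fan_in):
    """Return (sorted records, number of passes over the data)."""
    if run_size < 1 or fan_in < 2:
        raise ValueError("need run_size >= 1 and fan_in >= 2")
    # Pass 0: sort memory-sized chunks to create the initial runs.
    runs = [
        sorted(records[i:i + run_size])
        for i in range(0, len(records), run_size)
    ]
    passes = 1
    # Later passes: merge runs until a single run remains.
    while len(runs) > 1:
        runs = merge_pass(runs, fan_in)
        passes += 1
    return (runs[0] if runs else []), passes
```

With ten records, `run_size=2`, and `fan_in=2`, pass 0 creates five runs, and three merge passes reduce them to three, two, and finally one run. Increasing `fan_in` reduces the number of passes, which is the I/O saving the assignment emphasizes.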
B.2.2 Overview of Nonprogramming Assignments
Four assignments that do not require students to write any code (other than SQL, in one assignment) are also available.
Optimizer exercises (Chapter 13): The Minibase optimizer visualizer offers a flexible tool to explore how a typical relational query optimizer works. It accepts single-block SQL queries (including some queries that cannot be executed in Minibase, such as queries involving grouping and aggregate operators). Students can inspect and modify synthetic catalogs, add and drop indexes, enable or disable different join algorithms, enable or disable index-only evaluation strategies, and see the effect of such changes on the plan produced for a given query. All (sub)plans generated by an iterative System R style optimizer can be viewed, ordered by the iteration in which they are generated, and details on a given plan can be obtained readily. All interaction with the optimizer visualizer is through a GUI and requires no programming.
The assignment introduces students to this tool and then requires them to answer questions involving specific catalogs, queries, and plans generated by controlling various parameters.
Buffer manager viewer (Chapter 12): This viewer lets students visualize how pages are moved in and out of the buffer pool, their status (e.g., dirty bit, pin count) while in the pool, and some statistics (e.g., number of hits). The assignment requires students to generate traces by modifying some trace-generation code (provided) and to answer questions about these traces by using the visualizer to look at them. While this assignment can be used after covering Chapter 7, deferring it until after Chapter 12 enables students to examine traces that are representative of different relational operations.
B+ tree viewer (Chapter 9): This viewer lets students see a B+ tree as it is modified through insert and delete statements. The assignment requires students to work with trace files, answer questions about them, and generate operation traces (i.e., a sequence of inserts and deletes) that create specified kinds of trees.
Normalization tool (Chapter 15): The normalization viewer is a tool for normalizing relational tables. It supports the concept of a refinement session, in which a schema is decomposed repeatedly and the resulting decomposition tree is then saved. For a given schema, a user might consider several alternative decompositions (more precisely, decomposition trees), and each of these can be saved as a refinement session. Refinement sessions are a very flexible and convenient mechanism for trying out several alternative decomposition strategies. The normalization assignment introduces students to this tool and asks design-oriented questions involving the use of the tool.
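The buffer pool state that the buffer manager viewer in the list above displays (pin counts, dirty bits, hit statistics) can be sketched in a few lines. This is not the Minibase buffer manager or its interface: the class and method names are invented for illustration, a dictionary stands in for the disk space manager, and least-recently-used replacement is just one possible policy.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer pool with pin counts, dirty bits, and LRU replacement."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk                # page_id -> contents (stand-in for disk)
        self.frames = OrderedDict()     # page_id -> frame, least recently used first
        self.hits = 0
        self.misses = 0

    def pin(self, page_id):
        """Bring the page into the pool if needed and pin it."""
        if page_id in self.frames:
            self.hits += 1
            self.frames.move_to_end(page_id)
        else:
            self.misses += 1
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = {
                "data": self.disk[page_id], "pin_count": 0, "dirty": False
            }
        frame = self.frames[page_id]
        frame["pin_count"] += 1
        return frame

    def unpin(self, page_id, dirty=False):
        """Release one pin; the caller reports whether it modified the page."""
        frame = self.frames[page_id]
        if frame["pin_count"] == 0:
            raise ValueError("page is not pinned")
        frame["pin_count"] -= 1
        frame["dirty"] = frame["dirty"] or dirty

    def _evict(self):
        """Replace the least recently used unpinned page, writing it if dirty."""
        victim_id = next(
            (pid for pid, f in self.frames.items() if f["pin_count"] == 0), None
        )
        if victim_id is None:
            raise RuntimeError("all frames are pinned")
        frame = self.frames.pop(victim_id)
        if frame["dirty"]:
            self.disk[victim_id] = frame["data"]
```

A trace of `pin` and `unpin` calls against such a pool produces the kind of information students inspect in the viewer: which pages are resident, which are pinned or dirty, and how many requests were hits.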
Assignments that require students to evaluate various components can also be developed. For example, students can be asked to compare different join methods, different index methods, and different buffer management policies.
The Minibase software was inspired by Minirel, a small relational DBMS developed by David DeWitt for instructional use. Minibase was developed by a large number of dedicated students over a long time, and the design was guided by Mike Carey and R. Ramakrishnan. See the online documentation for more on Minibase’s history.
Trang 25[1] R Abbott and H Garcia-Molina Scheduling real-time transactions: a performance
evaluation ACM Transactions on Database Systems, 17(3), 1992.
[2] S Abiteboul Querying semi-structured data In Intl Conf on Database Theory, 1997 [3] S Abiteboul, R Hull, and V Vianu Foundations of Databases Addison-Wesley, 1995 [4] S Abiteboul and P Kanellakis Object identity as a query language primitive In Proc.
ACM SIGMOD Conf on the Management of Data, 1989.
[5] S Abiteboul and V Vianu Regular path queries with constraints In Proc ACM Symp.
on Principles of Database Systems, 1997.
[6] K Achyutuni, E Omiecinski, and S Navathe Two techniques for on-line index
mod-ification in shared nothing parallel databases In Proc ACM SIGMOD Conf on the
Management of Data, 1996.
[7] S Adali, K Candan, Y Papakonstantinou, and V Subrahmanian Query caching and
optimization in distributed mediator systems In Proc ACM SIGMOD Conf on the
Management of Data, 1996.
[8] M E Adiba Derived relations: A unified mechanism for views, snapshots and
dis-tributed data In Proc Intl Conf on Very Large Databases, 1981.
[9] S Agarwal, R Agrawal, P Deshpande, A Gupta, J Naughton, R Ramakrishnan, and
S Sarawagi On the computation of multidimensional aggregates In Proc Intl Conf.
on Very Large Databases, 1996.
[10] D Agrawal and A El Abbadi The generalized tree quorum protocol: an efficient
approach for managing replicated data ACM Transactions on Database Systems, 17(4),
1992
[11] D Agrawal, A El Abbadi, and R Jeffers Using delayed commitment in locking
pro-tocols for real-time databases In Proc ACM SIGMOD Conf on the Management of
Data, 1992.
[12] R Agrawal, M Carey, and M Livny Concurrency control performance-modeling:
Alternatives and implications In Proc ACM SIGMOD Conf on the Management of
Data, 1985.
[13] R Agrawal and D DeWitt Integrated concurrency control and recovery
mecha-nisms: Design and performance evaluation ACM Transactions on Database Systems,
10(4):529–564, 1985
[14] R Agrawal and N Gehani ODE (Object Database and Environment): The language
and the data model In Proc ACM SIGMOD Conf on the Management of Data, 1989.
[15] R Agrawal, J E Gehrke, D Gunopulos, and P Raghavan Automatic subspace
clus-tering of high dimensional data for data mining In Proc ACM SIGMOD Conf on
Management of Data, 1998.
[16] R Agrawal, T Imielinski, and A Swami Database mining: A performance perspective
IEEE Transactions on Knowledge and Data Engineering, 5(6):914–925, December 1993.
[17] R Agrawal, H Mannila, R Srikant, H Toivonen, and A I Verkamo Fast Discovery
of Association Rules In U M Fayyad, G Piatetsky-Shapiro, P Smyth, and R
Uthu-rusamy, editors, Advances in Knowledge Discovery and Data Mining, chapter 12, pages
307–328 AAAI/MIT Press, 1996
847
Trang 26848 Database Management Systems
[18] R Agrawal, G Psaila, E Wimmers, and M Zaot Querying shapes of histories In
Proc Intl Conf on Very Large Databases, 1995.
[19] R Agrawal and J Shafer Parallel mining of association rules IEEE Transactions on
Knowledge and Data Engineering, 8(6):962–969, 1996.
[20] R Agrawal and R Srikant Mining sequential patterns In Proc IEEE Intl Conf on
Data Engineering, 1995.
[21] R Agrawal, P Stolorz, and G Piatetsky-Shapiro, editors Proc Intl Conf on
Knowl-edge Discovery and Data Mining AAAI Press, 1998.
[22] R Ahad, K BapaRao, and D McLeod On estimating the cardinality of the projection
of a database relation ACM Transactions on Database Systems, 14(1):28–40, 1989.
[23] C Ahlberg and E Wistrand IVEE: an information visualization exploration
environ-ment In Intl Symp on Information Visualization, 1995.
[24] A Aho, C Beeri, and J Ullman The theory of joins in relational databases ACM
Transactions on Database Systems, 4(3):297–314, 1979.
[25] A Aho, J Hopcroft, and J Ullman The Design and Analysis of Computer Algorithms.
Addison-Wesley, 1983
[26] A Aho, Y Sagiv, and J Ullman Equivalences among relational expressions SIAM
Journal of Computing, 8(2):218–246, 1979.
[27] A Aiken, J Chen, M Stonebraker, and A Woodruff Tioga-2: A direct manipulation
database visualization environment In Proc IEEE Intl Conf on Data Engineering,
1996
[28] A Aiken, J Widom, and J Hellerstein Static analysis techniques for predicting the
behavior of active database rules ACM Transactions on Database Systems, 20(1):3–41,
1995
[29] E Anwar, L Maugis, and U Chakravarthy A new perspective on rule support for
object-oriented databases In Proc ACM SIGMOD Conf on the Management of Data,
1993
[30] K Apt, H Blair, and A Walker Towards a theory of declarative knowledge In
Foundations of Deductive Databases and Logic Programming J Minker (ed.), Morgan
Kaufmann, 1988
[31] W Armstrong Dependency structures of database relationships In Proc IFIP
Congress, 1974.
[32] G Arocena and A O Mendelzon WebOQL: restructuring documents, databases and
webs In Proc Intl Conf on Data Engineering, 1988.
[33] M Astrahan, M Blasgen, D Chamberlin, K Eswaran, J Gray, P Griffiths, W King,
R Lorie, P McJones, J Mehl, G Putzolu, I Traiger, B Wade, and V Watson System
R: A relational approach to database management ACM Transactions on Database
Systems, 1(2):97–137, 1976.
[34] M Atkinson, P Bailey, K Chisholm, P Cockshott, and R Morrison An approach to
persistent programming In Readings in Object-Oriented Databases eds S.B Zdonik
and D Maier, Morgan Kaufmann, 1990
[35] M Atkinson and O Buneman Types and persistence in database programming
lan-guages ACM Computing Surveys, 19(2):105–190, 1987.
[36] R Attar, P Bernstein, and N Goodman Site initialization, recovery, and back-up in a
distributed database system IEEE Transactions on Software Engineering, 10(6):645–
650, 1983
[37] P Atzeni, L Cabibbo, and G Mecca Isalog: a declarative language for complex objects
with hierarchies In Proc IEEE Intl Conf on Data Engineering, 1993.
[38] P Atzeni and V De Antonellis Relational Database Theory Benjamin-Cummings,
1993
Trang 27REFERENCES 849
[39] P Atzeni, G Mecca, and P Merialdo To weave the web In Proc Intl Conf Very
Large Data Bases, 1997.
[40] R Avnur, J Hellerstein, B Lo, C Olston, B Raman, V Raman, T Roth, and K Wylie
Control: Continuous output and navigation technology with refinement online In Proc.
ACM SIGMOD Conf on the Management of Data, 1998.
[41] D Badal and G Popek Cost and performance analysis of semantic integrity validation
methods In Proc ACM SIGMOD Conf on the Management of Data, 1979.
[42] A Badia, D Van Gucht, and M Gyssens Querying with generalized quantifiers In
Applications of Logic Databases ed R Ramakrishnan, Kluwer Academic, 1995.
[43] I Balbin, G Port, K Ramamohanarao, and K Meenakshi Efficient bottom-up
compu-tation of queries on stratified databases Journal of Logic Programming, 11(3):295–344,
1991
[44] I Balbin and K Ramamohanarao A generalization of the differential approach to
recursive query evaluation Journal of Logic Programming, 4(3):259–262, 1987 [45] F Bancilhon, C Delobel, and P Kanellakis Building an Object-Oriented Database
System Morgan Kaufmann, 1991.
[46] F Bancilhon and S Khoshafian A calculus for complex objects Journal of Computer
and System Sciences, 38(2):326–340, 1989.
[47] F Bancilhon, D Maier, Y Sagiv, and J Ullman Magic sets and other strange ways
to implement logic programs In ACM Symp on Principles of Database Systems, 1986.
[48] F Bancilhon and R Ramakrishnan An amateur’s introduction to recursive query
processing strategies In Proc ACM SIGMOD Conf on the Management of Data,
1986
[49] F Bancilhon and N Spyratos Update semantics of relational views ACM Transactions
on Database Systems, 6(4):557–575, 1981.
[50] E Baralis, S Ceri, and S Paraboschi Modularization techniques for active rules
design ACM Transactions on Database Systems, 21(1):1–29, 1996.
[51] R Barquin and H Edelstein Planning and Designing the Data Warehouse
Prentice-Hall, 1997
[52] C Batini, S Ceri, and S Navathe Database Design: An Entity Relationship Approach.
Benjamin/Cummings Publishers, 1992
[53] C Batini, M Lenzerini, and S Navathe A comparative analysis of methodologies for
database schema integration ACM Computing Surveys, 18(4):323–364, 1986.
[54] D Batory, J Barnett, J Garza, K Smith, K Tsukuda, B Twichell, and T Wise
GENESIS: an extensible database management system In Readings in Object-Oriented
Databases eds S.B Zdonik and D Maier, Morgan Kaufmann, 1990.
[55] B Baugsto and J Greipsland Parallel sorting methods for large data volumes on a
hypercube database computer In Proc Intl Workshop on Database Machines, 1989.
[56] R Bayer and E McCreight Organization and maintenance of large ordered indexes
Acta Informatica, 1(3):173–189, 1972.
[57] R Bayer and M Schkolnick Concurrency of operations on B-trees Acta Informatica,
9(1):1–21, 1977
[58] M Beck, D Bitton, and W Wilkinson Sorting large files on a backend multiprocessor
IEEE Transactions on Computers, 37(7):769–778, 1988.
[59] N Beckmann, H.-P Kriegel, R Schneider, and B Seeger The r∗ tree: An efficient and
robust access method for points and rectangles In Proc ACM SIGMOD Conf on the
Management of Data, 1990.
[60] C Beeri, R Fagin, and J Howard A complete axiomatization of functional and
mul-tivalued dependencies in database relations In Proc ACM SIGMOD Conf on the
Management of Data, 1977.
Trang 28850 Database Management Systems
[61] C Beeri and P Honeyman Preserving functional dependencies SIAM Journal of
Computing, 10(3):647–656, 1982.
[62] C Beeri and T Milo A model for active object-oriented database In Proc Intl Conf.
on Very Large Databases, 1991.
[63] C Beeri, S Naqvi, R Ramakrishnan, O Shmueli, and S Tsur Sets and negation in
a logic database language (LDL1) In ACM Symp on Principles of Database Systems,
[67] S Berchtold, C Bohm, and H.-P Kriegel The pyramid-tree: Breaking the curse of
dimensionality In ACM SIGMOD Conf on the Management of Data, 1998.
[68] P Bernstein Synthesizing third normal form relations from functional dependencies
ACM Transactions on Database Systems, 1(4):277–298, 1976.
[69] P Bernstein, B Blaustein, and E Clarke Fast maintenance of semantic integrity
assertions using redundant aggregate data In Proc Intl Conf on Very Large Databases,
1980
[70] P Bernstein and D Chiu Using semi-joins to solve relational queries Journal of the
ACM, 28(1):25–40, 1981.
[71] P Bernstein and N Goodman Timestamp-based algorithms for concurrency control in
distributed database systems In Proc Intl Conf on Very Large Databases, 1980.
[72] P Bernstein and N Goodman Concurrency control in distributed database systems
ACM Computing Surveys, 13(2):185–222, 1981.
[73] P Bernstein and N Goodman Power of natural semijoins SIAM Journal of Computing,
10(4):751–771, 1981
[74] P Bernstein and N Goodman Multiversion concurrency control—theory and
algo-rithms ACM Transactions on Database Systems, 8(4):465–483, 1983.
[75] P Bernstein, N Goodman, E Wong, C Reeve, and J Rothnie Query processing in
a system for distributed databases (SDD-1) ACM Transactions on Database Systems,
6(4):602–625, 1981
[76] P Bernstein, V Hadzilacos, and N Goodman Concurrency Control and Recovery in
Database Systems Addison-Wesley, 1987.
[77] P Bernstein and E Newcomer Principles of Transaction Processing Morgan
Kaufmann, 1997
[78] P Bernstein, D Shipman, and J Rothnie Concurrency control in a system for
distributed databases (SDD-1) ACM Transactions on Database Systems, 5(1):18–51, 1980.
[79] P Bernstein, D Shipman, and W Wong Formal aspects of serializability in database
concurrency control IEEE Transactions on Software Engineering, 5(3):203–216, 1979.
[80] K Beyer, J Goldstein, R Ramakrishnan, and U Shaft When is nearest neighbor
meaningful? In IEEE International Conference on Database Theory, 1999.
[81] K Beyer and R Ramakrishnan Bottom-up computation of sparse and iceberg cubes
In Proc ACM SIGMOD Conf on the Management of Data, 1999.
[82] B Bhargava (ed.) Concurrency Control and Reliability in Distributed Systems Van
Nostrand Reinhold, 1987
[83] A Biliris The performance of three database storage structures for managing large
objects In Proc ACM SIGMOD Conf on the Management of Data, 1992.
[84] J Biskup and B Convent A formal view integration method In Proc ACM SIGMOD
Conf on the Management of Data, 1986.
[85] J Biskup, U Dayal, and P Bernstein Synthesizing independent database schemas In
Proc ACM SIGMOD Conf on the Management of Data, 1979.
[86] D Bitton and D DeWitt Duplicate record elimination in large data files ACM Transactions on Database Systems, 8(2):255–265, 1983.
[87] J Blakeley, P.-A Larson, and F Tompa Efficiently updating materialized views In
Proc ACM SIGMOD Conf on the Management of Data, 1986.
[88] M Blasgen and K Eswaran On the evaluation of queries in a database system Technical report, IBM FJ (RJ1745), San Jose, 1975
[89] P Bohannon, D Leinbaugh, R Rastogi, S Seshadri, A Silberschatz, and S Sudarshan
Logical and physical versioning in main memory databases In Proc Intl Conf on
Very Large Databases, 1997.
[90] R Boyce and D Chamberlin SEQUEL: a structured English query language In Proc.
ACM SIGMOD Conf on the Management of Data, 1974.
[91] P S Bradley and U M Fayyad Refining initial points for K-Means clustering In Proc.
Intl Conf on Machine Learning, pages 91–99 Morgan Kaufmann, San Francisco, CA,
1998
[92] P S Bradley, U M Fayyad, and C Reina Scaling clustering algorithms to large
databases In Proc Intl Conf on Knowledge Discovery and Data Mining, 1998
[93] K Bratbergsengen Hashing methods and relational algebra operations In Proc Intl.
Conf on Very Large Databases, 1984.
[94] L Breiman, J H Friedman, R A Olshen, and C J Stone Classification and Regression
Trees Wadsworth, Belmont, 1984.
[95] Y Breitbart, H Garcia-Molina, and A Silberschatz Overview of multidatabase
transaction management In Proc Intl Conf on Very Large Databases, 1992.
[96] Y Breitbart, A Silberschatz, and G Thompson Reliable transaction management in
a multidatabase system In Proc ACM SIGMOD Conf on the Management of Data,
1990
[97] Y Breitbart, A Silberschatz, and G Thompson An approach to recovery management
in a multidatabase system In Proc Intl Conf on Very Large Databases, 1992.
[98] S Brin, R Motwani, and C Silverstein Beyond market baskets: Generalizing
association rules to correlations In Proc ACM SIGMOD Conf on the Management of Data,
1997
[99] S Brin and L Page The anatomy of a large-scale hypertextual web search engine In
Proceedings of 7th World Wide Web Conference, 1998.
[100] T Brinkhoff, H Kriegel, and R Schneider Comparison of approximations of complex objects used for approximation-based query processing in spatial database systems In
Proc IEEE Intl Conf on Data Engineering, 1993.
[101] K Brown, M Carey, and M Livny Goal-oriented buffer management revisited In
Proc ACM SIGMOD Conf on the Management of Data, 1996.
[102] F Bry Towards an efficient evaluation of general queries: Quantifier and disjunction
processing revisited In Proc ACM SIGMOD Conf on the Management of Data, 1989.
[103] F Bry and R Manthey Checking consistency of database constraints: a logical basis
In Proc Intl Conf on Very Large Databases, 1986.
[104] O Buneman and E Clemons Efficiently monitoring relational databases ACM
Transactions on Database Systems, 4(3), 1979.
[105] O Buneman, S Naqvi, V Tannen, and L Wong Principles of programming with
complex objects and collection types Theoretical Computer Science, 149(1):3–48, 1995.
[106] P Buneman, S Davidson, G Hillebrand, and D Suciu A query language and
optimization techniques for unstructured data In Proc ACM SIGMOD Conf on Management
the Management of Data, 1999.
[109] M Carey, D DeWitt, M Franklin, N Hall, M McAuliffe, J Naughton, D Schuh,
M Solomon, C Tan, O Tsatalos, S White, and M Zwilling Shoring up persistent
applications In Proc ACM SIGMOD Conf on the Management of Data, 1994.
[110] M Carey, D DeWitt, G Graefe, D Haight, J Richardson, D Schuh, E Shekita, and
S Vandenberg The EXODUS Extensible DBMS project: An overview In Readings in
Object-Oriented Databases S.B Zdonik and D Maier (eds.), Morgan Kaufmann, 1990.
[111] M Carey, D DeWitt, and J Naughton The dec 007 benchmark In Proc ACM
SIGMOD Conf on the Management of Data, 1993.
[112] M Carey, D DeWitt, J Naughton, M Asgarian, J Gehrke, and D Shah The BUCKY
object-relational benchmark In Proc ACM SIGMOD Conf on the Management of
Data, 1997.
[113] M Carey, D DeWitt, J Richardson, and E Shekita Object and file management in
the Exodus extensible database system In Proc Intl Conf on Very Large Databases,
1986
[114] M Carey and D Kossman On saying “Enough Already!” in SQL In Proc ACM
SIGMOD Conf on the Management of Data, 1997.
[115] M Carey and D Kossman Reducing the braking distance of an SQL query engine In
Proc Intl Conf on Very Large Databases, 1998.
[116] M Carey and M Livny Conflict detection tradeoffs for replicated data ACM
Transactions on Database Systems, 16(4), 1991.
[117] M Casanova, L Tucherman, and A Furtado Enforcing inclusion dependencies and
referential integrity In Proc Intl Conf on Very Large Databases, 1988.
[118] M Casanova and M Vidal Towards a sound view integration methodology In ACM
Symp on Principles of Database Systems, 1983.
[119] S Castano, M Fugini, G Martella, and P Samarati Database Security
Addison-Wesley, 1995
[120] R Cattell The Object Database Standard: ODMG-93 (Release 1.1) Morgan
Kaufmann, 1994
[121] S Ceri, P Fraternali, S Paraboschi, and L Tanca Active rule management in Chimera
In Active Database Systems J Widom and S Ceri (eds.), Morgan Kaufmann, 1996
[122] S Ceri, G Gottlob, and L Tanca Logic Programming and Databases Springer Verlag,
1990
[123] S Ceri and G Pelagatti Distributed Database Design: Principles and Systems.
McGraw-Hill, 1984
[124] S Ceri and J Widom Deriving production rules for constraint maintenance In Proc.
Intl Conf on Very Large Databases, 1990.
[125] F Cesarini, M Missikoff, and G Soda An expert system approach for database
application tuning Data and Knowledge Engineering, 8:35–55, 1992.
[126] U Chakravarthy Architectures and monitoring techniques for active databases: An
evaluation Data and Knowledge Engineering, 16(1):1–26, 1995.
[127] U Chakravarthy, J Grant, and J Minker Logic-based approach to semantic query
optimization ACM Transactions on Database Systems, 15(2):162–207, 1990.
[128] D Chamberlin Using the New DB2 Morgan Kaufmann, 1996.
[129] D Chamberlin, M Astrahan, M Blasgen, J Gray, W King, B Lindsay, R Lorie,
J Mehl, T Price, P Selinger, M Schkolnick, D Slutz, I Traiger, B Wade, and R Yost
A history and evaluation of System R Communications of the ACM, 24(10):632–646,
1981
[130] D Chamberlin, M Astrahan, K Eswaran, P Griffiths, R Lorie, J Mehl, P Reisner, and B Wade Sequel 2: A unified approach to data definition, manipulation, and
control IBM Journal of Research and Development, 20(6):560–575, 1976.
[131] A Chandra and D Harel Structure and complexity of relational queries J Computer
and System Sciences, 25:99–128, 1982.
[132] A Chandra and P Merlin Optimal implementation of conjunctive queries in relational
databases In Proc ACM SIGACT Symp on Theory of Computing, 1977.
[133] M Chandy, L Haas, and J Misra Distributed deadlock detection ACM Transactions
on Computer Systems, 1(3):144–156, 1983.
[134] C Chang and D Leu Multi-key sorting as a file organization scheme when queries are
not equally likely In Proc Intl Symp on Database Systems for Advanced Applications,
1989
[135] D Chang and D Harkey Client/server data access with Java and XML John Wiley
and Sons, 1998
[136] D Chatziantoniou and K Ross Groupwise processing of relational queries In Proc.
Intl Conf on Very Large Databases, 1997.
[137] S Chaudhuri and U Dayal An overview of data warehousing and OLAP technology
SIGMOD Record, 26(1):65–74, 1997.
[138] S Chaudhuri and V Narasayya An efficient cost-driven index selection tool for
Microsoft SQL Server In Proc Intl Conf on Very Large Databases, 1997.
[139] S Chaudhuri and K Shim Optimization queries with aggregate views In Intl Conf.
on Extending Database Technology, 1996.
[140] S Chaudhuri and K Shim Optimization of queries with user-defined predicates In
Proc Intl Conf on Very Large Databases, 1996.
[141] J Cheiney, P Faudemay, R Michel, and J Thevenin A reliable parallel backend using
multiattribute clustering and select-join operator In Proc Intl Conf on Very Large
Databases, 1986.
[142] C Chen and N Roussopoulos Adaptive database buffer management using query
feedback In Proc Intl Conf on Very Large Databases, 1993.
[143] C Chen and N Roussopoulos Adaptive selectivity estimation using query feedback
In Proc ACM SIGMOD Conf on the Management of Data, 1994.
[144] P M Chen, E K Lee, G A Gibson, R H Katz, and D A Patterson RAID:
high-performance, reliable secondary storage ACM Computing Surveys, 26(2):145–185, June
1994
[145] P P Chen The entity-relationship model—toward a unified view of data ACM
Transactions on Database Systems, 1(1):9–36, 1976.
[146] D Childs Feasibility of a set theoretical data structure—a general structure based on
a reconstructed definition of relation Proc Tri-annual IFIP Conference, 1968.
[147] D Chimenti, R Gamboa, R Krishnamurthy, S Naqvi, S Tsur, and C Zaniolo The ldl
system prototype IEEE Transactions on Knowledge and Data Engineering, 2(1):76–90,
[150] H.-T Chou and D DeWitt An evaluation of buffer management strategies for relational
database systems In Proc Intl Conf on Very Large Databases, 1985.
[151] P Chrysanthis and K Ramamritham Acta: a framework for specifying and
reasoning about transaction structure and behavior In Proc ACM SIGMOD Conf on the
Management of Data, 1990.
[152] F Chu, J Halpern, and P Seshadri Least expected cost query optimization: An
exercise in utility ACM Symp on Principles of Database Systems, 1999.
[153] F Civelek, A Dogac, and S Spaccapietra An expert system approach to view definition
and integration In Proc Entity-Relationship Conference, 1988.
[154] R Cochrane, H Pirahesh, and N Mattos Integrating triggers and declarative
constraints in SQL database systems In Proc Intl Conf on Very Large Databases, 1996
[155] CODASYL Report of the CODASYL Data Base Task Group ACM, 1971.
[156] E Codd A relational model of data for large shared data banks Communications of
the ACM, 13(6):377–387, 1970.
[157] E Codd Further normalization of the data base relational model In Data Base
Systems ed R Rustin, Prentice-Hall, 1972.
[158] E Codd Relational completeness of data base sub-languages In Data Base Systems.
ed R Rustin, Prentice-Hall, 1972
[159] E Codd Extending the database relational model to capture more meaning ACM
Transactions on Database Systems, 4(4):397–434, 1979.
[160] E Codd Twelve rules for on-line analytic processing Computerworld, April 13 1995.
[161] L Colby, T Griffin, L Libkin, I Mumick, and H Trickey Algorithms for deferred view
maintenance In Proc ACM SIGMOD Conf on the Management of Data, 1996.
[162] L Colby, A Kawaguchi, D Lieuwen, I Mumick, and K Ross Supporting multiple
view maintenance policies: Concepts, algorithms, and performance analysis In Proc.
ACM SIGMOD Conf on the Management of Data, 1997.
[163] D Comer The ubiquitous B-tree ACM Computing Surveys, 11(2):121–137, 1979.
[164] D Connolly, editor XML Principles, Tools and Techniques O’Reilly & Associates,
Sebastopol, USA, 1997
[165] D Copeland and D Maier Making SMALLTALK a database system In Proc ACM
SIGMOD Conf on the Management of Data, 1984.
[166] G Cornell and K Abdali CGI Programming With Java Prentice-Hall, 1998.
[167] C Date A critique of the SQL database language ACM SIGMOD Record, 14(3):8–54,
1984
[168] C Date Relational Database: Selected Writings Addison-Wesley, 1986.
[169] C Date An Introduction to Database Systems (6th ed.) Addison-Wesley, 1995
[170] C Date and H Darwen A Guide to the SQL Standard (3rd ed.) Addison-Wesley, 1993.
[171] C Date and R Fagin Simple conditions for guaranteeing higher normal forms in
relational databases ACM Transactions on Database Systems, 17(3), 1992.
[172] C Date and D McGoveran A Guide to Sybase and SQL Server Addison-Wesley, 1993
[173] U Dayal and P Bernstein On the updatability of relational views In Proc Intl Conf.
on Very Large Databases, 1978.
[174] U Dayal and P Bernstein On the correct translation of update operations on relational
views ACM Transactions on Database Systems, 7(3), 1982.
[175] P DeBra and J Paredaens Horizontal decompositions for handling exceptions to FDs
In Advances in Database Theory, eds H Gallaire, J Minker and J-M Nicolas, Plenum
Press, 1984
[176] J Deep and P Holfelder Developing CGI applications with Perl Wiley, 1996.
[177] C Delobel Normalization and hierarchical dependencies in the relational data model
ACM Transactions on Database Systems, 3(3):201–222, 1978.
[178] D Denning Secure statistical databases with random sample queries ACM
Transactions on Database Systems, 5(3):291–315, 1980.
[179] D E Denning Cryptography and Data Security Addison-Wesley, 1982.
[180] M Derr, S Morishita, and G Phipps The glue-nail deductive database system: Design,
implementation, and evaluation VLDB Journal, 3(2):123–160, 1994.
[181] A Deshpande An implementation for nested relational databases Technical report, PhD thesis, Indiana University, 1989
[182] P Deshpande, K Ramasamy, A Shukla, and J F Naughton Caching multidimensional
queries using chunks In Proc ACM SIGMOD Intl Conf on Management of Data, 1998
[183] O e a Deux The story of O2 IEEE Transactions on Knowledge and Data Engineering,
2(1), 1990
[184] D DeWitt, H.-T Chou, R Katz, and A Klug Design and implementation of the
Wisconsin Storage System Software Practice and Experience, 15(10):943–962, 1985.
[185] D DeWitt, R Gerber, G Graefe, M Heytens, K Kumar, and M Muralikrishna
Gamma—a high performance dataflow database machine In Proc Intl Conf on Very
Large Databases, 1986.
[186] D DeWitt and J Gray Parallel database systems: The future of high-performance
database systems Communications of the ACM, 35(6):85–98, 1992.
[187] D DeWitt, R Katz, F Olken, L Shapiro, M Stonebraker, and D Wood
Implementation techniques for main memory databases In Proc ACM SIGMOD Conf on the
Management of Data, 1984.
[188] D DeWitt, J Naughton, and D Schneider Parallel sorting on a shared-nothing
architecture using probabilistic splitting In Proc Conf on Parallel and Distributed
Information Systems, 1991.
[189] D DeWitt, J Naughton, D Schneider, and S Seshadri Practical skew handling in
parallel joins In Proc Intl Conf on Very Large Databases, 1992.
[190] O Diaz, N Paton, and P Gray Rule management in object-oriented databases: A
uniform approach In Proc Intl Conf on Very Large Databases, 1991.
[191] S Dietrich Extension tables: Memo relations in logic programming In Proc Intl.
Symp on Logic Programming, 1987.
[192] D Donjerkovic and R Ramakrishnan Probabilistic optimization of top n queries In
Proc Intl Conf on Very Large Databases, 1999.
[193] W Du and A Elmagarmid Quasi-serializability: a correctness criterion for global
concurrency control in interbase In Proc Intl Conf on Very Large Databases, 1989.
[194] W Du, R Krishnamurthy, and M.-C Shan Query optimization in a heterogeneous
DBMS In Proc Intl Conf on Very Large Databases, 1992.
[195] R C Dubes and A Jain Clustering methodologies in exploratory data analysis,
Advances in Computers Academic Press, New York, 1980.
[196] N Duppel Parallel SQL on TANDEM’s NonStop SQL IEEE COMPCON, 1989
[197] H Edelstein The challenge of replication, Parts 1 and 2 DBMS: Database and
Client-Server Solutions, 1995.
[198] W Effelsberg and T Haerder Principles of database buffer management ACM
Transactions on Database Systems, 9(4):560–595, 1984.
[199] M H Eich A classification and comparison of main memory database recovery
techniques In Proc IEEE Intl Conf on Data Engineering, 1987.
[200] A Eisenberg and J Melton SQL:1999, formerly known as SQL3 ACM SIGMOD Record,
28(1):131–138, 1999
[201] A El Abbadi Adaptive protocols for managing replicated distributed databases In
IEEE Symp on Parallel and Distributed Processing, 1991.
[202] A El Abbadi, D Skeen, and F Cristian An efficient, fault-tolerant protocol for
replicated data management In ACM Symp on Principles of Database Systems, 1985
[203] C Ellis Concurrency in Linear Hashing ACM Transactions on Database Systems,
12(2):195–217, 1987
[204] A Elmagarmid Database Transaction Models for Advanced Applications Morgan
Kaufmann, 1992
[205] A Elmagarmid, J Jing, W Kim, O Bukhres, and A Zhang Global committability
in multidatabase systems IEEE Transactions on Knowledge and Data Engineering,
8(5):816–824, 1996
[206] A Elmagarmid, A Sheth, and M Liu Deadlock detection algorithms in distributed
database systems In Proc IEEE Intl Conf on Data Engineering, 1986.
[207] R Elmasri and S Navathe Object integration in database design In Proc IEEE Intl.
Conf on Data Engineering, 1984.
[208] R Elmasri and S Navathe Fundamentals of Database Systems (2nd ed.)
Benjamin-Cummings, 1994
[209] R Epstein Techniques for processing of aggregates in relational database systems. Technical report, UC-Berkeley, Electronics Research Laboratory, M798, 1979
[210] R Epstein, M Stonebraker, and E Wong Distributed query processing in a relational
data base system In Proc ACM SIGMOD Conf on the Management of Data, 1978.
[211] M Ester, H.-P Kriegel, J Sander, and X Xu A density-based algorithm for
discovering clusters in large spatial databases with noise In Proc Intl Conf on Knowledge
Discovery in Databases and Data Mining, 1995.
[212] M Ester, H.-P Kriegel, and X Xu A database interface for clustering in large spatial
databases In Proc Intl Conf on Knowledge Discovery in Databases and Data Mining,
1995
[213] K Eswaran and D Chamberlin Functional specification of a subsystem for data base
integrity In Proc Intl Conf on Very Large Databases, 1975.
[214] K Eswaran, J Gray, R Lorie, and I Traiger The notions of consistency and predicate
locks in a data base system Communications of the ACM, 19(11):624–633, 1976.
[215] R Fagin Multivalued dependencies and a new normal form for relational databases
ACM Transactions on Database Systems, 2(3):262–278, 1977.
[216] R Fagin Normal forms and relational database operators In Proc ACM SIGMOD
Conf on the Management of Data, 1979.
[217] R Fagin A normal form for relational databases that is based on domains and keys
ACM Transactions on Database Systems, 6(3):387–415, 1981.
[218] R Fagin, J Nievergelt, N Pippenger, and H Strong Extendible Hashing—a fast access
method for dynamic files ACM Transactions on Database Systems, 4(3), 1979
[219] C Faloutsos Access methods for text ACM Computing Surveys, 17(1):49–74, 1985
[220] C Faloutsos Searching Multimedia Databases by Content Kluwer Academic, 1996.
[221] C Faloutsos and S Christodoulakis Signature files: An access method for documents
and its analytical performance evaluation ACM Transactions on Office Information
Systems, 2(4):267–288, 1984.
[222] C Faloutsos and H Jagadish On B-Tree indices for skewed distributions In Proc Intl.
Conf on Very Large Databases, 1992.
[223] C Faloutsos, R Ng, and T Sellis Predictive load control for flexible buffer allocation
In Proc Intl Conf on Very Large Databases, 1991.
[224] C Faloutsos, M Ranganathan, and Y Manolopoulos Fast subsequence matching
in time-series databases In Proc ACM SIGMOD Conf on the Management of Data,
1994
[225] C Faloutsos and S Roseman Fractals for secondary key retrieval In ACM Symp on
Principles of Database Systems, 1989.
[226] M Fang, N Shivakumar, H Garcia-Molina, R Motwani, and J D Ullman Computing
iceberg queries efficiently In Proc Intl Conf On Very Large Data Bases, 1998.
[227] U Fayyad, G Piatetsky-Shapiro, and P Smyth The kdd process for extracting useful
knowledge from volumes of data Communications of the ACM, 39(11):27–34, 1996
[228] U Fayyad, G Piatetsky-Shapiro, P Smyth, and R Uthurusamy Advances in
Knowledge Discovery and Data Mining MIT Press, 1996.
[229] U Fayyad and E Simoudis Data mining and knowledge discovery: Tutorial notes In
Intl Joint Conf on Artificial Intelligence, 1997.
[230] U M Fayyad, G Piatetsky-Shapiro, P Smyth, and R Uthurusamy, editors Advances
in Knowledge Discovery and Data Mining AAAI/MIT Press, 1996.
[231] U M Fayyad and R Uthurusamy, editors Proc Intl Conf on Knowledge Discovery
and Data Mining AAAI Press, 1995.
[232] M Fernandez, D Florescu, J Kang, A Y Levy, and D Suciu STRUDEL: A Web site
management system In Proc ACM SIGMOD Conf on Management of Data, 1997.
[233] M Fernandez, D Florescu, A Y Levy, and D Suciu A query language for a Web-site
management system SIGMOD Record (ACM Special Interest Group on Management
of Data), 26(3):4–11, 1997.
[234] S Finkelstein, M Schkolnick, and P Tiberio Physical database design for relational
databases IBM Research Review RJ5034, 1986.
[235] D Fishman, D Beech, H Cate, E Chow, T Connors, J Davis, N Derrett, C Hoch,
W Kent, P Lyngbaek, B Mahbod, M.-A Neimat, T Ryan, and M.-C Shan Iris: An
object-oriented database management system ACM Transactions on Office Information
Systems, 5(1):48–69, 1987.
[236] C Fleming and B von Halle Handbook of Relational Database Design Addison-Wesley,
1989
[237] D Florescu, A Y Levy, and A O Mendelzon Database techniques for the
World-Wide Web: A survey SIGMOD Record (ACM Special Interest Group on Management
of Data), 27(3):59–74, 1998.
[238] F Fotouhi and S Pramanik Optimal secondary storage access sequence for performing
relational join IEEE Transactions on Knowledge and Data Engineering, 1(3):318–328,
1989
[239] W B Frakes and R Baeza-Yates, editors Information Retrieval: Data Structures and
Algorithms Prentice-Hall, 1992.
[240] P Franaszek, J Robinson, and A Thomasian Concurrency control for high contention
environments ACM Transactions on Database Systems, 17(2), 1992.
[241] M Franklin Concurrency control and recovery In Handbook of Computer Science,
A.B Tucker (ed.), CRC Press, 1996.
[242] M Franklin, M Carey, and M Livny Local disk caching for client-server database
systems In Proc Intl Conf on Very Large Databases, 1993.
[243] M Franklin, B Jonsson, and D Kossman Performance tradeoffs for client-server query
processing In Proc ACM SIGMOD Conf on the Management of Data, 1996.
[244] P Fraternali and L Tanca A structured approach for the definition of the semantics
of active databases ACM Transactions on Database Systems, 20(4):414–471, 1995
[245] M W Freeston The BANG file: A new kind of Grid File In Proc ACM SIGMOD
Conf on the Management of Data, 1987.
[246] J Freytag A rule-based view of query optimization In Proc ACM SIGMOD Conf on
the Management of Data, 1987.
[247] O Friesen, A Lefebvre, and L Vieille VALIDITY: Applications of a DOOD system
In Intl Conf on Extending Database Technology, 1996.
[248] J Fry and E Sibley Evolution of data-base management systems ACM Computing
Surveys, 8(1):7–42, 1976.
[249] T Fukuda, Y Morimoto, S Morishita, and T Tokuyama Mining optimized association
rules for numeric attributes In ACM Symp on Principles of Database Systems, 1996.
[250] A Furtado and M Casanova Updating relational views In Query Processing in
Database Systems eds W Kim, D.S Reiner and D.S Batory, Springer-Verlag, 1985.
[251] S Fushimi, M Kitsuregawa, and H Tanaka An overview of the systems software
of a parallel relational database machine: Grace In Proc Intl Conf on Very Large
Databases, 1986.
[252] V Gaede and O Guenther Multidimensional access methods Computing Surveys,
30(2):170–231, 1998
[253] H Gallaire, J Minker, and J.-M Nicolas (eds.) Advances in Database Theory, Vols 1
and 2 Plenum Press, 1984.
[254] H Gallaire and J Minker (eds.) Logic and Data Bases Plenum Press, 1978.
[255] S Ganguly, W Hasan, and R Krishnamurthy Query optimization for parallel
execution In Proc ACM SIGMOD Conf on the Management of Data, 1992.
[256] R Ganski and H Wong Optimization of nested SQL queries revisited In Proc ACM
SIGMOD Conf on the Management of Data, 1987.
[257] V Ganti, J E Gehrke, and R Ramakrishnan Cactus–clustering categorical data using
summaries In Proc ACM Intl Conf on Knowledge Discovery in Databases, 1999.
[258] V Ganti, R Ramakrishnan, J E Gehrke, A Powell, and J French Clustering large
datasets in arbitrary metric spaces In Proc IEEE Intl Conf Data Engineering, 1999
[259] H Garcia-Molina and D Barbara How to assign votes in a distributed system Journal
of the ACM, 32(4), 1985.
[260] H Garcia-Molina, R Lipton, and J Valdes A massive memory system machine IEEE
Transactions on Computers, C33(4):391–399, 1984.
[261] H Garcia-Molina and G Wiederhold Read-only transactions in a distributed database
ACM Transactions on Database Systems, 7(2):209–234, 1982.
[262] E Garfield Citation analysis as a tool in journal evaluation Science, 178(4060):471–
479, 1972
[263] A Garg and C Gotlieb Order preserving key transformations ACM Transactions on
Database Systems, 11(2):213–234, 1986.
[264] J E Gehrke, V Ganti, R Ramakrishnan, and W.-Y Loh Boat: Optimistic decision
tree construction In Proc ACM SIGMOD Conf on Management of Data, 1999.
[265] J E Gehrke, R Ramakrishnan, and V Ganti Rainforest: a framework for fast decision
tree construction of large datasets In Proc Intl Conf on Very Large Databases, 1998
[266] S P Ghosh Data Base Organization for Data Management (2nd ed.) Academic Press,
1986
[267] D Gibson, J M Kleinberg, and P Raghavan Clustering categorical data: An approach
based on dynamical systems In Proceedings of the 24th International Conference on
Very Large Databases, pages 311–323, New York City, New York, August 24-27 1998.
[268] D Gibson, J M Kleinberg, and P Raghavan Inferring web communities from link
topology In Proc ACM Conf on Hypertext, Structural Queries, 1998.
[269] G A Gibson Redundant Disk Arrays: Reliable, Parallel Secondary Storage An ACM
Distinguished Dissertation 1991 MIT Press, 1992
[270] D Gifford Weighted voting for replicated data In ACM Symp on Operating Systems
Principles, 1979.
[271] C F Goldfarb and P Prescod The XML Handbook Prentice-Hall PTR, 1998.
[272] R Goldman and J Widom DataGuides: enabling query formulation and optimization
in semistructured databases In Proc Intl Conf on Very Large Data Bases, pages
436–445, 1997
[273] J Goldstein, R Ramakrishnan, U Shaft, and J.-B Yu Processing queries by linear
constraints In Proc ACM Symposium on Principles of Database Systems, 1997.
[274] G Graefe Encapsulation of parallelism in the Volcano query processing system In
Proc ACM SIGMOD Conf on the Management of Data, 1990.
[275] G Graefe Query evaluation techniques for large databases ACM Computing Surveys,
25(2), 1993
[276] G Graefe, R Bunker, and S Cooper Hash joins and hash teams in Microsoft SQL Server
In Proc Intl Conf on Very Large Databases, 1998.
[277] G Graefe and D DeWitt The Exodus optimizer generator In Proc ACM SIGMOD
Conf on the Management of Data, 1987.
[278] G Graefe and K Ward Dynamic query optimization plans In Proc ACM SIGMOD
Conf on the Management of Data, 1989.
[279] M Graham, A Mendelzon, and M Vardi Notions of dependency satisfaction Journal
of the ACM, 33(1):105–129, 1986.
[280] G Grahne The Problem of Incomplete Information in Relational Databases
Springer-Verlag, 1991
[281] J Gray Notes on data base operating systems In Operating Systems: An Advanced
Course eds Bayer, Graham, and Seegmuller, Springer-Verlag, 1978.
[282] J Gray The transaction concept: Virtues and limitations In Proc Intl Conf on Very
Large Databases, 1981.
[283] J Gray Transparency in its place—the case against transparent access to geographically
distributed data Tandem Computers, TR-89-1, 1989.
[284] J Gray The Benchmark Handbook: for Database and Transaction Processing Systems.
Morgan Kaufmann, 1991
[285] J Gray, A Bosworth, A Layman, and H Pirahesh Data cube: A relational aggregation
operator generalizing group-by, cross-tab and sub-totals In Proc IEEE Intl Conf on
Data Engineering, 1996.
[286] J Gray, R Lorie, G Putzolu, and I Traiger Granularity of locks and degrees of
consistency in a shared data base In Proc of IFIP Working Conf on Modelling of
Data Base Management Systems, 1977.
[287] J Gray, P McJones, M Blasgen, B Lindsay, R Lorie, G Putzolu, T Price, and
I Traiger The recovery manager of the System R database manager ACM Computing
Surveys, 13(2):223–242, 1981.
[288] J Gray and A Reuter Transaction Processing: Concepts and Techniques Morgan
Kaufmann, 1992
[289] P Gray Logic, Algebra, and Databases John Wiley, 1984.
[290] P Griffiths and B Wade An authorization mechanism for a relational database system
ACM Transactions on Database Systems, 1(3):242–255, 1976.
[291] G Grinstein Visualization and data mining In Intl Conf on Knowledge Discovery
in Databases, 1996.
[292] S Guha, R Rastogi, and K Shim Cure: an efficient clustering algorithm for large
databases In Proc ACM SIGMOD Conf on Management of Data, 1998.
[293] A Gupta and I Mumick Materialized Views: Techniques, Implementations, and
Applications MIT Press, 1999.
[294] A Gupta, I Mumick, and V Subrahmanian Maintaining views incrementally In Proc.
ACM SIGMOD Conf on the Management of Data, 1993.
[295] A Guttman R-trees: a dynamic index structure for spatial searching In Proc ACM
SIGMOD Conf on the Management of Data, 1984.
[296] L Haas, W Chang, G Lohman, J McPherson, P Wilms, G Lapis, B Lindsay, H
Pirahesh, M Carey, and E Shekita Starburst mid-flight: As the dust clears IEEE
Transactions on Knowledge and Data Engineering, 2(1), 1990.
[297] L M Haas and A Tiwary, editors SIGMOD 1998, Proceedings of the ACM SIGMOD
International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA ACM Press, 1998.
[298] P Haas, J Naughton, S Seshadri, and L Stokes Sampling-based estimation of the
number of distinct values of an attribute In Proc Intl Conf on Very Large Databases,
1995
[299] P Haas and A Swami Sampling-based selectivity estimation for joins using augmented
frequent value statistics In Proc IEEE Intl Conf on Data Engineering, 1995.
[300] T Haerder and A Reuter Principles of transaction oriented database recovery—a
taxonomy ACM Computing Surveys, 15(4), 1982.
[301] U Halici and A Dogac Concurrency control in distributed databases through time
intervals and short-term locks IEEE Transactions on Software Engineering, 15(8):994–
1003, 1989
[302] M Hall Core Web Programming: HTML, Java, CGI, & Javascript Prentice-Hall,
1997
[303] P Hall Optimization of a simple expression in a relational data base system IBM
Journal of Research and Development, 20(3):244–257, 1976.
[304] G Hamilton, R Cattell, and M Fisher JDBC Database Access With Java: A Tutorial
and Annotated Reference Java Series Addison-Wesley, 1997.
[305] M Hammer and D McLeod Semantic integrity in a relational data base system In
Proc Intl Conf on Very Large Databases, 1975.
[306] J Han and Y Fu Discovery of multiple-level association rules from large databases
In Proc Intl Conf on Very Large Databases, 1995.
[307] D Hand Construction and Assessment of Classification Rules John Wiley & Sons,
Chichester, England, 1997
[308] E Hanson A performance analysis of view materialization strategies In Proc ACM
SIGMOD Conf on the Management of Data, 1987.
[309] E Hanson Rule condition testing and action execution in Ariel In Proc ACM SIGMOD
Conf on the Management of Data, 1992.
[310] V Harinarayan, A Rajaraman, and J Ullman Implementing data cubes efficiently In
Proc ACM SIGMOD Conf on the Management of Data, 1996.
[311] J Haritsa, M Carey, and M Livny On being optimistic about real-time constraints
In ACM Symp on Principles of Database Systems, 1990.
[312] J Harrison and S Dietrich Maintenance of materialized views in deductive databases:
An update propagation approach In Proc Workshop on Deductive Databases, 1992
[313] D Heckerman Bayesian networks for knowledge discovery In Advances in Knowledge
Discovery and Data Mining eds U.M Fayyad, G Piatetsky-Shapiro, P Smyth, and R.
Uthurusamy, MIT Press, 1996
[314] D Heckerman, H Mannila, D Pregibon, and R Uthurusamy, editors Proceedings of the
Third International Conference on Knowledge Discovery and Data Mining (KDD-97).
AAAI Press, 1997
[315] J Hellerstein Optimization and execution techniques for queries with expensive
methods Ph.D thesis, University of Wisconsin-Madison, 1995.
[316] J Hellerstein, P Haas, and H Wang Online aggregation In Proc ACM SIGMOD
Conf on the Management of Data, 1997.
[317] J Hellerstein, J Naughton, and A Pfeffer Generalized search trees for database
systems In Proc Intl Conf on Very Large Databases, 1995.
[318] J M Hellerstein, E Koutsoupias, and C H Papadimitriou On the analysis of indexing
schemes In Proc ACM Symposium on Principles of Database Systems, pages 249–256,
1997
[319] R Himmeroeder, G Lausen, B Ludaescher, and C Schlepphorst On a declarative
semantics for Web queries Lecture Notes in Computer Science, 1341:386–398, 1997.
[320] C.-T Ho, R Agrawal, N Megiddo, and R Srikant Range queries in OLAP data cubes
In Proc ACM SIGMOD Conf on the Management of Data, 1997.
[321] S Holzner XML Complete McGraw-Hill, 1998.
[322] D Hong, T Johnson, and U Chakravarthy Real-time transaction scheduling: A cost
conscious approach In Proc ACM SIGMOD Conf on the Management of Data, 1993.
[323] W Hong and M Stonebraker Optimization of parallel query execution plans in XPRS
In Proc Intl Conf on Parallel and Distributed Information Systems, 1991.
[324] W.-C Hou and G Ozsoyoglu Statistical estimators for aggregate relational algebra
queries ACM Transactions on Database Systems, 16(4), 1991.
[325] H Hsiao and D DeWitt A performance study of three high availability data replication
strategies In Proc Intl Conf on Parallel and Distributed Information Systems, 1991.
[326] J Huang, J Stankovic, K Ramamritham, and D Towsley Experimental evaluation of
real-time optimistic concurrency control schemes In Proc Intl Conf on Very Large
Databases, 1991.
[327] Y Huang, A Sistla, and O Wolfson Data replication for mobile computers In Proc.
ACM SIGMOD Conf on the Management of Data, 1994.
[328] Y Huang and O Wolfson A competitive dynamic data replication algorithm In Proc.
IEEE Intl Conf on Data Engineering, 1993.
[329] R Hull Managing semantic heterogeneity in databases: A theoretical perspective In
ACM Symp on Principles of Database Systems, 1997.
[330] R Hull and R King Semantic database modeling: Survey, applications, and research
issues ACM Computing Surveys, 19(3):201–260, 1987.
[331] R Hull and J Su Algebraic and calculus query languages for recursively typed complex
objects Journal of Computer and System Sciences, 47(1):121–156, 1993.
[332] R Hull and M Yoshikawa ILOG: declarative creation and manipulation of
object-identifiers In Proc Intl Conf on Very Large Databases, 1990.
[333] J Hunter Java Servlet Programming O’Reilly & Associates, Inc., 1998.
[334] T Imielinski and H Korth (eds.) Mobile Computing Kluwer Academic, 1996
[335] T Imielinski and W Lipski Incomplete information in relational databases Journal
of the ACM, 31(4):761–791, 1984.
[336] T Imielinski and H Mannila A database perspective on knowledge discovery
Communications of the ACM, 39(11):58–64, 1996.
[337] T Imielinski, S Viswanathan, and B Badrinath Energy efficient indexing on air In
Proc ACM SIGMOD Conf on the Management of Data, 1994.
[338] Y Ioannidis Query optimization In Handbook of Computer Science ed A.B Tucker,
CRC Press, 1996
[339] Y Ioannidis and S Christodoulakis Optimal histograms for limiting worst-case error
propagation in the size of join results ACM Transactions on Database Systems, 1993.
[340] Y Ioannidis and Y Kang Randomized algorithms for optimizing large join queries In
Proc ACM SIGMOD Conf on the Management of Data, 1990.
[341] Y Ioannidis and Y Kang Left-deep vs bushy trees: An analysis of strategy spaces
and its implications for query optimization In Proc ACM SIGMOD Conf on the
Management of Data, 1991.
[342] Y Ioannidis, R Ng, K Shim, and T Sellis Parametric query processing In Proc Intl.
Conf on Very Large Databases, 1992.
[343] Y Ioannidis and R Ramakrishnan Containment of conjunctive queries: Beyond
relations as sets ACM Transactions on Database Systems, 20(3):288–324, 1995.
[344] Y E Ioannidis Universality of serial histograms In Proc Intl Conf on Very Large
Databases, 1993.
[345] H Jagadish, D Lieuwen, R Rastogi, A Silberschatz, and S Sudarshan Dali: a
high performance main-memory storage manager In Proc Intl Conf on Very Large
Databases, 1994.
[346] A K Jain and R C Dubes Algorithms for Clustering Data Prentice-Hall, 1988.
[347] S Jajodia and D Mutchler Dynamic voting algorithms for maintaining the consistency
of a replicated database ACM Transactions on Database Systems, 15(2):230–280, 1990
[348] S Jajodia and R Sandhu Polyinstantiation integrity in multilevel relations In Proc.
IEEE Symp on Security and Privacy, 1990.
[349] M Jarke and J Koch Query optimization in database systems ACM Computing
Surveys, 16(2):111–152, 1984.
[350] K S Jones and P Willett, editors Readings in Information Retrieval Multimedia
Information and Systems Morgan Kaufmann Publishers, 1997
[351] J Jou and P Fischer The complexity of recognizing 3nf schemes Information
Processing Letters, 14(4):187–190, 1983.
[352] R J Bayardo Jr Efficiently mining long patterns from databases In Haas and Tiwary [297].
[353] N Kabra and D J DeWitt Efficient mid-query re-optimization of sub-optimal query
execution plans In Proc ACM SIGMOD Intl Conf on Management of Data, 1998.
[354] Y Kambayashi, M Yoshikawa, and S Yajima Query processing for distributed
databases using generalized semi-joins In Proc ACM SIGMOD Conf on the
Management of Data, 1982.
[355] P Kanellakis Elements of relational database theory In Handbook of Theoretical
Computer Science ed J Van Leeuwen, Elsevier, 1991.
[356] P Kanellakis Constraint programming and database languages: A tutorial In ACM
Symp on Principles of Database Systems, 1995.
[357] L Kaufman and P Rousseeuw Finding Groups in Data: An Introduction to Cluster
Analysis John Wiley and Sons, 1990.
[358] D Keim and H.-P Kriegel VisDB: a system for visualizing large databases In Proc.
ACM SIGMOD Conf on the Management of Data, 1995.
[359] D Keim and H.-P Kriegel Visualization techniques for mining large databases: A
comparison IEEE Transactions on Knowledge and Data Engineering, 8(6):923–938,
1996
[360] A Keller Algorithms for translating view updates to database updates for views
involving selections, projections, and joins In ACM Symp on Principles of Database Systems,
1985
[361] W Kent Data and Reality, Basic Assumptions in Data Processing Reconsidered.
North-Holland, 1978
[362] W Kent, R Ahmed, J Albert, M Ketabchi, and M.-C Shan Object identification in
multi-database systems In IFIP Intl Conf on Data Semantics, 1992.