THE DESIGN OF POSTGRES


Michael Stonebraker and Lawrence A. Rowe
Department of Electrical Engineering and Computer Sciences
University of California
Berkeley, CA 94720

Abstract

This paper presents the preliminary design of a new database management system, called POSTGRES, that is the successor to the INGRES relational database system. The main design goals of the new system are to:

1) provide better support for complex objects,

2) provide user extendibility for data types, operators and access methods,

3) provide facilities for active databases (i.e., alerters and triggers) and inferencing including forward- and backward-chaining,

4) simplify the DBMS code for crash recovery,

5) produce a design that can take advantage of optical disks, workstations composed of multiple tightly-coupled processors, and custom designed VLSI chips, and

6) make as few changes as possible (preferably none) to the relational model.

The paper describes the query language, programming language interface, system architecture, query processing strategy, and storage system for the new system.

1 INTRODUCTION

The INGRES relational database management system (DBMS) was implemented during 1975-1977 at the University of California. Since 1978 various prototype extensions have been made to support distributed databases [STON83a], ordered relations [STON83b], abstract data types [STON83c], and QUEL as a data type [STON84a]. In addition, we proposed but never prototyped a new application program interface [STON84b]. The University of California version of INGRES has been ‘‘hacked up enough’’ to make the inclusion of substantial new function extremely difficult. Another problem with continuing to extend the existing system is that many of our proposed ideas would be difficult to integrate into that system because of earlier design decisions. Consequently, we are building a new database system, called POSTGRES (POST inGRES).

This paper describes the design rationale, the features of POSTGRES, and our proposed implementation for the system. The next section discusses the design goals for the system. Sections 3 and 4 present the query language and programming language interface, respectively, to the system. Section 5 describes the system architecture including the process structure, query processing strategies, and storage system.

2 DISCUSSION OF DESIGN GOALS

The relational data model has proven very successful at solving most business data processing problems. Many commercial systems are being marketed that are based on the relational model and in time these systems will replace older technology DBMS’s. However, there are many engineering applications (e.g., CAD systems, programming environments, geographic data, and graphics) for which a conventional relational system is not suitable. We have embarked on the design and implementation of a new generation of DBMS’s, based on the relational model, that will provide the facilities required by these applications. This section describes the major design goals for this new system.

The first goal is to support complex objects [LORI83, STON83c]. Engineering data, in contrast to business data, is more complex and dynamic. Although the required data types can be simulated on a relational system, the performance of the applications is unacceptable. Consider the following simple example. The objective is to store a collection of geographic objects in a database (e.g., polygons, lines, and circles). In a conventional relational DBMS, a relation for each type of object with appropriate fields would be created:

POLYGON (id, other fields)

CIRCLE (id, other fields)

LINE (id, other fields)

To display these objects on the screen would require additional information that represented display characteristics for each object (e.g., color, position, scaling factor, etc.). Because this information is the same for all objects, it can be stored in a single relation:

DISPLAY( color, position, scaling, obj-type, object-id)

The ‘‘object-id’’ field is the identifier of a tuple in a relation identified by the ‘‘obj-type’’ field (i.e., POLYGON, CIRCLE, or LINE). Given this representation, the following commands would have to be executed to produce a display:

foreach OBJ in {POLYGON, CIRCLE, LINE} do

range of O is OBJ

range of D is DISPLAY

retrieve (D.all, O.all)

where D.object-id = O.id

and D.obj-type = OBJ

Unfortunately, this collection of commands will not be executed fast enough by any relational system to ‘‘paint the screen’’ in real time (i.e., one or two seconds). The problem is that regardless of how fast your DBMS is there are too many queries that have to be executed to fetch the data for the object. The feature that is needed is the ability to store the object in a field in DISPLAY so that only one query is required to fetch it. Consequently, our first goal is to correct this deficiency.
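The query pattern criticized here can be reproduced on a present-day relational engine. The following is a minimal sketch using Python's standard sqlite3 module, not anything taken from POSTGRES or INGRES. The column names (hyphens replaced by underscores), the single label column standing in for ‘‘other fields,’’ and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: hyphenated names from the text are written with
# underscores, and "other fields" is reduced to a single label column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE POLYGON (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE CIRCLE  (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE LINE    (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE DISPLAY (color TEXT, position TEXT, scaling REAL,
                          obj_type TEXT, object_id INTEGER);
    INSERT INTO POLYGON VALUES (10, 'hull');
    INSERT INTO CIRCLE  VALUES (40, 'wheel');
    INSERT INTO LINE    VALUES (17, 'axle');
    INSERT INTO DISPLAY VALUES ('red',   '0,0', 1.0, 'POLYGON', 10);
    INSERT INTO DISPLAY VALUES ('blue',  '5,5', 2.0, 'CIRCLE',  40);
    INSERT INTO DISPLAY VALUES ('green', '9,1', 1.5, 'LINE',    17);
""")

def fetch_display_list(conn):
    """Run one join per object relation, mirroring the foreach loop."""
    rows, queries_issued = [], 0
    for obj_type in ("POLYGON", "CIRCLE", "LINE"):
        # The table name comes from a fixed tuple, so formatting it into
        # the statement is safe here.
        sql = (f"SELECT D.color, D.obj_type, O.id, O.label "
               f"FROM DISPLAY AS D JOIN {obj_type} AS O ON D.object_id = O.id "
               f"WHERE D.obj_type = ?")
        rows.extend(conn.execute(sql, (obj_type,)).fetchall())
        queries_issued += 1
    return rows, queries_issued

display_rows, query_count = fetch_display_list(conn)
```

The number of statements grows with the number of object relations, which is the cost the text wants to avoid by storing the object in a field of DISPLAY.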

The second goal for POSTGRES is to make it easier to extend the DBMS so that it can be used in new application domains. A conventional DBMS has a small set of built-in data types and access methods. Many applications require specialized data types (e.g., geometric data types for CAD/CAM or a latitude and longitude position data type for mapping applications). While these data types can be simulated on the built-in data types, the resulting queries are verbose and confusing and the performance can be poor. A simple example using boxes is presented elsewhere [STON86]. Such applications would be best served by the ability to add new data types and new operators to a DBMS. Moreover, B-trees are only appropriate for certain kinds of data, and new access methods are often required for some data types. For example, K-D-B trees [ROBI81] and R-trees [GUTM84] are appropriate access methods for point and polygon data, respectively.

Consequently, our second goal is to allow new data types, new operators and new access methods to be included in the DBMS. Moreover, it is crucial that they be implementable by non-experts, which means easy-to-use interfaces should be preserved for any code that will be written by a user. Other researchers are pursuing a similar goal [DEWI85].

The third goal for POSTGRES is to support active databases and rules. Many applications are most easily programmed using alerters and triggers. For example, form-flow applications such as a bug reporting system require active forms that are passed from one user to another [TSIC82, ROWE82]. In a bug report application, the manager of the program maintenance group should be notified if a high priority bug that has been assigned to a programmer has not been fixed by a specified date. A database alerter is needed that will send a message to the manager calling his attention to the problem. Triggers can be used to propagate updates in the database to maintain consistency. For example, deleting a department tuple in the DEPT relation might trigger an update to delete all employees in that department in the EMP relation.

In addition, many expert system applications operate on data that is more easily described as rules rather than as data values. For example, the teaching load of professors in the EECS department can be described by the following rules:

1) The normal load is 8 contact hours per year.

2) The scheduling officer gets a 25 percent reduction.

3) The chairman does not have to teach.

4) Faculty on research leave receive a reduction proportional to their leave fraction.

5) Courses with less than 10 students generate credit at 0.1 contact hours per student.

6) Courses with more than 50 students generate EXTRA contact hours at a rate of 0.01 per student in excess of 50.

7) Faculty can have a credit balance or a deficit of up to 2 contact hours.
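To make the arithmetic behind these seven rules concrete, here is a minimal Python sketch under one possible reading of them. The Faculty record, the function names, and the way the reductions combine (multiplicatively, with the chairman rule taking precedence) are assumptions; the paper states the rules only in prose.

```python
from dataclasses import dataclass

NORMAL_LOAD = 8.0  # rule 1: contact hours per year

@dataclass
class Faculty:
    name: str
    is_chairman: bool = False
    is_scheduling_officer: bool = False
    leave_fraction: float = 0.0  # 0.0 = no leave, 1.0 = full-year leave

def required_load(f: Faculty) -> float:
    """Infer the required contact hours from rules 1-4."""
    if f.is_chairman:                 # rule 3
        return 0.0
    load = NORMAL_LOAD                # rule 1
    if f.is_scheduling_officer:       # rule 2
        load *= 0.75
    load *= 1.0 - f.leave_fraction    # rule 4
    return load

def course_credit(contact_hours: float, students: int) -> float:
    """Credit earned for one course under rules 5 and 6."""
    if students < 10:                 # rule 5
        return 0.1 * students
    if students > 50:                 # rule 6
        return contact_hours + 0.01 * (students - 50)
    return contact_hours

def balance_ok(required: float, earned: float) -> bool:
    """Rule 7: the credit or deficit may not exceed 2 contact hours."""
    return abs(earned - required) <= 2.0
```

Under this reading, a scheduling officer on half-time leave owes 8 × 0.75 × 0.5 = 3 contact hours. Encoding the rules as code makes each policy change a program change, which is the maintenance burden the text proposes to move into the DBMS.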

These rules are subject to frequent change. The leave status, course assignments, and administrative assignments (e.g., chairman and scheduling officer) all change frequently. It would be most natural to store the above rules in a DBMS and then infer the actual teaching load of individual faculty rather than storing teaching load as ordinary data and then attempting to enforce the above rules by a collection of complex integrity constraints. Consequently, our third goal is to support alerters, triggers, and general rule processing.

The fourth goal for POSTGRES is to reduce the amount of code in the DBMS written to support crash recovery. Most DBMS’s have a large amount of crash recovery code that is tricky to write, full of special cases, and very difficult to test and debug. Because one of our goals is to allow user-defined access methods, it is imperative that the model for crash recovery be as simple as possible and easily extendible. Our proposed approach is to treat the log as normal data managed by the DBMS which will simplify the recovery code and simultaneously provide support for access to the historical data.

Our next goal is to make use of new technologies whenever possible. Optical disks (even writable optical disks) are becoming available in the commercial marketplace. Although they have slower access characteristics, their price-performance and reliability may prove attractive. A system design that includes optical disks in the storage hierarchy will have an advantage. Another technology that we foresee is workstation-sized processors with several CPU’s. We want to design POSTGRES in such a way as to take advantage of these CPU resources. Lastly, a design that could utilize special purpose hardware effectively might make a convincing case for designing and implementing custom designed VLSI chips. Our fifth goal, then, is to investigate a design that can effectively utilize an optical disk, several tightly coupled processors and custom designed VLSI chips.

The last goal for POSTGRES is to make as few changes to the relational model as possible. First, many users in the business data processing world will become familiar with relational concepts and this framework should be preserved if possible. Second, we believe the original ‘‘spartan simplicity’’ argument made by Codd [CODD70] is as true today as in 1970. Lastly, there are many semantic data models but there does not appear to be a small model that will solve everyone’s problem. For example, a generalization hierarchy will not solve the problem of structuring CAD data and the design models developed by the CAD community will not handle generalization hierarchies. Rather than building a system that is based on a large, complex data model, we believe a new system should be built on a small, simple model that is extendible. We believe that we can accomplish our goals while preserving the relational model. Other researchers are striving for similar goals but they are using different approaches [AFSA85, ATKI84, COPE84, DERR85, LORI83, LUM85].

The remainder of the paper describes the design of POSTGRES and the basic system architecture we propose to use to implement the system.

3 POSTQUEL

This section describes the query language supported by POSTGRES. The relational model as described in the original definition by Codd [CODD70] has been preserved. A database is composed of a collection of relations that contain tuples with the same fields defined, and the values in a field have the same data type. The query language is based on the INGRES query language QUEL [HELD75]. Several extensions and changes have been made to QUEL so the new language is called POSTQUEL to distinguish it from the original language and other QUEL extensions described elsewhere [STON85a, KUNG84].

Most of QUEL is left intact. The following commands are included in POSTQUEL without any changes: Create Relation, Destroy Relation, Append, Delete, Replace, Retrieve, Retrieve into Result, Define View, Define Integrity, and Define Protection. The Modify command, which specified the storage structure for a relation, has been omitted because all relations are stored in a particular structure designed to support historical data. The Index command is retained so that other access paths to the data can be defined.

Although the basic structure of POSTQUEL is very similar to QUEL, numerous extensions have been made to support complex objects, user-defined data types and access methods, time varying data (i.e., versions, snapshots, and historical data), iteration queries, alerters, triggers, and rules. These changes are described in the subsections that follow.

3.1 Data Definition

The following built-in data types are provided:

1) integers,

2) floating point,

3) fixed length character strings,

4) unbounded varying length arrays of fixed types with an arbitrary number of dimensions,

5) POSTQUEL, and

6) procedure.

Scalar type fields (e.g., integer, floating point, and fixed length character strings) are referenced by the conventional dot notation (e.g., EMP.name).

Variable length arrays are provided for applications that need to store large homogeneous sequences of data (e.g., signal processing data, image, or voice). Fields of this type are referenced in the standard way (e.g., EMP.picture[i] refers to the i-th element of the picture array). A special case of arrays is the text data type which is a one-dimensional array of characters. Note that arrays can be extended dynamically.

Fields of type POSTQUEL contain a sequence of data manipulation commands. They are referenced by the conventional dot notation. However, if a POSTQUEL field contains a retrieve command, the data specified by that command can be implicitly referenced by a multiple dot notation (e.g., EMP.hobbies.battingavg) as proposed elsewhere [STON84a] and first suggested by Zaniolo in GEM [ZANI83].

Fields of type procedure contain procedures written in a general purpose programming language with embedded data manipulation commands (e.g., EQUEL [ALLM76] or Rigel [ROWE79]). Fields of type procedure and POSTQUEL can be executed using the Execute command. Suppose we are given a relation with the following definition

EMP(name, age, salary, hobbies, dept)

in which the ‘‘hobbies’’ field is of type POSTQUEL. That is, ‘‘hobbies’’ contains queries that retrieve data about the employee’s hobbies from other relations. The following command will execute the queries in that field:

execute (EMP.hobbies)

where EMP.name = ‘‘Smith’’

The value returned by this command can be a sequence of tuples with varying types because the field can contain more than one retrieve command and different commands can return different types of records. Consequently, the programming language interface must provide facilities to determine the type of the returned records and to access the fields dynamically.

Fields of type POSTQUEL and procedure can be used to represent complex objects with shared subobjects and to support multiple representations of data. Examples are given in the next section on complex objects.

In addition to these built-in data types, user-defined data types can be defined using an interface similar to the one developed for ADT-INGRES [STON83c, STON86]. New data types and operators can be defined with the user-defined data type facility.

create OBJECT (name = char[10], obj = postquel)

The table in figure 1 shows sample values for this relation. The relation contains the description of two complex objects named ‘‘apple’’ and ‘‘orange.’’ The object ‘‘apple’’ is composed of a polygon and a circle and the object ‘‘orange’’ is composed of a line and a polygon. Notice that both objects share the polygon with id equal to 10.


name      obj
------    ----------------------------------------------
apple     retrieve (POLYGON.all) where POLYGON.id = 10
          retrieve (CIRCLE.all) where CIRCLE.id = 40
------    ----------------------------------------------
orange    retrieve (LINE.all) where LINE.id = 17
          retrieve (POLYGON.all) where POLYGON.id = 10
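The idea of an object whose field holds queries, with a subobject shared between two objects, can be sketched in a few lines of Python. This is an illustration only: the dictionaries standing in for the POLYGON, CIRCLE, and LINE relations, their extra fields, and the (relation, id) encoding of each stored retrieve command are assumptions.

```python
# Hypothetical in-memory stand-ins for the POLYGON, CIRCLE, and LINE relations.
RELATIONS = {
    "POLYGON": {10: {"id": 10, "sides": 4}},
    "CIRCLE":  {40: {"id": 40, "radius": 2.5}},
    "LINE":    {17: {"id": 17, "length": 9.0}},
}

# Each obj field holds a sequence of (relation, id) pairs, standing in for
# the stored "retrieve (R.all) where R.id = n" commands of figure 1.
OBJECT = {
    "apple":  [("POLYGON", 10), ("CIRCLE", 40)],
    "orange": [("LINE", 17), ("POLYGON", 10)],
}

def execute(name):
    """Run the stored queries for one object and tag each result with its type."""
    results = []
    for relation, key in OBJECT[name]:
        results.append((relation, RELATIONS[relation][key]))
    return results

apple = execute("apple")
orange = execute("orange")
```

Because the results carry different record types, the caller has to inspect the type tag of each one, which is the requirement placed on the programming language interface. The polygon with id 10 is stored once and reached from both objects.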

an object stored in POLYGON, CIRCLE, and LINE:

create OBJECT(name=char[10], obj=postquel, display=cproc)

The value stored in the display field is a procedure written in C that queries the database to fetch the subobjects that make up the object and that creates the display list representation for the object.

This solution has two problems: the code is repeated in every OBJECT tuple and the C procedure replicates the queries stored in the object field to retrieve the subobjects. These problems can be solved by storing the procedure in a separate relation (i.e., normalizing the database design) and by passing the object to the procedure as an argument. The definition of the relation in which the procedures will be stored is:

create OBJPROC(name=char[12], proc=cproc)

append to OBJPROC(name=‘‘display-list’’, proc=‘‘ source code ’’)

Now, the entry in the display field for the ‘‘apple’’ object is


execute (OBJECT.obj)

where OBJECT.name=argument

This solution is somewhat complex but it stores only one copy of the procedure’s source code in the database and it stores only one copy of the commands to fetch the data that represents the object.

Fields of type POSTQUEL and procedure can be efficiently supported through a combination of compilation and precomputation described in sections 4 and 5.

3.3 Time Varying Data

POSTQUEL allows users to save and query historical data and versions [KATZ85, WOOD83]. By default, data in a relation is never deleted or updated. Conventional retrievals always access the current tuples in the relation. Historical data can be accessed by indicating the desired time when defining a tuple variable. For example, to access historical employee data a user writes

retrieve (E.all)

from E in EMP[‘‘7 January 1985’’]

which retrieves all records for employees that worked for the company on 7 January 1985. The From-clause, which is similar to the SQL mechanism to define tuple variables [ASTR76], replaces the QUEL Range command. The Range command was removed from the query language because it defined a tuple variable for the duration of the current user program. Because queries can be stored as the value of a field, the scope of tuple variable definitions must be constrained. The From-clause makes the scope of the definition the current query.

This bracket notation for accessing historical data implicitly defines a snapshot [ADIB80]. The implementation of queries that access this snapshot, described in detail in section 5, searches back through the history of the relation to find the appropriate tuples. The user can materialize the snapshot by executing a Retrieve-into command that will make a copy of the data in another relation.
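One way to picture a snapshot is as a filter over tuple versions that are never overwritten. The Python sketch below assumes each version carries a valid_from and a valid_to timestamp; that layout, the field names, and the sample data are illustrative assumptions and are not the storage design the paper describes in section 5.

```python
from datetime import date

# Hypothetical storage layout: every tuple version records when it became
# valid and when it was superseded (None means "still current").
EMP_HISTORY = [
    {"name": "Smith", "salary": 20000,
     "valid_from": date(1984, 3, 1), "valid_to": date(1985, 6, 1)},
    {"name": "Smith", "salary": 24000,
     "valid_from": date(1985, 6, 1), "valid_to": None},
    {"name": "Jones", "salary": 30000,
     "valid_from": date(1985, 2, 1), "valid_to": None},
]

def snapshot(history, as_of):
    """Return the tuple versions that were valid at time as_of, i.e. EMP[as_of]."""
    return [
        row for row in history
        if row["valid_from"] <= as_of
        and (row["valid_to"] is None or as_of < row["valid_to"])
    ]

def current(history):
    """Conventional retrieval: only the versions that have not been superseded."""
    return [row for row in history if row["valid_to"] is None]

jan_1985 = snapshot(EMP_HISTORY, date(1985, 1, 7))
```

Here the snapshot for 7 January 1985 contains only Smith's earlier salary record, since Jones had not yet been hired and Smith's raise had not yet taken effect.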

Applications that do not want to save historical data can specify a cutoff point for a relation. Data that is older than the cutoff point is deleted from the database. Cutoff points are defined by the Discard command. The command

discard EMP before ‘‘1 week’’

deletes data in the EMP relation that is more than 1 week old. The commands

discard EMP before ‘‘now’’

and

discard EMP

retain only the current data in EMP.

It is also possible to write queries that reference data which is valid between two dates. The notation

relation-name[date1, date2]

specifies the relation containing all tuples that were in the relation at some time between date1 and date2. Either or both of these dates can be omitted to specify all data in the relation from the time it was created until a fixed date (i.e., relation-name[,date]), all data in the relation from a fixed date to the present (i.e., relation-name[date,]), or all data that was ever in the relation (i.e., relation-name[ ]). For example, the query


Finally, POSTGRES provides support for versions. A version can be created from a relation or a snapshot. Updates to a version do not modify the underlying relation and updates to the underlying relation will be visible through the version unless the value has been modified in the version. Versions are defined by the Newversion command. The command

newversion EMPTEST from EMP

creates a version named EMPTEST that is derived from the EMP relation. If the user wants to create a version that is not changed by subsequent updates to the underlying relation as in most source code control systems [TICH82], he can create a version off a snapshot.

A Merge command is provided that will merge the changes made in a version back into the underlying relation. An example of a Merge command is

merge EMPTEST into EMP

The Merge command will use a semi-automatic procedure to resolve updates to the underlying relation and the version that conflict [GARC84].
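The visibility rules for versions can be illustrated with an overlay over the base relation. The Python sketch below is a simplification built on assumptions: the Version class and its method names are hypothetical, relations are reduced to dictionaries keyed by employee name, deletions are ignored, and merge simply lets the version's value win rather than applying the semi-automatic conflict resolution the text mentions.

```python
class Version:
    """A version layered over a base relation (both keyed by tuple id)."""

    def __init__(self, base):
        self.base = base          # shared reference: base updates stay visible
        self.changes = {}         # tuple id -> value written in the version

    def read(self, key):
        # A value modified in the version hides the base value.
        return self.changes.get(key, self.base.get(key))

    def write(self, key, value):
        # Updates to the version never touch the underlying relation.
        self.changes[key] = value

    def merge(self):
        """Copy the version's changes back into the base relation."""
        self.base.update(self.changes)
        self.changes.clear()

EMP = {"Smith": 20000, "Jones": 30000}
EMPTEST = Version(EMP)            # newversion EMPTEST from EMP
EMPTEST.write("Smith", 22000)     # visible only through the version
EMP["Jones"] = 31000              # base update shows through the version
before_merge = (EMP["Smith"], EMPTEST.read("Smith"), EMPTEST.read("Jones"))
EMPTEST.merge()                   # merge EMPTEST into EMP
```

A version created off a snapshot would instead hold a copy of the base at creation time, so later base updates would not show through.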

This section described POSTGRES support for time varying data. The strategy for implementing these features is described below in the section on system architecture.

3.4 Iteration Queries, Alerters, Triggers, and Rules

This section describes the POSTQUEL commands for specifying iterative execution of queries, alerters [BUNE79], triggers [ASTR76], and rules.

Iterative queries are required to support transitive closure [GUTM84, KUNG84]. Iteration is specified by appending an asterisk (‘‘*’’) to a command that should be repetitively executed. For example, to construct a relation that includes all people managed by someone either directly or indirectly a Retrieve*-into command is used. Suppose one is given an employee relation with a name and manager field:

create EMP(name=char[20], ..., mgr=char[20], ...)

The following query creates a relation that contains all employees who work for Jones:

retrieve* into SUBORDINATES(E.name, E.mgr)

from E in EMP, S in SUBORDINATES

where E.name=‘‘Jones’’

or E.mgr=S.name

This command continues to execute the Retrieve-into command until there are no changes made to the SUBORDINATES relation.
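The run-until-no-change behavior of Retrieve*-into is a fixpoint computation, and can be sketched in Python as follows. The function name and the sample EMP tuples are hypothetical. The loop applies the same qualification as the query (E.name equals the root, or E.mgr matches a name already in SUBORDINATES) and stops when a pass adds nothing.

```python
# Hypothetical EMP tuples as (name, mgr) pairs.
EMP = [
    ("Jones", "Boss"),
    ("Adams", "Jones"),
    ("Baker", "Adams"),
    ("Clark", "Smith"),
]

def retrieve_star_subordinates(emp, root):
    """Repeat the retrieve-into step until SUBORDINATES stops changing."""
    subordinates = set()
    while True:
        names = {name for name, _ in subordinates}
        # One pass of: where E.name = root or E.mgr = S.name
        new_rows = {
            (name, mgr) for name, mgr in emp
            if name == root or mgr in names
        }
        if new_rows <= subordinates:   # no changes: fixpoint reached
            return subordinates
        subordinates |= new_rows

result = retrieve_star_subordinates(EMP, "Jones")
```

As with the query in the text, the result includes the tuple for Jones as well as everyone below Jones in the management chain.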

The ‘‘*’’ modifier can be appended to any of the POSTQUEL data manipulation commands: Append, Delete, Execute, Replace, Retrieve, and Retrieve-into. Complex iterations, like the A-* heuristic search algorithm, can be specified using sequences of these iteration queries [STON85b].

Alerters and triggers are specified by adding the keyword ‘‘always’’ to a query. For example, an alerter is specified by a Retrieve command such as

retrieve always (EMP.all)

where EMP.name = ‘‘Bill’’

This command returns data to the application program that issued it whenever Bill’s employee record is changed.¹ A trigger is an update query (i.e., Append, Replace, or Delete command) with an ‘‘always’’ keyword. For example, the command

delete always DEPT

where count(EMP.name by DEPT.dname

where EMP.dept = DEPT.dname) = 0

defines a trigger that will delete DEPT records for departments with no employees.

Iteration queries differ from alerters and triggers in that iteration queries run until they cease to have an effect while alerters and triggers run indefinitely. An efficient mechanism to awaken ‘‘always’’ commands is described in the system architecture section.

‘‘Always’’ commands support a forward-chaining control structure in which an update wakes up a collection of alerters and triggers that can wake up other commands. This process terminates when no new commands are awakened. POSTGRES also provides support for a backward-chaining control structure.
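The forward-chaining behavior can be sketched as a loop that keeps running triggers until a full pass changes nothing. The Python below is a naive polling illustration, not the wake-up mechanism of the system architecture section. The function names, the second trigger (modeled on the earlier example of deleting the employees of a deleted department), and the sample data are assumptions.

```python
def delete_empty_departments(db):
    """Trigger: delete DEPT tuples for departments with no employees."""
    staffed = {dept for _, dept in db["EMP"]}
    empty = {d for d in db["DEPT"] if d not in staffed}
    db["DEPT"] -= empty
    return bool(empty)

def delete_orphaned_employees(db):
    """Trigger: delete EMP tuples whose department no longer exists."""
    orphans = {(name, dept) for name, dept in db["EMP"] if dept not in db["DEPT"]}
    db["EMP"] -= orphans
    return bool(orphans)

def run_always(db, commands):
    """Re-run the commands until a full pass awakens none of them."""
    passes = 0
    changed = True
    while changed:
        changed = False
        for command in commands:
            if command(db):
                changed = True
        passes += 1
    return passes

db = {
    "DEPT": {"toy", "shoe", "candy"},
    "EMP": {("Smith", "toy"), ("Jones", "hardware")},
}
passes = run_always(db, [delete_empty_departments, delete_orphaned_employees])
```

The first pass removes the two unstaffed departments and the employee whose department does not exist; the second pass finds nothing to do, so the chain terminates.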

The conventional approach to supporting inference is to extend the view mechanism (or something equivalent) with additional capabilities (e.g., [ULLM85, WONG84, JARK85]). The canonical example is the definition of the ANCESTOR relation based on a stored relation PARENT:

PARENT (parent-of, offspring)

Ancestor can then be defined by the following commands:

range of P is PARENT

range of A is ANCESTOR

define view ANCESTOR (P.all)

define view* ANCESTOR (A.parent-of, P.offspring)

where A.offspring = P.parent-of

Notice that the ANCESTOR view is defined by multiple commands that may involve recursion. A query such as:

retrieve (ANCESTOR.parent-of)

where ANCESTOR.offspring = ‘‘Bill’’

is processed by extensions to a standard query modification algorithm [STON75] to generate a recursive command or a sequence of commands on stored relations. To support this mechanism, the query optimizer must be extended to handle these commands.

This approach works well when there are only a few commands which define a particular view and when the commands do not generate conflicting answers. This approach is less successful if either of these conditions is violated as in the following example:

define view DESK-EMP (EMP.all, desk = ‘‘steel’’) where EMP.age < 40

define view DESK-EMP (EMP.all, desk = ‘‘wood’’) where EMP.age >= 40

define view DESK-EMP (EMP.all, desk = ‘‘wood’’) where EMP.name = ‘‘hotshot’’

define view DESK-EMP (EMP.all, desk = ‘‘steel’’) where EMP.name = ‘‘bigshot’’

In this example, employees over 40 get a wood desk, those under 40 get a steel desk. However, ‘‘hotshot’’ and ‘‘bigshot’’ are exceptions to these rules. ‘‘Hotshot’’ is given a wood desk and ‘‘bigshot’’ is given a steel desk, regardless of their ages. In this case, the query:

retrieve (DESK-EMP.desk) where DESK-EMP.name = ‘‘bigshot’’

will require 4 separate commands to be optimized and run. Moreover, both the second and the fourth definitions produce an answer to the query, and the two answers differ. In the case that a larger number of view definitions is used in the specification of an object, then the important performance parameter will be isolating the view definitions which are actually useful. Moreover, when there are conflicting view definitions (e.g., the general rule and then exceptional cases), one requires a priority scheme to decide which of conflicting definitions to utilize. The scheme described below works well in such situations.

¹ Strictly speaking, the data is returned to the program through a portal, which is defined in section 4.

POSTGRES supports backward-chaining rules by virtual columns (i.e., columns for which no value is stored). Data in such columns is inferred on demand from rules and cannot be directly updated, except by adding or dropping rules. Rules are specified by adding the keyword ‘‘demand’’ to a query. Hence, for the DESK-EMP example, the EMP relation would have a virtual field, named ‘‘desk,’’ that would be defined by four rules:

replace demand EMP (desk = ‘‘steel’’) where EMP.age < 40

replace demand EMP (desk = ‘‘wood’’) where EMP.age >= 40

replace demand EMP (desk = ‘‘wood’’) where EMP.name = ‘‘hotshot’’

replace demand EMP (desk = ‘‘steel’’) where EMP.name = ‘‘bigshot’’

The third and fourth commands would be defined at a higher priority than the first and second. A query that accessed the desk field would cause the ‘‘demand’’ commands to be processed to determine the appropriate desk value for each EMP tuple retrieved.
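Priority-ordered evaluation of the four ‘‘demand’’ rules can be sketched in Python. The numeric priorities, the rule tuples, and the desk function are assumptions made for illustration; the text says only that the two exception rules rank above the two general ones.

```python
# Each rule: (priority, predicate, desk value). Higher priority wins.
# The numeric priorities are an assumption; the text only says the two
# exception rules rank above the two general ones.
RULES = [
    (0, lambda e: e["age"] < 40, "steel"),
    (0, lambda e: e["age"] >= 40, "wood"),
    (1, lambda e: e["name"] == "hotshot", "wood"),
    (1, lambda e: e["name"] == "bigshot", "steel"),
]

def desk(emp):
    """Infer the virtual desk field on demand from the applicable rules."""
    matching = [(priority, value) for priority, pred, value in RULES if pred(emp)]
    if not matching:
        return None
    return max(matching, key=lambda pv: pv[0])[1]
```

For example, desk({"name": "bigshot", "age": 55}) matches both the general rule for employees aged 40 or more and the exception for bigshot, and the exception wins because of its higher priority. No desk value is stored; it is computed each time the field is read.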

This subsection has described a collection of facilities provided in POSTQUEL to support complex queries (e.g., iteration) and active databases (e.g., alerters, triggers, and rules). Efficient techniques for implementing these facilities are given in section 5.

4 PROGRAMMING LANGUAGE INTERFACE

This section describes the programming language interface (HITCHING POST) to POSTGRES. We had three objectives when designing the HITCHING POST and POSTGRES facilities. First, we wanted to design and implement a mechanism that would simplify the development of browsing style applications. Second, we wanted HITCHING POST to be powerful enough that all programs that need to access the database including the ad hoc terminal monitor and any preprocessors for embedded query languages could be written with the interface. And lastly, we wanted to provide facilities that would allow an application developer to tune the performance of his program (i.e., to trade flexibility and reliability for performance).

Any POSTQUEL command can be executed in a program. In addition, a mechanism, called a ‘‘portal,’’ is provided that allows the program to retrieve data from the database. A portal is similar to a cursor [ASTR76], except that it allows random access to the data specified by the query and the program can fetch more than one record at a time. The portal mechanism described here is different than the one we previously designed [STON84b], but the goal is still the same. The following subsections describe the commands for defining portals and accessing data through them and the facilities for improving the performance of query execution (i.e., compilation and fast-path).

4.1 Portals

A portal is defined by a Retrieve-portal or Execute-portal command. For example, the following command defines a portal named P:

retrieve portal P(EMP.all)

where EMP.age < 40

This command is passed to the backend process which generates a query plan to fetch the data. The program can now issue commands to fetch data from the backend process to the frontend process or to change the ‘‘current position’’ of the portal. The portal can be thought of as a query plan in execution in the DBMS process and a buffer containing fetched data in the application process.

The program fetches data from the backend into the buffer by executing a Fetch command. For example, the command

fetch 20 into P

fetches the first twenty records in the portal into the frontend program. These records can be accessed by subscript and field references on P. For example, P[i] refers to the i-th record returned by the last Fetch command and P[i].name refers to the ‘‘name’’ field in the i-th record. Subsequent fetches replace the previously fetched data in the frontend program buffer.

The concept of a portal is that the data in the buffer is the data currently being displayed by the browser. Commands entered by the user at the terminal are translated into database commands that change the data in the buffer which is then redisplayed. Suppose, for example, the user entered a command to scroll forward half a screen. This command would be translated by the frontend program (i.e., the browser) into a Move command followed by a Fetch command. The following two commands would fetch data into the buffer which when redisplayed would appear to scroll the data forward by one half screen:

move P forward 10

fetch 20 into P

The Move command repositions the ‘‘current position’’ to point to the 11-th tuple in the portal and the Fetch command fetches tuples 11 through 30 in the ordering established by executing the query plan. The ‘‘current position’’ of the portal is the first tuple returned by the last Fetch command. If Move commands have been executed since the last Fetch command, the ‘‘current position’’ is the first tuple that would be returned by a Fetch command if it were executed.

The Move command has other variations that simplify the implementation of other browsing commands. Variations exist that allow the portal position to be moved forward or backward, to an absolute position, or to the first tuple that satisfies a predicate. For example, to scroll backwards one half screen, the following commands are issued:

move P backward 10

fetch 20 into P

In addition to keeping track of the ‘‘current position,’’ the backend process also keeps track of the sequence number of the current tuple so that the program can move to an absolute position. For example, to scroll forward to the 63-rd tuple the program executes the command:

move P forward to 63

Lastly, a Move command is provided that will search forward or backward to the first tuple that satisfies a predicate as illustrated by the following command that moves forward to the first employee whose salary is greater than $25,000:

move P forward to salary > 25K

This command positions the portal on the first qualifying tuple. A Fetch command will fetch this tuple and the ones immediately following it which may not satisfy the predicate. To fetch only tuples that satisfy the predicate, the Fetch command is used as follows:


fetch 20 into P where salary > 25K

The backend process will continue to execute the query plan until 20 tuples have been found that satisfy the predicate or until the portal data is exhausted.
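The Move and Fetch semantics described in this subsection can be summarized with a small in-memory model. The Python sketch below is an illustration only: the Portal class, its method names, and the sample rows are hypothetical, and it ignores the split between backend query plan and frontend buffer. It does follow the stated rule that the current position is the first tuple returned by the last Fetch, so a Fetch does not advance past the tuples it returns.

```python
class Portal:
    """Sketch of a portal over an already ordered list of tuples."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.pos = 0                      # 0-based index of the current position

    def move_forward(self, n):
        self.pos = min(self.pos + n, len(self.rows))

    def move_backward(self, n):
        self.pos = max(self.pos - n, 0)

    def move_to(self, seq):
        """Move to an absolute, 1-based sequence number."""
        self.pos = min(max(seq - 1, 0), len(self.rows))

    def move_forward_to(self, pred):
        """Advance to the first tuple at or after the current position that satisfies pred."""
        while self.pos < len(self.rows) and not pred(self.rows[self.pos]):
            self.pos += 1

    def fetch(self, n, where=None):
        """Return up to n tuples starting at the current position."""
        out, first = [], None
        for i in range(self.pos, len(self.rows)):
            if where is None or where(self.rows[i]):
                if first is None:
                    first = i
                out.append(self.rows[i])
                if len(out) == n:
                    break
        if first is not None:
            self.pos = first              # current position = first tuple returned
        return out

# Hypothetical data: 100 employees numbered 1..100 in portal order.
P = Portal({"seq": i, "salary": 20000 + 100 * i} for i in range(1, 101))
first_page = P.fetch(20)                  # fetch 20 into P
P.move_forward(10)                        # move P forward 10
second_page = P.fetch(20)                 # fetch 20 into P
```

With these definitions the second fetch returns tuples 11 through 30, matching the half-screen scroll example, and fetch(20, where=...) keeps scanning until 20 qualifying tuples are found or the data runs out.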

Portals differ significantly from cursors in the way data is updated. Once a cursor is positioned on a record, it can be modified or deleted (i.e., updated directly). Data in a portal cannot be updated directly. It is updated by Delete or Replace commands on the relations from which the portal data is taken. Suppose the user entered commands to a browser that change Smith’s salary. Assuming that Smith’s record is already in the buffer, the browser would translate this request into the following sequence of commands:

In addition to the Retrieve-portal command, portals can be defined by an Execute command. For example, suppose the EMP relation had a field of type POSTQUEL named ‘‘hobbies’’

EMP (name, salary, age, hobbies)

that contained commands to retrieve a person’s hobbies from the following relations:

SOFTBALL (name, position, batting-avg)

COMPUTERS (name, isowner, brand, interest)

An application program can define a portal that will range over the tuples describing a person’s hobbies as follows:

execute portal H(EMP.hobbies)

where EMP.name = ‘‘Smith’’

This command defines a portal, named ‘‘H,’’ that is bound to Smith’s hobby records. Since a person can have several hobbies, represented by more than one Retrieve command in the ‘‘hobbies’’ field, the records in the buffer may have different types. Consequently, HITCHING POST must provide routines that allow the program to determine the number of fields, and the type, name, and value of each field in each record fetched into the buffer.
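One way to picture a buffer holding records of different types is to let each record describe itself. The following Python sketch is hypothetical: the names `Field`, `Record`, and `describe`, the type names, and the sample values are all assumptions made for illustration and are not the HITCHING POST routines.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Field:
    name: str        # field name, e.g. "position"
    type_name: str   # type name; the spellings used below are assumptions
    value: Any


@dataclass
class Record:
    relation: str
    fields: List[Field]

    def num_fields(self) -> int:
        return len(self.fields)


def describe(record: Record) -> List[str]:
    """Return one 'name:type=value' string per field of a fetched record."""
    return [f"{f.name}:{f.type_name}={f.value!r}" for f in record.fields]


# A buffer for portal H may mix SOFTBALL and COMPUTERS records (values invented).
buffer = [
    Record("SOFTBALL", [Field("name", "text", "Smith"),
                        Field("position", "text", "catcher"),
                        Field("batting-avg", "float", 0.31)]),
    Record("COMPUTERS", [Field("name", "text", "Smith"),
                         Field("isowner", "bool", True),
                         Field("brand", "text", "Sun"),
                         Field("interest", "text", "hacking")]),
]
```

A browser can then loop over `buffer` and lay out each record from `num_fields()` and `describe()` without knowing in advance which relation it came from.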

4.2 Compilation and Fast-Path

This subsection describes facilities to improve the performance of query execution. Two facilities are provided: query compilation and fast-path. Any POSTQUEL command, including portal commands, can take advantage of these facilities.

POSTGRES has a system catalog in which application programs can store queries that are to be compiled. The catalog is named ‘‘CODE’’ and has the following structure:

CODE(id, owner, command)

The ‘‘id’’ and ‘‘owner’’ fields form a unique identifier for each stored command. The ‘‘command’’ field holds the command that is to be compiled. Suppose the programmer of the relation browser described above wanted to compile the Replace command that was used to update the employee’s salary field. The program could append the command, with suitable parameters, to the CODE catalog as follows:

append to CODE(id=1, owner=‘‘browser’’,

command=‘‘replace EMP(salary=$1) where EMP.name=$2’’)

‘‘$1’’ and ‘‘$2’’ denote the arguments to the command. Now, to execute the Replace command that updates Smith’s salary shown above, the program executes the following command:

execute (CODE.command)

with (NewSalary, ‘‘Smith’’)

where CODE.id=1 and CODE.owner=‘‘browser’’

This command executes the Replace command after substituting the arguments.
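The substitution step can be sketched as follows. This is a minimal illustration in Python, assuming a dictionary in place of the CODE catalog; the function names `append_code` and `expand` and the naive quoting rule (strings are wrapped in double quotes with no escaping) are assumptions, and the sketch only builds the command text rather than running it.

```python
import re
from typing import Dict, Sequence, Tuple

# Stand-in for the CODE catalog: (id, owner) -> stored command text.
CODE: Dict[Tuple[int, str], str] = {}


def append_code(cmd_id: int, owner: str, command: str) -> None:
    """Model of 'append to CODE(id=..., owner=..., command=...)'."""
    CODE[(cmd_id, owner)] = command


def _literal(value: object) -> str:
    # Naive rendering of an argument as a literal (assumption: no escaping needed).
    return f'"{value}"' if isinstance(value, str) else str(value)


def expand(cmd_id: int, owner: str, args: Sequence[object]) -> str:
    """Return the stored command with $1, $2, ... replaced by the arguments."""
    command = CODE[(cmd_id, owner)]
    return re.sub(r"\$(\d+)",
                  lambda m: _literal(args[int(m.group(1)) - 1]),
                  command)


append_code(1, "browser", "replace EMP(salary=$1) where EMP.name=$2")
```

Calling `expand(1, "browser", (30000, "Smith"))` yields the Replace command with both parameters filled in, which corresponds to what the Execute command does before running it.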

Executing commands stored in the CODE catalog does not by itself make the command run any faster. However, a compilation demon is always executing that examines the entries in the CODE catalog in every database and compiles the queries. Assuming the compilation demon has compiled the Replace command in CODE, the query should run substantially faster because the time to parse and optimize the query is avoided. Section 5 describes a general purpose mechanism for invalidating compiled queries when the schema changes.
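The saving can be illustrated with a small cache of compiled plans. The Python sketch below is hypothetical throughout: the class `Catalog`, the method `demon_pass`, and the placeholder ‘‘plan’’ (a tuple of tokens) are assumptions, and dropping a plan when its command is re-appended is a simplification that is not the schema-change invalidation mechanism of Section 5.

```python
from typing import Dict, Tuple

Key = Tuple[int, str]  # (id, owner)


class Catalog:
    """Toy CODE catalog with an asynchronous 'compilation demon' pass."""

    def __init__(self) -> None:
        self.commands: Dict[Key, str] = {}
        self.plans: Dict[Key, tuple] = {}
        self.parse_count = 0  # how many times parse + optimize actually ran

    def _parse_and_optimize(self, text: str) -> tuple:
        self.parse_count += 1
        return tuple(text.split())  # placeholder for a real query plan

    def append(self, key: Key, text: str) -> None:
        self.commands[key] = text
        self.plans.pop(key, None)  # assumption: a new command drops any old plan

    def demon_pass(self) -> int:
        """Compile every entry that has no plan yet; return how many were compiled."""
        compiled = 0
        for key, text in self.commands.items():
            if key not in self.plans:
                self.plans[key] = self._parse_and_optimize(text)
                compiled += 1
        return compiled

    def plan_for(self, key: Key) -> tuple:
        """Use the compiled plan if the demon produced one, else parse at run time."""
        plan = self.plans.get(key)
        if plan is None:
            plan = self._parse_and_optimize(self.commands[key])
        return plan
```

Before the demon has visited an entry, every `plan_for` call pays the parse and optimize cost; afterwards `parse_count` stops growing, which is the effect described above.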

Compiled queries are faster than queries that are parsed and optimized at run-time, but for some applications even they are not fast enough. The problem is that the Execute command that invokes the compiled query still must be processed. Consequently, a fast-path facility is provided that avoids this overhead. In the Execute command above, the only variability is the argument list and the unique identifier that selects the query to be run. HITCHING POST has a run-time routine that allows this information to be passed to the backend in a binary format. For example, the following function call invokes the Replace command described above:

exec-fp(1, ‘‘browser’’, NewSalary, ‘‘Smith’’)

This function sends a message to the backend that includes only the information needed to determine where each value is located. The backend retrieves the compiled plan (possibly from the buffer pool), substitutes the parameters without type checking, and invokes the query plan. This path through the backend is hand-optimized to be very fast so the overhead to invoke a compiled query plan is minimal.

This subsection has described facilities that allow an application programmer to improve the performance of a program by compiling queries or by using a special fast-path facility.
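To make the idea of a binary fast-path message concrete, the following Python sketch packs the identifier and argument list into bytes and unpacks them again. The wire layout (a fixed header, then tagged arguments) and the function names `pack_exec_fp` and `unpack_exec_fp` are assumptions chosen for illustration; the paper does not specify the actual format.

```python
import struct
from typing import List, Tuple, Union

Arg = Union[int, str]
_HEADER = "!iHH"  # assumed layout: command id, owner length, argument count


def pack_exec_fp(cmd_id: int, owner: str, *args: Arg) -> bytes:
    """Build a compact binary request for a stored, compiled command."""
    owner_b = owner.encode("utf-8")
    parts = [struct.pack(_HEADER, cmd_id, len(owner_b), len(args)), owner_b]
    for arg in args:
        if isinstance(arg, int):
            parts.append(b"i" + struct.pack("!q", arg))           # 8-byte integer
        elif isinstance(arg, str):
            data = arg.encode("utf-8")
            parts.append(b"s" + struct.pack("!I", len(data)) + data)
        else:
            raise TypeError(f"unsupported argument type: {type(arg).__name__}")
    return b"".join(parts)


def unpack_exec_fp(msg: bytes) -> Tuple[int, str, List[Arg]]:
    """Recover (id, owner, args) by walking the offsets, with no parsing step."""
    cmd_id, owner_len, nargs = struct.unpack_from(_HEADER, msg, 0)
    off = struct.calcsize(_HEADER)
    owner = msg[off:off + owner_len].decode("utf-8")
    off += owner_len
    args: List[Arg] = []
    for _ in range(nargs):
        tag = msg[off:off + 1]
        off += 1
        if tag == b"i":
            (value,) = struct.unpack_from("!q", msg, off)
            off += 8
        elif tag == b"s":
            (length,) = struct.unpack_from("!I", msg, off)
            off += 4
            value = msg[off:off + length].decode("utf-8")
            off += length
        else:
            raise ValueError(f"unknown argument tag: {tag!r}")
        args.append(value)
    return cmd_id, owner, args
```

For example, `pack_exec_fp(1, "browser", 30000, "Smith")` corresponds to the exec-fp call in the text. The receiver locates each value by offset alone, which is the property that lets the backend skip parsing the Execute command.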

5 SYSTEM ARCHITECTURE

This section describes how we propose to implement POSTGRES. The first subsection describes the process structure. The second subsection describes how query processing will be implemented, including fields of type POSTQUEL, procedure, and user-defined data type. The third subsection describes how alerters, triggers, and rules will be implemented. And finally, the fourth subsection describes the storage system for implementing time varying data.

5.1 Process Structure

DBMS code must run as a separate process from the application programs that access the database in order to provide data protection. The process structure can use one DBMS process per application program (i.e., a process-per-user model [STON81]) or one DBMS process for all application programs (i.e., a server model). The server model has many performance benefits (e.g., sharing of open file descriptors and buffers and optimized task switching and message sending overhead) in a large machine environment in which high performance is critical. However, this approach requires that a fairly complete special-purpose operating system be built. In contrast, the process-per-user model is simpler to implement but will not perform as well on most conventional operating systems. We decided after much soul searching to implement POSTGRES using a process-per-user model architecture because of our limited programming resources. POSTGRES is an ambitious undertaking and we believe the additional complexity introduced by the server architecture was not worth the additional risk of not getting the system running. Our current plan then is to implement POSTGRES as a process-per-user model on Unix 4.3 BSD.

The process structure for POSTGRES is shown in figure 3. The POSTMASTER will contain the lock manager (since there are no shared segments in 4.3 BSD) and will control the demons that will perform various database services (such as asynchronously compiling user commands). There will be one POSTMASTER process per machine, and it will be started at ‘‘sysgen’’ time.

The POSTGRES run-time system executes commands on behalf of one application program. However, a program can have several commands executing at the same time. The message protocol between the program and backend will use a simple request-answer model. The request message will have a command designator and a sequence of bytes that contain the arguments. The answer message format will include a response code and any other data requested by the command. Notice that in contrast to INGRES [STON76] the backend will not ‘‘load up’’ the communication channel with data. The frontend requests a bounded amount of data with each command.

5.2 Query Processing

This section describes the query processing strategies that will be implemented in POSTGRES. We plan to implement a conventional query optimizer. However, three extensions are required to support POSTQUEL. First, the query optimizer must be able to take advantage of user-defined access methods. Second, a general-purpose, efficient mechanism is needed to support fields of type POSTQUEL and procedure. And third, an efficient mechanism is required to support triggers and rules. This section describes our proposed implementation of these mechanisms.

5.2.1 Support for New Types

Figure 3. POSTGRES process structure

As noted elsewhere [STON86], existing access methods must be usable for new data types, new access methods must be definable, and query processing heuristics must be able to optimize plans for which new data types and new access methods are present. The basic idea is that an
