4. b Object Oriented Databases 2012


Object-oriented databases (OODBs) can be viewed as an extension of relational databases (RDBs): the attributes of the database can be objects which are defined in an object-oriented language (OOL). In contrast to RDBs, OODBs are thus not completely isolated and portable, but are tightly connected to their OOL. Objectivity (C++, C#, Java, Python, Smalltalk and XML), ObjectStore (C++, Java, .NET), and O2 (C++) are the most prominent products. The field of OODBs is in general much smaller than the RDB field.

OODBs vs RDBs

OODBs are data management systems that store data in tuples of attributes organized in relations (Figure 1). They are thus similar to RDBs (previous lecture), but there are also major differences. The first distinctive difference is that the attributes of an OODB can be objects, i.e. attributes do not only contain a single entry, but consist of a collection of bits (see Example 1 and the subsection on objects). Objects are defined in an OOL like Python, Java, C++ or a language unique to the database.

STUDENT (S-ID, S-Name, S-Birth): 001 | Alice | <date::01>; 002 | Bob | <date::02>
DATE (Day, Month, Year, Age()): object type of the attribute S-Birth
GRADES (LectureName, S-ID, Grade, ECTS): lecture SDB with 4 ECTS; S-ID 001 with grade 5.75, S-ID 002 with grade 5.5; GRADES to STUDENT is n:1

Figure 1: OODBs store data in tuples (rows) of attributes (columns) organized in relations (tables). Unlike in RDBs, the attributes can be objects defined by the underlying OOL. In the example, DATE is an object that stores the integers Day, Month and Year as well as the method Age().

Example 1 (OODBs are RDBs where attributes can be objects) We want to store the name and age of all students. In RDBs, the age of a student is stored, whereas in OODBs it can be computed by a complex object storing only the birthday and a method to compute the age. Using an RDB, the table Student contains the tuple Student.S-ID (integer), Student.S-Name (string), Student.S-Birth (date), and Student.Age (integer). Using an OODB, the table Student contains the tuple Student.S-ID (integer), Student.S-Name (string), and Student.S-Birth, where Student.S-Birth is an object storing the integers Day, Month, Year and the method Age() (Figure 1).
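A minimal Python sketch of the OODB variant, assuming Python as the OOL: the attribute S-Birth holds a Date object, and the age is computed by a method each time it is requested instead of being stored. The birthday, the reference date (passed explicitly to make the result reproducible) and the snake_case names, such as `age()` for Age(), are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Date:
    """Object stored in the attribute S-Birth: three integers and a method."""
    day: int
    month: int
    year: int

    def age(self, today: Optional[date] = None) -> int:
        """Compute the age on the fly; the age itself is never stored."""
        today = today or date.today()
        # One year less if this year's birthday has not been reached yet.
        birthday_passed = (today.month, today.day) >= (self.month, self.day)
        return today.year - self.year - (0 if birthday_passed else 1)


@dataclass
class Student:
    s_id: str
    s_name: str
    s_birth: Date  # an object, not a single value


alice = Student("001", "Alice", Date(15, 3, 1990))
print(alice.s_birth.age(today=date(2012, 10, 18)))  # 22
```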


A second distinction from RDBs is that there exists no established standard. The Object Data Management Group (ODMG, http://www.odbms.org/ODMG/) defined the Object Data Management Standard ODMG 3.0 (2000). A major component of this standard is the Object Query Language (OQL), a non-procedural language similar to SQL for RDBs. OQL is based on SQL (see the similarity to SQL in Example 2); it supports update and query functionalities. But unlike SQL, OQL is not an established standard; it has never been fully implemented. This is mainly because of the tight connection between OODBs and programming languages: an OODB depends closely on an object-oriented language. This causes unavoidable differences between the various OODBs.

Example 2 Imagine the university using the database introduced above wants to find suitable candidates for a scholarship. The criteria that have to be met are:

• grades that are on average better than 5.5 and
• being less than 25 years old.

Suitable candidates can be found by querying the tables Student and Grades (Figure 1) using OQL:

SELECT Student.S-ID, AVG(Grades.Grade)
FROM Student, Grades
WHERE Grades.S-ID = Student.S-ID
  AND Student.S-Birth.Age() < 25
GROUP BY Student.S-ID
HAVING AVG(Grades.Grade) > 5.5

where Student.S-Birth.Age() is the method computing the age of a student from the birthday (part of the object that forms the attribute S-Birth in Student), and AVG(Grades.Grade) computes the average of all grades obtained by the same student; the HAVING clause keeps only the students whose average is better than 5.5. The SQL statement would look similar, but we would not be able to make use of a method to compute the age. Therefore, we would need a more complex SQL statement, e.g. using arithmetic or the function DATEDIFF.
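To illustrate the extra effort on the RDB side, here is a sketch using Python's built-in sqlite3 module. SQLite offers no DATEDIFF function, so the age is derived with strftime arithmetic; the column names (S_ID instead of S-ID, a Birth column holding an ISO date string), the third student and the fixed reference date are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (S_ID TEXT PRIMARY KEY, S_Name TEXT, Birth TEXT);
CREATE TABLE Grades (LectureName TEXT, S_ID TEXT, Grade REAL);
INSERT INTO Student VALUES ('001', 'Alice', '1990-03-15'),
                           ('002', 'Bob',   '1992-11-01'),
                           ('003', 'Carol', '1980-05-02');
INSERT INTO Grades VALUES ('SDB', '001', 5.75), ('SDB', '002', 5.5), ('SDB', '003', 6.0);
""")

# Without an Age() method, the age is spelled out as date arithmetic:
# difference of the years, minus 1 if this year's birthday is still to come.
QUERY = """
SELECT s.S_ID, AVG(g.Grade)
FROM Student AS s JOIN Grades AS g ON g.S_ID = s.S_ID
WHERE (CAST(strftime('%Y', :today) AS INTEGER)
       - CAST(strftime('%Y', s.Birth) AS INTEGER)
       - (strftime('%m-%d', :today) < strftime('%m-%d', s.Birth))) < 25
GROUP BY s.S_ID
HAVING AVG(g.Grade) > 5.5
"""

print(conn.execute(QUERY, {"today": "2012-10-18"}).fetchall())  # [('001', 5.75)]
```

Bob passes the age test but not the grade test, and Carol the reverse, so only Alice remains.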

Although OODBs are conceptually ideal for scientific databases (SDBs), their main disadvantages are:

• not as frequently used as RDBs: fewer tools, libraries and support available
• fewer good implementations
• restrictions imposed by the use of a specific OOL
• research-your-own is very popular

Furthermore, RDBs converge towards OODBs, which makes OODBs more and more obsolete. The convergence is possible through object-relational mapping (ORM). The key idea is to store objects defined in the object-oriented language in an RDB using a group of attributes, such that the properties and relationships are conserved. The object can then be restored with all its functionality. Therefore, this mapping creates a virtual OODB using an RDB. However, this approach has some conceptual difficulties (object-relational impedance mismatch), which arise from the different concepts of RDBs (relational algebra) and OODBs (object orientation).
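The mapping idea can be sketched by hand in Python with the sqlite3 module: a Date object is flattened into the columns Birth_Day, Birth_Month and Birth_Year and rebuilt, methods included, when it is read back. Production ORMs automate exactly this step; the table layout and the helpers `save_birth` and `load_birth` are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Date:
    day: int
    month: int
    year: int

    def is_leap_year(self) -> bool:
        """A method that must still work after the object is restored."""
        return self.year % 4 == 0 and (self.year % 100 != 0 or self.year % 400 == 0)


def save_birth(conn: sqlite3.Connection, s_id: str, birth: Date) -> None:
    """Store the object as a group of plain attributes (columns)."""
    conn.execute(
        "INSERT INTO Student (S_ID, Birth_Day, Birth_Month, Birth_Year) "
        "VALUES (?, ?, ?, ?)",
        (s_id, birth.day, birth.month, birth.year),
    )


def load_birth(conn: sqlite3.Connection, s_id: str) -> Date:
    """Reassemble the object, including its behaviour, from the columns."""
    row = conn.execute(
        "SELECT Birth_Day, Birth_Month, Birth_Year FROM Student WHERE S_ID = ?",
        (s_id,),
    ).fetchone()
    return Date(*row)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (S_ID TEXT PRIMARY KEY, "
    "Birth_Day INTEGER, Birth_Month INTEGER, Birth_Year INTEGER)"
)
save_birth(conn, "001", Date(29, 2, 1992))
restored = load_birth(conn, "001")
print(restored, restored.is_leap_year())  # Date(day=29, month=2, year=1992) True
```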

Glue language

All parts outside the database are programmed in the glue language. These outside parts include, for example, Backups, Archives, Auditing, Computation, Validation, Output Production or Filters (see Figure 2, "General Picture/Flow of the SDB", in the first lecture). One glue language is thus always needed to connect an OODB to its outside. In addition, there should never be more than a single glue language, as the glue language used should be the OOL the OODB is based on. Thus we arrive at:

Maxim 3: Oh No! #(glue languages) ≠ 1

Objects in SDBs

Unlike in RDBs, a matrix can be stored in an OODB in such a way that the system knows all properties of the matrix (e.g. number of rows and columns, the values and the validity).

Objects in the context of SDBs are named collections of bits which can include attributes or any other components. They are understood by the system without additional knowledge. Understanding means knowing the following:

• the type (rich types, e.g. the object DATE in Figure 1)
• the size
• the values
• the validity
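As a minimal sketch of such a self-describing object, assuming Python as the OOL, consider the matrix mentioned above: the object itself can report its type, size, values and validity. The class `Matrix` and its validity rule (rectangular shape, numeric entries) are illustrative assumptions.

```python
from typing import List, Tuple


class Matrix:
    """A matrix object that can report its type, size, values and validity."""

    def __init__(self, values: List[List[float]]):
        self.values = [list(row) for row in values]

    @property
    def size(self) -> Tuple[int, int]:
        rows = len(self.values)
        cols = len(self.values[0]) if rows else 0
        return rows, cols

    def is_valid(self) -> bool:
        """Valid if all rows have equal length and every entry is a number."""
        _, cols = self.size
        return all(
            len(row) == cols and all(isinstance(x, (int, float)) for x in row)
            for row in self.values
        )


m = Matrix([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(type(m).__name__, m.size, m.values, m.is_valid())
# Matrix (2, 3) [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]] True
print(Matrix([[1.0, 2.0], [3.0]]).is_valid())  # False
```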

Objects are fully described by blocks, selectors and constructors. Blocks can, for example, be numbers, strings, object references, or closed entities we do not desire to look inside (pictures, movies, PDFs, etc.). Constructors, on the other hand, are functions that are called when an object is initialized. They guarantee, for example, that the object is valid (see Table 1 for an example of validity rules).

Name | Type | Validity rules
Day | integer | 0 < Day ≤ 31; Month = 2 ⇒ Day ≤ 29; (Month = 2 ∧ Year mod 4 ≠ 0) ⇒ Day ≤ 28
Month | integer | 0 < Month ≤ 12
Year | integer | 0 < Year ≤ today().getYear()

Table 1: The fields of the object DATE (see Figure 1) with some basic validity rules. The validity rules are checked by the constructor and on every update event.
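A sketch of such a constructor in Python, assuming Python as the OOL: the rules are checked once the object is built and again on every assignment to a field, which stands in for an update event, and an invalid update is rolled back. February is limited to 29 days, or 28 when Year mod 4 ≠ 0 (a simplified leap-year test); as in the table, the exact lengths of the other months are not checked.

```python
from datetime import date


class Date:
    """Date object whose validity rules run in the constructor and on every update."""

    FIELDS = ("day", "month", "year")

    def __init__(self, day: int, month: int, year: int):
        # Set all fields first, then check the rules once on the complete object.
        for name, value in zip(self.FIELDS, (day, month, year)):
            object.__setattr__(self, name, value)
        self._check()

    def __setattr__(self, name, value):
        # Update event: apply the change, re-check, roll back if it is invalid.
        if name not in self.FIELDS:
            raise AttributeError(f"Date has no field {name!r}")
        old = getattr(self, name)
        object.__setattr__(self, name, value)
        try:
            self._check()
        except ValueError:
            object.__setattr__(self, name, old)
            raise

    def _check(self) -> None:
        d, m, y = self.day, self.month, self.year
        if not all(isinstance(v, int) for v in (d, m, y)):
            raise ValueError("Day, Month and Year must be integers")
        if not 0 < d <= 31:
            raise ValueError("rule 0 < Day <= 31 violated")
        if not 0 < m <= 12:
            raise ValueError("rule 0 < Month <= 12 violated")
        if not 0 < y <= date.today().year:
            raise ValueError("rule 0 < Year <= current year violated")
        # Simplified leap-year rule: 29 days if Year mod 4 == 0, else 28.
        if m == 2 and d > (29 if y % 4 == 0 else 28):
            raise ValueError("too many days for February")


d = Date(28, 2, 2011)
try:
    d.day = 29  # 2011 mod 4 != 0, so this update is rejected
except ValueError as err:
    print("rejected:", err)
print(d.day)  # 28: the invalid update was rolled back
```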

It is also possible for objects themselves to contain objects. These subordinate objects can be either included or referenced. Here, an included object is a direct part of the superordinate object, whereas a referenced object is an object that also exists outside of the superordinate object. Each attribute has a name, a type and validity rules, which are enforced by the object constructor.
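In Python terms, and only as an illustrative assumption about how the distinction could be realised, an included object can be modelled as a private copy owned by its superordinate object, while a referenced object is shared. The classes below are hypothetical.

```python
import copy
from dataclasses import dataclass


@dataclass
class Date:
    day: int
    month: int
    year: int


@dataclass
class Lecture:
    name: str
    ects: int


@dataclass
class Student:
    name: str
    birth: Date       # included: becomes a private part of the student
    lecture: Lecture  # referenced: the lecture also exists on its own

    def __post_init__(self):
        # Copy the included object so that no outside code shares it.
        self.birth = copy.copy(self.birth)


sdb = Lecture("SDB", 4)
d = Date(15, 3, 1990)
alice = Student("Alice", d, sdb)
bob = Student("Bob", Date(2, 5, 1992), sdb)

d.year = 1991  # changing the original date does not touch Alice's included copy
sdb.ects = 5   # changing the lecture is visible through every reference
print(alice.birth.year, alice.lecture.ects, bob.lecture.ects)  # 1990 5 5
```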

Another important feature of objects is that they allow the addition of arbitrary fields that can also be empty. Thus a value can be added to the fields of some objects of a class without having to update all of them. This is a very desirable property for databases, as not every possible evolution can be foreseen when designing a database. This is more difficult with RDBs: even though ORMs nowadays allow for the migration of database schemas, changing the design of the relations in an RDB remains inconvenient.
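Python objects behave this way by default, since each instance keeps its attributes in its own dictionary. The sketch below adds a hypothetical field `thesis_topic` to a single object, while `getattr` with a default reports the field as empty for all others.

```python
class Student:
    def __init__(self, s_id: str, s_name: str):
        self.s_id = s_id
        self.s_name = s_name


alice = Student("001", "Alice")
bob = Student("002", "Bob")

# Add a field to one object only; no schema change, no update of other objects.
alice.thesis_topic = "Scientific databases"

for s in (alice, bob):
    # Objects without the new field simply report it as empty (None).
    print(s.s_name, getattr(s, "thesis_topic", None))
# Alice Scientific databases
# Bob None
```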

Normal Forms

All attributes or objects in an OODB have to fulfil the normal forms (as known from RDBs, see the RDB lecture notes): A → B → C, where → denotes functional dependency; C → B → A is a violation of the normal forms. The 12 rules of Codd (see Appendix 1) also apply to OODBs if adapted.

Normal forms (see also the previous lecture on RDBs): The normal forms (NF) were defined in RDB theory in order to avoid anomalies after insert, update or delete events. The first three normal forms were formulated by E. F. Codd in the early seventies:

1NF: every attribute of a relation has to contain atomic values.
2NF: no non-prime attribute is dependent on any proper subset of any candidate key of the table.
3NF: every non-prime attribute is non-transitively dependent on every candidate key.

Codd, E. F. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13 (6): 377-387, June 1970.
Codd, E. F. Further Normalization of the Data Base Relational Model. IBM Research Report RJ909, August 1971.


Example 3 (Violation of normal forms) The relation GRADES in Figure 1 violates the second normal form: the candidate key is the set {LectureName, S-ID}, but the non-prime attribute ECTS depends only on LectureName. This database design can cause various anomalies: a change of the credit points for a lecture would cause update anomalies if not all corresponding rows were updated. Moreover, a lecture can only be added if the grade of at least one student is available.

Figure 2 shows an alternative design, which is in second normal form.

STUDENT (S-ID, S-Name, S-Birth): 001 | Alice | <date::01>; 002 | Bob | <date::02>
DATE (Day, Month, Year, Age()): object type of the attribute S-Birth
GRADES (LectureName, S-ID, Grade): lecture SDB; S-ID 001 with grade 5.75, S-ID 002 with grade 5.5; GRADES to STUDENT is n:1
LECTURES (LectureName, ECTS): SDB | 4; LECTURES to GRADES is 1:n

Figure 2: Table GRADES is in second normal form (in contrast to Figure 1), as its only non-prime attribute, Grade, depends neither on LectureName alone nor on S-ID alone.
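The partial dependency of Example 3 can also be checked mechanically. This Python sketch assumes the rows of GRADES from Figure 1 with the ECTS value repeated in each row (illustrative data). It tests whether LectureName determines ECTS and then performs the decomposition into GRADES and LECTURES. Note that a check on stored rows can only refute a functional dependency, not prove it for all future data.

```python
from typing import Dict, Iterable, List, Tuple

# GRADES as in Figure 1: (LectureName, S-ID, Grade, ECTS); illustrative rows
# in which the ECTS value is repeated for every student of the lecture.
grades_fig1: List[Tuple[str, str, float, int]] = [
    ("SDB", "001", 5.75, 4),
    ("SDB", "002", 5.5, 4),
]


def fd_holds(rows: Iterable[tuple], lhs: int, rhs: int) -> bool:
    """True if, in these rows, column lhs functionally determines column rhs."""
    seen: Dict[object, object] = {}
    for row in rows:
        # setdefault remembers the first rhs value seen for this lhs value.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


# ECTS (index 3) is determined by LectureName (index 0) alone, a proper subset
# of the key {LectureName, S-ID}: a partial dependency, so GRADES is not in 2NF.
print(fd_holds(grades_fig1, 0, 3))  # True

# Decomposition: ECTS moves to its own relation LECTURES.
grades = [(lecture, s_id, grade) for lecture, s_id, grade, _ in grades_fig1]
lectures = sorted({(lecture, ects) for lecture, _, _, ects in grades_fig1})
print(grades)    # [('SDB', '001', 5.75), ('SDB', '002', 5.5)]
print(lectures)  # [('SDB', 4)]
```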

As mentioned above, objects have names. Names can be URLs or URIs, and the object names (onames) are used to reference the object within the DB. The names can, for example, be composed of Type:Locator:ID, where Locator is in principle a query, a name (of a file or database) or a computation (see, for example, Student.S-Birth in Figure 1).
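A sketch of splitting such a name into its parts, assuming the three components are separated by colons, that Type and ID contain no colon, and that the Locator may be empty, as in the name <date::01> of Figure 1. The function `parse_oname` and the `OName` record are hypothetical.

```python
from typing import NamedTuple


class OName(NamedTuple):
    type: str
    locator: str
    id: str


def parse_oname(name: str) -> OName:
    """Split a name of the form Type:Locator:ID, e.g. '<date::01>'."""
    body = name.strip("<>")
    # Type is before the first colon, ID after the last; the Locator (which may
    # itself contain colons, e.g. a URL) is everything in between.
    type_, rest = body.split(":", 1)
    locator, id_ = rest.rsplit(":", 1)
    return OName(type_, locator, id_)


print(parse_oname("<date::01>"))
# OName(type='date', locator='', id='01')
print(parse_oname("date:http://example.org/students:02"))
# OName(type='date', locator='http://example.org/students', id='02')
```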

Finally, objects have selectors, which allow selecting parts of individual objects. They also supply default values for attributes that are not defined. Furthermore, selectors can compute results from the objects.
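One possible reading of selectors, sketched in Python under the assumption that a selector is a dotted path into an object: the hypothetical function `select` returns a part of the object, falls back to a default for undefined attributes, and calls methods on the path to compute results.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Date:
    day: int
    month: int
    year: int

    def iso(self) -> str:
        """A computed result: the date in ISO 8601 notation."""
        return f"{self.year:04d}-{self.month:02d}-{self.day:02d}"


@dataclass
class Student:
    s_id: str
    s_name: str
    s_birth: Date


def select(obj: Any, path: str, default: Any = None) -> Any:
    """Follow a dotted path such as 's_birth.year' into an object.

    A missing part yields the default; a method on the path is called,
    so the selector can also compute a result from the object.
    """
    current = obj
    for part in path.split("."):
        if not hasattr(current, part):
            return default
        current = getattr(current, part)
        if callable(current):
            current = current()
    return current


alice = Student("001", "Alice", Date(15, 3, 1990))
print(select(alice, "s_birth.year"))          # 1990
print(select(alice, "s_birth.iso"))           # 1990-03-15
print(select(alice, "email", default="n/a"))  # n/a
```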

Operations in OODBs

Although operations in OODBs depend on the underlying OOL (see the comparison of frequently used OOLs in Appendix 2), they share some common characteristics of object-oriented languages.

First of all, operators can be polymorphic. Polymorphism is the notion of using a common operator for various types of inputs. For example, a+b, a·b, a^b and a∧b will adapt to the type of operands they are applied to.

This often means that an operator has different implementations for various types of inputs, a concept called operator overloading (see Example 4). The implementation used is chosen based on the type of the inputs, i.e. there are different implementations for adding up two integers or two matrices; the result will be valid in both cases.
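In Python, operator overloading is expressed with special methods such as `__add__`. The sketch below, with a hypothetical `Matrix` class, lets the single operator + dispatch to integer addition, string concatenation or element-wise matrix addition depending on the operand types.

```python
from typing import List


class Matrix:
    def __init__(self, rows: List[List[float]]):
        self.rows = [list(r) for r in rows]

    def __add__(self, other):
        # Implementation of + chosen when both operands are matrices.
        if not isinstance(other, Matrix):
            return NotImplemented
        if len(self.rows) != len(other.rows) or any(
            len(a) != len(b) for a, b in zip(self.rows, other.rows)
        ):
            raise ValueError("matrices must have the same size")
        return Matrix([[x + y for x, y in zip(a, b)]
                       for a, b in zip(self.rows, other.rows)])

    def __eq__(self, other):
        return isinstance(other, Matrix) and self.rows == other.rows

    def __repr__(self):
        return f"Matrix({self.rows})"


print(5 + 7)        # 12: integer addition
print("ab" + "cd")  # abcd: string concatenation
print(Matrix([[1, 2], [3, 4]]) + Matrix([[10, 20], [30, 40]]))
# Matrix([[11, 22], [33, 44]])
```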

Example 4 (Operator overloading and computations on the fly) In Darwin the operator + can be used to add up integers:

> a := 5 + 7;
a := 12

but also to add random numbers to an existing Stat() data structure:

> c := Stat('one million [0,1] random numbers'):
> to 1e6 do c + Rand() od:
> print(c);
one million [0,1] random numbers: number of samples 1e+06
mean = 0.49983 +- 0.00057
variance = 0.08332 +- 0.00015
skewness= 0.00086821, excess=-1.19832
minimum=1.43307e-06, maximum=0.999997

The statistical information provided by the Stat() structure c is not stored here, but actually computed on the fly. This can be seen if the union of another Stat() structure e with c is printed:

> e := Stat('another million [0,1] random numbers'):
> to 1e6 do e + Rand() od:
> print(c union e);
one million [0,1] random numbers and another million [0,1] random numbers: number of sample points=2e+06
mean = 0.50006 +- 0.00040
variance = 0.08332 +- 0.00010
skewness=0.000104374, excess=-1.19933
minimum=3.62886e-07, maximum=1

Here, c union e is not the union of the fixed statistical values of c and e, but the statistical values of the union of c and e.
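The behaviour of Stat() can be approximated in Python. This is a sketch, not Darwin's implementation: it assumes the structure keeps only running sums (count, sum, sum of squares, minimum, maximum), from which mean and variance are derived whenever they are requested, and `union` merges the sums so that its statistics are those of all combined samples. Skewness and excess are omitted, the sample count is reduced to 100000 for speed, and the idiomatic `c += x` replaces Darwin's `c + Rand()`.

```python
import random


class Stat:
    """Keeps running sums; mean and variance are derived when requested."""

    def __init__(self, title: str):
        self.title = title
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def __iadd__(self, x: float) -> "Stat":
        # `c += x` records one more sample.
        self.n += 1
        self.total += x
        self.total_sq += x * x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)
        return self

    def union(self, other: "Stat") -> "Stat":
        """Statistics of the combined samples, not a mix of precomputed values."""
        merged = Stat(f"{self.title} and {other.title}")
        merged.n = self.n + other.n
        merged.total = self.total + other.total
        merged.total_sq = self.total_sq + other.total_sq
        merged.minimum = min(self.minimum, other.minimum)
        merged.maximum = max(self.maximum, other.maximum)
        return merged

    @property
    def mean(self) -> float:
        return self.total / self.n

    @property
    def variance(self) -> float:
        # Population variance from the sums; simple, but not numerically robust.
        return self.total_sq / self.n - self.mean ** 2

    def __str__(self) -> str:
        return (f"{self.title}: number of samples {self.n}\n"
                f"mean = {self.mean:.5f}, variance = {self.variance:.5f}\n"
                f"minimum={self.minimum:g}, maximum={self.maximum:g}")


c = Stat("100000 [0,1] random numbers")
e = Stat("another 100000 [0,1] random numbers")
for _ in range(100_000):
    c += random.random()
    e += random.random()
print(c.union(e))  # mean close to 0.5, variance close to 1/12
```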

Objects in OOLs can also have methods which can be accessed like attributes. The result of methods is not stored, but computed on the fly (see Example 4, where the statistical values mean, variance, skewness, excess, minimum and maximum are computed on the fly). Hence, they correspond to the notion of views: a view is a stored query, the result of which is computed on the fly based on stored information.
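Python's sqlite3 module is enough to try this out: the view below stores only its query, so a grade inserted after the view was created appears in its result without any refresh. Table, view and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Grades (LectureName TEXT, S_ID TEXT, Grade REAL);
INSERT INTO Grades VALUES ('SDB', '001', 5.75), ('SDB', '002', 5.5);
-- The view stores the query, not its result.
CREATE VIEW AvgGrade AS
    SELECT S_ID, AVG(Grade) AS Average FROM Grades GROUP BY S_ID;
""")

print(conn.execute("SELECT * FROM AvgGrade ORDER BY S_ID").fetchall())
# [('001', 5.75), ('002', 5.5)]

conn.execute("INSERT INTO Grades VALUES ('DB', '002', 6.0)")
print(conn.execute("SELECT * FROM AvgGrade ORDER BY S_ID").fetchall())
# [('001', 5.75), ('002', 5.75)]: recomputed on the fly
```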

As can be seen in Example 2, searching in OODBs works similarly to SQL queries in RDBs with SELECT FROM WHERE statements. Apart from the obvious difference that not only attributes, but also objects and attributes of objects can be used, the main difference is that the methods of objects can also be used in queries. Again, these values have to be computed on the fly. This has two consequences: first, query optimization in OODBs is complicated, and it becomes harder the more complex the model is. Second, OODB systems are slower and less efficient than their RDB counterparts because of the overhead in storing objects and the increased complexity in interpretation.

Summary

OODBs are similar to RDBs, but they have the huge advantage that their attributes can be objects. Objects are collections of bits which are understood by the system without any further knowledge. This, together with the fact that computations on the fly and the addition of arbitrary fields are possible, makes OODBs very appealing for SDBs.

But there are also many drawbacks. First, OODBs are used less frequently than RDBs; there are thus fewer tools, less support and fewer libraries available. Also, there are fewer good implementations and no established standards. Second, OODBs depend on, and are thus restricted by, an OOL. Third, OODBs are less efficient and much slower than their RDB counterparts because of the overhead in storing objects and the increased complexity in interpretation. And fourth, OODBs are becoming more and more obsolete, with ORM allowing for virtual OODBs in RDBs and thus a convergence of RDBs towards OODBs. Nevertheless, OODBs are a very attractive model for SDBs because of their flexibility.


Appendix 1

Codd’s 12 rules, cited from http://en.wikipedia.org/wiki/Codd%27s_12_rules#The_rules on 18/10/2012.

Rule (0): The system must qualify as relational, as a database, and as a management system. For a system to qualify as a relational database management system (RDBMS), that system must use its relational facilities (exclusively) to manage the database.

Rule 1: The information rule: All information in a relational database (including table and column names) is represented in only one way, namely as a value in a table.

Rule 2: The guaranteed access rule: All data must be accessible. This rule is essentially a restatement of the fundamental requirement for primary keys. It says that every individual scalar value in the database must be logically addressable by specifying the name of the containing table, the name of the containing column and the primary key value of the containing row.

Rule 3: Systematic treatment of null values: The database management system must allow each field to remain null (or empty). Specifically, it must support a representation of "missing information and inapplicable information" that is systematic, distinct from all regular values (for example, "distinct from zero or any other number", in the case of numeric values), and independent of data type. It is also implied that such representations must be manipulated by the DBMS in a systematic way.

Rule 4: Active online catalog based on the relational model: The system must support an online, inline, relational catalog that is accessible to authorized users by means of their regular query language. That is, users must be able to access the database’s structure (catalog) using the same query language that they use to access the database’s data.

Rule 5: The comprehensive data sublanguage rule: The system must support at least one relational language that has a linear syntax, can be used both interactively and within application programs, supports data definition operations (including view definitions), data manipulation operations (update as well as retrieval), security and integrity constraints, and transaction management operations (begin, commit, and rollback).

Rule 6: The view updating rule: All views that are theoretically updatable must be updatable by the system.

Rule 7: High-level insert, update, and delete: The system must support set-at-a-time insert, update, and delete operators. This means that data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.

Rule 8: Physical data independence: Changes to the physical level (how the data is stored, whether in arrays or linked lists etc.) must not require a change to an application based on the structure.

Rule 9: Logical data independence: Changes to the logical level (tables, columns, rows, and so on) must not require a change to an application based on the structure. Logical data independence is more difficult to achieve than physical data independence.

Rule 10: Integrity independence: Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications.

Rule 11: Distribution independence: The distribution of portions of the database to various locations should be invisible to users of the database. Existing applications should continue to operate successfully when a distributed version of the DBMS is first introduced and when existing distributed data are redistributed around the system.

Rule 12: The nonsubversion rule: If the system provides a low-level (record-at-a-time) interface, then that interface cannot be used to subvert the system, for example, bypassing a relational security or integrity constraint.


Appendix 2

Comparison of frequently used OOLs as provided on the course website (http://www.cbrg.ethz.ch/education/SDB/languages.pdf) on 20/10/2012.

Feature | C++ | Java | Python | Darwin
Object construction | A a( ); A *a = new A( ) | A a = new A( ) | a = A( ) | a := A( )
Attribute access | | java.lang.reflect.* | a.X, getattr(a, "X") | a[X], a['X'], a[other] (computed value)
Polymorphic operators | A::operator+( ) | (only predefined) | a+b, c - set([1]), 5*'d', A.__add__(self, other) | a+b, c union 1, 5*d
Inheritance | class A : public B, C (multiple inheritance, different protection levels) | public class A extends B implements C (classes and interfaces) | class A(B): | Inherit(A, B), ExtendClass(A, B, [name, type, def], ...)
Generics/Templates for classes | template<class A> class B | public class A<B,C> | no | no
Generics/Templates for functions/methods | template<class A> A max(A a, A b) | public static <A> A max(A a, A b) | all parameters generic | no

Table 1: Overview of some OO languages
