4. b Object Oriented Databases 2012
Object-oriented databases (OODBs) can be viewed as an extension
of relational databases (RDBs): the attributes of the database can be
objects which are defined in an object-oriented language (OOL). In
contrast to RDBs, OODBs are thus not completely isolated and portable,
but are tightly connected to their OOL. Objectivity (C++, C#, Java,
Python, Smalltalk and XML), ObjectStore (C++, Java, .NET), and O2
(C++) are the most prominent products. The field of OODBs is in
general much smaller than the RDB field.
OODBs vs RDBs
OODBs are data management systems that store data in tuples of
attributes organized in relations (Figure 1). They are thus similar
to RDBs (previous lecture), but there are also major differences. The
first distinctive difference is that the attributes of an OODB can be
objects, i.e. attributes do not only contain a single entry, but consist
of a collection of bits (example 1 and subsection "Objects in SDBs"). Objects are
defined in an OOL like Python, Java, C++ or a language unique to
the database.
STUDENT
S-ID   S-Name   S-Birth
001    Alice    <date::01>
002    Bob      <date::02>

DATE
Day   Month   Year   Age()

GRADES
LectureName   S-ID   Grade   ECTS
SDB           001    5.75    4
SDB           002    5.5     4

Relationship label shown in the figure: n:1

Figure 1: OODBs store data in tuples (rows) of attributes (columns) organized
in relations (tables). Unlike RDBs, the attributes can be objects defined by the underlying OOL. In the example, DATE
is an object that stores the integers Day, Month and Year as well as the method Age().
Example 1 (OODBs are RDBs where attributes can be objects). We
want to store the name and age of all students. In RDBs, the age of a
student is stored, whereas in OODBs it can be computed by a complex
object storing only the birthday and a method to compute the age. Using
an RDB, the table Student contains the tuple Student.S-ID (integer),
Student.S-Name (string), Student.S-Birth (date), and Student.Age
(integer). Using an OODB, the table Student contains the tuple Student.S-ID
(integer), Student.S-Name (string), and Student.S-Birth, where
Student.S-Birth is an object storing the integers Day, Month, Year and
the method Age() (Figure 1).
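The idea of Example 1 can be sketched in Python, one of the OOLs mentioned above. This is only an illustration: the class names, the field names and Alice's birthday are assumptions made for the sketch, not part of the lecture database. The point is that no age is stored; it is derived from the birthday whenever it is asked for.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Date:
    """Hypothetical stand-in for the DATE object of Figure 1."""
    day: int
    month: int
    year: int

    def age(self, today: Optional[date] = None) -> int:
        """Full years elapsed since this date, computed on request."""
        today = today or date.today()
        # Subtract one year if this year's birthday has not happened yet.
        before_birthday = (today.month, today.day) < (self.month, self.day)
        return today.year - self.year - int(before_birthday)


@dataclass
class Student:
    s_id: int
    s_name: str
    s_birth: Date  # object-valued attribute; there is no stored age column


# The birthday is made up for the illustration.
alice = Student(1, "Alice", Date(day=14, month=3, year=1990))
print(alice.s_birth.age(today=date(2012, 10, 18)))  # 22
```

Passing `today` explicitly keeps the result reproducible; without it the method uses the current date.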
A second distinction from RDBs is that there exists no established
standard. The Object Data Management Group (ODMG,
http://www.odbms.org/ODMG/) defined the Object Data Management Standard
ODMG 3.0 (2000). A major component of this standard is the Object
Query Language (OQL), a non-procedural language similar to SQL
for RDBs. OQL is based on SQL (see the similarity to SQL in example 2);
it supports update and query functionalities. But unlike SQL, OQL
is not an established standard; it has never been fully implemented.
This is mainly because of the tight connection between OODBs and
programming languages: an OODB depends closely on an
object-oriented language. This causes unavoidable differences between the
various OODBs.
Example 2. Imagine the university using the database introduced above
wants to find suitable candidates for a scholarship. The criteria that have to
be met are
• grades that are on average better than 5.5 and
• being less than 25 years old.
Suitable candidates can be found by querying the tables Student and
Grades (figure 1) using OQL:
SELECT Student.S-ID, AVG(Grades.Grade)
FROM
    (SELECT S-ID, S-Name
     FROM Student
     WHERE S-Birth.Age() < 25) AS Student,
    Grades
WHERE Grades.S-ID = Student.S-ID
GROUP BY Student.S-ID
HAVING AVG(Grades.Grade) > 5.5
where S-Birth.Age() is a method computing the age of a student from his or her
birthday (part of the object that forms the attribute S-Birth in Student)
and AVG(Grades.Grade) computes the average of all grades obtained by
the same student. The HAVING clause keeps only the students whose average
is better than 5.5. The SQL statement would look similar, but we would not
be able to make use of a method to compute the age. Therefore, we would
need a more complex SQL statement, e.g. using arithmetic or the function
DATEDIFF.
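The logic of the query in Example 2 can also be written out step by step in Python. This is a sketch under stated assumptions: the birthdays and the fixed reference date are made up, the grades are the ones shown in figure 1, and the function names `age` and `candidates` are hypothetical.

```python
from datetime import date
from statistics import mean

TODAY = date(2012, 10, 18)  # fixed reference date so the result is reproducible

# Hypothetical birthdays; the grades are those shown in figure 1.
students = {1: date(1990, 3, 14), 2: date(1985, 7, 2)}
grades = [("SDB", 1, 5.75), ("SDB", 2, 5.5)]  # (LectureName, S-ID, Grade)


def age(birth: date, today: date = TODAY) -> int:
    """Plays the role of the age method: computed, never stored."""
    not_yet = (today.month, today.day) < (birth.month, birth.day)
    return today.year - birth.year - int(not_yet)


def candidates(max_age: int = 25, min_avg: float = 5.5) -> dict:
    """Students younger than max_age whose average grade exceeds min_avg."""
    result = {}
    for s_id, birth in students.items():
        if age(birth) >= max_age:  # age criterion
            continue
        own = [g for _, sid, g in grades if sid == s_id]  # join on S-ID
        if own and mean(own) > min_avg:  # grouping and average criterion
            result[s_id] = mean(own)
    return result


print(candidates())  # {1: 5.75}
```

With these made-up birthdays, student 002 is excluded by the age criterion, and would also fail the grade criterion because 5.5 is not better than 5.5.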
Although OODBs are conceptually ideal for scientific databases
(SDBs), the main disadvantages are:
• not as frequently used as RDBs: fewer tools, libraries and support
available
• fewer good implementations
• restrictions imposed by use of a specific OOL
• research-your-own is very popular
Furthermore, RDBs are converging towards OODBs, which makes OODBs
more and more obsolete. The convergence is possible through
object-relational mapping (ORM). The key idea is to store objects defined
in the object-oriented language in an RDB using a group of attributes,
such that the properties and relationships are conserved. The object
can be restored with all functionalities. Therefore, this mapping
creates a virtual OODB using an RDB. However, this approach has
some conceptual difficulties (object-relational impedance mismatch),
which arise from the different concepts of RDBs (relational algebra)
and OODBs (object orientation).
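The key idea of ORM can be illustrated with a hand-written mapping in Python and its built-in sqlite3 module. This is a minimal sketch, not how any particular ORM library works: the table layout, the column names and the helper functions `save` and `load_birth` are assumptions made for the illustration.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class Date:
    day: int
    month: int
    year: int


conn = sqlite3.connect(":memory:")
# The object is flattened into a group of plain columns.
conn.execute(
    "CREATE TABLE student (s_id INTEGER PRIMARY KEY, s_name TEXT,"
    " birth_day INTEGER, birth_month INTEGER, birth_year INTEGER)"
)


def save(s_id: int, name: str, birth: Date) -> None:
    """Store the object as a group of attributes in one row."""
    conn.execute(
        "INSERT INTO student VALUES (?, ?, ?, ?, ?)",
        (s_id, name, *astuple(birth)),
    )


def load_birth(s_id: int) -> Date:
    """Rebuild the object from its columns."""
    row = conn.execute(
        "SELECT birth_day, birth_month, birth_year FROM student WHERE s_id = ?",
        (s_id,),
    ).fetchone()
    return Date(*row)


save(1, "Alice", Date(14, 3, 1990))  # made-up birthday
print(load_birth(1))  # Date(day=14, month=3, year=1990)
```

The restored value is again a full `Date` object, so any methods defined on the class are available after loading; the relational side only ever sees three integer columns.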
Glue language
All parts outside the database are programmed in the glue language.
These parts include, for example, backups, archives, auditing,
computation, validation, output production or filters (see figure 2
"General Picture/Flow of the SDB" from the first lecture). One glue
language is thus always needed to connect an OODB to its outside.
In addition, there should never be more than a single glue language,
as the glue language used should be the OOL the OODB is based on.
Thus we arrive at:
Maxim 3: Oh No! #(glue languages) ≠ 1
Objects in SDBs
Unlike in RDBs, a matrix can be stored
in an OODB in such a way that the system knows all properties (e.g. number of rows and columns, the values and the validity) of the matrix.
Objects in the context of SDBs are named collections of bits which can
include attributes or any other components. They are understood
by the system without additional knowledge. Understanding means
knowing the following:
• the type (rich types, e.g. the object DATE in figure 1)
• the size
• the values
• the validity
Objects are fully described by blocks, selectors and constructors.
Blocks can for example be numbers, strings, object references, or
closed entities we do not desire to look inside (pictures, movies, PDFs
etc.). Constructors, on the other hand, are functions that are called
when an object is initialized. They guarantee, for example, that the
object is valid (see table 1 for an example of validity rules).
Name    Type      Validity rules
Day     integer   0 < Day ≤ 31
                  (Month = 2 ∧ Year mod 4 ≠ 0) ⇒ Day ≤ 28
Month   integer   0 < Month ≤ 12
Year    integer   0 < Year ≤ today().getYear()
Table 1: The fields of the object date (see figure 1) with some basic validity rules. The validity rules are checked by the constructor and on every update event.
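A constructor that enforces such rules could look as follows in Python. The sketch makes two assumptions that should be kept in mind: the range rule for Month is read as 0 < Month ≤ 12, and the February rule is read as "at most 28 days unless Year mod 4 = 0". The rules are deliberately as basic as those in the table (for instance, they do not limit April to 30 days), and the method name `update` is hypothetical.

```python
from datetime import date


class Date:
    """Sketch of a DATE object whose constructor enforces validity rules."""

    def __init__(self, day: int, month: int, year: int):
        self._check(day, month, year)
        self.day, self.month, self.year = day, month, year

    @staticmethod
    def _check(day: int, month: int, year: int) -> None:
        if not 0 < day <= 31:
            raise ValueError("Day must satisfy 0 < Day <= 31")
        if not 0 < month <= 12:
            raise ValueError("Month must satisfy 0 < Month <= 12")
        if not 0 < year <= date.today().year:
            raise ValueError("Year must satisfy 0 < Year <= current year")
        # Assumed reading of the February rule (see the lead-in).
        if month == 2 and year % 4 != 0 and day > 28:
            raise ValueError("February has at most 28 days in this year")

    def update(self, **fields: int) -> None:
        """Re-run the same checks on every update before changing anything."""
        new = {"day": self.day, "month": self.month, "year": self.year, **fields}
        self._check(**new)
        self.__dict__.update(new)


d = Date(29, 2, 2012)  # accepted: 2012 mod 4 = 0
d.update(day=1, month=3)  # checked again before the fields change
```

Because `update` validates the complete new state first, a rejected update leaves the object unchanged.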
It is also possible that objects themselves contain objects. These
subordinate objects can be either included or referenced. Here, an
included object is a direct part of the superordinate object, whereas
a referenced object is an object that also exists outside of the
superordinate object. Each attribute has a name, a type and validity rules
which are enforced by the object constructor.
Another important feature of objects is that they allow the addition
of arbitrary fields that can also be empty. Thus a value can be added
to the fields of some objects of a class without having to update all of them.
This is a very desirable property for databases, as not every possible
evolution can be foreseen in the process of designing a database. This
is more difficult using RDBs: even though nowadays ORMs allow
for the migration of database schemas, changing the design of the
relations in RDBs remains inconvenient.
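In an OOL such as Python, adding a field to only some objects is directly possible. The following lines are a small illustration; the class and the field name `exchange_university` are made up for the example.

```python
class Student:
    def __init__(self, s_id: int, s_name: str):
        self.s_id = s_id
        self.s_name = s_name


alice = Student(1, "Alice")
bob = Student(2, "Bob")

# A field added later to one object only; the other objects are untouched.
alice.exchange_university = "Uppsala"

# Readers treat the field as optional instead of requiring a schema change.
for s in (alice, bob):
    print(s.s_name, getattr(s, "exchange_university", None))
```

The loop prints the value for Alice and `None` for Bob, i.e. the field is simply empty where it was never set.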
Normal Forms
All attributes or objects in an OODB have to fulfil the normal forms (as
known from RDBs, see RDB lecture notes): A → B → C, where →
denotes functional dependence. C → B → A is a violation of the normal
forms. The 12 rules of Codd (see Appendix 1) also apply to OODBs if
adapted.
Normal forms (see also previous lecture on RDBs): The normal
forms (NF) were defined in RDB theory in order to avoid anomalies
after insert, update or delete events. The first three normal forms
were formulated by E. F. Codd in the early seventies:
1NF: Defines the relation property between tables: an attribute has
to contain atomic values.
2NF: No non-prime attribute is dependent on any proper subset of
any candidate key of the table.
3NF: Every non-prime attribute is non-transitively dependent on
every candidate key.
Codd, E. F. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13 (6): 377-387, June 1970.
Codd, E. F. Further Normalization of the Data Base Relational Model. IBM Research Report RJ909, August 1971.
Example 3 (Violation of normal forms). The relation GRADES in figure 1
violates the second normal form: the candidate key is the set {LectureName,
S-ID}, but the non-prime attribute ECTS depends only on LectureName.
This database design can cause various anomalies: a change of the credit
points for a lecture would cause update anomalies if not all corresponding
rows were updated. Moreover, a lecture can only be added if the grade
for at least one student is available.
Figure 2 shows an alternative design, which is in second normal form.
STUDENT
S-ID   S-Name   S-Birth
001    Alice    <date::01>
002    Bob      <date::02>

DATE
Day   Month   Year   Age()

GRADES
LectureName   S-ID   Grade
SDB           001    5.75
SDB           002    5.5

LECTURES
LectureName   ECTS
SDB           4

Relationship labels shown in the figure: n:1, 1:n

Figure 2: Table Grades is in the second normal form (in contrast to figure 1), as its only non-prime attribute, Grade, depends neither on LectureName alone nor
on S-ID alone.
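The benefit of the design in figure 2 can be tried out with Python's built-in sqlite3 module. This is a sketch: the SQL column names are lower-case adaptations of the attribute names, and the second lecture 'NEW' and the changed credit value are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lectures (lecture_name TEXT PRIMARY KEY, ects INTEGER);
    CREATE TABLE grades (
        lecture_name TEXT REFERENCES lectures(lecture_name),
        s_id INTEGER,
        grade REAL,
        PRIMARY KEY (lecture_name, s_id)
    );
    INSERT INTO lectures VALUES ('SDB', 4);
    INSERT INTO grades VALUES ('SDB', 1, 5.75), ('SDB', 2, 5.5);
""")

# The credit points live in exactly one row, so one UPDATE cannot be incomplete.
conn.execute("UPDATE lectures SET ects = 5 WHERE lecture_name = 'SDB'")

# A lecture can be added before any grade exists ('NEW' is a made-up lecture).
conn.execute("INSERT INTO lectures VALUES ('NEW', 2)")

rows = conn.execute("""
    SELECT g.s_id, g.grade, l.ects
    FROM grades AS g JOIN lectures AS l USING (lecture_name)
    ORDER BY g.s_id
""").fetchall()
print(rows)  # [(1, 5.75, 5), (2, 5.5, 5)]
```

Both anomalies of Example 3 disappear: every grade row sees the updated credit points through the join, and the new lecture exists without any grade.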
As mentioned above, objects have names. Names can be both
URLs or URIs, and the object names (onames) are used to reference
the object within the DB. The names can for example be composed of
Type:Locator:ID, where Locator is in principle a query, a name (of
a file or database) or a computation (see for example Student.S-Birth
in figure 1).
Finally, objects have selectors which allow selecting parts of
individual objects. They also supply attribute names or default values if these are not
defined. Furthermore, selectors can compute results from the objects.
Operations in OODBs
Although operations in OODBs are dependent on the underlying
OOL (see comparison of frequently used OOLs in Appendix 2), they
share some common characteristics of object-oriented languages.
First of all, operators can be polymorphic. Polymorphism is the
notion of using a common operator for various types of inputs. For
example, a+b, a·b, a^b, a∧b will adapt to the type of operands they
are applied to.
This often means that an operator has different implementations
for various types of inputs, a concept called operator overloading (see
example 4). The implementation used is chosen based on the type of
the inputs, i.e. there are different implementations for adding up two
integers or two matrices; the result will be valid in both cases.
Example 4 (Operator overloading and computations on the fly). In
Darwin the operator + can be used to add up integers
> a := 5 + 7;
a := 12
but also to add random numbers to an existing Stat() data structure
> c := Stat('one million [0,1] random numbers'):
> to 1e6 do c + Rand() od:
> print(c);
one million [0,1] random numbers: number of samples 1e+06
mean = 0.49983 +- 0.00057
variance = 0.08332 +- 0.00015
skewness=0.00086821, excess=-1.19832
minimum=1.43307e-06, maximum=0.999997
The statistical information provided by the Stat() structure c is hereby not
stored, but actually computed on the fly. This can be seen if the union of
another Stat() structure e with c is printed:
> e := Stat('another million [0,1] random numbers'):
> to 1e6 do e + Rand() od:
> print (c union e);
one million [0,1] random numbers and another million [0,1]
random numbers: number of sample points=2e+06
mean = 0.50006 +- 0.00040
variance = 0.08332 +- 0.00010
skewness=0.000104374, excess=-1.19933
minimum=3.62886e-07, maximum=1
Here c union e is not the union of the fixed statistical values of c and e, but
the statistical values of the union of c and e.
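The same behaviour can be imitated in Python, where operators are overloaded through special methods such as `__add__`. The class below is only a loose analogue written for illustration: it keeps the raw samples, which is a simplification and not necessarily how Darwin's Stat() works, and it uses the operator `|` in the role of union.

```python
from statistics import fmean, pvariance


class Stat:
    """Loose Python analogue of a sample collector; not Darwin's implementation."""

    def __init__(self, label: str, samples=()):
        self.label = label
        self.samples = list(samples)

    def __add__(self, value: float) -> "Stat":
        # Overloaded "+": appends a sample instead of doing arithmetic.
        # Mutating in place mirrors the Darwin session, not usual Python style.
        self.samples.append(value)
        return self

    def __or__(self, other: "Stat") -> "Stat":
        # Overloaded "|" plays the role of "union": pool the raw samples.
        return Stat(f"{self.label} and {other.label}", self.samples + other.samples)

    @property
    def mean(self) -> float:
        return fmean(self.samples)  # derived on request, never stored

    @property
    def variance(self) -> float:
        return pvariance(self.samples)


c = Stat("c")
for x in (0.0, 1.0):
    c + x
e = Stat("e", [1.0, 1.0])
print((c | e).mean)  # 0.75
```

The mean of the pooled structure (0.75) is computed from the four pooled samples; it is not derived from the two separate means alone.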
Objects in OOLs can also have methods which can be accessed
like attributes. The result of methods is not stored, but computed on
the fly (see example 4, where the statistical values mean, variance,
skewness, excess, minimum and maximum are computed on the fly).
Hence, they correspond to the notion of views: a view is a stored
query, the result of which is computed on the fly based on stored
information.
As can be seen in example 2, searching in OODBs works
similarly to SQL queries in RDBs with Select From Where statements.
Apart from the obvious difference that not only attributes, but also
objects and attributes of objects can be used, the main difference is
that the methods of objects can also be used for queries. Again, the
values have to be computed on the fly. This has two consequences:
first, query optimization in OODBs is complicated; the complexity of
the model and of query optimization are positively correlated. Second,
OODB systems are slower and less efficient than their RDB
counterparts because of the overhead in storing objects and the increased
complexity in interpretation.
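The notion of a view can be made concrete with Python's built-in sqlite3 module. The sketch below is an illustration only: the view name `avg_grade`, the column name `mean_grade` and the second lecture 'OTHER' are made up. Only the query is stored; its result follows the stored data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grades (lecture_name TEXT, s_id INTEGER, grade REAL);
    INSERT INTO grades VALUES ('SDB', 1, 5.75), ('SDB', 2, 5.5);
    -- Only the query is stored; the averages are not.
    CREATE VIEW avg_grade AS
        SELECT s_id, AVG(grade) AS mean_grade FROM grades GROUP BY s_id;
""")

query = "SELECT mean_grade FROM avg_grade WHERE s_id = 1"
before = conn.execute(query).fetchone()[0]
conn.execute("INSERT INTO grades VALUES ('OTHER', 1, 5.25)")  # made-up lecture
after = conn.execute(query).fetchone()[0]
print(before, after)  # 5.75 5.5
```

The second read returns a different value although the view itself was never updated, which is exactly the behaviour of a method whose result is computed on the fly.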
Summary
OODBs are similar to RDBs, but they have the huge advantage that
their attributes can be objects. Objects are collections of bits which
are understood by the system without any further knowledge. This,
together with the fact that computations on the fly and the addition of
arbitrary fields are possible, makes OODBs very appealing for SDBs.
But there are also many drawbacks. First, OODBs are used less
frequently than RDBs; there are thus fewer tools, less support and fewer
libraries available. Also, there are fewer good implementations and
no established standards. Second, OODBs are dependent on and
thus restricted by an OOL. Third, OODBs are less efficient and much
slower than their RDB counterparts because of the overhead in
storing objects and the increased complexity in interpretation. And fourth,
OODBs become more and more obsolete with ORM allowing for
virtual OODBs in RDBs and thus convergence of RDBs to OODBs.
Nevertheless, OODBs are a very attractive model for SDBs because of
their flexibility.
Appendix 1
Codd's 12 rules, cited from
http://en.wikipedia.org/wiki/Codd%27s_12_rules#The_rules
on 18/10/2012.
Rule (0): The system must qualify as relational, as a database,
and as a management system. For a system to qualify as a relational
database management system (RDBMS), that system must use its
relational facilities (exclusively) to manage the database.
Rule 1: The information rule: All information in a relational
database (including table and column names) is represented in only
one way, namely as a value in a table.
Rule 2: The guaranteed access rule: All data must be accessible.
This rule is essentially a restatement of the fundamental requirement
for primary keys. It says that every individual scalar value in the
database must be logically addressable by specifying the name of the
containing table, the name of the containing column and the primary
key value of the containing row.
Rule 3: Systematic treatment of null values: The database
management system must allow each field to remain null (or empty).
Specifically, it must support a representation of "missing information
and inapplicable information" that is systematic, distinct from all
regular values (for example, "distinct from zero or any other number", in
the case of numeric values), and independent of data type. It is also
implied that such representations must be manipulated by the DBMS
in a systematic way.
Rule 4: Active online catalog based on the relational model: The
system must support an online, inline, relational catalog that is
accessible to authorized users by means of their regular query language.
That is, users must be able to access the database's structure (catalog)
using the same query language that they use to access the database's
data.
Rule 5: The comprehensive data sublanguage rule: The system
must support at least one relational language that has a linear syntax,
can be used both interactively and within application programs,
supports data definition operations (including view definitions), data
manipulation operations (update as well as retrieval), security and
integrity constraints, and transaction management operations (begin,
commit, and rollback).
Rule 6: The view updating rule: All views that are theoretically
updatable must be updatable by the system.
Rule 7: High-level insert, update, and delete: The system must
support set-at-a-time insert, update, and delete operators. This means
that data can be retrieved from a relational database in sets
constructed of data from multiple rows and/or multiple tables. This rule
states that insert, update, and delete operations should be supported
for any retrievable set rather than just for a single row in a single
table.
Rule 8: Physical data independence: Changes to the physical level
(how the data is stored, whether in arrays or linked lists etc.) must
not require a change to an application based on the structure.
Rule 9: Logical data independence: Changes to the logical level
(tables, columns, rows, and so on) must not require a change to an
application based on the structure. Logical data independence is
more difficult to achieve than physical data independence.
Rule 10: Integrity independence: Integrity constraints must be
specified separately from application programs and stored in the
catalog. It must be possible to change such constraints as and when
appropriate without unnecessarily affecting existing applications.
Rule 11: Distribution independence: The distribution of portions
of the database to various locations should be invisible to users of the
database. Existing applications should continue to operate
successfully when a distributed version of the DBMS is first introduced and
when existing distributed data are redistributed around the system.
Rule 12: The nonsubversion rule: If the system provides a
low-level (record-at-a-time) interface, then that interface cannot be used
to subvert the system, for example, bypassing a relational security or
integrity constraint.
Appendix 2
Comparison of frequently used OOLs as provided on the course
website (http://www.cbrg.ethz.ch/education/SDB/languages.pdf) on
20/10/2012.

Object construction
    C++:     A a(...); A *a = new A(...)
    Java:    A a = new A(...)
    Python:  a = A(...)
    Darwin:  a := A(...)

[row label missing]
    Java:    Java.lang.reflection.*
    Python:  a.X, getattr(a,"X")
    Darwin:  a[X], (computed value) a['X'], a[other]

Polymorphic operators
    C++:     A::operator+(...)
    Java:    (only predefined)
    Python:  A.__add__(self,other): a+b, c | set([1]), 5*'d'
    Darwin:  a+b, c union 1, 5*d

Inheritance
    C++:     class A : public B,C (multiple inheritance, different protection levels)
    Java:    public class A extends B implements C (classes and interfaces)
    Python:  class A(B):
    Darwin:  Inherit(A,B), ExtendClass(A,B, [name,type,def], ...)

Generics/Templates for classes
    C++:     template<class A> class B
    Java:    public class A<B,C>
    Python:  no
    Darwin:  no

Generics/Templates for functions/methods
    C++:     template<class A> A max(A a, A b)
    Java:    public static <A> A max(A a, A b)
    Python:  all parameters generic
    Darwin:  no

Table 1: Overview of some OO languages