A Web-based System for Schema Integration
for the honor degree of
Bachelor of Computer Science
Abstract
Schema integration is the solution for acquiring a non-redundant, unified representation of all data managed in an organization, and for dealing with the heterogeneity of databases in database management systems. It is the process of integrating existing or proposed database schemas into a reconciled, global, integrated schema. In this context, the aim of this paper is to present a web-based system for schema integration. The proposed system originates from an existing data integration system named AutoMed, a desktop-based system that uses a data integration technique called the Both-as-view (BAV) approach to establish mappings between schemas expressed in a common hypergraph data model (HDM). At present, our system consists of two main functions: displaying various schema models using a common layout based on a system of circles, and allowing users to interact with these schemas directly on the web to perform integration. The schema models currently supported are common ones: the Entity-Relationship model, the Relational model and the UML model.
Acknowledgments
Firstly, I would like to send my thanks to my supervisor, Mr Le Minh Duc, who guided me from the initial to the final stage of this project. Admittedly, I was not hard-working for the whole time during this research, but thanks to his enthusiasm there were times when I tried harder to find a solution and gained much more valuable things. His professional working style was also a great motivation for me to do this project. Once again, I thank him for his suggestions, his valuable feedback and his companionship during nearly 7 months.
Secondly, I would like to thank Dr Nguyen Xuan Hoai, the director of our research institute, who always encouraged us and provided the time and the best conditions for us to work on this honor research program. His weekly lectures on research methodology were very helpful to us during this research. At the start of each lecture, he always asked us about our research progress and how we felt about it. This care, and his encouragement when we said we were tired and unhappy, gave us the motivation to continue until the end of this research.
Thirdly, I would like to thank the Faculty of Information Technology for all the knowledge I gained during four years, not only the academic knowledge but also the soft skills, and especially for giving us the chance to participate in this research program. It gave us an opportunity to learn how to do research, to broaden our minds, to gain a lot of knowledge, to learn new things and to learn how to deal with pressure while doing research.
Table of Contents
List of Figures
Chapter 1 Introduction
1.1 Challenges
1.2 Objectives
1.3 Motivation
1.4 Methodology
1.5 Contributions
1.6 An Overview of the Rest of the Document
Chapter 2 The Background Knowledge
2.1 Schema Integration
2.1.1 Concepts
2.1.2 Schema Integration Methodology
3.2 System Functions
Display schemas
Chapter 4 System Design and Implementation
4.2.1 Display Schemas
4.2.2 Transform Schemas
Evaluation: the integration result
4.2.3 Display Schema Transformations Output
Chapter 5 Conclusion
5.1 A Summary of the Lessons Learned
5.2 Future work
References
List of Figures
Figure 1: Basic E-R model notations (Jeffrey A Hoffer et al 2006)
Figure 2: An example of Entity-Relationship Model
Figure 3: EER super type and subtype notations (Jeffrey A Hoffer et al 2006)
Figure 4: An example of Relational model
Figure 5: Basic UML notations (Craig Larman 2005)
Figure 6: An example of UML model
AutoMed system architecture
Conceptual View of the AutoMed Repository (McBrien 2006)
Hyper graph data model (HDM) in AutoMed
Figure 11: HDM and ER Models in AutoMed before customizing
Figure 12: E-R, Relational and UML schema model representation in AutoMed
Figure 13: User interface for creating schema transformations in AutoMed
Figure 14: Transformation pathway in AutoMed
Figure 15: Transformation details in AutoMed
Figure 16: Proposed web-based schema integration system
Figure 17: Circles graphical system
Figure 18: ER model in web-based system
Figure 19: UML model in web-based system
Figure 20: Relational model in web-based system
Figure 21: Enhanced ER model in web-based system
Figure 22: Demonstration of applying transformations
Figure 23: Single transformation pathway in web-based system
Figure 24: Two pathways go through a Schema
Figure 25: The process for display schemas request
Figure 26: Graph data structure
Figure 27: Point data structure
Figure 28: An example: schema in representation of graph model
Figure 30: The process of apply transformation request
Figure 31: The process of view integration output request
Table 1: ER model characteristics and equivalent SVG elements
Table 2: UML model characteristics and equivalent SVG elements
Table 3: Relational model characteristics and equivalent SVG elements
Table 4: EER model characteristics and equivalent SVG elements
Chapter 1 Introduction
In large applications or organizations, the structure of the database is too complex to be modeled by a single designer in a single view. Moreover, user groups typically operate independently and have their own requirements and expectations of data, which may conflict with those of other user groups. An organization therefore needs a unified database structure that provides a general view of the whole organization, covers the partial database structures of all groups, and facilitates querying the data stored by these groups. Schema integration is a solution for this situation: it integrates the database schemas of the groups into a single, unified schema. This paper introduces a system that carries out this process: a web-based system for integrating schemas.
browsers allows the users to interact directly with schema objects in order to apply transformations between schemas, perform the integration, and view integration outputs on the web? Generally, those are the key functions of our system.
This system originates from an existing data integration system named AutoMed, a software package with a Java API and a graphical tool for performing schema integration. I chose AutoMed because it currently has good features for a schema integration system. The first feature is the automatic wrapping of relational schemas. The inputs of a schema integration system are schemas, while many existing applications and organizations manage their data in relational database management systems (RDBMS); hence, it saves time and effort for users if a schema can be extracted automatically from its database. The second feature is the representation and display of schema models in a high-level graph-based model. The third key feature is the availability of a Java Repository API that allows users to apply transformations between schemas in order to integrate them. In particular, AutoMed uses a low-level hypergraph data model (HDM) as the common data model for integrating schemas, a novel feature that distinguishes it from other existing systems. The fourth feature is viewing integration outputs, which are pathways containing the transformations that users created on schemas. In addition to these four primitive functions that help users carry out schema integration, AutoMed supports queries over schemas.
Besides these good features, AutoMed has some limitations. First, it displays each schema in a separate window; in other words, schemas are shown in different views, which makes comparison among schemas difficult. Moreover, to create transformations between two schemas or to query schemas, users have to write complicated queries in a functional Intermediate Query Language (IQL). In addition, AutoMed requires a specific working environment: users must know how to configure and install it on their machine, because so far AutoMed is still a standalone program.
However, the strengths of AutoMed cannot be denied, so on top of it I propose to build a user-friendly schema integration system for both non-expert and expert users, which they can access at any time without installing software or learning much technical and academic knowledge. AutoMed includes a Java Repository API for performing the schema integration process. Therefore, I utilize this API as the schema integration engine on the web server of the web-based schema integration system. The strengths and limitations of AutoMed are detailed in chapter 2.
The rest of this paper illustrates the methodology as well as the process I have followed to accomplish this project.
1.3 Motivation
I am interested in database management, especially in integrating schemas; hence I chose this topic for my research and thesis project. Moreover, my ambition is to build something practical rather than raise an idea without implementing it. While studying schema integration, I learned a lot from the AutoMed system and discovered many good features as well as some limitations that I think can be improved. I therefore decided to develop this desktop-based system, using the knowledge and techniques I learned for dealing with schema integration, into a web-based system that is easier for users than the original.
specify the new things I propose to do and think of the algorithms to fulfill them. During this process, many things were totally new to me and very complicated to understand, so it took time to read them over and over again, to search and keep reading, to take notes and to ask my supervisor for more explanation.
Secondly, programming skill is strongly required to finish this project. Although the idea and approach are clear, they are only on paper and need implementing. The architecture of this system is client-server, and the AutoMed API engine, which is implemented in Java, is placed on the server, so I use JSP, Java and Servlets to work with this engine.
On the other side, I use SVG (Scalable Vector Graphics) to display schemas on the web because it is supported by almost all current web browsers and it can be integrated with scripts that handle events triggered on its objects, which increases the interaction with users. Thanks to this, users can transform schemas easily by interacting with the objects in SVG and HTML forms.
1.5 Contributions
To deal with the challenges stated before, I use a common layout to represent both the different schema models and the integration outputs (i.e. transformation pathways), and then use the Scalable Vector Graphics language to display them on the web. To perform integration, users interact directly with the objects in the schema to create transformations by manipulating forms in the web browser and making confirmations, without learning overly complex academic knowledge. I have completed the project with the following achievements:
- Defining data structures to store and manipulate schemas
- Successfully displaying schemas in Entity-Relationship, UML and relational models on the web using a system of circles
- Allowing users to integrate schemas by interacting with schema objects directly on the web
1.6 An Overview of the Rest of the Document
The rest of this paper is organized as follows. Background knowledge, necessary concepts and the original AutoMed system are reviewed in chapter 2. While introducing AutoMed in that chapter, I state its functions and the problems that I propose to solve in my system. In chapter 3, I explain the proposed solutions for those problems, which are the limitations of the AutoMed system, and for the objectives specified in chapter 1. Chapter 4 presents the detailed design and implementation of the solutions described in chapter 3. A summary of the paper, the lessons learned in finishing this project and future work are included in chapter 5.
Chapter 2 The Background Knowledge
2.1 Schema Integration
2.1.1 Concepts
To understand the schema integration problem, it is necessary to understand and clearly differentiate the basic concepts. This section briefly explains the most basic concepts and terminology of the schema integration problem.
Schema (Database Schema)
“The DBMSs that store and manipulate a database must have a definition of the database in the form of a schema. This is termed the intension of the database, and the actual values of data in a database are called instances or occurrences of data. Sometimes they are termed the extension of a database, or just ‘the database’” (Batini and Lenzerini 1986).
Data Model (Database Model)
Codd (1970) stated “A data model is a notation for describing data or information”; the description generally consists of three parts: the structure of the data, the operations on the data, and the constraints on the data.
In programming languages such as C or Java, the tools for describing the structure of the data used by a program are arrays, structures or objects. These data structures are used to implement data in the computer. In the database world, data models are at a somewhat higher level than data structures. In general, operations on data are anything that can be performed on it. In database models, operations are restricted to a limited set of queries (operations that retrieve information) and modifications (operations that change the database). Constraints on the data are limitations on what the data can be.
From the definition of the schema (the structure of the database) and of the model (the notation for describing the structure of, and constraints on, the database), we can regard the model as the representation of the schema in a specific form.
Some common database models are the Relational model, Entity-Relationship, Enhanced Entity-Relationship, UML, and XML.
• Entity-Relationship Model
The E-R model is defined in the book on modern database management [1] as “a detailed, logical representation of the data for an organization or for a business area. The E-R model is expressed in terms of entities, the relationships (or associations) among those entities, and the attributes (or properties) of both the entities and their relationships. An E-R model is normally expressed as an entity-relationship diagram (or E-R diagram), which is a graphical representation of an E-R model”. The book also states that “an entity is a person, place, object, event, or concept in the user environment about which the organization wishes to maintain data”. Thus, an entity has a noun name, such as a machine, an employee, a class or a student.
Fig 1 and 2 show the notation and an example of an E-R model.
• Enhanced Entity-Relationship Model
As defined in book [1] by Hoffer (2006), “the term enhanced entity-relationship (EER) model is used to identify the model that has resulted from extending the original E-R model with these new modeling constructs”. Besides the basic constructs from the original E-R model, such as entities, relationships and attributes, the most important modeling construct incorporated in the EER model is the super type/subtype relationship. The notations for these constructs are depicted in Fig 3. This facility enables us “to model a general entity type (called the super type) and then subdivide it into several specialized entity types (called subtypes)”.
[Figure 3: EER super type and subtype notations (Jeffrey A Hoffer et al 2006)]
unnamed rows. An attribute, consistent with its definition in the E-R model, is a named column (or field) of a relation. To enable storing and retrieving a row of data in a relation based on the data values stored in that row, each relation must have a primary key: an attribute (or combination of attributes) that uniquely identifies each row in the relation. Each row contains a unique instance of data for the categories defined by the columns (Hoffer 2006). A primary key is designated by underlining the attribute name.
The relationship between two tables or relations is represented through the use of a foreign key, defined by Hoffer (2006) as “an attribute (possibly composite) in a relation of a database that serves as the primary key of another relation” and designated by a dashed underline. Fig 4 below is an example of the Relational model.
Figure 4: An example of Relational model
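As a minimal sketch of how a primary key and a foreign key behave in practice, the following Python snippet uses the standard sqlite3 module. The table and column names (department, employee, dept_id) are hypothetical and are not taken from Figure 4.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one primary key / foreign key pair."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE department ("
        "dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE employee ("
        "emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "dept_id INTEGER REFERENCES department(dept_id))"
    )
    conn.execute("INSERT INTO department VALUES (1, 'Research')")
    conn.execute("INSERT INTO employee VALUES (10, 'An', 1)")
    return conn


def try_insert(conn, sql):
    """Return True if the statement is accepted, False on a key violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_demo_db()
ok_row = try_insert(conn, "INSERT INTO employee VALUES (11, 'Binh', 1)")
duplicate_key = try_insert(conn, "INSERT INTO employee VALUES (10, 'Dup', 1)")
dangling_reference = try_insert(conn, "INSERT INTO employee VALUES (12, 'Chi', 99)")
```

Here the second insert is rejected because the primary key value 10 already identifies a row, and the third is rejected because no department row has the key 99.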
• UML model
The UML data model represents data in the form of classes, attributes (i.e. data members of a class) and associations (the relationships between classes). Fig 5 shows the basic notations of a UML model. Fig 6 gives an example of a UML model using those notations.
[Figure 5: Basic UML notations (Craig Larman 2005). Recoverable labels: Class; Dependency (a change to one thing will affect the other); Association (a set of links between objects)]
related (called schema matching), then map one schema to another (schema mapping) by creating transformations between schemas.
Therefore, performing the integration task can be defined simply as applying transformations between two schemas. In other words, fulfilling the integration task eventually comes down to creating transformations between schemas to achieve the correspondences.
2.1.2 Schema Integration Methodology
Causes for Schema Diversity
The basic problems to be dealt with during integration come from the structural diversity (the differences in schema models) and the semantic diversity of the schemas to be merged. The causes of schema diversity are different perspectives, equivalence among constructs of the model, and incompatible design specifications [2].
Different Perspectives
In the design process, different user groups or designers adopt their own viewpoints in modeling the same objects in the application domain. For instance, different names may be attached to the same concept in two schemas.
Equivalence among Constructs of the Model
Typically, in conceptual models, several combinations of constructs can model the same application domain equivalently. As a consequence, “richer” models give rise to a larger variety of possibilities to model the same situation.
Incompatible Design Specifications
Erroneous choices regarding names, types, integrity constraints, etc. may result in erroneous inputs to the schema integration process. A good schema integration methodology must lead to the detection of such errors.
Common Concepts
Owing to the causes of schema diversity described above, it may very well happen that the same concept of the application domain is represented by different representations (R1 and R2) in different schemas, and several types of semantic relationships can exist between such representations. They may be identical, equivalent, compatible, or incompatible:
Identical (1): R1 and R2 are exactly the same. This happens when the same modeling constructs are used, the same perceptions are applied, and no incoherence enters into the specification.
Equivalent (2): R1 and R2 are not exactly the same because different but equivalent modeling constructs have been applied. The perceptions are still the same and coherent.
Compatible (3): R1 and R2 are neither identical nor equivalent. However, the modeling constructs, designer perception, and integrity constraints are not contradictory.
Incompatible (4): R1 and R2 are contradictory because of the incoherence of the specification.
Situations (2), (3), and (4) above can be interpreted as conflicts. Conflicts and their resolutions are central to the problems of integration. Generally, a conflict between two representations R1 and R2 of the same concept is every situation that gives rise to the representations R1 and R2 not being identical.
Steps and Goals of the Integration Process
According to Batini and Lenzerini (1986) [3], the methodologies that accomplish the task of schema integration can be considered a mixture of four activities:
i. Pre-integration:
This phase analyses the schemas to decide upon an integration policy. The policy governs the choice of schemas to be integrated, the order of integration and a possible assignment of preferences to entire schemas or portions of schemas. Giving preference to financial applications over production applications is one example of an integration policy.
Global strategies for integration, namely the number of schemas to be integrated at one time, are also considered in this phase. Collecting additional information relevant to integration, such as constraints among views, is also considered part of this phase.
ii. Comparison of the schemas:
This phase analyses and compares the schemas to determine the correspondences among concepts and to detect possible conflicts. It singles out not only the set of common concepts but also the set of different concepts in different schemas. Two kinds of conflicts are commonly considered: name conflicts and structural conflicts. Regarding the first kind, schemas in data models incorporate names for the various objects represented. People from different application areas of the same organization refer to the same data using their own terminology and names, which results in a proliferation of names and possible inconsistencies among names in the component schemas. Name conflicts can be categorized into two types: homonyms, where the same name is used for two different concepts, which can give rise to inconsistency, and synonyms, where the same concept is described by two or more names. Unless different names improve the understanding of different users, they are not justified.
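The two kinds of name conflict can be illustrated with a small Python sketch. It assumes, hypothetically, that each schema is given as a dictionary from object names to an identifier of the real-world concept they denote; in practice that correspondence has to be established by the designer. The schema contents below are invented for illustration.

```python
def find_name_conflicts(schema_a, schema_b):
    """Return (homonyms, synonyms) between two name -> concept dictionaries."""
    # Homonym: the same name appears in both schemas but denotes different concepts.
    homonyms = sorted(
        name
        for name in schema_a.keys() & schema_b.keys()
        if schema_a[name] != schema_b[name]
    )
    # Synonym: two different names denote the same concept.
    synonyms = sorted(
        (name_a, name_b)
        for name_a, concept_a in schema_a.items()
        for name_b, concept_b in schema_b.items()
        if concept_a == concept_b and name_a != name_b
    )
    return homonyms, synonyms


# Hypothetical component schemas of two departments.
sales = {"Client": "person_who_buys", "Order": "purchase_request"}
support = {"Customer": "person_who_buys", "Order": "sort_sequence"}

homonyms, synonyms = find_name_conflicts(sales, support)
```

In this example "Order" is a homonym, while "Client" and "Customer" are synonyms.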
The second kind is structural conflicts, which arise as a result of a different choice of modeling constructs or integrity constraints. There are four types of structural conflicts, classified independently of the various terminologies and of the specific characteristics of the different data models: type conflicts, dependency conflicts, key conflicts and behavioral conflicts.
Type conflicts: these arise when the same concept is represented by different modeling constructs in different schemas. For instance, an object may be represented as an entity in one schema and as an attribute in another.
Dependency conflicts: these arise when a group of concepts is related by different dependencies in different schemas. For example, a relationship between two entities can be 1:1 in one schema but m:n in another.
Key conflicts: these arise when different keys are assigned to the same concept in different schemas. For example, SS# and Sno may be the keys of the common entity Student in two component schemas.
Behavioral conflicts: these arise when different insertion/deletion policies are associated with the same class of objects in distinct schemas. For example, in one schema the existence of an entity depends on another entity, so that if the latter is deleted the former is deleted as well; in another schema, that entity exists without any dependence on other entities.
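Two of these structural conflicts, type conflicts and key conflicts, are easy to detect mechanically once common concepts have been identified. The sketch below assumes a hypothetical dictionary representation in which each concept records the modeling construct used for it and, optionally, its key; the Student keys follow the example in the text, while the Address entry and the `addr_id` key are invented.

```python
def find_structural_conflicts(schema_a, schema_b):
    """List (concept, conflict_kind) pairs for concepts present in both schemas."""
    conflicts = []
    for concept in sorted(schema_a.keys() & schema_b.keys()):
        a, b = schema_a[concept], schema_b[concept]
        if a["construct"] != b["construct"]:
            # Same concept, different modeling construct.
            conflicts.append((concept, "type"))
        elif a.get("key") != b.get("key"):
            # Same construct, but different keys assigned.
            conflicts.append((concept, "key"))
    return conflicts


schema_1 = {
    "Student": {"construct": "entity", "key": ("SS#",)},
    "Address": {"construct": "attribute"},
}
schema_2 = {
    "Student": {"construct": "entity", "key": ("Sno",)},
    "Address": {"construct": "entity", "key": ("addr_id",)},
}

structural_conflicts = find_structural_conflicts(schema_1, schema_2)
```

Dependency and behavioral conflicts would need cardinalities and deletion policies in the representation, which this sketch leaves out.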
iii. Conforming schemas:
The goal of this step is to conform or align the schemas to make them compatible; in other words, to make it possible to construct a single global schema by changing some user views. Achieving this goal amounts to resolving the conflicts, which in turn requires schema transformations to be performed. Sometimes schema transformations are performed during merging and restructuring.
iv. Merging and restructuring:
During this step, different kinds of operations, such as transforming an attribute into an entity, are performed on either the component schemas or the temporary integrated schema. The activities are first merging the component schemas by means of a simple superimposition of common concepts, and then performing restructuring operations on the integrated schema obtained by such a merging.
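The superimposition of common concepts can be sketched in Python as follows. This is only an illustration under a simplifying assumption: each schema is a dictionary from entity names to sets of attribute names, and all conflicts have already been resolved in the conforming step, so that equal names denote equal concepts. The example schemas are hypothetical.

```python
def superimpose(schema_a, schema_b):
    """Merge two conformed schemas by overlaying entities with the same name."""
    merged = {}
    for schema in (schema_a, schema_b):
        for entity, attributes in schema.items():
            # A common entity keeps the union of the attributes seen in both schemas.
            merged.setdefault(entity, set()).update(attributes)
    return merged


registrar = {"Student": {"id", "name"}}
library = {"Student": {"id", "email"}, "Book": {"isbn"}}

integrated = superimpose(registrar, library)
```

Restructuring operations would then be applied to `integrated`, for example to remove redundant attributes.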
2.1.3 Approaches
When introducing data integration approaches, Lenzerini (2002) [4] states that one of the main tasks in the design of a data integration system¹ is to establish the mapping between the sources and the global schema. In that article, Lenzerini introduces two approaches, Global-as-view and Local-as-view (GAV and LAV, respectively), which are used for the specification of the mapping between the global schema and the sources. Therefore, here we present these data integration approaches as mapping techniques for schema integration.
¹ The aim of data integration system is combining the data residing at different sources, and
Currently, GAV, LAV and Both-as-view (BAV) are known as the traditional data integration approaches. The classic GAV and LAV approaches were presented by Lenzerini (2002), while BAV was introduced as a bi-directional schema transformation technique for data integration by McBrien and Poulovassilis (2002).
Below is the formalization of a schema integration system presented by Lenzerini (2002) [4] for modeling GAV and LAV:
A schema integration system $\mathcal{I}$ is a triple $\langle G, M, S \rangle$ where:
• $G$ is the global schema (structure and constraints),
• $S$ is the source schema (structure and constraints), and
• $M$ is the mapping between $G$ and $S$.
To specify the semantics of $\mathcal{I}$, we start with a source database $D$ (source data coherent with $S$). We call a global database for $\mathcal{I}$ any database for $G$. A global database $B$ for $\mathcal{I}$ is said to be legal with respect to $D$ if:
• $B$ is legal with respect to $G$, i.e. $B$ satisfies all the constraints of $G$, and
• $B$ satisfies the mapping $M$ with respect to $D$.
We can also specify the semantics of queries posed to a data integration system. If $q$ is a query of arity $n$ and $DB$ is a database, we denote by $q^{DB}$ the set of tuples (of arity $n$) in $DB$ that satisfy $q$. Given a source database $D$ for $\mathcal{I}$, the answer $q^{\mathcal{I},D}$ to a query $q$ in $\mathcal{I}$ with respect to $D$ is the set of tuples $t$ such that $t \in q^{B}$ for every global database $B$ that is legal for $\mathcal{I}$ with respect to $D$. The set $q^{\mathcal{I},D}$ is called the set of certain answers to $q$ in $\mathcal{I}$ with respect to $D$.
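The notion of certain answers can be made concrete with a small Python sketch: a tuple is a certain answer if the query returns it in every legal global database. The sketch assumes, for illustration only, that the legal global databases are given explicitly as a finite list of dictionaries (relation name to set of tuples); in general there may be infinitely many, and computing them is the hard part. The relation name and data are hypothetical.

```python
def certain_answers(query, legal_global_databases):
    """Intersect the answers of `query` over all given legal global databases."""
    answer_sets = [set(query(db)) for db in legal_global_databases]
    if not answer_sets:
        # With no legal database the definition holds vacuously for every tuple,
        # so there is no finite answer to return.
        raise ValueError("at least one legal global database is required")
    return set.intersection(*answer_sets)


def research_ids(db):
    """Hypothetical query: project the first column of the 'research' relation."""
    return {(row[0],) for row in db["research"]}


b1 = {"research": {(1, "Databases"), (2, "Networks")}}
b2 = {"research": {(1, "Databases"), (3, "Graphics")}}

answers = certain_answers(research_ids, [b1, b2])
```

Only the tuple `(1,)` is returned, because it is the only answer that holds in both `b1` and `b2`.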
Global-as-view
The GAV approach consists of a global schema constructed over the schemas of the data sources. Its constructs are defined as views (i.e., queries over the source schemas that compose the elements of the global schema). The drawback of this approach is that the global schema is strongly coupled with the underlying source schemas and their changes; that is, it does not readily support the evolution of the local schemas. A change in the information sources, or the addition of new information, requires revising the global schema and the mappings between the global schema and the source schemas. Thus, GAV does not scale to larger applications and is also a poor solution for the web context, where sources are autonomous and volatile.
When modeling with GAV, the mapping $M$ associates to each element $g$ in $G$ a query $q_S$ over $S$.
A GAV mapping is a set of assertions, one for each element $g$ of $G$, of the form $g \leadsto q_S$. The idea is that each element $g$ of the global schema should be characterized in terms of a view $q_S$ over the sources. The mapping explicitly tells the system how to retrieve the data related to each element of the global schema.
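A GAV mapping can be sketched in Python as a dictionary that associates each global element with a view, i.e. a function evaluated over the source relations. All relation names and data below (`staff`, `teams`, `employee`, `located`) are hypothetical and only illustrate the idea.

```python
def materialize_global(gav_mapping, sources):
    """Evaluate every view of a GAV mapping over the source relations."""
    return {element: set(view(sources)) for element, view in gav_mapping.items()}


# Hypothetical source database: relation name -> set of tuples.
sources = {
    "staff": {(1, "An", "R&D"), (2, "Binh", "Sales")},
    "teams": {("R&D", "Hanoi")},
}

# One assertion per global element: the element is defined by a query over the sources.
gav_mapping = {
    # employee(id, name) is a projection of staff.
    "employee": lambda s: {(i, name) for (i, name, _team) in s["staff"]},
    # located(name, city) is a join of staff and teams on the team name.
    "located": lambda s: {
        (name, city)
        for (_i, name, team) in s["staff"]
        for (team_name, city) in s["teams"]
        if team == team_name
    },
}

global_db = materialize_global(gav_mapping, sources)
```

Because every global element is tied to a query over concrete source relations, adding or changing a source means revisiting these view definitions, which is the coupling described above.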
Local-as-view
The LAV approach takes the inverse point of view: it defines the source schemas over the global schema, and so scales better. Consequently, it has the advantage that changes in the underlying sources do not imply changes in the global schema. For a new (or changed) source schema, it is only necessary to give a source description that describes the source relations as views of the global schema. However, LAV has problems if one needs to change the global schema, since all the rules defining local schemas as views of the global schema will need to be reviewed.
When modeling with LAV, the mapping $M$ associates to each element $s$ of the source schema $S$ a query $q_G$ over $G$.
A LAV mapping is a set of assertions, one for each element $s$ of $S$, of the form $s \leadsto q_G$.
transformation sentences, LAV and GAV view definitions can be fully derived; conversely, BAV transformation sentences may be partially derived from LAV and GAV view definitions.
It is clear that this approach overcomes the disadvantages of both LAV and GAV, because the global schema and the source schemas are each defined in terms of the other. Thus, any change, modification or addition in the source or global schemas can be propagated to the other.
$S_g$: research(ID, Topic, Area, Year)
publication(pubNo, ID, pubName, pubTime, title)
$S_g$ is designed for references about research publications (i.e. the publications of research work). The information in each publication record is the publication's title, the publisher's name ('pubName'), the publisher's officialLink, the publication time ('pubTime') and the ID of the published research.
Consider the source schema; over it we define mappings that define the objects of the global schema $S_g$.
In terms of GAV, the objects of the global schema can be defined as follows:
$G_1$: research(ID, Topic, Area, Year) $= \{x, y, z, w \mid \langle x, y, z, w \rangle \in \text{Research}\}$
$G_2$: publication(pubNo, ID, pubName, pubTime, title) $= \{x, y, z, u, v \mid \langle x, y, z, u, v \rangle \in \text{publish}\}$
In terms of LAV, the objects of the source schema can be defined as follows:
$L_1$: Research(ID, Topic, Area, Year) $= \{x, y, z, w \mid \langle x, y, z, w \rangle \in \text{research}\}$
$L_2$: Publisher(pubName, officialLink) $= \{x, y \mid \langle \_, \_, x, \_, \_ \rangle \in \text{publication} \land y = \text{null}\}$
$L_3$: publish(pubNo, ID, pubName, pubTime, title) $= \{x, y, z, u, v \mid \langle x, y, z, u, v \rangle \in \text{publication}\}$
In terms of BAV:
To illustrate, some primitive transformations are introduced here: addRel (add relation), addAtt (add attribute or field), delRel (delete relation) and delAtt (delete attribute). Note that each transformation requires a query parameter, which indicates how to derive the extent of the deleted or new construct from the rest of the schema constructs. Note also that the schema resulting from each transformation step is called an intermediate schema.
These transformation steps below are created in terms of BAV technique
Transformation steps for defining objects of research relation in S;
1 addRel(«research, ID», { x | x ∈ «Research, ID»})
2 addAtt(«research, Topic», notnull, { x, y | «x, y» ∈ «Research, Topic»})
3 addAtt(«research, Area», notnull, { x, y | «x, y» ∈ «Research, Area»})
4 addAtt(«research, Year», notnull, { x, y | «x, y» ∈ «Research, Year»})
Picture 1: Adding relation 'research' into Global schema
The screenshot above shows the intermediate schema after 5 transformation steps. Note that step 1 is composed of 2 steps: addRel and an addAtt for the key attribute ID. Moreover, the query defining the extent of each object in this screenshot is expressed in the functional Intermediate Query Language (IQL), which will be introduced in the next chapter.
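The effect of the add steps on the schema can be sketched in Python. This is an illustration under stated assumptions rather than AutoMed's API: a schema is modelled as a dictionary from a construct (relation name, key or attribute name) to its extent, the helper add is hypothetical, the notnull argument is omitted, and the sample rows are invented.

```python
# Source schema Ss, restricted to the Research relation.
source = {
    ("Research", "ID"): {1, 2},
    ("Research", "Topic"): {(1, "Schema matching"), (2, "Data cleaning")},
    ("Research", "Area"): {(1, "Databases"), (2, "Databases")},
    ("Research", "Year"): {(1, 2009), (2, 2010)},
}


def add(schema, construct, query):
    """Model addRel/addAtt: return a new schema with `construct` added.

    `query` receives the current schema and returns the extent of the
    new construct, derived from the constructs that already exist.
    """
    if construct in schema:
        raise ValueError(f"{construct} already exists")
    extended = dict(schema)
    extended[construct] = set(query(schema))
    return extended


# Each call yields one intermediate schema.
s1 = add(source, ("research", "ID"),
         lambda s: {x for x in s[("Research", "ID")]})
s2 = add(s1, ("research", "Topic"),
         lambda s: {(x, y) for (x, y) in s[("Research", "Topic")]})
s3 = add(s2, ("research", "Area"),
         lambda s: {(x, y) for (x, y) in s[("Research", "Area")]})
s4 = add(s3, ("research", "Year"),
         lambda s: {(x, y) for (x, y) in s[("Research", "Year")]})
```

Each step returns a new dictionary instead of mutating the old one, so s1 to s4 play the role of the intermediate schemas mentioned above.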
Transformation steps to define the objects of the publication relation in Sg:
5 addRel(«publication, pubNo», { x | x ∈ «publish, pubNo»})
6 addAtt(«publication, ID», notnull, { x, y | «x, y» ∈ «publish, ID»})
7 addAtt(«publication, pubName», notnull, { x, y | «x, y» ∈ «publish, pubName»})
8 addAtt(«publication, pubTime», notnull, { x, y | «x, y» ∈ «publish, pubTime»})
9 addAtt(«publication, title», notnull, { x, y | «x, y» ∈ «publish, title»})
The result after step 9 is depicted below:
Picture 2: Transformation steps to define Global schema over Source Schema
So far, the intermediate schema contains the following relations: Publisher, publish, Research, research (as the result of transformation steps 1 to 4) and publication (as the result of transformation steps 5 to 9). The result is depicted in the picture below:
Transformation steps to remove Research relation
10 delAtt(«Research, Year», notnull, { x, y | «x, y» ∈ «research, Year» ∧ x ∈ «Research, ID»})
11 delAtt(«Research, Area», notnull, { x, y | «x, y» ∈ «research, Area» ∧ x ∈ «Research, ID»})
12 delAtt(«Research, Topic», notnull, { x, y | «x, y» ∈ «research, Topic» ∧ x ∈ «Research, ID»})
13 delRel(«Research, ID», { x | x ∈ «research, ID»})
These steps are illustrated in the image below. (The lines connecting the Research table to its fields are intentionally cleared to indicate that the transformation operations are deletions.)
Picture 5: Intermediate schema after removing relation ‘Research’ in Source schema
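The removal of the Research relation can be sketched in the same spirit. This is a hedged Python illustration, not AutoMed code: the schema is a dictionary from constructs to extents, the helper delete and the sample rows are hypothetical, only steps 10 and 13 are shown, and the consistency check inside delete is an addition of this sketch to make the role of the query visible.

```python
# Intermediate schema holding both Research (source) and research (global),
# restricted to the ID and Year constructs to keep the sketch short.
intermediate = {
    ("Research", "ID"): {1, 2},
    ("Research", "Year"): {(1, 2009), (2, 2010)},
    ("research", "ID"): {1, 2},
    ("research", "Year"): {(1, 2009), (2, 2010)},
}


def delete(schema, construct, query):
    """Model delRel/delAtt: drop `construct` from the schema.

    `query` must rebuild the extent of the deleted construct from the
    constructs that remain; the sketch rejects the step otherwise.
    """
    rest = {c: e for c, e in schema.items() if c != construct}
    if set(query(rest)) != schema[construct]:
        raise ValueError(f"query cannot rebuild {construct}")
    return rest


# Step 10: delete the Year attribute of Research.
after_10 = delete(
    intermediate,
    ("Research", "Year"),
    lambda s: {(x, y) for (x, y) in s[("research", "Year")]
               if x in s[("Research", "ID")]},
)

# Step 13: delete the Research relation itself.
after_13 = delete(
    after_10,
    ("Research", "ID"),
    lambda s: {x for x in s[("research", "ID")]},
)
```

Nothing is lost by these deletions, because every removed extent can still be recomputed from the research constructs that remain.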
Transformation steps to remove Publisher relation
14 conAtt(«Publisher, officialLink», null, void) (conAtt also deletes an attribute, but there is no information that indicates how to derive the extent of this attribute. In other words, this attribute's definition does not depend on any remaining objects in the schema.)
15 delRel(«Publisher, pubName», { x, y | «x, y» ∈ «publication, pubName»})
Transformation steps to remove publish relation
16 delAtt(«publish, title», notnull, { x, y | «x, y» ∈ «publication, title»
Picture 6: Global schema after finishing transformations
Returning to the transformation steps: steps 1 to 9 incrementally define the objects of Sg from the objects of Ss. This can be regarded as the GAV aspect.
Steps 10 to 20 then incrementally remove the objects of Ss from this intermediate schema, finally leaving only the objects of Sg, as desired. From the queries in these steps we can restore Ss by reversing the transformations as follows:
addRel(«Research, ID», { x | x ∈ «research, ID»})
addAtt(«Research, Topic», notnull, { x, y | «x, y» ∈ «research, Topic»})
addAtt(«Research, Area», notnull, { x, y | «x, y» ∈ «research, Area»})
addAtt(«Research, Year», notnull, { x, y | «x, y» ∈ «research, Year»})
extAtt(«Publisher, officialLink», null, void) (extAtt also adds an attribute, except that the attribute is not defined over any remaining objects in the schema)
addRel(«publish, pubNo», { x | x ∈ «publication, pubNo»})
addAtt(«publish, ID», notnull, { x, y | «x, y» ∈ «publication, ID»})
addAtt(«publish, pubName», notnull, { x, y | «x, y» ∈ «publication, pubName»})
addAtt(«publish, pubTime», notnull, { x, y | «x, y» ∈ «publication, pubTime»})
addAtt(«publish, title», notnull, { x, y | «x, y» ∈ «publication, title»})
So, this can be regarded as the LAV aspect.
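Reversing a pathway is mechanical: the steps are taken in the opposite order and each primitive is swapped for its counterpart. The Python sketch below illustrates this under simplifying assumptions: the Step type, the INVERSE table and the placeholder query strings q10 to q13 are hypothetical, and the query text is carried over unchanged.

```python
from typing import NamedTuple


class Step(NamedTuple):
    op: str         # primitive transformation, e.g. "delAtt"
    construct: str  # scheme of the construct, e.g. "«Research, Year»"
    query: str      # query text, or "void" when no derivation is given


# Each primitive and the primitive that undoes it.
INVERSE = {
    "addRel": "delRel", "delRel": "addRel",
    "addAtt": "delAtt", "delAtt": "addAtt",
    "extAtt": "conAtt", "conAtt": "extAtt",
}


def reverse(pathway):
    """Return the pathway that undoes `pathway` (last step first)."""
    return [Step(INVERSE[s.op], s.construct, s.query)
            for s in reversed(pathway)]


# Steps 10 to 14 from the text, with the queries abbreviated.
steps_10_to_14 = [
    Step("delAtt", "«Research, Year»", "q10"),
    Step("delAtt", "«Research, Area»", "q11"),
    Step("delAtt", "«Research, Topic»", "q12"),
    Step("delRel", "«Research, ID»", "q13"),
    Step("conAtt", "«Publisher, officialLink»", "void"),
]

restored = reverse(steps_10_to_14)
```

Reversing the reversed pathway gives back the original one, which reflects why a single BAV pathway can be used in both directions.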
2.2 AutoMed Schema Integration System
Several systems for schema integration already exist, but AutoMed offers a new solution to the problem, with features that set it apart from the others. Firstly, it uses the HDM, which comprises three primitive constructs (nodes, edges and constraints), as a common data model in which to express high-level data models such as the Entity-relationship, relational and UML models; this avoids semantic mismatches between modeling constructs. Secondly, AutoMed is based on the Both-as-view mapping technique, which can provide a complete mapping between schemas in both directions and, as analyzed above, overcomes the limitations of the GAV and LAV techniques. For these reasons, I chose it as the foundation for my web-based system. This section describes these features, the strengths of AutoMed in solving the schema integration problem that motivated the development of a new system from it, and the limitations I would like to resolve.
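To make the idea of a common low-level model concrete, the Python sketch below shows one possible way to express a relational table with the three HDM primitives. It is an assumption-laden illustration, not AutoMed's actual encoding or API: the class HDMSchema, the function relation_to_hdm, the node naming scheme and the textual constraint format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HDMSchema:
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)        # (label, node, node)
    constraints: list = field(default_factory=list)


def relation_to_hdm(name, key, attributes):
    """Express one relational table using nodes, edges and constraints.

    Assumed encoding: one node for the table, one node per column,
    one unlabelled edge linking the table node to each column node,
    and one constraint recording the key.
    """
    hdm = HDMSchema()
    hdm.nodes.add(name)
    for column in [key] + list(attributes):
        column_node = f"{name}:{column}"
        hdm.nodes.add(column_node)
        hdm.edges.add(("_", name, column_node))
    hdm.constraints.append(f"key({name}, {key})")
    return hdm


research_hdm = relation_to_hdm("research", "ID", ["Topic", "Area", "Year"])
```

An Entity-relationship or UML model could be mapped onto the same three primitives, which is what allows schemas from different high-level models to be compared and transformed in one representation.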
2.2.1 AutoMed is:
AutoMed (official project page: http://www.doc.ic.ac.uk/automed/) was originally an EPSRC (Engineering and Physical Sciences Research Council) funded project, and development work continues at Birkbeck and Imperial Colleges under a number of related projects. It is a framework and software package comprising graphical tools (Swing GUI) and a programmer's API.
AutoMed is capable of handling a wide range of data sources: the current implementation supports data held in relational DBMSs, XML files and structured flat files, and development work on unstructured text files is underway.