000059726 A Web-based System for Schema Integration


for the honor degree of

Bachelor of Computer Science


HANOI UNIVERSITY LIBRARY


Abstract

Schema integration is a solution for obtaining a non-redundant, unified representation of all the data managed in an organization, and so for dealing with the heterogeneity of databases in database management systems. It is the process of integrating existing or proposed database schemas into a reconciled, global, integrated schema. In this context, the aim of this paper is to present a web-based system for schema integration. The proposed system originates from an existing data integration system named AutoMed, a desktop-based system that uses a data integration technique called the Both-as-View (BAV) approach to establish mappings between schemas, expressed in a common hypergraph data model (HDM). At present, our system consists of two main functions: displaying various schema models using a common layout based on a system of circles, and allowing users to interact with these schemas directly on the web to perform integration. The schema models currently supported are common models such as the Entity-Relationship model, the Relational model and the UML model.


Acknowledgments

Firstly, I would like to thank my supervisor, Mr Le Minh Duc, who guided me from the initial to the final stage of this project. In truth, I was not hard-working for the whole duration of this research, but thanks to his enthusiasm there were times when I tried harder to find a solution and gained much more valuable experience. His professional working style was also a great motivation for me in this project. Once again, thank you for your suggestions, your valuable feedback and your companionship during nearly seven months.

Secondly, I would like to thank Dr Nguyen Xuan Hoai, the director of our research institute, who always encouraged us and made the time and the best possible conditions for us to work on this honors research program. The weekly lectures on research methodology he delivered were very helpful to us during this research. At the start of each lecture, he always asked us about our research progress and how we felt about it. This care, and his encouragement when we said we were tired and unhappy, gave us the motivation to continue until the end of this research.

Thirdly, I would like to thank the Faculty of Information Technology for all the knowledge I gained during four years, not only academic knowledge but also soft skills, and especially for giving us the chance to participate in this research program. It gave us the opportunity to learn how to do research, to broaden our minds, to gain a lot of knowledge, to learn new things and to learn how to deal with the pressure of research.


Table of Contents

List of Figures
Chapter 1 Introduction
1.1 Challenges
1.2 Objectives
1.3 Motivation
1.5 Contributions
1.6 An Overview of the Rest of the Document
Chapter 2 The Background Knowledge
2.1 Schema Integration
2.1.1 Concepts
2.1.2 Schema Integration Methodology

3.2 System Functions
Display schemas
Chapter 4 System Design and Implementation
4.2.1 Display Schemas
4.2.2 Transform Schemas
Evaluation: the integration result
4.2.3 Display Schema Transformations Output
Chapter 5 Conclusion
5.1 A Summary of the Lessons Learned
5.2 Future work
References

List of Figures

Figure 1: Basic E-R model notations (Jeffrey A. Hoffer et al. 2006)
Figure 2: An example of Entity-Relationship Model
Figure 3: EER supertype and subtype notations (Jeffrey A. Hoffer et al. 2006)
Figure 4: An example of Relational model
Figure 5: Basic UML notations (Craig Larman 2005)
Figure 6: An example of UML model
AutoMed system architecture
Conceptual View of the AutoMed Repository (McBrien 2006)
Hypergraph data model (HDM) in AutoMed
Figure 11: HDM and ER Models in AutoMed before customizing
Figure 12: E-R, Relational and UML schema model representation in AutoMed
Figure 13: User interface for creating schema transformations in AutoMed
Figure 14: Transformation pathway in AutoMed
Figure 15: Transformation details in AutoMed
Figure 16: Proposed web-based schema integration system
Figure 17: Circles graphical system
Figure 18: ER model in web-based system
Figure 19: UML model in web-based system
Figure 20: Relational model in web-based system
Figure 21: Enhanced ER model in web-based system
Figure 22: Demonstration of applying transformations
Figure 23: Single transformation pathway in web-based system
Figure 24: Two pathways going through a schema
Figure 25: The process for display schemas request
Figure 26: Graph data structure
Figure 27: Point data structure
Figure 28: An example schema represented in the graph model
Figure 30: The process of apply transformation request
Figure 31: The process of view integration output request

Table 1: ER model characteristics and equivalent SVG elements
Table 2: UML model characteristics and equivalent SVG elements
Table 3: Relational model characteristics and equivalent SVG elements
Table 4: EER model characteristics and equivalent SVG elements

Chapter 1 Introduction

In large applications or organizations, it is too complex for a single designer to model the structure of the database in a single view. Moreover, user groups typically operate independently within an organization and have their own requirements and expectations of the data, which may conflict with those of other groups. When the organization needs a unified database structure that gives a general view of the whole organization, covers the partial database structures of all the groups and makes the data stored by these groups easier to query, schema integration is the solution: it integrates the groups' database schemas into a single, unified schema. This paper introduces a system that carries out this process: a web-based system for integrating schemas.


browsers allows users to interact directly with schema objects to apply transformations between schemas and perform the integration, and to view integration outputs on the web? Generally, these are the key functions of our system.

This system originates from an existing data integration system named AutoMed, a software package with a Java API and a graphical tool for performing schema integration. I chose this system because AutoMed currently has good features for a schema integration system. The first feature is that it automatically wraps relational schemas. The inputs to a schema integration system are schemas, and many existing applications and organizations are managed in relational database management systems (RDBMS); hence, it saves users time and effort if a schema can be extracted automatically from its database. The second feature is that it represents and displays schema models in a high-level, graph-based model. The third key function is a Java Repository API that allows users to apply transformations between schemas in order to integrate them; in particular, AutoMed uses a low-level hypergraph data model (HDM) as the common data model for integrating schemas, a novel feature that distinguishes it from other existing systems. The fourth function is viewing integration outputs, which are pathways containing the transformations that users created on schemas. In addition to these four primitive functions that help users carry out schema integration, AutoMed supports querying over schemas.

Besides these good features, AutoMed has some limitations. For example, AutoMed displays each schema in a separate window; in other words, schemas appear in different views, which makes comparing them difficult. Moreover, to create transformations between two schemas or to query schemas, users have to write complicated queries in a functional language, the Intermediate Query Language (IQL). In addition, AutoMed requires a specific working environment: users must know how to install and configure it on their machine, because AutoMed is still a standalone program.

Nevertheless, AutoMed's current functions are valuable, so, building on it, I propose a user-friendly schema integration system for both non-expert and expert users, which they can access at any time without installing anything or learning much technical and academic knowledge. AutoMed currently includes a Java Repository API for performing the schema integration process; therefore, I would like to reuse this engine as the schema integration engine on the web server of the web-based schema integration system. These strengths and limitations of AutoMed are described in Chapter 2.

The rest of this paper illustrates the methodology and the process I followed to accomplish this project.

1.3 Motivation

Being interested in database management, and especially in integrating schemas, I chose this topic for my research and thesis project. Moreover, my ambition is to build something practical rather than raise an idea without implementing it. While studying schema integration, I learned a lot from the AutoMed system and discovered many good features as well as some limitations that I think can be improved. I therefore decided to apply the knowledge and techniques I had learned about schema integration to turn this desktop-based system into a web-based system that is easier to use than the original.


specify the new things I propose to do and devise the algorithms to achieve them. During this process, many things were completely new to me and very complicated to understand, so it took time to read them over and over again, search, keep reading recursively, take notes and ask my supervisor for further explanation.

Secondly, strong programming skills are required to finish this project. Although the idea and approach are clear, they exist only on paper and need to be implemented. The architecture of this system is client-server, and the AutoMed API engine, which is implemented in Java, is placed on the server, so I use JSP, Java and Servlets to work with this engine.

On the client side, I use SVG (Scalable Vector Graphics) to display schemas on the web because it is supported by almost all current web browsers and it can be combined with scripts that handle events triggered on its objects, which increases interaction with users. Thanks to this, users can easily transform schemas by interacting with the SVG objects and HTML forms.

1.5 Contributions

To deal with the challenges stated above, I chose to use a common layout both to represent different schema models and to represent integration outputs (i.e. transformation pathways), and to display them on the web using Scalable Vector Graphics. To perform integration, users interact directly with the objects in a schema, creating transformations by filling in forms in the web browser and confirming them, without having to learn overly complex academic knowledge. At this point, I have completed the project with the following achievements:

- Defined data structures to store and manipulate schemas

- Successfully displayed schemas in the Entity-Relationship, UML and Relational models on the web using a system of circles

- Enabled schema integration through direct interaction with schema objects on the web

1.6 An Overview of the Rest of the Document

The rest of this paper is organized as follows. Background knowledge, necessary concepts and the original AutoMed system are reviewed in Chapter 2. While introducing AutoMed in that chapter, I state its functions and the problems in them that I propose to solve in my system. In Chapter 3, I explain our proposed solutions to those problems, which are the AutoMed system's limitations, as well as to the objectives specified in Chapter 1. Chapter 4 presents the detailed design and implementation of the solutions described in Chapter 3. A summary of the paper, the lessons learned from finishing this project and future work are included in Chapter 5.


Chapter 2 The Background Knowledge

2.1 Schema Integration

2.1.1 Concepts

To understand the schema integration problem, it is necessary to understand the basic concepts thoroughly and to distinguish them clearly. This section briefly explains the most basic concepts and terminology of the schema integration problem.

Schema (Database Schema)

“The DBMSs that store and manipulate a database must have a definition of the database in the form of a schema. This is termed the intension of the database, and the actual values of data in a database are called instances or occurrences of data. Sometimes they are termed the extension of a database, or just ‘the database’” (Batini and Lenzerini 1986).

Data Model (Database Model)

Codd (1970) stated that “a data model is a notation for describing data or information”. The description generally consists of three parts: the structure of the data, the operations on the data, and the constraints on the data.

In programming languages such as C or Java, the tools for describing the structure of the data used by a program are arrays, structures or objects. These data structures are used to implement data in the computer. In the database world, data models are at a somewhat higher level than data structures. In general, operations on data are anything that can be performed; in database models, however, operations are restricted to a limited set of queries (operations that retrieve information) and modifications (operations that change the database). Constraints on the data are limitations on what the data can be.

From the definitions of the schema (the structure of the database) and of the model (the notation for describing the structure and constraints of the database), we can regard the model as the representation of the schema in a specific form.

Common database models include the Relational model, the Entity-Relationship model, the Enhanced Entity-Relationship model, UML, and XML.

• Entity-Relationship Model

The E-R model is defined in the book [1] on modern database management as “a detailed, logical representation of data for an organization or for a business area. The E-R model is expressed in terms of entities, the relationships (or associations) among those entities, and the attributes (or properties) of both the entities and their relationships. An E-R model is normally expressed as an entity-relationship diagram (or E-R diagram), which is a graphical representation of an E-R model”. The same book states that “an entity is a person, place, object, event, or concept in the user environment about which the organization wishes to maintain data”. Thus, an entity has a noun name, such as a machine, an employee, a class or a student.

Figures 1 and 2 show the notation of an E-R model and an example of one.
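To make the three E-R building blocks concrete, the following is a minimal Java sketch (Java being the language the proposed system is built in) of how entities, attributes and relationships could be held in memory. The record names `Attribute`, `Entity` and `Relationship`, the `Student`/`Course` example and the cardinality labels are hypothetical illustrations, not the data structures of AutoMed or of this system.

```java
import java.util.List;

// Hypothetical in-memory E-R structures (illustration only, not AutoMed's API).
record Attribute(String name, boolean isKey) {}

record Entity(String name, List<Attribute> attributes) {
    // Names of the attributes that act as the entity's identifier.
    List<String> keyNames() {
        return attributes.stream().filter(Attribute::isKey).map(Attribute::name).toList();
    }
}

// A binary relationship; cardinality is a label such as "1:1", "1:N" or "M:N".
record Relationship(String name, Entity from, Entity to, String cardinality) {}

class ErExample {
    static Entity student() {
        return new Entity("Student",
                List.of(new Attribute("studentID", true), new Attribute("name", false)));
    }

    static Entity course() {
        return new Entity("Course",
                List.of(new Attribute("courseID", true), new Attribute("title", false)));
    }

    static Relationship enrolls() {
        return new Relationship("Enrolls", student(), course(), "M:N");
    }

    public static void main(String[] args) {
        Relationship r = enrolls();
        // Prints: Student -Enrolls (M:N)-> Course
        System.out.println(r.from().name() + " -" + r.name() + " (" + r.cardinality() + ")-> " + r.to().name());
        // Prints: [studentID]
        System.out.println(r.from().keyNames());
    }
}
```

Records keep the sketch short (Java 16 or later); a real schema store would also need identifiers for layout and for the transformations applied later.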


• Enhanced Entity-Relationship Model

As defined in book [1] by Hoffer (2006), “the term enhanced entity-relationship (EER) model is used to identify the model that has resulted from extending the original E-R model with these new modeling constructs”. Besides the basic constructs of the original E-R model, such as entities, relationships and attributes, the most important modeling construct incorporated in the EER model is the supertype/subtype relationship. The notations for these two constructs are depicted in Fig. 3. “This facility enables us to model a general entity type (called the supertype) and then subdivide it into several specialized entity types (called subtypes)”.

Figure 3: EER supertype and subtype notations (Jeffrey A. Hoffer et al. 2006)

unnamed rows. An attribute, consistent with its definition in the E-R model, is a named column (or field) of a relation. To enable storing and retrieving a row of data in a relation based on the data values stored in that row, each relation must have a primary key: an attribute (or combination of attributes) that uniquely identifies each row in a relation. Each row contains a unique instance of data for the categories defined by the columns (Hoffer 2006). A primary key is designated by underlining the attribute name.


The relationship between two tables (relations) is represented through a foreign key, which, as defined by Hoffer (2006), “is an attribute (possibly composite) in a relation of a database that serves as the primary key of another relation”; a foreign key is designated by a dashed underline. Fig. 4 below is an example of the Relational model.
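The two integrity rules above can be checked mechanically. Below is a minimal Java sketch, assuming a relation is held as a list of rows and each row as a map from column name to value; the class and method names are hypothetical, and the sample rows merely reuse the `research`/`publication` relations that appear later in this chapter.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: a relation is a list of rows, each row a map from column name to value.
class RelationalChecks {

    // Primary key: no null key value and no two rows with the same key values.
    static boolean isValidPrimaryKey(List<Map<String, Object>> rows, List<String> keyCols) {
        Set<List<Object>> seen = new HashSet<>();
        for (Map<String, Object> row : rows) {
            List<Object> key = new ArrayList<>();
            for (String col : keyCols) {
                key.add(row.get(col));
            }
            if (key.contains(null) || !seen.add(key)) {
                return false;
            }
        }
        return true;
    }

    // Foreign key: every non-null value of fkCol in child must appear in pkCol of parent.
    static boolean satisfiesForeignKey(List<Map<String, Object>> child, String fkCol,
                                       List<Map<String, Object>> parent, String pkCol) {
        Set<Object> parentKeys = new HashSet<>();
        for (Map<String, Object> row : parent) {
            parentKeys.add(row.get(pkCol));
        }
        for (Map<String, Object> row : child) {
            Object value = row.get(fkCol);
            if (value != null && !parentKeys.contains(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> research = List.of(
                Map.of("ID", 1, "Topic", "Schema integration"),
                Map.of("ID", 2, "Topic", "Query processing"));
        List<Map<String, Object>> publication = List.of(
                Map.of("pubNo", 10, "ID", 1),
                Map.of("pubNo", 11, "ID", 3)); // no research row has ID 3
        System.out.println(isValidPrimaryKey(research, List.of("ID")));             // true
        System.out.println(satisfiesForeignKey(publication, "ID", research, "ID")); // false
    }
}
```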


Figure 4: An example of Relational model

• UML model

The UML data model represents data in the form of classes, attributes (i.e. the data members of a class) and associations (the relationships between classes). Fig. 5 shows the basic notations of a UML model, and Fig. 6 gives an example of a UML model that uses those notations.

Figure 5: Basic UML notations (Craig Larman 2005)

related (called schema matching), and then map one schema to another (schema mapping) by creating transformations between schemas.

Therefore, in simple terms, performing the integration task means applying transformations between two schemas; in other words, the integration task is ultimately fulfilled by creating transformations between schemas that establish the correspondences.

2.1.2 Schema Integration Methodology

Causes for Schema Diversity

The basic problems to be dealt with during integration come from the structural diversity (differences in schema models) and the semantic diversity of the schemas to be merged. The causes of schema diversity are different perspectives, equivalence among constructs of the model, and incompatible design specifications [2].

Different Perspectives

In the design process, different user groups or designers adopt their own viewpoints when modeling the same objects in the application domain. For instance, different names may be attached to the same concept in two schemas.

Equivalence among Constructs of the Model

Typically, in conceptual models, several combinations of constructs can model the same application domain equivalently. As a consequence, “richer” models give rise to a larger variety of ways to model the same situation.


Incompatible Design Specifications

Erroneous choices regarding names, types, integrity constraints, etc. may result in erroneous inputs to the schema integration process. A good schema integration methodology must lead to the detection of such errors.

Common Concepts

Owing to the causes of schema diversity described above, the same concept of the application domain may well be represented by different representations (R1 and R2) in different schemas, and several types of semantic relationship can exist between such representations. They may be identical, equivalent, compatible, or incompatible:

Identical (1): R1 and R2 are exactly the same. This happens when the same modeling constructs are used, the same perceptions are applied, and no incoherence enters into the specification.

Equivalent (2): R1 and R2 are not exactly the same because different but equivalent modeling constructs have been applied. The perceptions are still the same and coherent.

Compatible (3): R1 and R2 are neither identical nor equivalent. However, the modeling constructs, designer perceptions, and integrity constraints are not contradictory.

Incompatible (4): R1 and R2 are contradictory because of the incoherence of the specification.

Situations (2), (3), and (4) above can be interpreted as conflicts. Conflicts and their resolution are central to the problems of integration. In general, a conflict between two representations R1 and R2 of the same concept is any situation in which R1 and R2 are not identical.

Steps and Goals of the Integration Process

According to Batini and Lenzerini (1986), in the article [3], the methodologies that accomplish the task of schema integration can be considered a mixture of the following four activities:

i. Pre-integration:

This phase analyzes the schemas to decide upon an integration policy. The policy governs the choice of schemas to be integrated, the order of integration, and a possible assignment of preferences to entire schemas or portions of schemas. Giving preference to financial applications over production applications is one example of an integration policy.

Global strategies for integration, namely the number of schemas to be integrated at one time, are also considered in this phase. Collecting additional information relevant to integration, such as constraints among views, is also considered part of this phase.

ii. Comparison of the schemas:

This phase analyzes and compares the schemas to determine the correspondences among concepts and to detect possible conflicts. It singles out not only the set of common concepts but also the set of different concepts in different schemas. Two kinds of conflicts are commonly considered: name conflicts and structural conflicts. Regarding the first type, schemas in data models incorporate names for the various objects represented. People from different application areas of the same organization refer to the same data using their own terminology and names, which results in a proliferation of names as well as possible inconsistency among names in the component schemas. Name conflicts fall into two types: homonyms, where the same name is used for two different concepts, which can give rise to inconsistency; and synonyms, where the same concept is described by two or more names. Unless different names improve the understanding of different users, they are not justified.
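As a rough illustration of how a tool might flag candidate name conflicts for the designer to confirm, here is a small Java sketch. It assumes each schema is reduced to a map from entity name to its attribute names, treats "same name but no shared attribute" as a candidate homonym, and relies on a designer-supplied synonym table for synonyms; both heuristics and all names are assumptions for illustration, not part of the cited methodology or of this system.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Heuristic sketch (an assumption, not the cited methodology): each schema maps
// an entity name to the set of its attribute names.
class NameConflicts {

    // Candidate homonyms: the same name appears in both schemas, but the two
    // entities share no attribute, suggesting two different concepts.
    static Set<String> candidateHomonyms(Map<String, Set<String>> s1, Map<String, Set<String>> s2) {
        Set<String> result = new TreeSet<>();
        for (String name : s1.keySet()) {
            if (s2.containsKey(name) && Collections.disjoint(s1.get(name), s2.get(name))) {
                result.add(name);
            }
        }
        return result;
    }

    // Candidate synonyms: different names that the designer has declared to
    // denote the same concept, each present in its own schema.
    static List<String> candidateSynonyms(Map<String, Set<String>> s1, Map<String, Set<String>> s2,
                                          Map<String, String> declaredSynonyms) {
        List<String> result = new ArrayList<>();
        for (String name : new TreeSet<>(s1.keySet())) {
            String other = declaredSynonyms.get(name);
            if (other != null && !other.equals(name) && s2.containsKey(other)) {
                result.add(name + " ~ " + other);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> s1 = Map.of(
                "Student", Set.of("SS#", "name"),
                "Class", Set.of("classCode", "subject"));
        Map<String, Set<String>> s2 = Map.of(
                "Pupil", Set.of("Sno", "name"),
                "Class", Set.of("room", "capacity"));
        System.out.println(candidateHomonyms(s1, s2));                             // [Class]
        System.out.println(candidateSynonyms(s1, s2, Map.of("Student", "Pupil"))); // [Student ~ Pupil]
    }
}
```

Both functions only propose candidates. Deciding whether two names really denote the same concept remains the designer's call, which matches the manual confirmation step described for the proposed system.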

The second conflict type is structural conflicts: conflicts that arise from a different choice of modeling constructs or integrity constraints. There are four types of structural conflicts, classified independently of the various terminologies and of the specific characteristics of the different data models: type conflicts, dependency conflicts, key conflicts and behavioral conflicts.

Type conflicts arise when the same concept is represented by different modeling constructs in different schemas. For instance, an object may be represented as an entity in one schema and as an attribute in another.

Dependency conflicts arise when a group of concepts are related among themselves with different dependencies in different schemas. For example, a relationship between two entities can be 1:1 in one schema but m:n in another.

Key conflicts arise when different keys are assigned to the same concept in different schemas. For example, SS# and Sno may be the keys of the common entity Student in two component schemas.

Behavioral conflicts arise when different insertion/deletion policies are associated with the same class of objects in distinct schemas. For example, in one schema the existence of an entity may depend on another entity, so that if the latter is deleted, the former is deleted too; in another schema, however, that entity exists without depending on any other entity.

iii. Conforming schemas:

The goal of this step is to conform, or align, the schemas to make them compatible; in other words, to construct a single global schema by changing some user views. Achieving this goal amounts to resolving the conflicts, which in turn requires schema transformations to be performed. Sometimes schema transformations are performed during merging and restructuring.

iv. Merging and restructuring:

During this step, different kinds of operations, such as transforming an attribute into an entity, are performed on either the component schemas or the temporary integrated schema. The activities are, first, merging the component schemas by simply superimposing their common concepts, and then performing restructuring operations on the integrated schema obtained by this merging.

2.1.3 Approaches

In introducing data integration approaches, Lenzerini (2002) stated in the article [4] that one of the main tasks in the design of a data integration system¹ is to establish the mapping between the sources and the global schema. In this article, Lenzerini introduced two approaches, Global-as-View and Local-as-View (GAV and LAV, respectively), which are used to specify the mapping between the global schema and the sources. Therefore, here we present these data integration approaches as mapping techniques for schema integration.

¹ The aim of a data integration system is to combine the data residing at different sources, and

Currently, GAV, LAV and Both-as-View (BAV) are known as the traditional data integration approaches. The classic GAV and LAV approaches were presented by Lenzerini (2002), while BAV was introduced by McBrien and Poulovassilis (2002) as a technique of bi-directional schema transformation in data integration.

Below is the formalization of a schema integration system presented by Lenzerini (2002), borrowed from [4], for modeling GAV and LAV:

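A schema integration system $\mathcal{I}$ is a triple $\langle \mathcal{G}, \mathcal{M}, \mathcal{S} \rangle$ where:

• $\mathcal{G}$ is the global schema (structure and constraints),
• $\mathcal{S}$ is the source schema (structure and constraints), and
• $\mathcal{M}$ is the mapping between $\mathcal{G}$ and $\mathcal{S}$.

To specify the semantics of $\mathcal{I}$, we have to start with a source database $D$ (source data coherent with $\mathcal{S}$). We call global database for $\mathcal{I}$ any database for $\mathcal{G}$. A global database $B$ for $\mathcal{I}$ is said to be legal with respect to $D$ if:

• $B$ is legal with respect to $\mathcal{G}$, i.e., $B$ satisfies the integrity constraints of $\mathcal{G}$;
• $B$ satisfies the mapping $\mathcal{M}$ with respect to $D$.

We can also specify the semantics of queries posed to a data integration system. If $q$ is a query of arity $n$ and $DB$ is a database, we denote with $q^{DB}$ the set of tuples (of arity $n$) in $DB$ that satisfy $q$. Given a source database $D$ for $\mathcal{I}$, the answer $q^{\mathcal{I},D}$ to a query $q$ in $\mathcal{I}$ with respect to $D$ is the set of tuples $t$ such that $t \in q^{B}$ for every global database $B$ that is legal for $\mathcal{I}$ with respect to $D$. The set $q^{\mathcal{I},D}$ is called the set of certain answers to $q$ in $\mathcal{I}$ with respect to $D$.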
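The definition of certain answers is an intersection: a tuple qualifies only if the query returns it in every legal global database. The Java sketch below shows just that step. It assumes, purely for illustration, that the legal databases are given as a small finite list; in general there are infinitely many, and real systems compute certain answers by query rewriting rather than enumeration. The class name and sample data are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrates the definition only: t is a certain answer to q iff t is in q(B)
// for every legal global database B. The finite list of databases is an assumption.
class CertainAnswers {

    static <D, T> Set<T> certainAnswers(List<D> legalDatabases, Function<D, Set<T>> q) {
        if (legalDatabases.isEmpty()) {
            // With no legal database every tuple is vacuously certain; not modelled here.
            throw new IllegalArgumentException("no legal global database");
        }
        Set<T> result = new HashSet<>(q.apply(legalDatabases.get(0)));
        for (D db : legalDatabases.subList(1, legalDatabases.size())) {
            result.retainAll(q.apply(db)); // keep only answers returned in every database
        }
        return result;
    }

    public static void main(String[] args) {
        // Two hypothetical legal global databases of (researchID, topic) tuples.
        Set<List<String>> b1 = Set.of(List.of("r1", "Databases"), List.of("r2", "AI"));
        Set<List<String>> b2 = Set.of(List.of("r1", "Databases"), List.of("r3", "ML"));
        // q projects the research ID.
        Function<Set<List<String>>, Set<String>> q =
                db -> db.stream().map(t -> t.get(0)).collect(Collectors.toSet());
        System.out.println(certainAnswers(List.of(b1, b2), q)); // [r1]
    }
}
```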

Global-as-view

The GAV approach defines the global schema over the schemas of the data sources: each element of the global schema is defined as a view (i.e., a query over the source schemas). The drawback of this approach is that the global schema is strongly coupled with the underlying source schemas and their changes, so it does not readily support the evolution of the local schemas. A change in the information sources, or the addition of new information, requires revising the global schema and the mappings between the global schema and the source schemas. Thus, GAV does not scale well to larger applications, and it is also a poor fit for the web context, where sources are autonomous and volatile.

When modeling with GAV, the mapping $\mathcal{M}$ associates to each element $g$ in $\mathcal{G}$ a query $q_{\mathcal{S}}$ over $\mathcal{S}$.

A GAV mapping is a set of assertions, one for each element $g$ of $\mathcal{G}$, of the form $g \rightsquigarrow q_{\mathcal{S}}$. The idea is that each element $g$ of the global schema should be characterized in terms of a view $q_{\mathcal{S}}$ over the sources. The mapping explicitly tells the system how to retrieve the data related to each element of the global schema.
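Because a GAV mapping says exactly how to compute each global element from the sources, answering a query on the global schema reduces to evaluating the corresponding views on the source data (view unfolding). The Java sketch below illustrates this with the `research`/`publication` example used later in this section; the `View` interface, the `MAPPING` table, the `titlesOf` query and the sample values are hypothetical and not part of AutoMed.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// GAV sketch: each global relation is defined as a view over the source database,
// so a query on the global schema is answered by evaluating those views.
class GavMapping {
    // A view maps a source database (relation name -> set of tuples) to a set of tuples.
    interface View extends Function<Map<String, Set<List<Object>>>, Set<List<Object>>> {}

    // The mapping M: one view per element g of the global schema.
    static final Map<String, View> MAPPING = Map.of(
            "research", src -> src.get("Research"),
            "publication", src -> src.get("publish"));

    static Set<List<Object>> global(String relation, Map<String, Set<List<Object>>> src) {
        return MAPPING.get(relation).apply(src);
    }

    // A query over the global schema: titles of the publications of one research ID.
    static Set<Object> titlesOf(Object researchId, Map<String, Set<List<Object>>> src) {
        return global("publication", src).stream()
                .filter(t -> t.get(1).equals(researchId)) // position 1 = ID
                .map(t -> t.get(4))                       // position 4 = title
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Made-up sample data for the source relations Research and publish.
        Map<String, Set<List<Object>>> source = Map.of(
                "Research", Set.of(List.<Object>of(1, "Schema integration", "Databases", 2012)),
                "publish", Set.of(List.<Object>of(10, 1, "Example Press", "2012", "A web-based system")));
        System.out.println(titlesOf(1, source)); // [A web-based system]
    }
}
```

The coupling drawback described above is visible here: renaming or restructuring `publish` in the sources forces `MAPPING` to be rewritten.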

Local-as-view

The LAV approach takes the inverse point of view: it defines the source schemas over the global schema, and so it scales better. Consequently, changes to the underlying sources do not imply changes to the global schema; for a new (or changed) source schema, it is only necessary to give a source description that describes the source relations as views of the global schema. However, LAV has problems if the global schema needs to change, since all the rules defining the local schemas as views of the global schema must then be reviewed.

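When modeling with LAV, the mapping $\mathcal{M}$ associates to each element $s$ of the source schema $\mathcal{S}$ a query $q_{\mathcal{G}}$ over $\mathcal{G}$.

A LAV mapping is a set of assertions, one for each element $s$ of $\mathcal{S}$, of the form $s \rightsquigarrow q_{\mathcal{G}}$.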

transformation sentences, LAV and GAV view definitions can be fully derived, and BAV transformation sentences may be partially derived from LAV and GAV view definitions.

This approach overcomes the disadvantages of both GAV and LAV because the global schema and the source schemas are defined in terms of each other, in both directions. Thus, any change, modification or addition in the source or global schemas can be propagated to the other.


$S_G$: research(ID, Topic, Area, Year)

publication(pubNo, ID, pubName, pubTime, title)

$S_G$ is designed for references to research publications (i.e. the publications of research projects). The information in each publication record is the publication's title, the publisher's name ('pubName'), the publisher's officialLink, the publication time ('pubTime') and the ID of the published research.

Consider the source schema as the basis over which we define mappings for the objects of $S_G$ (the global schema).

In terms of GAV, the objects of $S_G$ can be defined as follows:

$$G_1:\ \mathit{research}(ID, Topic, Area, Year) = \{\, x, y, z, w \mid \langle x, y, z, w \rangle \in \mathit{Research} \,\}$$

$$G_2:\ \mathit{publication}(pubNo, ID, pubName, pubTime, title) = \{\, x, y, z, u, v \mid \langle x, y, z, u, v \rangle \in \mathit{publish} \,\}$$

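In terms of LAV, the objects of the source schema are defined as views over $S_G$ as follows:

$$L_1:\ \mathit{Research}(ID, Topic, Area, Year) = \{\, x, y, z, w \mid \langle x, y, z, w \rangle \in \mathit{research} \,\}$$

$$L_2:\ \mathit{Publisher}(pubName, officialLink)$$

$$L_3:\ \mathit{publish}(pubNo, ID, pubName, pubTime, title) = \{\, x, y, z, u, v \mid \langle x, y, z, u, v \rangle \in \mathit{publication} \,\}$$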

In terms of BAV:

To illustrate, we introduce some primitive transformations: addRel (add a relation), addAtt (add an attribute, i.e. a field), delRel (delete a relation) and delAtt (delete an attribute). Each transformation requires a query parameter. This query indicates how to derive the extent of the new or deleted construct from the rest of the schema's constructs. Each schema that results from a transformation step is called an intermediate schema.
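The four primitives and the role of their query parameter can be sketched in Python as follows. The encoding is an assumption made for illustration and is not AutoMed's API:

- A schema is a dictionary from a scheme, such as ("research", "Topic"), to its extent.
- A query is a Python function over that dictionary.
- The notnull parameter is omitted.

Applying a step yields the next intermediate schema. For a delete step, the sketch also checks that the query really rebuilds the removed extent from the constructs that remain.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Scheme = Tuple[str, str]            # e.g. ("research", "Topic")
Schema = Dict[Scheme, set]          # scheme -> extent
Query = Callable[[Schema], set]     # derives an extent from other constructs


@dataclass(frozen=True)
class Step:
    op: str        # "addRel", "addAtt", "delRel" or "delAtt"
    scheme: Scheme
    query: Query


def apply_step(schema: Schema, step: Step) -> Schema:
    """Return the intermediate schema produced by a single primitive step."""
    result = dict(schema)
    if step.op in ("addRel", "addAtt"):
        # The new construct's extent is derived from the existing constructs.
        result[step.scheme] = step.query(schema)
    elif step.op in ("delRel", "delAtt"):
        removed = result.pop(step.scheme)
        # The query must rebuild the removed extent from what remains.
        if step.query(result) != removed:
            raise ValueError(f"{step.op} query does not rebuild {step.scheme}")
    else:
        raise ValueError(f"unknown primitive {step.op}")
    return result


def run_pathway(schema: Schema, pathway: List[Step]) -> List[Schema]:
    """Apply the steps in order and return every intermediate schema."""
    schemas = [schema]
    for step in pathway:
        schemas.append(apply_step(schemas[-1], step))
    return schemas


# Usage: rename relation Research to research (key scheme only).
start = {("Research", "ID"): {1, 2}}
pathway = [
    Step("addRel", ("research", "ID"), lambda s: set(s[("Research", "ID")])),
    Step("delRel", ("Research", "ID"), lambda s: set(s[("research", "ID")])),
]
print(run_pathway(start, pathway)[-1])  # {('research', 'ID'): {1, 2}}
```

Keeping every intermediate schema mirrors the text's view of a pathway as a sequence of intermediate schemas.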

The transformation steps below follow the BAV technique.

Transformation steps for defining the objects of the research relation in $S_g$:

1 addRel(«research, ID», { x | x ∈ «Research, ID» })

2 addAtt(«research, Topic», notnull, { x, y | ⟨x, y⟩ ∈ «Research, Topic» })

3 addAtt(«research, Area», notnull, { x, y | ⟨x, y⟩ ∈ «Research, Area» })

4 addAtt(«research, Year», notnull, { x, y | ⟨x, y⟩ ∈ «Research, Year» })


Picture 1: Adding relation ‘research’ into Global schema

The screenshot above shows the intermediate schema after these transformation steps. The tool lists five steps because step 1 is composed of two: addRel, and addAtt for the key attribute ID.

Moreover, the queries that define the extents of the objects in this screenshot are expressed in the functional Intermediate Query Language (IQL). IQL will be introduced in the next chapter.
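Steps 1 to 4 can be replayed on sample data with the same comprehensions. This Python sketch assumes the key/attribute encoding that the queries suggest:

- The extent of «Research, ID» is a set of keys.
- The extent of each «Research, A» is a set of (key, value) pairs.

The data are hypothetical. Reading notnull as "every key has a value" is also an assumption.

```python
# Hypothetical extents of Research in S_s, one scheme per key or attribute.
schema = {
    ("Research", "ID"): {1, 2},
    ("Research", "Topic"): {(1, "Schema matching"), (2, "Web services")},
    ("Research", "Area"): {(1, "Databases"), (2, "Distributed systems")},
    ("Research", "Year"): {(1, 2009), (2, 2010)},
}

# Steps 1 to 4: (primitive, new scheme, query over the current schema).
steps = [
    ("addRel", ("research", "ID"), lambda s: {x for x in s[("Research", "ID")]}),
    ("addAtt", ("research", "Topic"), lambda s: {(x, y) for (x, y) in s[("Research", "Topic")]}),
    ("addAtt", ("research", "Area"), lambda s: {(x, y) for (x, y) in s[("Research", "Area")]}),
    ("addAtt", ("research", "Year"), lambda s: {(x, y) for (x, y) in s[("Research", "Year")]}),
]

for op, scheme, query in steps:
    schema[scheme] = query(schema)  # each step yields a new intermediate schema
    if op == "addAtt":
        # Assumed notnull check: every key of research has a value.
        keys_with_value = {x for (x, _y) in schema[scheme]}
        if keys_with_value != schema[("research", "ID")]:
            raise ValueError(f"notnull violated for {scheme}")

print(sorted(schema[("research", "Year")]))  # [(1, 2009), (2, 2010)]
```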

Transformation steps for defining the objects of the publication relation in $S_g$:

5 addRel(«publication, pubNo», { x | x ∈ «publish, pubNo» })

6 addAtt(«publication, ID», notnull, { x, y | ⟨x, y⟩ ∈ «publish, ID» })

7 addAtt(«publication, pubName», notnull, { x, y | ⟨x, y⟩ ∈ «publish, pubName» })

8 addAtt(«publication, pubTime», notnull, { x, y | ⟨x, y⟩ ∈ «publish, pubTime» })

9 addAtt(«publication, title», notnull, { x, y | ⟨x, y⟩ ∈ «publish, title» })
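Steps 5 to 9 can be replayed in the same way. In this Python sketch, the extent of «publish, pubNo» is assumed to be a set of keys. Every other scheme is assumed to be a set of (pubNo, value) pairs. The data are hypothetical. The last lines join the attribute extents on pubNo to show the publication rows that $S_g$ now holds.

```python
# Hypothetical extents of publish in S_s.
schema = {
    ("publish", "pubNo"): {100, 101},
    ("publish", "ID"): {(100, 1), (101, 2)},
    ("publish", "pubName"): {(100, "Press A"), (101, "Press B")},
    ("publish", "pubTime"): {(100, 2010), (101, 2011)},
    ("publish", "title"): {(100, "Matching schemas"), (101, "Composing services")},
}
ATTRS = ["ID", "pubName", "pubTime", "title"]


def copy_of(scheme):
    """Query {x, y | <x, y> in scheme}: take the pairs over unchanged.

    Binding scheme as a parameter avoids Python's late-binding pitfall
    when the queries are created in a loop.
    """
    return lambda s: {(x, y) for (x, y) in s[scheme]}


# Step 5 adds the key; steps 6 to 9 add one attribute each.
steps = [(("publication", "pubNo"), lambda s: {x for x in s[("publish", "pubNo")]})]
steps += [(("publication", a), copy_of(("publish", a))) for a in ATTRS]
for scheme, query in steps:
    schema[scheme] = query(schema)

# Join the attribute extents on pubNo to view publication as whole rows.
columns = {a: dict(schema[("publication", a)]) for a in ATTRS}
rows = {(k, *(columns[a][k] for a in ATTRS)) for k in schema[("publication", "pubNo")]}
print(sorted(rows)[0])  # (100, 1, 'Press A', 2010, 'Matching schemas')
```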

The intermediate schema resulting after step 9 is depicted below:


Picture 2: Transformation steps to define Global schema over Source Schema

So far, the intermediate schema contains the following relations: Publisher, publish, Research, research (the result of transformation steps 1 to 4) and publication (the result of transformation steps 5 to 9). The result is depicted in the picture below:


Transformation steps to remove Research relation

10 delAtt(«Research, Year», notnull, { x, y | ⟨x, y⟩ ∈ «research, Year» ∧ x ∈ «Research, ID» })

11 delAtt(«Research, Area», notnull, { x, y | ⟨x, y⟩ ∈ «research, Area» ∧ x ∈ «Research, ID» })

12 delAtt(«Research, Topic», notnull, { x, y | ⟨x, y⟩ ∈ «research, Topic» ∧ x ∈ «Research, ID» })

13 delRel(«Research, ID», { x | x ∈ «research, ID» })

These steps are illustrated in the image below. The lines connecting table Research to its fields have been intentionally cleared to indicate that these transformation operations are deletions.


Picture 5: Intermediate schema after removing relation ‘Research’ from the Source schema
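Deleting a construct is safe only because each delete query can rebuild what it removes. This Python sketch replays steps 10 to 13 on a hypothetical intermediate schema. Each scheme's extent is stored as a set of keys or (key, value) pairs, which is an assumed encoding. After each deletion, the sketch checks that the query regenerates the removed extent from the remaining constructs. The attributes must be deleted before the key, because their queries still refer to «Research, ID».

```python
# Hypothetical intermediate schema after step 9 (Research and research only).
values = {
    "ID": {1, 2},
    "Topic": {(1, "Schema matching"), (2, "Web services")},
    "Area": {(1, "Databases"), (2, "Distributed systems")},
    "Year": {(1, 2009), (2, 2010)},
}
schema = {(rel, a): set(ext) for rel in ("Research", "research") for a, ext in values.items()}


def rebuild_attr(attr):
    """{x, y | <x, y> in «research, attr» and x in «Research, ID»}"""
    return lambda s: {(x, y) for (x, y) in s[("research", attr)] if x in s[("Research", "ID")]}


# Steps 10 to 13: (primitive, removed scheme, query that rebuilds it).
steps = [
    ("delAtt", ("Research", "Year"), rebuild_attr("Year")),
    ("delAtt", ("Research", "Area"), rebuild_attr("Area")),
    ("delAtt", ("Research", "Topic"), rebuild_attr("Topic")),
    ("delRel", ("Research", "ID"), lambda s: {x for x in s[("research", "ID")]}),
]

for op, scheme, query in steps:
    removed = schema.pop(scheme)
    # The query must regenerate the removed extent from what is left.
    if query(schema) != removed:
        raise ValueError(f"{op} of {scheme} would lose information")

print(sorted(schema))
# [('research', 'Area'), ('research', 'ID'), ('research', 'Topic'), ('research', 'Year')]
```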

Transformation steps to remove Publisher relation

14 conAtt(«Publisher, officialLink», null, void) (conAtt, like delAtt, removes an attribute. However, no query indicates how to derive the extent of this attribute. In other words, the attribute's definition does not depend on any remaining object in the schema.)

15 delRel(«Publisher, pubName», { y | ⟨x, y⟩ ∈ «publication, pubName» })
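Steps 14 and 15 contrast the two ways of removing a construct. In this Python sketch, the data are hypothetical and the officialLink values are placeholders. The contract step carries no query, so the removed officialLink extent cannot be rebuilt. The delete step's query, written here as the projection { y | ⟨x, y⟩ ∈ «publication, pubName» }, regenerates the set of publisher names from publication.

```python
from typing import Callable, Dict, Optional, Tuple

Scheme = Tuple[str, str]
Schema = Dict[Scheme, set]

# Hypothetical fragment of the intermediate schema before step 14.
schema: Schema = {
    ("Publisher", "pubName"): {"Press A", "Press B"},
    ("Publisher", "officialLink"): {("Press A", "link-a"), ("Press B", "link-b")},
    ("publication", "pubNo"): {100, 101},
    ("publication", "pubName"): {(100, "Press A"), (101, "Press B")},
}


def remove(schema: Schema, scheme: Scheme, query: Optional[Callable[[Schema], set]]) -> bool:
    """Remove a construct; return True if its extent is still derivable.

    query=None models a contract step (the "void" query): nothing left in the
    schema says how to rebuild the extent, so it is simply dropped.
    """
    removed = schema.pop(scheme)
    if query is None:
        return False
    return query(schema) == removed


# Step 14: conAtt(«Publisher, officialLink», null, void)
link_recoverable = remove(schema, ("Publisher", "officialLink"), None)
# Step 15: delRel on «Publisher, pubName», rebuilt by projecting pubName.
name_recoverable = remove(
    schema, ("Publisher", "pubName"),
    lambda s: {y for (_x, y) in s[("publication", "pubName")]},
)
print(link_recoverable, name_recoverable)  # False True
```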

Transformation steps to remove publish relation

16 delAtt(«publish, title», notnull, { x, y | ⟨x, y⟩ ∈ «publication, title» })

Picture 6: Global schema after finishing transformations

Returning now to the pathway as a whole: transformation steps 1 to 9 incrementally define the objects of $S_g$ from the objects of $S_s$. This can be regarded as the GAV aspect.

Steps 10 to 20 then incrementally remove the objects of $S_s$ from this intermediate schema, finally leaving only the objects of $S_g$, as desired. Using the queries in these steps, we can restore $S_s$ by reversing the transformations:

addRel(«Research, ID», { x | x ∈ «research, ID» })

addAtt(«Research, Topic», notnull, { x, y | ⟨x, y⟩ ∈ «research, Topic» })

addAtt(«Research, Area», notnull, { x, y | ⟨x, y⟩ ∈ «research, Area» })

addAtt(«Research, Year», notnull, { x, y | ⟨x, y⟩ ∈ «research, Year» })


extAtt(«Publisher, officialLink», null, void) (extAtt, like addAtt, adds an attribute, except that the attribute is not defined over any of the remaining objects in the schema)

addRel(«publish, pubNo», { x | x ∈ «publication, pubNo» })

addAtt(«publish, ID», notnull, { x, y | ⟨x, y⟩ ∈ «publication, ID» })

addAtt(«publish, pubName», notnull, { x, y | ⟨x, y⟩ ∈ «publication, pubName» })

addAtt(«publish, pubTime», notnull, { x, y | ⟨x, y⟩ ∈ «publication, pubTime» })

addAtt(«publish, title», notnull, { x, y | ⟨x, y⟩ ∈ «publication, title» })

This reverse pathway can be regarded as the LAV aspect.
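The reversal described above follows a mechanical rule. Walk the pathway backwards and swap each primitive for its inverse (add and delete, extend and contract), keeping the same query. The Python sketch below implements that rule. Queries are kept as plain strings, and None stands for void. The Step type and the INVERSE table are illustrative assumptions, not AutoMed's API. The example pathway holds steps 10 to 15, with the queries written as text.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    op: str               # name of the primitive transformation
    scheme: str           # the construct it adds or removes
    query: Optional[str]  # query as text; None stands for void


# Each primitive paired with the one that undoes it.
INVERSE = {
    "addRel": "delRel", "delRel": "addRel",
    "addAtt": "delAtt", "delAtt": "addAtt",
    "extAtt": "conAtt", "conAtt": "extAtt",
}


def reverse(pathway: List[Step]) -> List[Step]:
    """Reverse a BAV pathway: last step first, each primitive inverted, same query."""
    return [Step(INVERSE[s.op], s.scheme, s.query) for s in reversed(pathway)]


# Steps 10 to 15 of the example, which remove Research and Publisher.
forward = [
    Step("delAtt", "«Research, Year»", "{x, y | <x, y> in «research, Year» and x in «Research, ID»}"),
    Step("delAtt", "«Research, Area»", "{x, y | <x, y> in «research, Area» and x in «Research, ID»}"),
    Step("delAtt", "«Research, Topic»", "{x, y | <x, y> in «research, Topic» and x in «Research, ID»}"),
    Step("delRel", "«Research, ID»", "{x | x in «research, ID»}"),
    Step("conAtt", "«Publisher, officialLink»", None),
    Step("delRel", "«Publisher, pubName»", "{y | <x, y> in «publication, pubName»}"),
]

for step in reverse(forward):
    print(step.op, step.scheme)
```

Reversing twice returns the original pathway. This is why a single BAV pathway can serve both the GAV and the LAV direction.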


2.2 AutoMed Schema Integration System

Several systems for schema integration currently exist, but the AutoMed system offers a distinctive solution to the problem, with two features that set it apart from the others. Firstly, it uses the Hypergraph Data Model (HDM) as a common data model. The HDM comprises three primitive constructs (nodes, edges and constraints), and higher-level data models such as the Entity-relationship, relational and UML models are expressed in it. This avoids semantic mismatches between modeling constructs. Secondly, AutoMed is based on the Both-as-view mapping technique. This technique can provide a complete mapping between schemas in both directions and, as analyzed above, overcomes the limitations of the GAV and LAV techniques. For these reasons, I chose AutoMed as the foundation for my web-based system. This section describes these features, the strengths of AutoMed in solving the schema integration problem that motivated us to develop a new system based on it, and the limitations of AutoMed that I would like to resolve.
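To make the HDM idea concrete, here is a simplified Python sketch of how a relation such as research could be written with the three HDM constructs. The encoding is an assumption for illustration:

- one node for the relation;
- one node per attribute;
- an unlabeled edge linking the relation to each attribute;
- a constraint for each notnull attribute.

AutoMed's own relational-to-HDM mapping is more detailed, and its naming may differ.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple


@dataclass
class HDMSchema:
    nodes: Set[str] = field(default_factory=set)
    edges: Set[Tuple[str, str, str]] = field(default_factory=set)  # (label, from, to)
    constraints: List[Tuple[str, str, str]] = field(default_factory=list)


def relation_to_hdm(name: str, attributes: Iterable[str], notnull: Iterable[str] = ()) -> HDMSchema:
    """Encode a relation as HDM nodes, edges and constraints (simplified, assumed)."""
    hdm = HDMSchema()
    hdm.nodes.add(name)                        # the relation itself; its extent is the key values
    required = set(notnull)
    for attr in attributes:
        attr_node = f"{name}:{attr}"
        hdm.nodes.add(attr_node)               # one node per attribute
        hdm.edges.add(("_", name, attr_node))  # unlabeled edge: relation -> attribute
        if attr in required:
            hdm.constraints.append(("mandatory", name, attr_node))
    return hdm


research = relation_to_hdm("research", ["Topic", "Area", "Year"], notnull=["Topic", "Area", "Year"])
print(len(research.nodes), len(research.edges), len(research.constraints))  # 4 3 3
```

Entity-relationship and UML constructs can be encoded in the same vocabulary of nodes, edges and constraints. Schemas from different modeling languages then become directly comparable, which is how a common data model avoids semantic mismatches.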

2.2.1 What is AutoMed

AutoMed was originally an EPSRC (Engineering and Physical Sciences Research Council) funded project, but development work continues at Birkbeck and Imperial Colleges under a number of related projects. It is a framework and software package comprising graphical tools (a Swing GUI) and a programmer's API.

AutoMed is capable of handling a wide range of data sources. The current implementation supports data held in relational DBMSs, XML files and structured flat files, and development work on unstructured text files is underway.

The official link for the AutoMed project is http://www.doc.ic.ac.uk/automed/


Title	A web-based system for schema integration
Author	Tran Thi Linh
Supervisor	Le Minh Duc, M.Sc.
University	Hanoi University
Major	Computer Science
Type	Graduation thesis
Year	2011
City	Hanoi
Pages	79