

Fig. 1. Reference tiles and relations

In Fig. 1, regions a and b are in REG (also in REG*) and region a'' is in REG*. Notice that region a'' is disconnected and has a hole.

Let us now consider two arbitrary regions a and b in REG*. Let region a be related to region b through a cardinal direction relation (e.g., a is north of b). Region b will be called the reference region (i.e., the region to which the relation refers) while region a will be called the primary region (i.e., the region for which the relation is introduced). The axes forming the minimum bounding box of the reference region b divide the space into 9 areas, which we call tiles (Fig. 1a).

The peripheral tiles correspond to the eight cardinal direction relations south, southwest, west, northwest, north, northeast, east and southeast. These tiles are denoted by S(b), SW(b), W(b), NW(b), N(b), NE(b), E(b) and SE(b) respectively. The central area corresponds to the region's minimum bounding box and is denoted by B(b). By definition, each one of these tiles includes the parts of the axes forming it. The union of all 9 tiles is the whole plane.

If a primary region a is included (in the set-theoretic sense) in tile S(b) of some reference region b (Fig. 1b), then we say that a is south of b and we write a S b. Similarly, we can define southwest (SW), west (W), northwest (NW), north (N), northeast (NE), east (E), southeast (SE) and bounding box (B) relations. If a primary region a lies partly in the area NE(b) and partly in the area E(b) of some reference region b (Fig. 1c), then we say that a is partly northeast and partly east of b and we write a NE:E b.

The general definition of a cardinal direction relation in our framework is as follows.

Definition 1. A cardinal direction relation is an expression R1:···:Rk where (a) 1 ≤ k ≤ 9, (b) R1, …, Rk ∈ {B, S, SW, W, NW, N, NE, E, SE}, and (c) Ri ≠ Rj for every i, j such that 1 ≤ i < j ≤ k. A cardinal direction relation is called single-tile if k = 1; otherwise it is called multi-tile.

Let a and b be two regions in REG*. Single-tile cardinal direction relations are defined as follows:

a B b iff a ⊆ B(b);  a S b iff a ⊆ S(b);  a SW b iff a ⊆ SW(b);  a W b iff a ⊆ W(b);  a NW b iff a ⊆ NW(b);  a N b iff a ⊆ N(b);  a NE b iff a ⊆ NE(b);  a E b iff a ⊆ E(b);  a SE b iff a ⊆ SE(b).

In general, each multi-tile relation is defined as follows: a R1:···:Rk b iff there exist regions a1, …, ak in REG* such that a = a1 ∪ ··· ∪ ak and a1 R1 b, …, ak Rk b.

In Definition 1, notice that for every i, j such that 1 ≤ i < j ≤ k, regions ai and aj have disjoint interiors but may share points on their boundaries.
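To make the tile machinery concrete, the following sketch classifies a point against the nine tiles induced by a reference bounding box. The function name and the box representation are our own illustration, not part of the paper.

```python
# A minimal sketch of the 9-tile partition induced by a reference region's
# minimum bounding box (names and representation are illustrative).

def tile_of(point, mbb):
    """Return the tile of `mbb` (B, S, SW, ..., SE) in which `point` lies.

    `mbb` is (minx, miny, maxx, maxy), the minimum bounding box of the
    reference region b. By the paper's convention each tile includes the
    parts of the axes forming it, so axis points may belong to several
    tiles; this sketch simply reports one of them.
    """
    (x, y), (minx, miny, maxx, maxy) = point, mbb
    col = 'W' if x < minx else ('E' if x > maxx else '')
    row = 'S' if y < miny else ('N' if y > maxy else '')
    return (row + col) or 'B'   # e.g. 'NW', 'S', or 'B' for the central box

print(tile_of((0, 5), (1, 1, 4, 4)))   # NW: west of and north of the box
print(tile_of((2, 2), (1, 1, 4, 4)))   # B: inside the bounding box
```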

Example 1. S, NE:E and B:S:SW:W:NW:N:E:SE are cardinal direction relations. The first relation is single-tile while the others are multi-tile. In Fig. 1, we have a S b, a' NE:E b and a'' B:S:SW:W:NW:N:E:SE b. For instance, the last relation holds because there exist regions a1, …, a8 in REG* such that a'' = a1 ∪ ··· ∪ a8 and each ai lies in the corresponding single tile of b.

In order to avoid confusion, we will write the single-tile elements of a cardinal direction relation according to the following order: B, S, SW, W, NW, N, NE, E and SE. Thus, we always write B:S:W instead of W:B:S or S:B:W. Moreover, for a relation such as B:S:W we will often refer to B, S and W as its tiles.

The set of cardinal direction relations for regions in REG* is denoted by D. Relations in D are jointly exhaustive and pairwise disjoint, and can be used to represent definite information about cardinal directions, e.g., a N b. Using the relations of D as our basis, we can define the powerset 2^D of D, which contains 2^511 relations. Elements of 2^D are called disjunctive cardinal direction relations and can be used to represent not only definite but also indefinite information about cardinal directions, e.g., a {N, W} b denotes that region a is north or west of region b.
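Under one possible encoding (ours, not the paper's), a basic relation is a non-empty set of tiles written in the canonical order above, and a disjunctive relation is a set of such sets:

```python
# Illustrative encoding: a basic cardinal direction relation is a non-empty
# frozenset of tiles; a disjunctive relation is a set of basic relations.

TILES = ('B', 'S', 'SW', 'W', 'NW', 'N', 'NE', 'E', 'SE')  # canonical order

def relation(*tiles):
    """Build a basic relation, validating its tiles."""
    assert tiles and all(t in TILES for t in tiles)
    return frozenset(tiles)

def to_string(rel):
    """Render a basic relation in the paper's canonical tile order."""
    return ':'.join(t for t in TILES if t in rel)

north_or_west = {relation('N'), relation('W')}   # the disjunction {N, W}
print(to_string(relation('W', 'S', 'B')))        # -> B:S:W
```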

Notice that the inverse of a cardinal direction relation R, denoted by inv(R), is not always a cardinal direction relation but, in general, a disjunctive cardinal direction relation. For instance, if a S b then it is possible that b NE:N:NW a, b NE:N a, b N:NW a or b N a. Specifically, the relative position of two regions a and b is fully characterized by a pair (R1, R2) of cardinal direction relations such that (a) a R1 b, (b) b R2 a, (c) R1 is a disjunct of inv(R2), and (d) R2 is a disjunct of inv(R1). An algorithm for computing the inverse relation is discussed in [21]. Moreover, algorithms that calculate the composition of two cardinal direction relations and the consistency of a set of cardinal direction constraints are discussed in [20,21,22].

Goyal and Egenhofer [5,6] use direction relation matrices to represent cardinal direction relations.


Fig. 2. Using polygons to represent regions

Given a cardinal direction relation R, the cardinal direction relation matrix that corresponds to R is a 3×3 matrix whose cells correspond to the tiles arranged as

    NW N NE
    W  B  E
    SW S SE

and where a cell holds 1 if the corresponding tile is a tile of R and 0 otherwise. For instance, the direction relation matrices that correspond to the relations S, NE:E and B:S:SW:W:NW:N:E:SE of Example 1 are:

    0 0 0      0 0 1      1 1 0
    0 0 0      0 0 1      1 1 1
    0 1 0      0 0 0      1 1 1
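A sketch of this boolean matrix construction in the set encoding used earlier (the layout mirrors the tile arrangement above; the code is our illustration):

```python
# Build a 3x3 direction relation matrix from the tiles of a relation.
# Cell layout mirrors the tile arrangement: NW N NE / W B E / SW S SE.

LAYOUT = [['NW', 'N', 'NE'],
          ['W',  'B', 'E'],
          ['SW', 'S', 'SE']]

def direction_relation_matrix(rel):
    """rel is a set of tile names, e.g. {'NE', 'E'} for NE:E."""
    return [[1 if tile in rel else 0 for tile in row] for row in LAYOUT]

for row in direction_relation_matrix({'NE', 'E'}):
    print(row)   # [0, 0, 1] / [0, 0, 1] / [0, 0, 0]
```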

At a finer level of granularity, the model of [5,6] also offers the option to record how much of a region falls into each tile. Such relations are called cardinal direction relations with percentages and can be represented with cardinal direction matrices with percentages. Let a and b be two regions in REG*. The cardinal direction matrix with percentages can be defined as the 3×3 matrix whose cell for a tile T(b) holds the value area(a ∩ T(b)) / area(a), where area(r) denotes the area of region r.

Consider for example regions a' and b in Fig. 1c; region a' is 50% northeast and 50% east of region b. This relation is captured with the following cardinal direction matrix with percentages:

    0% 0% 50%
    0% 0% 50%
    0% 0%  0%
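Assuming the per-tile areas have already been computed (Section 3.2 shows how), the percentage matrix is a direct normalization. The following is only a sketch with illustrative names:

```python
# Turn per-tile areas of the primary region into a percentage matrix.

LAYOUT = [['NW', 'N', 'NE'], ['W', 'B', 'E'], ['SW', 'S', 'SE']]

def percentage_matrix(tile_areas):
    """tile_areas maps tile name -> area of the primary region in that tile."""
    total = sum(tile_areas.values())
    return [[100.0 * tile_areas.get(t, 0.0) / total for t in row]
            for row in LAYOUT]

# Region a' of Fig. 1c: half of its area in NE(b), half in E(b).
for row in percentage_matrix({'NE': 2.0, 'E': 2.0}):
    print(row)   # [0.0, 0.0, 50.0] / [0.0, 0.0, 50.0] / [0.0, 0.0, 0.0]
```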

In this paper, we will use simple assertions (e.g., a S b, a B:S:SW b) to capture cardinal direction relations [20,21] and direction relation matrices to capture cardinal direction relations with percentages [5,6].


Fig. 3. Polygon clipping

Typically, in Geographical Information Systems and Spatial Databases, the connected regions in REG are represented using single polygons, while the composite regions in REG* are represented using sets of polygons [18,23]. In this paper, the edges of polygons are taken in a clockwise order. For instance, in Fig. 2, region a is represented using a single polygon and region b is represented using a set of two polygons. Notice that using sets of polygons we can even represent regions with holes; Fig. 2 contains such a region as well.

Given the polygon representations of a primary region a and a reference region b, the computation of cardinal direction relations problem lies in the calculation of the cardinal direction relation R such that a R b holds. Similarly, we can define the computation of cardinal direction relations with percentages problem.

Let us consider a primary region a and a reference region b. According to Definition 1, in order to calculate the cardinal direction relation between regions a and b, we have to divide the primary region a into segments such that each segment falls exactly into one tile of b. Furthermore, in order to calculate the cardinal direction relation with percentages, we also have to measure the area of each segment. Segmenting polygons using bounding boxes is a well-studied topic of Computational Geometry called polygon clipping [7,10]. A polygon clipping algorithm can be extended to handle unbounded boxes (such as the tiles of the reference region b) as well. Since polygon clipping algorithms are very efficient (linear in the number of polygon edges), one might be tempted to use them for the calculation of cardinal direction relations and cardinal direction relations with percentages. Let us briefly discuss the disadvantages of such an approach.

Let us consider regions a and b presented in Fig. 3a. Region a is formed by a quadrangle (i.e., a total of 4 edges). To achieve the desired segmentation, polygon clipping algorithms introduce new edges [7,10]. After the clipping algorithms are performed (Fig. 3b), region a is formed by 4 quadrangles (i.e., a total of 16 edges). The worst case that we can think of (illustrated in Fig. 3c) starts with 3 edges (a triangle) and ends with 35 edges (2 triangles, 6 quadrangles and 1 pentagon). These new edges are only used for the calculation of cardinal direction relations and are discarded afterwards. Thus, it would be important


to minimize their number. Moreover, in order to perform the clipping, the edges of the primary region a must be scanned 9 times (one time for every tile of the reference region b). In real GIS applications, we expect that the average number of edges is high. Thus, each scan of the edges of a polygon can be quite time consuming. Finally, polygon clipping algorithms sometimes require complex floating point operations, which are costly.

In Sections 3.1 and 3.2, we consider the problems of calculating cardinal direction relations and cardinal direction relations with percentages respectively. We provide algorithms specifically tailored for this task, which avoid the drawbacks of polygon clipping methods. Our proposal does not segment polygons; instead it only divides some of the polygon edges. In Example 2, we show that such a division is necessary for the correct calculation. Interestingly, the resulting number of introduced edges is significantly smaller than the respective number for polygon clipping methods. Furthermore, the complexity of our algorithms is not only linear in the number of polygon edges, but the computation can be performed with a single pass. Finally, our algorithms use only simple arithmetic operations and comparisons.

3.1 Cardinal Direction Relations

We will start by considering the calculation of cardinal direction relations problem. First, we need the following definition.

Definition 2. Let R1, …, Rm be basic cardinal direction relations. The tile-union of R1, …, Rm, denoted by tile-union(R1, …, Rm), is the relation formed from the union of the tiles of R1, …, Rm.
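In the set encoding used earlier, Definition 2 is a one-liner (our illustration):

```python
# tile-union: the relation formed by the union of the tiles of R1, ..., Rm.
def tile_union(*relations):
    return frozenset().union(*relations)

print(sorted(tile_union({'S'}, {'NE', 'E'})))   # ['E', 'NE', 'S']
```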

Let Sa and Sb be sets of polygons representing a primary region a and a reference region b. To calculate the cardinal direction relation R between the primary region a and the reference region b, we first record the tiles of region b in which the points forming the edges of the polygons of Sa fall. Unfortunately, as the following example shows, this is not enough.

Example 2. Let us consider the region a (formed by a single polygon) and the region b presented in Fig. 4a. The points forming the edges of a clearly lie in the W, NW and NE tiles of b, but the relation between a and b is B:W:NW:N:NE and not W:NW:NE.

The problem of Example 2 arises because there exist edges of the polygon that expand over three tiles of the reference region b. In order to handle such situations, we use the lines forming the minimum bounding box of the reference region b to divide the edges of the polygons representing the primary region a and create new edges such that (a) region a does not change and (b) every new edge lies in exactly one tile. To this end, for every edge AB of region a, we compute the set of intersection points of AB with the lines forming the bounding box of b. We use the intersection


Fig. 4. Illustration of Examples 2 and 3

points to divide AB into a number of consecutive segments. Each segment lies in exactly one tile of b and the union of all segments is AB. Thus, we can safely replace edge AB with these segments without affecting region a. Finally, to compute the cardinal direction between regions a and b, we only have to record the tile of b in which each new segment lies. Choosing a single point from each segment is sufficient for this purpose; we choose the middle of the segment as a representative point. Thus, the tile where the middle point lies gives us the tile of the segment too. The above procedure is captured in Algorithm COMPUTE-CDR (Fig. 5) and is illustrated in the following example.

Example 3. Let us continue with the regions of Example 2 (see also Fig. 4). Algorithm COMPUTE-CDR considers every edge of region a in turn and performs the replacements presented in the following table. It is easy to verify that every new edge lies in exactly one tile of b (Fig. 4b). The middle points of the new edges lie in the B, W, NW, N, NE and E tiles of b. Therefore, Algorithm COMPUTE-CDR returns B:W:NW:N:NE:E, which precisely captures the cardinal direction relation between regions a and b.
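The following sketch mirrors the steps just described: split every edge of the primary region at the four bounding-box lines of the reference region, then record the tile of each new segment's midpoint. It is a reconstruction from the text, not the paper's pseudo-code of Fig. 5, and reuses the illustrative tile_of helper from the earlier sketch.

```python
# A sketch of the idea behind COMPUTE-CDR (reconstructed from the text).

def split_edge(a, b, minx, miny, maxx, maxy):
    """Split edge ab at its intersections with the four bounding-box lines."""
    (x1, y1), (x2, y2) = a, b
    ts = {0.0, 1.0}
    for line, p, q in ((minx, x1, x2), (maxx, x1, x2),
                       (miny, y1, y2), (maxy, y1, y2)):
        if p != q:                      # edge not parallel to this line
            t = (line - p) / (q - p)    # parameter of the intersection point
            if 0.0 < t < 1.0:
                ts.add(t)
    pts = [(x1 + t * (x2 - x1), y1 + t * (y2 - y1)) for t in sorted(ts)]
    return list(zip(pts, pts[1:]))      # consecutive pairs = new segments

def compute_cdr(primary_polygons, mbb):
    """Return the set of tiles of the reference mbb covered by the region."""
    tiles = set()
    for polygon in primary_polygons:            # each polygon: list of vertices
        for a, b in zip(polygon, polygon[1:] + polygon[:1]):
            for p, q in split_edge(a, b, *mbb):
                mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)
                tiles.add(tile_of(mid, mbb))    # tile_of: see earlier sketch
    return tiles
```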

Notice that in Example 3, Algorithm COMPUTE-CDR takes as input a quadrangle (4 edges) and returns 9 edges. This should be contrasted with the polygon clipping method, which would have resulted in 19 edges (2 triangles, 2 quadrangles and 1 pentagon). Similarly, for the shapes in Fig. 3b-c, Algorithm COMPUTE-CDR introduces 8 and 11 edges respectively, while polygon clipping methods introduce 16 and 34 edges respectively.

The following theorem captures the correctness of Algorithm COMPUTE-CDR and measures its complexity.

Theorem 1. Algorithm COMPUTE-CDR is correct, i.e., it returns the cardinal direction relation between two regions a and b in REG* that are represented using two sets of polygons Sa and Sb respectively. The running time of Algorithm


Fig. 5. Algorithm COMPUTE-CDR

COMPUTE-CDR is O(ka + kb), where ka (respectively kb) is the total number of edges of all polygons in Sa (respectively Sb).

Summarizing this section, we can use Algorithm COMPUTE-CDR to compute the cardinal direction relation between two sets of polygons representing two regions a and b in REG*. The following section considers the case of cardinal direction relations with percentages.

3.2 Cardinal Direction Relations with Percentages

In order to compute cardinal direction relations with percentages, we have to calculate the area of the primary region that falls in each tile of the reference region. A naive way to perform this task is to segment the polygons that form the primary region so that every polygon lies in exactly one tile of the reference region. Then, for each tile of the reference region, we find the polygons of the primary region that lie inside it and compute their area. In this section, we will propose an alternative method that is based on Algorithm COMPUTE-CDR. This method simply computes the area between the edges of the polygons that represent the primary region and an appropriate reference line, without segmenting these polygons.

We will now present a method to compute the area between a line and an edge. Then, we will see how we can extend this method to compute the area of a polygon. We will first need the following definition.

Definition 3. Let AB be an edge and l be a line. We say that l does not cross AB if and only if one of the following holds: (a) AB and l do not intersect, (b) AB and l intersect only at point A or B, or (c) AB completely lies on l.


Fig. 6. Lines not crossing AB.  Fig. 7. Area between an edge and a line

For example, in Fig. 6 the depicted lines do not cross edge AB. Let us now calculate the area between an edge and a line.

Definition 4. Let A = (x1, y1) and B = (x2, y2) be two points forming edge AB, and let l be a horizontal line y = yl that does not cross AB. Let A' and B' be the projections of points A and B onto l (see also Fig. 7). We define the expression E(AB, l) as the signed area of the trapezoid formed by A, B and their projections:

    E(AB, l) = (x2 - x1) · ((y1 + y2)/2 - yl)

Expression E(AB, l) can be positive or negative depending on the direction of the vector AB. The absolute value of E(AB, l) equals the area between edge AB and line l, i.e., the area of the polygon ABB'A'. In other words, the following formula holds:

    area(ABB'A') = |E(AB, l)|

Symmetrically, for a vertical line l' given by x = xl that does not cross AB, the area between edge AB and l' equals the absolute value of

    E(AB, l') = (y1 - y2) · ((x1 + x2)/2 - xl)

Expressions E can be used to calculate the area of polygons. Let P be a polygon with vertices A1, A2, …, An (in clockwise order) and let l be a line that does not cross any edge of polygon P. The area of polygon P, denoted by area(P), can be calculated as follows:

    area(P) = |E(A1A2, l) + E(A2A3, l) + ··· + E(AnA1, l)|
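Under the clockwise-orientation convention used in this paper, the expression and the polygon-area formula can be sketched as follows. This is a reconstruction; the paper's exact sign conventions may differ.

```python
# Signed trapezoid area between edge AB and a horizontal reference line y = yl.
def E(a, b, yl):
    (x1, y1), (x2, y2) = a, b
    return (x2 - x1) * ((y1 + y2) / 2.0 - yl)

def polygon_area(vertices, yl=0.0):
    """Area of a clockwise polygon as the sum of edge expressions.

    The choice of reference line does not matter for a closed polygon:
    the yl-dependent terms cancel around the cycle.
    """
    edges = zip(vertices, vertices[1:] + vertices[:1])
    return abs(sum(E(a, b, yl) for a, b in edges))

square = [(0, 0), (0, 1), (1, 1), (1, 0)]        # clockwise unit square
print(polygon_area(square))                      # 1.0, for any reference line
```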

Notice that, in order to calculate the area of a polygon, Computational Geometry algorithms use a similar method that is based on a reference point (instead of a line) [12,16].


Fig. 8. Using expressions E to calculate the area of a polygon

This method is not appropriate for our case because it requires segmenting the primary region using polygon clipping algorithms (see also the discussion at the beginning of Section 3). In the rest of this section, we will present a method that utilizes the expressions E and does not require polygon clipping.

Consider, for instance, the polygon presented in Fig. 8d. The area of the polygon can be calculated using the above formula; all the intermediate expressions E are presented as the gray areas of Fig. 8a-d respectively.

We will use the expressions E to compute the percentage of the area of the primary region that falls in each tile of the reference region. Let us consider the region a presented in Fig. 9, formed by two polygons. Similarly to Algorithm COMPUTE-CDR, to compute the cardinal direction relation with percentages of a with respect to b, we first use the lines forming the minimum bounding box of b to divide the edges of region a. These lines divide the edges of the polygons of a as shown in Fig. 9.

Let us now compute the area of a that lies in the NW tile of b (i.e., compute the area of a ∩ NW(b)). It is convenient to use the west line of b as a reference line.


Fig. 9. Computing cardinal direction relations with percentages

Doing so, we do not have to compute the closure edges that would otherwise be needed, because their expressions vanish: segments lying on the west line are at zero distance from it, and the horizontal segments separating NW(b) from W(b) have a zero expression with respect to a vertical reference line. Thus the area we are looking for can be calculated directly. In other words, to compute the area of a that lies in NW(b), we calculate the area between the west line of b and every edge of a that lies in NW(b), i.e., the following formula holds:

    area(a ∩ NW(b)) = |Σ E(AB, west line of b)|, where AB ranges over the edges of a in NW(b).

Similarly, to calculate the area of a that lies in W(b) and SW(b), we can use the expressions:

    area(a ∩ W(b)) = |Σ E(AB, west line of b)|    area(a ∩ SW(b)) = |Σ E(AB, west line of b)|,

with AB ranging over the edges of a in W(b) and SW(b) respectively. For instance, in Fig. 9 these sums run over the divided edges that fall in each of these tiles.

To calculate the area of a that lies in NE(b), E(b), SE(b), S(b) and N(b), we simply have to change the reference line that we use. In the first three cases, we use the east line of b; in the fourth case, we use the south line of b; and in the last case, we use the north line of b. In all cases, we use the edges of a that fall in the tile of b that we are interested in. Thus, we have:

    area(a ∩ NE(b)) = |Σ E(AB, east line of b)|    area(a ∩ E(b)) = |Σ E(AB, east line of b)|
    area(a ∩ SE(b)) = |Σ E(AB, east line of b)|    area(a ∩ S(b)) = |Σ E(AB, south line of b)|
    area(a ∩ N(b)) = |Σ E(AB, north line of b)|,

where AB ranges over the edges of a in the respective tile. Fig. 9 illustrates these computations on the divided edges of a.


Let us now consider the area of a that lies in B(b). None of the lines forming the bounding box of b can help us compute the area of a ∩ B(b) directly without segmenting the polygons that represent region a: whichever line we choose, a closure edge that is not an edge of any of the polygons representing a would be required. For instance, in Fig. 9, using the south line alone would require such an extra edge. To handle this situation, we employ the following method. We use the south line of b as the reference line and calculate the areas between it and all edges that lie either in B(b) or in N(b). This quantity is practically the area of a that lies in B(b) and N(b) together, i.e., area(a ∩ B(b)) + area(a ∩ N(b)). Since area(a ∩ N(b)) has been previously computed, we just have to subtract it in order to derive area(a ∩ B(b)). Fig. 9 illustrates this subtraction.
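Putting the pieces together, the per-tile areas can be accumulated in one pass over the divided edges, using the reference line dictated by each tile and the subtraction trick for B. This is our reconstruction of the idea, not the paper's Algorithm COMPUTE-CDR% of Fig. 10.

```python
# Sketch: per-tile areas from edges already divided so that each edge lies
# in exactly one tile (see COMPUTE-CDR). mbb = (minx, miny, maxx, maxy).

def E_h(a, b, yl):   # expression w.r.t. a horizontal line y = yl
    return (b[0] - a[0]) * ((a[1] + b[1]) / 2.0 - yl)

def E_v(a, b, xl):   # expression w.r.t. a vertical line x = xl
    return (a[1] - b[1]) * ((a[0] + b[0]) / 2.0 - xl)

def tile_areas(edges_by_tile, mbb):
    """edges_by_tile maps a tile name to the list of (a, b) edges in it."""
    minx, miny, maxx, maxy = mbb
    ref = {'NW': ('v', minx), 'W': ('v', minx), 'SW': ('v', minx),
           'NE': ('v', maxx), 'E': ('v', maxx), 'SE': ('v', maxx),
           'S':  ('h', miny), 'N': ('h', maxy)}
    area = {}
    for tile, (axis, line) in ref.items():
        expr = E_v if axis == 'v' else E_h
        area[tile] = abs(sum(expr(a, b, line)
                             for a, b in edges_by_tile.get(tile, [])))
    # B tile: edges in B and N measured against the south line give the
    # area in B(b) and N(b) together; subtract the already-known N part.
    band = sum(E_h(a, b, miny)
               for t in ('B', 'N') for a, b in edges_by_tile.get(t, []))
    area['B'] = abs(band) - area['N']
    return area
```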

The method described above is summarized in Algorithm COMPUTE-CDR%, presented in Fig. 10. The following theorem captures the correctness of Algorithm COMPUTE-CDR% and measures its complexity.

Theorem 2. Algorithm COMPUTE-CDR% is correct, i.e., it returns the cardinal direction relation with percentages between two regions a and b in REG* that are represented using two sets of polygons Sa and Sb respectively. The running time of Algorithm COMPUTE-CDR% is O(ka + kb), where ka (respectively kb) is the total number of edges of all polygons in Sa (respectively Sb).

In the following section, we will present an actual system, CARDIRECT, that incorporates and implements Algorithms COMPUTE-CDR and COMPUTE-CDR%.

4 A Tool for Handling Cardinal Direction Information

In this section, we will present a tool that implements the aforementioned reasoning tasks for the computation of cardinal direction relationships among regions. The tool, CARDIRECT, has been implemented in C++ over the Microsoft Visual Studio toolkit. Using CARDIRECT, the user can define regions of interest over some underlying image (e.g., a map), compute their relationships (with and


Fig. 10. Algorithm COMPUTE-CDR%

without percentages) and pose queries. The tool implements an XML interface, through which the user can import and export the configuration he constructs (i.e., the underlying image and the sets of polygons that form the regions); the XML description of the configuration is also employed for querying purposes.

The XML description of the exported scenarios is quite simple: a configuration (Image) is defined upon an image file (e.g., a map) and comprises a set of regions and a set of relations among them. Each region comprises a set of polygons of the same color and each polygon comprises a set of edges (defined by their coordinates). The direction relations among the different regions are all stored in the XML description of the configuration. The DTD for CARDIRECT configurations is as follows.


Fig. 11. Using CARDIRECT to annotate images

Observe Fig. 11. In this configuration, the user has opened a map of Ancient Greece at the time of the Peloponnesian war as the underlying image. Then, the user defined three sets of regions: (a) the "Athenian Alliance" in blue, comprising Attica, the Islands, the regions in the East, Corfu and South Italy; (b) the "Spartan Alliance" in red, comprising Peloponnesos, Beotia, Crete and Sicily; and (c) the "Pro-Spartan" in black, comprising Macedonia.

Moreover, using CARDIRECT, the user can compute the cardinal direction relations and the cardinal direction relations with percentages between the identified regions. In Fig. 12, we have calculated the relations between the regions of Fig. 11. For instance, Peloponnesos is B:S:SW:W of Attica (left side of Fig. 12), while the right-hand side of Fig. 12 shows the corresponding relation with percentages of Attica with respect to Peloponnesos.

The query language that we employ is based on the following simple model. Let R = {r1, …, rn} be a set of regions in REG* over a configuration. Let C be a finite set of thematic attributes for the regions of REG* (e.g., the color of each region) and f a function practically relating each of the regions with a value over the domain of C (e.g., the fact that the Spartan Alliance is colored red).

A query condition over variables x1, …, xk is a conjunction of atoms of the forms xi R xj, f(xi) = c, and xi = r, where r is a


Fig. 12. Using CARDIRECT to extract cardinal direction relations

region of the configuration, c is a value of a thematic attribute and R is a (possibly disjunctive) cardinal direction relation. A query over variables x1, …, xk is a formula comprising the variables and a query condition.

Intuitively, the query returns the sets of regions in the configuration of an image that satisfy the query condition, which can take the form of: (a) a cardinal direction constraint between the query variables (e.g., x1 B:SE:S x2); (b) a restriction on the thematic attributes of a variable; and (c) a direct reference to a particular region.

For instance, for the configuration of Fig. 11 we can pose the following query: "Find all regions of the Athenian Alliance which are surrounded by a region in the Spartan Alliance." This query can be expressed as a conjunction of conditions of the above forms.

In this paper, we have addressed the problem of efficiently computing the cardinal direction relations between regions that are composed of sets of polygons (a) by presenting two linear algorithms for this task, and (b) by explaining their incorporation into an actual system. These algorithms take as input two sets of polygons representing two regions. The first of the proposed algorithms is purely qualitative and computes the cardinal direction relations between the input regions. The second has a quantitative aspect and computes the cardinal direction relations with percentages between the input regions. To the best of our knowledge, these are the first algorithms that address the aforementioned problem. The algorithms have been implemented and embedded in an actual system, CARDIRECT, which allows the user to specify, edit and annotate regions of interest in an image. Then, CARDIRECT automatically computes


the cardinal direction relations between these regions. The configuration of the image and the introduced regions is persistently stored using a simple XML description. The user is allowed to query the stored XML description of the image and retrieve combinations of interesting regions on the basis of the query language.

Although this part of our research addresses the problem of relation computation to a sufficient extent, there are still open issues for future research. First, we would like to evaluate our algorithms experimentally against polygon clipping methods. A second interesting topic is the possibility of combining topological [2] and distance relations [3]. Another issue is the possibility of combining the underlying model with extra thematic information and the enrichment of the employed query language on the basis of this combination. Finally, a long-term goal would be the integration of CARDIRECT with image segmentation software, which would provide a complete environment for the management of image information.

References

1. E. Clementini, P. Di Felice, and G. Califano. Composite Regions in Topological Queries. Information Systems, 20(7):579-594, 1995.
2. M.J. Egenhofer. Reasoning about Binary Topological Relationships. In Proceedings of SSD'91, pages 143-160, 1991.
3. A.U. Frank. Qualitative Spatial Reasoning about Distances and Directions in Geographic Space. Journal of Visual Languages and Computing, 3:343-371, 1992.
4. A.U. Frank. Qualitative Spatial Reasoning: Cardinal Directions as an Example. International Journal of GIS, 10(3):269-290, 1996.
5. R. Goyal and M.J. Egenhofer. The Direction-Relation Matrix: A Representation for Direction Relations Between Extended Spatial Objects. In the annual assembly and summer retreat of the University Consortium for Geographic Information Systems Science, June 1997.
6. R. Goyal and M.J. Egenhofer. Cardinal Directions Between Extended Spatial Objects. IEEE Transactions on Knowledge and Data Engineering, (in press), 2000. Available at http://www.spatial.maine.edu/~max/RJ36.html.
7. Y.-D. Liang and B.A. Barsky. A New Concept and Method for Line Clipping. ACM Transactions on Graphics, 3(1):868-877, 1984.
8. G. Ligozat. Reasoning about Cardinal Directions. Journal of Visual Languages and Computing, 9:23-44, 1998.
9. S. Lipschutz. Set Theory and Related Topics. McGraw Hill, 1998.
10. P.-G. Maillot. A New, Fast Method for 2D Polygon Clipping: Analysis and Software Implementation. ACM Transactions on Graphics, 11(3):276-290, 1992.
11. A. Mukerjee and G. Joe. A Qualitative Model for Space. In Proceedings of AAAI'90, pages 721-727, 1990.
12. J. O'Rourke. Computational Geometry in C. Cambridge University Press, 1994.
13. D. Papadias, Y. Theodoridis, T. Sellis, and M.J. Egenhofer. Topological Relations in the World of Minimum Bounding Rectangles: A Study with R-trees. In Proceedings of ACM SIGMOD'95, pages 92-103, 1995.
14. C.H. Papadimitriou, D. Suciu, and V. Vianu. Topological Queries in Spatial Databases. Journal of Computer and System Sciences, 58(1):29-53, 1999.
15. D.J. Peuquet and Z. Ci-Xiang. An Algorithm to Determine the Directional Relationship Between Arbitrarily-Shaped Polygons in the Plane. Pattern Recognition, 20(1):65-74, 1987.
16. F. Preparata and M. Shamos. Computational Geometry: An Introduction. Springer Verlag, 1985.
17. J. Renz and B. Nebel. On the Complexity of Qualitative Spatial Reasoning: A Maximal Tractable Fragment of the Region Connection Calculus. Artificial Intelligence, 1-2:95-149, 1999.
18. P. Rigaux, M. Scholl, and A. Voisard. Spatial Databases. Morgan Kaufmann, 2001.
19. S. Skiadopoulos, C. Giannoukos, P. Vassiliadis, T. Sellis, and M. Koubarakis. Computing and Handling Cardinal Direction Information (Extended Report). Technical Report TR-2003-5, National Technical University of Athens, 2003. Available at http://www.dblab.ece.ntua.gr/publications.
20. S. Skiadopoulos and M. Koubarakis. Composing Cardinal Direction Relations. In Proceedings of the 7th International Symposium on Spatial and Temporal Databases (SSTD'01), volume 2121 of LNCS, pages 299-317. Springer, July 2001.
21. S. Skiadopoulos and M. Koubarakis. Qualitative Spatial Reasoning with Cardinal Directions. In Proceedings of the International Conference on Principles and Practice of Constraint Programming (CP'02), volume 2470 of LNCS, pages 341-355. Springer, 2002.


A Tale of Two Schemas: Creating a Temporal XML Schema from a Snapshot Schema with τXSchema

Faiz Currim¹, Sabah Currim¹, Curtis Dyreson², and Richard T. Snodgrass¹

¹ University of Arizona, Tucson, AZ, USA

The annotations specify which portions of an XML document can vary over time, how the document can change, and where timestamps should be placed. The advantage of using annotations to denote the time-varying aspects is that logical and physical data independence for temporal schemas can be achieved while remaining fully compatible with both existing XML Schema documents and the XML Schema recommendation.

1 Introduction

XML is becoming an increasingly popular language for documents and data. XML can be approached from two quite separate orientations: a document-centered orientation (e.g., HTML) and a data-centered orientation (e.g., relational and object-oriented databases). Schemas are important in both orientations. A schema defines the building blocks of an XML document, such as the types of elements and attributes. An XML document can be validated against a schema to ensure that the document conforms to the formatting rules for an XML document (is well-formed) and to the types, elements, and attributes defined in the schema (is valid). A schema also serves as a valuable guide for querying and updating an XML document or database. For instance, to correctly construct a query, e.g., in XQuery, a user will (usually) consult the schema rather than the data. Finally, a schema can be helpful in query optimization, e.g., in constructing a path index [24].

Several schema languages have been proposed for XML [22]. From among these languages, XML Schema is the most widely used. The syntax and semantics of XML Schema 1.0 are W3C recommendations [35, 36].

Time-varying data naturally arises in both document-centered and data-centered orientations. Consider the following wide-ranging scenarios. In a university, students take various courses in different semesters. At a company, job positions and salaries change. At a warehouse, inventories evolve as deliveries are made and goods are


shipped. In a hospital, drug treatment regimes are adjusted. And finally, at a bank, account balances are in flux. In each scenario, querying the current state is important, e.g., "how much is in my account right now", but it is also often useful to know how the data has changed over time, e.g., "when has my account been below $200".

An obvious approach would have been to propose changes to XML Schema to accommodate time-varying data. Indeed, that has been the approach taken by many researchers for the relational and object-oriented models [25, 29, 32]. As we will discuss in detail, that approach inherently introduces difficulties with respect to document validation, data independence, tool support, and standardization. So in this paper we advocate a novel approach that retains the non-temporal XML Schema for the document, utilizing a series of separate schema documents to achieve data independence, enable full document validation, and enable improved tool support, while not requiring any changes to the XML Schema standard (nor to subsequent extensions of that standard; XML Schema 1.1 is in development).

The primary contribution of this paper is to introduce the τXSchema (Temporal XML Schema) data model and architecture. τXSchema is a system for constructing schemas for time-varying XML documents¹. A time-varying document records the evolution of a document over time, i.e., all of the versions of the document. τXSchema has a three-level architecture for specifying a schema for time-varying data². The first level is the schema for an individual version, called the snapshot schema. The snapshot schema is a conventional XML Schema document. The second level is the temporal annotations of the snapshot schema. The temporal annotations identify which elements can vary over time. For those elements, the temporal annotations also effect a temporal semantics for the various integrity constraints (such as uniqueness) specified in the snapshot schema. The third level is the physical annotations. The physical annotations describe how the time-varying aspects are represented. Each annotation can be independently changed, so the architecture has (logical and physical) data independence [7]. Data independence allows XML documents using one representation to be automatically converted to a different representation while preserving the semantics of the data. τXSchema has a suite of auxiliary tools to manage time-varying documents and schemas. There are tools to convert a time-varying document from one physical representation to a different representation, to extract a time slice from that document (yielding a conventional static XML document), and to create a time-varying document from a sequence of static documents, in whatever representation the user specifies.

As mentioned, τXSchema reuses rather than extends XML Schema. τXSchema is consistent and compatible with both XML Schema and the XML data model. In τXSchema, a temporal validator augments a conventional validator to more comprehensively check the validity constraints of a document, especially temporal constraints that cannot be checked by a conventional XML Schema validator. We describe a means of validating temporal documents that ensures the desirable property of snapshot validation subsumption. We show elsewhere how a temporal document can be smaller and faster to validate than the associated XML snapshots [12].

¹ We embrace both the document and data centric orientations of XML and will use the terms "document" and "database" interchangeably.
² Three-level architectures are a common architecture in both databases [33] and spatio-temporal conceptual modeling [21].


While this paper concerns temporal XML Schema, we feel that the general approach of separate temporal and physical annotations is applicable to other data models, such as UML [28]. The contribution of this paper is two-fold: (1) introducing a three-level approach for logical data models and (2) showing in detail how this approach works for XML Schema in particular, specifically concerning a theoretical definition of snapshot validation subsumption for XML, validation of time-varying XML documents, and implications for tools operating on realistic XML schemas and data, thereby exemplifying the approach in a substantial way. While we are confident that the approach could be applied to other data models, designing the annotation specifications, considering the specifics of data integrity constraint checking, and ascertaining the impact on particular tools remain challenging (and interesting) tasks.

τXSchema focuses on instance versioning (representing a time-varying XML instance document) and not schema versioning [15, 31]. The schema can describe which aspects of an instance document change over time, but we assume that the schema itself is fixed, with no element types, data types, or attributes being added to or removed from the schema over time. Intensional XML data (also termed dynamic XML documents [1]), that is, parts of XML documents that consist of programs that generate data [26], is gaining popularity. Incorporating intensional XML data is beyond the scope of this paper.

The next section motivates the need for a new approach. We then provide a theoretical framework for τXSchema, followed by an overview of its architecture and the details of its components. Related work is reviewed next, and we end with a summary and a list of future work.

This section discusses whether conventional XML Schema is appropriate and satisfactory for time-varying data. We first present an example that illustrates how a time-varying document differs from a conventional XML document. We then pinpoint some of the limitations of XML Schema. Finally, we state the desiderata for schemas for time-varying documents.

2.1 Motivating Example

Assume that the history of the Winter Olympic games is described in an XML document called winter.xml. The document has information about the athletes that participate, the events in which they participate, and the medals that are awarded. Over time the document is edited to add information about each new Winter Olympics and to revise incorrect information. Assume that information about the athletes participating in the 2002 Winter Olympics in Salt Lake City, USA was added on 2002-01-01. On 2002-03-01 the document was further edited to record the medal winners. Finally, a small correction was made on 2002-07-01.

To depict some of the changes to the XML in the document, we focus on information about the Norwegian skier Kjetil Andre Aamodt. On 2002-01-01 it was known that Kjetil would participate in the games, and the information shown in Fig. 1


was added to winter.xml. Kjetil won a medal, so on 2002-03-01 the fragment was revised to that shown in Fig. 2. The edit on 2002-03-01 incorrectly recorded that Kjetil won a silver medal in the Men's Combined; Kjetil won a gold medal. Fig. 3 shows the correct medal information.

Fig. 2. Kjetil won a medal, as of 2002-03-01
Fig. 3. Medal data is corrected on 2002-07-01

A time-varying document records a version history, which consists of the information in each version, along with timestamps indicating the lifetime of that version. Fig. 4 shows a fragment of a time-varying document that captures the history of Kjetil. The fragment is compact in the sense that each edit results in only a small, localized change to the document. The history is also bi-temporal because both the valid time and transaction time lifetimes are captured [20]. The valid time refers to the time(s) when a particular fact is true in the modeled reality, while the transaction time is the time when the information was edited. The two concepts are orthogonal. Time-varying documents can have each kind of time. In Fig. 4 the valid- and transaction-time lifetimes of each element are represented with an optional <rs:timestamp> sub-element³. If the timestamp is missing, the element has the same lifetime as its enclosing element. For example, there are two <athlete> elements with different lifetimes since the content of the element changes. The last version of <athlete> has two <medal> elements because the medal information is revised. There are many different ways to represent the versions in a time-varying document; the methods differ in which elements are timestamped, how the elements are timestamped, and how changes are represented (e.g., perhaps only differences between versions are represented).

Keeping the history in a document or data collection is useful because it provides the ability to recover past versions, track changes over time, and evaluate temporal queries [17]. But it changes the nature of validating against a schema.

³ The introduced <rs:timestamp> element is in the "rs" namespace to distinguish it from any <timestamp> elements already in the document. This namespace will be discussed in more detail later in the paper.

Fig. 1. A fragment of winter.xml on 2002-01-01


Fig. 4. A fragment of a time-varying document
Fig. 5. An extract from the winOlympic schema

Assume that the file winOlympic.xsd contains the snapshot schema for winter.xml. The snapshot schema is the schema for an individual version. The snapshot schema is a valuable guide for editing and querying individual versions. A fragment of the schema is given in Fig. 5. Note that the schema describes the structure of the fragments shown in Fig. 1, Fig. 2, and Fig. 3. The problem is that although individual versions conform to the schema, the time-varying document does not. So winOlympic.xsd cannot be used (directly) to validate the time-varying document of Fig. 4.

The snapshot schema could be used indirectly for validation by individually reconstituting and validating each version. But validating every version can be expensive if the changes are frequent or the document is large (e.g., if the document is a database). While the Winter Olympics document may not change often, contrast it with, e.g., a Customer Relationship Management database for a large company. Thousands of calls and service interactions may be recorded each day. This would lead to a very large number of versions, making it expensive to instantiate and


validate each individually. The number of versions is further increased because there can be both valid time and transaction time versions.

To validate a time-varying document, a new, different schema is needed. The schema for a time-varying document should take into account the elements (and attributes) and their associated timestamps, specify the kind(s) of time involved, provide hints on how the elements vary over time, and accommodate differences in version and timestamp representation. Since this schema will express how the time-varying information is represented, we will call it the representational schema. The representational schema is related to the underlying snapshot schema (Fig. 5), and allows the time-varying document to be validated using a conventional XML Schema validator (though not fully, as discussed in the next section).

2.2 Moving beyond XML Schema

Both the snapshot and representational schemas are needed for a time-varying document. The snapshot schema is useful in queries and updates. For example, a current query applies to the version valid now; a current update modifies the data in the current version, creating a new version; and a timeslice query extracts a previous version. All of these apply to a single version of a time-varying document, a version described by the snapshot schema. The representational schema is essential for validation and representation (storage). Many versions are combined into a single temporal document, described by the representational schema.

Unfortunately, an XML Schema validator is incapable of fully validating a time-varying document using the representational schema. First, XML Schema is not sufficiently expressive to enforce temporal constraints. For example, XML Schema cannot specify the following (desirable) schema constraint: the transaction-time lifetime of a <medal> element should always be contained in the transaction-time lifetime of its parent <athlete> element. Second, a conventional XML Schema document augmented with timestamps to denote time-varying data cannot, in general, be used to validate a snapshot of a time-varying document. A snapshot is an instance of a time-varying document at a single point in time. For instance, if the schema asserts that an element is mandatory (minOccurs=1) in the context of another element, there is no way to ensure that the element is in every snapshot, since the element's timestamp may indicate that it has a shorter lifetime than its parent (resulting in times during which the element is absent, violating this integrity constraint); XML Schema provides no mechanism for reasoning about the timestamps.
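As an illustration of the kind of check a temporal validator must add, the following sketch verifies that a child's lifetime is contained in its parent's. The document shape and helper names are our assumptions, not τXSchema's API.

```python
# Sketch: check that every <medal> lifetime is contained in the lifetime
# of its enclosing <athlete>. Lifetimes are (begin, end) period pairs.

def contains(outer, inner):
    """True if period `inner` lies within period `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def check_lifetimes(athletes):
    """athletes: list of dicts shaped like
    {'lifetime': (b, e), 'medals': [{'lifetime': (b, e)}, ...]}"""
    violations = []
    for athlete in athletes:
        for medal in athlete['medals']:
            if not contains(athlete['lifetime'], medal['lifetime']):
                violations.append((athlete, medal))
    return violations
```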

Even though the representational and snapshot schemas are closely related, there are no existing techniques to automatically derive a representational schema from a snapshot schema (or vice versa). The lack of an automatic technique means that users have to resort to ad hoc methods to construct a representational schema. Relying on ad hoc methods limits data independence. The designer of a schema for time-varying data has to make a variety of decisions, such as whether to timestamp with periods or with temporal elements [16], which are sets of non-overlapping periods, and which elements should be time-varying. By adopting a tiered approach, where the snapshot XML Schema, temporal annotations, and physical annotations are separate documents, individual schema design decisions can be specified and changed, often


without impacting the other design decisions, or indeed, the processing of tools. For example, a tool that computes a snapshot should be concerned primarily with the snapshot schema; the logical and physical aspects of time-varying information should only affect (perhaps) the efficiency of that tool, not its correctness. With physical data independence, few applications that are unconcerned with representational details would need to be changed.

Finally, improved tool support for representing and validating time-varying information is needed. Creating a time-varying XML document and a representational schema for that document is potentially labor-intensive. Currently a user has to manually edit the time-varying document to insert timestamps indicating when versions of XML data are valid (for valid time) or are present in the document (for transaction time). The user also has to modify the snapshot schema to define the syntax and semantics of the timestamps. The entire process has to be repeated if a new timestamp representation is desired. It would be better to have automated tools to create, maintain, and update time-varying documents when the representation of the timestamped elements changes.

2.3 Desiderata

In augmenting XML Schema to accommodate time-varying data, we had several goals in mind. At a minimum, the new approach should exhibit the following desirable features:

- Simplify the representation of time for the user.
- Support a three-level architecture to provide data independence, so that changes in the logical and physical levels are isolated.
- Retain full upward compatibility with existing standards and not require any changes to these standards.
- Augment existing tools, such as validating parsers for XML, in such a way that those tools are also upward compatible. Ideally, any off-the-shelf validating parser (for XML Schema) can be used for (partial) validation.
- Support both valid time and transaction time.
- Accommodate a variety of physical representations for time-varying data.
- Support instance versioning.

Note that while ad hoc representational schemas may meet the last three desiderata, they certainly do not meet the first four. Other desirable features, outside the scope of this paper, include supporting schema versioning and accommodating temporal indeterminacy and granularity.

This section sketches the process of constructing a schema for a time-varying document from a snapshot schema. The goal of the construction process is to create a schema that satisfies the snapshot validation subsumption property, which is


described in detail below. In the relational data model, a schema defines the structure of each relation in a database. Each relation has a very simple structure: a relation is a list of attributes, with each attribute having a specified data type. The schema also includes integrity constraints, such as the specification of primary and foreign keys. In a similar manner, an XML Schema document defines the valid structure for an XML document. But an XML document has a far more complex structure than a relation. A document is a (deeply) nested collection of elements, with each element potentially having (text) content and attributes.

3.1 Snapshot Validation Subsumption

Let D^T be an XML document that contains timestamped elements. A timestamped element is an element that has an associated timestamp. (A timestamped attribute can be modeled as a special case of a timestamped element.) Logically, the timestamp is a collection of times (usually periods) chosen from one or more temporal dimensions (e.g., valid time, transaction time). Without loss of generality, we will restrict the discussion in this section to lifetimes that consist of a single period in one temporal dimension⁴. The timestamp records (part of) the lifetime of an element⁵. We will use the notation x^T to signify that element x has been timestamped. Let the lifetime of x^T be denoted as lifetime(x^T). One constraint on the lifetime is that the lifetime of an element must be contained in the lifetime of each element that encloses it⁶.

The snapshot operation extracts a complete snapshot of a time-varying document at a particular instant. Timestamps are not represented in the snapshot. A snapshot at time t replaces each timestamped element x^T with its non-timestamped copy x if t is in lifetime(x^T), or with the empty string otherwise. The snapshot operation is denoted as snp(t, D^T) = D, where D is the snapshot at time t of the time-varying document D^T.
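A sketch of the snapshot operation snp(t, D^T) over a timestamped tree, assuming timestamps are carried as begin/end attributes (just one of the many possible representations mentioned above):

```python
# Sketch of snp(t, DT): keep an element iff t falls in its lifetime, and
# strip the timestamp attributes. Assumes timestamps are stored as
# 'begin'/'end' attributes (an assumed physical representation).
import xml.etree.ElementTree as ET

def snapshot(element, t):
    """Return the snapshot of `element` at time t, or None if t is
    outside its lifetime."""
    begin = element.get('begin', '0000-01-01')
    end = element.get('end', '9999-12-31')
    if not (begin <= t <= end):          # ISO dates compare as strings
        return None
    snap = ET.Element(element.tag,
                      {k: v for k, v in element.attrib.items()
                       if k not in ('begin', 'end')})
    snap.text = element.text
    for child in element:
        snap_child = snapshot(child, t)
        if snap_child is not None:
            snap.append(snap_child)
    return snap
```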

Let S^T be a representational schema for a time-varying document. The snapshot validation subsumption property captures the idea that, at the very least, the representational schema must ensure that every snapshot of the document is valid with respect to the snapshot schema. Let vldt(S, D) represent the validation status of document D with respect to schema S. The status is true if the document is valid and false otherwise. Validation also applies to time-varying documents, e.g., vldt^T(S^T, D^T) is the validation status of D^T with respect to a representational schema, using a temporal validator.

Property [Snapshot Validation Subsumption]. Let S be an XML Schema document, D^T be a time-varying XML document, and S^T be a representational schema, also an

⁴ The general case is that a timestamp is a collection of periods from multiple temporal dimensions (a multidimensional temporal element).
⁵ Physically, there are myriad ways to represent a timestamp. It could be represented as an <rs:timestamp> subelement in the content of the timestamped element, as is done in the fragment in Fig. 4. Or it could be a set of additional attributes in the timestamped element, or it could even be an <rs:version> element that wraps the timestamped element.
⁶ Note that the lifetime captures only when an element appears in the context of the enclosing elements. The same element can appear in other contexts (enclosed by different elements), but clearly it has a different lifetime in those contexts.


XML Schema document. S^T is said to have snapshot validation subsumption with respect to S if, for every time t, vldt^T(S^T, D^T) implies vldt(S, snp(t, D^T)).

Intuitively, the property asserts that a good representational schema will validate only those time-varying documents for which every snapshot conforms to the snapshot schema. The subsumption property is depicted in the following correspondence diagram.

Fig. 6. Snapshot validation subsumption
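In the absence of a full temporal validator, the subsumption property suggests a direct (if expensive) test: validate every snapshot against the snapshot schema. A sketch, with `snapshot` as defined earlier and `validate` standing in for any off-the-shelf XML Schema validator:

```python
# Sketch: test vldt(S, snp(t, DT)) for every time point of interest.
# `validate(schema, doc)` stands in for an off-the-shelf XML Schema
# validator; `snapshot` is the sketch shown above.

def snapshots_all_valid(schema, timevarying_root, time_points, validate):
    """True iff every snapshot of the document is valid against `schema`."""
    for t in time_points:
        doc = snapshot(timevarying_root, t)
        if doc is not None and not validate(schema, doc):
            return False
    return True
```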

Details of the process for constructing a schema for a time-varying document that conforms to the snapshot validation subsumption property from a snapshot schema are available in a technical report by the authors [12].

The architecture of τXSchema is illustrated in Fig. 7. This figure is central to our approach, so we describe it in detail and illustrate it with the example. We note that although the architecture has many components, only those components shaded gray in the figure are specific to an individual time-varying document and need to be supplied by a user. New time-varying schemas can be quickly and easily developed and deployed. We also note that the representational schema, instead of being the only schema as in an ad hoc approach, is merely an artifact in our approach, with the snapshot schema, temporal annotations, and physical annotations being the crucial specifications to be created by the designer.

The designer annotates the snapshot schema with temporal annotations (box 6). The temporal annotations together with the snapshot schema form the logical schema. Fig. 8 provides an extract of the temporal annotations on the winOlympic schema. The temporal annotations specify a variety of characteristics, such as whether an element or attribute varies over valid time or transaction time, whether its lifetime is described as a continuous state or a single event, whether the item itself may appear at certain times (and not at others), and whether its content changes. For example, <athlete> is described as a state element, indicating that the <athlete> will be valid over a period (continuous) of time rather than at a single instant. Annotations can be nested, enabling the target to be relative to that of its parent, and inheriting as
