S. Skiadopoulos et al.
Fig. 1. Reference tiles and relations
In Fig. 1, region c is in REG*; notice that region c is disconnected and has a hole.
Let us now consider two arbitrary regions a and b in REG*. Let region a be related to region b through a cardinal direction relation (e.g., a is north of b). Region b will be called the reference region (i.e., the region to which the relation refers), while region a will be called the primary region (i.e., the region for which the relation is introduced). The axes forming the minimum bounding box of the reference region b divide the space into 9 areas, which we call tiles (Fig. 1a).
The peripheral tiles correspond to the eight cardinal direction relations south, southwest, west, northwest, north, northeast, east and southeast. These tiles are denoted by S(b), SW(b), W(b), NW(b), N(b), NE(b), E(b) and SE(b), respectively. The central area corresponds to the region's minimum bounding box and is denoted by B(b). By definition, each one of these tiles includes the parts of the axes forming it. The union of all 9 tiles is the whole plane.
If a primary region a is included (in the set-theoretic sense) in tile S(b) of some reference region b (Fig. 1b), then we say that a is south of b and we write a S b. Similarly, we can define southwest (SW), west (W), northwest (NW), north (N), northeast (NE), east (E), southeast (SE) and bounding box (B) relations. If a primary region a lies partly in area NE(b) and partly in area E(b) of some reference region b (Fig. 1c), then we say that a is partly northeast and partly east of b, and we write a NE:E b.
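As an illustration, the tiling can be sketched in code. The helper below is our own minimal reconstruction (it is not part of the paper); in particular, boundary points, which by definition may belong to the axes of several tiles, are resolved here to a single tile for simplicity.

```python
# Minimal sketch (our own helper, not the paper's): classify a point into one
# of the nine tiles induced by the minimum bounding box of a reference region.
# The box is given as (minx, miny, maxx, maxy); boundary points are resolved
# to a single tile for simplicity.

def tile_of(point, mbb):
    x, y = point
    minx, miny, maxx, maxy = mbb
    col = 0 if x < minx else (1 if x <= maxx else 2)
    row = 0 if y < miny else (1 if y <= maxy else 2)
    return [["SW", "S", "SE"],   # below the box
            ["W",  "B", "E"],    # level with the box
            ["NW", "N", "NE"]][row][col]
```

For a box (1, 1, 3, 3), tile_of((2, 0), (1, 1, 3, 3)) returns "S", matching the tile layout of Fig. 1a.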
The general definition of a cardinal direction relation in our framework is as follows.

Definition 1. A cardinal direction relation is an expression R1:···:Rk where (a) 1 ≤ k ≤ 9, (b) R1, ..., Rk ∈ {B, S, SW, W, NW, N, NE, E, SE} and (c) Ri ≠ Rj for every i, j such that 1 ≤ i, j ≤ k and i ≠ j. A cardinal direction relation is called single-tile if k = 1; otherwise, it is called multi-tile.

Let a and b be two regions in REG*. Single-tile cardinal direction relations are defined as follows: a S b iff a ⊆ S(b), and similarly for the remaining tiles. In general, each multi-tile relation is defined as follows: a B1:···:Bk b iff there exist regions a1, ..., ak in REG* such that a = a1 ∪ ··· ∪ ak and a1 B1 b, ..., ak Bk b.
In Fig. 1, regions a and b are in REG (and hence also in REG*).
In Definition 1, notice that for every i, j such that 1 ≤ i, j ≤ k and i ≠ j, tiles Ri(b) and Rj(b) have disjoint interiors but may share points in their boundaries.
Example 1. S, NE:E and B:S:SW:W:NW:N:E:SE are cardinal direction relations. The first relation is single-tile, while the others are multi-tile. In Fig. 1, we have a S b, a NE:E b and c B:S:SW:W:NW:N:E:SE b; for the last relation, there exist regions c1, ..., c8 in REG* such that c = c1 ∪ ··· ∪ c8 and c1 B b, ..., c8 SE b.
In order to avoid confusion, we will write the single-tile elements of a cardinal direction relation according to the following order: B, S, SW, W, NW, N, NE, E and SE. Thus, we always write B:S:W instead of W:B:S or S:B:W. Moreover, for a relation such as B:S:W we will often refer to B, S and W as its tiles.
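This ordering convention is easy to enforce mechanically; the sketch below (our own helper, not from the paper) rewrites a relation into the canonical tile order.

```python
# Sketch: rewrite a cardinal direction relation into the canonical tile order
# B, S, SW, W, NW, N, NE, E, SE, e.g. W:B:S and S:B:W both become B:S:W.

CANONICAL_ORDER = ["B", "S", "SW", "W", "NW", "N", "NE", "E", "SE"]

def canonical(relation):
    tiles = set(relation.split(":"))
    return ":".join(t for t in CANONICAL_ORDER if t in tiles)
```

For example, canonical("W:B:S") returns "B:S:W".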
The set of cardinal direction relations for regions in REG* is denoted by D. Relations in D are jointly exhaustive and pairwise disjoint, and can be used to represent definite information about cardinal directions, e.g., a N b. Using the relations of D as our basis, we can define the powerset 2^D of D, which contains 2^511 relations. Elements of 2^D are called disjunctive cardinal direction relations and can be used to represent not only definite but also indefinite information about cardinal directions, e.g., a {N, W} b denotes that region a is north or west of region b.
Notice that the inverse of a cardinal direction relation R, denoted by inv(R), is not always a cardinal direction relation but, in general, a disjunctive cardinal direction relation. For instance, if a S b, then it is possible that b NE:N:NW a, b NE:N a, b N:NW a or b N a. Specifically, the relative position of two regions a and b is fully characterized by the pair (R1, R2), where R1 and R2 are cardinal direction relations such that (a) a R1 b, (b) b R2 a, (c) R1 is a disjunct of inv(R2), and (d) R2 is a disjunct of inv(R1). An algorithm for computing the inverse relation is discussed in [21]. Moreover, algorithms that calculate the composition of two cardinal direction relations and the consistency of a set of cardinal direction constraints are discussed in [20,21,22].
Goyal and Egenhofer [5,6] use direction relation matrices to represent cardinal direction relations. Given a cardinal direction relation R, the
Fig. 2. Using polygons to represent regions
cardinal direction relation matrix that corresponds to R is a 3×3 matrix defined
as follows:
For instance, the direction relation matrices that correspond to relations S, NE:E and B:S:SW:W:NW:N:E:SE of Example 1 are as follows:
At a finer level of granularity, the model of [5,6] also offers the option to record how much of a region falls into each tile. Such relations are called cardinal direction relations with percentages and can be represented with cardinal direction matrices with percentages. Let a and b be two regions in REG*. The cardinal direction matrix with percentages can be defined as follows: the element corresponding to tile T of b holds the percentage 100 · area(a ∩ T(b)) / area(a), where area(r) denotes the area of region r.
Consider, for example, regions a and b in Fig. 1c; region a is 50% northeast and 50% east of region b. This relation is captured with the following cardinal direction matrix with percentages.
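Assuming the per-tile areas have already been measured, such a matrix can be assembled as in the following sketch (the layout mirrors the geographic arrangement of the tiles; the helper is ours, not from [5,6]).

```python
# Sketch: assemble a 3x3 cardinal direction matrix with percentages from the
# measured area of the primary region falling in each tile of the reference
# region. Rows run north to south, columns west to east.

LAYOUT = [["NW", "N", "NE"],
          ["W",  "B", "E"],
          ["SW", "S", "SE"]]

def percentage_matrix(tile_areas):
    total = sum(tile_areas.values())
    return [[100.0 * tile_areas.get(t, 0.0) / total for t in row]
            for row in LAYOUT]

# Region a of Fig. 1c is 50% northeast and 50% east of region b:
m = percentage_matrix({"NE": 2.0, "E": 2.0})
```

Here the two tile areas (2.0 each) are illustrative values; the percentages always sum to 100.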
In this paper, we will use simple assertions (e.g., a S b, a B:S:SW b) to capture cardinal direction relations [20,21], and direction relation matrices to capture cardinal direction relations with percentages [5,6].
Fig. 3. Polygon clipping
Typically, in Geographical Information Systems and Spatial Databases, the connected regions in REG are represented using single polygons, while the composite regions in REG* are represented using sets of polygons [18,23]. In this paper, the edges of polygons are taken in a clockwise order. For instance, in Fig. 2 a connected region is represented using a single polygon, and a composite region is represented using several polygons. Notice that, using sets of polygons, we can even represent regions with holes, as also illustrated in Fig. 2.
Given the polygon representations of a primary region a and a reference region b, the computation of cardinal direction relations problem lies in the calculation of the cardinal direction relation R such that a R b holds. Similarly, we can define the computation of cardinal direction relations with percentages problem.
Let us consider a primary region a and a reference region b. According to Definition 1, in order to calculate the cardinal direction relation between regions a and b, we have to divide the primary region a into segments such that each segment falls exactly into one tile of b. Furthermore, in order to calculate the cardinal direction relation with percentages, we also have to measure the area of each segment. Segmenting polygons using bounding boxes is a well-studied topic of Computational Geometry called polygon clipping [7,10]. A polygon clipping algorithm can be extended to handle unbounded boxes (such as the tiles of the reference region b) as well. Since polygon clipping algorithms are very efficient (linear in the number of polygon edges), one would be tempted to use them for the calculation of cardinal direction relations and cardinal direction relations with percentages. Let us briefly discuss the disadvantages of such an approach.
Let us consider regions a and b presented in Fig. 3a. Region a is formed by a quadrangle (i.e., a total of 4 edges). To achieve the desired segmentation, polygon clipping algorithms introduce new edges [7,10]. After the clipping algorithms are performed (Fig. 3b), region a is formed by 4 quadrangles (i.e., a total of 16 edges). The worst case that we can think of (illustrated in Fig. 3c) starts with 3 edges (a triangle) and ends with 35 edges (2 triangles, 6 quadrangles and 1 pentagon). These new edges are only used for the calculation of cardinal direction relations and are discarded afterwards. Thus, it would be important
to minimize their number. Moreover, in order to perform the clipping, the edges of the primary region a must be scanned 9 times (one time for every tile of the reference region b). In real GIS applications, we expect that the average number of edges is high. Thus, each scan of the edges of a polygon can be quite time-consuming. Finally, polygon clipping algorithms sometimes require complex floating point operations, which are costly.
In Sections 3.1 and 3.2, we consider the problems of calculating cardinal direction relations and cardinal direction relations with percentages, respectively. We provide algorithms specifically tailored for these tasks, which avoid the drawbacks of polygon clipping methods. Our proposal does not segment polygons; instead, it only divides some of the polygon edges. In Example 2, we show that such a division is necessary for the correct calculation. Interestingly, the resulting number of introduced edges is significantly smaller than the respective number for polygon clipping methods. Furthermore, the computation is not only linear in the number of polygon edges but can also be performed in a single pass. Finally, our algorithms use only simple arithmetic operations and comparisons.
3.1 Cardinal Direction Relations
We will start by considering the computation of cardinal direction relations problem. First, we need the following definition.
Definition 2. Let R1, ..., Rk be basic cardinal direction relations. The tile-union of R1, ..., Rk, denoted by tile-union(R1, ..., Rk), is the relation formed from the union of the tiles of R1, ..., Rk.
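Definition 2 amounts to a set union followed by the canonical reordering described earlier; a sketch (our own helper):

```python
# Sketch of Definition 2: the tile-union of cardinal direction relations is
# the relation formed from the union of their tiles, written in the canonical
# order B, S, SW, W, NW, N, NE, E, SE.

ORDER = ["B", "S", "SW", "W", "NW", "N", "NE", "E", "SE"]

def tile_union(*relations):
    tiles = {t for r in relations for t in r.split(":")}
    return ":".join(t for t in ORDER if t in tiles)
```

For example, tile_union("S", "NE:E") yields "S:NE:E".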
Let us consider the sets of polygons representing a primary region a and a reference region b. To calculate the cardinal direction relation R between the primary region a and the reference region b, we first record the tiles of region b in which the points forming the edges of the polygons fall. Unfortunately, as the following example shows, this is not enough.
Example 2. Let us consider the region a (formed by a single polygon) and the region b presented in Fig. 4a. Clearly, the vertices of the polygon lie in W(b), NW(b) and NE(b), but the relation between a and b is B:W:NW:N:NE and not W:NW:NE.
The problem of Example 2 arises because there exist edges of the polygon that expand over three tiles of the reference region b. In order to handle such situations, we use the lines forming the minimum bounding box of the reference region b to divide the edges of the polygons representing the primary region a and create new edges such that (a) region a does not change and (b) every new edge lies in exactly one tile. To this end, for every edge AB of region a, we compute the set of intersection points of AB with the lines forming box(b). We use the intersection
Fig. 4. Illustration of Examples 2 and 3
points to divide AB into a number of segments. Each segment lies in exactly one tile of b, and the union of all segments is AB. Thus, we can safely replace edge AB with these segments without affecting region a. Finally, to compute the cardinal direction relation between regions a and b, we only have to record the tile of b in which each new segment lies. Choosing a single point from each segment is sufficient for this purpose; we pick the middle of the segment as a representative point. Thus, the tile where the middle point lies gives us the tile of the segment too. The above procedure is captured in Algorithm COMPUTE-CDR (Fig. 5) and is illustrated in the following example.
Example 3. Let us continue with the regions of Example 2 (see also Fig. 4). Algorithm COMPUTE-CDR considers every edge of region a in turn and performs the replacements presented in the following table. It is easy to verify that every new edge lies in exactly one tile of b (Fig. 4b), and the middle point of each new edge identifies that tile. Therefore, Algorithm COMPUTE-CDR returns B:W:NW:N:NE:E, which precisely captures the cardinal direction relation between regions a and b.
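The edge-division step can be sketched as follows. The code is a simplified reconstruction under our own naming (it is not the paper's pseudocode): it splits an edge parametrically at proper crossings with the four box lines and classifies each piece by its midpoint.

```python
# Simplified sketch of COMPUTE-CDR's edge-division step (helper names are
# ours): split an edge at its crossings with the four lines of box(b), so
# that each new edge lies in exactly one tile, then read off the tile of
# each piece from its midpoint.

def tile_of(point, mbb):
    x, y = point
    minx, miny, maxx, maxy = mbb
    col = 0 if x < minx else (1 if x <= maxx else 2)
    row = 0 if y < miny else (1 if y <= maxy else 2)
    return [["SW", "S", "SE"], ["W", "B", "E"], ["NW", "N", "NE"]][row][col]

def split_edge(p, q, mbb):
    (x1, y1), (x2, y2) = p, q
    minx, miny, maxx, maxy = mbb
    ts = {0.0, 1.0}
    # Parameter values where the edge properly crosses a box line.
    for line, c1, c2 in ((minx, x1, x2), (maxx, x1, x2),
                         (miny, y1, y2), (maxy, y1, y2)):
        if c1 != c2:                       # edge not parallel to this line
            t = (line - c1) / (c2 - c1)
            if 0.0 < t < 1.0:
                ts.add(t)
    ts = sorted(ts)
    pt = lambda t: (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return [(pt(s), pt(t)) for s, t in zip(ts, ts[1:])]

def edge_tiles(p, q, mbb):
    # The midpoint of each new edge is a sufficient representative point.
    return {tile_of(((u[0] + v[0]) / 2, (u[1] + v[1]) / 2), mbb)
            for u, v in split_edge(p, q, mbb)}

# An edge that expands over three tiles of box(b) = (1, 1, 3, 3):
assert edge_tiles((0, 0), (4, 0), (1, 1, 3, 3)) == {"SW", "S", "SE"}
```

The tile-union of the results over all edges of the primary region then yields the relation, as in Example 3.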
Notice that in Example 3, Algorithm COMPUTE-CDR takes as input a quadrangle (4 edges) and returns 9 edges. This should be contrasted with the polygon clipping method, which would have resulted in 19 edges (2 triangles, 2 quadrangles and 1 pentagon). Similarly, for the shapes in Fig. 3b-c, Algorithm COMPUTE-CDR introduces 8 and 11 edges respectively, while polygon clipping methods introduce 16 and 34 edges respectively.
The following theorem captures the correctness of Algorithm COMPUTE-CDR and measures its complexity.

Theorem 1. Algorithm COMPUTE-CDR is correct, i.e., it returns the cardinal direction relation between two regions a and b in REG* that are represented using two sets of polygons. The running time of Algorithm
Fig. 5. Algorithm COMPUTE-CDR
COMPUTE-CDR is linear in the total number of edges of all polygons representing the primary and the reference region.
Summarizing this section, we can use Algorithm COMPUTE-CDR to compute the cardinal direction relation between two sets of polygons representing two regions a and b in REG*. The following section considers the case of cardinal direction relations with percentages.
3.2 Cardinal Direction Relations with Percentages
In order to compute cardinal direction relations with percentages, we have to calculate the area of the primary region that falls in each tile of the reference region. A naive way to perform this task is to segment the polygons that form the primary region so that every polygon lies in exactly one tile of the reference region. Then, for each tile of the reference region, we find the polygons of the primary region that lie inside it and compute their area. In this section, we will propose an alternative method that is based on Algorithm COMPUTE-CDR. This method simply computes the area between the edges of the polygons that represent the primary region and an appropriate reference line, without segmenting these polygons.
We will now present a method to compute the area between a line and an edge. Then, we will see how we can extend this method to compute the area of a polygon. We will first need the following definition.

Definition 3. Let AB be an edge and l be a line. We say that l does not cross AB if and only if one of the following holds: (a) AB and l do not intersect, (b) AB and l intersect only at point A or B, or (c) AB completely lies on l.
Fig. 6. Lines not crossing AB    Fig. 7. Area between an edge and a line
For example, in Fig. 6 the depicted lines do not cross edge AB. Let us now calculate the area between an edge and a line.

Definition 4. Let A and B be two points forming edge AB, and let l and l' be two lines that do not cross AB. Let also A' and B' (respectively A'' and B'') be the projections of points A and B onto l (respectively l'); see also Fig. 7. We define expressions E(AB, l) and E'(AB, l') as the signed areas of trapezoids AA'B'B and AA''B''B, respectively.

Expressions E(AB, l) and E'(AB, l') can be positive or negative, depending on the direction of vector AB. The absolute value of E(AB, l) equals the area between edge AB and line l, i.e., the area of trapezoid AA'B'B. Symmetrically, the area between edge AB and line l', i.e., the area of trapezoid AA''B''B, equals the absolute value of E'(AB, l').
Expressions E and E' can be used to calculate the area of polygons. Let p be a polygon and l a line that does not cross any edge of p. The area of polygon p, denoted by area(p), can be calculated by summing the expressions of its (clockwise) edges with respect to l and taking the absolute value of the result.
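A hedged reconstruction of this mechanism for a horizontal reference line y = c (the symbol names are ours): the expression is the signed trapezoid area between the edge and the line, and summing it over a polygon's clockwise edges gives the polygon's area regardless of which non-crossing line is chosen.

```python
# Sketch: signed trapezoid area between edge AB and a horizontal reference
# line y = c that does not cross AB (Definition 4, reconstructed). Summing
# over a closed polygon's edges and taking the absolute value gives its area.

def E(a, b, c):
    (x1, y1), (x2, y2) = a, b
    return (x2 - x1) * ((y1 - c) + (y2 - c)) / 2.0

def polygon_area(vertices, c=0.0):
    n = len(vertices)
    signed = sum(E(vertices[i], vertices[(i + 1) % n], c) for i in range(n))
    return abs(signed)

# Unit square (clockwise); any non-crossing reference line gives the same area.
square = [(0, 1), (1, 1), (1, 0), (0, 0)]
assert polygon_area(square) == 1.0
assert polygon_area(square, c=-5.0) == 1.0
```

The independence from the choice of reference line is what makes the per-tile computations below possible without clipping.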
Notice that Computational Geometry algorithms, in order to calculate the area of a polygon, use a similar method that is based on a reference point
Fig. 8. Using expression E to calculate the area of a polygon
(instead of a line) [12,16]. This method is not appropriate for our case because it requires segmenting the primary region using polygon clipping algorithms (see also the discussion at the beginning of Section 3). In the rest of this section, we will present a method that utilizes expressions E and E' and does not require polygon clipping.
Consider, for example, the polygon presented in Fig. 8d. Its area can be calculated by summing expression E over its edges; the intermediate expressions are presented as the gray areas of Fig. 8a-d, respectively.
We will use expressions E and E' to compute the percentage of the area of the primary region that falls in each tile of the reference region. Let us consider region a presented in Fig. 9; region a is formed by two polygons. Similarly to Algorithm COMPUTE-CDR, to compute the cardinal direction relation with percentages of a with respect to b, we first use the lines forming box(b) to divide the edges of region a, as shown in Fig. 9.
Let us now compute the area of a that lies in the NW tile of b. For this computation, it is convenient to use the west line of box(b) as the reference line.
Fig. 9. Computing cardinal direction relations with percentages
Doing so, we do not have to compute any additional edges, because the contributions of segments lying on the reference line itself are zero. In other words, to compute the area of a that lies in NW(b), we sum the areas between the west line of box(b) and every edge of a that lies in NW(b).
Similarly, to calculate the area of a that lies in the W and SW tiles, we can use analogous expressions with the same (west) reference line.
To calculate the area of a that lies in the NE, E, SE, S and N tiles, we simply have to change the reference line that we use. In the first three cases, we use the east line of box(b) (i.e., the right vertical line in Fig. 9); in the fourth case, we use the south line of box(b); and in the last case, we use the north line of box(b). In all cases, we use the edges of a that fall in the tile of b that we are interested in.
Let us now consider the area of a that lies in B(b). None of the lines of box(b) can help us compute this area directly without segmenting the polygons that represent region a; an expression over the edges in B(b) alone would involve edges that do not belong to any of the polygons representing a. To handle such situations, we employ the following method. We use the south line of box(b) as the reference line and calculate the areas between it and all edges that lie in B(b) and N(b). This quantity is practically the area of a that lies in B(b) and N(b) together. Since the area of a in N(b) has been previously computed, we just have to subtract it in order to derive the area of a in B(b).
The above-described method is summarized in Algorithm COMPUTE-CDR% presented in Fig. 10. The following theorem captures the correctness of Algorithm COMPUTE-CDR% and measures its complexity.
Theorem 2. Algorithm COMPUTE-CDR% is correct, i.e., it returns the cardinal direction relation with percentages between two regions a and b in REG* that are represented using two sets of polygons. The running time of Algorithm COMPUTE-CDR% is linear in the total number of edges of all polygons representing the two regions.
In the following section, we will present an actual system, CARDIRECT, that incorporates and implements Algorithms COMPUTE-CDR and COMPUTE-CDR%.
Information
In this section, we will present a tool that implements the aforementioned reasoning tasks for the computation of cardinal direction relationships among regions. The tool, CARDIRECT, has been implemented in C++ over the Microsoft Visual Studio toolkit. Using CARDIRECT, the user can define regions of interest over some underlying image (e.g., a map), compute their relationships (with and
Fig. 10. Algorithm COMPUTE-CDR%
without percentages) and pose queries. The tool implements an XML interface, through which the user can import and export the configuration he constructs (i.e., the underlying image and the sets of polygons that form the regions); the XML description of the configuration is also employed for querying purposes.

The XML description of the exported scenarios is quite simple: a configuration (Image) is defined upon an image file (e.g., a map) and comprises a set of regions and a set of relations among them. Each region comprises a set of polygons of the same color, and each polygon comprises a set of edges (defined by their vertex coordinates). The direction relations among the different regions are all stored in the XML description of the configuration. The DTD for CARDIRECT configurations is as follows.
Fig. 11. Using CARDIRECT to annotate images
Observe Fig. 11. In this configuration, the user has opened a map of Ancient Greece at the time of the Peloponnesian War as the underlying image. Then, the user defined three sets of regions: (a) the “Athenean Alliance” in blue, comprising Attica, the Islands, the regions in the East, Corfu and South Italy; (b) the “Spartan Alliance” in red, comprising Peloponnesos, Boeotia, Crete and Sicily; and (c) the “Pro-Spartan” in black, comprising Macedonia.
Moreover, using CARDIRECT, the user can compute the cardinal direction relations and the cardinal direction relations with percentages between the identified regions. In Fig. 12, we have calculated the relations between the regions of Fig. 11. For instance, Peloponnesos is B:S:SW:W of Attica (left-hand side of Fig. 12), while the corresponding relation with percentages of Attica with respect to Peloponnesos is shown on the right-hand side of Fig. 12.
The query language that we employ is based on the following simple model. Let R be the set of regions in REG* over a configuration. Let C be a finite set of thematic attributes for the regions (e.g., the color of each region) and θ a function practically relating each of the regions with a value over the domain of C (e.g., the fact that the Spartan Alliance is colored red). A query condition over a set of variables is a conjunction of atoms of the following forms,
Fig. 12. Using CARDIRECT to extract cardinal direction relations
where r is a region of the configuration, c is a value of a thematic attribute, and R is a (possibly disjunctive) cardinal direction relation. A query over a set of variables is a formula comprising a query condition over those variables.
Intuitively, a query returns the set of regions in the configuration of an image that satisfy the query condition, which can take the form of: (a) a cardinal direction constraint between the query variables (e.g., B:SE:S), (b) a restriction on the thematic attributes of a variable, and (c) a direct reference to a particular region. For instance, for the configuration of Fig. 11, we can pose the following query: “Find all regions of the Athenean Alliance which are surrounded by a region in the Spartan Alliance”. This query can be expressed as follows:
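To make the flavour of such queries concrete, here is a toy evaluation over assumed Python data structures (the actual CARDIRECT query syntax is not reproduced; the only stored relation below is the one reported in Fig. 12).

```python
# Toy sketch of the query model: regions carry a thematic colour attribute
# (function theta), and pairwise cardinal direction relations are stored with
# the configuration. Per Fig. 12, Peloponnesos is B:S:SW:W of Attica.

theta = {"Attica": "blue", "Peloponnesos": "red", "Macedonia": "black"}
relations = {("Peloponnesos", "Attica"): "B:S:SW:W"}

def query(colour1, colour2, allowed_relations):
    """Find pairs (x, y) with theta(x) = colour1, theta(y) = colour2 and
    x R y for some R in allowed_relations."""
    return [(x, y) for (x, y), r in relations.items()
            if theta[x] == colour1 and theta[y] == colour2
            and r in allowed_relations]
```

A disjunctive relation in the query then simply corresponds to a larger set of allowed relations.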
In this paper, we have addressed the problem of efficiently computing the cardinal direction relations between regions that are composed of sets of polygons, (a) by presenting two linear algorithms for this task, and (b) by explaining their incorporation into an actual system. These algorithms take as input two sets of polygons representing two regions. The first of the proposed algorithms is purely qualitative and computes the cardinal direction relations between the input regions. The second has a quantitative aspect and computes the cardinal direction relations with percentages between the input regions. To the best of our knowledge, these are the first algorithms that address the aforementioned problem. The algorithms have been implemented and embedded in an actual system, CARDIRECT, which allows the user to specify, edit and annotate regions of interest in an image. Then, CARDIRECT automatically computes
the cardinal direction relations between these regions. The configuration of the image and the introduced regions is persistently stored using a simple XML description. The user is allowed to query the stored XML description of the image and retrieve combinations of interesting regions on the basis of the query.

Although this part of our research addresses the problem of relation computation to a sufficient extent, there are still open issues for future research. First, we would like to evaluate our algorithms experimentally against polygon clipping methods. A second interesting topic is the possibility of combining topological [2] and distance relations [3]. Another issue is the possibility of combining the underlying model with extra thematic information and the enrichment of the employed query language on the basis of this combination. Finally, a long-term goal would be the integration of CARDIRECT with image segmentation software, which would provide a complete environment for the management of image content.
E. Clementini, P. Di Felice, and G. Califano. Composite Regions in Topological Queries. Information Systems, 20(7):579–594, 1995.
M.J. Egenhofer. Reasoning about Binary Topological Relationships. In Proceedings of SSD'91, pages 143–160, 1991.
A.U. Frank. Qualitative Spatial Reasoning about Distances and Directions in Geographic Space. Journal of Visual Languages and Computing, 3:343–371, 1992.
A.U. Frank. Qualitative Spatial Reasoning: Cardinal Directions as an Example. International Journal of GIS, 10(3):269–290, 1996.
R. Goyal and M.J. Egenhofer. The Direction-Relation Matrix: A Representation for Direction Relations Between Extended Spatial Objects. In the annual assembly and the summer retreat of the University Consortium for Geographic Information Systems Science, June 1997.
R. Goyal and M.J. Egenhofer. Cardinal Directions Between Extended Spatial Objects. IEEE Transactions on Data and Knowledge Engineering, (in press), 2000. Available at http://www.spatial.maine.edu/~max/RJ36.html.
Y.-D. Liang and B.A. Barsky. A New Concept and Method for Line Clipping. ACM Transactions on Graphics, 3(1):1–22, 1984.
G. Ligozat. Reasoning about Cardinal Directions. Journal of Visual Languages and Computing, 9:23–44, 1998.
S. Lipschutz. Set Theory and Related Topics. McGraw-Hill, 1998.
P.-G. Maillot. A New, Fast Method for 2D Polygon Clipping: Analysis and Software Implementation. ACM Transactions on Graphics, 11(3):276–290, 1992.
A. Mukerjee and G. Joe. A Qualitative Model for Space. In Proceedings of AAAI'90, pages 721–727, 1990.
J. O'Rourke. Computational Geometry in C. Cambridge University Press, 1994.
D. Papadias, Y. Theodoridis, T. Sellis, and M.J. Egenhofer. Topological Relations in the World of Minimum Bounding Rectangles: A Study with R-trees. In Proceedings of ACM SIGMOD'95, pages 92–103, 1995.
C.H. Papadimitriou, D. Suciu, and V. Vianu. Topological Queries in Spatial Databases. Journal of Computer and System Sciences, 58(1):29–53, 1999.
D.J. Peuquet and Z. Ci-Xiang. An Algorithm to Determine the Directional Relationship Between Arbitrarily-Shaped Polygons in the Plane. Pattern Recognition, 20(1):65–74, 1987.
F. Preparata and M. Shamos. Computational Geometry: An Introduction. Springer Verlag, 1985.
J. Renz and B. Nebel. On the Complexity of Qualitative Spatial Reasoning: A Maximal Tractable Fragment of the Region Connection Calculus. Artificial Intelligence, 108(1–2):69–123, 1999.
P. Rigaux, M. Scholl, and A. Voisard. Spatial Databases. Morgan Kaufmann, 2001.
S. Skiadopoulos, C. Giannoukos, P. Vassiliadis, T. Sellis, and M. Koubarakis. Computing and Handling Cardinal Direction Information (Extended Report). Technical Report TR-2003-5, National Technical University of Athens, 2003. Available at http://www.dblab.ece.ntua.gr/publications.
S. Skiadopoulos and M. Koubarakis. Composing Cardinal Direction Relations. In Proceedings of the 7th International Symposium on Spatial and Temporal Databases (SSTD'01), volume 2121 of LNCS, pages 299–317. Springer, July 2001.
S. Skiadopoulos and M. Koubarakis. Qualitative Spatial Reasoning with Cardinal Directions. In Proceedings of the 8th International Conference on Principles and Practice of Constraint Programming (CP'02), volume 2470 of LNCS, pages 341–
A Tale of Two Schemas: Creating a Temporal XML Schema from a Snapshot Schema with τXSchema

Faiz Currim¹, Sabah Currim¹, Curtis Dyreson², and Richard T. Snodgrass¹
1 University of Arizona, Tucson, AZ, USA
of an XML document can vary over time, how the document can change, and where timestamps should be placed The advantage of using annotations to denote the time-varying aspects is that logical and physical data independence for temporal schemas can be achieved while remaining fully compatible with both existing XML Schema documents and the XML Schema recommendation.
1 Introduction
XML is becoming an increasingly popular language for documents and data. XML can be approached from two quite separate orientations: a document-centered orientation (e.g., HTML) and a data-centered orientation (e.g., relational and object-oriented databases). Schemas are important in both orientations. A schema defines the building blocks of an XML document, such as the types of elements and attributes. An XML document can be validated against a schema to ensure that the document conforms to the formatting rules for an XML document (is well-formed) and to the types, elements, and attributes defined in the schema (is valid). A schema also serves as a valuable guide for querying and updating an XML document or database. For instance, to correctly construct a query, e.g., in XQuery, a user will (usually) consult the schema rather than the data. Finally, a schema can be helpful in query optimization, e.g., in constructing a path index [24].
Several schema languages have been proposed for XML [22]. From among these languages, XML Schema is the most widely used. The syntax and semantics of XML Schema 1.0 are W3C recommendations [35, 36].
Time-varying data naturally arises in both document-centered and data-centered orientations. Consider the following wide-ranging scenarios. In a university, students take various courses in different semesters. At a company, job positions and salaries change. At a warehouse, inventories evolve as deliveries are made and goods are
E. Bertino et al. (Eds.): EDBT 2004, LNCS 2992, pp. 348–365, 2004.
© Springer-Verlag Berlin Heidelberg 2004
shipped. In a hospital, drug treatment regimes are adjusted. And finally, at a bank, account balances are in flux. In each scenario, querying the current state is important, e.g., “how much is in my account right now”, but it is also often useful to know how the data has changed over time, e.g., “when has my account been below $200”.
An obvious approach would have been to propose changes to XML Schema to accommodate time-varying data. Indeed, that has been the approach taken by many researchers for the relational and object-oriented models [25, 29, 32]. As we will discuss in detail, that approach inherently introduces difficulties with respect to document validation, data independence, tool support, and standardization. So in this paper we advocate a novel approach that retains the non-temporal XML schema for the document, utilizing a series of separate schema documents to achieve data independence, enable full document validation, and enable improved tool support, while not requiring any changes to the XML Schema standard (nor subsequent extensions of that standard; XML Schema 1.1 is in development).
The primary contribution of this paper is to introduce the τXSchema (Temporal XML Schema) data model and architecture. τXSchema is a system for constructing schemas for time-varying XML documents.¹ A time-varying document records the evolution of a document over time, i.e., all of the versions of the document. τXSchema has a three-level architecture for specifying a schema for time-varying data.² The first level is the schema for an individual version, called the snapshot
schema. The snapshot schema is a conventional XML Schema document. The second level is the temporal annotations of the snapshot schema. The temporal annotations identify which elements can vary over time. For those elements, the temporal annotations also effect a temporal semantics for the various integrity constraints (such as uniqueness) specified in the snapshot schema. The third level is the physical annotations. The physical annotations describe how the time-varying aspects are represented. Each annotation can be independently changed, so the architecture has (logical and physical) data independence [7]. Data independence allows XML documents using one representation to be automatically converted to a different representation while preserving the semantics of the data. τXSchema has a suite of auxiliary tools to manage time-varying documents and schemas. There are tools to convert a time-varying document from one physical representation to a different representation, to extract a time slice from that document (yielding a conventional static XML document), and to create a time-varying document from a sequence of static documents, in whatever representation the user specifies.
As mentioned, τXSchema reuses rather than extends XML Schema. τXSchema is consistent and compatible with both XML Schema and the XML data model. In τXSchema, a temporal validator augments a conventional validator to more comprehensively check the validity constraints of a document, especially temporal constraints that cannot be checked by a conventional XML Schema validator. We describe a means of validating temporal documents that ensures the desirable property of snapshot validation subsumption. We show elsewhere how a temporal document can be smaller and faster to validate than the associated XML snapshots [12].
¹ We embrace both the document and data centric orientations of XML and will use the terms “document” and “database” interchangeably.
² Three-level architectures are a common architecture in both databases [33] and spatio-temporal conceptual modeling [21].
350 F. Currim et al.
While this paper concerns temporal XML Schema, we feel that the general approach of separate temporal and physical annotations is applicable to other data models, such as UML [28]. The contribution of this paper is two-fold: (1) introducing a three-level approach for logical data models and (2) showing in detail how this approach works for XML Schema in particular, specifically concerning a theoretical definition of snapshot validation subsumption for XML, validation of time-varying XML documents, and implications for tools operating on realistic XML schemas and data, thereby exemplifying in a substantial way the approach. While we are confident that the approach could be applied to other data models, designing the annotation specifications, considering the specifics of data integrity constraint checking, and ascertaining the impact on particular tools remain challenging (and interesting) tasks.
τXSchema focuses on instance versioning (representing a time-varying XML instance document) and not schema versioning [15, 31]. The schema can describe which aspects of an instance document change over time. But we assume that the schema itself is fixed, with no element types, data types, or attributes being added to or removed from the schema over time. Intensional XML data (also termed dynamic XML documents [1]), that is, parts of XML documents that consist of programs that generate data [26], are gaining popularity. Incorporating intensional XML data is beyond the scope of this paper.
The next section motivates the need for a new approach. Section 0 provides a theoretical framework for τXSchema, while an overview of its architecture is in Section 0. Details may be found in Section 0. Related work is reviewed in Section 0. We end with a summary and list of future work in Section 0.
This section discusses whether conventional XML Schema is appropriate and satisfactory for time-varying data. We first present an example that illustrates how a time-varying document differs from a conventional XML document. We then pinpoint some of the limitations of XML Schema. Finally, we state the desiderata for schemas for time-varying documents.
2.1 Motivating Example
Assume that the history of the Winter Olympic games is described in an XML document called winter.xml. The document has information about the athletes that participate, the events in which they participate, and the medals that are awarded. Over time the document is edited to add information about each new Winter Olympics and to revise incorrect information. Assume that information about the athletes participating in the 2002 Winter Olympics in Salt Lake City, USA was added on 2002-01-01. On 2002-03-01 the document was further edited to record the medal winners. Finally, a small correction was made on 2002-07-01.
To depict some of the changes to the XML in the document, we focus on information about the Norwegian skier Kjetil Andre Aamodt. On 2002-01-01 it was known that Kjetil would participate in the games, and the information shown in Fig 1 was added to winter.xml. Kjetil won a medal; so on 2002-03-01 the fragment was revised to that shown in Fig 2. The edit on 2002-03-01 incorrectly recorded that Kjetil won a silver medal in the Men’s Combined; Kjetil won a gold medal. Fig 3 shows the correct medal information.
Fig 2. Kjetil won a medal, as of 2002-03-01
Fig 3. Medal data is corrected on 2002-07-01
A time-varying document records a version history, which consists of the information in each version, along with timestamps indicating the lifetime of that version. Fig 4 shows a fragment of a time-varying document that captures the history of Kjetil. The fragment is compact in the sense that each edit results in only a small, localized change to the document. The history is also bi-temporal because both the valid time and transaction time lifetimes are captured [20]. The valid time refers to the time(s) when a particular fact is true in the modeled reality, while the transaction time is the time when the information was edited. The two concepts are orthogonal. Time-varying documents can have each kind of time. In Fig 4 the valid- and transaction-time lifetimes of each element are represented with an optional <rs:timestamp> sub-element.³ If the timestamp is missing, the element has the same lifetime as its enclosing element. For example, there are two <athlete> elements with different lifetimes since the content of the element changes. The last version of <athlete> has two <medal> elements because the medal information is revised. There are many different ways to represent the versions in a time-varying document; the methods differ in which elements are timestamped, how the elements are timestamped, and how changes are represented (e.g., perhaps only differences between versions are represented).
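To make the idea of a timestamped element concrete, the sketch below parses a small fragment in the spirit of Fig 4. The fragment is hypothetical: the `rs` namespace URI and the timestamp attribute names (`vtBegin`, `vtEnd`) are assumptions for illustration, not the paper's actual representation.

```python
import xml.etree.ElementTree as ET

RS = "http://example.org/rs"  # hypothetical URI for the "rs" namespace

# Two versions of the same <athlete>; the second records the medal
# added on 2002-03-01. Attribute names here are illustrative only.
fragment = """
<athleteHistory xmlns:rs="http://example.org/rs">
  <athlete name="Kjetil Andre Aamodt">
    <rs:timestamp vtBegin="2002-01-01" vtEnd="2002-03-01"/>
  </athlete>
  <athlete name="Kjetil Andre Aamodt">
    <rs:timestamp vtBegin="2002-03-01" vtEnd="9999-12-31"/>
    <medal>gold</medal>
  </athlete>
</athleteHistory>
"""

def lifetimes(root):
    """List each <athlete> version: (begin, end, medals in that version)."""
    out = []
    for athlete in root.findall("athlete"):
        ts = athlete.find(f"{{{RS}}}timestamp")
        out.append((ts.get("vtBegin"), ts.get("vtEnd"),
                    [m.text for m in athlete.findall("medal")]))
    return out

print(lifetimes(ET.fromstring(fragment)))
```

Each edit adds or closes a version rather than rewriting the whole document, which is what makes the representation compact.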
Keeping the history in a document or data collection is useful because it provides the ability to recover past versions, track changes over time, and evaluate temporal queries [17]. But it changes the nature of validating against a schema.

³ The introduced <rs:timestamp> element is in the “rs” namespace to distinguish it from any <timestamp> elements already in the document. This namespace will be discussed in more detail in Sections 0 and 0.

Fig 1. A fragment of winter.xml on 2002-01-01

Fig 4. A fragment of a time-varying document

Fig 5. An extract from the winOlympic schema

Assume that the file winOlympic.xsd contains the snapshot schema for winter.xml. The
snapshot schema is the schema for an individual version. The snapshot schema is a valuable guide for editing and querying individual versions. A fragment of the schema is given in Fig 5. Note that the schema describes the structure of the fragments shown in Fig 1, Fig 2, and Fig 3. The problem is that although individual versions conform to the schema, the time-varying document does not. So winOlympic.xsd cannot be used (directly) to validate the time-varying document of Fig 4.
The snapshot schema could be used indirectly for validation by individually reconstituting and validating each version. But validating every version can be expensive if the changes are frequent or the document is large (e.g., if the document is a database). While the Winter Olympics document may not change often, contrast this with, e.g., a Customer Relationship Management database for a large company. Thousands of calls and service interactions may be recorded each day. This would lead to a very large number of versions, making it expensive to instantiate and validate each individually. The number of versions is further increased because there can be both valid time and transaction time versions.
To validate a time-varying document, a new, different schema is needed. The schema for a time-varying document should take into account the elements (and attributes) and their associated timestamps, specify the kind(s) of time involved, provide hints on how the elements vary over time, and accommodate differences in version and timestamp representation. Since this schema will express how the time-varying information is represented, we will call it the representational schema. The representational schema will be related to the underlying snapshot schema (Fig 5), and allows the time-varying document to be validated using a conventional XML Schema validator (though not fully, as discussed in the next section).
2.2 Moving beyond XML Schema
Both the snapshot and representational schemas are needed for a time-varying document. The snapshot schema is useful in queries and updates. For example, a current query applies to the version valid now, a current update modifies the data in the current version, creating a new version, and a timeslice query extracts a previous version. All of these apply to a single version of a time-varying document, a version described by the snapshot schema. The representational schema is essential for validation and representation (storage). Many versions are combined into a single temporal document, described by the representational schema.
Unfortunately the XML Schema validator is incapable of fully validating a time-varying document using the representational schema. First, XML Schema is not sufficiently expressive to enforce temporal constraints. For example, XML Schema cannot specify the following (desirable) schema constraint: the transaction-time lifetime of a <medal> element should always be contained in the transaction-time lifetime of its parent <athlete> element. Second, a conventional XML Schema document augmented with timestamps to denote time-varying data cannot, in general, be used to validate a snapshot of a time-varying document. A snapshot is an instance of a time-varying document at a single point in time. For instance, if the schema asserts that an element is mandatory (minOccurs=1) in the context of another element, there is no way to ensure that the element is in every snapshot, since the element’s timestamp may indicate that it has a shorter lifetime than its parent (resulting in times during which the element is not there, violating this integrity constraint); XML Schema provides no mechanism for reasoning about the timestamps.
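The lifetime-containment constraint above is easy to state procedurally even though XML Schema cannot express it. A minimal sketch, assuming lifetimes are single [begin, end) ISO-date periods and reusing the hypothetical `rs:timestamp` representation with assumed `vtBegin`/`vtEnd` attributes:

```python
import xml.etree.ElementTree as ET

RS_TS = "{http://example.org/rs}timestamp"  # hypothetical namespace

def period(elem):
    """Read an element's (begin, end) period off its rs:timestamp child.
    The attribute names are assumptions made for illustration."""
    ts = elem.find(RS_TS)
    return (ts.get("vtBegin"), ts.get("vtEnd"))

def contained(child, parent):
    """True if the child period lies within the parent period.
    ISO-8601 date strings compare correctly as strings."""
    return parent[0] <= child[0] and child[1] <= parent[1]

doc = ET.fromstring("""
<athlete xmlns:rs="http://example.org/rs">
  <rs:timestamp vtBegin="2002-01-01" vtEnd="9999-12-31"/>
  <medal>
    <rs:timestamp vtBegin="2002-03-01" vtEnd="9999-12-31"/>
  </medal>
</athlete>
""")

medal = doc.find("medal")
print(contained(period(medal), period(doc)))  # constraint holds here
```

A temporal validator makes exactly this kind of check; a conventional XML Schema validator cannot, because to it the timestamps are just ordinary content.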
Even though the representational and snapshot schemas are closely related, there are no existing techniques to automatically derive a representational schema from a snapshot schema (or vice-versa). The lack of an automatic technique means that users have to resort to ad hoc methods to construct a representational schema. Relying on ad hoc methods limits data independence. The designer of a schema for time-varying data has to make a variety of decisions, such as whether to timestamp with periods or with temporal elements [16], which are sets of non-overlapping periods, and which elements should be time-varying. By adopting a tiered approach, where the snapshot XML Schema, temporal annotations, and physical annotations are separate documents, individual schema design decisions can be specified and changed, often without impacting the other design decisions, or indeed, the processing of tools. For example, a tool that computes a snapshot should be concerned primarily with the snapshot schema; the logical and physical aspects of time-varying information should only affect (perhaps) the efficiency of that tool, not its correctness. With physical data independence, few applications that are unconcerned with representational details would need to be changed.
Finally, improved tool support for representing and validating time-varying information is needed. Creating a time-varying XML document and a representational schema for that document is potentially labor-intensive. Currently a user has to manually edit the time-varying document to insert timestamps indicating when versions of XML data are valid (for valid time) or are present in the document (for transaction time). The user also has to modify the snapshot schema to define the syntax and semantics of the timestamps. The entire process would be repeated if a new timestamp representation were desired. It would be better to have automated tools to create, maintain, and update time-varying documents when the representation of the timestamped elements changes.
2.3 Desiderata
In augmenting XML Schema to accommodate time-varying data, we had several goals in mind. At a minimum, the new approach would exhibit the following desirable features.

- Simplify the representation of time for the user.
- Support a three-level architecture to provide data independence, so that changes in the logical and physical level are isolated.
- Retain full upward compatibility with existing standards and not require any changes to these standards.
- Augment existing tools such as validating parsers for XML in such a way that those tools are also upward compatible. Ideally, any off-the-shelf validating parser (for XML Schema) can be used for (partial) validation.
- Support both valid time and transaction time.
- Accommodate a variety of physical representations for time-varying data.
- Support instance versioning.

Note that while ad hoc representational schemas may meet the last three desiderata, they certainly don’t meet the first four. Other desirable features, outside the scope of this paper, include supporting schema versioning and accommodating temporal indeterminacy and granularity.
This section sketches the process of constructing a schema for a time-varying document from a snapshot schema. The goal of the construction process is to create a schema that satisfies the snapshot validation subsumption property, which is described in detail below. In the relational data model, a schema defines the structure of each relation in a database. Each relation has a very simple structure: a relation is a list of attributes, with each attribute having a specified data type. The schema also includes integrity constraints, such as the specification of primary and foreign keys. In a similar manner, an XML Schema document defines the valid structure for an XML document. But an XML document has a far more complex structure than a relation. A document is a (deeply) nested collection of elements, with each element potentially having (text) content and attributes.
3.1 Snapshot Validation Subsumption
Let D^T be an XML document that contains timestamped elements. A timestamped element is an element that has an associated timestamp. (A timestamped attribute can be modeled as a special case of a timestamped element.) Logically, the timestamp is a collection of times (usually periods) chosen from one or more temporal dimensions (e.g., valid time, transaction time). Without loss of generality, we will restrict the discussion in this section to lifetimes that consist of a single period in one temporal
dimension.⁴ The timestamp records (part of) the lifetime of an element.⁵ We will use the notation x^T to signify that element x has been timestamped. Let the lifetime of x^T be denoted as lifetime(x^T). One constraint on the lifetime is that the lifetime of an element must be contained in the lifetime of each element that encloses it.⁶

The snapshot operation extracts a complete snapshot of a time-varying document at a particular instant. Timestamps are not represented in the snapshot. A snapshot at time t replaces each timestamped element x^T with its non-timestamped copy x if t is in lifetime(x^T), or with the empty string otherwise. The snapshot operation is denoted as D = snp(t, D^T), where D is the snapshot at time t of the time-varying document D^T.

Let R be a representational schema for a time-varying document. The snapshot validation subsumption property captures the idea that, at the very least, the representational schema must ensure that every snapshot of the document is valid with respect to the snapshot schema. Let vldt(S, D) represent the validation status of document D with respect to schema S. The status is true if the document is valid and false otherwise. Validation also applies to time-varying documents, e.g., vldt(R, D^T) is the validation status of D^T with respect to a representational schema R, using a temporal validator.

Property [Snapshot Validation Subsumption]. Let S be an XML Schema document, D^T be a time-varying XML document, and R be a representational schema, also an XML Schema document. R is said to have snapshot validation subsumption with respect to S if

vldt(R, D^T) ⇒ ∀t [vldt(S, snp(t, D^T))]

⁴ The general case is that a timestamp is a collection of periods from multiple temporal dimensions (a multidimensional temporal element).

⁵ Physically, there are myriad ways to represent a timestamp. It could be represented as an <rs:timestamp> subelement in the content of the timestamped element, as is done in the fragment in Fig 4. Or it could be a set of additional attributes in the timestamped element, or it could even be a <rs:version> element that wraps the timestamped element.

⁶ Note that the lifetime captures only when an element appears in the context of the enclosing elements. The same element can appear in other contexts (enclosed by different elements), but clearly it has a different lifetime in those contexts.
Intuitively, the property asserts that a good representational schema will validate only those time-varying documents for which every snapshot conforms to the snapshot schema. The subsumption property is depicted in the following correspondence diagram.

Fig 6. Snapshot validation subsumption

Details of the process for constructing, from a snapshot schema, a schema for a time-varying document that conforms to the snapshot validation subsumption property are available in a technical report by the authors [12].
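The snapshot operation snp(t, D^T) can be sketched procedurally. The sketch below is not the authors' implementation: it assumes the hypothetical timestamp representation used earlier (an `rs:timestamp` child with assumed `vtBegin`/`vtEnd` attributes) and prunes any element whose lifetime excludes t, stripping the timestamps themselves from the result.

```python
import xml.etree.ElementTree as ET

RS_TS = "{http://example.org/rs}timestamp"  # hypothetical namespace

def snp(t, elem):
    """Snapshot at time t: drop elements not alive at t and strip
    timestamps (timestamps are not represented in the snapshot).
    Returns None if elem itself is excluded at t."""
    ts = elem.find(RS_TS)
    if ts is not None and not (ts.get("vtBegin") <= t < ts.get("vtEnd")):
        return None  # t is outside this element's lifetime
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text = elem.text
    for child in elem:
        if child.tag == RS_TS:
            continue  # omit the timestamp itself
        kept = snp(t, child)
        if kept is not None:
            out.append(kept)
    return out

history = ET.fromstring("""
<athlete name="Kjetil" xmlns:rs="http://example.org/rs">
  <medal>gold<rs:timestamp vtBegin="2002-03-01" vtEnd="9999-12-31"/></medal>
</athlete>
""")

before = snp("2002-01-01", history)  # medal not yet present
after = snp("2002-04-01", history)   # medal included, timestamp stripped
print(len(before.findall("medal")), len(after.findall("medal")))
```

Checking subsumption then amounts to validating each distinct snapshot produced this way against the snapshot schema, which is exactly what a conventional validator can do one version at a time.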
The architecture of τXSchema is illustrated in Fig 7. This figure is central to our approach, so we describe it in detail and illustrate it with the example. We note that although the architecture has many components, only those components shaded gray in the figure are specific to an individual time-varying document and need to be supplied by a user. New time-varying schemas can be quickly and easily developed and deployed. We also note that the representational schema, instead of being the only schema as in an ad hoc approach, is merely an artifact in our approach, with the snapshot schema, temporal annotations, and physical annotations being the crucial specifications to be created by the designer.
The designer annotates the snapshot schema with temporal annotations (box 6).
The temporal annotations together with the snapshot schema form the logical schema.
Fig 8 provides an extract of the temporal annotations on the winOlympic schema. The temporal annotations specify a variety of characteristics, such as whether an element or attribute varies over valid time or transaction time, whether its lifetime is described as a continuous state or a single event, whether the item itself may appear at certain times (and not at others), and whether its content changes. For example, <athlete> is described as a state element, indicating that the <athlete> will be valid over a period (continuous) of time rather than a single instant. Annotations can be nested, enabling the target to be relative to that of its parent, and inheriting as