Virtual Environments for Geospatial Applications
Critical Issues in the Design and Implementation of Geospatial Virtual Environments
This section briefly discusses some limitations and constraints typically encountered in the generation of virtual worlds. One noteworthy issue is that, in visualizing real-world scenarios, there is an inevitable trade-off between performance and resolution. Exploiting the complete capabilities of virtual environments over the Web continues to pose problems. As the number of objects in a virtual environment increases, online hosting becomes an issue, because rendering numerous objects on the fly is no easy task. Scenes with a greater number of polygons slow the system down and degrade interactivity. Several factors need to be considered during visualization, such as the type and volume of data to be visualized, memory constraints, and system performance. Table 2 presents a summary of the significant issues concerning geo-virtual environments. In their work on information visualization, Robertson et al. (1993) presented a terse compilation of the important issues.
Figure 8. a) 3D virtual environment depicting geospatial processes (1 picture of a series) such as landscape change over time etc.; b) 3D virtual environment depicting water flow in a reservoir

Table 2. A summary of critical issues in designing and implementing 3D virtual worlds
Photo-realistic scene generation: Generating 3D virtual environments with adequate photo-realism
Bandwidth limitations: 3D scenes with numerous objects, rendering difficulty, and transmission speed
Browser and plug-in compatibility: Compatibility among various browsers as well as plug-ins
User navigation capabilities: Users need skills to navigate and situate themselves within immersive virtual worlds
Lag in real-time interaction: Complex scenes not only take time to render, but also cause delays/lags during navigation/interaction
Data integrity and online security issues: Sensitive data must be protected, and the data represented by such 3D worlds should be up-to-date

In their work on dynamic and interactive web-based visualizations, Huang and Lin (1999, 2001, 2002) discuss some of these concerns in detail and also address critical issues concerning the online hosting of interactive visualizations. The Java-3D based hybrid method that Huang and Lin (1999, 2001) propose offers a standard framework for visualizing dynamic environmental processes.
Figure 9 illustrates a 3-tier configuration that Huang and Lin (1999) proposed in GeoVR. The visualization server, which is linked to the spatial database, retrieves the geospatial information from the data repository, and the web server accesses the visualization server for 3D information. This framework efficiently handles requests for visualizing dynamic processes; based on the client request, the web server provides the appropriate information in conventional HTML or 3D VRML format.
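The request flow of this 3-tier configuration can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GeoVR implementation: the names `SPATIAL_DB`, `visualization_server`, and `web_server`, the sample coordinates, and the `wants_3d` flag are all hypothetical.

```python
# Hypothetical stand-in for the spatial database tier (layer name -> 3D points).
SPATIAL_DB = {
    "reservoir": [(0.0, 0.0, 10.0), (5.0, 0.0, 12.0), (5.0, 5.0, 11.0)],
}


def visualization_server(layer: str) -> str:
    """Middle tier: read geometry from the database and emit a small VRML scene."""
    points = SPATIAL_DB[layer]
    coords = ", ".join(f"{x} {y} {z}" for x, y, z in points)
    return (
        "#VRML V2.0 utf8\n"
        "Shape { geometry PointSet { coord Coordinate { point [ "
        + coords
        + " ] } } }\n"
    )


def web_server(layer: str, wants_3d: bool):
    """Front tier: return (content type, body) depending on the client request."""
    if wants_3d:
        # 3D clients receive the scene produced by the visualization server.
        return "model/vrml", visualization_server(layer)
    # Other clients receive a conventional HTML summary.
    count = len(SPATIAL_DB[layer])
    return "text/html", f"<html><body>{layer}: {count} points</body></html>"
```

A client that asks for the 3D view would call `web_server("reservoir", True)` and receive VRML text; the same layer requested with `wants_3d=False` comes back as HTML.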
Discussion and Conclusion
Over the past several decades, information presentation has inspired the development of several new tools and techniques. The information revolution has resulted in vast amounts of data that are far too complex, both in quality and quantity, to be handled by conventional tools and techniques. Recent technological advances in the realm of remote sensing have dramatically increased the amount of geospatial data available. Virtual environments are an efficient means of visualizing voluminous geospatial data and of elucidating intricate patterns as well as hidden and associated information. Such virtual environments facilitate understanding of the complex relationships among the various components of a multi-level scenario.
This paper discussed the design and implementation of virtual worlds that can be used to generate both static representations depicting real-world settings and dynamic representations that simulate geospatial processes and environmental phenomena. It discussed the generation of such geo-virtual environments with examples and explained how such geo-visualization applications facilitate understanding of various geospatial phenomena and environmental processes. The fundamental principles underlying the generation of virtual worlds, both static and dynamic, were elaborated, and the common issues involved in the generation of such 3D virtual worlds were discussed. Furthermore, the issues related to the online hosting of such virtual environments were briefly outlined, and possible solutions to frequently encountered problems were provided.

Figure 9. Online hosting of interactive visualization (From Huang et al., 1999)
References
Bonham-Carter, G. F. (1994). Geographic Information Systems for Geoscientists: Modeling with GIS. Oxford: Pergamon (p. 398).
Boyd, D. S., Lansdown, J., & Huxor, A. (1996). The Design of Virtual Environments. SIMA.
Chandramouli, M., Lan-Kun, C., Tien-Yin, C., & Ching-Yi, K. (2004). Design and Implementation of Virtual Environments for Visualizing 3D Geospatial Data. TGIS Conference, Oct 28-29, 2004.
Chandramouli, M., Huang, B., Yin Chou, T., Kun Chung, L., & Wu, Q. (2006). Design and Implementation of Virtual Environments for Planning and Building Sustainable Railway Transit Systems. COMPRAIL, July 2006, Prague.
Colin, W. (2000). Information Visualization: Perception for Design. Morgan Kaufmann Series in Interactive Technologies.
GeoVRML (www.geovrml.org).
Huang, B., & Lin, H. (1999). GeoVR: A Web-based tool for virtual reality presentation from 2D GIS data. Computers & Geosciences, 25(10), 1167-1175.
Huang, B., Jiang, B., & Lin, H. (2001). An integration of GIS, virtual reality and the Internet for spatial data exploration. International Journal of GIS, 15(5), 439-456.
Huang, B., & Lin, H. (2002). A Java/CGI approach to developing a geographic virtual reality toolkit on the Internet. Computers & Geosciences, 28(1), 13-19.
Karel, C., & Jiri, Z. (n.d.). Using VRML for creating interactive demonstrations of physical models. Department of Computer Science and Engineering, Czech Technical University.
Robertson, G., Card, S., & Mackinlay, J. D. (1993). Information Visualization Using 3D Interactive Animation. Communications of the ACM, 36.
Sutherland, I. E. (1965). The ultimate display. In Proceedings of the IFIPS Congress, 2, 506-508. New York City, NY.
Key Terms
Immersion: A sense of being present within the virtual world and of being able to visualize objects by being amidst their surroundings and navigating through the world.
Node: An entity within the hierarchical scene structure that represents a group of objects.
OpenSource: Source code or computer software that is freely offered and is available to the public for building software applications.
Scene-Hierarchy: The organization of the elements of a 3D virtual scene into successive levels, in such a way that the object under which other objects are grouped is called the parent and the grouped objects are called its children. When a parent object is transformed, the children are also transformed.
SCRIPT: Program scripts that are used to perform calculations and return values to the calling programs.
Transformation: Operations such as translation, rotation, or scaling involving objects in a virtual environment.
Virtual Reality: A three-dimensional visual immersive setting that allows the user to navigate within the scene and perform operations in real time.
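The Node, Scene-Hierarchy, and Transformation entries above can be illustrated with a minimal sketch. The `Node` class and its fields are hypothetical and do not correspond to any particular VRML or Java 3D API; only translation and uniform scaling are modeled, and rotation is left out to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A scene-graph node with a local translation and a uniform scale."""
    name: str
    translation: tuple = (0.0, 0.0, 0.0)
    scale: float = 1.0
    children: list = field(default_factory=list)

    def world_positions(self, parent_pos=(0.0, 0.0, 0.0), parent_scale=1.0):
        """Return {node name: world position}, propagating parent transforms."""
        # A node's local offset is stretched by the scale accumulated above it.
        pos = tuple(p + parent_scale * t
                    for p, t in zip(parent_pos, self.translation))
        total_scale = parent_scale * self.scale
        result = {self.name: pos}
        for child in self.children:
            result.update(child.world_positions(pos, total_scale))
        return result


# A parent "building" with one child "roof" placed 5 units above it.
roof = Node("roof", translation=(0.0, 0.0, 5.0))
building = Node("building", translation=(10.0, 0.0, 0.0), children=[roof])
```

Moving or scaling `building` changes the world position reported for `roof` even though the roof's own local translation is never edited, which is the parent-child behavior the Scene-Hierarchy definition describes.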
Managing Uncertainty in Geospatial Predictive Models
Cleveland State University, USA
Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Abstract
Geospatial predictive models often require mapping of predefined concepts or categories with various conditioning factors in a given space. This chapter discusses various aspects of uncertainty in predictive modeling by characterizing different typologies of classification uncertainty. It argues that understanding uncertainty semantics is a prerequisite for efficient handling and management of predictive models.
1 Spatial Prediction and Classification
Geospatial predictive models entail an array of analytical techniques from data mining, classical statistics, and geostatistics that attempt to predict the spatial states and behavior of objects from a finite set of observations. The process of prediction presupposes a set of spatial concepts and categories to which objects are to be mapped. For example, spatial processes such as classification of land cover from satellite images, modeling of forest fires, propagation of epidemics, and prediction of urban sprawl require a unifying and common reference of “space” or location where the multiple features of spatial attributes are mapped to predefined class labels. The prediction of spatial features can be conceived as a process of deriving classification schemes in relation to certain spatial properties such as neighborhood, proximity, and dependency, as well as similarity of non-spatial attributes (Han & Kamber, 2006; Shekhar & Chawla, 2003). In data mining, a classification function is often defined as a mapping function:
f : A → C, where A, the domain of the function f, represents the attribute space and C is the set of class categories.
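As a minimal sketch of such a mapping, the function below assigns each point of a two-dimensional attribute space to one class label. The attributes (an NDVI value and an elevation in metres), the class set, and every threshold are hypothetical choices made only for illustration; they are not taken from the chapter.

```python
from typing import Tuple

# Hypothetical attribute space A: (NDVI value, elevation in metres).
Attributes = Tuple[float, float]

# Hypothetical set of class categories C.
CLASSES = ("water", "forest", "grassland", "bare")


def classify(a: Attributes) -> str:
    """A toy classification function f : A -> C built from fixed thresholds."""
    ndvi, elevation = a
    if ndvi < 0.0:
        return "water"
    if ndvi >= 0.5:
        # Dense vegetation is labeled forest only below an assumed tree line.
        return "forest" if elevation < 2000.0 else "grassland"
    if ndvi >= 0.2:
        return "grassland"
    return "bare"
```

Every element of A is mapped to exactly one element of C, so the function leaves no room to express doubt about a label; the uncertainty typologies discussed in the following sections address that limitation.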
2 Uncertainty in Spatial Classification
Uncertainty may emerge from ontological constraints in classification, i.e., from the lack of specification of what kind of spatial objects exist, as well as from epistemic limitations, which concern whether such objects are knowable to subjective schemes and, if so, to what extent they can be represented in the subjective framework given the limited empirical evidence. Epistemic uncertainty in spatial classification emerges from inadequate representation of spatial knowledge, which is often incomplete, imprecise, fragmentary, and ambiguous. The attributes of spatial objects, or the evidence suggesting various conceptual or thematic classes, may often suggest conflicting categories. Moreover, classification labels depend on the resolution of observation and the extent of granularity. For example, an observation of coarser granularity offers less detail, while the clumping of information into pixels in remotely sensed images may prevent sub-pixel entities from being distinguished (Fisher, 1997). The classification of land cover from a satellite image depends not only on a specific spatial resolution: radiometric resolution and the corresponding spectral signatures also limit predictive accuracy. Therefore, the spatial characteristics of a given observation are indiscernible with respect to the attributes associated with it. For example, the number of vegetation types that can be identified from an NDVI (Normalized Difference Vegetation Index) image increases significantly when a very high radiometric resolution is used. Moreover, in a specific case, a multispectral image may provide more accuracy than a hyperspectral image, but such accuracy is of little value if it is achieved at the cost of less specificity or higher imprecision.
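The effect of radiometric resolution on NDVI can be made concrete with a small sketch. NDVI is computed with the standard formula (NIR - Red) / (NIR + Red); everything else here is an assumption made for illustration: the reflectance values are invented, and `quantize` is a simplified model of a sensor that stores reflectance in a fixed number of bits.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from two band values."""
    return (nir - red) / (nir + red)


def quantize(reflectance: float, bits: int) -> float:
    """Snap a reflectance in [0, 1] to the nearest level a `bits`-bit sensor records."""
    levels = 2 ** bits - 1
    return round(reflectance * levels) / levels


def distinct_ndvi_values(pixels, bits: int) -> int:
    """Count how many different NDVI values survive quantization."""
    values = set()
    for nir, red in pixels:
        q_nir, q_red = quantize(nir, bits), quantize(red, bits)
        if q_nir + q_red > 0:  # skip pixels whose bands both quantize to zero
            values.add(round(ndvi(q_nir, q_red), 6))
    return len(values)


# Four hypothetical vegetation pixels as (NIR, red) reflectances.
PIXELS = [(0.41, 0.08), (0.45, 0.08), (0.49, 0.08), (0.53, 0.08)]
```

Comparing `distinct_ndvi_values(PIXELS, 2)` with `distinct_ndvi_values(PIXELS, 8)` shows the coarse sensor collapsing pixels that the finer sensor keeps apart, which is the sense in which the pixels become indiscernible at low radiometric resolution.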
3 Typologies of Classification Uncertainty
While there is increasing awareness of uncertainty, and of its aspects and dimensions, in predictive as well as classificatory schemes, little agreement exists among experts on how to characterize them. Many typologies of uncertainty have been suggested from the risk analysis perspective, which often overlap and build on each other (Ferson & Ginzburg, 1996; Linkov & Burmistrov, 2003; Regan et al., 2002). These typologies make distinctions between variability and lack of knowledge at the parameter and model level. However, from the geographic information perspective, the ontological specification of imperfection in geographic data provides some key vocabularies and taxonomies to deal with spatial uncertainties (Duckham et al., 2001; Worboys & Clementini, 2001). Such an ontology distinguishes between inaccuracy (i.e., errors of commission or omission) and imprecision, which arises from limitations on the granularity of the schema or the level of detail under which an observation is made (Worboys, 1998). The concept of “vagueness” refers to indeterminate borderline cases or “inexact concepts”. Classification of geographic objects with indeterminate boundaries poses many challenges (Burrough & Frank, 1996), which emerge from the boundaries of many real entities representing natural, social, or cultural phenomena (for example, forests, mountains, and areas of ethnic distribution). Since many common geographical concepts are vague (Fisher, 2000), the explicit specification of vagueness is essential to characterize classification performance. As a special type of vagueness, nonspecificity originates from our inability to discern the true alternative among several alternatives in a given context. It implies a cardinality of undiscerned alternatives (Klir & Yuan, 1995). The larger the set of alternatives, the higher the nonspecificity. For example, in a remotely sensed image, a pixel with class type “forest” and a mean annual temperature > 30 °C has less nonspecificity than a pixel labeled only with the “forest” type. This is because in the latter case the pixel can have a large number of possible variations of the “forest” type.
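One common way to quantify this, assuming a crisp (non-fuzzy) set of alternatives, is the Hartley measure, which takes the base-2 logarithm of the number of alternatives that remain undiscerned. The forest types and the subset compatible with the temperature condition below are hypothetical and serve only to make the pixel example computable.

```python
import math


def nonspecificity(alternatives) -> float:
    """Hartley measure: log2 of the number of undiscerned alternatives."""
    if not alternatives:
        raise ValueError("at least one alternative is required")
    return math.log2(len(alternatives))


# Hypothetical variations hidden behind the single label "forest".
FOREST_TYPES = {
    "tropical rainforest",
    "tropical dry forest",
    "temperate deciduous",
    "temperate coniferous",
    "boreal",
}

# Hypothetical subset still possible once mean annual temperature > 30 °C is known.
WARM_FOREST_TYPES = {"tropical rainforest", "tropical dry forest"}
```

With these assumed sets, the pixel labeled only "forest" leaves five alternatives open, while the extra temperature attribute narrows them to two, so `nonspecificity(WARM_FOREST_TYPES)` is smaller than `nonspecificity(FOREST_TYPES)`. A single remaining alternative gives a value of zero.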
Broadly, three major categories of uncertainty can be identified in dealing with predictive and classificatory problems: ontological uncertainty, epistemological uncertainty, and deontological or normative uncertainty. The typology illustrated in Figure 1 is relevant mainly to geospatial data and includes many important components and concepts provided in Morgan & Henrion (1998), Finkel (1990), Cullen & Frey (1999), and Haimes (2004). The types presented here are by no means mutually exclusive, i.e., some concepts may subtly overlap each other in a specific context.
Ontologically, variability, also known as aleatory or objective uncertainty, occurs when the object that needs to be classified actually exhibits multiplicity across space, time, and scale. An empirical quantity measured at a single point may objectively manifest multiple aspects in a collective process. For example, land cover classes are influenced not only by seasonal and spatial extent; the topographic formation, owing to the self-similar features of geological objects, also requires that the fractal dimension of a classification scheme be specified. The spurious correlation representing the so-called ecological fallacy, which results from the modifiable areal unit problem (MAUP) (Openshaw, 1984), indicates the need for adequate disaggregation of the spatial data to be analyzed. In image processing, uncertainty often arises from the assignment of more than one class to a pixel. This specialized type of pixel, often known as a mixel, indicates uncertainty resulting from variability. Similarly, variability, or the degree of spatial heterogeneity, is also reflected in measures of landscape fragmentation. The uncertainty stemming from variability cannot be handled by a reductionist approach but needs to be managed by a process of data disaggregation. Measures often used to manage this kind of uncertainty are: estimating the space-time frequency distribution, disaggregation by pixel unmixing or decoupling, estimating entropy as an indicator of fragmentation, computing self-similarity and fractal dimension (Kallimanis et al., 2002), and multiscale and multiresolution analysis using wavelets (Kolaczyk et al., 2005; Nychka et al., 2001).

Figure 1. Types of uncertainty in dealing with geospatial predictive and classificatory systems
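Of the measures listed above, entropy is the simplest to sketch. The function below computes the Shannon entropy of the class proportions inside a window of pixels; treating this as a fragmentation indicator is an assumption of the example, since it captures how mixed the classes are but not how they are arranged in space. The window contents are invented.

```python
import math
from collections import Counter


def landscape_entropy(labels) -> float:
    """Shannon entropy (in bits) of the class proportions in a pixel window."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical 2 x 2 windows flattened into lists of class labels.
HOMOGENEOUS = ["forest", "forest", "forest", "forest"]
MIXED = ["forest", "water", "urban", "grass"]
```

A window holding a single class has zero entropy, and the value grows as more classes share the window in more even proportions. The same function could be applied to the class fractions estimated for a single mixel after unmixing.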
While the origin of uncertainty due to variability is objective and ontological in nature, parameter uncertainty and model uncertainty reflect the epistemic state or lack of knowledge in a classificatory scheme. A parameter is an empirical quantity that is measurable in principle and is part of the system components or the construct of a definition. Parameter uncertainty results mainly from measurement error and sampling error. For example, the misclassification rate of a land cover classification, measured by the so-called errors of commission or omission, is only as good as the choice of sampling scheme, the systematic bias introduced by the selection of space-time boundary conditions, the level of precision, and other parameters internal to the system. Moreover, the selection of parameters may depend on the degree of variability. A high degree of spatial heterogeneity requires an intensive sampling scheme across multiple scales. Quantitatively, parameter uncertainty can be modeled with a probability distribution based on the statistical variance of the observed error; e.g., a Gaussian distribution can be used to predict the relative abundances of different magnitudes of error, or a Monte Carlo simulation can be performed to estimate the effect of error on a digital elevation model (Heuvelink, 1998; Longley et al., 2001).
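A minimal Monte Carlo sketch of this idea is given below. It perturbs two neighboring elevation values with Gaussian error and observes the spread of the slope derived from them. The elevations, the cell spacing, and the error standard deviation are hypothetical, and the errors in the two cells are assumed to be independent, which ignores the spatial autocorrelation a real error model would normally include.

```python
import math
import random
import statistics


def slope_percent(z_a: float, z_b: float, spacing: float) -> float:
    """Slope between two neighboring DEM cells, in percent."""
    return 100.0 * (z_b - z_a) / spacing


def monte_carlo_slope(z_a, z_b, spacing, sigma, runs=10000, seed=42):
    """Return (mean, standard deviation) of slope under Gaussian elevation error."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    samples = [
        slope_percent(z_a + rng.gauss(0.0, sigma),
                      z_b + rng.gauss(0.0, sigma),
                      spacing)
        for _ in range(runs)
    ]
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical case: 30 m cells, elevations 100 m and 103 m, 1 m elevation error.
mean_slope, slope_sd = monte_carlo_slope(100.0, 103.0, 30.0, 1.0)
```

Under the independence assumption the simulated spread can be checked against the analytical value 100 * sqrt(2) * sigma / spacing, which is about 4.7 percentage points here; that is large relative to the nominal 10 percent slope and shows how a modest elevation error propagates into a derived parameter.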
Model uncertainty, sometimes called informative uncertainty (van Asselt, 1999), is due to limitations in the ability to represent or model real-world processes with the given data. Although both parameter uncertainty and model uncertainty represent the epistemic or subjective aspect of the state of our knowledge, the line between these two types of uncertainty cannot be sharply drawn, because the choice of model form has implications for a parameter, and the parameter itself can be the output of complex models (Krupnick et al., 2006). Many schemes have been developed to formalize the uncertainty due to the limitations of models. The intolerance of classical probability theory to imprecision has led to many alternative formulations of uncertainty models. For example, traditional classification models such as multi-source classification (Lee & Swain, 1987) or the so-called maximum likelihood classification (Tso & Mather, 2001) allow no room for expressing the modeler’s ignorance in the model construct. This has led to new model constructs such as the interval representation in Dempster-Shafer evidence theory (Shafer, 1976), where all possible subsets of the frame of discernment are candidate classes for the belief function. The belief in a class is the sum of the probability masses of all evidence that supports it, and the plausibility is one minus the sum of the masses of all evidence that contradicts it. The uncommitted belief is assigned to the frame of discernment, thus allowing representation of the modeler’s ignorance. The evidential reasoning approach has been adopted for multi-source remotely sensed images (Lee & Swain, 1987; Srinivasan & Richards, 1990; Wilkinson & Megier, 1990). Rough set theory (Pawlak, 1992), a variant of multivalued logic, has recently been used to model vagueness and imprecision by means of an upper and a lower approximation. Ahlqvist et al. (2000) used a rough set-based classification and accuracy assessment method to construct a rough confusion matrix. In the integration model of rough set theory and evidence theory, the “belief” is extracted from the lower approximation of a set and the “plausibility” from the upper approximation (Skowron & Grzymalla-Busse, 1994). In spatial prediction, this approach was further extended by introducing evidence from spatial neighborhood contexts (Sikder & Gangapadhayay, 2007). Using rough-fuzzy hybridization and the cognitive theory of conceptual spaces, a parameterized representation of classes is modeled as a collection of rough-fuzzy properties, where an attribute itself can be treated as a special case of a concept. In spatial classification, the fuzzy approach is mainly used to provide a flexible way to represent categorical continua (Foody, 1995). In this approach, instead of concept hierarchies being defined explicitly, different conceptual structures emerge through measures of concept inclusion and similarity, and fuzzy categorical data are presented in terms of fuzzy membership (Cross & Firat, 2000; Robinson, 2003; Yazici & Akkaya, 2000).
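The two interval constructs described above can be sketched side by side. The first pair of functions uses the standard Dempster-Shafer definitions over a mass function; the second computes rough-set lower and upper approximations, whose relative sizes are taken here as a belief-plausibility pair in the spirit of the integration mentioned in the text. All class names, mass values, and attribute tuples are hypothetical.

```python
def belief(mass, hypothesis):
    """Sum of the masses of focal sets wholly contained in the hypothesis."""
    return sum(m for focal, m in mass.items() if focal <= hypothesis)


def plausibility(mass, hypothesis):
    """Sum of the masses of focal sets that overlap the hypothesis."""
    return sum(m for focal, m in mass.items() if focal & hypothesis)


# Hypothetical frame of discernment and mass assignment for one pixel.
FRAME = frozenset({"forest", "grass", "water"})
MASS = {
    frozenset({"forest"}): 0.5,
    frozenset({"forest", "grass"}): 0.2,
    FRAME: 0.3,  # uncommitted belief, i.e. the modeler's ignorance
}


def rough_approximations(objects, target_class):
    """Return (lower, upper) approximations of a class as sets of object indices.

    `objects` is a list of (attribute tuple, class label) pairs; objects that
    share the same attribute tuple are indiscernible from one another.
    """
    blocks = {}
    for i, (attrs, _) in enumerate(objects):
        blocks.setdefault(attrs, set()).add(i)
    target = {i for i, (_, label) in enumerate(objects) if label == target_class}
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:
            lower |= block  # every indiscernible object belongs to the class
        if block & target:
            upper |= block  # at least one indiscernible object belongs to it
    return lower, upper


# Hypothetical observations: (slope, moisture) attributes with a class label.
OBJECTS = [
    (("steep", "wet"), "forest"),
    (("steep", "wet"), "forest"),
    (("flat", "wet"), "forest"),
    (("flat", "wet"), "grass"),
    (("flat", "dry"), "grass"),
]
LOWER, UPPER = rough_approximations(OBJECTS, "forest")
ROUGH_BELIEF = len(LOWER) / len(OBJECTS)
ROUGH_PLAUSIBILITY = len(UPPER) / len(OBJECTS)
```

In both halves the belief never exceeds the plausibility, and the gap between them is the part of the evidence that neither confirms nor rules out the class. In the mass-function example that gap for "forest" comes from the mass left on the two-class set and on the whole frame.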
Deontological or normative uncertainty is associated with the consequentiality paradigm of decisions or value judgments, e.g., in multicriteria classification, risk perception, and preference elicitation. There has been extensive research from the behavioral decision-theoretic perspective to understand human judgment under uncertainty (Tversky & Kahneman, 1974). The heuristics that decision makers use (Kahneman et al., 1982) can lead to biases in many spatial decision-making scenarios, such as watershed prioritization, location or facility planning, and habitat suitability modeling. Uncertainty may also spring from conflicting value-laden terms or preference-ordered criteria (Li et al., 2004; 2005). The preference order induced from a set of attributes may contradict the assignment of the degree of risk classes, resulting in potentially paradoxical inference. Pöyhönen & Hämäläinen (2001) showed that the use of weights based on the rank order of attributes can easily lead to biases when the structure of a value tree is changed. While it is difficult to extract complete preferential information, research is under way on working with information-gap uncertainty in preferences by using the graph model for conflict resolution (Ben-Haim & Hipel, 2002).
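How rank-based weights react to a change in the value tree can be illustrated with a small, purely hypothetical calculation. The sketch uses rank-sum weights, one simple rank-order scheme chosen here as an assumption (the cited study is not claimed to use this exact scheme), and invented site-selection attributes.

```python
from fractions import Fraction


def rank_sum_weights(ranked_attributes):
    """Rank-sum weights: the attribute at rank r of n gets (n - r + 1) / (1 + ... + n)."""
    n = len(ranked_attributes)
    total = n * (n + 1) // 2
    # enumerate() starts at 0, so rank r = i + 1 and the numerator is n - i.
    return {name: Fraction(n - i, total)
            for i, name in enumerate(ranked_attributes)}


# Hypothetical flat value tree, most important attribute first.
FLAT = rank_sum_weights(["cost", "habitat", "access"])

# Same problem after "habitat" is split into two sub-attributes.
SPLIT = rank_sum_weights(["cost", "habitat_quality", "habitat_area", "access"])
HABITAT_AFTER_SPLIT = SPLIT["habitat_quality"] + SPLIT["habitat_area"]
```

In the flat tree the habitat criterion receives one third of the total weight, while after the split its two parts together receive one half, although the decision maker's ordering of concerns has not changed. This structural sensitivity is the kind of bias the paragraph above refers to.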
4 Conclusion
Uncertainty in spatial predictive and classificatory systems is an endemic and multi-faceted aspect. Recognition of, and agreement on, the appropriate characterization and definition of typologies of uncertainty semantics are prerequisites for efficient handling and management. This article characterizes the objective, subjective, and normative aspects of uncertainty. It specifically differentiates uncertainty resulting from lack of knowledge from that resulting from objective variability or the intrinsic properties of spatial systems. Various new directions in uncertainty handling mechanisms are discussed. While there are currently many promising directions of research in managing different types of uncertainty, a new paradigm is required in spatial analysis that is fundamentally driven by the consideration of uncertainty.
References
[...] Gap Uncertainty in Preferences. Applied Mathematics and Computation, 126, 319-340.
Burrough, P., & Frank, A. (1996). Geographic Objects with Indeterminate Boundaries. London: Taylor and Francis.
Cross, V., & Firat, A. (2000). Fuzzy objects for geographical information systems. Fuzzy Sets and Systems, 113, 19–36.
Cullen, A. C., & Frey, H. C. (1999). Probabilistic Techniques in Exposure Assessment: A Handbook for Dealing with Variability and Uncertainty in Models and Inputs. New York, NY: Plenum Press.
Duckham, M., Mason, K., Stell, J., & Worboys, M. F. (2001). A formal approach to imperfection in geographic information. Computers, Environment and Urban Systems, 25, 89-103.
Ferson, S., & Ginzburg, L. R. (1996). Different Methods Are Needed to Propagate Ignorance and Variability. Reliability Engineering and Systems Safety, 54, 133-144.
Finkel, A. M. (1990). Confronting Uncertainty in Risk Management: A Guide for Decision Makers. Washington, DC: Resources for the Future, Center for Risk Management.
Fisher, P. (1997). The pixel: a snare and a delusion. International Journal of Remote Sensing, 18(3), 679-685.
Fisher, P. F. (2000). Sorites paradox and vague geographies. Fuzzy Sets and Systems, 113, 7-18.
Foody, G. M. (1995). Land cover classification by an artificial neural network with ancillary information. International Journal of Geographical Information Systems, 9(5), 527-542.
Haimes, Y. Y. (2004). Risk Modeling, Assessment, and Management. Hoboken, NJ: Wiley.
Han, J., & Kamber, M. (2006). Data Mining: Concepts and Techniques. Boston: Morgan Kaufmann.
Heuvelink, G. (1998). Error Propagation in Environmental Modeling with GIS. London: Taylor and Francis.
Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under Uncertainty: Heuristics and Biases. Cambridge: Cambridge University Press.
Kallimanis, A. S., Sgardelis, S. P., & Halley, J. M. (2002). Accuracy of fractal dimension estimates for small samples of ecological distributions. Landscape Ecology, 17(3), 281-297.
Klir, G., & Yuan, B. (1995). Fuzzy Sets and Fuzzy Logic: Theory and Applications. Pearson.
Krupnick, A., Morgenstern, R., Batz, M., Nelson, P., Burtraw, D., Shih, J., et al. (2006). Not a Sure Thing: Making Regulatory Choices under Uncertainty. U.S. EPA.
Lee, R. T., & Swain, P. H. (1987). Probabilistic and evidential approach for multisource data analysis. IEEE Transactions on Geoscience and Remote Sensing, 25, 283-293.
Li, K. W., Hipel, K. W., Kilgour, D. M., & Noakes, D. (2005). Integrating Uncertain Preferences into Status Quo Analysis with Applications to an Environmental Conflict. Group Decision and Negotiation, 14(6), 461-479.
Li, K. W., Hipel, K. W., Kilgour, D. M., & Fang, L. (2004). Preference Uncertainty in the Graph Model for Conflict Resolution. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 34(4), July.
Linkov, I., & Burmistrov, D. (2003). Model Uncertainty and Choices Made by Modelers: Lessons Learned from the International Atomic Energy Agency Model Intercomparisons. Risk Analysis, 23(6), 1297–1308.
Longley, P. A., Goodchild, M. F., Maguire, D. J., & Rhind, D. W. (2001). Geographic Information Systems and Science. Chichester, UK: John Wiley & Sons.
Morgan, M. G. (1998). Uncertainty Analysis in Risk Assessment. Human and Ecological Risk Assessment, 4(1), 25–39.
Nychka, D., Wikle, C., & Royle, J. A. (2001). Multiresolution models for nonstationary spatial covariance functions. Statistical Modelling, 2(4), 315-331.
Openshaw, S. (1984). The modifiable areal unit problem. In Concepts and Techniques in Modern Geography (Vol. 38). Norwich, UK: Geo Books.
Pawlak, Z. (1992). Rough sets: a new approach to vagueness. New York: John Wiley & Sons.
Pöyhönen, M., & Hämäläinen, R. P. (2001). On the Convergence of Multiattribute Weighting Methods. European Journal of Operational Research, 129(3), 569-585.
Regan, H. M., Colyvan, M., & Burgman, M. A. (2002). A Taxonomy and Treatment of Uncertainty for Ecology and Conservation Biology. Ecological Applications, 12(2), 618–628.
Robinson, V. B. (2003). A perspective on the fundamentals of fuzzy sets and their use in geographic information systems. Transactions in GIS, 7, 3–30.
Shafer, G. (1976). A Mathematical Theory of Evidence. New Jersey: Princeton University Press.
Shekhar, S., & Chawla, S. (2003). Spatial Databases: A Tour. New Jersey: Prentice Hall.
Sikder, I., & Gangapadhayay, A. (2007). Managing Uncertainty in Location Services Using Rough Set and Evidence Theory. Expert Systems with Applications, 32(2), 386-396.
Skowron, A., & Grzymalla-Busse, J. (1994). From rough set theory to evidence theory. In R. Yager, M. Fedrizzi & J. Kacprzyk (Eds.), Advances in the Dempster-Shafer Theory of Evidence (pp. 192-271). New York: John Wiley & Sons, Inc.
Srinivasan, A., & Richards, J. (1990). Knowledge-based techniques for multi-source classification. International Journal of Remote Sensing, 11, 505-525.
Tso, B., & Mather, P. (2001). Classification Methods for Remotely Sensed Data. New York: Taylor & Francis.
Tversky, A., & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science, 185, 1124–1131.
van Asselt, M. (1999). Uncertainty in Decision Support: From Problem to Challenge. Maastricht, The Netherlands: University of Maastricht, International Centre for Integrative Studies (ICIS).
Wilkinson, G. G., & Megier, J. (1990). Evidential reasoning in a pixel classification hierarchy - a potential method for integrating image classifiers and expert system rules based on geographic context. International Journal of Remote Sensing, 11(10), 1963-1968.
Worboys, M. (1998). Imprecision in Finite Resolution Spatial Data. Geoinformatica, 2(3), 257-280.
Worboys, M. F., & Clementini, E. (2001). Integration of imperfect spatial information. Journal of Visual Languages and Computing, 12, 61-80.
Yazici, A., & Akkaya, K. (2000). Conceptual modeling of geographic information system applications. In G. Bordogna & G. Pasi (Eds.), Recent Issues on Fuzzy Databases (pp. 129–151). Heidelberg, New York.
Key Terms
Frame of Discernment: The set of all the possible sets of the hypotheses or class categories.
Imprecision: Lack of specificity or lack of detail in a representation.
Inaccuracy: Lack of correlation between observation and representations of reality.
Indiscernibility: A type of imprecision resulting from our inability to distinguish some elements in reality.
Mixel: A specialized type of pixel whose area is subdivided among more than one class.
Nonspecificity: A form of uncertainty which implies lack of specificity in evidential claims and which can be represented as a function of the cardinality of undiscerned alternatives.
Vagueness: A special type of imprecision that represents borderline cases of a concept.
Chapter XLII
Geographic Visual Query Languages and Ambiguities
Consiglio Nazionale delle Ricerche, IRPPS, Italy
Abstract
The main issues of spatial databases and Geographic Information Systems (GIS) concern the representation, the management, and the manipulation of a large and complex number of spatial objects and spatial relationships. In these systems many concepts are spatial and are therefore intrinsically related to a visual representation, which also makes it easier for non-expert users to formulate queries. The main problems in visual query languages for spatial databases concern imprecision, spatial integrity, and ambiguities in query formulation. Our concern in this chapter is with the ambiguity of visual geographical queries. In particular, a review of existing visual query languages for spatial databases and their classification on the grounds of the methodology adopted to resolve the ambiguity problem are provided.
Introduction
Spatial databases and Geographic Information Systems (GIS) represent, manage, and manipulate a large and complex number of spatial objects and spatial relationships.
Visual queries for spatial databases can be expressed using one of the following four approaches. The first approach uses predefined icons to retrieve pictorial information. Examples of languages that use this approach are Cigales (Calcinelli and Mainguenaud, 1994), the language defined by Lee and Chin (1995), and the card-based language proposed by Ju et al. (2003). The second approach specifies spatial relationships by freehand drawing. Sketch (Meyer, 1993), Spatial-Query-By-Sketch (Egenhofer, 1997; Blaser and Egenhofer, 2000), VISCO (Wessel and Haarslev, 1998), GeoPQL (Ferri and Rafanelli, 2005; Ferri et al., 2004) and, finally, the language proposed by Erwig and Schneider (2003) belong to this approach. The third approach uses symbolic images to represent a set of objects and a set of spatial relations among them. Languages that belong to this approach are Pictorial Query By Example (Papadias and Sellis, 1995), SVIQUEL (Kaushik and Rundensteiner, 1997), and the language proposed by Rahman et al. (2005). Finally, the fourth approach combines text and sketching in a hybrid solution, such as the language proposed by Szmurlo et al. (1998).
The main problems in visual query languages for spatial databases concern imprecision, spatial integrity (Favetta & Laurini, 2001) and ambiguities in query formulation. Some authors have proposed solutions to resolve ambiguities. For example, Favetta and Aufaure-Portier (2000) proposed a taxonomy for classifying the different types of ambiguity that arise during query formulation. They state that the best solution for ambiguities is a hybrid language (textual and visual) with a more intensive dialog between user and system. Lbath et al. (1997) proposed resolving ambiguities through standard semantics supported by specific menus. They argue that it is possible to define the interpretation of the query and suggest a hybrid visual language named Aigle-Cigales, in which the system works with default semantics. Details must be stated explicitly through a specific contextual menu or in textual form.
Since ambiguity can represent a restriction for visual languages, it is useful to analyze several language proposals and classify them according to the methodology used to resolve the problem of ambiguity. A first group of languages, such as Pictorial Query By Example and SVIQUEL, faces ambiguity by allowing the use of only a few operators and/or spatial relationships. A second group disambiguates the language through actions in query formulation that modify the query semantics. The iconic language defined by Lee and Chin (1995) belongs to this group. A third group of languages tries to increase the user's ability to formulate more complex queries through several operators, without facing the ambiguity problem. Languages that belong to this group are Cigales and LVIS. A fourth group of languages proposes approximate solutions as well as the exact answer to the query, enabling the user to select what he/she requires. Sketch! and Spatial-Query-By-Sketch are part of this category. Finally, a fifth group of languages, such as GeoPQL, resolves ambiguities by introducing special new operators to manage them.
This chapter is structured as follows. Section 2 gives a brief overview of the approaches used to define visual query languages for spatial databases. Section 3 illustrates the problems of ambiguity treatment in these kinds of visual languages. Section 4 proposes a classification of the languages on the grounds of the methodology adopted to resolve the problem of ambiguity. Section 5 presents some future perspectives on the growth of visual languages for spatial databases, together with conclusions.
Geographic Visual Query Languages and Ambiguities Treatment
Visual Query Languages for Spatial Databases
Several proposals of visual languages for geographic data exist in the literature. The following discussion details the classification of the languages presented in the introduction. To represent geographic objects conceptually, visual query languages typically consider three types of symbolic graphical objects (SGOs): point, polyline and polygon.
The first approach uses predefined icons for retrieving pictorial information. The shortcoming of this method is that predefined icons have limited expressive power, so the resulting query capability is limited.
Among languages based on this approach is Cigales (Calcinelli and Mainguenaud, 1994), which allows the user to draw a query. It is based on the idea of expressing a query by drawing the pattern corresponding to the result desired by the user. To achieve this it uses a set of icons that model the geometric objects, polyline and polygon (point is not considered), and the operations carried out on these objects (e.g., intersection, inclusion, adjacency, path and distance). The symbolic graphical objects and the icons that conceptualize the operators are predefined.
Another language, defined by Lee and Chin (1995), allows the user to compose a query using three symbolic graphical objects: rectangle, line and point. These SGOs compose an iconic sentence whose meaning derives from the topological relations among the icons of the query. In this language the user draws a new SGO and can set the state of each previously drawn SGO to foreground or background. SGOs whose relationships with the new SGO have to be considered must be placed in the foreground, while SGOs whose relationships are not to be considered are placed in the background.
The card-based language proposed by Ju et al. (2003) uses the card iconic metaphor to represent both complex spatial objects and the spatial relationships between them. The user describes his/her query requirements in a visual environment by selecting the appropriate cards and putting them into the proper query boxes. The result of the query is also displayed in graphical form.
The second approach specifies spatial relationships by freehand drawing. First of all there is Sketch! (Meyer, 1993), which allows the user to draw a visual representation of his/her query, as if on a blackboard, without explicit references to the operators to be applied to the geographical objects involved in the query. The spatial query is expressed through a sketch on the screen, which is later interpreted by the system; the spatial operators are thus derived directly from the sketch.
The Spatial-Query-By-Sketch language (Egenhofer, 1997; Blaser and Egenhofer, 2000), similarly to Sketch!, is based on a formal model for topological spatial relations and a computational model for constraint relaxation. Each query produces a set of candidate interpretations as its result, and the user selects the correct one.
Another language based on the second approach is VISCO (Wessel and Haarslev, 1998), which considers geometric as well as topological constraints as drawn by the user. The system parses the geometry of query sketches and supports the annotation of meta-information for specifying relaxations, additional constraints, or “don’t cares” that define the query interpretation.
Among the languages based on the second approach is also GeoPQL (Ferri and Rafanelli, 2005; Ferri et al., 2004), which is based on twelve operators: the nine traditional topological operators, the distance operator and two further operators, ALIAS and ANY, devoted to resolving ambiguities in query interpretation. Only the last two operators need to be expressed explicitly, while the topological operators are deduced automatically from the visual representation of the query.
Finally, the proposal of Erwig and Schneider (2003) is a language devoted to analyzing two-dimensional traces of moving objects to infer the temporal development of their mutual spatial relationships.
The third approach uses symbolic images to represent a set of objects and a set of spatial relations among them. For instance, Pictorial Query-by-Example (PQBE) (Papadias and Sellis, 1995) uses symbolic images to find directional relationships. This language treats a symbolic image as an array that can correspond to visual scenes, geographical maps or other forms of spatial data. The main limitation of PQBE is that it considers directional relationships only.
An evolution of PQBE is SVIQUEL (Kaushik and Rundensteiner, 1997), which also includes topological operators. It considers 45 different types of primitives, which make it possible to represent topological and directional relationships between two SGOs of type polygon.
Finally, the proposal by Rahman et al. (2005) elicits information requirements through the interactive choice of wireless web services.
The fourth approach defines a visual spatial query language by combining text and sketching in a hybrid solution. Using this method, users draw the spatial configurations of the objects they would like to retrieve from the GIS, while the textual part permits the specification of the geographical semantics. An example of this approach is the language proposed by Szmurlo et al. (1998). This language allows the user to draw the configuration of the objects he/she is interested in, and thus to define spatial constraints between these objects. Any geographic object can be classified as a zone, a line or a point. Moreover, constraints written in natural language are collected in labels that are graphically connected to the object they refer to.
Query’s Ambiguity Problem
The main problems of visual query languages for spatial databases concern imprecision, spatial integrity and ambiguities in query formulation. Favetta and Aufaure-Portier (2000) detail problems due to topological imprecision and integrity in spatial relations. Some researchers have proposed solutions to resolve ambiguities. For example, Favetta and Aufaure-Portier (2000) have suggested a taxonomy for classifying the different kinds of ambiguity that can be produced during query formulation. They state that the best solution to resolve ambiguities is a hybrid language (textual and visual) with a more intense dialog between user and system.
Another proposal to resolve ambiguities has been introduced by Lbath et al. (1997). They propose standard semantics, accessed through specific menus, that make interpretation of the query possible. Their hybrid visual language, Aigle-Cigales, works with default semantics, and details are stated explicitly through a specific contextual menu or in textual form.
Moreover, Carpentier and Mainguenaud (2002) distinguish two ambiguities: visual ambiguity, which appears when a given visual representation of a query corresponds to several interpretations, and selection ambiguity, which appears when several metaphors correspond to a given selection. To reduce these problems the authors propose a drawing process with a grammar for interaction: the more powerful the grammar, the higher the level of ambiguity. A compromise needs to be found.
Visual languages offer an intuitive and incremental view of spatial queries, but they often admit different interpretations of the same query. Among the reasons for multiple query interpretations, the most important is that the user's intention in formulating his/her query differs from the analysis that the system makes of it. Moreover, when the user draws two icons representing different objects of a sentence, he/she may not intend to define one or more of the spatial relations between them. The system can then formulate different interpretations to obtain the required result.
For example, suppose the user formulates the following query: “Find all the regions which are passed through by a river and overlap a forest”. The user is not interested in the relationship between the river and the forest, and the absence of explicit relationships between them in the natural language (NL) formulation produces an ambiguity. The different visual queries of Figure 1 all represent this natural language query.
To remove the ambiguity, the complete natural language query, “Find all the regions which are passed through by a river and overlap a forest, irrespective of the topological relationships between the river and the forest”, could be considered. However, when the user draws an SGO representing a forest and another representing a river, he/she cannot avoid representing a topological relationship between them.
Since ambiguity can constitute a restriction of visual languages, in the following section we analyze the methodologies adopted by different languages to resolve this problem.
Visual Spatial Query Languages and the Ambiguity Problem
In Section 2 we presented visual query languages for spatial databases that have been proposed in the literature. We now classify these languages on the grounds of the methodologies they adopt to resolve the problem of ambiguity.
A first group of languages, such as Pictorial Query By Example and SVIQUEL, handles ambiguity by allowing the use of only a few operators or spatial relationships. By considering only a limited kind of spatial relation (directional relations), PQBE avoids multiple interpretations of the query but reduces the possibility of formulating more complex queries that involve topological relationships. The SVIQUEL language also includes topological operators. However, it avoids multiple interpretations by limiting the number of objects involved (to just two), and it provides a tool with low expressive power for specifying the relative spatial positions.
A second group disambiguates the language through actions in query formulation that modify the query semantics. The iconic language defined by Lee and Chin (1995) belongs to this group. In this language it is possible to remove undesired relationships among drawn symbolic graphical objects, or to impose an a priori restrictive interpretation, using the foreground/background metaphor. The relationships of a new symbolic graphical object depend on the state (foreground or background) of the previously drawn symbolic graphical objects. To interpret a query the parser must consider both the visual representation and the drawing process. In this manner some procedural steps influence the semantics of the query without influencing its representation, so queries having the same representation may have different semantics. Another language that belongs to the second group is VISCO. This language offers tools for specifying meta-information that reflects the user's idea of the interpretation. This approach demands more skill from the user but makes the intended interpretation explicit.
Figure 1. Visual queries for the same NL query (labels recoverable from the figure: Region, Forest)
A third group of languages tries to increase the user's ability to formulate more complex queries through several operators, without explicitly facing the ambiguity problem. A language that belongs to this group is Cigales. In this language the system is not able to give a unique interpretation of the visual query representation. Two solutions have been proposed to reduce the ambiguity of Cigales: introducing various interactions (feedback) with the user, and increasing the complexity of the resolution model. However, obstacles remain: the semantics of the query are fully user dependent, and complex queries with numerous basic objects are not expressible. Another language of the third group is that proposed by Szmurlo et al. (1998). This language allows the user to draw the configuration of the objects and thus to define spatial constraints between these objects and thematic constraints that each object has to respect. However, as this language gives the user great freedom, many ambiguities may arise from incoherencies between an object and its thematic constraints, between constraints on the same object, or between constraints on different objects. A solution to ambiguity consists of detecting incoherencies and proposing possible solutions to the user.
A fourth group of languages offers approximate solutions to the query, from which the user then selects what he/she requires. Sketch! and Spatial-Query-By-Sketch are part of this category. In particular, Spatial-Query-By-Sketch resolves the ambiguity problem by considering and proposing to the user both the exact solution of the query, if possible, and other approximate solutions obtained by relaxing some relationships. In this manner the language includes multiple interpretations in the result, and the user selects the representation that provides a correct interpretation of his/her query.
Finally, a fifth group of languages resolves the ambiguity problem by introducing new special operators that serve to manage the ambiguity. Among the languages belonging to this group is GeoPQL, which allows the user to represent only the desired relationships. The system interprets the query by considering all relationships between the symbolic graphical objects of the sketch, and undesired relations can be removed or modified using ad hoc operators introduced in the language. Figure 2 shows a few examples of visual queries represented using some of the languages introduced in this chapter.
Conclusion
The number of applications using spatial or geographic data has been increasing steadily over the last decade. New small-scale GISs, often called desktop GIS, are gradually becoming available, and people are becoming familiar with using the web to access remote information. An important future direction is to make existing systems available through the web by using standards created by the OpenGIS consortium, such as WMS (Web Map Service) and GML (Geography Markup Language). In particular, GML is an open standard for encoding geographic information in eXtensible Markup Language (XML). It is not tied to any specific hardware or software platform. Any data encoded in it can be read by any programming language or software system able to parse XML streams. Moreover, there are commercial and open source tools that translate GML data into the Scalable Vector Graphics (SVG) format in order to display maps.
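Because GML is plain XML, a generic XML parser is enough to read it. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The `River` feature and its content are invented for illustration, and the fragment uses the GML 2 style `gml:coordinates` element; real documents follow an application schema that is not shown here.

```python
import xml.etree.ElementTree as ET

# Namespace prefix mapping used in the path expressions below.
GML_NS = {"gml": "http://www.opengis.net/gml"}

# Hypothetical feature: the River element is an assumption for illustration.
SAMPLE = """
<River xmlns:gml="http://www.opengis.net/gml">
  <gml:name>Example River</gml:name>
  <gml:LineString>
    <gml:coordinates>0,0 10,5 20,5</gml:coordinates>
  </gml:LineString>
</River>
"""

def read_linestring(xml_text):
    """Return the feature name and its line geometry as (x, y) tuples."""
    root = ET.fromstring(xml_text)
    name = root.findtext("gml:name", namespaces=GML_NS)
    raw = root.findtext("gml:LineString/gml:coordinates", namespaces=GML_NS)
    # Coordinate tuples are separated by spaces, components by commas.
    points = [tuple(float(v) for v in pair.split(",")) for pair in raw.split()]
    return name, points

name, points = read_linestring(SAMPLE)
print(name, points)
```

The verbosity visible even in this small fragment (three points take several lines of markup) hints at why GML size matters on constrained devices.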
The diffusion of the World Wide Web and the increasingly widespread use of mobile devices have made it possible to query and access online geographical databases from mobile devices. There are two main drawbacks in using GML on mobile devices. First, it is memory-consuming to store and bandwidth-consuming to transfer. Second, maps described with it have to be projected and scaled before being plotted. These characteristics make GML not directly usable on small devices. It is necessary to reduce the size of GML data in order to make cartographic data accessible from mobile devices.
Another future perspective concerns the representation in GISs of dynamic phenomena that change over space and time. While current GISs can provide snapshot views at discrete time intervals, they fall short in linking process models with data from multiple sources or in simulating scenarios of change for users.
References
Blaser, A. D., & Egenhofer, M. J. (2000). A Visual Tool for Querying Geographic Databases. Advanced Visual Interfaces – AVI 2000. Palermo, Italy: ACM Press (pp. 211-216).
Calcinelli, D., & Mainguenaud, M. (1994). Cigales, a visual language for geographic information system: the user interface. Journal of Visual Languages and Computing, 5(2), 113-132.
Carpentier, C., & Mainguenaud, M. (2002). Classifying Ambiguities in Visual Spatial Languages. GeoInformatica, 6(3), 285-316.
Figure 2 The same visual query represented with different languages
Egenhofer, M. J. (1997). Query Processing in Spatial-Query-by-Sketch. Journal of Visual Languages and Computing, 8(4), 403-424.
Erwig, M., & Schneider, M. (2003). A visual language for the evolution of spatial relationships and its translation into a spatio-temporal calculus. Journal of Visual Languages and Computing, Elsevier, 14, 181-211.
Favetta, F., & Laurini, R. (2001). About Precision and Integrity in Visual Query Languages for Spatial Databases. Proceedings of the 7th International Conference on Database Systems for Advanced Applications (DASFAA 2001). IEEE Computer Society (pp. 286-293).
Favetta, F., & Aufaure-Portier, M. (2000). About Ambiguities in Visual GIS Query Languages: a Taxonomy and Solutions. Proceedings of the 4th International Conference on Advances in Visual Information Systems. Springer-Verlag Publications, LNCS 1929 (pp. 154-165).
Ferri, F., Grifoni, P., & Rafanelli, M. (2004). XPQL: A pictorial language for querying geographic data. Databases and Expert Systems Applications (DEXA 2004). LNCS 3180, Springer-Verlag Publications (pp. 925-935).
Ferri, F., & Rafanelli, M. (2005). GeoPQL: A Geographical Pictorial Query Language that resolves ambiguities in query interpretation. Journal of Data Semantics. LNCS 3534, 3, 50-80. Springer-Verlag Publications.
Ju, S., Guo, W., & Hernández, H. J. (2003). A Card-based Visual Query System for Geographical Information Systems. Scandinavian Research Conference on Geographical Information Science (pp. 62-74).
Kaushik, S., & Rundensteiner, E. (1997). SVIQUEL: A Spatial Visual Query and Exploration Language. Databases and Expert Systems Applications (DEXA 1998), LNCS 1460, Springer-Verlag Publications (pp. 290-299).
Lbath, A., Aufaure-Portier, M., & Laurini, R. (1997). Using a Visual Language for the Design and Query in GIS Customization. International IEEE Conference on Visual Information Systems. San Diego, CA (pp. 197-204).
Lee, Y. C., & Chin, F. (1995). An Iconic Query Language for Topological Relationship in GIS. International Journal of Geographical Information Systems, 9(1), 25-46.
Meyer, B. (1993). Beyond Icons: Towards New Metaphors for Visual Query Languages for Spatial Information Systems. International Workshop on Interfaces to Database Systems. Glasgow, Scotland (pp. 113-135).
Papadias, D., & Sellis, T. (1995). A Pictorial Query-by-Example Language. Journal of Visual Languages and Computing, 6(1), 53-72.
Rahman, S. A., Bhalla, S., & Hashimoto, T. (2005). Query-By-Object Interface for Information Requirement Elicitation Implementation. Fourth International Conference on Mobile Business (ICMB2005). IEEE Computer Society, Sydney,
Wessel, M., & Haarslev, V. (1998). VISCO: Bringing Visual Spatial Querying to Reality. Proceedings of the IEEE Symposium on Visual Languages. IEEE Computer Society, Halifax, Canada (pp. 170-179).
Key Terms
eXtensible Markup Language (XML): A W3C-recommended general-purpose markup language for creating special-purpose markup languages, capable of describing many different kinds of data. In other words, XML is a way of describing data.
Geographical Database: A database in which geographic information is stored as x-y coordinates of single points or of points that identify the boundaries of lines (or polylines, which sometimes represent the boundaries of polygons). Different attributes characterize the objects stored in these databases. In general the storage structure consists of “classes” of objects, each of them implemented by a layer. Geographic databases often include raster, topological vector, image processing, and graphics production functionality.
Geographical Information System (GIS): A computerized database system used for the capture, conversion, storage, retrieval, analysis and display of spatial objects.
Geography Markup Language (GML): The XML grammar defined by the Open Geospatial Consortium (OGC) to express geographic features. GML serves as a modeling language for geographic systems as well as an open interchange format for geographic transactions on the Internet.
Icon: A small picture that represents a command, file, object or window.
Metaphor: Figurative language that creates an analogy between two unlike things. A metaphor does not make a comparison, but creates its analogy by representing one thing as something else.
Scalable Vector Graphics (SVG): A language for describing two-dimensional graphics and graphical applications in XML.
Visual Query Language: A language that allows users to specify their goals in a two-(or more)-dimensional way with visual expressions, that is, spatial arrangements of textual and graphical symbols.
Chapter XLIII
GeoCache:
A Cache for GML Geographical Data
PRiSM Laboratory, France
Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Abstract
GML is a promising model for integrating geodata within data warehouses. The resulting databases are generally large and require spatial operators to be handled. Depending on the size of the target geographical data and the number and complexity of operators in a query, the processing time may quickly become prohibitive. To optimize spatial queries over GML-encoded data, this chapter introduces a novel cache-based architecture. A new cache replacement policy is then proposed. It takes into account the containment properties of geographical data and predicates, and allows evicting the most irrelevant values from the cache. Experiments with the GeoCache prototype show the effectiveness of the proposed architecture with the associated replacement policy, compared to existing works.
The increasing accumulation of geographical data and the heterogeneity of Geographical Information Systems (GISs) make efficient query processing in distributed GIS difficult. Novel architectures (Boucelma, Messid, & Lacroix, 2002; Chen, Wang, & Rundensteiner, 2004; Corocoles & Gonzalez, 2003; Gupta, Marciano, Zaslavsky, & Baru, 1999; Leclercq, Djamal, & Yétongnon, 1999; Sindoni, Tininini, Ambrosetti, Bedeschi, De Francisci, Gargano, Molinaro, Paolucci, Patteri, & Ticca, 2001; Stoimenov, Djordjevic-Kajan, & Stojanovic, 2000; Voisard & Juergens, 1999; Zhang, Javed, Shaheen, & Gruenwald, 2001) are based on XML, which has become a standard for exchanging data between heterogeneous sources. Proposed by OpenGIS (2003), GML is an XML encoding for the modeling, transport, and storage of geographical information, including both the spatial and non-spatial fragments of geographical data (called features). As stressed in Savary and Zeitouni (2003), we believe that GML is a promising model for geographical data mediation and warehousing purposes.
By their nature, geographical data are large, so GML documents are often large as well. The processing time of geographical queries over such documents in a data warehouse can become too long for several reasons:
1. The query evaluator needs to parse entire documents to find and extract query-relevant data.
2. Spatial operators are costly, especially if the query contains complex selections and joins on large GML documents.
Moreover, the computational costs of spatial operators are generally higher than those of standard relational operators. Thus, geographical queries on GML documents raise the problem of memory and CPU consumption. To solve this problem, we propose to exploit the specificities of a semantic cache (Dar, Franklin, Jonsson, Srivastava, & Tan, 1996) with an optimized data structure. The proposed structure aims at considerably reducing memory space by avoiding the storage of redundant values. Furthermore, a new cache replacement policy is proposed. It keeps the most relevant data in cache for better efficiency.
Related works generally focus on spatial data stored in object-relational databases (Beckmann, Kriegel, Schneider, & Seeger, 1990). The proposed cache organizations are better suited to tuple-oriented data structures (Brinkhoff, 2002). Most cache replacement policies are based on Least Recently Used (LRU) and its variants. Other cache replacement policies proposed in the literature (Arlitt, Friedrich, Cherkasova, Dilley, & Jin, 1999; Cao & Irani, 1997; Lorenzetti & Rizzo, 1996) deal with relational or XML databases, but have not yet been investigated in the area of XML spatial databases.
The rest of the chapter is organized as follows. The second section gives an overview of related works. In the third section we present our cache architecture adapted to GML geographical data. The fourth section discusses the inference rules of spatial operators and presents an efficient replacement policy for geographical data that takes into account inference between spatial operators. The fifth section shows some results of the proposed cache implementation and replacement policy. Finally, the conclusion summarizes our contributions and points out the main advantages of the proposed GML cache-based architecture.
Related Works
Cache Replacement Policy
In the literature, several approaches have been proposed for cache replacement policy. The best known is Least Recently Used (LRU) (Tanenbaum, 1992). This algorithm replaces the document requested the least recently. By contrast, the Least Frequently Used (LFU) algorithm evicts the document accessed the least frequently. Many extensions and variations have been proposed in the context of WWW proxy caching algorithms. We review some of them below.
LRU-Threshold (Chou & DeWitt, 1985) is a simple extension of LRU in which documents larger than a given threshold size are never cached. LRU-K (O’Neil, O’Neil, & Weikum, 1993) considers the times of the last K references to a page and uses this information to make page-replacement decisions. The page to be dropped is the one with the maximum backward K-distance among all pages in the buffer. Log(size)+LRU (Abrams, Standbridge, Adbulla, Williams, & Fox, 1995) evicts the document with the largest log(size), and applies LRU in case of equality. The Size algorithm evicts the largest document. The Hybrid algorithm aims at reducing the total latency by computing a function that estimates the value of keeping a page in cache. This function takes into account the time to connect to a server, the network bandwidth, the frequency of use of the cached result, and the size of the document. The document with the smallest function value is then evicted. The Lowest Relative Value (LRV) algorithm includes the cost and the size of a document in estimating the utility of keeping it in cache (Lorenzetti et al., 1996). LRV evicts the document with the lowest utility value.
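The difference between the two basic policies, LRU and LFU, can be seen on a short access trace. The following is a minimal sketch; the class names, the `access` method and the counting of capacity in number of documents (rather than bytes) are assumptions made for illustration.

```python
from collections import Counter, OrderedDict

class LRUCache:
    """Evicts the document whose last request is the oldest."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # ordered from least to most recently used

    def access(self, key):
        """Record a request; return the evicted key, or None."""
        evicted = None
        if key in self.items:
            self.items.move_to_end(key)          # now the most recently used
        else:
            if len(self.items) >= self.capacity:
                evicted, _ = self.items.popitem(last=False)
            self.items[key] = True
        return evicted

class LFUCache:
    """Evicts the document with the fewest recorded requests."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = Counter()      # key -> number of requests so far

    def access(self, key):
        evicted = None
        if key not in self.counts and len(self.counts) >= self.capacity:
            evicted = min(self.counts, key=self.counts.get)
            del self.counts[evicted]
        self.counts[key] += 1
        return evicted

def run(cache, trace):
    """Replay a trace and collect the eviction made at each request."""
    return [cache.access(k) for k in trace]

lru_evictions = run(LRUCache(2), "aabc")
lfu_evictions = run(LFUCache(2), "aabc")
print(lru_evictions, lfu_evictions)
```

On the trace a, a, b, c with room for two documents, LRU drops a (requested least recently) while LFU drops b (requested least often), which is the contrast described above.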
One of the most successful algorithms is Greedy Dual-Size (GD-Size), introduced by Cao et al. (1997). It takes into account the cost and the size of a new object. When a new object arrives, the algorithm increases the ranking of the new object by the cost of the removed object. In the same spirit, the Greedy Dual-Size Frequency (GDSF) algorithm proposed by Arlitt et al. (1999) takes into account not only the size and the cost, but also the frequency of accesses to objects. As an enhancement of GDSF, Yang, Zhang, and Zhang (2003) introduce the time factor. Combined with a Taylor series, it allows predicting the time of the next access to an object. Thus, it provides a more accurate prediction of future access trends when the access patterns vary greatly. The main bottleneck of this approach is the time consumed in recalculating the priority of each object.
Spatial Cache Replacement Policy
Most proposed spatial cache replacement policies are based on variants of LRU and are developed in the context of relational databases. In the area of spatial database systems, the effect of other page-replacement strategies has not been investigated except in (Brinkhoff, 2002).
Considering a page managed by a spatial database system, one can distinguish three categories of pages (Brinkhoff, Horn, Kriegel, & Schneider, 1993): directory pages (descriptors), data pages (classical information), and object pages (storing the exact representation of spatial objects). Using type-based LRU (LRU-T), the object pages are evicted first, followed by the data pages, and finally by the directory pages. Using priority-based LRU (LRU-P), pages are removed from the buffer according to their respective priorities. If a tree-based spatial access method is used, the highest priority is given to the root, then to the index directory pages, followed by the data pages, and finally the object pages. Thus, the priority of a page depends on its height in the tree.
Let us recall that in GIS jargon, the MBR of an object is the minimum bounding rectangle of this object. The area of a page of objects is that of the minimum rectangle including all MBRs of that page. The margin is the border of an area. Beckmann et al. (1990) and Brinkhoff (2002) define five spatial page-replacement algorithms based on spatial criteria:
1. Maximizing the area of a page (A): A page with a large area should stay in the buffer as long as possible. This results from the observation that the larger the area, the more frequently the page should be requested.
2. Maximizing the area of the entries of a page (EA): Instead of the area of a page, the sum of the areas of its entries (spatial objects) is maximized.
3. Maximizing the margin of a page (M): The margin of a page p is defined as the margin of the MBR containing all entries of p. The larger a page margin is, the longer it will stay in the buffer.
4. Maximizing the margin of the entries of a page (EM): Instead of the margin of a page p, the margins of its component MBRs are considered.
5. Maximizing the overlaps between the entries of a page (EO): This algorithm tries to maximize the sum of the intersection areas of all pairs of entries with overlapping MBRs.
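The five criteria above can be computed directly from the MBRs of a page's entries. The following is a minimal sketch in which an MBR is a tuple (xmin, ymin, xmax, ymax) and the margin is taken to be the perimeter of a rectangle; both the representation and the function names are assumptions made for illustration.

```python
from itertools import combinations

def area(m):
    xmin, ymin, xmax, ymax = m
    return (xmax - xmin) * (ymax - ymin)

def margin(m):
    # Margin interpreted as the perimeter of the rectangle (an assumption).
    xmin, ymin, xmax, ymax = m
    return 2 * ((xmax - xmin) + (ymax - ymin))

def bounding(mbrs):
    """Minimum rectangle enclosing all the given MBRs."""
    return (min(m[0] for m in mbrs), min(m[1] for m in mbrs),
            max(m[2] for m in mbrs), max(m[3] for m in mbrs))

def overlap_area(a, b):
    """Area of the intersection of two MBRs, or 0 if they do not overlap."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def page_criteria(entries):
    """Return the value of each spatial criterion for one page."""
    page = bounding(entries)
    return {
        "A": area(page),
        "EA": sum(area(e) for e in entries),
        "M": margin(page),
        "EM": sum(margin(e) for e in entries),
        "EO": sum(overlap_area(a, b) for a, b in combinations(entries, 2)),
    }

scores = page_criteria([(0, 0, 2, 2), (1, 1, 3, 3)])
print(scores)
```

Under any one of these criteria, the page with the smallest value is the preferred victim, since the policy tries to keep pages with large values in the buffer.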
As a synthesis, Brinkhoff (2002) proposes
a combination of LRU-based and spatial
page-replacement algorithms. To evict a page, a
set of victim candidates is determined using the
LRU strategy. Then, the page to be dropped out
of the buffer is selected from the candidate set
using a spatial page-replacement algorithm. The
page dropped by this selection is placed in an
overflow buffer, where a victim is evicted using the
FIFO strategy. Depending on its spatial and LRU
criteria, a requested page found in the overflow
buffer is moved to the standard part of the buffer,
influencing the size of the candidate set.
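The combined strategy can be sketched as follows. This is a simplified illustration under stated assumptions, not Brinkhoff's implementation: the class name `CombinedBuffer`, its parameters, and the `criterion` callback are hypothetical, the candidate set has a fixed size, and the adaptive resizing of the candidate set mentioned above is left out.

```python
from collections import OrderedDict, deque


class CombinedBuffer:
    """LRU candidate set + spatial selection + FIFO overflow buffer (sketch)."""

    def __init__(self, capacity, overflow_capacity, candidates, criterion):
        self.capacity = capacity                  # size of the standard part
        self.overflow_capacity = overflow_capacity
        self.candidates = candidates              # fixed candidate-set size (assumption)
        self.criterion = criterion                # payload -> number; larger stays longer
        self.main = OrderedDict()                 # page_id -> payload, oldest first
        self.overflow = deque()                   # (page_id, payload) in FIFO order

    def request(self, page_id, load):
        """Return 'hit', 'overflow-hit', or 'miss' for the requested page."""
        if page_id in self.main:
            self.main.move_to_end(page_id)        # refresh LRU position
            return "hit"
        found = next((item for item in self.overflow if item[0] == page_id), None)
        if found is not None:
            # A page found in the overflow buffer returns to the standard part.
            self.overflow.remove(found)
            self._admit(page_id, found[1])
            return "overflow-hit"
        self._admit(page_id, load(page_id))
        return "miss"

    def _admit(self, page_id, payload):
        if len(self.main) >= self.capacity:
            self._demote()
        self.main[page_id] = payload

    def _demote(self):
        # Step 1: the least recently used pages form the candidate set.
        lru_ids = list(self.main)[: self.candidates]
        # Step 2: the spatial criterion picks the victim among the candidates.
        victim = min(lru_ids, key=lambda pid: self.criterion(self.main[pid]))
        # Step 3: the victim moves to the overflow buffer, which evicts in FIFO order.
        self.overflow.append((victim, self.main.pop(victim)))
        if len(self.overflow) > self.overflow_capacity:
            self.overflow.popleft()
```

Here `criterion` could be any of the spatial measures (A, EA, M, EM, EO) computed on the page payload, and `load` stands for whatever routine reads a page from disk.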
Buffer cache techniques are mainly used in
spatial database systems to optimize
query response time. The work conducted by
Brinkhoff uses a spatial index for better
management of the buffer cache. However, there is
no spatial index for GML documents, as they
are encoded in XML. Hence, the spatial criteria
mentioned above cannot be applied, and other
criteria must be considered to handle geographical
queries. Moreover, semantic caching gives better
performance than page or tuple replacement
strategies (Dar et al., 1996), but until now it has
not really been studied for geographical queries
where data are stored in XML.
Cache Structure for Geographical
Queries in GML
Generally, spatial data consume a lot of memory
space. Hence, caching spatial objects tends
to flood the available space in the cache. For
example, in a spatial join query, a spatial object A
can match several objects B1, B2, etc. Hence,
the same object A can be replicated many times
in spatial query results. This may considerably reduce the available space in the cache, especially when a large number of spatial fragments must
be stored.
To avoid spatial object replication in cache, we propose a simple data structure, which facilitates object identification and isolation. This structure
is divided into two parts. The first is devoted
to the non-spatial elements of the geographical data. The second one contains non-redundant spatial fragments of geographical data (i.e., only distinct spatial objects are stored in cache). In a semantic cache, the semantic region is divided into two parts (Chidlovskii, Roncancio, & Schneider, 1999): the region descriptor, describing each query result stored in cache, and the region content, where the data are stored. In the case of geographical queries, we introduce two kinds of
region content: the non-spatial region content and the spatial region content.
The spatial region content contains non-redundant spatial data of geographical query results, whereas the non-spatial region content contains non-spatial data of geographical query results. These region contents are associated with the geographical region descriptor, which contains information about each geographical query stored in cache. The cache is then divided into
two parts (see Figure 1): (i) the non-spatial part
of the cache, composed of the non-spatial region content and its associated description contained in
the geographical region descriptor (see the third section); (ii) the spatial part of the cache, composed
of the spatial region content and its associated description contained in the geographical region
descriptor (see the third section).
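The two-part organization described above can be sketched as follows. This is only an illustration under stated assumptions: the class `GeoCache`, its method names, and the use of plain dictionaries and string identifiers are hypothetical, and the sketch does not use XML or DOM trees.

```python
class GeoCache:
    """Sketch of a cache with separate non-spatial and spatial region contents."""

    def __init__(self):
        # Geographical region descriptor: one entry per cached query.
        self.descriptors = {}   # query text -> list of (row_key, geometry_id)
        # Non-spatial region content.
        self.non_spatial = {}   # row_key -> dict of non-spatial elements
        # Spatial region content: each distinct spatial object is stored once.
        self.spatial = {}       # geometry_id -> geometry

    def store(self, query, rows):
        """rows: iterable of (row_key, attributes, geometry_id, geometry)."""
        refs = []
        for row_key, attributes, geometry_id, geometry in rows:
            self.non_spatial[row_key] = attributes
            # setdefault keeps the first copy and ignores replicated objects.
            self.spatial.setdefault(geometry_id, geometry)
            refs.append((row_key, geometry_id))
        self.descriptors[query] = refs

    def lookup(self, query):
        """Rebuild a cached result, or return None when the query is not cached."""
        refs = self.descriptors.get(query)
        if refs is None:
            return None
        return [(self.non_spatial[key], self.spatial[gid]) for key, gid in refs]
```

For the spatial join example, an object A matching B1 and B2 would yield two result rows, but the geometry of A would occupy the spatial part of the cache only once.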
Non-spatial part of the cache
All data are encoded in XML. For simplicity and standards compliance, we encode XML data as DOM trees. More compact structures are possible, but it would not change the relative results and