Information Science Reference, Part 8


Virtual Environments for Geospatial Applications

Critical Issues in the Design and Implementation of Geospatial Virtual Environments

This section concisely discusses some limitations and constraints typically experienced in generating virtual worlds. One noteworthy issue is that, in visualizing real-world scenarios, there is an inevitable trade-off between performance and resolution. Exploiting the complete capabilities of virtual environments over the Web continues to pose problems. As the number of objects in a virtual environment increases, online hosting becomes an issue, because on-the-fly rendering of numerous objects is no easy task. Scenes with a greater number of polygons slow down the system and make interactivity poor. Several factors need to be considered during visualization, such as the type and volume of data to be visualized, memory constraints, and system performance. Table 2 presents a summary of the significant issues concerning geo-virtual environments. In their work on information visualization, Robertson et al. (1993) have presented a terse compilation of the important issues.
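To make the performance-versus-resolution trade-off concrete, the following is a minimal sketch, assuming a terrain stored as a regular grid of elevation posts in which every grid cell is drawn as two triangles. It picks the finest subsampling step whose triangle count stays within a rendering budget; the 1001 x 1001 grid and the 200,000-triangle budget are hypothetical values, not figures from the works cited here.

```python
def triangle_count(rows: int, cols: int, step: int) -> int:
    """Triangles needed to draw a rows x cols elevation grid keeping every `step`-th post.

    Each cell between four neighbouring retained posts is split into two triangles.
    """
    kept_rows = (rows - 1) // step + 1
    kept_cols = (cols - 1) // step + 1
    return 2 * (kept_rows - 1) * (kept_cols - 1)


def finest_step_within_budget(rows: int, cols: int, budget: int) -> int:
    """Smallest subsampling step (i.e. highest resolution) whose triangle count fits the budget."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    step = 1
    while triangle_count(rows, cols, step) > budget:
        step += 1
    return step


if __name__ == "__main__":
    rows, cols = 1001, 1001   # hypothetical DEM size
    budget = 200_000          # hypothetical per-frame triangle budget
    step = finest_step_within_budget(rows, cols, budget)
    print(step, triangle_count(rows, cols, step))
```

With these numbers, full resolution needs 2,000,000 triangles, and the sketch settles on keeping every fourth post (125,000 triangles). A single global step is used here only for clarity.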

In their work on dynamic and interactive web-based visualizations, Huang and Lin (1999, 2001, and 2002) discuss some of these concerns in detail and also address critical issues concerning the online hosting of interactive visualizations. The Java-3D based hybrid method that Huang and Lin (1999, 2001) propose offers a standard framework for visualizing dynamic environmental processes.

Figure 8. a) 3D virtual environment depicting geospatial processes (one picture of a series), such as landscape change over time; b) 3D virtual environment depicting water flow in a reservoir

Table 2. A summary of critical issues in designing and implementing 3D virtual worlds

Photo-realistic scene generation: Generating 3D virtual environments with adequate photo-realism

Bandwidth limitations: 3D scenes with numerous objects, rendering difficulty, and transmission speed

Browser and plug-in compatibility: Compatibility among various browsers as well as plug-ins

User navigation capabilities: Users need skills to navigate and situate themselves within immersive virtual worlds

Lag in real-time interaction: Complex scenes not only take time to render but also cause delays or lags during navigation and interaction

Data integrity and online security issues: Sensitive data must be protected, and the data represented by such 3D worlds should be up to date

Figure 9 illustrates a 3-tier configuration that Huang and Lin (1999) proposed in GeoVR. The visualization server, which is linked to the spatial database, accesses geospatial information from the data repository, and the web server accesses the visualization server for 3D information. This framework efficiently handles requests for visualizing dynamic processes: based on the client requests, the web server provides the appropriate information in conventional HTML or in 3D VRML format.
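The three-tier flow described above can be sketched as follows. This is a hypothetical illustration, not GeoVR's implementation: the `Building` record, the in-memory `REPOSITORY` standing in for the spatial database, and the functions `visualization_server` and `web_server` are names invented here, and all networking is omitted. The 3D output uses standard VRML 2.0 `Transform`, `Shape`, and `Box` nodes, served with the `model/vrml` media type.

```python
from dataclasses import dataclass


@dataclass
class Building:
    """A 2D GIS record: rectangular footprint plus a height attribute (hypothetical schema)."""
    name: str
    x: float
    z: float
    width: float
    depth: float
    height: float


# Tier 1 (data repository): an in-memory dict standing in for the spatial database.
REPOSITORY = {"b1": Building("Library", 0.0, 0.0, 20.0, 10.0, 15.0)}


def to_vrml(b: Building) -> str:
    """Extrude a footprint into a VRML 2.0 Box; a Box is centred, so lift it by height / 2."""
    return "\n".join([
        "#VRML V2.0 utf8",
        f"# {b.name}",
        "Transform {",
        f"  translation {b.x} {b.height / 2} {b.z}",
        f"  children [ Shape {{ geometry Box {{ size {b.width} {b.height} {b.depth} }} }} ]",
        "}",
    ])


def to_html(b: Building) -> str:
    """Conventional 2D answer for clients that did not ask for 3D."""
    return (f"<html><body><h1>{b.name}</h1>"
            f"<p>Footprint {b.width} x {b.depth} m, height {b.height} m</p></body></html>")


# Tier 2 (visualization server): reads the repository and renders 2D or 3D output.
def visualization_server(feature_id: str, want_3d: bool) -> str:
    record = REPOSITORY[feature_id]
    return to_vrml(record) if want_3d else to_html(record)


# Tier 3 (web server): inspects the client request and returns a content type plus body.
def web_server(feature_id: str, requested_format: str) -> tuple[str, str]:
    if requested_format == "vrml":
        return "model/vrml", visualization_server(feature_id, want_3d=True)
    return "text/html", visualization_server(feature_id, want_3d=False)


if __name__ == "__main__":
    content_type, body = web_server("b1", "vrml")
    print(content_type)
    print(body)
```

Keeping rendering in the middle tier means the web server only routes requests, so the same repository can feed both 2D and 3D clients.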

Discussion and Conclusion

Over the past several decades, information presentation has inspired the development of several new tools and techniques. The information revolution has resulted in vast amounts of data that are far too complex, in both quality and quantity, to be handled by conventional tools and techniques. Recent technological advances in remote sensing have dramatically increased the amount of geospatial data available. Virtual environments are an efficient means of visualizing voluminous geospatial data and are effective in elucidating intricate patterns as well as hidden and associated information. Such virtual environments facilitate understanding of the complex relationships among the various components of a multi-level scenario.

This paper discussed the design and implementation of virtual worlds that can be used to generate both static representations depicting real-world settings and dynamic representations that simulate geospatial processes and environmental phenomena. The paper discussed the generation of such geo-virtual environments with examples and explained how such geo-visualization applications facilitate the understanding of various geospatial phenomena and environmental processes. The fundamental principles underlying the generation of virtual worlds, both static and dynamic, were elaborated, and the common issues involved in generating such 3D virtual worlds were discussed. Furthermore, the issues related to the online hosting of such virtual environments were briefly delineated, and possible solutions to frequently encountered problems were provided.

Figure 9. Online hosting of interactive visualization (from Huang et al., 1999)

References

Bonham-Carter, G. F. (1994). Geographic Information Systems for Geoscientists: Modeling with GIS. Oxford: Pergamon (p. 398).

Boyd, D. S., Lansdown, J., & Huxor, A. (1996). The Design of Virtual Environments. SIMA.

Chandramouli, M., Lan-Kun, C., Tien-Yin, C., & Ching-Yi, K. (2004). Design and Implementation of Virtual Environments for Visualizing 3D Geospatial Data. TGIS Conference, Oct. 28-29, 2004.

Chandramouli, M., Huang, B., Yin Chou, T., Kun Chung, L., & Wu, Q. (2006). Design and Implementation of Virtual Environments for Planning and Building Sustainable Railway Transit Systems. COMPRAIL, July 2006, Prague.

Colin, W. (2000). Information Visualization: Perception for Design. Morgan Kaufmann Series in Interactive Technologies.

GeoVRML (www.geovrml.org).

Huang, B., & Lin, H. (1999). GeoVR: A Web-based tool for virtual reality presentation from 2D GIS data. Computers & Geosciences, 25(10), 1167-1175.

Huang, B., Jiang, B., & Lin, H. (2001). An integration of GIS, virtual reality and the Internet for spatial data exploration. International Journal of GIS, 15(5), 439-456.

Huang, B., & Lin, H. (2002). A Java/CGI approach to developing a geographic virtual reality toolkit on the Internet. Computers & Geosciences, 28(1), 13-19.

Karel, C., & Jiri, Z. (n.d.). Using VRML for creating interactive demonstrations of physical models. Department of Computer Science and Engineering, Czech Technical University.

Robertson, G., Card, S., & Mackinlay, J. D. (1993). Information Visualization Using 3D Interactive Animation. Communications of the ACM, 36.

Sutherland, I. E. (1965). The ultimate display. In Proceedings of the IFIPS Congress, 2, 506-508. New York City, NY.

Key Terms

Immersion: A sense of being present within the virtual world and of being able to visualize objects by being amidst their surroundings and navigating through the world.

Node: An entity within the hierarchical scene structure that represents a group of objects.

Open Source: Source code or computer software that is freely offered and available to the public for building software applications.

Scene-Hierarchy: The organization of the elements of a 3D virtual scene into successive levels, in such a way that the object under which other objects are grouped is called the parent and the grouped objects are called its children. When a parent object is transformed, its children are also transformed.

SCRIPT: Program scripts that are used to perform calculations and return values to the calling programs.

Transformation: Operations such as translation, rotation, or scaling involving objects in a virtual environment.


Virtual Reality: A three-dimensional, visually immersive setting that enables users to navigate within the scene and perform operations in real time.
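The Scene-Hierarchy and Transformation entries above can be illustrated with a small scene graph. The sketch is a simplification rather than VRML's full `Transform` semantics: it assumes each node carries only a translation, a rotation about the vertical (Y) axis, and a uniform scale, and that a child's world matrix is its parent's world matrix multiplied by its own local matrix. The node names are hypothetical.

```python
import math
from dataclasses import dataclass, field


def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def local_matrix(translation, angle_y, scale):
    """Build T * R_y * S: uniform scale, then rotation about Y, then translation."""
    c, s = math.cos(angle_y), math.sin(angle_y)
    tx, ty, tz = translation
    return [
        [c * scale, 0.0, s * scale, tx],
        [0.0, scale, 0.0, ty],
        [-s * scale, 0.0, c * scale, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


@dataclass
class Node:
    name: str
    translation: tuple = (0.0, 0.0, 0.0)
    angle_y: float = 0.0
    scale: float = 1.0
    children: list = field(default_factory=list)


def world_origins(node, parent=IDENTITY, out=None):
    """Return {name: (x, y, z)}: each node's origin in world coordinates.

    Because a child's world matrix is parent_world * child_local, transforming a
    parent implicitly transforms every descendant.
    """
    if out is None:
        out = {}
    world = matmul(parent, local_matrix(node.translation, node.angle_y, node.scale))
    out[node.name] = (world[0][3], world[1][3], world[2][3])
    for child in node.children:
        world_origins(child, world, out)
    return out


if __name__ == "__main__":
    # Hypothetical hierarchy: a campus group scaled by 2 that contains one building.
    campus = Node("campus", (10.0, 0.0, 0.0), 0.0, 2.0, [Node("building", (1.0, 0.0, 0.0))])
    print(world_origins(campus)["building"])
    campus.translation = (20.0, 0.0, 0.0)   # move only the parent
    print(world_origins(campus)["building"])
```

Moving the hypothetical `campus` node from x = 10 to x = 20 moves the `building` origin from x = 12 to x = 22 without editing the child, which is the parent-to-children propagation described in the glossary.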


Cleveland State University, USA

Abstract

Geospatial predictive models often require mapping predefined concepts or categories to various conditioning factors in a given space. This chapter discusses various aspects of uncertainty in predictive modeling by characterizing different typologies of classification uncertainty. It argues that understanding uncertainty semantics is a prerequisite for the efficient handling and management of predictive models.

1 Spatial Prediction and Classification

Geospatial predictive models entail an array of analytical techniques from data mining, classical statistics, and geostatistics that attempt to predict the spatial states and behavior of objects from a finite set of observations. The process of prediction presupposes a set of spatial concepts and categories to which objects are to be mapped. For example, spatial processes such as the classification of land cover from satellite imagery, the modeling of forest fires, the propagation of epidemics, and the prediction of urban sprawl require a unifying and common reference of "space" or location, where the multiple features of spatial attributes are to be mapped to predefined class labels. The prediction of spatial features can be conceived as a process of deriving classification schemes in relation to certain spatial properties, such as neighborhood, proximity, and dependency, as well as the similarity of non-spatial attributes (Han & Kamber, 2006; Shekhar & Chawla, 2003). In data mining, a classification function is often defined as a mapping function:


$$ f : A \rightarrow C $$

where A is the domain of the function f, representing the attribute space, and C is the set of class categories.
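As an illustration of such a crisp mapping, the sketch below implements a minimum-distance-to-means classifier, chosen here only as a simple example of a function f : A → C. The two attributes (NDVI and elevation in kilometres) and the class centroids are hypothetical.

```python
import math

# Hypothetical class centroids in a two-attribute space A = (NDVI, elevation in km).
CENTROIDS = {
    "water": (-0.3, 0.1),
    "grassland": (0.3, 0.4),
    "forest": (0.7, 0.9),
}
C = set(CENTROIDS)  # the set of class categories


def classify(attributes: tuple[float, float]) -> str:
    """f: A -> C. Assign the class whose centroid is nearest (Euclidean) in attribute space."""
    return min(CENTROIDS, key=lambda label: math.dist(attributes, CENTROIDS[label]))


if __name__ == "__main__":
    print(classify((0.65, 0.85)))
    print(classify((-0.2, 0.0)))
```

Every point of A receives exactly one label from C, with no way to express doubt about the assignment; the remainder of this chapter discusses why that is often insufficient.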

2 Uncertainty in Spatial Classification

Uncertainty may emerge from ontological constraints in classification, i.e., from the lack of specification of what kinds of spatial objects exist, as well as from epistemic limitations, which concern whether such objects are knowable to subjective schemes and, if so, to what extent they can be represented in the subjective framework, given the limited empirical evidence.

Epistemic uncertainty in spatial classification emerges from the inadequate representation of spatial knowledge, which is often incomplete, imprecise, fragmentary, and ambiguous. The attributes of spatial objects, or the evidence suggesting various conceptual or thematic classes, may often point to conflicting categories. Moreover, classification labels depend on the resolution of observation and the extent of granularity. For example, an observation at coarser granularity offers less detail, while the clumping of information into pixels in remotely sensed images may prevent sub-pixel entities from being distinguished (Fisher, 1997). The classification of land cover from satellite imagery depends not only on the spatial resolution; the radiometric resolution and the corresponding spectral signatures also limit predictive accuracy. Therefore, the spatial characteristics of a given observation may be indiscernible with respect to the attributes associated with it. For example, the number of vegetation types that can be identified from an NDVI (Normalized Difference Vegetation Index) image increases significantly when a very high radiometric resolution is used. Moreover, in a specific case a multispectral image may provide more accuracy than a hyperspectral image, but such accuracy is of little value if it is achieved at the cost of lower specificity or higher imprecision.
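The effect of radiometric resolution on NDVI can be sketched as follows, using the standard definition NDVI = (NIR - Red) / (NIR + Red). The sketch assumes simple linear quantization of reflectance in [0, 1] to k-bit digital numbers and uses hypothetical pixel values; it counts how many distinct NDVI values survive quantization, which bounds how finely classes could be separated by NDVI alone.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def quantize(reflectance: float, bits: int) -> int:
    """Map a reflectance in [0, 1] linearly to a digital number with `bits` of radiometric resolution.

    Python's round() rounds halves to even; that detail does not matter for the comparison.
    """
    return round(reflectance * (2 ** bits - 1))


def distinct_ndvi_values(samples, bits: int) -> int:
    """Number of different NDVI values left after quantizing both bands to `bits`."""
    return len({ndvi(quantize(nir, bits), quantize(red, bits)) for nir, red in samples})


# Hypothetical pixels: red fixed at 0.05, NIR rising from 0.30 to 0.40 in steps of 0.01.
SAMPLES = [(0.30 + 0.01 * i, 0.05) for i in range(11)]

if __name__ == "__main__":
    for bits in (4, 8):
        print(bits, distinct_ndvi_values(SAMPLES, bits))
```

For these samples, the 4-bit version collapses eleven pixels into at most three NDVI values, whereas the 8-bit version keeps all eleven apart.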

3 Typologies of Classification Uncertainty

While there is increasing awareness of uncertainty and its aspects and dimensions in predictive as well as classificatory schemes, little agreement exists among experts on how to characterize them. Many typologies of uncertainty have been suggested from a risk analysis perspective, and these often overlap and build on each other (Ferson & Ginzburg, 1996; Linkov & Burmistrov, 2003; Regan et al., 2002). These typologies distinguish between variability and lack of knowledge at the parameter and model levels. From the geographic information perspective, however, the ontological specification of imperfection in geographic data provides key vocabularies and taxonomies for dealing with spatial uncertainties (Duckham et al., 2001; Worboys & Clementini, 2001). Such an ontology distinguishes between inaccuracy (i.e., errors of commission or omission) and imprecision, which arises from limitations on the granularity of the schema under which an observation is made, or on the level of detail obtainable for that observation (Worboys, 1998). The concept of "vagueness" refers to indeterminate boundary-line cases or "inexact concepts". The classification of geographic objects with indeterminate boundaries poses many challenges (Burrough & Frank, 1996), which arise from the boundaries of many real entities representing natural, social, or cultural phenomena (for example, forests, mountains, or areas of ethnic distribution). Since many common geographical concepts are vague (Fisher, 2000), the explicit specification of vagueness is essential for characterizing classification performance. As a special type of vagueness, nonspecificity originates from our inability to discern the true alternative among several alternatives in a given context. It implies a lack of specificity that can be represented as a function of the cardinality of undiscerned alternatives (Klir & Yuan, 1995).

Managing Uncertainty in Geospatial Predictive Models

The larger the set of alternatives, the higher the nonspecificity. For example, in a remotely sensed image, a pixel labeled with the class type "forest" and a mean annual temperature > 30°C has less nonspecificity than a pixel labeled only with the "forest" type, because in the latter case the pixel can have a large number of possible variations of the "forest" type.
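The Hartley measure, which Klir & Yuan (1995) use as the basic measure of nonspecificity, is log2 of the number of undiscerned alternatives; for a Dempster-Shafer mass function m it generalizes to N(m) = Σ m(A) log2|A| over the focal sets A. A minimal sketch follows; the forest subtype lists are hypothetical stand-ins for the example above.

```python
import math


def hartley_nonspecificity(alternatives) -> float:
    """Hartley measure: log2 of the number of undiscerned alternatives (in bits)."""
    n = len(set(alternatives))
    if n == 0:
        raise ValueError("at least one alternative is required")
    return math.log2(n)


def generalized_nonspecificity(mass: dict) -> float:
    """N(m) = sum over focal sets A of m(A) * log2|A|, for a mass function {frozenset: mass}."""
    return sum(m * math.log2(len(focal)) for focal, m in mass.items() if m > 0)


# Hypothetical: "forest" alone leaves 8 subtypes undiscerned;
# adding the temperature condition leaves only 2 of them.
FOREST_SUBTYPES = [f"forest_subtype_{i}" for i in range(8)]
HOT_FOREST_SUBTYPES = FOREST_SUBTYPES[:2]

if __name__ == "__main__":
    print(hartley_nonspecificity(FOREST_SUBTYPES))
    print(hartley_nonspecificity(HOT_FOREST_SUBTYPES))
```

Here the unconstrained "forest" label leaves 3 bits of nonspecificity, while adding the temperature condition reduces it to 1 bit.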

Broadly, three major categories of uncertainty can be identified in dealing with predictive and classificatory problems: ontological uncertainty, epistemological uncertainty, and deontological or normative uncertainty. The typology illustrated in Figure 1 is relevant mainly to geospatial data and includes many important components and concepts provided in Morgan & Henrion (1998), Finkel (1990), Cullen & Frey (1999), and Haimes (2004). The types presented here are by no means mutually exclusive; some concepts may subtly overlap in a specific context.

Ontologically, variability, also known as aleatory or objective uncertainty, occurs when the object to be classified actually exhibits multiplicity across space, time, and scale. An empirical quantity measured at a single point may objectively manifest multiple aspects in a collective process. For example, land cover classes are influenced not only by seasonal and spatial extent; topographic formations, because of the self-similar features of geological objects, also require a classification scheme to specify fractal dimension. The spurious correlation known as the ecological fallacy, which results from the modifiable areal unit problem (MAUP) (Openshaw, 1984), indicates the need for adequate disaggregation of the spatial data to be analyzed. In image processing, uncertainty often arises from the assignment of more than one class to a pixel. This specialized type of pixel, often known as a mixel, indicates uncertainty resulting from variability. Similarly, variability, or the degree of spatial heterogeneity, is also reflected in measures of landscape fragmentation. The uncertainty stemming from variability cannot be handled by a reductionist approach but needs to be managed through disaggregation of the data. Measures often used to manage this kind of uncertainty include estimating space-time frequency distributions, disaggregation by pixel unmixing or decoupling, estimating entropy as an indicator of fragmentation, computing self-similarity and fractal dimension (Kallimanis et al., 2002), and multiscale and multiresolution analysis using wavelets (Kolaczyk et al., 2005; Nychka et al., 2001).

Figure 1. Types of uncertainty in dealing with geospatial predictive and classificatory systems
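Two of the measures listed above can be sketched briefly. The first function assumes a linear mixing model with only two endmembers, pixel = f·e1 + (1 - f)·e2, and returns the least-squares fraction f under that sum-to-one constraint; the second computes the Shannon entropy of class proportions, one common way of expressing fragmentation. The reflectance values are hypothetical, and operational unmixing normally uses more bands and endmembers.

```python
import math


def unmix_two_endmembers(pixel, e1, e2) -> float:
    """Fraction of endmember e1 in a mixed pixel, assuming pixel = f*e1 + (1 - f)*e2.

    Least-squares solution f = (pixel - e2).(e1 - e2) / |e1 - e2|^2, clipped to [0, 1].
    """
    diff = [a - b for a, b in zip(e1, e2)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("endmembers must differ")
    f = sum((p - b) * d for p, b, d in zip(pixel, e2, diff)) / denom
    return min(1.0, max(0.0, f))


def shannon_entropy(proportions) -> float:
    """Shannon entropy in bits of class proportions; higher values suggest more fragmentation."""
    return -sum(p * math.log2(p) for p in proportions if p > 0)


# Hypothetical (red, NIR) reflectances for two pure covers and one mixel.
FOREST = (0.05, 0.45)
SOIL = (0.25, 0.30)
MIXEL = (0.13, 0.39)   # built as 0.6 * FOREST + 0.4 * SOIL

if __name__ == "__main__":
    f = unmix_two_endmembers(MIXEL, FOREST, SOIL)
    print(round(f, 3), round(shannon_entropy([f, 1 - f]), 3))
```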

While variability is objective and ontological in origin, parameter uncertainty and model uncertainty reflect the epistemic state, or lack of knowledge, in a classificatory scheme. A parameter is an empirical quantity that is measurable in principle and is part of the system components or the construct of a definition. Parameter uncertainty is mainly the result of measurement error and sampling error. For example, the misclassification rate of a land cover classification, measured by the so-called errors of commission or omission, is only as good as the choice of sampling scheme, the systematic bias introduced by the selection of space-time boundary conditions, the level of precision, and other parameters internal to the system. Moreover, the selection of parameters may depend on the degree of variability: a high degree of spatial heterogeneity requires an intensive sampling scheme across multiple scales. Quantitatively, parameter uncertainty can be modeled using a probability distribution based on the statistical variance of the observed error. For example, a Gaussian distribution can be used to predict the relative abundances of different magnitudes of error, or a Monte Carlo simulation can be performed to estimate the effect of error on a digital elevation model (Heuvelink, 1998; Longley et al., 2001).
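The Monte Carlo approach mentioned above can be sketched for the simplest DEM derivative, the slope between two neighbouring cells. The sketch assumes independent, unbiased Gaussian elevation errors with a hypothetical standard deviation of 1 m and ignores the spatial autocorrelation of error that real DEM studies usually model; the elevations and cell spacing are also hypothetical.

```python
import random
import statistics


def slope(z_upper: float, z_lower: float, spacing: float) -> float:
    """Gradient (rise over run) between two DEM cells `spacing` metres apart."""
    return (z_upper - z_lower) / spacing


def monte_carlo_slope(z_upper, z_lower, spacing, sigma, runs=5000, seed=42):
    """Perturb both elevations with independent N(0, sigma^2) errors; return mean and std of slope."""
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    samples = [
        slope(z_upper + rng.gauss(0.0, sigma), z_lower + rng.gauss(0.0, sigma), spacing)
        for _ in range(runs)
    ]
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Hypothetical cells: 110 m and 100 m elevation, 30 m apart, 1 m elevation error.
    mean, std = monte_carlo_slope(110.0, 100.0, 30.0, sigma=1.0)
    print(round(mean, 3), round(std, 3))
```

Under these assumptions the simulated spread can be checked analytically: the standard deviation of the slope is √2·σ/d, about 0.047 for σ = 1 m and d = 30 m.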

Model uncertainty, sometimes called informative uncertainty (van Asselt, 1999), is due to limitations in the ability to represent or model real-world processes with the given data. Although both parameter uncertainty and model uncertainty represent the epistemic or subjective aspect of the state of our knowledge, the line between these two types of uncertainty cannot be sharply drawn, because the choice of model form has implications for a parameter, and the parameter itself can be the output of complex models (Krupnick et al., 2006). Many schemes have been developed to formalize the uncertainty due to the limitations of models. The intolerance of classical probability theory to imprecision has led to many alternative formulations of uncertainty models. For example, traditional classification models such as multi-source classification (Lee & Swain, 1987) or the so-called maximum likelihood classification (Tso & Mather, 2001) allow no room for expressing the modeler's ignorance in the model construct. This has led to new model constructs, such as the interval representation in Dempster-Shafer evidence theory (Shafer, 1976), where all possible subsets of the frame of discernment are candidate classes of the belief function. The belief in a hypothesis is the sum of the masses assigned to all subsets contained in that hypothesis, and its plausibility is the sum of the masses of all subsets that intersect it, i.e., one minus the belief in its complement. The uncommitted belief is assigned to the frame of discernment, thus allowing representation of the modeler's ignorance. The evidential reasoning approach has been adopted for multi-source remotely sensed images (Lee & Swain, 1987; Srinivasan & Richards, 1990; Wilkinson & Megier, 1990).

Rough set theory (Pawlak, 1992), a variant of multivalued logic, has recently been used to model vagueness and imprecision by means of an upper and a lower approximation. Ahlqvist et al. (2000) used a rough set-based classification and accuracy assessment method to construct a rough confusion matrix. In the integration model of rough set theory and evidence theory, the "belief" is extracted from the lower approximation of a set and the "plausibility" from the upper approximation (Skowron & Grzymalla-Busse, 1994). In spatial prediction, this approach was further extended by introducing evidence from spatial neighborhood contexts (Sikder & Gangapadhayay, 2007). Using rough-fuzzy hybridization and the cognitive theory of conceptual spaces, a parameterized representation of classes is modeled as a collection of rough-fuzzy properties, where an attribute itself can be treated as a special case of a concept. In spatial classification, the fuzzy approach is mainly used to provide a flexible way to represent categorical continua (Foody, 1995). In this approach, instead of explicitly defining concept hierarchies, different conceptual structures emerge through measures of concept inclusion and similarity, and fuzzy categorical data are presented in terms of fuzzy membership (Cross & Firat, 2000; Robinson, 2003; Yazici & Akkaya, 2000).
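The belief and plausibility functions and the rough set approximations described above can be sketched together. The mass values, the three-class frame, and the pixel attribute table are hypothetical. In the rough set part, belief and plausibility of the target class are expressed as the fraction of objects in the lower and upper approximations, one common form of the correspondence that Skowron & Grzymalla-Busse (1994) describe.

```python
def belief(mass: dict, hypothesis: frozenset) -> float:
    """Bel(A): total mass of focal sets contained in A."""
    return sum(m for focal, m in mass.items() if focal <= hypothesis)


def plausibility(mass: dict, hypothesis: frozenset) -> float:
    """Pl(A): total mass of focal sets that intersect A, i.e. 1 - Bel(complement of A)."""
    return sum(m for focal, m in mass.items() if focal & hypothesis)


def rough_approximations(attributes: dict, target: set) -> tuple[set, set]:
    """Lower and upper approximations of `target` under the indiscernibility relation.

    Objects with identical attribute tuples form one block and cannot be told apart.
    """
    blocks: dict = {}
    for obj, values in attributes.items():
        blocks.setdefault(values, set()).add(obj)
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:   # block certainly belongs to the target class
            lower |= block
        if block & target:    # block possibly belongs to the target class
            upper |= block
    return lower, upper


# Hypothetical evidence; the mass on the whole frame expresses ignorance.
FRAME = frozenset({"forest", "grass", "water"})
MASS = {frozenset({"forest"}): 0.5, frozenset({"forest", "grass"}): 0.3, FRAME: 0.2}

# Hypothetical pixels described by (NDVI class, slope class); ground truth says p1-p3 are forest.
PIXELS = {
    "p1": ("high", "steep"), "p2": ("high", "steep"),
    "p3": ("high", "flat"), "p4": ("high", "flat"),
    "p5": ("low", "flat"),
}
FOREST_PIXELS = {"p1", "p2", "p3"}

if __name__ == "__main__":
    forest = frozenset({"forest"})
    print(belief(MASS, forest), plausibility(MASS, forest))
    lower, upper = rough_approximations(PIXELS, FOREST_PIXELS)
    print(len(lower) / len(PIXELS), len(upper) / len(PIXELS))
```

With the hypothetical mass function, "forest" has belief 0.5 but plausibility 1.0, because part of the mass is committed only to sets that include forest. In the rough set example, p3 and p4 have identical attributes but different classes, so the forest class has belief 2/5 and plausibility 4/5.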

Deontological or normative uncertainty is associated with the consequentialist paradigm of decisions or value judgments, e.g., in multicriteria classification, risk perception, and preference elicitation. There has been extensive research from a behavioral decision-theoretic perspective to understand human judgment under uncertainty (Tversky & Kahneman, 1974). The heuristics that decision makers use (Kahneman et al., 1982) can lead to biases in many spatial decision-making scenarios, such as watershed prioritization, location or facility planning, and habitat suitability modeling. Uncertainty may also spring from conflicting value-laden terms or preference-ordered criteria (Li et al., 2004; 2005). The preference order induced from a set of attributes may contradict the assignment of risk classes, resulting in potentially paradoxical inferences. Pöyhönen & Hämäläinen (2001) showed that using weights based on the rank order of attributes can easily lead to biases when the structure of a value tree is changed. While it is difficult to extract complete preferential information, research is ongoing on handling information-gap uncertainty in preferences by using the graph model for conflict resolution (Ben-Haim & Hipel, 2002).
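The sensitivity of rank-based weights to value-tree structure can be illustrated with rank-order centroid (ROC) weights, one common rank-based scheme chosen here as an assumption; the sketch does not reproduce the experiments of Pöyhönen & Hämäläinen (2001). The attribute names and both tree structures are hypothetical.

```python
def roc_weights(n: int) -> list[float]:
    """Rank-order centroid weights: w_i = (1/n) * sum_{k=i..n} 1/k, for ranks i = 1..n."""
    return [sum(1.0 / k for k in range(i, n + 1)) / n for i in range(1, n + 1)]


# Flat value tree: three attributes ranked directly against each other.
flat = dict(zip(["slope", "land_cover", "distance_to_water"], roc_weights(3)))

# Same ranking, but the two lower-ranked attributes are grouped under one sub-objective.
top = roc_weights(2)   # ranks: [slope, habitat_context]
sub = roc_weights(2)   # ranks within habitat_context: [land_cover, distance_to_water]
hierarchical = {
    "slope": top[0],
    "land_cover": top[1] * sub[0],
    "distance_to_water": top[1] * sub[1],
}

if __name__ == "__main__":
    for name in flat:
        print(name, round(flat[name], 3), round(hierarchical[name], 3))
```

The ranking is the same in both trees, yet the weight of slope rises from about 0.61 to 0.75 when the other two attributes are grouped under one sub-objective.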

4 Conclusion

Uncertainty in spatial predictive and classificatory systems is an endemic and multifaceted aspect. Recognition of, and agreement on, an appropriate characterization and definition of typologies of uncertainty semantics are prerequisites for efficient handling and management. This article characterizes the objective, subjective, and normative aspects of uncertainty. It specifically differentiates uncertainty resulting from lack of knowledge from uncertainty resulting from objective variability or the intrinsic properties of spatial systems. Various new directions in uncertainty handling mechanisms are discussed. While there are currently many promising directions of research in managing different types of uncertainty, a new paradigm is required in spatial analysis that is fundamentally driven by the consideration of uncertainty.

References

Gap Uncertainty in Preferences. Applied Mathematics and Computation, 126, 319-340.

Burrough, P., & Frank, A. (1996). Geographic Objects with Indeterminate Boundaries. London: Taylor and Francis.

Cross, V., & Firat, A. (2000). Fuzzy objects for geographical information systems. Fuzzy Sets and Systems, 113, 19-36.

Cullen, A. C., & Frey, H. C. (1999). Probabilistic Techniques in Exposure Assessment: A Handbook for Dealing with Variability and Uncertainty in Models and Inputs. New York, NY: Plenum Press.


Duckham, M., Mason, K., Stell, J., & Worboys, M. F. (2001). A formal approach to imperfection in geographic information. Computers, Environment and Urban Systems, 25, 89-103.

Ferson, S., & Ginzburg, L. R. (1996). Different Methods Are Needed to Propagate Ignorance and Variability. Reliability Engineering and System Safety, 54, 133-144.

Finkel, A. M. (1990). Confronting Uncertainty in Risk Management: A Guide for Decision Makers. Washington, DC: Resources for the Future, Center for Risk Management.

Fisher, P. (1997). The pixel: a snare and a delusion. International Journal of Remote Sensing, 18(3), 679-685.

Fisher, P. F. (2000). Sorites paradox and vague geographies. Fuzzy Sets and Systems, 113, 7-18.

Foody, G. M. (1995). Land cover classification by an artificial neural network with ancillary information. International Journal of Geographical Information Systems, 9(5), 527-542.

Haimes, Y. Y. (2004). Risk Modeling, Assessment, and Management. Hoboken, NJ: Wiley.

Han, J., & Kamber, M. (2006). Data Mining: Concepts and Techniques. Boston: Morgan Kaufmann.

Heuvelink, G. (1998). Error Propagation in Environmental Modeling with GIS. London: Taylor and Francis.

Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under Uncertainty: Heuristics and Biases. Cambridge: Cambridge University Press.

Kallimanis, A. S., Sgardelis, S. P., & Halley, J. M. (2002). Accuracy of fractal dimension estimates for small samples of ecological distributions. Landscape Ecology, 17(3), 281-297.

Klir, G., & Yuan, B. (1995). Fuzzy Sets and Fuzzy Logic: Theory and Applications. Pearson.

Krupnick, A., Morgenstern, R., Batz, M., Nelson, P., Burtraw, D., Shih, J., et al. (2006). Not a Sure Thing: Making Regulatory Choices under Uncertainty. U.S. EPA.

Lee, R. T., & Swain, P. H. (1987). Probabilistic and evidential approach for multisource data analysis. IEEE Transactions on Geoscience and Remote Sensing, 25, 283-293.

Li, K. W., Hipel, K. W., Kilgour, D. M., & Noakes, D. (2005). Integrating Uncertain Preferences into Status Quo Analysis with Applications to an Environmental Conflict. Group Decision and Negotiation, 14(6), 461-479.

Li, K. W., Hipel, K. W., Kilgour, D. M., & Fang, L. (2004). Preference Uncertainty in the Graph Model for Conflict Resolution. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 34(4), July.

Linkov, I., & Burmistrov, D. (2003). Model Uncertainty and Choices Made by Modelers: Lessons Learned from the International Atomic Energy Agency Model Intercomparisons. Risk Analysis, 23(6), 1297-1308.

Longley, P. A., Goodchild, M. F., Maguire, D. J., & Rhind, D. W. (2001). Geographic Information Systems and Science. Chichester, UK: John Wiley & Sons.

Morgan, M. G. (1998). Uncertainty Analysis in Risk Assessment. Human and Ecological Risk Assessment, 4(1), 25-39.

Nychka, D., Wikle, C., & Royle, J. A. (2001). Multiresolution models for nonstationary spatial covariance functions. Statistical Modelling, 2(4), 315-331.


Openshaw, S. (1984). The modifiable areal unit problem. In Concepts and Techniques in Modern Geography (Vol. 38). Norwich, UK: Geo Books.

Pawlak, Z. (1992). Rough sets: a new approach to vagueness. New York: John Wiley & Sons.

Pöyhönen, M., & Hämäläinen, R. P. (2001). On the Convergence of Multiattribute Weighting Methods. European Journal of Operational Research, 129(3), 569-585.

Regan, H. M., Colyvan, M., & Burgman, M. A. (2002). A Taxonomy and Treatment of Uncertainty for Ecology and Conservation Biology. Ecological Applications, 12(2), 618-628.

Robinson, V. B. (2003). A perspective on the fundamentals of fuzzy sets and their use in geographic information systems. Transactions in GIS, 7, 3-30.

Shafer, G. (1976). A Mathematical Theory of Evidence. New Jersey: Princeton University Press.

Shekhar, S., & Chawla, S. (2003). Spatial Databases: A Tour. New Jersey: Prentice Hall.

Sikder, I., & Gangapadhayay, A. (2007). Managing Uncertainty in Location Services Using Rough Set and Evidence Theory. Expert Systems with Applications, 32(2), 386-396.

Skowron, A., & Grzymalla-Busse, J. (1994). From rough set theory to evidence theory. In R. Yager, M. Fedrizzi & J. Kacprzyk (Eds.), Advances in the Dempster-Shafer Theory of Evidence (pp. 192-271). New York: John Wiley & Sons, Inc.

Srinivasan, A., & Richards, J. (1990). Knowledge-based techniques for multi-source classification. International Journal of Remote Sensing, 11, 505-525.

Tso, B., & Mather, P. (2001). Classification Methods for Remotely Sensed Data. New York: Taylor & Francis.

Tversky, A., & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science, 185, 1124-1131.

van Asselt, M. (1999). Uncertainty in Decision Support: From Problem to Challenge. Maastricht, The Netherlands: University of Maastricht, International Centre for Integrative Studies (ICIS).

Wilkinson, G. G., & Megier, J. (1990). Evidential reasoning in a pixel classification hierarchy: a potential method for integrating image classifiers and expert system rules based on geographic context. International Journal of Remote Sensing, 11(10), 1963-1968.

Worboys, M. (1998). Imprecision in Finite Resolution Spatial Data. GeoInformatica, 2(3), 257-280.

Worboys, M. F., & Clementini, E. (2001). Integration of imperfect spatial information. Journal of Visual Languages and Computing, 12, 61-80.

Yazici, A., & Akkaya, K. (2000). Conceptual modeling of geographic information system applications. In G. Bordogna & G. Pasi (Eds.), Recent Issues on Fuzzy Databases (pp. 129-151). Heidelberg, New York.

Key Terms

Frame of Discernment: The set of all possible, mutually exclusive hypotheses or class categories.

Imprecision: Lack of specificity or lack of detail in a representation.

Inaccuracy: Lack of correlation between observations and representations of reality.

Indiscernibility: A type of imprecision resulting from our inability to distinguish some elements in reality.

Mixel: A specialized type of pixel whose area is subdivided among more than one class.


Nonspecificity: A form of uncertainty that implies a lack of specificity in evidential claims and can be represented as a function of the cardinality of undiscerned alternatives.

Vagueness: A special type of imprecision that represents borderline cases of a concept.


Chapter XLII

Geographic Visual Query Languages and Ambiguities

Consiglio Nazionale delle Ricerche, IRPPS, Italy

Abstract

The main issues of spatial databases and Geographic Information Systems (GIS) concern the representation, management, and manipulation of a large and complex number of spatial objects and spatial relationships. In these systems many concepts are spatial and are therefore intrinsically related to a visual representation, which also makes it easier for non-expert users to formulate queries. The main problems in visual query languages for spatial databases concern imprecision, spatial integrity, and ambiguities in query formulation. Our concern in this chapter is with the ambiguity of visual geographical queries. In particular, we review existing visual query languages for spatial databases and classify them on the grounds of the methodology adopted to resolve the ambiguity problem.

Introduction

Spatial databases and Geographic Information Systems (GIS) represent, manage, and manipulate a large and complex number of spatial objects and spatial relationships.

Visual queries for spatial databases can be expressed using one of the following four approaches. The first approach uses predefined icons to retrieve pictorial information. Examples of languages that use this approach are Cigales (Calcinelli and Mainguenaud, 1994), the language defined by Lee and Chin (1995), and the card-based language proposed by Ju et al. (2003). The second approach specifies spatial relationships by freehand drawing. Sketch (Meyer, 1993), Spatial-Query-By-Sketch (Egenhofer, 1997; Blaser and Egenhofer, 2000), VISCO (Wessel and Haarslev, 1998), GeoPQL (Ferri and Rafanelli, 2005; Ferri et al., 2004) and, finally, the language proposed by Erwig and Schneider (2003) belong to this approach. The third approach uses symbolic images to represent a set of objects and a set of spatial relations among them. Languages that belong to this approach are Pictorial Query By Example (Papadias and Sellis, 1995), SVIQUEL (Kaushik and Rundensteiner, 1997), and the language proposed by Rahman et al. (2005). Finally, the fourth approach combines text and sketching in a hybrid solution, such as the language proposed by Szmurlo et al. (1998).

The main problems in visual query languages for spatial databases concern imprecision, spatial integrity (Favetta & Laurini, 2001), and ambiguities in query formulation. Some authors have proposed solutions to resolve ambiguities. For example, Favetta and Aufaure-Portier (2000) proposed a taxonomy for classifying different types of ambiguity during query formulation. They state that the best solution for ambiguities is a hybrid language (textual and visual) with a more intensive dialog between user and system. Lbath et al. (1997) have proposed resolving ambiguities through standard semantics using specific menus. They argue that it is possible to define the interpretation of the query, and they suggest a hybrid visual language named Aigle-Cigales, in which the system works with default semantics; details must be stated explicitly through a specific contextual menu or in textual form.

Since ambiguity can represent a restriction for visual languages, it is worthwhile to analyze several language proposals and classify them according to the methodology used to resolve the problem of ambiguity. A first group of languages, such as Pictorial Query By Example and SVIQUEL, deals with ambiguity by allowing only a few operators and/or spatial relationships. A second group disambiguates the language through actions in query formulation that modify the query semantics; the iconic language defined by Lee and Chin (1995) belongs to this group. A third group of languages tries to increase the user's ability to formulate more complex queries by using several operators, without addressing the ambiguity problem; Cigales and LVIS belong to this group. A fourth group of languages proposes approximate solutions as well as the exact answer to the query, enabling the user to select what he or she requires; Sketch and Spatial-Query-By-Sketch are part of this category. Finally, a fifth group of languages, such as GeoPQL, resolves ambiguities by introducing special new operators to manage them.

This chapter is structured as follows. Section 2 gives a brief overview of the approaches used to define visual query languages for spatial databases. Section 3 illustrates the problems of ambiguity treatment in these kinds of visual languages. Section 4 proposes a classification of the different languages on the grounds of the methodology adopted to resolve the problem of ambiguity. Section 5 presents some future perspectives on the growth of visual languages for spatial databases, and conclusions.


Geographic Visual Query Languages and Ambiguities Treatment

Visual Query Languages for Spatial Databases

Several proposals of visual languages for geographic data exist in the literature. The following discussion describes in detail the classification of the languages presented in the introduction.

To conceptually represent geographic objects, different visual query languages consider three types of symbolic graphical objects (SGO): point, polyline and polygon.

The first approach uses predefined icons to retrieve pictorial information. The shortcoming of this method is that predefined icons do not have strong expressive power, so the resulting query capability is limited.

Among the languages based on this approach is Cigales (Calcinelli and Mainguenaud, 1994), which allows the user to draw a query. It is based on the idea of expressing a query by drawing the pattern corresponding to the result desired by the user. To achieve this, it uses a set of icons that model the geometric objects, polyline and polygon (point is not considered), and the operations carried out on these objects (e.g., intersection, inclusion, adjacency, path and distance). Both the symbolic graphical objects and the icons that conceptualize the operators are predefined.

Another language, defined by Lee and Chin (1995), allows the user to compose a query using three symbolic graphical objects: rectangle, line and point. These SGOs can compose an iconic sentence whose meaning is given by the topological relations among the icons of the query. In this language, the user draws a new SGO and can set each previously drawn SGO to the foreground or the background. SGOs whose relationships with the new SGO have to be considered must be placed in the foreground, while SGOs whose relationships are not to be considered are placed in the background.
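The foreground/background metaphor can be illustrated with a minimal Python sketch. It is not Lee and Chin's implementation: SGOs are simplified to axis-aligned rectangles, the relation names and the `IconicQuery` class are hypothetical, and newly drawn SGOs are assumed to start in the foreground. When a new SGO is drawn, only SGOs currently in the foreground contribute a relationship to the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle standing in for an SGO (lines and points omitted)."""
    name: str
    x1: float
    y1: float
    x2: float
    y2: float


def relation(a: Rect, b: Rect) -> str:
    """Coarse topological relation of a with respect to b."""
    if a.x2 < b.x1 or b.x2 < a.x1 or a.y2 < b.y1 or b.y2 < a.y1:
        return "disjoint"
    if (a.x1, a.y1, a.x2, a.y2) == (b.x1, b.y1, b.x2, b.y2):
        return "equal"
    if b.x1 <= a.x1 and a.x2 <= b.x2 and b.y1 <= a.y1 and a.y2 <= b.y2:
        return "inside"
    if a.x1 <= b.x1 and b.x2 <= a.x2 and a.y1 <= b.y1 and b.y2 <= a.y2:
        return "contains"
    if a.x2 == b.x1 or b.x2 == a.x1 or a.y2 == b.y1 or b.y2 == a.y1:
        return "meet"  # boundaries touch, interiors do not intersect
    return "overlap"


class IconicQuery:
    """Hypothetical model of the foreground/background drawing metaphor."""

    def __init__(self):
        self.drawn = []          # SGOs in drawing order
        self.foreground = set()  # names of SGOs currently in the foreground
        self.constraints = []    # (new SGO, earlier SGO, relation) kept in the query

    def send_to_background(self, name):
        self.foreground.discard(name)

    def bring_to_foreground(self, name):
        self.foreground.add(name)

    def draw(self, sgo):
        # Only foreground SGOs contribute a relationship with the new SGO.
        for earlier in self.drawn:
            if earlier.name in self.foreground:
                self.constraints.append((sgo.name, earlier.name, relation(sgo, earlier)))
        self.drawn.append(sgo)
        self.foreground.add(sgo.name)  # assumption: new SGOs start in the foreground


q = IconicQuery()
q.draw(Rect("region", 0, 0, 10, 10))
q.draw(Rect("forest", 8, 8, 14, 14))
q.send_to_background("forest")  # the river/forest relationship is irrelevant
q.draw(Rect("river", -2, 4, 12, 5))
print(q.constraints)  # [('forest', 'region', 'overlap'), ('river', 'region', 'overlap')]
```

Because the forest was sent to the background before the river was drawn, the river/forest relationship (here, disjoint) never enters the query, even though it is visible in the drawing.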

The card-based language proposed by Ju et al. (2003) uses the card iconic metaphor to represent both complex spatial objects and the spatial relationships between them. The user can describe his/her query requirements in a visual environment by selecting the appropriate cards and putting them into the proper query boxes. The result of the query is also displayed in graphical form.

The second approach specifies spatial relationships by freehand drawing. First of all there is Sketch! (Meyer, 1993), which allows the user to draw a visual representation of his/her query, as if on a blackboard, without explicit references to the operators to be applied to the geographical objects involved in the query. In fact, the spatial query is expressed through a sketch on the screen, which is later interpreted by the system. This means that spatial operators are derived directly from the sketch.

The Spatial-Query-By-Sketch language (Egenhofer, 1997; Blaser and Egenhofer, 2000), similarly to Sketch!, is based on a formal model for topological spatial relations and a computational model for constraint relaxation. Each query produces a set of candidate interpretations as a result, and the user selects the correct one.

Another language based on the second approach is VISCO (Wessel and Haarslev, 1998), which considers geometric as well as topological constraints as drawn by the user. The system parses the geometry of query sketches and supports the annotation of meta-information for specifying relaxations, which are additional constraints, or “don’t cares”, that define the query interpretation.

The languages based on the second approach also include GeoPQL (Ferri and Rafanelli, 2005; Ferri et al., 2004), which is based on twelve operators: the nine traditional topological operators, the distance operator and, finally, two further operators, ALIAS and ANY, devoted to solving ambiguities in query interpretation. Only the last two operators need to be expressed explicitly, while the topological operators are automatically deduced from the visual representation of the query.


Finally, the proposal of Erwig and Schneider (2003) consists of a language devoted to analyzing two-dimensional traces of moving objects in order to infer the temporal development of their mutual spatial relationships.

The third approach uses symbolic images to represent a set of objects and a set of spatial relations among them. For instance, Pictorial Query-By-Example (PQBE) (Papadias and Sellis, 1995) uses symbolic images to find directional relationships. This language considers a symbolic image as an array that could correspond to visual scenes, geographical maps or other forms of spatial data. The main limitation of PQBE is that it considers directional relationships only.
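To make the idea of a symbolic image concrete, the following sketch (an illustration, not PQBE itself) stores symbols in a 2D array, reads north/east relations off their cell positions, and treats a database image as an answer when it exhibits every relation shown in the query image. The function names, the `north_of`/`east_of` labels and the rule that each symbol appears at most once are assumptions.

```python
def directional_relations(image):
    """Directional facts (a, 'north_of' | 'east_of', b) encoded by a symbolic image.

    Row 0 is the northernmost row and column 0 the westernmost column.
    Each symbol is assumed to appear at most once in the array.
    """
    positions = {}
    for row, cells in enumerate(image):
        for col, symbol in enumerate(cells):
            if symbol is not None:
                positions[symbol] = (row, col)
    facts = set()
    for a, (ra, ca) in positions.items():
        for b, (rb, cb) in positions.items():
            if a == b:
                continue
            if ra < rb:
                facts.add((a, "north_of", b))
            if ca > cb:
                facts.add((a, "east_of", b))
    return facts


def matches(query_image, scene_image):
    """A scene answers the query if it shows every relation the query shows."""
    return directional_relations(query_image) <= directional_relations(scene_image)


scene = [
    ["lake", None, None],
    [None, None, "town"],
    ["forest", None, None],
]
print(matches([["lake", "town"], ["forest", None]], scene))  # True
print(matches([["town"], ["lake"]], scene))                  # False
```

Since only directional facts are extracted, a query that needs a topological relationship (for example, a lake inside a park) simply cannot be expressed, which mirrors the limitation noted above.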

An evolution of PQBE is SVIQUEL (Kaushik and Rundensteiner, 1997), which also includes topological operators. It considers 45 different types of primitives, allowing the representation of topological and directional relationships between two SGOs of type polygon.

Finally, the proposal by Rahman et al. (2005) allows users to elicit information requirements through the interactive choice of wireless web services.

The fourth approach combines text and sketching in a hybrid solution to define a visual spatial query language. Using this method, users can draw spatial configurations of the objects they would like to retrieve from the GIS, while the textual part permits the specification of the geographical semantics. An example of this approach is the language proposed by Szmurlo et al. (1998). This language allows the user to draw the configuration of the objects he/she is interested in, and thus to define spatial constraints between these objects. Any geographic object can be classified as a zone, a line or a point. Moreover, constraints written in natural language are collected in labels that are graphically connected to the object they refer to.

Query’s Ambiguity Problem

The main problems of visual query languages for spatial databases concern imprecision, spatial integrity and ambiguities in query formulation. Favetta and Aufaure-Portier (2000) detail problems due to topological imprecision and integrity in spatial relations. Some researchers have proposed solutions to resolve ambiguities. For example, Favetta and Aufaure-Portier (2000) have suggested a taxonomy for classifying the different kinds of ambiguity that can arise during query formulation. They state that the best solution to resolve ambiguities is a hybrid (textual and visual) language with a more intensive dialog between user and system.

Another proposal to resolve ambiguities was introduced by Lbath et al. (1997). They proposed standard semantics, selected through specific menus, that make the interpretation of the query possible. They propose a hybrid visual language, Aigle-Cigales, in which the system works with default semantics and details are stated explicitly through a specific contextual menu or in textual form.

Moreover, Carpentier and Mainguenaud (2002) distinguish two ambiguities: visual ambiguity, which appears when a given visual representation of a query corresponds to several interpretations, and selection ambiguity, which appears when several metaphors correspond to a given selection.

To reduce these problems, the authors propose a drawing process with a grammar for interaction: the more powerful the grammar, the higher the level of ambiguity. A compromise needs to be found.

Visual languages offer an intuitive and incremental view of spatial queries, but they may often offer different interpretations of the same query. Among the reasons for multiple query interpretations, the most important is that the user’s intention in formulating his/her query differs from the analysis that the system makes of it. Moreover, when the user draws two icons representing different objects of a sentence, he/she may not want to define one or more spatial relations between them. The system can then formulate different interpretations to obtain the required result.

For example, suppose the user formulates the following query: “Find all the regions which are passed through by a river and overlap a forest”. The user is not interested in the relationship between the river and the forest, and the absence, in the natural language (NL) formulation, of an explicit relationship between them produces an ambiguity. The different visual queries of Figure 1 all represent this natural language query.

To remove the ambiguity, the complete natural language query, “Find all the regions which are passed through by a river and overlap a forest, irrespective of the topological relationships between the river and the forest”, could be considered. However, when the user draws an SGO representing a forest and another representing a river, he/she cannot avoid representing a topological relationship between them.
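The source of this ambiguity can be counted. In the sketch below, the natural language query fixes two of the three object pairs, so every relation a drawing might show between the river and the forest yields a different interpretation. The relation vocabulary (`RELATIONS`) is a hypothetical, illustrative subset and not a complete topological model.

```python
from itertools import combinations, product

# Illustrative subset of relations a drawing can exhibit between two objects;
# not a complete topological model (line/region relations are richer).
RELATIONS = ("disjoint", "meet", "overlap", "inside", "contains")


def unspecified_pairs(objects, query):
    """Object pairs about which the natural language query says nothing."""
    return [pair for pair in combinations(objects, 2) if pair not in query]


def interpretations(objects, query):
    """Yield every full set of pairwise relations compatible with the query."""
    free = unspecified_pairs(objects, query)
    for choice in product(RELATIONS, repeat=len(free)):
        full = dict(query)
        full.update(zip(free, choice))
        yield full


objects = ("region", "river", "forest")
query = {("region", "river"): "crossed_by", ("region", "forest"): "overlap"}
print(unspecified_pairs(objects, query))           # [('river', 'forest')]
print(len(list(interpretations(objects, query))))  # 5

# Any concrete drawing fixes exactly one of them, e.g. a river drawn away from the forest:
drawn = {**query, ("river", "forest"): "disjoint"}
print(drawn in list(interpretations(objects, query)))  # True
```

The user means the whole set of five interpretations, while a drawing can only ever depict one of them; this gap is what the methodologies classified in the next section try to close.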

Since ambiguity can constitute a restriction of visual languages, in the following section we analyze the methodologies adopted by different languages to resolve this problem.

Visual Spatial Query Languages and the Ambiguity Problem

In Section 2 we presented the visual query languages for spatial databases that have been proposed in the literature. Now we classify these languages on the grounds of the methodologies they adopt to resolve the problem of ambiguity.

A first group of languages, such as Pictorial Query By Example and SVIQUEL, handles ambiguity by allowing the use of few operators or spatial relationships. By considering only a limited kind of spatial relations (directional relations), PQBE avoids multiple interpretations of the query, but reduces the possibility of formulating more complex queries that involve topological relationships. The SVIQUEL language also includes topological operators. However, it avoids multiple interpretations by limiting the number of objects involved (to just two), and it provides a tool with low expressive power for specifying relative spatial positions.

A second group disambiguates the language through actions in query formulation that modify the query semantics. The iconic language defined by Lee and Chin (1995) belongs to this group. In this language it is possible to remove undesired relationships among drawn symbolic graphical objects, or to impose an a priori restrictive interpretation, using the foreground/background metaphor. The relationships of a new symbolic graphical object depend on the state (foreground or background) of the previously drawn symbolic graphical objects. To interpret a query, the parser must consider both the visual representation and the drawing process. In this manner some procedural steps influence the semantics of the query without influencing its representation, so queries having the same representation may have different semantics. Another language that belongs to the second group is VISCO. This language offers tools to specify meta-information reflecting the user’s idea of the interpretation. This approach demands more skill from the user but makes the intended interpretation explicit.

Figure 1. Visual queries for the same NL query

A third group of languages tries to increase the user’s ability to formulate more complex queries by offering several operators, without explicitly facing the ambiguity problem. A language that belongs to this group is Cigales. In this language the system is not able to give a unique interpretation of the visual query representation. Two possible solutions have been proposed to reduce the ambiguity of Cigales: introducing various interactions (feedback) with the user, and increasing the complexity of the resolution model. However, other obstacles may arise: the semantics of the query become fully user-dependent, and complex queries with numerous basic objects are not expressible. Another language of the third group is the one proposed by Szmurlo et al. (1998). This language allows the user to draw the configuration of the objects, and thus to define both spatial constraints between these objects and thematic constraints that each object has to respect. However, as this language gives the user great freedom, many ambiguities may arise from incoherencies between an object and its thematic constraints, between constraints on the same object, or between constraints on different objects. A solution to this ambiguity consists of detecting the incoherencies and proposing possible solutions to the user.

A fourth group of languages proposes approximate solutions of the query to the user, who then selects what he/she requires. Sketch! and Spatial-Query-By-Sketch are part of this category. In particular, Spatial-Query-By-Sketch resolves the ambiguity problem by considering and proposing to the user both the exact solution of the query, if possible, and other approximate solutions obtained by relaxing some relationships. In this manner the language includes multiple interpretations in the result, and the user selects the representation that provides the correct interpretation of his/her query.
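Returning exact and approximate answers together can be sketched as a ranking. Purely as an assumption, relations are placed on a simple chain and a candidate's cost is the total number of chain steps between its relations and the sketched ones; real conceptual neighborhood graphs branch, and Spatial-Query-By-Sketch relies on its own formal models. All names below are hypothetical.

```python
# Simplified conceptual neighborhood: a chain in which adjacent relations are
# one "step" apart. Real neighborhood graphs branch; this is only illustrative.
CHAIN = ("disjoint", "meet", "overlap", "covered_by", "inside")


def relation_distance(r1, r2):
    return abs(CHAIN.index(r1) - CHAIN.index(r2))


def rank_candidates(query, candidates, max_cost=None):
    """Score each candidate scene by how far it is from the sketched relations.

    query:      {(obj_a, obj_b): relation} derived from the user's sketch
    candidates: {scene_name: {(obj_a, obj_b): relation}}
    Cost 0 is an exact match; larger costs are increasingly relaxed answers.
    """
    scored = []
    for name, scene in candidates.items():
        cost = sum(relation_distance(rel, scene[pair]) for pair, rel in query.items())
        if max_cost is None or cost <= max_cost:
            scored.append((cost, name))
    return sorted(scored)


query = {("lake", "park"): "inside"}
candidates = {
    "A": {("lake", "park"): "inside"},
    "B": {("lake", "park"): "covered_by"},
    "C": {("lake", "park"): "disjoint"},
}
print(rank_candidates(query, candidates))              # [(0, 'A'), (1, 'B'), (4, 'C')]
print(rank_candidates(query, candidates, max_cost=1))  # [(0, 'A'), (1, 'B')]
```

The user then picks from the ranked list; `max_cost` controls how far the relaxation is allowed to go.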

Finally, a fifth group of languages resolves the ambiguity problem by introducing new special operators to manage ambiguity. Among the languages belonging to this group is GeoPQL, which allows the user to represent only the desired relationships. The system interprets the query by considering all the relationships between the symbolic graphical objects of the sketch, and it is possible to remove or modify some undesired relations using ad hoc operators introduced in the language. Figure 2 shows a few examples of visual queries represented using some of the languages introduced in this article.
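The effect of an operator such as ANY can be sketched as follows: the system first reads every pairwise relation off the drawing, then the pairs the user marks as “don’t care” are dropped before matching. This is a simplified, hypothetical model of the idea, not GeoPQL’s formal semantics; ALIAS is not modeled, and the relation names are illustrative.

```python
def effective_constraints(sketch_relations, any_pairs):
    """Drop the relations the user marked with an ANY-style 'don't care' operator."""
    def marked(pair):
        return pair in any_pairs or pair[::-1] in any_pairs
    return {pair: rel for pair, rel in sketch_relations.items() if not marked(pair)}


def satisfies(scene, constraints):
    """True if the scene exhibits every remaining constraint."""
    return all(scene.get(pair) == rel for pair, rel in constraints.items())


# Relations the system reads off the sketch: all pairs, including river/forest.
sketch = {
    ("river", "region"): "crosses",
    ("region", "forest"): "overlap",
    ("river", "forest"): "disjoint",
}
# A database scene in which the river also crosses the forest.
scene = {
    ("river", "region"): "crosses",
    ("region", "forest"): "overlap",
    ("river", "forest"): "crosses",
}
print(satisfies(scene, sketch))   # False: the drawn river/forest relation is too strict
relaxed = effective_constraints(sketch, any_pairs={("forest", "river")})
print(satisfies(scene, relaxed))  # True
```

Marking a pair in either order removes it, so the user does not need to remember the direction in which the objects were drawn.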

Conclusion

The number of applications using spatial or graphic data has been increasing steadily over the last decade. New small-scale GISs, often called desktop GIS, are gradually becoming available, and people are becoming familiar with using the Web to access remote information. An important future direction is to provide existing systems through the Web by using standards created by the OpenGIS consortium, such as WMS (Web Map Service) and GML (Geography Markup Language). In particular, GML is an open standard for encoding geographic information in the eXtensible Markup Language (XML). It is not tied to any specific hardware or software platform: any data encoded with it can easily be read and understood by any programming language and software system able to parse XML streams. Moreover, there are commercial and open source tools that translate GML data into the Scalable Vector Graphics (SVG) format in order to display maps.
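As an example of reading GML with a general-purpose XML parser, the sketch below uses Python’s standard `xml.etree.ElementTree` module to extract point and line coordinates. The feature element names (`Features`, `Town`, `Road`) are invented for illustration, coordinates are assumed to be two-dimensional, and the namespace URI is the one used by GML 2 and 3.0/3.1 (GML 3.2 documents use a different URI).

```python
import xml.etree.ElementTree as ET

GML = {"gml": "http://www.opengis.net/gml"}

# Hypothetical feature collection; element names outside the gml namespace are invented.
SAMPLE = """<Features xmlns:gml="http://www.opengis.net/gml">
  <Town><gml:Point><gml:pos>45.07 7.69</gml:pos></gml:Point></Town>
  <Road><gml:LineString><gml:posList>0 0 10 5 20 5</gml:posList></gml:LineString></Road>
</Features>"""


def read_geometries(xml_text):
    """Return point and line coordinates as lists of (x, y) tuples."""
    root = ET.fromstring(xml_text)
    points = [
        tuple(float(v) for v in pos.text.split())
        for pos in root.iterfind(".//gml:Point/gml:pos", GML)
    ]
    lines = []
    for pos_list in root.iterfind(".//gml:LineString/gml:posList", GML):
        values = [float(v) for v in pos_list.text.split()]
        lines.append(list(zip(values[0::2], values[1::2])))  # assumes 2D coordinates
    return points, lines


points, lines = read_geometries(SAMPLE)
print(points)  # [(45.07, 7.69)]
print(lines)   # [[(0.0, 0.0), (10.0, 5.0), (20.0, 5.0)]]
```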

Consequently, the diffusion of the World Wide Web and the increasingly widespread use of mobile devices have made it possible to query and access online geographical databases from mobile devices. There are two main drawbacks in using GML on mobile devices. First, it is memory- and bandwidth-consuming for storage and transfer, respectively. Second, maps described with it have to be projected and scaled before being plotted. These characteristics make GML not directly accessible from small devices, and GML size must be reduced to make cartographic data accessible from mobile devices.
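The scaling step mentioned above can be sketched as follows, assuming the coordinates are already projected onto a plane. The points are mapped uniformly into a square viewport, the y axis is flipped for SVG, and the values are rounded to integers, which is one simple (lossy) way to shorten the transmitted text. The function names and parameters are hypothetical.

```python
def to_viewport(coords, size):
    """Scale projected (planar) coordinates into a size x size pixel square.

    A uniform scale keeps the map's aspect ratio; the y axis is flipped because
    SVG's y grows downwards; rounding to integers shortens the output text.
    """
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    min_x, max_y = min(xs), max(ys)
    span = max(max(xs) - min_x, max_y - min(ys)) or 1.0
    scale = size / span
    return [(round((x - min_x) * scale), round((max_y - y) * scale)) for x, y in coords]


def svg_polyline(points):
    pts = " ".join(f"{x},{y}" for x, y in points)
    return f'<polyline points="{pts}" fill="none" stroke="black"/>'


road = [(1000.0, 2000.0), (1100.0, 2000.0), (1100.0, 2050.0)]
print(svg_polyline(to_viewport(road, 100)))
# <polyline points="0,50 100,50 100,0" fill="none" stroke="black"/>
```

Doing this conversion on the server means the mobile client receives small, ready-to-plot integers instead of full-precision GML coordinates.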

Another future perspective concerns the representation in GISs of dynamic phenomena that change over space and time. While current GISs can provide snapshot views at discrete time intervals, they fall short in providing the ability to link process models with data from multiple sources, or to simulate scenarios of change for users.

References

Blaser, A. D., & Egenhofer, M. J. (2000). A Visual Tool for Querying Geographic Databases. Advanced Visual Interfaces – AVI 2000 (pp. 211-216). Palermo, Italy: ACM Press.

Calcinelli, D., & Mainguenaud, M. (1994). Cigales, a visual language for geographic information system: the user interface. Journal of Visual Languages and Computing, 5(2), 113-132.

Carpentier, C., & Mainguenaud, M. (2002). Classifying Ambiguities in Visual Spatial Languages. GeoInformatica, 6(3), 285-316.

Figure 2. The same visual query represented with different languages

Egenhofer, M. J. (1997). Query Processing in Spatial-Query-by-Sketch. Journal of Visual Languages and Computing, 8(4), 403-424.

Erwig, M., & Schneider, M. (2003). A visual language for the evolution of spatial relationships and its translation into a spatio-temporal calculus. Journal of Visual Languages and Computing, 14, 181-211. Elsevier.

Favetta, F., & Laurini, R. (2001). About Precision and Integrity in Visual Query Languages for Spatial Databases. Proceedings of the 7th International Conference on Database Systems for Advanced Applications (DASFAA 2001) (pp. 286-293). IEEE Computer Society.

Favetta, F., & Aufaure-Portier, M. (2000). About Ambiguities in Visual GIS Query Languages: a Taxonomy and Solutions. Proceedings of the 4th International Conference on Advances in Visual Information Systems, LNCS 1929 (pp. 154-165). Springer-Verlag.

Ferri, F., Grifoni, P., & Rafanelli, M. (2004). XPQL: A pictorial language for querying geographic data. Databases and Expert Systems Applications (DEXA 2004), LNCS 3180 (pp. 925-935). Springer-Verlag.

Ferri, F., & Rafanelli, M. (2005). GeoPQL: A Geographical Pictorial Query Language that resolves ambiguities in query interpretation. Journal of Data Semantics, 3, LNCS 3534, 50-80. Springer-Verlag.

Ju, S., Guo, W., & Hernández, H. J. (2003). A Card-based Visual Query System for Geographical Information Systems. Scandinavian Research Conference on Geographical Information Science (pp. 62-74).

Kaushik, S., & Rundensteiner, E. (1997). SVIQUEL: A Spatial Visual Query and Exploration Language. Databases and Expert Systems Applications (DEXA 1998), LNCS 1460 (pp. 290-299). Springer-Verlag.

Lbath, A., Aufaure-Portier, M., & Laurini, R. (1997). Using a Visual Language for the Design and Query in GIS Customization. International IEEE Conference on Visual Information Systems (pp. 197-204). San Diego, CA.

Lee, Y. C., & Chin, F. (1995). An Iconic Query Language for Topological Relationship in GIS. International Journal of Geographical Information Systems, 9(1), 25-46.

Meyer, B. (1993). Beyond Icons: Towards New Metaphors for Visual Query Languages for Spatial Information Systems. International Workshop on Interfaces to Database Systems (pp. 113-135). Glasgow, Scotland.

Papadias, D., & Sellis, T. (1995). A Pictorial Query-by-Example Language. Journal of Visual Languages and Computing, 6(1), 53-72.

Rahman, S. A., Bhalla, S., & Hashimoto, T. (2005). Query-By-Object Interface for Information Requirement Elicitation Implementation. Fourth International Conference on Mobile Business (ICMB 2005). IEEE Computer Society, Sydney.

Wessel, M., & Haarslev, V. (1998). VISCO: Bringing Visual Spatial Querying to Reality. Proceedings of the IEEE Symposium on Visual Languages (pp. 170-179). Halifax, Canada: IEEE Computer Society.

Key Terms

eXtensible Markup Language (XML): A W3C-recommended general-purpose markup language for creating special-purpose markup languages, capable of describing many different kinds of data. In other words, XML is a way of describing data.

Geographical Database: A database in which geographic information is stored as x-y coordinates of single points or of points which identify the boundaries of lines (or polylines, which sometimes represent the boundaries of polygons). Different attributes characterize the objects stored in these databases. In general, the storage structure consists of “classes” of objects, each of them implemented by a layer. Geographic databases often include raster, topological vector, image processing, and graphics production functionality.

Geographical Information System (GIS): A computerized database system used for the capture, conversion, storage, retrieval, analysis and display of spatial objects.

Geography Markup Language (GML): The XML grammar defined by the Open Geospatial Consortium (OGC) to express geographic features. GML serves as a modeling language for geographic systems as well as an open interchange format for geographic transactions on the Internet.

Icon: Small pictures that represent commands, files, objects or windows.

Metaphor: Figurative language that creates an analogy between two unlike things. A metaphor does not make a comparison, but creates its analogy by representing one thing as something else.

Scalable Vector Graphics (SVG): A language for describing two-dimensional graphics and graphical applications in XML.

Visual Query Language: A language that allows the user to specify his/her goals in a two- (or more) dimensional way with visual expressions: spatial arrangements of textual and graphical symbols.

Chapter XLIII

GeoCache: A Cache for GML Geographical Data

PRiSM Laboratory, France

Abstract

GML is a promising model for integrating geodata within data warehouses. The resulting databases are generally large and require spatial operators to be handled. Depending on the size of the target geographical data and on the number and complexity of operators in a query, the processing time may quickly become prohibitive. To optimize spatial queries over GML-encoded data, this chapter introduces a novel cache-based architecture. A new cache replacement policy is then proposed. It takes into account the containment properties of geographical data and predicates, and allows the most irrelevant values to be evicted from the cache. Experiments with the GeoCache prototype show the effectiveness of the proposed architecture and the associated replacement policy compared to existing works.

The increasing accumulation of geographical data and the heterogeneity of Geographical Information Systems (GISs) make efficient query processing in distributed GIS difficult. Novel architectures (Boucelma, Messid, & Lacroix, 2002; Chen, Wang, & Rundensteiner, 2004; Corocoles & Gonzalez, 2003; Gupta, Marciano, Zaslavsky, & Baru, 1999; Leclercq, Djamal, & Yétongnon, 1999; Sindoni, Tininini, Ambrosetti, Bedeschi, De Francisci, Gargano, Molinaro, Paolucci, Patteri, & Ticca, 2001; Stoimenov, Djordjevic-Kajan, & Stojanovic, 2000; Voisard & Juergens, 1999; Zhang, Javed, Shaheen, & Gruenwald, 2001) are based on XML, which has become a standard for exchanging data between heterogeneous sources. Proposed by OpenGIS (2003), GML is an XML encoding for the modeling, transport, and storage of geographical information, including both the spatial and non-spatial fragments of geographical data (called features). As stressed in (Savary & Zeitouni, 2003), we believe that GML is a promising model for geographical data mediation and warehousing purposes.

By their nature, geographical data are large, so GML documents are often of considerable size. The processing time of geographical queries over such documents in a data warehouse can become too large for several reasons:

1. The query evaluator needs to parse entire documents to find and extract query-relevant data.

2. Spatial operators are not cost-effective, especially if the query contains complex selections and joins on large GML documents.

Moreover, the computational costs of spatial operators are generally higher than those of standard relational operators. Thus, geographical queries on GML documents raise the problem of memory and CPU consumption. To solve this problem, we propose to exploit the specificities of a semantic cache (Dar, Franklin, Jonsson, Srivastava, & Tan, 1996) with an optimized data structure. The proposed structure aims at considerably reducing memory space by avoiding the storage of redundant values. Furthermore, a new cache replacement policy is proposed. It keeps the most relevant data in cache for better efficiency.

Related works generally focus on spatial data stored in object-relational databases (Beckmann, Kriegel, Schneider, & Seeger, 1990). The proposed cache organizations are better suited to tuple-oriented data structures (Brinkhoff, 2002). Most cache replacement policies are based on Least Recently Used (LRU) and its variants. Other cache replacement policies proposed in the literature (Arlitt, Friedrich, Cherkasova, Dilley, & Jin, 1999; Cao & Irani, 1997; Lorenzetti & Rizzo, 1996) deal with relational or XML databases, but have not yet been investigated in the area of XML spatial databases.

The rest of the chapter is organized as follows. The second section gives an overview of related works. In the third section we present our cache architecture adapted for GML geographical data. The fourth section discusses the inference rules of spatial operators and presents an efficient replacement policy for geographical data that takes inference between spatial operators into account. The fifth section shows some results of the proposed cache implementation and replacement policy. Finally, the conclusion summarizes our contributions and points out the main advantages of the proposed GML cache-based architecture.

Related Works

Cache Replacement Policy

In the literature, several approaches have been proposed for cache replacement policies. The best known is Least Recently Used (LRU) (Tanenbaum, 1992). This algorithm replaces the document requested least recently. Conversely, the Least Frequently Used (LFU) algorithm evicts the document accessed least frequently. Many extensions and variations have been proposed in the context of WWW proxy caching algorithms; we review some of them below. LRU-Threshold (Chou & DeWitt, 1985) is a simple extension of LRU in which documents larger than a given threshold size are never cached. LRU-K (O’Neil, O’Neil, & Weikum, 1993) considers the times of the last K references to a page and uses this information to make page-replacement decisions: the page to be dropped is the one with the maximum backward K-distance (the time elapsed since its K-th most recent reference) among all pages in the buffer. Log(size)+LRU (Abrams, Standbridge, Adbulla, Williams, & Fox, 1995) evicts the document with the largest log(size), and applies LRU in case of equality. The Size algorithm evicts the largest document. The Hybrid algorithm aims at reducing the total latency time by computing a function that estimates the value of keeping a page in cache. This function takes into account the time to connect to a server, the network bandwidth, the use frequency of the cached result, and the size of the document. The document with the smallest function value is then evicted. The Lowest Relative Value (LRV) algorithm includes the cost and the size of a document in estimating the utility of keeping it in cache (Lorenzetti & Rizzo, 1996). LRV evicts the document with the lowest utility value.
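Several of the victim-selection rules above can be compared side by side in a short Python sketch. The `Entry` record, the use of integer timestamps, and the use of floor(log2(size)) for Log(size)+LRU (so that ties can actually occur) are assumptions of this illustration; the Hybrid and LRV algorithms are omitted because their utility functions need network and cost parameters not given here.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Entry:
    key: str
    size: int                                    # bytes
    history: list = field(default_factory=list)  # access times, oldest first

    @property
    def last_access(self):
        return self.history[-1]

    @property
    def frequency(self):
        return len(self.history)


def lru(entries, now):
    return min(entries, key=lambda e: e.last_access)


def lfu(entries, now):
    return min(entries, key=lambda e: e.frequency)


def size_policy(entries, now):
    return max(entries, key=lambda e: e.size)


def log_size_lru(entries, now):
    # Largest floor(log2(size)) first; among equals, the least recently used.
    return max(entries, key=lambda e: (math.floor(math.log2(e.size)), -e.last_access))


def lru_k(entries, now, k=2):
    # Backward K-distance: time since the K-th most recent access (infinite if fewer).
    def backward_k_distance(e):
        return now - e.history[-k] if len(e.history) >= k else math.inf
    return max(entries, key=backward_k_distance)


def lru_threshold_admits(size, threshold):
    # LRU-Threshold: documents above the threshold are never cached.
    return size <= threshold


cache = [
    Entry("a", 500, [1, 2, 3, 4]),
    Entry("b", 40_000, [9]),
    Entry("c", 33_000, [5, 6]),
]
for policy in (lru, lfu, size_policy, log_size_lru, lru_k):
    print(policy.__name__, policy(cache, now=10).key)
# prints, one per line: lru a, lfu b, size_policy b, log_size_lru c, lru_k b
```

The same three entries produce different victims under different policies: b and c fall in the same log2 size class, so Log(size)+LRU falls back on recency and evicts c.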

One of the most successful algorithms is Greedy Dual-Size (GD-Size), introduced by Cao and Irani (1997). It takes into account the cost and the size of a new object. When a new object arrives, the algorithm increases the ranking of the new object by the ranking of the last removed object, so that objects which are not accessed again gradually lose priority relative to newly cached ones. In the same spirit, the Greedy Dual-Size Frequency (GDSF) algorithm proposed by Arlitt et al. (1999) takes into account not only the size and the cost, but also the frequency of accesses to objects. As an enhancement of GDSF, Yang, Zhang, and Zhang (2003) introduce a time factor. Combined with a Taylor series, it allows the time of the next access to an object to be predicted. It thus provides a more accurate prediction of future access trends when access patterns vary greatly. However, the main bottleneck of this approach is the time needed to recalculate the priority of each object.
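GreedyDual-Size can be sketched with an inflation value L. Each cached object gets the priority H = L + cost/size, the object with the smallest H is evicted, and L is raised to the evicted object's H, so objects that are not re-accessed gradually lose ground. Multiplying cost/size by the access count gives the GDSF variant. The class and field names are hypothetical, and costs are abstract numbers.

```python
class GreedyDualSize:
    """GreedyDual-Size; with use_frequency=True it behaves like GDSF."""

    def __init__(self, capacity, use_frequency=False):
        self.capacity = capacity
        self.use_frequency = use_frequency
        self.inflation = 0.0   # L: priority of the most recently evicted object
        self.entries = {}      # key -> {"h": priority, "size", "cost", "freq"}
        self.used = 0

    def _priority(self, size, cost, freq):
        weight = freq if self.use_frequency else 1
        return self.inflation + weight * cost / size

    def access(self, key, size, cost):
        """Record an access and return the keys evicted to make room."""
        if key in self.entries:                      # hit: refresh the priority
            e = self.entries[key]
            e["freq"] += 1
            e["h"] = self._priority(e["size"], e["cost"], e["freq"])
            return []
        if size > self.capacity:                     # can never fit
            return []
        evicted = []
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda k: self.entries[k]["h"])
            self.inflation = self.entries[victim]["h"]
            self.used -= self.entries.pop(victim)["size"]
            evicted.append(victim)
        self.entries[key] = {"h": self._priority(size, cost, 1),
                             "size": size, "cost": cost, "freq": 1}
        self.used += size
        return evicted


cache = GreedyDualSize(capacity=100)
print(cache.access("a", size=50, cost=100))  # []     H(a) = 0 + 100/50 = 2.0
print(cache.access("b", size=40, cost=10))   # []     H(b) = 0 + 10/40 = 0.25
print(cache.access("c", size=30, cost=30))   # ['b']  L = 0.25, H(c) = 1.25
print(cache.access("d", size=40, cost=4))    # ['c']  L = 1.25, H(d) = 1.35
```

Object a survives although it was cached first and never reused, because its cost per byte is high; as L keeps rising, however, it would eventually be overtaken by newer objects.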

Spatial Cache Replacement Policy

Most proposed spatial cache replacement policies are based on variants of LRU and were developed in the context of relational databases. In the area of spatial database systems, the effect of other page-replacement strategies has not been investigated, except in (Brinkhoff, 2002).

Considering the pages managed by a spatial database system, one can distinguish three categories of pages (Brinkhoff, Horn, Kriegel, & Schneider, 1993): directory pages (descriptors), data pages (classical information), and object pages (storing the exact representation of spatial objects). Using type-based LRU (LRU-T), object pages are evicted first, followed by data pages, and finally by directory pages. Using primitive-based LRU (LRU-P), pages are removed from the buffer according to their respective priorities. If a tree-based spatial access method is used, priority decreases from the root through the index directory pages, then the data pages, and finally the object pages. Thus, the priority of a page depends on its height in the tree.
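The two LRU variants differ only in the key used to pick a victim. In this sketch each buffered page is a tuple; for LRU-P, the depth of a page in the tree stands in for its priority (root = 0, with object pages assumed to sit one level below the data pages). The tuple layout and function names are assumptions.

```python
# Eviction order for type-based LRU: object pages first, directory pages last.
TYPE_RANK = {"object": 0, "data": 1, "directory": 2}


def lru_t_victim(pages):
    """pages: iterable of (page_id, page_type, last_access)."""
    return min(pages, key=lambda p: (TYPE_RANK[p[1]], p[2]))[0]


def lru_p_victim(pages):
    """pages: iterable of (page_id, depth, last_access).

    depth is the distance from the root of the spatial index (root = 0);
    object pages are assumed to sit one level below the data pages.
    The deepest, i.e. lowest-priority, page is evicted; ties fall back to LRU.
    """
    return max(pages, key=lambda p: (p[1], -p[2]))[0]


buffer_t = [("root", "directory", 1), ("d7", "data", 2),
            ("o3", "object", 9), ("o4", "object", 5)]
print(lru_t_victim(buffer_t))  # o4: an object page, the least recently used one

buffer_p = [("root", 0, 1), ("n1", 1, 3), ("leaf", 2, 2), ("obj", 3, 8)]
print(lru_p_victim(buffer_p))  # obj: deepest page, even though it was used last
```

In both variants the root stays in the buffer although it is the least recently used page, which is the point of favoring pages near the top of the index.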

Let us recall that, in GIS jargon, the MBR of an object is the minimum bounding rectangle of this object. The area of a page of objects is the area of the minimum rectangle including all MBRs of that page. The margin is the perimeter (border length) of such a rectangle. Beckmann et al. (1990) and Brinkhoff (2002) define five spatial page-replacement algorithms based on spatial criteria:

1 Maximizing the area of a page (A): A page

with a large area should stay in the buffer as long as possible This result from the obser-vation that the larger is the area, the more frequently the page should be requested

2 Maximizing the area of the entries of a

page (EA): Instead of the area of a page, the

sum of the area of its entries (spatial objects)

is maximized

3 Maximizing the margin of a page (M):

The margin of a page p is defined as the margin of the MBR containing all entries

of p The larger a page margin is, the longer

it will stay in the buffer

4. Maximizing the margin of the entries of a page (EM): Instead of the margin of page p itself, the sum of the margins of its constituent MBRs is considered.

5. Maximizing the overlaps between the entries of a page (EO): This algorithm tries to maximize the sum of the intersection areas of all pairs of entries with overlapping MBRs.
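The five criteria can be made concrete with a small sketch. It assumes axis-aligned MBRs given as `(xmin, ymin, xmax, ymax)` tuples and a page represented simply as the list of its entry MBRs; the function names and the `CRITERIA` table are hypothetical. Because pages with large values are kept longest, the victim is taken to be the page with the smallest value.

```python
from itertools import combinations

# An MBR is (xmin, ymin, xmax, ymax); a page is a list of entry MBRs.


def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r):
    # Perimeter of the rectangle (sum of its edge lengths).
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def page_mbr(entries):
    """Minimum rectangle enclosing all entry MBRs of a page."""
    return (min(e[0] for e in entries), min(e[1] for e in entries),
            max(e[2] for e in entries), max(e[3] for e in entries))


def intersection_area(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0


CRITERIA = {
    "A": lambda es: area(page_mbr(es)),
    "EA": lambda es: sum(area(e) for e in es),
    "M": lambda es: margin(page_mbr(es)),
    "EM": lambda es: sum(margin(e) for e in es),
    # Non-overlapping pairs contribute 0, so summing over all pairs is equivalent.
    "EO": lambda es: sum(intersection_area(a, b) for a, b in combinations(es, 2)),
}


def spatial_victim(pages, criterion):
    """Return the id of the page with the smallest value for the criterion."""
    score = CRITERIA[criterion]
    return min(pages, key=lambda pid: score(pages[pid]))


pages = {
    "p1": [(0, 0, 1, 1), (1, 1, 2, 2)],  # two unit squares touching at a corner
    "p2": [(0, 0, 2, 2), (1, 1, 3, 3)],  # two larger squares overlapping by 1
}
victim = spatial_victim(pages, "A")  # "p1": its page MBR has area 4 versus 9
```

In this example `p1` scores lower than `p2` under all five criteria, so it would be evicted whichever criterion is chosen; on real data the criteria generally disagree.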

As a synthesis, Brinkhoff (2002) proposes a combination of LRU-based and spatial page-replacement algorithms. To evict a page, a set of victim candidates is first determined using the LRU strategy. Then, the page to be dropped from the buffer is selected from the candidate set using a spatial page-replacement algorithm. The page dropped by this selection is placed in an overflow buffer, from which victims are evicted using the FIFO strategy. Depending on its spatial and LRU criteria, a requested page found in the overflow buffer is moved back to the standard part of the buffer, which influences the size of the candidate set.
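The two-stage selection with an overflow area can be sketched as follows. This is a simplified illustration under stated assumptions, not Brinkhoff's algorithm: the class `CombinedBuffer`, its parameters, and the caller-supplied `score` function are hypothetical; every hit in the overflow buffer is simply promoted; and the adaptive resizing of the candidate set described above is omitted (the candidate-set size is fixed).

```python
from collections import OrderedDict


class CombinedBuffer:
    """LRU candidate selection + spatial victim choice + FIFO overflow buffer."""

    def __init__(self, main_capacity, overflow_capacity, candidates, score):
        self.main = OrderedDict()      # page_id -> page, least recently used first
        self.overflow = OrderedDict()  # page_id -> page, oldest insertion first
        self.main_capacity = main_capacity
        self.overflow_capacity = overflow_capacity
        self.candidates = candidates   # fixed size of the LRU candidate set
        self.score = score             # spatial criterion; lowest score is evicted

    def request(self, page_id, load):
        """Return the page, calling load(page_id) only on a full miss."""
        if page_id in self.main:
            self.main.move_to_end(page_id)
            return self.main[page_id]
        if page_id in self.overflow:
            page = self.overflow.pop(page_id)  # overflow hit: promote the page
        else:
            page = load(page_id)
        self._insert_main(page_id, page)
        return page

    def _insert_main(self, page_id, page):
        if len(self.main) >= self.main_capacity:
            # Stage 1: the LRU strategy selects the candidate set.
            lru_ids = list(self.main)[: self.candidates]
            # Stage 2: the spatial criterion selects the victim among them.
            victim = min(lru_ids, key=lambda pid: self.score(self.main[pid]))
            self._insert_overflow(victim, self.main.pop(victim))
        self.main[page_id] = page

    def _insert_overflow(self, page_id, page):
        if len(self.overflow) >= self.overflow_capacity:
            self.overflow.popitem(last=False)  # FIFO: drop the oldest entry
        self.overflow[page_id] = page
```

Restricting the spatial choice to the LRU candidates keeps a recently used page with a poor spatial score from being evicted immediately, while the overflow buffer gives a wrongly evicted page a second chance before it must be reloaded.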

Buffer cache techniques are mainly used in spatial database systems to optimize query response time. The work conducted by Brinkhoff uses a spatial index for better management of the buffer cache. However, there is no spatial index for GML documents, as they are encoded in XML. Hence, the spatial criteria mentioned above cannot be applied, and other criteria must be considered to handle geographical queries. Moreover, semantic caching performs better than page or tuple replacement strategies (Dar et al., 1996), but until now it has not really been studied for geographical queries where data are stored in XML.

Cache Structure for Geographical Queries in GML

Generally, spatial data consume a lot of memory space. Hence, caching spatial objects tends to flood the available cache space. For example, in a spatial join query, a spatial object A can match several objects B1, B2, etc. Thus, the same object A can be replicated many times in the spatial query results. This may considerably reduce the available cache space, especially when a large number of spatial fragments must be stored.
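A small worked count shows how quickly join results replicate geometries. The sketch assumes a hypothetical join result in which object A matches B1, B2 and B3, and compares storing both geometries of every result tuple with storing each distinct spatial object once.

```python
# Hypothetical spatial-join result: object A matches B1, B2 and B3.
join_result = [("A", "B1"), ("A", "B2"), ("A", "B3")]


def geometry_copies(pairs):
    """Geometries stored if every result tuple keeps its own copy of both objects."""
    return 2 * len(pairs)


def distinct_geometries(pairs):
    """Geometries stored if each spatial object is cached only once."""
    return len({obj for pair in pairs for obj in pair})


print(geometry_copies(join_result))      # 6 (A is stored three times)
print(distinct_geometries(join_result))  # 4
```

The gap grows with the number of matches per object, which is why the structure below stores only distinct spatial objects.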

To avoid spatial object replication in the cache, we propose a simple data structure that facilitates object identification and isolation. This structure is divided into two parts. The first is devoted to the non-spatial elements of the geographical data. The second contains the non-redundant spatial fragments of the geographical data (i.e., only distinct spatial objects are stored in the cache). In semantic caching, the semantic region is divided into two parts (Chidlovskii, Roncancio, & Schneider, 1999): the region descriptor, describing each query result stored in the cache, and the region content, where the data are stored. In the case of geographical queries, we introduce two kinds of region content: the non-spatial region content and the spatial region content.

The spatial region content contains the non-redundant spatial data of geographical query results, whereas the non-spatial region content contains their non-spatial data. These region contents are associated with the geographical region descriptor, which contains information about each geographical query stored in the cache. The cache is thus divided into two parts (see Figure 1): (i) the non-spatial part of the cache, composed of the non-spatial region content and its associated description in the geographical region descriptor (see the third section); and (ii) the spatial part of the cache, composed of the spatial region content and its associated description in the geographical region descriptor (see the third section).
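The split into a region descriptor, a non-spatial region content and a spatial region content can be sketched with plain dictionaries. Everything here is an illustrative assumption rather than the authors' implementation: the class and field names are hypothetical, query results are given as `(object_id, attributes, geometry)` rows, and geometries are plain strings standing in for GML fragments (which the chapter stores as DOM trees).

```python
from dataclasses import dataclass, field


@dataclass
class RegionDescriptor:
    """Describes one cached geographical query (fields are illustrative)."""
    query: str
    non_spatial_keys: list = field(default_factory=list)  # keys into non_spatial
    spatial_ids: list = field(default_factory=list)       # keys into spatial


@dataclass
class GeoCache:
    descriptors: dict = field(default_factory=dict)  # query -> RegionDescriptor
    non_spatial: dict = field(default_factory=dict)  # (query, row) -> attributes
    spatial: dict = field(default_factory=dict)      # object id -> geometry, once

    def store(self, query, rows):
        """Cache a result given as (object_id, attributes, geometry) rows."""
        desc = RegionDescriptor(query)
        for i, (obj_id, attrs, geometry) in enumerate(rows):
            self.non_spatial[(query, i)] = {"id": obj_id, **attrs}
            desc.non_spatial_keys.append((query, i))
            self.spatial.setdefault(obj_id, geometry)  # keep one copy per object
            desc.spatial_ids.append(obj_id)
        self.descriptors[query] = desc

    def result(self, query):
        """Rebuild a cached result by joining the two region contents."""
        desc = self.descriptors[query]
        return [(self.non_spatial[k], self.spatial[g])
                for k, g in zip(desc.non_spatial_keys, desc.spatial_ids)]


cache = GeoCache()
river = "LINESTRING(0 0, 1 1)"
cache.store("q1", [("A", {"name": "river"}, river),
                   ("B1", {"name": "road"}, "LINESTRING(0 1, 1 0)")])
cache.store("q2", [("A", {"name": "river"}, river)])
```

After both queries are cached, the geometry of object A is held once in the spatial part while each query keeps its own non-spatial rows and descriptor, so `cache.spatial` has two entries and `cache.non_spatial` has three.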

[Figure 1: non-spatial part of the cache]

All data are encoded in XML. For simplicity and standards compliance, we encode XML data as DOM trees. More compact structures are possible, but they would not change the relative results and
