InformatIon ScIence Reference Part 2 pdf



Querying GML

distances between spatial objects and calculating relative direction. Some other forms of spatial predicates are realized as a set of functions. The basic syntax of GML-QL is the same as that of XQuery, with added spatial functions. The following are some examples of GML-QL queries.

Query 1: List the name, population, and area of each country for the file “country.XML”.

Query 2: The St Lawrence River can supply water to the cities which are within 300 km, if needed. List the cities which can use water from the St Lawrence. This query illustrates buffer analysis and spatial join operations.

to variable $c. Finally, the result is constructed as defined in the return clause. The values obtained from the query, which are bound to the variable, can be used to construct new elements in the result if necessary. In this example, a function, area(), is used to calculate the area of a country. The second example illustrates buffer analysis and spatial join operations by using the overlap() and buffer() functions.
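The buffer analysis in Query 2 can be illustrated outside GML-QL with a minimal Python sketch. It is not GML-QL syntax and not Vatsavai's implementation: it assumes planar coordinates in kilometres, a river given as a polyline, and cities given as points, and all names and coordinates are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the parameter to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_buffer(point, polyline, radius):
    """True if the point lies inside the buffer of the given radius around the polyline."""
    return any(point_segment_distance(point, a, b) <= radius
               for a, b in zip(polyline, polyline[1:]))

# Hypothetical data: a river polyline and three cities (coordinates in km).
river = [(0.0, 0.0), (400.0, 0.0), (800.0, 300.0)]
cities = {"A": (100.0, 250.0), "B": (500.0, 500.0), "C": (900.0, 350.0)}

# Spatial join: keep the cities that fall inside the 300 km buffer.
served = sorted(name for name, pos in cities.items()
                if within_buffer(pos, river, 300.0))
print(served)
```

Testing each city against the buffer is the naive form of the spatial join; a real system would typically use a spatial index to avoid comparing every pair.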

Although Vatsavai (2002) mentioned that spatial operators in this query language are implemented as a set of functions, the details of these functions were not given. We therefore suppose that this query language has neither a particular application nor an implementation. Nevertheless, it offered a novel and interesting approach to defining a GML query language based on XQuery and OGC-SQL (Open Geospatial Consortium, 1999) spatial operators.

CXQuery (Constraint XML Query Language)

CXQuery, or Constraint XML Query Language (Chen and Revesz, 2002), is a declarative, Datalog-style language for querying and updating XML documents. It employs the syntax and semantics of constraint query languages (Kanellakis et al., 1990). The input of a CXQuery query is a set of XML documents. The output of a CXQuery query is also an XML document. When CXQuery is used to define views, the query result is not materialized.

Table 1. Definition of predicates

Disjoint(a,b) ⇔ a ∩ b = ∅
    a and b apply to: topologically closed geometries

Touches(a,b) ⇔ (I(a) ∩ I(b) = ∅) ∧ (a ∩ b ≠ ∅)
    a and b apply to: the A/A, L/L, L/A, P/A and P/L groups of relationships

Crosses(a,b) ⇔ (dim(I(a) ∩ I(b)) < max(dim(I(a)), dim(I(b)))) ∧ (a ∩ b ≠ a) ∧ (a ∩ b ≠ b)
    a and b apply to: P/L, P/A, L/L and L/A

Within(a,b) ⇔ (a ∩ b = a) ∧ (I(a) ∩ I(b) ≠ ∅)
    a and b apply to: A/A, P/A, L/A, L/L

Contains(a,b) ⇔ Within(b,a)
    a and b apply to: A/A, P/A, L/A, L/L

Intersects(a,b) ⇔ ¬Disjoint(a,b)
    a and b apply to: A/A, P/A, L/A, L/L
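The set-based predicate definitions of Table 1 can be tried out with a small Python sketch. It is only an illustration under strong assumptions: a geometry is modelled as two finite sets of grid points (its interior I and its boundary), the `Geometry` class and the `interval` helper are hypothetical, and Crosses is omitted because it needs a notion of dimension that this toy model lacks.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Geometry:
    """Toy geometry: a finite interior point set plus a finite boundary point set."""
    interior: frozenset
    boundary: frozenset

    @property
    def points(self):
        # The closed geometry is the union of interior and boundary.
        return self.interior | self.boundary

def interval(lo, hi):
    """Closed integer interval [lo, hi]; the end points form the boundary."""
    return Geometry(frozenset(range(lo + 1, hi)), frozenset({lo, hi}))

def disjoint(a, b):
    return not (a.points & b.points)

def touches(a, b):
    # Interiors do not meet, but the closed geometries do.
    return not (a.interior & b.interior) and bool(a.points & b.points)

def within(a, b):
    return (a.points & b.points) == a.points and bool(a.interior & b.interior)

def contains(a, b):
    return within(b, a)

def intersects(a, b):
    return not disjoint(a, b)

a, b, c, d = interval(0, 4), interval(4, 8), interval(1, 3), interval(10, 12)
print(touches(a, b), within(c, a), contains(a, c), disjoint(a, d))
```

Here `a` and `b` share only the boundary point 4, so they touch, while `c` lies inside `a` and `d` is far from both.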

A CXQuery expression contains a rule head and a rule body, with a “:-” symbol between them. The rule body contains a set of predicates, which are separated by semicolons. The semicolons stand for the logical operation “and”. To simplify the CXQuery expression, it employs a subset of XPath functionality to navigate the hierarchical structure of XML documents and to avoid namespace conflicts. Since most XML documents exchanged in e-Business have relatively restricted structures, CXQuery considers those XML documents that have internal DTD definitions or have external DTD definition connections.

Due to these difficulties, to date there is no query language proposal which supports querying spatial XML documents. Since both CXQuery and many constraint query languages are based on Prolog, they can be easily combined. Since constraint query languages can express spatio-temporal queries, the combination leads to a query language for XML documents that contain spatio-temporal data. Moreover, the combination can be easily implemented on top of a constraint database system.

Query 3 shows a spatial query: Find all buildings located in citycampus and belonging to the Computer Science department.

Building/dept = «Computer Science».

The first two rules construct the constraint representation of the spatial data from the XML documents. The third rule uses a spatial function contains() to test the spatial relation of two spatial objects.

One way in which CXQuery improves upon XQuery is by specifying schemas for the results of queries. Chen and Revesz (2002) claim that query results without schemas are limited for defining views, integrating data, updating, and further querying. XQuery can query the results of a query without a schema provided.

The main focus of this query language is to provide schema information in the query result. Since CXQuery is derived from a constraint query language, and constraint query languages can express spatio-temporal queries, the combination leads to a query language for XML documents that supports spatio-temporal data.

Gquery

Gquery (Boucelma and Colonna, 2004) is yet another GML query language based on XQuery. Unlike GML-QL, Boucelma and Colonna (2004) define a set of Gquery-specific spatial operators and basic data types. Its data types are polygon, line and point, in the same way as the basic data types defined for GML.

The spatial operators can be classified into three groups: operators that return a boolean type (equal, inside and cut), operators that return a float type (distance, perimeter and length) and operators that return a spatial type (convexhull, center, intersection).


Query 4 is an example. It obtains the intersection point between a road and a river:

for n in city

return intersection(n/road, n/river)

Gquery is designed for use in a particular mediator architecture. It provides an integrated view of the data supplied by all sources, and Gquery makes it possible to access and manipulate integrated data.

Conclusion

Currently, there is a large set of query languages over XML. Although each one is based on a different algebra and data model, all of them have the same aim: to query semi-structured data.

There are fewer query languages for GML documents. Since GML is an XML encoding, the features of XML can be applied to GML. Accordingly, a GML query language should extend a query language over XML with spatial features.

In fact, in this chapter we have discussed four query languages over GML. The first one is a novel extension of a previous query language over XML. It is based on a robust data model and algebra and it offers all the features of an XML query language and a wide set of spatial operators. Since it was the first approach in this area, it has inspired other query languages (Chung et al., 2004).

The other three approaches are extensions of XQuery, with different aims and perspectives. The first of these, GML-QL, was the first approach to a GML query language based on XQuery. Since the literature about GML-QL is rather scarce, we suppose that this query language has neither a particular application nor an implementation. Furthermore, details about spatial operators and functions were not given by Vatsavai (2002).

Although the second of these, CXQuery, is based on XQuery, it offers an interesting approach for a spatial query language over GML. CXQuery allows querying and updating XML documents using the syntax and semantics of constraint query languages. This query language is currently the best approach over GML.

The last approach, Gquery, defines a set of spatial operators for GML. It is a specific approach to be applied in a particular mediator architecture.

In conclusion, GML can represent database resources on the web, etc., which can be queried with a specific query language. Query languages over GML are a reality.

References

Abiteboul, S., Quass, S., McHugh, J., Widom, J., & Wiener, J. (1997). The Lorel Query Language for Semistructured Data. International Journal on Digital Libraries, 1(1), 68-88.

Beech, D., Malhotra, A., & Rys, M. (1999). A Formal Data Model and Algebra for XML. http://www-db.stanford.edu/dbseminar/Archive/FallY99/malhotra-slides/malhotra.pdf

Boucelma, O., & Colonna, F. (2004). Mediation for Online Geoservices. In 4th International Workshop on Web and Wireless Geographical Information Systems, W2GIS 2004. Korea.

Chen, Y., & Revesz, P. (2002). CXQuery: A Novel XML Query Language. In Proc. of International Conference on Advances in Infrastructure for Electronic Business, Science, and Medicine on the Internet (SSGRR’02).

Chung, W., Park, S., & Bae, H. (2004). An Extension of XQuery for Moving Objects over GML. ITCC, Proc. of the International Conference on Information Technology: Coding and Computing. IEEE.

Córcoles, J. E., & González, P. (2001). A Specification of a Spatial Query Language over GML. ACM-GIS 2001, 9th ACM International Symposium on Advances in Geographic Information Systems. Atlanta (USA).

Deutsch, A., Fernandez, M., Florescu, D., Levy, A., & Suciu, D. (1999). XML-QL: A Query Language for XML. Computer Networks, 31, 11-16.

Kanellakis, P. C., Kuper, G. M., & Revesz, P. (1990). Constraint Query Languages. Symposium on Principles of Database Systems.

Open Geospatial Consortium (1999). Simple Features Specification For SQL, 05-1341. Open Geospatial Consortium. Retrieved 13th January, 2005, from http://www.opengeospatial.org

Open Geospatial Consortium (2003). Geography Markup Language – GML. Retrieved 13th January, 2005, from http://www.opengis.org/techno/documents/02-023r4.pdf

Robie, J. (1998). The design of XQL. Retrieved 13th January, 2005, from http://www.w3.org/style/XSL/Group/1998/09/XQL-design.html

Vatsavai, R. (2002). GML-QL: A Spatial Query Language Specification for GML. Retrieved 13th January, 2005, from http://www.cobblestone-concepts.com/ucgis2summer2002/vatsavai/vatsavai.htm

W3C (1998). XSL. Retrieved 13th January, 2005, from http://www.w3.org/TR/REC-XML

W3C (2001). XQuery: A Query Language for XML. Retrieved 13th January, 2005, from http://www.w3.org/TR/2001/WD-XQuery-20010215

W3C (2005). Extensible Markup Language – XML. Retrieved 13th January, 2005, from http://www.w3c.org/XML/

Key Terms

Feature: A feature is an application object that represents a physical entity, e.g., a building, a river, or a person. A feature may or may not have geometric aspects.

Markup Language: Language which combines text and extra information about the text. The extra information is expressed using markup, which is intermingled with the primary text.

Query Language: Computer language used to make queries into databases and information systems.

Semi-Structured Data: Data with incomplete structure. Data are directly described using a simple syntax, e.g., XML, GML, etc.


Technical University of Varna, Bulgaria

Abstract

Image Databases (IDBs) are a kind of Spatial Database where a large number of images are stored and queried. In this chapter, techniques for indexing an IDB for efficiently processing several kinds of queries, like retrieval based on features, content, structure, processing of joins, and queries by example, are reviewed. The main indexing techniques used in IDBs are either members of the R-tree family (data-driven structures) or members of the quadtree family (space-driven structures). Although research on IDB indexing spans several years, there are still significant research challenges, which are also discussed in this chapter. IDBs and their indexing structures bring together two different disciplines (databases and image processing), and interdisciplinary research efforts are required. Moreover, dealing with the semantic gap (successful integrated retrieval based on low-level features and high-level semantic features) and querying between images and other kinds of spatial data are also significant future research directions.


Introduction

Image Databases (IDBs) are a special kind of Spatial Database where a large number of images are stored and queried. IDBs have a plethora of applications in modern life, for example in medical, multimedia, and educational applications. In the framework of Geographical Information Systems (GIS), digital images (raster data) may represent changes in cultivations, sunny areas, and the discrimination between urban environments and the countryside.

Apart from the raster format, GIS data may be stored in vector format (points, line segments, polygons, etc.). Each of these data formats has certain advantages, making a choice between them a challenge. Raster data lead to faster computing for several operations (e.g., overlays) and are well suited for remote sensing. On the other hand, they have a fixed resolution, leading to limited detail. In this article, we focus on raster data (image databases) and their indexing techniques.

Since the start of the 1980s, several structures for spatial objects have been proposed in the literature for efficient storage and retrieval of image collections. Based on these methods, many kinds of useful queries on image data may be processed efficiently. These include:

• Queries about the content of additional

properties (descriptive information) that

have been embedded for each image (e.g.,

which images have been used in the book

cover of children’s books?)

• Queries about the characteristics/features

of the images like color, texture, shape etc

(e.g., find the images that depict vivid blue

sky.)

• Queries for retrieving images with specified

content (e.g., find the images that contain

the sub-image of a specified chair.)

• Queries by example or sketch (e.g., a sample

image is chosen, or drawn by the user and

images similar to this sample are sought.)

• Structural queries (e.g., find the images that contain a number of specific objects in a specified arrangement.)

• Image Joins (e.g., find the cultivation areas that reside in polluted atmosphere areas.)

• Queries that combine regional data and other sorts of spatial data (e.g., find the cities represented by point data that reside within 5km from cotton cultivations.)

• Temporal Queries on sequences of evolving images (e.g., find if there has been an increase in the regions of wheat cultivations in this prefecture during the last two years.)

The importance of image indexing and querying techniques led major Database Management Systems’ manufacturers to embed related extensions to the core engine of their products; e.g., DB2 has embedded QBIC technology (Flickner et al. 1995) and Oracle provides Content-Based Image Retrieval (CBIR) based on Virage (Annamalai et al. 2000).

Background

A digital image is a representation of a two-dimensional image as a finite set of digital values, called picture elements or pixels. In a binary image, each pixel can be either black or white, while in a greyscale (color) image each pixel corresponds to a shade of gray (to a color), among a set of permitted greyscale (color) values. Each image represents a scene containing objects and regions. An IDB is an organized collection of digital images aiming at the management and the efficient processing of queries on this image collection. There are numerous publications in the literature related to the processing of queries on image features like color (e.g., distribution of colors, dominant colors, and color moments), texture (the pattern of the image surface change, usually expressed by a combination of characteristics like coarseness, contrast, directionality, uniformity, regularity,


Image Database Indexing Techniques

density, frequency, etc.) and shape (the physical structure of objects, or the geometric shapes present in the image). In several of these publications (emerging from the image processing/computer vision community) the term indexing refers to the features corresponding to each image and to the algorithm used for computing the similarity between them (the algorithm often works by an exhaustive comparison with all the images present in the database). In this article, indexing is used in the context of databases and corresponds to the access methods (data structures) used to speed up query processing.

Several publications that contain review material have appeared in the literature. Rui et al. (1999) review numerous papers covering several aspects of CBIR, including multidimensional indexing, and identify open research issues. Smeulders et al. (2000) is another detailed review of CBIR techniques covering the research presented up to the year 2000 that also includes a subsection on storage and indexing. The last section of that paper presents the authors’ view on CBIR’s future trends. Manolopoulos et al. (2000) overview indexing for structural and feature-based queries. Veltkamp and Tanase (2001) performed a survey of numerous CBIR systems, providing the information that is available for each of them on several technical aspects, including the use of indexing structures. One conclusion of this survey is that “Indexing data structures are often not used”. Manouvrier et al. (2005) present a detailed review of quadtree-based indexing in the image domain, ranging from image representation and storage to CBIR. Price (2006) maintains an extensive Computer Vision bibliography (an invaluable tool for the researcher) that contains many references to image indexing.

Main Focus of the Article

In this section, we review the main indexing techniques that have been proposed for image databases. These techniques are grouped and classified by the family of their structure. The two main families are the R-tree family (data-driven structures) and the quadtree family (space-driven structures) (subsection 2.1.2 of chapter 6, Manolopoulos et al. 2000).

Chang (1987) proposed the use of 2-D strings for the structural representation of objects appearing in an image. Using this technique, structural queries can be answered by exhaustive comparisons with all the images in the IDB. Petrakis (1993) and Orphanoudakis (1996) used hash-based indexing to speed up processing. 2-D strings are an efficient representation of the “left/right” and “below/above” relationships. Petrakis and Faloutsos (1997), Petrakis (2002) and Petrakis et al. (2002) adopted Attributed Relational Graphs (ARGs), the most general image structure representation method, where individual objects or regions are represented by graph nodes and their relationships are represented by edges between such nodes. The method developed by Petrakis and Faloutsos (1997) achieves fast query processing by making certain assumptions on the presence of objects in each image. Petrakis (2002) and Petrakis et al. (2002) relax these assumptions. All these ARG-based methods achieve high performance by indexing ARGs with structures of the R-tree family.
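To make the idea of encoding “left/right” and “below/above” relationships concrete, the following Python sketch builds a pair of strings from object positions. It is a simplified reading of the 2-D string idea, not the exact encoding of the cited papers: the operator symbols ("<" for strictly before, "=" for the same coordinate), the tie-breaking by object name, and the sample picture are all assumptions of this sketch.

```python
def axis_string(objects, axis):
    """Order object names along one axis (0 = x, 1 = y).

    "<" separates objects with strictly increasing coordinates,
    "=" joins objects that share the same coordinate on this axis.
    """
    ordered = sorted(objects.items(), key=lambda kv: (kv[1][axis], kv[0]))
    parts = []
    previous = None
    for name, position in ordered:
        if previous is not None:
            parts.append("=" if position[axis] == previous else "<")
        parts.append(name)
        previous = position[axis]
    return " ".join(parts)

def two_d_string(objects):
    """Return the (left/right, below/above) string pair for a symbolic picture."""
    return axis_string(objects, 0), axis_string(objects, 1)

# Hypothetical picture: object name -> (x, y) grid position.
picture = {"a": (1, 1), "b": (2, 3), "c": (2, 2), "d": (3, 1)}
u, v = two_d_string(picture)
print(u)
print(v)
```

A structural query can then be answered by comparing such string pairs, which is the exhaustive comparison that the indexing approaches mentioned above try to avoid.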

An R-tree is a balanced multiway tree for secondary storage, where each node is related to a Minimum Bounding Rectangle (MBR) that represents the minimum rectangle bounding the data elements contained in the node. The MBR of the root bounds all the data stored in the tree. Figure 1 depicts some rectangles (MBRs of data elements) on the right and the corresponding R-tree on the left. Dotted lines denote the bounding rectangles of the subtrees that are rooted in inner nodes. The most widely used R-tree is the R*-tree; for more details refer to Gaede and Günther (1998).
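The role of the MBR in pruning a search can be sketched in Python. This is a minimal illustration and not an R-tree or R*-tree implementation: the tree is built by hand, there is no insertion, node splitting or balancing, and the `Node` class, rectangle layout `(x1, y1, x2, y2)` and sample data are assumptions of this sketch.

```python
def mbr(rects):
    """Minimum bounding rectangle of a list of (x1, y1, x2, y2) rectangles."""
    xs1, ys1, xs2, ys2 = zip(*rects)
    return (min(xs1), min(ys1), max(xs2), max(ys2))

def overlaps(r, s):
    """True if the two axis-aligned rectangles share at least one point."""
    return r[0] <= s[2] and s[0] <= r[2] and r[1] <= s[3] and s[1] <= r[3]

class Node:
    """A node holds either child nodes (inner node) or data rectangles (leaf)."""
    def __init__(self, children=None, entries=None):
        self.children = children or []
        self.entries = entries or []
        # The node's MBR bounds everything stored below it.
        self.box = mbr([child.box for child in self.children] + self.entries)

def window_query(node, window):
    """Collect the data rectangles that overlap the query window."""
    if not overlaps(node.box, window):
        return []  # the whole subtree is pruned by a single MBR test
    hits = [r for r in node.entries if overlaps(r, window)]
    for child in node.children:
        hits.extend(window_query(child, window))
    return hits

# Hypothetical data: two leaves grouped under one root.
leaf1 = Node(entries=[(0, 0, 2, 2), (1, 1, 3, 4)])
leaf2 = Node(entries=[(6, 5, 8, 7), (7, 1, 9, 3)])
root = Node(children=[leaf1, leaf2])

print(root.box)
print(window_query(root, (2.5, 0, 6.5, 2)))
```

Note that the MBR test can let the search enter a subtree that contains no actual answer (as happens for `leaf2` here); the MBR is only a conservative filter.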

Papadias et al. (1998) treat the problem of structural image queries as a Multiple Constraint Satisfaction (MCS) problem. Both the images and the queries are mapped to regions in a multidimensional space and are indexed by structures of the R-tree family. Query processing is treated as a general form of spatial joins (multi-way spatial joins).

QBIC (Flickner et al. 1995) was one of the first systems that introduced multidimensional indexing to enhance the performance of CBIR. Color, shape and texture features are extracted from the images and are represented by points in high-dimensional spaces. The Karhunen-Loève Transform is used to perform dimension reduction of the feature data (in order to overcome the degradation of performance of multidimensional index structures as the dimensionality increases, a situation known as the “curse of dimensionality”, Lin et al. 1994) and a structure belonging to the R-tree family (an R*-tree) is used as a multidimensional indexing structure.

Seidl and Kriegel (2001) present techniques for adaptable similarity search. They use quadratic distance functions that are evaluated using multidimensional index structures of the R-tree family (and especially X-trees), dimensionality reduction and approximation techniques (for an introduction to X-trees, see Manolopoulos et al. 2000).
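A quadratic distance function of the kind mentioned above is commonly written as d_A(x, y) = sqrt((x - y)^T A (x - y)), where A is a similarity matrix between feature dimensions. The Python sketch below assumes this standard form; the three-bin histograms and the matrix values are hypothetical and are not taken from Seidl and Kriegel (2001).

```python
import math

def quadratic_form_distance(x, y, A):
    """sqrt((x - y)^T A (x - y)) for feature vectors x, y and similarity matrix A."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    total = sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(total)

# Hypothetical color histograms over three bins: red, orange, blue.
red, orange, blue = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]

# Assumed similarity matrix: red and orange are considered half similar.
A = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]

print(quadratic_form_distance(red, orange, A))
print(quadratic_form_distance(red, blue, A))
```

With the identity matrix in place of A, the function reduces to the Euclidean distance and red would be equally far from orange and blue; the off-diagonal entries are what make perceptually similar colors closer.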

For efficient processing of queries in image databases, Quadtrees have also been extensively used as indexing mechanisms. The Quadtree is a four-way unbalanced tree where each node corresponds to a subquadrant of the quadrant of its father node (the root corresponds to the whole space). These trees subdivide space in a hierarchical and regular fashion. They are mainly designed for main memory; however, several alternatives for secondary memory have been proposed. The most widely used Quadtree is the Region Quadtree, which stores regional data in the form of raster images. Figure 2 depicts an 8x8 pixel array and the corresponding Quadtree. Note that black/white squares represent black/white regions, while circles represent internal nodes (gray regions). The Linear Region Quadtree is an external memory version of the Region Quadtree, where each quadrant is represented by a codeword stored in a B+-tree; for more details refer to Samet (1990).
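The recursive subdivision behind the Region Quadtree can be sketched in Python for a small binary image. This is an in-memory illustration only: the nested-list representation, the NW/NE/SW/SE child order, the 4x4 sample image, and the digit-string codewords for black leaves are assumptions of this sketch, and the actual codeword format and B+-tree storage of the Linear Region Quadtree are not reproduced.

```python
def build(img, x=0, y=0, size=None):
    """Build a region quadtree for a square binary image (side a power of two).

    A leaf is "B" (black) or "W" (white); an internal (gray) node is a list
    of four children in the order NW, NE, SW, SE.
    """
    if size is None:
        size = len(img)
    first = img[y][x]
    if all(img[r][c] == first
           for r in range(y, y + size) for c in range(x, x + size)):
        return "B" if first else "W"
    half = size // 2
    return [build(img, x, y, half), build(img, x + half, y, half),
            build(img, x, y + half, half), build(img, x + half, y + half, half)]

def count_leaves(node):
    return 1 if isinstance(node, str) else sum(count_leaves(c) for c in node)

def black_codes(node, prefix=""):
    """Path codes (0=NW, 1=NE, 2=SW, 3=SE) of the black leaves, in tree order."""
    if node == "B":
        return [prefix]
    if node == "W":
        return []
    codes = []
    for digit, child in enumerate(node):
        codes.extend(black_codes(child, prefix + str(digit)))
    return codes

# Hypothetical 4x4 binary image (1 = black, 0 = white).
image = [[1, 1, 0, 0],
         [1, 1, 0, 1],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]

tree = build(image)
print(tree)
print(count_leaves(tree), black_codes(tree))
```

Keeping only the codes of the black leaves in sorted order is the basic idea that makes a linear (pointerless) representation suitable for storage in an ordered index.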

Figure 1. An example of an R-tree

Quadtrees have been used for CBIR (representation and querying by image features) by several researchers, as a mechanism for calculating image similarity by defining appropriate similarity measures. Examples of such work follow. In some research efforts, complete Quadtrees with a fixed number of levels are used, since they lead to sufficiently precise results. Each node in the Quadtree stores the features that correspond to its quadrant, for example, a color histogram (Lin et al., 2001), or a combination of feature histograms (Malki et al., 1999). De Natale and Granelli (2001) use unbalanced Quadtrees for image segmentation into dominant colors. Each quadtree is modelled by a binary array representing its structure and a label array representing the dominant color associated with each node or leaf. Ahmad & Grosky (2003) use unbalanced Quadtrees to decompose an image into a spatial arrangement of feature points (extracted using image processing techniques) and to quantify image similarity, while providing geometric variance independence. For search and retrieval, an indexing scheme based on image signatures and quadtrees is used. Chakrabarti et al. (2000) use Quadtrees to represent two-dimensional shapes and perform shape-based similarity retrieval. The proposed representation is designed to exhibit invariance to scale, translation and rotation.

Overlapping has been applied to Linear Region Quadtrees (Tzouramanis et al. 2004). In this and previous papers by the same authors, four different extensions of the Linear Region Quadtree are presented for indexing a sequence of evolving raster data. Moreover, temporal window queries are defined and studied. These queries relate to the evolution of regional data inside a window in the course of time.

Quadtrees have also been used for creating an IDB, where image retrieval, insertion, deletion, comparison and set operations can be applied. A single quadtree is used for all images. Its nodes are associated with the list of images that have information in the respective quadrants. Vassilakopoulos & Manolopoulos (1995) proposed Dynamic Inverted Quadtrees, while Jomier et al. (2000) proposed a version suitable for binary, gray scale or color images.

Corral et al. (1999) combine two different kinds of data and two different kinds of indexing structures. They present five algorithms suitable for processing join queries between point data stored in an R-tree and image data stored in a Linear Region Quadtree.

Due to space limitations, only the most prominent IDB indexing structures are reviewed in this article. The choice of an indexing method among them depends on the application. Each of the above techniques has been designed around a specific problem setting. A qualitative comparison between them is an interesting direction for future work that lies beyond the scope and the size limit of this article. Descriptions of several other Spatial Access Methods that have been used in IDBs can be found in Samet (1990), Gaede and Günther (1998) and Manolopoulos et al. (2000).

Future Trends

IDBs are related to two different scientific communities: database and image processing / computer vision researchers. Multidimensional access methods, as well as information retrieval techniques and their use for query processing, constitute the key meeting point of the two worlds. Several of the techniques of the image processing community could make further use of access methods and/or adapt to their properties, leading to more efficient processing of image-related queries. Related to the previous research direction is the further development of systems able to retrieve (and, in general, process queries) from image collections existing in different sources, including the WWW (Rui et al. 1999); indexing techniques are expected to play a dominant role in them.

In Zhao and Grosky (2002) one of the first techniques for integrated image retrieval based on low-level features and high-level semantic features of images is presented. Mojsilovic et al. (2004) present a methodology for semantic-based image retrieval based on low-level image descriptors. However, neither of these works is based on indexing structures. Since image retrieval based on both these kinds of features is crucial for the usefulness of CBIR systems (for a discussion of the semantic gap see subsection 2.4 of Smeulders et al. 2000) and still remains one of the big challenges for researchers, indexing structures could be used in this context for calculating the correlations between low-level features and high-level concepts efficiently.

In Mao et al. (2005) distance-based tree structures are used for computing the similarity of images, which are represented by features reflecting their structure, texture and color. Although the high dimensionality of the feature space suggests that distance-based indexing techniques are outperformed by sequential scan (curse of dimensionality), the authors show that the intrinsic dimensionality of real data is low and that one can apply distance-based indexing that is specifically designed to reflect the intrinsic clustering of real data. The design and study of more generalized techniques in this direction is another research challenge.

Despite the extensive research performed in spatial / spatio-temporal databases, storing a large database of (possibly evolving) images, or of regional data sets, and being able to efficiently answer queries between these data and other sorts of spatial/spatiotemporal data, or queries involving the notion of time, is still a big research challenge. An example is being able to efficiently answer queries like: find the boats (moving points) that were inside the storm (changing regional data) during this morning (a time interval).

Conclusion

In this paper, we have reviewed techniques related to indexing an image database as a means for efficiently processing several kinds of queries, like retrieval based on features, content, structure, processing of joins, and queries by example. Although research in this scientific area spans several years, there are still significant research challenges. Image databases and their indexing structures bring together two different disciplines (databases and image processing), and developing a true Image Database System requires interdisciplinary research efforts. Nevertheless, the semantic gap is alive, and querying between images and other kinds of spatial data has not attracted enough attention yet.

References

Ahmad, I., & Grosky, W. I. (2003). Indexing and retrieval of images by spatial constraints. Journal of Visual Communication and Image Representation, 14(3), 291-320.

Annamalai, M., Chopra, R., DeFazio, S., & Mavris, S. (2000). Indexing images in oracle8i. In Proc. SIGMOD’00, 539-547.

Chakrabarti, K., Ortega-Binderberger, M., Porkaew, K., Zuo, P., & Mehrotra, S. (2000). Similar Shape Retrieval in MARS. In Proc. IEEE Int. Conf. on Multimedia and Expo (II), 709-712.

Chang, S. K., Shi, Q. Y., & Yan, C. W. (1987). Iconic indexing by 2-d strings. IEEE Trans. Pattern Anal. Machine Intell., 9, 413-427.

Corral, A., Vassilakopoulos, M., & Manolopoulos, Y. (1999). Algorithms for Joining R-trees and Linear Region Quadtrees. In Proc. of SSD’99, LNCS 1651, 251-269. Springer Verlag.

De Natale, F. G. B., & Granelli, F. (2001). Structured-Based Image Retrieval Using a Structured Color Descriptor. In Proc. Int. Workshop on Content-Based Multimedia Indexing (CBMI’01), 109-115.

Flickner, M., Sawhney, H., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D., & Yanker, P. (1995). Query by Image and Video Content: The QBIC System. IEEE Computer, 28(9), 23-32.

Gaede, V., & Günther, O. (1998). Multidimensional Access Methods. ACM Computing Surveys, 30(2), 170-231.


Jomier, G., Manouvrier, M., & Rukoz, M (2000)

Storage and Management of Similar Images

Journal of the Brazilian Computer Society (JBCS),

3(6), 13-26.

Lin, K.-I., Jagadish, H V., & Faloutsos, C (1994)

The TV-tree - an index structure for

high-dimen-sional data VLDB Journal, 3, 517-542.

Lin, S., Tamer, Özsu, M., Oria, V., & Ng, R

(2001) An Extendible Hash for Multi-Precision

Similarity Querying of Image Databases In Proc

of VLDB’2001, 221-230.

Malki, J., Boujemaa, N., Nastar, C., & Winter, A

(1999) Region Queries without Segmentation for

Image Retrieval by Content In proc of 3rd Int

Conf on Visual Information Systems (Visual’99),

115-122

Manolopoulos, Y., Theodoridis, Y., & Tsotras,

V (2000) Image and Multimedia indexing In

Advanced Database Indexing, Kluwer

Publish-ers, 167-184.

Manouvrier, M., Rukoz, M., & Jomier, G (2005)

Quadtree-Based Image Representation and

Re-trieval In Manolopoulos Y., Papadopoulos A &

Vassilakopoulos M (Eds.) Spatial Databases:

Technologies, Techniques and Trends Idea Group

Publishing, Information Science Publishing and

IRM Press, 81-106

Mao, R., Iqbal, Q., Liu, W., & Miranker, D (2005)

Case study: Distance-Based Image Retrieval in

the MoBIoS DBMS, In Proc of the 5th Int Conf

on Computer and Information Technology

(CIT-2005), pp 49-57

Mojsilovic, A., Gomes, J., & Rogowitz, B (2004)

Semantic-Friendly Indexing and Quering of

Im-ages Based on the Extraction of the Objective

Semantic Cues In Special Issue on Content-Based

Image Retrieval of IJCV (56), No 1-2, 79-107.

Papadias, D., Mamoulis, N., & Delis, V (1998)

Algorithms for Querying by Spatial Structure In

& Tortora G (eds.) Intelligent Image Database

Systems ( important contributions in the field of spatial projections), World Scientific Publishing

Co 197-218

Petrakis, E (2002) Fast Retrieval by Spatial

Struc-ture in Image Databases J Vis Lang Comput.,

13(5), 545-569

Petrakis, E., & Faloutsos, C (1997) Similarity

Searching in Medical Image Databases IEEE

Trans Knowl Data Eng 9(3), 435-447

Petrakis, E., Faloutsos, C., & Lin, K-I (2002) ImageMap: An Image Indexing Method Based

on Spatial Similarity IEEE Trans Knowl Data

Eng., 14(5), 979-987

Price, K (2006) Annotated Computer Vision

Bibliography http://iris.usc.edu/Vision-Notes/

bibliography/contents.htmlRui, Y., Huang, T S., & Chang, S.-F (1999) Image retrieval: Current techniques, promising

directions, and open issues Journal of Visual

Communication and Image Representation, 10(1), 39-62.

Samet, H (1990) The Design and Analysis of

Spatial Data Structures Addison Wesley.

Seidl T & Kriegel H.-P (2001) Adaptable

Similarity Search in Large Image Databases In Veltkamp

R., Burkhardt H., Kriegel H.-P.(eds.): State-of-the Art in Content-Based Image and Video Retrieval,

Kluwer Publishers, 297-317

Smeulders, A, Worring, M., Santini, S., Gupta, A.,

& Jain, R (2000) Content-Based Image Retrieval

at the End of the Early Years IEEE Trans Pattern

Anal Mach Intell 22(12), 1349-1380.


Tzouramanis, T., Vassilakopoulos, M., & Manolopoulos, Y (2004) Benchmarking access methods

for time-evolving regional data Data & Knowl

Eng., 49(3), 243-286

Vassilakopoulos, M., & Manolopoulos, Y (1995)

Dynamic Inverted Quadtree - a Structure for

Pictorial Databases Information Systems Special Issue on Multimedia Information Systems,

20(6), 483-500.

Veltkamp, R C., & Tanase, M (2001) Content-based image retrieval systems: A survey http://www.aa-lab.cs.uu.nl/cbirsurvey/

Zhao, R., & Grosky, W I (2002) Bridging the

semantic gap in image retrieval In Distributed

multimedia databases: techniques & applications,

Idea Group Publishing, 14-36

Key Terms

Access Method or Index Structure: A technique of organizing data that allows the efficient retrieval of data according to a set of search criteria.

Color Features of an Image: Characteristics of an image related to the presence of color information, like distribution of colors, dominant colors, or color moments.

Content-Based Image Retrieval: Searching for images in image databases according to their visual contents, like searching for images with specific color, texture, or shape properties, for images containing specific objects, or containing objects in a specified arrangement.

Image Database: An organized collection of digital images aimed at the efficient management and the processing of queries on this image collection.

Query Processing: Extracting information from a large amount of data without actually changing the underlying database where the data are organized.

Semantic Features of an Image: The contents of an image according to human perception, like the objects present in the image or the concepts / situations related to the image.

Shape Features of an Image: The physical structure(s) of the objects, or the geometric shapes present in the image.

Similarity of Images: The degree of likeness between images according to a number of features, like color, texture, shape, and semantic features.

Structural Features of an Image: The arrangement of the objects depicted in the image.

Texture Features of an Image: The pattern(s) of the image’s surface change, usually expressed by a combination of characteristics like coarseness, contrast, directionality, uniformity, regularity, density, and frequency.

Chapter IV

Different Roles and Definitions of Spatial Data Fusion

Patrik Skogster

Rovaniemi University of Applied Sciences, Finland

Abstract

Geographic information is created by manipulating geographic (or spatial) data (generally known by the abbreviation geodata) in a computerized system. Geospatial information and geomatics are central concerns of modern business and research. It is essential to set out their different definitions and roles in order to get an overall picture of the field. This article discusses the problems of definition as well as the technologies and challenges within spatial data fusion.

Due to the rapid advances in database systems and information technology over the last decade, researchers in the information systems, decision science, artificial intelligence (AI), machine learning, and data mining communities are facing a new challenge: discovering and deriving useful and actionable knowledge from massive data sets. During the last decade, many researchers have also studied how to exploit the synergy in information from multiple sources. This research area includes terminology such as spatial data fusion, information fusion, knowledge (and/or belief) fusion, and many more.

Terminology

Geospatial data has many definitions, but one point of view is that it is data consisting of geographical information, geostatistics, and geotextual information. This theme was already addressed in the mid-1980s by Crist and Cicone (1984a). According to Crist and Cicone (1984a), geostatistics are data that are related to a national or subnational unit and can be georeferenced. Geotextual data are defined as text databases (like treaty databases) that are linked to some geographic entity. Crist and Cicone (1984b) also argue that data fusion is not just overlaying maps.

Information fusion is a term defined as “a formal framework in which are expressed the means and tools for the alliance of data originating from different sources” (Wald 1999). Wald (2000) continues that spatial data fusion is therefore “the formal framework that expresses the means and tools for the alliance of data originating from different sources”. It must be remembered, though, that every definition reflects the subject at hand. Wald’s (2000) focus is mainly on the perspective of remote sensing data, where the discussion is about pixel fusion, image fusion, sensor fusion, and measurement fusion.

The term “information fusion” can, in other words, be used when different information and data are used to solve problems. Locational data can be added to this information fusion context, and the result is spatial data fusion. As Kim (2005) describes, “information fusion can be implemented at two different levels: raw data and intermediate data”. Information fusion at the raw data level basically means taking advantage of the synergy from considering multiple instances of the same pattern (e.g., considering two temporal series based on different measurement systems), while information fusion at the intermediate data level takes the synergy from utilizing multiple patterns (e.g., utilizing both temporal and spatial patterns) (Hall & Llinas 2001). Information fusion at the raw data level becomes important, for example, when two different measurements have recorded the same activities or events (e.g., flood level or market share) at the same location on a regular basis (Vanderhaegen & Muro 2005; Pereira 2002).

Knowledge fusion is the process by which heterogeneous information from multiple sources is merged to create knowledge that is more complete, less uncertain, and less conflicting than the input. Knowledge fusion can also involve annotating the output information with meta-level information about the provenance of the information used and the mode of aggregation (Hunter and Liu 2005; Hunter and Summerton 2004).
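Returning to fusion at the raw data level, the case of two measurement systems recording the same event at the same location can be illustrated with a small sketch. It uses inverse-variance weighting, a common statistical choice that the chapter itself does not prescribe; the function name, the sensor names, and the numbers are hypothetical.

```python
def fuse_measurements(readings):
    """Fuse independent readings of one quantity by inverse-variance weighting.

    `readings` is a list of (value, variance) pairs, one per measurement system.
    Returns the fused value and the variance of the fused estimate.
    Assumption: the readings are unbiased and statistically independent.
    """
    if not readings:
        raise ValueError("need at least one reading")
    # A more precise reading (smaller variance) receives a larger weight.
    weights = [1.0 / variance for _, variance in readings]
    total = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, readings)) / total
    fused_variance = 1.0 / total
    return fused_value, fused_variance


# Hypothetical flood-level readings in metres: (value, variance)
gauge = (2.0, 0.04)
radar = (2.4, 0.16)
level, variance = fuse_measurements([gauge, radar])
```

Under these assumptions the fused variance is never larger than that of the best single source, which is one concrete reading of the "synergy" the text refers to.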

Spatial data fusion is a combination of the above with the dimension of spatiality. It is by definition an enormous and complex field, comprising issues that range from registration and pixel-level fusion of data for improving spatial resolution to managerial decision-level fusion using previously computed information stored in geographic information systems (Malhotra 1998).

Technologies within Spatial Data Fusion

It has been estimated that up to 80% of all data stored in corporate databases may have a spatial component (Franklin 1992). To support analytical processes, today’s organizations deploy data warehouses and client tools such as OLAP (On-Line Analytical Processing) to access, visualize, and analyze integrated, aggregated, and summarized data. The term “multidimensional” was established in the mid-1980s by computer scientists who were involved in the extraction of meaningful information from very large statistical databases (Rafanelli 2003).

Since a large part of this data has a spatial component, better client tools are required to take full advantage of the geometry of the spatial phenomena or objects being analyzed. In this regard, Spatial OLAP (SOLAP) technology can be one solution (Rivest et al. 2005). A SOLAP tool can be defined as “a type of software that allows rapid and easy navigation within spatial databases and that offers many levels of information granularity, many themes, many epochs and many display modes synchronized or not: maps, tables and diagrams” (Bedard et al. 2005).
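The idea of navigating between levels of information granularity can be sketched as a roll-up over a spatial hierarchy. The example below is only an illustration in plain Python, not how any particular SOLAP product works; the fact table, the two-level hierarchy (municipality, region), and the measure are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (municipality, region, year, sales)
FACTS = [
    ("Rovaniemi", "Lapland", 2004, 120.0),
    ("Kemi", "Lapland", 2004, 80.0),
    ("Turku", "Southwest Finland", 2004, 200.0),
    ("Rovaniemi", "Lapland", 2005, 150.0),
]

# Column index of each spatial level in a fact row (assumed two-level hierarchy).
LEVELS = {"municipality": 0, "region": 1}


def roll_up(facts, spatial_level, by_year=True):
    """Sum the measure at the chosen spatial granularity, optionally per year (epoch)."""
    column = LEVELS[spatial_level]
    totals = defaultdict(float)
    for row in facts:
        key = (row[column], row[2]) if by_year else (row[column],)
        totals[key] += row[3]
    return dict(totals)


by_region = roll_up(FACTS, "region")
by_municipality_all_years = roll_up(FACTS, "municipality", by_year=False)
```

A real SOLAP client would attach each key to a geometry and redraw the map at the selected level; here the keys are just names.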

As an alternative to traditional statistical regression models, new algorithms have been developed and presented by the machine learning, artificial intelligence, and data mining communities (Kim 2005). These algorithms include decision tree algorithms (Quinlan 1993), support vector machines (SVM) (Burges 1998), and genetic algorithms (Goldberg 1989). In particular, many algorithms based on artificial neural networks (ANNs) and their variants have been shown to be very successful at predicting, classifying, and describing temporally correlated data (Giles et al. 2001). ANNs can also be applied to various business applications (Christakos et al. 2002) such as analytical review procedures in auditing (Koskivaara 2004), stock market predictions (Saad et al. 1998), market segmentation (Hruschka & Natter 1999), and Web usage profiling (Anandarajan 2002). It is the universal approximation property of ANNs that makes them one of the most popular tools for analyzing temporally correlated stochastic processes. The universal approximation property implies that, with sufficiently many hidden nodes, multi-layer neural networks can approximate any continuous function on a bounded domain arbitrarily closely (Hornik et al. 1989). Commonly, multi-layer perceptrons with sigmoidal and radial basis functions have been used as alternatives to the linear stochastic model, the AR(p) model.
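As a point of comparison, the linear AR(p) model and a one-hidden-layer network used as its nonlinear alternative can be written as follows. The notation (coefficients \varphi_i, weights w_{ji} and \beta_j, number of hidden nodes H, noise term \varepsilon_t, activation \sigma) is chosen here for illustration and is not taken from the cited works.

```latex
% Linear autoregressive model of order p
x_t = c + \sum_{i=1}^{p} \varphi_i \, x_{t-i} + \varepsilon_t

% Nonlinear alternative: a perceptron with H sigmoidal hidden nodes
% fed with the same p lagged values
x_t = \beta_0 + \sum_{j=1}^{H} \beta_j \,
      \sigma\!\Big( w_{j0} + \sum_{i=1}^{p} w_{ji} \, x_{t-i} \Big) + \varepsilon_t,
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```

Setting H = 1 and replacing \sigma by the identity recovers the linear form, which is one way to see the network as a generalization of AR(p).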

Another type of neural network, the time-delay neural network (TDNN), has been used to approximate a stochastic process. In TDNNs, input patterns arrive at hidden nodes at different times through delayed connections and thus can influence subsequent inputs. A different type of neural network, the recurrent neural network (RNN), has also been proposed to model temporally correlated data sets. “Jordan” (Jordan 1986) and “Elman” networks (Elman 1990) are two representative examples of RNNs. Both networks employ feedback connections to overcome the limited representational power that a narrow time window imposes on a network. For example, Jordan networks have a feedback loop from the output layer to an additional input called the context layer. However, both types of networks are still restricted in the sense that they cannot deal with an arbitrarily long time window (Dorffner 1996).
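The feedback loop of a Jordan network can be made concrete with a minimal forward pass. This is a sketch under simplifying assumptions (tanh hidden units, a linear output, no bias terms, no training, and the context layer holding an exact copy of the previous output); the weights are arbitrary illustrative numbers and the function names are hypothetical.

```python
import math


def jordan_step(x, context, w_in, w_ctx, w_out):
    """One forward step of a tiny Jordan-style network.

    x       : list of current input values
    context : list holding the previous output (the context layer)
    w_in    : one row of input weights per hidden unit
    w_ctx   : one row of context weights per hidden unit
    w_out   : one row of hidden weights per output unit
    Returns (output, new_context); the new context is a copy of the output.
    """
    hidden = [
        math.tanh(
            sum(w * xi for w, xi in zip(w_in[h], x))
            + sum(w * ci for w, ci in zip(w_ctx[h], context))
        )
        for h in range(len(w_in))
    ]
    output = [sum(w * hj for w, hj in zip(row, hidden)) for row in w_out]
    return output, list(output)


def run_sequence(series, w_in, w_ctx, w_out):
    """Feed a scalar series through the network, starting from a zero context."""
    context = [0.0] * len(w_out)
    outputs = []
    for value in series:
        out, context = jordan_step([value], context, w_in, w_ctx, w_out)
        outputs.append(out[0])
    return outputs


# Arbitrary illustrative weights: 1 input, 2 hidden units, 1 output.
W_IN = [[0.5], [-0.3]]
W_CTX = [[0.8], [0.1]]
W_OUT = [[1.0, 0.5]]

with_feedback = run_sequence([1.0, 1.0, 1.0], W_IN, W_CTX, W_OUT)
without_feedback = run_sequence([1.0, 1.0, 1.0], W_IN, [[0.0], [0.0]], W_OUT)
```

With the context weights set to zero the network gives the same output for every identical input, whereas with feedback the output depends on what came before, which is the memory effect the paragraph describes.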

Since transactional systems are not designed to support decision processes, new types of systems have been developed to perform data fusion, such as those developed by Fischer and Ostwald (2001). These solutions are technically called Analytical Systems and are known on the market as Business Intelligence (BI) solutions. Rivest et al. (2005) explain that these systems, in which the data warehouse is usually a central component, are optimized to facilitate complex analysis and to improve the performance of database queries involving thousands of occurrences or more. According to Meeks and Dasgupta (2004), “in the short-term, the models need refinement, primarily through applying statistical confidence to the scoring functions”. Popular web search engines can also be seen as BI solutions. They use several different evaluation schemes, such as keyword proximity, keyword density, and synonym matching, among others, to estimate the quality of links and files returned from Internet text searches. Analyses made by their servers combine different search terms with the relevant data available; in other words, search engines perform data fusion.


Extensible Markup Language (XML) is applied to control all definitions of discovered patterns and rules to ensure the consistency of the proposed knowledge map. The advantage of XML is that it represents a compromise between flexibility, simplicity, and readability by both humans and machines (World Wide Web Consortium (W3C), 2000). XML is therefore rapidly becoming an information-exchange standard for integrating data among various Internet-based applications (Bertino and Ferrari 2001). It must be noted, though, that numerous other data fusion standards also exist, especially in the web-browsing context.

Fusion rule technology is a logic-based approach to knowledge fusion. A set of fusion rules is a way of specifying how to merge structured reports. Structured reports are XML documents in which the data entries are restricted to individual words or simple phrases, such as names and domain-specific terminology, numbers, and units. Different sets of fusion rules, with different merging criteria, can be used to investigate a set of structured data analyses by looking at the results of merging. More information can be found in Hunter & Liu (2005) and Hunter & Summerton (2004).
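To give a feel for merging structured reports under a chosen set of rules, the following sketch merges two small XML reports with Python's standard `xml.etree.ElementTree` module. It is a deliberately simplified stand-in and not the logic-based formalism of Hunter and Liu; the report layout, the tag names, and the two rules are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Two hypothetical structured reports about the same event.
REPORT_A = "<report><city>Turku</city><rainfall unit='mm'>12</rainfall></report>"
REPORT_B = "<report><city>Turku</city><rainfall unit='mm'>18</rainfall></report>"


def same_text(values):
    """Rule: all sources must agree; otherwise the conflict is made explicit."""
    if len(set(values)) == 1:
        return values[0]
    return "CONFLICT(" + ", ".join(values) + ")"


def mean_number(values):
    """Rule: aggregate numeric entries by their arithmetic mean."""
    return str(sum(float(v) for v in values) / len(values))


# One illustrative rule set: tag name -> merging function.
RULES = {"city": same_text, "rainfall": mean_number}


def merge_reports(xml_reports, rules):
    """Merge reports tag by tag; assumes every report contains every tag in `rules`."""
    roots = [ET.fromstring(text) for text in xml_reports]
    # Record how many sources were merged as simple provenance information.
    merged = ET.Element("report", {"sources": str(len(roots))})
    for tag, rule in rules.items():
        values = [root.findtext(tag) for root in roots]
        ET.SubElement(merged, tag).text = rule(values)
    return merged


merged = merge_reports([REPORT_A, REPORT_B], RULES)
```

Swapping `mean_number` for a different function (for example one that keeps the maximum) gives a second rule set, and comparing the merged outputs is the kind of investigation the paragraph describes.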

Asynchronous JavaScript and XML (AJAX) was created to build dynamic web pages on the client side. Data is read from the server or sent to the server by JavaScript requests. However, some processing on the server side is required to handle requests, i.e., finding and storing the data. This is accomplished more easily with the use of a framework dedicated to processing Ajax requests. In the article that coined the term “AJAX”, Garrett (2005) describes the technology as “an intermediary between the user and the server.” This Ajax engine is intended to suppress the waiting time the user experiences when the page attempts to access the server. The goal of the framework is to provide this Ajax engine and the associated server-side and client-side functions.

Challenges in Spatial Data Fusion

The use of various spatio-temporal data and information usually greatly improves decision-making in many different fields (Christakos et al. 2003). Examples can be found in Meeks and Dasgupta (2004). When using spatial and temporal information to improve decision making, attention must be paid to uncertainty and sensitivity issues (Crosetto & Tarantola 2001).

Because the spatial data fusion process is by its origins a process that produces data assimilations, the challenges it faces are largely related to the data handling process. These include the ability to accept higher data rates and volumes, improved analysis performance, and improved handling of multiple operations. Data integration processes, synchronous sampling, and common measurement standards are developed to optimize data fusion performance. This includes improving the efficiency of both data management and data collection.

Fusion processes are not yet robust enough. They must be capable of accepting wider ranges of data types, accommodating text reports extracted from natural language, historical data, and various kinds of spatial information (maps, charts, images). The processes must therefore develop adaptive and learning properties, using the operating process to add to the knowledge base while exhibiting sensitivity to “unexpected” or “too good to be true” situations that may indicate countermeasures or deception activities.

The performance of processes needs to increase exponentially. As the amount of processed data increases and analyses become more complicated, efficient, linked data structures are required to handle the wide variety of data. Data volumes for global and even regional spatial databases will be measured in millions of terabits, and short access times are demanded even for broad searches.

Future aspects of spatial data fusion are subject to great uncertainty. Nevertheless, merging spatial data through the use of WMS (Web Map Services) or geoRSS (Really Simple Syndication) seems to be becoming more and more common practice, as do strategic decisions based on a spatial data infrastructure (SDI) context.

The collection of data and its availability can also be seen as a strategic matter. Roberts et al. (2005) highlight the importance of making sense of networks that comprise many nodes and are animated by flows of resources and knowledge. The transfer of managerial practices and knowledge is essential to the functionality of these networks and resources. A survey by Vanderhaegen & Muro (2005) reveals that almost all of the organisations (90%) making use of spatial data “experience problems with the availability, quality and use of spatial data”. In general, the organisations using the widest range of data types experienced the greatest difficulties in using the data.

The quality of the spatial data is still only one of the many factors that must be taken into consideration within spatial data fusion. Clearly, the results of any spatial data fusion are only as good as the data on which they are based (Johnson et al. 2001). One approach to improving data quality is the imposition of constraints upon data entered into the database (Cockcroft 1997). The proposal is that “better decisions can be made by accounting risks due to errors in spatial data” (Van Oort & Bregt 2005).
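Imposing constraints at data entry can be as simple as rejecting records that fail a few checks before they reach the database. The sketch below shows the idea for point features with hypothetical field names (`name`, `lon`, `lat`); it does not reproduce Cockcroft's taxonomy, and a production system would typically enforce such rules in the database itself.

```python
def check_feature(feature):
    """Return a list of constraint violations for one point feature (empty if valid)."""
    errors = []
    lon, lat = feature.get("lon"), feature.get("lat")
    if lon is None or lat is None:
        errors.append("missing coordinate")
    else:
        # Domain constraints on geographic coordinates in decimal degrees.
        if not -180.0 <= lon <= 180.0:
            errors.append("longitude out of range")
        if not -90.0 <= lat <= 90.0:
            errors.append("latitude out of range")
    if not feature.get("name"):
        errors.append("missing name")
    return errors


def insert_feature(table, feature):
    """Append the feature to `table` only if it satisfies every constraint."""
    errors = check_feature(feature)
    if errors:
        raise ValueError("; ".join(errors))
    table.append(feature)


table = []
insert_feature(table, {"name": "Rovaniemi", "lon": 25.7, "lat": 66.5})
rejected = check_feature({"name": "", "lon": 200.0, "lat": 66.5})
```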

Conclusion

A great amount of the data located in various databases has a spatial component. New, innovative applications can be produced by assimilating this information with other data. This paper introduced the terminology and technology associated with spatial data fusion. Data fusion is the process of detection, association, correlation, and combination of data and information from multiple sources. To lead the reader further, the material in the list of references is suggested; it allows one to go beyond traditional transactional data fusion capabilities.

References

Anandarajan, M (2002) Profiling Web Usage

in the Workplace: A Behavior-based Artificial

Intelligence Approach Journal of Management

Information Systems 19(1), 243-266

Bertino, E., & Ferrari, E (2001) XML and data

integration IEEE Internet Computing, 5(6),

75-76

Burges, C J C (1998) A Tutorial on Support

Vector Machines Data Mining and Knowledge

Discovery, 2(2), 1-27.

Christakos, G., Bogaert, P., & Serre, M (2002)

In: Temporal GIS: Advanced Functions for

Field-based Applications Berlin: Springer

Crist, E P., & Cicone, R C (1984a) Comparisons

of the dimensionality and features of simulated

Landsat-4 MSS and TM data Remote Sensing of

Environment, 14(1-3), 235-246

Crist, E P., & Cicone, R C (1984b) A physically-based transformation of Thematic Mapper data - the TM tasseled cap IEEE Transactions

on Geoscience and Remote Sensing, 22(3),

256-263

Cockcroft, S (1997) A Taxonomy of Spatial

Data Integrity Constraints GeoInformatica 1(4),

327-343

Crosetto, M., & Tarantola, S (2001) Uncertainty and sensitivity analysis: Tools for GIS-based

model implementation International Journal

of Geographical Information Science 15(5),

415–437


Dorffner, G (1996) Neural Networks for Time

Series Processing Neural Network World 6(4),

447-468

Elman, J L (1990) Finding Structure in Time

Cognitive Science,14(2), 179-212.

Fischer, G., & Ostwald, J (2001) Knowledge

management: problems, promises, realities, and

challenges IEEE Intelligent Systems, 16(1),

60-72

Franklin, C (1992) An introduction to geographic

information systems: Linking maps to databases

Database 15(2), 13–21.

Garrett, J J (2005) Ajax: A New Approach to

Web Applications http://www.adaptivepath.com/publications/essays/archives/000385.php,

visited 25.4.2006

Giles, C L., Lawrence, S., & Tsoi, A C (2001)

Noisy Time Series Prediction using a Recurrent

Neural Network and Grammatical Inference

Machine Learning, 44(1/2), 161-183.

Goldberg, D E (1989) Genetic Algorithms in

Search, Optimization and Machine Learning

New York: Addison-Wesley

Hall, D L., & Llinas, J (2001) Handbook on

Multisensor Data Fusion Boca Raton: CRC Press.

Hornik, K., Stinchcombe, M., & White, H

(1989) Multi-layer Feedforward Networks are

Universal Approximators Neural Networks,

2(5), 359-366.

Hruschka, H., & Natter, M (1999) Comparing

Performance of Feedforward Neural Nets and

K-means for Market Segmentation European

Journal of Operational Research, 114(2), 346-353.

Hunter, A., & Liu, W (2005) Fusion rules for

merging uncertain information Information

Fusion, 7, 97-134.

Hunter, A., & Summerton, R (2004) Fusion rules

for context-dependent aggregation of structured

news reports Journal of Applied Non-classical

Logic, 14(3), 329-366

Johnson, R G (2001) In: United States Imagery

and Geospatial Information Service Geospatial Transition Plan Bethesda, MD: National Imagery

and Mapping Agency

Jordan, M I (1986) Serial Order: A Parallel

Distributed Processing Approach Technical Report ICS 8604 San Diego: Institute for Cognitive Sciences, University of California

Kim, Y (2005) Information fusion via a hierarchical neural network model The Journal of

Computer Information Systems, 45(4), 1-14.

Koskivaara, E (2004) Artificial neural networks

for analytical review in auditing Publications

of the Turku School of Economics and Business Administration A-7

Malhotra, Y (1998) Deciphering the knowledge

management hype Journal for Quality &

Perspective Risk Analysis, 25(6), 1599-1610.

Pereira, G M (2002) A typology of spatial and

temporal scale relations Geographical Analysis,

34(1), 21–33.

Quinlan, J R (1993) Programs for Machine

Learning San Mateo, CA: Morgan Kaufmann.

Rafanelli, M (2003) Multidimensional Databases: Problems and Solutions London: Idea

Group Publishing

Rivest, S., Bedard, Y., Proulx, M-L., Nadeau, M., Hubert, F., & Pastor, J (2005) SOLAP technology: Merging business intelligence with geospatial technology for interactive spatio-temporal exploration and analysis of data ISPRS Journal of Photogrammetry and Remote Sensing, 60(1), 17-33

Roberts, S M., Jones, J P., & Frohling, O (2005)

NGOs and the globalization of managerialism: A

research framework, 33(11), 1845-1864.

Saad, E., Prokhorov, D., & Wunsch, D (1998)

Comparative Study of Stock Trend Prediction

Using Time Delay, Recurrent and Probabilistic

Neural Networks IEEE Transactions on Neural

Networks, 9(6), 1456-1470

Vanderhaegen, M., & Muro, E (2005) Contribution of a European spatial data infrastructure

to the effectiveness of EIA and SEA studies

Environmental Impact Assessment Review, 25(2),

123-142

Wald, L (2000) A Conceptual Approach To The

Fusion Of Earth Observation Data Surveys in

Geophysics, 2(2-3), 177-186.

Wald, L (1999) Some Terms of Reference in Data

Fusion IEEE Transactions on Geosciences and

Remote Sensing, 37(3), 1190-1193.

World Wide Web Consortium (2000), Extensible

Markup Language (XML) 1.0, 2nd ed., World Wide

Web Consortium, available at: www.w3.org/TR/

REC-xml (1st ed published in 1998), Vol W3C

Recommendation

Key Terms

AJAX Processes: A scripting technique for silently loading new data from the server. Although AJAX scripts commonly use the soon-to-be-standardized XMLHttpRequest object, they could also use a hidden iframe or frame. An AJAX script is useless by itself; it also requires a DOM scripting component to embed the received data in the document.

Artificial Intelligence (AI): A multidisciplinary field encompassing computer science, neuroscience, philosophy, psychology, robotics, and linguistics, devoted to the reproduction of the methods or results of human reasoning and brain activity.

Artificial Neural Network (ANN): Also called a simulated neural network (SNN) or just a neural network (NN), an ANN is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation.

Data Mining: The analysis of data to establish relationships and identify patterns.

Extensible Markup Language (XML): A W3C-recommended general-purpose markup language for creating special-purpose markup languages, capable of describing many different kinds of data. XML is a way of describing data.

GeoRSS: “RSS” is variously used to refer to the following: Really Simple Syndication (RSS 2.0), Rich Site Summary (RSS 0.91, RSS 1.0), and RDF Site Summary (RSS 0.9 and 1.0). It can be defined as a family of web feed formats. Geographical data carried in an RSS context is known as GeoRSS.

Geospatial Data: Data consisting of geographical information, geostatistics, and geotextual information.

Raw Data: Uninterpreted data from a storage medium. The maximum amount of raw data that can be copied from a storage medium equals the capacity of the medium.

Spatial Data Infrastructure (SDI): Often used to denote the relevant base collection of technologies, policies, and institutional arrangements that facilitate the availability of and access to spatial data.

Web Map Service (WMS): Produces maps of spatially referenced data dynamically from geographic information. This international standard defines a “map” to be a portrayal of geographic information as a digital image file suitable for display on a computer screen. A map is not the data itself. WMS-produced maps are generally rendered in a pictorial format such as PNG, GIF, or JPEG. This is in contrast to a Web Feature Service (WFS), which returns the actual data.
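As an illustration of how a client asks a WMS for such an image, the sketch below assembles a GetMap request URL using the standard WMS 1.1.1 key-value parameters. The server address, layer name, and bounding box are placeholders invented for the example, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layers, bbox, width, height,
                     srs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.1.1 GetMap URL.

    bbox is (min_x, min_y, max_x, max_y) in the units of `srs`.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",              # empty value requests the default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": image_format,    # the reply is a picture, not the data itself
    }
    return base_url + "?" + urlencode(params)


# Placeholder endpoint and layer name, for illustration only.
url = build_getmap_url(
    "https://example.org/wms", ["landcover"], (20.0, 59.0, 32.0, 70.0), 600, 550
)
```

Fetching the resulting URL from a real server would return a PNG image; retrieving the underlying features would instead require a WFS request.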


Universitat Jaume I, Spain

Miguel Ángel Manso

Technical University of Madrid, Spain

Miguel Ángel Bernabé

Technical University of Madrid, Spain

Abstract

Geographic Information Systems (GIS) are data-centric applications that rely on the input and constant maintenance of large quantities of basic and thematic spatial data in order to be useful tools for decision-making. This chapter presents the institutional collaboration framework and the major technology components that facilitate the discovery and sharing of spatial data: Spatial Data Infrastructures (SDI). We review the essential software components (metadata editors and associated catalogue services, spatial data content repositories, client applications, and middleware or intermediate geospatial services) that define SDIs as heterogeneous distributed information systems. Finally, we highlight future research needs in the areas of semantic interoperability of SDI services and improved institutional collaboration.


Introduction

Geographic Information Systems (GIS) and related spatial applications are data-centric in the sense that they rely on the input and constant maintenance of large quantities of reference spatial data, on top of which integrators and end-users produce value-added thematic geographic information for the purpose of decision-making. A typical GIS workflow can be simplified as consisting of three components: 1) data entry and reformatting, 2) data processing (geoprocessing), and 3) presentation of results to the user. In practice, this apparently simple workflow is constrained by two key factors. The first is limited interoperability among GIS components, because most are tightly coupled to specific data formats or to other software, complicating the task of integrating components from multiple vendors. The second is that the basic spatial data (reference data) necessary to begin geoprocessing are in many cases not readily available, because they are poorly documented, outdated, too expensive, or available only under restrictive licensing conditions. This second factor has seriously limited the ability of government employees, researchers, and businesses to exploit geographic information, unnecessarily increasing project costs and thus negatively affecting the economy.

Many government administrations have recognized this critical problem and have initiated coordinated actions to facilitate the discovery and sharing of spatial data, creating the institutional basis for Spatial Data Infrastructures (SDI) (van Loenen and Kok 2004). The Global Spatial Data Infrastructure (GSDI) association (www.gsdi.org) defines SDI as a coordinated series of agreements on technology standards, institutional arrangements, and policies that enable the discovery and facilitate the availability of and access to spatial data. The SDI, once agreed upon and implemented, serves to connect GIS and other spatial data users to a myriad of spatial data sources, the majority of which are held by public sector agencies.

In 1990 the U.S. Federal Geographic Data Committee (FGDC) was created, and in 1994 then-President William Clinton asked it (Executive Order 12906) to establish a national SDI in conjunction with organizations from state, local, and tribal governments, the academic community, and the private sector. Three years later the European Umbrella Organization for Geographic Information (EUROGI) was created with the mission to develop a unified European approach to the use of geographic technologies (a mission far from complete). More recently, the European Commission launched the Infrastructure for Spatial Information in Europe (INSPIRE) initiative for the creation of a European Spatial Data Infrastructure (ESDI), based on a Framework Directive (European legislation) defining how European member states should go about facilitating discovery of and access to integrated and interoperable spatial information services and their respective data sources. As the number of national SDIs increased, to include in 2004 about half the nations worldwide (Masser 2005; Crompvoets et al. 2004), the Global Spatial Data Infrastructure (GSDI) Association was created to promote international cooperation and collaboration in support of local, national, and international SDI developments.

The basic creation and management principles of SDI apply to any and all spatial jurisdictions in a spatial hierarchy, from municipalities to regions, states, nations, and international areas. Béjar et al. (2004) show how each SDI at each level in the hierarchy can be created in accordance with its thematic (e.g., soils, transportation) and geographical (e.g., municipality, nation) coverage, following international standards-based processes and interfaces, to help ensure that the SDIs fit like puzzle pieces, both geographically and vertically (thematically). This harmonization exercise is necessary to allow for seamless spatial data discovery and exploitation across jurisdictional boundaries, as in the response to flooding or forest fires, to name just two important cross-border applications. In practice this harmonization has been difficult to achieve due to political as well as semantic differences between neighboring regions. An early (1980s) European exercise in cross-border harmonization, stitching together nationally produced pieces of the Coordinated Information on the European Environment (CORINE) land cover database, highlighted some of these discrepancies at regional and national borders: experts on both sides disagreed on how to classify the same cross-border land cover regions.

SDI Essential Components

Although SDIs are primarily institutional collaboration frameworks, they also define and guide the implementation of heterogeneous distributed information systems, consisting of four main software components linked via the Internet. These components are: 1) metadata editors and associated catalogue services, 2) spatial data content repositories, 3) client applications for user search and access to spatial data, and 4) middleware or intermediate geoprocessing services which assist the user in finding and transforming spatial data for use in the client-side application. Figure 1 summarizes these essential technology components, as generally accepted within the geographic information standards organizations Open Geospatial Consortium (OGC) (www.opengeospatial.org) and ISO Technical Committee 211 (www.isotc211.org), and synthesized by the FGDC and NASA. This conceptual architecture may be interpreted as a traditional 3-tier client-middleware-server model, where GI applications seek spatial data content that is discovered and then possibly transformed or processed by intermediary services before presentation by the client application. But the architecture may also be interpreted using the web services ‘publish-find-bind’ triangle model (Gottschalk et al. 2002), whereby spatial data content (and service) offers are published to catalogue servers, which are later queried to discover (find) data or services, and then the client application binds to (consumes or executes) them.

Figure 1. High-level SDI architecture, taken from the FGDC-NASA Geospatial Interoperability Reference Model (GIRM) (FGDC 2003)

Regardless of the precise conceptual model

adopted, what is common among nearly all SDIs

is the primary goal of improving discovery and

access to spatial data Discovery is based on

the documentation of datasets to be shared, in

the sense of metadata following international

standards such as ISO 19115/19139 Metadata

describing the content, geographic and temporal

coverage, authorship, access and usage rights

details, and other attributes of a dataset are

cre-ated within GIS applications or externally using

specialized text editors The metadata files are

stored in standard XML formats and are then

sent (published) to some data catalogue server,

in many cases one which is located at a central

node of the SDI but in principle may be distributed

anywhere on the network
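The publish-find-bind interaction described above can be illustrated with a minimal Python sketch. It is not an implementation of any real catalogue service: the `MetadataRecord` and `Catalogue` classes, the field names, and the example.org URLs are all hypothetical, and the metadata record is a heavily simplified stand-in for an ISO 19115 document.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical, heavily simplified stand-in for an ISO 19115 record."""
    title: str
    theme: str        # thematic coverage, e.g. "soils" or "transportation"
    bbox: tuple       # geographic coverage: (min_lon, min_lat, max_lon, max_lat)
    access_url: str   # where the described dataset or service can be reached


def _intersects(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Catalogue:
    """Hypothetical in-memory catalogue server."""
    records: list = field(default_factory=list)

    def publish(self, record):
        """Publish: a data provider registers a metadata record."""
        self.records.append(record)

    def find(self, theme, bbox):
        """Find: return records of a theme whose extent intersects bbox."""
        return [r for r in self.records
                if r.theme == theme and _intersects(r.bbox, bbox)]


def bind(record):
    """Bind: a real client would now contact the advertised endpoint.
    This sketch only returns the URL and makes no network call."""
    return record.access_url


catalogue = Catalogue()
catalogue.publish(MetadataRecord("Soil map A", "soils",
                                 (-1.0, 39.0, 1.0, 41.0),
                                 "http://example.org/data/soil-a"))
catalogue.publish(MetadataRecord("Road network B", "transportation",
                                 (-1.0, 39.0, 1.0, 41.0),
                                 "http://example.org/data/roads-b"))
catalogue.publish(MetadataRecord("Soil map C", "soils",
                                 (10.0, 50.0, 12.0, 52.0),
                                 "http://example.org/data/soil-c"))

# A client searches by theme and area of interest, then binds to the hits.
hits = catalogue.find("soils", (0.0, 40.0, 2.0, 42.0))
endpoints = [bind(r) for r in hits]
print(endpoints)
```

Keeping the catalogue separate from the data it describes mirrors the point made in the text: the catalogue may sit at a central node while the datasets themselves remain distributed across the network.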

Users wishing to discover spatial data sources normally access catalogue search interfaces via web applications called Geoportals (Bernard et al. 2005), examples of which may be found at http://www.geo-one-stop.gov/ and http://eu-geoportal.jrc.it/. The geoportal is an interface façade, both hiding the implementation details of the underlying catalogue query mechanisms, and inviting participation in the SDI community. In addition to discovery queries, the geoportal also normally provides free access to quick looks or small samples of datasets that are discovered. This spatial data visualization is frequently implemented as software employing Web Map Service (WMS) software interfaces (OGC 2006), allowing for integration of heterogeneous client and server products from multiple vendors, as proprietary or free software solutions. WMS-based services receive a request for a certain spatial data layer and for a certain geographical extent, convert the data (initially in vector or raster format) to create a bitmap (standard MIME formats such as JPEG, GIF, PNG) and then deliver the image to the web client (browser or GIS/SDI client).
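As an illustration of the kind of request a WMS client sends, the following Python sketch assembles a GetMap URL using the parameter names defined for WMS version 1.3.0. The server address and the layer name are hypothetical, and the CRS:84 reference system (longitude, latitude order) is chosen here only to keep the bounding box easy to read. The sketch builds the URL but does not contact any server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_getmap_url(base_url, layer, bbox, width, height,
                     crs="CRS:84", image_format="image/png"):
    """Assemble a WMS 1.3.0 GetMap request URL.

    bbox is (min_x, min_y, max_x, max_y) in the axis order of the chosen CRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,          # the spatial data layer to draw
        "STYLES": "",             # empty value asks for the default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),   # geographical extent
        "WIDTH": width,           # size of the returned bitmap in pixels
        "HEIGHT": height,
        "FORMAT": image_format,   # MIME type of the returned bitmap
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer name, for illustration only.
url = build_getmap_url("http://example.org/wms", "landcover",
                       (-10.0, 35.0, 5.0, 45.0), 600, 400)
print(url)

# Parsing the query string back shows what the server would receive.
query = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(query["REQUEST"], query["BBOX"])
```

Because the request is an ordinary URL, any browser or GIS client that can fetch an image can act as a WMS client, which is what allows products from different vendors to be mixed.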

More sophisticated spatial data (web) services are becoming available, many of which also follow de jure ISO standards and de facto specifications from organizations such as OGC, the Organization for the Advancement of Structured Information Standards (OASIS), and W3C. These include services providing concrete functionality such as coordinate transformation, basic image processing and treatment, basic geostatistics, and composition or chaining of individual services to form more complex services.
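The idea of chaining individual services into a more complex one can be sketched in Python as simple function composition. This is only an analogy under stated assumptions: real SDI services are invoked over the network, whereas here each "service" is a local function. The choice of spherical Web Mercator as the target projection, and the names `to_web_mercator`, `bounding_box`, and `chain`, are illustrative and do not come from the text.

```python
import math
from functools import reduce

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by spherical Web Mercator


def to_web_mercator(points):
    """Coordinate transformation 'service': (lon, lat) in degrees to
    spherical Web Mercator (x, y) in metres."""
    projected = []
    for lon, lat in points:
        x = EARTH_RADIUS_M * math.radians(lon)
        y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
        projected.append((x, y))
    return projected


def bounding_box(points):
    """Simple geoprocessing 'service': extent of a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def chain(*services):
    """Compose services left to right: each output feeds the next service."""
    return lambda data: reduce(lambda acc, service: service(acc), services, data)


# A composite service built from two simpler ones.
projected_extent = chain(to_web_mercator, bounding_box)
extent = projected_extent([(0.0, 0.0), (180.0, 0.0)])
print(extent)
```

The composite behaves like a single service from the client's point of view, which is the property that makes chaining attractive in a distributed setting.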

Summarizing both the institutional and technological aspects, Article 1 of the INSPIRE European Directive proposal (EC 2004) lists the following necessary components for SDIs:

“The component elements of those infrastructures shall include metadata, spatial datasets and spatial data services; network services and technologies; agreements on sharing, access and use; and coordination and monitoring mechanisms, processes and procedures.”

Caution should be taken when describing as SDI narrower initiatives, projects or products which provide only a subset of these requirements. This is especially the case of institutional agreements and cooperation for improved access to spatial data: the above principles and components should be explicitly involved.

Future Research

SDI researchers are active on several fronts, but two main areas of interest are: improving institutional collaboration and SDI effectiveness (including cost-benefit analyses and more elaborate data access policy), and SDI component implementation and testing. In the second area fall topics such as semantic interoperability and composition of SDI (web) services, the integration of so-called disruptive technologies such as Google Earth and similar commercial services, grass-roots initiatives contributing user-generated data, integration with grid computing and with e-Government solutions, and exploitation of data from diverse sensor networks.


For further details on SDI the reader should consult the Spatial Data Infrastructure Cookbook (GSDI 2004) and the European Commission’s International Journal of SDI Research (http://ijsdir.jrc.it).

References

Béjar, R., Gallardo, P., Gould, M., Muro, P., Nogueras, J., & Zarazaga, J. (2004). A high level architecture for national SDI: The Spanish case. EC-GI&GIS Workshop, Warsaw, June 2004. Retrieved April 4, 2006, from http://www.ec-gis.org/Workshops/10ec-gis/

Bernard, L., Kanellopoulos, I., Annoni, A., & Smits, P. (2005). The European geoportal - one step towards the establishment of a European Spatial Data Infrastructure. Computers, Environment and Urban Systems, 29(1), 15-31.

Crompvoets, J., Bregt, A., Rajabifard, A., & Williamson, I. (2004). Assessing the worldwide status of national spatial data clearinghouses. International Journal of Geographical Information Science, 18(7), 665-689.

EC Commission of the European Communities (2004). Proposal for a Directive of the European Parliament and of the Council establishing an infrastructure for spatial information in the Community (INSPIRE), COM(2004) 516. Retrieved April 4, 2006, from http://inspire.jrc.it/proposal/EN.pdf

FGDC (2003). The Geospatial Interoperability Reference Model, version 1.1. Federal Geographic Data Committee, Geospatial Applications Interoperability (GAI) Working Group. Retrieved April 4, 2006, from http://gai.fgdc.gov/girm/v1.1/

Gottschalk, K., Graham, S., Krueger, S., & Snell, J. (2002). Introduction to Web services architecture. IBM Systems Journal, 41(2). Retrieved April 4, 2006, from http://researchweb.watson.ibm.com/journal/sj/412/gottschalk.html

GSDI (2004). Spatial Data Infrastructure Cookbook (English version 2.0). Retrieved April 4, 2006, from http://www.gsdi.org/gsdicookbookindex.asp

Masser, I. (2005). GIS Worlds: Creating Spatial Data Infrastructures. Redlands, California: ESRI Press.

OGC (2006). OpenGIS Web Map Service (WMS) implementation specification, version 1.3. Retrieved April 4, 2006, from http://portal.opengeospatial.org/files/?artifact_id=14416

van Loenen, B., & Kok, B. C. (Eds.) (2004). Spatial data infrastructure and policy development in Europe and the United States. Delft: Delft University Press.

Key Terms

FGDC: Federal Geographic Data Committee, an interagency committee established in the US in 1990, with the mandate to create and support data sharing, in the form of the US National SDI. http://www.fgdc.gov

GI: Geographic information, the subset of information pertaining to, or referenced to, known locations on or near the Earth’s surface.

GSDI: Global Spatial Data Infrastructure Association. An umbrella organization grouping national, regional and local organizations dedicated to the creation and maintenance of SDIs around the world. http://www.gsdi.org

OGC: Open Geospatial Consortium, a membership body of 300-plus organizations from the commercial, government and academic sectors, that creates consensus interface specifications in an effort to maximize interoperability among software dealing with geographic data. http://www.opengeospatial.org

SDI: Spatial Data Infrastructure


WMS: Web Map Service, a software interface specification published by the Open Geospatial Consortium (OGC). The specification defines how software clients should formulate queries to compliant map servers, and how those servers should behave and respond.
