Querying GML
distances between spatial objects and calculating relative direction. Some other forms of spatial predicates are realized as a set of functions.
The basic syntax of GML-QL is the same as that of XQuery, with added spatial functions. The following are some examples of GML-QL queries.
Query 1: List the name, population, and area of each country for the file “country.XML”.
Query 2: The St. Lawrence River can supply water to the cities which are within 300 km, if needed. List the cities which can use water from the St. Lawrence. This query illustrates buffer analysis and spatial join operations.
to variable $c. Finally, the result is constructed as defined in the return clause. The values obtained from the query, which are bound to the variable, can be used to construct new elements in the result if necessary. In this example, a function, area(), is used to calculate the area of a country. The second example illustrates buffer analysis and spatial join operations by using the overlap() and buffer() functions.
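The buffer-and-join idea behind Query 2 can be sketched outside GML-QL in a few lines of Python. This is only an illustration under stated assumptions: the river is a polyline and the cities are points in planar coordinates measured in kilometres, and the names point_segment_distance, within_buffer and cities_near_river, like the sample data, are hypothetical and not part of GML-QL. A city is reported when it lies within the buffer radius of any river segment.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the line, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_buffer(point, polyline, radius):
    """True if the point lies inside the buffer of the given radius around the polyline."""
    return any(point_segment_distance(point, a, b) <= radius
               for a, b in zip(polyline, polyline[1:]))

def cities_near_river(cities, river, radius_km):
    """Spatial join: names of the cities that fall inside the river buffer."""
    return [name for name, pos in cities.items()
            if within_buffer(pos, river, radius_km)]

# Hypothetical sample data (kilometres).
river = [(0.0, 0.0), (500.0, 0.0), (900.0, 300.0)]
cities = {"A": (100.0, 250.0), "B": (400.0, 400.0), "C": (950.0, 350.0)}
print(cities_near_river(cities, river, 300.0))  # ['A', 'C']
```

City B is 380 km from the nearest river segment, so it falls outside the 300 km buffer.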
Although Vatsavai (2002) mentioned that spatial operators in this query language are implemented as a set of functions, the details of these functions were not given. We therefore suppose that this query language has neither a particular application nor an implementation. Nevertheless, it offered a novel and interesting approach to defining a GML query language based on XQuery and OGC-SQL (Open Geospatial Consortium, 1999) spatial operators.
CXQuery (Constraint XML Query Language)
CXQuery, or Constraint XML Query Language (Chen and Revesz, 2002), is a declarative, Datalog-style language for querying and updating XML documents. It employs the syntax and semantics of constraint query languages (Kanellakis et al., 1990). The input of a CXQuery query is a set of XML documents. The output of a CXQuery query is also an XML document. When CXQuery is used to define views, the query result is not materialized.

Table 1. Definition of predicates
Disjoint(a,b) ⇔ a ∩ b = ∅
    Applies to: topologically closed geometries
Touches(a,b) ⇔ (I(a) ∩ I(b) = ∅) ∧ (a ∩ b) ≠ ∅
    Applies to: the A/A, L/L, L/A, P/A and P/L groups of relationships
Crosses(a,b) ⇔ (dim(I(a) ∩ I(b)) < max(dim(I(a)), dim(I(b)))) ∧ (a ∩ b ≠ a) ∧ (a ∩ b ≠ b)
    Applies to: P/L, P/A, L/L and L/A
Within(a,b) ⇔ (a ∩ b = a) ∧ (I(a) ∩ I(b) ≠ ∅)
    Applies to: A/A, P/A, L/A, L/L
Contains(a,b) ⇔ Within(b,a)
    Applies to: A/A, P/A, L/A, L/L
Intersects(a,b) ⇔ not Disjoint(a,b)
    Applies to: A/A, P/A, L/A, L/L
A CXQuery expression contains a rule head and a rule body, with a “:-” symbol between them. The rule body contains a set of predicates, which are separated by semicolons. The semicolons stand for the logical operation “and”. To simplify the CXQuery expression, it employs a subset of XPath functionality to navigate the hierarchical structure of XML documents and to avoid namespace conflicts. Since most XML documents exchanged in e-Business have relatively restricted structures, CXQuery considers those XML documents that have internal DTD definitions or have external DTD definition connections.
Due to these difficulties, to date there is no query language proposal which supports querying spatial XML documents. Since both CXQuery and many constraint query languages are based on Prolog, they can be easily combined. Since constraint query languages can express spatio-temporal queries, the combination leads to a query language for XML documents that contain spatio-temporal data. Moreover, the combination can be easily implemented on top of a constraint database system.
Query 3 shows a spatial query: Find all buildings located in citycampus and belonging to the Computer Science department.
Building/dept = «Computer Science».
The first two rules construct the constraint representation of the spatial data from the XML documents. The third rule uses a spatial function, contains(), to test the spatial relation of two spatial objects.
One way in which CXQuery improves upon XQuery is by specifying schemas for the results of queries. Chen and Revesz (2002) claim that query results without schemas are limited for defining views, integrating data, updating, and further querying. XQuery can query the results of a query without a schema provided.
The main focus of this query language is to provide schema information in the query result. Since CXQuery is derived from a constraint query language, and since constraint query languages can express spatio-temporal queries, the combination leads to a query language for XML documents that contains support for spatio-temporal data.
Gquery
Gquery (Boucelma and Colonna, 2004) is yet another GML query language based on XQuery. Unlike the GML-QL proposal, Boucelma and Colonna (2004) define a set of Gquery-specific spatial operators and basic data types. Its data types are polygon, line and point, in the same way as the basic data types defined for GML.
The spatial operators can be classified into three groups: operators that return a boolean type (equal, inside and cut), operators that return a float type (distance, perimeter and length) and operators that return a spatial type (convexhull, center, intersection).
Query 4 is an example. It obtains the intersection point between a road and a river:
for n in city
return intersection(n/road, n/river)
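What an operator such as intersection() has to compute for two line geometries can be sketched in Python. This is a hedged illustration, not the Gquery implementation: roads and rivers are assumed to be polylines given as lists of (x, y) pairs, and segment_intersection and polyline_intersections are hypothetical helper names. Parallel and collinear segments are reported as non-intersecting to keep the sketch short.

```python
def segment_intersection(p1, p2, q1, q2):
    """Intersection point of segments p1-p2 and q1-q2, or None.

    Parallel (including collinear) segments return None in this sketch.
    """
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    sx, sy = q2[0] - q1[0], q2[1] - q1[1]
    denom = rx * sy - ry * sx          # cross product r x s
    if denom == 0:
        return None
    qpx, qpy = q1[0] - p1[0], q1[1] - p1[1]
    t = (qpx * sy - qpy * sx) / denom  # position along p1-p2
    u = (qpx * ry - qpy * rx) / denom  # position along q1-q2
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (p1[0] + t * rx, p1[1] + t * ry)
    return None

def polyline_intersections(road, river):
    """All points where a segment of the road meets a segment of the river."""
    points = []
    for a, b in zip(road, road[1:]):
        for c, d in zip(river, river[1:]):
            hit = segment_intersection(a, b, c, d)
            if hit is not None:
                points.append(hit)
    return points

# Hypothetical geometries: a diagonal road crossing a diagonal river.
road = [(0, 0), (4, 4)]
river = [(0, 4), (4, 0)]
print(polyline_intersections(road, river))  # [(2.0, 2.0)]
```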
Gquery is designed for use in a particular mediator architecture. It provides an integrated view of the data supplied by all sources, and makes it possible to access and manipulate the integrated data.
Conclusion
Currently, there is a large set of query languages over XML. Although each one is based on a different algebra and data model, all of them have the same aim: to query semi-structured data.
There are fewer query languages for GML documents. Since GML is an XML encoding, the features of XML can be applied to GML. Accordingly, a GML query language should extend a query language over XML with spatial features.
In this chapter we have discussed four query languages over GML. The first one is a novel extension of a previous query language over XML. It is based on a robust data model and algebra, and it offers all the features of an XML query language and a wide set of spatial operators. Since it was the first approach in this area, it has inspired other query languages (Chung et al., 2004).
The other three approaches are extensions of XQuery, with different aims and perspectives. The first of these, GML-QL, was the first approach to a GML query language based on XQuery. Since the literature about GML-QL is rather scarce, we suppose that this query language has neither a particular application nor an implementation. Furthermore, details about spatial operators and functions were not given by Vatsavai (2002).
Although the second of these, CXQuery, is based on XQuery, it offers an interesting approach for a spatial query language over GML. CXQuery allows querying and updating XML documents using the syntax and semantics of constraint query languages. This query language is currently the best approach over GML.
The last approach, Gquery, defines a set of spatial operators for GML. It is a specific approach to be applied in a particular mediator architecture.
In conclusion, GML can represent database resources on the web, etc., which can be queried with a specific query language. Query languages over GML are a reality.
References
Abiteboul, S., Quass, S., McHugh, J., Widom, J., & Wiener, J. (1997). The Lorel Query Language for Semistructured Data. International Journal on Digital Libraries, 1(1), 68-88.
Beech, D., Malhotra, A., & Rys, M. (1999). A Formal Data Model and Algebra for XML. http://www-db.stanford.edu/dbseminar/Archive/FallY99/malhotra-slides/malhotra.pdf
Boucelma, O., & Colonna, F. (2004). Mediation for Online Geoservices. In 4th International Workshop on Web and Wireless Geographical Information Systems, W2GIS 2004. Korea.
Chen, Y., & Revesz, P. (2002). CXQuery: A Novel XML Query Language. In Proc. of International Conference on Advances in Infrastructure for Electronic Business, Science, and Medicine on the Internet (SSGRR’02).
Chung, W., Park, S., & Bae, H. (2004). An Extension of XQuery for Moving Objects over GML. ITCC, Proc. of the International Conference on Information Technology: Coding and Computing. IEEE.
Córcoles, J. E., & González, P. (2001). A Specification of a Spatial Query Language over GML. ACM-GIS 2001, 9th ACM International Symposium on Advances in Geographic Information Systems. Atlanta (USA).
Deutsch, A., Fernandez, M., Florescu, D., Levy, A., & Suciu, D. (1999). XML-QL: A Query Language for XML. Computer Networks, 31, 11-16.
Kanellakis, P. C., Kuper, G. M., & Revesz, P. (1990). Constraint Query Languages. Symposium on Principles of Database Systems.
Open Geospatial Consortium (1999). Simple Features Specification For SQL, 05-1341. Open Geospatial Consortium. Retrieved 13th January, 2005, from http://www.opengeospatial.org
Open Geospatial Consortium (2003). Geography Markup Language – GML. Retrieved 13th January, 2005, from http://www.opengis.org/techno/documents/02-023r4.pdf
Robie, J. (1998). The design of XQL. Retrieved 13th January, 2005, from http://www.w3.org/style/XSL/Group/1998/09/XQL-design.html
Vatsavai, R. (2002). GML-QL: A Spatial Query Language Specification for GML. Retrieved 13th January, 2005, from http://www.cobblestone-concepts.com/ucgis2summer2002/vatsavai/vatsavai.htm
W3C (1998). XSL. Retrieved 13th January, 2005, from http://www.w3.org/TR/REC-XML
W3C (2001). XQuery: A Query Language for XML. Retrieved 13th January, 2005, from http://www.w3.org/TR/2001/WD-XQuery-20010215
W3C (2005). Extensible Markup Language – XML. Retrieved 13th January, 2005, from http://www.w3c.org/XML/
Key Terms
Feature: A feature is an application object that represents a physical entity, e.g., a building, a river, or a person. A feature may or may not have geometric aspects.
Markup Language: Language which combines text and extra information about the text. The extra information is expressed using markup, which is intermingled with the primary text.
Query Language: Computer language used to make queries into databases and information systems.
Semi-Structured Data: Data with incomplete structure. Data are directly described using a simple syntax, e.g., XML, GML, etc.
Technical University of Varna, Bulgaria
Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Abstract
Image Databases (IDBs) are a kind of Spatial Databases where a large number of images are stored and queried. In this chapter, techniques for indexing an IDB for efficiently processing several kinds of queries, like retrieval based on features, content, structure, processing of joins, and queries by example, are reviewed. The main indexing techniques used in IDBs are either members of the R-tree family (data-driven structures) or members of the quadtree family (space-driven structures). Although research on IDB indexing has been under way for several years, there are still significant research challenges, which are also discussed in this chapter. IDBs and their indexing structures bring together two different disciplines (databases and image processing), and interdisciplinary research efforts are required. Moreover, dealing with the semantic gap (successful integrated retrieval based on low-level features and high-level semantic features) and querying between images and other kinds of spatial data are also significant future research directions.
Introduction
Image Databases (IDBs) are a special kind of Spatial Databases where a large number of images are stored and queried. IDBs have a plethora of applications in modern life, for example in medical, multimedia, and educational applications. In the framework of Geographical Information Systems (GIS), digital images (raster data) may represent changes in cultivations, sunny areas, and the discrimination between urban environments and countryside.
Apart from the raster format, GIS data may be stored in vector format (points, line segments, polygons, etc.). Each of these data formats has certain advantages, making a choice between them a challenge. Raster data lead to faster computing for several operations (e.g., overlays) and are well suited for remote sensing. On the other hand, they have a fixed resolution, leading to limited detail. In this article, we focus on raster data (image databases) and their indexing techniques.
Since the start of the 1980s, several structures for spatial objects have been proposed in the literature for efficient storage and retrieval of image collections. Based on these methods, many kinds of useful queries on image data may be processed efficiently. These include:
• Queries about the content of additional properties (descriptive information) that have been embedded for each image (e.g., which images have been used on the covers of children’s books?)
• Queries about the characteristics/features of the images like color, texture, shape etc. (e.g., find the images that depict vivid blue sky.)
• Queries for retrieving images with specified content (e.g., find the images that contain the sub-image of a specified chair.)
• Queries by example or sketch (e.g., a sample image is chosen, or drawn by the user, and images similar to this sample are sought.)
• Structural queries (e.g., find the images that contain a number of specific objects in a specified arrangement.)
• Image Joins (e.g., find the cultivation areas that reside in polluted atmosphere areas.)
• Queries that combine regional data and other sorts of spatial data (e.g., find the cities represented by point data that reside within 5 km of cotton cultivations.)
• Temporal Queries on sequences of evolving images (e.g., find if there has been an increase in the regions of wheat cultivations in this prefecture during the last two years.)
The importance of image indexing and querying techniques led major Database Management Systems’ manufacturers to embed related extensions in the core engine of their products: for example, DB2 has embedded QBIC technology (Flickner et al. 1995), and Oracle provides Content-Based Image Retrieval (CBIR) based on Virage (Annamalai et al. 2000).
Background
A digital image is a representation of a two-dimensional image as a finite set of digital values, called picture elements or pixels. In a binary image, each pixel can be either black or white, while in a greyscale (color) image each pixel corresponds to a shade of gray (to a color), among a set of permitted greyscale (color) values. Each image represents a scene containing objects and regions. An IDB is an organized collection of digital images aiming at the management and the efficient processing of queries on this image collection. There are numerous publications in the literature related to the processing of queries on image features like color (e.g., distribution of colors, dominant colors, and color moments), texture (the pattern of the image surface change, usually expressed by a combination of characteristics like coarseness, contrast, directionality, uniformity, regularity,
Image Database Indexing Techniques
density, frequency, etc.) and shape (the physical structure of objects, or the geometric shapes present in the image). In several of these publications (emerging from the image processing/computer vision community) the term indexing refers to the features corresponding to each image and to the algorithm used for computing the similarity between them (the algorithm often works by an exhaustive comparison with all the images present in the database). In this article, indexing is used in the context of databases and corresponds to the access methods (data structures) used to speed up query processing.
Several publications that contain review material have appeared in the literature. Rui et al. (1999) review numerous papers covering several aspects of CBIR, including multidimensional indexing, and identify open research issues. Smeulders et al. (2000) is another detailed review of CBIR techniques covering the research presented up to the year 2000; it also includes a subsection on storage and indexing. The last section of that paper presents the authors’ view on CBIR’s future trends. Manolopoulos et al. (2000) overview indexing for structural and feature-based queries. Veltkamp and Tanase (2001) performed a survey of numerous CBIR systems, providing the information that is available for each of them on several technical aspects, including the use of indexing structures. One conclusion of this survey is that “Indexing data structures are often not used”. Manouvrier et al. (2005) present a detailed review of quadtree-based indexing in the image domain, ranging from image representation and storage to CBIR. Price (2006) maintains an extensive Computer Vision bibliography (an invaluable tool for the researcher) that contains many references to image indexing.
Main Focus of the Article
In this section, we review the main indexing techniques that have been proposed for image databases. These techniques are grouped and classified by the family of their structure. The two main families are the R-tree family (data-driven structures) and the quadtree family (space-driven structures) (subsection 2.1.2 of chapter 6, Manolopoulos et al. 2000).
Chang (1987) proposed the use of 2-D strings for the structural representation of objects appearing in an image. Using this technique, structural queries can be answered by exhaustive comparisons with all the images in the IDB. Petrakis (1993) and Orphanoudakis (1996) used hash-based indexing to speed up processing. 2-D strings are an efficient representation of the “left/right” and “below/above” relationships. Petrakis and Faloutsos (1997), Petrakis (2002) and Petrakis et al. (2002) adopted Attributed Relational Graphs (ARGs), the most general image structure representation method, where individual objects or regions are represented by graph nodes and their relationships are represented by edges between such nodes. The method developed by Petrakis and Faloutsos (1997) achieves fast query processing by making certain assumptions on the presence of objects in each image. Petrakis (2002) and Petrakis et al. (2002) relax these assumptions. All these ARG-based methods achieve high performance by indexing ARGs with structures of the R-tree family.
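To make the “left/right” and “below/above” encoding concrete, the following Python sketch builds a simplified 2-D string for a set of labelled objects. It is an illustration under assumptions, not the original formulation: each object is reduced to a hypothetical (symbol, x, y) tuple, only the relations "<" (strictly before) and "=" (same coordinate) are produced, and axis_string and two_d_string are made-up names.

```python
def axis_string(objects, axis):
    """Order symbols along one axis; '<' separates different coordinates,
    '=' joins symbols sharing the same coordinate."""
    ordered = sorted(objects, key=lambda o: (o[axis], o[0]))
    parts = [ordered[0][0]]
    for prev, cur in zip(ordered, ordered[1:]):
        parts.append("=" if prev[axis] == cur[axis] else "<")
        parts.append(cur[0])
    return " ".join(parts)

def two_d_string(objects):
    """Return the pair (u, v): u encodes left/right, v encodes below/above."""
    if not objects:
        return ("", "")
    return (axis_string(objects, 1), axis_string(objects, 2))

# Hypothetical image with three objects given as (symbol, x, y).
image_objects = [("a", 1, 3), ("b", 2, 1), ("c", 2, 2)]
print(two_d_string(image_objects))  # ('a < b = c', 'b < c < a')
```

Two images whose objects stand in the same arrangement yield the same pair of strings, which is why a structural query can be answered by comparing strings rather than pixels.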
An R-tree is a balanced multiway tree for secondary storage, where each node is related to a Minimum Bounding Rectangle (MBR), that is, the minimum rectangle that bounds the data elements contained in the node. The MBR of the root bounds all the data stored in the tree. Figure 1 depicts some rectangles (MBRs of data elements) on the right and the corresponding R-tree on the left. Dotted lines denote the bounding rectangles of the subtrees that are rooted in inner nodes. The most widely used R-tree is the R*-tree; for more details refer to Gaede and Günther (1998).
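The role of MBRs in pruning a search can be sketched in Python. This is a minimal illustration under assumptions: rectangles are (xmin, ymin, xmax, ymax) tuples, the tree is assembled by hand rather than by the R-tree insertion and node-splitting algorithms, and Node, mbr, intersects and window_query are hypothetical names. A subtree is visited only when its MBR overlaps the query window.

```python
from dataclasses import dataclass, field

def mbr(rects):
    """Minimum bounding rectangle of a list of (xmin, ymin, xmax, ymax) tuples."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def intersects(a, b):
    """True if rectangles a and b overlap (touching counts as overlap)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class Node:
    children: list          # Node objects (inner node) or data rectangles (leaf)
    is_leaf: bool
    box: tuple = field(init=False)

    def __post_init__(self):
        rects = self.children if self.is_leaf else [c.box for c in self.children]
        self.box = mbr(rects)

def window_query(node, window):
    """Data rectangles overlapping the window, skipping subtrees whose MBR misses it."""
    if not intersects(node.box, window):
        return []
    if node.is_leaf:
        return [r for r in node.children if intersects(r, window)]
    hits = []
    for child in node.children:
        hits.extend(window_query(child, window))
    return hits

# Hand-built two-level tree over four hypothetical data rectangles.
leaf1 = Node([(0, 0, 2, 2), (1, 1, 3, 3)], is_leaf=True)
leaf2 = Node([(6, 6, 8, 8), (7, 5, 9, 7)], is_leaf=True)
root = Node([leaf1, leaf2], is_leaf=False)
print(root.box)                                    # (0, 0, 9, 8)
print(window_query(root, (2.5, 2.5, 6.5, 6.5)))    # [(1, 1, 3, 3), (6, 6, 8, 8)]
```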
Papadias et al. (1998) treat the problem of structural image queries as a Multiple Constraint Satisfaction (MCS) problem. Both the images and the queries are mapped to regions in a multidimensional space and are indexed by structures of the R-tree family. Query processing is treated as a general form of spatial joins (multi-way spatial joins).
QBIC (Flickner et al. 1995) was one of the first systems that introduced multidimensional indexing to enhance the performance of CBIR. Color, shape and texture features are extracted from the images and are represented by points in high-dimensional spaces. The Karhunen-Loeve Transform is used to perform dimension reduction of the feature data (in order to overcome the degradation of performance of multidimensional index structures as the dimensionality increases, a situation known as the “curse of dimensionality”, Lin et al. 1994) and a structure belonging to the R-tree family (an R*-tree) is used as a multidimensional indexing structure.
Seidl and Kriegel (2001) present techniques for adaptable similarity search. They use quadratic distance functions that are evaluated using multidimensional index structures of the R-tree family (and especially X-trees), dimensionality reduction and approximation techniques (for an introduction to X-trees, see Manolopoulos et al. 2000).
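A quadratic (quadratic form) distance between two feature vectors x and y is commonly written as d(x, y) = sqrt((x − y)ᵀ A (x − y)), where the matrix A expresses how similar the individual feature dimensions are to each other. The Python sketch below shows the computation only; the matrices and histograms are hypothetical, A is assumed to be symmetric and positive semi-definite, and nothing here reproduces the index-based evaluation of the cited work.

```python
import math

def quadratic_form_distance(x, y, a):
    """sqrt((x - y)^T A (x - y)) for vectors x, y and a square matrix a.

    Assumes a is symmetric positive semi-definite, so the sum is non-negative.
    """
    diff = [xi - yi for xi, yi in zip(x, y)]
    n = len(diff)
    total = sum(diff[i] * a[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(total)

# Two hypothetical 3-bin color histograms that differ in bins 0 and 1.
x = [1.0, 0.0, 0.0]
y = [0.0, 1.0, 0.0]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Bins 0 and 1 are declared half-similar (for example, two close shades).
similar = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(quadratic_form_distance(x, y, identity))  # Euclidean distance, about 1.414
print(quadratic_form_distance(x, y, similar))   # 1.0
```

With the identity matrix the measure reduces to the Euclidean distance; declaring two bins similar brings the two histograms closer together, which is the adaptability the paragraph refers to.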
For efficient processing of queries in image databases, Quadtrees have also been extensively used as indexing mechanisms. The Quadtree is a four-way unbalanced tree where each node corresponds to a subquadrant of the quadrant of its father node (the root corresponds to the whole space). These trees subdivide space in a hierarchical and regular fashion. They are mainly designed for main memory; however, several alternatives for secondary memory have been proposed. The most widely used Quadtree is the Region Quadtree, which stores regional data in the form of raster images. Figure 2 depicts an 8x8 pixel array and the corresponding Quadtree. Note that black/white squares represent black/white regions, while circles represent internal nodes (gray regions). The Linear Region Quadtree is an external memory version of the Region Quadtree, where each quadrant is represented by a codeword stored in a B+-tree; for more details refer to Samet (1990).
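The recursive subdivision can be sketched in Python for a small binary image. The sketch makes several assumptions that are not taken from the text: the image is a square list of lists whose side is a power of two, children are ordered NW, NE, SW, SE, and the "codewords" are simply the digit paths to the black leaves kept in a sorted list, which stands in for the fixed-length codewords and the B+-tree of an actual Linear Region Quadtree. The names build_quadtree and linear_codes are hypothetical.

```python
def build_quadtree(image, x=0, y=0, size=None):
    """Region quadtree of a binary image.

    Returns 'B' (black leaf), 'W' (white leaf) or a list of four subtrees
    in the order NW, NE, SW, SE (a gray internal node).
    """
    if size is None:
        size = len(image)
    first = image[y][x]
    if all(image[r][c] == first
           for r in range(y, y + size) for c in range(x, x + size)):
        return "B" if first else "W"
    half = size // 2
    return [build_quadtree(image, x, y, half),
            build_quadtree(image, x + half, y, half),
            build_quadtree(image, x, y + half, half),
            build_quadtree(image, x + half, y + half, half)]

def linear_codes(tree, prefix=""):
    """Digit paths (0=NW, 1=NE, 2=SW, 3=SE) of all black leaves, in sorted order."""
    if tree == "B":
        return [prefix]
    if tree == "W":
        return []
    codes = []
    for digit, child in enumerate(tree):
        codes.extend(linear_codes(child, prefix + str(digit)))
    return codes

# Hypothetical 4x4 binary image (1 = black, 0 = white).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
tree = build_quadtree(image)
print(tree)                # ['W', 'B', ['W', 'B', 'B', 'B'], 'W']
print(linear_codes(tree))  # ['1', '21', '22', '23']
```

Only black quadrants are stored in the linear form, so a large uniform region costs a single short code regardless of how many pixels it covers.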
Figure 1. An example of an R-tree

Quadtrees have been used for CBIR (representation and querying by image features) by several researchers, as a mechanism for calculating image similarity by defining appropriate similarity measures. Examples of such work follow. In some research efforts, complete Quadtrees with a fixed number of levels are used, since they lead to sufficiently precise results. Each node in the Quadtree stores the features that correspond to its quadrant, for example, a color histogram (Lin et al., 2001), or a combination of feature histograms (Malki et al., 1999). De Natale and Granelli (2001) use unbalanced Quadtrees for image segmentation into dominant colors. Each quadtree is modelled by a binary array representing its structure and a label array representing the dominant color associated with each node or leaf. Ahmad & Grosky (2003) use unbalanced Quadtrees to decompose an image into a spatial arrangement of feature points (extracted using image processing techniques) and to quantify image similarity, while providing geometric variance independence. For search and retrieval, an indexing scheme based on image signatures and quadtrees is used. Chakrabarti et al. (2000) use Quadtrees to represent two-dimensional shapes and perform shape-based similarity retrieval. The proposed representation is designed to exhibit invariance to scale, translation and rotation.
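The idea of a complete quadtree whose nodes store the histogram of their quadrant can be sketched in Python. This is not the method of any of the cited papers: the greyscale images, the number of bins, the per-level averaging of L1 distances, and the names histogram, histogram_quadtree and distance are all assumptions chosen to keep the example small. The image side is assumed to be divisible by 2 to the power of (levels − 1).

```python
def histogram(image, x, y, size, bins, max_value=256):
    """Normalized greyscale histogram of the size x size quadrant at (x, y)."""
    counts = [0] * bins
    for r in range(y, y + size):
        for c in range(x, x + size):
            counts[image[r][c] * bins // max_value] += 1
    total = size * size
    return [n / total for n in counts]

def histogram_quadtree(image, levels, bins=4):
    """Complete quadtree stored level by level; level k holds 4**k histograms."""
    side = len(image)
    tree = []
    for level in range(levels):
        cells = 2 ** level
        size = side // cells
        tree.append([histogram(image, cx * size, cy * size, size, bins)
                     for cy in range(cells) for cx in range(cells)])
    return tree

def distance(tree_a, tree_b):
    """Sum over levels of the mean L1 distance between corresponding histograms."""
    total = 0.0
    for level_a, level_b in zip(tree_a, tree_b):
        level_sum = sum(sum(abs(p - q) for p, q in zip(ha, hb))
                        for ha, hb in zip(level_a, level_b))
        total += level_sum / len(level_a)
    return total

# Two hypothetical 2x2 images with the same pixels in a different layout.
top_dark = [[0, 0], [255, 255]]
top_bright = [[255, 255], [0, 0]]
tree_a = histogram_quadtree(top_dark, levels=2, bins=2)
tree_b = histogram_quadtree(top_bright, levels=2, bins=2)
print(distance(tree_a, tree_a))  # 0.0
print(distance(tree_a, tree_b))  # 2.0
```

The two images have identical global histograms, so the root level contributes nothing; the difference appears only at the second level, where the quadrants expose the different spatial layout.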
Overlapping has been applied to Linear Region Quadtrees (Tzouramanis et al. 2004). In this and previous papers by the same authors, four different extensions of the Linear Region Quadtree are presented for indexing a sequence of evolving raster data. Moreover, temporal window queries are defined and studied. These queries relate to the evolution of regional data inside a window in the course of time.
Quadtrees have also been used for creating an IDB, where image retrieval, insertion, deletion, comparison and set operations can be applied. A single quadtree is used for all images. Its nodes are associated with the list of images that have information in the respective quadrants. Vassilakopoulos & Manolopoulos (1995) proposed Dynamic Inverted Quadtrees, while Jomier et al. (2000) proposed a version suitable for binary, gray scale or color images.
Corral et al. (1999) combine two different kinds of data and two different kinds of indexing structures. They present five algorithms suitable for processing join queries between point data stored in an R-tree and image data stored in a Linear Region Quadtree.
Due to space limitations, only the most prominent IDB indexing structures are reviewed in this article. The choice of an indexing method among them depends on the application. Each of the above techniques has been designed around a specific problem setting. A qualitative comparison between them is an interesting direction for future work that lies beyond the scope and the size limit of this article. Descriptions of several other Spatial Access Methods that have been used in IDBs can be found in Samet (1990), Gaede and Günther (1998) and Manolopoulos et al. (2000).
Future Trends
IDBs are related to two different scientific communities: database and image processing / computer vision researchers. Multidimensional access methods as well as information retrieval techniques and their use for query processing constitute the key meeting point of the two worlds. Several of the techniques of the image processing community could make further use of access methods and/or adapt to their properties, leading to more efficient processing of image-related queries. Related to the previous research direction is the further development of systems able to retrieve (and, in general, process queries) from image collections existing in different sources, including the WWW (Rui et al. 1999); indexing techniques are expected to play a dominant role in them.
In Zhao and Grosky (2002), one of the first techniques for integrated image retrieval based on low-level features and high-level semantic features of images is presented. Mojsilovic et al. (2004) present a methodology for semantic-based image retrieval based on low-level image descriptors. However, neither of these works is based on indexing structures. Since image retrieval based on both these kinds of features is crucial for the usefulness of CBIR systems (for a discussion of the semantic gap see subsection 2.4 of Smeulders et al. 2000) and still remains one of the big challenges for researchers, indexing structures could be used in this context for calculating the correlations between low-level features and high-level concepts efficiently.
In Mao et al. (2005), distance-based tree structures are used for computing the similarity of images, which are represented by features reflecting their structure, texture and color. Although the high dimensionality of the feature space suggests that distance-based indexing techniques are outperformed by sequential scan (curse of dimensionality), the authors show that the intrinsic dimensionality of real data is low and that distance-based indexing specifically designed to reflect the intrinsic clustering of real data can be applied. The design and study of more generalized techniques in this direction is another research challenge.
Despite the extensive research performed in spatial / spatio-temporal databases, storing a large database of (possibly evolving) images, or of regional data sets, and being able to efficiently answer queries between these data and other sorts of spatial/spatiotemporal data, or queries involving the notion of time, is still a big research challenge. An example is efficiently answering queries like: find the boats (moving points) that were inside the storm (changing regional data) during this morning (a time interval).
Conclusion
In this paper, we have reviewed techniques related to indexing an image database as a means for efficiently processing several kinds of queries, like retrieval based on features, content, structure, processing of joins, and queries by example. Although research in this scientific area has been under way for several years, there are still significant research challenges. Image databases and their indexing structures bring together two different disciplines (databases and image processing), and developing a true Image Database System requires interdisciplinary research efforts. Moreover, the semantic gap remains open, and querying between images and other kinds of spatial data has not attracted enough attention yet.
References
Ahmad, I., & Grosky, W. I. (2003). Indexing and retrieval of images by spatial constraints. Journal of Visual Communication and Image Representation, 14(3), 291-320.
Annamalai, M., Chopra, R., DeFazio, S., & Mavris, S. (2000). Indexing images in oracle8i. In Proc. SIGMOD’00, 539-547.
Chakrabarti, K., Ortega-Binderberger, M., Porkaew, K., Zuo, P., & Mehrotra, S. (2000). Similar Shape Retrieval in MARS. In Proc. IEEE Int. Conf. on Multimedia and Expo (II), 709-712.
Chang, S. K., Shi, Q. Y., & Yan, C. W. (1987). Iconic indexing by 2-d strings. IEEE Trans. Pattern Anal. Machine Intell., 9, 413-427.
Corral, A., Vassilakopoulos, M., & Manolopoulos, Y. (1999). Algorithms for Joining R-trees and Linear Region Quadtrees. In Proc. of SSD’99, LNCS 1651, 251-269. Springer Verlag.
De Natale, F. G. B., & Granelli, F. (2001). Structured-Based Image Retrieval Using a Structured Color Descriptor. In Proc. Int. Workshop on Content-Based Multimedia Indexing (CBMI’01), 109-115.
Flickner, M., Sawhney, H., Ashley, J., Huang, Q., Dom, B., Gorkani, M., Hafner, J., Lee, D., Petkovic, D., Steele, D., & Yanker, P. (1995). Query by Image and Video Content: The QBIC System. IEEE Computer, 28(9), 23-32.
Gaede, V., & Günther, O. (1998). Multidimensional Access Methods. ACM Computing Surveys, 30(2), 170-231.
Jomier, G., Manouvrier, M., & Rukoz, M. (2000). Storage and Management of Similar Images. Journal of the Brazilian Computer Society (JBCS), 3(6), 13-26.
Lin, K.-I., Jagadish, H. V., & Faloutsos, C. (1994). The TV-tree - an index structure for high-dimensional data. VLDB Journal, 3, 517-542.
Lin, S., Tamer Özsu, M., Oria, V., & Ng, R. (2001). An Extendible Hash for Multi-Precision Similarity Querying of Image Databases. In Proc. of VLDB’2001, 221-230.
Malki, J., Boujemaa, N., Nastar, C., & Winter, A. (1999). Region Queries without Segmentation for Image Retrieval by Content. In Proc. of 3rd Int. Conf. on Visual Information Systems (Visual’99), 115-122.
Manolopoulos, Y., Theodoridis, Y., & Tsotras, V. (2000). Image and Multimedia indexing. In Advanced Database Indexing, Kluwer Publishers, 167-184.
Manouvrier, M., Rukoz, M., & Jomier, G. (2005). Quadtree-Based Image Representation and Retrieval. In Manolopoulos Y., Papadopoulos A. & Vassilakopoulos M. (Eds.), Spatial Databases: Technologies, Techniques and Trends. Idea Group Publishing, Information Science Publishing and IRM Press, 81-106.
Mao, R., Iqbal, Q., Liu, W., & Miranker, D. (2005). Case study: Distance-Based Image Retrieval in the MoBIoS DBMS. In Proc. of the 5th Int. Conf. on Computer and Information Technology (CIT-2005), pp. 49-57.
Mojsilovic, A., Gomes, J., & Rogowitz, B. (2004). Semantic-Friendly Indexing and Quering of Images Based on the Extraction of the Objective Semantic Cues. In Special Issue on Content-Based Image Retrieval of IJCV (56), No. 1-2, 79-107.
Papadias, D., Mamoulis, N., & Delis, V. (1998). Algorithms for Querying by Spatial Structure. In & Tortora G. (eds.) Intelligent Image Database Systems (important contributions in the field of spatial projections), World Scientific Publishing Co., 197-218.
Petrakis, E. (2002). Fast Retrieval by Spatial Structure in Image Databases. J. Vis. Lang. Comput., 13(5), 545-569.
Petrakis, E., & Faloutsos, C. (1997). Similarity Searching in Medical Image Databases. IEEE Trans. Knowl. Data Eng., 9(3), 435-447.
Petrakis, E., Faloutsos, C., & Lin, K-I. (2002). ImageMap: An Image Indexing Method Based on Spatial Similarity. IEEE Trans. Knowl. Data Eng., 14(5), 979-987.
Price, K. (2006). Annotated Computer Vision Bibliography. http://iris.usc.edu/Vision-Notes/bibliography/contents.html
Rui, Y., Huang, T. S., & Chang, S.-F. (1999). Image retrieval: Current techniques, promising directions, and open issues. Journal of Visual Communication and Image Representation, 10(1), 39-62.
Samet, H. (1990). The Design and Analysis of Spatial Data Structures. Addison Wesley.
Seidl, T., & Kriegel, H.-P. (2001). Adaptable Similarity Search in Large Image Databases. In Veltkamp R., Burkhardt H., Kriegel H.-P. (eds.): State-of-the Art in Content-Based Image and Video Retrieval, Kluwer Publishers, 297-317.
Smeulders, A., Worring, M., Santini, S., Gupta, A., & Jain, R. (2000). Content-Based Image Retrieval at the End of the Early Years. IEEE Trans. Pattern Anal. Mach. Intell., 22(12), 1349-1380.
Trang 12Tzouramanis, T., Vassilakopoulos, M., &
Manolo-poulos, Y (2004) Benchmarking access methods
for time-evolving regional data Data & Knowl
Eng., 49(3), 243-286
Vassilakopoulos, M., & Manolopoulos, Y (1995)
Dynamic Inverted Quadtree - a Structure for
Pictorial Databases Information Systems
Spe-cial Issue on Multimedia Information Systems,
20(6), 483-500.
Veltkamp, R C., & Tanase, M (2001)
Content-based image retrieval systems: A survey http://
www.aa-lab.cs.uu.nl/cbirsurvey/
Zhao, R., & Grosky, W I (2002) Bridging the
semanitic gap in image retrieval In Distributed
multimedia databases: techniques & applications,
Idea Group Publishing, 14-36
Key Terms
Access Method or Index Structure: A technique of organizing data that allows the efficient retrieval of data according to a set of search criteria.
Color Features of an Image: Characteristics of an image related to the presence of color information, like distribution of colors, dominant colors, or color moments.
Content-Based Image Retrieval: Searching for images in image databases according to their visual contents, like searching for images with specific color, texture, or shape properties, for images containing specific objects, or containing objects in a specified arrangement.
Image Database: An organized collection of digital images aimed at the efficient management and the processing of queries on this image collection.
Query Processing: Extracting information from a large amount of data without actually changing the underlying database where the data are organized.
Semantic Features of an Image: The contents of an image according to human perception, like the objects present in the image or the concepts / situations related to the image.
Shape Features of an Image: The physical structure(s) of the objects, or the geometric shapes present in the image.
Similarity of Images: The degree of likeness between images according to a number of features, like color, texture, shape, and semantic features.
Structural Features of an Image: The arrangement of the objects depicted in the image.
Texture Features of an Image: The pattern(s) of the image’s surface change, usually expressed by a combination of characteristics like coarseness, contrast, directionality, uniformity, regularity, density, and frequency.
Chapter IV
Different Roles and Definitions
of Spatial Data Fusion
Patrik Skogster
Rovaniemi University of Applied Sciences, Finland
Copyright © 2009, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.
Abstract
Geographic information is created by manipulating geographic (or spatial) data (generally known by the abbreviation geodata) in a computerized system. Geospatial information and geomatics are issues of modern business and research. It is essential to provide their different definitions and roles in order to get an overall picture of the issue. This article discusses the problems of definition, as well as the technologies and challenges within spatial data fusion.
Due to the rapid advances in database systems and information technology over the last decade, researchers in the information systems, decision science, artificial intelligence (AI), machine learning, and data mining communities are facing a new challenge: discovering and deriving useful and actionable knowledge from massive data sets. During the last decade, many researchers have also studied how to exploit the synergy in information from multiple sources. This phenomenon includes terminology such as spatial data fusion, information fusion, knowledge (and/or belief) fusion, and many more.
Terminology
Geospatial data has many definitions, but one point of view is that it is data consisting of geographical information, geostatistics and geotextual information. This theme was already addressed in the mid-1980s by Crist & Cicone (1984a). According to Crist and Cicone (1984a), geostatistics are data that are related to a national or subnational unit and can be georeferenced. Geotextual data are defined as text databases (like treaty databases) that are linked to some geographic entity. Crist and Cicone (1984b) also argue that data fusion is not just overlaying maps.
Information fusion is a term defined as “a formal framework in which are expressed the means and tools for the alliance of data originating from different sources” (Wald 1999). Wald (2000) adds that spatial data fusion is therefore “the formal framework that expresses the means and tools for the alliance of data originating from different sources”. It must be remembered, though, that every definition reflects the subject at hand. Wald’s (2000) focus is mainly on the prominent vision of remote sensing data, where the discussion is about pixel fusion, image fusion, sensor fusion and measurement fusion.
The term “information fusion” can, in other words, be used when different information and data are used to solve problems. Locational data can be added to this information fusion context, and the result is spatial data fusion. As Kim (2005) describes, “information fusion can be implemented at two different levels: raw data and intermediate data”. Information fusion at the raw data level basically means taking advantage of the synergy from considering multiple measurements of the same pattern (e.g., considering two temporal series based on different measurement systems), while information fusion at the intermediate data level takes the synergy from utilizing multiple patterns (e.g., utilizing both temporal and spatial patterns) (Hall & Llinas 2001). Information fusion at the raw data level becomes important, for example, when two different measurements have recorded the same activities or events (e.g., flood level or market share) at the same location on a regular basis (Vanderhaegen & Muro 2005; Pereira 2002).
Knowledge fusion is the process by which heterogeneous information from multiple sources is merged to create knowledge that is more complete, less uncertain, and less conflicting than the input. Knowledge fusion can be seen as a process that creates knowledge. It can also involve annotating the output information with meta-level information about the provenance of the information used and the mode of aggregation (Hunter and Liu 2005; Hunter and Summerton 2004).
Spatial data fusion is a combination of the above with the added dimension of spatiality. It is by definition an enormous and complex field, comprising issues ranging from registration and pixel-level fusion of data for improving spatial resolution to managerial decision-level fusion using previously computed information stored in geographic information systems (Malhotra 1998).
Technologies within Spatial Data Fusion
It has been estimated that up to 80% of all data stored in corporate databases may have a spatial component (Franklin 1992). To support analytical processes, today’s organizations deploy data warehouses and client tools such as OLAP (On-Line Analytical Processing) to access, visualize, and analyze integrated, aggregated and summarized data. The term “multidimensional” was established in the mid-1980s by computer scientists who were involved in the extraction of meaningful information from very large statistical databases (Rafanelli 2003).
Since a large part of this data has a spatial component, better client tools are required to take full advantage of the geometry of the spatial phenomena or objects being analyzed. In this regard, Spatial OLAP (SOLAP) technology can be one solution (Rivest et al. 2005). A SOLAP tool can be defined as “a type of software that allows rapid and easy navigation within spatial databases and that offers many levels of information granularity, many themes, many epochs and many display modes synchronized or not: maps, tables and diagrams” (Bedard et al. 2005).
As an alternative to the traditional statistical regression models, new algorithms have been developed and presented by the machine learning, artificial intelligence, and data mining communities (Kim 2005). These algorithms include decision tree algorithms (Quinlan 1993), support vector machines (SVM) (Burges 1998), and genetic algorithms (Goldberg 1989). In particular, many algorithms based on artificial neural networks (ANNs) and their variants have proved very successful at predicting, classifying, and describing temporally correlated data (Giles et al. 2001). ANNs can also be applied to various business applications (Christakos et al. 2002), such as analytical review procedures in auditing (Koskivaara 2004), stock market predictions (Saad et al. 1998), market segmentation (Hruschka & Natter 1999), and Web usage profiling (Anandarajan 2002). It is the universal approximation property of ANNs that makes them one of the most popular tools for analyzing temporally correlated stochastic processes. The universal approximation property implies that, given sufficiently many hidden nodes, multi-layer neural networks can approximate any function arbitrarily closely (Hornik et al. 1989). Commonly, multi-layer perceptrons with sigmoidal and radial basis functions have been used as alternatives to the linear stochastic autoregressive model of order p, AR(p).
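For reference, the linear AR(p) baseline mentioned above forecasts the next value of a series as a weighted sum of the previous p values. The following is a minimal sketch in Python; the function name ar_predict and the coefficient values are illustrative assumptions and are not taken from the cited works.

```python
def ar_predict(history, coeffs, intercept=0.0):
    """One-step AR(p) forecast: x_t = c + sum_i phi_i * x_{t-i}.

    history -- past observations, oldest first
    coeffs  -- [phi_1, ..., phi_p]; hypothetical values here,
               in practice they are estimated from the data
    """
    p = len(coeffs)
    if len(history) < p:
        raise ValueError("need at least p past observations")
    # Most recent value first, so that phi_1 pairs with x_{t-1}.
    recent = history[-p:][::-1] if p else []
    return intercept + sum(phi * x for phi, x in zip(coeffs, recent))


# AR(2) with made-up coefficients: 0.5 * 3.0 + 0.25 * 2.0
forecast = ar_predict([1.0, 2.0, 3.0], [0.5, 0.25])
```

A neural network replaces this fixed weighted sum with a nonlinear function of the same lagged inputs, which is why it can serve as an alternative when the linear form is too restrictive.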
Another type of neural network, the time-delay neural network (TDNN), has been used to approximate a stochastic process. In TDNNs, input patterns arrive at hidden nodes at different times through delayed connections and thus can influence subsequent inputs. A different type of neural network, the recurrent neural network (RNN), has also been proposed to model temporally correlated data sets. “Jordan” (Jordan 1986) and “Elman” networks (Elman 1990) are two representative examples of RNNs. Both networks employ feedback connections to enhance the limited representational power of networks due to a narrow time window. For example, Jordan networks have a feedback loop from the output layer to an additional input called the context layer. However, both types of networks are still restricted in the sense that they cannot deal with an arbitrarily long time window (Dorffner 1996).
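The feedback loop of a Jordan network can be illustrated with a very small sketch. The code below is an assumption-laden toy, not the formulation of Jordan (1986): the weights are fixed by hand rather than learned, the hidden layer uses tanh, the output layer is linear, and the names jordan_step and run_sequence are hypothetical.

```python
import math


def jordan_step(x, context, w_in, w_ctx, w_out):
    """One time step: the hidden layer sees the input and the context layer.

    x       -- current input vector
    context -- previous output vector (the context layer)
    w_in[j], w_ctx[j] -- weights into hidden node j
    w_out[k]          -- weights from the hidden layer into output k
    """
    hidden = [
        math.tanh(
            sum(w * xi for w, xi in zip(w_in[j], x))
            + sum(w * c for w, c in zip(w_ctx[j], context))
        )
        for j in range(len(w_in))
    ]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def run_sequence(inputs, w_in, w_ctx, w_out):
    """Feed a sequence through the network, copying each output into the context."""
    context = [0.0] * len(w_out)  # empty context before the first input
    outputs = []
    for x in inputs:
        context = jordan_step(x, context, w_in, w_ctx, w_out)
        outputs.append(context)
    return outputs


# The same input presented twice gives different outputs,
# because the second step also sees the first output.
outs = run_sequence([[1.0], [1.0]], w_in=[[1.0]], w_ctx=[[1.0]], w_out=[[1.0]])
```

With the context weights set to zero the network reduces to a plain feedforward net, and identical inputs then yield identical outputs.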
Since transactional systems are not designed to support decision processes, new types of systems have been developed to perform data fusion, such as those developed by Fischer and Ostwald (2001). These solutions are technically called Analytical Systems and are known on the market as Business Intelligence (BI) solutions. Rivest et al. (2005) explain that these systems, in which the data warehouse is usually a central component, are optimized to facilitate complex analysis and to improve the performance of database queries involving thousands or more occurrences. According to Meeks and Dasgupta (2004), “in the short-term, the models need refinement, primarily through applying statistical confidence to the scoring functions”. Popular web search engines can also be seen as BI solutions. They use several different evaluation schemes, such as keyword proximity, keyword density, and synonym matching, among others, to estimate the quality of links and files returned from Internet text searches. Analyses made by their servers combine different search terms with the relevant data available; in other words, search engines perform data fusion.
Extensible Markup Language (XML) is applied to control all definitions of discovered patterns and rules to ensure the consistency of the proposed knowledge map. The advantage of XML is that it represents a compromise between flexibility, simplicity, and readability by both humans and machines (World Wide Web Consortium (W3C), 2000). XML is therefore rapidly becoming an information-exchange standard for integrating data among various Internet-based applications (Bertino and Ferrari 2001). It should be noted, though, that numerous other data fusion standards also exist, especially in the web-browsing context.
Fusion rule technology is a logic-based approach to knowledge fusion. A set of fusion rules is a way of specifying how to merge structured reports. Structured reports are XML documents, where the data entries are restricted to individual words or simple phrases, such as names and domain-specific terminology, numbers and units. Different sets of fusion rules, with different merging criteria, can be used to investigate a set of structured data analyses by looking at the results of merging. More information can be found in Hunter & Liu (2005) and Hunter & Summerton (2004).
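To make the idea of merging structured reports concrete, the sketch below merges two small XML reports with one hand-written merging criterion: values on which the sources agree are kept, and conflicting values are retained side by side with their provenance. This is only an illustration in Python; it is not the logic-based formalism of Hunter & Liu (2005), and the report contents, the tag names and the function merge_reports are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical structured reports about the same event.
REPORT_A = ("<report source='A'><city>Turku</city>"
            "<temperature unit='C'>12</temperature></report>")
REPORT_B = ("<report source='B'><city>Turku</city>"
            "<temperature unit='C'>14</temperature></report>")


def merge_reports(xml_a, xml_b):
    """Merge two flat reports; agreeing values are kept, conflicts are annotated."""
    reports = [ET.fromstring(xml_a), ET.fromstring(xml_b)]
    merged = ET.Element("report")

    # Collect tag names in first-seen order.
    tags = []
    for report in reports:
        for child in report:
            if child.tag not in tags:
                tags.append(child.tag)

    for tag in tags:
        values = []
        for report in reports:
            node = report.find(tag)
            if node is not None:
                source = report.get("source", "unknown")
                values.append((source, (node.text or "").strip()))
        if len({value for _, value in values}) == 1:
            # All sources agree: keep the single value.
            ET.SubElement(merged, tag).text = values[0][1]
        else:
            # Sources disagree: keep every value together with its provenance.
            conflict = ET.SubElement(merged, tag, {"status": "conflict"})
            for source, value in values:
                ET.SubElement(conflict, "value", {"source": source}).text = value
    return merged


merged = merge_reports(REPORT_A, REPORT_B)
print(ET.tostring(merged, encoding="unicode"))
```

Swapping in a different criterion (for example, preferring one source or averaging numeric values) corresponds to using a different set of fusion rules on the same inputs.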
Asynchronous JavaScript and XML (AJAX) was created to build dynamic web pages on the client side. Data is read from the server or sent to the server by JavaScript requests. However, some processing on the server side is required to handle requests, i.e., finding and storing the data. This is accomplished more easily with a framework dedicated to processing Ajax requests. In the article that coined the term “AJAX”, Garrett (2005) describes the technology as “an intermediary between the user and the server.” This Ajax engine is intended to suppress the so-called waiting time for the user when the page attempts to access the server. The goal of the framework is to provide this Ajax engine and the associated server-side and client-side functions.
Challenges in Spatial Data Fusion
The use of various spatio-temporal data and information usually greatly improves decision-making in many different fields (Christakos et al. 2003). Examples can be found in Meeks and Dasgupta (2004). When using spatial and temporal information to improve decision making, attention must be paid to uncertainty and sensitivity issues (Crosetto & Tarantola 2001).
Because the spatial data fusion process is by its origins a process that produces data assimilations, the challenges it faces are largely related to the data handling process. These include the ability to accept higher data rates and volumes, improved analysis performance and improved multiple operations. Data integration processes, synchronous sampling and common measurement standards are being developed to optimize data fusion performance. This includes improving the efficiency of both the data management process and data collection.
Fusion processes are not yet robust enough. They must be capable of accepting wider ranges of data types, accommodating text reports extracted from natural language, historical data and various kinds of spatial information (maps, charts, images). Fusion processes must therefore develop adaptive and learning properties, using the operating process to add to the knowledge base while exhibiting a sensitivity to “unexpected” or “too good to be true” situations that may indicate countermeasures or deception activities.
The performance of processes needs to increase exponentially. When the amount of processed data increases and analyses become more complicated, efficient, linked data structures are required to handle the wide variety of data. Data volumes for global and even regional spatial databases will be measured in millions of terabits, and short access times are demanded for even broad searches.
The future of spatial data fusion is subject to great uncertainty. Nevertheless, merging spatial data through the use of WMS (Web Map Services) or geoRSS (Really Simple Syndication) seems to be becoming more and more common practice, as are strategic decisions based on a spatial data infrastructure (SDI) context.
The collection of data and its availability can also be seen as a strategic matter. Roberts et al. (2005) highlight the importance of making sense of networks that comprise many nodes and are animated by flows of resources and knowledge. The transfer of managerial practices and knowledge is essential to the functionality of these networks and resources. A survey made by Vanderhaegen & Muro (2005) reveals that almost all of the organisations (90%) making use of spatial data “experience problems with the availability, quality and use of spatial data”. In general, the organisations using the widest range of data types experienced the greatest difficulties in using the data.
The quality of the spatial data is still only one of the many factors that must be taken into consideration within spatial data fusion. Clearly, the results of any spatial data fusion are only as good as the data on which they are based (Johnson et al. 2001). One approach to improving data quality is the imposition of constraints upon data entered into the database (Cockcroft 1997). The proposal is that “better decisions can be made by accounting risks due to errors in spatial data” (Van Oort & Bregt 2005).
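The idea of imposing constraints on data as it is entered can be sketched as a validation step that runs before a record is stored. The example below is a minimal Python illustration under stated assumptions: the record layout (name, lon, lat), the specific rules and the function names are hypothetical and do not reproduce the taxonomy of Cockcroft (1997).

```python
def check_point_record(record):
    """Return a list of constraint violations for a point feature (empty if valid)."""
    problems = []
    lon, lat = record.get("lon"), record.get("lat")
    # Domain constraints: coordinates must be numbers within geographic bounds.
    if not isinstance(lon, (int, float)) or not -180.0 <= lon <= 180.0:
        problems.append("lon must be a number in [-180, 180]")
    if not isinstance(lat, (int, float)) or not -90.0 <= lat <= 90.0:
        problems.append("lat must be a number in [-90, 90]")
    # Attribute constraint: every feature needs a non-empty name.
    if not str(record.get("name", "")).strip():
        problems.append("name must not be empty")
    return problems


def insert_if_valid(table, record):
    """Append the record to the table only if it violates no constraint."""
    problems = check_point_record(record)
    if not problems:
        table.append(record)
    return problems


table = []
insert_if_valid(table, {"name": "Rovaniemi", "lon": 25.7, "lat": 66.5})  # accepted
rejected = insert_if_valid(table, {"name": "", "lon": 200, "lat": 66.5})  # refused
```

Rejecting the record at entry time, rather than cleaning it later, keeps errors out of every downstream fusion result that would otherwise inherit them.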
Conclusion
A great amount of the data located in various databases has a spatial component. New innovative applications can be produced by assimilating information with other data. This paper introduced the terminology and technology associated with spatial data fusion. Data fusion is the process of detection, association, correlation, and combination of data and information from multiple sources. To lead the reader further, the material in the list of references is suggested; it allows one to go beyond the traditional transactional data fusion capabilities.
References
Anandarajan, M. (2002). Profiling Web Usage in the Workplace: A Behavior-based Artificial Intelligence Approach. Journal of Management Information Systems, 19(1), 243-266.
Bertino, E., & Ferrari, E. (2001). XML and data integration. IEEE Internet Computing, 5(6), 75-76.
Burges, C. J. C. (1998). A Tutorial on Support Vector Machines. Data Mining and Knowledge Discovery, 2(2), 1-27.
Christakos, G., Bogaert, P., & Serre, M. (2002). In: Temporal GIS: Advanced Functions for Field-based Applications. Berlin: Springer.
Crist, E. P., & Cicone, R. C. (1984a). Comparisons of the dimensionality and features of simulated Landsat-4 MSS and TM data. Remote Sensing of Environment, 14(1-3), 235-246.
Crist, E. P., & Cicone, R. C. (1984b). A physically-based transformation of Thematic Mapper data - the TM tasseled cap. IEEE Transactions on Geoscience and Remote Sensing, 22(3), 256-263.
Cockcroft, S. (1997). A Taxonomy of Spatial Data Integrity Constraints. GeoInformatica, 1(4), 327-343.
Crosetto, M., & Tarantola, S. (2001). Uncertainty and sensitivity analysis: Tools for GIS-based model implementation. International Journal of Geographical Information Science, 15(5), 415–437.
Dorffner, G. (1996). Neural Networks for Time Series Processing. Neural Network World, 6(4), 447-468.
Elman, J. L. (1990). Finding Structure in Time. Cognitive Science, 14(2), 179-212.
Fischer, G., & Ostwald, J. (2001). Knowledge management: problems, promises, realities, and challenges. IEEE Intelligent Systems, 16(1), 60-72.
Franklin, C. (1992). An introduction to geographic information systems: Linking maps to databases. Database, 15(2), 13–21.
Garrett, J. J. (2005). Ajax: A New Approach to Web Applications. http://www.adaptivepath.com/publications/essays/archives/000385.php, visited 25.4.2006.
Giles, C. L., Lawrence, S., & Tsoi, A. C. (2001). Noisy Time Series Prediction using a Recurrent Neural Network and Grammatical Inference. Machine Learning, 44(1/2), 161-183.
Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. New York: Addison-Wesley.
Hall, D. L., & Llinas, J. (2001). Handbook on Multisensor Data Fusion. Boca Raton: CRC Press.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multi-layer Feedforward Networks are Universal Approximators. Neural Networks, 2(5), 359-366.
Hruschka, H., & Natter, M. (1999). Comparing Performance of Feedforward Neural Nets and K-means for Market Segmentation. European Journal of Operational Research, 114(2), 346-353.
Hunter, A., & Liu, W. (2005). Fusion rules for merging uncertain information. Information Fusion, 7, 97-134.
Hunter, A., & Summerton, R. (2004). Fusion rules for context-dependent aggregation of structured news reports. Journal of Applied Non-classical Logic, 14(3), 329-366.
Johnson, R. G. (2001). In: United States Imagery and Geospatial Information Service Geospatial Transition Plan. Bethesda, MD: National Imagery and Mapping Agency.
Jordan, M. I. (1986). Serial Order: A Parallel Distributed Processing Approach. Technical Report ICS 8604. San Diego: Institute for Cognitive Sciences, University of California.
Kim, Y. (2005). Information fusion via a hierarchical neural network model. The Journal of Computer Information Systems, 45(4), 1-14.
Koskivaara, E. (2004). Artificial neural networks for analytical review in auditing. Publications of the Turku School of Economics and Business Administration A-7.
Malhotra, Y (1998) Deciphering the knowledge
management hype Journal for Quality &
Perspective Risk Analysis, 25(6), 1599-1610.
Pereira, G. M. (2002). A typology of spatial and temporal scale relations. Geographical Analysis, 34(1), 21–33.
Quinlan, J. R. (1993). Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.
Rafanelli, M. (2003). Multidimensional Databases: Problems and Solutions. London: Idea Group Publishing.
Rivest, S., Bedard, Y., Proulx, M-L., Nadeau, M., Hubert, F., & Pastor, J. (2005). SOLAP technology: Merging business intelligence with geospatial technology for interactive spatio-temporal exploration and analysis of data. ISPRS Journal of Photogrammetry and Remote Sensing, 60(1), 17-33.
Roberts, S. M., Jones, J. P., & Frohling, O. (2005). NGOs and the globalization of managerialism: A research framework, 33(11), 1845-1864.
Saad, E., Prokhorov, D., & Wunsch, D. (1998). Comparative Study of Stock Trend Prediction Using Time Delay, Recurrent and Probabilistic Neural Networks. IEEE Transactions on Neural Networks, 9(6), 1456-1470.
Vanderhaegen, M., & Muro, E. (2005). Contribution of a European spatial data infrastructure to the effectiveness of EIA and SEA studies. Environmental Impact Assessment Review, 25(2), 123-142.
Wald, L. (2000). A Conceptual Approach To The Fusion Of Earth Observation Data. Surveys in Geophysics, 2(2-3), 177-186.
Wald, L. (1999). Some Terms of Reference in Data Fusion. IEEE Transactions on Geosciences and Remote Sensing, 37(3), 1190-1193.
World Wide Web Consortium (2000). Extensible Markup Language (XML) 1.0, 2nd ed., World Wide Web Consortium, available at: www.w3.org/TR/REC-xml (1st ed. published in 1998), Vol. W3C Recommendation.
Key Terms
AJAX Processes: A scripting technique for silently loading new data from the server. Although AJAX scripts commonly use the soon to be standardized XMLHttpRequest object, they could also use a hidden iframe or frame. An AJAX script is useless by itself. It also requires a DOM Scripting component to embed the received data in the document.
Artificial Intelligence (AI): Multidisciplinary field encompassing computer science, neuroscience, philosophy, psychology, robotics, and linguistics, and devoted to the reproduction of the methods or results of human reasoning and brain activity.
Artificial Neural Networks (ANN): Also called a simulated neural network (SNN) or just a neural network (NN), an ANN is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation.
Data Mining: The analysis of data to establish relationships and identify patterns.
Extensible Markup Language (XML): A W3C-recommended general-purpose markup language for creating special-purpose markup languages, capable of describing many different kinds of data. XML is a way of describing data.
GeoRSS: “RSS” is variously used to refer to the following: Really Simple Syndication (RSS 2.0), Rich Site Summary (RSS 0.91, RSS 1.0) and RDF Site Summary (RSS 0.9 and 1.0). It can be defined as a family of web feed formats. In the RSS context, geographical data is known as geoRSS.
Geospatial Data: Data consisting of geographical information, geostatistics and geotextual information.
Raw Data: Uninterpreted data from a storage medium. The maximum amount of raw data that can be copied from a storage medium equals the capacity of the medium.
Spatial Data Infrastructure (SDI): Often used to denote the relevant base collection of technologies, policies and institutional arrangements that facilitate the availability of and access to spatial data.
Web Map Service (WMS): Produces maps of spatially referenced data dynamically from geographic information. This international standard defines a “map” to be a portrayal of geographic information as a digital image file suitable for display on a computer screen. A map is not the data itself. WMS-produced maps are generally rendered in a pictorial format such as PNG, GIF or JPEG. This is in contrast to a Web Feature Service (WFS), which returns the actual data.
Universitat Jaume I, Spain
Miguel Ángel Manso
Technical University of Madrid, Spain
Miguel Ángel Bernabé
Technical University of Madrid, Spain
Abstract
Geographic Information Systems (GIS) are data-centric applications that rely on the input and constant maintenance of large quantities of basic and thematic spatial data in order to be useful tools for decision-making. This chapter presents the institutional collaboration framework and the major technology components to facilitate discovery and sharing of spatial data: Spatial Data Infrastructures (SDI). We review the essential software components (metadata editors and associated catalogue services, spatial data content repositories, client applications, and middleware or intermediate geospatial services) that define SDIs as heterogeneous distributed information systems. Finally, we highlight future research needs in the areas of semantic interoperability of SDI services and in improved institutional collaboration.
Introduction
Geographic Information Systems (GIS) and related spatial applications are data-centric in the sense that they rely on the input and constant maintenance of large quantities of reference spatial data, on top of which integrators and end-users produce value-added thematic geographic information for the purpose of decision-making. A typical GIS workflow can be simplified as consisting of three components: 1) data entry and reformatting, 2) data processing (geoprocessing), and 3) presentation of results to the user. In practice, this apparently simple workflow is constrained by two key factors. The first is limited interoperability among GIS components, because most are tightly coupled to specific data formats or to other software, complicating the task of integrating components from multiple vendors. The second is that the basic spatial data (reference data) necessary to begin geoprocessing are in many cases not readily available, because they are poorly documented, outdated, too expensive, or available only under restrictive licensing conditions. This second factor has seriously limited the ability of government employees, researchers and businesses to exploit geographic information, unnecessarily increasing project costs and thus negatively affecting the economy.
Many government administrations have recognized this critical problem and have initiated coordinated actions to facilitate the discovery and sharing of spatial data, creating the institutional basis for Spatial Data Infrastructures (SDI) (van Loenen and Kok 2004). The Global Spatial Data Infrastructure (GSDI) association (www.gsdi.org) defines SDI as a coordinated series of agreements on technology standards, institutional arrangements, and policies that enable the discovery and facilitate the availability of and access to spatial data. The SDI, once agreed upon and implemented, serves to connect GIS and other spatial data users to a myriad of spatial data sources, the majority of which are held by public sector agencies.
In 1990 the U.S. Federal Geographic Data Committee (FGDC) was created, and in 1994 then-President William Clinton asked it (Executive Order 12906) to establish a national SDI in conjunction with organizations from state, local, and tribal governments, the academic community, and the private sector. Three years later the European Umbrella Organization for Geographic Information (EUROGI) was created with the mission to develop a unified European approach to the use of geographic technologies (a mission far from complete). More recently, the European Commission launched the Infrastructure for Spatial Information in Europe (INSPIRE) initiative for the creation of a European Spatial Data Infrastructure (ESDI), based on a Framework Directive (European legislation) defining how European member states should go about facilitating discovery and access to integrated and interoperable spatial information services and their respective data sources. As the number of national SDIs increased, reaching about half the nations worldwide by 2004 (Masser 2005; Crompvoets et al. 2004), the Global Spatial Data Infrastructure (GSDI) Association was created to promote international cooperation and collaboration in support of local, national, and international SDI developments.
The basic creation and management principles of SDI apply to any and all spatial jurisdictions in a spatial hierarchy, from municipalities to regions, states, nations, and international areas. Béjar et al. (2004) show how each SDI at each level in the hierarchy can be created in accordance with its thematic (e.g., soils, transportation) and geographical (e.g., municipality, nation) coverage, following international standards-based processes and interfaces, to help ensure that the SDIs fit like puzzle pieces, both geographically and vertically (thematically). This harmonization exercise is necessary to allow for seamless spatial data discovery and exploitation across jurisdictional boundaries, for example in response to flooding or forest fires, to name just two important cross-border applications. In practice this harmonization has been difficult to achieve due to political as well as semantic differences between neighboring regions. An early (1980s) European exercise in cross-border harmonization, stitching together nationally produced pieces of the Coordinated Information on the European Environment (CORINE) land cover database, highlighted some of these discrepancies at regional and national borders: experts on both sides disagreed on how to classify the same cross-border land cover regions.
SDI Essential Components
Although SDIs are primarily institutional
col-laboration frameworks, they also define and guide
implementation of heterogeneous distributed
information systems, consisting of four main
software components linked via Internet These
components are: 1) metadata editors and
associ-ated catalogue services, 2) spatial data content
repositories, 3) client applications for user search
and access to spatial data, and 4) middleware or intermediate geoprocessing services which assist the user in finding and in transforming spatial data for use at the client side application Figure 1 summarizes these essential technol-ogy components, as generally accepted within the geographic information standards organizations Open Geospatial Consortium (OGC) (www.open-geospatial.org) and ISO Technical Committee
211 (www.isotc211.org), and synthesized by the FGDC and NASA This conceptual architecture may be interpreted as a traditional 3-tier client-middleware-server model, where GI applications seek spatial data content that are discovered and then possibly transformed or processed by intermediary services before presentation by the client application But the architecture also may
be interpreted using the web services ‘publish-find-bind’ triangle model (Gottschalk et al. 2002), whereby spatial data content (and service) offers are published to catalogue servers, which are later queried to discover (find) data or services, and then the client application binds to (consumes or executes) them.
Figure 1. High-level SDI architecture, taken from the FGDC-NASA Geospatial Interoperability Reference Model (GIRM) (FGDC 2003)
Regardless of the precise conceptual model
adopted, what is common among nearly all SDIs
is the primary goal of improving discovery of and
access to spatial data. Discovery is based on
the documentation of datasets to be shared, in
the form of metadata following international
standards such as ISO 19115/19139. Metadata
describing the content, geographic and temporal
coverage, authorship, access and usage rights,
and other attributes of a dataset are
created within GIS applications or externally using
specialized text editors. The metadata files are
stored in standard XML formats and are then
sent (published) to a data catalogue server,
in many cases one located at a central
node of the SDI, although in principle it may be
located anywhere on the network.
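The publish and discover steps described here can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not an ISO 19115 implementation: the MetadataRecord fields, the Catalogue class, the sample dataset, and the example.org address are all hypothetical names chosen for the example. A real catalogue would store ISO 19139 XML documents and expose a standardized query interface.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical, heavily simplified stand-in for an ISO 19115 record."""
    title: str
    keywords: tuple
    bbox: tuple  # (min_lon, min_lat, max_lon, max_lat) in degrees
    access_url: str


@dataclass
class Catalogue:
    """In-memory catalogue supporting publish and find (assumed design)."""
    records: list = field(default_factory=list)

    def publish(self, record):
        # A data provider registers a description of its dataset.
        self.records.append(record)

    def find(self, keyword, bbox):
        """Return records with a matching keyword whose extent intersects bbox."""
        return [r for r in self.records
                if keyword.lower() in (k.lower() for k in r.keywords)
                and _intersects(r.bbox, bbox)]


def _intersects(a, b):
    # Two axis-aligned boxes overlap unless one lies entirely beside the other.
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])


catalogue = Catalogue()
catalogue.publish(MetadataRecord(
    title="Example land cover layer",
    keywords=("land cover",),
    bbox=(-10.0, 35.0, 5.0, 44.0),
    access_url="http://example.org/wms",  # placeholder, not a real service
))

# A user searches by theme and area of interest, then follows access_url.
hits = catalogue.find("land cover", bbox=(-1.0, 39.0, 1.0, 41.0))
```

The record deliberately carries an access address rather than the data itself: the catalogue only answers the question of what exists and where, and the client then contacts the content repository or service directly.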
Users wishing to discover spatial data sources
normally access catalogue search interfaces via
web applications called geoportals (Bernard
et al. 2005), examples of which may be found
at http://www.geo-one-stop.gov/ and
http://eu-geoportal.jrc.it/. The geoportal is an interface
façade, both hiding the implementation details of
the underlying catalogue query mechanisms and
inviting participation in the SDI community. In
addition to discovery queries, the geoportal also
normally provides free access to quick looks or
small samples of the datasets that are discovered.
This spatial data visualization is frequently
implemented using Web Map Service
(WMS) software interfaces (OGC 2006), allowing
for integration of heterogeneous client and server
products from multiple vendors, whether proprietary
or free software solutions. WMS-based services
receive a request for a certain spatial data layer
and a certain geographical extent, convert the
data (initially in vector or raster format) into
a bitmap (in a standard MIME format such as JPEG,
GIF, or PNG), and then deliver the image to the web
client (browser or GIS/SDI client).
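As an illustration of such a request, the following Python sketch assembles a WMS 1.3.0 GetMap URL from the layer, extent, image size, and output format. The parameter names (SERVICE, VERSION, REQUEST, LAYERS, STYLES, CRS, BBOX, WIDTH, HEIGHT, FORMAT) come from the WMS specification; the helper name build_getmap_url, the server address, and the layer name "landcover" are assumptions made for the example.

```python
from urllib.parse import urlencode


def build_getmap_url(base_url, layer, bbox, width, height,
                     crs="CRS:84", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap request URL for a single layer.

    bbox is (min_x, min_y, max_x, max_y) in the axis order of `crs`;
    for CRS:84 that is longitude first, then latitude.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value asks for the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,    # image size in pixels
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical server and layer name, used only to show the call.
url = build_getmap_url("http://example.org/wms", "landcover",
                       (-10.0, 35.0, 5.0, 44.0), 600, 400)
```

Because the whole request is carried in the URL and the response is an ordinary image, any browser or GIS client can consume the service without vendor-specific code, which is what makes the interface useful for quick looks in a geoportal.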
More sophisticated spatial data (web) services
are becoming available, many of which also
follow de jure ISO standards and de facto
specifications from organizations such as OGC, the Organization for the Advancement of Structured Information Standards (OASIS), and W3C. These include services providing concrete functionality such as coordinate transformation, basic image processing and treatment, basic geostatistics, and composition or chaining of individual services to form more complex services.
Summarizing both the institutional and technological aspects, Article 1 of the INSPIRE European Directive proposal (EC 2004) lists the following necessary components for SDIs:
“The component elements of those infrastructures shall include metadata, spatial datasets and spatial data services; network services and technologies; agreements on sharing, access and use; and coordination and monitoring mechanisms, processes and procedures.”
Caution should be taken when describing as SDI narrower initiatives, projects, or products which provide only a subset of these requirements. This is especially the case for institutional agreements and cooperation for improved access
to spatial data: the above principles and components should be explicitly involved.
Future Research
SDI researchers are active on several fronts, but two main areas of interest are improving institutional collaboration and SDI effectiveness (including cost-benefit analyses and more elaborate data access policy), and SDI component implementation and testing. The second area includes topics such as semantic interoperability and composition of SDI (web) services, the integration of so-called disruptive technologies such as Google Earth and similar commercial services, grass-roots initiatives contributing user-generated data, integration with grid computing and with e-Government solutions, and exploitation of data from diverse sensor networks.
For further details on SDI, the reader should
consult the Spatial Data Infrastructure Cookbook
(GSDI 2004) and the European Commission’s
International Journal of SDI Research
(http://ijsdir.jrc.it).
References
Béjar, R., Gallardo, P., Gould, M., Muro, P., Nogueras, J., & Zarazaga, J. (2004). A high level architecture for national SDI: The Spanish case. EC-GI&GIS Workshop, Warsaw, June 2004. Retrieved April 4, 2006, from http://www.ec-gis.org/Workshops/10ec-gis/
Bernard, L., Kanellopoulos, I., Annoni, A., & Smits, P. (2005). The European geoportal - one step towards the establishment of a European Spatial Data Infrastructure. Computers, Environment and Urban Systems, 29(1), 15-31.
Crompvoets, J., Bregt, A., Rajabifard, A., & Williamson, I. (2004). Assessing the worldwide status of national spatial data clearinghouses. International Journal of Geographical Information Science, 18(7), 665-689.
EC, Commission of the European Communities (2004). Proposal for a Directive of the European Parliament and of the Council establishing an infrastructure for spatial information in the Community (INSPIRE), COM(2004) 516. Retrieved April 4, 2006, from http://inspire.jrc.it/proposal/EN.pdf
FGDC (2003). The Geospatial Interoperability Reference Model, version 1.1. Federal Geographic Data Committee, Geospatial Applications Interoperability (GAI) Working Group. Retrieved April 4, 2006, from http://gai.fgdc.gov/girm/v1.1/
Gottschalk, K., Graham, S., Krueger, S., & Snell, J. (2002). Introduction to Web services architecture. IBM Systems Journal, 41(2). Retrieved April 4, 2006, from http://researchweb.watson.ibm.com/journal/sj/412/gottschalk.html
GSDI (2004). Spatial Data Infrastructure Cookbook (English version 2.0). Retrieved April 4, 2006, from http://www.gsdi.org/gsdicookbookindex.asp
Masser, I. (2005). GIS Worlds: Creating Spatial Data Infrastructures. Redlands, California: ESRI Press.
OGC (2006). OpenGIS Web Map Service (WMS) implementation specification, version 1.3. Retrieved April 4, 2006, from http://portal.opengeospatial.org/files/?artifact_id=14416
van Loenen, B., & Kok, B.C. (Eds.) (2004). Spatial data infrastructure and policy development in Europe and the United States. Delft: Delft University Press.
Key Terms
FGDC: Federal Geographic Data Committee, an interagency committee established in the US in 1990, with the mandate to create and support data sharing, in the form of the US National SDI. http://www.fgdc.gov
GI: Geographic information, the subset of information pertaining to, or referenced to, known locations on or near the Earth’s surface.
GSDI: Global Spatial Data Infrastructure Association, an umbrella organization grouping national, regional and local organizations dedicated to the creation and maintenance of SDIs around the world. http://www.gsdi.org
OGC: Open Geospatial Consortium, a membership body of 300-plus organizations from the commercial, government and academic sectors, that creates consensus interface specifications in an effort to maximize interoperability among software dealing with geographic data. http://www.opengeospatial.org
SDI: Spatial Data Infrastructure.
WMS: Web Map Service, a software interface specification published by the Open Geospatial Consortium (OGC). The specification defines how software clients should formulate queries to compliant map servers, and how those servers should behave and respond.