Dynamic and Mobile GIS: Investigating Changes in Space and Time. Edited by Jane Drummond, Roland Billen, Elsa João and David Forrest. © 2006 Taylor & Francis
Chapter 5
nen, A Process-oriented Data Model
Femke Reitsma¹ and Jochen Albrecht²
¹ Institute of Geography, School of Geosciences, The University of Edinburgh, Scotland
² Department of Geography, Hunter College, City University of New York, USA
5.1 Introduction
Thus far, GIScience has lacked an appropriate data model to represent processes such as erosion, migration and pollution dispersal. The need for extending geographic representations for processes has been recognised in GIScience literature (Peuquet, 2001; Raper, 2000; Worboys, 2001) and acknowledged as a key goal in the University Consortium of GIS’s (UCGIS) research agenda (McMaster and Usery, 2005). Yuan et al. (2005, p. 132) posit that ‘As the conceptual core of a geographic information system, geographic representations determine what information is available for communication, exploration and analysis. Hence, research in extensions to geographic representations is critical to advancing geographic information science’. In order to investigate change in space and time, the theme of this book, we need to be able to explicitly represent change as it occurs.
Existing theories and data models for simulating processes focus on representing the state of the represented system at a moment in time. The future pattern of global temperature from a global climate change model or the distribution of humans in an agent-based simulation of disease spread, for example, only provides information about the status of the attributes of the system at each step of the simulation, attributes such as temperature or agent health at a particular location. Information about the processes defined in the model is typically not expressed or represented in any form. In utilising a process-oriented data model, we gain the advantage of being able to query, analyse and visualise processes.
This chapter presents a new process-oriented data model called nen, which can be used to represent process information. The application of the nen data model to process modelling offers a set of modelling results that is complementary to those of traditional models. Its novelty is the provision of a new epistemological window on the modelled results, allowing for new process-oriented queries and analysis. The data model is applied to a small watershed modelling test case, which provides initial scope for simulating geographic processes with the new data model. In what follows, Section 5.2 describes current approaches to theorising and representing processes in GIScience, forming a framework for discussion of the new data model. Section 5.3 presents an alternative approach, describing the new data model, which is then applied with a prototype implementation of a watershed runoff model in Section 5.4. The results of the nen-based approach are then discussed in Section 5.5, followed by consideration of validation of models and results of this method in Section 5.6. Section 5.7 concludes the chapter.
5.2 Process theories and models
Current research into dynamic phenomena in GIScience has focused on the representation of object states at each moment of time and over time. This is built upon long-standing theories defining the entities that populate or compose space and time. By objects we mean those things that we typically identify and categorize as existing at an instant of time, such as trees, mountains, barrier islands and political boundaries. These are the things dominating metaphysics (Hartshorne, 1998; Rosenthal, 1999), as well as GIScience ontologies (for example, Casati et al., 1998; Fonseca and Egenhofer, 1999; Smith and Mark, 1998; Thomasson, 2001). Spatiotemporal research in GIScience has consequently focused on the dynamics of these entities, i.e. connecting the states of these entities over time (e.g. Tryfona and Pfoser, 2001), or exploring the relationships between objects and the processes that modify them (Bittner and Smith, 2003; Tomai and Kavouras, 2004).
As a consequence of the focus on static objects, data models for dynamic phenomena centre on state changes of objects. For example, a global climate change model, while containing process information in the model structure, does not represent or store this information for analysis; rather, the states of the climate system are stored at each instant of time. There is no data object that represents a geographic process that changes over space and time (Yuan et al., 2005). This results in a loss of information about the modelled process, which cannot accurately be regained by interpolating between time slices. For example, in global climate modelling virtually the same future state of increased temperature can be modelled as a result of two very different changes to the model, an increase in solar luminosity or an increase in CO2. It is not immediately obvious which process or processes, such as heat transport or a change in cloud optical depth, caused these results.
The static roots of GIS are found in its cartographic origins, which have formed the intellectual framework for much of GIScience research (Kuhn, 2001; Yuan et al., 2005). Kuhn (2001) notes a number of other reasons for such object orientation in geographic and other information systems, including:
an emphasis on attributes and relationships rather than process and change,
the weakness of logic-based formal languages in dealing with operations and their semantics,
and a presumed priority of objects in human (spatial) cognition.
5.3 An alternative process data model
An alternative data model for the representation of processes is presented in this chapter, which provides advantages in querying, analysis and exploration of process descriptions under computer simulation conditions, or in silico. The data model is referred to as a nen because its simplest and most abstract graph representation is a node-edge-node triple (Figure 5.1). This simple point process representation, which was used for the watershed prototype described in Section 5.4, can be extended to larger spatial entities, as might be represented by a polygon (Figure 5.2).
Figure 5.1 Process representation for point feature (a node at (x1, y1) joined by an edge to a node at (x2, y2))
A more comprehensive representation is in the form of a tuple: (x1, y1, x2, y2, t, st, {a1, a2, …}, {r1, r2, …}). The spatial location of the process is identified by x1, y1, x2, y2, which expresses the spatial extent of the process. The temporal location of the process is defined by t, where a process is represented on a single layer of spatial information rather than lost between time slices. The st represents the spatiotemporal granularity of the process, which may be a function of the amount of energy that initiates the process. For example, given some threshold-breaking push, the spatiotemporal granularity expresses how far and over what time period the process will operate in response to that push. The set {a1, a2, …} defines the attributes of the process. The set {r1, r2, …} defines the rules of the process that govern its dynamics and interaction with other processes. For example, a set of rules for modelling the process of sediment transport in the longshore may define the spatiotemporal extent of an instance of that process as 5 m/hour, depending on various relationships it holds between other processes operating in the nearshore.
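The tuple described above maps naturally onto a small record type. The following is a minimal sketch in Python rather than the Java of the actual prototype; the class name `Nen`, the field types, the representation of rules as callables, and the example values are all assumptions made for illustration and are not taken from the Flux implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Nen:
    """One process instance: a node-edge-node triple plus its metadata."""
    x1: float  # start node
    y1: float
    x2: float  # end node
    y2: float
    t: int     # temporal location (assumed here to be a simulation time step)
    st: float  # spatiotemporal granularity (units are an assumption)
    # {a1, a2, ...}: named attribute values of the process
    attributes: Dict[str, float] = field(default_factory=dict)
    # {r1, r2, ...}: rules, modelled here as callables returning True/False
    rules: List[Callable[..., bool]] = field(default_factory=list)


# Hypothetical instance: longshore sediment transport covering 5 m in one hour.
transport = Nen(x1=0.0, y1=0.0, x2=5.0, y2=0.0, t=0, st=5.0,
                attributes={"sediment_mass_kg": 12.0})
```

Keeping the attributes and rules as open-ended collections mirrors the sets {a1, a2, …} and {r1, r2, …} in the tuple, so that different process types can carry different contents without changing the record structure.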
Figure 5.2 Process representation for area feature
This data model provides a new epistemological window on geographic processes. Simulating processes with a process data model allows us to ask questions that are not directly answerable with current object-centred formulations, which focus on the states of a system that result from the operation of processes. Our new data model allows us to ask questions such as:
Where is a process operating at a particular instant of time?
How has the process changed over time?
What process(es) caused another process to occur?
The answers are not inferred (or interpolated) but are explicitly stored as part of running the process model. How the rules of the process affect the spatial dynamics of the process may therefore also be better explored.
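Because each process instance is stored explicitly, the first two questions above reduce to simple filters over the stored records. The sketch below is only illustrative: the record layout, the function names `where_at` and `count_over_time`, and the sample values are hypothetical and do not describe the query tool developed for Flux.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class NenRecord(NamedTuple):
    """One stored process instance (field names are illustrative)."""
    process: str  # process type, e.g. "groundwater_flow"
    t: int        # time step at which the instance operates
    x1: float
    y1: float
    x2: float
    y2: float


def where_at(records: List[NenRecord], process: str,
             t: int) -> List[Tuple[float, float, float, float]]:
    """Where is a process operating at a particular instant of time?"""
    return [(r.x1, r.y1, r.x2, r.y2)
            for r in records if r.process == process and r.t == t]


def count_over_time(records: List[NenRecord], process: str) -> Counter:
    """How has the process changed over time? Here: instances per time step."""
    return Counter(r.t for r in records if r.process == process)


# Hypothetical stored output of a short simulation run.
records = [
    NenRecord("groundwater_flow", 0, 2.0, 2.0, 1.0, 3.0),
    NenRecord("groundwater_flow", 1, 1.0, 3.0, 0.0, 4.0),
    NenRecord("infiltration", 1, 5.0, 5.0, 5.0, 5.0),
    NenRecord("groundwater_flow", 1, 3.0, 1.0, 2.0, 2.0),
]
```

Counting instances per step is only one possible reading of "change over time"; the same filter could equally aggregate an attribute such as the mass of water involved.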
5.4 Watershed modelling application
The theory of taking process as a representational primitive has been prototyped with a watershed model within a simulation environment called Flux.
5.4.1 Simulator
Flux is written in Java and inherits and extends a number of basic operating classes from the RePast (Recursive Porous Agent Simulation Toolkit) library, which is an open source agent-based modelling environment created by Social Science Research Computing at the University of Chicago¹. RePast is primarily used for its display and scheduling classes, and also has the advantage of containing Java classes for importing GIS raster data (ESRI ASCII raster files). Flux contains a set of interfaces and default classes that define the basic structure of the process model, including methods that must be implemented by an inheriting domain model. The objective was to maximise generic functionality within the Flux classes, thereby minimising the code to be developed within the domain model. The output of a simulated model is stored in text files, which can then be queried with a query tool that was developed as part of the initial steps towards process analysis. For a full description of the simulator, see Reitsma and Albrecht (forthcoming).
¹ http://repast.sourceforge.net/
Figure 5.3 presents a sample simulation using the Flux simulator. Each nen, represented by a node-edge-node tuple (as depicted in Figure 5.1), indicates an instance of groundwater flow. The raster backdrop is a digital elevation model of a small sub-watershed, where lighter shades represent higher elevation. At each time step, groundwater flows towards the north-western corner of the sub-watershed.
Figure 5.3 Sample simulation
5.4.2 Model and simulation
For the purposes of testing the methodology a simple watershed model was simulated. The model included the following restricted set of processes: Hortonian overland flow, groundwater flow, infiltration, percolation, saturation excess runoff and surface ponding. The data used to define the parameters for the simulation are taken from the Reynolds Creek Experimental Watershed (RCEW), which is a high-quality long-term dataset created by the U.S. Department of Agriculture Agricultural Research Service’s Northwest Watershed Research Center in Boise, Idaho, United States. For a full description of the RCEW, see the special issue of Water Resources Research introduced by Marks (2001).
At each hourly time step the precipitation input is updated, which initiates one of three processes: Hortonian overland flow, infiltration or surface ponding. Each process type has a set of rules defining its behaviour. For example, if the precipitation exceeds the infiltration capacity of the soil and depending on the slope characteristics, an instance of Hortonian overland flow will be generated. Although hydrologically limited, the example explores the advantages of the methodological approach of considering process as a data modelling primitive.
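The rule-driven initiation step described above can be sketched as a small decision function. Only the Hortonian overland flow condition is stated in the text; the conditions chosen here for infiltration and surface ponding, the slope threshold `min_flow_slope_deg`, the units and the function name are assumptions introduced for illustration and should not be read as the rules used in the prototype.

```python
from typing import Optional


def initiate_process(precip_mm_h: float,
                     infiltration_capacity_mm_h: float,
                     slope_deg: float,
                     min_flow_slope_deg: float = 1.0) -> Optional[str]:
    """Pick the process type initiated by one hourly precipitation update.

    Assumed rule set (only the overland flow case follows the chapter):
    - no precipitation: nothing is initiated;
    - precipitation within the infiltration capacity: infiltration;
    - excess precipitation on a sufficiently steep cell: Hortonian overland flow;
    - excess precipitation on a near-flat cell: surface ponding.
    """
    if precip_mm_h <= 0.0:
        return None
    if precip_mm_h <= infiltration_capacity_mm_h:
        return "infiltration"
    if slope_deg >= min_flow_slope_deg:
        return "hortonian_overland_flow"
    return "surface_ponding"


# Hypothetical cell: 12 mm/h of rain on soil that absorbs 8 mm/h, 6 degree slope.
initiated = initiate_process(12.0, 8.0, 6.0)
```

In a nen-based simulation the returned type would determine which kind of process instance is created and stored for that cell and time step.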
Two time slices of the simulation are presented in Figure 5.4. The black nens represent the process of Hortonian overland flow, the dark grey nens represent infiltration, the grey nens represent percolation, and the light grey nens represent groundwater flow. Percolation and infiltration processes are represented by two nodes on top of each other because the third dimension is not represented. With the nen data model, insight can be gained as to where and when certain processes dominate, which may lead to a better understanding of the modelled system and give guidance to better ways of interacting with that system. For example, in Figure 5.4 it is evident that the process of Hortonian overland flow dominates in certain upland parts of Upper Sheep Creek. This is in contrast to typical approaches to modelling that generate results expressing where some energy or mass is at an instant of time within the system, such as water in our watershed, with no information of the processes that caused that state.
Figure 5.4 Simulation at two time steps, in progressive order from left to right
5.5 Analysis of results
Without an appropriate data model to represent processes, we cannot easily analyse or visualise the dynamics and interactions of processes for the purpose of understanding the modelled system. Because the nen data model represents a process as a spatially extended entity at any moment in time, not only can its state be analysed but also its dynamics. In addition, due to the structure of the data model, namely two nodes connected by an edge, network analysis may also result in new insights into the model results. This may be of particular interest in recording the interaction of processes and provide new patterns of process relationships to be explored and classified, as will be discussed below.
5.5.1 Process state and change
As will be discussed further below, the state information of a process includes all of the components of the data structure, namely:
the spatial location (x1, y1, x2, y2),
the temporal location (t),
the spatiotemporal granularity (st),
the attributes ({a1, a2, …}),
and the rules ({r1, r2, …}).
Furthermore, from the data structure the direction and velocity of the process may be derived. Each of these aspects of the state of the process can be temporally extended such that processes can be queried for change. For example, the change in direction of groundwater flow or change in the mass of water involved in this process can be queried.
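As an illustration of how direction and velocity might be derived from the two nodes of a nen, the following sketch computes a bearing and a speed. The chapter does not specify the formulas; the angle convention (degrees counter-clockwise from the positive x-axis) and the use of an explicit `duration` argument, which one might take from the temporal part of the granularity st, are assumptions.

```python
import math


def direction_deg(x1: float, y1: float, x2: float, y2: float) -> float:
    """Direction of the edge in degrees counter-clockwise from the +x axis, in [0, 360)."""
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 360.0


def speed(x1: float, y1: float, x2: float, y2: float, duration: float) -> float:
    """Edge length divided by the time over which the process operates."""
    if duration <= 0.0:
        raise ValueError("duration must be positive")
    return math.hypot(x2 - x1, y2 - y1) / duration


# Hypothetical nen from (0, 0) to (3, 4) in map units, operating over 2 hours.
heading = direction_deg(0.0, 0.0, 3.0, 4.0)
rate = speed(0.0, 0.0, 3.0, 4.0, duration=2.0)  # map units per hour
```

Querying for change then amounts to comparing these derived values between instances of the same process at successive time steps.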
The location of individual or interacting processes can be analysed spatially, spatiotemporally or temporally. Discovering spatial, spatiotemporal or temporal clusters of processes may provide new insights into thresholds and critical combinations of processes. Spatial clusters of processes may indicate the dominance of processes in certain locations over time, such as erosion on a certain part of a hill slope. Spatiotemporal clusters of processes are the spatial clustering of processes at certain times, where we may use different notions of time, such as linear or cyclic; for example, analysing the results of our model may result in findings of new large-scale recurrent weather patterns such as El Niño. Modelled processes might be widely distributed with no evident spatial pattern, yet we might find temporal clusters that indicate that these processes are temporally correlated in some way; for example, ocean thermohaline circulation has a significant effect on global climate change (Knutti et al., 2004). In these three cases, we may find interesting new patterns among process instances of the same type or among different processes.
The attributes of the modelled processes can be analysed for variations in magnitude, or specific values of interest. Certain magnitudes may dominate in certain types of processes or be correlated with other processes. The dynamics of the magnitude of groundwater in the process of groundwater flow, for example, may be of interest in understanding the impact of soil structure on groundwater flow.
The rules of the process may also be of interest for analysis. Although typically the rules or mathematical functions defining the behaviour of the process are static, they may also be evolutionary. Genetic algorithms, for example, allow us to evolve rules. We may find that certain types of rules dominate, or particular patterns of rules or cycles of rules may develop.
Because the data model is spatiotemporally extended, the difference between one location and the other can be used to provide information on direction and velocity of processes. Determining the average direction and average velocity of a certain type of process may be of particular import to analysing and understanding climate processes. The direction and velocity of climate processes, for example, may be correlated with certain types of erosion or vegetation growth processes at a certain location. They also assist in identifying when model rules need to change as small-extent nens move into a new geographic regime; an example would be the effect of tropical hurricanes on previously unaffected deciduous forests as a result of large-extent global warming.
Each of these dimensions, location (spatial, temporal and spatiotemporal), direction, velocity, attributes and rules can be combined, as is reflected in Figure 5.5. Some of these variables may be held constant, others may vary. The example provided in the figure illustrates a case where analysis is undertaken on the relationship between direction and attributes of a process. A yet unresolved challenge is how we visualise all of these dimensions of analysis, either individually or combined.
Figure 5.5 Matrix of dimensions of process analysis (labels recovered from the figure: Location, Velocity, Attributes, Rules)
5.5.2 Process interaction and causality
In order to analyse the interaction of processes, the data model has the further advantage of supporting network analysis. Network analysis describes the structure of a network based on the number of nodes, links and the attributes associated with the nodes and/or links. It includes a large range of measures that are applied in fields as disparate as sociology (e.g. Wasserman and Faust, 1994) and physics (e.g. Dorogovtsev and Mendes, 2002). The network described by nens may be of a single type of process, such as Hortonian overland flow, or of a collection of different processes, such as those operating within a watershed. Analysing the network of nens allows us to explore the relationships among processes. The application of network analysis to networks of interacting processes may provide new measures of process patterns, and perhaps, as with recent discoveries of patterns in animate and inanimate networks (Barabasi, 2002), new insights into the systems that we model.
Tracing the complex interactions among processes of different types in our model also allows us to monitor causality. In Figure 5.6, for example, five interacting processes are schematically displayed, with the x-axis defining the temporal extent and the y-axis a set of discrete rules. The interaction of processes is indicated by spatial coincidence of some part of the nen data model representing the process. In this figure: nen 1 interacts with nen 2 according to rules 4 and 5; the process represented by nen 1 is followed by nen 3, which is followed by nen 4, as is evident from the (x2, y2) of nen 1 being equivalent to the (x1, y1) of nen 3, and the (x2, y2) of nen 3 being the same as the (x1, y1) of nen 4; nen 2, nen 3, and nen 4 interact with the long-term process nen 5; nen 3 starts as a point process and ends as an area process.
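The "followed by" relation described for Figure 5.6, where the end node of one nen coincides with the start node of another, can be sketched as a lookup on node coordinates. The coordinates below are invented for illustration, the function names are hypothetical, and the sketch assumes that coincident nodes have exactly equal coordinates (for example raster cell centres) and ignores time ordering.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def successors(nens: Dict[int, Edge]) -> Dict[int, List[int]]:
    """Map each nen id to the ids of nens that start where it ends."""
    starts = defaultdict(list)
    for nen_id, (x1, y1, _, _) in nens.items():
        starts[(x1, y1)].append(nen_id)
    return {
        nen_id: [s for s in starts.get((x2, y2), []) if s != nen_id]
        for nen_id, (_, _, x2, y2) in nens.items()
    }


def chain_from(nens: Dict[int, Edge], first: int) -> List[int]:
    """Follow the first successor repeatedly, stopping at a dead end or a cycle."""
    follow = successors(nens)
    chain, seen = [first], {first}
    while follow[chain[-1]] and follow[chain[-1]][0] not in seen:
        chain.append(follow[chain[-1]][0])
        seen.add(chain[-1])
    return chain


# Hypothetical coordinates reproducing the nen 1 -> nen 3 -> nen 4 sequence.
example = {
    1: (0.0, 0.0, 1.0, 1.0),
    3: (1.0, 1.0, 2.0, 1.0),
    4: (2.0, 1.0, 3.0, 0.0),
}
links = successors(example)
sequence = chain_from(example, 1)
```

The resulting mapping is an adjacency list, so standard network measures such as node degree or path length could be computed on it directly.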
Figure 5.6 Five interacting processes
5.6 Validation of model and results
As with analysis, without an appropriate data model we cannot easily validate the spatial behaviour of our modelled processes. For example, while a lumped hydrological model may produce a hydrograph that concurs with the measured discharge of the watershed, all of that modelled discharge may have resulted from Hortonian overland flow, whereas in reality it may have been a mixture of processes such as groundwater flow and saturated excess flow. Without a data model to represent these processes, we cannot easily tell which processes caused the final modelled state. This problem is well recognised by watershed modellers as that of equifinality, which describes the situation where the same system state can result from many different sets of processes (Bevan, 2000).
The nen data model allows us to visualise and analyse the dynamics of the processes in the model, facilitating the validation of the definitions, in rules or mathematical formulas, of the processes in the model. Furthermore, the nen provides the basis for testing and comparing different definitions of processes. By visualising and measuring how descriptions of processes within the model compare to other definitions and known spatial dynamics of processes, modellers can test whether their mathematical or rule-based formalisms act in expected and realistic ways.
A process data model also enhances the ability to compare models, lending itself to model inter-comparison studies. The nen allows us to compare distribution, quantity and dynamics of processes among models. This contrasts with traditional approaches to model inter-comparison, which analyse the state of the modelled system at the end of the simulation or over specified time steps (for example, Dutay et al., 2002).
In validating the results of a nen-based model, however, difficulty lies in the lack of qualitative or quantitative descriptions of geographic processes. The results of a model are validated by matching the output of the model with the real world, a good result being the ability to mirror that world in silico. Typically a model is validated by comparing the final simulated system state with the real system at the same point in time. In order to validate the results of a simulation using the nen data model we need long-term empirical observations of the simulated processes. As with the data and literature on the RCEW used in the watershed application and described in Section 5.4, such process data is rarely if ever available. Without process observations, any simulation using the nen data model cannot be effectively validated.
5.7 Conclusion and future developments
The lack of appropriate data against which to validate process definitions and results of a nen-based model leads to questions of how we might go about observing and measuring processes in the field. Qualitative descriptions of processes, while available in certain cases, will always need to be quantified in some manner in order to provide a basis for comparison and formal analysis. Quantitative measurement devices also facilitate automation of analysis and validation. We do not know of any measurement approach that quantitatively records process information, which suggests there is a need for new data collection techniques that collect such information for comparison against model results. Furthermore, data theory needs to be developed, that is, new approaches to transforming real-world observations into something that can be analysed (Jacoby, 1991).
Currently the Flux simulation environment is constrained to small models due to problems of computational complexity. To use this approach for models of larger spatial scale and of greater detail would require a significant rewrite of the software and consideration of advanced methods for accessing larger-scale computing resources. Alternatively, it should not be difficult to modify existing modelling software environments to implement the nen data model. However, given the