GIS for Environmental Decision Making - Chapter 7



GIS and Predictive Modelling: A Comparison of Methods for Forest Management and Decision-Making

A. Felicísimo and A. Gómez-Muñoz

7.1 INTRODUCTION

GIS can be a useful tool for spatial or land-use planning, but only if several conditions are fulfilled. The key conditions are related to 1) the quality of basic spatial information, and 2) the statistical methods applied to the spatial nature of the data. Appropriate information and methods allow the generation of robust models that guarantee objective and methodologically sound decisions.

In this study we apply several multivariate statistical methods and test their usefulness to provide robust solutions in forestry planning using GIS. We must emphasize that in our Iberian study area, where forests have progressively decreased in extent over centuries, the main aims of forestry planning are the reduction of forest fragmentation, biodiversity conservation, and restoration of degraded biotopes.

The research develops a set of likelihood or suitability models for the presence of tree species that are widely distributed over a study area of 41,000 km². The utility of suitability models has been demonstrated in some previous studies [1], but they are still not as widely employed as might be expected.

A suitability model is a raster map in which each pixel is assigned a value reflecting suitability for a given use (e.g., presence of a tree species). Suitability models can be generated through diverse techniques, such as logistic regression or the non-parametric CART (classification and regression trees) and MARS (multivariate adaptive regression splines) [2-4]. All of these techniques require a vegetation map (dependent variable) and a set of environmental variables (climate, topography, geology, etc.) which potentially influence the vegetation distribution. The foundation of the method is to establish relationships between the environmental variables and the spatial distribution of the vegetation. Typically, each vegetation type will respond in a different way as a consequence of its contrasting environmental requirements.

Suitability is commonly expressed on a 0-1 scale (incompatible-ideal). The precise value depends on a set of physical and biological factors that favor or limit the growth of each type of vegetation. Once the distribution of suitability values across a region is known, decisions on land use and management can be made on the basis of objective criteria.


The set of suitability values for a region can be considered as the potential distribution model if presented as a map: the area defined as ‘suitable’ in a model should reflect the potential area for the vegetation type under consideration. Such a model also represents the relationships between presence/absence of each forest type and the values of the potentially influential environmental variables in a given region. Usually, current forest distributions are significantly smaller than the potential spatial extents because they have been systematically logged. Potential distribution models allow the recognition and delineation of such former distribution areas in order to direct current and future management plans, provide valuable data for restoration initiatives and highlight areas where such actions should be considered a priority.

7.2 OBJECTIVES

The main objectives of the study were to 1) use several different statistical methods to generate maps of potential distributions and suitability for each of three species of Quercus (oak) in the study area, and 2) identify the most appropriate method and assess its advantages and limitations. In order to fulfill these objectives, we developed a workflow that included sampling strategies, GIS implementation of statistical models and validation of results.

7.3 STUDY AREA

The study area was Extremadura, one of the 17 Autonomous Communities of Spain, covering 41,680 km², and located in the west of the Iberian Peninsula (Figure 7.1). It has a Mediterranean climate, somewhat softened by the relative proximity to the sea and the passage of frontal systems from the Atlantic.

The study subjects, which partially cover this area, were three species of the genus Quercus that grow in forests or ‘dehesas’. Dehesas are artificial ecotypes derived from original forest clearings (Figure 7.2). Continuous forest cover disappeared centuries ago and currently only scattered patches remain over a large potential area. In some places deforestation was complete and not even the most open dehesas remain. Trees from the genus Quercus are the dominant constituents of forests in the area, the most important species (and those considered in the analysis) being Quercus rotundifolia Lam. (holm oak, 12,680 km², synonym: Quercus ilex L. ssp. ballota (Desf.) Samp.), Quercus suber L. (cork oak, 2,130 km²) and Quercus pyrenaica Willd. (Pyrenean oak, 950 km²). With some exceptions, Pyrenean oak appears most commonly in forests, while cork and holm oaks preferentially occur in dehesas.


Figure 7.1 Location of Extremadura in the Iberian Peninsula

Figure 7.2 Dehesas are artificial ecotypes comparable to savannas: a Mediterranean (seasonal) grassland containing scattered trees of the genus Quercus


7.4 DATA

A set of raster maps was compiled to reflect the spatial distribution of dependent and independent (predictive) variables.

7.4.1 Quercus Distributions

Current Quercus species distribution maps were taken from the Forestry Map of Spain (scale 1:50,000), produced by the Spanish General Directorate for Nature Conservation during the period 1986-96. We used the digital version of the map to identify the main vegetation classes and the current spatial distributions (Figure 7.3).

Figure 7.3 Current distribution of Quercus species in the study area (black represents Pyrenean oak, Q. pyrenaica; dark gray, cork oak, Q. suber; and pale gray, holm oak, Q. rotundifolia)


7.4.2 Predictive Variables

Raster maps were generated to represent the following independent variables:

• Elevation. A digital elevation model (DEM) was constructed using Delaunay triangulation of spot height and contour data from the 1:50,000 scale topographic map of the Army Geographical Service, followed by transformation to a regular 100 m resolution grid.

• Slope angle was calculated from the DEM by applying Sobel's algorithm [5].

• Potential insolation. A measure was derived following the method proposed by Fernández Cepedal and Felicísimo [6]. This used the DEM to assess the extent of topographical shading given the position of the sun at different standard date periods [7]. The result was an estimate of the time that each point on the terrain surface was directly illuminated by solar radiation. The temporal resolution was 20 minutes and the spatial resolution 100 m.

• Temperature maps of the annual maxima and minima were interpolated from data for 140 meteorological monitoring points (National Institute of Meteorology, Spain) using the thin-plate spline method [8,9] with a spatial resolution of 500 m.

• Quarterly rainfall maps were interpolated from data for 276 meteorological monitoring points (National Institute of Meteorology, Spain) using the thin-plate spline method with a 500 m spatial resolution.

These variables were selected because of their potential influence on the distribution of the vegetation and the availability of sufficient data to generate GIS digital layers. Lack of data eliminated other variables (e.g., soils) commonly used in ecological modelling.
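As an illustration of the slope layer listed above, the following is a minimal sketch of a Sobel-type gradient estimate on a gridded DEM. It assumes the common 1-2-1 weighting of the eight neighbouring cells; the exact variant used in the study is not specified here, and the function name and the list-of-rows DEM layout are hypothetical.

```python
import math

def sobel_slope_degrees(dem, cell_size):
    """Slope angle in degrees for the interior cells of a DEM (list of rows).

    Border cells are left as None because they lack a full 3x3 neighbourhood.
    Assumes the 1-2-1 Sobel weighting; this is an illustrative choice.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Weighted east-minus-west difference, normalised by 8 * cell size
            dz_dx = ((dem[r - 1][c + 1] + 2 * dem[r][c + 1] + dem[r + 1][c + 1])
                     - (dem[r - 1][c - 1] + 2 * dem[r][c - 1] + dem[r + 1][c - 1])) / (8.0 * cell_size)
            # Weighted difference between the row below and the row above
            dz_dy = ((dem[r + 1][c - 1] + 2 * dem[r + 1][c] + dem[r + 1][c + 1])
                     - (dem[r - 1][c - 1] + 2 * dem[r - 1][c] + dem[r - 1][c + 1])) / (8.0 * cell_size)
            # Slope angle is the arctangent of the gradient magnitude
            slope[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return slope
```

On a plane that rises 100 m per 100 m cell in one direction, the gradient magnitude is 1 and the sketch returns a slope of 45 degrees for the interior cell.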

7.5 METHODS

7.5.1 Statistical Methods

The methods used in predictive modelling are usually of two main types: global parametric and local non-parametric. Global parametric models adopt an approach where each entered predictor has a universal relationship with the response variable. An advantage of global parametric models, such as linear and logistic regression, is that they are easy and quick to compute, and their integration with a GIS is straightforward. As an example of such a model we used logistic multiple regression (LMR). This is widely employed in predictive modelling [10], but has several important limitations. For instance, ecologists frequently assume a response function which is unimodal and symmetric, yet this is often not justified [11,12].

An alternative hypothesis when modelling organism or community distributions is to assume that the response is related to predictor variables in a non-linear and local manner. Local non-parametric models are appropriate for such an approach since they use a strategy of local variable selection and reduction, and are flexible enough to allow non-linear relationships. Two examples of this type of model are CART (classification and regression trees) and MARS (multivariate adaptive regression splines).

All three types of model used in this study were calculated from stratified random samples of pixels with an approximately even representation of points where each Quercus species was present or absent. Each random sample covered about 10-20% of the total area for each species. One sample was used to generate the models, and a second to test the reliability of the predictions.
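The sampling scheme described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the `(cell_id, present)` input layout, the way `fraction` is interpreted, and the fixed random seed are all hypothetical and not taken from the study.

```python
import random

def stratified_sample(cells, fraction, seed=0):
    """Draw two disjoint, balanced presence/absence samples of pixels.

    cells: list of (cell_id, present) pairs.
    fraction: approximate share of all cells in each sample (assumption).
    Returns (training_ids, test_ids).
    """
    rng = random.Random(seed)
    presences = [cid for cid, present in cells if present]
    absences = [cid for cid, present in cells if not present]
    # Same number of presences and absences per sample, capped so that
    # the two samples can stay disjoint within each stratum
    n = int(fraction * len(cells)) // 2
    n = min(n, len(presences) // 2, len(absences) // 2)
    drawn_presences = rng.sample(presences, 2 * n)
    drawn_absences = rng.sample(absences, 2 * n)
    training = drawn_presences[:n] + drawn_absences[:n]
    test = drawn_presences[n:] + drawn_absences[n:]
    return training, test
```

Keeping the two samples disjoint means the test sample gives an independent check on predictions made from the training sample.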

7.5.1.1 Logistic Multiple Regression

Logistic multiple regression (LMR) has been used to generate likelihood models for forecasting in a variety of fields. It requires a dichotomous (presence/absence) dependent variable, and the predicted probability of presence takes the form shown in Equation 7.1:

$$P(i) = \frac{1}{1 + \exp[-(b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n)]} \qquad (7.1)$$

where P(i) is the probability of presence (e.g., for a tree species), x_1, ..., x_n represent the values of the independent variables, and b_1, ..., b_n the coefficients. The predicted values from the regression are probabilities which range from 0 to 1 and can be interpreted as measures of potential suitability [13]. Several studies have combined LMR with GIS tools to present such probabilities in cartographic form. For instance, Guisan et al. [14] used LMR in the ArcInfo GIS to generate a distribution model for the plant Carex curvula in the Swiss Alps. A similar study on aquatic vegetation was conducted by Van de Rijt et al. [15] using the GRASS GIS. In this study LMR was performed using a forward conditional stepwise method in SPSS® 11.5 [16] and the results were then imported back into the ArcInfo® GIS [17] for mapping.
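To show how Equation 7.1 turns fitted coefficients into a per-pixel suitability value, here is a minimal sketch. The function and argument names are hypothetical, and the coefficients in the usage note are made-up placeholders, not values from the study.

```python
import math

def presence_probability(intercept, coefficients, values):
    """Evaluate Equation 7.1 for one pixel.

    intercept: b0; coefficients: b1..bn; values: x1..xn for the pixel.
    """
    linear = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear))
```

For example, `presence_probability(0.0, [0.0, 0.0], [650.0, 12.5])` returns 0.5, since a linear predictor of zero corresponds to even odds of presence. Applying the same function to every pixel of the predictor rasters would yield a suitability map on the 0-1 scale.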

7.5.1.2 Classification and Regression Trees

CART is a rule-based method that generates a binary tree through ‘binary recursive partitioning’, a process that splits a node based on yes/no answers about the values of the predictors [2]. Each split is based on a single variable, and while some variables can be used several times in a model, others may not be used at all. The rule generated at each step minimizes the variability within each of the two resulting subsets. Applying CART often results in a complex tree of subsets based on a node purity criterion, and this is subsequently ‘pruned back’, usually via cross-validation, to avoid over-fitting.
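A single step of binary recursive partitioning can be sketched as a search for the threshold on one predictor that leaves the two resulting subsets as pure as possible. The sketch below assumes the Gini index as the purity criterion, which is a common choice in CART but is not confirmed by the text; the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 (absence/presence) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_split(values, labels):
    """Find the 'value <= threshold' rule with the lowest weighted impurity.

    values: one predictor (e.g. elevation) per sample point.
    labels: 0/1 presence labels for the same points.
    Returns (threshold, weighted_impurity).
    """
    best = (None, float("inf"))
    # Candidate thresholds: every observed value except the largest,
    # so that both subsets are always non-empty
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best
```

A full tree would repeat this search over all predictors at every node and recurse on each subset until a stopping rule is met, after which pruning removes the least useful branches.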

The main drawback of CART models when used to predict organism distributions is that the generated models can be extremely complex and difficult to interpret. For example, work on Australian forests by Moore et al. [18] produced a tree with 510 nodes from just 10 predictors. In this study, the optimal tree generated from the Quercus rotundifolia data set had 4889 terminal nodes. Although the complexity of such a tree does not diminish its predictive power, it makes it almost impossible to interpret, which in many studies is a key requirement. Moreover, implementation of such an analysis within a GIS is difficult. Nevertheless, as part of this study we developed a method to translate the large CART reports (text files) to AML (Arc Macro Language) files that could be run with the ArcInfo GIS. Such files can be large (e.g., the text file containing the CART decision rules for constructing the Q. rotundifolia suitability map was 1.8 Mb in size) and execution times may be long (about 55 hours for the Q. rotundifolia model).

7.5.1.3 Multivariate Adaptive Regression Splines

MARS is a relatively novel technique that combines classical linear regression, mathematical construction of splines and binary recursive partitioning to produce a local model where relationships between response and predictors can be either linear or non-linear [3]. To do this, MARS approximates the underlying function through a set of adaptive piecewise linear regressions termed ‘basis functions’. For example, the first four basis functions from the Q. pyrenaica model are:

BF1 = MAX (0, PT4 - 3431)
BF2 = MAX (0, 3431 - PT4)
BF3 = MAX (0, MDE50 - 1181)
BF4 = MAX (0, 1181 - MDE50)

where PT4 is the mean rainfall for the period October-December (l/m² × 10) and MDE50 is elevation (m).

Changes in the slope of these basis functions occur at points called ‘knots’ (the values 3431 or 1181 in the above examples). Regression lines are allowed to bend at the knots, which mark the end of one region of data and the beginning of another with different functional behavior. Like the subdivisions in CART, knots are established in a forward/backward stepwise way. A model which clearly overfits is produced first and then those knots that contribute least to efficiency are discarded in a backwards-pruning step to avoid overfitting. The best model is selected via cross-validation, a process that applies a penalty to each term (i.e., a knot) added to the model in order to keep complexity as low as possible.
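The four basis functions quoted above for the Q. pyrenaica model are mirrored pairs of hinge functions around a knot, and can be evaluated as in the following minimal sketch. The function names are hypothetical, and the regression coefficients that weight each basis function in the full model are not given in the text, so only the basis functions themselves are computed.

```python
def hinge_pair(value, knot):
    """Return the mirrored pair MAX(0, value - knot), MAX(0, knot - value)."""
    return max(0.0, value - knot), max(0.0, knot - value)

def basis_functions(pt4, mde50):
    """Evaluate BF1-BF4 for one pixel.

    pt4: mean October-December rainfall in the units used in the text.
    mde50: elevation in metres.
    """
    bf1, bf2 = hinge_pair(pt4, 3431)    # knot on rainfall
    bf3, bf4 = hinge_pair(mde50, 1181)  # knot on elevation
    return {"BF1": bf1, "BF2": bf2, "BF3": bf3, "BF4": bf4}
```

At any given pixel, at most one member of each pair is non-zero, which is what allows the fitted regression line to take a different slope on either side of the knot.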


As in the CART analysis, we transformed the MARS text report files into AML and then generated the suitability models using the ArcInfo GIS.

7.5.2 Model Evaluation

The predictive capacity of a model can be evaluated as a function of the percentages of correct classifications, both for presences and absences (sensitivity and specificity parameters). The sensitivity and specificity of the model depend on the threshold or cut-off, which is set so as to classify each point according to its likelihood value.
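The dependence of sensitivity and specificity on the cut-off can be made concrete with a short sketch. It assumes that a point is classified as a presence when its likelihood value is greater than or equal to the cut-off; the function name and that tie-handling convention are assumptions.

```python
def sensitivity_specificity(scores, observed, cutoff):
    """Sensitivity and specificity of predictions at a given cut-off.

    scores: predicted likelihood values (0-1) for the sample points.
    observed: 1 for observed presence, 0 for observed absence.
    Requires at least one presence and one absence in the sample.
    """
    pairs = list(zip(scores, observed))
    true_pos = sum(1 for s, y in pairs if y and s >= cutoff)
    false_neg = sum(1 for s, y in pairs if y and s < cutoff)
    true_neg = sum(1 for s, y in pairs if not y and s < cutoff)
    false_pos = sum(1 for s, y in pairs if not y and s >= cutoff)
    sensitivity = true_pos / (true_pos + false_neg)   # correctly predicted presences
    specificity = true_neg / (true_neg + false_pos)   # correctly predicted absences
    return sensitivity, specificity
```

Raising the cut-off generally trades sensitivity for specificity, which is why a single threshold gives only a partial picture of model performance.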

To assess model performance we used the area under the Receiver Operating Characteristic (ROC) curve, a measure commonly termed AUC [19]. The ROC curve is a plot of the relationship between sensitivity and specificity across all cut-off points of the model. We developed a method to construct the ROC curves by importing the databases associated with sample points into the SPSS statistical package. The ROC curve is recommended for comparing two-class classifiers, as it does not merely summarize performance at a single arbitrarily selected decision threshold, but across all possible decision thresholds [20,21]. AUC is a synthesized overall measure of model accuracy where 1 indicates a perfect fit and a value of 0.5 indicates that the model is performing no better than chance. AUC is also equivalent to the normalized Mann-Whitney two-sample statistic, which makes it comparable to the Wilcoxon statistic.
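The equivalence between AUC and the normalized Mann-Whitney statistic suggests a direct way to compute it: count the share of presence/absence pairs in which the presence point receives the higher score. The sketch below illustrates that idea only; it is not the SPSS procedure used in the study, the function name is hypothetical, and the pairwise loop is intended for small samples.

```python
def auc(scores, observed):
    """AUC as the normalized Mann-Whitney statistic.

    scores: predicted likelihood values for the sample points.
    observed: 1 for observed presence, 0 for observed absence.
    Ties between a presence and an absence count as half a win.
    """
    presences = [s for s, y in zip(scores, observed) if y]
    absences = [s for s, y in zip(scores, observed) if not y]
    wins = 0.0
    for p in presences:
        for a in absences:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presences) * len(absences))
```

A model that ranks every presence above every absence scores 1, while one that assigns the same value everywhere scores 0.5, matching the interpretation given above.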

7.6 RESULTS

7.6.1 Suitability Models

All the LMR equations, MARS basis functions and CART classification rules were translated into ArcInfo GIS syntax. ArcInfo was subsequently used to generate the spatial suitability models, whose goodness-of-fit was evaluated by AUC values. Table 7.1 compares the overall results for different tree species and statistical methods, with bold text highlighting the best fitting models for each species. The AUC values indicate that the LMR models provided the poorest goodness-of-fit for each species, while the CART ones were the best performers. However, there were some differences between tree species, with a relatively narrow range of AUC values for Q. pyrenaica (i.e., all the methods produce a good fit) and a much greater one in the Q. rotundifolia case. This may be related to differences in the current extent of the species (see Section 7.3), with Q. rotundifolia being the most common and therefore having potentially more complex environmental relationships. It is also worth noting that greater complexity (number of terminal nodes) in the CART models does not guarantee better results. This is an interesting finding that could assist in the practicalities of implementing such models within a GIS framework.


Table 7.1 Summary statistics for the suitability models

Method   Terminal Nodes   AUC     Confidence Interval (95%)
MARS     Not Applicable   0.972   0.970-0.974
CART     56               0.970   0.968-0.972
CART     102              0.974   0.972-0.976
CART     1016             0.975   0.974-0.977
CART     2347             0.894   0.892-0.896

Sample Size: 18,880 positive cases; 18,590 negative cases; 41,979 negative cases; 50,690 negative cases

Another feature of the CART model output became apparent when the results were converted into suitability maps. As is illustrated in Figure 7.4a, the CART maps show abrupt transitions between areas of high and low suitability (darker and lighter shading respectively), which reflects the reliance on binary rules. In addition, due to the influence of climate variables, the suitability models frequently replicate the shapes of isopleths, which makes them visually less convincing. Although the backward pruning process in CART reduces the number of terminal nodes and makes the final model less complex, it does not eliminate such effects. These features are not present in the MARS-based maps (Figures 7.4b-7.4d), which show more smoothed and continuous distributions of suitability values. For this reason, we decided to use the MARS model output to generate a potential vegetation distribution.


Figure 7.4 Suitability models: a) CART model for Q. rotundifolia, b) MARS model for Q. pyrenaica, c) MARS model for Q. suber, d) MARS model for Q. rotundifolia. Darker shading indicates higher suitability

7.6.2 Potential Vegetation Model

Suitability models for the three tree species were combined to generate a potential vegetation distribution map that could be used to inform land management and decision-making. This map was generated through a decision rule that took into account both suitability values and proximity to the current presence of forests. We defined a function where, for each cell, the suitability value for each species was corrected by the inverse of the distance to the closest cell where the species currently grows. This correction can be considered as a coarse indicator of
