GIS and Predictive Modelling: A Comparison of Methods for Forest Management and Decision-Making
A. Felicísimo and A. Gómez-Muñoz
7.1 INTRODUCTION
GIS can be a useful tool for spatial or land-use planning, but only if several conditions are fulfilled. The key conditions relate to 1) the quality of the basic spatial information, and 2) the suitability of the statistical methods for the spatial nature of the data. Appropriate information and methods allow the generation of robust models that support objective and methodologically sound decisions.
In this study we apply several multivariate statistical methods and test their usefulness in providing robust solutions for forestry planning using GIS. We must emphasize that in our Iberian study area, where forests have progressively decreased in extent over centuries, the main aims of forestry planning are the reduction of forest fragmentation, biodiversity conservation, and restoration of degraded biotopes.
The research develops a set of likelihood or suitability models for the presence of tree species that are widely distributed over a study area of 41,000 km². The utility of suitability models has been demonstrated in some previous studies [1], but they are still not as widely employed as might be expected.
A suitability model is a raster map in which each pixel is assigned a value reflecting suitability for a given use (e.g., presence of a tree species). Suitability models can be generated through diverse techniques, such as logistic regression or the non-parametric CART (classification and regression trees) and MARS (multivariate adaptive regression splines) [2-4]. All of these techniques require a vegetation map (dependent variable) and a set of environmental variables (climate, topography, geology, etc.) which potentially influence the vegetation distribution. The foundation of the method is to establish relationships between the environmental variables and the spatial distribution of the vegetation. Typically, each vegetation type will respond in a different way as a consequence of its contrasting environmental requirements.
Suitability is commonly expressed on a scale from 0 (incompatible) to 1 (ideal). The precise value depends on a set of physical and biological factors that favor or limit the growth of each type of vegetation. Once the distribution of suitability values across a region is known, decisions on land use and management can be made on the basis of objective criteria.
© 2008 by Taylor & Francis Group, LLC
The set of suitability values for a region can be considered as the potential distribution model if presented as a map: the area defined as ‘suitable’ in a model should reflect the potential area for the vegetation type under consideration. Such a model also represents the relationships between presence/absence of each forest type and the values of the potentially influential environmental variables in a given region. Usually, current forest distributions are significantly smaller than their potential spatial extents because the forests have been systematically logged. Potential distribution models allow the recognition and delineation of such former distribution areas in order to direct current and future management plans, provide valuable data for restoration initiatives and highlight areas where such actions should be considered a priority.
7.2 OBJECTIVES
The main objectives of the study were to 1) use several different statistical methods to generate maps of potential distributions and suitability for each of three species of Quercus (oak) in the study area, and 2) identify the most appropriate method and assess its advantages and limitations. In order to fulfill these objectives, we developed a workflow that included sampling strategies, GIS implementation of statistical models and validation of results.
7.3 STUDY AREA
The study area was Extremadura, one of the 17 Autonomous Communities of Spain, covering 41,680 km² and located in the west of the Iberian Peninsula (Figure 7.1). It has a Mediterranean climate, moderated somewhat by the relative proximity of the sea and the passage of frontal systems from the Atlantic.
The study subjects, which partially cover this area, were three species of the genus Quercus that grow in forests or ‘dehesas’. Dehesas are artificial ecotypes derived from the clearing of the original forest (Figure 7.2). Continuous forest cover disappeared centuries ago and currently only scattered patches remain over a large potential area. In some places deforestation was complete and not even the most open dehesas remain. Trees from the genus Quercus are the dominant constituents of forests in the area, the most important species (and those considered in the analysis) being Quercus rotundifolia Lam. (holm oak, 12,680 km², synonym: Quercus ilex L. ssp. ballota (Desf.) Samp.), Quercus suber L. (cork oak, 2,130 km²) and Quercus pyrenaica Willd. (Pyrenean oak, 950 km²). With some exceptions, Pyrenean oak appears most commonly in forests, while cork and holm oaks preferentially occur in dehesas.
Figure 7.1 Location of Extremadura in the Iberian Peninsula.
Figure 7.2 Dehesas are artificial ecotypes comparable to savannas: a Mediterranean (seasonal) grassland containing scattered trees of the genus Quercus.
7.4 DATA
A set of raster maps was compiled to reflect the spatial distribution of the dependent and independent (predictive) variables.
7.4.1 Quercus Distributions
Current Quercus species distribution maps were taken from the Forestry Map of Spain (scale 1:50,000), produced by the Spanish General Directorate for Nature Conservation during the period 1986-96. We used the digital version of the map to identify the main vegetation classes and the current spatial distributions (Figure 7.3).
Figure 7.3 Current distribution of Quercus species in the study area (black represents Pyrenean oak, Q. pyrenaica; dark gray, cork oak, Q. suber; and pale gray, holm oak, Q. rotundifolia).
7.4.2 Predictive Variables
Raster maps were generated to represent the following independent variables:
• Elevation. A digital elevation model (DEM) was constructed using Delaunay triangulation of spot height and contour data from the 1:50,000 scale topographic map of the Army Geographical Service, followed by transformation to a regular 100 m resolution grid.
• Slope angle was calculated from the DEM by applying Sobel's algorithm [5].
• Potential insolation. A measure was derived following the method proposed by Fernández Cepedal and Felicísimo [6]. This used the DEM to assess the extent of topographical shading given the position of the sun at different standard date periods [7]. The result was an estimate of the time that each point on the terrain surface was directly illuminated by solar radiation. The temporal resolution was 20 minutes and the spatial resolution 100 m.
• Temperature maps of the annual maxima and minima were interpolated from data for 140 meteorological monitoring points (National Institute of Meteorology, Spain) using the thin-plate spline method [8,9] with a spatial resolution of 500 m.
• Quarterly rainfall maps were interpolated from data for 276 meteorological monitoring points (National Institute of Meteorology, Spain) using the thin-plate spline method with a 500 m spatial resolution.
These variables were selected because of their potential influence on the distribution of the vegetation and the availability of sufficient data to generate GIS digital layers. Lack of data eliminated other variables (e.g., soils) commonly used in ecological modelling.
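The slope calculation in the list above can be illustrated with a short sketch. The exact formulation used for the study is not given in the text, so this assumes the common 3x3 Sobel-type kernel with 1-2-1 weights and the 100 m cell size stated for the DEM; the function name `slope_degrees` and the list-of-rows raster layout are illustrative only.

```python
import math

def slope_degrees(dem, cell_size=100.0):
    """Slope angle in degrees for each interior cell of a DEM (list of rows).

    Assumes 3x3 Sobel-type weights (1-2-1); border cells are returned as None.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # East-west gradient: right column minus left column
            dzdx = ((dem[r - 1][c + 1] + 2 * dem[r][c + 1] + dem[r + 1][c + 1])
                    - (dem[r - 1][c - 1] + 2 * dem[r][c - 1] + dem[r + 1][c - 1])
                    ) / (8 * cell_size)
            # North-south gradient: lower row minus upper row
            dzdy = ((dem[r + 1][c - 1] + 2 * dem[r + 1][c] + dem[r + 1][c + 1])
                    - (dem[r - 1][c - 1] + 2 * dem[r - 1][c] + dem[r - 1][c + 1])
                    ) / (8 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out
```

On a surface that rises 100 m per 100 m cell in one direction, the interior cells come out at 45 degrees, which is a quick sanity check on the weights and the divisor.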
7.5 METHODS
7.5.1 Statistical Methods
The methods used in predictive modelling are usually of two main types: global parametric and local non-parametric. Global parametric models adopt an approach where each entered predictor has a universal relationship with the response variable. An advantage of global parametric models, such as linear and logistic regression, is that they are easy and quick to compute, and their integration with a GIS is straightforward. As an example of such a model we used logistic multiple regression (LMR). This is widely employed in predictive modelling [10], but has several important limitations. For instance, ecologists frequently assume a response function which is unimodal and symmetric, yet this is often not justified [11,12].
An alternative hypothesis when modelling organism or community distributions is to assume that the response is related to predictor variables in a non-linear and local manner. Local non-parametric models are appropriate for such an approach since they use a strategy of local variable selection and reduction, and are flexible enough to allow non-linear relationships. Two examples of this type of model are CART (classification and regression trees) and MARS (multivariate adaptive regression splines).
All three types of model used in this study were calculated from stratified random samples of pixels with an approximately even representation of points where each Quercus species was present or absent. Each random sample covered about 10-20% of the total area for each species. One sample was used to generate the models, and a second to test the reliability of the predictions.
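The sampling scheme described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the two samples are disjoint, that each contains equal numbers of presence and absence pixels, and that `fraction` is the share of the presence pixels drawn per sample. The function name and the dictionary input are hypothetical.

```python
import random

def stratified_sample(presence, fraction=0.15, seed=0):
    """Draw two disjoint, class-balanced samples of pixels.

    presence: dict mapping pixel id -> True (species present) or False (absent).
    Returns (training, testing), each with n presences and n absences.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    present = sorted(p for p, v in presence.items() if v)
    absent = sorted(p for p, v in presence.items() if not v)
    n = int(len(present) * fraction)      # pixels per class in each sample
    drawn_present = rng.sample(present, 2 * n)
    drawn_absent = rng.sample(absent, 2 * n)
    training = drawn_present[:n] + drawn_absent[:n]   # used to fit the models
    testing = drawn_present[n:] + drawn_absent[n:]    # held back for evaluation
    return training, testing
```

Keeping the second sample out of model fitting is what allows it to serve as an independent check on the predictions.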
7.5.1.1 Logistic Multiple Regression
Logistic multiple regression (LMR) has been used to generate likelihood models for forecasting in a variety of fields. It requires a dichotomous (presence/absence) dependent variable, and the predicted probability of presence takes the form shown in Equation 7.1:
$$P(i) = \frac{1}{1 + \exp[-(b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n)]} \qquad (7.1)$$
where P(i) is the probability of presence (e.g., for a tree species), $x_1, \dots, x_n$ represent the values of the independent variables, and $b_1, \dots, b_n$ the coefficients (with $b_0$ the intercept). The predicted values from the regression are probabilities which range from 0 to 1 and can be interpreted as measures of potential suitability [13]. Several studies have combined LMR with GIS tools to present such probabilities in cartographic form. For instance, Guisan et al. [14] used LMR in the ArcInfo GIS to generate a distribution model for the plant Carex curvula in the Swiss Alps. A similar study on aquatic vegetation was conducted by Van de Rijt et al. [15] using the GRASS GIS. In this study LMR was performed using a forward conditional stepwise method in SPSS® 11.5 [16] and the results were then imported back into the ArcInfo® GIS [17] for mapping.
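To make the mapping step concrete, the sketch below applies Equation 7.1 cell by cell to a stack of predictor rasters. It is an illustration only: the study used SPSS and ArcInfo, and the function names, the list-of-rows raster layout and any coefficient values passed in are assumptions.

```python
import math

def lmr_probability(coefficients, intercept, values):
    """Equation 7.1: probability of presence for one pixel."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-linear))

def suitability_raster(coefficients, intercept, layers):
    """Apply Equation 7.1 to every cell of a stack of co-registered rasters.

    layers: list of rasters (each a list of rows), one per predictor,
    in the same order as the coefficients.
    """
    rows, cols = len(layers[0]), len(layers[0][0])
    return [[lmr_probability(coefficients, intercept,
                             [layer[r][c] for layer in layers])
             for c in range(cols)]
            for r in range(rows)]
```

A linear predictor of zero gives a probability of exactly 0.5, and the output always stays within the 0 to 1 range described above.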
7.5.1.2 Classification and Regression Trees
CART is a rule-based method that generates a binary tree through ‘binary recursive partitioning’, a process that splits a node based on yes/no answers about the values of the predictors [2]. Each split is based on a single variable, and while some variables can be used several times in a model, others may not be used at all. The rule generated at each step minimizes the variability within each of the two resulting subsets. Applying CART often results in a complex tree of subsets based on a node purity criterion, and this is usually ‘pruned back’ via cross-validation to avoid over-fitting.
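A single partitioning step of the kind described above can be sketched as follows. This is not the CART software used in the study: it assumes Gini impurity as the node purity criterion and a presence/absence response coded 1/0, and it omits both the recursion over child nodes and the pruning step. The function names are illustrative.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure node)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Find the yes/no question 'is variable j <= t?' whose two child
    nodes have the lowest weighted impurity.

    rows: list of predictor tuples; labels: list of 0/1 presences.
    Returns (impurity, variable index, threshold).
    """
    best = None
    for j in range(len(rows[0])):
        # Every observed value except the largest is a candidate threshold
        for t in sorted({row[j] for row in rows})[:-1]:
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best
```

Repeating this search inside each child node is what produces the tree. Because every split tests one variable against one threshold, the resulting map is built from sharply bounded regions.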
The main drawback of CART models when used to predict organism distributions is that the generated models can be extremely complex and difficult to interpret. For example, work on Australian forests by Moore et al. [18] produced a tree with 510 nodes from just 10 predictors. In this study, the optimal tree generated from the Quercus rotundifolia data set had 4,889 terminal nodes. Although the complexity of such a tree does not diminish its predictive power, it makes the tree almost impossible to interpret, and interpretability is a key requirement in many studies. Moreover, implementation of such an analysis within a GIS is difficult. Nevertheless, as part of this study we developed a method to translate the large CART reports (text files) into AML (Arc Macro Language) files that could be run with the ArcInfo GIS. Such files can be large (e.g., the text file containing the CART decision rules for constructing the Q. rotundifolia suitability map was 1.8 MB in size) and execution times may be long (about 55 hours for the Q. rotundifolia model).
7.5.1.3 Multivariate Adaptive Regression Splines
MARS is a relatively novel technique that combines classical linear regression, mathematical construction of splines and binary recursive partitioning to produce a local model where relationships between response and predictors can be either linear or non-linear [3]. To do this, MARS approximates the underlying function through a set of adaptive piecewise linear regressions termed ‘basis functions’. For example, the first four basis functions from the Q. pyrenaica model are:
BF1 = MAX (0, PT4 - 3431)
BF2 = MAX (0, 3431 - PT4)
BF3 = MAX (0, MDE50 - 1181)
BF4 = MAX (0, 1181 - MDE50)
where PT4 is the mean rainfall for the period October-December (l/m² × 10) and MDE50 is elevation (m).
Changes in the slope of these basis functions occur at points called ‘knots’ (the values 3431 or 1181 in the above examples). Regression lines are allowed to bend at the knots, which mark the end of one region of data and the beginning of another with different functional behavior. Like the subdivisions in CART, knots are established in a forward/backward stepwise way. A model which clearly overfits is produced first, and then those knots that contribute least to efficiency are discarded in a backwards-pruning step to avoid overfitting. The best model is selected via cross-validation, a process that applies a penalty to each term (i.e., a knot) added to the model in order to keep complexity as low as possible.
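The four basis functions quoted for Q. pyrenaica can be written as paired ‘hinge’ functions around each knot, as in the sketch below. The knots and variable names come from the text; the helper names, the intercept and the weights passed to `mars_prediction` are placeholders, since the fitted coefficients are not reported here.

```python
def hinge(value, knot, direction):
    """MARS basis function: max(0, value - knot) when direction is +1,
    max(0, knot - value) when direction is -1."""
    return max(0.0, direction * (value - knot))

def basis_functions(pt4, mde50):
    """The four basis functions quoted for the Q. pyrenaica model."""
    return [
        hinge(pt4, 3431, +1),    # BF1 = MAX(0, PT4 - 3431)
        hinge(pt4, 3431, -1),    # BF2 = MAX(0, 3431 - PT4)
        hinge(mde50, 1181, +1),  # BF3 = MAX(0, MDE50 - 1181)
        hinge(mde50, 1181, -1),  # BF4 = MAX(0, 1181 - MDE50)
    ]

def mars_prediction(intercept, weights, pt4, mde50):
    """Weighted sum of basis functions (the weights are hypothetical)."""
    return intercept + sum(
        w * bf for w, bf in zip(weights, basis_functions(pt4, mde50)))
```

For any input, at most one function of each pair is non-zero, so the fitted surface has one slope on either side of a knot and bends exactly at the knot.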
As in the CART analysis, we transformed the MARS text report files into AML and then generated the suitability models using the ArcInfo GIS.
7.5.2 Model Evaluation
The predictive capacity of a model can be evaluated as a function of the percentages of correct classifications, both for presences and absences (the sensitivity and specificity parameters, respectively). The sensitivity and specificity of the model depend on the threshold or cut-off, which is set so as to classify each point according to its likelihood value.
To assess model performance we used the area under the Receiver Operating Characteristic (ROC) curve, a measure commonly termed AUC [19]. The ROC curve is a plot of sensitivity against 1 − specificity across all cut-off points of the model. We developed a method to construct the ROC curves by importing the databases associated with sample points into the SPSS statistical package. The ROC curve is recommended for comparing two-class classifiers, as it does not merely summarize performance at a single arbitrarily selected decision threshold, but across all possible decision thresholds [20,21]. AUC is a synthesized overall measure of model accuracy, where 1 indicates a perfect fit and a value of 0.5 indicates that the model is performing no better than chance. AUC is also equivalent to the normalized Mann-Whitney two-sample statistic, which makes it comparable to the Wilcoxon statistic.
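The two evaluation measures can be sketched in a few lines. This is an illustration rather than the SPSS procedure used in the study: it computes AUC directly from its Mann-Whitney interpretation by comparing every presence with every absence, which is simple but slow for samples of the size used here (a rank-based version would be preferable in practice). The function names are assumptions.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Share of presences and share of absences classified correctly
    when pixels with score >= cutoff are predicted as presences."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    return tp / labels.count(1), tn / labels.count(0)

def auc(scores, labels):
    """AUC as the normalized Mann-Whitney statistic: the proportion of
    (presence, absence) pairs in which the presence has the higher score,
    with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike sensitivity and specificity, the AUC function takes no cut-off argument, which reflects the point made above that it summarizes performance across all thresholds.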
7.6 RESULTS
7.6.1 Suitability Models
All the LMR equations, MARS basis functions and CART classification rules were translated into ArcInfo GIS syntax. ArcInfo was subsequently used to generate the spatial suitability models, whose goodness-of-fit was evaluated by AUC values. Table 7.1 compares the overall results for different tree species and statistical methods, with bold text highlighting the best fitting models for each species. The AUC values indicate that the LMR models provided the poorest goodness-of-fit for each species, while the CART ones were the best performers. However, there were some differences between tree species, with a relatively narrow range of AUC values for Q. pyrenaica (i.e., all the methods produce a good fit) and a much greater one in the Q. rotundifolia case. This may be related to differences in the current extent of the species (see Section 7.3), with Q. rotundifolia being the most common and therefore having potentially more complex environmental relationships. It is also worth noting that greater complexity (number of terminal nodes) in the CART models does not guarantee substantially better results. This is an interesting finding that could assist in the practicalities of implementing such models within a GIS framework.
Table 7.1 Summary statistics for the suitability models

Sample size              Method   Terminal nodes   AUC     Confidence interval (95%)
18,880 positive cases    MARS     Not applicable   0.972   0.970-0.974
18,590 negative cases    CART     56               0.970   0.968-0.972
                         CART     102              0.974   0.972-0.976

42,040 positive cases    MARS     Not applicable   0.802   0.799-0.805
41,979 negative cases    CART     525              0.971   0.970-0.972
                         CART     1016             0.975   0.974-0.977

50,394 positive cases    MARS     Not applicable   0.767   0.764-0.770
50,690 negative cases    CART     1343             0.889   0.887-0.891
                         CART     2347             0.894   0.892-0.896
Another feature of the CART model output became apparent when the results were converted into suitability maps. As illustrated in Figure 7.4a, the CART maps show abrupt transitions between areas of high and low suitability (darker and lighter shading respectively), which reflects the reliance on binary rules. In addition, due to the influence of climate variables, the suitability models frequently replicate the shapes of isopleths, which makes them visually less convincing. Although the backward pruning process in CART reduces the number of terminal nodes and makes the final model less complex, it does not eliminate such effects. These features are not present in the MARS-based maps (Figures 7.4b-7.4d), which show smoother and more continuous distributions of suitability values. For this reason, we decided to use the MARS model output to generate a potential vegetation distribution.
Figure 7.4 Suitability models: a) CART model for Q. rotundifolia, b) MARS model for Q. pyrenaica, c) MARS model for Q. suber, d) MARS model for Q. rotundifolia. Darker shading indicates higher suitability.
7.6.2 Potential Vegetation Model
Suitability models for the three tree species were combined to generate a potential vegetation distribution map that could be used to inform land management and decision-making. This map was generated through a decision rule that took into account both suitability values and proximity to the current presence of forests. We defined a function where, for each cell, the suitability value for each species was corrected by the inverse of the distance to the closest cell where the species currently grows. This correction can be considered as a coarse indicator of