Landslide Susceptibility Modeling:
Optimization and Factor Effect Analysis
Biswajeet Pradhan and Maher Ibrahim Sameen
6.1 Introduction
Landslides are considered devastating natural geohazards worldwide; they pose significant threats to human life and result in socioeconomic losses in many countries (Mahalingam et al. 2016). A literature search shows that considerable efforts have been exerted to develop new ideas and tools that can improve the mitigation of landslide effects. One field that is attracting the attention of an increasing number of researchers worldwide is landslide susceptibility modeling (LSM). LSM provides the basic information required for hazard and risk assessments; it is also a critical component in disaster management and mitigation (Pradhan and Lee 2009; Bui et al. 2015; Gaprindashvili and van Westen 2016). Significant studies on landslide susceptibility mapping were conducted in the last decades, thereby creating new ideas and research directions for future studies. The optimization of landslide conditioning factors (Jebur et al. 2014), the study of the effects of landslide sampling procedures (Hussin et al. 2016), the development of novel and hybrid models (Moosavi and Niazi 2015), and the analysis of the effects of landslide factors (Guo and Hamada 2013) are among recent and significant research directions in landslide susceptibility studies.
Landslides are triggered by several factors that create challenges for researchers in analyzing and predicting different types of landslides. In general, geomorphological, topographical, geological, and hydrological factors are among the factors that are widely studied and considered in LSM (Pradhan 2013; Pereira et al. 2013). However, landslide conditioning factors, such as slope, aspect, land use, distance to road, and vegetation density, are not consistent among studies. In addition, the quality and quantity of data can also vary, thereby affecting the accuracy of LSM. Therefore, a detailed analysis and comprehensive investigation of the input data before LSM is performed are important to increase the accuracy of landslide susceptibility models. In addition, recent advances in light detection and ranging (LiDAR) technology enable landslide researchers to collect high-quality data (Kasai et al. 2009). Nevertheless, challenges remain because of the variability in topography and other conditions of different study areas.
Several studies have attempted to provide insights into landslide conditioning factors and have investigated these factors for LSM. Mahalingam et al. (2016) evaluated landslide susceptibility mapping techniques using LiDAR-derived factors in Oregon City. The results of their study showed that only a few factors were necessary to produce satisfactory maps with a high predictive capability (area under the curve > 0.7). Qin et al. (2013) investigated uncertainties caused by digital elevation model (DEM) error in LSM. The uncertainty assessment showed that modeling techniques could have varying sensitivities to DEM errors. Mahalingam and Olsen (2015) assessed the influences of the source and spatial resolution of DEMs on derivative products used in landslide mapping. Their study showed that a fine resolution would not necessarily guarantee high predictive accuracy in landslide mapping, and the source of the datasets would be an important consideration in LSM. The effects of landslide conditioning factor combinations on the accuracy of LSM were explored by Meten et al. (2015). In their study, the accuracy of LSM was improved by removing certain landslide conditioning factors based on their correlations with other factors. Kayastha (2015) conducted a study on factor effect analysis using the frequency ratio (FR) model in Nepal. The results indicated that using all nine causative factors produced the best success rate accuracy of over 80%. However, in the study of Vasu and Lee (2016), an LSM with 13 relevant factors selected from the initial 23 factors presented a success rate of 85% and a prediction rate of 89.45%. Hussin et al. (2016) evaluated the effects of different landslide sampling procedures on a statistical susceptibility model. The study demonstrated that the highest success rates were obtained when sampling shallow landslides as 50 m grid points and debris flow scarps as polygons. The highest prediction rates were achieved when the entire scarp polygon method was used for both landslide types. The sample size test using the landslide centroids showed that a sample of 104 debris flow scarps was sufficient to predict the remaining 941 debris flows, whereas 161 shallow landslides were the minimum number required to predict the remaining 1451 scarps.
B. Pradhan (✉) · M.I. Sameen
Department of Civil Engineering, University Putra Malaysia,
Serdang, Malaysia
e-mail: biswajeet24@gmail.com
© Springer International Publishing AG 2017
B. Pradhan (ed.), Laser Scanning Applications in Landslide Assessment,
DOI 10.1007/978-3-319-55342-9_6
The current study used 15 landslide conditioning factors and an adequate number of landslide inventories to investigate the optimization of landslide conditioning factors and conduct a factor effect analysis for developing landslide susceptibility models in the Cameron Highlands, western Malaysia. After multicollinearity and factor effect analyses were performed, ant colony optimization (ACO) was utilized to select significant landslide conditioning factors among the initial 15 factors for further analysis. Data mining techniques, including support vector machine (SVM) and random forest (RF), were used to analyze the effects of the selected landslide conditioning factors on the prediction rate accuracy of the susceptibility models. Details and discussions on the obtained results are presented in the remainder of this chapter.
6.2 Study Area and Landslide Inventory Data
The Cameron Highlands is a tropical rain forest district located in western Malaysia at the northwestern tip of Pahang. It is approximately 200 km from Kuala Lumpur. Previous studies have reported several landslides in this region, which have caused significant damage to property (Khan 2010). The lithology of the Cameron Highlands mainly consists of Quaternary and Devonian granite and schist (Pradhan and Lee 2010). The granite in the Cameron Highlands is classified as megacrystic biotite granite (Pradhan and Lee 2010). A subset that occupies a surface area of approximately 25 km² was selected for the current study because of the frequent occurrence of landslides in this area (Fig. 6.1). The lowest and highest altitudes are 889.61 and 1539.49 m, respectively.
Multisource remote sensing images and geographic information system (GIS) data were used to collect and prepare a landslide inventory database for LSM. Remote sensing data, including archived 1:10,000–1:50,000 aerial photographs, SPOT 5 panchromatic satellite images, and high-resolution LiDAR-based orthophotos, were used to visually detect landslide occurrences in the study area. In addition, all historical landslide reports, newspaper records, and archived data for the period under examination were collected. The locations of the individual landslides were drawn on 1:25,000 maps based on the site description, archived database, and aerial photograph interpretation. Field observations were performed to confirm fresh landslide scarps. In the aerial photographs and SPOT 5 satellite images, historical landslides could be observed as breaks in the forest canopy, bare soil, or geomorphological features, such as head and side scarps, flow tracks, and soil and debris deposits below a scarp. These landslides were then classified and sorted based on their modes of occurrence. Most of the landslides are shallow rotational, whereas a few are translational. A few landslides that occurred in flat areas were not considered and were thus eliminated from the analysis. To create a database for assessing the surface area and number of landslides in the study area, landslides were mapped within an area of 25 km². The landslide inventory map is shown in Fig. 6.1.
Fig. 6.1 Geographic location of the study area and the landslide inventory map created by using multisource remote sensing data
6.2.1 Preparation of Landslide Conditioning Factors
A geospatial database that contained 15 landslide conditioning factors was prepared for susceptibility analysis in GIS. Some factors were derived from a LiDAR-based DEM and Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) images, whereas others were digitized from GIS layers collected from government agencies. First, a DEM at 0.5 m spatial resolution was created from LiDAR point clouds using a multiscale curvature algorithm and inverse distance weighted (IDW) interpolation techniques implemented in ArcGIS 10.3. Subsequently, slope, aspect, profile curvature, and plan curvature were derived from the generated DEM at 0.5 m spatial resolution using the spatial analysis tools of GIS. In the case of curvature, negative curvatures represent concave surfaces, zero curvatures represent flat surfaces, and positive curvatures represent convex surfaces. In addition, four hydrological factors, namely the topographic wetness index (TWI), the topographic roughness index (TRI), the stream power index (SPI), and the sediment transport index (STI), were derived from the slope and flow accumulation layers. The land cover map was prepared from SPOT 5 satellite images (10 m spatial resolution) using a supervised classification method. The map was verified via field survey. Then, 10 classes of land cover types were identified, including water bodies, transportation, agriculture, residential, and bare land. The normalized difference vegetation index (NDVI) map was generated from SPOT 5 satellite images (10 m spatial resolution). The NDVI value was calculated using the formula NDVI = (IR − R)/(IR + R), where IR and R denote the energy reflected in the infrared and red portions, respectively, of the electromagnetic spectrum. Finally, distance to road, distance to river, and distance to lineament were calculated based on the Euclidean distance method using the GIS layers.
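The NDVI formula above can be illustrated with a minimal sketch. The function name, the no-data handling for IR + R = 0, and the reflectance values are assumptions made for illustration; they are not taken from the study.

```python
def ndvi(ir, red):
    """Return NDVI = (IR - R) / (IR + R) for a single pixel.

    Returns None when IR + R is zero (an assumed no-data convention).
    """
    total = ir + red
    if total == 0:
        return None
    return (ir - red) / total


# Hypothetical (infrared, red) reflectance pairs, for illustration only.
pixels = [(0.50, 0.10), (0.20, 0.20), (0.05, 0.30)]
values = [ndvi(ir, red) for ir, red in pixels]
```

Dense vegetation reflects strongly in the infrared band, so the first pixel yields a high positive value, whereas the last pixel (red exceeding infrared) yields a negative value.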
Several studies have explained the contributing factors of a landslide. The significance of a particular factor depends on site-specific conditions. In the current study, soil and lithology were not used because the study area consists of only one type of soil and lithology. Instead, 15 factors were used, namely altitude, slope, aspect, profile curvature, plan curvature, land use, TWI, TRI, SPI, STI, NDVI, vegetation density, distance to road, distance to river, and distance to fault. The succeeding paragraphs briefly describe these factors.
Altitude is controlled by several geological and geomorphological processes. Landslides typically occur at intermediate elevations because slopes tend to be covered by a layer of thin colluvium, which is prone to landslides. In this study, the lowest and highest altitudes were 889.61 and 1539.49 m, respectively. The altitude layer was reclassified into six classes using the quantile classification method, as shown in Fig. 6.2d.
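Quantile classification assigns roughly equal numbers of cells to each class. The sketch below shows one simple way to do this with a nearest-rank rule; GIS packages may place the class breaks slightly differently, and the function names and altitude values are hypothetical.

```python
def quantile_breaks(values, n_classes=6):
    """Return the n_classes - 1 upper class limits for equal-count classes."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for k in range(1, n_classes):
        # Index of the last value that belongs to class k (nearest-rank rule).
        idx = max(0, (k * n) // n_classes - 1)
        breaks.append(ordered[idx])
    return breaks


def classify(value, breaks):
    """Return the 1-based class whose upper limit first reaches the value."""
    for i, upper in enumerate(breaks, start=1):
        if value <= upper:
            return i
    return len(breaks) + 1


# Hypothetical altitude samples (m) spanning the range reported in the text.
altitudes = [889.61, 930.0, 980.0, 1020.0, 1075.0, 1110.0,
             1160.0, 1215.0, 1290.0, 1350.0, 1440.0, 1539.49]
breaks = quantile_breaks(altitudes)
classes = [classify(a, breaks) for a in altitudes]
```

With 12 samples and six classes, each class receives two samples, which is the defining property of the quantile method.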
The slope is a measure of the rate of change in elevation in the direction of the steepest descent and is considered the main cause of landslides. The slope gradient map of the study area was divided into six slope angle classes. The study area includes flat regions, and the steepest slope observed was 80° (Fig. 6.2e).
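As an illustration of slope as the rate of change in elevation, the following sketch estimates the slope angle at the centre of a 3 × 3 DEM window using central differences. This is an assumed simplification: GIS software typically uses a weighted eight-neighbour scheme, and the function name and sample windows are hypothetical. The default cell size matches the 0.5 m DEM described earlier.

```python
import math


def slope_deg(window, cell_size=0.5):
    """Slope (degrees) at the centre of a 3x3 elevation window.

    Uses central differences in the x (column) and y (row) directions.
    """
    dz_dx = (window[1][2] - window[1][0]) / (2 * cell_size)
    dz_dy = (window[2][1] - window[0][1]) / (2 * cell_size)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# A plane rising 0.5 m per 0.5 m cell toward the east: gradient of 1.
tilted = [[0.0, 0.5, 1.0],
          [0.0, 0.5, 1.0],
          [0.0, 0.5, 1.0]]
flat = [[10.0] * 3 for _ in range(3)]

tilted_slope = slope_deg(tilted)
flat_slope = slope_deg(flat)
```

A gradient of 1 corresponds to a slope of about 45°, and a constant-elevation window gives 0°.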
Aspect is defined as the slope direction measured (in degrees) from the north in a clockwise direction. It ranges from 0° to 360°. Parameters such as exposure to sunlight, rainfall, and dry winds control the concentration of soil moisture, which, in turn, determines landslide occurrence (Fig. 6.2f).
Plan curvature is described as the curvature of a contour line formed by the intersection of a horizontal plane with the surface. It influences the convergence and divergence of flow across a surface. Profile curvature, in which the vertical plane is parallel to the slope direction, affects the acceleration and deceleration of downslope flows and, consequently, influences erosion and deposition. Plan and profile curvature maps were reclassified into three classes, namely convex, flat, and concave lands, with negative, zero, and positive values, respectively (Fig. 6.2g and h).
In addition to the topographical factors, land use, NDVI, and vegetation density are key conditioning factors that contribute to the occurrence of landslides. Sparsely vegetated areas are more prone to erosion and increased instability than forests. Vegetation strengthens the soil through an interlocking network of roots that forms erosion-resistant mats that stabilize slopes. Evapotranspiration controls the wetness of slopes. NDVI is frequently considered a controlling factor in landslide susceptibility mapping. In general, when the value of NDVI is high, the area covered by vegetation is large. Conversely, relatively low vegetation coverage can easily lead to a landslide incident. In this study, a land use layer that consisted of 10 classes was used for LSM. Vegetation density was reclassified into four classes, namely non-vegetation, low vegetation, moderate vegetation, and dense vegetation (Fig. 6.2a). NDVI was reclassified into six classes spanning values from −0.521 to 0.96 (Fig. 6.2b).
Fig. 6.2 Landslide conditioning factors used in the current study
Four hydrological factors were also used for LSM in the current study. TWI describes the effects of topography on the location and size of saturated source areas of runoff generation. This index is calculated as TWI = ln(A_s / tan β), where A_s is the specific catchment area of each cell, and β is the local slope gradient (in degrees). SPI, which is a measure of the erosion power of a stream, is also considered a factor that contributes to the stability of the study area. This index is expressed as SPI = A_s tan β, where A_s and β are defined as for TWI. STI, which reflects the erosive power of overland flow, is derived by considering transport capacity limiting sediment flux and catchment evolution erosion theories. TRI is another important factor that affects landslide susceptibility. These hydrological factors were reclassified into six classes using the quantile method and then applied in LSM.
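The TWI and SPI expressions above can be sketched as follows. Because the slope is given in degrees, it is converted to radians before the tangent is taken. The minimum-slope clamp that keeps TWI finite on flat cells is an assumption of this sketch, as are the function names and sample values.

```python
import math


def twi(spec_catchment_area, slope_deg, min_slope_deg=0.1):
    """TWI = ln(A_s / tan(beta)).

    The slope is clamped to min_slope_deg (an assumed convention) so that
    flat cells do not cause a division by zero.
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(spec_catchment_area / math.tan(beta))


def spi(spec_catchment_area, slope_deg):
    """SPI = A_s * tan(beta)."""
    return spec_catchment_area * math.tan(math.radians(slope_deg))


# Hypothetical cell: specific catchment area of 100 and a 45-degree slope.
twi_steep = twi(100.0, 45.0)
spi_steep = spi(100.0, 45.0)
```

For the same catchment area, a gentler slope gives a higher TWI (wetter cell) and a lower SPI (less erosive power), which is the behaviour the two indices are meant to capture.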
Anthropogenic factors, such as distance to roads, distance to rivers, and distance to faults, have been considered important factors that influence landslides. Extensive excavations, application of external loads, and vegetation removal are some of the most common actions that occur along road network slopes during their construction. The intermittent flow regime of a hydrological network and gullies encompasses erosive and saturation processes, thereby increasing pore water pressure and leading to landslides in areas adjacent to drainage channels. In addition, geological faults are important triggering factors of landslides. The degree of fracturing and shearing plays an important role in determining slope instability. Proximity (buffers) to these structures increases the likelihood of landslides given that selective erosion and the movement of water along fault planes promote these phenomena. The aforementioned layers were reclassified into six classes using the quantile method.
6.3 Methodology
6.3.1 Overall Research Flow
This study encompasses four methodological steps. The first step is the multicollinearity and factor effect analyses. In the second step, relevant factors among the initial 15 landslide conditioning factors are selected using ACO. The third step involves the application of the susceptibility models using several experiments that aim to analyze the effects of relevant factors. In the last step, susceptibility models are validated using receiver operating characteristic (ROC) curves. The overall workflow of this study is shown in Fig. 6.3.
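The ROC-based validation in the last step is usually summarized by the area under the curve (AUC). A minimal sketch is given below, assuming binary labels (1 for landslide cells, 0 for non-landslide cells) and a pairwise-ranking formulation of the AUC; the function name and the sample labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a landslide cell outranks a non-landslide cell.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical susceptibility scores for four validation cells.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of cells, so a rank-based formula would be preferable for full raster maps.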
6.3.2 Selection of Relevant Factors Using ACO
ACO is a metaheuristic optimization technique whose applications have developed significantly. The advantages of ACO include a probabilistic decision in terms of artificial pheromone trails and local heuristic information. These advantages enable the exploration of a larger number of solutions compared with that of greedy heuristics (Gottlieb et al. 2003). The overall workflow of the ACO-based landslide factor selection is presented in Fig. 6.4. First, ants are generated and then placed randomly on a graph, i.e., each ant starts with one random landslide factor. The number of ants placed on the graph may be set to be equal to the number of factors of the data; each ant initiates a path construction at a different factor. The ants traverse nodes probabilistically from their initial positions until a traversal stopping criterion is satisfied. The resulting subsets are gathered and evaluated. When an optimal subset has been found or when the algorithm has been executed a certain number of times, the process stops and the best encountered factor subset is output. If neither condition holds, then the pheromone is updated, a new set of ants is created, and the process is reiterated.
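The loop described above can be sketched in simplified form. This is not the implementation used in the study: the sketch uses pheromone trails only (local heuristic information is omitted), a fixed subset size as the traversal stopping criterion, and a fixed number of iterations. The function names, parameter values, and the toy scoring function are all hypothetical; in practice the score would be the accuracy of a susceptibility model trained on the candidate subset.

```python
import random


def aco_select(n_factors, score, subset_size, n_ants=None, n_iter=30,
               evaporation=0.2, seed=0):
    """Return (best_subset, best_score) found by a simple ACO search."""
    rng = random.Random(seed)
    n_ants = n_ants or n_factors        # one ant per factor, as in the text
    pheromone = [1.0] * n_factors       # pheromone trail on each factor (node)
    best_subset, best_score = None, float("-inf")

    for _ in range(n_iter):
        solutions = []
        for ant in range(n_ants):
            # Each ant starts its path at a different factor.
            subset = [ant % n_factors]
            # Move probabilistically until the subset reaches the target size.
            while len(subset) < subset_size:
                candidates = [f for f in range(n_factors) if f not in subset]
                weights = [pheromone[f] for f in candidates]
                subset.append(rng.choices(candidates, weights=weights)[0])
            s = score(subset)
            solutions.append((subset, s))
            if s > best_score:
                best_subset, best_score = sorted(subset), s

        # Evaporate, then deposit pheromone in proportion to subset quality.
        pheromone = [(1.0 - evaporation) * p for p in pheromone]
        for subset, s in solutions:
            for f in subset:
                pheromone[f] += max(s, 0.0)
    return best_subset, best_score


RELEVANT = {0, 3, 7}  # hypothetical indices of the "truly relevant" factors


def toy_score(subset):
    """Fraction of the hypothetical relevant factors captured by the subset."""
    return len(RELEVANT.intersection(subset)) / len(RELEVANT)


best, best_score = aco_select(n_factors=15, score=toy_score, subset_size=3)
```

Factors that appear in high-scoring subsets accumulate pheromone and are therefore chosen more often in later iterations, which is what steers the search toward good subsets without evaluating every combination.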
6.3.3 Susceptibility Models
In this study, susceptibility maps were produced using two data mining approaches: SVM and RF. These algorithms were used to determine whether the results were consistent or whether the performance of the susceptibility models with significant factors varied from one model to another. The subsequent sections briefly describe the basic concept of the algorithms.
Fig. 6.3 Overall research activities used to optimize landslide conditioning factors, conduct factor effect analysis, and develop improved susceptibility models
Fig. 6.4 Overall workflow of factor subset selection by the ACO method
6.3.3.1 SVM
SVM was originally developed by Vladimir and Vapnik (1995) as a more recent machine learning method than artificial neural networks. SVM uses the training data to map the original input space implicitly into a high-dimensional feature space based on kernel functions (Brenning 2005). Subsequently, the optimal hyperplane in the feature space is determined by maximizing the margins of class boundaries (Abe 2005). SVM training is therefore posed as a constrained optimization problem that is solved in its dual form. In general, kernel types include linear, polynomial, and radial basis function (RBF) or Gaussian kernels. The RBF kernel was applied in this study because it has been reported to be the most effective kernel for addressing nonlinear cases (Yao et al. 2008).
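To make the role of the kernel concrete, the sketch below evaluates the standard RBF (Gaussian) kernel, K(x, z) = exp(−γ‖x − z‖²), for a few factor vectors. The value of γ and the sample vectors are hypothetical; the kernel parameters used in the study are not stated in this section.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Hypothetical (already normalized) factor vectors for three cells.
cells = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]]
kernel_matrix = [[rbf_kernel(a, b) for b in cells] for a in cells]
```

The kernel equals 1 for identical vectors and decays toward 0 as vectors move apart, so nearby cells in factor space are treated as similar. A larger γ makes this decay faster and the decision boundary more flexible.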
6.3.3.2 RF
RF is an ensemble machine learning method that generates numerous classification trees that are combined to compute a classification (Breiman et al. 1984; Breiman 2001). Hansen and Salamon (1990) indicated that a necessary and sufficient condition for an ensemble of classification trees to be more accurate than any of its individual members is that the members of the ensemble perform better than random guessing and are diverse. RF increases diversity among classification trees by resampling the data with replacement and randomly changing the predictive variable sets over different tree induction processes. The RF algorithm involves two main user-defined parameters that require appropriate specification: the number of trees (k) and the number of predictive variables sampled at each node (m). A predictive variable may be numerical or categorical, and translation into design variables is unnecessary. An unbiased estimate of the generalization error is obtained during the construction of an RF: the proportion of misclassifications (%) over all out-of-bag (OOB) elements is called the OOB error. Breiman (2001) proved that RF produces a limiting value of the generalization error; as the number of trees increases, the generalization error converges. The value of k must be set sufficiently high to allow this convergence. The RF algorithm estimates the importance of a predictive variable by examining the OOB errors: the larger the increase in the OOB error associated with a variable, the more important that variable is.
The advantages of RF include resistance to overtraining and the capability to grow a large number of trees without creating a risk of overfitting. Input data for the RF algorithm do not need to be rescaled, transformed, or modified, and the algorithm is resistant to outliers in predictors. In this study, the number of trees in an RF was fixed at 500 after a primary analysis, and the number of variables sampled at each node (m) was set at 3 to analyze the combined contributions of subsets of features while maintaining fast convergence during iterations. No calibration set is required to regulate the parameters (Micheletti et al. 2014). The importance and standardized rank of each landslide variable were calculated. The ranks were then used to overlay landslide factors and generate the susceptibility maps.
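The two sources of randomness in RF, bootstrap resampling and random selection of m candidate variables, can be illustrated without growing any trees. The sketch below uses the settings reported above (500 trees, m = 3, 15 factors); the sample size of 200 and all names are hypothetical, and the code only tracks which samples would be out of bag for each tree.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    chosen = set(in_bag)
    oob = [i for i in range(n_samples) if i not in chosen]
    return in_bag, oob


rng = random.Random(42)
n_samples, n_factors, n_trees, m = 200, 15, 500, 3

oob_fractions = []
for _ in range(n_trees):
    in_bag, oob = bootstrap_indices(n_samples, rng)
    # Factors that would be tried at one node of this tree (no tree is grown).
    split_candidates = rng.sample(range(n_factors), m)
    oob_fractions.append(len(oob) / n_samples)

mean_oob = sum(oob_fractions) / n_trees
```

On average a little over one third of the samples are left out of each bootstrap draw. These out-of-bag samples are the ones each tree can be tested on, which is why the OOB error can be computed without a separate validation set.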
6.4 Results
6.4.1 Multicollinearity Analysis
Multicollinearity analysis is an important step in LSM. The existence of a near-linear relationship among factors can create a division-by-zero problem during regression calculations. When the relationship is exact, the calculations are aborted; when it is inexact, division by an extremely small quantity still distorts the results. Therefore, analyzing landslide conditioning factors before LSM is important. In multicollinearity analysis, collinear (dependent) factors are identified by examining a correlation matrix constructed by calculating R². Various quantitative methods for detecting multicollinearities, such as pairwise scatter plots, estimation of the variance inflation factor (VIF), and investigation of eigenvalues in a correlation matrix, are available. In this study, multicollinearity was detected by calculating the VIF values of each landslide conditioning factor. In addition, communalities similar to R² were calculated for each factor (Costello 2009). Communality shows how well a variable is predicted by the retained factors. Table 6.1 presents the estimated communalities and VIF values for each landslide conditioning factor. The second column of Table 6.1 indicates that some factors, such as land use, distance to road, distance to river, slope, STI, TWI, and TRI, exhibit strong linear relationships with other factors. These factors may negatively affect the regression analysis. However, VIF values are quantitative measures that are typically used to conclude whether a factor has a problem. In some studies, a VIF greater than two was considered problematic, whereas in other studies, a VIF greater than 10 was considered problematic (Garrosa et al. 2010). To solve the