
Landslide Susceptibility Modeling: Optimization and Factor Effect Analysis

Biswajeet Pradhan and Maher Ibrahim Sameen

6.1 Introduction

Landslides are considered devastating natural geohazards worldwide; they pose significant threats to human life and result in socioeconomic losses in many countries (Mahalingam et al. 2016). A literature search shows that considerable efforts have been exerted to develop new ideas and tools that can improve the mitigation of landslide effects. One field that is attracting the attention of an increasing number of researchers worldwide is landslide susceptibility modeling (LSM). LSM provides the basic information required for hazard and risk assessments; it is also a critical component in disaster management and mitigation (Pradhan and Lee 2009; Bui et al. 2015; Gaprindashvili and van Westen 2016). Significant studies on landslide susceptibility mapping have been conducted in recent decades, thereby creating new ideas and research directions for future studies. The optimization of landslide conditioning factors (Jebur et al. 2014), the study of the effects of landslide sampling procedures (Hussin et al. 2016), the development of novel and hybrid models (Moosavi and Niazi 2015), and the analysis of the effects of landslide factors (Guo and Hamada 2013) are among the recent and significant research directions in landslide susceptibility studies.

Landslides are triggered by several factors, which creates challenges for researchers in analyzing and predicting different types of landslides. In general, geomorphological, topographical, geological, and hydrological factors are among the factors that are widely studied and considered in LSM (Pradhan 2013; Pereira et al. 2013). However, the landslide conditioning factors used, such as slope, aspect, land use, distance to road, and vegetation density, are not consistent among studies. In addition, the quality and quantity of data can vary, thereby affecting the accuracy of LSM. Therefore, a detailed analysis and comprehensive investigation of the input data before LSM is performed are important to increase the accuracy of landslide susceptibility models. In addition, recent advances in light detection and ranging (LiDAR) technology enable landslide researchers to collect high-quality data (Kasai et al. 2009). Nevertheless, challenges remain because of the variability in topography and other conditions of different study areas.

Several studies have attempted to provide insights into landslide conditioning factors and have investigated these factors for LSM. Mahalingam et al. (2016) evaluated landslide susceptibility mapping techniques using LiDAR-derived factors in Oregon City. Their results showed that only a few factors were necessary to produce satisfactory maps with a high predictive capability (area under the curve > 0.7). Qin et al. (2013) investigated uncertainties caused by digital elevation model (DEM) error in LSM. The uncertainty assessment showed that modeling techniques could have varying sensitivities to DEM errors. Mahalingam and Olsen (2015) assessed the influences of the source and spatial resolution of DEMs on derivative products used in landslide mapping. Their study showed that a fine resolution does not necessarily guarantee high predictive accuracy in landslide mapping and that the source of the datasets is an important consideration in LSM. The effects of landslide conditioning factor combinations on the accuracy of LSM were explored by Meten et al. (2015). In their study, the accuracy of LSM was improved by removing certain landslide conditioning factors based on their correlations with other factors. Kayastha (2015) conducted a study on factor effect analysis using the frequency ratio (FR) model in Nepal. The results indicated that using all nine causative factors produced the best success rate accuracy of over 80%. However, in the study of Vasu and Lee (2016), an LSM with 13 relevant factors selected from the initial 23 factors presented a success rate of 85% and a prediction rate of 89.45%. Hussin et al. (2016) evaluated the effects of different landslide sampling procedures on a statistical susceptibility model. The study demonstrated that the highest success rates were obtained when sampling shallow landslides as 50 m grid points and debris flow scarps as polygons. The highest prediction rates were achieved when the entire scarp polygon method was used for both landslide types. The sample size test using the landslide centroids showed that a sample of 104 debris flow scarps was sufficient to predict the remaining 941 debris flows, whereas 161 shallow landslides were the minimum number required to predict the remaining 1451 scarps.

B. Pradhan (✉), M. I. Sameen

Department of Civil Engineering, University Putra Malaysia, Serdang, Malaysia

e-mail: biswajeet24@gmail.com

B. Pradhan (ed.), Laser Scanning Applications in Landslide Assessment, DOI 10.1007/978-3-319-55342-9_6

The current study used 15 landslide conditioning factors and an adequate number of inventoried landslides to investigate the optimization of landslide conditioning factors and conduct a factor effect analysis for developing landslide susceptibility models in the Cameron Highlands, western Malaysia. After multicollinearity and factor effect analyses were performed, ant colony optimization (ACO) was utilized to select significant landslide conditioning factors among the initial 15 factors for further analysis. Data mining techniques, including support vector machine (SVM) and random forest (RF), were used to analyze the effects of the selected landslide conditioning factors on the prediction rate accuracy of the susceptibility models. Details and discussions of the obtained results are presented in the remainder of this chapter.

6.2 Study Area and Landslide Inventory Data

The Cameron Highlands is a tropical rainforest district located in western Malaysia at the northwestern tip of Pahang, approximately 200 km from Kuala Lumpur. Previous studies have reported several landslides in this region, which have caused significant damage to property (Khan 2010). The lithology of the Cameron Highlands mainly consists of Quaternary and Devonian granite and schist (Pradhan and Lee 2010). The granite in the Cameron Highlands is classified as megacrystic biotite granite (Pradhan and Lee 2010). A subset that occupies a surface area of approximately 25 km² was selected for the current study because of the frequent occurrence of landslides in this area (Fig. 6.1). The lowest and highest altitudes are 889.61 and 1539.49 m, respectively.

Multisource remote sensing images and geographic information system (GIS) data were used to collect and prepare a landslide inventory database for LSM. Remote sensing data, including archived 1:10,000–1:50,000 aerial photographs, SPOT 5 panchromatic satellite images, and high-resolution LiDAR-based orthophotos, were used to visually detect landslide occurrences in the study area. In addition, all historical landslide reports, newspaper records, and archived data for the period under examination were collected. The locations of the individual landslides were drawn on 1:25,000 maps based on the site description, archived database, and aerial photograph interpretation. Field observations were performed to confirm fresh landslide scarps. In the aerial photographs and SPOT 5 satellite images, historical landslides could be observed as breaks in the forest canopy, bare soil, or geomorphological features, such as head and side scarps, flow tracks, and soil and debris deposits below a scarp. These landslides were then classified and sorted based on their modes of occurrence. Most of the landslides are shallow rotational, whereas a few are translational. A few landslides that occurred in flat areas were not considered and were thus eliminated from the analysis. To create a database for assessing the surface area and number of landslides in the study area, landslides were mapped within an area of 25 km². The landslide inventory map is shown in Fig. 6.1.

Fig. 6.1 Geographic location of the study area and the landslide inventory map created by using multisource remote sensing data

6.2.1 Preparation of Landslide Conditioning Factors

A geospatial database that contained 15 landslide conditioning factors was prepared for susceptibility analysis in GIS. Some factors were derived from a LiDAR-based DEM and Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) images, whereas others were digitized from GIS layers collected from government agencies. First, a DEM at 0.5 m spatial resolution was created from LiDAR point clouds using a multiscale curvature algorithm and inverse distance weighted (IDW) interpolation techniques implemented in ArcGIS 10.3. Subsequently, slope, aspect, profile curvature, and plan curvature were derived from the generated DEM at 0.5 m spatial resolution using the spatial analysis tools of GIS. In the case of curvature, negative curvatures represent concave surfaces, zero curvatures represent flat surfaces, and positive curvatures represent convex surfaces. In addition, four hydrological factors, namely the topographic wetness index (TWI), the topographic roughness index (TRI), the stream power index (SPI), and the sediment transport index (STI), were derived from the slope and flow accumulation layers. The land cover map was prepared from SPOT 5 satellite images (10 m spatial resolution) using a supervised classification method and was verified via field survey. Ten land cover classes were identified, including water bodies, transportation, agriculture, residential, and bare land. The normalized difference vegetation index (NDVI) map was generated from SPOT 5 satellite images (10 m spatial resolution). The NDVI value was calculated using the formula $\mathrm{NDVI} = (\mathrm{IR} - \mathrm{R}) / (\mathrm{IR} + \mathrm{R})$, where IR and R denote the energy reflected in the infrared and red portions, respectively, of the electromagnetic spectrum. Finally, distance to road, distance to river, and distance to lineament were calculated based on the Euclidean distance method using the GIS layers.
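The NDVI formula above can be illustrated with a minimal Python sketch. The band values below are hypothetical reflectances rather than SPOT 5 data, and returning `None` for pixels where both bands are zero is an assumption made here to avoid a division by zero; the chapter does not say how such pixels were handled.

```python
def ndvi(ir, red):
    """Return (IR - R) / (IR + R) for one pixel, or None if undefined."""
    total = ir + red
    if total == 0:
        return None  # assumption: treat an all-zero pixel as "no data"
    return (ir - red) / total


def ndvi_raster(ir_band, red_band):
    """Apply ndvi() cell by cell to two equally sized 2-D lists."""
    return [
        [ndvi(ir, red) for ir, red in zip(ir_row, red_row)]
        for ir_row, red_row in zip(ir_band, red_band)
    ]


# Hypothetical 2 x 2 reflectance rasters (not taken from the chapter).
ir_band = [[0.50, 0.40], [0.10, 0.00]]
red_band = [[0.10, 0.40], [0.30, 0.00]]
result = ndvi_raster(ir_band, red_band)
```

In this toy raster, the vegetated pixel (high IR, low R) yields a positive value, the pixel with equal reflectance yields zero, and the pixel with higher red reflectance yields a negative value.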

Several studies have explained the contributing factors of a landslide. The significance of a particular factor depends on site-specific conditions. In the current study, soil and lithology were not used because the study area consists of only one type of soil and lithology. However, 15 factors were used, namely altitude, slope, aspect, profile curvature, plan curvature, land use, TWI, TRI, SPI, STI, NDVI, vegetation density, distance to road, distance to river, and distance to fault. The succeeding paragraphs briefly describe these factors.

Altitude is controlled by several geological and geomorphological processes. Landslides typically occur at intermediate elevations because slopes tend to be covered by a layer of thin colluvium, which is prone to landslides. In this study, the lowest and highest altitudes were 889.61 and 1539.49 m, respectively. The altitude layer was reclassified into six classes using the quantile classification method, as shown in Fig. 6.2d.
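Quantile classification places roughly the same number of cells in each class. The sketch below shows one simple way to do this in Python; it is not the GIS routine used in the chapter. Only the minimum and maximum altitudes come from the text, the intermediate values are hypothetical, and the tie rule (a value equal to a break falls in the lower class) is an assumption.

```python
from bisect import bisect_left


def quantile_breaks(values, n_classes):
    """Return n_classes - 1 upper break values with equal counts per class."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for k in range(1, n_classes):
        # Index of the last value that belongs to class k.
        idx = max((k * n) // n_classes - 1, 0)
        breaks.append(ordered[idx])
    return breaks


def reclassify(value, breaks):
    """Return a class from 1 to len(breaks) + 1 for a single value."""
    return bisect_left(breaks, value) + 1


# Hypothetical altitude samples (m); only the two extremes are from the text.
altitudes = [889.61, 950.0, 1010.0, 1075.0, 1120.0, 1180.0,
             1230.0, 1290.0, 1340.0, 1400.0, 1470.0, 1539.49]
breaks = quantile_breaks(altitudes, n_classes=6)
classes = [reclassify(a, breaks) for a in altitudes]
```

With twelve samples and six classes, each class receives two values.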

Slope is a measure of the rate of change in elevation in the direction of the steepest descent and is considered the main cause of landslides. The slope gradient map of the study area was divided into six slope angle classes. The study area includes flat regions, and the steepest slope observed was 80° (Fig. 6.2e).

Aspect is defined as the slope direction measured (in degrees) clockwise from north; it ranges from 0° to 360°. Parameters such as exposure to sunlight, rainfall, and dry winds control the concentration of soil moisture, which, in turn, influences landslide occurrence (Fig. 6.2f).

Plan curvature is described as the curvature of a contour line formed by the intersection of a horizontal plane with the surface. It influences the convergence and divergence of flow across a surface. Profile curvature, in which the vertical plane is parallel to the slope direction, affects the acceleration and deceleration of downslope flows and, consequently, influences erosion and deposition. Plan and profile curvature maps were reclassified into three classes, namely concave, flat, and convex lands, with negative, zero, and positive values, respectively (Figs. 6.2g and h).

In addition to the topographical factors, land use, NDVI, and vegetation density are key conditioning factors that contribute to the occurrence of landslides. Sparsely vegetated areas are more prone to erosion and increased instability than forests. Vegetation strengthens the soil through an interlocking network of roots that forms erosion-resistant mats that stabilize slopes. Evapotranspiration controls the wetness of slopes. NDVI is frequently considered a controlling factor in landslide susceptibility mapping. In general, when the value of NDVI is high, the area covered by vegetation is large. Furthermore, a relatively low vegetation coverage can easily lead to a landslide incident. In this study, a land use layer that consisted of 10 classes was used for LSM. Vegetation density was reclassified into four classes, namely non-vegetation, low vegetation, moderate vegetation, and dense vegetation (Fig. 6.2a). NDVI was reclassified into six classes spanning values from −0.521 to 0.96 (Fig. 6.2b).

Fig. 6.2 Landslide conditioning factors used in the current study

Four hydrological factors were also used for LSM in the current study. TWI describes the effects of topography on the location and size of saturated source areas of runoff generation. This index is calculated as $\mathrm{TWI} = \ln(A_S / \tan\beta)$, where $A_S$ is the specific catchment area of each cell and $\beta$ is the local slope gradient (in degrees). SPI, which is a measure of the erosive power of a stream, is also considered a factor that affects the stability of the study area. This index is expressed as $\mathrm{SPI} = A_S \tan\beta$, with $A_S$ and $\beta$ defined as for TWI. STI, which reflects the erosive power of overland flow, is derived by considering transport capacity limiting sediment flux and catchment evolution erosion theories. TRI is another important factor that affects landslide susceptibility. These hydrological factors were reclassified into six classes using the quantile method and then applied in LSM.
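The TWI and SPI expressions can be evaluated per cell as in the following Python sketch. The function names and input values are hypothetical. Because the slope is given in degrees, it is converted to radians before the tangent is taken. The small minimum slope applied in `twi` is an assumption added here so that flat cells do not cause a division by zero; the chapter does not describe how flat cells were treated.

```python
import math


def twi(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index ln(A_S / tan(beta)) for one cell."""
    # Assumption: clamp very small slopes to keep tan(beta) above zero.
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area / math.tan(beta))


def spi(specific_catchment_area, slope_deg):
    """Stream power index A_S * tan(beta) for one cell."""
    return specific_catchment_area * math.tan(math.radians(slope_deg))


# Hypothetical cell: specific catchment area of 100 and a 45-degree slope.
cell_twi = twi(100.0, 45.0)
cell_spi = spi(100.0, 45.0)
flat_twi = twi(100.0, 0.0)   # finite because of the minimum-slope clamp
```

For a 45° slope the tangent equals one, so TWI reduces to the logarithm of the catchment area and SPI to the catchment area itself, which gives a quick sanity check for an implementation.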

Distance-based factors, such as distance to roads, distance to rivers, and distance to faults, have been considered important factors that influence landslides. Extensive excavations, application of external loads, and vegetation removal are some of the most common actions that occur along road network slopes during construction. The intermittent flow regime of a hydrological network and gullies involves erosive and saturation processes, thereby increasing pore water pressure and leading to landslides in areas adjacent to drainage channels. In addition, geological faults are important triggering factors of landslides. The degree of fracturing and shearing plays an important role in determining slope instability. Proximity (buffers) to these structures increases the likelihood of landslides given that selective erosion and the movement of water along fault planes promote these phenomena. The aforementioned layers were reclassified into six classes using the quantile method.
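A Euclidean distance layer of the kind used for the road, river, and fault factors can be sketched in Python as follows. This is a brute-force illustration, not the GIS tool used in the chapter. The grid size and road cells are hypothetical; the 0.5 m cell size mirrors the DEM resolution mentioned earlier.

```python
import math


def euclidean_distance_raster(n_rows, n_cols, feature_cells, cell_size=1.0):
    """Distance from every cell centre to the nearest feature cell."""
    if not feature_cells:
        raise ValueError("at least one feature cell is required")
    raster = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Nearest feature in cell units, then scaled to map units.
            nearest = min(math.hypot(r - fr, c - fc) for fr, fc in feature_cells)
            row.append(nearest * cell_size)
        raster.append(row)
    return raster


# Hypothetical road running down the first column of a 3 x 4 grid.
road_cells = [(0, 0), (1, 0), (2, 0)]
dist = euclidean_distance_raster(3, 4, road_cells, cell_size=0.5)
```

The brute-force search is quadratic in the number of cells and features, so it is only practical for small grids.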

6.3 Methodology

6.3.1 Overall Research Flow

This study encompasses four methodological steps. The first step comprises the multicollinearity and factor effect analyses. In the second step, relevant factors among the initial 15 landslide conditioning factors are selected using ACO. The third step involves the application of the susceptibility models in several experiments that aim to analyze the effects of relevant factors. In the last step, the susceptibility models are validated using receiver operating characteristic (ROC) curves. The overall workflow of this study is shown in Fig. 6.3.
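ROC-based validation is commonly summarized by the area under the curve (AUC). As a minimal sketch, the AUC can be computed in Python as the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen non-landslide cell, with ties counted as one half. The scores and labels below are hypothetical, and this pairwise formulation is one standard way to obtain the AUC, not necessarily the procedure used in the chapter.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise score comparisons."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both landslide and non-landslide cells are required")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # landslide cell ranked above non-landslide
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical susceptibility scores; label 1 marks a landslide cell.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
value = auc(scores, labels)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.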

6.3.2 Selection of Relevant Factors Using ACO

ACO is a metaheuristic optimization technique whose applications have grown significantly. Its advantages include probabilistic decision making based on artificial pheromone trails and local heuristic information, which enables the exploration of a larger number of solutions than greedy heuristics (Gottlieb et al. 2003). The overall workflow of the ACO-based landslide factor selection is presented in Fig. 6.4. First, ants are generated and placed randomly on a graph, i.e., each ant starts with one random landslide factor. The number of ants placed on the graph may be set equal to the number of factors in the data so that each ant initiates path construction at a different factor. The ants traverse nodes probabilistically from their initial positions until a traversal stopping criterion is satisfied. The resulting subsets are gathered and evaluated. When an optimal subset has been found or when the algorithm has been executed a certain number of times, the process stops and the best factor subset encountered is output. If neither of these conditions holds, then the pheromone is updated, a new set of ants is created, and the process is reiterated.
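The loop described above can be sketched in Python as follows. This is a generic illustration and not the authors' implementation. The fixed subset size used as the traversal stopping criterion, the pheromone update rule, the parameters `alpha`, `beta`, and `evaporation`, and the toy `usefulness` scores are all assumptions introduced for the example. In practice, the `score` function would evaluate a susceptibility model built on the candidate subset.

```python
import random


def aco_select(n_factors, score, heuristic, subset_size, n_ants=None,
               n_iterations=30, alpha=1.0, beta=1.0, evaporation=0.2, seed=0):
    """Return (best_subset, best_score) found by a basic ACO subset search."""
    if not 1 <= subset_size <= n_factors:
        raise ValueError("subset_size must be between 1 and n_factors")
    rng = random.Random(seed)
    n_ants = n_ants or n_factors              # one ant per factor by default
    pheromone = [1.0] * n_factors             # uniform initial pheromone
    best_subset, best_score = None, float("-inf")

    for _ in range(n_iterations):             # stop after a fixed number of runs
        starts = rng.sample(range(n_factors), n_factors)
        round_subset, round_score = None, float("-inf")
        for ant in range(n_ants):
            subset = [starts[ant % n_factors]]        # random, distinct start
            while len(subset) < subset_size:          # traversal stopping rule
                candidates = [f for f in range(n_factors) if f not in subset]
                weights = [pheromone[f] ** alpha * heuristic[f] ** beta
                           for f in candidates]
                subset.append(rng.choices(candidates, weights=weights)[0])
            value = score(subset)                     # evaluate the subset
            if value > round_score:
                round_subset, round_score = subset, value
        if round_score > best_score:
            best_subset, best_score = sorted(round_subset), round_score
        # Pheromone update: evaporate everywhere, reinforce the round's best.
        pheromone = [(1.0 - evaporation) * p for p in pheromone]
        for f in round_subset:
            pheromone[f] += max(round_score, 0.0)
    return best_subset, best_score


# Toy data: hypothetical usefulness of six factors (not from the chapter).
usefulness = [0.9, 0.1, 0.8, 0.2, 0.7, 0.1]
subset, value = aco_select(
    n_factors=6,
    score=lambda s: sum(usefulness[f] for f in s),
    heuristic=[1.0] * 6,                      # uninformative heuristic
    subset_size=3,
)
```

Fixing the random seed makes the search reproducible. Using a fixed iteration count as the only termination rule is a simplification of the two stopping conditions described in the text.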

6.3.3 Susceptibility Models

In this study, susceptibility maps were produced using two data mining approaches: SVM and RF. These algorithms were used to determine whether the results were consistent or whether the performance of the susceptibility models with significant factors varied from one model to another. The subsequent sections briefly describe the basic concepts of the algorithms.

Fig. 6.3 Overall research activities used to optimize landslide conditioning factors, conduct factor effect analysis, and develop improved susceptibility models

Fig. 6.4 Overall workflow of factor subset selection by the ACO method

6.3.3.1 SVM

SVM was originally developed by Vapnik (1995) as a more recent machine learning method than artificial neural networks. SVM uses the training data to map the original input space implicitly into a high-dimensional feature space by means of kernel functions (Brenning 2005). Subsequently, the optimal hyperplane in the feature space is determined by maximizing the margins of class boundaries (Abe 2005). SVM training is therefore formulated as a constrained optimization problem that is solved in its dual form. In general, kernel types include linear, polynomial, and radial basis function (RBF) or Gaussian kernels. The RBF kernel was applied in this study because it has been reported to be the most powerful kernel for addressing nonlinear cases (Yao et al. 2008).
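The RBF kernel measures the similarity of two samples as a decreasing function of their squared Euclidean distance, $K(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$. The Python sketch below evaluates this standard form. The value of `gamma` and the sample vectors are assumptions for illustration, as the chapter does not report the kernel parameters used.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2) for two vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical two-factor samples (e.g., scaled slope and NDVI values).
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical samples
near = rbf_kernel([0.0, 0.0], [1.0, 1.0])   # squared distance of 2
far = rbf_kernel([0.0, 0.0], [3.0, 4.0])    # squared distance of 25
```

Identical samples give a kernel value of one, and the value decays toward zero as samples move apart. A larger `gamma` makes this decay faster.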

6.3.3.2 RF

RF is an ensemble machine learning method that generates numerous classification trees and combines them to compute a classification (Breiman et al. 1984; Breiman 2001). Hansen and Salamon (1990) indicated that a necessary and sufficient condition for an ensemble of classification trees to be more accurate than any of its individual members is that the members of the ensemble perform better than random guessing and are diverse. RF increases diversity among classification trees by resampling the data with replacement and randomly changing the predictive variable sets over different tree induction processes. The RF algorithm involves two main user-defined parameters that require appropriate specification: the number of trees (k) and the number of predictive variables sampled at each node (m). A predictive variable may be numerical or categorical, and translation into design variables is unnecessary. An unbiased estimate of the generalization error is obtained during the construction of an RF: the proportion of misclassifications (%) over all out-of-bag (OOB) elements is called the OOB error. Breiman (2001) proved that RF produces a limiting value of the generalization error; as the number of trees increases, the generalization error always converges. The value of k must be set sufficiently high to allow this convergence. The RF algorithm estimates the importance of a predictive variable by examining the OOB errors: the larger the increase in the OOB error, the more important the predictive variable.
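The OOB elements arise directly from resampling with replacement: each tree is trained on a bootstrap sample, and the observations left out of that sample serve as its test set. The Python sketch below illustrates only this resampling step, not tree induction. The sample size is hypothetical; the tree count of 500 matches the value used in this study.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw a bootstrap sample and return (in_bag, out_of_bag) index lists."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    in_bag_set = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in in_bag_set]
    return in_bag, out_of_bag


def mean_oob_fraction(n_samples, n_trees, seed=0):
    """Average share of observations left out of bag across all trees."""
    rng = random.Random(seed)
    total_oob = 0
    for _ in range(n_trees):
        _, out_of_bag = bootstrap_split(n_samples, rng)
        total_oob += len(out_of_bag)
    return total_oob / (n_trees * n_samples)


in_bag, out_of_bag = bootstrap_split(200, random.Random(1))
fraction = mean_oob_fraction(n_samples=200, n_trees=500)
```

On average, a little over one third of the observations are out of bag for any given tree, because the probability that an observation is never drawn in n draws is (1 − 1/n)^n, which approaches 1/e for large n.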

The advantages of RF include resistance to overtraining and the capability to grow a large number of trees without creating a risk of overfitting. Input data for RF do not need to be rescaled, transformed, or modified, and the algorithm is resistant to outliers in the predictors. In this study, the number of trees was fixed at 500 after a preliminary analysis, and the number of variables sampled at each node (m) was set at 3 to analyze the combined contributions of subsets of features while maintaining fast convergence during iterations. No calibration set is required to tune the parameters (Micheletti et al. 2014). The importance and standardized rank of each landslide variable were calculated. The ranks were then used to overlay the landslide factors and generate the susceptibility maps.

6.4 Results

6.4.1 Multicollinearity Analysis

Multicollinearity analysis is an important step in LSM. An exact linear relationship among factors creates a division-by-zero problem during regression calculations, which can cause the calculations to be aborted; when the relationship is only near-linear, division by an extremely small quantity still distorts the results. Therefore, analyzing landslide conditioning factors before LSM is important. In multicollinearity analysis, collinear (dependent) factors are identified by examining a correlation matrix constructed by calculating R². Various quantitative methods for detecting multicollinearity are available, such as pairwise scatter plots, estimation of the variance inflation factor (VIF), and investigation of eigenvalues in a correlation matrix. In this study, multicollinearity was detected by calculating the VIF value of each landslide conditioning factor. In addition, communalities similar to R² were calculated for each factor (Costello 2009). Communality shows how well a variable is predicted by the retained factors. Table 6.1 presents the estimated communalities and VIF values for each landslide conditioning factor. The second column of Table 6.1 indicates that some factors, such as land use, distance to road, distance to river, slope, STI, TWI, and TRI, exhibit strong linear relationships with other factors. These factors may negatively affect the regression analysis. However, VIF values are quantitative measures that are typically used to conclude whether a factor has a problem. In some studies, a VIF greater than two was considered problematic, whereas in other studies, a VIF greater than 10 was considered problematic (Garrosa et al. 2010). To solve the
