SAR Image Classification by Support Vector Machine
Michifumi Yoshioka, Toru Fujinaka, and Sigeru Omatu
CONTENTS
15.1 Introduction
15.2 Proposed Method
15.3 Simulation
15.3.1 Data Set and Condition for Simulations
15.3.2 Simulation Results
15.3.3 Reduction of SVM Learning Cost
15.4 Conclusions
References
15.1 Introduction
Remote sensing is the term used for observing the strength of electromagnetic radiation that is radiated or reflected from objects on the ground with a sensor installed in a space satellite or an aircraft. The analysis of the acquired data is an effective means of surveying vast areas periodically [1]. Land map classification is one such analysis: it classifies the surface of the Earth into categories such as water areas, forests, factories, or cities. In this study, we discuss an effective method for land map classification using synthetic aperture radar (SAR) and a support vector machine (SVM). The sensors installed in space satellites include optical and microwave sensors; SAR, an active-type microwave sensor, is used for land map classification in this study. A feature of SAR is that it is not influenced by weather conditions [2–9]. As the classifier, an SVM is adopted, which is known as one of the most effective methods in pattern and texture classification; texture patterns are composed of many pixels and are used as input features for the SVM [10–12]. Traditionally, the maximum likelihood method has been used as a general technique for land map classification. However, it might not achieve high accuracy because it assumes a normal distribution of the data in each category. Finally, the effectiveness of our proposed method is shown by simulations.
© 2008 by Taylor & Francis Group, LLC
15.2 Proposed Method
The outline of the proposed method is described here. First, the target images from SAR are divided into areas of 8 × 8 pixels for the calculation of texture features. The texture features that serve as input data to the SVM are calculated using the gray level co-occurrence matrix (GLCM), Cij, and the gray level difference matrix (GLDM), Dk. The GLCM entry Cij is the co-occurrence probability that neighboring pixels take the gray levels i and j, and the GLDM entry Dk is the probability of a gray level difference of k between neighboring pixels whose distance
is k. The definitions of the texture features based on GLCM and GLDM are as follows:

Energy (GLCM): $\sum_{i,j} C_{ij}^2$

Entropy (GLCM): $-\sum_{i,j} C_{ij} \log C_{ij}$

Local homogeneity: $\sum_{i,j} \frac{1}{1 + (i - j)^2}\, C_{ij}$

Inertia: $\sum_{i,j} (i - j)^2\, C_{ij}$

Correlation: $\sum_{i,j} \frac{(i - m_i)(j - m_j)}{s_i s_j}\, C_{ij}$, where

$m_i = \sum_i i \sum_j C_{ij}, \qquad m_j = \sum_j j \sum_i C_{ij}$

$s_i^2 = \sum_i (i - m_i)^2 \sum_j C_{ij}, \qquad s_j^2 = \sum_j (j - m_j)^2 \sum_i C_{ij}$

Variance: $\sum_{i,j} (i - m_i)^2\, C_{ij}$

Sum average: $\frac{1}{2} \sum_{i,j} (i + j)\, C_{ij}$

Energy (GLDM): $\sum_k D_k^2$

Entropy (GLDM): $-\sum_k D_k \log D_k$

Mean: $\sum_k k\, D_k$

Difference variance: $\sum_k \left(k - \sum_{k'} k' D_{k'}\right)^2 D_k$
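The matrices and the features above can be sketched in plain NumPy. This is a minimal illustration, assuming 16 quantized gray levels and a single neighbor offset of (0, 1); the chapter computes these per band over 8 × 8 areas, and the function names are mine:

```python
import numpy as np

def glcm(patch, levels=16, offset=(0, 1)):
    """Co-occurrence probability C[i, j] of gray levels i and j at `offset`."""
    dr, dc = offset
    C = np.zeros((levels, levels))
    rows, cols = patch.shape
    for r in range(rows - dr):
        for c in range(cols - dc):
            C[patch[r, c], patch[r + dr, c + dc]] += 1
    return C / C.sum()

def gldm(patch, levels=16, offset=(0, 1)):
    """Probability D[k] that pixels at `offset` differ by k gray levels."""
    dr, dc = offset
    a = patch[: patch.shape[0] - dr, : patch.shape[1] - dc]
    b = patch[dr:, dc:]
    D = np.bincount(np.abs(a - b).ravel(), minlength=levels).astype(float)
    return D / D.sum()

def glcm_features(C):
    i, j = np.indices(C.shape)
    mi, mj = (i * C).sum(), (j * C).sum()
    si = np.sqrt((((i - mi) ** 2) * C).sum())
    sj = np.sqrt((((j - mj) ** 2) * C).sum())
    eps = 1e-12  # avoid log(0) in the entropy term
    return {
        "energy": (C ** 2).sum(),
        "entropy": -(C * np.log(C + eps)).sum(),
        "local_homogeneity": (C / (1.0 + (i - j) ** 2)).sum(),
        "inertia": (((i - j) ** 2) * C).sum(),
        "correlation": ((i - mi) * (j - mj) * C).sum() / (si * sj),
        "variance": (((i - mi) ** 2) * C).sum(),
        "sum_average": 0.5 * ((i + j) * C).sum(),
    }

def gldm_features(D):
    k = np.arange(D.size)
    eps = 1e-12
    mean = (k * D).sum()
    return {
        "energy": (D ** 2).sum(),
        "entropy": -(D * np.log(D + eps)).sum(),
        "mean": mean,
        "difference_variance": (((k - mean) ** 2) * D).sum(),
    }

# Example: texture features of one 8 x 8 patch quantized to 16 levels.
patch = (np.arange(64).reshape(8, 8) * 7) % 16
features = {"glcm_" + k: v for k, v in glcm_features(glcm(patch)).items()}
features.update({"gldm_" + k: v for k, v in gldm_features(gldm(patch)).items()})
```

With 7 GLCM and 4 GLDM features per band, applying this to all 8 bands yields the 88 candidate features used below.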
The next step is to select effective texture features as input to the SVM, since there are too many texture features to feed to the SVM directly [(7 GLCM + 4 GLDM) × 8 bands = 88 features in total]. The Kullback–Leibler distance is adopted as the feature selection criterion in this study. The Kullback–Leibler distance between two probability density functions p(x) and q(x) is defined as

$L = \int p(x) \log \frac{p(x)}{q(x)}\, dx$

Using this as the distance measure, the distances between two categories under the selected features can be compared, and the feature combinations whose distance is large are selected as input to the SVM. However, it is too costly to evaluate all combinations of the 88 features; therefore, in this study, each 5-feature combination out of the 88 is tested for selection.
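The selection criterion can be sketched as follows. This is a simplified illustration under my own assumptions: the densities are approximated by per-feature histograms, the distance is symmetrized, and single features are ranked rather than 5-feature combinations:

```python
import numpy as np

def kl_distance(p, q, eps=1e-12):
    """Discrete Kullback-Leibler distance L = sum p log(p / q) over histogram bins."""
    p = p + eps
    q = q + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def rank_features(cat_a, cat_b, bins=32):
    """Score each feature column by the symmetrized KL distance between the
    two categories' histograms; a large distance marks a discriminative feature."""
    scores = []
    for f in range(cat_a.shape[1]):
        lo = min(cat_a[:, f].min(), cat_b[:, f].min())
        hi = max(cat_a[:, f].max(), cat_b[:, f].max())
        pa, _ = np.histogram(cat_a[:, f], bins=bins, range=(lo, hi))
        pb, _ = np.histogram(cat_b[:, f], bins=bins, range=(lo, hi))
        pa, pb = pa.astype(float), pb.astype(float)
        scores.append(kl_distance(pa, pb) + kl_distance(pb, pa))
    return np.argsort(scores)[::-1]  # feature indices, most discriminative first
```

In the chapter the score is evaluated on each 5-feature combination of the 88 features; scoring the joint distribution of a combination rather than single columns follows the same pattern.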
Then the selected features are fed to the SVM for classification. The SVM classifies the data into two categories at a time. Therefore, in this study, the input data are first classified into two sets: a set of water and cultivation areas, and a set of city and factory areas. In the second stage, each of these two sets is classified into its two categories. In this step, it is important to reduce the learning cost of the SVM, since the remote sensing data from SAR are too large for learning. In this study, we propose a method for reducing the SVM learning cost by extracting the data surrounding the category boundaries, based on the distance in the kernel space, because the boundary data of the categories determine the SVM learning efficiency. The distance d(x) of an element x from the category to which it belongs is defined in the kernel space as follows, using the mapping Φ(x):
$d^2(x) = \left( \Phi(x) - \frac{1}{n} \sum_{k=1}^{n} \Phi(x_k) \right)^t \left( \Phi(x) - \frac{1}{n} \sum_{l=1}^{n} \Phi(x_l) \right)$
$= \Phi(x)^t \Phi(x) - \frac{1}{n} \sum_{l=1}^{n} \Phi(x)^t \Phi(x_l) - \frac{1}{n} \sum_{k=1}^{n} \Phi(x_k)^t \Phi(x) + \frac{1}{n^2} \sum_{k=1}^{n} \sum_{l=1}^{n} \Phi(x_k)^t \Phi(x_l) \qquad (15.13)$
Here, x_k denotes the elements of the category and n is the total number of its elements.
Using the distance d(x), the relative distances r1(x) and r2(x) can be defined as
$r_1(x) = \frac{d_2(x)}{d_1(x)} \qquad (15.14)$
$r_2(x) = \frac{d_1(x)}{d_2(x)} \qquad (15.15)$
In these equations, d1(x) and d2(x) denote the distance of the element x from category 1 or category 2, respectively. The half of the data in each category with the smallest relative distance is extracted and fed to the SVM. To evaluate this extraction method against the traditional method based on the Mahalanobis distance, simulations are performed using sample data 1 and 2, illustrated in Figure 15.1 through Figure 15.4. The distributions of samples 1 and 2 are Gaussian. In sample 1, the centers of the distributions of classes 1 and 2 are (−0.5, 0) and (0.5, 0); in sample 2, the two centers of class 1 lie at ±0.6 from the origin and the center of class 2 is (0, 0). The variances of the distributions are 0.03 and 0.015, respectively. The total number of data is
500 per class. The kernel function used in this simulation is as follows:
$K(x, x') = \Phi(x)^T \Phi(x') = \exp\left( -\frac{\|x - x'\|^2}{2\sigma^2} \right), \qquad 2\sigma^2 = 0.1 \qquad (15.16)$
As the simulation results illustrated in Figure 15.2, Figure 15.3, Figure 15.5, and Figure 15.6 show, in the case of sample 1 both the proposed method and the Mahalanobis-based method classify the data successfully. In the case of sample 2, however, the Mahalanobis-based method fails while the proposed method succeeds, because the Mahalanobis-based method assumes that the data distribution is spheroidal. The distance functions of the two methods, illustrated in Figure 15.7 through Figure 15.10, clearly show the reason for this difference in classification ability.

FIGURE 15.1 Sample data 1.
FIGURE 15.2 Extracted boundary elements by the proposed method (sample 1).
FIGURE 15.3 Extracted boundary elements by Mahalanobis distance (sample 1).
FIGURE 15.4 Sample data 2.
FIGURE 15.5 Extracted boundary elements by the proposed method (sample 2).
FIGURE 15.6 Extracted boundary elements by Mahalanobis distance (sample 2).
FIGURE 15.7 Distance d1(x) for class 1 in sample 2 (proposed method).
15.3 Simulation
15.3.1 Data Set and Condition for Simulations
The target data (Figure 15.11) used for classification in this study are observational data from SIR-C. The SIR-C device is a SAR system that uses two wavelengths, L-band (wavelength 23 cm) and C-band (wavelength 6 cm), and four polarizations of electromagnetic radiation. The observed region is Sakaide City, Kagawa Prefecture, Japan (October 3, 1994). The images are 1000 pixels in height and 696 pixels in width, and each pixel has 256 gray levels in eight bands.

FIGURE 15.8 Distance d2(x) for class 2 in sample 2 (proposed method).
FIGURE 15.9 Distance d1(x) for class 2 in sample 2 (Mahalanobis).

To extract the texture features, areas of 8 × 8 pixels on the target data are combined and classified into four categories: ''water area,'' ''cultivation region,'' ''city region,'' and ''factory region.'' The mountain region is not classified because of its backscatter. The ground-truth data for training are shown in Figure 15.12, and the number of samples in each category is shown in Table 15.1.

The texture features selected by the Kullback–Leibler distance described in the previous section are shown in Table 15.2.
The kernel function of the SVM is the Gaussian kernel with variance σ² = 0.5, and the soft-margin parameter C is 1000. The SVM training data are 100 samples randomly selected from the ground-truth data for each category.
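With these settings, the two-stage classification of Section 15.2 might look as follows. This is a sketch in scikit-learn (an assumption, as the chapter does not name an implementation); a Gaussian kernel variance of σ² = 0.5 corresponds to gamma = 1/(2σ²) = 1.0 for the RBF kernel:

```python
import numpy as np
from sklearn.svm import SVC

GAMMA = 1.0 / (2 * 0.5)  # Gaussian kernel with sigma^2 = 0.5
C_SOFT = 1000.0          # soft-margin parameter

def train_two_stage(X, y):
    """Stage 1 separates {water, cultivation} from {city, factory};
    stage 2 resolves each pair into its two categories."""
    wc = np.isin(y, ["water", "cultivation"])
    stage1 = SVC(kernel="rbf", gamma=GAMMA, C=C_SOFT).fit(X, wc.astype(int))
    stage2_wc = SVC(kernel="rbf", gamma=GAMMA, C=C_SOFT).fit(X[wc], y[wc])
    stage2_cf = SVC(kernel="rbf", gamma=GAMMA, C=C_SOFT).fit(X[~wc], y[~wc])
    return stage1, stage2_wc, stage2_cf

def predict_two_stage(models, X):
    stage1, stage2_wc, stage2_cf = models
    first = stage1.predict(X).astype(bool)  # True -> water/cultivation branch
    out = np.empty(len(X), dtype=object)
    if first.any():
        out[first] = stage2_wc.predict(X[first])
    if (~first).any():
        out[~first] = stage2_cf.predict(X[~first])
    return out
```

Each stage is an ordinary binary soft-margin SVM; the cascade simply routes a sample through two binary decisions instead of training a single four-class machine.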
15.3.2 Simulation Results
The final result of the simulation is shown in Table 15.3. In this table, ''selected'' denotes the classification accuracy with the selected texture features of Table 15.2, and ''all'' denotes the accuracy with all 88 texture features. The result shows the effectiveness of feature selection for improving classification accuracy.
15.3.3 Reduction of SVM Learning Cost
FIGURE 15.10 Distance d2(x) for class 2 in sample 2 (Mahalanobis).

The learning time of the SVM depends on the number of samples. Therefore, a method for reducing the computational cost of SVM learning is important for complex data sets such as the ''city'' and ''factory'' regions in this study. The reduction method proposed in the previous section is applied, and its effectiveness is evaluated by comparing learning times with the traditional approach. The number of samples ranges from 200 to 4000, and the learning time of the SVM classifier is measured in two cases: in the first case, all data are used for learning; in the second case, the learning data are reduced by half using the proposed method. The texture features selected for classification are the energy (band 1), the entropy (band 6), and the local homogeneity (band 2), and the SVM kernel is Gaussian. Figure 15.13 shows the result of the simulation. The CPU of the computer used in this simulation is a Pentium 4 at 2 GHz. The result clearly shows that the learning time is reduced by the proposed method, to about 50% of the original on average.
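The timing comparison can be reproduced in outline as follows. This is a sketch under my own assumptions: scikit-learn stands in for the SVM implementation, a random half of the data stands in for the boundary-extraction output, and absolute times depend on the machine:

```python
import time
import numpy as np
from sklearn.svm import SVC

def fit_seconds(X, y):
    """Wall-clock time to train one soft-margin RBF SVM."""
    t0 = time.perf_counter()
    SVC(kernel="rbf", gamma=1.0, C=1000.0).fit(X, y)
    return time.perf_counter() - t0

rng = np.random.default_rng(0)
n = 2000
X = np.vstack([rng.normal(-0.3, 0.4, (n // 2, 2)),
               rng.normal(0.3, 0.4, (n // 2, 2))])
y = np.repeat([0, 1], n // 2)

t_full = fit_seconds(X, y)
half = rng.choice(n, n // 2, replace=False)  # stand-in for boundary extraction
t_half = fit_seconds(X[half], y[half])
print(f"full set: {t_full:.3f} s, half set: {t_half:.3f} s")
```

Because SVM training cost grows faster than linearly in the number of samples, halving the training set typically reduces the fit time by well over half, which is consistent with the trend in Figure 15.13.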
FIGURE 15.11
Target data.
FIGURE 15.12 Ground-truth data for training (categories: water, city, cultivation, factory, mountain).
TABLE 15.1
Number of Data
TABLE 15.2 Selected Features

Water, cultivation / city, factory: correlation (1, 2), sum average (1)
City / factory: energy (1), entropy (6), local homogeneity (2)

Numbers in parentheses indicate the SAR band.
15.4 Conclusions
In this chapter, we have proposed an automatic selection of texture feature combinations based on the Kullback–Leibler distance between category data distributions, together with a method for reducing the computational cost of SVM classifier learning. Simulations show that the proposed texture feature selection method combined with the SVM classifier achieves higher classification accuracy than traditional methods. In addition, the proposed SVM learning method can be applied to more complex distributions than traditional methods can.
References
1. Richards, J.A., Remote Sensing Digital Image Analysis, 2nd ed., Springer-Verlag, Berlin, p. 246, 1993.
2. Hara, Y., Atkins, R.G., Shin, R.T., Kong, J.A., Yueh, S.H., and Kwok, R., Application of neural networks for sea ice classification in polarimetric SAR images, IEEE Transactions on Geoscience and Remote Sensing, 33, 740, 1995.
3. Heermann, P.D. and Khazenie, N., Classification of multispectral remote sensing data using a back-propagation neural network, IEEE Transactions on Geoscience and Remote Sensing, 30, 81, 1992.
TABLE 15.3 Classification Accuracy (%)

          Water   Cultivation   City    Factory   Average
Selected  99.82   94.37         93.18   89.77     94.29
All       99.49   94.35         92.18   87.55     93.39
FIGURE 15.13 SVM learning time versus number of data (normal vs. accelerated learning).
4. Yoshida, T., Omatu, S., and Teranishi, M., Pattern classification for remote sensing data using neural network, Transactions of the Institute of Systems, Control and Information Engineers, 4, 11, 1991.
5. Yoshida, T. and Omatu, S., Neural network approach to land cover mapping, IEEE Transactions on Geoscience and Remote Sensing, 32, 1103, 1994.
6. Hecht-Nielsen, R., Neurocomputing, Addison-Wesley, New York, 1990.
7. Ulaby, F.T. and Elachi, C., Radar Polarimetry for Geoscience Applications, Artech House, Norwood, 1990.
8. Van Zyl, J.J., Zebker, H.A., and Elachi, C., Imaging radar polarization signatures: theory and observation, Radio Science, 22, 529, 1987.
9. Lim, H.H., Swartz, A.A., Yueh, H.A., Kong, J.A., Shin, R.T., and Van Zyl, J.J., Classification of earth terrain using polarimetric synthetic aperture radar images, Journal of Geophysical Research, 94, 7049, 1989.
10. Vapnik, V.N., The Nature of Statistical Learning Theory, 2nd ed., Springer, New York, 1999.
11. Platt, J., Sequential minimal optimization: A fast algorithm for training support vector machines, Technical Report MSR-TR-98-14, Microsoft Research, 1998.
12. Joachims, T., Making large-scale SVM learning practical, in B. Schölkopf, C.J.C. Burges, and A.J. Smola, Eds., Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, 1998.