EURASIP Journal on Advances in Signal Processing
Volume 2008, Article ID 374095, 11 pages
doi:10.1155/2008/374095
Research Article
Multisource Images Analysis Using Collaborative Clustering
Germain Forestier, Cédric Wemmert, and Pierre Gançarski
LSIIT, UMR 7005 CNRS/ULP, University Louis Pasteur, 67070 Strasbourg Cedex, France
Correspondence should be addressed to Germain Forestier, forestier@lsiit.u-strasbg.fr
Received 1 October 2007; Revised 20 February 2008; Accepted 26 February 2008
Recommended by C. Charrier
The development of very high-resolution (VHR) satellite imagery has produced a huge amount of data. The multiplication of satellites embedding different types of sensors provides many heterogeneous images. Consequently, the image analyst often has many different images available, representing the same area of the Earth's surface. These images can be from different dates, produced by different sensors, or even at different resolutions. The lack of machine learning tools that use all these representations in an overall process constrains the analyst to a sequential analysis of these various images. In order to use all the available information simultaneously, we propose a framework where different algorithms can use different views of the scene. Each one works on a different remotely sensed image and, thus, produces different and useful information. These algorithms work together in a collaborative way through an automatic and mutual refinement of their results, so that all the results have almost the same number of clusters, which are statistically similar. Finally, a unique result is produced, representing a consensus among the information obtained by each clustering method on its own image. The unified result and the complementarity of the single results (i.e., the agreement between the clustering methods as well as the disagreement) lead to a better understanding of the scene. The experiments carried out on multispectral remote sensing images have shown that this method is efficient at extracting relevant information and at improving scene understanding.
Copyright © 2008 Germain Forestier et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 INTRODUCTION
Unsupervised classification, also called clustering, is a well-known machine learning tool which extracts knowledge from datasets [1, 2]. The purpose of clustering is to group similar objects into subsets (called clusters), maximizing the intracluster similarity and the intercluster dissimilarity. Many clustering algorithms have been developed during the last 40 years, each one based on a different strategy. In image processing, clustering algorithms are usually used by considering the pixels of the image as data objects: each pixel is assigned to a cluster by the clustering algorithm. Then, a map is produced, representing each pixel with the colour of the cluster it has been assigned to. This cluster map, depicting the spatial distribution of the clusters, is then interpreted by the expert, who assigns to each cluster (i.e., colour in the image) a meaning in terms of thematic classes (vegetation, water, etc.).
In contrast to supervised classification, unsupervised classification requires very few inputs. The classification process only uses spectral properties to group pixels together. However, it requires a precise parametrization by the user because the classification is performed without any control. Other potential problems exist, especially when the user attempts to assign a thematic class to each produced cluster. On the one hand, some thematic classes may be represented by a mix of different types of surface covers: a single thematic class may be split among two or more clusters (e.g., a park is often an aggregate of vegetation, sand, water, etc.). On the other hand, some of the clusters may be meaningless, as they include too many mixed pixels: a mixed pixel (mixel) represents the average energy reflected by several types of surface present within the studied area.
These problems have increased with the recent availability of very high-resolution satellite sensors, which provide many details of the land cover. Moreover, several images with different characteristics are often available for the same area: different dates, different kinds of remote sensing acquisition systems (i.e., with different numbers of sensors and wavelengths), or different resolutions (i.e., different sizes of the surface area that a pixel represents on the ground). Consequently, the expert is confronted with too great a mass of data: the use of classical knowledge extraction techniques has become too complex. Specific tools are needed to extract efficiently the knowledge stored in each of the available images.
To avoid the independent analysis of each image, we propose to use different clustering methods, each working on a different image of the same area. These different clustering methods collaborate during a refinement step of their results, to converge towards a similar result. At the end of this collaborative process, the different results are combined using a voting algorithm. This unified result represents a consensus among all the knowledge extracted from the different sources. Furthermore, the voting algorithm highlights the agreement and the disagreement between the clustering methods. These two pieces of information, as well as the result produced by each clustering method, lead to a better understanding of the scene by the expert.
The paper is organized as follows. First, an overview of multisource applications is introduced in Section 2. The collaborative method to combine different clustering algorithms is then presented in Section 3. Section 4 presents in detail the paradigm of multisource images and the different ways to use it in the collaborative system. Section 5 shows an experimental evaluation of the developed methods, and finally, conclusions are drawn in Section 6.
2 MULTISOURCE IMAGES ANALYSIS
In the domain of Earth observation, many works focus on the development of data-fusion techniques to take advantage of all the available data on the studied area. As discussed in [3], multisource image analysis can be achieved at different levels, according to the stage where the fusion takes place: pixel, feature, or decision level.
At pixel level, data fusion consists in creating a fused image based on the sensors' measurements by merging the values given by the various sources. A method is proposed in [4] for combining multispectral, panchromatic, and radar images by jointly using the intensity-hue-saturation transform and the redundant wavelet decomposition. In [5], the authors propose a multisource data-fusion mechanism using generalized positive Boolean functions which consists of two steps: a band generation is carried out, followed by a classification using a positive Boolean function-based classifier. In the case of feature fusion, the first step creates new features from the various datasets; these new features are merged and analyzed in a second step. For example, a segmentation can be performed on the different image sources and these segmentations are fused [6]. In [7], the authors present another method based on the Dempster-Shafer theory of evidence and using the fuzzy statistical estimation maximization (FSEM) algorithm to find an optimal estimation of the inaccuracy and uncertainty of the classification.
The fusion of decisions consists in finding a single decision (also called consensus) from all the decisions produced by the classifiers. In [8], the authors propose a method based on the combination of neural networks for multisource classification. The system presented in [9] is composed of an ensemble of classifiers trained in a supervised way on a specific image, and can be retrained in an unsupervised way to be able to classify a new image. In [10], a general framework is presented for combining information from several supervised classifiers using a fuzzy decision rule.
In our work, we focus on the fusion of decisions from unsupervised classifications, each one produced from a different image. Contrary to the methods presented above, we propose a mechanism which finds a consensus according to the decisions taken by each of the unsupervised classifiers.
3 COLLABORATIVE CLUSTERING
Many works focus on combining different results of clustering, which is commonly called clustering aggregation [11], multiple clusterings [12], or cluster ensembles [13, 14]. All these approaches try to combine different results of clustering in a final step. In fact, these results must have the same number of clusters (vote-based methods) [14] or the expected clusters must be separable in the data space (coassociation-based methods) [12]. This latter property is almost never encountered in remote sensing image analysis. It is difficult to compute a consensual result from clustering results with different numbers of classes or different structures (flat partitioning or hierarchical result) because of the lack of a trivial correspondence between the clusters of these different results. To address the problem, we present in this section a framework where different clustering methods work together in a collaborative way to find an agreement about their proposals. This collaborative process consists in an automatic and mutual refinement of the clustering results, until all the results have almost the same number of clusters, and all the clusters are statistically similar. At the end of this process, as the results have comparable structures, it is possible to define a correspondence function between the clusters, and to apply a unifying technique such as a voting method [15].
Before the description of the collaborative method, we introduce the correspondence function used within it.
3.1 Intercluster correspondence function
Associating the classes of different supervised classifications poses no problem, as a common set of class labels is given for all the classifications. Unfortunately, in the case of unsupervised classifications, the results may not have the same number of clusters, and no information is available about the correspondence between the different clusters of the different results.
To address the problem, we have defined a new intercluster correspondence function, which associates to each cluster from a result a cluster from each of the other results. Let $\{R^i\}_{1 \le i \le m}$ be the set of results given by the different algorithms. Let $\{C_k^i\}_{1 \le k \le n_i}$ be the clusters of the result $R^i$. Figure 1 shows an example of such results.
Figure 1: Two clustering results of the same data but using a different method.
The corresponding cluster $CC(C_k^i, R^j)$ of a cluster $C_k^i$ from $R^i$ in the result $R^j$, $i \neq j$, is the cluster from $R^j$ which is the most similar to $C_k^i$:
$$CC(C_k^i, R^j) = C_{l^*}^j \quad \text{with} \quad S(C_k^i, C_{l^*}^j) = \max\{\, S(C_k^i, C_l^j),\ \forall l \in [1, n_j] \,\}, \tag{1}$$
where $S$ is the intercluster similarity which evaluates the similarity between two clusters of two different results.
It is calculated from the overlap of the clusters in two steps. First, the intersection between each pair of clusters $(C_k^i, C_l^j)$, from two different results $R^i$ and $R^j$, is calculated and written in the confusion matrix $M^{i,j}$:
$$M^{i,j} = \begin{pmatrix} \alpha_{1,1}^{i,j} & \cdots & \alpha_{1,n_j}^{i,j} \\ \vdots & \ddots & \vdots \\ \alpha_{n_i,1}^{i,j} & \cdots & \alpha_{n_i,n_j}^{i,j} \end{pmatrix}, \quad \text{where } \alpha_{k,l}^{i,j} = \frac{|C_k^i \cap C_l^j|}{|C_k^i|}. \tag{2}$$
Then, the similarity $S(C_k^i, C_l^j)$ between two clusters $C_k^i$ and $C_l^j$ is evaluated by observing the relationship between the size of their intersection and the size of the cluster itself, and by taking into account the distribution of the data in the other clusters as follows:
$$S(C_k^i, C_l^j) = \ldots \tag{3}$$
Figure 2 presents the correspondence function obtained by using the intercluster similarity on the results shown in Figure 1.
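The correspondence function can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, each result is assumed to be a list of cluster labels (one per object, with the same objects in both results), and the similarity is assumed to be the product $\alpha_{k,l}^{i,j} \cdot \alpha_{l,k}^{j,i}$, which is one plausible reading of the description above.

```python
from collections import Counter


def confusion_matrix(labels_i, labels_j):
    """alpha[k][l] = |C_k^i ∩ C_l^j| / |C_k^i|, as in (2).

    labels_i and labels_j give the cluster of every object in results
    R^i and R^j; both lists must describe the same objects in the same order.
    """
    clusters_i = sorted(set(labels_i))
    clusters_j = sorted(set(labels_j))
    sizes_i = Counter(labels_i)
    overlap = Counter(zip(labels_i, labels_j))
    return {
        k: {l: overlap[(k, l)] / sizes_i[k] for l in clusters_j}
        for k in clusters_i
    }


def similarity(alpha_ij, alpha_ji, k, l):
    """ASSUMPTION: S(C_k^i, C_l^j) = alpha^{i,j}_{k,l} * alpha^{j,i}_{l,k}."""
    return alpha_ij[k][l] * alpha_ji[l][k]


def corresponding_cluster(alpha_ij, alpha_ji, k):
    """CC(C_k^i, R^j): the cluster of R^j most similar to C_k^i, as in (1)."""
    return max(alpha_ij[k], key=lambda l: similarity(alpha_ij, alpha_ji, k, l))


# Toy data (hypothetical): six objects clustered by two methods.
labels_i = [0, 0, 0, 1, 1, 1]
labels_j = ["a", "a", "b", "b", "c", "c"]
alpha_ij = confusion_matrix(labels_i, labels_j)
alpha_ji = confusion_matrix(labels_j, labels_i)
cc_of_0 = corresponding_cluster(alpha_ij, alpha_ji, 0)
cc_of_1 = corresponding_cluster(alpha_ij, alpha_ji, 1)
```

In this toy case, cluster 0 of the first result overlaps mostly with cluster "a" of the second, so "a" is returned as its corresponding cluster. Note that the confusion matrix is not symmetric, which is why both directions are computed.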
3.2 Collaborative process overview
The entire clustering process is broken down into the following three main phases:
(i) initial clusterings: each clustering method computes a clustering of the data using its parameters;
(ii) results refinement: a phase of convergence of the results, which consists of conflict evaluation and resolution, is iterated as long as the quality of the results and their similarity increase;
(iii) unification: the refined results are unified using a voting algorithm.
Figure 2: The correspondence between the clusters of the two results from Figure 1 using the intercluster similarity by overlap.
3.2.1 Initial clusterings
During the first step, each clustering method is initialized with its own parameters and a clustering is performed on a remotely sensed image: all the pixels are grouped into different clusters.
3.2.2 Results refinement
The mechanism we propose for refining the results is based on the concept of distributed local resolution of conflicts, by the iteration of four phases:
(i) detection of the conflicts by evaluating the dissimilarities between pairs of results;
(ii) choice of the conflicts to solve;
(iii) local resolution of these conflicts;
(iv) management of the local modifications in the global result (if they are relevant).
(a) Conflicts detection
The detection of the conflicts consists in seeking all the pairs $(C_k^i, R^j)$, $i \neq j$, such that $C_k^i \neq CC(C_k^i, R^j)$. One conflict $K_k^{i,j}$ is identified by one cluster $C_k^i$ and one result $R^j$.
We associate to each conflict a measurement of its importance, the conflict importance coefficient, calculated according to the intercluster similarity:
$$CI(K_k^{i,j}) = 1 - S(C_k^i, CC(C_k^i, R^j)). \tag{4}$$
(b) Choice of the conflicts to solve
During an iteration of refinement of the results, several local resolutions are performed in parallel. A conflict is selected in the set of existing conflicts and its resolution is started. This conflict, like all those concerning the two results involved in the conflict, is removed from the list of the conflicts. This process is iterated until the list of the conflicts is empty. Different heuristics can be used to choose the conflict to solve, according to the conflict importance coefficient (4). We choose to try to solve the most important conflict first.
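The selection heuristic just described can be sketched as follows. This is a minimal illustration, not the authors' code: the tuple layout and the function name are hypothetical, the importance values are assumed to be computed beforehand, and "all those concerning the two results involved" is read here as every conflict that touches either of the two results.

```python
def select_conflicts(conflicts):
    """Pick the conflicts to solve during one refinement iteration.

    conflicts: list of (importance, i, j, k) tuples, one per conflict
    K_k^{i,j}, where i and j index the results and k the cluster.
    The most important conflict is taken first; every remaining conflict
    that involves result i or result j is then discarded, and the
    process repeats until the list is empty.
    """
    remaining = sorted(conflicts, key=lambda c: c[0], reverse=True)
    selected = []
    while remaining:
        best = remaining[0]
        selected.append(best)
        busy = {best[1], best[2]}
        # Drop the chosen conflict and all conflicts sharing a result with it.
        remaining = [c for c in remaining if c[1] not in busy and c[2] not in busy]
    return selected


# Hypothetical conflicts between four results.
conflicts = [(0.9, 0, 1, 2), (0.7, 1, 2, 0), (0.6, 2, 3, 1), (0.2, 0, 3, 0)]
chosen = select_conflicts(conflicts)
```

Because the selected conflicts never share a result, their local resolutions can run in parallel without interfering with each other.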
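let $n = |CCs(C_k^i, R^j)|$
let $R^{i\prime}$ (resp., $R^{j\prime}$) be the result of the application of an operator on $R^i$ (resp., $R^j$)
if $n > 1$ then
    $R^{i\prime} = R^i \setminus \{C_k^i\} \cup \{\mathrm{split}(C_k^i, n)\}$
    $R^{j\prime} = R^j \setminus CCs(C_k^i, R^j) \cup \{\mathrm{merge}(CCs(C_k^i, R^j))\}$
else
    $R^{i\prime} = \mathrm{reclustering}(R^i, C_k^i)$
end if
Algorithm 1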
(c) Local resolution of a conflict
The local resolution of a conflict $K_k^{i,j}$ consists of applying an operator on each result involved in the conflict, $R^i$ and $R^j$, to try to make them more similar.
The operators that can be applied to a result are the following:
(i) merging of clusters: some clusters are merged together (all the objects are merged in a new cluster that replaces the merged clusters),
(ii) splitting of a cluster into subclusters: a clustering is applied to the objects of a cluster to produce subclusters,
(iii) reclustering of a group of objects: one cluster is removed and its objects are reclassified in all the other existing clusters.
The operator to apply is chosen according to the corresponding clusters of the cluster involved in the conflict. The corresponding clusters (CCs) of a cluster are an extension of the definition of the corresponding cluster (1):
$$CCs(C_k^i, R^j) = \{\, C_l^j \mid S(C_k^i, C_l^j) > p_{cr},\ \forall l \in [1, n_j] \,\}, \tag{5}$$
where $p_{cr}$, $0 \le p_{cr} \le 1$, is given by the user. Having found the corresponding clusters of the cluster involved in the conflict, an operator is chosen and applied as shown in Algorithm 1.
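The choice of operator can be illustrated with a short Python sketch. It only decides which operators to apply; the operators themselves are not implemented. The function names and the returned dictionary are hypothetical, and the rule used (split the conflicting cluster and merge its corresponding clusters when there are several of them, recluster otherwise) is our reading of the algorithm, so it should be taken as an assumption.

```python
def corresponding_clusters(similarities, p_cr):
    """CCs(C_k^i, R^j) as in (5): clusters of R^j whose similarity exceeds p_cr.

    similarities maps each cluster label l of R^j to S(C_k^i, C_l^j).
    """
    return [l for l, s in similarities.items() if s > p_cr]


def choose_operators(similarities, p_cr):
    """Return the operators to apply for one conflict (assumed rule).

    If several clusters of R^j correspond to C_k^i, C_k^i is split into
    that many subclusters and the corresponding clusters are merged;
    otherwise C_k^i is removed and its objects are reclustered.
    """
    ccs = corresponding_clusters(similarities, p_cr)
    n = len(ccs)
    if n > 1:
        return {"on_Ri": ("split", n), "on_Rj": ("merge", ccs)}
    return {"on_Ri": ("reclustering", None), "on_Rj": None}


# Hypothetical similarities between one cluster of R^i and three clusters of R^j.
ops_many = choose_operators({"a": 0.5, "b": 0.4, "c": 0.05}, p_cr=0.2)
ops_one = choose_operators({"a": 0.9, "b": 0.05}, p_cr=0.2)
```

A larger threshold $p_{cr}$ makes the set of corresponding clusters smaller, so reclustering is chosen more often than split and merge.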
However, the application of the two operators is not always relevant. Indeed, it does not always increase the similarity of the results involved in the conflict being treated and, above all, the iteration of conflict resolutions may lead to a trivial solution where all the methods are in agreement. For example, they can converge towards a result with only one cluster including all the objects to classify, or towards a result having one cluster for each object. These two solutions are not relevant and must be avoided.
So we defined a criterion $\gamma$, called the local similarity criterion, to evaluate the similarity between two results, based on the intercluster similarity $S$ (3) and a quality criterion $\delta$ (given by the user):
$$\gamma^{i,j} = \frac{1}{2}\left( p_s \cdot \left( \frac{1}{n_i}\sum_{k=1}^{n_i} \omega_k^{i,j} + \frac{1}{n_j}\sum_{k=1}^{n_j} \omega_k^{j,i} \right) + p_q \cdot \left( \delta^i + \delta^j \right) \right), \tag{6}$$
where
$$\omega_k^{i,j} = \sum_{l=1}^{n_j} S\left(C_k^i, CC(C_k^i, R^j)\right) \tag{7}$$
and $p_q$ and $p_s$ are given by the user ($p_q + p_s = 1$). The quality criterion $\delta^i$ represents the internal quality of a result $R^i$ (the compactness of its clusters, e.g.).
At the end of each conflict resolution, the local similarity criterion makes it possible to choose which pair of results is to be kept: the two new results, the two old results, or one new result with one old result.
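The local similarity criterion can be sketched in Python as below. This is an illustration under explicit assumptions rather than the authors' implementation: the function name and the matrix layout are hypothetical, and $\omega_k^{i,j}$ is taken here simply as the similarity between $C_k^i$ and its corresponding cluster (the largest similarity in row $k$), without an outer sum over $l$.

```python
def local_similarity(s_ij, s_ji, delta_i, delta_j, p_s=0.5, p_q=0.5):
    """Local similarity criterion gamma^{i,j}, following the shape of (6).

    s_ij[k][l] holds S(C_k^i, C_l^j) and s_ji[l][k] holds S(C_l^j, C_k^i).
    ASSUMPTION: omega_k^{i,j} is taken as the similarity between C_k^i and
    its corresponding cluster, i.e. the largest value in row k.
    delta_i and delta_j are the user-supplied quality scores in [0, 1].
    """
    if abs(p_s + p_q - 1.0) > 1e-9:
        raise ValueError("p_s and p_q must sum to 1")
    mean_omega_ij = sum(max(row) for row in s_ij) / len(s_ij)
    mean_omega_ji = sum(max(row) for row in s_ji) / len(s_ji)
    return 0.5 * (p_s * (mean_omega_ij + mean_omega_ji) + p_q * (delta_i + delta_j))


# Two identical two-cluster results of perfect quality.
gamma_same = local_similarity([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], 1.0, 1.0)
# One cluster in R^i against two clusters in R^j (hypothetical values).
gamma_diff = local_similarity([[0.5, 0.2]], [[0.5], [0.2]], 0.8, 0.6)
```

With these conventions the criterion stays in $[0, 1]$ and reaches 1 only when the two results match exactly and both have maximal quality.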
(d) Global management of the local modifications
After the resolution of all these local conflicts, a global application of the modifications proposed by the refinement step is decided if it improves the quality of the global result. The global agreement coefficient of the results is evaluated according to all the local similarities between each pair of results. It evaluates the global similarity of the results and their quality:
$$\Gamma = \frac{1}{m}\sum_{i=1}^{m} \Gamma^i,$$
where
$$\Gamma^i = \frac{1}{m-1}\sum_{\substack{j=1 \\ j \neq i}}^{m} \gamma^{i,j}.$$
Even if the local modifications decrease this global agreement coefficient, the solution is accepted to avoid falling into a local maximum. If the coefficient decreases too much, all the results are reinitialized to the best temporary solution (the one with the best global agreement coefficient). The global process is iterated as long as some conflicts can be solved.
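As a small illustration, the global agreement coefficient and the rollback rule could be written as follows. The sketch assumes that $\Gamma$ averages the local similarities $\gamma^{i,j}$ over all ordered pairs of distinct results, as the text describes; the function names and the `tolerance` parameter (how much decrease counts as "too much") are hypothetical.

```python
def global_agreement(gamma):
    """Global agreement coefficient from pairwise local similarities.

    gamma[i][j] is the local similarity between results i and j (the
    diagonal is ignored). Each result gets the mean of its similarities
    to the m - 1 other results; the coefficient is the mean over results.
    """
    m = len(gamma)
    if m < 2:
        raise ValueError("at least two results are needed")
    per_result = [
        sum(gamma[i][j] for j in range(m) if j != i) / (m - 1)
        for i in range(m)
    ]
    return sum(per_result) / m


def accept_or_rollback(candidate_score, best_score, tolerance):
    """Return "rollback" if the candidate falls too far below the best score.

    ASSUMPTION: "too much" is modelled by a fixed, user-chosen tolerance.
    """
    return "rollback" if best_score - candidate_score > tolerance else "accept"


# Hypothetical local similarities between three results.
gamma = [
    [0.0, 0.8, 0.6],
    [0.8, 0.0, 0.4],
    [0.6, 0.4, 0.0],
]
score = global_agreement(gamma)
```

Accepting small decreases lets the process leave a local maximum, while the rollback keeps it from drifting far away from the best solution seen so far.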
3.2.3 Unification
In the final step, all the results tend to have the same number of clusters, which are increasingly similar. Thus, we use a voting algorithm [15] to compute a unified result combining the different results. This multiview voting algorithm makes it possible to combine in one unique result many different clustering results that do not necessarily have the same number of clusters. The basic idea is that for each object to cluster, each result $R^i$ votes for the cluster it has found for this object, $C_k^i$ for example, and for the corresponding cluster of $C_k^i$ in all the other results. The maximum of these values indicates the best cluster for the object, for example $C_l^j$. This means that this object should be in the cluster $C_l^j$ according to the opinion of all the methods.
After having done the vote for all objects, a new cluster is created for each best cluster found if a majority of the methods has voted for this cluster. If not, the object is assigned to a special cluster, containing all the objects that do not have the majority, which means they have been classified differently in too many results.
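The voting step can be sketched as follows. This is a simplified illustration of the idea described above, not the algorithm of [15]: the data layout (`labelings`, `corr`), the function name, and the strict-majority rule are assumptions, and ties between equally voted clusters are broken arbitrarily.

```python
from collections import Counter

UNDECIDED = None  # label of the special cluster for objects without a majority


def vote(labelings, corr):
    """Unify m clustering results of the same objects by voting.

    labelings[i][o] is the cluster of object o in result i.
    corr[i][j][k] is the cluster of result j corresponding to cluster k
    of result i (defined for i != j).
    Returns, per object, a (result index, cluster) pair or UNDECIDED.
    """
    m = len(labelings)
    n_objects = len(labelings[0])
    unified = []
    for o in range(n_objects):
        votes = Counter()
        for i in range(m):
            k = labelings[i][o]
            votes[(i, k)] += 1  # each result votes for its own cluster
            for j in range(m):
                if j != i:
                    votes[(j, corr[i][j][k])] += 1  # and for the corresponding ones
        best, count = votes.most_common(1)[0]
        # Each method casts at most one vote per cluster, so count <= m.
        unified.append(best if count > m / 2 else UNDECIDED)
    return unified


# Two hypothetical results over three objects, with their correspondences.
labelings = [[0, 0, 1], ["a", "b", "b"]]
corr = [
    [None, {0: "a", 1: "b"}],
    [{"a": 0, "b": 1}, None],
]
unified = vote(labelings, corr)
```

In this example the two methods agree on the first and third objects, which receive a cluster, while the second object is classified differently by the two methods and ends up in the special cluster.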
Figure 3: Different points of view $V_1$ to $V_n$ on a same object $O$ (the river) producing different descriptions $D_1$ to $D_n$ of the object.
4 MULTISOURCE IMAGE PARADIGM
The method described in the previous section can use different types of clustering algorithms, but they work with only one common dataset (i.e., the same image for each clustering algorithm). In this section, we describe how we make the collaborative method able to combine different sources of data and to extract knowledge from them.
The problem can be described as follows. There exists one real object $O$ that can be viewed from different points of view, and the goal is to find one description of this object, according to all the different points of view (Figure 3). Each view $V_i$ of the object is represented by a data set $D_i$ which is composed of many elements $\{E_1^i, \ldots, E_{N_i}^i\}$. Each element $E_k^i$ is described by a set of attributes $\{(a_l^{i,k}, v_l^{i,k})\}_{1 \le l \le n_{i,k}}$, each composed of a name $a$ and a value $v$.
Three different cases can occur (Figure 4):
(a) $E_k^i = E_k^j$ for all $i, j$, $a_l^{i,k} = a_l^{j,k}$ for all $l$, and $v_l^{i,k} \neq v_l^{j,k}$ (e.g., two remote sensing images of the same region, from the same satellite, but at different seasons);
(b) $E_k^i = E_k^j$ for all $i, j$ and $a_l^{i,k} \neq a_l^{j,k}$ (e.g., two remote sensing images of the same region, having the same resolution, but from two different satellites with different sensors);
(c) $E_k^i \neq E_k^j$ for all $i, j \mid i \neq j$ (e.g., two remote sensing images of the same region, but having different resolutions, and from two different satellites with different sensors).
4.1 Multisource objects clustering
A first method to classify multisource objects is to merge the attributes from the different sources. Each object has a new description composed of the attributes of all the sources (Figure 5(a)). But this technique may produce many clusters because the description of the object would be too precise (i.e., would have a large number of attributes). So it is hard to discriminate the objects. Indeed, due to the curse of dimensionality [16], most of the classical distance-based algorithms are not efficient enough to analyse objects having many attributes, the distances between these objects not being different enough to correctly determine the nearest objects. In addition, the increase of the spectral dimensionality aggravates problems like the Hughes phenomenon [17], which describes the harmful effects of high-dimensional objects.
Figure 4: The three different cases of image comparison.
(a) Same resolution/same sensors/different dates: a pixel is described by the same attributes but has different values because of its evolution between the two dates (e.g., $D_i$: xs1 = 12, xs2 = 32, xs3 = 151; $D_j$: xs1 = 15, xs2 = 41, xs3 = 131).
(b) Same resolutions/different sensors: a pixel is described by three attributes in the image on the left, but by four attributes in the image on the right (e.g., $D_i$: xs1 = 12, xs2 = 32, xs3 = 151; $D_j$: tm1 = 7, tm2 = 17, tm3 = 161, tm4 = 234).
(c) Different resolutions/different sensors: the image $D_i$ has a higher resolution than $D_j$, the two images do not have the same size, and the pixels are no longer the same (e.g., $D_i$: xs1 = 12, xs2 = 32, xs3 = 151; $D_j$: tm1 = 7, tm2 = 17, tm3 = 161, tm4 = 234).
A second way to combine all the attributes (Figure 5(b)) is to first classify the objects with each data set. These clusterings are made independently. Then a new description of each object is built, using the number of each cluster found by the first classifications. Finally, a classification is made using these new descriptions of the objects. The first phase of clusterings reduces the data space for the final clustering, making it easier. This approach is similar to the stacking method [18].
In our approach, the collaborative clustering (Figure 5(c)) is made much as in the second method presented above. Each data set is classified according to its attributes. However, the clusterings are not made independently: they are refined to make them converge towards a unique result. Only then are they unified by a voting method, or by a clustering as in method (b).
Figure 5: Different data fusion techniques. (a) The different data are merged to produce a new dataset which is classified. (b) Each dataset is classified independently by a different clustering method and the results are combined. (c) Each dataset is classified by a different clustering method that collaborates with the other methods and then the results are combined.
To integrate this new approach in our system, we assign one dataset to each clustering method. The whole process of results refinement stays unchanged, but we are confronted with the problem of the comparison of the different results, and more precisely of the estimation of the intercluster similarity (see Section 3.1). In the first two cases presented above (same elements with different descriptions), the confusion matrix and the intercluster similarity defined in Section 3 can be used. However, in the third case (different elements with different descriptions), they cannot be applied because the computation of a confusion matrix between two clusterings requires that the clusters refer to the same objects. The definition of a confusion matrix between datasets of different objects is in the general case very hard, or even impossible. Nevertheless, in some particular problems, it is possible to define it. In the next section, we describe how this matrix can be evaluated in the domain of multiscale remote sensing image clustering.
4.2 Multiscale remote sensing images classification
In remote sensing image classification, the problem of the image resolution is not easy to resolve. The resolution of an image is the size covered by one pixel in the real world. For example, the very high-resolution satellites give a resolution of 2.5 m, that is, one pixel is a square of 2.5 m × 2.5 m. One can have different images of the same area but not with the same resolution. So it is really difficult to use these different images because they do not include the same objects to cluster (Figure 6).
Figure 6: How can someone compare objects that are different but that represent the same “real” object? The same reality is viewed at two different resolutions (panels: reality, clustering of the low-resolution image, clustering of the high-resolution image). For example, the river is composed of 17 pixels on the low-resolution image but of 43 pixels on the high-resolution image.
For example, satellites often produce two kinds of images of the same area, a panchromatic and a multispectral one. The panchromatic image has a good spatial resolution but a low spectral resolution and, on the contrary, the multispectral image has a good spectral resolution but a low spatial resolution. A solution to use these two sources of information is to fuse the panchromatic and the multispectral images into a single one. Many methods have been investigated in the last few years to fuse these two kinds of images and to produce an image with a good spectral and spatial resolution [19, 20].
A fused image can be used directly as input of our collaborative system. However, the fused image may not be available, or the user may not want to use the fusion and may prefer to process the images without fusing them. In these cases, we have to modify our system to be able to support images at different resolutions. The modification consists of a new definition of the confusion matrix (see (2)) between two clustering results.
In the previous definition given in Section 3, each line of the confusion matrix is given by the confusion vector $\alpha_k^{i,j}$ of the cluster $C_k^i$ from the result $R^i$ compared to the $n_j$ clusters found in the result $R^j$:
$$\alpha_k^{i,j} = \left(\alpha_{k,l}^{i,j}\right)_{l=1,\ldots,n_j}, \quad \text{where } \alpha_{k,l}^{i,j} = \frac{|C_k^i \cap C_l^j|}{|C_k^i|}.$$
If the two results were not computed using the same data and if the resolutions of the two images are not the same, it is impossible to compute $|C_k^i \cap C_l^j|$. So we propose a new definition of the confusion vector for a class $C_k^i$ from the result $R^i$ compared to the result $R^j$.
Definition 1 (new confusion matrix). Let $r_i$ and $r_j$ be the resolutions of the two images $I_i$ and $I_j$; let $\lambda_{I_1,I_2}$ be a function that associates each pixel of the image $I_1$ to one pixel of the image $I_2$, with $r_1 \le r_2$; let $\#(C, I_1, I_2) = |\{ p \in C : \mathrm{cluster}(\lambda_{I_1,I_2}(p)) = C \}|$; if $r_i \le r_j$,
$$\alpha_{k,l}^{i,j} = \#(C_k^i, I_i, I_j),$$
else
$$\alpha_{k,l}^{i,j} = \frac{\#(C_l^j, I_j, I_i)}{r_j}.$$
With this new definition of the confusion matrix, the results can be compared with each other and evaluated as described previously. In the same way, the conflict resolution phase is unchanged.
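To make the idea concrete, the following Python sketch counts, through an association function, how the pixels of a cluster in the finer image fall into the clusters of the coarser image. It is a simplified illustration, not the definition above: the function names are hypothetical, the two grids are assumed to be perfectly aligned with an integer ratio between pixel sizes, and the normalisation by the cluster size (mirroring (2)) is an assumption.

```python
from collections import Counter


def make_grid_lambda(scale):
    """Hypothetical association function for two aligned grids.

    Maps pixel (row, col) of the finer image to the coarser pixel covering
    it, assuming a coarse pixel is exactly `scale` fine pixels wide.
    """
    return lambda row, col: (row // scale, col // scale)


def cross_resolution_counts(fine_labels, coarse_labels, lam):
    """Count, for each pair (k, l), the fine pixels of cluster k whose
    associated coarse pixel belongs to cluster l."""
    counts = Counter()
    for r, row in enumerate(fine_labels):
        for c, k in enumerate(row):
            cr, cc = lam(r, c)
            counts[(k, coarse_labels[cr][cc])] += 1
    return counts


def confusion_row(counts, k, coarse_clusters):
    """ASSUMPTION: normalise by the size of cluster k, mirroring (2)."""
    size_k = sum(counts[(k, l)] for l in coarse_clusters)
    return {l: counts[(k, l)] / size_k for l in coarse_clusters}


# A 4x4 clustering (fine image) against a 2x2 clustering (coarse image).
fine = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
coarse = [["a", "b"], ["a", "b"]]
counts = cross_resolution_counts(fine, coarse, make_grid_lambda(2))
row_1 = confusion_row(counts, 1, ["a", "b"])
```

Here one fine pixel of cluster 1 lies under a coarse pixel of cluster "a", so the confusion row of cluster 1 is no longer a pure 0/1 vector even though the two clusterings describe nearly the same regions.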
Because the images do not have the same resolution, it is not possible to apply the unification algorithm directly. In order to build a unique image representing all the results, we choose the maximal resolution and the voting algorithm is applied using the association function $\lambda_{I_1,I_2}$ for each pixel. This choice was made to produce a result having the best spatial resolution among the different input images.
5 EXPERIMENTS
In this section, we present two experiments of our collaborative method on real images. In the first experiment, we use images of the satellite SPOT-5 to study an urban area. In the second experiment, we use the collaborative method to analyse a coastal zone, through a set of heterogeneous images (SPOT-1, SPOT-5, ASTER).
To be able to use our system with images at different resolutions, we have to define a $\lambda$ function (Figure 7) which defines the correspondence between the pixels of two images. We use here the georeferencing [21] to define this function. In remote sensing, it is possible to associate real-world coordinates to the pixels of an image (i.e., their position on the globe). The georeferencing (here the Lambert 1 North coordinates) is used to map a pixel from one image to the pixel of another image at a different resolution. By using the georeferencing, we are certain to maximize the quality of the correspondence whatever the difference between the resolutions of the images.
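A georeferencing-based association function can be sketched as follows. This is a minimal illustration and not the system's implementation: it assumes north-up images in the same projected coordinate system, described only by the world coordinates of their top-left corner and a square pixel size in meters. All names and the example coordinates are hypothetical.

```python
def pixel_to_world(row, col, origin_x, origin_y, pixel_size):
    """World coordinates of the centre of pixel (row, col).

    ASSUMPTION: a north-up image whose top-left corner is at
    (origin_x, origin_y) in a projected system, with square pixels of
    `pixel_size` metres; rows grow southwards.
    """
    x = origin_x + (col + 0.5) * pixel_size
    y = origin_y - (row + 0.5) * pixel_size
    return x, y


def world_to_pixel(x, y, origin_x, origin_y, pixel_size):
    """Pixel (row, col) that contains the world point (x, y)."""
    col = int((x - origin_x) // pixel_size)
    row = int((origin_y - y) // pixel_size)
    return row, col


def make_georeferenced_lambda(geo_1, geo_2):
    """Association function from image I1 to image I2 via world coordinates.

    geo_1 and geo_2 are (origin_x, origin_y, pixel_size) tuples.
    """
    def lam(row, col):
        x, y = pixel_to_world(row, col, *geo_1)
        return world_to_pixel(x, y, *geo_2)
    return lam


# Hypothetical 5 m and 10 m images sharing the same top-left corner.
geo_pan = (1000.0, 2000.0, 5.0)
geo_ms = (1000.0, 2000.0, 10.0)
lam = make_georeferenced_lambda(geo_pan, geo_ms)
```

Going through world coordinates means that the two images need neither the same origin nor an integer ratio between their pixel sizes; only the georeferencing of each image is required.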
5.1 Panchromatic and multispectral collaboration
The first experiment is the analysis of images of the city of Strasbourg (France). We use the images provided by the sensors of the satellite SPOT-5. The panchromatic image (Figure 8(a)) has a resolution of 5 meters (i.e., the width of one pixel represents 5 meters in the real world), a size of 865 × 1021 pixels, and has a unique band. The multispectral image (Figure 8(b)) has a resolution of 10 meters, a size of 436 × 511, and has four bands (red, green, blue, and near infrared).
Figure 7: The function $\lambda_{I_1,I_2}$ is the association function between two images. It associates one pixel of the image $I_2$ to each pixel of the image $I_1$.
Figure 8: The two images of Strasbourg (France) from SPOT-5. (a) Panchromatic image (resolution 5 meters, size 865 × 1021). (b) Multispectral image (resolution 10 meters, size 436 × 511).
Our goal is to use these two heterogeneous (different resolutions, different numbers of bands, etc.) sources of data in our collaborative clustering system to show that using multisource images improves the image analysis and scene understanding. Figure 9 presents four different ways to use these two images with our collaborative system:
(a) six clustering methods working on the multispectral image;
(b) six clustering methods working on the panchromatic image;
(c) six clustering methods working on the fusion of the two images;
(d) three clustering methods working on the panchromatic image and three clustering methods working on the multispectral image.
For case (c), we used the Gram-Schmidt algorithm to merge the panchromatic and the multispectral images. This algorithm is well known in the field of remote sensing image fusion, and usually produces good results [22].
Figure 9: The four test cases studied. (a) Multispectral: collaborative clustering on the multispectral image. (b) Panchromatic: collaborative clustering on the panchromatic image. (c) Fusion: collaborative clustering on the fusion of the multispectral and the panchromatic images. (d) Multisource: multisource collaborative clustering using the panchromatic and the multispectral images.

Table 1: Results with ground truth.
Classes     Multispectral   Panchromatic   Fusion    Collaborative
Building    42.24%          44.26%         67.92%    46.42%

We chose the K-Means [23] algorithm for each clustering method. This choice was made for computational convenience, but any clustering method can be used in the collaborative system. For each experiment ((a), (b), (c), and (d)), each clustering method is assigned to one image. Then, the collaborative system described in Section 3 is launched with the modifications added in Section 4 for multiresolution handling, thanks to the georeferencing. The K-Means algorithm is applied to each image (step 1) with different numbers of clusters (randomly picked in [8; 10]) and a random initialization (different for each method). Then, the clustering methods collaborate through the refinement step and modify their results according to the results of the other methods (step 2). Finally, the different results obtained are combined into a single one by a voting algorithm (step 3). Figure 10 presents the final unification result (obtained from the vote of the different methods) for the four test cases.
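The initialization described in step 1 can be sketched as follows. This is only an illustrative sketch, not the implementation used in the experiments: it covers step 1 alone (the refinement and the vote are not shown), it represents each image as a plain list of per-pixel feature tuples, and the names `kmeans` and `initial_clusterings`, the seed, and the fixed number of iterations are assumptions made for the example.

```python
import random


def kmeans(pixels, k, rng, iterations=20):
    """Plain Lloyd's K-Means on a list of feature tuples.

    Returns one cluster label (0..k-1) per pixel. The fixed iteration
    count is an assumption made for this sketch.
    """
    # Random initialization: k distinct pixels are used as initial centers.
    centers = rng.sample(pixels, k)
    labels = [0] * len(pixels)

    def sq_dist(px, center):
        return sum((a - b) ** 2 for a, b in zip(px, center))

    for _ in range(iterations):
        # Assignment step: each pixel goes to its nearest center.
        for idx, px in enumerate(pixels):
            labels[idx] = min(range(k), key=lambda c: sq_dist(px, centers[c]))
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [px for px, lab in zip(pixels, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def initial_clusterings(images, seed=0):
    """Step 1 only: one K-Means run per image.

    Each method draws its own number of clusters in [8; 10] and its own
    random initialization, as described in the text.
    """
    rng = random.Random(seed)
    results = []
    for pixels in images:
        k = rng.randint(8, 10)  # both bounds are included
        results.append((k, kmeans(pixels, k, rng)))
    return results
```

Because every method starts from a different number of clusters and a different initialization, the initial results differ, which is what the refinement step then exploits.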
All the final results have seven clusters, owing to the capacity of the collaborative method to find a consensual number of clusters. According to the interpretation of the expert geographer, the following conclusions can be drawn. The panchromatic case (Figure 10(b)) produced a rather poor result in which part of the vegetation was merged with the water because of the lack of spectral information to describe the pixels (i.e., only one band). The fusion case (Figure 10(c)) produced a result with a good spatial resolution but failed to find some real classes (i.e., the expert expected two clusters of vegetation, which were merged). The multispectral case (Figure 10(a)) produced a fairly good result, but with a low spatial resolution. Finally, the multisource collaboration (Figure 10(d)) produced a good result with a good spatial resolution and corrected some mistakes that appear in the multispectral case. For example, the field at the top right of the area was identified more precisely thanks to the collaboration with the panchromatic image (Figure 11).

Figure 10: Results for the four test cases studied. (a) Multispectral (7 clusters). (b) Panchromatic (7 clusters). (c) Fusion (7 clusters). (d) Multisource collaboration (7 clusters).
To validate these interpretations, a ground truth was provided by the expert as partial binary masks (Figure 11(b)) for four classes. For each ground truth class, the most likely cluster was selected by the expert (the best overlapping cluster as defined by the Vinet index in [24]). An accuracy index was computed as the ratio of the number of pixels in the ground truth class to the number of pixels of the cluster overlapping it. The results are presented in Table 1. As expected, the collaborative solution produced the best results, especially for the detection of fields.

Figure 11: Examples of field detection. (a) Raw image. (b) Ground truth. (c) Multispectral. (d) Panchromatic. (e) Fusion. (f) Collaborative. Panel (b) illustrates the ground truth for field (1) (on the left) and field (2) (on the right).
To study the evolution of the agreement amongst all the clustering methods during the refinement step, the tools of information theory [25] can be used, with each clustering result treated as a random variable. The mutual information [26] can then be computed between a pair of clustering results. The mutual information quantifies the amount of information shared by the two results. For two results $R_i$ and $R_j$, the normalized mutual information, which takes values in [0; 1], is defined as
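$$\mathrm{nmi}(R_i, R_j) = \frac{2}{p} \sum_{k=1}^{n_i} \sum_{l=1}^{n_j} \log_{n_i \cdot n_j}\left(\frac{p \cdot \alpha^{i,j}_{k,l}}{n^i_k \, n^j_l}\right), \qquad (13)$$
where $p$ is the number of pixels to classify, $n_i$ is the number of clusters from $R_i$, and $n^i_k$ is the number of objects in the cluster $C^i_k$.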
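As an illustration, a normalized mutual information of this kind can be computed from two label lists as sketched below. The sketch assumes the usual contingency-count form, in which each term of the double sum is weighted by the number of pixels shared by the two clusters and the same count appears inside the logarithm; how this count maps onto $\alpha^{i,j}_{k,l}$ is an assumption of the example. The function name `nmi` and the convention of returning 0 when the logarithm base would be 1 are also assumptions.

```python
import math
from collections import Counter


def nmi(labels_i, labels_j):
    """Normalized mutual information between two labelings of the same pixels.

    Assumed form (contingency counts):
        2/p * sum_k sum_l n_kl * log_{n_i*n_j}(p * n_kl / (n_k * n_l))
    where n_kl is the number of pixels shared by cluster k of the first
    result and cluster l of the second.
    """
    if len(labels_i) != len(labels_j):
        raise ValueError("both results must label the same pixels")
    p = len(labels_i)
    size_i = Counter(labels_i)               # pixels per cluster in the first result
    size_j = Counter(labels_j)               # pixels per cluster in the second result
    shared = Counter(zip(labels_i, labels_j))  # pixels per pair of clusters
    base = len(size_i) * len(size_j)         # logarithm base n_i * n_j
    if p == 0 or base < 2:
        # A logarithm in base 1 is undefined; returning 0 is a convention
        # chosen for this sketch.
        return 0.0
    total = 0.0
    for (k, l), n_kl in shared.items():
        total += n_kl * math.log(p * n_kl / (size_i[k] * size_j[l]), base)
    return 2.0 * total / p
```

With this form, two results that define the same partition into equally sized clusters reach the maximum value of 1 whatever the label names are, while two statistically independent results give 0.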
Moreover, the average mutual information quantifies the information shared among an ensemble of clustering results and can be used as an indicator of agreement:
$$\mathrm{anmi}(m) = \frac{1}{N-1} \sum_{j=1,\, j \neq m}^{N} \mathrm{nmi}(R_m, R_j), \qquad (14)$$
with $m = 1, 2, \ldots, N$, and $N$ the number of clustering results.
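The average in (14) can be sketched as follows, assuming that the pairwise nmi values have already been computed and stored in a square table. The function name `anmi`, the table layout, and the use of 0-based indices (whereas $m$ runs from 1 to $N$ in the text) are assumptions of this example.

```python
def anmi(pairwise_nmi, m):
    """Average nmi between result m and every other result.

    pairwise_nmi[a][b] is assumed to hold nmi(R_a, R_b) for 0-based
    indices a and b; the diagonal entry for m is skipped.
    """
    n = len(pairwise_nmi)
    if n < 2:
        raise ValueError("at least two clustering results are needed")
    others = [pairwise_nmi[m][j] for j in range(n) if j != m]
    return sum(others) / (n - 1)


# Hypothetical pairwise values for three clustering results.
table = [
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.7],
    [0.6, 0.7, 1.0],
]
agreement = [anmi(table, m) for m in range(len(table))]
```

Tracking these values at each iteration of the refinement gives the kind of curve described for the anmi index.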
Figure 12: Evolution of the anmi index among the clustering methods and the average nmi between the results and the unified result. (Plot: x-axis, iteration, from 0 to 45; y-axis from 0.55 to 0.85; curves: anmi among the clustering methods, anmi with the unified result.)
The average mutual information was computed during the refinement process that produced the result of Figure 10(d). Figure 12 presents the evolution of the anmi index among the results of the different clustering methods, and the average of the mutual information between each clustering method and the unified result.
5.2 Multiresolution multidate collaboration
The second experiment was carried out on four images of a coastal zone (Normandy Coast, Northwest of France). This area is very interesting because it is periodically affected by natural and anthropic phenomena which modify the structure of the area. Consequently, the expert often has many heterogeneous images available, acquired over the years. Four images from three different satellites (SPOT-4, SPOT-5, and ASTER) with different resolutions (20, 15, 10, and 2.5 meters) are used.
Four clustering methods were set up, each one using one of the available images. As in the previous experiment, the K-Means algorithm is run on each image (step 1), the refinement algorithm is then applied (step 2), and the results are combined (step 3). Figure 14 presents the result of the unification of the final results.
To better interpret the unified result, a vote map is produced. This map represents the result of the vote carried out during the combination of the results [15]. Figure 15 presents the vote map corresponding to the result shown in Figure 14. In this image, the darker the pixels are, the less the clustering methods agree. Thus, the pixels where all the clustering methods agreed are in white, and the black pixels represent a strong disagreement amongst the clustering methods. This degree of agreement is computed using the corresponding cluster (see (1)). This representation helps the expert to improve his analysis of the result by concentrating his attention on the parts of the image where the clustering methods disagree.
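The idea of the vote map can be illustrated with a simplified sketch. It assumes that the results have already been resampled to a common grid and relabelled so that the same label denotes the same class in every result; in the method itself this matching relies on the corresponding clusters of (1), which is not reproduced here. The function name `vote`, the per-pixel majority rule, and the linear grey scale (255 when all methods agree, 0 when no two methods agree) are assumptions of the example.

```python
from collections import Counter


def vote(label_maps):
    """Per-pixel majority vote over several label maps of identical size.

    Returns (unified, vote_map): the winning label of each pixel and a
    grey level in 0..255 that is lighter when more methods agree.
    """
    n = len(label_maps)
    if n < 2:
        raise ValueError("at least two clustering results are needed")
    rows, cols = len(label_maps[0]), len(label_maps[0][0])
    unified = [[None] * cols for _ in range(rows)]
    vote_map = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ballots = Counter(result[r][c] for result in label_maps)
            label, top = ballots.most_common(1)[0]
            unified[r][c] = label
            # Assumed grey scale: top == n gives white, top == 1 gives black.
            vote_map[r][c] = round(255 * (top - 1) / (n - 1))
    return unified, vote_map
```

In such a map, the dark pixels directly indicate the areas on which the expert should concentrate.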
Figure 13: The four images of Normandy Coast, France.
(a) SPOT-4, 20 meters, 3 bands (659×188), date: 1999.
(b) ASTER, 15 meters, 3 bands (922×256), date: 2004.
(c) SPOT-4, 10 meters, 3 bands (1382×384), date: 2002.
(d) SPOT-5, 2.5 meters, 3 bands (5528×1536), date: 2005.
Figure 14: The final unification result.
Figure 15: The vote map.
Another way to improve the scene understanding and to show the agreement between the methods is to visualise the corresponding clusters (see (1)) between a pair of results. This allows the expert to see which parts of the clusters are in agreement, and which parts are in disagreement, for a pair of results. Figure 16 presents two corresponding clusters between the clustering methods of this experiment.
Figure 16: Corresponding clusters between two clustering methods; the agreement is shown in grey and the disagreement in black. (a) Corresponding clusters showing disagreement in the fields. (b) Corresponding clusters showing a part of the coast line.
In Figure 16(a), one can see the disagreement on a part of the coast line. Figure 16(b) illustrates the disagreement on the fields. All these results help the expert to improve his understanding of the image.
6 CONCLUSIONS
In this paper, we have presented a method for multisource image analysis using collaborative clustering. This collaborative method enables the user to exploit different heterogeneous images in an overall system. Each clustering method works on one image and collaborates with the other clustering methods to refine its result.
Experiments on the analysis of an urban area and a coastal area have been presented. The system produces a final result by combining the results of the different clustering methods using a voting algorithm. The agreement and the disagreement of the clustering methods can be highlighted by a vote map depicting the accordance between the different clustering methods. Furthermore, the corresponding clusters between a pair of clustering methods can be visualised. These features are very useful in helping the expert to better understand his images.
However, there is still a lot of work for the expert to really interpret the information in the dataset because no semantics are given by the system. That is why we are working on an extension of this process, integrating high-level domain knowledge on the studied area (urban objects ontology, spatial relationships, etc.). This should make it possible to add semantics to the result automatically, giving more information to the user.
ACKNOWLEDGMENTS
The authors would like to thank the members of the FodoMuST and Ecosgil projects for providing the images and the geographers of the LIV Laboratory for their help in the interpretation of the results. This work is supported by the French Centre National d’Etudes Spatiales (CNES Contract 70904/00).