
EURASIP Journal on Advances in Signal Processing

Volume 2008, Article ID 374095, 11 pages

doi:10.1155/2008/374095

Research Article

Multisource Images Analysis Using Collaborative Clustering

Germain Forestier, Cédric Wemmert, and Pierre Gançarski

LSIIT, UMR 7005 CNRS/ULP, University Louis Pasteur, 67070 Strasbourg Cedex, France

Correspondence should be addressed to Germain Forestier, forestier@lsiit.u-strasbg.fr

Received 1 October 2007; Revised 20 February 2008; Accepted 26 February 2008

Recommended by C. Charrier

The development of very high-resolution (VHR) satellite imagery has produced a huge amount of data. The multiplication of satellites embedding different types of sensors provides many heterogeneous images. Consequently, the image analyst often has many different images available, representing the same area of the Earth's surface. These images can be from different dates, produced by different sensors, or even at different resolutions. The lack of machine learning tools using all these representations in an overall process forces a sequential analysis of these various images. In order to use all the available information simultaneously, we propose a framework where different algorithms can use different views of the scene. Each one works on a different remotely sensed image and thus produces different and useful information. These algorithms work together in a collaborative way through an automatic and mutual refinement of their results, so that all the results have almost the same number of clusters, which are statistically similar. Finally, a unique result is produced, representing a consensus among the information obtained by each clustering method on its own image. The unified result and the complementarity of the single results (i.e., the agreement between the clustering methods as well as the disagreement) lead to a better understanding of the scene. The experiments carried out on multispectral remote sensing images have shown that this method is efficient at extracting relevant information and at improving the scene understanding.

Copyright © 2008 Germain Forestier et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1 INTRODUCTION

Unsupervised classification, also called clustering, is a well-known machine learning tool which extracts knowledge from datasets [1, 2]. The purpose of clustering is to group similar objects into subsets (called clusters), maximizing the intracluster similarity and the intercluster dissimilarity. Many clustering algorithms have been developed during the last 40 years, each one based on a different strategy. In image processing, clustering algorithms are usually used by considering the pixels of the image as data objects: each pixel is assigned to a cluster by the clustering algorithm. Then, a map is produced, representing each pixel with the colour of the cluster it has been assigned to. This cluster map, depicting the spatial distribution of the clusters, is then interpreted by the expert, who assigns to each cluster (i.e., colour in the image) a meaning in terms of thematic classes (vegetation, water, etc.).

In contrast to supervised classification, unsupervised classification requires very few inputs. The classification process only uses spectral properties to group pixels together. However, it requires a precise parametrization by the user because the classification is performed without any control. Other potential problems exist, especially when the user attempts to assign a thematic class to each produced cluster. On the one hand, some thematic classes may be represented by a mix of different types of surface covers: a single thematic class may be split among two or more clusters (e.g., a park is often an aggregate of vegetation, sand, water, etc.). On the other hand, some of the clusters may be meaningless, as they include too many mixed pixels: a mixed pixel (mixel) represents the average energy reflected by several types of surface present within the studied area.

These problems have increased with the recent availability of very high-resolution satellite sensors, which provide many details of the land cover. Moreover, several images with different characteristics are often available for the same area: different dates, different kinds of remote sensing acquisition systems (i.e., with different numbers of sensors and wavelengths), or different resolutions (i.e., different sizes of the ground surface that a pixel represents). Consequently, the expert is confronted with too great a mass of data, and the use of classical knowledge extraction techniques becomes too complex. Specific tools are needed to efficiently extract the knowledge stored in each of the available images.

To avoid the independent analysis of each image, we propose to use different clustering methods, each working on a different image of the same area. These clustering methods collaborate during a refinement step of their results, to converge towards a similar result. At the end of this collaborative process, the different results are combined using a voting algorithm. This unified result represents a consensus among all the knowledge extracted from the different sources. Furthermore, the voting algorithm highlights the agreement and the disagreement between the clustering methods. These two pieces of information, as well as the result produced by each clustering method, lead to a better understanding of the scene by the expert.

The paper is organized as follows. First, an overview of multisource applications is given in Section 2. The collaborative method to combine different clustering algorithms is then presented in Section 3. Section 4 presents in detail the paradigm of multisource images and the different ways to use it in the collaborative system. Section 5 shows an experimental evaluation of the developed methods, and finally, conclusions are drawn in Section 6.

2 MULTISOURCE IMAGES ANALYSIS

In the domain of Earth observation, many works focus on the development of data-fusion techniques to take advantage of all the available data on the studied area. As discussed in [3], multisource image analysis can be achieved at different levels, according to the stage where the fusion takes place: pixel, feature, or decision level.

At pixel level, data fusion consists in creating a fused image based on the sensor measurements by merging the values given by the various sources. A method is proposed in [4] for combining multispectral, panchromatic, and radar images by jointly using the intensity-hue-saturation transform and the redundant wavelet decomposition. In [5], the authors propose a multisource data-fusion mechanism using generalized positive Boolean functions which consists of two steps: a band generation is carried out, followed by a classification using a positive Boolean function-based classifier. In the case of feature fusion, the first step creates new features from the various datasets; these new features are merged and analyzed in a second step. For example, a segmentation can be performed on the different image sources and these segmentations are fused [6]. In [7], the authors present another method based on the Dempster-Shafer theory of evidence and using the fuzzy statistical estimation maximization (FSEM) algorithm to find an optimal estimation of the inaccuracy and uncertainty of the classification.

The fusion of decisions consists in finding a single decision (also called consensus) from all the decisions produced by the classifiers. In [8], the authors propose a method based on the combination of neural networks for multisource classification. The system presented in [9] is composed of an ensemble of classifiers trained in a supervised way on a specific image, and can be retrained in an unsupervised way to be able to classify a new image. In [10], a general framework is presented for combining information from several supervised classifiers using a fuzzy decision rule.

In our work, we focus on the fusion of decisions from unsupervised classifications, each one produced from a different image. Contrary to the methods presented above, we propose a mechanism which finds a consensus according to the decisions taken by each of the unsupervised classifiers.

3 COLLABORATIVE CLUSTERING

Many works focus on combining different clustering results, which is commonly called clustering aggregation [11], multiple clusterings [12], or cluster ensembles [13, 14]. All these approaches try to combine different clustering results in a final step. In fact, these results must have the same number of clusters (vote-based methods) [14], or the expected clusters must be separable in the data space (coassociation-based methods) [12]. This latter property is almost never encountered in remote sensing image analysis. It is difficult to compute a consensual result from clustering results with different numbers of classes or different structures (flat partitioning or hierarchical result) because of the lack of a trivial correspondence between the clusters of these different results. To address this problem, we present in this section a framework where different clustering methods work together in a collaborative way to find an agreement about their proposals. This collaborative process consists in an automatic and mutual refinement of the clustering results, until all the results have almost the same number of clusters, and all the clusters are statistically similar. At the end of this process, as the results have comparable structures, it is possible to define a correspondence function between the clusters, and to apply a unifying technique such as a voting method [15].

Before the description of the collaborative method, we introduce the correspondence function used within it

3.1 Intercluster correspondence function

There is no difficulty in associating the classes of different supervised classifications, as a common set of class labels is given for all the classifications. Unfortunately, in the case of unsupervised classifications, the results may not have the same number of clusters, and no information is available about the correspondence between the clusters of the different results.

To address this problem, we have defined a new intercluster correspondence function, which associates to each cluster from a result a cluster from each of the other results. Let $\{R^i\}_{1 \le i \le m}$ be the set of results given by the different algorithms. Let $\{C_k^i\}_{1 \le k \le n_i}$ be the clusters of the result $R^i$. Figure 1 shows an example of such results.

Figure 1: Two clustering results of the same data, each obtained with a different method.

The corresponding cluster $CC(C_k^i, R^j)$ of a cluster $C_k^i$ from $R^i$ in the result $R^j$, $i \neq j$, is the cluster from $R^j$ which is the most similar to $C_k^i$:

$$CC\left(C_k^i, R^j\right) = C_l^j \quad \text{with} \quad S\left(C_k^i, C_l^j\right) = \max\left\{ S\left(C_k^i, C_{l'}^j\right),\ \forall l' \in [1, n_j] \right\}, \tag{1}$$

where $S$ is the intercluster similarity which evaluates the similarity between two clusters of two different results.

It is calculated from the overlap of the clusters in two steps. First, the intersection between each couple of clusters $(C_k^i, C_l^j)$, from two different results $R^i$ and $R^j$, is calculated and written in the confusion matrix $M^{i,j}$:

$$M^{i,j} = \begin{pmatrix} \alpha^{i,j}_{1,1} & \cdots & \alpha^{i,j}_{1,n_j} \\ \vdots & \ddots & \vdots \\ \alpha^{i,j}_{n_i,1} & \cdots & \alpha^{i,j}_{n_i,n_j} \end{pmatrix}, \quad \text{where } \alpha^{i,j}_{k,l} = \frac{\left|C_k^i \cap C_l^j\right|}{\left|C_k^i\right|}. \tag{2}$$

Then, the similarity $S(C_k^i, C_l^j)$ between two clusters $C_k^i$ and $C_l^j$ is evaluated by observing the relationship between the size of their intersection and the size of the cluster itself, and by taking into account the distribution of the data in the other clusters as follows:

$$S\left(C_k^i, C_l^j\right) = \dots \tag{3}$$

Figure 2 presents the correspondence function obtained by using the intercluster similarity on the results shown in Figure 1.
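As an illustration, the confusion matrix of (2) and the corresponding cluster of (1) can be sketched in a few lines of Python for two labelings of the same objects. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the similarity used here (the product of the two overlap ratios $\alpha^{i,j}_{k,l}$ and $\alpha^{j,i}_{l,k}$) is an assumption made for the example, since any intercluster similarity built from the confusion matrices could be plugged in instead.

```python
from collections import Counter


def confusion_matrix(labels_i, labels_j):
    """alpha[k][l] = |C_k^i ∩ C_l^j| / |C_k^i| for two labelings of the same objects."""
    clusters_i = sorted(set(labels_i))
    clusters_j = sorted(set(labels_j))
    overlap = Counter(zip(labels_i, labels_j))  # size of each intersection
    size_i = Counter(labels_i)                  # size of each cluster of R^i
    return {k: {l: overlap[(k, l)] / size_i[k] for l in clusters_j}
            for k in clusters_i}


def similarity(alpha_ij, alpha_ji, k, l):
    # ASSUMPTION: product of the two overlap ratios, chosen for this sketch only.
    return alpha_ij[k][l] * alpha_ji[l][k]


def corresponding_cluster(alpha_ij, alpha_ji, k):
    """CC(C_k^i, R^j): the cluster of R^j that maximizes the similarity with C_k^i."""
    return max(alpha_ij[k], key=lambda l: similarity(alpha_ij, alpha_ji, k, l))


# Two toy clusterings of the same six objects.
labels_i = ["a", "a", "a", "b", "b", "b"]
labels_j = ["x", "x", "y", "y", "z", "z"]
alpha_ij = confusion_matrix(labels_i, labels_j)
alpha_ji = confusion_matrix(labels_j, labels_i)
cc_of_a = corresponding_cluster(alpha_ij, alpha_ji, "a")
```

Note that the confusion matrix is not symmetric: $\alpha^{i,j}$ is normalized by the cluster sizes of $R^i$, whereas $\alpha^{j,i}$ is normalized by those of $R^j$, which is why both matrices are computed.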

3.2 Collaborative process overview

The entire clustering process is broken down into the following three main phases:

(i) initial clusterings: each clustering method computes a clustering of the data using its parameters;

(ii) results refinement: a phase of convergence of the results, which consists of conflict evaluation and resolution, is iterated as long as the quality of the results and their similarity increase;

(iii) unification: the refined results are unified using a voting algorithm.

Figure 2: The correspondence between the clusters of the two results from Figure 1, using the intercluster similarity based on cluster overlap.

3.2.1 Initial clusterings

During the first step, each clustering method is initialized with its own parameters and a clustering is performed on a remotely sensed image: all the pixels are grouped into different clusters.

3.2.2 Results refinement

The mechanism we propose for refining the results is based on the concept of distributed local resolution of conflicts, by the iteration of four phases:

(i) detection of the conflicts by evaluating the dissimilarities between couples of results;

(ii) choice of the conflicts to solve;

(iii) local resolution of these conflicts;

(iv) management of the local modifications in the global result (if they are relevant).

(a) Conflicts detection

The detection of the conflicts consists in seeking all the couples $(C_k^i, R^j)$, $i \neq j$, such that $C_k^i \neq CC(C_k^i, R^j)$. One conflict $K_k^{i,j}$ is identified by one cluster $C_k^i$ and one result $R^j$.

We associate to each conflict a measurement of its importance, the conflict importance coefficient, calculated according to the intercluster similarity:

$$CI\left(K_k^{i,j}\right) = 1 - S\left(C_k^i, CC\left(C_k^i, R^j\right)\right). \tag{4}$$

(b) Choice of the conflicts to solve

During an iteration of refinement of the results, several local resolutions are performed in parallel. A conflict is selected in the set of existing conflicts and its resolution is started. This conflict, like all those concerning the two results involved in it, is removed from the list of conflicts. This process is iterated until the list of conflicts is empty. Different heuristics can be used to choose the conflict to solve, according to the conflict importance coefficient (4). We choose to try to solve the most important conflict first.
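The selection loop described above can be sketched as a greedy procedure. The sketch below is illustrative only: the tuple encoding of a conflict is hypothetical, and it reads "all those concerning the two results involved" as "every conflict that touches either of the two results", which keeps each result in at most one local resolution per iteration. Restricting the removal to conflicts between the same pair of results would be a one-line change.

```python
def select_conflicts(conflicts):
    """Greedy choice of the conflicts to solve, most important first.

    `conflicts` is a list of (importance, i, k, j) tuples (hypothetical encoding):
    cluster k of result i is in conflict with result j.
    """
    remaining = sorted(conflicts, reverse=True)  # highest importance first
    selected = []
    while remaining:
        chosen = remaining.pop(0)
        selected.append(chosen)
        busy = {chosen[1], chosen[3]}            # the two results now being modified
        # ASSUMPTION: drop every conflict that involves either of these results.
        remaining = [c for c in remaining if c[1] not in busy and c[3] not in busy]
    return selected


conflicts = [
    (0.9, 0, 2, 1),  # cluster 2 of result 0 versus result 1
    (0.7, 1, 0, 2),
    (0.6, 2, 1, 3),
    (0.4, 0, 1, 1),
]
to_solve = select_conflicts(conflicts)
```

With this toy input, the conflict of importance 0.9 is chosen first, which discards the conflicts of importance 0.7 and 0.4 because they involve result 0 or result 1; the conflict between results 2 and 3 is then chosen.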

    let n = |CCs(C_k^i, R^j)|
    let R'^i (resp., R'^j) be the result of the application of an operator on R^i (resp., R^j)
    if n > 1 then
        R'^i = R^i \ {C_k^i} ∪ {split(C_k^i, n)}
        R'^j = R^j \ CCs(C_k^i, R^j) ∪ {merge(CCs(C_k^i, R^j))}
    else
        R'^i = reclustering(R^i, C_k^i)
    end if

Algorithm 1

(c) Local resolution of a conflict

The local resolution of a conflict $K_k^{i,j}$ consists of applying an operator on each result involved in the conflict, $R^i$ and $R^j$, to try to make them more similar.

The operators that can be applied to a result are the following:

(i) merging of clusters: some clusters are merged together (all the objects are merged in a new cluster that replaces the merged clusters),

(ii) splitting of a cluster into subclusters: a clustering is applied to the objects of a cluster to produce subclusters,

(iii) reclustering of a group of objects: one cluster is removed and its objects are reclassified in all the other existing clusters.

The operator to apply is chosen according to the corresponding clusters of the cluster involved in the conflict. The corresponding clusters (CCs) of a cluster are an extension of the definition of the corresponding cluster (1):

$$CCs\left(C_k^i, R^j\right) = \left\{ C_l^j \mid S\left(C_k^i, C_l^j\right) > p_{cr},\ \forall l \in [1, n_j] \right\}, \tag{5}$$

where $p_{cr}$, $0 \le p_{cr} \le 1$, is given by the user. Having found the corresponding clusters of the cluster involved in the conflict, an operator is chosen and applied as shown in Algorithm 1.
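The operator choice can be sketched in Python as follows. This is a hedged illustration rather than the authors' code: results are represented as dictionaries mapping a cluster identifier to a set of objects, and `split` and `recluster` are caller-supplied functions with hypothetical signatures, since the text does not fix which clustering is used inside these operators.

```python
def corresponding_clusters(similarity_row, p_cr):
    """CCs: clusters of R^j whose similarity with C_k^i exceeds the threshold p_cr."""
    return [l for l, s in similarity_row.items() if s > p_cr]


def resolve_conflict(result_i, result_j, k, similarity_row, p_cr, split, recluster):
    """Build the two candidate results for a conflict on cluster k of result_i.

    similarity_row maps each cluster id of result_j to S(C_k^i, C_l^j).
    split(objects, n) must return n sets; recluster(result, k) must return a
    new result without cluster k (both are hypothetical helper signatures).
    The input results are left untouched.
    """
    ccs = corresponding_clusters(similarity_row, p_cr)
    new_i, new_j = dict(result_i), dict(result_j)
    if len(ccs) > 1:
        # Split C_k^i into as many subclusters as it has corresponding clusters ...
        del new_i[k]
        for idx, part in enumerate(split(result_i[k], len(ccs))):
            new_i[(k, idx)] = part
        # ... and merge those corresponding clusters in R^j.
        merged = set().union(*(new_j.pop(l) for l in ccs))
        new_j[tuple(ccs)] = merged
    else:
        new_i = recluster(result_i, k)
    return new_i, new_j
```

Both candidates are returned because the method does not commit to them at this point: whether the new or the old version of each result is kept is decided afterwards by comparing the couples.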

However, the application of the two operators is not always relevant. Indeed, it does not always increase the similarity of the results involved in the conflict being treated, and above all, the iteration of conflict resolutions may lead to a trivial solution where all the methods are in agreement. For example, they can converge towards a result with only one cluster including all the objects to classify, or towards a result having one cluster for each object. These two solutions are not relevant and must be avoided.

So we defined a criterion $\gamma$, called the local similarity criterion, to evaluate the similarity between two results, based on the intercluster similarity $S$ (3) and a quality criterion $\delta$ (given by the user):

$$\gamma^{i,j} = \frac{1}{2}\left( p_s \cdot \left( \frac{1}{n_i}\sum_{k=1}^{n_i} \omega_k^{i,j} + \frac{1}{n_j}\sum_{k=1}^{n_j} \omega_k^{j,i} \right) + p_q \cdot \left( \delta^i + \delta^j \right) \right), \tag{6}$$

where

$$\omega_k^{i,j} = \sum_{l=1}^{n_j} S\left(C_k^i, CC\left(C_k^i, R^j\right)\right) \tag{7}$$

and $p_q$ and $p_s$ are given by the user ($p_q + p_s = 1$). The quality criterion $\delta^i$ represents the internal quality of a result $R^i$ (the compactness of its clusters, e.g.).

At the end of each conflict resolution, the local similarity criterion makes it possible to choose which couple of results is to be kept: the two new results, the two old results, or one new result with one old result.
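To make the criterion concrete, the sketch below transcribes (6) directly and then picks the best of the four possible couples. It is only an illustration under stated assumptions: the $\omega$ values and the quality values $\delta$ are taken as already computed inputs, and the function names and the numbers in the example are hypothetical.

```python
def local_similarity(omega_ij, omega_ji, delta_i, delta_j, p_s=0.5, p_q=0.5):
    """Local similarity criterion gamma^{i,j} of equation (6).

    omega_ij holds one value per cluster of R^i, omega_ji one per cluster of R^j;
    delta_i and delta_j are the internal qualities of the two results.
    """
    if abs(p_s + p_q - 1.0) > 1e-9:
        raise ValueError("p_s + p_q must equal 1")
    mean_ij = sum(omega_ij) / len(omega_ij)
    mean_ji = sum(omega_ji) / len(omega_ji)
    return 0.5 * (p_s * (mean_ij + mean_ji) + p_q * (delta_i + delta_j))


def best_couple(gammas):
    """Return the name of the couple of results with the highest gamma."""
    return max(gammas, key=gammas.get)


gamma_example = local_similarity([0.8, 0.6], [0.7, 0.7, 0.4], 0.9, 0.5)

# Made-up gamma values for the four couples that can be kept after a resolution.
kept = best_couple({
    "old_i/old_j": 0.61,
    "new_i/old_j": 0.58,
    "old_i/new_j": 0.66,
    "new_i/new_j": 0.64,
})
```

In the example, the mean of the first list is 0.7 and the mean of the second is 0.6, so with equal weights the criterion is $0.5 \cdot (0.5 \cdot 1.3 + 0.5 \cdot 1.4) = 0.675$.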

(d) Global management of the local modifications

After the resolution of all these local conflicts, a global application of the modifications proposed by the refinement step is decided if it improves the quality of the global result. The global agreement coefficient of the results is evaluated according to all the local similarities between each couple of results. It evaluates the global similarity of the results and their quality:

$$\Gamma = \frac{1}{m}\sum_{i=1}^{m} \Gamma^i,$$

where

$$\Gamma^i = \frac{1}{m-1}\sum_{\substack{j=1 \\ j \neq i}}^{m} \gamma^{i,j}.$$

Even if the local modifications decrease this global agreement coefficient, the solution is accepted in order to avoid falling into a local maximum. If the coefficient decreases too much, all the results are reinitialized to the best temporary solution (the one with the best global agreement coefficient). The global process is iterated as long as some conflicts can be solved.
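The global agreement coefficient is a plain double average, which the following sketch computes from a matrix of local similarities. The matrix layout (`gamma[i][j]`, diagonal ignored) and the function name are assumptions made for the example.

```python
def global_agreement(gamma):
    """Global agreement coefficient: mean over results of their mean local similarity.

    gamma[i][j] is the local similarity between results i and j; the diagonal
    entries are ignored. At least two results are required.
    """
    m = len(gamma)
    per_result = [
        sum(gamma[i][j] for j in range(m) if j != i) / (m - 1)
        for i in range(m)
    ]
    return sum(per_result) / m


# Three results with pairwise local similarities 0.6, 0.8 and 0.7.
gamma = [
    [0.0, 0.6, 0.8],
    [0.6, 0.0, 0.7],
    [0.8, 0.7, 0.0],
]
agreement = global_agreement(gamma)
```

Here the per-result averages are 0.7, 0.65, and 0.75, so the global coefficient is 0.7. Keeping the best value seen so far alongside a copy of the corresponding results is enough to implement the reinitialization to the best temporary solution.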

3.2.3 Unification

In the final step, all the results tend to have the same number of clusters, which are increasingly similar. Thus, we use a voting algorithm [15] to compute a unified result combining the different results. This multiview voting algorithm makes it possible to combine, in one unique result, many different clustering results that do not necessarily have the same number of clusters. The basic idea is that for each object to cluster, each result $R^i$ votes for the cluster it has found for this object, $C_k^i$ for example, and for the corresponding cluster of $C_k^i$ in all the other results. The maximum of these values indicates the best cluster for the object, for example $C_l^j$. This means that this object should be in the cluster $C_l^j$ according to the opinion of all the methods.

After having done the vote for all objects, a new cluster is created for each best cluster found if a majority of the methods has voted for this cluster. If not, the object is assigned to a special cluster, containing all the objects that do not have the majority, which means they have been classified differently in too many results.
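The voting scheme can be sketched as follows. This is a simplified reading of the description above, not the algorithm of [15]: clusters are identified by a (result index, cluster label) pair, the table `cc` of corresponding clusters is assumed to be precomputed, "majority" is interpreted as strictly more than half of the methods, and `None` stands for the special cluster of objects without a majority.

```python
from collections import Counter


def vote(labelings, cc):
    """Unify several clusterings of the same objects by voting.

    labelings[i][o] is the cluster of object o in result i.
    cc[(i, k, j)] is the corresponding cluster, in result j, of cluster k of
    result i (hypothetical precomputed table).
    """
    m = len(labelings)
    unified = []
    for obj in range(len(labelings[0])):
        votes = Counter()
        for i, labels in enumerate(labelings):
            k = labels[obj]
            votes[(i, k)] += 1                      # vote for its own cluster
            for j in range(m):
                if j != i:
                    votes[(j, cc[(i, k, j)])] += 1  # and for the corresponding ones
        best, count = votes.most_common(1)[0]
        # ASSUMPTION: a majority means strictly more than half of the methods.
        unified.append(best if count > m / 2 else None)
    return unified


labelings = [["a", "a", "b"], ["x", "y", "y"]]
cc = {(0, "a", 1): "x", (0, "b", 1): "y", (1, "x", 0): "a", (1, "y", 0): "b"}
unified = vote(labelings, cc)
```

In this two-method example the first and third objects are classified consistently and receive two votes for the same cluster, whereas the second object is placed in "a" by one method and in "y" by the other, so no cluster reaches a majority and the object goes to the special cluster.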

Figure 3: Different points of view $V^1$ to $V^n$ on the same object $O$ (the river), producing different descriptions $D^1$ to $D^n$ of the object.

4 MULTISOURCE IMAGE PARADIGM

The method described in the previous section can use different types of clustering algorithms, but they all work with only one common dataset (i.e., the same image for each clustering algorithm). In this section, we describe how we make the collaborative method able to combine different sources of data and to extract knowledge from them.

The problem can be described as follows. There exists one real object $O$ that can be viewed from different points of view, and the goal is to find one description of this object according to all the different points of view (Figure 3). Each view $V^i$ of the object is represented by a dataset $D^i$ which is composed of many elements $\{E_1^i, \dots, E_{N_i}^i\}$. Each element $E_k^i$ is described by a set of attributes $\{(a_l^{i,k}, v_l^{i,k})\}_{1 \le l \le n_{i,k}}$, each composed of a name $a$ and a value $v$.

Three different cases can occur (Figure 4):

(a) $E_k^i = E_k^j$ for all $i, j$, $a_l^{i,k} = a_l^{j,k}$ for all $l$, and $v_l^{i,k} \neq v_l^{j,k}$ (e.g., two remote sensing images of the same region, from the same satellite, but at different seasons);

(b) $E_k^i = E_k^j$ for all $i, j$ and $a_l^{i,k} \neq a_l^{j,k}$ (e.g., two remote sensing images of the same region, having the same resolution, but from two different satellites with different sensors);

(c) $E_k^i \neq E_k^j$ for all $i, j \mid i \neq j$ (e.g., two remote sensing images of the same region, but having different resolutions, and from two different satellites with different sensors).

4.1 Multisource objects clustering

A first method to classify multisource objects is to merge the attributes from the different sources. Each object has a new description composed of the attributes of all the sources (Figure 5(a)). But this technique may produce many clusters because the description of the object would be too precise (i.e., would have a large number of attributes), so it is hard to discriminate the objects. Indeed, due to the curse of dimensionality [16], most of the classical distance-based algorithms are not efficient enough to analyse objects having many attributes, the distances between these objects not being different enough to correctly determine the nearest objects. In addition, increasing the spectral dimensionality worsens problems like the Hughes phenomenon [17], which describes the harmful effects of high-dimensional objects.

Figure 4: The three different cases of image comparison. (a) Same resolution/same sensors/different dates: a pixel is described by the same attributes (xs1, xs2, xs3) but has different values (e.g., 12, 32, 151 in $D^i$ and 15, 41, 131 in $D^j$) because of its evolution between the two dates. (b) Same resolution/different sensors: a pixel is described by three attributes (xs1, xs2, xs3) in the image on the left, but by four attributes (tm1, tm2, tm3, tm4) in the image on the right. (c) Different resolutions/different sensors: the image $D^i$ has a higher resolution than $D^j$; the two images do not have the same size and the pixels are no longer the same.

A second way to combine all the attributes (Figure 5(b)) is to first classify the objects with each dataset. These clusterings are made independently. Then a new description of each object is built, using the index of the cluster found for it by each of the first classifications. Finally, a classification is made using these new descriptions of the objects. The first phase of clusterings reduces the data space for the final clustering, making it easier. This approach is similar to the stacking method [18].
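The construction of the new descriptions can be shown with a very small sketch. It covers only the intermediate step (building one tuple of cluster indices per object); the final clustering of these tuples is left out, and the function name and the toy labelings are hypothetical.

```python
def stacked_descriptions(labelings):
    """Describe each object by the tuple of cluster indices it received in each source.

    labelings[s][o] is the cluster index of object o in the clustering of source s.
    """
    return list(zip(*labelings))


# Four objects clustered independently on two sources.
first_source = [0, 0, 1, 1]
second_source = [2, 2, 2, 0]
descriptions = stacked_descriptions([first_source, second_source])
```

Each object is now described by as many attributes as there are sources, whatever the number of original attributes, which is the reduction of the data space mentioned above.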

In our approach, the collaborative clustering (Figure 5(c)) is made almost as in the second method presented above. Each dataset is classified according to its attributes. However, the clusterings are not made independently: they are refined to make them converge towards a unique result. Only then are they unified by a voting method, or by a clustering as in method (b).

Figure 5: Different data fusion techniques. (a) The different data are merged to produce a new dataset which is classified. (b) Each dataset is classified independently by a different clustering method and the results are combined. (c) Each dataset is classified by a different clustering method that collaborates with the other methods, and then the results are combined.

To integrate this new approach in our system, we assign one dataset to each clustering method. The whole process of results refinement stays unchanged, but we are confronted with the problem of comparing the different results, and more precisely of estimating the intercluster similarity (see Section 3.1). In the first two cases presented above (same elements with different descriptions), the confusion matrix and the intercluster similarity defined in Section 3 can be used. However, in the third case (different elements with different descriptions), they cannot be applied because the computation of a confusion matrix between two clusterings requires that the clusters refer to the same objects. Defining a confusion matrix between datasets of different objects is in the general case very hard, or even impossible. Nevertheless, in some particular problems, it is possible to define it. In the next section, we describe how this matrix can be evaluated in the domain of multiscale remote sensing image clustering.

4.2 Multiscale remote sensing images classification

In remote sensing image classification, the problem of image resolution is not easy to resolve. The resolution of an image is the size covered by one pixel in the real world. For example, very high-resolution satellites give a resolution of 2.5 m, that is, one pixel is a square of 2.5 m × 2.5 m. One can have different images of the same area but not with the same resolution. It is therefore really difficult to use these different images together because they do not include the same objects to cluster (Figure 6).

Figure 6: How can someone compare objects that are different but that represent the same "real" object? The same reality (panel "Reality") is viewed at two different resolutions (panels "Clustering of low resolution image" and "Clustering of high resolution image"). For example, the river is composed of 17 pixels on the low-resolution image but of 43 pixels on the high-resolution image.

For example, satellites often produce two kinds of images of the same area, a panchromatic one and a multispectral one. The panchromatic image has a good spatial resolution but a low spectral resolution and, on the contrary, the multispectral image has a good spectral resolution but a low spatial resolution. A solution to use these two sources of information is to fuse the panchromatic and the multispectral images into a single one. Many methods have been investigated in the last few years to fuse these two kinds of images and to produce an image with good spectral and spatial resolution [19, 20].

A fused image can be used directly as input of our collaborative system. However, the fused image may not be available, or the user may prefer to process the images without fusing them. In these cases, we have to modify our system to be able to support images at different resolutions. The modification consists of a new definition of the confusion matrix (see (2)) between two clustering results.

In the previous definition given in Section 3, each line of the confusion matrix is given by the confusion vector $\alpha_k^{i,j}$ of the cluster $C_k^i$ from the result $R^i$ compared to the $n_j$ clusters found in the result $R^j$:

$$\alpha_k^{i,j} = \left(\alpha^{i,j}_{k,l}\right)_{l=1,\dots,n_j}, \quad \text{where } \alpha^{i,j}_{k,l} = \frac{\left|C_k^i \cap C_l^j\right|}{\left|C_k^i\right|}.$$

If the two results were not computed using the same data and if the resolutions of the two images are not the same, it is impossible to compute $|C_k^i \cap C_l^j|$. So we propose a new definition of the confusion vector for a class $C_k^i$ from the result $R^i$ compared to the result $R^j$.

Definition 1 (new confusion matrix). Let $r_i$ and $r_j$ be the resolutions of the two images $I_i$ and $I_j$; let $\lambda_{I_1,I_2}$ be a function that associates each pixel of the image $I_1$ to one pixel of the image $I_2$, with $r_1 \le r_2$; let $\#(C, I_1, I_2) = |\{p \in C : \mathrm{cluster}(\lambda_{I_1,I_2}(p)) = C\}|$; if $r_i \le r_j$

α i, j k,l =#

Ck i,I i,I j

else

α i, j k,l = #

Cl j,I j,I i

r j

With this new definition of the confusion matrix, the results can be compared with each other and evaluated as described previously. In the same way, the conflict resolution phase is unchanged.
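To give an idea of how a confusion matrix can be computed across two resolutions, the sketch below counts, for each cluster of the finer image, how its pixels are distributed over the clusters of their associated pixels in the coarser image. It is one plausible reading offered for illustration, not a transcription of Definition 1: the association function is assumed to be an integer downscaling (`row // scale`, `col // scale`), the images are assumed to be perfectly aligned, and the function name is hypothetical.

```python
from collections import Counter


def multires_confusion(high_labels, low_labels, scale):
    """Overlap ratios between clusters of a fine image and of a coarse image.

    high_labels and low_labels are 2-D lists of cluster labels; `scale` is the
    integer ratio between the two pixel sizes (e.g., 2 for 10 m versus 5 m).
    Returns {(k, l): share of the pixels of fine cluster k whose associated
    coarse pixel belongs to cluster l}.
    """
    counts = Counter()
    sizes = Counter()
    for r, row in enumerate(high_labels):
        for c, k in enumerate(row):
            # ASSUMPTION: lambda maps a fine pixel to the coarse pixel covering it.
            l = low_labels[r // scale][c // scale]
            counts[(k, l)] += 1
            sizes[k] += 1
    return {(k, l): n / sizes[k] for (k, l), n in counts.items()}


high = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 1, 1],
]
low = [
    [0, 1],
    [2, 1],
]
alpha = multires_confusion(high, low, scale=2)
```

In this toy case, four of the five pixels of fine cluster 0 fall in coarse cluster 0 and one falls in coarse cluster 2, so the corresponding ratios are 0.8 and 0.2.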

Because the images do not have the same resolution, it is not possible to apply the unification algorithm directly. In order to build a unique image representing all the results, we choose the maximal resolution, and the voting algorithm is applied using the association function $\lambda_{I_1,I_2}$ for each pixel. This choice was made to produce a result having the best spatial resolution among the different input images.

5 EXPERIMENTS

In this section, we present two experiments with our collaborative method on real images. In the first experiment, we use images from the satellite SPOT-5 to study an urban area. In the second experiment, we use the collaborative method to analyse a coastal zone through a set of heterogeneous images (SPOT-1, SPOT-5, ASTER).

To be able to use our system with images at different resolutions, we have to define a $\lambda$ function (Figure 7) which defines the correspondence between the pixels of two images. We use georeferencing [21] to define this function. In remote sensing, it is possible to associate real-world coordinates with the pixels of an image (i.e., their position on the globe). The georeferencing (here the Lambert 1 North coordinates) is used to map a pixel from one image to a pixel of another image at a different resolution. By using the georeferencing, we are certain to maximize the quality of the correspondence whatever the difference between the resolutions of the images.
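A georeferencing-based association function can be sketched as follows, assuming the simplest possible setting: both images are north-up, without rotation, and each is described by the map coordinates of its top-left corner and its pixel size. The function name and the coordinates used in the example are hypothetical and are not those of the images studied here.

```python
def make_lambda(origin1, res1, origin2, res2):
    """Build an association function: pixel (row, col) of I1 -> pixel (row, col) of I2.

    origin = (x, y) map coordinates of the top-left corner of the image;
    res = pixel size in map units. Assumes north-up images without rotation.
    No bounds check is done on the returned pixel.
    """
    x1, y1 = origin1
    x2, y2 = origin2

    def lam(row, col):
        # Map coordinates of the centre of the I1 pixel.
        x = x1 + (col + 0.5) * res1
        y = y1 - (row + 0.5) * res1
        # Pixel of I2 that contains this point.
        return int((y2 - y) // res2), int((x - x2) // res2)

    return lam


# Illustrative coordinates: a 5 m image and a 10 m image sharing the same corner.
lam = make_lambda(origin1=(1000.0, 5000.0), res1=5.0,
                  origin2=(1000.0, 5000.0), res2=10.0)
target = lam(3, 2)
```

Going through map coordinates rather than a fixed scale factor lets the two images have different origins and sizes that are not exact multiples of each other.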

5.1 Panchromatic and multispectral collaboration

The first experiment is the analysis of images of the city of Strasbourg (France). We use the images provided by the sensors of the satellite SPOT-5. The panchromatic image (Figure 8(a)) has a resolution of 5 meters (i.e., the width of one pixel represents 5 meters in the real world), a size of 865 × 1021 pixels, and a single band. The multispectral image (Figure 8(b)) has a resolution of 10 meters, a size of 436 × 511 pixels, and four bands (red, green, blue, and near infrared).

Figure 7: The function $\lambda_{I_1,I_2}$ is the association function between two images. It associates one pixel of the image $I_2$ to each pixel of the image $I_1$.

Figure 8: The two images of Strasbourg (France) from SPOT-5. (a) Panchromatic image (resolution: 5 meters; size: 865 × 1021). (b) Multispectral image (resolution: 10 meters; size: 436 × 511).

Our goal is to use these two heterogeneous (different resolutions, different numbers of bands, etc.) sources of data in our collaborative clustering system to show that using multisource images improves the image analysis and scene understanding. Figure 9 presents four different ways to use these two images with our collaborative system:

(a) six clustering methods working on the panchromatic image;

(b) six clustering methods working on the multispectral image;

(c) six clustering methods working on the fusion of the two images;

(d) three clustering methods working on the panchromatic image, and three clustering methods working on the multispectral image.

For case (c), we used the Gram-Schmidt algorithm to merge the panchromatic and the multispectral images. This algorithm is well known in the field of remote sensing image fusion, and usually produces good results [22].

Figure 9: The four test cases studied: (a) multispectral: collaborative clustering on the multispectral image; (b) panchromatic: collaborative clustering on the panchromatic image; (c) fusion: collaborative clustering on the fusion of the multispectral and the panchromatic images; (d) multisource: multisource collaborative clustering using the panchromatic and the multispectral images.

Table 1: Results with ground truth.

Classes    Multispectral  Panchromatic  Fusion   Collaborative
Building   42.24%         44.26%        67.92%   46.42%

We chose the K-Means [23] algorithm for each clustering method. This choice was made for computational convenience, but any clustering method can be used in the collaborative system. For each experiment ((a), (b), (c), and (d)), each clustering method is assigned to one image. Then, the collaborative system described in Section 3 is launched with the modifications added in Section 4 for multiresolution handling, which rely on georeferencing. The K-Means algorithm is applied to each image (step 1) with a different number of clusters (randomly picked in [8; 10]) and a random initialization (different for each method). Then, the clustering methods collaborate through the refinement step and modify their results according to the results of the other methods (step 2). Finally, the different results obtained are combined into a single one by a voting algorithm (step 3). Figure 10 presents the final unification result (obtained from the vote of the different methods) for the four test cases.
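Step 1 of this procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the K-Means routine is a plain Lloyd iteration written for this example, pixels are represented as tuples of band values, and the names `kmeans` and `initial_clusterings` are hypothetical. Steps 2 and 3 (refinement and voting) are not covered here.

```python
import random

def kmeans(pixels, k, rng, iterations=20):
    """Plain Lloyd K-Means on a list of feature tuples; returns one label per pixel."""
    centers = rng.sample(pixels, k)  # random initialization from the data
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in pixels
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

def initial_clusterings(images, seed=0):
    """Step 1: one K-Means run per method, each with its own k in [8, 10]."""
    rng = random.Random(seed)
    results = []
    for pixels in images:
        k = rng.randint(8, 10)  # bounds are inclusive
        results.append((k, kmeans(pixels, k, rng)))  # shared rng gives each run its own initialization
    return results
```

For the multisource case (d), `images` would hold three copies of the panchromatic pixels (one band each) and three copies of the multispectral pixels (four bands each), so that each method works on its own image.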

All the final results have seven clusters, owing to the capacity of the collaborative method to find a consensual number of clusters. According to the interpretation of the expert geographer, the following conclusions can be drawn. The panchromatic case (Figure 10(b)) produced a rather poor result, where part of the vegetation was merged with the water because of the lack of spectral information to describe the pixels (i.e., only one band). The fusion case (Figure 10(c)) produced a result with a good spatial resolution, but failed to find some real classes (i.e., the expert expected two clusters of vegetation, which were merged). The multispectral case (Figure 10(a)) produced a fairly good result, but with a low spatial resolution. Finally, the multisource collaboration (Figure 10(d)) produced a good result with a good spatial resolution, and corrected some mistakes which appear in the multispectral case. For example, the field in the top-right of the area was identified more precisely thanks to the collaboration with the panchromatic image (Figure 11).

Figure 10: Results for the four test cases studied: (a) multispectral (7 clusters); (b) panchromatic (7 clusters); (c) fusion (7 clusters); (d) multisource collaboration (7 clusters).

Figure 11: Examples of fields detection: (a) raw image; (b) ground truth; (c) multispectral; (d) panchromatic; (e) fusion; (f) collaborative. Panel (b) illustrates the ground truth for field (1) (on the left) and field (2) (on the right).

To validate these interpretations, a ground truth was provided by the expert as partial binary masks (Figure 11(b)) for four classes. For each ground truth class, the most likely cluster was selected by the expert (the best overlapping cluster as defined by the Vinet index in [24]). An accuracy index was computed as the ratio between the number of pixels in the ground truth class and the number of pixels of the cluster overlapping it. The results are presented in Table 1. As expected, the collaborative solution produced the best results, especially for the detection of fields.
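The validation step can be illustrated with a short Python sketch. It rests on two assumptions that are not confirmed by the text: the best overlapping cluster is chosen by raw pixel overlap instead of the Vinet index of [24], and the accuracy index is read as the fraction of ground truth pixels covered by the selected cluster. The name `best_cluster_accuracy` is hypothetical.

```python
from collections import Counter

def best_cluster_accuracy(truth_mask, labels):
    """Return (selected cluster, accuracy) for one ground truth class.

    truth_mask[x] is True where the expert marked the class;
    labels[x] is the cluster assigned to pixel x.
    """
    truth_size = sum(truth_mask)
    if truth_size == 0:
        raise ValueError("the ground truth mask is empty")
    # Count, per cluster, the pixels that fall inside the ground truth mask.
    overlap = Counter(lab for lab, inside in zip(labels, truth_mask) if inside)
    # Assumption: plain overlap count stands in for the Vinet index.
    cluster, covered = overlap.most_common(1)[0]
    # Assumption: accuracy is the share of ground truth pixels in that cluster.
    return cluster, covered / truth_size
```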

To study the evolution of the agreement amongst all the clustering methods during the refinement step, tools from information theory [25] can be used, with each clustering result treated as a random variable. The mutual information [26] can then be computed between a pair of clustering results. The mutual information quantifies the amount of information shared by the two results. For two results $R^i$ and $R^j$, the normalized mutual information, with values in [0; 1], is defined as

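$$\mathrm{nmi}(R^i, R^j) = \frac{2}{p} \sum_{k=1}^{n_i} \sum_{l=1}^{n_j} \log_{n_i \cdot n_j}\left(\frac{p\,\alpha^{i,j}_{k,l}}{n^i_k\, n^j_l}\right), \tag{13}$$

where $p$ is the number of pixels to classify, $n_i$ is the number of clusters from $R^i$, and $n^i_k$ is the number of objects in the cluster $C^i_k$.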
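The normalized mutual information between two clustering results can be sketched in Python as follows. The sketch uses the standard form in which each logarithm is weighted by the number of pixels shared by the two clusters, and it reads the overlap term as that shared pixel count; both points are assumptions, as are the function and variable names.

```python
import math
from collections import Counter

def nmi(labels_i, labels_j):
    """Normalized mutual information between two labelings of the same p pixels."""
    if len(labels_i) != len(labels_j):
        raise ValueError("both results must label the same pixels")
    p = len(labels_i)
    size_i = Counter(labels_i)                 # pixels per cluster of the first result
    size_j = Counter(labels_j)                 # pixels per cluster of the second result
    shared = Counter(zip(labels_i, labels_j))  # pixels shared by each pair of clusters (assumed overlap term)
    base = len(size_i) * len(size_j)           # logarithm base: product of the cluster counts
    if base < 2:
        raise ValueError("at least one result needs two or more clusters")
    total = 0.0
    for (k, l), n_kl in shared.items():        # empty intersections contribute nothing
        total += n_kl * math.log(p * n_kl / (size_i[k] * size_j[l]), base)
    return 2.0 * total / p
```

Two results that define the same balanced partition under different label names score 1, and two statistically independent partitions score 0.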

Moreover, the average mutual information quantifies the information shared among an ensemble of clustering results, and can be used as an indicator of agreement:

$$\mathrm{anmi}(m) = \frac{1}{N-1} \sum_{j=1,\, j \neq m}^{N} \mathrm{nmi}(R^m, R^j) \tag{14}$$

with $m = 1, 2, \ldots, N$, and $N$ the number of clustering results.
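The averaging in (14) can be sketched in Python as follows. The compact `nmi` helper uses the standard count-weighted form of the normalized mutual information, which is an assumption of this sketch, and the index `m` is 0-based here instead of running from 1 to $N$.

```python
import math
from collections import Counter

def nmi(a, b):
    """Compact normalized mutual information (standard weighted form, assumed)."""
    p, size_a, size_b, shared = len(a), Counter(a), Counter(b), Counter(zip(a, b))
    base = len(size_a) * len(size_b)  # must be at least 2
    return 2.0 / p * sum(n * math.log(p * n / (size_a[k] * size_b[l]), base)
                         for (k, l), n in shared.items())

def anmi(m, results):
    """Average nmi between result m and every other result (requires N >= 2)."""
    others = [r for j, r in enumerate(results) if j != m]
    return sum(nmi(results[m], r) for r in others) / len(others)
```

Tracking `anmi` for every method after each refinement iteration gives the kind of agreement curve discussed in this section.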

Figure 12: Evolution of the anmi index among the clustering methods and the average nmi between the results and the unified result. (Plot: horizontal axis, iteration, from 0 to 45; vertical axis from 0.55 to 0.85; two curves: anmi among the clustering methods, and anmi with the unified result.)

The average mutual information was computed during the refinement process which produced the result of Figure 10(d). Figure 12 presents the evolution of the anmi index among the results of the different clustering methods, and the average of the mutual information between each clustering method and the unified result.

5.2 Multiresolution multidate collaboration

The second experiment was carried out on four images of a coastal zone (Normandy Coast, Northwest of France). This area is very interesting because it is periodically affected by natural and anthropic phenomena which modify its structure. Consequently, the expert often has many heterogeneous images available, acquired over the years. Four images from three different satellites (SPOT-4, SPOT-5, and ASTER) with different resolutions (20, 15, 10, and 2.5 meters) are used.

Four clustering methods were set up, each one using one of the available images. As in the previous experiment, the K-Means algorithm is run on each image (step 1), the refinement algorithm is then applied (step 2), and the results are combined (step 3). Figure 14 presents the result of the unification of the final results.

To make a better interpretation of the unified result, a vote map is produced. This map represents the result of the vote carried out during the combination of the results [15]. Figure 15 presents the vote map corresponding to the result shown in Figure 14. In this image, the darker a pixel is, the less the clustering methods agree on it. Pixels where all the clustering methods agree are white, and black pixels represent a strong disagreement amongst the clustering methods. This degree of agreement is computed using the corresponding cluster (see (1)). This representation helps the expert to improve his analysis of the result by concentrating on the parts of the image where the clustering methods disagree.
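The idea of the vote map can be sketched in Python as follows. The sketch assumes that the labels of all methods have already been mapped to a common label set, which replaces the corresponding-cluster computation of (1), and it uses a simple majority vote; the name `vote` and the 0 to 255 grey scale are choices made for this example.

```python
from collections import Counter

def vote(aligned_labels):
    """Return (unified labels, vote map) for a set of clustering results.

    aligned_labels[m][x] is the label given to pixel x by method m, after all
    methods have been mapped to a common label set (assumption of this sketch).
    """
    n_methods = len(aligned_labels)
    if n_methods < 2:
        raise ValueError("voting needs at least two methods")
    unified, vote_map = [], []
    for pixel_votes in zip(*aligned_labels):
        label, count = Counter(pixel_votes).most_common(1)[0]
        unified.append(label)
        # 255 (white) when all methods agree, 0 (black) when no two methods agree.
        vote_map.append(255 * (count - 1) // (n_methods - 1))
    return unified, vote_map
```

Sorting the pixels by increasing grey level gives the expert a simple list of the areas to inspect first.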


Figure 13: The four images of Normandy Coast, France: (a) SPOT-4, 20 meters, 3 bands (659×188), date: 1999; (b) ASTER, 15 meters, 3 bands (922×256), date: 2004; (c) SPOT-4, 10 meters, 3 bands (1382×384), date: 2002; (d) SPOT-5, 2.5 meters, 3 bands (5528×1536), date: 2005.

Figure 14: The final unification result.

Figure 15: The vote map.

Another way to improve the scene understanding and to show the agreement between the methods is to visualise the corresponding clusters (see (1)) between a pair of results. This allows the expert to see which parts of the clusters are in agreement and which parts are in disagreement for a pair of results. Figure 16 presents two corresponding clusters between the clustering methods of this experiment.

Figure 16: Corresponding clusters between two clustering methods (agreement in grey, disagreement in black): (a) corresponding clusters showing disagreement in the fields; (b) corresponding clusters showing a part of the coast line.

In Figure 16(a), one can see the disagreement on a part of the coast line. Figure 16(b) illustrates the disagreement on the fields. All these results help the expert to improve his image understanding.
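The visualisation of a pair of corresponding clusters can be sketched in Python as follows. The corresponding cluster is taken here as the cluster with the largest pixel overlap, which is a stand-in for the definition given in (1), and the "background" value for pixels outside both clusters is an assumption; the function names are hypothetical.

```python
from collections import Counter

def corresponding_cluster(labels_i, labels_j, k):
    """Cluster of the second result sharing the most pixels with cluster k of the first."""
    overlap = Counter(lj for li, lj in zip(labels_i, labels_j) if li == k)
    if not overlap:
        raise ValueError("cluster k is empty")
    return overlap.most_common(1)[0][0]

def agreement_mask(labels_i, labels_j, k):
    """Mark each pixel as agreement (grey), disagreement (black), or background."""
    l = corresponding_cluster(labels_i, labels_j, k)
    mask = []
    for li, lj in zip(labels_i, labels_j):
        if li == k and lj == l:
            mask.append("grey")        # pixel belongs to both clusters
        elif li == k or lj == l:
            mask.append("black")       # pixel belongs to only one of the two clusters
        else:
            mask.append("background")  # pixel belongs to neither cluster
    return mask
```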

6 CONCLUSIONS

In this paper, we have presented a method of multisource image analysis using collaborative clustering. This collaborative method enables the user to exploit different heterogeneous images in an overall system. Each clustering method works on one image and collaborates with the other clustering methods to refine its result.

Experiments on the analysis of an urban area and a coastal area have been presented. The system produces a final result by combining the results of the different clustering methods using a voting algorithm. The agreement and the disagreement of the clustering methods can be highlighted by a vote map, depicting the accordance between the different clustering methods. Furthermore, the corresponding clusters between a pair of clustering methods can be visualised. These features help the expert to better understand his images.

However, there is still a lot of work for the expert to really interpret the information in the dataset, because no semantics are given by the system. That is why we are working on an extension of this process, integrating high-level domain knowledge on the studied area (urban objects ontology, spatial relationships, etc.). This should make it possible to add semantics to the result automatically, giving more information to the user.

ACKNOWLEDGMENTS

The authors would like to thank the members of the FodoMuST and Ecosgil projects for providing the images, and the geographers of the LIV Laboratory for their help in the interpretation of the results. This work is supported by the French Centre National d'Etudes Spatiales (CNES Contract 70904/00).
