A new interactive semi-supervised clustering model for large image database indexing
Hien Phuong Laia,b,c,⇑, Muriel Visania, Alain Bouchera,b,c, Jean-Marc Ogiera
a L3I, Université de La Rochelle, Avenue M. Crépeau, 17042 La Rochelle cedex 1, France
b IFI, Equipe MSI; IRD, UMI 209 UMMISCO, Institut de la Francophonie pour l'Informatique, 42 Ta Quang Buu, Hanoi, Vietnam
c Vietnam National University, Hanoi, Vietnam
Article info
Article history:
Available online 27 June 2013
Keywords:
Semi-supervised clustering
Interactive learning
Image indexing
Abstract
Indexing methods play a very important role in finding information in large image databases. They organize the indexed images in order to facilitate, accelerate and improve the results of later retrieval. Alternatively, clustering may be used for structuring the feature space so as to organize the dataset into groups of similar objects, without prior knowledge (unsupervised clustering) or with a limited amount of prior knowledge (semi-supervised clustering).
In this paper, we introduce a new interactive semi-supervised clustering model where prior information is integrated via pairwise constraints between images. The proposed method allows users to provide feedback in order to improve the clustering results according to their wishes. Different strategies for deducing pairwise constraints from user feedback are investigated. Our experiments on different image databases (Wang, PascalVoc2006, Caltech101) show that the proposed method outperforms the semi-supervised HMRF-kmeans (Basu et al., 2004).
© 2013 Elsevier B.V. All rights reserved.
1. Introduction
Content-Based Image Retrieval (CBIR) refers to the process of using visual information (usually encoded using color, shape and texture feature vectors, etc.) to search for images in a database that correspond to the user's query. Traditional CBIR systems generally rely on two phases. The first phase extracts the feature vectors from all the images in the database and organizes them into an efficient index data structure. The second phase efficiently searches the indexed feature space to find the images most similar to the query image.
With the development of many large image databases, an exhaustive search is generally intractable. Feature space structuring methods (usually called indexing methods) are therefore necessary for facilitating and accelerating further retrieval. They can be classified into space partitioning methods and data partitioning methods.

Space partitioning methods divide the feature space into cells (sometimes referred to as "buckets") of fairly similar cardinality (in terms of number of images per cell), without taking into account the distribution of the images in the feature space. Therefore, dissimilar points may be included in a same cell while similar points may end up in different cells. The resulting index is therefore not optimal for retrieval, as the user generally wants to retrieve images similar to the query image. Moreover, these methods are not designed to handle high-dimensional data, while image feature vectors commonly count hundreds of elements.
Data partitioning methods, in contrast, take into account information about the image distribution in the feature space. However, the limitations on the cardinality of the space cells remain, causing the resulting index to be non-optimal for retrieval, especially in the case where groups of similar objects are unbalanced, i.e. composed of different numbers of images.
Our claim is that using clustering instead of traditional indexing to organize feature vectors results in indexes better adapted to high-dimensional and unbalanced data. Indeed, clustering aims to split a collection of data into groups (clusters) so that similar objects belong to the same group and dissimilar objects are in different groups, with no constraints on the cluster size. This makes the resulting index better optimized for retrieval. In fact, while in traditional indexing methods it might be difficult to fix the number of objects in each bucket (especially in the case of unbalanced data),
⇑Corresponding author at: L3I, Université de La Rochelle, Avenue M Crépeau,
17042 La Rochelle cedex 1, France Tel.: +33 6 46 51 12 32; fax: +33 5 46 45 82 42.
E-mail addresses: hien_phuong.lai@univ-lr.fr (H.P Lai), muriel.visani@univ-lr.fr
(M Visani), alainboucher12@gmail.com (A Boucher), jean-marc.ogier@univ-lr.fr
(J.-M Ogier).
Pattern Recognition Letters
clustering methods have no limitation on the cardinality of the clusters; objects can be grouped into clusters of very different sizes. Moreover, using clustering might simplify the relevance feedback task, as the user might interact with a small number of cluster prototypes rather than numerous single images.
Because feature vectors only capture low-level information such as color, shape or texture, there is a semantic gap between the high-level semantic concepts expressed by the user and these low-level features. The clustering results are therefore generally different from the intent of the user. Our work aims to involve users in the clustering phase so that they can interact with the system in order to improve the clustering results. The clustering methods should therefore produce a hierarchical cluster structure where the initial clusters may be easily merged or split. We are also interested in clustering methods which can be incrementally built, in order to facilitate the insertion or deletion of new images by the user. It can be noted that incrementality is also very important in the context of huge image databases, when the whole dataset cannot be stored in the main memory. Another very important point is the computational complexity of the clustering algorithm, especially in an interactive online context where the user is involved.
In the case of large image database indexing, we may use unsupervised clustering (… et al., 2005) or semi-supervised clustering (Basu et al., 2002, 2004; Dubey et al., 2010; Wagstaff et al., 2001). While no information about ground truth is provided in the case of unsupervised clustering, a limited amount of knowledge is available in the case of semi-supervised clustering. The provided knowledge may consist of class labels (for some objects) or pairwise constraints (must-link or cannot-link) between objects.
In Lai et al. (2012a), we proposed a survey of unsupervised clustering techniques and analyzed the advantages and disadvantages of different methods in a context of huge masses of data where incrementality and hierarchical structuring are needed. We also experimentally compared different methods (… et al., 2003; AHC (Lance and Williams, 1967); R-tree (Guttman, …); BIRCH (Zhang et al., 1996)) with different real image databases of increasing sizes (Wang, PascalVoc2006, Caltech101, Corel30k; the number of images ranges from 1000 to 30,000) to study their scalability. In Lai et al. (2012b), we presented an overview of semi-supervised clustering methods and proposed a preliminary experiment of an interactive semi-supervised clustering model using the HMRF-kmeans method on the Wang image database, in order to analyze the improvement in the clustering process when user feedback is provided.
There are three main parts to this paper. Firstly, we propose a new interactive semi-supervised clustering model using pairwise constraints. Secondly, we investigate different methods for deducing pairwise constraints from user feedback. Thirdly, we experimentally compare our proposed semi-supervised method with the widely known semi-supervised HMRF-kmeans method.

This paper is structured as follows. A short review of semi-supervised clustering methods is given in Section 2. Section 3 presents the proposed interactive semi-supervised clustering model. Section 4 reports our experiments.
2. A short review of semi-supervised clustering methods
For unsupervised clustering, only similarity information is used to organize objects; in the case of semi-supervised clustering, a small amount of prior knowledge is available. Prior knowledge comes either in the form of class labels (for some objects) or pairwise constraints between objects. Pairwise constraints specify whether two objects should be in the same cluster (must-link) or in different clusters (cannot-link). As the clusters produced by unsupervised clustering may not be the ones required by the user, this prior knowledge is needed to guide the clustering process towards clusters which are closer to the user's wishes. For instance, when clustering a database with thousands of animal images, a user may want to cluster by animal species or by background landscape type. An unsupervised clustering method may give, as a result, a cluster containing images of elephants with a grass background together with images of horses with a grass background, and another cluster containing images of elephants with a sand background. These results are ideal when the user wants to cluster by background landscape type, but they are poor when the user wants to cluster by animal species. In this case, must-link constraints between images of elephants with a grass background and images of elephants with a sand background, and cannot-link constraints between images of elephants with a grass background and images of horses with a grass background, are needed to guide the clustering process. The objective of our work is to make the user interact with the system so as to define these constraints easily, with only a few clicks. Note that the available knowledge is too poor to be used with supervised learning, as only a very limited ratio of the available images is considered by the user at each step. In general, semi-supervised clustering methods are used to maximize intra-cluster similarity, to minimize inter-cluster similarity and to keep a high consistency between the partitioning and the domain knowledge.

Semi-supervised clustering has been developed over the last decade and a number of methods have been published to date. They can be divided into semi-supervised clustering with labels, where partial information about object labels is given, and semi-supervised clustering with constraints, where a small amount of pairwise constraints between objects is given.
Among the semi-supervised clustering methods using labeled objects, seeded-kmeans and constrained-kmeans (Basu et al., 2002) are based on the k-means algorithm. Prior knowledge for these two methods is a small subset of the input database, called the seed set, containing user-specified labeled objects of k different clusters. Unlike the k-means algorithm, which randomly selects the initial cluster prototypes, these two methods use the labeled objects to initialize the cluster prototypes. Then the re-assignment of each object in the dataset to the nearest prototype and the re-computation of the prototypes from their assigned objects are repeated until convergence. Seeded-kmeans assigns objects to the nearest prototype without considering the prior labels of the objects in the seed set. In contrast, constrained-kmeans keeps the labeled examples in their initial clusters and assigns the other objects to the nearest prototype. An interactive clustering model was proposed by Dubey et al. (2010) for document analysis. In this model, knowledge is progressively provided as assignment feedback and cluster description feedback after each interactive iteration. Using assignment feedback, the user moves an object from one cluster to another cluster. Using cluster description feedback, the user modifies the feature vector of a current cluster (e.g. increases the weighting of some important words). The algorithm learns from all the feedback to re-cluster the dataset, minimizing the average distance between points and their cluster centers while minimizing the violation of the constraints corresponding to the feedback.

Among the semi-supervised clustering methods using pairwise constraints between objects, we can cite COP-kmeans (Wagstaff et al., 2001), HMRF-kmeans (Basu et al., 2004) and semi-supervised kernel-kmeans.
The input of these methods is a data set X, a set of must-link constraints M and a set of cannot-link constraints C. In COP-kmeans, points are assigned to the closest cluster whose assignment violates no constraint; further clusters are tried until a suitable cluster is found. The clustering fails if no solution respecting the constraints is found. While constraint violation is strictly prohibited in COP-kmeans, it is allowed with a violation cost (penalty) in HMRF-kmeans and in semi-supervised kernel-kmeans. The objective function to be minimized in semi-supervised HMRF-kmeans is as follows:
J_HMRF-kmeans = Σ_{x_i ∈ X} D(x_i, μ_{l_i}) + Σ_{(x_i,x_j) ∈ M, l_i ≠ l_j} w_ij + Σ_{(x_i,x_j) ∈ C, l_i = l_j} w̄_ij    (1)

where l_i is the cluster label of x_i, μ_{l_i} is its cluster center and D is the distortion measure.
The penalty w_ij (respectively w̄_ij) for violating a must-link (respectively cannot-link) constraint may be either a constant or a function of the distance between the two points specified in the pairwise constraint, as follows:

w_ij = w · D(x_i, x_j)    (2)
w̄_ij = w̄ · (D_max − D(x_i, x_j))    (3)

where w and w̄ are constants specifying the cost for violating a must-link and a cannot-link constraint, and D_max is the maximum distance between two points in the data set. We can see that, to ensure the most difficult constraints are respected, higher penalties are assigned to violations of must-link constraints between points which are distant and to violations of cannot-link constraints between points which are close. Using D_max makes the cannot-link penalty term sensitive to extreme outliers, but all cannot-link constraints are treated in the same way, so even in the presence of extreme outliers there would be no cannot-link constraint with a negative violation cost. The term D_max is also sensitive to outliers; we can reduce this sensitivity by using the maximum distance between two clusters instead.

HMRF-kmeans first initializes the k cluster centers based on the user-specified constraints, then an iterative relocation approach similar to k-means is applied to minimize the objective function. The iterative algorithm repeats an assignment phase, in which each point is assigned to the cluster which minimizes its contribution to the objective function, and a re-estimation phase, in which the cluster centers are recomputed to minimize the objective function. Semi-supervised kernel-kmeans minimizes its objective function in a transformed space instead of the original space, using a kernel function mapping φ, as follows:
J_SS-kernel-kmeans = Σ_{x_i ∈ X} ‖φ(x_i) − μ_{l_i}‖² − Σ_{(x_i,x_j) ∈ M, l_i = l_j} w_ij + Σ_{(x_i,x_j) ∈ C, l_i = l_j} w̄_ij    (4)
Unlike Eq. (1), this formulation gives a reward for must-link constraint satisfaction when the two points are in the same cluster, by subtracting the corresponding penalty term from the objective function.
3. Proposed interactive semi-supervised clustering model
In this section, we present our proposed interactive semi-supervised clustering model. In our model, the initial clustering is carried out without any prior knowledge, using an unsupervised clustering method. In Lai et al. (2012a), we studied the suitability of different unsupervised clustering methods for our applied context (involving user interactivity) and experimentally compared different unsupervised clustering and indexing methods (AHC, R-tree, BIRCH, etc.); BIRCH appeared the most suitable for our context. BIRCH is less sensitive to variations in its parameters. Moreover, it is incremental, it provides a hierarchical structure of clusters and it outperforms the other methods in the context of a large database (best results and best computational time in our tests). Therefore, BIRCH is chosen for the initial unsupervised clustering in our model. After the initial clustering, the user views the clustering results and provides feedback to the system. Pairwise constraints (must-link, cannot-link) are deduced from the user feedback; the system then re-organizes the clusters by considering these constraints. The re-clustering is done using the proposed semi-supervised clustering method. This interactive loop (the user provides feedback and the system reorganizes the clusters) is repeated until the clustering result satisfies the user. The interactive semi-supervised clustering model contains the following steps:
1. Initial clustering using BIRCH unsupervised clustering.
2. Repeat:
   (a) Receive feedback from the user and deduce pairwise constraints.
   (b) Re-organize the clusters using the proposed semi-supervised clustering method.
   until the clustering result satisfies the user.
3.1 BIRCH unsupervised clustering

Let us briefly describe the BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) unsupervised clustering method (Zhang et al., 1996). The idea of BIRCH is to build a Clustering Feature tree (CF-tree).

We define a CF-vector summarizing the information of a cluster of N points {x⃗_i, i = 1..N} as CF = (N, LS⃗, SS), where LS⃗ and SS are respectively the linear sum and the square sum of the points:

LS⃗ = Σ_{i=1}^{N} x⃗_i;   SS = Σ_{i=1}^{N} ‖x⃗_i‖²

From the CF-vectors, we can simply compute the centroid and the radius (average distance from the points to the centroid) of a cluster, and also the distance between two clusters (e.g. the Euclidean distance between their centroids). A CF-tree is a balanced tree having three parameters B, L and T:
Each internal node contains at most B elements of the form (CF_i, child_i), where child_i is a pointer to its ith child node. Each leaf node contains at most L entries and also contains two pointers, prev and next, to link leaf nodes. All entries in a leaf node must have a radius lower than a threshold T (threshold condition).

The CF-tree is created by successively inserting points. Each new point is inserted into the closest leaf entry if the threshold condition is not violated; if this is impossible, a new entry CF_j is created for the new point. The corresponding internal and leaf nodes are split if necessary. After creating the CF-tree, we can use any clustering method (AHC, k-means, etc.) for clustering the leaf entries; we use one that is suitable to be combined with our proposed semi-supervised clustering in the interactive phase.
3.2 Proposed semi-supervised clustering method
At each interactive iteration, our semi-supervised clustering method is applied after receiving feedback from the users, in order to re-organize the clusters according to their wishes. Our semi-supervised clustering method considers the set S_CF of all leaf entries of the CF-tree. The supervision is provided as two sets of pairwise constraints between CF entries: a set M_CF of must-link constraints and a set C_CF of cannot-link constraints. A constraint between two CF entries involves these two entries, and therefore all the points which are included in them. The objective function to be minimized is as follows:
J_obj = Σ_{CF_i ∈ S_CF} D(CF_i, μ_{l_i}) + Σ_{(CF_i,CF_j) ∈ M_CF, l_i ≠ l_j} w · N_CF_i · N_CF_j · D(CF_i, CF_j) + Σ_{(CF_i,CF_j) ∈ C_CF, l_i = l_j} w̄ · N_CF_i · N_CF_j · (D_max − D(CF_i, CF_j))    (5)
where:

- The first term measures the distortion between each leaf entry and its cluster center.
- The second and third terms represent the penalty costs for violating, respectively, the must-link and cannot-link constraints between CF entries. w and w̄ are constants specifying the violation cost of a must-link and a cannot-link between two points, and N_CF_i and N_CF_j are the numbers of points in the two entries, so the violation cost of a pairwise constraint between two entries is weighted by the number of point pairs involved.
- D_max is the maximum distance between two CF entries in the data set. Therefore, higher penalties are assigned to violations of must-link between entries that are distant and of cannot-link between entries which are close. As in HMRF-kmeans, the term D_max is sensitive to extreme outliers, and could be replaced by the maximum distance between two clusters if the database contains extreme outliers.
In our case, we use the squared Euclidean distance, the most frequently used distortion measure. The distance between two entries CF_i = (N_CF_i, LS⃗_CF_i, SS_CF_i) and CF_j = (N_CF_j, LS⃗_CF_j, SS_CF_j) is defined as the distance between their means, as follows:

D(CF_i, CF_j) = Σ_{p=1}^{d} ( LS_CF_i(p)/N_CF_i − LS_CF_j(p)/N_CF_j )²    (6)

where d is the number of dimensions of the feature space.
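Eq. (6) uses only the N and LS⃗ components of the two entries; a direct transcription (with CF entries represented as (N, LS, SS) tuples, a naming assumption of ours) could be:

```python
def cf_distance(cf_i, cf_j):
    """Squared Euclidean distance between the means of two CF entries,
    computed directly from the (N, LS) components as in Eq. (6)."""
    return sum((a / cf_i[0] - b / cf_j[0]) ** 2
               for a, b in zip(cf_i[1], cf_j[1]))
```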
The proposed semi-supervised clustering is as follows:

Method:
1. Set t ← 0.
2. Repeat until convergence:
   (a) Re-assignment: compute the cluster labels {l_i^(t+1)} of the entries {CF_i, i = 1..m} so as to minimize the objective function.
   (b) Re-estimation: re-compute the cluster centers so as to minimize the objective function.
   (c) t ← t + 1.
In the re-assignment step, given the current cluster centers, each CF entry is assigned to the cluster h which minimizes its contribution to the objective function, as follows:

J_obj(CF_i, μ_h) = D(CF_i, μ_h) + Σ_{(CF_i,CF_j) ∈ M_CF, h ≠ l_j} w · N_CF_i · N_CF_j · D(CF_i, CF_j) + Σ_{(CF_i,CF_j) ∈ C_CF, h = l_j} w̄ · N_CF_i · N_CF_j · (D_max − D(CF_i, CF_j))    (7)

We can see that the optimal assignment of each CF entry also depends on the current assignment of the other CF entries, due to the violation cost of pairwise constraints in the second and third terms. Therefore, the CF entries are randomly re-ordered, and the re-assignment process is repeated until no CF entry changes its cluster label between two successive iterations.
In the re-estimation step, the cluster centers are re-computed to minimize the objective function given the current assignment. For simple calculation, each cluster center is also represented in the form of a CF-vector. By using the squared Euclidean measure, the CF-vector of the center μ_h of cluster h is computed from the entries which are assigned to this cluster as follows:

N_μh = Σ_{l_i = h} N_CF_i;   LS⃗_μh = Σ_{l_i = h} LS⃗_CF_i;   SS_μh = Σ_{l_i = h} SS_CF_i

In the re-assignment step, a CF entry changes its cluster label only if the objective function is decreased by this re-assignment; the objective function therefore decreases monotonically during this step. And in each re-estimation step, the mean of the CF-vector of each cluster center is the mean of all the entries (and therefore the points) in this cluster, which minimizes the distortion part of the objective function involved in cluster center re-estimation; the objective function therefore also decreases during this step. By alternating the re-assignment and re-estimation steps, the proposed semi-supervised clustering converges to a (at least local) minimum in each interactive iteration.
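The alternating scheme above can be sketched end to end. The following is a simplified illustration under assumptions of our own (entries reduced to (N, LS⃗) pairs, unit violation costs w = w̄ = 1, naive initialization), not the authors' code:

```python
def semi_supervised_cluster(cfs, k, must, cannot, w=1.0, wbar=1.0, iters=20):
    """Alternate (a) assigning each CF entry to the cluster minimizing its
    contribution to Eq. (7) and (b) re-estimating centers by summing the
    CF components of the members. cfs: list of (N, LS) tuples;
    must/cannot: lists of index pairs into cfs."""
    d = len(cfs[0][1])
    mean = lambda cf: [v / cf[0] for v in cf[1]]
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    dmax = max(dist(mean(a), mean(b)) for a in cfs for b in cfs)
    centers = [mean(cfs[h]) for h in range(k)]      # naive initialization
    labels = [i % k for i in range(len(cfs))]
    for _ in range(iters):
        changed = False
        for i, cf in enumerate(cfs):
            def cost(h):
                j_obj = dist(mean(cf), centers[h])
                for a, b in must:                   # broken must-link penalty
                    if i in (a, b):
                        j = b if a == i else a
                        if labels[j] != h:
                            j_obj += w * cf[0] * cfs[j][0] * dist(mean(cf), mean(cfs[j]))
                for a, b in cannot:                 # broken cannot-link penalty
                    if i in (a, b):
                        j = b if a == i else a
                        if labels[j] == h:
                            j_obj += wbar * cf[0] * cfs[j][0] * (dmax - dist(mean(cf), mean(cfs[j])))
                return j_obj
            best = min(range(k), key=cost)
            if best != labels[i]:
                labels[i], changed = best, True
        for h in range(k):                          # center re-estimation
            members = [cfs[i] for i, l in enumerate(labels) if l == h]
            if members:
                n = sum(m[0] for m in members)
                centers[h] = [sum(m[1][p] for m in members) / n for p in range(d)]
        if not changed:
            break
    return labels
```

A real implementation would re-order the entries randomly between passes, as described above; the fixed order here keeps the sketch deterministic.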
After each interactive iteration, new constraints are given to the system. These new constraints might contradict some of the constraints previously deduced by the system from earlier interactive iterations. For this reason, and also for computational time matters, our system omits at each step some of the constraints deduced at earlier steps. Therefore, the objective function changes from one interactive iteration to the next, and the convergence of the interactive semi-supervised model is thus not guaranteed. But we can verify the convergence of the model in practice by computing, at the end of all the interactive iterations, a global objective function which considers all the feedback given by the user in all the interactive iterations, and then by verifying whether this global objective function has improved after the different interactive steps. This is a part of our current work.
3.3 Interactive interface
In order to allow the user to view the clustering results and to provide feedback to the system, we implemented an interactive interface (Fig. 1). Its main component is a 2D principal plane representing all the presented clusters by their prototype images. In our system, the maximum number of cluster prototypes presented to the user on the principal plane is fixed at 30. The prototype image of each cluster is the most representative image of that cluster, chosen as follows. In our model, we use the internal measure SW to estimate the quality of each image in a cluster. The higher the SW value of an image in a cluster, the more representative this image is for the cluster. The prototype image of a cluster is thus the image with the highest SW value in the cluster. Any other internal measure could be used instead.

The position of the prototype image of each cluster in the principal plane represents the position of the corresponding cluster center. This means that, if two cluster centers are close (or distant) in the n-dimensional feature space, their prototype images are close (or distant) in the 2D principal plane. For representing the cluster centers, which are n-dimensional vectors, in a 2D plane, we use Principal Component Analysis (PCA) and project them onto the two principal axes associated with the highest eigenvalues. The importance of an axis is represented by its inertia (the sum of the squared projections of the data onto this axis) and by the ratio of its inertia to the total inertia of all axes. In general, if the two principal axes explain (cumulatively) 80% or more of the total inertia, the PCA approach can lead to a good 2D representation of the prototype images. In our case, the accumulated inertia explained by the first two principal axes is about 65% for the Wang and PascalVoc2006 databases and about 20% for the Caltech101 and Corel30k image databases. As only a maximum of 30 clusters (and therefore 30 prototype images) can be shown to the user in an interactive iteration, an imperfect 2D representation of the prototype images does not influence the results, as long as the user can distinguish between the prototype images and have a rough idea of the distances between the clusters. When some prototype images overlap each other, a slight modification of the PCA components can help to separate these images.
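For illustration, the PCA projection of the cluster centers onto the first two axes can be sketched without any linear-algebra library, using power iteration with deflation (a simplification of ours; any standard PCA routine would do):

```python
def pca_2d(points, iters=200):
    """Project n-dim points onto the two principal axes of largest
    inertia, via power iteration on the covariance matrix."""
    n, d = len(points), len(points[0])
    m = [sum(p[j] for p in points) / n for j in range(d)]
    x = [[p[j] - m[j] for j in range(d)] for p in points]       # center data
    cov = [[sum(r[i] * r[j] for r in x) / n for j in range(d)] for i in range(d)]

    def top_eigvec(c):
        v = [1.0] * d
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(t * t for t in w) ** 0.5
            v = [t / norm for t in w]
        return v

    v1 = top_eigvec(cov)
    lam1 = sum(v1[i] * sum(cov[i][j] * v1[j] for j in range(d)) for i in range(d))
    # Deflate the first axis, then extract the second.
    cov2 = [[cov[i][j] - lam1 * v1[i] * v1[j] for j in range(d)] for i in range(d)]
    v2 = top_eigvec(cov2)
    dot = lambda a, b: sum(p * q for p, q in zip(a, b))
    return [(dot(p, v1), dot(p, v2)) for p in x]
```

Because the two axes are orthonormal, distances inside the plane they span are preserved, which is what lets the 2D layout reflect inter-cluster distances.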
By clicking on a prototype image in the principal plane, the user can open a detailed view of the corresponding cluster. Each cluster opened by the user is represented by a circle:
The prototype image of this cluster is located at the center of the circle
The 10 most representative images (images with the highest
SW values), which have not received feedback from the user
in the previous iterations, are located in the first circle of images around the prototype image, near the center
Fig. 1. 2D interactive interface. The rectangle at the bottom right corner represents the principal plane consisting of the two first principal axes (obtained by PCA) of the cluster centers.
The 10 least representative images (images with the smallest
SW values), which have not received feedback from the user
in the previous iterations, are located in the second circle of
images around the prototype image, close to the cluster border
By showing, for each iteration, the images which have not received
user feedback in previous iterations, we wish to obtain feedback
for different images
The user can specify positive feedback and negative feedback for the images of a cluster. The user can also change the cluster assignment of a given image by dragging and dropping the image from the original cluster to the new cluster. When an image is moved from cluster A to cluster B, this is considered as negative feedback for cluster A and positive feedback for cluster B. Therefore, after each interactive iteration, the process returns a positive image list and a negative image list for each cluster with which the user has interacted.
3.4 Pairwise constraint deduction
In each interactive iteration, the user feedback is in the form of positive and negative images, while the supervised input information of the proposed semi-supervised clustering method consists of pairwise constraints between CF entries. Therefore, we have to deduce the pairwise constraints between CF entries from the user feedback.
At each interactive iteration and for each interacted cluster, all the positive images should stay in this cluster while the negative images should move to another cluster. We consider that each image in the positive set is linked to each image in the negative set by a cannot-link, while all the images in the positive set are linked by must-links. Assuming that all feedback is coherent between the different interactive iterations, we group the images which should be in the same cluster, according to the user feedback of all the interactive iterations, into a group called a neighborhood. We define:

- Np: the list of neighborhoods, each containing images which should belong to a same cluster.
- CannotNp: for each neighborhood, a list including the labels of the neighborhoods which should not be in the same cluster; two neighborhoods are cannot-link neighborhoods if there is at least one cannot-link between their images.
After receiving the list of feedback in the current iteration, the lists Np and CannotNp are updated as follows. For each cluster which receives interaction from the user:

- If no positive image of the cluster belongs to an existing neighborhood → create a new neighborhood for these positive images.
- If the positive images belong to one or multiple neighborhoods → merge these neighborhoods (in the case of multiple neighborhoods) into one single neighborhood, insert the other positive images which are not included in any neighborhood into this neighborhood, and update the set CannotNp to signify that the neighborhoods that had a cannot-link with one of the merged neighborhoods now have a cannot-link with the new neighborhood.
- Update CannotNp so that the neighborhoods containing negative images of the cluster have a cannot-link with the neighborhood corresponding to the positive images of the cluster.
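One possible realization of the merge step above (data structures and function names are hypothetical, not from the paper):

```python
def update_neighborhoods(neighborhoods, cannot, positives):
    """Update Np/CannotNp for one interacted cluster.
    neighborhoods: dict id -> set of image ids; cannot: dict id -> set of
    neighborhood ids it has a cannot-link with; positives: positive images."""
    hits = {nid for nid, imgs in neighborhoods.items() if imgs & set(positives)}
    if not hits:
        # No positive image belongs to a neighborhood: create a new one.
        nid = max(neighborhoods, default=0) + 1
        neighborhoods[nid] = set(positives)
        cannot[nid] = set()
        return nid
    # Merge all touched neighborhoods into one and add the positives.
    target = min(hits)
    for nid in hits - {target}:
        neighborhoods[target] |= neighborhoods.pop(nid)
        cannot[target] |= cannot.pop(nid)
        # Re-point cannot-links that referenced a merged neighborhood.
        for other in cannot:
            if nid in cannot[other]:
                cannot[other].discard(nid)
                cannot[other].add(target)
    neighborhoods[target] |= set(positives)
    cannot[target].discard(target)
    return target
```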
As we assume that the user feedback is coherent among the different interactive iterations, all the images in a same neighborhood should be in a same cluster and the images of cannot-link neighborhoods should be in different clusters. There may, however, be cannot-links between images inside a same CF entry; such an entry must be split. Inside the entry, we call seeds the subsets of images corresponding to the different neighborhoods, together with the images which are not included in any neighborhood. Cannot-links may or may not exist between the seeds of a CF entry. For each CF entry that should be split, we present the user with each pair of seeds which do not have a cannot-link between them, to ask for more information (for each seed, the image which is closest to the center of the seed is presented):

- If the user indicates that there is a must-link between these two seeds, these seeds and also their corresponding neighborhoods are merged.
- If the user indicates that there is a cannot-link between these two seeds, the corresponding CannotNp lists are updated, specifying that their two corresponding neighborhoods have a cannot-link between them.

A CF entry with p remaining seeds is then split into p different CF entries; each new CF entry contains all the points of one seed, and the points which do not belong to any seed are assigned to the CF entry corresponding to the closest seed. By splitting the necessary CF entries into purer CF entries, we eliminate the cases where a cannot-link exists between images of a same CF entry or where a must-link and a cannot-link exist simultaneously between images of two different CF entries. Subsequently, pairwise constraints between CF entries can be deduced from the pairwise constraints between images as follows: if there is a must-link (respectively cannot-link) between two images of two CF entries, a must-link (respectively cannot-link) is created between these two CF entries.
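The seed-based split can be sketched as follows (a hypothetical helper of ours; points as coordinate lists, seeds as lists of point lists):

```python
def split_cf_entry(points, seeds):
    """Split an impure CF entry: each seed becomes a new sub-entry, and
    every remaining point joins the sub-entry of its closest seed centroid."""
    mean = lambda vs: [sum(c) / len(vs) for c in zip(*vs)]
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    centers = [mean(s) for s in seeds]
    parts = [list(s) for s in seeds]
    seeded = {tuple(p) for s in seeds for p in s}
    for p in points:
        if tuple(p) not in seeded:
            h = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            parts[h].append(p)
    return parts
```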
Concerning pairwise constraints between images, a simple and complete way to deduce them is to create a must-link between each pair of images of a same neighborhood, and to create, for each pair of cannot-link neighborhoods, a cannot-link between each image of the first and each image of the second. However, by deducing constraints between images in this way, the number of constraints between images can be very high, and therefore the number of constraints between CF entries could also be very high. The processing time of the semi-supervised clustering in the next phase could thus be very high due to the high number of constraints. There are different strategies for deducing pairwise constraints between images that could reduce the number of constraints. For instance, must-links may be created only between the positive images of each cluster, while cannot-links are created between the positive and negative images of each cluster (note that a displacement feedback corresponds to both a negative image of the source cluster and a positive image of the destination cluster).¹

¹ For interpretation of color in Fig. 1, the reader is referred to the web version of this article.
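For one interacted cluster, the simple complete scheme reduces to a few set operations (a sketch with hypothetical names):

```python
from itertools import combinations

def deduce_pairs(positive, negative):
    """Deduce image-level constraints from one cluster's feedback:
    must-link between every two positive images, cannot-link between
    each positive and each negative image."""
    must = set(combinations(sorted(positive), 2))
    cannot = {(p, n) for p in positive for n in negative}
    return must, cannot
```

The quadratic growth of both sets is exactly why the reduced strategies discussed above matter for large clusters.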
4. Experiments
In this section, we present some experimental results of our interactive semi-supervised clustering model. We also experimentally compare our semi-supervised clustering model with the semi-supervised HMRF-kmeans. When using the semi-supervised HMRF-kmeans in the re-clustering phase, the initial unsupervised clustering is k-means.
4.1 Experimental protocol
In order to analyze the performance of our interactive semi-supervised clustering model, we use different image databases (Wang, PascalVoc2006 and Caltech101, the latter divided into 101 classes). Note that in our experiments we use the same number of clusters as the number of classes in the ground truth. Cluster prototypes are shown to the user on the principal plane; users can choose to view and interact with any cluster in which they are interested. For databases which have a small number of classes, such as Wang and PascalVoc2006, all the prototype images can be shown on the principal plane. For databases which have a large number of classes, such as Caltech101, only a part of the prototype images can be shown for visualization. In our system, the maximum number of cluster prototypes shown to the user in each iteration is fixed at 30. We use two simple strategies for choosing the clusters to be shown at each iteration: 30 clusters chosen randomly, or iteratively chosen pairs of closest clusters until there are 30 clusters.
External measures compare the clustering results with the ground truth; they are thus appropriate for estimating the quality of an interactive clustering involving user interaction. As different external measures analyze the clustering results in a similar way (see Lai et al. (2012a)), we use, in this paper, the V-measure: the higher the V-measure values are, the better the results (compared to the ground truth).
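Since the evaluation relies on the V-measure, a self-contained sketch of its computation (the harmonic mean of homogeneity and completeness, both defined via conditional entropies) may help; this follows the standard definition, not any code from the paper:

```python
from math import log
from collections import Counter

def v_measure(truth, pred):
    """V-measure = 2hc/(h+c), with homogeneity h = 1 - H(truth|pred)/H(truth)
    and completeness c = 1 - H(pred|truth)/H(pred)."""
    n = len(truth)

    def entropy(labels):
        return -sum(c / n * log(c / n) for c in Counter(labels).values())

    def cond_entropy(a, b):  # H(A | B)
        h = 0.0
        for bv in set(b):
            idx = [i for i in range(n) if b[i] == bv]
            for c in Counter(a[i] for i in idx).values():
                h -= c / n * log(c / len(idx))
        return h

    hc, hk = entropy(truth), entropy(pred)
    homogeneity = 1.0 if hc == 0 else 1.0 - cond_entropy(truth, pred) / hc
    completeness = 1.0 if hk == 0 else 1.0 - cond_entropy(pred, truth) / hk
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)
```

A perfect clustering scores 1.0 even if the cluster ids are permuted with respect to the class labels, which is what makes the measure usable with unsupervised output.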
Concerning feature descriptors, we use the local descriptor rgSIFT, chosen for its high performance. The SIFT descriptor detects interest points in an image and describes the local neighborhood around each interest point by a 128-dimensional histogram of local gradient directions of image intensities. The rgSIFT descriptor of each interest point is computed as the concatenation of the SIFT descriptors calculated for the r and g components of the normalized RGB color model and for the intensity channel, resulting in a 3x128-dimensional vector. A bag-of-visual-words approach is then used to group the local features of each image into a single vector. It consists of two steps. Firstly, k-means clustering is used to group the local features of all the images in the database into a number dictSize of clusters. We then generate a dictionary containing dictSize visual words, which are the centroids of these clusters. The feature vector of each image is a dictSize-dimensional histogram representing the frequency of occurrence of the visual words of the dictionary, obtained by replacing each local descriptor of the image by the nearest visual word. Our previous experiments showed that local descriptors are better than global descriptors regarding the external measures, and that the value dictSize = 200 is a good trade-off between the size of the feature vector and the performance. Therefore, in our experiments, we use the rgSIFT descriptor together with a visual word dictionary of size 200.
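The encoding step of the bag-of-visual-words pipeline described above can be sketched as follows (the dictionary is assumed to be given, e.g. produced by k-means; names are our own):

```python
def bovw_histogram(descriptors, dictionary):
    """Replace each local descriptor by its nearest visual word and
    build a normalized occurrence histogram over the dictionary."""
    hist = [0] * len(dictionary)
    for d in descriptors:
        nearest = min(range(len(dictionary)),
                      key=lambda k: sum((a - b) ** 2
                                        for a, b in zip(d, dictionary[k])))
        hist[nearest] += 1
    total = sum(hist) or 1
    return [h / total for h in hist]
```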
In order to run the interactive tests automatically, we implement a software agent, later referred to as the "user agent", that simulates the behavior of a human user interacting with the system (assuming that the agent knows the whole ground truth, containing the class label of each image). At each interactive iteration, the clustering results are returned to the user agent by the system; the agent then simulates the behavior of the user giving feedback to the system. For simulating the user behavior, we follow these rules:
– At each interactive iteration, the user agent interacts with a fixed number c of clusters.
– The user agent uses two strategies for choosing clusters: c clusters chosen randomly, or iteratively chosen pairs of closest clusters until there are c clusters.
Fig. 2. Example of pairwise constraint deduction between images from the user feedback.
2 http://wang.ist.psu.edu/docs/related/.
3 http://pascallin.ecs.soton.ac.uk/challenges/VOC/.
– The user agent determines the image class (in the ground truth) corresponding to each cluster by the most represented class among the 21 presented images of the cluster. The number of images of this class in the cluster must be greater than a threshold MinImages; if this is not the case, the cluster is considered as a noise cluster. In our experiments, MinImages = 5 for databases having a small number of classes (Wang, PascalVoc2006), and MinImages = 2 for databases having a large number of classes (Caltech101).
– When several clusters (among the chosen clusters) correspond to the same class, the cluster in which the images of this class are the most numerous (among the 21 shown images of the cluster) is chosen as the principal cluster of this class. The classes of the other clusters are redefined in the same way, but ignoring the images of this class.
– In each chosen cluster, all images for which the result of the algorithm corresponds to the ground truth are labeled as positive samples of this cluster, while the others are negative samples of this cluster. All negative samples are moved to the cluster (among the chosen clusters) corresponding to their class in the ground truth.
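The rules above can be sketched as a simple feedback function. The function name `agent_feedback` and the dictionary-based ground truth are illustrative assumptions, not the authors' code:

```python
from collections import Counter

MIN_IMAGES = 5  # MinImages threshold (5 for Wang/PascalVoc2006, 2 for Caltech101)

def agent_feedback(shown_images, ground_truth):
    """Simulated user feedback for one cluster: determine the cluster's
    class as the majority class among the (up to 21) shown images, then
    split the shown images into positive and negative samples. If the
    majority count does not exceed MIN_IMAGES, the cluster is treated as
    a noise cluster (illustrative sketch)."""
    labels = [ground_truth[img] for img in shown_images]
    cls, count = Counter(labels).most_common(1)[0]
    if count <= MIN_IMAGES:
        return None, [], []  # noise cluster: no class, no feedback
    positives = [img for img in shown_images if ground_truth[img] == cls]
    negatives = [img for img in shown_images if ground_truth[img] != cls]
    return cls, positives, negatives
```

In the full protocol, the negatives returned here would then be moved to the chosen cluster matching their ground-truth class.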
Our model deduces pairwise constraints between images based on the user feedback in each iteration and also on the neighborhood information. User feedback is in the form of positive and negative images of each cluster (an image which is displaced from one cluster to another is considered as a negative image of the source cluster and a positive image of the destination cluster). The neighborhood information accumulates the user feedback given during all interactive iterations. Pairwise constraints between images can be used directly by the semi-supervised HMRF-kmeans, while they have to be converted into constraints between CF entries to be used by our proposed semi-supervised clustering. We divide pairwise constraints between images into two kinds: user constraints and deduced constraints. User constraints are created directly, based on the user feedback in each iteration, while deduced constraints are created by deduction rules. For instance, in the first iteration, must-links are created between positive images of a cluster and cannot-links between positive and negative images of the same cluster.
Table 1
Different strategies for deducing pairwise constraints between images, based on user feedback and on neighborhood information.
Strategy 1
Constraints used: all user constraints of all interactive iterations; all deduced constraints of all interactive iterations.
Creation: all constraints are created based on the neighborhood information:
– Must-link between each pair of images of each neighborhood.
– Cannot-link between each image of each neighborhood Np_i ∈ Np and each image of each neighborhood having a cannot-link with Np_i (listed in cannotNp_i).

Strategy 2
Constraints used: all user constraints of all interactive iterations; no deduced constraints.
Creation: in each iteration, all possible user constraints are created:
– Must-link between each pair of positive images of each cluster.
– Cannot-link between each pair of a positive image and a negative image of the same cluster.

Strategy 3
Constraints used: all user constraints of all interactive iterations; deduced constraints of the current iteration only (deduced constraints from the previous iterations are eliminated).
Creation: in each iteration, all possible user constraints are created as in Strategy 2. Deduced constraints of the current iteration are created while updating the neighborhoods as follows:
– If there is a must-link (or cannot-link) (x_i, x_j) with x_j ∈ Np_m, deduced must-links (or cannot-links) (x_i, x_l), x_l ∈ Np_m, are created.
– If there is a must-link (or cannot-link) (x_i, x_j) with x_i ∈ Np_m and x_j ∈ Np_n, deduced must-links (or cannot-links) (x_k, x_l), ∀x_k ∈ Np_m, ∀x_l ∈ Np_n, are created.

Strategy 4
Constraints used: user constraints between images and cluster centers of all interactive iterations; deduced constraints between images and cluster centers of the current iteration (deduced constraints from the previous iterations are eliminated).
Creation: in each iteration, the positive image having the best internal measure (SW) value among all positive images of each cluster is taken as the center of this cluster. Must-link/cannot-link user constraints are created in each iteration between each positive/negative image and the corresponding cluster center. Deduced constraints of the current iteration are created while updating the neighborhoods as follows:
– If x_i and x_j must be in the same (or different) clusters (based on user feedback) and x_j ∈ Np_m, deduced must-links (or cannot-links) are created between x_i and each center image of Np_m.
– If x_i and x_j must be in the same (or different) clusters (based on user feedback), with x_i ∈ Np_m and x_j ∈ Np_n, deduced must-links (or cannot-links) are created between x_i and each center image of Np_n, and between x_j and each center image of Np_m.

Strategy 5
Constraints used: user constraints (must-links between the most distant images and cannot-links between the closest images) of all iterations; deduced constraints (must-links between the most distant images and cannot-links between the closest images) of all iterations.
Creation: user constraints are created for each cluster in each iteration as follows: must-links are successively created between the two positive images that have the longest distance (at least one of them not yet selected by any must-link) until all positive images of the cluster are connected by these must-links; cannot-links are created between each negative image and the nearest positive image of the cluster. Deduced constraints are created in each iteration as follows: must-links are successively created, for each neighborhood, between the two images that have the longest distance until all images of this neighborhood are connected by these must-links; cannot-links are deduced, for each pair of cannot-link neighborhoods (Np_i, Np_j), between each image of Np_i and the nearest image of Np_j, and between each image of Np_j and the nearest image of Np_i.

Strategy 6
Constraints used: same idea as in Strategy 5, but the size of the neighborhoods is considered while creating deduced cannot-links.
Creation: user constraints and deduced must-link constraints are created as in Strategy 5. For each pair of cannot-link neighborhoods, deduced cannot-links are only created between each image of the neighborhood that has the fewest images and the nearest image of the neighborhood that has the most images.
In the example of Fig. 2, a cannot-link (x_3, x_4) is created based on the must-link (x_1, x_4) and the cannot-link involving x_1 and x_3; further constraints can be created based on the neighborhood information. In our experiments, we use the different strategies of Table 1 for deducing pairwise constraints.
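As an illustration of such deduction rules, a Strategy-3-style propagation of a single user constraint through the neighborhoods could look like the following sketch (an assumption-laden illustration, not the authors' code; neighborhoods are modeled as disjoint sets of image ids):

```python
def deduce_constraints(x_i, x_j, neighborhoods):
    """Propagate one user constraint (x_i, x_j) through the neighborhoods,
    in the spirit of Strategy 3 (Table 1). `neighborhoods` is a list of
    disjoint sets of image ids; the constraint type (must-link or
    cannot-link) is tracked by the caller and applies to all deductions."""
    find = lambda x: next((nb for nb in neighborhoods if x in nb), None)
    np_m, np_n = find(x_i), find(x_j)
    deduced = set()
    if np_n is not None:
        # x_j belongs to Np_n: deduce (x_i, x_l) for every x_l in Np_n
        deduced |= {(x_i, x_l) for x_l in np_n if x_l != x_i}
    if np_m is not None and np_n is not None:
        # x_i in Np_m and x_j in Np_n: deduce (x_k, x_l) for all pairs
        deduced |= {(x_k, x_l) for x_k in np_m for x_l in np_n if x_k != x_l}
    deduced.discard((x_i, x_j))  # keep only newly deduced constraints
    return deduced
```

For example, with neighborhoods {x_1, x_2} and {x_3, x_4}, a constraint (x_1, x_3) propagates to (x_1, x_4), (x_2, x_3) and (x_2, x_4).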
4.2 Experimental results
4.2.1 Analysis of different strategies for deducing pairwise constraints between images
The first set of experiments aims at evaluating the performance of our interactive semi-supervised clustering model using different strategies for deducing pairwise constraints between images. Note that constraints between CF entries should be deduced from constraints between images before being used in the re-clustering phase. We use the Wang and the PascalVoc2006 image databases for these experiments. For these two databases, we propose three test scenarios (note that c specifies the number of clusters chosen for interaction in each iteration):
– Scenario 1: the c = 5 closest clusters are chosen.
– Scenario 2: c = 5 clusters are randomly chosen.
– Scenario 3: c = 10, i.e. all clusters are chosen (Wang and PascalVoc2006 both have 10 clusters).
Fig. 3. Results of our proposed interactive semi-supervised clustering model during 50 interactive iterations on (a) the Wang and (b) the PascalVoc2006 image databases, using the 6 strategies for deducing pairwise constraints. The horizontal axis specifies the number of iterations.
Note that our experiments are carried out automatically, i.e. the feedback is given by a software agent simulating the behavior of a human user interacting with the system. In practice, a human user can give feedback by clicking to specify the positive and/or negative images of each cluster, or by dragging and dropping an image from one cluster to another. For each cluster selected by the user, only 21 images of this cluster are displayed (see Fig. 1). Therefore, for interacting with 5 clusters (scenarios 1 and 2) or 10 clusters (scenario 3), the user has to perform at most 105 or 210 mouse clicks, respectively, in each interactive iteration. These upper bounds depend neither on the size of the database nor on the pairwise constraint deduction strategy, and in practice the number of clicks that the user has to provide is far lower. However, the number of deduced constraints may be much greater than the number of user clicks (and this number depends on the database size and on the pairwise constraint deduction strategy). When applying the interactive semi-supervised clustering model in the indexing phase, the user is generally required to provide as much feedback as possible in order to obtain a good indexing structure, which could lead to better results in the subsequent retrieval phase. Therefore, in the case of the indexing phase, the proposed number of clicks seems tractable.
Fig. 3(a) and (b) show, respectively, the results during 50 interactive iterations of our proposed interactive semi-supervised clustering model on the Wang and PascalVoc2006 image databases, with the three proposed scenarios. The results are shown for the different constraint deduction strategies of Table 1. The vertical axis specifies the V-measure values, while the horizontal axis specifies the number of iterations. Note that with each selected cluster, the user agent gives all possible feedback. Therefore, for each scenario, the amounts of user feedback are equivalent between different iterations and between different strategies. As clusters are randomly chosen in scenario 2, we run this scenario 10 times for each database. The curves of scenario 2 represent the mean values of the V-measure over these 10 executions at each iteration. The average standard deviation of each strategy after 50 iterations is given in Table 2, and the processing times are given in Table 3 (for scenario 2, the average execution times over the 10 executions are shown). The experiments are executed on a standard PC with 2 GB of RAM.
We can see that the clustering results generally improve after each interactive iteration, in which the system re-clusters the dataset by considering the constraints deduced from the accumulated user feedback. In most cases, the clustering results converge after only a few iterations; this may be due to the fact that no new knowledge is provided afterwards. Moreover, the clustering results are better and converge more quickly when the number of chosen clusters (and therefore the number of constraints) in each interactive iteration is higher: scenario 3 gives better results and converges more quickly than scenarios 1 and 2. In addition, for both image databases, scenario 2, in which clusters are randomly chosen for interacting, gives better results than scenario 1, in which the closest clusters are chosen. When selecting the closest clusters, only a few clusters may repeatedly receive user feedback; the constraint information is therefore poorer than when the clusters are randomly selected, in which case all clusters may receive user feedback.
As regards the different strategies for deducing pairwise constraints, we can see that for each database, the average standard deviations over the 10 executions of scenario 2 are similar for all strategies. Therefore, we can compare the different strategies based on the mean values:
– Strategy 1 shows, in general, very good performance, but the processing time is huge because it uses all possible user constraints and deduced constraints created during all iterations.
– Strategy 2, the only strategy using solely user constraints, generally gives the worst results; thus deduced constraints are needed for better performance. Its processing time is also high due to the large number of user constraints.
– Strategy 3 shows good or very good performance, but some oscillations exist between different iterations because, by overlooking previously deduced constraints, some important constraints may be omitted. Its processing time is high.
– Strategy 4 gives better results than strategy 2, but the results are unstable because this strategy also overlooks previously deduced constraints. It has a good execution time thanks to the reduced number of constraints.
– Strategy 5 generally gives good or very good results by keeping the important constraints (must-links between the most distant images and cannot-links between the closest images), but its processing time is still high.
– Strategy 6, by reducing the deduced cannot-link constraints of strategy 5, gives in general very good results in a low execution time.
We can conclude from this analysis that strategy 6 offers the best trade-off between performance and processing time. This strategy will be used in further experiments.
4.2.2 Comparison of the proposed semi-supervised clustering model and the semi-supervised HMRF-kmeans
Figs. 4(a) and (b) represent, respectively, the clustering results over 50 interactive iterations on the Wang and the PascalVoc2006 image databases when using our proposed semi-supervised clustering and the semi-supervised HMRF-kmeans in the re-clustering phase. In both cases, strategy 6 for deducing pairwise constraints between images is used. Note that the results of scenario 2 represent the mean values over the 10 executions.
Table 2
Average standard deviation over the 10 executions of scenario 2 after 50 interactive iterations, corresponding to the experiments of our proposed interactive semi-supervised clustering model shown in Fig. 3(a) and (b).
Average standard deviation
Table 3
Processing time after 50 interactive iterations of the experiments of our proposed interactive semi-supervised clustering model shown in Fig. 3(a) and (b).
Wang database
PascalVoc2006 database