ORIGINAL RESEARCH
Cluster-based relevance feedback for CBIR: a combination
of query point movement and query expansion
Nhu-Van Nguyen · Alain Boucher · Jean-Marc Ogier · Salvatore Tabbone
Received: 30 June 2011 / Accepted: 5 June 2012 / Published online: 21 June 2012
Springer-Verlag 2012
Abstract This paper presents a cluster-based relevance feedback method that combines two popular relevance feedback techniques: query point movement and query expansion. Inspired by text retrieval, both techniques give good results for image retrieval. However, query point movement is limited by a unimodality constraint when taking the user feedback into account. Query expansion gives better results than query point movement, but it cannot take into account the irrelevant images from the user feedback. We combine the two techniques to profit from their advantages and to cope with their limitations. From a single point initial query, query expansion provides a multiple point query, which is then enhanced using query point movement. To learn the multiple point queries, the irrelevant feedback images are classified into the query points, which are obtained by clustering the relevant images with the query expansion technique. The experiments show that our method gives better results than either relevance feedback technique taken individually.
Keywords Image retrieval · Relevance feedback · Query point movement · Query expansion
N.-V. Nguyen (✉) · J.-M. Ogier
L3i-University of La Rochelle, La Rochelle, France
e-mail: Nhu-Van.Nguyen@univ-lr.fr
J.-M. Ogier
e-mail: Jean-Marc.Ogier@univ-lr.fr
N.-V. Nguyen · A. Boucher
IFI, MSI team; IRD, UMI 209 UMMISCO; Vietnam National University, Hanoi, Vietnam
e-mail: alain.boucher@auf.org
N.-V. Nguyen · S. Tabbone
QGAR-LORIA, University of Lorraine, Nancy, France
e-mail: tabbone@loria.fr
DOI 10.1007/s12652-012-0141-z
1 Introduction
There are two reasons for the limited performance of all Content-Based Image Retrieval (CBIR) systems. The first is that it is impossible to fully express the user's intent in a simple query. The second is the semantic gap, which can be defined as the difference between the user's interpretation of an image and its computer description. To address these problems, several researchers (Zhou and Huang 2003; Nguyen et al. 2009; Apostol et al. 2005; Kim et al. 2005; Ritendra et al. 2008; Ortega and Mehrotra 2004; Yoshiharu et al. 1998) have applied relevance feedback (RF) techniques in CBIR over the last decade. Significant improvements in performance have been witnessed in the application of RF techniques in the traditional text retrieval domain. Nowadays, RF has become an essential component of a CBIR system.
RF is an interactive strategy that is effective in improving the accuracy of information retrieval systems. The basic idea of RF is to involve the user in the retrieval process so that the final result set is improved. In particular, the user gives feedback on the relevance of the documents in an initial set of results, which adapts the retrieval process to a specific user and a specific query. The user first submits a query (an example image in our case) and receives some results. The user then interacts with the system by labeling some images as relevant or irrelevant to the given query. The system, in turn, computes a revised, better set of retrieval results based on the user feedback. RF has a short-term memory, which means that the system can remember the results during the interaction process for the given query. Once the interaction is finished, the system clears its memory and the next user starts from scratch.
Various relevance feedback techniques have been proposed to improve retrieval performance: weight features learning (Yoshiharu et al. 1998), query modification (Ortega and Mehrotra 2004) and classifier learning (Tao et al. 2006). Among them, query modification is the most popular technique and is widely used in both image retrieval and text retrieval. Query modification includes two different techniques: query point movement and query expansion. The first technique, query point movement (Ortega and Mehrotra 2004; Yoshiharu et al. 1998), refers to retrieval by a single point query (as represented in the feature space), which is modified using the relevant and irrelevant images that represent the positive and negative feedback from the user. It works under the assumption of the unimodality of relevant images (Yimin and Aidong 2004). Unimodality means that all relevant images are similar to each other and form a cluster distinct from the other images in the feature space. Query point movement tries to obtain the ideal query point by moving it towards relevant images and away from irrelevant ones. The second technique, query expansion (Ortega and Mehrotra 2004; Kim et al. 2005), refers to retrieval by multiple point queries. Instead of assuming a single unimodal distribution, query expansion assumes many smaller unimodal distributions to construct multiple point queries from relevant images. Query expansion is arguably one of the most effective approaches to relevance feedback.
In this paper, a novel method combining these two techniques is proposed for query by example in CBIR. Query expansion is used to construct multiple point queries by clustering the relevant images. Query point movement is used to improve the representation of the multiple point queries by applying the Rocchio technique (Salton 1971) to the relevant and irrelevant images. Our contribution is a cluster-based relevance feedback technique that uses query point movement and the irrelevant examples to enhance the efficiency of query expansion.
This paper is divided into six sections. Section 2 describes the related work, and Sect. 3 discusses the remaining problems. Section 4 presents our method. Section 5 discusses the evaluation and presents experimental results on a large dataset with 30K images. Section 6 concludes the paper and gives some directions for future work.
2 Related work
Because of the difficulty of fully expressing the user's intent in a simple query and because of the semantic gap, many works have focused on relevance feedback. Various relevance feedback techniques have been proposed: weight features learning (Yoshiharu et al. 1998), query modification (Ortega and Mehrotra 2004) and classifier learning (Tao et al. 2006). Weight features learning improves the distance function, query modification looks for the ideal query point, and classifier learning uses the relevant/irrelevant images as training data to construct a probabilistic classifier. Among these techniques, query modification is based on the text retrieval approach and is often considered the best relevance feedback approach in image retrieval systems. This traditional type of approach is still very efficient compared with the other techniques in both fields, text retrieval and image retrieval. In the general context of image retrieval and the development of relevance feedback techniques, a recognized problem is the small number of available examples. We assume that a user can label at most 20 images, whereas most learning techniques require many more. Comparing the Rocchio algorithm for query modification with learning algorithms (metric or classifier optimization), such as neural networks, suggests that the popularity of query modification is related to the fact that it requires very few training examples.
To detail these two techniques for query modification, we must first define the concept of unimodality of an image group. Unimodality is a concept used by some authors in the field of relevance feedback (Karthik and Jawahar 2006; Yimin and Aidong 2004) to characterize the fact that the closest images to a query in the feature space are not all relevant to the query. However, there is no clear definition of this concept, so we define it as follows:
Definition The concept of unimodality of an image group means that all images in this group are similar and form a group distinct from the other images in the feature space. In relevance feedback, the images in a group are similar in the sense of their relevance to the given query. The relevance can be estimated using an arbitrary threshold or function or, as in our work, indicated by the user, who labels some images in the retrieval results as relevant or irrelevant. Relevance is thus a subjective notion, meaning that an image satisfies the query as judged by the human user. An image group is defined as centered on the query in the feature space, in other words as the closest retrieval results for the given query.
For example, in Fig. 1, the left group is unimodal while the right group is not.
The query modification technique, which we focus on in this paper, can be achieved using either of two approaches: query point movement and query expansion. In both approaches, the input is a single point query (a vector in the feature space). Query point movement aims at moving the single point query in the feature space (adjusting the feature vector of the query point, Fig. 2). Query expansion aims at replacing the single point query by a multiple point query (replacing one feature vector by multiple feature vectors, Fig. 3). Each technique uses the incremental information from the interactions with the user, in other words the relevant/irrelevant images labeled by the user.
2.1 Query point movement
In the query point movement approach (Ortega and Mehrotra 2004; Yoshiharu et al. 1998) for query by example in CBIR, a query is represented by a single point in the feature space, and the refinement process attempts to reformulate the query vector to move it closer to the area containing relevant images (see Fig. 2). Under the assumption of the unimodality of relevant images, the optimal query maximizes the similarity to relevant images and minimizes the similarity to irrelevant ones (Kim et al. 2005). The Rocchio technique (Salton 1971) is often used to compute the optimal query:
$$\vec{q}_{i+1} = a\,\vec{q}_i + \frac{b}{|D_r|}\sum_{\vec{d}\in D_r}\vec{d} - \frac{c}{|D_n|}\sum_{\vec{d}\in D_n}\vec{d} \qquad (1)$$
where $\vec{q}_i$ is the query at iteration i of the relevance feedback process, $D_r$ is the relevant set, $D_n$ is the irrelevant set, and a, b and c give the relative weights of $\vec{q}_i$, $D_r$ and $D_n$. In experiments, the parameter setting a = b = c = 1 is widely used for image retrieval.
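The Rocchio update can be illustrated with a minimal sketch in plain Python. The function names `mean_vector` and `rocchio_update` and the toy two-dimensional feature vectors are assumptions made for illustration only; they are not taken from the paper. An empty feedback set is assumed to contribute a zero vector.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Component-wise mean of a set of vectors (zero vector if the set is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def rocchio_update(
    query: Sequence[float],
    relevant: Sequence[Sequence[float]],
    irrelevant: Sequence[Sequence[float]],
    a: float = 1.0,
    b: float = 1.0,
    c: float = 1.0,
) -> Vector:
    """One query point movement step: a*q + b*mean(D_r) - c*mean(D_n)."""
    dim = len(query)
    mean_rel = mean_vector(relevant, dim)
    mean_irr = mean_vector(irrelevant, dim)
    return [a * query[k] + b * mean_rel[k] - c * mean_irr[k] for k in range(dim)]


# Toy example with made-up 2-D feature vectors.
q_next = rocchio_update([1.0, 1.0], [[2.0, 2.0], [4.0, 2.0]], [[0.0, 1.0]])
print(q_next)  # [4.0, 2.0]
```

With a = b = c = 1 the new query is the old query plus the mean of the relevant examples minus the mean of the irrelevant ones, so the point is pulled towards the relevant set and pushed away from the irrelevant set.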
2.2 Query expansion
In the query expansion approach (Kim et al. 2005; Ortega and Mehrotra 2004), the query is modified by selectively adding new relevant points to the query representation. A single point query is replaced by a multiple point query (see Fig. 3). Instead of assuming a single unimodal distribution as in query point movement, query expansion assumes many smaller unimodal distributions to construct multiple local clusters from the relevant images. The representatives of the local clusters are used to perform multiple point querying. The clustering of relevant images is repeated at each relevance feedback iteration. Querying by multiple points is investigated in (Xiangyu and James 2003; Natsev and Smith 2003; Thijs and de P Vries Arjen 2004; Tahaghoghi et al. 2002; Apostol et al. 2005; Danzhou et al. 2009), which focus on the similarity function and the fusion of multiple single point queries. The experimental evaluation in (Ortega and Mehrotra 2004) shows that query expansion outperforms query point movement in retrieval effectiveness.
Recently, new approaches have aimed to improve the query modification technique. The QCluster system (Kim et al. 2005) uses a new adaptive classification and cluster-merging method to find multiple regions. The clustering step is not repeated as in query expansion: QCluster classifies relevant examples into the previous clusters or creates a new cluster. The number of clusters is kept below a fixed limit by the cluster-merging method. However, this complex approach is unable to make effective use of irrelevant examples. All the above methods still have drawbacks such as local maximum traps and slow convergence. In (Danzhou et al. 2009), the authors propose a fast query point movement technique to get rid of these drawbacks. However, their work is aimed at target search using relevance feedback, which differs somewhat from the category search done in classical CBIR. Target search in CBIR systems refers to finding a specific (target) image, such as a particular registered logo or a specific historical photograph.
Fig. 1 Unimodality of an image group based on the user feedback: relevant (+) and irrelevant (-) result images compared with the given query. A non-unimodal image group (a group that includes images judged irrelevant by the user for the given query) may contain some unimodal subgroups, as in the right group, where we can identify 3 unimodal subgroups (not centered on the query). In our work, we try to identify these unimodal subgroups within a non-unimodal image group
Fig. 2 Query point movement. a The initial query and the user feedback (relevant "+" and irrelevant "-" result images). b The query moves toward the relevant images. c The query moves toward the relevant images until it is positioned at the center of the relevant images
Fig. 3 Query expansion: a a single point query is replaced by b a multiple point query, using only the relevant (+) example images from the user feedback
2.3 Multiple point query
Query expansion requires support for multiple point querying. Querying by multiple points is investigated in (Thijs and de P Vries Arjen 2004; Tahaghoghi et al. 2002; Apostol et al. 2005), which are concerned with the similarity function and the fusion of multiple single point queries. The similarity of images to each single point query is determined independently. The result for a single point query is an ordered list. The lists from all single point queries must be combined to determine the final ranking for the multiple point query. A combining function is therefore required to reduce the multiple similarity values to a single value. When this reduction has been performed for all images in the collection, the user is presented with a list of the images in decreasing order of similarity. All combining functions can be grouped into three types: MINIMUM, MAXIMUM and SUM. These types define the distance of an image from the specified multiple point query to be, respectively, the minimum, the maximum and the (weighted) sum of the distances to each single point query. In our experiments, the MINIMUM function is found to be the best combining function in terms of robustness. This is also confirmed by Tahaghoghi et al. (2002).
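The three combining functions can be sketched as follows. This is only an illustrative sketch: the Euclidean distance, the function names `multipoint_distance` and `rank_images`, the per-query-point weights applied to every distance and the toy data are all assumptions, not details given in the paper.

```python
import math
from typing import List, Optional, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def multipoint_distance(
    image: Sequence[float],
    query_points: Sequence[Sequence[float]],
    combine: str = "min",
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Reduce the distances to all single point queries to one value."""
    if weights is None:
        weights = [1.0] * len(query_points)
    dists = [w * euclidean(image, q) for w, q in zip(weights, query_points)]
    if combine == "min":
        return min(dists)
    if combine == "max":
        return max(dists)
    if combine == "sum":
        return sum(dists)
    raise ValueError(f"unknown combining function: {combine}")


def rank_images(
    images: Sequence[Sequence[float]],
    query_points: Sequence[Sequence[float]],
    combine: str = "min",
) -> List[int]:
    """Image indices sorted by increasing combined distance (most similar first)."""
    return sorted(
        range(len(images)),
        key=lambda i: multipoint_distance(images[i], query_points, combine),
    )


images = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
query_points = [[0.0, 1.0], [10.0, 1.0]]
print(rank_images(images, query_points, "min"))  # [0, 2, 1]
print(rank_images(images, query_points, "max"))  # [1, 0, 2]
```

In this toy case MINIMUM ranks first the images that are close to any one query point, whereas MAXIMUM favours the image that is moderately close to all query points, which shows how the choice of combining function changes the final ranking.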
3 Remaining problems
The main disadvantage of query point movement is the constraint of unimodality (see the definition in Sect. 2) on relevant examples. The main problem for query expansion is its difficulty in using irrelevant images effectively. In query point movement, the query point is moved closer to the relevant examples and away from the irrelevant ones in the feature space. When the relevant images are grouped in distinct subsets in the feature space (that is, the distribution of the relevant examples is not unimodal), the problem arises from the need to cover multiple clusters with a single query. In these cases, the ideal query point also covers irrelevant examples. Figure 4 shows the ellipse representing the line equidistant from a new query. We can see that some irrelevant examples are included in the relevant ellipses.
Query expansion and its best improved version, QCluster (Kim et al. 2005), use only relevant examples to form multiple point queries. The query expansion technique does not use irrelevant examples because clustering relevant and irrelevant examples together would produce false groups. Our analysis suggests that, without irrelevant examples, convergence towards the ideal query point can be very slow and that the risk of falling into a local minimum is not insignificant. Indeed, a false ideal query point can be reached when a local group is close to some relevant examples but also located near many irrelevant examples (see Fig. 4). We can see from this figure that irrelevant examples may be included in local groups, because these groups are constructed from relevant examples only, regardless of whether irrelevant ones are present. In general, relevance feedback techniques often use relevant feedback examples. The management of irrelevant feedback examples remains a major area for improvement and thus a very open scientific question (Xuanhui et al. 2008).
4 Cluster-based relevance feedback for CBIR
In this section, we present our approach, which attempts to provide precise answers to the problems identified above. This approach exploits irrelevant examples and combines query point movement and query expansion.
A combination of query point movement and query expansion is proposed to overcome the problems of each technique. The main drawback of query point movement is the constraint of unimodality on relevant examples, which cannot always be satisfied. We solve this problem by using a clustering technique to build multiple local clusters that provide local unimodality using the relevant examples. The main drawback of query expansion is its inability to make effective use of irrelevant examples. In our approach, we propose a sequential combination of the two techniques: first query expansion (Fig. 5b), then query point movement (Fig. 5c). We take advantage of irrelevant examples by applying query point movement to the multiple local clusters created by query expansion. We believe this sequential combination is the best among all possible combinations because it ensures the unimodality constraint and makes use of irrelevant examples (Fig. 5c) to reach the ideal query effectively. The opposite combination (first query point movement, then query expansion) is not suitable, because query expansion cannot profit from the irrelevant examples that were used in query point movement.
Fig. 4 Remaining problems with query point movement and query expansion. a In query point movement, the ideal query point can include some irrelevant examples (-) due to the non-unimodality of the relevant examples. b In query expansion, ideal query points converge slowly when irrelevant examples (-) are not used. Both techniques can result in a local maximum trap
The purpose of this technique is to reach the ideal query through interaction with the user and to overcome the problems identified for both query point movement and query expansion. The first relevance feedback interaction loop is shown in Fig. 6. Initially, a single point query is formalized using the feature vector of a query image q: Q = (f1, f2, …, fn), an n-dimensional vector in the feature space. Then images are retrieved, and the first N images are shown to the user (who has a limited view due to screen interface constraints). The user identifies and labels relevant/irrelevant images in an RF interaction process, under the assumption that the relevant examples in the result are not guaranteed to be unimodal (Fig. 6, steps 1 and 2). Based only on the relevant/irrelevant images returned by the user, the technique replaces and improves the single point query q by a multiple point query qi, i > 1 (a query with multiple feature vectors) using the two main processes: query expansion and query point movement.
First, the single point query q is expanded into a multiple point query to ensure the unimodality of each sub-query, which addresses the main limitation of query point movement (Fig. 6, step 3): the relevant examples are clustered into c groups C1, C2, …, Cc. The number of clusters c is selected automatically using an adaptive clustering technique and is limited to a maximum value. In this step, we try to obtain maximal clusters (groups) that are always unimodal. The two clustering algorithms used in our system are presented at the end of this section. Second, in order to find the ideal points of the c relevant groups, the query point movement technique is used: the irrelevant examples are classified into these c groups (Fig. 6, step 4) to identify the irrelevant examples present in each local group (in contrast with query expansion, where only relevant examples are used). The relevant and irrelevant examples in each group are then used to build the multiple point query using Eq. 1 (Fig. 6, step 5), which moves the query points closer to the relevant images and away from the irrelevant ones. The k-Nearest Neighbors (k-NN) classifier is used in step 4 to classify the irrelevant examples because of its efficiency and simplicity. The parameter k of the classifier is selected as follows:
and the query point $\vec{q}_i$ of cluster i is calculated using Rocchio's formula (Salton 1971):
$$\vec{q}_i = \sum_{j=1}^{m}\vec{R}_j - \sum_{j=1}^{n}\vec{I}_j$$
Fig. 5 Combination of query point movement and query expansion, where ideal query points are reached more efficiently and quickly and irrelevant examples are not present in local clusters. a The initial single point query and the feedback (relevant "+" and irrelevant "-") given by the user. b The multiple point query obtained by query expansion. c The multiple point query is moved towards the relevant feedback and away from the irrelevant feedback using query point movement
Fig. 6 Main steps of the cluster-based relevance feedback
where $I_1, I_2, \ldots, I_n$ are the n irrelevant examples and $R_1, R_2, \ldots, R_m$ are the m relevant examples of the local cluster $C_i$. These c query points form the final multiple point query.
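Steps 4 and 5 (classifying the irrelevant examples into the local clusters, then computing one query point per cluster) can be sketched as below. This is a minimal illustration under several assumptions: the relevant examples are taken as already clustered, the value of k is a free parameter here because its selection rule is not reproduced, the Euclidean distance is assumed, and the `normalize` option (dividing each sum by its number of examples) is a hypothetical addition of this sketch. All function and variable names are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_cluster(point: Sequence[float], labelled: List[Tuple[Vector, int]], k: int) -> int:
    """Majority cluster id among the k labelled relevant vectors nearest to `point`."""
    nearest = sorted(labelled, key=lambda item: euclidean(point, item[0]))[:k]
    return Counter(cid for _, cid in nearest).most_common(1)[0][0]


def build_multipoint_query(
    clusters: Dict[int, List[Vector]],
    irrelevant: List[Vector],
    k: int = 1,
    normalize: bool = False,
) -> Dict[int, Vector]:
    """One query point per cluster: sum of relevant minus sum of irrelevant vectors."""
    labelled = [(v, cid) for cid, members in clusters.items() for v in members]
    assigned: Dict[int, List[Vector]] = {cid: [] for cid in clusters}
    for x in irrelevant:  # step 4: classify each irrelevant example into a cluster
        assigned[knn_cluster(x, labelled, k)].append(x)
    query: Dict[int, Vector] = {}
    for cid, rel in clusters.items():  # step 5: one query point per cluster
        irr = assigned[cid]
        r_scale = 1.0 / len(rel) if normalize else 1.0
        i_scale = 1.0 / len(irr) if normalize and irr else 1.0
        query[cid] = [
            r_scale * sum(v[d] for v in rel) - i_scale * sum(v[d] for v in irr)
            for d in range(len(rel[0]))
        ]
    return query


clusters = {0: [[1.0, 1.0], [2.0, 1.0]], 1: [[8.0, 8.0]]}
irrelevant = [[1.0, 2.0], [9.0, 9.0]]
print(build_multipoint_query(clusters, irrelevant))  # {0: [2.0, 0.0], 1: [-1.0, -1.0]}
```

Each irrelevant example only influences the query point of the local cluster it falls into, which is how the irrelevant feedback is exploited without being mixed into the clustering itself.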
As discussed above, in the first interaction loop, the initial query (a single point) is replaced by a multiple point query by building local groups (clustering step). For the following interaction loops, there are two ways to improve the multiple point query. The first does not rely on the first multiple point query (the clustering step of the first iteration) but re-clusters the relevant examples at each iteration. This method attempts to add relevant query points and to remove irrelevant points from the query, based on all relevant/irrelevant examples from each interaction loop. Clustering and classification are repeated at each iteration. The second is to move the points of the first query towards the ideal points based on the new relevant/irrelevant examples from the following interactions. This method assumes that the ideal query points can be reached from the first constructed query points. Since the local groups are not rebuilt, the clustering step is performed once at the beginning (during the first interaction loop); in the following interactions, the query is built from the multiple point query of the first iteration.
We can observe that the first choice is more influenced by query point movement than by query expansion, because it attempts to move the multiple point query to the ideal query. In contrast, the second choice is more influenced by query expansion, because it tries to create the ideal query points based on the clustering. We call these two methods Clustering-Repeat (CR) and Clustering-No-Repeat (CNR). The two corresponding algorithms are described below.
Clustering-Repeat (CR) In this approach, the clustering of relevant examples, the classification of irrelevant examples and the construction of the multiple point query are repeated at each relevance feedback iteration. Thus, the system performs the same process in every iteration. The query of the previous iteration does not directly affect the new query of the current iteration. Examples from the previous iterations are also included in the current iteration. Implicitly, relevant points are added and irrelevant ones are dropped as we move from one iteration to the next.
Clustering-No-Repeat (CNR) In this approach, the previous query directly affects the new query. The clustering of relevant examples is performed once at the beginning (first iteration). Then, during subsequent iterations, instead of performing a new clustering as in the CR method, both relevant and irrelevant examples are classified into the points of the previous query, which takes advantage of the previous query. The new query points are refined from the relevant/irrelevant examples using the query point movement technique.
In these two algorithms, the difference lies in steps 3, 4 and 5. In the CNR method, step 3 is performed only once (at the first iteration), while it is repeated at every iteration in the CR method. In step 4, only the irrelevant set is classified into the clusters in the CR algorithm, while both sets (relevant and irrelevant) are classified into the clusters in the CNR algorithm. In step 5 of the CR algorithm, the relevant set is used to rebuild the local groups (step 3 is repeated). Finally, the formula used to construct the multiple point query differs between the two algorithms.
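A later CNR iteration can be sketched as follows. The exact CNR update formula is not reproduced in this text, so the sketch makes explicit assumptions: each new feedback example is assigned to its nearest previous query point (Euclidean distance), and each query point is then moved with a Rocchio-style step using hypothetical weights a = b = c = 1. The names `nearest_query_point` and `cnr_update` are illustrative only.

```python
import math
from typing import List, Sequence

Vector = List[float]


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(x, y)))


def nearest_query_point(point: Sequence[float], query_points: Sequence[Sequence[float]]) -> int:
    """Index of the previous query point closest to `point`."""
    return min(range(len(query_points)), key=lambda i: euclidean(point, query_points[i]))


def mean_component(vectors: Sequence[Sequence[float]], d: int) -> float:
    """Mean of component d over `vectors` (0.0 if there are no vectors)."""
    return sum(v[d] for v in vectors) / len(vectors) if vectors else 0.0


def cnr_update(
    prev_query: Sequence[Sequence[float]],
    relevant: Sequence[Sequence[float]],
    irrelevant: Sequence[Sequence[float]],
    a: float = 1.0,
    b: float = 1.0,
    c: float = 1.0,
) -> List[Vector]:
    """Move every previous query point using only the feedback assigned to it."""
    rel_groups: List[List[Sequence[float]]] = [[] for _ in prev_query]
    irr_groups: List[List[Sequence[float]]] = [[] for _ in prev_query]
    for x in relevant:  # no re-clustering: relevant examples are classified too
        rel_groups[nearest_query_point(x, prev_query)].append(x)
    for x in irrelevant:
        irr_groups[nearest_query_point(x, prev_query)].append(x)
    return [
        [a * q[d] + b * mean_component(rel, d) - c * mean_component(irr, d)
         for d in range(len(q))]
        for q, rel, irr in zip(prev_query, rel_groups, irr_groups)
    ]


prev_query = [[0.0, 0.0], [10.0, 10.0]]
new_query = cnr_update(prev_query, relevant=[[1.0, 1.0], [9.0, 11.0]], irrelevant=[[0.0, 2.0]])
print(new_query)  # [[1.0, -1.0], [19.0, 21.0]]
```

The number of query points stays fixed across iterations in this sketch, which mirrors the fact that CNR performs the clustering only once.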
Discussion In this section, we have presented our approach with two variants for relevance feedback. Our approach combines two query modification techniques, query point movement and query expansion, to take advantage of irrelevant examples, to address the problem of unimodality and to try to eliminate all irrelevant examples from the result. Both variants of our approach (the Clustering-Repeat and Clustering-No-Repeat methods) aim at finding the ideal query points as we move from one interaction loop to the next. The first method (Clustering-Repeat) aims to replace irrelevant query points by relevant query points. The second method (Clustering-No-Repeat) aims to move the query points to the ideal points. The first method (CR) depends more on the performance of the clustering method than the second one, because in the CR method the clustering is repeated at every iteration. The second method (CNR) depends more on the construction of the initial points. For example, if all the possible relevant examples can be represented by n distinct groups but the relevant examples labeled by the user and used to construct the initial points belong to c < n distinct groups, this can degrade the result. The computational complexity of the two algorithms is the sum of the complexities of the clustering and classification methods used. In our case, Competitive Agglomeration is a fuzzy clustering method with a computational complexity of O(CDN), where C is the number of prototypes, D is the dimension of the data points and N is the number of data points to cluster. The k-NN classification method has a complexity of O(DN), where D is the dimension of the data points and N is the total number of points. The total complexity is O(CDN) + O(DN), which is suitable for retrieval in large image datasets, recalling that, under our assumption, the number of samples processed at each interaction (relevant/irrelevant examples) is very small, estimated at 20 at most (limited by the number of images that the user can label).
4.1 Selection of clustering method
In our relevance feedback approach, an important step is the clustering of the user feedback. Clustering is used to separate the relevant images into groups. In our system, the number of groups is unknown. We are therefore interested in clustering methods that can automatically determine the optimal number of groups. We have experimented with two methods: Adaptive K-Means (Kothari and Pitts 1999) and Competitive Agglomeration (Frigui and Krishnapuram 1997). These two methods were chosen for their ability to determine the number of groups automatically and are representative of two known types of clustering methods in the literature: hierarchical methods and partitional methods.
Adaptive K-means The best known algorithm for clustering is the k-means method. For p models:
$$\{x_l : l = 1, 2, \ldots, p\}, \quad x_l \in \mathbb{R}^n \qquad (4)$$
the k-means method obtains the positions of the k cluster centers $y_m$ by minimizing the cost function given by:
$$J = \sum_{l=1}^{p}\sum_{m=1}^{k} I(y_m \mid x_l)\,\|x_l - y_m\|^2 \qquad (5)$$
where $\|\cdot\|$ denotes a distance metric and $I(y_m \mid x_l)$ is an indicator function which equals 1 if $m = \arg\min_j \|x_l - y_j\|^2$ and 0 otherwise.
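Because the indicator function selects, for each data point, only its nearest cluster center, the k-means cost reduces to the sum of the squared distances from every point to its closest center. The short sketch below evaluates this cost for given centers; the squared Euclidean distance, the function names and the toy data are assumptions made for illustration.

```python
from typing import Sequence


def squared_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed distance metric)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans_cost(points: Sequence[Sequence[float]], centers: Sequence[Sequence[float]]) -> float:
    """J: each point contributes its squared distance to the nearest center."""
    return sum(min(squared_distance(x, y) for y in centers) for x in points)


points = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]]
print(kmeans_cost(points, [[1.0, 0.0], [10.0, 0.0]]))  # 2.0
print(kmeans_cost(points, [[4.0, 0.0]]))               # 56.0
```

Two well-placed centers give a much lower cost than a single one on this toy data, which is the quantity that k-means minimizes for a fixed k.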
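In the Adaptive K-Means method (Kothari and Pitts 1999), the proposed cost function is:
$$J = \sum_{l=1}^{p}\sum_{m=1}^{k} I(y_m \mid x_l)\,\|x_l - y_m\|^2 + \text{extra term} \qquad (6)$$
$$\text{extra term} = \sum_{l=1}^{p}\sum_{m=1}^{k} \lambda_m\,\tilde{I}(y_m \mid x_l)\,\|y_m - y_\omega\|^2 \qquad (7)$$
where $\tilde{I}(y_m \mid x_l)$ is an indicator function which equals 1 if $y_m \in N_{y_\omega}$, with $\omega = \arg\min_j \|x_l - y_j\|^2$, and $N_{y_\omega}$ are the neighborhoods of the cluster center $y_\omega$.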
There are two terms in the cost function: the first is the same as in the k-means method, and the second is an extra term. This extra term tries to spread the cluster centers so as to minimize the sum of the squared distances from a cluster center to the nearby cluster centers.
Smaller values for the neighborhood encourage the formation of several centers in separate clusters, while large values encourage the formation of fewer distinct cluster centers. The Adaptive K-Means method treats the neighborhood as a scale parameter and provides the number of cluster centers at different values of the scale parameter. The number of cluster centers in the data is then obtained from the stability of the clusters as the scale parameter varies.
Competitive agglomeration This second clustering method (Frigui and Krishnapuram 1997) minimizes an objective function that integrates the advantages of hierarchical and partitional clustering techniques. The Competitive Agglomeration algorithm produces a sequence of partitions with a decreasing number of groups. It begins by partitioning the data into a specified number of groups and finally provides the "best" number of groups. During the clustering phase, adjacent groups compete against each other to capture the data points, and the groups that gradually lose the competition become depleted and disappear, until only groups with large cardinality survive. The algorithm can incorporate different distance measures in the objective function to find groups of various shapes.
Discussion on clustering methods In our experiments, different clustering methods were studied to compute the local groups. Taking advantage of the benefits of both hierarchical and partitional clustering, Competitive Agglomeration (Frigui and Krishnapuram 1997) seems to produce the best performance in our extensive testing. Another advantage of this clustering method is the automatic selection of the number of groups. Our experiments have shown that the choice of the clustering and classification methods does not greatly influence the final result, because the total number of samples (relevant/irrelevant) is very small. Recall that the user marks only a few examples as relevant or irrelevant during the relevance feedback process. The experiment comparing these clustering methods is presented in the results section of this paper.
5 Evaluation
We have presented our contribution to relevance feedback for content-based image retrieval with two methods. These methods are based on a combination of two popular techniques: query point movement and query expansion. The main idea of our approach is to avoid the problems associated with query point movement and query expansion in order to enhance the search results. This approach provides a good tool to improve the performance of image retrieval. In this section, we present the experiments that evaluate our relevance feedback methods.
5.1 Experimental protocol For our experiment, we are using 3 different databases: Corel 30K image database (Gustavo et al 2007), Cal-tech256 database (Griffin et al.2007) and Pascal VOC2011 database (Everingham et al 2007) User interactions are simulated using external knowledge corresponding to the manual annotations in this database Three methods of relevance feedback are evaluated in this experiment: the query point movement, the query expansion and our pro-posed method with two variants which are Clustering-Repeat (CR) and Clustering-No-Clustering-Repeat (CNR)
The content-based image retrieval system used in the experiments is based on the state-of-the-art Bag of Words model (Sivic and Zisserman 2008). Visual words are built using the SIFT feature, computed as in (Sivic and Zisserman 2008). All the results presented in this section evaluate the improvement between the initial response of the system (after the initial query) and the response obtained after relevance feedback (as a percentage of improvement for the precision and recall measures).
5.1.1 Experimental database
The Corel 30K image database contains 30,000 images divided into different categories by experts, with 100 images in each class. The Caltech256 database contains about 30,000 images divided into 256 different categories by experts, with about 100 images in each class. The Pascal VOC2011 database contains about 15,000 images, each image belonging to one or several of the 23 different categories (multiple class images).
We rely on a simulation of human interaction, using data already available in Corel30K, Caltech256 and PascalVoc2011 to play a role similar to that of a human user. A pseudo-relevance feedback technique is used to simulate human interactions automatically during relevance feedback. Our approach relies on the textual annotations given for the images in these databases, which offer various possibilities for specifying a ground truth for validation.
5.1.2 Discussion on the protocols used for other systems
For the MARS system (Ortega and Mehrotra 2004), the images relevant to a query image are selected as follows. A query image Q is selected at random from the database, and the first 50 images retrieved for it form the set relevant(Q). New queries are then constructed by moving around Q (these queries are close to Q in the feature space), with Q itself considered as the ideal query. Queries are chosen around Q in the hope that they will reach the ideal query Q through relevance feedback. For each of them, the first 100 images are retrieved and form the set retrieved(Q). In MARS, precision and recall are calculated from the relevant(Q) and retrieved(Q) sets using the classical formulas below:
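\[
\text{precision} = \frac{|\text{relevant}(Q) \cap \text{retrieved}(Q)|}{|\text{retrieved}(Q)|}
\]
\[
\text{recall} = \frac{|\text{relevant}(Q) \cap \text{retrieved}(Q)|}{|\text{relevant}(Q)|}
\]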
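As an illustration, the set-based precision and recall used in this kind of protocol can be sketched in a few lines of Python. This is a minimal sketch assuming the usual definitions (size of the intersection divided by the size of the retrieved set, respectively of the relevant set); the function name `precision_recall` and the toy image identifiers are hypothetical and do not come from MARS.

```python
def precision_recall(relevant, retrieved):
    """Set-based precision and recall for one query (assumed classical definitions)."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)  # images that are both relevant and retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy example (hypothetical ids): 50 relevant images, 100 retrieved, 30 in common.
relevant_q = range(0, 50)
retrieved_q = range(20, 120)
print(precision_recall(relevant_q, retrieved_q))
```

With 50 relevant and 100 retrieved images, as in the protocol described above, precision and recall differ by a factor of two for the same number of hits.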
For the MARS system (Ortega and Mehrotra 2004), the way the relevant set is selected enforces unimodality, since all its images are visually similar to one query image. The authors thus assume that all the relevant images form a single mode, an assumption which is not entirely realistic and which creates an implicit limitation of the approach. In addition, this work averages all measures over only about 100 queries, which is very small compared to the number of images in the database. In another example, the QCluster system (Kim et al. 2005), the ground truth is relatively simple because the high-level category information of the Corel database is used as ground truth for simulating the relevance feedback. The images of the same class are considered as the most relevant images, and images of related categories (such as flowers and plants) are considered relevant. This assumption creates an easy condition for the relevance feedback, because the number of relevant images is then higher than in other approaches [e.g. MARS (Ortega and Mehrotra 2004)], which explains the good results obtained by the QCluster system.
5.1.3 Our experimental protocol
For our experiments, we take as ground truth the image classes in Corel30K, Caltech256 and PascalVoc2011. This yields a wide variety of classes, which seems representative of real-life conditions. We measure the retrieval performance with the classical recall/precision criteria on the first 100 responses (we assume that the user can see only 100 results on the screen interface). Most studies on relevance feedback (Huiskes and Lew 2008; Yimin and Aidong 2004; Faria et al. 2010) use only a sub-database (10, 20 or 50 categories) for experiments on Corel30K and Caltech256, because these databases contain a great number of images (30,000) while the number of images in each category is small (100). This is done to stress the effect of relevance feedback in the validation process. Following a similar protocol, we divide the whole database into five different experiment sets to ensure that there are relevant images in the first 100 images retrieved. The PascalVoc2011 database has 14,961 images, with 275 to 1,366 images in each class (except for one class which has 7,419 images), so there is no need to divide this database. For the experimentation, we use about 5,000 queries for each experiment set.
One parameter of relevance feedback is the number of feedback examples given by the user at each iteration. This number of training examples is usually small. In our experiments, we assume that the user can select a maximum of 20 images. These images are chosen as the first P relevant examples and the first N irrelevant examples in the first 100 responses, where P + N ≤ 20. These examples are returned automatically by the system using the ground truth, since we use a pseudo-relevance feedback technique to simulate human interaction. We propose two strategies for the number of examples:
1 Ten relevant examples and 10 irrelevant examples in the case of query point movement, CR and CNR, and 20 relevant examples in the case of query expansion. Recall that query expansion does not use irrelevant examples, because this technique only combines the relevant examples to form the multiple point query
2 Five relevant examples and 5 irrelevant examples in the case of query point movement, CR and CNR, and 10 relevant examples in the case of query expansion
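The simulated selection of feedback examples described above can be sketched as follows. This is only an illustrative sketch, not the implementation used in the experiments: the function `select_feedback`, its parameters and the toy labels are hypothetical, and it assumes a single class label per image (the multiple class images of PascalVoc2011 would need a different relevance test).

```python
def select_feedback(ranked_ids, labels, query_label,
                    n_relevant=10, n_irrelevant=10, top_k=100):
    """Pick the first relevant / irrelevant images among the top_k responses.

    ranked_ids: image ids sorted by decreasing similarity to the query.
    labels: dict mapping image id -> ground-truth class (assumed single label).
    """
    relevant, irrelevant = [], []
    for image_id in ranked_ids[:top_k]:
        if labels[image_id] == query_label:
            if len(relevant) < n_relevant:
                relevant.append(image_id)
        elif len(irrelevant) < n_irrelevant:
            irrelevant.append(image_id)
        # Stop early once both quotas are filled.
        if len(relevant) == n_relevant and len(irrelevant) == n_irrelevant:
            break
    return relevant, irrelevant


# Toy ranking (hypothetical): every third image belongs to the query class.
labels = {i: "cat" if i % 3 == 0 else "dog" for i in range(200)}
ranking = list(range(200))

# Second strategy for query point movement, CR and CNR: 5 relevant + 5 irrelevant.
print(select_feedback(ranking, labels, "cat", n_relevant=5, n_irrelevant=5))
# Second strategy for query expansion: 10 relevant examples, no irrelevant ones.
print(select_feedback(ranking, labels, "cat", n_relevant=10, n_irrelevant=0))
```

Fewer examples than requested may be returned when the first 100 responses do not contain enough relevant or irrelevant images, which is consistent with the bound P + N ≤ 20 being an upper limit.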
5.2 Results and discussion
5.2.1 Retrieval performance over 3 image databases
In this section, the four relevance feedback techniques are compared according to the protocol described above. As mentioned above, we compute the classical recall/precision criteria on the first 100 responses. As the number of images in each class of the Corel30K and Caltech256 databases is about 100 (thus, the number of relevant images is equal to the number of images retrieved), the recall for the first 100 retrieved images is equal to the precision.
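The equality between recall and precision can be made explicit with a short derivation. The notation is an assumption introduced here for illustration: r denotes the number of relevant images among the first 100 responses and |C| the size of the query class.

```latex
P@100 = \frac{r}{100}, \qquad
R@100 = \frac{r}{|C|} = \frac{r}{100} = P@100 \quad \text{when } |C| = 100 .
```

For instance, a hypothetical query with r = 29 relevant images in its first 100 responses would have both precision and recall equal to 0.29. The equality is exact only for classes of exactly 100 images, and it does not carry over to PascalVoc2011, where class sizes are much larger.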
For the Corel30K database, in the experiments based on 10 sample images (Fig. 7), our methods are better than query expansion and query point movement. The CNR method is slightly better than the CR method. After two iterations of relevance feedback, query point movement has the worst performance, while the other three methods have equivalent performance. During subsequent iterations, both the CR and CNR methods become better than the traditional techniques. The average precision of the traditional techniques is approximately 0.244 after five iterations, while the CNR method has an average accuracy of 0.288 and the CR method has an average accuracy of 0.279. According to these results, the improvement in accuracy of our methods over traditional techniques is 18 %.
In the experiments with 20 feedback images (Fig. 8), the CNR method outperforms all other methods. Our methods perform better in the early iterations, but the accuracy of the CR method is not better than that of query point movement in the following iterations. In this case, query expansion gives the worst performance; query point movement and the CR method have the same performance, with an average accuracy of about 0.305, and the CNR method has the best average accuracy, 0.39. The improvement in accuracy of the CNR method compared with traditional techniques is 28 % in this experiment.
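The reported percentages are consistent with a relative gain over the traditional baseline. The short check below is a sketch under that assumption (the exact formula is not stated in the text); the helper `relative_improvement` is a hypothetical name, and the accuracy values are the Corel30K figures quoted above.

```python
def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent (assumed definition)."""
    return 100.0 * (improved - baseline) / baseline


# Corel30K, 10 feedback examples: traditional techniques 0.244, CNR 0.288.
print(round(relative_improvement(0.244, 0.288)))  # 18
# Corel30K, 20 feedback examples: query point movement 0.305, CNR 0.39.
print(round(relative_improvement(0.305, 0.39)))   # 28
```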
Our methods give better results than the query modification techniques used in MARS (Ortega and Mehrotra 2004). Both also provide a significant improvement in average accuracy compared to QCluster (Kim et al. 2005). They show improvements of 18 and 28 % (respectively for 10 and 20 examples of relevance feedback in the first 100 retrieved images) compared with traditional techniques. QCluster has an improvement of 20 % compared with traditional techniques, but for this approach, the number of examples is the maximum number of relevant images in the first 100 images of the result. This number is greater than the number of examples in our proposed methods (20 maximum). In practice, the approach proposed by QCluster seems unrealistic in terms of usage, because it is difficult to ask the user for too many interactions. A system asking the user for 20 interactions seems more realistic than one asking for 100. In addition, QCluster and MARS are evaluated on only 100 queries, and their ground truths are selected solely for their own methods. Our method is evaluated on 5,000 queries, which makes the evaluation much more general than those of QCluster and MARS.
For the Caltech256 database based on 20 sample images (Fig. 9), query expansion is the worst, and query point movement and the CR method perform about the same. In the first iteration, all methods have the same performance, while for the latter two iterations CR is better than query point movement, but in the 5th iteration query point movement is better than CR. Only the CNR method is always better than the other methods. The average precision of the best traditional technique is 0.308 after 5 iterations, while the CNR method has an average accuracy of 0.368 and the CR method has an average accuracy of 0.296. The improvement in accuracy of the CNR method over traditional techniques is about 20 %.
For the PascalVOC2011 database based on 20 sample images (Fig. 10), query expansion is also the worst, and query point movement is better than the CR method. For the first iteration, the two traditional techniques perform better than our methods. During the later iterations, query point movement is better than the CR method, but the CNR method always outperforms all other methods. The average precision of the best traditional technique is about 0.393 after 5 iterations, while the CNR method has an average accuracy of 0.464 and the CR method has an average accuracy of 0.370. The improvement in accuracy of CNR
Fig. 7 Corel30K: Average accuracy for the first 100 retrieved images for the four techniques of relevance feedback with 10 feedback examples for each iteration. QE, Query expansion; QPM, Query point movement; CR, Clustering-Repeat; CNR, Clustering-No-Repeat. Both CR and CNR methods show very good performance compared to existing query modification techniques
Fig. 8 Corel30K: Average accuracy for the first 100 retrieved images for the four techniques of relevance feedback with 20 feedback examples for each iteration. QE, Query expansion; QPM, Query point movement; CR, Clustering-Repeat; CNR, Clustering-No-Repeat. The CNR method gives the best result