DSpace at VNU: Cluster-based relevance feedback for CBIR: A combination of query point movement and query expansion


ORIGINAL RESEARCH

Cluster-based relevance feedback for CBIR: a combination of query point movement and query expansion

Nhu-Van Nguyen · Alain Boucher · Jean-Marc Ogier · Salvatore Tabbone

Received: 30 June 2011 / Accepted: 5 June 2012 / Published online: 21 June 2012

© Springer-Verlag 2012

Abstract This paper presents a cluster-based relevance feedback method that combines two popular relevance feedback techniques: query point movement and query expansion. Inspired by text retrieval, both techniques give good results for image retrieval. However, query point movement is limited by a unimodality constraint when taking the user feedback into account. Query expansion gives better results than query point movement, but it cannot take into account the irrelevant images in the user feedback. We combine the two techniques to profit from their advantages and to cope with their limitations. From a single point initial query, query expansion provides a multiple point query, which is then enhanced using query point movement. To learn the multiple point queries, the irrelevant feedback images are classified into the query points, which are clustered from the relevant images using the query expansion technique. The experiments show that our method gives better results than either relevance feedback technique taken individually.

Keywords Image retrieval · Relevance feedback · Query point movement · Query expansion

N.-V. Nguyen (✉) · J.-M. Ogier
L3i-University of La Rochelle, La Rochelle, France
e-mail: Nhu-Van.Nguyen@univ-lr.fr

J.-M. Ogier
e-mail: Jean-Marc.Ogier@univ-lr.fr

N.-V. Nguyen · A. Boucher
IFI, MSI team; IRD, UMI 209 UMMISCO; Vietnam National University, Hanoi, Vietnam
e-mail: alain.boucher@auf.org

N.-V. Nguyen · S. Tabbone
QGAR-LORIA, University of Lorraine, Nancy, France
e-mail: tabbone@loria.fr

DOI 10.1007/s12652-012-0141-z

1 Introduction

There are two reasons for the limited performance of Content-Based Image Retrieval (CBIR) systems. The first is that it is impossible to fully express the user's intent in a simple retrieval query. The second is the semantic gap, which can be defined as the difference between the user's interpretation of an image and its computer description. To address these problems, several researchers (Zhou and Huang 2003; Nguyen et al. 2009; Apostol et al. 2005; Kim et al. 2005; Ritendra et al. 2008; Ortega and Mehrotra 2004; Yoshiharu et al. 1998) have applied relevance feedback (RF) techniques in CBIR over the last decade. Significant performance improvements have been observed from the application of RF techniques in the traditional text retrieval domain. Nowadays, RF has become an essential component of a CBIR system.

RF is an interactive strategy that is effective at improving the accuracy of information retrieval systems. The basic idea is to involve the user in the retrieval process so that the final result set is improved. In particular, the user gives feedback on the relevance of the documents in an initial set of results, which adapts the retrieval process to a specific user and a specific query. The user first submits a query (an example image in our case) and receives some results. The user then interacts with the system by labeling some images as relevant or irrelevant to the given query. The system, in turn, computes a revised and better set of retrieval results based on the user feedback. RF has a short-term memory, which means that the system can remember the results during the interaction process for the given query. Once the process is finished, the system clears its memory and the next user starts from scratch.

Various relevance feedback techniques have been proposed to improve retrieval performance: feature weight learning (Yoshiharu et al. 1998), query modification (Ortega and Mehrotra 2004) and classifier learning (Tao et al. 2006). Among them, query modification is the most popular and is widely used in both image retrieval and text retrieval. Query modification includes two different techniques: query point movement and query expansion. The first technique, query point movement (Ortega and Mehrotra 2004; Yoshiharu et al. 1998), performs retrieval with a single point query (as represented in the feature space), which is modified using the relevant and irrelevant images that represent the positive and negative feedback from the user. It works under the assumption of the unimodality of the relevant images (Yimin and Aidong 2004). Unimodality means that all relevant images are similar to one another and form a cluster distinct from the other images in the feature space. Query point movement tries to obtain the ideal query point by moving it towards relevant images and away from irrelevant ones. The second technique, query expansion (Ortega and Mehrotra 2004; Kim et al. 2005), performs retrieval with multiple point queries. Instead of assuming one unimodal distribution, query expansion assumes many smaller unimodal distributions and constructs multiple point queries from the relevant images. Query expansion is arguably one of the most effective approaches to relevance feedback.

In this paper, we propose a novel method that combines these two techniques for query by example in CBIR. Query expansion is used to construct multiple point queries by clustering the relevant images. Query point movement is used to improve the representation of the multiple point queries by applying the Rocchio technique (Salton 1971) to the relevant and irrelevant images. Our contribution is a cluster-based relevance feedback technique that uses query point movement and the irrelevant examples to enhance the efficiency of query expansion.

This paper is divided into six sections. Section 2 describes the related work, and Sect. 3 discusses the remaining problems. Section 4 presents our method. Section 5 discusses the evaluation and presents experimental results on a large dataset of 30K images. Section 6 concludes the paper and gives some directions for future work.

2 Related work

Because of the difficulty of fully expressing the user's intent in a simple query and the problem of the semantic gap, many works have focused on relevance feedback. Various relevance feedback techniques have been proposed: feature weight learning (Yoshiharu et al. 1998), query modification (Ortega and Mehrotra 2004) and classifier learning (Tao et al. 2006). Feature weight learning improves the distance function, query modification looks for the ideal query point, and classifier learning uses the relevant/irrelevant images as training data to construct a probabilistic classifier. Among these techniques, query modification is based on the text retrieval approach and is often considered the best approach to relevance feedback in image retrieval systems. This traditional type of approach is still very efficient compared to the other techniques in both fields, text retrieval and image retrieval. In the general context of image retrieval and the development of relevance feedback techniques, a recognized problem is the small number of available examples. We assume that a user can label at most 20 images, whereas most learning techniques require many more. If we compare the Rocchio algorithm for query modification with learning algorithms (metric or classifier optimization), such as neural networks, we can see that the popularity of query modification is related to the fact that it requires very few training examples.

To detail these two techniques for query modification, we must first define the concept of unimodality of an image group. Unimodality is a concept used by some authors in the field of relevance feedback (Karthik and Jawahar 2006; Yimin and Aidong 2004) to characterize the fact that the closest images of a query in the feature space are not all relevant to the query. However, there is no clear definition of this concept, so we define it as follows.

Definition The concept of unimodality of an image group means that all images in this group are similar and form a group distinct from the other images in the feature space. In relevance feedback, images in a group are similar in the sense of their relevance to the given query. The relevance can be estimated using an arbitrary threshold or function or, as in our work, indicated by the user, who labels some images in the retrieval results as relevant or irrelevant. Relevance is thus a subjective notion, meaning that the image satisfies the query as judged by the human user. An image group is defined as centered on the query in the feature space or, in other words, as the closest retrieval results for the given query.

For example, in Fig. 1, the left group is unimodal while the right group is not.

The query modification technique, which we focus on in this paper, can be achieved using either of two approaches: query point movement and query expansion. In both approaches, the input is a single point query (a vector in the feature space). Query point movement aims at moving the single point query in the feature space (adjusting the feature vector of the query point, Fig. 2). Query expansion aims at replacing the single point query by a multiple point query (replacing one feature vector by multiple feature vectors, Fig. 3). Each technique uses the incremental information from interactions with the user, in other words the relevant/irrelevant images labeled by the user.

2.1 Query point movement

In the query point movement approach (Ortega and Mehrotra 2004; Yoshiharu et al. 1998) for query by example in CBIR, a query is represented by a single point in the feature space, and the refinement process attempts to reformulate the query vector to move it closer to the area containing relevant images (see Fig. 2). Under the assumption of the unimodality of relevant images, the optimal query maximizes the similarity to relevant images and minimizes the similarity to irrelevant ones (Kim et al. 2005). The Rocchio technique (Salton 1971) is often used to compute the optimal query:

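$$\vec{q}_{i+1} = \alpha\,\vec{q}_i + \frac{\beta}{|D_r|} \sum_{\vec{d} \in D_r} \vec{d} - \frac{\gamma}{|D_n|} \sum_{\vec{d} \in D_n} \vec{d} \qquad (1)$$

where $\vec{q}_i$ is the query at iteration i of the relevance feedback process, $D_r$ is the relevant set, $D_n$ is the irrelevant set, and $\alpha$, $\beta$ and $\gamma$ give the relative weights of $\vec{q}_i$, $D_r$ and $D_n$. In experiments, the parameter setting $\alpha = \beta = \gamma = 1$ is widely used for image retrieval.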
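As an illustration, the Rocchio update can be sketched in a few lines of Python. This is a minimal sketch, not the implementation used in the experiments: the function names `mean_vector` and `rocchio_update` are hypothetical, feature vectors are assumed to be plain lists of floats, and an empty feedback set is assumed to contribute a zero vector.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Component-wise mean; an empty set yields the zero vector (assumption)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def rocchio_update(
    query: Sequence[float],
    relevant: Sequence[Sequence[float]],
    irrelevant: Sequence[Sequence[float]],
    alpha: float = 1.0,
    beta: float = 1.0,
    gamma: float = 1.0,
) -> Vector:
    """One query point movement step: weighted previous query, plus the mean
    of the relevant set, minus the mean of the irrelevant set."""
    dim = len(query)
    r_mean = mean_vector(relevant, dim)
    n_mean = mean_vector(irrelevant, dim)
    return [
        alpha * query[k] + beta * r_mean[k] - gamma * n_mean[k]
        for k in range(dim)
    ]


if __name__ == "__main__":
    q = [1.0, 1.0]
    relevant = [[2.0, 0.0], [4.0, 2.0]]
    irrelevant = [[1.0, 1.0]]
    print(rocchio_update(q, relevant, irrelevant))
```

With all three weights set to 1, as in the setting mentioned above, the new query is simply the old query shifted by the difference between the two feedback means.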

2.2 Query expansion

In the query expansion approach (Kim et al. 2005; Ortega and Mehrotra 2004), the query is modified by selectively adding new relevant points to the query representation. A single point query is replaced by a multiple point query (see Fig. 3). Instead of assuming one unimodal distribution as in query point movement, query expansion assumes many smaller unimodal distributions and constructs multiple local clusters from the relevant images. The representatives of the local clusters are used to perform multiple point querying. The clustering of relevant images is repeated at each relevance feedback iteration. Querying by multiple points is investigated in (Xiangyu and James 2003; Natsev and Smith 2003; Thijs and de P Vries Arjen 2004; Tahaghoghi et al. 2002; Apostol et al. 2005; Danzhou et al. 2009), which focus on the similarity function and the fusion of multiple single point queries. The experimental evaluation in (Ortega and Mehrotra 2004) shows that query expansion outperforms query point movement in retrieval effectiveness.

Recently, new approaches have aimed to improve the query modification technique. The QCluster system (Kim et al. 2005) uses an adaptive classification and cluster-merging method to find multiple regions. The clustering step is not repeated as in query expansion: QCluster classifies relevant examples into the previous clusters or creates a new cluster. The number of clusters is limited to a fixed number by the cluster-merging method. However, this complex approach is unable to make effective use of irrelevant examples. All the above methods still have drawbacks such as local maximum traps and slow convergence. In (Danzhou et al. 2009), the authors propose a fast query point movement technique to get rid of these drawbacks. However, their work addresses target search using relevance feedback, which differs somewhat from the category search done in classical CBIR. Target search in CBIR systems refers to finding a specific (target) image, such as a particular registered logo or a specific historical photograph.

Fig. 1 Unimodality of an image group based on the user feedback: relevant (+) and irrelevant (-) result images compared with the given query. A non-unimodal image group (the group includes irrelevant images as judged by the user given the query) can contain some unimodal subgroups, as in the right group, where we can identify 3 unimodal subgroups (not centered on the query). In our work, we try to identify these unimodal subgroups within a non-unimodal image group

Fig. 2 Query point movement. a The initial query and the user feedback (relevant "+" and irrelevant "-" result images). b The query moves toward the relevant images. c The query moves toward the relevant images until it is positioned at the center of the relevant images

Fig. 3 Query expansion: a a single point query is replaced by b a multiple point query, using the user feedback, relevant (+) example images only

2.3 Multiple point query

Query expansion requires support for multiple point querying. Querying by multiple points is investigated in (Thijs and de P Vries Arjen 2004; Tahaghoghi et al. 2002; Apostol et al. 2005), which are concerned with the similarity function and the fusion of multiple single point queries. The similarity of images to each single point query is determined independently. The result for a single point query is an ordered list. The lists from all single point queries must be combined to determine the final ranking for the multiple point query. A combining function is therefore required to reduce multiple similarity values to a single value. When this reduction has been performed for all images in the collection, the user is presented with a list of the images in decreasing order of similarity. All combining functions can be grouped into three types: MINIMUM, MAXIMUM and SUM. These types define the distance of an image from the multiple point query as, respectively, the minimum, the maximum and the (weighted) sum of its distances to each single point query. In our experiments, the MINIMUM function was found to be the best combining function in terms of robustness. This is also confirmed by Tahaghoghi et al. (2002).
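The three combining functions can be illustrated with a short Python sketch. The Euclidean distance, the function names `combined_distance` and `rank_images`, and the dictionary of image feature vectors are assumptions made for illustration only; the text does not fix a particular distance or data layout.

```python
import math
from typing import Dict, List, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_distance(
    image: Sequence[float],
    query_points: Sequence[Sequence[float]],
    mode: str = "min",
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Reduce the distances to each single point query to a single value."""
    dists = [euclidean(image, q) for q in query_points]
    if mode == "min":
        return min(dists)
    if mode == "max":
        return max(dists)
    if mode == "sum":
        w = weights if weights is not None else [1.0] * len(dists)
        return sum(wi * di for wi, di in zip(w, dists))
    raise ValueError(f"unknown combining mode: {mode}")


def rank_images(
    images: Dict[str, Sequence[float]],
    query_points: Sequence[Sequence[float]],
    mode: str = "min",
) -> List[str]:
    """Image identifiers sorted by increasing combined distance,
    which corresponds to decreasing similarity."""
    return sorted(
        images,
        key=lambda name: combined_distance(images[name], query_points, mode),
    )


if __name__ == "__main__":
    images = {"a": [0.0, 0.0], "b": [5.0, 5.0], "c": [10.0, 0.0]}
    query_points = [[0.0, 1.0], [10.0, 1.0]]
    print(rank_images(images, query_points, "min"))
    print(rank_images(images, query_points, "max"))
```

In this toy example, MINIMUM ranks first the images that are close to any one query point, whereas MAXIMUM favors the image lying between the two query points, which shows why MINIMUM suits a query made of several distinct clusters.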

3 Remaining problems

The main disadvantage of query point movement is the constraint of unimodality (see the definition in Sect. 2) on relevant examples. The main problem of query expansion is its difficulty in making effective use of irrelevant images. In query point movement, the query point is moved closer to the relevant examples and away from the irrelevant ones in the feature space. When the relevant images are grouped in distinct subsets in the feature space (that is, the distribution of the relevant examples is not unimodal), the problem arises from the need to cover multiple clusters with a single query. In these cases, the ideal query point includes irrelevant examples. Figure 4 shows ellipses representing the lines equidistant from a new query. We can see that some irrelevant examples are included in the relevant ellipses.

Query expansion and its best improved version, QCluster (Kim et al. 2005), only use relevant examples to form multiple point queries. Query expansion does not use irrelevant examples because clustering relevant and irrelevant examples together would produce false groups. Our analysis suggests that, without irrelevant examples, convergence towards the ideal query point can be very slow, and that the risk of falling into a local minimum is not negligible. Indeed, a false ideal query point can be reached when the local group is close to some relevant examples but also located near many irrelevant examples (see Fig. 4). We can see from this figure that irrelevant examples may be included in local groups, because these are constructed from relevant examples only, regardless of whether irrelevant ones are present. In general, relevance feedback techniques mostly use relevant feedback examples. The handling of irrelevant feedback examples remains a major avenue for improvement and a very open scientific question (Xuanhui et al. 2008).

4 Cluster-based relevance feedback for CBIR

In this section, we present our approach, which attempts to address the problems identified above. This approach exploits irrelevant examples and combines query point movement and query expansion.

A combination of query point movement and query expansion is proposed to overcome the problems of each technique. The main drawback of query point movement is the constraint of unimodality on relevant examples, which cannot always be verified. We solve this problem by using a clustering technique to build multiple local clusters that provide local unimodality using relevant examples. The main drawback of query expansion is its inability to make effective use of irrelevant examples. In our approach, we propose a sequential combination of the two techniques: first query expansion (Fig. 5b), then query point movement (Fig. 5c). We take advantage of irrelevant examples by applying query point movement to the multiple local clusters created by query expansion. We believe this sequential combination is the best among all possible combinations because it ensures the unimodality constraint and makes use of irrelevant examples (Fig. 5c) to effectively reach the ideal query. The opposite combination (first query point movement, then query expansion) is not as good, because query expansion cannot profit from the irrelevant examples used in query point movement.

Fig. 4 Remaining problems with query point movement and query expansion. a In query point movement, the ideal query point can include some irrelevant examples (-) due to the non-unimodality of the relevant examples. b In query expansion, ideal query points converge slowly when irrelevant examples (-) are not used. Both techniques can result in a local maximum trap

The purpose of this technique is to reach the ideal query through interaction with the user and to overcome the problems identified for both query point movement and query expansion. The first relevance feedback interaction loop is shown in Fig. 6. Initially, a single point query is formalized using the feature vector of a query image q: $Q = (f_1, f_2, \ldots, f_n)$, an n-dimensional vector in the feature space. Images are then retrieved, and the first N images are shown to the user (who has a limited view due to screen interface constraints). The user identifies and labels relevant/irrelevant images in an RF interaction process, under the assumption that the relevant examples in the result do not ensure unimodality (Fig. 6, steps 1 and 2). Based only on the relevant/irrelevant images returned by the user, the technique replaces and improves the single point query q by a multiple point query $q_i$, $i > 1$ (a query with multiple feature vectors), using two main processes: query expansion and query point movement.

First, the single point query q is expanded into a multiple point query to ensure the unimodality of each sub-query, which is the problem of query point movement (Fig. 6, step 3): the relevant examples are clustered into c groups $C_1, C_2, \ldots, C_c$. The number of clusters c is selected automatically using an adaptive clustering technique and is limited to a maximum value. In this step, we try to obtain the largest clusters (groups) that remain unimodal. The two clustering algorithms used in our system are presented at the end of this section. Second, in order to find the ideal points of the c relevant groups, the query point movement technique is used: irrelevant examples are classified into these c groups (Fig. 6, step 4) to identify the irrelevant examples present in each local group (in contrast with query expansion, where only relevant examples are used). The relevant and irrelevant examples in each group are then used to build the multiple point query by Eq. 1 (Fig. 6, step 5), in which we try to move the query points closer to the relevant images and away from the irrelevant images. The k-Nearest Neighbors (k-NN) classifier is used in step 4 for the classification of irrelevant examples because of its efficiency and simplicity. The parameter k of the classifier is selected as follows:

and the query point $\vec{q}_i$ of cluster i is calculated using Rocchio's formula (Salton 1971):

$$\vec{q}_i = \frac{1}{m} \sum_{j=1}^{m} \vec{R}_j - \frac{1}{n} \sum_{j=1}^{n} \vec{I}_j \qquad (3)$$

where $I_1, I_2, \ldots, I_n$ are the n irrelevant examples and $R_1, R_2, \ldots, R_m$ are the m relevant examples of the local cluster $C_i$. These c query points form the final multiple point query.

Fig. 5 Combination of query point movement and query expansion, where ideal query points are reached more efficiently and quickly and irrelevant examples are not present in local clusters. a The initial single point query and the feedback (relevant "+" and irrelevant "-") given by the user. b The multiple point query obtained by query expansion. c The multiple point query is moved towards relevant feedback and away from irrelevant feedback using query point movement

Fig. 6 Main steps for the cluster-based relevance feedback
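Steps 4 and 5 of this process can be sketched as follows. This is a minimal sketch under several assumptions: the clusters of relevant examples are taken as given (the clustering of step 3 is not shown), the distance is Euclidean, the value of k is a free parameter here because its selection rule is not reproduced, and each query point is taken to be the mean of the relevant examples of a cluster minus the mean of the irrelevant examples assigned to it, in the spirit of the Rocchio formula. The names `knn_cluster` and `build_multipoint_query` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Component-wise mean; an empty set yields the zero vector (assumption)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def knn_cluster(
    point: Sequence[float],
    clusters: Sequence[Sequence[Sequence[float]]],
    k: int,
) -> int:
    """Step 4: index of the cluster holding most of the k relevant
    examples nearest to `point` (k-NN vote)."""
    labelled = [
        (euclidean(point, r), idx)
        for idx, members in enumerate(clusters)
        for r in members
    ]
    labelled.sort()
    votes = Counter(idx for _, idx in labelled[:k])
    return votes.most_common(1)[0][0]


def build_multipoint_query(
    clusters: Sequence[Sequence[Sequence[float]]],
    irrelevant: Sequence[Sequence[float]],
    k: int = 1,
) -> List[Vector]:
    """Step 5: one query point per cluster, mean of its relevant examples
    minus mean of the irrelevant examples classified into it."""
    dim = len(clusters[0][0])
    assigned: List[List[Sequence[float]]] = [[] for _ in clusters]
    for x in irrelevant:
        assigned[knn_cluster(x, clusters, k)].append(x)
    query = []
    for members, negatives in zip(clusters, assigned):
        r_mean = mean_vector(members, dim)
        n_mean = mean_vector(negatives, dim)
        query.append([r - n for r, n in zip(r_mean, n_mean)])
    return query


if __name__ == "__main__":
    clusters = [[[0.0, 0.0], [2.0, 0.0]], [[10.0, 10.0]]]
    irrelevant = [[1.0, 1.0]]
    print(build_multipoint_query(clusters, irrelevant, k=1))
```

In this example the single irrelevant image falls into the first cluster and shifts only that query point; the second query point stays at the mean of its relevant examples.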

As discussed above, in the first interaction loop, the initial query (a single point) is replaced by a multiple point query by building local groups (clustering step). For the following interaction loops, there are two choices for improving the multiple point query. The first choice does not rely on the first multiple point query (the clustering step of the first iteration) but re-clusters the relevant examples at each iteration. This method attempts to add relevant query points and to remove irrelevant points in the query, based on all relevant/irrelevant examples from every interaction loop. Clustering and classification are repeated at each iteration. The second choice is to move the points of the first query towards ideal points based on the new relevant/irrelevant examples from the following interactions. This method assumes that the ideal query points can be reached from the first constructed query points. Since the local groups are not rebuilt, the clustering step is performed only once, during the first interaction loop; in the following interactions, the query is built from the multiple point query of the first iteration.

We can observe that the first choice is more influenced by query point movement than by query expansion, because it attempts to move the multiple point query to the ideal query. In contrast, the second choice is more influenced by query expansion, because it tries to create the ideal query points based on the clustering. We call these two methods Clustering-Repeat (CR) and Clustering-No-Repeat (CNR). The two corresponding algorithms are described below.

Clustering-Repeat (CR) In this approach, the clustering of relevant examples, the classification of irrelevant examples and the construction of the multiple point query are repeated at each iteration of relevance feedback. Thus, the system performs the same process for all iterations. The query of the previous iteration does not directly affect the new query of the current iteration. Examples from the previous iterations are also included in the current iteration. Implicitly, relevant points are added and irrelevant ones are dropped as we move from one iteration to the next.

Clustering-No-Repeat (CNR) In this approach, the previous query directly affects the new query. The clustering of relevant examples is performed once, at the first iteration. Then, during subsequent iterations, instead of performing a new clustering as in the CR method, both relevant and irrelevant examples are classified into the points of the previous query, thereby taking advantage of the previous query. New query points are refined from the relevant/irrelevant examples using the query point movement technique.

The two algorithms differ in steps 3, 4 and 5. In the CNR method, step 3 is performed only once (at the first iteration), while it is repeated at every iteration in the CR method. In step 4, only the irrelevant set is classified into the clusters in the CR algorithm, while both sets (relevant and irrelevant) are classified into the clusters in the CNR algorithm. In step 5 of the CR algorithm, the relevant set is used to rebuild the local groups (step 3 is repeated). Finally, the formula used to construct the multiple point query differs between the two algorithms.
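To make the CNR variant more concrete, one refinement iteration can be sketched as below. The exact CNR update formula is not reproduced in this text, so the sketch makes two explicit assumptions: each new feedback example is assigned to its nearest previous query point (a 1-NN classification with Euclidean distance), and each query point is then moved with the Rocchio update of Eq. 1. The names `nearest_index` and `cnr_refine` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Component-wise mean; an empty set yields the zero vector (assumption)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def nearest_index(
    point: Sequence[float], query_points: Sequence[Sequence[float]]
) -> int:
    """Index of the previous query point closest to `point`."""
    return min(
        range(len(query_points)),
        key=lambda i: euclidean(point, query_points[i]),
    )


def cnr_refine(
    query_points: Sequence[Sequence[float]],
    relevant: Sequence[Sequence[float]],
    irrelevant: Sequence[Sequence[float]],
    alpha: float = 1.0,
    beta: float = 1.0,
    gamma: float = 1.0,
) -> List[Vector]:
    """One CNR-style iteration: no re-clustering, both feedback sets are
    classified into the previous query points, which are then moved."""
    dim = len(query_points[0])
    pos: List[List[Sequence[float]]] = [[] for _ in query_points]
    neg: List[List[Sequence[float]]] = [[] for _ in query_points]
    for x in relevant:
        pos[nearest_index(x, query_points)].append(x)
    for x in irrelevant:
        neg[nearest_index(x, query_points)].append(x)
    refined = []
    for q, p, n in zip(query_points, pos, neg):
        p_mean = mean_vector(p, dim)
        n_mean = mean_vector(n, dim)
        refined.append(
            [alpha * q[d] + beta * p_mean[d] - gamma * n_mean[d] for d in range(dim)]
        )
    return refined


if __name__ == "__main__":
    previous = [[0.0, 0.0], [10.0, 10.0]]
    print(cnr_refine(previous, [[1.0, 0.0], [9.0, 10.0]], [[0.0, 2.0]]))
```

A CR iteration, by contrast, would discard the previous query points, re-cluster all relevant examples collected so far, and rebuild the query from scratch.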

Discussion In this section, we have presented our approach to relevance feedback, with two variants. Our approach combines two query modification techniques, query point movement and query expansion, to take advantage of irrelevant examples, to address the problem of unimodality, and to try to eliminate all irrelevant examples from the result. Both variants (Clustering-Repeat and Clustering-No-Repeat) aim at finding the ideal query points as we move from one interaction loop to the next. The first method (Clustering-Repeat) aims to replace irrelevant query points by relevant query points. The second method (Clustering-No-Repeat) aims to move query points to ideal points. The first method (CR) is more dependent on the performance of the clustering method than the second one, because the clustering is repeated at every iteration. The second method (CNR) is more dependent on the construction of the initial points. For example, if all the possible relevant examples can be represented in n distinct groups but the relevant examples labeled by the user and used to construct the initial points belong to c < n distinct groups, this can cause a loss in the result. The computational complexity of the two algorithms is the sum of the complexities of the clustering and classification methods used. In our case, Competitive Agglomeration is a fuzzy clustering method with a computational complexity of O(CDN), where C is the number of prototypes, D is the dimension of the data points and N is the number of data points to cluster. The k-NN classification method has a complexity of O(DN), where D is the dimension of the data points and N is the total number of points. The total complexity is O(CDN) + O(DN), which is suitable for retrieval in large image datasets, bearing in mind that, under our assumption, the number of samples processed at each interaction (relevant/irrelevant examples) is very small, estimated at 20 at most (limited by the number of images that the user can label).

4.1 Selection of clustering method

In our approach to relevance feedback, an important step is the clustering of user feedback. Clustering is used to separate the relevant images into groups. In our system, the number of groups is unknown. We are therefore interested in clustering methods able to determine the optimal number of groups automatically. We have experimented with two methods: Adaptive K-Means (Kothari and Pitts 1999) and Competitive Agglomeration (Frigui and Krishnapuram 1997). These two methods were chosen for their ability to determine the number of groups automatically, and they are representative of two well-known types of clustering methods in the literature: hierarchical methods and partitional methods.

Adaptive K-means The best known clustering algorithm is the k-means method. For p models

$$\{x_l : l = 1, 2, \ldots, p\}, \quad x_l \in \mathbb{R}^n \qquad (4)$$

the k-means method obtains the positions of the k cluster centers $y_m$ by minimizing the cost function

$$J = \sum_{l=1}^{p} \sum_{m=1}^{k} I(y_m \mid x_l)\, \|x_l - y_m\|^2 \qquad (5)$$

where $\|\cdot\|$ denotes a distance metric and $I(y_m \mid x_l)$ is an indicator function which equals 1 if $m = \arg\min_{j} \|x_l - y_j\|^2$ and 0 otherwise.
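The k-means cost can be evaluated directly from this definition. The sketch below assumes the squared Euclidean distance and uses the hypothetical names `nearest_center` and `kmeans_cost`; because the indicator selects only the nearest center of each point, the double sum reduces to one squared distance per point.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(
    point: Sequence[float], centers: Sequence[Sequence[float]]
) -> int:
    """Index m for which the indicator I(y_m | x_l) equals 1."""
    return min(
        range(len(centers)),
        key=lambda m: squared_distance(point, centers[m]),
    )


def kmeans_cost(
    points: Sequence[Sequence[float]], centers: Sequence[Sequence[float]]
) -> float:
    """Sum over all points of the squared distance to the nearest center."""
    return sum(
        squared_distance(x, centers[nearest_center(x, centers)]) for x in points
    )


if __name__ == "__main__":
    points = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]
    centers = [[0.0, 0.0], [10.0, 0.0]]
    print(kmeans_cost(points, centers))
```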

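In the Adaptive K-Means method (Kothari and Pitts 1999), the proposed cost function is:

$$J = \sum_{l=1}^{p} \sum_{m=1}^{k} I(y_m \mid x_l)\, \|x_l - y_m\|^2 + \text{extra term} \qquad (6)$$

$$\text{extra term} = \sum_{l=1}^{p} \sum_{m=1}^{k} \tilde{\lambda}_m\, \tilde{I}(y_m \mid x_l)\, \|y_m - y_\omega\|^2 \qquad (7)$$

where $\tilde{I}(y_m \mid x_l)$ is an indicator function which equals 1 if $y_m \in N_{y_\omega}$, with $\omega = \arg\min_{j} \|x_l - y_j\|^2$, and $N_{y_\omega}$ is the neighborhood of the cluster center $y_\omega$.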

There are two terms in the cost function: the first is the same as in the k-means method, and the second is an extra term. This extra term tries to spread the cluster centers so as to minimize the sum of squared distances from a cluster center to the nearby cluster centers.

Smaller values of the neighborhood encourage the formation of several centers in separate clusters, while larger values encourage the formation of fewer distinct cluster centers. The Adaptive K-Means method treats the neighborhood as a scale parameter and provides the number of cluster centers at different values of the scale parameter. The number of clusters in the data is then obtained from the stability of the clusters as the scale parameter varies.

Competitive agglomeration This second clustering method (Frigui and Krishnapuram 1997) minimizes an objective function that integrates the advantages of hierarchical and partitional clustering techniques. The Competitive Agglomeration algorithm produces a sequence of partitions with a decreasing number of groups. It begins by partitioning the data into a specified number of groups and finally provides the "best" number of groups. During the clustering phase, adjacent groups compete against each other to capture the data points, and groups that gradually lose the competition are depleted and disappear, until only groups with large cardinality survive. The algorithm can incorporate different distance measures in the objective function to find groups of various shapes.

Discussion on clustering methods In our experiments, different clustering methods were studied to compute the local groups. Taking advantage of the benefits of both hierarchical and partitional clustering, Competitive Agglomeration (Frigui and Krishnapuram 1997) seems to produce the best performance in our extensive testing. Another advantage of this clustering method is the automatic selection of the number of groups. Our experiments have shown that the choice of the clustering and classification methods does not have much influence on the final result, because the total number of samples (relevant/irrelevant) is very small. Recall that the user marks only a few examples as relevant or irrelevant during the relevance feedback process. We present the experiment comparing these clustering methods in the results section of this paper.

5 Evaluation

We have presented our contribution to relevance feedback for content-based image retrieval, with two methods. These methods are based on a combination of two popular techniques: query point movement and query expansion. The main idea of our approach is to avoid the problems associated with query point movement and query expansion in order to enhance search results. This approach provides a good tool for improving the performance of image retrieval. In this section, we present the experiments that evaluate our methods for relevance feedback.

5.1 Experimental protocol

For our experiments, we use three different databases: the Corel 30K image database (Gustavo et al. 2007), the Caltech256 database (Griffin et al. 2007) and the Pascal VOC2011 database (Everingham et al. 2007). User interactions are simulated using external knowledge corresponding to the manual annotations in these databases. Three methods of relevance feedback are evaluated in this experiment: query point movement, query expansion and our proposed method, with its two variants Clustering-Repeat (CR) and Clustering-No-Repeat (CNR).

The content-based image retrieval system used in the experiments is based on the state-of-the-art Bag of Words model (Sivic and Zisserman 2008). Visual words are built using the SIFT feature, computed as in Sivic and Zisserman (2008). All the results presented in this section evaluate the improvement between the initial response from the system (after the initial query) and the one obtained after relevance feedback (as a percentage of improvement for the precision and recall measures).

5.1.1 Experimental database

The Corel 30K image database contains 30,000 images divided into different categories by experts, with 100 images in each class. The Caltech256 database contains about 30,000 images divided into 256 different categories by experts, with about 100 images in each class. The Pascal VOC2011 database contains about 15,000 images, each image belonging to one or several of the 23 different categories (multiple-class images).

We rely on a simulation of human interaction, using data already available in Corel30K, Caltech256 and PascalVoc2011, which plays a role somewhat similar to that of a human. A pseudo-relevance feedback technique is used to automatically simulate human interactions in relevance feedback. Our approach relies on the textual annotations given for the images in these databases, for which there are various possibilities for specifying a ground truth for validation.


5.1.2 Discussion on the protocols used for other systems

For the MARS system (Ortega and Mehrotra 2004), images relevant to a query image are selected as follows. A query image Q is selected at random from the database and the first 50 results are retrieved. This set of 50 images is referred to as the set relevant(Q). New queries are then constructed by moving around Q (these queries are close to Q in the feature space), and Q is considered to be the ideal query. Queries are chosen from around Q in the hope that they will converge to the ideal query Q (using relevance feedback). Then the first 100 images are retrieved, which become the set retrieved(Q). In MARS, precision and recall are calculated from the relevant(Q) and retrieved(Q) sets using the classical formulas below:

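\[
\text{precision} = \frac{|\text{relevant}(Q) \cap \text{retrieved}(Q)|}{|\text{retrieved}(Q)|}
\]

\[
\text{recall} = \frac{|\text{relevant}(Q) \cap \text{retrieved}(Q)|}{|\text{relevant}(Q)|}
\]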
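As a minimal sketch, the classical precision and recall measures can be computed from the two sets relevant(Q) and retrieved(Q). The function name `precision_recall` and the use of integer image identifiers are assumptions made for illustration; they are not taken from MARS.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for two collections of image identifiers."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)  # relevant images that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example in the MARS setting: 50 relevant images,
# 100 retrieved images, 25 images in common.
relevant_q = range(0, 50)
retrieved_q = range(25, 125)
precision, recall = precision_recall(relevant_q, retrieved_q)
```

With these illustrative sets, a quarter of the retrieved images are relevant and half of the relevant images are retrieved.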

For the MARS system (Ortega and Mehrotra 2004), the relevant set is selected in a way that ensures unimodality, since all images are visually similar to a query image. The authors assume that all the relevant images form a unimodal distribution, an assumption which is not entirely realistic and which creates an implicit limitation of the approach. In addition, this work averages all measures over only about 100 queries, which is very small compared to the number of images in the database. In another example, the QCluster system (Kim et al. 2005), the ground truth is relatively simple because the high-level category information in the Corel database is used as ground truth for simulating the relevance feedback. The images of the same class are considered as the most relevant images, and related categories (such as flowers and plants) are considered relevant. This assumption creates an easy condition for relevance feedback, because the number of relevant images is then higher than in other approaches [e.g. MARS (Ortega and Mehrotra 2004)], which explains the good quality of the results for the QCluster system.

5.1.3 Our experimental protocol

For our experiments, we take the ground truth to be the class of the images in Corel30K, Caltech256 and PascalVoc2011, which yields a wide variety of classes but seems representative of real-life conditions. We measure the retrieval performance with the classical recall/precision criteria by retrieving the first 100 responses (we assume that the user can see only 100 results on the screen interface). Most studies (Huiskes and Lew 2008; Yimin and Aidong 2004; Faria et al. 2010) on relevance feedback use only a sub-database (10, 20 or 50 categories) for experiments on Corel30K and Caltech256, because of the great number of images in these databases (30,000) while the number of images in each category is small (100). This is done to stress the effect of relevance feedback in the validation process. Following a similar protocol, we divide the whole database into five different experiment sets to ensure that there are relevant images in the first 100 images retrieved. The PascalVoc2011 database has 14,961 images and there are from 275 to 1,366 images in each class (except for one class which has 7,419 images), so there is no need to divide this database. For the experiments, we use about 5,000 queries for each experiment set.

One parameter for relevance feedback is the number of feedback examples given by the user at each iteration. This number of training examples is usually small. In our experiments, we rely on the assumption that a maximum of 20 images can be selected by the user. These images are chosen as the first P relevant examples and the first N irrelevant examples in the first 100 responses, where P + N ≤ 20. These examples are automatically returned by the system using the ground truth, since we use a pseudo-relevance feedback technique to automatically simulate human interaction. We propose two strategies for the number of examples:

1. Ten relevant examples and 10 irrelevant examples in the case of query point movement, CR and CNR, and 20 relevant examples in the case of query expansion. Recall that query expansion does not use irrelevant examples, because this technique attempts to combine the relevant examples to form the multiple point query.

2. Five relevant examples and 5 irrelevant examples in the case of query point movement, CR and CNR, and 10 relevant examples in the case of query expansion.
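The simulated feedback step described above (the first P relevant and the first N irrelevant examples among the first 100 responses) can be sketched as follows. This is an illustrative reading of the protocol, not the authors' code: the function name `select_feedback`, its parameters, and the `is_relevant` callback standing in for the ground-truth class labels are all assumptions.

```python
def select_feedback(ranked_ids, is_relevant, n_pos=10, n_neg=10, window=100):
    """Pick the first n_pos relevant and n_neg irrelevant images in the top window.

    ranked_ids  : image identifiers ordered by decreasing similarity to the query
    is_relevant : callable returning True when an image shares the query's class
    """
    positives, negatives = [], []
    for image_id in ranked_ids[:window]:
        if is_relevant(image_id):
            if len(positives) < n_pos:
                positives.append(image_id)
        elif len(negatives) < n_neg:
            negatives.append(image_id)
        if len(positives) == n_pos and len(negatives) == n_neg:
            break  # feedback budget exhausted
    return positives, negatives


# Hypothetical ranking in which every third image is relevant.
ranking = list(range(200))
same_class = lambda image_id: image_id % 3 == 0

# Strategy 1 for query point movement, CR and CNR: 10 relevant + 10 irrelevant.
pos, neg = select_feedback(ranking, same_class, n_pos=10, n_neg=10)

# Strategy 1 for query expansion: 20 relevant examples, no irrelevant ones.
pos_qe, neg_qe = select_feedback(ranking, same_class, n_pos=20, n_neg=0)
```

Fewer than the requested number of examples are returned when the first 100 responses do not contain enough relevant or irrelevant images, which matches the constraint P + N ≤ 20 rather than a strict equality.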

5.2 Results and discussion

5.2.1 Retrieval performance over 3 image databases

In this section, the four relevance feedback techniques are compared according to the protocol described above. As mentioned above, we compute the classical recall/precision criteria by retrieving the first 100 responses. As the number of images in each class of the Corel30K and Caltech256 databases is about 100 (thus, the number of relevant examples is equal to the number of examples retrieved), the recall for the first 100 retrieved images is equal to the precision.
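The equality of recall and precision stated above follows directly from the classical definitions. The short derivation below assumes exactly 100 relevant images per class, which holds for Corel30K and only approximately for Caltech256; the set notation relevant(Q) and retrieved(Q) is the one used in the discussion of MARS.

```latex
\[
\text{recall}
  = \frac{|\text{relevant}(Q) \cap \text{retrieved}(Q)|}{|\text{relevant}(Q)|}
  = \frac{|\text{relevant}(Q) \cap \text{retrieved}(Q)|}{100}
  = \frac{|\text{relevant}(Q) \cap \text{retrieved}(Q)|}{|\text{retrieved}(Q)|}
  = \text{precision}
\]
```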

For the Corel30K database, in the case of experiments based on 10 sample images (Fig. 7), our methods are better than query expansion and query point movement. The CNR method is slightly better than the CR method. After two iterations of relevance feedback, query point movement has the worst performance; the other three methods have equivalent performance. During subsequent iterations, both the CR and CNR methods become better than the traditional techniques. The average precision of the traditional techniques is approximately 0.244 after five iterations, while the CNR method has an average accuracy of 0.288 and the CR method has an average accuracy of 0.279. The improvement in accuracy of our methods over traditional techniques is 18 % from these results.
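The 18 % figure is a relative improvement, which can be checked from the reported average accuracies. The sketch below is only a worked arithmetic example; the function name `relative_improvement` is an assumption for illustration.

```python
def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# Corel30K, 10 feedback examples: traditional techniques vs. the CNR method.
gain_cnr = relative_improvement(0.244, 0.288)
```

Here the gain is (0.288 - 0.244) / 0.244, which is approximately 0.18, in agreement with the improvement reported in the text.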

In the case of experiments with 20 feedback images (Fig. 8), the CNR method outperforms all other methods. Our methods have better performance for the early iterations, but the accuracy of the CR method is not better than that of query point movement for the following iterations. In this case, query expansion gives the worst performance; query point movement and the CR method have the same performance, with an average accuracy of about 0.305, and the CNR method has the best average accuracy of 0.39. The improvement in accuracy for the CNR method compared with traditional techniques is 28 % in this experiment.

Our methods give better results than the query modification techniques used in MARS (Ortega and Mehrotra 2004). Both also provide a significant improvement in average accuracy compared to QCluster (Kim et al. 2005). They show improvements of 18 and 28 % (respectively for 10 and 20 examples of relevance feedback in the first 100 retrieved images) compared with traditional techniques. QCluster has an improvement of 20 % compared with traditional techniques, but for this approach the number of examples is the maximum number of relevant images in the first 100 result images. This number is greater than the number of examples in our proposed methods (20 maximum). In practice, the approach proposed by QCluster seems unrealistic in terms of usage, because it is difficult to ask the user for too many interactions. A system asking the user for 20 interactions seems more realistic than one asking for 100. In addition, QCluster and MARS are evaluated on only 100 queries and their ground truths are selected specifically for their own methods. Our method is evaluated on 5,000 queries, which makes the evaluation much more general than those of QCluster and MARS.

For the Caltech256 database based on 20 sample images (Fig. 9), query expansion is the worst, and query point movement and the CR method perform the same. In the first iteration, all methods have the same performance, while for the latter two iterations CR is better than query point movement, but in the 5th iteration query point movement is better than CR. Only the CNR method is always better than the other methods. The average precision of the best traditional technique is 0.308 after 5 iterations, while the CNR method has an average accuracy of 0.368 and the CR method has an average accuracy of 0.296. The improvement in accuracy of the CNR method over traditional techniques is about 20 %.

For the PascalVOC2011 database based on 20 sample images (Fig. 10), query expansion is also the worst and query point movement is better than the CR method. For the first iteration, the two traditional techniques have better performance than our methods. During the later iterations, query point movement is better than the CR method, but the CNR method always outperforms all other methods. The average precision of the best traditional technique is about 0.393 after 5 iterations, while the CNR method has an average accuracy of 0.464 and the CR method has an average accuracy of 0.370. The improvement in accuracy of CNR

Fig. 7 Corel30K: Average accuracy for the first 100 retrieved images for the four techniques of relevance feedback with 10 feedback examples for each iteration. QE, Query expansion; QPM, Query point movement; CR, Clustering-Repeat; CNR, Clustering-No-Repeat. Both CR and CNR methods show very good performance compared to existing query modification techniques

Fig. 8 Corel30K: Average accuracy for the first 100 images from the four techniques with 20 examples of relevance feedback for one iteration. QE, Query expansion; QPM, Query point movement; CR, Clustering-Repeat; CNR, Clustering-No-Repeat. The CNR method gives the best result
