FOR THE DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF INFORMATION SYSTEMS
NATIONAL UNIVERSITY OF SINGAPORE
2005
Secondly, I would like to thank A/P Danny POO and Dr Kan Min-Yen. Their valuable suggestions helped me greatly in improving my research.
In addition, many people have facilitated my research by providing suggestions on system development, data analysis, and thesis writing. Among all the people who have given me helpful assistance, I would like to extend my special thanks to Mr Liu Chengliang, Mr Ji Yong, Mr Zhang Xinhua, Mr Cui Hang, Mr Wang Gang, Mr Chen Zhiwei and Mr Chen Xi.
I gratefully acknowledge the financial support of the National University of Singapore in the form of my research scholarship. I would also like to express my gratitude for the excellent environment and facilities provided by NUS.
My heartfelt thanks go to my friends for their constant love and support. Miss Shen Wei, Mr Chin Yee Yung, Miss Qian Bo and Miss Shi Yijing have each given me years of friendship and have done more for me than I could ever hope to repay. I would also like to thank my lab-mates in the Knowledge Management Lab, including Miss Teoh Say Yen, Mr Kong Wei-Chang, Ms Chen Junwen, Mr Qian Zhijiang, Mr Cai Shun and Miss Yang Li. Every day with them was truly enjoyable.
Lastly, I would like to express my sincerest thanks to my parents. Their love and understanding are my impetus to perform research during my postgraduate studies.
TABLE OF CONTENTS

ACKNOWLEDGEMENTS
TABLE OF CONTENTS
SUMMARY
LIST OF TABLES
LIST OF FIGURES
Chapter 1 Introduction
Chapter 2 Related Works
2.1 Subjective Relevance, Topicality and Novelty
2.2 Relevance in Personalized Information Retrieval Studies
2.3 Novelty in System-Centered Information Retrieval Studies
2.4 Integration Rule in Relevance Judgment
Chapter 3 Novelty-Augmented Systems
3.1 Topicality Profile and Judgment
3.1.1 Topicality Profile
3.1.2 Topicality Profile Updating Strategy
3.1.3 Topicality Judgment
3.2 Novelty Profile and Relevance Judgment
3.2.1 Novelty Profile Type I
3.2.2 Novelty Profile Type II
Chapter 4 Experiment Design
4.1 Testing Task
4.2 Testing Corpus
4.3 Experimental Procedure
Chapter 5 Result and Analysis
5.1 Performance Test
5.2 Test of Learning Assumption and Judgment Criteria Integration
5.3 Simulations and Sensitivity Analysis
5.3.1 Simulation of Relevance Feedback
5.3.2 Novelty Profile Updating Speed
5.3.3 Novelty Weight
Chapter 6 Discussion and Conclusion
Bibliography
Appendix A
Appendix B
SUMMARY

Information overload has become an immediate issue with the rapid progress of information technology, especially the WWW. To help users better find their desired information, it is important to tailor information retrieval systems to individual preferences. However, the performance of most personalized information retrieval systems is still far from satisfactory. One potential problem, as pointed out by user-centered studies, is that the relevance measures in information retrieval systems are biased towards topicality and fail to capture the multidimensionality of users' relevance judgment. Furthermore, user-centered studies have also found that novelty perception is the next most important factor in users' relevance judgment besides topicality.

Building on past user studies, this thesis proposes a novelty-based approach to personalized information retrieval which incorporates both topicality and novelty as relevance criteria. More specifically, we propose a set of hypotheses regarding topicality and novelty in relevance judgment and test their validity with real users, using systems designed based on the hypotheses. In particular, we hypothesize that (1) novelty perception is a value-added criterion to improve personalized information retrieval, (2) relevance measures in past system-centered personalized information retrieval studies are biased towards topicality, (3) a user's novelty judgment standard is directed toward a subtopic and changes slowly because the user's learning of document content in the retrieval process is incomplete, and (4) relevance judgment of a document starts with topicality judgment, followed by novelty judgment in a stepwise fashion. A set of personalized information retrieval systems has been designed to implement these propositions. Our user test supports these hypotheses except for the last one, which might be insignificant because of the specific nature of the testing corpus.
LIST OF TABLES

Table 1: Example 1 – simple retrieval example based on vector space model
Table 2: Example 2 – vector space model with relevance feedback
Table 3: Summary of system-centered studies on novelty
Table 4: Potential PIR models that incorporate both topicality and novelty
Table 5: Novelty and topicality precision
Table 6: Missing evaluation analysis in simulations
LIST OF FIGURES

Figure 1: System interface
Figure 2: Raw relevance precision
Figure 3: Adjusted relevance precision
Figure 4: Adjusted relevance precision by round
Figure 5: Interaction effect of learning assumption and judgment criteria integration rule
Figure 6: Simulation for traditional relevance feedback
Figure 7: Sensitivity of IL-Add and IL-Step to novelty updating speed
Figure 8: Sensitivity of MMR-Add5 to redundancy parameter
Figure 9: Sensitivity of IL-Add to novelty weight
Chapter 1
Introduction
With the rapid progress of information technology, especially the prosperity of the WWW, the amount of information in the form of documents and web pages has increased dramatically, creating an acute need for information retrieval (IR) systems to help users exploit such an extremely valuable resource. However, one severe problem of most IR systems, such as search engines, is that they are not tailored to individual preferences. Pretschner and Gauch (1999) noted that almost half of the documents returned by search engines are deemed irrelevant by their users. An IR system typically represents a user only by the text query the user submits, and generates the same search results regardless of who submitted the query. In order to discriminate among the different information needs of users, the learning ability and personalization of IR systems are critical to achieving satisfactory retrieval performance. Therefore, personalized information retrieval (PIR) has been a very active research field in past years. Typical PIR techniques are based on relevance feedback and its variants (Ide, 1971; Ide & Salton, 1971; Rocchio, 1971; Salton & Buckley, 1990), which can be considered as learning the user's interest model within a single search session. Such techniques try to capture the context of a user's query from extra feedback. Furthermore, the application of relevance feedback techniques to long-term personalization can be seen as a kind of user profiling. In the IR domain, user profiling is the process of gathering feedback information either explicitly or implicitly from each user (Eirinaki & Vazirgiannis, 2003). By user profiling, a user's interest or preference profile, which represents user-specific and individual needs, can be learned over time. Some representative works of user profiling are term vector representation¹ (Widyantoro et al., 2001; Widyantoro et al., 1999), ontology representation² (Middleton et al., 2004; Speretta, 2004; Mostafa et al., 2003), and the combination of term vector representation and ontology representation (Liu et al., 2002).
Although PIR researchers have made efforts to improve personalization algorithms, unfortunately the performance of most PIR systems is still far from satisfactory. One potential problem, as alleged by user-centered IR studies, is that the relevance measures in IR systems fail to capture the multidimensionality of users' relevance judgment and are most probably biased towards topicality (Borlund, 2003; Cosijn & Ingwersen, 2000; Schamber et al., 1990; Saracevic, 1970). This argument is also supported by our study, which will be discussed later. It has been repeatedly found that a user's relevance judgment encompasses not only the topical match between an information need and a document, but also the novelty, understandability, reliability, and scope of the document (Xu & Chen, 2005; Maglaughlin & Sonnewald, 2002; Fitzgerald & Galloway, 2001; Bateman, 1998; Spink et al., 1998; Wang & Soergel, 1998; Park, 1993; Schamber, 1991). Among all these criteria, novelty perception, referring to the degree to which the content of a document is new to the user or different from what the user has known before, is considered the next most important factor besides topicality (Xu & Chen, 2005).

¹ Term vector representation of a user profile works in the same way as a document or a query is represented in the vector space model (Salton & McGill, 1983).
² The core of ontology representation of a user profile is to assign weights to various concepts obtained from some predefined topic concept hierarchy such as the Open Directory Project (Netscape, 1998).
If the relevance criteria uncovered by user-centered IR studies are indeed important and current system-centered PIR systems cannot fully incorporate them, the next question is how to incorporate these aspects into the design of PIR systems. This question is important because the actual performance of PIR systems that follow the guidance of user-centered IR studies offers a way to verify the validity of findings from user studies, as well as providing PIR researchers a new direction for creating innovative systems beyond parameter tweaking and algorithm modification.

Based on the findings of user-centered IR studies on relevance judgment, the purpose of this thesis is to propose a novelty-based approach to PIR which incorporates both topicality and novelty as relevance criteria. More specifically, we propose a set of propositions regarding users' novelty perception and the way topicality and novelty perceptions are integrated in relevance judgment, and we test the validity of these propositions with real users, using PIR systems designed based on the propositions. In particular, we propose that (1) novelty perception is a value-added criterion to improve personalized information retrieval, (2) relevance measures in past system-centered PIR studies are biased towards topicality, (3) a user's novelty judgment standard is directed toward a subtopic and changes slowly because the user's learning of document content in the retrieval process is incomplete, and (4) relevance judgment of a document starts with topicality judgment, followed by novelty judgment in a stepwise fashion. A set of PIR systems has been designed to implement these propositions. Our user test supports these propositions except for proposition 4.
This thesis is organized as follows. In Chapter 2, we review related user-centered and system-centered IR studies to develop the propositions. Chapter 3 presents a set of PIR systems based on these propositions to various degrees. After that, we elaborate the experiment design of the user test in Chapter 4. In Chapter 5, experimental results are reported and analyzed. Chapter 6 discusses the implications and limitations of this work and concludes the thesis.
Chapter 2
Related Works
2.1 Subjective Relevance, Topicality and Novelty
In a broad sense, the objective of personalization is to improve the effectiveness of information retrieval by adapting to individual users' needs (Croft et al., 2001). From a relevance judgment perspective, such an individual need can be the user's individual and specific criteria of relevance. In order to build an effective PIR system which incorporates relevance criteria, it is necessary to examine the nature of relevance first.

At the heart of user-centered relevance is the recognition that relevance is subjective, multidimensional and dynamic (Schamber et al., 1990). Subjectivity means that relevance is personal: what one person considers relevant might not be considered so by others. Multidimensionality means that there are multiple criteria in relevance judgment. For example, Bateman (1998) listed forty criteria that affect relevance judgment, covering aspects of content topicality, document availability, novelty, currency, information quality, presentation quality, and source characteristics. Schamber (1994) synthesized a list of more than eighty factors. Surely many such criteria could be redundant or insignificant (Barry & Schamber, 1998). However, some categories of criteria were repeatedly found to be present. Xu and Chen (2005) summarize five criteria from a representative list of thirteen empirical user studies: topicality, novelty, understandability, reliability, and scope. Topicality and novelty are regarded as two key dimensions of relevance. Given the identified importance of novelty in relevance judgment, our first proposition is that novelty is a value-added criterion that improves personalized information retrieval performance when incorporated into system design.
If relevance is multidimensional, then two questions must be answered. First, can a single user profile, regardless of the way it is quantified (e.g., in the vector space model, probabilistic model, or language model), effectively capture all the different aspects of relevance judgment? Second, if all the different aspects of relevance judgment can be captured by a single user profile, should they be processed (e.g., modeled, updated) in the same way when a document is to be judged? These questions motivate us to explore the possibility of using a multidimensional user profile, as we shall discuss shortly. Finally, user-centered relevance also emphasizes the dynamics of relevance judgment (Harter, 1992). The basic tenet is that the user's knowledge in a domain area and the user's information need are constantly modified by the information items she examines (Harter, 1992). Therefore, the judgment of topicality, novelty, and other criteria evolves in the information seeking and retrieval process as the user 'consumes' different documents.
As two major dimensions of relevance judgment, topicality and novelty are also subjective and dynamic. However, they differ in their degree of subjectivity and dynamics. Topicality, which measures the "aboutness" of a document relative to the topic of interest, is considered more objective (Bookstein, 1979); that is why subject indexing is possible (Bookstein, 1979). Borlund (2003) terms the topical match between an information need and a document "intellectual topicality", which can be agreed upon by multiple judges. For example, if the information need is papers on recent developments in probabilistic IR models, old papers on the probabilistic IR model by Robertson and Sparck Jones (1976) might still be considered on topic by many searchers, even though a searcher may have already read the paper and its usefulness is marginal. Therefore, topicality is relatively objective, as different searchers can reach a certain degree of consensus. Topicality is also relatively stable: for a searcher, it does not change within one search session (Vakkari, 2003). In contrast, novelty is more subjective and volatile. Novelty is affected by the user's background knowledge (Barry, 1994; Bateman, 1998); what one person regards as novel might not be novel to another. A novel document can cause a noticeable change in the user's cognition, which in turn affects her information need and relevance judgment criteria for later documents (Harter, 1992). Therefore, novelty has to be individual and dynamic; it is impossible to impose a novelty standard on a document by majority rule. Although novelty is dynamic, the speed of change may vary. Consider a hypothetical scenario in which students are asked to find information on a topic for a course: in one case they are required to download documents for later group discussion; in the other they are required to find and study a set of documents to prepare for an examination. The novelty judgment may change in both cases; however, the speed is faster in the latter case. Therefore, learning and learning rate are intrinsic components of novelty judgment, because seeking new and even contradictory information is a way to create a bank of potentially useful knowledge and further improve one's problem-solving skills (Hirschman, 1980, p. 284).
2.2 Relevance in Personalized Information Retrieval Studies
The criticism from user-centered IR researchers is that relevance measures in system studies (e.g., those using the vector space model) fail to capture the multidimensionality, subjectivity, and dynamics of users' relevance judgment (Borlund, 2003; Cosijn & Ingwersen, 2000; Schamber et al., 1990; Saracevic, 1970). However, such criticism is partially overstated, in the sense that PIR studies do offer methods to capture subjectivity and dynamics: the most popular idea in PIR systems is the use of relevance feedback, in either explicit or implicit fashion. On the other hand, the criticism is right in pointing out that the relevance measure in PIR studies cannot accommodate multidimensionality and is biased toward topicality.
Consider the following example. A user is interested in the health impact of using a mobile phone. She already knows that the potential health threat is due to phone radiation, but she is not sure whether such radiation poses a severe threat to her son when she buys a mobile phone for him. She goes to an online search engine and submits a query Q which consists of three terms: 'mobile phone' (a bi-gram), 'radiation', and 'child'. Documents D1 and D2 are returned (Table 1). In Q, 'mobile phone' is a topicality term; 'child' is a novelty term; 'radiation' is on-topic but non-novel. Each term has a corresponding term frequency in the query and documents, as illustrated in Table 1. If Q is used as the user profile, the cosine similarity between Q and D1 can be regarded as the relevance judgment. In this case, both topicality and novelty are incorporated in the final score. However, such a vector profile is impersonal, as it does not reflect the user's background knowledge and interest: D1 and D2 will produce the same relevance score although one is clearly better than the other. Moreover, the term weights in the original query do not differentiate the importance of topicality and novelty terms.
Table 1: Example 1 – simple retrieval example based on vector space model

Documents | Mobile Phone | Child | Radiation
While this example might look contrived, it is nevertheless representative, because a search engine typically returns more mediocre documents than highly relevant ones. In that case, topicality terms are present in most documents, while the novelty terms are buried in an array of on-topic but uninteresting documents. This situation can be further exacerbated by the binary classification of relevance. If D2 and D3 are both treated as fully relevant and given a weight of 1 (i.e., there is no partial relevance), the relevance feedback process will in fact give 'radiation' a higher weight (5) than 'child' (4.5). The main cause here is the updating strategy imposed by the relevance feedback technique, which does not differentiate between topicality and novelty terms. In the long run, such an updating strategy makes the topicality terms stand out, but blunts the novelty terms.
Table 2: Example 2 – vector space model with relevance feedback

Documents (weight) | Mobile Phone | Child | Radiation
Relevance feedback profile | 5 | 3.5 | 3.5
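The arithmetic behind this bias can be sketched in code. The document vectors below are hypothetical (chosen only to reproduce the weights quoted above), and the update is the simplest Rocchio-style positive feedback: every document judged relevant is added to the profile with weight 1, with no distinction between topicality and novelty terms.

```python
# Hypothetical term-frequency vectors; terms: 'mobile phone', 'child', 'radiation'.
query = {'mobile phone': 1.0, 'child': 1.0, 'radiation': 1.0}
d2 = {'mobile phone': 2.0, 'child': 2.0, 'radiation': 1.0}  # novel: focuses on children
d3 = {'mobile phone': 2.0, 'child': 1.5, 'radiation': 3.0}  # on-topic but redundant

# Binary relevance feedback: both documents counted as fully relevant (weight 1).
profile = dict(query)
for doc in (d2, d3):
    for term, tf in doc.items():
        profile[term] = profile.get(term, 0.0) + tf

print(profile)
# 'radiation' ends at 5.0 while the novelty term 'child' ends at 4.5: the
# update strategy lets the ubiquitous on-topic term outgrow the novel one.
```

Any choice of frequencies in which the on-topic term appears in every relevant document produces the same pattern, which is the point of the example.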
This leads us to our second proposition: relevance measures in system-centered PIR studies are biased toward topicality.
2.3 Novelty in System-Centered Information Retrieval Studies
Independently of user-centered IR research, system-centered researchers have also introduced novelty to IR. Zhang et al. (2002) recognize the limitation of the traditional relevance measure: "A common complaint about information filtering systems is that they do not distinguish between documents that contain new relevant information and documents that contain information that is relevant but already known" (p. 81). They note that it is unrealistic to expect a single component (i.e., user profile and judgment model) to satisfy both topicality and novelty. Novelty is regarded as a value-added measure on top of topicality. Novelty is also regarded as order-dependent: when documents are evaluated in different orders, their novelty measures should change; therefore novelty depends on what has been seen before. They classify documents into (i) not relevant, (ii) relevant but containing no new information, and (iii) relevant and containing new information. In order to measure novelty, they propose the concept of redundancy. The retrieval system follows a two-step process: documents are first evaluated for topicality, and only on-topic documents are evaluated for redundancy. They define redundancy as the amount of relevant information in the current document that is covered by relevant documents delivered previously. Novelty is defined as the opposite of redundancy. Five redundancy measures are proposed, including the word set difference between the current document and prior retrieved documents, the cosine similarity between the two, distributional similarity based on different language models, and mixed language models. Surprisingly, their testing results show that the cosine metric is very effective in detecting novel documents. The mixed model, which is composed of a general language model, a topic language model, and a 'new information' model, performs second best.
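As a concrete sketch, the simplest of these measures, word set difference, might be implemented as below. Novelty is taken here as the fraction of a document's distinct words not covered by previously delivered relevant documents; the function name and the normalization are our own illustration, not Zhang et al.'s exact formulation.

```python
def set_novelty(doc_words, delivered):
    """Word-set-difference novelty: the share of the current document's
    distinct words that no previously delivered relevant document covers.
    Redundancy is the complement: 1 - novelty."""
    vocab = set(doc_words)
    seen = set().union(*delivered) if delivered else set()
    return len(vocab - seen) / len(vocab)

# The first delivered document is maximally novel; later ones are discounted
# by their overlap with what came before (novelty is order-dependent).
print(set_novelty(['phone', 'radiation', 'risk'], []))                         # 1.0
print(set_novelty(['phone', 'radiation', 'child'], [{'phone', 'radiation'}]))  # ~0.33
```

In the two-step process, this score would be computed only for documents that have already passed the topicality check.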
The same two-stage process and novelty-as-redundancy assumption are adopted by Yang et al. (2002), Brants et al. (2003), Allan et al. (2003), and Kumaran and Allan (2004), with various variations of the novelty measure. However, the rationale is the same: topicality and novelty are staged decisions in relevance judgment. One exception to the staged model is Zhai et al. (2003). In their study, language models are used to calculate the probabilities that a document is on-topic and novel. The two probabilities are then multiplied to give the document a final score. Such a multiplicative model essentially assumes that topicality and novelty are compensatory: a high topicality score can compensate for a low novelty score, or vice versa.
Table 3: Summary of system-centered studies on novelty

Study | Novelty definition | Decision strategy | Novelty profile | Novelty measures
Zhang et al., 2002 | Redundancy | First eliminate by topicality, then sort by novelty | Historical documents | Word set difference, cosine similarity, language modeling
Yang et al., 2002 | Redundancy | First eliminate by topicality, then sort by novelty | Historical documents | Word set difference enriched with named entities
Brants et al., 2003 | Redundancy | First eliminate by topicality, then sort by novelty | Historical documents | Cosine similarity enriched with document segmentation
Zhai et al., 2003 | Redundancy | One-step multiplicative model | Historical documents | Language modeling
Kumaran & Allan, 2004 | Redundancy | First eliminate by topicality, then sort by novelty | Historical documents | Multiple cosine similarities with named entities or non-named entities
Allan et al., 2003 | Redundancy | First eliminate by topicality, then sort by novelty | Historical sentences | Word set difference, word count, cosine similarity, language modeling
Table 3 summarizes a representative list of system-centered IR studies on novelty. It can be observed that system studies have explored novelty in some specific domains of information retrieval. On one hand, they have recognized the limitation of topicality, have made attempts to incorporate novelty, and have proposed a set of measures for it. Moreover, the stepwise model is the most popular, reflecting an intuitive recognition of the stepwise decision-making process in human judgment.

On the other hand, these system-centered IR studies suffer from some critical theoretical limitations. First, their definition of novelty differs from the user-centered perspective: novelty is not personal and subjective but is based on a historical document set. If two people start with the same document set, they must regard the next document as having the same novelty. Second, novelty is reduced to redundancy: if a document or sentence is similar to ones seen before, it is non-novel. However, such simplification presupposes a strong assumption that learning of new information is instant, complete, and independent of the individual. It also assumes that system users are diversity-seeking rather than subtopic-focused in novelty judgment. In reality, people may want to drill down into more details of a subtopic; novelty judgment is directed, and a certain degree of redundancy is welcome. Therefore, our third proposition is that users' novelty judgment standard is directed, because users' learning of document content is incomplete.
2.4 Integration Rule in Relevance Judgment
Past PIR studies have also paid little attention to justifying the use of a stepwise or composite measure for document relevance judgment. When users judge a document, how do they integrate topicality and novelty evaluations? We start by analyzing the nature of relevance judgment. First, are relevance criteria compensatory, i.e., can novelty compensate for topicality or vice versa? There are compelling arguments in information science that topicality is the first and most important criterion for relevance judgment. The presence of topicality as a condition for other criteria to operate is widely accepted among researchers (e.g., Cosijn & Ingwersen, 2000; Schamber, 1994; Park, 1993, 1997). Froehlich (1994, p. 129) highlights that "all relevance judgments start with topically relevant materials (which is an appropriate first step of system), but then diverse criteria come into play…" [italics in original]. Mizzaro (1997), in summarizing the history of relevance research, notes that relevance criteria have been identified beyond topicality. If topicality is a necessary condition for other criteria to operate, then we should predict that if a document is off-topic, no other factor should matter to the relevance judgment. Therefore, relevance judgment is not compensatory when topicality is below a certain point. In that circumstance, the user follows an elimination-by-topicality heuristic in the first step (Greisdorf, 2003; Wang & Soergel, 1998).
What if a document is judged on-topic? A user might put aside topicality (since it is already satisfied) and focus on the next most important attribute, say, novelty. On-topic documents are then sorted by novelty, and the best few are accepted. This line of reasoning was clearly articulated by Boyce (1982) in a proposal for a two-stage retrieval process: in the first stage, documents are filtered by topicality; in the second stage, documents are sorted by 'informativeness'. The resort to criteria beyond topicality implies that topicality can be treated as a binary variable. Greisdorf (2003) depicts a stepwise judgment process starting with topicality, followed by understandability and pragmatic usefulness; judgment at each step is regarded as binary. Greisdorf (2003) thus alludes to an elimination-by-aspect process throughout the whole decision. If we relax the binary nature of the later judgments by saying that understandability and usefulness are a matter of degree, then we arrive at the same conclusion as Boyce (1982): the second stage can be sorted by a criterion such as novelty. Therefore, we have a 'first eliminate-by-topicality, then sort-by-novelty' decision strategy.
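The 'first eliminate-by-topicality, then sort-by-novelty' strategy can be sketched with abstract scores; the threshold and score values below are illustrative only.

```python
def rank_non_compensatory(docs, topic_threshold=0.5):
    """Non-compensatory integration: topicality acts as a binary gate
    (eliminate-by-topicality), then survivors are sorted by novelty alone.
    docs maps doc id -> (topicality, novelty), both in [0, 1]."""
    on_topic = {d: nov for d, (top, nov) in docs.items() if top >= topic_threshold}
    return sorted(on_topic, key=on_topic.get, reverse=True)

docs = {
    'd1': (0.9, 0.2),  # highly on-topic but redundant
    'd2': (0.6, 0.8),  # on-topic and novel
    'd3': (0.3, 0.9),  # very novel but off-topic: eliminated; novelty cannot compensate
}
print(rank_non_compensatory(docs))  # ['d2', 'd1']
```

Note that d3 never competes no matter how novel it is, which is exactly what distinguishes this rule from a compensatory (e.g., multiplicative) combination.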
In short, users' relevance judgment is better considered non-compensatory. This is also consistent with findings in decision-making theory. Payne (1976) and Bettman and Park (1980) found that when the number of alternatives is large, decision making becomes attribute-based (judging by individual attributes) early in the process. Moreover, a decision maker tends to use information that is easily available (Slovic, 1972). In IR, the document title is often the first piece of information, and it is more indicative of topicality than novelty; novelty must be inferred or obtained after skimming or reading the document. Therefore, our proposition 4 is that when topicality and novelty are considered, a non-compensatory integration rule is a better approximation of users' relevance judgment.
If we cross-tabulate the assumption about the compensatory relationship between topicality and novelty with the assumption about learning, we obtain four quadrants of possible PIR models (Table 4). System-centered IR studies have focused on the combination of a non-compensatory relationship and novelty based on complete learning (e.g., Zhang et al., 2002), with the exception of a compensatory relationship and novelty based on complete learning (Zhai et al., 2003). However, no exploration has been done under the incomplete learning hypothesis, nor has system performance been compared across the different hypotheses.
Table 4: Potential PIR models that incorporate both topicality and novelty

 | Complete learning | Incomplete learning
Non-compensatory | First EBT then SBN; Novelty = dissim(P_N, d_i). System-centered IR studies: Zhang et al., 2002; Yang et al., 2002; Brants et al., 2003; Allan et al., 2003; Kumaran & Allan, 2004 | First EBT then SBN; Novelty = sim(P_N, d_i). System-centered IR studies: no study
Compensatory | sim(P_T, d_i) × (+) dissim(P_N, d_i). System-centered IR studies: Zhai et al., 2003 | sim(P_T, d_i) × (+) sim(P_N, d_i). System-centered IR studies: no study

EBT: eliminate by topicality; SBN: sort by novelty
Chapter 3
Novelty-Augmented Systems
In order to test the above propositions, we propose a set of six PIR systems which implement the propositions to various degrees. Real user testing with these systems provides a validity test for the propositions. Before we introduce the six systems, the boundary of our test needs to be laid out. First, our objective is not to optimize the algorithms, but rather to test assumptions about novelty perception and relevance judgment. Therefore, we are interested in observing statistically significant or systematic performance differences rather than the magnitude of the differences, which could be further improved by fine-tuning the algorithms or exploring other alternatives. Second, we assume the user has only one search topic.
We use the vector space model as our foundation because it has been found to be very robust across different applications (Allan et al., 2003; Zhang et al., 2002; Baeza-Yates & Ribeiro-Neto, 1999). For each user, there are two profiles, one for topicality and one for novelty, denoted as P_T and P_N respectively. A term vector with TFIDF³ weighting is used to represent topicality. A novelty profile can be constructed in different ways, as will be explained shortly. To capture the subjectivity and dynamics of topicality and novelty perceptions, we use the manual relevance feedback technique for all models.

³ TFIDF (Salton & McGill, 1983) is a popular scheme for weighting terms. The TFIDF weight w_{i,j} for term k_i in document d_j can be given by w_{i,j} = (freq_{i,j} / max_l freq_{l,j}) × log(N / n_i), where freq_{i,j} is the frequency of term k_i in document d_j, max_l freq_{l,j} is the maximum term frequency among all terms occurring in document d_j, N is the total number of documents, and n_i is the number of documents containing k_i.
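The TFIDF weighting of footnote 3 can be sketched directly in code (a minimal, illustrative implementation; the function and variable names are our own):

```python
import math

def tfidf(doc_terms, doc_freq, n_docs):
    """w[i,j] = (freq[i,j] / max_l freq[l,j]) * log(N / n_i), per footnote 3.
    doc_terms: term -> raw frequency in this document
    doc_freq:  term -> number of collection documents containing the term
    n_docs:    N, the total number of documents in the collection"""
    max_freq = max(doc_terms.values())
    return {term: (freq / max_freq) * math.log(n_docs / doc_freq[term])
            for term, freq in doc_terms.items()}

w = tfidf({'radiation': 3, 'child': 1}, {'radiation': 100, 'child': 10}, 1000)
# 'child' is rarer in the collection, so its IDF factor log(1000/10) is
# larger than 'radiation''s log(1000/100), offsetting its lower frequency.
```

The term-frequency part is normalized by the document's maximum frequency, so weights stay comparable across documents of different lengths.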
3.1 Topicality Profile and Judgment
3.1.1 Topicality Profile
The topicality profile is constructed as a term vector with TFIDF weighting, because such a term vector is largely biased toward topicality, as discussed above. A user's topicality profile starts with the initial query.
3.1.2 Topicality Profile Updating Strategy
The topicality profile updating strategy is based on Rocchio's relevance feedback (Rocchio, 1971). Assume a user evaluates and assigns topicality scores to a set of documents R; these topicality scores can then be used to update the initial profile. The only difference from traditional relevance feedback is that the feedback is based on topicality evaluation rather than relevance evaluation. We use only positive feedback. Therefore,

P_T^t = P_T^{t-1} + Σ_{d_i ∈ R} T_i · d_i

Note that we do not assign different weight to past and current documents as in traditional relevance feedback. The differential contribution of a document to the profile is determined by the product of its weighted terms and the user's topicality perception T_i. Such a formulation is reasonable because we assume a fixed topic of interest: the purpose of profile maintenance is to better describe the topic rather than to accommodate a drifting interest.
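The positive-only Rocchio update can be sketched as follows for sparse term vectors (a sketch under our own naming; the thesis does not prescribe an implementation):

```python
def update_topicality_profile(profile, feedback):
    """Positive-only Rocchio update: P_T <- P_T + sum_i T_i * d_i.

    profile:  dict mapping term -> weight (the current topicality profile P_T)
    feedback: list of (doc_vector, score) pairs, where doc_vector is a dict
              of TFIDF term weights and score is the user's topicality
              rating T_i for that document
    """
    updated = dict(profile)
    for doc_vector, score in feedback:
        for term, weight in doc_vector.items():
            updated[term] = updated.get(term, 0.0) + score * weight
    return updated
```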
3.1.3 Topicality Judgment
The topicality score of a document is calculated with a similarity function which measures the similarity between the topicality profile and the document vector. If the relevance of a document is based on topicality match only, we have a topicality-based retrieval system:

Rel(d_i) = sim(d_i, P_T)

An IR model based on the topicality profile serves as our baseline model; we name it the topicality-feedback model (TF). The cosine similarity function is used in our system. Comparison with the other systems, which are augmented with novelty, provides a way to verify proposition 1.
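The TF baseline then reduces to a cosine similarity between the document vector and the profile; a minimal sketch (names are ours):

```python
import math

def cosine_sim(u, v):
    """Cosine similarity between two sparse term vectors given as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def tf_relevance(doc_vector, topicality_profile):
    """Topicality-feedback (TF) baseline: Rel(d_i) = sim(d_i, P_T)."""
    return cosine_sim(doc_vector, topicality_profile)
```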
3.2 Novelty Profile and Relevance Judgment
3.2.1 Novelty Profile Type I
There are two types of novelty profile, based on different assumptions about user learning. Type one assumes that users learn the content of a new document instantly and completely. Therefore, novelty is the opposite of redundancy, and the novelty profile is basically a redundancy profile. This line of thinking started with Carbonell et al. (1998) and was followed by most system-centered IR researchers (Allan et al., 2003; Brants et al., 2003; Zhai et al., 2003; Yang et al., 2002; Zhang et al., 2002), although different profile construction methods (e.g., simple document collection, language model, mixed language model) have been employed.
Carbonell et al. (1998) proposed the maximal marginal relevance (MMR) model. In this model, the "marginal relevance" of a document is measured with a weighted sum of its similarity to the query (which is what we defined as topicality) and its redundancy (similarity) to previously selected documents. A document with high marginal relevance has a certain degree of relevance to the query and minimal similarity to previously retrieved documents. Mathematically, let R be the set of documents seen before and d_i be the new document at time t; the redundancy score of d_i, Rd(d_i|R), is:

Rd(d_i|R) = max_{d_j ∈ R} sim(d_i, d_j),

and the relevance of the document can be defined as:

MMR(d_i|R) = α·sim(d_i, P_T) − (1 − α)·Rd(d_i|R),

where sim(d_i, P_T) measures the topicality of the document with regard to a topicality profile P_T and Rd(d_i|R) is an inverse measure of novelty. In this formulation, the novelty profile of a user is basically the collection of past retrieved documents; updating the profile means adding newly retrieved documents to the collection. In the original MMR, the initial query was used in place of P_T; our adaptation incorporates dynamics into the topicality judgment. When cosine similarity is used, the first term, sim(d_i, P_T), is the same as the score in our topicality-feedback model. The topicality importance factor, α, is set to 0.5 or 0.6 for two different groups of subjects. In Xu & Chen (2005), it was found that users assign topicality and novelty relatively equal weights; therefore, we set α to 0.5 for one system. The system with α = 0.6 is based on the result of a pilot study which showed that 0.6 might give the best performance; the sensitivity to this parameter can be further tested in the future. The above model assumes 1) that learning is complete, and 2) that relevance judgment is a compensatory decision. We name it the additive MMR model (MMR-Add), and according to the different values of α, we name the two versions MMR-Add5 and MMR-Add6.
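Given precomputed similarity values, the MMR-Add score can be sketched as follows (the function name and calling convention are ours):

```python
def mmr_add(topicality, redundancies, alpha=0.5):
    """MMR-Add: alpha * sim(d_i, P_T) - (1 - alpha) * Rd(d_i | R),
    with Rd(d_i | R) = max_{d_j in R} sim(d_i, d_j).

    topicality:   sim(d_i, P_T), similarity of the candidate to P_T
    redundancies: list of sim(d_i, d_j) for previously seen d_j in R
    alpha:        topicality importance factor (0.5 for MMR-Add5,
                  0.6 for MMR-Add6)
    """
    redundancy = max(redundancies) if redundancies else 0.0
    return alpha * topicality - (1 - alpha) * redundancy
```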
We can further adapt the MMR model for stepwise relevance judgment, whereby documents are sorted by topicality first and then by redundancy:
Rel(d_i) = −Rd(d_i|R),  if sim(d_i, P_T) ≥ s*
Rel(d_i) = 0,           if sim(d_i, P_T) < s*

where s* is a topicality cutoff value. s* can be set to a value between 0 and 1 or defined as the top k documents by topicality; in this study, we set it to the top twenty documents. Fine tuning of this parameter can be explored in future research. This model assumes 1) complete learning and 2) stepwise relevance judgment. We name it the stepwise MMR model (MMR-Step).
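With s* interpreted as a rank cutoff (top k by topicality, k = 20 in this study), MMR-Step ranking can be sketched as follows (a sketch; names are ours):

```python
def mmr_step_rank(candidates, k=20):
    """MMR-Step: take the top-k candidates by topicality, order them by
    increasing redundancy Rd(d_i | R), and leave the remaining
    candidates (score 0) at the bottom in topicality order.

    candidates: list of (doc_id, topicality, redundancy) triples
    Returns the list of doc_ids in final ranking order.
    """
    by_topicality = sorted(candidates, key=lambda c: c[1], reverse=True)
    shortlist, rest = by_topicality[:k], by_topicality[k:]
    shortlist.sort(key=lambda c: c[2])  # least redundant (most novel) first
    return [c[0] for c in shortlist] + [c[0] for c in rest]
```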
3.2.2 Novelty Profile Type II
Novelty type two assumes incomplete learning. A user wants to read more on a novel aspect of the topic until she is satisfied. Therefore, what is regarded as novel in the last round of evaluation should be a judgment standard for the next round. In other words, the novelty judgment should be based on similarity to prior novel documents rather than dissimilarity. Because novelty is measured by similarity, we can use a term vector to represent the novelty profile P_N as we did for topicality. Another reason for using a term vector is that a novelty profile based on incomplete learning is like a subtopic profile, as assumed in our proposition 3: a user's novelty judgment standard is directed toward a subtopic.
TFIDF weighting might not be adequate for the novelty profile because novelty must be differentiated among on-topic documents, while TFIDF weighting is better at differentiating between topics. If we build the novelty profile with the same method as the topicality profile updating strategy (the only difference being the use of the user's novelty evaluation instead of the topicality evaluation), the novel terms (representing an unknown subtopic) in a user-evaluated novel document cannot be distinguished, nor can the non-novel terms (representing a subtopic already known) in a non-novel document. In fact, we expect a novelty profile based on incomplete learning to discriminate between the novel terms that contribute new knowledge to users and the non-novel terms already within the user's knowledge scope.
Assume we have sets of novel and non-novel documents; one way to identify the novelty feature terms is to use a classification algorithm, whereby the terms that best classify novel and non-novel documents are the feature terms. To that end, we use the probabilistic measure F4 proposed by Robertson and Sparck Jones (1976). The F4 measure of a term is the ratio of relevance odds and non-relevance odds, i.e., the ratio of the odds that a relevant document contains term t_j and the odds that an irrelevant document contains it. The F4 measure can be regarded as a classification measure because it assigns weight to a term based on its relative probability in relevant and irrelevant documents. Applying the F4 measure to a set of novel and non-novel documents (rather than relevant and non-relevant), the weight of t_j is:
w_{t_j} = log [ P(t_j|N) / (1 − P(t_j|N)) ] − log [ P(t_j|N̄) / (1 − P(t_j|N̄)) ]
        = log [ (r_j / (R − r_j)) / ((n_j − r_j) / (S − n_j − R + r_j)) ]
where P(t_j|N) is the probability that a novel document contains t_j and P(t_j|N̄) is the probability that a non-novel document contains t_j; r_j is the count of novel documents containing term t_j, R is the count of all novel documents, n_j is the count of all documents containing t_j, and S is the total number of documents. The above formulation is based on a binary classification of document novelty. To adapt it to multiple discrete evaluation levels (because how much new knowledge a document provides to the user varies), a partially novel document can contribute to both the novel and the non-novel set proportionally. For example, if the maximum novelty score is 7, then a document with novelty score 5 contributes 5/7 of a document to the novel set and 2/7 of a document to the non-novel set. After such conversion, r_j is the sum of the novelty "fractions" of only those documents containing term t_j, R is the sum of novelty scores for all documents regardless of the terms contained, n_j is the number of documents containing term t_j, and S is the total number of documents in the set.
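The F4 weight with the fractional novelty counts can be sketched as follows. The 0.5 smoothing constant is our addition (the usual correction in the Robertson–Sparck Jones literature) to avoid division by zero; the thesis does not state how zero counts are handled:

```python
import math

def f4_weight(r_j, R, n_j, S, eps=0.5):
    """F4 weight of term t_j for novelty classification:
    w = log( (r_j / (R - r_j)) / ((n_j - r_j) / (S - n_j - R + r_j)) ).

    r_j: sum of novelty fractions of documents containing t_j
    R:   sum of novelty fractions over all documents
    n_j: number of documents containing t_j
    S:   total number of documents in the set
    eps: smoothing constant (our assumption) to avoid zero divisions
    """
    novel_odds = (r_j + eps) / (R - r_j + eps)
    nonnovel_odds = (n_j - r_j + eps) / (S - n_j - R + r_j + eps)
    return math.log(novel_odds / nonnovel_odds)
```

A term concentrated in novel documents gets a positive weight; one concentrated in non-novel documents gets a negative weight.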
Our pilot study showed that F4 could be more effective than TFIDF. Nevertheless, the term vector representation of the novelty profile suffers from the limitation that it does not consider the interrelationships among terms; in some cases, non-novel terms might combine to form "novel" ideas. Algorithmic improvements (e.g., bi-grams, sentence-level analysis) can be explored in future studies.
In an interactive retrieval system, novel and non-novel documents come in batches. As in traditional relevance feedback, only a small set of documents is evaluated in each round. A novelty profile can be generated for each round, capturing the current judgment standard, and can be used to update a "global" novelty profile. We term the local novelty profile for each round P_{R_t}^N, i.e., the profile based on the small set of documents R_t evaluated at round t. There is a risk in calculating P_{R_t}^N when all the documents are evaluated as novel, in which case no term has discriminating power. To avoid such a situation, after collecting the user's feedback on a set of documents (e.g., the top 10), we automatically assume the bottom few documents (e.g., the bottom 10) to be non-novel. This reduces the risk that novelty terms cannot be detected.
The F4 measure identifies novelty feature terms regardless of the topicality implication of the terms. In the extreme case, if a stop word appears only in novel documents, it might be regarded as a good feature. However, novelty is meaningful only when topicality exists. Therefore, for every novelty term so identified, we multiply its F4 weight by the corresponding term weight in the topicality profile. We then have a topicality-conditioned novelty profile with term weights

w_j^N = F4(t_j) × w_j^T,

where w_j^T is the weight of t_j in the topicality profile P_T. In this equation, off-topic terms are discounted because of their low weight in the topicality profile. This makes the novelty profile a local discriminator rather than a global one.
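Conditioning the F4 weights on the topicality profile is then a per-term multiplication (a sketch; names are ours):

```python
def topicality_conditioned_profile(f4_weights, topicality_profile):
    """Build the novelty profile P_N by multiplying each term's F4
    weight by its weight in the topicality profile P_T, so that
    off-topic terms (weight 0 in P_T) are discounted to 0."""
    return {
        term: w * topicality_profile.get(term, 0.0)
        for term, w in f4_weights.items()
    }
```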
Updating Strategy
The first novelty profile is built from the first round of user novelty feedback. After that, it is updated in the succeeding rounds with the following formula:

P_t^N = (1 − β) P_{t−1}^N + β P_{R_t}^N

where P_{R_t}^N is the feature term vector based on the round-t evaluations and P_t^N is the global profile at round t. β is an updating parameter; we set β to 0.8, which means that the novelty profile is largely based on the last round of feedback.
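The interpolation of the global novelty profile can be sketched over sparse vectors as follows (a sketch; names are ours):

```python
def update_novelty_profile(global_profile, round_profile, beta=0.8):
    """P_t^N = (1 - beta) * P_{t-1}^N + beta * P_{R_t}^N, computed
    term by term over the union of both sparse vectors. With
    beta = 0.8 the result is dominated by the latest round."""
    terms = set(global_profile) | set(round_profile)
    return {
        t: (1 - beta) * global_profile.get(t, 0.0) + beta * round_profile.get(t, 0.0)
        for t in terms
    }
```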
Rel(d_i) = γ·sim(d_i, P_T) + (1 − γ)·sim(d_i, P_N)

where γ is the relative weight of topicality and novelty. We set it to 0.5 for the same reason as explained in the section "Novelty Profile Type I". Fine tuning of the parameter can be tested in future empirical studies. We call this model the additive model with incomplete learning (IL-Add). Similarly, we can define relevance as:
Rel(d_i) = sim(d_i, P_N),  if sim(d_i, P_T) ≥ s*
Rel(d_i) = 0,              if sim(d_i, P_T) < s*
where s* is a topicality cutoff value, as before. We call this model the stepwise model with incomplete learning (IL-Step).
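The two incomplete-learning relevance rules can be sketched from precomputed similarities (names are ours):

```python
def il_add(topicality, novelty, gamma=0.5):
    """IL-Add: Rel(d_i) = gamma * sim(d_i, P_T) + (1 - gamma) * sim(d_i, P_N)."""
    return gamma * topicality + (1 - gamma) * novelty

def il_step(topicality, novelty, s_star):
    """IL-Step: Rel(d_i) = sim(d_i, P_N) if sim(d_i, P_T) >= s*, else 0."""
    return novelty if topicality >= s_star else 0.0
```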
MMR-Add5, MMR-Add6, MMR-Step, IL-Add, and IL-Step are designed based on different combinations of the learning assumptions and integration rules. Comparing these systems offers us a way to test propositions 3 and 4. We will use simulation to test proposition 2, as will be discussed later.