We first rank all the questions in Q in descending order according to their relevance to q, which is estimated via our adaptive probabilistic hypergraph model. We then select the top n questions to form the semantic space. In this chapter, the relevance estimation is viewed as a transductive inference problem [158, 150], formulated as a regularization framework,
arg min_f Φ(f) = arg min_f { Ω(f) + λ R(f) },    (3.7)

where Ω(f) and R(f) denote the regularizer on the hypergraph and the empirical loss, respectively, and λ is a regularization parameter that balances the empirical loss against the regularizer.
Inspired by the normalized cost function of a simple graph [101, 157], Ω(f) is defined as,

Ω(f) = (1/2) Σ_{e∈E} Σ_{u,v∈V} (w(e) h(u,e) h(v,e) / δ(e)) (f(u)/√d(u) − f(v)/√d(v))².    (3.8)

Let Θ = D_v^{−1/2} H W D_e^{−1} H^T D_v^{−1/2}. Using the fact that Σ_{v∈V} h(v, e) = δ(e), we can further derive that

Ω(f) = f^T (I − Θ) f,    (3.9)

where I is an identity matrix. Let ∆ = I − Θ, which is a positive semi-definite matrix, the so-called hypergraph Laplacian [158]; then Ω(f) can be rewritten as,

Ω(f) = f^T ∆ f.    (3.10)
For the loss term, after introducing a new vector y containing all the initially estimated relevance probabilities, it is stated as a least-square function,

R(f) = ||f − y||² = Σ_{v∈V} (f(v) − y(v))².    (3.11)

By minimizing Φ(f), the first term guarantees that the relevance probability function is continuous and smooth in the semantic space; that is, the relevance probabilities of semantically similar questions should be close. Meanwhile, the empirical loss function forces the relevance probabilities to stay close to the initial roughly estimated relevance scores. These two implicit constraints are widely adopted in reranking-oriented approaches [128, 101].
However, in the constructed hypergraph, the effects of hyperedges cannot be treated on an equal footing, since they are generated from different angles, spanning from semantic similarities between QA pairs, to tag-sharing networks, and users' social behaviours. Even though all the hyperedges are initialized with reasonable weights based on local information, further globally adaptive refinement and modulation are still necessary. Inspired by [150, 40], we extend the conventional hypergraph to an adaptive one by integrating a two-norm regularizer on W. Therefore, Eqn.(3.7) is restated as,

arg min_{f,W} Φ(f, W) = arg min_{f,W} { f^T ∆ f + λ ||f − y||² + µ Σ_{e∈E} w(e)² },    (3.12)

where µ is a positive parameter. For model simplicity, all the entries in W are confined to be non-negative and to add up to 1. We alternately optimize f and W.
First, W is fixed and partial derivatives with respect to f are taken on the objective function. We have

f = (1 − η)(I − ηΘ)^{−1} y,    (3.13)

where η = 1/(1 + λ). Next we fix f and optimize W with the help of a Lagrangian, which is frequently utilized in such constrained optimization problems [40]. The objective function is transformed into,

L(W, ξ) = f^T (I − Θ) f + λ ||f − y||² + µ Σ_{e∈E} w(e)² + ξ (Σ_{e∈E} w(e) − 1),    (3.14)

where ξ is the Lagrange multiplier enforcing the sum-to-one constraint. Setting the partial derivative of L with respect to each w(e) to zero expresses w(e) as a function of ξ (Eqn.(3.15)). Replacing W in Eqn.(3.14) with Eqn.(3.15) and taking derivatives with respect to ξ, we obtain the closed-form value of ξ and hence the updated hyperedge weights.
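To make the alternating procedure concrete, the following is a minimal NumPy sketch, not the thesis implementation. It assumes a binary incidence matrix H, treats the per-hyperedge smoothness costs as independent of W within the W-step (a common simplification in adaptive hypergraph learning), and handles the non-negativity constraint by clipping and renormalizing rather than via the exact Lagrangian conditions; all helper names are ours.

```python
import numpy as np

def theta(H, w, eps=1e-12):
    """Theta = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} (cf. Eqn. 3.9)."""
    dv = H @ w                        # vertex degrees d(v)
    de = H.sum(axis=0)                # hyperedge degrees delta(e)
    A = H / np.sqrt(dv + eps)[:, None]
    return A @ np.diag(w / (de + eps)) @ A.T

def edge_costs(H, w, f, eps=1e-12):
    """Per-hyperedge smoothness cost c_e, so that Omega(f) ~ sum_e w(e) c_e."""
    dv, de = H @ w, H.sum(axis=0)
    g = f / np.sqrt(dv + eps)
    s1, s2 = H.T @ (g ** 2), H.T @ g
    return s1 - s2 ** 2 / (de + eps)

def adaptive_hypergraph_learning(H, y, lam=200.0, mu=1e-3, iters=10):
    n_v, n_e = H.shape
    w = np.full(n_e, 1.0 / n_e)       # start from uniform hyperedge weights
    eta = 1.0 / (1.0 + lam)
    for _ in range(iters):
        # f-step: closed form of Eqn. (3.13)
        f = (1 - eta) * np.linalg.solve(np.eye(n_v) - eta * theta(H, w), y)
        # W-step: Lagrangian stationarity with the sum-to-one constraint,
        # followed by projection to keep the weights non-negative
        c = edge_costs(H, w, f)
        w = 1.0 / n_e + (c.mean() - c) / (2.0 * mu)
        w = np.clip(w, 0.0, None)
        w /= w.sum()
    return f, w
```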
[Figure 3.4 about here; x-axis: Number of Unique Tags]
Figure 3.4: The tag frequency distribution with respect to the number of distinct tags over our large-scale dataset. It clearly follows a power-law distribution.
In the whole iterative process, we alternately update f and W. Each step decreases the objective function Φ(f), whose lower bound is zero. Therefore, convergence of our scheme is guaranteed [150, 40]. Another noteworthy issue is that the initial relevance probabilities of each question in Q to the given question q are estimated based on Eqn.(3.5).
It is intuitive that the conventional simple graph is a special case of the hypergraph, where all hyperedges have degree two and represent only pairwise relationships. To further investigate the learning approaches based on these two kinds of graphs, we develop a regularization framework Φ_s(f) for the simple graph,

Φ_s(f) = (1/2) Σ_{i,j} W_{ij} (f_i/√(D_{ii}) − f_j/√(D_{jj}))² + λ ||f − y||².

The first term is the normalized cost function controlling the smoothness, where D is a diagonal matrix with its (i, i)-element equal to the sum of the i-th row of the affinity matrix W. Let Θ_s = D^{−1/2} W D^{−1/2}; the simple graph Laplacian can then be denoted as ∆_s = I − Θ_s. It can be shown that the first term is equivalent to f^T ∆_s f, which is similar to the regularizer on the hypergraph in Eqn.(3.10). Analogous to the empirical loss function of the hypergraph, the second term is utilized to constrain the fitting, which means a good classifying function should not change too much from the initial label assignment [157]. Differentiating Φ_s(f) with respect to f and setting the result to zero yields a closed-form solution of the same shape as Eqn.(3.13), namely f = (1 − η)(I − ηΘ_s)^{−1} y with η = 1/(1 + λ).

Figure 3.5: An illustrative instance of semantically similar questions sharing the same tags.
3.5 Relevant Tag Selection
Based on the first component, a tag space shared by the inferred question space can be generated effortlessly. However, not all the roughly selected tag candidates are able to summarize the question content well. A heuristic tag relevance estimation approach is proposed in this section to further filter the tag candidates by integrating multi-faceted cues. Following that, the complexity of our scheme is analyzed.
According to our statistics, the tag frequency distribution in our dataset with respect to the number of distinct tags follows a power law, as shown in Figure 3.4. We further observe that the tags distributed in the head part of the power law tend to be phrases with high-level semantics, such as "technology", "life", "entertainment", and so on. They are too generic to be informative as tags. On the other hand, the tail of the power law contains the tags with very low collection frequencies, which are usually extremely specific. They are either unpopular abbreviations, personalized terms or informal spellings [76], such as "iSteve", "WEBLOC", etc. These two phenomena accord with our second assumption. Moreover, it is also found that the closer two questions are semantically, the higher the probability that tags are shared between them. This again is coherent with our first assumption. A typical example is illustrated in Figure 3.5.
The foregoing analysis strongly suggests that the tag relevance estimation should simultaneously damp generic tags, penalize overly specific tags, and reward tags from semantically closer questions. It is formally stated as,

Score(q, t_s) = I(t_s) × S(Q_s, t_s) × C(q, t_s),    (3.20)

where q is the to-be-annotated question and t_s is a tag from the inferred tag space.
The first term, I(t_s), damps generic tags according to their collection frequency, where o(t_s) refers to the occurrence frequency of tag t_s in the entire data collection.
The second term, S(Q_s, t_s), measures the stability of tags: the questions in the inferred semantic space Q_s are regarded as a family, and the popularity of tag t_s in the family is estimated by averaging the votes from all family members. Practically, if different community participants annotate more distinct questions from the same semantically similar space using the same tags, these tags are more likely to reflect the objective aspects of the semantic content, and they are more reliable than tags with very low collection frequencies. Through this mechanism, unambiguous and objective tags that receive the most neighbour votes will stand out.
The last term in Eqn.(3.20), C(q, t_s), analyzes the tag relevance from the perspective of its owner questions' semantic closeness to q.
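As a concrete illustration, here is a small Python sketch of the multiplicative scoring in Eqn.(3.20). Since the defining equations of the three terms are not reproduced above, each term below is one plausible instantiation rather than the thesis's exact formula: an inverse-frequency damping for I(t_s), a neighbour-voting average for S(Q_s, t_s), and a similarity-weighted closeness for C(q, t_s).

```python
import math

def damp_generic(occ, collection_size):
    """I(t_s): assumed inverse-frequency damping of overly frequent tags."""
    return math.log(collection_size / (1.0 + occ))

def stability(tag, family, tags_of):
    """S(Q_s, t_s): fraction of questions in the semantic family carrying the tag."""
    return sum(tag in tags_of[q] for q in family) / max(len(family), 1)

def closeness(tag, family, tags_of, sim_to_q):
    """C(q, t_s): average semantic similarity of the tag's owner questions to q."""
    owners = [q for q in family if tag in tags_of[q]]
    return sum(sim_to_q[q] for q in owners) / max(len(owners), 1)

def score(tag, occ, collection_size, family, tags_of, sim_to_q):
    return (damp_generic(occ, collection_size)
            * stability(tag, family, tags_of)
            * closeness(tag, family, tags_of, sim_to_q))
```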
Table 3.2: Meta information of our data collection.
User Num | Question Num | Answer Num | Tag Num | Distinct Tag Num
The computational complexity of our scheme mainly comes from three parts: (1) feature extraction (covering both questions and answers); (2) adaptive probabilistic hypergraph learning; and (3) the heuristic approach for tag selection. Undoubtedly, feature extraction is the most computationally expensive step, but it can be handled off-line. The complexity of the relevant tag selection can actually be ignored due to the small size of the tag candidate set inferred by our first component. For the proposed hypergraph learning, the computational cost is of the order

O(t(E³ + 2V E² + 2EV² + V³) + dV²),
where t is the number of iterations, usually below 10 in our work, and d stands for the dimensionality of the 29802-dimensional features. The numbers of considered vertices and hyperedges are denoted as V and E, respectively, both in the order of thousands if we only keep the top 1K questions based on the initial relevance probabilities. Thus the computational cost is very low. In our experiments, the process can be completed within 2 seconds (3.4 GHz CPU and 8 GB memory).
To collect relevant image and video data from the web, we need to generate appropriate queries from text QA pairs before performing search on multimedia search engines. We accomplish the task in three steps. The first step is query extraction. Textual questions and answers are usually complex sentences, but search engines frequently do not work well for queries that are long and verbose. Therefore, we need to extract a set of informative keywords from questions and answers for querying. The second step is query selection. This is because we can generate different queries: one from the question, one from the answer, and one from the combination of question and answer. Which one is the most informative depends on the QA pair. For example, some QA pairs embed the useful query terms in their questions, such as "What did the Globe Theater look like". Some hide the helpful keywords in their answers, such as the QA pair "Q: What is the best computer for 3D art; A: Alienware brand computer". Some should combine the question and the answer to generate a useful query, such as the QA pair "Q: Who is Chen Ning Yang's wife; A: Fan Weng", for which both "Chen Ning Yang" and "Fan Weng" are informative words (we can find some pictures of the couple, while using only "Fan Weng" to search will yield many incorrect results).
For each QA pair, we generate three queries, as sketched below. First, we convert the question to a query, i.e., we convert a grammatically correct interrogative sentence into a syntactically correct declarative sentence or meaningful phrase; we directly utilize the method in [5]. Meanwhile, the generated query is expanded with the suggested tags if they are visual phrases [103]. Second, we identify several key concepts from the verbose answer which have the major impact on effectiveness; here we employ the method in [15]. Finally, we combine the two queries that are generated from the question and the answer respectively. We thus obtain three queries, and the next step is to select one of them.
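The following toy sketch mirrors this three-query pipeline. The naive frequency-based keyword extractor is only a stand-in for the cited methods of [5] and [15], which are not reproduced here; the stop-word list and the helper names are ours.

```python
import re
from collections import Counter

STOPWORDS = {"what", "is", "the", "a", "an", "of", "for", "who",
             "did", "does", "do", "to", "in", "on", "how"}

def keywords(text, k=6):
    """Naive stand-in for the keyword extraction of [5]/[15]:
    keep the k most frequent non-stopword terms."""
    words = [w for w in re.findall(r"[a-z0-9']+", text.lower())
             if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(k)]

def candidate_queries(question, answer):
    q_query = " ".join(keywords(question))                 # from the question
    a_query = " ".join(keywords(answer))                   # from the answer
    combined = " ".join(dict.fromkeys(keywords(question) + keywords(answer)))
    return q_query, a_query, combined

print(candidate_queries("Who is Chen Ning Yang's wife", "Fan Weng"))
```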
The query selection is formulated as a three-class classification task, since we need to choose one from the three queries generated from the question, the answer, and the combination of question and answer. We adopt the following features:
1. POS Histogram. The POS histogram reflects the characteristics of a query. Using the POS histogram for query selection is motivated by several observations; for example, for queries that contain many complex verbs it is difficult to retrieve meaningful multimedia results. We use a POS tagger to assign a part-of-speech to each word of both question and answer. Here we employ the Stanford Log-linear Part-Of-Speech Tagger, and 36 POS categories are identified². We then generate a 36-dimensional histogram, in which each bin counts the number of words belonging to the corresponding part-of-speech category.
2. Search performance prediction. For certain queries, existing image and video search engines cannot return satisfactory results. We adopt the method introduced in Section 3.3.3, which measures a clarity score for each query based on the KL divergence between the query and collection language models. We generate 6-dimensional search performance prediction features in all (note that there are three queries and search is performed on both image and video search engines). A sketch of both feature extractors follows this list.
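Below is a minimal sketch of both feature extractors. The thesis uses the Stanford tagger; NLTK's Penn Treebank tagger is substituted here for self-containedness, and the clarity score is a simplified variant that weights the top-retrieved documents uniformly instead of by query likelihood.

```python
import math
from collections import Counter

import nltk  # requires nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')

PTB_TAGS = ["RB", "DT", "RP", "RBR", "RBS", "LS", "VBN", "VB", "VBP", "PRP",
            "MD", "SYM", "VBZ", "IN", "VBG", "POS", "EX", "VBD", "LRB", "UH",
            "NNS", "NNP", "JJ", "RRB", "TO", "JJS", "JJR", "FW", "NN", "NNPS",
            "PDT", "WP", "WDT", "CC", "CD", "WRB"]

def pos_histogram(text):
    """36-bin histogram over the Penn Treebank POS categories of footnote 2."""
    tags = [t.strip("-") for _, t in nltk.pos_tag(nltk.word_tokenize(text))]
    counts = Counter(tags)
    return [counts.get(t, 0) for t in PTB_TAGS]

def clarity_score(retrieved_docs, collection_tf, collection_len, mu=1000.0):
    """Simplified clarity score: KL divergence between a query language model
    (averaged Dirichlet-smoothed models of the top-retrieved documents)
    and the collection language model."""
    p_q = Counter()
    for doc in retrieved_docs:
        tf, dlen = Counter(doc), len(doc)
        for w in tf:
            p_c = collection_tf.get(w, 0.5) / collection_len
            p_q[w] += (tf[w] + mu * p_c) / (dlen + mu) / len(retrieved_docs)
    return sum(p * math.log2(p * collection_len / collection_tf.get(w, 0.5))
               for w, p in p_q.items())
```

Concatenating these components yields the feature vector described next.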
Therefore, for each QA pair, we can generate 42-dimensional features. Based on the extracted features, we train an SVM classifier with a labeled training set for classification, i.e., selecting one of the three queries. The last step is query expansion, which expands the selected query with the suggested tags.
Our dataset for query generation comes from multiple resources. For the first subset, we randomly collect 5,000 questions and their corresponding answers from WikiAnswers. For the second subset, we randomly collect 5,000 questions and their best answers from the dataset used in [124], which contains 4,483,032 questions and their answers from Y!A. Here we use the best answer that is determined by the asker or by community voting³.

² They are: RB, DT, RP, RBR, RBS, LS, VBN, VB, VBP, PRP, MD, SYM, VBZ, IN, VBG, POS, EX, VBD, LRB, UH, NNS, NNP, JJ, RRB, TO, JJS, JJR, FW, NN, NNPS, PDT, WP, WDT, CC, CD, and WRB.
When it comes to the evaluation of question annotation, a large real-world dataset was created based on Zhihu, which officially announced that it had approximately 300K users as of March 2012⁴. Our dataset was collected in July 2012, comprising more than 105K connected users and all their associated data, such as asked questions, posted answers, and the social connections among users. It accounts for a large fraction of the whole website and hence is comparatively representative for statistical analytics. Table 3.2 displays the meta information of our data collection.
For ground truth labeling (including the ground truths for question annotation and query generation), the five volunteers who had been involved in the labeling task of [102] were recruited again, including two Ph.D. students and one faculty member in computer science, one master's student in information systems, and one software engineer. The labelers were trained with a short tutorial and a set of typical examples. We need to admit that the ground truth labeling is subjective, but a majority voting among the five labelers can partially alleviate the problem.
We first analyze the dataset for query generation from Y!A and WikiAnswers. Inspired by [69, 44], we classify all the questions into two categories: conversational and informational. Conversational questions usually only seek personal opinions or judgments, such as "Anybody watch the Bears game last night", while informational questions are asked with the intent of obtaining information that the asker hopes to learn or use via fact-oriented answers, such as "What is the population of Singapore". There are several automatic algorithms for the categorization of conversational and informational questions, such as the work in [69] and [44], but since this is not the focus of our work, we perform the categorization with human labeling.

³ There are also many research efforts on ranking community-contributed answers or selecting the best answer via machine learning and NLP technologies [124, 45, 4]. These methods can also be integrated with our work; we would only need to change the best answer for each question.
⁴ http://tech.sina.com.cn/i/2012-03-16/16476844824.shtml
To be specific, each question is labeled by at least two volunteers independently. In the case that the first two volunteers make different decisions about the question type, we solicit two additional volunteers to label the question again. The question is viewed as ambiguous if the four voters cannot reach a majority classification. It is worth noting that each volunteer was trained with the question type definition as well as corresponding examples before labeling. This question type labeling process is analogous to [44]. In this way, we extract 3,333 informational questions from the Y!A subset and 4,000 from the WikiAnswers subset. The QA pairs in our dataset cover a wide range of topics, including travel, life, education, etc. Query selection needs classifiers learned from training data, and thus we split the 7,333 QA pairs into two parts: a training set that contains 5,866 QA pairs and a testing set of the remaining 1,467 QA pairs. The testing set consists of 800 QA pairs from WikiAnswers and 667 from Y!A. Classification models are trained with the whole training set, i.e., 5,866 QA pairs. They are tested on the 800 QA pairs from WikiAnswers, the 667 QA pairs from Y!A, or both.
We then perform statistical analytics on the dataset from Zhihu for question annotation. Based on Table 3.2, we can easily calculate that the average number of tags per question and the average number of occurrences per unique tag are 2.48 and 16.9, respectively. These two values explicitly reveal the tag incompleteness problem and the high rate of tag reuse, correspondingly. The reuse of tags demonstrates the rationality of selecting appropriate tags from the inferred tag vocabulary.
[Figure 3.6 about here; x-axis: Number of Social Connections]
Figure 3.6: The distribution of the number of users with respect to the number of social connections.
Meanwhile, the repeat count determines the size of the tag-based hyperedges, i.e., around 16.9 questions on average are grouped by one tag-based hyperedge. Also, this finding suggests that we only need to consider 17 nearest neighbours when constructing the QA-based hyperedges.
Figure 3.6 shows the distribution of the number of users with respect to followees and followers, respectively. Both of them comply with power-law distributions, except for two bottom-left points that correspond to hundreds of users who have either no followers or no followees. Besides, the average number of followers per user, 44, is noticeably larger than the average number of followees per user, 28. This is why we chose information from followees to construct the user-based hyperedges, i.e., to keep our hypergraph simple. Also, Figure 3.7 shows the distribution of the number of users over the categorical posts, including questions and answers. This figure provides conclusive evidence that more than half of the users are not active, in that they never ask or answer. Statistically, community participants seem to prefer answering (8.53 answers per user and 4.12 answers per question) to asking
[Figure 3.7 about here; x-axis: number of posts (0–6, >=7)]
Figure 3.7: The distribution of the number of users with respect to the number of posts.
(2.07 questions per user). Jointly analyzing these basic statistics, another important piece of evidence can be inferred, namely the average size of the user-based hyperedges: 60 questions on average are gathered together by each user-based hyperedge.
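As an illustration of how the three hyperedge families can be assembled, here is a brief sketch of constructing the binary incidence matrix H; the grouping inputs (tag groups, 17-nearest-neighbour lists, user groups) are assumed to be precomputed, and the function name is ours.

```python
import numpy as np

def build_incidence(n_questions, tag_groups, knn_lists, user_groups):
    """Stack tag-based, QA-similarity (kNN) and user-based hyperedges into
    one |V| x |E| binary incidence matrix H."""
    edges = []
    edges += list(tag_groups.values())                          # ~17 questions per tag edge
    edges += [[q] + nbrs for q, nbrs in enumerate(knn_lists)]   # 17-NN edges
    edges += list(user_groups.values())                         # ~60 questions per user edge
    H = np.zeros((n_questions, len(edges)))
    for j, members in enumerate(edges):
        H[list(set(members)), j] = 1.0
    return H
```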
To represent the content of each QA pair, we first eliminated the questions without any tag. We then performed sentence segmentation for all remaining QA pairs with the Stanford Parser [26] and obtained more than 200K chunks. After removing stop words and filtering out the chunks with frequencies smaller than 5, we built a 29802-dimensional bag-of-chunks histogram for each QA pair. Meanwhile, we randomly selected 50 questions as testing data.
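A minimal sketch of the bag-of-chunks construction, assuming the chunk lists produced by the parser are already available; sklearn's CountVectorizer with a pass-through analyzer applies the frequency cutoff via min_df (min_df=1 here so the toy input runs; min_df=5 on the full corpus gives the representation above).

```python
from sklearn.feature_extraction.text import CountVectorizer

# one list of (stop-word-filtered) chunks per QA pair, e.g. from the Stanford Parser
chunk_lists = [["globe theater", "look like"],
               ["best computer", "3d art", "alienware brand computer"]]

vectorizer = CountVectorizer(analyzer=lambda chunks: chunks, min_df=1)
X = vectorizer.fit_transform(chunk_lists)   # bag-of-chunks histograms
```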
To evaluate the ranking-based question space inference, we adopted NDCG@n as our metric,

NDCG@n = Z_n (rel_1 + Σ_{i=2}^{n} rel_i / log₂ i),

where rel_i denotes the graded relevance score of the question at rank i, and Z_n is a normalization constant chosen so that the ideal ranking yields NDCG@n = 1.
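A short sketch of this metric with the graded relevance levels used in our labeling (2 = very relevant, 1 = relevant, 0 = irrelevant); the normalizer Z_n is computed from the ideal ordering:

```python
import math

def dcg_at_n(rels, n):
    rels = rels[:n]
    return rels[0] + sum(r / math.log2(i) for i, r in enumerate(rels[1:], start=2))

def ndcg_at_n(rels, n):
    ideal = dcg_at_n(sorted(rels, reverse=True), n)
    return dcg_at_n(rels, n) / ideal if ideal > 0 else 0.0

print(ndcg_at_n([2, 0, 1, 2, 0], n=5))   # graded relevance of a ranked list
```

Under this metric, we compared the following reranking strategies: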
• PRF: Pseudo-Relevance Feedback [143]. A support vector machine (SVM) classifier was trained to perform the reranking, based on the assumption that the top-ranked questions are in general more relevant than the low-ranked results. The initial question ranking list was generated based on Eqn.(3.5). (Baseline 1)
• RW: Random walk based reranking [53]. This is a typical simple graph-based reranking method, jointly exploiting both the initial relevance probabilities and the semantic similarity between questions. The stationary probability of the random walk was used to compute the final relevance scores. The initial relevance probability of each question was estimated based on Eqn.(3.5). (Baseline 2)
• CHL: Conventional hypergraph learning [54]. The weights of the different hyperedges were not dynamically learned but fixed according to the initial estimation as described in Section 6.4.1. (Baseline 3)
• APHL: Our proposed adaptive probabilistic hypergraph learning approach, with alternating optimization between W and f.
For each method mentioned above, the involved parameters were carefully tuned, and the parameters with the best performance were used to report the final comparison results. Meanwhile, the ground truth for these four strategies was created by a manual labelling procedure through a pooling method. Specifically, each testing question has a pool that was constructed by merging the four top-50 lists of semantically similar questions recommended by each strategy. Five human annotators with diverse backgrounds were then invited to label all the questions, pool by pool. Each question was labeled as very relevant (score 2), relevant (score 1) or irrelevant (score 0) with respect to the given question. We performed a voting to establish the final relevance level of each question. For the cases where two classes received the same number of ballots, a discussion was carried out among the labelers to decide the final ground truth.

[Figure 3.8 about here; x-axis: NDCG depth (5, 10, 20, 30, 50)]
Figure 3.8: Performance comparison of different reranking-based question space inference approaches in terms of NDCG at different depths.
Figure 3.8 illustrates the experimental results. From this figure, we observe that the proposed approach consistently and substantially outperforms the other publicly disclosed state-of-the-art reranking algorithms across various depths of NDCG. Among these four methods, the two hypergraph-based learning approaches show superiority over the other two. One possible reason is the unreliable initial ranking list resulting from the rough estimation. The other main reason is that hypergraph-based learning is able to capture the high-order relationships among questions, i.e., the summarized local grouping information, in contrast to the simple pairwise relationships characterized by the other two approaches. From this figure, we can also observe that our proposed method performs stably better than the conventional hypergraph learning approach. This well supports the claim that it is better to simultaneously learn the question relevance scores and the hyperedge weights.
It is well known that for the annotation task, precision is usually more important than recall. Therefore, we adopted two metrics that capture precision from different aspects. The first one is the average S@K over all testing questions, which measures the probability of finding a relevant tag among the top K recommended tags. To be specific, for each testing question, S@K is assigned 1 if a relevant tag is ranked in the top K positions and 0 otherwise. The second one is the average P@K, which stands for the proportion of recommended tags that are relevant.
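Both metrics are straightforward to compute from the binary relevance of the ranked tag list; a minimal sketch:

```python
def s_at_k(relevant, k):
    """1 if any of the top-K suggested tags is relevant, else 0."""
    return int(any(relevant[:k]))

def p_at_k(relevant, k):
    """Proportion of relevant tags among the top K."""
    return sum(relevant[:k]) / k

rankings = [[1, 0, 1, 0, 0], [0, 1, 0, 0, 1]]   # toy binary relevance lists
print(sum(s_at_k(r, 1) for r in rankings) / len(rankings))   # average S@1
print(sum(p_at_k(r, 5) for r in rankings) / len(rankings))   # average P@5
```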
Table 3.3 presents the precision of relevant tag selection based on the above two metrics. It is observed that the performance in terms of S@1 is as high as 70%, which means that for up to 70% of the questions, our proposed annotation scheme can suggest a relevant tag at rank 1. Moreover, the value of S@5 almost ensures that at least one tag is relevant among the top 5 recommended tags. Besides, P@5 achieves 58% accuracy, which reflects that about 3 out of the top 5 tags on average are able to characterize the question topics well. From the view of efficacy, the performances of S@K and P@K confirm the high applicability of our proposed method in tag suggestion with human-computer interaction and in automatic tag annotation without human interference, respectively.

Table 3.3: The evaluation results of relevant tag selection in terms of different metrics.
As discussed above, the two positive parameters λ and µ play important roles in modulating the effects of the empirical loss and the weighting regularizer, respectively. The former is widely tuned in hypergraph learning algorithms. For the latter, as it varies from zero to infinity, the hyperedge weights accordingly vary from an extremely imbalanced case to an extremely balanced case [150]. Specifically, when µ = ∞, the proposed adaptive hypergraph reduces to the conventional hypergraph, since the optimal solution assigns identical weights to all hyperedges. On the contrary, if µ tends to zero, the optimal result is that only one weight is 1 and all others are 0.
In this section, we conducted a series of experiments to investigate the sensitivity of these two parameters. We first performed a grid search with flexible step sizes to seek the λ and µ with optimal reranking performance in terms of NDCG@20; the values 200 and 0.001 were located for λ and µ, respectively. The NDCG@20-λ curve is presented in Figure 3.9 with µ fixed at 0.001. As illustrated, the performance gradually increases as λ grows and peaks at a certain value; the performance then goes downward and finally becomes relatively stable. Similarly, Figure 3.10 shows the NDCG@20-µ curve with λ fixed at 200, where the performance varies with µ. With the increase of µ, more informative hyperedges are taken into consideration as their weights are updated from zero to nonzero values. It is also observed that when µ reaches a certain value, the performance starts to decrease; this is because more "incorrect" hyperedges are potentially introduced. However, based on these observations, we conclude that the performance of our proposed method varies only within (0.765, 0.803) when the parameters vary over a wide range, i.e., it is not very sensitive.

[Figures 3.9 and 3.10 about here: NDCG@20 versus λ, and NDCG@20 versus µ (µ axis: 0 to 16, ×0.001)]
3.7.6.1 Application Scenario
Since the social QA content naturally and quickly evolves over time, information seekers are usually overwhelmed by the huge amount of information routinely received. In this scenario, a table-of-contents-like navigational hierarchy depicting the topical relationships within the data archive would be more effective. In fact, some conventional cQA forums have partially provided taxonomy-based navigation, such as Y!A and Baidu Knows. However, these predefined taxonomies are fixed and only support a single topic per question, which does not coincide well with the inherent features of QA knowledge.
On the other hand, the invention of question tagging introduces an intuitive and easy way to organize QA content. In this chapter, we propose a method to automatically organize the set of QA pairs into various knowledge structures. The types of structures flexibly depend on user needs, which is essential for assisting information retrieval and user navigation.
Figure 3.11: The question distribution over our selected topic taxonomy. Only the top five dominant categories are illustrated.
3.7.6.2 Experiments
Suppose the hierarchical structure of Y!A is the desired ontology tree, containing 1,263 leaf-level nodes distributed over 26 top-level categories. For each tag in our tag vocabulary, we first map it directly to one leaf node by a tree search algorithm based on their semantic relatedness. Then the associated QA pairs with tag expansion automatically fall into the corresponding leaf nodes. Here our proposed question annotation scheme enlarges the tag number of each question up to 5 if it does not originally reach this threshold.
The semantic relatedness between two concepts is usually estimated based on path length in a well-structured corpus such as WordNet. However, such a method is not robust to web phrases due to their free forms, "ipad", for example. Inspired by the Google distance [31], we measure the semantic distance between a tag and a category name via their co-occurrence on Google,
d(t_i, t_j) = [max(log r(t_i), log r(t_j)) − log r(t_i, t_j)] / [log G − min(log r(t_i), log r(t_j))],    (3.27)

where G is the total number of documents retrieved from Google, r(t_i) is the number of hits for the search concept t_i, and r(t_i, t_j) is the number of web documents on which both t_i and t_j co-occur. Their semantic relevance is then defined as,

S(t_i, t_j) = exp(−d(t_i, t_j)).    (3.28)
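Given the hit counts r(·) returned by the search engine (obtaining them is engine-specific and not shown), Eqns.(3.27) and (3.28) translate directly into code:

```python
import math

def semantic_distance(r_i, r_j, r_ij, G):
    """Eqn. (3.27): normalized Google-distance-style measure from hit counts."""
    num = max(math.log(r_i), math.log(r_j)) - math.log(r_ij)
    den = math.log(G) - min(math.log(r_i), math.log(r_j))
    return num / den

def semantic_relevance(r_i, r_j, r_ij, G):
    """Eqn. (3.28)."""
    return math.exp(-semantic_distance(r_i, r_j, r_ij, G))

# toy counts: G = retrieved documents, r_i/r_j = hits, r_ij = co-occurrence hits
print(semantic_relevance(r_i=5e6, r_j=2e6, r_ij=4e5, G=1e10))
```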
The distribution of the collected questions over the top-level categories is partially illustrated in Figure 3.11, normalized over the 26 categories. It is observed that more than 1/5 of the participants are active in the "computer&internet" category. This result is consistent with Zhihu's operation mechanism: it invited many experts from the computer science community in its start-up stage.
Finally, we conducted a user study to further evaluate the performance of knowledge structure generation. First, 50 questions were randomly selected. After annotation, their tag sets almost doubled in size. Each of them was then shown to 5 assessors together with its assigned leaf-level categories based on our proposed structure generation method. Assessors independently judged each assigned category as "correct" if the category captures the question topic well; otherwise, they were encouraged to mark the assigned category as "incorrect". We found that around 76% of the assigned categories were labeled as correct on average. Without the expanded tag set, it would not be possible to come up with such a comprehensive structure, since the tag incompleteness problem is extremely serious, as previously mentioned.
Table 3.5: The classification accuracies for query selection with different features. SPP stands for search performance prediction.
We now evaluate the query generation and selection approach. For each QA pair, three queries are generated: from the question, from the answer, and from the combination of question and answer. As previously mentioned, five labelers participated in the ground truth labeling process. Each labeler selected the most informative query; they were allowed to perform searches on the web to compare the informativeness of the search results. The final ground truths were obtained by a majority voting. The distribution of the three classes is illustrated in Table 3.4.

Table 3.4: The ground truth distribution for query selection.

We adopt an SVM with the RBF kernel, and the parameters, including the radius parameter and the weighting parameter that modulates the regularization term and the loss term, are established by 5-fold cross-validation; a minimal sketch of this setup follows below. Table 3.5 shows the classification results. From the results we can see that integrating the POS histogram with the search performance prediction achieves better performance than using the POS histogram or the retrieval performance prediction alone. The classification accuracies on the Y!A subset, the WikiAnswers subset and the whole dataset are 72.19%, 76.38% and 74.47%, respectively. The QA pairs in WikiAnswers are well presented semantically and syntactically, which results in the higher accuracy of query generation and selection on that subset.
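A compact sklearn sketch of this training setup, run on synthetic stand-in data since the labeled 42-dimensional features are not reproducible here; the hyperparameter grid is an assumption:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((300, 42))          # 42-dim features per QA pair (synthetic stand-in)
y = rng.integers(0, 3, 300)        # 0 = question, 1 = answer, 2 = combination

grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]}
clf = GridSearchCV(SVC(kernel="rbf"), grid, cv=5)   # 5-fold cross-validation
clf.fit(X, y)
print(clf.best_params_, clf.best_score_)
```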
3.8 Summary
This chapter studies user tagging behaviours with a representative real-world dataset, and presents a novel scheme to automatically annotate social questions, which tackles the incompleteness and bias problems of question tags. For a given question, the scheme first constructs an adaptive probabilistic hypergraph to infer the semantically similar question space. Based on this question space, a collection of probably relevant tags is roughly identified. Comprehensive information cues from users, questions and tags are seamlessly integrated into this hypergraph. This step narrows down the suggested tag candidates. Our scheme then performs a heuristic approach to further filter the tag candidates by simultaneously damping generic tags, penalizing overly specific tags, and rewarding tags from semantically closer questions. It aims to strengthen the annotation by keeping off subjective, ambiguous and generic tags. The experimental results have demonstrated that our scheme achieves promising performance for question annotation and greatly enhances flexible knowledge organization.
This work begins a new research direction: flexibly organizing QA knowledge by means of question annotation. However, the current approach overlooks the descriptive terms extracted from QA pairs. Also, this chapter only conducts a preliminary analysis of ontology generation. Thus, more in-depth research remains to be investigated.