We first rank all the questions in Q in descending order according to their relevance to q, which is estimated via our adaptive probabilistic hypergraph model. We then select the top n questions to form the semantic space. In this chapter, the relevance estimation is viewed as a transductive inference problem [158, 150], formulated as a regularization framework,
arg min_f Φ(f) = arg min_f { Ω(f) + λ R(f) },    (3.7)

where Ω(f) and R(f) denote the regularizer on the hypergraph and the empirical loss, respectively, and λ is a regularization parameter that balances the empirical loss against the regularizer.
Inspired by the normalized cost function of a simple graph [101, 157], Ω(f) is defined as,

Ω(f) = (1/2) Σ_{e∈E} Σ_{u,v∈V} (w(e) h(u,e) h(v,e) / δ(e)) (f(u)/√d(u) − f(v)/√d(v))².    (3.8)

Let Θ = D_v^{−1/2} H W D_e^{−1} H^T D_v^{−1/2}. Using the fact that Σ_{v∈V} h(v, e) = δ(e), we can further derive that

Ω(f) = f^T (I − Θ) f,    (3.9)

where I is an identity matrix. Let ∆ = I − Θ, which is a positive semi-definite matrix, the so-called hypergraph Laplacian [158]; then Ω(f) can be rewritten as,

Ω(f) = f^T ∆ f.    (3.10)
For the loss term, after introducing a new vector y containing all the initially estimated relevance probabilities, it is stated as a least-square function,

R(f) = ||f − y||² = Σ_{v∈V} (f(v) − y(v))².    (3.11)

By minimizing Φ(f), the first term guarantees that the relevance probability function is continuous and smooth in the semantic space; that is, the relevance probabilities of semantically similar questions should be close. Meanwhile, the empirical loss function forces the relevance probabilities to stay close to the initial roughly estimated relevance scores. These two implicit constraints are widely adopted in reranking-oriented approaches [128, 101].
However, in the constructed hypergraph, the effects of hyperedges cannot be treated on an equal footing, since they are generated from different angles, spanning from semantic similarities between QA pairs, to tag-sharing networks, and users' social behaviours. Even though all the hyperedges are initialized with reasonable weights based on local information, further globally adaptive refinement and modulation are still necessary. Inspired by [150, 40], we extend the conventional hypergraph to an adaptive one by integrating a two-norm regularizer on W. Therefore, Eqn.(3.7) is restated as,

arg min_{f,W} Φ(f, W) = arg min_{f,W} { f^T ∆ f + λ ||f − y||² + µ Σ_{e∈E} w(e)² },    (3.12)

where µ is a positive parameter. For model simplicity, all the entries in W are confined to be non-negative and to add up to 1. We alternately optimize f and W.
First, W is fixed and partial derivatives with respect to f are taken on the objective function. We have

f = (1 − η)(I − ηΘ)^{−1} y,    (3.13)

where η = 1/(1 + λ). Next we fix f and optimize W with the help of a Lagrangian, which is frequently utilized in such constrained optimization problems [40]. The objective function is transformed into,

L(W, ξ) = f^T (I − Θ) f + λ ||f − y||² + µ Σ_{e∈E} w(e)² + ξ (Σ_{e∈E} w(e) − 1),    (3.14)

where ξ is the Lagrange multiplier enforcing the sum-to-one constraint. Setting the partial derivative of L with respect to each w(e) to zero expresses w(e) as a function of ξ (Eqn.(3.15)). Replacing W in Eqn.(3.14) with Eqn.(3.15) and taking derivatives with respect to ξ, we obtain the closed-form value of ξ and hence the updated hyperedge weights.
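To make the alternating procedure concrete, the following is a minimal NumPy sketch, not the thesis implementation. It assumes a binary incidence matrix H, treats the per-hyperedge smoothness costs as independent of W within the W-step (a common simplification in adaptive hypergraph learning), and handles the non-negativity constraint by clipping and renormalizing rather than via the exact Lagrangian conditions; all helper names are ours.

```python
import numpy as np

def theta(H, w, eps=1e-12):
    """Theta = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} (cf. Eqn. 3.9)."""
    dv = H @ w                        # vertex degrees d(v)
    de = H.sum(axis=0)                # hyperedge degrees delta(e)
    A = H / np.sqrt(dv + eps)[:, None]
    return A @ np.diag(w / (de + eps)) @ A.T

def edge_costs(H, w, f, eps=1e-12):
    """Per-hyperedge smoothness cost c_e, so that Omega(f) ~ sum_e w(e) c_e."""
    dv, de = H @ w, H.sum(axis=0)
    g = f / np.sqrt(dv + eps)
    s1, s2 = H.T @ (g ** 2), H.T @ g
    return s1 - s2 ** 2 / (de + eps)

def adaptive_hypergraph_learning(H, y, lam=200.0, mu=1e-3, iters=10):
    n_v, n_e = H.shape
    w = np.full(n_e, 1.0 / n_e)       # start from uniform hyperedge weights
    eta = 1.0 / (1.0 + lam)
    for _ in range(iters):
        # f-step: closed form of Eqn. (3.13)
        f = (1 - eta) * np.linalg.solve(np.eye(n_v) - eta * theta(H, w), y)
        # W-step: Lagrangian stationarity with the sum-to-one constraint,
        # followed by projection to keep the weights non-negative
        c = edge_costs(H, w, f)
        w = 1.0 / n_e + (c.mean() - c) / (2.0 * mu)
        w = np.clip(w, 0.0, None)
        w /= w.sum()
    return f, w
```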
[Figure 3.4 about here; x-axis: Number of Unique Tags]
Figure 3.4: The tag frequency distribution with respect to the number of distinct tags over our large-scale dataset. It clearly follows a power-law distribution.
In the whole iterative process, we alternately update f and W. Each step decreases the objective function Φ(f), whose lower bound is zero. Therefore, convergence of our scheme is guaranteed [150, 40]. Another noteworthy issue is that the initial relevance probabilities of each question in Q to the given question q are estimated based on Eqn.(3.5).
It is intuitive that the conventional simple graph is a special case of the hypergraph, where all hyperedges have degree two and represent only pairwise relationships. To further investigate the learning approaches based on these two kinds of graphs, we develop a regularization framework Φ_s(f) for the simple graph,

Φ_s(f) = (1/2) Σ_{i,j} W_{ij} (f_i/√(D_{ii}) − f_j/√(D_{jj}))² + λ ||f − y||².

The first term is the normalized cost function controlling the smoothness, where D is a diagonal matrix with its (i, i)-element equal to the sum of the i-th row of the affinity matrix W. Let Θ_s = D^{−1/2} W D^{−1/2}; the simple graph Laplacian can then be denoted as ∆_s = I − Θ_s. It can be shown that the first term is equivalent to f^T ∆_s f, which is similar to the regularizer on the hypergraph in Eqn.(3.10). Analogous to the empirical loss function of the hypergraph, the second term is utilized to constrain the fitting, which means a good classifying function should not change too much from the initial label assignment [157]. Differentiating Φ_s(f) with respect to f and setting the result to zero yields a closed-form solution of the same shape as Eqn.(3.13), namely f = (1 − η)(I − ηΘ_s)^{−1} y with η = 1/(1 + λ).

Figure 3.5: An illustrative instance of semantically similar questions sharing the same tags.
3.5 Relevant Tag Selection
Based on the first component, a tag space shared by the inferred question space can be generated effortlessly. However, not all the roughly selected tag candidates are able to summarize the question content well. A heuristic tag relevance estimation approach is proposed in this section to further filter the tag candidates by integrating multi-faceted cues. Following that, the complexity of our scheme is analyzed.
According to our statistics, the tag frequency distribution in our dataset with respect to the number of distinct tags follows a power law, as shown in Figure 3.4. We further observe that the tags distributed in the head part of the power law tend to be phrases with high-level semantics, such as "technology", "life", "entertainment", and so on. They are too generic to be informative as tags. On the other hand, the tail of the power law contains the tags with very low collection frequencies, which are usually extremely specific. They are either unpopular abbreviations, personalized terms or informal spellings [76], such as "iSteve", "WEBLOC", etc. These two phenomena accord with our second assumption. Moreover, it is also found that the closer two questions are semantically, the higher the probability that tags are shared between them. This again is coherent with our first assumption. A typical example is illustrated in Figure 3.5.
The foregoing analysis strongly suggests that the tag relevance estimation should simultaneously damp generic tags, penalize overly specific tags, and reward tags from semantically closer questions. It is formally stated as,

Score(q, t_s) = I(t_s) × S(Q_s, t_s) × C(q, t_s),    (3.20)

where q is the to-be-annotated question and t_s is a tag from the inferred tag space.
The first term, I(t_s), damps generic tags according to their collection frequency, where o(t_s) refers to the occurrence frequency of tag t_s in the entire data collection.
The second term, S(Q_s, t_s), measures the stability of tags: the questions in the inferred semantic space Q_s are regarded as a family, and the popularity of tag t_s in the family is estimated by averaging the votes from all family members. Practically, if different community participants annotate more distinct questions from the same semantically similar space using the same tags, these tags are more likely to reflect the objective aspects of the semantic content, and they are more reliable than tags with very low collection frequencies. Through this mechanism, unambiguous and objective tags that receive the most neighbour votes will stand out.
The last term in Eqn.(3.20), C(q, t_s), analyzes the tag relevance from the perspective of its owner questions' semantic closeness to q.
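As a concrete illustration, here is a small Python sketch of the multiplicative scoring in Eqn.(3.20). Since the defining equations of the three terms are not reproduced above, each term below is one plausible instantiation rather than the thesis's exact formula: an inverse-frequency damping for I(t_s), a neighbour-voting average for S(Q_s, t_s), and a similarity-weighted closeness for C(q, t_s).

```python
import math

def damp_generic(occ, collection_size):
    """I(t_s): assumed inverse-frequency damping of overly frequent tags."""
    return math.log(collection_size / (1.0 + occ))

def stability(tag, family, tags_of):
    """S(Q_s, t_s): fraction of questions in the semantic family carrying the tag."""
    return sum(tag in tags_of[q] for q in family) / max(len(family), 1)

def closeness(tag, family, tags_of, sim_to_q):
    """C(q, t_s): average semantic similarity of the tag's owner questions to q."""
    owners = [q for q in family if tag in tags_of[q]]
    return sum(sim_to_q[q] for q in owners) / max(len(owners), 1)

def score(tag, occ, collection_size, family, tags_of, sim_to_q):
    return (damp_generic(occ, collection_size)
            * stability(tag, family, tags_of)
            * closeness(tag, family, tags_of, sim_to_q))
```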
Table 3.2: Meta information of our data collection.
User Num | Question Num | Answer Num | Tag Num | Distinct Tag Num
The computational complexity of our scheme mainly comes from three parts: (1) feature extraction (covering both questions and answers); (2) adaptive probabilistic hypergraph learning; and (3) the heuristic approach for tag selection. Undoubtedly, feature extraction is the most computationally expensive step, but it can be handled off-line. The complexity of the relevant tag selection can actually be ignored due to the small size of the tag candidate set inferred by our first component. For the proposed hypergraph learning, the computational cost is of the order

O(t(E³ + 2V E² + 2EV² + V³) + dV²),
where t is the number of iterations, usually below 10 in our work, and d stands for the dimensionality of the 29802-dimensional features. The numbers of considered vertices and hyperedges are denoted as V and E, respectively, both in the order of thousands if we only keep the top 1K questions based on the initial relevance probabilities. Thus the computational cost is very low. In our experiments, the process can be completed within 2 seconds (3.4 GHz CPU and 8 GB memory).
To collect relevant image and video data from the web, we need to generate appropriate queries from text QA pairs before performing search on multimedia search engines. We accomplish the task in three steps. The first step is query extraction. Textual questions and answers are usually complex sentences, but search engines frequently do not work well for queries that are long and verbose. Therefore, we need to extract a set of informative keywords from questions and answers for querying. The second step is query selection. This is because we can generate different queries: one from the question, one from the answer, and one from the combination of question and answer. Which one is the most informative depends on the QA pair. For example, some QA pairs embed the useful query terms in their questions, such as "What did the Globe Theater look like". Some hide the helpful keywords in their answers, such as the QA pair "Q: What is the best computer for 3D art; A: Alienware brand computer". Some should combine the question and the answer to generate a useful query, such as the QA pair "Q: Who is Chen Ning Yang's wife; A: Fan Weng", for which both "Chen Ning Yang" and "Fan Weng" are informative words (we can find some pictures of the couple, while using only "Fan Weng" to search will yield many incorrect results).
For each QA pair, we generate three queries, as sketched below. First, we convert the question to a query, i.e., we convert a grammatically correct interrogative sentence into a syntactically correct declarative sentence or meaningful phrase; we directly utilize the method in [5]. Meanwhile, the generated query is expanded with the suggested tags if they are visual phrases [103]. Second, we identify several key concepts from the verbose answer which have the major impact on effectiveness; here we employ the method in [15]. Finally, we combine the two queries that are generated from the question and the answer respectively. We thus obtain three queries, and the next step is to select one of them.
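The following toy sketch mirrors this three-query pipeline. The naive frequency-based keyword extractor is only a stand-in for the cited methods of [5] and [15], which are not reproduced here; the stop-word list and the helper names are ours.

```python
import re
from collections import Counter

STOPWORDS = {"what", "is", "the", "a", "an", "of", "for", "who",
             "did", "does", "do", "to", "in", "on", "how"}

def keywords(text, k=6):
    """Naive stand-in for the keyword extraction of [5]/[15]:
    keep the k most frequent non-stopword terms."""
    words = [w for w in re.findall(r"[a-z0-9']+", text.lower())
             if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(k)]

def candidate_queries(question, answer):
    q_query = " ".join(keywords(question))                 # from the question
    a_query = " ".join(keywords(answer))                   # from the answer
    combined = " ".join(dict.fromkeys(keywords(question) + keywords(answer)))
    return q_query, a_query, combined

print(candidate_queries("Who is Chen Ning Yang's wife", "Fan Weng"))
```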
The query selection is formulated as a three-class classification task, since we need to choose one from the three queries generated from the question, the answer, and the combination of question and answer. We adopt the following features:
1. POS Histogram. The POS histogram reflects the characteristics of a query. Using the POS histogram for query selection is motivated by several observations; for example, for queries that contain many complex verbs it is difficult to retrieve meaningful multimedia results. We use a POS tagger to assign a part-of-speech to each word of both question and answer. Here we employ the Stanford Log-linear Part-Of-Speech Tagger, and 36 POS categories are identified². We then generate a 36-dimensional histogram, in which each bin counts the number of words belonging to the corresponding part-of-speech category.
2. Search performance prediction. For certain queries, existing image and video search engines cannot return satisfactory results. We adopt the method introduced in Section 3.3.3, which measures a clarity score for each query based on the KL divergence between the query and collection language models. We generate 6-dimensional search performance prediction features in all (note that there are three queries and search is performed on both image and video search engines). A sketch of both feature extractors follows this list.
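Below is a minimal sketch of both feature extractors. The thesis uses the Stanford tagger; NLTK's Penn Treebank tagger is substituted here for self-containedness, and the clarity score is a simplified variant that weights the top-retrieved documents uniformly instead of by query likelihood.

```python
import math
from collections import Counter

import nltk  # requires nltk.download('punkt') and nltk.download('averaged_perceptron_tagger')

PTB_TAGS = ["RB", "DT", "RP", "RBR", "RBS", "LS", "VBN", "VB", "VBP", "PRP",
            "MD", "SYM", "VBZ", "IN", "VBG", "POS", "EX", "VBD", "LRB", "UH",
            "NNS", "NNP", "JJ", "RRB", "TO", "JJS", "JJR", "FW", "NN", "NNPS",
            "PDT", "WP", "WDT", "CC", "CD", "WRB"]

def pos_histogram(text):
    """36-bin histogram over the Penn Treebank POS categories of footnote 2."""
    tags = [t.strip("-") for _, t in nltk.pos_tag(nltk.word_tokenize(text))]
    counts = Counter(tags)
    return [counts.get(t, 0) for t in PTB_TAGS]

def clarity_score(retrieved_docs, collection_tf, collection_len, mu=1000.0):
    """Simplified clarity score: KL divergence between a query language model
    (averaged Dirichlet-smoothed models of the top-retrieved documents)
    and the collection language model."""
    p_q = Counter()
    for doc in retrieved_docs:
        tf, dlen = Counter(doc), len(doc)
        for w in tf:
            p_c = collection_tf.get(w, 0.5) / collection_len
            p_q[w] += (tf[w] + mu * p_c) / (dlen + mu) / len(retrieved_docs)
    return sum(p * math.log2(p * collection_len / collection_tf.get(w, 0.5))
               for w, p in p_q.items())
```

Concatenating these components yields the feature vector described next.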
Therefore, for each QA pair, we can generate 42-dimensional features. Based on the extracted features, we train an SVM classifier with a labeled training set for classification, i.e., selecting one of the three queries. The last step is query expansion, which expands the selected query with the suggested tags.
Our dataset for query generation comes from multiple resources. For the first subset, we randomly collect 5,000 questions and their corresponding answers from WikiAnswers. For the second subset, we randomly collect 5,000 questions and their best answers from the dataset used in [124], which contains 4,483,032 questions and their answers from Y!A. Here we use the best answer that is determined by the asker or by community voting³.

² They are: RB, DT, RP, RBR, RBS, LS, VBN, VB, VBP, PRP, MD, SYM, VBZ, IN, VBG, POS, EX, VBD, LRB, UH, NNS, NNP, JJ, RRB, TO, JJS, JJR, FW, NN, NNPS, PDT, WP, WDT, CC, CD, and WRB.
When it comes to the evaluation of question annotation, a large real-world dataset was created based on Zhihu, which officially announced that it had approximately 300K users as of March 2012⁴. Our dataset was collected in July 2012, comprising more than 105K connected users and all their associated data, such as asked questions, posted answers, and the social connections among users. It accounts for a large fraction of the whole website and hence is comparatively representative for statistical analytics. Table 3.2 displays the meta information of our data collection.
For ground truth labeling (including the ground truths for question annotation and query generation), the five volunteers who had been involved in the labeling task of [102] were recruited again, including two Ph.D. students and one faculty member in computer science, one master's student in information systems, and one software engineer. The labelers were trained with a short tutorial and a set of typical examples. We need to admit that the ground truth labeling is subjective, but a majority voting among the five labelers can partially alleviate the problem.
We first analyze the dataset for query generation from Y!A and WikiAnswers. Inspired by [69, 44], we classify all the questions into two categories: conversational and informational. Conversational questions usually only seek personal opinions or judgments, such as "Anybody watch the Bears game last night", while informational questions are asked with the intent of obtaining information that the asker hopes to learn or use via fact-oriented answers, such as "What is the population of Singapore". There are several automatic algorithms for the categorization of conversational and informational questions, such as the work in [69] and [44], but since this is not the focus of our work, we perform the categorization with human labeling.

³ There are also many research efforts on ranking community-contributed answers or selecting the best answer via machine learning and NLP technologies [124, 45, 4]. These methods can also be integrated with our work; we would only need to change the best answer for each question.
⁴ http://tech.sina.com.cn/i/2012-03-16/16476844824.shtml
To be specific, each question is labeled by at least two volunteers independently. In the case that the first two volunteers make different decisions about the question type, we solicit two additional volunteers to label the question again. The question is viewed as ambiguous if the four voters cannot reach a majority classification. It is worth noting that each volunteer was trained with the question type definition as well as corresponding examples before labeling. This question type labeling process is analogous to [44]. In this way, we extract 3,333 informational questions from the Y!A subset and 4,000 from the WikiAnswers subset. The QA pairs in our dataset cover a wide range of topics, including travel, life, education, etc. Query selection needs classifiers learned from training data, and thus we split the 7,333 QA pairs into two parts: a training set that contains 5,866 QA pairs and a testing set of the remaining 1,467 QA pairs. The testing set consists of 800 QA pairs from WikiAnswers and 667 from Y!A. Classification models are trained with the whole training set, i.e., 5,866 QA pairs. They are tested on the 800 QA pairs from WikiAnswers, the 667 QA pairs from Y!A, or both.
We then perform statistical analytics on the dataset from Zhihu for question annotation. Based on Table 3.2, we can easily calculate that the average number of tags per question and the average number of occurrences per unique tag are 2.48 and 16.9, respectively. These two values explicitly reveal the tag incompleteness problem and the high rate of tag reuse, correspondingly. The reuse of tags demonstrates the rationality of selecting appropriate tags from the inferred tag vocabulary.
[Figure 3.6 about here; x-axis: Number of Social Connections]
Figure 3.6: The distribution of the number of users with respect to the number of social connections.
Meanwhile, the repeat count determines the size of the tag-based hyperedges, i.e., around 16.9 questions on average are grouped by one tag-based hyperedge. Also, this finding suggests that we only need to consider 17 nearest neighbours when constructing the QA-based hyperedges.
Figure 3.6 shows the distribution of the number of users with respect to followees and followers, respectively. Both of them comply with power-law distributions, except for two bottom-left points that correspond to hundreds of users who have either no followers or no followees. Besides, the average number of followers per user, 44, is noticeably larger than the average number of followees per user, 28. This is why we chose information from followees to construct the user-based hyperedges, i.e., to keep our hypergraph simple. Also, Figure 3.7 shows the distribution of the number of users over the categorical posts, including questions and answers. This figure provides conclusive evidence that more than half of the users are not active, in that they never ask or answer. Statistically, community participants seem to prefer answering (8.53 answers per user and 4.12 answers per question) to asking
[Figure 3.7 about here; x-axis: number of posts (0–6, >=7)]
Figure 3.7: The distribution of the number of users with respect to the number of posts.
(2.07 questions per user). Jointly analyzing these basic statistics, another important piece of evidence can be inferred, namely the average size of the user-based hyperedges: 60 questions on average are gathered together by each user-based hyperedge.
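As an illustration of how the three hyperedge families can be assembled, here is a brief sketch of constructing the binary incidence matrix H; the grouping inputs (tag groups, 17-nearest-neighbour lists, user groups) are assumed to be precomputed, and the function name is ours.

```python
import numpy as np

def build_incidence(n_questions, tag_groups, knn_lists, user_groups):
    """Stack tag-based, QA-similarity (kNN) and user-based hyperedges into
    one |V| x |E| binary incidence matrix H."""
    edges = []
    edges += list(tag_groups.values())                          # ~17 questions per tag edge
    edges += [[q] + nbrs for q, nbrs in enumerate(knn_lists)]   # 17-NN edges
    edges += list(user_groups.values())                         # ~60 questions per user edge
    H = np.zeros((n_questions, len(edges)))
    for j, members in enumerate(edges):
        H[list(set(members)), j] = 1.0
    return H
```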
To represent the content of each QA pair, we first eliminated the questions without any tag. We then performed sentence segmentation for all remaining QA pairs with the Stanford Parser [26] and obtained more than 200K chunks. After removing stop words and filtering out the chunks with frequencies smaller than 5, we built a 29802-dimensional bag-of-chunks histogram for each QA pair. Meanwhile, we randomly selected 50 questions as testing data.
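A minimal sketch of the bag-of-chunks construction, assuming the chunk lists produced by the parser are already available; sklearn's CountVectorizer with a pass-through analyzer applies the frequency cutoff via min_df (min_df=1 here so the toy input runs; min_df=5 on the full corpus gives the representation above).

```python
from sklearn.feature_extraction.text import CountVectorizer

# one list of (stop-word-filtered) chunks per QA pair, e.g. from the Stanford Parser
chunk_lists = [["globe theater", "look like"],
               ["best computer", "3d art", "alienware brand computer"]]

vectorizer = CountVectorizer(analyzer=lambda chunks: chunks, min_df=1)
X = vectorizer.fit_transform(chunk_lists)   # bag-of-chunks histograms
```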
To evaluate the ranking-based question space inference, we adopted NDCG@n as our metric,

NDCG@n = Z_n (rel_1 + Σ_{i=2}^{n} rel_i / log₂ i),

where rel_i denotes the graded relevance score of the question at rank i, and Z_n is a normalization constant chosen so that the ideal ranking yields NDCG@n = 1.
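A short sketch of this metric with the graded relevance levels used in our labeling (2 = very relevant, 1 = relevant, 0 = irrelevant); the normalizer Z_n is computed from the ideal ordering:

```python
import math

def dcg_at_n(rels, n):
    rels = rels[:n]
    return rels[0] + sum(r / math.log2(i) for i, r in enumerate(rels[1:], start=2))

def ndcg_at_n(rels, n):
    ideal = dcg_at_n(sorted(rels, reverse=True), n)
    return dcg_at_n(rels, n) / ideal if ideal > 0 else 0.0

print(ndcg_at_n([2, 0, 1, 2, 0], n=5))   # graded relevance of a ranked list
```

Under this metric, we compared the following reranking strategies: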
• PRF: Pseudo-Relevance Feedback [143]. A support vector machine (SVM) classifier was trained to perform the reranking, based on the assumption that the top-ranked questions are in general more relevant than the low-ranked results. The initial question ranking list was generated based on Eqn.(3.5). (Baseline 1)
• RW: Random walk based reranking [53]. This is a typical simple graph-based reranking method, jointly exploiting both the initial relevance probabilities and the semantic similarity between questions. The stationary probability of the random walk was used to compute the final relevance scores. The initial relevance probability of each question was estimated based on Eqn.(3.5). (Baseline 2)
• CHL: Conventional hypergraph learning [54]. The weights of the different hyperedges were not dynamically learned but fixed according to the initial estimation as described in Section 6.4.1. (Baseline 3)
• APHL: Our proposed adaptive probabilistic hypergraph learning approach, with alternating optimization between W and f.
For each method mentioned above, the involved parameters were carefully tuned, and the parameters with the best performance were used to report the final comparison results. Meanwhile, the ground truth for these four strategies was created by a manual labelling procedure through a pooling method. Specifically, each testing question has a pool that was constructed by merging the four top-50 lists of semantically similar questions recommended by each strategy. Five human annotators with diverse backgrounds were then invited to label all the questions, pool by pool. Each question was labeled as very relevant (score 2), relevant (score 1) or irrelevant (score 0) with respect to the given question. We performed a voting to establish the final relevance level of each question. For the cases where two classes received the same number of ballots, a discussion was carried out among the labelers to decide the final ground truth.

[Figure 3.8 about here; x-axis: NDCG depth (5, 10, 20, 30, 50)]
Figure 3.8: Performance comparison of different reranking-based question space inference approaches in terms of NDCG at different depths.
Figure 3.8 illustrates the experimental results. From this figure, we observe that the proposed approach consistently and substantially outperforms the other publicly disclosed state-of-the-art reranking algorithms across various depths of NDCG. Among these four methods, the two hypergraph-based learning approaches show superiority over the other two. One possible reason is the unreliable initial ranking list resulting from the rough estimation. The other main reason is that hypergraph-based learning is able to capture the high-order relationships among questions, i.e., the summarized local grouping information, in contrast to the simple pairwise relationships characterized by the other two approaches. From this figure, we can also observe that our proposed method performs stably better than the conventional hypergraph learning approach. This well supports the claim that it is better to simultaneously learn the question relevance scores and the hyperedge weights.
It is well known that for the annotation task, precision is usually more important than recall. Therefore, we adopted two metrics that capture precision from different aspects. The first one is the average S@K over all testing questions, which measures the probability of finding a relevant tag among the top K recommended tags. To be specific, for each testing question, S@K is assigned 1 if a relevant tag is ranked in the top K positions and 0 otherwise. The second one is the average P@K, which stands for the proportion of recommended tags that are relevant.
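Both metrics are straightforward to compute from the binary relevance of the ranked tag list; a minimal sketch:

```python
def s_at_k(relevant, k):
    """1 if any of the top-K suggested tags is relevant, else 0."""
    return int(any(relevant[:k]))

def p_at_k(relevant, k):
    """Proportion of relevant tags among the top K."""
    return sum(relevant[:k]) / k

rankings = [[1, 0, 1, 0, 0], [0, 1, 0, 0, 1]]   # toy binary relevance lists
print(sum(s_at_k(r, 1) for r in rankings) / len(rankings))   # average S@1
print(sum(p_at_k(r, 5) for r in rankings) / len(rankings))   # average P@5
```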
Table 3.3 presents the precision of relevant tag selection based on the above two metrics. It is observed that the performance in terms of S@1 is as high as 70%, which means that for up to 70% of the questions, our proposed annotation scheme can suggest a relevant tag at rank 1. Moreover, the value of S@5 almost ensures that at least one tag is relevant among the top 5 recommended tags. Besides, P@5 achieves 58% accuracy, which reflects that about 3 out of the top 5 tags on average are able to characterize the question topics well. From the view of efficacy, the performances of S@K and P@K confirm the high applicability of our proposed method in tag suggestion with human-computer interaction and in automatic tag annotation without human interference, respectively.

Table 3.3: The evaluation results of relevant tag selection in terms of different metrics.
As discussed above, the two positive parameters λ and µ play important roles in modulating the effects of the empirical loss and the weighting regularizer, respectively. The former is widely tuned in hypergraph learning algorithms. For the latter, as it varies from zero to infinity, the hyperedge weights accordingly vary from an extremely imbalanced case to an extremely balanced case [150]. Specifically, when µ = ∞, the proposed adaptive hypergraph reduces to the conventional hypergraph, since the optimal solution assigns identical weights to all hyperedges. On the contrary, if µ tends to zero, the optimal result is that only one weight is 1 and all others are 0.
In this section, we conducted a series of experiments to investigate the sensitivity of these two parameters. We first performed a grid search with flexible step sizes to seek the λ and µ with optimal reranking performance in terms of NDCG@20; the values 200 and 0.001 were located for λ and µ, respectively. The NDCG@20-λ curve is presented in Figure 3.9 with µ fixed at 0.001. As illustrated, the performance gradually increases as λ grows and peaks at a certain value; the performance then goes downward and finally becomes relatively stable. Similarly, Figure 3.10 shows the NDCG@20-µ curve with λ fixed at 200, where the performance varies with µ. With the increase of µ, more informative hyperedges are taken into consideration as their weights are updated from zero to nonzero values. It is also observed that when µ reaches a certain value, the performance starts to decrease; this is because more "incorrect" hyperedges are potentially introduced. However, based on these observations, we conclude that the performance of our proposed method varies only within (0.765, 0.803) when the parameters vary over a wide range, i.e., it is not very sensitive.

[Figures 3.9 and 3.10 about here: NDCG@20 versus λ, and NDCG@20 versus µ (µ axis: 0 to 16, ×0.001)]
3.7.6.1 Application Scenario
Since the social QA content naturally and quickly evolves over time, information seekers are usually overwhelmed by the huge amount of information routinely received. In this scenario, a table-of-contents-like navigational hierarchy depicting the topical relationships within the data archive would be more effective. In fact, some conventional cQA forums have partially provided taxonomy-based navigation, such as Y!A and Baidu Knows. However, these predefined taxonomies are fixed and only support a single topic per question, which does not coincide well with the inherent features of QA knowledge.
On the other hand, the invention of question tagging introduces an intuitive and easy way to organize QA content. In this chapter, we propose a method to automatically organize the set of QA pairs into various knowledge structures. The types of structures flexibly depend on user needs, which is essential for assisting information retrieval and user navigation.
Figure 3.11: The question distribution over our selected topic taxonomy. Only the top five dominant categories are illustrated.
3.7.6.2 Experiments
Suppose the hierarchical structure of Y!A is the desired ontology tree, containing 1,263 leaf-level nodes distributed over 26 top-level categories. For each tag in our tag vocabulary, we first map it directly to one leaf node by a tree search algorithm based on their semantic relatedness. Then the associated QA pairs with tag expansion automatically fall into the corresponding leaf nodes. Here our proposed question annotation scheme enlarges the tag number of each question up to 5 if it does not originally reach this threshold.
The semantic relatedness between two concepts is usually estimated based on path length in a well-structured corpus such as WordNet. However, such a method is not robust to web phrases due to their free forms, "ipad", for example. Inspired by the Google distance [31], we measure the semantic distance between a tag and a category name via their co-occurrence on Google,
d(t_i, t_j) = [max(log r(t_i), log r(t_j)) − log r(t_i, t_j)] / [log G − min(log r(t_i), log r(t_j))],    (3.27)

where G is the total number of documents retrieved from Google, r(t_i) is the number of hits for the search concept t_i, and r(t_i, t_j) is the number of web documents on which both t_i and t_j co-occur. Their semantic relevance is then defined as,

S(t_i, t_j) = exp(−d(t_i, t_j)).    (3.28)
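Given the hit counts r(·) returned by the search engine (obtaining them is engine-specific and not shown), Eqns.(3.27) and (3.28) translate directly into code:

```python
import math

def semantic_distance(r_i, r_j, r_ij, G):
    """Eqn. (3.27): normalized Google-distance-style measure from hit counts."""
    num = max(math.log(r_i), math.log(r_j)) - math.log(r_ij)
    den = math.log(G) - min(math.log(r_i), math.log(r_j))
    return num / den

def semantic_relevance(r_i, r_j, r_ij, G):
    """Eqn. (3.28)."""
    return math.exp(-semantic_distance(r_i, r_j, r_ij, G))

# toy counts: G = retrieved documents, r_i/r_j = hits, r_ij = co-occurrence hits
print(semantic_relevance(r_i=5e6, r_j=2e6, r_ij=4e5, G=1e10))
```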
The distribution of the collected questions over the top-level categories is partially illustrated in Figure 3.11, normalized over the 26 categories. It is observed that more than 1/5 of the participants are active in the "computer&internet" category. This result is consistent with Zhihu's operation mechanism: it invited many experts from the computer science community in its start-up stage.
Finally, we conducted a user study to further evaluate the performance of knowledge structure generation. First, 50 questions were randomly selected. After annotation, their tag sets almost doubled in size. Each of them was then shown to 5 assessors together with its assigned leaf-level categories based on our proposed structure generation method. Assessors independently judged each assigned category as "correct" if the category captures the question topic well; otherwise, they were encouraged to mark the assigned category as "incorrect". We found that around 76% of the assigned categories were labeled as correct on average. Without the expanded tag set, it would not be possible to come up with such a comprehensive structure, since the tag incompleteness problem is extremely serious, as previously mentioned.
Table 3.5: The classification accuracies for query selection with different features. SPP stands for search performance prediction.
We now evaluate the query generation and selection approach. For each QA pair, three queries are generated: from the question, from the answer, and from the combination of question and answer. As previously mentioned, five labelers participated in the ground truth labeling process. Each labeler selected the most informative query; they were allowed to perform searches on the web to compare the informativeness of the search results. The final ground truths were obtained by a majority voting. The distribution of the three classes is illustrated in Table 3.4.

Table 3.4: The ground truth distribution for query selection.

We adopt an SVM with the RBF kernel, and the parameters, including the radius parameter and the weighting parameter that modulates the regularization term and the loss term, are established by 5-fold cross-validation; a minimal sketch of this setup follows below. Table 3.5 shows the classification results. From the results we can see that integrating the POS histogram with the search performance prediction achieves better performance than using the POS histogram or the retrieval performance prediction alone. The classification accuracies on the Y!A subset, the WikiAnswers subset and the whole dataset are 72.19%, 76.38% and 74.47%, respectively. The QA pairs in WikiAnswers are well presented semantically and syntactically, which results in the higher accuracy of query generation and selection on that subset.
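A compact sklearn sketch of this training setup, run on synthetic stand-in data since the labeled 42-dimensional features are not reproducible here; the hyperparameter grid is an assumption:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((300, 42))          # 42-dim features per QA pair (synthetic stand-in)
y = rng.integers(0, 3, 300)        # 0 = question, 1 = answer, 2 = combination

grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]}
clf = GridSearchCV(SVC(kernel="rbf"), grid, cv=5)   # 5-fold cross-validation
clf.fit(X, y)
print(clf.best_params_, clf.best_score_)
```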
3.8 Summary
This chapter studies user tagging behaviours with a representative real-world dataset, and presents a novel scheme to automatically annotate social questions, which tackles the incompleteness and bias problems of question tags. For a given question, the scheme first constructs an adaptive probabilistic hypergraph to infer the semantically similar question space. Based on this question space, a collection of probably relevant tags is roughly identified. Comprehensive information cues from users, questions and tags are seamlessly integrated into this hypergraph. This step narrows down the suggested tag candidates. Our scheme then performs a heuristic approach to further filter the tag candidates by simultaneously damping generic tags, penalizing overly specific tags, and rewarding tags from semantically closer questions. It aims to strengthen the annotation by keeping off subjective, ambiguous and generic tags. The experimental results have demonstrated that our scheme achieves promising performance for question annotation and greatly enhances flexible knowledge organization.
This work begins a new research direction: flexibly organizing QA knowledge by means of question annotation. However, the current approach overlooks the descriptive terms extracted from QA pairs. Also, this chapter only conducts a preliminary analysis of ontology generation. Thus, more in-depth research remains to be investigated.