
A Vietnamese Text-based Conversational Agent (Master's thesis)


DOCUMENT INFORMATION

Basic information

Title: A Vietnamese Text-based Conversational Agent
Authors: Nguyen Quoc Dai, Dat Quoc Nguyen, Son Bao Pham
Supervisor: Dr. Pham Bao Son
Institution: University of Engineering and Technology
Major: Computer Science
Type: thesis
Year of publication: 2011
City: Hanoi
Pages: 58
File size: 827.31 KB


Nguyen Quoc Dai

Faculty of Information Technology University of Engineering and Technology Vietnam National University, Hanoi

Supervised by

Dr Pham Bao Son

A thesis submitted in fulfillment of the requirements

for the degree of Master of Science in Computer Science

November 2011


or diploma at University of Engineering and Technology (UET/Coltech) or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UET/Coltech or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project’s design and conception or in style, presentation and linguistic expression is acknowledged.’

Hanoi, November 23rd, 2011

Signed


... systems. Nevertheless, in existing rule-based approaches, manually creating the rules is error-prone and expensive in time and effort. In this thesis, we focus on introducing a rule-based approach that offers an intuitive way to create compact rules for extracting an intermediate representation of input questions. Experimental results are promising: our system achieves reasonable performance, and the results demonstrate that it is straightforward to adapt to new domains and languages.

More importantly, this thesis introduces a Vietnamese text-based conversational agent architecture on a specific knowledge domain, which is integrated in a question answering system. When the question answering system fails to provide answers to user input, our conversational agent can step in to interact with users and provide answers. Experimental results are promising: our Vietnamese text-based conversational agent achieves positive feedback in a study conducted in the university academic regulation domain.

Publications:

• Dai Quoc Nguyen, Dat Quoc Nguyen and Son Bao Pham. A Vietnamese Text-based Conversational Agent. In Proc. of The 25th International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA/AIE 2012), Springer-Verlag LNAI, pp. 699-708.

• Dai Quoc Nguyen, Dat Quoc Nguyen and Son Bao Pham. A Semantic Approach for Question Analysis. In Proc. of The 25th International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA/AIE 2012), Springer-Verlag LNAI, pp. 156-165.

• Dat Quoc Nguyen, Dai Quoc Nguyen and Son Bao Pham. Systematic Knowledge Acquisition for Question Analysis. In Proc. of the 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011), ACL Anthology, pp. 406-412.

• Dai Quoc Nguyen, Dat Quoc Nguyen, Khoi Trong Ma and Son Bao Pham. Automatic Ontology Construction from Vietnamese text. In Proceedings of the 7th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE’11), IEEE, pp. 485-488.

• Dat Quoc Nguyen, Dai Quoc Nguyen, Son Bao Pham and Dang Duc Pham. Ripple Down Rules for Part-Of-Speech Tagging. In Proc. of 12th International Conference on Intelligent Text Processing and Computational Linguistics (CICLING 2011), Springer-Verlag LNCS, part I, pp. 190-201.

• Dai Quoc Nguyen, Dat Quoc Nguyen and Son Bao Pham. A Vietnamese question answering system. In Proceedings of the 2009 International Conference on Knowledge and Systems Engineering (KSE 2009), IEEE CS, pp. 26-32.


First and foremost, I would like to express my deepest gratitude to my supervisor, Dr Pham Bao Son, for his patient guidance and continuous support throughout the years. He always appears when I need help, and responds to queries so helpfully and promptly.

I would like to give my honest appreciation to my younger brother, Nguyen Quoc Dat, for his great support.

I would like to specially thank Prof Bui The Duy and my colleagues for their help through my time at Human Machine Interaction Laboratory, UET/Coltech.

I sincerely acknowledge the Vietnam National University, Hanoi, Toshiba Foundation Scholarship, and especially Dr Pham Bao Son for financially supporting my master study.

Finally, this thesis would not have been possible without the support and love of my mother and my father. Thank you!


To my family ♥


Table of Contents

1 Introduction
  1.1 A Semantic Approach for Question Analysis
  1.2 A Vietnamese Text-based Conversational Agent
  1.3 Thesis Organisation

2 Literature review
  2.1 Text-based conversational agents
    2.1.1 Using keywords for pattern matching
    2.1.2 Using the sentence similarity measure for pattern matching
  2.2 FrameScript Scripting Language
  2.3 Question answering systems

3 Our Question Answering System Architecture
  3.1 Vietnamese Question Answering System
    3.1.1 Natural language question analysis component
      3.1.1.1 Intermediate representation of an input question
      3.1.1.2 Question analysis
    3.1.2 Answer retrieval component
  3.2 Using FrameScript for question analysis
    3.2.1 Preprocessing module
    3.2.2 Syntactic analysis module
    3.2.3 Semantic analysis module

4 Text-based Conversational Agent for Vietnamese
  4.1 Overview of architecture
  4.2 Determining separate contexts
  4.3 Identifying hierarchical contexts

5 Evaluation and Discussion
  5.1 Experimental results for Vietnamese text-based conversational agent
  5.2 Question Analysis for English
  5.3 Discussion

A Scripting patterns for English question analysis
B Definitions of question-class types
C Definitions of question-structures

List of Figures

2.1 O’Shea et al.’s conversational agent framework
2.2 Aqualog’s architecture
3.1 Architecture of our question answering system
3.2 Architecture of the natural language question analysis component using FrameScript
4.1 Architecture of our Vietnamese text-based conversational agent

List of Tables

4.1 Script examples of “subjects”
4.2 Transformations between contexts
4.3 Order of transformation rules
4.4 Ordered transformation between contexts
5.1 List of transformations among contexts
5.2 Unsatisfying analysis
5.3 The satisfied degree of students
5.4 Number of rules corresponding with each question-structure type
5.5 Number of rules with conditional responses
5.6 Number of questions corresponding with each question-structure type
5.7 Error results

NLIDB Natural Language Interface to DataBase
POS Part-of-Speech
NLP Natural Language Processing
GUI Graphic User Interface

Chapter 1 Introduction

The goal of question answering systems is to give answers to the user’s questions instead of ranked lists of related documents as used by most current search engines (Hirschman and Gaizauskas, 2001). The natural language question analysis component is the first component in any question answering system. This component creates an intermediate representation of the input question, which is expressed in natural language, to be utilized in the rest of the system.

To the best of our knowledge, most published work on translating a natural language question into an explicit intermediate representation in question answering systems uses a rule-based approach. Some question answering systems such as (Lopez et al., 2007; Phan and Nguyen, 2010) manually defined a list of sequence rule structures to analyze questions. However, in these rule-based approaches, manually creating the rules is error-prone and expensive in time and effort.

In this thesis, we present an approach to return an intermediate representation of a question via the FrameScript scripting language (McGill et al., 2003). Natural language questions are transformed into intermediate representation elements, which include the construction type of the question, the question class, the keywords in the question and the semantic constraints between them. FrameScript allows users to intuitively write rules to directly extract the output tuple.

A text-based conversational agent is a program allowing conversational interactions between human and machine by using natural language through text. The text-based conversational agent uses scripts organized into contexts comprising hierarchically constructed rules. The rules consist of patterns and associated responses, where the input is matched based on patterns and the corresponding responses are sent to the user as output.

We focus on the analysis of input text in building a conversational agent. Recently, the input analysis of users’ statements has been developed following two main approaches: using keywords (ELIZA (Weizenbaum, 1983), ALICE (Wallace, 2001), ProBot (Sammut, 2001)) and using similarity measures (O’Shea et al., 2010; Graesser et al., 2004; Traum, 2006) for pattern matching. The approaches using keywords usually utilize a scripting language to match the input statements, while the other approaches measure the similarity between the statements and patterns from the agent’s scripts.

In this thesis, we introduce a Vietnamese text-based conversational agent architecture on a specific knowledge domain. Our system aims to direct the user’s statement into an appropriate context. The contexts are structured in a hierarchy of scripts consisting of rules in the FrameScript language (McGill et al., 2003). In addition, our text-based conversational agent was constructed to be integrated in a Vietnamese question answering system. Our conversational agent provides not only information related to the user’s statement but also the necessary knowledge to support our question answering system when it is unable to find an answer.

The knowledge domain we used to build our text-based conversational agent is the academic regulation at Vietnam National University, Hanoi (VNU). The academic regulation book helps students to know the course programs, the regulation of examinations and the discipline at VNU. However, most students prefer not to read the academic regulation book. Therefore, our contribution creates an interaction channel to offer the necessary information to students. Once students state what they are interested in within the academic regulation, our text-based conversational agent responds to these statements by providing the related information in detail. Furthermore, our conversational agent also interacts with students by asking whether they want to know other information.


Chapter 2 Literature review

In this chapter, we review related work using text-based approaches for conversational agents (CA). Section 2.1 describes the approaches that construct rules to match the user’s natural language utterances by using keywords (in section 2.1.1) and by using a sentence similarity measure (in section 2.1.2). In addition, section 2.2 covers the basic background on the FrameScript scripting language that we have been working on, while section 2.3 reviews domain-specific question answering systems.

2.1 Text-based conversational agents

2.1.1 Using keywords for pattern matching

ELIZA (Weizenbaum, 1983) was one of the earliest text-based conversational agents, based on simple pattern matching that identifies keywords in the user’s statement. ELIZA then transforms the user’s statement according to an appropriate rule and generates an output response. The procedure by which ELIZA responds to a user input consists of five steps:

• Identify the important keywords appearing in the user’s statement

• Define some minimal context within which the selected keyword occurs

• Determine an appropriate transformation rule

• Generate responses when the input text contains no keywords

• Provide an editing facility for scripts at the script-writing level

Transformation rules serve to decompose a data string according to certain criteria and to reassemble a decomposed string according to certain assembly specifications. Therefore, the input is analyzed based on the decomposition rules triggered by keywords, and responses are generated by the reassembly rules associated with the selected decomposition rules. For example, encountering the input sentence:

“It seems that you like me”

this sentence is decomposed into the four parts:

(1) It seems that (2) you (3) like (4) me

by using the decomposition rule:

(0 YOU 1 ME)

The associated response might then be:

“What makes you think I like you”

by using the reassembly rule:

(WHAT MAKES YOU THINK I 3 YOU)

An integer 0 in a decomposition rule matches any number of words, and a non-zero integer “n” in a decomposition rule indicates that exactly “n” words will be matched, while the integer 3 in the above reassembly rule shows that the third part of the decomposed sentence is inserted in its place to reply to the input sentence. The input sentence is scanned from left to right and each word is looked up in a dictionary of keywords, so only decomposition rules containing a keyword that was found need to be tried.

An ELIZA script consists mainly of a set of list structures as follows:

(K ((D1) (R1,1) (R1,2) ... (R1,m1))
   ((D2) (R2,1) (R2,2) ... (R2,m2))
   ...
   ((Dn) (Rn,1) (Rn,2) ... (Rn,mn)))

where K is the keyword, Di is the i-th decomposition rule associated with K, and Ri,j is the j-th reassembly rule associated with the i-th decomposition rule. Any number of decomposition rules may be associated with a given keyword, and any number of reassembly rules with any specific decomposition rule, since there are no predetermined ordering limitations.
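The decomposition and reassembly mechanism described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Weizenbaum's implementation: the function names (decompose, reassemble, respond) and the dictionary layout of SCRIPT are hypothetical, the input is split on whitespace without punctuation handling, and only the first reassembly rule of a matching decomposition rule is used.

```python
def decompose(rule, words):
    """Match a decomposition rule against a list of words.

    "0" matches any number of words, a positive integer n matches exactly
    n words, and any other token matches that literal word (case-insensitive).
    Returns the list of matched parts, or None if the rule does not match.
    """
    if not rule:
        return [] if not words else None
    head, rest = rule[0], rule[1:]
    if head == "0":
        # Try the shortest stretch of words first, then longer ones.
        for n in range(len(words) + 1):
            tail = decompose(rest, words[n:])
            if tail is not None:
                return [words[:n]] + tail
        return None
    if head.isdigit():
        n = int(head)
        if len(words) < n:
            return None
        tail = decompose(rest, words[n:])
        return None if tail is None else [words[:n]] + tail
    if words and words[0].upper() == head.upper():
        tail = decompose(rest, words[1:])
        return None if tail is None else [words[:1]] + tail
    return None


def reassemble(rule, parts):
    """Build a response; an integer k is replaced by the k-th matched part."""
    out = []
    for token in rule:
        if token.isdigit():
            out.extend(parts[int(token) - 1])
        else:
            out.append(token)
    return " ".join(out)


# Hypothetical script layout: keyword -> [(decomposition, [reassembly, ...])]
SCRIPT = {
    "YOU": [
        (["0", "YOU", "1", "ME"],
         [["WHAT", "MAKES", "YOU", "THINK", "I", "3", "YOU"]]),
    ],
}


def respond(sentence, script=SCRIPT):
    """Scan left to right for a keyword and apply its first matching rule."""
    words = sentence.split()
    for word in words:
        for decomposition, reassemblies in script.get(word.upper(), []):
            parts = decompose(decomposition, words)
            if parts is not None:
                return reassemble(reassemblies[0], parts)
    return None
```

Calling respond("It seems that you like me") decomposes the sentence into the four parts listed earlier and inserts the third part ("like") into the reassembly rule.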


ALICE (Wallace, 2001) is a text-based conversational agent, or chat robot, utilizing an XML language called Artificial Intelligence Markup Language (AIML). AIML files consist of category tags representing rules; each category tag contains a pair of pattern and template tags. All categories are stored in a tree. The system searches for the pattern matching a user input by using depth-first search in the tree, and produces the appropriate template as a response. For example, consider the categories below:

<topic name="MOVIES">
  <category>
    <pattern>YES</pattern>
    <that>DO YOU LIKE ROMANTIC MOVIES</that>
    <template>What is your favourite romantic movie?</template>
  </category>
  <category>
    <pattern>YES</pattern>
    <that>DO YOU LIKE ACTION MOVIES</that>
    <template>What is your favourite action movie?</template>
  </category>
</topic>

When the client says yes, the program must discover the robot’s previous utterance. If the robot asked “Do you like romantic movies?”, the response sent in reply is “What is your favourite romantic movie?”
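The role of the that tag can be illustrated with a small Python sketch. It is a simplification under stated assumptions: real AIML interpreters store categories in a tree and support wildcards, whereas this sketch scans a flat list and compares normalized strings exactly. The names Category, normalize and reply are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Category:
    pattern: str            # normalized user input to match
    that: Optional[str]     # normalized previous bot utterance, if required
    template: str           # response to send


def normalize(text):
    """Uppercase the text and drop punctuation, roughly as AIML patterns expect."""
    kept = "".join(c for c in text.upper() if c.isalnum() or c.isspace())
    return " ".join(kept.split())


CATEGORIES = [
    Category("YES", "DO YOU LIKE ROMANTIC MOVIES",
             "What is your favourite romantic movie?"),
    Category("YES", "DO YOU LIKE ACTION MOVIES",
             "What is your favourite action movie?"),
]


def reply(user_input, previous_bot_utterance, categories=CATEGORIES):
    """Return the template of the first category whose pattern and that both match."""
    said = normalize(user_input)
    before = normalize(previous_bot_utterance or "")
    for category in categories:
        if said != category.pattern:
            continue
        if category.that is None or category.that == before:
            return category.template
    return None
```

The same user input "yes" therefore yields different responses depending on what the robot said last.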

AIML is clever and simple, easy to implement, and a good start for beginners writing simple bots. However, it is difficult to write and debug more discriminating patterns, and it is very hard to know all the available transformations because AIML depends on self-modifying the input.

Sammut (2001) presented a text-based CA called ProBot that is able to extract data from users. ProBot’s scripts are typically organized into hierarchical contexts consisting of a number of organized rules to handle unexpected inputs. Concurrently, McGill et al. (2003) derived the rule system of the FrameScript scripting language (see section 2.2) from ProBot’s scripts (Sammut, 2001). FrameScript (McGill et al., 2003) provides for the rapid prototyping of conversational interfaces and simplifies the writing of scripts.

2.1.2 Using the sentence similarity measure for pattern matching

O’Shea et al. (2008, 2010) proposed a text-based conversational agent framework (shown in figure 2.1) using semantic analysis. All patterns in the scripts are natural language sentences. The pattern matching uses a sentence similarity measure (Li et al., 2006) to calculate the similarity between sentences from the scripts and the user input. The highest ranked sentence is selected and its associated response is sent as output.

Figure 2.1: O’Shea et al.’s conversational agent framework

Scripts used in the framework consist of contexts relating to a specific topic of conversation. Each context contains one or more rules, and each rule uses “s” to represent a natural language sentence and “r” to represent a response statement. For example, consider the following rule:

<Rule_01>
s: I’m a student
r: Which university do you study?

With a user’s statement:

“I am a master student” or

“I am a phd student”

This input and the natural language sentences from the scripts are passed to the sentence similarity measure. The sentence similarity measure then calculates a firing strength for each sentence pair to rank the sentences. In the above example, the highest ranked sentence is “I’m a student” and its associated response sent to the user is “Which university do you study?”
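The ranking step can be sketched as follows. Note the substitution: the framework uses the semantic sentence similarity measure of Li et al. (2006), whereas this sketch uses a plain Jaccard word-overlap score as a stand-in so that it stays self-contained. The second rule, the contraction handling and all function names are hypothetical additions for illustration.

```python
import re


def tokens(sentence):
    """Lowercase, expand the contraction "I'm", and return the set of words."""
    text = sentence.lower().replace("\u2019", "'").replace("i'm", "i am")
    return set(re.findall(r"[a-z']+", text))


def similarity(a, b):
    """Jaccard overlap of word sets; a stand-in for a semantic similarity measure."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Each rule pairs a natural language sentence (s) with a response (r).
RULES = [
    ("I'm a student", "Which university do you study?"),
    ("I like football", "Which team do you support?"),  # hypothetical rule
]


def respond(user_input, rules=RULES):
    """Rank all rule sentences against the input and fire the best one."""
    best_sentence, best_response = max(
        rules, key=lambda rule: similarity(user_input, rule[0])
    )
    return best_response
```

With this stand-in measure, both "I am a master student" and "I am a phd student" score highest against the sentence "I'm a student", so the same response is returned for either input.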

The advantage of using a sentence similarity measure for pattern matching is that rule structures are simplified and reduced in size and complexity. By contrast, this approach cannot retrieve information from an input to insert into the response, as the keyword-based approaches presented in section 2.1.1 can.

Graesser et al. (2004) presented a conversational agent called AUTOTUTOR that matches input statements using Latent Semantic Analysis. Traum (2006) adapted the effective question answering characters (Leuski et al., 2006) to build a conversational agent that also employs Latent Semantic Analysis for pattern matching.

2.2 FrameScript Scripting Language

FrameScript (McGill et al., 2003) is a language for creating multi-modal user interfaces. It borrows from Sammut’s ProBot (Sammut, 2001) to enable rule-based programming, frame representations and simple function evaluation. The FrameScript scripting language also provides a set of tools for representing knowledge and interacting with users and external devices.

Each script in FrameScript (McGill et al., 2003) includes a list of rules that are matched against user input and used to give the appropriate response. Rules are grouped into particular contexts of the form: context_name :: rule_set. The scripting rules in the FrameScript language consist of patterns and responses with the form:

{response 1 | response 2 | ... | another response},

in which any response may be chosen randomly for user output.

In addition, responses utilize ‘#’ to perform some action such as changing the current context. For example, #goto(a_script) transfers a conversation or interaction from one context to another. Similarly, ‘^’ is used to perform actions, except that when the following expression is evaluated its result is inserted into the response rather than thrown away. Some response expressions may also depend on conditions holding true, as in the second rule of the example below.

Furthermore, some pattern elements create a numbered match component when a pattern matches. These components are segments of the input that can be referred to in a response using ‘^’. Pattern elements that identify match components are wild-cards (* and ~), alternatives and non-terminals. When ‘^’ is followed by an integer, the numbered pattern component associated with that integer is placed in the output response. Consider the following example:

{My name is | I’m} * ==>
[ Hello ^2 How old are you? ]

I am <Number> years old ==>
[ ^(^1 <= 20) -> Are you a student?
| How do you do? ]

The following transcript of a dialogue illustrates the above example:

User: My name is X
CA: Hello X How old are you?
User: I am 19 years old
CA: Are you a student?
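To make the numbered match components and the conditional response concrete, the two example rules can be emulated in Python with regular expressions. This is a sketch of the behaviour only, not of how FrameScript is implemented; the function name respond is hypothetical, and a straight apostrophe is assumed in "I'm". Regex group numbers play the role of the numbered components, so group 2 of the first rule corresponds to the second component (the wild-card).

```python
import re

# Rule 1: {My name is | I'm} *   -> component 1 = alternative, component 2 = wild-card
NAME_RULE = re.compile(r"(My name is|I'm)\s+(.+)", re.IGNORECASE)

# Rule 2: I am <Number> years old -> component 1 = the number
AGE_RULE = re.compile(r"I am (\d+) years old", re.IGNORECASE)


def respond(user_input):
    """Try the rules from top to bottom and build the response."""
    text = user_input.strip()

    match = NAME_RULE.fullmatch(text)
    if match:
        # The second match component is inserted into the response.
        return "Hello %s How old are you?" % match.group(2)

    match = AGE_RULE.fullmatch(text)
    if match:
        # Conditional response: the condition is evaluated on component 1.
        if int(match.group(1)) <= 20:
            return "Are you a student?"
        return "How do you do?"

    return None
```

Feeding the two user turns of the transcript into respond reproduces the agent's two replies; an age above 20 selects the alternative response instead.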

An input received from the user is given to a domain in order to ensure that the input is matched against the correct scripts. A script can be registered as a topic in a domain to become the current script and process the input. When a script is registered as a topic, the domain uses the script’s trigger to determine whether or not an input activates that topic. If a topic doesn’t have a trigger, any input will activate it. When a topic’s trigger matches the input, it becomes the current context and the current topic.


Example ::
    domain example
    trigger {* {Hi | hi | Hello | hello} *}
    * {Hi | hi | Hello | hello} * ==> [Hi there!]

When writing complex scripts where scripts have similar behaviours, FrameScript makes it possible to use inheritance so that rules can be shared between scripts. Moreover, FrameScript allows defining failsafes for scripts. A failsafe is another script whose rules are used if an input fails to match any of the rules of a script.

The order in which a domain determines the rules against which the input should be matched is:

1. triggers of the topics
2. the current context
3. the failsafe of the current context
4. the current topic
5. the failsafe of the current topic
6. the failsafe for the domain

When an input is compared to the rules of a script, the input is first compared to the rules specifically defined by the script. If none of these rules match, the input is matched against the rules of the script’s parents. The rules of the scripts are tried in top to bottom order.
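The matching order and the inheritance lookup can be sketched in Python as below. This is one possible reading of the description, not FrameScript's actual implementation: the classes Script and Domain, their fields, and the use of simple substring keywords instead of real patterns are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Script:
    name: str
    rules: List[Tuple[str, str]] = field(default_factory=list)  # (keyword, response)
    parent: Optional["Script"] = None
    failsafe: Optional["Script"] = None
    trigger: Optional[str] = None


def match_script(script, text):
    """Try a script's own rules top to bottom, then the rules of its parents."""
    lowered = text.lower()
    while script is not None:
        for keyword, response in script.rules:
            if keyword in lowered:  # an empty keyword matches any input
                return response
        script = script.parent
    return None


@dataclass
class Domain:
    topics: List[Script]
    failsafe: Optional[Script] = None
    current_context: Optional[Script] = None
    current_topic: Optional[Script] = None

    def respond(self, text):
        # 1. triggers of the topics: a firing trigger switches context and topic
        for topic in self.topics:
            if topic.trigger is None or topic.trigger in text.lower():
                self.current_context = self.current_topic = topic
                break
        candidates = [
            self.current_context,                             # 2. current context
            getattr(self.current_context, "failsafe", None),  # 3. its failsafe
            self.current_topic,                               # 4. current topic
            getattr(self.current_topic, "failsafe", None),    # 5. its failsafe
            self.failsafe,                                    # 6. domain failsafe
        ]
        for script in candidates:
            response = match_script(script, text)
            if response is not None:
                return response
        return None


greeting = Script("greeting", rules=[("hello", "Hi there!")], trigger="hello")
fallback = Script("fallback", rules=[("", "Sorry, I did not understand.")])
domain = Domain(topics=[greeting], failsafe=fallback)
```

Here domain.respond("Hello!") fires the greeting topic, while an unrelated input falls through every level until the domain failsafe answers.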

2.3 Question answering systems

Kinds of question answering systems range from closed-domain systems (aiming to answer questions in a specific domain) to open-domain systems (aiming to answer all questions asked). In our experience, open-domain systems focus on retrieving and ranking related documents corresponding to the input, while closed-domain systems focus on analyzing natural language questions to extract reliable terms.

Additionally, the natural language question analysis component is the first component in any question answering system. This component creates an intermediate representation of the input question, which is expressed in natural language, to be utilized in the rest of the system. The basis of the question parser is question classification, which can be defined as the task of mapping a given question to one of k classes based on the possible types of the answers (Li and Roth, 2002b). Subsequently, natural language question analysis techniques are used to identify keywords and semantic relations in input questions.

Therefore, our review of related work covers question answering systems with respect to their question analysis approaches, focusing on domain-specific ones.

Pattern-matching based systems

Closed-domain question answering systems are usually linked to relational databases and called natural language interfaces to databases. A natural language interface to a database (NLIDB) is a system that allows users to access information stored in a database by typing questions using natural language expressions (Androutsopoulos et al., 1995).

Early NLIDB systems used pattern-matching techniques to process users’ questions and generate corresponding answers. Sneiders (2002) presented an NLIDB system using question patterns covering the conceptual model of the database. The input is converted into an SQL query by using defined templates that contain entity slots, i.e. free space for data instances representing the primary concepts of the question. Some other open-domain systems presented in (Wu et al., 2003; Saxena et al., 2007) used pattern-matching techniques to respond to users’ requests.

The main advantage of the pattern-matching approach is its simplicity, and the system can perform well in certain applications. However, its shallowness often leads to poor results.

Semantic-based systems

Later NLIDBs respond to users’ questions by using a semantic grammar to parse the input into a syntax tree and mapping the tree to a database query. In semantic-based systems, the grammar’s categories (i.e. the non-leaf nodes appearing in the parse tree) do not have to correspond to syntactic concepts (Androutsopoulos et al., 1995). Semantic constraints are usually enforced by choosing semantic grammar categories, and the grammar’s categories can also be chosen to ease the mapping from the syntax tree to database objects.

Nguyen and Le (2008) introduced an NLIDB question answering system for Vietnamese employing semantic grammars. Their system includes two main modules: QTRAN and TGEN. QTRAN (Query Translator) maps a natural language question to an SQL query, while TGEN (Text Generator) generates answers based on the query result tables. QTRAN uses limited context-free grammars to analyze the user’s question into a syntax tree via the CYK algorithm. The syntax tree is then converted into an SQL query by using a mapping dictionary to determine names of attributes in Vietnamese, names of attributes in the database and names of individuals stored in these attributes.

The PRECISE system (Popescu et al., 2003) maps the natural language question to a unique semantic interpretation by analyzing some lexicons and semantic constraints. Stratica et al. (2003) described a template-based system to translate an English question into an SQL query by matching the syntactic parse of the question to a set of fixed semantic templates. Some other systems are based on semantic grammar rules, such as Planes (Waltz, 1978) and Eufid (Templeton and Burger, 1983). Semantic grammar-based approaches were considered an engineering methodology, which allows semantic knowledge to be easily included in the system.

Annotation-based systems

Recently, some question answering systems that used semantic annotations achieved good results in natural language question analysis. A well-known annotation-based framework is GATE (General Architecture for Text Engineering) (Cunningham et al., 2002), which has been used in many question answering systems such as the ontology-based AquaLog (Lopez et al., 2007) and QuestIO (Damljanovic et al., 2008) systems, and Galea’s open-domain system (Galea, 2003), especially for the natural language question analysis component.

AquaLog (Lopez et al., 2007), shown in figure 2.2, is an ontology-based question answering system for English and is the basis for the development of our system. A natural language question is mapped to a set of representations based on the intermediate triple, called a Query-Triple, through the Linguistic Component by using Java Annotation Patterns Engine (JAPE) grammars in GATE (Cunningham et al., 2002). The Relation Similarity Service takes a Query-Triple and processes it to provide queries with respect to the input ontology, called Onto-Triples. AquaLog then uses the Onto-Triples to return an answer to users.

Figure 2.2: AquaLog’s architecture

In our earlier work, we reported an approach to convert Vietnamese natural language questions into intermediate representation elements in query-tuples (Question-structure, Question-class, Term1, Relation, Term2, Term3) based on semantic annotations via JAPE grammars (Nguyen et al., 2009). The selected query-tuple type is more complex, aiming to cover a wider variety of question types in different languages.

In addition, we proposed a language-independent approach to acquire JAPE rules in a systematic manner which avoids unintended interaction among rules (Nguyen et al., 2011). Phan and Nguyen (2010) presented an approach to syntactically and semantically map Vietnamese questions into triple-like structures of Subject, Verb and Object, also utilizing JAPE grammars.

The START (Katz, 1997; Katz et al., 2006) question answering system also used natural language annotations (Katz, 1997), without utilizing GATE. The lexical database WordNet (Fellbaum, 1998) is an important resource for natural language applications. Since the appearance of WordNet, almost all question answering systems have used it to provide information for analyzing questions.

Chapter 3 Our Question Answering System Architecture

... a natural language phrase and elements in the ontology. The communication between the front-end and back-end is an intermediate representation of the question, which captures the semantic structure of the user’s question.

Furthermore, we focus on describing a rule-based approach to directly extract the intermediate representation elements of a question via the FrameScript scripting language (McGill et al., 2003) (in section 3.2).

3.1 Vietnamese Question Answering System

The architecture of our question answering system is shown in figure 3.1. It includes two components: natural language question analysis and answer retrieval.

The question analysis component takes the user’s question as an input and returns a query-tuple representing the question in a compact form. The role of this intermediate representation is to provide structured information about the input question for later processing such as retrieving answers.

The answer retrieval component includes two main modules: Ontology mapping and Answer extraction. It takes an intermediate representation produced by the question analysis component and an ontology as its input to generate semantic answers.

Figure 3.1: Architecture of our question answering system

3.1.1.1 Intermediate representation of an input question

The intermediate representation used in our approach aims to cover a wider variety of question types. It consists of a question-structure and one or more query-tuples in the following format:

( question-structure, question-class, Term1, Relation, Term2, Term3 )

where Term1 represents a concept (object class), Term2 and Term3, if they exist, represent entities (objects), and Relation (property) is a semantic constraint between terms in the question. This representation is meant to capture the semantics of the question.

Simple questions corresponding to basic constructions have only one query-tuple, and the question-structure of the question is that query-tuple’s question-structure. More complex questions, such as composite questions, are constructed from several sub-questions; each sub-question is described by a separate query-tuple with its own question-structure, and the overall question-structure captures this composition attribute. This representation is chosen so that it can represent a richer set of question types. Therefore, some terms or the relation in a query-tuple can be missing. For example, the composite question:

“list all students in the Faculty of Information Technology whose hometown is Hanoi?”

has a question-structure of type And with two query-tuples, where ? represents a missing element: ( UnknRel , List , students , ? , Faculty of Information Technology , ? ) and ( Normal , List , students , hometown , Hanoi , ? ).
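As an illustration of this intermediate representation, the query-tuple and the composite example above can be modelled with two small Python data classes. The class and field names are hypothetical and are not taken from the thesis implementation; a missing element is stored as None and rendered as "?".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QueryTuple:
    question_structure: str
    question_class: str
    term1: Optional[str] = None      # concept (object class)
    relation: Optional[str] = None   # semantic constraint (property)
    term2: Optional[str] = None      # entity (object)
    term3: Optional[str] = None      # entity (object)

    def render(self):
        """Print the tuple in the document's notation, with ? for missing parts."""
        parts = [self.question_structure, self.question_class,
                 self.term1, self.relation, self.term2, self.term3]
        return "( " + " , ".join("?" if p is None else p for p in parts) + " )"


@dataclass(frozen=True)
class QuestionRepresentation:
    question_structure: str          # e.g. "And" for a composite question
    tuples: List[QueryTuple]


# The composite example from the text above.
example = QuestionRepresentation(
    question_structure="And",
    tuples=[
        QueryTuple("UnknRel", "List", term1="students",
                   term2="Faculty of Information Technology"),
        QueryTuple("Normal", "List", term1="students",
                   relation="hometown", term2="Hanoi"),
    ],
)
```

Rendering each tuple of example reproduces the two query-tuples shown in the text, with the overall structure And held separately from the per-tuple structures.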

The definitions of the question categories HowWhy, YesNo, What, When, Where, Who, Many, ManyClass, List and Entity, and of the question-structures Normal, UnknTerm, UnknRel, Definition, Compare, ThreeTerm, Clause, Combine, And, Or, Affirm, Affirm_3Term and Affirm_MoreTuples, can be found in the appendices.

... be reused for further processing in subsequent modules. New modules are specifically designed to handle Vietnamese questions using JAPE grammars over existing linguistic annotations.

There are three modules that we use to get an intermediate representation of the user’s question: preprocessing, syntactic analysis and semantic analysis.

The preprocessing module generates TokenVn annotations representing a Vietnamese word with features such as part-of-speech to identify question-words and comparing-phrases or special-words by using JAPE rules.

The syntactic module is responsible for identifying noun phrases, question phrases and relation phrases between noun phrases or between noun phrases and question phrases. The different modules communicate through the annotations, for example,

Date posted: 17/12/2023, 01:45
