
An Introduction to Information Retrieval and Question Answering


Jimmy Lin

College of Information Studies, University of Maryland

Wednesday, December 8, 2004

Page 2

The Information Retrieval Cycle

[Figure: the retrieval cycle. Stages: Source Selection, Query Formulation, Search, Selection, Examination, Delivery. Items passed between stages: Resource, Query, Ranked List, Retrieved Documents. Feedback loops: Query Reformulation, Vocabulary Learning, and Relevance Feedback; Source Reselection.]

Annotation: natural-language query; moving from syntax to semantics to determine what is being asked.

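Page 3

[Figure: the retrieval cycle together with the system side. User-side labels: Source, Query Formulation, Selection, Ranked List, Examination, Documents, Delivery, Documents. System-side labels: acquired documents form the Collection; Indexing produces the Index.]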

Page 4

Types of Information Needs

• Ad hoc retrieval: find me documents “like this”
    - Identify positive accomplishments of the Hubble telescope since it was launched in 1991.
    - Compile a list of mammals that are considered to be endangered, identify their habitat and, if possible, specify what threatens them.
• Question answering
    - “Factoid”
        Who discovered Oxygen?
        When did Hawaii become a state?
        Where is Ayer’s Rock located?
        What team won the World Series in 1992?
    - “List”
        What countries export oil?
        Name U.S. cities that have a “Shubert” theater.
    - “Definition”
        Who is Aaron Copland?
        What is a quasar?

Page 5

IR is an Experimental Science!

• Formulate a research question: the hypothesis
• Design an experiment to answer the question
• Perform the experiment
    - Compare with a baseline “control”
• Does the experiment answer the question?
    - Are the results significant?
• Report the results!
• Rinse, repeat…
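One way to make "are the results significant?" concrete is a paired test over per-topic scores. The sketch below uses an exact two-sided sign test, chosen only because it is short and needs no external library; the slides do not say which test to use, and the per-topic scores are made-up numbers for illustration.

```python
from math import comb


def sign_test_p_value(baseline, experiment):
    """Exact two-sided sign test on paired per-topic scores (ties are dropped)."""
    wins = sum(e > b for b, e in zip(baseline, experiment))
    losses = sum(e < b for b, e in zip(baseline, experiment))
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Probability of k or fewer "minority" outcomes under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-topic effectiveness scores for a baseline and a new system.
baseline = [0.20, 0.35, 0.10, 0.50, 0.40, 0.25, 0.30, 0.15]
experiment = [0.30, 0.40, 0.15, 0.55, 0.45, 0.35, 0.28, 0.25]

p_value = sign_test_p_value(baseline, experiment)
print(f"p = {p_value:.4f}")
```

With seven wins and one loss over eight topics, the p-value is 18/256, about 0.07. That would not pass a 0.05 threshold even though the new system wins on almost every topic, which is why small topic sets make strong claims hard.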

Page 7

IR Test Collections

• Three components of a test collection:
    - Collection of documents (corpus)
    - Set of information needs (topics)
    - Sets of documents that satisfy the information needs (relevance judgments)
• Metrics for assessing “performance”
    - Precision
    - Recall
    - Other measures derived therefrom
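As a minimal sketch of these metrics, the code below computes set-based precision and recall for a single topic, plus the F-measure as one example of a derived measure. The document identifiers are invented for illustration, and real evaluations usually work on ranked lists rather than plain sets.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one topic, treating both inputs as sets."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical system output and relevance judgments for one topic.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]

p, r = precision_recall(retrieved, relevant)
f1 = f_measure(p, r)
print(p, r, f1)
```

Here two of the four retrieved documents are relevant (precision 1/2), and two of the three relevant documents were found (recall 2/3).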

Page 8

Where do they come from?

• TREC = Text REtrieval Conferences
    - Series of annual evaluations, started in 1992
    - Organized into “tracks”
• Test collections are formed by “pooling”
    - Gather results from all participants
    - Corpus/topics/judgments can be reused
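Pooling can be sketched as taking the union of the top-ranked documents from every participant's run, so that assessors judge only that pool. The run names, document identifiers, and the `depth` cutoff below are all assumptions made for the example.

```python
def build_pool(runs, depth):
    """Union of the top-`depth` documents from each participant's ranked run."""
    pool = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:depth])
    return pool


# Hypothetical ranked runs submitted by three participants for one topic.
runs = {
    "system_a": ["d3", "d1", "d7", "d2"],
    "system_b": ["d1", "d5", "d3", "d9"],
    "system_c": ["d8", "d1", "d2", "d4"],
}

pool = build_pool(runs, depth=2)
print(sorted(pool))
```

Documents outside the pool are never judged, so in this sketch anything ranked below position 2 by every system would simply have no relevance judgment.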

Page 9

Roots of Question Answering

• Information Retrieval (IR)
• Information Extraction (IE)

Page 10

Information Retrieval (IR)

• Can substitute “document” for “information”
• IR systems
    - Use statistical methods
    - Rely on frequency of words in query, document, collection
    - Retrieve complete documents
    - Return ranked lists of “hits” based on relevance
• Limitations
    - Answers questions indirectly
    - Does not attempt to understand the “meaning” of user’s query or documents in the collection
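To illustrate ranking by word frequency, here is a small TF-IDF scorer. TF-IDF is one common weighting scheme picked for the sketch, not necessarily the one the slides have in mind, and the three toy documents are invented.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; no stemming or stopword removal."""
    return text.lower().split()


def rank(query, docs):
    """Score each document by summed tf * idf over the query terms."""
    doc_tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in doc_tokens.values():
        df.update(set(tokens))
    scores = {}
    for doc_id, tokens in doc_tokens.items():
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                score += tf[term] * math.log(n_docs / df[term])
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy collection, for illustration only.
docs = {
    "d1": "oxygen was discovered in the eighteenth century",
    "d2": "hawaii became a state in 1959",
    "d3": "the discovery of oxygen changed chemistry",
}

ranking = rank("who discovered oxygen", docs)
print(ranking)
```

The output is a ranked list of whole documents rather than an answer, and "discovery" in d3 earns no credit for the query term "discovered": the scorer matches strings, not meaning.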

Page 11

Information Extraction (IE)

• IE systems
    - Identify documents of a specific type
    - Extract information according to pre-defined templates
    - Place the information into frame-like database records
• Templates = pre-defined questions
• Extracted information = answers
• Limitations
    - Templates are domain dependent and not easily portable
    - One size does not fit all!

Example template (weather disaster): Type, Date, Location, Damage, Deaths
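The weather-disaster template can be pictured as a record with one slot per pre-defined question. In the sketch below, the slot names come from the slide, while the class name, the keyword list, the regular expressions, and the sample sentence are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeatherDisaster:
    """Frame-like record: each slot is the answer to one pre-defined question."""
    disaster_type: Optional[str] = None
    date: Optional[str] = None
    location: Optional[str] = None
    damage: Optional[str] = None
    deaths: Optional[int] = None


# Hypothetical, hand-written extraction rules for this one domain.
DISASTER_TYPES = ("tornado", "hurricane", "flood", "blizzard")


def fill_template(text):
    record = WeatherDisaster()
    lowered = text.lower()
    for kind in DISASTER_TYPES:
        if kind in lowered:
            record.disaster_type = kind
            break
    deaths = re.search(r"kill(?:ed|ing) (\d+)", lowered)
    if deaths:
        record.deaths = int(deaths.group(1))
    place = re.search(r"\b(?:struck|hit) ([A-Z][a-z]+)", text)
    if place:
        record.location = place.group(1)
    return record


# Made-up report sentence.
report = "A tornado struck Exampleville on Tuesday, killing 3 people."
record = fill_template(report)
print(record)
```

Every rule here is tied to weather disasters, which is the portability problem in miniature: a new domain needs a new template and new rules.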

Page 12

Central Idea of Factoid QA

• Determine the semantic type of the expected answer
    - “Who won the Nobel Peace Prize in 1991?” is looking for a PERSON
• Retrieve documents that have the keywords “won”, “Nobel Peace Prize”, and “1991”
• Look for a PERSON near the keywords “won”, “Nobel Peace Prize”, and “1991”

Page 13

An Example

But many foreign investors remain sceptical, and western governments are withholding aid because of the Slorc's dismal human rights record and the continued detention of Ms Aung San Suu Kyi, the opposition leader who won the Nobel Peace Prize in 1991.

The military junta took power in 1988 as pro-democracy demonstrations were sweeping the country. It held elections in 1990, but has ignored their result. It has kept the 1991 Nobel peace prize winner, Aung San Suu Kyi - leader of the opposition party which won a landslide victory in the poll - under house arrest since July 1989.

The regime, which is also engaged in a battle with insurgents near its eastern border with Thailand, ignored a 1990 election victory by an opposition party and is detaining its leader, Ms Aung San Suu Kyi, who was awarded the 1991 Nobel Peace Prize. According to the British Red Cross, 5,000 or more refugees, mainly the elderly and women and children, are crossing into Bangladesh each day.

Who won the Nobel Peace Prize in 1991?
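The central idea can be sketched on fragments of the passages above: keep only entities of the expected type, then prefer the one with the most question keywords nearby. The entity annotations are supplied by hand here, standing in for a named-entity tagger, and the function name and the 12-token `window` are assumptions of the sketch.

```python
import re


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def best_answer(passages, keywords, expected_type, window=12):
    """Return the entity of `expected_type` with the most keyword tokens nearby."""
    keyword_tokens = {tok for kw in keywords for tok in tokenize(kw)}
    best_score, best_entity = 0, None
    for text, entities in passages:
        tokens = tokenize(text)
        for entity, entity_type in entities:
            if entity_type != expected_type:
                continue
            ent = tokenize(entity)
            # Find each occurrence of the entity and count keywords around it.
            for i in range(len(tokens) - len(ent) + 1):
                if tokens[i:i + len(ent)] != ent:
                    continue
                before = tokens[max(0, i - window):i]
                after = tokens[i + len(ent):i + len(ent) + window]
                score = len((set(before) | set(after)) & keyword_tokens)
                if score > best_score:
                    best_score, best_entity = score, entity
    return best_entity


# Passage fragments from the example; entity labels are hand-assigned.
passages = [
    ("the continued detention of Ms Aung San Suu Kyi, the opposition leader "
     "who won the Nobel Peace Prize in 1991",
     [("Aung San Suu Kyi", "PERSON")]),
    ("According to the British Red Cross, 5,000 or more refugees are "
     "crossing into Bangladesh each day",
     [("British Red Cross", "ORGANIZATION"), ("Bangladesh", "LOCATION")]),
]

answer = best_answer(passages, ["won", "Nobel Peace Prize", "1991"], "PERSON")
print(answer)
```

The type filter is what rules out "British Red Cross" and "Bangladesh"; keyword proximity only matters among candidates of the right type.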

Page 15

Question analysis

• Question word cues
    - Who → person, organization, location (e.g., city)
    - When → date
    - Where → location
    - What/Why/How → ??
• Head noun cues
    - What city, which country, what year
    - Which astronaut, what blues band, ...
• Scalar adjective cues
    - How long, how fast, how far, how old, ...
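These cues translate naturally into lookup tables. The sketch below is a toy rule-based classifier, assuming hypothetical type labels and a head noun that appears within three words of "what" or "which"; it returns an empty list for the questions the slide marks with "??".

```python
import re

# Cue tables follow the slide; the type labels themselves are assumptions.
QUESTION_WORD_TYPES = {
    "who": ["PERSON", "ORGANIZATION", "LOCATION"],
    "when": ["DATE"],
    "where": ["LOCATION"],
}
HEAD_NOUN_TYPES = {
    "city": ["CITY"],
    "country": ["COUNTRY"],
    "year": ["DATE"],
    "astronaut": ["PERSON"],
    "band": ["ORGANIZATION"],
}
SCALAR_ADJECTIVE_TYPES = {
    "long": ["DURATION", "LENGTH"],
    "fast": ["SPEED"],
    "far": ["DISTANCE"],
    "old": ["AGE"],
}


def expected_answer_types(question):
    """Return candidate answer types, or [] when no cue applies."""
    tokens = re.findall(r"[a-z']+", question.lower())
    if not tokens:
        return []
    first = tokens[0]
    if first == "how" and len(tokens) > 1:
        return SCALAR_ADJECTIVE_TYPES.get(tokens[1], [])
    if first in ("what", "which"):
        # Look for a known head noun shortly after the question word.
        for token in tokens[1:4]:
            if token in HEAD_NOUN_TYPES:
                return HEAD_NOUN_TYPES[token]
        return []
    return QUESTION_WORD_TYPES.get(first, [])


for q in ["Who discovered Oxygen?",
          "When did Hawaii become a state?",
          "What blues band recorded it?",
          "How old is the telescope?",
          "What is a quasar?"]:
    print(q, "->", expected_answer_types(q))
```

"What is a quasar?" falls through every table, which matches the slide: definition questions carry no surface cue for a semantic type.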

Page 17

Extracting Named Entities

Person: Mr. Hubert J. Smith, Adm. McInnes, Grace Chan

Title: Chairman, Vice President of Technology, Secretary of State

Country: USSR, France, Haiti, Haitian Republic

City: New York, Rome, Paris, Birmingham, Seneca Falls

Province: Kansas, Yorkshire, Uttar Pradesh

Business: GTE Corporation, FreeMarkets Inc., Acme

University: Bryn Mawr College, University of Iowa

Organization: Red Cross, Boys and Girls Club

Page 18

More Named Entities

Currency: 400 yen, $100, DM 450,000

Linear: 10 feet, 100 miles, 15 centimeters

Area: a square foot, 15 acres

Volume: 6 cubic feet, 100 gallons

Weight: 10 pounds, half a ton, 100 kilos

Duration: 10 days, five minutes, 3 years, a millennium

Frequency: daily, biannually, 5 times, 3 times a day

Speed: 6 miles per hour, 15 feet per second, 5 kph

Age: 3 weeks old, 10-year-old, 50 years of age

Page 19

How do we extract NEs?

• Heuristics and patterns
• Fixed-lists (gazetteers)
• Machine learning approaches
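The first two approaches can be sketched with regular expressions for measure-like entities and a fixed list for cities. The city list reuses the slide's examples; the patterns, labels, and sample sentence are assumptions, and the machine learning approach is not shown.

```python
import re

# Fixed list (gazetteer) built from the city examples on the earlier slide.
CITY_GAZETTEER = {"New York", "Rome", "Paris", "Birmingham", "Seneca Falls"}

# Hand-written patterns covering a few of the measure types; far from complete.
PATTERNS = [
    ("CURRENCY", re.compile(r"\$\d[\d,]*(?:\.\d+)?")),
    ("SPEED", re.compile(r"\b\d+ (?:miles|feet) per (?:hour|second)\b")),
    ("AGE", re.compile(r"\b\d+ (?:weeks|years) old\b")),
]


def extract_entities(text):
    """Return a sorted list of (entity text, label) pairs found in `text`."""
    entities = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            entities.append((match.group(0), label))
    for city in CITY_GAZETTEER:
        if re.search(r"\b" + re.escape(city) + r"\b", text):
            entities.append((city, "CITY"))
    return sorted(entities)


# Made-up sentence for illustration.
sentence = "The train from Rome to Paris cost $100 and ran at 6 miles per hour."
entities = extract_entities(sentence)
print(entities)
```

A gazetteer lookup cannot tell the city Paris from a person named Paris, which is one reason these approaches are usually combined.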

Page 20

Answer Type Hierarchy

Page 21

    - in the back of pick-up trucks
• Where are zebras most likely found?
    - near dumps
    - in the dictionary
• Why can't ostriches fly?
    - Because of American economic sanctions
• What’s the population of Maryland?
    - three

Page 22

Limitations?

Page 23

• Question answering is an exciting research area!
    - Lies at the intersection of information retrieval and natural language processing
    - A real-world application of NLP technologies
• The dream: a vast repository of knowledge we can “talk to”
• We’re a long way from there…
