Keyword Search in Databases- P27 docx


5.3 VARIATIONS OF KEYWORD SEARCH ON DATABASES 129

(maybe empty). For example, for the keyword query Q = {author, number, paper, XML}, one of the possible CIs is (C = {author.TID, paper.TID, paper.title contains “XML”}, a = paper.TID, F = count, w = author.TID). A CI is trivial if one of the following is satisfied: (1) C contains an attribute c = a such that c functionally determines a, or (2) C contains two attributes c_i and c_j that refer to the same attribute, or c_i is a foreign key of c_j. The set of non-trivial CIs can be enumerated by using the full-text index enabled in the RDBMS.

After enumerating all non-trivial CIs, for each CI = (C, a, F, w), it enumerates a set of Simple Query Networks (SQNs), where each SQN is a connected subgraph of the schema graph that satisfies the following conditions:

• Total - All tables in C are contained in the SQN.

• Minimal - It is not total if any node is removed from SQN.

• Node Clarity - Each node in SQN has at most one incoming edge.
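The three conditions can be checked mechanically on a candidate subgraph. The following is a minimal sketch, not SQAK's implementation. It assumes that an edge (u, v) means table u holds a foreign key referencing table v, and it reads "minimal" as: removing any single node leaves a subgraph that is either disconnected or no longer total. All function names are hypothetical.

```python
from collections import defaultdict

def is_connected(nodes, edges):
    """True if the graph is connected when edge directions are ignored."""
    nodes = set(nodes)
    if not nodes:
        return False
    adj = defaultdict(set)
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == nodes

def is_total(nodes, required):
    """Total: every table mentioned in C appears in the network."""
    return set(required) <= set(nodes)

def has_node_clarity(nodes, edges):
    """Node clarity: each node has at most one incoming edge."""
    indegree = defaultdict(int)
    for _, v in edges:
        indegree[v] += 1
    return all(indegree[n] <= 1 for n in nodes)

def is_minimal(nodes, edges, required):
    """Minimal (assumed reading): no node can be dropped while the rest
    stays a connected, total network."""
    for n in nodes:
        rest = set(nodes) - {n}
        rest_edges = [(u, v) for u, v in edges if n not in (u, v)]
        if is_total(rest, required) and is_connected(rest, rest_edges):
            return False
    return True

def is_valid_sqn(nodes, edges, required):
    return (is_connected(nodes, edges)
            and is_total(nodes, required)
            and is_minimal(nodes, edges, required)
            and has_node_clarity(nodes, edges))
```

For instance, with a hypothetical schema in which write references both author and paper, the network {author, write, paper} passes all checks for C tables {author, paper}, while adding a cite table that also references paper breaks both minimality and node clarity.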

Suppose the cost of an SQN is the sum of all its edge costs and node costs. For each CI, it needs to find the SQN with the smallest cost, which is an NP-complete problem. A heuristic greedy algorithm is proposed in SQAK. For a CI (C, a, F, w), it starts at the table o that contains the attribute a. For each of the other tables (nodes) v ∈ C, it finds the shortest path from v to o in a backtracking manner: if, after adding the path from v to o, the node clarity condition is violated, it backtracks to find the next shortest path from v to o. This continues until all nodes in C are successfully added. It then outputs the current result as a good SQN for the CI. After finding the SQN for each CI, it can get the top-k SQNs with the smallest cost, and each of the top-k SQNs is translated into an SQL query to be output.
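The greedy heuristic described above can be sketched as follows. This is an illustrative reading of the description, not SQAK's code. It assumes that an edge (u, v) means u references v, that at most one edge exists between any two tables, and that paths are searched ignoring edge direction. The names paths_by_cost and greedy_sqn are hypothetical.

```python
import heapq
from itertools import count

def has_node_clarity(edges):
    """Each node has at most one incoming edge; edge (u, v) points at v."""
    indegree = {}
    for _, v in edges:
        indegree[v] = indegree.get(v, 0) + 1
    return all(d <= 1 for d in indegree.values())

def paths_by_cost(adj, src, dst, node_cost):
    """Yield (cost, edges) for simple paths from src to dst, cheapest first."""
    tie = count()  # tie-breaker so the heap never compares nodes or paths
    heap = [(0.0, next(tie), src, (src,), ())]
    while heap:
        cost, _, node, visited, edges = heapq.heappop(heap)
        if node == dst:
            yield cost, list(edges)
            continue
        for nxt, edge, w in adj.get(node, ()):
            if nxt not in visited:
                step = w + node_cost.get(nxt, 0)
                heapq.heappush(heap, (cost + step, next(tie), nxt,
                                      visited + (nxt,), edges + (edge,)))

def greedy_sqn(edge_cost, tables, origin, node_cost=None):
    """Connect every table in `tables` to `origin`, trying paths in order of
    increasing cost and skipping any path that breaks node clarity."""
    node_cost = node_cost or {}
    adj = {}
    for (u, v), w in edge_cost.items():
        adj.setdefault(u, []).append((v, (u, v), w))
        adj.setdefault(v, []).append((u, (u, v), w))
    nodes, edges = {origin}, set()
    for v in tables:
        if v in nodes:
            continue
        for _, path_edges in paths_by_cost(adj, v, origin, node_cost):
            candidate = edges | set(path_edges)
            if has_node_clarity(candidate):
                edges = candidate
                nodes |= {n for e in path_edges for n in e}
                break
        else:
            return None  # no clarity-preserving path connects v to origin
    cost = (sum(edge_cost[e] for e in edges)
            + sum(node_cost.get(n, 0) for n in nodes))
    return nodes, edges, cost
```

Because partial paths are expanded best-first with non-negative costs, complete paths come out in non-decreasing cost order, so "the next shortest path" is simply the next item the generator yields. The result is a good SQN in the heuristic sense only; it is not guaranteed to be the minimum-cost one.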

5.3.5 SMALL DATABASE AS RESULT

Précis [Koutrika et al., 2006; Simitsis et al., 2008] returns a small database that contains only the tuples relevant to a given keyword query Q. The schema of a relational database D is modeled as a weighted graph G_S(V, E), where each relation is modeled as a node in G_S, and each foreign key reference between relations is modeled as an edge in G_S. Each edge also has a weight, defining the tightness of the relationship between the two relations. Given a keyword query Q = {k_1, k_2, ..., k_l}, the result of applying Q on D is a small database D′, satisfying the following conditions:

1. The set of relation names in D′ is a subset of the set of relation names in D.

2. For each relation R′ ∈ D′ that corresponds to relation R ∈ D, we have att(R′) ⊆ att(R) and tup(R′) ⊆ tup(R), where att(R) denotes the attributes of R and tup(R) denotes the tuples of R.

3. The tuples in D′ can be generated by expanding from the tuples that contain keywords in the query, following the foreign key references. They must satisfy the degree constraints and


130 5 OTHER TOPICS FOR KEYWORD SEARCH ON DATABASES

cardinality constraints. Degree constraints define the attributes and relations in D′. They include (1) the maximum number of attributes in D′, and (2) the minimum weight of projection paths in the database schema graph G_S. Cardinality constraints define the set of tuples in D′. They include (1) the maximum number of tuples in D′, and (2) the maximum number of tuples for each relation in D′.

For example, for the DBLP database shown in Figure 2.2, consider a keyword query Q = {algorithms}, with the constraint that the distance from any tuple to the tuple that contains the keyword in Q must be no larger than 2. Then, the result is a database with the same schema as the original database. Tuples such as p2 and p3 will be contained in the result because they all have distance 2 to the tuple p4, which contains the keyword “algorithms”. Tuples such as a1, a2 and p4 will not be contained in the result because they all have distance larger than 2 to any tuple that contains the keyword “algorithms”.
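A distance constraint of this kind amounts to a bounded breadth-first search over the tuple graph, starting from the tuples that contain the keyword. The sketch below runs on a small hypothetical tuple graph. The identifiers are made up for illustration and do not reproduce the contents of Figure 2.2.

```python
from collections import deque

def tuples_within(adj, sources, max_dist):
    """Return {tuple_id: distance} for every tuple within max_dist
    foreign-key hops of any source tuple (multi-source BFS)."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        t = queue.popleft()
        if dist[t] == max_dist:
            continue  # do not expand beyond the distance bound
        for neighbor in adj.get(t, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[t] + 1
                queue.append(neighbor)
    return dist

# Hypothetical tuple graph: an undirected adjacency list in which each edge
# stands for a foreign-key reference between two tuples.
toy_graph = {
    "paper_x": ["write_1", "cite_1"],
    "write_1": ["paper_x", "author_1"],
    "author_1": ["write_1"],
    "cite_1": ["paper_x", "paper_y"],
    "paper_y": ["cite_1", "write_2"],
    "write_2": ["paper_y", "author_2"],
    "author_2": ["write_2"],
}

# Suppose only paper_x contains the keyword; keep tuples within distance 2.
reachable = tuples_within(toy_graph, ["paper_x"], max_dist=2)
```

Here paper_y is kept because it is two hops away through a cite tuple, while write_2 and author_2 fall outside the bound and are dropped.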

In Précis, a keyword query is processed in two steps. In the first step, the schema of the database D′ is generated such that all of the degree constraints are satisfied. This can be done easily by expanding from the relations that may contain the user-given keywords to the adjacent relations, following the foreign key references, until all degree constraints are satisfied. In the second step, it evaluates each join edge defined in the schema of D′ in order to satisfy all the cardinality constraints.
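The two steps can be illustrated with a simplified sketch. It is not the Précis implementation and rests on several assumptions: the weight of a projection path is taken to be the product of its edge weights, a relation is kept with all of its attributes rather than a projected subset, and joins are single-column equi-joins. All names and the toy data are hypothetical.

```python
import heapq
from itertools import count

def precis_schema(edge_weight, attrs, seeds, max_attrs, min_weight):
    """Step 1: expand from the seed relations, heaviest path first, while the
    attribute budget and the minimum path weight are respected."""
    adj = {}
    for (r, s), w in edge_weight.items():
        adj.setdefault(r, []).append((s, w))
        adj.setdefault(s, []).append((r, w))
    tie = count()
    heap = [(-1.0, next(tie), s, None) for s in seeds]  # max-heap on weight
    heapq.heapify(heap)
    chosen, joins, n_attrs = {}, [], 0
    while heap:
        neg_w, _, rel, parent = heapq.heappop(heap)
        if rel in chosen or n_attrs + len(attrs[rel]) > max_attrs:
            continue
        chosen[rel] = -neg_w
        n_attrs += len(attrs[rel])
        if parent is not None:
            joins.append((parent, rel))
        for nxt, w in adj.get(rel, ()):
            if nxt not in chosen and -neg_w * w >= min_weight:
                heapq.heappush(heap, (neg_w * w, next(tie), nxt, rel))
    return chosen, joins

def precis_tuples(db, joins, join_on, seed_rows, max_per_rel, max_total):
    """Step 2: evaluate each join edge in order, stopping at the tuple budgets."""
    result, total = {}, 0

    def add(rel, row):
        nonlocal total
        bucket = result.setdefault(rel, [])
        if total < max_total and len(bucket) < max_per_rel and row not in bucket:
            bucket.append(row)
            total += 1

    for rel, rows in seed_rows.items():
        for row in rows:
            add(rel, row)
    for parent, child in joins:
        p_col, c_col = join_on[(parent, child)]
        keys = {row[p_col] for row in result.get(parent, [])}
        for row in db[child]:
            if row[c_col] in keys:
                add(child, row)
    return result

# Hypothetical toy database.
db = {
    "author": [{"aid": 1, "name": "Ann"}, {"aid": 2, "name": "Bob"}],
    "write": [{"aid": 1, "pid": 10}, {"aid": 2, "pid": 11}],
    "paper": [{"pid": 10, "title": "XML algorithms"},
              {"pid": 11, "title": "Graph search"}],
}
attrs = {rel: list(rows[0]) for rel, rows in db.items()}
edge_weight = {("write", "author"): 0.5, ("write", "paper"): 0.5}
join_on = {("paper", "write"): ("pid", "pid"), ("write", "author"): ("aid", "aid")}

seed_rows = {"paper": [r for r in db["paper"] if "algorithms" in r["title"].lower()]}
chosen, joins = precis_schema(edge_weight, attrs, ["paper"], max_attrs=6, min_weight=0.2)
small_db = precis_tuples(db, joins, join_on, seed_rows, max_per_rel=5, max_total=10)
```

Raising min_weight or lowering max_attrs prunes the schema in step 1, while max_per_rel and max_total cap the tuples collected in step 2, mirroring the split between degree constraints and cardinality constraints.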

5.3.6 OTHER RELATED ISSUES

Jagadish et al. [2007] assert that usability of a database is an important issue to address in database research. Enabling keyword queries on databases is one way to improve usability.

Goldman et al. [1998] propose the notion of proximity search, which searches for objects in a database that are “near” other relevant objects. Here the database is represented as a graph, where objects are represented by nodes and edges represent relationships between the corresponding objects.

Su and Widom [2005] propose to construct virtual documents offline, which serve as the answer units for a keyword query. Virtual documents are interconnected tuples from multiple relations. Query answering is in a traditional IR style, where virtual documents satisfying the query are returned. Nandi and Jagadish [2009] propose to represent the database, conceptually, as a collection of independent “queried units”, each of which represents the desired result of some query against the database. Jayapandian and Jagadish [2008] present an automated technique to generate a good set of forms that can express all possible queries, where each form is capable of expressing only a very limited range of queries. Talukdar et al. [2008] present a system with which a non-expert user can author new query templates and Web forms, to be used by anyone with related information needs. The query templates and Web forms are generated by a keyword query against interlinked source relations.

Ji et al. [2009] study interactive keyword search on RDB, where the interaction is provided by autocompletion, which predicts a word or phrase that a user may type based on the partial query the user has entered. An answer defined in [Ji et al., 2009] is a single record in RDB. Li et al. [2009a] extend the autocompletion framework to the Steiner tree based semantics for a keyword query.


Chaudhuri and Kaushik [2009] study autocompletion with tolerated errors in a general framework, in which only autocompletions are computed without query evaluation. Pu and Yu [2008, 2009] study the problem of query cleaning for keyword queries in RDB, where query cleaning involves semantic linkage and spelling corrections followed by segmenting nearby query words into high quality data terms.

Guo et al. [2007] present an efficient algorithm to conduct topology search over biological databases. Shao et al. [2009b] present an effective workflow search engine, WISE, to find informative and concise search results, defined as the minimal views of the most specific workflow hierarchies containing keywords for a keyword query.


Bibliography

Sanjay Agrawal, Surajit Chaudhuri, and Gautam Das. DBXplorer: A system for keyword-based search over relational databases. In Proc. 18th Int. Conf. on Data Engineering, pages 5–16, 2002. DOI: 10.1109/ICDE.2002.994693 2.1, 2.3

Shurug Al-Khalifa, Cong Yu, and H. V. Jagadish. Querying structured text in an XML database. In Proc. 2003 ACM SIGMOD Int. Conf. on Management of Data, pages 4–15, 2003. DOI: 10.1145/872757.872761 4.5

Sihem Amer-Yahia, Pat Case, Thomas Rölleke, Jayavel Shanmugasundaram, and Gerhard Weikum. Report on the DB/IR panel at SIGMOD 2005. SIGMOD Record, 34(4):71–74, 2005. DOI: 10.1145/1107499.1107514 (document)

Sihem Amer-Yahia and Jayavel Shanmugasundaram. XML full-text search: Challenges and opportunities. In Proc. 31st Int. Conf. on Very Large Data Bases, page 1368, 2005. (document)

Andrey Balmin, Vagelis Hristidis, Nick Koudas, Yannis Papakonstantinou, Divesh Srivastava, and Tianqiu Wang. A system for keyword proximity search on XML databases. In Proc. 29th Int. Conf. on Very Large Data Bases, pages 1069–1072, 2003. 4.5

Andrey Balmin, Vagelis Hristidis, and Yannis Papakonstantinou. ObjectRank: Authority-based keyword search in databases. In Proc. 30th Int. Conf. on Very Large Data Bases, pages 564–575, 2004. 5.3.1

Zhifeng Bao, Tok Wang Ling, Bo Chen, and Jiaheng Lu. Effective XML keyword search with relevance oriented ranking. In Proc. 25th Int. Conf. on Data Engineering, pages 517–528, 2009. DOI: 10.1109/ICDE.2009.16 4.5

Gaurav Bhalotia, Arvind Hulgeri, Charuta Nakhe, Soumen Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. In Proc. 18th Int. Conf. on Data Engineering, pages 431–440, 2002. DOI: 10.1109/ICDE.2002.994756 3.1, 3.1, 3.3.1, 3.3.1

Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30(1-7):107–117, 1998. DOI: 10.1016/S0169-7552(98)00110-X 3.1, 4.4.1

Kaushik Chakrabarti, Venkatesh Ganti, Jiawei Han, and Dong Xin. Ranking objects based on relationships. In Proc. 2006 ACM SIGMOD Int. Conf. on Management of Data, pages 371–382, 2006. DOI: 10.1145/1142473.1142516 5.3.1
