
Advances in Database Technology - P18


Aggregation of Continuous Monitoring Queries

K.-Y. Lam and H.C.W. Pang



Due to the dynamic properties of sensor data, the probability of satisfying the condition in a sub-query at a node may change with time. Therefore, the coordinator node needs to reorder the sequence of the participating nodes periodically. The reorder procedure is performed when the following condition is satisfied: the evaluation is stopped at the same node, called the false node, consecutively for a pre-defined period of time, and the false node is not the first node. Satisfaction of these conditions suggests that the sensor data values generated at the false node are likely to be false in the next evaluation as well. Hence, the coordinator node reorders the sequence of the nodes using the following procedure (a sketch follows the list):

a. The false node becomes the first node in the sequence.

b. All the nodes following the false node remain in the same relative order to each other.

c. All the nodes in front of the false node remain in their original relative order; they rejoin the node sequence, attached after the last node of the original sequence.
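The reordering rule can be pictured as follows. This is an illustrative sketch only; the class and method names are not taken from CMQES, which implements the rule in NesC on the motes.

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of the SeqPush reordering rule: the false node moves to the
    // front; the nodes that followed it keep their relative order; the
    // nodes that preceded it are re-attached, in their original relative
    // order, after the last node of the resulting sequence.
    final class NodeSequence {
        private List<Integer> nodes;   // node ids, in evaluation order

        NodeSequence(List<Integer> initialOrder) {
            nodes = new ArrayList<>(initialOrder);
        }

        void reorderAround(int falseNode) {
            int i = nodes.indexOf(falseNode);
            if (i <= 0) return;   // rule fires only if the false node is not first
            List<Integer> head = new ArrayList<>(nodes.subList(0, i));
            List<Integer> tail = new ArrayList<>(nodes.subList(i + 1, nodes.size()));
            nodes = new ArrayList<>();
            nodes.add(falseNode);  // (a) false node becomes first
            nodes.addAll(tail);    // (b) followers keep their relative order
            nodes.addAll(head);    // (c) former predecessors rejoin at the end
        }

        List<Integer> order() { return nodes; }
    }

For example, reordering the sequence [n1, n2, n3, n4] around false node n3 yields [n3, n4, n1, n2].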

4 Implementation

CMQES is implemented with MICA Motes [MM]. In CMQES, one of the MSPUs is connected with the base station through a mote interface board; it is the base station MSPU. CMQES contains two main software components: the sensor program in the MSPU, and the server program in the base station. The sensor program is implemented in NesC, a C-like language for TinyOS [TINY], and is responsible for capturing sensor data values, evaluating queries, submitting sub-query results to the coordinator nodes, and collecting performance statistics at each MSPU. We have implemented SeqPush in the sensor program. The evaluation results of a CMQ and the performance statistics are periodically sent to the base station through the base station MSPU for reporting.

The second program is the server program residing at the base station. This program is implemented in Java, with MS Windows 2000 and MS SQL Server chosen as the operating system and database, respectively. The server program is responsible for collecting results and performance statistics. In addition, the following parameters can be set using the server program at the base station for submitting a CMQ and controlling the operations at the MSPUs (a small illustration follows the list):

1. The sampling rate of the MSPUs.

2. The number of nodes participating in a CMQ.

3. The aggregation function to be performed at a node, i.e., calculating the mean, maximum or minimum from the values of the sub-queries.

4. The condition for the sub-query of a CMQ at an MSPU.
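For illustration, these settings could travel from the server program to the MSPUs as a small parameter record; the names below are hypothetical, not taken from CMQES.

    // Hypothetical container for the four CMQ settings listed above.
    final class CmqParameters {
        final int samplingRateHz;        // 1. sampling rate of the MSPUs
        final int participatingNodes;    // 2. number of nodes in the CMQ
        final String aggregation;        // 3. "mean", "max" or "min" over sub-query values
        final String subQueryCondition;  // 4. condition evaluated at each MSPU, e.g. "light > 200"

        CmqParameters(int rate, int nodes, String agg, String cond) {
            this.samplingRateHz = rate;
            this.participatingNodes = nodes;
            this.aggregation = agg;
            this.subQueryCondition = cond;
        }
    }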

5 Demonstration and Sample Results

In the demonstration, through the interface at the base station, we can submit CMQs for processing at the MSPUs, as shown in Figure 1. Currently, each MSPU is programmed to capture the light intensity of its surrounding environment periodically, i.e., every 2 sec, as sensor data values, and a message is sent every 30 sec to the coordinator node. The sampling rate and the message reporting period can be varied at the base station. The message size for communication in TinyOS is 34 bytes. Five bytes are reserved for the message header, and the message contains the results from 10 evaluations, with each reading taking up 2 bytes. The remaining 9 bytes are for the cycle number, message type and destination address. Currently, the transmission delay of a single message from one MSPU to another in the testing environment is between 300 ms and 700 ms. TinyOS provides only a best-effort message delivery service; a lost message is considered a missed evaluation cycle and logged accordingly by the coordinator node.
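For concreteness, the message layout described above can be packed as follows. The exact widths of the three trailing fields are not given in the text, so the split chosen here (4+1+4 bytes) is an assumption.

    import java.nio.ByteBuffer;

    // Sketch of the 34-byte report message: a 5-byte header, 10 two-byte
    // readings, and 9 trailing bytes for cycle number, message type and
    // destination address (field widths of the trailing part assumed).
    final class ReportMessage {
        static byte[] pack(byte[] header, short[] readings,
                           int cycle, byte type, int destination) {
            if (header.length != 5 || readings.length != 10)
                throw new IllegalArgumentException("5-byte header and 10 readings expected");
            ByteBuffer buf = ByteBuffer.allocate(34);
            buf.put(header);                          // 5 bytes
            for (short r : readings) buf.putShort(r); // 10 x 2 = 20 bytes
            buf.putInt(cycle);                        // 4 of the remaining 9 bytes
            buf.put(type);                            // 1 byte
            buf.putInt(destination);                  // 4 bytes -> 34 bytes in total
            return buf.array();
        }
    }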

Fig 1. Program at the base station.

Fig 2. Real-time display of received messages and statistics.

Our experimental results show that the number of messages submitted in the central aggregation scheme (CAS), i.e., when all participating MSPUs periodically submit their sub-query results to a central MSPU for data aggregation, is much larger than in SeqPush. Two ammeters are connected to one of the participating nodes and to the coordinator node to measure the energy consumption rates of the nodes when different operations are performed at the nodes.

The results captured by the base station are displayed in real time, as shown in Fig 2. The statistics include:

(1) The number of messages transmitted, including sent and received messages.

(2) The number of successful evaluations and the number of missed query results due to lost messages.


eVitae: An Event-Based Electronic Chronicle

Bin Wu, Rahul Singh, Punit Gupta, Ramesh Jain
Experiential Systems Group, Georgia Institute of Technology

Abstract. eVitae models the content of the data in a manner that is independent of media. To store events, a novel database called EventBase is developed, which is indexed by events. The unique characteristics of events make multidimensional querying and multiple-perspective exploration of personal history information feasible. In this demo we present the major functions of eVitae.

1 Introduction

Personal history systems electronically record important activities from a person's life in the form of photographs, text, video, and audio. Examples of existing systems such as [3] have shown that there are a number of salient challenges in this domain. First, information is fundamentally anchored to space and time, and people often exploit them as cues for querying information. Second, the data, as the carrier of information, stays in respective silos; this fragments meaningful information across data. Third, to break down these silos, an information model independent of media is required to depict the content of information. Lastly, presentation of the information must be dynamically generated according to individual users' preferences.

We have developed a personal eChronicle [1] called eVitae [4]. In eVitae we utilize a novel generative theory based upon the concept of an event to design an information system [2]. In this paper, we show how the eVitae system, as an access environment, ingests heterogeneous data into meaningful information conveyed by events, aids the user to quickly focus on what is of interest, and presents a multidimensional environment for exploration of events with their details stored in appropriate media.

2 Event-Based Data Modeling

The approach we employ to design and implement eVitae is based on the notion of events [6]. An event is an occurrence or happening of significance that can be defined as a region or collection of regions in spatial-temporal-attribute space. Given k events,


each event is formally denoted by a tuple and uniquely identified by an eID (event identifier). In this notation t characterizes the event temporally, s denotes the spatial location(s) associated with the event, and the remaining components are the attributes associated with the event. An event is defined by its event model, which includes the mandatory attributes space, time, transcluded-media, event-name, and event-topic, and a finite set of free attributes.
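As an illustration, the event model could be rendered as a record type like the following; the field types are assumptions, since the paper specifies only the attribute names.

    import java.util.List;
    import java.util.Map;

    // Sketch of the event model: the mandatory attributes (space, time,
    // transcluded media, name, topic) plus a finite set of free attributes.
    final class Event {
        final long eID;                        // unique event identifier
        final String eventName;
        final String eventTopic;
        final long timestamp;                  // t: temporal characterization
        final List<double[]> locations;        // s: spatial location(s)
        final List<String> transcludedMedia;   // links to the underlying media
        final Map<String, String> freeAttributes;

        Event(long eID, String name, String topic, long t,
              List<double[]> s, List<String> media, Map<String, String> free) {
            this.eID = eID; this.eventName = name; this.eventTopic = topic;
            this.timestamp = t; this.locations = s;
            this.transcludedMedia = media; this.freeAttributes = free;
        }
    }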

Events can be grouped together in collections, called event categories. Formally, an event category is defined by the set of events that comprise it. Event categories provide a powerful construct to support multiple ways of organizing information, the definition of complex queries and notifications, and personalized views of the parts of the information space in which the user is interested.

According to the definition of the event, the information layer implemented by events breaks down the data silos. This layer uses an event-based data model to construct a new index that is independent of data type. The organization, indexing, and storage of events conveying potentially changing information are accomplished by parsing the data as it is entered and storing all events in a database of events called EventBase. The data is parsed by the system, and events are produced using the event model. The EventBase also stores links to the original data sources, which allows the system to present the appropriate media in the context of a particular event. EventBase is an extension of a traditional database. In the implementation of the prototype eVitae system, we use MySQL as the database to store and index events.
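A minimal sketch of such a store, assuming a single flat event table in MySQL with eID as the primary key and a link column for transclusion (the actual eVitae schema is not given in the text):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // Hypothetical EventBase table: events indexed by eID, with a link
    // back to the original data source for transclusion.
    final class EventBase {
        private final Connection conn;

        EventBase(String url, String user, String pw) throws Exception {
            conn = DriverManager.getConnection(url, user, pw);
            conn.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS event (" +
                " eID BIGINT PRIMARY KEY," +
                " name VARCHAR(255), topic VARCHAR(255)," +
                " t TIMESTAMP, location VARCHAR(255)," +
                " source_link VARCHAR(1024))");   // link to the original data
        }

        void store(long eID, String name, String topic,
                   java.sql.Timestamp t, String loc, String link) throws Exception {
            PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO event VALUES (?,?,?,?,?,?)");
            ps.setLong(1, eID); ps.setString(2, name); ps.setString(3, topic);
            ps.setTimestamp(4, t); ps.setString(5, loc); ps.setString(6, link);
            ps.executeUpdate();
        }
    }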

3 System Architecture

The architecture of the eVitae system comprises three modules, namely Event Entry, EventBase, and the What-You-See-Is-What-You-Get (WYSIWYG) query and exploration environment. The key features of the system are briefly discussed as follows.

EventBase. EventBase is the backend of the eVitae system, which stores the events. The transclusion of media is maintained by storing links between an event and the data it is based upon. EventBase uses the eID attribute of events as the unified index and is supported by MySQL.

WYSIWYG query and exploration environment. This is an integrated interaction environment to explore the electronic chronicle of a person, as shown in Figure 1. By using temporal and spatial relationships as cues and by defining event categories to organize events, we create an exploratory environment for the user. The event exhibit panel (Fig 1) presents an overall picture of events. Options for zooming, filtering, extraction, viewing relations, history keeping, and details-on-demand make the environment appealing and powerful.

Event Entry. An event may include multifarious data from different sources, such as video, audio, sensors, and texts. The Event Entry module of the system is designed to produce events using event models and to record the link between events and related data.


Fig 1. WYSIWYG Query and Exploration Environment

4 Conclusion

The eVitae system demonstrates an event-based approach for the organization, storage, management, and querying of personal history information comprising multifarious data. The system provides an event entry module for importing data from heterogeneous data sources and assimilating them into events. This information is stored in EventBase, which is indexed by events. Event-based modeling allows multidimensional querying and exploration of personal history information. Furthermore, flexibility in event definition and organization allows exploration of the data from multiple perspectives. Preliminary results indicate that an event-based system not only offers significant advantages in information organization, but also in exploration and discovery of information from data.

References

1. R. Jain. "Multimedia Electronic Chronicles", IEEE MultiMedia, pp. 102-103, Volume 10, Issue 03, July 2003.

2. R. Jain. "Events in Heterogeneous Data Environments", Proc. International Conference on Data Engineering, Bangalore, March 2003.

3. J. Gemmell, G. Bell, R. Lueder, S. Drucker, and C. Wong. "MyLifeBits: fulfilling the Memex vision", ACM Multimedia, pp. 235-238, ACM, 2002.

4. R. Singh, B. Wu, P. Gupta, R. Jain. "eVitae: Designing Experiential eChronicles", ESG Technical Report Number GT-ESG-01-10-03.


CAT: Correct Answers of Continuous Queries Using Triggers

Department of Computer Science, University of Illinois at Chicago,
Chicago, IL 60607
{wolfson,nnedunga}@cs.uic.edu

1 Introduction and Motivation

Consider the query Q1: Retrieve all the motels which will be no further than 1.5 miles from my route, sometime between 7:00PM and 8:30PM, which a mobile user posed to the Moving Objects Database (MOD). Processing such queries is of interest to a wide range of applications (e.g. tourist information systems and context awareness [1,2]). These queries pertain to the future of a dynamic world. Since the MOD is only a model of the objects moving in the real world, the accuracy of the representation has to be continuously verified and updated, and the answer-set of Q1 has to be re-evaluated in every clock-tick.¹ However, the re-evaluation of such queries can be avoided if an update to the MOD does not affect the answer-set.

The motion of a moving object is typically represented as a trajectory, whose projection in the X-Y plane is called a route. The details of the construction, based on electronic maps and the speed-profiles of the city blocks, are given in [5]. After a trajectory is constructed, a traffic abnormality may occur at a future time-point, due to an accident, road-work, etc., and once it occurs, we need to identify the trajectories that are affected and update them properly (c.f. [5]). In the sequel, we focus on the impact of the abnormalities on the continuous queries.

Figure 1 shows three trajectories and their respective routes. If road-work starts at 4:30PM on the segment between A and B and lasts 5 hours, it will slow down the speed between 4:30PM and 9:30PM. A trajectory that enters that segment after 4:30PM will need to have its future portion modified: as illustrated by the thicker portion, instead of being at the point B at 4:50, the object will be there at 5:05. A key observation is that if the object whose trajectory was modified issued the query Q1, we have to re-evaluate the answer.

* Research partially supported by NSF grant EIA-0000536.
¹ Hence the name continuous queries – formally defined for MOD in [3].


There are tables which store the trajectories (MOT) and the landmarks of interest (BUILDINGS); keep track of the queries posed and their answers (PENDING_QUERIES and ANSWERS); and store the information about the traffic abnormalities (TRAFFIC_ABN). The trajectories and the landmarks were obtained using real maps of Cook County, Chicago.

At the heart of the CAT system is a set of triggers, part of which we illustrate in the context of the example query Q1. If the query Q1 is posed at 3:30PM, its answer contains two motels, to which the object should be close at 7:55PM and 8:20PM, respectively. When an abnormality is detected, its relevant parameters are inserted in the TRAFFIC_ABN table. This insertion satisfies the trigger's condition, and its action part re-evaluates the query Q1 based on the new future-portion of the trajectory. Due to the delay, the trajectory will now be near the first motel at 8:35PM, which is a bit too late for the user. On the other hand, the object will be near another motel at 7:05PM which, before the traffic incident, was not part of the answer (it would have had the desired proximity at 6:50PM).

Fig 2. Behavioral Aspects of the CAT
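The trigger logic can be sketched roughly as follows. The helper procedures and column names are invented for illustration; the actual system implements this processing with Oracle UDTs, UDFs and Spatial predicates, as described below.

    import java.sql.Connection;
    import java.sql.Statement;

    // Rough sketch of the CAT trigger: when an abnormality is inserted
    // into TRAFFIC_ABN, the affected trajectories are updated and the
    // pending continuous queries are re-evaluated. The two procedures
    // called in the body are hypothetical stand-ins.
    final class CatTriggerInstaller {
        static void install(Connection conn) throws Exception {
            try (Statement st = conn.createStatement()) {
                st.execute(
                    "CREATE OR REPLACE TRIGGER on_traffic_abn " +
                    "AFTER INSERT ON TRAFFIC_ABN FOR EACH ROW " +
                    "BEGIN " +
                    // update future portions of trajectories crossing the segment
                    "  update_affected_trajectories(:NEW.segment_id, :NEW.from_t, :NEW.to_t); " +
                    // re-evaluate every pending continuous query over the new trajectories
                    "  reevaluate_pending_queries(:NEW.segment_id); " +
                    "END;");
            }
        }
    }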

All the back-end components are implemented using Oracle 9i as a server. We used User-Defined Types (UDT) to model the entities and User-Defined Functions (UDF) to implement the processing, exploiting the Oracle Spatial predicates.

The front-end client, which is the GUI presented to the end-user, is implemented in Java. The GUI gives the options of specifying the queries (i.e., time of submission, relevant time interval, objects of interest, etc.). Once the user clicks the SUBMIT button, the query is evaluated and its answer is displayed. In the server, the query is assigned an id number and stored in the PENDING_QUERIES table. Clearly, in a real MOD application, the client will be either a wireless (mobile) user or a web browser-based one, properly interfaced to the server.

To test the execution of the triggers and the updates of the answer(s) to the continuous queries posed, the GUI offers a window for generating a traffic abnormality. The user enters the beginning and end times of the incident as well as its "type" (which determines the impact on the speed-profile). He also enters the route segments along which the traffic incident is spread. The moment this information is submitted to the server, the affected trajectories are updated and the new answer(s) to the posed continuous queries are displayed back to the user.


References

1. A. Hinze and A. Voisard. Location- and time-based information delivery in tourism. In SSTD, 2003.

2. A. Pashtan, R. Blatter, A. Heusser, and P. Scheuermann. CATIS: A context-aware tourist information system. In IMC, 2003.

3. A. P. Sistla, O. Wolfson, S. Chamberlain, and S. Dao. Modeling and querying moving objects. In ICDE, 1997.

4. G. Trajcevski and P. Scheuermann. Triggers and continuous queries in moving objects databases.


Hippo: A System for Computing Consistent Answers to a Class of SQL Queries

Jan Chomicki¹, Jerzy Marcinkowski², and Slawomir Staworko¹

¹ Dept. of Computer Science and Engineering, University at Buffalo
{chomicki,staworko}@cse.buffalo.edu
² Instytut Informatyki, Wrocław University, Poland
Jerzy.Marcinkowski@ii.uni.wroc.pl

1 Motivation and Introduction

Integrity constraints express important properties of data, but the task of preserving data consistency is becoming increasingly problematic with new database applications. For example, in the case of integration of several data sources, even if the sources are separately consistent, the integrated data can violate the integrity constraints. The traditional approach, removing the conflicting data, is not a good option because the sources can be autonomous. Another scenario is a long-running activity where consistency can be violated only temporarily and future updates will restore it. Finally, data consistency may be neglected because of efficiency or other reasons.

In [1] Arenas, Bertossi, and Chomicki have proposed a theoretical framework for querying inconsistent databases. Consistent query answers are defined to be those query answers that are true in every repair of a given database instance. A repair is a consistent database instance obtained by changing the given instance using a minimal set of insertions/deletions. Intuitively, consistent query answers are independent of the way the inconsistencies in the data would be resolved.

Example 1. Assume that an instance of the relation Student is as follows:

    Name    Address
    Smith   Los Angeles
    Smith   New York
    Jones   Chicago

The above instance has two repairs: one obtained by deleting the first tuple, the other by deleting the second tuple. A query asking for the address of Jones returns Chicago as a consistent answer because the third tuple is in both repairs. However, the query asking for the address of Smith has no consistent answers because the addresses in the different repairs are different. On the other hand, the query asking for those people who live in Los Angeles or New York returns Smith as a consistent answer.

This conservative definition of consistent answers has one shortcoming: the number of repairs. Even for a single functional dependency, the number of repairs can be exponential in the number of tuples in the database [3]. Nevertheless, several practical mechanisms for the computation of consistent query answers without computing all repairs have been developed (see [5] for a survey): query rewriting [1], logic programs [2,4,9], and compact representations of repairs [6,7]. The first is based on rewriting the input query Q into a query Q' such that the evaluation of Q' returns the set of consistent answers to Q. This method works only for SJD queries¹ in the presence of universal binary constraints. The second approach uses disjunctive logic programs to specify all repairs, and then, with the help of a disjunctive LP system [8], finds the consistent answers to a given query. Although this approach is applicable to very general queries in the presence of universal constraints, the complexity of evaluating disjunctive logic programs makes this method impractical for large databases.

The system Hippo is an implementation of the third approach. All information about integrity violations is stored in a conflict hypergraph, in which every hyperedge connects a set of tuples that together violate an integrity constraint. Using the conflict hypergraph, we can find out whether a given tuple belongs to the set of consistent answers without constructing all repairs [6]. Because the conflict hypergraph has polynomial size, this method has polynomial data complexity and allows us to efficiently deal even with large databases [7]. Currently, our application computes consistent answers to SJUD queries in the presence of denial constraints (a class containing functional dependency constraints and exclusion constraints). Allowing union in the query language is crucial for being able to extract indefinite disjunctive information from an inconsistent database (see Example 1).

Future work includes support for restricted foreign key constraints, universal tuple-generating dependencies and full PSJUD² queries. However, because computing consistent query answers for SPJ queries is co-NP-data-complete [3,6], polynomial data complexity cannot be guaranteed once projection is allowed.

The whole system is implemented in Java as an RDBMS frontend. Hippo works with any RDBMS that can execute SQL queries and provides a JDBC access interface (we use PostgreSQL). The data stored in the RDBMS need not be altered.

The flow of data in Hippo is presented in Figure 1. Before processing any input query, the system performs Conflict Detection and creates the Conflict Hypergraph for further usage. We assume that the number of conflicts is small enough for the hypergraph to be stored in main memory. The only output of the system is the Answer Set, consisting of the consistent answers to the input Query in the database instance DB with respect to a set of integrity constraints IC.

¹ When describing a query class, P stands for projection, S for selection, U for union, J for cartesian product, and D for difference.
² Currently, our application supports only those cases of projection that don't introduce existential quantifiers in the corresponding relational calculus query.

Fig 1. Data flow in Hippo

The processing of the Query starts with Enveloping. As a result of this step we get a query defining the Candidates (candidate consistent query answers). This query subsequently undergoes Evaluation by the RDBMS. For every tuple from the set of candidates, the system uses Prover to check whether the tuple is a consistent answer to the Query. Depending on the result of this check, the tuple is either added to the Answer Set or not.
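In code form, the loop could look as follows; the class names are placeholders, not Hippo's actual API.

    import java.util.ArrayList;
    import java.util.List;

    // Schematic flow of Figure 1: evaluate the envelope query to get the
    // candidates, then let Prover decide tuple by tuple.
    final class HippoFlow {
        interface Prover { boolean isConsistentAnswer(Object candidate); }

        static List<Object> answerSet(List<Object> candidates, Prover prover) {
            List<Object> answers = new ArrayList<>();
            for (Object t : candidates)
                if (prover.isConsistentAnswer(t))   // membership checks happen here
                    answers.add(t);
            return answers;
        }
    }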

For every tuple that Prover processes, several membership checks typically have to be performed. In the base version of the system this is done by simply executing the appropriate membership queries on the database. This is a costly procedure and it has a significant influence on the overall time performance of the system. We have introduced several optimizations addressing this problem. In general, by modifying the expression defining the envelope (the set of candidates), the optimizations allow us to answer the required membership checks without executing any queries on the database. Also, using an expression selecting a subset of the set of consistent query answers, we can significantly reduce the number of tuples that have to be processed by Prover. A more detailed description of those techniques can be found in [7].

The presentation of the Hippo system will consist of three parts. First, we will demonstrate that using consistent query answers we can extract more information from an inconsistent database than in the approach where the input query is evaluated over the database from which the conflicting tuples have been removed. Secondly, we will show the advantages of our method over competing approaches by demonstrating the expressive power of the supported queries and integrity constraints. And finally, we will compare the running times of our approach and the query rewriting approach, showing that our approach is more efficient. For every query being tested, we will also measure the execution time of this query by the RDBMS backend (this corresponds to the approach where we ignore the fact that the database is inconsistent). This will allow us to conclude that the time overhead of our approach is acceptable.

References

1. M. Arenas, L. Bertossi, and J. Chomicki. Consistent Query Answers in Inconsistent Databases. In ACM Symposium on Principles of Database Systems (PODS), pages 68–79, 1999.

2. M. Arenas, L. Bertossi, and J. Chomicki. Answer Sets for Consistent Query Answering in Inconsistent Databases. Theory and Practice of Logic Programming, 3(4–5):393–424, 2003.

3. M. Arenas, L. Bertossi, J. Chomicki, X. He, V. Raghavan, and J. Spinrad. Scalar Aggregation in Inconsistent Databases. Theoretical Computer Science, 296(3):405–434, 2003.

4. P. Barcelo and L. Bertossi. Logic Programs for Querying Inconsistent Databases. In International Symposium on Practical Aspects of Declarative Languages (PADL), pages 208–222. Springer-Verlag, LNCS 2562, 2003.

5. L. Bertossi and J. Chomicki. Query Answering in Inconsistent Databases. In J. Chomicki, R. van der Meyden, and G. Saake, editors, Logics for Emerging Applications of Databases. Springer-Verlag, 2003.

6. J. Chomicki and J. Marcinkowski. Minimal-Change Integrity Maintenance Using Tuple Deletions. Technical Report cs.DB/0212004, arXiv.org e-Print archive, December 2002. Under journal submission.

7. J. Chomicki, J. Marcinkowski, and S. Staworko. Computing Consistent Query Answers Using Conflict Hypergraphs. In preparation.

8. T. Eiter, W. Faber, N. Leone, and G. Pfeifer. Declarative Problem-Solving in DLV. In J. Minker, editor, Logic-Based Artificial Intelligence, pages 79–103. Kluwer, 2000.

9. G. Greco, S. Greco, and E. Zumpano. A Logical Framework for Querying and Repairing Inconsistent Databases. IEEE Transactions on Knowledge and Data Engineering.


An Implementation of P3P Using Database Technology

Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, and Yirong Xu

IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120, USA
{ragrawal,kiernan,srikant}@almaden.ibm.com, xuyirong@cn.ibm.com
http://www.almaden.ibm.com/software/quest

1 Introduction

The privacy of personal information on the Internet has become a major concern for governments, businesses, media, and the public. Platform for Privacy Preferences (P3P), developed by the World Wide Web Consortium (W3C), is the most significant effort underway to enable web users to gain more control over their private information. P3P provides mechanisms for a web site to encode its data-collection and data-use practices in a standard XML format, known as a P3P policy [3], which can be programmatically checked against a user's privacy preferences.

This demonstration presents an implementation of the server-centric architecture for P3P proposed in [1]. The novel aspect of this implementation is that it makes use of proven database technology, as opposed to the prevailing client-centric implementations based on specialized policy-preference matching engines. Not only does this implementation have qualitative advantages, our experiments indicate that it performs significantly better (15–30 times faster) than the sole public-domain client-centric implementation, and that the latency introduced by preference matching is small enough (0.16 second on average) for real-world deployments of P3P [1].

The P3P protocol has two parts:

Privacy Policies: an XML format in which a web site can encode its data-collection and data-use practices [3]. For example, an online bookseller can publish a policy which states that it uses a customer's name and home phone number for telemarketing purposes, but that it does not release this information to external parties.

Privacy Preferences: an XML format for specifying privacy preferences and an algorithm for programmatically matching preferences against policies. The W3C APPEL working draft provides such a format and a corresponding policy-preference matching algorithm [2]. For example, a privacy-conscious consumer may define a preference stating that she does not want retailers to use her personal information for telemarketing and product promotion.


Fig 1. Client-centric policy-preference matching

2.1 Client-Centric Implementation

A client-centric architecture for implementing P3P has been described in [4]. As a user browses a web site, the site's P3P policy is fetched to the client side. The policy is then checked by a specialized APPEL engine against the user's APPEL preference to see if the policy conforms to the preference (see Figure 1). There are two prominent implementations of this architecture: Microsoft IE6 and AT&T Privacy Bird.

2.2 Server-Centric Implementation

Figure 2 shows the server-centric architecture we have developed. A web site deploying P3P first installs its privacy policy in a database system; database querying is then used for matching a user's privacy preference against privacy policies. The server-centric implementation has several advantages, including: setting up the infrastructure necessary for ensuring that web sites act according to their stated policies, allowing P3P to be deployed in the thin, mobile clients that are likely to dominate Internet access in the future, and allowing site owners to refine their policies based on the privacy preferences of their users.

Our implementation consists of both client and server components.

3.1 Client Components

We extend Microsoft Internet Explorer to invoke preference checking at the server before a web page is accessed. The IE extension allows a user to specify her privacy preference at different sensitivity levels. It invokes the preference checking by sending the preference to the server.


Fig 2. Server-centric policy-preference matching

3.2 Server Components

We define a schema in DB2 for storing policy data in relational tables. This schema contains a table for every element defined in the P3P policy; the tables are linked using foreign keys reflecting the XML structure of the policies. We extend the IBM Tivoli Privacy Wizard (a web-based GUI tool with which web site owners define P3P policies) with the functionality of parsing and shredding P3P policies into a set of records in the database tables.

When the server receives the APPEL preference from the client, it translates the preference into SQL queries to be run against the policy tables. The SQL queries corresponding to the preference are submitted to the database engine, and the result of the query evaluation yields the action to be taken. The evaluation result is sent back to the client. If the policy does not conform to the preference, the IE extension blocks the web page and prompts the user with a message; otherwise, the requested web page is displayed.
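As an illustration of this translation step, a single APPEL rule such as "block if the site uses the home phone number for telemarketing" could be answered by one SQL query over the shredded policy tables. The table and column names below are hypothetical stand-ins, not the actual schema of [1].

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // Sketch of checking one preference rule against the policy tables.
    final class PreferenceChecker {
        static boolean violates(Connection conn, int policyId) throws Exception {
            String sql =
                "SELECT 1 FROM statement s " +
                " JOIN data_ref d ON d.statement_id = s.id " +
                " JOIN purpose p  ON p.statement_id = s.id " +
                " WHERE s.policy_id = ? " +
                "   AND d.ref = '#user.home-info.telecom.telephone' " +
                "   AND p.name = 'telemarketing'";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setInt(1, policyId);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next();   // a match means the policy conflicts with the preference
                }
            }
        }
    }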

References

1. Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, and Yirong Xu. Implementing P3P using database technology. In 19th Int'l Conference on Data Engineering, Bangalore, India, March 2003.

2. Lorrie Cranor, Marc Langheinrich, and Massimo Marchiori. A P3P Preference Exchange Language 1.0 (APPEL1.0). W3C Working Draft, April 2002.

3. Lorrie Cranor, Marc Langheinrich, Massimo Marchiori, Martin Presler-Marshall, and Joseph Reagle. The Platform for Privacy Preferences 1.0 (P3P1.0) Specification. W3C Recommendation, April 2002.

4. The World Wide Web Consortium. P3P 1.0: A New Standard in Online Privacy. Available from http://www.w3.org/P3P/brochure.html.


XQBE: A Graphical Interface for XQuery Engines

Daniele Braga, Alessandro Campi, and Stefano Ceri

Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy
{braga,campi,ceri}@elet.polimi.it

Abstract. XQuery is increasingly popular among computer scientists with a SQL background, since queries in XQuery and SQL require comparable skills to be formulated. However, the number of these experts is limited, and the availability of easier XQuery "dialects" could be extremely valuable. Something similar happened with QBE, initially proposed as an alternative to SQL, which has since become popular as the user-friendly query language supported by MS Access. We designed and implemented XQBE, a visual dialect of XQuery that uses hierarchical structures, coherent with the hierarchical nature of XML, to denote the input and output documents. Our demo consists of examples of queries in XQBE and shows how our prototype allows the user to switch between equivalent representations of the same query.

The diffusion of XML creates a pressing need to provide the capability to query XML data to a wide spectrum of users, typically lacking computer programming skills. This demonstration presents a user-friendly interface, based on an intuitive visual query language (XQBE, XQuery By Example), that we developed for this purpose, inspired by QBE [2]. QBE showed that a visual interface to a query language is effective in supporting the intuitive formulation of queries when the basic graphical constructs are close to the visual abstraction of the underlying data model. Accordingly, while QBE is a relational query language, based on the representation of tables, XQBE is based on the use of annotated trees, to adhere to the hierarchical nature of XML. XQBE was designed with the objectives of being intuitive and easy to map directly to XQuery. Our interface is capable of generating the visual representation of many XQuery statements that belong to a subset of XQuery, defined by our translation algorithm (sketched later).

XQBE allows for arbitrarily deep nesting of XQuery FLWOR expressions, construction of new XML elements, and restructuring of existing documents. However, the expressive power of XQBE is limited in comparison with XQuery, which is Turing-complete. The particular purpose of XQBE makes usability one of its critical success factors, and we considered this aspect during the whole design and implementation process. Still from a usability viewpoint, our prototype is a first step towards an integrated environment to support both XQuery and XQBE, where users alternate between the XQBE and XQuery representations.

Fig 1. A sample document (bib.xml)

XQBE is fully described in [1]. Here we only introduce its basics by means of the query (Q1) "List books published by Addison-Wesley after 1991, including their year and title", on the data in Figure 1. Its XQBE version is in Figure 2(a), while its XQuery version is shown below.
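Q1 mirrors a well-known W3C XML Query use case over bib.xml; the exact statement appears as a figure in the paper, and in that standard form it reads:

    <bib>
      {
        for $b in doc("bib.xml")/bib/book
        where $b/publisher = "Addison-Wesley" and $b/@year > 1991
        return
          <book year="{$b/@year}">
            { $b/title }
          </book>
      }
    </bib>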

A query is formed by a source part (on the left) and a construct part (on the right). Both parts contain labelled graphs that express properties of XML fragments: the source part describes the XML data to be matched, while the construct part specifies which parts are to be retained. The correspondence between the two parts is expressed by explicit bindings. XML elements in the target are represented as labelled rectangles, their attributes as black circles (with the name on the arc), and their PCDATA as an empty circle. In the construct part, the paths that branch out of a bound node indicate which of its contents are to be retained. In Figure 2(a) the source part matches the book elements with a year greater than 1991 and a publisher equal to "Addison-Wesley". The binding edge between the book nodes states that the result shall contain as many book elements as those matched. The trapezoidal bib node means that all the generated books are to be contained in one bib element.

The translation process translates an XQBE query into a sentence of the XQuery subset defined by the grammar in Figure 3. The translation generated for Q1 is an XQuery statement of the form given above. It is also possible to obtain the XQBE version of an XQuery statement; the automatically generated XQBE version of Q1 is shown in Figure 2(b).


Fig 2. The XQBE version of Q1 (a) and the automatically generated XQBE for Q1 (b).

Fig 3. EBNF specification of the XQuery subset expressible with XQBE.

3 Conclusions

The contribution of our work is the availability of an environment in which users can query XML data with a GUI, access the generated XQuery statement, and also visualize the XQBE version of a large class of XQuery statements. Moreover, they can modify either of the representations and observe the changes in the other.

References

1. D. Braga and A. Campi. A graphical environment to query XML data with XQuery. In Proc. of the 4th WISE, Roma (Italy), December 2003.

2. M. M. Zloof. Query-by-example: A data base language. IBM Systems Journal, 1977.


P2P-DIET: One-Time and Continuous Queries in Super-Peer Networks

Stratos Idreos, Manolis Koubarakis, and Christos Tryfonopoulos

Dept. of Electronic and Computer Engineering, Technical University of Crete, GR73100 Chania, Greece
{sidraios,manolis,trifon}@intelligence.tuc.gr

1 Introduction

In peer-to-peer (P2P) systems a very large number of autonomous computing nodes (the peers) pool together their resources and rely on each other for data and services. P2P systems are application-level virtual or overlay networks that have emerged as a natural way to share data and resources. The main application scenario considered in recent P2P data-sharing systems is that of one-time querying: a user poses a query (e.g., "I want music by Moby") and the system returns a list of pointers to matching files owned by various peers in the network. The user can then go ahead and download files of interest. The complementary scenario of selective dissemination of information (SDI), or selective information push, is also very interesting. In an SDI scenario, a user posts a continuous query to the system to receive notifications whenever certain resources of interest appear in the system (e.g., when a song by Moby becomes available). SDI can be as useful as one-time querying in many target applications of P2P networks, ranging from file sharing to more advanced applications such as alert systems for digital libraries, e-commerce networks, etc.

At the Intelligent Systems Laboratory of the Technical University of Crete, we have recently concentrated on the problem of SDI in P2P networks in the context of project DIET (http://www.dfki.de/diet). Our work, summarized in [3], has culminated in the implementation of P2P-DIET, a service that unifies one-time and continuous query processing in P2P networks with super-peers. P2P-DIET is a direct descendant of DIAS, a Distributed Information Alert System for digital libraries, that was presented in [4]. P2P-DIET combines one-time querying as found in other super-peer networks and SDI as proposed in DIAS. P2P-DIET has been implemented on top of the open source DIET Agents Platform (http://diet-agents.sourceforge.net/) and is available at http://www.intelligence.tuc.gr/p2pdiet.

A high-level view of the P2P-DIET architecture is shown in Figure 1(a) and a layered view in Figure 1(b). There are two kinds of nodes: super-peers and clients. All super-peers are equal and have the same responsibilities; thus the super-peer subnetwork is a pure P2P network. Each super-peer serves a fraction of the clients and keeps indices on the resources of those clients.

Fig 1. The architecture and the layered view of P2P-DIET

Clients can run on user computers. Resources (e.g., files in a file-sharing application) are kept at client nodes, although it is possible in special cases to store resources at super-peer nodes. Clients are equal to each other only in terms of download: clients download resources directly from the resource owner client. A client is connected to the network through a single super-peer node, which is the access point of the client. It is not necessary for a client to be connected to the same access point continuously, since client migration is supported in P2P-DIET. Clients can connect, disconnect or even leave the system silently at any time. To enable a higher degree of decentralization and dynamicity, we also allow clients to use dynamic IP addresses. Routing of queries (one-time or continuous) is implemented using minimum-weight spanning trees for the super-peer subnetwork. After connecting to the network, a client may publish resources by sending resource metadata to its access point, post a one-time query to discover matching resources, or subscribe with a continuous query to be notified when resources of interest are published in the future. A user may download a file at the time that he receives a notification, or save it in his saved-notifications folder for future use. Additionally, a client can download a resource even when he has migrated to another access point. The feature of stored notifications guarantees that notifications matching disconnected users will be delivered to them upon connection. If a resource owner is disconnected, the interested client may arrange a rendezvous with the resource. P2P-DIET also offers the ability to add or remove super-peers. Additionally, it supports a simple fault-tolerance protocol based on are-you-alive messages. Finally, P2P-DIET provides message authentication and message encryption. For the detailed protocols see [5].


The current implementation of P2P-DIET to be demonstrated supports the data model of [4], and is currently being extended to support a more expressive variant [4]. Each super-peer utilises efficient query processing algorithms based on indexing of resource metadata and queries, and a hierarchical organisation of queries (poset) that captures query subsumption as in [1]. A sophisticated index that exploits commonalities between continuous queries is maintained at each super-peer, enabling the quick identification of the continuous queries that match incoming resource metadata. In this area, our work extends and improves the indexing algorithms of SIFT [6], and it is reported in [2].
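Such an index can be pictured as an inverted list from words to the continuous queries that contain them, in the spirit of SIFT: a conjunctive word query matches an incoming document when all of its words occur in the document's metadata. The sketch below is an illustration of that idea, not the P2P-DIET implementation.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Inverted index over conjunctive word queries: a continuous query
    // matches an incoming document iff every one of its words appears
    // in the document's metadata.
    final class ContinuousQueryIndex {
        private final Map<String, Set<Integer>> word2queries = new HashMap<>();
        private final Map<Integer, Integer> querySize = new HashMap<>();

        void subscribe(int queryId, Set<String> words) {
            querySize.put(queryId, words.size());
            for (String w : words)
                word2queries.computeIfAbsent(w, k -> new HashSet<>()).add(queryId);
        }

        List<Integer> match(Set<String> docWords) {
            Map<Integer, Integer> hits = new HashMap<>();  // distinct matched words per query
            for (String w : docWords)
                for (int q : word2queries.getOrDefault(w, Set.of()))
                    hits.merge(q, 1, Integer::sum);
            List<Integer> matched = new ArrayList<>();
            for (Map.Entry<Integer, Integer> e : hits.entrySet())
                if (e.getValue() == querySize.get(e.getKey()))
                    matched.add(e.getKey());
            return matched;
        }
    }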

References

1. A. Carzaniga, D. S. Rosenblum, and A. L. Wolf. Design and evaluation of a wide-area event notification service. ACM Transactions on Computer Systems, 19(3):332–383, August 2001.

2. C. Tryfonopoulos and M. Koubarakis. Selective Dissemination of Information in P2P Networks: Data Models, Query Languages, Algorithms and Computational Complexity. Technical Report TUC-ISL-02-2003, Intelligent Systems Laboratory, Dept. of Electronic and Computer Engineering, Technical University of Crete, July 2003.

3. M. Koubarakis, C. Tryfonopoulos, S. Idreos, and Y. Drougas. Selective Information Dissemination in P2P Networks: Problems and Solutions. ACM SIGMOD Record, Special issue on Peer-to-Peer Data Management, K. Aberer (editor), 32(3), September 2003.

4. M. Koubarakis, T. Koutris, C. Tryfonopoulos, and P. Raftopoulou. Information Alert in Distributed Digital Libraries: The Models, Languages and Architecture of DIAS. In Proceedings of the 6th European Conference on Research and Advanced Technology for Digital Libraries (ECDL 2002), volume 2458 of Lecture Notes in Computer Science, pages 527–542, September 2002.

5. S. Idreos and M. Koubarakis. P2P-DIET: A Query and Notification Service Based on Mobile Agents for Rapid Implementation of P2P Applications. Technical Report TUC-ISL-01-2003, Intelligent Systems Laboratory, Dept. of Electronic and Computer Engineering, Technical University of Crete, June 2003.

6. T. W. Yan and H. Garcia-Molina. Index structures for selective dissemination of information under the Boolean model. ACM Transactions on Database Systems, 19(2):332–364, 1994.


A Hierarchical Storage and Archive Environment for Multidimensional Array Database Management Systems

Bernd Reiner and Karl Hahn

FORWISS (Bavarian Research Center for Knowledge-Based Systems), Technical University Munich, Boltzmannstr. 3, D-85747 Garching b. München, Germany
{reiner,hahnk}@forwiss.tu-muenchen.de

Abstract. The intention of this paper is to present HEAVEN, a solution for the intelligent management of large-scale datasets held on tertiary storage systems. We introduce the current state-of-the-art techniques for storage and retrieval of large spatio-temporal array data in the High Performance Computing (HPC) area. A major bottleneck identified today is fast and efficient access to, and evaluation of, high performance computing results. We address the necessity of developing techniques for the efficient retrieval of requested subsets of large datasets from mass storage devices. Furthermore, we show the benefit of managing large spatio-temporal data sets, e.g. generated by simulations of climate models, with Database Management Systems (DBMS). This means DBMS need a smart connection to tertiary storage systems with optimized access strategies. HEAVEN is based on the multidimensional array DBMS RasDaMan.

1 Introduction

Large-scale scientific experiments often generate large amounts of multidimensional data sets. Data volume may reach hundreds of terabytes (up to petabytes). Typically, these data sets are stored permanently as files in an archival mass storage system, on up to thousands of magnetic tapes. The access and/or transfer times of these kinds of tertiary storage devices, even if robotically controlled, are relatively slow. Nevertheless, tertiary storage systems are currently the common state of the art for storing such large volumes of data. Concerning data access in the HPC area, the main disadvantages are high access latency compared to hard disk devices and the lack of direct access. A major bottleneck for scientific applications is the missing possibility of accessing specific subsets of data: if only a subset of such a large data set is required, the whole file must be transferred from the tertiary storage media. Taking into account the time required to load, search, read, rewind and unload several cartridges, it can take many hours or days to retrieve a subset of interest from a large data set. Entire files must be loaded from magnetic tape, even if only a subset of the file is needed for further processing. Processing data across a multitude of data sets, for example time slices, is hard to support. Evaluation of search criteria requires network transfer of each required data set, sometimes implying a prohibitively immense transfer overhead.
