TRIP PLANNING USE CASE

Excerpt from Strydom, M., AI and Big Data's Potential for Disruptive Innovation, 2020 (pages 143-148)

Tourists browse multiple sources of dispersed information, drawn from different systems, before making decisions about their trip. They are thus exposed to an overwhelming amount of information that prevents them from choosing the most suitable travel destination, mode and activities. It is therefore desirable to build intelligent systems that assist them in the decision-making process and enable them to acquire additional, detailed knowledge, supporting their trip planning with more prominent and better-tailored destination-specific information.

To this end, the authors aim to build an ontology-based integration framework for Open Tourism Data that provides an integrated view across these complex and heterogeneous data sources. This integration framework will allow the development of reliable and efficient cross-language tourism-related applications that expose the points of interest most attractive to a particular tourist and support that tourist in a travel planning scenario.
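To make the idea of an integrated view more concrete, the sketch below renames the fields of two differently shaped sources (a JSON event feed and a CSV accommodation listing) to one shared set of property names. It is a minimal illustration only, not the authors' framework: the `MAPPINGS` table, the source field names and the sample records are all hypothetical.

```python
import csv
import io
import json

# Hypothetical mapping tables: source field name -> shared property name.
MAPPINGS = {
    "events": {"title": "name", "lat": "latitude", "lon": "longitude"},
    "hotels": {"Name": "name", "Latitude": "latitude", "Longitude": "longitude"},
}


def to_shared_view(source, record):
    """Rename the fields of one source record to the shared property names."""
    view = {"source": source}
    for field, prop in MAPPINGS[source].items():
        value = record[field]
        # Coordinates arrive as strings in CSV and as numbers in JSON.
        view[prop] = float(value) if prop in ("latitude", "longitude") else value
    return view


def integrate(events_json, hotels_csv):
    """Parse both raw inputs and return one list of uniformly shaped records."""
    events = json.loads(events_json)
    hotels = csv.DictReader(io.StringIO(hotels_csv))
    merged = [to_shared_view("events", e) for e in events]
    merged += [to_shared_view("hotels", h) for h in hotels]
    return merged


# Made-up sample inputs, one per source format.
EVENTS_JSON = '[{"title": "Jazz night", "lat": 48.86, "lon": 2.35}]'
HOTELS_CSV = "Name,Latitude,Longitude\nHotel Exemple,48.85,2.34\n"

if __name__ == "__main__":
    for item in integrate(EVENTS_JSON, HOTELS_CSV):
        print(item)
```

An ontology-based approach goes further than this flat renaming, since it also models types and relations between entities; the sketch only shows the first step of giving heterogeneous records a common shape.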

As a typical use case, the authors aim to combine multiple Open Tourism Data sources from several domains to provide integrated, consistent and representative information on tourists' points of interest for a specific destination (the Ile-de-France region of France). This trip planning scenario aims to provide a high-level representation of a tourist's possible points of interest for this destination. To gather the major possible points of interest of a tourist planning a trip to the Ile-de-France region, the authors consider the cross-domain heterogeneous data sources listed in Table 2.

Table 2. Multiple data sets gathered from cross-domain heterogeneous data sources

Data set: DBpedia
Format: RDF
Domain: Cross-Domains
Description:
- Large-scale multilingual encyclopedic data set able to link other data sets to make them available on the Web.
- Allows creating a corpus for different fields through sophisticated SPARQL queries about any topic available in Wikipedia.
- Considered a nucleus for a web of Open Data (Auer et al., 2007) and allows the development of cross-domain applications (Musto, Lops, de Gemmis, & Semeraro, 2017).

Data set: Schema.org
Format: RDF
Domain: Cross-Domains
Description:
- Collaborative community activity with a mission to create, maintain, promote and share schemas for structured data on the web (Guha, Brickley, & MacBeth, 2015).
- Provides a single integrated schema covering a variety of topics, including persons, places, events, products, offers, and so on.
- The schemas are a hierarchically organized set of types, each associated with a set of properties (597 types, 867 properties, and 114 enumeration values).

Data set: French communal administrative division [note 3]
Format: CSV
Domain: Geo-Positions
Description:
- Data is provided by the OpenStreetMap project, the mapping wiki that creates and provides free ODbL-licensed geographic data.
- Contains all the French communes and boroughs, with the following attributes:
  - Insee: 5-character INSEE code of the commune.
  - Name: name of the municipality (as shown in OpenStreetMap).
  - Wiki: Wikipedia entry (language code followed by the article name).
  - Surf_ha: area of the municipality in hectares.

Data set: French Tourist accommodations [note 4]
Format: XLS
Domain: Accommodations
Description:
- Comes from a certified public service (National Institute of Statistics and Economic Studies - INSEE).
- Provides statistics on tourist accommodation capacities, with the following attributes:
  - Name, company name, address, Tel, postal code.
  - City, latitude, longitude, Email, category, number of rooms.
  - Method of payment, pricing, policies.

Data set: Geo-located Events [note 5]
Format: JSON
Domain: Attractions
Description:
- Open Agenda is a calendar platform for event organizers: festivals, theaters, concerts, exhibitions, conferences, etc.
- Presents events with their precise geographic coordinates and detailed time slots, with the following attributes:
  - Title, organization, start date, event type.
  - Event place, theme, style, genre.
  - Longitude, latitude, time slot.

Data set: Geographical positions of RATP network stations [note 6]
Format: CSV
Domain: Geo-Positions Transport
Description:
- Comes from a certified public service, namely RATP (Régie autonome des transports parisiens).
- RATP provides all urban and interurban public transport modes: rail, metro, tram and bus.
- Data is updated automatically approximately every 15 days and includes all transport offers in Paris.

Data set: PASSIM offers, services and transport data in France [note 7]
Format: CSV
Domain: Transport
Description:
- Comes from a public service certified and supported by the Ministry of Sustainable Development.
- Lists and describes the transport service offers and the information services useful for traveling in France, for all modes of transport, with the following attributes:
  - Company name, operator, and type of transport.
  - Mode of transport, pricing, reservation URL.

Data set: Location of Wi-Fi hotspots [note 8]
Format: JSON
Domain: Infrastructure
Description:
- Comes from a certified public service (Mairie de Paris) and lists sites (locations) with a Wi-Fi hotspot offering a free internet connection, with the following attributes:
  - Id ArcGIS, city, latitude, longitude.
  - Quota user max bytes and quota user max duration.

Data set: Population [note 9]
Format: HTML
Domain: Demographic
Description:
- Gives access to the results of population censuses and to time series of the INSEE Macroeconomic Data Bank.
- Provides statistics on population and other data, with the following attributes:
  - Region code, region name, department code.
  - Department name, municipal population.

Data set: Open Food Facts [note 10]
Format: RDF
Domain: Nutrition
Description:
- Lists food products around the world.
- Collects food information (photos, ingredients, nutritional composition, etc.) and makes it available to everyone for any purpose.
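Several of the data sets in Table 2 carry latitude and longitude attributes, which gives them a natural common key for combination. As an illustration, the sketch below finds the transport station closest to an event using the haversine great-circle distance. The record layout, the sample coordinates and the function names are assumptions made for this example; the chapter does not specify how the authors join these sources.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(event, stations):
    """Return the station record with the smallest distance to the event."""
    return min(
        stations,
        key=lambda s: haversine_km(
            event["latitude"], event["longitude"],
            s["latitude"], s["longitude"],
        ),
    )


# Illustrative records only; names and coordinates are made up.
EVENT = {"title": "Exhibition", "latitude": 48.8606, "longitude": 2.3376}
STATIONS = [
    {"name": "Station A", "latitude": 48.8625, "longitude": 2.3364},
    {"name": "Station B", "latitude": 48.8443, "longitude": 2.3730},
]

if __name__ == "__main__":
    print(nearest_station(EVENT, STATIONS)["name"])
```

A linear scan like this is adequate for a handful of stations; for a full network, a spatial index would be the more usual choice.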

Deployment Architecture

The proposed ontology-based integration framework for Open Tourism Data has been built on the Cloudera Distribution Hadoop platform, an open source Apache Hadoop distribution. It offers a scalable, flexible and integrated platform for rapidly managing massive amounts of data. More specifically, the proposed integration framework is implemented in fully distributed mode over a cluster of 10 slave machines and one master machine. Each machine is equipped with a 3.4 GHz Intel Core i3 processor, 4 GB of memory, Java 1.8 and Ubuntu 14.04 LTS.

In addition, the resource decomposition strategy leverages HBase, a NoSQL column-oriented key/value data store built to run on top of the Hadoop Distributed File System (HDFS). It supports random, real-time read/write access and allows the use of a larger in-memory cache, which effectively reduces execution time and improves the performance of the ontology-based integration framework.
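HBase addresses each stored value by a row key and a column written as family:qualifier. The sketch below shows, in plain Python with no HBase client, how one integrated record might be decomposed into such cells. The row-key scheme, the column family names `info` and `geo`, and the sample record are hypothetical; the chapter does not describe the authors' actual table design.

```python
def make_row_key(source, identifier):
    """Build a row key such as b'events|42' (hypothetical scheme)."""
    return f"{source}|{identifier}".encode("utf-8")


# Hypothetical assignment of record fields to column families.
FAMILY_OF = {
    "name": "info",
    "category": "info",
    "latitude": "geo",
    "longitude": "geo",
}


def decompose(source, identifier, record):
    """Return (row_key, column, value) byte triples for one record."""
    row_key = make_row_key(source, identifier)
    cells = []
    for field in sorted(record):
        family = FAMILY_OF.get(field, "info")  # unknown fields default to 'info'
        column = f"{family}:{field}".encode("utf-8")
        value = str(record[field]).encode("utf-8")
        cells.append((row_key, column, value))
    return cells


# Made-up sample record.
RECORD = {"name": "Jazz night", "latitude": 48.86, "longitude": 2.35}

if __name__ == "__main__":
    for cell in decompose("events", 42, RECORD):
        print(cell)
```

Grouping coordinates in their own column family is one possible design choice: a query that needs only positions could then read that family without touching the descriptive fields.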
