

DRAFT – The Rosetta Model: Can the Different Physical Science Data Models be Reconciled? – DRAFT

Todd A King1 (tking@igpp.ucla.edu)

Deborah L McGuinness2,3 (dlm@ksl.stanford.edu)

Raymond J Walker1 (rwalker@igpp.ucla.edu)

Peter Fox4 (pfox@ucar.edu)

D Aaron Roberts5 (aaron.roberts@nasa.gov)

Christopher Harvey6 (christopher.harvey@cesr.fr)

1 Institute of Geophysics and Planetary Physics/UCLA, 2835 Slichter Hall, Los Angeles, CA 90095-1567, United States

2 Rensselaer Polytechnic Institute

3 Stanford University, 353 Serra Hall, Stanford, CA 94305, United States

4 UCAR, 1850 Table Mesa Drive, Boulder, CO 80305, United States

5 NASA/NSSDC, Code 692 NASA Goddard Space Flight Center, Greenbelt, MD 20771

6 Centre de Données de la Physique des Plasmas (CDPP), 18 avenue Edouard Belin, TOULOUSE 31 401, France

1 Abstract

There are a variety of data models in the physical sciences, some of which cover overlapping domains. Each of the data models has been derived in a different way: some have been based on formal ontologies, others on informal ontologies, and others on relational schemas. An additional complication is that different international agencies have divided the physical science domains into different sub-domains, leading to some confusion as to which data model to adopt. The most prevalent data models in use today are those of the Planetary Data System (PDS), the Space Physics Archive Search and Extract (SPASE), the Virtual Solar Terrestrial Observatory (VSTO), the International Virtual Observatory Alliance (IVOA) and the Global Change Master Directory (GCMD). We take a comparative look at the various data models and ask the questions: Can they be reconciled? Is it possible to have a Rosetta Model to translate between the models? What role can ontologies play in defining a Rosetta Model?

2 Descriptions and Metadata

There are many different information models and classification ontologies in use today. Each is designed for a particular application; some are very general and others are tailored for a specific discipline. Some of the most widely used are:

CAA: Cluster Active Archive. Designed to support the archiving and distribution of high quality calibrated data products from ESA's Cluster mission, using an approach general enough to be applicable to other environments. It has a Mission, Observatory, Instrument hierarchy. The recovered data and metadata are adequate for API use. Scope: 480 terms (198 classes and 282 enumeration elements). Purpose: Resource Discovery, Resource Sharing, Archiving, Content Classification. Specification: Narrative and XML Schema.

Dublin Core: Originally designed for information resources (documents), it has been expanded to include data, images, movies, and other types of resources. Scope: 27 terms (15 core, 12 element types). Purpose: Resource Discovery (published works). Specification: Narrative.


IVOA: The International Virtual Observatory Alliance (IVOA) defines a set of standards to "facilitate the international coordination" of the "utilization of astronomical archives as an integrated and interoperating virtual observatory." Standards set by the IVOA include VOTable, VOResource, and Unified Content Descriptors (UCD). Scope: 63 terms (6 categories, 57 terms) and 486 UCD terms for data classification. Purpose: Resource Discovery (data, collections, services, and curation) and Content Classification. Specification: Narrative and XML Schema.

OAI-ORE: The Object Reuse and Exchange (ORE) activity of the Open Archives Initiative (OAI) is developing specifications that allow distributed repositories to exchange information about their constituent digital objects. The first release of the ORE specifications is scheduled for March 8, 2008. OAI-ORE is distinct from OAI-PMH (a protocol for exchanging metadata). Scope: Conceptual only. Purpose: Compound Object Description.

PDS3: The Planetary Data System (PDS) defines a data set nomenclature designed to be consistent across discipline boundaries, together with standards for labeling data files. Its intent is to archive planetary science data and supporting information to enable effective use and interpretation. Scope: 14,458 terms (1,643 elements, 81 objects, and 12,734 standard values, including 2,848 target names, 144 volume sets, 1,966 volumes and 1,370 data set IDs). Purpose: Archiving. Specification: Narrative, ODL with PDS vocabulary.

SPASE: The Space Physics Archive Search and Extract (SPASE) is a data model designed for the Solar and Space Physics communities to unify the data environment in order to facilitate finding, retrieving, formatting, and obtaining basic information about data essential for research. Scope: 340 terms (10 resource types, 35 entities (containers), 30 enumerations, 55 attributes, and 265 items that are values used in enumerations (controlled lists)). Purpose: Resource Discovery, Resource Sharing and Content Classification. Specification: Narrative, XML Schema and XMI.

SWEET: Semantic Web for Earth and Environmental Terminology (SWEET) provides a common semantic framework for various Earth science initiatives. There are 17 ontologies, including biosphere, human_activities, process, substance, data_center, material_thing, property, sunrealm, data, numerics, sensor, time, earthrealm, phenomena, space, and units. Scope: 3,940 terms (17 ontologies). Purpose: Reference Model. Specification: OWL.

VSTO: Virtual Solar Terrestrial Observatory. Originally designed as a set of ontologies for organizing and integrating information spanning upper atmospheric terrestrial physics to solar physics. Fundamental classes include instrument, observatory, data, and services. Its upper level has been reused in other science areas, including volcanology and plate tectonics. Scope: 407 terms (one ontology with 35 top-level classes). Purpose: Resource Discovery, Resource Sharing, and Content Classification. Specification: OWL.
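PDS3 labels, listed above as "ODL with PDS vocabulary", are written as KEYWORD = VALUE statements, with OBJECT = name ... END_OBJECT = name blocks for nesting and a final END statement. The sketch below reads a simplified label of that shape into nested dictionaries. It is a minimal illustration, assuming one statement per line; it ignores the comments, multi-line values, units and quoting rules that real PDS3 labels use, and the sample label is invented rather than taken from a PDS product.

```python
"""Minimal sketch of reading a simplified PDS3/ODL-style label.

Assumptions (this is not a full ODL parser): one `KEYWORD = VALUE`
statement per line, no comments, no multi-line values, no units.
"""

SAMPLE_LABEL = """\
PDS_VERSION_ID = PDS3
RECORD_TYPE = FIXED_LENGTH
OBJECT = TABLE
  ROWS = 100
  COLUMNS = 3
END_OBJECT = TABLE
END
"""


def parse_label(text: str) -> dict:
    """Parse KEYWORD = VALUE lines, nesting OBJECT ... END_OBJECT blocks."""
    root: dict = {}
    stack = [root]  # stack[-1] is the innermost open object
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "END":  # end of the label
            break
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip().strip('"')
        if key == "END_OBJECT":  # accept it with or without "= name"
            if len(stack) == 1:
                raise ValueError("END_OBJECT without matching OBJECT")
            stack.pop()
        elif not sep:
            raise ValueError(f"not a KEYWORD = VALUE statement: {line!r}")
        elif key == "OBJECT":
            child: dict = {}
            # The same object type may occur more than once, so keep a list.
            stack[-1].setdefault(value, []).append(child)
            stack.append(child)
        else:
            stack[-1][key] = value
    if len(stack) != 1:
        raise ValueError("unclosed OBJECT block")
    return root


label = parse_label(SAMPLE_LABEL)
print(label["PDS_VERSION_ID"])    # PDS3
print(label["TABLE"][0]["ROWS"])  # 100
```

Keeping each object type as a list reflects that a label may describe several objects of the same kind; all values are left as strings, since typing them would require the PDS data dictionary.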


3 The Rosetta Model

• Data Structure (Digital)

o Catalog (record collection)

o Table (row, column)


4 Cluster Active Archive

Designed to support the archiving and distribution of high quality calibrated data products from ESA's Cluster mission, using an approach general enough to be applicable to other environments. It has a Mission, Observatory, Instrument hierarchy. The recovered data and metadata are adequate for API use.

From the Cluster Metadata Dictionary (Issue 2, Rev. 2, May 4, 2006):

Metadata is information which describes a dataset. It should be complete, that is, contain all the information required to read and interpret the bits (syntactic description), and to understand what the resulting numerical values (or bit strings) represent (semantic description), including how the data was obtained; the latter information impacts upon the scientific significance of the data. The purpose of the CAA Metadata Dictionary is to describe fully the required CAA metadata information, and to explain how that information must be formatted so as to be exploitable by the generic software of the Cluster Active Archive.

There are 6 top-level CAA concepts or classes:

Mission: This level contains information relevant to the whole mission.

Observatory: The Cluster mission consists of 4 observatories: Cluster-1, Cluster-2, Cluster-3, and Cluster-4.

Experiment: The Cluster mission has 11 experiments, each identified by its Principal Investigator, plus the auxiliary data.

Instrument: The Cluster instruments are identified by Observatory and Experiment.

Dataset: Each instrument produces one or more datasets; this level of metadata is common to the whole of each dataset.

Parameter: A dataset contains one or more parameters, each of which has its own metadata.

File: Each dataset is composed of files, the number of which will grow regularly with time during the CAA.

For CAA, there will be:

one block of metadata at the mission level (for the Cluster mission),

four blocks at the observatory level (Cluster-1, Cluster-2, Cluster-3, Cluster-4),

eleven blocks at the experiment level (one for each of the eleven instruments),

sixty blocks of metadata (listed on page 32) at the instrument level, plus

a further six blocks of metadata for the various auxiliary data products.

To recover all the metadata relating to any one dataset, it is necessary to know the relation between these blocks of metadata. For example, when looking at the metadata associated with the CIS-1 instrument (the CIS instrument on Spacecraft 1), it is necessary to know that it is associated with metadata concerning the Experiment CIS and the Observatory Spacecraft-1, and that these are in turn associated with the Mission Cluster. Linkage between the different levels (illustrated by the arrows in Fig 1) is provided at each level by concept keywords included specially for this purpose.
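The linkage described above can be pictured as each metadata block naming the higher-level blocks it belongs to. The sketch below walks from the CIS-1 instrument block up to the Cluster mission and collects every block relevant to it. The `parents` field is a hypothetical stand-in for the CAA concept keywords, whose real names are defined in the Cluster Metadata Dictionary and are not reproduced here. Because an instrument links to both an experiment and an observatory, the links form a small graph rather than a tree, so each block is visited only once.

```python
"""Sketch of resolving CAA-style metadata blocks for one instrument.

The block identifiers and the `parents` field are hypothetical; they stand
in for the concept keywords that link levels in the real CAA metadata.
"""
from dataclasses import dataclass, field


@dataclass
class Block:
    level: str  # Mission, Observatory, Experiment, Instrument, ...
    name: str
    parents: list[str] = field(default_factory=list)  # ids of linked higher-level blocks


BLOCKS = {
    "Cluster": Block("Mission", "Cluster"),
    "Spacecraft-1": Block("Observatory", "Spacecraft-1", ["Cluster"]),
    "CIS": Block("Experiment", "CIS", ["Cluster"]),
    "CIS-1": Block("Instrument", "CIS-1", ["CIS", "Spacecraft-1"]),
}


def resolve(block_id: str, blocks: dict[str, Block]) -> list[Block]:
    """Return the block and every block reachable through parent links,
    each visited once, in depth-first order starting from block_id."""
    seen: set[str] = set()
    ordered: list[Block] = []

    def visit(bid: str) -> None:
        if bid in seen:
            return
        seen.add(bid)
        block = blocks[bid]  # a KeyError here signals a broken link
        ordered.append(block)
        for parent in block.parents:
            visit(parent)

    visit(block_id)
    return ordered


for b in resolve("CIS-1", BLOCKS):
    print(b.level, b.name)
```

Running it lists the Instrument, Experiment, Mission and Observatory blocks for CIS-1, with the shared Cluster mission block appearing only once.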

Overall Characteristics

Scope: 480 terms (198 classes, 282 enumeration elements)

Purpose: Resource Discovery, Resource Sharing, Archiving, Content Classification

Specification: XML Schema

References

[CAA] Cluster Metadata Dictionary

http://caa.estec.esa.int/documents/DataD_V22.pdf


Simple Dublin Core

The Simple Dublin Core Metadata Element Set (DCMES) consists of 15 metadata elements.

Full information on element definitions and term relationships can be found in the Dublin Core Metadata Registry [DCMR].
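Because Simple Dublin Core is a flat set of 15 elements, checking a record against it is straightforward. The sketch below uses the DCMES element names (title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights). The record layout, a dictionary mapping element names to lists of values, and the example values are assumptions for illustration only. Every DCMES element is optional and repeatable, so the check only reports names that are not DCMES elements.

```python
"""Sketch: check a record against the 15 Simple Dublin Core elements.

The dict-of-lists record layout is an illustrative assumption; DCMES
itself does not prescribe a serialization.
"""

DCMES_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def unknown_elements(record: dict[str, list[str]]) -> list[str]:
    """Return the element names in `record` that are not DCMES elements.

    Every DCMES element is optional and repeatable, so neither missing
    elements nor multiple values are reported.
    """
    return sorted(name for name in record if name not in DCMES_ELEMENTS)


record = {  # hypothetical example values
    "title": ["Cluster CIS-1 moments"],
    "creator": ["CIS team"],
    "date": ["2006-05-04"],
    "instrument": ["CIS"],  # not a DCMES element
}
print(len(DCMES_ELEMENTS))       # 15
print(unknown_elements(record))  # ['instrument']
```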

Qualified Dublin Core

Subsequent to the specification of the original 15 elements, an ongoing process was begun to develop exemplary terms extending or refining the Dublin Core Metadata Element Set (DCMES). The additional terms were identified, generally in working groups of the Dublin Core Metadata Initiative (DCMI), and judged by the DCMI Usage Board to be in conformance with principles of good practice for the qualification of Dublin Core metadata elements.

Element refinements make the meaning of an element narrower or more specific. A refined element shares the meaning of the unqualified element, but with a more restricted scope. The guiding principle for the qualification of Dublin Core elements, colloquially known as the Dumb-Down Principle, states that an application that does not understand a specific element refinement term should be able to ignore the qualifier and treat the metadata value as if it were an unqualified (broader) element. While this may result in some loss of specificity, the remaining element value (without the qualifier) should continue to be generally correct and useful for discovery.
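The Dumb-Down Principle can be shown with a few lines of code. The sketch assumes, purely for illustration, that qualified terms are written as `element.refinement` (for example `date.created` or `description.abstract`); with that convention, "ignoring the qualifier" amounts to keeping the part before the first dot. The function name, the naming convention and the record values are hypothetical, not a DCMI encoding.

```python
"""Sketch of the Dublin Core Dumb-Down Principle.

Assumption for illustration: qualified terms are written as
"element.refinement" (e.g. "date.created"). An application that does not
understand a refinement drops it and keeps the value under the element.
"""


def dumb_down(record: dict[str, str], understood: set[str]) -> dict[str, list[str]]:
    """Group values under the term names this application can interpret.

    Terms the application understands are kept as-is; for any other
    "element.refinement" term the qualifier is ignored and the value is
    treated as belonging to the broader, unqualified element.
    """
    result: dict[str, list[str]] = {}
    for term, value in record.items():
        name = term if term in understood else term.split(".", 1)[0]
        result.setdefault(name, []).append(value)
    return result


record = {  # illustrative values only
    "title": "Cluster Metadata Dictionary",
    "title.alternative": "CAA Metadata Dictionary",
    "date.created": "2006-05-04",
    "description.abstract": "Describes the required CAA metadata.",
}

# A simple application that knows only the unqualified elements:
print(dumb_down(record, understood=set()))
```

The values survive intact and stay useful for discovery; only the finer distinction (for example, that the date is a creation date) is lost, which is exactly the trade-off the principle accepts.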

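DCMI also maintains a small, general vocabulary recommended for use within the element Type. This vocabulary currently consists of 12 terms: Collection, Dataset, Event, Image, InteractiveResource, MovingImage, PhysicalObject, Service, Software, Sound, StillImage, and Text.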

A value expressed using an encoding scheme may thus be a token selected from a controlled vocabulary (e.g., a term from a classification system or set of subject headings) or a string formatted in accordance with a formal notation (e.g., "2000-12-31" as the standard expression of a date). If an encoding scheme is not understood by an application, the value may still be useful to a human reader.
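Both kinds of encoding scheme, and the fallback for an unknown scheme, can be sketched as follows. `W3CDTF` (a formal date notation) and `DCMIType` (the controlled Type vocabulary) are DCMI encoding schemes; here W3CDTF is simplified to full `YYYY-MM-DD` dates parsed with `datetime.date.fromisoformat`, although the real notation also allows, for example, a bare year. The function name and the `LocalDateFormat` scheme in the usage lines are hypothetical.

```python
"""Sketch: interpret a value according to its encoding scheme.

Handled schemes (simplified): "W3CDTF" for full YYYY-MM-DD dates only,
and "DCMIType" for terms of the DCMI Type Vocabulary. Values with an
unknown scheme are returned unchanged, for a human reader.
"""
from datetime import date
from typing import Optional, Union

DCMI_TYPES = frozenset({
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
})


def interpret(value: str, scheme: Optional[str]) -> Union[date, str]:
    """Return a parsed value when the scheme is understood, else the raw string."""
    if scheme == "W3CDTF":
        return date.fromisoformat(value)  # formal notation -> date object
    if scheme == "DCMIType":
        if value not in DCMI_TYPES:
            raise ValueError(f"{value!r} is not a DCMI Type term")
        return value  # token from a controlled vocabulary
    return value  # scheme not understood: still useful to a human reader


print(interpret("2000-12-31", "W3CDTF"))           # 2000-12-31
print(interpret("Dataset", "DCMIType"))            # Dataset
print(interpret("31/12/2000", "LocalDateFormat"))  # 31/12/2000
```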

Overall Characteristics

Scope: 27 terms (15 core, 12 element types) [5]

Purpose: Resource Discovery (published works)

Specification: Narrative

References

[DCMR] Dublin Core Official web site

http://dublincore.org/dcregistry/


[DCENC] Dublin Core Encoding Guidelines

http://dublincore.org/resources/expressions/

[DCXML] Guidelines for implementing Dublin Core in XML

http://dublincore.org/documents/abstract-model/


The IVOA Resource Metadata specification (VOResource) permits describing the following attributes of a resource [VORES]:

Subject, Description, Source, ReferenceURL, Type, ContentLevel, Relationship, RelationshipID

Collection and service content metadata

Facility, Instrument, Coverage.Spatial, Coverage.RegionOfRegard, Coverage.Spectral, Coverage.Spectral.Bandpass, Coverage.Spectral.MinimumWavelength, Coverage.Spectral.MaximumWavelength, Coverage.Temporal.StartTime, Coverage.Temporal.StopTime, Coverage.Depth, Coverage.ObjectDensity, Coverage.ObjectCount, Coverage.SkyFraction, Resolution.Spatial, Resolution.Spectral, Resolution.Temporal, UCD, Format, Rights

Data quality metadata

DataQuality, ResourceValidationLevel, ResourceValidatedBy, Uncertainty.Photometric, Uncertainty.Spatial, Uncertainty.Spectral, Uncertainty.Temporal

Service metadata

Service.AccessURL, Service.InterfaceURL, Service.BaseURL, Service.HTTPResultsMIMEType, Service.StandardID, Service.MaxSearchRadius, Service.MaxReturnRecords, Service.MaxReturnSize
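Many of the VOResource attribute names above are dotted paths (`Coverage.Spectral.MinimumWavelength`, `Service.AccessURL`) that encode a hierarchy in a flat list. The sketch below folds such flat name/value pairs into nested dictionaries. Some names are both a value and a parent (`Coverage.Spectral` and `Coverage.Spectral.Bandpass`), so such a node keeps its own value under a hypothetical `_value` key. The input values are invented, and the function is not part of any IVOA software.

```python
"""Sketch: fold dotted VOResource-style attribute names into a tree.

Some names are both a value and a parent (e.g. Coverage.Spectral and
Coverage.Spectral.Bandpass); such a node keeps its own value under the
hypothetical key "_value".
"""


def nest(flat: dict[str, str]) -> dict:
    tree: dict = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = tree
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                # Promote an existing leaf value (or nothing) to a subtree.
                child = {} if child is None else {"_value": child}
                node[part] = child
            node = child
        if isinstance(node.get(leaf), dict):
            node[leaf]["_value"] = value  # a leaf that is also a parent
        else:
            node[leaf] = value
    return tree


flat = {  # illustrative values only
    "Instrument": "Spectrograph",
    "Coverage.Spectral": "Optical",
    "Coverage.Spectral.Bandpass": "V",
    "Coverage.Temporal.StartTime": "2007-01-01",
    "Service.AccessURL": "http://example.org/query",
}
tree = nest(flat)
print(tree["Coverage"]["Spectral"])
# {'_value': 'Optical', 'Bandpass': 'V'}
```

The result does not depend on the order of the flat input: a parent seen after its children is stored under `_value`, and a leaf seen before its children is promoted to a subtree.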

Unified Content Descriptors

Unified Content Descriptors (UCD) form a formal vocabulary for astronomical data that is controlled by the International Virtual Observatory Alliance (IVOA). The vocabulary is restricted in order to avoid proliferation of terms and synonyms, and controlled in order to avoid ambiguities. A UCD is used to classify a token of information; for example, it may be used to identify the type of information in a field of a table or a tagged value in a metadata description [VOUCD].


All existing UCD1+ words are grouped into 12 main categories. These categories are expressed by the first atom of the word, whose possible values are:

8 pos (positional data)

9 spect (spectral data)

10 src (source)

11 stat (statistics)

12 time (time)
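In UCD1+, a descriptor may combine several words separated by semicolons, and each word is a sequence of atoms separated by dots, so the category of a word is simply its first atom. The minimal sketch below extracts those categories, assuming well-formed input; it does not validate words against the official UCD1+ word list. The example strings, such as `pos.eq.ra;meta.main`, are typical UCD1+ descriptors.

```python
"""Sketch: split a UCD1+ descriptor into words and read each word's category.

Assumes well-formed input; it does not check words against the official
UCD1+ vocabulary.
"""


def ucd_categories(ucd: str) -> list[str]:
    """Return the first atom (category) of each word in a UCD string."""
    words = [w.strip() for w in ucd.split(";") if w.strip()]
    return [w.split(".", 1)[0] for w in words]


print(ucd_categories("pos.eq.ra;meta.main"))  # ['pos', 'meta']
print(ucd_categories("phot.mag;em.opt.V"))    # ['phot', 'em']
print(ucd_categories("time.epoch"))           # ['time']
```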

VOTable

The VOTable format is an XML standard for the interchange of data represented as a set of tables [VOTAB]. It extends the HTML Table specification by adding metadata to describe the contents of the table, including the data type, units and classification of the contents of each field. The VOTable format also permits binary data to be encoded in the table or referenced as external streams.
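To make that structure concrete: a VOTable nests RESOURCE, TABLE, FIELD and DATA/TABLEDATA elements, and each FIELD can carry name, datatype, unit and ucd attributes. The sketch below parses a tiny hand-written VOTable with Python's standard `xml.etree.ElementTree` and uses the FIELD datatypes to convert cell values. The table content is invented, the datatype-to-Python mapping covers only a few types, and the VOTable XML namespace that real documents declare is omitted to keep the element lookups short.

```python
"""Sketch: read column metadata and rows from a minimal VOTable.

The document below is invented and omits the VOTable XML namespace that
real files declare; with a namespace, the tag lookups need its prefix.
"""
import xml.etree.ElementTree as ET

VOTABLE = """\
<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE name="sources">
      <FIELD name="ra" datatype="double" unit="deg" ucd="pos.eq.ra;meta.main"/>
      <FIELD name="dec" datatype="double" unit="deg" ucd="pos.eq.dec;meta.main"/>
      <DATA>
        <TABLEDATA>
          <TR><TD>10.68</TD><TD>41.27</TD></TR>
          <TR><TD>83.82</TD><TD>-5.39</TD></TR>
        </TABLEDATA>
      </DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>
"""

# Partial mapping from VOTable datatypes to Python types (unknown -> str).
CASTS = {"double": float, "float": float, "int": int, "long": int, "char": str}


def read_table(xml_text: str):
    """Return (fields, rows); each row's values are cast using FIELD datatypes."""
    table = ET.fromstring(xml_text).find("RESOURCE/TABLE")
    fields = [f.attrib for f in table.findall("FIELD")]
    casts = [CASTS.get(f.get("datatype"), str) for f in fields]
    rows = [
        tuple(cast(td.text) for cast, td in zip(casts, tr.findall("TD")))
        for tr in table.findall("DATA/TABLEDATA/TR")
    ]
    return fields, rows


fields, rows = read_table(VOTABLE)
for f in fields:
    print(f["name"], f["unit"], f["ucd"])
print(rows)  # [(10.68, 41.27), (83.82, -5.39)]
```

Because the column metadata travels with the table, a client can interpret the values (units, physical meaning via the UCD) without any out-of-band description.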

Overall Characteristics

Scope: 63 terms (6 categories, 57 terms) and 486 UCD terms for data classification

Purpose: Resource Discovery (data, collections, services, and curation)


[VOASTR] Ontology of Astronomical Object Types, Version 1.0, IVOA Working Draft, 2007 Feb 19

http://www.ivoa.net/Documents/WD/Semantics/AstrObjectOntology-20070219.pdf


7 OAI-ORE

The Object Reuse and Exchange (ORE) activity of the Open Archives Initiative (OAI) is developing specifications that allow distributed repositories to exchange information about their constituent digital objects. The first release of the ORE specifications is scheduled for March 8, 2008. OAI-ORE is distinct from OAI-PMH (a protocol for exchanging metadata).

Excerpts from the OAI Object Reuse and Exchange white paper [OAIORE]:

Compound information objects are aggregations of distinct information units that when combined form a logical whole. Some examples of these are a digitized book that is an aggregation of chapters, where each chapter is an aggregation of scanned pages; a music album that is the aggregation of several audio tracks; an image object that is the aggregation of a high quality master, a medium quality derivative and a low quality thumbnail; a scholarly publication that is an aggregation of text and supporting materials such as datasets, software tools, and video recordings of an experiment; and a multi-page web document with an HTML table of contents that points to multiple interlinked HTML individual pages. If we consider all information objects reusable in multiple contexts (a notable feature of networked information), then the aggregation of a specific information unit into a compound object is not due to the inherent nature of the information unit, but the result of the intention of the human author or machine agent that composed the compound object.

Research in the Semantic Web community has introduced the notion of named graphs [5], which are essentially a set of RDF assertions, forming a graph, to which a URI is assigned. The graph as a whole can then be treated as a web resource, and assertions such as metadata statements, authority, etc. can be associated with that resource. These ideas are very promising as an approach to expressing the notion of a compound object on the web. However, they remain in a research phase, and need further specification in order to become adoptable as part of an implementable interoperability specification. Our proposals described later in this document build on this notion of a named graph.
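A named graph, as described in the excerpt, is a set of RDF triples to which a URI is assigned, so that statements can in turn be made about the graph itself. The plain-Python sketch below (no RDF library) models the digitized-book example as an aggregation of chapters. All URIs are hypothetical `example.org` identifiers; the `ore:aggregates` predicate name is borrowed from the ORE terms vocabulary and `dc:creator` from Dublin Core, both used here only for illustration, not as the white paper's proposal.

```python
"""Sketch: a named graph as a URI plus a set of RDF-style triples.

All URIs are hypothetical; "ore:aggregates" is used only to illustrate
an aggregation relationship, and "dc:creator" for a statement about the
graph itself.
"""
from dataclasses import dataclass, field

Triple = tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class NamedGraph:
    uri: str
    triples: set[Triple] = field(default_factory=set)

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def objects(self, s: str, p: str) -> list[str]:
        """Objects of all triples matching subject s and predicate p, sorted."""
        return sorted(o for (s2, p2, o) in self.triples if s2 == s and p2 == p)


book = "http://example.org/book/1"
graph = NamedGraph("http://example.org/graphs/book-1")
for n in (1, 2, 3):
    graph.add(book, "ore:aggregates", f"{book}/chapter/{n}")

# Because the graph has its own URI, metadata can be asserted about it
# (here, in a second graph that describes the first one).
provenance = NamedGraph("http://example.org/graphs/provenance")
provenance.add(graph.uri, "dc:creator", "http://example.org/agents/digitizer")

print(graph.objects(book, "ore:aggregates"))
```

The graph's URI is what lets the aggregation be addressed, described and exchanged as a single web resource, which is the property the excerpt highlights for compound objects.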

A core goal of OAI-ORE – Object Reuse and Exchange – is to develop standardized, interoperable, and machine-readable mechanisms to express compound object information on the web. The OAI-ORE standards will make it possible for web clients and applications to reconstruct the logical boundaries of compound objects, the relationships among their internal components, and their relationships to the other resources in the web information space. This will provide the foundation for the development of value-adding services for analysis, reuse, and re-composition of compound objects, especially in the areas of e-Science, e-Scholarship, and scholarly communication, which are the target applications of ORE.

To enable widespread adoption of the standards developed by OAI-ORE, we have determined that they must be congruent with and leverage the Web Architecture. This architecture essentially consists of:
