
Preservation and Transition of NCSTRL Using an OAI-Based Architecture



H. Anan, X. Liu, K. Maly, M. Nelson, M. Zubair
Old Dominion University, Norfolk, Virginia, USA
{anan,liu_x,maly,nelso_m,zubair}@cs.odu.edu

J. C. French
University of Virginia, Charlottesville, Virginia, USA
french@cs.virginia.edu

E. Fox, P. Shivakumar
Virginia Tech, Blacksburg, Virginia, USA
{fox,pshivaku}@vt.edu

Abstract

NCSTRL (Networked Computer Science Technical Reference Library) is a federation of digital libraries providing computer science materials. The architecture of the original NCSTRL was based largely on the Dienst software. It was implemented and maintained by the digital library group at Cornell University until September 2001. At that time, we had an immediate goal of preserving the existing NCSTRL collection and a long-term goal of providing a framework where participating organizations could continue to disseminate technical publications. Moreover, we wanted the new NCSTRL to be based on OAI (Open Archives Initiative) principles, which provide a framework to facilitate the discovery of content in distributed archives. In this paper, we describe our experience in moving towards an OAI-based NCSTRL.

Introduction

NCSTRL (http://www.ncstrl.org), organized and supported at Cornell University, was a successful digital library (DL) in operation from 1994 to 2001, with over 100 international participants and over 20,000 digital objects [1]. However, recent changes in the publication paradigm for scientific material and realignments of Cornell's DL research interests have caused Cornell to cease coordinating operations of NCSTRL. This fact, along with the widening acceptance of OAI [2], motivated us to look at an alternative architecture to preserve and sustain NCSTRL. Besides the immediate goal of preserving the old NCSTRL collection, we had a long-term goal of supporting existing NCSTRL collections by making them OAI-compliant, possibly joined by new large collections at the department or organization level based on e-prints software (www.eprints.org) and by individual publishers using Kepler software (http://kepler.cs.odu.edu; [4]) to create small OAI-compliant repositories (Figure 1).

Preserving Existing Collections

We first extracted both the metadata and the data from the existing Dienst servers and ftp sites. This process, including cleaning of the metadata, was automated by writing scripts. Next, we provided an OAI wrapper around the extracted metadata, enabling it to be harvested by the new NCSTRL search service. The extracted documents and their metadata are currently kept at Virginia Tech, while the NCSTRL search/browse service is hosted at Old Dominion University.
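
The cleaning and wrapping steps might look roughly like the following Python sketch. It is illustrative only: the paper does not show its scripts, so the cleaning rules, the function names `clean_record` and `to_oai_dc`, and the sample record are all hypothetical. The oai_dc namespace URI is the one used by OAI-PMH 2.0, which is an assumption, since the paper does not state a protocol version.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as used by OAI-PMH 2.0 (assumed; the paper names no version).
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def clean_record(raw):
    """Collapse whitespace, drop empty values, lowercase field names.

    These rules are hypothetical stand-ins for the paper's cleaning scripts.
    """
    cleaned = {}
    for field, values in raw.items():
        kept = [" ".join(v.split()) for v in values if v and v.strip()]
        if kept:
            cleaned[field.lower()] = kept
    return cleaned


def to_oai_dc(record):
    """Serialize a cleaned record as an oai_dc XML element."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for field, values in record.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{field}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record as it might come out of a legacy server.
raw = {
    "Title": ["  An  Example\nTechnical Report "],
    "Creator": ["Doe, J.", ""],
    "Identifier": ["http://example.org/reports/TR-01.ps"],
}
record = clean_record(raw)
xml_text = to_oai_dc(record)
print(xml_text)
```

Keeping cleaning separate from serialization means the same cleaned record could be written to a database or exposed through the OAI wrapper without repeating the normalization.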

Figure 1. OAI-based NCSTRL vision. Components shown: new NCSTRL collections at large organizations or universities (OAI-compliant repositories using e-prints software); old NCSTRL collections with an OAI layer; individual publisher OAI-compliant repositories (Kepler archivelets); a manual registration service for NCSTRL OAI-compliant repositories; an automated LDAP-based registration service for Kepler archivelets; and the NCSTRL search service (an Arc-like service).

Search Service

We implemented the NCSTRL search service based on the architecture of the Java servlet-based Arc (http://arc.cs.odu.edu; [3]), with an Oracle database as the back end. The architecture is platform independent and can work with any web server. Moreover, only minimal changes are required to work with other relational databases such as MySQL.

The search service provides a means to retrieve documents by their metadata. It supports both simple and advanced search, as well as sorting of results by archive or by discovery date.

Simple search allows users to search free text across archive contents. Advanced search allows users to search in specific metadata fields. Users can also search or browse specific archives and/or archive partitions if they are familiar with specific data providers. Author, title, and abstract searches are based on user input, which can include Boolean operators (Figure 2).
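
The difference between simple and advanced search can be sketched as query construction over a relational table. This is a minimal Python sketch, not the Arc implementation: the table layout, the column names, and the functions `build_query` and `simple_search` are hypothetical, SQLite stands in for the Oracle back end, and only a single Boolean operator joining the fields is supported rather than full Boolean expressions.

```python
import sqlite3

# Hypothetical searchable columns and sort keys; not taken from Arc.
SEARCHABLE = ("author", "title", "abstract")
SORTABLE = {"archive": "archive", "date": "discovered"}


def build_query(terms, operator="AND", sort_by="archive"):
    """Build a parameterized SELECT from {field: text} search terms."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    clauses, params = [], []
    for field, text in terms.items():
        if field not in SEARCHABLE:
            raise ValueError(f"unknown field: {field}")
        clauses.append(f"{field} LIKE ?")
        params.append(f"%{text}%")
    where = f" {operator} ".join(clauses) or "1=1"
    sql = (f"SELECT id, title FROM records WHERE {where} "
           f"ORDER BY {SORTABLE[sort_by]}")
    return sql, params


def simple_search(text, sort_by="archive"):
    """Free-text search: match the text in any searchable field."""
    return build_query({f: text for f in SEARCHABLE}, "OR", sort_by)


# In-memory SQLite database with two made-up records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id TEXT, archive TEXT, discovered TEXT, "
    "author TEXT, title TEXT, abstract TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("r1", "archive-a", "2001-10-02", "Doe, J.",
         "Harvesting metadata", "On OAI harvesting."),
        ("r2", "archive-b", "2001-09-15", "Roe, R.",
         "Digital library federation", "A study of metadata."),
    ],
)

# Advanced search: author AND title must both match.
sql, params = build_query({"author": "Doe", "title": "metadata"}, "AND")
advanced = [row[0] for row in conn.execute(sql, params)]

# Simple search: any field may match; sort by discovery date.
sql, params = simple_search("metadata", sort_by="date")
simple = [row[0] for row in conn.execute(sql, params)]
print(advanced, simple)
```

Because user input reaches the database only through bound parameters and whitelisted column names, switching to another relational database would mostly be a matter of changing the connection, which is in the spirit of the portability claim above.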

Repository Service

The repository stores the metadata for the documents. Currently, Dublin Core (DC) is used to represent the metadata. The actual documents are stored independently in the providers' archives, and URLs are provided in the metadata records.

The metadata fields are stored in an indexed Oracle database that provides fast search capabilities through the metadata sets.
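
One way to picture such a repository is a table of Dublin Core element/value pairs with an index, where the document itself is reachable only through a URL stored as metadata. The Python sketch below is an assumption-laden illustration: the paper does not describe its schema, so the table `dc_values`, the index name, and the helper functions are hypothetical, and SQLite stands in for Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dc_values (
    record_id TEXT NOT NULL,   -- identifier of the metadata record
    element   TEXT NOT NULL,   -- Dublin Core element name, e.g. 'title'
    value     TEXT NOT NULL    -- one value; elements may repeat
);
CREATE INDEX idx_dc_element_value ON dc_values (element, value);
""")


def store_record(record_id, fields):
    """Insert one row per (element, value) pair of a DC record."""
    rows = [
        (record_id, element, value)
        for element, values in fields.items()
        for value in values
    ]
    conn.executemany("INSERT INTO dc_values VALUES (?, ?, ?)", rows)


def document_urls(element, value):
    """Return the dc:identifier URLs of records whose element equals value."""
    sql = """
        SELECT url.value
        FROM dc_values AS hit
        JOIN dc_values AS url ON url.record_id = hit.record_id
        WHERE hit.element = ? AND hit.value = ?
          AND url.element = 'identifier'
        ORDER BY url.value
    """
    return [row[0] for row in conn.execute(sql, (element, value))]


# Made-up records; the documents stay at the providers' own URLs.
store_record("oai:example:1", {
    "title": ["An Example Technical Report"],
    "creator": ["Doe, J."],
    "identifier": ["http://example.org/reports/TR-01.ps"],
})
store_record("oai:example:2", {
    "title": ["Another Report"],
    "creator": ["Roe, R."],
    "identifier": ["http://example.org/reports/TR-02.ps"],
})

urls = document_urls("creator", "Doe, J.")
print(urls)
```

Storing one row per element value accommodates repeatable Dublin Core elements such as multiple creators, at the cost of a self-join when a lookup needs two elements of the same record.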

Figure 2. NCSTRL search service

Harvester Service

When harvesting metadata, some of the archives, such as CNRI (Corporation for National Research Initiatives) and LTRS (NASA Langley Technical Reports Server), were already OAI-compliant, which facilitated harvesting and collecting the metadata. However, most of the archives were not OAI-compliant. Other protocols such as Dienst were available on these archives for collecting the metadata and, where available, the actual documents. To provide a historical snapshot of NCSTRL at the time of conversion from the Dienst-based operation, we developed a system that collects data from these archives and provides transformation and filtering tools. The result was then established as an OAI data provider that is harvested by the Arc-powered search service.
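
For the archives that are already OAI-compliant, harvesting amounts to issuing list requests and following resumption tokens until the archive reports no more records. The Python sketch below illustrates that loop under stated assumptions: it uses the verb, argument, and element names of OAI-PMH 2.0 (the paper does not state a protocol version), the base URL and identifiers are made up, and `fetch` is a hypothetical callable supplied by the caller so that the example runs against canned responses instead of a live archive.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# OAI-PMH 2.0 response namespace (assumed protocol version).
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def request_url(base_url, token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if token:
        args = {"verb": "ListRecords", "resumptionToken": token}
    else:
        args = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(args)}"


def harvest(base_url, fetch):
    """Yield record identifiers, following resumption tokens until exhausted."""
    token = None
    while True:
        root = ET.fromstring(fetch(request_url(base_url, token)))
        for header in root.iter(f"{OAI}header"):
            yield header.findtext(f"{OAI}identifier")
        token = (root.findtext(f".//{OAI}resumptionToken") or "").strip()
        if not token:
            break


def page(identifiers, token=""):
    """Build a canned ListRecords response for the demonstration."""
    records = "".join(
        f"<record><header><identifier>{i}</identifier></header></record>"
        for i in identifiers
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        f"{records}<resumptionToken>{token}</resumptionToken>"
        "</ListRecords></OAI-PMH>"
    )


# Two canned pages keyed by the exact request URL the harvester will build.
BASE = "http://example.org/oai"
PAGES = {
    request_url(BASE): page(["oai:example:1", "oai:example:2"], token="next"),
    request_url(BASE, "next"): page(["oai:example:3"]),
}

harvested = list(harvest(BASE, PAGES.__getitem__))
print(harvested)
```

Against a real archive, `fetch` could be a small function that opens the URL and returns the response body. Archives that speak only Dienst would need a separate collector whose output is transformed and filtered before being exposed through OAI, as described above.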

Conclusion and Future Work

We have begun the initial steps for the conversion of the NCSTRL digital library, replacing Dienst with an OAI infrastructure. We have completed the capture and preservation of the content that was embedded within the Dienst installations. We believe the OAI framework of NCSTRL will result in more individuals and institutions participating in NCSTRL, and will make for a simpler DL that is easier to maintain.

During the first phase of the NCSTRL project, we have moved the old NCSTRL collection into a new OAI-based architecture. The more difficult phase of converting the existing publication paradigms used by NCSTRL data providers (for serving their ongoing publication collections) lies ahead. The issues we face can be partitioned into technical and logistical ones. The technical issues include handling metadata that is richer than DC, providing mirror sites, archiving, metadata normalization, caching, and handling of web crawlers. We have clear ideas on solving these problems; it is a matter of implementing solutions. The logistical issues involve site management, getting faculty to accept publication tools, finding funding for maintenance, and managing code evolution. These are mostly unexplored and open for comment from the DL community.

References

1. Davis, J. R. & Lagoze, C. (2000). NCSTRL: design and deployment of a globally distributed digital library. Journal of the American Society for Information Science, 51(3), 273-280.

2. Lagoze, C. & Van de Sompel, H. (2001). The Open Archives Initiative: building a low-barrier interoperability framework. Proceedings of the First ACM/IEEE Joint Conference on Digital Libraries (pp. 54-62), Roanoke, VA.

3. Liu, X., Maly, K., Zubair, M. & Nelson, M. L. (2001). Arc - An OAI service provider for digital library federation. D-Lib Magazine, 7(4).

4. Maly, K., Zubair, M. & Liu, X. (2001). Kepler - An OAI Data/Service Provider for the Individual. D-Lib Magazine, 7(4).
