Preservation and Transition of NCSTRL Using an
OAI-Based Architecture
H. Anan, X. Liu, K. Maly, M. Nelson, M. Zubair
Old Dominion University
Norfolk, Virginia USA
{anan,liu_x,maly,nelso_m,zubair}@cs.odu.edu
J. C. French
University of Virginia
Charlottesville, Virginia USA
french@cs.virginia.edu
E. Fox, P. Shivakumar
Virginia Tech
Blacksburg, Virginia USA
{fox,pshivaku}@vt.edu
Abstract
NCSTRL (Networked Computer Science Technical Reference Library) is a federation of digital libraries providing computer science materials. The architecture of the original NCSTRL was based largely on the Dienst software. It was implemented and maintained by the digital library group at Cornell University until September 2001. At that time, we had an immediate goal of preserving the existing NCSTRL collection and a long-term goal of providing a framework where participating organizations could continue to disseminate technical publications. Moreover, we wanted the new NCSTRL to be based on OAI (Open Archives Initiative) principles that provide a framework to facilitate the discovery of content in distributed archives. In this paper, we describe our experience in moving towards an OAI-based NCSTRL.
Introduction
NCSTRL (http://www.ncstrl.org), organized and supported at Cornell University, was a successful digital library (DL) in operation from 1994 to 2001, with over 100 international participants and over 20,000 digital objects [1]. However, recent changes in the publication paradigm for scientific material and realignments of Cornell's DL research interests have caused Cornell to cease coordinating operations of NCSTRL. This fact, along with the widening acceptance of OAI [2], motivated us to look at an alternative architecture to preserve and sustain NCSTRL. Besides the immediate goal of preserving the old NCSTRL collection, we had a long-term goal to support existing NCSTRL collections by making them OAI compliant, possibly with new large collections at the department or organization level based on e-prints software (www.eprints.org), and with individual publishers using Kepler software (http://kepler.cs.odu.edu; [3]) to create small OAI-compliant repositories (Figure 1).
Preserving Existing Collections
We first extracted both the metadata and the data from the existing Dienst servers and FTP sites. This process, including cleaning of the metadata, was automated with scripts. Next, we provided an OAI wrapper around the extracted metadata, enabling it to be harvested by the new NCSTRL search service. The extracted documents and their metadata are currently kept at Virginia Tech, while the NCSTRL search/browse service is hosted at Old Dominion University.
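To make the wrapping step concrete, here is a minimal sketch of how one extracted record could be serialized as Dublin Core XML for harvesting. It is an illustration only, not the scripts used in the project: the function name `to_oai_dc`, the sample record, and the choice of the OAI-PMH 2.0 `oai_dc` namespace are all assumptions, since the text does not state the protocol version or the record layout.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the OAI-PMH 2.0 oai_dc schema (an assumption; the
# text does not say which protocol version the wrapper used).
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def to_oai_dc(record):
    """Serialize a dict of Dublin Core fields as an oai_dc XML string.

    Values may be one string or a list of strings (repeatable fields).
    Blank values are skipped, standing in for the metadata cleaning step.
    """
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for field, values in record.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            value = value.strip()
            if value:
                ET.SubElement(root, f"{{{DC}}}{field}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record, as it might look after extraction from a Dienst server.
sample = {
    "title": "  An Example Technical Report ",
    "creator": ["A. Author", "B. Author"],
    "identifier": "http://example.org/reports/TR-01.pdf",
    "description": "",
}
xml_text = to_oai_dc(sample)
```

In this sketch the `identifier` element carries the document URL, so a harvester receives only metadata and a pointer to the document.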
Figure 1. OAI-based NCSTRL vision. Components shown: new NCSTRL collections at a large organization or university (OAI-compliant repository, e-prints software); old NCSTRL collections with an OAI layer; individual-publisher OAI-compliant repositories (Kepler archivelets); a manual registration service for NCSTRL OAI-compliant repositories; an automated LDAP-based registration service for Kepler archivelets; and the NCSTRL search service (an Arc-like service).
Search Service
We implemented the NCSTRL search service based on the architecture of the Java servlet-based Arc (http://arc.cs.odu.edu; [4]) with an Oracle database in the backend. The architecture is platform independent and can work with any web server. Moreover, minimal changes are required to work with different relational databases such as MySQL.
The search service provides a means to retrieve documents by their metadata. It supports both simple and advanced search, as well as sorting of results by archive or by discovery date.
Simple search allows users to search free text across archive contents. Advanced search allows users to search in specific metadata fields. Users can also search or browse specific archives and/or archive partitions if they are familiar with specific data providers. Author, title, and abstract searches are based on user input; the input can use Boolean operators (Figure 2).
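The difference between simple and advanced search can be sketched as a small query builder. This is a hedged illustration rather than the Arc implementation: the table name `records`, the column names, and the function `build_query` are hypothetical, and the sketch simply requires every word to match instead of parsing explicit Boolean operators.

```python
# Hypothetical column names for the fields the text mentions.
SEARCHABLE = ("author", "title", "abstract")
SORT_COLUMNS = {"archive": "archive", "date": "discovery_date"}


def build_query(terms, fields=None, archive=None, sort_by="archive"):
    """Return (sql, params) for a metadata search.

    terms   -- free-text string; every whitespace-separated word must match
    fields  -- None for simple search (any field), or a subset of SEARCHABLE
    archive -- optionally restrict the search to one archive
    sort_by -- "archive" or "date"
    """
    fields = tuple(fields or SEARCHABLE)
    unknown = set(fields) - set(SEARCHABLE)
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")

    clauses, params = [], []
    for word in terms.split():
        # A word may match in any one of the selected fields.
        clauses.append("(" + " OR ".join(f"{f} LIKE ?" for f in fields) + ")")
        params.extend([f"%{word}%"] * len(fields))
    if archive is not None:
        clauses.append("archive = ?")
        params.append(archive)

    where = " AND ".join(clauses) or "1 = 1"
    sql = f"SELECT * FROM records WHERE {where} ORDER BY {SORT_COLUMNS[sort_by]}"
    return sql, params
```

Calling `build_query("digital library")` searches all three fields (simple search), while `build_query("digital library", fields=["title"], archive="LTRS", sort_by="date")` restricts the search to titles in one archive and sorts by discovery date. User input is passed only as bound parameters, never spliced into the SQL text.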
Repository Service
The repository stores the metadata for the documents. Currently, Dublin Core (DC) is used to represent the metadata. The actual documents are stored independently in the providers' archives, and URLs are provided in the metadata records.
The metadata fields are stored in an indexed Oracle database that provides fast search capabilities through the metadata sets.
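As a rough illustration of such a repository, the sketch below stores a few Dublin Core fields in an indexed table and retrieves a record's URL. It uses Python's built-in `sqlite3` module purely as a stand-in for Oracle, and the table layout, index names, and sample row are assumptions rather than the actual NCSTRL schema.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle backend.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (
        identifier     TEXT PRIMARY KEY,  -- identifier of the metadata record
        archive        TEXT NOT NULL,     -- data provider it was harvested from
        title          TEXT,
        creator        TEXT,
        description    TEXT,
        url            TEXT,              -- location of the document at the provider
        discovery_date TEXT               -- ISO date the record was harvested
    );
    CREATE INDEX idx_records_archive ON records (archive);
    CREATE INDEX idx_records_creator ON records (creator);
    CREATE INDEX idx_records_date    ON records (discovery_date);
""")

# Hypothetical sample row.
conn.execute(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("oai:example:TR-01", "example-archive", "An Example Technical Report",
     "A. Author", "Placeholder abstract.",
     "http://example.org/reports/TR-01.pdf", "2001-10-01"),
)

# Only the URL is stored; the document itself stays at the provider.
row = conn.execute(
    "SELECT title, url FROM records WHERE archive = ? ORDER BY discovery_date",
    ("example-archive",),
).fetchone()
```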
Figure 2. NCSTRL search service
Harvester Service
When we harvested the metadata, some of the archives, such as CNRI (Corporation for National Research Initiatives) and LTRS (NASA Langley Technical Reports Server), were already OAI compliant, which facilitated harvesting and collecting the metadata. However, most of the archives were not OAI compliant. Other protocols such as Dienst were available on these archives to enable collecting the metadata and, where available, the actual documents. To provide a historical snapshot of NCSTRL at the time of conversion from the Dienst-based operation, we developed a system that collected data from these archives and provided transformation and filtering tools. The result was then established as an OAI service provider that is used in the Arc-powered search service.
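For the archives that were already OAI compliant, harvesting amounts to paging through `ListRecords` responses. The following is a minimal sketch under stated assumptions, not the project's harvester: it assumes OAI-PMH 2.0 request and response conventions (the text does not name a protocol version), and `harvest` and `http_fetch` are hypothetical helper names. The fetch function is injectable so the paging logic can be exercised without a network connection.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 (assumed)


def http_fetch(base_url, params):
    """Issue one OAI-PMH request over HTTP GET and return the response body."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest(base_url, fetch=http_fetch, metadata_prefix="oai_dc"):
    """Yield (identifier, datestamp) for every record a repository exposes."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url, params))
        for header in root.iter(OAI_NS + "header"):
            yield (header.findtext(OAI_NS + "identifier"),
                   header.findtext(OAI_NS + "datestamp"))
        token = root.find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # no token, or an empty one, marks the last page
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

A call such as `list(harvest("http://example.org/oai"))` (a placeholder URL) would walk every page of a repository. Archives reachable only through Dienst would need a separate collection and transformation step before their records could be exposed in this form.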
Conclusion and Future Work
We have taken the initial steps in converting the NCSTRL digital library, replacing Dienst with an OAI infrastructure. We have completed the capture and preservation of the content that was embedded within the Dienst installations. We believe the OAI framework of NCSTRL will result in more individuals and institutions participating in NCSTRL, and will make for a simpler, easier-to-maintain DL.
During the first phase of the NCSTRL project, we moved the old NCSTRL collection into a new OAI-based architecture. The more difficult phase of converting existing publication paradigms used by NCSTRL data providers (for serving their ongoing publication collections) lies ahead. The issues we face can be partitioned into technical and logistic ones. The technical issues include handling metadata that is richer than DC, providing mirror sites, archiving, metadata normalization, caching, and handling of web crawlers. We have clear ideas on solving these problems; it is a matter of implementing solutions. The logistic issues involve site management, getting faculty to accept publication tools, finding funding for maintenance, and managing code evolution. These are mostly unexplored and open for comment from the DL community.
References
1. Davis, J. R. & Lagoze, C. (2000). NCSTRL: design and deployment of a globally distributed digital library. Journal of the American Society for Information Science, 51(3), 273-280.
2. Lagoze, C. & Van de Sompel, H. (2001). The Open Archives Initiative: building a low-barrier interoperability framework. Proceedings of the First ACM/IEEE Joint Conference on Digital Libraries (pp. 54-62), Roanoke, VA.
3. Maly, K., Zubair, M. & Liu, X. (2001). Kepler - An OAI Data/Service Provider for the Individual. D-Lib Magazine, 7(4).
4. Liu, X., Maly, K., Zubair, M. & Nelson, M. L. (2001). Arc - An OAI service provider for digital library federation. D-Lib Magazine, 7(4).