
Challenges of Metadata Migration in Digital Repository: A Case Study of the Migration of DUO to DSpace at the University of Oslo Library




DOCUMENT INFORMATION

Basic information

Title: Challenges of Metadata Migration in Digital Repository: A Case Study of the Migration of DUO to DSpace at the University of Oslo Library
Author: Do Van Chau
Supervisor: Dr. Michael Preminger
Institution: Oslo University College
Programme: Digital Library Learning
Document type: Master thesis
Year: 2011
City: Oslo
Pages: 98
Size: 1.49 MB


Structure

  • CHAPTER 1: INTRODUCTION
    • 1.1 Background
    • 1.2 Problem statement
    • 1.3 The aim of the study and the research questions
    • 1.4 Research methodology
    • 1.5 Scope of the study
    • 1.6 Thesis outline
  • CHAPTER 2: LITERATURE REVIEW
    • 2.1 Metadata issues in institutional repository
      • 2.1.1 Define institutional repository
      • 2.1.2 Metadata quality issues in IRs
      • 2.1.3 Metadata interoperability in IRs
    • 2.2 Metadata conversion in IRs from a methodological point of view
      • 2.2.1 The crosswalk at schema level
      • 2.2.2 Record conversion at record level
    • 2.3 Practices of metadata conversion in IRs
    • 2.4 Semantic mapping of metadata in crosswalk
      • 2.4.1 Define semantic mapping
      • 2.4.2 Types of similarity/correspondences among schemata elements in semantic mappings
      • 2.4.3 Practice of semantic mapping in crosswalk
    • 2.5 The challenges in metadata conversion
  • CHAPTER 3: RESEARCH METHODOLOGY
    • 3.1 Methodology
      • 3.1.1 Structured interview
      • 3.1.2 The crosswalk
    • 3.2 Sampling technique
    • 3.6 Limitations of the research
    • 3.7 Ethical consideration
  • CHAPTER 4: DATA ANALYSIS AND FINDINGS
    • 4.1 The analysis of data collected by online questionnaires
      • 4.1.1 Strategy of converting DUO metadata elements to DSpace at UBO
      • 4.1.2 The usage of metadata elements in DSpace
      • 4.1.3 Challenges in metadata conversion from DUO to DSpace
    • 4.2 Harmonization of metadata elements in DUO and DSpace
    • 4.3 The crosswalk of metadata elements in DUO and default Dublin Core in DSpace
    • 4.4 Findings of the study
      • 4.4.1 Strategy for converting metadata elements in DUO to DSpace
      • 4.4.2 Challenges of metadata conversion from DUO to DSpace
  • CHAPTER 5: CONCLUSION AND RECOMMENDATION
    • 5.1 Treatment of research questions
      • 5.1.1 What is the appropriate strategy to convert metadata elements from DUO database to DSpace in light of current practices and the research available in this field?
      • 5.1.2 In light of various issues experienced in previous metadata conversion projects at …
    • 5.2 Recommendations
    • 5.3 Further research
  • APPENDIX 1: TABLES DESCRIPTIONS OF DUO (University of Oslo Library)
  • APPENDIX 2: DEFAULT DUBLIN CORE METADATA REGISTRY IN DSPACE (ver. 1.5.2)
  • APPENDIX 3: DUBLIN CORE METADATA INITIATIVE - DUBLIN CORE QUALIFIERS
  • APPENDIX 4: THE INTRODUCTION LETTER
  • APPENDIX 5: THE ONLINE QUESTIONNAIRE

Contents

INTRODUCTION

Background

Metadata in digital institutional repositories (IRs) is a crucial aspect addressed by both research and practical communities. According to the National Information Standards Organization (NISO, 2004), metadata is "structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource" and is often referred to as "data about data" or "information about information". There are three primary types of metadata: descriptive, structural, and administrative. Metadata plays vital roles in resource discovery, organizing electronic resources, ensuring interoperability, digital identification, and supporting archiving and preservation efforts.

Park (2009) conducted an insightful study on the current research and practices regarding metadata quality in Institutional Repositories (IRs). Her critical review highlights key issues affecting metadata quality, including inconsistency, incompleteness, and inaccuracy of metadata elements, which can impact the discoverability and organization of digital collections. This analysis underscores the importance of improving metadata standards to enhance the effectiveness and reliability of IRs for users and researchers alike.

Digital repositories face multi-level challenges related to policy and quality interoperability, including organizational, semantic, and technical dimensions (Vullo, Innocenti, & Ross, 2010). Despite the recognition of these challenges, there is currently no comprehensive solution that fully addresses the needs of digital library organizations and systems (Vullo, Innocenti, & Ross, 2010). According to NISO (2004), these interoperability issues remain a significant barrier to achieving seamless integration and data exchange across digital repositories. Ensuring high-quality metadata in IRs is crucial for improving discoverability, access, and long-term preservation of digital content. Addressing these interoperability challenges is essential for advancing the effectiveness and sustainability of digital libraries.

Interoperability refers to the ability of diverse systems with different hardware, software platforms, data structures, and interfaces to exchange data seamlessly with minimal loss of content and functionality. According to NISO (2004), achieving interoperability involves the use of defined metadata schemes, shared transfer protocols, and crosswalks between schemes. Two primary approaches for implementing interoperability in repositories are cross-system search via the Z39.50 protocol and metadata harvesting using OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting). These methods facilitate efficient data exchange and integration across heterogeneous systems, enhancing the accessibility and interoperability of digital repositories.
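As a concrete illustration of the harvesting route, the sketch below issues an OAI-PMH ListRecords request and extracts the unqualified Dublin Core fields from each returned record. The endpoint URL is a placeholder, resumption-token paging is omitted for brevity, and oai_dc is the Dublin Core format that every OAI-PMH repository is required to expose.

```python
# Minimal OAI-PMH harvesting sketch using only the standard library.
# The endpoint URL is a placeholder; any OAI-PMH-compliant repository
# (DSpace exposes one by default) answers the same requests.
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest(base_url, metadata_prefix="oai_dc"):
    """Yield (identifier, {dc_element: [values]}) for each record."""
    params = urlencode({"verb": "ListRecords",
                        "metadataPrefix": metadata_prefix})
    with urlopen(base_url + "?" + params) as response:
        tree = ET.parse(response)
    for record in tree.iter(OAI + "record"):
        identifier = record.findtext(f"{OAI}header/{OAI}identifier")
        metadata = record.find(OAI + "metadata")
        if metadata is None:           # deleted records carry no metadata
            continue
        fields = {}
        for elem in metadata.iter():
            if elem.tag.startswith(DC):
                fields.setdefault(elem.tag[len(DC):], []).append(elem.text)
        yield identifier, fields

# Hypothetical usage:
# for oai_id, dc in harvest("https://repository.example.org/oai/request"):
#     print(oai_id, dc.get("title"))
```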

Chan and Zeng (2006) highlighted the proliferation of metadata schemas in Institutional Repositories (IRs), each designed to meet the specific needs of different user communities and subject domains. They proposed various methods to facilitate metadata conversion and exchange across diverse schemas, enhancing interoperability at the repository, schema, and record levels. At the repository level, efforts focus on mapping value strings for cross-collection searching; at the schema level, on creating communication among metadata elements through techniques like derivation, crosswalks, and application profiles; and at the record level, on integrating records via conversion and data reuse to generate new records by combining existing data. These strategies significantly improve metadata interoperability, promoting more efficient resource discovery and data management in IRs.

Several key projects have been undertaken globally to enhance interoperability across various Institutional Repositories (IRs), including the conversion project at the Energy and Environmental Information Resources Center, the Metadata Repository initiative at the National Science Digital Library, the migration project at the University of Sydney Repository, and the crosswalking efforts at Drexel University's Internet Public Library. These initiatives aim to improve data exchange and integration among diverse IR systems and will be explored in detail in Chapter 2.

Problem statement

DUO (abbreviated from the Norwegian name "Digitale utgivelser ved UiO") is a digital Institutional Repository at the University of Oslo (UiO), Norway. DUO was developed in 2000 in cooperation between the University Centre for Information Technology (USIT) and the University of Oslo Library (UBO). Today, DUO includes electronic versions of theses, special assignments, doctoral dissertations and articles from UiO.

Since 2010, UBO has implemented DSpace as a new platform for the DUO migration to replace its obsolete system, aiming to develop an open digital archive for the University of Oslo's digital content. The migration project involves three key subprojects: Student Communication, Communication Research, and Communication Media, requiring careful metadata conversion from the legacy system to DSpace. DUO's original database used custom metadata elements tailored to user needs, while DSpace employs the standardized Dublin Core Metadata Set, making metadata mapping essential. According to Woodley (2008), successful conversion involves mapping structural elements between systems, but differences in granularity and data definitions pose challenges. Therefore, a strategic approach to metadata mapping is critical before executing the DUO-to-DSpace migration.
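To make the mapping task concrete, the sketch below shows what a first-pass, schema-level crosswalk from DUO to DSpace's qualified Dublin Core could look like. The DUO field names are hypothetical stand-ins (the actual table descriptions are listed in Appendix 1), and a strictly one-to-one mapping is assumed purely for illustration.

```python
# Hypothetical first-pass crosswalk: DUO field name -> DSpace qualified
# Dublin Core (schema.element.qualifier). Real DUO fields are described
# in Appendix 1; the names below are invented for illustration only.
DUO_TO_DSPACE = {
    "tittel":     "dc.title",
    "forfatter":  "dc.contributor.author",
    "veileder":   "dc.contributor.advisor",
    "utgitt":     "dc.date.issued",
    "emneord":    "dc.subject",
    "sammendrag": "dc.description.abstract",
}
```

Cases where a DUO element has no close Dublin Core equivalent, or maps to several, are precisely the challenges discussed in the following sections.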

The aim of the study and the research questions

This study aims to identify challenges in metadata conversion, with a specific focus on DUO as a case study, to develop an effective strategy for converting DUO metadata elements to DSpace during the UBO migration project. To accomplish this, the research addresses two key questions: what are the primary obstacles in the metadata conversion process, and how can these challenges be effectively mitigated to ensure a smooth transition from DUO to DSpace?

Research question 1: What is the appropriate strategy to convert metadata elements from the DUO database to DSpace in light of current practices and the research available in this field?

Metadata conversion from the DUO database to DSpace presents several key challenges, including data inconsistency, complexity of schema mapping, and handling legacy data structures. Additionally, issues specific to DUO, such as proprietary formats and incomplete records, complicate the transformation process. Ensuring data integrity and maintaining metadata standards during conversion require careful planning and robust technical solutions. These challenges highlight the need for tailored strategies to facilitate a smooth and accurate migration from DUO to DSpace.

Research methodology

This study investigates the DUO migration project at UBO, utilizing two data collection techniques: structured interviews and crosswalk analysis. A questionnaire consisting of open-ended and closed-ended questions in English is delivered through the web-based survey platform SurveyMonkey to gather insights from project informants. The collected data are qualitative, capturing participants' opinions and experiences regarding various research issues. Subsequently, a constant comparative analysis method is applied to interpret the data, ensuring a thorough understanding of the informants' perspectives within the context of the DUO project.

This study critically reviews previous research and projects on metadata conversion in Institutional Repositories (IRs) to establish a solid theoretical and practical foundation. It compares the structure and semantics of metadata elements used in DUO and DSpace to develop an effective metadata crosswalk between the two systems. This process helps identify and resolve conflicts in metadata elements, ensuring smoother interoperability and more accurate data conversion.

Scope of the study

This study focuses on the strategies for metadata conversion at the schema level and the challenges encountered during the DUO migration project at UBO. It emphasizes defining semantic mappings of metadata elements rather than just matching individual values, ensuring a more accurate and meaningful transfer of metadata. Due to time and technical constraints, the research does not include experiments on the conversion of metadata elements and their values at the record level, prioritizing a conceptual approach over practical testing.

In addition, only informants involved in the DUO migration project are consulted for this study.

Thesis outline

The content of the thesis is presented in five chapters, in addition to the table of contents, figures and tables, references, and appendices.

Chapter 1 presents the background and research problem statement, as well as the aim of the study and research questions, a brief introduction of the research methodology, and the scope of the study.

Chapter 2 gives a review of recent studies on various issues related to the topic of the thesis, such as metadata quality issues in IRs, metadata conversion in theory and practice in IRs, semantic mapping of metadata schemata, and conflicts in crosswalks.

Chapter 3 provides the justification of the methods used in the research and explains how these methods are implemented to collect and analyze data.

Chapter 4 presents the collected data, its analysis, and discussion. Afterwards, the findings of the research are summarized.

Chapter 5 presents the conclusions and recommendations of the research. It revisits the research questions set up in the beginning and lays out suggestions to solve the research issues, as well as further studies related to the topic.

LITERATURE REVIEW

Metadata issues in institutional repository

Institutional repositories have become a vital infrastructure for scholarly activities in universities worldwide, as evidenced by the thousands of IRs listed in the Directory of Open Access Repositories (DOAR). According to Lynch (2003), IRs are defined as a set of services offered by universities to their communities for managing and disseminating digital materials created by the institution and its members.

Heery and Anderson (2005) developed a typology that provides a helpful framework for exploring IRs, as presented in Figure 2.1 below:

Figure 2.1: Typology of IRs (Heery and Anderson, 2005, p.17)

This framework presents four main foci of IRs: content, coverage, users, and functionality.

2.1.2 Metadata quality issues in IRs

Ensuring metadata quality in information repositories primarily relies on consistency. Bruce and Hillman (2004) emphasized the importance of implementing metadata elements in accordance with standard definitions and related domain concepts. They also highlighted that presenting metadata consistently to users enhances usability and understanding, thereby improving the overall quality of information retrieval systems.

Park (2009) defined the most common criteria for the quality of metadata in institutional repositories: completeness, accuracy, and consistency.

The completeness of metadata elements is assessed by their full access capability to individual local objects and their connections to parent local collections. This highlights the primary function of metadata in facilitating resource discovery and usability, ensuring users can efficiently locate and utilize digital resources (Park, 2009, p.8).

Zeng and Qin (2008) emphasize that each project should establish customized analysis criteria aligned with its metadata system's functional requirements. This approach is essential for effectively evaluating the completeness of metadata functions within the system, ensuring that all necessary features are adequately implemented. Tailoring evaluation criteria to the specific project ensures accurate assessment and optimal system performance.

The accuracy (also known as correctness) of metadata elements "concerns the accurate description and representation of data and resource content" as well as accurate data input (Park, 2009, p.9). According to Zeng and Qin (2008, p.255-256), the accuracy of metadata elements can be measured in various dimensions, such as:

• "Correct content: metadata record represents resources correctly

• Correct format: correctness of element label and its values, data types, application of element syntax

• Correct input: examines spelling, grammar, punctuation, word spacing, missing words or sections, foreign characters, etc.

• Correct mapping/integration: correct mapping of metadata elements in harvesting and crosswalks"

Utilizing tools like content standards, such as the Anglo-American Cataloguing Rules (AACR2) and Cataloguing Cultural Objects (CCO), along with best-practices guidelines from metadata standards and application profiles, is essential for ensuring that a metadata record accurately represents the content of resources. These resources serve as the most reliable references to verify the correctness and completeness of metadata, thereby enhancing resource discoverability and data integrity.

Ensuring metadata consistency involves evaluating both conceptual and structural levels. Conceptual consistency refers to the uniform use of data values or elements to describe similar concepts across resources, enhancing semantic clarity. Structural consistency focuses on the uniform application of data formats and structures when presenting similar data attributes, which supports seamless data integration. Maintaining both levels of consistency is essential for reliable data management and improved interoperability across information systems.

Zeng and Qin (2008) extensively discuss various methods of checking consistency in metadata conversion, emphasizing the importance of maintaining consistent source links, identifiers, and source descriptions. They highlight the need for uniform metadata representation and data syntax to ensure accurate and reliable data conversion processes, thereby improving data integrity and interoperability.
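As a concrete illustration of a structural-consistency check of the kind just described, the sketch below verifies that every value of a single element follows one agreed serialization. The element name, date pattern, and records are all invented for illustration.

```python
# Structural-consistency sketch: check that every value of one element
# (here dc.date.issued) follows a single agreed serialization, ISO 8601.
import re

ISO_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")   # YYYY[-MM[-DD]]

records = [
    {"id": "rec:1", "dc.date.issued": "2011-05-30"},
    {"id": "rec:2", "dc.date.issued": "30.05.2011"},  # structurally inconsistent
]

for rec in records:
    value = rec["dc.date.issued"]
    if not ISO_DATE.match(value):
        print(f"{rec['id']}: date value {value!r} needs normalization")
```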

Stvilia et al. (2004) identified six key categories of metadata quality issues: incomplete metadata, redundant entries, lack of clarity, incorrect application of metadata schemas, semantic inconsistencies, and structural inconsistencies. They emphasized that addressing these issues is crucial to ensuring accurate and reliable metadata for effective information retrieval and digital resource management. Improving metadata quality by resolving these problems enhances searchability, interoperability, and the overall usefulness of digital collections.

A study of Electronic Theses and Dissertations metadata at Drexel University's DSpace repository highlighted significant metadata gaps, including missing elements such as thesis defense date, degree awarded date, degree type, advisors and committee members, and author contact information. Properly capturing these metadata elements is essential for enhancing the discoverability and completeness of digital academic repositories. Addressing these gaps can improve metadata quality, supporting better indexing, searchability, and academic transparency. Ensuring comprehensive metadata in digital repositories is crucial for effective research data management and scholarly communication.

Other metadata quality issues have also been reported in many studies, such as:

Metadata often suffers from a lack of contextual information, particularly when such context is only available at the collection level, making it sparse and less informative. Additionally, the absence of controlled vocabularies in subject headings hampers accurate retrieval and discoverability, highlighting the need for more comprehensive and standardized metadata practices to improve searchability and understanding.

• Semantic overlap in several Dublin Core elements: type and format, source and relation, and the two qualifiers part of and version of in the element relation (Park, 2005)

The National Science Digital Library (NSDL) exhibits inconsistent and inaccurate use of metadata elements, especially within the Dublin Core (DC) framework. For instance, the physical description field is often misapplied as either the format or description, leading to confusion. Additionally, key DC elements such as type and format are frequently misunderstood or misused, while the source and relation elements are applied inconsistently, which hampers data interoperability and discoverability (Bui & Park, 2005, p.3). Proper standardization and understanding of metadata elements are crucial for enhancing the quality and utility of NSDL's digital resources.

Park (2009) discusses the importance of metadata quality in digital repositories, highlighting various issues related to the meaning and effectiveness of metadata in information retrieval systems. She emphasizes that accurate, high-quality metadata is crucial for efficient information retrieval, as it directly impacts search accuracy and resource discoverability. The study identifies key challenges in maintaining metadata consistency, completeness, and semantic clarity, which are essential for improving the overall performance of digital repositories. Ensuring the semantic integrity of metadata enhances the precision and usability of digital collections, making it a vital aspect of the information sciences.

• The same meaning can be expressed by several different forms (e.g., synonyms) and the same forms may designate different concepts (e.g., homonyms) (p.5)

• The same concept can be expressed by different morpho-syntactic forms (e.g., noun, adjective, compound noun, phrase, and clause) (p.5)

• Different communities may use dissimilar word forms to deliver identical or similar concepts, or may use the same forms to convey different concepts (p.5)

Recently, in a study of metadata best-practices guidelines at the Utah Academic Library Consortium, Toy-Smith (2010) emphasized that metadata consistency should be the primary consideration when developing digital collections.

Park and Tosaka (2010) conducted a survey of cataloging and metadata professionals in the United States to assess the current state of metadata practices across digital repositories. They found that metadata interoperability remains a significant challenge due to limited exposure of locally created metadata and guidelines beyond their local environments. Additionally, the use of custom, locally added metadata elements can hinder interoperability across digital collections, especially when there are no sharable mechanisms for managing these extensions and variants.

This study defines homegrown schemata and guidelines as local application profiles that clarify existing content standards and specify how metadata element values are selected and represented to suit specific contexts. The authors explored motivations for creating custom metadata elements, revealing that a primary motivation is the desire to accurately reflect the unique nature of local collections and the characteristics of the target community. Additionally, constraints posed by local conditions and systems significantly influence the development of these homegrown metadata standards.

In another study of metadata decisions for digital library projects, Zeng, Lee and Hayes (2009) reported that interoperability issues were a major concern in most libraries: "Their concerns ranged from planning and mapping together various metadata templates to enable standards used by various communities interoperable within one discovery system".

Metadata conversion in IRs from a methodological point of view

Blanchi and Petrone (2001) defined metadata conversion as "a set of operations to translate the metadata contained in the digital object into another metadata schema".2

In their study of methodology for metadata interoperability and standardization, Chan and Zeng (2006) identified three levels of metadata interoperability among Institutional Repositories (IRs): schema level, record level, and repository level. When converting metadata between different schemas, two primary methods are recommended: crosswalks at the schema level and conversions at the record level. These approaches facilitate effective metadata integration and enhance interoperability across repositories.

2.2.1 The crosswalk at schema level

A crosswalk is "a mapping of the elements, semantics, and syntax from one metadata scheme to those of another" (NISO, 2004, p.11) In similar view, Pierre and LaPlant (1998) stated

A crosswalk is a set of transformations that convert content from source metadata standards into a compatible format within target metadata standards, ensuring proper content storage According to the Dublin Core Metadata Initiative (DCMI), a crosswalk is defined as a table that maps relationships and equivalencies between multiple metadata schemas Implementing crosswalks or metadata mappings enhances the ability of search engines to efficiently search across diverse and heterogeneous databases, facilitating better data interoperability and discoverability.

2. http://www.dlib.org/dlib/december01/blanchi/12blanchi.html

3. http://www.niso.org/publications/white_papers/crosswalk/

According to Chan and Zeng (2006), crosswalks are the most widely used method to facilitate interoperability between different metadata schemas. They emphasize that crosswalks enable systems to efficiently convert metadata elements from one schema to another, ensuring seamless data integration across platforms.

The crosswalk process begins by analyzing two independent metadata schemata to identify equivalent or comparable elements and refinements. The primary technique used is direct mapping, which establishes semantic equivalency between metadata elements across different schemata. This formal mapping facilitates semantic interoperability by aligning similar data elements, often represented through tables or charts that illustrate the correspondence between a source and target metadata standard based on their functional or semantic similarity.

In crosswalk practice, two primary approaches are commonly used. The first, the absolute crosswalk, requires precise mapping between elements of the source and target schemas, ensuring elements are closely equivalent or closely matched. However, this approach is limited for data conversion purposes, as it excludes data values that lack exact mappings, particularly when the source schema has a more complex structure than the target schema.

The relative crosswalk is an effective solution for mapping source schema elements to target schema elements, even when the elements are not semantically equivalent. This approach ensures that all source elements are mapped to at least one target element, facilitating data integration across diverse schemas. It is particularly advantageous when simplifying complex schemas, such as mapping from MARC to Dublin Core (DC), although it may be less suitable for reversing the process. Overall, the relative crosswalk enhances schema mapping accuracy and efficiency in data interoperability projects.
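The contrast between the two approaches can be sketched in a few lines of code. Everything below is illustrative: the field names and records are invented, and dc.description is used as a catch-all target simply to show how a relative crosswalk keeps otherwise unmappable elements.

```python
# Absolute crosswalk: only closely equivalent elements are mapped;
# anything else is dropped during conversion.
ABSOLUTE = {"title": "dc.title", "author": "dc.contributor.author"}

# Relative crosswalk: every source element maps to at least one target
# element; a catch-all such as dc.description absorbs the rest (lossy
# in structure, but no values disappear).
RELATIVE = {**ABSOLUTE, "grading_committee": "dc.description"}

def convert(record, crosswalk):
    """Apply a schema-level crosswalk to one record's values."""
    out = {}
    for field, value in record.items():
        target = crosswalk.get(field)
        if target is not None:
            out.setdefault(target, []).append(value)
    return out

record = {"title": "A Sample Thesis", "author": "Nordmann, Ola",
          "grading_committee": "N.N.; N.N."}
print(convert(record, ABSOLUTE))  # grading_committee is silently lost
print(convert(record, RELATIVE))  # every value lands somewhere
```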

Pierre and LaPlant (1998) highlight the challenges of creating and maintaining crosswalks, noting that this process is complex and error-prone, requiring specialized knowledge of metadata standards. Developing a crosswalk is particularly difficult due to the independent and varied development of metadata standards, which use different terminologies and methods. Additionally, maintaining a crosswalk over time is even more challenging as standards evolve, necessitating ongoing expertise and a historical understanding of the metadata standards involved.

Chan and Zeng (2006) highlighted key challenges in crosswalking between independent metadata schemas, including varying degrees of equivalency such as one-to-one, one-to-many, many-to-one, and one-to-none. They emphasized that the lack of exact equivalents and potential overlaps in element meanings and scope can lead to data quality issues during data conversion processes. Effective metadata mapping requires careful consideration of these issues to ensure accurate and reliable data integration.

2.2.2 Record conversion at record level

Chan and Zeng (2006) highlight that record-level conversions are essential when integrating established metadata databases across various projects. Recently, there has been a trend toward reusing existing metadata records and combining them with other metadata components to create new, comprehensive records. Two primary methods for achieving this integration or conversion of data values associated with specific elements or fields are data conversion and data integration.

Woodley (2008, p.7) defines data conversion projects as the process of transferring metadata values from one system or schema to another. She highlights various reasons for data conversion, including system upgrades when legacy systems become obsolete and the need to provide public access by converting proprietary schemas to standard schemas for easier data publishing.

Record conversion involves transforming metadata schemas by mapping metadata elements and data from one schema to another. Notable projects exemplifying this process include the Picture Australia Project (PAP) and the National Science Digital Library (NSDL). In PAP, metadata records from partner institutions are centralized at the National Library of Australia and standardized into a common format based on Dublin Core metadata standards. Similarly, NSDL harvested records from the Alexandria Digital Library and converted them into Dublin Core records to ensure consistency and interoperability across digital collections.

The primary challenge in record conversion is minimizing data loss or distortion. Zeng and Xiao (2001) highlighted that data mapping becomes more complex when data values are involved, especially when transforming detailed source records into simpler target formats. During conversion, values from rich, detailed structures may need to be broken down into smaller units, increasing the risk of data loss. Zeng (2006) provided empirical evidence demonstrating how crosswalk-based data conversions can significantly impact data quality, particularly when handling large datasets.
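A toy example makes the loss visible. The MARC-like source field below is invented; once its subfields are flattened into a single unqualified dc.creator string, the roles of the parts can no longer be recovered.

```python
# Invented MARC-like source field: $a personal name, $d life dates.
rich = {"tag": "100", "subfields": {"a": "Ibsen, Henrik,", "d": "1828-1906"}}

# Flattening for a simpler target schema merges the subfields into one
# string; the record still reads well, but the structure is gone.
dc_record = {"dc.creator": [" ".join(rich["subfields"].values())]}

print(dc_record)  # {'dc.creator': ['Ibsen, Henrik, 1828-1906']}
# Nothing in dc_record says which characters were the name and which
# were the dates, so the conversion cannot be reversed reliably.
```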

Chan and Zeng (2006) describe metadata components as interconnected puzzle pieces that can be assembled from diverse sources across different processes. These metadata pieces can be reused and combined efficiently to generate new records, highlighting the flexibility and modularity of metadata management in information systems.

The Metadata Encoding and Transmission Standard (METS) is a widely used framework for encapsulating descriptive, administrative, and structural metadata within a single XML document, facilitating seamless interactions with digital repositories and enabling the integration of various internal and external metadata schemas. Additionally, the Resource Description Framework (RDF) by the World Wide Web Consortium serves as a flexible data model for developing and sharing interoperable vocabularies across different communities, enhancing data interoperability and accessibility in digital ecosystems.
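As a small illustration of the RDF side, the sketch below expresses one item's descriptive metadata as RDF statements using the Dublin Core element vocabulary. It assumes the third-party rdflib package; the item URI is invented.

```python
# RDF description of one repository item, using rdflib (third-party)
# and the Dublin Core element namespace it ships with.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

g = Graph()
item = URIRef("https://repository.example.org/item/1234")  # invented URI
g.add((item, DC.title, Literal("Challenges of Metadata Migration")))
g.add((item, DC.creator, Literal("Do, Van Chau")))
g.add((item, DC.date, Literal("2011")))

# Serialized as Turtle, the statements are consumable by any RDF-aware
# system without a bespoke crosswalk.
print(g.serialize(format="turtle"))
```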

In short, the selection of methods for metadata conversion in IRs depends on the status of the metadata schemata being used and the desired outcomes that the institutions want to reach.

Practices of metadata conversion in IRs

A number of metadata conversion projects have been conducted in libraries worldwide.

The University of Sydney Repository initiated a project to migrate isolated faculty and unit databases to DSpace, consolidating various self-developed metadata elements stored in FileMaker, SQL, and spreadsheet applications. Since these custom metadata fields differ significantly from the standard Dublin Core Metadata Set used in DSpace, four different migration options were proposed to ensure a seamless transition:

• Map original metadata elements to existing Dublin Core (DC) elements in DSpace

• Map original metadata elements to DC elements and create new qualifiers for DC elements

• Create a custom schema identical to the original metadata set

• Generate DC records as abstractions of the original metadata records and submit the original metadata records as digital object bit-streams

According to Brownlee (2009), each option involves specific advantages and disadvantages. The first option offers low submission and maintenance costs, compliance with OAI-PMH, and minimal effort for metadata schema customization, but it risks loss of metadata granularity and potential data distortion. The second option preserves the granularity of original records and supports harvesting via OAI-PMH; however, it entails higher submission and maintenance costs and presents challenges in managing the Dublin Core registry. The third option circumvents Dublin Core registry management issues but requires considerable effort in customizing metadata schemas, creating OAI crosswalks, and maintaining DSpace index keys and project-specific schemas. The final option retains metadata records in their original format but does not support harvesting of original records. The University of Sydney Library ultimately chose the fourth option, aligning with the repository's primary preservation function and minimizing ongoing resource requirements for maintenance of multiple schemas.

The Internet Public Library (IPL) at Drexel University undertook a project to convert local metadata elements stored in Hypatia (an SQL database) to the Dublin Core Metadata Set, developing a crosswalk between existing metadata elements and Dublin Core. To support this process, IPL conducted analyses of metadata quality and quantity, created a new IPL metadata schema as an application profile of Dublin Core, and developed a new database structure along with a metadata creation and maintenance interface (Galloway et al., 2009). Analytical comparisons revealed that there was no direct field-to-field mapping between IPL's existing fields and Dublin Core elements due to differences in labels, definitions, and data representations, as well as the presence of fields used only in Hypatia and some no longer in use.

IPL developed a custom metadata schema for crosswalk preparation by utilizing the concept of the application profile, combining existing IPL domain-specific elements with the Dublin Core Metadata Element Set. This approach enhances interoperability and consistency across diverse information systems. The tailored schema includes four namespaces:

• Dublin Core Metadata Element Set (version 1.1)

• Dublin Core Metadata Element Set Qualifier (2000)

• IPL-defined Metadata Element Set

• IPL-defined Metadata Element Set Qualifiers

The IPL metadata schema primarily emphasizes administrative and technical elements, with custom specifications for element status and repeatability tailored to the IPL context (Galloway et al., 2009). However, challenges remain in achieving consensus on metadata labels and element status within the IPL Dublin Core compliance group, highlighting ongoing efforts to develop more comprehensive content designation rules and enhance the semantic aspects of the schema (Galloway et al., 2009).

Khoo and Hall (2010) investigated the metadata merger between IPL and the Librarian's Internet Index (LII), where each library's metadata was mapped to Dublin Core to develop IPL2. Their study highlighted key challenges such as metadata inconsistencies, data normalization issues, and the complexities of integrating disparate metadata schemas, which are critical considerations for successful digital library data management and interoperability efforts.

Many metadata elements in sources like IPL and LII, such as Former Title, Sort Title, Acronym, Alternate Title, and Alternate Spelling, were rarely used and considered unnecessary. Extensive discussions took place regarding whether these elements should be included in IPL2. Ultimately, these metadata elements were stored in custom administrative fields that are hidden from end-users, streamlining the user interface and focusing on more relevant content.

Many IPL collections have collection-level records; however, there are no item-level records for the individual objects within these collections. This lack of item-specific metadata means that these objects are not mapped to Dublin Core (DC), potentially limiting discoverability and detailed cataloging. Ensuring comprehensive metadata at the item level is essential for improved metadata interoperability and enhanced user access to digital collections.

• The collections are stored in both a MySQL database and a FileMaker Pro database, so they cannot be included in the same crosswalk process

• Lack of controlled subject headings in both IPL and LII

The Energy and Environmental Information Resources Center undertook a project to convert Federal Geographic Data Committee (FGDC) metadata into MARC21 and Dublin Core formats within OCLC's WorldCat database. According to Chandler, Foley, and Hafez (2000), this conversion process involved three key steps, beginning with the identification of a reduced set of essential elements for effective metadata transformation.

An "essential FGDC metadata" set was selected to ensure full compliance with FGDC standards, focusing on mandatory elements such as author, title, subject, and date, along with commonly used creator elements A crosswalk was developed to map FGDC metadata to MARC21 and Dublin Core formats, facilitating interoperability Additionally, a converter program written in C was created to automate the conversion process, enabling seamless transformation between metadata standards.

Bountouri and Gergatsoulis (2009) developed a comprehensive crosswalk from Encoded Archival Description (EAD) to the Metadata Object Description Schema (MODS), consisting of three key components. Their approach involves creating semantic mappings between EAD elements and attributes and their MODS counterparts, ensuring accurate translation of metadata. Additionally, they map the hierarchical structure of EAD documents to MODS, preserving the original organizational context. This method facilitates effective metadata interoperability and maintains the integrity of archival information during schema conversion.

To create a semantic mapping between EAD and MODS, the process began with a detailed examination of their records, focusing on elements, attributes, semantics, and scope notes. Next, comprehensive mappings between EAD and MODS fields were established to ensure accurate correspondence. Finally, real-world examples were developed to verify the semantic accuracy and practical applicability of the mappings, ensuring effective integration between the two metadata standards.

Two approaches were explored to map the hierarchical structure of EAD documents to MODS. The standalone approach is suitable when describing a single archival unit, such as a photograph, while providing contextual information about its broader collection. In this method, the record for the photograph is linked to a separate record representing the entire collection. Conversely, when a comprehensive representation of resources is needed, creating records with nested MODS records allows for detailed, hierarchical resource descriptions.

When converting an EAD document to MODS without considering inherited information, significant data loss can occur. To address this issue, Bountouri and Gergatsoulis (2009, p.20-21) proposed two distinct approaches to ensure the preservation of essential information during the transformation process: resulting MODS records that embody the inheritance property, and constructing self-contained MODS records with respect to their information content.

Finally, the National Science Digital Library developed the Metadata Repository (MR) to convert metadata records harvested from various collections into Dublin Core records. According to Arms et al. (2003), the MR "holds collection-level metadata about every collection known to the NSDL". It is designed to support multiple metadata standards, including Dublin Core, Qualified Dublin Core, IEEE Learning Technology Standards Committee (IMS), MARC 21, and Encoded Archival Description (EAD). Since establishing a universal metadata standard across all collections is challenging, the MR accepts several preferred metadata formats provided by collections. In addition to storing the original harvested metadata records, the MR creates a standardized Dublin Core record in a format called nsdl_dc for each object, primarily generated through crosswalks from the original metadata. The import of metadata into the MR is facilitated via the OAI-PMH protocol, ensuring seamless integration and access across diverse metadata standards.

Figure 2.2: Import metadata record into MR via OAI-PMH

Semantic mapping of metadata in crosswalk

Semantic mapping is "the process of analyzing the definitions of the elements or fields to determine whether they have the same or similar meanings" (Woodley, 2008, p.3).

Mapping in ontology involves establishing correspondences among source ontologies to identify overlapping concepts, which are similar in meaning but differ in names or structures. According to Noy and Musen (2000), the primary goal is to determine these shared concepts as well as those unique to each source, facilitating effective integration and interoperability across different ontologies.

2.4.2 Types of similarity/correspondences among schemata elements in semantic mappings

Masood and Eaglestone (2003) suggested the Extended Common-Concept based Analysis Methodology (ECCAM). ECCAM defines two types of semantic similarity among schema elements:

• Shallow similarity: two elements share common concepts among their intrinsic meanings

• Deep similarity: two elements share common concepts among their intrinsic meanings in a particular context

• The intrinsic semantics of a schema element is its meaning independent of the context within which it is used

In-context semantics refer to the specific meaning of an element within the particular contexts defined by its schema. These semantics are shaped by both the intrinsic meaning of the element and the surrounding modeling contexts, ensuring a clearer understanding of its role and function in different schema environments.

In the mapping assertion metamodel below, there are four types of relations: similar, narrower, broader, and related-to.

Figure 2.3: Mapping assertion metamodel (Hakkarainen, 1999)

There is one more type of relation, the dissimilar relation, added to the modified metamodel (Su, 2004, p.105).
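One way to make these relation types operational, for example when recording mapping assertions in a conversion script, is a simple enumeration; a minimal sketch:

```python
# Relation types of the mapping assertion metamodel (Hakkarainen, 1999),
# plus the dissimilar relation added by Su (2004, p.105).
from enum import Enum

class MappingRelation(Enum):
    SIMILAR = "similar"
    NARROWER = "narrower"
    BROADER = "broader"
    RELATED_TO = "related to"
    DISSIMILAR = "dissimilar"
```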

Hakimpour and Geppert (2001) defined four levels of similarity relations as well:

• Specialized definitions (sub concept or sub relation)

From museum and archival practices, Lourdi, Papatheodorou and Nikolaidou (2006) identified specific "association" types correlating pairs of elements from the two different schemata:

• Equivalence: for mapping elements that have the same meaning

• Refinement: to express a relationship between an element and its qualifier, following exactly the DC

• Hierarchical: to connect elements that can be considered as broader and narrower concepts

2.4.3 Practice of semantic mapping in crosswalk

Lourdi, Papatheodorou, and Nikolaidou (2006) conducted a semantic mapping of metadata schemata within digital folklore collections at the Greek Literature Department of the University of Athens. They created a detailed table that correlates the semantics of two different vocabularies, identifying semantically related elements between schemas. Each metadata element was treated as a topic, with the researchers defining various types of associations that connect elements across different schemata, emphasizing the roles each element plays within these relationships. This approach enhances interoperability and understanding of metadata in digital cultural heritage collections.

The mapping procedure follows these steps:

Firstly, they consider each metadata element as a "topic" with its own attributes, according to the metadata standard it comes from.

They categorized the elements of the two schemata into three key topic types: descriptive, administrative, and structural metadata. Each metadata element is classified as an instance of one of these specific types, ensuring clear organization and understanding of the metadata components. This classification enhances the management and retrieval of digital resources by clearly distinguishing the different metadata functions. Implementing these categorizations improves metadata consistency, facilitating more efficient data discovery and interoperability across systems.

Next, specific "association" types correlating pairs of elements from the two different schemata are formulated as follows:

• Equivalence: mapping elements that have the same meaning

• Refinement: expressing a relationship between an element and its qualifier, following exactly the DC

• Hierarchy: connecting elements that can be considered as broader and narrower concepts

In an association, each element fulfills a specific role, leading to the establishment of key role pairs. These include equivalent terms for the "equivalence" association, broader and narrower terms for the "hierarchical" relationship, and element type or qualifier for the "refinement" association. Understanding these role types is essential for accurately modeling relationships within data and ensuring clarity in semantic structures.
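These association types and their role pairs can be captured in a small data structure. The sketch below is illustrative, and the element names in the usage example are invented rather than taken from the actual mapping table.

```python
# Association types and role pairs from Lourdi et al. (2006), encoded
# as a lookup plus a small record type for individual mappings.
from dataclasses import dataclass

ROLE_PAIRS = {
    "equivalence": ("equivalent term", "equivalent term"),
    "hierarchy":   ("broader term", "narrower term"),
    "refinement":  ("element type", "qualifier"),
}

@dataclass
class Association:
    source_element: str   # invented names below, for illustration only
    target_element: str
    kind: str             # one of ROLE_PAIRS' keys

    def roles(self):
        """(role of the source element, role of the target element)."""
        return ROLE_PAIRS[self.kind]

a = Association("collection.title", "dc:title", "equivalence")
print(a.roles())  # ('equivalent term', 'equivalent term')
```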

This article presents an example table illustrating the roles and association types used in mapping between the source application profile at the collection level and the target Dublin Core Collection Description Application Profile. It highlights how different roles and association types facilitate the effective alignment of metadata elements (Lourdi, Papatheodorou, and Nikolaidou, 2006, p.18). This mapping process is essential for ensuring interoperability and standardization in digital collection descriptions within the metadata community.

The mapping involves various metadata standards, including the Dublin Core Collection Description Application Profile (DC CD AP), ISAD (General International Standard Archival Description), ADL (the metadata model of the Alexandria Digital Library), RSLP (Research Support Libraries Program), and LOM (IEEE Learning Object Metadata). These standards are essential for ensuring consistent and effective description, organization, and retrieval of digital resources across different platforms and institutions. Understanding these metadata models helps enhance interoperability and discoverability in digital libraries and archival systems.

Figure 2.4: Semantic mappings between collection application profile and Dublin Core

However, in this table there is no clear explanation of why the element "ABSTRACT" in the target can be seen as a broader concept of the element "(DC)_CONTRIBUTOR" from the source in the mapping.

The challenges in metadata conversion

Batini and Lenzerini (1987, p.346) studied three types of structural conflicts in schema integration, as follows:

Type conflicts occur when the same concept is represented in different ways across various metadata schemata, leading to inconsistencies. For example, a class of objects may be modeled as an entity in one schema but as an attribute in another, which can complicate data integration and interoperability. Understanding and resolving these type conflicts is essential for maintaining coherent and semantically aligned metadata across diverse systems.

Dependency conflicts arise when the relations in a group of concepts are expressed with different dependencies in different metadata schemata. For example, the relationship "marriage" between "man" and "woman" is expressed as 1:1 in one schema, but m:n in another schema.

Behavioral conflicts occur when different insertion and deletion policies are assigned to the same class of objects across multiple schemata. For example, one schema may permit a "department" class to exist without employees, while another schema mandates that deleting the last employee in a department results in the department's deletion. These conflicts typically arise only when the data model supports representing the behavioral properties of objects, highlighting the importance of consistent policy enforcement in schema design.

From a similar point of view, Su (2004, p.85-86) categorized conflicts in semantic mapping into two types: terminology discrepancies and structural discrepancies.

• Synonyms occur when the same object or relationship is represented by different names/labels in the component schemata

• Homonyms occur when different objects or relationships are represented by the same name in the component schemata

• Type discrepancies arise when the same concept has been modeled using different data structures

Dependency discrepancies occur when related concepts have different dependency relationships across various schemas. For instance, the relationship between Project and Person as ProjectLeader can be a 1:1 dependency in one schema, but an m:n dependency in another, highlighting inconsistencies that can impact database design and data integrity. Understanding these variations is crucial for ensuring accurate data modeling and seamless schema integration.

In a study of metadata migration, Woodley (2008, p.7) indicated some misalignments that occur during data migration, including:

• There is no complete equivalence between metadata elements in the source database and those in the target database

• It is difficult to distinguish between metadata elements that describe the original object and those that describe related information, such as a related image or digital surrogate

• Data assigned to one metadata element in the source schema may be mapped to more than one element in the target schema

• Data presented in separate fields in the source schema may be placed in a single field in the target schema (both cardinality cases are sketched after this list)

• When no element in the target schema matches the meaning of the source content, unrelated information may be improperly included in metadata elements, leading to content that is either loosely related or entirely irrelevant, which can negatively impact data quality

• When there is no consistency in entering data into records, it may not be possible to use the same mapping mechanism for all records being converted

• There may be differences in granularity and community-specific information between the source and the target in conversion

• The source metadata schema may have a hierarchical structure with complex relationships among elements while the target schema has a flat structure, or vice versa
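A brief sketch of the one-to-many and many-to-one cases mentioned above, using invented field names and values:

```python
# One-to-many: a single source field holding 'Publisher, Year' is split
# across two qualified Dublin Core elements.
def split_imprint(value):
    publisher, _, year = value.rpartition(", ")
    return {"dc.publisher": publisher, "dc.date.issued": year}

# Many-to-one: separate source name fields are merged into one target
# author string; the field boundary survives only as punctuation.
def merge_name(first, last):
    return {"dc.contributor.author": f"{last}, {first}"}

print(split_imprint("Universitetsforlaget, 2000"))
print(merge_name("Ola", "Nordmann"))
```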

Furthermore, Chan and Zeng (2006) also found "that data values may be lost when converting from a rich structure to a simpler structure". In another study, Zeng and Qin (2008) addressed the four most serious issues in metadata conversion: "(1) misrepresented data values, (2) valuable data values that are lost, (3) incorrectly mapped elements and data values, (4) missing elements" (p.256).

Jackson et al. (2008) conducted experiments to examine semantic and value changes during metadata schema conversions, remapping records to Dublin Core at the University of Illinois. Their findings revealed that publicly available crosswalks, like the Library of Congress MARC to Dublin Core crosswalk, often fail to accurately reflect the semantic meaning of elements, leading to potentially misleading mappings. Notably, the description, format, subject, and type fields exhibited the most significant changes when remapped from original records. The increase in description and subject fields was primarily due to multiple value strings within single elements in the original metadata, highlighting challenges in maintaining semantic integrity during schema conversions.

The authors identified several conflicts in metadata mapping to Dublin Core elements, such as publication dates being incorrectly mapped to the coverage field instead of the date field. Additionally, information from different digital collections within the same IRs is often placed in the source field rather than the relation field. Some records misuse the format field to describe access methods instead of the digital object's format. They conclude that while original metadata records are rich in meaning within their own environment, this richness is lost during aggregation due to mapping errors, misunderstandings, and misuse of Dublin Core fields. Proper semantic-based mapping, rather than relying solely on value strings, can significantly enhance metadata quality and ensure better interoperability.

Park (2005) conducted a pilot study to evaluate the accuracy of mapping cataloger-defined natural-vocabulary field names to Dublin Core metadata elements, using a sample of 659 records from three digital image collections. The study revealed instances of incorrect and null mappings, highlighting challenges in ensuring consistent and accurate metadata translation. For example, the "physical field" in the source was mapped to either "description" or "format" in the target, and "subject" in the target was mapped from various source fields such as "category", "topic", and "keyword". Furthermore, some null-mapping fields, such as "contact information", "note", "scan date", and "full text", were identified as well. These findings emphasize the importance of refining mapping processes to improve metadata quality and interoperability in digital collections.

This pilot study highlights the critical need for implementing a mediation mechanism, such as metadata mapping guidelines and a mediation model like concept maps, to assist catalogers in the mapping process. Establishing clear guidelines and visual models can improve the accuracy and consistency of metadata mapping, ultimately enhancing metadata interoperability and searchability across systems. Incorporating these tools into cataloging workflows is essential for effective metadata management and fostering seamless data integration.

The goal of this mechanism is to increase semantic mapping consistency and enhance semantic interoperability across digital collections.

This chapter has reviewed methods for metadata conversion, including crosswalks, record conversion, and data reuse or integration, highlighting their roles in improving information retrieval systems. It has discussed practical approaches based on real-world experiences, emphasizing effective strategies for metadata transformation, and has addressed critical challenges such as semantic conflicts and quality-control issues encountered during metadata schema conversion. This theoretical background and these experiences might be useful for defining an appropriate strategy and making good preparation for the DUO conversion project at UBO.

RESEARCH METHODOLOGY

Methodology

This research employs a qualitative methodology to explore the perspectives of UBO librarians and external experts, providing in-depth insights into their viewpoints. It also analyzes the semantics of metadata elements used in the current DUO database, ensuring a comprehensive understanding of how metadata facilitates information retrieval. According to Strauss and Corbin, qualitative approaches are essential for examining complex social phenomena, making this methodology ideal for capturing nuanced stakeholder experiences and the semantic structure of metadata in library systems.

Qualitative methods are essential for exploring individuals' experiences and uncovering underlying aspects of phenomena that are not yet well understood (Strauss & Corbin, 1990, p.19). In the specific context of metadata conversion from DUO to DSpace at UBO, a case study approach is most appropriate. According to Pickard (2007, p.86), case studies provide a comprehensive understanding of a case through in-depth descriptions within its real-world context, making them ideal for holistic investigations. She emphasizes that using case studies allows researchers to gain detailed insights into complex phenomena or situations from the perspectives of all involved stakeholders (p.93), which aligns with the needs of this metadata conversion project.

The primary data collection method employed in this study is the structured interview, complemented by a critical analysis of previous research and system documents related to the metadata used in DUO and DSpace. This approach ensures a comprehensive understanding of current practices, research developments, and the specific context of the case study. Additionally, a metadata crosswalk between DUO and DSpace is developed using harmonization techniques to facilitate interoperability and standardization across systems.

Structured interviewing, as described by Pickard (2007, p.175) and defined by Fontana and Frey (1994, p.363), involves an interviewer asking each respondent a set of pre-established questions with a limited set of response options. This method ensures consistency and comparability across interviews, making it an essential approach in qualitative research.

Pickard (2007, p.175) identifies two types of structured interviews: the standardized, open-ended interview, where all respondents answer the same questions but can respond freely with any information they wish to share; and the closed, fixed-response interview, where respondents choose answers from predetermined options. These two forms can be used together in practice, as demonstrated in this study, which combines both structured interview approaches to gather comprehensive data.

According to Pickard (2007), a major advantage of the interview is the possibility of picking up visual and oral cues by listening to and observing respondents: the researcher learns not only from what respondents say but also from how they say it. Interviews are particularly useful for reaching an in-depth understanding of individual perceptions when the data sought are complex and difficult to obtain through simple questions and answers. In converting metadata from DUO to Dspace, the librarians and experts involved hold diverse attitudes and ideas, and these perspectives must be explored beforehand so that an effective conversion strategy can be defined.

The structured interview technique was implemented in two steps. First, a questionnaire containing both closed and open-ended questions was developed and sent to informants involved in the DUO migration project. Second, selected informants were to be interviewed, on the basis of their responses, to explore their experiences of central aspects of the case or to clarify ambiguous answers. In practice, only one informant was interviewed, by email, to elaborate on her answers: several questions required informants to interpret aspects of the project that had not yet been decided, and some declined to answer them, which limited the scope for follow-up interviews.

A crosswalk maps the elements, semantics, and syntax of one metadata scheme to those of another, enabling interoperability between different standards (NISO, 2004). Pierre and LaPlant (1998) add that performing a crosswalk involves applying transformations to the source metadata elements as required to ensure that the content conforms to the target metadata standard.

According to Chan and Zeng (2006), the crosswalk is the most commonly used method of achieving metadata interoperability between schemas, since it enables the effective conversion of data from one metadata standard to another. UBO has likewise decided to use a crosswalk for the metadata conversion from DUO to Dspace.

The crosswalk process consists of two steps: harmonization and semantic mapping.

In the definition by Pierre and LaPlant (1998), “harmonization is the process of enabling consistency across metadata standards” 6

The purpose of harmonization is to make it easier to develop, implement, and deploy crosswalks between metadata standards by using shared terminology, methods, and processes. The harmonization procedure is carried out as follows.

First, the common terminologies, properties, and organizational structures shared by the source and the target metadata schemas are identified. Standard formal definitions of each term, and shared vocabularies, are established so that the two schemas cannot be misinterpreted with respect to one another.

Next, the properties used in the two schemas are compared to identify their similarities and differences. Typical properties of a metadata element include its name, identifier, label, and definition; its data value (text, numeric, or controlled vocabulary); its obligation (mandatory or optional); its relationships to other elements (equivalent or hierarchical); and its repeatability (repeatable or unrepeatable).

Finally, the data in the source and the target schemata should be presented in a similar way, so that the mapping in the crosswalk can be created easily.

5,6 http://www.niso.org/publications/white_papers/crosswalk/
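As a small illustration of the comparison of element properties described in the second step, the sketch below models the properties just listed and reports where two element definitions differ. The class and helper names are assumptions made for the example; they are not part of either standard.

    from dataclasses import dataclass

    # Minimal model of the element properties compared during harmonization
    # (the property set follows the list above; the class itself is assumed).
    @dataclass
    class ElementSpec:
        name: str
        definition: str
        data_value: str      # "text", "numeric", or "controlled vocabulary"
        obligation: str      # "mandatory" or "optional"
        repeatable: bool

    def property_differences(source: ElementSpec, target: ElementSpec):
        """Return the names of the properties on which two elements differ."""
        return [p for p in ("data_value", "obligation", "repeatable")
                if getattr(source, p) != getattr(target, p)]

    # Example: a free-text DUO keyword field versus a DC subject element
    # whose values come from a controlled vocabulary.
    duo_keywords = ElementSpec("KEYWORDS", "Free keywords",
                               "text", "optional", True)
    dc_subject = ElementSpec("Subject", "The topic of the content",
                             "controlled vocabulary", "optional", True)
    print(property_differences(duo_keywords, dc_subject))  # ['data_value']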

Developing the crosswalk by semantic mapping of metadata elements between the source and the target schemata

According to Pierre and LaPlant (1998), this step defines a mapping from each metadata element in the source schema to a semantically equivalent element in the target schema. The mappings are typically laid out in tables or charts. The common types of mapping are listed below, and a small illustrative sketch follows the list:

One-to-one mapping: the element in the source schema corresponds to exactly one element in the target schema.

One-to-many mapping: a single source element is mapped to several target elements, for instance when a source title field holds the formal title, a subtitle, and a second-language title that the target schema records separately. This situation typically arises when converting from a simple schema to a more detailed one, and it requires specialized knowledge of how the source element is composed so that it can be split correctly.

Many-to-one mapping: several source elements are mapped to a single target element, as is common when converting a complex schema to a simpler one. Clear rules are needed for the surplus elements. If all source values are to be combined into one target value, the rules for concatenation must be specified; if only one source element is to be mapped, information from the others is lost, so criteria such as importance or commonality must be defined to select which value to keep.

Null mapping: an element in the source schema has no corresponding element in the target schema. In this situation, qualifiers may be created in the target schema.
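The sketch below applies these four mapping types to a handful of DUO fields, producing qualified Dublin Core key/value pairs. It is illustrative only: the field names follow the DUO documentation, but the flat record layout, the separators, and the hypothetical dc.description.unitcode qualifier are assumptions, not part of either system.

    # Minimal crosswalk sketch applying the four mapping types to a flat
    # DUO record (record layout, separators, and the dc.description.unitcode
    # qualifier are assumptions made for illustration).

    def crosswalk(duo_record):
        dc = []  # (dc_field, value) pairs; DC fields are repeatable

        # One-to-one: TITLE -> dc.title.
        if duo_record.get("TITLE"):
            dc.append(("dc.title", duo_record["TITLE"]))

        # One-to-many: AUTHORLIST holds several authors separated by '#'
        # and expands to repeated dc.contributor.author values.
        for author in duo_record.get("AUTHORLIST", "").split("#"):
            if author.strip():
                dc.append(("dc.contributor.author", author.strip()))

        # Many-to-one: the DUO keyword fields all land in dc.subject;
        # here the rule is one dc.subject value per keyword.
        for field in ("KEYWORDS", "ALTKEYWORDS"):
            for keyword in duo_record.get(field, "").split(";"):
                if keyword.strip():
                    dc.append(("dc.subject", keyword.strip()))

        # Null mapping: UNIT CODE has no DC counterpart, so a local
        # qualifier must first be created in the target registry.
        if duo_record.get("UNIT CODE"):
            dc.append(("dc.description.unitcode", duo_record["UNIT CODE"]))

        return dc

    # Example:
    # crosswalk({"TITLE": "A study", "AUTHORLIST": "Doe, J.#Roe, A.",
    #            "KEYWORDS": "metadata; migration", "UNIT CODE": "UB-01"})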

Sampling technique

Snowball sampling is used to select respondents for the structured interviews, because it is difficult to identify all suitable informants at the outset. The process begins with an introduction letter describing the purpose and objectives of the study, sent to people involved in the DUO conversion project at UiO, including the library director, vice director, IT unit director, research department director, chief engineer, and consultants, as well as to Dspace administrators at Oslo University College and the Cambridge University repository. These initial contacts are then asked to recommend other relevant people who can provide useful information, and the search continues until all suitable informants are covered.

The instrument selected to collect data is an online questionnaire.

The questionnaire is designed to collect ideas, attitudes, and comments about the research issues from respondents at UBO and outside. It contains both closed and open-ended questions.

The questionnaire consists of an introduction, three sections, and a respondent profile, described below.

The introduction gives respondents guidelines on how to answer the questions.

Section 1: Strategy for metadata conversion

This section includes positioning questions about the motivations, approach, influencing factors, and strategy for the metadata conversion from the DUO database to Dspace.

Section 2: Metadata conversion from DUO to Dspace

The respondents are asked specific questions about the reuse of metadata elements in the DUO database, the usage of Dublin Core elements, and the configuration of the metadata registry in Dspace.

Section 3: Conflicts/risks in metadata conversion from DUO to Dspace

This section explores respondents' perceptions and interpretations, based on their experience, of the potential conflicts and risks in metadata conversion, and of how the library can prepare to identify, manage, and control them.

The final part of the questionnaire collects the respondent's profile: name, position or role, and email address. Respondents are assured that this personal information is kept confidential and used only for follow-up discussion related to the study.

The questionnaire was built and distributed with SurveyMonkey, an online survey tool, to informants at UBO and outside, for reasons of convenience and reach. An online survey improves the chances of reaching potential respondents, which suits the snowball sampling technique, and saves time, cost, and effort for both researcher and informants. On the other hand, online surveys carry risks such as technical errors, computer incompatibility, and low response rates owing to the lack of face-to-face interaction with informants.


In short, the process of developing the questionnaire includes the following steps:

- Identifying the main research issues
- Reviewing previous studies related to the topic
- Discovering the data required for the research questions
- Developing the questions in the questionnaire: closed questions and open-ended questions
- Setting the priority order of the questions
- Designing the structure of the questionnaire
- Evaluating the questionnaire in a pilot study
- Distributing the questionnaire to the informants

Figure 3.1: Steps in developing the questionnaire

Pilot testing

Pilot testing is an important step that allows the researcher to discover problems and make the necessary adjustments before the official data collection begins. In this study, the online questionnaire was tested by a digital services librarian at Oslo University College, who has expertise in digital technologies, institutional repositories, and Dspace. She has also been invited to become a member of the consultant committee for the DUO conversion project at UBO.

The pilot respondent recommended rephrasing some questions to reduce ambiguity, changing the three-point scale from "very important, less important, no need" to "definitely use, maybe use, won't use", and merging two related questions into one.

Data analysis

The data collected through the structured interviews are mainly qualitative, concerning respondents' perceptions and interpretations. To analyse these data, constant comparative analysis is used for coding and categorizing; Hewitt-Taylor (2001, p.42) describes it as a method for identifying the broad themes, patterns, or categories that emerge from qualitative data.

This method comprises three steps: coding, categorizing, and clustering.

In coding, each question in the questionnaire is assigned a code representing the theme of the data it collects. Each code is defined by a name, a description, and an abbreviation. The data are then grouped under the relevant codes, and supplementary notes such as the question number and the respondent's name are recorded.

After coding, codes that contain similar opinions are merged into meaningful categories, and all data assigned to those codes are consolidated under the corresponding category.

Finally, the categories are clustered around each research question to identify which categories address which research issues; some categories may relate to more than one question. Categories unrelated to the current research issues can be set aside as suggestions for future research.

The results of the harmonization process are presented in tables whose columns reflect the semantics and content of the metadata elements: element labels, qualifiers (for Dublin Core), definitions, and refinements. The crosswalk between the source and target schemas is then presented in a composite table, so that each source element can be read off against its corresponding target element, with the type of mapping indicated.

Limitations of the research

Some limitations of the research are addressed below:

First, the DUO conversion project at UBO is still at an early stage, so the informants could not provide full information about it. Their answers may therefore not clarify the research issues sufficiently, since they could only interpret aspects of the project that have not yet materialized.

Second, the metadata documentation for the DUO database is written in Norwegian, so translation tools such as Google Translate and dictionaries had to be used to render it into English. Because automated translation is imperfect, some information may have been translated incorrectly, which made it harder to understand the content fully.

Third, the use of English for all questions and answers may make informants uncomfortable about expressing their ideas, which can affect the quality of the data collected. Technical terms may also be misunderstood, and the language barrier makes it more difficult for the researcher to conduct interviews with participants.

Finally, some informants are so busy with their work that they might not take enough time to answer the questions, or they may refuse to participate in the study.

Ethical consideration

The confidentiality of informants was emphasized by stating clearly in the questionnaire that they would remain anonymous; their names were coded during data analysis and presentation. All data collected from the questionnaires were used strictly for research purposes, and the completed questionnaires are therefore not included in the appendices.

DATA ANALYSIS AND FINDINGS

The analysis of data collected by online questionnaires

Online questionnaires were sent to twenty informants holding various roles in the DUO project: project management, steering committee members, line managers, DUO students and academics, and DUO media representatives. Six informants with expertise in the DUO project completed and submitted their responses through SurveyMonkey, an online survey tool; the rest declined to participate, citing a lack of specialized knowledge of the project.

The table below gives a brief profile of the informants who participated; their names are anonymized for confidentiality. In the analysis that follows, their original responses are reproduced in quotation marks, and clarifications added by the researcher appear in square brackets.

#H    Vice director                       University of Oslo Library (UBO)
#K    Head engineer of new DUO project    University of Oslo Library (UBO)
#To   Consultant                          University Center for Information Technology (USIT), University of Oslo

Table 4.1: The profile of informants

All responses collected through SurveyMonkey were exported as PDF files to preserve the structure of the questionnaire. Each question was assigned a theme representing the answers given to it, and similar themes were clustered into broader categories addressing the research issues. Three major categories emerged from the data analysis:

- Strategy of converting DUO metadata elements to Dspace
- Customization of metadata elements in Dspace
- Challenges of metadata conversion from DUO to Dspace

4.1.1 Strategy of converting DUO metadata elements to Dspace at UBO

The informants gave their opinions on four aspects of converting metadata elements from DUO to Dspace: the motivations for the conversion, the approaches to it, the factors influencing the selection of a conversion strategy, and the methods of conversion.

4.1.1.1 Motivations of migrating DUO to Dspace

Two main motivations drove the decision to migrate from DUO to Dspace. First, the current technical platform of DUO cannot support future maintenance and development at UBO: the "DUO technical platform is being deprecated" (#K), and "DUO was developed in [a] programming environment that all web-application[s] in UiO (University of Oslo) shall leave" (#M). The technical limitations of DUO are thus an important reason why it no longer receives support.

Second, Dspace was preferred for its wide adoption, ease of customization, and interoperability, as most informants emphasized:

- "Almost every institution in Norway use the DSpace software for their institutional repository: easier to share code, no longer necessary to develop own software" (#T)
- "All other universities in Norway except NTNU [Norwegian University of Science and Technology] …"
- "[Dspace is] common software platform for nearly all repositories in Norway. It is also …"
- "Interoperability, cooperation with other institutions (DSpace is very common in Norwegian universities and colleges), highly customizable open source software, free, durability" (#To)

Dspace was also valued for being open-source software that can be extensively customized and integrated with other systems at the University of Oslo, for its low total cost of ownership, and for a community that makes it possible to collaborate with other Dspace institutions nationally and internationally.

In short, the technical limitations of the current DUO system and the need for better interoperability with other institutional repositories in Norway are the main motivations for the migration to Dspace.

4.1.1.2 Approaches for converting metadata elements in DUO to Dspace

Interestingly, the informants proposed two different approaches to converting the metadata elements in DUO to Dspace.

The first approach is to change the metadata elements in DUO completely, to fit the default Dublin Core Metadata Element Set (DCMES) in Dspace. Two informants considered this the right strategy, because "there is no reason to mix metadata schemes" across DUO and Dspace and because "ideally, both systems should adhere to relevant standards such as Dublin Core".

With this approach, the DUO database implemented in Dspace at UBO could interoperate more easily with other institutional repositories in Norway and worldwide. However, DUO was developed in-house to meet UBO's specific needs, so it contains local metadata elements that have no direct equivalents in DCMES; in the conversion, such local metadata may be lost or mapped incorrectly. Careful consideration is therefore needed before choosing this approach to migrating DUO to Dspace.

The informants stressed that no metadata values should be lost in the conversion: "the library should of course make sure they keep all the metadata values in the conversion" (#T), and "a workaround may be useful if valuable information is held in the original formats" (#E).

Accordingly, two informants suggested a second approach. One proposed "keeping the original metadata elements" intact so that no existing information is lost, without yet deciding which metadata to retain. The other proposed "retaining only important local elements", regarding local administrative data as uninteresting for preservation. In other words, the metadata elements that identify and give access to the digital objects in DUO should be preserved, while the database elements used for administering the tables can be removed.

This approach emphasizes preserving the metadata elements and their values in the current DUO system so that no information is lost in the conversion. It leaves open several questions: how many of the original metadata elements should be kept, what the maintenance costs will be, and how well DUO will interoperate with other systems after the conversion. The answers from informants #H and #K did not address these points.

In summary, two approaches to converting metadata elements from DUO to Dspace were proposed. The first is to replace the DUO metadata elements completely with DCMES in Dspace, which favours interoperability with other systems in Norway and worldwide but puts local data at risk. The second is to preserve the original DUO metadata elements, or at least the important local ones, which protects local information but may limit interoperability. The advantages and disadvantages of each must be weighed when selecting the conversion method.

4.1.1.3 Factors influential to the selection of a strategy for converting DUO to Dspace

The informants rated a set of predetermined factors influencing the selection of a strategy for migrating DUO to Dspace, with the option of adding further factors of their own, on a three-point scale: most important, important, and least important. The results are shown in Figure 4.1.

Figure 4.1: Factors influential to strategy of conversion

Harmonization of metadata elements in DUO and Dspace

Pierre and LaPlant (1998) have defined “harmonization is the process of enabling consistency across metadata standards” The purpose of harmonization is to successfully develop the crosswalks between metadata schemata

The results of the harmonization are organized in a table whose columns reflect the semantics and content of the metadata elements: the element labels, the qualifiers (for the Dublin Core elements), and their definitions and refinements.

Before the harmonization can be developed, the structures of the DUO database and of the default Dublin Core schema in Dspace have to be understood.

USIT documented the structure of the DUO database, in Norwegian, in 2007. The documentation shows that DUO is a relational database containing tables and fields for describing and giving access to objects, together with tables and fields for administering the database itself.

The database comprises 16 tables: BIB_WORK, BIB_LANGDESCR, BIB_ORGUNIT, BIB_XMLMETADATA, BIB_INSTANCE, BIB_CLASSIFICATION, ASSOCIATION TYPE, WORKS ASSOCIATION, BIB_CLASSES, BIB_ACTUAL USERS, BIB_EDITOR, BIB_LANGUAGE, BIB_LOGTEXTTABLE, BIB_LOGTABLE, DOCUMENT TYPE, and SCIENCE. The structure and fields of each table are described in Appendix 1.

The complicated relations among these tables are depicted in the following diagram:

Figure 4.4: Relations among tables in DUO database
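To illustrate how the metadata for a single work is assembled from this relational structure before any crosswalk is applied, the hypothetical query below joins a work to its language descriptions and instances. The table names come from the USIT documentation, but the column names and join keys are assumptions made for the example (the actual fields are listed in Appendix 1), and sqlite3 is used only to keep the sketch self-contained.

    import sqlite3

    # Hypothetical join over three DUO tables; table names follow the USIT
    # documentation, while column names and join keys are assumed.
    QUERY = """
    SELECT w.title, l.abstract, i.filepath
    FROM BIB_WORK w
    LEFT JOIN BIB_LANGDESCR l ON l.work_id = w.id
    LEFT JOIN BIB_INSTANCE i ON i.work_id = w.id
    """

    def export_work_rows(db_path):
        """Return one row per work/description/instance combination."""
        with sqlite3.connect(db_path) as conn:
            return conn.execute(QUERY).fetchall()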

The Dublin Core Usage Board approved, in 2005, a document describing the Dublin Core qualifiers: two broad classes of qualifier and specific instances of each. These qualifiers were identified mainly by working groups of the Dublin Core Metadata Initiative, and implementers may develop additional qualifiers for local applications or specific domains.

Qualifiers fall into two classes. Element refinements narrow the meaning of a Dublin Core element and make it more specific. Encoding schemes identify devices that aid the interpretation of an element's value, such as controlled vocabularies and formal notations or parsing rules.

A list of the Dublin Core elements and their qualifiers is given in Appendix 3.

The default Dublin Core Metadata Set in Dspace has been adapted, so it deviates in places from qualified Dublin Core. Notably, the qualifier "author" of the "contributor" element is used to identify a person or organization responsible for the content, while the "creator" element is reserved for harvested metadata. The Dublin Core elements and their qualifiers in Dspace's metadata registry are listed in Appendix 2.

Below is the harmonization between the metadata elements in DUO and the qualified Dublin Core Metadata Element Set (DCMES) in Dspace.

DUO fields, with their definitions, are shown on the left; the corresponding DCMES elements in Dspace, with their qualifiers and definitions, on the right.

TITLE (title of the document) -> Title: the name given to the resource.

SUBTITLE (subtitle of the document) -> Title.alternative: any form of the title used as a substitute or alternative to the formal title.

ALTTITLE (title in a second language) -> Title.alternative.

AUTHORLIST (list of authors, separated by #) -> Creator: an entity primarily responsible for making the content of the resource; note: used only for harvested metadata. In Dspace, Contributor is used instead: an entity responsible for making contributions to the content of the resource; the qualifier advisor is used primarily for the thesis advisor.

ABSTRACT (summarizes the content of the resource) and a field for papers related to the content of the resource -> Description: an account of the content of the resource; tableofcontents: a list of subunits of the content of the resource; abstract: a summary of the content of the resource; sponsorship: information about the sponsoring agency; uri: a Uniform Resource Identifier pointing to a description of the object.

KEYWORDS (free keywords), ALTKEYWORDS (free keywords in a second language) -> Subject (encoding schemes ddc, lcc, lcsh, mesh): the topic of the content of the resource, typically expressed as keywords, key phrases, or classification codes: Dewey Decimal Classification number, Library of Congress classification number, Library of Congress Subject Headings, Medical Subject Headings.

(no DUO field) -> Publisher: the entity responsible for making the resource available.

CREATION DATE (date on which the document was created), FIRSTPUBLISHED (first time the document was published), LASTPUBLISHED (last time the document was published), MONTHAPPROVED (month in which the document was approved), YEARAPPROVED (year in which the document was approved) -> Date: a date associated with the creation or availability of the resource; created: date of creation of the intellectual content, if different from date.issued; available: date that the resource will become available to the public; issued: date of publication or distribution; submitted: date of submission of the resource, recommended for theses/dissertations; accessioned: date Dspace takes possession of the object; copyright: date of a statement of copyright.

TYPE (category of objects: article, report, book chapter, conference paper, dissertation…; an OAI type name is defined for mapping and harvesting within the Open Archives Initiative), ENGNAME (English name for the document type), NORNAME (Norwegian name for the document type) -> Type: the nature or genre of the content of the resource, covering general categories, functions, genres, or aggregation levels for content.

XML TEXT (XML stream with metadata) -> Format: the physical or digital manifestation of the resource, typically its media-type or dimensions; extent: the size or duration of the resource; medium: the material or physical carrier of the resource.

LANGID (ISO 639-2 code for the language), ENGNAME (English name of the language), NORNAME (Norwegian name of the language) -> Language: a language of the intellectual content of the resource.

(no DUO field) -> Source: a reference to a resource from which the present resource is derived; note: only used for harvested metadata.

FILEPATH (URL of the full-text document), MAGTITLE (title of the journal), MAGYEAR (year in which the journal was published), MAGFIRSTPAGE (first page in the journal), MAGLASTPAGE (last page in the journal) -> Identifier: an unambiguous reference to the resource within a given context; citation: a bibliographic citation giving sufficient detail to identify the resource unambiguously, whether or not the citation follows a particular standard; govdoc: a government document number; isbn: International Standard Book Number; issn: International Standard Serial Number; sici: Serial Item and Contribution Identifier; ismn: International Standard Music Number.

INSTDESCR (a brief description of the file, shown on the title page, for instance that it is a corrected version), TEXTTO (the series holding/containing the document) -> Relation: a reference to a related resource; isversionof: references an earlier version of the object; hasversion: references a later version of the object; ispartof: the described resource is a physical or logical part of the referenced resource; ispartofseries: series name and number within that series; haspart: the described resource includes the referenced resource either physically or logically; isreferencedby: pointed to by the referenced resource; references; uri: Uniform Resource Identifier for the related item.

REFEREE (specifies whether the document is refereed) -> (no corresponding DC element).

(no DUO field) -> Coverage: the extent or scope of the content of the resource; spatial: spatial characteristics of the intellectual content of the resource; temporal: temporal characteristics of the intellectual content of the resource.

(no DUO field) -> Rights: information about rights held in and over the resource; information about who can access the resource, or an indication of its security status; license: a legal document giving official permission to do something with the resource.

YEAROFBIRTH (the birth year of the author), TUTOR (the thesis supervisor), ORGNAME (the name of the unit), ORGTYPE (the type of unit: faculty, institute…), NORWEGIAN DISPLAY (Norwegian name that appears in the interface), ENGLISH DISPLAY (English name that appears in the interface), UNIT CODE (the unit code), SCIENCE (the discipline of the unit), CONTENT (for series of booklets) -> (no corresponding DC element).

Table 4.2: Harmonization between fields in DUO and default Dublin Core in Dspace

The harmonization shows clear differences between the DUO field labels and the Dublin Core metadata elements; the DUO labels do not follow consistent assignment rules. DUO also contains more elements than DCMES, so several DUO elements map to a single DCMES element, and many DUO elements have no correspondence in DCMES at all.

The crosswalk of metadata elements in DUO and default Dublin Core in Dspace

The crosswalk follows the approach identified in section 4.1.1.2: the important local elements of DUO are preserved in the conversion to Dspace. The DUO data elements are first mapped to the existing Dublin Core elements in Dspace, and the remaining DUO elements, which have no direct counterparts, are mapped to newly created DC qualifiers.

The crosswalk of metadata elements in the two systems is presented in a composite table for easy comparison: each element from the source (DUO) is listed with its correspondent element in the target (Dspace), and the type of mapping between the source and target schemas is indicated as well.

DUO fields | Semantic mapping | Qualified DC elements

KEYWORDS, ALTKEYWORDS | Many-to-one | Subject (lcsh, mesh, ddc, lcc)
LANGID, ENGNAME, NORNAME | Many-to-one | Language (ISO 639-2, RFC 3066)
YEAROFBIRTH, TUTOR, CONTENT, ORGNAME, ORGTYPE, SCIENCE, NORWEGIAN DISPLAY, ENGLISH DISPLAY, UNIT CODE | Null mapping | (no corresponding element)

Table 4.3: The crosswalk of metadata elements in DUO and Dspace

Table 4.3 shows that the elements listed at the bottom of the DUO table have no corresponding elements or qualifiers in Dublin Core. According to the findings in section 4.1.2.3, three of these elements, ORGNAME, ORGTYPE, and SCIENCE, were strongly recommended for reuse in Dspace. Considering the semantics of the Dublin Core elements established in the harmonization, these three elements are potentially suitable candidates for mapping to the 'publisher' element.

The elements YEAROFBIRTH, TUTOR, and CONTENT were rated 'maybe use' and can optionally be carried over. The administrative elements NORWEGIAN DISPLAY, ENGLISH DISPLAY, and UNIT CODE are used in DUO for management purposes only, and can be removed where they are not needed.

Another issue identified in the crosswalk table is the presence of conflicts in the mapping of metadata elements. Several kinds of conflict are discussed below:

- Terminology conflicts: the same concepts are labelled inconsistently in the two schemas, a form of synonymy. For example, 'SUBTITLE' and 'ALTTITLE' in DUO correspond to 'Title.alternative' in Dublin Core, 'AUTHORLIST' to 'Contributor.author', and 'KEYWORDS' in DUO to 'Subject' in Dublin Core. Such discrepancies hinder interoperability and must be resolved by the crosswalk.

- Null mapping: some elements in DUO have no correspondent elements in Dublin Core; these are listed at the bottom of Table 4.3.

- Many-to-one mapping: several elements in DUO are consolidated into a single element in DC, with their distinctions carried by qualifiers. If these mappings are not managed carefully, data may be lost or distorted. Examples:

KEYWORDS, ALTKEYWORDS, CLTYPE (in DUO) = Subject (in DC)

CREATION DATE, FIRSTPUBLISHED, LASTPUBLISHED, MONTHAPPROVED, YEARAPPROVED (in DUO) = Date.created, Date.issued, Date.modified, Date.accepted (in DC)
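As a sketch of how such a many-to-one family can be kept apart rather than collapsed into one value, the routing below sends each DUO date field to the DC qualifier named in the example above; the flat record layout and the helper are assumptions made for illustration.

    # Route DUO date fields to distinct DC date qualifiers instead of
    # collapsing them into a single value (routing follows the example
    # above; the record layout is assumed for illustration).
    DATE_ROUTING = {
        "CREATION DATE":  "dc.date.created",
        "FIRSTPUBLISHED": "dc.date.issued",
        "LASTPUBLISHED":  "dc.date.modified",
        "YEARAPPROVED":   "dc.date.accepted",
    }

    def map_dates(duo_record):
        """Return (dc_field, value) pairs for the date fields present."""
        return [(dc_field, duo_record[duo_field])
                for duo_field, dc_field in DATE_ROUTING.items()
                if duo_record.get(duo_field)]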

The main cause is structural: the metadata elements in DUO, designed for a relational database, are more complex in structure than the simple Dublin Core elements used in Dspace. As a result, many DUO elements end up as qualifiers of a single Dublin Core element in Dspace.

This section has explained the mechanism of metadata mapping in the conversion of DUO to Dspace. The crosswalk in Table 4.3 was developed by combining the findings of section 4.1.2 with the analysis of metadata harmonization in section 4.3. Evaluating the metadata mapping from DUO to Dspace in this way exposes the potential risks and conflicts, so that a careful and effective conversion plan can be prepared.

Findings of the study

The analysis of the questionnaire data and the harmonization process has produced two groups of findings: a strategy for converting the metadata elements in DUO to Dspace, and the challenges to be expected during the conversion.

4.4.1 Strategy for converting metadata elements in DUO to Dspace

The important components of this strategy are the motivations, the approaches, the influencing factors, and the methods of the conversion.

First, the main motivation for migrating the DUO database to Dspace is to overcome the technical limitations of the current DUO platform. Dspace offers wide adoption, easy customization, and interoperability with the other repository systems in Norway.

Secondly, two approaches to the conversion were proposed:

- Completely change the metadata elements in DUO to fit the default Dublin Core Metadata Element Set in Dspace
- Keep the elements of the original records in DUO during the conversion

It was notably emphasized that only the important local elements should be kept in the conversion.

Thirdly, the factors influencing the conversion strategy include interoperability with other institutions in Norway, maintenance costs, preservation needs, and the availability of skilled staff. Interoperability and maintenance costs were rated the most important.

Finally, two options for accommodating DUO metadata in Dspace were identified: mapping the DUO data elements to qualified Dublin Core elements and creating new qualifiers for the default Dublin Core elements where needed; or constructing in Dspace a separate schema that mirrors the DUO metadata. Their advantages and disadvantages were analysed in section 4.1.1.4.

To determine which local elements in DUO and which DCMES elements in Dspace should take part in the mapping, the questionnaire asked the informants to rate each element. Most Dublin Core metadata elements were found worth reusing. The elements strongly recommended for reuse include document type, document type in English, unit, unit name in English, category, subtitle, approval date (day, month, year), Norwegian language type, and dissertation abstract. The first and last page of the journal and the status element were recommended for exclusion; the other elements can be used as appropriate.

To clarify the mapping of metadata elements at the schema level in the conversion from DUO to Dspace, the crosswalk in Table 4.3 was developed on the basis of these recommendations and of the semantics of the metadata elements analysed in the harmonization.

4.4.2 Challenges of metadata conversion from DUO to Dspace

The informants identified two main risks in the metadata conversion from DUO to Dspace: data loss and data distortion. They also pointed to conflicts in the metadata mapping between DUO and the default Dublin Core schema in Dspace, including differences in data representation, synonymy, structural differences between metadata elements, and duplicated values.

Table 4.3 confirms several of these conflicts, among them terminology conflicts (synonymy) and structural conflicts, and reveals further issues, namely null mappings and many-to-one mappings between the two systems.

To control the risks and conflicts of the metadata mapping process in the conversion, the informants offered several recommendations in the questionnaires.

Firstly, thorough planning before the conversion is the most important measure. The plan should cover competent staff, the cleaning and quality control of metadata, expertise in Dspace, and the procedures to be followed in the conversion.

Secondly, a pilot conversion with sample data should be conducted so that problems and errors are discovered and corrected before the full conversion. Each stage should be tested carefully, combining automated processing with manual comparison of individual records, and the lessons learned from the pilot should feed back into the conversion process.
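Part of such a pilot can be automated. The sketch below compares a sample of converted records with their sources field by field, assuming both sides can be exported as simple field-to-value dictionaries; it reports values that disappeared or changed, which are precisely the data-loss and distortion risks discussed above.

    def compare_records(source, converted, mapping):
        """Report source values that are missing or altered after conversion.

        source, converted: dicts of field -> value for one record;
        mapping: dict of source field -> target field (the crosswalk).
        """
        problems = []
        for src_field, tgt_field in mapping.items():
            src_value = source.get(src_field)
            if src_value and src_value != converted.get(tgt_field):
                problems.append((src_field, src_value,
                                 converted.get(tgt_field)))
        return problems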

Thirdly, the metadata registry in Dspace should be extended by creating additional qualifiers for the existing Dublin Core elements, as indicated by the crosswalk table, so that the local DUO elements can be accommodated. This will improve the precision of metadata management in Dspace.
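In Dspace 1.5.x the metadata registry is loaded from an XML file of dc-type entries (config/registries/dublin-core-types.xml), each declaring a schema, an element, an optional qualifier, and a scope note. The sketch below generates such an entry; the qualifier name unitcode is hypothetical, standing in for whichever local qualifiers the crosswalk table calls for.

    import xml.etree.ElementTree as ET

    def dc_type(element, qualifier=None, scope_note=""):
        """Build one <dc-type> entry for a Dspace metadata registry file."""
        node = ET.Element("dc-type")
        ET.SubElement(node, "schema").text = "dc"
        ET.SubElement(node, "element").text = element
        if qualifier:
            ET.SubElement(node, "qualifier").text = qualifier
        ET.SubElement(node, "scope_note").text = scope_note
        return node

    registry = ET.Element("dspace-dc-types")
    # Hypothetical local qualifier for the DUO UNIT CODE field:
    registry.append(dc_type("description", "unitcode",
                            "Administrative unit code carried over from DUO"))
    print(ET.tostring(registry, encoding="unicode"))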

These findings are used to answer the research questions in the next chapter.

CONCLUSION AND RECOMMENDATION
