Volume 6, Issue 1, Article 6, 2017-03-31
An Exploratory Sequential Mixed Methods Approach to
Understanding Researchers’ Data Management Practices at UVM: Findings from the Quantitative Phase
https://escholarship.umassmed.edu/jeslib/vol6/iss1/6
Creative Commons License
This work is licensed under a Creative Commons
Attribution-Noncommercial-Share Alike 4.0 License
This material is brought to you by eScholarship@UMassChan. It has been accepted for inclusion in Journal of eScience Librarianship by an authorized administrator of eScholarship@UMassChan.
Full-Length Paper
An Exploratory Sequential Mixed Methods Approach to Understanding
Researchers’ Data Management Practices at UVM: Findings from the
Quantitative Phase
Elizabeth A Berman
Tufts University, Medford, MA, USA
*Formerly Library Associate Professor, University of Vermont
Correspondence: Elizabeth A Berman: elizabeth.berman@tufts.edu
Keywords: data management, mixed methods research, quantitative research, research data
services, academic libraries, survey
Rights and Permissions: Copyright Berman © 2017
All content in Journal of eScience Librarianship, unless otherwise noted, is licensed under
a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License
Abstract
This article reports on the second, quantitative phase of an exploratory sequential mixed methods research design focused on researcher data management practices and related institutional support and services. The study aims to understand data management activities and challenges of faculty at the University of Vermont (UVM), a higher research activity Research University, in order to develop appropriate research data services (RDS). Data was collected via a survey built on themes from the qualitative data analysis in the first phase of this study. The survey was distributed to a nonrandom census sample of full-time UVM faculty and researchers (P=1,190); from this population, a total of 319 participants completed the survey for a 26.8% response rate. The survey collected information on five dimensions of data management: data management activities; data management plans; data management challenges; data management support; and attitudes and behaviors towards data management planning. Frequencies, cross tabulations, and chi-square tests of independence were calculated using demographic variables including gender, rank, college, and discipline. Results from the analysis provide a snapshot of research data management activities at UVM, including types of data collected, use of metadata, short- and long-term storage of data, and data sharing practices. The survey identified key challenges to data management, including data description (metadata) and sharing data with others; this latter challenge is particularly impacted by confidentiality issues and lack of time, personnel, and infrastructure to make data available. Faculty also provided insight into RDS that they think UVM should support, as well as RDS they were personally interested in. Data from this study will be integrated with data from the first, qualitative phase of the research project and analyzed for meta-inferences to help determine future research data services at UVM.
Introduction
The need for data curation, “the active and ongoing management of data through its life cycle of interest and usefulness to scholarship, science, and education” (Council on Library and Information Resources 2016, para 1), has become a major issue in scholarly communication: “Data curation activities enable data discovery and retrieval, maintain its quality, add value, and provide for reuse over time” (para 1). Since 2003, the National Institutes of Health (NIH) has required investigators requesting $500,000 or more in direct costs in any year of a grant to share their data with the scientific community (National Institutes of Health 2003). In 2011, the National Science Foundation (NSF) began to require that researchers submit a data management plan (DMP) with their grant applications; the purpose of the DMP was to account for the long-term preservation of and access to scientific research data produced through government funding. In 2013, the White House Office of Science & Technology Policy (OSTP) issued a directive that requires granting agencies to develop a plan to make both the data and published articles of federally funded research available to the public at no cost. Since that memorandum, federal agencies have been developing their own plans and policies to account for public access to federally funded research; the Association of Research Libraries (ARL) website (2016) maintains links to these agency plans.
Beyond federal research mandates, data in and of itself is increasingly being acknowledged as a scholarly product, a crucial part of academic discourse that has the potential to impact future research (Williford and Henry 2012). This is particularly true in interdisciplinary and transdisciplinary domains such as environmental studies, where researchers are “dependent upon access, discovery, and interoperability of data sets drawn from a variety of sources” (Scaramozzino, Ramírez, and McGaughey 2012, 350). Data curation also extends into the arts and humanities; Flanders and Muñoz write, “a key aspect of humanities data curation is thus to ensure that the representations of objects of study in the humanities functions effectively as data: that they are processable by machines and interoperable such that they are durably processable across systems and collections while still retaining provenance and complex layers of meaning” (2014, para 3).
This increased recognition of the importance of preserving and maintaining digital data has had a direct impact on higher-education institutions that are working to provide data curation services, or “the active management and appraisal of digital information over its entire life cycle” (Pennock 2007, para 2). A number of researchers have conducted needs assessments or environmental scans of their institutions in order to understand their research data landscape. One popular approach to these scans has been to use quantitative methods, which collect and analyze numerical data from a sample population in order to examine the relationship among variables, test theories, and generalize to a broader population (Creswell 2014; Singleton and Straits 2010). In particular, multiple studies have been published using survey instruments to collect data from a diverse sample (Table 1). These studies are generally framed around the Data Lifecycle Model (DDI Alliance Structural Reform Group 2004): collecting research data; describing, analyzing, and storing data in the short term; and providing access to and preserving data in the long term (Figure 1).
Table 1: Comparison of methods used in data management studies
Method: Survey
Study | Institution | Questions | Respondents
Akers and Doty (2013) | Emory University | 13 | 330
D’Ignazio and Qin (2008) | SUNY College of Environmental Science & Forestry; Syracuse University | - | 111
Diekema, Wesolek, and Walters (2014) | multi-institution | 16 | 196
Parham, Bodnar, and Fuchs (2012) | Georgia Institute of Technology | - | 63
Scaramozzino, Ramírez, and McGaughey (2012) | California Polytechnic State University, San Luis Obispo | 18 | 82
Steinhart et al. (2012) | Cornell University | 43 | 86
Tenopir et al. (2011) | multi-institution | 23 | 1,329
Weller and Monroe-Gulick (2014) | University of Kansas | - | 415
Whitmire, Boock, and Sutton (2015) | Oregon State University | 29 | 443
Figure 1: Data Lifecycle Model (DDI Alliance Structural Reform Group 2004)
A number of these studies explicitly focus on researchers in the science and technology fields, where discussions about data management have been accelerated by NIH and NSF funding mandates. Cornell University’s Research Data Management Service Group surveyed NSF Principal Investigators (PIs) “in order to understand how well-prepared researchers are to meet the new NSF data management planning requirement, to build our own understanding of the potential impact on campus services, and to identify service gaps” (Steinhart et al. 2012, 64). Diekema, Wesolek, and Walters (2014) investigated whether science and engineering researchers had the skills to effectively manage data and whether the institution had the necessary infrastructure to support data management activities. To answer these research questions, the authors surveyed three groups of interest: STEM faculty, sponsored program officers, and academic librarians affiliated with institutional repositories.
Other researchers are taking a broader approach, surveying the entire faculty population to understand similarities and differences in disciplinary management of digital data. Parham, Bodnar, and Fuchs (2012) designed a survey to better understand data resource output in order to “discover the types of data assets created and held by researchers, how the data are managed, stored, shared, and reused, and researchers’ attitudes toward data creation, sharing, and preservation” (10). Scaramozzino, Ramírez, and McGaughey (2012) surveyed teacher-scholar faculty at California Polytechnic State University, San Luis Obispo, to address issues of data preservation, data sharing, and education needs of researchers managing data. Akers and Doty (2013) and Whitmire, Boock, and Sutton (2015) used surveys to understand varying approaches to data management in order to develop appropriate research data services.
These studies shed light on the research behaviors of faculty, but their focus on institutional populations limits their generalizability to all research faculty. McLure et al. (2014) emphasize that “local studies can inform libraries and librarians about the behaviors, needs, interests, and concerns of researchers at individual institutions” (158). Guided by the literature, this study is crucial to unpacking and understanding specific approaches to data management, as well as data management needs and challenges, at the University of Vermont.
Purpose Statement
This article reports on the second phase of an exploratory sequential mixed methods research (MMR) design aimed at understanding data management behaviors and data management planning attitudes of faculty at the University of Vermont (UVM). Mixed methods research draws on the strengths of both qualitative and quantitative research, providing a more holistic understanding of a problem or phenomenon. The exploratory sequential mixed methods design, characterized by an initial phase of qualitative data collection and analysis followed by a phase of quantitative data collection and analysis (Figure 2), was selected in order to develop better instruments to measure data management activities at UVM, including behaviors and attitudes toward data management planning (Creswell 2014).
For the quantitative phase of this study, a survey instrument was developed based on the qualitative analysis of the first phase of the study in order to establish a broad understanding of the campus data management environment (Berman 2017). The survey measured the following dimensions: data management activities; data management plans; data management challenges; data management support; attitudes and behaviors towards data management planning; and demographics. This survey was deployed to all current UVM faculty and researchers in an attempt to reveal key distinctions among different populations of researchers and to generalize the findings from the phase one qualitative research, which focused only on successful National Science Foundation (NSF) grantees (Berman 2017).
The second phase of this MMR research was guided by four research questions. The first two parallel the research questions from the qualitative phase, while questions three and four were developed explicitly from the qualitative data analysis (Berman 2017):
RQ1: How do faculty at UVM manage their research data, in particular how do they
share and preserve data in the long-term?
RQ2: What challenges or barriers do UVM faculty face in effectively managing their
research data?
RQ3: What institutional data management support or services are UVM faculty
interested in?
RQ4: How do researchers’ attitudes and beliefs towards the data management
planning process influence their data management behaviors, in particular how do they intend to share and preserve their data?
The primary objective of this phase of the research study is to understand researchers’ current data management behaviors and challenges within and across all disciplines. The results of this phase will be integrated with the results of the first phase to guide the development of research data services at UVM. As a result, the analysis of RQ4 will not be addressed in this publication, as it proposes the development of a bipolar adjective scale to assess attitudes and beliefs towards the data management planning process in order to measure intention of implementing data management plans.
Figure 2: Exploratory sequential mixed methods research design
Methods
Population
The target population for this quantitative study was all full-time faculty at the University of Vermont. UVM is a higher-research activity Research University with a humanities and social sciences-dominant graduate instructional program (The Carnegie Classification of Institutions of Higher Education 2017). In 2015-2016, UVM enrolled 10,081 undergraduate students, 1,360 graduate students, and 457 medical students (University of Vermont 2017). Working with the Office of Institutional Research, a list was generated of 1,190 full-time instructional and research faculty as of October 1, 2015. Using nonrandom census sampling, the entire population was invited to participate in the survey via a personalized email invitation.
Survey Instrument Development
Surveys provide a means to standardize measurement of a phenomenon, ensuring that consistent information is obtained across all respondents (Fowler 2014). Utilizing design-level data linking (Creswell and Plano Clark 2011; Fetters, Curry, and Creswell 2013), themes from the analysis of the qualitative data were used to drive development of the survey instrument; in particular, the language used and themes addressed by interview participants and in data management plans formed the foundation for writing questions (Berman 2017). Questions related to attitudes and behaviors used the theory of planned behavior (Ajzen 1991; Ajzen 2005; Ajzen and Fishbein 2000) as a model of how researcher attitudes and beliefs guide intention and behavior towards data management. Survey development was also informed by prior research (in particular Akers and Doty 2013; Scaramozzino, Ramírez, and McGaughey 2012; Whitmire, Boock, and Sutton 2015).
The survey included 46 questions (Q1-Q46) and 72 items covering five dimensions: data management activities (Q4-Q16); data management plans (Q17-Q23); data management challenges (Q33); data management support (Q34-Q41); and attitudes and behaviors towards data management planning (Q24-Q32). Q1 was used to screen out participants who do not collect, generate, or use data for their research, while Q3 screened out participants who do not engage in management of digital data. These participants were branched to the demographics section. Demographic data (Q42-Q46) was requested from all survey participants and included college, department, rank, number of years at UVM, and gender. The full instrument can be found in the Appendix.
Survey Administration
The survey was created using UVM’s LimeSurvey software license, which allowed for electronic distribution and collection of data. Following the advice of Dillman, Smyth, and Christian (2008), the layout provided intuitive navigation through the survey instrument, the questions were uncluttered and easy to read, and the response tasks were simple, with predominantly closed-question options. The survey was pre-tested by six faculty researchers in four disciplines to ensure that the questions were well understood and that the answers were meaningful (Madans et al. 2011; Presser et al. 2004). Based on feedback from the pre-test, survey questions and instrument design were modified. A final survey instrument was submitted to the UVM Research Protections Office and received a Protocol Exemption Certification.
All full-time UVM faculty and researchers were invited to participate in the study via a personalized email that included a brief description of the purpose of the survey and a unique link to the survey. To encourage participation, the survey invited participants to enter their names into a raffle for six $50 Amazon.com gift certificates at the completion of the survey. The survey was open from October 20, 2015, through November 11, 2015, with two reminder emails sent on October 29, 2015, and November 9, 2015. Data were downloaded from LimeSurvey and analyzed in SPSS version 22.
Results
Quantitative Survey Respondents
Of the 1,190 UVM faculty who were invited to participate in the survey, 345 participants started the survey and 319 participants completed the survey for a 26.8% response rate. This response rate is within the range of online response rates (20.0% to 47.0%) identified by Nulty (2008), and is comparable to response rates from similar published research (D’Ignazio and Qin 2008; Whitmire, Boock, and Sutton 2015). While appropriate measures were taken to reduce sources of bias, the relatively low response rate increases the potential for non-response bias, where respondents differ in meaningful ways from non-respondents (Singleton and Straits 2010). Descriptive statistics of respondent demographics can be found in Table 2.
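The response rate arithmetic can be checked with a short sketch. The three counts come from the paragraph above; the variable names and the completion rate among those who started the survey are illustrative additions and are not reported in the article.

```python
# Counts reported in the text
invited = 1190    # full-time faculty invited to participate
started = 345     # participants who started the survey
completed = 319   # participants who completed the survey

# Response rate as reported: completed surveys over invitations
response_rate = completed / invited

# Derived here for illustration only (not reported in the article):
# share of those who started the survey and went on to complete it
completion_rate = completed / started

print(f"Response rate: {response_rate:.1%}")
print(f"Completion rate among starters: {completion_rate:.1%}")
```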
Table 2: Descriptive statistics of participants in phase two
Value Frequency Observed Proportion Observed Frequency Expected Proportion Expected Percentage Deviation Standardized Residuals
Due to the wide range of disciplines within the College of Arts and Sciences, faculty were also sorted into disciplinary categories for analysis: Arts & Humanities (A&H), Social Sciences & Business (SS&B), and Science, Technology, Engineering & Mathematics (STEM) (Table 3).
Table 3: Disciplinary alignment of survey respondents
Arts & Humanities (A&H): Art & Art History (CAS); Asian Languages &
Social Sciences & Business (SS&B): Political Science (CAS); Psychological Sciences (CAS); Social Work (CESS); Sociology (CAS)
Science, Technology, Engineering & Mathematics (STEM): Animal Science (CALS); Biochemistry (CALS); Biology (CAS); Chemistry (CAS); Computer Science (CEMS); Engineering (CEMS); Geology (CAS); Mathematics & Statistics (CEMS); Medicine (COM); Microbiology & Molecular Genetics (CALS); Natural Resources & Environment (RSENR); Nursing & Health Sciences (CNHS); Nutrition & Food Science (CALS); Physics (CAS); Plant & Soil Science (CALS); Plant Biology (CALS)
Because of the wide representation of researchers within the population of study, not all survey questions were applicable to all respondents. Screening questions and branching logic were employed to ensure participants were asked to respond only to relevant questions; depending on responses, participants could be asked to answer 6 questions (N=43), 16 questions (N=38), 30 questions (N=177), or 46 questions (N=61) (Figure 3). Because there were no required questions, response rates for each question varied.
Since the survey was distributed to the entire population, and not a random sample of the population, survey responses may be skewed towards researchers with a greater stake in data management activities. A chi-square goodness of fit test was calculated to determine if the sample proportions of UVM faculty college, rank, and gender were in the same proportions as those reported for the UVM faculty population. The test was conducted using α = 0.05. As shown in Table 2, there was a statistically significant difference between the sample and the population for college (n = 249, χ² = 16.55, df = 7, p = 0.0205), rank (n = 252, χ² = 11.61, df = 5, p = 0.0405), and gender (n = 252, χ² = 9.56, df = 1, p = 0.002). Faculty from the College of Arts and Sciences and the College of Education and Social Services were notably over-sampled, while faculty from Grossman School of Business and the College of Medicine were under-sampled. As a result, the sample was not representative of the population, which may limit generalizability of the results to the campus.
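A minimal sketch of how a chi-square goodness of fit test compares sample counts against population proportions is given below. It derives the kinds of quantities named in the Table 2 column headings (expected frequency, percentage deviation, standardized residual). The function names and the two-category counts are hypothetical and are not the study's data; the closed-form p-value applies only when df = 1, and the study itself ran its analysis in SPSS.

```python
import math

def goodness_of_fit(observed, expected_props):
    """Chi-square goodness of fit of observed counts against population proportions."""
    n = sum(observed)
    chi2 = 0.0
    rows = []
    for obs, prop in zip(observed, expected_props):
        exp = n * prop  # expected frequency if the sample mirrored the population
        chi2 += (obs - exp) ** 2 / exp
        rows.append({
            "observed": obs,
            "expected": exp,
            "pct_deviation": 100 * (obs - exp) / exp,
            "std_residual": (obs - exp) / math.sqrt(exp),
        })
    df = len(observed) - 1
    return chi2, df, rows

def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical example: 60 vs. 40 respondents where the population is split 50/50
chi2, df, rows = goodness_of_fit([60, 40], [0.5, 0.5])
p = p_value_df1(chi2)
print(chi2, df, round(p, 4))

# The reported gender statistic (chi-square = 9.56, df = 1) can be checked the same way
p_gender = p_value_df1(9.56)
print(round(p_gender, 3))
```

A positive standardized residual marks a category that is over-represented in the sample relative to the population, and a negative one marks under-representation.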
Quantitative Data Analysis
RQ1 Data Management Activities
Survey questions were structured around data management activities based on the Data Lifecycle Model (DDI Alliance Structural Reform Group 2004) and the themes covered in the phase one qualitative research (Berman 2017). Questions included: types of data collected (Q2); data file size (Q4); generation and use of metadata (Q5); short-term (5 years or less) data storage (Q6); long-term (more than 5 years) data storage and preservation (Q8); data retention (Q9); and data sharing practices (Q13) and limitations (Q14).
On average, respondents produced and collected 4.42 types of digital data, with a standard deviation of 2.49; full results of data types, by discipline, can be seen in Figure 4. Table 4 shows frequencies for data management activity variables, including metadata generation, digital data size, short-term data storage, long-term data storage and preservation, retention of digital data, and data sharing methods. Of the respondents who do create metadata (N=50), seven indicated that they use known metadata standards, while the remaining 43 use a standard they devised. Seventeen survey respondents indicated they deposited data into repositories, notably GenBank, Protein Data Bank (PDB), the Long-Term Ecological Research Network (LTER), and the Gene Expression Omnibus (GEO). Analysis of these data management variables (Q4-Q9) against gender, rank, college, and discipline produced no statistically significant differences.
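The cross tabulations and chi-square tests of independence mentioned in this analysis can be sketched as follows. This is an illustration only: the function name and the 2x2 table are hypothetical, no continuity correction is applied, and the closed-form p-value shown holds only for df = 1. The study's own tests were run in SPSS.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of the row and column variables
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

# Hypothetical cross tabulation: rows are two respondent groups,
# columns are yes/no answers to a single survey item
hypothetical_table = [[10, 20],
                      [20, 10]]
chi2_ind, df_ind = chi_square_independence(hypothetical_table)

# Closed-form upper-tail p-value, valid only because df_ind == 1 here
p_ind = math.erfc(math.sqrt(chi2_ind / 2))
print(round(chi2_ind, 3), df_ind, round(p_ind, 4))
```

A p-value below the chosen α (0.05 in this study) would indicate that the response distribution differs between groups; the analysis above found no such differences for Q4-Q9.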
Figure 3: Survey branching logic flowchart and number of respondents
Table 4: Data management activities variables
*Respondents were allowed to select multiple responses
Q10 Long-Term Data Storage and Preservation Location (Always/Often)*
Q11 Retention of Digital Data
Figure 4: Q2 Which of the following best describe the types of data you have produced, or anticipate producing, as part of your research? Please choose all that apply. (N=276)
Figure 5: Q13 How often do you share your digital data with others (outside your research team) using the following methods (always, often, sometimes, rarely, never)? (N=208)