Guest post by Starr Hoffman, editor of Dynamic Research Support for Academic Libraries.
Much like the confusion between open access and open source, the terms research data and secondary data are sometimes confused in the academic library context. A large source of confusion is that the simple term “data” is used interchangeably for both of these concepts.
What is Research Data?
As research data management (RDM) has become a hot topic in higher education due to grant funding requirements, libraries have become involved. Federal grants now require researchers to include data management plans (DMPs) detailing how they will responsibly 1) make taxpayer-funded research data available to the public via open access (for instance, by depositing it in a repository) and 2) preserve it for the future. Because there are often gaps in campus infrastructure around RDM and open access, many academic libraries have stepped in to provide guidance with writing data management plans, finding appropriate repositories, and following other good data management practices.
This pertains to original research data–that is, data that is collected by the researcher during the course of their research. Research data may be observational (from sensors, etc.), experimental (gene sequences), or derived (data or text mining), among other types, and may take a variety of forms, including spreadsheets, codebooks, lab notebooks, diaries, artifacts, scripts, photos, and many others. Data takes many forms not only in different disciplines, but in different methodologies and studies.
Example: Dr. Emmett “Doc” Brown performs a series of experiments in which he notes the exact speed at which a DeLorean will perform a time jump (88 MPH). This set of data is original research data.
What is Secondary Data?
Secondary data is usually called simply “data” or “datasets.” (For the sake of clarity, I prefer to refer to it as “secondary data.”) Unlike research data, secondary data is data that the researcher did not personally gather or produce during the course of their research. It is pre-existing data on which the researcher will perform their own analysis. Secondary data may be used either to perform original analyses or for replication (studies which follow the exact methodology of a previous study, in order to test the reliability of the results; replication may also be performed by following the same methodology but gathering a new set of original research data). Secondary data can also be joined to additional datasets, including datasets from different sources or original research data.
Example: Let’s say that Marty McFly makes a copy of Doc Brown’s original data and performs a new analysis on it. The new analysis reveals that the DeLorean was only able to time-jump at the speed of 88 MPH due to additional variables (including a power input of 1.21 jigowatts). In this case, the dataset is secondary data.
Reuse of Research Data
Another potential point of confusion is that one researcher’s original research data can be another researcher’s secondary data. For instance, in the example above, the same dataset is considered original research data for Doc Brown, but is secondary data for Marty McFly.
Data Services: RDM or Secondary Data?
The phrase “data services” can also be confusing, because it may encompass a variety of services. A potential menu of data services could include:
- Assistance locating and/or accessing datasets.
o This might pertain to vendor-provided data collections, consortial collections (such as ICPSR), locally produced data (in an institutional repository), or publicly accessible data (such as the U.S. Census).
o Because this service specifically focuses on accessing data, it by default pertains to secondary data.
- Data management plan (DMP) assistance.
o Typically only applies to original research data.
- Data curation and/or RDM services.
o These may include education on good RDM practices, assistance depositing data into an institutional repository (IR), assistance (or full-service) creating descriptive or other metadata, and more.
o Typically only provided for original research data. However, if transformative work has been done to a secondary dataset (such as merging with additional datasets or transforming variables), data curation / RDM may be necessary.
- Assistance with data analysis.
o This service is more often provided for students than for faculty, but may include both groups.
o Services may include providing analysis software, software support, methodological support, and/or analytical support.
o May include support for both original research data and secondary data.
You Say “Data Are,” I Say “Data Is” …Let’s Not Call the Whole Thing Off!
So in the end, why does all this matter? The primary takeaway is to be clear about specific types of data, particularly when communicating about services the library will or won’t provide. In many cases this will be obvious–for instance, “RDM” contains within it the term “research data” and is thus clear. Less clear is when a library department decides to provide “assistance with data.” What does this mean? What kind of assistance, and for what kind of data? Is the goal of the service to support good management of original research data? Or is the goal to support the finding and analysis of secondary data that the library has purchased? Or another goal altogether?
Clarity is key both to understanding each other and to clearly communicating emerging services to our researchers.
Starr Hoffman is Head of Planning and Assessment at the University of Nevada, Las Vegas, where she assesses many activities, including the library’s support for and impact on research. Previously she supported data-intensive research as the Journalism and Digital Resources Librarian at Columbia University in New York. Her research interests include the impact of academic libraries on students and faculty, the role of libraries in higher education and models of effective academic leadership. She is the editor of Dynamic Research Support for Academic Libraries. When she’s not researching, she’s taking photographs and travelling the world.
Sign up to our mailing list to hear more about our books:
It’s Love Your Data Week 2017 and today we have made Section 8 from Moira Bent’s Practical Tips for Facilitating Research available open access. A PDF of the Section, Specific interventions in the research process or lifecycle, can be downloaded here.
We will be releasing more Open Access chapters throughout Love Your Data Week and publishing blogposts from our authors. For a chance to win one of our research data management books, share a tweet about why you (or your institution) are participating in Love Your Data Week 2017 using #WhyILYD17. More details about the prize draw are available here.
Facet Publishing have announced the release of The Data Librarian’s Handbook by Robin Rice and John Southall.
This new book, written by two data librarians with over 30 years’ experience, unpicks the everyday role of the data librarian and offers practical guidance on how to collect, curate and crunch data for economic, social and scientific purposes.
Interest in data has been growing in recent years. Support for this peculiar class of digital information – its use, preservation and curation, and how to support researchers’ production and consumption of it in ever greater volumes to create new knowledge – is needed more than ever. Many librarians and information professionals are finding their working life is pulling them toward data support or research data management, but they lack the skills required.
Covering everything from handling, managing and curating data to data literacy, research data management policies, data management plans, data repositories, confidential or sensitive data, open scholarship and open science, The Data Librarian’s Handbook is a must-read for all new entrants to the field, LIS students and working professionals.
The authors said, “Our aim is to offer an insider’s view of data librarianship as it is today, with plenty of practical examples and advice. At times we link this to wider academic and research agendas and scholarly communication trends, while grounding these thoughts back in the everyday work of data librarians and other information professionals”.
Robin Rice is Data Librarian at EDINA and Data Library, an organisation providing data services for research and education based in Information Services at the University of Edinburgh.
John Southall is Data Librarian for the Bodleian Libraries at the University of Oxford. He is based in the Social Science Library and is subject consultant for Economics, Sociology and Social Policy & Intervention.
Starr Hoffman has made two videos to support her new book Dynamic Research Support for Academic Libraries, published this month by Facet. The first video describes how academic libraries can support the research lifecycle for faculty and students and the second introduces the book and defines ‘research support’.
Facet are pleased to announce the release of two new books, Practical Tips for Facilitating Research and Dynamic Research Support for Academic Libraries.
Higher education is in a period of rapid evolution and academic libraries must continually evaluate and adjust their services to meet new needs. Librarian roles are changing and new specialisms, such as data librarians, are emerging. Activities are being driven by researcher requirements such as the demand for wider dissemination and the impact of research.
Two new books from Facet Publishing, Practical Tips for Facilitating Research and Dynamic Research Support for Academic Libraries, will provide inspiration and practical guidance to enable LIS staff developing their role in the research environment to evaluate their current provision and develop services to meet the evolving needs of the research community.
Practical Tips for Facilitating Research offers innovative tips and reliable best practice to assist academic liaison librarians, research support librarians and all library and information professionals who work with research staff and students.
Author Moira Bent said, “my book bridges the gap between theory and practice, grounding the very practical ideas garnered from library and information staff around the world in current research in the library and information science discipline.”
Dynamic Research Support for Academic Libraries provides inspiration through illustrative examples of emerging models of research support and is contributed to by library practitioners from across the world.
Editor Starr Hoffman said, “Dynamic Research Support for Academic Libraries is designed to inspire librarians and administrators to think of ‘research support’ not merely as Reference 2.0, but as an innovative, holistic activity that should be distributed throughout the organization.”
A preview chapter for each book is available on the Facet website, along with information about how to order.
The slideshow below takes you chapter-by-chapter through the new Facet title edited by Deborah Shorley and Michael Jubb, The Future of Scholarly Communication.
Picture a scene: in a county record office somewhere in England, a young archivist is looking through the morning post. Among the usual enquiry letters and payments for copies of documents is a mysterious padded envelope. Opening it reveals five floppy disks of various sizes, accompanied by a brief covering letter from the office manager of a long-established local business, explaining that the contents had been discovered during a recent office refurbishment; since the record office has previously acquired the historic paper records of the company, perhaps these would also be of interest? The disks themselves bear only terse labels, such as ‘Minutes, 1988-90’ or ‘customers.dbf’. Some, the archivist recognizes as being 3.5” disks, while the larger ones seem vaguely familiar from a digital preservation seminar she attended during her training. On one point she is certain: the office PCs are not capable of reading any of them. How can she discover what is actually on the disks, and whether they contain important business records or junk? And even if they do prove of archival interest, what should the record office actually do with them?
Meanwhile, a university librarian in the mid-west USA attends a faculty meeting to discuss the burgeoning institutional repository. The repository was introduced a few years ago to store PDF copies of academic preprints and postprints, but there is increasing demand from staff to store other kinds of content in a much wider range of formats, from original research data to student dissertations and theses, teaching materials and course notes, and to make that content available for reuse by others in novel ways. How, the librarian ponders, does the repository need to be adapted to meet these new requirements, and what must the library do to ensure the long-term preservation of such a diverse digital collection?
Finally, in East Africa, a national archivist has just finished reading a report from a consultant commissioned to advise on requirements for preserving electronic records. The commission is the latest in a series of projects to develop records management within government, and he knows that this work is crucial to promoting transparency, empowering citizens by providing them with access to reliable information, reducing corruption and improving governance through the use of new technologies. The national archives has achieved much in recent years, putting in place strong records management processes and guidance. But how to develop the digital preservation systems necessary to achieve the report’s ambitious recommendations, with limited budgets and staff skills, and an unreliable IT infrastructure?
Practical Digital Preservation is intended to help these people, and the countless other information managers and curators around the world who are wrestling with the challenges of preserving digital data, to answer these questions. If the book had been written only a few years ago, it would first have had to explain the need for digital preservation at length, illustrated no doubt with celebrated examples of data loss such as the BBC Domesday disks, or NASA’s Viking probe.
Today, most information management professionals are all too aware of the fact that, without active intervention, digital information is subject to rapid and catastrophic loss – the warnings of an impending ‘Digital Dark Ages’ have served their purpose. Hopefully, they are equally alive to the enormous benefits of digital preservation, in unlocking the current and long-term value of that information. Instead, their principal concern now is how to respond in a practical way to these challenges. There is a sense that awareness of the solutions has not kept pace with appreciation of the potential and the problems.
Such solutions as are widely known are generally seen as being the preserve of major institutions – the national libraries and archives – with multi-million pound budgets and large numbers of staff at their disposal. Even if reality often doesn’t match this perception – many national memory institutions are tackling digital preservation on a comparative shoestring – there is no doubt that such organizations have been at the vanguard of developments in the field.
The challenges can sometimes appear overpowering. The extraordinary growth in the creation of digital information is often described using rather frightening or negative analogies, such as the ‘digital deluge’ or ‘data tsunami’. These certainly reflect the common anxieties that information curators and consumers have about their abilities to manage these gargantuan volumes of data, and to find and understand the information they need within. These concerns are compounded by a similarly overwhelming wave of information generated by the digital preservation community: no one with any exposure to the field can have escaped a certain sense of despair at ever keeping up to date with the constant stream of reports, conferences, blogs, wikis, projects and tweets.
Practical Digital Preservation demonstrates that, in reality, it is not only possible but eminently realistic for organizations of all sizes to put digital preservation into practice, even with very limited resources and existing knowledge. The book demonstrates this through a combination of practical guidance, and case studies which reinforce that guidance, illustrating how it has already been successfully applied in the real world.