Guest post by Gillian Oliver, co-author of Digital Curation, 2nd edition
As with any love story with a happy ending, a successful relationship with data will take effort and commitment. Here are five practical ways to ensure the course of true love runs smoothly:
1. Data by design
Unlike in human relationships, you can specify your ideal characteristics and so make sure you're working from the best possible starting point. It's never too early to begin design: project planning should incorporate awareness of data requirements from the perspectives of all the stakeholders involved. If you need convincing, remember that being aware and proactive up front will greatly reduce the overall costs of data curation. The features to think about are likely to include choices between open and proprietary file formats, metadata schemas and workflows, naming conventions and storage requirements.
2. Learn from others
Don't try to go it alone: there's a wealth of experience out there, and much of it is freely available. Here are just two examples of websites that can be mined for practical advice. The Digital Preservation Coalition website contains many useful reports, especially the Technology Watch series. The Digital Curation Centre has an astonishing wealth of content, ranging from basic explanations of core definitions to very practical tools and guidance.
3. Don’t try to reinvent the wheel
This reinforces the previous point, which can't be repeated often enough. There are many standards available, such as the Open Archival Information System (OAIS) standard, which provides a high-level conceptual model for digital archives, or the Dublin Core schema for descriptive metadata. These standards have been developed by international and cross-disciplinary communities, and are subject to ongoing review.
4. Don’t be a loner, get out and socialise
There are plenty of opportunities to collaborate with people grappling with the same problems, which can only enrich your relationship with your data. Sharing your knowledge will also help build and grow the worldwide community of practice. Socialising can happen online, or face to face if you're fortunate enough to be able to take advantage of the many conferences, workshops and events that take place around the world. The Open Preservation Foundation provides a central hub for tools, advice and knowledge exchange. Particularly useful are its blogs, which provide insight into current activities, both successes and failures.
5. Never give up
Good relationships can be established at a much later stage: unappreciated and unloved data need not be rejected if there are signs of potential for a fulfilling and positive future. But you will need specialist advice if you go down this track. BitCurator provides a gateway to digital forensics tools and methods in the cultural heritage context. Brown Dog is a project that seeks to bring the long tail of data into the light; the focus of its efforts is past and present uncurated data.
So, what are you waiting for? Love your data, starting today!
Gillian Oliver is Associate Professor at Monash University and the co-author of Digital Curation, 2nd edition (Facet 2016) and Records Management and Information Culture (Facet 2014), the co-editor of Engaging with Records and Archives (Facet 2016) and a Co-editor in Chief of the journal Archival Science.
Guest post by Starr Hoffman, editor of Dynamic Research Support for Academic Libraries.
Much as open access is sometimes confused with open source, the terms research data and secondary data are sometimes confused in the academic library context. A large part of the confusion comes from the simple term "data" being used interchangeably for both concepts.
What is Research Data?
As research data management (RDM) has become a hot topic in higher education due to grant funding requirements, libraries have become involved. Federal grants now require researchers to include data management plans (DMPs) detailing how they will responsibly 1) make taxpayer-funded research data available to the public via open access (for instance, by depositing it in a repository) and 2) preserve it for the future. Because there are often gaps in campus infrastructure around RDM and open access, many academic libraries have stepped in to provide guidance with writing data management plans, finding appropriate repositories, and following other good data management practices.
This pertains to original research data, that is, data collected by the researcher during the course of their research. Research data may be observational (from sensors, etc.), experimental (gene sequences) or derived (data or text mining), among other types, and may take a variety of forms, including spreadsheets, codebooks, lab notebooks, diaries, artifacts, scripts, photos, and many others. Data takes different forms not only across disciplines but also across methodologies and studies.
Example: Dr. Emmett "Doc" Brown performs a series of experiments in which he records the exact speed at which a DeLorean performs a time jump (88 MPH). This dataset is original research data.
What is Secondary Data?
Secondary data is usually called simply "data" or "datasets." (For the sake of clarity, I prefer to refer to it as "secondary data.") Unlike original research data, secondary data is data that the researcher did not personally gather or produce during the course of their research. It is pre-existing data on which the researcher performs their own analysis. Secondary data may be used either to perform original analyses or for replication (studies that follow the exact methodology of a previous study in order to test the reliability of its results; replication may also be performed by following the same methodology but gathering a new set of original research data). Secondary data can also be joined with additional datasets, including datasets from other sources or the researcher's own original research data.
Example: Let’s say that Marty McFly makes a copy of Doc Brown’s original data and performs a new analysis on it. The new analysis reveals that the DeLorean was only able to time-jump at the speed of 88 MPH due to additional variables (including a power input of 1.21 jigowatts). In this case, the dataset is secondary data.
Reuse of Research Data
Another potential point of confusion is that one researcher’s original research data can be another researcher’s secondary data. For instance, in the example above, the same dataset is considered original research data for Doc Brown, but is secondary data for Marty McFly.
Data Services: RDM or Secondary Data?
The phrase “data services” can also be confusing, because it may encompass a variety of services. A potential menu of data services could include:
- Assistance locating and/or accessing datasets.
  - This might pertain to vendor-provided data collections, consortial collections (such as ICPSR), locally produced data (in an institutional repository), or publicly accessible data (such as the U.S. census).
  - Because this service focuses specifically on accessing existing data, it by definition pertains to secondary data.
- Data management plan (DMP) assistance.
  - Typically applies only to original research data.
- Data curation and/or RDM services.
  - These may include education on good RDM practices, assistance depositing data into an institutional repository (IR), assistance with (or full-service) creation of descriptive or other metadata, and more.
  - Typically provided only for original research data. However, if transformative work has been done to a secondary dataset (such as merging it with additional datasets or transforming variables), data curation / RDM may be necessary.
- Assistance with data analysis.
  - This service is more often provided for students than for faculty, but may include both groups.
  - Services may include providing analysis software, software support, methodological support, and/or analytical support.
  - May include support for both original research data and secondary data.
You Say “Data Are,” I Say “Data Is” …Let’s Not Call the Whole Thing Off!
So in the end, why does all this matter? The primary takeaway is to be clear about which type of data you mean, particularly when communicating about services the library will or won't provide. In many cases this will be obvious: for instance, "RDM" contains the term "research data" and is thus clear. Less clear is when a library department decides to provide "assistance with data." What does this mean? What kind of assistance, and for what kind of data? Is the goal of the service to support good management of original research data? Or is it to support finding and analysing secondary data that the library has purchased? Or is it another goal altogether?
Clarity is key both to understanding each other and to communicating emerging services effectively to our researchers.
Starr Hoffman is Head of Planning and Assessment at the University of Nevada, Las Vegas, where she assesses many activities, including the library’s support for and impact on research. Previously she supported data-intensive research as the Journalism and Digital Resources Librarian at Columbia University in New York. Her research interests include the impact of academic libraries on students and faculty, the role of libraries in higher education and models of effective academic leadership. She is the editor of Dynamic Research Support for Academic Libraries. When she’s not researching, she’s taking photographs and travelling the world.