Letter from the Editor
If I were to sum up the topic that comes up time and time again, not only in the articles in this issue, it would be the necessity of standards to enable digital curation. It doesn’t matter what type of data is being curated: metadata about research projects, publications and grey literature; the methodologies and results of laboratory work; or the measurements from long-term observational missions. One thing is certain: the rate at which data are being created is increasing so dramatically that the only way to manage curation is to automate it, and the only way to do that is to have standardized structures and ontologies.
Digital curation is an oft-neglected part of the research process, as most researchers lack the time, the funding, and possibly even the inclination to deal with it. Still, if the value of the data is to be fully realized (and why would any researcher bother collecting data they did not think were valuable?), they need to be properly curated, enabling interdisciplinary and international collaboration as well as reuse and repurposing.
Bird et al. open the issue with a discussion of data curation issues and potential solutions in the chemical sciences, acknowledging that chemists and other scientists have gained much from the developments of the digital era. Unfortunately, not all chemists have taken full advantage of the tools and services available for improving the curation and preservation of their data. The rise in prominence of data citation and the advent of DataCite, which assigns Digital Object Identifiers (DOIs) to data, have brought the essentials of curation further into researchers’ awareness: compliance with DataCite requirements means that institutions minting their own DOIs are contracted to preserve the data and to provide a core kernel of essential, standardized metadata necessary to discover and access them.
Conway et al. then take us on a tour of the different types of environmental datasets and their importance for society as a whole. Good curation practices are essential here, as these datasets are non-reproducible (without the aid of a time machine); curation is needed to obtain a return on investment, share costs and avoid duplication of effort, promote innovation through interoperability, and improve social and environmental responsibility. But for effective curation to happen, standards are required, not only to ease the process of curation but also to ensure that the datasets will remain accessible and understandable into the future.
It’s not just data that are digital and need curation. Our final two papers take us into the realm of articles, publications, and reports and the associated issues with curating these objects.
Schirrwagen et al. discuss the OpenAIRE Scholarly Communication Infrastructure, which provides access to the research output of European funded projects and to open access content from a network of institutional and disciplinary repositories. The OpenAIRE infrastructure allows researchers’ published results to be exposed in their full context, accompanied by all the related information that lets users discover and understand the research processes, activities, and outputs that have taken place. This information covers such diverse topics as program funding, associated datasets, related publications, citations, institutional affiliation, and a range of different metrics indicating scientific impact, which may be amalgamated with the publication as a so-called “enhanced publication.” This aggregation and enrichment of “data” in an infrastructure such as OpenAIRE involves specific data curation activities over and above those already performed by the originating repository or data archive of the article or dataset.
Moore and Evans describe the use of PDF as the common format for the deposition of archaeological reports in the Archaeology Data Service (ADS) and the opportunities and issues posed by PDF/A. Reports from archaeological events are often enhanced publications in their own right, including a full-text description, raster images (color and grayscale), tabular data imported from software such as Access or Excel, and vector data imported from CAD or GIS programs. PDF/A as a standard has provided a technically sophisticated, open, and self-contained archiving solution for the preservation of PDFs, though there are concerns about its inadequate metadata requirements for embedded content.
I’ll leave you with one final quote from Bird et al.:
“To harness and reuse the efforts of others, researchers must acknowledge the necessity and value of curation; they must find innovative ways to overcome the burden of curation.”
Common standards are essential to build the tools and services to overcome that burden of curation. Let’s keep the conversation going!
Sarah Callaghan | Research Scientist and Project Manager, British Atmospheric Data Centre