Archive for the ‘research data’ Category

NISO response to the National Science Board on Data Policies

Wednesday, January 18th, 2012

Earlier this month, the National Science Board (NSB) announced it was seeking comments from the public on the report from the Committee on Strategy and Budget Task Force on Data Policies, Digital Research Data Sharing and Management.  That report was distributed last December.

NISO has prepared a response on behalf of the standards development community, which was submitted today.  Here are some excerpts of that response:

The National Science Board’s Task Force on Data Policies comes at a watershed moment in the development of an infrastructure for data-intensive science based on sharing and interoperability. The NISO community applauds this effort and the focused attention on the key issues related to a robust and interoperable data environment.

….

NISO has particular interest in Key Challenge #4: The reproducibility of scientific findings requires that digital research data be searchable and accessible through documented protocols or method. Beyond its historical involvement in these issues, NISO is actively engaged in forward-looking projects related to data sharing and data citation. NISO, in partnership with the National Federation of Advanced Information Services (NFAIS), is nearing completion of a best practice for how publishers should manage supplemental materials that are associated with the journal articles they publish. With a funding award from the Alfred P. Sloan Foundation and in partnership with the Open Archives Initiative, NISO began work on ResourceSync, a web protocol to ensure large-scale data repositories can be replicated and maintained in real-time. We’ve also had conversations with the DataCite group for formal standardization of their IsCitedBy specification. [Todd Carpenter serves] as a member of the ICSTI/CODATA task force working on best practices for data citation and NISO is looking forward to promoting and formalizing any recommendations and best practices that derive from that work.

….

We strongly urge that any further development of data-related best practices and standards take place in neutral forums that engage all relevant stakeholder communities, such as the one that NISO provides for consensus development. As noted in Appendix F of the report, Summary Notes on Expert Panel Discussion on Data Policies, standards for descriptive and structural metadata and persistent identifiers for all people and entities in the data exchange process are critical components of an interoperable data environment. We cannot agree more with this statement from the report of the meeting: “Funding agencies should work with stakeholders and research communities to support the establishment of standards that enable sharing and interoperability internationally.”

There is great potential for NSF to expand its leadership role in fostering well-managed use of data. This would include not only support of the repository community, but also the promulgation of community standards. In partnership with NISO and using the consensus development process, NSF could support the creation of new standards and best practices. More importantly, NSF could, through its funding role, advocate for, and even require, the use of these broad community standards and best practices by researchers in the dissemination of their research. We note that there are more than a dozen references to standards in the Digital Research Data Sharing and Management report, so we are sure that this point is not falling on unreceptive ears.

The engagement of all relevant stakeholders in the establishment of data sharing and management practices as described in Recommendation #1 is critical in today’s environment—at both the national and international levels. While the promotion of individual communities of practice is a laudable one, it does present problems and issues when it comes to systems interoperability. A robust system of data exchange by default must be one grounded on a core set of interoperable data. More often than not, computational systems will need to act with a minimum of human intervention to be truly successful. This approach will not require a single schema or metadata system for all data, which is of course impossible and unworkable. However, a focus on and inclusion of core data elements and common base-level data standards is critical. For example, geo-location, bibliographic information, identifiers and discoverability data are all things that could be easily standardized and concentrated on to foster interoperability. Domain-specific information can be layered over this base of common and consistent data in a way that maintains domain specificity without sacrificing interoperability.
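The layering described above can be sketched in a few lines of Python. This is only an illustration and not part of the NISO response: the class names, the field names, the `example:` identifiers and the `index_entry` helper are all assumptions invented for the sketch, not an existing standard or schema. It shows a small common core (identifier, bibliographic details, discoverability keywords, geo-location) with domain-specific data kept in a separate layer.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class CoreRecord:
    """Hypothetical base-level elements shared by every dataset description."""
    identifier: str                       # persistent identifier (made-up values below)
    title: str                            # bibliographic information
    creators: List[str]
    keywords: List[str] = field(default_factory=list)    # discoverability data
    geo_location: Optional[Tuple[float, float]] = None   # (latitude, longitude)


@dataclass
class DatasetDescription:
    core: CoreRecord
    # Domain-specific layers, keyed by a domain name chosen by each community.
    extensions: Dict[str, Dict[str, Any]] = field(default_factory=dict)


def index_entry(desc: DatasetDescription) -> Dict[str, Any]:
    """Build a search-index entry from the core layer only.

    A generic system can process any dataset this way without
    understanding the domain-specific extensions.
    """
    c = desc.core
    return {
        "id": c.identifier,
        "title": c.title,
        "creators": list(c.creators),
        "keywords": sorted(k.lower() for k in c.keywords),
        "has_location": c.geo_location is not None,
    }


# Two datasets from unrelated domains share the same core layer.
ocean = DatasetDescription(
    core=CoreRecord(
        identifier="example:0001",
        title="Salinity profiles",
        creators=["A. Researcher"],
        keywords=["Salinity", "CTD"],
        geo_location=(36.8, -122.0),
    ),
    extensions={"oceanography": {"depth_m": 1200, "instrument": "CTD"}},
)
survey = DatasetDescription(
    core=CoreRecord(
        identifier="example:0002",
        title="Household survey",
        creators=["B. Researcher"],
        keywords=["census"],
    ),
    extensions={"social_science": {"sample_size": 5000}},
)

entries = [index_entry(d) for d in (ocean, survey)]
```

Both entries have the same shape even though the datasets come from different fields, while the `extensions` dictionaries keep each domain's own detail intact for specialist tools.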

One of the key problems that the NSB and the NSF should work to avoid is the proliferation of standards for the exchange of information. This is often the butt of standards jokes, but in reality it does create significant problems. It is commonplace for communities of interest to review the landscape of existing standards and determine that existing standards do not meet their exact needs. That community then proceeds to duplicate seventy to eighty percent of existing work to create a specification that is custom-tailored to their specific needs, but which is not necessarily compatible with existing standards. In this way, standards proliferate and complicate interoperability. The NSB is uniquely positioned to help avoid this unnecessary and complicating tendency. Through its funding role, the NSB should promote the application, use and, if necessary, extension of existing standards. It should aggressively work to avoid the creation of new standards, when relevant standards already exist.

The sharing of data on a massive scale is a relatively new activity and we should be cautious in declaring fixed standards at this stage. It is conceivable that standards may not exist to address some of the issues in data sharing or that it may be too early in the lifecycle for standards to be promulgated in the community. In that case, lower-level consensus forms, such as consensus-developed best practices or white papers, could advance the state of the art without inhibiting the advancement of new services, activities or trends. The NSB should promote these forms of activity as well, when standards development is not yet an appropriate path.

We hope that this response is well received by the NSB in the formulation of its data policies. There is terrific potential in creating an interoperable data environment, but that system will need to be based on standards and rely on best practices within the community to be fully functional. The scientific community, in partnership with the library, publisher and systems provider communities, can help to create this important infrastructure. Its potential can only be helped by consensus agreement on base-level technologies. If development continues on a domain-centered path, the goal of interoperability and delivering on its potential will only be delayed and quite possibly harmed.

The full text PDF of the entire response is available here.  Comments from the public related to this document are welcome.

EU Research Data Preservation Project Seeks Survey Input from Publishers

Tuesday, November 11th, 2008

PARSE.Insight, a European Union project initiated in March 2008 “to highlight the longevity and vulnerability of digital research data,” is conducting an online survey about access and storage of research data.

PARSE.Insight is “concerned with the preservation of digital information in science, from primary data through analysis to the final publications resulting from the research. The problem is how to safeguard this valuable digital material over time, to ensure that it is accessible, usable and understandable in future.”

They are interested in getting publishers’ views included in their survey, in addition to researchers, since publishers play a critical role in the digital preservation of publications and related research data.

The survey is available here:
https://www.surveymonkey.com/s.aspx?sm=VfIpOoxogOv73uWOyaOhoQ_3d_3d

Responses are aggregated for analysis and made anonymous. If you wish to be informed about the results of the survey, you can enter your e-mail address at the end of the survey.

Ultimately, PARSE.Insight plans “to develop a roadmap and recommendations for developing the e-infrastructure in order to maintain the long-term accessibility and usability of scientific digital information in Europe.”

Posted by Cynthia Hodgson

NISO brings together Data Thought Leaders

Friday, October 3rd, 2008

We held the last of the Mellon-funded Thought Leader Meeting series Wednesday.  The meeting focused on research data and explored many of the issues surrounding the use, reuse, preservation, and citation of data in scholarship.  Like the three previous meetings, it was a great success.  The meeting brought together a number of representatives from the research, publisher, library and system developer communities.  A list of the representatives is below.

Research data is becoming increasingly critical in almost every area of scholarship.  From census data to high-energy physics, and medical records to the humanities, the range of types of data and the uses to which researchers put this data have expanded dramatically in the past decade.  Managing, finding, accessing and curating this data is a growing problem.  A report produced by IDC earlier this year concluded that the amount of digital data created exceeded the total available storage capacity in the world.  Determining which aspects are most valuable and adding value through curation will be a tremendous project in the coming decades.

In order to be useful (in a scientific sense), data needs to be verifiable, identifiable, reference-able and preservable, much in the way that published materials are.  Obviously, this poses many questions:  When referring to a data set that is constantly being updated or appended, what would you be citing?  What if the results are modeled from a subset?  In that case, the data set as a whole isn’t as relevant to the citation as which portion of the larger set was used, as well as the model and criteria that were used in the analysis.  Additionally, the models and software that are used on a specific data set would be critical to determining the validity of any results or conclusions drawn from the data.  In the peer-review process of science, each of these aspects would need to be considered.  Some publishers are already considering these issues and review criteria. In the future, these issues will only grow for publishers, societies and scientists as they consider the output of science.

Another issue is the variety of life cycles for different types of data.  In fields such as chemistry, a dataset has a much shorter half-life of usefulness than it might have in the humanities or social sciences.  This could affect the value proposition of whether to curate a dataset.  Some work done by the JISC had been focused on mandating deposit of materials for the purpose of preservation. Unfortunately, the project didn’t succeed and was withdrawn in 2007. One potential reason that an investment of more than $3 million turned out to be a disappointment was its focus on archiving and preserving the deposited data rather than on reusing and applying it. In order for the preservation to be deemed worth the investment, a simultaneous focus on the reuse of the data is critical to ensuring that the investment sees some form of return beyond a large repository of never-accessed data.

While there was some discussion during the day that related to encouraging use and sharing of research data and methodologies, technical standards will not help with what is inherently a political question.  Many of the rewards and recognition in the scholarly process come back to the formalities of publication, which have developed over centuries.  As with many standards-related questions, the problems are not normally related to technologies per se, but often hinge on the political or social conventions that support certain activities.  That said, the development of citation structures, descriptive metadata conventions, discovery methodologies, and curation strategies will add to the growing trend of utilizing these data forms in scholarly communications.  By expanding their use and ensuring that the content is preserved and citable, NISO could help encourage expanded use of data in the communication process.

The report of this meeting will be publicly available in a few weeks on the NISO website along with the other reports.  NISO’s leadership committee structure will be reviewing the recommendations and deciding which initiatives to push forward with in the coming months. 

Research Data Thought Leader Participants:

Clifford Lynch, Coalition for Networked Information 

Ellen Kraffmiller, Dataverse Network 

Paul Uhlir, National Academy of Sciences

Lars Bromley, AAAS 

Robert Tansley, Google 

Jean Claude Bradley, Drexel University

Camelia Csora, 2collab, Elsevier  

MacKenzie Smith, MIT Libraries – DSpace

Stuart Weibel, OCLC