NISO Forum:
Tracking it Back to the Source: Managing and Citing Research Data

September 24, 2012
8:00 a.m. Registration Desk Opens
8:00 - 9:00 a.m. Continental Breakfast

9:00 - 9:30 a.m.

Introduction
Todd Carpenter, Executive Director, NISO

9:30 - 10:30 a.m.

Opening Keynote: The Many and the One: BCE themes in 21st century data curation
Allen Renear, Professor and Interim Dean, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign

Two scientists can be using "the same data" even though the computer files involved appear to be quite different. This is familiar enough, and for the most part, in small communities with shared practices and familiar datasets, raises few problems. But these informal understandings do not scale to 21st century data curation. To get full value from cyberinfrastructure we must support huge quantities of heterogeneous data developed by diverse communities and used by diverse communities -- often with widely varying methods, tools, and purposes. To accomplish this our informal practices and understandings must be replaced, or at least supplemented, by a shared framework of standard terminology for describing complex cascades of representational levels and relationships. Fundamental problems in data curation -- and in particular problems involving provenance, identifiers, and data citation -- cannot be fully resolved without such a framework. Although the deepest problems here have ancient origins, useful practical measures are now within reach. Some recent work toward this end that is being carried out at the Center for Informatics Research in Science and Scholarship (CIRSS) at the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign will be described.

10:30 - 10:45 a.m.

Break


10:45 a.m. - 12:00 noon

EZID: Easy dataset identification & management
Joan Starr, Manager, Strategic and Project Planning and EZID Service Manager, California Digital Library

Data and data curation are assuming a growing role in today’s research library. New approaches are needed both to address the resulting challenges and to take advantage of the emerging opportunities. Long-term identifiers represent one such tool. In this presentation, Joan Starr will introduce identifiers and an application designed to make them easy to create and manage: EZID. She will provide a closer look at two identifier types, DOIs and ARKs, and discuss what bringing an identifier service to your institution might mean.

DataCite and Campus Data Services
Paul Bracke, Associate Dean for Digital Programs and Information Services, Purdue University

Research libraries are increasingly interested in developing data services for their campuses. There are many perspectives, however, on how to develop services that are responsive to the many needs of scientists, sensitive to the concerns of scientists who are not always accustomed to sharing their data, and attractive to campus administrators. This presentation will discuss the development of campus-based data services programs, the centrality of data citation to these efforts, and the ways in which engagement with DataCite can enhance local programs.

12:00 noon - 12:45 p.m.

Data Equivalence
Mark Parsons, Lead Project Manager, Senior Associate Scientist, National Snow and Ice Data Center

Data citation, especially using persistent identifiers like Digital Object Identifiers (DOIs), is an increasingly accepted scientific practice. Recently, several respected organizations have developed guidelines for data citation. The different guidelines are largely congruent in that they agree on the basic practice and elements of data citation, especially for relatively static, whole data collections. There is less agreement on the subtler aspects of data citation that are sometimes necessary to ensure precise reference and scientific reproducibility -- the core purpose of data citation. We need to be sure that if you follow a data reference you get to the precise data that were used or at least their scientific equivalent. Identifiers such as DOIs are necessary but not sufficient for the precise, detailed references required. This talk discusses issues around data set versioning, micro-citation, and scientific equivalence. I propose some interim solutions and suggest research strategies for the future.

12:45 - 1:45 p.m. Lunch

1:45 - 2:30 p.m.

ResourceSync: Web-Based Resource Synchronization. Also for Data.
Herbert Van de Sompel, Digital Library Researcher, Los Alamos National Laboratory, and Co-chair of NISO’s ResourceSync Working Group

Web applications frequently leverage resources made available by remote Web servers. As resources are created, updated, or deleted, these applications face challenges to remain in lockstep with the server’s change dynamics. Several approaches exist to help meet this challenge for use cases where “good enough” synchronization is acceptable. But when strict resource coverage or low synchronization latency is required, commonly accepted Web-based solutions remain elusive. Motivated by the need to synchronize resources for applications in the realm of cultural heritage and research communication, the National Information Standards Organization (NISO) and the Open Archives Initiative (OAI) have launched the ResourceSync project, which aims to design an approach for resource synchronization that is aligned with the Web architecture and that has a fair chance of adoption by different communities. The presentation will discuss some motivating use cases and will provide a perspective on the resource synchronization problem that results from ResourceSync project discussions. It will provide an overview of the ongoing thinking regarding an approach to address the challenges and will pay special attention to aspects that are relevant for the synchronization of data.

2:30 - 3:15 p.m.

Scientific discovery and innovation in an era of data-intensive science
William (Bill) Michener, Professor and Director of e-Science Initiatives for University Libraries, University of New Mexico; DataONE Principal Investigator

The scope and nature of biological, environmental and earth sciences research are evolving rapidly in response to environmental challenges such as global climate change, invasive species and emergent diseases. Scientific studies are increasingly focusing on long-term, broad-scale, and complex questions that require massive amounts of diverse data collected by remote sensing platforms and embedded environmental sensor networks; collaborative, interdisciplinary science teams; and new tools that promote scientific data preservation, discovery, and innovation. This talk describes the challenges facing scientists as they transition into this new era of data-intensive science, presents current solutions, and lays out a roadmap to a future where new information technologies significantly increase the pace of scientific discovery and innovation.

3:15 - 3:30 p.m.

Break

3:30 - 4:15 p.m.

Needs for Data Management & Citation Throughout the Information Lifecycle
Micah Altman, Director of Research and Head/Scientist, Program on Information Science for the MIT Libraries, Massachusetts Institute of Technology

This session will examine data management and data citation from an information lifecycle approach. The session will discuss the implications for data management of analyzing the needs, rights, and responsibilities of researchers and other stakeholders at each lifecycle stage. It will also discuss data citation and other related mechanisms that are useful in linking services and aligning incentives across lifecycle stages and among stakeholders.

4:15 - 4:45 p.m.

"Ask Anything" Session
Bring your questions, comments, and ideas to share with the entire group.
Moderator: Todd Carpenter, Executive Director, NISO

4:45 - 5:00 p.m. Forum Wrap-up
Todd Carpenter, Executive Director, NISO