Research Data Curation Part One: E-Science Librarianship

Webinar

About the Webinar

Presenters will discuss the role of the library in the academic research enterprise and provide an overview of new librarian strategies, tools, and technologies developed to support the lifecycle of scholarly production and data curation. Specific challenges that face research libraries will be described and potential responses will be explored, along with a discussion of the types of skills and services that will be required for librarians to effectively curate research output.

Part Two of this webinar, Libraries and Big Data, will be held on Wednesday, September 18.

Event Sessions

Introduction

Speaker

The Evolution of Escience Librarianship in the New England Region and Beyond

Speaker

Elaine Martin

Editor, Journal of eScience Librarianship, University of Massachusetts Medical School

With considerable input from science and health sciences librarians throughout New England, the Lamar Soutter Library, University of Massachusetts Medical School (which serves as the Regional health sciences library under contract from the National Library of Medicine) has responded alone and with its network member libraries to develop a strategic approach to defining escience librarianship. The multifaceted model creates opportunities for offering professional development, job tools and resources, dissemination of information, and best practices for teaching scientific research data management. This presentation will discuss the multifaceted model and how it affects academic librarians and the science and health sciences institutions they serve. Working in partnership, network member libraries are poised to participate in identifying new roles for librarians and creating a reinvention and rejuvenation of science and health sciences librarianship in New England and beyond.

Elaine Russo Martin, MSLS, DA, is director of the Lamar Soutter Library of the University of Massachusetts Medical School. The library serves, under contract with the National Library of Medicine, as the Regional Medical Library for the six New England states, and Dr. Martin also serves as the director of that program. Before joining UMass, Dr. Martin was director of the health sciences library at the University of Illinois at Chicago and served in various other professional positions in medical school libraries in Virginia, Washington, D.C., and Washington state. Dr. Martin received her MSLS from the Catholic University of America, Washington, D.C., and her doctorate in library science administration from Simmons College, Boston, MA.

Dr. Martin's research interests include biomedical informatics; consumer health informatics; assessing the information needs of the public health workforce; evidence-based public health; and organizational development, in particular leadership and teamwork as they apply to medical libraries.

The Digital Research Enterprise: Identifying New Roles for Libraries

Speaker

Chris Shaffer, MS, AHIP

University Librarian and Associate Professor, Oregon Health & Science University Library

Changes in the conduct of research, from the growth of team science to the emergence of big data, have created opportunities for libraries to become involved in the creation and management of research data. Through partnerships with institutional service providers, libraries can develop new services to help meet the information management challenges faced by researchers. Workforce training and expansion is required to help information professionals succeed in the eScience arena.

Chris Shaffer is University Librarian and Associate Professor at Oregon Health & Science University, where he develops partnerships to extend new library services to researchers. Mr. Shaffer was co-site PI for The eagle-i Consortium, led by Harvard University, which is developing a national biomedical research resource discovery network. He established the OHSU Library Ontology Development Group, which is reusing and developing sharable ontologies and creating curation practices and procedures. Mr. Shaffer is an active member of the Medical Library Association and a Distinguished Member of the Academy of Health Information Professionals. He holds a BA in Philosophy from Texas A&M University and an MS in Information Science from the University of North Texas.

Seeking Our Niche: Understanding the Needs of Research Personnel to Develop E-Science Services

Speaker

Megan Sapp Nelson

Associate Professor of Library Sciences, Purdue University

Data may look very different depending upon the discipline in which it is produced. Big data, little data, and every research project in between each represent a different set of needs and expectations. How do libraries come to know and understand the needs of research personnel (both faculty and graduate students) across the widely varied disciplines of one institution? Through the development of new strategies and tools, librarians are developing services that proactively meet researcher needs.

Event Q&A With Our Speakers

Q: How should a librarian target their efforts related to a large-scale project with many distributed partners? E.g., if science teams are getting larger, any one or two individuals will have a smaller view of a full project than in the past.

CS: Possible responses include a train-the-trainer approach or partnering with other information specialists. The key is identifying the people who will most benefit from library services. An analogy can be made to providing services to a large department.

MSN: I would start with trying to understand the data workflow of the project. Where is data collected? What data is being collected? Is the data forwarded to another group for analysis? It may be that there are one or two key people who have the general view of the data lifecycle. If so, they may be able to point to specific data management tasks that are difficult for that project. That said, at Purdue we have multiple large-scale data management IT groups who employ full-time data managers. As such, we are primarily working with “small scale” projects. As I mentioned in the workshop, international collaborations, inter-institutional collaborations, and interdisciplinary research groups present many data management issues, because those teams don’t have designated data managers and may not even have thought through data management if the project pre-dates large grant-issuing bodies’ data management mandates. Understanding the lifecycle and having conversations to identify issues that need to be addressed apply to distributed projects as well as centralized ones.



Q: The presentations have focused on data. What about scientific software? Are the approaches towards supporting data similar or different for scientific software?

CS: The eagle-i project (http://eagle-i.org/) indexes software and algorithms, with an eye toward reuse and sharing of resources. Some libraries provide software via group purchase or dedicated workstations and computer labs.

MSN: I have been working on a research collaboration for the past two years that explored the needs of computer scientists for managing software packages as data sets. We have approached the computer scientists in ways similar to those we use for other data sets, but there are significant differences. The jargon/vocabulary of software management is problematic for information scientists. A good example is traceability. This is the concept of being able to identify all individuals who made changes to the software code and all changes introduced. This is parallel and similar to, but not synonymous with, the concept of provenance as understood by information scientists. A conversation with computer scientists can frequently bring up terms like this, and it is easy to talk past each other.



Another difference is the prevalence of the attitude that technology can be created to fix the data management problem. As information scientists, we consider the human element to have a much more important role in the development of solutions than computer scientists do.

Still another difference is the role that lawyers and law play in the sharing of the software packages. Sharing software is very difficult, searching for software snippets that meet individual project objectives is incredibly time-consuming, and once the final software is done it may be locked down by corporations who paid for the software code to be created. The awareness of usage agreements and confidentiality agreements is also heightened for those creating software.

That said, the data lifecycle does apply to the development of software packages, and many of the same skills that are needed by graduate students in other disciplines apply directly to computer science graduate students. If anything, there is an enhanced recognition of what the laboratories are doing well in teaching students and enhanced awareness of weaknesses as well.

Q: Do you have data set archive criteria?

CS: The standards for each discipline should drive the answer to this query. However, there are emerging criteria that can apply across disciplines. One place to start is the Data Curation Profiles (http://datacurationprofiles.org/).

MSN: This answer is from Courtney Matthews, the Purdue University Research Repository Digital Data Repository Specialist.
“The PURR Digital preservation policy outlines PURR’s preservation commitments: https://purr.purdue.edu/legal/digitalpreservation.

In a nutshell we maintain all published datasets for 10 years. After 10 years / the “end of the initial commitment” datasets are vetted by the relevant subject specialist librarian(s) and a digital archivist.

They decide whether the dataset will be selected or deselected for inclusion in the Library’s permanent collection based on the priority levels outlined in the Digital Preservation Policy:

Priority 1: Data Sets associated with Publications. Rigorous effort will be made to ensure preservation of data sets associated with journal publications or other scholarly publications in perpetuity, or for as long as the data sets meet the Purdue Libraries collection development policies and practices, or is superseded in the future by an acceptable data repository.

Priority 2a: Stand-Alone Data Publications. Every reasonable step will be taken to preserve stand-alone data publications in accordance with best practices and collection development policies.

Priority 2b: Data Sets with High Research/Teaching Value. Every reasonable step will be taken to preserve data sets that are identified by subject specialist librarians or archivists as having high value for meeting the research and teaching needs of Purdue University or within the broader research community.

Priority 3: Other Data Files and Materials. No preservation steps will be taken for ephemeral materials deemed to be of little or no long-term value to the comprehensiveness of the collection. Working files of particular significance to Purdue’s teaching and research needs, or within the greater research community, may be preserved on a select basis as appropriate.”

 

Q: A detail about the civil engineering project: What advice did you give the investigator for the CE project you described?

MSN: We actually made that project a pilot project, targeting display of multiple types of data at once. The researcher's primary goal for the outcome was to be able to share his data in a way that made sense. We created a data portal that displayed the sensor data, the video still frame, and GIS data for the location of the sensors as well. Based upon that initial foray into data management, the researcher has continued to work with us in exploring data management and is now the head of a research program that is piloting the creation of data sets intended for sharing and the publication of those data sets simultaneously with the research reports published by the research group.



Q: How much of this is science librarianship vs e-science librarianship? Is there really a difference anymore?

CS: The answer to this question depends on your definition of e-science. If your definition is limited to large data sets and massive computing (e.g. astronomy data, or human genome projects) then there is a significant difference.

MSN: At Purdue, there is no difference any more. We are liaisons to departments and as part of that workload we include working with researchers to manage data. That said, we have highly trained data research specialists and metadata specialists to support us as we do this work and to supplement the technical knowledge that we are lacking. But from my viewpoint, if you are serving a STEM discipline, working with researchers to manage data is just part of being a good STEM librarian.



Q: A large question - How do you get your foot in the door? Departmental meetings?

CS: The first step is to prove your value and identify campus champions who will advocate for the library. Credibility demands success. Well-executed pilot projects are one possible way to start.

MSN: I have found that departmental meetings don’t work very well. The information provided to the disciplinary researchers is too generic for them to picture how we can help them. The collaborations around data management that I have done are either a result of pre-existing working relationships that have been developed over time, or word of mouth from one researcher to another. It is nearly always surprising to the disciplinary faculty member that there is someone researching data management when they first hear about it, but the disciplinary faculty members also recognize the value to themselves of collaborating with someone who is researching data best practices. They have to hear that information within the context of their own data management needs. I have frequently had a conversation about instruction or information literacy turn into a discussion about data through natural segues in the conversation. Those conversations don’t always pan out, but they do frequently enough to keep me scrambling to get everything done.



Q: What was that Twitter hashtag, one more time? 

MSN: @datainfolit is the Twitter handle for our research group. You can also monitor public announcements via Facebook at www.facebook.com/datainfolit. The Data Information Literacy symposium will be held on September 23rd and 24th and will feature Elaine Martin as the keynote speaker for Day 1.

Additional Information

  • Registration closes at 12:00 pm Eastern on September 11, 2013. Cancellations made by September 4, 2013 will receive a refund, less a $20 cancellation fee. After that date, there are no refunds.
  • Registrants will receive detailed instructions about accessing the webinar via e-mail the Monday prior to the event. (Anyone registering between Monday and the close of registration will receive the message shortly after the registration is received, within normal business hours.) Due to the widespread use of spam blockers, filters, out of office messages, etc., it is your responsibility to contact the NISO office if you do not receive login instructions before the start of the webinar.
  • If you have not received your login instruction e-mail by Tuesday at 10:00 am Eastern, please contact the NISO office or e-mail Jill O'Neill, Educational Programs Manager, at joneill@niso.org for immediate assistance.
  • Registration is per site (access for one computer) and includes access to the online recorded archive of the webinar. You may have as many people as you like from the registrant's organization view the webinar from that one connection. If you need additional connections, you will need to enter a separate registration for each connection needed.
  • If you are registering someone else from your organization, either use that person's e-mail address when registering or contact the NISO office to provide alternate contact information.
  • Library Standards Alliance (LSA) members receive one free webinar connection as part of their membership. You do not need to register for the event for this free connection. Your webinar contact will receive the login instructions the Monday before the event. You may have as many people as you like from the member's library view the webinar from that one connection. If you need additional connections beyond the free one, then you will need to enter a paid registration (at the member rate) for each additional connection required.
  • Webinar presentation slides and Q&A will be posted to the site following the live webinar.
  • Registrants and LSA member webinar contacts will receive an e-mail message containing access information to the archived webinar recording within 48 hours after the event. This recording access is only to be used by the registrant's or member's organization.