Data Interoperability Webinar Q&A
Below are the questions that were submitted during the NISO Data Interoperability webinar. Answers from the presenters will be added shortly. Not all of the questions could be answered during the live webinar, so those that could not be addressed at the time are also included below.
- OCRIS: Online Catalogue and Repository Interoperability Study
Kathleen Menzies, Researcher, Centre for Digital Library Research, University of Strathclyde
Gordon Dunsire, Head, Centre for Digital Library Research
- IDEALS (Illinois Digital Environment for Access to Learning and Scholarship) Repository at the University of Illinois at Urbana-Champaign
Sarah Shreeves, IDEALS Coordinator, Scholarly Commons Coordinator, University Library, University of Illinois at Urbana-Champaign
Feel free to contact us if you have any additional questions about library, publishing, and technical services standards or standards development, or if you have suggestions for new standards, recommended practices, or areas where NISO should be engaged.
From ILS to Repository and Back: Data Interoperability
Webinar Questions & Answers
January 13, 2010
- Have you encountered science datasets and do you see this interaction between the IR and the Library Catalog working for science datasets?
Kathleen Menzies: We have encountered scientific datasets - for example, DSpace at Cambridge holds 170,000 chemistry datasets describing molecular structures which were bulk uploaded from the Chemistry Department's Unilever Centre for Molecular Informatics. As mentioned, the eCrystals repository at Southampton is used to store datasets relating to crystal structures. This repository holds "all the fundamental and derived data resulting from a single crystal X-ray structure determination, but excluding the raw images" (http://ecrystals.chem.soton.ac.uk/).
There is no reason why basic metadata records for such objects could not be harvested from an OAI-compliant IR for use within an LMS or within an aggregator service.
However, because of the complexity of this type of data, I would suggest that there are practical barriers. It is unlikely that an LMS or RDP would ever support highly discipline-specific descriptive or data exchange standards and formats (e.g. Jmol, CIF files). And how would you adequately reflect this granularity, or the inter-relationships that exist between a dataset and a publication, in a simple OPAC record?
As Sarah described, best practice is probably to hold simple DC records for such datasets (derived from the IR record), linking users through to the repository holdings from there.
These issues highlight the need for discipline- or subject-specific repositories which serve the needs of specific communities. The key is making sure that at least some of the metadata stored in these repositories is re-usable and open so that those using the OPAC can discover it too.
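As a rough illustration of the harvesting route described in this answer, the sketch below builds an OAI-PMH ListRecords request for simple Dublin Core (oai_dc) and reduces each harvested record to a short stub that links back to the repository. It is only a sketch under stated assumptions: the repository address, the sample record and the function names (list_records_url, harvest_stub_records) are hypothetical, and the sample response is embedded in the code rather than fetched from a live repository. Only the verb, the metadataPrefix and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, set_spec=None):
    """Build an OAI-PMH ListRecords request asking for simple Dublin Core."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def harvest_stub_records(xml_text):
    """Reduce each harvested record to a minimal stub that links to the IR."""
    root = ET.fromstring(xml_text)
    stubs = []
    for record in root.iterfind(".//oai:record", NS):
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            # Deleted records carry a header but no metadata element.
            continue
        stubs.append({
            "oai_id": record.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "type": dc.findtext("dc:type", default="", namespaces=NS),
            # Keep only identifiers that look like links to the repository.
            "links": [e.text for e in dc.findall("dc:identifier", NS)
                      if e.text and e.text.startswith("http")],
        })
    return stubs


# Hypothetical response from a hypothetical repository, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example crystal structure dataset</dc:title>
          <dc:type>Dataset</dc:type>
          <dc:identifier>http://repository.example.org/handle/1234</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("http://repository.example.org/oai"))
    for stub in harvest_stub_records(SAMPLE_RESPONSE):
        print(stub["title"], stub["links"])
```

The stub deliberately carries only a title, a type and a link. The discipline-specific detail stays in the repository, and the catalogue record simply points users to it.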
- How do you "sell" the benefits of standards to IRs who may feel that traditional library standards are too complex or time consuming?
Kathleen Menzies: Making the point that gathering the information required by funding agencies will be made much easier when standards are in place is a good starting point. If external stakeholders wish to know about the results of funded research, or if there are internal audits being undertaken regarding research outputs, clearly the workflows involved in gathering this data should be made as streamlined and straightforward as possible. This can probably be related to the priorities and imperatives of most institutions.
Gordon Dunsire: The use of standards improves rather than hinders long-term flexibility. Many of the IRs in UK Higher Education institutions are undergoing significant planned development or unplanned drift of scope (i.e. what materials are to be included where) and function (what outputs the institution requires). While the use of standards may appear to be overkill for current scope and function, any significant change is likely to require either retroconversion of existing metadata or development of a local standard. Both of these are expensive and time consuming, and the IR may not be able to respond quickly enough to the needs of the institution. Forward planning in these times of economic and technical flux is difficult for institutions and IR managers, and the adoption of established standards is really the best that can be done.
- What description scheme do you use for your datasets? Are there best practices for cataloging datasets available somewhere? Thank you.
Gordon Dunsire: The JISC-funded CLADDIER project may have some relevant information and is worth looking at: http://claddier.badc.ac.uk/trac
- (Mostly for Kathleen): Why is an RDP (resource discovery platform) only a partial solution? Is it because of metadata disparities?
Kathleen Menzies: Yes, exactly. RDPs can only be as good as the quality of the data they handle, and they may 'accidentally' expose the flaws or gaps in records. They cannot overcome problems with, for example, user-generated metadata on an IR record.
It is also worth considering whether the UIs (User Interfaces) of RDPs are suited to all types of user. For example, 'Web 2' features might not always help experienced users (teaching staff, PhD students, high-level researchers etc.). Features such as 'AutoSuggest', tag clouds and user reviews are probably more helpful to undergraduates who are at home with Google and Amazon. These users probably don't need to "drill down" into the catalogue in as much depth.
However, Resource Discovery Platforms are definitely a very positive development in terms of offering 'seamless' access to both IR and LMS records.
- Do you link records in the IR together somehow, e.g., the paper and the dataset it's based on?
Kathleen Menzies: The eBank UK project (searching and sharing dataset and publication metadata) is worth reading: http://www.ukoln.ac.uk/projects/ebank-uk/.
- Sorry, I think I missed that - OCLC is fetching those links in batch? Or you are uploading to OCLC?
- For Sarah: If a user submits an item to the IR with minimal metadata, do library staff review the submission and add metadata?
- Are you exploring the ORCID project for researcher IDs?
- Do you link records in the IR together somehow, e.g., the paper and the dataset it's based on, or the dataset and a questionnaire?
- Do any of the presenters have experience with cataloging Resource Description Maps? It seems that this would not only be more efficient but more useful information to have in a discovery tool.
Kathleen Menzies: We didn't encounter Resource Description Maps during the project and are not familiar with cataloguing these.