A NISO/DCMI Joint Webinar Series

Taking Library Data From Here to There

Below are the questions that were submitted during the February 22, 2012 NISO/DCMI joint webinar. Answers from the presenters will be added as they become available. Not all of the questions could be answered during the live webinar, so those that were not addressed at the time are also included below.

Speakers:

  • Karen Coyle, Digital Libraries Consultant; member, W3C Incubator Group on Library Linked Data
  • Thomas Baker, Chief Information Officer of the Dublin Core Metadata Initiative; co-chair, W3C Incubator Group on Library Linked Data

Feel free to contact us if you have any additional questions about library, publishing, and technical services standards, standards development, or if you have suggestions for new standards, recommended practices, or areas where NISO should be engaged.

NISO/DCMI Webinar Questions and Answers

1. I love this conceptually. But, *practically* how do small and mid-sized institutions implement this? Especially when much "information" is managed in CMSs to which we may not have back-end access? Advice?

Karen Coyle: This is a bit like asking how small and medium-sized institutions implement MARC -- most of us will "implement" by participating in normal activities, like cataloging and being part of consortia. All of us need to understand it, but the technology is not something that most of us will build.

2. Can you speak to the use of Application Profiles in all of this?

Karen Coyle: Application Profiles fit in well with data that uses statements rather than records. An application profile is someone's definition of their data choices, something like a workflow in the RDA Toolkit. With an AP, however, you can not only select the elements you want to include but also add elements that you need that are not part of the base set. This means that we no longer have to develop cataloging rules and data elements that are all things to all people. Specialist communities can share a basic set of bibliographic elements but are also free to add detail or new elements where they need them, without losing compatibility with the greater community.
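
As a rough sketch of the idea (not an official DCMI profile format): a profile mixes properties selected from a shared base vocabulary with locally defined ones. The example below uses Python with the rdflib library; the ex: namespace and its property are hypothetical, and the identifiers are illustrative.

    # A minimal sketch of the application-profile idea, using Python and rdflib.
    # The "ex:" namespace and its property are hypothetical local extensions.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS

    EX = Namespace("http://example.org/local-terms#")  # hypothetical local vocabulary

    g = Graph()
    book = URIRef("http://example.org/book/1")

    # Elements selected from a shared base set (Dublin Core)...
    g.add((book, DCTERMS.title, Literal("Hamlet")))
    g.add((book, DCTERMS.creator, URIRef("http://viaf.org/viaf/96994048")))

    # ...plus a locally needed element that the base set lacks.
    g.add((book, EX.bindingCondition, Literal("spine repaired 2010")))

    print(g.serialize(format="turtle"))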

3. How about vendors, such as Serials Solutions? Are vendors buying into this same concept?

Karen Coyle: Vendors are watching this closely -- along with the potential move to RDA and to FRBR. One thing I've been talking to folks about is the possibility that our initial steps into linked data could be simply an addition to our displays. We could expose only a few key data elements for the purposes of experimentation without having to re-write our underlying system structures. That would give us time for our systems to evolve.

4. If I understand correctly, identifiers are data (as opposed to text), and from your illustration of "dog" it seems that IDs can be translated to text, something human-readable. Is that right? Can data fulfill the needs of both machine processing and human consumption?

Karen Coyle: That's exactly right. Identifiers are for machines, and if you are working on a system that shows you raw identifiers, then something is very wrong. Humans should only see the text displays that are intended for them, even during cataloging.
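
As a sketch of what "identifiers for machines, labels for humans" looks like in practice: many linked-data identifiers dereference to RDF that includes a human-readable label, which an application looks up and displays. The Python/rdflib example below assumes the identifier resolves to RDF carrying an skos:prefLabel or rdfs:label; the URI is shown only as an illustration.

    # Sketch: dereference an identifier, then show a human-readable label.
    # Assumes the URI resolves to RDF carrying skos:prefLabel or rdfs:label.
    from rdflib import Graph, URIRef
    from rdflib.namespace import RDFS, SKOS

    uri = "http://id.loc.gov/authorities/subjects/sh85038796"  # an LCSH identifier (illustrative)
    concept = URIRef(uri)

    g = Graph()
    g.parse(uri)  # fetches RDF by content negotiation, where the server supports it

    label = g.value(concept, SKOS.prefLabel) or g.value(concept, RDFS.label)
    print(label)  # e.g., "Dogs" -- the person never needs to see the raw identifier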

5. It seems that "what properties to use" is still an open question with linked data that's occurring outside of libraries... and there's a lot of work to be done relating those vocabularies. Should we be involved in that work, or just wait to see what other data providers do?

Karen Coyle: I definitely want to see libraries involved in those discussions. Already libraries have contributed to standards like SKOS, and I have no doubt that we have a lot to offer because we know how complex metadata can be. Believe me, folks outside of libraries are very interested in our data, and interested in our ideas.

6. What are the discovery services?

Karen Coyle: I assume you are asking about services that you can use to discover linked data. Nearly all linked data is searchable through a standard called SPARQL (pronounced "sparkle"), but SPARQL only provides an SQL-like query capability, and it's not easy to use. There are a few search engines, although none nearly as sophisticated as Google. You might try the Semantic Web Search Engine (SWSE) at DERI (Ireland).
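
For a flavor of what SPARQL looks like, here is a minimal query run with Python and rdflib over a small local graph; a public SPARQL endpoint would accept the same query pattern. The data and URIs are invented for the example.

    # Sketch: a minimal SPARQL query with rdflib (data and URIs are made up).
    from rdflib import Graph

    g = Graph()
    g.parse(data="""
        @prefix dcterms: <http://purl.org/dc/terms/> .
        <http://example.org/book/1> dcterms:title "Hamlet" ;
            dcterms:creator <http://example.org/person/shakespeare> .
    """, format="turtle")

    results = g.query("""
        PREFIX dcterms: <http://purl.org/dc/terms/>
        SELECT ?book ?title
        WHERE { ?book dcterms:title ?title . }
    """)
    for book, title in results:
        print(book, title)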

7. How does XML fit into this scenario?

Karen Coyle: XML is one possible "wrapper" that can be used around data, and it is often used when transmitting linked data. You can see examples of linked data in RDF at the LC Authorities site and at VIAF. On the page for an individual entry you will see links to the RDF/XML. I also have some examples under "Examples" on my page of links.
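
To make the "wrapper" point concrete: the same statements can be written in RDF/XML or in another serialization such as Turtle, and tools convert freely between them. A sketch using Python and rdflib, with an invented record:

    # Sketch: the same statement serialized two ways; XML is just one wrapper.
    from rdflib import Graph

    rdf_xml = """<?xml version="1.0"?>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dcterms="http://purl.org/dc/terms/">
      <rdf:Description rdf:about="http://example.org/book/1">
        <dcterms:title>Hamlet</dcterms:title>
      </rdf:Description>
    </rdf:RDF>"""

    g = Graph()
    g.parse(data=rdf_xml, format="xml")
    print(g.serialize(format="turtle"))  # same data, different wrapper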

8. Can you address the issue of searching the massive amount of data that this would produce? Is technology keeping up with the volume for the researcher?

Karen Coyle: The mechanics of searching this data are approximately the same as the mechanics of searching the web. To some extent, the web itself can be employed for searches, in the same way that you can create a web site today that pulls in data from many different sources. Efficient searching and value-added information retrieval will probably require some big data engines. Google has already purchased one of the few linked-data sites (Freebase), and I wouldn't be surprised if Google intends to become one of the search engines of the semantic web.

9. Question for Karen: Will there ever be a definitive arrival at A destination? Or will library catalogues always be evolving to keep up with or work in conjunction with the semantic web?

Karen Coyle: I can't imagine a technology that is not in constant evolution. The question is not really about whether library technology will change, but how fast it will have to change, and to what extent we can take advantage of the whole technology environment that surrounds us. I see a great advantage to becoming part of the mainstream technology, something that we have not been in the past. With our current data structures we could not make use of software and systems that were on the general market because no one else was using the MARC format. Moving to a situation where we are using the common technology should mean that we can ride on the coattails of technology developments and take advantage of technologies invented by and for the world at large.

10. If we move to use and maintenance of linked data, what changes do you envision in our economic models?

Karen Coyle: If you mean the economic models of cataloging, we will be able to continue widespread sharing of the cataloging burden among libraries. But my goal is not just that we continue to catalog efficiently, but that we can afford to provide more value-added services, ones that we simply would not be able to afford today.

11. What is the hash tag?

Karen Coyle: If you are referring to the use of the hash URI in "cool URIs," then I refer you to the W3C documentation, "Cool URIs for the Semantic Web." I don't know of a sufficiently short explanation, but if someone finds one, please let me know.
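
One short way to see the idea, offered here as a sketch: in a hash URI such as http://example.org/about#alice, a client only ever requests the part before the "#", so the fragment can safely name a thing (a person, a concept) while the document that is actually fetched describes it. In Python:

    # Sketch: the fragment of a hash URI is never sent to the server,
    # so it can identify a thing distinct from the document describing it.
    from urllib.parse import urldefrag

    thing = "http://example.org/about#alice"  # hypothetical hash URI
    document, fragment = urldefrag(thing)
    print(document)   # http://example.org/about  <- what the client actually fetches
    print(fragment)   # alice                     <- names the thing within that document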

12. Are there any software programs that help libraries convert to linked data?

Karen Coyle: Yes. There are a number of people who have created linked data from library data, and many of them make their code available. The best place to see which libraries are working in this area is to look at the list on the Data Hub. From there you will find a link to the library or the library's project, and that's where any technical details will be.

13. What are your thoughts on microdata?

Karen Coyle: I think that microdata has potential, although I suspect that it's a transitional technology. The microdata from schema.org, in its initial incarnation, was not using identifiers, and therefore was really only suitable for the kind of keyword searching that the big search engines are interested in. There are enough people now encouraging the use of identifiers and more "data-like" data that I think it will evolve in that direction, and will therefore be more precise and more useful to people outside of the search engine applications.
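
To show the difference that identifiers make in microdata: schema.org markup can carry a global identifier in the itemid attribute, which turns page decoration into linkable data. A sketch in Python, with invented markup and identifiers (a real application would use a microdata parser rather than a regular expression):

    # Sketch: schema.org microdata with an "itemid" carrying a global identifier.
    # The HTML and identifiers below are invented for illustration.
    import re

    html = """
    <div itemscope itemtype="http://schema.org/Book"
         itemid="http://example.org/book/1">
      <span itemprop="name">Hamlet</span>
    </div>
    """

    # The identifier is right there in the markup, ready to be linked against.
    match = re.search(r'itemid="([^"]+)"', html)
    print(match.group(1))  # http://example.org/book/1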

14. Do you have any examples of proof of concept projects for using linked library data?

Karen Coyle: There are two public databases, Freebase and the Open Library, that are based on linked data concepts and have loaded large amounts of bibliographic data from libraries. In addition, Talis, a UK library systems company, has sold its library division to Capita, and that division has created a library system based on linked data. I haven't ever seen it in action, though.

15. How do we combat the notion that full text indexing (ala Google) will solve our problems, and the related notion that we no longer need structured bibliographic data?

Karen Coyle: I think by showing the value of our subject access. To do that, however, we need to modernize our use of the rich relationships that are in the subject headings and classifications. We need to make them navigable in interesting ways.
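
As one concrete example of "navigable in interesting ways": subject vocabularies published in SKOS expose broader and narrower relationships as data, so an interface can walk the hierarchy. A sketch in Python with rdflib, over a small invented vocabulary:

    # Sketch: navigating broader/narrower subject relationships modeled in SKOS.
    # The small vocabulary below is invented for illustration.
    from rdflib import Graph, URIRef
    from rdflib.namespace import SKOS

    g = Graph()
    g.parse(data="""
        @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
        <http://example.org/concept/dogs> skos:prefLabel "Dogs" ;
            skos:broader <http://example.org/concept/mammals> ;
            skos:narrower <http://example.org/concept/working-dogs> .
    """, format="turtle")

    dogs = URIRef("http://example.org/concept/dogs")
    for broader in g.objects(dogs, SKOS.broader):
        print("up to:", broader)      # an interface could offer this as navigation
    for narrower in g.objects(dogs, SKOS.narrower):
        print("down to:", narrower)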

16. Do you see any particular challenges with archival metadata as linked open data?

Karen Coyle: The main challenge that I see is that archival description can be rather "terse" in terms of facts and potential links, because it is often the case that not a great deal is known about the materials. But I also see a tremendous potential for archival materials because they become so much richer when viewed in context. By linking archival materials to information about places and times, it will be easier for people to see the archive's place in history. I think we could see more libraries able to create something like "American Memory" using their own materials.

17. Where are the vendors in this discussion?

Karen Coyle: See the answer to question 3 above.

18. I'm wondering about your example of statements versus records and W. Shakespeare -- the 'graph' you illustrate links the 'properties' directly to the 'resource'...would it not make more sense to link these to the 'data record ABOUT the object'?

Karen Coyle: This is a big philosophical question in the linked data community. The preference is to treat the things you are "talking about" in metadata as equivalent to the things in the real world. Thus, Shakespeare in a bibliographic record is the person we think of as Shakespeare, not a surrogate for that person; and we describe the book rather than create a surrogate data record "graph" for the book. I'd say that this is an area that is going to be won or lost in the actual practice of linked data. Essentially we'll end up doing what works, regardless of the philosophical arguments.
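
In triple terms, the preference described above looks like the sketch below: the creator property points straight at the URI standing for Shakespeare the person, not at a record about him. Python with rdflib; the identifiers are illustrative.

    # Sketch: properties link the resource directly to the real-world entity.
    # The identifiers below are illustrative.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DCTERMS

    g = Graph()
    book = URIRef("http://example.org/book/hamlet-first-folio")
    shakespeare = URIRef("http://viaf.org/viaf/96994048")  # stands for the person himself

    g.add((book, DCTERMS.title, Literal("Hamlet")))
    g.add((book, DCTERMS.creator, shakespeare))  # not a surrogate record about him

    print(g.serialize(format="turtle"))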

19. How usable is bibliographic data that's produced by Amazon and other online services?

Karen Coyle: The Open Library takes in data from Amazon and it is mixed in with data from LC Books files and data from other libraries. Sometimes you can tell where the data came from if you know what to look for, but other times not. (Hint: Amazon often uses full dates for publishing dates, like March 3, 2011; and it includes measurements for height, width and depth, as well as the weight of the book. Think: shipping.) Every data source will have some unique data or something really great, so a mash-up of the best of the best is the goal.

20. How do libraries move from creating records to using/creating statements? What would that mean for individual catalogers?

Karen Coyle: The only way to do that will be to move beyond MARC. But I think we can practice it today on paper and learn what does and what doesn't work. I hope that there will be exercises of this nature done in the investigation of the new bibliographic framework that is being studied at the Library of Congress. The cataloging interface does not need to be at all graph-like. I expect any future cataloging interface to be made up of forms that are filled in, with the record format visible only to the applications.

21. Could you envision for us an idea of what you think as the ideal path and endpoint for the future of metadata creation?

Karen Coyle: I think I'm too realistic to have a single, ideal path. I also know that any path I envision today will soon be overcome by new technologies that I cannot even imagine at this moment. I would say that my ideal, if I have one, is for metadata to be a tool, and for us to have many toolkits that can work with it.

22. How are we going to change data coming into our catalogs? Are we going to rely on crosswalks? Should we be looking to cataloging in DC?

Karen Coyle: If we are using linked data then crosswalks should not be necessary, at least not crosswalks in the sense that we mean them today. Most of the linking will take place outside of our own databases/catalogs. Rather than taking in data from outside sources we should be linking out to data. Our own catalogs -- especially the parts that serve the library management functions like acquisitions, circulation, receipt, etc. -- may continue to be somewhat closed and protected. It's more the discovery function that will be out in the wild.
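
A sketch of "linking out to data" rather than copying it in: a local record asserts that its entity is the same as one described elsewhere, and applications follow the link when they need the remote detail. Python with rdflib; the local URI is invented and the external identifiers are illustrative.

    # Sketch: link out to external descriptions instead of copying data in.
    # The local URI is invented; owl:sameAs asserts identity across datasets.
    from rdflib import Graph, URIRef
    from rdflib.namespace import OWL

    g = Graph()
    local_author = URIRef("http://example.org/authority/shakespeare")

    g.add((local_author, OWL.sameAs, URIRef("http://viaf.org/viaf/96994048")))
    g.add((local_author, OWL.sameAs,
           URIRef("http://id.loc.gov/authorities/names/n78095332")))

    # A discovery application can dereference either link for richer data,
    # rather than relying on a crosswalk to import and re-map fields.
    print(g.serialize(format="turtle"))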

23. How important is FRBR to the move to linked data? The MARC bib record combines FRBR's WEMI objects into one. Do we need separate identifiers and descriptions for works, expressions, and manifestations?

Karen Coyle: There has been a lot of thought about FRBR in the linked data community, even among non-library technologists. FRBR is conceptually a step toward linked data because it expresses the library environment as related "things." There are some issues in implementing it as linked data, including the question about having identifiers for WEMI. This is an ongoing discussion, and I don't know where it will end up.
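
One way the identifier question gets explored in practice is sketched below: each WEMI level receives its own URI, and FRBR-style relationships connect them. The ex: properties here are hypothetical stand-ins, not an agreed vocabulary.

    # Sketch: separate identifiers for FRBR work/expression/manifestation.
    # The ex: properties are hypothetical stand-ins for FRBR-style relationships.
    from rdflib import Graph, Namespace, URIRef

    EX = Namespace("http://example.org/frbr-like#")
    g = Graph()

    work = URIRef("http://example.org/work/hamlet")
    expression = URIRef("http://example.org/expression/hamlet-english-text")
    manifestation = URIRef("http://example.org/manifestation/hamlet-first-folio")

    g.add((expression, EX.realizes, work))           # the expression realizes the work
    g.add((manifestation, EX.embodies, expression))  # the manifestation embodies it

    print(g.serialize(format="turtle"))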

24. How can we verify accuracy of data in linked data statements?

Karen Coyle: If you mean: "How do I know this is really the author of this book?" then you cannot verify it, any more than you can today when you are looking at a MARC record. For that we will rely, as we do today, on trusting our sources. If instead you mean: "Is this supposed to be in ISO date format?", then the element definitions will provide rules for verification. The current interest in Application Profiles is in part because we see the APs as having precise data definitions that will make verification easier.
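
The second kind of verification, checking form rather than truth, can be mechanical once an element definition pins down a format. A sketch in Python, assuming a hypothetical profile rule that publication dates must be ISO 8601 (YYYY-MM-DD):

    # Sketch: mechanical verification against a (hypothetical) profile rule
    # that publication dates must be ISO 8601 (YYYY-MM-DD).
    from datetime import date

    def is_iso_date(value: str) -> bool:
        try:
            date.fromisoformat(value)
            return True
        except ValueError:
            return False

    print(is_iso_date("2011-03-03"))     # True  -- conforms to the profile
    print(is_iso_date("March 3, 2011"))  # False -- flagged for review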

25. Since bibliographic records are becoming obsolete, will new record formats be based on groups of authorities instead?

Karen Coyle: That's definitely one way of looking at it, and I think it is a useful analogy. To the extent possible, we will have groups of defined "things" many of which today are in authority records (people, subjects, events). There will still be a fair amount of text and "messy bits" in bibliographic metadata because that is its nature. The things that can be clearly defined will provide the most linking and the most accurate linking. The messy bits will get some interpretation through full text analysis or other techniques.

26. How tolerant is a linked data environment of variations in metadata practice & quality? E.g. some places are very specific about ISBNs, others are indiscriminate, others leave it out altogether, etc.

Karen Coyle: Linked data can be as tolerant as you like, so it alone does not solve this issue. In fact, much could be done to improve the usability of bibliographic data without moving to linked data at all. Adding precision to our data elements such that ISBNs are always in the same format, or that we use standard date forms, would already be an improvement. I think it would be highly useful, in light of the new bibliographic framework, to start with an open-ended question of what we want to accomplish with our data, and then compare that to the data that we create today. This would be a great learning experience. In fact, I would like to see such a study done, not by librarians who live and breathe MARC, but by someone encountering the data for the first time. It would be extremely interesting to hear what they would suggest for our data.
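
As one example of the kind of precision that helps even before any move to linked data: normalizing every ISBN to hyphen-free ISBN-13 makes matching across sources trivial. A sketch of the standard conversion; the check-digit arithmetic is the published ISBN-13 rule.

    # Sketch: normalize an ISBN-10 to hyphen-free ISBN-13 so records match.
    def isbn10_to_isbn13(isbn10: str) -> str:
        digits = isbn10.replace("-", "").replace(" ", "")
        core = "978" + digits[:9]        # drop the old check digit
        total = sum(int(d) * (1 if i % 2 == 0 else 3)
                    for i, d in enumerate(core))
        check = (10 - total % 10) % 10   # ISBN-13 check-digit rule
        return core + str(check)

    print(isbn10_to_isbn13("0-306-40615-2"))  # -> 9780306406157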

27. If we start using algorithmically generated data in statements (say, automatic classification or subject assignment), how could we express the property so it's not as strong an assertion? (in case the algorithm is wrong x percent of the time)...?

Karen Coyle: I'm not a mathematician, but there are people applying this type of analysis to all kinds of data today. Maybe a future webcast should pull in someone who can talk about algorithmic analysis. I'd like to hear that myself.
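
One pattern RDF already offers for this is to describe the statement itself, for instance through reification, and attach a confidence score that consuming applications can weigh; named graphs are another common approach. A sketch in Python with rdflib; the ex:confidence property is hypothetical.

    # Sketch: attach a confidence score to an algorithmically generated statement
    # via RDF reification. The ex:confidence property is hypothetical.
    from rdflib import BNode, Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    EX = Namespace("http://example.org/terms#")
    g = Graph()

    book = URIRef("http://example.org/book/1")
    subject = URIRef("http://example.org/concept/dogs")

    # The statement itself: an automatic classifier assigned this subject.
    g.add((book, DCTERMS.subject, subject))

    # A reified description of that statement, carrying the classifier's confidence.
    stmt = BNode()
    g.add((stmt, RDF.type, RDF.Statement))
    g.add((stmt, RDF.subject, book))
    g.add((stmt, RDF.predicate, DCTERMS.subject))
    g.add((stmt, RDF.object, subject))
    g.add((stmt, EX.confidence, Literal(0.85)))

    print(g.serialize(format="turtle"))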