NISO Two Part Webinar: Understanding Critical Elements of E-books: Standards for Formatting and Metadata

Part 2: Find That E-book – or Not: How Metadata Matters

Below are listed questions that were submitted during the March 14, 2012 webinar. Answers from the presenters will be added when available. Not all the questions could be responded to during the live webinar, so those that could not be addressed at the time are also included below.

Speakers:

  • Laura Dawson, Communications Chief, Firebrand Technologies
  • Pat Payton, Senior Director, Publisher Relations and Content Development, Bowker
  • Graham Bell, Chief Data Architect at EDItEUR

Feel free to contact us if you have any additional questions about library, publishing, and technical services standards, standards development, or if you have suggestions for new standards, recommended practices, or areas where NISO should be engaged.

NISO Webinar Questions and Answers

1. Can you repeat or post link to the white paper you mentioned in the introduction?

NISO: Here is the link to "Streamlining the Book Metadata Workflow"

Also, make sure to check out the Resources page for this webinar by going here.

2. What was the name of the book that had the bad metadata in Laura Dawson's presentation?

Laura Dawson: A Handbook of Japanning for Ironware, Tinewae [sic], Wood by William N. brown [sic], a Bibliolife title. They are my go-to for wretched metadata.

3. In FRBRese, is ISTC intended to apply to works or expressions? Or can it be used for either? (Or is it not FRBR-compatible?)

Pat Payton: Graham has a visual representation of FRBR and ISTC toward the end of his presentation. ISTC can be considered the expression level of FRBR rather than being the same as a “work.”

Graham Bell: As discussed in my presentation, the conceptual model used by the ISTC standard is the <indecs> model, which is different from FRBR.

FRBR, indecs and CIDOC-CRM are differing ways of modeling the world – FRBR comes from the library world, CIDOC-CRM is intended specifically for the cultural heritage sector, and indecs is intended to underpin commercial operations in the creative sector. Each of these models defines various entities that need identification and description. However, there is a lot of commonality – CIDOC-CRM and indecs, for example, are both fundamentally based on the idea of events. An extension to FRBR called FRBRoo (furberoo!) ports these ideas into the library model.

However, it is really important when looking for areas of overlap between these models to NOT just look at the names of the various entities – you have to look in detail at the semantics. A “work” in FRBR is not the same as a “work” in indecs.

ISTC is based on the indecs model, so an ISTC does not identify what FRBR would call a work. In fact, the indecs work which is identified by an ISTC is very much like what FRBR would call an “expression”.

As a practical example, the two books Mannen som hatar kvinnor and The girl with the dragon tattoo are two different FRBR expressions of one FRBR work. But they are two different indecs works. They have different ISTCs.

The relationships between indecs works (and thus between ISTCs) are as important as the ISTCs themselves, as one work can be derived from another. Works can be related by events like compilation, extraction, revision, translation, abridgement and so on, as Pat showed. And so The girl with the dragon tattoo has an ISTC of its own, and a ‘translated from’ relationship to the other ISTC.

Finally, as an ISTC identifies an indecs work, there will be a cluster of manifestations of that work, each manifestation having an ISBN. So you use ISBNs to differentiate between the manifestations, and ISTC to collocate all manifestations of the same content together.
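
As a rough illustration of that split between differentiating and collocating, the sketch below groups manifestation records by ISTC and follows a ‘translated from’ link between two works. Every identifier and field name in it is an invented placeholder, not a real ISBN, ISTC or ONIX element name.

```python
from collections import defaultdict

# Hypothetical manifestation records: all identifiers are made-up placeholders.
MANIFESTATIONS = [
    {"isbn": "ISBN-SV-HARDBACK", "istc": "ISTC-SWEDISH-TEXT"},
    {"isbn": "ISBN-SV-EPUB", "istc": "ISTC-SWEDISH-TEXT"},
    {"isbn": "ISBN-EN-PAPERBACK", "istc": "ISTC-ENGLISH-TEXT"},
    {"isbn": "ISBN-EN-EPUB", "istc": "ISTC-ENGLISH-TEXT"},
]

# Relationships between works, keyed by the ISTC of the derived work.
DERIVED_FROM = {"ISTC-ENGLISH-TEXT": ("translated from", "ISTC-SWEDISH-TEXT")}


def collocate(manifestations):
    """Group ISBNs under the ISTC of the textual work they embody."""
    clusters = defaultdict(list)
    for record in manifestations:
        clusters[record["istc"]].append(record["isbn"])
    return dict(clusters)


clusters = collocate(MANIFESTATIONS)
for istc, isbns in clusters.items():
    print(istc, "->", isbns)
    if istc in DERIVED_FROM:
        relation, source = DERIVED_FROM[istc]
        print("   ", relation, source)
```

Each ISBN still identifies exactly one manifestation; the ISTC is only the key that the cluster hangs from.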

4. ISBN, ISTC, is anything relevant to a magazine publisher who seeks to publish individual articles and collections of articles in EPUB3 and .mobi?

Pat Payton: Chapters or individual articles can be assigned an ISBN. This would allow users to find the content within the article. However, in the magazine industry, most publishers have used DOIs to identify chapters. A DOI is a permanent URL to a particular piece of content. By distributing the DOI with the metadata, the libraries can always resolve to the correct point on the web where that content is hosted even if the publisher changes hosting services.

Graham Bell: Yes. Each article is effectively a very small monographic product, and is thus eligible for an ISBN (and I would recommend assignment of ISBNs if the individual articles are going to be sold through any kind of supply chain).

And as Pat explained, if you are selling EPUB and mobi formats, these would need separate ISBNs.

5. Is there a standard for ISTCs in a MARC record?

Pat Payton: Yes, the ISTC can be stored in the 787 field. See resources for a white paper on this topic.

6. Is ISTC implemented yet on any outlets?

Pat Payton: Yes, several publishers are using the ISTC in order to group related manifestations within their own system. They have chosen to use the standard rather than create their own title linking logic. However, the ISTC was a long time in development and in the interim retailers, wholesalers, aggregators, and publishers each developed their own linking logic, which they use internally. As such, there is inconsistency in individual definitions of what a work is and how their internal identifier is applied, which has caused a slow pickup of the ISTC as an identifier across parties.

Graham Bell: Almost certainly not yet, because very few ISTCs have been assigned so far. But big retailers are creating their own ad hoc ‘clusters of ISBNs’ by trying to match title and author. This is, unfortunately, subject to significant error, and publishers, librarians and readers are led astray by spurious ad hoc clusters of ISBNs.

7. Does ONIX handle metadata about 1 thing originating from multiple sources? (If so, how? Multiple records overlaid, or different attributions for different record parts, or something else?)

Graham Bell: Implementation-dependent. One would normally have a hierarchy of trust that would replace older data with newer, and data from more trusted sources would overlay less trustworthy data. This is why each message (and optionally, each record) contains information about the provenance of the metadata.

And as Laura pointed out on the call, this hierarchy of trust can operate at an element-by-element level.
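
A minimal sketch of such a hierarchy of trust, applied element by element, might look like the following. The source names, trust ranking, field names and sample values are all assumptions made for illustration; ONIX itself does not prescribe any of them, and a real implementation could equally use a different ranking for each element.

```python
# Hypothetical trust ranking: a higher number wins. Source names are placeholders.
TRUST = {"publisher": 3, "aggregator": 2, "retailer_guess": 1}


def merge_records(records):
    """Merge (source, sent_date, fields) records element by element.

    For each element the value from the most trusted source wins;
    between equally trusted sources the newer value wins.
    Dates are ISO strings, so plain string comparison orders them.
    """
    merged = {}
    provenance = {}
    for source, sent_date, fields in records:
        rank = (TRUST.get(source, 0), sent_date)
        for element, value in fields.items():
            if element not in provenance or rank > provenance[element][0]:
                merged[element] = value
                provenance[element] = (rank, source)
    sources = {element: info[1] for element, info in provenance.items()}
    return merged, sources


records = [
    ("retailer_guess", "2012-03-10", {"title": "Exmaple Title", "page_count": 96}),
    ("publisher", "2012-01-05", {"title": "Example Title"}),
    ("aggregator", "2012-02-01", {"title": "EXAMPLE TITLE", "subject": "Crafts"}),
]
merged, sources = merge_records(records)
print(merged)
print(sources)
```

Here the publisher's title survives even though it is the oldest value, while the page count and subject are kept from less trusted sources because nobody more trusted supplied them.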

8. How many resellers use all of the ONIX data? Is there a sense of how detailed it is worthwhile to get in general? I have seen several meta templates (as we don't have ONIX yet) that have very few metadata requirements, while ONIX is quite rich. Curious how much of the ONIX data is utilized.

Graham Bell: No one publisher and no one retailer uses the WHOLE of ONIX, but people like B&N, Amazon and other retailers use a very large chunk of it. Aggregators like Bowker and wholesalers like Ingram and Baker & Taylor also deal with a very large subset.

BISG publishes a set of guidelines on the most important data elements within ONIX, and EDItEUR also recommends a slightly broader subset that almost everyone should support.

However, I would also caution that the minimum set of elements that is important to you is dependent on the nature of your business. A slightly different business will need a slightly different subset of the set of possible data elements.

Laura Dawson: ONIX is a communication tool, so your best bet is to find out what your trading partners expect and want to be hearing from you, and provide it. Different areas of the ONIX spec are useful to different constituencies – data that B&N and Ingram send back and forth via ONIX is going to be different from that which the publisher provides. BISG’s recommendations for minimum field support are here: http://www.bisg.org/what-we-do-21-8-product-metadata-best-practices.php. But Graham is right - you’ll want to modify that based on your trading partners’ needs.

9. Will there be a master ISBN? Or will the ISTC have that function?

Pat Payton: The ISBN relates to individual formats of a work. The ISTC will perform the function of a bridging identifier at a higher level linking works with individual ISBNs together.

Graham Bell: The ‘master ISBN’ idea is exactly what the ISTC is supposed to eliminate. ISBNs identify manifestations (more strictly, tradeable manifestations), and manifestations are peers – there is no real ‘master’. ISTCs identify the textual content, so one ISTC applies to multiple manifestations.

10. Do you expect publishers to move away from ISBNs as they adopt ISTC numbers?

Pat Payton: No, publishers will still need ISBNs to sell their works through the supply chain. The ISTC will be a part of an ISBN's metadata and will allow those that choose to use it to more clearly display groups of ISBNs for sale. Think of it as a BISAC subject code that allows products of relevance to be shown to the user at once.

Graham Bell: No, they are complementary. ISBNs for differentiating between manifestations, ISTCs for collocating all manifestations of the same work.

11. Is there/will there be a relationship between the DOI and the ISTC?

Pat Payton: This is an interesting question. A DOI is a permanent URL to a particular piece of content. The DOI could be a one-to-one match with an ISBN for, say, a PDF version of content.

Graham Bell: Not initially, just as there is no necessary relationship between the ISBN and the DOI.

But do you know about the ISBN-A? This is a DOI that incorporates the ISBN within its syntax (see http://www.doi.org/factsheets/ISBN-A.html). Not a lot has been done with this yet in the US, but there are a couple of European implementations in Italy and Germany (see for example http://www.isbn.it/LISBNA/cosèlISBNA.aspx). Maybe one day someone will create an ISTC-A…
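
To make the ISBN-A syntax concrete, here is a small sketch that builds one from a hyphenated ISBN-13. It assumes the layout described in the DOI factsheet linked above: the DOI directory indicator `10.`, then the 978 or 979 prefix, a dot, the registration group and registrant digits run together, a slash, and finally the publication element plus check digit. The function name is hypothetical, the sample ISBN is an illustrative placeholder, and the check digit is not verified.

```python
def isbn_to_isbn_a(hyphenated_isbn13):
    """Build an ISBN-A DOI name from a hyphenated ISBN-13.

    Assumed layout: 10.<978|979>.<group + registrant>/<publication + check digit>
    The hyphens are needed because the element lengths vary between publishers.
    """
    parts = hyphenated_isbn13.strip().split("-")
    if len(parts) != 5:
        raise ValueError("expected five hyphen-separated ISBN-13 elements")
    prefix, group, registrant, publication, check = parts
    digits = "".join(parts)
    if len(digits) != 13 or not digits.isdigit():
        raise ValueError("an ISBN-13 must contain exactly 13 digits")
    if prefix not in ("978", "979"):
        raise ValueError("an ISBN-13 must start with 978 or 979")
    return f"10.{prefix}.{group}{registrant}/{publication}{check}"


# Illustrative placeholder ISBN, not a real assignment.
example = isbn_to_isbn_a("978-1-2345-9999-0")
print(example)  # prints 10.978.12345/99990
```

Going the other way, from ISBN-A back to a correctly hyphenated ISBN, would need the ISBN range tables and is left out of this sketch.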

12. I'm looking for the publisher guidance on accessibility mentioned by Graham. Was that an O'Reilly publication? Or on the EDItEUR site?

Graham Bell: The accessibility guidelines that EDItEUR has produced are available here: http://www.editeur.org/109/Enabling-Technologies-Framework/.

These are deliberately very high-level, and have been written by my colleague Sarah Hilderley about how accessibility can be improved. Our work on this is part of the Enabling Technologies Framework project funded by WIPO, and we are also involved in WIPO’s TIGAR project. Sarah recently presented some of this work at BookNet Canada's Tech Forum and I think her presentation will be available soon on the BookNet Canada website (http://booknetcanada.ca/index.php?option=com_content&view=article&id=665&Itemid=627).

Our ONIX controlled vocabulary for describing accessibility is here: http://www.editeur.org/files/ONIX%20for%20books%20-%20code%20lists/ONIX_BookProduct_CodeLists_Issue_16.html#codelist196

EDItEUR would be really interested in helping develop something semantically interoperable within MARC.

Matt Garrish's O'Reilly book on accessibility best practice (very detailed and technical, really aimed at those creating e-book files) is here: http://shop.oreilly.com/product/0636920025283.do – free for the cost of giving your e-mail address to O'Reilly. It is an extract from a larger forthcoming book by Matt and Markus Gylling on the EPUB format.

13. ONIX question: I suspect our e-book aggregator gets its metadata for our MARC records from publishers providing ONIX data. The subject fields are inadequate. Any plans to enhance subject headings in ONIX?

Graham Bell: I may have misunderstood this question during the call. Do you mean the data that is supplied to you by the aggregator is inadequate? Or that the structure of ONIX is inadequate to contain the detailed data you need?

On the second question, as I said on the call, ONIX can carry subject headings drawn from many different schemes – LCSH, Dewey, BISAC, BIC (used in parts of Europe) and many other schemes, even arbitrary keywords. Any one ONIX record can contain multiple classifications.
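
As a simplified picture of one record carrying several classifications at once, consider the sketch below. It is not ONIX: real ONIX messages are XML and identify each scheme with a coded value, whereas the class names, plain-text scheme labels and subject values here are placeholders chosen for readability.

```python
from dataclasses import dataclass, field


@dataclass
class Subject:
    scheme: str  # plain label such as "BISAC"; real ONIX uses coded scheme identifiers
    value: str   # placeholder heading text, not a real code from any scheme


@dataclass
class ProductRecord:
    title: str
    subjects: list = field(default_factory=list)

    def subjects_in(self, scheme):
        """Return every subject value this record carries for one scheme."""
        return [s.value for s in self.subjects if s.scheme == scheme]


record = ProductRecord(
    title="Example Novel",
    subjects=[
        Subject("BISAC", "historical fiction (placeholder heading)"),
        Subject("BIC", "historical fiction (placeholder heading)"),
        Subject("LCSH", "Africa, West (placeholder heading)"),
        Subject("keywords", "18th century"),
        Subject("keywords", "West Africa"),
    ],
)
print(record.subjects_in("keywords"))
print(sorted({s.scheme for s in record.subjects}))
```

A recipient can then pick out whichever scheme it understands and ignore the rest, which is why the same record can serve both a US retailer and a library supplier.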

But the quality of the data is obviously dependent on the effort put in by the publisher (or by an intermediary like Bowker, who may enhance publisher records before they go to your aggregator). Publishers often do not want to limit the appeal of their books, so classifying a book as ‘Fiction’ might seem better than classifying it as ‘18th century historical fiction set in West Africa’. They don’t always understand that 18th century historical fiction counts as fiction as well. We encourage detailed classification, using BISAC or whatever scheme is used in the particular country. But with the best will in the world, you are not going to get publishers to use LCSH or Dewey well (or in most cases, at all).

14. Would MARC records created by LOC be better or worse than MARC records derived from publisher's ONIX?

Graham Bell: MARC records created by mapping from ONIX need intervention to apply the appropriate cataloguing rules – so LOC, OCLC and the like add value AFTER they do the initial mapping of the data. It's also true that mapping from ONIX to MARC throws away some of the value of the ONIX data – you lose the rich collateral material, for example.

15. As I understand it ONIX has info at the book level. Is there any standard to capture the chapter-level data as well?

Graham Bell: Chapter-level info (like separate subject headings, separate authors etc) can be included at the title level within ONIX. It is, however, rarely used.

16. That is new in ONIX 3.0 only?

Graham Bell: No, ONIX 2.1 had some support for chapter-level information.

17. How do we work with third party e-book aggregators who do not see accurate metadata as a priority? We would switch vendors if there were an alternative vendor to provide the content we need but this is not possible.

Laura Dawson: If there is a single truth in the book business, it’s that you can’t control what other people do with metadata. And yet we all have to make agreements and send and receive data from one another. If you cannot go to another aggregator then a business case has to be made to your current aggregator – your poor attention to metadata is costing us real money…and thereby costing you real money. If readers can’t find the books, that’s going to affect hits, licensing fees, everything. Effective metadata is, fortunately, something that can be measured. (Ineffective metadata cannot.) See if the aggregator is willing to experiment on a subset of books, and track that success. Depending on the previous poor quality, the success rate could increase around 40% - which is compelling to anyone.

18. If ISTCs consider the text only, how does that work for books using the same basic text, but the difference lies mostly in the illustrations? Children's books illustrating a well-known poem or story, for example...

Pat Payton: In most cases I have seen, the illustrative version would be a derivative rather than sharing the ISTC of the original work. They would, however, list the source work's ISTC in their metadata.

Graham Bell: ISTCs can be related, as one work can be derived from another. One of the relationships between two ISTCs can be ‘non-text material added or revised’. So you could have three books, the first non-illustrated, the second illustrated and the third illustrated with extra or different images. These would have three different ISTCs, yet they would be related.
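
The three-book example can be sketched as a chain of derivation links that a system walks back to the original text. The ISTC values are invented placeholders, and the sketch assumes the third book was derived from the second rather than directly from the first; the answer above does not say which.

```python
# Hypothetical links: derived ISTC -> (relationship, source ISTC).
RELATIONS = {
    "ISTC-B-ILLUSTRATED": ("non-text material added or revised", "ISTC-A-TEXT-ONLY"),
    "ISTC-C-REILLUSTRATED": ("non-text material added or revised", "ISTC-B-ILLUSTRATED"),
}


def derivation_chain(istc, relations):
    """Follow 'derived from' links from a work back to its original."""
    chain = [istc]
    seen = {istc}
    while chain[-1] in relations:
        _relationship, source = relations[chain[-1]]
        if source in seen:
            raise ValueError("circular derivation between ISTCs")
        chain.append(source)
        seen.add(source)
    return chain


chain = derivation_chain("ISTC-C-REILLUSTRATED", RELATIONS)
print(" <- ".join(reversed(chain)))
```

All three works keep their own ISTC; it is the recorded relationships that let a catalogue show them side by side.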

19. ISTC: For the process of assignment and request for registration, what is the required metadata?

Pat Payton: Here is a link to ISTC for ONIX, which is the standard for transmitting requests for ISTCs: http://www.istc-international.org/html/resources_links.aspx. However, currently Bowker as a registration agency for ISTC will accept similar data in Excel form if you wish to apply for ISTCs.

Graham Bell: Off the top of my head…

  • is it original, or derived from a previous work?
  • if derived, how? (about a dozen possible relationships, and optionally you can supply the ISTC of the source)
  • title (and subtitle) (possibly also alternatives like a former or working title)
  • work type (prose, poetry, playscript etc)
  • language of text
  • edition info
  • a date
  • contributors and their roles
  • optionally, basic biblio information for one manifestation
  • name and role of registrant (author, publisher or whomever)

This can best be delivered to the ISTC registration agency of your choice via an XML message called ONIX for ISTC Registration, the schema for which provides a full specification of the minimum referent metadata. See here: http://www.editeur.org/106/ONIX-ISTC-Registration-Format/.
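
For anyone collecting this information before building the XML, the list above can be sketched as a simple record with a few sanity checks. The class, field names and the choice of which elements are treated as mandatory are assumptions for illustration only; the ONIX for ISTC Registration schema remains the authoritative specification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IstcRequest:
    # Elements this sketch treats as mandatory (an assumption, not the schema).
    title: str
    work_type: str          # e.g. prose, poetry, playscript
    language: str
    registrant_name: str
    registrant_role: str    # e.g. author, publisher
    contributors: list = field(default_factory=list)  # (name, role) pairs
    is_derived: bool = False
    derivation_type: Optional[str] = None  # e.g. translation, abridgement
    source_istc: Optional[str] = None      # optional even for derived works
    subtitle: Optional[str] = None
    edition: Optional[str] = None
    date: Optional[str] = None

    def problems(self):
        """Return a list of human-readable issues; empty means it looks complete."""
        issues = []
        for name in ("title", "work_type", "language",
                     "registrant_name", "registrant_role"):
            if not getattr(self, name):
                issues.append(f"missing {name}")
        if self.is_derived and not self.derivation_type:
            issues.append("derived work needs a derivation type")
        if not self.is_derived and (self.derivation_type or self.source_istc):
            issues.append("original work should not carry derivation details")
        return issues


request = IstcRequest(
    title="Example Title",
    work_type="prose",
    language="eng",
    registrant_name="Example Press",
    registrant_role="publisher",
    is_derived=True,
)
print(request.problems())
```

Running the checks before submission catches the most likely omission, a derived work with no stated relationship to its source.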

More information about the ISTC in general here: http://www.istc-international.org/html/about.aspx and here: http://www.istc-international.org/html/multimedia/pdfs/ISTC_User_Manual_2010v1.2.pdf.