Back From the Endangered List:
Using Authority Data to Enhance the Semantic Web
Below are listed questions that were submitted during the February 9, 2011 webinar. Answers from the presenters will be added when available. Not all the questions could be responded to during the live webinar, so those that could not be addressed at the time are also included below.
- Linking Things and the Virtual International Authority File
Jeff Young, Software Architect, OCLC Research
- Authorities as Linked Data Hubs
Richard Wallis, Technology Evangelist, Talis
- The Getty Vocabularies: 'Non-Authoritarian' Authority Files for Art, Architecture, and Material Culture
Murtha Baca, Head, Digital Art History Access, Getty Research Institute
Feel free to contact us if you have any additional questions about library, publishing, and technical services standards, standards development, or if you have suggestions for new standards, recommended practices, or areas where NISO should be engaged.
NISO Webinar Questions and Answers
- Is there any interest among name authorities to use VIAF to import names from other authorities into their own?
Answer (Jeff Young): Yes, this is one of the ways VIAF could be used. One scenario is someone in Hungary with a new name (to them) that they look up in VIAF and then copy the information from VIAF into their authority file to create their own authority record for that person.
- What is the relationship between VIAF and the Open Metadata Registry? With Freebase?
Answer (Jeff Young): The Open Metadata Registry documents a variety of metadata element sets. VIAF uses several of these element sets (aka namespaces) to describe resources in RDF:
- FOAF: Friend of a Friend
- FRBR Entities for RDA
- RDA Group 1 Elements
- SKOS: Simple Knowledge Organization System
Freebase is a cross-domain node in the Linked Data cloud (http://richard.cyganiak.de/2007/10/lod/lod-datasets_2010-09-22_colored.html). VIAF currently focuses on linking to DBpedia for cross-domain purposes. Of the 14 million clusters, 240,000 currently have links to DBpedia entities. Freebase does have many links to VIAF, however.
- Is there an identifier assigned to the "person" as opposed to the VIAF cluster? Is "person" a separate entity? What is its data representation?
Answer (Murtha Baca): In ULAN, the person or corporate body (in short, the creator) is the subject of the authority record, which can be represented by any number of names. I would say that a VIAF cluster, if clustered properly, represents a single entity.
Answer (Jeff Young): The "person" is modeled as a separate entity from the "VIAF cluster" and consequently identified separately. The "cluster" deals with administrative and record-keeping details and is mostly represented using SKOS elements with a few custom VIAF elements thrown in. In contrast, the "person" entities are represented in two alternatives: 1) the FOAF element set and 2) the Open Metadata Registry's "FRBR Entities for RDA" and "RDA Group 1 Elements" element sets. The identifiers for the cluster and the person are related by foaf:focus. You could read this as "X (the record) has focus Y (the person)." See http://xmlns.com/foaf/spec/#term_focus for details.
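As a rough illustration of the record/person split described above, the sketch below models three RDF statements as plain Python tuples. The example.org URIs are hypothetical placeholders, not real VIAF identifiers, and typing the cluster as skos:Concept is an assumption made for the example. Only the foaf:focus, foaf:Person, and skos:Concept terms come from the published FOAF and SKOS vocabularies.

```python
# Namespaces of the published vocabularies.
FOAF = "http://xmlns.com/foaf/0.1/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical identifiers, for illustration only (not real VIAF URIs).
CLUSTER = "http://example.org/authority/12345"
PERSON = "http://example.org/authority/12345#person"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (CLUSTER, RDF_TYPE, SKOS + "Concept"),  # assumed type for the record-keeping cluster
    (PERSON, RDF_TYPE, FOAF + "Person"),    # the real-world person
    (CLUSTER, FOAF + "focus", PERSON),      # "X (the record) has focus Y (the person)"
]


def focus_of(record_uri, graph):
    """Return the things a record is about by following foaf:focus."""
    return [o for s, p, o in graph if s == record_uri and p == FOAF + "focus"]


print(focus_of(CLUSTER, triples))
```

Because the record and the person have separate identifiers, statements about the record (who maintains it, when it changed) never get confused with statements about the person.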
- Do you have a sense of error rate?
Answer (Jeff Young): Our goal is to have more than 99% of the pair-wise matches be correct. To achieve this level we have to ignore many slightly ambiguous matches.
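To make the trade-off concrete, here is a small sketch of how pair-wise precision changes as ambiguous matches are ignored. Everything in it is an assumption for illustration: the match scores, the thresholds, and the reviewer judgements are invented toy data, and a single score cut-off is a stand-in, not a description of how VIAF actually decides matches.

```python
def pairwise_precision(accepted):
    """Fraction of accepted pairs that a reviewer judged correct."""
    if not accepted:
        return None
    correct = sum(1 for _, is_correct in accepted if is_correct)
    return correct / len(accepted)


def accept(candidates, threshold):
    """Keep only candidate pairs whose score reaches the threshold."""
    return [c for c in candidates if c[0] >= threshold]


# Invented toy data: (match score, reviewer judged the pair correct?).
candidates = [
    (0.99, True),
    (0.97, True),
    (0.95, True),
    (0.80, True),   # a correct but slightly ambiguous match
    (0.78, False),
    (0.60, False),
]

strict = accept(candidates, 0.90)  # hypothetical strict cut-off
loose = accept(candidates, 0.70)   # hypothetical looser cut-off

print(len(strict), pairwise_precision(strict))
print(len(loose), pairwise_precision(loose))
```

In this toy data the strict cut-off discards one match that was in fact correct, which is the cost of keeping the accepted pairs clean.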
- Do you think that it is imperative to use a completely open licence for your data in order for the Linked Data principles to work? Might this be a disincentive to some?
Answer (Jeff Young): From a protocol POV, Linked Data may or may not be open data and vice versa. Linked Data needs to be secure and sustainable, so licensing and a variety of business models will remain important. Andy Powell and Pete Johnson at Eduserv have offered this list of useful clues:
Answer (Richard Wallis): To quote Paul Walk [http://blog.paulwalk.net/2009/11/11/linked-open-semantic/]:
- data can be open, while not being linked
- data can be linked, while not being open
- data which is both open and linked is increasingly viable
- the Semantic Web can only function with data which is both open and linked
There are some obvious fears about opening up data, not least from those whose business models may be affected. The key issue is what you open up. For the public good, the ideal is as much as possible. For the more constrained environment, it would be a process of sharing metadata to 'advertise' the existence and location of resources.
- For Richard: Karen Coyle has complained about the lack of a "killer app" for using open linked data. Can you recommend a good app for searching linked data and demonstrating the semantic web?
Answer (Richard Wallis): The killer apps that will be built upon the emerging Web of Linked Data are yet to appear. We are in a similar position to that of the emerging Web of the late 1990s, when Wikipedia, Facebook, and Twitter had yet to be preceded by Google. The Web of Linked Data, like the Web itself, is an enabling infrastructure.
The main benefit of Linked Data is to identify relationships between concepts and to help users navigate those relationships, although data published as Linked Data can also help improve search. Having identified a thing/concept, probably using search, Linked Data can enable intuitive navigational possibilities between other concepts. A good example of this is the BBC Wildlife Finder [http://www.bbc.co.uk/nature/life], which brings together concepts curated and published by others, such as Wikipedia and the Animal Diversity Web, to add value to the BBC resources, resulting in a rich user experience with intuitive navigation.
- Todd opened the webinar by talking about "trust" among people and how computers cannot use the "cues" that people use to determine trust. Yet the semantic web, at this point, appears to equate all "hubs" with equal validity. How do we bring the "trust" factor into the hub and the results searches and make it seamless to the end user?
Answer (Jeff Young): One of the advantages of "Cool URIs for the Semantic Web" (http://www.w3.org/TR/cooluris/) is that it allows us to leverage the HTTP protocol for verification purposes. If I encounter an assertion that looks suspicious, doing an HTTP GET request on the subject URI could confirm the assertion and/or put it in context. Having Linked Data available in bulk from trusted sources is another way for users to limit their processing to trusted sources.
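A minimal sketch of that verification step, using only the Python standard library, might look like the following. The subject URI is a hypothetical placeholder, and asking for application/rdf+xml is an assumption about what a given publisher serves. The request is only built here, not sent, so the example runs without network access.

```python
import urllib.request

# Hypothetical subject URI taken from a suspicious assertion.
SUBJECT_URI = "http://example.org/authority/12345"


def build_rdf_request(uri, media_type="application/rdf+xml"):
    """Build an HTTP GET request that asks the publisher for RDF via content negotiation."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


def dereference(uri):
    """Send the request and return the response body (requires network access)."""
    with urllib.request.urlopen(build_rdf_request(uri)) as response:
        return response.read()


request = build_rdf_request(SUBJECT_URI)
print(request.get_method(), request.full_url, request.get_header("Accept"))

# To actually fetch the publisher's own description of the subject:
# body = dereference(SUBJECT_URI)
```

Comparing the statements returned by the publisher with the assertion found elsewhere is then a matter of parsing the response with whatever RDF tooling is at hand.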
Answer (Richard Wallis): Trust, and associated authority, is contextual - you may trust data from one source for one topic, but not another. Like the Web of Documents, the Web of Data is a place where anyone can say anything about anything. However it is up to you, and your judgement, as a service provider as to which data sources you trust to build your service - where you link out to. It requires a similar set of value judgements as does linking out to other web sites. The Linked Open Data Cloud diagram [http://richard.cyganiak.de/2007/10/lod/] already reflects some of these judgements by representing the quantity of links between data sets.
- Many different systems of unique ids? How is this being addressed?
Answer (Jeff Young): This should be viewed as a feature of Linked Data rather than a problem. Presumably these different systems contain information that is peculiar to their use cases that should be obtainable when their identifiers are dereferenced. Interoperability of these identifiers with those coined in other domains can be achieved using owl:sameAs.
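One way to picture how owl:sameAs ties identifier systems together is to group identifiers that are linked, directly or indirectly, into equivalence sets. The sketch below does this with a small union-find over plain tuples. All of the example.org, example.net, and example.com URIs are hypothetical, and treating owl:sameAs as fully transitive is a simplifying assumption that a real service may want to apply more cautiously.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_groups(triples):
    """Group URIs connected by owl:sameAs links into equivalence sets."""
    parent = {}

    def find(uri):
        # Register unseen URIs as their own representative.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps chains short
            uri = parent[uri]
        return uri

    for subject, predicate, obj in triples:
        if predicate == OWL_SAME_AS:
            parent[find(subject)] = find(obj)  # merge the two sets

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


# Hypothetical identifiers coined by three different systems.
links = [
    ("http://example.org/a/1", OWL_SAME_AS, "http://example.net/b/7"),
    ("http://example.net/b/7", OWL_SAME_AS, "http://example.com/c/9"),
    ("http://example.org/a/2", OWL_SAME_AS, "http://example.net/b/8"),
]

for group in same_as_groups(links):
    print(sorted(group))
```

Each system keeps its own identifier and its own data; the links only record that the identifiers name the same thing.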
- Authority files often have implicit assumptions about entities, e.g., that pseudonyms can be a separate entity. Do these assumptions about what defines an entity need to be explicit in the open linked data environment?
Answer (Murtha Baca): In the ULAN data model, pseudonyms are not considered to be separate entities. They are simply another variant name (equivalency relationship) for the person/creator/corporate body. The same should hold true for VIAF.
Answer (Jeff Young): VIAF does its best to reflect and preserve the explicit decisions represented in the contributed data.
- Are there any plans to bring BGN information into VIAF?
Answer (Jeff Young): VIAF currently focuses on entities of type "person" and "corporate body". Entities of type "Place" could be added in the future. The U.S. Board on Geographic Names (BGN) could be a useful contributor in that area.
- Will you expose on ULAN the links to "roles" in AAT? In the same way you described linking to a city in TGN on an artist record in ULAN?
- Are any of the major search engines relying on VIAF for search results? Or are there any pilot projects between search engines and VIAF underway?
Answer (Jeff Young): Not currently, although Freebase does link directly to VIAF and most search engines harvest at least some of VIAF.