
NISO/DLF/CrossRef Workshop on Localization in Reference Linking

July 24, 2000 -- CNRI, Reston, VA


MEETING REPORT

INTRODUCTION

In a series of workshops held in 1999, primary publishers, secondary publishers, libraries and other interested parties discussed issues related to reference linking for journal articles and worked out a general model for name-based linking. That model, however, did not explicitly address the issue which has come to be known as the "appropriate copy" problem: where multiple legitimate copies of an article exist, there must be some mechanism supporting the selection of the most appropriate copy or copies for a particular user.

Multiple copies of ejournal articles might exist for any number of reasons, from local loading of journal collections by libraries and consortia to redistribution by aggregator services such as OCLC and EBSCO. Sources of articles can only be expected to increase as eprint services, archiving services, and mirror sites become more common. (A case for the need for appropriate copy resolution was made by Dale Flecker in a presentation at the 2000 annual meeting of the International DOI Foundation (IDF); view the Flecker presentation in PowerPoint.) Since the appropriate copy for a given user commonly depends upon the user's institutional affiliation (and occasionally upon the user's personal memberships), there must be some place in the linking process where local criteria can be applied. Moreover, once a mechanism for localization exists, a number of services beyond linking to the appropriate copy become possible.

This meeting was called because of mutual interest on the part of the Digital Library Federation (DLF) and CrossRef to explore approaches to this problem. It was hoped that libraries and publishers might jointly work out a framework for localization, and perhaps even develop a prototype service. The format of the meeting consisted of a review of relevant developments since the earlier workshop series, a problem statement and discussion of options, and the development of a proposed framework for the localization of services.

REVIEW OF RELEVANT WORK

Models for appropriate copy resolution (Priscilla Caplan)
View Caplan PowerPoint presentation.

A series of workshops on reference linking was held in 1999, co-sponsored by the National Information Standards Organization, the National Federation of Abstracting and Indexing Societies (NFAIS), the Society for Scholarly Publishing (SSP), and the DLF. One of the outcomes was a general model for reference linking. In that model, publishers populated three logically if not physically distinct databases supporting three types of lookup. A client (user or program) would look up a citation in a reference database to obtain an identifier, then look up the identifier in a location database to obtain one or more locations (URLs). Finally the URL would be used to obtain some type of content, for example, the full-text of the journal article.
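To make the flow of that model concrete, the three lookups can be sketched in a few lines of Python. This is an illustration only: the database contents, identifier, and URL below are invented, not part of any workshop system.

  # Minimal sketch of the three-stage lookup in the 1999 linking model.
  # The dictionaries stand in for the reference and location databases.
  reference_db = {("J. Example", "12", "3", "45"): "doi:10.9999/example.12345"}
  location_db = {"doi:10.9999/example.12345": ["http://publisher.example.org/article/12345"]}

  def resolve(citation):
      identifier = reference_db[citation]     # citation metadata -> persistent identifier
      locations = location_db[identifier]     # identifier -> one or more locations (URLs)
      return locations[0]                     # the URL then used to fetch the content

  print(resolve(("J. Example", "12", "3", "45")))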

At a subsequent meeting of the DLF architecture committee, three models for intervening in that process were postulated in order to localize resolution. All of the models assumed that a "universal" name resolver (location database) would be available for each type of identifier.

In the first model, intervention occurs in the process of looking up an identifier to obtain the associated location(s). Assuming the universal name resolver points to the publisher's copy of an article, a local (institutional) name resolver would be maintained containing identifier-location pairs for all articles for which alternative copies were preferred. The identifier would first be looked up in the local name resolver, and if not found, would be passed on to the universal resolver. The main problem in this model is the difficulty of populating the local resolver and keeping it current.

In the second model, intervention occurs as the location(s) associated with an identifier are returned. This model assumes that the universal name resolver is capable of registering and returning multiple locations of articles. A local server would filter these locations against an institutional profile indicating preferred sources for articles given the containing journal and date. Since the name resolver would return to the filter server only the identifier and locations, the filter server would likely have to do a reverse look-up in a reference database, using the identifier to obtain a citation to match against the profile.

In the third model, the universal resolver returns all known locations for an article and the institutional server sends a query to all hosting locations asking if this article is legitimately available to this user. This places the burden of screening on the service offering the content and obviates the need for maintaining local profiles. It has the advantage of allowing some response even for articles to which the institution does not have subscription rights; for example, a content service could reply that the article was available for a fee. It has the disadvantage of broadcasting a potentially huge number of queries over the network.
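The three interception points can be contrasted in a short sketch. The resolver, profile, and host objects below are hypothetical stand-ins used only to show where each model intervenes.

  # Sketches of the three interception points; all objects are hypothetical.
  def model_1(identifier, local_resolver, universal):
      # Intervene before location lookup: try the institutional resolver first.
      return local_resolver.get(identifier) or universal.lookup(identifier)

  def model_2(identifier, universal, profile, reference_db):
      # Intervene after location lookup: filter all registered locations against
      # an institutional profile keyed on journal and date (reverse look-up needed).
      citation = reference_db.lookup(identifier)
      return [loc for loc in universal.lookup_all(identifier)
              if profile.prefers(citation, loc)]

  def model_3(identifier, universal, user):
      # Intervene at the content hosts: ask every location whether this user
      # may legitimately retrieve the article.
      return [host for host in universal.lookup_all(identifier)
              if host.is_available_to(user)]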

Update on the DOI Handle System (Larry Lannom)

The CNRI Handle System is the technology currently used by the International DOI Foundation (IDF) to resolve DOIs. It was noted that fewer than half of the members of the IDF are primary publishers, and that three different organizations, including CrossRef, are now in discussion with the IDF to become registration agencies for the DOI.

In April 2000 CNRI introduced version 5.0 of the Handle System, including Java versions of both the Handle System server and the client libraries. The new version is an overall improvement and includes a number of new features, including public key authentication to enable trusted resolution and distributed administration.

It was noted that in most of the described models for addressing the appropriate copy problem, a name resolver capable of returning multiple locations is required. The Handle System allows an identifier (a handle) to be associated with a list of type-value pairs. For example, a type might be "URL" and the value a valid URL. It is possible to enter queries of the form "for a given handle, give me all type-value pairs", and also "for a given handle, give me all type-value pairs of type X". Even today, a client could request all type-value pairs where the type was URL, and obtain multiple locations associated with that handle in response. (See slide.) However, with current technology, a web browser would not be likely to know what to do with a returned list of locations.
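The shape of such a query can be illustrated with ordinary data structures. This is not the Handle System client API, only a sketch of the type-value model; the handle contents are invented.

  # A handle record as a list of (type, value) pairs; values invented for illustration.
  handle_record = [
      ("URL", "http://publisher.example.org/article/4567"),
      ("URL", "http://mirror.example.edu/article/4567"),
      ("EMAIL", "support@publisher.example.org"),
  ]

  def values_of_type(record, wanted_type=None):
      # "Give me all type-value pairs", optionally restricted to one type.
      return [(t, v) for (t, v) in record if wanted_type is None or t == wanted_type]

  print(values_of_type(handle_record))          # all pairs
  print(values_of_type(handle_record, "URL"))   # multiple locations for one handle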

Update on CrossRef (Ed Pentz)

CrossRef was established primarily by STM (scientific/technical/medical) publishers, but its goal is to enable reference linking across scholarly publishing as a whole. There are now more than 40 members of CrossRef. The system began collecting metadata from publishers in May, and went live in June with five publishers. Importantly, publishers have cooperated to achieve this and have all agreed to use DOIs.

Metadata contributed by publishers is collected in a central database, which currently contains more than 1.5 million records from 2,600 journals. The metadata can be used to look up the DOI of an article for insertion into a reference to that article. When clicked, the DOI is sent to the IDF's DOI resolver, which returns a link to the copy at the publisher's site. Although it is not currently done, the metadata database could be used for other purposes, for example enabling look-up of a DOI to get the associated metadata for use in an application like SFX.
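A sketch of the two directions of use described above follows. The DOI and metadata are invented; the clickable form assumes the dx.doi.org proxy convention.

  # Forward use: embed a DOI in a proxy URL for insertion into a reference.
  def doi_to_link(doi):
      return "http://dx.doi.org/" + doi

  # Possible future use: reverse look-up of metadata by DOI (e.g., for SFX).
  def doi_to_metadata(doi, metadata_db):
      return metadata_db.get(doi)

  metadata_db = {"10.9999/jex.2000.123": {"journal": "J. Example", "year": "2000",
                                          "first_author": "Smith"}}   # invented record
  print(doi_to_link("10.9999/jex.2000.123"))
  print(doi_to_metadata("10.9999/jex.2000.123", metadata_db))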

There was some discussion about the quality of the metadata. It is not rich enough to substitute for A&I (abstracting and indexing) data as a tool for resource discovery -- for example, only the first author of a multi-author document is recorded. Determining the optimal requirements for contributed metadata is a big issue: the requirements have to be minimal to get publishers to participate, and to some extent the richer the metadata, the more restrictive the uses to which it can be put.

Currently metadata is contributed for journal articles only, and only at the level of a work (not of a particular manifestation). Expanding to other types of documents is a high priority, particularly conference proceedings and books. A working group is developing a DTD for conference proceedings, but this is difficult because the namespace is less well defined than for journal articles. It was noted that NISO has just published a standard for title pages of conference proceedings which might be helpful in this respect; the EPICS/ONIX family of standards may also provide a model for the metadata.

LinkBaton (Eric Hellman)

View Hellman PowerPoint presentation.

LinkBaton (linkbaton.com) is a service of Openly Informatics, Inc. offering one approach to the appropriate copy problem. LinkBaton allows a user to establish his preferred source for certain types of information and be automatically routed to that source. Originally implemented as a "central spot localizer" for books and stocks, it is currently being alpha-tested for the journal-article genre.

To use this service, authors substitute hyperlinks called "LinkBatons" for ordinary URLs. LinkBatons address the LinkBaton server and are constructed to pass information about the desired object, using the ISBN as the identifier for books and the ticker symbol for stocks. The first time a user clicks on a LinkBaton, he gets a menu of sources for that item, which for books might be a list of online bookstores and libraries. When the user selects his preferred source, the choice is recorded in a cookie and used to route subsequent LinkBatons of the same type. The actual URL for each source is constructed from metadata passed in the LinkBaton. So, for example, if a web page author citing a book includes a LinkBaton instead of a URL, one user clicking that link may go to Amazon.com, while another may go to Borders.com or to his local public library.
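The routing logic might look roughly like the following. This is a sketch only; the source names, URL templates, and preference handling are hypothetical and not LinkBaton's actual implementation.

  # Hypothetical sketch of LinkBaton-style routing for books.
  SOURCES = {
      "bookstore_a": "http://bookstore-a.example.com/isbn/{isbn}",
      "bookstore_b": "http://bookstore-b.example.com/item?isbn={isbn}",
      "public_library": "http://library.example.org/search?isbn={isbn}",
  }

  def route(isbn, preferred_source=None):
      if preferred_source is None:
          # First visit: no stored preference yet, so offer the menu of sources.
          return list(SOURCES)
      # Later visits: build the source URL from metadata carried in the LinkBaton.
      return SOURCES[preferred_source].format(isbn=isbn)

  print(route("0201342855"))                      # menu of sources
  print(route("0201342855", "public_library"))    # routed by stored preference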

LinkBatons have been proven to work; the LinkBaton server currently gets thousands of hits per day and has the capacity to handle millions. However, there is no provision for ensuring that the requested item is actually available from the preferred source; some links may turn out to be dead ends.

Hellman noted that LinkBaton has partially implemented OpenURL and will be an easy and secure way for any internet site to implement 'OpenURL awareness'.

Open Name Services (Keith Shafer)

View Shafer PowerPoint presentation.

OCLC's Open Name Services (http://names.oclc.org) is neither a new naming authority nor a database of names, but rather a trusted third party that enables service registering and profiling, institutional customization, and authentication. It is based on two concepts: separating the name from the service, and using existing unique names of objects.

The format of a request embedded in a URL is:

http://[name of resolver]/[type of name]/[service requested (optional)]/[name]

So, for example, an ISBN request might be:

http://names.oclc.org/ISBN/0201342855

If the type of service requested is left out, the default server at names.oclc.org returns a menu of all possible services based on that name. (The menu reflects which services are appropriate for that type of name; as with LinkBaton, there is no provision for ensuring the desired item is actually available from any particular service.) It was noted that this technique would also provide a workable alternative to the IETF's proposed method for finding namespace resolvers for URNs.
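A request URL of this form can be assembled mechanically, as in the following sketch. Only the names.oclc.org address and the ISBN example come from the presentation; the "holdings" service name is hypothetical.

  # Build an Open Name Services style request: resolver/type/[service]/name.
  def open_name_url(name_type, name, service=None, resolver="names.oclc.org"):
      parts = ["http://" + resolver, name_type]
      if service:                      # omit the service to get the menu of services
          parts.append(service)
      parts.append(name)
      return "/".join(parts)

  print(open_name_url("ISBN", "0201342855"))               # returns the menu of services
  print(open_name_url("ISBN", "0201342855", "holdings"))   # hypothetical named service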

Currently Open Name Services supports the following types of names: ISBN, ISSN, SICI, ISMN (international standard music number), and handles. It also supports requests submitted in the OpenURL syntax.

Third-party services such as OCLC's Open Name Services could be used as a solution to the appropriate copy problem. These services could take the requested reference and match it against complex profiles based on authorization information and administrative and resource metadata to pick the appropriate copy for the patron.

Open Citation (Bill Arms)

The Open Citation project has its genesis in separate initiatives in the US and Britain. The Open Journal project, which ended in 1998, was an experiment in adding large numbers of hypertext links to documents by storing the links in link databases and superimposing them on documents as they are viewed. That project led to a collaboration between the University of Southampton and Los Alamos National Laboratory to do reference linking for large-scale ejournal archives using Open Journal technology. Carl Lagoze of Cornell is the PI. At the same time, Bill Arms's own interest in providing better support for websites like D-Lib Magazine led him to look at automatic tools for reference linking across heterogeneous materials, using D-Lib, the ACM Digital Library, the LANL archives, and old NCSTRL collections as a testbed. Open Citation has been gathering and evaluating tools for the automation of reference linking to see how these methods can be extended to apply to very different types of materials.

SFX and OpenURL (Herbert Van de Sompel)

View Van de Sompel PowerPoint presentation.

OpenURL is premised on the idea that links should lead a user to appropriate resources. An institutional service component (ISC) describes the context of the user. There are many possible ISCs, including Ex Libris' SFX and OCLC's Open Name Services. OpenURL interfaces between any ISC and remote sources of links.

For an information service to be OpenURL-aware, it needs a way to distinguish a user with access to an ISC from one without. The exact mechanism is not important: it could be the cookie-pusher used by the original SFX, information contained in a digital certificate such as the one being proposed by the DLF digital certificates prototype project (http://www.clir.org/diglib/architectures/digcert.htm), part of a user's stored profile in an information service, or something else. For users known to have access to an ISC, the information service must then provide an OpenURL for each object to be passed on to the user.

An OpenURL is an actionable URL whose target is the user's ISC and whose remainder transports metadata, or keys to access metadata, for the object for which the OpenURL is provided. The format specification for OpenURL can be found at http://www.sfxit.com/OpenURL/openurl.html.
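As an illustration, an OpenURL can be assembled from an ISC address and a set of metadata fields. The field names below loosely follow the draft syntax referenced above; the ISC address, source identifier, and metadata values are all invented.

  # Sketch of constructing an OpenURL: the base URL targets the user's ISC and
  # the query string carries the object's metadata. All values are invented.
  from urllib.parse import urlencode

  isc_base = "http://resolver.library.example.edu/menu"    # hypothetical ISC address
  metadata = {
      "sid": "demo:sketch",        # origin of the link
      "genre": "article",
      "issn": "1234-5679",
      "volume": "12",
      "issue": "3",
      "spage": "45",
      "date": "2000",
      "aulast": "Smith",
  }
  print(isc_base + "?" + urlencode(metadata))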

SFX (Ex Libris) is an ISC that can read an OpenURL as input and take action upon it. A number of information services have developed or announced the ability to generate and output OpenURLs, including arXiv, CrossRef, EBSCO, IDEAL (Academic Press), ISI, OCLC, Open Citation, Ovid, and SilverPlatter. There have been preliminary discussions about bringing OpenURL to NISO for development as an ANSI standard.

The DOI, CrossRef, SFX and OpenURL are not competitors but complementary services which can work together. Multiple ISCs will also exist and will do different things with the data. The important thing is to have a model that includes an intelligent service component that knows about the user and has both an identifier and metadata for an information object.

Van de Sompel noted that the building blocks of a name-based reference linking solution that supports localization are:

  • one or more databases relating identifiers and metadata;
  • a facility to query the database(s) of metadata using the persistent identifier of the work as a key;
  • a default resolver for the namespace as well as alternative resolvers;
  • a facility that directs a user to the appropriate resolver.

View Model in PowerPoint.

Optimally, the latter facility builds on the existence of a single "central spot" per namespace, which is the target of all links based on that type of identifier. The central spot is the registry for both default and alternative resolvers and redirects links to the appropriate resolver. Redirection should occur via a cross-namespace standard such as OpenURL so that alternative resolvers can function based on identifiers from different namespaces.
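The behavior of such a central spot might be sketched as follows; the resolver addresses are hypothetical, and the redirection target is expressed as a simple OpenURL-style query.

  # Sketch of a per-namespace "central spot": redirect to a registered
  # alternative resolver if one is known for this user, else to the default.
  DEFAULT_RESOLVER = "http://default-resolver.example.org/resolve"   # hypothetical

  def central_spot(identifier, registered_resolver=None):
      target = registered_resolver or DEFAULT_RESOLVER
      # Redirection uses a cross-namespace syntax (e.g., OpenURL) so that an
      # alternative resolver can handle identifiers from many namespaces.
      return target + "?id=" + identifier

  print(central_spot("doi:10.9999/jex.2000.123"))
  print(central_spot("doi:10.9999/jex.2000.123",
                     "http://sfx.library.example.edu/resolve"))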

The DOI -- CrossRef -- SFX -- OpenURL experiment (http://sfxserv.rug.ac.be:8888/public/xref/) was designed as proof of concept for the model described above. In this implementation, the central spot for the DOI namespace is the DOI proxy server, the default resolver is the DOI handle server, and registration of alternative resolvers works dynamically via the CookiePusher mechanism.

This model handles services based on institutional affiliation well. It does not handle reciprocal arrangements between publishers (for example, where all subscribers to Science can get to Nature and vice versa), although intelligence to do so could conceivably be built into the DOI environment. Accommodating access rights that are based on the individual's identity rather than his affiliation, as is the case with some publishers like IEEE, has not yet been explored.

SUMMARY AND DISCUSSION

Dale Flecker summarized the problem and points for discussion.

The original question was how to find the appropriate copy of ejournal articles. However, from these presentations it is clear that the real problem is the more general one of where localization of linking should occur. There are lots of different kinds of links we want to build; linking to the appropriate copy is only one type of many possible services. Also, earlier discussions assumed we wanted to go directly to resolution, which may have been in error.

While we tend to think of localization as an institutional, not individual, problem, for certain access arrangements this is not the case. However, given that factoring in customization for individual access is very hard, we would probably make more headway if we focused on institutional localization first.

Given the above, there are three main components of a solution to discuss:

1) Where does localization occur, and how is it invoked?

2) Given that name-based schemes are "dumb" (i.e., the name contains no information about the named item), how do you get metadata to describe an item?

3) How do you know the address of an alternate copy, given that algorithmic transformations on metadata do not always work?

There was general (though not unanimous) consensus within the group on several key points:

  • Localization should occur at a central site in front of the default resolver for a given namespace. This was agreed despite the identification of two possible problems: first, such a central site is a dangerous hijack point if someone can compromise it, and second, some resolvers would refuse to support this.
  • OpenURL was accepted as the mechanism for carrying service requests and metadata.
  • Use of cookies was accepted as a means of telling the central site to divert resolution to a non-default service component. Passing the name of the non-default resolver in the cookie is better than passing its address. The practical problem for institutions is how to set the cookie: since cookies can only be read by the domain that sets them, one cookie per namespace would be required.

This agreement leads to a general framework as described below:
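As an illustration of how the agreed pieces might fit together, the following sketch shows a central spot consulting a per-namespace cookie that carries the name (not the address) of a non-default service component. All of the names, addresses, and the cookie field are hypothetical.

  # Hypothetical sketch of the agreed framework: a cookie set for the central
  # spot's domain names a non-default service component; the central spot maps
  # that name to an address in its registry and redirects, falling back to the
  # namespace's default resolver otherwise.
  RESOLVER_REGISTRY = {                                   # maintained at the central spot
      "example-university": "http://sfx.example.edu/resolve",
      "third-party-service": "http://third-party.example.net/resolve",
  }
  DEFAULT_RESOLVER = "http://default-resolver.example.org/resolve"

  def redirect(identifier, cookies):
      name = cookies.get("service_component")             # name, not address, in the cookie
      target = RESOLVER_REGISTRY.get(name, DEFAULT_RESOLVER)
      return target + "?id=" + identifier                 # carried as an OpenURL-style query

  print(redirect("doi:10.9999/jex.2000.123", {"service_component": "example-university"}))
  print(redirect("doi:10.9999/jex.2000.123", {}))          # no cookie -> default resolver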

It was noted that the non-default service component was not necessarily an institutional resolver. It could, for example, be a trusted third-party service subscribed to by the institution, such as OCLC's Open Name Services.

A general problem with the whole idea of localization was also raised. Some publishers may want their identifiers to point to them and not to alternate copies. If a publisher is registered in CrossRef, for example, and using the DOI, it is not necessarily giving permission for that DOI to be redirected. Localization introduces the possibility of honest configuration errors as well as of hostile hijacking. There needs to be a way for publishers to opt out of localization. This currently can be done on a DOI-by-DOI basis using the nosfx=y parameter, but it would be better if the mechanism were moved into the central DOI environment. Overall, however, the group felt that "opting out" was a bad idea, as it would look random to a user, and publishers should want to give, not prevent, added value to their users.

In order to begin building prototype services, several next steps present themselves:

  • The general framework agreed to at this meeting will be publicized for comment by interested communities;
  • The IDF will explore policy issues related to implementation of central site redirection in front of the DOI resolver;
  • CrossRef and DLF libraries will pursue the possibility of a prototype project using this framework;
  • A standard method for "opting out" of localization must be developed;
  • CrossRef will explore offering a service to allow querying by DOI to obtain metadata.

Copyright 2000 National Information Standards Organization


Modified August 20, 2000