
Internet, Interoperability and Standards: Filling the Gaps

Janifer Gatenby

European Product Manager

Geac Computers

jgatenby@geac.fr

 

Abstract

With major changes in electronic communications, the main focus of standardisation in the library arena has moved from supporting efficiency to allowing library users to access external resources and allowing remote access to library resources. There is a new emphasis on interoperability at a deeper level among library systems and on a grander scale within the environment of electronic commerce. The potential of full interoperability is examined, along with its likely impact. Some of the gaps in current standards are examined, with a focus on information retrieval, together with the process for filling those gaps, the interoperation of standards and overlapping standards.

 

Introduction

Libraries now find themselves in a very new environment. Even though they have always co-operated with one another and have led standards efforts for decades, their interoperability has been at arm's length, via such means as store-and-forward interlibrary loans and electronic orders. The initial goals of standardisation were to increase efficiency, e.g. by exchanging catalogue records and by electronic ordering, and only secondarily to share resources.

Initial standards efforts in libraries concentrated on record exchange as part of the drive to improve efficiency by sharing cataloguing. This led to a raft of bibliographic standards concentrating on:

  • the way in which catalogue records are made (contents - cataloguing rules such as AACR2, classification schemes, subject headings, name headings)
  • how they are identified (LC card number, ISBN, ISSN etc.)
  • how they are structured for exchange (MARC)
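Two of the identifiers listed above, the ISBN and the ISSN, end in a check character so that a mistyped number can be detected when records are exchanged. A minimal Python sketch of the check-character rules, assuming the caller passes only the leading digits, with hyphens and the check character already removed (input validation is left out):

```python
def isbn10_check_digit(first9: str) -> str:
    """Check character for an ISBN-10, given its first nine digits."""
    # Weights run from 10 down to 2; the full weighted sum, including the
    # check digit at weight 1, must be divisible by 11.
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_check_digit(first12: str) -> str:
    """Check digit for an ISBN-13 (EAN-13), given its first twelve digits."""
    # Weights alternate 1, 3, 1, 3, ...; the full sum must be divisible by 10.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def issn_check_digit(first7: str) -> str:
    """Check character for an ISSN, given its first seven digits."""
    # Weights run from 8 down to 2, modulus 11, with 'X' standing for 10.
    total = sum(int(d) * w for d, w in zip(first7, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


print(isbn10_check_digit("030640615"))     # 2  (ISBN 0-306-40615-2)
print(isbn13_check_digit("978030640615"))  # 7  (ISBN 978-0-306-40615-7)
print(issn_check_digit("0378595"))         # 5  (ISSN 0378-5955)
```

The same pattern (a weighted sum and a modulus) catches any single mistyped digit, which is why a receiving system can reject a corrupted identifier before trying to match it.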

Viewing library standardisation chronologically, acquisitions was the next area where libraries strove to increase efficiency co-operatively. Standards for the exchange of orders and interlibrary loan (ILL) data appeared. These commenced with efforts to standardise forms used among libraries and suppliers and these forms served as a good foundation for electronic transactions.

The focus is now changing, and changing quite rapidly, as the Internet and electronic commerce create the expectation of interactive, real-time access to resources regardless of location. As the role of the library changes, its ability to access and be accessed becomes paramount, and this involves interoperability. Moreover, interoperability is required on a larger scale: not only a more detailed level of functional interaction among library automation systems, but also interoperability more generally within the environment of electronic commerce.

Interoperability

Interoperability requires standards on several levels. It is necessary to standardise what is being exchanged (data elements), how it is structured for exchange (record schemas and record syntaxes) and how it is actually exchanged (protocol transactions and messages, and profiles). Examples:

  • Protocol standards, e.g. Z39.50, ISO 10160/10161, X.500, LDAP, HTTP, FTP, XQL: the messages exchanged between client and server
  • Protocol profiles, e.g. Bath, UCP, CIMI, IPIG: these limit options to ensure interoperability
  • Data element standards, e.g. ISO 8459: these define the elements that are part of messages, which may or may not be grouped into data structures or records
  • Record structure standards, e.g. MARC (ISO 2709), GRS-1, SGML, XML, HTML
  • Record content standards, e.g. the Z39.50 Holdings schema, RDF, EAD

 

Information Retrieval

NISO Z39.50 (now also ISO 23950) was first launched in 1988 as an information retrieval protocol for bibliographic data. Since then it has evolved into a generic protocol that has been widely implemented for bibliographic, geospatial, medical, chemical, biological, museum and government databases, with an estimated value of between US$10 billion and US$100 billion.

Z39.50 is a rich protocol, addressing the following components of information retrieval:

  • Searching and presenting results
  • Sorting of results before presentation (e.g. so that only the most recent items are presented from a large result set)
  • Removal of duplicates from the set
  • Negotiating retrieval of large result sets (segmentation, query refinement etc.)
  • Retrieval of selected contents
  • Browsing indexes and thesauri
  • Restricting access to authenticated users
  • Extended services, including placing orders, updating files, regular repeating queries, saving result sets and exporting data.
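The services listed above revolve around the result set, which the server keeps between requests. The following Python sketch models that idea in memory; `ToyTarget` and its methods are hypothetical names, not part of any Z39.50 toolkit, and the real protocol exchanges encoded messages rather than method calls:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Record:
    record_id: str
    title: str
    year: int


@dataclass
class ToyTarget:
    """Hypothetical in-memory stand-in for a Z39.50 target (server)."""
    database: list[Record]
    result_sets: dict[str, list[Record]] = field(default_factory=dict)

    def search(self, set_name: str, word: str) -> int:
        # Search: the hits stay on the server as a named result set;
        # only the hit count is returned to the client.
        hits = [r for r in self.database if word.lower() in r.title.lower()]
        self.result_sets[set_name] = hits
        return len(hits)

    def remove_duplicates(self, in_set: str, out_set: str) -> None:
        # Duplicate removal: keep the first record for each (title, year) pair.
        seen: set[tuple[str, int]] = set()
        unique = []
        for r in self.result_sets[in_set]:
            key = (r.title.lower(), r.year)
            if key not in seen:
                seen.add(key)
                unique.append(r)
        self.result_sets[out_set] = unique

    def sort(self, in_set: str, out_set: str) -> None:
        # Sort: newest first, written to a new result set.
        self.result_sets[out_set] = sorted(
            self.result_sets[in_set], key=lambda r: r.year, reverse=True)

    def present(self, set_name: str, start: int, count: int) -> list[Record]:
        # Present: fetch `count` records from a 1-based start position.
        return self.result_sets[set_name][start - 1:start - 1 + count]


target = ToyTarget([
    Record("a1", "Library Standards", 1995),
    Record("b7", "Library standards", 1995),  # same work from another source
    Record("c3", "Library Automation", 1998),
    Record("d4", "Cataloguing Rules", 1988),
])
hits = target.search("default", "library")
target.remove_duplicates("default", "dedup")
target.sort("dedup", "newest")
first = target.present("newest", 1, 1)
print(hits, [r.record_id for r in first])  # 3 ['c3']
```

Only the hit count travels back after a search; records cross the network only when a present request asks for them, which is what makes segmented retrieval of large result sets possible.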

The latest version is extensible, catering for the needs of information retrieval across different domains. Flexible definition is available for:

  • Differing search attribute sets
  • Different record formats (syntax and schema)
  • Generic record syntax
  • Diagnostic sets
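In the Type-1 query, search terms are qualified by attributes drawn from an attribute set such as Bib-1 (attribute type 1 is Use, 4 is Structure, 5 is Truncation). A minimal sketch that builds such a query and prints it in the Prefix Query Format (PQF), a text notation from the YAZ toolkit rather than part of the standard; the `term` helper and the classes are hypothetical, and only a handful of Bib-1 Use values are included:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union

# Bib-1 attribute types used below: 1 = Use, 4 = Structure, 5 = Truncation.
USE, STRUCTURE, TRUNCATION = 1, 4, 5

# A few Bib-1 Use attribute values.
BIB1_USE = {"title": 4, "isbn": 7, "issn": 8, "author": 1003, "any": 1016}


@dataclass
class Term:
    text: str
    attributes: dict[int, int]  # attribute type -> attribute value


@dataclass
class And:
    left: Query
    right: Query


Query = Union[Term, And]


def term(access_point: str, text: str,
         structure: Optional[int] = None,
         truncation: Optional[int] = None) -> Term:
    """Build a search term on a named access point (hypothetical helper)."""
    attrs = {USE: BIB1_USE[access_point]}
    if structure is not None:
        attrs[STRUCTURE] = structure    # e.g. 1 = phrase, 2 = word
    if truncation is not None:
        attrs[TRUNCATION] = truncation  # e.g. 1 = right truncation
    return Term(text, attrs)


def to_pqf(q: Query) -> str:
    """Render the query tree in YAZ Prefix Query Format (PQF)."""
    if isinstance(q, And):
        return f"@and {to_pqf(q.left)} {to_pqf(q.right)}"
    attrs = " ".join(f"@attr {t}={v}" for t, v in sorted(q.attributes.items()))
    return f'{attrs} "{q.text}"'


query = And(term("title", "information retrieval", structure=1),
            term("author", "gatenby", truncation=1))
print("@attrset bib-1 " + to_pqf(query))
```

This prints `@attrset bib-1 @and @attr 1=4 @attr 4=1 "information retrieval" @attr 1=1003 @attr 5=1 "gatenby"`: a phrase search on title combined with a right-truncated author search. Serving another domain means supplying another attribute set, not changing the query structure.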

The standard is loosely aligned with the ISO data element standard ISO 8459-3, although 8459-3 is now outdated: it was produced in 1993 and is more closely aligned with the deprecated ISO Search and Retrieve (SR) standards, ISO 10162 and 10163.

Profiles of Z39.50 have proliferated. Examples are:

  • The Bath profile - an international profile currently under construction that attempts to harmonise various regional and national search and retrieval profiles for interoperability (CENL, ONE, CCF, ATS-1, Finnish profile, Models, ZTexas)
  • The Union Catalogue Profile for catalogue updating
  • The CIMI profile for searching museum and cultural heritage data

From a bibliographic viewpoint, the standard will address all needs once current work on holdings is complete. As this paper goes to press, the Holdings schema is in its final draft; together with three other items specifying ways of searching and filtering holdings data, it provides the long-awaited solution for holdings.

Z39.50 and Other Enquiry Standards

In spite of the maturity and success of Z39.50 as a retrieval standard, it risks being given no more than a passing glance by the web community as a candidate generic protocol. Because of this, and because of emerging contenders such as XQL, some doubt the future of Z39.50 and see it, at best, as having application within a limited sphere.

It will be a long time before a true rival to Z39.50 appears. The emerging XQL standard covers only the formulation of queries; it does not address the richness of function in Z39.50. There are also important distinctions between Z39.50 and XQL apart from functionality.

The Z39.50 standard is based on abstract access points that do not necessarily relate to the structure of the documents or records. This has the advantage of allowing systems of all ages to build a front end that accepts Z39.50 searches and translates them into searches of the underlying database. It also facilitates multi-target and cross-domain searches. Z39.50 can be used over databases of all types, including those containing full text. The DSTC organisation in Brisbane has created a new SQL query type for Z39.50, Z+SQL, which allows interactive discovery of relational database tables and subsequent Z+SQL searches. Report generation with full computational power over remote databases via Z39.50 will soon become a reality.
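The front-end translation described above can be sketched as a lookup from abstract access points to local columns. Assumptions: an in-memory SQLite table stands in for the local database, the column names and the Use-to-column table are hypothetical, and error handling is reduced to an exception:

```python
from __future__ import annotations

import sqlite3

# Hypothetical mapping for ONE local catalogue: Bib-1 Use values -> columns.
# Another system would map the same abstract access points differently.
USE_TO_COLUMN = {4: "title", 1003: "author"}


def translate(use_attr: int, term: str) -> tuple[str, tuple[str, ...]]:
    """Turn an abstract (Use attribute, term) pair into a local SQL query."""
    column = USE_TO_COLUMN.get(use_attr)
    if column is None:
        # A real target would answer with a Bib-1 diagnostic instead.
        raise ValueError(f"unsupported Use attribute {use_attr}")
    # The column name comes from our own fixed table, never from the client,
    # so only the search term is passed as a bound parameter.
    return f"SELECT id FROM catalogue WHERE {column} LIKE ?", (f"%{term}%",)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE catalogue (id INTEGER, title TEXT, author TEXT)")
db.executemany("INSERT INTO catalogue VALUES (?, ?, ?)", [
    (1, "Filling the Gaps", "Gatenby"),
    (2, "Library Networks", "Smith"),
])
sql, params = translate(1003, "gatenby")
print(db.execute(sql, params).fetchall())  # [(1,)]
```

The client never sees the column names; an older system only needs this mapping layer to answer the same searches as a new one.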

In contrast, XQL is mainly aimed at searching structured documents and relies heavily on the structure assigned to the document. Although it can search documents of different types, it can do so only to the extent that the document type definitions or record schemas share similar elements with the same names.
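The dependence on document structure can be seen with Python's `xml.etree.ElementTree`, whose limited XPath subset stands in here for an XQL-style path query (the syntax is not XQL's). Both document types below are hypothetical:

```python
import xml.etree.ElementTree as ET

# Two hypothetical document types that both carry a title,
# under different element names and nesting.
DOC_A = "<book><title>Filling the Gaps</title></book>"
DOC_B = "<record><bibl><heading>Filling the Gaps</heading></bibl></record>"


def structural_titles(xml_text: str) -> list:
    """Structure-bound query: any <title> element, wherever it occurs."""
    return [e.text for e in ET.fromstring(xml_text).findall(".//title")]


# Abstract alternative: one 'title' access point, mapped per document type.
TITLE_PATH = {"book": ".//title", "record": "./bibl/heading"}


def abstract_titles(xml_text: str) -> list:
    root = ET.fromstring(xml_text)
    return [e.text for e in root.findall(TITLE_PATH[root.tag])]


print(structural_titles(DOC_A), structural_titles(DOC_B))  # ['Filling the Gaps'] []
print(abstract_titles(DOC_B))                              # ['Filling the Gaps']
```

The structural query finds nothing in the second document type; the per-type mapping, which is the Z39.50 approach of abstract access points, finds the title in both.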

Ralph LeVan wrote of Z39.50 in November 1998:

" Because this standard was developed before the Web, it violates three of the tenants [sic] of the Web community. It is stateful, runs directly over TCP/IP and uses a binary encoding scheme. These exceptions to current Web practice make the standard itself unacceptable as a Web searching standard, but the experience of the community, reflected in that standard, is important."

Since this statement, the standard's working group, the ZIG (Z39.50 Implementors' Group), has sought to address these concerns. A profile has been developed allowing Z39.50 to be embedded within the HTTP protocol. This allows:

  • Z39.50 searches to be sent in a stateless way (by encapsulating groups of messages that need to be executed in a set sequence)
  • Z39.50 searches to be encoded in XML instead of BER (Basic Encoding Rules) while still being described in ASN.1
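The stateless encapsulation can be illustrated by bundling the messages of a session into a single XML request that the server executes in order and then forgets. The envelope and element names below are hypothetical (a few loosely echo ASN.1 field names of the search and present requests, such as `resultSetName` and `numberOfRecordsRequested`) and do not reproduce the ZIG's actual XML encoding:

```python
import xml.etree.ElementTree as ET


def build_bundle(database: str, query: str, start: int, count: int) -> bytes:
    """Package init, search and present into one XML request (hypothetical names)."""
    bundle = ET.Element("requestBundle")
    ET.SubElement(bundle, "initRequest", {"protocolVersion": "3"})
    search = ET.SubElement(bundle, "searchRequest", {"resultSetName": "default"})
    ET.SubElement(search, "databaseName").text = database
    ET.SubElement(search, "query").text = query
    present = ET.SubElement(bundle, "presentRequest", {"resultSetId": "default"})
    ET.SubElement(present, "resultSetStartPoint").text = str(start)
    ET.SubElement(present, "numberOfRecordsRequested").text = str(count)
    return ET.tostring(bundle, encoding="utf-8")


def handle_bundle(payload: bytes, catalogue: dict) -> list:
    """Execute the bundled messages in order; keep no state afterwards."""
    result_sets = {}  # lives only for the duration of this one request
    records = []
    for message in ET.fromstring(payload):
        if message.tag == "searchRequest":
            titles = catalogue[message.findtext("databaseName")]
            word = message.findtext("query").lower()
            result_sets[message.get("resultSetName")] = [
                t for t in titles if word in t.lower()]
        elif message.tag == "presentRequest":
            start = int(message.findtext("resultSetStartPoint"))
            count = int(message.findtext("numberOfRecordsRequested"))
            hits = result_sets[message.get("resultSetId")]
            records.extend(hits[start - 1:start - 1 + count])
        # initRequest needs no action in this toy server
    return records


catalogue = {"books": ["Library Standards", "Cataloguing Rules", "Library Networks"]}
print(handle_bundle(build_bundle("books", "library", 1, 2), catalogue))
# ['Library Standards', 'Library Networks']
```

Because the search and the present travel together, the result set only has to exist while the single HTTP request is being processed.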

Other measures that have been taken to allow Z39.50 to interoperate optimally with the web are:

  • Definition of SGML, HTML and XML as record syntaxes. For example, it is possible to send a MARC record in SGML or XML using the MARC DTD, rather than as a traditional MARC record, making it easier to display in a web browser.
  • Preliminary investigation into extending the protocol to allow external registration of query types, e.g. XQL instead of the Z39.50 Type-1 query or the Common Command Language (CCL).
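The first measure rests on the fact that a MARC record's content does not depend on its ISO 2709 packaging. A minimal sketch that builds a small ISO 2709 record (24-character leader, 12-character directory entries, field, subfield and record terminators) and re-expresses it as XML. The element names are illustrative and are not taken from the MARC DTD, and the leader values other than the two lengths are placeholders:

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

FT, SF, RT = b"\x1e", b"\x1f", b"\x1d"  # field, subfield and record terminators


def build_record(fields: list) -> bytes:
    """Build an ISO 2709 record. Each field is (tag, str) for a control field
    or (tag, (indicators, [(code, data), ...])) for a data field."""
    directory, data = b"", b""
    for tag, value in fields:
        if isinstance(value, str):
            body = value.encode("utf-8") + FT
        else:
            indicators, subfields = value
            body = indicators.encode("utf-8") + b"".join(
                SF + code.encode("utf-8") + text.encode("utf-8")
                for code, text in subfields) + FT
        # Directory entry: tag (3), field length (4), start position (5).
        directory += tag.encode("ascii") + b"%04d%05d" % (len(body), len(data))
        data += body
    directory += FT
    base = 24 + len(directory)     # base address of data
    length = base + len(data) + 1  # +1 for the record terminator
    leader = b"%05dnam a22%05d   4500" % (length, base)
    return leader + directory + data + RT


def to_xml(raw: bytes) -> str:
    """Re-express an ISO 2709 record as XML (illustrative element names)."""
    base = int(raw[12:17])
    directory = raw[24:base - 1]   # drop the directory terminator
    record = ET.Element("record")
    ET.SubElement(record, "leader").text = raw[:24].decode("ascii")
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3].decode("ascii")
        size, start = int(directory[i + 3:i + 7]), int(directory[i + 7:i + 12])
        body = raw[base + start:base + start + size - 1]  # drop the field terminator
        if tag < "010":            # MARC control fields 001-009
            ET.SubElement(record, "controlfield", {"tag": tag}).text = body.decode("utf-8")
            continue
        field = ET.SubElement(record, "datafield",
                              {"tag": tag, "ind1": chr(body[0]), "ind2": chr(body[1])})
        for chunk in body[2:].split(SF)[1:]:
            ET.SubElement(field, "subfield", {"code": chr(chunk[0])}).text = chunk[1:].decode("utf-8")
    return ET.tostring(record, encoding="unicode")


raw = build_record([
    ("001", "gap0001"),
    ("245", ("10", [("a", "Internet, interoperability and standards :"),
                    ("b", "filling the gaps")])),
])
print(to_xml(raw))
```

Nothing is lost in either direction: the tags, indicators and subfield codes simply move from byte offsets into markup that a browser or stylesheet can handle.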

Future versions of web browsers are more likely to have XQL search capabilities than Z39.50 capabilities. It is therefore also likely that the status quo will remain, with Z39.50 the only protocol truly capable of searching and retrieving data from the databases behind web pages.

Z39.50 and Interoperation with other Standards

Interoperability is paramount. In addition to interoperating with other systems, protocols need to interoperate with each other so that programs can seamlessly pass from one to another with no visible effect on the user interface. We already have examples of this:

  • ability to send ILL messages from within Z39.50
  • HTTP Z39.50 gateways
  • HTTP profile for Z39.50 (outlined above)
  • To come: a NISO circulation protocol interfacing with Z39.50 and ISO ILL
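The simplest of these, an HTTP to Z39.50 gateway, has to turn a web form submission into a Z39.50 query. A sketch of that translation step only, assuming hypothetical form field names and emitting the query in the PQF notation of the YAZ toolkit; sending the query to a target would be left to a Z39.50 toolkit:

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical web form fields, mapped to Bib-1 Use attribute values.
FORM_FIELD_TO_USE = {"title": 4, "author": 1003, "isbn": 7, "any": 1016}


def url_to_pqf(url: str) -> str:
    """Translate a web search form submission into a PQF Type-1 query."""
    params = parse_qs(urlsplit(url).query)
    terms = []
    for name, use in FORM_FIELD_TO_USE.items():
        for value in params.get(name, []):
            cleaned = value.replace('"', "")  # quotes dropped to keep the sketch short
            terms.append(f'@attr 1={use} "{cleaned}"')
    if not terms:
        raise ValueError("no searchable fields in the request")
    # PQF operators are binary and prefix: n terms need n - 1 leading @and's.
    return "@and " * (len(terms) - 1) + " ".join(terms)


print(url_to_pqf("http://opac.example.org/search?title=filling+the+gaps&author=gatenby"))
# @and @attr 1=4 "filling the gaps" @attr 1=1003 "gatenby"
```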

Full Library Interoperation

Despite a changing focus resulting in changing roles, the underlying mission of the library has not changed, although its phrasing could perhaps be modernised to "the right information, to the right individual, at the right instant". The modern library must strive to achieve this by regarding its own collection as only a fragment of the potential sources that serve its public. The requirement is now for fully interoperable reader services: for example, identification of external resources, followed by the ability to place a remote reservation or booking, followed by a loan. The service needs to extend to a fully seamless user service that provides integrated notices, e.g. a single notice of all overdue materials rather than separate overdue notices for internal and external sources. Thus, information retrieval alone is not enough. Examples of the requirements:

  • Renewals and reservations from Web OPACs
  • Overdues and recalls for interlibrary loan materials, passed from one library management system to another or to an interlibrary loan system
  • Loans, fees and reservations transactions from a standalone interlibrary loan system to a library management system
  • Interlibrary loan transactions between libraries
  • Loan, return and other circulation transactions from a self checking system to a library management system.
  • Electronic orders, follow up and invoices between libraries and suppliers

Electronic ordering has been a reality for a decade now and the standards are maturing. The BISAC, SISAC and X12 standards are being superseded by EDIFACT, with particular extensions for libraries. The EDItEUR committee is maintaining and promoting international adoption of this standard. There are obvious advantages in creating library and book trade extensions to a generic commercial standard. EDIFACT has its own syntax, but the Danish Bibliographic Centre has carried out a pilot implementation encoding EDItEUR EDIFACT messages in XML, for easier exchange of this data over the web.
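EDIFACT's own syntax is compact: by default, segments end with an apostrophe, data elements are separated by `+`, components by `:`, and `?` releases (escapes) any of these. A minimal sketch that parses that syntax and re-expresses it as generic XML; the XML element names are hypothetical and do not reproduce the Danish pilot, and the order fragment is illustrative rather than a validated EDItEUR message:

```python
import xml.etree.ElementTree as ET

# Default UN/EDIFACT service characters (a UNA segment can redefine them).
COMPONENT, ELEMENT, RELEASE, SEGMENT = ":", "+", "?", "'"


def split_keep(text: str, sep: str) -> list:
    """Split on sep, except where sep is released (escaped) with '?'."""
    parts, current, i = [], "", 0
    while i < len(text):
        if text[i] == RELEASE and i + 1 < len(text):
            current += text[i:i + 2]  # keep the escape until the last level
            i += 2
            continue
        if text[i] == sep:
            parts.append(current)
            current = ""
        else:
            current += text[i]
        i += 1
    parts.append(current)
    return parts


def unescape(text: str) -> str:
    """Remove release characters, keeping the characters they protect."""
    out, i = "", 0
    while i < len(text):
        if text[i] == RELEASE and i + 1 < len(text):
            i += 1
        out += text[i]
        i += 1
    return out


def edifact_to_xml(message: str) -> str:
    root = ET.Element("interchange")  # hypothetical element names
    for segment in split_keep(message.strip(), SEGMENT):
        segment = segment.strip()
        if not segment:
            continue
        tag, *elements = split_keep(segment, ELEMENT)
        seg = ET.SubElement(root, "segment", {"tag": tag})
        for element in elements:
            el = ET.SubElement(seg, "element")
            for component in split_keep(element, COMPONENT):
                ET.SubElement(el, "component").text = unescape(component)
    return ET.tostring(root, encoding="unicode")


# Illustrative order fragment: document number "PO+123" needs its '+' released.
ORDER = ("UNH+1+ORDERS:D:96A:UN'BGM+220+PO?+123+9'"
         "LIN+1++9780306406157:EN'QTY+21:2'UNT+5+1'")
print(edifact_to_xml(ORDER))
```

Escapes are kept until the innermost split so that a released separator survives every level of splitting and is only turned back into a plain character at the end.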

For interlibrary loans, we already have the ISO ILL standard, together with the emerging IPIG profile, and, for simpler document requests, the Item Order extended service of Z39.50. Work is still being done in this area, with discussion of a new edition of the standard. The separation of the standard into two parts makes it easy to represent in more than one encoding, and research into an XML encoding has started as a joint Danish and Australian effort.
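The point about encodings is that the meaning of an ILL message can be defined once and then serialised in more than one way. A sketch of that separation, using a hypothetical and much-reduced request (four fields, not the standard's field list) with two interchangeable encodings; JSON simply stands in for any second encoding:

```python
from __future__ import annotations

import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class IllRequest:
    """Hypothetical, much-reduced abstract ILL request (not the ISO field list)."""
    transaction_id: str
    requester: str
    responder: str
    title: str


# Encoding 1: XML
def to_xml(msg: IllRequest) -> str:
    root = ET.Element("illRequest")
    for name, value in asdict(msg).items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> IllRequest:
    return IllRequest(**{child.tag: child.text for child in ET.fromstring(text)})


# Encoding 2: JSON
def to_json(msg: IllRequest) -> str:
    return json.dumps(asdict(msg), sort_keys=True)


def from_json(text: str) -> IllRequest:
    return IllRequest(**json.loads(text))


request = IllRequest("T-0042", "LIB-A", "LIB-B", "Filling the gaps")
print(to_xml(request))
print(to_json(request))
```

Both encodings decode to the same abstract value, so a system can add a new encoding without changing what any message means.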

From the scenarios above, there is an obvious need for a full circulation protocol to support reciprocal borrowing and interlibrary loans optimally. Ideally, the local library management system should have a total picture of a person’s loans in order to manage loan limits, fees, overdues etc. correctly. Work is currently underway in NISO to enhance the protocol made by the 3M company for its self-check system so that it caters fully for all circulation transactions in an international environment.
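The "total picture" can be sketched as a single function over all of a borrower's loans, whatever their source. The `Loan` record and the summary fields are hypothetical and say nothing about how a circulation protocol would carry them:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class Loan:
    item: str
    source: str      # "local", or the library that lent the item
    due: date
    fee: float = 0.0


def borrower_summary(loans: list[Loan], limit: int, today: date) -> dict:
    """One view over local, interlibrary and reciprocal loans (hypothetical model)."""
    overdue = sorted((ln for ln in loans if ln.due < today), key=lambda ln: ln.due)
    return {
        "loan_count": len(loans),
        "can_borrow": len(loans) < limit,              # limit counts every loan
        "overdue_items": [ln.item for ln in overdue],  # one integrated notice
        "fees_due": round(sum(ln.fee for ln in loans), 2),
    }


loans = [
    Loan("Local novel", "local", date(1999, 5, 1)),
    Loan("ILL monograph", "partner library", date(1999, 4, 20), fee=5.0),
    Loan("Reciprocal loan", "neighbouring library", date(1999, 6, 30)),
]
print(borrower_summary(loans, limit=3, today=date(1999, 5, 10)))
```

Without the external loans, this borrower would appear to have one loan and one overdue item; with them, the limit is reached and a single notice lists both overdue items.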

New Areas – User Services

In relation to the World Wide Web, new and very important roles are emerging for libraries:

  • Negotiating access to resources and facilitating remote access
  • Assisting resource discovery

Libraries are now striving to serve their users by providing a mechanism for remote access from personal devices to library services, which may include subscriptions to remote materials. In this context, libraries need to provide a means of authenticating and authorising such access. Guidance leading to the discovery of potentially useful remote resources is also a valid new library service. Both these services can be based on a remote directory using the current industry standards X.500 and LDAP (Lightweight Directory Access Protocol). The European Union’s Pride project is leading the research in this area. One of the key players in this project is Macquarie University, represented by Kerry Blinco.
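Directory lookups of this kind are expressed as LDAP search filters. A sketch of building one safely, with special characters escaped as the LDAP string filter rules (RFC 2254, later RFC 4515) require; the directory layout (libraries as `organization` entries with `o` and `l` attributes) is an assumption, and sending the search is left to any LDAP client:

```python
def escape_filter_value(value: str) -> str:
    """Escape a value for an LDAP string filter (RFC 2254 / RFC 4515)."""
    special = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(special.get(ch, ch) for ch in value)


def library_filter(name_part: str, locality: str) -> str:
    # Assumed layout: libraries are 'organization' entries with the standard
    # 'o' (organisation name) and 'l' (locality) attributes.
    return ("(&(objectClass=organization)"
            f"(o=*{escape_filter_value(name_part)}*)"
            f"(l={escape_filter_value(locality)}))")


print(library_filter("University (Main)", "Sydney"))
# (&(objectClass=organization)(o=*University \28Main\29*)(l=Sydney))
```

Escaping character by character, rather than with successive string replacements, avoids escaping the backslashes that the escapes themselves introduce.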

One outcome of the developments in this area has been the National Library of Australia’s initiative to update the ISO directory standard ISO 2146 to identify data elements relevant to the description of library and related institutions and resources. This development has the potential to affect library management systems: only system-specific data about personal users and institutions needs to be stored locally, while generic data such as address details can now be derived from the distributed directory.
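The division of labour described above can be sketched as a local record that keeps only system-specific data plus a reference into the directory. A plain dictionary stands in for the distributed directory, and all identifiers and fields are hypothetical:

```python
from __future__ import annotations

from dataclasses import dataclass

# Stand-in for the distributed directory; keys and fields are hypothetical.
DIRECTORY = {
    "dir:example-university-library": {
        "name": "Example University Library",
        "address": "1 Campus Road, Example City",
    },
}


@dataclass
class LocalInstitution:
    """Only the system-specific data is stored locally."""
    directory_id: str
    borrower_category: str
    loan_period_days: int


def institution_view(local: LocalInstitution) -> dict:
    # Generic details are read from the directory when needed, so the
    # local system never keeps its own (possibly stale) copy.
    return {**DIRECTORY[local.directory_id],
            "borrower_category": local.borrower_category,
            "loan_period_days": local.loan_period_days}


partner = LocalInstitution("dir:example-university-library", "reciprocal", 28)
print(institution_view(partner)["address"])  # 1 Campus Road, Example City
```

An address change made once in the directory is seen by every system that reads it, with no local update.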

New Areas - Publication

  • New web resources – conversion and presentation
  • Existing web resources – authenticating and protecting

In relation to the web, libraries have two important new roles. Firstly, libraries are becoming publishers. Although currently only in their infancy, digital libraries, consisting of materials created or collected and then organised for retrieval by libraries, will become increasingly important resources. Secondly, in relation to important works, libraries must assume responsibility for identification, description and protection. There must be a way of establishing the authenticity of an electronic document and of protecting it from disappearing when web pages are reorganised. As an extension of this role, libraries may also find themselves playing a part in the management of intellectual property rights.
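One possible technique for both requirements (not one the paper prescribes) is a persistent identifier whose location can be updated when pages move, together with a cryptographic digest recorded at registration. A sketch with a hypothetical in-memory resolver; note that a digest shows a copy is unchanged since registration but does not by itself prove who published it:

```python
from __future__ import annotations

import hashlib


def fixity(content: bytes) -> str:
    """SHA-256 digest of a document, recorded when it is registered."""
    return hashlib.sha256(content).hexdigest()


class Resolver:
    """Hypothetical persistent-identifier resolver: the identifier never
    changes; only the location behind it is updated when pages move."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}
        self._digests: dict[str, str] = {}

    def register(self, pid: str, url: str, content: bytes) -> None:
        self._locations[pid] = url
        self._digests[pid] = fixity(content)

    def move(self, pid: str, new_url: str) -> None:
        self._locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        return self._locations[pid]

    def is_unchanged(self, pid: str, content: bytes) -> bool:
        # Any altered byte produces a different digest.
        return fixity(content) == self._digests[pid]


resolver = Resolver()
resolver.register("pid:report-1999-01", "http://www.example.org/old/report.html", b"<p>Report</p>")
resolver.move("pid:report-1999-01", "http://www.example.org/archive/report.html")
print(resolver.resolve("pid:report-1999-01"))
print(resolver.is_unchanged("pid:report-1999-01", b"<p>Report</p>"))  # True
```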

A whole new range of standards is required. The new standards are being developed as extensions to emerging W3C base standards, in the form of XML schemas and document type definitions. Cataloguing has evolved to accommodate the needs of electronic data and transformed itself into metadata in such forms as the Encoded Archival Description (EAD), Dublin Core and the MARC DTD. There is more work to do in this area. For example, Anne-Marie Vercoustre of the French research body INRIA found the Dublin Core date and format definitions inadequate for a digital library of photographs. The date a photograph was taken is more important than the date that "the resource was made available", and this distinction also applies to media other than photographs. Format elements such as orientation and size are needed as well.
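The gap can be seen in a record built with the Dublin Core element namespace: simple Dublin Core offers one `date` element, so the extra photograph details need elements from elsewhere. The `photo` namespace and its element names below are hypothetical, chosen only to illustrate such an extension:

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"        # Dublin Core element set
PHOTO = "http://www.example.org/photo-terms/"  # hypothetical extension namespace
ET.register_namespace("dc", DC)
ET.register_namespace("photo", PHOTO)


def photo_record(title: str, available: str, taken: str,
                 orientation: str, size: str) -> str:
    record = ET.Element("record")
    ET.SubElement(record, f"{{{DC}}}title").text = title
    # Simple Dublin Core has a single date element, here holding the date
    # the image was made available; the date taken needs its own element.
    ET.SubElement(record, f"{{{DC}}}date").text = available
    ET.SubElement(record, f"{{{PHOTO}}}dateTaken").text = taken
    ET.SubElement(record, f"{{{DC}}}format").text = "image/jpeg"
    ET.SubElement(record, f"{{{PHOTO}}}orientation").text = orientation
    ET.SubElement(record, f"{{{PHOTO}}}size").text = size
    return ET.tostring(record, encoding="unicode")


print(photo_record("Harbour at dawn", "1999-03-01", "1921-06-14",
                   "landscape", "18 x 24 cm"))
```

Keeping the extension in its own namespace lets a system that understands only Dublin Core ignore the photograph elements safely.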

Potential of Full Interoperability

The standardisation gaps in our area are now closing and the nirvana of total interoperability is in sight. What is the potential of full interoperability, and how will it change the way that libraries operate? Some insights:

  • Greater choice of clients and servers
  • Flexible configuration
  • Robustness: standards-based client/server systems are more stable, as one end can be changed without substantial testing of the other
  • New clients can be test-driven without a total changeover
  • Networks can be created without imposing a uniform system
  • Greater ability to tailor
  • Data access rather than data repetition: each system no longer needs to store complete supplier details, as it accesses the most up-to-date information from a directory
  • Integrated data no longer requires an integrated system
  • Greater ease of development:
    • Tough decisions are already made
    • Avoids proliferation of conversion programs
    • Provides a checklist of functions
    • Provides development guidelines
    • Simplifies parameter settings
    • Exchangeable tool sets provide short cuts and limit testing
    • Easier to find trained personnel for development and implementation

 

The Standards Process

Producing a standard is often a long and labour-intensive process that can also be expensive for the parties involved in its development. Usually, experts from interested parties meet to define first the scope and then the detail, and efforts are made to reach as much consensus as possible. The greater the number of people involved in producing an initial draft, the longer it takes to achieve consensus. Draft documents made by working committees and subgroups are circulated for comment, and then the members of the standards bodies vote. There may be several voting stages; ISO standards, for example, may go through four votes. Traditionally, standards are made at the national or regional (e.g. European) level and then advanced as international proposals, although with increasing internationalism, efforts are being made to achieve international standards from the outset.

Timely Standards

Is the current standards process an optimum way of achieving results? There has been much criticism of the time taken to produce standards. If a standard appears too late, system developers may take diverse and incompatible implementation decisions, and the developers of these "premature implementations" are then often slow to implement the standard once it is finalised. There is a risk that those who develop systems before a standard has been determined will not only resist implementing it, but actively oppose its implementation. The long-awaited solution to holdings search and retrieval may be an example of this: each library system supplier that has created an OPAC using Z39.50 has made proprietary extensions for displaying holdings and borrower information.

When the library, information and documentation standards process is criticised, a comparison is often made with the standards made for the Internet. Many of the Internet standards are small, e.g. relating to one small element such as the definition of a URL. These are registered and then made freely available on the web; no effort is made to promote them. Larger standards are produced by teams of people who work full time on them. For example, the XML family of standards is produced by the W3C, which boasts a full-time staff of 60 people. In contrast, most people working on library standards do so by the grace of their employers and families.

In the ISO arena, the minimum time to produce a standard from scratch is approximately 24 months if it goes through all the draft and voting stages. Nevertheless, drafts deemed to be already well constructed and reviewed can go through the process in about 12 months. While this may still seem too long, once a draft has reached DIS (Draft International Standard) status it can be cited, and preliminary development planning can even begin. Moreover, it is not wise to bypass the comment and discussion process, especially when striving to make a standard with international application. The key to making standards available in a timely fashion is to start with a good-quality draft, and the key to producing a good-quality draft in a timely fashion is to provide funding. This can avoid much circular behaviour.

Much standards work, however, does not need to undergo the "full treatment". Standards such as Z39.50 contain hooks so that new components can be added easily. Once consensus has been reached by the ZIG, new attribute sets, diagnostics and record schemas are given object identifiers and made available on the maintenance agency's web page. Extending an existing standard, rather than inventing a new one, can therefore be quite efficient. Nevertheless, that efficiency can be lost if there is no sound draft for the extension at the outset.
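Registration works because object identifiers form a tree: the Z39.50 arc is 1.2.840.10003, with a separate sub-arc for each kind of object. A toy registry illustrating allocation under those sub-arcs. The class numbers for attribute sets (3), diagnostic sets (4) and record syntaxes (5) match well-known registrations such as Bib-1 and USMARC; the schema class (13) is an assumption to verify against the maintenance agency's list, and the numbers this toy allocates are illustrative only, since real numbers are assigned by the maintenance agency:

```python
# Root arc for Z39.50: iso(1) member-body(2) us(840) ansi-standard-Z39.50(10003).
Z3950_ARC = "1.2.840.10003"

# Sub-arcs per kind of object; "schema" = 13 is an assumption.
CLASSES = {"attribute-set": 3, "diagnostic-set": 4, "record-syntax": 5, "schema": 13}

# Deliberately partial seed list of well-known registrations.
registry = {
    f"{Z3950_ARC}.3.1": "Bib-1 attribute set",
    f"{Z3950_ARC}.4.1": "Bib-1 diagnostic set",
    f"{Z3950_ARC}.5.10": "USMARC record syntax",
}


def register(kind: str, description: str) -> str:
    """Allocate the next free number under a class arc (illustration only)."""
    prefix = f"{Z3950_ARC}.{CLASSES[kind]}."
    used = [int(oid[len(prefix):].split(".")[0])
            for oid in registry if oid.startswith(prefix)]
    oid = f"{prefix}{max(used, default=0) + 1}"
    registry[oid] = description
    return oid


new_oid = register("schema", "Hypothetical holdings schema")
print(new_oid)  # 1.2.840.10003.13.1
```

Once an identifier has been published, clients and servers can refer to the new component by that identifier without any change to the protocol itself.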

Alignment

Standards that should be aligned do not always stay in line. An example was given earlier of the ISO data element enquiry standard (8459-3) falling out of line with its associated protocol standard (ISO 23950). The same is true for part 1 (interloan applications) and part 2 (acquisitions applications). Data element standards define data elements in a standard way and give examples. They pay attention to the need, in the international arena, for unambiguity and precision, which are necessary for cross-language understanding and translation. A recent ISO initiative has addressed this problem by relaxing ISO's policies on web distribution to allow ISO subcommittees to nominate essential reference standards that should be available on the web. At the recent meeting in Paris in May 1999, the data elements standard, ISO 8459, was nominated in this category, opening up the opportunity to make it available on the web. The next step for this standard is to design a framework for consolidating all five parts and to investigate a mechanism for its continuing maintenance.

Profiles, too, need to stay aligned with the standards on which they are based. ISO International Standardized Profiles never had a chance of staying aligned, because all changes to them underwent the same draft and voting procedures as those for the protocols themselves. To overcome this, ISO TC46/SC4 has defined a new category of profile, the International Registered Profile (IRP). IRPs are designed to exist only as electronic documents, available over the web, and are expected to keep pace with their parent standard. Two Z39.50 profiles have already been through this process: the Union Catalogue Profile (UCP) and the CIMI profile for museum and cultural data.

The Challenges

We must strive to build on the foundation stones that we currently have to create the sound interoperability platform that libraries need in order to provide coherent and valid services to their users. With standards-based systems, the library will be able to offer its users, on their information journey, a service analogous to that of a travel agent, a consulate and a bank combined. The travel agent function covers identification of resources, packaging and marketing, reservation and voucher provision; the consulate function covers authentication and assistance in case of trouble; and the bank function covers the funding of certain services.

What can libraries and librarians do to contribute? Firstly, libraries must create an environment that is favourable to the development of standards. They should collectively create more precise tender documents that will ensure the correct implementation of existing standards and will stimulate their vendors into participating in the standards process. This requires a deeper understanding of standards, which can be gained from sources such as implementation documentation and implementation seminars. Means of accrediting systems that conform to standards and profiles, similar to those used for year 2000 compliance, are important in ensuring correct and complete implementations. Librarians need to work together with standards bodies to evolve accrediting methods and checklists. Funding needs to be found for the entire range of standards activity, from creation to promotion, distribution and accreditation. Library organisations should encourage experts willing to participate in the standards process and should cooperatively contribute to funding the process.
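At its simplest, an accreditation checklist ticks each search a profile requires against what a system declares it supports. A sketch with a hypothetical four-item profile expressed as Bib-1 (Use, Structure) attribute pairs; real profiles are far more detailed, including the behaviour expected for each search:

```python
from __future__ import annotations

# Hypothetical profile: each required search is a Bib-1 (Use, Structure) pair.
# Use: 4 = title, 1003 = author, 1016 = any, 7 = ISBN.
# Structure: 1 = phrase, 2 = word.
PROFILE_REQUIRED = {
    "keyword title": (4, 2),
    "keyword author": (1003, 2),
    "keyword anywhere": (1016, 2),
    "ISBN": (7, 1),
}


def conformance_report(supported: set[tuple[int, int]]) -> dict[str, bool]:
    """Tick each required search against what a system declares it supports."""
    return {name: pair in supported for name, pair in PROFILE_REQUIRED.items()}


def missing(supported: set[tuple[int, int]]) -> list[str]:
    return [name for name, ok in conformance_report(supported).items() if not ok]


declared = {(4, 2), (1003, 2), (7, 1)}
print(missing(declared))  # ['keyword anywhere']
```

A declaration is only the starting point: accreditation would also test that each supported search actually behaves as the profile specifies.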