
Report on the NISO Attribute Set Architecture Meeting

March 3, 1998
Washington, DC


Background
On March 3, 1998, NISO held an invitational meeting in Washington, DC to discuss the array of issues surrounding the development and maintenance of attribute sets for the Z39.50 protocol (ANSI/NISO Z39.50). In Z39.50, attribute sets define the access points and operators by which searches can be conducted. Several attribute sets have been developed and others are in development. Version 3 (1995) of Z39.50 allows a single query to reference two or more attribute sets, although few implementations appear to be taking advantage of this capability. The Z39.50 Implementors Group (ZIG) recognized last year that there was a need for an architecture that would give guidance to communities developing attribute sets for Z39.50. This architecture would define such things as what types of attributes could go into attribute sets, what the rules were for combining various elements in attribute sets, and how the attribute sets developed by different communities related to each other. A subgroup of the ZIG developed a Z39.50 attribute set architecture. That document can be found at:

http://lcweb.loc.gov/z3950/agency/orlando/output/attrarch.html

The invitational meeting came about as a result of that document and NISO's desire to advance attribute set development. Participants recognized that this would need to be an ongoing process, of which the March 3 meeting was only the beginning.

The invitation to the meeting stated:

NISO is aware of interest in new attribute sets to address retrieval of a number of different types of metadata, including archival finding aids, MARC bibliographic records, and the Dublin Core. Concurrently there is an acknowledged need for a revision of "bib-1" to better address bibliographic data generally. Existing attribute sets such as GILS and STAS may also benefit from revision in light of the new architecture.

However, while there is a clear need to proceed with these initiatives, there is also a need to do so in a coordinated fashion. The definition and scope of one attribute set may have direct impact on the requirements for another. Also, as there is no prior experience or precedent for using the new attribute architecture, guidelines and best practices will have to be worked out within the community of Z39.50 users.

The NISO meeting on March 3 brought together technical implementers and practitioners with functional interest in Z39.50 retrieval.

The goal of the meeting was to establish a common understanding of the new attribute architecture and to plan a coordinated approach to attribute set definition. Specific questions to be addressed included:

  • what possibilities does the new attribute architecture open up, and what constraints does it impose?
  • what specific attribute sets can we determine would be immediately useful?
  • how would these relate to each other in such a way that maximizes utility and minimizes redundancy?
  • under whose auspices should these sets be developed and maintained?
  • should any or all attribute sets become official standards, and under what structure?
  • how do we ensure coordinated development among these different communities?
Joel Baron, the chair of the NISO Board, opened the meeting by setting the context for the problems and for what was being attempted for the one day meeting. He described some of the problems as he saw them, and some of the outcomes he hoped might be accomplished by the end of the day.

Context for the new attribute set architecture
Baron was followed by Clifford Lynch, the chair of the ZIG group that had developed the attribute architecture document. Lynch discussed the document and provided some context for it. He reviewed the history of Z39.50 and how the protocol developed; some of the problems with current attribute sets, including maintenance, management, and responsibility issues; and some of the political context for the architecture document, including dealing with the installed user base and migration issues. He also discussed the ZIG itself and how its growth and development made an attribute architecture necessary. In the early days of Z39.50 the ZIG was a more unified group focusing on the use of Z39.50 in a bibliographic context. As the use of Z39.50 expanded to other communities and the membership of the ZIG grew to include them, it became clear that the ZIG did not have the expertise to develop attribute sets for all of the communities that wished to use Z39.50. A consensus emerged in the ZIG that it should be responsible for protocol development and that, except for attribute sets directly related to the operation of the protocol, attribute set development should be left to the communities that were going to use them. The ZIG also recognized that it needed to develop guidelines and procedures for those communities so that attribute sets could be developed in a systematic manner. This led to the development of the attribute architecture document (http://lcweb.loc.gov/z3950/agency/orlando/output/attrarch.html).

Lynch also discussed some of the problems with current attribute sets, including the lack of consistent semantics and the problems that occur when new attribute sets embed attributes from other sets and change their semantics. Under Version 2 of Z39.50 only one attribute set could be used in a query. Thus communities that wanted to make use of attributes from other sets needed to embed those attributes in their own sets. This led to synchronization problems as the various attribute sets developed. Under Version 3 multiple attribute sets per query are permitted, but there are problems in dealing with queries that use multiple attribute sets with conflicting semantics. All of these issues were further motivation for the development of an attribute set architecture.
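
The difference between the two protocol versions can be pictured with a small Python sketch. It is illustrative only and not drawn from the standard or the architecture document: the class and function names, the set labels, and the attribute value 9001 are assumptions made for this example. Bib-1 Use attribute 4 (Title) serves as the familiar case.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

BIB1 = "bib-1"           # placeholder label for the example, not an OID
OTHER_SET = "other-set"  # hypothetical second attribute set


@dataclass(frozen=True)
class Attribute:
    attr_type: int                  # e.g. 1 = Use in Bib-1
    value: int                      # e.g. 4 = Title in Bib-1
    attr_set: Optional[str] = None  # None means "inherit the query default"


@dataclass(frozen=True)
class Query:
    default_set: str
    attributes: List[Attribute]


def sets_used(query: Query) -> Set[str]:
    """Return every attribute set the query draws on."""
    return {a.attr_set or query.default_set for a in query.attributes}


def expressible_in(query: Query, version: int) -> bool:
    """Version 2 allows one attribute set per query; Version 3 allows several."""
    if version >= 3:
        return True
    return sets_used(query) <= {query.default_set}


single = Query(BIB1, [Attribute(1, 4)])
# 9001 is a made-up value standing in for a community-specific attribute.
mixed = Query(BIB1, [Attribute(1, 4), Attribute(1, 9001, OTHER_SET)])
```

In this model a Version 2 community that wants the `OTHER_SET` attribute has no choice but to copy it into its own set, which is the embedding practice Lynch described as the source of synchronization problems.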

Lynch outlined the working assumptions of the document:

  • it is designed for use with Version 3 of the protocol
  • it does not attempt to handle Version 2; it will take time to phase in as current Version 2 implementations upgrade, and
  • the architecture was designed to fit within the current protocol definition without any changes needed to that definition, specifically the current query type.
Lynch also discussed the bibliographic attribute set Bib-1. He noted that there is wide consensus in the Z39.50 community that Bib-1 has major interoperability problems and there is a need for a new attribute set to replace it. Such an attribute set would be developed by the bibliographic community within the context of the new architecture definition. The migration path would be to leave the current Bib-1 alone and migrate to a new set. Bib-1 would continue to be extended as needed to handle short term needs. He also discussed some of the concerns raised by the international community on the process by which such an attribute set would be developed and the need for wide international participation in such a process. (Issues of how to get international participation in attribute set development in general were discussed later in the meeting.)

Finally, Lynch reviewed the architecture itself and described how it defined a template of attribute types and the interaction among those types. Communities developing attribute sets would pick the types defined by the architecture that were meaningful for the data types and applications for which they were going to use Z39.50. He discussed some of the controversial issues in the document, including the distinction between access points and database fields, the somewhat arbitrary classification of the attribute types into 8 facets, and the strong use of datatyping. He also mentioned that the document recommends neutral names for attribute sets in order to avoid the political issues of which attribute sets are "better" than others.

There was a question of how this architecture would work with any new versions of the protocol and a decision was made to highlight anything in the document that could be a potential problem as the protocol evolved.

This was followed by a discussion as to whether the vendor community would adopt this new architecture and attribute sets developed under it, and also if this new architecture would help solve some of the current interoperability problems. Joel Baron said the NISO Board’s vendor relations committee might be a vehicle to address the vendor support.

On the interoperability issue there was discussion as to whether what was really needed was:

  • one universal attribute set that encompassed all of the attributes developed by all communities using Z39.50; or
  • a series of modular attribute sets that could be combined in queries as envisioned by the architecture document.
Other concerns expressed during this discussion included the perception that introducing a new attribute architecture would make Z39.50 appear unstable, and some desire to fix perceived problems in the protocol itself, especially the type 1 query, at the same time the new architecture was being introduced. There was also general discussion on whether there had been enough ZIG and other international participation in the development process for the architecture.

Existing and proposed attribute sets
This was followed by a review of existing and proposed attribute sets. Ray Denenberg from LC gave a brief overview of attribute sets, both existing and proposed. He classified existing sets as falling into three classes: Bibliographic, Profile/Application Support, and Protocol Support. Currently proposed new attribute sets were in two classes: Bibliographic, and Architecture Support. This introduction was followed by short presentations on many of the attribute sets:

Bib-1: Lennie Stovel of RLG gave a short presentation on the Bib-1 attribute set. She discussed the various attribute types in the set, gave some details on its history, presented measurements of how the various types have grown over time, and showed how Bib-1 related to the new architecture. She concluded with some of the problems with Bib-1, including unclear semantics, undocumented usage, overloading of meaning, unclear relationship to other sets, and lack of guidelines for extension.

STAS (Scientific and Technical Attribute Set): Les Wibberley (Chemical Abstracts Service) reviewed STAS, which is both an attribute set and a tag set designed to be used with scientific and technical data. STAS was specifically designed with an emphasis on precision and is intended to handle data that cannot be effectively searched with Bib-1. However, it is a superset of Bib-1 and incorporates the Bib-1 attributes.

GILS (Government Information Locator Service): Eliot Christian (U.S. Geological Survey) described the GILS profile and the attribute set defined in it, and discussed its history and evolution from something that initially focused on numeric and earth science data to a system that can be used for accessing government information. He discussed some of its recent developments, including its use in the G7 and by state and local governments, and how it has been cross-fed with the work of the geospatial community. GILS also incorporates Bib-1.

CIMI (Consortium for the Interchange of Museum Information): Bill Moen (University of North Texas) described the CIMI profile and attribute set, a Z39.50 profile for providing access to museum information. He explained that the goal of CIMI was to explore new attribute structure beyond what was available in Bib-1 and to incorporate other work going on in the museum community. He also described an interoperability testbed that was run to test portions of the profile. CIMI also incorporates attributes from Bib-1, and it has been and will continue to be focused on cross domain searching.

GEO/CIP Profiles: Doug Nebert (U.S. Geological Survey) and Lou Reich (NASA) reported on this work, which focuses on geospatial data. The CIP profile also focuses on interoperability among satellite imaging systems and the major space agencies. It was noted that efforts have been made to harmonize these profiles and identify overlapping attributes.

Two potential new attribute sets were also described:

Dublin Core: The Dublin Core (DC) is a core set of metadata elements with very loosely defined semantics intended to be able to interoperate among a wide range of communities. Ralph LeVan discussed some of the ways that have been proposed in the ZIG to define Dublin Core attributes both by defining a new attribute set for version 3 and by incorporating them into Bib-1 for version 2 systems. The initial proposal was to define those new DC elements that do not exist in Bib-1 and to use the Bib-1 attributes that duplicated DC elements. This was rejected because of the loose semantics of DC. Current efforts are focusing on adding all DC elements to Bib-1 with separate enumerations. This brings up the issue of what to do about other attribute sets that have already inherited Bib-1 or have defined DC already.

MARC: Larry Dixson (Library of Congress) described this as an attribute set that defines mechanisms for searching directly with MARC tags and subfields. That is, instead of referring to a use attribute of title, a query can refer directly to a MARC tag with value "245". It also embeds Bib-1 so that queries can refer to more generic elements when appropriate. The MARC attribute set has been defined but not yet implemented.
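
The contrast between a generic use attribute and a direct MARC tag reference might be sketched as follows in Python. This is a rough illustration under stated assumptions, not the MARC attribute set as defined: the function name, the dictionary layout, and the set labels are hypothetical. Only the Bib-1 Use value 4 for Title and the MARC title tag 245 are taken as given.

```python
from typing import Dict, Optional

BIB1_USE_TITLE = 4  # Bib-1 Use attribute value for Title


def title_search(term: str, marc_tag: Optional[str] = None) -> Dict[str, object]:
    """Build a title search term, either generic or tied to a MARC tag.

    The returned dictionary layout is an assumption made for this sketch.
    """
    if marc_tag is None:
        # Generic access point: the server decides which fields count as "title".
        return {"set": "bib-1", "use": BIB1_USE_TITLE, "term": term}
    if not (len(marc_tag) == 3 and marc_tag.isascii() and marc_tag.isdigit()):
        raise ValueError("this sketch expects a three-digit MARC tag such as '245'")
    # Direct reference: the query names the MARC field itself.
    # "marc" is a hypothetical label for the MARC attribute set.
    return {"set": "marc", "tag": marc_tag, "term": term}


generic = title_search("hamlet")
direct = title_search("hamlet", marc_tag="245")
```

The generic form leaves the choice of fields to the server, while the tag form trades that flexibility for precision about which field is searched.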

Other attribute sets were also briefly described. These included:

  • Explain and Extended Services sets defined in the Z39.50 profile
  • ZDSR profile for simple distributed search and retrieval (while the ZDSR profile is no longer in use, some of its concepts such as language, weight, and ranking have been reflected in the architecture document)
  • An attribute set for legislative data developed by LC
  • An attribute set defined in the Digital Collections profile for navigating among collections; and,
  • Two attribute sets called for in the attribute architecture document. These are an attribute set of basic use attributes for widespread interoperability among Z39.50 systems developed by different communities, and an attribute set that contains basic functions such as commonly used operators and mechanisms for query management.
Much of the rest of the meeting discussion focused on those two sets and the proposed new attribute set for bibliographic use to replace Bib-1.

One of the goals of the presentation and description of the various attribute sets was to give participants a common understanding of what attribute sets exist, how they were developed, some of the problems that are common to all current attribute sets, and those problems that are attribute set specific. The presentations underscored the problems that have developed without an overriding architecture, specifically the problem of name space clashes that have occurred with multiple independently developed sets all attempting to reuse all or portions of a single set, Bib-1, which in itself has serious design problems. This was intended to set the framework for the rest of the day's discussions.
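
A name space clash of the kind described above can be made concrete with a short Python sketch. The two derived sets and the values 9001 and "Topic" are invented for illustration and do not correspond to any real attribute set. The Bib-1 Use values shown (4 Title, 21 Subject heading, 1003 Author) are the only entries taken from an existing set.

```python
from typing import Dict, List, Tuple

# A few real Bib-1 Use attribute values.
BIB1_USE: Dict[int, str] = {4: "Title", 21: "Subject heading", 1003: "Author"}

# Two hypothetical community sets that each copy Bib-1 and then extend it
# independently. Both pick 9001 for a new attribute, and SET_B also
# redefines the meaning of 21.
SET_A: Dict[int, str] = {**BIB1_USE, 9001: "Chemical name"}
SET_B: Dict[int, str] = {**BIB1_USE, 9001: "Museum object name", 21: "Topic"}


def find_clashes(a: Dict[int, str], b: Dict[int, str]) -> List[Tuple[int, str, str]]:
    """List values that both sets define but with different meanings."""
    return [(v, a[v], b[v]) for v in sorted(a.keys() & b.keys()) if a[v] != b[v]]


clashes = find_clashes(SET_A, SET_B)
```

With per-attribute set identifiers, as Version 3 permits, the two meanings of 9001 could coexist in one query; without them, a client cannot tell which meaning a server intends.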

Major issues in attribute set development
Lennie Stovel then led a general discussion on attribute set issues. She asked the participants to identify those issues they thought needed to be addressed, both issues in general and specific issues that were brought out by the previous presentations. Among the issues raised were:

  • Need for formal statement of what it was that was attempted to be accomplished by this process
  • How do attribute sets interrelate
  • Developing procedures for authorizing attribute set development - how to manage and coordinate development process
  • How to get broader participation in the discussion of the issues
  • How many attribute sets are needed - universal versus modular architecture
  • Version 2 versus Version 3 issues with the architecture - what are requirements for backward compatibility
  • Operational issues of multiple attribute sets
  • How to extend the architecture itself if new types are needed
  • How does Z39.50 attribute work relate to other activities such as EDI, SQL, XML, RDF - and how could Z39.50 attribute sets be used in other contexts
  • Need for guidelines for developers
  • Potential need for registry of implementors of an attribute set to propagate notification of changes
  • Utility attribute sets called for in architecture document - what should they contain
There was some further discussion on whether there should be one universal attribute set or a modular architecture encompassing many sets. Due to the differing needs of multiple communities (where some communities might define attributes for a specific context) and because of the difficulties in maintaining a universal set, there was consensus in support of the modular architecture model. However, because of the large installed base of Version 2 implementations, some attendees favored the development of a 'virtual' attribute set that would encompass the others and be Version 2 compatible.

Basic use attributes and functions
Much of the rest of the discussion of the meeting revolved around the two attribute sets called for in the architecture document. The discussion started out with the attribute sets as defined in the document--a basic set of use attributes and a set of basic functions. These became known in the discussion as the utility set and the basic function set. The original idea for the utility set was to pre-define a group of very commonly used use attributes; all other attribute sets would then not have to embed these attributes, as queries could refer to the utility set. As discussion progressed it became clear that most participants concluded that the utility set as originally proposed would be very small and not very useful. However, there was substantial agreement that an attribute set specifically designed for cross domain searching was needed.

This led to a discussion of whether or not the Dublin Core should be used for a cross-domain set, or, if not, what the relationship should be to the Dublin Core. Some felt that it should not be the Dublin Core because of problems of ownership, while others felt that a lot of international effort had gone into developing DC and that strong justification would be needed as to why the DC was not applicable. There was strong consensus that what was needed was a rigorous scope statement that would define exactly what the scope and purpose of such an attribute set would be. There was also discussion about whether this set would contain only use attributes or other attribute types as well; what its relationship to the utility set was (whether there should be one set or two, since the two seemed interrelated); whether the utility set should consist only of non-use attributes, or whether a small set of functional use attributes belonged in it but not in the cross-domain set; and what the process should be by which such sets would be developed. There was also some discussion of who would own these sets, with the thought that the Z39.50 Maintenance Agency would probably be responsible for maintaining them.

A rough but not unanimous consensus evolved that:

  • an attribute set for cross-domain searching should be developed
  • that it should not be the Dublin Core but rather be informed by the DC and make use of DC elements where appropriate
  • that development would not be a ZIG activity since it was not protocol related (although certainly individuals who also participate in the ZIG might be involved in the development effort), and
  • that probably the most important thing that was needed at this point in the process was a strong scope statement to define the task.
During the discussion, the idea of a basic functions set evolved to what became known as a mechanical set with the thought that it may or may not include some use attributes. The initial thought was that the mechanical set could be developed by the ZIG, while use attribute sets should be developed by communities of users. However, there was some sentiment that there were enough interdependencies between what you could search and how you could search, that there should be some intersection of developers among the two sets. This was not resolved but noted as an outstanding issue.

There was also some discussion about what mechanisms and procedures would be needed if it became necessary to add new types to attribute types defined in the architecture document. The architecture group attempted to include all types that would be useful, and in fact surveyed all known existing attribute sets to make sure the architecture could handle the capabilities developers needed. The consensus arose that it was probably premature to deal with this issue, but that any new attribute type added would have profound implications for pre-existing sets developed under the architecture and thus should only proceed after wide community discussion.

There was also discussion of the replacement bibliographic attribute set. (This set was referred to for convenience as Bib-2. However, there was consensus that Bib-2 would not be its name, in order to avoid the implication that it was of lesser value than Bib-1.) The consensus was that this was just another application domain specific set and that ideally its overlap with the cross domain set would be minimal. There was discussion of timing and the need for the cross domain and mechanical sets to be in place before Bib-2 (and other sets) were developed to avoid potential duplication, or at least for developers of new application specific sets to be aware of what was happening with those two sets to help inform their work.

There was a general consensus that policies and procedures for attribute sets were important and would need to be developed. These would include such things as who was allowed to register attribute sets that were publicly available, what type of vetting an attribute set needed to have to ensure compliance with the architecture before it could be registered, what guarantees of maintenance and support were required, and related issues along these lines.

Next steps
Most of the remaining discussion revolved around procedural issues and how best to ensure wide participation, both in the development of Bib-2 and in the development of the cross domain and mechanical attribute sets. Ideas included holding a workshop where participants could present position papers, holding electronic discussions, and ensuring that future meetings were held in conjunction with ZIG meetings and/or other events that were likely to draw wide international representation. There was also a sense that the types of discussions held during this meeting were needed with other individuals who were not able to be present, and that a repeat of these discussions, perhaps in a European venue, might be needed. Cross domain searching came up as a possible topic for a future meeting. One idea that came up for the development of Bib-2 was that it would be done as a NISO standards activity with wide international participation. Under this idea the WWW Consortium model might be used to allow members of the committee, even if they did not represent NISO voting members, to have a veto in the committee on any standard developed before it went out to the NISO voting members. Another suggestion was to ask NISO to propose solutions and mechanisms for how some of these issues should be addressed.

To define next action steps, Priscilla Caplan, as the chair of the NISO Standards Development Committee, will organize a conference call with the participants who volunteered to discuss concrete mechanisms for moving the process forward. The following persons asked to be included in the conference call: Ray Denenberg, Mark Hinnebusch, Ralph LeVan, Cliff Lynch, Mark Needleman, Lennie Stovel.

In sum, the discussions revealed that there are still a lot of issues, both technical and procedural, remaining to be resolved. Many ideas and viewpoints emerged during the day, and there was a recognition by the participants that this was the beginning of what will be a larger process with broader participation that will be needed to bring many of the issues discussed to fruition.

Persons Attending the March 3, 1998 Attribute Set Meeting:
Joel H Baron
Priscilla Caplan
Eliot Christian
Ray Denenberg
Larry Dixson
Eric G. Ferrin
Michael Fox
Patricia Harris
Janet Hylton
Ralph LeVan
Clifford Lynch
William E. Moen
Nassib Nassar
Doug Nebert
Mark H. Needleman
Sara Randall
Lou Reich
Mackenzie Smith
Lennie Stovel
Fay Turner
Les Wibberley