Report on the NISO Attribute Set Architecture Meeting
March 3, 1998
Washington, DC
Background
On March 3, 1998 NISO held an invitational meeting in Washington, DC to
discuss the array of issues surrounding the development and maintenance of
attribute sets for the Z39.50 protocol (ANSI/NISO Z39.50). In Z39.50,
attribute sets define the access points and operators by which searches can
be conducted. Several attribute sets have been developed and others are in
the development process. In Version 3 (1995) of Z39.50, the protocol allows
a single query to reference two or more attribute sets, although few
implementations appear to be taking advantage of this capability. The Z39.50
Implementors Group (ZIG) recognized last year that there was a need to
develop an architecture that would give guidance to communities that were
developing attribute sets for Z39.50. This architecture would define such
things as what types of attributes could go into attribute sets, what the
rules were for combining various elements in attribute sets, and how the
various attribute sets that were developed by different communities related
to each other. A subgroup of the ZIG developed a Z39.50 attribute set
architecture. That document can be found at:
http://lcweb.loc.gov/z3950/agency/orlando/output/attrarch.htmll
The invitational meeting came about as a result of that document and the desire by NISO to move forward the process by which attribute set development could be furthered. It was recognized at that meeting that this would need to be an ongoing process of which the March 3rd meeting was only the beginning.
The invitation to the meeting stated:
NISO is aware of interest in new attribute sets to address retrieval of a number of different types of metadata, including archival finding aids, MARC bibliographic records, and the Dublin Core. Concurrently there is an acknowledged need for a revision of "bib-1" to better address bibliographic data generally. Existing attribute sets such as GILS and STAS may also benefit from revision in light of the new architecture.
However, while there is a clear need to proceed with these initiatives, there is also a need to do so in a coordinated fashion. The definition and scope of one attribute set may have direct impact on the requirements for another. Also, as there is no prior experience or precedent for using the new attribute architecture, guidelines and best practices will have to be worked out within the community of Z39.50 users.
The NISO meeting on March 3 brought together technical implementers and practitioners with functional interest in Z39.50 retrieval.
The goal of the meeting was to establish a common understanding of the new attribute architecture and to plan a coordinated approach to attribute set definition. Specific questions to be addressed included:
- what possibilities does the new attribute architecture open up, and what constraints does it impose?
- what specific attribute sets can we determine would be immediately useful?
- how would these relate to each other in such a way that maximizes utility and minimizes redundancy?
- under whose auspices should these sets be developed and maintained?
- should any or all attribute sets become official standards, and under what structure?
- how do we ensure coordinated development among these different communities?
Context for the new attribute set architecture
Baron was followed by Clifford Lynch, the chair of the ZIG group that had
developed the attribute architecture document. Lynch discussed the document
and provided some context for it. He reviewed the history of Z39.50 and how
the protocol developed, some of the problems with current attribute sets
including maintenance, management and responsibility issues as well as some
of the political context for the architecture document including dealing
with installed user base and migration issues. He also discussed the ZIG
itself and how its growth and development made necessary the development of
an attribute architecture. In the early days of Z39.50 the ZIG was a more
unified group focusing on the use of Z39.50 in a bibliographic context. As
the use of Z39.50 expanded to other communities and the resultant expansion
of the membership of the ZIG to include those communities, it became clear
that the ZIG did not have the expertise to develop attribute sets for all of
the communities that wished to use Z39.50. A consensus emerged in the ZIG
that it should be responsible for protocol development and that, except for
attribute sets directly related to the operation of the protocol, attribute
set development should be left to the communities that were going to use
them. There came a recognition that the ZIG needed to develop guidelines and
procedures for those communities so that attribute sets could be developed
in a systematic manner. This led to the development of the attribute
architecture document (http://lcweb.loc.gov/z3950/agency/orlando/output/attrarch.htmll).
Lynch also discussed some of the problems with current attribute sets, including the lack of consistent semantics and the problems that occur when new attribute sets embed attributes from other sets and change their semantics. Under version 2 of Z39.50 only one attribute set could be used in a query, so communities that wanted to make use of attributes from other sets needed to embed those attributes in their own sets. This led to synchronization problems as the various attribute sets developed. Version 3 permits multiple attribute sets per query, but there are problems in dealing with queries that use multiple attribute sets with conflicting semantics. All of these issues were further motivation for the development of an attribute set architecture.
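The difference between the two protocol versions can be illustrated with a minimal sketch. It is not taken from the architecture document: the class and function names are hypothetical, and COMMUNITY_SET is an invented identifier standing in for a community-defined set. The only values drawn from the Bib-1 definition are its object identifier and the Use attribute (type 1) value 4 for Title.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Tuple

BIB1 = "1.2.840.10003.3.1"   # object identifier of the Bib-1 attribute set
COMMUNITY_SET = "1.2.3.4.5"  # hypothetical identifier, for illustration only


@dataclass(frozen=True)
class Attribute:
    type: int                            # e.g. 1 = Use in Bib-1
    value: int                           # e.g. 4 = Title among Bib-1 Use values
    attribute_set: Optional[str] = None  # None: fall back to the query's default set


@dataclass(frozen=True)
class Term:
    text: str
    attributes: Tuple[Attribute, ...]


def sets_used(default_set: str, terms: Sequence[Term]) -> Set[str]:
    """Collect every attribute set identifier referenced by the query's terms."""
    return {a.attribute_set or default_set for t in terms for a in t.attributes}


def fits_version_2(default_set: str, terms: Sequence[Term]) -> bool:
    """A version 2 query names a single attribute set for the whole query."""
    return sets_used(default_set, terms) <= {default_set}


title = Term("hamlet", (Attribute(1, 4),))
other = Term("example", (Attribute(1, 17, COMMUNITY_SET),))  # value 17 is arbitrary

print(fits_version_2(BIB1, [title]))         # True
print(fits_version_2(BIB1, [title, other]))  # False
```

Under this model, a community limited to version 2 would have to copy the Title attribute into its own set to express the second query, which is the embedding practice described above.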
Lynch outlined the working assumptions of the document:
- it is designed for use with Version 3 of the protocol
- it does not attempt to handle version 2; it will take time to phase in as current v2 implementations upgrade, and
- the architecture was designed to fit within the current protocol definition without any changes needed to that definition, specifically the current query type.
Finally, Lynch reviewed the architecture itself, and described how it defined a template of attribute types and defined the interaction among those types. Communities developing attribute sets would pick the types defined by the architecture that were meaningful for the data types and applications for which they were going to use Z39.50. He discussed some of the controversial issues in the document including the distinction between access points and database fields, the somewhat arbitrary classification of the attribute types into 8 facets, and the strong use of datatyping. He also mentioned the document recommends neutral names for attribute sets in order to avoid the political issues of which attribute sets are "better" than others.
There was a question of how this architecture would work with any new versions of the protocol and a decision was made to highlight anything in the document that could be a potential problem as the protocol evolved.
This was followed by a discussion as to whether the vendor community would adopt this new architecture and attribute sets developed under it, and also if this new architecture would help solve some of the current interoperability problems. Joel Baron said the NISO Board's vendor relations committee might be a vehicle to address the vendor support.
On the interoperability issue there was discussion as to whether what was really needed was:
- one universal attribute set that encompassed all of the attributes developed by all communities using Z39.50; or
- a series of modular attribute sets that could be combined in queries as envisioned by the architecture document.
Existing and proposed attribute sets
This was followed by a review of existing and proposed attribute sets. Ray
Denenberg from LC gave a brief overview of attribute sets, both existing and
proposed. He classified existing sets as falling into three classes:
Bibliographic, Profile/Application Support, and Protocol Support. Currently
proposed new attribute sets were in two classes: Bibliographic, and
Architecture Support. This introduction was followed by short presentations
on many of the attribute sets:
Bib-1: Lennie Stovel of RLG gave a short presentation on the Bib-1 attribute set. She discussed the various attribute types in the set, gave some details on its history, measurement of how the various types have grown over time, and showed how Bib-1 related to the new architecture. She concluded with some of the problems with Bib-1 including unclear semantics, undocumented usage, overloading of meaning, unclear relationship to other sets, and lack of guidelines for extension.
STAS (Scientific and Technical Attribute Set): Les Wibberley (Chemical Abstracts Service) reviewed STAS, which is both an attribute set and a tag set designed to be used with scientific and technical data. STAS was specifically designed with an emphasis on precision and is intended to handle data that cannot be effectively searched with Bib-1. However, it is a superset of Bib-1 and incorporates the Bib-1 attributes.
GILS (Government Information Locator System): Eliot Christian (U.S. Geological Survey) described the GILS profile and the attribute set defined in it and discussed its history and evolution from something that initially focused on numeric and earth science data to a system that can be used for accessing government information. He discussed some of its recent developments including use in the G7 and by state and local governments and how it has been crossfed with work of the geospatial community. GILS also incorporates Bib-1.
CIMI (Consortium for the Interchange of Museum Information): Bill Moen (University of North Texas) described the CIMI profile and attribute set, a Z39.50 profile for providing access to museum information. He said that the goal of CIMI was to explore new attribute structure beyond what was available in Bib-1 and to incorporate other work going on in the museum community. He also described an interoperability testbed that was run to test portions of the profile. CIMI also incorporates attributes from Bib-1. CIMI has focused, and will continue to focus, on the area of cross-domain searching.
GEO/CIP Profiles: Doug Nebert (U.S. Geological Survey) and Lou Reich (NASA) reported on this work, which focuses on geospatial data. The CIP profile also focuses on interoperability among satellite imaging systems and the major space agencies. It was noted that efforts have been made to harmonize these profiles and identify overlapping attributes.
Two potential new attribute sets were also described:
Dublin Core: The Dublin Core (DC) is a core set of metadata elements with very loosely defined semantics intended to be able to interoperate among a wide range of communities. Ralph LeVan discussed some of the ways that have been proposed in the ZIG to define Dublin Core attributes both by defining a new attribute set for version 3 and by incorporating them into Bib-1 for version 2 systems. The initial proposal was to define those new DC elements that do not exist in Bib-1 and to use the Bib-1 attributes that duplicated DC elements. This was rejected because of the loose semantics of DC. Current efforts are focusing on adding all DC elements to Bib-1 with separate enumerations. This brings up the issue of what to do about other attribute sets that have already inherited Bib-1 or have defined DC already.
MARC: Larry Dixson (Library of Congress) described this as an attribute set that defines mechanisms for searching directly with MARC tags and subfields. That is, instead of referring to a use attribute of title, a query can refer directly to a MARC tag with value "245". It also embeds Bib-1 so that queries can refer to more generic elements when appropriate. The MARC attribute set has been defined but not yet implemented.
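The contrast between a generic use attribute and a direct MARC tag reference can be sketched as follows. This is an illustration only, not the MARC attribute set as defined: the function names, the record contents, and the server-side mapping of "title" to tags 245 and 740 are all assumptions made for the example.

```python
from typing import Dict, List

# Toy record keyed by MARC tag; the field contents are invented for illustration.
Record = Dict[str, List[str]]

# Hypothetical server-side mapping from a generic access point to MARC tags.
ACCESS_POINT_TAGS: Dict[str, List[str]] = {"title": ["245", "740"]}


def match_tag(record: Record, tag: str, term: str) -> bool:
    """Match the term against one MARC tag named directly in the query."""
    return any(term.lower() in value.lower() for value in record.get(tag, []))


def match_access_point(record: Record, name: str, term: str) -> bool:
    """Match the term against every tag the server maps to the access point."""
    return any(match_tag(record, tag, term) for tag in ACCESS_POINT_TAGS.get(name, []))


record: Record = {"245": ["Hamlet"], "740": ["Tragedy of the Prince of Denmark"]}

print(match_access_point(record, "title", "denmark"))  # True
print(match_tag(record, "245", "denmark"))             # False
```

The generic search depends on a mapping the client cannot see, while the tag-based search states exactly which field is meant. That precision is what a tag-level attribute set offers, at the cost of tying the query to the MARC format.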
Other attribute sets were also briefly described. These included:
- Explain and Extended Services sets defined in the Z39.50 profile
- ZDSR profile for simple distributed search and retrieval (while the ZDSR profile is no longer in use, some of its concepts such as language, weight, and ranking have been reflected in the architecture document)
- An attribute set for legislative data developed by LC
- An attribute set defined in the Digital Collections profile for navigating among collections; and,
- Two attribute sets called for in the attribute architecture document. These are an attribute set of basic use attributes for widespread interoperability among Z39.50 systems developed by different communities, and an attribute set that contains basic functions such as commonly used operators and mechanisms for query management.
One of the goals of the presentation and description of the various attribute sets was to give participants a common understanding of what attribute sets exist, how they were developed, some of the problems that are common to all current attribute sets, and those problems that are attribute set specific. The presentations underscored the problems that have developed without an overriding architecture, specifically the problem of name space clashes that have occurred with multiple independently developed sets all attempting to reuse all or portions of a single set, Bib-1, which in itself has serious design problems. This was intended to set the framework for the rest of the day's discussions.
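The kind of name space clash described here can be made concrete with a small sketch. Two hypothetical sets each start from a copy of a shared base and then diverge. Apart from the Bib-1 Use values 4 (Title) and 1003 (Author), every value and meaning below is invented for illustration and does not describe any real attribute set.

```python
from typing import Dict, List, Tuple

# Each set maps a Use attribute value to the meaning its community gives it.
BASE: Dict[int, str] = {4: "title", 1003: "author"}

# Hypothetical community sets that embed the base and then change or extend it.
SET_A: Dict[int, str] = {**BASE, 5000: "specimen name"}
SET_B: Dict[int, str] = {**BASE, 4: "title or caption", 5000: "sensor"}


def clashes(a: Dict[int, str], b: Dict[int, str]) -> List[Tuple[int, str, str]]:
    """List the values both sets define but with different meanings."""
    return [(v, a[v], b[v]) for v in sorted(a.keys() & b.keys()) if a[v] != b[v]]


print(clashes(SET_A, SET_B))
# [(4, 'title', 'title or caption'), (5000, 'specimen name', 'sensor')]
```

Both kinds of clash appear: a redefined inherited value (4) and an independently assigned new value (5000). Neither can arise when a query names the set each attribute comes from and the sets do not copy one another.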
Major issues in attribute set development
Lennie Stovel then led a general discussion on attribute set issues. She asked the participants to identify those issues they thought needed to be addressed, both issues in general and specific issues that were brought out by the previous presentations. Among the issues raised were:
- Need for a formal statement of what this process is attempting to accomplish
- How do attribute sets interrelate
- Developing procedures for authorizing attribute set development - how to manage and coordinate development process
- How to get broader participation in the discussion of the issues
- How many attribute sets are needed - universal versus modular architecture
- Version 2 versus Version 3 issues with the architecture - what are requirements for backward compatibility
- Operational issues of multiple attribute sets
- How to extend the architecture itself if new types are needed
- How does Z39.50 attribute work relate to other activities such as EDI, SQL, XML, RDF - and how could Z39.50 attribute sets be used in other contexts
- Need for guidelines for developers
- Potential need for registry of implementors of an attribute set to propagate notification of changes
- Utility attribute sets called for in architecture document - what should they contain
Basic use attributes and functions
Much of the rest of the discussion of the meeting revolved around the two
attribute sets called for in the architecture document. The discussion
started out with the attribute sets as defined in the document--a basic set
of use attributes and a set of basic functions. These became known in the
discussion as the utility set and the basic function set. The
original idea for the utility set was to pre-define a group of very commonly
used use attributes; all other attribute sets would then not have to embed
these attributes, as queries could refer to the utility set. As discussion
progressed it became clear that most participants concluded that the utility
set as originally proposed would be very small and not very useful. However,
there was substantial agreement that an attribute set specifically designed
for cross domain searching was needed.
This led to a discussion of whether or not the Dublin Core should be used for a cross-domain set, or, if not, what the relationship to the Dublin Core should be. Some participants felt that it should not be the Dublin Core because of problems of ownership, while others felt that a lot of international effort had gone into developing DC and that strong justification would be needed as to why the DC was not applicable. There was strong consensus that what was needed was a rigorous scope statement that would define exactly what the scope and purpose of such an attribute set would be. There was also discussion about whether this set would contain only use attributes or other attribute types as well; what its relationship to the utility set was (whether there should be one set or two, since the two seemed interrelated); whether the utility set should consist only of non-use attributes or whether there was a small set of functional use attributes that belonged in it but not in the cross-domain set; and what process should be used to develop such sets. There was also some discussion of who would own these sets, with the thought that the Z39.50 Maintenance Agency would probably be responsible for maintaining them.
A rough but not unanimous consensus evolved that:
- an attribute set for cross-domain searching should be developed
- that it should not be the Dublin Core but rather be informed by the DC and make use of DC elements where appropriate
- that development would not be a ZIG activity since it was not protocol related (although certainly individuals who also participate in the ZIG might be involved in the development effort), and
- that probably the most important thing that was needed at this point in the process was a strong scope statement to define the task.
There was also some discussion about what mechanisms and procedures would be needed if it became necessary to add new types to attribute types defined in the architecture document. The architecture group attempted to include all types that would be useful, and in fact surveyed all known existing attribute sets to make sure the architecture could handle the capabilities developers needed. The consensus arose that it was probably premature to deal with this issue, but that any new attribute type added would have profound implications for pre-existing sets developed under the architecture and thus should only proceed after wide community discussion.
There was also discussion of the replacement bibliographic attribute set. (This set was referred to for convenience as Bib-2. However, there was consensus that Bib-2 would not be its name in order to avoid the implication that it was of lesser value than Bib-1). The consensus was that this was just another application domain specific set and that ideally its overlap with the cross domain set would be minimal. There was discussion of timing and the need for cross domain and mechanical sets to be in place before Bib-2 (and other sets) were developed to avoid potential duplication - or at least for developers of new application specific sets to be aware of what was happening with those two sets to help inform their work.
There was a general consensus that policies and procedures for attribute sets were important and would need to be developed. These would include such things as who was allowed to register attribute sets that were publicly available, what type of vetting an attribute set needed to have to ensure compliance with the architecture before it could be registered, what guarantees of maintenance and support were required, and related issues along these lines.
Next steps
Most of the remaining discussion revolved around procedural issues and how
to best ensure wide participation, both in the development of Bib-2 and also
the cross domain and mechanical attribute sets. Ideas included a workshop
where participants can present position papers, holding electronic
discussions and ensuring that future meetings were held in conjunction with
ZIG meetings and/or other events that were likely to draw wide international
representation. There was also a sense that the types of discussions that
were held during this meeting were needed with other individuals who were
not able to be present and that a repeat of these discussions, perhaps in a
European venue, might be needed. Cross domain searching came up as a
possible topic for a future meeting. One idea that came up for the
development of Bib-2 was that it would be done as a NISO standards activity
with wide international participation and the WWW Consortium model might be
used to allow members of the committee, even if they did not represent NISO
voting members, to have a veto vote in the committee on any standard
developed before it went out to the NISO voting members. Another suggestion
made was to ask NISO to propose solutions and mechanisms for how some of
these issues should be addressed.
To define next action steps, Priscilla Caplan, as the chair of the NISO Standards Development Committee, will organize a conference call with the participants who volunteered to discuss concrete mechanisms for moving the process forward. The following persons asked to be included in the conference call: Ray Denenberg, Mark Hinnebusch, Ralph LeVan, Cliff Lynch, Mark Needleman, Lennie Stovel.
In sum, the discussions revealed that there are still a lot of issues, both technical and procedural, remaining to be resolved. Many ideas and viewpoints emerged during the day, and there was a recognition by the participants that this was the beginning of what will be a larger process with broader participation that will be needed to bring many of the issues discussed to fruition.
Participants: Joel H Baron, Priscilla Caplan, Eliot Christian, Ray Denenberg, Larry Dixson, Eric G. Ferrin, Michael Fox, Patricia Harris, Janet Hylton, Ralph LeVan, Clifford Lynch, William E. Moen, Nassib Nassar, Doug Nebert, Mark H. Needleman, Sara Randall, Lou Reich, Mackenzie Smith, Lennie Stovel, Fay Turner, Les Wibberley
