Format for Exchange of Thesaurus Data Conforming to ISO 25964-1
Schema (current version is 1.4)
Schema Documentation (downloadable zip file containing the HTML documentation)
This webpage enables access to an XML schema associated with ISO 25964-1, Thesauri and interoperability with other vocabularies – Part 1: Thesauri for information retrieval. Clause 15 of the standard presents a data model which is intended to guide the format used for exchanging thesaurus data. The associated XML schema may be used for transmitting a whole thesaurus or portions of a thesaurus. More information about ISO 25964 is on the homepage for the standard.
The schema takes a “flat” approach rather than a “nested” one.
- The schema specifies the XML root element (ISO25964Interchange), then defines most of the classes in the UML model (see Figure 15 in clause 15) as top-level elements within this "wrapper". The one exception is Thesaurus, which includes three flat lists representing the classes ThesaurusConcept, ConceptGroup, and ThesaurusArray, each of which has the relationship "contains" with the thesaurus in the model. Thus the thesaurus actually "contains" these classes in the XML schema. All other elements are declared as children of the "wrapper" element, i.e. they float as level-1 elements in the XML document. Identifiers are used to stitch all elements together.
- A concept's terms are declared as child elements of the concept to which they belong. Compound equivalence is defined separately. The advantage of the flat approach is that it separates the thesaurus structures from the XML structure used to represent them. Because XML is hierarchical, and thesauri are hierarchical, it is often assumed that thesaurus hierarchies ought to be represented as deeply nested XML. However, this assumption holds only in the simplest cases, and is broken by polyhierarchical links and partial thesauri.
- By providing all the links via reference (using identifiers), rather than via structure (nesting), the schema enables fragments of thesauri to be exchanged. Furthermore it represents polyhierarchies without unnecessary repetition.
- Accompanying the schema is a package of documentation in HTML format and a test document, “Serialization Example 1”, which illustrates diverse features to show how the schema works in practice. This document is made up of many examples of features from different thesauri. It is not designed for indexing a realistic domain, but illustrates how to handle features of the data model.
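The linking-by-reference idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names ISO25964Interchange, Thesaurus and ThesaurusConcept come from the text, but the Term and Link elements and the id, broader and narrower attributes are hypothetical stand-ins and are not taken from the actual schema. The sketch shows one concept with two broader concepts (a polyhierarchy) being represented once, with the links resolved through identifiers.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat document. "Term", "Link" and the attribute names are
# illustrative assumptions, not the element names defined by the real schema.
FLAT_XML = """
<ISO25964Interchange>
  <Thesaurus>
    <ThesaurusConcept id="c1"><Term>vehicles</Term></ThesaurusConcept>
    <ThesaurusConcept id="c2"><Term>toys</Term></ThesaurusConcept>
    <ThesaurusConcept id="c3"><Term>toy cars</Term></ThesaurusConcept>
  </Thesaurus>
  <Link broader="c1" narrower="c3"/>
  <Link broader="c2" narrower="c3"/>
</ISO25964Interchange>
"""


def broader_labels(root, concept_id):
    """Return the sorted labels of every concept linked as broader than concept_id."""
    # Index the flat list of concepts by identifier.
    concepts = {c.get("id"): c for c in root.iter("ThesaurusConcept")}
    # Follow each link by reference instead of walking nested elements.
    return sorted(
        concepts[link.get("broader")].findtext("Term")
        for link in root.iter("Link")
        if link.get("narrower") == concept_id
    )


root = ET.fromstring(FLAT_XML)
print(broader_labels(root, "c3"))  # ['toys', 'vehicles']
```

In a nested representation, "toy cars" would have to appear twice, once under each parent. Here it appears once, and each hierarchical link is a separate element that points at it.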
The hope is to extend the schema if there is sufficient interest from the user community. At least two types of extension are envisaged:
- Adding referential integrity constraints when the schema is used for exchange of a "complete" thesaurus.
- Adding (optional) attributes to the schema so as to exchange only the update of a thesaurus. (Typically, such an attribute could take the values: removed, new, modified, and unchanged.)
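To make the two envisaged extensions more concrete, the following Python sketch shows what a receiving application might do with them. Neither extension exists in the schema as described here, so everything in the sketch is an assumption: the status attribute name, the Term and Link elements, the id, broader and narrower attributes, and both helper functions are hypothetical. Only the four status values (removed, new, modified, unchanged) come from the text.

```python
import xml.etree.ElementTree as ET


def dangling_references(root):
    """Return referenced identifiers that no concept in this document declares."""
    declared = {c.get("id") for c in root.iter("ThesaurusConcept")}
    referenced = set()
    for link in root.iter("Link"):  # "Link" is a hypothetical element name
        ends = (link.get("broader"), link.get("narrower"))
        referenced.update(e for e in ends if e)
    return sorted(referenced - declared)


def apply_update(current, update_root):
    """Apply an update document to a dict mapping concept id to label."""
    result = dict(current)
    for concept in update_root.iter("ThesaurusConcept"):
        cid = concept.get("id")
        # "status" is a hypothetical attribute; absent means unchanged.
        status = concept.get("status", "unchanged")
        if status == "removed":
            result.pop(cid, None)
        elif status in ("new", "modified"):
            result[cid] = concept.findtext("Term")
    return result


# Hypothetical update document carrying only the changed concepts.
UPDATE_XML = """
<ISO25964Interchange>
  <Thesaurus>
    <ThesaurusConcept id="c2" status="removed"/>
    <ThesaurusConcept id="c3" status="modified"><Term>model cars</Term></ThesaurusConcept>
    <ThesaurusConcept id="c4" status="new"><Term>bicycles</Term></ThesaurusConcept>
  </Thesaurus>
  <Link broader="c1" narrower="c4"/>
</ISO25964Interchange>
"""

update_root = ET.fromstring(UPDATE_XML)
current = {"c1": "vehicles", "c2": "toys", "c3": "toy cars"}
print(apply_update(current, update_root))
print(dangling_references(update_root))
```

In this sketch the update document refers to c1 without declaring it, so dangling_references reports it. That is acceptable for a fragment or an update, which suggests why a referential integrity constraint would apply only when a "complete" thesaurus is exchanged.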