ISO 25964-1, Thesauri and interoperability with other vocabularies – Part 1: Thesauri for information retrieval
Draft Format for exchange of thesaurus data conforming to ISO 25964-1
Overview
This site shows development work associated with ISO/DIS 25964-1, Thesauri and interoperability with other vocabularies – Part 1: Thesauri for information retrieval.
Clause 15 of the draft standard presents a data model which is intended to guide the format used for exchanging thesaurus data. On this webpage you can find an XML schema developed for this purpose. The schema may be used for transmitting a whole thesaurus or portions of a thesaurus.
The present version is a DRAFT, showing a “flat” schema (rather than a “nested” schema, which would be another possible approach). Accompanying it is a test document illustrating how it works in practice.
The developer explains as follows:
- What I have done is to provide a "wrapper" element (ISO25964Interchange) around the whole document, then defined most of the classes in the UML model as top-level elements within this "wrapper".
- The one exception is Thesaurus, where I have directly included the three classes (Concept, ConceptGroup and Array) which have the relationship "contains". This means that the thesaurus actually "contains" these classes in the XML, which makes sense to me.
- For everything else, I declare the elements as children of the "wrapper" element, i.e. they are all floating about as level-1 elements in the XML document. Then we use identifiers to stitch everything together. In my straw man, this has been taken to the extreme - even a concept's terms will be declared separately from the concept to which they belong.
- A more pragmatic implementation might include terms as subelements within Concept. (However, even there we might run into problems with compound equivalence.)
- The advantage of this approach is that you separate the thesaurus structures you are trying to create from the XML structure you use to represent them. There is a seductive fallacy that XML is hierarchical, and thesauri are hierarchical, so you can just represent the thesaurus hierarchy as deeply-nested XML. This assumption is only true in the simplest possible case, and is broken by polyhierarchical links and partial thesauri, for a start.
- If all the links are done by reference (using identifiers), rather than by structure (nesting), you gain the ability to transfer fragments of thesauri. You can also represent polyhierarchies without gratuitous repetition of data.
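The "links by reference" idea in the bullet points above can be illustrated with a small sketch. It is not taken from the draft schema: only the ISO25964Interchange wrapper and the containment of Concept inside Thesaurus come from the developer's description, while the Term and HierarchicalRelationship elements and all attribute names (id, concept, broader, narrower) are hypothetical, chosen only for illustration. The sample data gives one concept two broader concepts (a polyhierarchy) without repeating it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical flat document. Apart from the ISO25964Interchange wrapper and
# Concept being contained in Thesaurus, every element and attribute name
# below is an illustrative assumption, not the draft schema's vocabulary.
FLAT_XML = """
<ISO25964Interchange>
  <Thesaurus id="T1">
    <Concept id="C1"/>
    <Concept id="C2"/>
    <Concept id="C3"/>
  </Thesaurus>
  <Term id="t1" concept="C1">vehicles</Term>
  <Term id="t2" concept="C2">toys</Term>
  <Term id="t3" concept="C3">toy cars</Term>
  <HierarchicalRelationship broader="C1" narrower="C3"/>
  <HierarchicalRelationship broader="C2" narrower="C3"/>
</ISO25964Interchange>
"""


def load(xml_text):
    """Index the level-1 elements by the identifiers they refer to."""
    root = ET.fromstring(xml_text.strip())
    # Map each concept identifier to the text of a term that points at it.
    labels = {t.get("concept"): t.text for t in root.findall("Term")}
    # Map each concept identifier to the identifiers of its broader concepts.
    broader = defaultdict(list)
    for rel in root.findall("HierarchicalRelationship"):
        broader[rel.get("narrower")].append(rel.get("broader"))
    return labels, broader


def broader_labels(concept_id, labels, broader):
    """Return the labels of a concept's broader concepts, sorted.

    If a broader concept is not in the document (as in a partial
    thesaurus), fall back to its bare identifier.
    """
    return sorted(labels.get(c, c) for c in broader.get(concept_id, []))


labels, broader = load(FLAT_XML)
parents = broader_labels("C3", labels, broader)
print(labels["C3"], "->", parents)
```

Because the hierarchy is held in the relationship elements rather than in the nesting, the same loader would accept a fragment that carries only one concept, its term, and its relationships; broader concepts missing from the fragment simply appear as unresolved identifiers.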
Feedback is now sought on the questions raised in the above bullet points, and/or any other features of the schema. To send feedback, go to the comments page for the schema and click on the "Add Comment" button. From this page, you can view the comments of others as well as the status of each comment as the Working Group reviews them.
ISO 25964-1 Standard
This standard is currently a Draft International Standard; it has not yet been approved as an ISO standard publication. The ballot for the current stage of the standard ends on March 26, 2010.
The British Standards Institution is making the draft standard available for public comment on its website. You must first register at the site to gain access.
Once registered, you can view all the clauses in the draft and submit comments on them. In clause 15 you can find the data model underpinning the XML schema, with explanatory notes.
Schema
ISO 25964-1 XML Schema v0.5 (draft to go with ISO/DIS 25964-1 dated 2009-09-18)
Test Document
ISO 25964-1 Test Document (using draft schema version 0.5 dated 2009-09-18)
