Dagobert Soergel
College of Information Studies, University of Maryland

Data models for an integrated thesaurus database

This paper presents two data models for storing multiple thesauri in a single integrated database. Such a database can serve as an aid to searchers in multi-database searching, as a basis for constructing conversion tables between thesauri, and as a tool for constructing and maintaining individual thesauri. The paper first describes the nature of thesaurus data and a relational data structure for such data, which is flexible and, through its use of term numbers in recording relationships, economical in storage. It then describes two data models for structuring an integrated thesaurus database. In both models, general data on terms and relationships are stored once, with an indication of one or more sources, resulting in storage economy. The term-based model stores all relationships as relationships between terms. This is flexible but redundant: if the same concept relationship is expressed through different terms in different thesauri, it is stored multiple times in the integrated database. The concept-based model identifies concepts by concept numbers and uses these concept numbers to record concept relationships, thus bringing together all occurrences of the same concept relationship regardless of the terms used to express the related concepts. This results in more compact storage but is less flexible.
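The contrast between the two models can be illustrated with a minimal sketch. It is not the paper's actual schema: the sample thesauri (A, B, C), the terms, the relationship code "BT" (broader term), the helper `term_number`, and the term-to-concept mapping `concept_of` are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical sample data: (source thesaurus, term, relationship, term).
# Thesaurus A says "Automobiles"; B and C say "Cars" for the same concept.
statements = [
    ("A", "Automobiles", "BT", "Vehicles"),
    ("B", "Cars", "BT", "Vehicles"),
    ("C", "Cars", "BT", "Vehicles"),
]

# Each distinct term string is stored once and referred to by a term number.
term_no = {}

def term_number(term):
    """Return the number for a term, assigning the next free number if new."""
    return term_no.setdefault(term, len(term_no) + 1)

# Term-based model: a relationship is a triple of term numbers and a
# relationship type, stored once with the set of sources that assert it.
term_rels = defaultdict(set)
for source, t1, rel, t2 in statements:
    term_rels[(term_number(t1), rel, term_number(t2))].add(source)

# Concept-based model: terms map to concept numbers (mapping assumed here),
# and relationships are recorded between concept numbers.
concept_of = {"Automobiles": 1, "Cars": 1, "Vehicles": 2}
concept_rels = defaultdict(set)
for source, t1, rel, t2 in statements:
    concept_rels[(concept_of[t1], rel, concept_of[t2])].add(source)

print(len(term_rels))     # 2 relationship records
print(len(concept_rels))  # 1 relationship record
```

In this sketch both models store a relationship only once when several sources agree on the same terms (B and C share one record). Only the concept-based model also merges the relationship from thesaurus A, which uses a different term for the same concept. The price is that someone must first decide which terms denote the same concept.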
