This application began with the American Geographical Society Map Catalog Subject Entries and the U.S. Bureau of Reclamation Thesaurus of Water Resource Terms. The Theme Keyword Thesaurus for Geospatial Data is presented in association with geospatial metadata and bibliographic description of GIS data and other information in a variety of formats available from the USBR Mid Pacific Region server. Sources for terms included in the thesaurus are identified. URLs and links are provided when possible. Intended to be flexible and hospitable, the thesaurus may be extended by key terms from the thesauri (glossaries, definitions of terms, etc.) of the disciplines involved in the development of datasets and metadata at USBR.
The capture of a theme keyword list for digital or nondigital data is part of the preliminary documentation. Capture of keywords provides a subset of the thesaurus which becomes a list of keywords used in metadata for Mid-Pacific geospatial datasets. At any time, the theme keyword list for a GIS dataset may be included as part of fully compliant metadata documentation. Assignment of keywords may be done in either ArcView or ArcInfo. Searches using the theme keyword index are done in ArcView or ArcInfo, or via World Wide Web browsers such as Netscape or Mosaic.
Software: Implementation of the theme keyword thesaurus has been done in HTML, Avenue scripts, and Arc Macro Language (AML).
We have been working on a scheme to identify the thematic or subject content of GIS datasets and other information generated by USBR Mid Pacific Region that will comply with the keyword thesaurus requirements of the FGDC Content Standards for Geospatial Metadata with as little pain as possible. Compliance with the Content Standard has been hampered by all kinds of interesting issues, among which theme keywords probably don't get top priority for practitioners of GIS. Still, it has to be done.
Our thesaurus of theme or subject keywords is intended to be used to describe digital datasets and written materials created and used at the USBR Mid-Pacific Region. For producers of digital spatial datasets (a.k.a. GIS files) and written materials (reports and surveys), the thesaurus is meant to help in the selection of keywords for abstracting and indexing. Anyone searching for datasets or reports through the USBR Intranet or, eventually, the WWW will be encouraged to use the thesaurus as an aid to understanding the scope of what is available and to select terms to describe what they want. Subject searching is most successful when the query language matches the indexing.
The word is half his that speaks and half his that hears it. - Montaigne
In the beginning was the FGDC Content Standards for Digital Geospatial Metadata, section 1.6, which defines keywords as "words or phrases summarizing an aspect of the dataset" and requires that theme (subject), place, stratum and temporal keywords be supplied in metadata and that these keywords be associated with or "reference[d] to a formally registered thesaurus or a similar authoritative source" (1.6.2.1). There actually is a place to register thesauri or at least deposit them. There are two international clearinghouses: one for English-language thesauri at the library of the University of Toronto and one for other languages in Warsaw (NISO, 1993). We haven't contacted them yet. Instead, I looked for available thesauri in the fields and disciplines represented in datasets produced at the Bureau of Reclamation Mid-Pacific Region offices. The FGDC Content Standard did provide a source of authoritative (and presumably registered) thesauri, the USMARC Code List for Relators, Sources, Description Conventions. That's where I found out that we already had a thesaurus.
In 1971, the Department of the Interior and the Bureau of Reclamation published a Thesaurus of Water Resources Terms: a Collection of Water Resources and Related Terms for Use in Indexing Technical Information which is terrific, except that it's more than 25 years old and the technical information it was meant to index was text, not geospatial digital datasets. The missing elements were, of course, recent and current terminology, but also terms to describe the map-like content of GIS data layers.
Many of the thesauri I looked at included some mapping and/or cartographic terms, although none had the GIS-specific terminology we needed. The American Geographical Society's Map Catalog Subject Entries, which they no longer use, contains a lot of useful cartographic terms, was available online, and seemed like a reasonable starting place. Using it as a backbone, I've flattened it a bit (more later about hierarchy), cut it a lot, and added to it from a number of more recent sources to begin developing a thesaurus that will, we hope, fulfill the letter and the spirit of the Metadata Content Standard requirements.
Sources used for thesaurus terms so far have been those mentioned above plus:
In our implementation, we have been aware of some impediments to the assignment and use of keywords. The implementation has set the following requirements.
Keyword lists or thesauri are subject to continual modification. There are a variety of lists representing different disciplines or areas of practice. They represent terms used by one or more authorities and are assumed to be related to active terminology for the discipline. As such, it is assumed that a current glossary exists which defines the usage of the terms.
The structure of contributing formal thesauri usually includes and identifies major terms and associated or related terms in a hierarchy. Terms for keyword lists may be selected from any level of hierarchy.
The vocabulary control function of a thesaurus identifies preferred terms and those they replace. Both sides of the relationship are noted. Terms which are controlled out of the vocabulary have a USE or SEE reference to the term that is to be used instead. The preferred term has a UF (Used For) or SEE FROM reference to the term or terms it replaces. Only preferred terms may be selected for use in keyword lists.
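The two-way USE / UF relationship described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Avenue code used in our implementation: the `Thesaurus` class, its method names, and the upper-casing of terms are all hypothetical. The bathymetry example is borrowed from the discussion of preferred terms later in this paper.

```python
# Minimal sketch of thesaurus vocabulary control (USE / UF references).
# The class and method names are hypothetical, not part of the original tool.

class Thesaurus:
    def __init__(self):
        self.use = {}       # non-preferred term -> preferred term (USE)
        self.used_for = {}  # preferred term -> set of terms it replaces (UF)

    def add_preferred(self, term):
        """Register a preferred term with no replaced terms yet."""
        self.used_for.setdefault(term.upper(), set())

    def add_use_reference(self, non_preferred, preferred):
        """Record both sides of the relationship: USE and UF."""
        non_preferred = non_preferred.upper()
        preferred = preferred.upper()
        self.add_preferred(preferred)
        self.use[non_preferred] = preferred
        self.used_for[preferred].add(non_preferred)

    def resolve(self, term):
        """Return the preferred term for a candidate keyword, or None if unknown."""
        term = term.upper()
        if term in self.used_for:
            return term
        return self.use.get(term)


thesaurus = Thesaurus()
thesaurus.add_use_reference("submarine relief", "bathymetry")
thesaurus.add_preferred("levees")

print(thesaurus.resolve("Submarine relief"))
print(thesaurus.resolve("levees"))
print(thesaurus.resolve("dams"))
```

Because only `resolve` hands terms back to the caller, a keyword list built through it can contain preferred terms only; a term outside the vocabulary comes back as `None` and can be rejected or queued for review.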
The initial implementation of our tool for selecting and reporting keywords for GIS themes is in ArcView. Avenue scripts were prepared to read in ASCII text files of the keyword lists or thesauri and output a reformatted comma-delimited text file. This text file can then be loaded into INFO or database tables. Any number of lists can be read in as required for different disciplines or to keep the tables current.
For loading into a database, preferred terms, modifiers of the preferred term, related terms, and terms that are not used are identified by different columns. The AGS Map Catalog Subject Entries was used as a test file. This list has some common characteristics of a thesaurus. There is a hierarchy of major terms, and terms with modifying phrases. Associated terms (See also) are identified. Terms which replace other terms (Used for) are identified. Figure 1 shows a portion of the keyword list in the original format and as an ASCII file ready for loading into a database table.
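The reformatting step can be sketched as follows. The input layout here is an assumption made for illustration, not the actual layout of the AGS file: main entries start in the first column, a modifier follows `--`, and indented lines carry hypothetical `UF` (used for), `SA` (see also) and `USE` tags. The function names and sample terms are likewise hypothetical, and Python stands in for the Avenue scripts.

```python
import csv
import io

# Hypothetical input layout (an assumption, not the real AGS format).
SAMPLE = """\
BATHYMETRY
  UF Submarine relief
  SA Oceanography
BOUNDARIES -- Administrative
SUBMARINE RELIEF
  USE Bathymetry
"""


def parse_entries(text):
    """Group each main entry with the reference lines indented beneath it."""
    entries = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if not raw[0].isspace():
            # Main entry, optionally followed by "-- modifier".
            term, _, modifier = raw.partition("--")
            entries.append({"term": term.strip(), "modifier": modifier.strip(),
                            "related": [], "not_used": [], "use": None})
        elif entries:
            tag, _, value = raw.strip().partition(" ")
            value = value.strip()
            entry = entries[-1]
            if tag == "SA":
                entry["related"].append(value)
            elif tag == "UF":
                entry["not_used"].append(value)
            elif tag == "USE":
                entry["use"] = value
    return entries


def to_csv(entries):
    """Write one row per relationship, with one column per kind of term."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["preferred", "modifier", "related", "not_used"])
    for e in entries:
        if e["use"]:
            # Non-preferred heading: assumed to be covered by the UF line
            # under its preferred term, so no row of its own is written.
            continue
        if not e["related"] and not e["not_used"]:
            writer.writerow([e["term"], e["modifier"], "", ""])
        for related in e["related"]:
            writer.writerow([e["term"], e["modifier"], related, ""])
        for not_used in e["not_used"]:
            writer.writerow([e["term"], e["modifier"], "", not_used])
    return out.getvalue()


csv_text = to_csv(parse_entries(SAMPLE))
print(csv_text)
```

Writing one row per relationship keeps the table flat, which suits loading into INFO or dBase tables; a different column arrangement would work equally well as long as the four kinds of term stay distinguishable.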
As a database table, this list provides the basis for Avenue scripts to generate selected theme keywords to be included in metadata indexing or to be applied as a search string against existing metadata. Additionally, the table provides the management functions of identifying the terms actually used for indexing and the capacity to count the number of times each term is used.
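The two management functions mentioned above, counting how often each term is used and applying selected terms as a search against existing metadata, might look like this in outline. The dataset names, keywords, and function names are invented for the example, and a plain Python dictionary stands in for the INFO or dBase tables.

```python
from collections import Counter

# Hypothetical metadata records: dataset name -> assigned theme keywords.
metadata = {
    "canals_1996": ["CANALS", "IRRIGATION", "WATER CONVEYANCE"],
    "levee_survey": ["LEVEES", "FLOOD CONTROL"],
    "irrigation_districts": ["IRRIGATION", "BOUNDARIES"],
}


def keyword_usage(records):
    """Count the number of datasets indexed with each keyword."""
    counts = Counter()
    for keywords in records.values():
        counts.update(set(keywords))  # count a term once per dataset
    return counts


def find_datasets(records, query_terms):
    """Return the names of datasets indexed with every term in query_terms."""
    wanted = {term.upper() for term in query_terms}
    return sorted(name for name, keywords in records.items()
                  if wanted <= set(keywords))


usage = keyword_usage(metadata)
print(usage["IRRIGATION"])
print(find_datasets(metadata, ["irrigation"]))
```

Terms with a count of zero never appear in the counter, which gives a quick way to see which parts of the thesaurus are actually used for indexing.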
Selecting keywords from very long lists or thesauri can be a frustrating experience. You need access to the entire list, but a selection routine that requires return to the entire list for each term is going to inspire very short lists and half-hearted indexing. The designed routine therefore permits the user to select terms from the entire thesaurus, to create a group of keywords for a theme, or to select from existing groups of keywords associated with particular themes or subject areas. Figure 2 shows the process of reviewing and selecting keywords.
Presently, the Document program in ArcInfo provides very limited space for keywords in the metadata description. We use the Document program as one of our documentation tools. This ArcView application prepares a text file that can contain a robust list of identified keywords for a theme. This text file contains other metadata information that is important for our office in managing our GIS themes. This text file can then be substituted for the narrative file that is generated by the Document program. Figure 3 shows various keyword forms for a narrative text file of one GIS theme.
To recap, our implementation is in ArcView with storage of local thesauri and keyword groups in INFO or in dBase format. Any ASCII file of potential keywords can be used, provided they follow some basic conventions. Avenue scripts are used to review and reformat text files of keyword thesauri. Access for the user is in ArcView. No limit is set to the number of keyword phrases that can be identified for a theme.
Some barriers to the development and implementation of keyword thesauri in the GIS community have been:
base through which to distribute data gathering costs (Frank, 1994, p. 588). Research, gathering and creating data, is big fun but not cheap. Digitizing data is somewhat less fun and the single greatest expense associated with GIS. Why should your organization pay the full price of duplicating data that is already available?
It's almost too obvious to say out loud, but how much information about your organization's data depends on the memories of individuals? How much is in files with labels that once seemed perfectly clear but have become positively cryptic over time? How much work is duplicated even within your organization in, for example, the creation of macros? Metadata offers a structure for keeping internal records for your organization.
The control part is fairly simple stuff that can be done to whittle down the sheer size of the potential searching and indexing vocabulary. It standardizes (and corrects) spelling. It establishes whether a singular or plural form is to be used. It refers the user to so-called preferred terms - which means that you'll have to use bathymetry rather than submarine relief if you actually want to retrieve something on the topic or have it found. Scope notes (little parenthetical clarifications like this one) are added to sort out the meanings of potentially confusing terms such as: DIKES (igneous intrusions). For water protection use LEVEES.
The context part is not so simple. It has to do with the relatedness of terms and, in carefully wrought comprehensive thesauri, the hierarchical relations between terms. The arguments for hierarchical arrangement are based on what is believed to be a hierarchical ordering principle in human memory (Najarian, 1981; Rosch, 1978). The difficulty in working with hierarchy is that, while the principle may be at work in all of us, the particular hierarchical structures of individuals are unique; they differ with respect to the breadth and depth of knowledge and experience. That's my excuse, or rationale, for abandoning hierarchical structure except for a trial collection of groups of very strongly related terms having to do with Cultural Resource Management, analytical methods, laws and regulations, and a stab at classifying kinds of boundaries.
Data-producing organizations, quaintly called the "drawing office" by one map librarian (Perkins, 1992), have a different approach to organizing data and output. They have fewer items to control than libraries and fewer requests to process, and their community of data users tends to be aware of both format and subject matter. Organizational schemes used by data producers don't generally run to library-style universal classification and cataloging. Depending on the size and nature of the organization, they may organize in piles or the computer equivalent of piles (odd collections of files spread throughout the system in unclassified order), or impose a formal local system based on anything from client, location or project identification to some form of cataloging with multiple access points. A lucky few get to hand the problem over to the National Archives to sort out.
Both of these approaches, from the library and from the data-producing agencies, have their advantages and limitations. The FGDC is encouraging you, through the requirements of the Content Standard, to bridge the gap between universal and local forms for the organization of data for digital distribution. The matter of subject keywords is a small piece of the problem. We hope our solution demonstrates two things: that theme keyword thesauri needn't be built from scratch and that the whole process can be done fairly simply using software you already have.
This implementation makes use of ArcView with storage of thesauri and keyword groups in INFO or in dBase format. Any ASCII file of keywords can be used provided that it follows some basic conventions. The lists are then available for selection of keywords for assignment to a GIS theme or for query against the existing metadata.
American Geographical Society (n.d.). AGS Map Catalog Subject Entries [Online]. Available at http://leardo.lib.uwm.edu/maptops.html [March 23, 1997].
Federal Geographic Data Committee (1994). Data standards: Content standard for Federal geospatial metadata [Online]. Available at http://fgdc.er.usgs.gov/metaover.html [March 22, 1996].
Frank, Steven (1994). Cataloging digital geographic data in the information infrastructure: a literature and technology review. Information Processing and Management, 30(5), pp. 587-606.
Library of Congress, Network Development and MARC Standards Office (1993). USMARC code list for relators, sources, description conventions (1993 ed.). Washington, D.C.: Cataloging Distribution Service, Library of Congress.
Mangan, Elizabeth U. (1995). The making of a standard. Information Technology and Libraries, 14(2), 99-110.
Moratto, Michael J. (1984). California archaeology. New York: Academic Press.
Najarian, Suzanne (1981). Organizational factors in human memory: implications for library organization and access systems. Library Quarterly, 51(3), 269-291.
National Information Standards Organization (U.S.) (1993). Guidelines for the construction, format, and management of monolingual thesauri (ANSI/NISO Z39.19-1993). Bethesda, MD: NISO.
Perkins, Chris (1992). Metaphysical mayhem? Retrieving and describing maps and spatial data in the map library and drawing office. Bulletin of the Society of University Cartographers, 26(2), pp. 21-24.
Rosch, Eleanor (1978). Principles of categorization. In Eleanor Rosch and Barbara B. Lloyd (Eds.), Cognition and Categorization (pp. 27-48). Hillsdale, NJ: Lawrence Erlbaum Associates.
U.S. Department of the Interior, Bureau of Reclamation (1971). Thesaurus of water resources terms: a collection of water resources and related terms for use in indexing technical information. Denver, CO: U.S. Department of the Interior.
U.S. Geological Survey (1996, February 14). USGS Mapping Information: Feature Class Types [Online]. Available at: http://mapping.usgs.gov/www/gnis/features.html [March 31, 1997].
U.S. Geological Survey (1995, October 4). USGS National Mapping Information: Geographic Names Information System Data Users Guide: APPENDIX C.--Geographic Names Information System (GNIS) Feature Class Definitions [Online]. Available at: http://mapping.usgs.gov/www/ti/GNIS/gnis_users_guide_appendixc.html [March 31, 1997].
Bovey, J. D. (1995). Building a thesaurus for a collection of cartoon drawings. Journal of Information Science, 21(2), pp. 115-122.
Lai, Pohchin & Gillies, Charles F. (1991). The impact of geographical information systems on the role of spatial data libraries. International Journal of Geographical Information Systems, 5(2), 241-251.
Library of Congress (1976, and supplements through 1991). Library of Congress Classification Schedule: Class G: Geography, Maps, Anthropology, Recreation (4th ed.). Washington, D.C.: Library of Congress.
NASA, Scientific and Technical Program Office (1994). NASA Thesaurus [Online]. Available at http://www.sti.nasa.gov/nasa-thesaurus.html [April 18, 1997].
NASA, Scientific and Technical Program Office (1997, January). NASA Thesaurus Supplement [Online]. Available at http://www.sti.nasa.gov/Pubs/Thesaurus-Supplement.html [April 18, 1997].
Milstead, Jessica L. (1993). Thesaurus management software. In Encyclopedia of library and information science (Vol. 51, Supp. 14, pp. 389-407). New York: Dekker.
Weinberg, Bella Hass, & Cunningham, Julie A. (1988). The design of online thesauri. In Martha E. Williams & Thomas H. Hogan (Compilers), National online meeting proceedings - 1988: Proceedings of the 9th national online meeting, New York, May 10-12, 1988 (pp. 411-419), Medford, NJ: Learned Information.