Daniel O. Nelson, Robert J. Krumm, Sally L. Denhart, Sheena K. Beaverson
Illinois State Geological Survey
ArcInfo Solutions to Metadata Problems:
Building a Solid NSDI Clearinghouse Node on a Shifting Metadata Landscape
Abstract
Introduction
Project Background
The Value of Metadata
Adopting a Standard Metadata Format
Staff Training
Tools Used to Produce Metadata
Document.aml: A Review and an Alternative
Summary
Acknowledgments
References
Author Information
Abstract
The 1996-97 Illinois Clearinghouse Node project of the National Spatial Data Infrastructure (NSDI) is a multi-agency effort, led by the Illinois State Geological Survey (ISGS), to make metadata and digital geospatial data about Illinois natural resources available on the Internet. The primary objectives are to generate metadata compliant with the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM), develop a clearinghouse node to support search and retrieval of the metadata, and offer the results as a model for other organizations in Illinois. This is a project status report, emphasizing the value of metadata and metadata generation methods applicable to UNIX ArcInfo users.
Metadata issues in Geographic Information System (GIS) management are relatively new and evolving quickly. To ensure a viable product into the future, metadata generation tools and techniques that produce adaptable, software-independent metadata were sought. The use of ASCII-based text files was adopted to keep metadata compilation as simple and flexible as possible. This approach is supported by the FGDC tool mp (Metadata Parser), which parses ASCII metadata text in preparation for indexing, search and display. The software tools Document.aml and Xtme (Xt Metadata Editor) were used to generate FGDC-compliant metadata for ArcInfo GIS data, and the processes and results were compared. Document.aml automatically extracts several metadata elements directly from an ArcInfo coverage, but has several drawbacks. Xtme is less time-consuming, more stable, easier to understand, and better structured to meet CSDGM content and format requirements than Document.aml. To take best advantage of both tools, a simpler and faster AML program (called Fgdcmeta.aml) was developed from Document.aml to extract ArcInfo coverage metadata and pass it to Xtme for subsequent metadata collection. A metadata collection system using Fgdcmeta.aml and the FGDC tools Xtme and mp to produce ASCII metadata text is recommended.
Introduction
The Illinois State Geological Survey (ISGS), in cooperation with other offices and divisions of the Illinois Department of Natural Resources (DNR), has implemented a National Geospatial Data Clearinghouse Node dedicated to serving digital geospatial information about Illinois natural resources. The node contains metadata and GIS data for geological, hydrological, natural resource, historical, administrative and infrastructural issues.
The NSDI encompasses policies, standards and procedures for organizations to cooperatively produce and share geospatial data. The FGDC has assumed leadership in the evolution of the NSDI in cooperation with state and local governments, academia, and the private sector (FGDC, 1997).
One of the primary methods the FGDC has used to advance the NSDI is to establish an organized web of linked Internet sites that serve GIS data and associated descriptive information (metadata). Each Internet site is a "node" of the NSDI Clearinghouse. An individual seeking GIS data can use the Internet to "go to" the NSDI Clearinghouse and search metadata indices or browse metadata lists. This system relies on metadata that conform to the FGDC Content Standard for Digital Geospatial Metadata (FGDC, 1995). The CSDGM provides guidance for the creation of "data about data", or information such as a description of available geospatial data, data quality and organization, spatial reference, entity and attribute descriptions, and distribution information. Many clearinghouse nodes also offer downloadable GIS data. Any individual node can be accessed directly, and all registered nodes can be searched simultaneously from the primary node maintained by the FGDC. The FGDC NSDI Clearinghouse Node can be found at http://www.fgdc.gov/clearinghouse/index.html. The Illinois NSDI Clearinghouse Node can be accessed via the ISGS Internet Home Page at http://www.isgs.uiuc.edu/isgshome/
The DNR units participating in this project are:
- Illinois State Geological Survey (ISGS)
- Illinois Natural History Survey (INHS)
- Illinois State Water Survey (ISWS)
- Waste Management and Research Center (WMRC)
- Illinois State Museum (ISM)
- Office of Mines and Minerals (OMM)
- Office of Realty and Environmental Planning (OREP)
- Office of Scientific Research and Analysis (OSRA)
Each has contributed metadata and GIS data to be served and accessed on the Illinois node, making available such data as the Illinois Public Land Survey System (county, township and range, and section lines), bedrock and Quaternary geology maps, wetlands and streams, landfill inventory, fish and wildlife areas, land cover, political boundaries, municipal boundaries, roads and railroads, and much more. Metadata for approximately 100 GIS data sets are currently available, with at least 200 data sets yet to be added.
To support future intra- and interagency clearinghouse efforts in Illinois, a standard minimum set of metadata elements conforming to the FGDC Content Standard for Digital Geospatial Metadata (CSDGM) was developed by the participating agencies. It is hoped this effort will serve as a prototype for other agencies in Illinois and will stimulate the development of a fully integrated system of clearinghouse nodes connecting Illinois state agencies with users of digital geospatial data nationwide.
Project Background
The Scientific Surveys and other offices and divisions of the Illinois DNR have an established history of publishing and distributing information to the public, other government agencies, academia, and industry. The ISGS has used GIS technology since 1973, and since then has witnessed an increasing demand for digital geospatial data from all of these sectors. The project participants have worked together several times to meet this demand: by direct distribution of data to end users, in cooperative projects with other organizations, through significant contributions of digital data to two multi-agency CD-ROM compilations, and by taking prominent roles on the Illinois Geographic Information System (IGIS) Committee. We have participated in numerous national, state, county, and local GIS projects to serve the environmental, geological, socioeconomic, and civil planning needs of Illinois and the Midwest United States. As a result, the project partners have several hundred individual ArcInfo data sets available for analytical use.
The participants' experiences with building in-house GIS databases and sharing digital data with others have demonstrated the importance of thorough documentation; it is an essential part of any geospatial data set. However, most of our GIS data have been documented by the many individual creators, using a variety of styles, methods and computer platforms. Because most of our efforts are project-driven, there has been little formal maintenance and update of our metadata holdings. To address this situation, we have, over the last three years, implemented several small pilot efforts focused on metadata generation and distribution. These were dedicated predominantly to examination of various metadata collection tools, namely the United States Geological Survey (USGS), ArcInfo, and Bureau of Land Management (BLM) versions of Document.aml, the United States Army Construction Engineering Research Laboratory (USACERL) Corpsmet tool for PC, the National Biological Service (NBS) WordPerfect Template for FGDC Metadata, the NBS MetaMaker tool, and the USGS mp and Xtme tools. In addition, the ISGS generated a Metadata Table of Contents in ASCII format for use with a PERL-based search tool, listing an absolute minimum of eight descriptive metadata elements for 400+ data sets, and all participants worked together to generate an ASCII-based template for a subset of the CSDGM for use with the 1996 Illinois DNR geospatial data CD-ROM. These efforts, and the needs that engendered them, led to the development of the current project.
The Value of Metadata
From an institutional point of view, this project has provided a number of benefits related to a better organized and documented GIS database. The project participants were provided with an opportunity to examine the overall organizational quality of their GIS data holdings. The ISGS GIS database, for example, was found to be a combination of documented and organized coverages along with undocumented coverages that were not easy to locate or understand. Working with the database has generally not been problematic because the people responsible for generating the information are still on staff, available for questions. However, staff turnover is inevitable, and we recognize the potential for loss of valuable, undocumented data histories.
The overall makeup of the in-house GIS user base in Illinois state agencies, and in many other organizations, is changing from a core of dedicated ArcInfo specialists to a much larger group that includes many ArcView users. Along with this evolution comes a database management responsibility to provide and maintain a GIS database that is relatively easy to access, understand and use. Although many of the ArcView users will likely be satisfied to work with a standard set of project files, we expect many others will want to learn what additional data are available on our network of servers. It is especially important to provide searchable metadata to these users so they can locate and use the available data to the greatest extent possible.
Comprehensive metadata allow for better management, control, and protection of the data investment, by providing information on identification (name, description, purpose, version, location); quality (accuracy, completeness, currentness); lineage (sources, processing steps, previous versions); and contact personnel (Who do I call?). With this information, the GIS manager can, for example, improve data catalogs, assign revolving update and review dates, and retain a record of processing and revision histories. Institutional control is enhanced by specifying the proper (and improper) uses of the data, by applying access, distribution and security policies, and by supplying all users with a uniform product. These controls may also serve to protect an organization by limiting liability for misuse of a GIS data product.
Further, metadata can be used to leverage data resources and generate supplementary benefits. For example, redundant data collection and preparation can be avoided, saving resources. Existing data can be combined to create new products, expanding the GIS resource base. The data catalog can be used as a GIS Portfolio to promote data resources and negotiate data exchanges. Older data can be made useful through donation to schools, libraries and communities, thereby generating goodwill.
Finally, online, over-the-net, real-time GIS using ArcView will be a reality in the future. Ultimately, data users will locate and access various GIS data layers on potentially several different Internet servers, and without ever actually downloading or possessing the data, immediately combine them within a single GIS application (perhaps ArcView) and perform a spatial analysis. It is imperative that data made available for this type of activity are thoroughly described so that the compatibility of data can be assessed, and so proper and improper use is made plain. The metadata efforts expended on this NSDI project will help prepare the state agencies of Illinois as they move toward participating in this sort of "live" GIS on the Internet.
The Illinois Metadata Experience
One of the greatest concerns being addressed by this project is the need for a flexible, adaptable metadata system. We wish to provide as much useful metadata as possible, while avoiding commitments of time, effort and training to metadata tools and formats that could become obsolete quickly. Four major questions that are defined by this concern are:
- What specific metadata format should be adopted?
- How much and what types of staff training should be provided?
- Which metadata tools should be used?
- How can the problems with Document.aml be solved?
Adopting a Standard Metadata Format
The FGDC metadata standard has been undergoing revision over the last year. Personal communication with members of the FGDC metadata committee indicates that the standard will probably not change drastically. Rather, some metadata elements will be redesignated as "core", "recommended if applicable" or "optional" (or something similar), and a standard method of adding "user-defined" metadata elements will be instituted. This will give users of the standard more freedom in the way they choose to apply it, while maintaining relative uniformity. The implication is that any relatively comprehensive metadata based on the existing version of the metadata standard will likely also comply with the revised standard. Nonetheless, project participants decided that formally adopting a specific metadata format based on a soon-to-be-replaced standard was ill-advised. However, they informally agreed to continue to produce FGDC-compliant metadata using the set of elements previously identified for use with the Illinois DNR GIS Data CD-ROM (Illinois DNR, 1996). This metadata element set consists of CSDGM sections 1 (Identification Information) and 7 (Metadata Reference) and substantial parts of other sections, as applicable. Although the difference is subtle, proceeding in this manner has placed the participants in a position to better assess and recommend a formal metadata format for Illinois data after the revisions to the CSDGM are complete.
Staff Training
This is a pilot project, hence training of staff not directly involved with the project has been kept to a minimum. There are two reasons for this. First, the tools and techniques needed by the metadata developers are not necessarily those needed by the data developers or the data users. It is more prudent to establish a prototype clearinghouse node based on other successful clearinghouse node efforts, assess the results, and refine the product. Then the response of data developers and users can be evaluated to determine the type and scope of training required. Second, as previously mentioned, the FGDC metadata standard is being revised. Project participants are already familiar with the standard, and training of others in the standard is not an immediate necessity. The project participants decided that it would be inefficient to provide training in the current version and, shortly thereafter, have to retrain for the revised version.
Tools Used to Produce Metadata
Initially, because most of the participants in this project are dedicated ArcInfo users, Document.aml was the tool of choice for collecting metadata. It is sufficient (albeit occasionally unstable) for producing online metadata. However, when the transition was made to producing FGDC compliant metadata, new problems with Document.aml were discovered. Some were related to program bugs, and others were related to the postprocessing of Document.aml output. (The problems of Document.aml and one solution to them are discussed in the next section.) This situation prompted a search for metadata tools that were less software-specific and more basic and adaptable. The report by Mitre Corporation (1996) provides an excellent review of metadata tools.
The most flexible format for metadata is ASCII text. It is the most platform- and software-independent vehicle available. Although primitive in many ways, it can be imported into virtually any more sophisticated text manipulation or word processing software, and if necessary, manipulated directly with system-level commands. The FGDC recognized this and has developed excellent ASCII-based metadata compilation tools for UNIX and other platforms, to support the development of the NSDI Clearinghouse network.
A metadata collection system developed for the FGDC by Peter Schweitzer (1997) of the USGS is the most straightforward. This system comprises three tools for UNIX:
- Xtme (Xt Metadata Editor),
- mp (Metadata Parser), and
- cns (Chew 'n Spit).
Xtme is an X Windows application that provides a list of metadata elements in outline form from which the user can pick and choose to build a properly formatted metadata document. Help is provided for every metadata element, describing what information is appropriate for the field, and if applicable, a list of standard values. Mp is a text parser that checks metadata files (i.e., the output from Xtme) for proper format, order and element values, issuing warnings or errors when problems are identified. It is used iteratively to prepare metadata documents for indexing by the NSDI Clearinghouse Node server software. It is also used to generate output files in text, HTML and SGML formats. Cns can be used to "clean up" existing metadata files that may have been generated by hand or by some other tool. It will, for example, remove leading section numbers from metadata elements if they are present so that the file can be run through mp.
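The section-number cleanup mentioned above can be illustrated with a minimal Python sketch. It is not the cns implementation, whose internals are not described here; the function name `strip_section_numbers`, the regular expression, and the sample text are assumptions made for illustration only. The sketch assumes metadata lines of the form "number Element_Name: value" with indentation showing hierarchy.

```python
import re

# Assumed pattern: optional indentation, a section number such as "1",
# "1.2" or "1.2.3.", whitespace, then the start of an element name.
_SECTION_NUMBER = re.compile(r"^(\s*)\d+(?:\.\d+)*\.?\s+(?=[A-Za-z])")


def strip_section_numbers(text: str) -> str:
    """Remove a leading section number from each line, keeping indentation."""
    cleaned = []
    for line in text.splitlines():
        # Only the start of the line is touched, so numbers inside
        # element values (dates, coordinates) are left alone.
        cleaned.append(_SECTION_NUMBER.sub(r"\1", line))
    return "\n".join(cleaned)


# Hypothetical hand-numbered fragment, for illustration only.
sample = (
    "1 Identification_Information:\n"
    "  1.1 Citation:\n"
    "  1.2 Description:\n"
    "    1.2.2 Purpose: Example entry written in 1996\n"
)

if __name__ == "__main__":
    print(strip_section_numbers(sample))
```

A simplification worth noting: a free-text continuation line that happens to begin with a number followed by a word would also be altered, so a real cleanup tool needs more context than this sketch uses.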
The advantage of the cns/Xtme/mp method is that once the metadata files are generated, they are entirely independent of the tools that generated them. The ASCII metadata files are not dependent on any software (other than the operating system) to maintain their viability and accessibility. Even in the unlikely event that the FGDC would cease all development and support of these tools, an existing ASCII-based metadata database will not be detrimentally affected.
Document.aml: A Review and an Alternative
Document.aml (Esri 1995) is a metadata generation tool for ArcInfo that was created by staff of the Water Resources Division of the USGS and subsequently included with recent releases of ArcInfo. For the purposes of this section, it is assumed that the reader has some familiarity with the ArcInfo document atool.
Over the last three years, the ISGS has used Document.aml as its primary metadata collection tool. In this time there have been five or six generations of the tool: USGS ver. 2.0.2; ArcInfo vers. 7.0.2, 7.0.3, and 7.0.4; Blmdoc from the BLM; and the most recent USGS release. (Esri version 7.1.1 has subsequently been released but is not included in this review.) The concept of Document.aml is excellent and the program has proven useful for creating online documentation for individual ArcInfo data sets. However, the mechanics of the program have shown several problems, especially in generating FGDC formatted metadata. These problems can be attributed to three primary causes:
- Document.aml was written prior to the advent of the FGDC metadata standard, and the two formats are dissimilar,
- there have been several different versions of Document.aml in a short time, and not all are compatible with each other, and
- Document.aml forces ArcInfo to become a data entry interface, something it was not built to do.
Some of the specific problems encountered are:
- Files made with an earlier version seldom translate exactly to a later version. With every new version of ArcInfo, re-editing of the narrative file and information in various fields of the input menus is invariably necessary.
- Some versions have produced FGDC-type output without a problem, but others fail on this operation. Different versions fail due to different special AML characters in the narrative file.
- Lines of text are frequently omitted from the narrative file. The reason for this is the 80-character limit for each line in the .NAR INFO file. Unfortunately, the 80-character limit is not enforced during the text input phase, and the result is inadvertent loss of text.
- Sections of the narrative file are frequently omitted from the FGDC output (especially when files created by an earlier version are processed by a subsequent version).
- Document.aml consumes an unreasonably large amount of CPU time during the narrative file phase.
- The Arc license is unavailable during the narrative file phase because it is running the text editor.
- The FGDC output includes multiple pages of LOG information that is redundant and not very informative, but which often accounts for the bulk of the metadata document.
- The fact that the FGDC output must always be edited to include additional information not collected by Document.aml is the crux of the problem. It raises the question, "Why do all that data entry in Document.aml when it's so much more straightforward (and less CPU-intensive) to do it in Xtme or some other text-based editor?"
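The silent loss of text past the 80-character limit, noted in the list above, is the kind of problem a pre-check can catch before narrative text is stored. The following Python sketch is illustrative only and is not part of Document.aml; the function names and the idea of checking text outside ArcInfo are assumptions. It reports lines that exceed the limit and re-wraps them so that no text is dropped.

```python
import textwrap

MAX_WIDTH = 80  # per-line limit reported for the .NAR INFO file


def find_overlong_lines(text, width=MAX_WIDTH):
    """Return (line_number, length) pairs for lines longer than width."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > width
    ]


def wrap_narrative(text, width=MAX_WIDTH):
    """Re-wrap every line so that no output line exceeds width."""
    wrapped = []
    for line in text.splitlines():
        # textwrap.wrap returns an empty list for a blank line,
        # so blank lines are preserved explicitly.
        wrapped.extend(textwrap.wrap(line, width=width) or [""])
    return "\n".join(wrapped)


if __name__ == "__main__":
    narrative = "word " * 30  # a single 150-character line
    print(find_overlong_lines(narrative))
    print(wrap_narrative(narrative))
```

Wrapping changes where lines break but keeps every word, which is the property that matters when the storage format truncates long lines.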
In addition, we understand that Esri is developing a new metadata tool, which casts some doubt on the future development and support of Document.aml. Because of these problems, the ISGS has chosen not to continue using Document.aml for the collection of FGDC metadata. Document.aml does have, however, some impressive functions including:
- a very useful metadata gathering engine that uses the AML &DESCRIBE function, and
- a powerful block of code that determines projection information and calculates the spatial domain in decimal degrees.
Not wanting to forgo the convenience of these functions, the ISGS is using Document.aml as a template to develop a related AML called Fgdcmeta.aml. This tool retains the excellent automatic data gathering routines of Document.aml, but discards entirely any manual data entry within ArcInfo. All data entry is done in Xtme or another editor. The new AML consists of line commands only; no menus. The main function is described in four steps:
- The user issues the fgdcmeta command in ArcInfo.
- Data that can be automatically gathered (DESCRIBE data, etc.) are collected.
- The data are written to a user defined FGDC CSDGM template. This template can be created by the user using Xtme or any other ASCII text editor.
- The "skeleton file" (template with describe data) is ported out to Xtme (or other text editor) for subsequent editing.
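The middle two steps above, writing automatically gathered values into a user-defined template, can be pictured with a short Python sketch. Fgdcmeta.aml itself is written in AML and its code is not shown here, so this is only an illustration of the idea: the function `fill_template`, the `described` dictionary, and the coordinate values are hypothetical. The element names follow the CSDGM bounding-coordinate elements, and the template is assumed to be indented "Element_Name: value" text of the kind Xtme and mp work with.

```python
def fill_template(template: str, described: dict) -> str:
    """Fill empty 'Element_Name:' lines whose name appears in described.

    Lines that already carry a value, and compound elements that have
    no entry in described, are passed through unchanged.
    """
    filled = []
    for line in template.splitlines():
        name, separator, value = line.partition(":")
        key = name.strip()
        if separator and not value.strip() and key in described:
            # Keep the original indentation, which encodes hierarchy.
            line = f"{name}: {described[key]}"
        filled.append(line)
    return "\n".join(filled)


# Hypothetical template fragment and made-up values, for illustration.
template = (
    "Spatial_Domain:\n"
    "  Bounding_Coordinates:\n"
    "    West_Bounding_Coordinate:\n"
    "    East_Bounding_Coordinate:\n"
    "    North_Bounding_Coordinate:\n"
    "    South_Bounding_Coordinate:\n"
)
described = {
    "West_Bounding_Coordinate": "-91.5",
    "East_Bounding_Coordinate": "-87.5",
    "North_Bounding_Coordinate": "42.5",
    "South_Bounding_Coordinate": "37.0",
}

if __name__ == "__main__":
    # The result is the "skeleton file" that would then be opened in
    # Xtme or another text editor for the remaining manual entry.
    print(fill_template(template, described))
```

Because the template is plain text supplied by the user, changing which elements are collected means editing the template rather than the program.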
The approach used in Fgdcmeta.aml has the following advantages over Document.aml:
- The Arc license is not constrained to text editing functions only. The user can continue to use the Arc session and write metadata simultaneously.
- The time-consuming process of editing inside ArcInfo and then re-editing outside ArcInfo is avoided.
- Understanding the relationship between Document.aml data entry fields and the associated FGDC CSDGM element fields is no longer required. Users need know only one metadata format.
- Fgdcmeta.aml supports the continued use and development of the FGDC tools mp and Xtme.
- The output template is user-defined, and so supports revisions to the CSDGM. It is flexible enough to evolve with the FGDC metadata format as long as the CSDGM maintains its present structure. Reordering of elements and sections, for example, would be less problematic for this tool than for mp and Xtme, because it uses those two programs for template definition.
- It helps avoid the problem of the unknown evolution of Esri metadata collection tools and support for Document.aml. Because this tool is extremely simple in concept, it will be useful as long as ArcInfo version 7 is supported (and possibly beyond, depending on how versions 8 and higher operate).
There are disadvantages to using Fgdcmeta.aml. It is not as robust as Document.aml. In its current form, it only supports a one-time query of DESCRIBE data from coverages, grids, and tins for the purposes of generating FGDC-compliant metadata outside of ArcInfo. It does not write the metadata to an INFO file or support update of an existing metadata file (although those options are being explored).
The present usage for Fgdcmeta.aml is:
fgdcmeta <geo_dataset> {view | create}
The create option is as described above. The view option (default) displays an existing metadata file. Currently, this option is very system-dependent, requiring all metadata files to be stored in a single system directory. (These are in fact the same metadata files that are served on the NSDI node.) The AML code must be edited to indicate the proper directory.
It is anticipated that when development is complete, the usage will be as follows:
fgdcmeta template <template_file>
fgdcmeta <geo_dataset> {view | create | update}
fgdcmeta <geo_dataset> putinfo <metadata_file>
The template option will allow the ArcInfo administrator to define the institutional FGDC metadata template to be used by all users. It is intended that the template will be created with Xtme and checked for integrity with mp, although this will not be an absolute requirement. The update option will update an existing metadata file with current DESCRIBE values from the related ArcInfo data set. The putinfo option will write the metadata file as an INFO file of the appropriate data set. Note, however, that Document.aml currently has problems writing text files to INFO files because of the limit of 80 characters per line. If these limitations cannot be overcome, and the integrity of the text absolutely guaranteed, then the update and putinfo options may be abandoned. It is more important to protect the integrity of the metadata file than to have a copy attached to the data set in INFO file format.
When complete, Fgdcmeta.aml will be available on the Illinois NSDI Clearinghouse Node and possibly at the FGDC NSDI node. Check the Illinois Clearinghouse Node for updates.
Summary
For dedicated ArcInfo users, the past year was an uncertain time in terms of generating FGDC-compliant metadata and establishing an NSDI Clearinghouse node on the Internet. The FGDC Content Standard for Digital Geospatial Metadata was in a state of flux and the viability of Document.aml was in question. The danger was not in the task itself, but in the commitment to the specific metadata policies, tools and designs required to complete the task. At issue were strategies to maximize metadata productivity in the near term, while avoiding work practices with a recognized potential of obsolescence in the longer term. The value of metadata is undeniable, but only if it efficiently supports the GIS enterprise.
This particular metadata generation effort is young, and the relatively new concept of metadata is dynamic. Thus, the guiding principle of this ongoing project has been to maintain maximum flexibility in the metadata product so it is free to evolve with the changing concepts and procedures applied to the compilation of metadata. This philosophy has led to the adoption of FGDC methods and tools for metadata generation. Xtme and mp are recommended for use in generating FGDC-compliant metadata. The FGDC is involved at the national and international levels in the development and application of metadata standards. It can be assumed that where the FGDC leads, metadata will follow. However, the tools and methods of the FGDC should produce an ASCII text-based metadata product whatever the mode of generation. Such a product protects the producer's metadata investment. If the FGDC method is generally accepted, then those who use it will be at an advantage. If the FGDC method does not become the generally accepted method, then metadata prepared in ASCII format with FGDC metadata tools are still in the most universal format, and can be easily recast into the prevailing format.
The use of Document.aml to produce FGDC-compliant metadata for an NSDI Clearinghouse node is not recommended. It has software and design flaws that make it less efficient to use than other tools. We have developed a simpler and faster AML program called Fgdcmeta.aml from Document.aml. The program provides a more direct path to FGDC metadata generation by writing ArcInfo DESCRIBE data to a preformatted file which is subsequently edited in Xtme. Fgdcmeta.aml is recommended for first-time generation of metadata for ArcInfo coverages, grids and tins. It is, however, still in development, and a function for successive automatic update of DESCRIBE values is not yet complete.
Acknowledgments
Funding for this project was provided by the FGDC 1996 Competitive Cooperative Agreements Program (CCAP) administered by the USGS. The authors wish to thank the following DNR representatives for their participation on this project:
- Jill Blanchar, Illinois Waste Management and Research Center,
- Mark Joselyn, Illinois Natural History Survey,
- Kingsley Allan, Illinois State Water Survey,
- Erich Schroeder, Illinois State Museum,
- Ray Druhot, Office of Mines and Minerals, and
- Will Hinsman and Sheryl Oliver, Office of Realty and Environmental Planning.
Although they may not know it, Doug Nebert and Peter Schweitzer of the USGS provided invaluable assistance in metadata generation, clearinghouse node operations, and the development of Fgdcmeta.aml.
References
Environmental Systems Research Institute, Inc. (1995) Document.aml (several versions through Esri version 7.0.4), Esri, Redlands, California, original programming by D. Nebert and M. Negri (USGS), and M. Hoel (Esri).
Federal Geographic Data Committee Web Site, http://www.fgdc.gov/clearinghouse/index.html.
Federal Geographic Data Committee (1995) Content Standards for Digital Geospatial Metadata Workbook (March 24), Federal Geographic Data Committee, Washington, D.C.
Illinois Department of Natural Resources (1996) Illinois Geographic Information System CD-ROM of Digital Datasets of Illinois, Illinois Department of Natural Resources, Springfield, Illinois, 2 vols.
Mitre Corporation (1996) Metadata Tool Evaluation, http://www.fgdc.gov/Metadata/metahome.html.
Schweitzer, Peter (1997, most recent update) Chew 'n Spit (cns), Metadata Parser (mp) and Xt Metadata Editor (Xtme) Metadata Tools, United States Geological Survey, Reston, Virginia, http://GeoChange.er.USGS.gov/pub/tools/metadata.
Author Information
Daniel O. Nelson
Associate Staff Geologist
Illinois State Geological Survey
615 East Peabody Drive
Champaign, Illinois 61821
USA
Telephone: 217-244-2513
Fax: 217-333-2830
Email: nelson@muck.isgs.uiuc.edu