David Hansen and Michael Sebhat

COMPILATION OF SPATIAL METADATA FOR ACCESS IN ARCVIEW AND MOSAIC


ABSTRACT

Metadata conforming to the Content Standards for Digital Geospatial Metadata is required for all GIS data under development by Federal agencies. This paper concerns the development of metadata conforming to these Standards for an on line data catalog and for data transfers by the Mid Pacific Region of the U.S. Bureau of Reclamation.

In this implementation, the Standards were examined to identify information that is common to multiple data sets, information that is stored by ArcInfo software, and information that usually must reference other documents. Information that is common to multiple geospatial data sets is written once to a metadata file, and this file is used as a reference for related geospatial data sets.

Information which is stored by ArcInfo software is written at the time a particular metadata file is prepared. Citations are prepared as part of the metadata for information that is contained in other documents or which require contacting other individuals or agencies. The metadata developer generates these files using AML and form menus to prepare simple text files. These files are hardware and software independent. The files are formatted for immediate use in MOSAIC and can be displayed in ArcView 2 for use as on line catalogs. They can also be concatenated to produce a metadata file for data transfer.

INTRODUCTION

This paper is about the integration of information describing geospatial data sets and software applications for viewing that information in an on-line catalog. It concerns a system in place using ArcInfo software for the GIS environment. MOSAIC and ArcView 2 are used as on line catalogs for data visualization and metadata display. Information about data or the description of data is metadata. The metadata described in this paper is based on the Federal Content Standards for Digital Geospatial Metadata (FGDC, 1994).

Content Standards for Geospatial Metadata

The Content Standards for Digital Geospatial Metadata (Standards) were adopted by the Federal Geographic Data Committee (FGDC) on June 8, 1994. These Standards with some additional characteristics are under review by ASTM as content specifications for digital geospatial metadata (ASTM, 1994). They provide common terminology for describing geospatial data and guidelines on information that should be provided by a data developer. Under Executive Order 12906 on Coordinating Geographic Data Acquisition and Access: The National Spatial Data Infrastructure, all Federal agencies must comply with these standards for any data sets under development since January 1995 (Clinton, 1994).

Sections of the Standards. A description of the implementation of the Standards requires an examination of the major sections of the Standards. The metadata required for a geospatial data set must contain information described in seven main sections. These are:

Identification Information
This is the basic description of the data set. It includes the purpose, the originator, and restrictions on access and use. The data theme and the area represented are described. Date of data set preparation is identified as well as the time period represented by the data set. Elements in this section identify software and hardware platforms.
Data Quality Information
This section is an assessment of data quality for both the spatial coordinates and the attributes. It includes a description of the data set lineage, logical consistency, and completeness as called for in the Spatial Data Transfer Standard (Department of Commerce, 1992).
Spatial Data Organization Information
This describes the mechanism used to represent the spatial features as described in SDTS. Basically, it identifies whether vector or raster format is used to represent the features. The number and type of objects representing the entities are identified.
Spatial Reference Information
This section describes the means of encoding the spatial coordinates (map projection) used to store the data set. It includes an estimate of the resolution of the stored coordinate values.
Entity and Attribute Information
This is the description of the entities or features represented by the data set and the attributes assigned to those entities. It includes the domain of values assigned to the attributes. For numeric domains, the units of measure, and the measurement resolution carried by the attributes are described. It also includes the time period represented by the data values where appropriate.
Distribution Information
This section describes the method for obtaining the digital data set and identifies formats for data transfer.
Metadata Reference Information
This section identifies who prepared the metadata or the description of the data set, the date of preparation, and the version of the Standards that was used in its preparation.

The Standards provide a comprehensive list of elements for describing a geospatial data set. Within these seven sections, three additional sections are referenced for citation of source documents, to identify contact individuals and agencies, and for stating time and date relationships. The ASTM version of these Standards includes assignment of a unique tag name and tag value for each element (ASTM, 1994). These tags are intended for use with Standard Generalized Markup Language (SGML) or database applications.

Not all elements identified are appropriate for a particular data set, so the Standards provide production rules for metadata that follow Yourdon syntax. This syntax identifies which elements are required and which are optional. It also identifies choices between groups of elements, so that only the characteristics appropriate for a given geospatial data set are described.
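The idea of production rules can be illustrated with a minimal Python sketch. It is not the notation used in the Standards; the element names, the rule set, and the function `check_record` are all hypothetical and serve only to show how required elements, optional elements, and choices between groups might be checked.

```python
# Hypothetical, greatly simplified rule set (not taken from the Standards).
REQUIRED = ["Originator", "Purpose"]
OPTIONAL = ["Supplemental Information"]
# Each tuple is a choice group: exactly one member is expected.
CHOICES = [("Vector", "Raster")]


def check_record(record, required, optional, choices):
    """Return a list of problems found in `record` (a dict of element -> value)."""
    problems = []
    for name in required:
        if not record.get(name):
            problems.append(f"missing required element: {name}")
    for group in choices:
        present = [name for name in group if record.get(name)]
        if len(present) != 1:
            problems.append(
                f"exactly one of {sorted(group)} expected, found {len(present)}"
            )
    allowed = set(required) | set(optional) | {n for g in choices for n in g}
    for name in record:
        if name not in allowed:
            problems.append(f"unknown element: {name}")
    return problems


if __name__ == "__main__":
    record = {"Originator": "MPGIS", "Vector": "polygons"}
    for problem in check_record(record, REQUIRED, OPTIONAL, CHOICES):
        print(problem)
```

A record that passes returns an empty list, so the same function can be used both as a check and as a report of what still needs to be filled in.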

IMPLEMENTATION OF THE STANDARDS

An examination of the sections of the Standards provides a useful starting point for identifying the information required for a geospatial data set. The implementation of the Standards is assisted by an examination of the methods and procedures in use by the particular organization. For this paper they will be the procedures followed by the GIS service center for the Mid Pacific Region of the U.S. Bureau of Reclamation (MPGIS). While the Standards require a complete set of required metadata elements to accompany a data transfer, much of this information would be duplicated and stored multiple times if stored with each data set. This includes:

Elements Common to Multiple Sets of Data

Some of the information identified in the Standards is common to multiple data sets. For MPGIS, the distribution information is the same for virtually all of the data sets. In addition, there are only a few projections used to store the geospatial data sets. The information required for these two sections can be described once in a file and then referred to as needed to complete a metadata description.

Information common within themes. The Standards recognize the existence of common geospatial data themes or data sets which cover the same or similar subjects. MPGIS has large blocks of geospatial data sets that have the same or similar characteristics based on theme. These include:

Data sets within the same theme can be expected to have descriptions which are the same or very similar for major sections of metadata including the identification, data quality, and entity and attribute sections. Data sets which are clearly related can be described by identifying a modal or representative data set which is fully described. The related data sets can then be referenced to it with the identification of differences between the data sets.
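The relationship between a modal data set and its related data sets can be sketched in Python as a base record plus recorded differences. The record layout, the data set names, and the element values below are hypothetical; the paper stores this information in text files rather than in a structure like this.

```python
def resolve(name, records):
    """Build the full description of `name` by overlaying its recorded
    differences on the modal data set it refers to (if any)."""
    record = records[name]
    modal_name = record.get("modal")
    if modal_name is None:
        # A modal (fully described) data set stands on its own.
        return dict(record["elements"])
    full = resolve(modal_name, records)
    full.update(record["elements"])  # differences override the modal values
    return full


# Hypothetical example: one fully described data set and one related set.
RECORDS = {
    "soils_north": {
        "modal": None,
        "elements": {
            "Data Set Name": "soils_north",
            "Theme": "Soils",
            "Source Scale": "1:24,000",
            "Polygons": 412,
        },
    },
    "soils_south": {
        "modal": "soils_north",
        "elements": {"Data Set Name": "soils_south", "Polygons": 57},
    },
}

if __name__ == "__main__":
    for key, value in resolve("soils_south", RECORDS).items():
        print(f"{key}: {value}")
```

Only the name and the object count are stored for the related data set; the theme and source scale are inherited from the modal description.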

Common data processing environment. The data processing environment and many of the processing steps are common for many geospatial data sets. MPGIS has established procedures for many of the initial data capture and processing steps which are dependent on the source of the original data, the method of digital capture, and on the purpose for developing the data set. Information on the standard procedures for capturing and automating the data can provide the basic information for the processing steps under the data set lineage. They also provide valuable information in assessing data quality for both the spatial coordinates of the entities and the attribute values. As these standard procedures are documented, they form the basis for a quality control / quality assurance program.

Elements Automatically Stored by ArcInfo Software

ArcInfo automatically stores some of the information required in the Standards. Nearly all of the information required for the section on spatial data organization is stored by ArcInfo and can be accessed directly by the software for either vector or raster data. This information is the number and type of objects representing the geographic features. These features are referred to as entities in the Standards. In addition, tolerances used in processing the geospatial coordinates, the bounding coordinates of the features, and the presence of topology are maintained by the software. The resolution of the stored coordinate values is also stored. However, the resolution of the source coordinate values is not stored.

Optionally, the ArcInfo user may store other information identified in the Standards. Map projection information may be automatically stored by the software. Where the PROJECTDEFINE command has been implemented or where the PROJECT command has been used, a PRJ file is written under the coverage name. Additionally, the complete machine processing steps for the data set lineage may be stored.
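Reading stored projection information into metadata elements might look like the following Python sketch. It assumes, as an illustration only, that the PRJ file is plain text with one keyword and value per line and a closing `Parameters` line; the sample content and the function `parse_prj` are hypothetical and not taken from the MPGIS routines, which are written in AML.

```python
# Hypothetical sample content, assumed for illustration only.
SAMPLE_PRJ = """Projection    UTM
Zone          10
Datum         NAD27
Units         METERS
Parameters
"""


def parse_prj(text):
    """Parse keyword/value lines into a dict, stopping at a 'Parameters' line."""
    result = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        key = parts[0].lower()
        if key == "parameters":
            break
        result[key] = parts[1].strip() if len(parts) > 1 else ""
    return result


if __name__ == "__main__":
    for key, value in parse_prj(SAMPLE_PRJ).items():
        print(f"{key}: {value}")
```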

Elements Described in Other Documents

MPGIS relies heavily on the citation and contact sections of the Standards for the sections on data quality information and on entity and attribute information. The citation of sources or contacts is critical for information that often is not part of the source materials used for the geospatial data set but that provides information about data quality or the attribute domain. Generally, MPGIS has only the original source material for comparison with the digital coordinate data and the attribute data unless other material is available for reference. An exception to this would be where measured values are involved, such as global positioning system (GPS) data or survey data.

This limits the ability of MPGIS to make any quantitative assessment of accuracy for either the attributes or positional accuracy of a geospatial data set. A description of the type, quality, and condition of the source document and the digital capture process is provided. This information includes:

  1. The map scale of the source information,
  2. The clarity and sharpness of the representation of the entities and attributes,
  3. The type of material used in the representation,
  4. The location and sharpness of the control point locations.

Other documentation from the source of the geospatial data generally must be relied on to make a meaningful quantitative accuracy assessment.

The domain of valid values and the resolution of numeric attribute values are important components of the entity and attribute information section. MPGIS can report the domain of attribute values captured from the source material and the resolution of those values as contained in the source materials. It also identifies any entities which have an unknown or unrepresented domain for an attribute value.
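Reporting values that fall outside a documented domain can be sketched in Python as follows. The domain layout and the function `outside_domain` are hypothetical; the sketch covers only range and enumerated domains and treats a missing value (`None`) as unrepresented.

```python
def outside_domain(values, domain):
    """Return the values that are missing or not covered by `domain`."""
    kind = domain["type"]
    if kind == "range":
        low, high = domain["min"], domain["max"]
        return [v for v in values if v is None or not (low <= v <= high)]
    if kind == "enumerated":
        return [v for v in values if v not in domain["values"]]
    raise ValueError(f"unsupported domain type: {kind}")


if __name__ == "__main__":
    # Hypothetical attribute domains and captured values.
    slope = {"type": "range", "min": 0, "max": 10}
    landuse = {"type": "enumerated", "values": {"A", "B"}}
    print(outside_domain([0, 5, None, 12], slope))
    print(outside_domain(["A", "X"], landuse))
```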

Often, the documentation supporting the source materials is not directly associated with the source data. This documentation consists of reports, map sheets, legends of the attribute domains, or files which describe the development of the source document. Where this material has been referenced or provided with the source material, it is cited in the metadata. Where the information is lacking, the source agency and contact person are provided in the metadata.

MPGIS APPLICATION OF THE STANDARDS

Basic Files

Information that is common to groups of digital geospatial data is stored once in text files, which can be updated and revised as conditions change. MPGIS finds it easiest to work with simple text files to store the metadata element information. These can be reviewed, updated, and have additional information appended to them as required. The files are software independent and can be immediately concatenated for inclusion in a data transfer.

For each geospatial data set that is documented, three separate files are prepared. These are:

  1. A file containing the Identification Information and Spatial Data Organization Information
  2. A file containing the Data Quality Information
  3. A file containing the Entity and Attribute Information.

The metadata reference information, that is, the identification of the person who prepared each text file, is written to these files. Each file is also stamped with the date and time of metadata preparation or review.

The two remaining major sections of the Standards are maintained in separate text files since they are largely the same for all of the geospatial data sets of MPGIS. These are the sections containing distribution information and spatial reference information. These five separate files serve as the basic information for:

For modal geospatial data sets or those representing related data sets, unique characteristics or differences are noted. Specifically, the number of objects (points, lines, polygons) is unique to each geospatial data set. The geospatial data set name or pathname will be different. In addition, the set of entities, the set of attributes, or the time period represented by the data set may be different. These differences are noted in the file containing the identification information. Where the set of attributes differs between the modal data set and the other data sets, these differences are reported with the entity and attribute information.

Additional information that is not identified in the Standards is also written to these files. This includes the identification of the project account. MPGIS works by project for accounting purposes, and most of the geospatial data sets are stored by project. To assist users of the MPGIS system in accessing the geospatial data sets and to provide them with ready access to the metadata, views of the data are prepared with ArcView 2 and stored under a views directory, and the metadata files are written to an overall metadata directory. Subdirectories under these overall directories are set up by project name. Optionally, the basic metadata files can be stored under the appropriate coverage name or under another directory.
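Concatenating the basic files into a single metadata file for data transfer can be sketched in Python as follows. The file names, the directory layout (shared files in the metadata directory, per data set files under a project subdirectory), and the ordering are assumptions made for illustration; the paper does not give the actual naming scheme.

```python
from pathlib import Path


def transfer_paths(metadata_dir, project, dataset):
    """Return the five basic files for one data set, in concatenation order.

    All file names here are hypothetical.
    """
    base = Path(metadata_dir)
    ds = base / project / dataset
    return [
        ds / "ident_spdo.txt",            # identification + spatial data organization
        ds / "dataqual.txt",              # data quality
        base / "spatial_reference.txt",   # shared by many data sets
        ds / "entity_attr.txt",           # entity and attribute information
        base / "distribution.txt",        # shared by many data sets
    ]


def concatenate(paths, out_path):
    """Write the contents of `paths` to `out_path`, one file after another."""
    with open(out_path, "w") as out:
        for path in paths:
            text = Path(path).read_text()
            # Make sure each file ends with a newline so sections do not run together.
            out.write(text if text.endswith("\n") else text + "\n")
```

A call such as `concatenate(transfer_paths("metadata", "demo_project", "soils"), "soils_transfer.txt")` would then produce one file for the transfer, leaving the shared spatial reference and distribution files untouched.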

AML Routines for Metadata File Preparation

These files are prepared with a set of program routines written in Arc Macro Language (AML). The routines also write embedded HTML and SGML statements for display and hypertext linking with MOSAIC. The routines use the element tag name as identified in the ASTM version of the Standards for variable names (ASTM, 1994). These tag names are also written to the text files for future use in query operations. The routines use a series of form menus to capture basic information about the data set. A database of key information about MPGIS is stored and maintained in INFO for use in the routines. At present this only consists of major data themes recognized by MPGIS, contact persons with MPGIS, and MPGIS projects.
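Writing tagged elements to a text file that a browser can display, and later querying the file by tag name, can be sketched in Python as follows. The MPGIS routines are written in AML; this sketch only illustrates the idea. The tag names, the `tag: value` line layout, and both functions are hypothetical.

```python
from html import escape, unescape


def render_metadata(title, elements):
    """Return a minimal HTML page holding one 'tag: value' line per element."""
    lines = [
        "<html>",
        f"<head><title>{escape(title)}</title></head>",
        "<body>",
        f"<h1>{escape(title)}</h1>",
        "<pre>",
    ]
    for tag, value in elements:
        lines.append(f"{escape(tag)}: {escape(value)}")
    lines += ["</pre>", "</body>", "</html>"]
    return "\n".join(lines) + "\n"


def find_values(page, tag):
    """Return the values recorded under `tag` in a page made by render_metadata."""
    prefix = tag + ": "
    return [
        unescape(line[len(prefix):])
        for line in page.splitlines()
        if line.startswith(prefix)
    ]


if __name__ == "__main__":
    # Hypothetical tag names and values.
    page = render_metadata(
        "soils_north",
        [("origin", "MPGIS"), ("purpose", "Water & land planning")],
    )
    print(page)
    print(find_values(page, "purpose"))
```

Keeping the tag name on the same line as its value is what makes the later query a simple text search.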


Figure 1: Initial form menu for metadata preparation


Information called for in the Standards that is captured from the form menus includes:

  1. The data set description including status and access restrictions
  2. Sources and contacts for the spatial coordinates and attributes with a citation for the sources
  3. Type and condition of source material
  4. Methods of digital capture for the spatial coordinates and attributes
  5. Methods of verification of the digital representation of the entities and the attributes
  6. Identification of modal data sets, or data sets that have characteristics that are representative of several data sets
  7. MPGIS individual who serves as a contact for the data set
  8. The type of domain of the attribute values (Range, Enumerated, or Codeset Domain)
  9. The author and date of the metadata, and a record of any updates

In addition to the elements called for in the Standards, additional characteristics are described for management of the various geospatial data sets. These include:

  1. The project name or account under which the data set was developed
  2. The state of the data set as to whether it is a copy of another data set or an original.
  3. Condition of the data set as to whether it is active and on line or archived to tape.
  4. Official record status as to whether the data set represents a Federal record

The routines then prepare the set of three files and include information that is automatically stored by ArcInfo to complete the basic files. Automatically stored information includes:

  1. The pathname for the geospatial data set
  2. Type and number of objects generated by the GIS software to represent the entities
  3. The topological status of the geospatial data set
  4. Number of tics used as coordinate control
  5. Precision and units of stored coordinate values
  6. Resolution of the X and Y coordinate values as set in processing the data set
  7. Attribute names and definitions

The resulting files are about two pages long. They can then be edited to include additional information that is not easily treated in the menus. This includes providing more data quality information such as special data processing steps, logical consistency, and completeness of the geospatial data set. In the entity and attribute information section, the domain of valid values for the attributes and, where appropriate, the units of measure and measurement resolution are identified.

For modal geospatial data sets, the names and/or pathnames of the related geospatial data sets are entered and the differences in characteristics noted. Generally, these differences are added to the identification information. Where there are differences in the attributes of data sets, these characteristics are also entered in the entity and attribute information.

INTERACTIVE DISPLAY OF METADATA IN MOSAIC AND ARCVIEW

At the time that these files are written, a file identifying the metadata files is prepared with hypertext links to the separate metadata files. This file is an index to the geospatial data sets that have been described for a project area. Together with an overall file describing the projects, it serves as the initial guide for users of the MPGIS system. These files and views of the projects identify the available geospatial data sets using ArcView 2 and MOSAIC. The files have embedded SGML statements that set up the text file for viewing in MOSAIC and HTML statements for linking the separate metadata files. All of these files can then be called up either in MOSAIC from the MPGIS home page or with hotlinks from the views set up for that project area. MOSAIC is an Internet information browser for the World Wide Web developed by the National Center for Supercomputing Applications at the University of Illinois.
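Generating a project index with hypertext links to the separate metadata files can be sketched in Python as follows. The page layout, the file paths, and the function `project_index` are hypothetical; the MPGIS routines produce their index files from AML.

```python
from html import escape


def project_index(project, datasets):
    """Return an HTML index page for one project.

    `datasets` maps a data set name to a list of (label, relative path) pairs,
    one pair per metadata file.
    """
    lines = [
        "<html>",
        f"<head><title>{escape(project)}</title></head>",
        "<body>",
        f"<h1>{escape(project)}</h1>",
        "<ul>",
    ]
    for name, files in datasets.items():
        links = ", ".join(
            f'<a href="{escape(path)}">{escape(label)}</a>' for label, path in files
        )
        lines.append(f"<li>{escape(name)}: {links}</li>")
    lines += ["</ul>", "</body>", "</html>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical project, data set, and file names.
    print(project_index(
        "Demo Project",
        {"soils": [("Identification", "soils/ident_spdo.html"),
                   ("Data Quality", "soils/dataqual.html")]},
    ))
```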


Figure 2: A project page in MOSAIC with links to metadata


MPGIS uses MOSAIC for interactive query and display of information about MPGIS activities, procedures, and available geospatial data sets. This includes the U.S. Bureau of Reclamation's implementation plan for the National Spatial Data Infrastructure and the Federal Geospatial Data Clearinghouse. The MPGIS home page has links to the metadata directory for descriptions of data that is both on line and archived for storage. Users of the MPGIS system can use MOSAIC to browse through descriptions and graphic images of the available data.

ArcView 2 is used by MPGIS to familiarize the user of the MPGIS system with the basic characteristics of the geospatial data sets. The spatial domain and the entity and attribute structure, including the domain of attribute values, are readily apparent in an ArcView display. The metadata environment is further enhanced by using ArcView 2 hotlinks within an ArcView 2 project file specifically designed to display files of metadata.

The project file simply contains a view with a set of boxes with theme text labels. At MPGIS, this project file is given the name DATAFINDER. The user simply applies the hotlink to a given theme box and text files describing the network paths of where the data resides and other summarized metadata are displayed. The project file is kept in the public view area and can be imported into any other project files the user is viewing. The boxes used in the DATAFINDER view are nothing more than a generated mesh polygon coverage that has a PAT with the paths of the summarized metadata files. With the advent of Avenue the hotlink can be tied directly to the MOSAIC metadata files to allow the user to access the MPGIS comprehensive metadata library.
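The structure behind such a view, a mesh of boxes whose attribute records carry the paths of metadata files, can be sketched in Python as follows. This is only an illustration of the lookup idea, not ArcView or Avenue code; the box layout, theme labels, file paths, and both functions are hypothetical.

```python
def build_mesh(themes, columns, size=1.0):
    """Lay out one square box per theme, row by row from the top left.

    `themes` is a list of (label, metadata path) pairs. Each box record plays
    the role of a polygon with its attribute table entry.
    """
    boxes = []
    for index, (label, path) in enumerate(themes):
        row, col = divmod(index, columns)
        boxes.append({
            "label": label,
            "path": path,
            "xmin": col * size,
            "xmax": (col + 1) * size,
            "ymin": -(row + 1) * size,
            "ymax": -row * size,
        })
    return boxes


def hotlink(boxes, x, y):
    """Return the metadata path of the box containing (x, y), or None."""
    for box in boxes:
        if box["xmin"] <= x < box["xmax"] and box["ymin"] <= y < box["ymax"]:
            return box["path"]
    return None


if __name__ == "__main__":
    mesh = build_mesh(
        [("Soils", "/meta/soils.txt"),
         ("Hydrography", "/meta/hydro.txt"),
         ("Land Use", "/meta/landuse.txt")],
        columns=2,
    )
    print(hotlink(mesh, 1.5, -0.5))
```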


Figure 3: ArcView display of a project with Finder and hotlink display to metadata


SUMMARY

The implementation of the Content Standards for Digital Geospatial Metadata (FGDC, 1994) is assisted by an examination of the systems and procedures used by an organization. This identifies information that is common to multiple sets of data, information that is stored by the software system, and information that must be referenced. Information that is common to multiple sets of geospatial data can then be described once. Identification and documentation of common data processing steps supports the quality control / quality assurance program. That documentation can be referenced as part of the data set lineage. Machine-stored information can be accessed and written to a metadata file as needed. The developer of the metadata can then focus on information that differs between the data sets and on the identification of documents that need to be referenced in the metadata.

MPGIS has selected simple text files for storing and maintaining the metadata on the geospatial data sets. These files are software independent and can easily be transferred between software and hardware systems. They are easily updated and can be concatenated to form a complete metadata description for data transfers. The files are generated using Arc Macro Language and form menus for data input and verification. Information that is stored by ArcInfo is then easily accessed and inserted into the text files as needed.

These text files are integrated into the systems used by MPGIS as the on line catalog of geospatial information for MPGIS system users. MPGIS uses ArcView 2 and MOSAIC as the catalog tools for displaying the geospatial data and associated metadata. SGML and HTML statements are embedded in these files for viewing in MOSAIC. In ArcView 2, the metadata files can then be accessed with the MPGIS FINDER and individual metadata files can be displayed using Avenue scripts. This application permits the development of uniform metadata for both data transfer and the on-line catalogs.

REFERENCES

ASTM D18.01-Z3947Z, 1994. Draft Content Specifications for Digital Geospatial Metadata. Philadelphia, PA:1995

Clinton, William, 1994. Coordinating Geographic Data Acquisition and Access: The National Spatial Data Infrastructure. The White House, Washington D.C.:April 11, 1994. Executive Order 12906

Department of Commerce, 1992. Spatial Data Transfer Standard (SDTS). Washington, D.C.: National Institute of Standards and Technology, 1992. Federal Information Processing Standard 173

Federal Geographic Data Committee. 1994. Content Standards for Digital Geospatial Metadata. Washington, D.C.:June 8, 1994


David Hansen
GIS Specialist
Michael Sebhat
GIS System Administrator
MPGIS
U.S. Bureau of Reclamation
2800 Cottage Way
Sacramento CA 95825-1898
Telephone: (916) 979-2418
Fax: (916) 979-2450
E-mail: dhansen@mpgis7.mp.usbr.gov