David Hansen

Spatial Data Coordinate Quality and the ArcInfo Data Model

ABSTRACT: The Spatial Data Transfer Standard and the Content Standards for Digital Geospatial Metadata call for a data quality report to accompany GIS data sets. Data quality of GIS data sets is often difficult to assess. The data quality report specified in these standards provides an initial basis for assessing the quality of the spatial components of a GIS data set. This report contains a description of the lineage of the GIS data set and an assessment of the positional accuracy of the coordinate values. The standards do not include elements that describe an error estimate at a stated confidence interval for the values, or the precision to which the values are carried.

The source of the spatial features and the digital capture method provide an initial estimate of the resolution of the spatial features, the accuracy of the coordinate values, and the precision of those values. Increasingly, GIS data sets are developed from measured or calculated coordinate values that carry a known precision and an error estimate with a stated confidence interval. Many data sets are not static but are updated with revised information on feature location. Increasingly, these updates include more accurate coordinate information than was available from the initial source document.

This information represents feature or entity level information that can and should be carried with the GIS data set. Much as perimeter and area are stored as attributes of a GIS data set, these characteristics of the features provide information that can be used in estimating the range of reported values of distance, length, and area calculations.

Presently, the ArcInfo data model stores information that applies to the extent of the coverage and information that is system dependent. This includes the precision used for coordinate storage, the tolerances used in processing the coverage, and the log files. Coordinate values with error estimates are lineage information that should be captured from the sources of the features. As conventions develop for describing and handling this information, the GIS software can begin to use it in data processing. This will assist in evaluating the results of GIS analysis and in the development of error models for GIS processing.


INTRODUCTION

This paper concerns the data quality of the spatial features or components of GIS data sets. The Spatial Data Transfer Standard (SDTS) requires a data quality report to accompany the transfer of digital spatial data. Components of this data quality report are further defined as elements in the data quality section of the FGDC Content Standards for Digital Geospatial Metadata. These documents provide a basis for identifying characteristics of GIS data sets that are useful both in managing GIS data sets and in describing their quality. While the focus of this paper is on the data quality of the spatial features, some of the material also relates to the attribute portion of GIS data sets.

Data Quality in the Context of the Content Standards for Digital Geospatial Metadata

The section of the Content Standards on data quality contains most of the elements that are discussed.

Data Quality section
Attribute Accuracy
Elements describing the table components of the GIS data set. This section parallels the section on positional accuracy.
Logical Consistency Report
This is a description of the fidelity of relationships in the data set and the test used. It has been interpreted in various ways by different investigators. The logical consistency report may focus on the operation of the GIS software in maintaining the data structure. It may focus on the consistent reproduction of results in a GIS analysis. The focus may be on consistent representation of the same or similar spatial features. For the data set, it is implied that construction rules were applied uniformly over the entire extent of the data set.
Completeness Report
This report basically identifies the criteria or rules used in the development of the data set. Implied in the selection criteria is the resolution at which features were captured for the data set.
Positional Accuracy
A report on the horizontal and, where applicable, the vertical accuracy of the spatial data set. The emphasis is on the accuracy of the coordinate values in X, Y, and Z. Error estimates with confidence intervals for these coordinate values are not explicitly described as elements, nor is the precision of the coordinate values. Other dimensions such as position, area, length, or orientation of the spatial features are not explicitly identified.
Lineage
This section describes the sources used in the development of the data set and its processing history. It is the section that provides much of the supporting information for the positional accuracy report.
  • Source Citation
  • Source Scale Denominator
  • Type of Source Media
  • Source Time Period of Content
  • Process Description
  • Process Date
  • Process Time
  • Process Contact
In addition to the information called for in this section, the spatial reference information contains key parameters identifying the coordinate system or map projection and the resolution of those coordinates. As pointed out by Buttenfield, this information provides an initial basis for evaluating the data quality of a spatial data set for an application. It follows concepts used in developing cartographic products and in the National Map Accuracy Standard. However, it falls short of providing the information necessary to model error propagation when a data set is used in GIS analysis with other data sets, or to assess the accuracy of the values reported by the GIS.

Basis for Assessing Coordinate Data Quality

Currently, where the data set coordinates cannot be measured against real world coordinates, the primary basis for a report of positional accuracy is the lineage report. In this report, an assessment can be made of the resolution and accuracy of the coordinates from the source documents and of the processing steps followed to develop the digital representation of the features.

Role of the Source Document in Coordinate Quality of GIS Features

The resolution and accuracy of the coordinates in the source document for the features in a GIS data set are usually the greatest source of error in the stored coordinate values (Goodchild et al.). The resolution of digitizing tablets or scanning devices is much finer than the symbolization that can be used on source maps. The precision of stored coordinate values in a GIS is generally far greater than the precision of the values that can be captured from digital systems such as GPS devices. Major sources of GIS data sets are:

Maps containing the features to be captured
The resolution of the spatial features may be described in an accompanying report or may be inferred based on the scale of the map and the symbolization used to represent the feature.
Surveys of bearings and distances
Accuracy of the bearing and/or distances to the survey points should be available. This will be reported for individual coordinates and bearings. An error estimate at a stated confidence interval should be available. For current surveys, this is often available in a data base or digital form.
Coordinate values captured via photogrammetry
This should contain an estimate of accuracy for the values captured based on the procedures followed.
Global Positioning Systems (GPS)
An estimate of the accuracy of recorded coordinates should be provided along with the stream of coordinate values.
Other digital data sets
These could be other GIS data sets, remote sensing data, or images, all of which should include an assessment of coordinate resolution and accuracy.

Where the source is a map, the map compilation procedures determine the resolution at which the spatial features are displayed on the map. Where this is not described, the source map scale and the line width or size of symbols used on the map serve as surrogates for estimating accuracy in the displayed representation of the features.
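As a rough illustration of the surrogate described above, the ground distance covered by a map symbol can be computed from its drawn width and the scale denominator. This is a minimal sketch; the function name and the 0.5 mm line width are assumptions chosen for the example, not values taken from any particular map series.

```python
def ground_width_m(symbol_width_mm: float, scale_denominator: int) -> float:
    """Ground distance in meters covered by a symbol drawn at the given width.

    symbol_width_mm is the width on the map sheet; scale_denominator is the
    N in a map scale of 1:N.
    """
    if symbol_width_mm < 0 or scale_denominator <= 0:
        raise ValueError("width must be non-negative and scale positive")
    return symbol_width_mm * scale_denominator / 1000.0


# The same drawn line represents a wider band on the ground as the scale
# denominator grows (hypothetical 0.5 mm line width).
for scale in (24_000, 100_000, 250_000):
    print(f"1:{scale}: {ground_width_m(0.5, scale):.1f} m")
```

A 0.5 mm line on a 1:24,000 map covers 12 m on the ground, so a feature digitized from that line cannot reasonably be assumed to be located more finely than that, whatever the precision of the stored coordinates.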

For GIS data sets that are constructed from survey data, photogrammetry, or GPS data, highly accurate coordinate information can be obtained. However, the estimated or calculated error must be carried with the survey data at a stated confidence interval. In addition to coordinate values, additional dimensions of the spatial features being represented such as area, orientation, and distance may be available.
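To make the idea of an error estimate "at a stated confidence interval" concrete, the sketch below converts a standard error for a single coordinate axis into the half-width of a two-sided interval. It assumes the error is normally distributed and unbiased, which the source data may or may not satisfy; the function name is hypothetical.

```python
from statistics import NormalDist


def confidence_half_width(std_error: float, confidence: float = 0.95) -> float:
    """Half-width of a two-sided confidence interval for one coordinate axis.

    Assumes a zero-mean, normally distributed error with the given standard
    error. The result is in the same units as std_error.
    """
    if std_error < 0:
        raise ValueError("std_error must be non-negative")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided interval: put half of the excluded probability in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return z * std_error


# A survey coordinate with a 0.02 m standard error, reported at two levels.
for level in (0.90, 0.95):
    print(level, round(confidence_half_width(0.02, level), 4))
```

The same measured value therefore yields different reported error figures depending on the confidence level, which is why the level has to be carried along with the estimate.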

Coordinate Processing

Significant additional errors can be introduced in the digitizing process by the placement of tics or coordinate control points. Control points are selected where the earth coordinates are accurately known and are also clearly located on the source map or in the source files of coordinate values. The transformation of the digitized coordinates to another coordinate system will introduce additional error that depends on the accuracy of the control points. This error will not be uniform over the spatial extent of the data set.

In a GIS environment, new data sets are often generated by processing with other GIS data sets. Lanter models this progression to the end results of a GIS analysis in his lineage approach for managing GIS applications. It is a useful tool for maintaining and updating GIS data bases. These other data sets will often have different resolutions for the spatial features shown and different accuracies associated with their coordinate values. Durgin identifies some of the problems inherent when GIS processing includes map derived data with data sets developed from measurement based systems such as surveys.

As pointed out by Moellering, few GIS data sets are entirely static; most are updated and revised over time or as new information becomes available. Data set lineage can become quite complex as a GIS data set incorporates this new information. Changes in coordinate locations may be due to boundary changes, more accurate coordinate information about location, the abandonment or construction of features, or geologic events. Often this affects only a portion of the features in a GIS data set.

All of this represents additional information that can be carried forward in describing the data set. Measured or calculated values of coordinate position and other dimensions provide important information in assessing data quality. Quantifying the variation in the quality of the spatial features in a spatial data set is beginning to be formally described. It will require an error estimate with a confidence interval and the level of precision for the values. This provides a basis for estimating the accuracy of the GIS data set in representing the location, area, and length of the feature. The effects of running an analysis with data sets that have different spatial resolutions on the resulting spatial features are more difficult to quantify (Buttenfield).

Generally, this information has been stored by identifying the source of the new information and a citation that describes it. It has not been carried down to the individual spatial features themselves. Information on the measurements, including their source, resolution, accuracy, and error, represents attributes of the spatial features.
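One way to picture these feature-level attributes is as a small record attached to each feature. The sketch below is illustrative only: every class and field name is hypothetical, and in ArcInfo the equivalent would be user-defined attribute items rather than anything supplied by the software.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CoordinateQuality:
    """Feature-level lineage and quality attributes (all names hypothetical)."""
    source: str                  # e.g. "map", "survey", "GPS"
    source_scale: Optional[int]  # scale denominator; None if not map derived
    error_estimate_m: float      # error estimate for the coordinates, meters
    confidence_level: float      # confidence level of that estimate, e.g. 0.95
    precision_m: float           # unit to which coordinate values are carried


@dataclass
class Feature:
    feature_id: int
    coords: List[Tuple[float, float]]
    quality: CoordinateQuality


def weakest_quality(features: List[Feature]) -> CoordinateQuality:
    """Return the quality record with the largest error estimate.

    Only meaningful when all estimates share one confidence level.
    """
    levels = {f.quality.confidence_level for f in features}
    if len(levels) != 1:
        raise ValueError("error estimates use different confidence levels")
    return max((f.quality for f in features), key=lambda q: q.error_estimate_m)


# Two arcs in one data set: one digitized from a map, one from a survey.
arcs = [
    Feature(1, [(0.0, 0.0), (50.0, 0.0)],
            CoordinateQuality("map", 24_000, 12.0, 0.95, 0.1)),
    Feature(2, [(50.0, 0.0), (50.0, 80.0)],
            CoordinateQuality("survey", None, 0.05, 0.95, 0.001)),
]
print(weakest_quality(arcs).source)  # prints "map"
```

Keeping the record per feature, rather than per data set, is what allows a data set that mixes map-derived and surveyed features to report which of them limits the accuracy of a result.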

The ArcInfo Data Model and Coordinate Data Quality

There are several basic items within the ArcInfo data model that provide supporting information for assessing coordinate data quality.

TIC File
Identifies the control points used in the registration of the data set.
RMS Report
Report on the Root Mean Square error of the TIC locations when a data set is transformed from one coordinate system to another coordinate system.
Log File
Time stamped processing steps of the data set identifying the person doing the processing and the command executed.
Data Set Precision
Precision of the stored coordinate values.
Fuzzy Tolerance
Resolution allowed between coordinates and whether that distance has been used in processing as the fuzzy tolerance.
Dangle Distance
For coverages containing arcs, identifies if a dangle distance has been set and used in processing.

In the ArcInfo data model, adjustments may affect the overall coordinate system or the coordinates of the individual features themselves. An example of an overall adjustment is the linear transformation of a GIS data set from one coordinate system into another. The tics that have been identified for the GIS data set act as the control for this transformation. The coordinate values for these tics in the old and the new coordinate systems determine the coefficients of the equation used to carry out the transformation. ArcInfo reports the root mean square (RMS) error of the displacement between the tic locations in the source material and the tic locations in the new coordinate system. While this report is not automatically stored by ArcInfo, the tics are a component of both the source data set and the resulting data set.
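The relationship between tics, transformation coefficients, and the RMS error can be sketched as a least-squares fit. The code below assumes a six-parameter affine form and computes the RMS error as the root of the mean squared tic displacement in output units; it illustrates the idea and is not a reproduction of ArcInfo's own computation. All function names and tic values are hypothetical.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("tics are collinear or coincident")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (a[r][3] - s) / a[r][r]
    return x


def fit_affine(src: Sequence[Point], dst: Sequence[Point]) -> Affine:
    """Least-squares fit of x' = a*x + b*y + c and y' = d*x + e*y + f."""
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least three matching tics")
    rows = [(x, y, 1.0) for x, y in src]
    # Normal equations (A^T A) p = A^T b, solved once per output axis.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    solved = []
    for axis in (0, 1):
        atb = [sum(r[i] * d[axis] for r, d in zip(rows, dst)) for i in range(3)]
        solved.append(tuple(_solve3(ata, atb)))
    return solved[0], solved[1]


def apply_affine(params: Affine, p: Point) -> Point:
    (a, b, c), (d, e, f) = params
    return a * p[0] + b * p[1] + c, d * p[0] + e * p[1] + f


def rms_error(params: Affine, src: Sequence[Point], dst: Sequence[Point]) -> float:
    """Root mean square distance between transformed tics and their targets."""
    total = 0.0
    for s, t in zip(src, dst):
        x, y = apply_affine(params, s)
        total += (x - t[0]) ** 2 + (y - t[1]) ** 2
    return sqrt(total / len(src))


# Four tics in digitizer units and their (slightly inconsistent) targets.
tics_src = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
tics_dst = [(100.0, 200.0), (120.1, 200.0), (120.0, 220.0), (100.0, 219.9)]
params = fit_affine(tics_src, tics_dst)
print(round(rms_error(params, tics_src, tics_dst), 4))
```

With only three tics the affine fit is exact and the RMS error is zero regardless of how well the tics were placed, so a fourth or further tic is needed before the figure says anything about registration quality. With large real-world coordinate values, the normal equations can become poorly conditioned; shifting coordinates to a local origin before fitting is a common precaution.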

Once the transformation is made from the source coordinate system to another coordinate system, additional adjustments may be needed for individual features to better represent the "true" location of the data set.

Within a project area, a base data set usually serves as the control base to which other data are adjusted. In ArcInfo, adjustment of individual features takes place after the transformation to a new coordinate system. This rubber sheet process requires setting up links between features in the GIS data set and some control base. This control base may have been embedded in the source material, such as a digital orthoquad. It may be a data set that was captured at a greater resolution. Increasingly, for many areas, it is a survey control base. Such a base offers the opportunity to compare values measured or calculated on the ground to the GIS stored values that have been carried into a map projection system.

Current and Future Developments in Coordinate Data Quality

Any estimate of the accuracy of the coordinate values of a feature requires some estimate of the true position of that feature. In the past, this meant comparing the GIS representation of the feature to the original source data set or to a map of greater accuracy. Now, many more measurements of the true location of features are increasingly available. These have been used to replace or update existing GIS data sets.

Increasingly, data sets will contain features whose coordinates differ in accuracy. As procedures for measuring and displaying those differences improve, the differences will be carried forward as part of GIS data set manipulation and analysis. Additional information on the measured or calculated values for the features will also become available.

The data structure will become more comprehensive with the ability to directly link the GIS feature coordinates to data bases of available measured coordinates, distances, areas, elevations, and other dimensions. This information will provide a basis for producing an error estimate for the GIS reported coordinate values, area and length calculations.
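As a hedged sketch of how such an error estimate for length and area might be produced, the code below applies first-order error propagation. It assumes every X and Y value has an independent error with the same standard deviation, which is a simplification; real data sets often have correlated or varying errors. The function names are hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple

Point = Tuple[float, float]


def segment_length_sigma(sigma: float) -> float:
    """Standard deviation of the length of a two-point segment.

    First-order result when each endpoint coordinate has independent error
    sigma: the variance of the length is 2 * sigma**2, whatever the length.
    """
    return sqrt(2.0) * sigma


def polygon_area(ring: Sequence[Point]) -> float:
    """Signed area of a ring given without a repeated closing vertex."""
    n = len(ring)
    return 0.5 * sum(
        ring[i][0] * (ring[(i + 1) % n][1] - ring[i - 1][1]) for i in range(n)
    )


def polygon_area_sigma(ring: Sequence[Point], sigma: float) -> float:
    """First-order standard deviation of the polygon area.

    The partial derivatives of the area with respect to vertex i are
    (y[i+1] - y[i-1]) / 2 and (x[i-1] - x[i+1]) / 2, so the variance is
    sigma**2 / 4 times the sum of squared distances between each vertex's
    two neighbors.
    """
    n = len(ring)
    total = 0.0
    for i in range(n):
        xp, yp = ring[i - 1]
        xn, yn = ring[(i + 1) % n]
        total += (xn - xp) ** 2 + (yn - yp) ** 2
    return 0.5 * sigma * sqrt(total)


# A 100 m square whose vertex coordinates each carry a 1 m standard error.
square = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
print(polygon_area(square))
print(round(polygon_area_sigma(square, 1.0), 2))
```

For this square, the reported area of 10,000 square meters carries a standard deviation of about 141 square meters under these assumptions. An estimate of this kind requires only the per-feature error values discussed above, which is one reason to store them with the features.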

SUMMARY

The FGDC Content Standards for Digital Geospatial Metadata requires a data quality report as called for in the SDTS. This report provides an opportunity to make a quantitative assessment of the positional accuracy of the features represented in a GIS data set. Such an assessment must rely heavily on the description of the data set lineage. This lineage identifies the sources used in developing the data set and the processing steps applied to them.

At present, the report of coordinate quality has generally been limited. The accuracy of the coordinate values captured from the source document has often not been described or has been very limited. Increasingly, more accurate coordinate values are becoming available. These will include an error estimate for the value at a stated confidence level. Some sources can provide field measured or calculated values of the spatial features such as length and area.

This information represents additional attribute information that should be stored with the spatial feature. It serves as basic information for the features and provides a basis for estimating the accuracy of the GIS reported values for distance and area calculations. This represents a more robust model of the information that should be carried as part of a GIS data set. Much of it is attribute information to be stored with individual features. In the context of the Content Standards for Digital Geospatial Metadata, it represents lineage information from the sources of the GIS spatial features. This information has not been formally identified as elements in these standards. In the ArcInfo data model, it represents additional attribute information that must initially be user defined. As conventions develop for addressing and storing this information, it can be incorporated in GIS processing to assist in evaluating the results of GIS analysis. Error models of the coordinate representations can then use this information in GIS processing.

REFERENCES

Buttenfield, Barbara P. Cartographica, Vol. 30, No. 2-3, Summer-Autumn 1993.

Department of Commerce. Spatial Data Transfer Standard. National Institute of Standards and Technology, FIPS 173. Washington, D.C., 1992.

Durgin, Paul M. Measurement Based Databases: One Approach to the Integration of Survey and GIS Cadastral Data. Surveying and Land Information Systems, Vol. 53, No. 1, 1993.

E.S.R.I. Metadata Management in GIS. Esri White Paper Series. Redlands, CA, August 1995.

FGDC. Content Standards for Digital Geospatial Metadata. Washington, D.C., June 8, 1994.

Goodchild, M. F., Davis, F. W., Painho, M., Stoms, D. M. The Use of Vegetation Maps and Geographic Information Systems for Assessing Conifer Lands in California. Report prepared for the Forest and Rangeland Resources Assessment Program (FRRAP), California Department of Forestry and Fire Protection, August 1991.

Hansen, David and Michael Sebhat. Compilation of Spatial Metadata for Access in ArcView and Mosaic. Paper presented at the 1995 E.S.R.I. User Conference, Palm Springs, CA, May 1995.

Lanter, David P. A Lineage Metadata Approach to Removing Redundancy and Propagating Updates in a GIS Database. Cartography and Geographic Information Systems, Vol. 21, No. 2, 1994.

Moellering, Harold. Continuing Research Needs Resulting from the SDTS Development Effort. Cartography and Geographic Information Systems, Vol. 21, No. 3, 1994.


David Hansen
GIS Specialist
MPGIS
U.S. Bureau of Reclamation
2800 Cottage Way
Sacramento, CA 95825-1898
Telephone:(916) 979-2418
Fax: (916) 979-2505
Email: dhansen@mpgis7.mp.usbr.gov