The source of the spatial features and the digital capture method provides an initial estimate of the resolution of the spatial features, the accuracy of the coordinate values, and the precision of those values. Increasingly, GIS data sets are developed with measured or calculated coordinate values with known precision for those values and an error estimate with a stated confidence interval. Many data sets are not static but are updated with revised information on feature location. Increasingly, these updates include more accurate coordinate information than was available from the initial source document.
This information represents feature or entity level information that can and should be carried with the GIS data set. Much as perimeter and area are stored as attributes of a GIS data set, these characteristics of the features provide information that can be used in estimating the range of reported values of distance, length, and area calculations.
Presently the ArcInfo data model stores information that applies to the full extent of the coverage and information that is system dependent. This includes the precision used for coordinate storage, the tolerances used in processing the coverage, and the log files. Coordinate values with error estimates are lineage information that should be captured from the sources of the features. As conventions develop for describing and handling this information, the GIS software can begin to use it in data processing. This will assist in evaluating the results of GIS analysis and in the development of error models for GIS processing.
This paper concerns the data quality of the spatial features or components of GIS data sets. The Spatial Data Transfer Standard (SDTS) requires a data quality report to accompany the transfer of digital spatial data. Components of this data quality report are further defined as elements in the FGDC Content Standards for Digital Geospatial Metadata as part of the data quality section. These documents provide a basis for identifying characteristics of GIS data sets that are useful both in managing GIS data sets and in describing their quality. While the focus of this paper is on the data quality of the spatial features, some of the material also relates to the attribute portion of the GIS data sets.
The section of the Content Standards on data quality contains most of the elements that are discussed.
The resolution and accuracy of the coordinates in the source document for the features in a GIS data set are usually the greatest source of error in the stored coordinate values (Goodchild et al.). The resolution of digitizing tablets or scanning devices is much finer than the symbolization that can be used on source maps. The precision of stored coordinate values in a GIS is generally far greater than the precision of the values that can be captured from digital systems such as GPS devices. Major sources of GIS data sets are:
Where the source is a map, the map compilation procedures determine the resolution at which the spatial features are displayed on the map. Where this is not described, the source map scale and the line width or size of symbols used on the map serve as surrogates for estimating accuracy in the displayed representation of the features.
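As a rough illustration of the surrogate described above, the ground distance covered by a map symbol is its width on the sheet multiplied by the scale denominator. The sketch below is only that arithmetic. The function name and the 0.5 mm line width are assumptions chosen for illustration and do not come from any particular source map.

```python
def ground_width_m(line_width_mm: float, scale_denominator: float) -> float:
    """Ground distance in metres covered by a map symbol of the given width."""
    # Multiply before dividing so typical inputs stay exact in floating point.
    return line_width_mm * scale_denominator / 1000.0


if __name__ == "__main__":
    # Hypothetical 0.5 mm line drawn at three map scales.
    for scale in (24_000, 100_000, 250_000):
        width = ground_width_m(0.5, scale)
        print(f"1:{scale:,}  ->  about {width:.1f} m on the ground")
```

At 1:24,000 a 0.5 mm line covers 12 m on the ground, so a feature digitized from such a line cannot reasonably be located more closely than that, whatever the resolution of the digitizing tablet.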
For GIS data sets that are constructed from survey data, photogrammetry, or GPS data, highly accurate coordinate information can be obtained. However, the estimated or calculated error must be carried with the survey data at a stated confidence interval. In addition to coordinate values, additional dimensions of the spatial features being represented such as area, orientation, and distance may be available.
Significant additional errors can be introduced in the digitizing process by the placement of tics or coordinate control points. Control points are selected where the earth coordinates are accurately known and are also clearly located on the source map or in the source files of coordinate values. Transforming the digitized coordinates to another coordinate system will introduce additional error that depends on the accuracy of the control points. This error will not be uniform over the spatial extent of the data set.
In a GIS environment, new data sets are often generated by processing with other GIS data sets. Lanter models this progression to the end results of a GIS analysis in his lineage approach for managing GIS applications. It is a useful tool for maintaining and updating GIS data bases. These other data sets will often have different resolutions for the spatial features shown and different accuracies associated with their coordinate values. Durgin identifies some of the problems inherent when GIS processing includes map derived data with data sets developed from measurement based systems such as surveys.
As pointed out by Moellering, few GIS data sets are entirely static; most are updated and revised over time or as new information becomes available. Data set lineage can become quite complex as a GIS data set incorporates this new information. Changes in coordinate locations may be due to boundary changes, more accurate coordinate information about location, the abandonment or construction of features, or geologic events. Often this affects only a portion of the features in a GIS data set.
All of this represents additional information that can be carried forward in describing the data set. Measured or calculated values of coordinate position and other dimensions provide important information in assessing data quality. Quantifying the variation in the quality of the spatial features in a spatial data set is beginning to be formally described. It will require an error estimate with a confidence interval and the level of precision for the values. This provides a basis for estimating the accuracy of the GIS data set in representing the location, area, and length of the feature. The effects of running an analysis with data sets that have different spatial resolutions on the resulting spatial features are more difficult to quantify (Buttenfield).
Generally, this information has been stored by identifying the source of the new information and a citation that describes it. It has not been carried down to the individual spatial features themselves. Information on the measurements, their source, resolution, accuracy, and error represents attributes of the spatial features.
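One way to picture feature-level quality attributes is as a small record attached to each feature. The sketch below is a hypothetical structure, not a part of the ArcInfo data model or of the Content Standards. Every class name, field name, and sample value is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CoordinateQuality:
    """Hypothetical per-feature quality record; all field names are assumptions."""
    source: str                                # citation or short name of the source
    capture_method: str                        # e.g. "digitized map", "GPS", "survey"
    resolution_m: Optional[float] = None       # smallest ground distance the source resolves
    error_estimate_m: Optional[float] = None   # estimated positional error
    confidence_level: Optional[float] = None   # e.g. 0.95 for a 95 percent interval
    precision_digits: Optional[int] = None     # decimal places recorded in the values


@dataclass
class Feature:
    feature_id: int
    coordinates: List[Tuple[float, float]]
    quality: CoordinateQuality


def features_exceeding(features: List[Feature], max_error_m: float) -> List[int]:
    """Return ids of features whose error is unknown or larger than max_error_m."""
    return [
        f.feature_id
        for f in features
        if f.quality.error_estimate_m is None
        or f.quality.error_estimate_m > max_error_m
    ]


if __name__ == "__main__":
    # Two made-up features from different sources in the same data set.
    lines = [
        Feature(1, [(0.0, 0.0), (50.0, 0.0)],
                CoordinateQuality("1:24,000 map sheet", "digitized map", 12.0, 12.0, 0.90)),
        Feature(2, [(50.0, 0.0), (50.0, 80.0)],
                CoordinateQuality("field survey", "survey", 0.01, 0.05, 0.95, 3)),
    ]
    print(features_exceeding(lines, max_error_m=1.0))
```

Keeping the record on the feature, rather than only in a data set level citation, lets a query separate the map-derived line from the surveyed line even though both sit in the same data set.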
There are several basic items within the ArcInfo data model that provide supporting information for assessing coordinate data quality.
In the ArcInfo data model, adjustments may affect the overall coordinate system or the coordinates of the individual features themselves. An example of an overall adjustment is the linear transformation of a GIS data set from one coordinate system into another. The tics that have been identified for the GIS data set act as the control for this transformation. The coordinate values for these tics in the old and the new coordinate systems determine the values in the equation to carry out the transformation. ArcInfo reports the root mean square error (RMS) in the displacement of these tic locations in the source material to the tic locations in the new projection system. While this report is not automatically stored by ArcInfo, the tics are a component of both the source data set and the resulting data set.
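The tic-based transformation described above can be sketched as a least-squares affine fit followed by an RMS calculation on the tic residuals. This is a minimal illustration under stated assumptions and not ArcInfo's implementation. The function names are hypothetical, the RMS is taken here as the square root of the mean squared tic displacement (which may differ in detail from the figure ArcInfo reports), and the tic coordinates are invented.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _solve3(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    rows = [list(r) + [b] for r, b in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) < 1e-12:
            raise ValueError("need at least three non-collinear tics")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(3):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][3] / rows[i][i] for i in range(3)]


def fit_affine(src: Sequence[Point], dst: Sequence[Point]):
    """Least-squares fit of u = a*x + b*y + c and v = d*x + e*y + f."""
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need matching lists of at least three tics")
    sx = sum(x for x, _ in src)
    sy = sum(y for _, y in src)
    sxx = sum(x * x for x, _ in src)
    sxy = sum(x * y for x, y in src)
    syy = sum(y * y for _, y in src)
    normal = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(len(src))]]
    rhs_u = [sum(x * u for (x, _), (u, _) in zip(src, dst)),
             sum(y * u for (_, y), (u, _) in zip(src, dst)),
             sum(u for u, _ in dst)]
    rhs_v = [sum(x * v for (x, _), (_, v) in zip(src, dst)),
             sum(y * v for (_, y), (_, v) in zip(src, dst)),
             sum(v for _, v in dst)]
    return _solve3(normal, rhs_u), _solve3(normal, rhs_v)


def rms_error(src: Sequence[Point], dst: Sequence[Point], coef_u, coef_v) -> float:
    """Root mean square displacement between transformed and target tics."""
    total = 0.0
    for (x, y), (u, v) in zip(src, dst):
        du = coef_u[0] * x + coef_u[1] * y + coef_u[2] - u
        dv = coef_v[0] * x + coef_v[1] * y + coef_v[2] - v
        total += du * du + dv * dv
    return math.sqrt(total / len(src))


if __name__ == "__main__":
    # Invented tics: digitizer units on the left, target coordinates on the right.
    digitizer = [(1.0, 1.0), (9.0, 1.0), (9.0, 7.0), (1.0, 7.0)]
    target = [(5102.0, 2100.0), (5900.0, 2100.0), (5900.0, 2700.0), (5100.0, 2700.0)]
    cu, cv = fit_affine(digitizer, target)
    print(f"RMS error: {rms_error(digitizer, target, cu, cv):.3f} target units")
```

Because the RMS is a single summary over all tics, it says nothing about where in the coverage the misfit is concentrated. Storing the individual tic residuals alongside the tics would preserve that information.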
Once the transformation is made from the source coordinate system to another coordinate system, additional adjustments may be needed for individual features to better represent the "true" location of the data set.
Increasingly, data sets will contain features whose coordinates differ in accuracy from one feature to another. As procedures for measuring and displaying those differences improve, they will be carried forward as part of GIS data set manipulation and analysis. Additional information on the measured or calculated values for the features will also become available.
The data structure will become more comprehensive with the ability to directly link the GIS feature coordinates to data bases of available measured coordinates, distances, areas, elevations, and other dimensions. This information will provide a basis for producing an error estimate for the GIS reported coordinate values, area and length calculations.
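As an illustration of how stored coordinate error could feed an error estimate for a reported length, the sketch below applies first-order error propagation to a polyline. It assumes that each vertex carries one standard error that applies equally to x and y, and that errors at different vertices are independent. These assumptions, and the function name, are introduced here for illustration and are not stated in the sources discussed in this paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def length_with_error(vertices: Sequence[Point],
                      sigmas: Sequence[float]) -> Tuple[float, float]:
    """Return (length, standard error of length) for a polyline.

    sigmas[k] is the assumed standard error of each coordinate of vertex k.
    """
    if len(vertices) < 2 or len(vertices) != len(sigmas):
        raise ValueError("need at least two vertices and one sigma per vertex")

    # Unit direction vector and length of every segment.
    units = []
    total = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        d = math.hypot(x1 - x0, y1 - y0)
        if d == 0.0:
            raise ValueError("consecutive vertices coincide")
        total += d
        units.append(((x1 - x0) / d, (y1 - y0) / d))

    # Gradient of total length with respect to vertex k is the incoming
    # unit vector minus the outgoing unit vector.
    variance = 0.0
    last = len(vertices) - 1
    for k, sigma in enumerate(sigmas):
        gx = gy = 0.0
        if k > 0:
            gx += units[k - 1][0]
            gy += units[k - 1][1]
        if k < last:
            gx -= units[k][0]
            gy -= units[k][1]
        variance += sigma * sigma * (gx * gx + gy * gy)
    return total, math.sqrt(variance)


if __name__ == "__main__":
    # Made-up right-angle line with a 0.5 unit standard error at each vertex.
    length, sigma = length_with_error([(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)],
                                      [0.5, 0.5, 0.5])
    print(f"length = {length:.2f}, standard error = {sigma:.2f}")
```

If the errors are further assumed to be normally distributed, multiplying the standard error by 1.96 gives an approximate 95 percent interval around the reported length.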
The FGDC Content Standards for Digital Geospatial Metadata requires a data quality report as called for in the SDTS. This report provides an opportunity to make a quantitative assessment of the positional accuracy of the features represented in a GIS data set. Such an assessment must rely heavily on the description of the data set lineage. This lineage identifies
At present, the report of coordinate quality has generally been limited. The accuracy of the coordinate values captured from the source document has often not been described or has been very limited. Increasingly, more accurate coordinate values are becoming available. These will include an error estimate for the value at a stated confidence level. Some sources can provide field measured or calculated values of the spatial features such as length and area.
This information represents additional attribute information that should be stored with the spatial feature. It serves as basic information for the features and provides a basis for estimating the accuracy of the GIS reported values for distance and area calculations. This represents a more robust model of information that should be carried as part of a GIS data set. Much of it represents attribute information to be stored with individual features. In the context of the Content Standards for Digital Geospatial Metadata, it represents lineage information from the sources of the GIS spatial features. This information has not been formally identified as elements in these standards. In the ArcInfo data model, it represents additional attribute information that must initially be user defined. As conventions develop for addressing and storing this information, it can be incorporated in GIS processing to assist in evaluating the results of GIS analysis. Error models of the coordinate representations can then use this information in GIS processing.
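A simple error model of the kind anticipated here can be sketched for a reported polygon area. The example below uses the shoelace formula for the area and first-order propagation of per-vertex coordinate error. It assumes independent errors with one standard error per vertex applied equally to x and y. The function name and the sample polygon are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def area_with_error(ring: Sequence[Point],
                    sigmas: Sequence[float]) -> Tuple[float, float]:
    """Return (area, standard error of area) for a simple polygon.

    ring lists each vertex once (the first vertex is not repeated at the end);
    sigmas[k] is the assumed standard error of each coordinate of vertex k.
    """
    n = len(ring)
    if n < 3 or n != len(sigmas):
        raise ValueError("need at least three vertices and one sigma per vertex")

    twice_area = 0.0
    variance = 0.0
    for k in range(n):
        x_prev, y_prev = ring[k - 1]          # index -1 wraps to the last vertex
        x_k, y_k = ring[k]
        x_next, y_next = ring[(k + 1) % n]
        twice_area += x_k * y_next - x_next * y_k
        # Partial derivatives of the shoelace area with respect to x_k and y_k.
        d_area_dx = (y_next - y_prev) / 2.0
        d_area_dy = (x_prev - x_next) / 2.0
        variance += sigmas[k] ** 2 * (d_area_dx ** 2 + d_area_dy ** 2)
    return abs(twice_area) / 2.0, math.sqrt(variance)


if __name__ == "__main__":
    # Made-up 10 by 10 square with a 0.1 unit standard error at each corner.
    square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
    area, sigma = area_with_error(square, [0.1, 0.1, 0.1, 0.1])
    print(f"area = {area:.1f}, standard error = {sigma:.2f}")
```

Where a field-measured area is stored with the feature, comparing it with the computed area and this standard error gives one check on whether the stored coordinates are consistent with the measurement.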
Department of Commerce. Spatial Data Transfer Standard. National Institute of Standards and Technology, FIPS 173. Washington, D.C., 1992.
Durgin, Paul M. Measurement Based Databases: One Approach to the Integration of Survey and GIS Cadastral Data. Surveying and Land Information Systems, Vol. 53, No. 1, 1993.
E.S.R.I. Metadata Management in GIS. Esri White Paper Series. Redlands, CA, August 1995.
FGDC. Content Standards for Digital Geospatial Metadata. Washington, D.C., June 8, 1994.
Goodchild, M. F., Davis, F. W., Painho, M., Stoms, D. M. The Use of Vegetation Maps and Geographic Information Systems for Assessing Conifer Lands in California. Report prepared for the Forest and Rangeland Resources Assessment Program (FRRAP), California Department of Forestry and Fire Protection, August 1991.
Hansen, David and Michael Sebhat. Compilation of Spatial Metadata for Access in ArcView and Mosaic. Paper presented at the 1995 E.S.R.I. User Conference. Palm Springs, CA, May 1995.
Lanter, David P. A Lineage Metadata Approach to Removing Redundancy and Propagating Updates in a GIS Database. Cartography and Geographic Information Systems, Vol. 21, No. 2, 1994.
Moellering, Harold. Continuing Research Needs Resulting from the SDTS Development Effort. Cartography and Geographic Information Systems, Vol. 21, No. 3, 1994.