David T. Hansen
Ideally, an assessment of digital spatial data quality is a comparison of our digital representation to the known feature on the ground. It includes qualitative as well as quantitative measures. This comparison involves more than the coordinate location of a feature. It includes other characteristics such as shape, direction (or angular measure), distance, area, and topology. Some of these characteristics are well defined and recognized for many geospatial themes. Other characteristics are not clearly described for the variety of spatial data that we capture.
We use the relative importance of these different spatial characteristics in the display and manipulation of the data. The development of common terminology describing spatial characteristics assists in the development of software for manipulation and display. It can also assist in the development of methods for qualitatively and quantitatively analyzing and displaying the results of GIS processing.
As a focus for discussion, accuracy and uncertainty of spatial characteristics for three digital data sets will be presented.
In 1995, FGDC provided a workbook to assist in implementing the Content Standards for Digital Geospatial Metadata. This workbook suggested several possible tests for assessing positional accuracy. Tests suggested are:
Consensus definitions of spatial data quality for much of our data are not well developed. Buttonfield (1993) reviews the meaning of data quality applied to spatial data and the impediments to common terminology. The formal identification of these characteristics provides a basis for evaluating the digital representation. It also provides a basis for describing, analyzing, and displaying the feature (Buttonfield, 1993). Different data themes can be expected to require different methods for assessing spatial data quality. Moellering (1994), in his report on the implementation of the Spatial Data Transfer Standard (SDTS), recognizes that work is needed to identify appropriate tests for different data themes. Moellering stresses the need for identifying tests to efficiently capture quantitative data quality information. Qualitative assessments identifying levels of uncertainty may only be possible for some spatial characteristics.
For the purposes of discussion, this paper will examine issues of data quality for the features contained in several data themes.
An accuracy assessment implies an understanding of the features that we are digitally capturing, the characteristics of digital objects representing these features, and the effect of any spatial transformations. Our digital representation of real world phenomena consists of models built with points, lines, polygons, or raster cells. Typically, these geometric structures are built with coordinate pairs, strings of coordinates, or from the measurement of angles and distances. They may be defined by a cell size starting from a coordinate pair. The spatial characteristics of this geometric representation are affected by the defined characteristics of a map projection onto a defined spheroid of the earth.
An accuracy assessment should begin by identifying the spatial characteristics of the feature that is to be digitally represented. Variation or uncertainty in the spatial characteristics of that phenomenon needs to be recognized. More information may be available for some spatial characteristics than others from the original source of the data. These other spatial characteristics can affect the evaluation of positional accuracy for this data. A positional accuracy assessment alone may be misleading and fail to provide the information needed for the effective use and application of the digital data.
Identifying spatial characteristics important for various data themes has led to software development such as network analysis, dynamic segmentation, and regions. Dutton (1996) identifies several potential software developments that are possible when a quantitative assessment of spatial accuracy is linked directly to coordinate locations. In his encoding scheme, the user is able to identify the topological role and dimensions, locational importance, uniformity of the feature's extent, and feature priority in processing. He points out that having access to this information can provide opportunities in resolving conflicts in the display and labeling of features as well as in identifying the coordinate structures that should be used and maintained in overlay and other processing.
Defining these spatial characteristics forms a basis to begin modeling error. This includes identifying the type of error distribution and methods of estimation for a spatial characteristic of a feature. Measurement based systems develop error estimates based on a normal distribution of error for repeated measurements and redundant measurements back to the same point from other points. This permits correction of length and area measurements of survey data for distortions introduced by map projections and for the difference between actual elevation and the spheroid surface (Burkholder, 1993 and Durgin, 1993). Other spatial characteristics may require the use of another error distribution.
As part of the framework for this geodetic network, the U.S. Bureau of Reclamation (USBR), in conjunction with the USGS and NGS, is developing a network of stations in the San Joaquin - Sacramento Delta region of northern California. This network of approximately 30 stations is focused on development of vertical and horizontal control for the Delta region and will tie back to vertical control that has recently been completed for the San Francisco Bay region.
Interest is focused on this area because of flooding from the levee failures during January of 1997. Shortly after flooding began in the area, USBR assisted other Federal and State agencies in developing a coverage of the levee system. The initial source of this information came from USGS 1:24,000 scale topographic maps for the Delta. The control network developed with NGS will be used to adjust the location and elevation of the levee network using a combination of GPS equipment and photogrammetry.
The digital data collected on this network for horizontal and vertical control is a measurement based system with coordinate locations for established sites. In the case of GPS, repeated estimates of horizontal and vertical location are made during the time that the GPS unit is at the site. This is processed to provide an estimate of the true XYZ location with an error estimate. The target value for this project is plus or minus 2 cm.
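The reduction of repeated GPS readings to a location with an error estimate can be illustrated with a minimal sketch. It assumes the repeated fixes are independent and normally distributed, which real GPS processing does not strictly satisfy, and it reads the plus or minus 2 cm target as a bound on 1.96 standard errors. Both are assumptions made here for illustration, as are the function names and the sample coordinates.

```python
import math
from statistics import mean, stdev


def summarize_fixes(fixes):
    """Return {axis: (mean, standard_error)} for repeated (x, y, z) fixes in meters."""
    n = len(fixes)
    if n < 2:
        raise ValueError("need at least two fixes to estimate error")
    summary = {}
    # zip(*fixes) regroups the readings into one tuple per axis.
    for axis, values in zip("xyz", zip(*fixes)):
        summary[axis] = (mean(values), stdev(values) / math.sqrt(n))
    return summary


def meets_target(summary, target_m=0.02, k=1.96):
    """True if k standard errors stay within the target on every axis."""
    return all(k * se <= target_m for _, se in summary.values())


# Hypothetical readings at one station (local coordinates, meters).
fixes = [
    (100.00, 200.00, 5.00),
    (100.02, 199.98, 5.01),
    (99.98, 200.02, 4.99),
    (100.00, 200.00, 5.00),
]
summary = summarize_fixes(fixes)
ok = meets_target(summary)
```

A production adjustment would also weight the observations and account for correlation between them; the sketch only shows how an error estimate accompanies each coordinate.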
To become part of the National geodetic data base, the capture and recording of these locations must follow the Input Format and Specifications of the National Geodetic Survey Data Base 1994. As part of the requirements, each station location has a monument set in a stable location with a description and map of that station. This information becomes part of the record for that station. The spatial data quality report for this data references the NGS geodetic standards and applicable reports and field records.
The development of this base survey control has major implications for the already captured levee theme and other existing GIS themes. GPS coordinates and photogrammetry for levee locations and elevations will be based on the new datums. Existing data is based on 1:24,000 scale USGS topographic maps constructed on the NAD 1927 horizontal datum and NGVD 1929 vertical datum. Existing data captured at 1:24,000 will be transformed to the new datums.
For the Delta area, adjustment of features to the new datum is expected to have minor effects on the relative position of features although it will affect latitude and longitude values. Positional changes relative to other features due to the transformation to the 1984 datum are expected to be in the range of 1 meter. This adjustment should have little effect on topological relationships within or between themes.
Major adjustments will occur for digital features which will have more accurate location and elevation information. At least portions of the levee theme will be replaced by more accurate data from GPS or photogrammetry. New levee data will represent the crest or centerline and elevation of the levees.
Figure 1 shows a portion of the existing digital data for the levee system along the Sacramento River. As a digital line, this coverage has no width. The levees, however, have a base and crest width that varies along the length of the system. For display, an assigned width of 20 meters is shown as the dark shaded area. Shifts in the placement of the levees based on the survey information are expected to be from 10 to 50 meters. This is shown as the lightly shaded area around the existing digital lines for the levees. A hypothetical GPS survey along a road on the crest of the levee system is shown by the dotted line.
The GPS data will not be to the resolution of the control established for the NGS sites. Points along the levee system will attempt to capture the location and elevation of the crest. Each point will have a precisely recorded location and elevation but will also have width. This width is the uncertainty of the point reading being at the true centerline. The uncertainty of the location of the crest or centerline of the levee is greater than the recorded location of the survey reading. The variation expected in both the location of the centerline and the elevations recorded for the levees will be addressed in the survey instructions. Estimates of this variation will be based on these instructions and on evaluations made of the procedures that were followed.
For the levee system, some features will be replaced by more accurate survey methods. The location of the crest and associated elevation can then be reported with a range in values determined by an evaluation of the survey methods and tolerances. Other levees will be adjusted to match to these new features but with a different level of uncertainty. The accuracy or uncertainty associated with the location of the crest and levee elevation is most efficiently handled at the feature level. Attributes for the arcs representing the levees will contain the uncertainty associated with both location and elevation.
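Carrying uncertainty at the feature level might look like the following sketch. The record layout, field names, and all numeric values are hypothetical and are not taken from the USBR coverage; the only figure drawn from the text is that unadjusted 1:24,000 lines may shift by up to 50 meters.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class LeveeArc:
    """One arc of a levee theme with its own uncertainty attributes (hypothetical layout)."""
    arc_id: int
    source: str                    # e.g. "GPS", "photogrammetry", "USGS 1:24,000"
    horiz_uncert_m: float          # uncertainty of the crest or centerline location
    crest_elev_m: Optional[float]  # None where no elevation has been captured
    vert_uncert_m: Optional[float]

    def elevation_range(self) -> Optional[Tuple[float, float]]:
        """Return (low, high) crest elevation, or None if elevation is unknown."""
        if self.crest_elev_m is None or self.vert_uncert_m is None:
            return None
        return (self.crest_elev_m - self.vert_uncert_m,
                self.crest_elev_m + self.vert_uncert_m)


# Hypothetical arcs: one resurveyed, one still carried from the map source.
arcs = [
    LeveeArc(1, "GPS", horiz_uncert_m=2.0, crest_elev_m=6.0, vert_uncert_m=0.5),
    LeveeArc(2, "USGS 1:24,000", horiz_uncert_m=50.0, crest_elev_m=None, vert_uncert_m=None),
]

# Arcs can then be ranked or filtered by their stated reliability.
most_reliable = min(arcs, key=lambda a: a.horiz_uncert_m)
```

Keeping the values on each arc, rather than in a single statement for the whole theme, lets resurveyed and adjusted levees coexist in one coverage with different stated reliability.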
The Geographic Coordinate Data Base (GCDB) of the U.S. Bureau of Land Management (BLM) is a database of geographic coordinates for recorded survey points of the Public Land Survey System (PLSS). PLSS is a measurement based system with an extremely long lineage. In the terms of the spatial data organization section of the Content Standards, it is a local planar coordinate system of measured bearings and distances. A major component of the PLSS is the section corners, which have generally lacked geographic coordinates. The Bureau of Land Management is in the process of producing calculated geographic coordinates for not only section corners but all other corners for each township.
For many areas in the United States, these corners have served not only in the transfer of public land into private ownership, but also as reference points for documenting the location of other information and surveys. As part of the survey instructions, section corners and other key points of measurement are monumented during the field survey. Other points are located in the field following the survey instructions in force at the time of the survey. These recorded measurements serve as the basis for the area of sections and other subdivisions of a township.
These points represent true points in our GIS systems in that they have coordinate locations but no other dimensions. Corners with monuments that can still be found on the ground are well defined points. The real world locations of other points are less certain, but they can be recovered based on the survey record and any on the ground evidence. In our terms, the record defines the topological relationship between the points of measurement. This topological relationship and the measured distances and angles are the key spatial characteristics of the survey. GCDB represents a summary of this record with an assignment of latitude and longitude based on the best fit to the survey record.
The best available information for linking found corners with geographic coordinates provides geographic coordinate control. This control ranges from survey level GPS data to the digital capture of found corners on the USGS 1:24,000 scale maps. An error estimate is assigned to these control points based on the source. A separate error or reliability value is carried as an attribute of each point. This estimate depends on the software used to generate the geographic coordinates for the township. One set of software holds control points for the township fixed, and the reliability of the calculated geographic coordinates is reported in ground units. The other set of software uses an error estimate for the control points used in the survey. The points are allowed to float to provide a best fit with the survey record. Error ellipses are produced which reflect the internal accuracy of the survey record. This software does not produce an error estimate for the on the ground reliability of the geographic coordinates.
For example, Township 18 North, Range 4 East of the Mount Diablo Meridian in California has only seven points with remaining monuments. The only source for their location is the USGS 1:24,000 scale topographic maps. Other coordinate values were generated based on the survey record with the control points allowed to float. Error ellipses were generated for all points based on the survey record internal to that survey. Table 1 shows the range in values for the 390 points in this survey.
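For readers unfamiliar with error ellipses, the sketch below shows the standard construction of semi-major and semi-minor axes from the 2x2 covariance matrix of a point's adjusted coordinates. It is not the GCDB software's procedure, which the source does not describe; the function name, the `scale` parameter, and the sample variances are assumptions for illustration.

```python
import math


def error_ellipse(var_e, var_n, cov_en, scale=1.0):
    """Return (semi_major, semi_minor, theta) from coordinate variances and covariance.

    var_e, var_n : variances of easting and northing (square meters)
    cov_en       : covariance between easting and northing (square meters)
    scale        : multiplier on the one-sigma axes (about 2.4477 gives a 95 percent
                   ellipse for a bivariate normal error)
    theta        : direction of the major axis in radians, counterclockwise from east
    """
    half_sum = (var_e + var_n) / 2.0
    half_diff = (var_e - var_n) / 2.0
    root = math.hypot(half_diff, cov_en)
    lam_max = half_sum + root            # larger eigenvalue of the covariance matrix
    lam_min = max(half_sum - root, 0.0)  # guard against tiny negative round-off
    theta = 0.5 * math.atan2(2.0 * cov_en, var_e - var_n)
    return scale * math.sqrt(lam_max), scale * math.sqrt(lam_min), theta


# Hypothetical point: 10 m standard deviation east, 5 m north, no correlation.
semi_major, semi_minor, theta = error_ellipse(100.0, 25.0, 0.0)
```

An ellipse built this way describes only the internal consistency of the adjustment, which matches the distinction drawn above between survey-record accuracy and on the ground reliability.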
Table 1: Reported Error Ellipses for Township 18 North Range 4 East of Mt. Diablo Meridian

                 Percent of Section Corners    Percent of Other Corners
Error            Semi-Major     Semi-Minor     Semi-Major     Semi-Minor
(Meters)         Axis           Axis           Axis           Axis
______________   __________     __________     __________     __________
 5 to 10           36.0           70.0           16.0           19.0
11 to 20           47.0           29.0           39.0           40.0
21 to 30           16.0            1.0           32.0           40.0
31 to 40            0.5            --             3.0            0.5
41 to 50            --             --             --             --
51 to 65            0.5            --            10.0            0.5
_______________________________________________________________________
The overall coordinate control of found corners has an estimated on the ground error of up to 12 meters. The location of the corners defining the sections and other subdivisions within the township has an uncertainty based on the survey record. The points for section corners in the reconstructed township have error ellipses whose major and minor axes are generally less than 30 meters, with most less than 20 meters.
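The summary statement above can be checked against Table 1 with a few lines of arithmetic. The sketch below transcribes the table and treats each "--" entry as 0.0 percent, which is an assumption; the names are illustrative only.

```python
# Column positions within each row of TABLE1.
SEC_MAJOR, SEC_MINOR, OTHER_MAJOR, OTHER_MINOR = 2, 3, 4, 5

# (lower m, upper m, section semi-major %, section semi-minor %,
#  other semi-major %, other semi-minor %); "--" transcribed as 0.0.
TABLE1 = [
    (5, 10, 36.0, 70.0, 16.0, 19.0),
    (11, 20, 47.0, 29.0, 39.0, 40.0),
    (21, 30, 16.0, 1.0, 32.0, 40.0),
    (31, 40, 0.5, 0.0, 3.0, 0.5),
    (41, 50, 0.0, 0.0, 0.0, 0.0),
    (51, 65, 0.5, 0.0, 10.0, 0.5),
]


def percent_at_or_below(limit_m, column):
    """Sum the percentages of all error classes whose upper bound is <= limit_m."""
    return sum(row[column] for row in TABLE1 if row[1] <= limit_m)


section_major_30 = percent_at_or_below(30, SEC_MAJOR)
section_major_20 = percent_at_or_below(20, SEC_MAJOR)
other_major_30 = percent_at_or_below(30, OTHER_MAJOR)
```

On these figures, 99 percent of section corners have a semi-major axis of 30 meters or less and 83 percent have 20 meters or less, while 87 percent of other corners fall at or under 30 meters.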
The points in the GCDB represent the nodes of arcs which define the boundaries of sections and other subdivisions in PLSS. Part of the survey record identifies distances to points of measurement and the area of these subdivisions. The area of these subdivisions as well as the length of the arcs defining them are important spatial characteristics for this database. This information provides an opportunity for estimating the accuracy of the GIS theme developed from GCDB and the recorded values for the subdivisions recognized in the township.
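One way to use the recorded distances is to compare each arc length computed from the GCDB coordinates with the distance in the survey record. The sketch below assumes the coordinates have already been projected to a planar system in meters and that the record is in chains; the endpoint coordinates and the 12 meter tolerance (borrowed from the control error quoted above purely as an example threshold) are hypothetical choices.

```python
import math

# One Gunter's chain is 66 feet; 66 * 0.3048 m = 20.1168 m (international foot).
CHAIN_M = 20.1168


def length_discrepancy_m(start, end, recorded_chains):
    """Computed arc length minus recorded length, in meters (straight arc assumed)."""
    computed = math.dist(start, end)
    return computed - recorded_chains * CHAIN_M


def within_tolerance(discrepancy_m, tolerance_m):
    """True if the absolute discrepancy does not exceed the tolerance."""
    return abs(discrepancy_m) <= tolerance_m


# Hypothetical section line: recorded as 80 chains, 1600 m between GCDB nodes.
diff = length_discrepancy_m((0.0, 0.0), (1600.0, 0.0), 80)
acceptable = within_tolerance(diff, 12.0)
```

The same comparison could be made for subdivision areas. Projection distortion is ignored here, so small discrepancies should not be over-interpreted.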
GCDB represents a summary of the review of the survey record. It identifies the accuracy for the control used to generate the geographic coordinates and maintains the internal consistency of the topological relationships in the survey. It can be updated as more accurate control information becomes available.
Another GIS theme from a 1:24,000 scale source is a study of geomorphology done by Brian Atwater (1981). This study in the Sacramento - San Joaquin Delta region identifies major surficial geologic units, provides relative ages for those surfaces, and describes the depositional environment prior to major alteration of the system after 1850. This thematic data does not represent well defined points but linear and areal features. Atwater makes extensive use of symbology to provide information about the surficial features mapped in this study. In his report, he discusses the identification of the features in the field and the uncertainty associated with their location and boundary conditions. In addition to uncertainty of boundary location, the contact between some features is not abrupt but diffuse.
Line symbology is used to identify the level of uncertainty in the location of contacts between surficial units or other features on the compiled maps. A solid line indicates that the boundary is located within 150 meters. A dashed line indicates that the boundary may be in error by more than 150 meters. Other line work identifies features that may be in error by 300 to 450 meters or by more than 450 meters. The legend also indicates whether the contact between units is abrupt or gradational.
Table 2 - Dimensions of Line Features in Delta Geomorphology Study

Indicated Uncertainty     Width at 1:24,000 Scale
On the Ground             On Map
_____________________     _______________________
150 meters                0.625 cm
300 meters                1.250 cm
450 meters                1.875 cm
_____________________     _______________________
Depending on the feature being represented on the source maps, the line width used in the map compilation was about 0.05 to 0.10 cm. This represents about 10 to 25 meters on the ground. Features were carefully digitized from the source maps and their placement, attributes, and topology were verified in check plots against the source maps. An estimate of the accuracy of the location of the lines in this data could conclude that the placement is within 15 meters of their location on the source maps. If we considered the line weight shown on the compiled maps, we could conclude that there is additional uncertainty in the placement of the boundaries of about 25 meters. However, the actual uncertainty of these features is considerably larger. This uncertainty is not identified on the compiled maps but is described in the report and legend for the maps.
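The conversions behind Table 2 and the line-width figures follow directly from the 1:24,000 scale. A minimal sketch, with illustrative function names:

```python
SCALE = 24_000  # map scale denominator for the source maps


def ground_to_map_cm(ground_m, scale=SCALE):
    """Width on the map, in centimeters, of a distance on the ground in meters."""
    return ground_m * 100.0 / scale


def map_to_ground_m(map_cm, scale=SCALE):
    """Distance on the ground, in meters, covered by a width on the map in centimeters."""
    return map_cm * scale / 100.0


# Table 2: stated uncertainties expressed as widths on the map.
table2_cm = [ground_to_map_cm(m) for m in (150, 300, 450)]

# Compilation line widths of 0.05 and 0.10 cm expressed on the ground.
line_width_m = [map_to_ground_m(cm) for cm in (0.05, 0.10)]
```

The line widths work out to roughly 12 and 24 meters, consistent with the "about 10 to 25 meters" quoted above, and an order of magnitude smaller than the 150 to 450 meter uncertainties stated in the legend.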
The uncertainty of the line placement has a considerable effect on the area of polygons as well as the length of linear features.
Figure 2 illustrates the uncertainty associated with boundaries for a portion
of this study. This figure shows an area of tidal deposits and their
contact with Pleistocene age alluvium. The estimated low tide line of 1850
predating the construction of levees is also shown.
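The effect of boundary uncertainty on polygon area can be approximated to first order as the perimeter multiplied by the boundary uncertainty. The sketch below applies that simplification, which assumes a constant uncertainty along the whole boundary and ignores corner effects; the polygon is hypothetical and is not taken from the Delta study.

```python
import math


def area_and_perimeter(vertices):
    """Shoelace area and perimeter of a simple polygon given as (x, y) pairs in meters."""
    n = len(vertices)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def area_range(vertices, boundary_uncert_m):
    """First-order (low, high) area bounds: area -/+ perimeter * uncertainty."""
    area, perimeter = area_and_perimeter(vertices)
    band = perimeter * boundary_uncert_m
    return max(area - band, 0.0), area + band


# Hypothetical 1 km square map unit bounded by a solid-line contact (150 m).
unit = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0), (0.0, 1000.0)]
low_m2, high_m2 = area_range(unit, 150.0)
```

For this unit of 100 hectares, the first-order bounds run from 40 to 160 hectares, which illustrates how strongly the stated boundary uncertainty can dominate any area reported for smaller polygons.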
This information is valuable in the use and application of this data. For example, the location of the boundary between the area of tidal influence and alluvium from the major streams is useful in identifying archeological sites. The amount of uncertainty would not be apparent to the user of this data without reference back to the separate legend and report accompanying the map sheets. It would not be indicated by an accuracy report that relied only on measurements of the check plots against the compiled map sheets.
These characteristics are not uncommon with other digital themes. The points, lines, polygons, and other features that we digitally capture often have dimensions and other characteristics. This information on the characteristics of the features that we initially capture as points, lines, or polygons can have a significant effect on the uncertainty associated with that feature. This uncertainty affects not only location but other spatial attributes for the feature.
The real world dimensions of our spatial features form the basis on which to assess the accuracy of the digital representation. Errors affecting characteristics such as length, area, and direction introduced by a projection system can be factored out for measurement based systems such as land surveys. Other digital themes have seldom made such adjustments because the real world dimensions have not been as well defined or captured with sufficient resolution.
Accuracy statements for the digital representation of a feature need to recognize the uncertainty associated with the spatial characteristics of that feature in the real world. These features may have real world dimensions which should be considered in preparing a spatial accuracy assessment.
Capturing and storing real world measurement information is increasingly common. Much of this information is at the feature or entity level. Storing this information as part of the feature attributes provides opportunities for resolving coordinate conflicts in the display and labeling of features. It improves our ability to maintain and update our data. As these characteristics become standardized, it will provide the opportunity for software developments in modeling and processing our data. It will also assist in the development of uncertainty or probability values for our digital products.
Beard, Kate and William MacKaness, Visual Access to Data Quality in Geographic Information Systems. Cartographica. Vol. 30, No. 2-3: 1993.
Bureau of Land Management, Branch of Cadastral Survey, California Geographic Coordinate Data Base (GCDB) Users' Guide. Sacramento, CA: 1996.
Burkholder, Earl, Design of a Local Coordinate System for Surveying, Engineering, and LIS/GIS. Surveying and Land Information Systems. Vol. 53, No. 1, pp. 29-40: 1993.
Buttonfield, Barbara, Representing Data Quality. Cartographica. Vol. 30, No. 2-3: 1993.
Durgin, Paul, Measurement Based Databases: One Approach to the Integration of Survey and GIS Cadastral Data. Surveying and Land Information Systems. Vol. 53, No. 1, pp. 41-47: 1993.
Dutton, Geoffrey, Improving Locational Specificity of Map Data - a Multi-resolution, Metadata-driven Approach and Notation. International Journal of Geographic Information Systems. Vol. 10, No. 3, pp. 253-268: 1996.
Federal Geographic Data Committee, Content Standards for Digital Geospatial Metadata. Washington, D.C.: June 1994.
Federal Geographic Data Committee, Content Standard for Digital Geospatial Metadata Workbook. Version 1.0: March 24, 1995.
Goodchild, Michael F., Closing Report, NCGIA Research Initiative 1 Accuracy of Spatial Databases. National Center for Geographic Information and Analysis. University of California, Santa Barbara: January 1992.
McGranaghan, Matthew, A Cartographic View of Spatial Data Quality. Cartographica. Vol. 30, No. 2-3: 1993.
Moellering, Harold, Continuing Research Needs Resulting from the SDTS Development Effort. Cartography and Geographic Information Systems. Vol. 21, No. 3, pp. 180-189: 1994.
National Geodetic Survey, Input Formats and Specifications of the National Geodetic Survey Data Base. NOAA, Department of Commerce. Vol. 1: September 1994.
National Geodetic Survey Standards for Horizontal Control and Content Standards. http://www.ngs.noaa.gov/FGCS/metadata.html