David T. Hansen

Visualizing Uncertainty Captured from Source Documents with GRID

For much thematic data, quantitative estimates of spatial accuracy are limited. However, qualitative information is frequently contained in the source documents. This information is valuable for the display and analysis of the thematic data. Qualitative estimates of data quality are not explicitly identified as elements in the Content Standards for Digital Geospatial Metadata and related documents.

Qualitative information is often contained in the symbology used in cartographic products. It has been carried as information in the map legend or in map unit descriptions. Common types of information are the uncertainty in the location of some of the map features, the recognition of inclusions of other features within map units, or descriptions of boundary conditions. It may include descriptions of the type and characteristics of the contact between map features. Frequently, this qualitative information is not uniform across the extent of the feature.

The compiled map sheets by Atwater of surficial geology in the Sacramento - San Joaquin Delta region of California made extensive use of line symbology and spot symbols. Separate line symbols indicate the uncertainty or ambiguity in the placement of the boundary between one surficial geologic unit and another. This symbology is captured as attributes for the arcs representing the polygons of the units. GRID is then used as a tool to examine the uncertainty in the placement of the boundaries based on the original symbology.

For this thematic data of surficial geology, the uncertainty in boundary location is much larger than the accuracy of digitally capturing the line work. Carrying this information forward with the digital data provides opportunities for displaying this uncertainty. In addition, the qualitative information provides a basis for analysis in the digital environment.


Introduction - Uncertainty indicated in Source Documents

The spatial accuracy assessment identified in the Content Standards for Digital Geospatial Metadata emphasizes quantitative measurements. Quantitative measurements are limited for many digital themes. The FGDC workbook on metadata (1995) identifies several optional methods for reporting positional accuracy. Only two of these methods apply to much of our thematic data. For some thematic data, these tests also apply to the evaluation of attribute accuracy. Qualitative information about the features is often contained in the source documents.

Both the Federal standards and the complementary ASTM specifications (ASTM D5714) are intended to provide an evaluation of the information contained in the digital data. This evaluation is a comparison of the digital representation against our model of the real world features. This evaluation can focus narrowly on the digital data. Optionally, it can capture information on the model of the real world phenomena used in the source documents. Generally, we digitally capture only a portion of the information contained in the source documents for complex thematic data. The remainder is supplemental information on the quality, completeness, and consistency in the mapping of the real world phenomena.

This broader interpretation requires an understanding of the objectives and methods used in constructing the source documents or maps, and of the constraints faced in preparing and symbolizing those maps. The standards call for a description of the source materials when describing the lineage of the data in the data quality section. Descriptions of the symbology used or notes about the features being represented are often overlooked in the description of the source documents. For many thematic data layers, neither the homogeneity of the classes nor the boundaries separating the classes are uniform in the real world. Our digital representation as points, lines, and polygons represents these characteristics uniformly across the extent of the feature. Boundaries between polygons are treated as sharp uniform boundaries. Variations in the composition of map units or boundary characteristics are often not captured as part of the digital product. This information can be valuable in the processing and analysis of spatial data.

Buttenfield explores these issues in her discussion on accuracy, logical consistency and completeness of spatial information. The description of data quality varies with the types of data that we use. Quantitative estimates of positional accuracy are well addressed for measurement based systems. An accurate description of boundary position does not provide information on other boundary characteristics that may affect the spatial accuracy. These boundary conditions may vary over the extent of the feature. In addition, the feature itself may not be uniform and homogeneous across the extent that is mapped. The analysis of spatial variation often involves examination of the data with other information which may not be clearly associated or identified with the data set. This information has often not been easily captured as part of the digital product.

Beard and MacKaness point out that uncertainties and ambiguities which are readily apparent in the source material are no longer visible in the digital environment. They and McGranaghan explore the use of graphical methods for the display of qualitative and quantitative data quality information. Qualitative information is particularly important in conveying uncertainties associated over the extent of the data. Spatial variation and uncertainties present in mapping geographic features have often been recognized by symbology, in map legends, or in map unit descriptions. Displayed visually, this qualitative information is useful in an exploratory environment. Relationships inherent to the data and variation in characteristics such as location, size, or boundary conditions can be displayed and explored.

Raster processing and GRID are useful tools in the analysis of the variability in the location or composition of geographic features. Soil mapping is often cited as an example of the ambiguities and uncertainties in thematic data. Map unit composition is not uniform between or within map units. The map unit components and inclusions are often described by landscape position. Components can vary across the extent of the map unit. Boundaries between map units may be diffuse or abrupt. Seelig, Richardson, and Knighton describe raster processing in the design and mapping of soils in North Dakota. Their study uses gridded data with principal component analysis and kriging of soil characteristics. Hoogerwerf and Busink reported on using GRID in Monte Carlo simulations in a model of nitrate leaching using soil maps. This model depended on the characteristics identified for the main map unit components and inclusions. In running these simulations for the overall model, the boundary between their map units began to blur.

Boundary conditions between map units also vary spatially. This information is useful for more than conveying data quality. Boundary conditions recognized during mapping are also important in the analysis of thematic data.

Raster processing with GRID provides opportunities for analysis and display of the characteristics or uncertainty associated with boundary conditions. Geologic mapping often makes use of line symbology for identifying boundary characteristics.

Representing Uncertainty of Boundaries in GRID

Atwater (1982) recognized uncertainty as a characteristic in the boundaries in his mapping of surficial geologic units in the Sacramento - San Joaquin Delta of California. The map compilation uses different line weights and line symbols for the contact between units. Line symbols represent the ambiguity or uncertainty in boundary placement as follows:

  Solid heavy weight line: Boundary generally accurate within 150 meters.
  Dashed heavy weight line: Boundary may err by more than 150 meters.
  Dotted line: Boundary is concealed by flooding.

In addition to the use of this symbology, the line work does not completely define the boundaries between map units. The map user infers the missing boundary contact based on features displayed on the map.

The digital capture of this map information included both the capture of the surficial geologic units as polygons and the line symbol representing the boundaries. Polygons representing the geomorphic units carry the map symbol for that unit. The lines representing the polygons carry attributes representing the line symbology. Figure 1 shows one quad in the Delta representing a portion of this data with geologic units and line symbols.


Figure 1. Portion of geomorphology in the Delta showing line symbology for polygon boundaries.

The source maps were compiled on the 1:24,000 scale U.S. Geological Survey map series for the Delta region. A full legend and report describing the surficial units accompany the compiled maps (Atwater, 1982). The line weight used for line symbols of boundaries represents about 20 to 25 meters on the ground. Table 1 shows the uncertainties associated with the line symbology.

Table 1. Uncertainty associated with line symbology.

Line Symbol     Source Description                                  Distance
Solid Line      Boundary accurate within 150 meters                 150 meters
Dashed Line     Boundary may err by more than 150 meters            225 meters
Dotted Line     Boundary concealed by flooding                      300 meters
Inferred Line   Boundary not indicated and added to close polygon   450 meters

Atwater specified a distance only for the solid and dashed lines. Distances, or zones of uncertainty, were assigned to the other lines for display and analysis. This line symbology represents the uncertainty associated with the placement of the boundary. This uncertainty is larger by an order of magnitude than the positional accuracy associated with the digital capture of the lines. However, the line symbology alone does not display the extent to which this uncertainty affects the map units. GRID provides a variety of opportunities to display this uncertainty.
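The relationship between the distances in Table 1 and the raster cell size can be worked through with a short sketch. This is only an illustration in Python, not the GRID commands used for the study. The dictionary keys and the function name are hypothetical, and the 25 meter cell size is the one reported for the GRID figures in this paper.

```python
# Zone-of-uncertainty distances in meters, keyed by line symbol (from Table 1).
# The key names are hypothetical attribute codes, not the codes used in the coverage.
UNCERTAINTY_M = {
    "solid": 150,
    "dashed": 225,
    "dotted": 300,
    "inferred": 450,
}

CELL_SIZE_M = 25  # cell size reported for the GRID figures


def zone_width_cells(line_symbol, cell_size_m=CELL_SIZE_M):
    """Return the uncertainty distance for a line symbol in whole cells."""
    distance_m = UNCERTAINTY_M[line_symbol]
    # Ceiling division, so the zone is never narrower than the stated distance.
    return -(-distance_m // cell_size_m)


for symbol, distance_m in UNCERTAINTY_M.items():
    print(f"{symbol:9s}{distance_m:4d} m = {zone_width_cells(symbol):2d} cells")
```

At this cell size, the zone on either side of a solid line is 6 cells wide, and the zone for an inferred line is 18 cells wide. The drawn line itself, at 20 to 25 meters, covers about one cell.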

Figure 2 shows one quad of this data with areas of uncertainty modeled in GRID. Highlighted squares are enlarged in subsequent figures. Each square is 1500 meters on a side.


Figure 2. GRID model of surficial geology and uncertainty in the position of boundaries for one quad of data.

Surficial Geology in the Delta

The Delta area is the largest estuarine environment on the west coast of the United States. It is an area of active deposition. Elevation ranges from about 15 meters below sea level to about 50 meters where alluvial fan or bedrock units extend well above sea level. Generally, the elevation varies by only a few meters above or below sea level. Since the beginning of the Pleistocene, sea water has advanced and retreated through the Delta area. Tidal sediments are interlayered with alluvium from the major rivers draining the Sierra Nevada and the Coast Ranges through the Central Valley of California. In addition, eolian sediments have accumulated in areas of the Delta as the climate has cycled from wet to arid periods during the Pleistocene and early Holocene.

Peat and muck are associated with tidal wetlands. During periods when sea levels were high, this unit covered most of the Delta. Alluvium from the major rivers, the San Joaquin and the Sacramento, mantles the area and has built natural levees along the main stream channels. Alluvium from minor streams draining the adjacent mountains mantles the edges of the Delta at slightly higher elevations.

The Delta has been greatly affected by human activities. The major streams feeding fresh water into the Delta have been dammed to provide hydropower and water supply. The Delta serves as the major conduit to supply fresh water to parts of the San Joaquin Valley and southern California. The rich peat and organic soils in the Delta have been farmed since the turn of the century, protected by levees constructed along the channels. When drained and farmed, the organic soils are subject to subsidence and rapid decomposition. Many of the drained areas in the Delta have subsided several meters below sea level since the turn of the century.

Modeling the Geologic Units and Boundaries for the Delta

Some map units can be expected to occur at certain elevations. Certain units occur in particular areas. Atwater discusses the relationship between elevation and location of the units in the report accompanying the map sheets. The percentage or type of inclusions present within the units is not identified. Elevation and location within the Delta are the factors used in modeling the occurrence of map units where uncertainty in the boundary position is greater than 225 meters.

The process followed in GRID is:

  1. An uncertainty distance is associated with each line based on its line symbol.
  2. A range of uncertainty around each line is created with the focal function.
  3. Within the bands of uncertainty of 150 and 225 meters, adjacent geologic units are allowed to migrate as a local function. This is based on the assumption that the geologic unit at a particular location within the zone is either the original unit or the adjacent unit.
  4. For larger zones of uncertainty, grids of elevation and of the separate geologic units were used to randomly assign geologic units.
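Steps 1 through 3 can be sketched outside of GRID on a toy raster. The following Python sketch is an illustration under stated assumptions and is not the GRID processing used for the study. The function names are hypothetical, a plain distance test stands in for the GRID focal function, and an equal chance of keeping the original unit or taking the adjacent unit is assumed because the migration rule is not quantified in the text.

```python
import math
import random

CELL_SIZE_M = 25                               # cell size reported in this paper
UNCERTAINTY_M = {"solid": 150, "dashed": 225}  # step 1: distance by line symbol


def build_zone(n_rows, n_cols, boundary_cells, distance_m, cell_size_m=CELL_SIZE_M):
    """Step 2: flag cells whose centers lie within distance_m of a boundary cell."""
    zone = [[False] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            zone[r][c] = any(
                math.hypot(r - br, c - bc) * cell_size_m <= distance_m
                for br, bc in boundary_cells
            )
    return zone


def migrate_units(units, zone, unit_pair, rng):
    """Step 3: inside the zone a cell keeps its unit or takes the adjacent unit.

    The equal chance given to the two units is an assumption of this sketch.
    """
    result = [row[:] for row in units]
    for r, row in enumerate(units):
        for c, unit in enumerate(row):
            if zone[r][c] and unit in unit_pair:
                result[r][c] = rng.choice(unit_pair)
    return result


# Toy raster: unit Qpm west of a solid boundary in column 15, unit Qym east of it.
N_ROWS, N_COLS, BOUNDARY_COL = 5, 30, 15
units = [["Qpm" if c < BOUNDARY_COL else "Qym" for c in range(N_COLS)]
         for _ in range(N_ROWS)]
boundary = [(r, BOUNDARY_COL) for r in range(N_ROWS)]

zone = build_zone(N_ROWS, N_COLS, boundary, UNCERTAINTY_M["solid"])
realization = migrate_units(units, zone, ("Qpm", "Qym"), random.Random(0))

# Print one realization: p = Qpm, y = Qym.
for row in realization:
    print("".join("p" if unit == "Qpm" else "y" for unit in row))
```

Cells outside the zone keep their original unit, so each run only changes the band of cells on either side of the boundary. Repeating the run with different seeds gives alternate realizations of the boundary.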

Figure 3 shows the migration of geologic units within GRID where the polygons on the source map are bounded by solid or dashed lines. The cell size is 25 meters.


Figure 3. Display of GRID model of alternate values for geologic units in area of solid and dashed lines.

For boundaries where the contact between units is either hidden by flooding or not clearly delineated on the source maps, probabilities are assigned to the map units. Surface elevation and the area within the Delta are used to assign a probability of that unit occurring at a location. The TOPOGRID function was used to generate an elevation grid from the U.S. Geological Survey 1:24,000 scale hypsography data for the area. Probabilities based on elevation and location for the various geologic units are:

Geologic Unit                                  Elevation (m)   Probability
Qpm - Tidal Peat and Muck                      -15 to -5       0.90
                                               -5 to 0         0.50
                                               0 to 2          0.20
Qm2e - Eolian deposits in Upper Pleistocene    -10 to -5       0.20
                                               -5 to 5         0.50
Qymc - Holocene alluvium of local drainages    -1 to 5         0.60
Qym - Holocene alluvium of local drainages     5 to 20         0.70
Qmz - Alluvial fans at edge of Delta           5 to 20         0.20
                                               20 to 50        0.90
Qds - Hydraulic dredge spoils along channels   -5 to 15        0.30

The random number generator in GRID is used in the assignment of geologic units in the area of uncertainty for a boundary. This assignment is based on grids of the maximum extent of each unit and the elevation at the location. Figure 4 shows the assignment of geologic units within GRID where the delineations on the source map are inferred and a zone of uncertainty of 450 meters was assigned. The cell size is 25 meters.


Figure 4. Display of GRID model of assigned values for geologic units using the random number generator in an area of inferred lines.
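The random assignment for the larger zones of uncertainty can be sketched in Python from the probability table above. This is an illustration only and is not the GRID processing used for the study. The function names are hypothetical. How the probabilities of competing units were combined is not stated, so the sketch assumes they act as relative weights in a single random draw. It also treats each elevation range as including its lower limit and excluding its upper limit, and it represents the maximum-extent grids as a set of units allowed at the cell.

```python
import random

# (unit, lower elevation m, upper elevation m, probability), from the table above.
PROBABILITY_TABLE = [
    ("Qpm", -15, -5, 0.90),
    ("Qpm", -5, 0, 0.50),
    ("Qpm", 0, 2, 0.20),
    ("Qm2e", -10, -5, 0.20),
    ("Qm2e", -5, 5, 0.50),
    ("Qymc", -1, 5, 0.60),
    ("Qym", 5, 20, 0.70),
    ("Qmz", 5, 20, 0.20),
    ("Qmz", 20, 50, 0.90),
    ("Qds", -5, 15, 0.30),
]
ALL_UNITS = {row[0] for row in PROBABILITY_TABLE}


def candidate_weights(elevation_m, allowed_units):
    """Return {unit: probability} for allowed units whose range holds the elevation."""
    weights = {}
    for unit, low, high, probability in PROBABILITY_TABLE:
        # Assumption: ranges are half-open, low <= elevation < high.
        if unit in allowed_units and low <= elevation_m < high:
            weights[unit] = probability
    return weights


def assign_unit(elevation_m, allowed_units, fallback, rng):
    """Draw one unit for a cell, using the table probabilities as relative weights."""
    weights = candidate_weights(elevation_m, allowed_units)
    if not weights:
        return fallback  # no unit qualifies, so keep the originally mapped unit
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rng = random.Random(42)
for elevation in (-7, 1, 10, 30):
    unit = assign_unit(elevation, ALL_UNITS, "Qpm", rng)
    print(f"elevation {elevation:4d} m -> {unit}")
```

At an elevation of 10 meters, for example, Qym, Qmz, and Qds all qualify with weights of 0.70, 0.20, and 0.30, so Qym is drawn most often but not always.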

Summary

The source maps, map legend, and report provided the information used for this analysis. The source maps and map legend clearly identified that there is uncertainty in the placement of boundaries. In the digital capture process, some boundaries were not clearly delineated on the source maps and were inferred. Zones of uncertainty were developed in GRID based on line symbols for the arcs representing the boundaries. The description of the geologic units and depositional environment provided the basis for further analysis in GRID. The uncertainty identified for the location of the boundaries between geologic units is 150 meters or more. This is far greater than the accuracy identified in digitally capturing this information.

The uncertainty in the placement of the boundaries is one characteristic of thematic data. Other common characteristics are the composition of map units, the distinctness in the contact between units, transition zones from one unit to another, and discontinuous contacts between units. This information is often not captured as attributes in GIS. It is information that is often apparent in the source documents. The map sheets, map legend, or reports often convey information which may be qualitative about these characteristics. This and other qualitative information is useful in the application and display of the digital information.

In capturing digital data, we generally capture a portion of the information available for the geographic feature. This is the data which is critical for our application. The metadata for our digital data set will provide a quantitative estimate of the accuracy in representing the source document. This estimate may be misleading. Just as metadata represents a condensed summary of the information and quality of the digital data, the digital data represents the capture of only a portion of the information on thematic maps. Qualitative information from the source materials can assist in describing the quality of the digital model in representing the real world features. The identification of source maps used to capture the digital data should identify any reports or legends associated with those maps. Salient characteristics of the features being captured and any quantitative or qualitative information about mapping of the feature should be identified. This can assist in the analysis and use of the digital data.


References

ASTM D5714-95, Specifications for Content for Digital Geospatial Metadata. 100 Barr Harbor Drive, West Conshohocken, PA 19428-2959.

Atwater, Brian, Geologic Maps of the Sacramento - San Joaquin Delta, California. Miscellaneous Field Studies Map MF-1401. Denver, CO: U.S. Geological Survey, 1982.

Beard, Kate and William MacKaness, Visual Access to Data Quality in Geographic Information Systems. Cartographica. Vol. 30, No. 2-3, 1993.

Buttenfield, Barbara, Representing Data Quality. Cartographica. Vol. 30, No. 2-3, 1993.

Federal Geographic Data Committee, Content Standards for Digital Geospatial Metadata. Washington, D.C., June, 1994.

Federal Geographic Data Committee, Content Standard for Digital Geospatial Metadata Workbook. Version 1.0, March 24, 1995.

Hoogerwerf, M. R. and E. R. V. Busink, Error Propagation in GIS Models. Proceedings of the Fourteenth Annual Esri User Conference, May, 1994.

McGranaghan, Matthew, A Cartographic View of Spatial Data Quality. Cartographica. Vol. 30, No. 2-3, 1993.

Seelig, B. D., J. L. Richardson, and R. E. Knighton, Comparison of Statistical and Standard Techniques to Classify and Delineate Sodic Soils. Soil Science Society of America Journal. Vol. 55, pp. 1042-1048, 1991.


David T. Hansen
GIS Specialist / Soil Scientist
U.S. Bureau of Reclamation
Mid Pacific Region
2800 Cottage Way
Sacramento, CA. USA 95825-1898
Phone: (916) 978-5268
FAX: (916) 978-5290
Email: dhansen@mp.usbr.gov