Modeling Spatial Uncertainty in Analysis of Archeological Site Distribution

Authors: David T. Hansen, G. James West, Barbara Simpson, Pat Welch


Abstract

In developing data for the analysis of archeological site distributions along the American and Cosumnes Rivers of California, it was recognized that there is considerable uncertainty in the spatial representation of site locations. There is uncertainty in both the location and the dimensions of known prehistoric sites. It is suspected that errors in the position of these sites affect the ability to model the relationship of these sites to key features associated with the prehistoric site record. This study relies on the weights of evidence extension to ArcView 3.x to generate predictive surfaces. In this extension, prehistoric sites are represented as point data and weights are calculated for associated evidential themes of distance to stream channel, landforms, surface elevation, and slope.

Uncertainty in site location is not directly modeled for point data. Instead, proximity to point data or generalization of other features for association with point data was used. Buffering key features was used to effectively model proximity to point locations. Proximity to landform contacts was found to be useful in developing predictive surfaces for separate study areas of these rivers. Generalization of the 30 meter DEM data for elevation and slope, as well as other processing of the DEM data, provided some insights on the relationship of these themes to site locations and added to the predictive model. This will assist the U.S. Bureau of Reclamation in improving our predictive modeling capability for archeological sites.


Introduction

Prehistoric site locations are often associated with natural physical features. Archeologists associate site locations with choices made by prehistoric groups that met their needs at that time. Sites have developed on and in response to changing landscapes. An understanding of the underlying probability distributions of prehistoric sites requires identifying features or variables associated with these site locations. This should improve modeling for prediction of site densities. GIS and automated spatial analysis provide the opportunity to examine large amounts of data to assist in modeling for site prediction. The effectiveness of this modeling is often limited by the quality of information associated with the archeological sites. Characteristics captured in detailed site descriptions often do not match those same characteristics in our digital representation of natural features. This is often compounded by uncertainty in site location.

This study is on prehistoric site distributions for two river systems on the eastern side of the Central Valley of California, the American River and Cosumnes River. The study area is entirely within Sacramento County and extends from the mouth of both river systems up to the lower foothills of the Sierra Nevada. This area has some of the highest recorded site densities for central California. The cultural chronology for these sites extends back 4500 years. However, most occupations date within the last 2500 years. Many of these sites indicate repeated occupations over time.

For the purpose of this study, the lower 47.9 kilometers (29.8 miles) of the American River and 58.6 kilometers (36.4 miles) of the Cosumnes River are included. The stream channels of these systems are within 20 kilometers of each other. The overall intent of the program is to characterize prehistoric settlement patterns and densities with these goals:

  1. Compile an accurate database of known site locations and temporal affiliation
  2. Identify and evaluate characteristics or variables which influence site densities
  3. Improve modeling capabilities for prediction of site densities (West et al., 2002).

For this analysis, the weights of evidence extension to ArcView 3.x was used. This extension was developed by the Geological Survey of Canada. It is a Bayesian method for evaluating the occurrence of site locations against different data or evidential layers. These data themes, reclassified into presence or absence classes, can then be combined to produce a predictive surface for site locations. Bonham-Carter provides a description of weights of evidence and its application in Geographic Information Systems for Geoscientists (Bonham-Carter, 1994). This extension was used to evaluate archeological site distribution for the Delta Region of California (Hansen, 2000). As in this earlier study, relationships initially explored using this extension were:

  1. Relationship to Landforms
  2. Proximity to Stream Channels
  3. Elevation Range
  4. Slope Range.

Figure 1 shows the results from this initial iteration. Contrast values shown in the table of weights are the difference between the positive and negative weights (the positive weight minus the negative weight) for areas of the evidential theme associated with the test sites.

Figure 1: Initial Trial of Weights of Evidence for American and Cosumnes Rivers

Contrast values of 2.0 or more show a strong relationship between the theme and the test points. While the themes of landforms, proximity to stream channel, and elevation groups show a strong relationship, the overall posterior probability for predicting site locations is very low. From this trial with weights of evidence, it was decided to take a closer look at factors affecting the data layers used and the representation of the site locations.
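The weights and contrast discussed above can be illustrated with a minimal sketch. It assumes the standard weights of evidence formulation for a single binary theme, with the study area divided into unit cells and each site occupying one cell. The function name and the counts in the example call are hypothetical and are not taken from the ArcView extension or from this study.

```python
import math

def weights_of_evidence(total_cells, theme_cells, total_sites, sites_in_theme):
    """Return (W+, W-, contrast) for one binary evidential theme.

    Assumes unit cells with at most one site per cell. The weights are
    undefined when no site, or every site, falls inside the theme.
    """
    sites_outside = total_sites - sites_in_theme
    nonsite_cells = total_cells - total_sites
    nonsite_in_theme = theme_cells - sites_in_theme
    nonsite_outside = nonsite_cells - nonsite_in_theme

    # W+ = ln[ P(theme present | site) / P(theme present | no site) ]
    w_plus = math.log((sites_in_theme / total_sites) /
                      (nonsite_in_theme / nonsite_cells))
    # W- = ln[ P(theme absent | site) / P(theme absent | no site) ]
    w_minus = math.log((sites_outside / total_sites) /
                       (nonsite_outside / nonsite_cells))
    # Contrast is the difference between the two weights.
    return w_plus, w_minus, w_plus - w_minus

# Hypothetical counts for illustration only.
w_plus, w_minus, contrast = weights_of_evidence(
    total_cells=10000, theme_cells=2000, total_sites=50, sites_in_theme=40)
print(round(w_plus, 2), round(w_minus, 2), round(contrast, 2))
```

A theme that captures most sites in a small share of the area yields a large positive W+ and a negative W-, and therefore a high contrast.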

Evaluation of Data Layers for Analysis

Principal data used in this analysis are site locations, stream channels for the river systems, major landforms, and elevation data. For use with weights of evidence, the site locations are treated as point data.

Site Locations

For this analysis, site locations were treated as point data. Sites used represent all known sites as of 1998. Most of these sites were recorded before 1960. Many of the early site surveys were not systematic, and problems are expected in the representation of site location. There are 47 sites along the American River and 123 sites along the Cosumnes River. Generally, sites are described as occurring on gently sloping mounds ranging in height from about 15 cm to over 2 meters above the surrounding terrain. Often they are assumed to be natural rises that have been enlarged by the accumulation of midden material. Some of the sites are congruent with the general land surface or are completely buried with no surface indication of prior habitation. No midden accumulations are found on dry higher terraces and uplands away from stream channels (West et al., 2002).

Table 1 identifies the primary sources for these sites for the two river systems.

Table 1 - Source of Site Locations

Data Source                                       American River    Cosumnes River
Nilsson 1995 report - polygons                    31                ----
Kitchen middens identified by Weir et al. 1950    ----              23
Point locations from CHRIS database               16                100
Total number of sites                             47                123

The best positional and areal representation of sites is for a portion of the American River, from a 1995 report for the Corps of Engineers (Nilsson et al., 1995). These site locations were mapped in detail on 1:24,000 scale topographic sheets from field notes. A second group of polygon data was captured from soils mapping for the 1940 to 1941 time period (Weir et al., 1950). This mapping was compiled on a planimetric base at a scale of about 1:31,680. Source maps were scanned and the images rectified to match, as closely as possible, other 1:24,000 scale data layers. The remaining group of data was captured from a tabular database of site locations from the California Historical Resource Information Service Center (CHRIS). All sites were reviewed to remove duplicates; nonetheless, some duplication, particularly in the Cosumnes River dataset, may remain.

Duplication of sites and the actual site location will affect the computation of weights with the weights of evidence extension. Figure 2 shows a portion of a scanned image from the 1950 soil mapping.

Figure 2: Image of 1950 Soil Map with Kitchen Middens registered to other Base Data along the Cosumnes River

In this figure, the white dotted lines and numbers are from the scanned image. The flood plain level in blue is from detailed soil mapping from NRCS published in 1993. Kitchen middens from the 1950 maps carry the symbol 36. In developing the digital representation of kitchen middens, the overall displacement of midden locations was felt to be about 250 meters. This distance is indicated by the light halo around the midden locations. Other points in this figure are from the CHRIS database. While these points are in close proximity to midden locations, they were treated as separate sites for this analysis. In comparing the polygons from the 1:24,000 scale site mapping for the American River study area and the kitchen middens, the following range in polygon size was noted:

Differences between the two areas may be due to the tendency for sites along the Cosumnes River to be larger rather than to differences in mapping between sources or registration problems in developing the digital kitchen midden layer. The 1950 soil mapping did not identify any kitchen middens along the American River for comparison. All duplicates were eliminated for the American River. For comparison, points from the CHRIS database are shown with 250 meter halos around the points. Although most sites from the CHRIS database were reported before 1960 and their actual positional accuracy is not known, these sites along the American River appear to be accurately located based on field investigations and review.

Stream Channels

The digital source for the river channels is the National Hydrography Dataset (NHD), with the lines representing the stream channels selected out. For the analysis, these lines represent the current thalweg for the two systems. The NHD layer for this area is the currently available 1:100,000 scale NHD data. These main channels also serve to define the study areas with a buffer of 500 kilometers as shown in Figure 1. The flow in the American River is considerably larger than that in the Cosumnes. Flow regimes in this area have large seasonal variations. Most precipitation occurs between October and May, during the fall and winter months, with high flows in winter and spring. Low flows occur during summer and fall. Before development, both systems typically had perennial flow. The American has one primary channel, and the Cosumnes River has Deer Creek as a secondary channel that parallels the main channel.

These trials buffered the lines representing the channels for the American and Cosumnes Rivers for calculation of weights based on proximity to channel. Deer Creek was not buffered for calculation of weights. The initial analysis showed high contrast values for the buffered section of the stream channel. Running the trials separately for the two systems also showed strong contrast values. However, the contrast was much stronger for the American River within a shorter buffer distance of the stream channel. This can be seen in Table 3.

Landforms

The SSURGO database for Sacramento County from the Natural Resources Conservation Service is the source of both detailed soil information and landforms for this analysis (NRCS). The digital representation of landforms was developed from the SSURGO database based on the description by Roger Parsons of landform and soil relationships contained in the soil survey report for Sacramento County (Tugel, 1993). Figure 3 shows the landforms for the American and Cosumnes Rivers and associated sites. Table 2 contains the extent of major landforms for the study areas and the number of sites.

Figure 3: Site Distribution and Major Landforms along American and Cosumnes Rivers

Table 2 -- Extent of Landforms in Study Areas

                                        American River                Cosumnes River
Landform                                Hectares  % Area  # Sites     Hectares  % Area  # Sites
F0 -- Active Flood Plain                1,524     3.4     4           350       0.6     5
F1 -- Low flood plain                   5,205     11.6    7           7,715     14.8    57
F2 -- High flood plain                  2,627     5.9     11          2,550     4.9     12
Total -- Flood plain level              9,356     20.9    22          10,615    20.3    74
T1 -- Low stream terrace                1,436     3.2     6           400       0.6     0
T2cd -- Low terrace - channel deposit   13,075    29.3    8           1,031     2.1     2
T2ml -- Low terrace - main level        11,911    26.5    1           23,903    46.2    28
Total for T2 -- Low terraces            24,986    55.8    9           24,934    48.3    30
T3 -- Intermediate terrace              449       1.0     5           1,390     2.4     5
T4 -- High terrace and Hills            8,547     19.1    5           14,588    28.4    14
Other (Dredge tailings, Urban)          12        0.0     --          0         0.0     --
Total                                   44,786    100     47          51,927    100     123

The table shows that while there are differences between the river systems, the extent of major landform groups is comparable. This table also shows the number of prehistoric sites identified with each landform based on overlay of the point data. While this shows some differences between the weights for some landform units, both systems show positive weights and contrast for flood plain levels F0, F1, and F2. It is expected that most sites will be in close proximity to what over time has represented the flood plain. Figures 4 and 5 show the areas associated with sites based on the weights calculated for the landform units. For both river systems, the flood plain units comprise most of the area associated with the sites. This is represented in the landform data by the combined flood plain levels. The first trial with weights of evidence as shown in Figure 1 showed relationships in contrast values based on the flood plain association. The American River also includes some areas on the low stream terrace level and intermediate terrace level. These sites are typically near the contact with the flood plain level.
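As a worked illustration of the site counts by landform, the sketch below turns the grouped hectare and site totals from Table 2 into sites per 1,000 hectares and into the ratio of each landform's share of sites to its share of area. The grouping labels, the function name, and the choice of these two measures are assumptions made for illustration; they are not outputs of the weights of evidence extension. The "Other" class is left out, so the American River area total is 12 hectares smaller than in the table.

```python
# Hectares and site counts copied from the grouped rows of Table 2.
TABLE2 = {
    "American River": {
        "Flood plain (F0-F2)": (9356, 22),
        "Low stream terrace (T1)": (1436, 6),
        "Low terraces (T2)": (24986, 9),
        "Intermediate terrace (T3)": (449, 5),
        "High terrace and hills (T4)": (8547, 5),
    },
    "Cosumnes River": {
        "Flood plain (F0-F2)": (10615, 74),
        "Low stream terrace (T1)": (400, 0),
        "Low terraces (T2)": (24934, 30),
        "Intermediate terrace (T3)": (1390, 5),
        "High terrace and hills (T4)": (14588, 14),
    },
}

def density_table(landforms):
    """Return {landform: (sites per 1,000 ha, site share / area share)}."""
    total_area = sum(area for area, _ in landforms.values())
    total_sites = sum(sites for _, sites in landforms.values())
    rows = {}
    for name, (area, sites) in landforms.items():
        per_1000_ha = 1000.0 * sites / area
        # A ratio above 1 means the landform holds more sites than its
        # share of the study area alone would suggest.
        ratio = (sites / total_sites) / (area / total_area)
        rows[name] = (per_1000_ha, ratio)
    return rows

for river, landforms in TABLE2.items():
    print(river)
    for name, (density, ratio) in density_table(landforms).items():
        print(f"  {name}: {density:.2f} sites per 1,000 ha, ratio {ratio:.2f}")
```

For both rivers the flood plain level holds a larger share of sites than of area, while the low terraces hold a smaller share, which is consistent with the positive weights reported for F0, F1, and F2.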

Figure 4: Areas of Major Landforms associated with sites for the American River - Primarily the Flood Plain Level

Figure 5: Areas of Landforms associated with sites along the Cosumnes River - the Flood Plain Level

The break between the overall flood plain level and the other landforms is fairly distinct based on the detailed soil data. The break between the currently active flood plain and the low and high flood plain is very distinct. However, as is well known with soils data, inclusions of other soils and, in this case, other landform surfaces can be expected. This will affect the relationship identified between the point site location and the landform. The break between the older (Pleistocene) surfaces such as T2, T3, and T4 is not as easily marked in this gently sloping landscape. Sites on these surfaces are expected to be in close proximity to the flood plain level even if they were incorrectly located. Figure 3 shows that the site distribution is frequently near the contact between the flood plain level and other surfaces. The proximity of sites to this contact can be seen in Figures 4 and 5 for portions of the rivers. The proximity of sites to the contact between flood plain levels and other surfaces was modeled in the extension by buffering the lines representing these contacts. Figures 6 and 7 show the result of this modeling. The graphs in these figures show that the contrast values peak at about 4.0 at a distance of about 700 meters from the contact between the flood plain and the other surfaces.
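The search for the buffer distance at which contrast peaks can be sketched as follows. This is a minimal illustration that assumes cumulative buffers around the contact line, unit cells, and the standard weights of evidence contrast. The function names, the site distances, and the cell counts are hypothetical and do not reproduce the values behind Figures 6 and 7.

```python
import math

def contrast_by_distance(site_distances, cells_within, total_cells):
    """Contrast for each cumulative buffer distance.

    site_distances: distance (m) from each site to the landform contact.
    cells_within:   {buffer distance (m): unit cells inside that buffer}.
    """
    n_sites = len(site_distances)
    nonsite_total = total_cells - n_sites
    results = {}
    for dist, cells_in in sorted(cells_within.items()):
        sites_in = sum(1 for d in site_distances if d <= dist)
        sites_out = n_sites - sites_in
        nonsite_in = cells_in - sites_in
        nonsite_out = nonsite_total - nonsite_in
        # Weights are undefined when any of these counts is zero.
        if min(sites_in, sites_out, nonsite_in, nonsite_out) <= 0:
            continue
        w_plus = math.log((sites_in / n_sites) / (nonsite_in / nonsite_total))
        w_minus = math.log((sites_out / n_sites) / (nonsite_out / nonsite_total))
        results[dist] = w_plus - w_minus
    return results

def best_buffer(results):
    """Buffer distance with the highest contrast."""
    return max(results, key=results.get)

# Hypothetical inputs for illustration only.
distances = [50, 120, 300, 450, 650, 680, 900, 1500, 2500, 4000]
cells = {100: 500, 300: 1500, 700: 3500, 1500: 7000, 3000: 9000}
curve = contrast_by_distance(distances, cells, total_cells=10000)
print(best_buffer(curve))
```

Using cumulative buffers means a site that is slightly misplaced relative to the contact still falls inside the buffered area, which is how proximity stands in for positional uncertainty here.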

Figure 6: Area associated with sites by buffering flood plain contact along the American River

Figure 7: Area associated with sites by buffering flood plain contact along the Cosumnes River

The flood plain level includes the currently active flood plain, which is expected to migrate across the other flood plain levels. With extraordinary events, it can include the low stream terrace and other low terraces.

Elevation and Slope

The range in elevation within these separate basins is from about sea level to 180 meters (590 feet). Within the study area, the elevation along the stream channel for the American River ranges from about sea level to 30.5 meters (0 to 100 feet). Within the study area, the terrain ranges up to 186 meters (600 feet) with a mean elevation of 38.3 meters. For the Cosumnes River, elevation along the stream channel ranges from about sea level up to 54.8 meters (0 to 180 feet). The National Elevation Dataset (NED) with 30 meter postings was used for evaluating relationships of elevation and slope with site locations. In the initial run with weights of evidence in Figure 1, strong contrast values are shown for slope and elevation. This relationship is not very clear when looking at weights for individual elevation or slope values. Both elevation and slope had to be grouped into broad classes when generating these weights. For analysis with each river system, the elevation and slope data were generalized from 30 by 30 meter cells to 150 by 150 meter cells using GRID focal functions. Several block sizes were tried, ranging from 90 to 270 meters.
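The generalization from 30 meter cells to larger blocks can be sketched in plain Python. This is an illustrative block statistic on a small list-of-lists grid, not the GRID function used in the study. The function name, the sample grid, and the choice of statistic are assumptions. A factor of 5 corresponds to 150 meter blocks, and factors of 3 to 9 correspond to the 90 to 270 meter range that was tried.

```python
def block_aggregate(grid, factor, func=max):
    """Aggregate a 2-D grid into factor-by-factor blocks.

    grid:   list of rows of cell values (for example 30 m elevation cells).
    factor: cells per block side; 5 turns 30 m cells into 150 m blocks.
    func:   statistic applied to each block (max, min, or a mean function).
    Blocks on the right and bottom edges may be partial.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, n_rows, factor):
        row = []
        for c0 in range(0, n_cols, factor):
            block = [grid[r][c]
                     for r in range(r0, min(r0 + factor, n_rows))
                     for c in range(c0, min(c0 + factor, n_cols))]
            row.append(func(block))
        out.append(row)
    return out

def mean(values):
    return sum(values) / len(values)

# Hypothetical 10 x 10 grid of cell values for illustration only.
grid = [[r * 10 + c for c in range(10)] for r in range(10)]
print(block_aggregate(grid, 5))        # block maximum
print(block_aggregate(grid, 5, mean))  # block mean
```

Coarser blocks trade local detail for tolerance of positional error: a site plotted a cell or two from its true position still receives the same block value.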

For this analysis, lines representing the streams were split into 5 kilometer segments. The buffered area was split at these points to generate approximately equal areas. This is shown in Figure 8 with the percentage area of the main flood plain level identified for each section. For the American River, the area of flood plain ranges from 90 percent at the confluence with the Sacramento River down to about 2 percent 45 kilometers upstream. The area in each segment of flood plain for the Cosumnes River ranges from about 60 percent down to about 12 percent. The area and width of the Cosumnes flood plain are much more uniform along its length. This shows some clear differences between the two systems.

Figure 8: American and Cosumnes Rivers split into 5 kilometer segments for processing with elevation data

It is also expected that relative elevation above the nearby water surface has a stronger relationship to site position than absolute elevation above sea level or an elevation datum. The sections of 5 kilometer river segments shown in Figure 8 were used to adjust elevation values back to an overall base level along the stream. Elevation values were maximized on a 150 by 150 meter block basis for calculating weights. Maximum slopes were also generalized to 150 by 150 meter blocks for calculation of weights.
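The adjustment of elevation to a base level by river segment can be sketched as below. The study does not state how the base level for each 5 kilometer segment was derived, so this sketch assumes it is the lowest channel elevation sampled within the segment. All function names and sample values are hypothetical.

```python
def segment_index(distance_km, segment_length_km=5.0):
    """Index of the river segment containing a distance along the channel."""
    return int(distance_km // segment_length_km)

def segment_base_levels(channel_samples, segment_length_km=5.0):
    """Base level per segment from (distance_km, elevation_m) channel samples.

    Assumption: the base level is the lowest channel elevation in the segment.
    """
    base = {}
    for dist_km, elev_m in channel_samples:
        seg = segment_index(dist_km, segment_length_km)
        base[seg] = min(elev_m, base.get(seg, elev_m))
    return base

def relative_elevation(cell_elev_m, cell_dist_km, base, segment_length_km=5.0):
    """Elevation of a cell above the base level of its river segment."""
    return cell_elev_m - base[segment_index(cell_dist_km, segment_length_km)]

# Hypothetical channel profile for illustration only.
channel = [(0.0, 0.5), (2.5, 1.0), (5.0, 3.0), (7.5, 4.5), (10.0, 6.0)]
base = segment_base_levels(channel)
print(relative_elevation(10.0, 6.2, base))
```

Working in height above the local base level lets sites near the river mouth and sites far upstream be compared on the same scale.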

Combined Layers for Predictive Surface

The data layers were reclassified into binary classes of association or of no association. Table 3 shows the data layers and their data values classified as associated with the sites.

Table 3 - Features Associated with Sites for Combined Weights

Data Layer                                                            American River            Cosumnes River
Landform (polygons)                                                   F0 -- Active Flood Plain  F0 -- Active Flood Plain
                                                                      F1 -- Low Flood Plain     F1 -- Low Flood Plain
                                                                      F2 -- High Flood Plain    F2 -- High Flood Plain
Proximity to Stream Channel (buffered distance, meters)               100 to 900                100 to 2,200
Proximity to Flood Plain Contact (buffered distance, meters)          100 to 700                100 to 900
Proximity to Low Stream Terrace Contact (buffered distance, meters)   100 to 700                100 to 700
Proximity to Low Flood Plain Contact (F1) (meters)                    ---------                 100 to 900
Slope Range for Maximum Slope                                         1 to 55                   2 to 28
Adjusted Elevation Range (meters)                                     7 to 56                   2 to 29

The layers were combined and produced the probability surfaces shown in Figures 9 and 10 for both streams.

Figure 9: Results of combining weights for data themes along the American and Cosumnes Rivers

Figure 10: Results of combining weights for data themes along the American and Cosumnes Rivers

In these figures, W2 represents areas of the theme associated with sites and W1 represents areas not associated with the sites. Contrast is the combined weight for the theme. Comparing these values to Figure 1, the overall contrast for the landform theme has dropped for separate runs of the American and Cosumnes Rivers. However, contrast for the proximity to the flood plain contact increases dramatically for both river systems. The proximity to the stream channel also increases for both systems when they are run independently. Maximum slopes in 150 meter blocks worked well for the American River, but not for the Cosumnes. The adjusted elevation is lower than the elevation groups used in the initial trial. However, both elevation and slope classifications contributed positive contrast values to the overall surfaces for the American and Cosumnes trials. The overall probability surfaces for the two systems have a much broader range for comparison with the site locations.
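The way binary themes combine into a probability surface can be sketched for a single cell. This follows the standard weights of evidence formulation, which assumes the themes are conditionally independent: the log odds of the prior are adjusted by the positive weight of each theme that is present and by the negative weight of each theme that is absent. The function name, the prior, and the weights in the example are hypothetical and are not the values shown in Figures 9 and 10.

```python
import math

def posterior_probability(prior, weights, present):
    """Posterior probability of a site in one cell.

    prior:   prior probability of a site per unit cell.
    weights: list of (w_plus, w_minus) pairs, one per binary theme.
    present: list of booleans, True where the theme is present in the cell.
    Assumes the themes are conditionally independent.
    """
    logit = math.log(prior / (1.0 - prior))
    for (w_plus, w_minus), is_present in zip(weights, present):
        logit += w_plus if is_present else w_minus
    odds = math.exp(logit)
    return odds / (1.0 + odds)

# Hypothetical prior and weights for illustration only.
prior = 0.01
weights = [(1.5, -0.8), (0.9, -0.4), (0.6, -0.2)]
print(posterior_probability(prior, weights, [True, True, True]))
print(posterior_probability(prior, weights, [False, False, False]))
```

Because the weights add in log odds, a theme with higher contrast widens the gap between cells where it is present and cells where it is absent, which broadens the range of the resulting surface.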

 

Summary and Conclusions

Of the 170 sites in this study, only 31 sites were considered to be well located at a reference scale of 1:24,000. The other 16 sites for the American River were felt to be reasonably well located. Most of the other sites were identified and reported prior to 1960. The accuracy in their position and size is unknown. Based on site descriptions and general positions, it was expected that the sites should show strong relationships to landform position, proximity to stream channel, and location above the active flood plain.

Errors in the position of site locations can affect the generation of predictive surfaces. In weights of evidence, the predictive surface is generated on the area or extent of features associated with or not associated with point events. In this study, the uncertainty in position of prehistoric sites relative to features ranged from 15 meters for recently mapped sites to well over 200 meters for kitchen middens. The overall higher values shown for the probability surface of the American River may be due to the better positional accuracy for prehistoric sites along the American River.

Besides running the analysis based on the point test feature being in or out of a polygon theme of landforms, site locations were evaluated against their proximity to landform contacts by buffering that contact. This effectively modeled the uncertainty in the site locations. It reduced the effect of sites being incorrectly located outside of the landform with which they are associated. In addition, it reduces the effect of inclusions in a polygon feature, if those inclusions also occur nearby.

In an effort to model uncertainty of site position relative to surface elevation and derived slope, these surfaces were generalized. For this area, a generalization to 150 meter square blocks appeared to provide the best relationship to site location. This assisted in developing a predictive surface. Elevation was adjusted back to a simulated water surface at 5 kilometer intervals along the river systems. While this processing did not identify any clear relationships for either elevation or slope, it provides a framework for further analysis of any relationship.

In this study, the spatial uncertainty in the position of archeological sites was not directly modeled. Besides generating a predictive surface based on other GIS layers containing or not containing a site, this study used proximity of sites to key features and generalized elevation models. For key features associated with site location, proximity to those features, such as stream channels and major landform breaks, assisted in developing a predictive surface. This, together with generalizing elevation and slope data to a level that showed some response to the test sites, assisted in developing predictive surfaces for these two study areas. It provides a basis for further work in developing our modeling capability to predict prehistoric site distribution.


References

Bonham-Carter, Graeme F., Geographic Information Systems for Geoscientists, Pergamon Press, Elsevier Sciences Inc, Tarrytown, New York, 10591-5153, 1994.

Hansen, David T., Visualizing Uncertainty Captured from Source Documents with GRID, Proceedings of the Eighteenth Annual Esri International User Conference, San Diego, CA, July 1998.

Hansen, David T., Describing GIS Applications: Spatial Statistics and Weights of Evidence Extension to ArcView in the Analysis of the Distribution of Archaeology Sites in the Landscape, Proceedings of the Twentieth Annual Esri International User Conference, San Diego, CA, July 2000.

Natural Resources Conservation Service, Soil Survey Geographic (SSURGO) Data Base, U.S. Department of Agriculture, NRCS, National Soil Survey Center, Miscellaneous Publication Number 1527, P.O. Box 6567, Fort Worth, TX 76115-0567, January 1995.

Nilsson, E., J. J. Johnson, M. S. Kelly, and S. Flint, Archeological Inventory Report, Lower American River Watershed Investigation, California, Dames & Moore, Inc. for U.S. Army, Corps of Engineers, Sacramento District, Sacramento, 1995.

Parsons, Roger B., Geomorphic Surfaces, in Soil Survey of Sacramento County, California, A. J. Tugel et al., Natural Resources Conservation Service, Washington, D.C., 1993.

Tugel, A., et al., Soil Survey of Sacramento County, California, Natural Resources Conservation Service, Washington, D.C., 1993.

Weir, W. W., Soils of Sacramento County California, University of California, Berkeley, College of Agriculture, Agriculture Experiment Station, April 1950.

West, G. James, David Hansen, and Patrick Welch, A Geographic Information System Based Analysis of the Distribution of Prehistoric Archeological Sites in the Sacramento - San Joaquin River Delta, California, Along the Shores of Time, Proceedings from an International and Interdisciplinary Conference, Rodger F. Kelly and Gary Franklin, Editors, National Park Service, March 31 to April 3, 1999.

West, G. James, David T. Hansen, Patrick Welch, William Olsen, and Tom Heinzer, A Spatial and Temporal Analysis of Prehistoric Site Distribution in the Lower Reaches of Two Central California Drainages, Paper prepared for publication, 2002.

Acknowledgments

The authors would like to acknowledge William Olson for providing information on the region's archeology.


Authors

David T. Hansen
G. James West
Barbara Simpson
Pat Welch
U.S. Bureau of Reclamation
Mid Pacific Region
2800 Cottage Way
Sacramento, CA. USA 95825-1898
Phone: (916) 978-5268
FAX: (916) 978-5290
Email: dhansen@mp.usbr.gov