Jun (John) Hu

Methods of Generating Surfaces In Environmental GIS Applications

In environmental GIS applications, generating surfaces is a frequent requirement in the early stage of analysis. Topographic surfaces, bedrock surfaces, groundwater tables, and contamination plumes are examples of surfaces that need to be generated from available data sources. These surfaces provide the basic information for further spatial analyses in environmental applications. Based on them, we can carry out additional analyses to answer questions such as what the saturated thickness is, or how contamination is distributed in subsurface soil. Thus, the accuracy of subsequent analyses depends directly on the accuracy of the surfaces built in the early stage of analysis.

ArcInfo offers several methods for creating surfaces. These methods include trend surface (TREND), inverse distance weighted (IDW), triangulation (CREATETIN), and kriging (KRIGING). Each of these methods has its own advantages and disadvantages in terms of data interpolation. No single method works best for all data sets. Selection of a particular method depends on the distribution of data points and the study objectives.

The purpose of this paper is to discuss these methods and to provide some suggestions on how they should be properly used to construct surfaces in environmental applications. The emphasis of this paper is on the process and methodology of generating surfaces. It also addresses related problems such as how the data set should be evaluated prior to surface generation.


INTRODUCTION

In environmental GIS applications, generating surfaces is a frequent requirement in the early stage of analysis. Examples of surfaces include topographic surfaces, bedrock surfaces, groundwater tables, and contamination plumes for specific media. Surfaces like these provide critical information for subsequent analyses involving subjects such as groundwater modeling, saturated thickness, subsurface distribution of contamination, and volume estimation.

Data for generating these surfaces are usually collected through field sampling and surveying. Because of high cost and limited resources, data collection can be conducted only at a limited number of selected point locations. In order to generate a continuous surface of a property (e.g., a groundwater table), some kind of interpolation method has to be used to estimate surface values at those locations where no samples or measurements were taken.

ArcInfo offers four interpolation methods: trend surface (TREND), inverse distance weighted (IDW), triangulation (CREATETIN), and kriging (KRIGING). The triangulation method generates surfaces represented by irregularly spaced points (TIN). The other three methods generate surfaces represented by equally spaced data points (GRID). In terms of interpolation, each of these methods has its own advantages and disadvantages. No single method works best for every data set. The intent of this paper is to give a comparative evaluation of these interpolation methods and to provide some insights on how they should be used properly to generate surfaces.

DATA EVALUATION AND PREPARATION

This is always the first and an important step in generating any surface, since different interpolation methods work best for different data sets. A good understanding of the data set is an essential ingredient of good interpolation. The time taken to explore, understand, and describe the data set is amply rewarded through quicker adoption of the most suitable interpolation method (Isaaks and Srivastava, 1989).

Data evaluation consists of two parts: quality evaluation and statistical evaluation. Quality evaluation is to ensure that data are accurate and representative. For example, well data for generating a groundwater table should be based on water level readings taken on the same date and from the same type of wells (such as shallow monitoring wells). In order to accurately represent the normal groundwater table, the sampling date should be carefully chosen to reflect normal weather conditions.

Chemical and radiological data should be verified and validated before they are used to generate concentration surfaces. If these data are reported from several laboratories, special checks should be made to eliminate the differences among laboratories. It is possible that the same sample may have different concentration or activity values because different laboratories may use different instruments or analytical procedures. Another issue is the use of analyzed values that are below the detection limits. In most cases, detection limits are used as the substituted concentration or activity values for samples below detection limits. However, other statistically defined values can also be used (Helsel, 1990).

Special attention should be given to abnormal values appearing in the data set, since these values will seriously distort the interpolation process. The best way to detect them is through a map or visual display of the data set. Abnormal values usually appear as "spatial outliers", i.e., an extremely low value surrounded by high values or vice versa. A map display also provides additional insights about the data set: how samples are spatially distributed (uniformly or in clusters), and what general pattern and trend (high values and low values) is associated with the data set.

Statistical evaluation is to understand the statistical properties associated with the data set. These properties may include the distribution, location, spread, and shape of the data set. There are many tools available in univariate descriptive statistics which can be used to describe these properties (Isaaks and Srivastava, 1989; England and Sparks, 1992). For example, the frequency table and corresponding histogram can be used to describe how often observed values fall within certain intervals or classes. A probability plot can be used to determine how close the distribution of the data set is to a Gaussian or normal distribution.

Summary statistics provide a valuable summary of the information contained in the histogram. They fall into three groups: measures of location, measures of spread, and measures of shape. Measures of location describe where various parts of the distribution lie. The center of the distribution is described by the mean, the median, and the mode. The locations of other parts of the distribution are given by various quantiles. Measures of spread describe the variability. This group includes the variance, the standard deviation, and the interquartile range. Measures of shape include the coefficient of skewness and the coefficient of variation. The coefficient of skewness provides information on symmetry, while the coefficient of variation gives information on the length of the tail for certain types of distributions.
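
To make this concrete, the following minimal sketch (in Python with NumPy and SciPy, which are assumptions of this illustration and not part of ArcInfo; the sample values are hypothetical) computes the summary statistics described above:

    import numpy as np
    from scipy import stats

    # Hypothetical sample values (e.g., groundwater elevations in feet)
    values = np.array([812.3, 815.1, 809.7, 820.4, 811.0, 814.6, 813.2, 818.9])

    # Measures of location
    mean = values.mean()
    median = np.median(values)
    q1, q3 = np.percentile(values, [25, 75])

    # Measures of spread
    variance = values.var(ddof=1)        # sample variance
    std_dev = values.std(ddof=1)         # sample standard deviation
    iqr = q3 - q1                        # interquartile range

    # Measures of shape
    skewness = stats.skew(values)        # coefficient of skewness
    coef_var = std_dev / mean            # coefficient of variation

    print(mean, median, variance, std_dev, iqr, skewness, coef_var)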

INTERPOLATION METHODS

A frequently asked question is what the best interpolation method is for any data set. The answer is that there is none. There are many interpolation methods available (Watson, 1992; Burrough, 1986, p.147-166; Lam, 1983, p.129-149; Ripley, 1981, p.28-77). Each of these methods works best for a particular data set because of its inherent assumptions and algorithm design for estimation. For a given data set, different interpolation methods may work best for different study objectives (e.g., a smooth surface vs. an accurate surface). Although neighborhood-based linear interpolation with blended gradients may seem to be a more general and flexible method (Watson, 1992), there is still no interpolation method which will guarantee the best results for all data sets. With this statement in mind, selection of a particular method should depend upon the characteristics of the data set as well as the study objectives.

The following discussion focuses on the four interpolation methods which can be easily accessed by ArcInfo users: trend surface, inverse distance weighted, triangulation, and kriging. Among these four, trend surface acts as a global interpolation method while the other three function as local interpolation methods. With global interpolation methods, all observational points within a study area are utilized to estimate the value at a new point. With local interpolation methods, only a subset of observational points located near the new point are used to estimate the value at that point.

1. Trend Surface

The idea behind trend surface interpolation is to fit a least-squares surface to the observational data points by using polynomial regression. The advantage of this method is that it is superficially easy to understand, at least with respect to the way the surfaces are estimated. It can be used to show broad features of the observational data points, such as the overall flow direction of groundwater.
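
As a sketch of the underlying computation (in Python with NumPy, not the ArcInfo TREND command itself; all coordinates and values are hypothetical), a first-order trend surface z = a + bx + cy can be fit by ordinary least squares:

    import numpy as np

    # Hypothetical observational points: coordinates and measured values
    x = np.array([0.0, 1.0, 2.0, 0.5, 1.5, 2.5])
    y = np.array([0.0, 0.5, 1.0, 2.0, 1.5, 0.5])
    z = np.array([10.2, 11.1, 12.3, 10.8, 11.9, 12.7])

    # Design matrix for a first-order (planar) surface: z = a + b*x + c*y
    A = np.column_stack([np.ones_like(x), x, y])
    (a, b, c), _, _, _ = np.linalg.lstsq(A, z, rcond=None)

    # Estimate the surface value at a new location
    xn, yn = 1.2, 0.8
    print(a + b * xn + c * yn)

A higher-order surface is obtained in the same way by adding columns for x^2, xy, and y^2 (and so on) to the design matrix.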

The disadvantage is that this method is highly affected by extreme values and by an uneven distribution of observational data points. The problem is further complicated by the fact that some of the data points are more informative than others. For example, in interpolating topographic surfaces, data points taken from peaks, pits, passes, and pales are more significant than points taken from a slope or plain. Trend surface is a smoothing, approximate method; it rarely passes through the original data points.

It is implicit in multiple regression that the residuals from a regression line or surface are normally distributed, independent errors. The deviations from a trend surface, however, are always to some degree spatially dependent; in fact, one of the most fruitful uses of trend surface has been to reveal parts of a study area that show the greatest deviations from the general trend. Therefore, the main use of trend surface is not as an interpolator, but as a way of removing broad features of the data prior to using some other local interpolation method.

2. Inverse Distance Weighted

In this interpolation method, observational points are weighted during interpolation such that the influence of one point relative to another declines with its distance from the new point. Weighting is assigned to observational points through the use of a weighting power that controls how the weighting factors drop off as the distance from the new point increases. The greater the weighting power, the less effect points far from the new point have during interpolation. As the power increases, the value of the new point approaches the value of the nearest observational point.

The simplicity of the underlying principle, the speed of calculation, the ease of programming, and reasonable results for many types of data are some of the advantages associated with inverse distance weighted interpolation. The disadvantages are that the choice of weighting function may introduce ambiguity, especially when the characteristics of the underlying surface are not known; that the interpolation can easily be affected by an uneven distribution of observational data points, since an equal weight will be assigned to each of the data points even if it is in a cluster; and that maxima and minima in the interpolated surface can occur only at data points, since inverse distance weighted interpolation is a smoothing technique by definition.
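
The principle can be sketched in a few lines (Python with NumPy; a hypothetical illustration rather than the ArcInfo IDW command):

    import numpy as np

    def idw_estimate(px, py, x, y, z, power=2.0):
        """Inverse distance weighted estimate at the new point (px, py)."""
        d = np.hypot(x - px, y - py)
        if np.any(d == 0):               # new point coincides with a sample
            return z[d == 0][0]
        w = 1.0 / d**power               # influence declines with distance
        return np.sum(w * z) / np.sum(w)

    # Hypothetical observational points
    x = np.array([0.0, 1.0, 2.0, 1.0])
    y = np.array([0.0, 2.0, 0.0, 1.0])
    z = np.array([5.0, 7.0, 6.0, 6.5])

    # A larger power makes the estimate approach the nearest sample value
    print(idw_estimate(0.5, 0.5, x, y, z, power=2.0))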

3. Triangulation

Triangulation is the most flexible interpolation method offered by ArcInfo. It can generate interpolated surfaces from many different data sources, such as point data, lines, breaklines, and polygons (erase, replace, or clip). Because of this flexibility and the speed of interpolation, triangulation has become a popular interpolation method among ArcInfo users.

There are several triangulation methods available (Watson, 1992, p.130-136). Delaunay triangulation is the method adopted in ArcInfo because of its advantages: the triangles are as equi-angular as possible, the value for a new node is ensured to be as close as possible to a known observational point, and the triangulation is not affected by the order in which observational points are considered.

Compared with grid-based interpolation methods, triangulation offers several advantages. First, when surfaces have significant relief features, triangulation generates more accurate surfaces. This is because triangulation maintains breakline features such as stream channels, ridges, and shorelines, and relief features such as peaks and pits. Second, triangulation is an exact method. It is more accurate than the grid-based methods because the original data points are located exactly on the surface; the grid-based methods only occasionally honor the original data points. Finally, triangulation is also faster in terms of computing time, because a triangulation can efficiently represent the same surface as grid-based methods do but with fewer data points.

The main disadvantage is that the surfaces are not smooth and may have a jagged appearance. This is caused by discontinuous slopes at the triangle edges and data points. In addition, triangulation is generally not suitable for extrapolation beyond the domain of the observed data points.
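
For illustration, SciPy's LinearNDInterpolator builds a Delaunay triangulation of the data points and interpolates linearly within each triangle, which mirrors the basic TIN idea (a hypothetical sketch, not the ArcInfo CREATETIN implementation):

    import numpy as np
    from scipy.interpolate import LinearNDInterpolator

    # Hypothetical observational points and values
    points = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0],
                       [2.0, 2.0], [1.0, 1.0]])
    z = np.array([10.0, 12.0, 11.0, 13.0, 11.5])

    surface = LinearNDInterpolator(points, z)

    # Inside the triangulated domain the original points are honored exactly;
    # outside the domain the result is NaN, reflecting that triangulation
    # does not extrapolate beyond the observed data points
    print(surface(1.0, 1.0), surface(0.5, 0.5), surface(3.0, 3.0))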

4. Kriging

Kriging is a weighted moving-average method of interpolation. It is derived from regionalized variable theory, which assumes that the spatial variation of any geological, soil, or hydrological property, known as a 'regionalized variable', is statistically homogeneous throughout the surface; that is, the same pattern of variation can be observed at all locations on the surface. This spatial variation of the property can be expressed in terms of semivariograms. Kriging derives its weights from the semivariogram, and it minimizes the estimation variances, which can themselves be estimated. In ArcInfo, the estimation variance (called the variance grid) is automatically generated as a by-product of the interpolated surface.

The computation and interpretation of the semivariogram are the "heart" of the kriging method. The semivariogram measures the degree of spatial correlation among observational data points in a study area as a function of the distance and direction between them. It controls the way kriging weights are assigned to data points during interpolation, and consequently controls the quality of the results.

Computing the semivariogram is always the first step in using kriging. Since only a limited number of observational data points are available, it is impossible to compute the "true" semivariogram. In practice, semivariograms are estimated from observed data values. This is achieved through selecting the function which best fits the observational data points. The process involves interpretation and judgment, and often requires a large number of "trial and error" computations. ArcInfo offers five mathematical functions which can be used as possible candidates for the semivariogram of a given data set: spherical, circular, exponential, Gaussian, and linear.
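
The following minimal sketch (Python with NumPy; the bin count and all data are hypothetical) shows how an empirical semivariogram is computed by averaging half the squared differences of point pairs within distance bins, together with the spherical model as one candidate function:

    import numpy as np

    def empirical_semivariogram(x, y, z, n_bins=10):
        """Average half the squared pair differences within distance bins."""
        i, j = np.triu_indices(len(z), k=1)          # all point pairs
        d = np.hypot(x[i] - x[j], y[i] - y[j])       # pair separations
        sq = 0.5 * (z[i] - z[j]) ** 2                # semivariance terms
        edges = np.linspace(0.0, d.max(), n_bins + 1)
        bins = np.clip(np.digitize(d, edges) - 1, 0, n_bins - 1)
        gamma = np.array([sq[bins == b].mean() if np.any(bins == b) else np.nan
                          for b in range(n_bins)])
        centers = 0.5 * (edges[:-1] + edges[1:])
        return centers, gamma

    def spherical_model(h, nugget, sill, rng):
        """Spherical semivariogram model, one of the five ArcInfo candidates."""
        h = np.asarray(h, dtype=float)
        g = nugget + (sill - nugget) * (1.5 * h / rng - 0.5 * (h / rng) ** 3)
        return np.where(h < rng, g, sill)

The model parameters (nugget, sill, and range) are then adjusted, often by trial and error, until the model curve fits the empirical points acceptably.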

Kriging overcomes many shortcomings of the traditional interpolation methods. For instance, the kriging weights are determined by both the semivariogram and the configuration of the data set. Kriging is an optimal interpolator in the sense that the estimates are unbiased and have known minimum variances. Since the estimation variances can be determined and mapped like the estimates, and assuming a particular distribution, we can calculate the confidence we can place in the estimates. This makes kriging uniquely different from other interpolation methods. The estimation variance can also be used to determine where more information is needed if future sampling is planned.

Kriging weights depend not only on the distances between the observational points and the estimation location, but on the mutual distances among observational points as well. As a result, kriging has two interesting and unique properties: declustering and the screen effect. With the declustering property, several observational points close together will collectively carry roughly the weight of a single observational point located near the centroid of the cluster. With the screen effect, the influence of an observational point is reduced by the addition of one or more observational points at intermediate locations between the original observational point and the estimation location. As the screen effect makes the influence of distant observational points negligible, the use of a sampling subset in kriging is a safe practice compared with other weighting methods.
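
Given a fitted semivariogram, the kriging weights come from solving a linear system with a Lagrange multiplier that enforces unbiasedness. A minimal ordinary kriging sketch follows (Python with NumPy; the exponential semivariogram parameters and all data are hypothetical):

    import numpy as np

    def ordinary_kriging(px, py, x, y, z, gamma):
        """Ordinary kriging estimate and estimation variance at (px, py)."""
        n = len(z)
        # Semivariogram matrix among samples, bordered by the unbiasedness
        # constraint (the extra row/column holds the Lagrange multiplier)
        d = np.hypot(x[:, None] - x[None, :], y[:, None] - y[None, :])
        A = np.ones((n + 1, n + 1))
        A[:n, :n] = gamma(d)
        A[n, n] = 0.0
        # Semivariogram vector between samples and the estimation location
        b = np.ones(n + 1)
        b[:n] = gamma(np.hypot(x - px, y - py))
        w = np.linalg.solve(A, b)
        estimate = np.sum(w[:n] * z)
        variance = np.sum(w * b)      # kriging (estimation) variance
        return estimate, variance

    # Hypothetical data and a hypothetical exponential semivariogram
    x = np.array([0.0, 1.0, 2.0, 1.0])
    y = np.array([0.0, 2.0, 0.0, 1.0])
    z = np.array([5.0, 7.0, 6.0, 6.5])
    gamma = lambda h: 1.0 - np.exp(-h / 2.0)   # zero nugget, unit sill
    print(ordinary_kriging(0.5, 0.5, x, y, z, gamma))

The variance returned here corresponds to the estimation variance that ArcInfo writes out as the variance grid.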

One weakness of kriging is that the original data points are seldom honored. This is a common problem associated with grid-based interpolation methods. It may lead to the dilemma that contours generated from the surface appear on the wrong side of observed data points. The explanation is that kriging is essentially a smooth interpolation method, and it behaves like a filter away from the data points. Kriging neither intends nor is able to duplicate reality; its aim is to avoid blunders by following the more general spatial trends in the observational data points. Another problem associated with kriging is the estimation of the semivariogram. It is not always easy to ascertain whether a particular estimate of the semivariogram is in fact a true estimator of the spatial correlation in an area. The reasons for choosing a particular semivariogram to fit a given data set are often difficult to explain in terms of physical processes; they can only be rationalized in terms of a least-squares or maximum-likelihood fit to the data set. Finally, kriging is not a suitable method for data sets which have anomalous pits or spikes, or abrupt changes such as breaklines.

EVALUATION OF INTERPOLATED SURFACES

If a GIS user cannot decide which of the above interpolation methods should be used, a simple solution is to try several of them and compare the results through statistical methods. Both univariate and bivariate descriptive tools can be used for this purpose (Isaaks and Srivastava, 1989).

1. Univariate Distribution of Estimation

It is reasonable to expect that a good interpolation method will produce estimated values whose distribution is similar to the distribution of the true values. The differences between the estimated values and the true values are referred to as residuals. The mean of the residual distribution is often called the bias. A reasonable goal for any interpolation method is to produce unbiased estimates, which would mean that the mean, median, and mode of the residual distribution should be close to zero. Another desirable feature is a small spread of the residual distribution. The variance and the standard deviation are both good yardsticks for measuring this spread.
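
As a minimal sketch (Python with NumPy; the withheld true values and the interpolated estimates are hypothetical), these residual measures can be computed directly:

    import numpy as np

    # Hypothetical true values at withheld test points and the values
    # interpolated at the same locations
    true_vals = np.array([10.0, 12.5, 11.2, 13.8, 10.9])
    est_vals = np.array([10.4, 12.1, 11.5, 13.2, 11.0])

    residuals = est_vals - true_vals
    bias = residuals.mean()               # ideally close to zero
    spread = residuals.std(ddof=1)        # smaller is better
    print(bias, np.median(residuals), spread)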

2. Bivariate Distribution of Estimation

A scatterplot of true versus interpolated values provides a good measure of how well an interpolation method has performed. If all estimated values were the same as the true values, the scatterplot would be a 45-degree straight line. In reality, the closer the cloud of points is to this line, the better the estimation. In addition, the correlation coefficient can be used as a reliable index for measuring how close the interpolated values are to the true values.
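
Continuing the hypothetical example above, the correlation coefficient takes one line (a value near 1.0 indicates the cloud of points hugs the 45-degree line):

    import numpy as np

    true_vals = np.array([10.0, 12.5, 11.2, 13.8, 10.9])
    est_vals = np.array([10.4, 12.1, 11.5, 13.2, 11.0])

    r = np.corrcoef(true_vals, est_vals)[0, 1]
    print(r)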

CONCLUSION

The fact that no universal interpolation method exists is both good and bad news for GIS users. The bad news is that a GIS user cannot simply throw his or her data into the first available method and hope to get a quick and satisfying solution. The good news is that a GIS user may eventually get the surface which is most suitable for his or her data set. This is achieved through understanding the data set and selecting the most appropriate interpolation method based on comparative evaluation.

In practice, selection of a particular interpolation method should depend upon the configuration of the data set, the type of surface to be generated, and the tolerance of estimation errors. In generating surfaces, a three-step procedure is recommended. The first step is to evaluate the data set; this gives an idea of how the data are spatially distributed, and may provide hints on which interpolation method should be used. The second step is to adopt and apply the interpolation method which is most suitable to both the data set and the study objectives; when in doubt, a user should try several of the available methods. The third step is to compare the results using univariate and bivariate descriptive tools and to determine the most satisfying result and the most suitable method. This may look like a time-consuming process at the beginning. However, as a user gains more experience and acquires more knowledge of the different interpolation methods, the time required to generate the most suitable surface should be greatly reduced.

REFERENCES

Burrough, P.A. Principles of Geographical Information Systems for Land Resources Assessment. New York: Oxford University Press, 1986.

England, E. and Sparks, A. GEO-EAS: Geostatistical Environmental Assessment Software. Las Vegas, Nevada: Environmental Monitoring Systems Laboratory, U.S. Environmental Protection Agency, 1992.

Helsel, D.R. "Less Than Obvious: Statistical Treatment of Data Below the Detection Limit," Environmental Science and Technology. 24(1990):1766-1774.

Isaaks, E.H. and Srivastava, R.M. Applied Geostatistics. New York: Oxford University Press, 1989.

Lam, N.S. "Spatial Interpolation Methods: A Review", The American Cartographer. 10(1983):129-149.

Ripley, B. Spatial Statistics. New York: John Wiley and Sons, 1981.

Watson, D.F. Contouring: A Guide to the Analysis and Display of Spatial Data. New York: Pergamon Press, 1992.


Jun (John) Hu
Bechtel Environmental Inc.,
151 Lafayette Drive
Oak Ridge, Tennessee 37922
Telephone: (615) 220-2465