Jun (John) Hu
In environmental GIS applications, generating surfaces is a frequently imposed requirement in the early stage of analyses. Topographic surfaces, bedrock surfaces, groundwater tables, and contamination plumes are examples of surfaces which need to be generated from available data sources. The importance of generating these surfaces is that these surfaces are used as the basic information to perform further spatial analyses in environmental applications. Based on these surfaces, we can carry out additional analyses to answer questions such as what is the saturated thickness, or how the contamination is distributed in subsurface soil. Thus, the accuracy of subsequent analyses directly depends on the accuracy of the surfaces built in the early stage of analyses.
ArcInfo offers several methods for creating surfaces. These methods include trend surface (TREND), inverse distance weighted (IDW), triangulation (CREATETIN), and kriging (KRIGING). Each of these method has its own advantages and disadvantages in terms of data interpolation. No one method works universally as the best for all the data set. Selection of a particular method depends on the distribution of data points and the study objectives.
The purpose of this paper is to discuss these methods and to provide some suggestions on how these methods should be properly used to construct surfaces in environmental applications. The emphasis of this paper is on process and methodology of generating surfaces. It will also address related problems such as how the data set should be evaluated prior to surface generation.
Data for generating these surfaces are usually collected though field sampling and surveying. Because of high cost and limited resource, data collection can be conducted only in selected point locations with limited numbers. In order to generate a continuous surface of a property (i.e. groundwater table), some kind of interpolation method has to be used to estimate surface values at those locations where no samples or measurements were taken.
ArcInfo offers four interpolation methods: trend surface (TREND), inverse distance weighted (IDW), triangulation (CREATETIN), and kriging (KRIGING). The triangulation method generates surfaces represented by irregularly spaced points (TIN). The other three methods generate surfaces represented by equally spaced data points (GRID). In terms of interpolation, each of these method has its own advantages and disadvantages. No one method works universally as the best for any data set. The intent of this paper is to give a comparative evaluation of these interpolation methods and provide some insights on how these methods should used properly to generate surfaces.
Data evaluation consists of two parts: quality evaluation and statistical evaluation. Quality evaluation is to ensure that data are accurate and representative. For example, well data for generating groundwater table should be based on water level readings on same date and from the same type of wells (such as shallow monitoring wells). In order to accurately represent normal groundwater table, sampling date should be carefully chosen to reflect the normal weather conditions.
Chemical and radiological data should be verified and validated before they are used to generate concentration surfaces. If these data are reported from several laboratories, special check should be made to eliminate the differences among laboratories. It is possible that the same sample may have different concentration or activity values because different laboratories may use different instruments or analytical procedures. Another issue is related to the use of analyzed values that are below the detection limits. In most cases, detection limits are used as the substituted concentration or activity values for samples below detection limits. However, other statistically defined values can also used (Helsel, 1990).
Spatial attention should be given to the abnormal values appeared in the data set since these value will seriously distort the interpolation process. The best way to detect these value is though map or visual display of the data set. Abnormal values usually appear as "spatial outliner", i.e. an extreme low value are surrounded by high values or visa versa. Map display also provides additional insights about the data set: how samples are spatially distributed (uniform or cluster), and what is the general pattern and trend (high values and low values) associated with the data set.
Statistical evaluation is to understand statistical properties associated with the dada set. These properties may include distribution, location, spread, and shape of the data set. There are many tools available in the univariate description statistics which can be used to describe these properties (Isaaks and Srivastava, 1989; Geo-Eas, 1992). For example, the frequency table and corresponding historgram can be used to describe how often observed values fall within certain intervals or classes. Probability plot can be used to determine how close the distribution of the data set is to a Gaussian or normal distribution.
Summary statistics provide a valuable summary of information contained in the historgram. It includes three groups: measures of location, measures of spread, and measures of shape. Measures of location describe where various parts of the distribution lie. The center of distribution is described by the mean, the median, and the mode. The location of other part of the distribution are given by various quantiles. Measures of spread describe the variability. This group includes the variance, the stand deviation, and the interquarterile range. Measures of shape include coefficient of skewness and coefficient of variation. Coefficient of skewness provides information on the symmetry while coefficient of variation gives information on the length of the tail for certain types of distributions.
The following discussions focus on four interpolation methods which can be easily accessed by ArcInfo users: trend surface, inverse distance weighted, triangulation, and kriging. Among these four methods, trend surface acts as global interpolation method while other three method function as local interpolation methods. With global interpolation methods, all observational points within a study area are utilized to estimate the value at a new point. With local interpolation methods, only a subset of observational points located near the new point are used to estimate the value at this point.
The disadvantage is that this method is highly affected by the extreme values and uneven distribution of observational data points. The problem is further complicated by the fact that some of the data points are more informative than others. For example, in interpolating topographic surfaces, the data points taken from the peaks, pits, passes, and pales are more significant than the points taken from the slope or plain. Trend surface is a smoothing and approximate method, it rarely pass through original data points.
It is implicit in multiple regression that the residuals from a regression line or surface are normally distributed and independent errors. The deviations from a trend surface are always to some degree spatially dependent; in fact one of the most fruitful uses of trend surface has been to reveal parts of a study area that show the greatest deviations from the general trend. Therefore, the main use of trend surface is not as an interpolator, but as a way of removing broad features of the data prior to using some other local interpolation methods.
The simplicity of underlying principle, the speed in calculation, the ease of programming, and reasonable results for many type of data are some of the advantages associated with inverse distance weighted interpolation. The disadvantages are: choice of weighting function may introduce ambiguity, especially when the characteristics of underlying surface are not known; the interpolation can easily affected by uneven distribution of observational data points since an equal weight will be assigned to each of the data points even if it is in a cluster; maxima and minima in the interpolated surface can only occur at data points since inverse distance weighted interpolation is a smoothing technique by definition;
There are several triangulation methods available (Watson, 1992, p.130-136). Delaunay triangulation is the method adopted in ArcInfo because of these advantages: triangles are as equi-angular as possible, value for a new node is ensured to be as close as possible to a known observation point, and triangulation is not affected by the order of observational points to be considered.
Comparing with grid based interpolation methods, triangulation offers several advantages: First, when the surfaces have significant relief features, triangulation generates more accurate surfaces. This is because the triangulation maintains breakline features such as stream channel, ridges, and shorelines, and relief features such as peaks and pits; Secondly, triangulation is an exact method. It is more accurate than the grid based methods because original data points are located exactly on the surface. The grid based methods only occasionally honor the original data points; Finally, triangulation is also a faster method in terms of computing time. This is because triangulation can efficiently represent the same surface as grid based methods do but with fewer data points.
The main disadvantage is that the surfaces are not smooth and may give jagged appearance. This is caused by discontinuous slopes at the triangle edges and data points. In addition, triangulation is generally not suitable for extrapolation beyond the domain of observed data points.
The computation and interpretation of semivariogram is the "heart" of the kriging method. Semivariogram measures the degree of spatial correlation among observational data points in a study area as a function of the distance and direction between observational data points. It controls the way that kriging weights are assigned to data points during interpolation, and consequently controls the quality of the results.
Computing semivariogram is always the first step for using kriging interpolation methods. Since there are only limited observational data points available, it is impossible to compute the "true" semivariogram. In practice, semivariograms are often estimated from observed data values. This is achieved though selecting the function which fits observational data points the best. This process involves interpretation and judgment, and often requires a large number of "trial and error" computations. ArcInfo offers five mathematical functions which can be used as possible candidates for the semivariogram of a give data set: spherical, circular, exponential, Gaussian, and linear.
Kriging overcomes many shortcomings of the traditional interpolation methods. For instances, the kriging weights are determined by the semivariogram and the configuration of the data set. Kriging is an optimal interpolator in the sense that the estimates are unbiased and have known minimum variances. Since the estimation variances can be determined and mapped like the estimates, and assuming a particular distribution, we can calculate the confidence we can place in the estimates. This makes kriging uniquely different from other interpolation methods. The estimation variance can also be used to determine where more information is needed if future sampling is planned.
Kriging weights depend not only on the distances between observational points and estimation locations but on the mutual distances among observational points as well. As a result, kriging has two interesting and unique properties: declustering and screen effect. With the declustering property, several observational points close together will have collectively the weight of a single observational point located near the centroid of the cluster. With screen effect, the influence of an observational point will be reduced by addition of one or more observational points at the intermediate locations between the original observational point and the estimation location. As the screen effect makes the influence of distant observational points negligible, the use of sampling subset in kriging is a safe practice compared with other weighting methods.
One weakness of kriging is that the original data points are seldom honored. This is a common problem associated with grid based interpolation methods. It may lead to the dilemma that contours generated from the surface appear on the wrong side of observed data points. Explanation for this is that kriging is essentially a smooth interpolation method, and it behaves like a filter away from the data points. Kriging neither intends nor is able to duplicates reality. Its aim is avoid blunders by following more general spatial trends in the observational data points. Another problem associated with kriging is the estimation of semivariogram. It is not always easy to ascertain whether a particular estimate of the semivariogram is in fact a true estimator of the spatial correlation in an area. The reasons for choosing a particular semivariogram to fit the given data set are often difficult to explain in terms of physical processes. They can only be rationalized in terms of a least-squares on maximum likelihood fit to the data set. Finally, kriging is not a suitable method for data sets which have anomalous pits or spikes, or abrupt changes such as breaklines.
In practice, selection of a particular interpolation method should depend upon the configuration of the data set, the type of surfaces to be generated, and tolerance of estimation errors. In generating surfaces, a three step procedure is recommended: the first step is to evaluate the data set. This give ideas on how data are spatial distributed, and may provide hints on which interpolation method should be used. The second step is adopt and apply an interpolation method which is most suitable to both the data set and the study objectives. When in doubt, a user should try several methods that are available. The third step is compare the results using univariate and bivariate descriptive tools and determine the most stratifying result and the most suitable method. This may look like a timing consuming process at the beginning. However, as an user gains more experience and acquires more knowledge of different interpolation methods, the time required for generating the most suitable surface should be greatly reduced.
England, E. and Sparks, A. GEO-EAS: Geostatistical Environmental Assessment Software. Las Vegas, Neveda: Environmental Monitoring Systems Laboratory, U.S. Environmental Protection Agency, 1992.
Helsel, D.R. "Less Than Obvious: Statistical Treatment of Data Below the Detection Limit," Environmental Science and Technology. 24(1990):1766-1774
Isaaks, E.H. and Srivastava, R.M., Applied Geostatistics, New York: Oxford University Press, 1989
Lam, N.S. "Spatial Interpolation Methods: A Review", The American Cartographer. 10(1983):129-149.
Ripley, B. Spatial Statistics. New York: John Wiley And Sons, 1981
Waston, D.F. Contouring: A Guide To The Analysis And Display Of Spatial Data. New York: Pergaman Press, 1992.