Dynamic Segmentation and Thiessen Polygons: A Solution to the River Mile Problem

William W. Hargrove, Richard F. Winterfield, Daniel A. Levine

Abstract

Managers of many river and reservoir systems employ a specialized coordinate system for the identification of locations on the water. The location of a point in this coordinate system requires the specification of two numbers; a river mile, as measured along a curving center line, and a linear distance from either the left or right river bank. Typically, both of these coordinates are estimates that are made from boats in the water.

GIS can be used to improve the estimation of these river mile coordinates. River miles are ``defined'' by points on navigation charts along the center line. Because of the ``fractal coastline'' phenomenon, one cannot simply calculate the linear distance along the centerline, since the linear distance will keep increasing as the centerline vector is broken into more segments.

The calibrateroutes command, available as a part of Dynamic Segmentation, permits the establishment of a route system along the centerline, but will not allow the identification or location of points off the centerline yet still in the water.

We have developed a procedure involving the construction of Thiessen polygons around events along the centerline route which permits the construction of a coverage where polygons specify particular rivermile increments. The intersection of this polygon coverage with a point coverage of sample locations, for example, allows the accurate assignment of river mile locations to those samples. Variants of this technique may also be useful for interstate highway applications, where a similar type of coordinate system may be in use.


INTRODUCTION

Nature of the Problem

In large reservoir systems, an unusual non-orthogonal coordinate system is often used to describe approximate spatial locations on the water. Locations in this system are specified by 2 coordinates; a river mile distance, as measured along an arbitrary pre-defined river ``centerline'', and a linear distance from the left or right shoreline at this river mile distance. The specification of these 2 coordinates locates a point only approximately, since the curving centerline of a meandering river creates location ambiguity. If we construct ``rungs'' perpendicular to the curving centerline at each river mile distance, these rungs will get closer together on the inside edge of river bends, and will stretch farther apart along the outside edge of bends. Adjacent rungs may even cross each other on the inside edge of tight bends or meanders.

Despite the approximate nature of the river mile coordinate system, it enjoys general use in large reservoir systems. Often, river mile position locations are visually estimated by workers in boats, with reference to published charts or landmarks, so high precision in the coordinate system is not required.

Subtle variants of the river mile coordinate system are used in other disciplines as well, particularly highway systems. Pavement maintenance records, traffic accident location studies, and bus and train stop schedules often use highway mile markers for a distance estimation, and occasionally a second lateral distance coordinate to locate, say, a pothole, manhole cover, or telephone pole. Large historical data sets may exist in these applications which use a river mile-type coordinate system to specify locations, and it may be prohibitively costly to re-survey such features in a different coordinate system.

The river mile coordinate system presents problems, however, for automated location using modern GIS systems. The curved centerline is, at some scale, ultimately represented in the GIS as a series of straight vector segments. As the computer attempts to calculate the length of the curved centerline, the ``fractal coastline'' problem is encountered (Mandelbrot 1967). As the curved centerline is broken into more and more shorter and shorter straight line segments, the cumulative total length of those straight line segments increases.

To demonstrate this effect, interested readers can perform the following experiment: digitize a sweeping curve which contains 4-5 vertices, build the arc, and look at the total length in the .aat file. Now use the densify command to add vertices to the arc, rebuild, and again check the total length in the .aat file. The densified arc will have a greater total length than the undensified version, even though the vertices fall on the same centerline. This would seem to preclude implementation of a river mile coordinate system inside a GIS, since the density of vertices along the centerline affects the distance measured.

General Strategy for the Solution

Our goal in solving the river mile problem is to determine the river mile to which any given point is closest. This goal can be accomplished by developing a polygon coverage in which the river is divided into areas within which all points are closer to that tenths mile division than any other tenths mile. Although we have chosen to use tenths of river miles for this application, the technique that we describe could be used to produce a polygon coverage for any desired river mile measurement resolution. Once we have the tenths of river miles polygon coverage, we can intersect it with a point coverage of sample locations, for example, in order to assign river mile locations to each of those sample points.

The Dynamic Segmentation (DS) and routing capabilities of ArcInfo provide the advanced GIS tools required to solve the river mile problem. DS allows the development of route systems along the centerlines of all rivers, and the subsequent calibration of route measurements along those routes according to the known positions of river mile markers. Maps of the Clinch /Tennessee River/Watts Bar Reservoir system showing the arbitrary but pre-defined centerlines were available, along with the positions defining particular river miles. An example of such a map is available for viewing on the World Wide Web at URL: http://www.esd.ornl.gov/programs/CRERP/IMAGES/WWWBUOYS.GIF

The construction of perpendiculars to the curving centerline halfway between each river mile marker in an automated way with a GIS, and clipping those perpendiculars at their intersection with the shoreline also represents a technically difficult task. Rather than attempting to construct an elaborate looping AML to accomplish this task, we realized that a very similar function could be automatically performed by using ArcInfo's built-in Thiessen polygon generator.

Thiessen polygons (Thiessen and Alter 1911) are constructed by connecting a series of point locations with line segments, erecting perpendiculars to those line segments at their midpoints, and then extending those perpendiculars until they intersect. Finally, the original connecting line segments are dissolved, leaving irregularly-shaped polygons containing the original points (Okabe et al. 1992). Each polygon has the unique property that any location within the polygon is closer to that polygon's point than to the point of any other polygon (Gold 1991). The construction of Thiessen polygons around the river mile points automatically provides the perpendicular ``rungs'' at their halfway points.

METHODOLOGY

Establishment and Calibration of the Route System

Centerlines were digitized into a vector coverage (allcl) for all rivers in the reservoir system. Similarly, the positions and distance labels for all known river mile markers were digitized into a point coverage (allrm). We used additem to add a unique field (cl_rt-id) to allcl.aat. We used select many to select the arcs comprising each river in turn, and used calculate to assign a code value for cl_rt-id. Finally, we used the measureroute command to automatically establish a separate route system for each river in the system.

Because the calibrateroutes command effectively interpolates distance measurements between the known positions of mile markers, it is important to ensure that all segments of the route system have nonzero measurements. If more than one measurement point has a zero value, the interpolation algorithm will fail, producing an arcane error message. Thus, we employed the following series of commands to ensure that all segments of all routes had nonzero measures before invoking the final calibrateroutes command:

arc> measureroute allcl cl_rt cl_rt-id cl_rt-id # LL ARC
arc> calibrateroutes allcl cl_rt allrm cl_rt-id mile# # nosplit full
arc> calibrateroutes allcl cl_rt allrm cl_rt-id mile# # split full

The nosplit full options in the initial calibrateroutes command ensure that each entire route is given nonzero measures. Then, the split full option is used in the ultimate calibrateroutes, giving the ultimate segments of each route the proper measurement extrapolations.

This figure shows the correctly interpolated river route system. The reservoir shoreline is shown in blue, Each river route is shown in yellow, and the known river mile positions are indicated in green. Interpolated measures are shown using the routehatch command.

Generating Events and Constructing Thiessen Polygons

Next, we wrote a small UNIX shell script, genrivmil to generate event points at the desired tenths intervals on all river route systems as a comma-delimited ASCII file. In INFO, we constructed an INFO database with the appropriate fields, and used the ADD FROM command to populate this database (IN_COORDS.DAT) with the records from the ASCII file. In arcplot, we established an eventsource ``relate'' called `samples' with the command:

ap> eventsource add point samples in_coords.dat info ordered cl_rt-id cl_rt-id rivmil

This relate estabilishes each record as an event at the appropriate location on each river route. The `samples' relate will be active throughout the current arc session. Finally, we used the eventpoint command in arc to dump these events to a new point coverage (tenths):

arc> eventpoint allcl cl_rt samples tenths

Thiessen polygons were created around these points with the arc command:

arc> thiessen tenths tenthspoly

In this figure, the weblike structure of the Thiessen polygons is shown, along with the tenths of river miles points, in red, and the standard river mile markers in blue.

A New Problem: Harbors and other Diverticula

We used the shoreline of the reservoir system (lkwgs84) to clip the Thiessen polygons, but examination of this coverage indicated a new problem. ``Blind'' harbors which extend past adjacent river mile boundaries have their ``tips'' classified as the adjacent river mile, even though there is no direct water connection to the centerline of the river at that river mile. Such diverticula should be uniformly assigned the river mile associated with the mouth of these features, defined as the point of communication with the centerline. Moreover, there were occasional ``faults'' where Thiessen polygons from one river would intrude into the clipped shoreline of a nearby adjacent river system.

In order to solve this diverticulum problem, we devised a method, by way of a junk point coverage (tenthsbill), to select all Thiessen polygons through which the centerline passed, and then, by way of an nselect and a flag value assignment, to label all such problem areas for semi-automatic correction.

We used an intersect between the original point coverage and the clipped Thiessen polygon coverage to produce a new point coverage (tenthsbill) in which the points along the centerline had the same polygon IDs as the clipped Thiessen polygons:

arc> clip tenthspoly lkwgs84 tenthsclip poly
arc> intersect tenths tenthsclip tenthsbill point

Then, in arcplot, we used a keyfile reselect to select only the portion of each now possibly disjunct Thiessen polygon which contained the original centerline mile points. An nselect then selected all portions of clipped Thiessen polygons which did not touch the centerline. River mile values for these harbor sections which did not directly communicate with the centerline at that rivermile were calculated to a flag value (888).

ap> reselect tenthsclip polys keyfile tenthsbill.pat tenthsclip#
ap> nsel tenthsclip polys
ap> calculate tenthsclip polys rivmil=888

A semi-automated AML was devised to help perform cleanup on the problem polygons. The third line of the cleanup AML calls an arcplot macro, billap, which picks an appropriate shade set, reselects all remaining ``problem'' polygons, and shades them. The cleanup AML then enters a loop, first drawing all remaining ``problem'' polygons, allowing the user to select the ``mother'' polygon at the river mile where the mouth of this bay originates, and saves the river mile from this mother polygon to a variable. Then the cleanup AML allows the user to select a rubberband box around all of the polygons in the bay which are isolated from the centerline. The river mile attributes of all of the selected ``problem'' polygons are calculated to the correct stored value, and the user is offered a chance to quit. If not taken, the loop is entered again, and more diverticula can be corrected.

Another New Problem: Islands and River Miles

Because of the earlier clip operation, all islands in the reservoir system were missing. We made a copy of the shoreline and island coverage:

arc> copy ~user/clinch/newcov/lkwgs84 islands

and then, from within arcedit, reselected all polygons whose LPOLY or RPOLY were the universe polygon. This selected all polygons which ``touched'' the land surrounding the reservoir system. All selected polygons were deleted, leaving only polygons not touching the outer land: the islands. The coverage containing the islands was built as a polygon coverage:

arc> build islands poly

We used the erase command to erase the island polygons from the clipped and cleaned Thiessen river mile polygons. The erase command takes care of adding new label points to polygons which are split by islands.

arc> copy tenthsclip tenthsultra2
arc> erase tenthsultra2 islands tenthswo

However, a new problem emerged; Iron Hill Island falls at a point in Lower Watts Bar where the reservoir, in its meandering, folds back very sharply upon itself. This sharp ``hairpin'' created a common border between distant river mile Thiessen polygons. The island fell slightly to one side of this folded division between distant rivermiles, yet it seemed unreasonable that river mile values should be the same on both sides of the island.

Arcedit was used to clean up the river mile polygons by first selecting and deleting all polygon separators and labels on the eastern side of the island, then selecting all river mile ``rungs'' and, using the extend command, connecting them to the island polygon. Since the polygon topology was altered in this operation, the new polygon coverage was rebuilt:

arc> copy tenthswo tenthswofixed
... various arcedit commands including extend ...
arc> build tenthswofixed poly

Tools for Visual Error Checking

Two visual techniques were devised for easier examination of classification errors in the final river mile polygon coverage. The first involves the generation of an arcplot map composition:

ap> map errchk.cmp
ap> mape tenthswofixed
ap> arcs tenthswofixed
ap> linecolor green
ap> dropline tenthswofixed rivmil # NOTEXT

This map composition first draws all arcs in red, then uses the dropline command to overplot all lines separating polygons with different river mile values in green. Thus, bays and diverticula should contain all red segments, and no red segments should appear elsewhere. The use of a map composition tool also permits the user to pan and zoom around the map.

A final aid to visual inspection of the river mile coverage uses the arc dissolve command to physically remove arcs separating polygons with identical river mile attributes. Since different river systems may touch at the same river mile, we narrowed the dissolve criterion to an exact match on all attributes:

arc> dissolve tenthswofixed tenthsdslv #all poly

CONCLUSIONS

The capability to interpolate route measurements from known standards is the key to solving the river mile problem. Without this capability, the fractal nature of the centerline makes it impossible to accurately measure river miles down its length. This capability is uniquely provided by the Dynamic Segmentation portion of ArcInfo.

The solution to the river mile problem is only a small part of our GIS work on the Clinch River/Watts Bar reservoir system. We have used GRASS to produce a number of animated movie sequences in which the viewer ``flies'' down toward the river, splashes through the transparent water surface, and then travels underwater above the riverbed. Black river mile buoys, floating at the water surface, were generated by the solution to the river mile problem, and give the underwater traveller a point of reference for location in the reservoir. This and other animations are publicly available on the World Wide Web at URL:http://www.esd.ornl.gov/programs/CRERP/SEDIMENT/DATA.HTM

We hope to extend this solution to the river mile problem by devising an AML which will, given the river mile and a distance from the left or right bank, automatically locate the point on a map of the reservoir system. We would appreciate any suggestions or ideas about this AML or other parts of this work which other ArcInfo users may have.

ACKNOWLEDGEMENTS

Thanks to Mary Alice Wood, who provided moral support throughout the sometimes frustrating path to this solution, and who obtained the DLG river boundary file used in the final product. Mike Weir, Esri instructor, made helpful initial suggestions about how to attack the problem. Forrest Hoffman provided assistance with html markup of this document.

Research sponsored by Office of Environmental Restoration and Waste Management, U.S. Department of Energy under contract DE-AC05-84OR21400 with Martin Marietta Enargy Systems, Inc.
"The submitted manuscript has been authored by a contractor of the U.S. Government under contract No. DE-AC05-84OR21400. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish of reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes."

REFERENCES

Gold, C.M. 1991. Problems with handling spatial data--the Voronoi approach. C.I.S.M. Journal 45:65-80.

Mandelbrot, B.B. 1967. How long is the coast of Britain? Statistical self-similarity and fractal dimension. Science 155, 636-638.

Okabe, A., B. Boots, and K. Sugihara. 1992. Spatial Tesselations: Concepts and Applications of Voronoi Diagrams. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons. Chichester.

Thiessen, A.H., and J.C. Alter. 1911. Climatological Data for July, 1911: District No. 10, Great Basin. Monthly Weather Review July 1911:1082-1089.


William W. Hargrove
University of Tennessee
Oak Ridge National Laboratory
P.O. Box 2008, M.S. 6038
Oak Ridge, TN 37831-6038
(615) 574-1902
(615) 576-8646 fax
hnw@mtqgrass.esd.ornl.gov
Daniel A. Levine
Automated Sciences Group, Inc.
Oak Ridge National Laboratory
P.O. Box 2008, M.S. 6038
Oak Ridge, TN 37831-6038
oki@ornl.gov
Richard F. Winterfield
University of Tennessee
Oak Ridge National Laboratory
P.O. Box 2008, M.S. 6407
Oak Ridge, TN 37831-6407
wtf@ornl.gov