Anita Russo, Erik Shepard

A Methodology Analysis for Creating Polygons from Line Vector Data

Defining Issue: The boundaries that define an area of interest do not necessarily follow the boundaries that define the areas for which data are collected. One method of extracting data that pertain to the area of interest is to use a custom polygon based on a drive distance from a start point on a road network.

GIS Solution: Information Technology Outreach Services at the University of Georgia is developing a Geographic Information System (GIS), known as the Georgia 100, that, among other things, provides the tools to create useful polygon study areas from line features representing a drive path.

Methodology: The input to the process is a highway line coverage generated from TIGER data. NETWORK functions are used to find the end points of the drive path within the highway network. A sequence of BUFFER functions is used in the final stage of the process to interpolate the position of the points in between the known end points of the drive that define the polygon.

Software: The Georgia 100 GIS is written in Arc Macro Language and employs ArcInfo CLIP, NETWORK, and BUFFER functions to create a polygon coverage that represents the final study area. The purpose of this paper is to explore the methodology employed in creating a drive distance study area polygon with special consideration given to: 1. alternative factors that may contribute to the shape of the polygon and 2. limitations of the system.


INTRODUCTION

The Georgia 100 geographic information system (GIS) is designed to address economic development concerns. Within the application, study area polygons give users a basis for mapping areas of interest and defining areas from which to report labor, expenditures, demographics, manufacturing, business, housing, income and other data types. These maps and reports may serve many purposes, including: aiding in the economic growth planning processes of both private and public enterprises, providing information that helps industry comply with regulatory requirements, or supplementing grant application information. Enabling study area customization, without the user having to know how the software performs the customization, makes the application more valuable in serving a user's needs.

There are several types of custom study areas that a user can make with the application. If a user needs to sample data randomly, study areas created arbitrarily may be useful. The Georgia 100 Drive Distance Option, which is polygon creation with the use of a road network's line vector data, enables the user to make less arbitrary study areas which are more suitable to their chosen area of study. The Drive Distance Option requires more information from the user, more information from the database, and more processing from the system. The implementation of this functionality is a result of more qualitative decision making.

This paper describes this methodology and explores the qualitative decisions that affect the output polygon. The methodology is explored within the context of the Georgia 100 GIS.


DRIVE DISTANCE STUDY AREA: A TECHNICAL DEFINITION

A drive distance study area is an ArcInfo polygon coverage. The polygon is defined by end points of a road network, based on a distance from a specified start point on the road network, and interpolated points in between the known end points. These interpolated points represent areas for which there are no road data or which are not on the drive path. The road network is a line coverage of vector data representing interstate, U.S. and state highways with arc and node topology. As such, it contains the length attribute for each arc, and the x,y coordinates for each node, that are necessary for performing the operations that will create the final study area polygon.

DRIVE DISTANCE STUDY AREA: THE CREATION PROCESS

The drive distance study area is essentially created in three stages: CLIP, NETWORK, and BUFFER. These stages are capitalized because they are also the primary ArcInfo commands that produce the desired output for the stage. The inputs to this process are the highway coverage and the following user input:

1. The starting point for the drive.
    The starting point is the closest node in the highway coverage to the user's selected location.
    The selected node is shown on screen so that the user is aware of the actual starting point.

2. The distance in miles to be travelled along the network.

3. The study area name.
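
The snapping of the user's selected location to the closest highway node (input 1 above) can be sketched in a few lines. This is only an illustration in Python, not the application's AML code; the `nodes` dictionary layout and the function name `nearest_node` are assumptions made for the example.

```python
import math

def nearest_node(nodes, click_xy):
    """Return the id of the node closest to the clicked location.

    nodes: dict mapping a node id to its (x, y) coordinates
           (a hypothetical stand-in for the node attribute table).
    click_xy: (x, y) of the user's selected location, in the same units.
    """
    cx, cy = click_xy
    return min(
        nodes,
        key=lambda nid: math.hypot(nodes[nid][0] - cx, nodes[nid][1] - cy),
    )

# Three hypothetical highway nodes and one user click.
nodes = {1: (0.0, 0.0), 2: (10.0, 0.0), 3: (10.0, 8.0)}
start = nearest_node(nodes, (9.0, 6.5))
print(start)  # node 3 is the closest to the click
```

Showing the snapped node back to the user, as the application does, matters because the drive is measured from that node rather than from the clicked location.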

The outputs of these stages are coverages, which are illustrated in the Study Area Creation Process Stages figure below.

THE CLIP STAGE

Once the user provides the necessary input, the creation process begins. In the first stage, a temporary circle coverage is created and used to CLIP the master highway coverage, which contains all arcs within the state. The object of this stage is to extract a subset of the arcs needed, based on the user criteria.

The center of the circle coincides with the start point of the drive so that only the highway arcs contained within the longest possible distance from the start point are output from the CLIP function. The circle coverage is CLEANed to establish the polygon topology needed for the CLIP to work. This topology represents each arc's geographic relationship to other arcs in the coverage, and each arc's relationship to all arcs in the coverage. Without this relationship established, the CLIP polygon, in this case the circle used to define the extent of the output coverage, has no reference to the input coverage.

Normally, coverages need only to be built with the ArcInfo BUILD command to establish topology. The BUILD takes less time than a CLEAN since there is no correction of topological errors. However, the BUILD process fails with subsequent temporary circle coverage creation because the process detects a topological error. This occurs even though the previous temporary circle coverage and all its components are eliminated. Since the circle is a single polygon, the additional time it takes to CLEAN the coverage instead of BUILD it is negligible.

The CLIP finds the arcs of the highway coverage contained within the circle, creates a new coverage with the vector data associated with the proper arcs and reestablishes arc topology. The circle coverage is eliminated as soon as the clip is completed. The resulting subset coverage of the master highway coverage is placed in a temporary workspace.
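
The geometric idea behind this stage can be illustrated with a small Python sketch that clips one straight arc segment to a circle centered on the start point. It is a simplified stand-in, not the ArcInfo CLIP implementation: real arcs have many vertices, the clip circle is itself a polygon coverage, and the function name and data layout here are assumptions.

```python
import math

def clip_segment_to_circle(p0, p1, center, radius):
    """Return the part of segment p0-p1 inside the circle, or None if none is."""
    (x0, y0), (x1, y1), (cx, cy) = p0, p1, center
    dx, dy = x1 - x0, y1 - y0
    fx, fy = x0 - cx, y0 - cy
    # Points on the segment are p0 + t * (p1 - p0) for t in [0, 1]; the ones
    # inside the circle satisfy a*t^2 + b*t + c <= 0.
    a = dx * dx + dy * dy
    b = 2.0 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - radius * radius
    if a == 0.0:                      # degenerate segment (a single point)
        return (p0, p1) if c <= 0.0 else None
    disc = b * b - 4.0 * a * c
    if disc < 0.0:                    # the infinite line misses the circle
        return None
    root = math.sqrt(disc)
    t_in = max(0.0, (-b - root) / (2.0 * a))
    t_out = min(1.0, (-b + root) / (2.0 * a))
    if t_in >= t_out:                 # the inside interval lies off the segment
        return None
    return ((x0 + t_in * dx, y0 + t_in * dy),
            (x0 + t_out * dx, y0 + t_out * dy))

# A 30 mile arc leaving the start point, clipped by a 20 mile circle.
print(clip_segment_to_circle((0, 0), (30, 0), (0, 0), 20))
# An arc that lies wholly outside the circle is dropped.
print(clip_segment_to_circle((25, 0), (30, 0), (0, 0), 20))
```

Because no drive of the given length can leave a circle of that radius, nothing needed by the later NETWORK stage is lost by the clip.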

This stage is done for the sake of efficiency and stability. Processing the subset of arcs cuts down on processing time and memory. NETWORK functions performed in the next step of the process require that the highway coverage be in the user write access state. Moving the arcs to another coverage in another location eliminates possible conflict of two processes using the same coverage. The Georgia 100 is a dynamic application, and in this manner, the system can create a study area in the background while a user safely initiates a foreground process using the highway coverage.

THE NETWORK STAGE

The input of this stage is the subset highway coverage generated from the CLIP. The extent of this coverage is potentially greater than what is required by the user specified drive distance. The purpose of this stage, therefore, is to create another highway coverage from the subset highway coverage, whose end points actually represent the end points of the drive distance. The input and output of this stage are represented in the following figure:

The ArcInfo NETWORK module provides the functionality to produce the output coverage for this stage. This is done by defining a route-system from the arc highway coverage with the ArcInfo NETCOVER and ALLOCATE commands and then creating a new coverage from the route-system. A route-system, according to ArcInfo version 7.0.4 on-line documentation, is "A collection of routes representing separate instances of a common linear entity, for example, all school bus routes in a city." The same documentation defines a route as, "A feature class in ArcInfo that is part of the route-system data model used to represent linear features. Routes are based on an arc coverage and are defined as an ordered set of sections. Because sections represent the portion of an arc used in a route, routes do not have to begin or end at nodes."

Route-systems are an efficient means of organizing data. Several route-systems can be associated with the same coverage, without duplicating the vector data. Route-systems are valuable in this example because the route definitions are independent of vector definitions; they are not confined to nodes as end points of arcs. The route is defined precisely by the length of the user specified drive distance, without compromising the integrity of the vector data in the arc coverage.

The NETCOVER command establishes a specific route-system to arc coverage relationship and the ALLOCATE command is used to define the route-system by assigning arcs to the start point based on the demand length of the network. ALLOCATE uses the three notions of supply, demand and impedance to write the route-system. In the case of the drive distance, the supply and impedance are the same: the user specified distance. The demand is the length of the arcs along the network. If the user wishes to see a 20 mile drive, then the ALLOCATE algorithm finds that the center, or start point, only has 20 miles of length to assign to the route along each arc in the path. All arcs within 20 miles of the start point are assigned to the route-system and arcs that extend beyond the 20 miles are severed at the 20th mile.
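
The effect described above, whole arcs assigned within the distance and partial sections where an arc is severed, can be sketched with a shortest-path search. The Python below is an assumption-laden illustration using Dijkstra's algorithm; it does not claim to reproduce the internals of ALLOCATE, and the arc list layout, node names and the function name `allocate` are hypothetical.

```python
import heapq

def allocate(arcs, center, budget):
    """Assign arcs, or parts of arcs, lying within `budget` miles of `center`.

    arcs: list of (from_node, to_node, length_miles) tuples.
    Returns {arc index: [(from_measure, to_measure), ...]}, where measures
    run from 0 at the arc's from_node to its length at the to_node.
    """
    adjacency = {}
    for u, v, length in arcs:
        adjacency.setdefault(u, []).append((v, length))
        adjacency.setdefault(v, []).append((u, length))

    # Dijkstra: shortest network distance from the center to each node
    # that can be reached without exceeding the budget.
    dist = {center: 0.0}
    heap = [(0.0, center)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue
        for nxt, length in adjacency.get(node, []):
            nd = d + length
            if nd <= budget and nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))

    sections = {}
    for i, (u, v, length) in enumerate(arcs):
        # Miles of budget still unspent on arrival at each end of the arc.
        left_u = budget - dist[u] if u in dist else 0.0
        left_v = budget - dist[v] if v in dist else 0.0
        if left_u + left_v >= length:
            sections[i] = [(0.0, length)]          # the whole arc is covered
        else:
            parts = []
            if left_u > 0.0:
                parts.append((0.0, left_u))         # severed part from u
            if left_v > 0.0:
                parts.append((length - left_v, length))  # severed part from v
            if parts:
                sections[i] = parts
    return sections

# Hypothetical network: a 20 mile drive from node "A".
arcs = [("A", "B", 12.0), ("B", "C", 15.0), ("B", "D", 5.0), ("C", "E", 10.0)]
print(allocate(arcs, "A", 20.0))
```

In this example arc B-C is severed 8 miles past B, which is the 20th mile of the drive, and arc C-E is left out entirely. The partial sections correspond to the route-system property that routes do not have to begin or end at nodes.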

Within the Georgia 100 application, the ALLOCATE process actually occurs twice in the NETWORK stage. In the first process, the impedance length (or supply assigned to the center) is the actual distance the user specifies. The resulting route-system created by the ALLOCATE function is displayed to the user with the ArcInfo ROUTELINES command.

This route-system is created for display purposes only. If the user accepts the result, then the second ALLOCATE is performed. In the second process, the impedance length is the user specified distance minus one mile. The resulting study area still reflects the total distance specified by the user; the BUFFER stage compensates for the lost mile.

The other major difference between the two ALLOCATE sessions is the use of a centers file to designate the start point of the route-system and the impedance item. The first ALLOCATE uses the start point coordinates provided by the &PUSHPOINT directive to find the center node in the node attribute table and uses the amount parameter. The second ALLOCATE uses a centers file to provide this information as input to the algorithm. The primary reason for the use of the centers file is that the &PUSHPOINT function requires the current shell to be using a graphic device. Since this is a background process and there is no need for a graphic display, the use of a centers file is more appropriate.

THE BUFFER STAGE

This is the final stage. The input of the BUFFER stage is the subset highway coverage whose arcs represent the actual length of the drive distance specified by the user from the center. The output coverage is the Drive Distance Study Area, a single polygon whose extent represents the endpoints or extent of the input subset highway coverage, and interpolated points in between. BUFFER is the primary ArcInfo command that accomplishes the creation of a single polygon coverage from the input arc coverage in this stage. The results of the steps performed in this stage are shown as items in the following figure:

Two BUFFER commands are issued in this stage. The first BUFFER is performed on the subset highway coverage with the polygon option. This produces the billowy shaped polygon shown as item 3 in the above BUFFER stage figure. The second BUFFER is performed with the line option, on the polygon created from the first BUFFER. The result is shown as item 4 in the above BUFFER stage figure. The line option enables the polygon to be buffered in both directions.
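
The combined effect of the two BUFFER commands, growing the roads outward and then pulling the boundary back inward, can be imitated on a coarse grid. The Python sketch below is a raster analogy under stated assumptions, not the vector BUFFER algorithm: roads are sets of grid cells, the radii of 3 and 2 cells are arbitrary illustrative values, and `buffer_out` and `buffer_in` are hypothetical helper names.

```python
def disc(radius):
    """Offsets of all grid cells within `radius` cells of the origin."""
    r = int(radius)
    return [(dx, dy)
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)
            if dx * dx + dy * dy <= radius * radius]

def buffer_out(cells, radius):
    """Grow a set of cells outward (the role of the first, polygon BUFFER)."""
    offsets = disc(radius)
    return {(x + dx, y + dy) for (x, y) in cells for (dx, dy) in offsets}

def buffer_in(cells, radius):
    """Keep cells whose whole neighborhood is inside the set, which plays the
    role of retaining the inner polygon of the second, line BUFFER."""
    offsets = disc(radius)
    return {(x, y) for (x, y) in cells
            if all((x + dx, y + dy) in cells for (dx, dy) in offsets)}

# Two hypothetical roads leaving a start cell at (0, 0): one east, one north.
roads = {(x, 0) for x in range(0, 11)} | {(0, y) for y in range(0, 11)}
grown = buffer_out(roads, 3)
study_area = buffer_in(grown, 2)

print((2, 2) in study_area)   # True: the corner between the roads is filled in
print((1, 5) in study_area)   # True: one cell beside a straight road is kept
print((2, 5) in study_area)   # False: two cells beside the road are not
print((8, 8) in study_area)   # False: far from both roads
```

Along a straight stretch of road the result extends one cell beyond the road, the difference between the two radii, while the area near the junction of the two roads is filled in. This is the interpolation between known end points that the stage is meant to provide.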

Theoretically, the first BUFFER should work with the subset highway coverage created in the NETWORK stage. However, when the input coverage contains a dense network of roads, the BUFFER bails out with a full point table error (see end note 1). Simplifying the input coverage works around this error.

Since only the end points of the outermost arcs are needed for the BUFFER, the inner arcs are eliminated. This is accomplished by BUILDing the arc coverage as a polygon coverage, selecting the arcs of the outer polygons and creating a new arc coverage from the selected set of arcs. This simplification is illustrated as item 2 in the previous BUFFER stage figure. The resulting coverage is then further simplified with the GENERALIZE command, which eliminates vertices along arcs based on the specified weed tolerance.
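
Vertex elimination by a weed tolerance can be illustrated with the Douglas-Peucker line simplification algorithm. That GENERALIZE behaves exactly like this sketch is an assumption; the Python below simply shows one well-known way to drop vertices that lie within a tolerance of the simplified line, and the function name `generalize` is chosen only for readability.

```python
import math

def _perp_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)

def generalize(points, weed_tolerance):
    """Douglas-Peucker simplification of an arc given as a list of (x, y)."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, farthest = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _perp_distance(points[i], first, last)
        if d > farthest:
            index, farthest = i, d
    if farthest <= weed_tolerance:
        # Every intermediate vertex is within tolerance: keep the end points.
        return [first, last]
    # Otherwise keep the farthest vertex and simplify each side separately.
    left = generalize(points[:index + 1], weed_tolerance)
    right = generalize(points[index:], weed_tolerance)
    return left[:-1] + right

# A hypothetical arc with one nearly collinear vertex at (1, 0.05).
arc = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
print(generalize(arc, 0.1))  # [(0, 0), (2, 0), (3, 2), (4, 0)]
```

Fewer vertices on the outer arcs mean fewer points for the BUFFER to process, which is the purpose of the simplification in this stage.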

Before the second BUFFER is performed, the intermediary polygon is checked for slivers and inner polygons. This is done by selecting the outermost arcs, creating a new cover from the selected set and building the new coverage with line topology. The occurrence of slivers and inner polygons is very rare, especially given that the input coverage is a GENERALIZEd coverage, but their presence would propagate through later stages.

The buffer distance parameter of the first BUFFER operation directly affects the shape of the resulting polygon and was chosen for specific reasons. This buffer distance is 9 miles, which tends to create a final study area polygon that is billowy in shape. The areas that are billowed inward in the final study area polygon represent an area of inference, or void of data in between arcs in which other road networks of higher resolution, such as streets, are not accounted for in the model but could be physically chosen for the drive. Choosing an initial buffer distance that is too small results in a final study area polygon that billows inward drastically in between the known end points of the drive. The resulting polygon in this scenario limits the areas of inference and therefore, the possibility of higher resolution roads in the area. Inversely, if the buffer distance is too great, then the shape of the final study area polygon tends to blur the distinction between the areas of inference and the known end points of the drive. The areas of inference in between the known end points may be too extensive in this resulting polygon. The final study area polygons created with these two scenarios are shown in the following illustration. The green shaded polygon represents a study area created with a two mile buffer distance. The blue polygon represents a study area created with a 20 mile buffer distance. The red lines represent the highway road network used for the drive model. This highway network exists in a metropolitan area where the street network is extremely dense.


Since the first BUFFER extends away from the road network 9 miles, the second buffer, performed on the polygon created from the first buffer, must extend 8 miles. The resulting inner polygon of this coverage correctly represents the extent of the user specified drive distance from the network start point. The extra mile compensates for the missing mile in the network created from the second ALLOCATE. The outer polygon in this coverage is a by-product of the second buffer, and is reselected from the polygon coverage and discarded. The inner polygon is retained as the final study area. This method creates a study area that is less angular in shape than if the route-system extended out the entire distance and the second BUFFER did not compensate for the lost mile. The following figure illustrates this point, although the difference in the results is very slight. In fact, the difference may not be enough to warrant the overhead it takes to define the extra route-system in the NETWORK stage.
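
The bookkeeping of the distances used across the NETWORK and BUFFER stages can be written out explicitly. The short Python sketch below only restates the arithmetic given in the text (a 9 mile first buffer, an 8 mile second buffer, and a route shortened by one mile); the function and key names are hypothetical.

```python
def stage_distances(drive_miles, first_buffer_miles=9.0, compensation_miles=1.0):
    """Distances used by each stage for a requested drive distance."""
    allocate_miles = drive_miles - compensation_miles        # second ALLOCATE
    second_buffer_miles = first_buffer_miles - compensation_miles
    net_offset_miles = first_buffer_miles - second_buffer_miles
    return {
        "allocate_miles": allocate_miles,
        "first_buffer_miles": first_buffer_miles,
        "second_buffer_miles": second_buffer_miles,
        "net_offset_miles": net_offset_miles,
        # Reach at the road end points: route length plus the net offset.
        "reach_at_end_points_miles": allocate_miles + net_offset_miles,
    }

print(stage_distances(20.0))
```

For a 20 mile request the route-system extends 19 miles, the buffers leave a net 1 mile outward offset, and the polygon therefore reaches the full 20 miles at the ends of the drive.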


The Drive Distance Study Area creation process is now complete at the culmination of this stage. The polygon may now be used in the application for highlighting particular areas of interest on a map, or for defining a new geographic entity on which to report the many types of economic data that the application offers. Use of the study area in mapping is a straightforward process. For use with reporting, a user has many more options on how to report data with the study area as a geographic unit. For instance, the user may choose to extract data by: the data collection units that fall entirely within the study area boundaries, the data collection units that are touched by the study area, or to use the study area as a "cookie cutter" to clip the data collection units at the study area boundary line, whereby the units severed are assigned relevant factors to appropriately apportion the data, which the user wishes to report. All options are valid, and all options give a user added value with the creation of their own custom Drive Distance Study Area.
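
The three reporting options can be compared with a small Python sketch. To stay self-contained it uses axis-aligned rectangles in place of real polygons, and it assumes that the "relevant factors" for severed units are simple area proportions; the text does not specify the factors, so that choice, like the function and mode names, is only illustrative.

```python
def area(rect):
    """Area of a rectangle given as (xmin, ymin, xmax, ymax)."""
    return (rect[2] - rect[0]) * (rect[3] - rect[1])

def overlap_area(a, b):
    """Area shared by two rectangles, or 0.0 if their interiors do not overlap."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0

def contains(outer, inner):
    """True if rectangle `inner` lies entirely within rectangle `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def report(units, study_area, mode):
    """Total a data value over collection units using one extraction option.

    units: list of (rectangle, value) pairs for the data collection units.
    mode: "within", "touched", or "apportioned" (the cookie cutter option).
    """
    total = 0.0
    for rect, value in units:
        shared = overlap_area(rect, study_area)
        if mode == "within":
            factor = 1.0 if contains(study_area, rect) else 0.0
        elif mode == "touched":
            factor = 1.0 if shared > 0 else 0.0
        elif mode == "apportioned":
            factor = shared / area(rect)   # assumed area-based factor
        else:
            raise ValueError("unknown mode: " + mode)
        total += value * factor
    return total

# A hypothetical study area and three collection units with a count each.
study_area = (0, 0, 10, 10)
units = [((0, 0, 5, 5), 100), ((5, 0, 15, 5), 200), ((20, 20, 30, 30), 50)]
for mode in ("within", "touched", "apportioned"):
    print(mode, report(units, study_area, mode))
```

In this example the second unit straddles the study area boundary, so the three options report 100, 300 and 200 respectively. The choice of option can therefore change a reported figure substantially.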

CONCLUSION

The Drive Distance Study Area creation functionality makes the Georgia 100 application a true GIS. The ArcInfo functions used to create the user defined data set are hidden within an intuitive interface so that the user does not need to depend on their ArcInfo knowledge to find, for example, how many poultry processing plants are within a 25 mile drive of their farm, or to determine the demand potential of their product for the area within a 10 mile drive of their store. They are sufficiently informed to make judgements of the validity of their products but not bogged down with extraneous details of database management and command order or syntax.

The functionality is interesting in that there is a degree of qualitative judgement on the part of the developer in its implementation that directly affects maps, and reports in particular, created with the benefit of these study areas. These decisions must balance the priorities of application performance in speed and memory management, and the shape of the final polygon. Of course, a Drive Distance Study Area created with the benefit of a comprehensive, accurate transportation coverage including streets, county roads, state highways, U.S. highways and interstates would be the most desirable input to the process. However, processing time increases exponentially with such a dense road network to model. A decision such as lowering the buffer distance in the final stage would cut processing time. However, this would produce a polygon that eliminates many areas that could be included in the drive, had the network been modelled on a complete transportation coverage, from within its extent.

ACKNOWLEDGEMENT

The Georgia 100 was developed by the University of Georgia, Office of Information Technology Outreach Services located at Chicopee Complex, 1180 E. Broad Street, Athens, GA 30602-5418.

END NOTES

1.   The error message reported to the standard output is thus:

              Software limitation, point table is full (CREPNT). Bailing out of OVRSEG.

      An investigation of the ArcInfo technical notes found on the Esri web site reveals that the problem could be a result of coverage data limitations. ArcInfo Technical Note No: 1209, dated November 15, 1995 states:

      "This limitation results from a maximum of 8,000 entries in the active segment table. It could happen, for instance, when generating a buffer with a fuzzy tolerance that is relatively large compared to the buffer distance."

      Since a user can choose to create a Drive Distance Study Area in different areas with varying degrees of road density, using a fixed low fuzzy tolerance would theoretically solve the problem, but finding the right fuzzy tolerance would take a lot of time in experimentation. This would also not be optimum in many cases, where a dense configuration of arcs in the input coverage could be altered significantly. Another possible solution, finding the right fuzzy tolerance for each instance of study area creation, is impractical and would result in a lot of overhead in the application. Simplifying the input coverage works around the BUFFER bailout.


REFERENCES

Environmental Systems Research Institute Inc., (Esri). ArcInfo version 7.0.4 on-line documentation: Environmental Systems Research Institute, Inc., Redlands, CA.

Environmental Systems Research Institute Inc., (Esri). Technical Notes. Internet home page URL:
      http://www.Esri.com. 

AUTHOR INFORMATION

Anita Russo, Program Specialist
University of Georgia, Information Technology Outreach Services,
Chicopee Complex, 1180 E. Broad Street, Athens, GA 30602-5418
Phone: (706)542-5323
E-mail: russo@itos.uga.edu

Erik Shepard, Program Specialist
University of Georgia, Information Technology Outreach Services
Chicopee Complex, 1180 E. Broad Street, Athens, GA 30602-5418
Phone: (706)542-5323
E-mail: shepard@itos.uga.edu
WWW: http://www.itos.uga.edu/~shepard