Alnoor Ladak, and Roberto B. Martinez

Automated Derivation of High Accuracy Road Centrelines Thiessen Polygons Technique

The Roads Centreline coverage for the State of Qatar was created a few years ago in response to the needs of the growing GIS user community. Over the years, with the absence of formal maintenance procedures, it had become increasingly difficult to maintain this dataset using traditional methods which were resource intensive, time consuming and expensive. Last year this problem was thoroughly investigated by CED Roads Division, GIS Unit as the availability of a highly accurate, updated roads centreline coverage was becoming imperative. As resource restrictions did not allow a complete site survey of the extensive road network, an economical yet effective solution was required.

The availability of a high quality urban dataset from The Centre for GIS representing man-made structures formed the basis of the project. The challenge was to be able to identify polygons representing roads and then derive road centrelines from these polygons that precisely traversed the centre of the road throughout the network.

This paper talks about the outcome of this project, namely: possibilities investigated to develop an automated process requiring minimal user intervention the Thiessen Polygons-based solution that was consequently adopted post-processing issues such as treatment of intersections, round-abouts, accessways, parking bays, etc. quality control and derived data verification procedures lessons learnt in developing automated data- derivation techniques

The program - which is now in production - was developed using AMLs working in ARC, ARCEDIT and ARCPLOT with ArcInfo 7.0.3.



Introduction



The coordinated national GIS-effort in the State of Qatar entrusts each 

participating agency with the responsibility of providing high quality

data sets that fall within the domain of the respective agencies. The 

CED Roads Division, GIS Unit is thus charged with the responsibility 

of providing the roads flowline (centreline) network coverage that not 

only accurately reflects the road network in the State, but also is 

continually updated  to reflect the changing geometry of the road 

network. A flowline is defined as the line at the centre of the traveled 

way across a road extent.



A project within this scope was thus undertaken to analyze potential 

avenues to re-build this dataset as it was evident that a significantly 

updated Road Flowline Network coverage was required to replace the 

current data. The challenge was to determine an economical - yet 

effective - solution without doing a complete site survey of the road 

network.



Various methods such as manual on-screen digitizing of the network 

from digital ortho images, were investigated. The basic problem with 

manual processes, especially in this case where the mid-point of the 

cross-section had to be continuously determined, was digitizing 

consistency and accuracy. Digitizing speed was also a problem. Based 

on these factors, it was determined that an automated process for 

generating flowlines was required. The objective of this process was to 

try and automate the generation as much as possible in order to either 

eliminate or minimize manual or operator intervention.



The next step in the feasibility study was to determine the basic 

technique that would enable the flowline to be mathematically 

calculated. The most obvious approach was to pick pairs of points on 

opposite edges of the road polygons, calculate the shortest distance 

between them, and hence the mid-point. This collection of mid-points 

could then be joined to form the flowline.



The THIESSEN polygon command in ArcInfo seemed to show the 

promise of deriving these mid-points and their corresponding 

centrelines. The technique essentially generates a Triangular Irregular 

Network (TIN) from the available points, which in this case are 

located on the road edges. Next, the sides of each triangle are bisected 

perpendicularly. The resulting lines (bisectors) form the edges of the 

Thiessen polygons.



In our particular scenario, the premise was that as each of the TIN 

triangles was generated from at least two points on opposite sides of the 

road polygon, the perpendicular bisector for at least one of the triangle's 

sides should fall along the centre of the road polygon's cross-section. 

All such perpendicular bisectors would then collectively form the 

flowline that would traverse the mathematical centre of each road 

polygon. Results from preliminary tests proved encouraging and seemed 

to support the theory, in effect allowing the project to proceed.



Data Sources



The key factor in developing the process was the availability of good 

quality, complete base vector data from which the flowlines could be 

derived. The 1:1000 scale Urban polygon layer as identified in The 

Qatar National GIS Database Specifications and Data Dictionary 

Volume I: Topographic, was selected as the source. The data is 

organized in an ArcInfo library with each tile representing a 1:1000 

scale mapsheet covering a 0.5 sq. km. area. Urban features or structures 

are represented by polygons. Each polygon is assigned a unique eight 

character feature code based on the above specification. This code is 

used to identify the type of feature the polygon represents, for example, 

road (TGARROAD), intersection (TGARIRRD), carpark (TGARCAPK), etc.



Once the flowlines were generated, it was necessary to ensure that they 

were a true reflection of the actual road network as quality was an 

important factor. Thus, it was essential to verify the accuracy of the 

network with an independent data source. 1:4000 scale digital ortho 

imagery of the urban landscape of the State of Qatar was selected for 

this post-generation stage.



Methodology



The flowlines are generated in stages, one mapsheet at a time. This is 

partly due to the organization of the base polygon data in the form of 

library tiles. Other factors that supported this approach were  processing 

time and disk space requirements as the intermediate or temporary 

coverages generated during the process are quite large compared to the 

final flowline coverage. The flowline generation process is divided into 

the following stages:



Area Definition:

A clipcoverage that is based on the mapsheet neatline coverage is created. 

The extent of the clip coverage is 10% larger than that of the neatline 

coverage. The larger area is required to overcome problems at the 

mapsheet edges as detailed later in the paper. The area encompassed by 

the clipcover becomes the working area and all road polygons that fall 

within this area are selected for processing.





Feature Selection:

Selection of the road polygons from the base Urban polygon layer is 

governed by the feature code classification system as described earlier. 

All road related polygons such as roads, intersections, accessways, 

major carparks, etc. within the mapsheet are selected as are road 

polygons from adjacent mapsheets that fall within the clipcoverage 

extent.

Figure I


Figure I: 	All road-related polygons are selected (including 

intersections, accessways, pedestrian crossovers, etc.) from 

the 1:1,000 Topographic Urban layer database.



Once the polygons are selected, the process now needs to identify their 

associated arcs. That is, the arcs that comprise the polygons. This is 

necessary as at this stage the processing shifts focus to the arcs. An 

important point to note is that just as the polygons each have a feature 

code, the arcs too have feature codes assigned to them. Based on these 

codes, arcs that make up the common side of adjacent polygons are 

coded as virtual extents. These virtual extents do not in reality represent 

physical features such as kerb lines. Thus un-selecting these arcs results 

in a seamless road polygon network as depicted in Figure II. At this 

juncture, all the required arcs that have been selected enable the 

flowline generation process to continue.

Figure II


Figure II: 	All road edges of the road polygons from the Urban Layer 

are selected excluding virtual extents (shown as dotted lines).



Base Data Conditioning:



The selected arcs now need to be conditioned for the Thiessen polygons 

technique. The vertices and nodes from each arc are used to generate 

the polygons. The smoothness of the flowline that will be extracted, 

depends heavily on the number of polygons generated. Basically, the 

larger the individual Thiessen polygon, the rougher (jagged) the flowline.

This is because the side of the larger polygon that will form a segment of

the flowline will not mimic the road curvature as finely as a collection of 

smaller lines. It thus follows that a larger number of smaller polygons

needs to be generated.



As the Thiessen polygons are generated from the nodes and vertices of 

the road-edge arcs, by conjecture, the larger the number of vertices close 

to each other, the larger the number of "smaller" Thiessen polygons. 

Hence the selected road-edge arcs are DENSIFYed with the VERTEX 

option to achieve the desired result.



An important consideration is the densification factor as this value has 

numerous ramifications particularly on the smoothness of the resulting 

flowline, size of intermediate coverages and processing time. Extensive 

testing resulted in a 30 cm (11.81 inches) interval. This in effect creates 

a vertex every 30 cm along the road-edge arc.



The densified arc coverage from which the nodes and vertices are 

extracted is converted to a point coverage. The point coverage whose 

size is directly related to the densification factor, contains approximately 

1,000,000 points on average. This will obviously vary depending on the 

factor selected.

Figure III


Figure III: 	All the densified lines forming the road edges are converted 

to points traversing the road outline at 30 cm. (11.81 inches) 

intervals. A typical clipcoverage contains approximately 

1,000,000 points.



Thiessen Polygon Technique:



A brief description of how Thiessen polygons are generated, and their 

application to this paper's subject, is discussed in the introduction. A 

detailed elucidation is beyond the scope of this paper.



The THIESSEN command is applied to the point coverage to generate 

the necessary polygons. What emerges from this is a coverage with a 

significantly large number of polygons which at first sight appear to 

depict a work of spirographic art. However amongst this medley of 

polygons and their associated arcs, exists the flowline in its primitive 

form as depicted in Figure IV.

Figure IV


Figure IV:	All points along the road edges are converted to Thiessen 

polygons. The resulting arcs along the mathematical center 

of the roads form the centerlines.



The flowline must now be filtered out by discarding all unwanted 

features. This is done by working on the arc feature-level of the 

Thiessen polygon coverage. The bulk of  the surplus arcs are un-

selected by overlaying the Thiessen coverage on the original road 

polygon network from the Urban layer. Only arcs that fall entirely 

within the original road polygon network are selected. Although this 

filters out most of the unwanted arcs, some rogue arcs such as forks and 

dangles still need to be processed.

Figure V


Figure V:	All Thiessen polygon lines falling within the road polygons 

are selected to become the newly generated road flowlines.



At this juncture, the flowline is made up of a large number of small 

arcs. This is a result of the high density of points that was generated at 

the beginning of this process which was necessary to generate smooth 

curvatures at turns and roundabouts - in effect a smooth flowline.



This objective being accomplished, arcs with common nodes 

(pseudonodes) are combined using UNSPLIT. This significantly reduces 

the number of arcs in the coverage by as much as 95%. For example, a 

typical coverage at this point is reduced from around 27,000 arcs to 800 

arcs. This is in keeping with the subject of the geometry being created. 

That is, a flowline representing a road does not need to be in the form 

of a large number of small multiple arcs attached to each other node to 

node. Thus, where feasible the number of arcs are reduced. Noted 

exceptions include intersections where the arc is split although it 

continues past the intersection on the same road.



Although the number of arcs at this point has been reduced, the effect of 

densifying the original arcs may still result in an unnecessarily high 

number of vertices in each of the arcs. The number of vertices are 

reduced by GENERALIZing the arcs. An important consideration is the 

weed tolerance as a high value can alter the geometry of the flowline to 

yield unacceptable results. At this point the arcs have been processed 

enough to establish arc-node topology. This is also necessary as the 

feature attribute table is required for further processing.



Attributes Transfer:



This stage of the process is only included to transfer specific attributes 

from the original Urban layer to the flowlines. Arcs from the flowline 

coverage are INTERSECTed with the Urban layer polygons to transfer 

the feature code (described earlier). This enables the arcs to be 

individually identified as roads, accessways, intersections, etc. This 

stage impacts the post-processing stage as it enables easier selection of 

required arcs.



Post Processing Flowlines:



Post processing of flowlines is necessary - although not required - to 

enhance the quality of the end product. This stage deals with each of the 

situations discussed below individually as each one of them is distinct 

and requires extensive logic due to the intricacies involved. In 

hindsight, there was absolutely no indication at the beginning of this 

project that this stage would turn out to be so involved. Approximately 

55% of the code for the entire process is dedicated to this stage. The 

individual situations are described below in brevity.



Dangles:



In the flowline network, dangles are arcs that are not connected at either

one or both ends. However, not all dangles are illegitimate roads. Certain 

arcs that appear to be dangles are actually roads that continue past the 

individual mapsheet boundary; or are roads that lead to carparks or 

accessways. These arcs have to be retained as part of the network, while 

the remaining dangles must be processed.



Before continuing however, all roads intersecting with mapsheet 

boundaries are selected and re-tagged so as not to be misconstrued as 

dangles.



All (true) dangles - excluding roads that lead to carparks, accessways 

and those that crossover neatlines - are selected. The processing at this 

stage is controlled by focusing on the nodes of the dangles. The logic 

determines if the dangle qualifies as a "true dangle" or is a fork 

situation. In the former case, the dangle is deleted. In the latter case, it 

is retained for further processing as described below.

Figure VI


Figure VI:	A closer view of a typical dangle and "fork" in the flowline.



Forks:



Fork situations occur at road ends throughout the network. This is a 

result of how Thiessen polygons are generated. The situation is not 

rectified earlier in the process, as the entire fork lies within the 

corresponding road polygon extent, thus making it difficult to filter out.



A fork situation is identified by focusing on the common node of the 

three arcs forming the fork. The prong arcs of the fork are deleted, 

while the stem of the fork is extended to the true end of the road. The 

"true end" of the road is determined by temporarily appending the 

appropriate road extent arcs from the original Urban layer. This enables 

the precise extension of the stem to the appended road extent arc. Upon 

completion, the appended road-extent arcs that are easily 

distinguishable based on their feature code, are dropped reverting the 

coverage back to its desired state.

Figure VII


Figure VII:	Post-processing routines are applied to delete dangles or 

overshoots along the road flowlines;  and to remove 'forks' 

at road ends.



T-junctions:



The next major stage is the processing of arcs at intersections. As 

depicted in the graphic below, the flowline network has kinks that are 

created as a result of the true location of the centre of the intersection 

along the flowline. This situation generally occurs at T-junctions, 

accessways and roundabouts. This stage is relatively complicated in that 

it requires extensive processing for the numerous distinct occurrences.

Figure VIII


Figure VIII:	Examples of typical T-junction intersections.





Basically, the flowline arcs within each intersection polygon are 

selected. The angles of the (arc) nodes intersecting with the intersection 

polygon, are compared to the angle of the of the intersection's 

orientation. Consistently, two of the three intersecting nodes in question 

have similar angular attributes that enable the selection of the two 

associated arcs. These arcs are straightened to form the head of the T-

junction. The third arc is extended to the head of the junction.

Figure IX


Figure IX:	Further post-processing is required to straighten flowlines at 

road intersections and accessways.



Accessways:



This situation is similar to the T-junctions in how it is processed. The 

only significant difference is how the three arcs that make up the 

accessway "T", are selected. This is because unlike T-junctions, where 

an intersection polygon from the Urban layer explicitly defines the 

intersection's extent, no such polygon exists for accessways.



An arc with the accessway feature code is selected. Its underlying 

accessway polygon from the Urban polygon layer is appended to the 

flowline coverage. Please note that this polygon is different from a 

typical intersection polygon. An intersection polygon is defined within 

the road polygon network, while an accessway polygon sits at the edge 

of the road polygon network sharing only one common side with the 

road network.



The process shifts focus to the arc-feature level. The edges of the 

accessway polygon are extended until they intersect with the flowline. 

The flowline arcs are split at these intersection points. This is done so 

as to end up with the three-arc scenario as described in the T-junctions 

stage. The third arc is the actual accessway arc. This arc too is split at 

the accessway-road common edge. The three arcs that form the "T" are 

selected and processed in exactly the same way as T-junctions. The 

appended accessway polygon arcs are removed at the end of this stage.



Carparks:



Although carparks are not usually part of a true road flowline network, 

our specific needs required their inclusion. They are easily identified as 

the carpark arcs have a distinct feature code. The process creates a 

carpark by traversing the edge of the carpark polygon from the Urban 

layer as depicted in Figure X below.

Figure X


Figure X:	Carparks are added to ensure connectivity to isolated roads. 

Minimal operator intervention is needed to rectify special 

cases like roundabouts and other irregularly-shaped road intersections.



Mapsheet Extent Clipping:



Once the flowline coverage passes the above post-processing 

procedures, the coverage is clipped at the mapsheet neatline. All arcs 

between the mapsheet and the clipcover neatlines are dropped. This 

technique was adopted to overcome accuracy and edge-matching 

problems that arose if the generation process was restricted to the 

mapsheet neatline.



During the pilot stages of this project, it was observed that the flowlines 

generated at the edge of the mapsheet in the vicinity of the neatlines 

were skewed or bent. This problem was attributed to the arrangement of 

the points in this area and how the resulting Thiessen polygons were 

created. Thus, an additional "buffer" area of 10% was selected after 

experimentation. This buffer in effect relocates the problem outside the 

mapsheet boundary resulting in accurate flowlines once the coverage is 

clipped.



Quality Control:



The post-processing stage completes the flowline generation process. 

The flowline coverage is thoroughly examined before it is databased. 

The coverage is first checked by a trained technologist to look for any 

discrepancies or inconsistencies in the flowline. Suspect situations are 

noted.



The flowline coverage is then overlaid on the Urban polygon layer to 

resolve any suspect situations. Generally, these situations are resolved at 

this stage. The flowline coverage is further checked by overlaying it on 

the ortho image(s) for a final check.



Once the flowline coverage passes the above quality control procedures, 

it is databased by appending it into the master flowline coverage for the 

State of Qatar. The newly databased area in the master coverage is 

visually screened to ensure its complete integration into the road 

network.



The complete quality control stage - which is the only part where 

manual intervention is required - takes approximately 20 minutes.



Conclusion:



The flowline generation process took approximately 7 man-months and 

5,000 lines of code to complete. It has been in production for the past 8 

months and has successfully processed around 800 mapsheets. The 

initial objective of the project requiring an automated process with 

minimal user intervention has been achieved. It is hoped that with this 

process, maintaining an updated flowline network will become a 

practical reality. 



This project provided valuable lessons and insights, not only in how 

ArcInfo works, but also into the applications and power of GIS 

technology.  The experience gained over the past year by working on 

this project has already given rise to new ideas that will help refine the 

process even further. The next version of this process will hopefully 

extend its capability to produce even better results.



Acknowledgments



Our sincere thanks to Mr. Zul Jiwani, Head, Centre for GIS, State of 

Qatar for believing in us by giving us the latitude to explore; Mr. Jeff 

Kuran, Head, Mapping and Positioning Services Section, Centre for 

GIS, State of Qatar, for his encouragement and guidance through the 

development phase of this project.



References



[1] Environmental Systems Research Institute, Inc. ARC Command 

References. Redlands:Esri, 1992

 

[2] Environmental Systems Research Institute, Inc. ARCPLOT 

Command References. Redlands:Esri, 1992

 

[3] Environmental Systems Research Institute, Inc. ARCEDIT 

Command References. Redlands:Esri, 1992

 

[4] Environmental Systems Research Institute, Inc. COGO Command 

References. Redlands:Esri, 1992

 

[5] Environmental Systems Research Institute, Inc. AML User's 

Guide. Redlands:Esri, 1992

 

[6] Centre for GIS, State of Qatar. The Qatar National GIS Database 

Specifications and Data Dictionary Volume I: Topographic


Alnoor Ladak, Roads Division GIS Co-ordinator
Roberto B. Martinez, GIS Technologist
Centre for GIS, State of Qatar
P.O. Box: 22088
Doha, Qatar
Telephone: (974) 337556
Fax: (974) 444036