Alnoor Ladak, and Roberto B. Martinez
The availability of a high quality urban dataset from The Centre for GIS representing man-made structures formed the basis of the project. The challenge was to be able to identify polygons representing roads and then derive road centrelines from these polygons that precisely traversed the centre of the road throughout the network.
This paper talks about the outcome of this project, namely: possibilities investigated to develop an automated process requiring minimal user intervention the Thiessen Polygons-based solution that was consequently adopted post-processing issues such as treatment of intersections, round-abouts, accessways, parking bays, etc. quality control and derived data verification procedures lessons learnt in developing automated data- derivation techniques
The program - which is now in production - was developed using AMLs working in ARC, ARCEDIT and ARCPLOT with ArcInfo 7.0.3.
Introduction The coordinated national GIS-effort in the State of Qatar entrusts each participating agency with the responsibility of providing high quality data sets that fall within the domain of the respective agencies. The CED Roads Division, GIS Unit is thus charged with the responsibility of providing the roads flowline (centreline) network coverage that not only accurately reflects the road network in the State, but also is continually updated to reflect the changing geometry of the road network. A flowline is defined as the line at the centre of the traveled way across a road extent. A project within this scope was thus undertaken to analyze potential avenues to re-build this dataset as it was evident that a significantly updated Road Flowline Network coverage was required to replace the current data. The challenge was to determine an economical - yet effective - solution without doing a complete site survey of the road network. Various methods such as manual on-screen digitizing of the network from digital ortho images, were investigated. The basic problem with manual processes, especially in this case where the mid-point of the cross-section had to be continuously determined, was digitizing consistency and accuracy. Digitizing speed was also a problem. Based on these factors, it was determined that an automated process for generating flowlines was required. The objective of this process was to try and automate the generation as much as possible in order to either eliminate or minimize manual or operator intervention. The next step in the feasibility study was to determine the basic technique that would enable the flowline to be mathematically calculated. The most obvious approach was to pick pairs of points on opposite edges of the road polygons, calculate the shortest distance between them, and hence the mid-point. This collection of mid-points could then be joined to form the flowline. The THIESSEN polygon command in ArcInfo seemed to show the promise of deriving these mid-points and their corresponding centrelines. The technique essentially generates a Triangular Irregular Network (TIN) from the available points, which in this case are located on the road edges. Next, the sides of each triangle are bisected perpendicularly. The resulting lines (bisectors) form the edges of the Thiessen polygons. In our particular scenario, the premise was that as each of the TIN triangles was generated from at least two points on opposite sides of the road polygon, the perpendicular bisector for at least one of the triangle's sides should fall along the centre of the road polygon's cross-section. All such perpendicular bisectors would then collectively form the flowline that would traverse the mathematical centre of each road polygon. Results from preliminary tests proved encouraging and seemed to support the theory, in effect allowing the project to proceed. Data Sources The key factor in developing the process was the availability of good quality, complete base vector data from which the flowlines could be derived. The 1:1000 scale Urban polygon layer as identified in The Qatar National GIS Database Specifications and Data Dictionary Volume I: Topographic, was selected as the source. The data is organized in an ArcInfo library with each tile representing a 1:1000 scale mapsheet covering a 0.5 sq. km. area. Urban features or structures are represented by polygons. Each polygon is assigned a unique eight character feature code based on the above specification. This code is used to identify the type of feature the polygon represents, for example, road (TGARROAD), intersection (TGARIRRD), carpark (TGARCAPK), etc. Once the flowlines were generated, it was necessary to ensure that they were a true reflection of the actual road network as quality was an important factor. Thus, it was essential to verify the accuracy of the network with an independent data source. 1:4000 scale digital ortho imagery of the urban landscape of the State of Qatar was selected for this post-generation stage. Methodology The flowlines are generated in stages, one mapsheet at a time. This is partly due to the organization of the base polygon data in the form of library tiles. Other factors that supported this approach were processing time and disk space requirements as the intermediate or temporary coverages generated during the process are quite large compared to the final flowline coverage. The flowline generation process is divided into the following stages: Area Definition: A clipcoverage that is based on the mapsheet neatline coverage is created. The extent of the clip coverage is 10% larger than that of the neatline coverage. The larger area is required to overcome problems at the mapsheet edges as detailed later in the paper. The area encompassed by the clipcover becomes the working area and all road polygons that fall within this area are selected for processing. Feature Selection: Selection of the road polygons from the base Urban polygon layer is governed by the feature code classification system as described earlier. All road related polygons such as roads, intersections, accessways, major carparks, etc. within the mapsheet are selected as are road polygons from adjacent mapsheets that fall within the clipcoverage extent.
Figure I: All road-related polygons are selected (including intersections, accessways, pedestrian crossovers, etc.) from the 1:1,000 Topographic Urban layer database. Once the polygons are selected, the process now needs to identify their associated arcs. That is, the arcs that comprise the polygons. This is necessary as at this stage the processing shifts focus to the arcs. An important point to note is that just as the polygons each have a feature code, the arcs too have feature codes assigned to them. Based on these codes, arcs that make up the common side of adjacent polygons are coded as virtual extents. These virtual extents do not in reality represent physical features such as kerb lines. Thus un-selecting these arcs results in a seamless road polygon network as depicted in Figure II. At this juncture, all the required arcs that have been selected enable the flowline generation process to continue.
Figure II: All road edges of the road polygons from the Urban Layer are selected excluding virtual extents (shown as dotted lines). Base Data Conditioning: The selected arcs now need to be conditioned for the Thiessen polygons technique. The vertices and nodes from each arc are used to generate the polygons. The smoothness of the flowline that will be extracted, depends heavily on the number of polygons generated. Basically, the larger the individual Thiessen polygon, the rougher (jagged) the flowline. This is because the side of the larger polygon that will form a segment of the flowline will not mimic the road curvature as finely as a collection of smaller lines. It thus follows that a larger number of smaller polygons needs to be generated. As the Thiessen polygons are generated from the nodes and vertices of the road-edge arcs, by conjecture, the larger the number of vertices close to each other, the larger the number of "smaller" Thiessen polygons. Hence the selected road-edge arcs are DENSIFYed with the VERTEX option to achieve the desired result. An important consideration is the densification factor as this value has numerous ramifications particularly on the smoothness of the resulting flowline, size of intermediate coverages and processing time. Extensive testing resulted in a 30 cm (11.81 inches) interval. This in effect creates a vertex every 30 cm along the road-edge arc. The densified arc coverage from which the nodes and vertices are extracted is converted to a point coverage. The point coverage whose size is directly related to the densification factor, contains approximately 1,000,000 points on average. This will obviously vary depending on the factor selected.
Figure III: All the densified lines forming the road edges are converted to points traversing the road outline at 30 cm. (11.81 inches) intervals. A typical clipcoverage contains approximately 1,000,000 points. Thiessen Polygon Technique: A brief description of how Thiessen polygons are generated, and their application to this paper's subject, is discussed in the introduction. A detailed elucidation is beyond the scope of this paper. The THIESSEN command is applied to the point coverage to generate the necessary polygons. What emerges from this is a coverage with a significantly large number of polygons which at first sight appear to depict a work of spirographic art. However amongst this medley of polygons and their associated arcs, exists the flowline in its primitive form as depicted in Figure IV.
Figure IV: All points along the road edges are converted to Thiessen polygons. The resulting arcs along the mathematical center of the roads form the centerlines. The flowline must now be filtered out by discarding all unwanted features. This is done by working on the arc feature-level of the Thiessen polygon coverage. The bulk of the surplus arcs are un- selected by overlaying the Thiessen coverage on the original road polygon network from the Urban layer. Only arcs that fall entirely within the original road polygon network are selected. Although this filters out most of the unwanted arcs, some rogue arcs such as forks and dangles still need to be processed.
Figure V: All Thiessen polygon lines falling within the road polygons are selected to become the newly generated road flowlines. At this juncture, the flowline is made up of a large number of small arcs. This is a result of the high density of points that was generated at the beginning of this process which was necessary to generate smooth curvatures at turns and roundabouts - in effect a smooth flowline. This objective being accomplished, arcs with common nodes (pseudonodes) are combined using UNSPLIT. This significantly reduces the number of arcs in the coverage by as much as 95%. For example, a typical coverage at this point is reduced from around 27,000 arcs to 800 arcs. This is in keeping with the subject of the geometry being created. That is, a flowline representing a road does not need to be in the form of a large number of small multiple arcs attached to each other node to node. Thus, where feasible the number of arcs are reduced. Noted exceptions include intersections where the arc is split although it continues past the intersection on the same road. Although the number of arcs at this point has been reduced, the effect of densifying the original arcs may still result in an unnecessarily high number of vertices in each of the arcs. The number of vertices are reduced by GENERALIZing the arcs. An important consideration is the weed tolerance as a high value can alter the geometry of the flowline to yield unacceptable results. At this point the arcs have been processed enough to establish arc-node topology. This is also necessary as the feature attribute table is required for further processing. Attributes Transfer: This stage of the process is only included to transfer specific attributes from the original Urban layer to the flowlines. Arcs from the flowline coverage are INTERSECTed with the Urban layer polygons to transfer the feature code (described earlier). This enables the arcs to be individually identified as roads, accessways, intersections, etc. This stage impacts the post-processing stage as it enables easier selection of required arcs. Post Processing Flowlines: Post processing of flowlines is necessary - although not required - to enhance the quality of the end product. This stage deals with each of the situations discussed below individually as each one of them is distinct and requires extensive logic due to the intricacies involved. In hindsight, there was absolutely no indication at the beginning of this project that this stage would turn out to be so involved. Approximately 55% of the code for the entire process is dedicated to this stage. The individual situations are described below in brevity. Dangles: In the flowline network, dangles are arcs that are not connected at either one or both ends. However, not all dangles are illegitimate roads. Certain arcs that appear to be dangles are actually roads that continue past the individual mapsheet boundary; or are roads that lead to carparks or accessways. These arcs have to be retained as part of the network, while the remaining dangles must be processed. Before continuing however, all roads intersecting with mapsheet boundaries are selected and re-tagged so as not to be misconstrued as dangles. All (true) dangles - excluding roads that lead to carparks, accessways and those that crossover neatlines - are selected. The processing at this stage is controlled by focusing on the nodes of the dangles. The logic determines if the dangle qualifies as a "true dangle" or is a fork situation. In the former case, the dangle is deleted. In the latter case, it is retained for further processing as described below.
Figure VI: A closer view of a typical dangle and "fork" in the flowline. Forks: Fork situations occur at road ends throughout the network. This is a result of how Thiessen polygons are generated. The situation is not rectified earlier in the process, as the entire fork lies within the corresponding road polygon extent, thus making it difficult to filter out. A fork situation is identified by focusing on the common node of the three arcs forming the fork. The prong arcs of the fork are deleted, while the stem of the fork is extended to the true end of the road. The "true end" of the road is determined by temporarily appending the appropriate road extent arcs from the original Urban layer. This enables the precise extension of the stem to the appended road extent arc. Upon completion, the appended road-extent arcs that are easily distinguishable based on their feature code, are dropped reverting the coverage back to its desired state.
Figure VII: Post-processing routines are applied to delete dangles or overshoots along the road flowlines; and to remove 'forks' at road ends. T-junctions: The next major stage is the processing of arcs at intersections. As depicted in the graphic below, the flowline network has kinks that are created as a result of the true location of the centre of the intersection along the flowline. This situation generally occurs at T-junctions, accessways and roundabouts. This stage is relatively complicated in that it requires extensive processing for the numerous distinct occurrences.
Figure VIII: Examples of typical T-junction intersections. Basically, the flowline arcs within each intersection polygon are selected. The angles of the (arc) nodes intersecting with the intersection polygon, are compared to the angle of the of the intersection's orientation. Consistently, two of the three intersecting nodes in question have similar angular attributes that enable the selection of the two associated arcs. These arcs are straightened to form the head of the T- junction. The third arc is extended to the head of the junction.
Figure IX: Further post-processing is required to straighten flowlines at road intersections and accessways. Accessways: This situation is similar to the T-junctions in how it is processed. The only significant difference is how the three arcs that make up the accessway "T", are selected. This is because unlike T-junctions, where an intersection polygon from the Urban layer explicitly defines the intersection's extent, no such polygon exists for accessways. An arc with the accessway feature code is selected. Its underlying accessway polygon from the Urban polygon layer is appended to the flowline coverage. Please note that this polygon is different from a typical intersection polygon. An intersection polygon is defined within the road polygon network, while an accessway polygon sits at the edge of the road polygon network sharing only one common side with the road network. The process shifts focus to the arc-feature level. The edges of the accessway polygon are extended until they intersect with the flowline. The flowline arcs are split at these intersection points. This is done so as to end up with the three-arc scenario as described in the T-junctions stage. The third arc is the actual accessway arc. This arc too is split at the accessway-road common edge. The three arcs that form the "T" are selected and processed in exactly the same way as T-junctions. The appended accessway polygon arcs are removed at the end of this stage. Carparks: Although carparks are not usually part of a true road flowline network, our specific needs required their inclusion. They are easily identified as the carpark arcs have a distinct feature code. The process creates a carpark by traversing the edge of the carpark polygon from the Urban layer as depicted in Figure X below.
Figure X: Carparks are added to ensure connectivity to isolated roads. Minimal operator intervention is needed to rectify special cases like roundabouts and other irregularly-shaped road intersections. Mapsheet Extent Clipping: Once the flowline coverage passes the above post-processing procedures, the coverage is clipped at the mapsheet neatline. All arcs between the mapsheet and the clipcover neatlines are dropped. This technique was adopted to overcome accuracy and edge-matching problems that arose if the generation process was restricted to the mapsheet neatline. During the pilot stages of this project, it was observed that the flowlines generated at the edge of the mapsheet in the vicinity of the neatlines were skewed or bent. This problem was attributed to the arrangement of the points in this area and how the resulting Thiessen polygons were created. Thus, an additional "buffer" area of 10% was selected after experimentation. This buffer in effect relocates the problem outside the mapsheet boundary resulting in accurate flowlines once the coverage is clipped. Quality Control: The post-processing stage completes the flowline generation process. The flowline coverage is thoroughly examined before it is databased. The coverage is first checked by a trained technologist to look for any discrepancies or inconsistencies in the flowline. Suspect situations are noted. The flowline coverage is then overlaid on the Urban polygon layer to resolve any suspect situations. Generally, these situations are resolved at this stage. The flowline coverage is further checked by overlaying it on the ortho image(s) for a final check. Once the flowline coverage passes the above quality control procedures, it is databased by appending it into the master flowline coverage for the State of Qatar. The newly databased area in the master coverage is visually screened to ensure its complete integration into the road network. The complete quality control stage - which is the only part where manual intervention is required - takes approximately 20 minutes. Conclusion: The flowline generation process took approximately 7 man-months and 5,000 lines of code to complete. It has been in production for the past 8 months and has successfully processed around 800 mapsheets. The initial objective of the project requiring an automated process with minimal user intervention has been achieved. It is hoped that with this process, maintaining an updated flowline network will become a practical reality. This project provided valuable lessons and insights, not only in how ArcInfo works, but also into the applications and power of GIS technology. The experience gained over the past year by working on this project has already given rise to new ideas that will help refine the process even further. The next version of this process will hopefully extend its capability to produce even better results. Acknowledgments Our sincere thanks to Mr. Zul Jiwani, Head, Centre for GIS, State of Qatar for believing in us by giving us the latitude to explore; Mr. Jeff Kuran, Head, Mapping and Positioning Services Section, Centre for GIS, State of Qatar, for his encouragement and guidance through the development phase of this project. References  Environmental Systems Research Institute, Inc. ARC Command References. Redlands:Esri, 1992  Environmental Systems Research Institute, Inc. ARCPLOT Command References. Redlands:Esri, 1992  Environmental Systems Research Institute, Inc. ARCEDIT Command References. Redlands:Esri, 1992  Environmental Systems Research Institute, Inc. COGO Command References. Redlands:Esri, 1992  Environmental Systems Research Institute, Inc. AML User's Guide. Redlands:Esri, 1992  Centre for GIS, State of Qatar. The Qatar National GIS Database Specifications and Data Dictionary Volume I: Topographic