A Semi-Automated Approach to Extract Roads From Scanned Color Maps

The U.S. Army Topographic Engineering Center (TEC) is investigating semi-automated methods to extract roads from scanned color topographic maps. The first step in this process is to extract a preliminary road network using a commercial software package that applies a neural network approach. The result is a raster file containing a rough road network together with numerous non-road artifacts such as speckles and sporadic text. TEC wrote knowledge-based rules in ArcInfo's Arc Macro Language (AML) that use geometric descriptors and compactness measures related to the linear nature of roads. These rules eliminate the artifacts and retain the roads. The resulting road network serves as a vector transportation layer suitable for use in a GIS.

_________________________________________________________________________________

1. INTRODUCTION

Data used in a Geographic Information System (GIS) commonly consist of various attributed vector data layers. Of the many possible layers, the transportation layer is frequently one of the most significant. Multiple sources of digital vector transportation layers currently exist for the United States. Nationwide road coverage is available in the U.S. Census Bureau's TIGER files and in the U.S. Geological Survey (USGS) Digital Line Graphs (DLGs) at a scale of approximately 1:100,000. Private vendors also market custom-made vector transportation layers at larger scales, but these exist primarily for metropolitan areas.

However, this type of data is not available for many areas of the world or at scales required for certain types of analyses. Global digital coverage of roads is provided by the National Imagery and Mapping Agency's (NIMA) Digital Chart of the World (DCW) at a scale of 1:1,000,000. Vector transportation layers at intermediate scales of 1:250,000, or at larger scales of 1:50,000, do not exist over large portions of the world. These scales are required to support many military operations.

2. OBJECTIVES

To meet military requirements, the Army must be able to rapidly build a GIS database containing roads over an area of interest at a scale larger than 1:1,000,000. The U.S. Army Corps of Engineers Topographic Engineering Center (TEC) is working on this problem as part of its internal Digital Terrain Data Generation (DTDG) research effort. The traditional approach is to manually digitize and attribute roads from existing hardcopy maps or imagery, which is time consuming and labor intensive. Semi-automated or automated methods are needed to produce these data rapidly. Strictly automated approaches to extracting roads from remotely sensed imagery show promise but are still relatively immature.

TEC has built a prototype of a semi-automated method to identify potential roads on scanned color topographic maps, isolate them, and then vectorize the resulting road network. This method is capable of taking advantage of the following:

3. METHODOLOGY

This map parsing procedure uses as input the digital color map products distributed by NIMA known as Arc Digitized Raster Graphics (ADRGs). NIMA creates ADRGs by scanning a paper Joint Operations Graphic (JOG) or Topographic Line Map (TLM) at 240 dots per inch (DPI) and georeferences them prior to distribution. This makes them preferable to scanning a paper map in-house, which would then have to be georeferenced manually. JOG and TLM ADRGs are available for approximately 30% of the world.
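To give a sense of the resolution involved, the sketch below converts the 240 DPI scanning resolution into the approximate ground distance covered by one pixel. It assumes the nominal series scales of 1:50,000 for TLMs and 1:250,000 for JOGs and ignores any resampling applied during ADRG production; the function name `ground_pixel_size_m` is hypothetical.

```python
# Hypothetical helper: ground distance spanned by one scanned pixel.
# Assumes the map is scanned at its printed scale with no resampling.
METERS_PER_INCH = 0.0254


def ground_pixel_size_m(dpi: float, scale_denominator: float) -> float:
    """Return the ground distance (meters) covered by one scanned pixel."""
    pixel_on_paper_m = METERS_PER_INCH / dpi      # pixel width on the paper map
    return pixel_on_paper_m * scale_denominator   # scaled up to ground distance


for name, denominator in [("TLM (1:50,000)", 50_000), ("JOG (1:250,000)", 250_000)]:
    print(f"{name}: {ground_pixel_size_m(240, denominator):.2f} m per pixel")
# TLM (1:50,000): 5.29 m per pixel
# JOG (1:250,000): 26.46 m per pixel
```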

It should be noted that while ADRGs were used as input in the developed method, the principles described in this paper apply to any color map that is scanned and then subsequently georeferenced. The map parsing steps used to extract a road network from a digital color map are outlined in Diagram A.


A) Convert the raw ADRG. The native ArcInfo ADRG importer was used to create an ArcInfo grid, which was then exported to a SunRaster file. The SunRaster file is required for use in the next processing step. The ADRG importer also creates a World File that contains georeferencing information about the resulting ADRG grid. This World File is used in a subsequent processing step.
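The World File used in Steps A and C holds six numbers that define an affine transform from pixel position to map coordinates. A minimal sketch of how those coefficients georeference a cell, assuming the common world-file convention (lines ordered A, D, B, E, C, F, with C and F giving the center of the upper-left pixel). The function names and the coefficient values in the example are hypothetical, not taken from a real ADRG.

```python
from typing import Tuple

Coeffs = Tuple[float, float, float, float, float, float]


def read_world_file(text: str) -> Coeffs:
    """Parse the six world-file coefficients in file order (A, D, B, E, C, F)."""
    values = [float(line) for line in text.splitlines() if line.strip()]
    if len(values) != 6:
        raise ValueError(f"expected 6 coefficients, got {len(values)}")
    a, d, b, e, c, f = values
    return a, d, b, e, c, f


def pixel_to_map(coeffs: Coeffs, row: int, col: int) -> Tuple[float, float]:
    """Map a 0-based (row, col) pixel center to map coordinates (x, y)."""
    a, d, b, e, c, f = coeffs
    x = a * col + b * row + c   # A: x pixel size, B: rotation term
    y = d * col + e * row + f   # E: (negative) y pixel size, D: rotation term
    return x, y


# Hypothetical world file: 10-unit pixels, no rotation.
coeffs = read_world_file("10.0\n0.0\n0.0\n-10.0\n500005.0\n4200005.0\n")
print(pixel_to_map(coeffs, 0, 0))  # (500005.0, 4200005.0)
print(pixel_to_map(coeffs, 2, 3))  # (500035.0, 4199985.0)
```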

B) Extract preliminary road network. Lockheed Martin's Autographics software is a commercial automated feature extraction program used to create raster geographic databases from hardcopy map sources [1]. Autographics accepts a SunRaster raster file as input. Input color map files are usually 24 bits per pixel, 8 bits for each of the red, green, and blue channels. Once the SunRaster file is imported, the user manually trains the software's neural network to recognize various thematic categories by identifying representative pixels from each category to be classified. Once the training is complete, the image is automatically classified based on the manually identified training pixels. Output from this process is an 8-bit SunRaster file. Output files can contain multiple thematic layers, or layers can be broken apart and stored as separate raster files. For example, it is possible to output roads and contours in two separate files. For our purposes the output is a SunRaster bit map containing a preliminary road network.
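Autographics' neural network is proprietary and its internals are not described here. The sketch below substitutes a much simpler nearest-centroid classifier purely to illustrate the same train-then-classify workflow: the operator's representative pixels define a mean color per category, every 24-bit pixel is assigned the 8-bit code of the closest mean, and one category is then split out as a road bit map. The class codes, sample colors, and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Centroid = Tuple[float, float, float]


def unpack_rgb(pixel: int) -> Tuple[int, int, int]:
    """Split a packed 24-bit pixel (0xRRGGBB) into 8-bit red, green, blue."""
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


def train_centroids(samples: Dict[int, List[int]]) -> Dict[int, Centroid]:
    """Average the operator-picked training pixels for each class code."""
    centroids: Dict[int, Centroid] = {}
    for code, pixels in samples.items():
        channels = [unpack_rgb(p) for p in pixels]
        n = len(channels)
        centroids[code] = (
            sum(ch[0] for ch in channels) / n,
            sum(ch[1] for ch in channels) / n,
            sum(ch[2] for ch in channels) / n,
        )
    return centroids


def classify(image: List[List[int]], centroids: Dict[int, Centroid]) -> List[List[int]]:
    """Give every pixel the class code of the nearest centroid (squared RGB distance)."""
    def nearest(pixel: int) -> int:
        r, g, b = unpack_rgb(pixel)
        return min(
            centroids,
            key=lambda code: (r - centroids[code][0]) ** 2
            + (g - centroids[code][1]) ** 2
            + (b - centroids[code][2]) ** 2,
        )
    return [[nearest(p) for p in row] for row in image]


def extract_layer(classified: List[List[int]], code: int) -> List[List[int]]:
    """Return a 1/0 bit map for one thematic category (e.g. roads)."""
    return [[1 if c == code else 0 for c in row] for row in classified]


# Hypothetical class codes: 0 = background, 1 = road, 2 = contour.
samples = {0: [0xFFFFFF, 0xF0F0E0], 1: [0xD02020, 0xE03030], 2: [0x8B5A2B]}
image = [[0xFFFFFF, 0xD82828], [0x8A5B2C, 0xFAFAFA]]
classified = classify(image, train_centroids(samples))
roads = extract_layer(classified, 1)
print(classified)  # [[0, 1], [2, 0]]
print(roads)       # [[0, 1], [0, 0]]
```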

C) Apply knowledge-based rules. The road bit map output from the Autographics software is imported into ArcInfo using the GRID IMAGEGRID command. The World File created in Step A is used to georeference the new grid to the same coordinate system as the original ADRG.

The preliminary road bit map contains numerous non-road artifacts such as speckles and sporadic text. This bit map is processed using an ArcInfo AML written by TEC to eliminate most of the artifacts while retaining the roads. This AML uses knowledge-based rules that apply GRID and ARC functions together with identified geometric road descriptors and compactness measures. This AML is discussed in more detail later in the paper.
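TEC's actual rules live in its AML and are not reproduced here. As an illustration of one kind of rule that removes speckles, the following sketch labels 8-connected regions of road cells and drops any region smaller than a size threshold. The function names and the `min_cells` threshold are hypothetical; in practice such a parameter would be tuned by querying the grids, as the paper describes for its own rules.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[int]]


def label_components(grid: Grid) -> Tuple[Grid, int]:
    """Label 8-connected regions of nonzero cells; return (labels, region count)."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of this region
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and grid[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count


def remove_small_regions(grid: Grid, min_cells: int) -> Grid:
    """Keep only regions with at least min_cells cells (hypothetical speckle rule)."""
    labels, count = label_components(grid)
    sizes = [0] * (count + 1)
    for row in labels:
        for lab in row:
            sizes[lab] += 1
    return [[1 if lab and sizes[lab] >= min_cells else 0 for lab in row]
            for row in labels]


grid = [
    [1, 1, 1, 1, 1, 1],  # a road segment
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],  # an isolated speckle
]
print(remove_small_regions(grid, min_cells=3))
```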

D) Perform manual editing. Despite the best efforts of the postprocessing AML, non-road artifacts frequently remain. ArcInfo's ArcScan is used to eliminate noise that may still exist and to connect roads that may be disjoint.

E) Perform raster to vector conversion. Prior to vectorization, the grid road network is thinned using the THIN function. The remaining grid is now ready for vectorization. The ArcInfo GRIDLINE function is used to accomplish the vectorization. Once the roads are vectorized, they can be manually attributed.
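Thinning reduces each road to a line one cell wide so that it can be traced into vector form. The algorithm inside ArcInfo's THIN function is not documented here; as a stand-in, the sketch below implements the classic Zhang-Suen thinning algorithm on a binary grid to show what this step accomplishes. The function names are hypothetical.

```python
from typing import List

Grid = List[List[int]]


def _neighbors(img: Grid, r: int, c: int) -> List[int]:
    """Return P2..P9 clockwise starting north; cells outside the grid count as 0."""
    rows, cols = len(img), len(img[0])
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]
    values = []
    for dr, dc in offsets:
        y, x = r + dr, c + dc
        values.append(img[y][x] if 0 <= y < rows and 0 <= x < cols else 0)
    return values


def zhang_suen_thin(grid: Grid) -> Grid:
    """Thin a binary grid to one-cell-wide lines (Zhang-Suen, two sub-iterations)."""
    img = [[1 if v else 0 for v in row] for row in grid]
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(len(img)):
                for c in range(len(img[0])):
                    if not img[r][c]:
                        continue
                    n = _neighbors(img, r, c)
                    p2, p3, p4, p5, p6, p7, p8, p9 = n
                    b = sum(n)  # number of foreground neighbors
                    # number of 0 -> 1 transitions around P2, P3, ..., P9, P2
                    a = sum(1 for i in range(8) if n[i] == 0 and n[(i + 1) % 8] == 1)
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((r, c))
            for r, c in to_delete:  # delete only after the whole pass
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return img


# A 3-cell-wide, 7-cell-long bar of road cells.
bar = [[1 if 1 <= r <= 3 and 1 <= c <= 7 else 0 for c in range(9)] for r in range(5)]
for row in zhang_suen_thin(bar):
    print("".join("#" if v else "." for v in row))
```

On this input the bar thins to a single-cell-wide line, shortened at the ends, along its middle row.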

Applying Knowledge-Based Rules -- A Detailed Description

As introduced in Step C above, knowledge-based rules were developed to refine the preliminary road network identified by the Autographics software. These rules are a combination of ArcInfo ARC commands and GRID functions together with geometric road descriptors and compactness measures. The parameters used in the formulation of the rules were obtained by interactive query of the original and intermediate grids used in this step. The rules were refined by application to multiple data sets. The substeps involved in the knowledge-based rule application are outlined in Diagram B.
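One widely used compactness measure is the isoperimetric ratio 4πA/P², which equals 1 for a perfect circle in continuous space and approaches 0 for long, thin shapes such as roads. The paper does not state which compactness measure TEC used, so the following sketch is an assumption: it takes area as a cell count and perimeter as the number of exposed cell edges, and flags a region as road-like when its compactness falls at or below a hypothetical threshold of 0.4. The function names are likewise hypothetical.

```python
import math
from typing import Set, Tuple

Cell = Tuple[int, int]


def area_and_perimeter(cells: Set[Cell]) -> Tuple[int, int]:
    """Area = number of cells; perimeter = cell edges not shared with the region."""
    perimeter = 0
    for r, c in cells:
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (r + dr, c + dc) not in cells:
                perimeter += 1
    return len(cells), perimeter


def compactness(cells: Set[Cell]) -> float:
    """Isoperimetric ratio 4*pi*A / P**2 (smaller means more elongated)."""
    area, perimeter = area_and_perimeter(cells)
    return 4 * math.pi * area / perimeter ** 2


def looks_like_road(cells: Set[Cell], max_compactness: float = 0.4) -> bool:
    """Hypothetical rule: keep regions that are elongated rather than blob-like."""
    return compactness(cells) <= max_compactness


line = {(0, c) for c in range(10)}                  # 1 x 10 strip
blob = {(r, c) for r in range(4) for c in range(4)}  # 4 x 4 block
print(f"{compactness(line):.3f}", looks_like_road(line))  # 0.260 True
print(f"{compactness(blob):.3f}", looks_like_road(blob))  # 0.785 False
```

Compactness alone cannot separate every artifact; an outlined letter such as "O" is also non-compact, which is one reason to combine it with other geometric descriptors, as the rules in this paper do.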


4. RESULTS

This map parsing prototype was tested on several ADRGs with varying road network densities. Visual comparison showed good correspondence between the extracted road networks and the roads on the original ADRGs. Diagram J shows a portion of the vector road network that resulted from applying this method to one TLM ADRG with an extensive road network. This grid was deliberately vectorized without the manual edit phase to illustrate why that phase is required; note the gaps and spurs that remain.



Diagram J
Vector Road Network Resulting From Semi-Automated Process

The experimental method was applied to a representative TLM ADRG containing 5,561 rows and 6,673 columns. The following are the time estimates for processing the significant portions of the map parsing operation on a Sun SPARCstation 20:

Detailed time comparisons have not yet been performed to determine whether the semi-automated approach is faster than manual digitizing. However, in the semi-automated approach the human operator accounts for only a small share of the total processing time; the computer does most of the work.
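For scale, the representative TLM ADRG of 5,561 rows by 6,673 columns holds just over 37 million cells. The arithmetic below estimates the raw data volume processed, assuming uncompressed storage at 24 bits per pixel for the color input and 8 bits per pixel for the classified output (the bit depths given in Step B) and ignoring file headers; the helper name `raster_bytes` is hypothetical.

```python
def raster_bytes(rows: int, cols: int, bits_per_cell: int) -> int:
    """Uncompressed raster size in bytes (no header, whole bytes per cell)."""
    return rows * cols * bits_per_cell // 8


ROWS, COLS = 5561, 6673  # representative TLM ADRG from the text
cells = ROWS * COLS
rgb_bytes = raster_bytes(ROWS, COLS, 24)   # 24-bit color input
class_bytes = raster_bytes(ROWS, COLS, 8)  # 8-bit classified output
print(f"{cells:,} cells")                         # 37,108,553 cells
print(f"24-bit input: {rgb_bytes / 2**20:.1f} MiB")    # 106.2 MiB
print(f"8-bit output: {class_bytes / 2**20:.1f} MiB")  # 35.4 MiB
```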

5. RECOMMENDATIONS FOR FUTURE RESEARCH

In addition to remaining work to improve the road extraction process, other areas of future research include:


6. CONCLUSIONS

The combined success of the Lockheed Martin Autographics software and the TEC-developed knowledge-based rules provides a proof of concept for the semi-automated extraction of roads from scanned maps. Although the commercial Autographics package can provide a preliminary road network, its output still contains too much noise to be useful as a GIS vector data layer. TEC's raster rules, which combine ArcInfo functions with standard geometric descriptors and compactness measures, significantly improve the resulting road network. Early indications are that the semi-automated map parsing approach presented in this paper may help the Army meet its goal of rapid database production.

REFERENCES

1. Autographics User's Guide (Lockheed Martin, 1995), pp. 3-4.
2. Cell-based Modeling with GRID (Esri, 1994), pp. 310-328.
3. Statistics in Geography (David Ebdon, 1981), pp. 119-120.


Brian Graff, Cartographer
U.S. Army Topographic Engineering Center
USATEC-TD-TD
7701 Telegraph Road
Alexandria, VA 22315-3864
Phone: 703-428-6071
FAX: 703-428-6176
E-mail: bgraff@tec.army.mil