The U.S. Army Topographic Engineering Center (TEC) is investigating
semi-automated methods to extract roads from scanned color topographic
maps. The first step in this process is to extract a preliminary
road network using a commercial software package that applies
a neural network approach. This results in a raster file containing
a rough road network together with numerous non-road artifacts
such as speckles and sporadic text. Knowledge-based rules that
use geometric descriptors and compactness measures related to
the linear nature of roads were written by TEC using ArcInfo's
Arc Macro Language (AML). These rules are used to eliminate the
artifacts and retain the roads. The resulting road network serves
as a vector transportation layer suitable for use in a GIS.
_________________________________________________________________________________
1. INTRODUCTION
Data used in a Geographic Information System (GIS) commonly consist of various attributed vector data layers. Of the many possible layers, the transportation layer is frequently one of the most significant thematic layers. Multiple sources of digital vector transportation layers currently exist for the United States. Nationwide road coverage exists in the U.S. Census Bureau's TIGER files and in U.S. Geological Survey (USGS) Digital Line Graphs (DLGs) at a scale of approximately 1:100,000. Private vendors also market custom-made vector transportation layers at larger scales, but these layers exist primarily for metropolitan areas.
However, this type of data is not available for many areas of the world or at scales required for certain types of analyses. Global digital coverage of roads is provided by the National Imagery and Mapping Agency's (NIMA) Digital Chart of the World (DCW) at a scale of 1:1,000,000. Vector transportation layers at intermediate scales of 1:250,000, or at larger scales of 1:50,000, do not exist over large portions of the world. These scales are required to support many military operations.
2. OBJECTIVES
In order to meet military requirements, the Army must rapidly build a GIS database containing roads over an area of interest at a scale larger than 1:1,000,000. The U.S. Army Corps of Engineers Topographic Engineering Center (TEC) is working on this problem as part of its internal Digital Terrain Data Generation (DTDG) research effort. The traditional approach is to manually digitize and attribute roads from existing hardcopy maps or imagery. However, this is time-consuming and labor-intensive. Semi-automated and automated methods are required to rapidly produce the required data. Strictly automated approaches to extracting roads from remotely sensed imagery show promise but are still relatively immature.
TEC has built a prototype of a semi-automated method to identify
potential roads on scanned color topographic maps, isolate them,
and then vectorize the resulting road network. This method is
capable of taking advantage of the following:
3. METHODOLOGY
Digital color map products distributed by NIMA called Arc Digitized Raster Graphics (ADRGs) are being used as input to this map parsing procedure. ADRGs are created at NIMA by scanning either a paper Joint Operations Graphic (JOG) or a Topographic Line Map (TLM). These products are scanned at 240 Dots Per Inch (DPI) and are georeferenced prior to distribution. This makes them more desirable than a locally scanned paper map, which would then have to be manually georeferenced. JOG and TLM ADRGs are available for approximately 30% of the world.
It should be noted that while ADRGs were used as input in the
developed method, the principles described in this paper apply
to any color map that is scanned and then subsequently georeferenced.
The map parsing steps used to extract a road network from a digital
color map are outlined in Diagram A.
B) Extract preliminary road network. Lockheed Martin's Autographics software is a commercial automated feature extraction program used to create raster geographic databases from hardcopy map sources [1]. Autographics accepts a SunRaster raster file as input. Input color map files are usually 24 bits per pixel, 8 bits for each of the three color channels. Once the SunRaster file is imported, the user manually trains the software's neural network to recognize various thematic categories by identifying representative pixels from each category to be classified. Once the training is complete, the image is automatically classified based on the manually identified training pixels. Output from this process is an 8-bit SunRaster file. Output files can contain multiple thematic layers, or layers can be broken apart and stored as separate raster files. For example, it is possible to output roads and contours in two separate files. For our purposes the output is a SunRaster bit map containing a preliminary road network.
C) Apply knowledge-based rules. The road bit map output from the Autographics software is imported into ArcInfo using the GRID IMAGEGRID command. The World File created in Step A is used to georeference the new grid to the same coordinate system as the original ADRG.
The preliminary road bit map contains numerous non-road artifacts such as speckles and sporadic text. This bit map is processed using an ArcInfo AML written by TEC to eliminate most of the artifacts while retaining the roads. This AML uses knowledge-based rules that apply GRID and ARC functions together with identified geometric road descriptors and compactness measures. This AML is discussed in more detail later in the paper.
D) Perform manual editing. Despite the best efforts of the postprocessing AML, non-road artifacts frequently remain. ArcInfo's ArcScan is used to eliminate noise that may still exist and to connect roads that may be disjoint.
E) Perform raster to vector conversion. Prior to vectorization, the grid road network is thinned using the THIN function. The remaining grid is now ready for vectorization. The ArcInfo GRIDLINE function is used to accomplish the vectorization. Once the roads are vectorized, they can be manually attributed.
Applying Knowledge-Based Rules -- A Detailed Description
As introduced in Step C above, knowledge-based rules were developed to
refine the preliminary road network identified by the Autographics
software. These rules are a combination of ArcInfo ARC commands and
GRID functions together with geometric road descriptors and compactness
measures. The parameters used in the formulation of the rules were obtained
by interactive query of the original and intermediate grids used
in this step. The rules were refined by application to multiple
data sets. The substeps involved in the knowledge-based rule application
are outlined in Diagram B.
b) Eliminate very small regions. The BOUNDARYCLEAN function is used to eliminate very small regions within larger regions. This is useful because holes (noise) sometimes exist within large road regions. If these holes are left within the road regions, they can cause adverse effects during the raster-to-vector conversion process. For example, if the hole is too big, the vector may be split in two in order to go around the hole. The two vectors would then be merged after bypassing the hole. It may be necessary to run the BOUNDARYCLEAN function several times to produce the desired result [2]. Diagram E shows the result of two passes of the BOUNDARYCLEAN function.
c) Reassign cells to unique regions. Because many regions are eliminated as a result of BOUNDARYCLEAN, it is useful to assign new IDs to the remaining regions by applying REGIONGROUP again.
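The idea of giving each remaining region a unique ID can be illustrated outside ArcInfo. The Python sketch below is not the TEC AML and does not call REGIONGROUP; it is a minimal stand-in that labels connected groups of road cells. The function names `label_regions` and `region_sizes` are hypothetical, background is represented by 0 rather than NODATA, and diagonal neighbors are assumed to belong to the same region.

```python
from collections import deque


def label_regions(grid):
    """Give every connected group of 1-cells its own positive ID.

    grid: list of rows; 1 marks a candidate road cell, 0 marks background.
    Returns a grid of the same shape holding 0 for background and 1..n
    for regions. Assumption: diagonal neighbors are connected.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or labels[r][c] != 0:
                continue  # background, or already assigned to a region
            next_id += 1
            labels[r][c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                cr, cc = queue.popleft()
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and grid[nr][nc] == 1
                                and labels[nr][nc] == 0):
                            labels[nr][nc] = next_id
                            queue.append((nr, nc))
    return labels


def region_sizes(labels):
    """Count the cells in each labeled region (background 0 is skipped)."""
    sizes = {}
    for row in labels:
        for value in row:
            if value:
                sizes[value] = sizes.get(value, 0) + 1
    return sizes
```

Running the labeling again after regions have been removed simply renumbers whatever is left, which is the effect described in this step.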
d) Eliminate additional non-road regions based on size. Based on an interactive query of the grid resulting from the previous step, it was determined that road features tend to contain between 30 and 100,000 cells. Regions of 30 cells or fewer are most likely noise that is too large to be eliminated by the BOUNDARYCLEAN function. Regions of 100,000 cells or more belong to the background and not to road features. The CON function is used to assign a NODATA value to regions outside the road range, that is, to regions of 30 cells or fewer and to regions of 100,000 cells or more.
These size bounds appear to apply to both JOGs and TLMs scanned at 240 DPI. If the same maps were scanned at a higher resolution, for example 1000 DPI, each region would contain more cells, so the lower and upper bounds would have to increase accordingly. In addition to further eliminating noise and background regions, this step also makes the following steps run faster. Diagram F shows the result of building new regions and then eliminating regions of 30 cells or fewer and regions of 100,000 cells or more.
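The size rule and its dependence on scan resolution can be sketched in a few lines of Python. This is an illustration only, not the CON expression used in the AML. The names `keep_by_size` and `scale_bounds` are hypothetical, and the quadratic scaling is our own inference from the fact that the number of cells covering a fixed map area grows with the square of the scan resolution; the paper itself states only that the bounds must increase.

```python
def keep_by_size(region_sizes, low=30, high=100_000):
    """Return the IDs of regions whose cell count lies strictly between
    the bounds; everything else would be set to NODATA."""
    return {rid for rid, cells in region_sizes.items() if low < cells < high}


def scale_bounds(low, high, base_dpi, new_dpi):
    """Rescale cell-count bounds for a different scan resolution.

    Assumption: cell counts scale with the square of the DPI ratio.
    """
    factor = (new_dpi / base_dpi) ** 2
    return low * factor, high * factor


# Hypothetical region sizes, keyed by region ID.
sizes = {1: 12, 2: 30, 3: 31, 4: 5000, 5: 100_000, 6: 250_000}
roads = keep_by_size(sizes)  # regions 3 and 4 survive

# Doubling the resolution quadruples both bounds under this assumption.
low_480, high_480 = scale_bounds(30, 100_000, base_dpi=240, new_dpi=480)
```

Under the same assumption, moving from 240 DPI to 1000 DPI would multiply both bounds by roughly 17.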
e) Calculate standard geometric statistics for each region. In order to apply rules based on the linear characteristic of roads, geometric statistics related to each region in the grid must be obtained. The ZONALGEOMETRY function is applied to compute the required geometric descriptors. Output from this function is an INFO table containing area, perimeter, thickness, x-centroid, y-centroid, length of major axis, length of minor axis and orientation angle for each region. The JOINITEM command in ArcInfo is used to join the INFO table containing the geometric statistics to the Value Attribute Table (.VAT) of the region.
f) Calculate measures of compactness for each region. These measures use the geometric statistics, calculated in the previous step, for each region. Because roads are linear, not compact, measures of compactness will be used in rules that eliminate features if they are compact (non-linear). The two measures of region compactness used in this method are:
Compact2 = (4*Area) /(3.14*Diameter^2)
Compact2 results in a lower value the more linear the region. A detailed description of this measure can be found in Ebdon [3].
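As a quick check on how Compact2 behaves, the formula can be evaluated for two idealized shapes. The Python sketch below is illustrative only. It uses `math.pi` in place of the rounded 3.14, and it assumes that Diameter means the longest extent of the region, which the text does not spell out.

```python
import math


def compact2(area, diameter):
    """Compact2 = (4 * Area) / (pi * Diameter^2).

    Assumption: diameter is the longest extent of the region.
    """
    return (4.0 * area) / (math.pi * diameter ** 2)


# A disc of radius 10 cells: area = pi * r^2, diameter = 2 * r.
disc = compact2(math.pi * 10 ** 2, 20)

# A road-like strip 200 cells long and 4 cells wide; the longest
# extent of a rectangle is its diagonal.
strip = compact2(200 * 4, math.hypot(200, 4))
```

The disc evaluates to 1, the most compact case, while the long thin strip evaluates to a value close to 0, consistent with the statement that more linear regions score lower.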
g) Use geometric rules to eliminate non-linear regions. Based on an interactive query of the resulting compactness measures, rules were developed that preserve road features and eliminate non-road features. As applied to test cases, results of these rules appear consistent from sheet to sheet within a series. The rules are as follows:
If Compact1 is greater than 2 and Compact2 is less than 1, the
region is linear in almost all cases. However, if a region is a
road that sprawls over a large portion of the map, Compact2
may misclassify it as non-road. To ensure that
major roads are not inadvertently removed, regions containing
more than 500 cells are retained.
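The compactness rule can be written as a single predicate. The Python sketch below is a paraphrase of the rule as worded above, not the AML itself. The name `passes_compactness` is hypothetical, and Compact1 is taken as a precomputed input because its formula is not reproduced in this text.

```python
def passes_compactness(compact1, compact2, cell_count, large_region_cells=500):
    """Return True if a region should be kept after the compactness test.

    A region is kept when both measures indicate a linear shape, or when
    it is large enough that it is retained regardless of the measures.
    """
    looks_linear = compact1 > 2 and compact2 < 1
    return looks_linear or cell_count > large_region_cells


# Hypothetical regions: (Compact1, Compact2, number of cells).
candidates = {
    "thin_segment": (3.5, 0.05, 120),   # linear by both measures
    "blob": (1.1, 0.80, 60),            # fails the Compact1 threshold
    "sprawling_road": (1.5, 0.90, 2400),  # retained because of its size
}
kept = {name for name, args in candidates.items() if passes_compactness(*args)}
```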
Regions that pass the compactness tests are then tested to see whether they fall within the minimum and maximum thickness designated for roads. Any region that falls below the minimum thickness or above the maximum thickness is eliminated.
Thickness is defined as the radius (in cell units) of the largest circle that can be drawn inside a region. In the rule below, the variables %lowthick% and %highthick% indicate the minimum and maximum thickness for a region to be designated a road. In the case of a TLM with a cell size of 0.162 decimal seconds, the minimum thickness is 0.2592 decimal seconds and the maximum thickness is 2 decimal seconds. In terms of number of cells, the minimum thickness is approximately a 2-cell radius and the maximum thickness is approximately a 12-cell radius.
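The conversion between thickness in decimal seconds and thickness in cells is simple division by the cell size. The Python sketch below works through the TLM figures quoted above and expresses the thickness test as a predicate. It is illustrative only and is not the AML rule; the function names are hypothetical, and the parameters `lowthick` and `highthick` merely echo the AML variable names.

```python
def thickness_in_cells(thickness, cell_size):
    """Convert a thickness in map units to a radius in cells."""
    return thickness / cell_size


def passes_thickness(thickness, lowthick, highthick):
    """Keep a region only if its thickness lies within the road range.

    Regions below the minimum or above the maximum are eliminated, so
    both bounds are treated as inclusive here.
    """
    return lowthick <= thickness <= highthick


CELL_SIZE = 0.162  # decimal seconds, the TLM case described in the text
min_cells = thickness_in_cells(0.2592, CELL_SIZE)  # about 1.6 cells
max_cells = thickness_in_cells(2.0, CELL_SIZE)     # a little over 12 cells
```

The quoted 2-cell and 12-cell radii are therefore rounded values of the thresholds expressed in decimal seconds.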
h) Remove remaining island regions. Despite the use of
filters and geometric tests, some regions that are not roads
still remain. If such a region is a single linear artifact,
totally isolated from other regions and containing fewer than
4000 cells, it is eliminated. By interactive query of the grid, it was
determined that isolated regions of fewer than 4000 cells represent noise,
not roads. The results of removing island regions are shown in
Diagram H. The following rules were developed to remove these
regions:
if (variety ge 1)
bit_grid = 1
else
bit_grid = NODATA
endif
i) Connect segmented road regions. In order to be used for analysis, all roads in the final vector coverage must be connected lines, regardless of their representation on the paper map or ADRG. The FOCALMAX function, with a 15 by 15 window, is used to expand the remaining road regions in order to join segmented road regions prior to vectorization. Thus all road regions are expanded by 7 cells in each direction. The size of the window was determined by interactive query of the road regions up to this point. Diagram I shows the effects of expanding the regions to join the road segments.
Even after running the FOCALMAX function, not all of the segments are joined. The 15 by 15 window was designed to be large enough to join road segments representing dashed lines. Road segments that remain unjoined are separated by gaps too wide for the 15 by 15 window to bridge, that is, they are farther apart than dashed road segments normally are.
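The gap-closing effect of a focal maximum can be demonstrated on a tiny grid. The Python sketch below is a stand-in for the GRID FOCALMAX function, not a call to it. The name `focal_max` is hypothetical, background is represented by 0 rather than NODATA, and the window is clipped at the grid edges. The example uses a 5 by 5 window so that the grid stays small; the same logic applies to the 15 by 15 window, which expands regions by 7 cells and so can bridge gaps of up to about 14 cells along a row or column.

```python
def focal_max(grid, window=15):
    """Replace each cell with the maximum value found in the square
    window centered on it (window should be odd)."""
    half = window // 2
    rows, cols = len(grid), len(grid[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Clip the window where it extends past the grid edge.
            r0, r1 = max(0, r - half), min(rows, r + half + 1)
            c0, c1 = max(0, c - half), min(cols, c + half + 1)
            out[r][c] = max(grid[rr][cc]
                            for rr in range(r0, r1)
                            for cc in range(c0, c1))
    return out


# Two dashes separated by a 4-cell gap: a 5 by 5 window expands each
# dash by 2 cells, so the gap closes.
dashed = [[1, 1, 0, 0, 0, 0, 1, 1]]
joined = focal_max(dashed, window=5)

# A 5-cell gap is one cell too wide for the same window.
too_far = [[1, 0, 0, 0, 0, 0, 1]]
still_split = focal_max(too_far, window=5)
```

The second case mirrors the situation described above, in which segments farther apart than the window can reach remain disconnected and are left for manual editing.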
This map parsing prototype was tested on several ADRGs with varying road network densities. A visual comparison of the resulting road networks showed a favorable correspondence with the original ADRG roads. Diagram J shows a portion of the vector road network that resulted from application of this method on one TLM ADRG having an extensive road network. This particular grid was vectorized without going through the manual edit phase to illustrate why the manual edit phase is required. Note that gaps and spurs do exist.
The experimental method was applied to a representative TLM ADRG,
containing 5561 rows and 6673 columns. The following are the time
estimates for processing the significant portions of the map parsing
operation on a Sun SPARCstation 20:
Detailed time comparisons have not yet been performed to see if
the semi-automated approach is faster than manual digitizing.
However, it should be noted that the human operator's portion
of the time spent in the semi-automated approach is proportionally
small compared to the proportion of the time that the computer
is working.
5. RECOMMENDATIONS FOR FUTURE RESEARCH
In addition to remaining work to improve the road extraction process, other areas of future research include:
6. CONCLUSIONS
The combined success of the Lockheed Martin Autographics
software and the TEC-developed knowledge-based rules provides a
proof of concept for the semi-automated extraction of roads from
scanned maps. Although the commercial Autographics software
package is capable of providing a preliminary road network, that
network still contains too much noise to be useful as a GIS vector
data layer. The use of TEC's raster rules, which combine ArcInfo
functions with standard geometric descriptors and compactness
measures, significantly improves the resulting road network. Early
indications are that the semi-automated map parsing approach
presented in this paper may help the Army meet its goal of rapid
database production.
REFERENCES
1. Autographics User's Guide (Lockheed Martin, 1995), pp.
3-4.
2. Cell-based Modeling with GRID (Esri, 1994), pp. 310-328.
3. Statistics In Geography (David Ebdon, 1981), pp. 119-120.
Brian Graff, Cartographer
U.S. Army Topographic Engineering Center
USATEC-TD-TD
7701 Telegraph Road
Alexandria, VA 22315-3864
Phone: 703-428-6071
FAX: 703-428-6176
E-mail: bgraff@tec.army.mil