Stanley L. Dallal

Automated Attribution of Shapefile Data through Map Merging

ABSTRACT

Map merging, or conflation, is the process of merging two vector maps to produce a map with superior positional accuracy and richer attribution. Conflation often involves a high degree of costly manual work to match features between two maps and update their positions and attributes. ESEA has developed an automated conflation system that significantly reduces the effort involved in merging vector maps. ESEA's conflation system implements sophisticated matching algorithms to rapidly identify corresponding features between two maps and to automatically transfer attribute information between these matched features. ESEA's capabilities were used in a USGS evaluation of current conflation technology.

BACKGROUND

Conflation is the process of merging two digital geographic datasets that cover overlapping regions to produce a superior target dataset. Through the conflation process, individual strengths of the source datasets can be combined. For example, a dataset with excellent spatial accuracy but little attribute information can be merged with a dataset with rich attribute information but poor spatial accuracy to produce a map that is both spatially accurate and attribute rich.

Conflation is a complex process of matching and combining the spatial information and attributes of source dataset features that represent the same earth entity. Accomplishing conflation with currently used methods can be tedious and error-prone because of the high level of manual work required. Automating the conflation process reduces its associated time and cost. Various levels of automation can be achieved depending on the similarity of the source datasets. In some situations, no level of automation can entirely eliminate the need for manual pre-conflation and post-conflation data processing. The goal of an automated conflation system is to automate the conflation process to the extent the source data permits, thereby significantly reducing the need for manual data processing.

ESEA has developed software to support vector map conflation, written entirely in C++ and using object-oriented programming techniques. This paper summarizes an automated conflation task in which roadway centerline data was conflated using ESEA's Conflation System (ECS). ESEA performed this conflation on behalf of the USGS Mid-Continent Mapping Center as part of their effort to evaluate commercially available conflation software.

DATA DESCRIPTION

ESEA received two ArcInfo coverages containing road data for Douglas County, GA from the USGS. The first coverage was 1:24,000 Georgia Department of Transportation (DOT) data containing over 4000 arcs. The second coverage was 1:100,000 U.S. Census Bureau Tiger data for the same region, containing over 5000 arcs.

ECS uses the shapefile format for import and export of data. To perform the USGS conflation task, ESEA imported the DOT and Tiger export format (.E00) files into ArcInfo coverages and then used the ArcInfo ARCSHAPE command to generate the shapefiles to be imported by ECS. The general ARCSHAPE command line used to create shapefiles was:

arcshape <input cover> arcs <output shapefile name>

The resulting shapefiles generated by ArcInfo were then imported into ECS for the conflation process.

CONFLATION DESCRIPTION

Base and Non-Base Coverages

ECS performs conflation on two coverages at a time. The coverage identified as more spatially accurate is referred to as the base coverage. The geometry of the base coverage is anchored and is not modified during the conflation process. The other, less spatially accurate coverage is referred to as the non-base coverage. The non-base geometry is transformed via rubber-sheeting to match the base geometry during conflation. The DOT shapefile data was chosen to be the base coverage. Thus all features derived from the DOT geometry kept the same positional values. The geometry of conflated features derived from Tiger geometry was transformed to be consistent with the DOT geometry.

Figure 1 shows a subregion of the DOT base map in yellow and the Tiger non-base map in orange. The Tiger features are offset from the DOT features in different directions and by varying amounts throughout the road network.

Figure 1: Section of DOT and Tiger Road Coverages

The Conflation Process

The conflation process is carried out with ECS in three steps: node matching, line matching, and feature merging. Each of these steps is discussed in turn below.

Node matching is performed to create rubber-sheeting transformations and to match node features. Distance, topological and attribution measures are used for matching nodes.

In searching for candidate base and non-base node matches, ECS only considers node pairs within an operator-specified match distance. All base and non-base node pairs separated by more than this match distance are excluded as node match candidates.
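
The distance filter described above can be illustrated with a minimal Python sketch. This is not the ECS implementation (which is written in C++); the function name `candidate_node_pairs`, the dictionary layout, and the sample coordinates are assumptions made for illustration only.

```python
import math

def candidate_node_pairs(base_nodes, nonbase_nodes, match_distance):
    """Return (base_id, nonbase_id, distance) for every node pair whose
    separation does not exceed match_distance (hypothetical helper)."""
    pairs = []
    for b_id, (bx, by) in base_nodes.items():
        for n_id, (nx, ny) in nonbase_nodes.items():
            d = math.hypot(bx - nx, by - ny)
            # Pairs farther apart than the match distance are excluded.
            if d <= match_distance:
                pairs.append((b_id, n_id, d))
    # Closest candidates first.
    pairs.sort(key=lambda p: p[2])
    return pairs

# Illustrative coordinates in map units (not taken from the USGS data).
base = {"B1": (0.0, 0.0), "B2": (100.0, 0.0)}
nonbase = {"N1": (3.0, 4.0), "N2": (100.0, 60.0)}
print(candidate_node_pairs(base, nonbase, match_distance=10.0))
# -> [('B1', 'N1', 5.0)]
```

The brute-force double loop is adequate for a sketch; for coverages with thousands of nodes, a spatial index would normally replace it.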

Along with the distance between candidate matching nodes, the number and distribution of lines that meet at the node is an important measure of similarity. The ECS operator can specify the relative importance of this type of matching.
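
One possible way to quantify this topological measure is to compare the bearings of the lines leaving each node. The Python sketch below is an assumption about how such a measure could be built, not a description of the ECS algorithm; the 30-degree tolerance, the greedy pairing, and the `topo_weight` parameter (standing in for the operator-specified relative importance) are all hypothetical.

```python
import math

def bearings(node, neighbors):
    """Bearings in degrees (0 to 360) of the lines leaving node."""
    x0, y0 = node
    return sorted(
        math.degrees(math.atan2(y - y0, x - x0)) % 360.0 for x, y in neighbors
    )

def angular_gap(a, b):
    """Smallest angle in degrees between two bearings."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def topology_score(base_bearings, nonbase_bearings, tolerance=30.0):
    """Fraction of incident lines that can be paired within tolerance.
    Returns a value in [0, 1]; 1.0 means same count and similar directions."""
    if not base_bearings or not nonbase_bearings:
        return 0.0
    unused = list(nonbase_bearings)
    paired = 0
    for a in base_bearings:
        # Greedily take the closest remaining non-base bearing.
        best = min(unused, key=lambda b: angular_gap(a, b), default=None)
        if best is not None and angular_gap(a, best) <= tolerance:
            unused.remove(best)
            paired += 1
    return paired / max(len(base_bearings), len(nonbase_bearings))

def combined_score(distance_score, topo_score, topo_weight):
    """Blend two measures; topo_weight in [0, 1] is the assumed operator setting."""
    return (1.0 - topo_weight) * distance_score + topo_weight * topo_score

four_way = [0.0, 90.0, 180.0, 270.0]
print(topology_score(four_way, [5.0, 95.0, 185.0, 275.0]))  # rotated four-way
print(topology_score(four_way, [0.0, 90.0, 180.0]))         # T-intersection
```

Under these assumptions a slightly rotated four-way intersection scores 1.0 against another four-way intersection, while a T-intersection scores 0.75.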

ECS can use a comparison of attribute values associated with line or node features as an additional node match measure. If arcs are present, the match is performed on attributes of the arcs incident at the nodes being compared. This match measure is effective at finding unique intersections of attributed arcs. For instance, it will match the intersection of Oak Street and Main Street between two road coverages. When appropriate attributes exist in the two maps, such as street names, this measure can clearly generate very high confidence node matches.
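
As a hedged illustration of the attribute measure, the Python sketch below compares the sets of street names on the arcs incident at two nodes. The overlap ratio (Jaccard similarity) and the name normalization are assumptions chosen for simplicity; the paper does not state which comparison ECS uses.

```python
def name_set(incident_arc_names):
    """Normalize the street names of the arcs meeting at a node."""
    return {
        name.strip().upper()
        for name in incident_arc_names
        if name and name.strip()
    }

def attribute_score(base_names, nonbase_names):
    """Jaccard similarity of the two name sets, in [0, 1] (assumed measure)."""
    a, b = name_set(base_names), name_set(nonbase_names)
    if not a or not b:
        return 0.0  # no usable attributes on one side
    return len(a & b) / len(a | b)

# The Oak Street / Main Street intersection in both coverages.
print(attribute_score(["Oak St", "Main St"], ["MAIN ST", "oak st "]))  # -> 1.0
# Only one street name in common.
print(attribute_score(["Oak St", "Main St"], ["Main St", "Elm St"]))
```

A real street-name comparison would also need to reconcile abbreviations such as "St" and "Street", which this sketch does not attempt.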

The matched node pairs are used to generate a rubber-sheeting transformation that brings the non-base coverage into better alignment with the base coverage. Rubber-sheeting and node matching proceed iteratively. Each iteration produces a new transformation that brings the coverages into better alignment, possibly causing some nodes that did not match previously to now match and become anchor points for a new rubber-sheeting transformation. The rubber-sheeting and node matching proceed until no new node matches are found.
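
The iteration can be sketched in Python as follows. The paper does not specify the form of the ECS rubber-sheeting transformation, so this sketch substitutes a simple inverse-distance-weighted displacement of the anchor offsets; that transform, the greedy distance-only matcher, and all function names are assumptions made for illustration.

```python
import math

def rubber_sheet(point, anchors, power=2.0):
    """Shift point by an inverse-distance-weighted blend of anchor offsets.
    anchors is a list of ((nx, ny), (bx, by)) non-base to base node pairs."""
    x, y = point
    wsum = dx = dy = 0.0
    for (nx, ny), (bx, by) in anchors:
        d = math.hypot(x - nx, y - ny)
        if d == 0.0:
            return (bx, by)  # an anchor lands exactly on its base node
        w = 1.0 / d ** power
        wsum += w
        dx += w * (bx - nx)
        dy += w * (by - ny)
    if wsum == 0.0:
        return point  # no anchors yet: leave the point where it is
    return (x + dx / wsum, y + dy / wsum)

def match_nodes(base, moved, match_distance):
    """Greedy one-to-one matching by distance; returns {nonbase_id: base_id}."""
    candidates = sorted(
        (math.hypot(bx - mx, by - my), n_id, b_id)
        for b_id, (bx, by) in base.items()
        for n_id, (mx, my) in moved.items()
    )
    matches, used_base = {}, set()
    for d, n_id, b_id in candidates:
        if d <= match_distance and n_id not in matches and b_id not in used_base:
            matches[n_id] = b_id
            used_base.add(b_id)
    return matches

def iterate_matching(base, nonbase, match_distance, max_iter=20):
    """Alternate matching and rubber-sheeting until no new matches appear."""
    matches, moved = {}, dict(nonbase)
    for _ in range(max_iter):
        new = match_nodes(base, moved, match_distance)
        if len(new) <= len(matches):
            break  # no new node matches were found
        matches = new
        anchors = [(nonbase[n], base[b]) for n, b in matches.items()]
        # Always transform the original non-base coordinates.
        moved = {n: rubber_sheet(p, anchors) for n, p in nonbase.items()}
    return matches

base = {"B1": (0.0, 0.0), "B2": (100.0, 0.0), "B3": (200.0, 0.0)}
nonbase = {"N1": (8.0, 0.0), "N2": (109.0, 0.0), "N3": (212.0, 0.0)}
print(match_nodes(base, nonbase, 10.0))       # first pass: N3 is 12 units away
print(iterate_matching(base, nonbase, 10.0))  # N3 matches after rubber-sheeting
```

In this illustrative data, node N3 lies outside the match distance on the first pass, but the transformation built from the first two anchors moves it close enough to match on the second pass.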

The operator may view the node matches to determine if the anchor points found are sufficient for building a good rubber-sheeting transformation. The operator can also choose to manually add or remove individual node matches. The operator may then relax the node match criteria (for instance, the match distance) and start a new match iteration to find more anchor points. Alternatively, an iteration may be redone with stricter criteria if poorly qualified anchor points were found.

Figure 2: DOT and Tiger Road Coverages After Automatic Node Matching

The results of automatic node matching are shown in Figure 2. The green lines denote node matches that will be used as anchor points in the rubber-sheeting transformation. The blue lines are node matches that ECS has less confidence in and are consequently rejected from being used as anchor points in rubber-sheeting. In this region, ECS found two matches that were determined to be incorrect. These incorrect matches were manually deleted and replaced with correct matches. The manually added matches will be used as anchor points in the rubber-sheeting transformation. The modified node match results are shown in Figure 3 below.

Figure 3: DOT and Tiger Road Coverages After Modifying Node Match Results

Line matching proceeds once node matching has been completed to the operator's satisfaction. For each line to be matched, a region is considered within an operator-specified distance from the line. A path in the other map is considered for matching if it lies within this distance. In addition to geometry, line matching can also use attribute information to help identify matches.
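
A minimal Python sketch of the geometric test is given below. It treats a path as lying within the region if every one of its vertices is within the match distance of the line; checking vertices only is a simplifying assumption, and the function names are hypothetical rather than part of ECS.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    vx, vy = bx - ax, by - ay
    seg_len2 = vx * vx + vy * vy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Parameter of the perpendicular foot, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * vx + (py - ay) * vy) / seg_len2))
    return math.hypot(px - (ax + t * vx), py - (ay + t * vy))

def point_line_distance(p, line):
    """Distance from p to the nearest segment of a polyline."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))

def path_within_distance(path, line, match_distance):
    """True if every vertex of path is within match_distance of line."""
    return all(point_line_distance(p, line) <= match_distance for p in path)

line = [(0.0, 0.0), (100.0, 0.0)]
near_path = [(0.0, 5.0), (50.0, 8.0), (100.0, 4.0)]
far_path = [(0.0, 5.0), (50.0, 30.0)]
print(path_within_distance(near_path, line, 10.0))  # -> True
print(path_within_distance(far_path, line, 10.0))   # -> False
```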

Often a single arc in one map does not correspond to a single arc in the other map. Three of these cases are highlighted in Figure 4. Towards the top of the figure, two DOT arcs match a single Tiger arc (a two-to-one match). Likewise, there are two one-to-two matches highlighted where one DOT arc matches two Tiger arcs. These must be converted into one-to-one matches in order for the Tiger attributes to be transferred across to the DOT data. In these situations ECS will automatically add a node to the single arc creating two one-to-one matches. Partial and many-to-many line matches are treated similarly.
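
The node insertion can be sketched as splitting the single arc at the location nearest to the junction of the two arcs it matches. The Python below is an illustrative assumption about how such a split could be done; `split_arc_at` is a hypothetical name and the coordinates are invented.

```python
def split_arc_at(arc, point):
    """Split a polyline at the location nearest to point.
    Returns (first_part, second_part), which share the new node."""
    if len(arc) < 2:
        raise ValueError("an arc needs at least two vertices")
    px, py = point
    best = None  # (squared distance, segment index, projected point)
    for i, ((ax, ay), (bx, by)) in enumerate(zip(arc, arc[1:])):
        vx, vy = bx - ax, by - ay
        seg_len2 = vx * vx + vy * vy
        t = 0.0 if seg_len2 == 0 else ((px - ax) * vx + (py - ay) * vy) / seg_len2
        t = max(0.0, min(1.0, t))  # stay on the segment
        q = (ax + t * vx, ay + t * vy)
        d2 = (px - q[0]) ** 2 + (py - q[1]) ** 2
        if best is None or d2 < best[0]:
            best = (d2, i, q)
    _, i, q = best
    first = list(arc[: i + 1])
    second = list(arc[i + 1:])
    # Add the new node unless it coincides with an existing vertex.
    if first[-1] != q:
        first.append(q)
    if second[0] != q:
        second.insert(0, q)
    return first, second

# One DOT arc matched to two Tiger arcs that meet near (50, 7).
dot_arc = [(0.0, 0.0), (100.0, 0.0)]
first, second = split_arc_at(dot_arc, (50.0, 7.0))
print(first)   # -> [(0.0, 0.0), (50.0, 0.0)]
print(second)  # -> [(50.0, 0.0), (100.0, 0.0)]
```

Each half can then be paired one-to-one with the corresponding arc in the other map. The sketch does not handle the degenerate case in which the nearest location is an endpoint of the arc.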

Figure 4: Line Match Conditions

The next step in the ECS conflation process is feature merging, which consists of selecting the desired features and attributes to include in the target conflated data set. Target features can be any combination of matched features, unmatched base coverage features and unmatched non-base features. Any combination of base and non-base attributes can be transferred to the target coverage.

The road network shown in Figure 4 contains a number of features that have no corresponding feature in the other map. A few of these are highlighted as either "unmatched base" or "unmatched non-base." In a manual conflation, special care must be taken in merging the unmatched non-base features to avoid making them unconnected roads in the target map. To handle this case, ECS automatically adjusts these features to connect them to the appropriate base feature. In this way, the correct road network connectivity is generated in the target map. A target map resulting from transferring all matched and unmatched features is shown in Figure 5. It has the DOT coordinates and the combined attributes from both the DOT and Tiger datasets.
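
The paper does not detail how ECS performs this adjustment. As one simplified possibility, the Python sketch below snaps the endpoints of an unmatched non-base feature to the nearest base node within a tolerance; the function name, the tolerance parameter, and the restriction to node-to-node snapping are all assumptions.

```python
import math

def snap_endpoints(feature, base_nodes, snap_distance):
    """Move each endpoint of feature onto the nearest base node if that
    node lies within snap_distance; interior vertices are left alone."""
    def snap(p):
        nearest = min(
            base_nodes,
            key=lambda n: math.hypot(n[0] - p[0], n[1] - p[1]),
            default=None,
        )
        if nearest is not None and math.hypot(
            nearest[0] - p[0], nearest[1] - p[1]
        ) <= snap_distance:
            return nearest
        return p

    if len(feature) < 2:
        return list(feature)
    return [snap(feature[0])] + list(feature[1:-1]) + [snap(feature[-1])]

# An unmatched Tiger road whose start point sits 5 units from a DOT node.
tiger_road = [(3.0, 4.0), (50.0, 50.0), (100.0, 100.0)]
dot_nodes = [(0.0, 0.0), (200.0, 200.0)]
print(snap_endpoints(tiger_road, dot_nodes, 10.0))
# -> [(0.0, 0.0), (50.0, 50.0), (100.0, 100.0)]
```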

Figure 5: Merged Map Containing Both DOT and Tiger Attributes

EXPORT OF DATA TO SHAPEFILES

Once the conflation process is finished and the ECS operator is satisfied with the results, the next step is to export the conflated data from ECS to shapefiles.

ECS provides the ability to enter and use default attribute values. These default attribute values may be used to flag specific feature attributes that have no data associated with them. For example, unmatched DOT features were included in the conflated target coverage. Because these features have no Tiger counterpart, their Tiger attributes in the target coverage were set to the operator-specified default values to indicate that these attribute values may need additional processing.
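
The use of default values can be illustrated with a short Python sketch. The field names (`DOT_ID`, `TIGER_NAME`, `TIGER_CLASS`) and the default values shown are hypothetical and are not the attributes used in the USGS task.

```python
def merge_attributes(base_attrs, nonbase_attrs, nonbase_fields, defaults):
    """Combine base attributes with the requested non-base fields.
    Fields with no non-base data receive the operator's default value."""
    merged = dict(base_attrs)
    for field in nonbase_fields:
        if nonbase_attrs is not None and field in nonbase_attrs:
            merged[field] = nonbase_attrs[field]
        else:
            merged[field] = defaults[field]  # flags the value as missing
    return merged

fields = ["TIGER_NAME", "TIGER_CLASS"]           # hypothetical field names
defaults = {"TIGER_NAME": "UNKNOWN", "TIGER_CLASS": "-999"}

# A matched DOT feature receives the Tiger attributes.
matched = merge_attributes(
    {"DOT_ID": 17}, {"TIGER_NAME": "Main St", "TIGER_CLASS": "A41"},
    fields, defaults,
)
# An unmatched DOT feature receives the defaults.
unmatched = merge_attributes({"DOT_ID": 18}, None, fields, defaults)
print(matched)
print(unmatched)
```

Choosing default values that cannot occur in real data makes the flagged records easy to select for later processing.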

After shapefiles containing the conflated data were created by ECS export operations, these shapefiles were imported into ArcInfo using the following ArcInfo command line:

shapearc <input shapefile name> arcs <coverage name>

No problems were encountered in importing shapefiles into ArcInfo coverages. Figure 6 below provides a snapshot view of the merged map for the entire region. The blue lines represent merged DOT and Tiger features. Red represents unmatched DOT features and green represents unmatched Tiger features.

Figure 6: Snapshot of Merged Map

By automating the conflation process, ECS greatly reduces the time and effort needed to merge geographic data. All the conflation steps for this effort took less than 1 hour of CPU time in total, and approximately 12 person-hours overall. Data import took less than 4 minutes of CPU time, and exporting the conflated coverages took less than 10 minutes, including the time spent by the ECS operator entering shapefile names and default attribute values.

SUMMARY OF SYSTEM CAPABILITIES

The following is a summary of ECS's key capabilities.

Input/Output in Esri Shapefile Format: ECS uses the Esri shapefile format for import and export which allows the conflation of any GIS formats which can be converted to the shapefile specification. The shapefile input/output format thus enables ECS to conflate DLG, VPF, TIGER, ArcInfo, and any other formats which are first converted to shapefile format.

Supports Large Datasets: ECS has internal object-oriented data structures which are optimized for conflation algorithms. The resulting performance enables large datasets to be merged in a timely and cost efficient manner.

Generates Topology: ECS imports shapefile points and lines and generates a clean topology, regardless of the topology in the source shapefile data.

Automatic One-to-One Line Matching: ECS automatically adds nodes to resolve one-to-many, many-to-one, many-to-many as well as partial match relationships between arcs common to both source datasets.

User Defined Target Shapefile Contents: ECS gives the operator control in structuring the target shapefile datasets. The operator determines which coverages and feature attributes exist in the target map and how the feature data are stored within the shapefile outputs.

ECS runs on Sun Microsystems compatible workstations with at least 32 MB of memory using the Sun Microsystems SOLARIS operating system, Version 2.5. The ECS executable requires 12 MB of disk space. A Sun Microsystems UltraSparc II workstation was used to perform the work described in this paper.

CONCLUSION

While the current version of ECS has not eliminated the need for a person "in the loop", using ECS provides a great improvement over more manually intensive conflation methods, both in speed and in quality of the conflated data. Although the original DOT and Tiger positional data are very inconsistent in the type and direction of positional mismatch, ECS automatically found and used thousands of anchor points to develop a rubber-sheeting transformation that greatly enhanced the positional accuracy of the Tiger geometry. Manual intervention consisted of identifying and removing erroneous anchor points and adding corrected ones through a simple point-and-click interface. The speed of ECS allowed trial and error in finding appropriate values for the match parameters. Undoing and retrying with a different parameter set was easily accomplished. Attribute update of conflated features was automatic and easily configured to meet the goals of this task.

The completion of this effort showed that a high degree of automation can be achieved in the conflation process. Using Esri's open shapefile format as the intermediary format for exchanging data with ECS enables a wide variety of datasets to be conflated in a more automated process. The level of automation achievable is dependent on the similarity of the datasets being merged. Although automation reduces the labor requirements to a large extent, there is still a need for a qualified GIS specialist to manage conflation projects and ensure quality in the product dataset.

ACKNOWLEDGMENTS

ESEA would like to acknowledge the Georgia Department of Transportation and the USGS Mid-Continent Mapping Center for providing data for this task. ESEA would also like to acknowledge Bob Davis, Cartographer at the USGS Mid-Continent Mapping Center, for providing ESEA with guidelines for this conflation task.

AUTHOR INFORMATION

Stanley L. Dallal, Senior Engineering Specialist
ESEA
5150 El Camino Real, Suite B-15
Los Altos, CA 94022
Telephone: 650-962-1167
Fax: 650-962-0976
sdallal@esea.com