Automated Conflation with Validation

Stanley Dallal

Abstract

ESEA’s conflation system (ECS) automates the process of transferring attributes from one vector data set to another. Although ECS automates the majority of conflation work, some manual data editing is often necessary to correct conflation errors. Most of the labor cost lies in this manual cleanup, even though ECS may complete over 90% of a conflation job. A tool that reduces the time it takes to clean up the data will therefore significantly increase productivity. ESEA has developed an ArcMap extension that performs automated data validation for quality assurance. This paper describes preliminary results of applying the validation system to data produced with automated conflation.

Background

Conflation is the process of merging two digital geographic datasets that cover overlapping regions to produce a superior target dataset. Through the conflation process, the individual strengths of the source datasets can be combined. For example, U.S. Census Bureau TIGER attribution, including road name and address range information, can be conflated to USGS DLG data to produce a dataset with DLG spatial accuracy and TIGER attribution.

Conflation is generally a costly and time-consuming task that involves matching corresponding arcs between two data sets and then transferring each arc’s attributes from one data set to the other. To address this, ESEA has developed a stand-alone automated conflation system, the ESEA Conflation System (ECS), which automates the process of transferring attributes from one vector data set to another. ESEA uses this tool to provide services to customers who wish to merge vector data sets. ECS automates the majority of the conflation work; however, some manual cleanup is often required to correct errors resulting from the automated conflation. Most of the labor cost is associated with this manual cleanup, even though over 90% of the job may be completed using ECS. For this reason, any tool that reduces the time required to perform the cleanup will significantly increase productivity.

A typical conflation job involves transferring the address ranges from a TIGER data set to a data set with more accurate coordinates but less attribution. In certain instances, address ranges may be transferred incorrectly, and these cases can be very difficult to find. As part of a Phase II Small Business Innovation Research (SBIR) contract with the U.S. Army Topographic Engineering Center, ESEA has developed a prototype Map Validation Tool, named Validater, which applies a set of validation rules to query and modify vector feature data. Using a validation rule set that looks for address range abnormalities significantly reduces the time required to locate these possible errors. With the Validater, each address abnormality can be reviewed case by case, and suitable corrections can be made on the spot.

Conflation

Shape files for the two source data sets are imported into ECS, conflated within ECS, and then exported to a target shape file. The ECS conflation process is carried out in three steps: node matching, line (arc) matching, and feature merging. Each of these steps is discussed in turn below.

Node matching is performed to create rubber-sheeting transformations and to match node features. Distance, topological and attribution measures are used for matching nodes. In searching for candidate base and non-base node matches, ECS only considers node pairs within an operator-specified match distance. All base and non-base node pairs separated by more than this match distance are excluded as node match candidates.
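The match-distance cutoff can be illustrated with a short Python sketch. It is not ECS code: the function name `candidate_node_pairs` and the dictionary-of-coordinates input are hypothetical, and a brute-force pairwise loop stands in for whatever spatial indexing ECS actually uses.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def candidate_node_pairs(base: Dict[str, Point],
                         non_base: Dict[str, Point],
                         match_distance: float) -> List[Tuple[str, str, float]]:
    """Return (base_id, non_base_id, distance) for node pairs within match_distance.

    Hypothetical illustration: every base node is compared with every
    non-base node (O(n*m)); pairs farther apart than the operator-specified
    match distance are excluded from further consideration.
    """
    pairs = []
    for b_id, (bx, by) in base.items():
        for n_id, (nx, ny) in non_base.items():
            d = math.hypot(bx - nx, by - ny)
            if d <= match_distance:
                pairs.append((b_id, n_id, d))
    # Closest candidates first, so later scoring can consider them in order
    pairs.sort(key=lambda p: p[2])
    return pairs


base = {"A": (0.0, 0.0), "B": (100.0, 0.0)}
non_base = {"a": (3.0, 4.0), "b": (100.0, 30.0)}
print(candidate_node_pairs(base, non_base, 10.0))  # [('A', 'a', 5.0)]
```

With a 10-unit match distance only A and a survive as a candidate pair; raising the distance to 30 also admits B and b.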

Along with the distance between candidate matching nodes, the number and distribution of lines that meet at the node is also an important measure of similarity. The ECS operator can specify the relative importance of this type of matching. ECS can use a comparison of attribute values associated with line or node features as an additional node match measure. If arcs are present, the match is performed on attributes of the arcs incident at the nodes being compared.
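One way to picture how distance, line configuration, and incident-arc attributes might be combined into a single node match score is the weighted average below. The scoring formulas, the `NodeInfo` record, and the weight parameters (`w_dist`, `w_topo`, `w_attr`, standing in for the operator-specified relative importance) are all assumptions made for illustration; the paper does not describe how ECS weights these measures.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeInfo:
    x: float
    y: float
    bearings: List[float]                               # directions (degrees) of lines meeting at the node
    arc_names: List[str] = field(default_factory=list)  # attributes of the incident arcs


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings, in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def node_similarity(b: NodeInfo, n: NodeInfo, match_distance: float,
                    w_dist: float = 1.0, w_topo: float = 1.0, w_attr: float = 1.0) -> float:
    """Weighted score in [0, 1]; 0 beyond the match distance (which must be positive)."""
    d = math.hypot(b.x - n.x, b.y - n.y)
    if d > match_distance:
        return 0.0
    dist_score = 1.0 - d / match_distance
    # Topology: agreement in line count, scaled by how well the bearings line up
    topo_score = 0.0
    if b.bearings and n.bearings:
        count_score = min(len(b.bearings), len(n.bearings)) / max(len(b.bearings), len(n.bearings))
        mean_err = sum(min(angle_diff(a, c) for c in n.bearings) for a in b.bearings) / len(b.bearings)
        topo_score = count_score * (1.0 - mean_err / 180.0)
    # Attributes: fraction of base incident-arc names also found at the other node
    attr_score = 0.0
    if b.arc_names:
        attr_score = sum(name in n.arc_names for name in b.arc_names) / len(b.arc_names)
    total = w_dist + w_topo + w_attr
    return (w_dist * dist_score + w_topo * topo_score + w_attr * attr_score) / total


node = NodeInfo(0.0, 0.0, [0.0, 90.0, 180.0], ["MAIN ST", "OAK AVE", "MAIN ST"])
print(node_similarity(node, node, 10.0))  # 1.0
```

Raising one weight relative to the others mimics an operator giving that measure more importance.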

Line matching proceeds once node matching has been completed to the operator’s satisfaction. For each line to be matched, ECS considers the region within an operator-specified distance of the line. A path in the other map is considered for matching if it lies within this distance. In addition to geometry, line matching can also use attribute information to help identify matches. ECS allows the user to visually inspect the results of automatic node and line matching. The operator can selectively add and delete point matches, add and delete line matches, and split and unsplit line features.
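The corridor test for line matching can be sketched as follows: a candidate path is accepted when every point sampled along it lies within the match distance of the line being matched. The function names and the sampling `step` are hypothetical, and sampling only approximates a true buffer containment test.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def point_line_distance(p: Point, line: List[Point]) -> float:
    """Distance from p to the nearest segment of a polyline."""
    return min(point_segment_distance(p, line[i], line[i + 1]) for i in range(len(line) - 1))


def densify(path: List[Point], step: float) -> List[Point]:
    """Insert points so consecutive samples are at most `step` apart."""
    out = [path[0]]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        n = max(1, math.ceil(math.hypot(x1 - x0, y1 - y0) / step))
        out.extend((x0 + (x1 - x0) * k / n, y0 + (y1 - y0) * k / n) for k in range(1, n + 1))
    return out


def path_within_distance(line: List[Point], path: List[Point],
                         match_distance: float, step: float = 1.0) -> bool:
    """True if every sampled point of `path` lies within match_distance of `line`."""
    return all(point_line_distance(p, line) <= match_distance for p in densify(path, step))


line = [(0.0, 0.0), (100.0, 0.0)]
print(path_within_distance(line, [(0.0, 3.0), (50.0, 4.0), (100.0, 2.0)], 5.0))   # True
print(path_within_distance(line, [(0.0, 3.0), (50.0, 12.0), (100.0, 2.0)], 5.0))  # False
```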

Often a single arc in one map does not correspond to a single arc in the other map. These match situations must be converted into one-to-one matches in order for the attributes to be transferred across. In these situations ECS will automatically add nodes to split arcs and create one-to-one matches.
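Splitting an arc so that each piece has exactly one counterpart can be sketched as projecting a node from the other map onto the arc and cutting the arc there. The helper name and the projection rule are assumptions; the paper states only that ECS adds nodes to split arcs.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def split_line_at_point(line: List[Point], p: Point) -> Tuple[List[Point], List[Point]]:
    """Split a polyline at the point on it closest to p (e.g. a node from the other map).

    Hypothetical helper: the new node is the projection of p onto the nearest
    segment, and an existing vertex is reused if the projection lands on it.
    A point projecting onto an end of the whole line yields a one-vertex piece,
    which a real tool would skip.
    """
    best_i, best_t, best_d = 0, 0.0, math.inf
    for i in range(len(line) - 1):
        (ax, ay), (bx, by) = line[i], line[i + 1]
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        t = 0.0 if seg_len2 == 0.0 else max(0.0, min(1.0, ((p[0] - ax) * dx + (p[1] - ay) * dy) / seg_len2))
        d = math.hypot(p[0] - (ax + t * dx), p[1] - (ay + t * dy))
        if d < best_d:
            best_i, best_t, best_d = i, t, d
    (ax, ay), (bx, by) = line[best_i], line[best_i + 1]
    q = (ax + best_t * (bx - ax), ay + best_t * (by - ay))  # the new node
    first = line[:best_i + 1]
    if first[-1] != q:
        first = first + [q]
    second = line[best_i + 1:]
    if second[0] != q:
        second = [q] + second
    return first, second


print(split_line_at_point([(0.0, 0.0), (100.0, 0.0)], (50.0, 3.0)))
# ([(0.0, 0.0), (50.0, 0.0)], [(50.0, 0.0), (100.0, 0.0)])
```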

The next step in the ECS conflation process is feature merging, which consists of selecting the desired features and attributes to include in the target conflated data set. Any combination of attributes from the two source maps can be transferred to the target map.
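When address ranges are among the transferred attributes, the relative digitizing direction of the matched lines matters: if the two lines run in opposite directions, From/To and Left/Right must be exchanged during the transfer. The sketch below is not ECS code. It copies selected attributes from each source and applies that exchange when the endpoints indicate a reversed match; the field names and helper functions are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Reversing a line's direction swaps both From/To and Left/Right (hypothetical field names)
REVERSED_FIELD = {"from_left": "to_right", "to_left": "from_right",
                  "from_right": "to_left", "to_right": "from_left"}


def is_reversed(base_line: List[Point], other_line: List[Point]) -> bool:
    """True if the matched lines appear to be digitized in opposite directions."""
    same = math.dist(base_line[0], other_line[0]) + math.dist(base_line[-1], other_line[-1])
    crossed = math.dist(base_line[0], other_line[-1]) + math.dist(base_line[-1], other_line[0])
    return crossed < same


def merge_feature(base_line: List[Point], base_attrs: Dict, other_line: List[Point],
                  other_attrs: Dict, from_base: List[str], from_other: List[str]) -> Dict:
    """Target feature: base geometry plus the selected attributes from each source."""
    target = {"geometry": base_line}
    for f in from_base:
        target[f] = base_attrs[f]
    reversed_match = is_reversed(base_line, other_line)
    for f in from_other:
        # Read address fields from the opposite end and side when the match is reversed
        src = REVERSED_FIELD.get(f, f) if reversed_match else f
        target[f] = other_attrs[src]
    return target


base_line = [(0.0, 0.0), (100.0, 0.0)]
other_line = [(101.0, 2.0), (1.0, 1.0)]  # same road, digitized the other way
other_attrs = {"name": "MAIN ST", "from_left": 101, "to_left": 199,
               "from_right": 100, "to_right": 198}
t = merge_feature(base_line, {"width": 12}, other_line, other_attrs,
                  ["width"], ["name", "from_left", "to_left", "from_right", "to_right"])
print(t["from_left"], t["to_left"])  # 198 100
```

Deciding orientation from endpoint proximity is a simplification; an implementation could instead use the node matches already established in the first step.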

Validation

The Validater is an ArcMap extension written in C++ that applies a set of consistency rules to either individual features or groups of related features in a vector map. The rules are expressed in an application-specific language called Frequency Manipulation Language (FML). The rules used to validate address ranges look for occurrences where:

1. two adjacent road features with the same road name have address ranges that increase in one road and decrease in the other road.

2. two adjacent road features with the same name have even addresses on one road and odd addresses on the same side of the other road.

3. two adjacent road features with the same name have their adjacent addresses separated by more than a specified threshold, such as 100.

4. three road features coming out of an intersection all have the same name.

5. a road feature’s addresses increase on one side of the road but decrease on the other.
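The Validater expresses these five rules in FML, which the paper does not reproduce, so the following is only a Python sketch of the same checks. It assumes a hypothetical `Road` record with From/To Left/Right fields, that the two roads passed to the pairwise rules are digitized in the same direction with the first road's To end meeting the second road's From end, and a default gap threshold of 100. Rule 4 needs a grouping of features at an intersection and is not covered here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Road:
    name: str
    from_left: int
    to_left: int
    from_right: int
    to_right: int


def direction(start: int, end: int) -> int:
    """+1 if addresses increase, -1 if they decrease, 0 if they are equal."""
    return (end > start) - (end < start)


def road_direction(r: Road) -> int:
    # Use the left side, falling back to the right side if the left is flat
    return direction(r.from_left, r.to_left) or direction(r.from_right, r.to_right)


def rule5_sides_disagree(r: Road) -> bool:
    """Rule 5: addresses increase on one side of a road but decrease on the other."""
    return direction(r.from_left, r.to_left) * direction(r.from_right, r.to_right) < 0


def check_adjacent(a: Road, b: Road, max_gap: int = 100) -> List[str]:
    """Rules 1-3 for two same-name roads where a's To end meets b's From end."""
    if a.name != b.name:
        return []
    hits = []
    # Rule 1: ranges increase on one road and decrease on the other
    if road_direction(a) * road_direction(b) < 0:
        hits.append("rule1")
    # Rule 2: even addresses on one road, odd on the same side of the other
    if a.from_left % 2 != b.from_left % 2 or a.from_right % 2 != b.from_right % 2:
        hits.append("rule2")
    # Rule 3: adjacent addresses across the shared node are too far apart
    if abs(b.from_left - a.to_left) > max_gap or abs(b.from_right - a.to_right) > max_gap:
        hits.append("rule3")
    return hits


a = Road("MAIN ST", 101, 199, 100, 198)
b = Road("MAIN ST", 201, 299, 200, 298)
c = Road("MAIN ST", 498, 400, 499, 401)
print(check_adjacent(a, b))  # []
print(check_adjacent(a, c))  # ['rule1', 'rule2', 'rule3']
```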

A rule may apply to a single feature or a rule may apply to a logical grouping of features that share some relation. The first four of these rules apply to connected line features: groups of line features that share an end node. The fifth rule applies to individual line features. The Validater supports the groupings listed below.

1. Individual Point: Rules apply to single point features only.

2. Individual Line: Rules apply to individual line features only.

3. Same Point: Rules apply to two coincident point features. The point features need not be in the same layer.

4. Same Line: Rules apply to two coincident line features. The line features need not be in the same layer.

5. Connected Lines: Rules apply to line features that touch at their end points. The line features need not be in the same layer.

6. Intersecting Lines: Rules apply to line features that intersect and their intersection point. The line features need not be in the same layer.

7. Node Line: Rules apply to nodes and line features where the node lies on top of the line feature.
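The Connected Lines grouping can be approximated by indexing line end points: features whose end points coincide form a group, and group-level rules such as rule 4 are evaluated per group. This Python sketch is not the Validater's implementation; the snapping tolerance `tol`, the dictionary inputs, the function names, and the reading of rule 4 as "three or more features at a node share a name" are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def node_key(p: Point, tol: float = 0.01) -> Tuple[int, int]:
    # Snap to a grid of size tol so coincident end points share a key.
    # (Simplification: points straddling a grid cell boundary may not match.)
    return (round(p[0] / tol), round(p[1] / tol))


def connected_line_groups(lines: Dict[str, List[Point]], tol: float = 0.01) -> List[List[str]]:
    """Group line features that share an end node (the Connected Lines strategy)."""
    at_node = defaultdict(set)
    for fid, coords in lines.items():
        for end in (coords[0], coords[-1]):
            at_node[node_key(end, tol)].add(fid)
    # Only nodes where two or more distinct lines meet form a group
    return [sorted(fids) for fids in at_node.values() if len(fids) >= 2]


def rule4_same_name(lines: Dict[str, List[Point]], names: Dict[str, str]) -> List[List[str]]:
    """Rule 4: three or more features meeting at one node carry the same name."""
    flagged = []
    for group in connected_line_groups(lines):
        by_name = defaultdict(list)
        for fid in group:
            by_name[names[fid]].append(fid)
        flagged.extend(same for same in by_name.values() if len(same) >= 3)
    return flagged


lines = {"r1": [(0.0, 0.0), (10.0, 0.0)], "r2": [(10.0, 0.0), (20.0, 0.0)],
         "r3": [(10.0, 0.0), (10.0, 10.0)], "r4": [(10.0, 0.0), (10.0, -10.0)]}
names = {"r1": "MAIN ST", "r2": "MAIN ST", "r3": "MAIN ST", "r4": "OAK AVE"}
print(rule4_same_name(lines, names))  # [['r1', 'r2', 'r3']]
```

The same groups could feed the pairwise rules 1 through 3, by checking each same-name pair of features in a group.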

These rules were written in FML and applied to data produced by using ECS to conflate the road names and address ranges from TIGER-derived data to more spatially accurate data for Baltimore County, Maryland. The Validater located a number of address range anomalies, many of which were present in the original TIGER data and consequently were not caused by the conflation process. The conflation process introduced a few address anomalies, some of which would have been very difficult to find without the Validater. In the example shown below, three road features are highlighted in light blue. The Validater found that two of the highlighted road features triggered both the rule that tests for address ranges that increase in one road and decrease in the other, and the rule that tests for addresses on the same side of the road that change from even to odd across an intersection.

Figure 1. Road features with an address range anomaly.

The address ranges for the two road features are shown in the following table:

        From Left   To Left   From Right   To Right
Road 1  2023        2013      2010         2000
Road 2  1962        1998      2001         2011

The first road’s addresses decrease and the second road's addresses increase. Also, the first road's left-side addresses are odd and the second road's left-side addresses are even. It is clear that the second road's From/To and Left/Right addresses have been reversed. This was likely caused by ECS matching opposite nodes of the corresponding line features, which reversed the address range attributes when they were transferred to the target data set.
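The diagnosis can be checked mechanically against the table values. The sketch below is a hypothetical Python illustration, not the FML rules themselves: it applies simplified left-side forms of the direction and parity rules to Road 1 and Road 2, and then to Road 2 with its From/To and Left/Right values exchanged. Because the table carries no geometry, it assumes both features are digitized in the same direction with Road 1's To end meeting Road 2's From end.

```python
def direction(start: int, end: int) -> int:
    """+1 if addresses increase, -1 if they decrease, 0 if equal."""
    return (end > start) - (end < start)


def anomalies(r1: dict, r2: dict) -> list:
    """Simplified rules 1 and 2 applied to the left side of two adjacent roads."""
    found = []
    if direction(r1["from_left"], r1["to_left"]) * direction(r2["from_left"], r2["to_left"]) < 0:
        found.append("direction flip")
    if r1["from_left"] % 2 != r2["from_left"] % 2:
        found.append("left-side parity flip")
    return found


def reverse_ranges(r: dict) -> dict:
    """Exchange From/To and Left/Right, the effect of reversing a line's direction."""
    return {"from_left": r["to_right"], "to_left": r["from_right"],
            "from_right": r["to_left"], "to_right": r["from_left"]}


road1 = {"from_left": 2023, "to_left": 2013, "from_right": 2010, "to_right": 2000}
road2 = {"from_left": 1962, "to_left": 1998, "from_right": 2001, "to_right": 2011}

print(anomalies(road1, road2))                  # ['direction flip', 'left-side parity flip']
print(reverse_ranges(road2))                    # {'from_left': 2011, 'to_left': 2001, 'from_right': 1998, 'to_right': 1962}
print(anomalies(road1, reverse_ranges(road2)))  # []
```

With the exchange applied, Road 2's left side runs from 2011 to 2001, continuing Road 1's odd, decreasing left-side sequence, which supports the reversed-match explanation.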

An additional tool that compares the validation results for the conflated data set with those for the source data would help distinguish the anomalies caused by conflation from those already present in the source data.
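A minimal version of such a comparison tool could key each reported anomaly by rule, road name, and approximate location, and keep only the conflated-data anomalies with no counterpart in the source results. Everything here is hypothetical: the anomaly tuple layout, the 50-unit grid used to tolerate coordinate shifts between data sets, and the function names.

```python
from typing import Iterable, List, Tuple

# Hypothetical anomaly record: (rule id, road name, x, y)
Anomaly = Tuple[str, str, float, float]


def anomaly_key(a: Anomaly, cell: float) -> Tuple[str, str, int, int]:
    rule, name, x, y = a
    # Normalize the name and snap the location to a coarse grid, because the
    # conflated features sit at more accurate coordinates than the source features
    return (rule, name.upper(), round(x / cell), round(y / cell))


def introduced_by_conflation(source: Iterable[Anomaly],
                             conflated: Iterable[Anomaly],
                             cell: float = 50.0) -> List[Anomaly]:
    """Anomalies in the conflated data with no matching anomaly in the source data."""
    seen = {anomaly_key(a, cell) for a in source}
    return [a for a in conflated if anomaly_key(a, cell) not in seen]


source = [("rule2", "MAIN ST", 1000.0, 2000.0)]
conflated = [("rule2", "Main St", 1010.0, 1990.0), ("rule1", "OAK AVE", 5000.0, 5000.0)]
print(introduced_by_conflation(source, conflated))  # [('rule1', 'OAK AVE', 5000.0, 5000.0)]
```

Grid snapping can separate two nearby anomalies that fall on either side of a cell boundary; matching by distance would be more robust at some extra cost.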

Conclusion

An automated rule-based validation system can be used to find errors resulting from automated conflation. The Validater is effective because of its use of FML, a rule language designed expressly for map validation, and the rule strategy concept, which gives the Validater the ability to apply sophisticated rules to groups of associated features. FML enables rules to be written for the specific requirements of the data being validated. For conflated street centerline data, using a validation rule set written to detect street address range abnormalities significantly reduced the time required to locate and correct these types of errors. Each street address abnormality can be reviewed case by case, and suitable corrections can be made on the spot with ArcMap’s editing capabilities. The combination of a rule language, feature grouping strategies, a rule engine, and the ArcMap platform yields a validation tool that has the potential to be a central component in the quality assurance of conflated data.


Author Information

Stanley L. Dallal, Senior Software Engineer
ESEA
100 W. El Camino Real, Suite 74
Mountain View, CA 94040
Telephone: 650-962-1167
Fax: 650-962-0976
dallal@esea.com