Bart Guetti, PlanGraphics, Inc.

Steven Johnson, Wake County (NC) GIS

Reverse Engineering a GIS Database: Translating GDS Drawings into Arc/Info Coverages

 

1. Introduction

Since 1989, Wake County, NC has used GDS (Convergent Group, Englewood, CO) for the development and maintenance of several large datasets, including cadastral, topography, soils, and street centerlines. However, in early 1997 development of the GDS product was suspended, and Wake County decided to migrate its data and applications to Arc/Info (Esri, Redlands, CA). With the assistance of PlanGraphics, Inc. (Silver Spring, MD & Frankfort, KY), a method was devised to translate all datasets into Arc/Info. This paper discusses the design, procedures, and process required to preserve the County's investment and continue providing GIS products to the Wake County community.

2. Planning

The planning for translation involved four steps. These were:

Each of these steps is examined in turn.

2.1 The contents of all GDS drawings were inventoried first. The contents of each drawing were documented and any anomalies were noted. These anomalies might include misnamed objects, relics and stray lines left over from previous projects and data structures which had no counterpart in Arc/Info. The County's data dictionary was used in conjunction with these inventories to provide several key pieces of information:

Wake County had many GDS drawings, which varied not only by size and feature density but also by complexity. They ranged from the very complex to single-theme drawings.

For example, the cadastral basemap consisted of these themes:

Most smaller drawings were limited to one theme, such as townships, or precincts, or corporate limits.

2.2 While assembling the supporting documentation, existing mapping practices were also documented. Clear business rules and protocols had been established for nearly all mapping procedures. For example, cadastral mapping is governed by rules established at the state level. Also, because the County acts as the MSAG (Master Street Address Guide) for all County municipalities, street naming must conform to a well-defined procedure. It was also necessary to document mapping procedures in other departments. For instance, the Planning Department maintains zoning maps and associated data such as case number, petitioner, and classification.

The exceptions to the rules were 'special projects', usually a one-time analysis or mapping project done at the request of another County agency or private individual. Since mapping rules for these projects were not formalized, they were often a source of errors and stray linework, which had to be corrected later.

2.3 Using the data inventories created earlier, County staff began cleaning up the more obvious and easily correctable errors in GDS. Misnamed features were given correct names. Misspellings were corrected. Obvious slivers were eliminated and bad linework was snapped or moved. Procedures were run to ensure that the graphics were reconciled with the attribute database. Also, any unattributed linework was given attributes, or deleted where shown to be in error.

 

3. Design

3.1 The overall goal in designing the new database was that the resulting database must support the organization's existing business functions. The coverage structure, database fields and attribute values needed to support the existing as well as immediate future applications of the cadastral, planimetric, topographic and district data.

The most important design decision was whether to faithfully replicate the GDS database in Arc/Info or to adopt a whole new database design optimized for Arc/Info. A third option entailed making just enough changes to the GDS database to permit the data to be imported into Arc/Info. Given that the County wanted to minimize disruption to the larger organization, and given that clear business rules had to be followed for map updates, the third option seemed to be the safest path. It also permitted the County to streamline the GDS database by throwing out or consolidating certain features and by simplifying the mapping of features from GDS to Arc/Info.

This was an important philosophical decision for two reasons. First, it guided the design and methodology by which we accomplished the translation: simple feature mapping from GDS to DXF layers and then into single-theme, single-feature Arc/Info coverages. Second, it allowed us to use the translation as an opportunity to correct past mistakes and design flaws. Slivers created by overlapping polygons and gaps between polygons were eliminated. Completely overlapping polygon themes, such as townhome footprints, were moved into their own coverage. Historical polygons were put into a separate coverage or, in the case of historical parcel data, were dropped from the design entirely.

Translating the GDS data required developing a translation matrix to assist in grouping GDS layers into ArcInfo coverages. For the most part there was a direct relationship between GDS drawings and ArcInfo coverages. However, some drawing layers were split between coverages. Table 1 is a sample of the translation matrix that was developed.

CODE    | NAME              | GDS_DATA_TYPE | COUNT | DXF/ASCII_FILE | LAYER/CODE | DXF Type   | TRANS_STAT | Comments | FLAG_ANNO | FAT_NAME         | COVER_TYPE | SUB_CLASS | Field1 | FEATURE_CODE
BASEMAP | PARCEL:*:*        | polygon       | 1818  | parcel.dxf     | parcel     | line       | DXF        |          | No        | property.pat     |            |           |        |
BASEMAP | PARCEL:*:*:*:*    | polygon       | 226   | parcel.dxf     | parcel     | line       | DXF        |          | No        | property.pat     |            |           |        |
BASEMAP | PARCEL:AREA       | attached      | 1267  | parcanno.dxf   | area       | annotation | DXF        |          | Yes       | property.tatarea |            | area      |        |
BASEMAP | PARCEL:AREA:CALC  | attached      | 564   | parcanno.dxf   | calc       | annotation | DXF        |          | Yes       | property.tatcalc |            | calc      |        |
BASEMAP | PARCEL:DISTANCE   | item          | 2     | parcanno.dxf   | dist       | annotation | DXF        |          | Yes       | property.tatdist |            | dist      |        |
BASEMAP | PARCEL:LABEL      | attached      | 65    | parcanno.dxf   | lab        | annotation | DXF        |          | Yes       | property.tatlab  |            | lab       |        |
BASEMAP | PARCEL:PIN:SYMBOL | attached      | 61    |                |            |            | Ignore     |          | No        |                  |            |           |        |

Table 1

For the initial translation, annotation layers were converted into individual annotation coverages. These were later appended to the coverage they were associated with. For instance, the PARCEL:AREA layer in the BASEMAP drawing was converted into a layer named AREA in the parcanno.dxf file. This was then converted into a coverage named AREA, which was appended to the PROPERTY coverage to create the annotation subclass PROPERTY.TATAREA.

The three major types of coverages created were:

Polygon coverages were designed to contain a primary key field and some attributes (Table 2).

FAT_NAME     | Start Column | Attribute Name      | Item Name   | Item Def.   | WIDTH | ATTR_DESC
PROPERTY.PAT | 1            | Area                | AREA        | 8, 18, F, 5 | 0     | Polygon area
PROPERTY.PAT | 9            | Perimeter           | PERIMETER   | 8, 18, F, 5 | 0     | Polygon perimeter
PROPERTY.PAT | 17           | Coverage#           | PROPERTY#   | 4, 5, B     | 0     | Unique system identification number
PROPERTY.PAT | 21           | Coverage Feature ID | PROPERTY-ID | 4, 5, B     | 0     | Unique feature identification number
PROPERTY.PAT | 25           | Feature Code        | FTR_CODE    | 3, 3, I     | 0     | Parcel feature code
PROPERTY.PAT | 35           | Pin                 | PIN         | 10, 10, C   | 0     | Parcel identification number
PROPERTY.PAT | 28           | Account             | ACCOUNT     | 7, 7, C     | 0     | Parcel account number
PROPERTY.PAT | 45           | Deedbook Number     | DEEDBOOK    | 6, 6, C     | 0     | Parcel deed book identifier
PROPERTY.PAT | 51           | Page                | PAGE        | 6, 6, C     | 0     | Parcel page number

Table 2

Line coverages had few attributes to carry over from GDS and were therefore usually designed to contain only a feature code (Table 3).

FAT_NAME    | Start Column | Attribute Name      | Item Name  | Item Def.   | WIDTH | ATTR_DESC
CONTOUR.AAT | 1            | From Node #         | FNODE#     | 4, 5, B     | 0     | The beginning node of the arc
CONTOUR.AAT | 5            | To Node #           | TNODE#     | 4, 5, B     | 0     | The end node of the arc
CONTOUR.AAT | 9            | Left Polygon #      | LPOLY#     | 4, 5, B     | 0     | The polygon to the left of the arc
CONTOUR.AAT | 13           | Right Polygon #     | RPOLY#     | 4, 5, B     | 0     | The polygon to the right of the arc
CONTOUR.AAT | 17           | Arc Length          | LENGTH     | 8, 18, F, 5 | 0     | The length in feet of the contour segment
CONTOUR.AAT | 25           | Coverage#           | CONTOUR#   | 4, 5, B     | 0     | Unique system identification number
CONTOUR.AAT | 29           | Coverage Feature ID | CONTOUR-ID | 4, 5, B     | 0     | Unique feature identification number
CONTOUR.AAT | 33           | Feature Code        | FTR_CODE   | 3, 3, I     | 0     | Code assigned to contours
CONTOUR.AAT | 36           | Contour ID Number   | ELEVATION  | 3, 3, I     | 0     | Contour identification number

Table 3

Annotation coverages were designed with the minimum attributes required in ArcInfo.

 

Based upon the constraints of the existing GDS drawing structure and the requirements of the new database design, a translation process was designed. It consisted of the following general steps:

 

 

4. Translation

4.1 To translate the GDS drawings into ArcInfo coverages, three applications were used. The first was an application to export the GDS drawings to DXF files. A third-party translator, DXFOUT/GDS, was purchased from GEODESY (San Francisco, CA) because the DXF translator supplied with GDS offered too little control over the layers it produced.

The second application, to obtain polygon attributes from the GDS structures and create a point file of label points, was developed by Wake County GIS staff.

The third application was developed by PlanGraphics and was written in Arc Macro Language (AML) to convert the DXF files into coverages. The GDS to ArcInfo application, GAIA, was designed to be table-driven, reading the translation parameters directly from INFO tables. This avoided the need to hard-code the logic into the application and had the added advantage of making the application easy to modify as the translation design evolved. Finally, the INFO tables were created directly from the database design and translation matrix tables developed in Microsoft Access.
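The table-driven idea can be illustrated with a minimal sketch in Python rather than AML. The rows below are modeled on a few entries from Table 1; the dictionary keys and the plan_translation function are hypothetical names chosen for illustration and are not taken from GAIA itself.

```python
# Hypothetical rows modeled on Table 1. In the real application these
# parameters were read from INFO tables, not defined in code.
TRANSLATION_MATRIX = [
    {"name": "PARCEL:*:*", "dxf_file": "parcel.dxf", "layer": "parcel",
     "trans_stat": "DXF", "flag_anno": False, "fat_name": "property.pat"},
    {"name": "PARCEL:AREA", "dxf_file": "parcanno.dxf", "layer": "area",
     "trans_stat": "DXF", "flag_anno": True, "fat_name": "property.tatarea"},
    {"name": "PARCEL:PIN:SYMBOL", "dxf_file": None, "layer": None,
     "trans_stat": "Ignore", "flag_anno": False, "fat_name": None},
]


def plan_translation(matrix):
    """Group translatable layers by DXF file, skipping rows marked Ignore."""
    plan = {}
    for row in matrix:
        if row["trans_stat"] != "DXF":
            continue  # layers flagged Ignore are never exported
        plan.setdefault(row["dxf_file"], []).append(
            (row["layer"], row["fat_name"], row["flag_anno"])
        )
    return plan
```

Because the processing logic only walks the table, changing the design means editing rows rather than code, which is the advantage described above.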

Figure 1 illustrates the steps used in creating the line and polygon coverages.

 

Figure 1

 

The translation process began with the creation of DXF files from the GDS drawings. Using the GEODESY translator DXFOUT/GDS, the translation schema was applied to group the various DXF layers into one DXF file per coverage. ArcInfo coverages were then created from the DXF files using the DXFARC command in ArcInfo.

The translation process provided an excellent opportunity to eliminate sliver polygons caused by double digitizing of shared boundaries (Figure 2). Sliver polygons are small vestigial polygons created where adjacent polygons overlap or leave gaps between them.

Figure 2

 

 

 

 

 

 

 

The elimination process involved:

A search was conducted for polygons having a perimeter-to-area ratio greater than 0.04. The arcs defining those polygons were collapsed using the SNAPENVIRONMENT and SNAP commands. The perimeter-to-area ratio approach was developed to prevent the accidental elimination of valid small polygons. This step was conducted for polygon coverages only.
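The ratio test can be sketched as follows. This is a minimal Python illustration, not the ArcInfo procedure that was used; it assumes each polygon is a simple ring of (x, y) vertices in feet, and the function names are hypothetical.

```python
import math


def ring_area_perimeter(ring):
    """Return (area, perimeter) of a closed ring of (x, y) vertices."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1  # shoelace formula term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def is_sliver_candidate(ring, threshold=0.04):
    """Flag a polygon whose perimeter-to-area ratio exceeds the threshold."""
    area, perimeter = ring_area_perimeter(ring)
    if area == 0.0:
        return True  # degenerate polygon, treat as a sliver
    return perimeter / area > threshold
```

A long, thin polygon has a large perimeter relative to its area, so it exceeds the threshold, while a compact polygon of similar area does not.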

Many of the ArcInfo polygon analysis commands needed for the translation require valid polygon topology. ArcInfo's data model requires a node wherever two arcs intersect before polygon topology can be built, and the CLEAN command creates these nodes. CLEAN was therefore used to establish nodes at all arc intersections and eliminate dangling arcs, and BUILD was then used to create arc or polygon topology.

The DXFOUT process from GDS returned the entire polygon or arc, which often extended beyond the tile boundary. This created problems in later operations such as APPEND and CREATELABELS. Therefore the CLIP command was used early in the translation process to eliminate those portions of features extending beyond the tile boundaries.

Even though the drawings in GDS were digitized in STATEPLANE feet, the DXFARC command failed to retain the projection information. Therefore the PROJECT command was used to project the coverages into the State Plane coordinate system, datum NAD83 and FIPSZONE 3200.

The DXFARC process also dropped the attributes associated with the GDS polygons. To recover these attributes a three-step process was developed.

ArcInfo's CREATELABELS command was used to create polygon labels for each polygon. This was done in ArcInfo as the label creation capabilities within GDS could not guarantee that the resulting label would physically be located within the polygon. Labels created within ArcInfo were guaranteed to physically fall within the polygon, even if it was irregularly shaped. ArcInfo's UNGENERATE command was used to write out the X,Y coordinates of the polygon label points to an ASCII file.

To determine which polygon a particular label belonged to, a four-step process was developed within GDS:

Scanning the GDS polygons involved a search of all GDS structures to determine which ones were polygons. This was an expensive task, as it involved a search of the entire County. Once a list of all of the polygons was returned, a tile search was conducted to determine which ones were in the tile being processed. Using a point-in-polygon routine, the ID of the polygon that each label point fell within was determined. To pass the data back to ArcInfo, attribute TXT files were created by writing out the ID, X,Y coordinates, and attributes of each polygon to an ASCII file.
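The point-in-polygon step can be sketched with the standard ray-casting test. This is an illustration in Python under stated assumptions, not the routine written for GDS: polygons are simple rings of (x, y) vertices, and the names point_in_polygon and assign_labels are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: count edge crossings of a ray running in +x from (x, y)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_labels(labels, polygons):
    """Pair each label point with the ID of the first polygon containing it.

    labels: list of (x, y); polygons: dict mapping polygon ID to ring.
    """
    results = []
    for x, y in labels:
        match = None
        for poly_id, ring in polygons.items():
            if point_in_polygon(x, y, ring):
                match = poly_id
                break
        results.append((x, y, match))
    return results
```

Note that a first-match rule like this one cannot distinguish a nested polygon from the polygon that encloses it, so overlapping structures may need separate handling.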

Bringing the attributes back into ArcInfo was accomplished spatially for polygons and by using relates for arc coverages. Both approaches required adding items to the FAT to store the primary keys and other attributes.

Attributing polygons was done spatially by intersecting a point coverage with attributes with unattributed polygons. This involved:

An INFO file of attributes and X,Y coordinates was defined and populated using the GET command with the COPY ASCII option to get the TXT file into the INFO file. A point coverage was GENERATEd from the X,Y coordinates using output from the INFO file. Using the IDENTITY command, the point coverage containing the attributes and the polygon coverage were merged.

Attributing arc coverages was accomplished by a two-step process:

The DXFARC command creates an ACODE file for arc coverages containing several pieces of DXF information, including the layer name and an ID. The ID is the ArcInfo COVER-ID and can be used to relate the AAT to the ACODE file. The layer name is contained within the DXF-LAYER field. For several of the drawings converted, the layer name was the GDS facet name. When Wake County designed its GDS database, it stored individual feature attributes in the facet name. For example, in the EASEMENT drawing, each feature had the easement type as part of its facet name:

EASEMENT:LINE:ELECTRIC

EASEMENT:LINE:ELECTRIC:OVERLAP

EASEMENT:LINE:GAS

EASEMENT:LINE:MISC

Using the relate to the ACODE file, each feature's attributes can be moved over, either directly or indirectly, from the item DXF-LAYER. For some layers this required extracting a portion of the DXF-LAYER item.
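As an illustration of extracting a portion of DXF-LAYER, the following Python sketch joins arc records to ACODE records on the cover ID and copies the easement type out of the facet name. The record layout, the item name esmt_type, and both function names are assumptions made for this example; the actual work was done with INFO relates.

```python
def easement_type(dxf_layer):
    """Return the third colon-separated part of a facet name, or None."""
    parts = dxf_layer.split(":")
    return parts[2] if len(parts) >= 3 else None


def attribute_arcs(aat_rows, acode_rows):
    """Copy the easement type from ACODE records onto matching AAT records."""
    layer_by_id = {row["cover_id"]: row["dxf_layer"] for row in acode_rows}
    for row in aat_rows:
        # esmt_type is a hypothetical item name used only in this sketch
        row["esmt_type"] = easement_type(layer_by_id.get(row["cover_id"], ""))
    return aat_rows
```

For a layer such as EASEMENT:LINE:ELECTRIC:OVERLAP, this keeps ELECTRIC and ignores the trailing qualifier.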

Contours required a considerable amount of effort to retain the attributes. Each contour has four attributes: TYPE, VISIBILITY, APPROXIMATE, and ELEVATION.

To facilitate the translation of the contour attributes, a special DXF layer naming convention was developed. Each layer had a six-character name, which encoded the contour's attributes. Column 1 indicated the TYPE, column 2 the VISIBILITY, column 3 the APPROXIMATE and columns 4-6 the ELEVATION. For example, an index contour of 335 feet that was approximate and represented a depression was stored in the CNDA335 layer. This value was stored in the DXF-LAYER item and was parsed out to recreate the attributes (Table 4).

$RECNO  CONTOUR-ID  DXF-LAYER
1       1           CDXX225
2       2           CDXX225
3       3           CDXX225
4       4           CTXX235

Table 4
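Parsing the layer name back into attributes can be sketched as below. This Python example assumes, based only on the sample names in Table 4, that each name carries a leading C ahead of the six encoded characters. The single-letter codes are passed through without interpretation, since their meanings are not listed here, and parse_contour_layer is a hypothetical name.

```python
import re

# Assumed layout: "C" prefix, then TYPE, VISIBILITY, APPROXIMATE (one letter
# each), then a three-digit ELEVATION.
LAYER_PATTERN = re.compile(
    r"^C(?P<type>[A-Z])(?P<visibility>[A-Z])(?P<approximate>[A-Z])"
    r"(?P<elevation>\d{3})$"
)


def parse_contour_layer(name):
    """Split a contour layer name into its four encoded attributes."""
    match = LAYER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized contour layer name: {name!r}")
    fields = match.groupdict()
    fields["elevation"] = int(fields["elevation"])
    return fields
```

Rejecting names that do not fit the pattern makes stray or misnamed layers visible instead of silently producing bad attribute values.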

Translation of annotation from GDS to ArcInfo was accomplished in three steps:

Annotation in GDS was stored as an attached data type. Each annotation category was exported to its own DXF layer in a particular drawing's DXF file. Empty annotation sub-classes were created and then populated with the annotation features from the annotation coverages. The sub-classes were created using the Arc BUILD command, and they were populated in ArcEdit by making the annotation sub-class the edit feature and using the GET command to get the annotation coverage.

5. Results

In spite of the amount of time devoted to data QC and planning, several data anomalies did appear.

Within the parcel drawing, town homes in GDS were stored as nested polygons. The search of the GDS polygons to determine the correct polygon for a label often returned the PIN for the "parent" polygon, i.e., the base polygon for the town homes. This resulted in many polygons having the same PIN or no PIN (Figure 3).

Figure 3

When the contours were digitized in GDS, gaps were intentionally placed in them to prevent interference with the elevation labels (Figure 4). When converted to ArcInfo, this prevented the conversion of the contours to polygons, as polygons must be closed. Most 3D analysis in ArcInfo requires topographic data in polygon format for correct results.

Figure 4

When developing the CONTOURS coverage, an ArcInfo processing threshold was hit: the CLEAN command can process a maximum of 10,000 arcs. To overcome this limitation, smaller DXF files, and therefore smaller coverages, were created.

The CONTOURS coverages also had a large number of vertices due to the way that they were originally captured. ArcInfo allows a maximum of 500 vertices per arc, and any arc with more than that is split in two. This splitting created a large number of pseudo-nodes, i.e., nodes joining only two arcs. As a result, the coverages contained more arcs than were actually required. Although not a significant problem, this will affect drawing speed and spatial analysis of the data.
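The effect of the vertex limit can be illustrated with a short Python sketch. It is not ArcInfo's actual splitting algorithm: it simply cuts a vertex list into pieces of at most 500 vertices, repeating the cut vertex so that consecutive pieces share an endpoint, which is where a pseudo-node appears. The function name split_arc is hypothetical.

```python
def split_arc(vertices, max_vertices=500):
    """Split a vertex list into arcs of at most max_vertices vertices.

    Consecutive pieces share their boundary vertex, so the number of
    pseudo-nodes introduced is len(pieces) - 1.
    """
    if max_vertices < 2:
        raise ValueError("an arc needs at least two vertices")
    pieces = []
    start = 0
    while len(vertices) - start > max_vertices:
        pieces.append(vertices[start:start + max_vertices])
        start += max_vertices - 1  # reuse the last vertex as the next from-node
    pieces.append(vertices[start:])
    return pieces
```

Under these assumptions a densely captured contour with 1,200 vertices becomes three arcs joined by two pseudo-nodes.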

Once the data was successfully translated into complete coverages, the data was loaded into Librarian. Librarian is Esri's map management sub-system within ArcInfo. It is a tile-based system allowing users to insert their data into a single repository for easier access and management.

One application of the Librarian data was to organize and standardize the data for eventual extraction and insertion into the County's SDE database. When attempting to do this, a problem was discovered: parcels that spanned tile boundaries were disappearing (Figure 5). The problem was traced to the COVER-ID. Librarian uses the COVER-ID when extracting polygons to determine which polygons are the same on opposite sides of tile boundaries. To overcome the problem, a series of post-processing steps was carried out. Key among these steps was setting all of the COVER-IDs to the same value and then using IDEDIT to propagate these changes to all coverage files. This appeared to remedy the problem, and subsequent extractions were complete.

Figure 5

 

6. Conclusion

As expected, there were obstacles in the conversion process. We encountered four notable impediments to a smooth translation. First, because GDS evolved from the CAD environment, its graphic component retained many of the characteristics of CAD-based drawing packages. This presented difficulty in conversion because polygons that were functionally closed in GDS were not necessarily closed when exported to Arc/Info. Consequently, there were cases where polygons were not built where they should have been, or the resulting Arc/Info coverage contained many dangles and overshoots.

Secondly, GDS and Arc/Info have radically different models of topology. Topological structures in GDS were not tightly coupled to the database as they are in Arc/Info, which caused problems. GDS lacked an effective and efficient method of associating the Arc/Info-generated label point with its surrounding polygon.

Processor power was a factor as well. All GDS processing was done on the County's DEC Alpha server. The lack of geoprocessing utilities in GDS, such as an efficient means to find the centroid of a convex polygon, forced us to find each polygon, find the Arc/Info-generated label point inside, and then write the attribute data to the text file. This was a very CPU- and I/O-intensive operation. On the Arc/Info side, we began processing on a 333 MHz PC, switched over to a 450 MHz PC, and for the final bulk of the translation used the County's Compaq ProLiant 7000 server to speed the processing.

Finally, the size of the geographic areas that could be successfully processed in GDS severely constrained the translation approach. Originally it was hoped that the County could be processed as four quadrants. However, due to a memory leak, the translation had to be conducted on smaller areas. The final translation areas selected were 400-scale tiles.

7. Recommendations

Firstly, test the entire process from start to finish. Only when we attempted to load the finished data into an Arc/Info Library did we discover problems that forced us to re-engineer a significant portion of the process. Originally, we had planned to export very large, single-theme DXF files and build the coverages in Arc/Info. Problems during the insert forced us to move to a tile-based scheme.

Secondly, it is imperative to do a comprehensive pre-translation data inventory and clean-up. An inventory of the data showed the County what was really in the database, as opposed to what was thought to be in the database. Also, by performing some of the clean-up in GDS, the County was able to take advantage of the staff's existing skill set in existing applications. This made clean-up relatively easy, made the translation easier, and made the post-translation clean-up task much easier.

Lastly, we learned that it is important to think "multi-purpose" and to think "end-user". In section 3.1 we outlined the approach taken by PlanGraphics and Wake County for translating the existing GDS databases into Arc/Info. By "multi-purpose" and "end-user" we mean that, when designing the database, it is best to consider a design that is extensible and that allows the agency to extend GIS applications enterprise-wide. Careful consideration needs to be given not only to how the data is currently used, but also to ways in which the data might be employed so that other agencies can realize the benefits of GIS technology.

Steven Johnson

Software Developer

Wake County, North Carolina GIS

Sejohnson@co.wake.nc.us

(919) 856-6391

Bart Guetti

Systems Analyst

PlanGraphics, Inc.

Bguetti@plangraphics.com

(301) 588-8535