Using Digital Quality Assurance Routines for Checking Converted Data


Janette Lovely

Abstract: One of the most important and challenging tasks of GIS implementation is performing quality assurance and quality control (QA/QC) of converted data. Pressure from schedules, a significant increase in workload, and the tediousness of the task all contribute to the difficulty of allotting QA/QC the attention it deserves. Fortunately, digital quality assurance routines can be developed to perform many routine checks and analyses. This paper will describe the process and illustrate the Arc Macro Language (AML) scripts that were used for quality assurance of coverage, SDE, and SQL Server data at the City of Franklin, WI.


Background

The City of Franklin, Wisconsin, initially retained PlanGraphics, Inc., an independent geographic information systems (GIS) consulting firm, to assess existing geographic information used within several cooperating agencies that could be incorporated into a city-wide GIS. After inventorying, designing, and planning for the Enterprise GIS, the City began its implementation. As a first step, the City contracted for conversion of existing spatial data and hard copy maps into ArcInfo coverage and ArcSDE layer formats and for conversion of attribute data into Microsoft SQL Server 2000 format. Digital data consisted of geodetic control, planimetric/topographic, cadastral, street centerline addresses, water system, sanitary sewer system, and property attribute data. The spatial data was primarily stored in GenaMap file format, with some data in AutoCAD format. Property attribute data was stored in a Computer Aided Mass Appraisal (CAMA) system UNIVERS database maintained by the Assessor’s office. Hard copy maps detailed zoning, selected administrative boundaries, and neighborhood areas. The data converted in this project was obtained from several sources—the Milwaukee County Automated Mapping and Land Information System (MCAMLIS) program, the City of Franklin Public Works Department, the City of Franklin Assessor’s Office, and the Southeast Wisconsin Regional Planning Commission (SEWRPC).

PlanGraphics’ task during the data conversion process was to ensure that the converted data complied with the physical database design previously commissioned by the City and that the data itself was accurately converted when compared to the original source data.

Quality Assurance Overview

The success and timely completion of the GIS data conversion project for the City of Franklin depended on the establishment of an efficient process for transmitting, reviewing, and accepting products from the contractor. The long-term value of the GIS data conversion project was also dependent on comprehensive and thorough quality checking (i.e., quality assurance) performed uniformly on all components of the delivered data.

The contractor delivered the converted data in five delivery areas based on U.S. Public Land Survey System (PLSS) sections, along with the original source data and digital orthophotos (DOQs). PlanGraphics reviewed each delivery according to an established QA process, which included automated checks performed in ArcInfo and manual checks performed in ArcView. These tests examined edgematching of features and attributes between adjacent map sheets, topology (i.e., connectivity between graphic elements, contiguity between adjacent boundaries, and calculation of area or length), line quality (i.e., smooth transitions from straight line to curvilinear segments, etc.), feature coding, and feature types.

Acceptance of the converted data was based on one of four outcomes:

    1.   Rejected—The converted data had too many structural errors and required general rework.

    2.   Corrections to be made by the Contractor—The contractor was required to make specified corrections to the data before acceptance.

    3.   Accepted—The converted data was accepted as submitted.

    4.   Corrections to be made by the City of Franklin—The submittal was acceptable; however, minor corrections were noted that would be made by the City of Franklin.

Methodology

The data conversion review was performed in two stages—Pilot Study Review and Final Digital Review (Figure 1). A pilot study area of one PLSS section with representative layers was selected for initial data conversion. After acceptance of the pilot study area, the contractor delivered the remaining digital data in four delivery areas, each of which also incorporated all previous delivery areas. The data in each delivery was verified using the QA process. The data in the most recent delivery was also checked against the previous delivery to verify that all required changes were made and that no problems were created with the conversion of the new data.

Figure 1. Data Conversion Review


QA Process

The process of verifying the converted data fell into two categories—visual and automated. Both methods were used to fully examine the delivered data. A brief overview of the visual process is explained in the following section. The automated process will be described in detail in the remainder of this document.

Visual Process

A comprehensive visual examination of the converted data was performed in ArcView 3.2a to—1) verify feature placement; 2) verify data elements; 3) verify feature coding; 4) compare the converted digital files against the original graphic and attribute files; and 5) assess the quality of data provided by the contractor. Any errors or discrepancies were flagged in a new QA shapefile and reported to the contractor.

The visual QA process involved an examination of the features of each ArcInfo coverage, including annotation placement/attribution and edgematching between PLSS quarter sections, compared to the original source data. The visual assessment was performed systematically across the extent of the coverage via panning and zooming methods. As the coverage was assessed, the corresponding original source data was viewed in the ArcView map window or in hardcopy format. The SDE layer was then visually compared for feature content and placement versus its associated ArcInfo coverage. Checkplots were also created as needed.

The visual QA process also included an examination of the SQL Server attribute tables. The assessment was performed in MS SQL Server Enterprise Manager by opening each table to verify that the table contained valid data per column. Again, any errors or discrepancies were noted and reported to the contractor.

Automated Process

The automated QA process used in the data conversion project utilized batch files and Arc Macro Language (AML) routines to perform systematic checking of spatial data and attribute tables in ArcInfo 8.1 Workstation. The automated QA process was used to validate the attribute parameters established in the physical database design and to confirm that all major system files necessary to access, display, and manipulate the graphic data, such as feature class tables, were installed. Reports were generated for each of the checks performed during the process.

AML was chosen as the programming language for the QA process since known routines were already available that could be adapted to the Franklin project. These routines were used as the base for the development of the QA AMLs.

Table 1 lists the batch files and AMLs used in the automated QA process.

Table 1. Automated QA Process Routines

QA Check          Routine            Description
ArcInfo Coverage  covqa.bat          Initializes the ArcInfo coverage AMLs
                  covqa_setup.aml    Creates output CSV and TXT files for coverage results
                  covqa.aml          Checks the coverage against the design
SDE               sdeqa.bat          Initializes the SDE AMLs
                  sde_setup.aml      Creates an output CSV file for SDE results
                  sde_extract.aml    Extracts information from the table schema
                  sdeqa.aml          Checks the SDE layer against the design
SQL Server        sqlqa.bat          Initializes the SQL Server table AMLs
                  sqlqa_setup.aml    Creates an output CSV file for SQL Server table results
                  sql_extract.aml    Extracts information from the table schema
                  sqlqa.aml          Checks the SQL Server attribute table against the design


ArcInfo Coverage

The ArcInfo coverage parameters that were checked during the automated QA review included feature class, precision, projection, dangle distance, fuzzy tolerance, polygon topology errors, item definitions, annotation subclasses, labels, and dangle errors. In all, 58 coverages were checked during the process.

The COVQA.AML verified the coverage specifications detailed in the physical database design by first describing the coverage, comparing the results to the design, and generating a report listing any errors found. The AML checked for the following information:

    1.   Double precision. The coverage had double precision.

    2.   Map projection. The coverage was in Wisconsin State Plane NAD27 coordinates.

    3.   Dangle errors. There were no dangle errors, which might indicate polygon closure problems.

    4.   Duplicate or missing point labels. Each polygon had a single label point.

    5.   Existence of inappropriate feature classes and feature attribute tables. There were no missing feature classes or attribute tables or any additional feature classes.

    6.   Feature attribute table. The feature attribute table was formatted correctly.

    7.   Annotation levels and subclasses. The annotation subclass was correct.

    8.   Tolerances. The specified tolerance was used in processing the coverage.

Two reports were output detailing the AML results. One report, output for each coverage checked, detailed the results in an easy-to-read text format. The other report provided a summary of all of the results in CSV format. An example of the detailed report is found in Table 2, while Table 3 presents an example of the summary results. A similar summary results table was also output during the SDE and SQL QA processes, described later in this document.

Table 2. Example of Detailed Report

Date: 03 Dec 01 14:40:11 Monday
Workspace: e:\franklin_data_conversion\area_4\qa_reports\jlovely\cov_qa
Coverage: aldermanic

Projection Information
  Projection: STATEPLANE
  FIPS Zone: 4803
  Datum: NAD27
  Units: FEET
  Spheroid: CLARKE1866

  Coordinate System Test
    PASS

Precision
  Precision of Coverage: DOUBLE

  Precision Test
    PASS

Tolerances
  Fuzzy Tolerance: 0.1
   Coverage processed with fuzzy tolerance

  Dangle Distance: 0
   Coverage processed with dangle distance

  Tolerance Test
    PASS

Dangles
  Number of Dangles: 0

  Dangle Test
    PASS

Label Errors
  Number of Missing Labels: 0
  Number of Multiple Labels: 0

  Labels Test
    PASS

Annotation
  Annotation: N/A

Topology
  Polygon Topology is present

  Topology has not been edited

  POLYGON Topology Test
    PASS

Attributes
  POLYGON Feature Table present

  POLYGON Feature Table Test—Item Number
  Number of Items Match...Continuing with item format check

  POLYGON Feature Table Test—Item Format
  Items match

  POLYGON Feature Table Test
    PASS



Table 3. Summary Results

COV QA Results

Date: 03 Dec 01 14:40:11 Monday

Table       Test                        Status  Error  Layer Description  Design Description
ALDERMANIC  Coordinate System Test      PASS
ALDERMANIC  Precision Test              PASS
ALDERMANIC  Tolerance Test              PASS
ALDERMANIC  Dangle Test                 PASS
ALDERMANIC  Labels Test                 PASS
ALDERMANIC  Annotation Test             N/A
ALDERMANIC  POLYGON Topology Test       PASS
ALDERMANIC  POLYGON Feature Table Test  PASS


Process

The first step in the coverage QA process was to create a text file for each feature class, detailing the expected attributes. The required name and format for each item within the feature class were obtained from the database design. Table 4 provides an example of the information contained in the text file. During the automated QA process, as each coverage in a delivery unit was checked, an output text file was created for each feature class. This output file was then compared against the design text file created in the first step to verify that the converted data's item formats matched the database design.

Table 4. Item Format

AREA,8,18,F,5
PERIMETER,8,18,F,5
ALDERMANIC#,4,5,B
ALDERMANIC-ID,4,5,B
DIST_ID,1,1,I
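The comparison driven by files in this layout can be sketched in a few lines of code. The following Python snippet is an illustration only (the project itself used AML); the file names are hypothetical, but the line format matches Table 4.

```python
# Compare a design item-format file against one extracted from converted data.
# Each line follows the Table 4 layout: NAME,WIDTH,DISPLAY,TYPE[,DECIMALS]

def read_items(path):
    """Return {item_name: format_fields} from an item-format text file."""
    items = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            name, *fmt = line.split(",")
            items[name] = fmt
    return items

def compare_items(design, delivered):
    """Return sorted lists of missing, extra, and mismatched item names."""
    missing = sorted(set(design) - set(delivered))
    extra = sorted(set(delivered) - set(design))
    mismatched = sorted(n for n in design
                        if n in delivered and design[n] != delivered[n])
    return missing, extra, mismatched
```

An empty result for all three lists would correspond to a PASS in the feature table test; any non-empty list would be written to the reports as a FAIL with a description.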

The next step during the process was to initiate the COVQA.BAT file. The COVQA.BAT file launched the COVQA_SETUP.AML, which created a run list from which to process each coverage with the COVQA.AML. The list took the form of ARC "&r covqa.aml <coverage>". The list was generated from the directory of coverages. The COVQA_SETUP.AML then created the output CSV and TXT files for the QA results.
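The run-list generation step can be illustrated with a short script. The Python below is a hypothetical equivalent of what COVQA_SETUP.AML did, not the actual routine; the assumption (noted in the comments) is that each coverage is a subdirectory of the workspace other than the INFO directory.

```python
import os

def write_run_list(workspace, out_path, aml="covqa.aml"):
    """Write one 'ARC "&r covqa.aml <coverage>"' line per coverage.
    Assumption: a coverage is any subdirectory of the workspace other
    than INFO (where ArcInfo keeps its attribute tables)."""
    with open(out_path, "w") as out:
        for name in sorted(os.listdir(workspace)):
            if name.lower() == "info":
                continue
            if os.path.isdir(os.path.join(workspace, name)):
                out.write('ARC "&r %s %s"\n' % (aml, name))
```

Executing the resulting file then runs the QA AML once per coverage, which is what allows an entire delivery area to be checked unattended.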

The final step during the process was to initiate the run list for the COVQA.AML from within the batch file. Once the COVQA.AML was launched, the AML performed the following steps:

    1.   Set initial path variables
    2.   Described the coverage
    3.   Opened output files
    4.   Opened comparison files
    5.   Began writing to output files
    6.   Checked projection information
    7.   Checked precision
    8.   Checked fuzzy tolerance
    9.   Checked dangle distance
    10.   Checked for dangles
    11.   Checked for missing or duplicate labels
    12.   Checked for correct annotation
    13.   Checked topology
    14.   Checked attributes
      a.   Output item information
      b.   Checked the number of items
      c.   Compared the design table to the converted data table
      d.   Checked the format of matching items
    15.   Closed comparison files
    16.   Closed output files.

After the AML process was complete, the technician interactively verified the results by spot-checking them against the coverage, viewing problem areas in ArcEdit and examining attributes in ArcInfo Tables. Any discrepancies found were sent to the contractor in charge of converting the data.

Algorithms

Examples of the processes that the coverage QA routines performed are listed below:

    1. Wrote to output files

      /* Write out header information

      &sv date = [Date -vfull]
      &sv outstat = [write %outfile% [quote Date: %date%]]
      &sv ws = [Show workspace]
      &sv outstat = [write %outfile% [quote Workspace: %ws%]]
      &sv outstat = [write %outfile% [quote Coverage: %cov%]]


    2. Checked coverage parameters using the DESCRIBE command (i.e., coordinate system, precision, tolerances, dangles, label errors, annotation, and topology)

      /* Check coordinate system

      &if %prj$name% = STATEPLANE and %prj$fipszone% = 4803 and %prj$datum% = NAD27 and %prj$units% = FEET and %prj$spheroid% = CLARKE1866 &then
        &call Test_write_pass
      &else
        &do
         &sv desc = Wrong coordinate system used
         &call Test_write_fail
        &end


      /* Check precision

      &if %dsc$precision% = DOUBLE &then
        &call Test_write_pass
      &else
        &do
          &sv desc = Precision is set to: %dsc$precision%
          &call Test_write_fail
        &end


      /* Check tolerances

      &if %dsc$qfuzzy_verified% &then
        &sv outstat = [write %outfile% [format '    Coverage processed with fuzzy tolerance']]
      &else
        &sv outstat = [write %outfile% [format '    Coverage not processed with fuzzy tolerance']]

      &if %dsc$qdangle_verified% &then
        &sv outstat = [write %outfile% [format '    Coverage processed with dangle distance']]
      &else
        &sv outstat = [write %outfile% [format '    Coverage not processed with dangle distance']]

      &if %dsc$fuzzy% = 0.10000 and %dsc$dangle% = 0 and %dsc$qfuzzy_verified% and %dsc$qdangle_verified% &then
        &call Test_write_pass
      &else
        &do
          /* Check fuzzy tolerance and dangle distance
          /* Check to verify processing with fuzzy tolerance and dangle distance
        &end


      /* Check dangles

      &if %dsc$polygons% > 0 AND %dsc$qtopology% &then
        &do
          /* Start ArcEdit session
          /* Select dangles
          &if %dangnum% = 0 &then
            &call Test_write_pass
          &else
            &do
              &sv desc = Number of Dangles: %dangnum%
              &call Test_write_fail
            &end
        &end


      /* Check polygon labels

      &if %dsc$polygons% > 0 AND %dsc$qtopology% &then
        &do
          /* Start ArcEdit session
          /* Select labels
          /* Output number of multiple or missing labels
        &end


      /* Check annotation

      &if %dsc$annotationsubclasses% > 0 &then
        &do
          /* Open the annotation list
          /* Compare the annotation file with the %dsc$annotationsubclasses% variable
          /* Output annotation information
        &end


      /* Check topology
      /* dsc$qedit is true if the coverage has been edited
      /* since topology was last built

      &if ^ %dsc$qedit% &then
        &do
          &sv outstat = [write %outfile% [format '     Topology has not been edited']]
          &call Test_write_pass
        &end
      &else
        &do
          &sv outstat = [write %outfile% [format '     Topology has been edited']]
          &sv desc = Topology has been edited since it was built
          &call Test_write_fail
        &end


    3. Compared the contents of text files containing the expected record layout vs. the record layout as delivered.

      /* Check items

      /* Check to see if there is a feature table in the design
      /* Initially open the feature table to count the number of items
      /* Create an item list for the feature
      /* Write the items to a text file
      /* Check the number of items in the design against the feature text file
      /* Fail or pass depending if number of items match
      &if Pass
        &do
          /* Compare the design text file to feature text file to see if an item is missing
          /* Compare the feature text file to the design text file to see if there are additional items
          /* Check the format of the items in the feature text file against the design text file
          /* Fail or pass depending if the item formats match
        &end
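The Test_write_pass and Test_write_fail routines called throughout the checks above are not shown in the excerpts. A hypothetical analog, written in Python for illustration, would append each result to both the detailed text report (Table 2 style) and the summary CSV (Table 3 style):

```python
def write_result(txt, csv, table, test, status, desc=""):
    """Append one test result to both open report files.
    txt: detailed report, gets an indented test/status pair as in Table 2.
    csv: summary report, gets Table,Test,Status,Error columns as in Table 3.
    This mirrors what the (unshown) Test_write_pass/Test_write_fail
    routines are assumed to have done."""
    txt.write("  %s\n    %s\n" % (test, status))
    csv.write("%s,%s,%s,%s\n" % (table, test, status, desc))
```

A passing test would call, for example, write_result(txt, csv, "ALDERMANIC", "Precision Test", "PASS"); a failing test would pass "FAIL" along with the error description stored in the desc variable.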


SDE

The SDE layer parameters that were checked during the QA review consisted of attribute items, item format, primary keys, foreign keys, and feature classes. The SDEQA.AML was used to verify the SDE layer specifications, as detailed in the physical database design, by comparing the layer to the design and generating a report listing any errors found (Table 5). In all, 96 SDE layers were checked for each delivery area during the process.

Table 5. SDE Summary Report

SDE QA Results

Date: 03 Dec 01 14:54:15 Monday

Table       Item              Status  Layer Description            Design Description
ALDERMANIC  ALDERMANIC table  PASS
NEWDEV      NEWDEV_KEY        FAIL    NEWDEV_KEY PK Numeric (8) N  NEWDEV_KEY PK Numeric (4) N


Process

As a first step in the SDE QA process, a text file was created for each SDE layer detailing required item names and formats from the database design. Table 6 depicts a typical text file created at this stage containing the following—column name, key type, data type, precision, scale, and column definition.

Table 6. Item Format

ALDRCT_SHP,,SE_INTEGER,N
DIST_ID,,Numeric (1),N
DIST_ID,PK,Numeric (1),N

Using SQL Server Query Analyzer, a text file was generated for the SQL Server tables in the database, detailing the same information shown in Table 6. Because of the output format of the text file generated from the SQL statements, it had to be reformatted in MS Access before it could be used in the QA process; a new text file was then generated.

An additional text file was created containing the number of rows that each SDE layer contained. The text file was also generated using SQL statements in SQL Server Query Analyzer.
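The row-count check that this file feeds can be sketched as follows. The Python below is illustrative only (the project performed this comparison in AML and ArcEdit, as shown later); the one-line-per-layer "LAYER,ROWS" file format and the pre-gathered coverage counts are assumptions.

```python
def check_row_counts(count_file, coverage_counts, out):
    """Compare SDE row counts against coverage feature counts.
    count_file: text file with one 'LAYER,ROWS' line per SDE layer.
    coverage_counts: {layer_name: feature_count} dict, assumed to have
    been gathered from the coverages beforehand.
    out: open summary file; gets PASS/FAIL lines in the Table 5 style."""
    with open(count_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            layer, rows = line.split(",")
            if int(rows) == coverage_counts.get(layer, -1):
                out.write("%s,%s table,PASS\n" % (layer, layer))
            else:
                out.write("%s,Number of features do not match,FAIL\n" % layer)
```

A mismatch in either direction (missing features or extra features) fails the layer, matching the PASS/FAIL wording used in the SDE summary report.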

The next step in the process was to initialize the SDEQA.BAT file. The batch file launched the SDE_SETUP.AML, which created a run list from which to process the information for each SDE layer. The list took the form of ARC "&r sdeqa.aml <sde layer>". The list was generated from a directory of files that were imported into SQL Server. The SDE_SETUP.AML then created the output CSV file for the QA results.

The SDEQA.BAT file then launched the SDE_EXTRACT.AML, which extracted and reformatted information from the table schema text file generated from SQL Server. The extracted information was written to individual text files, one for each SDE layer.

The final step in the SDE QA process was to initiate the run list for the SDEQA.AML from within the batch file. The AML performed the following steps:

    1. Set initial path variables
    2. Opened output files
    3. Opened comparison files
    4. Began writing to output files

      a. Checked the number of items
      b. Compared the design table to the converted data table
    5. Checked the format of matching items
    6. Checked row number
    7. Closed comparison files
    8. Closed output files.

After the AML process was complete, the technician interactively examined the results in SQL Server to verify that they were accurate. Any discrepancies found were sent to the contractor in charge of converting the data.

Algorithms

Examples of the processes that the SDE QA routines performed are listed below:

    1. Wrote to output files

    2. Checked the number of rows in the SDE layer against the number of records in the corresponding coverage

      /* Check to see if record numbers match between SDE layer and coverage

      /* Start ArcEdit session
      /* Select features
      &if %recnum% = %selnum% &then
        &sv outstat = [write %outfile% [quote %sde%,%sde% table,PASS]]
      &else
        &sv outstat = [write %outfile% [quote %sde%,Number of features do not match,FAIL]]


    3. Compared the contents of text files containing the expected record layout vs. the record layout as delivered.

      /* Check items

      /* Initially open the SDE text file and design text file to count the number of items
      /* Check the number of items in the design text file against the SDE text file
      /* Fail or pass depending if number of items match
      &if Pass
        &do
          /* Compare the design text file to the SDE text file to see if an item is missing
          /* Compare the SDE text file to the design text file to see if there are additional items
          /* Check the format of the items in the SDE text file against the design text file
          /* Fail or pass depending if the item formats match
        &end


SQL Server

The SQL Server attribute table parameters that were checked during the QA review consisted of attribute items, item format, primary keys, and foreign keys. The SQLQA.AML was used to verify the SQL attribute table specifications, as detailed in the physical database design, by comparing the table to the design and generating a report listing any errors found (Table 7). In all, 30 attribute tables were checked during the process.

Table 7. SQL Summary Report

SQL QA Results

Date: 03 Dec 01 16:18:17 Monday

Table        Item               Status  Layer Description            Design Description
BLDG_ADDRSS  BLDG_ADDRSS table  PASS
CONDO        UNIT_NUM           FAIL    UNIT_NUM PK Character (4) N  UNIT_NUM PK Character (4) N


Process

As a first step in the SQL QA process, a text file was created for each SQL attribute table detailing the item format from the database design. Table 8 depicts a typical text file created at this stage containing the following—column name, key type, data type, precision, scale, and column definition.

Table 8. Item Format

BLDG_KEY,,LONG INT,N
BLDG_KEY,FK,LONG INT,N
BLDG_KEY,PK,LONG INT,N
PRCLADD_KEY,,Numeric (9),N
PRCLADD_KEY,FK,Numeric (9),N
PRCLADD_KEY,PK,Numeric (9),N

The next step in the process was to initialize the SQLQA.BAT file. The batch file launched the SQLQA_SETUP.AML, which created a run list from which to process the information for each SQL attribute table. The list took the form of ARC "&r sqlqa.aml <sql attribute table>". The list was generated from a directory of files that were imported into SQL Server. The SQLQA_SETUP.AML then created the output CSV file for the QA results.

The SQLQA.BAT file then launched the SQL_EXTRACT.AML, which extracted and reformatted information from the table schema text file generated from SQL Server as described in the SDE QA process. The extracted information was written to individual text files, one for each SQL attribute table.

The final step in the SQL QA process was to initiate the run list for the SQLQA.AML from within the batch file. The AML performed the following steps:

    1. Set initial path variables
    2. Opened output files
    3. Opened comparison files
    4. Began writing to output files

      a. Checked the number of items
      b. Compared the design table to the converted data table
    5. Checked the format of matching items
    6. Closed comparison files
    7. Closed output files.

After the AML process was complete, the technician interactively examined the results in SQL Server to verify that they were accurate. Any discrepancies found were sent to the contractor in charge of converting the data.

Algorithms

Examples of the processes that the SQL QA routines performed are listed below:

    1. Wrote to output files

    2. Compared the contents of text files containing the expected record layout vs. the record layout as delivered.

      /* Check items

      /* Open the SQL text file and design text file to count the number of items
      /* Check the number of items in the design text file against the SQL text file
      /* Fail or pass depending if number of items match
      &if Pass
        &do
           /* Compare the design text file to the SQL text file to see if an item is missing
           /* Compare the SQL text file to the design text file to see if there are additional items
           /* Check the format of the items in the SQL text file against the design text file
           /* Fail or pass depending if the item formats match
        &end


QA Transmittals

Results of the QA process were submitted to the contractor in digital form. Acceptance/Rejection of a delivery area was communicated to the contractor using the results found in the visual and automated QA processes. The QA process was finalized after all delivery areas were accepted and a composite of the City was delivered.

Conclusion

Automated and visual QA procedures that were utilized in the data conversion process for the City of Franklin, WI, assisted in the assessment of converted data. The visual QA process assessed the converted data versus the original source data. The automated QA process assessed the converted data against the established physical database design, checking the ArcInfo coverages, SDE layers, and SQL Server attribute tables with AMLs. Using the AMLs greatly decreased the time the assessment would otherwise have taken and increased confidence that the data was accurately and completely checked.

This effort also demonstrated that AML can serve as a tool for a wide range of validation tests of data files, formats, and contents. Examples of the techniques used in these reviews include:

    1. Writing to output files
    2. Comparing the contents of text files containing the expected record layout vs. the record layout as delivered
    3. Checking coverage parameters using the DESCRIBE command (i.e., coordinate system, precision, tolerances, dangles, label errors, annotation, and topology)
    4. Checking the number of rows in the SDE layer against the number of records in the corresponding coverage.

AML was found to be a flexible tool that was well suited for this type of QA, both in its programmatic capabilities and its ability to extract information from coverages and text files. The type of routines that were used in this project can be adapted to work with other such projects that utilize ArcInfo coverages and text-based information.

Acknowledgements

PlanGraphics’ Project Team—Franki Jenkins, Annie Metcalf, Tami Rudnick, and Mike Walls


Janette Lovely
Senior Research Analyst
PlanGraphics, Inc.
112 East Main Street
Frankfort, KY 40601
Ph: (502) 223-1501, ext. 2265
jlovely@plangraphics.com