Kara L. Hastings and Mark A. White
The management of digital data demands the use of quality assurance and quality control (QA/QC) parameters throughout the duration of a project to maintain data integrity. With the increased dissemination of digital data, this aspect of geographic information system (GIS) management is especially important. The quality of digital data can be ensured with management focused on three fundamental areas: initial project management; quality assurance and quality control procedures throughout the project; and a final quality assurance check prior to data delivery. Managing a project in a GIS can be difficult without quality assurance and quality control. Because there can be subtasks, multiple workers, and various forms of output, the margin for error can be quite large. A poorly defined QA/QC protocol contributes to inconsistencies in data management. From conception to data delivery, a project should pass through a series of checks and established control measures. Quality assurance parameters define the range of acceptable values sought by these control measures. Project progress is directly proportional to the level of confidence in the data. This paper discusses quality control measures used to manage data effectively for the production of ESI atlases. These atlases are compilations of biological and socioeconomic resources and shoreline characteristics for the purpose of oil spill response. With numerous data sources, a complex data structure, and digital data deliverables, these atlases require strict quality assurance parameters to maintain a high level of confidence in the data.
Environmental Sensitivity Index (ESI) atlases are used as oil spill planning and response tools by the National Oceanic and Atmospheric Administration (NOAA), the U.S. Coast Guard, oil companies, and state agencies. Produced in both hardcopy and digital formats, they offer the user detailed environmental and socioeconomic information in a spatial GIS format. In addition, several INFO files, originally ORACLE® tables, are used to record specific information about each data layer in the GIS. Because of the way these tools are used, the data must be both accurate and precise, making quality control a necessary part of atlas production. However, defining the steps for this process can sometimes be a difficult task. Quality control can be described as "a two-stage process: (1) implementing techniques and procedures that attempt to reduce errors and eliminate mistakes; and (2) reviewing all completed work to identify and correct errors before any product is released" (Nugent, 1995). In another context, quality control is "delivering data which precisely meets the specifications of the user" (Fain, 1995). After developing the ESI data structure, experiencing recurrent and unique difficulties, and establishing a desired final product, Research Planning, Inc. (RPI) has created a QA/QC program that yields a final, high quality product. Now, some might ask, 'What is high quality?' Since the quality of the source data ultimately limits the quality of the final product, selection of digital and hardcopy data sources and the methods used to integrate these data are critical. In RPI terms, 'high quality data' is that which has had minimal integrity degradation or very low error propagation during the project tasks. Secondary methods of data collection, gathering data from existing documents (Thapa and Bossler, 1992), are used to compile digital and hardcopy information into a GIS.
Since secondary data collection also incorporates errors from primary data collection, data integrity may be threatened if quality control techniques are not implemented. The errors that might result from secondary methods of data collection, and that could be found in spatial data processing, are enumerated in Thapa and Bossler (1992).
These and other errors likely to result from ongoing GIS manipulations during the project are treatable and avoidable with QC tactics. However, as Thapa and Bossler (1992) and Campbell and Mortenson (1989) found, the standards and specifications for secondary data collection and information on QC measures for digital data are not readily found or easily developed.
The success of data integration and ESI atlas production is directly dependent upon a quality control program which is both preventative and pro-active from project conception to formal data delivery. Stringent QC standards and QA parameters increase the confidence level in the production of reliable digital data. This paper discusses the three phases of QC necessary to maintain the integrity of digital data and ensure that the end user receives a high quality ESI atlas. Initial project management, maintaining data integrity throughout the project, and a final quality assurance check with metadata compilation are the three quality control phases developed for ESI atlases.
The initial project management and set-up of an ESI atlas are the most important steps toward achieving total quality control. Understanding the potential for error in all tasks should be at the top of the to-do list even before the project starts. In addition, personnel management and job responsibilities must be established to ensure efficient and timely progress. In the initial project management phase of an ESI project, the Project Manager, GIS Coordinator, Biologist, and Geologist come together to approve the methodology for data acquisition and integration. Since RPI produced the first digital ESI atlas in 1989, the initial data structure and format of final data deliverables have become a solid regime of mandatory coverage names, item definitions, and specifications. The Environmental Sensitivity Index Guidelines (Halls et al., 1997) outline the general data structure, possible data layers, attribute labeling methods, and database design used for all ESI atlases. All possible geographic themes and attribute values are identified and general QC standards are explained. This document is the backbone of ESI production and is primarily responsible for the efficient incorporation of the vast amount of data required for the atlas. However, the more technical view of data manipulation and quality control is left to the discretion of the Project Manager and GIS Coordinator. To standardize data entry and atlas production between Project Managers and GIS Coordinators, process-specific documents, logs, and check-off sheets are introduced. Most of these tools are used in Phases II and III of the QC program; however, it is imperative that they be established in the beginning. A bound log book acts as a repository of all known information about the project, tracks progress, and documents worker accountability for data processing. Any changes to the data structure, new attribute values, and special concerns are recorded in the log book and flagged for special QC measures.
After careful consideration of properties unique to the project, such as scale, projection, and units, the Project Manager and GIS Coordinator develop quality assurance parameters which limit the range of accepted values for specified machine processes. These processes include registration of images, building/cleaning tolerances, weed/grain tolerances, and number of tics per coverage. The Project Manager sets up a schedule of events which includes checks of attribute databases and edit plots of coverages, so that the QC program has a temporal element as well as a technical one.
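The idea of quality assurance parameters as accepted ranges can be sketched in code. The following is an illustrative example only, not RPI's actual tooling; the parameter names and range values are hypothetical stand-ins for the project-specific limits described above.

```python
# Hypothetical QA parameter ranges; real limits depend on scale and units.
QA_PARAMETERS = {
    "registration_rms_m": (0.0, 10.0),   # RMS error for image registration
    "weed_tolerance_m": (0.0, 2.0),      # weed tolerance for digitizing
    "grain_tolerance_m": (0.0, 2.0),     # grain tolerance for digitizing
    "tics_per_coverage": (4, 20),        # number of tics per coverage
}

def check_parameter(name, value):
    """Return True if the measured value falls within its accepted range."""
    low, high = QA_PARAMETERS[name]
    return low <= value <= high

def audit(measurements):
    """Return the names of any parameters that fail their QA range."""
    return [name for name, value in measurements.items()
            if not check_parameter(name, value)]

failures = audit({"registration_rms_m": 12.3, "tics_per_coverage": 6})
print(failures)  # the out-of-range registration RMS is flagged
```

Expressing the ranges as data rather than scattering them through scripts makes the schedule of checks repeatable: the same audit can be rerun at each milestone the Project Manager sets.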
A set-up meeting is held for all those involved with atlas production to review guidelines for process steps and approve the set of project parameters. One of the most commonly overlooked steps of quality control is disseminating information to all individuals involved with the project. Making sure that all persons are up to date on methods, parameters, and procedures makes quality control and assurance far more effective. Issues such as data sources, required data layers, specific client requests, and projection information should be recorded in the log book and openly discussed. This phase could also be called Quality Awareness.
The second phase of quality control for ESI atlases is usually the most detailed and time consuming, lasting for the duration of data integration and GIS processing. Three major concerns during this phase are digitization consistency, coverage accuracy and precision, and attribute accuracy. QC is performed with the use of check-off sheets and technical standards. Beginning with scanning, cleaning, and registration of hardcopy maps, quality assurance parameters set in Phase I are used to keep tolerances, attribute values, item definitions, and feature classifications within desired limits. For example, the allowed root mean square (RMS) error for registration of images is 10 meters or less, based on unit and scale. An RMS error of 5 meters is optimal when transforming images to be used as basemaps. Digitization of biology and socioeconomic coverages begins with the creation of several data layers per map with their respective items. To control the quality of this process, macros are used to generate coverages and add the necessary items with the appropriate definitions. Without macros, generation of close to 500 coverages per atlas would be a tedious and time consuming venture with a very high probability of error. The step-by-step processes of digitization and data integration are outlined in the standards document, and records of the persons working on each task are kept. Examples of tasks in this phase which contain quality control measures are:
Projection parameters: All data must be in the correct projection for data analysis and map production. Each coverage must have defined projection parameters.
Logical consistency or topological integrity: Data sets are checked for label errors, slivers, pseudo nodes, and overlapping polygons with the same data. This check applies primarily to hardcopy data digitized in house; however, received digital data are also checked for logical consistency prior to integration.
Edgematching between coverages: This function is performed during the digitization process. As ESI atlases consist of many coverages per data layer, snapping and edgematching is imperative to reduce errors of omission and commission. Errors of omission include features which were overlooked during digitization and errors of commission are a result of inadvertently digitized polygons or slivers. Quality control for this step is accomplished by snapping nodes between coverages, mapjoining across data layers, and the production of edit maps to overlay and compare to source data.
Coverage attributes: All possible values for any item used in the ESI atlas should be known before attributing begins. Every item field should be populated with one of the accepted values. To check this, macros are run on all coverages and data sets to find missing or illogical information. An error file is written and the GIS Coordinator reviews and corrects all discrepancies.
Edit Plots: Two review edits are performed by internal personnel as well as source data providers. Internal edits and corrections are followed by delivering the processed data in review map form to the source data providers for correction and approval. Edits which may be encountered during this process include deletion or addition of polygons and points and their associated data, rearrangement of data, change of feature type, or even incorporation of new data. Because of this wide range of possibilities, a last set of edit maps is assembled and checked before final map production begins.
ORACLE® database updating: Besides attributing the coverage features with an ID number and general feature type, the bulk of information is contained in an ORACLE® database, related to the coverages by an ID number. Quality control of this database is managed with macros which relate each coverage to the database and check for any discrepancies. In addition to automated QC measures, the content of the data is reviewed by in-house and out-of-house personnel and corrections are made to the database.
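The attribute-domain checks described above (every item field populated with an accepted value, discrepancies written to an error file) can be sketched as follows. This is a minimal illustration, not RPI's actual macros; the item names, domain values, and record layout are hypothetical.

```python
# Hypothetical attribute domains; real ESI item definitions differ.
ACCEPTED_VALUES = {
    "ESI": {"1A", "3B", "6A", "10A"},
    "TYPE": {"BIRD", "FISH", "HABITAT"},
}

def find_attribute_errors(records):
    """Yield (record_id, item, bad_value) for any value outside its domain,
    mimicking the error file reviewed by the GIS Coordinator."""
    for rec in records:
        for item, domain in ACCEPTED_VALUES.items():
            value = rec.get(item)
            if value not in domain:
                yield rec["ID"], item, value

records = [
    {"ID": 1, "ESI": "3B", "TYPE": "BIRD"},
    {"ID": 2, "ESI": "9Z", "TYPE": "FISH"},   # illogical ESI code
]
errors = list(find_attribute_errors(records))
print(errors)
```

Because the accepted values are enumerated up front, as the guidelines require, missing fields and illogical entries are caught by the same mechanical pass rather than by visual inspection.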
The final stage of quality control addresses digital data delivery. Final digital data are delivered to the client as master coverages for each data layer and related INFO files which are imported from ORACLE®. The mapjoining of master coverages offers another opportunity to ensure the quality of data by bringing out errors not apparent in an edit plot. Errors of omission and commission are sometimes missed in individual coverage checks but easily identified in master coverages. It is recommended that the review of master coverages be performed by a person other than the coverage originator. Frequencies are run on each data layer to examine its attributes and locate any data entry errors. Corrections are made to both the master and the original map coverages. A series of SQL® scripts analyzes the ORACLE® tables for inaccurate data entry, missing data, and table relationships. The scripts analyze each polygon or point in each data layer and its related records in ORACLE® tables. A report file is generated describing the condition of the coverages and/or tables. If all checks are performed properly during Phase II, the edits at this point will be minimal. Up to this point, all quality control measures for data coverages have been based on region features. However, if a client cannot support region topology, RPI will convert the data to polygon topology.
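The frequency check can be illustrated with a short sketch. This is a hedged example of the general technique, not the actual ARC/INFO FREQUENCY command: tallying every attribute value across a master coverage makes one-off data entry errors stand out, since a value that occurs once among many repetitions is worth inspecting.

```python
from collections import Counter

def frequency(values, rare_threshold=1):
    """Count attribute values and flag those at or below the rarity
    threshold as candidate data entry errors."""
    counts = Counter(values)
    rare = {v: n for v, n in counts.items() if n <= rare_threshold}
    return counts, rare

# "38" is a plausible keying error for the ESI code "3B" (hypothetical data).
values = ["3B", "3B", "3B", "38", "3B"]
counts, rare = frequency(values)
print(rare)  # {'38': 1}
```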
The conversion from region topology to polygon topology is achieved using macros. This process step demands another check of final data integrity. Since the actual data structure of the coverage is changed, it is necessary to relate the master polygon coverages back to the INFO files. Relates are used to find missing or duplicate data or errors in data entry. A check-off sheet is used to document the completion of all coverages, data tables, and lookup tables (Fig. 1). If the structure of the coverages and tables was set up properly during Phase I of the project, this final assurance check prior to data delivery is swift and makes metadata production easier.
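The relate check described above can be sketched under assumed table shapes. This illustration is not RPI's actual relate macro; it simply shows the logic of matching feature IDs in a master polygon coverage against records in the imported INFO file to surface missing and duplicate entries.

```python
from collections import Counter

def relate_check(coverage_ids, info_ids):
    """Return IDs present in the coverage but missing from the INFO file,
    and IDs duplicated within the INFO file."""
    info_counts = Counter(info_ids)
    missing = [i for i in coverage_ids if i not in info_counts]
    duplicates = [i for i, n in info_counts.items() if n > 1]
    return missing, duplicates

# Hypothetical IDs: feature 103 has no INFO record; 102 is entered twice.
missing, dups = relate_check([101, 102, 103], [101, 102, 102])
print(missing, dups)  # [103] [102]
```

Running the same check in both directions (coverage against table, table against coverage) would also expose orphaned INFO records whose features were deleted during editing.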
Metadata makes quality control possible even after digital data has been taken off line. It gives the next data user the potential for a better quality product. Each section included in metadata gives information to assess the usability, accuracy, potential development, and applicability of the data set for his/her purpose. Use of metadata will ultimately keep time, expense, and stress to a minimum and allow the next user to develop a high quality product.
FIGURE 1. Final data structure check-off sheet.
With the implementation of quality control techniques into ESI atlas production, final hardcopy and digital products have become more accurate and trusted tools for oil spill response and planning. Guidelines and standards serve as reference and teaching materials, while logs and check-off sheets are a means to track progress and worker accountability. However, the most important aspect of quality control is getting the necessary information to workers so that they might do the job right the first time.
RPI has made great efforts to standardize procedures and develop QA/QC programs for all GIS projects, which has enabled the production of high quality data. However, as GIS technology is a continuously changing discipline, new methods are essential to project success.
We would like to thank the GIS Department at RPI, Bill Holton, Chris Locke, and Joanne Halls, for comments on this paper and their continued efforts to improve quality control methods at RPI. In addition, we would like to thank Todd Montello, Coastal Geomorphologist, for his ideas and insights.
Campbell, W.G. and D.C. Mortenson, 1989, Ensuring the Quality of Geographic Information System Data: A Practical Application of Quality Control. Photogrammetric Engineering & Remote Sensing, 55:1613-1618.
Fain, M.A., 1995, Quality Assurance: How to Build QA into the Conversion Process. Esri 1995 Proceedings Publication.
Halls, Joanne, Jacqueline Michel, Scott A. Zengel, and Jeffrey A. Dahlin, 1997, Environmental Sensitivity Index Guidelines (Vers. 2.0): NOAA Technical Memorandum NOS ORCA 92, Hazardous Materials Response and Assessment Division, National Oceanic and Atmospheric Administration, Seattle, Wash., 86 pp. plus appendices.
Nugent, J.L., 1995, Quality Control Techniques for a GIS Database Development Project. Photogrammetric Engineering & Remote Sensing, 61:523-527.
Thapa, K. and J. Bossler, 1992, Accuracy of Spatial Data Used in Geographic Information Systems. Photogrammetric Engineering & Remote Sensing, 58:835-841.