Yichun Xie and George D. Graettinger

AN INTEGRATED ARCVIEW EXPERT SYSTEM FOR ANALYZING CONTAMINATED SEDIMENTS IN THE GREAT LAKES BASIN

Abstract

The U.S. Environmental Protection Agency (USEPA), in conjunction with the U.S. Army Corps of Engineers (USACE), the Michigan Department of Environmental Quality (MDEQ), and Eastern Michigan University, is developing a desktop GIS for managing, analyzing, and visualizing contaminated sediments in tributaries, harbors, and coastal zones of the Great Lakes. The system is designed to meet the need for a common database and a set of versatile analytical tools that support quality assurance, regulatory and enforcement activities, and in-field sampling operations for State and Federal government agencies. The system is built on Esri's ArcView 2.1, customized Avenue scripts, and specialized environmental analytical modules.

The system applies expert system technology to integrate administrative and professional intelligence with the technical capacity of ArcView 2.1. It has a built-in machine learning mechanism that lets users accumulate knowledge and draw on past system experience when formulating analytical tasks. With this added functionality, the system provides powerful analytic tools and flexible query builders for examining pollutant issues under various scenarios in priority geographic areas. The system also includes a customized field-data-entry module to facilitate in-field sampling activities. This module imports GPS readings, updates sampling site maps, and links with other environmental factors in real time. It can be used to locate in-field sampling sites for pollutant hot-spot searches through integration with the USEPA statistical package. The module also supports data entry of in-field laboratory results: it recognizes the database structure, transfers common information (key table fields) to relevant tables, and automatically leads users to the next appropriate table in the hierarchy once data entry for a table is done. Compulsory fields must be entered to create a record, while other fields can be filled out in greater detail at a later time as warranted. The system is an all-purpose visualization tool box that supports large-scale (currently up to 1:1,200) multi-media display of maps, images, photos, graphics, and drawings depicting the environmental impacts of contaminated sediments. In addition to the contaminated sediment data, data on sources, hydrology, transportation, hypsography, and land use are being developed to enhance system capabilities.


     Introduction



          Beginning in 1993, the U.S. Army Corps of

     Engineers - Detroit District (USACE-Detroit) developed

     a relational database to manage sediment chemistry data

     collected in the major waterways in Southeast Michigan. 

     This work was performed to support the U.S.

     Environmental Protection Agency - Region 5 (USEPA-

     Chicago), and the Michigan Department of Environmental

     Quality (MDEQ) for quality assurance, regulatory and

     enforcement activities in and out of the field with the

     use of applied GIS technology, under the South East

     Michigan Initiative (called SEMI Project).  The SEMI

     project is expected to ultimately provide each of the

     three agencies with a comprehensive geographic

     information system (GIS) as well as a compatible

     database for importing, analyzing, modeling,

     visualizing, and reporting sediment sampling and

     contaminant information for the southeast Michigan

     region.  Eastern Michigan University (EMU) has been

     called in to provide GIS technical support to the SEMI

     system.  The SEMI project area includes the following

     counties in Michigan: Lenawee, Livingston, Macomb,

     Monroe, Oakland, St. Clair, Washtenaw, and Wayne, and

     the watersheds: St. Clair, Black, Clinton, Detroit,

     Rouge, Huron, and Raisin Rivers, and Lake St. Clair.

     
Fig. 1 The SEMI Project

     Area Basemap
 



          Recently, the need to integrate decision

     support capacity into the SEMI system has become

     pressing.  As a result, the SEMI system has been

     formally replaced by the Fully Integrated Environmental

     Location Decision Support (FIELDS) System.  The FIELDS

     system is designed to be an innovative and cost

     effective desk-top GIS system to support objective

     decision making in and out of the field for managing,

     analyzing, modeling and visualizing environmental

     information within priority areas and sites (Williams,

     Lin, and Graettinger, 1996).  A three-phase FIELDS

     system prototype for Southeast Michigan (the original

     SEMI system) is scheduled to go on-line by September

     1996.  This paper introduces the major technical

     innovations and capacities of the

     FIELDS system.



     A Desktop GIS for Multiple Applications



          The FIELDS system provides a set of dynamic and

     powerful GIS tools for importing, analyzing, modeling,

     visualizing, and reporting environmental information to

     assist decision-making in channel dredging, quality

     assurance, regulatory and enforcement activities.  In

     addition to popular GIS functionalities of spatial data

     management, mapping and analysis, the FIELDS system is

     designed to fulfill specific agency analytical tasks,

     including:

          (1) identification of pollutant hot spots;

          (2) query of hot-spot laboratory results and

     related environmental information;

          (3) volumetric calculation of contaminated

     sediment;

          (4) estimate of engineering cost of sediment

     removal and dredging; and

          (5) customized visualization tools to communicate

     decisions.

     
Fig. 2 Customized

     Multi-Step Hot-Spot Search and Query


          Fig. 2 shows an example of using the FIELDS system

     to search for hot spots of contaminated sediment and to

     query laboratory measurement results and related

     environmental information in Monroe Harbor, Michigan.

     This customized analytical function involves multi-step

     queries through dynamic joins and links among a number

     of data sets (see Fig. 8).

     
Fig. 8 The Database

     Structure of the FIELDS System
     We first search for hot spots among all sampling

     sites by linking and querying the pollutant laboratory

     measurement results, and create an ArcView hot-spot

     shapefile (the red dots in Fig. 2).  Then we link the

     newly created hot spots with the laboratory results and

     other environmental information to conduct further

     queries for subjective decision making (the yellow dot

     in Fig. 2).
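The two-step search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper's implementation is in Avenue against ArcView tables, and the field names (`station_id`, `chemical`, `value`), the sample values, and the threshold used here are hypothetical rather than taken from the FIELDS schema.

```python
from collections import defaultdict

# Hypothetical tables; field names and values are illustrative only.
stations = [
    {"station_id": "S1", "x": 100.0, "y": 200.0},
    {"station_id": "S2", "x": 150.0, "y": 260.0},
    {"station_id": "S3", "x": 180.0, "y": 240.0},
]
lab_results = [
    {"station_id": "S1", "chemical": "PCB", "value": 0.4},
    {"station_id": "S2", "chemical": "PCB", "value": 12.0},
    {"station_id": "S2", "chemical": "Lead", "value": 95.0},
    {"station_id": "S3", "chemical": "PCB", "value": 3.5},
    {"station_id": "S3", "chemical": "Lead", "value": 310.0},
]


def find_hot_spots(stations, lab_results, chemical, threshold):
    """Step 1: keep stations with any result for `chemical` above `threshold`."""
    flagged = {
        r["station_id"]
        for r in lab_results
        if r["chemical"] == chemical and r["value"] > threshold
    }
    return [s for s in stations if s["station_id"] in flagged]


def results_for(hot_spots, lab_results):
    """Step 2: link each hot spot back to all of its laboratory results."""
    by_station = defaultdict(list)
    for r in lab_results:
        by_station[r["station_id"]].append(r)
    return {s["station_id"]: by_station[s["station_id"]] for s in hot_spots}


hot_spots = find_hot_spots(stations, lab_results, "PCB", threshold=1.0)
linked = results_for(hot_spots, lab_results)
```

Here `hot_spots` plays the role of the hot-spot shapefile, and `linked` is the joined view that further queries would run against.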

     
Fig. 3 Volumetric Analysis

     for Dredging and Sediment Removal
     

          Fig. 3 demonstrates an important analytical task:

     the engineering estimate of sediment removal.  Because

     of the nature of field sampling and laboratory

     procedures, this is a difficult programming job.  Each

     hot spot may have tens of lab samples, and each lab

     sample may have hundreds of chemical tests.  The number

     of hot spots in a designated area, the number of lab

     samples for each hot spot, and the number of chemical

     tests for each lab sample all vary.  As a result,

     calculating the average depth of lower section samples

     and the average concentration of pollutants becomes

     complicated.  With customized scripting in the FIELDS

     system, we reduce this analytical task to a button and

     a few dialogue browsers.
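A minimal Python sketch of this kind of calculation follows. It is not the FIELDS script itself. It assumes a nested record layout (hot spot, then lab samples, then chemical tests), assumes that volume can be approximated as dredging area times average lower-section depth, and assumes a flat unit cost per cubic meter. All names and numbers are hypothetical.

```python
from statistics import mean

# Hypothetical nested records: hot spot -> lab samples -> chemical tests.
# The number of samples and tests varies from one hot spot to the next.
hot_spots = {
    "HS1": [
        {"lower_depth_m": 1.2, "tests": {"PCB": 10.0, "Lead": 80.0}},
        {"lower_depth_m": 0.8, "tests": {"PCB": 14.0}},
    ],
    "HS2": [
        {"lower_depth_m": 2.0, "tests": {"PCB": 6.0, "Mercury": 0.5}},
    ],
}


def summarize(hot_spots, chemical):
    """Average lower-section depth and average concentration of one chemical."""
    depths, concentrations = [], []
    for samples in hot_spots.values():
        for sample in samples:
            depths.append(sample["lower_depth_m"])
            if chemical in sample["tests"]:
                concentrations.append(sample["tests"][chemical])
    return mean(depths), mean(concentrations)


def removal_estimate(area_m2, avg_depth_m, unit_cost_per_m3):
    """Assumed model: volume = area x average depth, cost = volume x unit cost."""
    volume = area_m2 * avg_depth_m
    return volume, volume * unit_cost_per_m3


avg_depth, avg_pcb = summarize(hot_spots, "PCB")
volume, cost = removal_estimate(
    area_m2=500.0, avg_depth_m=avg_depth, unit_cost_per_m3=40.0
)
```

Flattening the nested records into two plain lists is what keeps the averaging simple regardless of how many samples or tests each hot spot carries. `mean` raises an error if no sample reports the requested chemical, which a production script would need to handle.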



          In addition to meeting specialized analytical

     needs of technical staff at USEPA, USACE and MDEQ, the

     FIELDS system also provides a simple means for novice

     users to visualize or query priority areas of great

     environmental concern.

     
Fig. 4 Public Inquiry of

     Environmental Concerns through the FIELDS System
     Fig. 4 illustrates how industrial activity near the

     estuary of a river has polluted a fish spawning

     site.



          Moreover, the FIELDS system handles spatial data

     at very large scales (normally 1:12,000), covering

     engineering details for maintaining navigation channels

     by the USACE.  In addition, the USEPA prefers more

     extensive sampling programs as well as more specific

     chemical sampling than is currently available, both

     requiring more accurate base maps.  As a result, the

     USEPA funded a Trenton Channel remapping project at a

     scale of 1:6,000.  Maps at such a large scale can show

     detailed shoreline, hydrology, transportation and

     structure features, including individual mapping of

     residential, commercial, industrial, and institutional

     constructions.  The features were collected directly

     from the digital orthophotography generated from the

     project with a horizontal accuracy of +/- 1 meter.  All

     digital vector data are stored in a schema derived from

     the Department of Defense Tri-Service Spatial Data

     Standards (USACE, 1995; USGS, 1993, 1995).  The FIELDS

     System provides a set of display tools that take full

     advantage of the detailed base maps, sampling points,

     orthophotos, and laboratory results, as well as GIS

     visualization power, to generate informative

     multi-media representations of contaminated sediments

     and related environmental factors (see most figures in

     the paper).

    

     An Expert System Integrating with User's Knowledge and

     Agency's Experiences



          The FIELDS database contains laboratory results of

     more than 200 chemicals, which are currently required

     to be examined by the USEPA, and in addition a dozen

     environmental factors, including hydrology,

     hypsography, transportation, land use/cover, soil,

     groundwater, and pollution sites.  Numerous ways can be

     employed to explore the interactions between pollution

     chemicals and environmental factors.  Moreover,

     governments, agencies and organizations involved with

     this project have shown varied interests in pollution

     issues or prevention actions, and thus have different

     requirements concerning query tools for examining the

     contaminated sediment and related environmental data. 

     It is very challenging to accommodate users of various

     political, operational, and technical backgrounds with

     a common system, which creates a critical need for a

     flexible query tool box.  The module built for this

     purpose is called the Query Knowledge Database (QKD)

     and is based on the concepts and techniques of expert

     knowledge, machine learning, and knowledge-base

     management.  QKD enables

     end-users to integrate their specific experience or

     knowledge when performing a particular analytical task. 

     Explained in non-technical (popular) terms, the sub-

     module On-Screen Query allows users to formalize

     analytical criteria interactively in the customized

     windows.

     
Fig 5 Customized Knowledge

     Acquisition Module


          The module Knowledge Acquisition records the

     analytical statements specified in the sub-module On-

     Screen Query, and stores them in a file called a

     Knowledge File for later reference.  This module

     also prompts users to answer a set of predefined

     questions (Fig. 5),

     
     which are appended to the knowledge file as "metadata" to

     facilitate knowledge database management.  This phase

     performs the machine learning function of the system.  The

     sub-module Knowledge Database shows users all existing

     knowledge files sorted by the involved agencies or

     projects and stored in the FIELDS System, allowing

     users to load, read, and modify these knowledge files

     as new query statements (Fig. 6).

     
Fig 6 Knowledge Database

     Helps Formulate Analytical Tasks
     The module Expert Query actually launches query actions.
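The knowledge-file workflow described in this section (record a query statement, attach metadata, list stored files by agency, then load and modify one as a new query) can be sketched as follows. This is an assumption-laden illustration, not the QKD implementation: the JSON file format, the function names, the metadata keys, and the query syntax are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_knowledge(folder, name, statement, metadata):
    """Store a query statement together with its metadata as one knowledge file."""
    record = {"statement": statement, "metadata": metadata}
    path = Path(folder) / f"{name}.json"
    path.write_text(json.dumps(record, indent=2))
    return path


def list_knowledge(folder):
    """Return (agency, file name) pairs sorted by agency, then by name."""
    entries = []
    for path in Path(folder).glob("*.json"):
        record = json.loads(path.read_text())
        entries.append((record["metadata"].get("agency", ""), path.stem))
    return sorted(entries)


def load_statement(folder, name):
    """Read back the stored query statement of one knowledge file."""
    return json.loads((Path(folder) / f"{name}.json").read_text())["statement"]


folder = tempfile.mkdtemp()
save_knowledge(
    folder, "pcb_hot_spots", "chemical = 'PCB' and value > 1.0",
    {"agency": "USEPA", "project": "SEMI", "purpose": "hot-spot search"},
)
save_knowledge(
    folder, "dredging_lead", "chemical = 'Lead' and value > 100",
    {"agency": "USACE", "project": "SEMI", "purpose": "dredging"},
)

# Load a stored statement and modify it to form a new query.
new_statement = load_statement(folder, "pcb_hot_spots").replace("1.0", "5.0")
```

Keeping the metadata in the same file as the statement means the listing step needs no separate index, at the cost of opening every file to sort them.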



     A Real Time GIS System Supporting USEPA/USACE In-field

     Activities



          One of the objectives of the USEPA is for the

     desktop GIS to be capable of providing user access,

     updates, and views in the field.  With recent

     advancements in portable laboratory analysis in the

     field, the USEPA intends to use the FIELDS system on

     notebook PCs integrated with the USEPA "Mud Puppy" GPS

     system to evaluate sampling site design plans and

     contaminant levels while on-site.  These tasks require

     that the FIELDS System be a real-time portable GIS

     system.



          We have developed several Avenue Script modules to

     provide these technical capacities.

     
Fig 7 In-field Integration

     with GPS System
     Figure 7 describes the GPS Loader Module, which

     reformats GPS reading export files into a format

     supported by ArcView and converts the geographical

     coordinates from longitude-latitude to the state-plane

     system.  The loader module then creates a point

     shapefile (the red dots in Fig. 7) based on the newly

     converted GPS readings by launching an Add-XY-Event

     function in ArcView.  This loader module also supports

     dynamic links or joins with other environmental

     information or laboratory results, so that the spatial

     information derived from GPS readings becomes part of

     the real-time portable FIELDS system.
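The reformatting half of such a loader can be sketched in Python. The export layout assumed here (site, latitude, hemisphere, longitude, hemisphere, with coordinates in degrees and decimal minutes) is hypothetical, since the paper does not describe the GPS export format. The state-plane conversion is deliberately not implemented: `project` is a placeholder callable that defaults to passing coordinates through unchanged, and a real loader would supply a proper projection routine there.

```python
import csv
import io


def dm_to_decimal(value, hemisphere):
    """Convert degrees and decimal minutes (DDDMM.MMMM) to signed decimal degrees."""
    degrees = int(value // 100)
    minutes = value - degrees * 100
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal


def load_gps(export_text, project=lambda lon, lat: (lon, lat)):
    """Parse export lines into point records; `project` is a placeholder hook."""
    rows = []
    for line in export_text.strip().splitlines():
        site, lat, ns, lon, ew = [field.strip() for field in line.split(",")]
        lon_dd = dm_to_decimal(float(lon), ew)
        lat_dd = dm_to_decimal(float(lat), ns)
        x, y = project(lon_dd, lat_dd)
        rows.append({"site": site, "x": x, "y": y})
    return rows


def to_event_table(rows):
    """Write the points as a delimited X/Y table that a GIS can load as events."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["site", "x", "y"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical export text; site names and coordinates are illustrative.
export = """
G1, 4154.0000, N, 08322.5000, W
G2, 4153.7000, N, 08321.9000, W
"""
points = load_gps(export)
table = to_event_table(points)
```

The resulting table has one row per GPS reading with explicit X and Y columns, which is the shape an add-XY-event style import expects.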



          A field data-entry module has been designed to

     allow the USEPA staff and contractors to update the

     FIELDS database on-site and to visualize laboratory

     results in the field.  The FIELDS database is a

     relational database for contaminated sediments in the

     major waterways of Southeast Michigan developed by a

     USACE contractor.  The work involved development of an

     Oracle database structure and loading routine, and a

     dBase counterpart along with data dictionaries and

     accompanying documentation.  Population of the database

     has been completed with all available sediment data

     sets from the Detroit River, as well as from the

     watersheds of the Clinton, Rouge, Huron, and Raisin

     Rivers.  The FIELDS sediment database structure has

     been designed to hold all data attributes generated

     from historic sampling and laboratory analysis (Fig. 8).

     
Fig 8 The Database

     Structure of the FIELDS System
     It includes tables for project names, station locations

     and political references, sampling methods and related

     information, and laboratory analysis results.  The

     database also contains watershed and river reach

     designators to allow specific geographic look-ups and

     related political or hydrologic-based queries.  The

     database is designed in a manner to facilitate

     execution of (1) structured query for pollution

     chemicals;  (2) spatial query for environmental

     impacts; and (3) the combination of the structured and

     spatial queries for exploring joint effects of

     pollutants and environmental factors.
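The three query types can be illustrated with a small in-memory SQLite database. This is a sketch only: the actual FIELDS database is in Oracle and dBase, the table and column names below are hypothetical, and the spatial query is reduced to a bounding-box test on point coordinates rather than a true GIS overlay.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE station (station_id TEXT PRIMARY KEY, reach TEXT, x REAL, y REAL);
CREATE TABLE result (station_id TEXT, chemical TEXT, value REAL);
INSERT INTO station VALUES
    ('S1', 'Detroit', 10, 10), ('S2', 'Detroit', 40, 15), ('S3', 'Rouge', 45, 50);
INSERT INTO result VALUES
    ('S1', 'PCB', 0.4), ('S2', 'PCB', 12.0), ('S3', 'PCB', 3.5);
""")

# (1) Structured query: stations whose PCB value exceeds a threshold.
structured = [row[0] for row in conn.execute(
    "SELECT station_id FROM result "
    "WHERE chemical = ? AND value > ? ORDER BY station_id",
    ("PCB", 1.0),
)]

# (2) Spatial query: stations inside a bounding box (xmin, ymin, xmax, ymax).
box = (30, 0, 60, 30)
spatial = [row[0] for row in conn.execute(
    "SELECT station_id FROM station "
    "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? ORDER BY station_id",
    (box[0], box[2], box[1], box[3]),
)]

# (3) Combined query: both conditions applied through a join on station_id.
combined = [row[0] for row in conn.execute(
    "SELECT s.station_id FROM station s "
    "JOIN result r ON r.station_id = s.station_id "
    "WHERE r.chemical = ? AND r.value > ? "
    "AND s.x BETWEEN ? AND ? AND s.y BETWEEN ? AND ? "
    "ORDER BY s.station_id",
    ("PCB", 1.0, box[0], box[2], box[1], box[3]),
)]
```

The shared `station_id` key is what allows the attribute condition and the location condition to be evaluated in a single statement.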



          The field data-entry module (Fig. 9) recognizes the

     FIELDS database structure, transfers common information

     (key table fields) to relevant tables, and automatically

     leads users to the next appropriate table in the

     hierarchy once data entry for a table is done.  Some of

     these data entry fields are compulsory, while others

     can be filled out in greater detail at a later time, as

     warranted.

     
Fig 9 In-field Updating

     of Sediment Lab Measurements
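The data-entry behavior described here (carrying key fields forward, enforcing compulsory fields, and stepping to the next table) can be sketched in Python. The three-table hierarchy, the field names, and the function names are hypothetical stand-ins. The real FIELDS table hierarchy is larger and is not specified in detail in this paper.

```python
# Hypothetical table hierarchy with key fields and compulsory fields.
TABLES = {
    "project": {"keys": ["project_id"], "required": ["project_id", "name"]},
    "station": {
        "keys": ["project_id", "station_id"],
        "required": ["project_id", "station_id"],
    },
    "sample": {
        "keys": ["project_id", "station_id", "sample_id"],
        "required": ["project_id", "station_id", "sample_id", "method"],
    },
}
ORDER = ["project", "station", "sample"]


def enter_record(table, values, carried=None):
    """Build a record from carried key fields plus new values, then validate it."""
    record = dict(carried or {})
    record.update(values)
    missing = [f for f in TABLES[table]["required"] if record.get(f) in (None, "")]
    if missing:
        raise ValueError(f"missing compulsory fields for {table}: {missing}")
    return record


def next_step(table, record):
    """Return the next table in the hierarchy and the key fields to carry into it."""
    position = ORDER.index(table)
    if position + 1 == len(ORDER):
        return None, {}
    carried = {key: record[key] for key in TABLES[table]["keys"]}
    return ORDER[position + 1], carried


project = enter_record("project", {"project_id": "P1", "name": "Monroe Harbor"})
table, carried = next_step("project", project)
# Optional fields such as "notes" may be left empty and filled in later.
station = enter_record(table, {"station_id": "S7", "notes": ""}, carried)
```

Because the key fields travel with the user from table to table, they are typed once and stay consistent across related records.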



          Another innovation of the FIELDS System is the

     integration of statistics for designing sampling plans

     for sediment studies.  The integration of statistics

     helps ensure that the selection of sampling sites and

     their location in the field achieve an acceptable

     level of accuracy and precision at minimum cost

     (Lubin, Williams, and Lin,

     1995).  The data collected from these sample sites can

     then be used in contaminant and mass/volume analysis,

     sediment risk assessment, and cost and remediation

     scenarios to make well-informed decisions (Fig. 10).

     
Fig 10 Statistical

     Techniques Applied To Sediment Sampling
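To make the trade-off between precision and cost concrete, the sketch below uses the textbook sample-size formula for estimating a mean, n = (z * sigma / E)^2, rounded up. This is an assumption chosen for illustration: the paper does not state which statistical methods the USEPA package applies, and the standard deviation, margin of error, and cost per sample below are hypothetical.

```python
import math
from statistics import NormalDist


def samples_needed(std_dev, margin, confidence=0.95):
    """Samples required to estimate a mean within +/- margin at a confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil((z * std_dev / margin) ** 2)


def sampling_cost(n, cost_per_sample):
    """Total cost of collecting and analyzing n samples at a flat unit cost."""
    return n * cost_per_sample


tight = samples_needed(std_dev=4.0, margin=1.0)   # stricter precision target
loose = samples_needed(std_dev=4.0, margin=2.0)   # relaxed precision target
tight_cost = sampling_cost(tight, cost_per_sample=250.0)
loose_cost = sampling_cost(loose, cost_per_sample=250.0)
```

Halving the acceptable margin of error roughly quadruples the number of samples, which is why choosing the precision target in advance has such a direct effect on field costs.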
 

     References



     Lubin, A. N., M. H. Williams, and J. C. Lin, 1995.

     Statistical Techniques Applied To Sediment Sampling

     (STATSS), USEPA Region 5, Water Division, 77 West

     Jackson Blvd, Chicago, IL 60604-3590



     U.S. Army Corps of Engineers, 1995.  CORPSMET, April

     1995.



     U.S. Geological Survey, 1993.  The SDTS Mapping of the

     DLG-E Model, October 1993.



     U.S. Geological Survey, 1995.  Metadata Definition,

     June 1995.



     Williams, M. H., J. Lin, and G. D. Graettinger, 1996.

     The FIELDS System, USEPA Region 5, Water Division, 77

     West Jackson Blvd, Chicago, IL 60604-3590

     

Yichun Xie, Ph.D.
Assistant Professor
Department of Geography and Geology
Eastern Michigan University
Ypsilanti, MI 48197
(313) 487-0218
Fax: (313) 487-6979
Email: xie@emunix.emich.edu

George Graettinger
GIS Program Manager
US EPA Region 5 Water Division
77 West Jackson Blvd
Chicago, IL 60604-3590
(312) 866-5266
Fax: (312) 886-7804
Email: Graettinger.George@epamail.epa.gov