Edward D. Speer

Accessing Water Quality data through GIS

Abstract

King County, Washington's Metropolitan Services Department (Metro) will complete a GIS installation in March, 1995. As part of this two year project, three specific applications were built to support Metro's Water Pollution Control efforts. The focus of this paper is one of these applications: a geographical view into Oracle-based water quality information contained in a Laboratory Information Management System (LIMS). This database contains information on over 3000 point locations that have been sampled as far back as twenty years. The GIS application described in this paper communicates with the LIMS database through Oracle SQL*net and ArcInfo's Database Integrator (DBI) to obtain requested information on the selected points.

The "Water Resources GIS Application" was developed using in-house staff at King County Metro, and built using Arc Macro Language (AML) . While the host platform is a Unix server, users can access the application from a workstation, or from their personal computers (using X-terminal emulation). The application was built using menus and forms so that the end user does not require a working knowledge of any ArcInfo commands. Since many of the end users may only need to access this information sporadically, the use of a graphical user interface with on-line help enables the application to be used by a larger number of planners and water quality technicians.

Overview

King County, Washington's Department of Metropolitan Services (Metro) operates the Puget Sound region's two largest wastewater treatment plants and has had an active water quality monitoring program for over twenty years. In 1991, the database that contains the analytical results from thousands of samples was overhauled and stored into a comprehensive Laboratory Information Management System (LIMS). Water Quality planners at that time were trained in using new computer technology that allowed them to use their personal computers to directly access the information.

Upon its completion in March, 1995, Metro's Geographic Information System (GIS) project provided the capability for desktop users to also make geographic queries to certain "core" datasets, including sewer lines, basin boundaries, and the water sampling locations. Using Esri's ArcInfo and ArcView software products, users have access to primary datasets pertinent to the agency. In addition to the datasets and the vendor's software, Metro also built some custom applications for specific needs. One of these applications allows users to access data from the Laboratory (LIMS) through ArcInfo.

Application description

Planners have historically extracted information from the Laboratory database utilizing two methods. The first is to contact a 'data analyst' who translates the requested information into a program that produces a report. The second method (available since 1991) is to directly use the planner's desktop computer to run a forms-based program to extract the required information. Using this second method, the user chooses a geographic sampling location by entering its identifier, then enters a sampling parameter (e.g. pH, Temperature, bacteria count). The resulting information is usually exported to a desktop spreadsheet program for further manipulation.

The application described in this paper provides a third alternative. The "Water Resources application" is written in AML (Arc Macro Language) and appears to the user as a series of menus. The user is not expected to know any ArcInfo commands. Using various form menus, the user can select and retrieve data from water quality samples from an Oracle database that resides at King County Metro's Environmental Laboratory. The main menu is a pulldown menu and is shown in figure 1.

Image of Main Menu and Arcplot Display
                Figure 1. Main menu and display of application

The primary functions of the application are:

  1. Selection of sampling points
  2. Extraction of selected data from LIMS database
  3. Plotting or mapping of resulting data

This application assumes that the user interacts with one 'site' at a time. A site is a logical group of sampling locations (locators) based on use. Examples of sites are 'Major Lakes', 'Streams', 'Marine Offshore', etc. All water sampling points are represented as a single ArcInfo coverage, and the site is given on the point attribute table. While site is not strictly required to perform a query to the Oracle database, the speed of the query is significantly increased, since all of the tables in the LIMS are indexed by site.

Selection of sampling points

The pulldown menu shown in figure 2 is used to select points. All functions work with points within the currently selected 'site' (which can be changed from the 'options' menu). While most functions are straight-forward translation of the various RESELECT, ASELECT, and UNSELECT options, the 'Select by Query' often proves a useful alternative. In many cases, the analyst has no prior knowledge of what is sampled at the points shown on the map. The 'Select by query' allows the user to have Oracle select locators based on what has been sampled, and, to some extent, what the results were. Refer to figure 3. In this example, the user can select all points where the average Sample Temperature taken between two given dates exceeds a given value. The "advanced..." menu button is used to qualify the results based on the average, minimum, or maximum value of the selected parameter (in this case, Temperature). This also allows the user to find the 'active' sampling locations. The database query actually returns the minimum, average, and maximum values for the parameter, and allows the user to saves these results in an INFO file. This file will then be compatible with other statistics files that can be created using the regular extraction method discussed in the next section.

Screenshot of Menu choices for selecting points
             Figure 2.  Selection Menu choices
Screenshot of Select-by-Query Form Menu
             Figure 3.  Select by Query

Extraction of selected data from LIMS database

Once a set of sampling points is selected, the user can query the Oracle database in one of three ways:

  1. Select minimum, maximum, and average values of a selected parameter for all selected points (statistics)
  2. Select the date and value for a selected parameter for ONE selected point (time series)
  3. Select the depth and value for a selected parameter for ONE selected point on ONE selected date (depth profile)

The results from the query will be stored in an INFO file. An extension will be added to the user-supplied filename to help determine whether it is the result of a statistical query, a time series, or a depth profile. In addition, metadata describing the elements of the query are stored in a single record in a 'metafile', which is another INFO file. A user will have up to three metafiles: one for statistics, one for time series, and one for depth profiles. The metafile will contain all information about a query, such as the parameter, the units returned, the date of the query, and a user comment.

A special form was built into the application to help the user browse and maintain a workspace. This is shown in figure 4. If a user deletes an INFO file using this form menu, the metafile record is also removed. This also allows the user to export INFO tables to formats usable by desktop spreadsheets (dBase or delimited text).

Screenshot of Workspace Menu

Figure 4. Workspace menu

For depth profile queries, the user can prompt the Oracle database to return a list of valid sample dates from which to choose, since the query must contain the specific date of the samples.

Plotting or mapping of resulting data

All plotting or mapping functions presume that the user has already extracted the pertinent information from the Oracle database and has stored the results in an INFO table. Therefore, each function begins will a file browsing dialog box that shows the INFO files in the user's current workspace. For statistics files (*.STAT), the user can perform the following:

For time series and depth profile files, a simple plot can be generated. See figures 9 and 10. Note that time series plots are not enabled with extensive x-axis labels, which could represent dates. Since AML has no date arithmetic built in, it would be cumbersome to write the code to calculate tics and labels for this axis.

Also note that the depth profile graph is oriented so that the y-axis is reversed. This is because depth is stored in the LIMS database as a positive number. In order to create this graph an additional item is stored in the INFO table that represents a "reverse depth".

Screenshot of Workspace Menu
         Figure 5. Sample of Spot Map
Screenshot of Workspace Menu
         Figure 6. Sample of Located Bar Map
Screenshot of Sorted Bar Graph
         Figure 7. Sample Sorted Bar Graph
Screenshot of Isopleth Map
         Figure 8. Sample Isopleth Map
Screenshot of Time Series Plot
         Figure 9. Sample Time Series Plot
Screenshot of Depth Profile Plot
         Figure 10. Sample Depth Profile Plot

Metro's GIS Environment

Metro purchased two servers for its GIS environment, a Netware server and a unix (OSF/1) server. Access to Metro's GIS database can be accomplished through either ArcView 2 or ArcInfo. Most end-users have been equipped with a personal computer running Microsoft Windows that is attached via Novell Netware to the Novell file server. This server, using Novell's NFS-Gateway, makes the production data on the unix server available through Netware. If the user also requires access to ArcInfo, an X-emulation software package is also installed on the Netware server. A select number of users also have access directly to the unix server through unix workstations or X-terminals. In order to use the Water Resources application, a user must have workstation access or use the X-emulation software on their personal computer. Metro's unix server has 13 ArcInfo licenses and an Oracle license.

Conclusion

Metro's Water Resources application allows planners who are not familiar with ArcInfo commands to use some of the more powerful features of the software. This application also makes it easier for end-users to extract information that has traditionally been difficult to access, and usually requires the intervention of a data specialist. However, using GIS to access this data should only be used to view the geographic nature of the information. For example, time series graphs from a single sampling point can be performed by many software packages that are more suited to that kind of analysis. This application was built to provide a geographic dimension to the data, not to replace existing data retrieval tools.

Appendix: Data background

The datasets relevant to Metro's Water Pollution Control Department (WPCD) are:

Since the business units within the organization are physically and organizationally distributed, "corporate" data is likewise distributed. The GIS project sought to work within this framework by accessing existing databases on-line and to avoid replicating data. ArcInfo's data model makes this difficult with geographic data. However, attribute data can usually be accessed using the Database Integrator (DBI). In the case of the water quality sampling points (referred to as "locators"), the only items on the feature attribute table (beyond those imposed by ArcInfo) are:

Strictly speaking, site is not required and could be accessed from LIMS. However, all records from the relevant tables in LIMS are indexed by SITE, which speeds performance greatly. A script is run weekly to generate the "water sampling points" coverage from the X and Y location data contained in the Oracle database.

The Oracle table that contains the analytical results is called episodedata and will be referred to simply as the 'data table'. There is also a 'summary table' (epiosdesumm).

Generally speaking, data in the LIMS database is stored in normalized form. The data table contains results of laboratory tests run on the samples. However, since the types of analysis differ from sample to sample (even for the same location), the schema of data table (simplified) is:

For a single sample, there may be up to fifty records listed in the data table. Some parameters will require the use of textvalue (e.g. Weather Conditions). However, for data of interest in this application, the parameter result will be in the 'Numvalue' column (e.g. Fecal Coliform, pH, Sample Temperature). A simplified schema for the summary table is listed below:

Therefore, to ascertain all values of pH from locator 'KTHA02' in the site 'STREAMS', the following query is constructed:

select data.numvalue from summary, data where summary.site = 'STREAMS'

and summary.locator = 'KTHA02' and data.parmname = 'pH' and data.datasetid = summary.datasetid;

The database structure above is not amenable to ArcInfo's relate environment, which presumes a one-to-one relationship in data elements. For this reason, the application uses the DBMSCURSOR function in several instances to move data from the Oracle tables into INFO files. An alternative approach would be to construct a view in the Oracle database after the user has made the required selections (date range, parameter name, etc.).

Another complication arises when obtaining depth profiles. The data model used at Metro defines a locator, or water sampling point, as a two dimensional point. Therefore, when samples are taken at different depths, the records are all stored under the same locator key, but another "parameter" called 'Sample Depth' is also inserted into the table. Therefore, another join must be written into the query to obtain the data. For example, to obtain all values of temperature from locator '0612' in the sites 'MAJOR LAKES' from the sample taken February 23, 1995, the query would be:

select adata.numvalue,  bdata.numvalue from data adata, data bdata, summary where

summary.site = 'MAJOR LAKES' and summary.locator = '0612' and 

adata.parmname = 'Sample Temperature' and bdata.parmname = 'Sample Depth' and summary.collectdate = '23-FEB-95' and adata.datasetid = bdata.datasetid and 

adata.datasetid = summary.datasetid;

Eddie Speer
King County Dept of Metropolitan Services
821 2nd Avenue
Seattle, WA 98104
eddie.speer@metrokc.gov
(206) 684-2071