Jonathan M. Pickus, William B. Samuels, and David E. Amstutz

APPLYING GIS TO IDENTIFY EFFLUENT DOMINATED WATERS IN CALIFORNIA

This paper describes a GIS-based approach for identifying and characterizing the effluent dominated waters (EDW) of California on a statewide basis. The system integrates the geospatial and hydrographic characteristics of the NHD, PCS, NWIS, ERF1, IFD, and STORET databases into a single architecture and systematically evaluates the hydrology within each Hydrologic Unit (HUC). It is designed to give State and Federal resource managers a tool with which to isolate those streams where statewide water quality standards and designated uses may be applied differently from streams where the predominant flow is from natural sources. The work reported here represents the initial phase of our effort to develop a fully automated EDW identification system for California. To date, the EDW application has been applied to only a sample of the state's HUCs.

INTRODUCTION

The purpose of this project was to provide a geographic information system (GIS)-based approach for identifying and characterizing effluent dependent waters (EDW) on a statewide basis for the California State Water Resources Control Board (SWRCB). This is the first step in the analysis of the changes that can occur in ephemeral streams or streams with minimal flow when wastewaters are discharged into them. In addition, this GIS analysis can be used to isolate those streams where statewide water quality standards and designated uses may be applied differently from streams where the predominant flow is from natural sources.

Effluent dependent or effluent dominated waters (EDW) are typically defined as surface waters that consist primarily of discharges of treated wastewater and runoff from urban and agricultural areas. Many of these water bodies would be ephemeral in the absence of these discharges. These waters differ significantly from natural water bodies because they exist primarily due to human activity. These water bodies pose unique challenges to water quality and beneficial use protection that may require different approaches than those used for perennial waters.

Previous reports on EDW issues (Effluent-Dependent Waters Task Force, 1995; Metropolitan Water District, 2000; Regional Water Quality Control Board, 2000) provide: (1) definitions of EDWs, (2) a discussion of applicable laws and regulations, (3) lists of parameters of concern, (4) beneficial uses and water quality objectives, and (5) recommended management approaches (i.e., the watershed approach vs. the Total Maximum Daily Load (TMDL) approach). However, a systematic GIS-based analysis to identify and characterize EDWs using available digital databases had yet to be performed.

The hypothesis of the project was that a stream could be characterized quantitatively as an EDW by deriving the ratio between a facility's discharge to that stream and the stream's flow, where "discharge" refers to the effluent released by a facility and "flow" refers to the gaged streamflow. If this ratio, known as the Flow Indicator, $FI = Q_{DISCHARGE} / Q_{STREAM}$, was greater than or equal to X% (where X is a value to be determined), then the stream could be characterized as an EDW, or would at least warrant further field inspection. The following simple hypothetical scenario illustrates this relationship: a single reach is identified, a single facility is associated with that reach, and two flow gages are located, one upstream and one immediately downstream of the facility. Assuming that these features have known flows, it would be possible to characterize the reach as either effluent dominated or not by calculating the FI. Figure 1 illustrates this scenario.

Figure 1. Example Scenario For Determining An Effluent Dominated Stream.
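The FI test in the hypothetical scenario can be sketched in a few lines. This is a minimal illustration, not the project's code: it assumes both flows are in the same units (for example cubic feet per second), uses X = 50% (the threshold later adopted in the pilot rules), and the function names and example flows are hypothetical.

```python
def flow_indicator(q_discharge: float, q_stream: float) -> float:
    """Flow Indicator: FI = Q_discharge / Q_stream (both in the same units)."""
    if q_stream <= 0:
        raise ValueError("stream flow must be positive to compute FI")
    return q_discharge / q_stream


def warrants_edw_review(q_discharge: float, q_stream: float, x: float = 0.5) -> bool:
    """True when FI >= x, i.e. the reach is a candidate EDW (x = 0.5 means 50%)."""
    return flow_indicator(q_discharge, q_stream) >= x


# Hypothetical Figure 1 reach: the facility releases 12 cfs and the gage
# immediately downstream of it records 20 cfs.
print(flow_indicator(12.0, 20.0))       # 0.6
print(warrants_edw_review(12.0, 20.0))  # True
```

Using the downstream gage keeps FI between 0 and 1 when that gage's flow already includes the facility's effluent.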

METHODOLOGY

The technical approach for this project consisted of three sub-tasks. The first was to develop a system to inventory and characterize the data (both spatial and monitoring) required to identify effluent dominated waters. Once this task was completed, a GIS-based system was designed and a methodology developed to identify and characterize effluent dominated waters. This methodology was then applied to 10 pilot watersheds defined by USGS hydrologic unit (HUC) boundaries. A subsequent review determined that the technique established in the pilot phase was promising enough to initiate a production phase, in which the EDW methodology would be applied to the entire state of California (155 HUCs). These steps are described briefly below.

Database Inventory and Characterization

The project required both spatial location data for the GIS and flow statistics to calculate the FI. In addition, the data had to cover the entire state of California to meet the production phase requirements. With these criteria in mind, a database inventory was performed using both Federal and State (California) databases. The inventory focused on the locations of streams, dischargers, and gages, as well as the statistical flow information associated with these features. Because no single database best represented each of these feature types, the relevant databases for each feature type were either merged into a "master" database or used side by side.

Hydrography

For example, both the USGS National Hydrography Dataset (NHD) and the EPA River Reach File 1 (ERF1) databases were chosen to represent the streams in the GIS. The NHD is the culmination of recent efforts of the USEPA and USGS. It combines elements of the USGS digital line graph hydrography files and the USEPA Reach File 3 (RF3). Key characteristics of the NHD include a feature-based structure that interconnects and uniquely identifies the stream segments or reaches at a scale of 1:100,000, and attributes including hydrographic category (intermittent, perennial, unspecified) and stream name. ERF1 is a 1:500,000-scale surface water digital database that was enhanced by the USGS with mean flow and mean velocity values for all 68,000 reaches in the file. While a reach could be identified by its stream name in either database, the larger-scale NHD provided more detailed spatial coverage, whereas ERF1 included statistical flow information associated with its reaches. Rather than eliminate one of the databases, both were used in the application design.

Stream Flow

A further review of the available sources of stream flow data included the following datasets: the US EPA's Storage and Retrieval (STORET) gaging stations, the USGS's National Water Storage and Retrieval System (WATSTORE) Stream Flow Basin Characteristics File, the USGS National Water Information System (NWIS) Historical Flow Database, and the USGS's Hydroclimate Data Network (HCDN).

The STORET stream flow dataset is an inventory of surface water gaging station data that includes 7Q10 low flows and monthly stream flows. For the purposes of this project, the STORET data were extracted from EPA's BASINS system. The WATSTORE dataset contains information about selected active and discontinued USGS streamflow gaging stations through water year 1986. NWIS is part of the USGS program of disseminating water data to the public through a distributed network of computers and file servers. The types of water data collected in NWIS include site information, flow, stage height, and peak flow, and the data can be retrieved for either historic or current conditions. The annual mean flow for each NWIS gage was derived by averaging the gage's monthly mean flows over its period of record. The HCDN dataset is an extract of the WATSTORE dataset, made in accordance with strictly defined criteria of measurement accuracy and natural conditions. For example, no reconstructed records of "natural flow" were permitted, no record was extended, and no missing values were "filled in" using computational algorithms. If the streamflow at a station was judged to be free of controls for only part of the available period of record, then only that part was included in the HCDN, and only if it was of sufficient length (generally 20 years). In addition to the daily mean discharge values, complete station identification information and basin characteristics were retrieved from WATSTORE for inclusion in the HCDN. Statistical characteristics, including the monthly mean discharge and the annual mean, minimum, and maximum discharge values, were derived for the records in the HCDN dataset.
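The NWIS annual mean described above can be sketched as follows. This is a minimal sketch assuming each gage's monthly mean flows have already been retrieved into a list; the gage IDs, flows, and function name are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def nwis_annual_means(monthly: Dict[str, List[float]]) -> Dict[str, float]:
    """Average each gage's monthly mean flows over its period of record.

    Gages with no monthly records are skipped rather than assigned zero.
    """
    return {gage: mean(flows) for gage, flows in monthly.items() if flows}


# Hypothetical gage IDs and monthly mean flows (cfs).
records = {"G001": [10.0, 14.0, 30.0, 26.0], "G002": []}
print(nwis_annual_means(records))  # {'G001': 20.0}
```

A simple average of monthly means weights every month equally regardless of its number of days, which is adequate for a screening-level FI.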

An analysis of the datasets revealed a number of issues. Because the data came from assorted sources, not every gaging station had streamflow data associated with it, and where more than one source reported flows for the same gaging station, the values did not always agree. It was therefore impractical to select a single "best" dataset. Instead, all of these datasets were merged into a "master" gage dataset that retained the streamflow data from each source. This master dataset increased the probability that a streamflow value would be available for any given gage.
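The merge that keeps every source's value side by side can be sketched as below. It assumes each source has been reduced to a table keyed by a common gage ID with one annual mean flow per gage; the table layout, gage IDs, and flows are hypothetical.

```python
from typing import Dict, Optional, Set

SOURCES = ("ERF1", "BASINS", "NWIS", "WATSTORE", "HCDN")


def build_master_gages(
    source_tables: Dict[str, Dict[str, float]],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Merge per-source {gage_id: annual_mean_flow} tables into one record per
    gage, keeping every source's value side by side (None where missing)."""
    gage_ids: Set[str] = set()
    for table in source_tables.values():
        gage_ids.update(table)
    return {
        gid: {src: source_tables.get(src, {}).get(gid) for src in SOURCES}
        for gid in sorted(gage_ids)
    }


# Hypothetical gage IDs and annual mean flows (cfs).
tables = {
    "NWIS": {"G001": 20.0},
    "HCDN": {"G001": 22.5, "G002": 3.1},
}
master = build_master_gages(tables)
print(master["G002"])
# {'ERF1': None, 'BASINS': None, 'NWIS': None, 'WATSTORE': None, 'HCDN': 3.1}
```

Retaining all values, rather than picking one source up front, defers the choice of a "best" flow to the analysis step, where disagreement between sources can be inspected.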

Facility Discharge

There was no single facility database that completely represented all the discharges in California. The datasets assessed for facility information included EPA's Permit Compliance System (PCS) database and its Industrial Facilities Discharge (IFD) database. The PCS database contained location information, receiving water, design flow, permit limits, and discharge monitoring reports (DMRs) that recorded a facility's conduit flow. The IFD database contained municipal and industrial facilities that hold National Pollutant Discharge Elimination System (NPDES) permits. It supplemented PCS because it contained additional minor facilities not included in PCS. Consequently, the two datasets were combined to create the "master" facility dataset used in the application. Each facility's flow value could be derived either from its design flow or from its annual mean flow (typically calculated over a five-year period).
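A sketch of the PCS/IFD combination follows, under two assumptions not stated in the text: that records from both databases share a common facility key, and that a PCS record takes precedence when a facility appears in both. The field names, facility IDs, stream names, and flows are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Facility:
    facility_id: str                          # hypothetical key shared by PCS and IFD
    receiving_water: str
    design_flow: Optional[float] = None       # permitted design flow
    annual_mean_flow: Optional[float] = None  # e.g. DMR average over ~5 years


def build_master_facilities(pcs: List[Facility], ifd: List[Facility]) -> Dict[str, Facility]:
    """PCS records take precedence; IFD only adds facilities absent from PCS."""
    master = {f.facility_id: f for f in pcs}
    for f in ifd:
        master.setdefault(f.facility_id, f)
    return master


def facility_flow(f: Facility, prefer_design: bool = True) -> Optional[float]:
    """Return the preferred flow, falling back to the other if it is missing."""
    first, second = (
        (f.design_flow, f.annual_mean_flow) if prefer_design
        else (f.annual_mean_flow, f.design_flow)
    )
    return first if first is not None else second


pcs = [Facility("FAC-A", "Example River", design_flow=10.0)]
ifd = [
    Facility("FAC-A", "Example R.", annual_mean_flow=6.0),   # duplicate of a PCS record
    Facility("FAC-B", "Example Creek", annual_mean_flow=0.8),  # minor facility only in IFD
]
master = build_master_facilities(pcs, ifd)
print(sorted(master), facility_flow(master["FAC-B"]))  # ['FAC-A', 'FAC-B'] 0.8
```

The `prefer_design` switch mirrors the choice between design flow and annual mean flow; the fallback keeps a facility usable when only one of the two values is recorded.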

Watersheds

A 1:24,000 scale Hydrologic Unit Boundary dataset from the California Department of Forestry (CALWATER 2.2) was added to the project. The HUC Boundaries delineated the state into manageable subdivisions for analysis. Both the NHD and ERF1 datasets were organized by 8-digit HUC boundaries as well. From a data management perspective, this enabled a scheme for the high resolution NHD datasets to be loaded into the system on an "as-needed" basis.
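The "as-needed" loading scheme can be pictured as a small least-recently-used cache keyed by 8-digit HUC code. This sketch is illustrative only: the reader function stands in for whatever actually opens one HUC's NHD files, and the class name and cache size are assumptions.

```python
from collections import OrderedDict
from typing import Any, Callable


class HucLoader:
    """Load a HUC's hydrography on first request and keep only the most
    recently used few HUCs in memory (a least-recently-used cache)."""

    def __init__(self, read_huc: Callable[[str], Any], max_loaded: int = 4):
        self._read = read_huc        # hypothetical reader for one HUC's NHD data
        self._max = max_loaded
        self._cache = OrderedDict()

    def get(self, huc8: str) -> Any:
        if len(huc8) != 8 or not huc8.isdigit():
            raise ValueError(f"expected an 8-digit HUC code, got {huc8!r}")
        if huc8 in self._cache:
            self._cache.move_to_end(huc8)    # mark as most recently used
            return self._cache[huc8]
        data = self._read(huc8)              # read only HUCs actually requested
        self._cache[huc8] = data
        if len(self._cache) > self._max:
            self._cache.popitem(last=False)  # evict the least recently used HUC
        return data


calls = []


def fake_reader(huc8: str) -> str:
    calls.append(huc8)
    return f"NHD reaches for {huc8}"


loader = HucLoader(fake_reader, max_loaded=2)
loader.get("18070203")
loader.get("18070203")  # served from memory; no second read
print(calls)            # ['18070203']
```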

In summary, the major datasets included in the GIS system were two stream datasets that contained stream names (NHD and ERF1), a master facility dataset (derived from PCS and IFD) that contained each facility's design flow and annual mean flow, and a master gage dataset (derived from ERF1, BASINS, NWIS, WATSTORE, and HCDN) that contained annual mean streamflows. The system was organized by 8-digit HUC boundaries.

Pilot Phase

The Pilot Phase of the project consisted of selecting 10 HUCs for use in developing and testing procedures for identifying EDWs. The California SWRCB selected the HUCs based on the following criteria: (1) they contain known EDWs, and (2) they are as geographically distributed across the state as possible. The 10 HUCs included: the Lower Sacramento, the Upper Yuba, the Upper Coon-Upper Auburn, the Middle San Joaquin-Lower Chowchilla, the Middle San Joaquin-Lower Merced-Lower Stanislaus, the Upper Cosumnes, the Santa Clara, the Los Angeles, the Santa Ana, and the Santa Margarita. Figure 2 illustrates the locations of these 10 Pilot HUCs.

Figure 2. Locations of 10 Pilot HUCs

The first step in the Pilot Phase was to develop a procedure to classify waterbodies in the 10 pilot HUCs. Before developing the procedural code, two test HUCs (Santa Margarita and the Upper Yuba) were selected and the following procedures were applied by hand on a reach-by-reach basis.

1. Identify an NHD reach on which to perform the EDW analysis.
2. Select all facilities located within 300 meters of the NHD stream. The selected facilities represent potential sources of effluent in the stream, and the amount of effluent is derived from each facility's annual mean flow. The fixed distance criterion was based on a visual inspection of the facilities and reaches within the HUC.
3. Verify that each selected facility is associated with its closest stream by comparing the stream name in the NHD dataset, the stream name in the ERF1 dataset, and the receiving water name in the master facility (PCS-IFD) dataset. Continue with the analysis if the stream names match the receiving water name of the associated facility.
4. If the stream names from the ERF1 and master facility datasets match, then the FI can be calculated by linking the common gage ID in each dataset. The FI for the stream is calculated by dividing the facility's annual mean flow (master facility dataset) by the stream's annual mean flow (master gage dataset). Up to five FI values can be calculated, one for each source (ERF1, BASINS, NWIS, WATSTORE, and HCDN) with an associated annual flow.
5. If the ERF1 stream name is not associated with the stream, then the gage closest to the stream is selected and the FI is calculated using the four possible annual flows in the master gage dataset.
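The 300-meter selection in step 2 is, in GIS terms, a select-by-distance query. The pure-Python stand-in below is a sketch assuming facility points and reach vertices are already in a projected coordinate system measured in meters; the reach geometry, facility IDs, and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # projected x, y in meters


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate segment
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_reach(p: Point, vertices: List[Point]) -> float:
    """Distance from p to a reach polyline (at least two vertices)."""
    return min(point_segment_distance(p, a, b) for a, b in zip(vertices, vertices[1:]))


def facilities_near_reach(facilities: Dict[str, Point], vertices: List[Point],
                          max_dist_m: float = 300.0) -> List[str]:
    """IDs of facilities within max_dist_m of the reach."""
    return [fid for fid, xy in facilities.items()
            if distance_to_reach(xy, vertices) <= max_dist_m]


reach = [(0.0, 0.0), (1000.0, 0.0), (2000.0, 500.0)]
facilities = {"A": (500.0, 250.0), "B": (500.0, 400.0), "C": (2500.0, 500.0)}
print(facilities_near_reach(facilities, reach))  # ['A']
```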

The multiple FI values reported in steps 4 and 5 reflect the varied sources of gage flow data in the master gage dataset. As mentioned earlier, not every source had flow data for every station, and when several sources did, their flow values sometimes varied significantly. Greater confidence was placed in FI values derived from flows that agreed across sources. Reporting the full set of flow and FI values therefore maximized the probability that a "best" FI could be derived.
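Reporting one FI per source and checking how closely they agree might look like the sketch below. The agreement measure (count of values within a fractional tolerance of the median) is a hypothetical stand-in, not the project's stated method, and the flows are invented for illustration.

```python
from statistics import median
from typing import Dict, Optional


def fi_by_source(q_facility: float, gage_flows: Dict[str, Optional[float]]) -> Dict[str, float]:
    """One FI per source that reports a positive annual mean flow for the gage."""
    return {src: q_facility / q for src, q in gage_flows.items()
            if q is not None and q > 0}


def summarize_fis(fis: Dict[str, float], rel_tol: float = 0.25) -> dict:
    """Report how many FI values exist, their median, and how many lie
    within rel_tol (fractional) of that median, i.e. roughly agree."""
    if not fis:
        return {"n": 0, "median": None, "agreeing": 0}
    mid = median(list(fis.values()))
    agreeing = sum(1 for v in fis.values() if abs(v - mid) <= rel_tol * mid)
    return {"n": len(fis), "median": mid, "agreeing": agreeing}


# Hypothetical annual mean flows (cfs) for one gage; 10 cfs facility flow.
flows = {"ERF1": 20.0, "BASINS": 0.0, "NWIS": 22.0, "WATSTORE": None, "HCDN": 100.0}
fis = fi_by_source(10.0, flows)
summary = summarize_fis(fis)
print(sorted(fis), summary["n"], summary["agreeing"])  # ['ERF1', 'HCDN', 'NWIS'] 3 2
```

Here two sources (ERF1 and NWIS) give similar FIs near 0.45 to 0.5, while HCDN's much larger flow gives an outlying FI of 0.1, the kind of disagreement the multiple-FI report is meant to expose.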

In association with this procedure, sets of rules were defined to support the EDW classification process. The following rules were defined in a hierarchical classification scheme, consisting of four categories:

1. All streams that had a facility close to them were classified as UNKNOWN by default.
2. If at least two of the three stream names (NHD, ERF1, and the master facility dataset) matched, there was more confidence that the associated flows (facility and stream) belonged to the same stream, and the stream was classified as POTENTIAL EDW. A stream could also be classified as POTENTIAL EDW if no stream names matched and no gage was associated directly with the reach, but the facility spatially appeared to discharge to the stream.
3. A stream was classified as NOT EDW if the stream names matched, a gage was located downstream on the reach, and the associated FI was less than 0.5 (50%).
4. A stream was classified as EDW if the stream names matched, a gage was located downstream on the reach, and the associated FI was greater than or equal to 0.5 (50%).
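The four pilot rules can be encoded as a hierarchy that tests the most specific case first. This sketch makes simplifying assumptions not spelled out in the text: "names matched" means at least two of the three names are identical, and a single flag stands in for "a gage located downstream on the reach". The function and parameter names are hypothetical.

```python
from collections import Counter
from enum import Enum
from typing import Optional, Sequence


class EdwClass(Enum):
    UNKNOWN = "UNKNOWN"
    POTENTIAL_EDW = "POTENTIAL EDW"
    NOT_EDW = "NOT EDW"
    EDW = "EDW"


def name_agreement(names: Sequence[str]) -> int:
    """Size of the largest group of identical, non-empty names (0 to 3)."""
    counts = Counter(n for n in names if n)
    return max(counts.values(), default=0)


def classify_reach(
    names: Sequence[str],        # (NHD name, ERF1 name, master facility receiving water)
    downstream_gage: bool,       # assumed flag: a gage is located downstream on the reach
    fi: Optional[float],         # FI from that gage, if one could be computed
    appears_to_discharge: bool,  # facility spatially appears to discharge to the stream
    threshold: float = 0.5,
) -> EdwClass:
    """Apply the pilot rules, most specific first; every reach examined is
    assumed to already have a facility nearby (rule 1's precondition)."""
    names_match = name_agreement(names) >= 2
    if names_match and downstream_gage and fi is not None:
        # Rules 3 and 4: names match and a downstream gage supplies an FI.
        return EdwClass.EDW if fi >= threshold else EdwClass.NOT_EDW
    if names_match:
        # Rule 2, first case: two or three names agree but no usable FI.
        return EdwClass.POTENTIAL_EDW
    if not downstream_gage and appears_to_discharge:
        # Rule 2, second case: no names match, no gage, but spatially plausible.
        return EdwClass.POTENTIAL_EDW
    # Rule 1: the default for reaches with a nearby facility.
    return EdwClass.UNKNOWN
```

Ordering the checks from most to least specific is what makes the scheme hierarchical: UNKNOWN is returned only when no stronger rule applies.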

Performing the EDW procedure by hand exposed the true complexity of the problem. For example, in several cases either the dischargers or the gages could not be associated with a reach. This was partly due to inaccurate facility location data: examining the discharge addresses using Street Map 2000 suggested that some permit addresses cite the location of an administrative building rather than the location of the releasing facility. It was also possible that discharges from some facilities were transported by pipeline to the point of release into a stream. Another issue was that the brief periods of record for a discharge (typically 5 years) did not reflect the range of flows observed over the much longer periods of the gage records. As a result, a more comparable facility flow value was derived from the "design" flow instead of the "annual mean" flow. It was also evident that the EDW procedures could not be fully automated. Some degree of user interaction was necessary to locate a gage in relation to a discharge (upstream or downstream), identify multiple facilities discharging to a single stream, associate streams recorded under assorted naming conventions (e.g., "River", "R.", "river"), and define a distance criterion for selecting the facilities on a reach.
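Part of the naming problem ("River", "R.", "river") can be reduced by normalizing names before comparison. This is a sketch only: the abbreviation table is hypothetical and would in practice be compiled from the variants actually found in the NHD, ERF1, and PCS/IFD data, and it will not resolve genuinely different names for the same stream.

```python
import re

# Hypothetical abbreviation table for illustration only.
ABBREVIATIONS = {
    "R": "RIVER",
    "RIV": "RIVER",
    "CR": "CREEK",
    "CRK": "CREEK",
    "CK": "CREEK",
    "TRIB": "TRIBUTARY",
}


def normalize_stream_name(name: str) -> str:
    """Upper-case, drop punctuation, and expand common abbreviations."""
    tokens = re.findall(r"[A-Z0-9]+", name.upper())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def stream_names_match(a: str, b: str) -> bool:
    """True when two non-empty names are identical after normalization."""
    na, nb = normalize_stream_name(a), normalize_stream_name(b)
    return bool(na) and na == nb


print(normalize_stream_name("Santa Ana R."))  # SANTA ANA RIVER
```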

On the other hand, the procedure did successfully characterize some streams. This occurred when the reach could be identified, gages and facilities were located on the stream, and their associated flows were available to calculate the FI.

With these issues in mind, a test system was developed and applied to the 10 pilot HUCs. The test system used the procedural steps and rules outlined in the previous section and automated the steps required to perform an EDW characterization where possible. Modifications to the design were also made to address some of the issues identified earlier. The pilot system is described in more detail below.

1. A HUC is selected, and the associated NHD data are automatically loaded and symbolized in the View. NHD summary information is then displayed on the screen.
2. The distance threshold, used to select the streams into which facilities may discharge, is now user-defined. A tally of the facilities located within this threshold is immediately calculated and displayed. Because no optimal threshold has yet been established, the threshold distance may require calibration, so the user can adjust the threshold and reassess the facilities before continuing the analysis. When an acceptable distance is defined, a new view is created containing all the same themes, clipped to the HUC boundary. The EDW analysis is more efficient when the dataset sizes are reduced.
3. As indicated earlier, the reach-by-reach analysis then proceeds sequentially by:
   a. confirming the identification of the stream segment through the comparison of the NHD, ERF1, and master facility (receiving stream) datasets,
   b. identifying the associated stream gage if multiple gages exist,
   c. selecting the "best" annual mean flow value available from the master gage dataset (ERF1, BASINS, NWIS, WATSTORE, and HCDN),
   d. calculating the FI value from the "best" flow value and the facility design flow, $FI = Q_{DESIGN} / Q_{ANNUAL\,MEAN}$, and
   e. characterizing the stream as EDW when the resulting FI value equals or exceeds a user-defined threshold (e.g., 50%).
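The threshold calibration described above (a user-defined distance, an immediate tally, and a chance to adjust) amounts to counting facilities within each candidate distance. A minimal sketch, assuming each facility's distance to its nearest NHD reach has already been computed in meters; the distances and candidate thresholds are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List


def tally_by_threshold(nearest_distances_m: Iterable[float],
                       thresholds_m: List[float]) -> Dict[float, int]:
    """Count facilities whose nearest-reach distance is <= each threshold."""
    ordered = sorted(nearest_distances_m)
    # bisect_right gives the number of sorted values <= t.
    return {t: bisect_right(ordered, t) for t in thresholds_m}


# Hypothetical distances (meters) from five facilities to their nearest reach.
distances = [35.0, 120.0, 250.0, 410.0, 980.0]
print(tally_by_threshold(distances, [100.0, 300.0, 500.0]))
# {100.0: 1, 300.0: 3, 500.0: 4}
```

Sorting once and bisecting makes it cheap to re-tally as the user tries different thresholds.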

The "Flow Indicator Display", as illustrated in Figure 3, facilitates the analysis (a through e). This display tool integrates all the known relevant information (stream names, gage flows, and discharge flows) into a single display.

Figure 3. Flow Indicator Display Tool

The Flow Indicator Display contains the following information:

an area that identifies the number of streams to characterize (both total and current),

an area to classify the streams (UNKNOWN, NOT EDW, POTENTIAL EDW, and EDW),

an area displaying the three possible names of the current stream (NHD, ERF1, and the master facility dataset),

an area (Gage Name pull-down list) where all the stream gages are available for review. Each gage in the list can be identified by using the "Gage ID Tool" and selecting the gage on the map display.

and finally, an area where the design flow and stream flows are shown and the associated FI values are automatically calculated. The stream flow and associated FI values reflect the currently selected stream gage when multiple gages are available.

A stream is now characterized through a combination of the ArcView Map Display, the Flow Indicator Display, and the rules illustrated in Figure 4. The rules are designed to aid the user by stepping systematically through the analysis. The green triangles indicate a procedural test, the text outside the boxes and arrows reflects the possible test results and the associated path to follow, and the red rectangles contain the characterization classification. The analysis begins on the left side of the figure and proceeds to the right until each stream is characterized as EDW, NOT EDW, or POTENTIAL EDW.

Figure 4. EDW Rules

Beginning with the first record, the stream names are visually compared in the Flow Indicator Display. Following the rules, a visual inspection of the ArcView Map Display confirms whether the associated facility is spatially located near the stream. The stream is characterized as NOT EDW when the facility is not located near the stream. The Flow Indicator Display lists all of the facilities and stream gages that are either associated with the stream (through its attribute information) or spatially located within the user-defined distance threshold. The stream is characterized as POTENTIAL EDW when the facility is near the stream but no stream gage is available; POTENTIAL EDW indicates that there is not enough information to characterize the stream one way or the other. When multiple gages are associated with a stream, the "Gage Name" pull-down list in the Flow Indicator Display lists all of them. When the user-selected "best" stream gage is identified in the map display, its corresponding flows are displayed and the FI is calculated. Streams with an FI greater than or equal to the user-defined threshold are characterized as EDW, and those below it as NOT EDW. The stream's final characterization is selected in the Flow Indicator Display and the record is saved. The next stream is then selected, and the process repeats until all streams in the HUC with a facility near them are characterized.
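The Figure 4 decision path, as described in the paragraph above, can be sketched as a single function. Assumptions labeled here and in the comments: the "best" annual mean flow for each gage is already known, the user's gage choice is passed in explicitly (and required when several gages exist), and a missing or zero flow is treated as "no gage available". The function name and data are hypothetical.

```python
from typing import Dict, Optional, Tuple


def characterize_stream(
    facility_near: bool,
    design_flow: float,
    gage_flows: Dict[str, Optional[float]],  # gage name -> "best" annual mean flow
    selected_gage: Optional[str] = None,     # the user's choice when several gages exist
    threshold: float = 0.5,
) -> Tuple[str, Optional[float]]:
    """Return (classification, FI) following the Figure 4 path."""
    if not facility_near:
        return "NOT EDW", None
    if not gage_flows:
        return "POTENTIAL EDW", None
    if selected_gage is None:
        if len(gage_flows) > 1:
            # Mirrors the manual step: the user picks the "best" gage.
            raise ValueError("several gages available; the user must select one")
        selected_gage = next(iter(gage_flows))
    q = gage_flows[selected_gage]
    if q is None or q <= 0:
        # Assumption: a gage without a usable flow counts as no gage.
        return "POTENTIAL EDW", None
    fi = design_flow / q
    return ("EDW" if fi >= threshold else "NOT EDW"), fi


print(characterize_stream(True, 2.0, {"G1": 10.0, "G2": 4.0}, selected_gage="G2"))
# ('EDW', 0.5)
```

Raising an error instead of silently picking a gage keeps the user-interaction step explicit, in line with the observation that the procedure could not be fully automated.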

The results of characterizing the streams in the 10 pilot HUCs were promising. The application systematically characterized 189 streams, 112 of which could be further characterized as either EDW or POTENTIAL EDW. Figure 5 summarizes the results of the Pilot Phase.

HUC	HUC NAME	TOTAL FEATURES	STREAMS CHARACTERIZED	POTENTIAL EDW OR EDW	EDW
18020109	LOWER SACRAMENTO	5,605	48	22	0
18020125	UPPER YUBA	1,668	3	2	0
18020127	UPPER COON-UPPER AUBURN	131	3	2	0
18040001	MIDDLE SAN JOAQUIN-LOWER CHOWCHILLA	2,283	11	8	1
18040002	MIDDLE SAN JOAQUIN-LOWER MERCED-LOWER STANISLAUS	1,920	17	12	0
18040013	UPPER COSUMNES	795	1	1	0
18070102	SANTA CLARA	2,117	10	8	0
18070105	LOS ANGELES	573	31	18	2
18070203	SANTA ANA	2,023	62	37	4
18070302	SANTA MARGARITA	703	3	2	0
TOTAL		17,820	189	112	7

Figure 5. Pilot HUC Summary

Consequently, from a universe of almost 18,000 hydrologic features, 112 streams were identified as credible enough to warrant further investigation. A field analysis of these 112 streams would test the validity of the system results. If the field results prove promising, then characterizing EDW streams becomes a cost-effective and manageable proposition.

Production Phase

The EDW Characterization System created in the Pilot Phase produced a set of procedures and rules applied to 10 test HUCs. The production phase expanded the system capabilities to analyze all 155 California HUCs. The EDW Characterization System was then delivered to the SWRCB for further testing and validation.

SUMMARY

The EDW Characterization System integrates currently available datasets into a GIS to systematically evaluate all of the streams in California. While much of the current system is automated, human interaction is still required to perform the analysis. The system has been applied to 10 of California's 155 HUCs, chosen to reflect the broad spectrum of environmental complexity and agricultural and industrial development found throughout the state. A larger sample will be required to determine the skill and sensitivity of the complete application, and further testing and field validation are recommended to complete the evaluation.

Due to the complexity of the problem and the lack of accurate and complete data, the results of the EDW application must be examined closely, and until a more thorough procedure can be established, the automated system remains in development.

REFERENCES

Effluent-Dependent Waters Task Force, 1995, Report of the Effluent-Dependent Waters Task Force For Consideration of Issues Related To The Inland Surface Waters Plan, October, 1995. http://www.swrcb.ca.gov/general/publications/docs/effluent-dependent-waters-1995.pdf

Metropolitan Water District, 2000, Challenges Facing Imported Water Supplier and Wastewater Dischargers from Effluent Dependent Waterbodies and TMDLs, August 2000.

Regional Water Quality Control Board, 2000, Effluent Dominated Water Bodies, Draft Report, September 2000.

Jonathan M. Pickus
Geographer
Science Applications International Corporation
Hazard Assessment and Simulation Division
1710 Goodridge Drive
McLean, VA 22102
Telephone: (703) 827-4814
Fax: (703) 356-8408
email: Jonathan.M.Pickus@saic.com

William B. Samuels
Senior Scientist
Science Applications International Corporation
Hazard Assessment and Simulation Division
1710 Goodridge Drive
McLean, VA 22102
Telephone: (703) 556-7074
Fax: (703) 356-8408
email: william.b.samuels@saic.com

David E. Amstutz
Senior Scientist
Science Applications International Corporation
Hazard Assessment and Simulation Division
1710 Goodridge Drive
McLean, VA 22102
Telephone: (540) 972-7106
Fax: (703) 356-8408
email: david.e.amstutz@saic.com