Michael Plastino


EPA Office of Water Georeferencing Plan:

Towards a More Integrated EPA Information Framework


Abstract

The EPA Office of Water (OW) is initiating georeferencing of all water-related information to National Hydrography Dataset (NHD) reaches and 8-digit Hydrologic Unit Code (HUC) watersheds. EPA's Reach Indexing Tool and other geocoding tools will be utilized to assign reach addresses and watersheds for point and nonpoint entities. Addresses will be stored in EPA's NHD Reach Address Database and HUC common reference tables. EPA and the public will be able to seamlessly access water program information through EPA's EnviroMapper, improving implementation and oversight of watershed-scale programs such as the Total Maximum Daily Load and Unified Watershed Assessment programs.


Introduction

The purpose of this paper is to:

• Highlight the need for a more comprehensive integration of EPA's environmental information;

• Discuss how reliance solely upon current standards and policies on information integration could undermine confidence in several watershed-related programs;

• Present Office of Water (OW) efforts at initiating a geo-referencing framework through reach and watershed geo-referencing; and

• Introduce the need to create a comprehensive EPA Geo-Referencing Framework.

Background

The EPA is increasingly faced with the need to integrate data at various spatial scales, such as the reach and the watershed. This data integration effort is being conducted to support EPA's shift to a more holistic, multi-media, watershed-based environmental protection strategy.

Recent integration efforts have geo-referenced EPA water information to reaches and watersheds. This geo-referencing can effectively define a receiving reach and watershed for all point and nonpoint entities. This ability improves the quality of not only OW watershed-scale programs such as the Total Maximum Daily Load (TMDL) and Unified Watershed Assessment (UWA) programs, but also Agency-wide programs ranging from permitting and monitoring to environmental justice and compliance assurance. Additionally, geo-referencing integration efforts improve Web-based GIS tools such as EnviroMapper and the Index of Watershed Indicators that serve the Public Right to Know.

Associating pollutant and other point and nonpoint entities with reaches and watersheds facilitates pollutant tracking across political jurisdictions. Geographic reach and watershed addresses are comparable to the political specification of streets and zip codes. Reach geo-referencing generally specifies a portion of a reach (or multiple reaches), analogous to a street address specifying a portion of a street. Figure 1 demonstrates the assignment of a reach address to a programmatic entity of interest.

Figure 1: Example Assignment of Reach Addresses to Programmatic Entities of Interest


Once reach addresses are assigned, a wide variety of point and nonpoint source information could be accessed through applications such as EnviroMapper (Figure 2).

Figure 2: Example of EnviroMapper Access to Reach Addressed Information


The ability to correctly associate water information with reach addresses depends highly upon the quality of the geographic (latitude/longitude) coordinates of the features of interest. While geographic coordinates of EPA data are occasionally collected through Global Positioning System (GPS) and other high-quality techniques, the bulk of EPA's geographic coordinates have been derived from politically-defined locations such as a street address, a zip code, or a town. Coordinates derived from politically-defined locations can be inaccurate for many reasons.

Accuracies (90th percentile) of politically-derived coordinates range from 150 meters for the highest quality street address-derived coordinates to over 10,000 meters for zip code centroid-derived coordinates (Yorczyk and Hall, 1996). A 1997 sample cross-section of 1,307,289 EPA entities requiring assignment of coordinates revealed that 43% of the entities could be assigned coordinates only by using zip code centroids (EPA OIRM, 1997).

While the inaccuracy of coordinates greatly limits the success of geo-referencing integration efforts, these efforts are also hampered by a lack of geo-referencing standards for reporting formats and documentation of methods. In order for geo-referencing to be fully effective and efficient, a comprehensive EPA Geo-Referencing Framework is required. Since pollutants also travel to water bodies through the air and the ground, the EPA Geo-Referencing Framework should include not only reaches and watersheds, but eventually groundwater aquifers and airsheds (detailed delineations for aquifers and airsheds are currently not available at the national level).

Current Status of EPA Geo-Referencing Framework

Although there is currently no official EPA Geo-Referencing Framework, the Latitude/Longitude Data Standard (LLDS), one of the six Reinventing Environmental Information (REI) Standards, provides a good first step towards achieving such a framework. The LLDS defines mandatory and optional data elements for all programs that are required to record locational information under the Locational Data Policy (LDP). The standard specifies reporting formats and requires documentation of the method, accuracy, and description by which any reported geographic coordinates are determined. Another REI standard, the Facility Data Standard (FDS), specifies that any geographic coordinates reported by regulated facilities comply with the LLDS.

The above standards are important for improving identification of facilities and collection of geographic coordinates. Equivalent standards currently do not exist for determining, recording, and documenting the geo-referencing of EPA data to reaches and watersheds. Information has already been referenced to watersheds in some EPA water information systems and to reaches in a few systems. The following sections highlight some of the numerous problems that arise due to EPA's use of different methods for translating various existing locational information, such as street address and geographic coordinates, to reach and watershed addresses.

Difficulties Representing Location of Sources, Reaches, and Watersheds

Geographic coordinates are the preferred location information for geo-referencing. Simply associating geographic coordinates with reaches and watersheds, however, often produces inadequate results due to:

1) Uncertainty in the geographic coordinates, including

  • Error involved in coordinate determination techniques that use street address, zip code, or other politically defined addresses. This error can range from 150 meters to over 10,000 meters.

  • Error in direct determination of geographic coordinates. Surveying techniques, including several Global Positioning System (GPS) techniques, can have errors of 100 meters or greater (Yorczyk and Hall, 1996).

2) Uncertainty in the representation of reaches and watershed boundaries, including

  • Errors in creation of topographic paper maps, such as missing streams and inconsistent stream densities arising from different creation methods.

  • Errors associated with inconsistencies in translating information on the paper maps to digital reach networks and watershed boundaries. As inferred from conformance with the National Map Accuracy Standards, the 90th percentile accuracy of the National Hydrography Dataset (NHD) reach network and the Hydrologic Unit Code (HUC) watershed boundaries is approximately 75 meters and 125 meters, respectively.

3) Uncertainty arising from the process of bringing together coordinate data with reach networks and watershed boundaries, including

  • Processing of coordinates and/or reach networks and watershed boundaries to bring them all into the same coordinate system (e.g. State Plane, UTM, Latitude/Longitude) and the same reference datum (e.g. NAD27, NAD83, WGS84). Coordinate system and reference datum changes can lead to errors of greater than 100 meters.

    These three types of error in combination inevitably lead to a 'gap' between reality and what is ultimately displayed on a computer screen. Associating a point with a reach and therefore a watershed based solely on geographic proximity, as represented on a computer, can therefore yield inaccurate results.
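As a rough illustration of how these error sources can add up, the sketch below combines the figures quoted above by root-sum-square. This is an assumption made for illustration, not a method stated in this paper: it treats the error sources as independent and treats the quoted 90th percentile values as if they could be combined like standard deviations, which is only approximately valid. The function and constant names are hypothetical.

```python
import math


def combined_error_m(*component_errors_m):
    """Root-sum-square of independent positional errors, in meters.

    Assumption: the components are independent, so their squares add.
    """
    return math.sqrt(sum(e * e for e in component_errors_m))


# Figures quoted in the text (90th percentile or lower-bound values).
COORD_ERROR_STREET_M = 150.0    # best street address-derived coordinates
COORD_ERROR_ZIP_M = 10_000.0    # zip code centroid-derived coordinates
NHD_REACH_ERROR_M = 75.0        # NHD reach network
DATUM_SHIFT_ERROR_M = 100.0     # coordinate system / datum processing

street_gap_m = combined_error_m(
    COORD_ERROR_STREET_M, NHD_REACH_ERROR_M, DATUM_SHIFT_ERROR_M
)
zip_gap_m = combined_error_m(
    COORD_ERROR_ZIP_M, NHD_REACH_ERROR_M, DATUM_SHIFT_ERROR_M
)

print(f"street address-derived point: about {street_gap_m:.0f} m")
print(f"zip centroid-derived point:   about {zip_gap_m:.0f} m")
```

Under these assumptions the gap for a street address-derived point is about 195 meters, while for a zip code centroid it stays just over 10,000 meters because the coordinate error dominates every other term.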

    Difficulties Due to Topography

    Even if the gap between reality and computer representation did not exist, association of points with reaches and watersheds can be incorrect if it does not account for local topography, as demonstrated in Figure 3.

    Figure 3: Improper Association of Data Source Location with Reach and Watershed

    If association is based only on proximity of data source coordinates to reaches, the treatment facility below would be associated with Watershed A. However, due to the intervening watershed boundary, the proper association is with Watershed B.
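The situation in Figure 3 can be sketched with two simple geometric tests: distance from a point to a reach segment, and containment of a point in a watershed polygon. The geometry below is invented for illustration (two square watersheds, one straight reach in each), and all names are hypothetical. A digitized watershed boundary is used here as a stand-in for topography, which is itself an assumption.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_in_polygon(p, polygon):
    """Ray-casting containment test for a simple polygon."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical geometry: the watershed boundary runs along x = 10.
WATERSHEDS = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
REACHES = {
    "reach-A": {"watershed": "A", "segment": ((9, 0), (9, 10))},
    "reach-B": {"watershed": "B", "segment": ((18, 0), (18, 10))},
}


def nearest_reach(p, reaches):
    """Name of the reach closest to p, ignoring watershed boundaries."""
    return min(
        reaches, key=lambda name: point_segment_distance(p, *reaches[name]["segment"])
    )


def containing_watershed(p, watersheds):
    """Name of the watershed polygon that contains p, or None."""
    for name, polygon in watersheds.items():
        if point_in_polygon(p, polygon):
            return name
    return None


def nearest_reach_in_watershed(p, reaches, watersheds):
    """Nearest reach, restricted to the watershed that contains p."""
    home = containing_watershed(p, watersheds)
    candidates = {k: v for k, v in reaches.items() if v["watershed"] == home}
    return nearest_reach(p, candidates) if candidates else None


facility = (11.0, 5.0)  # just across the boundary from reach-A
by_proximity = nearest_reach(facility, REACHES)
by_boundary = nearest_reach_in_watershed(facility, REACHES, WATERSHEDS)
print(by_proximity, by_boundary)
```

In this toy layout the facility lies 2 units from the reach in Watershed A and 7 units from the reach in Watershed B, so a proximity-only rule picks Watershed A even though the point sits inside Watershed B.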

    The following examples illustrate the implications for water program implementation that can result from geo-referencing efforts that use inaccurate coordinates, or that use methods that do not account for topography:
     

  • Associating a feature with the incorrect reach: Use of geographic coordinates with large potential error creates great uncertainty in the association of point and nonpoint features with National Hydrography Dataset (NHD) reaches. Features could be associated with the wrong water body. This uncertainty threatens the validity of reach-based programs, such as TMDL assessments. To minimize litigation associated with improper implementation of TMDL and other reach-based programs, it is necessary to use highly accurate coordinates.

  • Associating a feature with the incorrect watershed: Point and nonpoint sources with inaccurate coordinates could be incorrectly associated with the 8-digit USGS Hydrologic Unit Code (HUC) watersheds utilized in Unified Watershed Assessments, the Index of Watershed Indicators, EnviroMapper, and other watershed-based programs and resources. When smaller watershed delineations (10- and 12-digit HUC watersheds) become available, inaccurate coordinates will increase the likelihood that a point will be associated with the wrong watershed, eroding public confidence in these programs and resources.


    Difficulties Due to Non-Uniform Methods and Documentation

    In addition to problems associated with topography and potential sources of error, geo-referencing efforts can also be plagued by a lack of agency-wide standards for geo-referencing methodology. For example, OW's Listing of Fish and Wildlife Advisories contains very useful information, but uses a reach addressing scheme unique to the project, preventing advisory data from being easily integrated with OW data geo-referenced to the de facto standard reach network, currently Reach File 3 (RF3). Clearly, establishing agency standards for reach networks as well as for watersheds is essential for efficient use of EPA information resources. The upcoming NHD reach network and the 8-digit HUC watersheds are the recommended standards. Methods and documentation for geo-referencing data to these standards also need to be defined and implemented.

    Office of Water Geo-Referencing Plan

    The OW Water Information Management Advisory Committee (WIMAC) is developing plans to geo-reference OW water information system data to reaches and watersheds. OW is encouraging agency-wide examination of the plan, both during its development and implementation, for potential use of the plan or components of the plan, in the establishment of an EPA Geo-Referencing Framework.

    Establishing an OW Geo-Referencing Framework

    The OW Geo-Referencing Plan will associate all OW water system information with reaches and watersheds by establishing the NHD and the 8-digit HUC, respectively, as the standard reach network and watershed delineation. The NHD (USGS/EPA, 1999), completed in February 2000, supersedes the RF3 (EPA OW, 1998a). The reach addresses assigned to water system information will be stored in the upcoming NHD-Reach Address Database (RAD), due for completion in March 2000. The first portion of each reach address specifies the 8-digit HUC. Water information currently associated with RF3 and stored in the RF3-RAD (EPA OW, 1998c) will be migrated to the NHD-RAD.
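To make the idea of a reach address concrete, here is a minimal sketch of one possible record layout. It assumes a 14-digit reach code whose first 8 digits are the HUC, plus "from" and "to" measures giving the addressed portion of the reach as a percentage of its length. The class name, field names, measure convention, and the example reach code are hypothetical and are not taken from the NHD-RAD design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReachAddress:
    """One addressed portion of a reach (illustrative layout only)."""

    reach_code: str      # assumed: 14 digits, the first 8 being the HUC
    from_measure: float  # assumed: start of the portion, 0-100 percent
    to_measure: float    # assumed: end of the portion, 0-100 percent

    def __post_init__(self):
        if len(self.reach_code) != 14 or not self.reach_code.isdigit():
            raise ValueError("reach_code must be a 14-digit string")
        if not 0.0 <= self.from_measure <= self.to_measure <= 100.0:
            raise ValueError("measures must satisfy 0 <= from <= to <= 100")

    @property
    def huc8(self) -> str:
        """The 8-digit watershed code carried in the reach code."""
        return self.reach_code[:8]


# Made-up reach code used only to show the watershed lookup.
address = ReachAddress("02070010000123", from_measure=25.0, to_measure=75.0)
print(address.huc8)
```

Because the watershed code is embedded in the reach code, any entity with a reach address also has an 8-digit HUC without a separate lookup, which is the property the paragraph above relies on.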

    EPA's NHD-Reach Indexing Tool (RIT) (EPA OW, 1999a), completed in April 2000, will be the primary instrument for performing reach addressing. The NHD-RIT will be designed similarly to other NHD applications and will incorporate lessons learned during States' use of the RF3-RIT for 303(d) TMDL listings. The NHD-RIT, NHD-RAD, and NHD form the core of the OW Geo-Referencing Framework, shown in Figure 4.

    Figure 4: OW Geo-Referencing Framework

    The NHD-RIT and other geo-referencing tools will utilize geographic coordinates currently maintained by EPA as well as those collected by states and regions. OW will work with the Agency on collecting this existing coordinate data, emphasizing use of high-quality coordinates. In cases where this data is not available, OW may resort to using politically-derived, low-quality geographic coordinates, such as those derived from the facility street address or the zip code centroid.
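The preference for high-quality coordinates, with politically-derived coordinates as a fallback, can be sketched as a simple selection over candidate coordinates ranked by their documented accuracy. This is a hypothetical illustration, not a description of the NHD-RIT: the record layout, function name, method labels, and the 25-meter GPS figure are all assumptions, while the 150-meter and 10,000-meter values are the ones quoted earlier.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Coordinate:
    lat: float
    lon: float
    method: str        # how the coordinate was determined
    accuracy_m: float  # documented accuracy of that method, in meters


def best_coordinate(
    candidates: Sequence[Coordinate], max_error_m: Optional[float] = None
) -> Optional[Coordinate]:
    """Pick the candidate with the smallest documented error.

    If max_error_m is given, candidates worse than that are rejected,
    and None is returned when nothing acceptable remains.
    """
    usable = [
        c for c in candidates if max_error_m is None or c.accuracy_m <= max_error_m
    ]
    return min(usable, key=lambda c: c.accuracy_m, default=None)


# Made-up candidates for a single facility.
candidates = [
    Coordinate(38.8951, -77.0364, "zip code centroid", 10_000.0),
    Coordinate(38.8977, -77.0365, "street address", 150.0),
    Coordinate(38.8979, -77.0366, "GPS (hypothetical accuracy)", 25.0),
]

chosen = best_coordinate(candidates)
print(chosen.method)
```

A selection like this only works if each coordinate carries its method and accuracy, which is the documentation the Latitude/Longitude Data Standard requires.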

    Future Directions

    In addition to geo-referencing current OW data to reaches and watersheds, OW will continue to advocate improvements in latitude/longitude collection, such as more accurate collection methods and collection of coordinates at the point of interaction with the environment rather than the location of the facility or office. OW will also explore the possibility of having data collection parties specify the receiving reach and watershed associated with entities. This data collector specification of reach and watershed could reduce error by avoiding the sources of error leading to the gap discussed earlier.

    Efforts are currently under way to create more accurate reach network and watershed delineations. The National Elevation Dataset (USGS EROS Data Center, 1999), for example, offers the ability to supplement or improve the NHD and the HUC 8-digit, 10-digit, and 12-digit codes. OW is involved with these improvement efforts and will investigate incorporation of improved reach networks and watershed delineations into the OW Geo-Referencing Framework.

    Additionally, OW will monitor and investigate database technology developments which could allow for a more efficient and effective method of geo-referencing. Developing spatial indexing functions of Oracle's Spatial Data Cartridge or Esri's Spatial Database Engine, for example, could obviate the need for storing reach addresses in a central repository.

    Conclusion

    In summary, the OW Geo-Referencing Plan will:

    1) Develop OW Standards by:
      • Defining NHD and the 8-digit HUC as standards for reach networks and watershed delineations respectively;
      • Establishing the RAD as a common reference table for NHD;
      • Establishing common reference tables for 8-digit HUCs;
      • Describing and Presenting Methods for Geo-Referencing; and
      • Outlining Documentation Requirements for Geo-Referencing to Reaches and Watersheds, similar to the Latitude/Longitude Data Standard documentation.
    2) Develop a plan to geo-reference all OW water system information to reaches and watersheds.

    In addition to implementing the OW Geo-Referencing Plan, the WIMAC will play an active role within OW and throughout the agency in:

    1) Advocating the continuing improvement of latitude/longitude collection through both
      • Collection of coordinates from the point of interaction with the environment rather than at the facility or office street address; and
      • Use of high resolution Global Positioning System (GPS) techniques.
    2) Exploring the possibility of data collector specification of receiving reaches and watersheds by
      • Encouraging optional data collectors' specification of a receiving reach and watershed;
      • Modifying the Locational Data Policy to require specification of a receiving reach and watershed in addition to geographic coordinates; and
      • Creating a new agency standard to specify Agency-wide common reach and watershed reporting and documentation methods for water data.

    As OW Geo-Referencing efforts progress, the EPA may benefit from implementing OW-tested components of the OW Geo-Referencing Plan. OW establishment of the NHD reach network and HUC watersheds as Agency standards, as well as common geo-referencing methodologies and documentation, could provide more efficient use of EPA information resources. The WIMAC is committed to assisting the EPA in any efforts to implement components of an EPA Geo-Referencing Framework. OW supports the addition of airshed and groundwater components to the framework when detailed delineations of these components become available at the national level.

    References

    EPA OIRM, March 1997, Latitude/Longitude Values Report, Prepared by EPA Systems Development Center, Science Applications International Corporation.

    EPA OW, October 1998 [1998a], The U.S. EPA Reach File Version 3.0 Alpha Release (RF3-Alpha) Technical Reference, http://www.epa.gov/owow/monitoring/rf/techref.html.

    EPA OW, November 1998 [1998b], Reach Addressing Tool Assessment Document, Prepared by Indus Corporation for EPA OW/OWOW.

    EPA OW, December 1998 [1998c], Reach Addressing Database Design and Development Support: Physical Database Design Document, Prepared by Indus Corporation for EPA OW/OWOW.

    EPA OW, October 1999 [1999a], Reach Indexing Tool for the National Hydrography Dataset (NHD-RIT) Requirements Document, Prepared by Center for Environmental Analysis, Research Triangle Institute for EPA OW/OWOW.

    USGS/EPA, July 1999, National Hydrography Dataset, http://nhd.usgs.gov/

    Yorczyk, Rick (NGS) and Hall, Loren (EPA), July 1996, Location Data Collection Method Default Accuracies, Table.


    Author Information
    Michael Plastino
    Information Resource Specialist, plastino.michael@epa.gov
    Environmental Protection Agency, Office of Water
    1200 Pennsylvania Avenue N.W., Mail Code 4102
    Washington, D.C. 20460