DEVELOPMENT AND IMPLEMENTATION OF FEATURE LEVEL METADATA FOR EPA REGIONS

With the expanded efforts to gather spatial data in the U.S. Environmental Protection Agency (EPA) and the increased use of GIS as a tool for environmental programs, the EPA Regions concluded that there is a need to manage the locational accuracy of feature data. Up to this point, EPA's Regional GIS programs have had no consistent way to store or retrieve locational data obtained from sources such as GPS or geo-coding, or from most of the Agency's own databases.

Although the GIS programs do not own the Agency data and therefore have little control over the locational data stored in these databases, they do have an inherent responsibility to use the most accurate data available. To meet this need, improve efficiency, and maintain and improve product quality, the EPA Regions and the National GIS Program worked jointly to develop a standard Oracle-based feature level metadata database.


INTRODUCTION

As a regulatory Agency, the U.S. Environmental Protection Agency (EPA) has as one of its goals the implementation and enforcement of environmental regulations in a fair, efficient and cost-effective manner. An inherent part of the overall environmental enforcement process is the accumulation of a large body of information relating to regulated facilities as well as the environment. Over the years, the Agency has developed a number of databases to store this information for a variety of uses. With the introduction of GIS, this technology has come to be recognized as an important tool in the administration of environmental programs. However, most of the Agency's databases relating to regulated facilities were developed and implemented before the need for accurate locational information was recognized. For GIS to be an effective tool, it is critical that the locational data be accurate and well documented.

A brief assessment of EPA's existing databases revealed that many have fields for locational data, but in most cases the associated fields describing the accuracy of the data and the method used to collect it (i.e., metadata) are either lacking or not populated. Also, no EPA database contained adequate fields to store a high-accuracy latitude and longitude together with its locational metadata. While there are ongoing initiatives within the Agency to modify the databases to include appropriate metadata and locational information, the need for accurate locational data is already upon us.

The EPA GIS Technical Sub-Workgroup, composed of technical GIS representatives from the National GIS program in EPA Headquarters and from each of the 10 EPA Regions, was formed as a mechanism to share information and advance GIS technology throughout the Agency. Through this workgroup's activities, a growing concern was expressed over the lack of adequate standards and of a uniform methodology for storing collected locational data and their attributes. Because spatial data are expensive to gather and critical to the GIS work being performed within the Agency, the workgroup wanted to design a database that all EPA Regions could use for storing their locational data.

A team was formed to perform an analysis of requirements, formulate design specifications and develop a system that could be used to maintain accurate locational data until the time came when the Agency databases contained the capacity to meet this need. This system was named the Spatial Preferred Locational Attribute Table, or SPLAT for short.

SPLAT database diagram

DESIGN CONSIDERATIONS

The SPLAT database was designed to:

WHAT IS AN ENTITY?

One of the team's greatest hurdles was forming a consensus on the definition of an entity. An entity in SPLAT is any "thing" that has a location or locational attributes. Examples of an entity are a smoke stack, an underground injection well, an endangered species nesting site, or a fence line that borders a Superfund site. Any unit that can be located spatially is an entity. The most common types of entities in SPLAT are EPA-regulated facilities.

Records defining entities use a GIS_ID as a unique identifier. A GIS_ID is unique to the combination of data collection method, location or shape, and feature type. In SPLAT, location means all the information necessary to describe the shape and location of a feature. Feature types are lines, points, areas, routes, or regions. Methods of data collection include all of the elements from the EPA's MAD codes.

An entity may have multiple SPLAT records, and one GIS_ID may be shared by multiple entities. For example, if several facilities were matched to the same zip code centroid, they would all have the same GIS_ID: the location or shape is the same, the method (matching to zip code centroid) is the same, and the feature type (point) is the same.

An example of one entity with more than one GIS_ID is an entity that has both address-matched coordinates and map-interpolated coordinates. Because two different methods were used to obtain the locational coordinates, there would be two GIS_IDs.

The matrix below shows how a change in method, location, or feature type affects the GIS_ID. The GIS_ID stays the same only when all three are the same:


Method        Location       Feature Type   GIS_ID

Same          Same           Same           Same
Same          Different      Same           Different
Same          Same           Different      Different
Same          Different      Different      Different
Different     Same           Same           Different
Different     Different      Same           Different
Different     Same           Different      Different
Different     Different      Different      Different
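
The rule expressed by the matrix can be sketched in a few lines of Python. This is only an illustration, not the SPLAT implementation: the class name GisIdRegistry, the method labels, the sequential integer IDs, and the sample coordinates are all hypothetical. The sketch treats the GIS_ID as a key on the (method, location, feature type) triple, so an identical triple always returns the same GIS_ID and any difference produces a new one.

```python
from itertools import count


class GisIdRegistry:
    """Hands out one GIS_ID per distinct (method, location, feature_type) triple.

    Hypothetical helper for illustration; IDs are simple sequential integers.
    """

    def __init__(self):
        self._ids = {}          # (method, location, feature_type) -> GIS_ID
        self._next = count(1)   # source of new IDs

    def gis_id_for(self, method, location, feature_type):
        key = (method, location, feature_type)
        if key not in self._ids:
            # Any difference in the triple yields a new GIS_ID.
            self._ids[key] = next(self._next)
        return self._ids[key]


registry = GisIdRegistry()
centroid = (39.11, -94.63)  # made-up coordinates for a zip code centroid

# Two facilities matched to the same centroid by the same method share a GIS_ID.
id_facility_a = registry.gis_id_for("zip_centroid", centroid, "point")
id_facility_b = registry.gis_id_for("zip_centroid", centroid, "point")

# The same location obtained by a different method gets a different GIS_ID.
id_other_method = registry.gis_id_for("address_match", centroid, "point")
```

Here id_facility_a and id_facility_b correspond to the first row of the matrix, and id_other_method to the row where only the method differs.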

PREFERRED LOCATION SELECTION METHODOLOGY

Simply storing the locational data in a consistent, logical database does not solve the entire problem. There also needs to be an automated process to identify and flag the "preferred" locational coordinate pair (latitude and longitude) out of multiple coordinate pairs for the same spatial entity (e.g., a regulated facility). In this document, "preferred" refers to the coordinate pair that GIS users are most confident represents the actual location of the entity in the real world. Preference is largely dependent on the accuracy of the collection method, but is modified by additional knowledge or insight.

The following sequence describes the logical flow for selection of a preferred coordinate pair from a group of coordinate pairs for the same feature.

  1. Check accuracy_value in the MAD table. Any record with a null accuracy_value is populated with a valid value from the accuracy lookup table.
  2. Convert accuracy_value to raw_score:
    if accuracy_value <= 20 then raw_score = accuracy_value
    else raw_score = 15 + (accuracy_value ** 0.61)
  3. Apply modifier weights. verify_weight is 0%, 5%, 10% or 15%, depending on the precision of the test; user_def_weight is 0%, 5%, 10% or 15%, and is applied to locations known to be highly valid. Each weight reduces the score:
    modified_score = raw_score - (raw_score * verify_weight) - (raw_score * user_def_weight)
  4. Among records that share the same epa_id, set preferred = y on the record with the lowest modified_score.
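
Steps 2 through 4 above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the SPLAT code itself: it assumes step 1 has already filled in every null accuracy_value, it represents records as dictionaries, it stores the weights as fractions (0.05 for 5%), and it breaks ties in favor of the first record encountered. The function names, the sample epa_id values, and the method labels are hypothetical.

```python
def raw_score(accuracy_value):
    """Step 2: values up to 20 pass through; larger values are compressed."""
    if accuracy_value <= 20:
        return float(accuracy_value)
    return 15 + accuracy_value ** 0.61


def modified_score(accuracy_value, verify_weight=0.0, user_def_weight=0.0):
    """Step 3: weights are fractions (0.05 means 5%); each lowers the score."""
    raw = raw_score(accuracy_value)
    return raw - raw * verify_weight - raw * user_def_weight


def flag_preferred(records):
    """Step 4: within each epa_id, mark the lowest-scoring record preferred = 'y'."""
    best = {}  # epa_id -> record with the lowest modified_score seen so far
    for rec in records:
        rec["modified_score"] = modified_score(
            rec["accuracy_value"],
            rec.get("verify_weight", 0.0),
            rec.get("user_def_weight", 0.0),
        )
        rec["preferred"] = "n"
        current = best.get(rec["epa_id"])
        # Strict comparison: on a tie the earlier record stays preferred (assumption).
        if current is None or rec["modified_score"] < current["modified_score"]:
            best[rec["epa_id"]] = rec
    for rec in best.values():
        rec["preferred"] = "y"
    return records


# Made-up sample data: two coordinate pairs for one facility, one for another.
records = [
    {"epa_id": "FAC-1", "method": "zip_centroid", "accuracy_value": 100},
    {"epa_id": "FAC-1", "method": "address_match", "accuracy_value": 25,
     "verify_weight": 0.15},
    {"epa_id": "FAC-2", "method": "gps", "accuracy_value": 10},
]
flag_preferred(records)
```

In this sample, the second FAC-1 record has the lower modified_score and is flagged as preferred, while FAC-2 has only one record, which is preferred by default.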

CONCLUSION

Through the hard work of the SPLAT team, the initial design and development of the Spatial Preferred Locational Attribute Table has been completed. Although the database structure is still evolving, it is currently being tested in two of the ten EPA Regions. Locational data is available on-line through this system for Regional use in GIS applications, including ready reference to its accuracy. Through the Regional Intranet, users can select a site and produce an accurate map of a specified entity and other features in the surrounding area.

The system will continue to be evaluated, tested and documented. Upon completion of this process, the final product will be made available for distribution to other Regions and interested state agencies.

ACKNOWLEDGEMENTS

The authors wish to thank the other members of the Spatial Preferred Locational Attribute Table Team for their participation and support: Pat Ausman, Robert Eckman, Don Evans, Phyllis Mann, Barry Bolka, Cheryl Henley, Randy Deardorff, Dave Wolf and Andy Battin, with special thanks to Loren Hall.


Vickie S. Damm
US Environmental Protection Agency
Region 7
726 Minnesota Ave.
Kansas City, KS 66101
Telephone: (913) 551-7247
Fax: (913) 551-7863
E-mail: damm.vickie@epamail.epa.gov

Kelvin L. Moseman
CDSI/EPA Region 6
Suite 1200
1445 Ross Ave.
Dallas, TX 75202
Telephone: (214) 665-8562
Fax: (214) 665-2146
E-mail: kmoseman@r6ser1.r06.epa.gov

Anthony R. Selle
US Environmental Protection Agency
Region 8
999 18th Street
Denver, CO 80202
Telephone: (303) 312-6774
Fax: (303) 312-6065
E-mail: selle.tony@epamail.epa.gov