DEVELOPMENT AND IMPLEMENTATION OF FEATURE LEVEL METADATA FOR EPA REGIONS
With the expanded effort to gather spatial data within the U.S. Environmental Protection Agency
(EPA) and the increased use of GIS as a tool for environmental programs, the EPA Regions
concluded that there is a need to manage the locational accuracy of feature data. Until now,
EPA's Regional GIS programs have had no consistent way to store or retrieve locational data
obtained from sources such as GPS or geocoding, or from most of the Agency's own databases.
Although the GIS programs do not own the Agency data and therefore have little control over the
locational data stored in these databases, they do have an inherent responsibility to use the most
accurate data available. To meet this need, and to improve efficiency while maintaining and
improving product quality, the EPA Regions and the National GIS Program worked jointly to
develop a standard Oracle-based feature level metadata database.
INTRODUCTION
As a regulatory Agency, the U.S. Environmental Protection Agency (EPA) has as one of its goals
the implementation and enforcement of environmental regulations in a fair, efficient and
cost-effective manner. An inherent part of the overall enforcement process is the accumulation
of a large amount of information about regulated facilities and the environment. Over the years,
the Agency has developed a number of databases to store such information for a variety of uses.
With the introduction of GIS, the Agency has recognized that this technology has an important
role in the administration of environmental programs. However, most of the Agency's databases
relating to regulated facilities were developed and implemented before the need for accurate
locational information was recognized. For GIS to be an effective tool, the locational data must
be accurate and well documented.
A brief assessment of EPA's existing databases revealed that many have fields for locational
data, but in most cases the associated fields describing the accuracy of the data and the method
used to collect it (i.e., metadata) are either lacking or not populated. In addition, no EPA
database contained adequate fields to store a high-accuracy latitude and longitude together with
its locational metadata. While there are ongoing initiatives within the Agency to modify the
databases to include appropriate metadata and locational information, accurate locational data
is needed now.
The EPA GIS Technical Sub-Workgroup, composed of technical GIS representatives from the
National GIS Program at EPA Headquarters and from each of the 10 EPA Regions, was formed as a
mechanism to share information and advance GIS technology throughout the Agency. Through this
workgroup's activities, members voiced growing concern over the lack of adequate standards and
of a uniform methodology for storing collected locational data and their attributes. Because
spatial data is expensive to gather and critical to the GIS work performed within the Agency,
the workgroup wanted to design a database that all EPA Regions could use to store their
locational data.
A team was formed to analyze requirements, formulate design specifications and develop a system
that could maintain accurate locational data until the Agency's databases had the capacity to
meet this need. This system was named the Spatial Preferred Locational Attribute Table, or SPLAT
for short.
[Figure: SPLAT database diagram]
DESIGN CONSIDERATIONS
The SPLAT database was designed to:
- Store, retrieve, update and manage locational data of different origin and quality through a
standardized system
- Preserve the integrity of all feature data
- Standardize feature level metadata consistent with EPA's Method Accuracy Description (MAD)
code guidelines
- Provide easy access to locational entities
- Use consistent, documented criteria for selecting preferred locations for use in GIS products
- Support a standard method of GIS data distribution
- Provide regulated entity cross-referencing through logical and maintainable linkages
- Work in conjunction with EPA's Envirofacts Warehouse
- Be a distributed locational data warehouse for the Agency
WHAT IS AN ENTITY?
One of the team's greatest hurdles was reaching consensus on the definition of an entity. An
entity in SPLAT is any "thing" that has a location or locational attributes. Examples include a
smokestack, an underground injection well, an endangered species nesting site and a fence line
bordering a Superfund site. Any unit that can be located spatially is an entity. The most common
entities in SPLAT are EPA-regulated facilities.
Records defining entities use a GIS_ID as a unique identifier. A GIS_ID is unique to a
combination of the method of data collection, the location or shape, and the feature type. In
SPLAT, location means all of the information necessary to describe the shape and location of a
feature. Feature types are lines, points, areas, routes and regions. Methods of data collection
include all of the elements of EPA's MAD codes.
An entity may have multiple SPLAT records, and one GIS_ID can be shared by multiple entities.
For example, if several facilities were matched to the same ZIP code centroid, they would all
have the same GIS_ID: the location is the same, the method (matching to a ZIP code centroid) is
the same, and the feature type (point) is the same.
An example of one entity with more than one GIS_ID is an entity that has both address-matched
coordinates and map-interpolated coordinates. Because two different methods were used to obtain
the coordinates, there would be two GIS_IDs.
The matrix below shows how the method, location and feature type of two records determine
whether they share a GIS_ID:
Method      Location    Feature Type   GIS_ID
----------  ----------  -------------  ----------
Same        Same        Same           Same
Same        Different   Same           Different
Same        Same        Different      Different
Same        Different   Different      Different
Different   Same        Same           Different
Different   Different   Same           Different
Different   Same        Different      Different
Different   Different   Different      Different
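The GIS_ID rule in the matrix can be sketched in Python as a lookup keyed on the (method, location, feature type) triple: two records share an ID only when all three match. This is a minimal illustration, not SPLAT's Oracle implementation. The LocRecord class, its field names, the method strings, the sample coordinates and the GIS00001-style ID format are all hypothetical, and a single method string stands in for the full set of MAD code elements.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Tuple


@dataclass(frozen=True)
class LocRecord:
    """One locational record for one entity (hypothetical field names)."""
    entity_id: str     # identifier of the located entity, e.g. a facility
    method: str        # collection method; stands in for the full set of MAD code elements
    feature_type: str  # "point", "line", "area", "route" or "region"
    shape: tuple       # all coordinates needed to describe the feature's shape and location


def assign_gis_ids(records: List[LocRecord]) -> List[Tuple[LocRecord, str]]:
    """Pair each record with a GIS_ID.

    Records share a GIS_ID only when method, shape and feature type all match;
    a difference in any one of the three produces a new GIS_ID.
    """
    ids = {}
    counter = count(1)
    pairs = []
    for rec in records:
        key = (rec.method, rec.shape, rec.feature_type)
        if key not in ids:
            ids[key] = f"GIS{next(counter):05d}"  # hypothetical ID format
        pairs.append((rec, ids[key]))
    return pairs


if __name__ == "__main__":
    zip_centroid = ((39.1155, -94.6268),)  # hypothetical ZIP code centroid
    records = [
        LocRecord("FAC-A", "zip_centroid", "point", zip_centroid),
        LocRecord("FAC-B", "zip_centroid", "point", zip_centroid),
        LocRecord("FAC-A", "address_match", "point", ((39.1170, -94.6301),)),
        LocRecord("FAC-A", "map_interpolation", "point", ((39.1168, -94.6299),)),
    ]
    for rec, gis_id in assign_gis_ids(records):
        print(rec.entity_id, rec.method, gis_id)
```

Run as a script, the two facilities matched to the same ZIP code centroid share GIS00001, while FAC-A's address-matched and map-interpolated coordinates receive GIS00002 and GIS00003. In a relational database the same rule would more naturally be a unique constraint over the three columns; the dictionary here only makes the matching explicit.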
PREFERRED LOCATION SELECTION METHODOLOGY
Storing the locational data in a consistent, logical database does not by itself solve the entire
problem. There also needs to be an automated process to identify and flag the "preferred"
coordinate pair (latitude and longitude) among multiple coordinates for the same spatial entity
(e.g., a regulated facility). In this document, "preferred" refers to the coordinate pair that GIS
users have the greatest confidence represents the actual location of the entity in the real
world. Preference depends largely on the accuracy of the collection method, but may be modified
by additional knowledge or insight.
The following sequence describes the logical flow for selection of a preferred coordinate pair from
a group of coordinate pairs for the same feature.
- Check accuracy_value in the MAD table. Populate every record with a null accuracy_value with a
valid value from the accuracy lookup table.
- Convert accuracy_value to raw_score:
    if accuracy_value <= 20 then raw_score = accuracy_value
    else raw_score = 15 + (accuracy_value ** 0.61)
- Apply the modifier weights: verify_weight (0%, 5%, 10% or 15%, depending on the precision of the
verification test) and user_def_weight (0%, 5%, 10% or 15%, for locations known to be highly
valid):
    modified_score = raw_score - (raw_score * verify_weight) - (raw_score * user_def_weight)
- For each epa_id with duplicate records, set preferred = Y on the record with the lowest
modified_score.
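The selection steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the SPLAT code: the Candidate class, the method_code field used to look up missing accuracy values, and the sample values are hypothetical; weights are written as fractions (0.05 for 5%); and, as the formula implies, a lower score is treated as more trustworthy. The sketch also flags the lone record of an epa_id that has no duplicates, which seems the natural reading of the last step but is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# The modifier weights of 0%, 5%, 10% and 15%, written as fractions.
ALLOWED_WEIGHTS = (0.0, 0.05, 0.10, 0.15)


@dataclass
class Candidate:
    """One coordinate pair for an entity (hypothetical record layout)."""
    epa_id: str
    latitude: float
    longitude: float
    accuracy_value: Optional[float]  # None stands for a null accuracy_value in the MAD table
    method_code: str                 # hypothetical key into the accuracy lookup table
    verify_weight: float = 0.0
    user_def_weight: float = 0.0
    preferred: str = "N"


def raw_score(accuracy_value: float) -> float:
    """Use accuracy values up to 20 directly; compress larger values with a 0.61 power."""
    if accuracy_value <= 20:
        return accuracy_value
    return 15 + accuracy_value ** 0.61


def modified_score(c: Candidate) -> float:
    """Reduce the raw score by each modifier weight; a lower score is more preferred."""
    for w in (c.verify_weight, c.user_def_weight):
        if w not in ALLOWED_WEIGHTS:
            raise ValueError(f"weight {w} is not one of {ALLOWED_WEIGHTS}")
    raw = raw_score(c.accuracy_value)
    return raw - raw * c.verify_weight - raw * c.user_def_weight


def flag_preferred(candidates: List[Candidate], accuracy_lookup: Dict[str, float]) -> None:
    """Set preferred = "Y" on the lowest-scoring candidate of each epa_id, "N" elsewhere."""
    # Step 1: populate null accuracy values from the lookup table.
    for c in candidates:
        if c.accuracy_value is None:
            c.accuracy_value = accuracy_lookup[c.method_code]
    # Steps 2 to 4: score every candidate and keep the lowest score per epa_id.
    best: Dict[str, tuple] = {}
    for c in candidates:
        c.preferred = "N"
        score = modified_score(c)
        if c.epa_id not in best or score < best[c.epa_id][0]:
            best[c.epa_id] = (score, c)
    for _, winner in best.values():
        winner.preferred = "Y"


if __name__ == "__main__":
    lookup = {"B2": 20.0}  # hypothetical lookup table entry
    cands = [
        Candidate("FAC-1", 39.11, -94.62, 25.0, "A1"),
        Candidate("FAC-1", 39.12, -94.63, None, "B2"),
        Candidate("FAC-2", 38.90, -94.50, 100.0, "C3", verify_weight=0.15),
    ]
    flag_preferred(cands, lookup)
    for c in cands:
        print(c.epa_id, c.accuracy_value, round(modified_score(c), 2), c.preferred)
```

In this example the first FAC-1 record scores about 22.12 (15 + 25 ** 0.61), so the second record, whose null accuracy is filled with 20 from the lookup, is flagged instead. Giving the first record a verify_weight of 0.15 would lower its score to about 18.8 and reverse the choice. Ties keep the first record encountered because the comparison is strict; the paper does not say how ties should be broken.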
CONCLUSION
Through the hard work of the SPLAT team, the initial design and development of the Spatial
Preferred Locational Attribute Table have been completed. Although the database structure is
still evolving, it is currently being tested in two of the ten EPA Regions. Through this system,
locational data is available online for Regional use in GIS applications, together with ready
reference to its accuracy. Using the Regional Intranet, users can select a site and produce an
accurate map of a specified entity and other features in the surrounding area.
The system will continue to be evaluated, tested and documented. Upon completion of this
process, the final product will be made available for distribution to other Regions and interested
state agencies.
ACKNOWLEDGEMENTS
The authors wish to thank the other members of the Spatial Preferred Locational Attribute Table
Team for their participation and support: Pat Ausman, Robert Eckman, Don Evans,
Phyllis Mann, Barry Bolka, Cheryl Henley, Randy Deardorff, Dave Wolf, Andy Battin and a
special thanks to Loren Hall.
Vickie S. Damm
US Environmental Protection Agency
Region 7
726 Minnesota Ave.
Kansas City, KS 66101
Telephone: (913) 551-7247
Fax: (913) 551-7863
E-mail: damm.vickie@epamail.epa.gov
Kelvin L. Moseman
CDSI/EPA Region 6
Suite 1200
1445 Ross Ave.
Dallas, TX 75202
Telephone: (214) 665-8562
Fax: (214) 665-2146
E-mail: kmoseman@r6ser1.r06.epa.gov
Anthony R. Selle
US Environmental Protection Agency
Region 8
999 18th Street
Denver, CO 80202
Telephone: (303) 312-6774
Fax: (303) 312-6065
E-mail: selle.tony@epamail.epa.gov