An ArcGIS-based System to Create Spatial Surrogate Data for Geographically Distributing Air Emissions

Patricia S. Stiefer and Tami H. Funk, Sonoma Technology, Inc., Petaluma, CA

Emission inventories are used for regional photochemical modeling of ozone formation in the atmosphere to help determine possible source control strategies to improve air quality. Detailed, gridded emission inventories are required as input to air quality models and must include emissions estimates of ozone precursor pollutants reported at the grid-cell level, which is finer than the state or county level. The geographic locations of many large point sources such as power plants and factories are known, providing a method to assign emissions from these sources to specific geographic locations. In contrast, area sources tend to be smaller, transient, and widely distributed throughout a region. Because the exact locations of area sources are often unknown, the geographic distribution of these sources must be approximated using emissions surrogates. Emissions surrogates are geographic land feature data that can be used as indicators of emissions activity because their spatial distributions are assumed to be representative of the geographic distribution of emissions sources. Developing emissions surrogates is a labor-intensive process that involves acquiring, formatting, and processing spatial data sets (in many different formats) from many different sources. Sonoma Technology, Inc. (STI) recently developed an ArcGIS-based system to organize, format, and automate data processing of emissions surrogate data. This paper presents an overview of the ArcGIS-based processing system as it was applied to the development of gridded surrogate data for the California Air Resources Board (CARB) and the Texas Natural Resource Conservation Commission (TNRCC).


Introduction

Health-based air quality standards have been established by the federal government for several criteria pollutants including ozone. Ozone in the lower atmosphere is produced through complex chemical reactions involving nitrogen oxides (NOx), hydrocarbons, and sunlight. Nitrogen oxides are among the primary pollutants emitted by combustion sources. Hydrocarbons are released into the atmosphere through the combustion, handling, and processing of petroleum products as well as through the evaporation of volatile organic compounds used for industrial processes.

Local, state, and federal government agencies operate air quality monitors to determine prevailing ambient conditions in select urban and suburban regions. These monitoring data are used to determine whether a particular area is in attainment of the National Ambient Air Quality Standards (NAAQS) (U.S. Environmental Protection Agency, 2001). When an area fails to meet the federal standard for ozone, that area is required to develop an emission inventory containing estimates of air emissions from both man-made and natural sources. Emission inventory data are then used for regional photochemical modeling of ozone formation in the atmosphere to determine possible source control strategies to improve air quality.

Photochemical air quality models use meteorological, topographic, and emission inventory data to simulate the physical and chemical processes that influence ozone formation in an airshed. Grid-based air quality models portray the modeling region as a three-dimensional grid matrix, and ozone concentrations are predicted for each individual grid cell in the modeling domain. To accurately represent the geographic distribution of emissions throughout a gridded modeling domain, it is necessary to develop an emission inventory that reports ozone precursor emissions corresponding to each grid cell of the modeling domain.

An emission inventory identifies sources of air pollutants in an area and quantifies emissions from specific sources. Emissions are typically reported at the state or county level for four types of sources that are necessary for ozone modeling:

Detailed, gridded emission inventories are required as input to air quality models and must include emissions estimates of ozone precursor pollutants reported at the grid-cell level. Modeling grids may extend over multi-state domains and may include hundreds of thousands of grid cells, depending on grid resolution. For large point sources such as power plants and factories, geographic coordinates are commonly reported with emissions, providing a method to assign emissions from these sources to specific geographic locations. In contrast, area sources tend to be smaller, transient, and widely distributed throughout a region. Because the exact locations of area sources are often unknown, the geographic distribution of these sources must be approximated using emissions surrogates. Emissions surrogates are geographically resolved socio-economic, facility, and land cover data that are used as indicators of emissions activity because their spatial distributions are assumed to be representative of the geographic distribution of area sources.
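As a minimal sketch of how reported coordinates allow a point source to be assigned to a grid cell, the following Python function maps a coordinate pair to a cell index. It assumes projected coordinates in meters and a regular grid whose origin is the lower-left corner of the domain; the function name, the grid dimensions, and the sample coordinates are illustrative assumptions and are not taken from the CARB or TNRCC projects.

```python
import math

def cell_index(x, y, x_origin, y_origin, cell_size, n_cols, n_rows):
    """Return (col, row) of the grid cell containing (x, y), or None if outside.

    Assumes projected coordinates in meters and a grid origin at the
    lower-left corner of the domain (an illustrative convention).
    """
    col = math.floor((x - x_origin) / cell_size)
    row = math.floor((y - y_origin) / cell_size)
    if 0 <= col < n_cols and 0 <= row < n_rows:
        return col, row
    return None

# Hypothetical 2-km grid of 100 x 80 cells with its origin at (0, 0).
print(cell_index(5300.0, 1999.0, 0.0, 0.0, 2000.0, 100, 80))  # (2, 0)
```

No comparable lookup is possible for area sources, which is why their emissions must be distributed with surrogates instead.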

Developing emissions surrogates is a labor-intensive process that involves acquiring, formatting, and processing spatial data sets (in many different formats) from many different sources. These spatial data sets are then imposed onto the defined modeling grid and the geographic features are disaggregated into individual grid cells. Sonoma Technology, Inc. (STI) recently developed an ArcGIS-based system to organize, format, and automate data processing of emissions surrogate data. This paper presents an overview of the ArcGIS-based processing system as it was applied to the development of gridded surrogate data for the California Air Resources Board (CARB) and the Texas Natural Resource Conservation Commission (TNRCC).

Background

Historically, spatial allocation of state- or county-wide area source emissions was based on limited spatial surrogate data. However, improvements in GIS technology and spatial data sets have increased the amount and resolution of surrogate data available for use in emissions surrogate development. For example, 200-m resolution U.S. Geological Survey (USGS) land cover data developed in the 1970s have been augmented by the recent release of the 30-m resolution National Land Cover Data (NLCD) (U.S. Geological Survey, 2002).

Spatial surrogates are geographically resolved socio-economic, facility, and land cover data that can be used as indicators of area source emissions activity because their spatial distribution is assumed to be representative of the source. Agricultural land cover data, for example, can be used as a surrogate for emissions from area sources such as off-road agricultural equipment. Spatial allocation factors, or computed values that represent the proportion of a surrogate within a grid cell as a fraction of the state (or county) total surrogate value, are derived from emissions surrogate data. If emission inventory data are reported by state or county, spatial allocation factors can be multiplied by the state or county total emissions to determine the fraction of emissions that occur in each grid cell.
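The definition of a spatial allocation factor can be written compactly. The symbols below are notation introduced here for illustration and do not appear in the project reports: S_{c,g} is the surrogate value (area, length, or count) of county c that falls in grid cell g, A_{c,g} is the spatial allocation factor, E_c is the county total emissions, and E_g is the resulting gridded emissions.

```latex
A_{c,g} = \frac{S_{c,g}}{\sum_{g'} S_{c,g'}},
\qquad \sum_{g} A_{c,g} = 1,
\qquad E_{g} = \sum_{c} A_{c,g}\, E_{c}
```

As a purely hypothetical example, if a county contains 50 km² of agricultural land and 2 km² of it lies in one grid cell, that cell's factor is 2/50 = 0.04, so 100 tons per day of county-wide agricultural equipment emissions would place 4 tons per day in that cell.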

An ArcGIS-based processing system was developed to produce gridded surrogate data and spatial allocation factors for the CARB (Funk, 2001a) and TNRCC (Funk, 2001b). Both projects required that gridded surrogate data be developed to support air quality modeling and were based on very large, highly resolved (2-km x 2-km) modeling domains. The CARB modeling domain encompassed the entire state of California as well as offshore waters. The TNRCC domain included portions of Texas, Oklahoma, Arkansas, and Louisiana, as well as three smaller, nested 1-km x 1-km grids within Texas. The extents of the CARB and TNRCC domains are shown in Figures 1 and 2, respectively.

CARB Modeling Domain
Figure 1. California Air Resources Board modeling domain.
TNRCC Modeling Domain
Figure 2. Texas Natural Resource Conservation Commission modeling domain.

Acquisition and development of emissions surrogate data for regions as large as those shown in Figures 1 and 2 is a major effort. The amount of spatial data, inconsistencies among data formats and projections, and data quality present challenges in preparing spatial data sets for surrogate development. Furthermore, manual processing and quality assurance of this amount of spatial data is extremely labor intensive. Lastly, the spatial data sets used for surrogate development are updated periodically, thus creating the need for processing methods that are easily updateable as new data become available. The ArcGIS-based processing system offers several benefits for large-scale emissions surrogate projects:

The remainder of this paper describes the process used to apportion county-wide area source emissions estimates to individual grid cells in a modeling domain, using surrogate-based spatial allocation factors for select area sources. High-resolution NLCD recently released by the USGS is used to illustrate the development of spatial allocation factors for emissions activities associated with residential land use. The NLCD is 30-m resolution land cover data for the conterminous United States, based primarily on Landsat thematic mapper data acquired in 1992 (U.S. Geological Survey, 2001).

Approach

The approach for developing emissions surrogates consists of three general steps. The first step involves performing research to identify sources of spatial data available for surrogate development. The second step involves assigning spatial surrogates to emissions source categories, and the third step involves the processing and development of gridded surrogate and spatial allocation factor files. Each general step is discussed in detail below.

Review of Spatial Data Appropriate for Surrogate Development

Spatially resolved surrogate data associated with point, line, or polygon geographic features are typically obtained from local transportation planning organizations (TPOs), air pollution control agencies, state and federal agencies, and commercially available data sources. Examples of surrogate data include population, employment and housing statistics; addresses of small facility sources such as service stations or auto body refinishers; highways, railroads, and shipping lanes; elevation; local, state, and national land cover data; and water bodies used for recreational activities.

A thorough review was performed to identify sources of local, regional, and statewide spatial data sets appropriate for use in the development of emissions surrogates and spatial allocation factors. Multiple sources of spatial data often exist for the same surrogate data set. For example, land use data are available from many different public and private entities. Consideration must be given to the spatial resolution, vintage, and quality of each data set. Furthermore, each data set must be assessed to determine how well the spatial data represent each emissions source.

Assignment of Spatial Surrogates to Emissions Source Categories

Each spatial surrogate must be assigned to an emission source category in the inventory; a surrogate may be associated with more than one source category. A total of 65 surrogates were developed for approximately 500 emissions source categories for the CARB project, and 22 surrogates were developed for over 200 source categories for the TNRCC project. Table 1 provides several examples of surrogate-emission source category assignments for select source categories.

Table 1. Example of select surrogate-emission source category assignments

Emissions Source Category Description | Spatial Surrogate
Agricultural irrigation internal combustion engines | Agricultural cropland
Pesticide application | Agricultural land cover
Recreational boats | Lakes, reservoirs, coastline, rivers
Dry cleaning | Locations of dry cleaners
Vehicle refueling - vapor displacement losses | Locations of gasoline service stations
Wine fermentation | Locations of wineries
Military aircraft | Military airports
Marine coatings | Port locations
Locomotives | Railroad lines
Liquid petroleum gas - residential | Residential land cover
Oil-based traffic coatings | Roads
Marine vessels, commercial diesel | Shipping lanes
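
The assignments in Table 1 amount to a lookup from source category to surrogate. The sketch below holds a subset of the table in a Python dictionary and inverts it to list the categories served by each surrogate; the variable and function names are assumptions made for illustration, and the actual projects stored these assignments in Access databases.

```python
from collections import defaultdict

# Category-to-surrogate assignments copied from Table 1 (subset).
ASSIGNMENTS = {
    "Pesticide application": "Agricultural land cover",
    "Recreational boats": "Lakes, reservoirs, coastline, rivers",
    "Dry cleaning": "Locations of dry cleaners",
    "Locomotives": "Railroad lines",
    "Liquid petroleum gas - residential": "Residential land cover",
    "Oil-based traffic coatings": "Roads",
}

def categories_by_surrogate(assignments):
    """Invert the mapping so each surrogate lists the categories it serves."""
    grouped = defaultdict(list)
    for category, surrogate in assignments.items():
        grouped[surrogate].append(category)
    return dict(grouped)

print(ASSIGNMENTS["Locomotives"])                      # Railroad lines
print(categories_by_surrogate(ASSIGNMENTS)["Roads"])   # ['Oil-based traffic coatings']
```

Because each category maps to exactly one surrogate while a surrogate may serve several categories, the inverted view is a convenient way to see which gridded surrogates must be built.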


Development of Gridded Surrogate and Spatial Allocation Factor Data

The ArcGIS-based processing system was developed to grid and display the emissions surrogate data and to calculate and quality assure the spatial allocation factors. The processing scheme consists of six steps and includes multiple tiers of ArcGIS spatial data sets, Access databases, and Arc Macro Language (AML) scripts to automate data processing within ArcGIS. The system can be used to develop gridded surrogate data and spatial allocation factors for point, line, or polygon features.

Step 1: Development of Standardized Basemaps. Spatial surrogate data obtained from various sources (in many different formats and projections) were standardized by converting the raw files into ArcInfo coverages and reprojecting them to match the modeling grid.

Step 2: Preliminary Data Processing. Access databases containing tabular feature attribute and statistical data were constructed to standardize all tabular data sets for use in spatial allocation factor calculations.

Step 3: Gridding Spatial Surrogate Data. The basemap coverages were spatially disaggregated into grid cells by overlaying each polygon, arc, or point basemap coverage with the grid cell domain coverage using ArcInfo and customized AML processing scripts. The output of this step is gridded spatial data that is used as input to Step 4.
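
The polygon case of this overlay can be sketched in plain Python. The code below is not the AML used in the system; it is a stand-in that clips a polygon to each axis-aligned grid cell with Sutherland-Hodgman clipping and measures the clipped piece with the shoelace formula. All function names, the grid definition, and the sample parcel are assumptions made for illustration.

```python
def clip_to_cell(polygon, xmin, ymin, xmax, ymax):
    """Clip a polygon (list of (x, y) vertices) to an axis-aligned grid cell
    using Sutherland-Hodgman clipping against the four cell edges."""
    def clip(points, inside, intersect):
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))
        return out

    def x_cross(x):  # intersection of segment p-q with the vertical line at x
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def y_cross(y):  # intersection of segment p-q with the horizontal line at y
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    edges = [
        (lambda p: p[0] >= xmin, x_cross(xmin)),
        (lambda p: p[0] <= xmax, x_cross(xmax)),
        (lambda p: p[1] >= ymin, y_cross(ymin)),
        (lambda p: p[1] <= ymax, y_cross(ymax)),
    ]
    points = list(polygon)
    for inside, intersect in edges:
        if not points:
            break
        points = clip(points, inside, intersect)
    return points

def polygon_area(points):
    """Shoelace formula; returns the absolute area (0.0 for an empty list)."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x0, y0 = points[i - 1]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def grid_polygon(polygon, x_origin, y_origin, cell_size, n_cols, n_rows):
    """Return {(col, row): area} for every cell the polygon overlaps."""
    areas = {}
    for col in range(n_cols):
        for row in range(n_rows):
            xmin = x_origin + col * cell_size
            ymin = y_origin + row * cell_size
            piece = clip_to_cell(polygon, xmin, ymin, xmin + cell_size, ymin + cell_size)
            area = polygon_area(piece)
            if area > 0:
                areas[(col, row)] = area
    return areas

# Hypothetical 3 km x 1 km parcel (coordinates in km) on a 2-km grid.
parcel = [(1.0, 0.5), (4.0, 0.5), (4.0, 1.5), (1.0, 1.5)]
print(grid_polygon(parcel, 0.0, 0.0, 2.0, 3, 2))
```

The parcel spans two cells, so its 3 km² are split into 1 km² in cell (0, 0) and 2 km² in cell (1, 0). Testing every cell is acceptable for a sketch; a domain with hundreds of thousands of cells would restrict the loop to the polygon's bounding box.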

Step 4: Calculations. Databases were constructed to ingest data produced in Steps 2 and 3 and calculate the gridded surrogate and spatial allocation factor tables. The gridded surrogate data values are based on the density of surrogate features (e.g., area, length, or number) contained in each grid cell. The spatial allocation factors are weighted values that indicate what fraction of the total surrogate value for a state or county resides in each grid cell.
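
The calculation in this step can be sketched as a grouping operation over the gridded records. The Python below is an illustration only (the projects performed these calculations in Access databases); the record layout, function name, and sample values are assumptions.

```python
from collections import defaultdict

def allocation_factors(records):
    """records: iterable of (county, cell_id, surrogate_value) tuples.

    Returns {(county, cell_id): factor}, where each factor is the cell's
    share of its county's total surrogate value. Counties whose total is
    zero are skipped because no share can be defined for them.
    """
    county_totals = defaultdict(float)
    cell_values = defaultdict(float)
    for county, cell_id, value in records:
        county_totals[county] += value
        cell_values[(county, cell_id)] += value
    return {
        key: value / county_totals[key[0]]
        for key, value in cell_values.items()
        if county_totals[key[0]] > 0
    }

# Hypothetical gridded residential area (square km) for two counties.
records = [
    ("County A", (10, 4), 1.5),
    ("County A", (11, 4), 0.5),
    ("County B", (11, 4), 2.0),  # cell (11, 4) straddles the county line
    ("County B", (12, 4), 2.0),
]
factors = allocation_factors(records)
print(factors[("County A", (10, 4))])  # 0.75
```

Keying the factors by both county and cell lets a grid cell that straddles a county boundary carry a separate share for each county.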

Step 5: Quality Assurance of Gridded Surrogate and Spatial Allocation Factor Data. ArcInfo geodatabases were created to display and quality assure the gridded surrogate and spatial allocation factor data created by the database calculations in Step 4.
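
Alongside the visual review described in this step, simple numeric checks can catch gridding errors. The sketch below is an assumed complement to that review rather than part of the documented system: it flags factors outside the range 0 to 1 and counties whose factors do not sum to 1. The function name and tolerance are illustrative.

```python
import math
from collections import defaultdict

def qa_factors(factors, tolerance=1e-6):
    """Return a list of problems found in {(county, cell): factor}."""
    problems = []
    sums = defaultdict(float)
    for (county, cell), factor in factors.items():
        if factor < 0 or factor > 1:
            problems.append(f"{county} {cell}: factor {factor} outside [0, 1]")
        sums[county] += factor
    for county, total in sorted(sums.items()):
        if not math.isclose(total, 1.0, abs_tol=tolerance):
            problems.append(f"{county}: factors sum to {total:.6f}, expected 1")
    return problems

# Hypothetical factor tables: the first passes, the second is missing a cell.
good = {("A", (0, 0)): 0.25, ("A", (1, 0)): 0.75}
bad = {("B", (0, 0)): 0.5, ("B", (1, 0)): 0.4}
print(qa_factors(good))  # []
print(qa_factors(bad))   # ['B: factors sum to 0.900000, expected 1']
```

A county whose factors sum to less than 1 usually indicates surrogate features that fell outside the modeling grid or were dropped during the overlay.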

Step 6: Development of Spatial Allocation Factor Database. The final step in the processing system involves the development of the final spatial allocation factor database. The final database incorporates all the spatial allocation factor tables for each surrogate in a standard format suitable for input into a photochemical or other grid-based model.
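
One possible layout for such a database is a single table keyed by surrogate, county, and grid cell. The sketch below uses Python's built-in sqlite3 module as a stand-in for the Access databases used in the projects; the table name, column names, and sample rows are hypothetical and do not describe the delivered CARB or TNRCC files.

```python
import sqlite3

def build_factor_database(path, rows):
    """Write (surrogate, county, cell_col, cell_row, factor) rows to one table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS allocation_factor (
               surrogate TEXT NOT NULL,
               county    TEXT NOT NULL,
               cell_col  INTEGER NOT NULL,
               cell_row  INTEGER NOT NULL,
               factor    REAL NOT NULL,
               PRIMARY KEY (surrogate, county, cell_col, cell_row)
           )"""
    )
    conn.executemany("INSERT INTO allocation_factor VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn

# Hypothetical rows held in an in-memory database.
conn = build_factor_database(":memory:", [
    ("residential", "County A", 10, 4, 0.75),
    ("residential", "County A", 11, 4, 0.25),
])
total = conn.execute(
    "SELECT SUM(factor) FROM allocation_factor WHERE surrogate = ? AND county = ?",
    ("residential", "County A"),
).fetchone()[0]
print(total)  # 1.0
```

The composite primary key rejects duplicate entries for the same surrogate, county, and cell, which guards against loading a factor table twice.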

For the TNRCC project, the 30-m resolution NLCD was used as the spatial surrogate data for emission sources associated with housing density, industrial and commercial activity, and forested or agricultural areas. The NLCD residential data for Louisiana, the state most completely encompassed by the TNRCC 2-km x 2-km modeling domain, is presented in this paper to illustrate each of the six processing steps in the ArcGIS-based system discussed above.

The NLCD data exist as image files in their native format. The native NLCD image files were converted to Esri raster grids and reprojected to match the TNRCC modeling grid. Figure 3 shows the NLCD for Louisiana including all land cover classes included in the NLCD. Many of the land use categories contained in the NLCD were not used to develop surrogate data. The land use classes of interest were extracted from the data set and were reclassed into four land use surrogates as presented in Figure 4. The four reclassed grids, each representing a single land use surrogate, were then converted to polygon coverages to arrive at standardized basemaps. Figure 5 shows the resulting basemap coverage for residential land use.
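
The reclassification can be sketched as a lookup applied to every pixel. In the Python below, the class codes follow the NLCD 1992 legend as understood here and should be checked against the USGS product description. The grouping into four surrogates is an assumption based on the surrogate types named in this paper, not the mapping actually used for the TNRCC project, and the small raster is invented for the example.

```python
# Assumed grouping of NLCD 1992 class codes into four land use surrogates.
RECLASS = {
    21: "residential",            # low intensity residential
    22: "residential",            # high intensity residential
    23: "commercial_industrial",  # commercial/industrial/transportation
    41: "forest", 42: "forest", 43: "forest",
    81: "agriculture", 82: "agriculture", 83: "agriculture",
}

def reclass_mask(raster, surrogate):
    """Return a 0/1 grid marking pixels whose class maps to `surrogate`.

    Classes absent from RECLASS (for example, open water) map to 0.
    """
    return [[1 if RECLASS.get(code) == surrogate else 0 for code in row]
            for row in raster]

# Hypothetical 2 x 3 tile of NLCD class codes.
tile = [
    [21, 22, 41],
    [11, 23, 82],
]
print(reclass_mask(tile, "residential"))  # [[1, 1, 0], [0, 0, 0]]
```

Producing one mask per surrogate mirrors the four reclassed grids described above, each of which is then converted to a polygon coverage.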

Original NLCD land cover classes in Louisiana
Figure 3. Raw data with 18 land cover classes.

Reclassed grids representing land use surrogates
Figure 4. Raw NLCD reclassed to four land use types.

Basemap coverage for residential land use
Figure 5. Basemap coverage for residential land use.

The next step in the processing scheme is to overlay the residential land use map with the grid domain coverage. Figure 6 illustrates the result of gridding the NLCD residential land use surrogate. Each 2-km x 2-km grid cell contains between 0 and 4 km² of residential land area. Next, the gridded surrogate value for each grid cell is divided by the aggregate county total surrogate value to obtain the portion of surrogate in each grid cell as a fraction of the county total. Figure 7 illustrates the residential spatial allocation factors generated from the gridded surrogate data created in Steps 3 and 4. Each 2-km x 2-km grid cell in Figure 7 indicates the fraction of the county (or parish) total residential land area that is contained in that grid cell.

Residential land cover in square kilometers
Figure 6. Square kilometers of residential land cover in each 2-km x 2-km grid cell.

Spatial allocation factors for residential surrogates
Figure 7. Final spatial allocation factors for residential surrogates.

Summary and Conclusions

The ArcGIS-based processing system provides an organized, structured method to develop gridded surrogate and spatial allocation factor data on a large scale. This system offers many benefits:

STI's future work in this area will continue to expand on the ArcGIS-based processing system including the development of customized processing applications and integration of new GIS spatial analysis tools.

References

Funk T.H., Stiefer P.S., and Chinkin L.R. (2001a) Development of gridded spatial allocation factors for the state of California. Technical memorandum prepared for California Air Resources Board, Sacramento, CA by Sonoma Technology, Inc., Petaluma, CA, STI-900201/999542-2092-TM, July.

Funk T.H., Stiefer P.S., and Chinkin L.R. (2001b) Development of gridded spatial allocation factors for the state of Texas. Final report prepared for Texas Natural Resource Conservation Commission, Austin, TX by Sonoma Technology, Inc., Petaluma, CA, STI-900570-2114-FR3, August.

U.S. Environmental Protection Agency (2001) National Ambient Air Quality Standards (NAAQS). Office of Air Quality Planning and Standards, February. Web page at http://www.epa.gov/airs/criteria.html, last accessed June 20, 2002.

U.S. Geological Survey (2001) National Land Cover Data, Product Description. Web page at http://landcover.usgs.gov/prodescription.html, last accessed on June 20, 2002.

U.S. Geological Survey (2002) National Land Cover Characterization. April. Web page at http://landcover.usgs.gov/natllandcover.html, last accessed on June 20, 2002.


Patricia S. Stiefer
GIS Specialist
Sonoma Technology Inc.
1360 Redwood Way, Suite C
Petaluma CA 94954
Phone: 707-665-9900
Fax: 707-665-9800
Email: pats@sonomatech.com

Tami H. Funk
Project Manager/GIS Specialist
Sonoma Technology, Inc.
1360 Redwood Way, Suite C
Petaluma CA 94954
Phone: 707-665-9900
Fax: 707-665-9800
Email: tami@sonomatech.com