Jian Dai and David M. Rocke

Using GIS to Evaluate Spatial Variations in Area Source Emissions

Abstract

Area source emissions of air pollution have traditionally been estimated at the national level and then allocated to the states and counties. The spatial variations within a county have been largely ignored, mainly due to the difficulty of collecting, processing, and analyzing data at smaller geographic scales. With the increasing availability of digital data relevant to emission studies and the special capabilities of a GIS to manage spatially referenced data, the spatial variations of emissions within counties can now be analyzed. This paper reports a new method for allocating certain area source emissions to sub-county units using GIS methods. The approach consists of three main steps. The first step is development of a spatial database. This includes identifying the emission-producing facilities of interest in the study area, geocoding the locations of the facilities by address-matching with the TIGER/Line coverages, and developing other geo-referenced data related to the emission-producing activities. The second step is disaggregation of larger areas such as counties into smaller zones. Conventional model grid cells are used as the unit for allocation and modeling. The data coverages are integrated with the model grid cell coverage for analysis. In the third step, county-wide emission estimates are allocated to the grid cells based on attributes of interest, and spatial variations are evaluated. The paper also describes methods for developing predictive statistical models that estimate emission-related activities from easily available data. The procedures and methods discussed in the paper are illustrated in an application, namely automobile refinishing emissions in the Sacramento modeling region in the State of California.


I. Introduction

Area source emissions of air pollution have been traditionally estimated at the national level and then allocated to the states and counties. The spatial variations within a county have been largely ignored mainly due to the difficulty in collecting, processing, and analyzing data at smaller geographic scales. With the increasing availability of digital data relevant to emission studies and the special capabilities of a GIS to manage spatially referenced data, spatial variations of emissions within counties can now be analyzed. This paper reports a new method for allocation of certain area source emissions to sub-county units using GIS methods.

Geographic information systems have been widely used in environmental modeling and analysis, including studies of hydrological systems, atmospheric systems, land surface and subsurface processes, biological and ecological modeling, and risks and hazards (Goodchild et al., 1996; U.S. EPA, 1995; Goodchild, Parks and Steyaert, 1993). Applications of GIS to air pollution problems include air quality analysis (Jensen and Sathisan, 1996; Hallmark and O'Neill, 1995; Souleyrette et al., 1992), mobile source emission modeling (Bachman et al., 1996a, 1996b; Sarasua et al., 1996), and estimation of PM10 (particulate matter smaller than 10 microns in aerodynamic diameter) area source emissions (Shimp and Campbell, 1995). Using the agricultural tilling emission inventory as an example, Shimp and Campbell have shown that a GIS is a valuable tool for estimating and analyzing spatially distributed emissions.

In emission inventory work, area sources generally refer to those sources that individually emit relatively small quantities of air pollutants but collectively result in significant emissions (CARB, 1995). This paper proposes a GIS-based approach to spatial allocation of emission estimates for those area sources that can individually be referenced to specific points in space (e.g. automobile refinishing activity). The approach consists of three main steps. The first step is development of a spatial database. This includes identifying the emission-producing facilities of interest in the study area, geocoding the locations of the facilities by address-matching with the TIGER/Line coverages, and developing other geo-referenced data related to the activities that generate the emissions. The second step is disaggregation of larger areas such as counties into smaller zones. Conventional model grid cells are used as the unit for allocation and modeling. The data coverages are integrated with the model grid cell coverage for modeling. In step three, county-wide emission estimates are allocated to the model grid cells based on attributes of interest, and spatial variations are evaluated. Often, predictive models can be developed by correlating emission-producing activity with relevant data. The paper describes the methods for developing predictive statistical models for emission allocation. The methods and procedures discussed in this paper are illustrated in an application, namely automobile refinishing emissions in the Sacramento modeling region in the State of California.

II. Methodology

Spatial analysis and allocation of area source emissions involve complicated spatial data manipulations. Geocoding tools are needed for geo-referencing emission activities, and spatial overlay tools are needed for disaggregating and aggregating data layers to model grid cells. Statistical spatial modeling tools are needed for estimating and predicting emission activity based on a sample of data.

2.1 Spatial Database Development

The basic spatial objects in a GIS are points, lines, and polygons (areas). In the spatial database for the emission analysis, an industrial facility that emits pollutants of interest is represented by a point entity consisting of a single XY coordinate pair and a number of attributes. The object for emission allocation is an area entity represented by a grid square.

The locations of the emission-producing facilities can be identified and geocoded by their addresses. The addresses of the facilities can be found from a variety of sources such as phone books, commercial business lists, government records, or online databases. Geocoding is a mechanism for building a database relationship between addresses and spatial features. In geocoding, a GIS compares the address of the facility against a street coverage that has address attributes. When a match is found, a geographic coordinate is calculated for the address, and a spatial point is created in the database. If the address is not matched, the GIS displays diagnostic messages that explain why a match was not found, allows the address in question to be edited, and restarts the matching process. Specifically, the ArcInfo GIS provides the following geocoding capabilities: creating address coverages (or converting TIGER files to address coverages), building and maintaining INFO files containing a list of addresses to be matched, matching the list to the address coverage to create points (or matching addresses interactively to specified locations), processing unmatched addresses, and maintaining address coverages.
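The coordinate calculation in address matching can be illustrated with a simplified Python sketch. It assumes that a street segment carries a from-address and a to-address at its two end points, and it places a house number by linear interpolation along the segment. The function and parameter names are hypothetical, and the sketch ignores street-name parsing and the odd/even side of the street; it is not the ArcInfo implementation.

```python
def interpolate_address(house_number, from_addr, to_addr, start_xy, end_xy):
    """Estimate the XY location of a house number on one street segment.

    from_addr and to_addr are the address numbers at the start and end of
    the segment; start_xy and end_xy are the segment end points.
    Returns None when the number is outside the segment's address range.
    """
    low, high = min(from_addr, to_addr), max(from_addr, to_addr)
    if not low <= house_number <= high:
        return None  # unmatched: needs editing or another segment
    if from_addr == to_addr:
        fraction = 0.5  # single-address segment: use the midpoint
    else:
        fraction = (house_number - from_addr) / (to_addr - from_addr)
    x = start_xy[0] + fraction * (end_xy[0] - start_xy[0])
    y = start_xy[1] + fraction * (end_xy[1] - start_xy[1])
    return (x, y)


# Hypothetical segment with address range 100-200, running 100 m due east.
location = interpolate_address(150, 100, 200, (0.0, 0.0), (100.0, 0.0))
```

An address that returns None corresponds to the unmatched case described above, which must be edited and matched again.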

To create the model grid coverage, the cell size and the extent of the coverage must be decided. The smaller the cell size, the finer the spatial resolution for allocation and modeling. The map extent should be large enough to cover the whole study area. The coordinates of the intersections of the grid lines can be easily computed. Using the coordinates, a GIS creates the grid square coverage and builds it into polygon topology. Other data sets often need to be developed or utilized for analysis and modeling. For example, it may be useful to show where the emission sources are located in relation to land uses, the road network, and the population distribution. In the application presented in Section 3, we select urban land use, miles of highways, population density, and retail employment density as the spatial surrogates for allocation of automobile refinishing emissions. In the database for the application, population and employment data are attributes of the census tract boundary layer, which has polygon topology. The land use data is a polygon layer, and the highway layer is a line layer.
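As an illustration of how the grid square coordinates follow from the cell size and the map extent, the Python sketch below generates the corner coordinates of each cell. The function name, the cell numbering, and the example extent are assumptions made for illustration; a GIS would additionally build polygon topology from these coordinates.

```python
import math


def make_grid(xmin, ymin, xmax, ymax, cell_size):
    """Return (cells, nrows, ncols) for a square grid covering the extent.

    Each cell is (cell_id, (x0, y0, x1, y1)). The grid is extended past
    xmax/ymax when the extent is not a multiple of the cell size, so that
    the whole study area is covered.
    """
    ncols = math.ceil((xmax - xmin) / cell_size)
    nrows = math.ceil((ymax - ymin) / cell_size)
    cells = []
    for row in range(nrows):
        for col in range(ncols):
            x0 = xmin + col * cell_size
            y0 = ymin + row * cell_size
            cell_id = row * ncols + col  # row-major numbering (an assumption)
            cells.append((cell_id, (x0, y0, x0 + cell_size, y0 + cell_size)))
    return cells, nrows, ncols


# Hypothetical 10 km by 8 km extent with 4 km cells (coordinates in meters).
cells, nrows, ncols = make_grid(0, 0, 10000, 8000, 4000)
```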

2.2 Spatial Data Overlay and Spatial Allocation

Spatial Overlay

A spatial database for emission analysis consists of a variety of elements in different spatial units and at a variety of resolutions. For example, the facilities are spatial points, while the units for emission allocation are square cells. The demographic variables can be based on census tracts, census blocks, or postal ZIP codes. It is a common problem in spatial analysis that the spatial units for which data are available are not necessarily the ones that the analysis or modeling requires. A solution to the problem is spatial data overlay. A GIS has strong spatial overlay capabilities and is ideally suited for deriving data in the target unit given the relevant source data. Three types of spatial overlay operations are usually used in emission analysis and modeling:

(1) point in polygon operation - count the number of emission points that fall within a cell;

(2) polygon on polygon operation - proportion a polygon's attribute value to a cell (e.g. population density); and

(3) line in polygon operation - compute the length of a linear feature (e.g. a highway) within a cell.
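For the special case of square grid cells, the point in polygon and line in polygon operations listed above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions (axis-aligned cells, straight line segments, planar coordinates), with hypothetical function names; it uses the standard Liang-Barsky clipping technique rather than any particular GIS routine.

```python
import math
from collections import Counter


def cell_index(x, y, xmin, ymin, cell_size):
    """Return the (row, col) of the grid cell that contains point (x, y)."""
    return (math.floor((y - ymin) / cell_size),
            math.floor((x - xmin) / cell_size))


def count_points(points, xmin, ymin, cell_size):
    """Point in polygon: count the points that fall within each cell."""
    return Counter(cell_index(x, y, xmin, ymin, cell_size) for x, y in points)


def clipped_length(p0, p1, box):
    """Line in polygon: length of segment p0-p1 inside box (x0, y0, x1, y1)."""
    (xa, ya), (xb, yb) = p0, p1
    dx, dy = xb - xa, yb - ya
    t0, t1 = 0.0, 1.0  # parametric range of the segment still inside the box
    edges = ((-dx, xa - box[0]), (dx, box[2] - xa),
             (-dy, ya - box[1]), (dy, box[3] - ya))
    for p, q in edges:
        if p == 0:
            if q < 0:
                return 0.0  # parallel to this edge and outside the box
        else:
            t = q / p
            if p < 0:
                t0 = max(t0, t)  # segment enters across this edge
            else:
                t1 = min(t1, t)  # segment leaves across this edge
    if t0 > t1:
        return 0.0
    return (t1 - t0) * math.hypot(dx, dy)


# Hypothetical facility points and one highway segment on a grid of 4-unit cells.
counts = count_points([(1, 1), (2, 3), (5, 1)], xmin=0, ymin=0, cell_size=4)
highway_in_cell = clipped_length((-2, 2), (6, 2), (0, 0, 4, 4))
```

A highway stored as a polyline would be handled by summing clipped_length over its segments. The polygon on polygon case requires areal weighting, which is discussed next.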

Areal Weighting Method

A technical issue involved in polygon overlays is known as the areal weighting problem. Since the boundaries of the source zones and the target zone usually do not coincide, one must weight the source zone values according to the portion of the target zone's area that each source zone makes up. The method of areal weighting is briefly described below.

Let V be the variable of interest, S be the source zone, T be the target zone, and A be the area of a zone. In the example of deriving population density for the model grid cells, V is the population density variable, Zone S may be a census tract, and Zone T is a grid cell. Where S intersects T, their boundaries form a zone of intersection ST. The problem is to find the value of V for the target zone or the intersection zone. The computation depends on whether V is "extensive" or "intensive" (Goodchild and Lam, 1980). The variable V is extensive if its value for a target zone is equal to the sum of its values for the intersection zones. V is intensive if its value for a target zone is a weighted average of its values for the intersection zones. V is usually considered to be extensive if it is a count (e.g. number of employees), and to be intensive if it is a proportion, percentage, or rate (e.g. percentage of urban land).

Assuming that V is evenly distributed within the source zone, the values of V are computed as follows. If it is an extensive variable,

$$V_t = \sum_{s} V_s \frac{A_{st}}{A_s}$$

If V is an intensive variable,

$$V_t = \sum_{s} V_s \frac{A_{st}}{A_t}$$

where the summations are over the source zones s that intersect the target zone t, $A_s$ and $A_t$ are the areas of the source zone and the target zone, and $A_{st}$ is the area of their intersection. The assumption that the variable of interest is uniformly distributed over the source zone is not always plausible. For example, there might be a lake in the zone. In those cases, methods of areal interpolation that use ancillary data (Flowerdew and Green, 1994; Green, 1990) can take into account other relevant information available about the source zones.
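The two areal weighting formulas can be checked with a small worked example. The Python sketch below assumes, purely for simplicity, that all zones are axis-aligned rectangles so that intersection areas are easy to compute; the zone geometries and attribute values are hypothetical.

```python
def rect_area(rect):
    """Area of a rectangle (x0, y0, x1, y1); zero if it is empty."""
    x0, y0, x1, y1 = rect
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def intersection_area(a, b):
    """Area of the zone of intersection of two rectangles."""
    return rect_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))


def areal_weight(sources, target, extensive):
    """Estimate V for the target zone from (rectangle, value) source zones.

    Extensive variables are weighted by A_st / A_s, intensive variables
    by A_st / A_t, assuming V is evenly distributed within each source zone.
    """
    total = 0.0
    for rect, value in sources:
        a_st = intersection_area(rect, target)
        denominator = rect_area(rect) if extensive else rect_area(target)
        total += value * a_st / denominator
    return total


# One grid cell overlapped half-and-half by two hypothetical census tracts.
cell = (0.0, 0.0, 4.0, 4.0)
tract_a = (-2.0, 0.0, 2.0, 4.0)
tract_b = (2.0, 0.0, 6.0, 4.0)

# Extensive variable: employee counts of 1000 and 3000.
employees = areal_weight([(tract_a, 1000.0), (tract_b, 3000.0)], cell, True)

# Intensive variable: population densities of 2.0 and 6.0.
density = areal_weight([(tract_a, 2.0), (tract_b, 6.0)], cell, False)
```

Half of each tract lies in the cell, so the cell receives half of each tract's employees (500 + 1500 = 2000), while its density is the area-weighted average of the two tract densities (0.5 x 2.0 + 0.5 x 6.0 = 4.0).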

Spatial Allocation Methods

Spatial allocation of emissions or emission activities can be done by overlaying the facility location layer with the grid cell layer and aggregating the facility points in each cell. The facility location layer can have a number of attributes to be used for allocation. Two methods of spatial allocation are considered here. The first method may be called discrete allocation. Using this method, one first overlays the facility point coverage on the grid cell coverage and then computes for each cell the fraction (weight) for allocation. Let $a_j$ be the value of the attribute of interest for facility point j (j=1,…,M), and $W_i$ be the weight for cell i (i=1,…,N), then

$$W_i = \frac{C_i}{T}$$

where $C_i = \sum_{j \in i} a_j$ is the sum of the attribute values of the facility points that fall within cell i, and $T = \sum_{i=1}^{N} C_i$.
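A minimal Python sketch of the discrete allocation computation follows. It assumes that each facility has already been assigned to a cell by the point overlay, and it uses an attribute value of 1 per facility so that the weights reduce to facility counts. The cell labels, the county total, and the function names are hypothetical.

```python
from collections import defaultdict


def allocation_weights(facilities):
    """Compute the weight W_i = C_i / T for each cell.

    facilities is an iterable of (cell_id, a_j) pairs, where a_j is the
    attribute value of one facility located in that cell.
    """
    cell_totals = defaultdict(float)
    for cell_id, a_j in facilities:
        cell_totals[cell_id] += a_j  # C_i: sum of a_j over facilities in cell i
    grand_total = sum(cell_totals.values())  # T: sum of C_i over all cells
    if grand_total == 0:
        raise ValueError("no facility attribute values to allocate on")
    return {cell: c_i / grand_total for cell, c_i in cell_totals.items()}


def allocate(county_emissions, weights):
    """Distribute a county-wide emission estimate over the cells."""
    return {cell: w * county_emissions for cell, w in weights.items()}


# Four hypothetical facilities in three cells, each with attribute value 1.
weights = allocation_weights([("A", 1.0), ("A", 1.0), ("B", 1.0), ("C", 1.0)])
cell_emissions = allocate(100.0, weights)  # hypothetical county total
```

Cells that contain no facilities do not appear in the result and implicitly receive a weight of zero.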

The discrete allocation method treats the pollution area as a collection of discrete grid cells. The operation is simple but does not produce a smooth pollution surface. If the attribute value used for allocation is measured at the interval or ratio scale, one may first use the value to interpolate a surface, and then overlay the surface on the grid cell layer to estimate the allocation weight for each cell. The second method, therefore, may be called the surface allocation method. The surface of interest can be generated using the Triangulated Irregular Network (TIN) data model (Peuker et al., 1978; Esri, 1994). The TIN is a surface model that uses a sheet of continuous, connected triangular facets based on a Delaunay triangulation of irregularly spaced sample points. Using either method, one transforms point objects (emission-producing facilities) into areal units (emission allocation weights for grid cells).
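To show how a surface value can be read from a TIN, the Python sketch below interpolates linearly within a single triangular facet using barycentric coordinates. It assumes the triangulation has already been built and the facet containing the location of interest (for example, a grid cell center) is known; the function name and the sample values are hypothetical, and construction of the Delaunay triangulation itself is not shown.

```python
def interpolate_in_triangle(p, a, b, c):
    """Linearly interpolate a surface value at p inside one TIN facet.

    a, b, c are (x, y, z) vertices of the facet and p is an (x, y) location.
    Returns None if p lies outside the facet or the facet is degenerate.
    """
    x, y = p
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        return None  # the three vertices are collinear
    # Barycentric weights of p with respect to vertices a, b, and c.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < 0:
        return None  # p is outside this facet
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical facet whose vertices carry attribute values 10, 20, and 30.
value = interpolate_in_triangle((1.0, 1.0),
                                (0.0, 0.0, 10.0),
                                (4.0, 0.0, 20.0),
                                (0.0, 4.0, 30.0))
```

The three vertices define the plane z = 10 + 2.5x + 5y, so the interpolated value at (1, 1) is 17.5.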

2.3 Statistical Spatial Modeling

Statistical methods can be used to analyze and predict the spatial variations of pollution. Here, the purpose of building statistical models is to predict the spatial distribution of the emission-producing activity. This is done by correlating the activity with relevant data. The selection of models depends on the measurement of the dependent variable of interest and the availability of data. A simple predictive model can be made by assuming that the number of emission sources in a cell follows the Poisson distribution. The Poisson model is a spatial point process model and has been widely used for spatial analysis.

Poisson Regression

Let $y_i$ be the number of emission-producing facilities in grid cell i. The model assumes that each $y_i$ is drawn from a Poisson distribution with parameter $h_i$, which is related to the regressors $x_i$. The Poisson model is

$$\Pr(Y_i = y_i) = \frac{e^{-h_i} h_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \ldots$$

The most common formulation for $h_i$ is

$$\ln h_i = b'x_i$$

It can be easily shown that

$$E[y_i \mid x_i] = \mathrm{Var}[y_i \mid x_i] = h_i = \exp(b'x_i)$$

where $b$ is a vector of regression coefficients and $x_i$ is the vector of explanatory variables. Thus, the Poisson model is a nonlinear regression. Denote $b^*$ as the estimated value of $b$ and $y_i^*$ as the prediction, then

$$y_i^* = \exp(b^{*\prime} x_i)$$

The parameters of the Poisson regression model can be estimated using the maximum likelihood method. The log-likelihood function is

$$\ln L = \sum_i \left( -h_i + y_i b'x_i - \ln y_i! \right)$$

The likelihood equations are

$$\frac{\partial \ln L}{\partial b} = \sum_i (y_i - h_i) x_i = 0$$

The Hessian is

$$\frac{\partial^2 \ln L}{\partial b \, \partial b'} = -\sum_i h_i x_i x_i'$$
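Because the gradient and the Hessian are available in closed form, the likelihood equations can be solved by Newton-Raphson iteration. The pure-Python sketch below is one way to do this for a small number of predictors; it is an illustration with hypothetical function names and toy data, not the estimation software used in this study, and it assumes the information matrix is nonsingular.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_poisson(xs, ys, iterations=25, tol=1e-10):
    """Estimate b by Newton-Raphson; xs rows must include a constant term."""
    k = len(xs[0])
    beta = [0.0] * k
    for _ in range(iterations):
        # h_i = exp(b'x_i)
        h = [math.exp(sum(b * v for b, v in zip(beta, x))) for x in xs]
        # Gradient: sum_i (y_i - h_i) x_i
        score = [sum((y - hi) * x[j] for x, y, hi in zip(xs, ys, h))
                 for j in range(k)]
        # Negative Hessian: sum_i h_i x_i x_i'
        info = [[sum(hi * x[j] * x[l] for x, hi in zip(xs, h))
                 for l in range(k)] for j in range(k)]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Toy data: a constant plus one binary predictor (hypothetical values).
xs = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
ys = [1, 3, 4, 8]
beta = fit_poisson(xs, ys)
```

In this toy example the fitted means equal the group averages (2 and 6), so the estimates are ln 2 for the constant and ln 3 for the predictor, which gives a simple check on the iteration.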

III. An Application

The application is spatial allocation of automobile refinishing emissions in the Sacramento modeling region in the State of California. The study area consists of three counties: Sacramento, Solano, and Yolo. Automobile refinishing is one of the emission inventory source categories defined by the California Air Resources Board (CARB, 1995). The main pollutants from auto refinishing are total organic gas (TOG) emissions, which are due to the solvents contained in auto refinishing products such as refinish paints, enamels and lacquers, and refinish primers and undercoatings. The spatial distribution of auto refinishing emissions in this area, as well as the development of new methodology for area source emission allocation, is of interest to the state and local agencies working on the emission inventory and attainment plan.

Auto refinishing emissions have been traditionally estimated at the national level and then allocated to the states and counties. In this study, we use the GIS methods discussed in the previous section to allocate the emission estimates to sub-county units, specifically, 4 km by 4 km model grid squares. Furthermore, by integrated use of GIS and statistical analysis tools, we build a predictive statistical model for emission allocation. Specifically, Poisson regression models are estimated by correlating the emission activity with data that are widely available such as urban land use, major highways, and population density.

3.1 Data Collection and Database Development

To collect data for the study, we consulted and searched a variety of data sources including federal, state, and local government agencies, libraries, the Internet and online databases, and data vendors. The names and addresses of auto refinishing businesses are listed in local phone books. This information is also available from commercial business list providers and the online Yellow Pages. We used the following data as spatial surrogates for allocation: urban land use, major roads, and population and employment densities. These data were selected because they are widely available and can be routinely obtained throughout the state. The land use data were obtained from the California Department of Water Resources (DWR). The DWR data contain fairly detailed land use information and are based on recent land use surveys. The survey years for the three counties range from 1989 to 1994. The land use data came in different formats. The data for Solano and Yolo counties were in AutoCAD format, while the data for Sacramento county were in Intergraph format. The data were converted into ArcInfo format using a set of AML programs.

The street and highway GIS data were created from the 1994 TIGER files, which are available from the U.S. Census Bureau and could be downloaded directly from the World Wide Web. We converted the TIGER files into ArcInfo format and built them into address coverages. The 1990 population census data were from two sources. The census tract boundary files in ArcInfo format and with population density attributes were obtained from the Teale Data Center in Sacramento. The employment data were copied from the 1990 census (Summary Tape File 3) CD-ROM, which is available in many public libraries. The employment data were added as attributes to the INFO files of the census tract coverages.

Since the spatial data sets obtained from different sources were in different map projections, it was necessary to reproject them into a common coordinate system. We selected UTM (Zone 10) as the standard projection and reprojected coverages that were in non-UTM systems. The map layers were overlaid on the computer screen and visually compared to find any discrepancies. The layers were edited, shifted, or reprojected if necessary. The locations of auto refinishing facilities were geocoded to create a point data layer. This was accomplished by matching the addresses of the facilities with the ones in the street data layer. Since the TIGER files contained incomplete addresses and errors, detailed street maps of the local areas were used for reference and verification.
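The choice of UTM Zone 10 follows from the longitude of the study area, since UTM zones are 6 degrees of longitude wide and numbered eastward from 180 degrees West. The short Python sketch below applies this standard rule; the function name is hypothetical and the longitudes are approximate values used only for illustration.

```python
import math


def utm_zone(longitude_deg):
    """Return the UTM zone number for a longitude in [-180, 180) degrees."""
    return int(math.floor((longitude_deg + 180.0) / 6.0)) + 1


# Approximate longitudes of places in the study area (illustrative values).
zones = {name: utm_zone(lon) for name, lon in
         [("Sacramento", -121.5), ("Vallejo", -122.3), ("Woodland", -121.8)]}
```

All three locations fall between 126 and 120 degrees West, the range covered by Zone 10.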

3.2 Spatial Overlay and Allocation

To transfer the attribute values in point, line, and polygon layers to grid cells for allocation and modeling, spatial overlay operations were carried out. The overlaying grid cell layer was created and built into a polygon coverage. To count the number of auto refinishing facilities that fall within a cell, a point in polygon operation was performed. A line in polygon operation was needed to compute the length of highways within a grid cell. To obtain population density and employment density values for the grid cells, polygon overlays were necessary. In this operation, the census tract polygons were first broken into smaller areas that fell completely within overlaying grid cells, and then the areas within each cell were aggregated. The density value for each cell was calculated using the areal weighting method described in Section 2.2. We wrote a set of AML programs to automate the computation processes.

The spatial allocation factor for each grid cell was computed using the discrete allocation method presented in Section 2.2 because of its simplicity. Map 1 shows the spatial distribution of auto refinishing facilities in the study area in relation to urban land use and major highways. Map 2 displays the results of the allocation. The maps clearly show that the emission activities are clustered in urban areas and along major highways. Most of the auto refinishing businesses are located in the Greater Sacramento area, which is the dominant urban center in the region. In Solano county, the emission sources are concentrated in the cities along Interstate 80: Vallejo, Fairfield, and Vacaville. Yolo county is the most sparsely populated area in the region and has far fewer auto refinishing facilities, most of which are found in Woodland. Map 2 shows that auto refinishing activities are not only concentrated in urban areas but also distributed unevenly within the urban areas.





Map 1 - Spatial Distribution of Auto Refinishing Facilities in the Study Area (Sacramento Modeling Region)

Map 2 - Spatial Allocation of Auto Refinishing Emissions for the Study Area (Sacramento Modeling Region)



3.3 Poisson Regression Estimation

The objective was to predict auto refinishing activity based on data that are widely available and can be updated on a regular basis, facilitating auto refinishing emission inventory. The Poisson regression model was used to estimate the number of auto refinishing facilities in a grid cell. The variables used in the Poisson regression model are defined in Table 1.

Using the data on these variables collected from the study area, we estimated a number of Poisson regression models. The preliminary results suggested that the model with all five predictors fit the data best. The estimation results of the model are presented in Table 2. All coefficients of the model are statistically significantly different from zero. The coefficients for URBAN, HWY, PDEN, and RDEN have positive signs, indicating positive relationships between these variables and the dependent variable, while the coefficient for the interaction term PR is negative. The positive signs are consistent with our prior expectation that auto refinishing activities tend to occur in areas with a higher percentage of urban land use, higher population density and retail employment density, and better highway accessibility. In other words, the model results reflect the fact that auto refinishing businesses tend to be located near where the demand is. Using the model, we are able to predict the new level of auto refinishing activity in the grid cells given new data on these predictor variables.

Table 1 - Definition of Variables

-------------------------------------------------------------------------------------------------------

Variable     Definition

Dependent Variable

FACILITY     Number of auto refinishing facilities

Predictors

URBAN        Percentage of urban land use

HWY          Miles of highway

PDEN         Population density (1000 persons/square mile)

RDEN         Retail employment density (1000 employees/square mile)

PR           PDEN times RDEN

-------------------------------------------------------------------------------------------------------



Table 2 - Poisson Regression Estimation Results

-------------------------------------------------------------------------------------------------------

Variable     Coefficient     Std. Err.     Asy. t ratio     P>|t|

URBAN             0.0407        0.0048            8.567     0.000

HWY               0.1467        0.0203            7.214     0.000

PDEN              0.4091        0.1098            3.727     0.000

RDEN              3.1394        1.3505            2.325     0.021

PR               -1.2994        0.2231           -5.825     0.000

Cons_            -2.7181        0.1528          -17.789     0.000

Summary Statistics

Number of observations = 545

Model chi2(5) = 1198

Pseudo R2 = 0.61

-------------------------------------------------------------------------------------------------------
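To illustrate how the estimated model can be used for prediction, the Python sketch below applies the Table 2 coefficients in the formula for the expected count, exp(b'x). The coefficients are taken from Table 2; the function name and the predictor values for the example cell are hypothetical and are not data from the study.

```python
import math

# Coefficients from Table 2; PR is the product of PDEN and RDEN.
COEF = {"URBAN": 0.0407, "HWY": 0.1467, "PDEN": 0.4091,
        "RDEN": 3.1394, "PR": -1.2994, "CONS": -2.7181}


def predict_facilities(urban_pct, hwy_miles, pden, rden):
    """Expected number of auto refinishing facilities in a grid cell.

    urban_pct: percentage of urban land use
    hwy_miles: miles of highway
    pden:      population density (1000 persons/square mile)
    rden:      retail employment density (1000 employees/square mile)
    """
    linear = (COEF["CONS"]
              + COEF["URBAN"] * urban_pct
              + COEF["HWY"] * hwy_miles
              + COEF["PDEN"] * pden
              + COEF["RDEN"] * rden
              + COEF["PR"] * pden * rden)
    return math.exp(linear)


# Hypothetical cell: 50 percent urban, 2 highway miles, PDEN 3.0, RDEN 0.1.
expected = predict_facilities(50.0, 2.0, 3.0, 0.1)

# A cell with all predictors equal to zero depends on the constant alone.
baseline = predict_facilities(0.0, 0.0, 0.0, 0.0)
```

Normalizing the predicted counts over all cells in a county would give allocation weights analogous to those of the discrete allocation method.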

IV. Conclusions

In this paper we have described a new methodology for allocating certain area source emission estimates to sub-county units (grid cells). These emission sources are individual points in space that each emit relatively small quantities of pollutants but collectively result in significant emissions. Since the activities associated with the emissions can be referenced to specific points in space, a GIS with powerful geocoding and spatial overlay capabilities provides an ideal tool for allocating these emissions. The combination of GIS and statistical modeling makes it possible to correlate emission activity with data on variables related to the activity and to predict the change of the activity over time and space.

In implementation and application, data availability is often an important consideration. One of the major constraints in emission modeling, and indeed in many GIS applications, is the lack of specialized data for study purposes. The application presented in this paper is designed to allocate emission estimates to grid cells based on widely available data, so that it can be easily applied to other regions. It has the merit of simplicity and low cost. However, the accuracy of the emission estimates is uncertain. Emissions of small areas can be directly estimated from the individual facilities that produce them if emission-specific information at the facility level is available. This bottom-up approach might be able to produce better estimates of area source emissions, and can provide data to evaluate the accuracy of the spatial allocation estimates. The methodology described in this paper can readily be applied to this approach. The collection of detailed information at the facility level, however, involves costly surveys and requires the cooperation of the managers of the facilities.

A GIS is an indispensable tool for air quality analysis and management. In addition to its powerful capabilities in spatial database development, spatial data processing, management, and modeling, it provides visualization and map-making tools that can be used to effectively present the spatial variability of emissions. The use of GIS can improve not only the analytical capabilities for emission inventory and air pollution management but also our ability to communicate work results and research findings to decision makers and the general public.

Acknowledgments

This paper is based on work supported by the California Air Resources Board. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the California Air Resources Board. Mention of trade names or commercial products does not constitute endorsement or recommendation for use. The authors wish to thank Carmen Mayo, Pingkuan Di and Rhonda Boughtin for assistance in preparing the data for the application.



References

Bachman, W., W. Sarasua, R. Guensler, 1996. A GIS Framework for Mobile Emissions Modeling. Washington D.C.: Transportation Research Board.

Bachman, W., et al., 1996. Integrating Travel Demand Forecasting Models with GIS to Estimate Hot Stabilized Mobile Source Emissions. In: Proceedings of the 1996 Geographic Information Systems for Transportation (GIS-T) Symposium. Washington, D.C.: American Association of State Highway and Transportation Officials.

California Air Resources Board, 1995. Emission Inventory Procedural Manual, Volume III, Methods for Assessing Area Source Emissions. California Environmental Protection Agency.

Environmental Systems Research Institute, 1994. Surface Modeling with TIN. Redlands, California.

Flowerdew, R., and M. Green, 1994. Areal Interpolation and Types of Data. In S. Fotheringham and P. Rogerson (eds.), Spatial Analysis and GIS. Bristol, PA: Taylor & Francis Inc.

Goodchild, M.F., et al., eds, 1996. GIS and Environmental Modeling: Progress and Research Issues. Fort Collins, CO: GIS World Books.

Goodchild, M. F., B. O. Parks, and L. T. Steyaert, 1993. Environmental Modeling with GIS. New York: Oxford University Press.

Green, M., 1990. Statistical Methods for Areal Interpolation, In J. Harts, H. F. L. Ottens and H. J. Scholten (eds.), EGIS '90: Proceedings of the First European Conference on Geographic Information Systems, EGIS Foundation, Utrecht, The Netherlands, 1, pp. 392-399.

Hallmark, S., W. O'Neill, 1995. Integrating Air Quality Analysis and GIS-T. In: Proceedings of the 1995 Geographic Information Systems for Transportation (GIS-T) Symposium. Washington, D.C.: American Association of State Highway and Transportation Officials.

Jensen, J. J., S. K. Sathisan, 1996. GIS Applications for Linking Travel Demand Modeling and Air Quality Analysis. In: Proceedings of the 1996 Geographic Information Systems for Transportation (GIS-T) Symposium. Washington, D.C.: American Association of State Highway and Transportation Officials.

Peuker, T. K., R. J. Fowler, J. J. Little, D. M. Mark, 1978. The triangulated irregular network. In: Proceedings of the DTM Symposium, American Society of Photogrammetry-American Congress on Survey and Mapping.

Shimp, D. R. and S. G. Campbell, 1995. Using a Geographic Information System to Evaluate PM10 Area Source Emissions. California Air Resources Board, P.O. Box 2815, Sacramento, CA 95812.

Sarasua, W., et al., 1996. Using a Dynamic GIS to Visualize and Analyze Mobile Source Emissions. In: Proceedings of the 1996 Geographic Information Systems for Transportation (GIS-T) Symposium. Washington, D.C.: American Association of State Highway and Transportation Officials.

Souleyrette, R. R., et al., 1992. GIS for Transportation and Air Quality Analysis. In: Transportation Planning and Air Quality. New York, N.Y.: American Society of Civil Engineers.

U.S. EPA, 1995. National Conference on Environmental Problem-solving with Geographic Information Systems, September 21-23: Cincinnati, Ohio. U.S. Environmental Protection Agency, Office of Research and Development.


Jian Dai and David M. Rocke
Graduate School of Management
University of California
Davis, CA 95611