Jian Dai and David M. Rocke
Area source emissions of air pollution have been traditionally
estimated at the national level and then allocated to the states
and counties. The spatial variations within a county have been
largely ignored mainly due to the difficulty in collecting, processing,
and analyzing the data at smaller geographic scales. With the
increasing availability of digital data relevant to emission studies
and the special capabilities of a GIS to manage spatially referenced
data, the spatial variations of emissions within counties can
now be analyzed. This paper reports a new method for allocation
of certain area source emissions to sub-county units using GIS
methods. The approach consists of three main steps. The first
step is development of a spatial data base. This includes identification
of the emission producing facilities of interest in the study
area, geo-coding the locations of the facilities by address-matching
with the TIGER/Line coverages, and development of other geo-referenced
data related to the emission producing activities. The second
step is disaggregation of larger areas such as counties into smaller
zones. Conventional model grid cells are used as the unit for
allocation and modeling. The data coverages are integrated with
the model grid cell coverage for analysis. In step 3, county-wide
emission estimates are allocated to the grid cells based on attributes
of interest, and spatial variations are evaluated. The paper also
describes the methods for developing predictive statistical models
for estimating emission related activities based on easily available
data. The procedures and methods discussed in the paper are illustrated
in an application, namely automobile refinishing emissions in
the Sacramento modeling region in the State of California.
Area source emissions of air pollution have been traditionally
estimated at the national level and then allocated to the states
and counties. The spatial variations within a county have been
largely ignored mainly due to the difficulty in collecting, processing,
and analyzing data at smaller geographic scales. With the increasing
availability of digital data relevant to emission studies and
the special capabilities of a GIS to manage spatially referenced
data, spatial variations of emissions within counties can now
be analyzed. This paper reports a new method for allocation of
certain area source emissions to sub-county units using GIS methods.
Geographic information systems have been widely used in environmental
modeling and analysis including studies of the hydrological systems,
the atmospheric systems, the land surface and subsurface processes,
biological and ecological modeling, and risk and hazards (Goodchild
et al., 1996; U.S. EPA, 1995; Goodchild, Parks and Steyaert, 1993).
Applications of GIS to air pollution problems include air quality
analysis (Jensen and Sathisan, 1996; Hallmark and O'Neill, 1995;
Souleyrette et al., 1992), mobile source emission modeling (Bachman
et al., 1996a, 1996b; Sarasua et al., 1996), and estimation of
PM10 (particulate matter smaller than 10 microns in
aerodynamic diameter) area source emissions (Shimp and Campbell,
1995). Using an agricultural tilling emission inventory as an example,
Shimp and Campbell have shown that a GIS is a valuable tool
for estimating and analyzing spatially distributed emissions.
In emission inventory, area sources generally refer to
those sources that individually emit relatively small quantities
of air pollutants but collectively result in significant emissions
(CARB, 1995). This paper proposes a GIS-based approach to spatial
allocation of emission estimates for those area sources that
can individually be referenced to specific points in space (e.g.
automobile refinishing activity). The approach consists of three
main steps. The first step is development of a spatial database.
This includes identifying the emission-producing facilities of
interest in the study area, geocoding the locations of the facilities
by address-matching with the TIGER/Line coverages, and developing
other geo-referenced data related to the activities that generate
the emissions. The second step is disaggregation of larger areas
such as counties into smaller zones. Conventional model grid cells
are used as the unit for allocation and modeling. The data coverages
are integrated with the model grid cell coverage for modeling.
In step three, county-wide emission estimates are allocated to
the model grid cells based on attributes of interest, and spatial
variations are evaluated. Often, predictive models can be developed
by correlating emission-producing activity with relevant data.
The paper describes the methods for developing predictive statistical
models for emission allocation. The methods and procedures discussed
in this paper are illustrated in an application, namely automobile
refinishing emissions in the Sacramento modeling region in the
State of California.
Spatial analysis and allocation of area source emissions involve
complicated spatial data manipulations. Geocoding tools are needed
for geo-referencing emission activities, and spatial overlay tools
are needed for disaggregating and aggregating data layers to model
grid cells. Statistical spatial modeling tools are needed for
estimating and predicting emission activity based on a sample
of data.
The basic spatial objects in a GIS are points, lines, and polygons
(areas). In the spatial database for the emission analysis, an
industrial facility that emits pollutants of interest is represented
by a point entity consisting of a single XY coordinate pair and
a number of attributes. The object for emission allocation is
an area entity represented by a grid square.
The locations of the emission-producing facilities can be identified
and geocoded by their addresses. The addresses of the facilities
can be found from a variety of sources such as phone books, commercial
business lists, government records, or online databases. Geocoding
is a mechanism for building a database relationship between addresses
and spatial features. In geocoding, a GIS compares the address
of the facility against the street coverage that has address attributes.
When a match is found, a geographic coordinate is calculated for
the address, and a spatial point is created in the database. If
the address is not matched, the GIS displays diagnostic messages
that explain why a match was not found, allows the address in question
to be edited, and restarts the matching process. Specifically,
the ArcInfo GIS provides the following geocoding capabilities: creating
address coverages (or converting TIGER files to address coverages),
building and maintaining INFO files containing a list of addresses
to be matched, matching the list to the address coverage to create
points (or matching addresses interactively to specified locations),
processing unmatched addresses, and maintaining address coverages.
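The matching step can be illustrated with a minimal Python sketch. It is not the ArcInfo procedure: the StreetSegment record layout, the geocode function, and the sample segment are hypothetical, and the point is placed by simple linear interpolation along the segment's address range, which is an assumption about how a coordinate is calculated for a matched address.

```python
from dataclasses import dataclass


@dataclass
class StreetSegment:
    """One street segment with an address range (hypothetical record layout)."""
    name: str
    from_addr: int  # house number at the start node
    to_addr: int    # house number at the end node
    x0: float
    y0: float
    x1: float
    y1: float


def geocode(house_number, street, segments):
    """Return an (x, y) point interpolated along the matching segment, or None."""
    for seg in segments:
        low, high = sorted((seg.from_addr, seg.to_addr))
        if seg.name.lower() == street.lower() and low <= house_number <= high:
            span = seg.to_addr - seg.from_addr
            # Fraction of the way along the segment; midpoint if the range is a single number.
            t = 0.5 if span == 0 else (house_number - seg.from_addr) / span
            return (seg.x0 + t * (seg.x1 - seg.x0),
                    seg.y0 + t * (seg.y1 - seg.y0))
    return None  # unmatched: the address needs to be edited and matched again


# Illustrative data only.
segments = [StreetSegment("Main St", 100, 198, 0.0, 0.0, 1000.0, 0.0)]
matched = geocode(149, "main st", segments)
unmatched = geocode(500, "Main St", segments)
```

Here the first address falls inside the segment's range and receives a coordinate, while the second returns None and would go to the unmatched list for editing.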
To create the model grid coverage, the cell size and the extent
of the coverage must be decided. The smaller the cell size, the finer
the spatial resolution for allocation and modeling. The map extent
should be large enough to cover the whole study area. The coordinates
for the intersections of the grid lines can be easily computed.
Using the coordinates, a GIS creates the grid square coverage
and builds it into polygon topology. Other data sets often need
to be developed or utilized for analysis and modeling. For example,
it may be useful to show where the emission sources are located
in relation to land uses, road network, and population distribution.
In the application presented in Section 3, we select urban land
use, miles of highways, population density, and retail employment
density as the spatial surrogates for allocation of automobile
refinishing emissions. In the database for the application, population
and employment data are attributes of the census tract boundary
layer which has polygon topology. The land use data is a polygon
layer, and the highway layer is a line layer.
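As a rough illustration of how the grid square coverage can be generated from an extent and a cell size, the following Python sketch computes the corner coordinates of each cell. The function name build_grid, the row-major cell numbering, and the sample extent are assumptions made for the example; a GIS would additionally build polygon topology from these rings.

```python
import math


def build_grid(xmin, ymin, xmax, ymax, cell_size):
    """Return a list of (cell_id, ring) pairs; each ring is a closed list of corner points."""
    # Round up so that the grid covers the whole extent.
    ncols = math.ceil((xmax - xmin) / cell_size)
    nrows = math.ceil((ymax - ymin) / cell_size)
    cells = []
    for row in range(nrows):
        for col in range(ncols):
            x0 = xmin + col * cell_size
            y0 = ymin + row * cell_size
            ring = [(x0, y0),
                    (x0 + cell_size, y0),
                    (x0 + cell_size, y0 + cell_size),
                    (x0, y0 + cell_size),
                    (x0, y0)]
            cells.append((row * ncols + col, ring))
    return cells


# Illustrative extent in metres with 4 km cells.
grid = build_grid(0.0, 0.0, 10000.0, 6000.0, 4000.0)
```

Because the number of rows and columns is rounded up, the outer cells may extend beyond the requested extent, which keeps the whole study area covered.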
A spatial database for emission analysis consists of a variety of elements in different spatial units and at a variety of resolutions. For example, the facilities are spatial points, while the units for emission allocation are square cells. The demographic variables can be based on census tracts, census blocks, or postal ZIP codes. It is a common problem in spatial analysis that the spatial units for which data are available are not necessarily the ones that the analysis or modeling requires. A solution to the problem is spatial data overlay. A GIS has strong spatial overlay capabilities and is ideally suited to deriving data in the target unit given the relevant source data. Three types of spatial overlay operations are usually used in emission analysis and modeling:
(1) point in polygon operation - count the number of emission points that fall within a cell;
(2) polygon on polygon operation - apportion a polygon's attribute value to a cell (e.g. population density); and
(3) line in polygon operation - compute the length of a linear feature (e.g. a highway) within a cell.
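For a regular grid of square cells, the point and line operations can be sketched in a few lines of Python. This is an illustration under simplifying assumptions rather than the overlay routines of a GIS: cells are axis-aligned squares identified by (column, row), a highway is treated as a single straight segment, and the segment is clipped with the Liang-Barsky method, which is a technique chosen for this example. All function names are hypothetical.

```python
import math
from collections import Counter


def cell_index(x, y, xmin, ymin, cell_size):
    """Column and row of the square cell that contains point (x, y)."""
    return (int((x - xmin) // cell_size), int((y - ymin) // cell_size))


def count_points(points, xmin, ymin, cell_size):
    """Point in polygon for a regular grid: number of points per (col, row)."""
    return Counter(cell_index(x, y, xmin, ymin, cell_size) for x, y in points)


def clipped_length(p, q, rect):
    """Line in polygon: length of segment p-q inside rect = (x0, y0, x1, y1)."""
    (px, py), (qx, qy) = p, q
    x0, y0, x1, y1 = rect
    dx, dy = qx - px, qy - py
    t_in, t_out = 0.0, 1.0  # parameter range of the segment still inside the cell
    for d, low, high, start in ((dx, x0, x1, px), (dy, y0, y1, py)):
        if d == 0:
            if start < low or start > high:
                return 0.0  # parallel to these edges and outside the cell
            continue
        t_a, t_b = (low - start) / d, (high - start) / d
        t_in = max(t_in, min(t_a, t_b))
        t_out = min(t_out, max(t_a, t_b))
    if t_in >= t_out:
        return 0.0
    return (t_out - t_in) * math.hypot(dx, dy)


# Illustrative data: three facilities and one highway segment, 4 km cells.
counts = count_points([(1000, 1000), (3000, 500), (5000, 1000)], 0, 0, 4000)
highway_in_cell = clipped_length((2000, 2000), (6000, 2000), (0, 0, 4000, 4000))
```

A real highway layer consists of polylines, so the length within a cell would be the sum of clipped_length over all of its segments.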
A technical issue involved in polygon overlays is known as the
areal weighting problem. Since the boundaries of the source zone
and the target zone usually do not coincide, one must weight the
source zone values according to the area of the target zone they
make up. The method of areal weighting is briefly described below.
Let V be the variable of interest, S be the source zone, T be the target zone, and A be the area of a zone. In the example of deriving population density for the model grid cells, V is the population density variable, Zone S may be a census tract, and Zone T is a grid cell. Where S intersects T, their boundaries form a zone of intersection ST. The problem is to find the value of V for the target zone or the intersection zone. The computation depends on whether V is measured as an "extensive" or an "intensive" variable (Goodchild and Lam, 1980). The variable V is extensive if its value for a target zone is equal to the sum of its values for the intersection zones. V is intensive if its value for a target zone is a weighted average of its values for the intersection zones. V is usually considered to be extensive if it is a count (e.g. number of employees), and intensive if it is a proportion, percentage, or rate (e.g. percentage of urban land).
Assuming that V is evenly distributed within each source zone, the values of V are computed as follows. If V is an extensive variable,
$$V_t = \sum_s V_s \, A_{st} / A_s$$
If V is an intensive variable,
$$V_t = \sum_s V_s \, A_{st} / A_t$$
where the summations are over the source zones s, and $A_s$, $A_t$, and $A_{st}$ are the areas of the source zone, the target zone, and their intersection. The assumption
that the variable of interest is uniformly distributed over the
source zone is not always plausible. For example, there might
be a lake in the zone. In those cases, the methods of areal interpolation
using ancillary data (Flowerdew and Green, 1994; Green, 1990)
can be used which take into account other relevant information
available about the source zones.
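The two areal weighting formulas can be checked on a small worked example in Python. To keep the sketch self-contained, zones are assumed to be axis-aligned rectangles (x0, y0, x1, y1) so that intersection areas are easy to compute; real census tracts are irregular polygons and would need a GIS overlay. The function names and the numbers are illustrative only.

```python
def rect_area(r):
    """Area of rectangle r = (x0, y0, x1, y1); zero if the rectangle is empty."""
    x0, y0, x1, y1 = r
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def intersection_area(a, b):
    """Area A_st of the overlap between rectangles a and b."""
    return rect_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))


def extensive_value(target, sources):
    """V_t = sum over s of V_s * A_st / A_s (counts, e.g. number of employees)."""
    return sum(v * intersection_area(target, zone) / rect_area(zone)
               for zone, v in sources)


def intensive_value(target, sources):
    """V_t = sum over s of V_s * A_st / A_t (rates, e.g. population density)."""
    return sum(v * intersection_area(target, zone) / rect_area(target)
               for zone, v in sources)


# One 4 x 4 grid cell overlapped by two source zones.
cell = (0.0, 0.0, 4.0, 4.0)
zone_a = (0.0, 0.0, 2.0, 4.0)  # lies entirely inside the cell
zone_b = (2.0, 0.0, 6.0, 4.0)  # half of it lies inside the cell

employees = extensive_value(cell, [(zone_a, 100.0), (zone_b, 300.0)])
density = intensive_value(cell, [(zone_a, 10.0), (zone_b, 30.0)])
```

In this example the cell receives all 100 employees of the first zone and half of the 300 employees of the second, while its density is the area-weighted average of the two source densities.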
Spatial allocation of emissions or emission activities can be done by overlaying the facility location layer with the grid cell layer and aggregating the facility points in each cell. The facility location layer can have a number of attributes to be used for allocation. Two methods of spatial allocation are considered here. The first method may be called discrete allocation. Using this method, one first overlays the facility point coverage on the grid cell coverage and then computes for each cell the fraction (weight) for allocation. Let $a_j$ be the value of the attribute of interest for facility point j (j = 1, ..., M), and $W_i$ be the weight for cell i (i = 1, ..., N); then
$$W_i = C_i / T$$
where $C_i = \sum_{j \in i} a_j$ is the sum of the attribute values of the facility points that fall within cell i, and $T = \sum_{i=1}^{N} C_i$.
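A minimal Python sketch of the discrete allocation weights follows. It assumes that each facility has already been assigned to a cell by the point in polygon overlay and that all facilities listed belong to the county whose total is being allocated; the function names, the attribute values (here simply 1 per facility), and the county total are hypothetical.

```python
from collections import defaultdict


def allocation_weights(facilities):
    """facilities: iterable of (cell_id, a_j). Returns {cell_id: W_i} with W_i = C_i / T."""
    cell_totals = defaultdict(float)
    for cell_id, a_j in facilities:
        cell_totals[cell_id] += a_j      # C_i: sum of a_j over the facilities in cell i
    total = sum(cell_totals.values())    # T: sum of C_i over all cells
    return {cell_id: c_i / total for cell_id, c_i in cell_totals.items()}


def allocate(county_emissions, weights):
    """Split a county-wide emission estimate among the cells in proportion to W_i."""
    return {cell_id: county_emissions * w for cell_id, w in weights.items()}


# Four facilities in three cells; a_j = 1 means each facility counts equally.
facilities = [(0, 1.0), (0, 1.0), (1, 1.0), (3, 1.0)]
weights = allocation_weights(facilities)
cell_emissions = allocate(200.0, weights)
```

Cells without facilities receive no weight and are simply absent from the result. Using another attribute for a_j, such as a measure of facility size, changes the weights but not the procedure.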
The discrete allocation method treats the pollution area as a
collection of discrete grid cells. The operation is simple but
does not produce a smooth pollution surface. If the attribute value
used for allocation is measured at the interval or ratio scale,
one may first use the value to interpolate a surface, and then
overlay the surface on the grid cell layer to estimate the allocation
weight for each cell. This second method may be called
the surface allocation method. The surface of interest can be
generated using the Triangulated Irregular Network (TIN) data
model (Peuker et al., 1978; Esri, 1994). The TIN is a surface
model that uses a sheet of continuous, connected triangular facets
based on a Delaunay triangulation of irregularly spaced sample
points. Using either method, one transforms point objects (emission-producing
facilities) into areal units (emission allocation weights for
grid cells).
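The surface allocation idea can be sketched in Python as follows. Several simplifications are assumed: the triangulation is supplied by hand rather than computed as a Delaunay triangulation, the surface within each facet is linear (evaluated with barycentric coordinates), and each cell is represented by the surface value at its center instead of a full overlay of the surface with the cell. The function names and the sample values are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of point p with respect to triangle a, b, c."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    l2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return l1, l2, 1.0 - l1 - l2


def surface_value(p, triangles):
    """Linear interpolation on the facet containing p; 0.0 outside the surface."""
    for (a, b, c), (za, zb, zc) in triangles:
        l1, l2, l3 = barycentric(p, a, b, c)
        if min(l1, l2, l3) >= -1e-12:   # p lies inside or on the edge of this facet
            return l1 * za + l2 * zb + l3 * zc
    return 0.0


def surface_weights(cell_centers, triangles):
    """Allocation weight per cell: surface value at the cell center, normalized to sum to 1."""
    values = {cid: surface_value(p, triangles) for cid, p in cell_centers.items()}
    total = sum(values.values())
    return {cid: v / total for cid, v in values.items()}


# Two facets covering an 8 x 8 square, with attribute values at the four corners.
triangles = [(((0, 0), (8, 0), (8, 8)), (6.0, 2.0, 0.0)),
             (((0, 0), (8, 8), (0, 8)), (6.0, 0.0, 2.0))]
centers = {0: (2, 2), 1: (6, 2), 2: (6, 6), 3: (2, 6)}
weights = surface_weights(centers, triangles)
```

Unlike the discrete method, a cell without any facility can receive a nonzero weight here, because the interpolated surface varies continuously between sample points.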
Statistical methods can be used to analyze and predict the spatial
variations of pollution. Here, the purpose of building statistical
models is to predict the spatial distributions of the emission-producing
activity. This is done by correlating the activity with relevant
data. The selection of a model depends on how the variable to be
predicted is measured and on the availability of data. A simple predictive
model can be made by assuming that the number of emission locations
in a cell follows the Poisson distribution. The Poisson model is a spatial point
process model and has been widely used for spatial analysis.
Let $y_i$ be the number of emission-producing facilities
in grid cell i. The model assumes that each $y_i$
is drawn from a Poisson distribution with parameter $h_i$,
which is related to the regressors $x_i$. The
Poisson model is
$$\Pr(Y_i = y_i) = \frac{\exp(-h_i)\, h_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \ldots$$
The most common formulation for $h_i$ is
$$\ln h_i = b' x_i$$
It can be easily shown that
$$E[y_i \mid x_i] = \mathrm{Var}[y_i \mid x_i] = h_i = \exp(b' x_i)$$
where b is a vector of regression coefficients and $x_i$
is the vector of explanatory variables. Thus, the Poisson model is a nonlinear
regression. Denote $b^*$ as the estimated value
of b and $y_i^*$ as the prediction;
then
$$y_i^* = \exp(b^{*\prime} x_i)$$
The parameters of the Poisson regression model can be estimated
using the maximum likelihood method. The log-likelihood function
is
$$\ln L = \sum_i \left( -h_i + y_i\, b' x_i - \ln y_i! \right)$$
The likelihood equations are
$$\frac{\partial \ln L}{\partial b} = \sum_i \left( -h_i + y_i \right) x_i = 0$$
The Hessian is
$$\frac{\partial^2 \ln L}{\partial b\, \partial b'} = -\sum_i h_i\, x_i x_i'$$
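The likelihood equations and the Hessian above are what a Newton-Raphson iteration needs. The following pure-Python sketch is one way to carry out the estimation and is not necessarily the software used for the results reported later. It assumes that the first element of each $x_i$ is 1 (the constant), starts from the constant-only fit, and uses a toy data set constructed so that the maximum likelihood estimate is known in advance; the function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson(xs, ys, max_iter=50, tol=1e-10):
    """Newton-Raphson for ln h_i = b'x_i; column 0 of xs is assumed to be the constant 1."""
    k = len(xs[0])
    b = [0.0] * k
    b[0] = math.log(sum(ys) / len(ys))  # start at the constant-only fit
    for _ in range(max_iter):
        h = [math.exp(sum(bj * xj for bj, xj in zip(b, x))) for x in xs]
        # Gradient: sum of (y_i - h_i) x_i
        grad = [sum((y - hi) * x[j] for x, y, hi in zip(xs, ys, h)) for j in range(k)]
        # Negative Hessian: sum of h_i x_i x_i'
        info = [[sum(hi * x[j] * x[l] for x, hi in zip(xs, h)) for l in range(k)]
                for j in range(k)]
        step = solve(info, grad)        # Newton step: b_new = b + info^{-1} grad
        b = [bj + sj for bj, sj in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b


# Toy data with y = 2**x, so the exact estimate is b = (0, ln 2).
xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
ys = [1.0, 2.0, 4.0, 8.0]
b_hat = fit_poisson(xs, ys)
```

Because the Hessian is negative definite whenever the regressors are not collinear, the log-likelihood is concave and the iteration has a single maximum to find.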
The application is spatial allocation of automobile refinishing
emissions in the Sacramento modeling region in the State of California.
The study area consists of three counties, Sacramento, Solano,
and Yolo. Automobile refinishing is one of the emission inventory
source categories defined by the California Air Resources Board (CARB,
1995). The main pollutants from auto refinishing are total
organic gas (TOG) emissions, which are due to the solvents contained
in auto refinishing products such as refinish paints, enamels
and lacquers, refinish primers, and undercoatings. The spatial
distribution of auto refinishing emissions in this area as well
as development of new methodology for area source emission allocation
are of interest to the state and local agencies working on the
emission inventory and attainment plan.
Auto refinishing emissions have been traditionally estimated at
the national level and then allocated to the states and counties.
In this study, we use the GIS methods discussed in the previous
section to allocate the emission estimates to sub-county units,
specifically, 4 km by 4 km model grid squares. Furthermore, by
integrated use of GIS and statistical analysis tools, we build
a predictive statistical model for emission allocation. Specifically,
Poisson regression models are estimated by correlating the emission
activity with data that are widely available such as urban land
use, major highways, and population density.
To collect data for the study, we consulted and searched a variety
of data sources including federal, state, and local government
agencies, libraries, the Internet and online databases, and data
vendors. The names and addresses of auto refinishing businesses
are listed in local phone books. This information is also available
from commercial business list providers and the online Yellow
Pages. We used the following data as spatial surrogates for allocation:
urban land use, major roads, and population and employment densities.
These data were selected because they are widely available and
can be routinely obtained throughout the state. The land use data
were obtained from the California Department of Water Resources
(DWR). The DWR data contain fairly detailed land use information
and are based on recent land use surveys. The survey years
for the three counties range from 1989 to 1994. The land use data
came in different formats. The data for Solano and Yolo counties
were in AutoCAD format, while the data for Sacramento County were
in Intergraph format. The data were converted into ArcInfo format
using a set of AML programs.
The street and highway GIS data were created from the 1994
TIGER files, which are available from the U.S. Census Bureau and
could be downloaded directly from the World Wide Web. We converted
the TIGER files into ArcInfo format and built them into address
coverages. The 1990 census data came from two sources.
The census tract boundary files, in ArcInfo format and with population
density attributes, were obtained from the Teale Data Center in
Sacramento. The employment data were copied from the 1990 census
(Summary Tape File 3) CD-ROM, which is available in many public
libraries. The employment data were added as attributes to the
INFO files of the census tract coverages.
Since the spatial data sets obtained from different sources were
in different map projections, it was necessary to reproject
them into a common coordinate system. We selected UTM
(Zone 10) as the standard projection and reprojected coverages
that were in non-UTM systems. The map layers were overlaid on
the computer screen and visually compared to find any discrepancies.
The layers were edited, shifted, or reprojected if necessary.
The locations of auto refinishing facilities were geo-coded to
create a point data layer. This was accomplished by matching the
addresses of the facilities with the ones in the street data layer.
Since the TIGER files contained incomplete addresses and errors,
detailed street maps of the local areas were used for reference
and verification.
To transfer the attribute values in point, line, and polygon layers
to grid cells for allocation and modeling, spatial overlay operations
were carried out. The overlaying grid cell layer was created and
built into a polygon coverage. To count the number of auto refinishing
facilities that fall within a cell, a point in polygon operation
was performed. A line in polygon operation was needed to compute
the length of highways within a grid cell. To obtain population
density and employment density values for the grid cells, polygon
overlays were necessary. In this operation, the census tract polygons
were first broken into smaller areas that fell completely within
overlaying grid cells, and then the areas within each cell were
aggregated. The density value for each cell was calculated using
the areal weighting method described in Section 2.2. We wrote
a set of AML programs to automate the computation processes.
The spatial allocation factor for each grid cell was computed
using the discrete allocation method presented in Section 2.2
due to its simplicity. Map 1 shows the spatial distribution of
auto refinishing facilities in the study area with relation to
urban land use and major highways. Map 2 displays the results
of the allocation. The maps clearly show that the emission activities
are clustered in urban areas and along major highways. Most of
the auto refinishing businesses are located in the Greater Sacramento
area, which is the dominant urban center in the region. In Solano
County, the emission sources are concentrated in the cities along
Interstate 80: Vallejo, Fairfield, and Vacaville.
Yolo County is the most sparsely populated area in the region
and has far fewer auto refinishing facilities, most of which
are found in Woodland. Map 2 shows that auto refinishing activities
are not only concentrated in urban areas but also distributed
unevenly within the urban areas.
The objective was to predict auto refinishing activity based on
data that are widely available and can be updated on a regular
basis, facilitating auto refinishing emission inventory. The Poisson
regression model was used to estimate the number of auto refinishing
facilities in a grid cell. The variables used in the Poisson regression
model are defined in Table 1.
Using the data on these variables collected from the study area,
we estimated a number of Poisson regression models. The preliminary
results suggested that the model with all five predictors fit
the data best. The estimation results of the model are presented
in Table 2. All coefficients of the model are significantly
different from zero at the 5% level. The coefficients for URBAN, HWY,
PDEN, and RDEN have positive signs, indicating positive relationships
between these variables and the predicted variable; the coefficient for
the interaction term PR is negative. The positive signs are consistent
with our prior expectation that auto refinishing activities tend
to occur in areas with higher percentage of urban land use, higher
population density and retail employment density, and better highway
accessibility. In other words, the model results reflect the fact
that auto refinishing businesses tend to be located near where
the demand is. Using the model, we are able to predict the new
level of auto refinishing activity in the grid cells given new
data on these predictor variables.
Table 1. Variables used in the Poisson regression model
-----------------------------------------------------------------------
Dependent variable
  FACILITY   Number of auto refinishing facilities
Predictors
  URBAN      Percentage of urban land use
  HWY        Miles of highway
  PDEN       Population density (1000 persons per square mile)
  RDEN       Retail employment density (1000 employees per square mile)
  PR         PDEN times RDEN
-----------------------------------------------------------------------
Table 2. Estimation results of the Poisson regression model
-----------------------------------------------------------------------
Variable   Coefficient   Std. Err.   Asy. t ratio   P>|t|
URBAN           0.0407      0.0048          8.567   0.000
HWY             0.1467      0.0203          7.214   0.000
PDEN            0.4091      0.1098          3.727   0.000
RDEN            3.1394      1.3505          2.325   0.021
PR             -1.2994      0.2231         -5.825   0.000
Cons_          -2.7181      0.1528        -17.789   0.000

Summary statistics
  Number of observations = 545
  Model chi2(5)          = 1198
  Pseudo R2              = 0.61
-----------------------------------------------------------------------
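To illustrate how the estimated model can be used for prediction, the Python sketch below plugs the coefficients of Table 2 into the relation $y_i^* = \exp(b^{*\prime} x_i)$. The coefficients are taken from the table; the function name and the predictor values of the sample cell are hypothetical and serve only to show the arithmetic.

```python
import math

# Coefficients reported in Table 2.
COEF = {
    "URBAN": 0.0407,
    "HWY": 0.1467,
    "PDEN": 0.4091,
    "RDEN": 3.1394,
    "PR": -1.2994,
    "CONS": -2.7181,
}


def predict_facilities(urban, hwy, pden, rden):
    """Expected number of facilities in a cell, exp(b'x), with PR = PDEN * RDEN."""
    linear = (COEF["CONS"]
              + COEF["URBAN"] * urban
              + COEF["HWY"] * hwy
              + COEF["PDEN"] * pden
              + COEF["RDEN"] * rden
              + COEF["PR"] * pden * rden)
    return math.exp(linear)


# Hypothetical cell: 80% urban land, 5 miles of highway,
# 3000 persons and 200 retail employees per square mile.
expected = predict_facilities(urban=80.0, hwy=5.0, pden=3.0, rden=0.2)
```

For this hypothetical cell the linear predictor is about 2.35, which corresponds to roughly ten expected facilities. The densities are entered in thousands per square mile, as defined in Table 1.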
In this paper we have described a new methodology for allocating
certain area source emission estimates to sub-county units (grid
cells). These emission sources are individually points in space
that emit relatively small quantities of pollutants but collectively
result in significant emissions. Since the activities associated
with the emissions can be referenced to specific points in space,
a GIS with powerful geocoding and spatial overlay capabilities
provides an ideal tool for allocating these emissions. The combination
of GIS and statistical modeling makes it possible to correlate
emission activity with data on variables related to the activity
and to predict the change of the activity over time and space.
In implementation and application, data availability is often
an important consideration. One of the major constraints in emission
modeling and indeed in many GIS applications is the lack of specialized
data for study purposes. The application presented in this paper
is designed to allocate emission estimates to grid cells based
on widely available data, so that it can be easily applied to
other regions. It has the merit of simplicity and low cost. However,
the accuracy of the emission estimates is uncertain. Emissions
of small areas can be directly estimated from the individual facilities
that produce them if emission-specific information at the
facility level is available. This bottom-up approach might be
able to produce better estimates of area source emissions, and
can provide data to evaluate the accuracy of the spatial allocation
estimates. The methodology described in this paper can readily
be applied to this approach. The collection of detailed information
at the facility level, however, involves costly surveys and requires
the cooperation of the managers of the facilities.
The GIS is an indispensable tool for air quality analysis and
management. In addition to its powerful capabilities in spatial
database development, spatial data processing, managing and modeling,
it provides visualization and map-making tools that can be used
to effectively present the spatial variability of emissions. The
use of GIS can improve not only the analytical capabilities for
emission inventory and air pollution management but also our ability
to communicate work results and research findings to the decision
makers and the general public.
This paper is based on work supported by the California Air Resources
Board. Any opinions, findings, and conclusions or recommendations
expressed in this material are those of the authors and do not
necessarily reflect the views of the California Air Resources
Board. Mention of trade names or commercial products does not
constitute endorsement or recommendation for use. The authors
wish to thank Carmen Mayo, Pingkuan Di and Rhonda Boughtin for
assistance in preparing the data for application.
Bachman, W., W. Sarasua, and R. Guensler, 1996a. A GIS Framework for
Mobile Emissions Modeling. Washington, D.C.: Transportation Research
Board.
Bachman, W., et al., 1996b. Integrating Travel Demand Forecasting
Models with GIS to Estimate Hot Stabilized Mobile Source Emissions.
In: Geographic Information Systems for Transportation (GIS-T)
Symposium. Proceedings of the 1996 Geographic Information Systems
for Transportation (GIS-T) Symposium. Washington, D. C.: American
Association of State Highway and Transportation Officials.
California Air Resources Board, 1995. Emission Inventory Procedural
Manual, Volume III, Methods for Assessing Area Source Emissions.
California Environmental Protection Agency.
Environmental Systems Research Institute, 1994. Surface Modeling
with TIN. Redlands, California.
Flowerdew, R., and M. Green, 1994. Areal Interpolation and Types
of Data. In S. Fotheringham and P. Rogerson (eds.), Spatial Analysis
and GIS. Bristol, PA: Taylor & Francis Inc.
Goodchild, M.F., et al., eds, 1996. GIS and Environmental Modeling:
Progress and Research Issues. Fort Collins, CO: GIS World Books.
Goodchild, M. F., B. O. Parks, and L. T. Steyaert, 1993. Environmental
Modeling with GIS. New York: Oxford University Press.
Green, M., 1990. Statistical Methods for Areal Interpolation,
In J. Harts, H. F. L. Ottens and H. J. Scholten (eds.), EGIS '90:
Proceedings of the First European Conference on Geographic Information
Systems, EGIS Foundation, Utrecht, The Netherlands, 1, pp. 392-399.
Hallmark, S., and W. O'Neill, 1995. Integrating Air Quality Analysis
and GIS-T. In: Geographic Information Systems for Transportation
(GIS-T) Symposium. Proceedings of the 1995 Geographic Information
Systems for Transportation (GIS-T) Symposium. Washington, D. C.:
American Association of State Highway and Transportation Officials.
Jensen, J. J., and S. K. Sathisan, 1996. GIS Applications for Linking
Travel Demand Modeling and Air Quality Analysis. In: Geographic
Information Systems for Transportation (GIS-T) Symposium. Proceedings
of the 1996 Geographic Information Systems for Transportation
(GIS-T) Symposium. Washington, D. C.: American Association of
State Highway and Transportation Officials.
Peuker, T. K., R. J. Fowler, J. J. Little, D. M. Mark, 1978. The
triangulated irregular network. In: Proceedings of the DTM Symposium,
American Society of Photogrammetry-American Congress on Survey
and Mapping.
Sarasua, W., et al., 1996. Using a Dynamic GIS to Visualize and
Analyze Mobile Source Emissions. In: Geographic Information Systems
for Transportation (GIS-T) Symposium. Proceedings of the 1996
Geographic Information Systems for Transportation (GIS-T) Symposium.
Washington, D. C.: American Association of State Highway and Transportation
Officials.
Shimp, D. R., and S. G. Campbell, 1995. Using a Geographic Information
System to Evaluate PM10 Area Source Emissions. California
Air Resources Board, P.O. Box 2815, Sacramento, CA 95812.
Souleyrette, R. R., et al., 1992. GIS for Transportation and Air
Quality Analysis. In: Transportation Planning and Air Quality.
New York, N.Y.: American Society of Civil Engineers.
U.S. EPA, 1995. National Conference on Environmental Problem-solving
with Geographic Information Systems, September 21-23: Cincinnati,
Ohio. U.S. Environmental Protection Agency, Office of Research
and Development.
Jian Dai and David M. Rocke
Graduate School of Management
University of California
Davis, CA 95611