David Wong and Wing Chong
ArcInfo AND S-PLUS FOR SEGREGATION ANALYSIS
ArcInfo is relatively rich in analytical functions compared
with other GIS. It has built-in modules for advanced spatial
models, such as the spatial interaction model and the
location-allocation model. These models use geographic
information stored in ArcInfo, such as topology, for spatial
analysis and modeling. However, the various types of
geographic information cannot be used flexibly inside
ArcInfo for other spatial analytical techniques and models.
This paper focuses on a segregation study using spatial
indices which cannot be calculated without GIS and the
spatial information stored in ArcInfo. Various types of
spatial information are extracted from ArcInfo and
transferred to S-Plus, a sophisticated statistical package.
Using the data manipulation functions and procedures in
S-Plus, the different types of spatial information are then
combined with attribute data to calculate a family of
spatial segregation indices.
Introduction
One of the major functions of GIS is analysis, in particular
spatial analysis. Most GIS possess general spatial analysis
functions, such as buffering and overlay. Some also regard
spatial query as a spatial analytical function. ArcInfo
includes some advanced spatial analytical tools and models,
such as spatial interaction models and network analysis.
However, the potential of GIS in spatial analysis and
modeling should go beyond rudimentary functions and built-in
spatial models. GIS should facilitate the development of new
spatial tools and models. It is assumed that GIS such as
ArcInfo store spatial information that is pertinent to
spatial modeling and analysis, and hence that GIS can be a
powerful vehicle for developing spatial models. But some
types of spatial information are not easily available in
ArcInfo. In addition, spatial models and analysis usually
require a flexible environment for mathematical
manipulation, which ArcInfo does not provide.
In this paper, we use a segregation study as an example to
demonstrate that certain types of spatial information
crucial to many spatial models and analyses can be extracted
from ArcInfo, but that the mathematical computation still
has to rely on external tools. We chose S-Plus in this
example because it is a powerful environment for
mathematical manipulation and because it has GisLink(TM), a
data bridge between ArcInfo and S-Plus. Thus, data obtained
in ArcInfo can be transferred easily to S-Plus for
mathematical manipulation.
Segregation Indices
Since the introduction of the segregation index, or index of
dissimilarity, D, by Duncan and Duncan (1955), numerous
studies have used the index to reflect the level of
segregation. However, as several geographers have noted
(Morrill 1991, Wong 1993), the index fails to distinguish
among different spatial patterns of population distribution,
and thus does not capture the level of segregation very
well. To overcome this major deficiency in D, Morrill (1991)
proposes a modified version of the index that includes a
spatial interaction component. The modified index, D(adj),
is defined in terms of c_ij, the element in the ith row and
jth column of a binary connectivity matrix, and z_i, the
proportion of minority population in areal unit i. Thus,
z_i - z_j is the difference in the proportion of black
population between zones i and j in the traditional
two-group situation. It also reflects the difference in the
spatial concentration of minority population. The element
c_ij is one when areal units i and j are neighbors, and zero
otherwise.
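
To illustrate how the adjacency term changes the index, the
following Python sketch computes the classical D and D(adj)
for a row of four zones. It assumes the commonly cited forms
D = 1/2 * sum_i |b_i/B - w_i/W| and
D(adj) = D - sum_ij |c_ij (z_i - z_j)| / sum_ij c_ij, where
b_i and w_i are the two group counts in unit i and B and W
are their totals. The function names and the toy data are
hypothetical, and Python is used only for illustration; the
study itself used S-Plus.

```python
def dissimilarity(minority, majority):
    """Classical D, assumed form: 1/2 * sum_i |b_i/B - w_i/W|."""
    total_min = sum(minority)
    total_maj = sum(majority)
    return 0.5 * sum(abs(b / total_min - w / total_maj)
                     for b, w in zip(minority, majority))


def d_adj(minority, majority, c):
    """D(adj), assumed form: D - sum_ij |c_ij (z_i - z_j)| / sum_ij c_ij.

    c is a binary connectivity matrix given as a list of lists.
    Every areal unit is assumed to have a non-zero population.
    """
    # z_i: proportion of minority population in unit i (two-group case)
    z = [b / (b + w) for b, w in zip(minority, majority)]
    n = len(z)
    num = sum(abs(c[i][j] * (z[i] - z[j]))
              for i in range(n) for j in range(n))
    den = sum(c[i][j] for i in range(n) for j in range(n))
    return dissimilarity(minority, majority) - num / den


# Four zones in a row: each zone neighbors the next one (toy data).
line = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]

alternating = d_adj([100, 0, 100, 0], [0, 100, 0, 100], line)
clustered = d_adj([100, 100, 0, 0], [0, 0, 100, 100], line)
```

Both arrangements have D = 1, but under these assumptions
D(adj) is 0 for the alternating pattern and about 0.67 for
the clustered one, which is the distinction between patterns
that the classical index cannot make.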
Wong (1993) further enhances this modified D by including
more spatial information in the index formulation. He
proposes that not just adjacency but also the length of the
common boundary can affect the intensity of spatial
interaction between population groups across a zonal
boundary, and hence the level of segregation. Therefore, he
suggests another index, D(w), in which d_ij is the length of
the common boundary between areal units i and j.
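
The effect of boundary length can be sketched in Python as
follows. The sketch assumes the form
D(w) = D - sum_ij w_ij |z_i - z_j|, with the weight w_ij
taken as d_ij divided by the total of all d_ij. That
normalization is an assumption made here for illustration
(a row-wise normalization is another possibility) and should
be checked against Wong (1993). The function name and the
data are hypothetical.

```python
def d_w(minority, majority, d):
    """Boundary-weighted index, assumed form: D - sum_ij w_ij |z_i - z_j|.

    d[i][j] is the length of the common boundary between units i and j
    (zero when they are not neighbors). Weights are normalized by the
    total boundary length, which is an assumption of this sketch.
    """
    total_min = sum(minority)
    total_maj = sum(majority)
    # Classical dissimilarity index D
    big_d = 0.5 * sum(abs(b / total_min - w / total_maj)
                      for b, w in zip(minority, majority))
    # z_i: proportion of minority population in unit i
    z = [b / (b + w) for b, w in zip(minority, majority)]
    n = len(z)
    total_len = sum(sum(row) for row in d)
    correction = sum(d[i][j] / total_len * abs(z[i] - z[j])
                     for i in range(n) for j in range(n))
    return big_d - correction


minority = [100, 100, 0]
majority = [0, 0, 100]

# Units 1 and 2 differ in composition; only their shared boundary matters.
long_contact = [[0, 1, 0], [1, 0, 9], [0, 9, 0]]
short_contact = [[0, 9, 0], [9, 0, 1], [0, 1, 0]]

index_long = d_w(minority, majority, long_contact)
index_short = d_w(minority, majority, short_contact)
```

With the same adjacency in both cases, the index is lower
when the two contrasting units share the long boundary than
when they share the short one, which is the behavior the
boundary-length weighting is meant to capture.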
Wong further argues that another major factor affecting the
interaction of different population groups across a zonal
boundary is the size and boundary length of the areal unit.
These geometric characteristics reflect individuals'
accessibility to the zonal boundary, and hence their ability
to interact with the population in neighboring areal units.
Therefore, he proposes another version of D, D(s), in which
P_i and A_i are the perimeter and area of unit i,
respectively. However, P_i excludes the sides of i that
coincide with the boundary of the entire study region,
because these sides do not facilitate interaction between
different ethnic groups. MAX(P/A) is the maximum
perimeter-area ratio found in the whole study area. This
ratio serves as a benchmark for comparing segregation among
spatial configurations of different sizes.
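
A minimal Python sketch of the shape-adjusted index follows.
It assumes the form
D(s) = D - sum_ij w_ij |z_i - z_j| * [(P_i/A_i + P_j/A_j) / 2] / MAX(P/A),
with w_ij taken as d_ij over the total of all d_ij and with
MAX(P/A) computed from the interior perimeters. Both choices
are assumptions of this sketch and should be checked against
Wong (1993); the function name and the data are hypothetical.

```python
def d_s(minority, majority, d, perimeter, area):
    """Shape-adjusted index under the assumed form given in the text.

    d[i][j]   : length of the common boundary between units i and j
    perimeter : interior perimeter P_i of each unit (outer sides excluded)
    area      : area A_i of each unit
    """
    total_min = sum(minority)
    total_maj = sum(majority)
    # Classical dissimilarity index D
    big_d = 0.5 * sum(abs(b / total_min - w / total_maj)
                      for b, w in zip(minority, majority))
    z = [b / (b + w) for b, w in zip(minority, majority)]
    ratio = [p / a for p, a in zip(perimeter, area)]
    max_ratio = max(ratio)  # benchmark MAX(P/A) for the study area
    n = len(z)
    total_len = sum(sum(row) for row in d)
    correction = 0.0
    for i in range(n):
        for j in range(n):
            w_ij = d[i][j] / total_len
            shape = 0.5 * (ratio[i] + ratio[j]) / max_ratio
            correction += w_ij * abs(z[i] - z[j]) * shape
    return big_d - correction


# Three units in a row; boundary lengths 1 (units 0-1) and 9 (units 1-2).
d = [[0, 1, 0], [1, 0, 9], [0, 9, 0]]
result = d_s(minority=[100, 100, 0], majority=[0, 0, 100], d=d,
             perimeter=[1.0, 10.0, 9.0], area=[4.0, 20.0, 9.0])
```

Because the shape factor is at most one, the correction in
this sketch is never larger than the boundary-weighted
correction alone, so D(s) falls between D(w) and D for the
same data.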
Spatial Information for Segregation Measures
To implement these spatial segregation measures, GIS are
indispensable tools because the measures utilize many types
of spatial information stored in ArcInfo and some other GIS.
As in many techniques in spatial statistics and spatial
analysis, especially in spatial autocorrelation studies
(Anselin 1988, Griffith 1988), building the adjacency
information is the first step. The latter three indices of
spatial segregation, D(adj), D(w), and D(s), utilize this
information. The adjacency information, stored as
polygon-left and polygon-right, is found in the AAT file in
ArcInfo. The length of common boundary is also recorded in
the AAT file. The AAT file has to be combined with
information from the PAT file to build the adjacency matrix.
An AML is used to extract relevant information from the AAT
file and to combine information from the AAT file and the
PAT file.
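
The logic of turning arc records into an adjacency matrix
and a matrix of common boundary lengths can be sketched in
Python as below. This is not the AML used in the study. It
assumes each arc is available as a (left polygon, right
polygon, length) tuple, as in the LPOLY#, RPOLY#, and LENGTH
items of an AAT file, and that the outer polygon carries
internal number 1; the function name and the data are
hypothetical.

```python
OUTER = 1  # assumed internal number of the outer polygon


def boundary_matrices(arcs, polygon_ids):
    """Build the binary adjacency matrix c and boundary-length matrix d.

    arcs        : iterable of (lpoly, rpoly, length) tuples
    polygon_ids : internal numbers of the areal units, outer polygon excluded
    """
    index = {pid: k for k, pid in enumerate(polygon_ids)}
    n = len(polygon_ids)
    c = [[0] * n for _ in range(n)]
    d = [[0.0] * n for _ in range(n)]
    for lpoly, rpoly, length in arcs:
        # Skip arcs on the edge of the study region and dangling arcs.
        if lpoly == rpoly or OUTER in (lpoly, rpoly):
            continue
        i, j = index[lpoly], index[rpoly]
        c[i][j] = c[j][i] = 1
        # Two polygons may share several arcs, so lengths accumulate.
        d[i][j] += length
        d[j][i] += length
    return c, d


# Toy arc table: polygons 2, 3, 4 inside outer polygon 1.
arcs = [(1, 2, 5.0), (2, 3, 2.0), (3, 2, 1.5), (3, 4, 4.0), (4, 1, 6.0)]
c, d = boundary_matrices(arcs, polygon_ids=[2, 3, 4])
```

The polygon identifiers from the PAT file fix the row and
column order, so the matrices line up with the attribute
vectors taken from the same file.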
For the index D(s), the formulation needs the perimeter and
area of each areal unit. These geometric variables are in
the PAT file. However, the perimeter in D(s) excludes the
sides that are shared with the outer polygon. Therefore, the
perimeter information in the PAT file cannot be used
directly. Instead, an AML is written to calculate the
perimeter of each areal unit without counting the sides
adjacent to the outer polygon.
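
The perimeter adjustment amounts to summing, for each
polygon, only the arcs whose other side is not the outer
polygon. The Python sketch below shows that logic together
with the maximum perimeter-area ratio. It is not the AML
used in the study; the arc tuple layout, the assumption that
the outer polygon has internal number 1, the function names,
and the data are all hypothetical.

```python
def interior_perimeters(arcs, polygon_ids, outer=1):
    """Sum arc lengths per polygon, skipping arcs that border the outer polygon.

    arcs : iterable of (lpoly, rpoly, length) tuples
    """
    perimeter = {pid: 0.0 for pid in polygon_ids}
    for lpoly, rpoly, length in arcs:
        if lpoly == rpoly or outer in (lpoly, rpoly):
            continue
        # An interior arc contributes to the perimeter of both neighbors.
        perimeter[lpoly] += length
        perimeter[rpoly] += length
    return perimeter


def max_perimeter_area_ratio(perimeter, area):
    """Largest P/A over all areal units, using the interior perimeters."""
    return max(perimeter[pid] / area[pid] for pid in perimeter)


# Toy arc table: polygons 2, 3, 4 inside outer polygon 1.
arcs = [(1, 2, 5.0), (2, 3, 2.0), (3, 2, 1.5), (3, 4, 4.0), (4, 1, 6.0)]
perimeter = interior_perimeters(arcs, polygon_ids=[2, 3, 4])
area = {2: 7.0, 3: 5.0, 4: 8.0}
benchmark = max_perimeter_area_ratio(perimeter, area)
```

In this toy table, polygon 2 keeps 3.5 units of perimeter
out of 8.5, since its 5.0-unit arc borders the outer
polygon.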
Implementing Spatial Segregation Measures
ArcInfo and S-Plus provide the two major software
environments. The S-Plus GisLink(TM) is a data bridge that
allows users to move data between ArcInfo and S-Plus easily.
This study does not require the modeling results to be
displayed, so the results do not need to be moved back to
ArcInfo. Therefore, after the different types of pertinent
information are extracted in the ArcInfo environment, they
are moved into S-Plus through GisLink for mathematical
manipulation. Figure 1 describes the operations in both the
ArcInfo and S-Plus environments.
In ArcInfo, we create the census geography coverage from
TIGER/Line files. Then we import population statistics from
Summary Tape Files (STF) 3A into the PAT file of the
coverage. We use an AML written by Dodson (Anselin, Hudak,
and Dodson, 1992) to extract the adjacency information and
the spatial weights matrix using the length of common
boundary as the weights. Two other AMLs are used to obtain
the perimeter of each polygon without the outer boundary.
This modified perimeter measure is written back to the PAT
file. The maximum perimeter-area ratio indicating the
maximum compactness is also calculated. The PAT file and the
spatial weights matrix are then moved to S-Plus using
GisLink.
In S-Plus, pertinent items in the PAT object, such as the
population count of each areal unit in each group, are
scanned in as individual vectors. The record for the outer
polygon is also removed. Through a series of vector and
matrix operations, we derive D(adj), D(w), and D(s).
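
The step of turning the PAT records into vectors can be
sketched in Python as follows. The study performed this step
in S-Plus; the record layout, the item names POLY_ID, BLACK,
and WHITE, and the assumption that the outer polygon has
identifier 1 are hypothetical and serve only to show the
order of operations.

```python
def pat_to_vectors(pat_rows, items, outer_id=1, id_item="POLY_ID"):
    """Drop the outer-polygon record and return one vector per item.

    pat_rows : list of dicts, one per PAT record
    items    : names of the attribute items to extract
    """
    rows = [r for r in pat_rows if r[id_item] != outer_id]
    # Sort by polygon identifier so the vectors follow the same order
    # as the rows and columns of the spatial weights matrix.
    rows.sort(key=lambda r: r[id_item])
    ids = [r[id_item] for r in rows]
    vectors = {item: [r[item] for r in rows] for item in items}
    return ids, vectors


pat = [
    {"POLY_ID": 1, "BLACK": 0, "WHITE": 0},      # outer polygon
    {"POLY_ID": 3, "BLACK": 40, "WHITE": 160},
    {"POLY_ID": 2, "BLACK": 120, "WHITE": 80},
]
ids, vectors = pat_to_vectors(pat, items=["BLACK", "WHITE"])
```

Keeping the vectors and the weights matrix in the same
polygon order is what allows the indices to be computed with
plain vector and matrix operations.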
Case Study
In this study, we select two areas: Washington, DC, and the
state of Connecticut. The two areas differ in scale, which
allows us to demonstrate that the indices can be implemented
in the same manner regardless of scale. There are eight
counties in Connecticut, and we treat each county as an
individual region. We calculate the classical dissimilarity
index D and the three spatial indices for each county and
for DC at the census tract level. The results are reported
in Figure 2.
Summary
In this paper, we argue that many types of spatial
information stored in GIS, especially in ArcInfo, are
critical to many spatial analytical techniques and modeling.
Extracting these data from ArcInfo is the first important
step. However, spatial analysis and modeling also require a
flexible mathematical manipulation environment. Most GIS,
including ArcInfo, fail to provide such an environment.
Thus, S-Plus is used in this paper. We demonstrate that
implementing spatial segregation measures requires combining
the spatial information from ArcInfo with the powerful
mathematical modeling tools in S-Plus.
References
Anselin, L., 1988, Spatial Econometrics: Methods and
Models. Kluwer Academic Publishers.
Anselin, L., Hudak, S., and Dodson, R., 1992, Spatial
data analysis and GIS: Interfacing GIS and econometric
software. National Center for Geographic Information and
Analysis, University of California, Santa Barbara, CA.
Duncan, O. D., and Duncan, B., 1955, "A methodological
analysis of segregation indexes", American Sociological
Review, 20, 210-217.
Griffith, D.A., 1988, Advanced Spatial Statistics.
Kluwer Academic Publishers.
Morrill, R. L., 1991, "On the measure of geographic
segregation", Geography Research Forum, 11, 25-36.
Wong, D. W. S., 1993, "Spatial Indices of Segregation",
Urban Studies, 30(3), 559-572.
David Wong and Wing Chong
Dept. Geography & Earth Systems Science
George Mason University
Fairfax, VA 22030
Telephone: (703) 993-1212
Fax: (703) 993-1216
E-mail: dwong2@gmu.edu