David Wong and Wing Chong

ArcInfo AND S-PLUS FOR SEGREGATION ANALYSIS

ArcInfo, compared with other GIS, is relatively rich in analytical functions. It has built-in modules for advanced spatial models, such as the spatial interaction model and the location-allocation model. These models use the pertinent geographic information stored in ArcInfo, such as topology, for spatial analysis and modeling. However, various types of geographic information cannot be used flexibly inside ArcInfo for other spatial analytical techniques and models. This paper focuses on a segregation study using spatial indices that cannot be calculated without GIS and the spatial information stored in ArcInfo. Various types of spatial information are extracted from ArcInfo and transferred to S-Plus, a sophisticated statistical package. Using data manipulation functions and procedures in S-Plus, the different types of spatial information are then combined with attribute data to calculate a family of spatial segregation indices.


Introduction

One of the major functions of GIS is analysis, in particular spatial analysis. Most GIS possess general spatial analysis functions, such as buffering and overlay. Some also regard spatial query as a spatial analytical function. ArcInfo includes some advanced spatial analytical tools and models, such as spatial interaction models and network analysis. However, the potential of GIS in spatial analysis and modeling should go beyond rudimentary functions and built-in spatial models. GIS should facilitate the development of new spatial tools and models. It is assumed that GIS such as ArcInfo store spatial information that is pertinent to spatial modeling and analysis, and hence that GIS can be a very powerful vehicle for developing spatial models. But some types of spatial information are not easily available in ArcInfo. In addition, spatial models and analysis usually require a flexible environment for mathematical manipulation, and ArcInfo does not provide such an environment.

In this paper, we use a segregation study as an example to demonstrate that certain types of spatial information crucial to many spatial models and analyses can be extracted from ArcInfo, but that the mathematical computation still has to rely on external tools. We chose S-Plus for this example because it is very powerful and because it has GisLink(TM), a data bridge between ArcInfo and S-Plus. Thus, data obtained in ArcInfo can easily be transferred to S-Plus for mathematical manipulation.

Segregation Indices

Since the introduction of the segregation index, or index of dissimilarity, D, by Duncan and Duncan (1955), numerous studies have used the index to reflect the level of segregation. However, as noted by several geographers (Morrill 1991, Wong 1993), the index fails to distinguish among various patterns of population distribution, and thus does not capture the level of segregation very well. To overcome this major deficiency in D, Morrill (1991) proposes a modified version of the index that includes a spatial interaction component. The modified index is defined as D(adj), where cij is the element in the ith row and jth column of a binary connectivity matrix, and zi is the proportion of minority population in areal unit i. Thus, zi - zj is the difference in the proportion of black population between zones i and j in the traditional two-group situation. It also reflects the difference in the spatial concentration of minority population. An element of the connectivity matrix is one when areal units i and j are neighbors, and zero otherwise.
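
For reference, the two indices just described can be written out as follows. This is a reconstruction from the definitions in the text and from the usual statement of D and of Morrill's adjustment, not a copy of the authors' typeset equations. The symbols x_i and y_i (minority and majority counts in areal unit i) and X and Y (their totals over the study region) are notation assumed here.

```latex
\[
D = \frac{1}{2} \sum_{i} \left| \frac{x_i}{X} - \frac{y_i}{Y} \right|
\]
\[
D(adj) = D - \frac{\sum_{i} \sum_{j} \left| c_{ij} \, (z_i - z_j) \right|}{\sum_{i} \sum_{j} c_{ij}}
\]
```

Read this way, the second term of D(adj) is the mean absolute difference in minority proportion across all pairs of neighboring units.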

Wong (1993) further enhances this modified D by including more spatial information in the index formulation. He proposes that not just adjacency but also the length of the common boundary can affect the intensity of spatial interaction among population groups across zonal boundaries, and hence affect the level of segregation. Therefore, he suggests another index, D(w), where dij is the length of the common boundary between areal units i and j.
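
A sketch of the general form of D(w), reconstructed from the description above rather than taken from the original typeset equation, is given below. The weight w_ij is assumed to be d_ij divided by a normalizing total of shared-boundary lengths; the exact normalization cannot be recovered from this text and should be checked against Wong (1993).

```latex
\[
D(w) = D - \frac{1}{2} \sum_{i} \sum_{j} w_{ij} \left| z_i - z_j \right| ,
\qquad w_{ij} \propto d_{ij}
\]
```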

Wong further argues that another major factor affecting the interaction of different population groups across zonal boundaries is the size and boundary length of each areal unit. These geometric characteristics reflect individuals' accessibility to the zonal boundary, and hence their ability to interact with the population in neighboring areal units. Therefore, he proposes another version of D, defined as D(s), where Pi and Ai are the perimeter and area of unit i, respectively. However, Pi excludes the sides of i that coincide with the boundary of the entire study region, because these sides do not facilitate interaction among different ethnic groups. MAX(P/A) is the maximum perimeter-area ratio found in the whole study area. This ratio serves as a benchmark for comparing segregation among spatial configurations of different sizes.
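
A sketch of D(s) consistent with the symbols defined above is given below. It is a reconstruction, not the authors' typeset equation: it assumes that the boundary-length weight w_ij from D(w) is retained and that the perimeter-area ratios of the two neighboring units are averaged before being scaled by MAX(P/A).

```latex
\[
D(s) = D - \frac{1}{2} \sum_{i} \sum_{j} w_{ij} \left| z_i - z_j \right|
\cdot \frac{\frac{1}{2} \left[ (P_i / A_i) + (P_j / A_j) \right]}{\mathrm{MAX}(P/A)}
\]
```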

Spatial Information for Segregation Measures

To implement these spatial segregation measures, GIS are indispensable because the measures use many types of spatial information stored in ArcInfo and in some other GIS. As in many techniques in spatial statistics and spatial analysis, especially spatial autocorrelation studies (Anselin 1988, Griffith 1988), building the adjacency information is the first step. The three spatial indices, D(adj), D(w), and D(s), all use this information. The adjacency information, stored as polygon-left and polygon-right, is found in the AAT file in ArcInfo. The length of the common boundary is also recorded in the AAT file. The AAT file has to be combined with information from the PAT file to build the adjacency matrix. An AML is used to extract the relevant information from the AAT file and to combine it with information from the PAT file.
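
The step from arc records to matrices can be illustrated outside ArcInfo. The following is a minimal Python sketch, not the AML used in this study. It assumes each arc has been exported as a hypothetical (left_polygon, right_polygon, length) tuple and that the outer polygon carries the ID 1; the function name `build_matrices` and the sample data are illustrative only.

```python
from collections import defaultdict


def build_matrices(arcs, outer_id=1):
    """Build a binary connectivity matrix c and a shared-boundary-length
    matrix d from arc records.

    arcs: iterable of (left_poly, right_poly, length) tuples
          (hypothetical export layout, assumed for this sketch).
    outer_id: ID of the outer polygon (assumed to be 1 here).
    Returns (ids, c, d), where ids lists polygon IDs in matrix order.
    """
    shared = defaultdict(float)
    ids = set()
    for left, right, length in arcs:
        for poly in (left, right):
            if poly != outer_id:
                ids.add(poly)
        # Skip arcs on the edge of the study region and arcs that have
        # the same polygon on both sides.
        if left == right or outer_id in (left, right):
            continue
        # Two polygons may share several arcs, so lengths are accumulated.
        shared[(left, right)] += length
        shared[(right, left)] += length

    ids = sorted(ids)
    index = {poly: k for k, poly in enumerate(ids)}
    n = len(ids)
    c = [[0] * n for _ in range(n)]
    d = [[0.0] * n for _ in range(n)]
    for (a, b), length in shared.items():
        c[index[a]][index[b]] = 1
        d[index[a]][index[b]] = length
    return ids, c, d


# Made-up arcs for three tracts (2, 3, 4) surrounded by outer polygon 1.
arcs = [
    (1, 2, 4.0),
    (2, 3, 2.0),
    (3, 1, 5.0),
    (3, 4, 1.5),
    (4, 2, 1.0),
    (2, 4, 0.5),  # a second arc between tracts 2 and 4
    (4, 1, 3.0),
]
ids, c, d = build_matrices(arcs)
```

Both matrices are symmetric because each interior arc contributes to the pair in both directions.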

For the index D(s), the formulation needs the perimeter and area of each areal unit. These geometric variables are in the PAT file. However, the perimeter in D(s) excludes the sides that are shared with the outer polygon, so the perimeter information in the PAT file cannot be used directly. Instead, an AML is written to calculate the perimeter of each areal unit without counting the sides adjacent to the outer polygon.
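
The modified perimeter can be sketched in the same spirit. The Python below is an illustration under stated assumptions, not the AML written for this study: arcs are hypothetical (left_polygon, right_polygon, length) tuples, the outer polygon is assumed to have ID 1, and the areas are made-up values. Whether MAX(P/A) should use the modified or the full perimeter is not stated in the text; the modified perimeter is assumed here.

```python
from collections import defaultdict


def interior_perimeter(arcs, outer_id=1):
    """Sum arc lengths per polygon, ignoring arcs shared with the outer polygon.

    arcs: iterable of (left_poly, right_poly, length) tuples
          (hypothetical export layout, assumed for this sketch).
    """
    perim = defaultdict(float)
    for left, right, length in arcs:
        if left == right:
            continue  # same polygon on both sides; not a boundary between units
        if outer_id in (left, right):
            # Register the inner polygon so it appears in the result,
            # but do not count this side toward its perimeter.
            inner = right if left == outer_id else left
            perim[inner] += 0.0
            continue
        perim[left] += length
        perim[right] += length
    return dict(perim)


def max_perimeter_area_ratio(perimeters, areas):
    """Largest perimeter-area ratio over all polygons (assumption: the
    modified perimeter is the one used in the ratio)."""
    return max(perimeters[p] / areas[p] for p in perimeters)


# Made-up arcs for three tracts (2, 3, 4) surrounded by outer polygon 1.
arcs = [
    (1, 2, 4.0),
    (2, 3, 2.0),
    (3, 1, 5.0),
    (3, 4, 1.5),
    (4, 2, 1.0),
    (2, 4, 0.5),
    (4, 1, 3.0),
]
areas = {2: 7.0, 3: 5.0, 4: 2.0}  # hypothetical polygon areas

perim = interior_perimeter(arcs)
max_ratio = max_perimeter_area_ratio(perim, areas)
```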

Implementing Spatial Segregation Measures

ArcInfo and S-Plus provide the two major software environments. The S-Plus GisLink(TM) is a data bridge that allows users to move data between ArcInfo and S-Plus easily. This study does not require display of the modeling results, so the results do not need to be moved back to ArcInfo. Therefore, after the different types of pertinent information are extracted in the ArcInfo environment, they are moved into S-Plus through the GisLink for mathematical manipulation. Figure 1 describes the operations in both the ArcInfo and S-Plus environments.

In ArcInfo, we create the census geography coverage from TIGER/Line files. Then we import population statistics from Summary Tape File (STF) 3A into the PAT file of the coverage. We use an AML written by Dodson (Anselin, Hudak, and Dodson, 1992) to extract the adjacency information and the spatial weights matrix, using the length of the common boundary as the weights. Two other AMLs are used to obtain the perimeter without the outer boundary for each polygon. This modified perimeter measure is written back to the PAT file. The maximum perimeter-area ratio indicating the maximum compactness is also calculated. The PAT file and the spatial weights matrix are moved to S-Plus using the GisLink.

In S-Plus, pertinent items in the PAT object, such as the population count of each areal unit in each group, are scanned in as individual vectors. The record for the outer polygon is also removed. Through a series of vector and matrix operations, we derive D(adj), D(w), and D(s).
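
The vector and matrix operations can be sketched as follows. This is a Python illustration of the arithmetic, not the S-Plus code used in the study, and all data are made up. One assumption should be noted in particular: the weight w_ij is taken as d_ij divided by the total shared-boundary length with each boundary counted once, so that the adjustment term is a boundary-length-weighted mean difference; the normalization in Wong (1993) may differ.

```python
def dissimilarity(minority, majority):
    """Classical index of dissimilarity D for two groups."""
    total_min = sum(minority)
    total_maj = sum(majority)
    return 0.5 * sum(
        abs(x / total_min - y / total_maj) for x, y in zip(minority, majority)
    )


def spatial_indices(minority, majority, c, d, perim, area):
    """Return (D, D(adj), D(w), D(s)) for one study region.

    c: binary connectivity matrix; d: shared-boundary-length matrix;
    perim: modified perimeters; area: polygon areas.
    All arguments are ordered by areal unit, outer polygon already removed.
    """
    n = len(minority)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    z = [x / (x + y) for x, y in zip(minority, majority)]  # minority proportion
    D = dissimilarity(minority, majority)

    # D(adj): subtract the mean absolute difference between neighbors.
    c_total = sum(map(sum, c))
    d_adj = D - sum(c[i][j] * abs(z[i] - z[j]) for i, j in pairs) / c_total

    # Assumption: normalize d by total shared boundary length, each
    # boundary counted once (d is symmetric, so halve the double sum).
    d_total = 0.5 * sum(map(sum, d))
    w = [[d[i][j] / d_total for j in range(n)] for i in range(n)]
    d_w = D - 0.5 * sum(w[i][j] * abs(z[i] - z[j]) for i, j in pairs)

    # D(s): scale each pair by its mean perimeter-area ratio over MAX(P/A).
    pa = [p / a for p, a in zip(perim, area)]
    max_pa = max(pa)
    d_s = D - 0.5 * sum(
        w[i][j] * abs(z[i] - z[j]) * 0.5 * (pa[i] + pa[j]) / max_pa
        for i, j in pairs
    )
    return D, d_adj, d_w, d_s


# Made-up example: three tracts in a row (A-B and B-C are neighbors).
minority = [80, 40, 20]
majority = [20, 60, 80]
c = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
d = [[0.0, 2.0, 0.0], [2.0, 0.0, 3.0], [0.0, 3.0, 0.0]]
perim = [2.0, 5.0, 3.0]   # perimeter excluding the outer boundary
area = [4.0, 10.0, 3.0]

D, d_adj, d_w, d_s = spatial_indices(minority, majority, c, d, perim, area)
```

In this made-up example each spatial index is smaller than D, because some minority and majority population can interact across tract boundaries.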

Case Study

In this study, we select two areas: Washington, DC, and the state of Connecticut. The two areas differ in scale, and together they demonstrate that the indices can be implemented in the same manner. There are eight counties in Connecticut, and we treat each county as an individual region. We calculate the classical dissimilarity index D and the three spatial indices for each county and for DC at the census tract level. The results are reported in Figure 2.

Summary

In this paper, we argue that many types of spatial information stored in GIS, especially in ArcInfo, are critical to many spatial analytical techniques and models. Extracting these data from ArcInfo is the first important step. However, spatial analysis and modeling also require a flexible environment for mathematical manipulation, and most GIS, including ArcInfo, fail to provide one. Thus S-Plus is used in this paper. We demonstrate that implementing spatial segregation measures requires combining the spatial information from ArcInfo with the powerful mathematical modeling tools in S-Plus.

References

Anselin, L., 1988, Spatial Econometrics: Methods and Models. Kluwer Academic Publishers.

Anselin, L., Hudak, S., and Dodson, R., 1992, Spatial data analysis and GIS: Interfacing GIS and econometric software. National Center for Geographic Information Analysis, University of California, Santa Barbara, CA.

Duncan, D., and Duncan, B., 1955, "A methodological analysis of segregation indexes", American Sociological Review 20, 210-17.

Griffith, D.A., 1988, Advanced Spatial Statistics. Kluwer Academic Publishers.

Morrill, R. L., 1991, "On the measure of geographic segregation", Geography Research Forum, 11, 25-36.

Wong, D.S. W., 1993, "Spatial Indices of Segregation", Urban Studies, 30(3), 559-572.


David Wong and Wing Chong
Dept. Geography & Earth Systems Science
George Mason University
Fairfax, VA 22030
Telephone: (703) 993-1212
Fax: (703) 993-1216
E-mail: dwong2@gmu.edu