Pau Serra and Derek Thompson

Count Data Set Transformations for Demand Analysis in ArcInfo


GIS-embedded transportation models, despite their increasing sophistication, still face challenges such as integrating point and polygon data types into a data structure based upon lines and nodes. Thiessen polygon tessellation provides a method for aggregating and transforming a number of data types.


An appreciation of transit demand analysis in urban areas requires an understanding not only of the factors that influence the demand for public transportation services but also of some of the practicalities inherent in empirical studies. Our efforts to teach undergraduate students some basic principles of travel and transportation through data analysis with ArcInfo and ArcView software have led us to identify some practical matters that stand in the way of an easy path to understanding.

We do not intend here to give an account of the different ways in which transportation and land use models are connected in or with a GIS environment (cf. BRAIL, 1990; FISCHER and NIJKAMP, 1993; BATTY and XIE, 1994; BAO et al., 1995; WEGENER, 1995). Rather, we focus on the preliminary tasks required to set up consistent transportation modelling in ArcInfo.

We have undertaken some experiments to establish some of the impacts of data availability and the data types and algorithms available in the software. We aim to demonstrate the impact of spatial data type and the necessary manipulations to select areas for analysis, obtain statistical summaries and measurements, and undertake spatial interaction modelling. After presenting a scheme of possibilities, the paper will give examples of experimental results for selected tasks: (1) the assignment of aggregates for polygons to their bounding arcs, and (2) the assignment of scalar quantities from one set of polygons to a different set. The results from the exercise of different procedures or assumptions cast light on the pitfalls of using data of certain types, e.g. polygon counts when demand is needed for arcs, and the limitations of the underlying data models. In this way students can obtain an appreciation for the realities of data analysis and the sensitivities of model algorithms to the properties of input data.

The procedures here have been designed to carry out data model transformations in order to make the data ready for realistic distance calculations according to the ArcInfo Network module method. These data model transformations may be grouped as follows:

1. one-to-one transformations:

1.1. from polygons to their corresponding centroid points (and vice versa)

1.2. location of points into a network as nodes

In both transformations, the one-to-one relationship must be preserved.

2. one-to-many and many-to-one transformations, that is, disaggregation and aggregation

2.1. disaggregation from polygons to some set of points (and the inverse way, aggregation)

2.2. disaggregation from polygons to another set of polygons (and the inverse way, aggregation)

2.3. disaggregation from polygons to their bounding nodes as points (and the inverse way, aggregation)

2.4. disaggregation from polygons to their bounding arcs (and the inverse way, aggregation)

2.5. disaggregation from points to another set of points (and the inverse way, aggregation)

2.6. disaggregation from arcs to their bounding nodes as points (and the inverse way, aggregation)

Demand data type transformations

A common spatial data type that carries counts for demand data is the polygon. Census tracts and blocks, transportation analysis zones and the like are based upon areal features, or polygons in ArcInfo jargon. Other features frequently used in transportation analysis are home locations represented as points, such as those collected in Home Interview Surveys. ArcInfo provides commands for demand data analysis and spatial interaction models which use either polygons or points. Nevertheless, distances between polygon and point features cannot be computed along realistic network lines representing roads and highways, but only as Euclidean or Manhattan distances between pairs (dyads) of features.
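The two straight-line measures mentioned above differ only in how the coordinate differences are combined. The following Python sketch illustrates this outside ArcInfo; the function names and coordinates are hypothetical and are not part of the software.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(b[0] - a[0], b[1] - a[1])

def manhattan(a, b):
    """Distance travelled along axis-parallel moves (city-block distance)."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

origin = (0.0, 0.0)
destination = (3.0, 4.0)
print(euclidean(origin, destination))  # 5.0
print(manhattan(origin, destination))  # 7.0
```

Neither measure follows the actual street layout, which is why the network-based calculations discussed next are preferred in dense urban areas.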

Calculations along the network are a more precise way of performing transportation analysis, especially in densely urbanised areas. Some of the ArcInfo spatial interaction commands that work along the network (mainly locateallocate, nodedistance, interactions and accessibility, which also offer the Euclidean and Manhattan distance calculations for polygons and points) assume that the demand data are held on the nodes of the network line. Other commands more closely related to travel modelling (tour, path and allocate) assume that the demand data lie on the arc or line features. Despite these assumptions, and although the most common demand data are held not on nodes or arcs but in polygons or points, ArcInfo does not provide straightforward tools for assigning point or polygon demand data to nodes or arcs. Consequently, involved and time-consuming processes are needed to prepare the data set.

From the standpoint of the demand data, it is necessary to ensure that all the original features, whether polygons or points, are properly converted to or integrated into the network line as nodes. Some procedures may seem to work, such as using the "near" command or snapping the points to the network line. However, in some cases (such as regularly gridded urban areas) the one-to-one relationship "one point to one node" may be compromised, resulting in the loss of some demand data. In the paper we will show one procedure that preserves the one-to-one relationship in the transformation from point features to node (in a line) features. This procedure uses the ArcInfo "link" features to relate the network nodes reliably to the point features (or "centers") that hold the demand data.

Still, this method introduces a distance distortion through the links added to the network line. Likewise, the more traditional method, which assumes that a polygon centroid or point may simply be moved to the nearest location on the network line, introduces a distance distortion as well. However, the effects this assumption may have on the distances measured between all node dyads are not shown. The paper will show a different method, a disaggregation method based upon "Thiessen polygons" (proximal or catchment areas built from point features), that lessens the distortions in the distance measurements.

Polygon-to-polygons and polygon-to-arcs demand data transfer

It may be necessary to do some preliminary data preparation, such as a one-to-many transformation from polygon to polygon features. For instance, it may be advisable to assign population data from census block polygons to residential land use polygons, or to aggregate census blocks into tracts or into transportation analysis zones. Another similar kind of transformation is the assignment of demand data from polygon to arc features. In the first case, after the polygon-to-polygon data transfer, Euclidean and Manhattan distance calculations may be done. In the second, after the polygon-to-arc data transfer, travel demand computations may be performed.

The procedure in both cases is as follows. Assuming that the population is distributed evenly within the polygon features, the data assignment is based upon the share of the from-polygon area taken by the to-polygon, or the share of the from-polygon perimeter taken by the to-arc. The first step is an overlay operation on the from- and to-coverages, using the union command in the first case and identity in the second. Then, through relates (since both coverages share a common unique ID item coming from the overlay coverage), the area share or length share may be calculated. The "share" factor is then multiplied by the from-coverage demand data item (which now resides in the overlaid or to-coverage, since it has been carried over by the overlay operation). This calculation yields the to-coverage demand item.
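The share calculation can be illustrated with a minimal Python sketch, assuming the overlay has already produced the area (or length) of each piece. The function name, feature IDs and figures below are hypothetical and serve only to show the arithmetic.

```python
def apportion(demand, from_size, piece_sizes):
    """Split a from-polygon's demand among overlay pieces in proportion to size.

    from_size is the area (or perimeter) of the from-polygon; piece_sizes maps
    each to-feature ID to the area (or arc length) it shares with that polygon.
    """
    return {fid: demand * size / from_size for fid, size in piece_sizes.items()}

# Hypothetical census block: 1200 residents, area 40000 m2, overlapped by two
# residential land use polygons and one remaining non-residential piece.
pieces = {"res_a": 10000.0, "res_b": 25000.0, "other": 5000.0}
shares = apportion(1200, 40000.0, pieces)
print(shares)  # {'res_a': 300.0, 'res_b': 750.0, 'other': 150.0}
```

When a to-feature overlaps several from-polygons, its demand item would be the sum of the pieces it receives from each of them.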

Polygon-to-nodes and arcs-to-nodes demand data transfer

These two transformations consist of assigning demand data from either polygon or arc features to their corresponding nodes. Both are therefore based, again, on a one-to-many relationship. A straight overlay operation cannot be done between polygons and nodes or between arcs and nodes, so it is necessary to find another way to transfer the demand data from polygons and arcs to their nodes.

One way is to set up "areas of influence" for the nodes. This is accomplished by building a Thiessen polygon tessellation from the nodes themselves, so that the nodes become the centroids of the Thiessen polygons. Then an overlay analysis is performed using the "Thiessen coverage" and either the polygon coverage (union command) or the line coverage (identity command). Next, the area or length share is calculated through a relate (just as in the previous data transfer task) from the original coverage (either polygon or arc) to the overlaid one. An INFO file is then created from the overlaid coverage with the summary (statistics command) of the demand data for the features, either polygons or lines, that have the same "Thiessen coverage" ID. Finally, a joinitem command is run so that each of the "Thiessen coverage" polygons receives the summarised demand data.

Once the centroids of these polygons are converted to points that keep the polygon items, the result is similar to having the demand data on the nodes of the original features (polygons or arcs). Therefore, the point coverage may be used as if it contained the polygon or arc nodes with their estimated demand data.
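The idea behind the Thiessen-based transfer can be sketched in Python for the arc-to-node case. This is not the ArcInfo overlay: as an assumption made for brevity, the length share is approximated by sampling points along a straight arc and crediting each sample to its nearest node, which is the Thiessen polygon the sample falls in. All names and coordinates are hypothetical.

```python
import math

def nearest_node(point, nodes):
    """ID of the node closest to point, i.e. the Thiessen cell it falls in."""
    return min(nodes, key=lambda node_id: math.dist(point, nodes[node_id]))

def arc_demand_to_nodes(start, end, demand, nodes, samples=1000):
    """Share a straight arc's demand among nodes by arc length per Thiessen cell."""
    totals = {node_id: 0.0 for node_id in nodes}
    for i in range(samples):
        t = (i + 0.5) / samples  # midpoint of the i-th slice of the arc
        point = (start[0] + t * (end[0] - start[0]),
                 start[1] + t * (end[1] - start[1]))
        totals[nearest_node(point, nodes)] += demand / samples
    return totals

# Hypothetical nodes: 1 and 2 bound the arc, 3 lies far from it.
nodes = {1: (0.0, 0.0), 2: (10.0, 0.0), 3: (5.0, 20.0)}
print(arc_demand_to_nodes((0.0, 0.0), (10.0, 0.0), 100.0, nodes))
```

In this example the arc's demand is split about evenly between its two bounding nodes, while the distant node receives nothing because no part of the arc falls within its Thiessen polygon.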

Point-to-(network integrated) node transformation tasks

The procedures needed to run spatial interaction models using network distance calculations are shown here. The two basic inputs are a network line coverage and a point coverage of locations with demand data.

In this section it is assumed that features holding demand data in a geometry other than points, such as polygons or arcs, have already been transformed into point features. These transformations may be summarized as follows:

a. polygon centroids to points

b. polygon nodes to points

c. arc nodes to points

Transformation a. is simple and may be performed in ARCEDIT by creating a point coverage out of the polygon labels.

Transformations b. and c. have been explained in the previous section.

Spatial interaction using real network distances (in ArcInfo) assumes that there is a point coverage (the "centers" coverage in ArcInfo Network) with a demand item and a network line. The network line has to have a Node Attribute Table (a NAT INFO file). Only the node records in the NAT that carry a center-ID are taken into account in the network calculations. The calculations use the center-ID in the network nodes to relate them to the centers coverage and to read the demand data from the centers. There has to be a one-to-one relationship between the line coverage nodes and the centers coverage. In certain urban environments it may be enough simply to run the "near" command using the centers (point coverage) and the network (line) nodes.

The Euclidean distance from every center to its closest node in the network is computed with the "near" command at the Arc level. After the "near" command has run, a Euclidean distance item and the center# internal ID are added to the NAT of the line nodes. The center# item is used as a relate item so that the centers receive the node-ID item. An inverse relate is set up from centers to nodes, again using center#, to transfer the demand data. In short, the ArcInfo Network module needs the centers to have the node-ID and the nodes to hold the demand data of the centers.
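The two-way transfer described above can be pictured with a small Python sketch. It only imitates the logic with dictionaries standing in for the attribute tables; the IDs, coordinates and item names are hypothetical and do not reproduce the ArcInfo commands.

```python
import math

# Hypothetical centers (point coverage with demand) and network nodes.
centers = {101: {"xy": (1.0, 1.2), "demand": 250},
           102: {"xy": (8.7, 4.1), "demand": 400}}
nodes = {1: {"xy": (1.0, 1.0)}, 2: {"xy": (9.0, 4.0)}, 3: {"xy": (5.0, 5.0)}}

# "near" step: each center looks up its closest node.
for center_id, center in centers.items():
    node_id = min(nodes, key=lambda n: math.dist(center["xy"], nodes[n]["xy"]))
    center["node_id"] = node_id                  # the center gets the node-ID
    center["distance"] = math.dist(center["xy"], nodes[node_id]["xy"])
    nodes[node_id]["center_id"] = center_id      # the node records its center
    nodes[node_id]["demand"] = center["demand"]  # and holds the center's demand

print(centers[101]["node_id"], nodes[2]["demand"])  # 1 400
```

Node 3 ends up without a center-ID, so under the rule stated earlier it would not take part in the network calculations.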

However, in some urban environments with a Manhattan-like or gridded pattern of streets, several centers (actually, polygon centroids) may lie at the same distance from a given node or nodes. Thus, the necessary one-to-one relationship may be compromised if the "near" command is used carelessly.
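A check for this problem can be sketched in Python before any demand data are transferred. The function below is a hypothetical helper, not an ArcInfo command: it reports nodes that are the nearest node of more than one center, and centers that are equally near to several nodes.

```python
import math
from collections import defaultdict

def check_one_to_one(centers, nodes, tolerance=1e-9):
    """Report nodes claimed by several centers and centers tied between nodes."""
    claimed = defaultdict(list)  # node ID -> centers whose nearest node it is
    tied = {}                    # center ID -> equally near node IDs
    for center_id, xy in centers.items():
        distances = {n: math.dist(xy, node_xy) for n, node_xy in nodes.items()}
        shortest = min(distances.values())
        nearest = sorted(n for n, d in distances.items()
                         if d - shortest <= tolerance)
        if len(nearest) > 1:
            tied[center_id] = nearest
        claimed[nearest[0]].append(center_id)
    shared = {n: c for n, c in claimed.items() if len(c) > 1}
    return shared, tied

# Hypothetical square grid: four intersections and three block centroids.
nodes = {1: (0.0, 0.0), 2: (100.0, 0.0), 3: (0.0, 100.0), 4: (100.0, 100.0)}
centers = {"a": (50.0, 50.0), "b": (20.0, 10.0), "c": (10.0, 30.0)}
shared, tied = check_one_to_one(centers, nodes)
print(shared)  # {1: ['a', 'b', 'c']}
print(tied)    # {'a': [1, 2, 3, 4]}
```

If either result is non-empty, a plain nearest-node assignment would leave some node holding the demand of only one of its competing centers.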

One solution is to set up link features (in ARCEDIT) going from the centers to the arc segments of the network line coverage. The link features then have to be converted into line features and added to the network coverage. Links between the centroids and the closest arc are established in an ARCEDIT session with the autolink command, after the "snapping" environment has been set up. Unfortunately, ArcInfo does not provide a direct procedure to convert links to arcs, so three steps need to be performed. Firstly, the "ungenerate" Arc command creates a file with a string of the "x" and "y" coordinates of the links. Secondly, the format of the "ungenerated" file is altered, using for example an awk script. Thirdly, the "generate" Arc command converts the transformed file into a line coverage.

The "link-line" coverage is merged with the network coverage using the "identity" command (or "get" in ARCEDIT). Once this is done, the centers coincide with the dangle nodes of the line coverage. At this point the one-to-one relationship can no longer be compromised, so "near" may be run without risk of unexpected results. The subsequent "joinitem" command can then be performed as well.

From now on, the network environment can be set up in ARCPLOT using the commands netcover, demand and impedance (among others), and then accessibility, nodedistance, locateallocate or interactions can be run.


References

BAO, Shuming, HENRY, Mark and BARKLEY, David (1995); "RAS: A Regional Analysis System Integrated with ArcInfo"; Computers, Environment and Urban Systems; Vol. 19, No. 1, pp. 37-56.

BATTY, Michael and XIE, Yichun (1994); "Modelling inside GIS: Part 1. Model structures, exploratory spatial data analysis and aggregation"; International Journal of Geographical Information Systems; Vol. 8, No. 3, pp. 291-307.

BRAIL, R. K. (1990); "Integrating urban information systems and spatial models"; Environment and Planning B: Planning and Design; Vol. 17, pp. 417-427.

FISCHER, M.M. and NIJKAMP, P. (1993); Geographic information systems, spatial modelling, and policy evaluation; Springer-Verlag; Berlin, Heidelberg, New York.

MARTIN, David (1991); Geographic Information Systems and Their Socioeconomic Applications; Routledge, London, New York.

WEGENER, Michael (1995); "Current and Future Land Use Models"; Land Use Model Conference proceedings; Texas Transportation Institute, Dallas.


Appendix

Here is an example of an ungenerated file from the link features in a coverage:

1.0,2.3

3.1,4.6

2.9,3.5

5.6,8.9

Every link is a straight line. Consequently, every two consecutive records of the list above make up one straight line, so that

1.0,2.3 is the 'x' and 'y' of one end of a link

3.1,4.6 is the 'x' and 'y' of the other end

Both ends will become the nodes of an arc when the procedure finishes.

An Awk script (written by Don Jarvinen, Ma., from the University of Maryland Department of Geography) transforms the format of the ungenerated file. The Awk script is as follows:

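# Convert an ungenerated link file (one "x,y" record per line, two records
# per link) into a line file for the "generate" command.
BEGIN {
    record = 3          # running counter; record / 2 gives the next line ID
    count = 1           # 1 = first point of a link, 0 = second point
    print "1"           # ID of the first line
}
{
    if ($0 == "end") {
        print $0                # pass an existing terminator through
    } else if (count == 0) {
        print $0                # second point of the link
        print "end"             # close the current line
        print record / 2        # ID of the next line
    } else {
        print $0                # first point of the link
    }
    record = record + 1
    count = 1 - count
}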

The new format has to have this appearance so that the file may be read back into ArcInfo by the "generate" Arc command.

1.0,2.3

3.1,4.6

end

2.9,3.5

5.6,8.9

end

end


Pau Serra
Departament de Geografia Humana
Divisio I
Universitat de Barcelona
Cr. Baldiri i Reixac s/n
08028-Barcelona
Spain

Telephone: 34-3-4409200
Fax: 34-3-4498510
email: pauserra@trivium.gh.ub.es