
Geospatial Applications Laboratory

Center of Higher Learning

University of Southern Mississippi






An Approach to Large Area

Pin-Map Problems


Bart Pittari, Paul Crocker

Jim Matthews, Mitch Tinsley

Geospatial Applications Laboratory

Center of Higher Learning

University of Southern Mississippi



Abstract

This paper describes the rule development process and software issues encountered in the development of a geospatial expert system for a large (1.25 × 10⁵ km²) area. The analysis included expert interviews and a neural network classification to develop rule hypotheses. The hypotheses were then statistically tested against random occurrence. Hypotheses testing significantly above or below random occurrence were used to produce GIS coverages that were combined using a weighted union to produce a coverage representing areas likely to exhibit the phenomena being modeled. This technique is felt to be applicable to a wide variety of pin-map problems.


Introduction

Law enforcement agencies have long used pin maps to perform spatial analysis. Pins of different colors denoted different crimes or incidents, and their groupings were used to identify hot spots of activity. The advent of digital data and processing has not only automated this type of spatial analysis; combining such data with Geographic Information Systems (GIS) has also made it possible to identify additional areas with similar characteristics that have a high potential for crime.

The geospatial expert system approach described in this document is applicable to any pin-map type of data. Since this paper deals with Law Enforcement Agency Sensitive data, no direct reference is made to the actual nature of the data involved in this process.



Methods

Questionnaires & Interviews

The first step was to interview experienced law enforcement officers regarding the activities being modeled in this exercise. A questionnaire was developed based upon preliminary interviews with selected personnel and was then distributed to a wide variety of law enforcement personnel with experience in the activity being modeled. From these responses, policy/procedural information was developed describing how and what officers look for. The questionnaire was followed by in-depth interviews with personnel having a minimum of five years of experience. The information from the questionnaires and interviews was summarized, and a profile was developed to identify potential relationships.


Rule Hypothesis Development Process 

Existing statewide pin-map data was obtained and processed, and all available statewide GIS data was collected. The existing data consisted of a variety of layers, including transportation (roads, railroads), hydrology (perennial streams, intermittent streams, water bodies), and land use (soil data, land type). Most of the state data was taken or derived from the Mississippi Automated Resource Information System (MARIS), which serves as a data clearinghouse for the state of Mississippi. Soil data was obtained from the Soil Survey Geographic Database (SSURGO) and the State Soil Geographic Database (STATSGO).

The existing pin-map data consisted of 365 points covering a 5-year period that represented criminal activity, plus an additional year's worth of data (81 points) that was collected after our study began and withheld for a blind test of our prediction system. The processed data were input to a neural net along with random points to identify potential rules by looking for relationships between the pin-map and GIS data that differed from random chance. The neural net results were then used to develop threshold levels for what was considered random. To ensure randomness, 10 sets of random data points were generated within the state boundary and used to test for patterns by comparing the ratio of historical points to random points.
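A minimal sketch of the random-point generation step, assuming a GeoPandas/Shapely workflow (the original processing used ArcInfo; the file name here is a placeholder):

```python
import geopandas as gpd
import numpy as np
from shapely.geometry import Point

rng = np.random.default_rng(0)

def random_points_in(poly, n):
    # Rejection sampling: draw candidates from the bounding box and
    # keep only those that fall inside the state polygon.
    minx, miny, maxx, maxy = poly.bounds
    pts = []
    while len(pts) < n:
        p = Point(rng.uniform(minx, maxx), rng.uniform(miny, maxy))
        if poly.contains(p):
            pts.append(p)
    return pts

# "ms_boundary.shp" is a placeholder for the state boundary layer.
boundary = gpd.read_file("ms_boundary.shp").geometry.unary_union

# Ten independent sets, each the size of the historical data set.
random_sets = [random_points_in(boundary, 365) for _ in range(10)]
```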


Statistical Analysis

The rule hypotheses developed from the interviews and neural network classification were then individually tested by statistical comparison to random chance. Each hypothesis, represented by a GIS coverage of buffered lines and points or of class membership, was evaluated in terms of the number of pin-map versus random points it contained. Hypotheses significantly different from random, whether more likely or less likely, were accepted as rules. Rule weights were calculated as the incidence-to-random (I/R) ratio minus 1 (a 1:1 ratio equals random); ratios between 0.9:1 and 1.1:1 were considered random, values below 0.9 less likely than random, and values above 1.1 more likely than random. Figure 1a illustrates an example of how likely and unlikely values were determined.
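In code, the acceptance test and weight assignment might look like the following sketch (the thresholds are those stated above; the function itself is ours):

```python
def rule_weight(incidence_count, random_count):
    """Evaluate one hypothesis by its incidence-to-random (I/R) ratio.

    Returns the rule weight (ratio - 1), or None when the ratio falls
    inside the 0.9-1.1 band treated as indistinguishable from random.
    """
    ratio = incidence_count / random_count
    if 0.9 <= ratio <= 1.1:
        return None        # hypothesis rejected: not different from random
    return ratio - 1.0     # < 0: unlikely area, > 0: likely area

# Example: 150 incidents vs. 100 random points inside a buffer gives
# an I/R ratio of 1.5 and a rule weight of +0.5.
```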

Figure 1a - I/R ratio plot of incidence and random sites. Random performance corresponds to the point where the line crosses the Y-axis at 1.00; anything more than 10% above that value is considered likely area, and anything more than 10% below it is considered unlikely area.

Vector Computation

A weighted union of the rules was used to produce the decision layer. This was accomplished by executing an Arc Macro Language (AML) script that performed a weighted union of 16 coverages. A 17th coverage, with a weight function of 0.0, was added to the union: a state of Mississippi boundary polygon included to fill holes in the event that any areas lacked polygonal coverage in the final outcome, and to trim buffer zones extending into neighboring states or the Gulf of Mexico. The union performed additions (cov1 + cov2 = out1; out1 + cov3 = out2, ...). The higher the weight number in the final coverage, the more likely the phenomenon is to occur in that area. Table 1 provides an example of rules and selected parameters; a sketch of the union step follows the table.

Table 1 - Example of a final rule set: rule parameters, the buffer or class associated with each parameter, the incidence-to-random ratio (I/R), the proportional area of the state included (%A), and the weight function assigned to each rule. The values displayed are not the actual results; they have been altered to protect the actual findings.

Parameter       Buffer/Class    I/R     Area of State (%A)    Weight
Streams         500 m           0.50    50.5                  -0.55
Roads           300 m           1.50    45.8                  +1.30
Land Use        Agriculture     0.25    20.5                  +1.56
Road Density*   2.5-6.5 km⁻¹    1.25    33.0                  +1.20
Clustering**    7500 m          5.07    25.0                  +1.40

*Road Density is a calculation of the length of roads per km² of state area. **Clustering describes the temporal and spatial proximity of some of the historical data (i.e., possibly the same offender was active over a number of years).
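A minimal sketch of the sequential weighted union described above, reimplemented with GeoPandas rather than the original AML script (file names are placeholders, and the weights are the illustrative values from Table 1):

```python
import geopandas as gpd

rules = [
    ("streams_500m.shp", -0.55),
    ("roads_300m.shp",   +1.30),
    ("landuse_ag.shp",   +1.56),
    # ... remaining rule coverages, ending with the 0.0-weight
    # state boundary polygon used to fill holes and trim buffers
]

acc = None
for path, w in rules:
    cov = gpd.read_file(path)[["geometry"]]
    cov["w"] = w
    if acc is None:
        acc = cov.rename(columns={"w": "weight"})
        continue
    # Union splits geometry wherever the layers overlap; areas present
    # in only one input receive NaN for the other layer's weight.
    acc = gpd.overlay(acc, cov, how="union", keep_geom_type=True)
    acc["weight"] = acc["weight"].fillna(0) + acc["w"].fillna(0)
    acc = acc.drop(columns="w")

acc.to_file("decision_layer.shp")
```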


A dissolve function was then performed on the coverage produced by the union to simplify it. The resultant coverage was highly fragmented and contained just over three million polygons, ranging in size from 0.0005 km² to 62.3 km². The mean and standard deviation of the polygonal areas were 0.04 km² and 0.14 km², respectively. The largest polygons were water reservoirs. The maximum possible weights were -0.66 and +2.49, while the actual range was -0.16 to +2.45.
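Continuing the sketch above, the dissolve step (the GeoPandas analogue of the ARC DISSOLVE command) groups polygons by their summed weight and unions their geometry:

```python
# acc is the union result from the previous sketch.
dissolved = acc.dissolve(by="weight").reset_index()
```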

The final step in producing the prediction layer in vector space was to relate the polygon weights resulting from the union of the rules to the likelihood of finding the pin-map data of interest; the weights are based upon an incidence-to-random ratio and do not represent probabilities. This was done by spatially joining both the historical data and an equal-sized set of generated random points to the prediction layer. The weights of the polygons on which the two sets of points fell were then ranked and graphed on a quantile plot. The range was determined by the separation of values, as shown below in Figure 1b.
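A sketch of this ranking step, assuming a recent GeoPandas and the placeholder layer names from the earlier sketches (the spatial join attaches each point to the weight of its containing polygon, and the sorted weights are then plotted by quantile):

```python
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np

solution   = gpd.read_file("decision_layer.shp")     # from the union sketch
historical = gpd.read_file("historical_points.shp")  # placeholder names
random_pts = gpd.read_file("random_points.shp")

def polygon_weights(points):
    joined = gpd.sjoin(points, solution[["weight", "geometry"]],
                       predicate="within")
    return np.sort(joined["weight"].to_numpy())

for pts, label in [(historical, "historical"), (random_pts, "random")]:
    w = polygon_weights(pts)
    plt.plot(np.linspace(0, 1, len(w)), w, label=label)

plt.xlabel("quantile")
plt.ylabel("polygon weight")
plt.legend()
plt.show()
```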



Figure 1b - An example of a quantile plot where the random points and historical data points were spatially overlaid on a coverage (in this case road density) to determine non-random values.


The number of sites and random points diminishes as the upper and lower weight values are approached. This results from the fact that polygons with extremely low or high weights, those representing the tails of a histogram, comprise less area than those in the mid-range. Inherently, therefore, fewer points are found in the areas of the map with values representing the tails, which translates into a very small sample size from which to calculate ratios.

As would be expected, a very high computational price is paid for the topology of vector computation. The vector coverages ballooned to a total of 60 GB and took approximately two weeks to process. The vector coverages used as input were up to 1.5 GB in size, the intermediate union coverages totaled 40 GB, and the final union, before a dissolve was performed, was a 10 GB file. Approximately 60 GB of disk space was required just to store the files, and several times that amount was required to perform the computation. Run times for a single buffer on a single coverage ranged from under an hour to over a day.



Figure 2 - A segment of the Statistical Approach 50 m raster solution coverage illustrating its high degree of detail.

Raster Computation

The sixteen vector coverages representing the rules were rasterized to a 50-meter grid, and the same weighted union was performed using the GRID function of Arc Workstation. The union took approximately 2 hours of computer time to complete and resulted in a 300 MB file. The resulting raster solution, as illustrated above in Figure 2, was just as detailed as its vector-produced counterpart.
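In raster form the weighted union reduces to per-cell addition; the following NumPy sketch assumes each rule has been rasterized to a boolean 50 m grid (True inside the buffer or class), with file names, grid dimensions, and weights all illustrative:

```python
import numpy as np

rule_grids = [
    ("streams_500m.npy", -0.55),   # boolean 50 m grids, one per rule
    ("roads_300m.npy",   +1.30),
    # ... remaining rules
]

# Accumulate the summed weight per cell; the shape approximates the
# state extent at 50 m resolution (illustrative dimensions).
solution = np.zeros((10_000, 8_000), dtype=np.float32)
for path, w in rule_grids:
    mask = np.load(path)            # True inside the rule's area
    solution[mask] += np.float32(w)
```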

In view of these difficulties, it is clear that computations of this magnitude should be performed with raster-formatted data unless there is a specific need for the attributes associated with vector data. In the present investigation, the topology of vector computation was of little value in interpreting the final results and was not warranted.

Evaluation vs. Random

One of the key points in the testing and evaluation was establishing a benchmark to measure how good a predictor the system is. Since there was no set point of reference, performance was gauged against random chance (i.e., would the solution perform any better than darts thrown at the map?).

After the final solution layer was generated via GRID and vectorized, the 81 data points withheld from the development process were overlaid, along with an equal number of generated random points. The prediction coverage surpassed random chance by a factor of 4 to 1.
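A sketch of that comparison, again using the placeholder layer names from the earlier sketches and assuming, for illustration, that polygons with positive weight constitute the predicted "likely" areas:

```python
import geopandas as gpd

solution = gpd.read_file("decision_layer.shp")
withheld = gpd.read_file("withheld_points.shp")   # the 81 blind-test points
rand_pts = gpd.read_file("random_points_81.shp")  # equal-sized random set

def hits(points):
    # Count points landing in predicted-likely polygons.
    joined = gpd.sjoin(points, solution[["weight", "geometry"]],
                       predicate="within")
    return int((joined["weight"] > 0).sum())

factor = hits(withheld) / hits(rand_pts)
print(f"prediction beats random by a factor of {factor:.1f} to 1")
```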

Summary and Conclusions

Geospatial expert systems based upon multiple weak rules can have significant predictive ability. For example, this investigation identified areas in Mississippi where the modeled phenomenon was four times more likely to occur than random chance would predict. This is considered our most significant finding. This result, together with the use of the neural net, comprises a process for developing the expert system rule set that is felt to be directly applicable to virtually any type of pin-map data.

There is often a negative connotation associated with the use of neural networks because the intermediate steps in their solution process cannot be ascertained. However, the current investigation found a neural network to be an extremely valuable analysis tool. Since most investigations are limited by the availability of data, all available, relevant GIS data was input to a neural network and a classification analysis performed. The results, in combination with the information obtained by interviewing experienced law officers, were used to formulate a set of rule hypotheses. The rule hypotheses were then individually tested against random chance, and sixteen of them were identified as statistically significant predictors. These surviving hypotheses were assigned weights on the basis of their performance against random chance and combined in a GIS by a weighted union to produce a map (solution coverage) predicting areas in the state likely to exhibit the phenomena modeled. The predictive ability of the solution coverage was then evaluated in a blind test against data collected after its development. The statistically verified rules, when combined by a weighted union, outperformed both a neural network prediction and the traditional law enforcement approach of returning to where the activity was found before and relying on instinct.

The topology associated with vector-format coverages made the statistical development of the rule set, and the production of the solution coverage by the weighted union, extremely labor- and computer-intensive. In contrast, computation in raster format was computationally economical, as was the neural network classification analysis. These factors, and the superior predictive ability of the statistically developed rules, suggest the following steps for the development of a solution coverage from pin-map data.



  • Interviews to determine perceived rules
  • Acquisition of all available GIS coverages for the study area
  • Acquisition of historical pin map data
  • Neural network classification of coverages as predictors to pare down data to relevant rules
  • Combine interview results and neural network classification to develop rule hypotheses
  • Statistically evaluate rule hypothesis by comparison to random chance (raster format)
  • Weighted union of rules (raster format)
  • Evaluation of the solution coverage against data not used in rule development