Smoothing Crime Incident Data: New Methods for Determining the Bandwidth in Kernel Estimation

Doug Williamson, Sara McLafferty, Victor Goldsmith, Hunter College of the City University of New York

Phil McGuire, New York City Police Department

and John Mollenkopf, Graduate School and University Center of the City University of New York

Introduction

Crime analysts are increasingly using GIS to analyze and display geographic concentrations or "hotspots" of crime events. One of the techniques for doing this is kernel estimation or kernel smoothing, a spatial statistical method that generates a map of density values from point event data. A critical issue in the smoothing process is the selection of a bandwidth size, the radius of the circular window in which smoothing is performed. Most GIS and spatial statistical programs that perform kernel smoothing calculate the bandwidth based on the geographic extent of the point pattern. These bandwidth estimates do not reflect the geographic distribution of the points within the study area, only their geographic extent. This can result in misleading density values and maps that are either too smooth or too spiky in appearance.

We propose a new approach for bandwidth estimation based on k-nearest neighbor distances between points. This method offers an improvement over existing methods because it is based on the spatial relationships among the points and thus reflects the degree of clustering and dispersion in the crime point pattern. We illustrate the use of this method in creating maps of crime density at different scales in Brooklyn, New York.

Data

The data used in the analyses are from the New York City Police Department's On-Line Complaint System. Each record corresponds to a location (address) at which a crime occurred. For the purpose of these analyses, only major felony incidents were used, as defined by the Federal Bureau of Investigation's Uniform Crime Reports. We focus on robbery and burglary since these types of crimes are of major interest to the command structure of the NYPD. The incidents occurred during a 29-day period from 07/26/97 to 08/23/97. The study area is limited to the southern portion of Brooklyn, a borough of New York City. During the study period, this area recorded 650 robberies (see Figure 1) and 873 burglaries.

Kernel Smoothing

The goal of kernel smoothing is to estimate how the density of events varies across a study area based on a point pattern. "Kernel estimation was originally developed to obtain a smooth estimate of a univariate or multivariate probability density from an observed sample of observations..." (Bailey and Gatrell, 1995). In the spatial case, kernel smoothing creates a smooth map of density values in which the density at each location reflects the concentration of points in the surrounding area.

In kernel estimation, a three-dimensional floating function visits every cell on a fine grid that has been overlaid on the study area. Distances are measured from the center of the grid cell to each observation that falls within a predefined bandwidth, or region of influence. Each observation contributes to the density value of that grid cell based on its distance from the center. Nearby observations are given more weight in the density calculation than those farther away.
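The grid-based calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not name the kernel function, so a quartic kernel is assumed here, and the function names `quartic_kernel_density` and `make_grid` are hypothetical. Coordinates are assumed to be planar (for example, meters).

```python
import math


def make_grid(xmin, ymin, xmax, ymax, cell_size):
    """Return the centers of square cells covering a rectangle, row by row."""
    nx = int(math.ceil((xmax - xmin) / cell_size))
    ny = int(math.ceil((ymax - ymin) / cell_size))
    return [(xmin + (i + 0.5) * cell_size, ymin + (j + 0.5) * cell_size)
            for j in range(ny) for i in range(nx)]


def quartic_kernel_density(points, cell_centers, bandwidth):
    """Return one density value per grid-cell center.

    Assumption: a quartic kernel, chosen for illustration only;
    the source text does not specify which kernel is used.
    """
    scale = 3.0 / (math.pi * bandwidth ** 2)
    densities = []
    for cx, cy in cell_centers:
        total = 0.0
        for px, py in points:
            d = math.hypot(px - cx, py - cy)
            if d < bandwidth:
                # The weight is largest at d = 0 and falls to zero at
                # d = bandwidth, so nearby observations count for more.
                total += scale * (1.0 - (d / bandwidth) ** 2) ** 2
        densities.append(total)
    return densities
```

Calling `quartic_kernel_density(points, make_grid(...), bandwidth)` yields one density value per cell, in events per unit area. The double loop is written for clarity rather than speed; a large grid would call for a spatial index.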

Kernel estimation has two advantages for displaying crime patterns. It clearly shows complex spatial point patterns in a smooth image that can also be used to create other data sets, and it can be used for quantitative comparisons over time.

First, the method helps make sense of complex point patterns. The result, a smooth raster image of densities, can be used in conjunction with the point map, so no information is lost in the analysis. From this raster image, users can quickly identify areas with high densities of incidents, or potential hotspot areas. This can be done in one of two ways: one can "eyeball" them, which is feasible but subjective, or, more appropriately, one can define hotspots based on statistical significance.

The result of the kernel estimation method is a simple, aesthetically pleasing raster image from which users can derive other data sets, specifically contours of density. These contour loops can be used on their own for display, or they can be used to define the boundaries of hotspot areas, which can then be analyzed in their own right. The hotspots will often be irregular in shape. Unless the crime distribution is uniform, it is unlikely that the contours will be circles or ellipses, as required in other crime clustering methods (see Block, 1995). Under kernel smoothing, each hotspot encompasses a uniquely defined area and is not limited to any jurisdiction or any other man-made boundary. The hotspots may cover several separate geographic entities, such as police precincts or sectors, reflecting the fact that clusters of crime often cut across political boundaries.

Another strength of kernel estimation is its usefulness in analyzing change over time. The raster images of density can be used as input to correlation analysis or time series analysis. The correlation analysis can work in one of two ways: two consecutive time periods can be compared, e.g., one month to the next, or one time period can be compared to a similar one, e.g., a month in one year to the same month in a previous year. Either way, if the pattern is stable, the user would expect high values in an area in one period to correspond to high values in the same area in the other period. Time series or change analysis could use multiple density images to monitor change over time.
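As a rough illustration of the correlation idea, the sketch below computes a Pearson correlation between two density rasters, each flattened to a list of cell values. It assumes both rasters were built on the same grid with the same bandwidth; the choice of the Pearson coefficient and the function name `density_correlation` are assumptions made for this example, not details given in the text.

```python
import math


def density_correlation(grid_a, grid_b):
    """Pearson correlation between two density rasters on the same grid.

    Each raster is a flat list of cell values in the same cell order.
    """
    if len(grid_a) != len(grid_b) or len(grid_a) < 2:
        raise ValueError("rasters must have the same number of cells (at least 2)")
    n = len(grid_a)
    mean_a = sum(grid_a) / n
    mean_b = sum(grid_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(grid_a, grid_b))
    var_a = sum((a - mean_a) ** 2 for a in grid_a)
    var_b = sum((b - mean_b) ** 2 for b in grid_b)
    if var_a == 0.0 or var_b == 0.0:
        raise ValueError("correlation is undefined for a constant raster")
    return cov / math.sqrt(var_a * var_b)
```

A value near 1 would suggest that high-density cells in one period coincide with high-density cells in the other; a value near 0 would suggest that the pattern has shifted.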

One weakness of the kernel estimator is the arbitrary nature of selecting a radius for the region of influence, or bandwidth. There is no hard-and-fast rule for determining this distance, although some "rules of thumb" have been put forth. Ideally, this distance should represent the actual distances between points in the distribution. The next section addresses issues and challenges in estimating bandwidth.

Estimating Bandwidth

Selecting an appropriate bandwidth is a critical step in kernel estimation. The bandwidth determines the amount of smoothing of the point pattern. The bandwidth defines the radius of the circle centered on each grid cell, containing the points that contribute to the density calculation. In general, a large bandwidth will result in a large amount of smoothing and low density values, producing a map that is generalized in appearance. In contrast, a small bandwidth will result in less smoothing, producing a map that depicts local variations in point densities. Using a very small bandwidth, the map approximates the original point pattern and is spiky in appearance.

Several "rules of thumb" have been suggested for estimating bandwidth. Esri (Redlands, CA), the maker of ArcView, the only GIS software to incorporate kernel estimation, uses a measure based on the areal extent of the point pattern as the default bandwidth. Specifically, the bandwidth is the minimum dimension (X or Y) of the extent of the point theme divided by 30, or min(X, Y)/30. Bailey and Gatrell (1995) suggest a bandwidth of 0.68 times the number of points raised to the -0.2 power, or 0.68 n^(-0.2), scaled to the areal extent of the study area by multiplying by the square root of the study area size.
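Both rules of thumb reduce to one-line formulas, sketched below. The function names are hypothetical, and the sketch assumes planar coordinates in meters, an area in square meters, and that the "extent of the point theme" is the bounding box of the points.

```python
import math


def arcview_default_bandwidth(points):
    """min(X, Y) / 30, where X and Y are the bounding-box dimensions.

    Assumption: the extent is taken as the bounding box of the points.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(max(xs) - min(xs), max(ys) - min(ys)) / 30.0


def bailey_gatrell_bandwidth(n_points, area):
    """0.68 * n^(-0.2), scaled by the square root of the study area."""
    return 0.68 * n_points ** -0.2 * math.sqrt(area)
```

For example, `bailey_gatrell_bandwidth(873, 168.681e6)` gives roughly 2,279.5 m, in line with the value reported for all burglaries in Table 1.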

The problem with both of these procedures for estimating bandwidth is that neither one takes into account the spatial distribution of the points. Bailey and Gatrell's estimate is based on point density, but only in a limited way. Large sample sizes will result in small bandwidths while small sample sizes will result in large bandwidths, but no consideration is given to the relative spacing of the points. Also problematic is the arbitrary nature of the coefficient and power: there is an infinite number of combinations that would yield similar results. The ArcView default is also arbitrary; dividing by 30 appears to have no statistical basis. A more practical approach to selecting a bandwidth would take into consideration the relative distribution of the points across the study area. One way to achieve this is to base the bandwidth on average distances among points.

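We propose that bandwidth be estimated as the average k^th nearest neighbor distance among points. If d_ij is the distance from point i to its j^th nearest neighbor, and n is the number of points, then the average k^th nearest neighbor distance is:

\[ \bar{d}_k = \frac{1}{kn} \sum_{i=1}^{n} \sum_{j=1}^{k} d_{ij} \]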

If k is 10, for example, the bandwidth is estimated as the average distance from each point to its 10 nearest neighbors. The value of k is chosen by the analyst to specify the desired degree of smoothing of the data. Small k values result in a small bandwidth, producing a spiky map with little smoothing. Larger k values result in a larger bandwidth and smoother density map.

The k-nearest neighbor approach is superior to other methods for selecting a bandwidth because it is based on the inter-point distances of the point pattern. Thus, the bandwidth will reflect the spacing of points rather than the size of the study area, or the number of points. The proposed approach also adds flexibility to the kernel estimation procedure by allowing the user to vary k, depending on how much smoothing is desired. Instead of experimenting with different bandwidths in an ad-hoc way, the user controls the degree of smoothing, through the choice of k.

The calculation of the k-nearest neighbor distance was performed in Avenue, ArcView's scripting language. The procedure is relatively straightforward. First, the distances between each point and every other point are calculated. Then, using a nested loop structure, the distances for each point are sorted, and the average distance of the k-nearest neighbors is calculated. Then, in another loop, the average of those distances is calculated. The result is the recommended bandwidth based on k.
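The original Avenue script is not reproduced here. The steps described above can be approximated in Python as follows; this is a sketch under the assumption of planar coordinates, and the function name `knn_bandwidth` is hypothetical.

```python
import math


def knn_bandwidth(points, k):
    """Average distance from each point to its k nearest neighbors.

    points: list of (x, y) tuples in planar coordinates.
    k: number of nearest neighbors, with 1 <= k <= len(points) - 1.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must be between 1 and n - 1")
    per_point_means = []
    for i, (xi, yi) in enumerate(points):
        # Distances from point i to every other point, sorted ascending.
        dists = sorted(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i
        )
        # Mean distance to the k nearest neighbors of point i.
        per_point_means.append(sum(dists[:k]) / k)
    # Mean of the per-point means is the suggested bandwidth.
    return sum(per_point_means) / n
```

For the three collinear points (0, 0), (1, 0) and (3, 0), `knn_bandwidth(points, 1)` returns 4/3 and `knn_bandwidth(points, 2)` returns 2.0. Incidents geocoded to the same address have a distance of zero between them and will pull the estimate down. The all-pairs calculation grows with the square of the number of points, which is manageable for data sets of the size used here.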

Results

The k-nearest neighbor approach was implemented on the burglary and robbery data at two geographical scales. First, the two crimes were analyzed for Brooklyn South for the given time period. Second, by selecting out smaller geographic areas within the two larger data sets, we created subsets of these two data sets. The script was then run on each of these data sets with varying k-values. These results are summarized in the table (see Table 1) and chart (see Figure 2) below:

Table 1: Bandwidth Values (all bandwidths in meters)

                      All           All           Burglary      Robbery
                      burglaries    robberies     subset        subset
Area (sq. km)         168.681       172.552       14.152        27.047
N                     873           650           107           277
k=1 NND*              132.397       159.136       116.686       108.103
k=5 NND               267.269       310.952       237.010       229.873
k=10 NND              384.747       440.065       377.546       324.737
k=20 NND              552.794       651.093       575.920       469.782
k=30 NND              688.943       823.305       742.972       593.030
ArcView default       431.916       437.345       123.813       129.408
Bailey and Gatrell    2279.497      2445.604      1004.710      1148.345

*NND: nearest neighbor distance

Figure 2: Bandwidth Values

The data presented above clearly indicate the differences between the ArcView default bandwidth, Bailey and Gatrell's rule of thumb, and the values computed with the k-nearest neighbor technique. The ArcView default is useful when the study area is large (see Figure 3), but when a small study area is analyzed, the default bandwidth becomes too small (see Figure 4). If such a bandwidth is used, the resulting density map will be spiky in appearance and have extremely high and low density values. Another problem with the ArcView default arises if the study area is not square (or close to square). If the study area is rectangular, with one side much longer than the other, the ArcView method will use the smaller dimension in its bandwidth estimation. The result, again, will be a small bandwidth. An example of this would be the study area of Manhattan, in New York City, for which the Y dimension is approximately six times larger than the X dimension. Bailey and Gatrell's rule of thumb produces very large bandwidths, much larger than the default and nearest neighbor values. The end result is a highly smoothed map that does not show local variations in crime, but offers a generalized regional view.

By comparison, bandwidths estimated by the nearest neighbor method are relatively stable with changes in scale (see Figures 5 & 6). Bandwidths for the subset data sets are roughly equal to those for the Brooklyn South data sets, for equivalent values of k. Thus, rather than being determined by the scale of the map, the bandwidth is based on the user defined value of k.

The effect of using different k-values is also important (see Table 1 and Figure 3 above). Logically, it would follow that a higher k will yield a larger bandwidth and smaller k values will yield smaller bandwidths. As k increases, the process searches larger areas for more nearest neighbors, adding larger distances to the calculation. This is useful in providing a logical basis for how much smoothing is desired. If the analyst wants to encompass more points in smoothing, he/she simply has to use a larger k and vice versa. The k value also can be adjusted depending on the sample size (N). If N is small, a small k should be used and if N is large, a larger k should be tried.

Conclusions

Kernel estimation has proved to be a useful tool in simplifying complex spatial point patterns. By creating a smooth map of density values in which the density at each location reflects the concentration of points in the surrounding area, analysts are able to see how crime densities vary across a study area. However, the arbitrary nature of the process of selecting a bandwidth may result in misleading or inaccurate maps. The ArcView default and Bailey and Gatrell's rule of thumb reinforce this notion of subjectivity. The procedure proposed here, the k-nearest neighbor technique, bases the bandwidth on the spatial distribution of the point pattern, which maintains flexibility while empirically adding objectivity to the analysis.

References

Bailey, T. and Gatrell, A. (1995). Interactive Spatial Data Analysis. Longman Scientific and Technical, Essex, England.

Block, C. (1995). STAC Hot-Spot Areas: A Statistical Tool for Law Enforcement Decisions. In Block, C., Dabdoub, M. and Fregly, S. (Eds.), Crime Analysis Through Computer Mapping. Police Executive Research Forum, Washington, D.C.