Starlight and Map Objects

for Data Mining Crime Information

 

Dr. Bruce Rex

Pacific Northwest National Laboratory

Richland, Washington

 

Sgt. Ron Rasmussen

Seattle Police Department, Crime Analysis Unit

Seattle, Washington

 

 ABSTRACT: Starlight is a unique visual information analysis system that offers data mining functionality across disparate data sets of multimedia information. Coupled with Map Objects’ spatial analytical, statistical and mapping functionality, a new realm of information analysis and knowledge discovery is enabled. Crime data from Seattle's Arc/Info database is used to exemplify the interaction between these two powerful technologies in a real-world analytic setting.

 

Introduction

Starlight represents an exciting new class of information system: the visual information analysis system. Starlight differs from traditional GIS as a visual information system in that it brings information from multiple sources and data types into a common visual workspace. These types can include free or structured text, tabular data, image or video, geospatial, and temporal information. Using Starlight’s unique visual-analytic tool set, an analyst can quickly uncover hidden relationships and spatial-temporal patterns in the data.

Starlight was initially conceived in response to a need by military intelligence analysts to process ever-increasing quantities of multimedia intelligence reports in real or near-real time. Typically this information comes in the form of coded text messages which can further contain remotely sensed imagery, temporal and geospatial information, as well as structured data such as number tables. The primary design criterion was to integrate this disparate information into a common workspace. Additionally, the workspace needed to assist analysts in the rapid reduction of the information sets in support of hypothesis testing as well as "Gestalt" discovery.

Although developed primarily for the intelligence application, the design team has made every effort to keep the fundamental design as generic as possible to permit the use of the system in as many other application areas as possible. As a result, Starlight has emerged as an exemplar of a next-generation class of information system: visual information analysis systems [1] or "VIAS". Visual information systems are not simply new-look presentation graphics, but additionally provide tools and interfaces that allow the analyst to interact directly with high-dimensional information, and they are a superset of GIS.

 

The Starlight General Concept

The Starlight 3D workspace is optionally comprised of a text clustering visualization, a structured alphanumeric data visualization, a visualization that displays many dimensions of the data simultaneously, maps and images (Fig. 1). There are several additional tools that permit filtering and subsetting of the data to assist in very fine-grained analysis. Data is ingested as XML, and then subjected to several preprocessing steps, depending on the type of information being processed. Once processed, the information is stored internally in an object-oriented data store.

The user interface permits the analyst to "fly" through the workspace and interact with the respective visualizations. Once the appropriate level of detail threshold is passed, the user can perform picking operations on the 3D elements to reconfigure the workspace features such as moving a map or selecting documents to open and read. Navigation is very simple and new users move efficiently through the workspace with only an hour or two of practice. The text spheroid, for example, can be rotated to make the third dimension more obvious.

 

Figure 1.

A Starlight Workspace.

 

Implementation

Starlight is written in Visual C++ under Windows NT. Starlight uses an object-oriented database as its internal data store. This requires an analyst to properly organize the information prior to ingest. Within the data store, information is hierarchically ordered according to associations and relationships with other data objects in the collection. The data store acts, then, as a "relationship" database as opposed to a purely relational database. Four types of queries can be made against the database: content, concept, association and spatial queries. Spatial queries are made against the .DBF file portion of the shape files stored in the directory structure.

The content query is simply a pattern search within the database. It is purely Boolean: the requested element exists, or it doesn’t. If it does exist in the database, the associated element in the workspace will assume a blinking state. If the blinking element is clicked on, the document text will pop up for reading. The concept query accepts a series of words and returns a subset of documents that may be related. This type of query will typically return documents pertaining to cars if "automobile" is one of the query keywords. This happens because the text engine will look for words related by context. The word usage here is as important as the actual string of letters. The association query is made against the hierarchical schema of the object base itself. It will return a "link array" denoting the different levels of the hierarchical organization, and the data elements belonging to each level. This is Starlight’s most powerful visualization tool (Fig. 2) and is capable of showing complex spatio-temporal relationships when used with Starlight’s temporal filtering tool. The fourth type of query, the spatial query, is discussed later in the paper.
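The Boolean character of the content query can be illustrated with a minimal sketch. This is not Starlight code: the `Document` record and the `content_query` function are hypothetical names, and a case-insensitive substring test stands in for whatever pattern matching the real system performs.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical stand-in for a text record held in the data store."""
    doc_id: str
    text: str


def content_query(documents, pattern):
    """Return the ids of documents whose text contains the pattern.

    The test is purely Boolean: a document either contains the pattern
    or it does not. Matching is case-insensitive (an assumption).
    """
    needle = pattern.lower()
    return [d.doc_id for d in documents if needle in d.text.lower()]


reports = [
    Document("r1", "Stolen automobile recovered downtown"),
    Document("r2", "Robbery reported near the park"),
]
matching_ids = content_query(reports, "robbery")
```

In a workspace, the returned identifiers would correspond to the elements that assume the blinking state.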

Figure 2.

A Starlight "Link Array".

 

Relationships amongst the data are represented by white rays connecting the associated elements in the visualization. For example, an association query made on the textual data may result in a subset of documents related as in Figure 3. Multiple datasets may be open simultaneously in the workspace, and visual connections can be made between them in the same manner that they can be made between objects within a single dataset. Starlight features term extraction as a preprocessing step, which allows connections to be made between place names and actual geocoordinates. This is done via Starlight’s proprietary gazetteer. Geocoding can optionally also be done using the MapObjects® geocoding routines. Until recently, all maps in Starlight were simply bitmap images with hotspots mapped onto the image so that the rays point to the right spot on the map. Starlight has recently replaced the original mapping system with a MapObjects® implementation, which is an effective GIS extension that permits more complex spatial query subsetting.

  

Figure 3.

Inter-relationships between clustered documents.

 

GIS Extensions to Starlight

A GIS extension to Starlight enhances the system’s applicability to problem domains such as epidemiology and crime analysis, where spatio-temporal relationships are central to analysis and where non-spatial information of textual origin must also be incorporated. The complementarity of these technologies also enables issues such as data validity and qualitative spatial information [3] to be addressed visually [2] in a common workspace alongside the more traditional geographic displays familiar to GIS users. Quantitative information can optionally be displayed using one of the Starlight visual tools alongside a GIS display, as opposed to using thematic mapping only, allowing a higher overall representational dimensionality of the information subset. The ability to include large text databases with traditional spatial analysis opens up an entirely new area in regional analysis beyond GIS. For example, in law enforcement, textual patrol reports tied to GPS fixes permit qualitative textual information to be tied precisely to geospatial data within the workspace.

As mentioned above, the mapping system used in Starlight was originally a proprietary set of codes developed before products like Esri’s MapObjects® were released to the market. As a result, spatial-analytic functionality was limited to a simple contained-area search, which was based on a simple rectangular area defined by a user mouse drag. While this approach permitted data subsetting on a spatial constraint, most areas of interest are not inherently rectangular. The 2.1 release of Starlight featured the inclusion of MapObjects® as the basis for mapping and spatial-analytic functionality.
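The original contained-area search can be sketched as a bounding-box test. The following is only an illustration under assumed names (`Point`, `rectangle_search`) and planar coordinates; it is not the proprietary Starlight code.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float


def rectangle_search(points, corner_a, corner_b):
    """Return the keys of points inside the rectangle spanned by two corners.

    The corners are the start and end of a mouse drag, so they may arrive
    in any order; sorting normalizes them to (min, max) on each axis.
    """
    min_x, max_x = sorted((corner_a.x, corner_b.x))
    min_y, max_y = sorted((corner_a.y, corner_b.y))
    return [
        key
        for key, p in points.items()
        if min_x <= p.x <= max_x and min_y <= p.y <= max_y
    ]


incidents = {"a": Point(1, 1), "b": Point(5, 5), "c": Point(2, 3)}
# Drag from the upper-right corner to the lower-left corner.
selected = rectangle_search(incidents, Point(3, 4), Point(0, 0))
```

The test is cheap, but it can only express axis-aligned rectangular regions, which is the limitation that motivated the move to a full GIS component.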

Coordinate information is tied to place names via the Starlight gazetteer. The gazetteer has a naturally hierarchical organization so that places can be modeled in the system as they occur in real life. For example, a parcel is contained by a county, which is in turn contained in a state, and so on. This is an inherently object-oriented relationship and is very amenable to storage within the object-based gazetteer. In this manner, places can be referenced either by coordinate or by name. Additionally, when the Starlight database is constructed, a name extraction utility automatically registers entries in the gazetteer for the names extracted from textual information. For example, if San Diego is mentioned in a document, the document identifier is connected to San Diego’s representation in the gazetteer. Thanks to the gazetteer, casual textual references to place names can be accurately connected with both cartographic representations and structured or tabular data related to that place name. It also removes much of the tedium that would be required to manually make such associations using only a traditional GIS.
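A hierarchical gazetteer of this kind can be sketched in a few lines. The classes and methods below (`Place`, `Gazetteer`, `register_mentions`) are hypothetical and are not the Starlight gazetteer; the coordinates are rough illustrative values, and naive substring matching stands in for the real name extraction utility.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Place:
    name: str
    lon: float
    lat: float
    parent: Optional["Place"] = None               # containing place, if any
    documents: set = field(default_factory=set)    # ids of documents naming it


class Gazetteer:
    def __init__(self):
        self._by_name = {}

    def add(self, name, lon, lat, parent=None):
        """Register a place; `parent` is the name of an already added place."""
        parent_place = self._by_name.get(parent.lower()) if parent else None
        place = Place(name, lon, lat, parent_place)
        self._by_name[name.lower()] = place
        return place

    def lookup(self, name):
        return self._by_name.get(name.lower())

    def containment_chain(self, name):
        """Walk upward: place, then its container, and so on."""
        chain = []
        place = self.lookup(name)
        while place is not None:
            chain.append(place.name)
            place = place.parent
        return chain

    def register_mentions(self, doc_id, text):
        """Link a document to every known place name found in its text.

        Assumption: plain substring matching, which would also match a
        short name embedded in a longer one.
        """
        lowered = text.lower()
        hits = []
        for key, place in self._by_name.items():
            if key in lowered:
                place.documents.add(doc_id)
                hits.append(place.name)
        return hits


gaz = Gazetteer()
gaz.add("California", -119.4, 36.8)                         # illustrative values
gaz.add("San Diego County", -116.8, 33.0, parent="California")
gaz.add("San Diego", -117.16, 32.72, parent="San Diego County")
gaz.register_mentions("doc-1", "Suspect was last seen in San Diego.")
```

After the last call, `gaz.lookup("San Diego")` gives both the coordinate and the set of documents that mention the place, so a place can be reached by name, by coordinate, or through its containing regions.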

Interactions between the Starlight and MapObjects® are two-way. Queries can be generated from either side. The analyst may select an arbitrary region from a displayed map by any of the standard selection tools provided by MapObjects®, then pass information retrieved via query on the area of interest to the Starlight database for further visual analysis. Conversely, a query made on the Starlight database might return some information which has a connected geospatial component that would appear on the map or maps rendered by MapObjects® with points or areas highlighted or marked by icons or other symbology.

Among the most exciting spatial queries made possible by extending Starlight’s mapping system with GIS functionality are the various forms of the proximity search. Of these, one of the most useful is the buffered containment query, which gathers information related to the area surrounding a given spatial entity such as a river or road. Combined with textual contexts such as "near Green River", varying the buffer distance from Green River permits quantitative spatial queries to be associated with fuzzy textual descriptions by the computer, instead of by an expert human analyst. Although not currently being researched, it should be possible to extract metes-and-bounds information from textual descriptions and tie that data to actual places via the place name extractor, the gazetteer and the MapObjects® geocoding routines.
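The geometry behind a buffered containment query can be sketched as follows. This is a plain-Python illustration, not the MapObjects® API: the function names are hypothetical, the linear feature is assumed to be a polyline with at least two vertices, and coordinates are assumed to be planar (projected) so that Euclidean distance is meaningful.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment from a to b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line and clamp to the segment's extent.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def within_buffer(points, polyline, buffer_distance):
    """Return the keys of points lying within buffer_distance of the polyline."""
    segments = list(zip(polyline, polyline[1:]))
    selected = []
    for key, p in points.items():
        nearest = min(point_segment_distance(p, a, b) for a, b in segments)
        if nearest <= buffer_distance:
            selected.append(key)
    return selected


road = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]           # hypothetical feature
events = {"near": (5.0, 3.0), "far": (5.0, 8.0), "corner": (12.0, 11.0)}
hits = within_buffer(events, road, 4.0)
```

Rerunning `within_buffer` with several buffer distances is one way a phrase such as "near" could be explored quantitatively, since each distance yields a different candidate subset.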

The Seattle Police Department is currently evaluating two Starlight installations in the Crime Analysis Unit. The primary focus is on serial crime, and Starlight is ideally suited for this type of analysis. For example, crime data can be quickly subsetted by a spatial query requesting the set of all robberies within 1,000 feet of S. Rainier Ave. (the supporting images are omitted here for copyright reasons). This subset can then be returned to the Starlight workspace and further examined by tying in qualitative information such as that contained in the detectives’ reports. Combined with Starlight’s spatio-temporal analysis capabilities, an MO can be quickly associated with the times and locations of related crimes.

  

Conclusion

The unification of these technologies offers a synergy that extends information analysis into an entirely new realm. Never before has the simultaneous analysis of multisource, multimedia, disparate spatio-temporal information in a single workspace been possible. This new toolset will shape the way analysts formulate the problem space, providing an entirely new methodology for high dimensional information analysis. The application of this methodology to the analysis of crime and criminal intelligence information is among the most promising. No other technology allows for the simultaneous examination of the relationships between the people, places, and things that are identified and tracked in a multitude of disparate databases and all too frequently in paper form.

The City of Seattle has found that the integration of these data sets for analysis and the extraction of the information from the text has long been a problem in this arena. The solutions offered to date have tended to be very expensive and focused solely on either text extraction and document management or on database integration and data warehousing. Few agencies can support major data migration and warehousing projects. This toolset provides Seattle Police with a means to conduct state of the art data analysis relatively quickly and inexpensively. Existing personnel have been trained to use the tools and the capacity of the system will grow as the agency improves the quality and character of the data that it collects.

Commercially available crime and criminal intelligence software is limited to two-dimensional reporting and structured data and has hamstrung our efforts at thorough and meaningful analysis. Being able to view the information in three dimensions and color, and further being able to manipulate the high-dimensional data space allows for the discovery of relationships that were previously unidentifiable. Whether it is used for crime analysis, case support analysis, strategic planning, or intelligence analysis, Starlight provides law enforcement with a means to be more effective, more efficient and better informed.


References

[1] Card, S.K., Mackinlay, J.D., and Shneiderman, B.; "Readings in Information Visualization: Using Vision to Think", Overview, pp. 1-34, Morgan Kaufmann Publishers, 1999.

[2] Goodchild, M., Buttenfield, B., and Wood, J.; "Visualization in Geographical Information Systems", pp. 141-149, Hearnshaw and Unwin, eds., J. Wiley & Sons, 1994.

[3] Paladino, O.; "Treatment of Qualitative Geographic Information in Monitoring Environmental Pollution", from Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, pp. 418-431, Springer Verlag Lecture Notes in Computer Science Series #639, 1992.