Interface agents are software tools intended to reduce the gap between the user's knowledge and the technical knowledge required to operate a software system. For the purposes of this work, we assume that the user is familiar neither with ArcInfo nor with the content and structure of the ArcInfo database, and that he/she is not necessarily working on the same machine on which ArcInfo is available.
Our goal is to build an intelligent interface agent that helps ArcInfo users retrieve the information they want by building, or helping them to build, complex queries during the interaction.
Shapiro, Chalupsky & Chou describe, from both the design and the implementation standpoints, an interface between ArcInfo and SNePS, the Semantic Network Processing System, a knowledge representation, reasoning, and acting system developed by Stuart C. Shapiro et al. at the State University of New York at Buffalo [4]. As the authors concluded, "many interesting questions need to be answered before a full fledged Natural Language front-end to ArcInfo can be developed".
Specifically, we address two issues. The first is the possibility of using ArcInfo documentation files as a source of information about geographic attributes, in order to partially automate the construction of a domain knowledge base and domain lexicon. These documentation files are supposed to comply with the new Federal Data Transfer Standard [2] and ideally should accompany every spatial data set used in ArcInfo.
The second issue is the use of an intelligent agent to perform spatial queries in a more natural way, using multimedia language (a combination of Natural Language and pointing). Currently, queries to INFO, the ArcInfo database module, require knowledge of attribute values (for example, "select landuse-code 3" where 3 might stand for "agriculture", or "select vegetation 'forest'"). More complex queries are harder to formulate and may require additional operations, such as statistical procedures, before they can be stated correctly (for example, to retrieve information about the highest points in an elevation data set, the user first has to run statistics on this data set to obtain the exact value). Although menu-based interfaces, such as ArcTools, simplify the user's task, they usually constrain the possibilities of interaction to those defined by design, while still requiring familiarity with concepts and database structure.
The main technical problems faced in this project were:
A small demonstration that shows the main features of the intelligent agent is currently available.
In order to do that, the agent must be able to:
For a plan to be performed, its preconditions must be satisfied. Preconditions refer to beliefs of the intelligent agent. We identify at least two different types of beliefs in our agent:
Some examples:
ArcInfo data types are coverage, grid, table, image, and TIN.
SNePS representation:
(assert superclass ARC-datatype subclass {coverage, grid, table, image, tin})
Coverages can represent points, lines and polygons.
SNePS representation:
(assert superclass coverage subclass {polygon-coverage line-coverage point-coverage})
If a coverage is a polygon coverage, its system attributes are: area, perimeter, and internal ID (#).
SNePS representation:
(assert forall ($x) ant (member *x class polygon-coverage) cq (object *x system-attribute (AREA PERIMETER "#")))
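The three examples above can be mirrored in a few lines of ordinary code. The following is only a minimal sketch, assuming a dictionary-based stand-in for the semantic network; it is not SNePS and does not reproduce its inference engine. The names `SUBCLASSES`, `SYSTEM_ATTRIBUTES`, `MEMBERSHIP`, `is_subclass` and `system_attributes` are hypothetical, and the membership of VEGETATION in polygon-coverage is an illustrative belief, not one taken from the knowledge base.

```python
# Hypothetical, simplified stand-in for the SNePS assertions above.
# superclass -> set of direct subclasses
SUBCLASSES = {
    "ARC-datatype": {"coverage", "grid", "table", "image", "tin"},
    "coverage": {"polygon-coverage", "line-coverage", "point-coverage"},
}

# Rule: every member of polygon-coverage has these system attributes.
SYSTEM_ATTRIBUTES = {"polygon-coverage": ("AREA", "PERIMETER", "#")}

# Illustrative membership belief (assumed for this sketch).
MEMBERSHIP = {"VEGETATION": "polygon-coverage"}


def is_subclass(sub, sup):
    """True if sub equals sup or is reachable from sup via subclass links."""
    if sub == sup:
        return True
    return any(is_subclass(sub, child) for child in SUBCLASSES.get(sup, ()))


def system_attributes(obj):
    """Apply the polygon-coverage rule to a named object, if it matches."""
    cls = MEMBERSHIP.get(obj)
    return SYSTEM_ATTRIBUTES.get(cls, ())


if __name__ == "__main__":
    print(is_subclass("polygon-coverage", "ARC-datatype"))
    print(system_attributes("VEGETATION"))
```

The rule lookup plays the role of the antecedent/consequent pair: an object satisfies the antecedent when it is a member of polygon-coverage, and the consequent supplies its system attributes.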
A typical ArcInfo workspace is a directory that contains data sets (each forms a subdirectory) and a common ``info'' subdirectory. The ``info'' subdirectory stores all INFO files associated with all data sets in this workspace, as well as user-defined INFO files. In our prototype we deal only with the most common type of ArcInfo data set, a coverage. Each coverage represents geographic features of a particular kind, for example, roads, land parcels, or vegetation classes, for a particular geographic area. Geometric objects in a coverage (points, lines, and polygons) represent the location of geographic features, while attributes of objects in a database associated with the coverage represent properties of features. One of the problems inherent in the management of data sets, such as coverages, is associating objects in coverages and their attributes with features in the real world. Most often, this association is maintained by the owner of the information, who knows which real world features each data set stands for. The owner of the information knows, for example, that a coverage named STREAMS will contain streams, which are represented by lines (arcs). Attributes of arcs stand for certain properties of streams, for example, LENGTH represents the length of a stream, DISCHARGE means annual discharge, and so on. The owner of the information also knows about other INFO and text files which are related to data sets.
Since this knowledge largely resides with the user, it is very difficult to formalize. In this project we combine several sources to obtain the most complete possible knowledge about an ArcInfo workspace. These sources are:
Until recently, ArcInfo did not provide an easy way to keep track of information about coverages in an automated manner. Executive Order 12906 of April 1994, "Coordinating Geographic Data Acquisition and Access: The National Spatial Data Infrastructure" [2], required all federal agencies to document their spatial data sets according to the guidelines established by the Federal Geographic Data Committee. To meet the meta-data (data about data) needs of federal agencies, Environmental Systems Research Institute (the producer of ArcInfo software) distributed DOCUMENT, a meta-data management tool for ArcInfo users.
DOCUMENT is used to create and update documentation files associated with ArcInfo data sets. Among other things, documentation files contain information which is directly relevant to this project: names of data sets, their topics and brief descriptions, and descriptions of feature classes and their attributes. Taking into account the popularity of ArcInfo and emerging meta-data standards for spatial data sets, it seems beneficial to be able to extract essential information about a particular ArcInfo workspace from documentation files produced by DOCUMENT. This would allow us to automate the construction of the semantic network knowledge base in which we represent knowledge about the workspace at the SNePS level, which would otherwise have to be constructed manually.
Documentation files consist of four INFO (ArcInfo database module) files for every data set. Three of them are considered relevant to this project:
Every feature class in an ArcInfo coverage (point, arc, polygon, etc.) has an INFO file associated with it. The name of the file is composed of the name of the coverage and a standard extension: .AAT (arc attribute table), .PAT (point/polygon attribute table), etc. These files are examined to obtain information about attributes of the corresponding features in the coverage.
Often a workspace contains INFO files related to features in coverages. A relate is a link maintained in the ArcInfo relational database between INFO files, most often between feature attribute tables and some data files. These data files, produced by users, can contain additional information which can be linked to features in coverages when needed. For example, the column (attribute) VEGCODE in the polygon attribute table of coverage VEGETATION (VEGETATION.PAT) can represent vegetation classes with integers 1 through 5 (to save space and increase performance). A related INFO file VEGCLASS.DAT will have the same attribute (to establish a relate) and other attributes. It may contain an attribute VEGCLASS with definitions of vegetation classes: ``forest'' for ``1'', ``woodland'' for ``2'', and so on.
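The VEGCODE example amounts to a key-based join between two tables. The sketch below illustrates that idea only, assuming in-memory lists of dictionaries rather than real INFO files; the row values, the `VEGETATION#` identifiers and the helper names `relate` and `select_by_class` are all hypothetical, and only the two class definitions given in the text (1 = forest, 2 = woodland) are included.

```python
# Made-up rows standing in for VEGETATION.PAT (polygon attribute table).
vegetation_pat = [
    {"VEGETATION#": 1, "AREA": 1200.0, "VEGCODE": 1},
    {"VEGETATION#": 2, "AREA": 850.5, "VEGCODE": 2},
    {"VEGETATION#": 3, "AREA": 430.0, "VEGCODE": 1},
]

# Made-up rows standing in for VEGCLASS.DAT; only the two classes
# named in the text are defined here.
vegclass_dat = [
    {"VEGCODE": 1, "VEGCLASS": "forest"},
    {"VEGCODE": 2, "VEGCLASS": "woodland"},
]


def relate(features, lookup, key):
    """Attach the matching lookup row to each feature row via a shared key."""
    index = {row[key]: row for row in lookup}
    return [{**feat, **index.get(feat[key], {})} for feat in features]


def select_by_class(features, lookup, vegclass):
    """Return internal IDs of polygons whose related VEGCLASS matches."""
    joined = relate(features, lookup, "VEGCODE")
    return [row["VEGETATION#"] for row in joined
            if row.get("VEGCLASS") == vegclass]


if __name__ == "__main__":
    print(select_by_class(vegetation_pat, vegclass_dat, "forest"))
```

With a relate in place, a user can ask for "forest" without knowing that the attribute table stores it as the integer 1.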
Client procedures, written in C, were attached to a set of primitive actions in SNePS, in such a way that the agent is able to perform the act of connecting and communicating with ArcInfo either by queries or simple requests. In the first case, a response is expected from ArcInfo. In the second case, the execution can continue immediately. At C-level, message passing features are included to acknowledge errors or receive something from the ArcInfo server. An ArcInfo command can be evaluated or a request for a result processed.
The following modules compose the C-level of the client (arcclient):
During this phase
At this point our interface agent has some beliefs (knowledge) about the general organization of ArcInfo and about the content of the workspace in question. The usefulness of this interface agent depends on its ability to process various kinds of requests from users to ArcInfo. If we assume the request-response model of interaction between a user and ArcInfo, which is reasonable for novice users, then the task of the interface agent is to
Depending on the type of query and response (display, listing of attributes, text, etc.), the last two steps may be omitted. The critical task (at least, at this stage) is to be able to convert the user's input into sequences of ArcInfo commands. In order to do this, the agent must have plans (sequences of actions which are taken if their preconditions are satisfied) for every type of user request. The next section covers some of the plans present in our interface agent.
underlying the design of decision-support systems are questions such as "What can people do with computers?", and the even more fundamental question, "What do people do?". Answering the question "What do GIS users do?" for various sets of GIS tasks would be a prerequisite to the design of user interfaces [3, p.1429].
In this paper we cannot attempt to cover all "various sets of GIS tasks"; rather, we limit our task to developing plans suitable for some typical queries that users perform in ArcInfo. What would be a typical query, and what types of queries exist? This question deserves separate research, which is beyond the scope of this paper.
To get an idea of the questions that users might try to answer with ArcInfo, a questionnaire was distributed (see Appendix). Based on the few responses received and on personal experience, sample questions were formulated (question is used interchangeably with request or query here). These questions clearly reflect an approach based on personal experience with GIS and ArcInfo, as well as, possibly, some biases in the use of GIS, database construction, etc. The choice of questions was also limited by practical considerations: only questions that can be answered using our sample database were considered. Given all these limitations, we still hope that these questions are adequate as a starting point.
So far only a few plans have been developed. We started with basic questions of the type "Where is...", "Show...", which refer directly to values of attributes of coverage features and can generally be answered by showing the requested features graphically, in relation to other features. Here are two questions of this kind used in our demonstration.
Show all forests in the study area.
Where is the Moneron creek?
Assuming that these features are identified in coverage attribute tables or related files, queries of this kind can essentially be answered by retrieving information from INFO. Other questions require some spatial processing, such as overlay or buffering.
What is the total area of woodlands?
What portion of the island is covered with woodlands?
What percent of area within 100 ft of streams is forested?
For example, the question "Where is the Moneron creek?" could be answered with the following set of commands in Arcplot:
reselect {path to database} {streams} {arc} {name} = '{Moneron}'
linecolor red
arcs {path to database} {Streams}
where elements in { } are provided (through inference) by the interface agent.
Here is the actual plan that the agent follows to carry out this task:
If a value is related to an attribute and the value is related to a feature and the value is related to a coverage then a plan to locate the value of a type is to say "reselect " and then say the coverage and then say "space" and then say the feature and then say "space" and then say the attribute and then say " = '" and then say the value and then issue "quote" and then issue "linecolor red" and then say "arcs %.db%/" and then send the coverage.
where %.db% is an AML variable that contains the path to the database.
Representations of plans such as the one above are stored in a semantic network knowledge base (the system's knowledge base), and used to translate Natural Language queries, plan definitions, and requests into sequences of ArcInfo commands.
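The "locate" plan can be read as a small program that assembles command strings piece by piece. The sketch below is one possible reading, not the agent's actual implementation: it assumes that "say" appends text to the command being built, and that "issue" and "send" append text and then emit the finished command. The class `CommandBuffer` and the function `locate_plan` are hypothetical names introduced here, and nothing is actually sent to ArcInfo.

```python
class CommandBuffer:
    """Collects text fragments and emits complete command lines."""

    def __init__(self):
        self._parts = []      # fragments of the command being built
        self.commands = []    # finished commands, in order

    def say(self, text):
        # Assumed meaning of "say": extend the current command.
        self._parts.append(text)

    def issue(self, text):
        # Assumed meaning of "issue"/"send": extend, then emit the command.
        self._parts.append(text)
        self.commands.append("".join(self._parts))
        self._parts = []


def locate_plan(coverage, feature, attribute, value):
    """Follow the steps of the plan above for one set of bindings."""
    buf = CommandBuffer()
    buf.say("reselect ")
    buf.say(coverage)
    buf.say(" ")
    buf.say(feature)
    buf.say(" ")
    buf.say(attribute)
    buf.say(" = '")
    buf.say(value)
    buf.issue("'")
    buf.issue("linecolor red")
    buf.say("arcs %.db%/")
    buf.issue(coverage)
    return buf.commands


if __name__ == "__main__":
    for line in locate_plan("streams", "arc", "name", "Moneron"):
        print(line)
```

In the agent, the four arguments would be supplied by inference over the knowledge base; here they are passed in directly so that the step order of the plan stays visible.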
These data sources are used by a parser in the analysis and generation of English sentences. We modified the current SNePS parser so that it can handle possible mismatches between the user's input and the representation of the ArcInfo workspace in the agent's knowledge base.
These mismatches will be quite common for users who are not familiar with the content and/or structure of the ArcInfo database (i.e. the type of user this interface is developed for). To deal with this problem, our intelligent agent uses forward and reduction inference to infer, from its representation of the ArcInfo workspace, possible matches between the user's terms and ArcInfo terms, and asks the user to select one of the alternatives (if any exist).
The selected alternative is used for further processing of the query, and the new concept is added to the lexicon. The meaning negotiation process and the components involved in the processing of natural language queries are shown in Figure 4.
Figure 4. Structure of the meaning negotiation process
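The control flow of meaning negotiation can be outlined in a few lines. This sketch deliberately replaces the forward and reduction inference described above with a plain synonym lookup, so it shows only the shape of the loop: find candidate matches, let the user choose, and record the choice in the lexicon. The function names, the `workspace_terms` table and its synonym sets are hypothetical, and the `choose` callback stands in for the dialogue with the user.

```python
def candidates(term, workspace_terms):
    """Return workspace terms whose (assumed) synonym set contains term."""
    t = term.lower()
    return sorted(name for name, synonyms in workspace_terms.items()
                  if t in synonyms)


def negotiate(term, lexicon, workspace_terms, choose):
    """Map a user term to a workspace term, asking the user if needed."""
    if term in lexicon:
        return lexicon[term]          # already negotiated earlier
    options = candidates(term, workspace_terms)
    if not options:
        return None                   # no alternatives exist
    selected = choose(options)        # the user picks one alternative
    lexicon[term] = selected          # the new concept joins the lexicon
    return selected


if __name__ == "__main__":
    # Illustrative synonym sets, invented for this sketch.
    workspace_terms = {
        "forest": {"forest", "forests", "woods"},
        "woodland": {"woodland", "woodlands", "woods"},
    }
    lexicon = {}
    pick_first = lambda options: options[0]
    print(negotiate("woods", lexicon, workspace_terms, pick_first))
    print(lexicon)
```

Because the selection is stored, a repeated use of the same term is resolved from the lexicon without asking again.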
Figure 5 shows a sample run of the prototype. The top-right screen shows the activity of the ARC server, the bottom-right screen shows the SNePS agent, and the left screen shows the answer to the request "find all woodlands", obtained by accessing the ArcInfo workspace.
Figure 5. Sample run of the interface agent - screen dump.
Different types of users exist, with different knowledge and skills. The system we are developing is mostly aimed at a particular type of user, one without sufficient knowledge of ArcInfo and, possibly, without much GIS experience. Natural Language interfaces seem to be helpful for this type of user as long as they offer the possibility of expressing requests in the same way the user might express them to a human consultant.
Interface agents such as the one proposed in this paper could be used for research in spatial cognition: how people think about spatial problems, and how useful linguistic devices can be for expressing their needs in processing spatial information. These research questions could be investigated with the help of tools like the one we are developing. Instead of ruling out a particular way of interaction, it might be interesting to test in which cases it is useful.
Many more plans are needed, and much more research on tasks carried out with GIS is required to define "typical questions" and provide automated answers to them. We would appreciate your help in filling out the form in the Appendix.
Because the prototype was research oriented, efficiency and speed were not treated as priorities in its development. As a result, our current prototype has a response time that might not be satisfactory for real-time interaction. Response time can be improved if the system's orientation changes from research to production.
In the future, we will consider connecting the interface agent and ArcTools, to provide the user with a unified interface in which Natural Language is one possible way of reaching the user's goals.
Given these data, what questions would you ask about the study area?
TYPE and STRUCTURE of question are more important than any specifics; you can ask any questions and add more datasets if you need.
Examples of questions:
We would appreciate your questions, as well as any remarks/comments/etc.
Daniel Campos D.
Aleksey Y. Naumov
State University of New York at Buffalo
105 Wilkeson Quadrangle
Buffalo, New York 14261-0023
E-mail: naumov@geog.buffalo.edu
Stuart C. Shapiro
Professor
Department of Computer Science
State University of New York at Buffalo
226 Bell Hall, Buffalo, NY 14260-2000
E-mail: shapiro@cs.buffalo.edu
This research represents part of Research Initiative #10, "Spatio-Temporal Reasoning in GIS", of the National Center for Geographic Information and Analysis, supported by a grant from the National Science Foundation (SBR-88-10917); support by NSF is gratefully acknowledged.