Daniel Campos D., Aleksey Y. Naumov, and Stuart C. Shapiro

Building an Interface Agent for ArcInfo

Abstract

Introduction

The Interface Agent

How the Interface Agent Works

Testing the Interface Agent

Discussion and Conclusions

Abstract

This paper describes a knowledge-based interface agent whose mission is to help users who have no knowledge of ArcInfo to access and process spatial data stored in ArcInfo databases. Using a client-server scheme and operating over a LAN or WAN, our interface agent receives and processes requests written in plain English, interacting with the user when there are possible mismatches between the user's concepts and the representation of data in the ArcInfo database. The agent builds sequences of commands intended to provide the information requested by the user, sends them to an ArcInfo server, and receives and presents the results to the user. A prototype of the interface agent was built using SNePS (Semantic Network Processing System) and Common Lisp on a Sun SPARCstation. The project faces some of the challenges raised in the work of Shapiro, Chalupsky & Chou [5] and proposes some possible solutions to them.

Introduction

The development of GIS technology has made it available to a growing number of people from different disciplines and with different backgrounds. However, the degree of productivity they can achieve is limited by their lack of technical knowledge about GIS tools.

Interface agents are software tools designed to reduce the gap between the user's knowledge and the technical knowledge required to operate a software system. For the purposes of this work, we assume that the user is familiar neither with ArcInfo nor with the content and structure of the ArcInfo database, and that he/she is not necessarily working on the same machine on which ArcInfo is available.

The problem we are interested in is to build an intelligent interface agent able to help ArcInfo users to retrieve the information they want, building or helping them to build complex queries during the interaction.

Shapiro, Chalupsky & Chou describe, from both the design and the implementation standpoints, an interface between ArcInfo and SNePS (the Semantic Network Processing System), a knowledge representation, reasoning and acting system developed by Stuart C. Shapiro et al. at the State University of New York at Buffalo [4]. As the authors concluded, "many interesting questions need to be answered before a full fledged Natural Language front-end to ArcInfo can be developed".

Specifically, we address two issues. One is the possibility of using ArcInfo documentation files as a source of information about geographic attributes, to partially automate the construction of a domain knowledge base and domain lexicon. These documentation files are supposed to be compliant with the new Federal Data Transfer Standard [2] and ideally have to accompany every spatial data set used in ArcInfo.

The second issue has to do with the use of an intelligent agent to perform spatial queries in a more natural way, using multimedia language (a combination of Natural Language and pointing). Currently, queries to INFO, the ArcInfo database module, require knowledge of attribute values (for example, "select landuse-code 3" where 3 might stand for "agriculture", or "select vegetation 'forest'"). More complex queries are more difficult and may require additional operations, such as statistical procedures, before they can be correctly formulated (for example, to retrieve information about the highest points in an elevation data set, the user first has to run statistics on this data set to obtain the exact value). Although menu-based interfaces, such as ArcTools, simplify the user's task, they usually constrain the possibilities of interaction to those defined by design, while still requiring familiarity with concepts and database structure.

The main technical problems faced in this project were:

design and implementation of a mechanism for linking ArcInfo and SNePS over the Internet in such a way that the system can operate independently of the actual locations of ArcInfo and the interface agent,
design and implementation of a mechanism to automate the process of building a knowledge base and a domain dictionary using the information stored in ArcInfo meta-data files,
definition of a grammar for a dialect of English that enables the system to understand Natural Language queries and enables a user to teach it plans that can be used to provide answers to those queries, and
design and implementation of a mechanism for negotiation of meanings of unknown words.

A small demonstration that shows the main features of the intelligent agent is currently available.

The Interface Agent

Our goal is to provide inexperienced users of ArcInfo with an interface that helps them obtain information from ArcInfo databases. We assume that users are familiar neither with ArcInfo nor with the content and structure of ArcInfo workspaces. Consequently, our interface agent must be able to plan sequences of ArcInfo commands aimed at providing the information requested by the user, interacting with him/her in the case of a possible mismatch between the user's concepts and the data representation in the ArcInfo workspace.

In order to do that, the agent must be able to:

build sequences of ArcInfo commands that provide answers to typical queries to an ArcInfo database,
infer, where possible, the best matches between the user's concepts and ArcInfo data, and
generate text and graphical output to the user.

For a plan to be performed, its preconditions must be satisfied. Preconditions refer to beliefs of the intelligent agent. We identify at least two different types of beliefs in our agent:

beliefs about general characteristics of the ArcInfo system,
beliefs about the ArcInfo workspace used during a particular interactive session.

Beliefs about general characteristics of the ArcInfo system

The interface agent has some general beliefs about the ArcInfo system. We collected some generic information about ArcInfo and represented it as a propositional semantic network using SNePS.

Some examples:

ArcInfo data types are coverage, grid, table, image, and TIN.
SNePS representation:

(assert superclass ARC-datatype
        subclass (coverage grid table image tin))

Coverages can represent points, lines and polygons.
SNePS representation:

(assert superclass coverage
        subclass (polygon-coverage line-coverage point-coverage))

If a coverage is a polygon coverage, its system attributes are area, perimeter, and internal ID (#).
SNePS representation:

(assert forall ($x)
        ant (build member *x class polygon-coverage)
        cq (build object *x system-attribute (AREA PERIMETER "#")))
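To make the effect of the last rule concrete, the following Python sketch mimics these beliefs outside SNePS. It is only an illustration: the dictionaries and the function names `is_subclass` and `system_attributes` are assumptions of this sketch, not part of the prototype, and the data repeat only the facts stated above.

```python
# Hypothetical, simplified stand-in for the SNePS beliefs listed above.
SUBCLASSES = {
    "ARC-datatype": ["coverage", "grid", "table", "image", "tin"],
    "coverage": ["polygon-coverage", "line-coverage", "point-coverage"],
}

# Rule: every member of class polygon-coverage has these system attributes.
CLASS_SYSTEM_ATTRIBUTES = {
    "polygon-coverage": ["AREA", "PERIMETER", "#"],
}


def is_subclass(sub, sup):
    """Return True if `sub` equals `sup` or lies below it in the hierarchy."""
    if sub == sup:
        return True
    return any(is_subclass(sub, child) for child in SUBCLASSES.get(sup, []))


def system_attributes(membership, obj):
    """Apply the rule to one object, given a mapping from objects to classes."""
    return CLASS_SYSTEM_ATTRIBUTES.get(membership.get(obj), [])


# Example belief about a workspace: VEGETATION is a polygon coverage.
membership = {"VEGETATION": "polygon-coverage"}
print(system_attributes(membership, "VEGETATION"))
```

In SNePS the same conclusion is derived by inference over the network rather than by dictionary lookup; the sketch only shows what the rule licenses.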

Beliefs about the ArcInfo workspace used during a particular interactive session

In order to build queries for ArcInfo, our intelligent agent needs to have some beliefs about the content and structure of the active ArcInfo workspace. In the previous version of this demo, these data were programmed directly into the demo. However, for real workspaces containing multiple data sets and the INFO files associated with them, this would require a huge effort and substantial knowledge of both ArcInfo and SNePS, which cannot be assumed of the large majority of users. Therefore, an automated procedure for describing a given ArcInfo workspace was developed. This procedure examines a workspace and writes a set of propositions about it in the SNePS language.

A typical ArcInfo workspace is a directory that contains data sets (each of which forms a subdirectory) and a common "info" subdirectory. The "info" subdirectory stores all INFO files associated with the data sets in the workspace, as well as user-defined INFO files. In our prototype we deal only with the most common type of ArcInfo data set, the coverage. Each coverage represents geographic features of a particular kind (for example, roads, land parcels, or vegetation classes) for a particular geographic area. Geometric objects in a coverage (points, lines, and polygons) represent the locations of geographic features, while attributes of those objects, stored in a database associated with the coverage, represent properties of the features. One of the problems inherent in managing data sets such as coverages is associating the objects in coverages and their attributes with features in the real world. Most often, this association is maintained by the owner of the information, who knows which real-world features each data set stands for. The owner knows, for example, that a coverage named STREAMS contains streams, which are represented by lines (arcs), and that the attributes of arcs stand for certain properties of streams: LENGTH represents the length of a stream, DISCHARGE its annual discharge, and so on. The owner also knows about other INFO and text files that are related to the data sets.

Since this knowledge largely resides with the user, it is very difficult to formalize. In this project we combine several sources to obtain the most complete possible knowledge about an ArcInfo workspace. These sources are:

ArcInfo documentation files
Until recently, ArcInfo did not provide an easy way to keep track of information about coverages in an automated manner. Executive Order 12906 of April 1994, "Coordinating Geographic Data Acquisition and Access: The National Spatial Data Infrastructure" [2], required all federal agencies to document their spatial data sets according to the guidelines established by the Federal Geographic Data Committee. To meet the meta-data (data about data) needs of federal agencies, Environmental Systems Research Institute (the producer of ArcInfo software) distributed DOCUMENT, a meta-data management tool for ArcInfo users.
DOCUMENT is used to create and update documentation files associated with ArcInfo data sets. Among other things, documentation files contain information that is directly relevant to this project: names of data sets, their topics and brief descriptions, and descriptions of feature classes and their attributes. Given the popularity of ArcInfo and the emerging meta-data standards for spatial data sets, it seems beneficial to be able to extract essential information about a particular ArcInfo workspace from the documentation files produced by DOCUMENT. This allows us to automate the construction of the semantic network knowledge base in which we represent knowledge about the workspace at the SNePS level, which would otherwise have to be constructed manually.
Documentation files consist of four INFO (ArcInfo database module) files for every data set. Three of them are considered relevant to this project:
- <data set>.DOC - general description of data set, contains theme (topic) of the dataset and its short description;
- <data set>.ATT - attribute description file, contains feature classes of a data set and description of their attributes;
- <data set>.NAR - narrative file, contains additional information about data set.
INFO files associated with feature classes (point, arc, polygon) of coverages
Every feature class in an ArcInfo coverage (point, arc, polygon, etc.) has an INFO file associated with it. The name of the file is composed of the name of the coverage and a standard extension: .AAT (arc attribute table), .PAT (point/polygon attribute table), etc. These files are examined to obtain information about the attributes of the corresponding features in the coverage.
Related INFO files in the workspace
Often a workspace contains INFO files related to features in coverages. A relate is a link maintained in the ArcInfo relational database between INFO files, most often between feature attribute tables and some data files. These data files, produced by users, can contain additional information that can be linked to features in coverages when needed. For example, the column (attribute) VEGCODE in the polygon attribute table of the coverage VEGETATION (VEGETATION.PAT) can represent vegetation classes with the integers 1 through 5 (to save space and increase performance). The related INFO file VEGCLASS.DAT will have the same attribute (to establish the relate) as well as other attributes. It may contain an attribute VEGCLASS with definitions of the vegetation classes: "forest" for "1", "woodland" for "2", and so on.
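The role of a relate can be illustrated with a short Python sketch. The table contents, the `ID` column, and the function names below are hypothetical; only the file and attribute names (VEGETATION.PAT, VEGCLASS.DAT, VEGCODE, VEGCLASS) and the two code definitions come from the example above.

```python
# Hypothetical rows of VEGETATION.PAT (only the columns needed here).
vegetation_pat = [
    {"ID": 1, "VEGCODE": 1},
    {"ID": 2, "VEGCODE": 2},
    {"ID": 3, "VEGCODE": 1},
]

# Hypothetical rows of the related file VEGCLASS.DAT.
vegclass_dat = [
    {"VEGCODE": 1, "VEGCLASS": "forest"},
    {"VEGCODE": 2, "VEGCLASS": "woodland"},
]


def codes_for(term, related_rows, key="VEGCODE", label="VEGCLASS"):
    """Translate a user term such as 'forest' into the codes stored in the PAT."""
    return [row[key] for row in related_rows if row[label] == term]


def select_features(term, feature_rows, related_rows, key="VEGCODE", label="VEGCLASS"):
    """Return the feature rows whose code corresponds to the user term."""
    codes = set(codes_for(term, related_rows, key, label))
    return [row for row in feature_rows if row[key] in codes]


forest_polygons = select_features("forest", vegetation_pat, vegclass_dat)
print([row["ID"] for row in forest_polygons])
```

The point of the sketch is that a word known to the user ("forest") never appears in the feature attribute table itself; it becomes usable only once the related file has been described.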

AML program to describe ArcInfo workspace

An AML (Arc Macro Language) program was created to analyze the content of a workspace and produce files containing the resulting SNePS network and workspace-specific dictionary. The following algorithm outlines the way in which the "arcinfo-workspace.aml" program analyzes the workspace.

1. The program receives as arguments the path of an ArcInfo workspace and a path for remote copies of the output files.
2. A predefined set of arc labels is written to the output network file.
3. The workspace is searched for related files. Every ArcInfo relate (a link between INFO files) is stored as a record in a relate file, which is an INFO file with particular attributes (columns). Since the structure of a relate file is always the same, these files can be found in the workspace and the names of the related files can be obtained. The names of all discovered relate and related files are written to a list.
4. A list of coverages in the workspace is created. Every coverage is then processed in the following way.
  4.1. If documentation files for this coverage exist, they are examined. The coverage theme and description are extracted. A list of the INFO files described is created. All descriptions of attributes available in the documentation for these INFO files are extracted.
  4.2. A list of feature attribute tables (INFO files associated with feature classes in the coverage) is created. If any of these INFO files is missing from the list created at step 4.1, it is added to it.
  4.3. Every INFO file in the list is described. Its name, description (if available from the documentation, or a default description for some types of INFO files), and feature class (if it is associated with a feature class) are extracted. Every attribute in this INFO file is then described in the following way:
    - Name of the attribute and its description (if available from the documentation, or a default description for some types of attributes).
    - If the attribute is of character type, all unique values of this attribute in this INFO file are described (values described by column).
  4.4. If this file is one of the relate and related files (discovered at step 3), it is described by record (values described by row).
  4.5. If some of the relate and related files (discovered at step 3) were not described in step 4.3, step 4.3 is repeated for those files.
5. All names of coverages, INFO files, and values of character attributes are added to a lexicon file when they are encountered.
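The core of this procedure (describing coverages, INFO files, and attributes, and collecting lexicon entries) can be sketched in Python as follows. This is not the AML program: the input dictionaries, the relation names such as `theme` and `value-of`, and the use of plain tuples in place of SNePS propositions are all assumptions made for illustration, and the handling of relate files is omitted.

```python
def describe_workspace(coverages):
    """Return (propositions, lexicon) for a list of coverage descriptions.

    Propositions are (subject, relation, object) tuples standing in for
    SNePS propositions; the relation names are hypothetical.
    """
    propositions = []
    lexicon = set()
    for cov in coverages:
        lexicon.add(cov["name"])
        if cov.get("theme"):  # available only if documentation exists
            propositions.append((cov["name"], "theme", cov["theme"]))
        for info in cov["info_files"]:
            lexicon.add(info["name"])
            propositions.append((info["name"], "info-file-of", cov["name"]))
            if info.get("feature_class"):
                propositions.append((info["name"], "feature-class", info["feature_class"]))
            for attr in info["attributes"]:
                propositions.append((attr["name"], "attribute-of", info["name"]))
                # Only character attributes have their unique values described.
                if attr["type"] == "character":
                    for value in sorted(set(attr["values"])):
                        lexicon.add(value)
                        propositions.append((value, "value-of", attr["name"]))
    return propositions, lexicon


# Hypothetical description of the STREAMS coverage.
streams = {
    "name": "STREAMS",
    "theme": "streams and rivers",
    "info_files": [{
        "name": "STREAMS.AAT",
        "feature_class": "arc",
        "attributes": [
            {"name": "LENGTH", "type": "numeric", "values": [120.5, 80.0]},
            {"name": "NAME", "type": "character", "values": ["Moneron", "Moneron"]},
        ],
    }],
}

props, lexicon = describe_workspace([streams])
```

As in the algorithm, numeric attributes contribute only their names, while each distinct character value becomes both a proposition and a lexicon entry.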

The "arcinfo-workspace.aml" program was tested on several ArcInfo workspaces and was generally able to describe them (with varying degrees of detail) regardless of the status of the documentation (present or absent, more or less complete). A partial view of the resulting SNePS knowledge base for one of the test workspaces is shown in Figure 1.

Figure 1. Partial view of the SNePS knowledge-base automatically generated from the files in a particular workspace.

Connecting SNePS and ArcInfo

The current version of ArcInfo can be run as a server and provides a way to communicate with this server from an external application. We use this client-server mechanism to establish a connection between SNePS and ArcInfo. Figure 2 shows the structure of the communication layers of the system.

Figure 2. Architecture of SNePS-ArcInfo client-server connection.

Client procedures, written in C, were attached to a set of primitive actions in SNePS, in such a way that the agent is able to perform the act of connecting and communicating with ArcInfo either by queries or simple requests. In the first case, a response is expected from ArcInfo. In the second case, the execution can continue immediately. At C-level, message passing features are included to acknowledge errors or receive something from the ArcInfo server. An ArcInfo command can be evaluated or a request for a result processed.

The following modules compose the C-level of the client (arcclient):

ARCCONNECT: This function connects the client to a server identified by the contents of the file connect.arc. If the connection is successful, it returns a non-negative server identification.
ARCCOMMAND: This function returns a container holding the string returned by ARC, or "ERROR!" if the request to the ARC server fails.
ARCQUERY: This function sends a server a request to execute a specified procedure number, with a string as an argument for the procedure. The results of the request are concatenated to the status of the request and returned in the return string.
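The calling pattern on the agent side can be sketched in Python against an in-memory stand-in for the server. Everything here is hypothetical: the real client is written in C, ARCCONNECT reads its target from connect.arc rather than taking an argument, and the `FakeArcServer` class, the function signatures, and the "OK" reply prefix are assumptions of this sketch. ARCQUERY is left out because the format of its return string is not specified above.

```python
class FakeArcServer:
    """In-memory stand-in for an ArcInfo server; it only records commands."""

    def __init__(self):
        self.log = []

    def execute(self, command):
        self.log.append(command)
        return "OK " + command


def arc_connect(server):
    """Mimic ARCCONNECT: a non-negative id on success, -1 on failure."""
    return 0 if server is not None else -1


def arc_command(server, command):
    """Mimic ARCCOMMAND: the server's reply, or "ERROR!" if the request fails."""
    try:
        return server.execute(command)
    except Exception:
        return "ERROR!"


def run_commands(server, commands):
    """Send commands in order and stop at the first failure."""
    replies = []
    for command in commands:
        reply = arc_command(server, command)
        replies.append(reply)
        if reply == "ERROR!":
            break
    return replies


server = FakeArcServer()
if arc_connect(server) >= 0:
    print(run_commands(server, ["linecolor red"]))
```

Stopping at the first "ERROR!" is a design choice of the sketch; it reflects the idea that later commands in a sequence usually depend on earlier ones having succeeded.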

Using this client-server mechanism, we developed a model of interaction between the user, the interface agent and ArcInfo. Part of this model, interactions during the initial phase of the system's operation, is shown in Figure 3.

Figure 3. Interaction between the user, the interface agent and ArcInfo during the initial phase.

During this phase

client-server connection between ArcInfo and SNePS is established,
the user is asked for a workspace (for simplicity, we assume that the entire database resides within one ArcInfo workspace),
the workspace is examined to filter out all relevant information, which is written to two SNePS source files: network file and lexicon file,
the files are remotely copied to the machine running SNePS, and
network and lexicon files are loaded in SNePS to form the basis of knowledge base of the intelligent agent.

How the Interface Agent Works

At this point our interface agent has some beliefs (knowledge) about the general organization of ArcInfo and about the content of the workspace in question. The usefulness of this interface agent depends on its ability to process various kinds of requests from users to ArcInfo. If we assume a request-response model of interaction between a user and ArcInfo, which is reasonable for novice users, then the task of the interface agent is to

interpret user's request,
translate it into ArcInfo command(s),
receive some results back from ArcInfo, and
present them to the user.

Depending on the type of query and response (display, listing of attributes, text, etc.), the last two steps may be omitted. The critical task (at least at this stage) is to convert the user's input into sequences of ArcInfo commands. In order to do this, the agent must have plans (sequences of actions that are taken if their preconditions are satisfied) for every type of user request. The next section covers some of the plans present in our interface agent.

Development of plans for some typical spatial queries to ArcInfo

Before any plans can be developed for our interface agent, we need to broadly identify the kinds of tasks which people attempt to do using GIS. Mark and Gould emphasize that

underlying the design of decision-support systems are questions such as "What can people do with computers?", and the even more fundamental question, "What do people do?". Answering the question "What do GIS users do?" for various sets of GIS tasks would be a prerequisite to the design of user interfaces [3, p.1429].

In this paper we cannot attempt to cover all "various sets of GIS tasks"; rather, we limit our task to developing plans suitable for some typical queries that users perform in ArcInfo. What would be a typical query, and what types of queries exist? This question deserves dedicated research, which is beyond the scope of this paper.

To get an idea of the questions that users might try to answer using ArcInfo, a questionnaire was distributed (see the Appendix). Based on the few responses received and on personal experience, sample questions were formulated (the term "question" is used interchangeably with "request" or "query" here). These questions definitely reflect an approach based on personal experience with GIS and ArcInfo, as well as, possibly, some biases in the use of GIS, database construction, etc. The choice of questions was also limited by practical considerations: only questions that can be answered using our sample database were considered. Given all these limitations, we still hope that these questions are adequate as a starting point.

To this point only a few plans have been developed. We started with basic questions of the type "Where is..." and "Show...", which refer directly to values of attributes of coverage features and can generally be answered by showing the requested features graphically, in relation to other features. Here are two questions of this kind used in our demonstration.

Show all forests in the study area.
Where is the Moneron creek?

Assuming that these features are identified in coverage attribute tables or related files, queries of this kind can essentially be answered by retrieving information from INFO. Other questions require some spatial processing, such as overlay or buffering.

What is the total area of woodlands?
What portion of the island is covered with woodlands?
What percent of area within 100 ft of streams is forested?

For example, the question "Where is the Moneron creek?" could be answered with the following set of commands in Arcplot:

reselect {path to database} {streams} {arc} {name} = '{Moneron}'
linecolor red
arcs {path to database} {Streams}

where elements in { } are provided (through inference) by the interface agent.

Here is the actual plan that the agent follows to carry out this task:

If a value is related to an attribute 
and the value is related to a feature 
and the value is related to a coverage 
then a plan to locate the value of a type 
is to say "reselect " 
and then say the coverage 
and then say "space" 
and then say the feature 
and then say "space" 
and then say the attribute 
and then say " = '" 
and then say the value 
and then issue "quote" 
and then issue "linecolor red" 
and then say "arcs %.db%/" 
and then send the coverage.

where %.db% is an AML variable that contains the path to the database.
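Read literally, the plan above assembles three command strings once a value has been tied to an attribute, a feature, and a coverage. The following Python sketch shows that assembly under stated assumptions: the `KNOWLEDGE` list stands in for the agent's beliefs, the function names are hypothetical, and `%.db%` is passed through unexpanded, as in the plan.

```python
# Hypothetical beliefs linking a value to its attribute, feature and coverage.
KNOWLEDGE = [
    {"value": "Moneron", "attribute": "name", "feature": "arc", "coverage": "streams"},
]


def find_bindings(value, knowledge):
    """Check the plan's preconditions: return the matching belief, if any."""
    for fact in knowledge:
        if fact["value"].lower() == value.lower():
            return fact
    return None


def locate_value_commands(coverage, feature, attribute, value):
    """Assemble the commands in the order in which the plan 'says' them."""
    reselect = ("reselect " + coverage + " " + feature + " "
                + attribute + " = '" + value + "'")
    return [reselect, "linecolor red", "arcs %.db%/" + coverage]


def plan_for(value, knowledge):
    """Return the command sequence, or None if the preconditions fail."""
    bindings = find_bindings(value, knowledge)
    if bindings is None:
        return None
    return locate_value_commands(bindings["coverage"], bindings["feature"],
                                 bindings["attribute"], bindings["value"])


for command in plan_for("Moneron", KNOWLEDGE):
    print(command)
```

In the prototype the bindings are obtained by inference over the semantic network rather than by the linear search used here.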

Representations of plans such as the one above are stored in a semantic network knowledge base (the system's knowledge base) and are used to translate Natural Language queries, plan definitions, and requests into sequences of ArcInfo commands.

Natural Language Processing and Meaning negotiation

In order to process natural language queries, plan definitions, and requests to ArcInfo, the intelligent agent uses four data sources:

an analysis/generation grammar of a subset of English that represents the syntactic/semantic structures involved in the dialog between the user and the intelligent agent,
the domain-specific dictionary automatically built from the ArcInfo workspace,
a system dictionary that contains descriptions of words that are not domain-specific (e.g., "the", "of"), and
the system's beliefs stored in the system's knowledge base.

These data sources are used by a parser in the analysis and generation of English sentences. We modified the current parser of SNePS in such a way that it can handle possible mismatches between user's input and the representation of the ArcInfo workspace in the agent's knowledge base.

These mismatches will be quite common for users who are not familiar with the content and/or structure of the ArcInfo database (i.e., the type of user this interface is developed for). To deal with this problem, our intelligent agent uses forward and reduction inference to infer, from its representation of the ArcInfo workspace, possible matches between the user's terms and ArcInfo terms, and asks the user to select one of the possible alternatives (if any exist).

The selected alternative is used for further processing of the query, and the new concept is added to the lexicon. The meaning negotiation process and the components involved in the processing of natural language queries are shown in Figure 4.

Figure 4. Structure of the meaning negotiation process
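The negotiation loop itself (unknown word, candidate matches, user selection, lexicon update) can be sketched in Python. Note the substitution: the prototype finds candidates by SNePS inference over the knowledge base, whereas this sketch uses plain string similarity from Python's standard `difflib` module purely as a placeholder. The lexicon contents and the `choose` callback are likewise hypothetical.

```python
import difflib


def negotiate(word, lexicon, choose):
    """Resolve `word` to a known term, asking `choose` when it is unknown.

    `lexicon` maps user words to ArcInfo terms; `choose` receives the list
    of candidate terms and returns the one selected by the user.
    """
    if word in lexicon:
        return lexicon[word]
    known_terms = sorted(set(lexicon.values()))
    # Placeholder for inference: propose terms that are spelled similarly.
    candidates = difflib.get_close_matches(word, known_terms, n=3, cutoff=0.6)
    if not candidates:
        return None
    selected = choose(candidates)
    lexicon[word] = selected  # the new word is remembered for later queries
    return selected


# Hypothetical workspace lexicon: each domain term initially maps to itself.
lexicon = {"forest": "forest", "woodland": "woodland", "Moneron": "Moneron"}

# A non-interactive stand-in for the user: always accept the first candidate.
term = negotiate("forests", lexicon, lambda options: options[0])
print(term)
```

After the call, "forests" is part of the lexicon, so the same word is resolved without further interaction the next time it is used.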

Testing the Interface Agent

The questions we used in developing our interface agent relate to a sample ArcInfo database. The database consists of five coverages for Moneron Island, located 50 km north of Hokkaido, Japan, in the north-west Pacific. The coverages are:

SHORE - shore line,
STREAMS - streams and rivers,
TOPOGRAPHY - elevation contour lines,
TRAILS - trails, and
VEGETATION - vegetation cover of the island.

All data sets were created from paper maps by one of the authors. Meta-data files were created using DOCUMENT in ArcInfo. Overall, these data sets are typical of ArcInfo data sets and provide a simple but realistic basis for geographic problem solving using the ArcInfo geographic information system.

Figure 5 shows a sample run of the prototype. The top-right screen shows the activity of the ARC server, the bottom-right screen shows the SNePS agent, and the left screen shows the answer to the request "find all woodlands", obtained by accessing the ArcInfo workspace.

Figure 5. Sample run of the interface agent - screen dump.

Discussion and Conclusions

Different kinds of spatial problems exist, and expressing a problem in a human language may not necessarily be the easiest way to solve it. The use of natural language interfaces does not exclude the possibility of using other alternatives if they are found useful in a particular situation during the user-computer interaction. The ideal interface for a GIS seems to be one in which the interaction is performed in a multimedia language that may include (among others) natural language, gestures (e.g., pointing to icons, menus and images), voice, and text commands.

Different types of users exist, with different knowledge and skills. The system we are developing is mostly aimed at a particular type of user, the one without sufficient knowledge of ArcInfo and, possibly, without much GIS experience. Natural Language interfaces seem to be helpful for this type of user as long as they offer the possibility of expressing requests in the same way the user might express them to a human consultant.

Interface agents such as the one proposed in this paper could be used for research in spatial cognition: how people think about spatial problems, and how useful linguistic devices can be for expressing their needs for processing spatial information. These research questions could be investigated with the help of tools like the one we are developing. Instead of ruling out a particular way of interaction, it might be interesting to test in which cases it is useful.

Many more plans are needed, and much more research on the tasks carried out with GIS is required to define "typical questions" and provide automated answers to them. We would appreciate your help in filling out the form in the Appendix.

Because the prototype was research oriented, efficiency and speed were not treated as priorities in its development. As a result, our current prototype has a response time that might not be satisfactory for real-time interaction. Response time can be improved if the orientation of the system changes from research to production.

In the future, we will consider connecting the interface agent and ArcTools, to provide the user with a unified interface in which Natural Language is one possible way to reach the user's goals.

Appendix. Information requested from ArcInfo users

We are doing research on interfaces to GIS, and we are interested in knowing about the types of problems GIS is most often used to solve, and particularly about the ways these problems are stated by the users. In other words, if problems can be formulated as questions in plain English, we are interested in the typical questions GIS users ask. Obviously, these questions vary from one application to another, but there may be generic features that are fairly common (for example, questions that imply operations on one dataset [query to coverage.PAT, buffering] versus questions that require operations on several datasets [overlay]). To make it more specific, imagine that you have a study area for which the following datasets are available:

Vegetation
Attributes: vegetation classes (forests, woodlands, shrubs, meadows)
Streams
Attributes: stream names (Cold, Cedar, etc.; you can add some)
Trails
Attributes: trail names (use any names you want)
Shoreline
Attributes: none
Topography
Attributes: absolute elevation values (0 - 450 meters)

Given these data, what questions would you ask about the study area?

The TYPE and STRUCTURE of a question are more important than any specifics; you can ask any questions and add more datasets if you need.

Examples of questions:

"Show all forests in the study area"

"Where is the Cold creek?"

"Show places higher than 400 m"

"At what altitude do trails cross Cedar stream?"

"List all areas with meadows that are steeper than 10 degrees"

"Find sections of trails which pass through woodlands on gentle (< 5 degrees) slopes"