Thomas A. Quinn

GIS Development Using SDE

The Esri Spatial Database Engine (SDE) provides an enterprise wide repository for spatial and attribute data within an RDBMS. Its client/server architecture and built-in spatial analysis and query tools allow efficient access, management and distribution of spatial and attribute data throughout a network. This paper describes a process to build a system to integrate a corporate database containing spatial data using the SDE extensions to ArcView and the SDE Application Programmers Interface (API). There are several activities that are part of building such a system including: 1) integration with other COTS applications and tools to provide the framework for the user interface, 2) the graphic display environment, 3) the overall RDBMS database design, and 4) the capabilities required to load, maintain, query, analyze, display and report on the data.


Thomas A. Quinn

GIS Development Using SDE

The Esri Spatial Database Engine (SDE) provides an enterprise wide repository for spatial and attribute data within an RDBMS. Its client/server architecture and built-in spatial analysis and query tools allow efficient access, management and distribution of spatial and attribute data throughout a network. This paper describes a process to build a system to integrate a corporate database containing spatial data using the SDE extensions to ArcView and the SDE Application Programmers Interface (API). There are several activities that are part of building such a system including: 1) integration with other COTS applications and tools to provide the framework for the user interface, 2) the graphic display environment, 3) the overall RDBMS database design, and 4) the capabilities required to load, maintain, query, analyze, display and report on the data.

Introduction

Many factors must be considered to achieve the goal of providing a functional, cohesive and easy to use system through the course of design and development of a GIS application. There are integration, customization and custom development choices that must be made even with the use of Commercial Off The Shelf (COTS) products. GIS applications have always had the need to interact with a database whether it is external or uses its own proprietary structure. This paper discusses some of the design and development issues and selection choices that are being considered for a system currently under development. The system requires GIS capabilities while accessing a large repository of spatial and attribute data. Early in the process it was decided that the Spatial Database Engine (SDE) provided capabilities best suited for this system.

Included in this paper is a general discussion of requirements for the system, database engine options that were considered, interface selection requirements, input data characteristics, import data process, archive data process, delete data process, and access data process. The functional modules are discussed by their requirements followed by a comparison of possible solutions. As the system is still in the design phase, selection of the best solution from the possible options has yet to be finalized.

General System Requirements

The primary application for the system being designed is to provide a query capability for users to access a large metadata (spatial and attribute data) database that is updated daily. An administrator of the system will be responsible for maintaining existing data and loading new data. The most recent data is of primary interest. Data older than one year is archived off-line and deleted from the database.

The system must to maintain only one years worth of data. Records older than one year will to be archived, saved to tape, then deleted from the system. It is desirable to create the archive with a months worth of records so that, at any time, the system may actually contain thirteen months worth of data. The system does not routinely use records older than one year but the system may be expanded to include an extended range of data in the future. There is also a requirement to include functionality for reloading an archived data set for a user performing historical processing.

The application is a portion of a larger system already in use. The operational system uses ArcView for its geographic analysis and display in a multi-user, multi-system setting. The system in development is required to interact with the operational system, provide a consistent user interface, and provide the access and processing requirements to interact with a large database of geographic and attribute data.

Database Engine Selection

Several options were considered for the database engine including the use of a generic RDBMS, Oracle Spatial Database Option (SDO), ArcInfo, ArcView and Spatial Database Engine (SDE). The engine must provide the capability to store and serve feature and attribute data for the system. Upon reviewing the options it was decided that SDE best met the requirements. The general characteristics of each product as applicable to the system are discussed below.

Generic RDBMS

A generic RDBMS was considered for the storage engine. The features used in the system are polygons with a fixed maximum number of vertices. These could easily be stored in a single table using a blob structure, stored as columns in a table, or stored in a polygon table using a feature identifier, vertex sequence number and coordinates of vertex. Attributes could be stored in a related table or stored directly with the spatial data depending on the structure type above. A generic RDBMS approach was not selected for several reasons including its inability to provide sophisticated spatial searches on the data.

SDO

SDO (formerly MultiDimension), provides a multi-dimensional approach for feature data. In this approach, feature representations (point, line or polygon) are stored in a single field within a table. A single Helical Hyperspatial code (HHCODE) is used to store the Euclidean spatial dimensions and additional data dimensional include depth, elevation, or time. The types of multidimensional data are restricted only in that they must be a numeric data type and have a bounded range. The HHCODE is generated through the recursive decomposition of dimensional space. Attribute data for specific multidimensional data is stored within columns of a table in the database. Access to the data for processing and manipulation is accomplished through extensions to Oracle PL/SQL . SDO was not selected for several reasons including: 1) it has less built in integrated capabilities with the ArcView COTS product already selected for use than SDE, 2) there is no current requirement for multi-dimensional capabilities, 3) its spatial search capabilities are not as comprehensive as SDE and 4) integration with other custom components to be developed would be more complex.

ArcInfo

The use of ArcInfo with its full featured GIS capability was also considered. Its internal model is feature based and uses topological data structures. Attributes can be stored in INFO or a related RDBMS. Although ArcInfo does provide the spatial search capabilities that meet the system requirements, there is no requirement for many of the analysis and mapping capabilities that come with ArcInfo. ArcInfo could be integrated into the system if these capabilities are needed in the future,. Access to the SDE database would occur through SDE functions assuming they are supported in future releases of ArcInfo. Additionally, data used in the target system is object (non-topological) based. This data model better fits the SDE data model than ArcInfo.

ArcView

The ArcView model for data in the system utilizes its native shapefile format. Its data representation model is object based which is consistent with the target system requirements. Attribute information can be stored in a DBase structure or may be stored in a related RDBMS. There is an additional advantage in that ArcView is already a component of the existing system. It was believed, however, that the general capabilities of ArcView were better suited to providing front end functions than handling a large dynamic database in a multi-user environment.

SDE

The SDE data model uses an object based approach and supports point, line and polygonal objects. Point objects are simple points or clusters of points. Line objects are linear arrays of points, a ring of points (same first and last point) or a spaghetti array of points (crossing lines). Polygon objects are homogeneous areas or areas that have holes in them (doughnut polygons). The SDE RDBMS storage model is fixed with the spatial representation of the feature stored in a single column in the database regardless of the object type. Additional metrics such as area, length, object type and bounding rectangle are stored within columns of the spatial table. Attributes for the objects are stored in a separate related attribute table. Each object has spatial indexes that are stored in index tables in the database. The uses of indexes make for efficient spatial searches with SDE. The group of spatial, attribute and index tables form what is known as a layer. Objects for a single layer do not have to be of homogeneous types, (i.e., points, lines and polygons may be stored together). The top level of the model is the data set which is a repository for the layers in the RDBMS and contains spatial translation and scaling parameters. The SDE model also contains security and locking information. SDE is a server/client application and contains administration tools, a "C" API toolkit and is integrated as an ArcView Extension. The development of custom "C" code can be reduced or eliminated by using Avenue classes and methods using the SDE capabilities within ArcView.

SDE provides all of the functionality necessary to meet the system's data storage and access requirements and contains features that make integration with the system's graphical user environment simpler. SDE supports the extremely large continuous database required by the system. Its server makes the database available to the system's client applications. Also SDE provides an Application Programmers Interface (API) to support the development of the required database and access functions.

Interface Selection

There are many options to consider in providing an interface to the administrator and users of the system. The options include providing a command line interface, making use of existing COTS products in the system, or building a graphic user interface through custom development or COTS customization.

A command interface to support administrator functions is desired. This simple interface allows the administrator to make use of the automatic schedule and run functions that are included with of the operating system. A graphic user interface for the administrator functions could also developed but there are minimal benefits.

The modules being developed for the system will be added as an enhancement to an existing ArcView based application. The user interface functions can be developed in ArcView as custom menus, buttons and tools. Additionally, ArcView's limited user interface dialog capabilities could be augmented for the system through the development of a custom Graphic User Interface (GUI), integration of other COTS products and interacting with ArcView through DDE or RPC communications. A GUI to support queries outside of ArcView that return tabular attribute data is not a requirement at this time, but could be a useful future enhancement.

There are a wide variety of COTS tools and GUI builders that are available for designers and developers who build custom applications. The list is too long to review here and goes beyond the scope of this paper.

Input Data Characteristics

The requirements of the system are to accept polygonal data that is being collected on a daily basis. The data that is ingested into the SDE database comes from an external source and is in a flat file format. The file contains both the coordinates of polygons and their attribute data. No record contains any single unique field that differentiates it from other records. In some cases the complete record, including the spatial coordinates, will need to be compared to all other records to determine if the record is unique. Records on different days are distinguished by the presence of a date field. SDE assigns a unique identifier to objects in its database.

An import file is named with the date (e.g. 031596) of the most recent data in its file. Each file contains records of the current day and the previous day, or in the case of weekends/holidays, multiple days. The import file does not contain all the records for the current day. It contains only the records available when the file was created. It is not possible to know which records were missing on a previous day so all the records from the previous day are included with the current days records in the import file.

Import Module Requirements

The import module must be able to read import files, provide gross error checking (e.g., check if format of import file is correct), and format and load the records into an SDE database. Additionally, the import module must handle data records in the current day import file that are duplicates from the prior day. Since SDE is object based and not topological based, the same spatial and attribute data record could potentially be loaded into an SDE database multiple times as separate objects. It is required that the import module not allow duplicate entries of the same object in the SDE database.

Import Module Comparison

The import module could be written in several ways including: 1) Converting the data to a shapefile format, loading the shapefile and removing duplicate records, 2) using Avenue to load records into a database feature class then loading only the new records into the database, or 3) development of a custom SDE API application that reads and loads the data. A hybrid approach for loading the records is also possible.

Importing the data into ArcView as a shapefile requires an Avenue script to create an empty shapefile with correctly defined spatial and attribute fields, reading the import file, formatting the spatial data into a shape class and attribute data to their appropriate field classes and loading the data records. This is a relatively slow process. Alternatively, a C program could be written to create a shapefile from the import file format. The shapefile could be loaded into SDE using the shapefile to SDE administration tool. Duplicate records (although the records would have a unique id in SDE) would then have to be deleted through a custom built SDE API application or Avenue application. This approach is undesirable because it compromises the integrity of the database by allowing duplicate records to be present and requires addition processing and resource requirements that would not be required using other methods.

An Avenue script could load the records into an SDE feature class. The script would query the database theme to determine if the feature is already present within the SDE database. Records that are not present are loading into the SDE database. This type of processing would not require a custom SDE API application to be written, but would require the administrator to initiate and monitor the procedure in ArcView. Displaying the information graphically or in a table format is also possible as the data is imported however, it would be undesirable to do so since this slows the process considerably. The new data once imported could easily be displayed using ArcView database theme capabilities. The overall procedure would require greater processing resources for reading the text file, conversion of formats, and interaction with SDE functions than a custom built SDE application.

Development of a custom SDE API application that reads and loads the import file is the best option. The application could check each record to see if it had already been loaded into the SDE database, and if not, load it. Alternately the application could be written to preprocess the import data into a new import data set so that duplicate records can be excluded allowing only processing of a simple load of unique data. This also has the advantage of being able to place the SDE database layer in a load only mode that allows faster load of the data since indexes are created once at the end of the process instead of continually during the process. The load only mode approach is not possible if the database is to be used currently with feature retrieval operations by the users. The custom import data module could be placed in a UNIX cron which allows the scheduling and execution of non-interactive batch commands. The module could be scheduled for daily use or it could be run interactively by the administrator as part of the daily operations.

Archive Module Requirements

The archive module must be able to save a portion of the SDE database off-line. The archive must be in a format that can be reloaded into the SDE database since the system only maintains one years worth of records. Records older than one year need to be archived, saved to tape and then deleted from the system. It is desirable to create the archive with a months worth of records so that at any time the system may actually contain thirteen months worth of data. The system does not routinely use records older than one year but the system may be extended to include an extended range of data in the future. Also there is also a requirement to be able to reload an archived data set for a user doing historical processing.

Archive Module Comparisons

There are five basic approaches for dealing with the archive requirements 1) Let the system that generates the import files handle the archive requirements, 2) Save and archive the import files, 3) Export the data using the RDBMS export utility, 4) Export the data from SDE, or 5) Convert the SDE data to a shapefile (ArcView) format. Hybrid approaches to these options are also possible. The options are discussed below.

Import files come from an external source to the system. Allowing the external source to provide archive functions for the system is not a workable solution. The tasking, mission and data requirements are different between the system and external source. Although the external source does contain an off-line archived copy of the data, the archive would be difficult use directly because it contains more data than the system is using, and the archiving schedule differs. Data access rights would have to be addressed with the archive as well. The system also needs additional resources (disk space and software) to handle the processing and extraction requirements of the external source. Having the external source re-stage the archive so import files could be generated would involve the use of resources and personnel costs that are beyond its current mission. A positive factor of this approach is that the import utility required to re-load an archived data set would be nearly identical to the standard operating procedures for loading new data. Another consideration is that using an unprocessed archive is not an option if the data (spatial or attribute) changes once it is in the SDE database. Currently the requirements are for read-only use of the data but one should not preclude the option of allowing data modification some time in the future.

The saving and archiving of the import files is a simple and straight forward approach. Files are collected for a single month then archived to tape. Since the import files are named by date one just needs to look at the file name to determine the collection date of the data. It also makes extraction of archived data by a single date easier although, this is not how the data is primarily utilized in the system. This method increases archive storage and processing requirements since records may be stored duplicate times in the import files. Alternatively, the import files could be preprocessed to strip out partial (duplicate) day records. A simple UNIX shell script could be written to combine the files and eliminate redundant records prior to archiving. This is not a possible archiving solution if it is decided that the system will allow changes to the data once it has been loaded into the SDE database.

Native RDBMS export utilities can be used to dump the data to an operating system file that can be archived to tape. The native RDBMS export utility dumps all records in a table or tables to a file. Extracting just a months worth of records, including the spatial, attribute, index, and access records would be complicated. Export utilities are excellent for providing system backup of the complete SDE data sets and should only be used as such.

A custom SDE based API function can be written to extract the data for archiving. The API calls would open the data set and set up a search criteria to find features of the layer that meet the archiving date range requirement using this methodology . The data could be retrieved and stored in a binary format that closely represents the internal data structures from which they were retrieved or could be stored in the same ASCII format as the import files. The advantage of using a binary format is that it would reduce the amount of processing that needs to occur when importing the data and would also reduce the size of the archive file. This approach would require the development of another import function to process the custom formatted export file. Caution must be taken with this approach since there could be compatibility issues between the format of the binary structure of the stored data and that of future SDE releases. The second approach eliminates the compatibility problems but requires conversion of internal (system) units to real world units and formatting of spatial and attribute data to the import format. The existing custom import routine could be used to re-load the data when needed.

SDE contains a translator that can produce a shapefile format. A query can be applied through the translator to select only the records that meet the archiving requirements. The shapefile would be used as the archive file. The administrator has the option to graphically review the archive in ArcView before continuing the process or may continue to accept the archive and delete its records from the system. Also this approach has the advantage that the archive file may be graphically reviewed by a user interested in the archive without having to load it back into the SDE database. The administrator can re-load it back into the system once a user is satisfied that they have the correct archive data. Loading the archive back into the system is accomplished using the shapefile to SDE translator.

Delete Requirements

It is not part of the standard operating procedures to use data that is older than one year. The records must be deleted from the SDE database following archiving of the data. This also provides the system the additional benefit of knowing the approximate size of the database since the input data record rates per month are relatively constant. A capability is required to allow deletion of re-loaded archive data. The delete function also needs to have the capability to perform routine maintenance of the SDE database such as the deletion of single records.

Delete Module Comparisons

The delete module requirements could be accomplished through the ArcView environment, SDE administration tools, a custom application built using the SDE API, or a mix of the approaches. The delete option approaches are discussed below.

ArcView contains capabilities that would be important to an administrator doing SDE database record deletes. It provides the ability for the administrator to display data records prior to deletion and allows tweaking of which records are to be deleted. This approach limits the amount of custom API development that has to occur and places the administrator in complete control of when the delete should occur and what should be deleted. A custom Avenue script can be developed to aid this process or the ArcView database theme functions may be used directly. The administrator is at risk of human error in the deletion of too much or too little data if no automation in the process is provided. This approach relies on the administrator using his, and the systems resources with ArcView while doing a manual process. Also, there is no real need to view data to be deleted provided there is confidence in the contents of the system archived data.

The SDE administrator tools allow for deletion of single features and for complete layers. Although these options are not sufficient to handle our routine bulk delete requests, they have much value in providing maintenance for single feature delete and layer delete. The layer delete option can be used for deletion of re-loaded archive data since the archive re-load process could place data in its own layer instead of with the yearly data.

Providing a custom SDE API application to perform the deletion allows the function to be automated, reduces the risk of administrator errors and decreases the resources required for the process. This is the best option. The application should be written as a standalone utility that can be called directly from the UNIX environment. In this way the process can be embedded in a shell script combined with the archive process to handle the complete procedure. The archive/delete procedure can then be automated by using the UNIX cron (batch scheduling) to invoke the script on a monthly basis. The procedure could easily be written to abort the operation if there were problems in the archive process (e.g. system ran out of disk space) and could report the results of the process to a log file or mail the results to the administrator. Placing the procedure in a cron file has the added advantage of being able to schedule processing at off-peak hours.

Access and Processing Requirements

Users need to query and retrieve from the SDE database features which meet their search criteria. The search criteria can come from several sources. They can come from:

Attributes are also typically included with geographic search parameters as part of the search process. Queries can be generated from within the ArcView environment or requested at the operating system prompt. A query builder front end could also be a useful tool but there is no current requirement for this capability.

Access and Processing Module Comparisons

Features in the SDE database with their associated graphic and attribute data is returned by the Access and Processing Module. The user requires the capability to access the SDE database using spatial and attribute queries. Spatial information can be displayed in ArcView. The system must also allow SDE database to be queried from the operating system command prompt. Discussed below are the access and interface approaches for the system including ArcView database themes, Avenue requests and SDE API custom applications.

Interaction with the SDE data from ArcView is accomplished through the database themes extension. The extension is an add on of SDE function requests to ArcView. It provides the capability to use SDE layers as (database) themes and provides for interaction of SDE data objects through Avenue requests. Graphic and attribute data are returned for these requests.

A database theme may be created from the SDE layer containing the features of interest. The theme is similar to an ArcView feature theme and could be queried the using ArcView's query builder. Little or no additional development or customization is required to use database themes. Care must be taking to make wise use of the query theme properties which can reduce the number of features in the theme to those of interest.

Accessing the SDE data through Avenue requests provides extended capabilities from that of database themes. SDE layer information can be accessed (queried) directly without having to create a database theme using Avenue requests. Returned data can be presented to the user graphically as SDE features, converted and added to a feature theme, or converted and displayed as simple graphic data.

Defining and using the graphic search criteria as dictated by the requirements is straightforward from Avenue although more than a nominal development effort is required. SDE supports a wide range of geographic search criteria from within an Avenue script.

Defining the attribute portion of the query constraint is not well handled by Avenue. The standard ArcView Build Query Dialog can not be used to present information as it applies to themes, and the Avenue developer has no access to its contents. Avenue message boxes could be used to prompt the user to enter attribute constraints but this is not the most efficient of interfaces and makes a simple process for the user, lengthy, complex or prone to error. Having predefined queries selectable from a list is also an option but this approach would not provide the capability to perform ad-hoc queries. The interface could also be developed using a COTS product that has both a graphic user interface builder and inter-application communication capabilities. Communication to ArcView would be would be through RPC or DDE requests.

ArcView's graphic user interface (GUI) may be customized to make use of Avenue scripts that perform the necessary query and support functions. Menu items, buttons and tools can be added to the GUI to support the access requests.

The use of the SDE API is required to meet the need to query the SDE database from a operating systems command prompt. The developed program would make use of command line arguments which are a geographic search envelope followed by attribute constraints. The application returns the attribute data only.

Conclusion

The Spatial Database Engine (SDE) is a server of GIS data stored in a relation database management system and a provider of GIS services. Traditionally GISs and RDBMSs have worked together with the GIS providing management and storage for spatial data and the RDBMS for attributes. With SDE, GIS data is stored within the RDBMS and the engine provides the management, access and spatial analysis functions. It is straightforward to integrate SDE technology into a system by making use of ArcView database themes, and Avenue SDE requests. Additionally, there is a "C" API which can be used to integrate SDE into a system. The use of COTS products does not eliminate the need to make design and development choices in the building of a system.

This paper has described database, import, archive, delete and access/processing requirements and possible implementation options for a system that is under development. Selection of the best option has yet to be finalized and will be accomplished through the analysis of the options. This may involve the use of rapid prototyping of the solutions. The requirements of the system, specifically the need to access and maintain a large volume of spatially referenced features in a multi-user, multi-system environment, lends it to having SDE as a key component.


Thomas Quinn, MTS
TASC, Inc.
12100 Sunset Hills Road
Reston, VA 22090
Telephone: (703) 834-5000
Fax: (703) 318-7900
E-mail: taquinn@tasc.com