Spatial Data Management in an Enterprise GIS

Stuart Rich, St. George Consulting Group
Amar Das, St. George Consulting Group
Christopher Kroot, Maine Department of Environmental Protection

This paper will describe the experience of the St. George Consulting Group and the Maine Department of Environmental Protection in creating a spatial data management infrastructure for an Enterprise GIS. We will discuss the issues involved in creating large-extent spatial datasets, data inventory and organization issues, data migration issues, tuning experience for ArcSDE in Oracle 8i, and the storage of imagery in ArcSDE. We will also discuss the advantages of relational database storage of spatial data and the architecture of the Geodatabase and ArcSDE.


1. What is an “Enterprise”?

There are many different definitions of the word “Enterprise”. For the purposes of our discussions here, we do not take “Enterprise” to mean NCC-1701 Constitution-class starship, but rather “Any organization that needs to support multiple concurrent users accessing a shared information resource.” This may mean a three or four person shop concurrently working on a single project or it may mean several thousand people spread out over the globe networked together with a Wide Area Network or the Internet.

2. What are the data management needs of an Enterprise?

The information management needs of an organization change dramatically as soon as any group of users requires multiple concurrent use of any information asset. This basic requirement, with its attendant requirements for security, record level locking, edit conflict resolution, etc., is the prime force behind the evolution of the modern Relational Database Management System (RDBMS). It is this need for centralized management of shared concurrent access to an organization's information assets that we take to be the primary differentiator of an Enterprise from other organizations. In its simplest form, the problem boils down to providing your users with secure, dependable access to centrally managed information for the organization.

3. The old spatial data models

Up to this point, we have not mentioned Spatial Data specifically at all. This is no oversight. Geographic Information Systems are merely a small subset of the Information Systems of an Enterprise. The GIS acronym stands for Geographic Information System after all. In many ways, the Spatial Data management needs of an Enterprise are little different from its other data management needs. Unfortunately, until very recently, GIS data models have not kept pace with some of their more sophisticated RDBMS cousins and have traditionally been file based. From an Esri perspective, the traditional geographic data models have included:

a. Coverages -- The basic spatial model for ArcInfo. The coverage is a very solid data model that has served us very well for many years. The data model includes internal topology, and is very rigorous about enforcing proper feature construction.

b. Shapefiles -- The basic spatial data model for ArcView. The Shape File model is much less rigorous (some would say sloppy) about enforcing feature integrity and relies on run-time calculation for topology.

c. Librarian -- Librarian layers are collections of adjacent coverages, each of which is referred to as a 'tile'. The tiles are defined in an index coverage, which is a polygon coverage of just the space each tile takes up (such as a USGS map boundary). Tiles do not have to be equal in size or shape, but usually are. All tiles have to conform to the tile boundaries specified by the index coverage. The index coverage contains an item for each layer in the library, and the record for each tile is merely the path to that tile. ArcInfo finds the library based on an entry in the system's INFO file, then looks in the library's index coverage to see what layers it contains, then looks for the tiles wherever the paths specify. The advantage of libraries is that the whole layer is not rendered as you zoom in, only the tiles within your extent, which is very similar to SDE's use of a spatial index. The disadvantage is that a library is still built on the coverage model and thus does not support multi-user editing and display very well. Our tests have shown that library layer display slows to a crawl as more users are added.

d. GeoTIFFs (and other spatially registered images) -- a file based spatial data model for rasters where each pixel has a spatial representation but rather little attribute depth.

e. GRIDs - The ArcInfo representation of a Raster image that allows for greater attribute depth for each pixel in the file.

f. Image Catalogs -- An image catalog is similar to a library layer, but each catalog has its own database file. The database file is nothing more than a table of minX, minY, maxX, maxY, and image location. It specifies the extent of an image and where it is stored. In this manner, users can bring in image tiles as an apparent mosaic without finding all the tiles and piecing them together. Image catalogs were not supported at ArcMap 8.0.x, but are at 8.1.
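
The extent table behind an image catalog (and, in much the same way, the index coverage behind a library) supports a simple lookup: given the current view extent, return only the tiles that overlap it. The Python sketch below is purely illustrative. The record layout mirrors the minX, minY, maxX, maxY, and image location columns described above, but the names, paths, and coordinates are hypothetical and this is not how ArcInfo or ArcMap implement the lookup internally.

```python
from typing import List, NamedTuple, Tuple

# A view or tile extent as (min_x, min_y, max_x, max_y)
Extent = Tuple[float, float, float, float]


class CatalogEntry(NamedTuple):
    """One row of a hypothetical image catalog table."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    image: str  # illustrative path to the image tile


def intersects(entry: CatalogEntry, view: Extent) -> bool:
    """Return True when the entry's extent overlaps the view extent."""
    vmin_x, vmin_y, vmax_x, vmax_y = view
    return not (
        entry.max_x < vmin_x or entry.min_x > vmax_x
        or entry.max_y < vmin_y or entry.min_y > vmax_y
    )


def images_for_view(catalog: List[CatalogEntry], view: Extent) -> List[str]:
    """List the images that must be read to draw the given view."""
    return [entry.image for entry in catalog if intersects(entry, view)]


# Four made-up tiles forming a 2 x 2 mosaic
catalog = [
    CatalogEntry(0, 0, 10, 10, "tiles/sw.tif"),
    CatalogEntry(10, 0, 20, 10, "tiles/se.tif"),
    CatalogEntry(0, 10, 10, 20, "tiles/nw.tif"),
    CatalogEntry(10, 10, 20, 20, "tiles/ne.tif"),
]

# A view that sits entirely inside the south-west tile
print(images_for_view(catalog, (2, 2, 8, 8)))
```

A view that straddles the centre of the mosaic, such as (8, 8, 12, 12), would return all four tiles, which is the "apparent mosaic" behavior the user sees.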

There are several significant limitations to any file-based data model. Concurrent user access typically degrades performance dramatically and it is not possible to support multiple concurrent users editing a single file. Further, there are limitations to the size of any physical layer in the file system. The file size limit is largely a function of the density of the data involved, but in many instances it becomes necessary to subset large contiguous spatial datasets in order to obtain adequate performance. The Librarian structure was developed to streamline some of the resulting problems inherent in tiling datasets, but it remains a rather incomplete solution.

4. The new spatial data models

Over the course of the past five years or so, Esri and most of the major database vendors have begun the process of developing spatial data models based upon Relational and Object-Relational Database Management Systems. The aim of these development efforts is to take advantage of advances in relational database technology in order to provide the Enterprise features lacking in a file-based data model. Security, multiple concurrent user access, and spatial indexes are dramatic improvements over the traditional spatial data models, and all are available through the new Spatial Database models. While many of the major database vendors (Oracle, Informix, IBM) have introduced their own proprietary spatial database formats, for the purposes of this paper, we will concentrate on Esri’s implementation of Spatial Database Technology.

a. SDE layers

The Esri Spatial Database Engine (SDE) has been around for several years now and has achieved great performance advantages over file based spatial data models. SDE creates a multi-tiered spatial index scheme on your spatial data, allowing a user to extract and render a subset of a very large spatial data layer very quickly. This capability allows a spatial data administrator to move away from the tiled spatial data model and create seamless data layers for the entire geographic extent of interest to the users. While you can load spatial data into SDE from almost any data format with the appropriate software, SDE enforces a much more rigorous spatial data model than shape files, and shape files can sometimes cause trouble when being loaded into SDE. We recommend that shape files be converted to coverages before being loaded into SDE. SDE is currently the only spatial data format that is visible to all of the Esri clients.
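
To give a feel for why a multi-tiered grid index makes extent queries fast, here is a minimal Python sketch of the general technique. It is an assumption-laden illustration, not SDE's actual implementation: the cell sizes, the rule of promoting a feature to a coarser grid when it covers more than four cells, and all names are hypothetical.

```python
from collections import defaultdict
from math import floor


def overlaps(a, b):
    """True when two (min_x, min_y, max_x, max_y) envelopes intersect."""
    return not (a[2] < b[0] or a[0] > b[2] or a[3] < b[1] or a[1] > b[3])


class GridIndex:
    """A toy multi-level grid index over feature envelopes."""

    def __init__(self, cell_sizes=(100.0, 1000.0), max_cells=4):
        self.cell_sizes = cell_sizes  # finest grid first (assumed values)
        self.max_cells = max_cells    # assumed promotion threshold
        # One dictionary per grid level: (col, row) -> set of feature ids
        self.levels = [defaultdict(set) for _ in cell_sizes]
        self.envelopes = {}

    def _cells(self, env, size):
        """Yield every (col, row) cell of the given size that env touches."""
        min_x, min_y, max_x, max_y = env
        for col in range(floor(min_x / size), floor(max_x / size) + 1):
            for row in range(floor(min_y / size), floor(max_y / size) + 1):
                yield (col, row)

    def insert(self, fid, env):
        """Store the feature at the finest level where it spans few cells."""
        self.envelopes[fid] = env
        last = len(self.cell_sizes) - 1
        for level, size in enumerate(self.cell_sizes):
            cells = list(self._cells(env, size))
            if len(cells) <= self.max_cells or level == last:
                for cell in cells:
                    self.levels[level][cell].add(fid)
                return level

    def query(self, env):
        """Return ids of features whose envelopes overlap the query extent."""
        candidates = set()
        for level, size in enumerate(self.cell_sizes):
            for cell in self._cells(env, size):
                candidates |= self.levels[level].get(cell, set())
        # The grid only narrows the search; confirm with an envelope test.
        return sorted(f for f in candidates if overlaps(self.envelopes[f], env))


index = GridIndex()
index.insert("well", (10, 10, 12, 12))     # small feature, stays on the fine grid
index.insert("river", (0, 0, 950, 40))     # long feature, promoted to the coarse grid
index.insert("pond", (800, 800, 820, 820))
print(index.query((500, 0, 600, 50)))
```

The query touches only the handful of grid cells under the view extent, so the cost depends on what is on screen rather than on the size of the whole layer. That is the property that lets a seamless statewide layer draw as quickly as a single tile.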

b. The Geodatabase

The Geodatabase is, in our humble opinion, the most significant advance in the spatial data model in thirty years. The Geodatabase, as implemented with ArcGIS 8.1, is an object-relational data model that enables tremendous new capabilities in our attempts to model the world around us. For the first time, we can begin to model the behaviors of the spatial objects in the world around us and not just the attributes of those objects. Coupled with the relational database technology that gives these new models their persistence, these new data models will deliver great new flexibility to users of spatial data in the years to come.

There is a tremendous amount of confusion currently about this new term “Geodatabase”. While the underlying technology that supports a Geodatabase is a group of relational database tables administered by SDE, the Geodatabase itself exists as a group of COM objects within ArcGIS as it is running on a MS Windows platform. For this reason, technologies that are not running within the MS Windows memory space (ArcIMS, ArcExplorer) or applications that have not been built to see the new Geodatabase objects (ArcView 3.X) will not be able to take advantage of the exciting new opportunities made available by these new data models. It is important to realize that while SDE may manage the storage aspects of a Geodatabase, SDE does not understand any of the custom behaviors that may have been defined for Geodatabase objects or the relationships that have been established within the Geodatabase.

Within the Geodatabase, there are several base object classes to enable the storage and management of spatial objects. These base classes include:

i. Feature Classes - This is the most basic type of Geodatabase object. You can think of it as roughly analogous to a shape file or an individual layer of a coverage. A feature class stores a group of features with a shared geographic extent, spatial reference, attribute table, etc. Each individual point, line, or polygon within a feature class is a separate object within the feature class.

ii. Feature Datasets - A feature dataset is intended to store a group of feature classes that share some sort of spatial relationship. For example, you might create a ‘Political Boundaries’ feature dataset that includes Towns and Counties feature classes, where Towns and Counties share some coincident boundaries. You could then establish editing relationships that ensure that whenever a Town boundary is moved, any shared coincident geometry in the associated County is also moved. There is no specific requirement of Feature Datasets that the member Feature Classes have spatial relationships.

It is technically possible to utilize Feature Datasets as logical data organization mechanisms. For example, you could create a ‘Hydrography’ Feature Dataset that included Rivers, Ponds, Lakes and Streams. None of these feature classes would share any spatial relationship, but the Feature Dataset would be used to logically organize the data and make it more easily accessible by the Enterprise users.

There is a significant problem associated with this “misuse” of the Feature Dataset, however. Whenever a Feature Class within a Feature Dataset is opened for editing, all of the other Feature Classes within that Feature Dataset are also opened in order to check for spatial relationships. This can create severe performance problems in many cases. We recommend that Feature Datasets be used only to store Feature Classes with shared spatial relationships (for which they were intended after all) and that other mechanisms be developed for data organization and usability purposes.

We have developed a custom extension to ArcMap for this very purpose for our clients at the Maine Department of Environmental Protection. Using this tool, the Spatial Data Administrator can create a multi-tiered folder structure to organize the Enterprise feature classes into logical groupings for the user. This approach has the added benefits of allowing the Spatial Data Administrator to include a single Feature Class in several places in the folder structure and it allows the users to view all available Enterprise spatial data from within ArcMap without having to start ArcCatalog or go to the Add Data dialog box.

iii. Rasters - An SDE Raster represents the capability of storing raster data within a RDBMS. All of the justifications of centralized information management, security, multiple concurrent users of central information assets, and the performance gains inherent in the spatial indexing of very large datasets apply here.

iv. Network objects - Spatial data models that require a network topology (transportation road networks, water distribution networks, electrical distribution networks, etc.) require the Geodatabase to be implemented. SDE alone does not support a network topology data model (nor do most of the proprietary RDBMS spatial data solutions). The Geodatabase, however, fully supports the network data model and several Esri business partners (most notably Miner and Miner) have developed custom network data models for water/wastewater distribution and electrical distribution.

v. Custom objects - As we just mentioned with the Miner and Miner example, it is very possible to create custom data models within the Geodatabase. Much work has already been done to develop a custom hydrography data model by the USGS and others. The EPA is working with Ross Associates to develop a custom data model for regulated facilities. With the new custom data modeling capabilities of the Geodatabase, it is now possible to create spatial data models with much more depth that more accurately represent the objects in the world around us.
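
The difference between storing attributes and modeling behavior can be sketched in a few lines of Python. This is only an analogy: real Geodatabase objects are COM objects inside ArcGIS, and the class names, attribute names, and validation rules below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """A plain feature: geometry plus attributes, with no rules of its own."""
    geometry: tuple
    attributes: dict = field(default_factory=dict)

    def validate(self):
        """A plain feature accepts any attribute values."""
        return []


@dataclass
class WaterMain(Feature):
    """A hypothetical custom object that carries its own validation behavior."""

    # Assumed attribute domain, for illustration only
    ALLOWED_MATERIALS = ("ductile iron", "PVC", "copper")

    def validate(self):
        """Return a list of rule violations; an empty list means valid."""
        errors = []
        if self.attributes.get("material") not in self.ALLOWED_MATERIALS:
            errors.append("material is outside the allowed domain")
        diameter = self.attributes.get("diameter_mm", 0)
        if not 50 <= diameter <= 1200:
            errors.append("diameter_mm must be between 50 and 1200")
        return errors


good = WaterMain(((0, 0), (10, 0)), {"material": "PVC", "diameter_mm": 200})
bad = WaterMain(((0, 0), (10, 0)), {"material": "wood", "diameter_mm": 10})
print(good.validate())
print(bad.validate())
```

The point of the sketch is that the rules travel with the object rather than living in each client application. This is also why a client that reads only the underlying tables sees the attributes but none of the behavior.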

5. Implementing an Enterprise Geodatabase.

If you have gotten to this point in our discussion and are starting to think that perhaps an Enterprise Geodatabase represents some great advantages for your organization, then the next question you will be asking is “How would I go about implementing an Enterprise Geodatabase within my organization?” You will hear a number of presentations this week at the User’s Conference that imply that all you have to do is create a Geodatabase, drag and drop a few shape files into it and Presto! you have an Enterprise Geodatabase. While this statement may be technically correct, designing an Enterprise GIS that is appropriate for your organization will require much more thought and planning to get it right. Here are just a few other minor considerations that should go into your planning of an Enterprise GIS for your organization:

a. Architecture Design

i. How would my users like to apply GIS within their daily work flow? Be honest here. Your users would probably like to do a whole lot more with GIS than they are currently doing but are limited by the fact that data is hard to find, tiled in inconvenient ways, hard to get over the network, and they are unsure whether they have the most up to date copy. And, oh by the way, the desktop GIS software (if you are still using ArcView 3.X) doesn’t give them the kinds of GIS capabilities that they would really like. Before you can develop an Enterprise GIS that fulfills your users’ needs, you will first have to document what these needs are. This documentation process can be difficult and time consuming and cannot be done from the comfort of your office. Get out there and talk to your users and figure out what it is that they really need to do with GIS. Write it down. Don’t dismiss any of their requests as impossible before you document them. Prioritize their requests.

ii. What kind of bandwidth is available within my organization? GIS datasets can be very large and dense. Moving even subsets of these datasets across the Enterprise network for manipulation or viewing purposes can have serious network performance implications. If you have a high capacity network and a relatively small number of GIS users each with a relatively powerful workstation, then connecting each workstation directly to the Enterprise Geodatabase over the network is a very viable solution. If, on the other hand, you have a large number of distributed users with less than ideal workstations and shared, modest bandwidth, then you will be better off setting up a central Citrix server on a high capacity network link to your Enterprise Geodatabase and serving your GIS client applications over Citrix connections to those users that need GIS desktop applications.

As an example, at our St. George Consulting Group offices in Rockland, we have a 100 Mb switched network and all of our GIS users (about eight) have relatively capable workstations. Running ArcGIS locally on the workstation and connecting to the Enterprise Geodatabase works just fine in this environment.

At the Maine DEP, where they have over a hundred users in four regional offices throughout the state, many of the regional offices are sharing a 10 Mb connection to the main office and have less than exciting workstations on the desktop. In this situation, the Citrix deployment model has delivered outstanding access to GIS applications to these distributed users over the Wide Area Network.
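
A rough back-of-the-envelope calculation shows why the two sites call for different deployment models. The Python sketch below treats both link speeds as megabits per second and assumes a 50 megabyte data subset and five simultaneous users on the shared link. Those last two numbers are invented for illustration, and the calculation ignores protocol overhead and latency.

```python
def transfer_seconds(size_megabytes, link_megabits_per_second, concurrent_users=1):
    """Estimate the time to move a dataset over a link shared evenly by users."""
    size_megabits = size_megabytes * 8
    share_per_user = link_megabits_per_second / concurrent_users
    return size_megabits / share_per_user


# One user on a 100 Mb switched office network
print(transfer_seconds(50, 100))

# Five users sharing a 10 Mb wide area link
print(transfer_seconds(50, 10, concurrent_users=5))
```

Under these assumptions the same request takes seconds on the local network and minutes over the shared link. With a Citrix deployment the data stays next to the server and only screen updates cross the slower connection.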

iii. What are the capabilities of my users’ desktop machines? As we have just described, ArcGIS requires a pretty substantial workstation for optimal performance. Don’t give GIS a bad name by delivering a high performance software package on inadequate hardware.

iv. Do my users need to publish GIS data or services outside our internal network? Internet mapping infrastructure deserves a whole presentation of its own. We will not try to go into all of the considerations for designing an ArcIMS architecture here. There are a few points that we do feel should be made, however. First of all, it is very important to recognize that ArcIMS is NOT a GIS desktop application, but rather is a tool for publishing pre-defined maps over the internet. Do not think of ArcIMS as a replacement for ArcView. Secondly, remember that ArcIMS can utilize SDE layers, but will not be able to take advantage of most of the sophisticated capabilities of the Geodatabase. If you need to have access to Geodatabase objects from within ArcIMS, you will need to do some pretty sophisticated programming and utilize ArcGIS 8.1 as a GeoObject server (not something that is handled within the current licensing language of ArcGIS).

b. Capacity planning (hardware)

i. How large and complex is my data? Unfortunately, there are no simple, elegant formulas to translate the size of a shape file into the size of an equivalent Feature Class. Spatial database tuning is an interesting blend of art and science and involves a lot of trial and error.

ii. How many concurrent users must I support? There is an excellent white paper available on the Esri web site on System Architecture Design by Dave Peters of Esri. This paper will give you some good guidelines for hardware capacity planning.

iii. What kind of spatial operations do my users want to do? Keep in mind that not all users are created equal. Users that are concurrently editing a networked data layer will require more hardware resources than those that are selecting and drawing point layers.

c. Security Planning

i. What are the editing needs of my users? Which layers should be visible to which groups of users? Which groups of users should be able to edit which layers? Am I serving any sensitive data? You will need to develop a security plan for your enterprise that takes these issues into consideration.
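
One way to start a security plan is to write the answers to these questions down as a simple matrix of groups, layers, and privileges before translating them into database roles and grants. The Python sketch below is illustrative only. The group names, layer names, and privilege labels are hypothetical, and an actual deployment would enforce these rules in the RDBMS rather than in application code.

```python
# Hypothetical security matrix: group -> layer -> set of privileges
PRIVILEGES = {
    "gis_editors": {"towns": {"view", "edit"}, "wells": {"view", "edit"}},
    "staff": {"towns": {"view"}, "wells": {"view"}},
    "public": {"towns": {"view"}},
}


def allowed(group, layer, action):
    """Return True when the group holds the given privilege on the layer."""
    return action in PRIVILEGES.get(group, {}).get(layer, set())


def layers_for(group, action="view"):
    """List the layers on which a group holds the given privilege."""
    layers = PRIVILEGES.get(group, {})
    return sorted(name for name, actions in layers.items() if action in actions)


print(allowed("staff", "wells", "edit"))
print(layers_for("public"))
print(layers_for("gis_editors", "edit"))
```

Keeping the matrix in one place makes it easy to review with the data owners, and sensitive layers stand out because they are simply absent from the wider groups.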

d. RDBMS Software Selection

The choice of a particular RDBMS vendor for your Enterprise will likely not be determined by a list of required functionality that one vendor supports while others do not. For the most part, all of the major RDBMS vendors will be able to support the majority of your requirements. Your spatial RDBMS vendor selection will therefore fall on several other criteria:

i. Do I have any particular RDBMS skills in house? If you already have RDBMS experience with a particular Enterprise database in house (MS Access is NOT an Enterprise database) then this vendor is most likely your best choice to implement your Enterprise Geodatabase.

ii. How large is my installation likely to get? If your Enterprise Geodatabase is likely to get very large and be distributed among several different offices, then Oracle and IBM probably offer the most scalable RDBMS platforms supporting advanced database replication.

iii. Do I have requirements to integrate spatial data with non-spatial applications? If you have non-spatial database applications within your organization that would be enhanced by GIS integration, then it makes the most sense to keep the RDBMS platform consistent across all applications.

e. GIS Software Selection - If you are going to create an Enterprise Geodatabase, you must remember that the only clients that can view all of the capabilities of this new data model will be ArcGIS clients. There are essentially three different flavors of ArcGIS to choose from, each implemented with different capabilities of ArcMap, ArcCatalog, and ArcToolbox. ArcView 8.1 is able to select and analyze data from an Enterprise Geodatabase, but is unable to edit within this environment. ArcView 8.1 is only able to edit shape files and personal geodatabases. ArcEditor is able to edit data within an Enterprise Geodatabase, but does not have all of the geo-processing tools available with ArcInfo workstation. ArcInfo 8.1 is the full-blown, top of the line product.

f. Training - Moving to an entirely new data model, often accompanied by a change in the GIS desktop software, will obviously require some user training to help your users make the most of your GIS investments. There is generic training on how to use the software products available through Esri (Introduction to ArcGIS etc.), but every Enterprise Geodatabase is a unique installation and you should plan on investing a fair amount of time and energy in developing user training that is specific to your installation.

6. Planning for the migration

OK, so once you have designed the systems architecture that will eventually house your new Enterprise Geodatabase, how do you start making the migration?

a. Existing spatial data inventory - Start by taking an inventory of all of your existing spatial data. How many duplicate or overlapping datasets are there? If there are differences between duplicate datasets, how will you resolve the editing differences? How much metadata exists for your current data?

b. Creating seamless datasets - For many organizations, the most important advance that a spatial data administrator can deliver to his or her users when implementing an Enterprise Geodatabase is access to seamless layers for the entire geographic extent of the organization’s area of interest. Moving away from tiled data is usually greeted by great cheers from the users.

Getting there requires a good deal of work, however. The most dependable process for creating seamless datasets from tiled data is to convert all of the data to coverages, append the data into a single coverage using the appropriate snapping tolerances, resolve any editing problems, and then clean and build the seamless coverage.
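
The role of the snapping tolerance in the append step can be illustrated with a small Python sketch. This is a simplified stand-in for the idea and not the algorithm used by the ArcInfo tools: the function names, coordinates, and tolerance value are hypothetical, and real edge matching involves far more than vertex snapping.

```python
from math import hypot


def snap(point, accepted, tolerance):
    """Return an already accepted vertex within tolerance, else the point itself."""
    for other in accepted:
        if hypot(point[0] - other[0], point[1] - other[1]) <= tolerance:
            return other
    return point


def append_tiles(tiles, tolerance):
    """Merge the lines of several tiles, snapping near-coincident vertices."""
    accepted = []  # vertices already placed in the seamless layer
    merged = []
    for tile in tiles:
        for line in tile:
            snapped_line = []
            for point in line:
                snapped = snap(point, accepted, tolerance)
                if snapped not in accepted:
                    accepted.append(snapped)
                snapped_line.append(snapped)
            merged.append(snapped_line)
    return merged


# The same stream digitized in two adjacent tiles; the ends nearly meet
west = [[(0.0, 0.0), (10.0, 5.0)]]
east = [[(10.02, 5.01), (20.0, 5.0)]]

seamless = append_tiles([west, east], tolerance=0.05)
print(seamless)
```

With a tolerance of 0.05 the two line ends at the tile edge collapse onto a single shared vertex. A tolerance that is too small leaves gaps at every tile boundary, while one that is too large pulls together features that were meant to stay distinct, which is why the remaining editing problems still have to be resolved by hand.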

c. Spatial data loading - Once you have created your source data layers, you can either import these layers into your Enterprise Geodatabase using ArcCatalog or use SDE command line options. For large data loading operations, you will probably want to make some adjustments to the dbtune options in the Geodatabase for the duration of your data loading operations and then change them back to your production settings once the loading is completed.

d. Spatial data tuning - SDE tuning is a bit of a black art. With some of the RDBMS platforms, (Oracle in particular) significant performance gains can be made by adjusting table definitions, indexes, and in some cases the placement of portions of the physical data in different places on the file system. SDE tuning is way beyond the scope of this paper and we will not attempt to do the subject justice here.

7. Start small and build incrementally.

One of the principles of the Unified Software Development process that we have come to believe in very strongly is that of iterative and incremental development and deployment. You don’t need a pair of Sun 4500’s to begin your experiences with Enterprise Geodatabases. Start small. Learn the capabilities of the new software. Learn the strengths and weaknesses of your data. Roll out your Enterprise Geodatabase to a small number of users initially and test its performance under as many different user conditions as possible. As you gain experience with your users and your data, you will have a better understanding of how additional investments in your infrastructure could most effectively be made.

8. Looking ahead

So how does an Enterprise Geodatabase fit into the broader future of GIS?

a. LOTS more spatial data is becoming available. -- There is an explosion of new spatial data becoming available and this trend will only increase over the coming years. Your ability to acquire and serve more and better spatial data to your users will increase dramatically in the next ten years. Make no mistake, your users will expect to have access to this data.

b. Network capabilities improving - Though it has not happened as quickly as any of us would like to see, network bandwidth is steadily improving and becoming accessible to more users. As bandwidth barriers fall, expect to hear from more users requesting GIS data and applications.

c. Publishing maps on the Internet - The more sophisticated capabilities of the Geodatabase are not currently available to our current map publishing software (ArcIMS). But this limitation will not be with us for long. The Arc 8 development team has indicated that, with the release of ArcGIS 8.2, it will be possible to author ArcIMS services with ArcMap. Though the complete functionality to be delivered has not been described yet, we are hopeful that these services will be fully mindful of the entire Geodatabase model.

d. Spatial data services - Your Enterprise Geodatabase will not be the only source of data of interest to your users. New Geographic data services are becoming available that will fill interesting niches particularly in temporally sensitive data (weather for example). There may be requirements for your organization to provide some of these same geographic data services either internally or externally.

e. Integration of GIS into Enterprise Information Systems - As the major RDBMS vendors mature in their ability to deliver integrated spatial data storage capabilities within their databases, the ability to integrate spatial concepts into the rest of our business database applications will become much easier to achieve and the demand for this capability will become much more common.

 

For more information on Enterprise Geodatabases and for all of our current documentation on the work that we are doing with the Maine Department of Environmental Protection, please visit our web site at and visit the ‘Sharing’ link.



Stuart Rich
St. George Consulting Group
16 School Street
Rockland, ME 04841
sturich@stgeorgeconsulting.com

Amar Das
St. George Consulting Group
16 School Street
Rockland, ME 04841
adas@stgeorgeconsulting.com

Christopher Kroot
Maine Department of Environmental Protection
State House Station 17
Augusta, ME 04333
christopher.kroot@state.me.us

Michael Smith
Maine Department of Environmental Protection
State House Station 17
Augusta, ME 04333
michael.smith@state.me.us