Implementing a Spatial Data Management Infrastructure in a Marketing Organization


Eric Pimpler, SBC Communications, Inc.

Abstract:
This paper will describe the experience of the Marketing GIS group at SBC Communications in converting to the new ArcGIS platform and ArcSDE spatial data model within a departmental GIS setting. This paper will discuss the organizational and technical issues related to converting to the new platform and spatial data warehouse. Discussion items include user and organizational needs and objectives, systems architecture design, creating large extent spatial data sets, data inventory and organization issues, data migration issues, and the tuning experience for ArcSDE in Oracle 8i. This paper will also discuss the advantages of relational database storage of spatial data and the architecture of the geodatabase and ArcSDE.

I. Introduction

SBC Communications Inc. is a global leader in the telecommunications services industry. Through subsidiaries including SBC Southwestern Bell, SBC Ameritech, SBC Pacific Bell, SBC Nevada Bell, SBC SNET, Sterling Commerce, and Prodigy, along with a world-class network, SBC provides a full range of voice, data, networking, and e-business services to address the specific needs of individual businesses and consumers. SBC is America's leading provider of high-speed DSL Internet access service, with approximately 1.3 million subscribers, and one of the nation's leading Internet Service Providers (ISPs).

Headquartered in San Antonio, Texas, SBC has approximately 190,000 employees. SBC companies currently have more than 60 million access lines in 13 states along with a 60 percent equity interest in Cingular Wireless, its joint venture with BellSouth, which serves more than 21 million wireless customers.

The Marketing GIS Department was centralized in San Antonio, Texas, in January 2000 under SBC's Strategic Marketing Organization. The department's mission is to provide GIS analysis, mapping, reporting, and application development for the sales and marketing division. During the spring of 2000, the department began converting to the ArcGIS family of software products and the ArcSDE data model. Before this time, the group relied primarily on ArcView 3.x, Workstation ArcInfo, and Esri's file-based data models. This paper explores the rationale for converting to the new platform and data model, along with our experience in planning and implementing the new system architecture.

II. Overview of Esri Platforms and Data Models 

A. ArcGIS: A New and Improved GIS Platform
The new ArcGIS platform is a family of products composed of ArcView, ArcEditor, ArcInfo, ArcIMS, and ArcSDE built around information technology industry standards. It is a scalable system of software for geographic data creation, management, integration, analysis, and dissemination. The product family is a step forward from past Esri software releases for a number of reasons, including a single code base for its primary applications, data storage in a relational database management system (RDBMS), a new customization environment, and enhanced Intranet application development tools.

B. Old Spatial Data Models 
Unfortunately, until very recently, GIS data models have not kept pace with some of their more sophisticated RDBMS cousins and have traditionally been file based. From an Esri perspective, the traditional geographic data models have included coverages, shapefiles, grids, images, and triangulated irregular networks (TINs).

Coverages - Coverages are the original spatial model used with ArcInfo, and form the basic unit of vector data storage. They store geographic features as primary features (arcs, nodes, polygons, and label points) and secondary features (such as tics, map extent, links, and annotation). Associated feature attribute tables describe and store attributes of the geographic features.
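
To make the arc-node idea concrete, the following Python sketch models a few arcs that each carry a from-node, a to-node, and the polygons on their left and right. The class and field names are hypothetical and this is not the coverage file format; it only illustrates why adjacency questions are cheap when topology is stored explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One arc in a simplified arc-node model (hypothetical field names)."""
    arc_id: int
    from_node: int
    to_node: int
    left_poly: int   # polygon on the left when walking from_node -> to_node
    right_poly: int  # polygon on the right; 0 marks the outside in this sketch


def neighbors(arcs, poly_id):
    """Return polygons adjacent to poly_id, read from the stored left/right IDs."""
    result = set()
    for arc in arcs:
        if arc.left_poly == poly_id and arc.right_poly != poly_id:
            result.add(arc.right_poly)
        elif arc.right_poly == poly_id and arc.left_poly != poly_id:
            result.add(arc.left_poly)
    result.discard(0)  # ignore the outside polygon
    return result


# Two polygons (1 and 2) that share arc 3; polygon 0 is the outside.
arcs = [
    Arc(1, 1, 2, left_poly=1, right_poly=0),
    Arc(2, 2, 3, left_poly=2, right_poly=0),
    Arc(3, 2, 1, left_poly=2, right_poly=1),
]
print(neighbors(arcs, 1))  # {2}
```

Because the left and right polygon identifiers travel with each arc, finding neighbors is a scan of stored values rather than a geometric computation.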

Shapefiles - Shapefiles were introduced with the release of ArcView 2 in the early 1990s. A shapefile is a non-topological data structure that does not explicitly store topological relationships, but relies on run-time calculation for topology. The primary advantage of shapefiles is that their simple structure draws faster than coverages. Shapefiles are also easily copied and do not need import and export facilities. In recent years, shapefiles have become the leading format for transferring data.
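
By contrast, a shapefile stores only coordinates, so relationships such as adjacency must be derived when they are needed. The Python sketch below is a simplified illustration, not how ArcView itself computes topology: it finds polygons that share an edge by matching vertex pairs, and it assumes shared boundaries were digitized with identical vertices.

```python
from collections import defaultdict


def shared_edges(polygons):
    """Derive polygon adjacency at run time from vertex rings (no stored topology).

    polygons: dict mapping polygon ID -> list of (x, y) vertices (ring not closed).
    Returns a dict mapping frozenset({id_a, id_b}) -> list of shared edges.
    """
    edge_owners = defaultdict(set)
    for pid, ring in polygons.items():
        for i in range(len(ring)):
            a, b = ring[i], ring[(i + 1) % len(ring)]
            edge_owners[frozenset((a, b))].add(pid)  # orientation-independent key
    adjacency = defaultdict(list)
    for edge, owners in edge_owners.items():
        if len(owners) == 2:
            adjacency[frozenset(owners)].append(tuple(sorted(edge)))
    return dict(adjacency)


# Two unit squares side by side share the edge from (1, 0) to (1, 1).
squares = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
print(shared_edges(squares))
```

This work is repeated every time the relationship is requested, which is the trade-off a shapefile makes in exchange for its simpler, faster-drawing structure.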

GRIDs - Grids are based on a geographic data model that represents information as an array of equally sized square cells arranged in rows and columns. Each grid cell is referenced by its geographic x,y location.
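
Referencing a cell by its geographic location amounts to a simple conversion between x,y coordinates and row and column indexes. The Python sketch below assumes, for illustration only, that the grid origin is its upper-left corner (x_min, y_max) with rows counted downward; the function names and sample values are hypothetical.

```python
import math


def cell_for_xy(x, y, x_min, y_max, cell_size, n_rows, n_cols):
    """Map a geographic x,y location to a (row, col) index in a grid.

    Assumes rows are counted from the top edge (y_max) down and columns from
    the left edge (x_min), with square cells of side cell_size.
    """
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y_max - y) / cell_size)
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("point falls outside the grid extent")
    return row, col


def cell_center(row, col, x_min, y_max, cell_size):
    """Inverse mapping: geographic coordinates of a cell's center."""
    return (x_min + (col + 0.5) * cell_size, y_max - (row + 0.5) * cell_size)


# A 100 x 100 grid of 30-unit cells whose upper-left corner is (500000, 3300000).
print(cell_for_xy(500065, 3299950, 500000, 3300000, 30, 100, 100))  # (1, 2)
print(cell_center(1, 2, 500000, 3300000, 30))  # (500075.0, 3299955.0)
```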

TINs - Triangulated irregular networks (TINs) are surface representations derived from irregularly spaced sample points and breakline features. TIN data sets include topological relationships between points and their neighboring triangles. Each sample point has an x,y coordinate and a surface, or z-value. These points are connected by edges to form a set of non-overlapping triangles used to represent the surface.
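
One common way to read a surface value from a TIN is linear interpolation within the triangle that contains the query location. The Python sketch below shows that step using barycentric coordinates for a single triangle; locating the containing triangle and handling breaklines are left out, and the function name and sample points are hypothetical.

```python
def interpolate_z(p, tri):
    """Linearly interpolate a surface z-value at point p inside one TIN triangle.

    p:   (x, y) query location
    tri: three (x, y, z) sample points forming a non-degenerate triangle
    Raises ValueError if the triangle is degenerate or p lies outside it.
    """
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = tri
    x, y = p
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate triangle")
    # Barycentric weights: each is the relative influence of one vertex.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -1e-12:
        raise ValueError("point lies outside the triangle")
    return w1 * z1 + w2 * z2 + w3 * z3


# Three sample points lying on the plane z = 10 + x + 2y.
tri = [(0, 0, 10), (10, 0, 20), (0, 10, 30)]
print(interpolate_z((2, 3), tri))  # 18.0
```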

Images - Images are graphic representations of a scene, typically produced by an optical or electronic device. Common examples include remotely sensed data, scanned data, and photographs. An image is stored as a raster data set of binary or integer values that represent the intensity of reflected light, heat, or another range of values on the electromagnetic spectrum.

All file-based data models have significant limitations. Concurrent user access typically degrades performance dramatically, and it is not possible to support multiple concurrent users editing a single file. In addition, there are limits to the size of any layer stored in a file system. As files become larger, application performance typically degrades to the point where it becomes necessary to split large contiguous datasets into tiles to achieve adequate performance.

C. New Spatial Data Models

To overcome the inherent obstacles of file-based data models, Esri has developed spatial data models based on Relational Database Management Systems (RDBMS). These new data models take advantage of advances in relational database technology to provide the features lacking in file-based models. The advantages of these new models are many, and include multiple concurrent user access to contiguous datasets, management of spatial and business data in an integrated environment, spatial indexes, versioning, long transactions, security, and support for intelligent features. Although most database vendors have implemented proprietary spatial data models in their products, this paper will concentrate on Esri's spatial data models.

ArcSDE - ArcSDE acts as a broker between Esri's client products and the RDBMS. It manages spatial data in the RDBMS while allowing a user to extract and quickly render a subset of a large spatial data layer. This extraction is accomplished through the use of a multi-tiered spatial index scheme. As a result, administrators are able to move away from the tiled data model and create seamless data layers covering an entire geographic extent of interest. In addition, ArcSDE allows concurrent user access to layers, along with versioning, long transactions, and all the other advantages of using an RDBMS to store data.
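
The idea behind a grid-based spatial index can be shown with a simplified, single-level Python sketch. Each feature's envelope is registered in every grid cell it touches, and a display query first gathers candidates from the cells under the query window, then keeps only those whose envelopes intersect it. ArcSDE's actual index uses multiple grid levels and its own storage tables, so the class and method names here are hypothetical.

```python
import math
from collections import defaultdict


def _intersects(a, b):
    """True if envelopes a and b, each (xmin, ymin, xmax, ymax), overlap or touch."""
    return not (a[2] < b[0] or a[0] > b[2] or a[3] < b[1] or a[1] > b[3])


class GridIndex:
    """Single-level grid index over feature envelopes (illustrative, not ArcSDE's)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(set)  # (col, row) cell key -> feature IDs
        self.envelopes = {}            # feature ID -> envelope

    def _cells_for(self, env):
        """Yield every grid cell key that the envelope touches."""
        xmin, ymin, xmax, ymax = env
        s = self.cell_size
        for i in range(math.floor(xmin / s), math.floor(xmax / s) + 1):
            for j in range(math.floor(ymin / s), math.floor(ymax / s) + 1):
                yield (i, j)

    def insert(self, fid, env):
        self.envelopes[fid] = env
        for key in self._cells_for(env):
            self.cells[key].add(fid)

    def query(self, window):
        """Primary filter via grid cells, then an envelope intersection test."""
        candidates = set()
        for key in self._cells_for(window):
            candidates |= self.cells.get(key, set())
        return {fid for fid in candidates if _intersects(self.envelopes[fid], window)}


idx = GridIndex(cell_size=100)
idx.insert(1, (10, 10, 50, 50))
idx.insert(2, (150, 150, 260, 180))
idx.insert(3, (900, 900, 950, 950))
print(sorted(idx.query((0, 0, 200, 200))))  # [1, 2]
```

Feature 3 is never examined for the sample query because none of its cells fall under the window; that skipping is what lets a client render a small subset of a seamless layer quickly. A production system would typically follow the envelope test with an exact geometry comparison.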

Geodatabase - There has been a significant amount of confusion surrounding the Geodatabase concept since its introduction. The underlying database technology that supports the Geodatabase is simply a group of relational database tables administered by ArcSDE. From this perspective, the Geodatabase doesn't differ from ArcSDE. The true power of the Geodatabase concept lies in the group of COM objects within ArcGIS running on a Windows platform. In effect, the Geodatabase builds on the storage component provided by ArcSDE, adding custom behaviors implemented in the COM objects and relationships defined in the Geodatabase.

III. Planning the Migration to the ArcGIS Platform and ArcSDE Data Model

Due to the advantages of the new Esri ArcGIS platform and spatial data models, SBC Communications embarked on a project to switch to the new data models and ArcGIS platform in the spring of 2000. This conversion was not a simple process, and many lessons were learned along the way. A large GIS implementation cannot be successfully completed without a rigorous planning process. This planning process, known formally as a system architecture design, enables organizations to accurately assess their current system design, software license structure, user needs, future systems design, hardware requirements, and implementation details.

A. Current System Design
As a first step in migrating to the new spatial data models, SBC Communications completed a review of the then-current system design. The system design consisted of a traditional desktop workstation environment accessing data from a centralized GIS file server. Desktop workstations included software licenses for ArcView 3.x running on Windows NT and UNIX platforms, along with Workstation Arc/INFO licenses running on SUN workstations. These desktop workstations were connected to a SUN 1000E centralized GIS file server through a local area network (LAN). In addition, several rudimentary ArcView 3.x Internet Map Server applications were being served through a Windows NT Server running Internet Information Server 4.0.

B. Current Software License Structure
The software license structure consisted of floating Workstation Arc/INFO and ArcView 3.x licenses controlled through the License Manager located on a SUN 1000E server. In addition, several GRID and TIN licenses were available for use. Furthermore, each user desktop had a copy of ArcView 3.x available from their Windows NT desktop. Several ArcView Extensions including Spatial Analyst, 3D Analyst, and Internet Map Server were also licensed. 

C. User Needs Assessment
The Marketing GIS group was established to provide both spatial analysis and ad hoc support for the creation of maps and reports. To meet these requirements the group used ArcView 3.x and Workstation Arc/INFO as its primary tools for creating maps and reports in response to ad hoc requests. The group used a variety of geographic data sources including third party vendors such as Geographic Data Technology (GDT) and Claritas along with internally created telecommunications datasets. Non-spatial customer data was accessed via legacy databases running in a mainframe environment. Requests for internal customer data were submitted through the SBC Reports Group that pulled data from legacy databases and delivered the requested information in the form of flat text files. The spatial and non-spatial information would then be joined to produce the necessary analysis, maps, or reports. 
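
The join step described above can be sketched in Python as a one-to-one join of a delimited text extract onto feature attribute records. The pipe delimiter, the ZIP key field, and the AREA_SQMI and DSL_SUBS columns are hypothetical; in practice ArcView and ArcInfo perform this join through their own table tools.

```python
import csv
import io


def join_flat_file(features, flat_text, key, delimiter="|"):
    """Attach business attributes from a delimited text extract to spatial features.

    features:  list of dicts, each the attribute record of one spatial feature
    flat_text: text of the extract; the first line is a header row
    key:       column name present in both sources (hypothetical, e.g. "ZIP")
    Features without a matching row keep only their own fields (left outer join).
    """
    reader = csv.DictReader(io.StringIO(flat_text), delimiter=delimiter)
    rows = {row[key]: row for row in reader}
    joined = []
    for feat in features:
        match = rows.get(feat[key], {})
        joined.append({**feat, **{k: v for k, v in match.items() if k != key}})
    return joined


features = [
    {"ZIP": "78230", "AREA_SQMI": 12.5},
    {"ZIP": "78201", "AREA_SQMI": 8.1},
]
extract = "ZIP|DSL_SUBS\n78230|1520\n"
joined = join_flat_file(features, extract, key="ZIP")
print(joined[0])  # {'ZIP': '78230', 'AREA_SQMI': 12.5, 'DSL_SUBS': '1520'}
```

Values read from the text file arrive as strings, so a real workflow would also convert numeric fields before analysis.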

All spatial and non-spatial data sets were stored on a GIS file server running on a SUN 1000E platform. Arc/INFO coverages and shapefiles were employed as the primary data model. Since SBC services millions of customers throughout the United States, many of its spatial and business datasets are extremely large and their spatial extents are vast. Accessing this data was cumbersome, confusing, and inefficient using the file-based data models. The needs assessment revealed an obvious need to convert to the newer spatial data models.

At the time, web-based mapping efforts were virtually non-existent in the group. However, the needs assessment identified a need to create Intranet applications to relieve some of the common ad hoc requests and to provide a mechanism for creating custom mapping applications for the sales and marketing organization. ArcIMS was an obvious choice since it provides "out of the box" solutions for many common mapping applications.

D. New System Architecture Design
After reviewing the current system architecture design and software license structure and completing a user needs assessment, SBC designed a system architecture that could better serve the needs of the organization. The ArcGIS 8.x family of software comprising a complete geographic information system was selected as the basis for the new system architecture. 

Desktop versions of ArcInfo and ArcView would run in a desktop environment, and provide users with increased capabilities and productivity for mapping and analysis. End users at SBC would take full advantage of the data creation, update, query, mapping, and analysis functions provided through the desktop versions of ArcInfo and ArcView. 

The construction of a Marketing GIS Intranet site was also envisioned as part of the new architecture. This site would provide mapping applications and static maps, along with internal and vendor reports and analysis. In addition, a portal would be developed to provide access to internal and vendor supplied reports, analysis, journal articles, and a special collection of documents provided by Northern Light. This portal was to be developed in conjunction with Northern Light, and would provide a wealth of information to the marketing organization. A dedicated Compaq Proliant DL 580 Web Server running Windows 2000 with Internet Information Server along with ArcIMS and ColdFusion were identified as necessary components of the architecture. Intranet applications would be designed with a thin client approach using the HTML Viewer and Active Server Pages components of ArcIMS. 
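
To show what a thin client sends to the map server, the following Python sketch builds a minimal ArcXML GET_IMAGE request for a map extent and image size. The element and attribute names follow ArcXML as we understand it and should be checked against the ArcXML reference for the ArcIMS version in use; in practice the HTML Viewer's JavaScript assembles and posts these requests itself, and the function name here is hypothetical.

```python
import xml.etree.ElementTree as ET


def get_image_request(minx, miny, maxx, maxy, width, height):
    """Build an ArcXML GET_IMAGE request string for a map extent and image size."""
    arcxml = ET.Element("ARCXML", version="1.1")
    request = ET.SubElement(arcxml, "REQUEST")
    get_image = ET.SubElement(request, "GET_IMAGE")
    props = ET.SubElement(get_image, "PROPERTIES")
    # Map extent in the service's coordinate system.
    ET.SubElement(props, "ENVELOPE",
                  minx=str(minx), miny=str(miny), maxx=str(maxx), maxy=str(maxy))
    # Size of the image the server should render, in pixels.
    ET.SubElement(props, "IMAGESIZE", width=str(width), height=str(height))
    return ET.tostring(arcxml, encoding="unicode")


print(get_image_request(-98.7, 29.3, -98.3, 29.7, 600, 400))
```

Because the server renders the map and returns only an image reference, the browser needs no GIS software, which is the appeal of the thin client approach for a broad Intranet audience.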

For data storage, SBC selected an ArcSDE Data Warehouse environment running on an Oracle 8i relational database management system in conjunction with a traditional GIS file server each running on separate Compaq Proliant DL 580 Servers with Windows 2000. The ArcSDE server was set up primarily to provide support for GIS view and query clients. Initially, ArcIMS applications were seen as the primary consumer of data served through ArcSDE and Oracle. Traditional mapping, reports, and analysis functions would continue to take advantage of a traditional GIS file-based server until end users acquired the necessary training and skills with the new ArcGIS platform to take full advantage of the new ArcSDE spatial data model. 

For the most part, non-spatial business data would continue to be obtained through SBC's Reports Group on an as-needed basis. These files would continue to be provided as flat text files that could then be loaded into ArcInfo or ArcView for joining with spatial data sets. In some cases, this data would be provided as automated data extracts occurring at set time intervals. Internally developed extraction, transformation, and loading (ETL) tools would be written to strip the data from legacy databases, transform the data as necessary, and load the data into Oracle. This automated transfer of data was seen as a necessary component for several ArcIMS applications.
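
The extract, transform, and load steps can be sketched as follows. Python's sqlite3 module stands in for Oracle so the example runs anywhere; the pipe-delimited layout, the customer_summary table, and its columns are hypothetical. Returning loaded and rejected counts gives each scheduled load something to monitor.

```python
import csv
import io
import sqlite3


def run_etl(extract_text, conn):
    """Extract rows from a delimited legacy extract, transform them, load into a table.

    sqlite3 stands in for Oracle; table and column names are hypothetical.
    Returns (rows_loaded, rows_rejected).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_summary (zip TEXT PRIMARY KEY, lines INTEGER)"
    )
    good, rejected = [], 0
    # Extract: parse the header-led, pipe-delimited text.
    for row in csv.DictReader(io.StringIO(extract_text), delimiter="|"):
        # Transform: trim whitespace, validate the key, convert the count.
        zip_code = (row.get("ZIP") or "").strip()
        try:
            lines = int((row.get("LINES") or "").strip())
        except ValueError:
            rejected += 1
            continue
        if len(zip_code) != 5 or not zip_code.isdigit():
            rejected += 1
            continue
        good.append((zip_code, lines))
    # Load: one transaction that commits on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO customer_summary (zip, lines) VALUES (?, ?)", good
        )
    return len(good), rejected


conn = sqlite3.connect(":memory:")
extract = "ZIP|LINES\n78230|1200\n7823|50\n78201|abc\n78201| 300 \n"
print(run_etl(extract, conn))  # (2, 2)
```

In Oracle, the SQLite-specific INSERT OR REPLACE would typically be written as a MERGE statement.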

E. Hardware Requirements
In support of the new systems architecture design, Compaq Proliant DL 580 Servers running Windows 2000 were identified as the major hardware component. Several reasons were cited for selecting Windows servers over the more traditional UNIX environment. Although generally less stable than UNIX platforms, a Windows platform offers many advantages, including cost, ease of administration, and a more readily available supply of talent for development and administration. In addition, system availability was not identified as a critical component in the design.

IV. Implementation

Implementation of the new GIS architecture for the SBC Marketing Group began in the summer of 2000 and was essentially complete by January 2002. A number of steps including hardware and software purchasing, hardware setup, RDBMS installation, GIS software installation, user training, data loading, and data tuning were completed during this time. Although each of these issues requires careful planning and implementation, several deserve extra attention for any group planning to migrate to the new Esri platform and data models. 

A. System Administration
System administration is one of the most important yet overlooked aspects of any GIS implementation. A number of alternatives exist for handling this critical piece of the implementation. Probably the most frequently used scenario involves using information technology division employees to support the system. Advantages of this approach include the availability of 24x7 support, a ready supply of trained, experienced administrators, and other benefits that come with a centralized information technology group. However, charge-back costs to the group are frequently much higher than the costs of hiring and training system administrators who are directly under the control of the hiring group. The Marketing GIS Group at SBC Communications chose to administer its systems through employees hired to support the group directly rather than using traditional information technology group administrators. The cost savings of this approach were considerable.

Another system administration alternative includes hiring contractors for the initial implementation and on-going support. Since system administration is an on-going function this alternative is not cost effective in the long run. However, skilled contract administrators are often brought in during the implementation stage. Later, on-going administration duties can be handed over to internal administrators. 

B. RDBMS Software Selection
Selecting an RDBMS vendor for your project hinges on a number of considerations including existing RDBMS skills on staff, the size of your installation, integration of spatial and non-spatial data, and the level of support provided by Esri. Since most major RDBMS vendors provide similar functionality it is necessary to select a vendor based on the factors mentioned above.

Most of the major RDBMS software requires considerable training and experience to be used effectively. If you already have staff members that have experience using and administering a particular RDBMS product, this will probably be your best choice. It is of particular importance that someone on your staff be highly familiar with the administration of the RDBMS. Ultimately, system performance will be greatly determined by how well your database functions, and this is largely a function of the logical and physical database design implemented by the database administrator. In addition, it is important that the database administrator have an understanding of spatial data and how it should be physically structured in the database as well as knowledge of the particular vendor product. 

The size of your installation will also play a role in the selection of an RDBMS. Some products such as Oracle and DB2 do a better job of handling large amounts of data. Furthermore, in the case of distributed applications, these products tend to perform better. 

In the event that you have non-spatial databases in your organization that can be enhanced through the addition of a spatial component, it makes sense to use the same platform. Data can easily be transferred between systems using export and import functions.

Finally, if you will be using ArcSDE as a middleware product to administer and access data in an RDBMS, it should be noted that the level of information and support provided by Esri differs significantly between products. At this time, a great deal of information, support, and training are being provided for Oracle and Microsoft SQL Server. Other vendor products such as DB2 and Informix are supported, but limited information and training are available. 

The Marketing GIS Group at SBC Communications chose Oracle 8i as its RDBMS based on employee skill sets, size of installation, and the available support provided by Esri for using its ArcSDE product with Oracle. In-house skills included an Oracle 8i certified database administrator with experience in ArcSDE and spatial data, along with considerable SQL query experience among several other members of the group. In addition, the capability to store and efficiently access large datasets was a requirement for the implementation. Large spatial datasets from outside vendors, along with massive internal customer data, could potentially push the total database size into the terabyte range. Oracle has a proven track record for efficiently storing and accessing datasets of this size. Furthermore, the ease of transferring data between non-spatial Oracle databases in the organization and the ability to obtain training and support through Esri were contributing factors in the selection of Oracle as our RDBMS.

C. Data Loading
A number of methods were used to load spatial and non-spatial data into the existing Oracle/ArcSDE database, including vendor processes, BusinessObjects jobs, Oracle PL/SQL jobs, and Arc Macro Language (AML) scripts. Many of the spatial datasets provided by third party vendors such as GDT come with loading processes that automate the loading of large datasets. These processes can be started during off-hour times such as nights and weekends so that system performance is not affected during the load process. In addition, these processes are generally easy to use. However, loads should be monitored for problems or errors that may occur. Most vendor data is supplied on a quarterly basis.

Custom ETL jobs were also written in BusinessObjects and Oracle's PL/SQL to facilitate the loading of non-spatial customer data into Oracle. Because they were familiar with the internal customer data, SBC employees wrote these jobs. In general, most ETL jobs are run on a monthly basis.

AMLs were also written to load internally produced spatial datasets and some third party vendor data onto the GIS file server. The AMLs were produced using the "Batch" option available through the new ArcToolbox application.

D. Oracle and ArcSDE Tuning
Oracle and ArcSDE are flexible products that can be configured in numerous environments to support a wide range of applications. However, the performance of client applications accessing spatial and non-spatial data through ArcSDE and Oracle is largely dependent upon the skill and experience of the database administrator. Although there are general guidelines for setting up these products, tuning remains as much art as science. With Oracle, significant performance gains can be made by properly adjusting tables, indexes, and the physical placement of data within the operating system. Particular attention should be paid to the logical and physical designs of the database for use with ArcSDE. Typically an ArcSDE implementation will require a minimum of five tablespaces to store the geographic data: FEATURE, ATTRIBUTE, SPATIAL_INDEX, ORACLE_INDEX, and SDE. Each of these tablespaces will store different tables or indexes that are used with ArcSDE and its associated client applications. From an ArcSDE tuning perspective, significant attention should be given to properly setting the grid sizes of each layer, the parameters listed in the dbtune.sde file, and the giomgr.defs file. Both Oracle and ArcSDE tuning are subjects that extend beyond the scope of this paper, so we will not attempt to do them justice here.
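
The grid-size trade-off can be illustrated with a small calculation. The Python sketch below counts, for a candidate cell size, how many index entries a set of feature envelopes would generate and how many features share each occupied cell. It is a simplification for intuition only; it does not reproduce ArcSDE's multi-level grid or its sizing recommendations, and the sample envelopes are invented.

```python
import math


def grid_cost(envelopes, cell_size):
    """Rough cost figures for one candidate spatial-index grid size.

    Returns (index_entries, avg_features_per_occupied_cell).
    Small cells inflate the first number (each feature is written to many cells);
    large cells inflate the second (each cell hit yields many candidates to test).
    """
    cells = {}
    entries = 0
    for fid, (xmin, ymin, xmax, ymax) in enumerate(envelopes):
        for i in range(math.floor(xmin / cell_size), math.floor(xmax / cell_size) + 1):
            for j in range(math.floor(ymin / cell_size), math.floor(ymax / cell_size) + 1):
                cells.setdefault((i, j), set()).add(fid)
                entries += 1
    return entries, entries / len(cells)


# Four 10 x 10 features spread over a 30 x 30 area.
envs = [(0, 0, 10, 10), (20, 0, 30, 10), (0, 20, 10, 30), (20, 20, 30, 30)]
print(grid_cost(envs, 5))    # (36, 1.0)
print(grid_cost(envs, 100))  # (4, 4.0)
```

Neither extreme is ideal: the small grid bloats the index table, while the large grid forces every query to sift through features that may lie far from the display extent.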

E. Training
Moving to the new ArcGIS platform and ArcSDE data model required a considerable amount of training for the group. Each member of the group attended ArcGIS and ArcIMS training to assist in the transition. In addition, administrators obtained additional training in ArcSDE along with Oracle database administration training. 

F. Application Development with ArcIMS
A number of ArcIMS Intranet applications have been built and deployed through the Marketing GIS Group's website over the past year. These applications were built to fulfill specific needs in the marketing and sales organization, in addition to relieving many of the ad hoc mapping requests that filter through the group. All applications were built as thin clients using the HTML Viewer and the Active Server Pages connector. A sampling of the applications includes the following:

G. Challenges
All GIS implementations face challenges along the way, and this particular case is no exception. Probably the single biggest challenge we have faced and continue to face is the reluctance of analysts to switch to the new platform and data model. Despite the improved capabilities in visualization, editing, map production, and analysis, some users continue to rely on ArcView 3.x technology with file-based data such as shapefiles and coverages. Without question, transitioning to the ArcGIS 8.x platform and ArcSDE data model is not a painless process. Users must devote a significant amount of time to training, learning new terminology and functionality, and upgrading programming skills to the new VBA customization environment. However, the return on the investment is great in terms of increased productivity, enhanced map and report production, and web-based application development. Other challenges faced along the way included hardware setup problems, creation of custom data loading tools for legacy customer databases, data vendor support for the new data models, and personnel turnover.

V. The Future

In the near future we expect to take advantage of advances in Esri technology, including the introduction of ArcIMS 4.0, the integration of ArcGIS with ArcIMS in the form of ArcIMS services produced from ArcGIS project files (.mxd, .mxt), and direct connections to the Oracle Spatial product released in version 9i. The ability to produce ArcIMS ArcMap Services through ArcGIS is of considerable importance because it will allow us to take advantage of advanced data access and cartographic capabilities that our users have urgently requested. In addition, it should further hasten our users' transition from ArcView 3.x and Workstation ArcInfo to the new ArcGIS 8.x platform.


Author:
Associate Director - Marketing GIS
SBC Communications, Inc.
8000 IH-10 West, 12Q05
San Antonio, TX 78230
Telephone: (210) 524-2295
Fax: (210) 541-1737
E-mail: eric.pimpler@txmail.sbc.com
Web: http://www.sbc.com