Lessons Learned Developing a Complex Geodatabase While ArcInfo 8.x Evolved,

Brian Wopershall, Metcalf & Eddy

Lindle Willnow, Metcalf & Eddy

Abstract:

An ArcInfo and SDE application was developed for the Cleveland Division of Water (CWD) in Ohio to model the distribution system’s hydraulics, including all 5,304 miles of pipes and appurtenances. The client required that the database conform to relational database standards. However, to use ArcInfo, the ArcFM Water data model had to be adapted to match the conceptual database design. Several important lessons were learned in developing this project during the period between the release of ArcInfo version 8.0 and the release of version 8.1. The paper discusses development of the database, custom tool development to add client-required functionality, database tuning, and performance issues.

Introduction

The project began in mid-1999, prior to the release of ArcInfo 8 into the GIS market. The goal of the project was to develop a hydraulic model of the City’s water system in which all data would be stored in a central Oracle database that met relational database standards. Modeling would be completed with data exported from a GIS, where the information could be reviewed and analyzed by anyone within the organization. Esri gave a presentation to the Cleveland Division of Water (CWD) describing the geometric network concept and the ArcFM data model, which led to the recommendation of ArcInfo as the GIS software package. Development of an ArcInfo and SDE application began with the intention of exporting data from the GIS to a hydraulic modeling package to model the distribution system, including all 5,304 miles of pipes and appurtenances, as well as tracking CWD’s assets and maintenance activities.

From this point, database design meetings were held between Metcalf & Eddy’s database sub-consultant, Database Technologies (DBT), and the Division of Water. Part of the RFP for the project described the goal of a fifth-normal-form database design for the GIS geodatabase; the reason for this was to minimize data redundancy. Fifth normal form is strongly dependent on table joins; a table is in fifth normal form if it cannot be broken down into any number of smaller tables without loss of information. DBT worked with the City to assess their existing data, much of which was in dBase or Paradox format. A meeting with Esri was planned for November 1999 to discuss the differences between the CWD design and the ArcFM Water design. From there, it was determined that CWD’s database design would be essentially customized because, in many cases, ArcFM’s feature class configuration did not match CWD’s business operation. At the conclusion of the design, custom tools were written to allocate demands and export data from the GIS to the model.
As of June 2002, distribution system modeling is ongoing and model results are being calibrated to the behavior of the system.

Project Timeline

Hydraulic Model Software Selection - May 1999

Hydraulic Model Data Conversion - July 1999 to March 2000

Hydraulic Model GIS DB Design - November 1999 to April 2000

Hydraulic Model Tools Development - January 2000 to August 2001

Hydraulic Model Demand Allocation - August 2001 to April 2002

Hydraulic Modeling and Calibration - April 2002 to present

Gap Analysis of CWD Design and ArcFM Design - November 1999

After selecting the ArcFM Water data model as the basis for the geodatabase design, a series of meetings was held to discuss how to close the gap between a normalized database and the ArcFM model. Several SDE database technical analysts from Esri visited Cleveland and explained that their software required Visio Enterprise as the database design tool for Esri geodatabases; Oracle Designer had originally been suggested as the UML design tool. This realization changed the development significantly. Use of subtypes and domains became better understood, and the next step was to revisit the design of the existing database and decide how important it would be to have the CWD design similar to the ArcFM design.

Database Design Meetings to Finalize CWD Design with Respect to ArcFM Design and H2Onet Modeling Software - December 1999 to April 2000

It was decided soon after the Gap Analysis meeting that the CWD database design would be fully customized instead of using the ArcFM model. After much discussion, the ArcFM feature PressurizedMain was renamed Pipe for simplicity. Appurtenances such as air valves, pitometers, and access manholes were broken out into their own feature classes rather than modeled as subtypes as in the ArcFM Water model. Also, several fields and tables were added to create attributes that would map into the H2Onet modeling software.

The result was over 200 tables with over 100 relationships and over 40 feature classes on the water network. Many of these tables were needed to remain consistent with the goal of a database design as fully normalized as possible. For example, pipe characteristics are stored in a “Pipe_Make” table, since many pipes share the same characteristics. Rather than breaking tables down to fifth normal form, the most normalized tables in the CWD geodatabase design are in third normal form. Third normal form requires that every non-key attribute depend only on the primary key of its table; attributes that do not are moved to a related table. According to several technicians at Esri, this is one of the largest, most complex geodatabase designs in the U.S.
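The Pipe/Pipe_Make split described above can be sketched with a hypothetical, simplified schema. The table and column names below are illustrative only, not the actual CWD geodatabase schema; SQLite stands in for Oracle:

```python
import sqlite3

# Illustrative 3NF split: shared pipe characteristics live in one
# Pipe_Make row, and each Pipe row references it by key, so the
# characteristics are stored once rather than repeated per pipe.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Pipe_Make (
    make_id     INTEGER PRIMARY KEY,
    material    TEXT,
    diameter_in INTEGER,
    roughness   REAL
);
CREATE TABLE Pipe (
    pipe_id   INTEGER PRIMARY KEY,
    make_id   INTEGER REFERENCES Pipe_Make(make_id),
    length_ft REAL
);
""")
# One Pipe_Make row can describe thousands of pipes.
cur.execute("INSERT INTO Pipe_Make VALUES (1, 'Cast Iron', 8, 130.0)")
cur.executemany("INSERT INTO Pipe VALUES (?, 1, ?)",
                [(i, 250.0) for i in range(5)])
# A join recovers the full attribute set for any single pipe.
row = cur.execute("""
    SELECT p.pipe_id, m.material, m.diameter_in
    FROM Pipe p JOIN Pipe_Make m ON p.make_id = m.make_id
    WHERE p.pipe_id = 3
""").fetchone()
```

The trade-off discussed later in this paper follows directly from this structure: every query that needs pipe characteristics must pay for the join.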

Custom Tool Development – April 2000 to December 2000

Several custom tools were written in Visual Basic by developers at Esri to facilitate entry of the City’s information that could be incorporated into the hydraulic model. The tools most critical to the success of the project are the demand allocation tool and the H2Onet export tool.

Demand Allocation Tool

The demand allocation tool utilized CWD’s meter data on its 459,000 customers, which were matched to parcel data by parcel number and address (a 72% match rate). It was written in Visual Basic using ArcObjects. Based on a selected set of parcels, the tool checks the parcel number in the Parcel layer against the parcel number listed in the CWD Billing Account table, which contains each user’s daily average consumption based on meter data. After locating the meter data, the tool assigns the usage to the nearest node, along with the demand’s usage category based on SIC code. The tool processed an average of 10 billing accounts per minute in an SDE environment on a server with dual 700 MHz processors and 2 GB of RAM.
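The core matching and assignment logic can be sketched as follows. This is a hedged sketch of the pattern, not the actual ArcObjects/Visual Basic tool; the function name, data structures, and nearest-node search are all illustrative assumptions:

```python
import math

def allocate_demands(parcels, billing, nodes):
    """Sketch of demand allocation: parcels maps parcel_no -> (x, y),
    billing maps parcel_no -> average daily consumption, and nodes maps
    node_id -> (x, y). Returns total demand assigned to each node."""
    demands = {nid: 0.0 for nid in nodes}
    for parcel_no, (px, py) in parcels.items():
        # Match the parcel number against the billing account table.
        usage = billing.get(parcel_no)
        if usage is None:
            continue  # unmatched parcel (the unmatched ~28% in practice)
        # Assign the usage to the nearest model node (brute-force search
        # here; a spatial index would be used at CWD's scale).
        nearest = min(nodes, key=lambda nid: math.hypot(
            nodes[nid][0] - px, nodes[nid][1] - py))
        demands[nearest] += usage
    return demands
```

At 459,000 accounts, the per-record lookups against versioned SDE tables were the bottleneck, which is consistent with the roughly 10 accounts per minute observed.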

Model Export Tool

This key component of the project exports data needed for the hydraulic model from the GIS. Data from over 30 tables in the GIS is extracted into an Access file that can be mapped into the H2Onet modeling software. After testing by developers at Esri, it was found that retrieving data through related tables significantly impacted performance, thus the code does not rely on the inherent relationships in the geodatabase.  Extraction of data to model 3,000 pipes takes approximately 90 minutes.
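The performance finding above, avoiding per-feature traversal of geodatabase relationships, amounts to extracting with one flattened, set-based query instead of one relationship lookup per record. A minimal sketch under an assumed two-table schema (names illustrative, SQLite standing in for the geodatabase):

```python
import sqlite3

# Sketch of the flattened-extract pattern: a single joined query pulls
# every attribute the model needs in one pass, rather than following a
# relationship class for each of thousands of features.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE Pipe (
    pipe_id INTEGER PRIMARY KEY, make_id INTEGER, length_ft REAL);
CREATE TABLE Pipe_Make (
    make_id INTEGER PRIMARY KEY, material TEXT, diameter_in INTEGER);
INSERT INTO Pipe_Make VALUES (1, 'Ductile Iron', 12);
INSERT INTO Pipe VALUES (10, 1, 500.0), (11, 1, 750.0);
""")

def extract_for_model(cur):
    # One set-based query replaces N per-feature relationship lookups.
    return cur.execute("""
        SELECT p.pipe_id, p.length_ft, m.material, m.diameter_in
        FROM Pipe p JOIN Pipe_Make m USING (make_id)
        ORDER BY p.pipe_id
    """).fetchall()

rows = extract_for_model(cur)
```

The resulting flat rows map naturally onto the Access tables consumed by H2Onet.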

Lessons Learned

The following list summarizes the lessons learned from implementing our database design with Esri ArcInfo 8.x:

Versioning and Performance

After working in the personal geodatabase environment during data development, there was much to learn about working in the versioned environment in SDE. Features were digitized into the GIS, but many attributes were not assigned. One remaining task was to run an automated tool to rotate valve and hydrant symbols to match existing grid maps used by CWD. One approach was to run the tool in multiple versions over a weekend. The amount of time to compress these edits was significant: over 72 hours. Bulk edits to the database could not be made using versioning unless the tool itself included a routine to perform the reconcile/post/compress cycle. Many attributes were modified once the data were in SDE, and the number of edits made prior to reconciling and posting would quickly affect performance. To ensure that edits could be made without significant redraw slow-down, the database was compressed nightly. This was often a time-consuming task for the six editors working on the database at the time. Also, compression can only occur while no users are connected to the database.
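The bulk-edit pattern described above can be sketched generically: interleave the reconcile/post/compress step into the editing loop so that no single version state (and no later compress) grows unmanageably large. The geodatabase calls below are plain stand-in functions, not the actual ArcObjects API:

```python
def bulk_edit(features, apply_edit, reconcile_post_compress, batch_size=500):
    """Apply apply_edit to every feature, flushing version state every
    batch_size edits instead of once at the end of the whole run."""
    pending = 0
    for f in features:
        apply_edit(f)
        pending += 1
        if pending >= batch_size:
            reconcile_post_compress()  # keep each compress small
            pending = 0
    if pending:
        reconcile_post_compress()      # flush the final partial batch
```

The batch size is a tuning knob: large enough to amortize the cost of reconciling, small enough that no single compress transaction approaches the 72-hour case described above.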

Schema Load Tool

After unsuccessfully attempting to load our UML model into ArcCatalog using the schema creation tool several times on two separate machines, we contacted Esri tech support. The alternative Esri suggested was to load the schema into a personal geodatabase, load all data into that file, and then copy and paste the entire database into the newly created database connection. This performed the same action as the schema creation wizard, and it loaded all relationships as well as network features successfully on the first attempt.

Joins & Related Tables

Many times the team’s GIS analysts were required to join related tables to main tables to perform their queries and reports. Unfortunately, in some cases a main table with 450,000 records would need to be joined to a related table containing over 100,000 records. If the key field had been placed in the main table, queries and reports would have taken much less time. In fact, there were many cases where a query would return 0 records when it was certain that the query should return a result. Esri tech support confirmed that this was a bug in the software that is being addressed as new versions are released.

Sorts

Similar to the case with joins and related tables, some sorts would return unusual results. In one case, our control valve layer had 365 records, but after a sort the total number of records would change to 165. Esri tech support could not reproduce this behavior. This complicated our QC analysis on such layers.

Oracle Parameters

Two Oracle parameters found to improve performance were SORT_AREA_SIZE and DB_BLOCK_BUFFERS. The optimal values varied depending on the amount of data residing in the database. Our database performance also improved after the database was rebuilt with an 8 KB block size rather than 4 KB. One error encountered several times occurred when the rollback segment used by compression filled up. Since compression occurs in one large transaction, it is recommended to compress often so this transaction does not overload the rollback segment. We are currently using a 300 MB rollback segment for compression.

Conclusion

Although in theory the relational database concept has valid benefits, there are drawbacks when considering typical GIS tasks such as table-based attribute editing, bulk update queries, and overall data management and QC. Changes made to the software improved performance, but the gains were limited by the sheer number of records. The single design change that would have improved performance and productivity most would have been to allow some redundancy of information in the database, reducing the need to join tables and run slow bulk-processing code.

We have been successful exchanging data between the GIS and the hydraulic model and found work-arounds to make the GIS work as is. In some cases where sorts and query results did not make sense, Oracle views were used to join attributes of interest, enabling quality control checks to be performed quickly. Instead of using the slower queries in the GIS, SQL queries were used as much as possible to verify demand allocation, update pipe installation dates, and perform similar automated attribute population activities. Editing tools written in Visual Basic led to more efficient data entry by presenting the editor with only the information needed from existing data to assign updated attribute values. Bulk editing was performed on a nightly schedule so that compression transactions were not so large as to impede editors’ work the following day.

As Esri improves ArcInfo with each successive version, it is likely that many of the issues we experienced will disappear. In the meantime, one suggestion we received from an Esri developer is applicable to any geodatabase design: keep it as simple as possible.

ACKNOWLEDGEMENTS

Pierre Haddad, Director of Advanced Planning and Hydraulics, City of Cleveland, Division of Water

Alex Margevicius, Assistant Commissioner, City of Cleveland, Division of Water

Guy Singer, Director of Planning, City of Cleveland, Division of Water

Foster McMasters, Project Manager, Metcalf & Eddy

Lakshman Thallapally, Project Engineer and Oracle DBA, Metcalf & Eddy

Ellen Cyran, Northern Ohio Data Information Services (NODIS)

Jim Wyles, Northern Ohio Data Information Services (NODIS)