Kathy H. Spivey
Robert W. Finkle
Julia J. Hunt, P.E.

Navigating the Stormy Waters of ArcStorm in Water Utilities

The City of Arlington Water Utilities Department (CAWU) has implemented an ARC/Oracle GIS using ArcStorm to manage water and sewer system map graphics and data. The use of ArcStorm in this configuration has required CAWU to evaluate database design revisions and alternative application development scenarios. This paper presents the issues CAWU has encountered using ArcStorm, and discusses the pros and cons of approaches for using ArcStorm with ARC/Oracle for water and sewer networks.


Introduction

More and more geographic information systems (GIS) users are adopting external relational database management systems (RDBMS) as their primary database. This is especially true in local governments and organizations where enterprise-wide access to the data through a GIS, or some other form of information system, is necessary. Because of this trend, GIS and RDBMS vendors and users are having to create integration solutions for linking and transferring data between the GIS and RDBMS.

The City of Arlington is located in northern Texas between the major metropolitan centers of Fort Worth and Dallas. Arlington is home to more than 290,000 people and covers 98.8 square miles. The City of Arlington Water Utilities Department (CAWU) is responsible for supporting the city's needs for safe water and safely disposing of wastewater in a responsive, cost-effective manner. The Department consists of three primary divisions: Business Services, Operations, and Treatment. CAWU operates two water treatment plants, coordinates supply of water to its citizens with the Tarrant County Water Control and Improvement District No. 1, is responsible for activities related to two watershed protection programs, and maintenance of the water distribution system and wastewater collection system. The CAWU waterworks and sewer system serves approximately 129,400 water utility units. Approximately 1,100 miles of sanitary sewer mains comprise the wastewater collection system that services all developed areas within the City limits.

Recognizing the advantages of GIS in manipulating and presenting both graphic and nongraphic data and as a tool to improve customer service, CAWU created an internal division dedicated to developing a GIS that would meet the business needs of the users department-wide. Over the past years the GIS division built a GIS database that contains both graphic and nongraphic data associated with the city's water distribution system and sanitary sewer system.

In the process of developing the department's GIS, several integration hurdles were overcome. One such hurdle involved the interaction of the department's chosen relational database management systems with ArcStorm, a geographic data storage facility. The objectives of this paper are to develop an understanding of how ArcStorm works with external relational database management systems, to outline options for using ArcStorm with a normalized water and sewer system database, and to discuss the solution implemented by CAWU to continue using ArcStorm with their water and sewer system database.

Background

In 1996, the Water Utilities Department of the City of Arlington, Texas, embarked on a commitment to create a Geographic Information Division within their department. The intention of this effort was to focus dedicated time and resources on the development of a GIS that would: 1) create useable applications and products for the Department within the near future, 2) outline future GIS development efforts and requirements, and 3) provide the Department with a tool to improve customer service. Prior to the development of this Division, existing staff had assumed GIS development and data conversion tasks as a sideline to their primary roles and job responsibilities in the Department.

PlanGraphics, Inc., an independent GIS consulting firm, was contracted to work with CAWU staff to formulate an implementation plan and staffing needs for a successful GIS program. Recognizing a lack of centralized coordination of GIS activities, CAWU created two dedicated staff positions, a GIS Manager with the responsibility, decision-making capability and authority to make the GIS implementation happen, and a GIS Applications Programmer tasked with application programming, application project management and training. In addition to the new staff, one existing position was reclassified to serve as a GIS coordinator responsible for the day-to-day delegation of assignments, supervision and quality control of the data conversion effort.

Hardware/software selection was determined by preexisting factors in the city and department. The City of Arlington uses an Environmental Systems Research Institute, Inc. (Esri) based GIS on a citywide basis. Influenced by previous departmental GIS activity and the citywide GIS program, CAWU chose Esri's ArcInfo as their GIS software. INFO is used for internal feature attribute storage. However, because of the importance placed, by the Department and the City, on developing an enterprise-wide information system environment and because of the amount of data that would ultimately be stored in the database, Oracle's RDBMS was selected to store all of the other attribute data associated with the graphic features. The hardware infrastructure for housing the GIS and RDBMS was already in place. Both ArcInfo and Oracle currently run under the AIX operating system on RS6000 Model 320s.

To better manage data access by the multiple users in the Department, CAWU decided to implement Esri's ArcStorageManager (ArcStorm). ArcStorm was developed to provide multiple users concurrent access to a contiguous spatial database. ArcStorm acts as a storage facility and transaction manager for both graphic and associated nongraphic data stored in an external RDBMS. This tool provides feature-level transaction management, unified graphic and nongraphic transactions, locks on related records in external database management systems, recovery mechanisms, data archiving, and other data management features.

In addition to staffing requirements and hardware/software selection, the following tasks were carried out as part of the Department's GIS implementation plan:

Existing Database Design

Prior to creation of the new division and implementation of the new system, the Department used an existing ARC/Oracle database that had been designed during earlier GIS efforts. Although most of this existing information was still important to the Department's business functions, the existing data structure and content did not comprehensively meet all of the business needs of the Department.

Most of the sewer and water infrastructure graphics were created during previous GIS efforts by the department. Much of the data was originally entered into AutoCAD, and then converted into an ArcInfo format. Existing ARC coverages associated with the water distribution system and the sanitary sewer system included: water lines and sewer lines (arcs), and manholes, cleanouts, valves, and fire hydrants (nodes).

All existing water and sewer attribute data was stored in Oracle tables. The existing database consisted of six tables, one table for each of the following graphic entity: water lines, valves, fire hydrants, sewer lines, manholes and cleanouts. No relationships were established between the tables, making each table essentially a flat file. Each graphic feature in ArcInfo was represented by and related to one and only one record in a single table in the Oracle database. ArcStorm was used to manage the transactions for both graphic and nongraphic data manipulation.

New Database Design

A major portion of the implementation effort was associated with designing a database that would meet the needs of the entire Department. Extensive interviews were held with each division of CAWU, including Engineering, Water Treatment, Map Room, Real Estate Services, Field Operations, Customer Services and Meter Services. From these interviews, a summary of needed applications and associated data to support the entire Department was created. Throughout the user interviews, it became apparent that additional data was necessary to serve the business needs of the users. This resulted in a database redesign and a significant increase in the number of entities and attributes over the existing database. Most of the redesign effort focused on the RDBMS database, with only minor changes occurring in the spatial ArcInfo design.

For the RDBMS design, standard database design procedures were followed. Entity-Relationship Diagrams (ERDs) were used to document both the water and sewer systems. Adhering to normalization rules during the design process, it was determined that to model the nodes of both the water and sewer networks properly would require creating superentity/subentity relationships. The items common to multiple node types were combined into a generic superentity table. For example, the items common to all sewer node types, i.e. common to both cleanouts and manholes, were combined into a superentity node table called S_NODE. The distinct attributes for each node type were placed in a subentity table (i.e. S_CLEANOUT and S_MANHOLE). Figure 1 diagrams the relationships for the sewer nodes and Figure 2 for the water nodes.

Figure 1. Sewer Node Database Design

Sewer Node Database Design

Figure 2. Water Node Database Design

Water Node Database Design

Integrity constraints necessary for ensuring data integrity in the new database design would also be necessary. These included primary keys, foreign keys and check constraints. Oracle supports such integrity constraints. Oracle, as one of its RDBMS functions, maintains referential integrity and thus verifies the parent/child relationships (such as those established in a superentity/subentity relationship). For example, a record added to a child table must first have an associated record in the parent table. In order to eliminate a parent record, any associated child records must be deleted first.

Database Implementation

Once the new database design was complete, a SQL conversion routine was written to transfer the data from the old Oracle tables to the new database design. In addition, several SQL data quality check routines were written and applied to the new tables to test for erroneous or improperly formatted data.

Because CAWU intends to perform updates to the external Oracle nongraphic attribute data from the GIS, relates between the appropriate external tables and INFO tables must be configured to establish a link between the graphic and nongraphic data. This link forms a relationship between the primary key in the graphic coverage's INFO table and the primary key of the nongraphic entity's Oracle RDBMS table. For example, in order to update attribute information associated with a water line, a relate must be defined between the WATER.AAT (water line arc attribute table) and W_LINE (associated water line attribute table in Oracle) using the water line ID in both tables. The order of relates were entered randomly, initially.

In addition to setting up the relates between the external tables and the INFO tables, the new Oracle tables must be registered with the ArcStorm database through the ArcStorm SchemaEdit command REGISTER.DBMS. ArcStorm offers two types of updates when registering external RDBMS tables through the REGISTER.DBMS command: 1) AUTO DELETE, and 2) CHANGES ONLY. Unless the user explicitly specifies otherwise, AUTO DELETE is the default method of updating external RDBMS tables in an ArcStorm session. AUTO DELETE implies that each time data is checked in, all data records involved in the transaction are first deleted in the existing Oracle tables, and the new data in the ArcStorm working tables are inserted into the Oracle tables, regardless of whether rows were modified or not. Alternatively, the CHANGES ONLY method of update performs updates only on those rows that have been modified. Thus, no unnecessary deletes are performed on the external RDBMS tables during the check-in procedure. CAWU did not specify a particular method of update and thus the method defaulted to AUTO DELETE.

Problem Statement

As the City of Arlington progressed with the database conversion effort, testing began on the new database design. As part of the testing process, data maintenance in the new environment was performed using existing applications modified to reference the new Oracle tables. An ArcStorm session was established and a transaction defined. Relates were set up between the INFO SEWER.NAT (sewer network node table) and the Oracle S_NODE and S_MANHOLE tables. Edits were performed on both the graphic and nongraphic data. However, during the ArcStorm checkin process, Oracle parent/child integrity constraint violation errors were generated. Both insertion and deletion errors were generated.

Because the error was associated with the fact that the new Oracle tables have parent/child relationships established, several scenarios were investigated in order to isolate and determine the cause of the error:

  1. adding manhole data prior to associated node data (adding the child before the parent);
  2. adding node data prior to its associated manhole data (adding the parent before the child);
  3. reordering the relates between ArcStorm tables and Oracle tables (child before parent, then parent before child).

Each of these scenarios resulted in a parent/child integrity constraint violation error of some form, which was assumed to be associated with the superentity/subentity format of the water and sewer system node tables, necessitated during normalization of the database. CAWU and PlanGraphics worked together with Esri to determine the exact cause of these errors and to establish a reasonable and effective solution. First, investigation led to the realization that using the default REGISTER.DBMS option AUTO DELETE was resulting in the deletion of all Oracle records included in the current transaction, regardless of whether the records had been modified.

Second, discussions with Esri revealed that ArcStorm does not inherently enforce parent/child constraints such as those established in the normalized Oracle database. When a checkout transaction is initiated in ArcStorm, a working table is created that does not inherit the base table's indexes, constraints or any other "rules" associated with the base table. Thus, when checkin is attempted, no integrity checks have been made on the data, and depending on the order in which the relates are established, integrity violations can occur. Specifying the order of relates can eliminate this problem. However, the order in which a parent and a child (and thus of the relates) are required to be checked in will differ depending on whether a record is added or a deleted.

Consider, for example, a valve in the water distribution system. If a valve is added to the system and data is entered into the W_NODE table and the W_VALVE table, in order to avoid a parent/child integrity violation the new valve's record must be entered first into the W_NODE table (the parent) and then into the W_VALVE table (the child). Alternatively, if a valve is to be deleted from the water distribution system, the valves records must be deleted from the W_VALVE table first (child) and then from the W_NODE table (the parent). The order in which these records would be checked back into the Oracle base tables is established in the relate of the session. Thus, the ability to perform both adds and deletes in the same ArcStorm session is impossible because only one relate sequence can be specified for a single ArcStorm session.

Alternative Methodologies

Considering the findings above, CAWU staff, City of Arlington Geoprocessing Department staff, and PlanGraphics staff brainstormed possible solutions to this problem. Six alternative solutions were generated, as follows:

Solution 1


Esri Technical Support staff presented one way that parent/child integrity constraint can be enforced in ArcStorm. The proposed solution is summarized as follows:

  1. Remove all parent/child constraints from the Oracle base tables,
  2. Set up the connection to the Oracle base tables,
  3. Register all Oracle tables to be edited with to the appropriate ArcStorm working tables using the CHANGE ONLY update option.
  4. ArcStorm defines the appropriate corresponding public working table. ArcStorm will create the working table in Oracle with the same columns as the base Oracle table, but without the constraints. Once a public table is defined in ArcStorm, however, the DBA can add the appropriate database constraints, triggers and indexes. In CAWU's case, this would mean adding the parent/child foreign key constraints into the public working tables. Once these tables are set up, they remain and do not need to be set up every time a user goes to edit a table. The constraints will remain throughout the life of the working table.
  5. Upon checkout, records that are checked out are inserted into the public working table (that now has the Oracle constraints effective). All rows checked out are written to the public working table with the lock_id = "database.transaction_id". The lock_id value is then used to manage which records are associated with each transaction. NOTE: Remember that any records added to the working tables during a transaction will need the lock_id set to the value of the database.transaction_id to allow the record to be posted during the checkin process. The order of the checkout will need to be parent then child (because of the foreign key constraints).
  6. Data is edited on the working tables. With the constraints, the working tables behave like normal Oracle tables. They will not allow the user to enter a child record before a parent record exists, and vice versa. However, adds and deletes can be performed in the same ArcStorm session.
  7. During the check in process, ArcStorm removes the rows from the working table (with the correct lock_id for the current transaction), and inserts them into the Oracle base tables. Because the constraints have been enforced in the ArcStorm session, at the point of checkin the implied referential integrity of the base tables holds.
  8. Issue a commit and proceed with ArcStorm checkin.
  9. After checkin is complete, ArcStorm deletes the records in the working table that match the transaction's lock_id.

With the above process, the user can work entirely within an ArcStorm session. However, the biggest disadvantage with this approach is the removal of constraints on base tables (which in CAWU's case is an enterprise-wide database accessed by other information systems). Data editing (specifically adding and deleting parents and childs) can only be done within an ArcStorm session to ensure referential integrity. CAWU anticipates editing non-primary key attribute data, such as pipe material or liner type, will be done outside the ArcStorm session. Thus, an application accessing the data from outside an ArcStorm session could have write capabilities on the non-primary key attributes, but must be limited to only read access of the primary key data (could not add, modify or delete the primary keys of the parent and child tables).

Solution 2

A second alternative is to denormalize the Oracle base tables in order to remove the parent/child relationships established by creating superentity/subentities during normalization. To achieve this denormalization, the information currently stored in the parent table would be distributed into the appropriate child tables. Fields found in the parent table would be created in each of the child tables. Although this is doable and would allow the user to do all updates within an ArcStorm session, many of the preferred benefits of the superentity/subentity relationship will be lost.

Solution 3

A third alternative involves the elimination of the need to perform both adds and deletes in the same ArcStorm session. During an edit session, if it is determined that a graphic feature and thus its associated attribute record are to be removed, the feature is flagged for deletion in its Oracle record. After the ArcStorm checkin process has been completed, an SQL routine can be used to post-process the deletes in both the child and parent Oracle tables. This allows the user to continue working in the ArcStorm session, and more importantly, maintains the integrity of the Oracle base tables outside ArcStorm.

Solution 4

A suggestion was made that ArcStorm could be replaced with Librarian. It was understood that these constraints did not exist in Librarian. However, this was not an option for CAWU since the city-wide GIS community in Arlington had previously determined there was a need to switch to ArcStorm for its transaction management functionality.

Solution 5

A fifth alternative, also based on the idea of not performing both adds and deletes in the same session, involved the use of separate ArcStorm sessions with distinct relate orders to accomplish both adds and deletes of attribute information in the Oracle tables. Two different checkout and checkin routines would be required, and edits in a single session would either need to be add only or delete only. The main drawback to this approach is the need to exit one ArcStorm session and begin a new session to continue the same round of edits.

Solution 6

A sixth solution proposed that only graphic editing be performed in ArcStorm. Nongraphic attribute editing would be performed essentially outside the GIS, perhaps in an Oracle Forms format. A major disadvantage to this solution is the increased possibility of orphan records existing either in INFO or in Oracle. For example, if a user entered a valve into the water distribution system, but fails to follow up with data entry into the Oracle database, the valve record in the INFO table would not have an associated record in the appropriate Oracle tables. This shortcoming could be overcome by enforcing data entry procedural standards.

Evaluation and Solution Selection

CAWU and PlanGraphics worked together to determine what factors were important to CAWU in evaluating these alternatives:

  1. preserving the strengths and advantages in establishing an enterprise-wide solution (this involves not removing constraints from a database that will be accessed by other information systems in the organization),
  2. time required to affect the selected solution.

In order to evaluate the alternative solutions presented above, CAWU and PlanGraphics established qualitative criteria to aid CAWU in selecting an appropriate solution (Table 1). The criteria included: 1) level of programming effort and complexity involved to implement applications with the new solution, 2) level of effort involved to redesign the database to accommodate any changes associated with the chosen solution, and 3) whether data would need to be remigrated.

Table 1. Evaluation of Possible Solutions to CAWU ArcStorm/Oracle Problem

Proposed Solution

Level of AML
programming effort

Level of effort
to redesign database

Data Migration

Solution 1

High

Low

No

Solution 2

Low/None

High

Yes

Solution 3

Medium

None

No

Solution 4

High

Low

Yes

Solution 5

Medium

None

No

Solution 6

High

Low/None

No

After careful consideration, CAWU and PlanGraphics decided to implement Solution 3 - record retirement in the database design. The primary reasons for selecting this solution include:

  1. Both CAWU and PlanGraphics considered data normalization and database integrity an important requirement for implementing a successful enterprise-wide database.
  2. Time required to implement this solution is minimal. A small change in the database design is required (addition of a flag field in appropriate tables). However, this change is minor to denormalizing the database.

Although implementing this solution requires a change in the GIS maintenance operating procedures, CAWU is at a perfect point in their implementation to document and incorporate these procedures. Deletes can no longer be handled in the same session. The Oracle tables are now registered with the CHANGES ONLY update option. The delete button was removed from all forms currently used by the GIS maintenance team. Also, relates are specified in the order needed for adds only, specifically upon checkout, the child tables are created first, followed by the parent tables, so that when the data is checked in, the order is reversed (the parent is added first, then the child).

A retirement column has been added to the appropriate Oracle tables for use when deleting a record. In this column, there are three valid entries: 1) C is entered in the field for a record that is to be deleted due to a correction in the database (for example, data was incorrectly entered), 2) R is entered in the field to indicate that the record is to be deleted due to the retirement removal of that particular item in the infrastructure, and 3) the field is left blank when the feature is active and should not be retired or deleted. It is anticipated that an SQL script in a post-processing routine outside the ArcStorm environment which will move all of the records that contain an R in the designated field to another set of tables that are designated as history tables. If a record contains a C in the flag field, then that particular record will be deleted from both the parent and child tables.

Conclusion

As the GIS community moves to a more enterprise-wide, client-server environment, data integration issues between database systems will arise. This paper illustrates one example of this type of challenge. Time was spent evaluating and thoroughly investigating all options for this scenario. The objective was to provide a solution which maximizes the intended functions of both ArcStorm and the RDBMS (in this case Oracle). The selected solution provides the City of Arlington Water Utilities Department with a normalized relational database design which retains its referential and integrity constraints, thus ensuring sound data. This is then used in conjunction with ArcStorm as a spatial data manager. This experience illustrates the importance of focusing early on an enterprise-wide solution to information needs. Integration issues should be evaluated, alternative methodologies considered, and an approach selected and implemented appropriate for a multi-information system environment. As vendors move toward a more open-development environment, integration issues should be reduced. More integration solutions will become available as this technology progresses.

Acknowledgements

The authors would like to thank the following people for their support, technical input, and guidance during this project: Teri Landrum-Applications Programmer, Russ Irons-Geoprocessing Services Manager, and Jan Jordan-Geoprocessing Programmer Analyst, City of Arlington, Texas; Minna Li-Executive Consultant and Bart Guetti-Systems Analyst, PlanGraphics, Inc.; and the technical staff at Esri in San Antonio, TX and Redlands, CA, especially Tom Brown.

References

Esri, ArcStorm Manual.


Kathy H. Spivey
Systems Analyst, PlanGraphics, Inc.
11020 Huebner Oaks #2137
San Antonio, TX 78230
210-558-3989 tel
210-558-4707 fax
kspivey@plangraphics.com

Robert W. Finkle
Vice President, PlanGraphics, Inc.
1597 Cole Blvd. Suite 300
Golden, CO 80401
303-274-2706 tel
303-274-2706 fax
rfinkle@plangraphics.com

Julia J. Hunt, P.E.
Water Information Services Manager, City of Arlington Water Utilties
101 W. Abram St.
Arlington, TX 76010
817-459-6604 tel
817-459-6626 fax
huntj@onramp.net