Migrating From Relational GIS To GIS Objects

ABSTRACT

The migration of data from a relational data and application schema to an object oriented implementation requires entities to re-evaluate what has been required of the data, and what behaviors can be identified to be incorporated into the GIS data objects. With the growth of object technology, the expansion of data sharing among organizations and the influx of new users, data models must be reconstructed. For many utilities the data requirements have changed.

INTRODUCTION

Traditional relational databases support such GIS functions as customer location, network analysis, tracing, and map creation. Other applications such as work management also utilize the relational data model.

Attention must be given to creation of an object-oriented database that will provide behaviors that were previously imbedded in custom application code. Geometric characteristics, interfaces, subtypes, transaction management for concurrency, transaction management for recovery and a host of issues can now be imbedded in the data object definitions. Consideration must be given to how these objects interact with each other and with a variety of users.

WHY MIGRATE FROM RELATIONAL DESIGN TO OBJECTS?

Technological advantages that complement utility network modeling are inherent in object model designs, whereas most relational database management models lend themselves more toward the business functions of an organization. In relational databases, persistence is obtained by storing all data in tables. Queries and editing operations performed on these data are likewise stored in tables. The network infrastructure needed much custom code to be managed and queried in ways that are useful to the day-to-day operations of a utility. In object oriented designs (OOD), objects can be made transient (data used only for the duration of a function or application session) or persistent (data that is created and may be available throughout the organization) when instantiated. Abstraction can provide the capability to change from one to another in some systems and provides a mechanism of scalability for data model expansion. The object model itself is the source for maintaining the characteristics that are meaningful to a utility.

Design differences between relational and object oriented databases, and the addition of new data users within the enterprise, require that agencies redesign the model to take advantage of existing processes and new technologies, and to welcome new users, with their requirements, to the enterprise database. The goal is still to provide the primary data store and promote data utilization enterprise wide to make work-groups more efficient, to enhance customer satisfaction, and to provide additional cost savings and payback. However, the way that the data is used has changed, requiring a design that allows component object model (COM) applications to analyze, imbed, and report on data objects in ways that relational designs could not.

Cost

Managing a GIS can be costly. The data is the most expensive investment in a relational GIS project and costs are greater when custom applications and functions are required. End-user and system training is needed with any GIS implementation. The maintenance of the GIS software and database modifications is where continuing expenditures are also required.

Since much of the behavior and that which defines a GIS feature is a part of the object model diagram, new data objects can be added to rectify deficiencies discovered in the relational model without having to anticipate a large investment in custom application programming. Application maintenance costs are lower because there is less code to maintain. Code does not have to change when the data model classes are extended. The training effort as it relates specifically to GIS maintenance is more efficient since many commonly used application development tools are used to create the ODBMS and supporting application code.

Software Development

Object oriented programming methods and tools have become standard in the GIS world, so the object-oriented database becomes the best implementation to take full advantage of the benefits that COM applications provide. Since many object-oriented designs are built to be only somewhat complete yet expandable, components are reusable, thus lowering the cost of software. Modular components also provide better quality software and a faster development cycle. Applications can be distributed to specific units to provide users with combinations of common and specialized functions. COM development, compared to traditional GIS software development, contributes to cost reductions since programmers can be more productive in developing and maintaining software.

Performance

The complex relationships that utilities need to model are supported in object GIS designs. The object database provides a better means of storing and retrieving GIS and related data. Engineers and field crews can access the data and related information more easily. Emergency response to events such as power outages and main breaks can be analyzed faster and the results delivered more easily through COM based applications. In relational databases, information pertaining to a feature is spread out over many tables. For this reason, response time for queries suffers due to the need to access multiple tables and join them in order to retrieve feature data. “Object-oriented databases can reduce the need for paging by enabling only the currently required objects to be loaded into memory (relational databases load in tables containing both the required data AND other unnecessary data ).” [1]

Scalability

Complex data, including images (network schematics or photographs), video (sanitary sewer television inspections), and documents, can be managed better within the object oriented GIS. Conversely, GIS data can be accessed from interfaces to complex data. Functionality can also be scaled to a particular task or user group. Objects can be stored in the way they are actually used, and COM programming techniques also ensure implementation compatibility with object databases as object oriented technology evolves.

Data Layer Overhead

Relational designs require layers that convert data to rows and columns for storage. This layer also has a drain on system processing resources since every application call to the database requires that this process be executed or analyzed. Essentially the overhead costs add up and more efficient means of data design and storage are necessary.

Data Maintenance

While GIS may be the focus of the migration, other applications should also be reassessed for usefulness within the object environment. Many systems such as work management systems (WMS) may need to be replaced in order to provide a more comprehensive utilization of the ODBMS implementation and to provide a central location for facility management data, thus eliminating redundant activities between departments such as CIS and field operations. The stakeholders in each department that will be affected by the migration should have input into the conceptual design when the migration project begins. Participation by representatives of all potential groups will help to identify requirements and minimize design and application selection risks from the initial conceptual vision to the final deployment of the ODBMS and supporting software. Business processes, workflow, and source data format will impact the extent of the migration effort from a conversion, integration and application standpoint. These issues and the resolution to current problems require as much input from stakeholders as can be obtained. The result should be a single facilities database that can be accessed, with particular security constraints, by all departments whose daily business functions require such data.

Relational design limitations:

· Relational design standards are rigid and not necessarily used to create the database and the rules that applications and feature tables adhere to.

· Relational designs require application processing to link tables together and manage user interaction with data tables.

· Other applications and users do not inherently adhere to standards imposed upon a relational design.

· New data definitions can be introduced into the model but application programming is required to apply the proper links and rules for the new feature data or related attribute.

· Standards for developing relational applications are more or less up to application developers or in-house technicians using a proprietary macro application language.

· The technology industry has been utilizing object-oriented techniques for application development for years. Relational models need customization to applications to create the model relationships at run time and are better suited for business accounting type processing.

· Design flaws may not be discovered before the model is implemented.

Object design advantages:

· Object model diagrams are the source of the object database. The repository is born of object model diagrams created using a Unified Modeling Language (UML).

· Object relationships are established and adhered to based on the object model diagrams.

· COM allows objects and functions to be shared at the binary level.

· Object models can be modified through the object diagram to introduce enhancements or new requirements within the object classes without additional application programming.

· Standards for developing object models and functions are established through commonly used development environments (Visio, Rational Rose, VB, VBA, C++, etc.)

· Object models with all of the required relationships and data validation domains are created with a design tool and are established when the database is created.

· Design flaws in the model are captured and can be corrected within the modeling tool.

NEW WAYS TO THINK OF GIS FEATURES

BLOBs (binary large objects) such as images and video are stored better in ODBMSs. While RDBMSs support BLOBs, they cannot query them the way they query tabular data, and BLOBs introduce a high level of performance loss.

Network contribution can be defined in the object model. Features that control network activities can have a source and termination qualifier. The behavior of the utility network, through the relationships defined in the object model, can be analyzed based on situational criteria. This capability is also available in relational models but requires much custom programming and processing to produce results. Changes introduced in relational models introduce a lower level of reliability in the analysis, thus requiring more custom programming. In object data models the probability of exercising reusable code is higher.

Connectivity that constructs the utility network model is not a new concept, but it was traditionally defined in custom application code that explicitly detected junction characteristics or analyzed the intersecting features and, through an iterative process, compared the potential connections with predefined parameters listed in a table. With this implementation, transactions often took a long time to process. Connectivity in an object model is defined in a diagram and must be adhered to by the editing software, reducing code overhead and processing time.
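The idea of declaring connectivity in the model, rather than in custom application code, can be sketched as follows. This is a minimal Python illustration and not any vendor's API; the feature class names, the `CONNECTIVITY_RULES` set, and the two functions are hypothetical.

```python
# Hypothetical connectivity rules declared as data (part of the model),
# rather than buried in custom application code.
CONNECTIVITY_RULES = {
    # (edge class, junction class) pairs that are allowed to connect
    ("WaterMain", "Valve"),
    ("WaterMain", "Hydrant"),
    ("ServiceLine", "Meter"),
}


def can_connect(edge_class: str, junction_class: str) -> bool:
    """Return True if the model permits this edge/junction connection."""
    return (edge_class, junction_class) in CONNECTIVITY_RULES


def validate_edit(edge_class: str, junction_class: str) -> None:
    """An editor would call this before committing a new connection."""
    if not can_connect(edge_class, junction_class):
        raise ValueError(
            f"{edge_class} may not connect to {junction_class} in this model"
        )
```

Because the rules are data, adding a new permitted connection means adding one entry to the rule set; the editing code itself does not change.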

Open architecture that many COM compliant applications can access to:

· produce reports

· produce customized maps

· imbed maps into documents

· perform data model analysis

· produce system schematics

“Relational models can be represented with objects. OO class hierarchies can be more difficult to model with RDBMS. Many ODBMS products provide means of integrating with RDBMS. The GIS translation problem is easier and does not require a generic translation capability.” [2]

THINGS TO THINK ABOUT

Object model characteristics contribute:

· Location-based objects that can have geometry

· Features that contain specific behaviors

· Attributes with valid value domains without a separate link to a table

· A more real world representation of utility networks than relational designs
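As a rough illustration of an attribute whose valid value domain is part of the object definition, consider the Python sketch below. The `Pipe` class, its fields, and the material codes are assumptions made for the example and do not come from any particular data model.

```python
from dataclasses import dataclass

# Hypothetical coded-value domain for pipe material; the codes are illustrative.
PIPE_MATERIALS = frozenset({"DI", "PVC", "CI", "HDPE"})


@dataclass
class Pipe:
    material: str
    diameter_in: float

    def __post_init__(self) -> None:
        # The domain check travels with the object; no lookup-table join is needed.
        if self.material not in PIPE_MATERIALS:
            raise ValueError(f"material {self.material!r} is outside the domain")
        if self.diameter_in <= 0:
            raise ValueError("diameter must be positive")
```

Any application that creates a `Pipe` is held to the same domain, which is the point of moving validation out of individual applications and into the model.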

Migration strategies should result in:

· A Migration Specification Document that maps the current data structure to the new one and identifies and verifies critical attributes

· Compatibility with new field technologies to use maps and data in COM compliant interfaces

· Accommodating the requirements of new users

· A single facility database for the entire enterprise

Application support considerations for new model:

· Asset management

· Work management

· Dispatch

· Crew management

· Service Requests

· Emergency analysis and response

· Field based data capture and update

The utility should think of the new model as GIS / Data centric. Features (class objects) have relationships to the geometry information (abstract object class). Attribute values and their domains are inherited by subtypes from the abstract classes of the basic geometric element without re-establishing run time links or table joins. Many departments, whether they require GIS or just access to the ODBMS, can share the most up to date version of the enterprise facility database.

What data is required when creating the object repository? Consider potential users and entities during the development of the new object model. To minimize maintenance costs, application developers using relational models tended to provide a “one size fits all” concept that did not allow utilities to design what they really needed. In many utilities other systems had been brought on line since the initial data requirements analysis was performed. Requirements introduced via experience with the old design and interaction with other potential users can now be considered for the object model design.

A facilities centric view of the enterprise becomes more of a reality. Custom applications can be created as before to use the data, but the data can also be used by inexpensive commonly used software. Database administration is enhanced with versioning capabilities making GIS even more accessible to casual users through specific user-type interfaces.

Field applications are available to download and upload maps and documents. Remote users can utilize the processing capabilities of the server while access permissions and type restrict functionality and accessibility to data.

There are several techniques to migrate relational data to an object GIS. One method is to join tables that contain feature geometry and geographic reference data with tables that house the feature tabular data. Some systems offer wizards to create the object database from GIS network maps. This method is a feature-centric approach to data migration. This means that processes can be designed to operate on one feature class at a time, to gather information then create homogeneous graphic and attribute data files from the digital sources. The data can then be migrated to discrete themes or logical layers in the target ODBMS. For example, linear features such as pipes and conductors can be recognized from the source design files and transferred to a layer with linear topology. The linear features will have attribute tables that preserve all of the design file properties, including the relational linkage tags that associate each feature to a specific record in a specific RDBMS table.
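The feature-centric join described above might look something like the following Python sketch. It assumes a simplified source layout in which each geometry record carries a linkage tag that matches a row in an attribute table; the table contents, field names, and `migrate_feature_class` function are invented for illustration.

```python
# Hypothetical source tables: geometry records carry a linkage tag (the
# relational key) that points at a row in an attribute table.
geometry_rows = [
    {"tag": 101, "coords": [(0.0, 0.0), (10.0, 0.0)]},
    {"tag": 102, "coords": [(10.0, 0.0), (10.0, 5.0)]},
    {"tag": 103, "coords": [(10.0, 5.0), (20.0, 5.0)]},  # no attribute row
]
attribute_rows = [
    {"tag": 101, "material": "DI", "diameter_in": 8},
    {"tag": 102, "material": "PVC", "diameter_in": 6},
]


def migrate_feature_class(geometry_rows, attribute_rows):
    """Join one feature class on its linkage tag and return (features, orphans)."""
    attrs_by_tag = {row["tag"]: row for row in attribute_rows}
    features, orphans = [], []
    for geom in geometry_rows:
        attrs = attrs_by_tag.get(geom["tag"])
        if attrs is None:
            orphans.append(geom["tag"])  # anomaly: report it, do not guess
            continue
        # The linkage tag is kept so the feature still points at its RDBMS record.
        features.append({**attrs, "coords": geom["coords"]})
    return features, orphans


pipes, unmatched = migrate_feature_class(geometry_rows, attribute_rows)
```

Running the process one feature class at a time keeps each output homogeneous and produces a list of unmatched records that can be reviewed as anomalies.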

Feature attributes can be encapsulated for portability. This means that attributes from RDBMS tables can be physically copied to the appropriate feature tables in the GIS data. Enterprise-wide migration would more appropriately leave the RDBMS data outside of the GIS layers to avoid redundancy of data and maintenance operations such as data validation. This approach can be effective in the pilot project in which a sampling of data is required to assess the proposed migration process and the functionality of an application.

UML diagrams of the utility model will contain abstract features, feature definitions, feature subtypes, feature relationships, network relationships, connectivity, and cardinality. The way the legacy features relate to each other will change little if at all. Still all of the assets that form the network need to have their network contributions modeled, as they would operate in the real world. The ability to create complex data structures lends real world capabilities to the features that control network loads, capacity, and flow. Some relational models treated every nuance of a utility feature as a separate entity when behavior was essentially the same as all other features in the same class. Subtypes will provide a more efficient way of defining the features and their interaction within the utility network. The relationships between network features and non-network features will provide a road map for other applications to follow when GIS is not the tool used for analyzing features and their history or status.
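A small Python sketch may help show how an abstract feature class and subtypes reduce the number of separate entities. The `NetworkFeature` and `Valve` classes, their fields, and the subtype values are hypothetical and are not taken from a published utility data model.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class NetworkFeature(ABC):
    """Hypothetical abstract class: every network feature has an id and a location."""

    feature_id: int
    x: float
    y: float

    @abstractmethod
    def allows_flow(self) -> bool:
        """Network contribution: does flow pass through this feature?"""


@dataclass
class Valve(NetworkFeature):
    # One class with a subtype code instead of one entity per valve variety.
    subtype: str = "gate"  # e.g. "gate" or "butterfly" (illustrative values)
    is_open: bool = True

    def allows_flow(self) -> bool:
        # All valve subtypes share the same behavior.
        return self.is_open
```

A gate valve and a butterfly valve are both `Valve` objects that differ only in their subtype value, while the identifier and location fields come from the abstract class.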

The UML will, through inheritance, identify where data redundancy has occurred in the relational design. This redundancy can be eliminated within the object model. Data migration may present issues when duplicate data is discovered and decisions need to be made as to what source is more accurate and provides the best information to the most people.

HOW TO GET THERE

Migration Specification

The migration specification document is the primary and most essential element by which the project will navigate. While the specifications may be adjusted to suit necessary migration criteria as problems are discovered, the migration specification document provides the basis of understanding of the project details.

Mapping the old data structure to the new one is a requirement that will result in the translation specification. This document must provide a home for every piece of data and should also identify the location of the data source. As the migration specification is compiled activities will occur to provide for data enhancements, aggregation, and normalization. The migration specification should provide enough detail to those on the project staff who are not familiar with the data.
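One way to picture the requirement that every piece of data has a home is a mapping that can be checked mechanically. The Python sketch below is illustrative only; the table and column names, the `MIGRATION_SPEC` structure, and the `unmapped_columns` helper are assumptions.

```python
# Hypothetical excerpt of a migration specification: every source column
# is given a home in the target model, or is explicitly marked as dropped.
MIGRATION_SPEC = {
    ("WMAIN", "MATL_CD"): ("WaterMain", "material"),
    ("WMAIN", "DIAM"): ("WaterMain", "diameter_in"),
    ("WMAIN", "OLD_MAP_NO"): None,  # deliberately not migrated
}


def unmapped_columns(source_columns, spec):
    """Return source columns the specification has not yet accounted for."""
    return sorted(col for col in source_columns if col not in spec)


source_columns = [
    ("WMAIN", "MATL_CD"),
    ("WMAIN", "DIAM"),
    ("WMAIN", "OLD_MAP_NO"),
    ("WMAIN", "INSTALL_DT"),
]
missing = unmapped_columns(source_columns, MIGRATION_SPEC)
```

Here `missing` would contain the `INSTALL_DT` column, flagging that the specification still needs a decision for it.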

Compiling a detailed migration specification document allows people who are not familiar with the core business of the utility to understand the migration. Tabular and graphic data will be analyzed to ensure that records are migrated to meet required codes and standards, and the document should reflect the reasoning behind the migration decisions for business, compliance and operational reasons.

Expect to deal with anomalies. Not all data can be migrated due to technological constraints with migrating graphics from one technology to another. The migration specification document will inevitably reveal what you did not know about your data. Expect anomalies with tabular and graphic data.

Validation

A comprehensive data validation activity will assure that each process in the migration will be inspected for efficiency and correctness. Confirm through checks that attributes are migrated correctly, (data is consistent), graphic records match tabular records, and that features have the proper relationship with regard to the utility data model. The validation process is another environment in which to address where refinements of the migration must occur, whether in the process itself or in the data that is being processed.
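The checks described above could be automated along the lines of the following Python sketch. It assumes, for simplicity, that source and migrated records share a key field and the same attribute names; in a real migration the comparison would first apply the mapping from the migration specification. The `validate_migration` function and the sample records are hypothetical.

```python
def validate_migration(source_records, migrated_records, key="tag"):
    """Compare source and migrated records by key and report discrepancies."""
    source = {r[key]: r for r in source_records}
    target = {r[key]: r for r in migrated_records}
    report = {
        "missing_in_target": sorted(source.keys() - target.keys()),
        "unexpected_in_target": sorted(target.keys() - source.keys()),
        "attribute_mismatches": [],
    }
    # For records present on both sides, compare every source attribute.
    for k in sorted(source.keys() & target.keys()):
        for field, value in source[k].items():
            if target[k].get(field) != value:
                report["attribute_mismatches"].append((k, field))
    return report


source = [
    {"tag": 1, "material": "DI"},
    {"tag": 2, "material": "PVC"},
    {"tag": 3, "material": "CI"},
]
migrated = [
    {"tag": 1, "material": "DI"},
    {"tag": 2, "material": "PV"},  # truncated during migration
    {"tag": 4, "material": "CI"},
]
report = validate_migration(source, migrated)
```

The report separates records lost in migration, records that appeared unexpectedly, and attributes whose values changed, which helps decide whether the process or the data needs refinement.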

Once there is confidence that the process is correct, decisions must be made about data refinements. It may be easier to fix data problems in the pre-migration phase because of staff experience in the old environment. Knowing what tools will be available in the new environment will help determine which data can be fixed post-migration. Understand how long fixing data will take and where the best tools for the remedy are available.

Considerations

Be realistic and understand that the data model will change during the migration specification process. Try, however, to limit data model changes during the actual migration process to keep schedules and costs under control. It is also helpful to talk to those who have gone before you. Consult with others who have completed or are well into their migration projects.

Make sure that a pilot migration occurs. The pilot will reveal some anomalies and will provide valuable information to the migration specification document. The pilot project must migrate everything that is available in the data model. Decide how the data will be migrated. Decisions on what geographic area represents the most complete collection of available utility features must be made so the migration process can be verified and calibrated based on review of the pilot area migration. Bench-marking the migration processing will provide insight into how long the migration will take and reveal where problems exist whether in the migration program or the physical size of the target database. The migration process can then be planned from the pilot.
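As a back-of-the-envelope illustration of how pilot benchmarks feed the plan, the Python sketch below extrapolates pilot throughput to the full data set. It assumes processing time scales linearly with feature count, which may not hold as the target database grows; the function name and the figures are made up.

```python
def estimate_full_run_hours(pilot_features, pilot_hours, total_features):
    """Linearly extrapolate pilot throughput to the full data set."""
    if pilot_features <= 0 or pilot_hours <= 0:
        raise ValueError("pilot measurements must be positive")
    features_per_hour = pilot_features / pilot_hours
    return total_features / features_per_hour


# Illustrative numbers only: a 50,000-feature pilot that ran for 2 hours,
# extrapolated to a 1,200,000-feature service territory.
hours = estimate_full_run_hours(50_000, 2.0, 1_200_000)
```

With these sample numbers the estimate is 48 hours for a single full run, which should be multiplied by the number of runs the iterative process is expected to need.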

Decide what data should be migrated. Data that is stored in other systems should be acquired only where it makes sense. The new system may not put constraints on where data resides and may also provide an easier alternative to accessing data if left in the current location.

Staff the project sufficiently. All of the project staff must understand the goal and at each stage of the migration contribute to or be updated as to the progress of the project. Make sure that the project staff also understand the next step. An understanding of the data loading process will keep the migration project staff involved and interested as each phase is executed.

Make sure that the project staff has been trained in the new environment. They will be more capable of offering effective assistance in the migration process with a more objective mind set. There may be a large learning curve so planning training closer to the migration project involvement will be more effective and ensure the skills that are learned will optimize productivity sooner.

Expect to go through a ‘trial and error’ process to get the tabular and graphic results required. While not desirable, it is a fact of most data migration projects. With this in mind the project staff must also understand the data model and that even small changes to the model can introduce significant impact on the project schedule and cost. Changes that are introduced to the data model should be prioritized in that some are imperative and others do not necessarily require model changes until later.

Understand the behavior of the new system. The migration process will be iterative because the migrated data must be tested in order to determine if the model does what is required for the application(s) as well as providing the desired links to other systems and users.

SAVE THE SOURCE DATA AND TOOLS. You may need to re-translate some anomalies. The source can be kept out of production on a down sized system while the new system is being deployed. Make sure the new system has enough size to accommodate the data. Migration programs will have to run several times, usually for long clock times, during the migration process and making sure that the new storage and working space is sufficient will save time and money with each run. Invest in the hardware and memory up front.

Expect a backlog of work during the migration process since migration may take a considerable amount of time. With testing the backlog can be minimized while the migration occurs. The more that is known about the results from full system tests prior to final migration the more efficient the transition will be.

A functional gap analysis should be performed as part of the data migration process. Applications that were built to operate with the RDBMS GIS may have custom functions that operated with the specific utility features that have been analyzed and maintained. The analysis should compare the functionality that had been available with that of all proposed GIS software that may be used with the object-oriented GIS.

Examine deficiencies in current design, add things and take things out. Most often the relational model will contain attributes that have not been populated and will not have usefulness in the object design. Conversely, integrating with other systems may require interrogation of other databases in order to provide linkages. Issues such as departmental access and user update or view only privileges will also become apparent during this analysis.
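Attributes that were never populated can be found with a simple profiling pass before the object model is finalized. The Python sketch below is one possible approach; the `population_rates` function, the field names, and the sample records are hypothetical.

```python
def population_rates(records, fields):
    """Return the fraction of records with a non-empty value for each field."""
    total = len(records)
    rates = {}
    for field in fields:
        # Treat missing keys, None, and empty strings as unpopulated.
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        rates[field] = filled / total if total else 0.0
    return rates


records = [
    {"material": "DI", "lining": None},
    {"material": "PVC", "lining": ""},
    {"material": "", "lining": None},
    {"material": "CI"},
]
rates = population_rates(records, ["material", "lining"])
```

A field such as `lining` in this sample, with a rate of zero, is a candidate for removal from the object design, subject to confirmation by the stakeholders.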

Jim Kyles

Senior Consultant

Kema Consulting, Inc.

1620 S. Ashland Avenue, Suite 106

Green Bay, WI 54304

jkyles@kemaconsulting.com

References

[1] “Introduction to Object Oriented Databases” - Steve Hand and Jane Chandler 1998

[2] “Introduction to Object Oriented GIS Technology” - David Arctur, Philip Sargent