Not Just for Maps: CPS as a System for Structured Geographic Data Management

Andreas Oxenstierna
Dan Sherril - Presenter

Abstract

T-Kartor Sweden AB has developed, marketed, and sold the Cartographic Production System (CPS) as an application built on top of ARC/INFO for the past seven years. It has been targeted at organizations concerned with high-quality cartography. While its ability to produce maps has been the focal point of marketing CPS, its ability to organize and manage both simple and complex sets of geographic data has received relatively little attention. This paper discusses the functionality and methods developed in CPS for data management and presents some new, expanded areas of application development.

1. Introduction

The Cartographic Production System (CPS) originated in 1993 in response to cartographers who wanted to take advantage of the power of digital cartography; Esri's Geographic Information System (GIS), ARC/INFO, provided the vehicle. CPS was then relatively simple: it was built on early versions of ARC/INFO, and databases were simple and small. Since these beginnings, GIS technology has been readily accepted not only by cartographers but by many other organizations as well. Databases, both spatial and nonspatial, have become huge and complex in terms of both the amount of data and the number of derivative products. CPS has evolved to handle these complexities, and now even organizations that do not practice cartography as a primary activity use CPS because of the structured geographic data management environment that it offers. This paper discusses a number of issue areas that have affected CPS development. These issues are grouped into data complexity, use of GIS within large organizations, and the georelational data model used by ARC/INFO.

2. Data Complexity

Geographic databases are becoming larger and more complex. The scale of coverage of geographic data has increased dramatically. In the early 1990s, the uniform world coverage of digital data was at 1:1,000,000 scale with the Digital Chart of the World (DCW). Now there are initiatives within the US defense establishment to have uniform coverage at 1:250,000 scale for the entire world and at larger scales for selected areas. A four-fold increase in the scale of mapping translates into a 16-fold increase in the amount of data, because the data volume grows roughly with the square of the linear scale. Organizations can not only take advantage of a huge amount of existing digital geographic information; they also typically generate large volumes of their own. Satellite imagery has become widely available in both raw and interpreted forms. The resolution of these data has seen a similarly dramatic increase.
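
The scale arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is ours, and it assumes that data volume grows with the mapped area at a uniform feature density, which real data sets only approximate.

```python
def data_growth_factor(old_scale_denominator: float,
                       new_scale_denominator: float) -> float:
    """Approximate growth in data volume when moving to a larger map scale.

    Assumption: volume grows with the square of the linear scale ratio.
    """
    linear_ratio = old_scale_denominator / new_scale_denominator
    return linear_ratio ** 2


# 1:1,000,000 (DCW) to 1:250,000 is a four-fold linear increase.
factor = data_growth_factor(1_000_000, 250_000)
print(factor)
```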

The management of these large and thematically diverse data sets is the subject of this paper.

CPS is built around the CPS Project, which defines a data dictionary storing all properties and parameters needed for data access. These include the type of data (cover, ArcStorm, SDE, LIBRARIAN, image), physical location (path, if required), geometric type (area, line, etc.), user name, logical constraints, projection definitions, symbology, and many other parameters.
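
To make the idea of a data dictionary entry concrete, the following Python sketch models one entry with the parameters listed above. It is an illustration only and does not show the actual CPS Project format: the class name, field names, and the rule about which data types need a path are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

# Data types named in the text; everything else here is hypothetical.
DATA_TYPES = {"cover", "ArcStorm", "SDE", "LIBRARIAN", "image"}
PATH_REQUIRED = {"cover", "image"}  # assumption: file-based types need a path


@dataclass
class DatasetEntry:
    name: str
    data_type: str
    geometry_type: str               # e.g. "area", "line"
    user_name: str
    projection: str
    path: Optional[str] = None
    symbology: Optional[str] = None
    constraints: list = field(default_factory=list)

    def problems(self) -> list:
        """Return human-readable problems; an empty list means usable."""
        found = []
        if self.data_type not in DATA_TYPES:
            found.append(f"unknown data type: {self.data_type}")
        if self.data_type in PATH_REQUIRED and not self.path:
            found.append("path is required for this data type")
        return found


roads = DatasetEntry(name="roads", data_type="cover", geometry_type="line",
                     user_name="operator1", projection="UTM zone 33N")
print(roads.problems())
```

Keeping such checks next to the entry means that an incomplete definition is reported before any application tries to open the data.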

A stand-alone viewer called CPS MEDA displays the contents of all CPS projects available in the CPS server installation. This includes the existence, scale, and coverage of referenced data sets as well as the status and contents of all defined map products (see Figure 1). Text-based metadata that describe individual datasets and map products are readily available.

This makes CPS MEDA a powerful management tool at both the operator level and the level of upper management. Managers can easily use CPS MEDA to see what progress has occurred at the project and database level. This is useful for generating progress reports and management metrics for productivity. In addition, CPS MEDA provides an easy way for the manager to view the data without having to know ARC/INFO.

For the operator charged with a compilation or map production task, it is easy to see the contents of the database. The operator can also review the work of others as reflected in map formats, frames, or even other projects. The operator can easily decide to take advantage of existing work done by others so that duplication of effort is minimized.

History management is another aspect of data complexity. CPS takes advantage of a master-product database design (MDB/PDB). This design has been documented elsewhere but essentially it stores true (at scale) geometry for all features in a master database and then stores cartographic changes to features in a series of product databases. The same geographic feature can potentially appear in many product databases as well as have its true representation in the master database. Changes to one feature are made in the master and then replicated throughout the product databases through history management.
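
The master-product idea can be sketched in a few lines of Python. This is a deliberately simplified model and not the CPS implementation: it assumes that a cartographic change is a simple displacement (dx, dy) per feature and that replication is done by replaying a change log; all class and attribute names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MasterDatabase:
    geometry: dict = field(default_factory=dict)  # feature id -> true coordinates
    log: list = field(default_factory=list)       # (feature id, old coords, new coords)

    def update(self, feature_id, new_coords):
        """Change the true geometry and record the previous version."""
        old_coords = self.geometry.get(feature_id)
        self.log.append((feature_id, old_coords, new_coords))
        self.geometry[feature_id] = new_coords


@dataclass
class ProductDatabase:
    displacement: dict = field(default_factory=dict)  # feature id -> (dx, dy)
    geometry: dict = field(default_factory=dict)      # cartographic coordinates
    applied: int = 0                                  # master log entries replicated

    def sync(self, master):
        """Replay master changes made since the last sync, keeping the
        product's own cartographic displacement for each feature."""
        for feature_id, _old, new_coords in master.log[self.applied:]:
            dx, dy = self.displacement.get(feature_id, (0.0, 0.0))
            self.geometry[feature_id] = [(x + dx, y + dy) for x, y in new_coords]
        self.applied = len(master.log)


master = MasterDatabase()
master.update("road-1", [(0.0, 0.0), (10.0, 0.0)])
chart = ProductDatabase(displacement={"road-1": (0.0, 0.5)})
chart.sync(master)
master.update("road-1", [(0.0, 0.0), (12.0, 0.0)])
chart.sync(master)
print(chart.geometry["road-1"])
```

In this model the log serves two purposes: it drives replication to the product databases, and it retains the earlier versions of each feature.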

In addition to facilitating cartographic editing of product databases, history management serves a second purpose: many organizations are legally required to trace the changes that have occurred either in the location of a particular feature or in its attribution. Agencies producing nautical charts or Electronic Navigational Charts (ENCs) are an obvious example.

CPS implements a history management solution. This solution is able to handle both the requirements of MDB/PDB interaction as well as the legal tracking requirements.

3. Acceptance of GeoIT in large organizations - multi-user environments

A typical large multi-user environment has (or should have) the following characteristics:

Mixed platforms. In many cases servers are based on a Unix flavor and the clients are based on a Windows OS.
Scalable system functionality. Centralized system administration with as "thin" clients as possible. The clients must be isolated from the centralized information stored on the server. An update to the centralized information must NOT require any updating of dependent clients.
Organization hierarchy with system administrators, project managers and operators
Large seamless spatial databases, with updating responsibility in a few departments. There may also be external spatial data in various data formats.
Large non-spatial databases, with updating responsibility in many different departments. There may also be external tabular data.
Many-to-many relations between spatial and non-spatial data
Many different products (paper maps, presentation maps and diagrams, geographic analysis, intranet and internet applications, etc.) must be produced from and supported by the databases
Organization responsibility for updates, in many cases also legal responsibility
Outsourcing. As the level of competence required for daily operations rises, outsourcing is an alternative for even core components of many organizations. The system must support outsourcing, if required.

A true client-server architecture is needed to manage a complex environment like this, covering software, data, products, and organizational responsibilities.

CPS, as based upon ARC/INFO "workstation" version 7.2 or 8.x, implements a true client-server architecture, where all software, application files, data, product information (map layout, lookup tables, symbology files, etc.) and workflow information are stored on the server. All platforms that ARC/INFO "workstation" currently supports (the major Unix flavors (Sun Solaris, HP, Digital Unix, Silicon Graphics, IBM), Windows NT and Alpha NT) can be mixed in a transparent way.

The central information will always be updated only on the server. The clients must only be configured to be able to execute the software and access the centralized information in a way transparent to the operator. The presentation will outline the principles for the CPS client-server solution for the software including application files and product information. The handling of spatial and non-spatial data requires a more detailed description.

4. Spatial data, write access

Spatial data as handled in CPS usually requires both read and write access.

Single-user write access is straightforward in Esri-supported data formats, but the responsibility for handling any update conflicts rests with the operator.

Multi-user write access requires more stringent data handling, usually called a long-term transaction. A long-term transaction is composed of the following steps:

1. Objects are selected in the central database. Objects locked by other users cannot be selected.
2. A transaction is created, and all selected objects are locked and copied out to a local data format.
3. Update work can commence. Data from the central database can be displayed as background data. Data to be updated can also be exported to other data formats, as needed for outsourcing or for use in less expensive environments, for example.
4. During update work, CPS manages a local workspace with all temporary data.
5. When work is finished, the updated data is stored back to the server. Applications inside CPS can ensure data consistency by cascading updates to e.g. related cartographic text objects and/or non-spatial data. If history needs to be stored, the historical layers are updated with the "old" versions of the updated objects.
6. The locks are released and the transaction is closed.
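
The steps above can be sketched as a small Python class. This is a minimal, in-memory illustration of the locking and check-in logic only; it is not how ArcStorm or CPS implement transactions, and the class, method, and object names are hypothetical.

```python
import copy


class CentralDatabase:
    def __init__(self, objects):
        self.objects = dict(objects)  # object id -> attributes/geometry
        self.locks = {}               # object id -> user holding the lock
        self.history = []             # (object id, previous version)

    def begin(self, user, object_ids):
        """Steps 1-2: refuse locked objects, then lock and copy out."""
        locked = [oid for oid in object_ids if oid in self.locks]
        if locked:
            raise RuntimeError(f"locked by another user: {locked}")
        for oid in object_ids:
            self.locks[oid] = user
        # The deep copy plays the role of the local workspace (steps 3-4).
        return {oid: copy.deepcopy(self.objects[oid]) for oid in object_ids}

    def commit(self, user, workspace):
        """Steps 5-6: store old versions, write back, release locks."""
        for oid in workspace:
            if self.locks.get(oid) != user:
                raise RuntimeError(f"{user} does not hold the lock on {oid}")
        for oid, new_version in workspace.items():
            self.history.append((oid, self.objects[oid]))
            self.objects[oid] = new_version
            del self.locks[oid]


db = CentralDatabase({"buoy-7": {"depth": 12}, "wreck-3": {"depth": 30}})
work = db.begin("anna", ["buoy-7"])
work["buoy-7"]["depth"] = 11          # edit the local copy only

try:
    db.begin("bo", ["buoy-7"])        # a second user cannot select it
    blocked = False
except RuntimeError:
    blocked = True

db.commit("anna", work)
```

Until the commit, the central database still holds the old value, so other users can keep displaying it as background data while the update is in progress.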

Long-term transactions have been supported by the ARC/INFO module ArcStorm since the early 1990s and are integrated in CPS in a way that is transparent to the operator.

CPS has also enhanced SDE to be able to perform long-term transactions using the same approach as ArcStorm.

5. Non-spatial read access (DBI access)

Access to non-spatial databases is very important for enhancing the usability of the spatial database inside an organization. Because non-spatial databases are in most cases maintained by other departments, using applications with no interface to the spatial data, it must be possible to implement transparent read access from applications that use both spatial and non-spatial data. The implementation must allow any update made to the non-spatial data by the responsible department to be reflected instantly.

In other words, it is not possible to migrate the non-spatial data into the spatial data model. Instead, the non-spatial data must be accessed by establishing unique identifiers, which can be used for linking between spatial and non-spatial data.

This can be achieved in a number of technical implementations. The most common solutions are:

RDBMS views, e.g. ORACLE, linked to spatial data
Applications for on-the-fly conversion of the non-spatial data into INFO tables which can be linked to the spatial data. This solution is static, i.e. updates to the non-spatial data require a re-execution of the application to recreate the INFO tables.
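
The difference between the two solutions can be illustrated with Python's built-in sqlite3 module. SQLite stands in for the RDBMS here, and a copied table stands in for a converted INFO table; the table and column names are hypothetical. The point of the sketch is only that a view reflects updates at query time while a converted copy does not.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE parcel_geometry (parcel_id TEXT PRIMARY KEY, area_m2 REAL);
    CREATE TABLE parcel_owner (parcel_id TEXT PRIMARY KEY, owner TEXT);
    INSERT INTO parcel_geometry VALUES ('P1', 850.0);
    INSERT INTO parcel_owner VALUES ('P1', 'Municipality');

    -- Live link: a view is evaluated at query time.
    CREATE VIEW parcel_live AS
        SELECT g.parcel_id, g.area_m2, o.owner
        FROM parcel_geometry g
        JOIN parcel_owner o ON g.parcel_id = o.parcel_id;

    -- Static link: a converted copy, frozen at conversion time.
    CREATE TABLE parcel_snapshot AS SELECT * FROM parcel_live;
""")

# The department that owns the non-spatial data updates its table.
con.execute("UPDATE parcel_owner SET owner = 'Harbour Board' "
            "WHERE parcel_id = 'P1'")

live_owner = con.execute("SELECT owner FROM parcel_live").fetchone()[0]
snapshot_owner = con.execute("SELECT owner FROM parcel_snapshot").fetchone()[0]
print(live_owner, snapshot_owner)
```

In both cases the link depends on the unique identifier (here `parcel_id`) that is shared between the spatial and the non-spatial side.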

6. Data model with integrated topology

Geographic data can be organized in two different manners:

The first manner is separate geometry, which may or may not be redundant, as implemented in CAD, shapefile or SDE (ArcSDE8). It is often called "spaghetti data". It is typically used for small-scale databases, aimed at topographic or thematic maps, for example, and in the CAD-oriented world. Geographic coincidence between different data sets can only be identified and managed by using time-consuming spatial queries.

The other manner is shared geometry, which can never be redundant, as implemented in the ARC/INFO cover. It is typically used for large-scale databases, aimed at urban maps and navigation purposes, for example, and it is needed to perform many types of geographic analysis.

The shared geometry data model has two major advantages over the separate geometry model: support for network analysis and integrated topology.

6.1 Network analysis

Any line network analysis, for navigational purposes for example, must use data with shared geometry. The chain-node data model, as implemented in the ARC/INFO cover model, is very efficient for network analysis. All lines are connected by nodes, and all analysis uses the node information to make decisions. The chain-node model performs very well, as the analysis can be carried out with tabular queries alone.
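
The following Python sketch shows why node information is sufficient for network analysis: a route is found from a table of arcs with from-node and to-node columns, without touching any coordinates. The arc table is hypothetical, arcs are treated as traversable in both directions, and breadth-first search is used as a simple stand-in for the routing algorithms a real system would apply.

```python
from collections import deque

# Hypothetical arc table: (arc id, from-node, to-node).
ARCS = [
    (1, "A", "B"),
    (2, "B", "C"),
    (3, "C", "D"),
    (4, "A", "D"),
    (5, "E", "F"),  # not connected to the rest of the network
]


def build_node_index(arcs):
    """Map each node to the (arc id, neighbouring node) pairs that touch it."""
    index = {}
    for arc_id, from_node, to_node in arcs:
        index.setdefault(from_node, []).append((arc_id, to_node))
        index.setdefault(to_node, []).append((arc_id, from_node))
    return index


def fewest_arcs_route(arcs, start, goal):
    """Breadth-first search; returns the arc ids along a route, or None."""
    index = build_node_index(arcs)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, route = queue.popleft()
        if node == goal:
            return route
        for arc_id, neighbour in index.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, route + [arc_id]))
    return None


print(fewest_arcs_route(ARCS, "A", "C"))
print(fewest_arcs_route(ARCS, "A", "F"))
```

With separate geometry, the node index would first have to be derived by comparing line end points spatially, which is the time-consuming step that shared geometry avoids.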

6.2 Integrated topology

Data stored with the principle of shared geometry must often be handled together in some aspects but separated in other aspects.

For example, an area object has several characteristics (e.g. type/subtype of area, population) which are stored in attributes, either directly in the area object attribute table or in a related table where the relation is based on a unique identifier. The area object itself carries no geometry; the geometry is referenced from related line objects. These line objects may have several different characteristics (e.g. type/subtype of line, navigation information, addresses) which are stored in attributes, either directly in the line object attribute table or in a related table where the relation is based on a unique identifier.

We call this geographic data model integrated topology.

The IHO (International Hydrographic Organization) has adopted the integrated topology data model in the S57 standard, and named the information layer carrying the geometry the spatial layer. Feature objects, stored in many different feature object classes, represent the real-world objects. The feature objects store only the attribute information and get their geometry from the spatial objects through relations based on unique identifiers.

Important aspects of the integrated topology data model are:

Geometric updating will affect all related feature objects (area, line, node and point)
Updating of attributes must be performed independently for each feature object (e.g. the area and the line object), but information on all related and possibly affected objects must be available if needed.
Unique identifiers must be used to manage all needed relations
There may be a many-to-many relation between different objects that must be handled during editing
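
These aspects can be illustrated with a small Python sketch in which coordinates live only in a spatial layer and feature objects reach them through a many-to-many link table. The identifiers, attribute names, and helper functions are hypothetical and do not follow the S57 encoding.

```python
# Spatial layer: the only place where coordinates are stored.
spatial = {
    "edge-1": [(0, 0), (5, 0), (10, 0)],
    "edge-2": [(10, 0), (10, 10)],
    "edge-3": [(10, 10), (0, 0)],
}

# Feature objects carry attributes only.
features = {
    "park-1": {"class": "park", "name": "Harbour Park"},
    "road-1": {"class": "road", "name": "Quay Street"},
}

# Many-to-many link table: (feature id, spatial id).
links = [
    ("park-1", "edge-1"), ("park-1", "edge-2"), ("park-1", "edge-3"),
    ("road-1", "edge-1"),
]


def geometry_of(feature_id):
    """Collect the geometry a feature references through the link table."""
    return [spatial[sid] for fid, sid in links if fid == feature_id]


def features_using(spatial_id):
    """List every feature object that shares the given spatial object."""
    return sorted(fid for fid, sid in links if sid == spatial_id)


def move_edge(spatial_id, new_coords):
    """Update shared geometry once; report every feature it affects."""
    spatial[spatial_id] = new_coords
    return features_using(spatial_id)


affected = move_edge("edge-1", [(0, 0), (5, 1), (10, 0)])
print(affected)

# Attributes are updated independently, per feature object.
features["road-1"]["name"] = "Quay Road"
```

A single geometric edit changes the boundary of the park and the course of the road at the same time, while the attribute edit touches only the road.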

Building, accessing, and maintaining databases with integrated topology present a number of technical challenges. Some of the most important ones are:

Managing of the basic geometry (the spatial layer)

Managing of many-to-many link tables between spatial and feature objects

Managing of unique identifiers

Transparent access to all defined object types

Consistent updating of related spatial objects

Consistent updating of related feature (non-spatial) objects

Scalable performance

Examples:

Urban data. The road network is stored as lines with attributes for classification, addresses and navigation. Together with other line types, these lines define polygons such as built-up areas, industrial areas, parks and official buildings. Unique identifiers are needed to link the spatial information to non-spatial data, which is often stored in other data formats.
Hydrographic S57 data. The S57 standard defines that all objects carrying geometry are stored in a specific spatial layer, and the feature objects reference through many-to-many links their geometry as stored in the related spatial objects. Spatial redundancy is NOT allowed. All objects, both the spatial objects (lines, nodes and points) and the feature objects (areas, lines, nodes, points and collections) must have unique identifiers. The feature objects are organized in approximately 180 object classes, each with a defined set of attributes and attribute code ranges.

CPS offers well-integrated support for all of these technical challenges. The upcoming version of CPS offers integrated support for the S57 data model in the CPS Data Dictionary and also support to define other data models using integrated topology.

7. Relational database approach

Relational databases have a dominant market share over object-oriented databases for several reasons. This paper does not include an in-depth discussion of the differences between object-oriented and relational databases. Instead, a short list is presented of the most important subjects we have found during many years of database design and application implementation using CPS.

Flexible object model

Flexible application support

Performance

Integration between different departments

Lookup-tables for centralized and multi-product symbology

8. Conclusion

This paper has argued for the importance of storing and maintaining a central database while producing products of different kinds from one and the same database. This saves time and cost and, above all, provides the security of knowing that all updates are maintained in one database and that the products are updated automatically when needed.

CPS has not only found a way to make aesthetically pleasing maps, but also takes care of complex data in a multi-user environment in a sophisticated way.

CPS - the safe transition to New Generation technology.

Andreas Oxenstierna
Senior consultant
T-Kartor Sweden AB
Box 5097
S-291 05 Kristianstad
Sweden
tel. +46 44 206800
fax. +46 44 128256
e-mail. ao@t-kartor.se

Dan Sherril
Senior consultant
T-Kartor Sweden AB
Box 5097
S-291 05 Kristianstad
Sweden
tel. +46 44 206800
fax. +46 44 128256
e-mail. ds@t-kartor.se