The value of the advances in mathematics and data processing, storage, and retrieval was never to make “better” maps. In fact, in the sense of a map as a portrait of geography on a two-dimensional surface, these advances should have been thought of as making the map itself obsolete. To begin with an idea of the database, rather than with an idea of the map, may seem somewhat abstract, but it drives a fundamentally different approach to the problem of “digitizing maps.” One is actually not “digitizing maps” at all, but instead creating a database of many disparate elements tied to geospatial coordinates (elements that can be printed in a host of ways or exported to a host of applications); a two-dimensional map is merely one quite limited thing that can be done with this data.
As this discussion will explain, a data-centric approach rather than a map-centric approach drives the data conversion model in some very non-traditional directions, which can in turn lead to dramatically lower unit costs, higher database quality, and the ability to predict and meet schedules accurately.
In the beginning, there was a map. The first cave dwellers scratched a map in the dirt, then on the walls of the cave, to remind themselves and to show others where they had found water or game. The earliest mariners drew portolan charts to identify rocks, channels, and routes. The great illustrated world maps of the 15th century became ever more widely available after the advent of the printing press in 1455, and the first great atlases began to appear.
Mapmakers grew ever more ingenious as they struggled to portray more than two dimensions of information on their maps; Charles Joseph Minard’s famous map of Napoleon’s march into Russia, for example, showed geography, the gain or loss in numbers of soldiers, and several other factors mapped over time, all on a flat sheet. But the severe limitations of two-dimensional depiction only worsened when maps were rendered in three dimensions, since accompanying data printed on curved surfaces became even harder to read.
Given the immense investment in the skills developed by engineers and draftsmen to draw three-dimensional objects in two dimensions, the later development of computer-aided design/computer-aided manufacturing (CAD/CAM) devices to depict three dimensions graphically, and the vast inventory of existing maps on which the assets of utilities, telecommunications companies, pipelines, and other transport industries were depicted, it was perhaps natural that practitioners first looked to the map when they thought about creating more automated geographic information systems (GIS).
Thus an industry developed in which the point of departure was the map, the eventual result was to be the digitizing of maps, and the users of the information were only those who needed maps. New hardware and software were developed, aimed at making better, more accessible, more error-free, and more easily revisable maps. The entire GIS industry grew up around taking paper maps and digitizing them into electronic maps, in order to reduce the cost of making new ones.
Meanwhile, the tools became available to think about storing and using data in an entirely different way from a map: in mathematics, an entire field developed around n-dimensional spaces, without any regard for what they might “look like” in our real world; and in data processing, beginning after the Second World War, the speed and power of processing and the ability to store, retrieve, and search data began to increase exponentially. By 1990, Oracle was selling a “relational” database, first developed in the late 1970s by a small West Coast firm for the Central Intelligence Agency, that broke with the traditional hierarchical structure in ways that created vast new opportunities for associating multiple data attributes.
The value of the advances in mathematics and data processing, storage, and retrieval was never to make “better” maps. In fact, in the sense of a map as a portrait of geography on a two-dimensional surface, these advances should have been thought of as making the map itself obsolete. Instead of a two-dimensional or three-dimensional physical model of geospatial reality, for years it has been possible to carry many dimensions of data tied to specific locations, to check the data for consistency, to access and change a database easily, and, what has become increasingly important in this age of enterprise-wide data solutions, to export it to other applications. For applications actually requiring a map, one can easily be printed or sent to a screen at any scale, and easily transmitted or handled.
To begin with an idea of the database, rather than with an idea of the map, may seem somewhat abstract, but it drives a fundamentally different approach to the problem of “digitizing maps.” One is actually not “digitizing maps” at all, but instead creating a database of many disparate elements tied to geospatial coordinates (elements that can be printed in a host of ways or exported to a host of applications); a two-dimensional map is merely one quite limited thing that can be done with this data.
As the discussion below will explain, a data-centric approach rather than a map-centric approach drives the data conversion model in some very non-traditional directions, which can in turn lead to dramatically lower unit costs (as a result of being able to use substantially lower-paid production staff), higher database quality, and the ability to predict and meet schedules accurately. The most important of these drivers have been:
- The creation of the idea of the Conversion System Design Document (CSDD)℠. Quite obviously, if your focus is the database rather than the map, you must take a comprehensive approach to the data, its consistency, its structure, and its coherence. Although the CSDD℠ has been adopted and adapted by others in our industry, and indeed everybody must use some kind of plan to convert the information on maps into data, a data-centric approach requires a much more extensive effort to get the CSDD℠ right, because the complexities of comprehensive database design dwarf those of map production.
- The extreme—and what sometimes appears to be excessive—emphasis on not allowing any incorrect data into the database in the first instance. If you think like a mapmaker, an error isn’t such an important matter: you just look at the map and change it. But to a database creator, an error is like pouring ink in a pitcher of water—it is easy to get the color in, but it costs a fortune to get it out, if indeed you can at all. Or maybe it is more like looking for needles in a haystack—not only is it painful, but you never know when you are finished.
- The critical importance of an open-architecture Oracle-based delivery system. If you are thinking maps, you have little concern for the multiple uses of the data in the database—yet the data is extremely valuable if it is portable and flexible. An open-architecture approach creates a database that can be ported into any existing (and for that matter future) platform. This means it can feed all of the potential uses, from outage management to right-of-way maintenance to payroll, if the correct data is combined and maintained.
- The critical need for controlling productivity and management. In order to assure the uniformity and consistency of the data, the data should be divided into smaller, more manageable work packages. Then data can be produced at any facility, by any operator, in exactly the same way. This makes it possible to control the scheduling of the work packages, to enter the data in a logical order, and even to pay for the work on a quality-weighted productivity basis to manage costs. Obviously, using a map-based approach, you can just decide how many hours it takes to create a map and the cost per map and then multiply by the number of maps—there is no need for the level of sophistication required by the data-centric approach.
- Finally, an important and fundamentally different quality system, ProcessQA℠. ProcessQA℠ focuses on ensuring that quality has been maintained during each step of the production process. One mistake on a map affects only one map, but one mistake in a database can contaminate an entire process. The ProcessQA℠ software continuously samples processes, not just final products, on a non-random basis, searching for errors (for example, sampling newer keypunchers more heavily, or modifying the sampling on a real-time basis to over-sample where errors are entering a process). The ideas behind this system come from process industries such as chemicals, paper, or circuit boards, where separate machines and parts of processes are sampled continuously and adjusted as necessary, not just the final output of the production process.

The remainder of this paper is organized around the five critical steps in creating a high-quality database; we have used examples of how Apex Geospatial treats these issues, but naturally there are many different ways to achieve the same objectives. The five critical steps:
- Creating the database
- Keeping needles out of the haystack
- Harnessing the power of an open-systems database
- Managing people and process
- Integrating process quality control through each production step
Each of these five areas will be addressed, illustrating how the data-centric open architecture approach developed by Apex Geospatial works today. We believe that these new systems and approaches have indeed transformed an industry and created data options, now generally untapped, which will drive business organizations in the new Millennium.
- Creating the database: the Conversion System Design Document
A data-centric approach aims at the creation of a multi-attribute database that, in addition to producing maps in any target system, has many more capabilities. The design of a comprehensive database that is robust and cohesive requires a great deal more expertise, time, and judgment than if a map were to be the only output. In essence, an entirely new option arises: to include related data from many different source materials on many different attributes, many more than could be included on even the most cluttered map. This new flexibility suggests several changes in the way utilities and/or their software platforms should think about their data conversion job. First, the broadest view should be taken of what data might be wanted or required and where and in what form it now lies. A good way to achieve this is to work backwards from the data requirements of the major utility systems the database must feed, such as Outage Management Systems, Inspection and Replacement Systems, Right-of-Way Maintenance Systems, and other applications.
This flexibility also has interesting organizational implications for a utility, because with multiple users (and perhaps funders) of conversion data, important organizational and procedural decisions may need to be made.
Next, the users of the data should work closely together with the database designers on the creation of the Conversion System Design Document (CSDD)℠, both so that the great flexibility of the Oracle-based database can be achieved, and so that the database is designed at minimum cost.
Finally, the team should specify, in as fine detail as possible, all of the potential uses of the database, not only for maps (and in all of the different scales or forms of maps), but also for the raw data. It is vital to anticipate, insofar as possible, all of the potential needs for the data, because key managers with differing preferences, the regulatory climate, and other factors change over time. Far too often, the result of short-sighted or cost-driven conversion models is data that cannot, without intervention, be used again for other applications.
The specific components of a CSDD℠ are straightforward:
- A definition of the target GIS database,
- A definition of the data sources from which the converted data is to be captured,
- A detailed set of production procedures, and
- Specifications for the customization of the database creation software to tailor the project to the client’s requirements.
The production procedures and the DataWorks® customization specifications are completely integrated and are developed in tandem; they are commonly referred to collectively as the “road map.” This “road map” commonly encompasses two domains (the distribution system and the land base) and two data worlds (logical and graphical).
Experience has shown the value of spending extensive time and effort (and hence cost) on creating the CSDD℠, since it is the fundamental basis for any successful database creation project. Because of time pressures often brought on by outside forces, long procurement cycles, and a general eagerness to get going, clients often wish to curtail this phase—or even to begin conversion without a clear data model in mind. This is a serious mistake, for recovery from a database design effort started without a clear specification is often very costly, or even impossible.
- Keeping needles out of the haystack: The Apex Oracle Data Entry Application Interface (ODAPI™)
Of all the steps, checks, and procedures used to attain high-quality data in the data-centric approach, none is more important than entering the data correctly in the first place. After the data is entered in the database, every subsequent procedure to check, correct, and refine it is more complex, more costly, and risks introducing new errors.
Because Apex Geospatial began as a pure data conversion company, with specialized software and techniques that raised the industry quality standard from 99.95% to 99.995% in non-graphic data conversion (that is, from one error in 2 KB to one in 20 KB—or from roughly one error per page to one error every ten pages), its focus has always been on delivering quality data. Thus it was natural, when we began to consider the data-centric approach to building a GIS database, to focus on data entry quality—not letting the needle into the haystack in the first place—because we already knew the cost calculus between prevention and remediation.
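As a sanity check, the conversion from an accuracy rate to an error spacing is simple arithmetic (a quick illustrative sketch; it assumes one byte per character, and the function name is ours, not part of any Apex tool):

```python
def chars_per_error(accuracy_pct: float) -> float:
    """Average number of characters between errors at a given
    character-level accuracy rate, e.g. 99.95% -> one error in ~2,000
    characters (~2 KB), 99.995% -> one in ~20,000 (~20 KB)."""
    error_rate = 1.0 - accuracy_pct / 100.0
    return 1.0 / error_rate

print(chars_per_error(99.95))   # ~2000 characters, i.e. ~2 KB
print(chars_per_error(99.995))  # ~20000 characters, i.e. ~20 KB
```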
We designed a procedure to produce 99+% data quality with a four-step system, in which two different coders code the attributed data by hand, and then two new and different keypunchers enter the data. Employing many of the techniques perfected over many years of data entry (such as the in-process quality control described below), Apex Geospatial focuses extensive attention on not allowing any bad data into the database.
Obviously, there are numerous ways to enter data carefully, each with its own cost profile. The brief description of the Apex Oracle Data Entry Application Interface (ODAPI™) that follows is meant only to illustrate one way to achieve excellent accuracy.
ODAPI™, designed for double-capture of data from two independently created coding sheets, yields extremely high accuracy rates—in excess of 99%.
The ODAPI™ method involves the following steps:
- Data is coded from an original source (such as a map) onto coding sheets (Coding 1).
- The same data is coded, in the same sequence, by a different individual onto a 2nd set of coding sheets (Coding 2).
- One operator enters the data from the 1st set of coding sheets into ODAPI (Entry 1).
- A different operator keys the data from the corresponding 2nd set of coding sheets into ODAPI (Entry 2).
- During Entry 2, the Data File created during Entry 1 is available as a reference but is not updated. Instead, every mismatch is recorded in a Delta File.
- The mismatches are reconciled by referring to the original source within the Error Correction and Assignment (ECA) Module.
- Errors are also corrected in the ECA module. The ECA process is performed only by specifically authorized personnel.
- The authorized person performing the ECA process examines each Entry 1/Entry 2 mismatch present in the Delta File. The correct value is determined by referring to the Map (i.e., the customer-supplied source) and the coding sheets. The incorrect value is then “replaced” with the correct value, resulting in a “corrected” version of the Data File. At the same time, the error is assigned to (a) Coding 1, (b) Coding 2, (c) Entry 1, or (d) Entry 2, as appropriate.
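The double-capture logic above can be reduced to a short sketch: two independently keyed streams are compared field by field, every mismatch lands in a Delta File, and an adjudicated value (read from the source map during ECA) resolves each one. The record layout and function names here are invented for illustration; they are not ODAPI™'s actual interface.

```python
def build_delta(entry1: list[dict], entry2: list[dict]) -> list[dict]:
    """Compare two independently keyed capture streams field by field.
    Entry 1 is kept as the untouched reference file; every mismatch is
    recorded in a delta list rather than corrected in place."""
    delta = []
    for row_no, (r1, r2) in enumerate(zip(entry1, entry2)):
        for field in r1:
            if r1[field] != r2.get(field):
                delta.append({"row": row_no, "field": field,
                              "entry1": r1[field], "entry2": r2.get(field)})
    return delta

def reconcile(entry1: list[dict], delta: list[dict], adjudicated: dict) -> list[dict]:
    """ECA step: replace each mismatched value with the value an
    authorized operator determined from the original source."""
    corrected = [dict(r) for r in entry1]
    for d in delta:
        corrected[d["row"]][d["field"]] = adjudicated[(d["row"], d["field"])]
    return corrected

# Two keypunchers disagree on one attribute of a (hypothetical) pole record:
e1 = [{"pole_id": "P-101", "material": "wood"}]
e2 = [{"pole_id": "P-101", "material": "steel"}]
delta = build_delta(e1, e2)                          # one mismatch on "material"
fixed = reconcile(e1, delta, {(0, "material"): "wood"})
```

The key design point, as in the description above, is that Entry 1 is never silently "corrected" during Entry 2; all disagreement is externalized to the Delta File so every error can be attributed to a specific coding or entry step.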
- Harnessing the Power of an Open-Systems Database: Oracle
The heart of the data-centric approach is a program that converts digital and paper data from multiple sources into an open-systems Oracle database. Using ODAPI™, the high-quality data entry system previously described, data from many sources can be unified into a single, consistent GIS database.
The Apex Geospatial proprietary technology to do this is called DataWorks®. Once the data is safely and correctly in an Oracle database, it can be ported to any target system. This essentially turns the database into a strategic tool, which can be used to produce maps in many forms or for alternative purposes. With the entire GIS data model compiled in Oracle—i.e., the symbology, attribute data, network definition, graphics, and graphics placement rules—converted data has the maximum degree of flexibility.
This also allows for sophisticated checking of the consistency of database information through simple programming. Built into DataWorks®, for example, are connectivity checks for distribution networks, so that even if a mistake appeared on the original map it can be corrected in the database. Obviously this also accommodates late design changes with minimal impact on cost and schedule, since no maps are printed until the database is completely rationalized—build once, use many.
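A connectivity check of the kind described can be sketched as a simple graph traversal over the captured network. The node and edge names below are invented for illustration; DataWorks®'s actual checks are proprietary.

```python
from collections import defaultdict, deque

def unreached(nodes: list[str], edges: list[tuple[str, str]], source: str) -> list[str]:
    """Return network nodes not reachable from the feeding source,
    i.e. candidates for a connectivity error in the captured data."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, queue = {source}, deque([source])
    while queue:              # breadth-first search from the source
        n = queue.popleft()
        for m in adj[n]:
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return sorted(set(nodes) - seen)

# A transformer keyed without its connecting span shows up immediately:
nodes = ["SUB1", "TX1", "TX2"]
edges = [("SUB1", "TX1")]                # span to TX2 missing from capture
print(unreached(nodes, edges, "SUB1"))   # → ['TX2']
```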
In short, although the first purpose may have been maps, clients receive a high-quality, open architecture database with many alternative uses—a database that can have powerful applications elsewhere in the organization. This sometimes can have important budgeting and organizational implications for the buyer, and it all results from an initial focus on the data rather than on a map. In section 4, we describe another very important component of producing data of the quality necessary to drive a data-centric approach.
- Managing People and Process Through Highly Evolved Delivery Systems
The third of the four legs of the table that ensure the high quality necessary for a data-centric approach is an overall production system that integrates production, human resource management, and quality processes. Although the details of the quality control processes are discussed below, it is extremely important to understand that the management of the people and the process is what produces the standardization that both allows for cost efficiencies and ensures quality inputs. Apex Geospatial calls its own trademarked system for this ProACT™.
When ProACT™ is applied under proper conditions, such a high degree of standardization results that the output from any one of our facilities is indistinguishable from that of the others. In itself, proper scheduling and management of human resources is a key component of ensuring quality, and great cost savings are realized in such an integrated system. Clients visiting Apex Geospatial’s Indian facilities, for example, are often amazed that several hundred production workers receive regular paychecks based on their own quality-weighted productivity—with only one payroll overhead employee in the facility.
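Quality-weighted productivity pay of the kind described could be computed along these lines. The weighting formula, rates, and thresholds here are invented for illustration; Apex Geospatial's actual formula is not public.

```python
def weighted_pay(units_produced: int, error_rate: float,
                 rate_per_unit: float, target_error_rate: float = 0.0005) -> float:
    """Pay scales with output, discounted as the operator's sampled
    error rate exceeds the target rate (illustrative formula only)."""
    # Full rate at or below target; proportional discount above it.
    quality_factor = min(1.0, target_error_rate / max(error_rate, 1e-9))
    return units_produced * rate_per_unit * quality_factor

pay_clean = weighted_pay(1000, 0.0002, 0.05)   # under target: full rate, ~50.0
pay_sloppy = weighted_pay(1000, 0.0010, 0.05)  # double the target: half rate, ~25.0
```

Because the error rates come from the same sampling that drives quality control, a formula like this lets payroll be generated directly from production data, which is consistent with running a facility of several hundred workers on one payroll employee.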
- Integrating Process Quality Control Through Each Production Step
The prior sections have demonstrated clearly why a data-centric approach requires the highest quality data: unlike dealing with maps on a one-by-one basis, it is usually difficult, costly, or even impossible to get the needles out of the database haystack. Each of these elements (the design, the data entry, the testing and configuration, and the management systems) has its part in ensuring quality, but nothing is more important than process quality control.
Typical quality control is done at the end of the production process, and indeed Apex Geospatial and every other producer has tests, procedures, and policies to make certain that the eventual results are correct. But Apex Geospatial’s great insight was to recognize, years ago, that GIS database creation is a scientific process, not a craft. We established a conversion system based upon this premise and borrowed an important concept from the process (assembly) industries (e.g., automobile manufacturing, circuit board assembly, chemicals): testing the quality of the interim product at each production step. We call this ProcessQA℠.
ProcessQA operates in parallel with production. Samples of the output from each production step are taken daily, on a real-time basis. Production continues uninterrupted even as the samples are checked. If an error is discovered, corrective action is taken to recalibrate the production process. For example, if a series of errors is discovered and traced to a misunderstanding of a specification, as can happen in the early stages of a new project, all affected operators are retrained.
Another powerful aspect of ProcessQA is that samples are deliberately biased towards potential errors. For example, when a new specification is first introduced on the production floor, ProcessQA analysts take a disproportionately large sample from the relevant task. In this way, the sampling changes from day to day, continuously targeting areas of possible problems.
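The biased-sampling idea can be sketched with weighted random selection, oversampling newer operators and work done against a newly introduced specification. The fields, weights, and thresholds are invented for illustration, not ProcessQA's actual parameters.

```python
import random

def pick_samples(work_items: list[dict], k: int, rng: random.Random) -> list[dict]:
    """Draw k QA samples, deliberately biased toward likely error
    sources rather than sampled uniformly (weights are illustrative)."""
    def weight(item: dict) -> float:
        w = 1.0
        if item["operator_tenure_days"] < 30:   # newer keypuncher
            w *= 3.0
        if item["spec_is_new"]:                 # freshly introduced spec
            w *= 2.0
        return w
    weights = [weight(i) for i in work_items]
    return rng.choices(work_items, weights=weights, k=k)

items = [
    {"id": 1, "operator_tenure_days": 400, "spec_is_new": False},  # weight 1
    {"id": 2, "operator_tenure_days": 10,  "spec_is_new": True},   # weight 6
]
samples = pick_samples(items, k=100, rng=random.Random(0))
# The new operator working to the new spec dominates the sample, roughly 6:1.
```

As the weights are recomputed from current production data each day, the sampling shifts automatically toward wherever errors are most likely to be entering the process.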
When the steps above have been properly applied, the result is a highly accurate database and highly accurate maps—accuracies of 99% or higher. The final product quality control then applies normal sampling techniques before final delivery to the client, but there cannot be quality problems at this stage if the other steps have been followed correctly.
This paper suggests that if you begin with the idea of changing a paper map into a digitized map, you will not create the best process for converting GIS data. If instead you begin with a data-centric approach, an approach that begins with the idea of creating a database, you will do a lot of things differently. First, you will spend more time and effort on design, conscious that what you are building can have many more applications than the current one of drawing maps, and that you should cast a broad and inclusive net as to which data to put in the database, based upon inputs from many potential users and a sophisticated appreciation of what might be needed in the next decade or so.
Next, you will be fanatical about data quality, because once an error gets into a database, unlike on an individual map, it is costly, difficult, or perhaps even impossible to get it out. You will approach quality on many levels: keying in the data, testing and configuring the data, and ensuring quality during the production process through process quality control.
Finally, you will store the database in an open-architecture relational database system, so that you can port it not only into the current target system for today’s uses, but so that you end up with an asset with a vast array of future uses.
Apex Geospatial Data Services, LLC
400 N. Loop 1604 East, Suite 300
San Antonio, TX 78232
Telephone (210) 404-9585
Fax (210) 490-3665
Conversion System Design Document (CSDD)℠, DataWorks®, ProACT™, ProductQA℠, ProcessQA℠, and ODAPI™ are registered Marks or Marks of Apex Data Services, Inc. in the United States and other countries. Other product and company names mentioned herein might be registered Marks or Marks of their respective owners.