The Abstract
The value of the advances in mathematics and data processing, storage, and retrieval was never to make "better" maps. In fact, in the sense of a map as a portrait of geography on a two-dimensional surface, these advances should have been thought of as making the map itself obsolete. To begin with an idea of the database, rather than with an idea of the map, may seem somewhat abstract, but it drives a fundamentally different approach to the problem of "digitizing maps." One is not actually "digitizing maps" at all, but creating a database of many disparate elements tied to geospatial coordinates (elements that can be printed in a host of ways or exported to a host of applications); a two-dimensional map is merely one quite limited thing that can be done with this data.
As this discussion will explain, a data-centric approach rather than a map-centric approach drives the data conversion model in some very non-traditional directions, which can in turn lead to dramatically lower unit costs, higher database quality, and the ability to predict and meet schedules accurately.
The Paper
In the beginning, there was a map. The first cave dwellers scratched maps in the dirt, then on the walls of their caves, to remind themselves and to show others where they had found water or game. The earliest mariners drew portolan charts to identify rocks, channels, and routes. The great illustrated world maps that began appearing in the 15th century became more and more widely available with the advent of the printing press around 1455, and great atlases began to appear.
Mapmakers grew more and more ingenious as they struggled to portray information with more than two dimensions on their maps. A famous map of Napoleon's march into Russia, for example, showed geography, the gain or loss in numbers of soldiers, and several other factors mapped over time, all in two dimensions. But the severe limitations of depiction in two dimensions became even worse when attempting to show maps in three dimensions, because accompanying data printed on curved surfaces was even harder to read.
Given the immense investment in the skills that engineers and draftsmen had developed to draw three-dimensional objects in two dimensions, the later development of computer-aided design/computer-aided manufacturing (CAD/CAM) systems to depict three dimensions graphically, and the vast inventory of existing maps depicting the assets of utilities, telecommunications companies, pipelines, and other transport industries, it was perhaps natural that practitioners looked first to the map when they thought about creating more automated geospatial information systems (GIS).
Thus an industry developed in which the point of departure was the map, the eventual result was to be the digitizing of maps, and the users of the information were only those who needed maps. New hardware and software were developed, aimed at making better, more accessible, more error-free, and more easily revised maps. The entire GIS industry grew up around taking paper maps and digitizing them into electronic maps to reduce the cost of making new maps.
Meanwhile, the tools became available to think about storing and using data in an entirely different way from a map. In mathematics, an entire body of theory developed around n-dimensional spaces without any regard for what they might "look like" in our real world. In data processing, beginning after the Second World War, the speed and power of processing and the ability to store, retrieve, and search data began to increase exponentially. By 1990, Oracle was selling a "relational" database, first developed in the late 1970s by a small West Coast firm with early ties to the Central Intelligence Agency. The relational model radically changed the traditional hierarchical structure in ways that created vast new opportunities for associating multiple data attributes.
The value of the advances in mathematics and data processing, storage, and retrieval was never to make "better" maps. In fact, in the sense of a map as a portrait of geography on a two-dimensional surface, these advances should have been thought of as making the map itself obsolete. Instead of a two-dimensional or three-dimensional physical model of geospatial reality, it has for years been possible to carry many dimensions of data tied to specific locations, to check the data for consistency, and to access and change a database easily. It has also been possible, and this has become increasingly important in this age of enterprise-wide data solutions, to export the data to other applications. For applications actually requiring a map, the data can easily be printed or sent to a screen for viewing at any scale, and transmitted or handled easily.
To begin with an idea of the database, rather than with an idea of the map, may seem somewhat abstract, but it drives a fundamentally different approach to the problem of "digitizing maps." One is not actually "digitizing maps" at all, but creating a database of many disparate elements tied to geospatial coordinates (elements that can be printed in a host of ways or exported to a host of applications); a two-dimensional map is merely one quite limited thing that can be done with this data.
As the discussion below will explain, a data-centric approach rather than a map-centric approach drives the data conversion model in some very non-traditional directions, which can in turn lead to dramatically lower unit costs (as a result of being able to use substantially lower-paid production staff), higher database quality, and the ability to predict and meet schedules accurately. The most important of these drivers have been:
Each of these five areas will be addressed, illustrating how the data-centric, open-architecture approach developed by Apex Geospatial works today. We believe that these new systems and approaches have indeed transformed an industry and created data options, now generally untapped, that will drive business organizations in the new millennium.
This flexibility also has interesting organizational implications for a utility, because with multiple users (and perhaps funders) of conversion data, important organizational and procedural decisions may need to be made.
Next, the users of the data should work closely with the database designers on the creation of the Conversion System Design Document (CSDD)℠. This serves two purposes: it allows the great flexibility of the Oracle-based database to be achieved, and it allows the database to be designed at minimum cost.
Finally, the team should specify, in as fine detail as possible, all of the potential uses of the database, not only for maps (and in all of the different scales or forms of maps), but also for the raw data. It is vital to anticipate, insofar as possible, all of the potential needs for the data, because key managers with differing preferences, the regulatory climate, and other factors change over time. Far too often, the result of short-sighted or cost-driven conversion models is data that cannot, without intervention, be used again for other applications.
The specific components of a CSDD℠ are straightforward:
The production procedures and the DataWorks® customization specifications are completely integrated and are developed in tandem; they are commonly referred to collectively as the "road map." This "road map" commonly encompasses two domains (the distribution system and the land base) and two data worlds (logical and graphical).
Experience has shown the value of spending extensive time and effort (and hence cost) on creating the CSDD℠, since it is the fundamental basis for any successful database creation project. Because of time pressures often brought on by outside forces, long procurement cycles, and a general eagerness to get going, clients often wish to curtail this phase, or even to begin conversion without a clear data model in mind. This is a serious mistake, because recovering from a database design effort started without a clear specification is often very costly or even impossible.
Because Apex Geospatial began as a pure data conversion company with specialized software and techniques that raised the industry quality standard from 99.95% to 99.995% in non-graphic data conversion (that is, from one error in 2,000 characters to one in 20,000, or from one error a page to one error every ten pages), Apex Geospatial's focus has always been on delivering quality data. It was therefore natural, when we began to consider the data-centric approach to building a GIS database, to focus on data entry quality (not letting the needle into the haystack in the first place), because we already knew the cost calculus of prevention versus remediation.
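Accuracy percentages convert directly into an average number of characters keyed per error, which is the arithmetic behind the comparison above. The short Python sketch below makes it explicit. The 2,000-characters-per-page figure is an assumption, inferred from the paper's equating of the 99.95% level with "one error a page."

```python
# Assumption: one keyed page holds roughly 2,000 characters, the equivalence
# implied by "99.95% ... one error a page".
CHARS_PER_PAGE = 2_000


def chars_per_error(accuracy_percent: float) -> float:
    """Average number of characters keyed between errors at a given accuracy."""
    error_rate = 1.0 - accuracy_percent / 100.0
    return 1.0 / error_rate


def pages_per_error(accuracy_percent: float) -> float:
    """Average number of pages keyed between errors."""
    return chars_per_error(accuracy_percent) / CHARS_PER_PAGE


for accuracy in (99.95, 99.995):
    print(f"{accuracy}%: one error per {chars_per_error(accuracy):,.0f} characters "
          f"(about {pages_per_error(accuracy):.0f} page(s))")
```

Moving the "nines" one place to the right therefore cuts the error count tenfold. That is why small-looking percentage gains matter so much in a large conversion.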
We designed a four-step procedure to produce 99+% data quality: two different coders coded the attribute data by hand, and then two other, different keypunchers entered the data. Employing many of the techniques perfected over years of data entry (such as in-process quality control and others described below), Apex Geospatial paid extensive attention to keeping bad data out of the database.
Obviously, there are numerous ways to enter data carefully, each with its own cost profile. The brief description of the Apex Oracle Data Entry Application Interface (ODAPI™) is meant only to illustrate one way to achieve excellent accuracy.
ODAPI™, designed for double-capture of data from two independently created coding sheets, yields extremely high accuracy rates, in excess of 99%.
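ODAPI itself is proprietary, and its internals are not described here. The hypothetical Python sketch below illustrates only the general double-capture idea. It compares two independently keyed versions of the same records field by field. Records on which both captures agree are accepted, and every disagreement is queued for a person to adjudicate against the source document. The record layout, field names, and `compare_captures` function are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

Record = dict[str, str]  # hypothetical layout: field name -> keyed value


@dataclass
class Discrepancy:
    record_id: str
    field_name: str
    capture_a: Optional[str]  # None if the field is missing from capture A
    capture_b: Optional[str]  # None if the field is missing from capture B


def compare_captures(capture_a: dict[str, Record], capture_b: dict[str, Record]
                     ) -> tuple[dict[str, Record], list[Discrepancy]]:
    """Accept records on which both captures agree; list every disagreement."""
    accepted: dict[str, Record] = {}
    discrepancies: list[Discrepancy] = []
    for record_id in sorted(capture_a.keys() | capture_b.keys()):
        rec_a = capture_a.get(record_id, {})
        rec_b = capture_b.get(record_id, {})
        clean = True
        for field_name in sorted(rec_a.keys() | rec_b.keys()):
            value_a = rec_a.get(field_name)
            value_b = rec_b.get(field_name)
            # A field missing from either capture, or differing values
            # (ignoring surrounding whitespace), is sent to adjudication.
            if value_a is None or value_b is None or value_a.strip() != value_b.strip():
                discrepancies.append(Discrepancy(record_id, field_name, value_a, value_b))
                clean = False
        if clean:
            accepted[record_id] = {name: value.strip() for name, value in rec_a.items()}
    return accepted, discrepancies


# Hypothetical example: two coders keyed the same two poles.
coder_a = {"P-101": {"material": "wood", "height_ft": "40"},
           "P-102": {"material": "steel", "height_ft": "45"}}
coder_b = {"P-101": {"material": "wood ", "height_ft": "40"},
           "P-102": {"material": "steel", "height_ft": "54"}}
accepted, to_adjudicate = compare_captures(coder_a, coder_b)
print(sorted(accepted))
print(to_adjudicate)
```

The premise of double capture is that two people working independently rarely make the same mistake in the same field. Most errors therefore show up as a disagreement rather than slipping silently into the database.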
The ODAPI™ method involves the following steps:
The Apex Geospatial proprietary technology to do this is called DataWorks®. Once the data is safely and correctly in an Oracle database, it can be ported to any target system. This essentially changes the database into a strategic tool, which can be used to produce maps in many forms or for alternative purposes. With the entire GIS data model compiled in Oracle (i.e., the symbology, attribute data, network definition, graphics, and graphics placement rules), converted data has the maximum degree of flexibility.
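A few lines of code are enough to sketch the idea of holding the data model in relational tables and exporting it on demand. The example below uses Python's built-in `sqlite3` module purely as a stand-in for Oracle. The three tables (`symbology`, `feature`, `attribute`), the sample features and coordinates, and the GeoJSON export are illustrative assumptions, not the DataWorks® schema. Only point features are handled.

```python
import json
import sqlite3

# Hypothetical minimal schema: symbology, feature geometry, and attributes
# live in separate tables and are joined only when an output is needed.
SCHEMA = """
CREATE TABLE symbology (
    feature_class TEXT PRIMARY KEY,
    symbol        TEXT NOT NULL
);
CREATE TABLE feature (
    feature_id    TEXT PRIMARY KEY,
    feature_class TEXT NOT NULL REFERENCES symbology(feature_class),
    x             REAL NOT NULL,
    y             REAL NOT NULL
);
CREATE TABLE attribute (
    feature_id TEXT NOT NULL REFERENCES feature(feature_id),
    name       TEXT NOT NULL,
    value      TEXT NOT NULL,
    PRIMARY KEY (feature_id, name)
);
"""


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO symbology VALUES (?, ?)",
                     [("pole", "circle"), ("transformer", "triangle")])
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?)",
                     [("P-101", "pole", -98.49, 29.42),
                      ("T-7", "transformer", -98.48, 29.43)])
    conn.executemany("INSERT INTO attribute VALUES (?, ?, ?)",
                     [("P-101", "material", "wood"), ("T-7", "kva", "50")])
    conn.commit()
    return conn


def export_geojson(conn: sqlite3.Connection) -> dict:
    """One possible export target: a GeoJSON FeatureCollection of points."""
    features = []
    rows = conn.execute(
        "SELECT f.feature_id, f.feature_class, f.x, f.y, s.symbol "
        "FROM feature f JOIN symbology s ON s.feature_class = f.feature_class "
        "ORDER BY f.feature_id").fetchall()
    for feature_id, feature_class, x, y, symbol in rows:
        attrs = dict(conn.execute(
            "SELECT name, value FROM attribute WHERE feature_id = ?",
            (feature_id,)).fetchall())
        features.append({
            "type": "Feature",
            "id": feature_id,
            "geometry": {"type": "Point", "coordinates": [x, y]},
            "properties": {"class": feature_class, "symbol": symbol, **attrs},
        })
    return {"type": "FeatureCollection", "features": features}


print(json.dumps(export_geojson(build_demo_db()), indent=2))
```

A map is only one output. A different exporter, such as a CSV feed for an asset-management system, could read the same tables without changing the stored data.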
This also allows sophisticated checking of the consistency of database information through simple programming. Built into DataWorks®, for example, are connectivity checks for distribution networks, so that even if a mistake was on the original map, it can be corrected in the database. This also accommodates late design changes with minimal impact on cost and schedule, since no maps are printed until the database is completely rationalized: build once, use many.
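A connectivity check of the kind mentioned above can be expressed as a graph search. In this hypothetical Python sketch, the distribution network is a list of segments between node IDs. Any node that cannot be reached from a source (for example, a substation) is reported as disconnected. The node names and the `find_disconnected` function are illustrative only and do not describe the DataWorks® checks.

```python
from collections import defaultdict, deque


def find_disconnected(nodes: set[str], segments: list[tuple[str, str]],
                      sources: set[str]) -> set[str]:
    """Return every node that no path of segments connects to a source."""
    adjacency: dict[str, set[str]] = defaultdict(set)
    for a, b in segments:
        adjacency[a].add(b)  # segments are treated as undirected
        adjacency[b].add(a)
    all_nodes = set(nodes) | set(adjacency) | set(sources)

    reached = set(sources)
    queue = deque(sources)
    while queue:  # breadth-first search outward from every source
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in reached:
                reached.add(neighbor)
                queue.append(neighbor)
    return all_nodes - reached


# Hypothetical feeder: P-200/P-201 were digitized as an island, and
# transformer T-9 was captured with no connecting segment at all.
nodes = {"SUB-1", "P-101", "P-102", "T-7", "P-200", "P-201", "T-9"}
segments = [("SUB-1", "P-101"), ("P-101", "P-102"), ("P-102", "T-7"),
            ("P-200", "P-201")]
print(sorted(find_disconnected(nodes, segments, sources={"SUB-1"})))
```

Because the check runs against the database rather than against a drawing, it catches errors that were already present on the source map.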
In short, although the first purpose may have been maps, clients receive a high-quality, open-architecture database with many alternative uses: a database that can have powerful applications elsewhere in the organization. This can sometimes have important budgeting and organizational implications for the buyer, and it all results from an initial focus on the data rather than on a map. In section 4, we describe another very important component of producing data of the quality necessary to drive a data-centric approach.
When ProACT™ is applied under proper conditions, it produces such a high degree of standardization that the output from any one of our facilities is indistinguishable from that of any other. Proper scheduling and management of human resources is itself a key component of ensuring quality, and such an integrated system realizes great cost savings. Clients visiting Apex Geospatial's Indian facilities, for example, are often amazed that several hundred production workers receive regular paychecks based on their own quality-weighted productivity, with only one payroll overhead employee in the facility.
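The paper does not give the formula behind "quality-weighted productivity." The hypothetical Python sketch below shows one plausible way to weight output by quality. Each operator's completed units are credited according to the accuracy measured on the units that QA sampled, then multiplied by a piece rate. The `piece_rate`, the accuracy definition, and the field names are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class OperatorPeriod:
    operator_id: str
    units_completed: int  # units finished in the pay period
    units_sampled: int    # how many of those units QA inspected
    errors_found: int     # errors QA found in the sampled units


def sampled_accuracy(period: OperatorPeriod) -> float:
    """Share of sampled units without errors, floored at zero."""
    if period.units_sampled == 0:
        return 1.0  # hypothetical policy: unsampled work is not penalized
    return max(0.0, 1.0 - period.errors_found / period.units_sampled)


def quality_weighted_pay(period: OperatorPeriod, piece_rate: float) -> float:
    """Hypothetical rule: units completed x measured accuracy x piece rate."""
    return period.units_completed * sampled_accuracy(period) * piece_rate


team = [OperatorPeriod("op-1", 200, 20, 1), OperatorPeriod("op-2", 180, 20, 0)]
for member in team:
    print(member.operator_id, round(quality_weighted_pay(member, piece_rate=0.5), 2))
```

Under a rule like this, a careful operator who produces slightly fewer units can earn nearly as much as a faster, less accurate one. Pay then reinforces quality instead of raw speed.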
Quality control is typically done at the end of the production process, and indeed Apex Geospatial, like every other producer, has tests, procedures, and policies to make certain that the final results are correct. But Apex Geospatial's great insight was to recognize years ago that GIS database creation is a scientific process, not a craft. We established a conversion system based on this premise and borrowed an important concept from the process (assembly) industries (e.g., car manufacturers, circuit board assembly, chemicals): testing the quality of the interim product at each production step. We call this ProcessQA.
ProcessQA operates in parallel with production. Samples of output from each production step are taken daily on a real-time basis, and production continues uninterrupted while the samples are checked. If an error is discovered, corrective action is taken to recalibrate the production process. For example, if a series of errors is traced to a misunderstanding of a specification, as can happen in the early stages of a new project, all affected operators are retrained.
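The hypothetical Python sketch below pictures a daily ProcessQA pass. It groups the day's sampled units by production step and computes each step's error rate. Where a step exceeds a tolerance, it reports the most common error cause and the operators who produced that error, who become candidates for retraining. The 1% tolerance, the error codes, and the data layout are assumptions for illustration. The function only reads samples, which mirrors the point that production itself is not stopped.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleResult:
    step: str                  # production step, e.g. "digitize" or "attribute"
    operator_id: str
    error_code: Optional[str]  # None if the sampled unit passed inspection


def daily_process_qa(samples: list[SampleResult],
                     tolerance: float = 0.01) -> dict[str, dict]:
    """Return corrective actions for steps whose sampled error rate exceeds tolerance."""
    by_step: dict[str, list[SampleResult]] = defaultdict(list)
    for sample in samples:
        by_step[sample.step].append(sample)

    actions = {}
    for step, results in by_step.items():
        errors = [r for r in results if r.error_code is not None]
        error_rate = len(errors) / len(results)
        if error_rate > tolerance:
            # The dominant cause (e.g. a misread specification) drives the fix;
            # the operators who made that error are flagged for retraining.
            cause, _ = Counter(r.error_code for r in errors).most_common(1)[0]
            operators = sorted({r.operator_id for r in errors if r.error_code == cause})
            actions[step] = {"error_rate": error_rate, "cause": cause, "retrain": operators}
    return actions


# Hypothetical day: digitizing is clean; attribute coding shows a spec problem.
todays_samples = (
    [SampleResult("digitize", f"op-{i % 5}", None) for i in range(100)]
    + [SampleResult("attribute", "op-1", None) for _ in range(47)]
    + [SampleResult("attribute", "op-2", "spec-4.2"),
       SampleResult("attribute", "op-3", "spec-4.2"),
       SampleResult("attribute", "op-2", "typo")]
)
print(daily_process_qa(todays_samples))
```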
Another powerful aspect of ProcessQA is that samples are deliberately biased towards potential errors. For example, when a new specification is first introduced on the production floor, ProcessQA analysts take a disproportionately large sample from the relevant task. In this way, the sampling changes from day to day, continuously targeting areas of possible problems.
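One way to bias the sample toward likely problems is to weight each task's share of the daily sampling budget. In this hypothetical Python sketch, each task's share is proportional to its volume times a risk weight. The weight might be raised, say, for a task governed by a newly introduced specification. The largest-remainder method keeps the allocations summing to the budget. The weights, task names, and budget are assumptions.

```python
import math
import random
from typing import Optional


def allocate_samples(volumes: dict[str, int], risk_weights: dict[str, float],
                     budget: int) -> dict[str, int]:
    """Split a daily sampling budget across tasks, weighted by volume x risk."""
    scores = {task: volumes[task] * risk_weights.get(task, 1.0) for task in volumes}
    total = sum(scores.values())
    raw = {task: budget * score / total for task, score in scores.items()}
    alloc = {task: math.floor(share) for task, share in raw.items()}
    # Largest-remainder method: give leftover units to the largest fractional parts.
    leftover = budget - sum(alloc.values())
    for task in sorted(raw, key=lambda t: raw[t] - alloc[t], reverse=True)[:leftover]:
        alloc[task] += 1
    # A task cannot yield more samples than units it produced (this cap may
    # leave part of the budget unused when a task is very small).
    return {task: min(n, volumes[task]) for task, n in alloc.items()}


def draw_samples(units: dict[str, list[str]], alloc: dict[str, int],
                 seed: Optional[int] = None) -> dict[str, list[str]]:
    """Draw the allocated number of units from each task without replacement."""
    rng = random.Random(seed)
    return {task: rng.sample(units[task], k) for task, k in alloc.items()}


# Hypothetical day: a new attribute specification was issued, so that task's
# risk weight is raised to 4 while the others keep the default weight of 1.
volumes = {"digitize": 1000, "attribute": 1000, "annotation": 500}
weights = {"attribute": 4.0}
alloc = allocate_samples(volumes, weights, budget=100)
units = {task: [f"{task}-{i}" for i in range(n)] for task, n in volumes.items()}
picked = draw_samples(units, alloc, seed=42)
print(alloc)
```

Because the weights can be changed each morning, the sample naturally shifts from day to day toward wherever problems are most likely.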
When the steps above have been properly applied, the result is a highly accurate database and highly accurate maps, with accuracies of 99% or higher. The final product quality control then applies normal sampling techniques before final delivery to the client, but there cannot be quality problems at this stage if the other steps have been followed correctly.
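The paper does not specify which "normal sampling techniques" are used at final delivery. One standard option is a single attribute sampling plan: inspect n randomly chosen items and accept the lot if no more than c are defective. The Python sketch below computes the probability that such a plan accepts a lot whose true defect rate is p. It uses the binomial distribution, P(accept) = sum over k = 0..c of C(n, k) p^k (1 - p)^(n - k). The plan n = 200, c = 2 is an arbitrary example, not an Apex Geospatial figure.

```python
from math import comb


def prob_accept(n: int, c: int, p: float) -> float:
    """Probability that a single sampling plan (n, c) accepts a lot with defect rate p.

    The number of defects in a random sample of n items is modelled as
    Binomial(n, p), which assumes the lot is large relative to the sample;
    the lot is accepted when that number is at most c.
    """
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


# Hypothetical plan: inspect 200 features, accept if at most 2 are wrong.
N, C = 200, 2
for defect_rate in (0.001, 0.005, 0.01, 0.02, 0.05):
    print(f"true defect rate {defect_rate:.3f}: "
          f"P(accept) = {prob_accept(N, C, defect_rate):.3f}")
```

The printed values trace the plan's operating characteristic curve. A lot that is already near 99.9% accurate is accepted almost every time, while a lot with a 5% defect rate is almost always rejected.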
Next, you will be fanatical about data quality, because once an error gets into a database, unlike an error on an individual map, it is costly, difficult, or perhaps even impossible to get out. You will approach quality on many levels: keying in the data, testing and configuring the data, and ensuring quality during production through process quality control.
Finally, you will store the database in an open-architecture relational database system, so that you can not only port it into the current target system for today's uses but also end up with an asset that has a vast array of future uses.
Apex Geospatial Data Services, LLC
400 N. Loop 1604 East, Suite 300
San Antonio, TX 78232
Telephone (210) 404-9585
Fax (210) 490-3665
info@apexinc.com
www.apexinc.com
Conversion System Design Document (CSDD)℠, DataWorks®, ProACT™, ProductQA℠, ProcessQA℠, and ODAPI™ are registered Marks or Marks of Apex Data Services, Inc. in the United States and other countries. Other product and company names mentioned herein might be registered Marks or Marks of their respective owners.