Soil maps and manuscripts have been developed and published for most of Missouri's 114 counties. The maps and the interpretive information in the manuscripts represent a substantial investment (each county represents about 12 to 18 person-years of work) and are very valuable resources for natural resource planning, research, and development. Several agencies and other interests would like access to this information in an electronic format that is consistent with Geographic Information System (GIS) technology.
The overall task of this project is to develop individual county soil coverages for the state of Missouri in digital format, using available Geographic Information System (GIS) and scanning technology, including raster-to-vector and optical character recognition software.
Introduction

In October of 1995, the Missouri Department of Natural Resources (MDNR) contracted with the Center for Agricultural Resource and Environmental Systems (CARES) at the University of Missouri-Columbia (MU) to develop a statewide digital soils layer. The contract is designed to develop pilot projects, evaluate existing source materials, and define the standards and methodology to be used throughout the digitization of statewide soils information.

As the number of Geographic Information Systems (GIS) and GIS users continues to grow at an astonishing rate, so does the demand for digital map data. A growing number of GIS managers are recognizing the cost-effectiveness and quality of scanning as opposed to conventional table digitization. For the soils data, a digitizing table would be slow, and the quality of the data produced would vary from person to person. Scanning and automatic vectorization can produce consistent, high-quality line work with a minimum of user interaction. Hitachi's Tracer and Recognizer produces high-quality vectors from raster data much more consistently than manual digitizing.

Background

One of the first and most important tasks of the soils project was inventorying the source materials. Several days were spent with Natural Resources Conservation Service (NRCS) personnel at the state office in Columbia, Missouri, going through each county's drawer. The end result was a comprehensive catalogue of soil map source materials documenting scale, number of quads, format, and completeness. A majority of Missouri's counties have mylar text and line separates as source materials, on either full or one-third quad maps at 1:24,000 scale, with a few counties mapped at 1:20,000 scale.

Standards for the soils map digitization are being developed over the course of the contract period and will be finalized at the end of the contract.
The pilot projects will be used to determine the final standards. CARES will strive to meet Soil Survey Geographic (SSURGO) standards in its effort to define the final standards for this project. Two counties and a watershed were selected as pilot projects. Stoddard County, located in the southeast part of the state, was selected because its source materials are full 1:24,000 quadrangle maps. Bates County, located along the Kansas state line in the west-central part of the state, was selected because its source materials are one-third 1:24,000 quadrangle maps. Loose Creek, located in Osage County and covering about 45,000 acres, was selected as the watershed.

Scanning Hardware and Software

Scanning will be accomplished using a Contex FSS 8200 E-size scanner connected to an IBM RISC/6000 workstation. The software used to scan the map separates is Contex's CADImage/Scan. Available scanner resolutions range from 50 dpi to 800 dpi. CADImage presently supports more than fifty industry-standard file formats. One of the key features of the software is its threshold setting: all gray tones lower than the threshold value entered are represented as white pixels, and all gray tones over the threshold are represented as black pixels. This works as a thinning and filtering process. It has been determined that, in general, the mylar line separates will be scanned at 300 dpi and the mylar text separates at 500 dpi.

Raster-to-Vector / Optical Character Recognition (OCR)

Information was collected on software companies offering raster-to-vector and Optical Character Recognition (OCR) software. The field was narrowed to two vendors that met the project's requirements and could supply a demo copy for testing on sample areas. Both software programs are PC based. No IBM AIX versions able to do OCR were found.
There is, however, UNIX OCR software that runs on SUN workstations, but it was not considered because CARES uses only IBM RISC/6000 workstations. Hitachi's Tracer and Recognizer was chosen over Ideal's I/Vector. Both were evaluated as very good, but Hitachi's software was selected because of the additional editing functions available with AutoCAD. The Hitachi software is an application that runs in conjunction with AutoCAD and converts scanned maps into vectors and text that can be used by a GIS. Tracer provides tools for semi-automatic conversion, while Recognizer provides automatic recognition of both graphics and text. Once vectorization is complete, the file is converted to DXF format. The DXF map is then converted to an ARC coverage using the DXFARC command. Various AML programs and menus have been written to aid the editing and error checking needed to get the coverage into its final form.

Summary

The use of raster-to-vector and OCR software has greatly reduced the time needed to convert paper and mylar maps to usable GIS coverages. Given good input separates, ARC coverages can be generated that are typically of better quality than the same product digitized by hand. Selecting appropriate conversion parameters is a must; a slight variation can greatly change the results. Scanning densities of 300 dpi for line work and 500 dpi for text appear to offer the best repeatable results from scanned data sets. High-quality GIS hardware and software are now available at realistic prices. Many GIS managers are now realizing the fiscal advantages and superior results of raster-to-vector and OCR conversion software over conventional table digitization. Given a well-researched approach to database design, scanning will take less time and result in higher-quality data for the creation of many GIS layers.
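
The threshold setting and the scanning densities discussed in this paper can be illustrated with a minimal Python sketch. It is not the CADImage/Scan implementation, and it rests on several assumptions: tone values are taken to measure darkness on a 0 to 255 range, a tone exactly equal to the threshold is treated as black, and the function names, the sample patch, and the threshold of 128 are hypothetical. The second function simply works out the ground distance covered by one scanned pixel on a 1:24,000 map.

```python
def threshold_tones(tones, threshold):
    """Convert rows of gray-tone values to bilevel pixels (0 = white, 1 = black).

    Assumptions (not from the scanner documentation): tone values measure
    darkness (0 = blank mylar, 255 = solid ink), and a tone exactly equal
    to the threshold is treated as black.
    """
    return [[1 if tone >= threshold else 0 for tone in row] for row in tones]


def ground_size_m(scale_denominator, dpi):
    """Ground distance in meters covered by one scanned pixel."""
    inches_per_pixel = 1.0 / dpi                       # map inches per pixel
    return scale_denominator * inches_per_pixel * 0.0254  # 0.0254 m per inch


# Hypothetical 3 x 5 patch: a faint smudge (tones 35-45) beside a dark soil line.
patch = [
    [0, 40, 200, 230, 10],
    [5, 35, 210, 220, 0],
    [0, 45, 190, 240, 15],
]

# A mid-range threshold keeps the soil line and drops the smudge.
bilevel = threshold_tones(patch, threshold=128)
for row in bilevel:
    print("".join("#" if px else "." for px in row))

# Ground size of one pixel at the two scanning densities used in the project.
for label, dpi in (("line separates", 300), ("text separates", 500)):
    size = ground_size_m(24000, dpi)
    print(f"{label}: {dpi} dpi, about {size:.2f} m per pixel at 1:24,000")
```

With this patch, lowering the threshold to 40 would start turning parts of the smudge into black pixels, which is one way a slight change in a conversion parameter can noticeably change the line work handed to the vectorizer.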