For over 20 years, the Oregon State Historic Preservation Office (OSHPO) has collected information on historic inventories and surveys conducted in Oregon. These reports were shelved, and their locations were graphically displayed on a menagerie of maps collected over the years. The map set now displays almost 18,000 sites, inventories, or surveys. While some of the recent data exists in digital form, for most of it the only other record is the original report submitted to SHPO. Performing a geographic search of this data meant reviewing the map collection at the SHPO office, which was neither expedient nor convenient given the number and needs of the data's users. One of those users, the Bureau of Land Management, is working with SHPO to automate this information. While the most comprehensive plan would be to start from the original reports to create a digital data set, converting the current "Paper GIS" to digital form allows a quicker, more cost-effective alternative, though not without its own cost.
SHPO stands for State Historic Preservation Officer or Office. Oregon's SHPO is within the Oregon State Parks and Recreation Department (OPRD). This is part of a national preservation partnership led by the National Park Service (NPS). Under the terms of the National Historic Preservation Act of 1966, each SHPO is delegated the duties of managing and administering programs for the identification, protection, and management of the state's significant historic and prehistoric resources. They perform this task by collecting and maintaining information about these resources.
The data that OSHPO currently holds is in two forms: report files and maps.
The report files contain survey, testing, and excavation reports, paper maps, and site forms. These documents were created to fulfill the required inventory or survey of an area prior to some type of land activity by various agencies, or were created by individuals interested in historic or prehistoric issues. Currently there are roughly 18,000 reports and about 45,000 site forms.
Most of this information is gathered from state and federal agencies that perform surveys of areas preceding a major geophysical, biophysical or ownership change. The information in the survey is typically documented with a physical description and may include a report map. For the last 23 years, using the physical description or a report map, a geographic depiction of the area(s) has been hand drawn on a single set of maps kept at SHPO.
The map set is composed primarily of USGS 7.5-minute series topographic quadrangles, although other production maps have been used, such as USGS 15-minute series, USGS vicinity maps, and BLM surface maps. Each report has been translated onto the maps to display its physical location and/or extent, defined as closely as the scale used in each instance allows. All of these maps contain hand-drawn information.
The method used to display the uniqueness of each survey was to arbitrarily choose a pen color, or a color plus a hatch or fill pattern, different from the data already present on the map. Features were drawn with pens, highlighters, permanent markers, ballpoint pens, and pencils. Solid points were also used.
After many years of compiling data onto this single set of maps, it is evident that various areas have been surveyed many times. This has resulted in a layering of surveys on the maps with differing colors and patterns. For each entity recorded on the maps, a reference number written on the map identifies the report that the area or point came from, although some entries were simply symbols and not referenced.
The initial inquiry focused on how to scan and catalog documents and then share them with others: specifically, documents whose digital versions would become obsolete due to changes to the originals, creating a revolving cycle of scanning the altered documents and redistributing the images.
After investigation and discussion, it was found that a large collection of geographic data was being recorded on a static medium that was becoming more difficult to use as the amount of data increased. The maps contain so much data that in places they are difficult to decipher. The map set was, in a sense, a "Paper GIS."
A recommendation was put forward to create a true digital geographic data set from the map set. The proposal was built on developing the data in a phased approach, where at the end of each phase there is a usable and complete product. The phased approach also allows work to start and stop easily as funding or time permits.
The proposed phases include inventorying the maps to discover what is at hand, scanning the paper maps to create digital images, geo-referencing or geo-rectifying the digital images, and digitizing the data on the images.
The first step was to perform an inventory of the documents to be scanned. For the inventory, a database was created that recorded the name, series, scale, southeast corner coordinate, drawer number, miscellaneous notes on the condition of the document, and the amount of data on the map. This created a clear summary of the documents, including the number, type, and relative amount of data on each map.
The amount of data on a map is expressed as activity and is represented by a five-class scale: Empty, Light, Moderate, Heavy, and Very Heavy. The activity and map size are displayed in Figure 1.0-SHPO Map Intensity.
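The inventory database can be sketched as a single table; the column names below are illustrative only, since the actual MS-Access design is not documented here (SQLite stands in for Access):

```python
import sqlite3

# Hypothetical inventory schema; the real MS-Access field names are not
# documented in the project description, so these columns are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE map_inventory (
        name      TEXT NOT NULL,  -- map sheet name
        series    TEXT,           -- e.g. 'USGS 7.5 minute'
        scale     INTEGER,        -- scale denominator, e.g. 24000
        se_corner TEXT,           -- southeast corner coordinate
        drawer_no INTEGER,        -- storage drawer number
        notes     TEXT,           -- condition of the document
        activity  TEXT CHECK (activity IN
            ('Empty', 'Light', 'Moderate', 'Heavy', 'Very Heavy'))
    )
""")
conn.execute(
    "INSERT INTO map_inventory VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("Salem West", "USGS 7.5 minute", 24000, "44122-A1", 3,
     "good condition", "Moderate"),  # hypothetical example record
)

# Summarize the collection the way the inventory phase did: count per class.
for activity, count in conn.execute(
        "SELECT activity, COUNT(*) FROM map_inventory GROUP BY activity"):
    print(activity, count)  # prints: Moderate 1
```

The CHECK constraint plays the role of the five-class activity scale, rejecting any value outside the allowed categories.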
The result revealed a total of 972 maps, fewer than first theorized, due to the wide usage of 15-minute series maps. Other tabular results are displayed in the table below, Figure 2.0-Map Types and Counts. This was a new view of the map and data set for SHPO, as the map set had not been categorized this way before.
Map Types and Counts

Activity   | 7.5 min | 15 min | 30 min | other | count
blank      |      70 |      6 |      1 |       |    77
light      |     429 |     59 |      7 |     1 |   496
moderate   |     147 |    107 |     11 |       |   265
heavy      |      39 |     64 |      2 |       |   105
very heavy |         |     26 |        |     1 |    27
total      |     685 |    262 |     21 |     2 |   970

Figure 2.0 Map Types and Counts
It is difficult to fully grasp the complexity of the maps without seeing examples of them. Some maps are very legible and easy to read; others are difficult at best and close to incomprehensible. One fictitious example from the SHPO map set follows. This document was created to resemble an actual document for the purpose of testing digital capture.
The scanning of the original documents fulfilled many requirements for SHPO. It created a copy of the “one of a kind” documents and protected the original investment from loss (a money and time savings). Scanning also allows the maps to be duplicated easily and shared with others outside the SHPO office.
Scanning documents is not without drawbacks and risk. The original documents can become damaged. Of the 972 documents that were scanned, three were damaged in the scanning process, though not destroyed. The risk of damage increases as the condition of the documents decreases. All three of the documents that were damaged in the scanning process had prior damage or excessive wear.
Another risk inherent in scanning documents is that once the original becomes digital it is easy to make copies, thus making it difficult to control access, especially when the data is distributed to many different locations.
A potential complexity in scanning documents is maintaining validity between the original and digital versions of a document. The decision was made that once the documents were scanned, they would become static documents that no longer accept changes or additions. This eliminates the cost of rescanning documents and the eventual confusion and loss of data involved in trying to maintain a current digital dataset.
Making the map set static also helps in structuring a phased approach for the project.
The scanning process was relatively straightforward. Since the originals were scanned off site, contained sensitive information, and were very difficult to replace, it was agreed that they would be hand couriered between locations.
A user form was created in MS-Access that captured the same elements as the inventory. This created a "blind" quality control between the inventory and the scanning, making sure that every document in the database was scanned and that every scanned document was recorded in the database.
In addition to the items captured from the inventory, the scanning phase captured the production date of the map, the revision date, and the revision type. The user form also reported a unique file name for each image to the scanner operator. The unique name was based on the Ohio code and scale of each map.
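The unique-name rule can be sketched as a small helper. The exact naming convention is not documented in the project description, so the format below is illustrative only:

```python
def image_filename(ohio_code: str, scale: int, ext: str = "tif") -> str:
    """Build a unique image file name from a map's Ohio code and scale.

    Hypothetical convention: the project's actual format is not documented,
    but joining the two identifying elements is enough to make each map
    sheet's file name unique.
    """
    # Strip characters that are awkward in file names.
    code = ohio_code.replace("-", "").lower()
    return f"{code}_{scale}.{ext}"

print(image_filename("44122-A1", 24000))  # 44122a1_24000.tif
```

Because the Ohio code identifies the quadrangle and the scale distinguishes the series, the pair is unique per sheet, which is presumably why those two elements were chosen.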
Scanning was performed on a Contex scanner with a 36” throat, attached to a Gateway PC running WideImage software.
Each image was scanned at 300 DPI with its own 256-color indexed palette. Color indexing was chosen for file size as well as clarity of the images. A higher DPI was not found to be beneficial.
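The file-size benefit of 256-color indexing follows from the arithmetic alone. A sketch, assuming for illustration a 22 x 27 inch sheet (the sheet size is an assumption, not from the source):

```python
DPI = 300
width_in, height_in = 22, 27          # assumed sheet size, for illustration
pixels = (width_in * DPI) * (height_in * DPI)

rgb_bytes = pixels * 3                # 24-bit color: 3 bytes per pixel
indexed_bytes = pixels * 1 + 256 * 3  # 8-bit index + 256-entry RGB palette

print(f"{rgb_bytes / 1e6:.1f} MB vs {indexed_bytes / 1e6:.1f} MB")
# prints: 160.4 MB vs 53.5 MB
```

An 8-bit indexed image is roughly a third the size of full 24-bit color before any compression, while 256 well-chosen colors are still plenty for hand-drawn linework on a topographic base.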
Scanning produced 42 GB of uncompressed TIFF imagery. The images were compressed to MrSID at a 20:1 ratio, to approximately 5 GB, which allowed easy distribution and delivery on CDs.
The scanning workflow diagram can be seen in Figure 4.0.
Differences in the way maps were categorized for activity can be seen in Figure 3.0-SHPO Map Intensity Check during Scanning. In some cases an activity level was not entered. Total time for scanning the images was 5 weeks.
The rectification of the images allows them to be used in conjunction with other geographic data as well as each other.
Geo-rectification was performed using ER Mapper 6.2, matching the images to the existing USGS 7.5-minute 24K DRG data set.
Some of the images required additional processing to correct issues created in the construction of the original document. A small number of documents were composed of portions of multiple maps taped together to form one document. Unfortunately, this construction created errors in the form of gaps of missing data or duplicated data. These documents, as images, were split at the point of adjacency, rectified separately, and then recombined correctly.
Quality control of the rectified images was performed by independently checking each image against the USGS DRG data and a 7.5 Min index data set.
The final images after the rectification process were again delivered as TIFF and MrSID images on CDs.
Process workflow for the rectification process can be seen in Figure 5.0-SHPO Scanning Project Workflow Process Overview.
Total time for the rectification process was 26 weeks with personnel working only part time.
Currently we are working on the digitization portion of the project. Digitizing the data from the map images makes the data easier to use and read, enables integration with other data sets, allows queries using attributes or geographic extents, and allows joins or relates to other data sets, extending the data. In addition, having digital data makes it possible to add, create, and alter the data.
The data model was created with two portions, data quality and data content.
Data quality attributes attempt to document the quality of the data, where the data came from, and who manipulated it.
The attributes are as follows, with a description for each and the rules for attribution during this phase of the project.
Data Source Text (source)
This is the name of the source of the data. If the data came from the State Historic Preservation Office set of paper maps, the source should be identified as the Ohio code and scale denominator of that source map. Otherwise, name the source that the data came from.
SHPO digitization project rule: Mandatory

Problem Code (prob_code)
This indicates whether there is a known problem with either the spatial representation of the entity (an entity, also called a spatial feature, being something depicted spatially as a point, line, or polygon) or the attributes that describe that entity.
Codes are:
None: no known problem
Problem: a problem exists
Resolved: there was a problem but it has been resolved
SHPO digitization project rule: Mandatory
Problem Description Text (prob_text)
If a problem was noted in the “Problem Code” field, then this field is used to describe what that problem is or was.
SHPO digitization project rule: Mandatory if Problem code not equal to ‘None’.
Edit Date (edit_date)
The date of original data capture or the last date corrections were made or problems resolved.
SHPO digitization project rule: Mandatory

Editor Name (editperson)
The First and Last name of the person that performed the last edit/data capture. (Last Name, First Name)
SHPO digitization project rule: Mandatory
QC Date (qc_date)
The date that the feature was last reviewed for quality control purposes.
SHPO digitization project rule: Mandatory only after QC step.
QC Person (qc_person)
The First and Last name of the person that performed the last quality control review of the feature.
SHPO digitization project rule: Mandatory only after QC step.
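The rules above lend themselves to a simple automated check. A sketch using the data model's field names (the validator itself is hypothetical, not part of the project):

```python
def check_quality_attrs(f: dict, after_qc: bool = False) -> list:
    """Return rule violations for one feature's data-quality attributes,
    following the project rules described above."""
    errors = []
    # source, edit_date, and editperson are always mandatory.
    for field in ("source", "edit_date", "editperson"):
        if not f.get(field):
            errors.append(f"{field} is mandatory")
    # prob_code is mandatory and limited to three values.
    code = f.get("prob_code")
    if code not in ("None", "Problem", "Resolved"):
        errors.append("prob_code must be 'None', 'Problem', or 'Resolved'")
    elif code != "None" and not f.get("prob_text"):
        errors.append("prob_text is mandatory when prob_code is not 'None'")
    # QC fields become mandatory only after the QC step.
    if after_qc:
        for field in ("qc_date", "qc_person"):
            if not f.get(field):
                errors.append(f"{field} is mandatory after the QC step")
    return errors

# Hypothetical feature: everything filled in except the problem description.
feature = {"source": "44122a1_24000", "prob_code": "Problem",
           "edit_date": "2002-06-15", "editperson": "Doe, Jane"}
print(check_quality_attrs(feature))
# prints: ["prob_text is mandatory when prob_code is not 'None'"]
```

Running such a check before data leaves a digitizer's geodatabase would catch most attribution gaps at capture time rather than during QC.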
Data content attributes attempt to cover what the data is and what it relates to.
The attributes are as follows.
Spatial Feature ID
A system assigned identification number for each spatial feature.
SHPO digitization project rule: Mandatory system requirement
Trinomial (trinomial)
Also known as the Smithsonian Number, this is the official number assigned by the State Historic Preservation Office for some, but not all, sites. (e.g. 35 DS 00000)
SHPO digitization project rule: May be null if not present on original map.
Oregon Historic Site Number (or_number)
The SHPO numbering convention for some historic sites, (e.g. Orxxxx).
SHPO digitization project rule: May be null if not present on original map. Look for notes similar to example.
Agency Site Number (ag_site_no)
An agency assigned number for cultural resources sites (e.g. 0505040604SI).
SHPO digitization project rule: May be null if not present on original map. Look for notes similar to example.
Agency Project Number (ag_proj_no)
An agency assigned number for cultural resource project areas. (e.g. 05050400000P)
SHPO digitization project rule: May be null if not present on original map. Look for notes similar to example.
Agency Survey Number (ag_surv_no)
An agency assigned number for cultural resource survey areas. (e.g. 05050400212S)
SHPO digitization project rule: May be null if not present on original map. Look for notes similar to example.
SHPO Bibliographic Number (or_biblio)
Sequential number, starting with 1, assigned to all materials received by the SHPO.
SHPO digitization project rule: May be null if not present on the original map. Look for notes similar to example. This should be the most common attribute collected.

Contractor Project Number (contprojno)
A number assigned by contractors for cultural resource projects. Format is varied.
SHPO digitization project rule: May be null if not present on the original map. These may be represented as simple numbers or code numbers.
Age Class Name (age_class)
The general time period that the feature is related to (e.g. prehistoric, historic). Only allowable words are “Prehistoric,” “Historic,” “Both,” or “Other.”
SHPO digitization project rule: May be left null.
Entity Type Code (type_code)
The type of cultural resource entity (geographic area) that is being captured and described. Allowable codes are:
Project: A cultural resource project area. A project area may include areas that are not surveyed or do not contain any sites (point, line, polygon).
Survey: A cultural resource survey area. A survey area is an area that has been physically examined for the existence of cultural resources (point, line, polygon).
Site: A cultural resource area that contains 10 or more artifacts or contains a feature. Sites smaller than 2.5 acres in extent are treated as points.
Isolate: An isolated cultural resource location consisting of less than 10 artifacts and containing no features (points).
Other: Cultural resource locations that cannot be described by any of the other codes (points).
SHPO digitization project rule: May be left null.
Cultural Resource Entity Comments (comments)
Any notes or comments about the information contained on the SHPO maps that cannot be described elsewhere. An example would be where an isolate has been identified as being rock art. No other place exists to capture that information so record it in this comments field.
SHPO digitization project rule: Mandatory if Trinomial and Oregon Historic Site Number and SHPO Bibliographic Number left null. May be null if not needed. Look for notes related to feature.
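The conditional rule on the comments field can be sketched the same way (hypothetical helper; field names follow the data model):

```python
def comments_required(f: dict) -> bool:
    """Comments are mandatory only when Trinomial, Oregon Historic Site
    Number, and SHPO Bibliographic Number are all null, so that every
    feature carries at least one piece of identifying information."""
    return not (f.get("trinomial") or f.get("or_number") or f.get("or_biblio"))

# An isolate with no identifying numbers must carry a comment
# (e.g. "rock art") so the feature remains traceable:
iso = {"trinomial": None, "or_number": None, "or_biblio": None}
print(comments_required(iso))                 # True

# A feature tied to a bibliographic number needs no comment
# (10452 is a made-up example value):
print(comments_required({"or_biblio": 10452}))  # False
```

The rule guarantees that no captured feature is completely anonymous: it is either linked to a report or site number, or it carries a descriptive comment.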
Currently we are collecting point and polygon data types only. Linear features are collected as a buffered area of the feature. This is because the information on these maps does not have accuracy, only reliability: any feature can reliably be found within a distance of the point or within the area of the polygon. More precise information would need to be gathered from the individual report for the given area or point.
The current buffer for linear features is 30 ft, applied to all linear features. Prescribing a more appropriate buffer would require reviewing the report each feature came from, a task far beyond the current scope of the project.
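The "reliable within a distance" idea behind the 30 ft buffer can be illustrated with plain point-to-segment geometry. A self-contained sketch, assuming coordinates in feet (the project itself buffers features in ArcGIS rather than in code):

```python
import math

BUFFER_FT = 30.0  # flat 30 ft buffer applied to all linear features

def dist_to_segment(px, py, ax, ay, bx, by):
    """Distance from point (px, py) to the segment (ax, ay)-(bx, by)."""
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter along the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def within_buffer(px, py, line, buffer_ft=BUFFER_FT):
    """True if the point lies inside the buffered linear feature."""
    return any(dist_to_segment(px, py, *a, *b) <= buffer_ft
               for a, b in zip(line, line[1:]))

trail = [(0.0, 0.0), (100.0, 0.0), (100.0, 80.0)]  # hypothetical feature
print(within_buffer(50.0, 20.0, trail))  # True: 20 ft from the first leg
print(within_buffer(50.0, 60.0, trail))  # False: 50 ft from the nearest leg
```

A point is accepted if it falls within 30 ft of any segment of the line, which is exactly the claim the buffered polygon makes: the real feature is somewhere inside that band.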
The current process for data capture starts with an empty copy of the personal geodatabase containing the current data model. Then, in ArcGIS 8, digitization begins with polygons followed by points and lines, or vice versa. The method was developed this way so that during collection the digitizer becomes familiar with the map image and can review the first capture while capturing the next set of features. See Figure 6.0-OSHPO Data Capture.
In the current environment we have multiple data collectors, digitizers, and editors. To avoid conflicts between users, access to the data is controlled using file system permissions and by following the data collection guidelines set out for the project.
Each digitizer has his or her own personal geodatabase where new data is added or digitized. Periodically, usually weekly, the data is moved from the digitizer's personal geodatabase to the master geodatabase, which is held in a separate directory. The data is made available to the digitizers with read permissions. When an area is to be edited or data added to the master, write permission is granted to the individual performing the task.
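That weekly hand-off can be sketched with ordinary files and POSIX-style permissions. This is an illustration only; the project actually uses personal geodatabase files on a shared file system, and all names below are made up:

```python
import os
import shutil
import stat
import tempfile

def promote_to_master(digitizer_gdb: str, master_dir: str) -> str:
    """Move a digitizer's weekly file into the master directory and
    mark it read-only so digitizers can reference but not change it."""
    dest = os.path.join(master_dir, os.path.basename(digitizer_gdb))
    shutil.move(digitizer_gdb, dest)
    # Owner, group, and others may read; nobody may write.
    os.chmod(dest, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return dest

# Demonstration with temporary stand-in files.
work = tempfile.mkdtemp()
master = os.path.join(work, "master")
os.makedirs(master)
gdb = os.path.join(work, "digitizer1.mdb")  # hypothetical weekly file
open(gdb, "w").close()

dest = promote_to_master(gdb, master)
print(oct(stat.S_IMODE(os.stat(dest).st_mode)))  # 0o444
```

Granting temporary write permission for a master edit would then be a matter of restoring the owner-write bit for the one person performing the task and removing it again afterward.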
Currently the data flow control applies not only to the data being collected from the scanned map set but also to the new data being created at the SHPO office. This may be changed in the future to allow the SHPO more access to make changes to the digitized data. See Figure 7.0- OSHPO Data Control.
This project relies heavily on correct and complete initial capture of the data from the map images. From the outset it was understood that there would be a high rate of error in the data. The QC procedures are set up to require the least follow-up possible without eliminating the step altogether.
The first step of QC is to check that the required attributes are filled in, using a set of prewritten queries. Second, the data is reviewed against the map images, typically by a different editor from the one who performed the initial capture. During this review, features are selected and attributes rechecked. See Figure 8.0 below.
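Prewritten completeness queries of the kind described might look like the following sketch, using SQLite as a stand-in for the personal geodatabase (table and field names follow the data model above; the records are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE features (
    source TEXT, prob_code TEXT, prob_text TEXT,
    edit_date TEXT, editperson TEXT)""")
conn.executemany(
    "INSERT INTO features VALUES (?, ?, ?, ?, ?)",
    [("44122a1_24000", "None", None, "2002-06-15", "Doe, Jane"),
     (None, "Problem", None, "2002-06-16", "Doe, Jane")],  # two rule breaks
)

# Each query counts the features violating one mandatory-attribute rule.
qc_queries = {
    "missing source":
        "SELECT COUNT(*) FROM features WHERE source IS NULL",
    "missing problem description":
        "SELECT COUNT(*) FROM features "
        "WHERE prob_code <> 'None' AND prob_text IS NULL",
}
for label, sql in qc_queries.items():
    print(label, conn.execute(sql).fetchone()[0])
# prints: missing source 1
#         missing problem description 1
```

Because each query isolates one rule, a nonzero count points the reviewer directly at the class of fix needed before the visual review against the map images begins.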
The largest problem in digitization is interpreting the data on the maps and reading the authors' handwriting. Documenting the methodology for interpreting the map set is a difficult and continuous task, and the exceptions to the rules are too numerous to include in this presentation of the project. In short, the methodology document covers capture of donut polygons and multi-part polygons, and methods of identifying and attributing symbols.
Areas that have multiple polygons, with polygons layered on top of one another, have been a persistent problem where features can be missed. Other features that are easy to miss are those drawn with highlighters that have faded over time and those where colors have bled together.
Some other difficulties with the data involve features that cross map boundaries. At times a feature does not line up between maps, due either to scale or to inaccurate drawing. Cases of polygon symbolization have also been problematic.
Generalization, further inspection, isolation or committee can resolve some problems, but others must be presented to the author of the map set for definitive resolution.
Currently, problems are being grouped by map and submitted to the author for resolution. It is expected that even the author may not be able to resolve all of the problems.
Figure 9.0-Digitization Completion Time shows the current total time estimate for completing the digitization. The table is based on the actual times required to digitize images, excluding the first 20 images.
The QC process only allows 15-20 minutes per image.
Digitization Completion Time

Count | Activity   | 7.5 min | 15 min | 30 min | other | time allotted | total time (hours)
77    | Blank      |      70 |      6 |      1 |       |          0.36 |   27.72
497   | Light      |     429 |     60 |      7 |     1 |          1.14 |  566.58
265   | Moderate   |     147 |    107 |     11 |       |          4.67 | 1237.55
105   | Heavy      |      39 |     64 |      2 |       |          8.08 |  848.40
27    | Very Heavy |         |     26 |        |     1 |         15.33 |  413.91
971   | Total      |     685 |    263 |     21 |     2 |               | 3094.16
      | Days       |         |        |        |       |               |  386.77

Figure 9.0 Digitization Completion Time
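The estimate reduces to simple arithmetic: per-class hours times map counts, summed, then divided by an eight-hour workday. A sketch (taking the Moderate per-image figure as 4.67 hours, which reproduces the row and grand totals):

```python
# (activity class, map count, hours per image)
classes = [
    ("Blank",       77,  0.36),
    ("Light",      497,  1.14),
    ("Moderate",   265,  4.67),
    ("Heavy",      105,  8.08),
    ("Very Heavy",  27, 15.33),
]

total_hours = sum(count * hours for _, count, hours in classes)
print(round(total_hours, 2))      # 3094.16 hours
print(round(total_hours / 8, 2))  # 386.77 eight-hour days
```

Keeping the per-class figures separate makes it easy to re-run the estimate as actual digitizing times accumulate for each activity class.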
Using the database constructed during the inventory phase, project status can be monitored throughout. The image in Figure 10.0-SHPO Digitization was made by linking the tracking database to the SHPO index, a data set symbolizing the maps in the set.
During each phase of the project the database is expanded to capture needed data elements. During the rectification phase, the added elements were the name of the person performing the rectification and the date it was completed. During digitization, the added elements again included the name of the person, the date completed, and the amount of time required.
A user form was created to help update the table for each phase. The use of a separate table for tracking information allows that information to be used separately from the geographic data.
During the beginning of the digitization phase there was an active search for tools that would help with the tasks. Some of the tools we have found helpful already existed in Arc8; others we have had to create.
The buffer wizard in Arc8 is handy and easy to follow, and it retains its settings from prior use. A nice feature of the buffer tool is that the user may specify the feature class to which the new feature is added.
The trace tool is also used where applicable. This tool allows a user to easily replicate or follow an existing line into a new feature.
The geodatabase model has also been found to be quite powerful. We have used domains and subtypes, where valid, to control attribution. For example, domains make sure that required fields are populated as prescribed by the data model, and digitizers' names and feature types are selected from subtypes.
Where the domains and subtypes leave off, we have created application extensions to further help with the attribution of the data.
The Donut tool is an example of a prewritten tool that has been put to use. It was downloaded from the Esri web site. With the tool an editor is able to create a polygon with an excluded area inside.
The Add Image tool is one that we have created. This tool will add an image to the view after the user clicks on the area of interest. This reduces the amount of time spent looking for adjacent images.
The future for this data is wide open. There have been tests linking the data to an experimental SHPO database, which would contain specific comprehensive information from a report received. This would make an incredible amount of information available to potential users. A second approach would be to scan the reports and serve them as single documents hot linked to the geographic data. A user would be able to instantly get the report for any selected feature or features in the GIS. Either of these approaches is equally possible once the data is completely digital.
There is also the probability of a future phase combining the new SHPO digital data with data sets originating in BLM. This would improve the reliability of the data and provide an even more useful product.