Tim O'Brien
Dante Fernandez
Guy Theriot
Our society revolves around legal hard copy documents for everything from land ownership to E-911 tracking reports. The computer era has helped us become less dependent on paper products by developing special programs that track different types of information. The database design process for digital information has forced users to consider future uses for their information in applications that are to be shared with others. Being creative in how we track and structure these databases will enable us to retrieve information quickly at a later date. With digital scanning technology, users can convert paper documents into digital form and then hot link or retrieve these documents using key search words. Although this way of thinking is not new, the resources necessary to populate GIS attributes can be extensive. Planning how information is gathered and stored in a database is usually a key factor in the success of a GIS or IS program. The IS databases of tomorrow will require a GIS user to populate only very basic information about a point, line, polygon or event. The arrival of new GIS and document retrieval programs is lowering the level of effort and cost needed to find and retrieve OCR-scanned documents. This will link the past hard copy paper age with today's digital world of information with remarkable speed and accuracy. End users will now swim in a world of unstructured digital information.
This paper will illustrate how advanced technologies, along with an innovative turnkey project methodology, can be applied as a practical solution for GIS users who need to search and retrieve large amounts of scanned or electronic text documents. Utilizing APIs, DDEs and Excalibur Technologies' EFS product, users of ArcView 3.0 and Map Objects can perform "fuzzy" searches to retrieve documents from a wide variety of sources. Document retrieval from the ArcView GUI environment is performed by transparently submitting a query composed of GIS database attributes against a neural network-based binary pattern index.
Interactive GIS & Document Retrieval Systems: The Unstructured G(IS) Databases of Tomorrow
Many of us have experienced the Data Manager's nightmare: Worn record documents that suffer from crease lines, frayed edges held together by scotch tape, mold due to exposure to moisture, or just plain old overuse and neglect. Then there are the documents that must mean something to somebody but seemingly don't have a home due to inadequate and outdated filing systems.
More and more, organizations large and small are being forced to deal with the question of how to integrate warehouses full of old documents into their current database systems. Not only must the method use a format that is compatible with current as well as future hardware and software specifications, but it must also be cost-effective. One such organization is the Bernalillo County Public Works Department (BCPWD), in Albuquerque, New Mexico.
Currently within the BCPWD, hard-copy documents are stored in bulk, where they are difficult to manage and constantly at risk of being damaged by fire or flood. Requests for documentation by various county offices, organizations and private citizens require the physical handling and retrieval of documents from storage. Standardization of filing systems is also difficult, as each new administration institutes new data management techniques, and each new system requires additional training of personnel.
The BCPWD requested a solution to their document management problem. This solution should take advantage of in-house GIS hardware and software to integrate scanned documents into their current database management system. The solution should also provide an easy technique for scanning, storing and retrieving $20 million in large and small format documents. These documents should be retrievable via a binary fuzzy logic search engine integrated within a GIS. Raster-to-vector and Optical Character Recognition (OCR) text files should also be derived from the solution, thus reducing drafting time and providing a database upon which other queries could be executed against the entire contents of a document, regardless of its size. Stored at one location, the scanned images should be accessible to other off-site departments via the internet. Such a solution, the BCPWD has estimated, would save up to $200 thousand per year by reducing costs in personnel time and resources. The solution is ARTView.
ARTView is a method for locating and retrieving scanned, digital documents by selecting an associated geographic feature in ArcView, such as roads, parcels, or utilities. ARTView consists of an interface between Esri's ArcView and Excalibur's Electronic Filing System (EFS). Both companies are leaders in their market areas of GIS and document storage and retrieval, respectively. ARTView is a system designed for the end user who must maintain the integrity of scanned documents in a user-friendly and secure environment. The software goes well beyond the current concept of ArcView's "hotlink." Documents are retrieved using existing attributes as a search clue string. This string is matched against the entire contents of all documents using a fuzzy logic binary search pattern. Documents containing a match, or near-match, are ranked and returned to the user screen. This eliminates the tedious task of hard-coding directory paths to a specified image.
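The ranking step described above can be illustrated with a minimal Python sketch. It does not reproduce Excalibur's pattern recognition engine, whose internals are not described in this paper; it substitutes a simple character-trigram overlap score as a stand-in for fuzzy matching. The function names, the 0.5 threshold, the file names and the sample OCR text are all hypothetical.

```python
def trigrams(text):
    """Return the set of 3-character substrings of the normalized text."""
    s = " ".join(text.lower().split())  # lowercase, collapse whitespace
    return {s[i:i + 3] for i in range(len(s) - 2)}


def score(clue, document):
    """Fraction of the clue's trigrams that also occur in the document."""
    clue_grams = trigrams(clue)
    if not clue_grams:
        return 0.0
    return len(clue_grams & trigrams(document)) / len(clue_grams)


def rank(clue, documents, threshold=0.5):
    """Return (score, name) pairs at or above the threshold, best first."""
    hits = [(score(clue, text), name) for name, text in documents.items()]
    hits = [hit for hit in hits if hit[0] >= threshold]
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))


# Hypothetical OCR output; note the misread "l" -> "1" in "Bou1evard".
documents = {
    "plat_017.txt": "Right of way plat for Coors Bou1evard SW",
    "permit_233.txt": "Drainage permit for Isleta Blvd",
}

for hit_score, name in rank("Coors Boulevard", documents):
    print(f"{hit_score:.2f}  {name}")
```

Because the score counts shared fragments rather than requiring an exact string match, the plat is still returned even though the OCR pass misread one character of the street name.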
ARTView Architecture
ARTView is designed to bring powerful search and storage capabilities to ArcView users in both the Windows and UNIX environments. Its scalable architecture benefits the "stand-alone" system as well as internet-based projects designed in Esri's Internet Map Server or Map Objects environments. ARTView can be used as a front-end document management system or as a component of a more robust system built in popular development frameworks such as Visual Basic and Visual C++. Designed as an extension module to ArcView, ARTView takes advantage of all the mapping, attribute storage and programming capabilities of ArcView. ARTView also utilizes the Pattern Recognition Technology and File Room Metaphor of Excalibur EFS.
Previous and current document management systems are limited in what they can provide. Retrieval of documents from physical filerooms is labor intensive and does not provide a digital backup. Earlier document management systems required the tedious database attribution of key search words into a structured format. Additionally, these key words would need to be added to the GIS structured database in order to create a hotlink to documents. These documents, known as image BLOBs, are merely used as a static visual tool and cannot be queried by the system for specific data other than the structured key search words.
Reusable document information is limited to visual use only. Individual elements of the document cannot easily be located or extracted for use in other projects. The technical limitations of OCR require most applications to perform exact text matching.
The bottom line is that previous retrieval systems are incomplete, requiring time-consuming database attribution. Additionally, they are inefficient in their use of technical resources.
Today's IS managers can link current GIS databases with yesterday's hard copy documents using a variety of advancements in software technology. OCR technology, coupled with binary code patterns, reduces scanned data to its simplest form. Binary pattern recognition code compensates for the inaccuracies of the OCR process. The pattern recognition process allows the accurate search of OCR text without any need for correction, clean-up, or re-keying of data. Indexing each character in the scanned image allows the user to search on every element within each document of the IS database. In addition, images that were once BLOBs can now be queried locally or remotely, via the internet, based on their entire contents rather than only structured key words. This method results in a more efficient use of both human and system resources.
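To make the idea of indexing full document contents concrete, the following Python sketch builds an inverted index of character trigrams over uncorrected OCR text and queries it with a clue. This is an assumption-laden stand-in, not the binary pattern index used by Excalibur EFS: the n-gram technique, the function names and the sample documents are all hypothetical and chosen only to show why a search can tolerate OCR errors without clean-up.

```python
from collections import Counter, defaultdict


def ngrams(text, n=3):
    """List the n-character substrings of the normalized text."""
    s = " ".join(text.lower().split())  # lowercase, collapse whitespace
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def build_index(documents, n=3):
    """Map every n-gram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for gram in ngrams(text, n):
            index[gram].add(doc_id)
    return index


def search(index, clue, n=3):
    """Return (doc_id, matched fraction of clue n-grams), best match first."""
    grams = set(ngrams(clue, n))
    votes = Counter()
    for gram in grams:
        for doc_id in index.get(gram, ()):
            votes[doc_id] += 1
    return [(doc_id, count / len(grams)) for doc_id, count in votes.most_common()]


# Hypothetical raw OCR text, left uncorrected ("m" read as "rn", "l" as "1").
documents = {
    "deed_0042": "Easernent granted to the County of Bernalil1o",
    "asbuilt_0917": "Storm drain as-built drawing",
}

index = build_index(documents)
for doc_id, fraction in search(index, "Bernalillo County easement"):
    print(doc_id, round(fraction, 2))
```

Every fragment of every document is searchable, and a clue still finds the deed because most of its fragments survive the OCR errors intact.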
Esri's ArcView and Excalibur's EFS are both recognized as industry leaders in creating cost-effective, off-the-shelf software products. Both products provide an environment for inter-application communication (IAC), dynamic data exchange (DDE) and application programming interface (API) calls that allow the programs to exchange information. ARTView uses a combination of these methods to pass tabular information as a "clue" between ArcView and Excalibur EFS. The Adaptive Pattern Recognition Process (APRP) fuzzy logic search is then run against the OCR-processed documents, checking the binary index patterns for a match regardless of spelling errors. A hit list is returned from EFS, with each document ranked according to its similarity to the search clue's binary pattern. The GUI allows the user to select a compressed image, OCR-processed text file, or fileroom hierarchy structure to be viewed.
The user begins with a View of a geographic area containing various features, such as roads, parcels, streams and utilities. Attribute Tables can also be added to the screen, allowing the user to view tabular information about each feature. The data in the Attribute Table may exist as an ArcView table or in an RDBMS, such as MS-Access or Oracle, linked to ArcView via the internet or an intranet.
To begin a search of scanned digital imagery associated with the spatial features in the View window, the user selects a feature item. This causes the feature in the View window and the associated record within the Attribute Table to be highlighted. With both the View and an associated Attribute Table visible, the user has the option to search the scanned image library based on the feature item(s) selected in the View window, or to query the Attribute Table to locate a specific set of attributes.
With the proper attribute record(s) selected, the user may now begin a search of the scanned image library using ARTView.
A series of dialogue boxes appears, allowing further modification of or additions to the search parameters. A user may specify one or several table columns from which specific record values will be used. The user may then modify the spelling, case or order of those values. Further, entirely new strings or integers may be added to the search parameters. When the search parameters have been established, the user may click the "Submit" button in the dialogue box and begin the search.
Clicking the Submit button executes the Excalibur fuzzy logic search engine. This interface will soon be changed with the development of the Dialog Designer extension. A hit list screen is returned, allowing the user to view scanned documents or OCR text files.
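The way a search clue is assembled from a selected attribute record can be sketched as follows. This is an illustration in Python rather than ARTView's actual code; the function name `build_clue`, its parameters, and the field names and values in the sample record are hypothetical.

```python
def build_clue(record, columns, overrides=None, extra_terms=()):
    """Assemble a search clue string from a selected attribute record.

    record      -- dict of column name -> value for the selected feature
    columns     -- columns chosen by the user, in the desired order
    overrides   -- user-edited values (spelling, case) keyed by column
    extra_terms -- additional strings or integers typed in by the user
    """
    overrides = overrides or {}
    parts = []
    for column in columns:
        value = overrides.get(column, record.get(column))
        # Skip missing or blank values so they do not pad the clue.
        if value is not None and str(value).strip():
            parts.append(str(value).strip())
    parts.extend(str(term) for term in extra_terms)
    return " ".join(parts)


# Hypothetical attribute record for a selected parcel.
record = {"PARCEL_ID": 101705, "OWNER": "GARCIA, MARIA", "STREET": "ISLETA BLVD SW"}

clue = build_clue(
    record,
    columns=["OWNER", "STREET"],
    overrides={"OWNER": "Maria Garcia"},  # user changed case and word order
    extra_terms=["easement", 1987],       # user added a string and an integer
)
print(clue)
```

The resulting string is what would be handed to the search engine as the clue; no directory path or document identifier is involved at any point.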
Any database developer or professional charged with the care and maintenance of large stores of hard copy documents can use GIS and document retrieval software packages. Together they provide the user with an unstructured spatial search engine linked to their document management systems. Assessor offices will find this type of interactive GIS extremely useful, since their world revolves around both mapping and the many legal documents associated with each parcel. Police and fire departments could use retrieval systems to quickly assess an emergency situation by reviewing previous hard copy documents from their databases. If an entity uses maps and hard copy documents, the ability to link these systems together will prove invaluable.
Advancements in hardware and software technologies have made it possible to implement an integrated GIS and document management solution at a reasonable cost. These systems remain complex and require careful analysis of your organization's needs mapped to product capabilities. Also critical to the success of the project is an experienced systems integrator willing to accept sole responsibility for a successful implementation and post-delivery support.
For more information on linking your GIS to document archiving and retrieval systems, contact:
Tim O'Brien
GIS Manager
8401 Monitor Drive N.E
Albuquerque, NM, 87108-5058
505-797-2410
Dante Fernandez
GIS Technician
301 Harvard Drive S.E., #65
Albuquerque, NM, 87106
505-268-5116
Guy Theriot
President Adaptive Retrieval Technologies
7301 Jefferson NE Suite E
Albuquerque, NM 87109
505-343-6117