Ernie Bergan
Bob Varner
Dave O'Hara
INTEGRATING THE ARCVIEW QUERYBUILDER WITH HIGHLY NORMALIZED ORACLE TABLES
Abstract
In the face of tremendous pressure from Congress and the American public for more accountability and in the midst of huge budget cuts, downsizing, and spending moratoria, the USDA Forest Service is in the process of migrating from legacy tabular systems to an integrated information paradigm in a distributed client-server graphical environment. Large volumes of data exist in the legacy system within the Oracle RDBMS. Forest Service application developers are attempting to migrate various legacy tabular data sources into a highly normalized tabular corporate database, simultaneously linking that data with spatial data that will reside in ArcInfo.
The complexity of meeting the business needs and enterprise requirements for a production decision support application to serve such a diverse user community, including users in intra-agency and interagency roles, as well as the American public, requires that applications developed in the Forest Service be bullet-proof, flexible, intuitive, and easy to use. A solution using the standard tools in ArcView is very compelling since ArcView meets the requirements for an easy-to-use Graphical User Interface (GUI), and USDA Forest Service personnel are trained to use the Query Builder. Query Builder can be used as it stands to formulate queries against normalized Oracle tables, but it does not meet a
“point-and-click” directive for the GUI. A user must know the Query Builder constructs, SQL, and have an intimate knowledge of the Oracle table structures and definitions – a skill mix which few users can reasonably be expected to have.
This paper discusses a USDA Forest Service solution to interface the ArcView Query Builder with highly normalized corporate data structures modeling complex dynamic segmentation event data. Example Avenue code and supporting operating system constructs necessary to implement this solution are available from the authors upon request.
Balancing national standards with local needs
The Forest Service administers lands totaling more than 125 million acres. While the basic business of the Forest Service is constant, caring for the land and serving people across the range of ecosystems on all National Forest lands requires site specific information for each National Forest. The Forest Service has been tasked with developing a software application for managing roads and trails which can provide a common user interface for Forest Service employees nationwide. This application and its supporting spatial and tabular data constructs is intended to be the common source of information about Forest Service roads and trails, allowing the Forest Service to provide consistent information to both employees and participating publics.
Nationwide about 19 attributes are deemed mandatory for Forest Service development roads. However, experience indicates that those 19 attributes may represent 20% or less of the total number of attributes required by a given Forest to responsibly manage their lands. Traditional development efforts within the Forest Service have modeled the data management for the common national attributes. This type of design allows a very efficient treatment of the national standard attributes, but creates a real integration challenge for other attributes critical to an individual National Forest.
In data models for roads offered to date, sites wishing to track attribute information beyond the national standard attributes has been presented with one of the following three alternatives to choose from. The first alternative is to alter existing tables, adding the columns necessary to store local information. While adding columns to existing tables is not too difficult, the development effort necessary to alter the supporting forms and reports is usually significant. In addition, updates to Forest Service software do occur. Additions or alterations to the application software might cause Forest-defined storage areas to be inadvertently lost. Such additions or alterations are the responsibility of the individual Forests.
The second option is to store additional information in
“other” tables rather than adding additional columns to nationally released software, requiring each site to develop the additional tables, including enough information to be able to relate to the national tables. Like the first option, this option requires considerable database development effort be handled by each Forest. In the current “right sizing” environment, many Forests do not have adequate resources to accomplish such development.
A third common solution has been to leave
“user-definable” columns in tables to provide sites with known locations to store information. The data definition and total number of columns to provide is always a challenge.A traditional event table structure
The traditional table structure for route attribute information within an
RDBMS utilizing a route control number (CN) would contain columns for: the control number, a beginning measure point (BMP) along the route, an ending measure point (EMP) along the route, and columns for all attributes of interest. The USDA Forest Service has tracked information in this type of tabular system for at least 20 years. Consider a simple example of a table which tracks three attributes; JURISDICTION , SURFACE TYPE , and MAINTENANCE LEVEL . Such a table structure, populated for two sample routes would appear as below:
CN (Character)
| BMP (Numeric)
| EMP (Numeric)
| JURISDICTION (Character)
| SURFACE (Character)
| MAINT_LEVEL (Numeric)
|
41
| 0.00
| 0.50
| FS
| ASPHALT
| 4
|
41
| 0.50
| 1.00
| FS
| AGGREGATE
| 4
|
8810
| 0.00
| 0.25
| FS
| AGGREGATE
| 3
|
8810
| 0.25
| 0.75
| BLM
| AGGREGATE
| 3
|
8810
| 0.75
| 1.50
| FS
| AGGREGATE
| 2
|
Note that this method of route attribute storage requires that all attributes for a route be stored within every record, whether or not the attribute itself actually changed at the measures. Thus, a change in a single attribute type can result in the redundant storage of all other attributes.
A generic storage paradigm
Perhaps 5 years ago, a group of Forest Service developers modeled data storage in a table structure designed to be a
“one stop shopping center” for information storage. This table structure, initially developed for the Asset Management System, assumed that all data types can be grouped into one a three types: date, character, or numeric. Information for the sample data shown would be stored in this table structure as shown below:
CN (Character)
| BMP (Numeric)
| EMP (Numeric)
| ASSET TYPE (Character)
| Date Data (Date)
| Character Data (Character)
| Numeric data (Numeric)
|
41
| 0.00
| 1.00
| JURISDICTION
|
| FS
|
|
41
| 0.00
| 0.50
| SURFACE
|
| ASPHALT
|
|
41
| 0.50
| 1.00
| SURFACE
|
| AGGREGATE
|
|
41
| 0.00
| 1.00
| MAINT_LEVEL
|
|
| 4
|
8810
| 0.00
| 0.25
| JURISDICTION
|
| FS
|
|
8810
| 0.25
| 0.75
| JURISDICTION
|
| BLM
|
|
8810
| 0.75
| 1.50
| JURISDICTION
|
| FS
|
|
8810
| 0.00
| 1.50
| SURFACE
|
| AGGREGATE
|
|
8810
| 0.00
| 0.75
| MAINT_LEVEL
|
|
| 3
|
8810
| 0.75
| 1.50
| MAINT_LEVEL
|
|
| 2
|
This table structure meets the intent of being inherently extensible. Determining whether a column is intended to be of date, character, or numeric datatype can be a challenge for personnel interested in drawing information from this table structure. In addition, early efforts at implementing this table structure attempted to include too many infrastructure entities in a single table, resulting in poor performance during data entry and retrieval.
A normalized table structure
The Travel Routes application stores the attribute information in a table defined so that any event type can be added to the table without requiring users to be outright database administrators. Very similar to the table structure first developed for the AMS system, the Travel Routes structure stores all data in a character datatype, thus eliminating the confusion over what column really contains the event values. In addition, this table is used only for the storage of roads and trails data. The column names are intended to reinforce the fact that this data design relies upon dynamic segmentation for spatial display. The example information for routes 41 and 8810 is stored as shown below:
CN (Character)
| BMP (Numeric)
| EMP (Numeric)
| LINEAR EVENT (Character)
| VALUE (Character)
|
41
| 00.00
| 00.50
| SURFACE
| ASPHALT
|
41
| 00.50
| 1.00
| SURFACE
| AGGREGATE
|
41
| 00.00
| 1.00
| JURISDICTION
| FS
|
41
| 00.00
| 1.00
| MAINT_LEVEL
| 4
|
8810
| 00.00
| 0.25
| JURISDICTION
| FS
|
8810
| 0.25
| 0.75
| JURISDICTION
| BLM
|
8810
| 0.75
| 1.50
| JURISDICTION
| FS
|
8810
| 0.00
| 1.50
| SURFACE
| AGGREGATE
|
8810
| 0.00
| 0.75
| MAINT_LEVEL
| 3
|
8810
| 0.75
| 1.50
| MAINT_LEVEL
| 2
|
"Native" ArcView QueryBuilder
If one attempts to use the native ArcView query builder with the Travel Routes table structure shown above, several significant problems arise for the end user and in terms of performance. For the development dataset consisting of slightly more than 106,000 Oracle records, the resulting VTAB was more than 60 MB in size. Generation time of this VTAB on a RISC 6000 C-10 server required more than 15 minutes of time. This time delay was deemed unacceptable for production purposes. In addition, the column names displayed within QueryBuilder are difficult to understand for casual users. Finally, generating a query which accurately resolves the inner table joins necessary to correctly retrieve the data from the VTAB is a very difficult task for infrequent users of the software.
Note that for a business specialist who understands the language of managing roads and trails, the only columns which might appear familiar are the BMP and EMP columns. While this table structure has the advantage of extensibility for the field users, it is not immediately obvious what information is actually contained in the table. More troublesome is the complexity of the query that must be built by the user in order to answer typical questions about Forest Service roads and trails.
Behind the scenes
The majority of Forest Service employees accessing roads and trails data do not access this information on a continual basis. The Travel Routes design team felt that expecting the day-to-day user to be able to remember and correctly formulate the above query structures was unreasonable. Based on that decision, the Travel Routes design team implemented a user interface which insulates the user from the highly normalized table structure used, while still providing the inherent extensibility desired.
This is accomplished in the following steps:
1) Prepare “de-normalized” view of RTE_LINEAR_EVENTS table for QueryBuilder.
2) Allow user to build query based on the de-normalized view of the date.
3) Convert user query to correct SQL query format for Oracle, including resolution of measure overlaps
4) Return results from Oracle
5) Apply ArcView query to resulting FTAB
6) Create appropriate ArcView theme based on query results
7) Offer user opportunity to store query for later retrieval.
Note that steps 1,3, 4, and 5 occur without user involvement. The user is only prompted for information required to insure theme(s) are built as desired by the user.
Converting the query provided by the user into the actual SQL necessary to accomplish the query against the normalized Travel Routes tables proved challenging. The actual queries involve inner table joins in Oracle in order to resolve the milepost overlaps and retrieve the requested information. Early in the development effort, the Travel Routes design team was told by experienced Oracle consultants that in order to write SQL logic to deal with all possible Boolean we would need to invest approximately three team months into that effort alone. The Travel Routes team decided at that point that we would attempt another path to providing a system that was inherently extensible without large programming effort yet which could be used in day-to-day Forest Service business for data retrieval.
Providing the QueryBuilder with event types
This task is accomplished by executing a query against the RTE_LINEAR_EVENTS table, seeking distinct EVENT TYPES and EVENT VALUES. Note that the list of event types and associated values displayed reflect the status of the current data stored in the RTE_LINEAR_EVENTS table. This approach is intended to provide the user with a current view of the data in their dataset, instead of a system where all possible event types and values are displayed. This will help prevent the user from inadvertently posting a query based on event types or values for which no records exist.
A user initiates the retrieval of the distinct event types and values by pressing the "Q" button shown in the upper right corner of the ArcView screen. Note that the area shown is the Deschutes National Forest in central Oregon, comprising roughly 1.25 million acres.
A user with any ArcView experience will be very comfortable defining the query desired. Note that all the standard parenthetical and Boolean controls inherent in the ArcView QueryBuilder are available to the user. The user (and ArcView) believe they are querying against a traditional denormalized database structure. The query requesting all roads in Forest Service jurisdiction with an aggregate surface type and maintenance level of 3 as specified in the QueryBuilder is shown in the screen shot below.
This process occurs without interaction by the user. This step is one of the most challenging to accomplish form the perspective of the Avenue coding required. This step includes a routine for parsing the query string captured by the QueryBuilder into applicable tokens, then converting the tokens into associated inner table join structures to correctly resolve the query (including event measure overlaps). Recall that attempting to correctly resolve the range of Boolean operations possible within Oracle alone within the constructs of our highly normalized table structure was deemed too daunting a task for our team. The samples shown in this paper return a column of information for each event type requested. At the time this paper was written, the Travel Routes team is looking into a solution which creates a VTAB which consists of one column for each distinct linear event type specified by the user. In either case, the query issued to Oracle species the AND where clause ONLY in building the SQL query. Extracting only the linear event types requested from the Oracle table greatly reduces the size of the VTAB. This reduction in VTAB generation greatly increases the thriftiness of response provided to the user. Originally our vision was to simply pull all linear event types into a VTAB, but that operation was consuming more than 20 minutes on a dedicated RISC 6000 server.
For the sample query specified above, the SQL query issued to Oracle appears below:
Because of the above-mentioned modifications, the SQL query stage is relatively fast, requiring on average under 15 seconds to begin returning results from an Oracle 7 table containing in excess of 106,000 records. Additional time is definitely required for the creation of the VTAB.
Resolving Booleans via a final query
The query string specified by the user is parsed and modified to match the column names generated via the SQL query to Oracle. Note that the user is not required to be aware of this operation. Once the column names are modified to match the column names in the FTAB, ArcView runs the query against the FTAB. Only records meeting the second query are actually joined to the original route system theme.
Obtaining information from the user to build the theme
If the SQL query returns no records, the user is so notified and the Travel Routes QueryBuilder stops. Otherwise, the user is presented with the following screen.
Originally the Travel Routes team thought that the SQL query results would be joined to the active theme. However, it became apparent in the prototyping of the QueryBuilder that it was easy to have an unintended theme active. After having invested the time to obtain the SQL query results, this can be a disappointment. Another option which could be implemented in the event that the active theme(s) join to zero records in the SQL VTAB is to inform the user of the lack of matches and provide another chance to select the intended theme. This modification may be implemented at a later date in our application.
In the current form of the Travel Routes querybuilder, we build a shapefile for all query results. We allow the user to name the shapefile. We are considering offering the user the option of seeing query results as graphics in ArcView only, since some users have stated they don't require the overhead of actual themes being built for each query result. For these users, the picture and thumbnail summary of query results are adequate.
The user is given the opportunity to place the resulting theme in a current view, or a new one. Assuming correct responses from the user in the preceeding steps, the new theme containing the query results is added to the appropriate view as shown below.
We are currently prototyping the storage of queries in an Oracle structure. These queries can be named by the user, stored, and retrieved at a later time. Thus a useful query can be defined once and users can retrieve them and execute them easily without needing to explicitly define the correct query each time. This approach has been in use within a roads database used in the western U.S. for a number of years and it has proved useful. The user can browse posted (stored) queries, store unposted queries, or delete posted queries from the Oracle tables.
Conclusions
The modifications to the standard query builder sequence defined in this paper are critical components to an easy-to-use tool for the retrieval of Forest Service roads and trails data. These modifications allow the Forest Service to store event attributes in a table structure which is extensible and flexible, allowing a single set of forms to support data entry of national standards as well as site specific information. Simultaneously, the end-user is provided a simple tool for the retrieval and display of information from this model. This should result in a far wider acceptance of the national data storage standards being defined under the Travel Routes effort, since users can store and retrieve information useful to them.
The concepts outlined in this paper will be formally tested within the Forest Service during 1997. If users accept the application and express satisfaction with the product, the Travel Routes application is slated for release nationally within the Forest Service in early 1998.
Acknowledgments
The authors wish to thank the members of the Travel Routes design team. The querybuilder process outlined in this paper is the result of ideas and efforts by all members of this team. In addition to the authors, the Travel Routes design team includes:
Debbie Caffin, Southern Region USFS
John Graney, Infrastructure project
Jack Lowe, Rocky Mountain Region USFS
Jay Mills, Infrastructure project
Laura Nance, San Bernadino National Forest USFS
Christine Nuttall, Infrastructure project
Carol Russell, Northern Region USFS
Avis Webster, Infrastructure project
Author Information
Ernie Bergan, Travel Routes Project Manager, Pacific Northwest Region Engineering
333 SW First Ave.
Portland, OR 97204
(503) 326-2559 voice
(503) 326-5745 fax
eebergan/r6pnw@fs.fed.us
Robert Varner, Pacific Northwest Region Engineering
333 SW First Ave.
Portland, OR 97204
(503) 326-6658 voice
(503) 326-5745 fax
rgvarner/r6pnw@fs.fed.us
Dave O'Hara, Rogue River National Forest
333 W. 8th St.
Medford, OR 97501
(541) 858-2351 voice
(541) 858-2220 fax
deohara/r6pnw,rogueriver@fs.fed.us
End Notes
Relational Data Base Management System
The Agency or party with jurisdiction over the management of a road or trail.
A description of the running surface of the route.
The relative level of maintenance effort a route will receive. In Forest Service roads usage, a level 1 road is essentially closed and is maintained to protect resources only. Level 2 roads are normally associated with harvest activities and are maintained for high-clearance vehicles. Level 3 through 5 roads meet Highway Safety Act standards.