Shuqin Jin and Jim Freeman

Data Integration and Automation in Polk County, Florida:

Transportation Applications and Lessons Learned


ABSTRACT

The purpose of this project is to integrate AutoCAD, Intergraph, dBase, Q&A, TIGER, and ArcInfo data using ArcInfo and conflation programs. The objectives of this study are as follows: (1) to research available data and databases; (2) to evaluate and collect useful data; (3) to convert all non-ArcInfo data to ArcInfo data; (4) to check and correct errors using conflation programs; (5) to create road maps of Polk County; (6) to perform analysis and compile statistics using ArcInfo, Dynseg, Network, and databases; (7) to summarize the lessons learned during this study; (8) to customize applications for the nontechnical community by creating a user-friendly interface.


INTRODUCTION

Polk County is located in central Florida. It is the eighth largest Florida county in population, and the fourth largest in size, with 2,048 square miles. The County has about 30 divisions and 1,700 employees.

The County has many different kinds of transportation data and databases: AutoCAD, Intergraph, dBase, Q&A, TIGER, and ArcInfo. They all have different accuracies and naming conventions. The goals of this project are to integrate the best qualities of each data set into an improved common core feature data set, to develop a better database for future data sharing, and to set up countywide standards. We researched the available transportation data and databases, collected the useful data, converted non-ArcInfo data to ArcInfo data, checked errors and accuracies, assembled the results, and built some applications. We wrote AMLs and MENUs to customize our applications for the nontechnical community. We are still working on this project, but we have already learned a lot. We would like to share what we learned so that others can avoid similar problems and design better GIS systems in the future.


APPLICATION DEVELOPMENTS

1. Data and database research

Data and database research is important for any project. It can save you a lot of time and make your project go much faster. Our available data are in AutoCAD, Intergraph, dBase, Q&A, TIGER, and ArcInfo formats. We first studied all the data structures, graphic information, and databases. AutoCAD and Intergraph data have only graphic information and do not contain tabular data. Q&A and dBase are databases, and there is a lot of information associated with them. TIGER and ArcInfo data have both graphic information and tabular data. We made a data inventory from what we found.

2. Data evaluation and collection

After we did our data and database research, we evaluated all the data. We checked data formats, accuracies, status, sources, etc. Not all of the data are useful for our transportation applications. We collected only the useful information, as follows: Intergraph data in DXF format, dBase and Q&A databases as ASCII files, TIGER files, and ArcInfo coverages.

3. Data conversion and conflation

All the data collected can be converted into ArcInfo formats, as long as you know how to do it. We tried several methods and different options to make sure that no data or information would be lost. After a lot of testing, we converted the DXF and TIGER files into ArcInfo format and the ASCII files into INFO databases. We copied over road-numbers (a unique number for each road within the county) from the databases using road-names as "links". This saved us a lot of time and automated our data processing. One of the most important things to remember is the INFO database definition. If the definition is not proper, it might cause problems for queries and displays. For example, you cannot do numeric calculations on "character" type data.
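The road-name "link" described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the AML or INFO procedure used in this project: the field names (road_name, road_number, length) and the functions normalize, copy_road_numbers, and total_length are hypothetical. The sketch also shows why the item definition matters: a length stored as "character" data has to be cast to a number before it can be summed.

```python
def normalize(name):
    """Collapse case and spacing so 'Main  St' and 'MAIN ST' compare equal."""
    return " ".join(name.upper().split())


def copy_road_numbers(arcs, road_table):
    """Attach road_number to each arc by matching on road_name.

    arcs and road_table are lists of dicts (hypothetical layout).
    Returns (matched, unmatched) so that failed links can be reviewed by hand.
    """
    # If a road name occurs twice in the table, the later record wins here.
    lookup = {normalize(r["road_name"]): r["road_number"] for r in road_table}
    matched, unmatched = [], []
    for arc in arcs:
        number = lookup.get(normalize(arc["road_name"]))
        if number is None:
            unmatched.append(arc)
        else:
            matched.append({**arc, "road_number": number})
    return matched, unmatched


def total_length(records, field="length"):
    """Sum a field stored as character data by casting each value to float."""
    return sum(float(r[field]) for r in records)
```

Road names are not guaranteed to be unique or consistently spelled, so the unmatched list (and any duplicated names) would still need manual checking.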

Conflation is a relatively new concept. It is used to align two coverages and transfer the arc attributes from one coverage to the corresponding arcs in the second coverage. Used properly, conflation can automate much of a project. For example, address ranges, street names, and other valuable census data can be "conflated", or transferred, from TIGER data to ArcInfo data. We tried both the ArcInfo "conflation" tool and our own conflation programs, and we are still working on this step. So far we have transferred a lot of valuable information using both the tool and our programs.
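The attribute-transfer half of conflation can be illustrated with a deliberately simplified Python sketch. It is not the ArcInfo "conflation" tool or our own programs: it assumes the two coverages are already aligned, represents each arc by its two end points only, and pairs arcs by nearest midpoint within a tolerance. The names midpoint and conflate, the dict layout, and the tolerance parameter are all hypothetical.

```python
import math


def midpoint(arc):
    """Midpoint of the straight line between an arc's two end points."""
    (x1, y1), (x2, y2) = arc["start"], arc["end"]
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def conflate(source_arcs, target_arcs, fields, tolerance):
    """Copy the named fields from the nearest source arc onto each target arc.

    A target arc with no source midpoint within `tolerance` is returned
    unchanged, so it can be picked out for manual review.
    """
    results = []
    for target in target_arcs:
        tx, ty = midpoint(target)
        best, best_dist = None, tolerance
        for source in source_arcs:
            sx, sy = midpoint(source)
            dist = math.hypot(sx - tx, sy - ty)
            if dist <= best_dist:
                best, best_dist = source, dist
        updated = dict(target)
        if best is not None:
            for field in fields:
                updated[field] = best.get(field)
        results.append(updated)
    return results
```

A real conflation program would also compare arc shape, direction, and connectivity; midpoint distance alone can mismatch parallel or closely spaced roads.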

4. Data assembling and error checking

Our next step is to check for all possible errors and assemble the data into larger coverages if necessary. The possible errors include label errors, node errors, intersections, dangles, unclosed polygons, slivers, accuracy problems, etc. All errors found should be corrected before the data are used. One of our final products is a complete county road map, which will be used by different county agencies as the standard road map. We do not have a complete county road map yet, and we are still working on it.
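As one example of such a check, dangles can be found by counting how many arc ends meet at each node. The Python sketch below is an illustration only, not the ArcInfo error-checking commands; the from_node and to_node field names and the function find_dangles are assumptions.

```python
from collections import Counter


def find_dangles(arcs):
    """Return the sorted IDs of nodes touched by exactly one arc end."""
    degree = Counter()
    for arc in arcs:
        degree[arc["from_node"]] += 1
        degree[arc["to_node"]] += 1
    return sorted(node for node, count in degree.items() if count == 1)
```

Not every dangle is an error. A dead-end road legitimately ends at a dangling node, so the flagged nodes still need to be reviewed rather than corrected automatically.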

5. Applications

We chose one township as our "pilot" study area and developed our applications for this township. One application is a MENU for updating road information. We modeled it on AutoCAD's MENUs because most of our technicians are used to them, which makes our MENU easier to understand and use. It includes most edit functions and has some customized functions, such as township zooming, previous mapextent zooming, selection by road-number or road-name, express editing of dxf-layers, colored background road maps (the same as colored AutoCAD road maps), automated node editing, etc. This MENU will be used for future road information maintenance.

Another application is a query and display MENU for all road information. The main functions of this MENU are as follows: color map display, query of various road information (road-number, road-name, road-class, paved or unpaved surface, date), statistical information and reports, routing (using Dynamic Segmentation), networking (using Network), etc. The "network" function can show you the best or shortest route through your stops and the best order in which to visit them on your "tour". The results can be saved to a "stops file" (this file defines the locations of the stops, and includes items defining the properties of the stops and the output destinations):

STREET_ID  IN_ORDER  ROUTE_ID  OUT_ORDER ......
     3        1           2        3
     6        2           2        2
     9        3           3        1

For routing and networking, the necessary information (or items) has to be added to the coverages before any application can be run. This MENU will be used to retrieve information from coverages and databases and view it on screen.
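The idea behind the "best order of your stops" can be illustrated with a small Python sketch. It is not how the ArcInfo Network module works internally, which the text above does not describe. It assumes a hypothetical road graph stored as nested dicts of segment lengths, uses Dijkstra's algorithm for the distance between two stops, and tries every visiting order, which is practical only for a handful of stops. The in_order and out_order keys mirror the IN_ORDER and OUT_ORDER items of the stops file; the function names are assumptions.

```python
import heapq
from itertools import permutations


def shortest_distance(graph, start, goal):
    """Dijkstra's algorithm over graph = {node: {neighbor: length}}."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, length in graph.get(node, {}).items():
            candidate = dist + length
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return float("inf")  # goal cannot be reached from start


def best_stop_order(graph, stops):
    """Brute-force the visiting order with the smallest total distance.

    Returns (rows, total), where each row records a stop's position in the
    input list (in_order) and in the best tour found (out_order).
    """
    best_order, best_total = tuple(stops), float("inf")
    for order in permutations(stops):
        total = sum(shortest_distance(graph, a, b)
                    for a, b in zip(order, order[1:]))
        if total < best_total:
            best_order, best_total = order, total
    rows = [{"stop": s,
             "in_order": stops.index(s) + 1,
             "out_order": best_order.index(s) + 1} for s in stops]
    return rows, best_total
```

The sketch treats the tour as open (it does not return to the first stop) and assumes each stop appears only once in the input list.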

The MENUs and AMLs developed will automate our data processing and data maintenance. We are developing a new MENU to plot what you viewed on your screen, and we will put them together into one MENU (DISPLAY-QUERY-PLOT MENU). It will be a kind of "all-in-one" MENU for end users.

We developed all our applications for our "pilot" study area. A "pilot" study is very important for almost all applications and projects, because it can test your database design, test your automation plan, and gain support for the implementation.


LESSONS LEARNED

Although we have not finished our project and applications yet, we have learned a lot. The following are some of the lessons we learned:

1. System and database design

System and database design should be done up front. Otherwise, the project might take much more time. The basic steps are needs assessment, conceptual and logical design, physical design, automation plan, "pilot" study, and final implementation. We used a lot of existing data because we could copy over much of the information and because the data were there before we began using ArcInfo. In addition, there had been no system and database design. This created some problems and made the project take longer to complete.

2. Data maintenance

Data maintenance is as important as system and database design. If you do not have a plan for data maintenance, your project might fail or your GIS system might become useless as time goes on.

3. Data standards

Data standards are especially important for large organizations. If you have a large organization and do not have data standards, data sharing and exchange might be a significant problem in the future.

4. Data conversion

Data conversion should be avoided as much as possible, because conversion always introduces some problems. If you have to convert data, study your available data thoroughly before conversion.

5. System management

System management is important too. Data should be centralized, but not the management. Most of the time, more "voices" are better than one. Just remember not to go to the other extreme: too many discussions and no solutions! It is difficult to say what the best management strategy is, but having one is important.

6. "Pilot" study

A "pilot" study is always useful. It might take some time, but it will save you time and make your system better in the long run. It will not only test your database design (functionality, performance, flexibility), test your automation plan (procedures, validity, and system), and gain support for the full GIS implementation, but will also test the applications design, the hardware and software configurations, and the organizational and administrative procedures.

7. Objectives selection

Objectives selection should address significant organization/agency issues and GIS goals. There are two kinds of objectives: vertical and horizontal. Both should be narrow in scope. Your applications will be developed properly if you have the right objectives. Sometimes, applications are limited only by your imagination.

8. Others

There are some other issues that may be important for your system designs or applications. To name a few: teamwork and cooperation (especially for government agencies), quality control, target audience, credibility, etc.


CONCLUSIONS

Data integration and automation are key issues for any GIS project or application. We did only transportation applications, but the principles are the same for other applications. With the right system design, the necessary resources, and consistent effort, the goals should eventually be reached. We are still working on our project and applications, and we expect to accomplish our objectives and goals in the near future.


FUTURE APPLICATIONS

Our future work is to create a data dictionary and to document what we did. The data dictionary will store all data names and structures. The documentation will be for future reference.


ACKNOWLEDGMENTS

The authors would like to thank Greg Forsthoefel of the MIS Division for Internet assistance; Ralph Meller, Doug Fraker, and John Brooks of the Transportation Division for providing data and for their cooperation; Xiaohui Wang of the Planning Division for proofreading; and the City of Lakeland and the Polk County Property Appraiser for suggestions and information.


Shuqin Jin, GIS Analyst
Jim Freeman, Director
Polk County - MIS
Drawer AS04
P.O. Box 9005
Bartow, FL 33831-9005
Telephone: (941) 534-4350
Fax: (941) 534-4398
E-mail: shuqin@geol.niu.edu