Bruce A. Ralston

Web-Based Public Health Analysis with MapObjects and SAS

ABSTRACT. Public health officials must look at a variety of factors that contribute to society's health and welfare. Two useful tools for exploring the relationships between variables are Web-based GIS and statistical analysis. In this paper we discuss how to use MapObjects along with SAS to build a family of public health Web sites. Using the thematic mapping and geocoding features of MapObjects, we are able to implement area-based and point-based analyses.


Public health officials must deal with many types of information at several different scales. These include area-based socioeconomic data, such as census data for counties, tracts, block groups and blocks, and case level data for disease incidence, births, deaths and other aspects of individual health. In addition to these differences in scale, there are differences in confidentiality. Census data for areas are in the public domain, while case level data must be protected in order to ensure confidentiality. A further attribute of case level data is that there can be very large numbers of observations.

There are many potential users of public health data. These include researchers, public health officials at national, state, and local levels, and interested citizens. Making such data available in useful formats while protecting confidentiality is a complex task. Case level data must often be aggregated for presentation, census data should be shown at an appropriate scale, and statistical analyses are often required. Building and maintaining a web site to accomplish these tasks has been the work of the Community Health Research Group at the University of Tennessee. Headed by Dr. Sandra Putnam, the Health in Tennessee, or HIT, project consists of a cluster of web sites that give users access to public health information. During the past two years, this author, along with several graduate students, has worked to integrate web-based GIS using MapObjects with statistical analysis using SAS. Together, the efforts of the team under Dr. Putnam's direction have increased the democratization of data by giving anyone with access to the web access to various analyses of public health in Tennessee.

HIT OVERVIEW

The Health in Tennessee, or HIT, web site actually consists of several web sites that deal with different databases (Figure 1). In addition, the analyses available can take place on different platforms. In particular, the statistical analyses are based on SAS programs that run on UNIX machines, while the GIS analyses are accomplished using MapObjects and Visual Basic on an NT platform.

[Figure 1: P6021.gif]

The HIT site has links to several different kinds of information, including fixed tabular data (tables), reports in pdf format (reports), the Statistical Profile of Tennessee (SPOT), and TNKIDS--a site dedicated to children's health issues. In addition, there is a link to the GIS page (HIT MapMaker).

Initially, HIT MapMaker was developed as a standalone web site. That is, it did not interact with the statistical analyses of SPOT or TNKIDS. A fixed database of census data and the results of selected analyses from SPOT were housed on the NT machine that hosted the MapObjects program that generated the maps for web based GIS. This relationship is shown in Figure 2.

[Figure 2: P6022.gif]

SPOT consists of several different types of health data, including information on mortality, births, population, hospitals, nursing homes, schools, and motor vehicle crashes. In addition, census data tables and survey results are available (Figure 3).

[Figure 3: P6023.gif]

Rather than provide fixed views of the data in the SPOT databases, procedures were developed that allow the user to query the data available for customized analysis. The user fills out a form and then submits it to the server. A PERL script reads the form and generates a SAS program for producing the requested tables and charts on the fly. SAS executes and the results are presented to the user. In this way, each user can create a custom set of outputs on demand. Figure 4 shows a typical SPOT form, this one for births by the characteristics of the mother.

[Figure 4: P6024.gif]
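The form-to-program step described above can be sketched in a few lines. This is only an illustration, written in Python rather than the PERL used on the HIT site, and it is not the project's actual script. The form field names, the variable whitelist, the SAS data set name spot.births, and the choice of PROC FREQ are all assumptions made for the example.

```python
from string import Template

# Hypothetical whitelist mapping form field values to SAS variable names.
# Only values listed here can ever reach the generated program.
ALLOWED_VARS = {
    "mother_age": "MAGE",
    "mother_education": "MEDUC",
    "county": "COUNTY",
}

# Hypothetical program skeleton: a two-way frequency table for one year.
SAS_TEMPLATE = Template("""\
proc freq data=$dataset;
    where year = $year;
    tables $row * $col / nopercent;
run;
""")


def build_sas_program(form: dict) -> str:
    """Turn submitted form fields into the text of a SAS program."""
    row = ALLOWED_VARS[form["row"]]   # KeyError rejects unknown choices
    col = ALLOWED_VARS[form["col"]]
    year = int(form["year"])          # ValueError rejects non-numeric input
    return SAS_TEMPLATE.substitute(
        dataset="spot.births",        # assumed data set name
        year=year,
        row=row,
        col=col,
    )


if __name__ == "__main__":
    print(build_sas_program({"row": "mother_age", "col": "county", "year": "1998"}))
```

Looking up each form value in a whitelist, rather than pasting it into the program text, keeps arbitrary user input out of the generated SAS code. The generated text would then be written to a file and handed to SAS by a system call.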

A similar approach is used with TNKIDS' SCORE data. A user creates a request and the proper tables or charts are created. The requests are all processed by a PERL script that generates the necessary SAS program, executes a system call to SAS, and then pushes the results back to the user. Figure 5 shows a typical chart output from the SCORE dataset.

[Figure 5: P6025.gif]

HIT MAPMAKER

Our original web based GIS used programs written in VB using Esri's MapObjects ActiveX control. The mapping program would receive inputs from the client and generate the requested map. The inputs were based on a form, similar to those used in HIT. The user could choose the level of analysis, the thematic variables, the methods of thematic classification (e.g., quantiles, equal intervals or standard deviations), ways of overlaying themes (shading, hatching and dot density mapping), and several background layers. The resultant web page consisted of the map, tools for pan, zoom, and identify, a legend, and an identify window with hyperlinks to related pages for selected features (Figure 6).

[Figure 6: P6026.gif]
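The three classification methods offered on the form differ only in how the class breaks are computed. The sketch below shows one common way to compute each set of breaks. It is written in Python for illustration and makes no claim about how MapObjects computes its own breaks. The function names and the choice of one standard deviation on either side of the mean are assumptions.

```python
import statistics
from bisect import bisect_right


def equal_interval_breaks(values, k):
    """Split the data range into k classes of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def quantile_breaks(values, k):
    """Choose breaks so each of the k classes holds about the same number of areas."""
    return statistics.quantiles(values, n=k, method="inclusive")


def std_dev_breaks(values):
    """Place breaks at the mean and one standard deviation on either side."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [mean - sd, mean, mean + sd]


def classify(value, breaks):
    """Return the class index of a value: the number of breaks at or below it."""
    return bisect_right(breaks, value)


if __name__ == "__main__":
    rates = [0, 10, 20, 30, 40]  # made-up values, one per area
    breaks = equal_interval_breaks(rates, 4)
    print(breaks, [classify(r, breaks) for r in rates])
```

Each area's class index would then select the shade or hatch pattern used to draw it.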

This approach allowed users to generate maps from a fixed data set and interact with those maps. However, the HIT MapMaker site was not well integrated with the other parts of the HIT family, most notably SPOT and TNKIDS. Further, the use of fixed data sets did not fit the design philosophy behind HIT. In particular, we want to give the users the opportunity to build custom data sets for mapping, just as we allow them to do custom statistical analyses. This has been our most recent goal in expanding the web based GIS capabilities of our site (along with expanding the number and types of databases and adding different levels of spatial resolution).

The first hurdle was passing data from our UNIX site to our NT site. When a user fills out a form for running a SAS analysis, we allow him/her to send the results from the UNIX system to the NT server. This is done with a behind-the-scenes ftp operation. The PERL script that generates the requested SAS program also generates a system call to ftp. This sends the results of the SAS analysis to the mapping server. The script also calls a program on the mapping server (the NT box) to start up the interactive mapping. An interface similar to that used in the fixed data set is presented to the user.
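The transfer step can be sketched as follows. This is a Python illustration using the standard ftplib module, not the PERL system call used on the HIT site. The host name, credentials, and file names are placeholders, and the call that starts the mapping program on the NT server is not shown.

```python
from ftplib import FTP


def send_results(ftp, local_path, remote_name):
    """Upload one results file over an already-open FTP connection."""
    with open(local_path, "rb") as fh:
        ftp.storbinary(f"STOR {remote_name}", fh)


def transfer(host, user, password, local_path, remote_name):
    """Open a connection to the mapping server, upload the file, and close."""
    # host, user, and password are placeholders supplied by the caller.
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        send_results(ftp, local_path, remote_name)
```

Keeping the upload in its own function, separate from the connection handling, lets the transfer be exercised with a stand-in connection object when no server is available.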

While the ftp of a dataset generated from SAS to our VB/MO program worked, it was not entirely satisfactory. The main problem is that SPOT and TNKIDS consist of many databases and users might want to map the results based on queries from several of these databases. This presented a few challenges. One approach we considered was to assign each user a temporary workspace. All the data tables a user generated during a session would be assigned to that workspace. When ready, the user could choose to map the results. However, we felt this approach presented some security and file management problems. Instead, we decided that the user could generate a data set and that the data set would stay current so long as the user's session was current. When the user left, the data set would be deleted.
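The session rule we settled on can be sketched as a small bookkeeping class. This is a Python illustration only. The class name, the session identifiers, and the assumption that a newly generated data set replaces the session's previous one are not taken from the HIT implementation.

```python
import os
import tempfile


class SessionDataSets:
    """Keep at most one data set file per session and delete it when the session ends."""

    def __init__(self, root):
        self.root = root    # directory that holds the temporary files
        self.paths = {}     # session id -> path of that session's data set

    def store(self, session_id, data: bytes) -> str:
        """Save a data set for the session, replacing any earlier one (an assumption)."""
        self.discard(session_id)
        fd, path = tempfile.mkstemp(dir=self.root, suffix=".csv")
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        self.paths[session_id] = path
        return path

    def discard(self, session_id):
        """Delete the session's data set, for example when the user leaves."""
        path = self.paths.pop(session_id, None)
        if path and os.path.exists(path):
            os.remove(path)
```

Because each session owns a single file, nothing accumulates on the server once discard has been called for a departed user.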

A second problem dealt with merging the results from queries of several databases. The merging of tables could be accomplished either on the UNIX side or on the NT side. In order to minimize the number of files that would be sent from UNIX to NT, we decided to merge them all on the UNIX side.

The final--and related--problem was how to support queries of multiple databases in one session. In the earlier versions of HIT, the user chose a database to query, filled out a form, and a SAS program generated the results (see Figures 3 and 4). However, we now wanted to give the user the ability to create queries for multiple databases in one session.

We used the following approach to accomplish these tasks. The user is asked which databases (types of information) they wish to study. Based on their requests, a PERL script generates input forms for each type. Database queries are color coded to reinforce the idea that each form corresponds to a given database (Figure 7). Once the forms are filled out, the resulting SAS programs are generated and their results are merged into a single file. The file is then sent to the HIT MapMaker server, along with a file of all the variable names being sent. A setup form for the mapping program is then generated on the fly containing the new variables that can be mapped (Figure 8). Once filled out, the resultant map is generated.

[Figure 7: P6027.gif]

[Figure 8: P6028.gif]
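The merge step described above produces two outputs: a single data file and a list of the variable names it contains. The sketch below illustrates the idea in Python. On the HIT site the merge is done on the UNIX side, so this is not the project's code. The key field name county, the CSV layout, and the function names are assumptions.

```python
import csv
import io


def merge_tables(tables, key="county"):
    """Merge rows from several query results on a shared key field.

    Returns (variable_names, merged_rows). Variable names keep the order
    in which they were first seen; rows are sorted by key.
    """
    merged = {}
    variables = []
    for rows in tables:
        for row in rows:
            target = merged.setdefault(row[key], {key: row[key]})
            for name, value in row.items():
                if name == key:
                    continue
                if name not in variables:
                    variables.append(name)
                target[name] = value
    return variables, [merged[k] for k in sorted(merged)]


def write_outputs(variables, rows, key="county"):
    """Return the merged data as CSV text, plus the variable-name list as text."""
    data = io.StringIO()
    writer = csv.DictWriter(
        data, fieldnames=[key] + variables, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)   # areas missing a variable get an empty cell
    return data.getvalue(), "\n".join(variables) + "\n"
```

The variable-name text is what would let the mapping server build its setup form without having to inspect the data file itself.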

SOME FINAL COMMENTS

We have learned quite a bit in building this web site. In particular, we have learned that it is possible to combine statistical analysis, web page management, and MapObjects based mapping programs to allow for dynamic generation of data tables to be mapped. The ability to write our own mapping program using MapObjects gives us a great deal of flexibility when designing our web site workflow. Combining this with dynamic analyses means that we are not imposing limits on the user in terms of the combinations of data to be analyzed. While views of fixed thematic layers are often quite useful (and we do support these, too), the ability to generate queries, data tables, and resultant maps on the fly is extremely powerful. Building such a site requires skilled people who feel comfortable working in the worlds of GIS, VB programming, SAS, and PERL. We are fortunate to have a team of dedicated students eager to learn and apply these skills. Without them this project would not have been possible.

There are many people who work on this project. I would like to acknowledge the efforts of those with whom I have worked most closely: Xiaohong Xin, Karen Burhenn, Shuhua Hu, and Mo Chatterjie. The support, guidance, and leadership of Dr. Sandra Putnam, along with the support of the Tennessee Department of Health, are also acknowledged.