AUTOMATION PROCESS FOR DEVELOPING VITAL HEALTH INFORMATION AT CENSUS TRACT LEVEL

Hsiu-Hua Liao, Paul Laymon, and Kirk Shull


ABSTRACT

Recent emphasis on streamlined government and health care reform encourages community leaders to search for innovative ways to manage their regions of responsibility effectively. Geographic Information System (GIS) technology is gradually becoming a recognized tool in the health field, with uses ranging from intervention strategies to health care reform. One advantage of implementing GIS is the ability to locate personal health data geographically through a geocoding process and to examine its spatial pattern. Geo-referencing personal health data can greatly enhance decisions made by public health officials; however, it adds to the burden of protecting the personal right to confidentiality. One solution to this dilemma is to aggregate individual records into groups in which no identity is revealed. In this process, vital health records were geocoded and aggregated to the census tract level. Data aggregation was accomplished through the Vital Health and Census Data Integration System (VHCDIS), an ArcInfo-based GIS automation system. The primary objectives of the process were to protect personal privacy, automate the aggregation of geo-referenced vital records data, and improve national access to spatial health information.


INTRODUCTION

For centuries health researchers have been using spatial locations, boundaries, and regions to determine the quality, quantity, and migration of epidemics. Overlaying quantitative graphics upon a map enables the viewer to grasp the underlying information clearly. For example, the famous 1854 London cholera study conducted by Dr. John Snow has been hailed as the geographic benchmark for using maps in epidemiological studies.

Currently, the South Carolina Department of Health and Environmental Control (SCDHEC), Division of Biostatistics, presents spatial health information at the county level. County level data provide a wealth of information; however, at this macro-scale it is difficult for local health officials to adequately identify, analyze, and monitor health problems at the micro-scale or community level. Hence, in 1989, the Johnson Wood Foundation authorized a grant for the SCDHEC's Vital Record Geographic Referencing System (VRGRS) and the University of South Carolina's School of Public Health (USCSPH) to generate a feasibility study of geo-referencing vital records data for the purpose of assisting public health assessments, surveillance, and health hazard evaluations at the community level. The main objectives for VRGRS were: (1) to implement a program which encoded the geographic residential location for births and deaths, and apply a geographic information system (GIS) as part of the statewide vital records system; (2) to demonstrate the application of location data in association with the TIGER (Topologically Integrated Geographic Encoding and Referencing) system of the federal census of 1990; and (3) to design and document the process in a way to facilitate expansion to complement a statewide geographic information system for economic development. The VRGRS project outcome ultimately determined that the processes, scientific techniques, and data were suitable for implementing an informal GIS program within the Division of Biostatistics. Hence, in 1994 staff and equipment were selected to carry on the objectives of VRGRS and to establish the means to systematically geo-reference vital health data collected and stored at the Office of Vital Records.

Geo-referencing provides an opportunity to examine how health data are distributed over space; however, it also raises the issue of confidentiality. When the geographic resolution of data is fine enough to identify fewer than four addresses, the data are no longer tools of research, but tools to target and expose individuals (Alpert and Haynes, 1994). Protection against inadvertent disclosure of individuals, households, establishments, or primary sampling units, especially in public use databases, is a concern of government health agencies. Even though confidentiality policies may vary among agencies, they must reflect the laws and regulations imposed upon personal data collection and dissemination activities (Croner et al., 1996). To date, there is no minimum national threshold standard defining public or professional access to spatially referenced public health data.

In an attempt to promote confidentiality standards for spatially referenced public health data, the South Carolina Division of Biostatistics GIS Lab focused on the third objective of VRGRS, "to develop a statewide health information system capable of satisfying the wide range needs of health researchers." To develop such a system, the Biostatistics GIS Lab needed a geocoding system capable of converting large volumes of data with acceptable match rates. After a series of tests covering quality, cost, and turnaround time, Geographic Data Technology (GDT, 1997) of Lebanon, New Hampshire was chosen as the geocoding system.

Once the vital records health data were converted into individual points, the issue of confidentiality was addressed by aggregating the data to the 1990 census tracts. Census tracts were chosen for the following reasons: (1) census tracts carry a large volume of socioeconomic data, so the aggregated vital records attributes can be combined with existing socioeconomic census data (e.g., mother's age extracted from the vital records can be stratified into the same categories as the female population of the tracts, allowing calculation of statistical rates); and (2) tract boundaries are updated once every decade.
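As a rough illustration of the first reason, the sketch below stratifies mother's age into categories and divides tract-level birth counts by the female population in the same category. It is only a sketch under stated assumptions: the age breaks, tract identifiers, function names, and population figures are hypothetical and are not taken from Table 1 or from any SCDHEC data.

```python
from collections import Counter

# Hypothetical age breaks (low, high, label); the categories actually used
# by the system are defined in Table 1 and may differ.
AGE_GROUPS = [(10, 14, "10-14"), (15, 19, "15-19"), (20, 34, "20-34"), (35, 49, "35-49")]


def age_group(age):
    """Return the label of the age group containing `age`, or None."""
    for low, high, label in AGE_GROUPS:
        if low <= age <= high:
            return label
    return None


def birth_rates(births, female_pop, per=1000):
    """Births per `per` women, keyed by (tract, age group).

    births     -- iterable of (tract_id, mother_age) pairs
    female_pop -- dict mapping (tract_id, age_label) to female population
    """
    counts = Counter()
    for tract, age in births:
        label = age_group(age)
        if label is not None:
            counts[(tract, label)] += 1
    rates = {}
    for key, n in counts.items():
        pop = female_pop.get(key, 0)
        if pop > 0:  # skip cells that have no census denominator
            rates[key] = per * n / pop
    return rates


# Made-up values for a single hypothetical tract.
births = [("079-0101", 17), ("079-0101", 18), ("079-0101", 25)]
female_pop = {("079-0101", "15-19"): 200, ("079-0101", "20-34"): 500}
rates = birth_rates(births, female_pop)
```

Because the vital records counts and the census denominators share the same tract identifier and the same age categories, the rate is a simple division once both are aggregated.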

Working with voluminous vital record files proved to be tedious and time consuming. To streamline the process of generating public health data from these records, the Vital Health and Census Data Integration System (VHCDIS) was developed. In designing the system, five requirements were set: the system must (1) be flexible enough to be continuously improved; (2) save time; (3) establish a national precedent for collecting health data; (4) standardize data output; and (5) accurately aggregate health data to predetermined political boundaries (in this case, the census tracts).

In its completed form the VHCDIS offers national and local programs the ability to join aggregate vital records health data with existing socioeconomic census data as a tool for their respective surveillance and intervention strategies. The remaining point data derived from the geocoding process, which are treated with all the confidentiality of a paper certificate, are stored on magnetic media for future use in very high resolution studies.


BACKGROUND

Vital Health Statistics

Vital statistics for the United States are obtained from the official records of live births, deaths, fetal deaths, marriages, divorces, and annulments. These data sets have long been used as statistical measuring devices to identify qualitative and quantitative public health issues. The official recording of these events is the individual responsibility of each state within the nation and of the independent registration areas (District of Columbia, New York City, and territories). The Federal Government, without express constitutional authority to enact national vital statistics legislation, relies upon the states to establish laws and regulations that provide compatible methods of registration and data collection (NCHS, 1995).

As public health issues become more complex, demand for better vital statistics information increases. For this reason, updating data collection, recording, and processing techniques to keep pace with rapidly evolving needs becomes an increasingly important part of the vital statistics program. Improvement began in the 1950's, when increased attention was placed on improving the quality of vital statistics data to make them more useful and accessible. Interest in vital statistics expanded when state and Federal health and welfare officials began to look for pertinent and reliable statistics on which to base their political decisions. The registration certificates assumed a new role of importance as they were used as a source of credible national vital statistics for use by all levels of government, institutions, and the general public. The content of the information collected for vital records was expanded, and methods to improve its quality and usefulness were added. As health and social issues became more complex, supplemental data sources were developed to augment and enrich the information obtained from the registration system.

Throughout the years the process of producing national vital statistics has shifted several times from one organizational unit of the Federal Government to another. In addition, the National Center for Health Statistics (NCHS) and the National Association for Public Health Statistics and Information Systems (NAPHSIS) have become recognized as authorities on health statistics and the associated information systems. NAPHSIS was organized to study and promote all matters relating to the registration of vital statistics. The 1995 revision of the association bylaws states (NCHS, 1995): "This Association will foster discussion and group action on issues involving public health statistics, public health information systems, and vital records registration. The Association will provide standards and principles for administering public health statistics, public health information systems, and vital records registration. The Association will represent the States and Territories of the United States regarding these issues, and will serve as an advisory group to the Association of State and Territorial Health Officials."

With the increasing complexity of public health issues, federal and local health programs need to improve the process of collecting, storing, analyzing, and displaying data on community level epidemics. For this reason, the focus on quality spatial vital records data will continue to grow. Likewise, vital records programs tasked with increasing the accuracy of vital statistics will continue to explore new technologies and rethink the uses of these valuable resources.


Public Health and Geographic Information System (GIS)

Geographic information systems technology has gradually been recognized by public health researchers as a powerful tool for analyzing health data. It provides an opportunity to integrate at least six disciplines (epidemiology, environmental health, geography, cartography, computer sciences, and statistics) to study the distribution and possible causes of diseases in populations, and to target interventions that improve the health of the population (Feinleib, 1997). Applications of GIS in the health field vary from the simple automated mapping of epidemiological data (Pyle, 1994) to the sophisticated analysis of satellite images to demonstrate vector/environment relationships (Hugh-Jones, 1989; Malone, 1992; Perry, 1991; Rogers, 1991).

A simple paradigm for implementing GIS technology in public health can be viewed in three phases: data source identification, GIS support system, and health planning (Figure 1). In the data source identification phase, data sources relevant to the application are selected and converted into digital geography, or "coverages." The data sources used in the Division of Biostatistics are environmental health hazards, health services, and socioeconomic and health data. Environmental health hazards can be defined as any data pertaining to an environmental situation that may have a negative impact on the surrounding population. Health services are data that identify sources of health correction. For the scope of this paper, socioeconomic and health data can be defined as data collected for the purpose of monitoring, tracking, and identifying social and health trends. Once these data sources are converted into digital coverages, they can be stored, manipulated, analyzed, and displayed in a GIS. This collection of standardized data becomes the foundation for the third phase of health GIS implementation, the health planning phase. In this phase, the GIS becomes the knowledge base for analyzing health outcomes and supporting public health surveillance, where a diverse group of scientific disciplines converge to direct and discover local level health objectives.

Figure 1. Paradigm for Implementing GIS in Public Health.

The Centers for Disease Control (CDC) defines public health surveillance as follows (CDC, 1988): "The ongoing, systematic collection, analysis, and interpretation of health data essential to the planning, implementation, and evaluation of public health practice, closely integrated with the timely dissemination of these data to those who need to know. The final link of the surveillance chain is the application of these data to prevention and control. A surveillance system includes a functional capacity for data collection, analysis, and dissemination linked to public health programs." Public health surveillance evolves with changes in science and technology. With the advent of computers, health surveillance has been transformed from a primarily historical function to one which promotes timely analysis of data with appropriate responses to given health outcomes. Historically, the quality and quantity of data over space were elusive and difficult to interpret. Today, using GIS technology as a tool, we can streamline the processes needed to promote health and protect our environmental surroundings.


VITAL HEALTH DATA

In the State of South Carolina, vital health data are collected through official documents filed with the Office of Vital Records and Public Health Statistics and Information System (PHSIS) within the Department of Health and Environmental Control (DHEC). Each year, the Division of Biostatistics publishes reports on vital statistics data for South Carolina live births, deaths, fetal deaths, marriages, divorces, and annulments that occurred during the previous year. These vital statistics are also available in a publicly accessible format. In the case of special requests for data, files and reports are generated and distributed by the Division of Biostatistics to users who need analyses different from those normally published.

VRGRS justified the need for using GIS technology to improve the Division of Biostatistics' capability of analyzing vital records data at an increased spatial resolution. Ultimately, this new technology centers on geocoding temporal vital records residence data for births and deaths in the State of South Carolina. All births among residents were included regardless of the state of occurrence, while South Carolina occurrences to non-residents were excluded. To support thematic mapping and GIS analysis, an attribute file identifying critical information about each birth and death event was generated and linked to the point by means of a common identifier.

Birth Data

In 1991, South Carolina began using a microcomputer software package, Electronic Birth Pages (EBP), to improve the process of generating birth certificates and collecting newborn data for laboratory screening. The main function of the EBP system involves data entry and production of birth certificates. The end product is referred to as "EBC" (Electronic Birth Certificates).

In order to generate spatial information from vital records data, the residential address file is extracted from the mainframe data set using SAS. Variables used in geocoding include: identification number, residential street address, city, state, zip code, and 4 digit zip code extension. These data undergo a system of quality control to check the completeness and accuracy of the address information. To complete the data set, an attribute record is captured as well. For example, the attribute file used for births includes: identification number, county federal information processing standards (FIPS) code, age of mother, attendant at birth, birth weight, education level of the mother, month prenatal care began, number of prenatal care visits, race of the child, race of the mother, sex of the child, and year of birth. These attributes were chosen based on requests made by Health Districts and the Division of Epidemiology in SCDHEC. The attribute file was imported into ArcInfo for the data aggregation process. Each variable was aggregated to the census tract level by race group (total, White, Black, others, and unknown). Table 1 shows the classification of each variable.
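The aggregation by race group can be pictured as a simple tally per census tract. The following is a minimal sketch outside ArcInfo, not the VHCDIS implementation: the record layout, the race codes "W" and "B", and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Reporting groups named in the text: total, White, Black, others, unknown.
RACE_GROUPS = ("total", "white", "black", "other", "unknown")


def race_group(code):
    """Map a raw race code to a reporting group.

    The codes 'W' and 'B' are hypothetical; a missing or blank code is
    counted as unknown and any other code as other.
    """
    if code is None or code.strip() == "":
        return "unknown"
    code = code.strip().upper()
    if code == "W":
        return "white"
    if code == "B":
        return "black"
    return "other"


def aggregate_by_tract(records):
    """Count records per census tract, overall and by race group."""
    table = defaultdict(lambda: dict.fromkeys(RACE_GROUPS, 0))
    for rec in records:
        row = table[rec["tract"]]
        row["total"] += 1
        row[race_group(rec.get("race"))] += 1
    return dict(table)


# Made-up geocoded records, each already tagged with a tract number.
records = [
    {"id": 1, "tract": "0101", "race": "W"},
    {"id": 2, "tract": "0101", "race": "B"},
    {"id": 3, "tract": "0101", "race": ""},
    {"id": 4, "tract": "0102", "race": "A"},
]
summary = aggregate_by_tract(records)
```

Each output row carries only counts, so no individual identifier from the attribute file survives the aggregation.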

Death Data

Death data were collected through death certificates filed by funeral homes. The funeral director, or person acting as such, is responsible for the completion of the death certificate, including all of the personal information from the family and the medical portion of the certificate. This certificate is then sent to the county health department, where it is screened for completeness. If the certificate is acceptable at the county level, the health department forwards it to the Office of Vital Records in SCDHEC. The certificate is again checked for completeness, then personal data are coded and stored in the database.

For the residential address file, the variables used in the geocoding process were the same as those used for the birth data. For the attribute file, variables were temporally selected and causes of death were grouped into disease and non-disease types (Table 2). Causes of death are classified for purposes of statistical tabulation according to the International Statistical Classification of Diseases, Injuries, and Causes of Death (ICD-9), published by the World Health Organization. In this process, only the underlying cause of death was selected for data aggregation.


GEOCODING PROCESS

Geocoding is the process of linking a common location identifier such as an address, site location, or building to a spatial and geographic database, such as the Census TIGER/Line files, which contains the locations of streets, the ranges of addresses found on each street segment, and the boundaries of political and administrative areas. Since the geographic database contains address ranges defining the beginning and ending address numbers assigned to a given street segment, coordinates (i.e., latitude and longitude) for any specific address can be found through linear interpolation of the address number between the starting and ending address numbers assigned to the segment. Once the correct location is assigned, the location identifier is given a map coordinate and becomes a permanent geocode.
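The linear interpolation described above can be sketched as follows. This is a minimal illustration, not the procedure used by the geocoding vendor: the function name and arguments are hypothetical, the street segment is treated as a single straight line, and refinements such as separate left and right address ranges are ignored.

```python
def interpolate_address(number, from_addr, to_addr, start_xy, end_xy):
    """Estimate the coordinates of a house number along a street segment.

    number    -- house number to locate
    from_addr -- address number at the start of the segment
    to_addr   -- address number at the end of the segment
    start_xy  -- (x, y) of the segment start
    end_xy    -- (x, y) of the segment end
    """
    if from_addr == to_addr:
        fraction = 0.5  # single-address range: place at the midpoint
    else:
        fraction = (number - from_addr) / (to_addr - from_addr)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("house number outside the segment's address range")
    x = start_xy[0] + fraction * (end_xy[0] - start_xy[0])
    y = start_xy[1] + fraction * (end_xy[1] - start_xy[1])
    return (x, y)


# House number 150 on a segment whose range runs from 100 to 200.
location = interpolate_address(150, 100, 200, (0.0, 0.0), (10.0, 20.0))
```

The estimate depends only on where the number falls within the range, which is why the true position of a building can differ from the interpolated one.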

In SCDHEC, the statewide geocoding service is conducted by Geographic Data Technology (GDT). Data were address matched to the Dynamap/2000 series, a base map generated and used by GDT for the purpose of address matching (GDT, 1997). Table 3 shows the data that GDT generates alongside the original address as the summary and report on the geocoding procedure.

Address Standardization

To improve address accuracy and increase the geocoding match rate, a commercial software package, StreetRite, is used to check and correct residential addresses. StreetRite compares the residential addresses against a database of every mailable address in the United States, identifies inaccurate or incomplete addresses (e.g., misspelled street names and missing zip codes, cities, and states), and replaces them with the correct data. The addresses StreetRite is unable to match are then manually checked, and if an accurate match is found, the address is corrected.

Error Sources of Geocoding

Geocoding is a process of matching an address to a geographic location. The quality of the geocoding process is measured by the geocoding match rate. An accurate geocoding match depends on the quality of both the address data and the geographic data. Some errors are inherent in the process, and in many cases it is difficult to determine how accurate the results are. It is important to document the potential error sources and understand how they could affect the quality and results of geocoding. The following factors, identified during our geocoding process, could affect the match rate.

A. Accuracy of address

In geocoding vital health records data, the initial error is introduced when the individual or family providing information to the medical official is not made aware of the difference between mailing and residence addresses. The mailing address is quite often a PO Box at the local Post Office, whereas the residence address is the street and street number at which the individual resides. New addresses created during the calendar year that do not yet exist in the current street/road database will also reduce the geocoding match rate.
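Records that carry a mailing address instead of a residence address can be flagged before they are submitted for geocoding. The sketch below shows one hypothetical screening step; it is not part of StreetRite or of the geocoding service, and the pattern covers only a few common spellings of "PO Box".

```python
import re

# Matches spellings such as "PO Box 12", "P.O. Box 12", "Post Office Box 12".
PO_BOX = re.compile(r"\b(?:p\.?\s*o\.?|post\s+office)\s*box\b", re.IGNORECASE)


def is_po_box(address):
    """Return True if the address text looks like a post office box."""
    return bool(PO_BOX.search(address))


def split_addresses(addresses):
    """Separate street addresses from PO Box addresses for manual follow-up."""
    street, po_box = [], []
    for addr in addresses:
        (po_box if is_po_box(addr) else street).append(addr)
    return street, po_box


# Made-up input mixing the two address types.
street, po_box = split_addresses([
    "2600 Bull Street, Columbia, SC",
    "P.O. Box 123, Columbia, SC",
    "Post Office Box 9",
    "12 Boxwood Lane",
])
```

Addresses in the second list cannot be interpolated along a street segment, so they would need a corrected residence address before geocoding.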

B. Address allocation

The geographic data used in the geocoding process contain a wealth of information about street locations, address ranges, and related information, but they are by no means complete. In urban areas, the percentage of street segments that contain address ranges may be 90 percent or above. However, some rural areas do not contain any address ranges. Therefore, the geocoding match rate will depend upon the study area.

C. Assigning geographic location

Geocoding is a comparison of each address in an event table to the address ranges in a target address database. When an event address matches the address range of a street segment, an interpolation is performed to locate and assign real-world coordinates to the event. For example, given a line with end point values of 0 and 100 and a street address of 50, the location of the address is estimated at the line's midpoint. However, the actual street address may not be located at the midpoint of the line segment. As a result, during the aggregation process a small percentage of geocoded data may be captured in the wrong polygon. For instance, in Figure 2, tract number one will be assigned an erroneous value.

Figure 2. Illustration of Potential Error from Assigning Geographic Location
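The effect can be reproduced with a small point-in-polygon test. In this hedged sketch the two square tracts, their coordinates, and the function names are hypothetical and do not reproduce Figure 2; the test itself is the standard ray-casting method, used here in place of the GIS overlay.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if `point` lies inside `polygon`.

    polygon -- list of (x, y) vertices in order; points lying exactly on
    an edge are not handled specially.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast to the right of the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_tract(point, tracts):
    """Return the id of the first tract polygon containing the point."""
    for tract_id, polygon in tracts.items():
        if point_in_polygon(point, polygon):
            return tract_id
    return None


# Two hypothetical square tracts sharing the boundary x = 50.
tracts = {
    "tract_1": [(0, 0), (50, 0), (50, 100), (0, 100)],
    "tract_2": [(50, 0), (100, 0), (100, 100), (50, 100)],
}
interpolated = (49.0, 40.0)  # position estimated from the address range
actual = (52.0, 40.0)        # assumed true position of the residence
```

Here the interpolated point falls in tract 1 while the assumed true location lies in tract 2, so the event would be counted in the wrong tract even though the two positions are only a few units apart.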


AUTOMATION SYSTEM AND PROCESS

System Design

The primary goal of developing the Vital Health and Census Data Integration System (VHCDIS) is to aggregate vital health data to the census tract level and generate publicly accessible database files. The system is designed to interact with the user to generate aggregated information from different public health data. Figure 3 illustrates the general architecture of the VHCDIS. At this point, the system handles only birth and death records. In the future, as the need for geocoding health data increases, more components will be added to the system to handle other health information (such as cancer registry data).

Figure 3. Architecture of the Vital Health and Census Data Integration System.

System Resource and Implementation

The VHCDIS was developed using the GIS software ArcInfo (Esri, 1996) on both Unix and NT platforms. A system supervisor in the form of an X-window graphical user interface (GUI) provides user access to the components (birth data and death data) described previously. The GUI provides an interactive environment that facilitates user access to the components and the selection and execution of options. As described below, the user navigates the entire process by making selections from the window menus.

Table 4 summarizes the steps involved in the automation process. The user first loads the system by opening and running the ArcInfo software. At this point, the user sees the main menu shown in Figure 4. In this example, we will generate birth information using the system, so the user points and clicks on the icon to select the Birth Certificate. As shown in Figure 5, twelve options are available for generating aggregated information on live births, low weight live births, and very low weight live births at the census tract level. The classification of each category is shown in Table 1. The user continues to make selections according to the information needed. For example, to generate live birth information by mother's age and race, the user clicks the corresponding Select button to continue (Figure 6). In this menu, the user needs to supply two data files: a birth data file (with all of the information shown in Table 1 and the census county-tract number), and a census tract data file (with the census county-tract number only). When selecting a data file, a pop-up window appears to help the user choose from existing data files. After specifying the data files, the user can select field item names for each parameter (mother's age and mother's race from the birth data; county-tract number from the census data). The user should provide an output file name (lb95race.dbf) and define an item name for each classification shown in this menu. Because defining item names one by one is cumbersome, the user can select the USE-DEFAULT button to use the default name for each item. After this, the user can click DONE to continue with the generation process. When it has finished, clicking CANCEL returns to the previous menu (Figure 5), where other parameters can be selected for data aggregation. Outputs from the aggregation process are dBASE files (.DBF), which can be imported into other database software or ArcView for review.


Figure 4. Screen Capture of the Main Menu of the Vital Health and Census Data Integration System.


Figure 5. Screen Capture of Birth Information Aggregation Menu.


Figure 6. Screen Capture of Generating Information of Live Birth by Mother's Age and by Race.

In this example, an output file (lb95race.dbf) was generated for live births by mother's age and by race at the census tract level. To display the information, the user first runs the ArcView software, opens a graphic window, and adds the census tract coverage (or shape file) as a new theme. The user then adds the output file (lb95race.dbf) as a table and opens the census tract attribute table. With both tables open, the user can use the join function to join the two tables and select different classified information to display (Figure 7).

Figure 7. Screen Capture of Output Data Display in ArcView for Live Births by Mother's Age and by Race.
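Joining the aggregated output table to the census tract attribute table amounts to matching rows on the shared county-tract number. The sketch below shows that matching outside ArcView; it is not the ArcView join itself, and the field names (CNTY_TRACT, AREA, LB_TOTAL) and values are hypothetical.

```python
def join_tables(tract_rows, output_rows, key="CNTY_TRACT"):
    """Attach aggregated fields to the tract rows that share the same key.

    Tract rows without a matching output row are kept unchanged.
    """
    lookup = {row[key]: row for row in output_rows}
    joined = []
    for tract in tract_rows:
        merged = dict(tract)
        match = lookup.get(tract[key])
        if match is not None:
            for field, value in match.items():
                if field != key:
                    merged[field] = value
        joined.append(merged)
    return joined


# Made-up tract attribute table and aggregated output table.
tract_table = [
    {"CNTY_TRACT": "0790101", "AREA": 2.5},
    {"CNTY_TRACT": "0790102", "AREA": 4.1},
]
output_table = [{"CNTY_TRACT": "0790101", "LB_TOTAL": 37}]
joined = join_tables(tract_table, output_table)
```

Keeping every tract row, matched or not, mirrors the way a joined attribute table still lists tracts for which no events were recorded.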


CONCLUSION AND ONGOING PROCESS

Geographic information system technology is emerging as a useful tool in public health studies. The technology allows for storage and manipulation of large and multi-faceted data sets, while maintaining the spatial integrity of each data collection or reporting location. As such, the technology supports intensive investigation of the spatial relationships between variables and outcomes necessary for health risk assessment. Therefore, the key to successful application of GIS technology in the public health field is to understand which functions of a GIS should be used, what their limitations are, and how to apply them appropriately to benefit research and assist officials with intervention strategies and prevention.

In South Carolina, vital health data are collected each year. The need to access spatial health information is increasing as public health officials and researchers see the importance of analyzing the spatial patterns of vital health data. Hence, a system is needed to help standardize the transformation of individual geocoded confidential health data into non-confidential data.

This paper described an interactive data integration system, the Vital Health and Census Data Integration System (VHCDIS), designed and developed with GIS technology for transforming geocoded confidential health data (birth and death) into non-confidential census health information. Vital health data were linked to census data through the geocoding process. Because geocoded vital health data are aggregated to the census tract, the output from VHCDIS will be publicly accessible and can be analyzed together with other existing census socioeconomic data.

Information in this report describes a project whose primary goal is to develop a system that assists the ongoing, systematic collection of health data and the dissemination of these data to public health officials and researchers for planning, implementing, and evaluating public health practice. Currently, SCDHEC is using the system to develop census health information from South Carolina birth and death data and will continue the process in the future. Many improvements and extensions to the VHCDIS are still underway. For example, the system is being extended to accommodate infant death data, and cancer registry data will be included next year. Additionally, the system can be extended to census block group levels, and possibly to census block levels. In general, the VHCDIS in its present form is a sufficiently realistic demonstration of the flexibility of GIS technology and of its ability to handle large volumes of health data and the aggregation process within the GIS environment.


REFERENCES

Alpert, S., and K.E. Haynes. 1994. Privacy and the intersection of geographical information and intelligent transportation systems. Proceedings of the Conference on Law and Information Policy for Spatial Databases. Tempe, Arizona. October 28-29, 1994. pp. 198-211.

Centers for Disease Control. January 1988. CDC Surveillance Update. Atlanta, GA.: CDC.

Croner, C.M., J. Sperling, and F.R. Broome. 1996. Geographic information systems (GIS): new perspectives in understanding human health and environmental relationships. Statistics in Medicine, vol. 15, pp. 1961-1977.

Feinleib, M. 1997. The use of computer mapping in monitoring the nation's health. International Symposium on Computer Mapping in Epidemiology and Environmental Health. April 1997. pp. 1-3.

Geographic Data Technology. 1997. Dynamap/2000 7.2 User manual. Lebanon, NH.

Hugh-Jones, M. 1989. Applications of remote sensing to the identification of the habitats of parasites and disease vectors. Parasitology Today, vol. 5, no. 8, pp. 244-251.

Malone, J.B., D.P. Fehler, A.F. Loyacano, and S.H. Zukowski. 1992. Use of LANDSAT MSS imagery and soil type in a geographic information system to assess site-specific risk of fascioliasis on Red River Basin farms in Louisiana. Reprinted from Tropical Veterinary Medicine: Current Issues and Perspectives, vol. 653 of the Annals of the New York Academy of Sciences, pp. 389-397.

National Center for Health Statistics. 1997. U.S. Vital Statistics System: major activities and developments, 1950-95. U.S. Department of Health and Human Services, Centers for Disease Control and Prevention. Hyattsville, Maryland. DHHS Publication No. (PHS) 97-1003.

Perry, B.D., R. Kruska, P. Lessard, R.A.I. Norval, and K. Kundert. 1991. Geographic information systems for the development of tick-borne disease control strategies in Africa. Prevent. Vet. Med. 11: 261-268.

Pyle, G.F. 1994. Mapping tuberculosis in the Carolinas, Sistema Terra, vol. III, no.1, pp. 22-23.

Rogers, D.J., and S.E. Randolph. 1991. Mortality rates and population density of tsetse flies correlated with satellite imagery. Nature, 351: 739-741.


Author Information


Hsiu-Hua Liao
GIS Manager
Division of Biostatistics
South Carolina Department of Health and Environmental Control
2600 Bull Street
Columbia, SC 29201-1708
TEL : (803) 734-4792
FAX : (803) 734-5131

Paul Laymon
GIS Coordinator
Division of Biostatistics
South Carolina Department of Health and Environmental Control
2600 Bull Street
Columbia, SC 29201-1708
TEL : (803) 734-0884
FAX : (803) 734-5131

Kirk Shull
GIS Analyst
Division of Biostatistics
South Carolina Department of Health and Environmental Control
2600 Bull Street
Columbia, SC 29201-1708
TEL : (803) 734-0885
FAX : (803) 734-5131