The Utility of Geographical Information Systems (GIS) and Spatial Analysis in Tuberculosis Surveillance in Harris County, Texas, 1995-1998
Matthew L. Stone
ABSTRACT
The main purpose of this research study was to examine the
spatial distribution of tuberculosis (TB) cases by area in Harris County, Texas
over a three-year period, 1995-1998, using geographical information systems
(GIS) software and spatial analytical techniques. Analysis looked at the TB incidence
distribution in
OBJECTIVE
The main purpose of this research study was to examine the
spatial distribution of tuberculosis (TB) cases by area in Harris County, Texas
over a three-year period, 1995-1998, using geographical information systems
(GIS) software and spatial analytical techniques. In doing so, the study was
expected to demonstrate some of the strengths of GIS in disease
mapping and surveillance. The information gathered by this study is expected
to assist public health
workers by providing effective examples of using epidemiologic
data, public use statistical software and GIS in formulating study questions,
generating and testing hypotheses, and critically evaluating maps that are
prepared using GIS software and spatial statistical methods. In addition, useful resources (analytical
and descriptive maps) have been produced for the Houston Tuberculosis
Initiative, a population-based, active surveillance and molecular epidemiology
study of tuberculosis cases reported to the City of
INTRODUCTION
A Geographic Information System (GIS) is an important tool for disease mapping and for public health surveillance activities that aim to identify high-risk groups. Because GIS combines mapping software with an embedded relational database component, it keeps the management and analysis of public health surveillance data well organized for determining spatial and temporal trends. Disease cases can be viewed in their surrounding social context, and patterns in their geographical distribution can be analyzed using various spatial statistical methods that account for differences in location characteristics (e.g., latitude and longitude). In addition, due to its ability to identify and map environmental factors associated with disease vectors, GIS is increasingly important in infectious and vector-borne disease surveillance.7,25,26,38,46 All of these characteristics, coupled with ease of use given proper training, bring the mapping of surveillance and disease data within reach of even the smallest health departments. GIS can assist epidemiologists by adding descriptive images that are created systematically according to proper scientific protocol, and it allows potential clusters to be evaluated when combined with robust statistical methods and software. Detailed maps can be produced with GIS and revised repeatedly with little effort, enabling the creation of a variety of new types of maps that could be useful in public health management and practice. An ideal outcome would be for communities to be able to link health information from various data sources, for efficiency and centralization, in order to recognize spatial data patterns that may suggest where cost-effective public health interventions can be applied.40
The Role of GIS Technology in Public Health Efforts
The use of geography in epidemiologic studies is not a new
phenomenon. Probably the most widely
cited study for first combining field epidemiology and
geographical analysis is John Snow's analysis of local water pumps and their
relationship with the spread of cholera in
GIS mapping can also allow researchers to examine many different types of questions involving the particulars of a specific location, the distribution of certain phenomena, the changes that have occurred since a previous analysis, the impact of a specific event, or the relationships and systematic patterns of a region.6 The GIS database becomes a model of spatial information that can be used in epidemiologic and health research to recognize the specific spatial structure of a process.23 A particular spatial structure includes the individuals affected and how they are connected in communities, as well as the dynamics of these communities and their organization into larger units.36 The geographic component of the GIS becomes a method of classifying data records into groups (administrative areas) separately from the personal characteristics of the individuals and allows for examination of aspects of location that are not captured by variables observed directly for the individuals.44 However, when detailed location information is available for individuals, it is not necessary to aggregate information into groups. It is possible to fit models that include spatial correlation components and to do so without compromising the confidentiality of the individuals by using varying levels of resolution to display patterns. GIS allows flexibility in mapping techniques through the use of area-based (counts of cases) and point-based (actual incident cases) data types.
Factors Influencing a Spatial Analysis of TB
Until recently, the use of GIS in the study of infectious disease and, more specifically, tuberculosis has been less well documented than its use for chronic or environmentally related illness. In fact, some believe that there is little appreciation among public health professionals of the value of mapping communicable diseases or associated risks. "Limited resources, large datasets, and concern for the maintenance of patient anonymity, combined with under-recognition of the benefits of conducting geographical analysis ... mean that the spatial references required for disease mapping are frequently not made available."2 This does not have to be the case, as GIS allows the researcher to display data at different resolutions and aggregations in order to protect confidentiality. Confidentiality concerns should therefore not limit the use of analytical methods to describe geographical variation in the distribution of infectious diseases in ways that will be readily understood and used by public health professionals.
A national plan dedicated to the elimination of tuberculosis (TB) in the United States by 2010 (defined as a case rate of less than 1 per 1,000,000 population) has been in place since 1989, issued by the Centers for Disease Control and Prevention (CDC) and the Advisory Council for the Elimination of Tuberculosis (ACET).12,35 The most recent data on reported incident cases of TB in the United States show a low of 16,377 cases during 2000, compared to 17,531 cases in 1999.11 Although data show a significant decrease in TB cases during most of the last decade, medical and public health professionals remain concerned with providing new diagnostic and therapeutic tools to continue this progress towards TB elimination and to deal with the impending tide of individuals with latent TB infection who serve as a reservoir of future cases.1 One of the greatest scientific advances in TB detection methods has been the use of molecular genetic characterization such as restriction fragment length polymorphism (RFLP) analysis. Population-based studies have shown that identification of TB case clusters is significantly enhanced by profiling certain copies of TB DNA (defined as probes), and this profiling has become a standard tool in TB epidemiologic studies.43
Tuberculosis, in general, is frequently associated with marginalized populations such as the homeless, persons living at the poverty level, and those living in overcrowded housing, such as immigrants. Numerous studies, both in the United States and abroad, have shown that these factors, as well as human immunodeficiency virus (HIV) infection, increasing cases among foreign-born individuals (in the U.S.), drug use, multi-drug-resistant TB (MDR TB), and living in various institutional settings, are responsible for a large proportion of the TB cases reported annually.5,8,9,13,15,17,37,39,45
In 1999, the CDC provided revised recommendations for TB prevention that addressed: using elimination strategies based on local epidemiology, establishing new strategic partnerships to effectively reach the diverse population of people at risk, enhancing the use of current tools for TB prevention and control, developing new tools for TB elimination, recommitting to the global battle against TB, and supporting broad-based efforts for TB prevention and control at all governmental levels in the U.S. Of specific relevance for this research study is the CDC's view that surveillance and program evaluation data show areas for improvement.12 There are particular concerns about individual contacts starting and maintaining compliance with TB therapy, as these are the individuals most likely to become future TB cases. According to the CDC, strategies that target groups at high risk for TB and treat those infected have often been poorly applied.12 In order to deal effectively with this surveillance issue, one objective should be to develop and implement systems to conduct active case finding among high-risk populations, when appropriate.12 As an example, TB-control staff members could be trained to use local epidemiologic data, coupled with GIS, to consistently identify high-risk groups that are deemed appropriate for targeted testing (e.g., immigrant populations) and to ensure that a greater proportion of infected persons begin and complete therapy.
Incidence of
Tuberculosis in
Although there was a steady decrease in cases overall (from 12.7/100,000
to 9.2/100,000)11 , the incidence in
RESEARCH QUESTIONS
METHODOLOGY
Study Population
Secondary data analysis was
performed on a subset of data collected during the 36-month period from October
1995 to September 1998 by the Houston Tuberculosis Initiative Program
(HTIP). HTIP is an ongoing,
population-based, active surveillance and molecular epidemiology study of
tuberculosis cases reported to the City of
Database Organization and Geocoding
In order to allow for spatial analysis and mapping of individual TB cases, the address of each case at time of entry into the study was geocoded. This involved assigning a latitude and a longitude to the address using specialized geocoding software20 and a database of street network files.22 Address matching involved matching the street address number, street, city, and zip code with the corresponding street segment in the street network file. Over 98% of the cases (n=1459) were exactly matched by the geocoding software. The edits involved in the address matching of the final 22 cases included corrections of misspellings, mistakes in street type (Road instead of Street), and problems with street numbering. Only one case could not be matched, as there was no valid address information for it. This case was subsequently dropped from the final analysis.
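The counts reported above can be tallied in a few lines. This is only an arithmetic sketch: the variable names are illustrative, and it assumes that the single unmatched case is one of the final 22 cases that required manual review, which is the reading consistent with the 1480 cases used later in the analysis.

```python
# Counts taken from the text; variable names are illustrative only.
exact_matches = 1459      # matched automatically by the geocoding software
manually_reviewed = 22    # remaining cases that needed address edits
unmatched = 1             # assumed to be one of the 22; no valid address

total_cases = exact_matches + manually_reviewed
exact_match_rate = exact_matches / total_cases
analysis_cases = total_cases - unmatched

print(f"total cases reviewed: {total_cases}")
print(f"exact match rate: {exact_match_rate:.1%}")
print(f"cases retained for analysis: {analysis_cases}")
```

Under that assumption the exact match rate is a little over 98% and 1480 cases remain, in agreement with the figures quoted in the text.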
When analytical methods were used
that included the observed points only, without aggregation to specific areas,
the subset of 1480 cases was used.
However, because some of the spatial methods used in this analysis
required boundary restrictions and aggregation of points to specific areas
(block groups, census tracts), only the boundary of
Most of the spatial analytical methods used in this project were based upon point pattern methods, where the objective is to determine whether there is a tendency for events (TB cases) to exhibit a pattern, that is, some form of regularity or clustering.3 The cases were the geocoded addresses from the data set described above, and the attributes were the spatial coordinates (latitude and longitude) and the various independent variables under consideration. The data under study represented a complete map of tuberculosis events between October 1995 and September 1998 (because the 1995 cases were so few, they were aggregated with the 1996 cases, creating the time period 1996-1998 for analysis purposes), and the study region comprised the area of Harris County, Texas. The main purpose of this analysis was to follow exploratory spatial analysis methods to generate possible hypotheses for future analyses and to suggest possible explanatory models to describe the observed processes. There are various ways in which one can view a spatial point pattern, and the following exploratory methods outline the processes used in this research study to examine properties of intensity (mean number of events per unit area; 1st order properties) and spatial dependence or interactions (relationships between numbers of events in the study area; 2nd order properties).
Kernel Estimation
Kernel estimation was used to obtain a smooth estimate of a bivariate
probability density from an
observed sample of events. For any chosen kernel and bandwidth, values of
intensity can be examined at locations on a suitably chosen grid over the study
area to provide a useful visual indication of the variation in the
intensity. The individual kernel
estimates for each cell are summed to produce an overall estimate of density
for that cell. This method
provides a summary of how events tend to cluster throughout the study
area as a means of assessing 1st order properties. The
study area was a rectangular grid placed over the whole of
In order to adjust density
estimates for heterogeneous population distributions (such as the population at
risk of disease in an area), one can also use a ratio of kernel estimates
for the intensity of events and for population density.32 This
produces an image that takes into consideration the intensity
of events along with the intensity of population and begins to provide an
estimate of case risk. It also assists
in judging whether an apparent convergence of cases on a specific area
is a function of population density. This procedure is available in
CrimeStat
software.32 For the
purposes of this procedure, a quartic kernel method was used with a
fixed bandwidth of 1.25 miles (for both kernel calculations) in order to obtain
a smooth model for descriptive purposes.
Cell size was adjusted in order to create a grid covering all of
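The ratio-of-kernels idea described in this section can be sketched as follows. This is a minimal illustration rather than the CrimeStat implementation: it assumes the standard quartic kernel form 3/(pi h^2) * (1 - d^2/h^2)^2 for d < h, assumes population counts sit at block group centroids, and uses hypothetical coordinates in miles instead of study data. The function names are invented for this example.

```python
import math

def quartic_kernel(distance, bandwidth):
    """Quartic kernel weight of one event; zero at and beyond the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    u = (distance / bandwidth) ** 2
    return 3.0 / (math.pi * bandwidth ** 2) * (1.0 - u) ** 2

def kernel_density(cell, points, bandwidth, weights=None):
    """Sum the kernel contributions of all points at one grid cell centre."""
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for (x, y), w in zip(points, weights):
        d = math.hypot(cell[0] - x, cell[1] - y)
        total += w * quartic_kernel(d, bandwidth)
    return total

def density_ratio(cell, cases, centroids, populations, bandwidth):
    """Case density over population density at one cell (None if no population)."""
    case_density = kernel_density(cell, cases, bandwidth)
    pop_density = kernel_density(cell, centroids, bandwidth, populations)
    if pop_density == 0.0:
        return None
    return case_density / pop_density

# Hypothetical coordinates in miles, not study data.
cases = [(0.2, 0.1), (0.4, -0.3), (3.0, 3.0)]
centroids = [(0.0, 0.0), (3.0, 3.2)]
populations = [1500, 1200]
bandwidth = 1.25  # fixed bandwidth in miles, as stated in the text

ratio_a = density_ratio((0.0, 0.0), cases, centroids, populations, bandwidth)
ratio_b = density_ratio((3.0, 3.0), cases, centroids, populations, bandwidth)
ratio_empty = density_ratio((10.0, 10.0), cases, centroids, populations, bandwidth)
print(ratio_a, ratio_b, ratio_empty)
```

Using the same bandwidth for the numerator and the denominator, as the text does, keeps the two surfaces comparable; cells with no population within the bandwidth are left undefined rather than assigned a rate.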
Nearest Neighbor Distances
Nearest neighbor distances were the exploratory method used in this
study to investigate second order properties (possible
relationships between points), using the distribution functions G(w), for
distances from each event to its nearest neighboring event, and F(x), for
distances from randomly selected points to the nearest event,3 within the study
area. This provides information about
inter-event interactions at small distances, which could be useful
when dealing with an infectious disease such as TB. Calculations of these distribution functions
were performed in S-Plus 2000 software,33 and the total distance considered was limited to approximately 3.5
miles (0.05 degrees latitude). The
resulting empirical distribution function G(w) was
plotted against suitable values of distance, and the resulting empirical distribution
function F(x) was plotted against the theoretical distribution function
under Complete Spatial Randomness (given by 1 − exp(−λπx²), where λ is the intensity of events) in order to explore possible evidence of
inter-event interactions.
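As a rough illustration of the comparison described above, the sketch below computes an empirical event-to-event nearest neighbor distribution and the theoretical curve under complete spatial randomness. It is not the S-Plus code used in the study: the function names, the six events, and the 10 x 10 mile window are all hypothetical, and the intensity is simply estimated as the number of events divided by the window area.

```python
import math

def nearest_neighbor_distances(points):
    """Distance from each event to its nearest other event (brute force)."""
    distances = []
    for i, (xi, yi) in enumerate(points):
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        distances.append(nearest)
    return distances

def empirical_g(nn_distances, w):
    """Proportion of events whose nearest neighbor lies within distance w."""
    return sum(d <= w for d in nn_distances) / len(nn_distances)

def csr_distribution(distance, intensity):
    """Nearest neighbor distribution under complete spatial randomness."""
    return 1.0 - math.exp(-intensity * math.pi * distance ** 2)

# Hypothetical events in a 10 x 10 mile window (not study data).
events = [(1.0, 1.0), (1.2, 1.1), (1.1, 0.9), (5.0, 5.0), (5.1, 5.2), (9.0, 2.0)]
area = 100.0
intensity = len(events) / area  # events per square mile

nn = nearest_neighbor_distances(events)
for w in (0.25, 0.5, 1.0, 2.0):
    print(w, round(empirical_g(nn, w), 3), round(csr_distribution(w, intensity), 3))
```

An empirical curve that rises much faster than the theoretical one at short distances, as it does for these tightly grouped hypothetical events, is the visual signature of clustering that the plots are meant to reveal.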
The nearest neighbor distance methods above are useful for examining
patterns among the closest events, that is, at small scales. Information is therefore lost
because only the smallest scales of pattern are considered. These statistics indicate only the
direction of departure from complete spatial randomness and do not provide a
means for interpreting a process that does not adhere to this assumption. An
alternative approach that provides a more effective summary of spatial dependence
over a wider range of scales for second order properties is the K(h) function, which provides a test of
randomness for every distance from the smallest up to the size of the study
area.3,32 CrimeStat was
used to calculate this
function using 100 intervals (radii) over which the statistic was counted, based
on an overall radius of approximately 33 miles.32 The resulting
K(h) function was transformed into the square root function
L(h) and plotted against distance to reveal
whether there was any clustering at certain distances or any dispersion at
others. This transformation makes the function more linear and therefore
easier to visualize.3 Edge
corrections were not considered in this preliminary analysis. Five hundred

These calculations were performed for all TB cases, for cases stratified by each independent variable described above, and, for comparison, for 2000 U.S. Census population characteristics (fixed at the centroid) at the block group level.
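The K(h) calculation and its square root transformation can be sketched as below. This is a simplified illustration, not the CrimeStat routine: it assumes the common estimator K(h) = (A / n²) × (number of ordered pairs within h) with no edge correction, the transformation L(h) = sqrt(K(h) / π) − h, a square study window, and an envelope built from 99 simulated random patterns. The point set, the window size, and the number of simulations are all hypothetical choices made for this example.

```python
import math
import random

def k_function(points, area, h):
    """Naive K estimate at distance h, with no edge correction."""
    n = len(points)
    pairs_within = 0
    for i in range(n):
        for j in range(n):
            if i != j:
                dx = points[i][0] - points[j][0]
                dy = points[i][1] - points[j][1]
                if math.hypot(dx, dy) <= h:
                    pairs_within += 1
    return area * pairs_within / (n * n)

def l_function(points, area, h):
    """Square root transformation; close to zero under spatial randomness."""
    return math.sqrt(k_function(points, area, h) / math.pi) - h

def csr_envelope(n, side, h, simulations, rng):
    """Min and max of L(h) over simulated uniform patterns in a square window."""
    values = []
    for _ in range(simulations):
        simulated = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
        values.append(l_function(simulated, side * side, h))
    return min(values), max(values)

# Hypothetical tightly clustered events in a 10 x 10 mile window.
side = 10.0
clustered = [(5 + 0.02 * i, 5 + 0.01 * i) for i in range(20)]
rng = random.Random(0)  # fixed seed so the sketch is repeatable

observed = l_function(clustered, side * side, 1.0)
low, high = csr_envelope(len(clustered), side, 1.0, 99, rng)
print(observed, low, high)
```

An observed L(h) above the upper envelope suggests clustering at that distance, and a value below the lower envelope suggests dispersion. Omitting edge correction, as in the preliminary analysis, biases the estimate downward at larger distances, so values near the size of the study area should be read with caution.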
Spatial Filtering Method
The spatial filtering method
outlined by Rushton41,42 was also used as an exploratory technique
to build upon the kernel methods used earlier. Not only can one view estimated disease rates
based upon extrapolation of individual cases and underlying population to
points on a fine grid, but this method also allows probabilities
for the event to be supplied in order to generate Monte Carlo simulations of expected rates to
compare with the observed rates and provide a level of significance for the
observed rates. The output is
the proportion of simulated rates that were less than the observed rates,
and contour lines can be drawn on a map to show where this proportion
was low or high. This procedure was used
because it allowed the data to remain in its original form (N=1480) instead
of being aggregated to a larger area.
The numerator files for this method were all TB cases and cases
stratified by each independent variable described above. The denominator files utilized the 2000
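The Monte Carlo step of the spatial filtering method can be sketched for a single grid point as follows. This is a simplified illustration, not Rushton's software: it assumes every resident has the same probability of being a case (the overall rate), places simulated cases at block centroids, and uses small hypothetical populations, coordinates, and 99 simulations. All function names are invented for this example.

```python
import math
import random

def within(center, point, radius):
    """True if the point lies inside the circular filter around the center."""
    return math.hypot(center[0] - point[0], center[1] - point[1]) <= radius

def filtered_rate(grid_point, case_points, blocks, radius):
    """Observed cases per person inside the filter around one grid point."""
    cases = sum(within(grid_point, c, radius) for c in case_points)
    population = sum(pop for xy, pop in blocks if within(grid_point, xy, radius))
    return cases / population if population else None

def simulated_rate(grid_point, blocks, radius, probability, rng):
    """One simulated rate: each resident is a case with the same probability."""
    cases = 0
    population = 0
    for xy, pop in blocks:
        if within(grid_point, xy, radius):
            population += pop
            cases += sum(rng.random() < probability for _ in range(pop))
    return cases / population if population else None

def proportion_below(observed, simulated):
    """Share of simulated rates that fall below the observed rate."""
    return sum(s < observed for s in simulated) / len(simulated)

# Hypothetical block centroids (miles) with populations, and case locations.
blocks = [((0.0, 0.0), 300), ((0.5, 0.5), 200), ((4.0, 4.0), 400)]
case_points = [(0.1, 0.1), (0.2, -0.1), (-0.3, 0.2), (0.4, 0.4),
               (0.0, -0.5), (0.6, 0.3), (4.1, 3.9)]
overall_probability = len(case_points) / sum(pop for _, pop in blocks)

rng = random.Random(1)  # fixed seed so the sketch is repeatable
grid_point, radius = (0.0, 0.0), 1.0
observed = filtered_rate(grid_point, case_points, blocks, radius)
simulated = [simulated_rate(grid_point, blocks, radius, overall_probability, rng)
             for _ in range(99)]
print(observed, proportion_below(observed, simulated))
```

Repeating this calculation at every grid point yields the surface of proportions from which contour lines can be drawn; proportions close to 1 mark locations where the observed rate exceeds nearly all simulated rates.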
Spatial Scan Statistical Method
This final method, as outlined by
Kulldorff et al.,24,29,30,31 was
used to determine possible cluster areas for TB
based on statistical likelihood. Each resulting
cluster of areas has an assigned p-value and a relative risk measurement
to compare to an expected value. For this analysis, the data were divided into
cases (TB cases with a given genetic print type) and controls (all other TB cases) in
order to determine areas of clustering for specific print types relative to all
TB cases. One of the underlying
assumptions is that shared print types indicate possible shared contacts. One possible setting for such contact is
public transportation. If significant
clusters can be determined for various print types, the actual case points with
bus-route attribute information can be overlaid onto the cluster area and provide a
rationale for checking personal contact information where print type and bus routes
are identical. For all analyses, a
space-time scan statistic was used, as provided in SaTScan v2.1, in order to adjust for time variations (broken into year intervals,
1996-1998) as well as spatial variations.
The test was set to scan for clusters with both high and low rates, and
the underlying coordinates file was based on the centroids of the 2000 U.S.
Census block groups. Three thousand
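The likelihood-based search behind the scan statistic can be sketched as follows. This is a purely spatial simplification and not the SaTScan implementation: it assumes a Bernoulli (case/control) model, circular windows centered on each location with a few fixed radii, no cap on window size, and a Monte Carlo p-value obtained by shuffling case labels. The coordinates, radii, number of replications, and function names are hypothetical.

```python
import math
import random

def _xlogy(x, y):
    """x * log(y), with the convention that 0 * log(0) is 0."""
    return x * math.log(y) if x > 0 else 0.0

def bernoulli_llr(c, n, C, N):
    """Log likelihood ratio for a window holding c cases among n points,
    given C cases among N points (cases plus controls) overall."""
    inside_rate = c / n
    outside_rate = (C - c) / (N - n)
    overall_rate = C / N
    alternative = (
        _xlogy(c, inside_rate) + _xlogy(n - c, 1 - inside_rate)
        + _xlogy(C - c, outside_rate)
        + _xlogy((N - n) - (C - c), 1 - outside_rate)
    )
    null = _xlogy(C, overall_rate) + _xlogy(N - C, 1 - overall_rate)
    return alternative - null

def best_window(points, labels, radii):
    """Most likely cluster as (llr, center, radius); labels: 1 case, 0 control."""
    C, N = sum(labels), len(labels)
    best = (0.0, None, None)
    for center in points:
        for r in radii:
            inside = [lab for p, lab in zip(points, labels)
                      if math.hypot(center[0] - p[0], center[1] - p[1]) <= r]
            n, c = len(inside), sum(inside)
            if n == N:
                continue  # a window holding every point has no outside to compare
            llr = bernoulli_llr(c, n, C, N)
            if llr > best[0]:
                best = (llr, center, r)
    return best

def monte_carlo_p(points, labels, radii, replications, rng):
    """Rank of the observed maximum among maxima from shuffled case labels."""
    observed = best_window(points, labels, radii)[0]
    shuffled = list(labels)
    exceed = 0
    for _ in range(replications):
        rng.shuffle(shuffled)
        if best_window(points, shuffled, radii)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (replications + 1)

# Hypothetical data: 5 cases of one print type close together, 15 controls.
case_xy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
control_xy = [(2.0 + i, 2.0 + j) for i in range(5) for j in range(3)]
points = case_xy + control_xy
labels = [1] * len(case_xy) + [0] * len(control_xy)
radii = [0.2, 1.0, 2.0]

best = best_window(points, labels, radii)
p_value = monte_carlo_p(points, labels, radii, 99, random.Random(2))
print(best, p_value)
```

Because the likelihood ratio is computed without restricting its direction, a window with an unusually low proportion of cases can also be selected, which mirrors the decision in the text to scan for both high and low rates. A space-time version would extend each circular window into a cylinder whose height covers a span of years.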
RESULTS
It is not possible to provide examples of all results for the above-mentioned analytical and exploratory methods in this space. Instead, I focus on the complete set of all TB cases and one case subgroup stratified by race (Black) for comparison.
Results from Kernel Estimation
Figure 1 shows the relative density of TB
cases per square mile for all cases in comparison to the density of the
2000 U.S. Census block group total population.
Figure 2 shows the relative density of Black TB cases per square mile
in comparison to the density of the 2000 U.S. Census block group Black
population. On visual comparison of the images in Figure
1, the density of TB cases appears to be more focused towards the
geographic center of
When comparing the ratio of total
TB cases to total population, one can see that there seems to be an area of
elevated risk at the center of
Results from Nearest Neighbor Distances
The plots of the G(w) function for all TB cases and Black TB cases can
be seen in Figure 4. On visual inspection, there
is relative clustering among all TB cases, as shown by the steep rise in the
function at small distances. This trend
is also evident among the Black TB cases. Plots of the
F(x) function indicate a clustered pattern if the
values of the empirical F(x) function depart from the theoretical distribution function
at larger distances. These plots are shown in Figure 5 for all TB cases and Black TB cases. On
visual inspection of Figures 4 and 5, there is large variation
between the two functions (theoretical and empirical) for all TB cases and for
Black TB cases.

The plots of the K(h) function transformed into the square root function
L(h) for all TB cases and Black TB cases can be
seen in Figure 6.
There is evidence of clustering at all scales for all TB
cases, and the clustering is stronger than for the total population.
Because this function lies well outside the simulation
envelopes, there is some confidence in concluding that the locations of
all TB cases are clustered. Black TB
cases show evidence of clustering up to approximately 12 miles, beyond which the
function falls steeply. Up to this
distance, Black TB cases appear to be more clustered than the Black population, and
there is confidence in this conclusion because the function falls well
outside the simulation envelopes.
Results from the Spatial Filtering Method
The results from this method for all
TB cases can be seen in the map in Figure 7. Here, the blue isolines indicate where the
highest proportion of simulated TB incidence was lower than the observed
incidence. The 3153 grid point
locations have computed TB incidence rates based on more than 100 persons at
risk (3-year aggregated block level population) within the 1-mile search
radius. Actual case points help to
identify where high rates may be less meaningful because they rest on very few cases. The mean incidence rate
calculated for the whole group was 18.72 cases/100,000 population. There is a large area of higher
than average rates running in a North/South direction in the center of
Results from the Spatial Scan Statistical Method
For this method, 9 different print
types were analyzed in order to find a most likely cluster in comparison to
other TB cases. Figures 9 and 10 show two of
these print types and their associated most likely clusters with significance
levels. In Figure 9, the map shows that
the most likely cluster for Print Type 1 had an overall incidence nearly 3
times higher than that among all other areas (significant at p=.01).
In Figure 10, the map shows that the most
likely cluster for Print Type 4 had an overall incidence approximately 9 times
higher than that among all other areas (significant at p<.01). There is strong
evidence for the existence of these clusters, although their exact boundaries
are uncertain because, under the procedure for
this method, many overlapping circular windows contain the
most likely cluster. In using this
method, however, one is able to take the information on most likely clusters
and characterize the case attributes in order to look for significant
patterns. As mentioned earlier under the
objectives section, one of the possible patterns is characterized by the public
bus routes that may or may not be shared between cases. In Figure 11,
both the Print 1 and Print 4 clusters are shown with the cases in each cluster
characterized by their bus route. In
the Print 4 cluster, there were at least 6 individuals who shared the same bus
route (Route 82). In the Print 1
cluster, there were 3 individuals who shared one bus route (Route 25), 2
different individuals who shared another bus route (Route 15), and 2 different
individuals who shared a third bus route (Route 80).
Discussion
This study of cases from a three-year, population-based study of the epidemiology of
tuberculosis in Harris County, Texas, used various spatial analytical methods
to examine the intensity and spatial interactions of TB cases and to determine
whether there were significant spatial patterns among cases that
deviated from a random pattern. Kernel estimation showed that there were specific
areas in which the intensity of TB cases during the three-year period was high,
even relative to the underlying population. This allowed a quick
assessment of potential centers of TB incidence, stratified by various
risk factors such as ethnicity, under the assumption that case density would
follow the underlying population distribution rather than a completely spatially
random distribution. At first
glance, the population density and the TB case density
among Blacks looked very similar.
However, when controlling for the underlying Black population by using a
ratio of kernel densities, it was found that even within areas of
high Black population density there were still areas of high TB case density
among Blacks. Some may question the
necessary assumption of the kernel ratio method used for this study, namely that the
population values were centered at a specific location (centroids); this is an
obvious limitation of the method. However,
block groups generally contain between
600 and 3,000 people, with an optimum size of 1,500 people. While there may still be variation within a
neighborhood area of this size, allocating all individuals to a
single point produces only a small error (Levine, personal communication). Another method that could be used for
comparison would be to use another point process as a
surrogate measure of the underlying population (non-Black TB cases) as
the denominator for the kernel ratio method, in much the same way as a
case-control design in epidemiology.3 The above finding is notable for
hypothesis generation, however, when compared with an epidemiological study performed
by HTIP (prior to this study) using a similar data set.16 That study examined the contributions of certain
risk factors associated with clustering of TB cases (where at least two
individuals had similar genetic print types).
It found that among Blacks, the odds of
clustering were approximately 3 times greater (univariate OR of 3.1) than for Whites.16 Had the evidence from this study, that the
intensity of TB among Black cases appeared to be high, been available prior to
the HTIP study, there would have been more rationale for including ethnicity in a
multivariate model with an underlying hypothesis that Blacks may be at high
risk for clustering. Additional evidence for possible clustering
among Black TB cases was given by the results of the nearest neighbor methods
used in this project, most notably the K(h) function analysis. The use of simulation
envelopes under the assumption of spatial randomness allowed
significant departures of K(h) from its
theoretical value to be assessed. By providing the same
analysis for the underlying population at risk, one is able to directly compare
the functions and see that at distances up to approximately 12 miles, there
was a tendency for Black TB cases to show more of a clustering effect than even
Black population.

Another extrapolation
technique that built upon the kernel methods was the spatial
filtering method advocated by Rushton.42 The added benefit of this technique was the
use of simulation to provide a level of significance for
judging the observed relationships.
Again, there was a definite area where relative TB incidence rates for
all cases and for Black cases appeared quite high. One can feel confident that these areas are
meaningful if viewed in the spirit of exploratory analysis, and they can lead the
researcher to refine areas for further analysis in the future. One of the limiting factors for this
analytical method is that there was no spatial point pattern for use as the
denominator that took into account the total population at risk. The best available alternative was the use of population centroids
at the smallest geographical area available from the U.S. Census Bureau
(blocks). However, the method compares
the observed case rates with a simulated distribution of case rates that
necessarily uses the same variance structure as the observed rates.42 In addition, previous studies have examined
census-based approaches to account for the lack of population and
socio-economic data at the individual level and noted that this approach is
valid and meaningful when individual-level data are not available.28 As
a means of routine analysis within a surveillance group, one can quickly draw
tentative conclusions about the likelihood of case clusters and their
geographic distribution based on sound methodology and follow these conclusions
with the relevant epidemiological analyses.42

The spatial scan statistical method29,30,31
was used to find the most likely clusters based on genetic print
type in comparison to all other TB cases.
The previous analytical methods show evidence of
overall clustering but provide no information on where
potential clusters are located. The
spatial scan statistical method was an attempt to provide location information
for an observed cluster, together with a level of significance based on
a maximum likelihood test. Earlier
analysis16 had identified the use of public transportation as
a significant risk factor for the clustering of TB cases (multivariate OR of
1.4, p-value = .03). It was therefore
assumed that if this spatial method could show, with statistical significance,
the most likely genetic print clusters, one could compare the attribute
information on public transportation for each case found in a cluster to
look for relevant patterns. Four genetic
print types were found to form significant geographic clusters, based on comparing
the cases with the associated print type (aggregated to Census block groups) with
all other TB cases as the control group (also aggregated to Census block
groups). Among these four clusters, the cohort for the Print Type 4 geographic
cluster was the same as that observed in a previous epidemiological analysis by
HTIP.47 Through molecular
characterization, data collected from a standardized questionnaire, and
matched case-control methods, researchers were able to determine that many of
the individuals in this cohort frequented the same social locations (bars), had
similar HIV+ status, had the same ethnic background (White), and had a history
of drug use.47 In addition to
these characteristics, the current study demonstrated that at least
six individuals in this geographic cluster alone (out of a total of 7 in the
total cohort of 38) shared the same mode of public transportation (Bus Route
82). This analysis used a space-time
scan statistic that calculated an overall relative risk taking into
consideration the location and the time of infection (based on the City of
Houston TB control morbidity date) of the specific genetic print type cases
relative to non-print cases. This method
attempts to correct for the possibility that
cases occurring in the same time period may bias the overall results.
Conclusions
One of the main reasons for performing
this analysis was to show that GIS and spatial analysis have definite utility
when used in conjunction with epidemiological analyses in public
health. The HTIP group has published
numerous papers on the risk factors associated with TB clustering and developed
novel ways of isolating the threat of increasing incidence rates.16,17,27,43,47 This project adds the benefit of
another type of analysis that provides the researcher with a meaningful picture
of the disease patterns that can be used in conjunction with output from
epidemiologic studies. However, critics
may be quick to point out that this benefit is also a limitation: because
the current analysis follows on the heels of prior research findings, there is no
guarantee that these methods would have steered the research group toward those
findings. This should not hinder the use of spatial analytical
methods in conjunction with epidemiological studies, especially as
hypothesis-generating activities and exploratory exercises useful for planning
future explanatory analyses. The main
focus of this project was to show that the description of spatial patterns in
disease events can lead to important decisions about where interventions may
need to take place or where dollars should be spent on control efforts. In
addition, some may recognize the limits of the simple univariate point analysis
used here, which prevents one from looking for spatial
relationships that adjust for a number of covariates together, as is done in
traditional epidemiological studies. There
are methods for analyzing multivariate point patterns,14 such as
a bivariate K(h) function, that could be
used in the future to compare spatial point patterns
while accounting for the locations of two or more types of events in a study region.
This type of decision should, however, be made by all interested parties involved in the
research tasks, with a variety of analytical and exploratory data for
background comparison. This project
serves to add to that wealth of information already present in
The importance of GIS in health
research has been documented in a large number of articles during the past
decade. Various peer-reviewed journals
have devoted whole issues to the topic of GIS in health research (Journal of
Public Health Mgmt. Vol. 5 Nos. 2,4) and to spatial analysis (Statistics in
Medicine, Vol. 19 Nos. 17,18), and lengthy review articles have appeared on both
subjects.36,40 There have
even been several books written on the theme of GIS and health and on
exploratory analyses using spatial statistical methods, as well as specialized software and training modules
developed to meet the needs of researchers when stand-alone GIS software is not
enough for more robust statistical analysis.3,18,21,29,32,41 The benefits of combining active health
surveillance efforts with systematic collection and display of geographical
information have also been discussed at length.32,34,42 GIS provides a visual component that may
often be lacking in scientific studies and that can provide useful information when
combined with sound statistical methods. Given the ease of incorporating GIS
into existing database structures in public health departments
and surveillance systems, doing so should become the norm in an effort to promote the
timely communication of disease trends to policy makers and the general public.
Acknowledgements
This study would not have been possible without the assistance of the Houston Tuberculosis Study and Edward Graviss, Ph.D., M.P.H., who agreed to let me use the necessary data for this study.
In addition, I would like to thank the researchers Martin Kulldorff, Ph.D., Ned Levine, Ph.D., and Gerard Rushton, Ph.D., who responded promptly to my questions about using their software.
References
1. American Thoracic Society. 2000. Diagnostic standards and classification of tuberculosis in adults and children. Am J Respir Crit Care Med. 161: 1376-95.
2. Atkinson, P. and Molesworth, A. 2000. Geographical analysis of communicable disease data. In: P. Elliot; J.C. Wakefield; N.G. Best; D.J. Briggs (Eds.) Spatial epidemiology: methods and applications. pp. 253-66.
3. Bailey, T.C. and Gatrell, A.C. 1995. Interactive spatial data analysis.
4. Barnes, P.F.; Yang, Z.; Preston-Martin, S.; Pogoda, J.M.; Jones, B.E.; Otaya, M.; Eisenach, K.D.; Knowles, L.; Harvey, S.; Cave, M.D. 1997. Patterns of tuberculosis transmission in central
5. Bellin, E.Y.; Fletcher, D.D.; Safyer, S.M. 1993. Association of tuberculosis infection with increased time in or admission to the
6. Bernhardsen, T. 1999. Geographic information systems, an introduction, 2nd edition.
7. Beyers, N.; Gie, R.P.; Zietsman, H.L.; Kunneke, M.; Hauman, J.; Tatley, M.; Donald, P.R. 1996. The use of a geographical information system (GIS) to evaluate the distribution of tuberculosis in a high-incidence community.
8. Bifani, P.J.; Mathema, B.; Liu, Z.; Moghazeh, S.L.; Shopsin, B.; Tempalski, B.; Driscoll, J.; Frothingham, R.; Musser, J.M.; Alcabes, P.; Kreiswirth, B.N. 1999. Identification of a W variant outbreak of Mycobacterium tuberculosis via population-based molecular epidemiology. JAMA. 282(24): 2321-2327.
9. Bishai, W.R.; Graham, N.M.H.; Harrington, S.; Pope, D.S.; Hooper, N.; Astemborski, J.; Sheely, L.; Vlahov, D.; Glass, G.E.; Chaisson, R.E. 1998. Molecular and geographic patterns of tuberculosis transmission after 15 years of directly observed therapy. JAMA. 280(19): 1679-1703.
10. Centers for Disease Control and Prevention. 2001. MMWR. 49(Nos. 51&52): 1153-76.
11. Centers for Disease Control and Prevention. 2001. Division of Tuberculosis Elimination. (Online). Available: http://www.cdc.gov/nchstp/tb/surv/surv.htm [2001, June 15].
12. Centers for Disease Control and Prevention. 1999. Tuberculosis elimination revisited: obstacles, opportunities, and a renewed commitment. MMWR. 48(No. RR-9): 1-13.
13. Centers for Disease Control and Prevention. 1990. Tuberculosis among foreign-born persons entering the
14. Cressie, N.A.C. 1991. Statistics for spatial data.
15. Daley, C.L.; Small, P.M.; Schecter, G.F.; Schoolnik, G.K.; McAdam, R.A.; Jacobs, W.R.; Hopewell, P.C. 1992. An outbreak of tuberculosis with accelerated progression among persons infected with the human immunodeficiency virus. N Engl J Med. 326: 231-235.
16. De Bruyn, G.; Adams, G.; Teeter, L.; Soini, H.; Musser, J.M.; Graviss, E.A. 2001. The contribution of ethnicity to Mycobacterium tuberculosis strain clustering. Int J Tuberc Lung Dis. 5(7): 633-41.
17. El Sahly, H.M.; Adams, G.J.; Soini, H.; Teeter, L.; Musser, J.M.; Graviss, E.A. 2001. Epidemiologic differences between United States- and foreign-born tuberculosis patients in
18. Elliot, P.;
19. Environmental Systems Research Institute, Inc. 1999. ArcView Spatial Analyst Vers. 1.1,
20. Environmental Systems Research Institute, Inc. 1998. Atlas GIS 4.0,
21. Gatrell, A. and Löytönen, M. 1998. GIS and health.
22. Geographic Data Technology, Inc. 2000. Dynamap 1000 Street Network File for
23. Goodchild, M.F. 1987. A spatial analytical perspective on geographical information systems. Int J Geographical Information Systems. 1(4): 327-34.
24. Hjalmars, U.; Kulldorff, M.; Gustafsson, G.; Nagarwalla, N. 1996. Childhood leukaemia in
25. Jacquez, G.M. 1998. GIS as an enabling technology. In: A. Gatrell and M. Löytönen (Eds.) GIS and health. pp. 17-28.
26. Kleinschmidt,
27. Klovdahl, A.S.; Graviss, E.A.; Yaganehdoost, A.; Ross, M.W.; Wanger, A.; Adams, G.J.; Musser, J.M. 2001. Networks and tuberculosis: an undetected community outbreak involving public places. Soc Sci and Med. 52: 681-694.
28. Krieger, N. 1992. Overcoming the absence of socioeconomic data in medical records: validation and application of a census-based methodology. American Journal of Public Health. 82(5): 703-710.
29. Kulldorff, M.; Rand, K.; Gherman, G.; Williams, G.; DeFrancesco, D. 1998. SaTScan v2.1: Software for the spatial and space-time scan statistics.
30. Kulldorff, M.; Feuer, E.J.; Miller, B.A.; Freedman, L.S. 1997. Breast cancer clusters in the
31. Kulldorff, M. and Nagarwalla, N. 1995. Spatial disease clusters: detection and inference. Statistics in Medicine. 14: 799-810.
32. Levine, N. 2000. CrimeStat: A Spatial Statistics Program for the Analysis of Crime Incident Locations (Vers. 1.1). Ned Levine & Associates,
33. Mathsoft, Inc. 1999. S-Plus 2000 Professional Release 1.
34. Mayer, J.D. 1983. The role of spatial analysis and geographic data in the detection of disease causation. Soc Sci Med. 17: 1213-21.
35. McKenna, M.T.; McCray, E.; Jones, J.L.; Onorato, I.M.; Castro, K.G. 1998. The fall after the rise: tuberculosis in the
36.
37.
38. Ormerod, L.P.; Charlett, A.; Gilham, C.; Darbyshire, J.H.; Watson, J.M. 1998. Geographical distribution of tuberculosis notifications in national surveys of
39. Pablos-Mendez, A.; Raviglione, M.C.; Laszlo, A. et al. 1998. Global surveillance for antituberculosis-drug resistance, 1994-1997. N Engl J Med. 338: 1641-9.
40. Richards, T.B.; Croner, C.M.; Rushton, G.; Brown, C.K.; Fowler, L. 1999. Geographic information systems and public health: mapping the future. Public Health Reports. 114: 359-373.
41. Rushton, G.; Armstrong, M.P.; Lynch, C.; Rohrer, J. 1997. Improving public health through geographical information systems: an instructional guide to major concepts and their implementation, vers 2.5.
42. Rushton, G. and Lolonis, P. 1996. Exploratory spatial analysis of birth defect rates in an urban population. Statistics in Medicine. 15: 717-726.
43. Soini, H.; Pan, X.; Teeter, L.; Musser, J.M.; Graviss, E.A. 2001. Transmission dynamics and molecular characterization of Mycobacterium tuberculosis isolates with low copy numbers of IS6110. Journal of Clinical Microbiology. 39(1): 217-221.
44.
45. Whalen, C.; Horsburgh, C.R. Jr.; Hom, D.; Lahart, C.; Simberkoff, M.; Ellner, J. 1997. Site of disease and opportunistic infection predict survival in HIV-associated tuberculosis. AIDS. 11: 455-60.
46. Wilkinson, D. and Tanser, F. 1999. GIS/GPS to document increased access to community-based treatment for tuberculosis in
47. Yaganehdoost, A.; Graviss, E.A.; Ross, M.W.; Adams, G.J.; Ramaswamy, S.; Wanger, A.; Frothingham, R.; Soini, H.; Musser, J.M. 1999. Complex transmission dynamics of clonally related virulent Mycobacterium tuberculosis associated with barhopping by predominantly human immunodeficiency virus-positive gay men. The Journal of Infectious Diseases. 180: 1245-51.
Author Information
Matthew L. Stone
Public Health and GIS Researcher
Center for Health Policy Studies
University of Texas-Houston
1200 Herman Pressler, Suite RAS E929
713-500-9395
713-500-9493(fax)
mstone@sph.uth.tmc.edu