Vijay Aswani & Eldredge Bermingham
Measuring the biodiversity of large areas, such as entire countries or regions, is a difficult and costly process. One largely untapped and potentially useful source of biodiversity information is museum collection databases. But how does one go from a large database of collection records to a succinct description of biodiversity? This paper describes a number of ArcView GIS scripts that use a museum database to generate biodiversity information such as species per area counts, area-species curves, endemicity plots, and species contours representing Central American freshwater fish.
A key to understanding the diversity, evolution and biogeography of organisms within any landscape is a spatial database of the distribution of its component species. These data are gathered in the field and the collections are often passed on to museums for cataloguing and storage. Thus museum collection databases should be a key resource for studies in biodiversity. The goal of our research is to gain an understanding of the diversity, evolution and biogeography of freshwater fish in Panama and the rest of Central America. To this end, we embarked upon assembling a GIS database of freshwater fish collection activity, as reflected in museum holdings. From an appropriate summary and exploratory analysis of this database we hoped to gain the answers to questions such as:
This last question is directly related to our studies in the molecular phylogeography of freshwater fish using DNA sequence information (Bermingham & Martin, 1998; Martin & Bermingham, 1998). As a desirable spin-off gained with very little additional effort, we endeavored to collect and summarize information useful in the conservation biology of the area of study. Some of these data are:
The GIS database product that would be formed in the course of study would, we believe, serve as a resource to other ichthyologists and evolutionary biologists (not to mention any other specialist with an interest in the data). It is our intention to provide access to this GIS database through the Internet as map-serving technology makes this possible.
Our study focuses on lower Central America (Photo credit: National Geographic Maps: Physical map of the World). This is described as the land area bordered by Lake Nicaragua in the north and the Cordillera Occidental (NW Colombia) in the south. These two natural features form vicariant barriers that would make the landscape between them a separate biogeographic region for freshwater fish within the neotropics.
The region is of interest because:
Materials & Methods
The data for studying freshwater fish distribution in LCA came from two sources: the NEODAT project (http://biodiversity.uno.edu/~neodat/) and the STRI Freshwater Fish Collection (Bermingham et al., 1997).
The Inter-Institutional Database of Fish Biodiversity in the Neotropics (NEODAT) is an international cooperative effort to make available systematic and geographic data on neotropical freshwater fish specimens deposited in natural history collections in the New World and Europe. Currently 29 institutions in South America, Central America, North America, West Indies, and Europe participate in the project. Over 300,000 records have been captured and are accessible through project databases.
We obtained the records from NEODAT pertaining to Costa Rica, Panama and Colombia. Ideally, one should be able to map these data directly and study the distribution of fish species; for reasons we shall describe below, however, this does not yield acceptable or credible results.
The STRI Freshwater Fish Collection (Bermingham et al., 1997) was established 5 years ago at the Smithsonian Tropical Research Institute (STRI). It holds over 10,000 freshwater fish tissue samples preserved suitably to facilitate the recovery of good quality DNA for genetic studies.
Building the GIS
The steps involved in building this GIS from the above two museum collection databases combined are described in some detail herewith. We believe that our experience may be instructive to others attempting to utilize museum collection data to establish a GIS for any organism.
All changes described below were made to newly created fields so as to retain the original information. We did not in any case examine the actual fish specimens stored in the museums (except those in the STRI collection, which were examined), so any errors in the original identification and/or in the recording of the data could not be corrected.
Step 1: Geo-referencing the records
Although most of the records had location information, in many cases no latitude and longitude data were available. This meant reading the textual location information and looking up the collection sites on a map.
This was a most tedious and time-consuming step. Some helpful procedures were:
Preparing a list of unique collection sites to look up.
Since most collection field trips would result in the collection of multiple individuals and species from the same location, one could reduce the total number of records to be georeferenced by grouping them according to collection site and location (7951 unique locations in 40,000 records).
Using National Gazetteers to spot locations.
The National Imagery & Mapping Agency (NIMA) has an excellent resource to aid in this process: the Digital Interim Geographic Names data set (NIMA, 1997). This collection of two CD-ROMs provides names of geographic features in most countries of the world along with a basic classification and the latitude and longitude of each feature. By converting these national gazetteers into databases, it was possible to locate the features indicated in the textual location fields of the museum collection databases and come closer to the original point of collection.
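The two procedures above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`id`, `country`, `locality`), the sample records and the gazetteer entry are hypothetical, and exact matching on a normalized name is a simplification of the manual map lookup described in the text.

```python
from collections import defaultdict

def normalize(name):
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())

def unique_localities(records):
    """Group record ids by normalized (country, locality) so each site is looked up once."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize(rec["country"]), normalize(rec["locality"]))
        groups[key].append(rec["id"])
    return dict(groups)

def georeference(groups, gazetteer):
    """Return (coords, unresolved): coordinates for matched sites, and sites left for manual lookup."""
    coords, unresolved = {}, []
    for key in groups:
        if key in gazetteer:
            coords[key] = gazetteer[key]
        else:
            unresolved.append(key)
    return coords, unresolved

# Hypothetical records and gazetteer entry, for illustration only
records = [
    {"id": 1, "country": "Panama", "locality": "Rio Example"},
    {"id": 2, "country": "Panama", "locality": "rio  example"},
    {"id": 3, "country": "Costa Rica", "locality": "Quebrada Sample"},
]
gazetteer = {("panama", "rio example"): (9.0, -79.5)}  # (latitude, longitude)

groups = unique_localities(records)
coords, unresolved = georeference(groups, gazetteer)
```

Sites that end up in `unresolved` would still need to be read and located by hand, which is where most of the effort in this step went.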
Some difficulties that one must beware of are:
Step 2: Filling out higher taxonomy information
We found the taxonomic information supplied with the data sets incomplete. We went through the data and filled out higher taxonomic categories (families, orders and classes).
Although we made every effort to identify each species by its most recent classification, our task was made more daunting by the state of knowledge of neotropical freshwater fish. Many genera and families are undergoing quite complete revisions while others have never been formally studied.
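A minimal Python sketch of this step follows, assuming a lookup table keyed by genus. The table entries are illustrative placements in the style of Nelson (1994), not the table we compiled, and the field names are hypothetical. In keeping with the rule stated earlier, the values are written to new fields and the original fields are left untouched.

```python
# Illustrative lookup only: placements follow a Nelson (1994)-style
# classification and are assumptions, not the authors' actual table.
HIGHER_TAXA = {
    "Roeboides": {"family": "Characidae", "order": "Characiformes", "class": "Actinopterygii"},
    "Aequidens": {"family": "Cichlidae", "order": "Perciformes", "class": "Actinopterygii"},
}

def fill_higher_taxonomy(record, lookup=HIGHER_TAXA):
    """Return a copy of the record with new *_rev fields; original fields are kept as-is."""
    out = dict(record)
    parts = record.get("species", "").split()
    genus = parts[0] if parts else ""
    taxa = lookup.get(genus)
    for rank in ("family", "order", "class"):
        # None flags a genus that still needs to be resolved by hand
        out[rank + "_rev"] = taxa[rank] if taxa else None
    return out

filled = fill_higher_taxonomy({"species": "Roeboides sp.", "family": ""})
unknown = fill_higher_taxonomy({"species": "Unknowngenus sp.", "family": ""})
```

Records whose new fields come back as `None` are the ones that required the literature and expert consultation described above.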
Step 3: Revising the taxonomic information
It is not possible to obtain species counts from the raw data set because of the following problems:
We therefore undertook to correctly identify the scientific name as best we could, utilizing several resources (Eschmeyer, 1998; FishBase, 1997; Nelson, 1994; Berra, 1984), a couple of which are available as searchable digital databases on CD-ROM, a monograph on the fishes of Costa Rica (Bussing, 1998) and the expertise of local and visiting scientists (see acknowledgments). Since our lab has collected extensively in Panama, and Costa Rica has been adequately described by Bussing (1998), these two countries posed less of a problem. In addition, we used a couple of monographs on the fishes of Panama (Meek & Hildebrand, 1916; Loftin, 1965).
Our representation of Colombia is far from complete. Our calls were conservative so our estimates of biodiversity there are under-estimates at best.
Step 4: Identifying freshwater fish
Myers (1938, 1949, 1951) developed a widely used classification of freshwater fishes based on their tolerance to salt water. This was later modified by Darlington (1957) into the categories described below. The focus of our study was Neotropical freshwater fish. To restrict our analysis to this group, we had to create fields that identified taxa as primary (little salt tolerance and confined to fresh waters), secondary (usually confined to fresh water, but with a dispersal pattern suggesting that they can travel through salt water) or peripheral (derived from marine ancestors). Since this information was not part of the original museum record, we had to key it in.
To do this, we used several reference works (Eschmeyer, 1998; FishBase, 1997; Nelson, 1994; Bussing, 1998) and local expertise.
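The new category field can be illustrated with a short Python sketch. The two lookup entries reflect the categories given for Roeboides and Aequidens later in this paper; keying by genus, the field names, and the choice of which categories to keep in a query are assumptions made for the example.

```python
PRIMARY, SECONDARY, PERIPHERAL = "primary", "secondary", "peripheral"

# Keyed by genus for brevity (an assumption); only two illustrative entries.
SALT_TOLERANCE = {
    "Roeboides": PRIMARY,
    "Aequidens": SECONDARY,
}

def add_category(records, lookup=SALT_TOLERANCE):
    """Return copies of the records with a new 'fw_category' field (None if not yet keyed in)."""
    tagged = []
    for rec in records:
        out = dict(rec)
        out["fw_category"] = lookup.get(rec["genus"])
        tagged.append(out)
    return tagged

def select(records, keep):
    """Keep only the records whose category is in the caller-supplied set."""
    return [rec for rec in records if rec["fw_category"] in keep]

# Hypothetical records; 'Marinegenus' stands in for a taxon not yet classified
records = [{"genus": "Roeboides"}, {"genus": "Aequidens"}, {"genus": "Marinegenus"}]
tagged = add_category(records)
primary_only = select(tagged, {PRIMARY})
```

Leaving unclassified taxa as `None` rather than guessing a category makes it easy to list what still has to be keyed in.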
Step 5: Collating the final data set
At the end of these steps, it was possible to perform a query on the database and obtain those records that met our criteria. These were:
The records not included in the analysis are described in the adjoining table. The collection points for these records are displayed on the accompanying map of lower Central America shown below.
Obtaining good quality geographic maps usable in ArcView was difficult. Panama has only just begun (in the past 2-3 years) to generate digital maps suitable for use in GIS. We began by using the 1:1 million scale maps of the Digital Chart of the World. We were subsequently able to obtain excellent quality maps of Panama from Geoinfo, S.A., who generously supported our work by donating the use of their data products to the project. We have subsequently also received coverages for use in the project from CyberTech, S.A.
Obtaining river drainages was more difficult. No one had digitized the river drainages using hydrological models. We have been working from a river drainage polygon coverage that I eyeballed using the rivers and contour coverages. I intend to use Spatial Analyst to generate better drainages utilizing the contour, spot elevation and country boundary coverages obtained from the above-mentioned commercial sources. The maps of Costa Rica and Colombia used were those of the Digital Chart of the World (1:1 million).
The GIS software used during the project was PC ARC/INFO and, subsequently, ArcView 2.1 and 3.0 with the Spatial Analyst add-in. These tools were generously donated by Esri. Our databases were kept in FileMaker Pro 4.0. Other specialized software for genetic data analysis and multivariate analysis was also used.
AVENUE scripts developed for the Analyses
In order to use ArcView to summarize the biodiversity of this region reflected in these museum records, a number of AVENUE scripts were written. These scripts should prove useful to anyone interested in similar work and will be made available on the Esri Users' Scripts web site in August, 1999. They are described below:
This script is a modified 'Identify' tool. Since many individual fish were sometimes collected from the same location, using the standard Identify tool would produce a scrolling list that could not be stopped until complete. This tool reports the number of samples at that point and shows the user the first 5. It then asks whether the user would like to see the next 5, giving the option to stop at any time.
This script asks for a polygon coverage (the regions/provinces) and a point theme (the collections points). It then counts the number of unique species within each region. This count is added as a separate field to the polygon theme, allowing one to color code or chart the counts across regions.
This script once again works with a polygon and a point theme. It produces a list of species and the number of regions in which each is found. Looking at those found in a single polygon gives one a count of endemics.
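The logic of the two counting scripts just described can be sketched in Python. This is not the Avenue code; it assumes each collection record already carries the name of the polygon it falls in (the `region` field), and the species and region names are hypothetical.

```python
from collections import defaultdict

def species_by_region(records):
    """Map each region to the set of unique species recorded in it."""
    regions = defaultdict(set)
    for rec in records:
        regions[rec["region"]].add(rec["species"])
    return dict(regions)

def species_counts(records):
    """Number of unique species per region (the value added to the polygon theme)."""
    return {region: len(spp) for region, spp in species_by_region(records).items()}

def regions_per_species(records):
    """Number of regions in which each species is found."""
    found_in = defaultdict(set)
    for rec in records:
        found_in[rec["species"]].add(rec["region"])
    return {sp: len(regs) for sp, regs in found_in.items()}

def endemics(records):
    """Species found in exactly one region, sorted by name."""
    return sorted(sp for sp, n in regions_per_species(records).items() if n == 1)

# Hypothetical records: repeated collections of a species count once per region
records = [
    {"species": "sp. A", "region": "R1"},
    {"species": "sp. A", "region": "R1"},
    {"species": "sp. A", "region": "R2"},
    {"species": "sp. B", "region": "R1"},
    {"species": "sp. C", "region": "R2"},
]
counts = species_counts(records)
single_region = endemics(records)
```

Because a single misplaced record changes the region count of a species, lists of apparent endemics produced this way need to be checked against the underlying records.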
Measures the number of species recorded with increasing sampling effort. Stores the results in a table that can then be plotted in ArcView or any other program to produce species:effort curves (see example in this paper).
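A species:effort table of this kind can be sketched in Python as a running count of unique species. How the script itself measures effort is not detailed here, so the sketch assumes one unit of effort per collection event, taken in the order supplied; the species labels are hypothetical.

```python
def species_effort_curve(events):
    """Return [(effort, cumulative unique species)] for an ordered list of collection events.

    Each event is an iterable of the species names recorded in it.
    """
    seen = set()
    curve = []
    for effort, species in enumerate(events, start=1):
        seen.update(species)
        curve.append((effort, len(seen)))
    return curve

# Hypothetical collection events, in sampling order
events = [{"sp. A", "sp. B"}, {"sp. B"}, {"sp. C"}, {"sp. A", "sp. C"}]
curve = species_effort_curve(events)
```

A curve whose last rows stop increasing suggests that further sampling adds few new species; the result depends on event order, which is why randomizing the order is a common refinement.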
Produces shapefiles that represent species ranges.
Works with a polygon and point theme. Asks the user to select a field from the polygon theme and automatically adds a field to the point theme table filling it in with the polygon's field value based on location. Useful to assign biogeographic region category to species records.
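The point-to-polygon assignment described above can be illustrated in Python with a standard ray-casting point-in-polygon test. This is a sketch of the general technique, not the Avenue script: polygons are assumed to be simple rings without holes, points exactly on a boundary are not treated specially, and the data structures and names are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: count how many ring edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def tag_points(points, polygons, field):
    """Copy polygon[field] onto each point that falls inside the polygon (None if no match)."""
    for pt in points:
        pt[field] = None
        for poly in polygons:
            if point_in_polygon(pt["x"], pt["y"], poly["ring"]):
                pt[field] = poly[field]
                break
    return points

# Two hypothetical adjacent square regions and three collection points
polygons = [
    {"region": "R1", "ring": [(0, 0), (2, 0), (2, 2), (0, 2)]},
    {"region": "R2", "ring": [(2, 0), (4, 0), (4, 2), (2, 2)]},
]
points = [{"x": 1, "y": 1}, {"x": 3, "y": 1}, {"x": 5, "y": 5}]
tagged = tag_points(points, polygons, "region")
```

Points left with `None` fall outside every polygon and are worth inspecting, since they often indicate a georeferencing error.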
Results and Conclusions
The adjacent map was prepared by drawing a square grid over the landscape and counting the number of species per grid cell. The highest species diversity by this analysis was observed in the Choco and Tuira basins, a finding that has also been reported by the World Wildlife Fund in their studies (Dinerstein et al., 1997). Maps of each individual species were produced (not shown).
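The grid count can be sketched in Python by binning each collection point into a square cell and counting unique species per cell. The cell size, grid origin and coordinates below are assumptions chosen for illustration; the grid actually used for the map is not specified here.

```python
import math
from collections import defaultdict

def species_per_cell(records, cell_size, origin=(0.0, 0.0)):
    """Return {(col, row): number of unique species} for a square grid."""
    ox, oy = origin
    cells = defaultdict(set)
    for rec in records:
        # floor() keeps cells consistent for negative coordinates as well
        col = math.floor((rec["x"] - ox) / cell_size)
        row = math.floor((rec["y"] - oy) / cell_size)
        cells[(col, row)].add(rec["species"])
    return {cell: len(spp) for cell, spp in cells.items()}

# Hypothetical collection points
records = [
    {"x": 0.2, "y": 0.3, "species": "sp. A"},
    {"x": 0.7, "y": 0.9, "species": "sp. B"},
    {"x": 0.5, "y": 0.5, "species": "sp. A"},
    {"x": 1.5, "y": 0.1, "species": "sp. A"},
    {"x": -0.5, "y": 0.5, "species": "sp. C"},
]
grid = species_per_cell(records, cell_size=1.0)
```

Counts from such a grid are sensitive to cell size and to uneven sampling, so cells with no records should be read as unsampled rather than empty.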
Summaries of counts for each country in the region of study were produced and are shown above. Simultaneously, examination of genetic analysis seemed to indicate that the drainages of the LCA landscape could be coalesced into geographically contiguous regions based on similarity at a population level within species.
Both the Venn diagram and the table show a fall in diversity from south to north. The number of species shared between the various named regions also reflects a turnover in species composition. This can be further seen by comparing the pie charts for each region showing the composition at the order level.
Roeboides is a primary freshwater fish found in many parts of lower Central America. The 'tree' diagram next to the picture of the fish represents the relationship of fish from the different regions within LCA (colored blocks). The tree is produced by measuring similarity distances between a stretch of mitochondrial DNA 842 bases in length representing the genes ATP synthase subunits 8 & 6. Note that groups of branches occur within a single biogeographic region and that distances between these groups are much larger in comparison.
The situation for Aequidens is similar. However, since Aequidens is a secondary freshwater fish, the branches in the Pacific slope that face the Bay of Panama are shorter and not isolated from each other. This suggests that the ability of secondary fishes to cross short distances through salt water has allowed populations in these regions to share genetic information.
Both of these examples do however confirm our work with species counts and identification that there are identifiable ichthyological regions within lower Central America with their own distinct fauna and endemic species.
An interesting independent way in which we arrived at the same biogeographic region classification was by Detrended Correspondence Analysis. This is an application of multivariate analysis to a matrix of species by area based on the dataset. The results shown here indicate that 33% of the total variance could be explained by the first 3 axes, the most important two of which are shown below. The clustering of the individuals from the various regions indicates that these regions have greater species similarity within them than between them.
For further summary of the data, we decided to use these biogeographic regions instead of individual river drainages.
Species counts at various levels of taxonomy were obtained by running an ArcView script, Species/Region Counter, developed for the purpose. This script is available from the author by correspondence.
Endemic species were defined as species found in only one biogeographic region within the LCA landscape. Caution was needed in interpreting these results, as single misidentifications and erroneously located collection points could give spurious results. By an iterative process of revision and running the ArcView script, a list of endemics for each region was produced.
Area:species curves are an important way of noting how complete the sampling was and, consequently, whether actual species counts could be higher than estimated. Preliminary species curves were generated using the ArcView script Species:Effort Curve. It is hoped to modify this script to include statistical testing and curve-fitting to randomized data. The curves show that while Panama and Costa Rica had been adequately sampled (flattening of the curves), the curve for NW Colombia had not yet reached a plateau. This indicates that the actual diversity measures for Colombia are under-estimated by the numbers in this study, due to inadequate sampling.
We would like to thank the following scientists for their time and expertise in helping us with the identification, taxonomy and/or range distribution of the fishes whose museum records were used in this study:
Heidi Banford (most families), Daniel Fromm (rivulines), Jon Armbruster (Loricariidae), Guy Reeves (Brycon, Hemibrycon and related genera), Sven Kullander (Cichlidae), Cesar Roman (fishes of Colombia), Ana Isabel Perdices (peripheral fish families and genera).
Geoinfo, S.A. graciously allowed us the use of their commercial data product PANPAIS - digital ArcView shape files of the country at the scale of 1:250,000 projected to UTM Zone 17.
CyberTech, S.A. supplied us with high quality 1:250,000 digital maps of Panama for the exclusive use in our project.
Esri Latin America donated a copy of PC ARC/INFO 3.4.2 and ArcView 3.0 for use in the study.
The Esri Conservation Program provided one of us (V.A.) with free training at the Esri Learning Center.
Bermingham, E. and A. P. Martin. 1998. Comparative mtDNA phylogeography of neotropical freshwater fishes: Testing shared history to infer the evolutionary landscape of lower Central America. Molecular Ecology 7: 499-517.
Martin, A. and E. Bermingham. 1998. Systematics and evolution of lower Central American cichlids inferred from analysis of cytochrome b gene sequences. Molecular Phylogenetics and Evolution 9(2): 192-203.
Stehli, F. G. and Webb, S. D. 1985. The Great American Biotic Interchange. Plenum Press, New York.
Bussing, W. A. 1985. Patterns of distribution of the Central American ichthyofauna. Pp. 453-473 in F. G. Stehli and S. D. Webb (eds.), The Great American Biotic Interchange. Plenum Press, New York.
Bermingham, E., Banford, H., Martin, A. P. and Aswani, V. 1997. Smithsonian Tropical Research Institute Neotropical Fish Collections. Pp. 37-38 in L. Malabarba (ed.), Neotropical Fish Collections. Museu de Ciencias e Tecnologia, PUCRS, Porto Alegre, Brazil.
National Imagery and Mapping Agency. 1997. Digital Interim Geographic Names Data, Series: GAZGN, Item: DIGNAMES, Edition: 002, NIMA Ref. No. GAZGNDIGNAMES, NSN: 7644014174141.
Eschmeyer, W. N. (ed.) 1998. Catalog of Fishes. California Academy of Sciences, San Francisco.
FishBase. 1997. FishBase 97 CD-ROM. ICLARM, Manila.
Nelson, J. S. 1994. Fishes of the World. 3rd Edition. John Wiley & Sons, Inc., New York.
Berra, T. M. 1984. An Atlas of Distribution of the Freshwater Fish Families of the World. University of Nebraska Press, London.
Bussing, W. A. 1998. Freshwater Fishes of Costa Rica. International Journal of Tropical Biology and Conservation 46 (Suppl. 2). ISSN 0034-7744.
Meek, S. E. and Hildebrand, S. F. 1916. The fishes of the freshwaters of Panama. Field Museum of Natural History, Zoological Series 10: 217-373.
Loftin, H. G. 1965. The Geographical Distribution of Freshwater Fishes of Panama. Ph.D. Dissertation, Zoology, Florida State University.
Myers, G. S. 1938. Fresh-water fishes and West Indian zoogeography. Ann. Rep. Smithsonian Inst. for 1937, pp. 339-364.
Myers, G. S. 1949. Salt-tolerance of fresh-water fish groups in relation to zoogeographical problems. Bijdr. Dierk. 28: 315-322.
Myers, G. S. 1951. Freshwater fishes and East Indian zoogeography. Stanford Ichthyol. Bull. 4(1): 11-21.
Darlington, P. J., Jr. 1957. Zoogeography. Wiley, New York.
Dinerstein, E., Olson, D., Graham, D. J., Webster, A., Primm, S., Bookbinder, M. and Ledec, G. A Conservation Assessment of the Terrestrial Ecoregions of Latin America and the Caribbean. World Wildlife Fund and The World Bank, Washington, D.C. Pp. 12-20.
Dr. Vijay Aswani Researcher
Dr. Eldredge Bermingham Staff Scientist
Smithsonian Tropical Research Institute, Unit 0948, APO AA 34002-0948, U.S.A.
Fax: 011-507-228-0516 Tel: 011-507-272-2564, 5840, 2435