The Harvard Geospatial Library: Campus Wide Access to GIS Data

Introduction

The Harvard Geospatial Library is an effort underway by the Harvard University Library System to facilitate the discovery and use of geospatial data. This effort is a part of Harvard’s Library Digital Initiative (LDI) to promote the use, preservation and dissemination of digital materials.

GIS at Harvard is highly decentralized, as is the University itself. Faculties, libraries, institutes, and internal programs all use GIS, which leads to duplication of both data and effort. Currently, only two schools offer formal classroom instruction in GIS, but plans are underway in many departments, such as archeology and government, to add GIS to the curriculum.

The Harvard University Libraries have long been adding digital geospatial data to their holdings. As the sole surviving piece of the Geography Department, the Harvard Map Collection is the primary recipient of these resources. The Map Collection’s holdings total hundreds of gigabytes of data, most of it on CD-ROM in the stacks. To use these resources, students have to be physically present in the Map Collection, as there is a strict non-circulation policy for all holdings. For students, who often study at all hours of the night, this is a significant disadvantage. Other campus venues offer GIS software, but not the data-rich holdings of the Map Collection.

The HGL is a system to make GIS resources available to all members of the Harvard Community via the Internet. It is also a system to help students find the data they are interested in and, we hope, to show many students that GIS is what they need, even if they do not yet know it. By expanding the availability of GIS and geospatial data, the HGL will expand awareness and knowledge of this increasingly important technology.

Technical Overview

Based on the Esri product suite, the HGL is a complex arrangement of three integrated modules: the catalog, the data repository, and the web user interface.

All GIS data CD-ROMs acquired by the Library are cataloged in the Harvard On-Line Library Information System (HOLLIS), which is one way students can find out where data is located at Harvard. HOLLIS cataloging uses the MARC standard, which has many useful fields but is not designed for in-depth documentation of geospatial information. For that reason, the catalog portion of the HGL is based on the FGDC content standard for geospatial metadata. Additional fields were added to hold information needed to run the application, such as rendering information and Harvard-specific access restrictions. This detailed metadata is stored as XML in an Oracle database, where it is indexed and searchable.

The second piece of the system is the data repository, which stores spatial data in SDE running on top of Oracle.

The final piece of the system, the user interface, was the most complex to construct. The goal of the project was to provide enough functionality for the user to find data of interest, look at detailed metadata, and view the data in order to make an informed decision about whether the data is of use. We also needed to let users choose a number of layers, display them together in a data exploration environment, and complete simple cartographic tasks such as creating thematic maps. The whole system had to be scalable, as the number of layers available for users to choose from will soon exceed 1,000. To meet these requirements, the HGL, powered by ArcIMS, implements both the ArcIMS HTML and Java Viewers.
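To make the catalog record described above more concrete, the following minimal sketch reads one FGDC-style XML record and pulls out the fields a search would need. The short tags (idinfo, citeinfo, spdom, bounding) come from the FGDC content standard; the hgl_ext block, the sample values, and the summarize function are hypothetical stand-ins for the application-specific fields and are not taken from the HGL itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical record. The FGDC short tags are standard; <hgl_ext> is an
# invented placeholder for the Harvard-specific extension fields.
RECORD = """\
<metadata>
  <idinfo>
    <citation>
      <citeinfo>
        <origin>Example Agency</origin>
        <title>Example Hydrography Layer</title>
      </citeinfo>
    </citation>
    <spdom>
      <bounding>
        <westbc>-73.5</westbc>
        <eastbc>-69.9</eastbc>
        <northbc>42.9</northbc>
        <southbc>41.2</southbc>
      </bounding>
    </spdom>
  </idinfo>
  <hgl_ext>
    <access>restricted</access>
  </hgl_ext>
</metadata>
"""


def summarize(xml_text):
    """Return the title, originator, bounding box, and access level of a record."""
    root = ET.fromstring(xml_text)
    bounding = root.find("idinfo/spdom/bounding")
    # Bounding box order used throughout: (west, south, east, north).
    bbox = tuple(
        float(bounding.findtext(tag))
        for tag in ("westbc", "southbc", "eastbc", "northbc")
    )
    return {
        "title": root.findtext("idinfo/citation/citeinfo/title"),
        "origin": root.findtext("idinfo/citation/citeinfo/origin"),
        "bbox": bbox,
        # Records without the extension block are treated as public here;
        # that default is an assumption of this sketch.
        "access": root.findtext("hgl_ext/access", default="public"),
    }


if __name__ == "__main__":
    print(summarize(RECORD))
```

Keeping the extension fields in their own block, as sketched here, would leave the standard FGDC portion of the record untouched; whether the HGL stores them that way is not stated.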

The first step in using the HGL system is the search. Users can search in the way they are used to searching for library holdings: by title, author, or keyword. Additional domain-specific terms, such as scale, are also available. The user is presented with a map on which to select the area of interest for the search. The map search tool is custom designed, and the search runs against the metadata catalog. Once a result set has been returned, the user is given a list of publications, layers, and themes that matched the search criteria. Returning publication-level information allows the user to trace back to a specific publication to see if it contains other useful data. Each theme, or attribute, for a layer is listed, and each attribute has two hyperlinks. The first links to a dynamically generated page displaying the full metadata record for the layer. The second sends the user to an ArcIMS HTML viewer that displays a thematic map of the selected attribute. This page is created using ArcXML (AXL) rendering code fragments that are stored in the repository along with the geographic data. The AXL code is created when the data is first loaded into the system. When the user clicks the link, a request is sent to a servlet, which then starts a dynamic map service using the stored AXL code. Session management functions track the name of the new service, so the user is assured of getting the correct map.
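The combined keyword and map search described above can be illustrated with a short sketch. It assumes each catalog entry carries a title, a keyword list, and a bounding box taken from its metadata; the Layer class, the search function, and the sample catalog are all hypothetical, and the real HGL search runs against indexed XML in Oracle rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    title: str
    keywords: tuple
    bbox: tuple  # (west, south, east, north) in decimal degrees


def boxes_overlap(a, b):
    """True if two (west, south, east, north) boxes share any area or edge.

    Boxes that cross the 180-degree meridian are not handled in this sketch.
    """
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(catalog, keyword=None, area=None):
    """Return layers matching the keyword (if given) and area (if given)."""
    results = []
    for layer in catalog:
        if keyword is not None:
            needle = keyword.lower()
            haystack = [layer.title.lower()] + [k.lower() for k in layer.keywords]
            if not any(needle in text for text in haystack):
                continue
        if area is not None and not boxes_overlap(layer.bbox, area):
            continue
        results.append(layer)
    return results


# Invented sample entries for illustration only.
CATALOG = [
    Layer("Massachusetts Town Boundaries", ("boundaries", "towns"),
          (-73.5, 41.2, -69.9, 42.9)),
    Layer("California Rivers", ("hydrography",),
          (-124.4, 32.5, -114.1, 42.0)),
    Layer("World Hypsography", ("elevation", "hypsography"),
          (-180.0, -90.0, 180.0, 90.0)),
]

if __name__ == "__main__":
    boston_area = (-71.2, 42.2, -70.9, 42.4)
    for hit in search(CATALOG, area=boston_area):
        print(hit.title)
```

A rectangle drawn around Boston matches both the Massachusetts layer and the worldwide layer, which is why the result list benefits from showing publication and theme information alongside each hit.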

Once a user has found data layers of interest, the layers can be added to a “shopping cart” and either downloaded as zipped shapefiles or added to the HGL exploration environment. Session management functions remember the coordinates of the search area, and data files are clipped to the area of interest, so that users do not try to download the entire hypsography layer of the Digital Chart of the World. Because of the current browser dependencies of the Java Viewer, only users of Internet Explorer are given the option to proceed to the exploration page.
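The interaction between the shopping cart, the remembered search area, and clipping can be sketched as follows. This is a simplified illustration that clips only the rectangular extent of each layer; the Session class, clip_extent, and download_plan are hypothetical names, and clipping the actual feature geometry would be left to the GIS software.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Per-user state: the search rectangle and the layers placed in the cart."""
    search_area: tuple  # (west, south, east, north)
    cart: list = field(default_factory=list)  # (layer name, layer bbox) pairs

    def add(self, name, bbox):
        self.cart.append((name, bbox))


def clip_extent(layer_bbox, area):
    """Intersect a layer's extent with the search area; None if they are disjoint."""
    west = max(layer_bbox[0], area[0])
    south = max(layer_bbox[1], area[1])
    east = min(layer_bbox[2], area[2])
    north = min(layer_bbox[3], area[3])
    if west > east or south > north:
        return None
    return (west, south, east, north)


def download_plan(session):
    """Map each cart layer to the extent that would actually be extracted."""
    plan = {}
    for name, bbox in session.cart:
        clipped = clip_extent(bbox, session.search_area)
        if clipped is not None:  # layers outside the search area are skipped
            plan[name] = clipped
    return plan


if __name__ == "__main__":
    session = Session(search_area=(-71.2, 42.2, -70.9, 42.4))
    session.add("World Hypsography", (-180.0, -90.0, 180.0, 90.0))
    session.add("California Rivers", (-124.4, 32.5, -114.1, 42.0))
    print(download_plan(session))
```

In this example the worldwide layer is reduced to the small search rectangle, and the layer that lies entirely outside the search area is dropped from the download.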

The exploration environment is powered by the ArcIMS Java Custom Viewer, and is accessed in the same way as the HTML viewer described above. A map service configuration file is written, and a map service is started based on the layers in the user’s shopping cart. A secondary AXL file is written as required by the Java Viewer. Once in the exploration environment, users can create their own thematic maps and perform more complicated analysis.
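As a rough illustration of how a map service configuration file might be assembled from the cart, the sketch below combines stored rendering fragments into a single ArcXML document. The element names (ARCXML, CONFIG, MAP, PROPERTIES, ENVELOPE, WORKSPACES, SDEWORKSPACE, LAYER, DATASET, SIMPLERENDERER) follow ArcXML conventions as best understood and should be checked against the ArcXML reference; the function name, workspace name, table names, and cart entries are hypothetical, and connection attributes for the workspace are deliberately left out.

```python
import xml.etree.ElementTree as ET


def build_map_config(layers, envelope, workspace="hgl_sde"):
    """Build an ArcXML-style map configuration string from cart layers.

    layers:   list of dicts with 'title', 'table', 'geometry', 'renderer_axl'
    envelope: (minx, miny, maxx, maxy) initial extent, e.g. the search area
    """
    arcxml = ET.Element("ARCXML", version="1.1")
    config = ET.SubElement(arcxml, "CONFIG")
    map_el = ET.SubElement(config, "MAP")

    props = ET.SubElement(map_el, "PROPERTIES")
    minx, miny, maxx, maxy = envelope
    ET.SubElement(props, "ENVELOPE", minx=str(minx), miny=str(miny),
                  maxx=str(maxx), maxy=str(maxy))

    # Server, instance, and login attributes are omitted from this sketch.
    workspaces = ET.SubElement(map_el, "WORKSPACES")
    ET.SubElement(workspaces, "SDEWORKSPACE", name=workspace)

    for index, layer in enumerate(layers):
        layer_el = ET.SubElement(map_el, "LAYER", type="featureclass",
                                 name=layer["title"], id=str(index),
                                 visible="true")
        ET.SubElement(layer_el, "DATASET", name=layer["table"],
                      type=layer["geometry"], workspace=workspace)
        # Reuse the rendering fragment stored when the data was loaded.
        layer_el.append(ET.fromstring(layer["renderer_axl"]))

    return ET.tostring(arcxml, encoding="unicode")


# Invented cart contents for illustration only.
CART = [
    {"title": "Town Boundaries", "table": "HGL.TOWNS", "geometry": "polygon",
     "renderer_axl": '<SIMPLERENDERER><SIMPLEPOLYGONSYMBOL '
                     'fillcolor="200,200,255" /></SIMPLERENDERER>'},
    {"title": "Rivers", "table": "HGL.RIVERS", "geometry": "line",
     "renderer_axl": '<SIMPLERENDERER><SIMPLELINESYMBOL '
                     'color="0,0,255" /></SIMPLERENDERER>'},
]

if __name__ == "__main__":
    print(build_map_config(CART, (-71.2, 42.2, -70.9, 42.4)))
```

Storing the rendering fragment with each layer, as the paper describes, means a configuration for any combination of layers can be produced by simple assembly rather than by regenerating symbology on each request.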

Challenges

Making GIS functionality and data so easily available brings new challenges to the University. Proper training is a must, but since HGL is a library application, the GIS expertise will not be located in the same place as the computers where the tool will most commonly be used. Reference librarians have to be made aware of the resource, and also of the types of applications that would benefit from GIS analysis. The first step is an awareness campaign, and a number of “GIS for Librarians” training sessions. Once the staff knows what the HGL is, they will be able to identify those librarians who would benefit from more training in how to use the exploration environment. The goal is not to create a library staffed by GIS experts, only to create an awareness of the technology in general, and the HGL system in particular. As demand for GIS increases, it will be the responsibility of the various departments at Harvard, working with the libraries, to develop the support mechanisms needed. A variety of solutions are currently being discussed. In the interim, points of contact for GIS support beyond the basics have been chosen. An extensive help system is under development, and Harvard currently has a subscription to the Esri Virtual Campus for technical training.

By making this research tool available, we hope to meet two goals. The first is to spread the word about GIS around the campus, so more students and faculty can take advantage of the power of the technology. The second is to get GIS out of the libraries, in a way. By making the data available over the Internet, the Harvard Community will be able to pursue GIS activities on their own, and won’t have to rely on the support of the library staff.