Roger Goldsmith

Some Applications and Challenges in Extending GIS to Oceanographic Research

Oceanographers have for years developed their own specialized GIS software for the analysis and display of their data. As the GIS field has matured and new web-oriented products are being developed, this seemed an appropriate time to review what a powerful desktop product like ArcView could contribute to oceanographic research. Presented here are some prototype applications developed to make researchers aware of the capabilities of current GIS technology, along with some of the challenges that need to be addressed before the two fields can be integrated more effectively.


For those of you old enough to remember the Muppet television show, there was a segment called the Muppet News. I remember one skit that went something like `This item just in. We have received word that the Atlantic Ocean has been kidnapped and is being held in an apartment in New Jersey'. This highlights one of the problems confronted when applying traditional GIS concepts to deep-water oceanography. Contrary to the joke, the Atlantic Ocean is not a single, easily defined or delimited feature. Just what is the Atlantic Ocean? What are its borders, on both global and very local scales? Where, for instance, does the Atlantic Ocean become the Arctic Ocean or the Gulf of Mexico? On a more regional scale, what is the Gulf of Santa Catalina? Where is the edge of a wetland? The problem is further complicated when we talk about Antarctic Bottom Water, which may be found in the North Atlantic, or Labrador Sea Water found in the tropics.

Everyone is probably familiar with some of the more traditional oceanographic applications in the commercial arena, such as oil spill and contamination monitoring and hazards to navigation. It is also true that GPS technology has been a tremendous aid in determining accurate mid-ocean positions, perhaps even more so than on land, where there are often physical or cultural landmarks. And GIS has certainly been put to use in the traditional cataloging of imagery from various satellite platforms. But it is not much different to apply those imaging concepts to synthetic sea floor bathymetry, and this is one of the points I hope to convey. The problems are similar; they are often just phrased in different terminology. In land-based applications, for instance, what constitutes the Rocky Mountains? In fact, what makes a mountain a mountain? Is it elevation? Is it the difference in elevation from some reference level? Or is it a mountain just because somebody named it a mountain?

How can we get an overview of the metadata? Research is becoming more interdisciplinary in scope. And while oceanography, precisely because of its lack of landmarks, has developed using geographic coordinates, it has done so employing a bewildering array of non-standard conventions. (I am always converting units and position formats, or dealing with longitude positive to the west and depth positive downward.) There is also the three-dimensional aspect (depth), resulting in many points at apparently the same geographic location. And like many lab experiments, there are often replicates, time-varying samples that are important in determining trends. We are finding that decadal time scales are important to understanding climate change, so we need to keep track of those old data sets. And finally, how does one get over the hurdle that `GIS is land-based utility software that does not apply to us', the classic `not invented here' syndrome?
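To give a concrete flavor of the conversion chore, here is a minimal sketch of the kind of normalization routine involved, bringing positions to longitude positive east and depth positive down in meters. The function and the source conventions it handles are hypothetical examples, not any community standard.

    # Normalize positions to one convention: longitude positive east in
    # [-180, 180), depth positive down in meters. Hypothetical example.
    def normalize_position(lon, lat, depth,
                           lon_positive_west=False, depth_in_fathoms=False):
        """Return (lon, lat, depth_m) in a common convention."""
        if lon_positive_west:                  # e.g. 70.5 meaning 70.5 deg W
            lon = -lon
        lon = (lon + 180.0) % 360.0 - 180.0    # wrap into [-180, 180)
        if depth_in_fathoms:
            depth *= 1.8288                    # 1 fathom = 6 ft = 1.8288 m
        return lon, lat, depth

    # A record stored with positive-west longitude and depth in fathoms:
    print(normalize_position(70.5, 39.9, 2500.0,
                             lon_positive_west=True, depth_in_fathoms=True))
    # -> (-70.5, 39.9, 4572.0)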

These are some of the problems researchers in the physical sciences have traditionally confronted. They have developed contouring, three-dimensional and satellite imagery analysis tools. Whether they have known it or not, they have been using GIS techniques for decades. But the formal field of GIS has matured to the point where I feel it has a lot to offer, not only in the analysis tools themselves but in the whole performance-to-price arena. The benefit to the oceanographic community would be a standardization of procedures available in COTS software. Conversely, by addressing some of the needs of oceanographic (and other earth science) researchers, I think GIS could make use of some of the techniques those fields have developed over the years. This would make GIS that much more robust and widespread. We are seeing some of this capability emerge in Esri's various extensions (the Geostatistical, Spatial, 3D and Image Analyst extensions). Value-added resellers such as ERDAS have also developed specialized products, but I am addressing the basic processing capabilities.

I would like to proceed by taking a brief look at some projects I've initiated to get the local oceanographic community to take a closer look at what the GIS field can offer in helping them collect, organize and analyze their geographically oriented data. I wanted to extend GIS technology into areas where there are no roads, highways or other traditionally registered features [Figure 1]. By its very nature, oceanographic research involves geographic data spread over the entire globe. One of the tasks is integrating relatively sparse measurements collected during large, basin-wide experiments with more detailed regional studies such as those of continental shelves, estuaries or even local wetlands. The ability to synthesize data at variable scales is one of the strengths of a GIS.

Most GIS veterans will agree that data collection, and its attendant organization, is the major part of any GIS effort. Oceanography is no different. We need a way to inventory who did what, and when. We would like to view the historical record not as a group of individual projects but as a collection of data. Well, that is the ideal, but we needed to start with something a little more manageable. I worked with a group of physical oceanographers to define a limited subset of observations, namely the moored current meter collection. These measurements go back about 40 years. They involve mooring an instrument in the ocean and continuously measuring the direction and speed of the water mass as well as additional properties like temperature or conductivity [Figure 2]. The purpose of this GIS project was twofold: 1) to collect the data into a common repository with common formats, and 2) to allow researchers to view and query data availability, then download data as desired. The basic data unit here is the time series [Figure 3]. The mere mention of time-dependent data sets off alarm bells in the minds of most GIS users, and I am not going to offer any solutions here.

Figure 4 shows the initial cut at the ArcView project. Superficially it is just a geo-data inventory problem. If we want to know what is available in a specific area, ArcView offers a number of simple methods. Likewise, if we want to know what is available for a specific time, such as the summer of 1986, we can invoke a simple query [Figure 5]. But if we want to show those data sets where we have summer observations, itself not a very specific term but let's say during June, July or August, the queries get quite a bit more complicated with the time series data structure. Here then is one of the first problems we encountered: how do we keep the user interaction simple enough for infrequent users without having to expend a lot of time developing special fill-out forms, query pages or the like? There simply are not enough resources for this type of effort, especially for smaller, independent projects that are not part of a larger national or international effort.
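To see why the queries get complicated, consider that each mooring record spans a start and end date, so the `summer observations' question is an interval-intersection test against June through August of every year in that span, not a simple attribute comparison. A sketch of the logic, with hypothetical field names:

    from datetime import date

    def overlaps_summer(start, end):
        """True if [start, end] intersects any June 1 - August 31 window."""
        for year in range(start.year, end.year + 1):
            s, e = date(year, 6, 1), date(year, 8, 31)
            if start <= e and end >= s:        # standard interval-overlap test
                return True
        return False

    records = [
        {"id": "M101", "start": date(1986, 5, 10), "end": date(1986, 7, 2)},
        {"id": "M102", "start": date(1985, 10, 1), "end": date(1986, 4, 30)},
    ]
    print([r["id"] for r in records if overlaps_summer(r["start"], r["end"])])
    # -> ['M101']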

One solution is to go back to the design of the metadata and include any fields you think might be useful. That is easy to say, but not always easy to implement. In this case we might like the ability to treat each measurement individually, with its own location and time properties. Then we could retrieve data for a specific time, in order to produce a spatial snapshot, or as a time series. Currently, tools that can handle this number of data points and facilitate these dissimilar types of analysis are not readily available other than through something like a database connection, an entirely new technology the researcher would be required to learn. Keep in mind this is a fairly restricted data set. Think of something like the continuous hourly temperature records for all the weather stations in the United States and how the Weather Service might use a GIS application.
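For illustration, if each measurement did carry its own position, depth and time (a hypothetical structure, not our actual tables), the two retrieval modes become two slices of the same collection:

    from datetime import datetime

    # (lon, lat, depth_m, time, temperature_C) -- illustrative values
    measurements = [
        (-70.0, 36.0,  500.0, datetime(1986, 7, 1, 0), 12.1),
        (-70.0, 36.0,  500.0, datetime(1986, 7, 1, 1), 12.3),
        (-65.0, 38.0, 1000.0, datetime(1986, 7, 1, 0),  6.8),
    ]

    # Spatial snapshot: every instrument's value at one instant.
    t0 = datetime(1986, 7, 1, 0)
    snapshot = [(lon, lat, z, v) for lon, lat, z, t, v in measurements if t == t0]

    # Time series: every value at one instrument location and depth.
    series = [(t, v) for lon, lat, z, t, v in measurements
              if (lon, lat, z) == (-70.0, 36.0, 500.0)]
    print(snapshot, series)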

A second problem comes to light when we look at the results of a query for a selected point [Figure 6]. We need to be able to distinguish the times at which each data unit was measured; we also see several data sets apparently at the same location. A closer inspection reveals they are at separate depths. Again, we have to design the metadata to at least allow this distinction, but that only defers the problem further down the analysis path. The three-dimensional aspect of the environment is a very important component in the analysis of most earth science data. I refer here not to the standard single-valued function of x and y, but to the volumetric type of analysis found in software packages like AVS, Matlab or Surfer, to name just a few. The problem is further compounded by the fact that the data are not uniformly sampled in depth; there are no standard levels. Observations may appear anywhere, as we shall see more of later. A lot of work has been done in analyzing well logging data, and it seems that some of that technology could be incorporated into GIS, providing better tools for all fields. This problem will only get worse as the number of interdisciplinary studies, such as those involving air-sea interaction measurements, increases. By addressing some of these other fields, GIS could serve the role of providing common data structures, formats and analysis tools for the average researcher.
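One common workaround from the profile-analysis world, offered here only as a sketch of the kind of tool a GIS package could absorb, is to interpolate each irregularly sampled profile onto a set of agreed `standard levels' so that different instruments can be compared or gridded:

    import numpy as np

    obs_depth = np.array([12.0, 87.0, 240.0, 810.0, 1950.0])  # m, irregular
    obs_temp  = np.array([24.1, 18.7, 11.2,  5.4,  3.1])      # deg C

    standard_levels = np.array([0, 50, 100, 250, 500, 1000, 2000])
    temp_at_levels = np.interp(standard_levels, obs_depth, obs_temp)
    # np.interp clamps values outside the sampled range to the end points;
    # a real analysis would flag those rather than trust the extrapolation.
    print(dict(zip(standard_levels.tolist(), temp_at_levels.round(2).tolist())))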

Still, our prototype does offer users on-line access to the data. The IMS MapCafe interface was used, without much adornment, to allow users to view and query the collection. Using a modification of the Identify tool [Figure 7], they can display or download selected data series. Note that we provide both ASCII and NetCDF versions of the data. One lesson we took away from this prototype was the importance of the metadata and the resulting design of the tables. It needs to be as flexible as possible and yet allow the users simple interaction.

A second project we've undertaken is another data inventory problem, this time dealing with sediment traps. Again we are faced with an instrument sampling at variable depths, and collecting data over time, although in this case not nearly as frequently. In fact, the initial population of the project tables was taken from an Excel spreadsheet, a very big spreadsheet. What makes this application a little different is the integration of a bibliographic database from the EndNote package. The bibliography incorporates those articles with references to the data displayed on the map, allowing retrieval of articles by author, keyword or location. There is nothing fancy here; it is not much different from land or parcel records. I'd like to say this project is further along, but again, there are resource limitations. The other challenge here is how to access and process time-dependent data. It is a common scientific practice to obtain replicate measurements, whether globally at sediment sites, regionally for fisheries stocks, or locally for the biodiversity in wetlands. The development of analysis tools for this geographic and time-dependent data still lies in the hands of the researcher. Here is another area where the GIS community could develop and provide more tools.

A third project is another data inventory effort, but it has some unique elements, not the least of which is that the principal engineer and data collection manager is in Norway for a couple of years. Here the IMS web-based product provides a very nice way of keeping him abreast of the state of the collections and the project. The core sites are shown in Figure 8. The project started with a request to automate the process of generating the lithological description that goes with each core [Figure 9]. The core locations were already in an ASCII record-oriented structure that was accessible through a DOS-based query procedure. Having identified a core, based on some general attributes, the researcher would go through dusty volumes to glean the specific description properties. To automate the process I decided to enter the descriptions into a database (Sybase, for historical reasons). A Windows-based interface (Sapien's Ideo) allowed users access, but it was very hardware-specific and seat-limited; only the data manager had practical access. Then hardware and software licenses changed and it became necessary to make the system web accessible.

I felt this would be a good time to get these users into the GIS world. After all, it was primarily a geographically oriented collection, so why reinvent all that capability? ArcView combined all the functions of data organization, display, query and extraction in a single package with web capability. We've added bathymetry [Figure 10] to provide some environmental cues. This was a case where all that was needed was the initial push to start a chain reaction. Having even a very small subset of the sedimentary data in a database allows access from ArcView. And while many users would not attempt to learn SQL, and probably would not have funds for the development of a separate interface, they think nothing of using a package like ArcView to start making simple queries: `Show me all the cores longer than 2 meters in depths greater than 5000 meters' [Figure 10]. As the detailed description data start getting incorporated into the database, new types of questions come to mind: `Show me all the cores where forams are found in greater than 15% abundance more than 200 centimeters beneath the surface'.

While only the new data are being analyzed for entry into the system, the enthusiasm has led to the scanning of all those dusty volumes. Those images, stored as PDF files, are now available through the IMS interface [Figure 11]. And OCR tools have been acquired to convert the images into the digital detailed descriptions needed to populate the database with the historical data. So in a very real sense the maturity and ease of access of the GIS software has pushed researchers into areas of analysis that had not been thought practical before, except in large, well-funded projects. We have made some use of the Spatial Analyst extension to start analyzing the data. This is pretty straightforward, the advantage being that it keeps everything in the same package and allows researchers ease of access. When we finally get the descriptive data at depth entered, we will be back up against the problem, as these figures have shown, of multiple features at the same location.
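The two example questions quoted above translate quite directly into SQL. The sketch below runs them against a hypothetical, much-simplified version of the core tables (the production database was Sybase; SQLite and the column names here are illustrative only). Note how a separate descriptions table, keyed by depth within the core, is also one answer to that multiple-features-at-one-location problem.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE cores
                   (core_id TEXT, lon REAL, lat REAL,
                    water_depth_m REAL, core_length_cm REAL)""")
    con.execute("""CREATE TABLE descriptions
                   (core_id TEXT, depth_in_core_cm REAL,
                    component TEXT, abundance_pct REAL)""")
    con.executemany("INSERT INTO cores VALUES (?,?,?,?,?)",
                    [("KN9-01", -45.0, 30.0, 5230.0, 260.0),
                     ("KN9-02", -44.5, 30.2, 4100.0, 310.0)])
    con.executemany("INSERT INTO descriptions VALUES (?,?,?,?)",
                    [("KN9-01", 220.0, "forams", 22.0)])

    # `Cores longer than 2 meters in depths greater than 5000 meters'
    print(con.execute("""SELECT core_id FROM cores
                         WHERE core_length_cm > 200
                           AND water_depth_m > 5000""").fetchall())

    # `Cores with forams in >15% abundance more than 200 cm down-core'
    print(con.execute("""SELECT DISTINCT c.core_id
                         FROM cores c
                         JOIN descriptions d ON c.core_id = d.core_id
                         WHERE d.component = 'forams'
                           AND d.abundance_pct > 15
                           AND d.depth_in_core_cm > 200""").fetchall())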

I would like to move now to projects that utilize GIS in a more operational mode. A considerably larger project is the tracking and archiving of drifting buoy data. Drifting buoys, both surface and subsurface, have been used for more than 30 years to track the ocean's currents. Figure 12 shows a subset of surface drifters and sub-surface floats. Positions have been measured, using a variety of technologies, at rates varying from every ninety minutes to every twenty days. The subsurface drifters, deployed at varying depths down to 4000 meters, are particularly interesting because they `feel' a lot more ground than we see in our normal view of the sea surface [Figure 13]. Even when not running aground, some seemingly anomalous behavior can be explained with the introduction of bathymetry [Figure 14]. The ArcView Spatial Analyst extension has also been useful in monitoring a float's progress as it changes depth and as the bottom depth varies along its track. I have no doubt that the Image Analyst extension would also be useful in combining satellite imagery with float positions, but we have not yet required that capability.

The state of instrument design is such that oceanographers now envision launching a global array of sub-surface floats [Figure 15] which periodically rise to the surface and telemeter their positions and acquired data to a satellite. The information is then relayed as e-mail to the researcher's desk. Operationally, then, we have been developing the software capability, using ArcView, to allow project/mission planners to monitor the operation. The capabilities include looking at the most recent surface position and the inferred trajectory [Figure 16]. For simplification of this spaghetti bowl, and this is only a relatively small experiment, we can also show where a selected buoy was launched and its net displacement [Figure 17]; these can be selected for the group or by current (or launch) position. We also like to know where data are available from the profiles the instrument takes on its way to or from the surface [Figure 18]. For both planning and political reasons we need to consider when a buoy might enter jurisdictional waters [Figure 19].
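The net displacement shown in Figure 17 is, at its core, a single great-circle calculation between launch and last known position. A minimal sketch using the haversine formula, with illustrative positions rather than actual ACCE data:

    from math import radians, sin, cos, asin, sqrt

    EARTH_RADIUS_KM = 6371.0

    def haversine_km(lon1, lat1, lon2, lat2):
        """Great-circle distance between two (lon, lat) points, in km."""
        lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
        a = (sin((lat2 - lat1) / 2) ** 2
             + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
        return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

    launch = (-52.0, 42.0)   # (lon, lat) at deployment -- illustrative
    latest = (-44.0, 45.5)   # most recent reported surfacing
    print(f"net displacement: {haversine_km(*launch, *latest):.0f} km")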

Ultimately we would like to provide the data, probably in a manner similar to that used with the moored buoys earlier. And that revisits the problem I said I'd address later. One of the common formats used by the research community for storing and exchanging data is NetCDF (Network Common Data Form), developed by UCAR's Unidata program. It is public domain, which perhaps explains why there are so many users in the academic and research community, and it has some measure of portability, standardization and maintenance support. It is particularly good at storing data series and arrays, and there is an accompanying suite of software for data extraction and manipulation. It would be nice if GIS users could access data stored in this form the way they do other tables or databases. There should at least be standard hooks for readily incorporating these files into ArcInfo or ArcView as a data layer or events. Or perhaps there could be a NetCDF-to-shapefile converter?
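To make the suggestion concrete, here is a rough sketch of what such a point converter might look like using today's netCDF4 and pyshp Python packages; the variable names `lon', `lat' and `temp' are assumptions about the input file, not any standard.

    from netCDF4 import Dataset
    import shapefile  # the pyshp package

    def netcdf_points_to_shapefile(nc_path, shp_path, var="temp"):
        """Write one point feature per observation in a 1-D NetCDF file."""
        ds = Dataset(nc_path)
        lon = ds.variables["lon"][:]   # assumed variable names
        lat = ds.variables["lat"][:]
        val = ds.variables[var][:]

        w = shapefile.Writer(shp_path, shapeType=shapefile.POINT)
        w.field(var.upper(), "N", decimal=3)
        for x, y, v in zip(lon, lat, val):
            w.point(float(x), float(y))
            w.record(float(v))
        w.close()
        ds.close()

    netcdf_points_to_shapefile("floats.nc", "floats")  # hypothetical file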

Web-based interfaces such as DODS (the Distributed Oceanographic Data System), with clients for packages like Matlab, are starting to address this need. I believe a presentation was given here last year (Dawn Wright, Oregon State) about the ArcGMT software, which interfaces ArcInfo with the public domain GMT display software. These types of utilities are key to allowing researchers access to at least the more common data formats. Although it is nice to have a company-mandated GIS policy, I suspect the real world is not that much different from the academic/research institution: the data are everywhere, and in just as many formats. (And don't try to set a standard if you are in the computer department.) Maybe someone out there is already working on this problem; if so, I'd like to talk with you, because it has been kept pretty quiet until now. I think this is another area where the GIS software vendors could expand their market if they could figure out a way to provide access to the common scientific data formats.

A final project with operational overtones has been a prototype application for introducing GIS capabilities into the cruise planning, or Marine Operations, environment. There have historically been many tasks that go into planning a cruise. These start with `where are we going?' As I alluded to at the beginning of the talk, there are no nice roads and markings in the ocean. When initial discussions mention the East Pacific Rise, not everyone knows where it is. I have populated this application with a couple of thesauri [Figure 20] that allow the user to search for a category of features or a specific site. Support facilities might require trips to the nearest (friendly) port. We might also have to obtain clearances if we have to pass through any foreign jurisdictions. Planning the cruise involves computing a track length [Figure 21]. If bathymetry is being measured en route, or moorings are being set, there might be a need to know the depth for any cable required. There is a whole electronic charting industry that provides harbor charts, port facilities and other expensive, proprietary information. I'm sure that something like the Network Analyst extension could even be made to handle current and transit conditions in a manner analogous to traffic flow.
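The track-length computation behind Figure 21 is again just summed great-circle legs between planned waypoints, which also gives a first-cut transit time. A sketch with made-up waypoints:

    from math import radians, sin, cos, asin, sqrt

    def haversine_nm(lon1, lat1, lon2, lat2):
        """Great-circle distance in nautical miles (Earth radius 3440.1 nm)."""
        lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
        a = (sin((lat2 - lat1) / 2) ** 2
             + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
        return 2 * 3440.1 * asin(sqrt(a))

    # Waypoints (lon, lat) -- made up for illustration.
    waypoints = [(-70.67, 41.52), (-64.0, 32.0), (-45.0, 25.0)]
    legs = [haversine_nm(*waypoints[i], *waypoints[i + 1])
            for i in range(len(waypoints) - 1)]
    print(f"track length: {sum(legs):.0f} nm over {len(legs)} legs")
    print(f"transit at 10 kt: {sum(legs) / 10 / 24:.1f} days")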

I'm sure many of you are saying, or said in the course of the talk, `well, you can already do that in ArcView' or `I know where there is a script that does that'. But that is the whole point. GIS now offers inexpensive software tools adaptable to a wide range of problems outside the traditional scope, with a very favorable performance-to-price ratio for users in the academic research market. It is time for a wider community to look at what a GIS package like ArcView has to offer, and researchers have to be made more aware of its capabilities. By the same token, a lot of analytical tools have been developed in sciences such as oceanography, meteorology, climatology and geology. The GIS industry needs to broaden its thinking and see what might be learned from these scientific markets.

Conclusion


In summary, a robust, cost-effective GIS software package like ArcView can benefit the oceanographic research community. There remain, however, several issues that need to be addressed, both by potential users in the earth science fields and by vendors of GIS software.

Acknowledgements


I'd like to thank the following WHOI staff for their contributions of data and time: Jim Broda, Dr. William Curry, Ruth Goldsmith, Dr. Nelson Hogg, Dr. Susumu Honjo, Steve Manganini, Dr. Breck Owens, RADM Richard Pittenger USN (Ret.), and Susan Tarbell.

Arc/INFO and ArcView are registered trademarks of Esri.
EndNote is a registered trademark of ISI ResearchSoft.
Excel is a registered trademark of Microsoft Inc.
Matlab is a registered trademark of The MathWorks Inc.

Figures


Figure 1: The mid-ocean conspicuously lacks lines or other recognizable landmarks.
Figure 2: Current meter deployment configuration.
Figure 3: Example plot of a current meter time series.
Figure 4: Chart of buoy experiment sites.
Figure 5: Query construct for summer months, resulting map.
Figure 6: Detail area of buoy sites; query table.
Figure 7: The web page result from IMS MapCafe `Identify' request page allows users to retrieve graphics or data.
Figure 8: Chart showing the locations of the sediment core and dredge sites.
Figure 9: Sample lithologic description from SEDCORE 2000.
Figure 10: Sediment sites with bathymetry and query.
Figure 11: Retrieval of archive .PDF images for detailed information.
Figure 12: Chart showing a subset of surface drifters and subsurface floats in the Atlantic Ocean.
Figure 13: Chart showing the sea level and 4000 meter depth planes.
Figure 14: Chart showing the use of bathymetry in the analysis of float trajectories.
Figure 15: Proposed ARGO float deployments (Courtesy of ARGO/Scripps Institution of Oceanography).
Figure 16: Chart showing most recent ACCE float positions and inferred trajectories.
Figure 17: Chart showing ACCE float launch positions and displacements to last known position.
Figure 18: Chart showing locations of ACCE profile measurements.
Figure 19: Chart of countries with shallow EEZ jurisdictions.
Figure 20: Chart containing thesaurus, ports and EEZ jurisdictions.
Figure 21: Chart showing cruise track length and depth along track.

References


Unidata, DODS: Distributed Oceanographic Data System, http://www.unidata.ucar.edu/packages/dods , University Corporation for Atmospheric Research, Boulder, CO.
Unidata, NetCDF: network Common Data Form, http://www.unidata.ucar.edu/packages/netcdf , University Corporation for Atmospheric Research, Boulder, CO.
Wessel, P. and Smith, W. H. F. (1995), New version of Generic Mapping Tools released, Eos, Transactions, American Geophysical Union, 76(33), p. 329.
Wright, D. J., R. Wood and B. Sylvander (1998), ArcGMT: A suite of tools for conversion between Arc/INFO and Generic Mapping Tools (GMT), Computers and Geosciences, 24(8) pp. 737-744.


Roger A. Goldsmith
161A Clark CIS/MS #46
Woods Hole Oceanographic Institution
Woods Hole, MA 02543

Tel: 508/289-2770
e-mail: rgoldsmith@whoi.edu