Ted Habermann and John C. Cartwright

NOAA National Data Centers - A Distributed GeoSpatial Data System

Partnerships are an important element in conceptual models of distributed geospatial data systems. The NOAA National Data Centers (NNDC) have recently integrated GIS with data management and web access systems. This project has involved partnerships on many levels: within Data Centers, between Data Centers, and between NOAA and the USGS. Finding government management models that enable effective partnerships has proven difficult. It has also been difficult to evolve the Esri Internet tools into an environment with distributed data sources. This process presents considerable technological and usability problems including connection maintenance, screen real estate, state maintenance, and browser capabilities.


The Project

The National Oceanic and Atmospheric Administration (NOAA) operates Data Centers in Asheville, North Carolina (Climatic), Boulder, Colorado (Geophysical), and Silver Spring, Maryland (Oceanographic). Together these are referred to as the NOAA National Data Centers (NNDC). They manage a diverse and rapidly growing collection of data describing everything from the interior of the earth to the sun. The goal of these data centers is long-term data stewardship. This includes assessing and improving the quality of data, integrating data from different disciplines, and making data available to users all over the world. The NOAA National Data Centers Server project (NNDCServer) was recently begun as part of an attempt to build a single system that addresses these common data management needs. It is the data access and web service part of the larger NOAA Virtual Data System (NVDS).

Using Geographic Information Systems for data management and access is an important part of NNDCServer. The system architecture includes Esri's Spatial Database Engine (SDE) at all three data centers with supporting clients and internet map servers.

Partnerships

Conceptual models of distributed geospatial data systems include many implicit partnerships (Fig. 1). The NNDC Server Project has provided experience with partnerships on many levels. At the lowest level are partnerships between technologists and data managers at each data center. At the highest level are partnerships between government agencies. Developing these partnerships presents many challenges.

Figure 1

Figure 2

A common approach at the data centers involves a data manager who has end-to-end responsibility for a collection of data sets. Data managers solicit and receive data from providers, develop technologies for data access, quality assessment, and improvement, create documentation, and make data available to users. Many data managers build custom software systems for executing these tasks. These systems are designed specifically for their data and their customers. The introduction of a distributed geospatial data system into this environment brings fundamental changes in data management paradigms. The data migrate into a centralized database system and the tools that are used to deal with the data must evolve into geospatial database tools. This makes obsolete many of the tools that the data managers have developed through the years and converts their situation from one of complete control to one of complete incompetence. This is a difficult transition that requires supporting partnerships with management, other data managers, and the technologists who are implementing the new data system.

The next set of partnerships are those between different groups within an organization. The well-known adage that organizations cannot create products that don't reflect their structure has emerged because of the difficulty of forming cross-group partnerships. These partnerships also require significant paradigm shifts. The groups have generally been competing for resources for years. Each group has created an identity that is built in part by separating itself from the others. Now the plan is for these groups to work together. Building the trust required between groups is a difficult process that must be addressed specifically. Ironically, the same group-forming dynamics that make it difficult to bring these groups together must be used to strengthen the group implementing the change. Its members must believe that they too are a special group. The interplay between team building and trust building is very tricky.

Figure 3

A second layer of complexity is added by the fact that developing these new systems is likely to involve the innovative risk takers from these groups. These are precisely the people that managers least like to share, for they are the people that bring competitive advantages to the group. Those advantages are critical as the groups continue to seek resources outside of the project that involves the partnership. It is natural that the managers will resist giving these people up and, therefore, not contribute to the partnership in a meaningful way. In fact, one of the most effective means of resisting the change is to provide people that lack the required skills. This looks like compliance on the surface while it achieves the goal of slowing down the implementation of the new system.

The final set of partnerships are those between different government agencies. These partnerships bring all of the problems of the second group raised to a higher power. The agencies that are likely partners are involved in similar work and, therefore, are exactly the agencies with which one has historically competed for resources. Managers within one agency are even less likely to want to share critical resources across agency boundaries. Turf issues are exacerbated as higher-level managers become aware of real interactions across agencies, and finding effective mechanisms for sharing credit, and potentially income, from geospatial products becomes very important. Even in multi-agency projects, funding from partnering agencies often stays in those agencies rather than being distributed in a way consistent with the partnership goals.

Figure 4

Our experience indicates that creating and maintaining effective partnerships is the most difficult task associated with distributed geospatial data systems. All of the relationships discussed above are important and any one is capable of scuttling the project if not well managed. Two aspects of management continue to be particularly challenging in the NNDC Server project. The first is the traditional hierarchical management model prevalent in the government and the resulting turf culture. We have not been able to prevent "taking the lead" on a project from becoming "make this your project". The entire project management process simply falls back into stovepipe mode too easily.

Second, we considered the NNDC Server project to be a technology project and assigned IT managers to it. Our experience indicates that, in fact, this is a project aimed at changing the way people think about their jobs and transforming business as usual. It is much more sociological than technological. Such projects need more leadership than management. Managers who are fluent in leadership are hard to find.

Distributed vs. Local Data Systems

Much of the work done to date on geospatial data for the web involves presenting maps made up of layers generated from local shape files. Some very impressive systems have been built using this paradigm. The NNDC Server was conceived from the beginning as a distributed system. We started with MapObjects code that was designed to sit on top of local shape files at the USGS. We hoped that evolving this code to function in a distributed environment would be straightforward. This turned out not to be the case.

One of the principal differences between local and distributed data systems is the management of connections that is required in distributed systems. The ubiquitous model for connections comes from the http world, although there are many similarities with database connections. In this world connections are quick, ephemeral, stateless interactions. The keyword here is "stateless". All relevant information is passed through the connection at the time that it is made, and thus the connection knows nothing of the past or the future, the "state" of the system.

Figure 5

Figure 6

This is very different from the model used in the Esri clients that we are familiar with. In those cases the history is one of very "stateful" and persistent connections. These might more aptly be called "sessions". The client and the server both store a considerable amount of information about their interactions. This decreases the amount of information that needs to be repeated with each connection and thus makes the interactions faster and easier.

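
The contrast between the two connection models can be illustrated with a minimal sketch. It is not taken from the NNDC Server or from any Esri product; the names `stateless_map_request` and `MapSession`, and the layer names in the usage note, are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


def stateless_map_request(layers, extent, size):
    """Stateless style: every call carries all the information needed
    to describe the map, so nothing is remembered between calls."""
    return f"map layers={','.join(layers)} extent={extent} size={size}"


@dataclass
class MapSession:
    """Stateful style: the session remembers layers, extent, and size,
    so each interaction only has to say what changed."""
    layers: list = field(default_factory=list)
    extent: tuple = (-180.0, -90.0, 180.0, 90.0)
    size: tuple = (600, 400)

    def zoom_to(self, extent):
        # Only the new extent is supplied; the rest comes from stored state.
        self.extent = extent
        return self.render()

    def render(self):
        return stateless_map_request(self.layers, self.extent, self.size)
```

In this sketch a caller of `MapSession(layers=["coast", "stations"]).zoom_to(...)` supplies only the new extent, whereas a caller of `stateless_map_request` must repeat the layers and image size on every request. That repetition is the cost of statelessness; the stored information is what ties a session to a persistent connection.
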
The fundamental differences between these connection models caused considerable problems as we migrated the National Atlas code to a distributed environment. The most obvious problem was the need for many SDE connection licenses. Multiple instances of internet mapping applications are a critical piece of Esri's scalability model. In fact, a principal task handled by the map server is distribution of requests across those instances. In the local data case, multiple instances run on a single machine or a cluster of machines that hold the shape files that are the basis of the maps being served. There are no "connections" in this model except for those between the applications and the file system. When the underlying data are migrated to SDE, a connection license is required for each running instance of the Atlas. In a system like NNDC this quickly led to a requirement for over 20 connection licenses at each site. This increases the cost of each SDE installation considerably. Of course, we did not become aware of this requirement until after our budget was gone.

Figure 7

Figure 8

Part of our solution to this problem involved design and implementation of a web-based Enterprise SDE Connection Monitor. This tool monitors connections on all of our SDE servers and stores the results in business tables in one of our underlying databases. A web page shows information about active connections from three MOIMS instances to all of our SDE Servers. This page refreshes every minute, so the information it shows is never more than about a minute old.
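
The general shape of such a monitor, polling each server and recording the results in a table that a web page can query, might be sketched as follows. This is not the NNDC implementation: the function `poll_connections` is a hypothetical stand-in that returns canned data where a real monitor would query an SDE server, the server and client names are invented, and SQLite is used only so that the example is self-contained.

```python
import sqlite3
import time


def poll_connections(server):
    """Hypothetical stand-in for asking one SDE server who is connected.
    Returns (client_host, user_name) pairs; the data here are canned."""
    sample = {
        "sde-a": [("moims-1", "atlas"), ("moims-2", "atlas")],
        "sde-b": [("moims-3", "atlas")],
    }
    return sample.get(server, [])


def record_snapshot(db, servers, now=None):
    """Poll every server once and store the results in a table."""
    now = time.time() if now is None else now
    db.execute(
        "CREATE TABLE IF NOT EXISTS sde_connections "
        "(polled_at REAL, server TEXT, client TEXT, user_name TEXT)"
    )
    for server in servers:
        for client, user in poll_connections(server):
            db.execute(
                "INSERT INTO sde_connections VALUES (?, ?, ?, ?)",
                (now, server, client, user),
            )
    db.commit()


def latest_counts(db):
    """Connections per server in the most recent poll (what a status
    page would display)."""
    rows = db.execute(
        "SELECT server, COUNT(*) FROM sde_connections "
        "WHERE polled_at = (SELECT MAX(polled_at) FROM sde_connections) "
        "GROUP BY server ORDER BY server"
    ).fetchall()
    return dict(rows)
```

Calling `record_snapshot` on a one-minute schedule and rendering `latest_counts` on a page would give a view similar in spirit to the one described above. Keeping every snapshot, rather than only the latest, also leaves a history that can be used to estimate how many connection licenses a site actually needs.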

The more interesting problems related to connection management have to do with state management in a stateless http world. This problem is ubiquitous in web-based applications and approaches to solving it are well known. They fall into several categories: hidden variables in web pages, cookies, server-side state databases with session IDs, and client-side Java and memory. The last approach is the one emphasized by Esri's recent IMS development. IMS comes with tools for creating your own map server: the creator chooses which themes will be mappable and arranges the user interface by bringing together a custom selection from available components.
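
As an illustration of one of these categories, the server-side state database with session IDs, consider the following minimal sketch. It is a generic example rather than a description of the NNDC Server: the class `SessionStore`, the function `handle_request`, the `sid` parameter name, and the default map settings are all assumptions, and an in-memory dictionary stands in for the state database.

```python
import uuid


class SessionStore:
    """In-memory stand-in for a server-side state database."""

    def __init__(self):
        self._sessions = {}

    def exists(self, session_id):
        return session_id in self._sessions

    def create(self, **state):
        # The random ID is the only thing the client has to send back.
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = dict(state)
        return session_id

    def update(self, session_id, **changes):
        self._sessions[session_id].update(changes)

    def get(self, session_id):
        return dict(self._sessions[session_id])


def handle_request(store, params):
    """Handle one stateless request given its query parameters.
    State is looked up by session ID, then updated with the
    parameters that this request changes."""
    params = dict(params)
    sid = params.pop("sid", None)
    if sid is None or not store.exists(sid):
        sid = store.create(layers="coast", zoom="1")
    store.update(sid, **params)
    return sid, store.get(sid)
```

A first request without an ID creates a session and returns its ID; each later request sends the ID plus only the settings that change. The ID itself still has to travel with every request, typically in a cookie or a hidden form variable, so in practice these categories are often combined.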

This approach may work well for creating stand-alone map servers in the mold of existing local shape file servers, but it comes with its own collection of issues. First, client-side Java still brings challenges to many web users. Our experience suggests that Java remains essentially unusable for many of them. This is particularly true for large Java applications like those required for interactive mapping. Many of us believe that this is getting better with time. Maybe so, but it is not getting better very fast.

A more challenging aspect of this approach is that it adds an "all or nothing" aspect to the internet mapping technology. The canned Internet Map Servers make it difficult to phase a mapping component into existing web services. Such phasing is critical to the development of the partnerships between data managers and technologists described above in that it creates the short-term benefits required for trust and motivation to continue. This is addressed to some extent by the HTML generation available using the Image Server and will hopefully be improved when the ArcIMS Software Development Kit becomes available.

Conclusion

NOAA's three National Data Centers have been working to integrate GIS into their existing data management and web presentation systems since the beginning of 1999. The most difficult lessons to learn are the sociological and management lessons associated with any significant change in the way people do business. The most challenging technological lessons involve migrating from a system based on local files to a distributed system. We have addressed problems with SDE connections using an automatic monitoring system. The recent decision by Esri to sell read-only SDE licenses should also help considerably. The stand-alone map server approach taken by Esri works well for new customers who want to put maps on the web, but it makes it difficult to phase GIS capabilities into an existing web site.


Ted Habermann
Information Architect
NOAA National Geophysical Data Center
Ted.Habermann@noaa.gov
John Cartwright
Associate Scientist
Cooperative Institute for Research in Environmental Sciences (CIRES)
University of Colorado / NOAA National Geophysical Data Center
jcc@ngdc.noaa.gov