Creating a Multi-tiered, Multi-server System: Trials and Tribulations with ArcIMS

Clark A. Roberts
Carolyn S. White

Abstract: MapIllinois is an ArcIMS prototype project that was begun at the University of Illinois. As development progressed, MapIllinois became the basis for a variety of WEB sites with diverse target audiences and deliverables. This growth demonstrated the need for a scalable, fault-tolerant WEB infrastructure with high availability. Our constraints included near-term expansion and limited financial resources, which led us to use industry-standard hardware (Intel-based Windows 2000 servers) and software systems to provide the platform.

What we learned in the process is that there is much more to implementing a multi-tiered, multi-server system than our reading suggested. Network speed between computers and to the DNS server, network and computer naming conventions, and the choice of load-balancing scheme all became very real issues. In the end it was a few limitations in the ArcIMS software, coupled with security patches to the Windows operating system and WEB server, that convinced us to take another path, at least until the sustainability issue is addressed.

Introduction

This paper documents the development of a prototype multi-tier, multi-server, interactive GIS decision support system. The constraints on such a system are numerous, but the main ones are time, money, and expert knowledge in multiple fields. The biggest constraint by far was the lack of information on exactly how to configure such a multi-tiered system and how to maintain the servers long term without upsetting the delicate balance between them.

Concerns Guiding the Design Specifications

From a system administration design perspective, four concerns guided the multi-tier, multi-server design specifications:

  1. Load-balancing multiple web servers
  2. Scalability
  3. Fault tolerance
  4. A limited hardware budget

Load-balancing Multiple Web Servers: At the time development of this system was under consideration, the notion of using multiple servers working in a distributed fashion was rather new in the GIS arena. WEB servers, on the other hand, had been using load-balancing techniques for some time. While an entire session could be devoted to reviewing the hardware and software approaches to load balancing, we decided to use the built-in load-balancing features of Microsoft Windows 2000 Advanced Server as the base for this system’s front end. The project had a strict budget with no funds to hire additional system and hardware specialists, so we needed to rely on the institutional knowledge in our academic setting, including system administrators who could set up and support such a configuration. Our approach was to take advantage of standard software and hardware for which expertise was readily available.

Scalability: While hardware funds were initially limited, we felt confident that the need for web-based GIS decision support systems would grow in the College of Agriculture, Consumer and Environmental Sciences (ACES) at the University of Illinois. A distributed computing approach would address the issue of scalability.

Fault-tolerance: A year earlier, a team of GIS specialists within ACES had developed an ArcIMS 3.0 map engine site now known as Map Illinois for Watersheds. This site operated from a single Dell Optiplex workstation. The workstation was self-contained in that the ArcIMS Application Server, the Spatial Server and the data all resided on that workstation.

The original site worked if the number of simultaneous requests to the server was limited, but if too many requests were made at once the system would crash. In demonstrations with as few as five simultaneous users, the system repeatedly crashed and all users were shut out. Such unacceptable performance was one driving force for the development of the prototype discussed herein; a way had to be found to create a system that would scale easily and be fault-tolerant at the same time.

Windows 2000 Advanced Server allows for load-balancing clusters of up to 32 computers. It is also fault-tolerant and will continue to function as long as at least one server in the cluster is active. Adding and removing servers in a Windows 2000 load-balancing cluster is as easy as turning a server off or configuring a new one to be a cluster member. This solution was appealing and was deemed appropriate; the bandwidth to fully utilize all 32 possible servers in our situation would not be available for years.

Limited Budget: The hardware budget for this project was $11,000.

The Design of the Prototype

The First Tier: The system designed consisted of two WEB servers configured as a network load-balancing (NLB) cluster. A pair of single-processor servers with modest system RAM was deemed sufficient.

The Second Tier: Our study of the Esri literature indicated that the greatest amount of processing power would be required by the ArcIMS Spatial Servers -- the second tier of our design. The tasks of the Spatial Servers can be computationally intensive, so there was a need for substantial computing power and generous amounts of RAM. The (then) new Pentium IV processor, with its high clock speed and fast RDRAM memory, was chosen for the spatial server processing.

ArcIMS 3.1 included a new feature: support for multiple virtual spatial servers. This feature met a key requirement for making the second tier fault-tolerant.

The Third Tier: This left us with only one question, “Where’s the data?” We needed a backend that would serve as the central clearinghouse for the ever-increasing amount of data used in this project. While the data were currently being served as Esri shapefiles, the ultimate goal was to convert the data to SQL databases to be accessed and managed through ArcSDE. These requirements called for a server with a lot of processing power, storage space, and RAM, as well as high-bandwidth connectivity to deliver the data to the middle-tier spatial servers.

The System Specified: We now had a blueprint for a five-server system that should be expandable and fault-tolerant in the first two tiers.

Tier             Cost       Brand           RAM               Disk                                Processor
WEB              $1,000 x2  Gateway 6400    384 MB            9 GB SCSI                           933 MHz PIII
Spatial Server   $2,500 x2  Gateway E-4600  1 GB PC800 RDRAM  40 GB 7,200 rpm UDMA 100            1.7 GHz PIV
Database Server  $3,500     Gateway 6400    2 GB ECC SDRAM    2x 73 GB 10,000 rpm Ultra-160 SCSI  1 GHz PIII

The servers cost a total of $10,500, just within our hardware budget!

Additional Costs: The College of ACES at UIUC contributes 40% of the overall cost of a University-wide site license for Esri products. As a result, this project was not charged directly for ArcIMS or other Esri software. A separate budget line allocated $1,500 for operating system licenses. Other funds were reallocated for additional network cards and an 8-port 10/100 Mbit Ethernet switch with a 1000 Mbit uplink port. This equipment was to be used for internal cluster network traffic.

Software Used in the Design: As previously noted, Windows 2000 Advanced Server was chosen as the operating system for all of the servers in this prototype, in part because we planned to use its network load-balancing features on the two WEB servers. Microsoft Internet Information Server (IIS) 5.0, the Tomcat 3.2 servlet engine, and the ArcIMS 3.1 Application Server were all installed on the two WEB servers, which ran in tandem as a Network Load-Balancing (NLB) cluster. Microsoft SQL Server was installed on the database server, as was ArcGIS 8.1.

A few custom tools had been developed for the original Map Illinois for Watersheds site, and it was necessary to incorporate these tools in the new ArcIMS 3.1 site. This required that we install Microsoft Access, Microsoft Excel, and Microsoft Visual Basic, as well as Esri MapObjects, on the WEB servers.

Results of our Tests

For clarity, we discuss the results by issue rather than in chronological order.

Network Issues: In our various conversations with Esri technical support it was stressed that it is difficult to write a single set of installation guidelines, or to troubleshoot problems in an individual installation, because networks themselves can be so varied – intranet or internet, with or without a firewall, host names or cnames or machine names, etc.

At various phases of our study, we had successes under one network specification that failed under another, though in all tests we were attempting to establish internet services without a firewall. As we describe below, the issue of host names, cnames, and NetBIOS machine names caused us great problems. It would benefit users greatly if the Esri documentation regarding these network naming possibilities were clarified.

Advantages of a closed-loop dual network: In our study we found that the most efficient method of communication between servers in a cluster is a closed-loop, dual-network-card setup that uses private, non-routable IP numbers (192.168.x.x) for the internal data paths. This keeps traffic not bound for the Internet off of the Local Area Network. If this strategy is not employed the system will still function, but the network can be flooded with the data traffic between the servers themselves.
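
By way of illustration, the short Python sketch below (our illustration, not part of the original installation) shows the kind of check we did by hand when tracing cluster traffic: it classifies a peer address as belonging to the private 192.168.x.x loop or to the public LAN. The sample addresses are hypothetical.

    # Illustrative sketch only: classify peer addresses as cluster-internal
    # (private, non-routable) or external. The sample addresses are hypothetical.
    import ipaddress

    INTERNAL_NET = ipaddress.ip_network("192.168.0.0/16")  # private cluster loop

    def is_cluster_internal(addr: str) -> bool:
        """Return True if the address belongs to the closed-loop cluster network."""
        return ipaddress.ip_address(addr) in INTERNAL_NET

    # One peer on the private loop, one on the public LAN (hypothetical values).
    for peer in ("192.168.1.12", "203.0.113.5"):
        side = "internal loop" if is_cluster_internal(peer) else "public LAN"
        print(f"{peer}: {side}")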

Think about such a system with as many as 32 WEB servers and you soon realize that bandwidth is a definite limiting factor. Dual network cards present another problem: when things aren’t working as expected, it is very difficult to trace the route a data packet takes, and we sometimes found packets on the wrong network with no obvious explanation. It would have been much easier if the dual network cards were only in the WEB servers, but for remote management we needed all of the computers to be visible on the public network.

Network Naming Conventions: Another problem we ran into was naming conventions and shared network resources. The initial setup took place in a local department that managed its own IP space. The department network and system administrators use a convention in which a computer’s NetBIOS name and its registered DNS host name are the same. In this environment, early tests of the multi-tier, multi-server design worked well with test data – or so we thought. It wasn’t until later that we realized we had erroneously installed the Spatial Servers on the WEB servers in the test environment. Our conclusion that things were working was deceptive in that we had not really tested the second tier – requests to the spatial server were being handled by the WEB server rather than by the separate spatial servers in the second tier.

Once the servers were moved to their hosting location in another IP domain, a separate naming convention was employed whereby the host name and the NetBIOS name were not the same. At this point configuration problems became significant. In a multi-tiered system it is critical for the servers to be able to find and communicate with one another. Often this was difficult, sometimes impossible.

We also found that the time on each computer’s system clock is significant. If the system clocks of the servers differ by more than a few seconds, results become unreliable, so some kind of time-synchronization program is needed to maintain a stable environment.
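
A minimal sketch of the kind of check involved is shown below; it queries an NTP server and reports the local clock offset. This is an illustration only (we did not use this script at the time), and the reference server name is an assumption.

    # Illustrative SNTP query: report how far the local clock is from a reference.
    import socket
    import struct
    import time

    NTP_SERVER = "pool.ntp.org"   # assumed public reference; any NTP source works
    NTP_DELTA = 2208988800        # seconds between the NTP epoch (1900) and the Unix epoch (1970)

    def ntp_time(server: str = NTP_SERVER, port: int = 123, timeout: float = 5.0) -> float:
        """Return the current time (Unix seconds) reported by an NTP server."""
        packet = b"\x1b" + 47 * b"\0"          # minimal SNTP v3 client request
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(timeout)
            s.sendto(packet, (server, port))
            data, _ = s.recvfrom(48)
        transmit_secs = struct.unpack("!I", data[40:44])[0]   # transmit timestamp, integer part
        return transmit_secs - NTP_DELTA

    offset = ntp_time() - time.time()
    print(f"Local clock offset from the reference: {offset:+.2f} seconds")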

With 20-20 hindsight and a year’s experience with ArcIMS installations, we can now guess at the cause of many of the problems we encountered. What worked at the time was to totally reconfigure the servers – uninstall and re-install all of the software (the operating system, IIS, Tomcat, and ArcIMS) – so that we could get back to the point at which we had been previously. We might add that uninstalling ArcIMS wasn’t possible without painstaking editing of the computer’s registry. We can now guess that this situation was created because the machines were relocated to a different IP domain. With the re-installation of ArcIMS we corrected the previous mistake and installed the Spatial Server only on the spatial server machines.

There are many pieces to getting an ArcIMS installation working, and communication between the pieces (ArcIMS, the web server software, Tomcat, and the network) is critical. Here are some of the complications we encountered.

  1. The installation of ArcIMS requires that you provide a login and password with administrative privileges. This is necessary so that the Esri ArcIMS servlet connector (or other connector that you might be using) can communicate with the ArcIMS Application Server. The login must be appropriate to the networking environment you are in. Should this be a local machine login or a domain-level login? Initially our servers were on an internal network in one IP domain. Moving them to another IP domain meant changes were necessary.
  2. Should the setup of ArcIMS provide just the host name, e.g., web1, or does it need to specify the fully qualified DNS name? Esri advised us to use the fully qualified DNS name in our networking environment. Since we had different host names and NetBIOS names in our new IP space, this was a complication we simply couldn’t work around; attempts to use cnames didn’t help (see the name-resolution sketch following this list). The ArcIMS parameter files have a “host alias” parameter, but just what this means under different network configurations is not spelled out. In the end we convinced the network administrator to change the host name and the NetBIOS name to be the same. This had other consequences for the design setup.
  3. When IIS is set up, a login is created on the local machine for accessing web site virtual directories under an anonymous login. Changing the machine name does not change the name of this login and may leave IIS virtual directories inaccessible. This name must be changed by hand in IIS. Depending on your network configuration, in a multi-server environment this may need to be a domain-level login – preferably one with restricted privileges.
  4. Likewise, when the operating system is installed, a local machine login for accessing COM+ services is created. If the machine name changes, COM+ services may become unavailable.
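
The following Python sketch illustrates the name-resolution check referenced in item 2 above: it compares the machine’s short host name with its fully qualified DNS name and shows what DNS actually resolves for each. It is our illustration only; NetBIOS names still have to be checked separately (e.g., with nbtstat).

    # Illustrative diagnostic: compare the short host name with the fully
    # qualified DNS name and see which of them DNS can actually resolve.
    import socket

    short_name = socket.gethostname()   # e.g., "web1" (machine-level name)
    fqdn = socket.getfqdn()             # e.g., "web1.example.edu" (hypothetical)

    print(f"short host name : {short_name}")
    print(f"fully qualified : {fqdn}")

    for name in (short_name, fqdn):
        try:
            canonical, aliases, addrs = socket.gethostbyname_ex(name)
            print(f"{name!r} resolves to {addrs} (canonical: {canonical}, aliases: {aliases})")
        except socket.gaierror as err:
            print(f"{name!r} does not resolve: {err}")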

Servlet Connector/Monitor Issues:  Load balancing in our design meant that as users initially came to the site, they would be directed to one or another WEB/Application server. Our reading of the documentation indicated that once a user started on one Application Server, future requests from that Application Server would be directed to the same Spatial Server. But what if that Spatial Server failed?

Given that one of our primary design concerns was to establish a fault tolerant system in both tier-one and tier-two, we needed to modify the standard ArcIMS setup so that if one of the spatial servers were to fail, the other spatial server would complete the remaining requests from either of the Application Servers. With Esri’s help, we attempted to cross-connect the Application Servers and the Spatial Servers. To do this we had to specify a different port for Servlet Connector communication on each Application Server and write two monitors property files for each Application Server. That is, starting the ArcIMS Monitor now became a hand-initiated process.

The order in which the servers were booted up or shut down was of critical importance. Start-up and shut-down procedures during setup and testing were tedious, to say the least – especially since it took 45 minutes to bring up a single Map Service, given the 20 GB of data behind the Map Service in question.
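
A sketch of the kind of ordering check that would have eased this process is shown below: a small Python script that waits until a lower-tier server’s listener port accepts connections before the next tier is started. The host name and port number are placeholders, not the actual values from our installation.

    # Illustrative sketch: block until a lower-tier server accepts TCP connections
    # before starting the next tier. The host and port below are placeholders.
    import socket
    import time

    def wait_for_port(host: str, port: int, timeout_s: float = 600, interval_s: float = 5) -> bool:
        """Poll host:port until it accepts a TCP connection or the timeout expires."""
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            try:
                with socket.create_connection((host, port), timeout=interval_s):
                    return True
            except OSError:
                time.sleep(interval_s)
        return False

    if wait_for_port("spatial1.example.edu", 5300):  # hypothetical spatial-tier host and port
        print("Lower tier is up; safe to start the next tier.")
    else:
        print("Timed out waiting for the lower tier.")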

And there were quirks. One spatial server would always appear with its fully qualified name in logs and in ArcIMS Administrator; the other would always appear with only its host name, even though the fully qualified name had been specified during installation on both machines. One spatial server was consistently slower than the other even though the hardware configurations were identical; the symptoms were constant thrashing of the machine’s processes and a failure to maintain connections, as diagnosed with the DOS command netstat -a.

(Many months later, long after the design in question had been abandoned, this problem was diagnosed as an errant system setting on that machine.)

Application Server Issues: In the end, we could get one WEB/Application server to work with two Spatial Servers, but if the other WEB/Application server was online there was no load balancing – the second WEB/Application server was never called. We tried another approach: we turned off Windows 2000 Advanced Server load balancing and attempted to use a DNS cname for load balancing. That didn’t work for us either.
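
For reference, the behavior of DNS-based load balancing is easy to observe with a few lines of Python. The sketch below (ours, with a hypothetical name) simply repeats a lookup and prints the address list the resolver returns; with a round-robin record the order may rotate between lookups, and resolver caching can hide the rotation entirely, which is part of why this approach is hard to rely on.

    # Illustrative sketch: resolve a (hypothetical) round-robin DNS name a few
    # times and show the address list returned for each lookup. Caching by the
    # local resolver can mask any rotation.
    import socket

    NAME = "maps.example.edu"  # hypothetical cname / round-robin record

    for i in range(3):
        try:
            canonical, aliases, addrs = socket.gethostbyname_ex(NAME)
            print(f"lookup {i + 1}: {addrs} (canonical: {canonical})")
        except socket.gaierror as err:
            print(f"lookup {i + 1} failed: {err}")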

The next step would have been to attempt to use the load-balancing specifications within ArcIMS itself. But by this time we were two months past our contract deadline, and the real-world consequences of the Code Red worm made it clear that a cross-connected Spatial Server configuration – one requiring hand-initiated restarts at certain stages, in an environment of at least weekly reboots for security patches – was not feasible with our limited staff resources.

We simply didn’t have the time, tools or know-how to solve all the problems.

Conclusion

Being on the leading edge of implementing new features of a product always carries risks. We designed the system before ArcIMS 3.1 was brought to market, so we had only pre-release documentation to work from. At the time we were attempting to implement the multi-tier, multi-server features of ArcIMS 3.1, the product had just come to market. An understaffed Esri technical support was not prepared to deal with the quantity of calls and the complexity of issues arising from the real-world diversity of network configurations. We worked in a period when Tech Support calls often took two weeks for a call back and then more weeks to resolve. It was the worst of times. Since then, a very active user-support group has taken up a lot of the support slack on arconline.Esri.com.

We believe that, given our experience with ArcIMS 3.1 over the past year, we could now implement our design for a scalable, fault-tolerant multi-tier, multi-server system – though we might need to take a different approach to load balancing. Given the problems we were asked to address, the tools available, and the time frame for implementation, we would still have chosen the hardware and software route we took. The real problems we had were immature software, a lack of time to prepare a production service without development resources, and a lack of expert guidance in addressing our technical questions. We would love to try this again, but only if we have both the time and the money to do it right.


Clark A. Roberts
Manager of System Services
University of Illinois at Urbana-Champaign
College of Agriculture, Consumer and Environmental Sciences
Department of Agricultural and Consumer Economics
326 Mumford Hall, 1301 W Gregory Drive
Urbana, IL 61801
Phone: 217.333.5513
Fax: 217.333.5538
crobrts1@uiuc.edu

Carolyn S. White
Program Coordinator
University of Illinois at Urbana-Champaign
College of Agriculture, Consumer and Environmental Sciences
Information Technology and Communication Services
207 Taft House, 1401 S. Maryland Drive
Urbana, IL 61801
Phone: 217.333.6751
cswhite@uiuc.edu