Improvements in performance will go a long way toward making ArcStorm a truly functional transaction management and storage facility for spatial data. This paper presents the results of numerous benchmarks that show how dual or multi-CPU servers may help, along with tips that may help in the development of your own ArcStorm PARCEL Management system.
Introduction
Because the Louisville & Jefferson County Information Consortium's (LOJIC) property database badly needed reworking, and ArcInfo 7.0 was about to be implemented, the Property Valuation Administration (PVA) GIS PROPERTY database was redesigned in 1995. Originally housed in the ArcInfo Librarian database subsystem, the PROPERTY library consisted of 990 tiles. Now residing in ArcStorm 7.03, the database consists of 45 tiles representing approximately 300,000 land parcels. The data was originally digitized from various PROPERTY block maps of differing scale and accuracy, and graphically adjusted to fit the existing planimetric basemap data residing in the Consortium's GIS. (For a description of the PROPERTY database schemas, see Appendix A.)
The primary goal of the PVA PROPERTY Management System is to update and maintain property data that accurately describes the ownership, shape, and location of land parcels in Jefferson County for tax assessment purposes. Faced with a software upgrade to ArcInfo 7.0, a parcel database and management application originally designed in 1991, and a much needed reworking of the original parcel database design, LOJIC had to consider a transition to the new ArcStorm Spatial Data Management System.
At that time, the data was far from user friendly. A complex coding system assigned descriptive line attributes in 127 categories covering various combinations of block and lot lines. To symbolize the data coherently, an equally complex lookup table was needed to determine the type of line being worked with. For example, a code of 1 meant the line was a street right of way, while a code of 14 meant the line simultaneously represented a railroad right of way, a subdivision boundary line, and a tax block boundary line. Making matters worse, much of this information was repeated in other GIS layers in the LOJIC GIS libraries.
After almost 10 years of collecting and refining the GIS, LOJIC data was maturing and the number of users was increasing rapidly. It was becoming increasingly important to provide Consortium members with easy access to LOJIC data. Our goals were shifting from a focus on data creation to a focus on our clients' need for more user-friendly data and easier access to it. To meet this responsibility it was necessary to redesign the PROPERTY database. Only after lengthy investigation was ArcStorm chosen as the new PROPERTY Management System. By highlighting the challenges experienced during these processes, this paper describes practical development practices and methods that may benefit anyone wishing to migrate to a new spatial data management system.
ArcStorm was chosen as the PARCEL Management System for several reasons. Because as many as 8 PVA staff require concurrent access to update Jefferson County's PARCEL data, concurrent transaction management is critical. The ability to lock individual features, without extracting an entire tile and making its extraneous features unavailable, meant less processing and scheduling time. In addition, system recovery mechanisms would allow the database to be returned to a consistent state if system problems occurred during transaction check-ins.
Benefits and Challenges of ArcStorm
Prior to making the decision to migrate from the ArcInfo Librarian system, a partial copy of the PROPERTY database, representing about 30,000 parcels, was loaded into ArcStorm 7.0 for integrity and performance testing by LOJIC staff. Working closely with PVA, LOJIC staff spent 2 months developing the initial application interface and testing ArcStorm performance and integrity before committing to the migration.
This testing was followed by 6 months of client interviews, demos, and application development before the new ArcStorm PROPERTY Management application was rolled out to PVA for preliminary use.
The original property data consisted of three layers. The primary layer, the parcel polygons, carried both arc and polygon attributes. The second layer consisted of historic and block boundary lines, many of which were duplicated in the PARCEL layer. The third layer, called tax areas, represented administratively merged parcel polygons. Because the tax area data could easily be derived from the existing PARCEL layer, the decision was made to drop it. With the exception of block lines that intersect rights of way, block boundaries could also be derived from the PARCEL layer. The line attributes were simplified as well, leaving only two data layers: the PARCEL layer, and the historic and non-coincident block line data, called the PROPLINE layer. The resulting database schemas are shown in Appendix A.
Implementing the new database design proved to be a challenge. First, the data had to be modified to meet the objective of providing spatial information that was easy to interpret and use. Beyond the initial problem of implementing the changes to the parcel data itself, it was immediately evident that the continuous nature of the right of way polygon was going to be a problem. The 990 tile Librarian grid was used to isolate the right of way, thereby limiting the impact of transaction check-outs. Block boundary lines were also taken from the PROPLINE layer and used to further isolate the impact of rights of way. Because Jefferson County contains some 300,000 parcels spread across approximately 385 square miles, various tiling structures were tested and tuned for optimum check-out and check-in times. After numerous tests, the data was partitioned by feature density into 45 square and rectangular tiles.
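The paper does not describe the exact procedure used to arrive at the 45 density-based tiles. One common way to produce square and rectangular tiles whose size follows feature density is recursive bisection: split a tile along its longer side until no tile holds more than a chosen number of features. The following is a minimal sketch of that idea only, not the method LOJIC used. It assumes features are reduced to (x, y) points, and the names `Tile`, `split_by_density`, `max_features`, and `max_depth` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """Axis-aligned rectangular tile (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def split_by_density(tile, points, max_features, depth=0, max_depth=32):
    """Recursively bisect `tile` along its longer side until no tile holds
    more than `max_features` points. Returns a list of (tile, points) pairs.

    `max_depth` is a safety stop so that many coincident points cannot
    cause unbounded recursion.
    """
    if len(points) <= max_features or depth >= max_depth:
        return [(tile, points)]

    width = tile.xmax - tile.xmin
    height = tile.ymax - tile.ymin
    if width >= height:
        # Split with a vertical cut through the middle of the tile.
        mid = (tile.xmin + tile.xmax) / 2
        first = Tile(tile.xmin, tile.ymin, mid, tile.ymax)
        second = Tile(mid, tile.ymin, tile.xmax, tile.ymax)
        first_pts = [p for p in points if p[0] < mid]
        second_pts = [p for p in points if p[0] >= mid]
    else:
        # Split with a horizontal cut through the middle of the tile.
        mid = (tile.ymin + tile.ymax) / 2
        first = Tile(tile.xmin, tile.ymin, tile.xmax, mid)
        second = Tile(tile.xmin, mid, tile.xmax, tile.ymax)
        first_pts = [p for p in points if p[1] < mid]
        second_pts = [p for p in points if p[1] >= mid]

    return (split_by_density(first, first_pts, max_features, depth + 1, max_depth)
            + split_by_density(second, second_pts, max_features, depth + 1, max_depth))


if __name__ == "__main__":
    # Synthetic data: a sparse county-wide scatter plus one dense cluster.
    rng = random.Random(0)
    pts = [(rng.uniform(0, 20), rng.uniform(0, 20)) for _ in range(500)]
    pts += [(rng.uniform(8, 10), rng.uniform(8, 10)) for _ in range(1500)]
    tiles = split_by_density(Tile(0, 0, 20, 20), pts, max_features=100)
    print(len(tiles), "tiles; largest holds", max(len(p) for _, p in tiles), "points")
```

Dense areas end up with many small tiles and sparse areas with a few large ones, which keeps the amount of data touched by a single check-out roughly even. In practice the per-tile cap would still have to be tuned against measured check-out and check-in times, as the paper describes.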
The next step was to measure performance. Transaction check-out and check-in times were measured over the WAN, with the ArcStorm database residing on a remote server. Due to a shortage of disk space, preliminary testing was limited to a partial copy of the PROPERTY database consisting of approximately 30,000 parcels. Preliminary benchmarks looked good. Scheduling and locking went smoothly, and the nuts and bolts of working with property data in an ArcStorm environment were being worked out. Attention soon shifted from preparing the data for ArcStorm to application development. Many challenges followed. Various difficulties with ArcStorm were solved with creative programming but, as of this writing, others are still pending. A few of the challenges faced while migrating to ArcStorm are outlined below:
- Transactions involve two layers, both with annotation: the primary PARCEL layer and the secondary PROPLINE layer used to archive historic property lines. The PROPLINE layer may have areas that contain no features when overlaid with their related parcels. In this case it is necessary to copy the .bnd file of the transaction's primary PARCEL layer to the transaction's historic layer. Oddly, when two or more users check out an area with no historic features, ArcStorm always schedules the locks to tile_1, regardless of where the transaction falls in the tile grid.
- During the locking phase, users who wish to view parcel data residing in the ArcStorm database experience a "wait state" in which nothing happens. This occurs even if the view is well away from the area being locked or unlocked. Users accessing data on the ArcStorm PROPERTY database server for view and query could experience waits in excess of 20 minutes during locking activity. To alleviate this problem, the property data is copied nightly into a Librarian structure specifically for view and query access.
- After PVA staff had been on line with the new ArcStorm PARCEL Management system for a couple of months, locking of unrelated features started to occur and became more and more frequent as time went on. As a result, ArcStorm client processes would bottleneck, and the asmaster would have to be killed and restarted. Monitoring transaction activity in Schemaedit showed that non-adjacent tiles, and unreasonably high numbers of tiles, were being locked during check-out. Further analysis showed that duplicate object IDs existed on distinctly different objects across the database. To correct the problem the database had to be rebuilt. To prevent it from recurring, the object IDs for all feature classes are now calculated to 0 prior to check-in.
- For reasons unknown, the database periodically goes into an inconsistent state. RECOVERDB must be used to return the database to a consistent state.
- For reasons unknown, the wservice sometimes dies, leaving an orphaned asmaster. The orphaned process must be killed before the wservice can be restarted. These episodes became less frequent after the ArcStorm server was moved to a dual CPU machine.
- Transaction recovery during check-in is sometimes unable to return a message that the check-in was successful, causing confusion as to whether the transaction succeeded.
- Generally poor check-out and check-in performance.
Continued poor performance led to extensive benchmarking on different system configurations to assess performance and stability. Because some ArcStorm processes are extremely I/O intensive while others are CPU intensive, the optimum platform was expected to be one with two or more CPUs and very fast disks. The results of these benchmarks follow.
Benchmarks
Benchmarks were performed against the ArcStorm PARCEL database on 4 different system configurations.
Client processing time was subtracted from transaction check-out and check-in times to gauge the performance of the ArcStorm server itself more accurately. The tables below list server times for transaction check-outs and check-ins. To ensure consistency, the following rules were applied:
- Check-in times were adjusted by subtracting 30 seconds from all times to account for processing that takes place locally on the client machine.
- To eliminate potential CPU usage by extraneous server processes, all testing was done after hours, while CPU usage was flat and no other processes were running against these machines.
- To keep client performance constant and accurately measure the PVA ArcStorm server's performance, all testing was done on the same SPARC Station IPX machine.
Hardware configurations tested include:
- ULTRA SPARC, Model 170, 33 MHz, Single CPU
- SPARC Station 20, Model 71, 75 MHz, Single CPU
- SPARC Station 10, Model 41, 33 MHz, Dual CPU
- SPARC Station 10, Model 20, 33 MHz, Single CPU
Transaction logs accumulated since PVA went on line with the ArcStorm database server were analyzed to determine the actual processor time spent on the server during check-out and check-in processing. With the exception of Graph D, all graphs show time in minutes plotted over the number of days that the original, single CPU ArcStorm server was on line at PVA. Graph D illustrates the difference between processing times on the original PVA ArcStorm single CPU server and on the dual CPU server now used. This data was tabulated and graphed in four categories, described below:
- Graph A: Total transaction times on the original PVA ArcStorm single CPU server.
- Graph B: Total check-out time on the original PVA ArcStorm single CPU server.
- Graph C: Total check-in time on the original PVA ArcStorm single CPU server.
- Graph D: Total transaction times on the original PVA ArcStorm single CPU server compared to the currently used dual CPU PVA ArcStorm server.
(Graphs B and C represent a user's point of view. If three transactions start at the same time and all finish in 15 minutes, these graphs show three 15-minute transactions.)
Performance Graphs
Graph A
Graph A shows average total PVA ArcStorm server transaction processing times for both data check-outs and check-ins. This data was derived from the actual PVA transaction logs and shows time in minutes over the number of days PVA's ArcStorm database was in use on the single CPU server. Average transaction time is approximately 8.5 minutes. The extreme spikes showing abnormally long transaction times appear to correlate with periods when bottlenecks occurred due to unusually high numbers of simultaneous transaction processes.
Graph B
Graph B depicts total ArcStorm transaction times for check-out processing on the original PVA single CPU ArcStorm server. Again, the extreme spikes appear to correlate with periods when unusually high numbers of simultaneous transactions were occurring. The variance in transaction times is due to fluctuating numbers of simultaneous transactions.
Graph C
Graph C shows total ArcStorm transaction times for check-in processing on the original PVA ArcStorm single CPU server. Again, the extreme spikes appear to correlate with periods when unusually high numbers of simultaneous transactions were occurring. The variance in transaction times is due to fluctuating numbers of simultaneous transactions.
Graph D
The final graph, Graph D, shows total transaction time for both check-in and check-out processing on the original single CPU PVA ArcStorm server and on the new, temporary replacement dual CPU ArcStorm server. Times were plotted in minutes over a 20 day period. The first 10 days show the original single CPU server times. The remaining 10 days show processing times on the dual CPU server that is being used until a permanent solution is identified. This graph is the most revealing, as it shows a very distinct improvement in transaction processing time since the installation of the dual CPU ArcStorm server.
ArcStorm Server Performance Measurements Under Increasing CPU Load
As stated previously, the graph information was derived from the ArcStorm server logs and shows actual computer processing time. From the user's point of view, times may not correlate directly because of many factors, including network traffic, sub-standard workstations, multi-tasking, and other daily tasks that can affect the performance of the local machine.
To gauge ArcStorm server performance under increasing load, the following benchmarks were performed with a SPARC Station IPX workstation as the client. They show average times for 1 to 3 simultaneous transactions during check-out and check-in processing. Currently, up to 7 transactions can occur simultaneously, one for each PVA client. That scenario is extremely rare and difficult to model, and analysis of PVA logs shows that the average maximum number of simultaneous transactions is 3, so 3 simultaneous transactions were chosen as the upper limit for these benchmarks. Times are in minutes:seconds.

Three Simultaneous ArcStorm Transactions
Server                 Check-Out Times     Check-Out Avg.   Check-In Times        Check-In Avg.
ULTRA SPARC            1:48, 2:40, 2:59    2:27             7:40, 9:10, 13:40     10:00
SPARC 20               4:57, 4:58, 5:09    5:01             5:00, 6:17, 7:55      6:24
SPARC 10, Single CPU   7:04, 7:05, 7:06    7:05             10:23, 13:33, 17:00   13:38
SPARC 10, Dual CPU     1:55, 2:19, 3:10    2:28             4:40, 6:33, 7:55      6:03

Two ArcStorm Transactions
Server                 Check-Out Times     Check-Out Avg.   Check-In Times        Check-In Avg.
ULTRA SPARC            1:40, 2:20          2:00             6:56, 8:16            7:18
SPARC 20               3:28, 3:31          3:30             4:15, 5:30            4:53
SPARC 10, Single CPU   6:00, 6:02          6:01             8:50, 11:37           10:14
SPARC 10, Dual CPU     2:05, 2:15          2:10             3:19, 4:25            3:52

One ArcStorm Transaction
Server                 Check-Out Avg.   Check-In Avg.
ULTRA SPARC            1:10             3:30
SPARC 20               1:55             3:00
SPARC 10, Single CPU   4:00             9:00
SPARC 10, Dual CPU     2:30             4:30

Check-Out Benchmark For Local Processing Times
With one exception, the client (local) processing time used during ArcStorm transaction processing is insignificant. The exception occurs during the selection process, when the client chooses the area of interest to be extracted from the ArcStorm database. Client processing during this step varies greatly with the client system. The following 4 commonly used system configurations were measured:
- SPARC 20: 1:15
- SPARC 5: 1:32
- SPARC IPX: 4:32
- SPARC IPC: 5:20
Conclusion
The benchmarks indicate that multi-CPU servers are the better choice for ArcStorm processing. The dual CPU server is approximately twice as fast as its single CPU counterpart, although performance degrades gradually under increasing load. A dual processor SPARC 20 or ULTRA SPARC, or a multi-processor server, should be expected to perform best, but will also show some degradation when transactions are processed simultaneously.
Another factor affecting the performance of the ArcStorm PARCEL database tasks performed at PVA is sub-standard workstations. Currently only one PVA staff member has access to a SPARC Station 5, Model 70 machine, the minimum system recommended by Esri.
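The "approximately twice as fast" figure can be checked against the SPARC 10 averages reported in the benchmark tables. The sketch below assumes the reported times are in minutes:seconds. It averages a set of individual times and computes the ratio of single CPU time to dual CPU time at each load level. The helper names (`to_seconds`, `to_mmss`, `average`, `speedup`) are illustrative and not part of any ArcStorm tooling.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '7:05' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def to_mmss(total_seconds):
    """Format a number of seconds as 'm:ss', rounded to the nearest second."""
    total = round(total_seconds)
    return f"{total // 60}:{total % 60:02d}"


def average(times):
    """Average a list of 'm:ss' strings and return the result as 'm:ss'."""
    return to_mmss(sum(to_seconds(t) for t in times) / len(times))


def speedup(single, dual):
    """Ratio of single CPU time to dual CPU time (values above 1 favor dual)."""
    return to_seconds(single) / to_seconds(dual)


# Reported SPARC 10 averages, keyed by number of simultaneous transactions.
# Each value is (check-out average, check-in average).
SINGLE_CPU = {1: ("4:00", "9:00"), 2: ("6:01", "10:14"), 3: ("7:05", "13:38")}
DUAL_CPU = {1: ("2:30", "4:30"), 2: ("2:10", "3:52"), 3: ("2:28", "6:03")}


if __name__ == "__main__":
    # Individual SPARC 20 check-out times for three simultaneous transactions.
    print("SPARC 20 check-out average:", average(["4:57", "4:58", "5:09"]))
    for load in sorted(SINGLE_CPU):
        out_ratio = speedup(SINGLE_CPU[load][0], DUAL_CPU[load][0])
        in_ratio = speedup(SINGLE_CPU[load][1], DUAL_CPU[load][1])
        print(f"{load} transaction(s): check-out {out_ratio:.2f}x, "
              f"check-in {in_ratio:.2f}x")
```

On the reported averages the ratios fall between about 1.6 and 2.9, which is broadly consistent with the stated conclusion. These are ratios of averages from a small number of runs, so they indicate a trend and are not a precise measurement.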
Upgrading PVA staff workstations, in addition to using a dedicated dual or multi-CPU server, may go a long way toward increasing the productivity of PVA's GIS staff.
Many lessons about application and database development were learned through this experience. Foremost is the need to emulate, as closely as possible, how the client will use the data. Client load and database size can significantly affect the performance of any system. If new technology causes a significant drop in performance, weigh carefully any decision to migrate to a new platform. The investment in development time is too costly to turn out a product that adversely affects productivity. As a general rule, as software development progresses, the need for newer, faster, and more powerful hardware becomes something to consider. The success of LOJIC's conversion of the property data from a Librarian system to an ArcStorm system depends heavily on the ability to acquire appropriate hardware.
If data integrity is an issue in your current configuration, then a minor decrease in performance may be acceptable. In this respect, ArcStorm holds many benefits over ArcInfo Librarian. Database recovery mechanisms ensure that the data will revert to a consistent state if a transaction fails to check in successfully, and performance decreases can be remedied if the budget allows for improved hardware. In PVA's case, productivity has increased. To increase productivity even more, the ArcStorm PROPERTY data server was changed from a single CPU machine to a dual CPU machine. Plans are under way to acquire a dedicated dual or multi-CPU server, and client machines are being upgraded as well.
Acknowledgments
Special thanks to the property mapping staff at the Property Valuation Administration of Louisville and Jefferson County, KY, for their valuable feedback and help in making this project a success.
Appendix A
Christi Stevens
GIS Analyst
LOJIC
700 W Liberty St
Louisville, KY 40203-1913
(Phone) 502-540-6383
(Fax) 502-540-6562