Using ArcSDE Versions in the Real World

James Ewing, Ewing Consulting


Abstract

This paper describes the Palm Beach County Property Appraiser's implementation of a feature management system using an ArcSDE-versioned geodatabase. The Appraiser built a feature management application to leverage the built-in strengths of versions to meet the users' needs for strict feature locking, published versus working versions, and robust check-in/check-out control. The final application wraps ArcSDE versioning in a user-friendly interface where all ArcSDE housekeeping is handled programmatically. The paper will discuss the Appraiser's requirements and detail the application that was put into production to meet those needs.


Introduction

The Palm Beach County Property Appraiser is in the process of converting its GIS from GDS to ArcGIS. The target implementation is a multi-feature ArcSDE Geodatabase stored in an Oracle 8i relational database. As part of the implementation I was assigned to evaluate the various Versioning scenarios available in ArcSDE to select the most effective way to meet the Appraiser’s requirements for performance, flexibility, and control. This paper discusses the Appraiser’s system requirements, the results of the Versioning evaluation, and the resulting application architecture to meet the requirements.

In summary the Versioning work flows supported by ArcSDE do not provide a sufficiently robust control to meet the needs of the Appraiser’s application. They fall short in two main areas: performance and flexibility. To address these shortcomings it was decided to build the application using feature level attributes to control geodatabase aspects such as history and point-in-time snapshots and to use a single default Version to provide work-in-process visibility and maximize long-term geodatabase performance.

As a note in terminology the word “version” can be one of those confusing terms. In this paper “Version” capitalized will indicate an SDE Version that applies to the geodatabase. Lowercase “version” will mean a generic version such as the shape and attributes of a single feature at a specific time.

Palm Beach County Property Appraiser Requirements

Palm Beach County, in southeastern Florida, is the largest land area county east of the Mississippi. Covering about 2,500 square miles it extends from the Atlantic coast to the center of the state. Its developed area includes the condominium-covered coast, a large urban/suburban developed area, the western agricultural area, and several large wildlife resource areas. The county has about 545,000 total parcels of which 278,000 have a land component. In addition to valuation the Property Appraiser is responsible for creating, maintaining, and managing the parcel base map of Palm Beach County.

The Appraiser’s GIS conversion effort has several top-level goals to facilitate the map maintenance process. These are:

1) To provide a means of tracking work-in-process and an application controlled environment for data quality assurance and production data input. Approximately 10,000 map changes are expected per year.

2) To provide seamless integration between the GIS applications and the Appraiser’s CAMA database with minimal duplication of data.

3) To allow a user to retrieve an historical view of the certified tax roll for past years. Five years of map data to be maintained on-line.

4) To allow point-in-time views of map status at tax roll events during the tax year or at a specific date.

5) To support multiple edits to a single feature maintaining the correct chronology of feature changes including the ability to edit work-in-process features from different edit sessions prior to the existing edits being posted to production.

Of these requirements all except 2) involve some form of version management to successfully implement.

To illustrate some of these requirements and their interaction with SDE Versioning I’ll use the following scenario.

Figure 1a

Figure 1a shows two adjacent parcels as they exist in the most recent certified tax-year.

Figure 1b

In Figure 1b parcel 101 is split and half of the parcel is subdivided.

Figure 1c

In Figure 1c parcel 102 is split and half the parcel is subdivided.

Figure 1d

Figure 1d shows the remains of both parcels combined and subdivided.

This scenario describes the sequential removal of the two parent parcels during the work process for the subsequent tax year. Neither parcel survives into the next year but both undergo significant changes during the year that may affect ownership, valuation, or addressing.

Before going into detail about the Versioning consideration for each of the top-level requirements I’d like to briefly review how SDE Versioning works and the different types of Version work flows.

Versioning Mechanics

When you define a feature class in ArcCatalog you are defining the ArcSDE Business table for that feature class. This table can be the user defined geodatabase feature class or attribute table. This is the feature class or table that is worked with at the user level. However, to manage geometry and Versioning the feature class has several supporting tables that are related back to the business table through some ArcSDE system tables. A business table has a single row entry in the SDE.LAYERS system table and a single row entry in the SDE.TABLE_REGISTRY system table. The LAYER_ID column in the LAYERS table is used to associate the feature class with its geometry in the FEATURE (“F”) table. From TABLE_REGISTRY the REGISTRATION_ID is used to associate the feature class with its Versioning Add (“A”) and Delete (“D”) tables. As a simplified example the TESTPARCEL feature class could have the following relationships shown in figure 2.

Figure 2

In this example the feature class TESTPARCEL has a Layer_ID of 27 and a Registration_ID of 46. The geometry for this feature class is stored in the Feature Table named F27. All adds and deletes to this feature class, even if they are to the default Version, are stored in the A46 table for adds, or the D46 table for deletes. The “A” table has the same structure as the base feature class with the addition of the SDE_State_ID column. The geometry for Add table features is stored in the “F” table.

When you add a new feature in ArcMap the tabular information is stored in the “A” table and the geometry in the “F” table. This happens regardless of whether you are editing the default Version or a child Version. Similarly, when a feature is deleted the reference to that feature is put into the “D” table. These “A” and “D” table rows remain until the Version is posted back to the default Version and the database is compressed. If working in the default Version no posting is required.

The SDE.VERSIONS and SDE.STATES system tables are used to identify which features are appropriate for display for which Versions. In short, when displaying a particular Version, the system selects everything in the base Version plus everything in the “A” table with an appropriate SDE_STATE_ID less those features referenced in the “D” table. When a new Version is created it assumes the State_ID of its parent Version until an edit is made. At this point the new Version starts generating new State_IDs for each edit session that occurs on that Version.

When a Version is selected the SDE.VERSIONS State_ID column contains the latest state referenced within that Version. This State_ID is related into the SDE.STATE_LINEAGES system table to determine the lineage of States represented by that Version. Every State within that lineage is valid for the selected Version, those not in the lineage are not valid for the Version. The state lineage is compared against the “A” and “D” tables’ SDE_STATE_ID and DELETED_AT to determine the features that are included in each State, and by association, the selected Version.

Versioning Scenarios

In 'Modeling Our World' (Michael Zeiler, Esri Press, 1999) five possible Versioning workflows are presented. They are: Direct Editing – the maintenance of a single Version of the database; a Two-level tree – a series of single level of Versions each of which is based on the default Version; a Multilevel tree – where child Versions can themselves have children; Cyclical – a single chain of Versions each of which has a single child that represents a step in a process; and Extended history – a series of Versions each of which represents the state of the geodatabase at a certain point in time. These Versioning scenarios are represented in figure 3a and 3b.

Figure 3a

In the Direct editing workflow all edits are applied to the default Version. This has the benefits of making all work-in-process visible to all users of the geodatabase and keeping the geodatabase as compact as possible. Its drawbacks are that, without special processing, the work in process cannot be hidden from any users and that previous versions of features cannot be maintained.

With the Two-level tree scenario each child Version is based on the base default Version. This scenario will support persistent feature versions but with the drawback that the changes made in earlier Versions will not be available to later Versions until the earlier Versions are reconciled and posted to the default Version.

A Cyclical or Step Tree Version scenario has all Versions chained off of each other in a single parent child hierarchy. All work-in-process from earlier Versions is visible to later Versions and intermediate Versions can be posted back to the default Version as the work is completed. Intermediate Versions cannot be deleted until their child Versions have been removed. Also posting child Versions also posts the work done in the parent Versions so having incomplete work in a parent Version precludes posting out the Version string until all work is complete. This is the workflow of choice if later data modifications are dependent upon earlier work.

Figure 3b

The Multilevel tree workflow supports an indeterminate number of Versions based on the default Version each with an indeterminate number of children down through the hierarchy. Once again the work-in-process in one Version is not visible to other Versions. This model is the most flexible for a workflow that is non-overlapping.

Last the Extended history workflow creates snapshots of the geodatabase at specific points in time. This allows the SDE Versioning software to maintain as many historical views of the geodatabase as the user requires. The primary drawback to this scenario is that the “A” and “D” tables are never flushed during database compression.

These are just the basic Versioning scenarios possible. They can be mixed and matched to meet the needs of the specific application.

Versioning and the Palm Beach County Appraiser Requirements

The Appraiser’s workflow is based on the processing of individual documents. A document might be a deed, a recorded plat, a municipal resolution or any number of other types of documents. All documents share two things in common: 1) they are “legal” either through being filed with the Courts or through administrative procedure; and 2) they result in a change to the map. In the Appraiser’s application each document is tracked and processed individually or as part of a group of related documents. Each document or group of documents follows an appropriate process from identification to data capture and edit, to QA and finally to posting. While, in general, documents are entered in chronological order they are not necessarily processed in that same order. This can be due to anything from a title or abstract problem to a later document requiring “rush” processing. From a business workflow each document or group is handled independently but a later document may sometimes be dependent on an earlier document such as with the case of shared parent parcels. Lastly a document may have problems preventing its process being completed and forcing that document to be moved into a subsequent tax year.

Coupled with these workflow issues are a couple of time dependencies. Some Appraiser departments will begin processing data for the NEXT tax year while other departments are still processing the CURRENT tax year. This requires two active work areas (at the least). Also the end of a tax year does not limit changes for that tax year. These changes can come about due to owner petitions, litigation, or discovery of a processing or valuation error. Occasionally these changes can happen a year or more later and will affect subsequent tax years.

With this business workflow in mind let’s review the application requirements that have a versioning component and see how they will work within the Versioning scenarios just covered. The application elements appropriate to a versioning scheme were:

1) To provide a means of tracking work-in-process and an application controlled environment for data quality assurance and production data input. Approximately 10,000 map changes are expected per year.

3) To allow a user to retrieve an historical view of the certified tax roll for past years. Five years of map data to be maintained on-line.

4) To allow point-in-time views of work-in-process by events during the tax roll calendar or by a specific date.

5) To support multiple edits to a single feature maintaining the correct chronology of feature changes including the ability to edit work-in-process features from different edit sessions prior to the existing edits being posted to production.

Item 1) would be best supported by a Two-level tree Versioning scheme. This would allow each piece of work to be individually tracked and reviewed at any time. Problems within one Version will not affect subsequent Versions so there is no impediment to posting completed work into the default Version as it is completed and checked.

Item 3) is handled by the Extended history scenario. Each tax year becomes it’s own version at the end of the year. Subsequent work occurs in a new version. The thought here was to have a “Working” Version for all processing during the tax year which is posted back to the default Version at the end of the year. A “TaxYear” Version is then defined as the state of the current default Version. Any post-close changes can be posted to the TaxYear Version or to a subsequent Final-Final Version as appropriate.

Item 4) can not be directly handled by ArcSDE Versioning. Versioning is a discreet incremental concept and not a continuous function. To meet this requirement an acceptable Version increment would need to be defined, i.e. daily or weekly, and a new Version created at each of those intervals. The Tax Year processing events could be handled in the Extended history workflow.

Lastly, requirement 5) can be met only by a Versioning scheme in which all edits are applied to a single Version. This could be either the Direct edit or the Extended history model.

The Two-level and Multilevel tree models will not work because of the requirement that work-in-process be available to all Versions. If there is work being done in one Version and related work in another neither is visible to the other unless each is completed in turn and posted back to the shared parent. Additionally if a new document comes through that builds on the work done in both earlier Versions then both must be completed and posted back to a shared parent before the new work can be started. The Two-level tree, Multilevel tree, and Cyclical models also fail to meet the business workflow need that work does not need to be completed sequentially.

By elimination the Appraiser was left with a modified Extended History scheme for versioning where prior tax years be saved as separate Versions. There would be a single “Working” Version for the current tax year that would be posted back to the default Version once a year. The significant tax year events and the intra-year snapshot Versions would be based on the “Working” Version and would persist as long as needed. The “Future” tax year Version would also be based on the “Working” Version. Future work-in-process at the end of a tax year would be programmatically rolled into the new “Working” Version. The conceptual model is shown in figure 4.

Figure 4

With this working model in mind a series of proof-of-concept and benchmark tests were run to sort out the implementation details. What became apparent was that the level of complexity of this Versioning scheme quickly became unmanageable, the system’s performance would degrade to unacceptable levels, and the programming resources required to implement an ArcSDE Version based application would be significantly more than building the same functionality without Versions.

Why It Didn’t Work

The Version based scenario failed due to three primary reasons.

1) ArcSDE Versions are not versions they are collections of states.

2) In a geodatabase with a large number of Versions the number of states and supporting database rows will increase much faster than the number of features.

3) It’s hard to programmatically manage a large number of versions.

Let’s take a more detailed look at these.

An ArcSDE Version is a collection of states. When a Version is created it assumes the state of its parent version. As changes are made in the new version two things happen. First, its state changes to the current state and, second, new states are added as each change is made increasing the state lineage. The parent and child Versions are now representing (at least) two different states. If a change is now made to the parent Version that change does not automatically propagate to the child Version because the child Version isn’t based on the parent version. It’s based on the state the parent Version had at the time the child was created. Figure 5 illustrates this.

Figure 5

In the context of the Property Appraiser’s application this precludes having an easy way to get post-close changes from a prior tax year into the current working Version or into any subsequent dependent Versions. To propagate changes forward the application will have to programmatically determine Version dependencies and insert the changes to make them show up in all the appropriate Versions. If the changed feature has been modified in a subsequent Version then the propagation will have to terminate to avoid corrupting existing data. In addition any changes made in the current “Working” version will have to be moved into the future “Working” version once that future Version becomes active.

In a geodatabase with a large number of Versions the number of states and supporting rows will increase much faster than the number of features. In a “normal” situation changes are made to the geodatabase which increases the number of states. As part of the normal processing the geodatabase is regularly compressed to remove unreferenced states and unneeded features. However, if a Version is created for historical or archive purposes all states and features will be preserved to maintain that Versions validity. These states and features will persist for as long as they are referenced by any Version even if they subsequently become unused. In the Appraiser’s Versioning model it was anticipated that there would be three “Certified” tax roll versions, another three Versions based on the tax roll process events, and at least 52 “Work-in-Process” Versions per tax year. There will also be one current and one future working Version. In the fully developed geodatabase containing five years of data there will be, at a minimum, 292 Versions. Depending on the level of work at the time each Version was created it may reference a significant number of unneeded elements.

At this point we made some assumptions to determine the impact of these archive Versions on the application’s performance. It must be remembered that persistant Versions not only preserve state information they preserve “A” and “D” table entries even if those entries are subsequently posted to the default Version. In an RDBMS environment table joins on large tables are expensive. In our geodatabase environment the primary spatial index search against the “F” table should be very quick. However, the results of that query will need to be joined into the Version management tables which will add a significant processing overhead. To estimate the impact the growth in the Version side of the query would have we assumed that on average each change (10,000 total per year) would require two edit sessions (20,000 total) and that 25% of those edit sessions (5,000) would span a Version of some kind. The 10,000 changes would be 5,000 new features, 4,000 changed features and 1,000 deleted features. Each new feature would therefore result in two “A” table entries and one “D” table entry. One add for the initial add and then one delete and add for the subsequent modification. Each Change would result in four new entries, one add and delete for the initial change and one add and delete for the subsequent change. The deleted features would each require just one “D” table entry. The total number of new feature entries is 5,000 for the features and 10,000 for the changes, of those 2,500 (25%) will persist. For the changes there will be 8,000 entries for the changes and 8,000 for the subsequent edits, of those 2,000 (25%) will persist. For the deleted features all 1,000 will persist. This brings us to a total of 23,500 entries per year just to maintain version information. That total translates into 117,500 entries for our five-year database for Version maintenance. This represents about 20% of the total size of the geodatabase solely in data overhead. However there is an additional item of concern. If it is ever necessary to change a single item, say a new set of neighborhood codes, for the entire geodatabase then the size of the geodatabase will immediately triple in addition to the maintenance overhead.

The last problem is how manage all this within the application. The basic management functions such as selecting the appropriate Version for edit or display, creating new Versions, and compressing the geodatabase would be straightforward and easily customized within ArcObjects. The difficulty comes from the forward-posting of changes after new dependent Versions have been created. At the very least a method for reading the Version and state information directly from the system tables would be required. This information would be processed to determine which Versions would be affected by the change. The actual changes would have to be made using ArcObjects and the Editor. This would introduce even more unneeded states and features to the geodatabase but would be safer than trying to force the state information into the Version system tables outside of ArcGIS’ control.

Clearly, while none of these issues is a complete showstopper, together these concerns forced us to look for a simpler and more direct method of addressing the application requirements that would meet the Appraiser’s data management and performance goals.

Measured Performance

To try and quantify the expected performance hits between a Versioned geodatabase and a non-Versioned equivalent a pair of identical feature classes was created. Both feature classes were members of the same feature dataset. Both contained 10,000 identical 100’ by 100’ parcels each of which had the same corner coordinates in both feature classes. The first feature class, named TestParcels, was a single SDE Version where all data was entered into the default Version and the geodatabase was compressed prior to edit testing. The second feature class, named TestParcels2, was built as a series of five Versions each of which contained 2,000 features. Both feature classes had a property column for Tax Year, Creation Date, and Status for testing. The non-Versioned feature class, TestParcels, was used to isolate the effect on performance of additional entries into the “A” and “D” tables without the impact of additional Versions present. The Versioned feature class, TestParcels2, ran the same tests with the added complication of previously existing Versions. For TestParcels2 the previous Versions represented an archival snapshot for a prior tax year. To simplify things no work-in-process Versions were used for the test.

The test was set up to measure the impact of a series of consecutive data updates. In each update 20% of the total features had an attribute modified. The next 20% was then updated and so on until the change to the entire database was complete. The results were surprising.

Figure 6 shows a graph of the results. The Y-axis is in dimensionless time to remove influences such as server or network performance, or database tuning. These factors could change the absolute time data but the relative time will stay pretty much the same. The very big surprise here is that the non-Versioned geodatabase had significantly slower performance than the Versioned geodatabase. In absolute terms the time required for the 0 – 20% update were within a few seconds of each other. But the non-Versioned updates took significantly longer as the total percentage of the database updated increased. The blue line on the chart shows the non-Versioned performance and the red line the relative update speed of the multi-Versioned geodatabase.

Figure 6

The only difference between the two scenarios is that in the non-Versioned geodatabase all primary data resides in the business table, TESTPARCELS. In the multi-Versioned geodatabase only 20% of the data is stored in the business table, the rest is stored in the Versioning “A” table. This points to an ArcSDE optimization issue between the way data is handled in the business table and the Versioning table. To test this I started with an empty feature class and created a Version based upon that empty state. I added the same 10,000 features to the Default SDE Version. However, because there was an existing Version representing the empty state, all data was stored in the feature class’ “A” table. The update test was run and the performance in this scenario was slightly better than that of the multi-Versioned test. This performance is shown in figure 6 by the green curve.

Clearly there is either a design or implementation problem with the handling of updates against the business table. At this time I don’t know if this is a bug in SDE or a design feature due to optimized performance in the version tables.

Data insert and update is only part of the performance consideration. One of the goals to optimize performance is to keep the size of the Versioning “A” and “D” tables as small as possible. This is done by compressing the geodatabase in ArcCatalog. Here the performance results are opposite. In the non-Versioned geodatabase the 10,000 features required 10,000 rows in the business table at the beginning of the test. After the test was run and the geodatabase compressed there were still 10,000 rows in the geodatabase. Compression was fast, just a few minutes. For the multi-Versioned geodatabase the feature class started out with the same 10,000 rows, 2,000 in the business table and 8,000 in the “A” table. However, following update the number of rows doubled. 10,000 for the current features and another 10,000 to support the states in the archive Versions. Compression for this scenario was very slow, over an hour, which might limit the available windows for compression of a large geodatabase in a production environment. The single-Version scenario started with 10,000 rows in the “A” table. Following compression there were still only 10,000 rows as in the non-Versioned model. Compression speed was intermediate between the non-Versioned geodatabase and the multi-Versioned geodatabase at about 20 minutes for the test geodatabase. This may still present a problem for finding an appropriate time window to compress a very large production geodatabase.

The Solution in Palm Beach County

To meet the application requirements for the Palm Beach County Property Appraiser the versioning of the production geodatabase will be a combination of ArcSDE Versioning and true versioning based on feature attributes. The ArcSDE Version model used will be that of an empty dummy “archive” Version to take advantage of the higher performance seen by using the “A” table for primary storage instead of the feature’s business table. To keep the total row count as small as possible all work will be done against the ArcSDE Default Version.

To manage the true versions needed for display of historical tax year maps, tax year calendar snapshots, work-in-process snapshots and work-in-process status maps a set of feature attributes will be populated with key data. These attribute include the tax year the feature became active, tax year the feature went inactive, feature creation date, feature historical date, the feature’s map status, and the job or work unit that a feature is associated with.

Coupling these feature attributes with the ArcMap Layer Definition Query allows a user to display a map for any tax year, at a specific point in time, showing all features of a specific status or associated with a job or work unit. This functionality can be easily controlled from the standard user interface or programmatically.

The complicating factor becomes the preservation of features in history. To do this a control module is needed that allows the user to identify the feature to work with, duplicates that feature to preserve its geometry and attributes and then removes the original from the working layer to let the user work on the copy. When the work is complete another program module will move the original into historical status and makes the edited copy the current active version of the feature. This method will allow work-in-process features from different jobs to be shared as source data to new jobs bypassing one of the major limits of ArcSDE Versioning. It will also allow the all work-in-process to be visible to other jobs in the same area.

Finally, I’d like to step through a couple of scenarios to illustrate how this will work in practice.

To display the basemap of current active features the application will programmatically define the layer definition query to include all features with a STATUS of “C”.

To display the features for a particular tax year the layer definition query is modified to include features with a STARTTAXYEAR value equal to or earlier than the target tax year and an ENDTAXYEAR value of null or greater than the target year. A feature whose STARTTAXYEAR and ENDTAXYEAR values are the same would indicate that the feature never made it to a certified end-of-year status.

For map edits the steps become slightly more complicated. Initially the display would include two layers. The first would be for all current active features. The second for all work-in-process features. This map represents the allowed set of parent parcels available to the user. When the user has identified and checked-out the parent parcels these initial layers become non-selectable and a new layer containing the editable copies of the parent parcels is added to the display.

Summary

While offering the ease-of-use and high performance of a built-in component ArcSDE Versioning falls short in several areas required by the Palm Beach County Property Appraiser’s map maintenance application. Chief among these are Versioning’s inability to share states of multiple parent Versions to a single child and its inability to propagate changes from a parent version to its children automatically. To get around these limitations the Appraiser’s application uses a simple archive Version scheme to take advantage of the performance benefits of ArcSDE Versioning coupled with a set of feature attributes to allow a user to dynamically define a true version of the geodatabase. This combination assures both the performance and flexibility expected by the system’s users.

Acknowledgements

I’d like to thank Gary R. Nikolits, CFA the Palm Beach County Property Appraiser for his kind permission to present this paper and his ongoing support of a really outstanding GIS development project.

Author Information

James Ewing

Ewing Consulting
509 Gulf Road
North Palm Beach, FL 33408

561-840-1536 : voice and FAX
jewing@netster.com
www.netster.com