Managing Selected Sets of Layer Features in ARCPLOT

AUTHORS: Bruce Tepley and Bruce Thomas, Thomas Bros. Maps

DEFINING ISSUE:

Spatial database managers often need to filter and recombine subsets of features from large geospatial datasets. ARCPLOT writeselect files offer a high-performance approach to such filtering. However, they cannot be used directly with ArcInfo Librarian and ARCSTORM layers.

ArcInfo SOLUTION:

The ArcInfo solution presented in this paper is based upon two features of ARCPLOT:

1. Read/Write-select can be performed dependably on INFO keyfiles.

2. The results of these selections on the keyfile can be transferred to the library layer with RESELECT KEYFILE.

METHOD:

A layer item is designated for feature identification as the key item. Other layer items are used for selection criteria.

A separate "reference" INFO table of these items for identification and selection is created by INFOFILE from all features in the library layer.

Whenever a selected set is likely to be reused in logical combinations (AND, OR, XOR), a writeselect file of that selection on the reference table is saved. The READSELECT options are later used for fast logical (Boolean) filtering. The records that remain selected in the reference table then carry, in the key item, the values that are used to select the same set of features in the layer.

Indexed keyfile selection may be used to transfer equivalent record selection sets between the keyfile reference set and the library layer.

SOFTWARE: Implementation examples are in AML.


 

Introduction

 

In ArcInfo, there are many ways of combining feature selection commands to produce a report or plot, but some are more practical than others. Thomas Bros. Maps has used ArcInfo software to produce plot files from high-quality ArcInfo coverages stored in ARC LIBRARIAN. Plotting the data involves selecting and combining sets of coverage feature records (filtering). This filtering must often be done using complex criteria that are applied to a database of frequently updated coverages or tiles.

This paper compares the uses of two fundamentally different kinds of filtering commands:

1. RESELECT, which performs value comparisons on feature items.

2. "Bitmapping," which requires ARCPLOT READSELECT/WRITESELECT.

One can greatly enhance filtering performance with "bitmapping" options. Even a database undergoing regular updating may be filtered more quickly with some of the techniques we will explore.

 

 

Comparing Two Types of Commands

 

The RESELECT command performs value comparisons on specified items in the feature attribute table. With larger feature attribute tables, the time required to complete a command grows with the number of features being filtered. In general, complex selection combinations based on multiple criteria invoke a new comparison pass for each new criterion applied. As a result, selection functions can easily become a performance bottleneck with large coverages, INFO files, or tile layers.

Users familiar with using complex combinations of feature attributes may have observed that a RESELECT command sequence that reduces the "active" set of records the most during early passes performs faster. This is not surprising. Reducing the mass of records for comparison leaves less work for subsequent passes, because they will search on a smaller record set. In effect, subsequent reselect passes use the result of previous passes for a performance advantage, although this tends to occur incidentally.
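The ordering effect described above can be illustrated with a small Python model. This is a sketch only, not ARCPLOT code: the function `run_passes`, the record layout, and the data proportions are all hypothetical, and the model simply counts one value comparison per active record per pass.

```python
def run_passes(records, predicates):
    """Apply each predicate as one reselect-style pass over the active set.

    Returns the surviving records and the total number of value comparisons.
    """
    active = list(records)
    comparisons = 0
    for predicate in predicates:
        comparisons += len(active)          # every active record is compared once
        active = [r for r in active if predicate(r)]
    return active, comparisons


# Hypothetical data: 1 record in 100 is a primary road, 1 in 2 lies in the city.
records = [
    {"CFCC": "A11" if i % 100 == 0 else "A41",
     "CITY": "ESCONDIDO" if i % 2 == 0 else "VISTA"}
    for i in range(10000)
]


def is_primary(r):
    return r["CFCC"].startswith("A1")


def in_city(r):
    return r["CITY"] == "ESCONDIDO"


narrow_first, cost_narrow = run_passes(records, [is_primary, in_city])
broad_first, cost_broad = run_passes(records, [in_city, is_primary])

print(cost_narrow, cost_broad)   # 10100 15000
```

Both orderings select the same 100 records, but in this model the sequence that applies the most restrictive criterion first performs about a third fewer comparisons.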

The READSELECT/WRITESELECT commands store and retrieve the results of a reselection for longer-term use. They depend on selection sets created by a prior RESELECT, yet they can produce record combinations that are very different from those stored in any single WRITESELECT file. This is because READSELECT can do much more than restore selection results: its options, named with logical ("Boolean") terminology, enable very rapid logical combination of stored subsets of records.

A Simple Benchmark

Performances of these two kinds of record selection differ greatly, as even the following simple benchmark will show:

1. Create a coverage using some fixed series of reselections in the ARC RESELECT command.

2. In ARCPLOT re-create the same selection subset and WRITESELECT it. Create an identical coverage using the ARC RESELECT command with the WRITESELECT file option.

When we performed this exercise on a very large arc coverage, we found that the RESELECT-only method took more than two hundred times as long to process as the "bitmapping" method. The exercise underscored the time advantage of using bitmapping commands for record selection in an ARC RESELECT command. It compared a record-by-record search with "bitmapped" direct retrieval.

However, this example also highlights a limitation of the "bitmapped" approach: a previous reselection must first be performed and then saved. The performance advantages of bitmapping commands are realized only after a pre-selected set-saving file has been created. Users must also ensure that no features are added or deleted between the creation of a writeselect file and its use. Esri documentation warns that otherwise, results are unpredictable. Neither approach by itself is ideal.

Towards Ideal Record Selection Performance

 

The issue of this paper is how to exploit the best advantages of the filtering command types, while minimizing each one’s potential costs and dangers. The key is to reduce the number of direct item comparisons.

 

Combining Key Criteria with "Logical Feature Selection"

We now turn to developing specific techniques and guidelines for exploiting the best features of each command type.

We may exploit the fact that truly ad hoc unique key criteria are in one sense unusual. We say this because we have observed that unique custom selections are often decomposable into more commonplace selections created by one key comparison criterion at a time. Various basic preselected sets of records can be logically combined using READSELECT to provide many unique reselection combinations. We shall refer to this as "Logical Feature Selection."

Once fundamental sets of records have been chosen by direct comparisons of item values, pointers to the set can be saved with WRITESELECT and later retrieved with READSELECT in ARCPLOT. These retrievals may in turn be combined with logical (Boolean) operations.

 

 

Brief Review of Logical Operations and Related Feature Selection Commands

Readers unfamiliar with the options and usages of READSELECT and WRITESELECT in ARCPLOT may wish to refer to the Esri documentation to follow this discussion more easily. We will refer to some terms that are described there in more detail:

"Active selection set"—the selected collection of files and records, prior to a READSELECT

"Selection file set"—the collection of files and records that had been previously stored by a WRITESELECT command

A particular selection file set is specified by the first argument to the READSELECT command. It may also be desirable to review the online documentation for the ARCPLOT commands RESELECT, ASELECT, NSELECT, and CLEARSELECT.

 

 

Using Logical Expressions in Record Set Combinations

 

Logical terminology will define logical combinations of subcomponent selection sets unambiguously. We shall refer to such combinations as "logical expressions." "Logical Feature Selection" means performing what the expression describes. Euler-Venn, or "Venn", diagrams often reveal the essential nature of logical expressions plainly. Following are a few diagrams that illustrate this.

In the illustration that immediately follows, each circle represents a saved group of records that had been created by comparing an item value to some criterion for each record.

 

 

[Figure: Illustrating Logical Combinations]

 

ArcInfo Logical Operator Implementation on Individual Files

 

The logical operations applicable to our purposes are the operations that are easily performed with READSELECT sub-command options: AND, OR, and XOR. The NOT operator may be simulated file-by-file by using ARCPLOT NSELECT. All file records or features are selected quickly by ASELECT. ARCPLOT CLEARSELECT removes a file or feature from the active selection set altogether.
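For a single file, these operations behave like bitwise operations on a bitmap with one bit per record. The Python sketch below is a conceptual model only, not ARCPLOT code: the helper names `bitmap` and `selected` and the sample CFCC and CITY values are hypothetical, and the model assumes that a saved selection records one bit per record number.

```python
def bitmap(values, predicate):
    """One comparison pass: set bit i when predicate(values[i]) holds."""
    mask = 0
    for i, value in enumerate(values):
        if predicate(value):
            mask |= 1 << i
    return mask


def selected(mask):
    """Record numbers (0-based) whose bit is set in the bitmap."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Hypothetical item values, one entry per record.
cfcc = ["A11", "A21", "A31", "A41", "A51", "A15", "A63"]
city = ["ESCONDIDO", "ESCONDIDO", "VISTA", "VISTA", "ESCONDIDO", "VISTA", "VISTA"]

ALL = (1 << len(cfcc)) - 1                       # every record selected (ASELECT)

# The expensive comparison passes happen once; the bitmaps play the role
# of saved writeselect files.
a1 = bitmap(cfcc, lambda v: "A1" in v)           # like RESELECT ... CFCC CN 'A1'
esc = bitmap(city, lambda v: v == "ESCONDIDO")   # like RESELECT ... CITY EQ 'ESCONDIDO'

print(selected(a1 & esc))           # AND -> [0]
print(selected(a1 | esc))           # OR  -> [0, 1, 4, 5]
print(selected(a1 ^ esc))           # XOR -> [1, 4, 5]
print(selected(~(a1 | esc) & ALL))  # NOT of the union (NSELECT) -> [2, 3, 6]
```

Once the bitmaps exist, each combination touches no item values at all, which is the source of the speed advantage discussed in this paper.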

 

Tips for Using READSELECT with Several Files

What the preceding diagrams do NOT highlight is how two seemingly equivalent logical methods with ARCPLOT bitmapping command sequences can produce different results when several files are involved in the saved selection fileset and/or the active selection fileset. A review of READSELECT options based on ArcInfo documentation, plus a few additional protective suggestions (detailed below), will show us how to prevent such surprising and unwanted selection results.

 

---------------------------------------------------------------------------------------------------------------------------------------------

 

READSELECT Options: CLEAR, KEEP, REPLACE, OR, XOR, AND

Note: In the summary below, the term "file" may refer to either a coverage or an INFO file.

CLEAR: Clears all records from the active selection set, often unexpectedly, before the <selection file> set is read.

KEEP: Prevents clearing of existing fileset selections, but conversely applies saved selections only to files that are not in the active selection set. To alter a file that is already in the active selection set, first remove it from that set with CLEARSELECT.

REPLACE: Leaves files outside of the <selection file> set in the active selection environment alone, but clears active selection set files of their selections and then places the selections from the <selection file> set into these files. This is equivalent to CLEARSELECT on each file in <selection file>, followed by READSELECT KEEP.

OR: Adds to the active fileset the records that were contained in <selection file>. Files outside of the <selection file> set will have their record selections left unchanged.

XOR: Selects the records that are in either the <selection file> set or the active selection set, but NOT in both. Files outside of the <selection file> set will have their record selections left unchanged.

AND: Merges the <selection file> set with the active selection set in a logical intersection (figure 1, illustration 1). Files NOT INCLUDED in the <selection file> set of files ARE CLEARED of their selections. This can be potentially disastrous, so we recommend routine protection by applying READSELECT AND as follows:

WRITES <selection filename1> /* Global save of immediate selection environment

READS <AND-selection filename2> AND /* Applies logical intersection, clears files that are external to the <selection file>

READS <selection filename1> KEEP /* Restores selection sets to files that had active selections outside of the <selection file>, if desired

 

---------------------------------------------------------------------------------------------------------------------------------------------

 

The last command restores any selected record sets for files in the original active selection environment that may have been cleared by the "AND" READSELECT option. This command sequence executes much more quickly than an equivalent "reselect-type" command sequence would.
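The protective sequence can be traced with a small Python model of a multi-file selection environment. This is a sketch under stated assumptions, not ARCPLOT code: the environment is modeled as a dictionary mapping a file name to a set of selected record numbers, the AND and KEEP behavior follows the option summary in this paper, and the file name PARCELS.KEY is hypothetical.

```python
def writeselect(env):
    """Snapshot every file's selection, like a saved selection file."""
    return {name: set(ids) for name, ids in env.items()}


def readselect_and(env, saved):
    """AND: intersect files present in `saved`; all other files are cleared."""
    return {name: env[name] & ids for name, ids in saved.items() if name in env}


def readselect_keep(env, saved):
    """KEEP: only files not currently in the environment receive saved selections."""
    result = {name: set(ids) for name, ids in env.items()}
    for name, ids in saved.items():
        if name not in result:
            result[name] = set(ids)
    return result


# Active environment with selections on two files.
env = {"ROADS.KEY": {1, 2, 3, 4}, "PARCELS.KEY": {10, 11}}
and_file = {"ROADS.KEY": {2, 4, 6}}           # saved selection on one file only

snapshot = writeselect(env)                   # WRITES <selection filename1>
after_and = readselect_and(env, and_file)     # READS <AND-selection filename2> AND
restored = readselect_keep(after_and, snapshot)  # READS <selection filename1> KEEP

print(after_and)   # PARCELS.KEY has been cleared from the environment
print(restored)    # PARCELS.KEY is back; ROADS.KEY keeps the intersection
```

In this model the intersection on ROADS.KEY survives the final KEEP because KEEP does not touch files that already hold an active selection.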

 

Thus we see that any logical operation for combining record set selections has command equivalents in ArcInfo. To perform these operations reliably, it is important to bear in mind the tips and precautions outlined above, especially when a logical intersection (READSELECT AND) involves multiple files.

 

 

 

The Problem Of Unstable Databases

 

There is also a more fundamental problem of reliability that applies to any process that involves storing pointers to selected records. One must take care not to restore (READSELECT) a <selection filename> record set selection to any file that may have had a record added or deleted since <selection filename> was stored. Many databases are subject to continual update, so an understanding of techniques for dealing with unstable databases is essential to mastering filtering techniques. We will present one especially versatile approach.
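A few lines of Python show why stored pointers go stale. This is a conceptual sketch, not ARCPLOT code: it assumes that a saved selection refers to records by record number, and the CFCC values are hypothetical.

```python
# CFCC value of each record, in record-number order.
cfcc_by_record = ["A1", "A4", "A1", "A2"]

# "WRITESELECT": remember the record numbers of the A1 roads.
saved_rows = [i for i, c in enumerate(cfcc_by_record) if c == "A1"]   # [0, 2]

# The file is then updated: record 1 is deleted and later records shift down.
updated = cfcc_by_record[:1] + cfcc_by_record[2:]                     # A1, A1, A2

# "READSELECT" of the stale pointers now picks the wrong records.
stale_result = [updated[i] for i in saved_rows if i < len(updated)]
fresh_rows = [i for i, c in enumerate(updated) if c == "A1"]

print(stale_result)   # ['A1', 'A2'] -> an A2 road slipped in, an A1 road was missed
print(fresh_rows)     # [0, 1]       -> what a new comparison pass would select
```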

 

The heart of the approach is to maintain a stable INFO file with a keyitem that points dependably (by keyfile match) to a relatively unstable feature table or library layer. This "virtual stability keyfile" is especially useful with library layers, since, as Esri has documented, READSELECT and WRITESELECT cannot be used directly on library layers. But by using specially created INFO sub-files consisting only of selection-criterion items and a unique identifier, it is possible to create much smaller, faster-processing snapshots of keys and criteria that effectively impose a "virtual stability" on changing coverages or library layers.

 

One may pre-select the "virtual stability keyfile" with "bitmapping" commands, and the result can be transferred to the original feature table or library layer with a single RESELECT …KEYFILE in ARCPLOT. Thus "bitmapping" commands can indirectly control the resultant set of selected features.
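The following Python sketch models the idea end to end. It is an illustration under stated assumptions, not ARCPLOT code: the functions `infofile` and `reselect_keyfile` are hypothetical stand-ins for the ARCPLOT commands of similar name, the feature values are invented, and a Python set lookup stands in for an indexed key item.

```python
def infofile(layer, items):
    """Snapshot the chosen items of every feature into a separate, stable table."""
    return [{item: feature[item] for item in items} for feature in layer]


def reselect_keyfile(layer, keyfile, selected_rows, key):
    """Select the layer features whose key matches a selected keyfile record."""
    keys = {keyfile[i][key] for i in selected_rows}
    return [feature for feature in layer if feature[key] in keys]


layer = [
    {"LINKID": 101, "CFCC": "A11", "CITY": "ESCONDIDO"},
    {"LINKID": 102, "CFCC": "A41", "CITY": "ESCONDIDO"},
    {"LINKID": 103, "CFCC": "A15", "CITY": "VISTA"},
    {"LINKID": 104, "CFCC": "A12", "CITY": "ESCONDIDO"},
]
keyfile = infofile(layer, ["LINKID", "CFCC", "CITY"])

# Saved selections are record numbers in the keyfile, which never changes.
a1_rows = {i for i, r in enumerate(keyfile) if "A1" in r["CFCC"]}
esc_rows = {i for i, r in enumerate(keyfile) if r["CITY"] == "ESCONDIDO"}
both = a1_rows & esc_rows                      # logical AND on the keyfile

# Meanwhile the layer is updated: feature 102 is deleted and 105 is added.
layer = [{"LINKID": 105, "CFCC": "A13", "CITY": "ESCONDIDO"}] + [
    f for f in layer if f["LINKID"] != 102
]

picked = reselect_keyfile(layer, keyfile, both, "LINKID")
print([f["LINKID"] for f in picked])           # [101, 104]
```

Feature 105 meets both criteria but is not selected, because it was added after the keyfile snapshot was taken. The selection behaves as if the layer had not changed since the snapshot, so the keyfile should be rebuilt whenever such additions need to be picked up.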

 

Making a "Virtual Stability Keyfile"

 

For illustration, consider a user who has a library layer of roads and highways called ROADS, which is contained in a library called TESTLIB. The layer must contain a unique identifier for every feature. This identifier item is called LINKID. The layer also contains several attributes. Some of these attributes have values that we will use as criteria for record/feature selection. These "selection-criterion" attributes are CFCC, CITY, COUNTY, FIRE, and SCHOOL.

 

We begin to create the "virtual stability keyfile" by ensuring all valid features are selected in ARCPLOT. In this case, a valid feature is any feature with LINKID GT 0:

 

LIBRARY TESTLIB

TILES ALL

ASEL ROADS LINE

RESELECT ROADS LINE LINKID GT 0

 

We will use INFOFILE here to make a keyfile for one or more of the layer’s selection/criterion attributes. PULLITEMS in ARC and PUT in ARCEDIT accomplish the same thing in different software environments. The command extracts the identifier item together with any needed selection/criterion items. These additional item values will later be used to create various sub-selected sets of features.

 

INFOFILE ROADS LINE ROADS.KEY LINKID CFCC CITY COUNTY FIRE SCHOOL INIT

 

 

Using a "Virtual Stability" Keyfile

 

It is now possible to select on each type of CFCC (a TIGER file attribute for road categories) and make a writeselect file for each set of features with the CFCC values of interest. All roads have a CFCC that contains the letter "A" and a two-digit classification code. This letter (A) and the first digit of the classification code will be used as selection criteria. A command sequence for making writeselect files on the INFO keyfile for primary, secondary, local, and other roads is:

 

CLEARSELECT

RESELECT ROADS.KEY INFO CFCC CN 'A1'

WRITESELECT ROADS.CFCC.A1 ROADS.KEY INFO

CLEARSELECT

RESELECT ROADS.KEY INFO CFCC CN 'A2'

WRITESELECT ROADS.CFCC.A2 ROADS.KEY INFO

CLEARSEL

RESELECT ROADS.KEY INFO CFCC CN 'A3'

WRITESELECT ROADS.CFCC.A3 ROADS.KEY INFO

CLEARSEL

RESELECT ROADS.KEY INFO CFCC CN 'A4'

WRITESELECT ROADS.CFCC.A4 ROADS.KEY INFO

CLEARSEL

 

The writeselect files created above are used next with the READSELECT command and its OR option, as shown below. The NSELECT command that follows will then yield all the other roads.

 

READSELECT ROADS.CFCC.A1 CLEAR

READSELECT ROADS.CFCC.A2 OR

READSELECT ROADS.CFCC.A3 OR

READSELECT ROADS.CFCC.A4 OR

 

The active selection set is now the logical union of all the features specified in any of the above writeselect files.

 

NSEL ROADS.KEY INFO /* All other roads

 

This NSELECT is equivalent to bracketing the logical union and performing a logical NOT.

Now we save a new selected set, logically derived from the others.

 

WRITESELECT ROADS.CFCC.OTHER ROADS.KEY INFO

 

We will similarly base other writeselect files on the item CITY. The city item values distinguish between roads that are inside or outside of a particular city. SCHOOL and FIRE can likewise be used as selection criteria, comparing against specific values.

 

CLEARSELECT

RESELECT ROADS.KEY INFO CITY EQ 'ESCONDIDO'

WRITESELECT ROADS.CITY.ESCONDIDO ROADS.KEY INFO

CLEARSELECT

RESELECT ROADS.KEY INFO FIRE EQ 'CDF'

WRITESELECT ROADS.FIRE.CDF ROADS.KEY INFO

CLEARSELECT

RESELECT ROADS.KEY INFO SCHOOL EQ 'NORTH COUNTY'

WRITESELECT ROADS.SCHOOL.NORTH_COUNTY ROADS.KEY INFO

 

Now logical feature selection can be demonstrated using the writeselect files just created. If the user wished to select all the primary divided roads (CFCC CN 'A1') that are also in the city of Escondido, then the following command sequence would generate a corresponding subset of records in the INFO keyfile. The keyfile is then used to select the features from the library layer ROADS.

 

LIBRARY TESTLIB

TILES ALL

READSELECT ROADS.CFCC.A1 CLEAR

READSELECT ROADS.CITY.ESCONDIDO AND

RESELECT ROADS LINE KEYFILE ROADS.KEY LINKID

 

The preceding command sequence uses writeselect files on the INFO keyfile in logical combination and then selects exactly corresponding features from a large, dynamic library layer. The final keyfile selection set is transferred to the larger database with RESELECT KEYFILE. The indexed unique identifier serves as the keyitem. This two-stage procedure is not nearly as fast as a direct bitmapped approach, but does perform much faster than a complex combination of "reselect-type" selections.

Speed is gained because the only major value comparison passes performed with this approach are those involving the RESELECT KEYFILE. RESELECT KEYFILE is especially efficient with indexed keyitems (see Esri command documentation). It is as if the data to which the bitmapping commands pointed had been placed in a state of "Virtual Stability," in which the selection procedure ignores records added or deleted since the keyfile was created.

 

Guidelines for Using READSELECT/WRITESELECT Commands

In principle, one can construct many complex record selection sets by emphasizing either of the two command types we have discussed: "reselect-type" or READSELECT/WRITESELECT type.

 

1. Reselect-type commands have the advantage of being based on the most current information. Their disadvantage is that they can easily cause expensive multiple-pass searches on large files.

 

2. New selection set environments may be composed with great speed from logical (Boolean) combinations of previously saved record sets, using the ARCPLOT READSELECT command. Conversely, seemingly unique record set combinations can often be decomposed into subsets that have widespread re-usability.

 

3. The speed advantages of READSELECT statements depend on having re-usable component selection environments already set up and saved. The files to which these saved selection sets apply must not have records added or deleted during the period between the selection file saving and file use.

 

4. It may not be possible to foresee when it is advantageous to pre-store any given collection of file selections. This is particularly true with arbitrary ad hoc database queries. These situations call for the exclusive use of "reselect-type" commands.

 

5. A saved selection applied to a file that had had records added or deleted will produce an unpredictable result. However, saved selections may still be applied indirectly. This can be accomplished by periodically creating a stable "keyitem" file from a file that is subject to update. READSELECT and WRITESELECT will operate reliably on such a file. These selection results can then be transferred to the files that are being updated.

 

 

Conclusion

 

If ArcInfo programmers logically decompose their reselection queries into re-usable subsets, the techniques described in this paper can be powerfully applied to improve record selection performance. Many involved topics like indexing, logical decomposition, and key item selection have a bearing on any specific implementation, so we must leave our discussion incomplete. Ideal implementations of these principles are outside the scope of this paper, but major gains in performance are readily available to programmers who are willing to research and experiment.


Bruce Tepley
Senior Analyst, Custom Information Services
Bruce Thomas
GIS Project Manager, Geofinder Services
Thomas Bros. Maps
17731 Cowan
Irvine, California 92714-6065
Telephone: (714) 863-1984
Fax: (714) 757-1564