ArcSDE Tuning -- Tips and Tradeoffs

Mark Harley

ABSTRACT

ArcSDE is a flexible product which can be configured in numerous environments, supporting a wide range of applications. This session explores ArcSDE database design and tuning techniques. Topics include choosing a configuration to suit specific application requirements and identifying the parameters which have the greatest influence on performance. Tips will be provided to help identify those items which are most critical, and others where spending less time is appropriate. Design for scalability is discussed, including planning early to simplify future system growth. The presentation is supported with detailed data measured from varied real-world tuning situations.


OUTLINE

The major sections in this document are:


ARCSDE SYSTEM DESIGN CONSIDERATIONS

Introduction

ArcSDE is a flexible product which can be configured in numerous environments, supporting a wide range of applications. There are many factors which affect the performance of an ArcSDE database. By anticipating system needs in advance and following reasonable design guidelines, satisfactory performance will be achieved more readily.

Because it is a frequently-used configuration, this paper primarily describes ArcSDE setup for an Oracle 8i-based system, using release ArcSDE 8.0.2, presuming data stored in native ArcSDE Compressed Binary format. Many of the concepts are at least generally transferable to other environments, including the new ArcSDE 8.1 release. Version 8.1 contains added functionality and performance enhancements, but it is fundamentally similar in much of its basic configuration. This document does not attempt to cover every aspect of ArcSDE system design. Many of the important parameters are outlined here, affecting typical situations and this information can be used to assist in designing and setting up an ArcSDE server for a specific application. There are numerous details which are described well in the Oracle and Esri documentation. Particularly helpful from Esri are the ArcSDE Administration Guide, for version 8.0.2, (which is renamed Managing ArcSDE Services in version 8.1) and the ArcSDE Configuration and Tuning Guide for Oracle. The latter is available for other supported databases as well. It is assumed that the reader has a basic knowledge of the structure and installation of both Oracle and ArcSDE. The intent here is to provide information to assist with setting them up for optimal performance.

Overall System Considerations

When designing an ArcSDE system, these are the types of questions to answer right from the beginning:

Based on the answers to these questions, the initial system components and parameters can be defined. The physical database and ArcSDE disk layout can also be proposed.

System Setup Overview

At the highest level, these steps should be performed:

Basic Installation Steps

In a bit more detail, the steps (presuming an Oracle database) are:

  1. Perform system design, including reasonable estimates for tuning parameters
  2. Set up server hardware
  3. Install relational database software
  4. Set initial Oracle tuning parameters as appropriate: edit init.ora
  5. Locate the datafiles, sized according to number of physical disks, the number of data layers (or ArcSDE “feature classes”) and the number of features (or “shapes”) per feature class, distributed according to type of access anticipated
  6. Install ArcSDE software
  7. Set initial ArcSDE tuning parameters in the file SDEHOME/etc/giomgr.defs
  8. Specify ArcSDE spatial storage method
  9. Define the physical sizes (extents) of database tables and indexes in the file SDEHOME/etc/dbtune.sde
  10. Load data into the ArcSDE server
  11. Measure system performance of both Oracle and ArcSDE
  12. Alter tuning parameters as appropriate, or, in extreme cases, reconfigure system and reload data

CONFIGURATION DETAILS

Oracle Set-up Overview

These are the basic steps required to set up an Oracle database for ArcSDE:

Oracle Set-up Details

Described here are considerations and trade-offs for many of the important items to be configured in an Oracle database which is being set up for use as an ArcSDE server. This information is intended to assist with the design process, rather than to be an all-inclusive, step-by-step set of instructions. Some overall knowledge of Oracle system administration is presumed. The Oracle and Esri documentation mentioned in the introduction are highly recommended additional resources.

Oracle configuration and tuning

There are numerous other Oracle parameters which may have an effect on ArcSDE performance. If possible, find an experienced Oracle database administrator and follow any site or application specific recommendations they may have. The Esri documentation also provides some reasonable default values and suggests circumstances when they may be inappropriate.

Physical disk layout - Oracle

Physical disk layout - ArcSDE tables

Oracle stores tables and indexes in tablespaces. Tablespaces consist of one or more datafiles, located on physical (or logical) disks in the system.

Estimating tablespace sizes

A tablespace is a named virtual entity which is specified from SQL and ArcSDE as a "location" for storing tables and indexes.

ArcSDE Initialization: giomgr.defs

The SDEHOME/etc/giomgr.defs file is read when the ArcSDE server is started. It is used to set most of the server tuning parameters. Once running, the giomgr process spawns distinct gsrvr processes to manage each individual ArcSDE connection, but giomgr does not re-read the giomgr.defs file. If parameters are changed, it will be necessary to stop and restart the ArcSDE server for them to take effect.

The following is a description of each giomgr parameter, grouped by type. The default values listed for each parameter are those recommended in the Esri documentation for ArcSDE version 8.0.2; most recommendations are the same in version 8.1.

Session Parameters

CONNECTIONS

Description: The maximum number of concurrent connections allowed for an ArcSDE service. The value should be the same as the total number of licensed ArcSDE connects, unless the system is designed for load-balancing connections between multiple servers. On a smaller system with only a few connect licenses, the value should be reduced from the Esri default to save memory.
Cost: 780 bytes of shared memory each
Default: 64
TEMP
Description: The full path name to a temporary disk space directory. Used for attribute Binary Large Objects which exceed the size specified by the giomgr parameter BLOBMEM.
Cost: Esri recommends a minimum of 5mb of available space.
Default: /tmp
TCPKEEPALIVE
Description: A TCP/IP-wide parameter which causes the system to check connected sessions periodically; the default is every two hours. If anArcSDE client has been terminated, and its connection is hanging, the connection is closed. Beware that a very short-term network outage - right as the connection test occurs - could cause active processes to be disconnected!
Cost: Esri recommends FALSE unless there are frequent problems with clients which exit without releasing ArcSDE connect licenses.
Default: FALSE

Transport Buffer Parameters

Transport buffers are used to store data being moved between the ArcSDE client and server (in either direction). By storing up data and transmitting it in bulk, the communication overhead can be reduced considerably. When a client issues a query and is waiting, data will be sent when at least MINBUFSIZE bytes are available in the server’s transport buffer. If the client is not waiting, the server will continue to process the query until the MAXBUFSIZE buffer is full

MAXBUFSIZE

Description: The size in bytes of the client and server transport buffers. The value must be greater than MINBUFSIZE and at least 12,288.
Cost: The number of bytes specified in this parameter is allocated on both the client and server for each connection.
Default: 65536
MINBUFSIZE
Description: The minimum size in bytes which must accumulate before data is transferred between client and server - if the receiving side is waiting. The minimum value is 4096.
Cost: If this value is too high, the interactive wait time may increase for some queries. If the value is too low, additional client/server transfer overhead may be incurred. A value around one quarter of MAXBUFSIZE often works well. Avoid setting to greater than one-half the value of MAXBUFSIZE.
Default: 16384
MINBUFOBJECTS
Description: The minimum number of objects which must accumulate in a buffer before data is transferred between client and server.
Cost: Similar considerations as MINBUFSIZE; the appropriate value depends on the size of an average row of data.
Default: 512

When initially loading a large database, with no (or few) other users on the system, it is recommended to increase MAXBUFSIZE significantly. At least one or two megabytes is a reasonable value to use; set MINBUFSIZE to its maximum recommended value as well: one-half of the MAXBUFSIZE value.

Array Buffer Parameters

An array buffer is created on the server for each open data stream. It provides the first level of buffering between the ArcSDE server software and the underlying relational database server. As with the transport buffers, by storing up records and transmitting them in bulk, the communication overhead can be reduced considerably. Keep in mind that the array parameters are for the entire ArcSDE server. They must be set to accommodate the complete set of anticipated feature classes. If the data changes significantly over time, it may be necessary to adjust the values.

MAXARRAYBYTES

Description: The total number of array bytes available to be allocated per stream for both the attributes and shape points buffers.
Cost:
Default: 550000
SHAPEPTSBUFSIZE
Description: The buffer size, in bytes, to contain all of the points for the number of given features (shapes) retrieved per MAXARRAYSIZE. In other words, for a polygonal feature class, this buffer will contain all of the x,y coordinates describing each polygon. Esri suggests setting this buffer to hold 90% of the features at 8 bytes per x,y pair. The calculation is:
SHAPEPTSBUFSIZE = (MAXARRAYSIZE) * (number of points per feature) * (8 bytes per point)
The Esri defaults presume an average of 500 points per feature, with MAXARRAYSIZE of 100, and 8 bytes per point. The above formula calculates as follows:
400000 = 100 * 500 * 8
Of course, if the feature classes stored in the database contains points only, a fixed 8 bytes per shape is required and no estimation of points per feature is necessary. Note that if shapes have measures, annotation and/or z-values, additional bytes are used.
Cost: If this parameter is set too low, many features will be retrieved individually which reduces performance. Too high a value uses excessive memory.
Default: 400000
ATTRBUFSIZE
Description: The buffer size, in bytes for (non-blob) attribute data retrieved from the business table.
Cost: If too low, the buffer will not be used effectively, slowing performance. Too large a value wastes memory.
Default: 50000
MAXARRAYSIZE
Description: Actually the number of rows (feature geometry and attributes) that the RDBMS server will retrieve on each request.
Cost: If too large a value, memory will be wasted. If too small, the number of requests made to the database may become excessive, negatively impacting performance.
Default: 100

Lock Parameters

If too many users are editing and receive SE_OUT_OF_LOCKS, one of the following LOCK parameters, must be increased to eliminate the problem. A restart of the ArcSDE server is required for any new value to take effect.

LOCKS

Description: The total number of area locks for an ArcSDE service.
Cost: Low - 12 bytes of shared memory per lock
Default: 10000
STATELOCKS
Description: The maximum number of locks which can be applied to states in a versioned database. A state lock is allocated for each query or edit operation on a versioned feature class or table.
Cost: Low - 8 bytes of shared memory per versioned database lock
Default: 10000
OBJECTLOCKS
Description: The maximum number of locks which can be applied to database objects
Cost: Low - 8 bytes of shared memory per object lock.
Default: 10000
MAXTABLELOCKS
Description: The maximum number of business table row locks for an ArcSDE service.
Cost: Low - 8 bytes of shared memory per object lock.
Default: 10000

Raster Image Parameters

RASTERCOLUMNS

Description: The maximum number of raster columns permitted in this ArcSDE database.
Cost: CCC
Default: 500
RASTERBUFSIZE
Description: The raster transport buffer size in bytes. Must be large enough to hold the largest raster image stored in the database. The raster array buffer is set at twice RASTERBUFSIZE.
Cost: RASTERBUFSIZE bytes per client and three times RASTERBUFSIZE on the server.
Default: 102400

General Parameters

Setting many of these parameters too low will cause an error; too high, may reserve excessive memory for buffers that are never used. Specific conditions are outlined below.

LAYERS

Description: The maximum number of Feature Classes to be served by this ArcSDE server
Cost: Low - 12 bytes of shared memory per entry.
Default: 500
REGISTRATIONS
Description: The maximum number of tables registered for logging or versioned access.
Cost: Uses 504 bytes of shared memory per entry.
Default: 1000
MAXTIMEDIFF
Description: Value in seconds representing the maximum allowable difference between the client’s and server’s clock. This is a security measure which may be appropriate depending on the system network architecture. The Esri default value allows for a 30 minute deviation. A low value, like 60 seconds, is recommended if security is a major concern. A value of -1 disables the feature.
Cost: Possible lower system security if the value is large, or -1 to disable. Possible annoyance if the value is too low and client / server clocks are not well synchronized.
Default: 1800
MAXBLOBSIZE
Description: The maximum number of bytes allowed for any BLOB (binary large object) on an ArcSDE server.
Cost: If too small, data errors may occur when trying to store certain objects in the system.
Default: 1000000
BLOBMEM
Description: The maximum number of bytes per BLOB for in-memory storage. Any BLOB exceeding this size will be written to disk, instead of remaining in memory.
Cost: If too small, BLOB data will not be cached in memory, reducing speed. Too large a value wastes memory.
Default: 1000000
AUTOCOMMIT
Description: The number of database transactions which will occur before a commit is done. A value of zero means a commit is only done on user request.
Cost: If this value is too high (or if set to zero and user requests are not issued with sufficient frequency), the relational database logs may not be large enough to hold the complete transaction. Committing too often may be inefficient.
Default: 1000
MAXINITIALFEATS
Description: Controls the number of features allocated when a new ArcSDE layer (feature class) is created. The administrator can keep this value at a reasonable level and prevent users from specifying a huge initial features value and accidentally taking a very large initial database extent.
Cost:
Default: 10000
MAXDISTINCT
Description: Sets the maximum number of unique values which can be returned when calculating table statistics. A value of zero allow an “infinite” number of distinct values to be returned.
Cost: Calculating stats for an extremely large table could jeopardize server or client memory.
Default: 512
MAXSTREAMS
Description: Specifies the maximum number of simultaneously opened streams on an entire ArcSDE server. The Esri default value of eight is on the conservative side; it may need to be increased, depending on specific usage.
Cost: If too low, users may see the SE_TOO_MANY_STREAMS error. If too large, significant server memory resources may be consumed.
Default: 8
STREAMPOOLSIZE
Description: When a stream is closed, its memory resources are released to the system. The stream pool is a collection of closed streams which are held for future use instead. Esri recommends a value of up to one-half of MAXSTREAMS if users open and close streams frequently. A value of 10% or less (or even zero) may be appropriate if applications typically commence and access the same data throughout their operation.
Cost: If too large, many closed and unused streams can occupy memory. If set very small and many streams are opened and closed, system performance may slow.
Default: 3
STATECACHING
Description: If set to TRUE, the state of each stream is maintained in memory on the server. If false, the state must be read from the disk for each stream operation.
Cost: Frequent stream operations should be cached for maximum performance.
Default: TRUE
READONLY
Description: Set this parameter to TRUE to prevent ArcSDE clients from writing to the database.
Cost:
Default: FALSE

ArcSDE Physical Storage

dbtune

Edit the file SDEHOME/etc/dbtune.sde to define the specific storage layout for each ArcSDE feature class. The labels inserted in the dbtune file should be referenced from the "sdelayer" command when creating new feature classes. Presuming the block of text shown below were added to the dbtune file, the command would include "sdelayer -o add -k GDT_WATER_POLYGON", where -k [name] refers to the keyword [name] in the dbtune file.

As a system is designed, space requirements should be calculated for each feature class. There will be separate sizes for each of the Oracle entities listed here (where [n] is replaced with the internal ArcSDE layer number):

Here is a sample section from a dbtune file, showing the pre-calculated initial and next extent sizes for a hypothetical "water polygon" feature class. In each case, the initial extent was estimated to hold the entire table and a next extent of about 10% of the total size was included in case the estimate was a bit low, or some additional data was added later.

##GDT_WATER_POLYGON

A_TBLSP SDE_BUSINESS
A_INIT 13M
A_NEXT 2M

F_TBLSP SDE_FEATURE
F_INIT 128M
F_NEXT 13M

S_TBLSP SDE_SPATIAL_INDEXES
S_INIT 21M
S_NEXT 3M

INDEX_TABLESPACE SDE_ORACLE_INDEXES
A_IX1_INIT 7M
A_IX1_NEXT 720K
F_UK1_INIT 7M
F_UK1_NEXT 720K
S_IX1_INIT 32M
S_IX1_NEXT 4M
S_IX2_INIT 7M
S_IX2_NEXT 720K

END

Note that in ArcSDE release 8.1 the dbtune.sde file becomes a database table instead - the DBTUNE table in the SDE schema. A utility, sdedbtune has been provided by Esri to make it easy to import previous dbtune.sde files, or to export a DBTUNE table to a text file for easier editing.

Grid Sizing

Setting the grid size for the spatial index can seem complex at first, but reasonable values can be found fairly easily. ArcSDE also provides tools (like sdelayer) to display the spatial index statistics once a feature class is loaded. Since it is possible to change the grid size at any time by rebuilding the spatial index, the initial decision is not cast in stone.

One general rule of thumb is to decide on an average viewing area in ArcMap or an ArcIMS service. There should be approximately nine grid cells contained in that space. In other words, if an average display window for some street centerline data covers an area of 3 miles by 3 miles, the grid cells should be sized at about one square mile.

A fairly common mistake is to try and take advantage of all three levels of grid, perhaps because they are available. Actually, it is far more efficient to minimize the number of entries in the spatial index table and stick with one grid level whenever possible. An exception is when the features in a feature set vary widely in size. If an initial grid size is calculated, and the spatial index statistics (calculated by ArcSDE with "sdelayer -o si_stats") show that over 25% of the data is large enough to be in more than four grid cells, it may be time to consider a second grid level. When multiple grids exist, any feature inserted will be tried at the first (smallest) level first. If it occupies more than four cells, it is automatically promoted up to any higher grid level present. When there is no higher level, it remains in each of the largest grid cell(s). Consider though that ArcSDE scans the spatial index once per level of grid present. Adding additional grids causes more Oracle scanning and therefore it should only be done when absolutely necessary.

Here is an example from a street database containing about 2.5 million line features. Notice that one grid level seems appropriate in this situation. Most of the features are of a similar size and 100% of them occupy four or fewer grid cells. Of course, this one statistic is easy to achieve by making the grid too large. In this layer though, there are an average of 36.46 features in each grid cell which seems about right. In a dense area, there are apparently 1,857 features in a single grid which seems a bit on the high side. It is possible that a slight reduction of the grid size might speed feature retrieval in urban areas. It might also significantly increase the overall number of spatial index records, however, which is undesirable.

% sdelayer -o si_stats -s burma -i gdt_devdb8s -u gdt -p gdt -l street,bus_fid

ArcSDE 8.0.2 Build 350 Fri Aug 18 11:30:49 PDT 2000
Layer Administration Utility
-----------------------------------------------------
Layer 108 Spatial Index Statistics:
Level 1, Grid Size 0.02
|-------------------------------------------------------------------|
| Grid Records: 2849693 |
| Feature Records: 2517875 |
| Grids/Feature Ratio: 1.13 |
| Avg. Features per Grid: 36.46 |
| Max. Features per Grid: 1857 |
| % of Features Wholly Inside 1 Grid: 88.41 |
|-------------------------------------------------------------------|
| Spatial Index Record Count By Group |
| Grids: <=4 >4 >10 >25 >50 >100 >250 >500 |
|---------- ------ ------ ------ ------ ------ ------ ------ ------ |
| Features: 2517866 9 0 0 0 0 0 0 |
| % Total: 100% 0% 0% 0% 0% 0% 0% 0%|
|-------------------------------------------------------------------|

With grid size changed from 0.02 to 1, notice there are now over 37,000 features per grid on average, up from about 36. The grid size is way too big!

% sdelayer -o si_stats -s burma -i gdt_devdb8s -u gdt -p gdt -l street,bus_fid

ArcSDE 8.0.2 Build 350 Fri Aug 18 11:30:49 PDT 2000
Layer Administration Utility
-----------------------------------------------------
Layer 108 Spatial Index Statistics:
Level 1, Grid Size 1
|-------------------------------------------------------------------|
| Grid Records: 2524213 |
| Feature Records: 2517875 |
| Grids/Feature Ratio: 1.00 |
| Avg. Features per Grid: 37120.78 |
| Max. Features per Grid: 312013 |
| % of Features Wholly Inside 1 Grid: 99.75 |
|-------------------------------------------------------------------|
| Spatial Index Record Count By Group |
| Grids: <=4 >4 >10 >25 >50 >100 >250 >500 |
|---------- ------ ------ ------ ------ ------ ------ ------ ------ |
| Features: 2517875 0 0 0 0 0 0 0 |
| % Total: 100% 0% 0% 0% 0% 0% 0% 0%|
|-------------------------------------------------------------------|

Here are the basic layer statistics for the above feature class:

% sdelayer -o stats -s burma -i gdt_devdb8s -u gdt -p gdt -l street,bus_fid

ArcSDE 8.0.2 Build 350 Fri Aug 18 11:30:49 PDT 2000
Layer Administration Utility
-----------------------------------------------------

Statistics for Layer 108
--------------------------------------------------------------
Total Nil Features: 0
Total Point Features: 0
Total MultiPoint Features: 0
Total Line Features: 0
Total MultiLine Features: 0
Total SimpleLine Features: 2517875
Total MultiSimpleLine Features: 0
Total Area Features: 0
Total MultiArea Features: 0
-----------------------
Total Features: 2517875

Minimum Feature Number: 1
Maximum Feature Number: 2517875
Last Modified: Thu May 31 08:21:26 2001

Largest Feature: 488 Points
Smallest Feature: 2 Points
Average Feature: 4.53 Points
Minimum Line String Length: 0.000006
Maximum Line String Length: 0.093399
Average Line String Length: 0.002169
Layer Envelope:
minx: -124.40657, miny: 32.53995
maxx: -114.13497, maxy: 42.00924

Calculate Statistics

After loading has completed on a feature class, use the ArcSDE command:

sdetable -o update_dbms_stats -m compute
to update the database statistics for each feature class that you loaded data into. By using the ArcSDE command, rather than analyzing the tables directly in Oracle, it ensures statistics are calculated in the way ArcSDE expects them. Unless a feature class is huge, try "-m compute". If computing the statistics takes too long, try "-m estimate". Estimating will run more quickly, but does not always give good enough results to allow the Oracle optimizer to do its job properly.

TUNING TIPS

Helpful Hints for Different Configurations

These suggestions can be particularly useful when setting up an ArcSDE server for the first time.

Bulk-loading an empty database

Setting up a frequently-edited database

Setting up a read-only database

Oracle statistics warning

Size calculations

Spatial index order


AUTHOR

Mark Harley
Principal Engineer; Director, Consultative Services; Geographic Data Technology, Inc.
11 Lafayette St.
Lebanon, NH 03766-1445
Telephone: (603) 643-0330
Fax: (603) 653-0249
E-mail: mark_harley@gdt1.com
Web: http://www.geographic.com