Where in the World is the Esri User Conference (Exploiting Feature Information Utilizing a Worldwide Gazetteer Database)

Access to a names database is an important requirement for many applications that process worldwide data. A variety of business and government applications require access to names information. The Esri Spatial Database Engine (SDE) provides an enterprise-wide repository for spatial and attribute data within an RDBMS. Its client/server architecture, built-in spatial analysis and query tools, and accessibility from ArcView allow efficient access, management and distribution of spatial and attribute data throughout a network. Implementation of a names database in SDE provides users with a valuable resource. There are many government, commercial and network sources of names information. Two official names sources are the NIMA Geographic Names Placement System (GNPS), accessible from NIMA's GEOnet Names Server (GNS), and the USGS Geographic Names Information System (GNIS). Using these sources as input, an SDE names database was generated. A customized ArcView menu interface provides users with access to the database.


INTRODUCTION

The ability to use feature name information in a GIS environment is an important requirement of many applications that process worldwide data. A variety of business and government applications require spatially referenced names and their associated attributes. Cities, rivers, lakes, airports and mountains are some common examples of features that have names. The purpose of this paper is to describe one such application implemented in SDE and ArcView.

The paper discusses sources of official names data and describes the names data attributes and the parameters of the SDE names database. Also discussed are the techniques used to load the SDE database, the export and import of the database, and the ArcView interface used to access the data.

DATA SOURCES

The two sources of official geographic names are the U.S. Geological Survey (USGS), which is part of the U.S. Department of the Interior, and the National Imagery and Mapping Agency (NIMA). Users can obtain data directly from these agencies or from commercial sources. The U.S. Board on Geographic Names (BGN) approves the official names in each of the databases maintained by these organizations.

The BGN meets regularly during the year and is composed of representatives from a variety of Government agencies. USGS is responsible for names within the United States, its protectorates and Antarctica. NIMA is responsible for world wide names not including those falling under the responsibility of USGS.

The USGS maintains names in a system called the Geographic Names Information System (GNIS). The GNIS database is actually three related databases: 1) the National Geographic Names Data Base (NGNDB), 2) the National Topographic Map Names Data Base (NTMNDB), and 3) the Reference Data Base (RDB). The term GNIS frequently refers to the NGNDB, its main component. The NGNDB contains over 1.5 million records. The GNIS database is available on CD-ROM directly from USGS. The USGS world wide web site provides more information about GNIS.

www-nmd.usgs.gov/www/gnis

NIMA maintains foreign names in the Geographic Names Placement System (GNPS). The names are accessible from NIMA's GEOnet Names Server (GNS). The system contains the digital database of NIMA's gazetteer information. The database has over 4 million names. Users can obtain more information about the data from NIMA's world wide web page listed below. Also, the web site provides the capability to query names in the database.

www.nima.mil

GDE System Inc. is a commercial provider of official geographic names. It offers a product called GEONAMES, available on CD-ROM, that provides a graphic interface to query and view names data. The product can also export data in tabular form. The tabular query results can be loaded directly into ArcView as an event theme. The tabular format also lends itself well to loading into an SDE database by writing an import module using the SDE API. Another approach to loading SDE with tabular data is to load the data into ArcView, convert it to a shapefile and use the shapefile-to-SDE converter. This approach may not work well for large data sets. Users can obtain more information about GEONAMES from the GDE System Inc. web site.

There is also a wide variety of names data available from government, commercial and network sources that are not official names databases. The non-official sources offer a valuable resource for many specific applications. Sometimes these databases are derived from official sources. The non-official databases usually contain additional regional and/or demographic information. The focus of this paper is on official names and a developed application using these names.

SOURCE DATA CONTENT

The GNPS and GNIS databases differ in both their structures and attribute content. However, there is some commonality between the two databases. In addition to their unique fields, both databases contain name information, variant names, a name class category (e.g., populated place, airport, river, mountain, …) and geographic references. The name class categories have some similarities but are different for the two databases.

The GNIS database contains additional reference information for name features to counties and states. States and counties have name and Federal Information Processing Standards (FIPS) coded attributes. Also included are topographic map references, elevation, description relative to nearby features, and historical and administrative fields. Geographic coordinates are in degrees, minutes, seconds, and hemisphere format. A feature has a single primary coordinate. The center coordinate defines an area feature, and the terminal point, such as the mouth of a river, defines a linear feature. Features may also have secondary coordinates. These coordinates do not map the shape of the feature; rather, each identifies a point on the feature within a distinct USGS 7.5 minute topographic map.

The GNPS database also contains FIPS coded reference information for name features to countries and provinces. The GNPS supports the definition of names information through the transliteration and romanization of native languages. Also included in the database as attributes are name types (conventional, native, variant, short form, long form), UTM grid reference, JOG map reference, region of the world code, and generalized feature class type. Geographic coordinates are in degrees, minutes, seconds, and hemisphere format. The resolution of coordinates is to minutes. Feature origins for shape types are similar to the conventions used in the GNIS database.

SDE NAMES DATABASE CONTENT

The SDE names database contains geographic names information from both the GNPS and GNIS databases. There are similarities between some but not all of the attribute fields in the two sources. Because of these and other factors described below, each data source has its own SDE layer. Users of the system typically know, prior to formulating a query, whether the features to be found are in the US or outside of it, so logical segregation of the data makes sense. All names from the GNPS and GNIS databases, including variants, were included in the SDE names database. The SDE names database contains two point layers, one for the US data and one for non-US data. A single primary coordinate identifies each feature.

The internal SDE database requires that features be stored in positive integer space. The SDE dataset, which controls the layers' coordinate conversion, has a false origin of -180 (longitude) and -90 (latitude) and a scale factor of 1000000. This ensured that the coordinates for name features were within SDE limits while maintaining the resolution of the source data.
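The effect of the false origin and scale factor can be illustrated with a short Python sketch. It assumes that a stored value is the coordinate minus the false origin, multiplied by the scale factor and rounded to an integer; the function names are hypothetical, and the exact arithmetic SDE performs internally may differ.

```python
# Dataset parameters quoted in the text above.
FALSE_X = -180.0      # false origin, longitude
FALSE_Y = -90.0       # false origin, latitude
SCALE = 1_000_000     # system units per degree


def to_system_units(lon, lat):
    """Map decimal degrees to non-negative integers (assumed formula)."""
    x = round((lon - FALSE_X) * SCALE)
    y = round((lat - FALSE_Y) * SCALE)
    if x < 0 or y < 0:
        raise ValueError("coordinate falls outside positive integer space")
    return x, y


def to_degrees(x, y):
    """Inverse mapping from integer system units back to decimal degrees."""
    return x / SCALE + FALSE_X, y / SCALE + FALSE_Y


# The south-west corner of the world maps to the origin of integer space.
print(to_system_units(-180.0, -90.0))
# The north-east corner gives the largest values that can occur.
print(to_system_units(180.0, 90.0))
```

With these parameters the full range of longitude maps to 0 through 360000000 and latitude to 0 through 180000000, values that fit in a signed 32-bit integer. One stored unit corresponds to one millionth of a degree, which is much finer than the seconds or minutes resolution of the source coordinates.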

Attribute information stored in the GNIS (US) layer includes feature name, type of name, feature class type, state FIPS, county FIPS and topographic map name. The layer contains a record for each of the primary and variant names in GNIS. The attributes came from or were all derived from attributes found in the GNIS database.

Attribute information stored in the GNPS (non-US) layer includes: feature name, type of name, feature class and class type, country code, province code and region of the world code. The layer contains a record for each of the conventional, native, variant and non-verified names in GNPS. Diacritics and special characters are not included in the names fields. The attributes came from or were all derived from attributes found in the GNPS database.

DATA CONVERSION/IMPORT

Conversion of the GNIS and GNPS data involved exporting the data to flat file formats, transferring the data from a PC environment to UNIX, preprocessing the exported files to produce a format with a single record per feature name, and converting degrees, minutes and seconds to decimal degrees.

The conversion of raw GNIS data into an SDE layer required several steps. GNIS data in a flat file format was generated using query and export capabilities available in the GNIS software on the PC. The data was transferred (ftp) from a PC to UNIX. UNIX commands (awk in particular) and C programs were run on the data to produce a flat file format that contained records with feature coordinates in decimal degrees and with a single record per feature name. An import module, written using the SDE C API, loaded the data into the SDE US names layer.
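The two preprocessing steps described above, conversion to decimal degrees and expansion to one record per feature name, can be sketched as follows. The original work used awk and C programs; this Python version is only an illustration. The function names, the tuple layout and the "primary"/"variant" labels are assumptions, since the flat file layout is not given here.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds and hemisphere to decimal degrees.

    South and west hemispheres yield negative values.
    """
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value


def one_record_per_name(primary_name, variant_names, lon, lat):
    """Expand a feature into one (name, name_type, lon, lat) record per name.

    Every record carries the feature's single primary coordinate.
    """
    records = [(primary_name, "primary", lon, lat)]
    records.extend((name, "variant", lon, lat) for name in variant_names)
    return records


# San Diego: 32 deg 42 min 55 sec north, 117 deg 9 min 23 sec west.
lat = dms_to_decimal(32, 42, 55, "N")
lon = dms_to_decimal(117, 9, 23, "W")
# "Variant A" is a placeholder, not a real variant name from GNIS.
for record in one_record_per_name("San Diego", ["Variant A"], lon, lat):
    print(record)
```

Each output record repeats the primary coordinate, so a variant name lands on the same point as the primary name of its feature.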

The GNPS SDE layer was loaded from a flat file that was produced from an export and manipulation of GNPS data. The export program handled the conversion of coordinates from degrees, minutes and seconds format to decimal degrees format. The data exported from GNPS was stripped of diacritics and special characters. The export of the data was performed on a PC. Additional processing of the data occurred on a UNIX workstation similar to the preprocessing of GNIS raw data. The additional processing created a flat file format with one record per feature name. An import module written with the SDE C API was used to load the data into the SDE world names layer.
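How the export program removed diacritics is not described here. One common way to do it, shown below as a Python sketch, is to decompose each character with Unicode normalization and drop the combining marks. The function name is hypothetical, and this is not necessarily what the GNPS export did.

```python
import unicodedata


def strip_diacritics(name):
    """Remove combining marks after canonical decomposition (NFD).

    Letters with no decomposition (for example the Scandinavian o with
    stroke) are left unchanged, so a real export would still need a
    substitution table for such special characters.
    """
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Zürich"))
print(strip_diacritics("São Paulo"))
```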

DEVELOPMENT AND TARGET SYSTEM ENVIRONMENT

System and software development was performed on a Sun workstation that contained SDE (v2.1), Oracle RDBMS (v7), ArcView (v3) and a UNIX development environment. The target environment was a DEC Alpha server with client DEC Alpha workstations and NT PCs. The server contained SDE (v2.1) and Oracle RDBMS (v7). The client systems had ArcView (v3). The development and target environments were on physically separate local area networks. Software and data generated on the development system were transferred to the target system via 8-mm tape.

Limited disk resources on the development environment prevented the SDE names database from being populated with the full GNIS and GNPS data at the same time. The target system had ample resources to hold both SDE names data layers concurrently. In the target environment the SDE database co-existed in the same Oracle database with other non-names databases. Also, the Oracle server and SDE server resided on the same DEC Alpha server.

No software conversion was required to port the software and data developed on the development environment to the target environment. The SDE names database was exported on the development machine using the Oracle export (exp) utility and imported on the target machine using the Oracle import (imp) utility. The end user application software consisted of ArcView Avenue scripts that were integrated into ArcView project files in the target environment.

APPLICATION REQUIREMENTS

The application was required to search for names data and generate a result that could be viewed graphically and used for further analysis. The ability to constrain a query using attributes and to retrieve attribute values was also required. ArcView was selected as the front end interface for users to query, display and analyze names data. It offers a class of objects called database themes (dbtheme) that provide access into an SDE database. The dbtheme class is available in ArcView 3 under the Database Themes extension.

Database themes, although quite powerful, are not sufficient as a direct user interface. The volume of data in the SDE names database would typically saturate the user's display with point name features if an unconstrained dbtheme were shown in an ArcView view document. Also, since users of the system are novice computer and GIS users, a more directed user interface was required. The interface had to provide the users with a capability to manage database access, define and refine query constraints, query the database, and return results.

APPLICATION DESCRIPTION

The user interface was developed using ArcView's Avenue scripting language. The names application was incorporated into the user's existing ArcView environment as menu items separated into three groups of options: 1) Manage database access, 2) Constrain a query, and 3) Perform or show the query.

The login options allow the user to log into the SDE database using a specific login account and password or to log in using the automatic login capability. The automatic login capability in SDE requires that users have an SDE login account set up by the SDE administrator. Access to the database is then granted because the user is considered verified by having logged on to their operating system account. From the user's perspective, all that is required is to confirm the question: Do you want to use the automatic login? The user also has the option to log out and log into another SDE account. Access to the SDE names database was set to read only for users.

The constrain-a-query group of menu options lets the user specify how to limit and select data from the database. The SDE layers are categorized as U.S. or non-U.S. (World) names. The user can choose to query either source or both. If only the U.S. option is selected, then the World constraint options are grayed out. If only the World option is selected, then the U.S. constraint options are grayed out. The constraint options include: name of feature, feature category class (populated place, airport, river, …), state, country and name type (official, variant). The user can also select whether the query should include only features that would appear in the current ArcView view or all applicable features in the database. The name constraint offers further control: the user may specify that a feature name match the entered name exactly, start with it, contain it or sound like it.
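The four name matching modes map naturally onto SQL predicates. The sketch below, in Python rather than the Avenue used by the application, shows one way such a WHERE clause could be assembled. The column names NAME, FEATURE_CLASS and NAME_TYPE and the function names are hypothetical. SOUNDEX is a standard Oracle SQL function, but whether the application used it for the "sounds like" option is an assumption.

```python
def name_predicate(name, mode, column="NAME"):
    """Return (sql_fragment, bind_values) for one name matching mode."""
    if mode == "exact":
        return f"{column} = :name", {"name": name}
    if mode == "starts_with":
        return f"{column} LIKE :name", {"name": name + "%"}
    if mode == "contains":
        return f"{column} LIKE :name", {"name": "%" + name + "%"}
    if mode == "sounds_like":
        return f"SOUNDEX({column}) = SOUNDEX(:name)", {"name": name}
    raise ValueError(f"unknown match mode: {mode}")


def build_where(name=None, mode="exact", feature_class=None, name_type=None):
    """Combine the selected constraints into one WHERE clause with binds."""
    clauses, binds = [], {}
    if name is not None:
        clause, values = name_predicate(name, mode)
        clauses.append(clause)
        binds.update(values)
    if feature_class is not None:
        clauses.append("FEATURE_CLASS = :feature_class")
        binds["feature_class"] = feature_class
    if name_type is not None:
        clauses.append("NAME_TYPE = :name_type")
        binds["name_type"] = name_type
    return " AND ".join(clauses), binds


where, binds = build_where("San Diego", "starts_with", feature_class="airport")
print(where)
print(binds)
```

Bind variables keep user input out of the SQL text. The sketch does not escape the LIKE wildcards % and _ that may occur in a typed name, which a production version would need to handle.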

Once satisfied with the constraints of the query, the user may show the query as an SQL statement or run the query. Query results are returned in shapefile format. The shapefile is added as a theme to the legend of the current ArcView view. U.S. and World name features are returned in separate themes because their attribute fields differ. If no features meet the current constraints, a message box informs the user that zero records were returned, and no shapefile is created. The user may refine the query or define a new query following the completion of a query.

RESOURCE MANAGEMENT

The time and effort to set up the database and transfer data layers from the development system to the target system turned out to be non-trivial. The Oracle export (exp) and import (imp) utilities were used to unload the data from the development system and load it to the target system.

The export of the data went smoothly. A number of resource management problems were experienced when the data was imported. These problems consisted of defining adequate tablespace resources for constraints, indexes, rollback segments and temporary segments. It took several data load iterations to define resources properly.

Users may want to consider other options when transferring SDE layers between systems. In other projects we have been very successful using the SDE to Shape (sde2shp) and Shape to SDE (shp2sde) utilities. Also we could have used the original import utility to load our flat files.

FUTURE PLANS

Several enhancements for the SDE names database system are underway or being considered. These include upgrading the SDE software and database, enhancing the user interface and making use of special characters and diacritics.

SDE version 2.1 is the platform of the current names database. The soon-to-be released SDE version 3 will make additional capabilities available. Among these improvements, the user will be able to join SDE tables with other tables in the database. This could greatly enhance the capabilities of a names database while also requiring a redesign of the database. Additional attributes from non-official databases could then be integrated easily into the SDE names database.

The current user interface operates as menu items on an ArcView view. Creating the interface as an ArcView extension would make the software more portable and allow the user to easily remove the options when they are not in use. Also the use of ArcView's dialog designer (also soon to be released) could create a simpler, easy-to-use interface.

The SDE names database does not currently store diacritics for foreign names. A standard character encoding, such as ISO 10646, is expected to be adopted in the future and will make handling and display easier than with some of the current proprietary fonts.

TRIVIA

So, where in the world is the Esri user conference this year? The answer of course is San Diego. San Diego is at latitude 32 degrees, 42 minutes, 55 seconds north and longitude 117 degrees, 9 minutes, 23 seconds west in the database. The San Diego International Airport, which many of us will be flying into, is at latitude 32 degrees, 44 minutes, 0 seconds north and longitude 117 degrees, 11 minutes, 15 seconds west.

CONCLUSION

The Spatial Database Engine (SDE) is a server of GIS data stored in a relational database management system and a provider of GIS services. A geographic names database utilizing this technology has been developed. The content of the database is official geographic names and attributes from the NIMA GNPS database and the USGS GNIS database. A customized menu interface in ArcView provides access to the database. The design and implementation of system and Oracle resources for an SDE database require careful planning. Additional planning is required when transporting the database to another location. The availability of names data in a GIS environment provides users with a valuable resource.


Thomas Quinn, MTS TASC, Inc.
12100 Sunset Hills Road Reston, VA 20190 Telephone: (703) 834-5000 Fax: (703) 318-7900

E-mail: taquinn@tasc.com