Access to a names database is an important requirement for many
business and government applications that process worldwide data.
The Esri Spatial Database Engine (SDE) provides
an enterprise-wide repository for spatial and attribute data within
an RDBMS. Its client/server architecture, built-in spatial analysis
and query tools, and accessibility from ArcView allow efficient
access, management and distribution of spatial and attribute data
throughout a network. Implementation of a names database in SDE
provides users with a valuable resource. There are many
government, commercial and network sources of names information.
Two official names sources are the NIMA Geographic Names Placement
System (GNPS) database, accessible from NIMA's GEOnet Names Server
(GNS), and the USGS Geographic Names Information System (GNIS)
database. Using these sources as input, an SDE names database
was generated. A customized ArcView menu interface gives users
access to the database.
INTRODUCTION
The ability to use feature name information in a GIS environment
is an important requirement of many applications that process
worldwide data. A variety of business and government applications
require spatially referenced
names and their associated attributes. Cities, rivers, lakes,
airports and mountains are some common examples of features that
have names. The purpose of this paper is to describe one such
application implemented in SDE and ArcView.
The paper discusses sources of official names data and describes
the names data attributes and the parameters of the names SDE database.
Also discussed are the techniques used to load the SDE database,
the export and import of the database, and the ArcView interface
used to access the data.
DATA SOURCES
The two sources of official geographic names are the U.S. Geological
Survey (USGS), which is part of the U.S. Department of the Interior,
and the National Imagery and Mapping Agency (NIMA). Users can
obtain data directly from these agencies or from commercial sources.
The U.S. Board on Geographic Names (BGN) approves the official
names in each of the databases maintained by these organizations.
The BGN meets regularly during the year and is composed of representatives
from a variety of Government agencies. USGS is responsible for
names within the United States, its protectorates and Antarctica.
NIMA is responsible for worldwide names, excluding those that fall
under the responsibility of USGS.
The USGS maintains names in a system called the Geographic Names
Information System (GNIS). The GNIS database is actually three
related databases: 1) the National Geographic Names Data Base
(NGNDB), 2) the National Topographic Map Names Data Base (NTMNDB),
and 3) the Reference Data Base (RDB). The term GNIS frequently
refers to the NGNDB, its main component. The NGNDB contains over
1.5 million records. The GNIS database is available on CD-ROM
directly from USGS. The USGS world wide web site provides more
information about GNIS.
NIMA maintains foreign names in the Geographic Names Placement
System (GNPS). The names are accessible from NIMA's GEOnet Names
Server (GNS). The system contains the digital database of NIMA's
gazetteer information. The database has over 4 million names.
Users can obtain more information about the data from NIMA's World
Wide Web page. The web site also provides the capability
to query names in the database.
GDE System Inc. is a commercial provider of official geographic names.
It offers a product called GEONAMES, available on CD-ROM, that provides
a graphic interface to query and view names data. The product can also
export data in tabular form. The tabular query results can be loaded
directly into ArcView as an event theme. The tabular format also lends
itself well to loading into an SDE database by writing an import module
using the SDE API. Another approach to loading SDE with tabular data is
to load the data into ArcView, convert it to a shapefile and use the
shapefile-to-SDE converter. This approach may not work well for large
data sets. Users can obtain more information about GEONAMES from the
GDE System Inc. web site.
There is also a wide variety of names data available from government,
commercial and network sources that are not official names databases.
The non-official sources offer a valuable resource for many specific
applications. Sometimes these databases are derived from official
sources. The non-official databases usually contain additional
regional and/or demographic information. The focus of this paper
is on official names and an application developed using these names.
SOURCE DATA CONTENT
The GNPS and GNIS databases differ in both their structures and
attribute content. However, there is some commonality between the
two databases. Both databases contain name information, variant
names, a name class category (e.g., populated place, airport,
river, mountain) and geographic references in addition
to their unique fields. The name class categories have some similarities
but are different for the two databases.
The GNIS database contains additional information that references
name features to states and counties. States and counties have
name and Federal Information Processing Standards (FIPS) code
attributes. Also included are topographic map references, elevation,
description relative to nearby features, historical and administrative
fields. Geographic coordinates are in degrees, minutes, seconds,
and hemisphere format. A feature has a single primary coordinate.
For an area feature, the primary coordinate is the center; for a
linear feature, it is the terminal point, such as the mouth of a river.
Features may also have secondary coordinates. These coordinates
do not map the shape of the feature; rather, each identifies a point
on the feature within a distinct USGS 7.5-minute topographic
map.
The GNPS database also contains FIPS-coded information that references
name features to countries and provinces. The GNPS supports
the definition of names information through the transliteration
and romanization of native languages. Also included in the database
as attributes are name types (conventional, native, variant, short
form, long form), UTM grid reference, JOG map reference, region
of the world code, and generalized feature class type. Geographic
coordinates are in degrees, minutes, seconds, and hemisphere format.
Coordinates are resolved to the minute. The conventions for locating
the primary coordinate of each shape type are similar to those used
in the GNIS database.
SDE NAMES DATABASE CONTENT
The SDE names database contains geographic names information from
both GNPS and GNIS databases. There are similarities between some
but not all of the attribute fields in the two sources. Because of
these and other factors described below, each data source has
its own SDE layer. Users of the system typically know, before
formulating a query, whether the features of interest are inside
or outside the US, so logically segregating the data makes sense.
All names from the GNPS and GNIS databases, including variants, were
loaded into the SDE names database. The SDE names database contains two
point layers, one for the US data and one for non-US data. A single
primary coordinate identifies each feature.
The internal SDE database requires that features be stored in
positive integer space. The SDE dataset, which controls the layers'
coordinate conversion, has a false origin of -180 (longitude)
and -90 (latitude) and a scale factor of 1000000. This ensured
that the coordinates for name features were within SDE limits
while maintaining the resolution of the source data.
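The effect of the false origin and scale factor can be illustrated with a short Python sketch. It is not SDE's internal code; it only applies the arithmetic implied by the dataset parameters quoted above, and the function names are hypothetical.

```python
# Illustrative only: reproduces the arithmetic implied by the dataset
# parameters quoted in the text, not SDE's internal implementation.
FALSE_X = -180.0    # false origin for longitude (degrees)
FALSE_Y = -90.0     # false origin for latitude (degrees)
SCALE = 1_000_000   # stored units per degree


def to_stored_units(lon, lat):
    """Shift by the false origin, scale, and round to integers."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("coordinate outside the geographic range")
    return round((lon - FALSE_X) * SCALE), round((lat - FALSE_Y) * SCALE)


def from_stored_units(x, y):
    """Invert to_stored_units, recovering decimal degrees."""
    return x / SCALE + FALSE_X, y / SCALE + FALSE_Y


print(to_stored_units(-180.0, -90.0))  # smallest possible pair
print(to_stored_units(180.0, 90.0))    # largest possible pair
```

Under these assumptions every valid longitude/latitude pair maps to a non-negative integer pair no larger than 360,000,000, which fits in a signed 32-bit integer. One stored unit is one millionth of a degree, finer than one arc-second (about 0.00028 degree), the smallest unit of the degrees, minutes, seconds source format.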
Attribute information stored in the GNIS (US) layer includes feature
name, type of name, feature class type, state FIPS, county FIPS
and topographic map name. The layer contains a record for each
of the primary and variant names in GNIS. All attributes were taken
or derived from attributes found in the GNIS database.
Attribute information stored in the GNPS (non-US) layer includes:
feature name, type of name, feature class and class type, country
code, province code and region of the world code. The layer contains
a record for each of the conventional, native, variant and non-verified
names in GNPS. Diacritics and special characters are not included
in the names fields. All attributes were taken or derived
from attributes found in the GNPS database.
DATA CONVERSION/IMPORT
Conversion of the GNIS and GNPS data involved exporting the data
to flat file formats, transferring the data from a PC environment
to UNIX, preprocessing the exported files to produce a format with
a single record per feature name, and converting degrees, minutes
and seconds to decimal degrees.
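The coordinate conversion step can be sketched in a few lines of Python. This is an illustration rather than the awk and C code used in the project; the function name is hypothetical, and the sign convention (south and west negative) is an assumption, since the paper does not state how hemispheres were encoded in the output.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds and a hemisphere letter
    to signed decimal degrees (assumed: S and W are negative)."""
    hemisphere = hemisphere.strip().upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value


# Coordinates quoted for San Diego in the TRIVIA section of this paper.
lat = dms_to_decimal(32, 42, 55, "N")
lon = dms_to_decimal(117, 9, 23, "W")
print(f"{lat:.6f}, {lon:.6f}")
```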
The conversion of raw GNIS data into an SDE layer required several
steps. GNIS data in a flat file format was generated using query
and export capabilities available in the GNIS software on the
PC. The data was transferred (ftp) from a PC to UNIX. UNIX commands
(awk in particular) and C programs were run on the data to produce
a flat file format that contained records with feature coordinates
in decimal degrees and with a single record per feature name.
An import module, written using the SDE C API, loaded the data
into the SDE US names layer.
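The "single record per feature name" preprocessing can be pictured with the following Python sketch. The input layout is entirely hypothetical (a pipe-delimited line holding a primary name, a semicolon-separated list of variants, a feature class and decimal-degree coordinates); the actual GNIS export format and the awk and C programs used in the project are not reproduced here.

```python
def explode_names(line, sep="|", variant_sep=";"):
    """Turn one feature line into one record per name, primary first.

    Assumed (hypothetical) layout: primary|variants|class|lat|lon
    """
    fields = line.rstrip("\n").split(sep)
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    primary, variants, feature_class, lat, lon = fields
    lat, lon = float(lat), float(lon)
    records = [(primary, "official", feature_class, lat, lon)]
    for variant in variants.split(variant_sep):
        variant = variant.strip()
        if variant:  # skip empty entries such as a trailing separator
            records.append((variant, "variant", feature_class, lat, lon))
    return records


# Made-up sample line, for illustration only.
sample = "Example City|Old Example;Example Town|populated place|32.5|-117.25"
for record in explode_names(sample):
    print(record)
```

Each output record carries the same coordinate and feature class as its parent feature, so every name, primary or variant, can be loaded as its own point in the layer.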
The GNPS SDE layer was loaded from a flat file that was produced
from an export and manipulation of GNPS data. The export program
handled the conversion of coordinates from degrees, minutes and
seconds format to decimal degrees format. The data exported from
GNPS was stripped of diacritics and special characters. The export
of the data was performed on a PC. Additional processing of the
data occurred on a UNIX workstation similar to the preprocessing
of GNIS raw data. The additional processing created a flat file
format with one record per feature name. An import module written
with the SDE C API was used to load the data into the SDE world
names layer.
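For readers who need to reproduce the removal of diacritics, one common approach is sketched below in Python. It is a stand-in, not the GNPS export program described above: it relies on Unicode decomposition from the standard `unicodedata` module, and the function name is hypothetical.

```python
import unicodedata


def strip_diacritics(name):
    """Remove combining marks after Unicode compatibility decomposition."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Québec"))
print(strip_diacritics("São Paulo"))
```

Letters that have no Unicode decomposition, such as "ø" or "ß", pass through unchanged, so a complete solution would also need an explicit substitution table for those characters.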
DEVELOPMENT AND TARGET SYSTEM ENVIRONMENT
System and software development was performed on a Sun workstation
that contained SDE (v2.1), Oracle RDBMS (v7), ArcView (v3) and
a UNIX development environment. The target environment was a DEC
Alpha server and client DEC Alpha workstations and NT PCs. The
server contained SDE (v2.1) and Oracle RDBMS (v7). The client
systems had ArcView (v3). The development and target environments
were on physically separate local area
networks. Software and data generated on the development system
were transferred to the target system via 8-mm tape.
Limited disk resources on the development environment prevented
the SDE names database from being populated with the full GNIS
and GNPS data at the same time. The target system had ample resources
to hold both SDE names data layers concurrently. In the target
environment the SDE database co-existed in the same Oracle database
with other non-names databases. Also, the Oracle server and SDE
server resided on the same DEC Alpha server.
No software conversion was required to port the software and data
developed on the development environment to the target environment.
The SDE names database was exported on the development machine
using the Oracle export (exp) utility and imported on the target
machine using the Oracle import (imp) utility. The end user application
software consisted of ArcView Avenue scripts that were integrated
into ArcView project files in the target environment.
APPLICATION REQUIREMENTS
The application was required to search for names data and generate
a result that could be viewed graphically and used for further
analysis. The ability to constrain a query using attributes and
to retrieve attribute values was also required. ArcView was selected
as the front end interface for users to query, display and analyze
names data. It offers a class of objects called database themes
(dbtheme) that provide access into an SDE database. The dbtheme
class is available in ArcView 3 under the Database Themes extension.
Database themes, although quite powerful, are not sufficient as
a direct user interface. Displayed unconstrained in an ArcView view
document, the volume of data in the SDE names database
would typically saturate the user's display with point name
features. Also, since users of the system
are novice computer and GIS users, a more directed user interface
was required. The interface had to provide the users with a capability
to manage database access, define and refine query constraints,
query the database, and return results.
APPLICATION DESCRIPTION
The user interface was developed using ArcView's Avenue scripting
language. The names application was incorporated into the user's existing
ArcView environment as menu items separated into three groups
of options: 1) Manage database access, 2) Constrain a query, and
3) Perform or show the query.
The login options allowed the user to log into the SDE database
using a specific login account and password or to log in using
the automatic login capability. The automatic login capability
in SDE requires that users have an SDE login account set up by the
SDE administrator. Access to the database is then granted because
the user is considered verified by having logged on to their
operating system account. From the user's
perspective, all that is required is to confirm the question:
"Do you want to use the automatic login?" The user also has the
option to log out and log into another SDE account. Access to
the SDE names database was set to read only for users.
The "Constrain a query" group of menu options provides the
user with the ability to specify how to limit and select data
from the database. The SDE layers are categorized as U.S.
or non-U.S. (World) names. The user can select to query either
source or both. If only the U.S. option is selected,
then the World constraint options are grayed out. If only the World
option is selected, then the U.S. constraint options are grayed out. The
constraint options include: name of feature, feature category
class (populated place, airport, river, etc.), state, country
and name type (official, variant). The user can also select whether
the query should include only features that would appear in the
current ArcView view or all applicable features in the database.
The user has further query capabilities on the name constraint.
The user may specify that the name be an exact match, start with
the given text, contain the given text or sound like the given text.
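The four name matching modes map naturally onto SQL predicates. The Python sketch below shows one plausible mapping; it is not the project's Avenue code, and the paper does not say how its SQL was generated. The column name `NAME`, the mode labels and the function name are hypothetical. `SOUNDEX` is a standard Oracle SQL function, and `:name` is an Oracle-style named bind variable.

```python
def name_predicate(mode, name, column="NAME"):
    """Return (sql_fragment, bind_value) for one of the four match modes.

    The column name and mode labels are illustrative assumptions.
    """
    if mode == "exact":
        return f"{column} = :name", name
    if mode == "starts_with":
        return f"{column} LIKE :name", name + "%"
    if mode == "contains":
        return f"{column} LIKE :name", "%" + name + "%"
    if mode == "sounds_like":
        return f"SOUNDEX({column}) = SOUNDEX(:name)", name
    raise ValueError(f"unknown match mode: {mode!r}")


for mode in ("exact", "starts_with", "contains", "sounds_like"):
    fragment, value = name_predicate(mode, "San Diego")
    print(f"WHERE {fragment}   -- :name = {value!r}")
```

Passing the user's text as a bind value rather than pasting it into the statement avoids quoting problems with names that contain apostrophes. Note that a literal "%" or "_" typed by the user would still act as a wildcard in the LIKE modes unless it is escaped.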
Once satisfied with constraints of the query, the user may show
the query as an SQL statement or run the query. Query results
are returned in a shapefile format. The shapefile is added as
a theme into the legend in the current ArcView view. U.S. and
World name features are returned in separate themes as their attribute
fields differ. If no features meet the current constraints, a message
box informs the user that zero records were returned, and no shapefile
is created. The user may refine the query or define a new
query following the completion of a query.
RESOURCE MANAGEMENT
The time and effort to set up the database and transfer data layers
from the development system to the target system turned out to
be non-trivial. The Oracle export (exp) and import (imp) utilities
were used to unload the data from the development system and load
it to the target system.
The export of the data went smoothly. A number of resource management
problems were experienced when the data was imported. These problems
consisted of defining adequate tablespace resources for constraints,
indexes, rollback segments and temporary segments. It took several
data load iterations to define resources properly.
Users may want to consider other options when transferring SDE
layers between systems. In other projects, we have been very successful
using the SDE to Shape (sde2shp) and Shape to SDE (shp2sde) utilities.
We could also have used our original import modules to load the
flat files on the target system.
FUTURE PLANS
Several enhancements for the SDE names database system are underway
or being considered. These include upgrading of the SDE software
and database, enhancing the user interface and making use of special
characters and diacritics.
SDE version 2.1 is the platform of the current names database.
With the soon-to-be-released SDE version 3, additional capabilities
will be available. Among these is the ability to join SDE tables
with other tables in the database.
This could greatly enhance the capabilities of a names database, although
it would also require a redesign of the database. Additional
attributes from non-official databases could then be easily integrated
into the SDE names database.
The current user interface operates as menu items on an ArcView
view. Creating the interface as an ArcView extension would make
the software more portable and allow the user to easily remove
the options when they are not in use. In addition, ArcView's
Dialog Designer (also soon to be released) could be used to create
a simpler, easier-to-use interface.
The SDE names database does not currently store diacritics for
foreign names. The expected future adoption of a standard character
encoding, such as ISO 10646, will make handling and
display easier than with some of the current proprietary fonts.
TRIVIA
So, where in the world is the Esri user conference this year?
The answer, of course, is San Diego. San Diego is at latitude 32
degrees, 42 minutes, 55 seconds north and longitude 117 degrees,
9 minutes, 23 seconds west in the database. The San Diego International
Airport, which many of us will be flying into, is at latitude
32 degrees, 44 minutes, 0 seconds north and longitude 117 degrees,
11 minutes, 15 seconds west.
CONCLUSION
The Spatial Database Engine (SDE) is a server of GIS data stored
in a relational database management system and a provider of GIS
services. A geographic names database utilizing this technology
has been developed. The content of the database is official geographic
names and attributes from the NIMA GNPS database and USGS GNIS
database. A customized menu interface in ArcView provides access
to the database. The design and implementation of system and Oracle
resources require careful planning for an SDE database. Additional
planning requirements occur when transporting the database to
another location. The availability of names data in a GIS environment
provides users with a valuable resource.
Thomas Quinn, MTS
TASC, Inc.
12100 Sunset Hills Road
Reston, VA 20190
Telephone: (703) 834-5000
Fax: (703) 318-7900
E-mail: taquinn@tasc.com