An Avenue toolset for the development of National
Hydrography Dataset data
Chuck Horne
Substantial portions of the production, conflation, and use of the USGS National Hydrography Dataset in ArcInfo format can benefit from an in-memory model of the relationships between features. A variety of tools developed in our operation use these concepts to significantly improve the speed and accuracy of interacting with the data when conflating, quality checking, creating, or analyzing it.
This paper describes the underlying in-memory data structures.
Introduction: The National Hydrography Dataset (NHD) is a nation-wide geographic data set specified by the USGS and extending the concepts and purposes of the EPA Office of Water River Reach File (RF) series. The development of this data series has been underway for some time, with significant cooperation between federal, state, and private organizations. The fundamental goals of the NHD are:
A particular challenge to the development and maintenance of such a database is to assure that the goals are met even when data is added or updated. Map scales tend to increase over time as the technical capacity and resources applied to GIS improve; for data used primarily as cartographic or attribute retrieval sources, the new features apparent at larger scales impose little impact on legacy data. Given the goals of the NHD however, synchronizing the relationship between added and existing features is of paramount importance. Since one of the most expensive costs in data development is operator time, there is substantial need for computer tools that check or establish the correctness of feature relationships. This paper discusses some tools developed to address this need with respect to the NHD.
The Office of Information Technology Outreach Services (ITOS) of the University of Georgia has been in partnership with the USGS since 1998. Originally, ITOS was implementing the DLG-F standard for transportation and hydrologic features for the State of Georgia. The process of migrating to the NHD standard for Esri covers has been explored with the intention of conflating the 1:24,000 DLG-F hydro data and the 1:100,000 NHD. The tools described here were developed to address specific needs arising from the evaluation of the conflation process. Because the author has changed jobs and actual production at ITOS had not begun as of this writing, the tool set is not complete. The programming concepts embodied in the tools, however, have been and continue to be extended for NHD and other applications.
Obviously, the edge-junction model implicit in the Arc8 Geodatabase can be exploited to achieve many of these same effects. The choice of Avenue as a development environment was based on these points:
The remainder of this paper will focus on the algorithms used, performance, and future directions of the Avenue scripts.
FLIP: The flip process addresses the problem of arc direction. The NHDinArc specification requires that flow-modeling arcs (e.g., streams and artificial flow paths, but not water body boundaries) be directed from upstream to downstream. This algorithm is relatively straightforward and will serve to illustrate some of the basic concepts that are applied in other situations.
Step 1: The underlying data structure is an in-memory representation of the arc-node topology present in the cover. This data structure is initialized by reading through the AAT in one pass. As each record in the AAT is read, one arc dictionary entry is created using the arc# as key and recording the to- and from-nodes as well as an initial 0 (not-visited) value for its flip status. The to- and from-nodes are evaluated and a node dictionary entry is created or appended to for each. Each node dictionary entry has an initial value of nil for its downstream node and a list of the arcs that connect to it, along with an indicator of whether the node is a to- or from-node for each arc.
Arc# -> {tNode#, fNode#, flipStatus}
Node# -> {downstreamNode, {{arc#, isToNode}, …}}
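The one-pass build of these two dictionaries can be sketched in Python (the original tools are written in Avenue); the function name build_topology and the three-arc sample AAT below are illustrative, not part of the tool set:

```python
def build_topology(aat_rows):
    """Build arc and node dictionaries from (arc#, from-node, to-node) rows,
    mirroring the Step 1 structures: each arc records its end nodes and a
    flip status of 0 (not-visited); each node records a nil downstream node
    and a list of (arc#, isToNode) pairs for every connecting arc."""
    arcs = {}   # arc#  -> {'tNode': ..., 'fNode': ..., 'flip': 0}
    nodes = {}  # node# -> {'downstream': None, 'arcs': [(arc#, isToNode), ...]}
    for arc_id, f_node, t_node in aat_rows:
        arcs[arc_id] = {'tNode': t_node, 'fNode': f_node, 'flip': 0}
        nodes.setdefault(f_node, {'downstream': None, 'arcs': []})['arcs'].append((arc_id, False))
        nodes.setdefault(t_node, {'downstream': None, 'arcs': []})['arcs'].append((arc_id, True))
    return arcs, nodes

# Hypothetical sample: arcs 1 and 2 flow into node 30, arc 3 exits it.
aat = [(1, 10, 30), (2, 20, 30), (3, 30, 40)]
arcs, nodes = build_topology(aat)
```

Because each lookup is a keyed dictionary access, traversals touch only the entries involved rather than rescanning the AAT.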
Using
these data structures, an operation can start with a given arc and, using the
to- or from-node members of the arc’s Dictionary entry, access node
information in the node’s Dictionary entry. Included in that entry is
information about all connected arcs. These simple retrievals are executed in
memory and are only moderately impacted as the size of the data set increases.
“Traversals” proceed very rapidly.
Step 2: The user identifies the exit node for the network. All arcs connected to that node (i.e., retrieved from the node entry's arc list) are pushed onto a stack as a list that includes the node number as the entryNode for the arc. The exit node's downstream node number is set to the special value -1. The main processing loop proceeds as long as elements can be popped off the stack.
Step 3: An element is popped off the stack. The arc is evaluated for flip status: if the entryNode is the to-node for the arc, its flip status is set to -1 (don't-flip); otherwise it is set to 1 (flip). The node at the other end (otherNode) is evaluated for connected arcs: any connected arcs (except the current arc) are pushed onto the stack, with the entry node being otherNode. The downstream node for the otherNode is set to entryNode.
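Steps 2 and 3 together might look like the following Python sketch. It assumes a dendritic network and omits the divergence handling described later; the function name flip_pass and the four-node sample network are hypothetical:

```python
def flip_pass(arcs, nodes, exit_node):
    """Traverse upstream from exit_node and set each arc's flip status:
    -1 = already directed downstream, 1 = needs flipping, 0 = never visited.
    Dendritic networks only; divergence handling is omitted."""
    nodes[exit_node]['downstream'] = -1      # sentinel for the exit node
    stack = [(arc_id, exit_node) for arc_id, _ in nodes[exit_node]['arcs']]
    while stack:
        arc_id, entry_node = stack.pop()
        arc = arcs[arc_id]
        if arc['flip'] != 0:
            continue                         # already visited
        # Entering at the to-node means the arc already points downstream.
        entered_at_to = (arc['tNode'] == entry_node)
        arc['flip'] = -1 if entered_at_to else 1
        other = arc['fNode'] if entered_at_to else arc['tNode']
        nodes[other]['downstream'] = entry_node
        for next_arc, _ in nodes[other]['arcs']:
            if next_arc != arc_id:
                stack.append((next_arc, other))

# Hypothetical sample: arcs 1 and 3 already point downstream; arc 2 is reversed.
arcs = {1: {'fNode': 10, 'tNode': 30, 'flip': 0},
        2: {'fNode': 30, 'tNode': 20, 'flip': 0},   # drawn "backwards"
        3: {'fNode': 30, 'tNode': 40, 'flip': 0}}
nodes = {10: {'downstream': None, 'arcs': [(1, False)]},
         20: {'downstream': None, 'arcs': [(2, True)]},
         30: {'downstream': None, 'arcs': [(1, True), (2, False), (3, False)]},
         40: {'downstream': None, 'arcs': [(3, True)]}}
flip_pass(arcs, nodes, 40)
```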
This basic algorithm will traverse upstream from the exit point and set the flip value for all connected arcs in a dendritic network. Unconnected arcs will be recognizable by their flip status of 0 (not-visited). Upon completion, the flip status can be written out as an arc attribute so that all arcs needing flipping can be selected and flipped with two ArcEdit statements. A complication is the fact that hydrologic networks are not necessarily dendritic: a stream may diverge into multiple segments that later flow back into the network. The downstream node element in the node dictionary entry is used to circumvent this problem. In Step 3, if otherNode is found to have a downstream node already set, then a divergence is recognized, although the point of recognition is not necessarily the point of divergence. Also, any flip values already set in the loop caused by the divergence may be invalid. In order to isolate the arcs in the loop, the following process is used:
Divergence processing: the current entryNode's downstream node members are used to traverse back down to the original exit point, whose downstream value is -1, compiling a list (AList) of all nodes encountered. Then a second list (BList) is created by traversing through downstream nodes from otherNode, checking at each node for its presence in AList. When a node in AList is encountered, the circle of nodes that includes entryNode, otherNode, BList, and all elements of AList from entryNode to the intersection node comprises the members of a circularity. The interconnecting arcs are given a flip status of 2 (circularity). Since the circularity was recognized at a node already processed, all of its connecting arcs have been stacked; no other steps need be taken to continue processing through the divergence. Of course, those stacked arcs may include some later recognized as part of the circularity, but a quick check of flip status in Step 3 avoids reprocessing divergences.
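The AList/BList walk that isolates the loop can be sketched in Python as follows; the function name circularity_nodes and the downstream-pointer sample are hypothetical:

```python
def circularity_nodes(nodes, entry_node, other_node):
    """Identify the nodes on the loop formed by a divergence, discovered
    when other_node already has a downstream node set."""
    # AList: follow downstream pointers from entry_node to the exit sentinel -1.
    a_list = []
    n = entry_node
    while n != -1:
        a_list.append(n)
        n = nodes[n]['downstream']
    # BList: follow downstream pointers from other_node until hitting AList.
    b_list = []
    n = other_node
    while n not in a_list:
        b_list.append(n)
        n = nodes[n]['downstream']
    # The circularity: entry_node..intersection along AList, plus all of BList.
    return a_list[:a_list.index(n) + 1] + b_list

# Hypothetical downstream pointers: 3 -> 2 -> 1 -> exit, with a divergent
# branch 4 -> 2 discovered while processing entry node 3.
nodes = {1: {'downstream': -1}, 2: {'downstream': 1},
         3: {'downstream': 2}, 4: {'downstream': 2}}
loop = circularity_nodes(nodes, 3, 4)
```

In the full tool, the arcs interconnecting these nodes would then receive a flip status of 2.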
Variations
on FLIP. Divergence processing can be improved by using Euclidean
distance or shortest path approaches to establish flow direction for a subset of
arcs in the circularity. Likewise, additional node or arc attributes can be used
to alter the behavior of the base algorithm when pumps or non-flowing
connections are involved.
ArboLatSum: Stream level is a number that is constant along the "main" path from the source to the exit point. Tributaries have a level value one higher than the stream they empty into. Where streams flow together, the determination of which segment constitutes the main flow path is not methodically determined: hydrologically, the greater volume of inflow is the most important consideration. Other considerations include:
The determination of ArboLatSum involves reading the network topology in a fashion similar to that of FLIP. Additional dictionary entry items include accumulated length (initialized to -1), arc length, geographic name, and the flip value from a previous application of the FLIP tool.
Arc# -> {tNode#, fNode#, flipStatus, accumulated length, arc length, geographic name, flip}
Node# -> {downstreamNode, {{arc#, isToNode}, …}}
The arcs designated as "flip" have been flipped in ArcEdit; i.e., arcs are directed downstream except where divergence makes the determination of flow direction impossible. The flip item in an arc dictionary structure is used to identify divergences, which require special processing. After reading the AAT to initialize the arc and node dictionaries, the steps are:
Step 1: Stack all "leaf arcs". These are identified from all nodes which:
Step 2: Pop elements off the stack until it is empty. For each popped arc, get its upstream node, upNode. For each arc attached to upNode (except the currently popped arc), total the accumulated length. Set this arc's accumulated length to that total plus the arc's own length. In the case of leaf arcs, the accumulated length will equal the arc length. Consider the arc's downstream node, downNode. For each arc connected to downNode (except the current arc), make a list of those which are upstream of downNode (i.e., their to-node is downNode) and identify the arc which is downstream (i.e., its from-node is downNode). If all arcs in the list have an accumulated length, then push the downstream arc. When the stack is empty, go to Step 3.
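The length-accumulation pass can be sketched in Python as follows. The helper accumulate_lengths and the sample network are hypothetical; the sketch assumes a dendritic network with arcs already directed downstream, and, since the original leaf-node criteria list is elided above, uses one plausible leaf test (the from-node touches no other arc):

```python
def accumulate_lengths(arcs, nodes):
    """Steps 1-2: accumulate length downstream through a dendritic network.
    accLen is -1 until an arc has been processed."""
    # Step 1 (assumed leaf test): from-node connects to only this arc.
    stack = [a for a, d in arcs.items() if len(nodes[d['fNode']]['arcs']) == 1]
    while stack:
        arc_id = stack.pop()
        arc = arcs[arc_id]
        # Total the accumulated length of the other arcs at the upstream node.
        up = arc['fNode']
        inflow = sum(arcs[a]['accLen'] for a, _ in nodes[up]['arcs'] if a != arc_id)
        arc['accLen'] = inflow + arc['length']
        # Push the downstream arc once every inflowing arc has a total.
        down = arc['tNode']
        upstream = [a for a, is_to in nodes[down]['arcs'] if is_to]
        downstream = [a for a, is_to in nodes[down]['arcs'] if not is_to]
        if downstream and all(arcs[a]['accLen'] >= 0 for a in upstream):
            stack.append(downstream[0])

# Two headwater arcs (lengths 5 and 3) joining into an outlet arc (length 2).
arcs = {1: {'fNode': 10, 'tNode': 30, 'length': 5.0, 'accLen': -1},
        2: {'fNode': 20, 'tNode': 30, 'length': 3.0, 'accLen': -1},
        3: {'fNode': 30, 'tNode': 40, 'length': 2.0, 'accLen': -1}}
nodes = {10: {'arcs': [(1, False)]}, 20: {'arcs': [(2, False)]},
         30: {'arcs': [(1, True), (2, True), (3, False)]},
         40: {'arcs': [(3, True)]}}
accumulate_lengths(arcs, nodes)
```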
Step 3: Identify the exit node for the network and a value for the lowest level in the network. Determine the greatest accumulated length among the arcs connected to the exit node. For each connected arc, push onto a stack an entry that has the arc and a level: for the arc with the greatest accumulated length, that level is the lowest level; for all others it is the lowest level + 1.
Step 4: While the stack has members, pop an entry, set the level item in the arc dictionary entry to the level popped from the stack, and consider each connected arc at its far end. Whichever of the connected arcs has the same geographic name (or, if there is none, the greatest accumulated length) is stacked with the current level; all others are stacked with the current level + 1.
Step 5: When the stack is empty, level values from the arc dictionary can be written out.
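The leveling walk of Steps 3 and 4 might be sketched as follows. For brevity the geographic-name test is omitted: the main path at each junction is taken as simply the inflowing arc with the greatest accumulated length. The function name assign_levels and the sample network are hypothetical:

```python
def assign_levels(arcs, nodes, exit_node, lowest_level):
    """Walk upstream from exit_node, assigning stream levels: the main path
    keeps the current level, tributaries get current level + 1."""
    connected = [a for a, _ in nodes[exit_node]['arcs']]
    main = max(connected, key=lambda a: arcs[a]['accLen'])
    stack = [(a, lowest_level if a == main else lowest_level + 1)
             for a in connected]
    while stack:
        arc_id, level = stack.pop()
        arcs[arc_id]['level'] = level
        # Consider the connected arcs at the far (upstream) end.
        up = arcs[arc_id]['fNode']
        feeders = [a for a, is_to in nodes[up]['arcs'] if is_to]
        if not feeders:
            continue
        main_up = max(feeders, key=lambda a: arcs[a]['accLen'])
        for a in feeders:
            stack.append((a, level if a == main_up else level + 1))

# A two-tributary sample with accumulated lengths already computed:
# arc 1 (longer tributary) and arc 2 join into outlet arc 3.
arcs = {1: {'fNode': 10, 'tNode': 30, 'accLen': 5.0, 'level': None},
        2: {'fNode': 20, 'tNode': 30, 'accLen': 3.0, 'level': None},
        3: {'fNode': 30, 'tNode': 40, 'accLen': 10.0, 'level': None}}
nodes = {10: {'arcs': [(1, False)]}, 20: {'arcs': [(2, False)]},
         30: {'arcs': [(1, True), (2, True), (3, False)]},
         40: {'arcs': [(3, True)]}}
assign_levels(arcs, nodes, 40, 1)
```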
Since level is actually a reach route feature attribute in NHDinArc, the above algorithm can result in ambiguity. Where the longer tributary joins a reach somewhere between its upstream- and downstream-most nodes, arc leveling will follow the longer tributary. The reach route containing the downstream segment and the shorter tributary will be composed of arcs whose levels differ. However, if the assignment of reach route level is made consistent with the downstream-most arc level, then the level assignment may be considered correct, at the penalty of having an intersection of tributaries at which both inflowing reaches are at the same level.
Another complication with this algorithm is divergence processing. At the time of the original tool development, the following strategy was employed to deal with divergences.
Divergence processing in ArboLatSum: When Step 2 above is being executed, the downstream traversal can recognize that a divergence has been encountered by the arc entry's flip item. At this point, the following takes place:
Divergence step 1: The entry node of the divergent arc is considered. That arc and all of its connected arcs are stacked with the entry node.
Divergence step 2: Entries are popped until the stack is empty. The popped arc is added to a list, allEncountered, and the entry node is added to another list, allNodes. Each popped arc's otherNode is considered. All arcs connected to the otherNode are added to the allEncountered list if not already there. If those arcs are members of the divergence (i.e., flip value is 2), then their otherNode is considered and its connected arcs are stacked with otherNode. These divergent arcs are added to a list, allDivergent. The non-divergent arcs (i.e., flip value not 2) are added to a list, allConnected.
Divergence step 3: When the stack is empty, the allDivergent list will contain all arcs that are members of the divergent set, while all arcs that are directly connected (i.e., share a node) but are not members of the divergent set will be in the allConnected list. The members of allConnected are considered. Ordinarily, one of these will be outflowing; this will be detected because its from-node is in allNodes. The others will be inflowing; their to-nodes will be in allNodes.
Divergence step 4: Replacement arc dictionary structures are created for each inflowing connected arc. Each replacement arc has a new arc# (incremented from the highest arc number encountered at cover read time), a to-node that is the from-node of the outflowing arc, and a from-node that is the from-node of the associated inflowing arc. Its accumulated length will be the sum of the accumulated lengths upstream. The arc and node dictionaries are updated to mirror the new replacement arc: e.g., the involved node entries will no longer reference the real arcs, but instead will reference the replacement arcs.
When a divergence is encountered, the replacement arcs mirror a dendritic flow that connects the inflowing segment(s) to the outflowing segment. The length accumulation phase "passes through" the divergent set and continues downstream. Because the process replaced the divergent arcs with replacement arcs in the dictionary model, the subsequent leveling phase proceeds upstream of the divergent set as well. However, the divergent arcs themselves do not have an assigned level.
In some cases, there are multiple outflowing segments; this situation has not been addressed.
RFLOW The RFLOW table is a cross-reference between inflowing and outflowing drain route features. The fundamental concepts illustrated by FLIP and ArboLatSum data structures are extended to model routes. The data structures required are more complex, in the sense that they have more components and inter-relationships, but they are not otherwise conceptually more challenging. The basic process of reading through cover tables in one pass and constructing dictionaries keyed by unique feature ID numbers is the same. One-to-many relationships are still maintained in lists which are elements of the dictionary entry.
Currently, the NHDinArc specification is not
explicit about some of the assumptions required for the approach to work.
Whereas there is a general statement that features representing flowing water
will be directed downstream, the potentially discontinuous implementation of
routes (e.g., the “branched reach” connecting artificial flow paths within a
waterbody with multiple inflows) will require special processing.
The RFLOW algorithm is not discussed here because of the potential alterations that may be required as specification issues are finalized as well as the bulky and convoluted descriptions that result from data structures of this size. Instead, we present a brief account of the nature of those data structures as well as the applications that may be constructed upon them.
After the transport and reach routes and the waterbody and waterbody reach features have been constructed, the NHD cover is read. The route model is built by reading the section table then the route table for both the transport and reach features, then the arc table is read and finally the model is finalized by correlating relationships between the feature types. Optionally, region and polygon relationships can be stored as well.
Transport route# -> {{sortedSectionList}, startNodeList, endNode, {Com_id, Rch_com_id, Wb_com_id, Ftype, Fcode}}
Reach route# -> {{sortedSectionList}, startNode, endNode, {Com_id, Rch_code, Level, Gnis_id}}
Section# -> {route, arc, fMeas, tMeas, withArc, fromNode, toNode}
Arc# -> {fNode, tNode, 0, transportRoute, drainRoute, lPoly, rPoly}
Node# -> {downstreamNode, {{arc, True}}}
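As a rough illustration only, the route-model dictionaries above might be rendered in Python as nested dicts. All IDs and attribute values in this sample are hypothetical; the point is the cross-references between routes, sections, and arcs:

```python
# A one-route sample: transport route 101 is measured over two sections,
# which in turn reference the underlying arcs.
transport_routes = {
    101: {'sections': [201, 202],          # sortedSectionList
          'startNodes': [10], 'endNode': 40,
          'attrs': {'Com_id': 555, 'Rch_com_id': 556, 'Wb_com_id': 0,
                    'Ftype': 'StreamRiver', 'Fcode': 46006}},
}
sections = {
    201: {'route': 101, 'arc': 1, 'fMeas': 0.0, 'tMeas': 50.0,
          'withArc': True, 'fromNode': 10, 'toNode': 30},
    202: {'route': 101, 'arc': 3, 'fMeas': 50.0, 'tMeas': 100.0,
          'withArc': True, 'fromNode': 30, 'toNode': 40},
}
arcs = {
    1: {'fNode': 10, 'tNode': 30, 'flip': 0,
        'transportRoute': 101, 'drainRoute': None, 'lPoly': 0, 'rPoly': 0},
    3: {'fNode': 30, 'tNode': 40, 'flip': 0,
        'transportRoute': 101, 'drainRoute': None, 'lPoly': 0, 'rPoly': 0},
}
# Correlation step: each section points back at its route, and each arc
# records which transport route carries it.
first_section = transport_routes[101]['sections'][0]
```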
The tools described
here were written while the author was employed at the Office of Information
Technology Outreach Services at the University of Georgia.
Chuck Horne
Senior GIS Analyst
Jordan, Jones, and
Goulding, Inc.
2000 Clearview
Avenue
Atlanta, GA 30340
Phone: 678-575-4252
FAX: 706-549-0423
Email: chorne@jjg.com