An Avenue toolset for the development of National
Hydrography Dataset data
Chuck Horne
Substantial portions of the production, conflation, and use of the USGS National Hydrography Dataset in ArcInfo format can benefit from an in-memory model of the relationships between features. A variety of tools developed in our operation use these concepts to significantly improve the speed and accuracy of interacting with the data when conflating, quality checking, creating, or analyzing it.
This paper describes the underlying in-memory data structures.
Introduction: The National Hydrography Dataset (NHD) is a nation-wide geographic data set specified by the USGS and extending the concepts and purposes of the EPA Office of Water River Reach File (RF) series. The development of this data series has been underway for some time, with significant cooperation between federal, state, and private organizations. The fundamental goals of the NHD are:
A particular challenge to the development and maintenance of such a database is to assure that the goals are met even when data is added or updated. Map scales tend to increase over time as the technical capacity and resources applied to GIS improve; for data used primarily as cartographic or attribute retrieval sources, the new features apparent at larger scales impose little impact on legacy data. Given the goals of the NHD however, synchronizing the relationship between added and existing features is of paramount importance. Since one of the most expensive costs in data development is operator time, there is substantial need for computer tools that check or establish the correctness of feature relationships. This paper discusses some tools developed to address this need with respect to the NHD.
The Office of Information Technology Outreach Services (ITOS) of the University of Georgia has been in partnership with the USGS since 1998. Originally, ITOS was implementing the DLG-F standard for transportation and hydrologic features for the State of Georgia. The process of migrating to the NHD standard for Esri covers has been explored with the intention of conflating the 1:24,000 DLG-F hydro data and the 1:100,000 NHD. The tools described here were developed to address specific needs arising from the evaluation of the conflation process. Because the author has changed jobs and actual production at ITOS had not begun as of this writing, the tool set is not complete. The programming concepts embodied in the tools, however, have been and continue to be extended for NHD and other applications.
Obviously, the edge-junction model implicit in the Arc8 Geodatabase can be exploited to achieve many of these same effects. The choice of Avenue as a development environment was based on these points:
The remainder of this paper will focus on the algorithms used, performance, and future directions of the Avenue scripts.
FLIP: The flip process addresses the problem of arc direction. The NHDinArc specification requires that flow-modeling arcs (e.g., streams and artificial flow paths, but not water body boundaries) be directed from upstream to downstream. This algorithm is relatively straightforward and will serve to illustrate some of the basic concepts that are applied in other situations.
Step 1: The underlying data structure is an in-memory representation of the arc-node topology present in the cover. This data structure is initialized by reading through the AAT in one pass. As each record in the AAT is read, one arc dictionary entry is created using the arc# as key and recording the to- and from-nodes as well as an initial 0 (not-visited) value for its flip status. The to- and from-nodes are evaluated and a node dictionary entry is created or appended to for each. Each node dictionary entry has an initial value of nil for its downstream node and a list of the arcs that connect to it, along with an indicator of whether the node is a to- or from-node for each arc.
Arc# -> {tNode#, fNode#, flipStatus}
Node# -> {downstreamNode, {{arc#, isToNode}, …}}
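The one-pass build of these two dictionaries can be sketched in Python (the original tools are written in Avenue); the function name build_topology and the three-arc sample AAT below are illustrative, not part of the tool set:

```python
def build_topology(aat_rows):
    """Build arc and node dictionaries from (arc#, from-node, to-node) rows,
    mirroring the Step 1 structures: each arc records its end nodes and a
    flip status of 0 (not-visited); each node records a nil downstream node
    and a list of (arc#, isToNode) pairs for every connecting arc."""
    arcs = {}   # arc#  -> {'tNode': ..., 'fNode': ..., 'flip': 0}
    nodes = {}  # node# -> {'downstream': None, 'arcs': [(arc#, isToNode), ...]}
    for arc_id, f_node, t_node in aat_rows:
        arcs[arc_id] = {'tNode': t_node, 'fNode': f_node, 'flip': 0}
        nodes.setdefault(f_node, {'downstream': None, 'arcs': []})['arcs'].append((arc_id, False))
        nodes.setdefault(t_node, {'downstream': None, 'arcs': []})['arcs'].append((arc_id, True))
    return arcs, nodes

# Hypothetical sample: arcs 1 and 2 flow into node 30, arc 3 exits it.
aat = [(1, 10, 30), (2, 20, 30), (3, 30, 40)]
arcs, nodes = build_topology(aat)
```

Because each lookup is a keyed dictionary access, traversals touch only the entries involved rather than rescanning the AAT.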
Using
these data structures, an operation can start with a given arc and, using the
to- or from-node members of the arc’s Dictionary entry, access node
information in the node’s Dictionary entry. Included in that entry is
information about all connected arcs. These simple retrievals are executed in
memory and are only moderately impacted as the size of the data set increases.
“Traversals” proceed very rapidly.
Step 2: The user identifies the exit node for the network. All arcs connected to that node (i.e., retrieved from the node entry's arc list) are pushed onto a stack as a list that includes the node number as the entryNode for the arc. The exit node's downstream node number is set to the special value -1. The main processing loop proceeds as long as elements can be popped off the stack.
Step 3: An element is popped off the stack. The arc is evaluated for flip status: if the entryNode is the to-node for the arc, its flip status is set to -1 (don't-flip); otherwise it is set to 1 (flip). The node at the other end (otherNode) is evaluated for connected arcs: any connected arcs (except the current arc) are pushed onto the stack, with the entry node being otherNode. The downstream node for the otherNode is set to entryNode.
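Steps 2 and 3 together might look like the following Python sketch. It assumes a dendritic network and omits the divergence handling described later; the function name flip_pass and the four-node sample network are hypothetical:

```python
def flip_pass(arcs, nodes, exit_node):
    """Traverse upstream from exit_node and set each arc's flip status:
    -1 = already directed downstream, 1 = needs flipping, 0 = never visited.
    Dendritic networks only; divergence handling is omitted."""
    nodes[exit_node]['downstream'] = -1      # sentinel for the exit node
    stack = [(arc_id, exit_node) for arc_id, _ in nodes[exit_node]['arcs']]
    while stack:
        arc_id, entry_node = stack.pop()
        arc = arcs[arc_id]
        if arc['flip'] != 0:
            continue                         # already visited
        # Entering at the to-node means the arc already points downstream.
        entered_at_to = (arc['tNode'] == entry_node)
        arc['flip'] = -1 if entered_at_to else 1
        other = arc['fNode'] if entered_at_to else arc['tNode']
        nodes[other]['downstream'] = entry_node
        for next_arc, _ in nodes[other]['arcs']:
            if next_arc != arc_id:
                stack.append((next_arc, other))

# Hypothetical sample: arcs 1 and 3 already point downstream; arc 2 is reversed.
arcs = {1: {'fNode': 10, 'tNode': 30, 'flip': 0},
        2: {'fNode': 30, 'tNode': 20, 'flip': 0},   # drawn "backwards"
        3: {'fNode': 30, 'tNode': 40, 'flip': 0}}
nodes = {10: {'downstream': None, 'arcs': [(1, False)]},
         20: {'downstream': None, 'arcs': [(2, True)]},
         30: {'downstream': None, 'arcs': [(1, True), (2, False), (3, False)]},
         40: {'downstream': None, 'arcs': [(3, True)]}}
flip_pass(arcs, nodes, 40)
```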
This basic algorithm will traverse upstream from the exit point and set the flip value for all connected arcs in a dendritic network. Unconnected arcs will be recognizable by their flip status of 0 (not-visited). Upon completion, the flip status can be written out as an arc attribute so that all arcs needing flipping can be selected and flipped with two ArcEdit statements. A complication is the fact that hydrologic networks are not necessarily dendritic: a stream may diverge into multiple segments that later flow back into the network. The downstream node element in the node dictionary entry is used to circumvent this problem. In Step 3, if otherNode is found to have a downstream node already set, then a divergence is recognized, although the point of recognition is not necessarily the point of divergence. Also, any flip values already set in the loop caused by the divergence may be invalid. In order to isolate the arcs in the loop, the following process is used:
Divergence processing: the current entryNode's downstream node members are used to traverse back down to the original exit point, whose downstream value is -1, compiling a list (AList) of all nodes encountered. Then a second list (BList) is created by traversing through downstream nodes from otherNode, checking at each node for its presence in AList. When a node in AList is encountered, the circle of nodes that includes entryNode, otherNode, BList, and all elements of AList from entryNode to the intersection node comprises the members of a circularity. The interconnecting arcs are given a flip status of 2 (circularity). Since the circularity was recognized at a node already processed, all of its connecting arcs have been stacked; no other steps need be taken to continue processing through the divergence. Of course, those stacked arcs may include some later recognized as part of the circularity, but a quick check of flip status in Step 3 avoids reprocessing divergences.
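The AList/BList walk that isolates the loop can be sketched in Python as follows; the function name circularity_nodes and the downstream-pointer sample are hypothetical:

```python
def circularity_nodes(nodes, entry_node, other_node):
    """Identify the nodes on the loop formed by a divergence, discovered
    when other_node already has a downstream node set."""
    # AList: follow downstream pointers from entry_node to the exit sentinel -1.
    a_list = []
    n = entry_node
    while n != -1:
        a_list.append(n)
        n = nodes[n]['downstream']
    # BList: follow downstream pointers from other_node until hitting AList.
    b_list = []
    n = other_node
    while n not in a_list:
        b_list.append(n)
        n = nodes[n]['downstream']
    # The circularity: entry_node..intersection along AList, plus all of BList.
    return a_list[:a_list.index(n) + 1] + b_list

# Hypothetical downstream pointers: 3 -> 2 -> 1 -> exit, with a divergent
# branch 4 -> 2 discovered while processing entry node 3.
nodes = {1: {'downstream': -1}, 2: {'downstream': 1},
         3: {'downstream': 2}, 4: {'downstream': 2}}
loop = circularity_nodes(nodes, 3, 4)
```

In the full tool, the arcs interconnecting these nodes would then receive a flip status of 2.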
Variations
on FLIP. Divergence processing can be improved by using Euclidean
distance or shortest path approaches to establish flow direction for a subset of
arcs in the circularity. Likewise, additional node or arc attributes can be used
to alter the behavior of the base algorithm when pumps or non-flowing
connections are involved.
ArboLatSum: Stream level is a number that is constant along the "main" path from the source to the exit point. Tributaries have a level value one higher than the stream they empty into. Where streams flow together, the determination of which segment constitutes the main flow path is not methodically determined: hydrologically, the greater volume of inflow is the most important consideration. Other considerations include:
The determination of ArboLatSum involves reading the network topology in a fashion similar to that of FLIP. Additional dictionary entry items include accumulated length (initialized to -1), arc length, geographic name, and the flip value from a previous application of the FLIP tool.
Arc# -> {tNode#, fNode#, flipStatus, accumulated length, arc length, geographic name, flip}
Node# -> {downstreamNode, {{arc#, isToNode}, …}}
The arcs designated as "flip" have been flipped in ArcEdit; i.e., arcs are directed downstream except where divergence makes the determination of flow direction impossible. The flip item in an arc dictionary structure is used to identify divergences, which require special processing. After reading the AAT to initialize the arc and node dictionaries, the steps are:
Step 1: Stack all "leaf arcs". These are identified from all nodes which:
Step 2: Pop elements off the stack until it is empty. For each popped arc, get its upstream node, upNode. For each arc attached to upNode (except the currently popped arc), total the accumulated length. Set this arc's accumulated length to that total plus the arc's own length. In the case of leaf arcs, the accumulated length will equal the arc length. Consider the arc's downstream node, downNode. For each arc connected to downNode (except the current arc), make a list of those which are upstream of downNode (i.e., their to-node is downNode) and identify the arc which is downstream (i.e., its from-node is downNode). If all arcs in the list have an accumulated length, then push the downstream arc. When the stack is empty, go to Step 3.
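The length-accumulation pass can be sketched in Python as follows. The helper accumulate_lengths and the sample network are hypothetical; the sketch assumes a dendritic network with arcs already directed downstream, and, since the original leaf-node criteria list is elided above, uses one plausible leaf test (the from-node touches no other arc):

```python
def accumulate_lengths(arcs, nodes):
    """Steps 1-2: accumulate length downstream through a dendritic network.
    accLen is -1 until an arc has been processed."""
    # Step 1 (assumed leaf test): from-node connects to only this arc.
    stack = [a for a, d in arcs.items() if len(nodes[d['fNode']]['arcs']) == 1]
    while stack:
        arc_id = stack.pop()
        arc = arcs[arc_id]
        # Total the accumulated length of the other arcs at the upstream node.
        up = arc['fNode']
        inflow = sum(arcs[a]['accLen'] for a, _ in nodes[up]['arcs'] if a != arc_id)
        arc['accLen'] = inflow + arc['length']
        # Push the downstream arc once every inflowing arc has a total.
        down = arc['tNode']
        upstream = [a for a, is_to in nodes[down]['arcs'] if is_to]
        downstream = [a for a, is_to in nodes[down]['arcs'] if not is_to]
        if downstream and all(arcs[a]['accLen'] >= 0 for a in upstream):
            stack.append(downstream[0])

# Two headwater arcs (lengths 5 and 3) joining into an outlet arc (length 2).
arcs = {1: {'fNode': 10, 'tNode': 30, 'length': 5.0, 'accLen': -1},
        2: {'fNode': 20, 'tNode': 30, 'length': 3.0, 'accLen': -1},
        3: {'fNode': 30, 'tNode': 40, 'length': 2.0, 'accLen': -1}}
nodes = {10: {'arcs': [(1, False)]}, 20: {'arcs': [(2, False)]},
         30: {'arcs': [(1, True), (2, True), (3, False)]},
         40: {'arcs': [(3, True)]}}
accumulate_lengths(arcs, nodes)
```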
Step 3: Identify the exit node for the network and a value for the lowest level in the network. Determine the greatest accumulated length among the arcs connected to the exit node. For each connected arc, push onto a stack an entry that has the arc and a level: for the arc with the greatest accumulated length, that level is the lowest level; for all others it is the lowest level + 1.
Step 4: While the stack has members, pop an entry, set the level item in the arc dictionary entry to the level popped from the stack, and consider each connected arc at its far end. Whichever of the connected arcs has the same geographic name (or, if there is none, the greatest accumulated length) is stacked with the current level; all others are stacked with the current level + 1.
Step 5: When the stack is empty, level values from the arc dictionary can be written out.
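The leveling walk of Steps 3 and 4 might be sketched as follows. For brevity the geographic-name test is omitted: the main path at each junction is taken as simply the inflowing arc with the greatest accumulated length. The function name assign_levels and the sample network are hypothetical:

```python
def assign_levels(arcs, nodes, exit_node, lowest_level):
    """Walk upstream from exit_node, assigning stream levels: the main path
    keeps the current level, tributaries get current level + 1."""
    connected = [a for a, _ in nodes[exit_node]['arcs']]
    main = max(connected, key=lambda a: arcs[a]['accLen'])
    stack = [(a, lowest_level if a == main else lowest_level + 1)
             for a in connected]
    while stack:
        arc_id, level = stack.pop()
        arcs[arc_id]['level'] = level
        # Consider the connected arcs at the far (upstream) end.
        up = arcs[arc_id]['fNode']
        feeders = [a for a, is_to in nodes[up]['arcs'] if is_to]
        if not feeders:
            continue
        main_up = max(feeders, key=lambda a: arcs[a]['accLen'])
        for a in feeders:
            stack.append((a, level if a == main_up else level + 1))

# A two-tributary sample with accumulated lengths already computed:
# arc 1 (longer tributary) and arc 2 join into outlet arc 3.
arcs = {1: {'fNode': 10, 'tNode': 30, 'accLen': 5.0, 'level': None},
        2: {'fNode': 20, 'tNode': 30, 'accLen': 3.0, 'level': None},
        3: {'fNode': 30, 'tNode': 40, 'accLen': 10.0, 'level': None}}
nodes = {10: {'arcs': [(1, False)]}, 20: {'arcs': [(2, False)]},
         30: {'arcs': [(1, True), (2, True), (3, False)]},
         40: {'arcs': [(3, True)]}}
assign_levels(arcs, nodes, 40, 1)
```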
Since level is actually a reach route feature attribute in NHDinArc, the above algorithm can result in ambiguity. Where the longer tributary joins a reach somewhere between its upstream- and downstream-most nodes, arc leveling will follow the longer tributary. The reach route containing the downstream segment and the shorter tributary will be composed of arcs whose levels differ. However, if the assignment of reach route level is made consistent with the downstream-most arc level, then the level assignment may be considered correct, at the penalty of having an intersection of tributaries at which both inflowing reaches are at the same level.
Another complication with this algorithm is divergence processing. At the time of the original tool development, the following strategy was employed to deal with divergences.
Divergence processing in ArboLatSum: When Step 2 above is being executed, the downstream traversal can recognize that a divergence has been encountered by the arc entry's flip item. At this point, the following takes place:
Divergence step 1: The entry node of the divergent arc is considered. That arc and all of its connected arcs are stacked with the entry node.
Divergence step 2: Entries are popped until the stack is empty. The popped arc is added to a list, allEncountered, and the entry node is added to another list, allNodes. Each popped arc's otherNode is considered. All arcs connected to the otherNode are added to the allEncountered list if not already there. If those arcs are members of the divergence (i.e., flip value is 2), then their otherNode is considered and its connected arcs are stacked with otherNode. These divergent arcs are added to a list, allDivergent. The non-divergent arcs (i.e., flip value not 2) are added to a list, allConnected.
Divergence step 3: When the stack is empty, the allDivergent list will contain all arcs that are members of the divergent set, while all arcs that are directly connected (i.e., share a node) but are not members of the divergent set will be in the allConnected list. The members of allConnected are considered. Ordinarily, one of these will be outflowing; this will be detected because its from-node is in allNodes. The others will be inflowing; their to-nodes will be in allNodes.
Divergence step 4: Replacement arc dictionary structures are created for each inflowing connected arc. Each replacement arc has a new arc# (incremented from the highest arc number encountered at cover read time), a to-node that is the from-node of the outflowing arc, and a from-node that is the from-node of the associated inflowing arc. Its accumulated length will be the sum of the accumulated lengths upstream. The arc and node dictionaries are updated to mirror the new replacement arc: e.g., the involved node entries will no longer reference the real arcs, but instead will reference the replacement arcs.
When a divergence is encountered, the replacement arcs mirror a dendritic flow that connects the inflowing segment(s) to the outflowing segment. The length accumulation phase "passes through" the divergent set and continues downstream. Because the process replaced the divergent arcs with replacement arcs in the dictionary model, the subsequent leveling phase proceeds upstream of the divergent set as well. However, the divergent arcs themselves do not have an assigned level.
In some cases, there are multiple outflowing segments; this situation has not been addressed.
RFLOW The RFLOW table is a cross-reference between inflowing and outflowing drain route features. The fundamental concepts illustrated by FLIP and ArboLatSum data structures are extended to model routes. The data structures required are more complex, in the sense that they have more components and inter-relationships, but they are not otherwise conceptually more challenging. The basic process of reading through cover tables in one pass and constructing dictionaries keyed by unique feature ID numbers is the same. One-to-many relationships are still maintained in lists which are elements of the dictionary entry.
Currently, the NHDinArc specification is not
explicit about some of the assumptions required for the approach to work.
Whereas there is a general statement that features representing flowing water
will be directed downstream, the potentially discontinuous implementation of
routes (e.g., the “branched reach” connecting artificial flow paths within a
waterbody with multiple inflows) will require special processing.
The RFLOW algorithm is not discussed here because of the potential alterations that may be required as specification issues are finalized as well as the bulky and convoluted descriptions that result from data structures of this size. Instead, we present a brief account of the nature of those data structures as well as the applications that may be constructed upon them.
After the transport and reach routes and the waterbody and waterbody reach features have been constructed, the NHD cover is read. The route model is built by reading the section table then the route table for both the transport and reach features, then the arc table is read and finally the model is finalized by correlating relationships between the feature types. Optionally, region and polygon relationships can be stored as well.
Transport route# -> {{sortedSectionList}, startNodeList, endNode, {Com_id, Rch_com_id, Wb_com_id, Ftype, Fcode}}
Reach route# -> {{sortedSectionList}, startNode, endNode, {Com_id, Rch_code, Level, Gnis_id}}
Section# -> {route, arc, fMeas, tMeas, withArc, fromNode, toNode}
Arc# -> {fNode, tNode, 0, transportRoute, drainRoute, lPoly, rPoly}
Node# -> {downstreamNode, {{arc, True}}}
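As a rough illustration only, the route-model dictionaries above might be rendered in Python as nested dicts. All IDs and attribute values in this sample are hypothetical; the point is the cross-references between routes, sections, and arcs:

```python
# A one-route sample: transport route 101 is measured over two sections,
# which in turn reference the underlying arcs.
transport_routes = {
    101: {'sections': [201, 202],          # sortedSectionList
          'startNodes': [10], 'endNode': 40,
          'attrs': {'Com_id': 555, 'Rch_com_id': 556, 'Wb_com_id': 0,
                    'Ftype': 'StreamRiver', 'Fcode': 46006}},
}
sections = {
    201: {'route': 101, 'arc': 1, 'fMeas': 0.0, 'tMeas': 50.0,
          'withArc': True, 'fromNode': 10, 'toNode': 30},
    202: {'route': 101, 'arc': 3, 'fMeas': 50.0, 'tMeas': 100.0,
          'withArc': True, 'fromNode': 30, 'toNode': 40},
}
arcs = {
    1: {'fNode': 10, 'tNode': 30, 'flip': 0,
        'transportRoute': 101, 'drainRoute': None, 'lPoly': 0, 'rPoly': 0},
    3: {'fNode': 30, 'tNode': 40, 'flip': 0,
        'transportRoute': 101, 'drainRoute': None, 'lPoly': 0, 'rPoly': 0},
}
# Correlation step: each section points back at its route, and each arc
# records which transport route carries it.
first_section = transport_routes[101]['sections'][0]
```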
The tools described
here were written while the author was employed at the Office of Information
Technology Outreach Services at the University of Georgia.
Chuck Horne
Senior GIS Analyst
Jordan, Jones, and
Goulding, Inc.
2000 Clearview
Avenue
Atlanta, GA 30340
Phone: 678-575-4252
FAX: 706-549-0423
Email: chorne@jjg.com