Kristine L. Verdin1

A System for Topologically Coding Global Drainage Basins and Stream Networks2

Development of a global digital elevation model at 30 arcsecond resolution (approximately 1 kilometer) at the U.S. Geological Survey's EROS Data Center, labeled as GTOPO30, has made it possible to obtain derivative products for hydrologic studies. Tools in ArcInfo3 and GRID were used to automatically extract river basins and drainage networks, which were subsequently vectorized and tagged with attributes according to the numbering scheme developed by Otto Pfafstetter, a Brazilian engineer. The Pfafstetter system is based upon the topology of the drainage network and the size of the surface area drained. Its numbering scheme is self-replicating, making it possible to provide identification numbers to the level of the smallest subbasins extractable from a DEM. The system's appeal stems from its economy of digits, the topological information which the digits carry, and its global applicability. ARCVIEW Avenue scripts were written to provide an interface to the basins and drainage networks via the Pfafstetter codes. Queries which exploit the topological information in the Pfafstetter digits are programmed in the scripts. Examples of the utility of the system include: for a given location, the automatic identification of all upstream subbasins, all upstream river reaches, or all downstream reaches. The result is a global system of unique identification numbers for topographic drainage features and a convenient interface to manipulate them for customized applications.


The planning and implementation of human activities and their interactions with the natural environment require consideration, at one time or another, of water and the hydrologic cycle. In this regard, the river basin takes its place alongside ecoregions, property boundaries, census tracts, political divisions, and protected areas as a fundamental landscape unit upon which human research, analysis, design, and administration are based. As the practical considerations of sustainable development coincide more and more with those of global change research, there is a need for a simple and globally applicable system of reference which at once uniquely identifies and indicates the spatial nature of a hydrographic basin. To address this need, this system for delineation and codification of basins based upon topography and the topology of the resulting drainage network is presented.

Topological Characteristics of the Proposed System

The system presented here for the delineation and codification of the Earth's river basins is founded upon concepts first articulated by Otto Pfafstetter, an engineer with the Departamento Nacional de Obras de Saneamento (DNOS), a civil works agency (now defunct) of the federal government of Brazil (Pfafstetter, 1989). It is a natural system, based upon topographic control of areas drained on the Earth's surface, and the topology of the resulting hydrographic network. It therefore lends itself to implementation through manipulation of digital elevation models (DEMs), and to subsequent exploitation using the tools in ArcInfo and ARCVIEW. The appeal of Pfafstetter's approach stems from its economy of digits, the topological information which the digits carry, and its global applicability. In order to explain the system, it is necessary to first cover basic definitions.

If one begins at the mouth of a river and works upstream, before very long a confluence will be encountered. At this point, it is necessary to distinguish between the main stem and the tributary. The main stem is always taken as the watercourse which drains the greater area; the tributary drains the lesser of the two areas. At times this is inconsistent with local custom or map notation, but the drainage area rule is nonetheless strictly applied.

The area drained by a tributary is called a basin. Continuing upstream to a second confluence, one once again applies the drainage area rule to distinguish between the main stem and the tributary. A second basin is associated with the newly encountered tributary. The area directly drained by the reach of the main stem lying between the two tributaries is called an interbasin.

Subdivision of the area drained by a major river into coded basins and interbasins first requires identification of the four largest tributaries, according to the criterion of area drained. These are assigned the numbers 2, 4, 6, and 8, in the order in which they are encountered as one goes upstream along the main stem. Next, the interbasins are numbered 1, 3, 5, 7, and 9, again working upstream from the mouth of the main stem. Interbasin 1 is the area drained by the main stem between the mouth of basin 2 and the mouth of the main stem. Interbasin 3 is the area drained by the main stem between the mouths of basins 2 and 4. Interbasin 5 is the area drained by the main stem between basins 4 and 6, and interbasin 7 lies between basins 6 and 8. Interbasin 9 always consists of the headwaters area of the main stem, and always drains more area than basin 8, by definition. If a closed basin is encountered, it is assigned the number 0 (zero). Figure 1 demonstrates the subdivision of the Mississippi River basin into its first 10 subbasins, including four basins (the Red, Arkansas, Ohio, and Upper Mississippi Rivers), 5 interbasins and one closed basin (Old Wive's Lake Closed Drainage). Notice that the Mississippi basin is one in which the headwaters defined using this technique are contrary to local custom. The area drained criterion designates the Missouri River, not the Upper Mississippi River as the headwaters. Each basin and interbasin is preceded, in the case of the Mississippi, by the digit eight (8). This digit designates the Level 1 Pfafstetter subdivision of the North American continent, to be described below.

A basin may be further subdivided by repeating the application of the same rules to the area within it. Thus, basin 8 of our example, the Upper Mississippi Basin, is subdivided into basins 82, 84, 86, and 88 (the Illinois, Des Moines, Iowa and Upper Mississippi Rivers), and interbasins 81, 83, 85, 87, and 89. No closed basins were found at this level. See Figure 2.

An interbasin is subdivided by identifying the 4 tributaries within it having the greatest drainage areas. They are numbered from downstream to upstream, regardless whether they enter the main stem on the right or left bank. Thus, interbasin 7 in our example contains basins 72, 74, 76, and 78 (the St. Francis, Big Muddy, Kaskakia, and Meramac Rivers). As in the case of the main stem, the areas drained by the intervening reaches are numbered as interbasins 71, 73, 75, 77, and 79. See Figure 3.

Depending on the density of the mapped stream network, any basin or interbasin can be further subdivided until four tributaries can no longer found.

Implementation of the Method for a Continent

The USGS has developed a consistent set of DEMs with 30-arcsecond (approximately 1-km) postings for the land masses of the Earth (Verdin and Jenson, 1996). They have been given the name GTOPO30 and are available to the public over the World Wide Web at Procedures for continental basin delineation and codification were developed for the case of these data sets. The North American continent was chosen as the first for implementation of the Pfafstetter basin numbering scheme. This was due, in part, to the abundance of digital data (streams and basins) available for verification from U.S., Mexican and Canadian organizations.

It is necessary to first employ ArcInfo drainage analysis software (Jenson and Domingue, 1988; Esri, 1992) to refine a continental DEM prior to basin analysis. This involves identifying and filling spurious sinks, and preserving representations of natural sinks in the landscape (Verdin and Jenson, 1996). After transformation to a cartographic projection which supports area calculations, such as the Lambert Azimuthal Equal Area coordinate system, and selecting a 1 km spacing for elevations, the ArcInfo drainage routines are applied to the properly filled DEM to solve for flow direction at each posting location. Next, a flow accumulation value is computed for each position, and a 1000 km2 threshold applied to obtain a drainage network in raster format. This drainage network is vectorized using the STREAMLINK function, and each arc in the network is attributed with the maximum flow accumulation value from the set of raster elements used to derive it. In this way, each arc in the resulting vector drainage network is made ready for application of the Pfafstetter rule for distinguishing between the main stem and its tributaries by criterion of area drained.

A set of arcs defining the circumference of the continent is also needed. These arcs are derived from the data/NODATA interface, ordinarily associated with the ocean shore of a land mass. These arcs are given identification numbers which increase in a clockwise direction around the perimeter of the continent.

As an intermediate step, the arcs composing the vector drainage network are sorted into three classes: those which drain directly to the sea, those which drain directly into closed basins, and those which are tributary to arcs in these first two cases. The Pfafstetter numbering rules are applied to all the arcs draining directly to the sea. First, the four with the greatest area drained are identified. They are assigned Pfafstetter codes 2, 4, 6, and 8, following a clockwise order around the continent.

Returning to the raster DEM and its associated flow direction grid, the surface areas drained by basins 2, 4, 6, and 8 are identified, and the corresponding vector polygons delineated, each carrying the appropriate Pfafstetter code attribute. Basin 0 is identified as the largest closed basin, obvious at the continental scale. The remaining continental area is assigned to polygons corresponding to coastal interbasins 1, 3, 5, 7, and 9, such that interbasin 3 lies between basins 2 and 4, interbasin 5 lies between basins 4 and 6, and interbasin 7 lies between basins 6 and 8. The area between basins 2 and 8 is divided between interbasins 1 and 9. Choosing a divide which connects basin 0 with the coast can be a convenient way to do this.

The result of the preceding steps is a vector coverage of the continental land mass composed of ten polygons numbered 0 through 9. This coverage is intersected with the vector coverage of the continental drainage network, in order to pass to each arc of the drainage network a new attribute consistent with the Pfafstetter code (0-9) for the basin or interbasin containing it. This completes a Level 1 subdivision of a continent. Shown in Figure 4 is the Level 1 subdivision of the North American continent. The four largest river systems of the North American continent which are delineated at this level are the Mackenzie, Nelson, St. Lawrence and Mississippi Rivers.

Further subdivision is carried out to obtain a Level 2 set of Pfafstetter basins for the continent. Within each Level 1 basin, or interbasin, mainstem and tributary arcs are identified. The tributary arcs are then sorted by size of area drained, and the four largest numbered 2, 4, 6, and 8, going from downstream to upstream along the main stem. The corresponding polygons are derived using the flow direction grid, and all arcs are given the second Pfafstetter digit as an attribute.

Subdivision of coastal interbasins requires examination of the four largest streams flowing to the coast. These are numbered in a clockwise order as 2, 4, 6, and 8 and the appropriate basins defined. Figure 5 shows the Level 2 subdivision of the Pacific Northwest region. The Columbia, Fraser, Kuskokuim and Yukon River basins are extracted at this level as well as the Oregon Closed Basins.

It can be seen that this process can be repeated over and over to obtain successively finer subdivisions of basins and interbasins. The process is ultimately limited by the spatial resolution of the DEM employed. This becomes evident when it is no longer possible to identify four tributaries within a basin or interbasin. Switching to a higher resolution DEM for an area of special interest would, however, permit the process to continue.

All the above described procedures were coded in Arc Macro Language (AML) for automatic implementation within ArcInfo. In the case of the 30 arc-second North American DEM, subbasins down to Level 5 Pfafstetter units were extracted. This produced 5020 Level 5 Pfafstetter units for North America with an average surface area of 3,640 sq. km. The resolution of the DEM did not merit finer extraction of basins. Shown in Table 1 are the average areas of each level of subdivision for the North American continent.

Table 1. Surface Area of Pfafstetter Units
Level 1 Area (sq. km.) Level 2 Average Area (km2) Level 3 Average Area (km2) Level 4 Average Area (km2) Level 5 Average Area (km2)
0 368,758 46,095 10,947 2,669 N/A
1 3,232,005 323,200 37,149 7,652 3,045
2 1,749,654 194,406 21,871 4,268 1,827
3 3,032,711 366,968 41,874 9,761 7,276
4 1,109,407 110,941 15,175 4,334 2,354
5 2,383,428 264,825 29,793 6,719 3,083
6 1,055,209 117,245 13,190 2,890 755
7 1,536,468 170,719 18,969 6,180 4,434
8 3,239,411 323,941 50,503 9,506 3,178
9 4,385,016 438,502 47,663 8,418 3,925
Average 2,209,207 237,549 29,873 6,980 3,640

Characteristics of the Numbering Scheme

Identification numbers which end with an even digit represent basins, and numbers that end with an odd value represent interbasins. Upstream-downstream dependency between locations can be inferred by examining and comparing the topological information carried by the identification numbers. This is perhaps best explained by means of a few examples.

Consider a cannery situated on a river in interbasin 8873. Will a new dam at a location in 8885 affect flows to the cannery? Yes, because the dam site is upstream of the cannery. The identification numbers reveal this, without need to reference a map or engineering straight line diagram. The match of the leading digits, 88, tells us that the two locations are in the same basin. Beyond the matching digits, 85 is greater than 73, and therefore the dam lies upstream of the cannery and will affect flows to it.

Will the dam affect the irrigation diversion of a farmer at 8834? No, it will not, even though we see that 85 is greater than 34. This is because the trailing 4 indicates a tributary off the main stem and above any flows influenced by the dam. Thus, river reaches affected by the dam will have a match of leading digits, 88, and trailing odd digits less than 85. These are: 8883, 8881, 8879, 8877, 8875, 8873, 8871, 8859, 8857, 8855, 8853, 8851, 8839, 8837, 8835, 8833, 8831, 8819, 8817, 8815, 8813, and 8811. The first two are downstream interbasins in basin 888. The rest are downstream interbasins on the mainstem. Basins 886, 884, and 882 (and all of their interbasins and basins) belong to tributaries which enter the main stem downstream of the dam, and are therefore unaffected. Areas which are tributary to the dam in interbasin 8885 are those which are upstream: 8886, 8887, 8888, and 8889. Figure 6 shows these interdependencies via the ARCVIEW interface developed for the data set.

It can be seen from these examples that simple rules to check digits with tests of "odd" or "even", and "less than" or "greater than", can quickly isolate areas of interest for a particular investigation. Such tests are easily implemented ArcInfo and ARCVIEW, particularly with the help of AVENUE scripts and AML programming.

ARCVIEW Implementation

The North American hydrologic data set has been put into an ARCVIEW interface for ease of viewing and manipulating the data. The logic behind the Pfafstetter numbering scheme for aggregation of upstream basins and identification of downstream reaches has been coded in AVENUE scripts. Currently, the ARCVIEW interface allows users to view the raster (DEM, Flow direction, Flow Accumulation, Slope, Aspect and Wetness Index) and vector data sets (streamlines tagged with contributing area and basins), query the basin and stream data sets and identify upstream basins and downstream channels. Work is progressing on attributing all the basin and stream features with common names. The data sets will be available in late summer 1997.


A new global system for delineation and codification of the Earth's river basins was presented. It is a natural system, defined by topographic control of drainage and the topology of the resulting network of rivers. It can be readily implemented by application of GIS techniques to global DEMS , which is in fact an endeavor of the USGS using the GTOPO30 data sets. The identification numbers which are generated carry valuable topological information which can be easily exploited by standard data base management operations, facilitating analyses of sustainability of natural systems and human activities which intervene or are dependent upon the availability of surface water resources. The topological information of the identification numbers makes full and efficient use of all digits, and for this reason the system compares favorably with existing national and continental numbering schemes. A series of single five-digit codes is sufficient to uniquely identify subbasins smaller than 4,000 km2 in mean surface area for all the land masses of the globe.


Esri, 1992, "Cell based modeling with GRID", Esri, Inc., Redlands, California.

Jenson, S., and J. Domingue, 1988, "Extracting topographic structure from digital elevation data for geographic information system analysis", Photogrammetric Engineering and Remote Sensing, v.54, pp.1593-1600.

Pfafstetter, O., 1989, "Classification of hydrographic basins: coding methodology", unpublished manuscript, DNOS, August 18, 1989, Rio de Janeiro; translated by J.P. Verdin, U.S. Bureau of Reclamation, Brasilia, Brazil, September 5, 1991.

Verdin, K., and S. Jenson, 1996, "Development of continental scale DEMs and extraction of hydrographic features", proceedings of the Third Conference on GIS and Environmental Modeling, Santa Fe, New Mexico, NCGIA.


1Hughes STX Corporation. Work performed under U.S. Geological Survey contract 1434-CR-97-CN-40274
2This paper is preliminary and has not been edited or reviewed for conformity with U.S. Geological Surveye standards or nomenclature.
3Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Kristine L. Verdin, Hughes STX Corporation
Earth Resources Observation Systems (EROS) Data Center
Sioux Falls, South Dakota 57198 USA
phone: (605) 594-6002
fax: (605) 594-6568