Isabelle Halleux

What's the Best Data Model for Managing the Quality of Surface Water?

In accordance with European directives, the Belgian Government had to implement a program for the control of the quality of surface water systems. The management of water systems, however, goes beyond the simple framework of analysing water samples and transmitting results to the European Agency. Extra features have been added to the minimum program in order to answer specific questions such as the estimation of water pollution, the follow-up of local characteristics, the implementation of federal effluent taxation, etc.

The arc-node, Dynamic Segmentation and GRID data models are reviewed and critically examined through case studies, in order to take advantage of ArcInfo and ArcView functionalities for the analysis of data in space and time on the one hand, and for informing end-users on the other.

INTRODUCTION

ISSeP (Public Service Scientific Institute) is commissioned by the Walloon Government of Belgium to manage several networks that control the quality of the environment (air and water). The Institute has to maintain and calibrate about 350 measurement points covering the country, analyse the samples, validate the results and send reports to the Ministry of the Walloon Region (Fig. 1).

Fig. 1. The environmental networks managed by ISSeP (1996)

ArcInfo has been used since 1989 by the Environment Section of ISSeP for managing the data, mapping the results and developing specific analysis procedures. Implementing environmental network databases requires defining the best data model for efficient management of the information, i.e. the best consistent framework in which to create, store and analyse the data. Parameters such as platform (PC, workstation, Macintosh), software version (PC ARC/INFO, ArcInfo, ArcView) and compatibility must be taken into account.

THE MANAGEMENT REQUIREMENTS

The objectives of control networking, in terms of quality of the environment, are as follows:

follow-up and control of the pollution,

follow-up and control of industrial effluents,

analysis in comparison with directives,

report and information, proposal of solutions.

Open discussions between computer scientists, chemists and end-users must lead to a detailed study of the data and of the information to be extracted. A well-designed database structure, corresponding to Point Attribute Tables, determines the efficiency of the system in terms of both computer processing and analysis of results, and is thus a critical aspect of the development of numerical management functions.

In practice, the end-user needs software functionalities to achieve these goals, i.e.

to validate the results,
to point out unexpected values,
to analyse distributions,
to compare data to norms,
to analyse variability in time and/or in space,
to present the results.

The first four software requirements can easily be fulfilled using statistical and elementary mathematical procedures, taken from external software or developed specifically with the ARC Macro Language or the AVENUE programming language. Validation procedures have been written to reduce input errors: on-line warnings inform users of the minimum and maximum values of previous data, the moving average calculated on an annual basis, the detection limits of the analytical methods, etc. (Maquinay J.C., Michaux A., Halleux I., 1996). Analysis functionalities have been developed for characterising the behaviour of the data, implementing statistical tests, flagging uncertainties and producing automatic reports. Tools have been implemented to provide scientifically reliable results when applying norms, for example the estimation of the LC50 in ecotoxicology according to ISO-8692 (Halleux I., van der Wielen C., 1996).
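The validation checks listed above can be sketched as follows. This is a minimal illustration in Python, not the ARC Macro Language or AVENUE procedures actually written at ISSeP; the function name, the 12-value window (assumed to stand for one year of monthly results) and the 50 % tolerance are all hypothetical choices.

```python
from statistics import mean


def validation_warnings(value, history, detection_limit, window=12, tolerance=0.5):
    """Return warning messages for a newly entered result.

    history         -- earlier results for the same parameter and point, oldest first
    detection_limit -- detection limit of the analytical method
    window          -- number of latest results averaged (assumed: 12 monthly values)
    tolerance       -- allowed relative deviation from that average (assumed: 50 %)
    """
    warnings = []
    if value < detection_limit:
        warnings.append("below detection limit")
    if history:
        if value < min(history):
            warnings.append("below previous minimum")
        if value > max(history):
            warnings.append("above previous maximum")
        # Moving average over the most recent results only.
        moving_average = mean(history[-window:])
        if abs(value - moving_average) > tolerance * abs(moving_average):
            warnings.append("far from moving average")
    return warnings


# Hypothetical previous results for one parameter at one sampling point.
previous = [2.0, 2.5, 3.0, 2.2]
print(validation_warnings(9.0, previous, detection_limit=0.5))
print(validation_warnings(2.4, previous, detection_limit=0.5))
```

The function only reports warnings and never rejects a value, which matches the idea of on-line warnings that leave the decision to the user.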

The last two software requirements, analysing variability in time and/or in space and presenting the results, need georeferencing. Binary diagrams (pollutant vs. time, pollutant vs. river profile) or more sophisticated graphs (e.g. variograms) aid understanding. The geographical data model used here is therefore of great importance.

THE GEOGRAPHICAL DATA MODELS

Data models for water networks are more complex than those for air networks, because point feature data (sampling points) must be analysed along a linear support (the river system).

The classical linear arc-node topological data model seems to be well suited to the water system; the attribute information is stored in the Arc Attribute Table related to the lines (river name, basin, ...). The water system constitutes the geographical support of the sampling points, which are stored in a coverage that may or may not be associated with the river line coverage.

For analysis at a local level, the arc-node model gives immediate access to the sections between confluences. For a global study of the main rivers, the water system must be simplified (generalised) for an optimum analysis of sections. To consider sections between sampling locations, the points must be stored as nodes intersecting the linear water system; the Node Attribute Table describes their characteristics. In practice, fixed sampling points are kept year after year, but new points can be temporarily added to track occasional or accidental problems. In the same way, the pollutant sources considered for the interpretation are theoretically well known, but new polluting industries must sometimes be added to the database system. The end-user thus has to update the coverage when necessary (or the shapefile when working with ArcView). The point feature information to be considered is thus a living geographical system, and the arc-node data model is no longer convenient.

Moreover, a useful way of analysing variability in time and in space is to draw profiles. The arc-node data model poses no problem for drawing binary diagrams of pollutants vs. time at a given sampling point. Drawing profiles along rivers, however, needs a measure system along the rivers, with a one-to-many relation when parts of profiles have a specific significance; the label-arc-node data model cannot offer this.
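To make the notion of a measure system concrete, the sketch below computes the measure of a sampling point as the distance along a river polyline from its start to the position nearest to the point. It is written in Python purely for illustration and does not reproduce any ArcInfo function; the function name and the coordinates are hypothetical.

```python
import math


def measure_along(route, point):
    """Distance along the polyline `route` to the position nearest to `point`."""
    px, py = point
    best_distance = math.inf
    best_measure = 0.0
    travelled = 0.0  # length of the segments already walked
    for (x1, y1), (x2, y2) in zip(route, route[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len = math.hypot(dx, dy)
        if seg_len == 0:
            continue  # skip duplicated vertices
        # Fraction of the segment at which the projection falls, clamped to [0, 1].
        t = ((px - x1) * dx + (py - y1) * dy) / seg_len ** 2
        t = max(0.0, min(1.0, t))
        distance = math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))
        if distance < best_distance:
            best_distance = distance
            best_measure = travelled + t * seg_len
        travelled += seg_len
    return best_measure


# Hypothetical river line and sampling points (same map units for both).
river = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
samples = {"S1": (4.0, 1.0), "S2": (11.0, 5.0)}
profile = sorted((measure_along(river, xy), name) for name, xy in samples.items())
print(profile)
```

Sorting the sampling points by measure gives the horizontal axis of a pollutant vs. river profile diagram.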

An alternative approach is to use the Dynamic Segmentation model. It associates multiple sets of attributes with any portion of a linear feature, and uses linear measure values to define point locations along linear features. Routes can be defined according to the profiles required for the analysis of pollution in space. Sampling points as well as source points are stored in INFO or RDBMS files (ArcInfo) or ASCII files (ArcView), and can be loaded as event points.
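The two ideas behind this model can be sketched in a few lines of Python. The sketch is illustrative only and is not how ArcInfo implements route-systems: the route is a plain list of vertices with the measure taken as the distance from its start, and the event tables, station names and attribute values are hypothetical.

```python
import math


def locate_on_route(route, measure):
    """Return the (x, y) position found `measure` units along the polyline `route`."""
    measure = max(0.0, measure)
    travelled = 0.0
    for (x1, y1), (x2, y2) in zip(route, route[1:]):
        seg_len = math.hypot(x2 - x1, y2 - y1)
        if seg_len > 0 and measure <= travelled + seg_len:
            t = (measure - travelled) / seg_len
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        travelled += seg_len
    return route[-1]  # a measure beyond the end is clamped to the last vertex


def attributes_at(linear_events, measure):
    """Return every attribute set whose [from, to] interval contains `measure`."""
    return [e["attributes"] for e in linear_events if e["from"] <= measure <= e["to"]]


route = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]

# Point events: sampling points located only by their measure.
point_events = [{"station": "S1", "measure": 4.0}, {"station": "S2", "measure": 15.0}]
for event in point_events:
    print(event["station"], locate_on_route(route, event["measure"]))

# Linear events: overlapping portions of the same route, each with its own attributes.
linear_events = [
    {"from": 0.0, "to": 12.0, "attributes": {"profile": "upstream"}},
    {"from": 8.0, "to": 20.0, "attributes": {"profile": "industrial zone"}},
]
print(attributes_at(linear_events, 10.0))
```

Because events carry only a measure, adding a temporary sampling point or a new pollutant source means adding a row to a table, without editing the river geometry.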

Functions exist for drawing elementary profiles. Unfortunately, it is not possible to choose the route scale when applying the STRIPMAP function, nor to draw lines between eventmarkers along profiles (Fig. 2).

Fig. 2. Dynamic Segmentation: Example of profiles drawn using ARCPLOT

When using ArcView on PC or Macintosh, the coverages must be used as INFO coverages, because the IMPORT function for exported ArcInfo files on these platforms does not keep the route-systems. Graphs in time and in space can be drawn using the CHARTS functionalities; this is an advantageous alternative to display with ARCPLOT, but it needs AVENUE programming to draw the required profiles directly or to secure the consistency between Route Attribute Tables, events and event tables.

Updating route-systems must necessarily be performed using ArcInfo. Updating events can be done on INFO files (ArcInfo or ArcView), but the measure system location update can only be performed using ArcInfo.

Interpretation also requires data for estimating flow accumulation and pollution dispersion along the water system. Land cover, soil characteristics, land use, surface curvature, slope and the effect of non-point source pollutants (sediment, nutrient and pesticide runoff) must therefore be added to the river and sampling point systems. Storing points, lines, polygons and surfaces uniformly requires the GRID data model. This raster data model does not offer direct functionalities for the first management objectives described here: a continuous definition of the data is not evident at the country scale, the cell sizes needed to describe sampling points and rivers are difficult to match, and each sequence of operations has to be programmed.

Besides the hydrological functionalities of GRID, the surface functions are of particular interest. The three available surface interpolators, applied to the point data sets, generate continuous information along profiles. At the network scale, the sampling points are sparsely distributed, so inverse distance weighting and polynomial regression generally do not give the best results because they smooth the data; they do not seem to be the ideal approach for studying pollution problems. Variography and kriging (KRIGING with the GRAPH or BOTH options, plotted with the SEMIVARIOGRAM command) are, on the other hand, useful for analysing the spatial variation of data (structural and random components, preservation of values at the data points) and for drawing conclusions about pollution spreading and the optimisation of the sampling scheme (Halleux I., 1996). Only ArcInfo can be used in this way, so this solution is less compatible with the Esri products currently available on other platforms.
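As an illustration of what variography measures, the following Python sketch computes an empirical semivariogram, i.e. half the mean squared difference between pairs of values grouped by separation distance. It is not the GRID KRIGING or SEMIVARIOGRAM implementation; it assumes straight-line distances between points, and the function name, lag width and data are hypothetical.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, values, lag_width):
    """Return {bin: (lag centre, semivariance, number of pairs)}.

    Bin k collects the pairs whose distance d satisfies
    k * lag_width <= d < (k + 1) * lag_width.
    """
    squared_diffs = defaultdict(float)
    pair_counts = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            distance = math.dist(points[i], points[j])
            k = int(distance // lag_width)
            squared_diffs[k] += (values[i] - values[j]) ** 2
            pair_counts[k] += 1
    return {
        k: ((k + 0.5) * lag_width, squared_diffs[k] / (2 * pair_counts[k]), pair_counts[k])
        for k in sorted(pair_counts)
    }


# Hypothetical sampling points and concentrations.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
values = [1.0, 2.0, 4.0]
for k, (lag, gamma, n_pairs) in empirical_semivariogram(points, values, 1.0).items():
    print(k, lag, gamma, n_pairs)
```

A semivariance that grows with the lag indicates a structural component, while a non-zero value at the shortest lags points to a random component. With sparsely distributed sampling points, the number of pairs per bin should be checked before interpreting the curve.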

CONCLUSIONS

Dynamic Segmentation is certainly the best ArcInfo data model that can be used for managing the quality of surface water. Route-systems and events offer a reliable and efficient support for the control of the environment.

ArcInfo must be used for building route-systems according to the profiles needed for the spatial analysis of data, and for creating events and event tables based on the definition of the management goals. Tools in ARCPLOT are useful but not sufficiently developed for a powerful presentation of environmental reality (lack of utilities for scaling profiles and drawing lines between eventmarkers). Some specific functions may be found in GRID for studying discrete distributions as continuous surfaces.

ArcView has many advantages for day-to-day analysis: it is easy to use and repetitive tasks may be programmed. This software, however, does not allow updates of route-systems and measures.

Further developments of Esri products will probably extend the usability of Dynamic Segmentation and hence its applicability to environmental control networks.

ACKNOWLEDGMENTS

I would like to thank all my colleagues for the numerous useful - and tedious - discussions that led to the implementation of the Water Database System of ISSeP. Thanks especially to J.C. Maquinay, A. Michaux, C. van der Wielen and S. Eloy for their patience when testing the data models and the dedicated functionalities.

REFERENCES

Esri Inc. : ArcDoc on CDROM.

Halleux, I. : Evaluation quantitative des stocks de résidus de lavage, EEC-ECSC Research Report, Luxemburg, 32 p., to be published, 1996.

Maquinay, J.C., Michaux, A., and Halleux, I. : Note descriptive relative à la Base de Données "RESEAUX EAU", ISSeP, Internal Report, 6/96, 19 p., 1996.

Halleux, I., van der Wielen, C. : Stats & Ecotox : User's guide, ISSeP, 7/96, 11 p., 1996.

Isabelle Halleux, Dr. ir.
Environment Section
ISSeP
Rue du Chera, 200
B-4000 Liege
Belgium

Telephone: (32) 41-527150
Fax: (32) 41-524665