Marko Häkkinen

Paving the Information Superhighway - the National Road Database Case in Finland

As a part of the Finnish Geographic Information Market or National Spatial Data Infrastructure activities, some inexpensive end user tools or client applications for querying public online spatial data have been developed. These applications act as the client end for the National Spatial Data Infrastructure in Finland.

To serve the client applications, query interpreters must be built to interface the online back end database servers to the concept. The queries defined by the client applications are translated into Edifact format. These Edifact queries are sent across a data network and further translated into expressions understood by the data management system of the back end server in question.

One of these public online databases is the National Road Database, managed under ArcInfo at the National Land Survey of Finland. The incoming Edifact query is translated into a group of SELECT, ASELECT and RESELECT commands. The query is run in ArcInfo, resulting in a coverage with features that meet the selection criteria. This coverage is translated into Edifact format and sent back to the end user who made the original query.

All this processing happens automatically and within a few minutes. The back end query processing, i.e. automatic query interpretation, data retrieval and message forwarding, is done by an AML application developed at the National Land Survey of Finland.


INTRODUCTION

National Online Spatial Data Service Framework

Since the mid-1980s there has been activity to set up a national concept that enables different Finnish organizations to share spatial data easily. Much of the work has been carried out in the background, with evolving technical solutions and goals.

Now, in 1995, the much touted National Spatial Data Infrastructure is finally turning into reality in Finland. As a part of this societal GIS framework, an electronic spatial data clearinghouse or market has been set up. This spatial data market is a client-server based computing system. On one side of the counter there are client applications, creating query messages. On the other side there are server applications, responding to query messages and thus creating data messages to be sent back to the clients. Both the client and server ends of this spatial data interchange are supported by EDI (Electronic Data Interchange) software, which is responsible for format translations etc. This data sharing approach is very similar to that of the Open Geodata Interoperability Specification (OGIS) [Buehler 1994].

Three different client applications are available to users who want to access various spatial databases. Two of these client applications are based on desktop GIS technology [Mikkonen & Rainio 1995].

National Road Database

Digital road data has been available for some time in Finland, but mainly for small scale road mapping purposes or only over smaller areas. Due to increasing demand from various organizations, National Land Survey of Finland (NLS) has built a National Road Database (NRDB). All roads that can be driven by a car are stored in the NRDB. The attributes stored in NRDB are:

NRDB data volume amounts to some 1 GB. A detailed presentation of NRDB can be found in [Tikkanen 1994].

EDI BASED SPATIAL DATA SERVICE CONCEPT OF FINLAND

In Finland, the national strategy for spatial data services has been developed gradually since 1983. At an early stage, it became apparent that the variety of spatial data sets is too wide to be harmonized at the national level; the Edifact standard (ISO 9735) was chosen as the basis of EDI. One of the three key tasks of NLS is to advance spatial data sharing. NLS's Geographic Information Centre (GIC) has worked as a harmonizing actor in the spatial data market.

The open spatial data market in Finland is implemented as a client-server model. The client-server communication is carried out according to the rules of EDI query and data messages that can handle also geometric aspects.

The basic components of the EDI based spatial data market are:

Client - the End User Application

The client or query application - sometimes even called the data extractor - provides its users a standard query interface to all datasets within the market. Users can query spatial data in a standard way regardless of the system that delivers the data. EDI based data service allows for different types of user interfaces: electronic forms, spreadsheets and maps.

A map interface can present data from different sources. This way, users can easily query the datasets and areas of interest they really want and always get up-to-date data. Obvious advantages of GIS based client applications using the EDI concept are:

As an example of the last benefit, Sampo - the ArcView 2 based client application [Mikkonen & Rainio 1995] - receives the data messages in either shapefile, coverage or TIFF format.

Client applications can be built in short time with existing development tools, such as Avenue and Visual Basic. First commercial client applications are just about to enter the market.

Server - Data Delivery System

Spatial data sharing - or the joint use of spatial data as it is often called in Europe - aims at improving the availability of spatial data. It is based on decentralized data collection and maintenance. The concept is open, i.e. all spatial data of general interest should be available and transferable in a standard way. Data suppliers are responsible for defining and describing the datasets that they bring to the market.

A database becomes a part of the EDI based spatial data service concept when a server interface is implemented. In a data delivery system, user rights and invoicing are automatically controlled, since a customer database is connected to the service. The query message sent by the client is translated by a query interpreter and the data is then retrieved from the database. The retrieved data is translated to Edifact and sent back to the user.

This concept brings the following major benefits to data suppliers:

Making It All Work

A three-layer EDI model forms the basis for the concept. International standards are applied to the transportation and presentation or encoding. Generic or content-specific message types are described in a standard way (national standards). Ordering is based on these descriptions. Transportation is usually based on Internet Protocol addressing.

Service Center Concept

Good support and consulting are important, since EDI tools and standards may be found too complicated. A service center is a good approach when implementing and marketing the service. In 1994, data providers asked the NLS GIC to set up a spatial data service center (SDSC) for the spatial data market. Some other SDSCs are also appearing.

An SDSC can be called a spatial data clearinghouse. It helps the data providers in implementing their server processes, but it also acts as a reseller of spatial data: data suppliers make agreements with the SDSC, not directly with clients. End users, i.e. clients, also make agreements with the SDSC. This way, both clients and servers need to make only one agreement instead of tens or hundreds of agreements. The end user gets only one bill from the SDSC and not separate bills from all data providers. The implementation may also include other parts of the EDI system to be run by an SDSC.

NRDB EDI SERVER IMPLEMENTATION

According to the EDI based spatial data service concept, the NRDB server receives a query message that was originally created by the end user client application. The query message is translated by the NRDB server, the query is performed and the resulting data message is created. The overall call-tree structure of the algorithm implemented can be seen in Figure 1.

Translating the Query Message

An example of a full Edifact query message can be seen in Appendix 1. The first part of the message tells what kind of attribute data the client wants. Usually she wants all the available attributes, and this presentation reflects this approach. Ultimately, this means that the NRDB EDI server is only interested in the Edifact message part beginning with the first OPE statement. So basically all Edifact records before the first OPE can be skipped. But this is true only for client queries asking for full attribute data message as an answer.

In our example, with the leading and trailing Edifact frame lines skipped, we are dealing with the following query.

OPE+OR+2'
OPE+AND+2'
AJK+Tien luokka+LUOK:TI'
AJA+Autotie IIIa+12131'
AJA+Autotie IIIb+12132'
AJK+TIE+TNRO+TNUM'
AJA+Tienumero+385'
AJK+Päällysteen laatu+PAAL'
AJA+Kestopäällyste+2'
UNS+D'
OPE+LKO+3'
ALU+EA'
VII+M'
PIS+6987411.522:1566311.233'
PIS+6987513.522:1566313.233'
PIS+6987411.522:1566514.233'
PIS+6987513.522:1566511.233'
PIS+6987411.522:1566311.233'
TRV+TIE'
TRV+MTP'

In this query, both spatial and attributive conditions are expressed in records beginning with an OPE field. Operators are searched simply by reading the Edifact file. OPE has two arguments which indirectly tell whether the operator is spatial or attributive. The attribute operator appears first in the Edifact file. There is only one spatial operator but the number of attributive operators is not limited.
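The scan for operators described above can be illustrated with a short sketch. It is written in Python rather than the AML used by the NRDB server, and the function names are hypothetical. It assumes that every record ends with an apostrophe, that fields are separated by plus signs, and that the operator names LKO and SKO identify the spatial condition; other details of the Edifact syntax, such as release characters, are ignored.

```python
SPATIAL_OPERATORS = {"LKO", "SKO"}  # spatial operators named in the paper

QUERY = (
    "OPE+OR+2' OPE+AND+2' AJK+Tien luokka+LUOK:TI' AJA+Autotie IIIa+12131' "
    "AJA+Autotie IIIb+12132' AJK+TIE+TNRO+TNUM' AJA+Tienumero+385' "
    "AJK+Päällysteen laatu+PAAL' AJA+Kestopäällyste+2' UNS+D' OPE+LKO+3' "
    "ALU+EA' VII+M' PIS+6987411.522:1566311.233' PIS+6987513.522:1566313.233' "
    "PIS+6987411.522:1566514.233' PIS+6987513.522:1566511.233' "
    "PIS+6987411.522:1566311.233' TRV+TIE' TRV+MTP'"
)


def parse_segments(message):
    """Split a message into records, each a list of '+'-separated fields."""
    return [raw.strip().split("+") for raw in message.split("'") if raw.strip()]


def split_conditions(segments):
    """Return (attributive, spatial) record lists, split at the spatial OPE."""
    first_ope = next(
        (i for i, s in enumerate(segments) if s[0] == "OPE"), len(segments)
    )
    spatial_at = next(
        (i for i, s in enumerate(segments)
         if s[0] == "OPE" and s[1] in SPATIAL_OPERATORS),
        len(segments),
    )
    # Records before the first OPE are skipped; the UNS separator is dropped.
    attributive = [s for s in segments[first_ope:spatial_at] if s[0] != "UNS"]
    spatial = segments[spatial_at:]
    return attributive, spatial


attributive, spatial = split_conditions(parse_segments(QUERY))
```

Here `attributive` holds the nine records from the first OPE up to the UNS separator, and `spatial` holds the ten records starting at OPE+LKO+3.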

Translating the Attributive Condition

The attributive condition of the query message looks as follows:

OPE+OR+2'
OPE+AND+2'
AJK+Tien luokka+LUOK:TI'
AJA+Autotie IIIa+12131'
AJA+Autotie IIIb+12132'
AJK+TIE+TNRO+TNUM'
AJA+Tienumero+385'
AJK+Päällysteen laatu+PAAL'
AJA+Kestopäällyste+2'

Translated to plain English, this query means: select those parts of road number 385 that have a class code of 12131 or 12132, or all those roads that are paved. The Edifact condition is translated in module nrdb_attributes. The record following OPE tells whether there is another operator or not. If there is, recursion is needed to translate the complex condition. In AML, recursion can be achieved by using the &do &while structure in the following way.

/* main.aml
&setvar .more_operators_to_come = .TRUE.
&run open_files
&run find_first_operator /* returns .oper
&run find_next_operator %.oper%
&run close_files
&return

/* find_operator.aml
&run process_previous_operator %.oper% %.out_file%
&do &while %.more_operators_to_come%
  &setvar .oper [read %.in_file% readstatus]
  &run check_operator
  &if (%readstatus% ne 0) or %.out_of_operators% &then
    &setvar .more_operators_to_come = .FALSE.
  &run find_next_operator %.oper%
&end
&return

If the record begins with AJK (instead of another OPE), the record and trailing AJA records represent one attributive condition. In our example, the Edifact condition becomes

(((LUOKKA = 12131) OR (LUOKKA = 12132)) AND (TIENUM = 385)) OR (PAALLY = 2)

in familiar AML. This condition is written to a temporary file that can be used later. Let us call this file attr_con.aml in this paper. Note that AJA records following an AJK record in the Edifact query message have an implicit OR operator between them.
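As an illustration of the translation described above, the following sketch turns the example condition into the expression shown. It is written in Python with plain recursion, not the &do &while construction of the actual AML module, and it is not the nrdb_attributes code. The table mapping Edifact attribute codes to item names is an assumption inferred from the example, and the last field of each AJK and AJA record is assumed to carry the code and the value.

```python
# Assumed mapping from the last field of an AJK record to an item name.
ITEM_NAMES = {"LUOK:TI": "LUOKKA", "TNUM": "TIENUM", "PAAL": "PAALLY"}

CONDITION = (
    "OPE+OR+2' OPE+AND+2' AJK+Tien luokka+LUOK:TI' AJA+Autotie IIIa+12131' "
    "AJA+Autotie IIIb+12132' AJK+TIE+TNRO+TNUM' AJA+Tienumero+385' "
    "AJK+Päällysteen laatu+PAAL' AJA+Kestopäällyste+2'"
)


def translate(segments, pos=0, top=True):
    """Translate the condition starting at segments[pos].

    Returns (expression, position of the first record not consumed).
    """
    tag = segments[pos][0]
    if tag == "OPE":
        _, operator, count = segments[pos]
        pos += 1
        parts = []
        for _ in range(int(count)):  # each operand is an OPE or an AJK group
            expr, pos = translate(segments, pos, top=False)
            parts.append(expr)
        joined = f" {operator} ".join(parts)
        return (joined if top else f"({joined})"), pos
    if tag == "AJK":
        item = ITEM_NAMES[segments[pos][-1]]
        pos += 1
        tests = []
        while pos < len(segments) and segments[pos][0] == "AJA":
            tests.append(f"({item} = {segments[pos][-1]})")
            pos += 1
        if not tests:
            raise ValueError("AJK record without AJA values")
        # AJA records under one AJK are joined by an implicit OR.
        expr = tests[0] if len(tests) == 1 else "(" + " OR ".join(tests) + ")"
        return expr, pos
    raise ValueError(f"unexpected record {tag}")


segments = [s.strip().split("+") for s in CONDITION.split("'") if s.strip()]
expression, end = translate(segments)
print(expression)
```

Each OPE record consumes as many operands as its second argument states, which is what lets nested operators be handled without looking ahead more than one record.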

Translating the Spatial Condition

The spatial part of the Edifact query message looks as follows:

OPE+LKO+3'
ALU+EA'
VII+M'
PIS+6987411.522:1566311.233'
PIS+6987513.522:1566313.233'
PIS+6987411.522:1566514.233'
PIS+6987513.522:1566511.233'
PIS+6987411.522:1566311.233'
TRV+TIE'
TRV+MTP'

Translating the spatial condition from Edifact to AML is much easier than translating the attributive condition. In the case of NRDB, the spatial condition, i.e. the search object, can be either a polyline or polygon. A national recommendation for Spatial Query Message (JHS 118) defines two spatial operators: LKO and SKO. These two operators can have the following three meanings in this context:

  1. LKO line-line: those lines from NRDB that intersect the search line are selected (1a);
  2. LKO polygon-line: those lines from NRDB that are at least partly within the search polygon are selected;
  3. SKO polygon-line: those lines from NRDB that are completely within the search polygon are selected.

These operators have the following translations in AML:

  1. SELECT POLYGON PASSTHRU ...
    ASELECT POLYGON PASSTHRU ...
    ASELECT POLYGON PASSTHRU ...
  2. SELECT POLYGON PASSTHRU ...
  3. SELECT POLYGON WITHIN ...

The translation is done in a module named nrdb_geometry. The only operator that requires some further discussion here is LKO line-line, where the intention is to search for those lines that intersect a search line of N vertices or points (N > 1). Since AML does not support a search using an N-point line, each segment of this search line is translated to a separate 2-point line. So an Edifact search line with N vertices is translated to N-1 lines of two points each. The first two-point segment is translated to a SELECT statement, and the following two-point lines become ASELECT statements, each with the arguments POLYGON PASSTHRU. Note that these search polygons are defined by just two points, so each actually defines a line segment. If the search line has only two points, no ASELECT statements are needed.

Which of the two interpretations of the LKO operator applies is determined by the Edifact record following the OPE record: if it is ALU (as in our example), alternative 2 applies; if it is VII, alternative 1 applies.

The spatial condition (given above) of our example becomes

SELECT POLYGON PASSTHRU 1566311.233 6987411.522 ~
1566313.233 6987513.522 ~
1566514.233 6987411.522 ~
1566511.233 6987513.522 ~
1566311.233 6987411.522

when translated to AML. This AML statement is written to a temporary file. Here, let us call this file geom_con.aml.
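The three translation rules can be sketched as follows. This is a Python illustration, not the nrdb_geometry module itself, and the function names are hypothetical. It assumes that each PIS record holds one point written as northing:easting, that the record directly after OPE (ALU or VII) decides between polygon and line search, and that the coordinate order is swapped in the output as in the example. The exact layout of the two-point line statements is also an assumption.

```python
def parse_point(field):
    """Turn 'northing:easting' into 'easting northing', kept as text."""
    northing, easting = field.split(":")
    return f"{easting} {northing}"


def translate_spatial(segments):
    """Return the list of selection statements for one spatial condition."""
    operator = segments[0][1]   # LKO or SKO
    shape = segments[1][0]      # ALU = search polygon, VII = search line
    points = [parse_point(s[1]) for s in segments if s[0] == "PIS"]
    if shape == "ALU":
        mode = {"LKO": "PASSTHRU", "SKO": "WITHIN"}[operator]
        # '~' continues the statement on the next line.
        return [f"SELECT POLYGON {mode} " + " ~\n  ".join(points)]
    if shape == "VII" and operator == "LKO":
        if len(points) < 2:
            raise ValueError("a search line needs at least two points")
        statements = []
        for i in range(len(points) - 1):  # N points give N-1 segments
            verb = "SELECT" if i == 0 else "ASELECT"
            statements.append(
                f"{verb} POLYGON PASSTHRU {points[i]} {points[i + 1]}"
            )
        return statements
    raise ValueError(f"unsupported combination {operator} / {shape}")


def parse(message):
    return [s.strip().split("+") for s in message.split("'") if s.strip()]


polygon_query = parse(
    "OPE+LKO+3' ALU+EA' VII+M' PIS+6987411.522:1566311.233' "
    "PIS+6987513.522:1566313.233' PIS+6987411.522:1566514.233' "
    "PIS+6987513.522:1566511.233' PIS+6987411.522:1566311.233' "
    "TRV+TIE' TRV+MTP'"
)
polygon_statements = translate_spatial(polygon_query)

# A made-up three-point search line, to show the SELECT/ASELECT split.
line_query = parse("OPE+LKO+3' VII+M' PIS+1:2' PIS+3:4' PIS+5:6'")
line_statements = translate_spatial(line_query)
```

Coordinates are handled as text so that the precision given in the query message is passed on unchanged.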

Querying the Database

Running the actual query is straightforward and is done in module nrdb_select. The AMLs created in the translation stage are simply run and the selected features are extracted to a selection coverage, using the following AML. Note that in the Edifact query message the attributive operator is given before the spatial operator, but in ArcInfo it is more efficient to select by geometry first. Hence, the temporary AML query files are run with the spatial condition first. At this stage, some control statements are also attached to the AML sequence.

display 0
coordinate keyboard
editcoverage /tie2/nrdb
editfeature arc
&run geom_con
&run attr_con
put selcov

If the client wanted to have only some of the attributes present, the unwanted attributes are omitted or zeroed by using CALCULATE in TABLES. After this, DISSOLVE is run to clear out the unnecessary nodes in the selection coverage. Now the selection coverage is ready to be translated to a data message.

Generating the Data Message

Since writing out Edifact is a relatively complex task, an intermediate format, namely NLS's internal MXL, was chosen to be used as the output format for the data message.

Translating the coverage to MXL is done in two steps:

  1. the selection coverage is ungenerated using basic ArcInfo functionality;
  2. the two generate files are translated to an MXL file.

Converting the Coverage to Generate Files

The coverage is transferred to generate files by using UNGENERATE and UNLOAD. This way we get a LIG (line geometry) file and a LIA (line attributes) file. Together, these files can be called the G&A files. Since the coordinates in the LIG file are presented as floating point numbers, they must be translated to a format supported by MXL. This reformatting is done by using the AML [FORMAT] function.

The attribute data is unloaded simply by giving

tables
select %filename%.aat
unload %filename%.lia %filename%-id, luokka, tienum, ~
osanum, versuh, valmas, vapkor, ykssuu, paally, tastar
q stop

After this step we have two files, LIA and LIG - for attributes and geometry, respectively.

Combined, these files are the data message that the client wants to have. But since the client may not be able to read this G&A combination, the data message must be translated to Edifact. This is done by first translating the G&A file pair to MXL.

Merging and Converting G&A Files to MXL

MXL format is a spatial data transfer format developed at NLS. It is an ASCII format, enabling the transfer of points, lines and polygons.

The LIG and LIA files are opened for read and the MXL file is opened for write. After writing the standard MXL file header, the first record of the LIA file is read and further translated or reformatted to an MXL line header record. Next, N records from the LIG file are processed one by one, i.e. translated or reformatted to MXL line point records. After all the line point records in the LIG file have been processed for the first line feature, another record is read from the LIA file. All the lines in the LIA file (and hence, the LIG file also) are processed this way.
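The merge loop described above can be sketched as follows, in Python rather than AML. Because the MXL record layout is not given in this paper, the sketch stops at pairing each attribute record with its points and does not write MXL. The file layouts are assumptions: the LIG text is taken to hold, for each line, an identifier, one coordinate pair per row and a closing END, with a final END at the end of the file; the LIA text is taken to be comma-delimited with the identifier in the first field. The function names are hypothetical.

```python
import csv
import io
import re


def read_lines(lig_text):
    """Yield (identifier, [(x, y), ...]) for each line feature."""
    rows = [r.strip() for r in lig_text.splitlines() if r.strip()]
    i = 0
    while i < len(rows) and rows[i].upper() != "END":
        feature_id = rows[i]
        i += 1
        points = []
        while i < len(rows) and rows[i].upper() != "END":
            x, y = re.split(r"[,\s]+", rows[i])  # comma or blank delimited
            points.append((float(x), float(y)))
            i += 1
        i += 1  # skip the END that closes this feature
        yield feature_id, points


def merge(lig_text, lia_text):
    """Pair attribute records with geometry, in file order."""
    attributes = csv.reader(io.StringIO(lia_text.strip()), skipinitialspace=True)
    # zip stops at the shorter input; a real server would report a mismatch.
    for (feature_id, points), record in zip(read_lines(lig_text), attributes):
        if record[0].strip() != feature_id:
            raise ValueError(f"identifier mismatch: {record[0]} / {feature_id}")
        yield record, points


# Made-up sample data for two line features.
LIG = (
    "1\n1566311.233,6987411.522\n1566313.233,6987513.522\nEND\n"
    "2\n10.0 20.0\n30.0 40.0\n50.0 60.0\nEND\nEND\n"
)
LIA = "1,12131,385\n2,12132,385\n"

merged = list(merge(LIG, LIA))
```

In a full converter, each pair in `merged` would be written out as one line header record followed by its line point records.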

The MXL data message is sent to SDSC's server, which translates the data message to Edifact. The server further sends the Edifact message to the originating client, which receives the Edifact and translates it to the format that the client GIS understands, e.g. shapefile format.

DISCUSSION AND CONCLUSIONS

General

The EDI based spatial data service concept involves many format conversions, some of them being relatively complex, but in the long run this concept is the most effective way of providing online access to the fast growing number of spatial databases. This is due to the fact that both the client and the server applications need to have only one data format interface, the one defining the conversion to and from Edifact.

If we think of a situation with three different client applications (A, B and C) and three different server applications (X, Y and Z), there are altogether six different applications. If they do not support a single common format, there are actually five formats each application has to support. If these formats are stable, this is not a problem. But whenever a modification is needed to any of the application formats, each translator has to be modified accordingly. E.g., if format B is redefined, the following client-server converters must be modified: X-to-B and B-to-X, Y-to-B and B-to-Y, Z-to-B and B-to-Z.

But if all of these six applications support a common format (E), the situation gets a lot simpler. Now, if the B format is changed, only the E-to-B and B-to-E converters need to be modified. Similarly, adding new client or server formats is relatively easy, since a new application (N) only needs two converters, namely N-to-E and E-to-N. If there were no E format, all the applications A, B, C, X, Y and Z would need new converters to and from N, and application N would need converters to and from all these existing applications. This clearly illustrates the strength of using a standard format in between.
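The converter counts in this comparison can be checked with a small sketch. It is an illustration in Python with hypothetical function names, and it models only the client-server converter pairs listed in the format B example above, not conversions between two clients or between two servers.

```python
from itertools import product


def direct_converters(clients, servers):
    """Without a common format: one converter per pair and direction."""
    return [
        f"{a}-to-{b}"
        for c, s in product(clients, servers)
        for a, b in ((c, s), (s, c))
    ]


def hub_converters(apps, hub="E"):
    """With common format E: two converters per application."""
    return [f"{a}-to-{b}" for app in apps for a, b in ((app, hub), (hub, app))]


def touched(converters, fmt):
    """Converters that must be modified when format fmt is redefined."""
    return [c for c in converters if fmt in c.split("-to-")]


clients, servers = ["A", "B", "C"], ["X", "Y", "Z"]
direct = direct_converters(clients, servers)
hub = hub_converters(clients + servers)

print(len(direct), len(touched(direct, "B")))  # converters in total, hit by a change to B
print(len(hub), len(touched(hub, "B")))
```

Under these assumptions there are 18 converters without a common format, six of which are affected by a change to B, against 12 converters with a common format, of which two are affected.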

Of course, the many conversions in the EDI concept take a lot of processing time. But performance tests show that the conversion is still a very small part of the total elapsed time in the whole chain of two-way transmission over data network, running the actual query and so on. So skipping some conversions does not really bring any noticeable benefits to the client.

In a corporate environment with fast network connections, it is a good idea to skip unnecessary conversions, though. NLS is currently building its internal Spatial Data Distribution System (SDDS), based on the EDI concept. The client is built on top of ArcView 2 and most servers are based on either ArcInfo or MAAGIS (NLS's proprietary GIS for data collection). Since ArcView supports ArcInfo data sets and MAAGIS supports ArcInfo format, there is no need for Edifact in the data messages.

NRDB Case

Clearly, there is a lot to be done to increase the performance of the NRDB server. Unloading and ungenerating the coverage first to ArcInfo generate files (G&A) and then to MXL before actually reaching the Edifact message format is certainly not a very efficient procedure. The reason for this procedure is simple, though: it was the fastest way to set up the server. There was no need for massive programming, since the G&A files - supported by ArcInfo - could be translated to MXL with an existing converter. And luckily, SDSC staff at NLS GIC had earlier written a converter from MXL to Edifact. Of course, the two intermediate formats will be skipped in a future release of the NRDB EDI server application.

Programming the query interpreter was a totally new and challenging experience, since the QI application was meant to be fully automatic. This meant that normal feedback methods could not be used; all the processing - from receiving the query to sending the answer message back - had to be carried out without any interaction. This called for radical error handling, since all errors had to be dealt with in a meaningful way. It was not possible to just halt the AML execution and say "error in Edifact query message, try again". In theory, the input data (i.e. the Edifact query message) is always syntactically correct, but the server application cannot rely on this. And even if the syntax is correct, the semantics of the query may be questionable. Not all semantic errors can be tracked - nor should they be - but all apparent errors had to be handled, whether they were caused by the end user, the front end query application or something else. Of course, a lot of cooperation with the vendors of other software components (client applications and SDSC processes) was involved in the development process, but still the extensive error handling made the programming demanding.

The NRDB EDI server application was built at NLS's Geographic Data Centre (GDC). Building the NRDB EDI server application as described took some 200 hours of AML programming. During 1995, four additional EDI query servers based on the same approach will be built at NLS GDC. One of these servers has a massive raster database as the dataset to be queried.

ACKNOWLEDGEMENTS

I want to thank Jan Lindholm and Kirsi Mäkinen of NLS GIC for revealing the secrets of Edifact to me.

REFERENCES

Buehler Kurt: OGIS Augments Data Transfer. GIS World, October 1994; pp. 60-63.

Coordinating Geographic Data Acquisition and Access: the National Spatial Data Infrastructure. Executive Order 12906, April 11, 1994. Federal Register, Vol. 59, Number 71, pp. 17671-17674.

European Commission, Directorate General XIII: GI2000 - Towards a European Geographic Information Infrastructure.

Federal Geographic Data Committee: National Digital Geospatial Data Framework.

Mikkonen Kari and Rainio Antti: Towards a Societal GIS in Finland - ArcView Application Queries Data from Published Geographical Databases. Proceedings of 15th Annual Esri User Conference, 1995.

National Land Survey of Finland: National Strategy for Geographic Information Services.

National Land Survey of Finland: Implementation of EDI Based Geographic Information Service.

National Land Survey of Finland: Technical Standardisation of Geographic Information Services.

Open GIS Consortium: Open Geodata Interoperability Specification.

Rainio Antti: Joint Use of Geographic Information in Finland. From Research to Application through Cooperation, Lecture material of Workshop G, Part 2, Joint European Conference and Exhibition on Geographical Information, The Hague, the Netherlands, March 1995.

Tikkanen Tapio: The National Road Database of Finland. Proceedings of the Fourteenth Annual Esri User Conference, 1994.

US Geological Survey: US National Spatial Data Infrastructure.


Marko Häkkinen
Senior GIS Analyst
National Land Survey of Finland
Geographic Data Centre
Development Services
Opastinsilta 12 C (P.O. Box 84)
FIN-00521 Helsinki, Finland
Telephone: +358 0 1545194
Fax: +358 0 1545454
Email: Marko.Hakkinen@mmh.fi