Bert Vermeij

Implementing European metadata in ArcInfo 8

Abstract

ArcCatalog has an open, extensible architecture and provides a framework for implementing a custom metadata environment. This paper describes our experiences with customizing ArcCatalog to fit its metadata functionality to the needs of Dutch users. Metadata describes and documents spatial data. Implementing metadata serves several goals: archiving spatial data, for data managers, and searching spatial data, for data users. Spatial metadata is one of the key components of a geo-information infrastructure. Data managers prefer to keep the metadata with the data; data users are primarily interested in a metadata environment that can be queried. Publishing metadata in a Clearinghouse provides a searchable database of information about geodata and thus facilitates data sharing. With ArcCatalog, Esri introduced a system that automatically associates metadata with spatial datasets. A key issue with respect to metadata is the use of content standards, that is, which metadata to collect. Out of the box, ArcCatalog currently supports the (American) FGDC content standard for spatial metadata, whereas in Europe the CEN/TC287 standard is becoming widely adopted. This paper describes an initiative of Esri Nederland and one of our business partners, Geodan B.V., to integrate ArcCatalog into the relatively well-developed metadata situation in the Netherlands. The core of this effort is the design and implementation of a metadata schema and an editor that support the Dutch content standard for spatial metadata, which is in fact a locally redefined subset of the European CEN/TC287 content standard. The paper covers both technical issues and some theoretical background.


Why metadata

As spatial data is the fuel of a GIS, it is important to know whether the data fits the system's needs. Metadata describes data. It is defined as a common set of terminology, defining (potentially disparate) data to facilitate consistent collection, indexing, querying and publishing. It includes information about the content, quality, source organizations, data format and organization, collection schedule, uses, data currency, spatial references and distribution mechanisms of the data. Keeping spatial metadata records is important for three reasons. First, from a data management perspective, metadata documents spatial data. This is important for protecting the investments that organizations make in spatial data. Second, data users need metadata to search for appropriate datasets. Metadata provides information about the data available within an organization and, through catalog services or clearinghouses, about data available from external sources. Metadata not only helps to find data; once data has been found, it also tells how to interpret the data and how it can be used. Last but not least, publishing metadata facilitates data sharing. Sharing data between organizations stimulates cooperation and a coordinated, integrated approach to spatially related policy issues. These different purposes require different levels of detail in the metadata content. Data managers need very detailed information on data format, internal structures and data definitions. Users generally require 'catalogue' information on where to find certain data, how to use it and whom to contact. From an organizational perspective, many users need only a few metadata items; only a few users need completely detailed metadata. The bottom line, however, is that metadata makes spatial information more useful, simply because you know what data you are working with.

Metadata users

Metadata in ArcCatalog

ArcCatalog is an application for locating, browsing and managing spatial data. It resembles Windows Explorer, but it can look inside databases and quickly display data and metadata. With regard to metadata, ArcCatalog's strongest property is the automatic association of metadata with all geographic datasets. ArcCatalog has been designed to create metadata for any dataset supported by ArcInfo as well as any other dataset identified and catalogued by the user (e.g. text, CAD files, scripts, images). ArcCatalog comes with support for the FGDC metadata standard: an editor to enter metadata, a storage schema and property sheets to view the data. Within the ArcCatalog environment, two types of metadata are distinguished: properties (inherent metadata) and documentation. Inherent metadata, which is metadata that can be derived from the data itself, is generated fully automatically. Examples of properties are the name of a dataset, the number of objects it contains, feature types and attributes, the geographic extent and the projection. Documentation is descriptive metadata, to be filled in by the user. It covers items such as the organizations that collect the data, quality characteristics and information on how the data can be obtained. The metadata is stored in XML, together with the data. All data management functions in ArcCatalog (e.g. copy, rename, move and clip) honor it, so metadata always travels with the data. Users can view the metadata in any XML-aware environment. Within ArcCatalog, stylesheets present the metadata to the user. A stylesheet makes it possible to show only the metadata items required, or to give different views on the same metadata. By using different stylesheets, it is easy to support the different metadata requirements of different groups of users.
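The idea of several views on one XML metadata record can be sketched in a few lines of Python. This is only an illustration of the principle: ArcCatalog itself uses XSL stylesheets, and the element names, the sample record and the two view definitions below are assumptions made for this example, not the actual ArcCatalog schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata record; the element names are illustrative only.
RECORD = """<metadata>
  <title>Roads of Gelderland</title>
  <abstract>Provincial road network.</abstract>
  <contact>GIS department</contact>
  <accuracy>5 m</accuracy>
</metadata>"""

# Each view lists the elements one user group wants to see (assumed groups).
VIEWS = {
    "end_user": ["title", "abstract", "contact"],
    "data_manager": ["title", "abstract", "contact", "accuracy"],
}


def render(xml_text, view):
    """Return 'tag: value' lines for the elements selected by the view."""
    root = ET.fromstring(xml_text)
    lines = []
    for tag in VIEWS[view]:
        element = root.find(tag)
        if element is not None and element.text:
            lines.append(f"{tag}: {element.text.strip()}")
    return lines
```

Calling `render(RECORD, "end_user")` yields only the three catalogue items, while `render(RECORD, "data_manager")` also includes the accuracy element; the stored record is the same in both cases.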

Background: Metadata in the Netherlands

In the early 1990s a new term was introduced into the GIS world: metadata, or meta-information. Each geographic dataset should be described and preferably accompanied by metadata. Geographic data is then documented and ready to be searched by data users. At first, the implementation of metadata in an organization was perceived as a ‘necessary evil’, but as more people became aware of its importance and of the benefits it brought, its popularity grew. Today many large, mostly governmental, organizations that deal with spatial information have implemented metadata. Most of those organizations use GeoKey, a popular Dutch metadata management tool, to create metadata and to make it available to large numbers of users through an intranet or the internet. The figure below illustrates the role and position of meta-information in a Geo Information Infrastructure.

Metadata in a Geo Information Infrastructure

GeoKey stores metadata in a meta-database that is physically separated from the spatial data itself. A user can access several meta-databases, in different locations and on different platforms, simultaneously. Since GeoKey allows distributed search, several organizations have shared their meta-databases. The Provinces of Gelderland and Noord-Brabant share their metadata with the regional offices of the Ministry of Traffic, Water management and Public Works in Gelderland and Noord-Brabant respectively. This stimulates data sharing, and saves time and money. The current discussion in the Netherlands is whether data that is not described by metadata should be accessible to users at all. Some organizations take the lead: only data that is described by metadata may be stored in a data warehouse or on central network disks. This data is then accessible only through the metadata.

The European CEN metadata standard

With regard to metadata, an important issue is what we need to know about a dataset. In general, the description of a dataset covers the following topics.

Identification: basic information about the dataset, such as the title, the geographic area covered and rules for using the data.
Spatial data organization: how the spatial information is represented in the dataset. Is it a raster or a vector structure, and does the data have a direct or an indirect spatial reference?
Spatial reference: a description of the reference frame for, and the means of encoding, coordinates in the dataset. This includes the map projection, the coordinate system and the horizontal and vertical datum.
Data quality: items such as positional and attribute accuracy, completeness, consistency and the methods used to produce the data, which together provide an assessment of the quality of the dataset.
Content: information about the objects in the dataset, including names and definitions of features and attributes.
Distribution: contact information and other items on how to obtain the dataset.
Metadata reference: the currentness of the metadata and the responsible organization.

So there is a wealth of items that together describe a dataset. In addition, these items can be used in different ways. From this perspective, a standard provides a common set of metadata elements or variables to document geospatial data. It also gives definitions and a common terminology. By using a content standard, metadata becomes transparent. The Federal Geographic Data Committee (FGDC) adopted a content standard for metadata in the US. Under an Executive Order, all Federal agencies use this standard to document newly created geospatial data. In the Netherlands, the European CEN/TC287 standard has been chosen to become the National Standard. An advantage of this European model is that it is an official standard, which is maintained by the standardization board.
When the FGDC and CEN content standards are compared, the most important difference is that FGDC is more granular and mainly uses discrete variables, whereas the CEN model leaves some degrees of freedom in a number of metadata elements. In many cases one CEN tag maps to many FGDC tags, which makes writing a translator non-trivial. The generic CEN model has been further specified in the Dutch preliminary standard NVN-ENV 12657. This model contains about 290 metadata elements. This Dutch version of the CEN/TC287 standard has been widely accepted as the metadata standard for the Netherlands. It is used in the National Clearinghouse for Geo Information pilot project (NCGI). It should be noted that CEN will be aligned with the future ISO standard for geospatial metadata. A common opinion in the GIS community is that the CEN standard and the official Dutch version of it are not practical from a user perspective: they have too many variables and are therefore not workable, at least in the beginning. The Ministry of Traffic, Water management and Public Works has taken the initiative to define a simpler version of CEN, which contains a subset of 80 of the 290 metadata elements. This model has been implemented in the Ministry's metadata management system (GeoKey) and is being adopted by many other organizations as well. It is referred to as ‘CEN-RWS’.
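The one-to-many relation between a CEN element and FGDC elements can be illustrated with a small Python sketch. It is not a real translator: the CEN element names (`title`, `extent`, `remarks`) and the assumed format of the extent value are invented for this example; only `westbc`, `eastbc`, `southbc` and `northbc` follow the FGDC short names for bounding coordinates.

```python
def split_extent(value):
    """Split one free-form extent value into four discrete FGDC elements.

    Assumes the (hypothetical) CEN value is 'west east south north'.
    """
    west, east, south, north = value.split()
    return {"westbc": west, "eastbc": east, "southbc": south, "northbc": north}


# One converter per CEN element; each returns one or more FGDC elements.
CONVERTERS = {
    "title": lambda value: {"title": value},
    "extent": split_extent,
}


def cen_to_fgdc(cen_record):
    """Translate a CEN-style record; report elements without a converter."""
    fgdc = {}
    untranslated = []
    for name, value in cen_record.items():
        converter = CONVERTERS.get(name)
        if converter is None:
            untranslated.append(name)
        else:
            fgdc.update(converter(value))
    return fgdc, untranslated
```

Every element that needs such a converter requires its own parsing rule, and free-text values may not parse at all, which is where most of the translation effort goes.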

Implementing CEN in ArcCatalog

Out of the box, ArcCatalog supports only the FGDC content standard for spatial metadata. Users ask for support of CEN as well. Support for the National (or European) standard is vital if ArcCatalog is to play a role in the National Spatial Data Infrastructure, and it is even vital for the success of ArcCatalog itself. If the users' content standard for metadata is not supported, they will not use the metadata tools in ArcCatalog. This led Esri Nederland to decide to build a Dutch version of ArcCatalog. Building your own custom environment in ArcCatalog basically requires four components: a logical model for metadata content; an implementation schema; an editor; and one or more stylesheets. First, you need to decide which metadata elements you need (or want) to collect. This can be a standardized set, like the FGDC or CEN content standard for spatial metadata, or an internal standard defined by your organization. Initially, the ‘CEN-RWS’ model is supported. As it is a subset of CEN, it can be extended relatively easily and grow into the complete model. It also gives organizations the possibility to add their own metadata elements to support specific requirements. Next, this logical data model needs to be translated into an implementation schema in XML. A major advantage of XML is its open, user-definable structure, which makes it relatively simple to adjust a schema to specific needs. This brings us to a powerful feature of ArcCatalog: the automatic harvesting of inherent metadata (properties). These properties are generated by ArcCatalog and then written to the XML file. The tags in the XML file correspond to metadata elements that follow the FGDC content standard and naming conventions. This process of synchronization turned out to be the Achilles heel. It is one of the few things in ArcCatalog (ArcInfo) that cannot be customized.
Synchronization always uses FGDC tags, and there is no way to tell it to put the harvested properties into a different schema (e.g. CEN). Therefore, we had to work around that problem in the editor. The core part of our customization effort is the metadata editor. In ArcCatalog, a metadata editor allows users to enter and edit metadata for data sources, following a standard or another defined model. The editor stores the metadata in the XML schema. It simplifies the creation of metadata with text boxes, and with dropdown lists for fields that have predefined domain values, all according to the chosen standard.
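The step from a logical model to an XML record, including fields with predefined domain values, could look roughly like the following Python sketch. The Dutch editor itself was built in VB; the two-element model, the element names and the domain values here are hypothetical and serve only to show the principle.

```python
import xml.etree.ElementTree as ET

# Hypothetical logical model: element names and an optional value domain.
MODEL = [
    {"name": "title", "domain": None},
    {"name": "structure", "domain": ["vector", "raster"]},
]


def build_record(values):
    """Create an XML record that follows MODEL, checking domain values."""
    root = ET.Element("metadata")
    for element in MODEL:
        name, domain = element["name"], element["domain"]
        value = values.get(name, "")
        # A dropdown list in an editor prevents this error at entry time.
        if domain is not None and value and value not in domain:
            raise ValueError(f"{value!r} is not allowed for {name!r}")
        ET.SubElement(root, name).text = value
    return root
```

Adding an organization-specific element then amounts to adding one entry to the model, which reflects the flexibility of a user-definable XML structure.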

Metadata editor defines schema

Esri Nederland has chosen to use Visual Basic (VB) to build an editor for the Dutch version of CEN. The editor supports the official ‘Guidelines for the implementation of the Dutch Preliminary Norm NVN-ENV 12657 Geographic Information – Data description – Metadata’. Wizards, menu interfaces and pick lists help users fill in the metadata. As the metaphor for the user interface, the editor follows the official Dutch tax program: it is straightforward and well ordered, and most users are familiar with it. Help files on the use of the ‘Guidelines’ complete the interface.

Dutch ArcCatalog editor

The only way to get around the problem of the FGDC tags for the metadata properties is to continue to use them. To be fully CEN-compliant, the editor copies the ‘FGDC property tags’ to the corresponding CEN tags. This introduces some data redundancy, but that is only a small disadvantage compared to the benefits of the automatic synchronization process combined with support of the CEN model for data transfer. Stylesheets expose the metadata to the user. The XML / XSL architecture gives the opportunity to use a set of different stylesheets on one XML schema. Dutch stylesheets in ArcCatalog enable viewing metadata in different ways. A stylesheet basically converts XML to more easily readable HTML. It can be considered a view on the metadata. Different stylesheets support different users. A database manager, or a user who needs detailed information on the dataset, will use a stylesheet that shows the complete metadata; an end user can use a simple stylesheet that exposes a subset with only a few key metadata elements.
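The copying of FGDC property tags to CEN tags can be sketched as follows. This is a minimal Python illustration, not the code of the Dutch editor: the FGDC paths use FGDC short names, but the `cen` container and the CEN element names (`cen_title`, `cen_west`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# FGDC path -> hypothetical CEN element name.
FGDC_TO_CEN = {
    "idinfo/citation/citeinfo/title": "cen_title",
    "idinfo/spdom/bounding/westbc": "cen_west",
}


def copy_properties(root):
    """Copy harvested FGDC values into a CEN section; return the copy count."""
    cen = root.find("cen")
    if cen is None:
        cen = ET.SubElement(root, "cen")
    copied = 0
    for path, cen_tag in FGDC_TO_CEN.items():
        source = root.find(path)
        if source is None or source.text is None:
            continue  # property was not harvested for this dataset
        target = cen.find(cen_tag)
        if target is None:
            target = ET.SubElement(cen, cen_tag)
        target.text = source.text  # overwrite, so repeated runs stay in step
        copied += 1
    return copied
```

Because existing CEN elements are overwritten rather than appended, the copy can be repeated after every synchronization without accumulating duplicates; the redundancy stays limited to one CEN copy per property.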

Stylesheet

ArcCatalog metadata in a broader context

Given the very specific circumstances in the Netherlands, and at the request of our users, a ‘bridge’ between GeoKey and ArcCatalog was added as a fifth component. In the current release (ArcInfo 8.0), ArcCatalog does not have a repository or metadata server, so it lacks the capability for distributed, structured metadata search. GeoKey and ArcCatalog clearly complement each other in functionality. Where ArcCatalog is a powerful (meta)data management tool, GeoKey is strong in metadata distribution and searching. Currently, many Dutch Esri GIS users use GeoKey as the environment to manage spatial metadata. This means that metadata is integrated within the GIS environment. Much effort has been invested in documenting the geospatial databases with metadata. Users query metadata to find the data they need. By building a bridge between GeoKey and ArcCatalog, these users can benefit not only from the strengths of both tools, but also from the additional benefits that result from the integration. The integrated solution is based on the following conceptual model: metadata should be stored and managed with the data; metadata should be stored in an open format (XML); metadata content must be compliant with the European standard (CEN/TC287); metadata must be made available in an open searchable environment; users access metadata from an open query environment (Web-based); a Dutch user interface supports both metadata editing and metadata retrieval. It is essential that the metadata records that are connected to the data can be populated with the metadata that is available in the existing (GeoKey) metadata databases. This is not only efficient, as it builds on past investments, it also assures a smooth transition. In the short term, users could benefit greatly from a simple bridge between the two environments, so that metadata can be transferred from one system to the other.
A simple connection for data exchange enables Esri GIS users to benefit immediately from the full strength of ArcCatalog without having to go through the very extensive process of creating the documentation in the metadata records. Creating metadata in ArcCatalog and bringing the metadata records into the GeoKey database provides a robust searchable environment. On a conceptual level, the starting point is that in an environment where ArcCatalog and GeoKey live together, a metadata record is always initially created in connection with the data, that is, in ArcCatalog. Synchronization tools (import / export) between GeoKey and ArcCatalog are needed, both to fill the XML records with documentation that is already available in GeoKey and to publish the ArcCatalog XML records in GeoKey's searchable database. The synchronization encompasses conversion to and from XML and a mechanism to detect anomalies. The synchronizer is an independent process, running outside GeoKey and ArcCatalog. It can be started from ArcCatalog using COM technology. The following conceptual model shows how these components work together to build the bridge between GeoKey and ArcCatalog.
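The import side of such a synchronizer, including a simple form of anomaly detection, might be sketched as below. This is an assumption-laden illustration in Python: records are modelled as plain dictionaries, the field names are invented, and an "anomaly" is taken to mean a field that has different non-empty values in the two systems. The actual synchronizer and its rules are not specified here.

```python
def synchronize(arc_record, geokey_record):
    """Fill empty ArcCatalog fields from GeoKey and report conflicts.

    Returns the merged record and the list of fields whose values
    differ between the two systems (the anomalies).
    """
    merged = dict(arc_record)
    anomalies = []
    for field_name, geokey_value in geokey_record.items():
        arc_value = arc_record.get(field_name)
        if not arc_value:
            # Documentation already available in GeoKey is reused.
            merged[field_name] = geokey_value
        elif arc_value != geokey_value:
            # Both systems hold a value, but they disagree.
            anomalies.append(field_name)
    return merged, anomalies
```

In this sketch the ArcCatalog value wins on a conflict and the field is only reported; whether a conflict should be resolved automatically or presented to the user is a design decision left open.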

Conceptual model

Distributed architecture; the next steps

ArcCatalog in the 8.0 release is focused on metadata capture. Searching and publishing of metadata are not supported yet. For metadata to work in a geo-information infrastructure, it is essential to support an open and distributed catalog architecture. The OpenGIS Consortium has designed a conceptual model for distributed metadata search. In this model, spatial data is documented in catalogs. A catalog can also point to other catalogs. In a future release, ArcCatalog should be able to support this architecture. A metadata server publishes ArcCatalog metadata to external clients. The server communicates with clients on the basis of standards such as Z39.50 or the OGC Catalog specification. Any compliant client application can connect to the server and query the published metadata. ArcCatalog itself can be such a client. Conversely, ArcCatalog should be able to connect to any metadata server. In addition, it is important that every user can access and search metadata from a standard web browser. A simple web application should be available to connect to several distributed metadata servers and query the published metadata.
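The notion of catalogs that point to other catalogs can be illustrated with a small in-memory Python sketch. It does not implement Z39.50 or the OGC Catalog specification; the `Catalog` class, the keyword match on titles and the example catalog names are assumptions for this illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """A catalog holds dataset titles and links to other catalogs."""
    name: str
    records: list = field(default_factory=list)
    links: list = field(default_factory=list)


def search(catalog, keyword, seen=None):
    """Search a catalog and every catalog reachable from it, once each."""
    seen = set() if seen is None else seen
    if catalog.name in seen:
        return []  # already visited; catalogs may point at each other
    seen.add(catalog.name)
    hits = [
        (catalog.name, title)
        for title in catalog.records
        if keyword.lower() in title.lower()
    ]
    for linked in catalog.links:
        hits.extend(search(linked, keyword, seen))
    return hits
```

A client only needs to know one entry catalog; the set of visited catalogs keeps the search from looping when two organizations link to each other's catalogs.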

Metadata users

Conclusion

ArcCatalog’s open, extensible architecture proved to be a powerful framework for building a custom environment to capture metadata. Implementing a European CEN-compliant metadata model is straightforward as far as documentation is concerned, but requires a workaround for the properties. As the harvesting of properties is not customizable, it is recommended that future releases of ArcCatalog support not only FGDC but also other standards, such as CEN and ISO. Although the collection of metadata is important initially, the real fun starts when you can search for data using a simple metadata browser in a distributed environment. In this respect, we expect ArcCatalog to be as open as possible.


Drs. Bert Vermeij
Business Consultant
Esri Nederland B.V.
Stationsplein 45
3013 AK ROTTERDAM
The Netherlands
+31-10-2170700
b.vermeij@Esrinl.com