Michael J. Meitner
Duncan Cavens
Stephen R.J. Sheppard
One of the crucial and often overlooked aspects of collecting and maintaining metadata for geographic data is the individual who must voluntarily comply with standards developed in a centralized fashion. To overcome this barrier, a metadata system and standard was developed at the University of British Columbia to help users comply with our organizational metadata goals. Because participation in an academic setting cannot be mandated, we present a set of rules for organizations where compliance cannot be governed centrally.
As geospatial data continues to accumulate, now in gigabytes at a time rather than the megabytes of old, it becomes increasingly difficult to 1) locate relevant data, 2) make use of in-house data products, and 3) share relevant data with interested parties. At the University of British Columbia's Faculty of Forestry, we have seen a dramatic increase in the proliferation of geospatial data since the recent hardware purchases made by the Forest Information Resource Management Systems (FIRMS) laboratory. A number of centralized data servers were purchased, yielding nearly 0.5 terabytes of storage space. As anticipated, this increase in available hardware has led to an explosion of GIS research by faculty and students, but it has also led to a number of problems. The greatest of these is that we now have so much data that we have lost track of where it came from and whether it has any current or future value that would justify archiving it to a more durable medium.
Our situation, however, is not unique. With the continually decreasing prices of hardware and software, the advent of desktop GIS packages such as ArcView, and an ever-increasing array of source data, many organizations are finding themselves in similar situations. As GIS software providers strive to make their products increasingly "user friendly", a growing number of minimally trained individuals are churning out data products of questionable value. While we would not discourage the practice of making GIS technologies more usable, this trend does require the consumer of these data products to be quite meticulous in determining the origins and paths that the data may have traveled. (Enter our hero: metadata.)
Most experienced GIS professionals know that the solution to this problem is well-maintained and well-documented metadata, but where is it? How many times have you searched, made phone calls, and sent e-mails trying in vain to determine where some data that ended up on your desk came from, let alone the process of its genesis? The answer is usually quite shocking. If only we lived in a perfect world where data always arrived with the information you needed neatly attached. In this paper we offer a number of possible solutions to the problem of rare and endangered metadata.
The reason for creating metadata is normally to improve the possibilities of document retrieval as well as to support control and management of collections (Hakala et al., 1996). Academics often refer to data in the absence of metadata as raw data. Raw data are useful only when they can be framed within a theoretical or conceptual model. This requires understanding the types of variables that were measured, the measurement units, potential biases in the measurements, the sampling methodology, and other pertinent facts not represented in the raw data. The combination of raw data and metadata within a conceptual framework produces information, which then becomes quite useful in a research setting. Without this, GIS data has little use beyond the scope of its initial purpose.
Recently, increased interest in long-term temporal-change research and comparative studies, along with the expansion of the spatial, temporal, and thematic scales of basic and applied studies, has resulted in data sets being used for multiple purposes, often repeatedly over long periods of time. This is especially true in the field of forestry research, where researchers are often interested in time scales in excess of 100 years. Without adequate metadata it is impossible to build on the research of those who came before. This concept of data sharing, both with contemporaries and through time, is becoming more important as we strive to answer questions about the sustainability of our practices over time, inter-generational equity, and the long-term ecological and economic impacts of decisions made today. Databases that have been compiled from a variety of sources, reformatted, verified by quality assurance and quality control (QA/QC) procedures, integrated with ancillary data, and well documented have been, and continue to be, valuable to efforts to extend the spatial, temporal, and functional scope of forestry-related studies.
So why doesn't everyone maintain high standards when it comes to metadata? The answers to this question are plentiful.
All of these issues have led us to a situation where data abounds and metadata is hard to find. In our experience, most researchers know that they should document their data, but knowing has not solved the problem. In an attempt to remedy this situation, we have implemented a centralized metadata database. By creating this resource for our researchers, we have been able to offer a service that addresses a number of the concerns listed above.
A critical aspect of the system was the use of a standard web browser interface, which minimizes the complexity of using the system in the hope of achieving a higher degree of use in an academic setting where technical support is often at a premium. Our system uses SQL Server as the back-end relational database management system and Active Server Pages (ASP) to dynamically generate the front-end user interface. The use of these rather simple development environments meant that we spent very little time actually programming the system, and to date the maintenance load has been quite light. Users of the system can easily query for relevant data products by a number of fields (see Figure 1).
Figure 1: FIRMS lab metadata database query tool
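To make the query workflow concrete, the following is a minimal sketch of the kind of keyword search described above, using Python's built-in sqlite3 module as a stand-in for the SQL Server back-end. The table layout, column names, and sample record are illustrative assumptions only; the actual FIRMS schema is not documented in this paper.

```python
import sqlite3

# An in-memory stand-in for the metadata database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE metadata (
        id INTEGER PRIMARY KEY,
        title TEXT,
        originator TEXT,
        abstract TEXT,
        keywords TEXT
    )
""")
conn.execute(
    "INSERT INTO metadata (title, originator, abstract, keywords) "
    "VALUES (?, ?, ?, ?)",
    ("Research Forest TRIM base map", "FIRMS lab",
     "Digital terrain base for the research forest.",
     "forestry, TRIM, elevation"),
)

def search(term):
    """Return (id, title) pairs whose title, abstract, or keywords match."""
    like = f"%{term}%"
    cur = conn.execute(
        "SELECT id, title FROM metadata "
        "WHERE title LIKE ? OR abstract LIKE ? OR keywords LIKE ?",
        (like, like, like),
    )
    return cur.fetchall()

print(search("TRIM"))  # the sample record matches on title and keywords
```

In the real system this query is issued by an ASP page and the results are rendered as the HTML list shown in Figure 2.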
Results from a query are then brought up in the next page as seen in Figure 2.
Figure 2: Search results
All entries matching the search criteria are brought up; to see the detailed metadata for any entry, the user need only click on the item of interest (see Figure 3).
Figure 3: Detailed results
Of primary concern for us was simplicity, both of the standard and of its use. We chose to implement a small subset (15 elements in total) of the CSDGM standard so that the overall burden of data entry was reduced. This has the additional benefit of reducing the complexity of the standard so that non-GIS specialists can easily understand what needs to be documented. Simple terminology is used whenever possible, and a set of online guidelines is provided when further clarification is needed. Since the standard we have implemented is short, much is still left undocumented, but this also keeps the cost of filling out the forms rather small. The average user can fill out the form in as little as five minutes, and once a form is completed it can be cloned to other data sets, so that if only a few entries need to change it is even faster (see Figure 4).
Figure 4: Data entry form
Guidelines are also provided to assist new users in becoming familiar with the system (see Figure 5).
Figure 5: Data entry form guidelines
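The cloning workflow described above can be sketched as follows: a completed record is copied, and only the fields that differ for the new data set are overridden. The element names below are a hypothetical sample of a CSDGM-style subset; the actual 15 elements used by the FIRMS system are not enumerated in this paper.

```python
from dataclasses import dataclass, replace

@dataclass
class MetadataRecord:
    # Hypothetical CSDGM-style elements, for illustration only.
    title: str
    originator: str
    abstract: str
    publication_date: str
    theme_keywords: str

base = MetadataRecord(
    title="Stand inventory, compartment 12",
    originator="FIRMS lab",
    abstract="Polygon coverage of stand attributes.",
    publication_date="2001-05-01",
    theme_keywords="forestry, inventory",
)

# Cloning: reuse everything from the completed form, override only what changed.
clone = replace(base, title="Stand inventory, compartment 13")

print(clone.originator)  # inherited unchanged from the base record
print(clone.title)       # the one overridden field
```

Because most sibling data sets share originator, abstract, and keywords, cloning reduces data entry to the handful of fields that actually differ.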
The real question that must be on your mind if you have made it this far is, "Did this help in getting compliance without enforcement?" We would love to be able to tell you that this is the be-all and end-all of metadata systems and that, because of all our efforts, we have achieved 100% compliance, but that is simply not the case. People just don't get excited by metadata. However, we have seen a dramatic increase (from none to some) in the documentation of data products in our faculty. We believe this is because we have given researchers a tool that they find valuable: one that supports our overarching mission of advancing science while also facilitating the organization of deliverables, the location of needed data, and the sharing of data with colleagues.
This is very much a work in progress, and future effort is needed to implement custom metadata standards by research project, XML import/export capability, and a greater degree of integration with our spatial data as we ramp up to SDE in the coming months.
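As a rough sketch of what the planned XML export might look like, the snippet below serializes one record to an FGDC-flavoured XML fragment using Python's standard library. The element names (metadata, idinfo, title, origin) loosely follow the CSDGM outline but are assumptions, not the system's actual export schema.

```python
import xml.etree.ElementTree as ET

def to_xml(record: dict) -> str:
    """Serialize a metadata record (element name -> value) to an XML string."""
    root = ET.Element("metadata")
    idinfo = ET.SubElement(root, "idinfo")
    for element, value in record.items():
        ET.SubElement(idinfo, element).text = value
    return ET.tostring(root, encoding="unicode")

xml = to_xml({"title": "Stand inventory, compartment 12",
              "origin": "FIRMS lab"})
print(xml)
```

An import path would simply invert this: parse the XML with `ET.fromstring` and write the element values back into the database fields.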
Hakala, J., Husby, O., & Koch, T. (1996). Warwick framework and Dublin core set provide a comprehensive infrastructure for network resource description. Report from the Metadata Workshop II, Warwick, UK. http://www.cas.usf.edu/english/walker/apa.html (accessed 20 June 2001).
Michael J. Meitner
Assistant Professor
Collaborative for Advanced Landscape Planning
Department of Forest Resources Management
Forest Sciences Centre
2045-2424 Main Mall
University of British Columbia
Vancouver, BC, Canada V6T 1Z4
E-Mail: meitner@interchange.ubc.ca
Phone: (604) 822-0029, FAX: (604) 822-9106
Duncan Cavens
Graduate Student
Collaborative for Advanced Landscape Planning
Department of Forest Resources Management
Forest Sciences Centre
2045-2424 Main Mall
University of British Columbia
Vancouver, BC, Canada V6T 1Z4
E-Mail: duncan@cavens.org
Phone: (604) 822-6708, FAX: (604) 822-9106
Stephen R.J. Sheppard
Associate Professor
Collaborative for Advanced Landscape Planning
Dept. of Forest Resources Management and Landscape Architecture Program
Forest Sciences Centre
2045-2424 Main Mall
University of British Columbia
Vancouver, BC, Canada V6T 1Z4
E-Mail: shep@interchange.ubc.ca
Phone: (604) 822-6582, FAX: (604) 822-9106