Cross-Media Database Normalization of Various Metadata Standards for Environmental Decision Support and Community Management

Jacqueline Lesch

Abstract

This paper outlines the design and implementation of a cross-media database for the Salton Sea Database Program. The program is an Environmental Protection Agency (EPA) funded project at the University of Redlands to develop a GIS and establish an online data resource clearinghouse on issues relating to the ecological collapse of the Salton Sea. The cross-media database investigates new methods for the organization and use of multimedia digital data in a unified relational database design. Particular focus is given to the development of the cross-media data model, including normalization across various metadata standards such as the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata, Dublin Core, and the MARC standard for bibliographic resources.


1.0 Introduction

The Salton Sea Database Program (SSDP) is an Environmental Protection Agency (EPA) funded project at the University of Redlands' research institute, the Redlands Institute for Environmental Design, Management, and Policy. The mission of the SSDP is to develop geographic information systems (GIS) and establish an online digital resource clearinghouse for the Salton Sea, a Southern California inland lake facing ecological collapse if left unaided. Through the Salton Sea project, the Redlands Institute is researching the use of database technology to promote area information stewardship for environmental and community management by providing online access to geographically referenced multi-media information and resources.

The principles of area information stewardship as applied to environmental and community management seek to foster productive, informed dialog and decision-making among stakeholder groups. While modern and emerging information management technology, geographic information science, and the Internet have opened broad horizons for compiling, accessing, hyper-linking, relating, analyzing, and distributing information, at present such tools are underutilized for environmental application. At the core of the SSDP project is a multi-media relational database - The Redlands Institute Cross-Media Database (XMDB) for environmental application.

XMDB focuses on digital archival and distribution of multi-media resources, but also strives to advance emerging research and efforts for building interoperability between online knowledge communities through normalizing various metadata standards, including the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata, the Library of Congress MARC 21 Format for Bibliographic Data, and the Dublin Core Metadata Initiative's 15-element metadata set. To reach online communities conforming to the aforementioned standards, metadata should be maintained in one normalized, interoperable schema. This document describes the development of the XMDB, including normalization of metadata standards, the use of descriptive information relationships to create enhanced information patterns, and the incremental system development of a data and software platform model.

2.0 A Brief Discussion of Metadata

Several definitions of "metadata" have been cited in literature including:

Common among the various definitions of metadata are the notions of "descriptive data" and "structure". In reality metadata exists everywhere in very common forms. Metadata is the formal term for descriptive information about a digital or physical object such as a person, place, thing, or even an activity. "Structure" refers to how this descriptive information is presented; a phone directory (contact metadata about people and organizations) and a library catalog (bibliographic metadata about documents) are two examples of structured metadata. Metadata is most often used as a tool for resource discovery - the information that can be searched to locate a digital or physical object. Metadata standards researched for the Cross-Media database include: FGDC, MARC, and Dublin Core.

FGDC.  Geospatial metadata gained national attention in the mid-nineties when Executive Order 12906, "Coordinating Geographic Data Acquisition and Access: The National Spatial Data Infrastructure," was signed by President William Clinton. Section 3 of the executive order calls for the development of an online national geospatial data clearinghouse with standardized data documentation. In 1998 the Federal Geographic Data Committee drafted a national standard for geospatial metadata, the Content Standard for Digital Geospatial Metadata, which is commonly referred to as "FGDC" (available online: http://www.fgdc.gov/metadata/contstan.html).

MARC.  In contrast, evidence of bibliographic metadata exists from at least 537 BC in the form of clay tablet shelf lists kept by the National Library of Babylon.4 Another national library, the United States Library of Congress (LOC), is responsible for the modern computerized bibliographic metadata structure. A higher education act signed by President Lyndon B. Johnson gave LOC the broad responsibility for "acquiring, insofar as possible, all current library materials of value to scholarship published throughout the world, and providing cataloging information for these materials promptly after they had been received."5 In 1965 LOC developed the "MARC" (Machine Readable Cataloging) format for bibliographic data. MARC became the official national standard for sharing data about books and other research materials in 1971, and an international standard in 1973. The original MARC format has since evolved into MARC 21, the standard used by most library computer programs today.6

Dublin Core.  The desire for interoperable metadata about World Wide Web resources spurred the formation of the Dublin Core Metadata Initiative (DCMI). A list of "core" metadata elements "to provide vertically specific (or semantic) information about Web resources, much in the same way library card catalogs provide indexed information about book properties" was developed in 1995. Dublin Core metadata is used to supplement existing methods for searching and indexing Web-based metadata. The original discussions at the first workshop focused primarily on creating metadata for electronic resources. Since that time, however, the consensus among the Dublin Core (DC) community has been that DC-enabled resource discovery systems can and should be used to describe both digital and "real" physical objects. Most DCMI participants are involved in large-scale archiving or cataloging projects that require the use of Dublin Core metadata to enable large collections of object "resources" to be grouped, named, classified, and indexed in a useful fashion.7

3.0 Cross-Media Database Project Background

XMDB has its roots as two separate databases, one for bibliographic resources and one for geospatial resources. The original compilation of documents for the Salton Sea project focused on collecting scientific journal literature, scientific white papers, and environmental documentation. A document database was developed using Microsoft Access to capture bibliographic metadata and track the number of documents collected and scanned.

Concurrently a GIS was developed. Data layers include:

The GIS supported data analysis and map production for the 2000 Draft Salton Sea Environmental Impact Statement/Environmental Impact Report. In addition, the Salton Sea Digital Atlas, a 3-CD-ROM set containing 105 data layers packaged with Esri's ArcView Data Publisher, was developed. Geospatial metadata for the digital atlas was recorded in a separate Microsoft Access database.

The idea to combine the bibliographic and geospatial metadata into a single integrated relational database spurred the development of a cross-media database system.

4.0 Phased Implementation

The development of the Cross-Media Database has been an incremental research and design effort closely tied to grant funding. The overall goal has been to create an information system that can suitably scale to the research and grant activities of the Redlands Institute's various projects. Although the desire was to execute an elaborate, multi-functional system in one design and build phase, a practical approach tied to grant funding was adopted using phased implementation. Phases have included:

4.1 Phases One and Two: The Cross-Media Prototype

A Salton Sea Database Program Strategic Plan was prepared that included a stakeholder needs summary, a GIS data requirements analysis, and implementation strategies for publishing and disseminating GIS and bibliographic data online. Concept workshops were held in which the idea of a web-accessible cross-media database to promote area information stewardship for environmental and community management was formalized. The initial version of the cross-media system was built in Microsoft Access to serve as a prototype model for addressing issues concerning data migration, integration, and normalization of the two different database schemas into a single relational database. The cross-media system prototype was developed as a web-enabled database application using Allaire's ColdFusion as the web development tool and hosted on Microsoft's Internet Information Server platform.

The prototype handles several digital resources including: geospatial resources, images, events, web sites, documents, people, and organizations (Figure 1).

Figure 1 Cross-Media Database Concept

Figure 2 shows the Microsoft Access relationship view of the prototype. Tables include: entity base, entity type descriptions, entity locations, dates, relationships, relationship descriptions, format descriptions, organization, person, gis, documents, and websites.


Figure 2 Relationship View of XMDB Data Model

The prototype tables store the following information:

Relationships.  Relationships are inherent to information. Information resources can have topical, temporal, and spatial relationships. The prototype explores the possibilities of reducing repetitious and time consuming cataloging chores while enhancing serendipity during the research process.

Returning to Figure 2, the document table includes:

Missing from the document table is the field 'author.' The author of a document is actually handled through a relationship. For example, a Salton Sea project researcher is an author of a report and also an organization employee. Rather than entering the person information twice, once as an author and once as an employee, the information is entered once in the person table. The person record is then related to the appropriate document and organization records with descriptive relationship labels. When the report is retrieved from the database, the person's name appears as the author and includes related information such as additional documents, membership organizations, etc.

Figure 3 Descriptive Relationship Labels

This type of relationship building can facilitate links between information and information, information and people, and people and people to form more patterns among database resources than are possible through keyword indexing alone. In essence, this approach to relationships mimics hyper-linking between HTML documents on the World Wide Web, albeit in a more structured fashion. It also reduces redundancy between multiple metadata standards, a point discussed in greater detail later.
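The descriptive relationship approach described above can be illustrated with a minimal sketch. The table names, column names, and sample records below are hypothetical simplifications chosen for illustration; they are not the actual XMDB prototype definitions, and SQLite is used only because it is self-contained, not because the prototype used it.

```python
import sqlite3

# Hypothetical, simplified schema: names are illustrative only and do not
# reproduce the XMDB prototype tables.
SCHEMA = """
CREATE TABLE entity_base (
    entity_id   INTEGER PRIMARY KEY,
    entity_type TEXT NOT NULL,   -- e.g. 'person', 'document', 'organization'
    name        TEXT NOT NULL
);
CREATE TABLE relationship (
    from_id INTEGER NOT NULL REFERENCES entity_base(entity_id),
    to_id   INTEGER NOT NULL REFERENCES entity_base(entity_id),
    label   TEXT NOT NULL        -- descriptive relationship label
);
"""


def build_demo():
    """Create an in-memory database with one person entered exactly once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO entity_base VALUES (?, ?, ?)",
        [
            (1, "person", "A. Researcher"),          # made-up sample data
            (2, "document", "Example Report"),
            (3, "organization", "Example Institute"),
        ],
    )
    # The same person record is linked twice, with different labels.
    conn.executemany(
        "INSERT INTO relationship VALUES (?, ?, ?)",
        [(1, 2, "author of"), (1, 3, "employee of")],
    )
    return conn


def related(conn, entity_id):
    """Return (label, type, name) for every entity linked from entity_id."""
    rows = conn.execute(
        """
        SELECT r.label, e.entity_type, e.name
        FROM relationship AS r
        JOIN entity_base AS e ON e.entity_id = r.to_id
        WHERE r.from_id = ?
        ORDER BY r.label
        """,
        (entity_id,),
    )
    return rows.fetchall()


def authors_of(conn, document_id):
    """Resolve a document's authors through relationships, not a column."""
    rows = conn.execute(
        """
        SELECT e.name
        FROM relationship AS r
        JOIN entity_base AS e ON e.entity_id = r.from_id
        WHERE r.to_id = ? AND r.label = 'author of'
        """,
        (document_id,),
    )
    return [name for (name,) in rows]
```

In this sketch the document table needs no author field: calling authors_of for the report follows the labeled relationship back to the person record, and calling related for the person lists both the report and the organization.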

Client View of Relationships.  First published on the Web in August 2000, the cross-media database system prototype has been available for approximately one year at http://institute.redlands.edu/salton. Figures 4 and 5 present an example of the client view of record relationships. Figure 4 is a record for a Salton Sea Restoration Project website. The text between the two gray lines is data about the website. Below the second gray line is a link icon with the path to the website, and below that is an icon of a person. The person represents a relationship to the website: the primary content author. The person hyperlink brings up specific information about the primary content author and additional related resources (Figure 5).


Figure 4 Salton Sea Restoration Project Website XMDB Record


Figure 5 Primary Content Author Information and Related Resources

4.2 Phase Three: Lessons Learned

The cross-media database prototype was tested for cataloging and searching against the project needs of a tool to foster area information stewardship. As previously stated, the ultimate goal is not just to produce a digital archive, but to link people and organizations with similar research interests, jurisdiction, and mandates, and to provide the appropriate information for informed policy, dialog, and decision making. The testing phase determined the need for enhanced capabilities including:

4.3 Phase Four: Redesigned Data and Platform Model with Enhanced Functionality

Spatial Query and Metadata.  The prototype system has provided the minimal functionality that can be achieved through the integration of geospatial and multi-media metadata resources into a single relational database. At the simplest level, metadata can be retrieved for geospatial and multi-media resources, but this alone does not exploit the power of GIS to combine spatial and text queries to perform analysis on geographically referenced material. The cross-media database redesign moves beyond the prototype to offer a graphical user interface with spatial and text query capability to access geo-referenced multimedia resources. The Redlands Institute is particularly interested in the implications of such an application for building resource interoperability between knowledge communities.
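As a rough illustration of what combining a spatial query with a text query means, the sketch below filters resources by a bounding-box intersection and a keyword match together. It is an assumption-laden simplification: the class names, fields, and search function are hypothetical, geographic extents are reduced to simple latitude/longitude boxes, and a production system would delegate both tests to GIS and database software rather than to in-memory Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Geographic extent in decimal degrees (hypothetical simplification)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BoundingBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        return not (
            self.east < other.west
            or other.east < self.west
            or self.north < other.south
            or other.north < self.south
        )


@dataclass(frozen=True)
class Resource:
    """A geo-referenced multimedia resource (illustrative fields only)."""
    title: str
    resource_type: str   # e.g. 'image', 'document', 'gis'
    keywords: tuple
    extent: BoundingBox


def search(resources, area, term):
    """Return resources whose extent overlaps `area` AND whose title or
    keywords contain `term` (case-insensitive substring match)."""
    term = term.lower()
    return [
        r for r in resources
        if r.extent.intersects(area)
        and (term in r.title.lower()
             or any(term in k.lower() for k in r.keywords))
    ]
```

The point of the sketch is the conjunction in search: a text-only query would return every resource mentioning the term regardless of location, and a spatial-only query would return every resource in the area regardless of topic.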

A knowledge community can be broadly defined as a group comprising single or multiple academic disciplines engaged in coordinated work and sharing data, knowledge bases, interests, methods, and terminology.8 Metadata content standards have been developed to support knowledge communities. In order for a physical or digital object to reach the broadest community of information workers, metadata must be made accessible through a number of metadata standards.9 As the number and complexity of metadata standards grow, supplying metadata for a single physical or digital object in each metadata standard becomes increasingly repetitious and time consuming. In order to minimize the need to recreate metadata for each standard, metadata should be maintained in one normalized, interoperable standard. The redesigned XMDB explores the ability to create bridges of access between knowledge communities by normalizing geospatial, bibliographic, and World Wide Web metadata standards.

Data Model.  Figure 6 below represents the enhanced data model based on stronger adherence to the MARC, FGDC, and Dublin Core metadata standards. The concept of the "entity base" from the prototype remains; however, resource types now include: documents, images, video resources, audio resources, internet resources, data, models, studies, GIS resources, events, organizations, and people.


Figure 6 Redesigned Data Model Diagram

A preliminary list of metadata for each of the aforementioned resource types was drafted based on a review of separate metadata standards for each resource type. The list of metadata was then compared against the prototype data model, and a crosswalk between the data model and MARC/FGDC/Dublin Core is under development. An in-depth analysis of supported descriptive relationships is also being conducted.

The new data model will embed the MARC/FGDC/Dublin Core crosswalk as field name aliases. Most commercial metadata management systems that were evaluated adhered to a single metadata standard. Those that did support multiple standards did so in a manner that required a resource to be entered into templates for each standard. For example, a document would be entered once in MARC, and again in Dublin Core. A GIS resource would be entered once in FGDC and once again in Dublin Core. This approach also presented a problem with creating relationships between resources, a fundamental function for using the cross-media database to create complex information patterns. The data model above investigates the approach of entering the data once and extracting the information through client-based Z39.50 specifications: a library catalog would receive MARC coded data, an FGDC clearinghouse would receive FGDC coded data, and Dublin Core coded data would be supplied to enhance general resource discovery on the World Wide Web. The deployment diagram below (Figure 7) is a conceptual design for deploying the software application components as client browsers, web pages, and services operating at specific system nodes.
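The enter-once, extract-many idea behind a crosswalk stored as field name aliases can be sketched as follows. The internal field names and the export function are hypothetical, and the four mappings shown are only a small illustrative subset using commonly cited equivalents (Dublin Core element names, MARC 21 tag and subfield codes, and FGDC short element names); they are not the project's crosswalk, which the text describes as still under development. Treating the creator as a plain field is also a simplification, since the data model handles authorship through relationships.

```python
# Illustrative subset of a crosswalk: each internal field name carries an
# alias for every target standard. The internal names are assumptions.
CROSSWALK = {
    "title":       {"dublin_core": "title",       "marc": "245$a", "fgdc": "title"},
    "creator":     {"dublin_core": "creator",     "marc": "100$a", "fgdc": "origin"},
    "date":        {"dublin_core": "date",        "marc": "260$c", "fgdc": "pubdate"},
    "description": {"dublin_core": "description", "marc": "520$a", "fgdc": "abstract"},
}


def export(record, standard):
    """Re-key a record, entered once, under the aliases of one standard.

    Fields without an alias in the requested standard are omitted rather
    than guessed at.
    """
    out = {}
    for field, value in record.items():
        alias = CROSSWALK.get(field, {}).get(standard)
        if alias is not None:
            out[alias] = value
    return out
```

With this arrangement a single stored record can be handed to export three times, once per standard, instead of being typed into three separate templates. A real implementation would also need to handle repeatable fields, differing granularity between standards, and the encoding rules of each format, which this sketch deliberately ignores.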


Figure 7 Deployment Diagram

5.0 Summary

This paper has focused on the phased research and implementation of a cross-media database to foster information sharing between disparate knowledge communities.

As an ongoing research project the Redlands Institute Cross-Media Database has evolved through successive phases.

In the first phase, the Redlands Institute researched and developed a prototype to aid in refining the project's concept, utility, and desired functionality. The second phase included analysis of the prototype to develop an enhanced data model, an architecture, and a technology specification. The next phase included contracting software engineers to review and dissect functionality, provide timeframe and cost analysis, and match commercial software products that offer off-the-shelf functionality. ArcIMS has been selected for publishing interactive map services to the web, and Oracle for metadata and digital resource storage and enhanced text queries.

The current phase of the project is to apply selected software technologies by designing and building a scalable framework for system implementation of the most critical functions first. The new XMDB is scheduled for release by August 2001. At that point the system will be advertised and registered with gateway/clearinghouse sites. The data model will be evaluated for providing seamless interoperability among targeted knowledge communities. User testing and analysis will focus on the effectiveness of spatial querying, distributed digital geographic information, and enhanced capabilities for searching, browsing, and downloading information.

6.0 References

1.  Day, Michael. (nd). Slide Presentation:  Metadata and electronic information. Retrieved June 22, 2001, from the World Wide Web: http://www.ukoln.ac.uk/metadata/presentations/circe/tsld004.htm

2 & 3. McEathron, Scott R. (nd). Slide Presentation: Metadata: just data? Retrieved June 22, 2001, from the World Wide Web: http://www.lib.uconn.edu/ris/Liaison/Metadata/tsld004.htm

4. Lewis, W. (nd). Great Moments in the History of Technical Services.  Retrieved June 22, 2001 from the World Wide Web: http://sun3.lib.uci.edu/~techsvcs/tssb/tshist.htm

5. Library of Congress. (1999, July 14). Jefferson's Legacy: A Brief History of the Library of Congress: The Library of Congress, 1800-1992. Retrieved June 22, 2001 from the World Wide Web: http://www.loc.gov/loc/legacy/loc.html

6. Library of Congress. (2001, May 31). What is a MARC Record, and Why is it Important? Retrieved June 22, 2001 from the World Wide Web: http://lcweb.loc.gov/marc/umb/um01to06.html

7. Dublin Core Metadata Initiative. (nd). An Overview of the Dublin Core Metadata Initiative. Retrieved June 22, 2001 from the World Wide Web: http://dublincore.org/about/overview/

8. Van House, Nancy A. (nd). Abstract: Digital Libraries and the Practices of Trust:  Networked Environmental Information. Retrieved June 22, 2001 from the World Wide Web: http://www.sims.berkeley.edu/~vanhouse/seabstract.html

9. Pierre, Margaret and LaPlant, William P. (1998, October 15). Issues in Crosswalking Content Metadata Standards. Retrieved June 22, 2001 from the World Wide Web: http://www.niso.org/crsswalk.html

Additional References

Redlands Institute. (2000, June 12). Cross-Media Database Conceptual Design and Present Status. University of Redlands, Redlands, California.

Redlands Institute. (1999, July 18). Salton Sea Database Program Strategic Plan: An Operational Strategy based upon Stakeholders' Needs. University of Redlands, Redlands, California.

Weitzman, Eric. (2000, August 1). SSDP Cross-Media Database Workplan. LandTime, Inc. Sunnyvale, California.


Jacqueline Lesch
Digital Library Administrator
Redlands Institute
1200 E. Colton Ave.
Redlands, CA 92373
Phone: (909) 335-5268
Fax: (909) 307-6952
lesch@institute.redlands.edu