Sudha Ram, Michael R. Kunzmann, Jongseo Kim and Jeff
Abbruzzi
Abstract
To improve environmental decision making, the rate and efficiency of data
collection must be increased. In addition, reducing the costs of managing
information and making it available to the public through a user-friendly
interface are important goals for federal land-managing agencies. To support
these goals, and those of the National Biological Information Infrastructure
(NBII) program, we have developed a digital "living" library that promotes
data distribution and contribution through a World Wide Web interface. By
providing the capability to share
data and resources, important ecological data can be updated and distributed
more efficiently. In addition, landscape-level determinations derived from the
digital library can be more readily applied by management in day-to-day
decision making processes and integrated with long-term land management policy
development. Our digital library system not only allows users to search for
and retrieve spatial data sets and other information, but is also able to grow
dynamically through users' contributions of data. To facilitate data
"harvesting" numerous tools and data protocols had to be developed to increase
the the overall utility of the library and to assist with the identification
of user needs and other data management issues such as security. The automated
collection of natural and cultural resource information and associated
metadata decreases maintenance requirements and delivers on the promise of a
digital "living library".
Introduction to Saguaro Digital Library
The
Saguaro Digital Library (SDL) is a comprehensive digital library system
providing a full range of services to facilitate our understanding of the
impacts of natural and human environmental hazards, to provide models of
environmental change that can access and utilize data, processing tools and
algorithms across the Internet, and to provide a wide range of users the
ability to obtain quantitative measures of this change.
The primary focus of the SDL is to facilitate the responsible stewardship
of our natural assets and good ecosystem management. The SDL directly
addresses the goals of the National Biological Information Infrastructure
(NBII) by developing the capability to share data and resources so that
biodiversity and ecosystem findings can be more readily applied in management
and policy. The ultimate goal of the SDL is to allow components of the digital
library to evolve independently and yet be able to call on one another
efficiently and conveniently. Thus, the digital library will support
heterogeneous and federated collections of digital content, including data,
metadata, models, tools, and algorithms. The digital library will specifically
provide decision support tools to improve monitoring of ecosystem status,
better predict and mitigate change, and optimize sustainable productivity. The
following figure describes the overall architecture of the Saguaro Digital
Library.
The Saguaro Digital Library is a joint project being developed by a
consortium of University research groups as well as Federal and State agencies
in conjunction with industrial partners. State Agencies partnering in the
Saguaro Digital Library project include the Arizona State Lands Department,
the Arizona State Cartographers Office, the Arizona State Geological Survey,
and the Arizona Geographic Information Council. Federal Agencies participating
in the project include the United States Geological Survey (USGS) Sonoran
Desert Field Station, the US Army, the Rocky Mountain Research Station, Los
Alamos National Laboratory, the Nature Conservancy, and the US National Park
Service. Industrial partners involved in the project include Online Computer
Library Center Inc. (OCLC), Raytheon STX, and Simons International
Corporation. K-12 partners include Lawrence Intermediate School, Fort Lowell
Elementary School, and the Vail School District from Arizona. The development
of the library is led by the Department of Management Information Systems in
collaboration with other departments at the University of Arizona (UA)
including the UA Library, Hydrology and Water Resources, Arid Land Studies,
Geography, Electrical and Computer Engineering, the Arizona Regional Image
Archive, and Renewable Natural Resources.
Resource Harvesting
For the most part, digital libraries are mausoleums: static information sets
that require periodic updates. Such updates are often temporary and costly.
Instead, a digital library should support dynamic resources that can be
updated by users or an information provider at
any time. By employing appropriate data management techniques to assure
quality, security, searchability and a means to generate metadata information,
it is possible to create dynamic digital libraries. However, in order to
collect the information, it is essential to provide digital forms for the
appropriate information set and establish update rules. Furthermore, tools
allowing new resources to be added and old/obsolete resources to be removed
periodically must support the evolution of the library.
Providing mechanisms for resource harvesting and supporting dynamic
evolution of digital libraries would have enormous practical benefits to the
general public at large. Potential benefits include the following:
(1) reduced cost of obtaining updated information
(2) direct communication with the end user to learn what information is
used and why
(3) creation of a cooperative atmosphere to encourage information
exchange
(4) improved cost effectiveness of research efforts by expediting and
increasing information flow
(5) mechanisms for direct public participation in the information gathering
process
In addition, it creates an environment where users and the public have a
significant role and responsibility to add information they deem necessary to
affect the outcome maps or decision surfaces that may be created through
information exchange.
Architecture
The Harvest System is composed of a Harvester, Metadata, XML Parser,
Metadata Storage and Templates (Overall
Architecture of Harvest System). Harvester provides users with a
user interface through which users can submit the URL for GIS data and its
metadata. According to the user's input, Harvester determines which
template file to display. If Harvester receives an XML file, it creates
Metadata after extracting the metadata from that file using the XML
Parser and then saves the Metadata information in Metadata
Storage. The following is a description of each component of the Harvest
System.
Harvester provides users with a user interface for data submission as
well as retrieval of a URL for GIS data or its metadata in an XML file
format. Harvester consists of MetadataHarvest.java extending ADRGServlet.java, MetadataUpload.java, MetadataAccess.java,
MetadataFormReader.java and HarvesterUser.java extending ADRGUserImpl.java. Users can select from among the
following three options:
1) No digital metadata available. Help me to create it.
2) Use existing metadata as a template for creating new metadata.
3) Submit metadata as an FGDC XML file.
According to the user's selection, Harvester retrieves metadata from
Metadata Storage or works with XML Parser to handle what the
user has submitted. Also, Harvester provides a user interface for using
files in Templates.
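The three-way dispatch described above can be sketched as follows. This is a simplified illustration rather than the actual MetadataHarvest.java code, and in particular the mapping of each option to a specific WebMacro template file is our assumption, not something stated in the system documentation.

```java
// Illustrative sketch of the Harvester's option dispatch. The class and
// method names here are hypothetical; only the three user-facing options
// and the template file names come from the system description.
public class HarvesterDispatch {

    /** The three submission options offered on the Harvester's main page. */
    public enum Option { CREATE_NEW, USE_TEMPLATE, UPLOAD_XML }

    /** Returns the template file the Harvester might render for a selection.
     *  The option-to-template mapping shown here is an assumption. */
    public static String templateFor(Option option) {
        switch (option) {
            case CREATE_NEW:   return "metadata.wm";      // blank metadata form
            case USE_TEMPLATE: return "metadatamain.wm";  // list of existing records
            case UPLOAD_XML:   return "xmlview.wm";       // review of parsed upload
            default:           return "error.wm";
        }
    }

    public static void main(String[] args) {
        for (Option o : Option.values()) {
            System.out.println(o + " -> " + templateFor(o));
        }
    }
}
```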
Metadata has been created following the Content Standard for Digital
Geospatial Metadata approved by the FGDC, and consists of MetadataComponent.java, Metadata.java, MetadataLists.java,
FGDCCompliance.java, Cntinfo.java, Citeinfo.java, Timeinfo.java, Mdattim.java,
Rngdates.java, Cntaddr.java, Keywords.java, Metextns.java, and DomainException.java. Metadata is used to create a
metadata object for Digital Geospatial Metadata from data extracted from a
parsed XML file which users submit, data retrieved from Metadata
Storage, or data entered into the Metadata Form page provided by
Harvester.
An XML parser for Java called XML4J developed by IBM was extended to create
XML Parser. Cooperating with Harvester, XML Parser parses
an XML file that users submit to create a metadata object and store it in
Metadata Storage. XML Parser consists of HarvestXMLParser.java and DOMParserSaveEncoding.java.
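The extraction step can be sketched as below. The real XML Parser extends IBM's XML4J; this illustration substitutes the JDK's built-in DOM parser as a stand-in, and the two FGDC short tags shown (title, abstract) are just example fields, not the full set the system extracts.

```java
// Sketch of extracting FGDC metadata fields from a submitted XML document,
// using the JDK DOM parser in place of the XML4J-based HarvestXMLParser.java.
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class FgdcXmlExtract {

    /** Parses an XML string into a DOM document. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Returns the text of the first element with the given FGDC tag name,
     *  or null if the tag is absent. */
    public static String firstTag(Document doc, String tag) {
        var nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : null;
    }

    public static void main(String[] args) {
        String xml = "<metadata><idinfo><citation><citeinfo>"
                   + "<title>Saguaro distribution coverage</title>"
                   + "</citeinfo></citation>"
                   + "<descript><abstract>Test abstract.</abstract></descript>"
                   + "</idinfo></metadata>";
        Document doc = parse(xml);
        System.out.println(firstTag(doc, "title"));    // Saguaro distribution coverage
        System.out.println(firstTag(doc, "abstract")); // Test abstract.
    }
}
```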
Considering the design of Servlet, we used WebMacro, an HTML template
engine and back end servlet development framework. Templates includes
main.wm, metadata.wm, metadatamain.wm, xmlview.wm,
htmlview.wm, compliant.wm, notcompliant.wm, error.wm, metaregister.wm,
login.wm, loginfailure.wm, dbchoice.wm, changepw.wm, changepwfailure.wm,
changepwsuccess.wm, register.wm, registerfailure.wm, registersuccess.wm,
and thankyou.wm. Harvester uses the
above template files to display the user interface.
Harvesting System Description
The Arizona NBII Metadata Harvesting System is a precursor to the Saguaro
Digital Library, a confederation of natural resources data which dynamically
grows through user contributions. Although many individuals and organizations
have GIS data sets which they are willing to make available to the public,
metadata is often not available for these data sets, or is not present in a
form which can be easily processed digitally. The harvesting system addresses
two key challenges: storing GIS metadata in a form which allows for maximum
flexibility in performing searches and presenting this data, and providing a
mechanism for users to easily create and contribute metadata for their
coverages.
The system stores metadata, and although it is capable of also storing the
GIS data files themselves, it is designed primarily to hold only references to
such files somewhere on the Internet. Using our system, a user can place
database files in an accessible location on the Internet, and then use our
system to submit or create metadata for these data files. Once this is
completed, users can use our system to search for and locate these data files.
This makes it possible for organizations to publish their data without
redevelopment of a web interface for searching for and downloading this
data.
The metadata creation process begins in one
of three ways. The user may choose to create the metadata from scratch, in
which case they are presented with a
blank metadata form. Alternatively, the user may choose to create metadata
using an existing metadata record as a template. In this case, the system
presents a
list of existing metadata records from which the user may choose. The
system presents the same metadata form, but the fields are filled with the
values of the metadata record used as a template. This is very helpful when
creating a series of metadata records which are mostly similar, since only
that data which differs must be changed before submitting the new metadata
record. Finally, the user may have existing metadata for the data set, in
which case producing metadata using the form is tedious and unnecessary. In
such cases, the user may upload the metadata as a valid XML file corresponding
to the FGDC Content Standard XML Document Type Definition. Many organizations
use the USGS Metadata Parser (mp) by Peter Schweitzer to prepare
FGDC-compliant metadata. This tool has an XML output option. Metadata in a
variety of forms can thus be converted to XML format and uploaded to the site.
The system parses the XML file and then fills the metadata form with the
appropriate data. Any desired changes can be made, although a compliant
metadata record will not require them - the metadata form can then be
submitted to place the record in the database.
Even if the user chooses to create entirely new metadata, the system
provides assistance to make the process as fast and easy as possible. Users
must register at the site in order to use the system. This enables us to track
the way the system is used, but it also enables users to optionally
submit personal information such as their address, phone number, email and
other details. When a user creates new metadata, this information is
automatically included in the metadata form, eliminating the need to enter the
data each time a submission is made. When creating metadata using another
record as a template, the user has the option of using the original contact
information, or replacing it with his or her own. Fields which have a limited
set of values, or a list of recommended values specified in the FGDC content
standard, include those values in pull-down menus for easy metadata creation.
The value lists for other pull-down menus, such as the menu for the
"Originator" field, are dynamically pulled from the database. If the desired
value exists in a previously entered metadata record, the user may choose it
from the menu rather than typing the value. Thus, users can very quickly
create metadata with a minimal amount of typing.
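A minimal sketch of how such a pull-down value list might be assembled, assuming the candidate values for a field like "Originator" have already been fetched from Metadata Storage (a plain list stands in for the database query here):

```java
// Sketch of building a dynamic pull-down list from previously entered
// metadata records. In the real system the values come from Metadata
// Storage; here an in-memory list is a stand-in for that query.
import java.util.List;
import java.util.stream.Collectors;

public class PulldownValues {

    /** Returns the distinct, alphabetized values to offer in the menu. */
    public static List<String> optionsFrom(List<String> existingValues) {
        return existingValues.stream()
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> stored = List.of("USGS", "Arizona State Lands Department",
                                      "USGS", " US Army");
        System.out.println(optionsFrom(stored));
    }
}
```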
When the user has finished creating the metadata and submits the metadata
form, the system checks it for FGDC compliance and reports the specific errors
it finds, such as a missing mandatory field or a numeric value outside the
allowable range. Understanding and
correcting these errors is made far easier through an online help system which
provides information about the FGDC content standard requirements for each
metadata field. Clicking on the field name in the metadata form causes a
new window to appear which contains abbreviated FGDC content standard
documentation. Once the user submits a fully-compliant metadata record, he or
she may review the record as an XML file. Currently, this view supports only
raw XML delivered as an XML or text
data type. In the future, we would like to enhance the display options for
the data. Internet Explorer 5 provides a
tree structure interface for examining raw XML files which by itself is a
useful viewing tool for this data. The XML file is useful for more than
review. The user may save the file to their local system, and then process it
through mp to create FGDC-compliant metadata in a variety of formats, for
application to other purposes completely unrelated to Arizona NBII data search
and services. Even users who choose not to actually submit their data may find
the automated metadata creation tools useful.
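The compliance check described above can be sketched as follows. The field names and the latitude range check are illustrative examples only; the actual FGDCCompliance.java covers the full mandatory field set of the content standard.

```java
// Sketch of FGDC compliance checking: report each missing mandatory field
// and each numeric value outside its allowable range. The mandatory list
// and the bounding-coordinate check shown here are examples, not the
// complete rule set implemented by FGDCCompliance.java.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ComplianceCheck {

    // A few of the mandatory Identification Information fields.
    static final List<String> MANDATORY =
            List.of("title", "originator", "abstract", "purpose");

    /** Returns a human-readable error for each problem found. */
    public static List<String> validate(Map<String, String> record) {
        List<String> errors = new ArrayList<>();
        for (String field : MANDATORY) {
            String value = record.get(field);
            if (value == null || value.isBlank()) {
                errors.add("Mandatory field missing: " + field);
            }
        }
        // Example range check: bounding latitudes must fall in [-90, 90].
        for (String field : List.of("northbc", "southbc")) {
            String value = record.get(field);
            if (value == null) continue;
            try {
                double lat = Double.parseDouble(value);
                if (lat < -90 || lat > 90) {
                    errors.add("Value out of range for " + field + ": " + value);
                }
            } catch (NumberFormatException e) {
                errors.add("Not a number: " + field);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String> rec = Map.of("title", "Saguaro coverage",
                                         "northbc", "123.0");
        validate(rec).forEach(System.out::println);
    }
}
```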
The set of metadata fields used by the harvesting system does not include
the entire FGDC metadata standard. We chose to include all fields which are
identified as mandatory, and any other fields which are useful for metadata
search. Because the system was primarily designed to capture data which would
enable users to search for, locate and download data sets, those optional
fields which would rarely be used for search criteria were left out of the
system. The result is a subset of important metadata elements which allow the
user to create a minimal fully-compliant metadata record quickly and easily.
Although the richness of the FGDC standard is lost, this decision makes the
system more practical, efficient and user-friendly. In addition, it eases the
significant database design burden placed upon developers by the FGDC content
standard, which is highly recursive and irregular. Its complexity makes it
very difficult to represent in a relational database, although storage of
metadata in a database system, as opposed to collections of files or
file-based systems, provides a more scalable, efficient, flexible and secure
solution.
Future Direction and Conclusion
This Harvest System project is an attempt to develop a Harvesting Agent
(HA) for the Saguaro Digital Library in support of resource harvesting.
The Harvest System provides users with a friendly user interface through which
they can submit geographic data and its metadata. The Harvest System also
offers three options for users who have geographic data but not its metadata,
and helps less-advanced users to submit metadata.
The Harvest System can be extended as follows:
- Extension of the Metadata Information
- Integration of this system into the larger context of the Saguaro
Digital Library
- Submission of indented text files as metadata
The current system has been developed based on Identification Information
and Metadata Reference Information. In the future, it will include Data
Quality Information, Spatial Data Organization Information, Spatial Reference
Information, Entity and Attribute Information in addition to Distribution
Information to allow users to describe their metadata in more detail. The
result is that it can then provide an advanced search option with which people
can access the exact metadata they need.
The effort to develop this Harvest System has largely been oriented
toward information harvesting. However, since the ultimate goal is to also
share geographic data and to increase knowledge, users should be able to access
and evaluate our data sets. To achieve this goal, the system will also be
search-enabled in the future. People will be able to submit geospatial
information they have gathered and share it with other users.
Even though the file format recommended by the FGDC is XML, oftentimes people
have an indented text file as their metadata. Also being considered is the
ability to extend the system to receive indented text files, parse them to an
XML file using mp, validate the XML file and then follow the same process when
an XML file is received.
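A rough sketch of such a preprocessing step is shown below, assuming two-space indentation and "Element: value" lines. This is only an illustration of the idea: the real mp tool additionally maps the long FGDC element names (e.g. Identification_Information) to their short SGML tags, which is omitted here.

```java
// Sketch of converting FGDC-style indented-text metadata into nested XML,
// as a stand-in for the planned "parse with mp, then validate" pipeline.
// Assumes two spaces per indent level; tag names are simply lowercased.
import java.util.ArrayDeque;
import java.util.Deque;

public class IndentedToXml {

    /** Converts indented "Element: value" text into nested XML elements. */
    public static String convert(String text) {
        StringBuilder xml = new StringBuilder();
        Deque<String> open = new ArrayDeque<>();
        for (String line : text.split("\n")) {
            if (line.isBlank()) continue;
            int indent = (line.length() - line.stripLeading().length()) / 2;
            // Close elements deeper than the current indent level.
            while (open.size() > indent) {
                xml.append("</").append(open.pop()).append(">");
            }
            String[] parts = line.strip().split(":", 2);
            String tag = parts[0].toLowerCase();
            String value = parts.length > 1 ? parts[1].strip() : "";
            if (value.isEmpty()) {
                xml.append("<").append(tag).append(">");
                open.push(tag);          // container element stays open
            } else {
                xml.append("<").append(tag).append(">").append(value)
                   .append("</").append(tag).append(">");
            }
        }
        while (!open.isEmpty()) xml.append("</").append(open.pop()).append(">");
        return xml.toString();
    }

    public static void main(String[] args) {
        String text = "Metadata:\n  Title: Saguaro coverage\n"
                    + "  Descript:\n    Abstract: A test.";
        System.out.println(convert(text));
    }
}
```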
The overarching hope for this system is that it will not
only help people to submit and share more geospatial information, but that it
will also help people better understand the interdependence between the
economy and the environment, actively conserve biodiversity, and protect
natural ecosystems to preserve the quality of human life.
Acknowledgements
We would like to express our sincere thanks to the following organizations
and individuals for their expertise and willingness to contribute funding,
information, or time: The Eller College of Business and Public Administration,
The College of Agriculture, The University of Arizona Advanced Resources
Technology group (ART), The Arizona State Cartographers Office, The Arizona
State Lands Department (ALRIS), the USGS National Biological Information
Infrastructure program (NBII), the USGS California Science Center, The
University of Arizona Library, The Arizona Remote Sensing Center, The
University of Arizona College of Arts and Sciences, and the many University of
Arizona graduate students that provide many hours of code checking and data
entry to make all of this happen.
Author Information:
Sudha Ram is a professor in the Management Information System Department
which is an integral part of the Eller College of Business and Public
Administration. The Eller College of Business and Public Administration is one
of the 17 colleges at The University of Arizona. Correspondence may be
addressed to: Dr. Sudha Ram, Department of Management Information Systems,
McClelland Hall 430, The University of Arizona, Tucson, Arizona, 85721. Dr. Ram
may also be contacted by telephone at (520) 621-4113 or by email at Ram@bpa.arizona.edu.
Michael R. Kunzmann is an Ecologist at the USGS Sonoran Desert Field
Station located within the School of Renewable Natural Resources. The School
of Renewable Natural Resources is in the College of Agriculture and is
centrally located on The University of Arizona campus. Correspondence may be
addressed: Michael R. Kunzmann, USGS Sonoran Desert Field Station, The
University of Arizona, 125 Biological Sciences East, Tucson, Arizona, 85721.
Mr. Kunzmann may also be reached by telephone at (520) 621-7282 or by email at mrsk@npscpsu.srnr.arizona.edu.
Jongseo Kim is a graduate student in the Management Information Systems
department at the University of Arizona. Correspondence should be addressed:
Mr. Jongseo Kim, Department of Management Information Systems, McClelland Hall
430, The University of Arizona, Tucson Arizona, 85721. Mr. Kim may also be
contacted by telephone at (520) 621-2328 or by email jskim@bpa.arizona.edu.
Jeff Abbruzzi recently received a master's degree from the Management
Information Systems department at the University of Arizona. Mr. Abbruzzi is
in the process of relocating to Phoenix, Arizona to pursue a new career
opportunity. In the interim correspondence may be addressed: Mr. Jeff
Abbruzzi, Department of Management Information Systems, McClelland Hall 430,
The University of Arizona, Tucson Arizona, 85721.