Dan Williams

Automated Capture of Metadata: Simple Procedures and Tools for Editing Coverages

The quality of a given geospatial data set is only as good as its metadata. Metadata, data about data, is getting more attention from data managers, generators, and users as dependence on and understanding of digital geospatial data grows. The establishment of the Federal Geographic Data Committee and its publication of the Content Standards for Digital Geospatial Metadata are important events for the oversight and standardization of metadata use. What comes next is the hard part: implementing these standards in a productive and meaningful fashion.

Having a standard to guide the production of geospatial metadata is important. But the real challenge is making the generation of that metadata actually happen. The process must be transparent to those generating geospatial data; otherwise, metadata requirements will be viewed as administrative overhead: nice-to-have extras we'll get to once the "real" work is done. ArcInfo has built-in tools for collecting metadata during coverage creation and editing. This paper introduces a process, implemented with AMLs, that uses several data capture tools in the Arcedit module to automatically capture metadata during an edit session. It also encourages some good-sense procedures for processing coverages. The AMLs function in concert with the Esri ArcTools so that data capture continues while EditTools are in use.


Metadata is getting a lot more attention these days. Now that the initial GIS gee-whiz has worn off and end users are starting to put the technology to practical use, they want to know its limitations. In GIS, the limitation is almost always the data. No matter how many bells and whistles the software tools have, GIS analysis is only as good as the data it works with. If you want to know how, when, or why the data was created, then you need metadata: data describing the data.

When looking at a map - let's assume it's a professional, cartographically superior map - there is metadata printed along the borders and throughout the legend: scale, publication date, projection information, and so on, all right there in the field of view. However, if you've ever watched a new GIS user sit down at a workstation and start to browse their hometown roads, issues of scale are not apparent. Invariably, they'll zoom right in on their own little street corner and announce, "I live here." All the while, you know that street data is from a 1:100,000-scale source, this guy is zoomed in to about 1:100, and you don't want to burst his bubble but .... And so you educate the new guy, as so many of us have been educated, and all of a sudden the new guy wants to know about metadata.

Of course, metadata itself has been around for some time. But one user's metadata may not look like another's, and one organization's metadata requirements are invariably different from the next. Unless a data set and its metadata were originally created with the intent of wide dissemination, the information contained in the metadata is not likely to be very useful beyond its intended use. For example, don't expect the roads layer currently in use in the tax department to be much use to the new school bus dispatch system. The tax department probably doesn't care very much about how, by whom, or when the roads layer was created, so they probably don't have much metadata for it either.

While metadata content requirements have been debated professionally for the past several years, it wasn't until recently that any official guidelines were available. The Federal Geographic Data Committee (FGDC) published its Content Standards for Digital Geospatial Metadata on June 8, 1994. Now there's a standard by which our metadata can be judged, and all of a sudden we can say something like, "this metadata meets the FGDC Metadata Standard." (Of course, that is not always true, but that's another issue entirely. There are other standards on the way, too.) While the metadata standards will continue to be debated and refined, at least there's a standard to be referenced, and we're a lot better off than we were with no reference.

Implementing metadata standards is the next step, and it's not going to be a simple feat. Remember back in the good old days of programming when software was developed by magic? A need would arise, a contract would be awarded, the one-paragraph requirement would be handed to the growling pack of programmers, and off they would go, shredding it to pieces as they dragged it out of sight into their computer den. A couple of months later a "program" would be released, shredded by the customer, and then fed back to the programmers for more work. This cycle continued until the money ran out or the customer couldn't argue with the fact that the "program" did technically meet the one paragraph of requirements. Hence, Software Engineering was born. I won't belabor the history lesson, but, in a word, software engineering is documentation: lots of carefully structured documentation aimed at clearly defining the requirements of a software product and tracing its production history. One important result should be that when it comes time to change or improve the product, it's a simple matter to analyze the code and figure out what to change. This has an interesting parallel to metadata. The whole reason for metadata's existence is so that we can look back at the history of a data set and make an assessment of its quality. With thorough and robust documentation (metadata), it should be an easy assessment.

Once the software industry woke up and realized that without documentation its software products couldn't be maintained, improved, or reused (not to mention they didn't work), the software executives wrote a memo, and henceforth all the programmers followed good software engineering practice and everything was fine. Right? Not. In fact, the biggest challenge in getting this stuff to work was getting the people who actually do the work to change their ways. The software industry is still struggling with this task. It takes a long time to change paradigms. But they're making progress, and one of the ways they're getting there is to automate as much of the process as possible. Modern programming environments come with a host of tools to automatically format and document code. In this way the programmer is forced by her tools to implement coding policies that she used to laugh at or maybe pay lip service to. It's working, and it could work for metadata too. The hardest part of producing metadata is getting the data producers to write it. Part of the solution is to automate its production during the normal course of data creation.

ArcInfo does this to a limited extent. The log file is a record of some of the processing that takes place on a coverage, and there are some tools in Arcedit which allow the user to produce a report of editing changes. But, except for the log file, the rest is optional. The USGS Water Resources Division and the EPA have produced a real gem of an AML in their DOCUMENT.AML tool. This is a suite of AMLs meant to implement the FGDC metadata standards as much as possible through a menu-driven interface. It is a very thorough documentation system and an excellent tool for walking a user through the documentation requirements of the FGDC standard. However, there is still a gap, as I see it, in the process: there is no record of changes made to data during an Arcedit session. As DOCUMENT.AML is implemented, it's up to the user to use a text editor to create a "narrative" file to document any changes made to the data. This paper presents two AMLs, EDIT.AML and DONE.AML, which automatically produce an INFO file of all the actions performed during an Arcedit session and any changes made to edit tolerances. The capture is transparent to the user and, thus, doesn't require a change in their normal work process. This file, combined with the Arc log file, results in a complete record of all processing performed on a data set during its lifetime. And since the file is an INFO file, it becomes part of the edited coverage, staying with it through copies and exports.
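
To make that last point concrete (the coverage names below are hypothetical): because the edit record is stored as an INFO file carrying the coverage's name, the routine Arc operations described above carry it along with the rest of the coverage's attribute files:

Arc: copy roads2 roads3
     (the coverage's INFO files, including the .ED edit record, go with it)
Arc: export cover roads3 roads3
     (everything is bundled into the interchange file roads3.e00)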

The idea behind these AMLs is simple: capture all the operations performed during an edit session so that if there is ever a question about the locational accuracy of a feature, a record will be available to trace its processing history. Using the &WATCH file feature and Arcedit's STATUS TOLERANCES and AUDITTRAIL commands, every action taken during an edit session can be captured, as well as the tolerances used for automatic adjustment features like SNAPPING and INTERSECT. Though it would be tedious to wade through the &WATCH file produced during an extended editing session, solid answers to specific questions about feature accuracy could be derived from the collected data. Finally, loading this text data into an INFO file named and stored like the other coverage INFO files automatically binds the data to the coverage. I combined this with a common-sense process for naming and handling coverages to produce the following Coverage Editing Process Flow Chart:

Coverage Editing Process Flow Chart

As you can see, there are two distinct phases to the editing process: what happens before editing, and what happens after. The AMLs are simply a means of automating the flow chart: EDIT.AML automates the first part, and DONE.AML automates the second. I've incorporated the option for the user to invoke the ArcTools suite of EditTools once the edit session is underway. This required some modifications to a few EditTools AMLs so the &WATCH file would remain active while EditTools is running. I've also copied the Esri method of naming coverages that was used in my ArcInfo training classes; I find it clear and concise. There's no requirement to use this method. If you find your own naming convention easier to understand, then you should use that.
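
Stripped of the housekeeping, the capture mechanism that the two AMLs wrap amounts to a handful of standard Arcedit commands and AML directives. The following is only a minimal sketch, with a hypothetical coverage name:

arcedit
/* record every command and coordinate entered during the session
&watch roads2.wat &commands &coordinates
edit roads2
/* starting tolerances go into the watch file
status tolerances
/* ... interactive editing, with or without EditTools ...
/* closing tolerances, plus a report of the edit operations performed
status tolerances
audittrail full
save
&watch &off
quit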

I've also mentioned something called a 'diary' file in the final step of the editing process. This is the equivalent of the narrative file which is part of the DOCUMENT.AML metadata implementation. It's probably the most useful piece of coverage metadata because it's the GIS processor's own description of the processes, tools, and utilities used to develop a spatial data set, the reasons behind the actions taken, and the objective those actions were meant to achieve. This can probably never be automated beyond providing a format and a list of pertinent topics. The only way to make sure this piece of metadata is produced is sound leadership in support of an active metadata collection policy in an organization that is truly committed to data quality.
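
The best automation can offer here is a starting point: a simple skeleton that is copied into each coverage's diary file and filled in by the editor. The topics below are only suggestions, not part of any standard, and the names are hypothetical:

COVERAGE:         roads2
EDITED BY:        (name)
DATE:             (date of the edit session)
SOURCE COVERAGE:  roads1
OBJECTIVE:        what you set out to change and why
TOOLS USED:       EditTools functions, AMLs, or Arc commands used
RESULTS/ISSUES:   what changed, what remains to be done, known problems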

The AMLs are shown below, but in order to work correctly with EditTools, the following modified EditTools AMLs must replace the existing versions: EDIT_TOOLS.AML, AEDRIVER.AML, CF_BATCHMATCH.AML. These AMLs, installation instructions, and the text of this paper are available via anonymous ftp from newt.semcor.com in the pub/amls directory.
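
A typical session, assuming edit.aml and done.aml are on your &amlpath (the coverage names here are hypothetical), looks something like this:

Arc: &r edit roads1 roads2
     (ROADS1 is copied to ROADS2, Arcedit starts, the watch file
      ROADS2.WAT opens, and the starting tolerances are listed)

     ...interactive editing, with or without EditTools...

Arcedit: &r done
     (tolerances and the audit trail are listed, edits are saved,
      Arcedit quits, and ROADS2.WAT is loaded into the INFO file ROADS2.ED)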

EDIT.AML

/* The following text is an Arc Macro Language utility.
/*
/* Name:      EDIT.AML
/* Companion: DONE.AML
/* Purpose:   EDIT.AML will initiate an Arcedit session following
/*            the recommended ENVEIS documentation procedures.  
/* Arguments: COVER - name of original coverage to be edited
/*                EDITCOV - duplicate coverage of COVER.
/* Variables:      SUFFIX - the text '.wat'    
/*             WATCH_FILE - watch file name created by concatenating
/*                          EDITCOV and SUFFIX.
/* Procedure:  EDIT.AML requests an edit coverage if one is not
/*             specified on the command line.  The edit coverage
/*             is then copied to a new coverage name, Arcedit is started,
/*             a watch file is initiated, the edit coverage is defined,
/*             and the tolerance status is listed.
/*
/* Author:  Dan Williams    January 1994

&args COVER EDITCOV

/* if user did not specify original cover on command line */
/* then select one now.                                   */
&if [null %COVER%] &then
  &s COVER = [getcover * -ALL 'Select an edit coverage:']
  &if [null %COVER%] &then &return &inform Coverage not specified

/* specify the name of the final edited coverage if not given on    */
/* the command line, or if that name already exists.  This forces   */
/* the user to create a new edited version.                         */
/*
&do &while [null %EDITCOV%] or [exists %EDITCOV% -cover]
  &if [exists %EDITCOV% -cover] &then &type *** %EDITCOV% already exists ***
  &s EDITCOV = [response 'Enter name for edited version of '%COVER%' ']
&end

copy %COVER% %EDITCOV%

/* NOTE: This size is set to the same proportions as an 11"x8.5" page  */
/*       It's just my personal preference and has no effect on the AML */
display 9999 size 800 618 position ur screen ur
arcedit

/* create a watch file with same name as cover and suffix .wat */
&s SUFFIX = .wat
&s WATCH_FILE = %EDITCOV%%SUFFIX%
&WATCH %WATCH_FILE% &COMMANDS &COORDINATES

mapex %EDITCOV%
edit %EDITCOV%

/* list start-up tolerances */
status tolerances

&type Arcedit session initiated.
&type Use the 'done' command instead of 'quit' when editing is complete.

&if [query 'Run Edit Tools' ] &then &do
  &r $ATHOME/drivers/edit_tools INIT
&end

DONE.AML

/*  The following text file is an Arc Macro Language utility
/*
/*  Name:    DONE.AML
/*  Companion: EDIT.AML
/*  Purpose:   This AML completes a standard Arcedit session.  It is intended
/*             to be executed from the Arcedit prompt after the EDIT.AML
/*             utility has already been executed.
/*  Variables: SAVE_COV  - if the user chooses to save changes to a different
/*                         coverage, this var will be assigned the name of
/*                         that coverage.
/*  Procedure: Four steps are accomplished to complete the edit session: 
/*             1) list tolerances; 2) list audittrail; 3) save changes;
/*             4) quit Arcedit.  The &do loop is necessary to handle the
/*             possibility that an existing coverage name is entered to save
/*             to.  If this is not trapped, the save command will fail
/*             stopping execution of the AML.
/*
/*  Revision:  Added an additional step to load the watch file into an INFO
/*  4/11/95    file.  This idea came from coding by Mark Hoel and the folks
/*             at USGS who wrote DOCUMENT.AML.  I like it because it
/*             automatically ties the edit session procedures to the coverage.
/*
/*  Author:  Dan Williams, SEMCOR,Inc.    April 1995

/* List tolerances */
status tolerances

/* List audit trail */
audittrail full

/* Trim path from EDIT_COV */
&s EDIT_COV = [ENTRYNAME [SHOW EDIT]]

/* Give SAVE_COV non-null value to execute &do loop at least once */
&s SAVE_COV = NOTNULL
/*  Execute until valid coverage name selected to save */
&do &until [null %SAVE_COV%] or ^ [exists %SAVE_COV% -cover]

  &s SAVE_COV [RESPONSE 'Enter name of coverage to save ([CR] for '%EDIT_COV%')']

  &if [exists %SAVE_COV% -cover] &then
    &type *** %SAVE_COV% already exists ***

&end    

&if [null %SAVE_COV%] &then
  &do
    /* save current edit coverage  */
    save
    /* set ED_FILE to EDIT_COV.ED */
    &s ED_FILE %EDIT_COV%.ED
  &end
&else
  /* save to a newly named coverage */
  &do
    &type ***WARNING***: Watch file has different name than coverage.

    /* confirm that user wants to save to new coverage */
    &if [query 'Save to '%SAVE_COV%] &then
      &do
        save %SAVE_COV%
        /* set ED_FILE to SAVE_COV.ED */
        &s ED_FILE %SAVE_COV%.ED
      &end
    &else
      &do
        /* user declined the new name; save the edit coverage instead */
        /* so that ED_FILE is always defined for the INFO load below  */
        save
        &s ED_FILE %EDIT_COV%.ED
      &end
  &end

/* capture name of watch file before closing */
&s WAT_FILE [LOCASE [SHOW &WATCH]]
&watch &off

quit

&type Arcedit session complete
&type Please stand by while leaving Arcedit and creating INFO file: %ED_FILE%

/* load the watch file into an INFO file so that it stays with the coverage */
&data ARC INFO
  ARC
  DEFINE %ED_FILE%
  TXT_LINE
  80
  80
  C

  GET %WAT_FILE% COPY ASCII
  Q STOP
&end

/* For now, leave the watch file.  May want to automatically delete it */

The result of all this is simply the &WATCH file which captured all your editing actions, and an exact copy of it in an INFO file with the coverage name and the suffix '.ED'. Using EDIT.AML forces the user to edit a copy of a coverage rather than the original. This results in a separate coverage for each editing session. Though it may take up a bit of disk space, the result is a clear progression of coverage editing changes which can be traced through the .ED file in each coverage's INFO directory. Another nice feature is that you have several versions of a coverage along its progression to final form. If you decide you must undo some changes, you have several versions in various states of change to choose from, so, hopefully, you won't have as much rework. I try to organize my editing sessions into efforts focused on a particular feature type or geographic area; that way, if I screw it up, I've only screwed up that one area and not the whole coverage. To minimize disk space use, I export all the coverage versions, tar them into one file, and compress the tar file.
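
For example (the coverage and file names here are hypothetical), each version can be exported and the interchange files bundled and compressed at the operating system level:

Arc: export cover roads1 roads1
Arc: export cover roads2 roads2
Arc: export cover roads3 roads3
Arc: &sys tar cvf roads_versions.tar roads1.e00 roads2.e00 roads3.e00
Arc: &sys compress roads_versions.tar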

Combining the use of these AMLs with the USGS's DOCUMENT.AML can produce a pretty decent collection of metadata on a coverage. The whole process is still voluntary, but with these tools available, producing metadata is a much simpler task. GIS managers will get a better reception when placing metadata requirements on their employees if they also provide the tools and procedures needed to meet those requirements. The result, hopefully, is better and more useful spatial data documentation.


Dan Williams, Principal Engineer
SEMCOR, Inc.
65 West Street Road
Suite C-100
Warminster, PA 18974
Telephone: (215) 674-0200
Fax: (215) 443-0474
E-mail: williams@semcor.com