Joel Schlagel

Integrating Aerial-Videography with ArcInfo for Satellite Image Interpretation and Accuracy Assessment in Vermont

Aerial videographic imagery, with frame locations identified using global positioning system (GPS) receivers, has been demonstrated to be an efficient and cost-effective method for gathering ground-truth data for satellite image interpretation and post-classification accuracy assessment (Graham, 1993; Slaymaker et al., 1995).

Both the usefulness and the effectiveness of air-video image interpretation can be enhanced by integrating video imagery with other spatial databases in ArcInfo. A series of programs, shell scripts, and AMLs was written to facilitate this process. The user interface allows control of the video tape recorders and real-time display of the flight position on a workstation monitor. Queries of coverages and multi-spectral imagery may be performed. The operator's interpretation of the video image and data from the spatial queries are recorded for further analysis.

The air-video interpretation system is being used to develop refined land-cover maps for Vermont in support of vegetation mapping for the Gap Analysis Program of the National Biological Service. The system is also being used to investigate the effect of training sample density on the accuracy of image classification. A number of other applications are under development.


Introduction

Monitoring and management of wildlife populations over large geographic areas requires land-cover maps that are both spatially detailed and categorically rich. Of equal importance to map detail is a clear understanding of the accuracy and limitations of such maps. There is an extensive current literature addressing error in land-cover maps derived from remotely sensed imagery (e.g., Congalton, 1993; Dicks, 1990; Green, 1993; Lucas, 1994; Stehman, 1992), as well as the potential effects of map-based error on habitat models that rely on such maps (e.g. Stoms, 1992).

Development of accurate land-cover maps from remotely sensed imagery requires ground-truth data to inform both the classification and validation processes. This is true for supervised and unsupervised classification procedures, as well as for 'heads-up' or on-screen digitizing methods. Yet relatively little has been written within the image processing literature on this initial stage of the map generation process (Warren, 1990). Outside of the image processing literature, sample size and distribution requirements, and their effects on the success of discriminant analysis, are well researched (e.g., Williams, 1990). Within the field of image processing, the procedures and recommendations for gathering ground-truth data described by Joyce (1978) are largely operative today, i.e., the primary goal of the field effort is the development of normally distributed training signatures that reflect a uniform and homogeneous vegetation or land-cover condition. Assuming 8 to 10 major land-cover types per scene, and accounting for varying terrain conditions, Joyce (1978) states that this can be accomplished with 90 to 150 training sample sites per Landsat scene.

The existence of mixed pixels, especially in largely human-influenced landscapes such as that of New England (Schriever and Congalton, 1995), the strong effect of terrain on signatures, and the desire to extract many similar-looking cover types argue for much larger ground-truth data-sets. Similarly, the adoption of image processing methods that acknowledge and address the variability of signatures within a given land-cover class (e.g., Foody, 1992; Foody and Cox, 1994; Slaymaker et al., 1995; VanDeusen, 1995) requires careful analysis of training and validation samples. Even several hundred training sites become 'diluted' as target classes are subdivided by slope, physiographic region, etc., leaving the interpreter with relatively few training or validation sites per class.

Acquiring and georeferencing a ground-truth data-set large enough to implement and validate a detailed image classification scheme can be prohibitively expensive when it depends on field visits. As a result, ground-truth data are often assembled in an ad hoc manner that may include visits to randomly selected field sites, roadside field visits, and the use of historic data. There are problems with each of these data sources. Randomly chosen field sites can be remotely located, limiting the number that can be visited, whereas sites clustered along roads or other easily accessible locations may not be representative of all field conditions. Existing or historic databases may not reflect current conditions, may be of unknown positional accuracy, or may have been generated at an inappropriate spatial scale to be meaningful in the image interpretation or post-classification validation process. Inappropriately collected training site data may themselves become a significant source of bias or error in an image processing project.

The use of air-video in training site selection allows a different approach from traditional field visits. Rather than seeking a minimally sufficient number of points, the air-video approach described by Graham (1993) seeks to develop extremely large training site data-sets. The collection and analysis of a large number of training sites can significantly reduce the number of misclassified pixels under a variety of image processing techniques. Graham (1993) and Slaymaker et al. (1995) have reported very high accuracies in their land-cover mapping efforts, despite working in completely different terrain and using very different image processing methods. The common element was a training site data-set an order of magnitude larger than commonly used. Graham reported interpreting more than 11,000 sample points in Arizona, or about 1,000 points per TM scene, identifying 142 distinct vegetation classes. Slaymaker interpreted 18,000 points at 2,300 sample locations for a single TM scene in Massachusetts, identifying 42 vegetation classes, with an overall accuracy of 89% for 11 vegetation types at Anderson Level 3.

Building on the approaches of Graham and Slaymaker, the Vermont Cooperative Fish and Wildlife Unit will implement air-video interpretation for land-cover mapping in Vermont as part of the National Biological Service Gap Analysis Project (Scott, 1993). To facilitate and enhance the usefulness of air-video, an interpretation station that integrates video, image, and GIS data has been developed.


Implementation

Airborne Equipment:

The hardware and software for air-video data collection are described in Graham (1993) and Slaymaker et al. (1995). The airborne equipment used in Vermont consists of two Sony Hi-8 video cassette recorders, two Hi-8 video cameras, a small LCD video monitor, a Trimble GPS receiver, a laptop computer, a Horita SMPTE timecode generator/reader, and Geo-Link software for logging GPS data to the PC. During flight, the video cameras are mounted either on a door mount or, if available, at a belly port. One camera is set to wide angle, the other to zoom. GPS position and time are recorded on the laptop computer. GPS time is converted from a serial data-stream to SMPTE timecode by the Horita timecode encoder and recorded on one audio channel of each video tape. The timecode on a video frame is associated with the GPS point having a matching time, yielding a location for each video frame. Following the flight, the GPS data are differentially corrected and converted to an ArcInfo point coverage. Each point is coded with its latitude and longitude, and the time at which the GPS point was recorded.
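
As an illustration of the frame-to-position matching step, the sketch below pairs a frame's timecode with the GPS fix logged at the same second. This is a minimal sketch, not the software used in flight; the GpsFix structure, the one-fix-per-second logging interval, and the sample coordinates are assumptions for illustration.

    /* Sketch: associate a video frame's SMPTE timecode with the GPS fix
     * whose logged time matches. Assumes fixes are logged once per second,
     * so the frame count can be ignored. */
    #include <stdio.h>

    typedef struct { int hh, mm, ss; double lat, lon; } GpsFix;

    static int seconds_of_day(int hh, int mm, int ss)
    {
        return hh * 3600 + mm * 60 + ss;
    }

    /* Return the fix whose time matches the frame's timecode, or NULL. */
    static const GpsFix *match_frame(const GpsFix *fixes, int n,
                                     int hh, int mm, int ss)
    {
        int t = seconds_of_day(hh, mm, ss);
        for (int i = 0; i < n; i++)
            if (seconds_of_day(fixes[i].hh, fixes[i].mm, fixes[i].ss) == t)
                return &fixes[i];
        return NULL;
    }

    int main(void)
    {
        GpsFix log[] = { {14, 2, 7, 44.4759, -73.2121},    /* illustrative */
                         {14, 2, 8, 44.4761, -73.2118} };
        const GpsFix *p = match_frame(log, 2, 14, 2, 8);   /* frame at 14:02:08 */
        if (p) printf("frame location: %.4f %.4f\n", p->lat, p->lon);
        return 0;
    }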


Figures 1a, 1b, and 1c. Three Wide-Angle Air-Video Snapshots, with Timecode.
Interpretation Station:

Though video imagery is a rich source of information (Figure 1), it is much more useful when combined with other spatial data-sets. The data logging, reporting, and error checking process can be improved and simplified by incorporating the interpretation of video imagery into a GIS environment. To accomplish this, software was developed that combines the data management capabilities of ArcInfo 7.0.2 with the multi-media tools available on Silicon Graphics workstations into an integrated air-video image interpretation station. The air-video tools allow the interpreter to bring as much information as possible to bear on the selection of training sites and the interpretation of video imagery. In addition to typical ArcInfo operations, timecode (extracted from the video tape), VCR functions, and video data are controlled or processed by the video interpretation software.

Timecode:

A Horita timecode generator/reader extracts SMPTE timecode from the audio track of the videotape. The timecode is displayed in a window on a video monitor and converted to a serial data-stream. This data-stream is fed to a serial port of the workstation, where a C program, 'smpteget', extracts the four timecode bytes (hour, minute, second, and frame) as an ASCII record. 'smpteget' is accessed from ArcInfo using the TASK function, allowing timecode to be repeatedly 'sampled' and stored in an AML variable. An AML queries the point coverage containing all GPS points for a flight-line, then identifies and displays the point whose timecode matches the timecode read from the serial port. Repeated calls to 'sample' the timecode from the serial port allow one to follow the playback of the videotape in real time in a cartographic display (Figure 2).
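
The following is a minimal sketch of what a 'smpteget'-style reader might look like, assuming a POSIX serial interface; the device name, baud rate, and raw 4-byte record framing are assumptions, since the actual Horita output format is not documented here.

    /* Sketch of a 'smpteget'-style timecode reader: open a serial port,
     * read one 4-byte hour/minute/second/frame record, and emit it as a
     * single ASCII record (suitable for capture into an AML variable). */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <termios.h>

    int main(void)
    {
        const char *port = "/dev/ttyd2";      /* assumed serial device */
        int fd = open(port, O_RDONLY | O_NOCTTY);
        if (fd < 0) { perror(port); return 1; }

        struct termios tio;
        tcgetattr(fd, &tio);
        cfsetispeed(&tio, B9600);             /* assumed baud rate */
        tio.c_cflag = (tio.c_cflag & ~CSIZE) | CS8 | CLOCAL | CREAD;
        tio.c_lflag &= ~(ICANON | ECHO);      /* raw, non-canonical reads */
        tio.c_cc[VMIN] = 4;                   /* block until 4 bytes arrive */
        tio.c_cc[VTIME] = 0;
        tcsetattr(fd, TCSANOW, &tio);

        unsigned char tc[4];                  /* hour, minute, second, frame */
        if (read(fd, tc, 4) != 4) { close(fd); return 1; }
        close(fd);

        printf("%02d:%02d:%02d:%02d\n", tc[0], tc[1], tc[2], tc[3]);
        return 0;
    }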


Figure 2. Air-Video Frame Locations over TM bands 3,2,1. Northern Vermont.
Video Control:

Computer control of the video cassette recorders is accomplished by attaching a Sony Vbox to the LANC port of each VCR. A Vbox allows computer control of Sony video equipment using the Sony VISCA protocol, and LANC-compatible hardware allows several video devices to be daisy-chained together. One Vbox is connected via serial port to the workstation, and the second Vbox is 'slaved' to the first. A VISCA protocol implementation developed by Silicon Graphics was modified to support single function calls from an ArcInfo form menu or AML. For example, button menus were developed for standard VCR controls such as play, rewind, and pause, and an AML can instruct the VCRs to play forward 10 seconds, rewind 20 seconds, and repeat the loop several times.
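
The sketch below suggests how such a single-function control call might be wrapped for invocation from a form menu or AML. The VISCA packet framing shown (an address header, a command body, and a 0xFF terminator) follows the protocol's general form, but the command bytes and device name are placeholders, not verified Vbox deck-control codes.

    /* Sketch: send one VISCA-style command packet to a Vbox on a serial
     * port, so an AML can call e.g. "vcrctl play". Command bodies below
     * are illustrative placeholders only. */
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>

    static int send_packet(int fd, const unsigned char *body, int len)
    {
        unsigned char pkt[16];
        pkt[0] = 0x81;                 /* header: controller 0 -> device 1 */
        memcpy(pkt + 1, body, len);
        pkt[len + 1] = 0xFF;           /* VISCA packet terminator */
        return write(fd, pkt, len + 2) == len + 2 ? 0 : -1;
    }

    int main(int argc, char **argv)
    {
        /* Placeholder command bodies; real Vbox deck-control codes differ. */
        static const unsigned char play[] = { 0x01, 0x02 };
        static const unsigned char stop[] = { 0x01, 0x03 };

        if (argc < 2) { fprintf(stderr, "usage: vcrctl play|stop\n"); return 1; }
        int fd = open("/dev/ttyd1", O_WRONLY | O_NOCTTY);  /* assumed port */
        if (fd < 0) { perror("open"); return 1; }

        int rc = -1;
        if (strcmp(argv[1], "play") == 0)      rc = send_packet(fd, play, 2);
        else if (strcmp(argv[1], "stop") == 0) rc = send_packet(fd, stop, 2);
        close(fd);
        return rc == 0 ? 0 : 1;
    }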

Video Signal:

The final data-stream handled by the air-video interpretation station is the video signal itself. Video is fed to the workstation through a Silicon Graphics (SGI) Galileo video board and displayed at full resolution in a window on the workstation monitor. Tools provided with the SGI IRIX operating system allow the video imagery to be manipulated to enhance contrast, and individual frames to be captured as video is displayed. AMLs and shell scripts, accessed from ArcInfo, are used to capture video frames to disk with a file name unique to the frame's geographic location. Other utilities can then retrieve stored video images by geographic query. AML control is also used to output a captured video frame, or the full workstation screen, to a small-format (approximately 3 x 5 inch) dye-sublimation video printer.
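
A minimal sketch of the location-based file naming described above might look like the following; the exact name format and the .rgb extension are assumptions for illustration.

    /* Sketch: build a capture filename from a frame's coordinates so that
     * stored images can later be retrieved by geographic query. */
    #include <stdio.h>

    int main(void)
    {
        double lat = 44.4759, lon = -73.2121;   /* from the matched GPS point */
        char name[64];

        /* Encode hemisphere and fixed-width coordinates (1/10000 degree),
         * e.g. N444759W0732121.rgb -- an assumed naming convention. */
        snprintf(name, sizeof name, "%c%06.0f%c%07.0f.rgb",
                 lat >= 0 ? 'N' : 'S', (lat >= 0 ? lat : -lat) * 10000,
                 lon >= 0 ? 'E' : 'W', (lon >= 0 ? lon : -lon) * 10000);

        printf("%s\n", name);   /* name handed to the frame-capture step */
        return 0;
    }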


Figure 3. Air-Video Tools Menu Bar.

Cartographic Data:

Video control and video input functions are combined with standard ArcInfo procedures in a graphical user interface (Figures 3 and 4). Tools provide convenient access to map display and query functions, and to image display and classification operations. The display and query of ancillary data-sets provide the operator with additional information useful in training site selection and evaluation. For instance, the distribution of sample points with respect to soil type can be displayed, or National Wetland Inventory coverages can be displayed along with air-video data to alert an operator to the possibility that an identifiable wetland may be present within a video image. After selecting an interpretation point, summary statistics and histograms can be displayed for pixels with similar cover types, or for adjacent pixels in windows of varying sizes. Pixel signatures can be written as ASCII files, or an edit grid can be used to record interpretations. Finally, hardcopy maps can be generated from a screen display for field checking.
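
As a sketch of the window statistics mentioned above, the fragment below computes the mean and standard deviation of single-band pixel values in a square window centered on an interpretation point; the band layout and sample values are illustrative assumptions.

    /* Sketch: mean and population standard deviation of pixel values in an
     * n-by-n window centered on an interpretation point, clipped at the
     * image edges. */
    #include <stdio.h>
    #include <math.h>

    static void window_stats(const unsigned char *band, int nrows, int ncols,
                             int row, int col, int half,
                             double *mean, double *sd)
    {
        double sum = 0.0, sumsq = 0.0;
        int n = 0;
        for (int r = row - half; r <= row + half; r++)
            for (int c = col - half; c <= col + half; c++)
                if (r >= 0 && r < nrows && c >= 0 && c < ncols) {
                    double v = band[r * ncols + c];
                    sum += v;
                    sumsq += v * v;
                    n++;
                }
        *mean = sum / n;
        *sd = sqrt(sumsq / n - (*mean) * (*mean));
    }

    int main(void)
    {
        unsigned char tm3[5 * 5] = {            /* illustrative band values */
            60, 62, 61, 64, 63,
            59, 65, 66, 62, 61,
            60, 64, 70, 63, 60,
            58, 61, 62, 60, 59,
            57, 60, 61, 59, 58 };
        double mean, sd;
        window_stats(tm3, 5, 5, 2, 2, 1, &mean, &sd);   /* 3x3 window */
        printf("3x3 window: mean %.1f, std. dev. %.2f\n", mean, sd);
        return 0;
    }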


Figure 4. The Air-Video Interpretation Screen.

Conclusion

The air-video interpretation system described here will facilitate rapid development of training site databases for land-cover mapping in Vermont. The data-sets developed will also be used for a rigorous investigation, through repeated sampling, of the effects of training site selection on classification success. The use of air-video imagery, along with the tools described above, has generated a great deal of interest among other researchers in Vermont. During the summer of 1995, flight lines flown for land-cover mapping (Figure 5) are expected to be supplemented by flights for a number of other projects, including agricultural land-use monitoring, riparian zone monitoring, updates of land-cover maps for the Green Mountain National Forest, and a study of seasonal water retention by wetlands.


Figure 5. Vermont and New Hampshire Air-Video Transects.

References

Congalton, R.G., and K. Green, 1993. A Practical Look at the Source of Confusion in Error Matrix Generation. Photogrammetric Engineering & Remote Sensing. 59(5):641-644.

Curry, S., and K. Schuckman, 1993. Practical Considerations for the Use of Airborne GPS for Photogrammetry. Photogrammetric Engineering & Remote Sensing. 59(11):1611-1617.

Dicks, S.E., and T.H.C. Lo, 1990. Evaluation of Thematic Map Accuracy in a Land-Use and Land-Cover Mapping Program. Photogrammetric Engineering & Remote Sensing. 56(9):1247-1252.

Foody, G.M., 1992. A Fuzzy Sets Approach to the Representation of Vegetation Continua from Remotely Sensed Data: An Example from Lowland Heath. Photogrammetric Engineering & Remote Sensing. 58(2):221-225.

Foody, G.M., and D.P. Cox, 1994. Sub-Pixel Land Cover Composition Estimation Using a Linear Mixture Model and Fuzzy Membership Functions. International Journal of Remote Sensing. 15(3):619-631.

Graham, L.E., 1993. Airborne Video for Near Real-Time Resource Applications. Journal of Forestry. 91(August):28-32.

Green, E.J., W.E. Strawderman, and T.A. Airola, 1993. Assessing Classification Probabilities for Thematic Maps. Photogrammetric Engineering & Remote Sensing. 59(5):635-639.

Janssen, L.L.F., and F.J.M. van der Wel, 1994. Accuracy Assessment of Satellite Derived Land-Cover Data: A Review. Photogrammetric Engineering & Remote Sensing. 60(4):419-426.

Joyce, A.T., 1978. Procedures for Gathering Ground Truth Information for a Supervised Approach to a Computer-Implemented Land Cover Classification of Landsat-Acquired Multispectral Scanner Data. NASA Reference Publication 1015, NASA, Houston, Texas. 43 p.

McGwire, K.C., 1992. Analyst Variability in Labeling of Unsupervised Classifications. Photogrammetric Engineering & Remote Sensing. 58(12):1673-1677.

Moody, A., and C.E. Woodcock, 1994. Scale-Dependent Errors in the Estimation of Land-Cover Proportions: Implications for Global Land-Cover Datasets. Photogrammetric Engineering & Remote Sensing. 60(5):585-594.

Schriever, J.R., and R.G. Congalton, 1995. Evaluating Seasonal Variability as an Aid to Cover-Type Mapping from Landsat Thematic Mapper. Photogrammetric Engineering & Remote Sensing. 61(3):321-327.

Slaymaker, D.M., K.M.L. Jones, C.R. Griffin, and J.T. Finn, 1995. Mapping Deciduous Forests in New England Using Aerial Videography and Multi-Temporal Landsat TM Imagery. In review.

Stehman, S.V., 1992. Comparison of Systematic and Random Sampling for Estimating the Accuracy of Maps Generated from Remotely Sensed Data. Photogrammetric Engineering & Remote Sensing. 58(9):1343-1350.

Stoms, D.M., 1992. Effects of Habitat Map Generalization on Biodiversity Assessment. Photogrammetric Engineering & Remote Sensing. 58(11):1587-1591.

Stoms, D.M., F.W. Davis, and C.B. Cogan, 1992. Sensitivity of Wildlife Habitat Models to Uncertainties in GIS Data. Photogrammetric Engineering & Remote Sensing. 58(6):843-850.

Thomasson, D.A., C.W. Bennett, B.D. Jackson, and M.P. Mallander, 1994. Photogrammetric Engineering & Remote Sensing. 60(1):55-59.

VanDeusen, P.C., 1995. Modified Highest Confidence First Classification. Photogrammetric Engineering & Remote Sensing. 61(4):419-425.

Warren, S.D., M.O. Johnson, W.D. Goran, and V.E. Diersing, 1990. An Automated, Objective Procedure for Selecting Representative Field Sample Sites. Photogrammetric Engineering & Remote Sensing. 56(3):333-335.

Williams, B.K., K. Titus, and J.E. Hines, 1990. Stability and Bias of Classification Rates in Biological Applications of Discriminant Analysis. Journal of Wildlife Management. 54(2):331-341.

Joel Schlagel, Research Associate
School of Natural Resources
University of Vermont
Burlington, Vermont 05405

Telephone: (802) 656-3324
Fax: (802) 656-8683

e-mail: joel.schlagel@uvm.edu