Abstract

Error Tracking and Propagation Identification

Track: Database Design, Automation, and Management
Author(s): Lorri A. Peltz-Lewis

Error within a spatial database can arise from the source materials or methods, from the processing methods, and from use. Source errors can come from the original measurements, the interpretation of those measurements, and the creation of the digital databases. Processing errors can come from changes in software, migrations to new software, conversions to new formats, projecting or transforming data, the internal storage capability of computer hardware, and even simple processes such as rounding. Use error is affected by both source and processing errors, but inappropriate use of digital databases is itself, and will continue to be, an issue as software is made available to inexperienced users. Determining "fitness-of-use" requires knowledge of the application the database will be used for, as well as full disclosure of the potential problems with the database.

Although the use of metadata has been officially adopted in the United States through Executive Order 12906 (United States Government, 1994), very little accuracy information is actually provided. Metadata providers are encouraged to supply verbal descriptions of the digital databases they create; these descriptions help the end user determine fitness-of-use but do not indicate exactly where the accuracy is better or worse. Many global statistics are provided as an interpretation of the database (Unwin, 1996) and are also too general to be of much use in determining fitness-of-use. Very few databases provide a full disclosure of what the potential errors are in the final database, where those errors exist, and to what extent they may exist.

"Error is inescapable, it should be recognized as a fundamental dimension of data" (Chrisman, 1991, pg. 165) and is an element in every database. Comments such as "truth in labeling" (Prisley, 1994, pg. 33), fitness-of-use, lack of information, and a "full description of quality" (Aspinall and Pearson, 1995, pg. 71) abound. Many calls have been made for the producer of the database to provide clear and concise information on errors in the database, but few recommendations are made on how to provide this information. One recommendation is to provide a summary of error by object (Aspinall and Pearson, 1995; Brassell et al., 1995), and another is to provide an error matrix (Aspinall and Pearson, 1995; Congalton and Green, 1998; Goodchild, 1994; Veregin, 1995; Veregin and Hargitai, 1995). While each of these recommendations has merits, neither is sufficient alone to describe all of the potential error in some databases.

The question is how to provide a full disclosure of the errors in a digital database. What index should be used? How can these indexes be used to determine error propagation? How would they impact the decision-making process? How should confidence levels be visualized before and after the analysis?

This thesis reviews the two selected databases and their source and processing errors, determines the extent of the error, addresses fitness-of-use, evaluates how the error affects a change analysis, and determines how to visualize or communicate the level of confidence in the final analysis. The example used here will demonstrate a method for error tracking and propagation identification in a land use change analysis.

Lorri A. Peltz-Lewis
U.S. Bureau of Reclamation
2800 Cottage Way MP-450
Sacramento, CA 95825 USA
Phone: 916-978-5271
Fax: 916-978-5290
E-mail: lpeltzlewis@mp.usbr.gov