
Abstract


Paper: Geographically Representative Sampling--The People
Track: Business, Banking, and Insurance
Author(s): Frederick Busche

When artificial intelligence algorithms are used to discover behavior patterns in customers, training and testing data sets are created using random selection procedures. These data sets are used to validate the accuracy of the predictive algorithms. However, the purchasing behavior of customers depends not only on the demographic and psychographic attributes in the database but also on where the customers are located. Failure to take location into consideration can introduce bias by making one or both data sets unrepresentative of the overall customer database. This problem has been apparent in gold mining exploration for decades. People cluster in much the same way as gold does: those of like backgrounds tend to co-locate, or "nugget," with each other. This paper discusses a potential means by which a measure of geographic representativeness can be tested.
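The abstract does not state which measure the paper uses, so the following is only an illustrative sketch of the general idea, not the author's method. It assumes each customer record carries a hypothetical "region" field, compares the regional shares of a subset against those of the full database, and shows a split stratified by region as one possible remedy. The function names and the gap statistic are assumptions made for illustration.

```python
import random
from collections import Counter


def region_shares(records):
    """Return the fraction of records that fall in each region."""
    counts = Counter(r["region"] for r in records)
    total = len(records)
    return {region: n / total for region, n in counts.items()}


def max_share_gap(full, subset):
    """Largest absolute difference in regional share between subset and full.

    A value near 0 suggests the subset mirrors the geographic mix of the
    full database; a large value suggests it does not. (Hypothetical
    measure, chosen here only for simplicity.)
    """
    full_shares = region_shares(full)
    sub_shares = region_shares(subset)
    return max(abs(full_shares[g] - sub_shares.get(g, 0.0)) for g in full_shares)


def stratified_split(records, test_fraction, seed=0):
    """Split records into (train, test), sampling within each region."""
    rng = random.Random(seed)
    by_region = {}
    for r in records:
        by_region.setdefault(r["region"], []).append(r)
    train, test = [], []
    for region in sorted(by_region):
        group = by_region[region][:]
        rng.shuffle(group)
        k = round(len(group) * test_fraction)
        test.extend(group[:k])
        train.extend(group[k:])
    return train, test


# Synthetic customers: the regions and counts are made up for the example.
customers = (
    [{"id": i, "region": "north"} for i in range(600)]
    + [{"id": 600 + i, "region": "south"} for i in range(300)]
    + [{"id": 900 + i, "region": "east"} for i in range(100)]
)

# Purely random split, as described in the abstract.
shuffled = customers[:]
random.Random(1).shuffle(shuffled)
random_test = shuffled[:200]

# Split that samples within each region.
strat_train, strat_test = stratified_split(customers, test_fraction=0.2)

print("random split gap:    ", max_share_gap(customers, random_test))
print("stratified split gap:", max_share_gap(customers, strat_test))
```

With a purely random split the gap varies from draw to draw, whereas sampling within each region keeps the test set's regional shares close to those of the full database by construction. Real customer locations would need a finer geographic unit than the three coarse regions assumed here.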

Frederick Busche
IBM
3116 Lake Highlands Dr.
Highland Village, TX 75077
USA

Phone: 972-318-1723
Fax: 972-318-1180
E-mail: fbusche@us.ibm.com