Where Are Your Customers?

Raster Based Modeling for Customer Prospecting


Shaun K. McMullin

Shaun@mcmullin.cjb.net

http://mcmullin.cjb.net

Abstract: This paper implements a systematic approach for applying consumer data and spatial marketing techniques to a medium-sized retailer for trade area growth and customer mining. Point-of-purchase surveys of consumers are linked with transaction records to analyze the distribution of customer origins for shopping trips. Disaggregate trade areas are created through customer spotting and displayed in a raster (grid) based geographic information system (GIS). Densities of sales for the sampled customers provide a foundation for understanding the spatial extent of the customer base. The analysis is based on an analog raster-modeling procedure and takes into account 1) drive times, 2) customer segments and 3) household incomes. Map calculations are performed on the raster data to arrive at a composite suitability map of neighborhoods. A direct marketing tool within a GIS framework is used to query addresses of potential customers for targeted mailings. Addresses are selected based on the suitability model and mailing labels are then printed. Because mailings are targeted by the suitability model, increased response rates would be expected.

Key Words: retail location analysis, geodemographics, GIS, geographic information systems, database marketing, analog approach, trade areas, market area analysis.


Introduction

The marketing and retail geography techniques developed over the past few decades were originally applied without the help of geographic information technologies. In fact, the majority of these tools were designed and implemented before the advent of the personal computer. Demographic and population data were stored on tapes, and Fortran programs were written to solve specific complex modeling procedures. For visualization, paper maps were used and data were hand drawn on top of them. The classic "push pin" maps identifying retail locations or neighborhoods to target are clearly in the past. With the advent of the PC and the Pentium processor, geographic information systems (GIS) and desktop mapping have enabled firms such as the Gap, Target, Blockbuster Video, Starbucks Coffee, and Sears Roebuck to utilize retail and marketing geography techniques that were previously unattainable. The efficiencies that computers brought to the industry allowed for the development of software and the mass production of data, bringing prices within reach of large, medium and even very small companies.

This paper presents an emerging application of GIS for customer prospecting. It uses a raster-based model to identify addresses for direct mailings based on existing customer information. The project takes a case study approach for a medium-sized retailer.

Background

The University Bookstore is a retailer of textbooks, general books, computers and electronics, student supplies, and other merchandise, operating store outlets in Seattle, Tacoma, Bothell, and Bellevue, Washington. The Bellevue Bookstore does not carry textbooks and this branch caters predominantly to patrons of the Eastside of Lake Washington, across the water from Seattle. Figure 1 is a regional view of Bellevue, the Eastside, and the Bellevue Bookstore.

Figure 1: Regional View of Bellevue Bookstore

Survey design

The first step in building a raster based customer prospecting model was to secure data on customer transactions, place of residence, work, and behaviors. Sales people conducted a survey by asking each customer if he or she would complete a short questionnaire after their purchase. The sales people informed the customers that this information would remain confidential and that their names would, in return, be entered into a gift drawing. Ninety-three percent of the respondents provided their residential address. Six hundred surveys were collected over a four-week period.

The survey had four primary elements, and each question fit into one of them: 1) single and multi-purpose shopping behaviors, 2) trip origin and residential address of the consumer, 3) recency-frequency-monetary value of the customer, and 4) demographic characteristics of the customer. After receiving the completed survey, the sales person linked the sales receipt with the customer survey. Thus, it was possible to analyze survey responses together with sales amount and items purchased. Figure 2 displays a copy of the survey.

 

Figure 2: Consumer survey

Technical requirements

Software used in the project included Esri’s ArcView with the Spatial Analyst, Network Analyst and Business Analyst extensions, CACI Coder/Plus for geocoding and lifestyle clustering, and Microsoft Excel for spreadsheet analysis. In addition to software tools, the project also used private business and proprietary data sets. Typically, these data sets were processed and used by more than one software application. Figure 3 displays the interaction between the GIS engine and other software used in the project.

Figure 3: A Direct Marketing GIS

Data elements used in the model were 1) drive time suitability, 2) customer segments and 3) median household income. Layers of data were created from information on actual customers as provided by the surveys. For example, drive times were computed for surveyed customers and broken into the 10th, 20th, ..., 90th percentiles. Drive time grids were then created based on these percentiles. Thus, the area closest to the store was weighted most heavily, reflecting the high distance sensitivity of customers. Demographic components were also modeled based on the survey of customers. The percentages of customers falling into the two genders and into each age group were computed. The percentages were then multiplied by gender and age forecasts for each block group. The distribution of median household income per block group provided the third data element. These three data elements were then summed using map calculation to provide suitability rankings of areas. Addresses within the high suitability areas were queried using the GIS and mailing labels were produced. The business geographer can then target consumers in direct marketing or other advertising.

Geocoding and spotting of customer addresses

To convert residential and trip-origin addresses to latitude and longitude coordinates, the data file was entered into a geocoding engine. By themselves, addresses are not inherently geographic. Geocoding, traditionally termed address matching, takes an address (or another identifier with no x,y or latitude/longitude coordinate) and matches it against a reference file that relates building and street addresses to coordinates. The procedure assigns a geographic code to each matched point based on its location, so that each customer address can be displayed in ArcView as a point theme. Geocoding requires an address reference theme, which usually consists of a street file containing building addresses, street names, and their corresponding x and y coordinates.

The survey file was entered into CACI Coder/Plus, which processed the trip origin and residential addresses of the survey respondents. The output file contained new fields holding the latitude and longitude coordinates of each successfully geocoded customer. A small share (20%) of customers’ addresses were not geocoded because the software was unable to match them with latitude and longitude coordinates. This is typical for a project of this sort; a hit rate of 80% is common when geocoding survey addresses. Once the customers were geocoded, their addresses could be displayed as a point theme in ArcView GIS, as discussed in more detail in later sections.
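The address-matching step can be illustrated with a minimal sketch. It assumes a hypothetical one-segment street reference file and uses linear interpolation along the segment's address range, which is one common way geocoders place an address; the internal method of CACI Coder/Plus is not described in this paper, and every name and coordinate below is invented for illustration.

```python
from typing import Optional, Tuple

# Hypothetical street reference file: each segment has a street name, an
# address range, and (lon, lat) coordinates for its two endpoints.
REFERENCE = [
    {"street": "MAIN ST", "low": 100, "high": 198,
     "start": (-122.2000, 47.6100), "end": (-122.1980, 47.6100)},
]

def geocode(number: int, street: str) -> Optional[Tuple[float, float]]:
    """Return (lon, lat) interpolated along a matching segment, or None."""
    key = street.strip().upper()
    for seg in REFERENCE:
        if seg["street"] == key and seg["low"] <= number <= seg["high"]:
            span = seg["high"] - seg["low"]
            t = (number - seg["low"]) / span if span else 0.0
            (x0, y0), (x1, y1) = seg["start"], seg["end"]
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return None  # no match: the address stays ungeocoded

# Hypothetical survey addresses; two of the four cannot be matched.
addresses = [(100, "Main St"), (149, "main st"), (500, "Main St"), (10, "Oak Ave")]
results = [geocode(n, s) for n, s in addresses]
hit_rate = sum(r is not None for r in results) / len(results)
```

Unmatched addresses return `None`, so the hit rate is simply the share of non-empty results.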

The first step in analyzing the trade area was to display all sampled customers. This was accomplished through a customer spotting routine. The process took the geocoded customers and applied the latitude and longitude coordinates to display them as points. Figure 4 shows the distribution of sampled customers. The map excludes customers falling outside the Seattle/Bellevue MSA.

Figure 4: Customer Spotting of Residential Addresses

The next step was to delineate a trade area boundary around the spotted customers. The boundary was drawn to contain eighty percent of the customers, which eliminated customers who traveled farther than typical distances. Figure 5 shows the results of this step.
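As a rough sketch of the 80 percent rule, the customers nearest the store can be selected before a boundary is drawn around them. The paper does not state how proximity was measured for this step, so straight-line distance in projected (planar) coordinates is an assumption here, and the points are hypothetical.

```python
import math

def primary_customers(store, customers, share=0.80):
    """Return the `share` of customers nearest the store and the cutoff distance."""
    ranked = sorted(customers, key=lambda p: math.dist(store, p))
    keep = round(share * len(ranked))
    kept = ranked[:keep]
    cutoff = math.dist(store, kept[-1]) if kept else 0.0
    return kept, cutoff

# Hypothetical planar coordinates; the last customer is a distant outlier.
store = (0.0, 0.0)
customers = [(1, 0), (0, 2), (3, 0), (0, 4), (30, 40)]
kept, cutoff = primary_customers(store, customers)
```

A boundary drawn around `kept` would then exclude the outlier at (30, 40).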

Figure 5: Trade Area Boundary

The suitability model

A suitability ranking is a relative weighting of a certain factor for each geographic area of a defined aggregation. In this paper, the area of aggregation is the block group. Block groups are compared with one another on an interval scale for three suitability factors: 1) drive times, 2) customer segments, and 3) household incomes. Each factor contributes equally to the suitability ranking of block groups, from most desirable to least desirable, so each block group receives a ranking from one through ten. The highest ranked block groups should provide the highest response rates to direct mailings or other advertising tactics.

The Spatial Analyst module creates grids of continuous interval data. Each block group is ranked on the above three criteria on a scale of 1-10. The GIS is then used to perform a "map calculation", summing each of the factors and then averaging this sum to arrive at a final suitability map. This map displays each block group in the trade area and its relative suitability for responding to advertising messages. We will explain each of the components that go into the creation of this model. We begin with the drive time analysis.

Drive times

The customer spotting routine explained earlier provided a distribution of residential addresses from which network drive times were computed. Each customer address was entered into the vector analysis extension of ArcView GIS. A shortest path analysis was run on each point using Network Analyst, and a cost field was created in the customer file. In this cost field, the cost in minutes of driving from each residence to the store was entered for each customer. The software computed drive times assuming no variability in traffic congestion. The times were not compiled for any particular time of day, but rather as an across-the-board estimate of cost, holding all else equal. Costs were then exported to SPSS and frequencies were computed. Table 1 summarizes these statistics. All values in Table 1 are in minutes (except n).

Table 1: Summary of Customer Drive Times

Statistic      Percentile   Value
n                           433
Median                      7.71
Range                       31.78
Percentiles    10           2.45
               20           3.80
               30           5.08
               40           6.03
               50           7.71
               60           8.76
               70           9.81
               80           11.36
               90           15.46

Table 1 shows a median cost of 7.71 minutes for the 433 customers whose residential addresses were successfully geocoded. The results were broken into 10 equal intervals of n to facilitate transformation to a scale of 1-10. We used the GIS to create ten polygon coverages, each representing one of the above percentiles. Thus, the smallest polygon represented a drive time of 2.45 minutes, the next largest 3.80 minutes, and so on, until the largest polygon represented the highest value of the range, 31.78 minutes.
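The transformation from drive time to a 1-10 scale can be sketched as a lookup against the percentile break points in Table 1. The direction of the scale (10 for the band closest to the store) is an assumption, chosen so that higher values mean higher suitability; the function name is hypothetical.

```python
from bisect import bisect_left

# 10th to 90th percentile drive times in minutes, from Table 1.
BREAKS = [2.45, 3.80, 5.08, 6.03, 7.71, 8.76, 9.81, 11.36, 15.46]

def drive_time_score(minutes: float) -> int:
    """Map a drive time to 1-10, with 10 for the band closest to the store."""
    # band is 0 for times up to 2.45 minutes and 9 for times over 15.46
    # minutes; a time equal to a break point falls in the band that the
    # break point bounds on the outside.
    band = bisect_left(BREAKS, minutes)
    return 10 - band
```

For example, `drive_time_score(7.71)` falls in the fifth band and returns 6, while any drive time beyond the 90th percentile returns 1.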

Figure 6 displays a representation of this drive time analysis routine, with those customers living within 8 minutes (rounding the median) of the store represented as red points and those living farther than 8 minutes shown as black points. Thus, red points are below the median and black points are above.

Figure 6: Median-Break Point in Customer Drive Times

Figure 7 shows the polygon boundaries for each of the ten drive times as defined by the percentiles displayed in Table 1. The polygon farthest from the store was estimated at 32 minutes, the next closest at 15 minutes, and the closest polygon line represented a 2.45-minute drive time from the store. Each of these polygon boundaries is considered an isoline, with the value of each line given by the corresponding drive time. The area between isolines takes the attribute of the outermost line bounding it. From these estimates, a continuous gradient surface was created in the raster domain. Subsequent processing for the neighborhood suitability model was carried out with Esri’s Spatial Analyst.

Figure 7: Drive Time Gradients

Customer segments

Customer surveys provided a means of segmenting the population based on gender and age, using the two questions that asked for the gender and age of the respondent. Counts for each age group were summarized separately for females and males, and each count was divided by the total number of customer responses to give a proportion for each customer segment. There were six age categories and two gender categories, and the proportions sum to 100% of customer survey respondents. Table 2 displays the customer segments and their respective proportions.

Table 2: Summary of Customer Segments

Age group    Females (count)   Males (count)   Totals   Females (proportion)   Males (proportion)
<18          4                 4               8        0.01                   0.01
19-25        10                6               16       0.02                   0.01
26-35        35                12              47       0.07                   0.02
36-50        161               45              206      0.31                   0.09
51-65        129               37              166      0.25                   0.07
>65          57                16              73       0.11                   0.03
No answer    3                 0               3        0.01                   -
Totals       399               120             519      0.77                   0.23

The female and male proportions together sum to 1.00.

Geodemographic forecasts for the year 2001, purchased from Claritas, Inc., were used at the block group level. The forecasts were categorized into finer age groups than the survey (for example, 0-5, 6-10, and 11-18 years), so forecast categories were summed to match the survey categories listed in Table 2. We then applied each of the Table 2 proportions to the corresponding age-gender forecast for each block group. Thus, if a block group was forecast to have 100 females between the ages of 51 and 65 in 2001, this count was multiplied by the proportion of 0.25 identified in Table 2. The resulting product, 25, is a relative measure of the attractiveness of this block group, based on the resident population of the age-gender group and its propensity to purchase at the Bellevue Bookstore.
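The weighting of forecast counts by survey proportions can be written out as a short sketch. The proportions come from Table 2; the function name and the dictionary layout of a block group forecast are assumptions made for illustration.

```python
# Proportions of surveyed customers by (gender, age group), from Table 2.
PROPORTIONS = {
    ("F", "<18"): 0.01, ("M", "<18"): 0.01,
    ("F", "19-25"): 0.02, ("M", "19-25"): 0.01,
    ("F", "26-35"): 0.07, ("M", "26-35"): 0.02,
    ("F", "36-50"): 0.31, ("M", "36-50"): 0.09,
    ("F", "51-65"): 0.25, ("M", "51-65"): 0.07,
    ("F", ">65"): 0.11, ("M", ">65"): 0.03,
}

def segment_score(forecast):
    """Sum forecast persons per segment, weighted by the survey proportions."""
    return sum(count * PROPORTIONS[seg] for seg, count in forecast.items())

# The worked example from the text: 100 females aged 51-65 gives 25.
example = segment_score({("F", "51-65"): 100})
```

Summing over all twelve age-gender segments of a block group gives its combined female and male score.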

The described steps were carried out for each of the age and gender groups listed in Table 2. The results were then expressed as a percent of total persons using the total persons field in the data set. Thus, each block group had a number of persons falling in the store’s customer segments as a percentage of the total persons in that area. In theory, this would increase responses by increasing the chances of mailing to a person meeting the store’s customer criteria. The female and male customer segment forecasts were combined, converted to a raster grid, and reclassified to a 1-10 interval scale. Figure 8 displays the result.
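A minimal sketch of the share-of-total-persons step and the reclassification to a 1-10 scale follows. The paper does not say which classification method was used, so equal intervals are an assumption, and the block group values are hypothetical.

```python
def reclassify(values, classes=10):
    """Rescale values to integer classes 1..classes using equal intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1 for _ in values]
    width = (hi - lo) / classes
    # The maximum value would land in class `classes + 1`, so cap it.
    return [min(classes, int((v - lo) / width) + 1) for v in values]

# Hypothetical block groups: (weighted segment count, total persons).
block_groups = [(25.0, 500), (71.0, 1000), (12.0, 800), (90.0, 900)]
shares = [weighted / total for weighted, total in block_groups]
ranks = reclassify(shares)
```

A quantile-based reclassification would be an equally plausible reading of the text and would spread block groups more evenly across the ten classes.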

Figure 8: Customer Segments

Income suitability

We can generally infer that higher incomes lead to higher spending, since more economic resources equate to higher levels of discretionary spending. Figure 9 displays median incomes per block group after transformation to raster and a 1-10 interval scale.

 

Figure 9: Median Household Incomes

Composite suitability of customer analogs

The final map is a composite of the three maps, produced with the "map calculator" function in ArcView’s Spatial Analyst. The routine overlays these data and sums them to form an image displaying the geographic distribution of those totals. Thus, if a given block group fell within the 2.5 minute drive time, had a high population in the age-gender groups that frequent the Bellevue Bookstore, and contained higher than median incomes, then that area would register relatively high on the final composite suitability map. The result of summing these factors and converting back to the 1-10 interval scale is Figure 10. The map also shows competitors and highways so that the results can be interpreted with reasonable judgment. It would make sense to further rank the block groups according to the level of competition near them. Thus, block groups with fewer nearby competitors would be given higher priority than equally ranked block groups with more competitors. Field surveys would be helpful here, in order to rank competitors based on some proxy, such as estimated square feet of floor space. Furthermore, these elements could also be included in the suitability model in future research efforts.
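The map calculation can be sketched on small grids of 1-10 scores. Averaging the cell-by-cell sum and rounding is used here as one simple way of converting back to the 1-10 scale; the exact conversion used in the project is not stated, and the 2 x 2 grids are hypothetical.

```python
def composite(drive, segments, income):
    """Sum three 1-10 grids cell by cell, average, and round back to 1-10."""
    out = []
    for d_row, s_row, i_row in zip(drive, segments, income):
        out.append([round((d + s + i) / 3)
                    for d, s, i in zip(d_row, s_row, i_row)])
    return out

# Hypothetical 2 x 2 grids, one per factor, each already on a 1-10 scale.
drive = [[10, 6], [3, 1]]
segments = [[8, 7], [5, 2]]
income = [[9, 4], [6, 1]]
suitability = composite(drive, segments, income)
```

Because each factor enters the sum with the same weight, the sketch reflects the equal contribution of the three factors described earlier.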

Figure 10: Block Groups with Highest Concentration of Potential Customers

Converting high suitability to direct mailings

Areas with predicted high response to advertising, as indicated in Figure 10, are used to query addresses of households living within those block groups. A file of all King County homeowners was geocoded through CACI Coder/Plus to create a coverage of those households. The spatial relationship between location of residence and block group of high suitability was used to query all households falling within those block groups. The query and selection operation was processed using the ArcView interface and the selection within boundary function. A coverage was then created to verify this relationship and is shown as Figure 11. Block groups with high suitability are identified in a different shade than those with lower suitability, as are household addresses falling within the identified block groups.
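The selection-within-boundary query amounts to a point-in-polygon test against the high suitability block groups. The sketch below uses the standard ray casting test rather than ArcView's own routine; the block group polygons, household points, and the rank threshold of 8 are all hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count edge crossings of a ray running right from point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def select_households(households, block_groups, min_rank=8):
    """Map each household to the high suitability block group containing it."""
    selected = {}
    for hid, pt in households.items():
        for bg_id, bg in block_groups.items():
            if bg["rank"] >= min_rank and point_in_polygon(pt, bg["polygon"]):
                selected[hid] = bg_id
                break
    return selected

# Hypothetical block groups and geocoded households in planar coordinates.
block_groups = {
    "BG-A": {"rank": 9, "polygon": [(0, 0), (4, 0), (4, 4), (0, 4)]},
    "BG-B": {"rank": 3, "polygon": [(4, 0), (8, 0), (8, 4), (4, 4)]},
}
households = {"H1": (1, 1), "H2": (6, 2), "H3": (3, 3.5)}
selected = select_households(households, block_groups)
```

Household H2 lies in the low-ranked block group and is left out of the selection.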

Figure 11: Addresses with High Suitability of Potential Customers

Conclusion

Figure 11 displays all block groups and homeowners that are located in areas of high suitability. In the database, each geocoded point has a related mailing address. By selecting the above addresses based on location in suitable block groups, the addresses are available for export. Therefore, a table of addresses was exported from ArcView to a business module called Presentable, where mailing labels were created based on the ‘taxpayer name’ field and the address, city, state and zip fields. These mailing labels were printed on the back of advertising materials. Codes were also included, such as block group, to facilitate analyzing responses based on the results of the analog suitability model. This created a dynamic feedback loop whereby the analyst would learn and adapt the suitability model accordingly. With proper tracking, responses to advertisements (in the form of consumer purchases) provide information on market expansion and success of the GIS-based target marketing effort.
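The feedback loop can be sketched as a response rate per block group code, computed from the codes printed on the mailing labels. The paper reports no response figures, so the counts below are purely hypothetical, as is the function name.

```python
from collections import Counter

def response_rates(mailed_codes, responded_codes):
    """Return responses divided by mailings for each block group code."""
    sent = Counter(mailed_codes)
    back = Counter(responded_codes)
    return {code: back[code] / n for code, n in sent.items()}

# Hypothetical tracking data: one block group code per mailed label and
# one per advertisement redeemed at the store.
mailed = ["BG-A"] * 4 + ["BG-B"] * 2
responded = ["BG-A"]
rates = response_rates(mailed, responded)
```

Comparing these rates against the suitability rank of each block group would indicate whether the model's high-ranked areas actually respond better.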


Author Information

Shaun K. McMullin, MA
Geography Department, University of Washington
2466 Westlake Ave. N. #8, Seattle, WA 98109
Shaun@mcmullin.cjb.net
http://mcmullin.cjb.net