Ryan Damon

On the Methodology of Geocoding Electric Utility Customers to Validate Billing Rates and Taxes

Abstract

The Automated Geocode Validation System (AGVS) is an application developed by Edison to ensure the accuracy of utility user tax collection and disbursement. Customer billing rates vary according to a code defined by the location of the customer within geopolitical boundaries, including the climate zone (as defined by the California Public Utilities Commission), county, city, and Edison district. This code is used to determine a customer's billing rate and associated taxes. To ensure correct codes for billing, the AGVS application attempts to address match all 4.3 million customer accounts, overlay them with city, county, district, and baseline region boundaries, and assign a code. The resulting code value is then compared with the existing code. Existing codes that are determined to be incorrect are subject to further analysis before any adjustment is made, to guard against incorrect code changes. Based on a set of predetermined criteria, codes that require adjustment are changed and uploaded to the mainframe customer billing database. This process is run biannually to accommodate new accounts and spatial data updates. A byproduct of the application is the customer location data it generates, which has provided Edison its first spatial information at the customer level. This paper discusses the technical development of the application, focusing on the integrated ArcInfo/MatchWare approach that the AGVS application employs, and the problems inherent in address matching over 4.3 million addresses in an area exceeding 50,000 square miles.


Background

Edison, the nation's second largest investor-owned utility, collects and distributes user taxes for over 180 municipalities and 16 counties in Central and Southern California. Each of Edison's 4.3 million meters is categorized to ensure accurate collection and payment of taxes for each of the government agencies requiring franchise payments. To verify the accuracy of tax collection and distribution, Edison has relied in the past on a manual system of validating meter locations. This has proved to be a cumbersome and expensive task, especially considering the size of the Edison service territory, over 50,000 square miles, and the fact that municipal boundaries change frequently with annexations. To relieve Edison of the manual process and to increase the accuracy of tax collection, it was decided to validate all of the meters in an automated fashion, and then to put into place a system to verify all new meters as they are installed. With user taxes and franchise payments dependent on the location of each Edison meter, a GIS was the ideal tool to automate the process.

Description

The Automated Geocode Validation System (AGVS) is an application developed to ensure that meters are categorized correctly and that franchise tax payments to local incorporated governments are accurate and complete for all municipal residents. Customer billing rates and associated taxes vary by a code defined by the location of the customer's meter within geopolitical boundaries, including the county, city, Edison district, and climate zone (as defined by the California Public Utilities Commission, in an attempt to create a more equitable distribution of cost, based on user need, across varied climates). The code is used to render a customer's billing rate, utility user taxes, and franchise tax payments to be made by Edison. To ensure correct codes for billing, it was determined that a GIS needed to be developed to determine the correct location of all Edison electric customer meters, assign a code based on the location of the meter within geopolitical boundaries, and validate the results against the existing values in the customer mainframe database.

To develop the needed GIS, Edison's Geographic Information and Analysis Systems (GIAS) group designed an Automated Geocode Validation System (AGVS). The AGVS application attempts to address match the 4.3 million Edison customer accounts and determine their respective geographic location within city, county, Edison district, and climate zone boundaries. The application receives meter information from the customer mainframe database, including address, premise number (a unique identifier for every meter), account number, and the code which determines billing rates and appropriate taxes. The address of each meter is matched against the Thomas Brothers digital database to determine geographic location. That location is compared with corresponding digital data to determine the correct code for each meter. Questionable codes are reviewed in further depth to ensure the accuracy of the address matching process and, where appropriate, the results are returned to the mainframe to correct the codes.

A secondary derivative of the application is the creation of a customer meter location database. Latitude/longitude items were added to the mainframe, and the application is able to populate those records with all of the meters that are address matched. Geographic data at the customer level, encompassing most of Southern California and much of the Central Valley, has become a valuable tool for further GIS applications.

Methodology

The initial task of the AGVS application is to extract customer data from CIS (Customer Information System), the mainframe system which contains a wealth of information related to customer meters. The application creates "jobs" which extract a series of data items from the mainframe by a list of ZIP codes, district numbers, or meter numbers. An HTML interface was developed to ease data extraction and to allow users to extract data from any of the GIAS group's Unix workstations or PCs. The interface asks for the location of a file containing the list of items to extract by, the location in which to place the extracted data, and a password. The HTML interface, through a combination of shell and Perl scripts, creates the appropriate job, which is subsequently sent to the mainframe for execution. Using COBOL, the job extracts the data from the mainframe by the appropriate item, sorts the data in a text file, and returns the data to the user-specified location. The file is then easily imported into an Info file.

The AGVS application, written in AML, then extracts customer data from the newly created Info file by ZIP code. Street centerline data is also extracted from the Thomas Bros. Maps (TBM) digital database by ZIP code using the Librarian module of ArcInfo. The street data is preprocessed by removing the arcs with address ranges containing 0, to prevent invalid matches against unpopulated numeric address items in the customer tables. With the preprocessing complete, the street data and customer data are unloaded to text files, where they are matched using Automatch software from MatchWare Technologies Inc., which matches the customer addresses to the individual corresponding streets. (Initial benchmarking of different address matching tools led the application to use Automatch over ArcInfo or ArcView address matching due to higher accuracy and match rates.) The matched results are then imported back into Info for further processing. While Automatch will match the appropriate address to the appropriate street segment, it does not perform the interpolation of the position of points along the street segments or the offsetting of points to the correct sides of the street. This analysis is performed in the ARC module using "events" and "sections".
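
The street preprocessing step can be illustrated with a minimal Python sketch. The original step was done in AML against ArcInfo coverages; the record layout and field names below (left_from, left_to, right_from, right_to) are hypothetical, and "containing 0" is read here as any range endpoint equal to 0, which is an assumption.

```python
from typing import List, NamedTuple


class StreetArc(NamedTuple):
    """Hypothetical street centerline record with address ranges per side."""
    arc_id: int
    name: str
    left_from: int
    left_to: int
    right_from: int
    right_to: int


def has_unpopulated_range(arc: StreetArc) -> bool:
    """True if any address range endpoint is 0 (assumed to mean unpopulated)."""
    return 0 in (arc.left_from, arc.left_to, arc.right_from, arc.right_to)


def preprocess_streets(arcs: List[StreetArc]) -> List[StreetArc]:
    """Keep only arcs whose address ranges are fully populated."""
    return [arc for arc in arcs if not has_unpopulated_range(arc)]


# Illustrative records only; not real street data.
SAMPLE_ARCS = [
    StreetArc(1, "MAIN ST", 100, 198, 101, 199),
    StreetArc(2, "MAIN ST", 0, 0, 0, 0),
    StreetArc(3, "OAK AVE", 200, 298, 0, 0),
]
KEPT_IDS = [arc.arc_id for arc in preprocess_streets(SAMPLE_ARCS)]
```

Dropping such arcs before matching means a customer record whose house number is blank or zero has no zero-valued range to match against.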

The customer data is split into even and odd files, based on the address, while the street data is split into four separate coverages, depending on the direction of the arcs and their polarity. The four coverages, evenodd, oddeven, eveneven, and oddodd, are used to create eight ARCSECTIONs, one for each side of the arc for each of the four arc coverages. EVENTSOURCE databases are created from the customer tables based on their polarity, after which those events are used to create point coverages based on the ARCSECTIONs, with resulting points being offset by two meters from the street center. The eight point coverages (evenodd right side, evenodd left side, oddeven right side, etc.) are then APPENDed to create the final point coverage of matches.
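
The interpolation and offset that the events and sections perform can be sketched in Python for a single straight street segment. This is an illustration under stated assumptions, not the AML implementation: the function names are hypothetical, real arcs may have many vertices, coordinates are assumed to be in meters, and "left" and "right" are taken relative to the arc's digitized direction.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def pick_side(house: int, left: Tuple[int, int], right: Tuple[int, int]) -> Optional[str]:
    """Choose the side whose range has matching parity and contains the number."""
    for side, (lo, hi) in (("left", left), ("right", right)):
        if house % 2 == lo % 2 and min(lo, hi) <= house <= max(lo, hi):
            return side
    return None


def locate_address(start: Point, end: Point, from_addr: int, to_addr: int,
                   house: int, side: str, offset: float = 2.0) -> Point:
    """Interpolate a house number along a segment and offset it to one side."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    (x1, y1), (x2, y2) = start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("zero-length segment")
    span = to_addr - from_addr
    # Fraction of the way along the segment; midpoint if the range is one number.
    t = 0.5 if span == 0 else (house - from_addr) / span
    t = min(max(t, 0.0), 1.0)
    px, py = x1 + t * dx, y1 + t * dy
    # Unit normal pointing to the left of the direction of travel.
    nx, ny = -dy / length, dx / length
    sign = 1.0 if side == "left" else -1.0
    return (px + sign * offset * nx, py + sign * offset * ny)


# Example: a 100 m east-west segment, even numbers on the left, odd on the right.
SIDE = pick_side(150, (100, 198), (101, 199))
LEFT_POINT = locate_address((0.0, 0.0), (100.0, 0.0), 100, 198, 149, "left")
```

The sketch also shows why the reversed segments and incorrect address ranges matter: the first flips the sign of the offset, and the second shifts the interpolated fraction along the street.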

The points are then identified against city, county, district, and climate zone coverages. Based on a table containing all possible combinations of city, county, district, and climate zone, a new code is generated for each point. Those new codes are compared to the existing codes for each customer meter, and those that differ are processed further. The new codes are also added to a historical data table, which tracks the code of all accounts during all runs of the application since the beginning of the project. This historical data is valuable in determining patterns as to the kinds of accounts being affected and for empirical justification for geocode changes.
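
The code assignment and comparison can be sketched as a table lookup. The table contents, field names, and code values below are placeholders invented for illustration; the paper does not give the actual code format.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lookup: (city, county, district, climate zone) -> billing code.
CODE_TABLE: Dict[Tuple[str, str, str, str], str] = {
    ("CITY_A", "COUNTY_1", "D01", "Z10"): "1001",
    ("UNINCORPORATED", "COUNTY_1", "D01", "Z10"): "1002",
}


def derive_code(overlay: dict, table: Dict[Tuple[str, str, str, str], str]) -> Optional[str]:
    """Look up the code for the polygons a matched point fell inside."""
    key = (overlay["city"], overlay["county"],
           overlay["district"], overlay["climate_zone"])
    return table.get(key)


def compare_codes(meters: List[dict], table: Dict[Tuple[str, str, str, str], str]):
    """Return (history rows for every meter, meters whose new code differs)."""
    history, discrepancies = [], []
    for meter in meters:
        new_code = derive_code(meter, table)
        history.append((meter["premise"], new_code))
        if new_code is not None and new_code != meter["existing_code"]:
            discrepancies.append((meter["premise"], meter["existing_code"], new_code))
    return history, discrepancies


# Illustrative meters only.
METERS = [
    {"premise": "P1", "existing_code": "1001", "city": "CITY_A",
     "county": "COUNTY_1", "district": "D01", "climate_zone": "Z10"},
    {"premise": "P2", "existing_code": "1001", "city": "UNINCORPORATED",
     "county": "COUNTY_1", "district": "D01", "climate_zone": "Z10"},
]
HISTORY, DISCREPANCIES = compare_codes(METERS, CODE_TABLE)
```

Every meter contributes a history row, whether or not its code changes, which mirrors the role of the historical table in tracking codes across runs.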

Given the assumptions made in the interpolation of street addresses, and the distance that a meter is offset from a street centerline, those points that are within 300 feet of a given geopolitical border are considered subject to geographic data inaccuracies and, as such, require an additional form of verification. Therefore, those customer meters within 300 feet of a city, county, or climate zone border are set aside pending further investigation. These records are sent out to field personnel for verification of the meter's real-world location and its appropriate geopolitical assignment.
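
The 300-foot border test amounts to a point-to-boundary distance check. The Python sketch below assumes planar coordinates in feet and boundaries given as rings of vertices; the function names are hypothetical, and the production system performed this test with ArcInfo coverages, not with code like this.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_border(p: Point, ring: Sequence[Point]) -> float:
    """Distance from p to the nearest edge of a closed boundary ring."""
    edges = zip(ring, list(ring[1:]) + [ring[0]])
    return min(point_segment_distance(p, a, b) for a, b in edges)


def needs_field_verification(p: Point, rings: List[Sequence[Point]],
                             threshold_ft: float = 300.0) -> bool:
    """True if p lies within the threshold of any border, inside or outside."""
    return any(distance_to_border(p, ring) <= threshold_ft for ring in rings)


# Illustrative 1,000 ft square boundary.
SQUARE = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0), (0.0, 1000.0)]
CENTER_FLAGGED = needs_field_verification((500.0, 500.0), [SQUARE])
EDGE_FLAGGED = needs_field_verification((100.0, 500.0), [SQUARE])
```

Measuring to the border itself means a point is flagged whether it falls just inside or just outside the boundary.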

Location codes for customer meters that are more than 300 feet from a border and that require a code change are prepared for further processing. The historical data contained in the previously mentioned table are processed to find code discrepancies. To ensure codes are not recalculated when they have previously been fixed by field and billing office personnel (changes made by hand are assumed to be correct), any manual changes to geocodes are recorded in the historical tables. Additionally, codes which have been changed numerous times are flagged. Depending on how the accounts are flagged, proposed changes to codes are either cleared by the billing department or undergo further investigation by the billing department. All code changes to accounts which may change billing rates or taxes are cleared through the billing department, where rebills and corrections are made.
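
One way to picture the flagging logic is as a small routing function over an account's history. This is only a sketch: the paper does not state the actual criteria, so the outcome labels, the history layout, and the threshold for "numerous" changes (max_changes) are all assumptions.

```python
from typing import List


def count_code_changes(history: List[dict]) -> int:
    """Number of times the recorded code differed from the previous run."""
    codes = [row["code"] for row in history]
    return sum(1 for prev, cur in zip(codes, codes[1:]) if prev != cur)


def route_proposed_change(current_code: str, proposed_code: str,
                          history: List[dict], max_changes: int = 3) -> str:
    """Decide what happens to a proposed code change (hypothetical labels)."""
    if proposed_code == current_code:
        return "no_change"
    # Manual corrections are assumed correct, so they are never overwritten.
    if any(row.get("manual", False) for row in history):
        return "keep_manual"
    # Accounts whose code keeps flipping get a closer look.
    if count_code_changes(history) >= max_changes:
        return "investigate"
    return "clear_with_billing"


# Illustrative histories only.
STABLE = [{"code": "1001"}, {"code": "1001"}]
HAND_FIXED = [{"code": "1002"}, {"code": "1001", "manual": True}]
FLIPPING = [{"code": "1001"}, {"code": "1002"}, {"code": "1001"}, {"code": "1002"}]
ROUTES = [
    route_proposed_change("1001", "1002", STABLE),
    route_proposed_change("1001", "1002", HAND_FIXED),
    route_proposed_change("1002", "1001", FLIPPING),
]
```

Checking for manual changes first reflects the stated rule that hand corrections take precedence over recalculated codes.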

Finally, a shell script delivers the records by ZIP code back to the mainframe for all of the approved accounts with changed codes, and for all other accounts whose code does not change but for which a latitude/longitude value was generated. A "mopup" job on the mainframe updates the code, x coordinate, and y coordinate items for the appropriate accounts over a three-day period.

Results

The system is executed every six months to account for new customers, improved street data, and data changes such as city annexations and ZIP code changes. During the last run, the system successfully address matched over 3.5 million customers, a match rate of over 81 percent. Of those meters which were successfully matched, over 58,000 accounts required code changes, and nearly 8,000 of those changes affected utility user taxes and franchise tax calculation.

Since the project was started, there has been a substantial decrease in tax collection and payment discrepancies between Edison and the involved agencies. Furthermore, a large geographic database of 3.5 million households and businesses throughout Edison's territory has been created. That data has been used for many other projects, including target marketing, outage analysis, and crew dispatching and routing, just to name a few.

Problems

This application has been very successful in validating codes and cleaning up the customer database, but only for the address matched accounts. The unmatched 19 percent cannot be run through the process of validation without geographic attributes. Match rates vary considerably over differing geographic regions. The Edison territory encompasses areas of the highly populated Los Angeles Basin, as well as sparsely populated areas of the Sierra Nevada and the Mojave Desert. Digital street data integrity and robustness seem to relate to population, and perhaps TBM map book sales, as the highly populated ZIP codes of the Los Angeles Basin have match rates easily eclipsing 90 percent, while the northern ZIP codes of the Edison service territory regularly see rates below 40 percent. In addition, incorrect ZIP codes and incomplete addresses from Edison's Customer Information System database cause a number of accounts to be rejected.

Other problems experienced with matched data have been reversed street segments, which cause the point to be placed on the wrong side of the street, and incorrect address ranges, which place the point in the wrong location. While misspelled street names are often corrected and successfully matched by Automatch, the assumptions made about the correct spelling sometimes match addresses to the wrong location. For example, many addresses in the Palmdale area have names such as "East Avenue A" or "Avenue N". Many of those accounts have been entered in CIS as "E AVENUEA" or "AVENUEN". Automatch will assume that "AVENUEA" is a typo, correct it to "AVENUE", and then match the account to "AVENUE E", a different street and sometimes a different tax area. Situations like these have forced the project to employ more manual Quality Assurance/Quality Control (QA/QC) of meter locations than would be desired, with careful analysis performed on each account before any changes in billing are made. This QA/QC is executed using an ArcView application which was developed by Edison using Avenue scripts, Exceed X Windowing software, and ArcView for Unix. The billing department can query each meter location, along with city, county, district, climate zone, and street data, to determine the accuracy of the placement. In addition, by using previously address matched meters, the billing department can display all of the other meters located in the same meter reader route as an additional data source for judging accuracy.


Conclusion

As utilities move towards competition in a deregulated marketplace, increased customer service and service quality become necessities. The AGVS project is just one method of using an existing tabular database and adding a geographic component to allow Edison to work more efficiently. With a GIS, utilities can use existing customer data as a valuable tool in this newly emerging era.

Ryan Damon
Southern California Edison
GIAS Lab - Room 228, GO3
2131 Walnut Grove Ave.
Rosemead, CA 91770