Ryan Damon
Abstract
The Automated Geocode Validation System (AGVS) is
an application developed by Edison to ensure the accuracy of utility
user tax collection and disbursement. Customer billing rates vary
according to a code defined by the location of the customer within
geopolitical boundaries, including the climate zone (as defined
by the California Public Utilities Commission), county, city,
and Edison district. This code is used to determine a customer's
billing rate and associated taxes. To ensure correct codes for
billing, the AGVS application attempts to address match all 4.3
million customer accounts, overlay them with city, county, district,
and baseline region boundaries, and assign a code. The resulting
code value is subsequently compared with the existing code. Existing
codes which are determined to be incorrect are subject to further
analysis before any adjustment is made to ensure against incorrect
code changes. Based on a set of predetermined criteria, codes
that require adjustment are changed and uploaded to the mainframe
customer billing database. This process is run twice a year to accommodate
new accounts and spatial data updates. A byproduct of the
application is the customer location data it generates, which
has given Edison its first spatial information at the customer
level. This paper discusses the technical development of the
application, focusing on the integrated ArcInfo/MatchWare approach
that the AGVS application employs, and problems inherent with
address matching over 4.3 million addresses in an area exceeding
50,000 square miles.
Background
Edison, the nation's second-largest investor-owned
utility, collects and distributes user taxes for over 180 municipalities
and 16 counties in Central and Southern California. Each of Edison's
4.3 million meters is categorized to ensure accurate collection
and payment of taxes for each of the government agencies requiring
franchise payments. To verify the accuracy of tax collection and
distribution, Edison has relied in the past on a manual system
of validating meter locations. This has proved to be a cumbersome
and expensive task, especially considering the size of the Edison
service territory, over 50,000 square miles, and the fact that
municipal boundaries change frequently with annexations. To relieve
Edison of the manual process and to increase the accuracy of tax
collection, it was decided to validate all of the meters in an
automated fashion, and then put into place a system to verify
all new meters as they were installed. With user taxes and franchise
payments dependent on the location of each Edison meter, a GIS
was the ideal tool to automate the process.
Description
The Automated Geocode Validation System (AGVS) is an application developed to ensure that meters are categorized correctly and that franchise tax payments to local incorporated governments are accurate and complete for all municipal residents. Customer billing rates and associated taxes vary by a code defined by the location of the customer's meter within geopolitical boundaries, including the county, city, Edison district, and climate zone (as defined by the California Public Utilities Commission in an attempt to distribute cost more equitably, based on user need, across varied climates). The code is used to determine a customer's billing rate, utility user taxes, and the franchise tax payments to be made by Edison. To ensure correct codes for billing, it was determined that a GIS was needed to determine the correct location of all Edison electric customer meters, assign a code based on the location of each meter within geopolitical boundaries, and validate the results against the existing values in the customer mainframe database.
To develop the needed GIS, Edison's Geographic Information and Analysis Systems (GIAS) group designed the Automated Geocode Validation System (AGVS). The AGVS application attempts to address match the 4.3 million Edison customer accounts and determine their respective geographic locations within city, county, Edison district, and climate zone boundaries. The application receives meter information from the customer mainframe database, including address, premise number (a unique identifier for every meter), account number, and the code which determines billing rates and appropriate taxes. The address of each meter is matched against the Thomas Brothers digital database to determine geographic location. That location is compared with corresponding digital boundary data to determine the correct code for each meter. Questionable codes are reviewed in further depth to ensure the accuracy of the address matching process and, where appropriate, the results are returned to the mainframe to correct the codes.
A secondary product of the application is the
creation of a customer meter location database. Latitude/longitude
items were added to the mainframe, and the application is able
to populate those records with all of the meters that are address
matched. Geographic data at the customer level, encompassing most
of Southern California and much of the Central Valley, has become
a valuable tool for further GIS applications.
Methodology
The initial task of the AGVS application is to extract customer data from CIS (Customer Information System), the mainframe system which contains a wealth of information related to customer meters. The application creates "jobs" which extract a series of data items from the mainframe by a list of ZIP codes, district numbers, or meter numbers. An HTML interface was developed to ease data extraction and to allow users to extract data from any of the GIAS group's Unix workstations or PCs. The interface asks for the location of a file listing the items to extract by, the location in which to place the extracted data, and a password. The HTML interface, through a combination of shell and Perl scripts, creates the appropriate job, which is subsequently sent to the mainframe for execution. Using COBOL, the job extracts the data from the mainframe by the appropriate item, sorts the data in a text file, and returns the data to the user-specified location. The file is then easily imported into an Info file.
The AGVS application, written in AML, then extracts customer data from the newly created Info file by ZIP code. Street centerline data is also extracted from the Thomas Bros. Maps (TBM) digital database by ZIP code using the Librarian module of ArcInfo. The street data is preprocessed by removing the arcs whose address ranges contain 0, which prevents invalid matches against unpopulated numeric address items in the customer tables. With the preprocessing complete, the street data and customer data are unloaded to text files, where they are matched using Automatch software from MatchWare Technologies Inc., which matches each customer address to its corresponding street. (Initial benchmarking of different address matching tools led the application to use Automatch over ArcInfo or ArcView address matching due to higher accuracy and match rates.) The matched results are then imported back into Info for further processing. While Automatch will match the appropriate address to the appropriate street segment, it does not interpolate the position of points along the street segments or offset points to the correct side of the street. These steps are performed in the ARC module using "events" and "sections".
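The preprocessing rule described above can be illustrated with a minimal Python sketch. It is not the AML code used by AGVS. It assumes each street arc carries four numeric address-range items and treats an arc as unusable if any of them is 0; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StreetSegment:
    # Hypothetical field names; the actual TBM item names are not given in this paper.
    name: str
    left_from: int
    left_to: int
    right_from: int
    right_to: int


def has_zero_range(seg: StreetSegment) -> bool:
    """Assumption: an arc is dropped if any address-range endpoint is 0."""
    return 0 in (seg.left_from, seg.left_to, seg.right_from, seg.right_to)


def drop_zero_range_segments(segments: List[StreetSegment]) -> List[StreetSegment]:
    """Keep only arcs whose address ranges are fully populated."""
    return [seg for seg in segments if not has_zero_range(seg)]


# Made-up sample arcs for illustration only.
sample = [
    StreetSegment("WALNUT GROVE AVE", 2101, 2199, 2100, 2198),
    StreetSegment("UNNAMED RD", 0, 0, 0, 0),
]
kept = drop_zero_range_segments(sample)
print([seg.name for seg in kept])
```

Without this filter, a customer record whose numeric address item is empty (and therefore 0) could fall inside a 0-based range and produce a false match.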
The customer data is split into even and odd files, based on the address, while the street data is split into four separate coverages, depending on the direction of the arcs and their polarity. The four coverages, evenodd, oddeven, eveneven, and oddodd, are used to create eight ARCSECTIONs, one for each side of the arc for each of the four arc coverages. EVENTSOURCE databases are created from the customer tables based on their polarity, after which those events are used to create point coverages based on the ARCSECTIONs, with resulting points being offset by two meters from the street center. The eight point coverages (evenodd right side, evenodd left side, oddeven right side, etc.) are then APPENDed to create the final point coverage of matches.
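The interpolation and offset step can be sketched in Python as follows. This is not the ARCSECTION/EVENTSOURCE implementation. It assumes a straight street segment in projected coordinates measured in meters and house numbers spread linearly along the address range; the function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def parity(house_number: int) -> str:
    """Classify an address as even or odd, as done when splitting the customer file."""
    return "even" if house_number % 2 == 0 else "odd"


def interpolate_address(start: Point, end: Point,
                        range_from: int, range_to: int,
                        house_number: int, side: str,
                        offset: float = 2.0) -> Point:
    """Place a house number along a segment and offset it from the centerline.

    side is "left" or "right" relative to the arc direction (start -> end).
    offset is in the same units as the coordinates (assumed meters here).
    """
    (x0, y0), (x1, y1) = start, end
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("zero-length segment")
    if range_to == range_from:
        t = 0.5  # single-address range: assume the midpoint
    else:
        t = (house_number - range_from) / (range_to - range_from)
    t = min(max(t, 0.0), 1.0)  # clamp to the segment
    px, py = x0 + t * dx, y0 + t * dy
    # Unit normal pointing to the left of the direction of travel.
    nx, ny = -dy / length, dx / length
    sign = 1.0 if side == "left" else -1.0
    return (px + sign * offset * nx, py + sign * offset * ny)


# A segment running east, with the range 100-198 on its left side.
print(parity(149), interpolate_address((0.0, 0.0), (100.0, 0.0), 100, 198, 149, "left"))
```

A reversed arc swaps left and right, which is one reason the street data must be separated by arc direction and polarity before points are offset.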
The points are then identified against city, county, district, and climate zone coverages. Based on a table containing all possible combinations of city, county, district, and climate zone, a new code is generated for each point. Those new codes are compared to the existing codes for each customer meter, and those that differ are processed further. The new codes are also added to a historical data table, which tracks the code of all accounts during all runs of the application since the beginning of the project. This historical data is valuable in determining patterns as to the kinds of accounts being affected and for empirical justification for geocode changes.
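The overlay and code lookup can be pictured with the following Python sketch. It substitutes a simple ray-casting point-in-polygon test for the ArcInfo overlay, and all layer names, polygon names, and code values are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(pt: Point, polygon: Polygon) -> bool:
    """Ray-casting test: count boundary crossings of a ray going in the +x direction."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def locate(pt: Point, layer: Dict[str, Polygon]) -> Optional[str]:
    """Return the name of the first polygon in the layer that contains pt."""
    for name, polygon in layer.items():
        if point_in_polygon(pt, polygon):
            return name
    return None


LAYER_ORDER = ("city", "county", "district", "climate_zone")


def assign_code(pt: Point, layers: Dict[str, Dict[str, Polygon]],
                code_table: Dict[tuple, str]) -> Optional[str]:
    """Look up the code for the combination of boundaries containing pt."""
    key = tuple(locate(pt, layers[name]) for name in LAYER_ORDER)
    return code_table.get(key)


# Hypothetical data: one square polygon per layer and a one-row combination table.
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
layers = {"city": {"CITY_A": square}, "county": {"COUNTY_B": square},
          "district": {"DIST_C": square}, "climate_zone": {"ZONE_D": square}}
code_table = {("CITY_A", "COUNTY_B", "DIST_C", "ZONE_D"): "0001"}

new_code = assign_code((5.0, 5.0), layers, code_table)
existing_code = "0002"
print(new_code, "differs" if new_code != existing_code else "matches")
```

Only the accounts whose new code differs from the existing code move on to the later review steps.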
With the assumptions made in the interpolation of street addresses, and with the distance that a meter is offset from a street centerline, those points that are within 300 feet of a given geopolitical border are considered subject to geographic data inaccuracies and, as such, require an additional form of verification. Therefore, those customer meters within 300 feet of a city, county, or climate zone border are set aside pending further investigation. These records are sent out to field personnel for verification of the meter's real-world location and its appropriate geopolitical assignment.
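The 300-foot border test can be sketched in Python as a distance check against each boundary edge. This is an illustration only: it assumes planar coordinates measured in feet, treats "within 300 feet" as inclusive, and uses hypothetical function names.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]

BORDER_BUFFER_FT = 300.0  # threshold stated in the text


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_border(p: Point, polygon: Polygon) -> float:
    """Distance from p to the nearest edge of a closed polygon boundary."""
    n = len(polygon)
    return min(point_segment_distance(p, polygon[i], polygon[(i + 1) % n])
               for i in range(n))


def needs_field_check(p: Point, boundaries: List[Polygon]) -> bool:
    """True if p lies within the buffer distance of any boundary."""
    return any(distance_to_border(p, poly) <= BORDER_BUFFER_FT for poly in boundaries)


# Made-up city boundary: a 1000 ft square.
city = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0), (0.0, 1000.0)]
print(needs_field_check((500.0, 500.0), [city]), needs_field_check((100.0, 500.0), [city]))
```

In a GIS the same result is usually obtained by buffering the boundary lines and overlaying the points on the buffer.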
Location codes for customer meters that are more than 300 feet from a border and that require a code change are prepared for further processing. The historical data contained in the previously mentioned table are processed to find code discrepancies. To ensure codes are not recalculated when they have previously been fixed by field and billing office personnel (changes made by hand are assumed to be correct), any manual changes to geocodes are recorded in the historical tables. Additionally, codes which have been changed numerous times are flagged. Depending on how the accounts are flagged, proposed changes to codes are either cleared by the billing department or undergo further investigation by the billing department. All code changes to accounts which may change billing rates or taxes are cleared through the billing department, where rebills and corrections are made.
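The flagging logic can be illustrated with a short Python sketch. The paper does not state the actual criteria, so the rules below are assumptions: any manual entry in an account's history blocks an automated change, and three or more past code changes count as "numerous". All names and the threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HistoryEntry:
    code: str
    manual: bool  # True if field or billing staff set this code by hand


MAX_CHANGES = 3  # hypothetical threshold for "changed numerous times"


def count_changes(history: List[HistoryEntry]) -> int:
    """Number of times the code differs between consecutive runs."""
    return sum(1 for prev, cur in zip(history, history[1:]) if prev.code != cur.code)


def triage(existing_code: str, new_code: str, history: List[HistoryEntry]) -> str:
    """Decide how a proposed code change should be handled."""
    if new_code == existing_code:
        return "no_change"
    if any(entry.manual for entry in history):
        return "keep_manual"          # hand-made changes are assumed correct
    if count_changes(history) >= MAX_CHANGES:
        return "investigate"          # unstable code: billing investigates further
    return "clear_with_billing"       # routine change: billing clears it


# Made-up histories for illustration.
stable = [HistoryEntry("A", False)]
unstable = [HistoryEntry(c, False) for c in ("A", "B", "A", "B")]
print(triage("A", "B", stable), triage("B", "C", unstable))
```

Keeping the decision in one function makes it easy to adjust the criteria as the historical table grows.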
Finally, a shell script will deliver the records
by ZIP code back to the mainframe for all of the approved accounts
with changed codes, and all other accounts which do not change
the code but generate a latitude/longitude value. A "mopup"
job on the mainframe will update the code, x coordinate, and y
coordinate items for the appropriate accounts over a three-day
period.
Results
The system is executed every six months to account for new customers, improved street data, and data changes such as city annexations and ZIP code changes. During the last run, the system successfully address matched over 3.5 million customers, a match rate of over 81 percent. Of the meters that were successfully matched, over 58,000 accounts required code changes, and nearly 8,000 of those changes affected utility user tax and franchise tax calculations.
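The reported figures can be cross-checked with simple arithmetic. The Python sketch below uses the rounded values quoted in this paper (stated only as "over" or "nearly"), so the derived percentages are approximate.

```python
# Rounded figures quoted in the text; the exact counts are not given.
total_accounts = 4_300_000
matched = 3_500_000
code_changes = 58_000
tax_affecting = 8_000


def pct(part: int, whole: int) -> float:
    """Percentage of whole represented by part, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


match_rate = pct(matched, total_accounts)      # share of all accounts matched
change_rate = pct(code_changes, matched)       # share of matched accounts recoded
tax_share = pct(tax_affecting, code_changes)   # share of recodes affecting taxes

print(f"match rate ~{match_rate}%, code changes ~{change_rate}% of matched, "
      f"tax-affecting ~{tax_share}% of changes")
```

The first ratio agrees with the stated match rate of just over 81 percent.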
Since the project was started, there has been a substantial
decrease in tax collection and payment discrepancies between Edison
and the involved agencies. Furthermore, a large geographic database
of 3.5 million households and businesses throughout Edison's territory
has been created. That data has been used for many other projects,
including target marketing, outage analysis, and crew dispatching
and routing, just to name a few.
Problems
This application has been very successful in validating codes and cleaning up the customer database, but only for the address matched accounts. The unmatched 19 percent cannot be run through the validation process because they lack geographic attributes. Match rates vary considerably across geographic regions. The Edison territory encompasses areas of the highly populated Los Angeles Basin, as well as sparsely populated areas of the Sierra Nevada and the Mojave Desert. Digital street data integrity and robustness seem to relate to population, and perhaps to TBM map book sales, as the highly populated ZIP codes of the Los Angeles Basin have match rates easily exceeding 90 percent, while the northern ZIP codes of the Edison service territory regularly see rates below 40 percent. In addition, incorrect ZIP codes and incomplete addresses from Edison's Customer Information System database cause a number of accounts to be rejected.
Other problems experienced with matched data have
been reversed street segments, causing the point to be placed
on the wrong side of the street, and incorrect address ranges,
which place the point in the wrong location. While misspelled
street names are often corrected and successfully matched
by Automatch, sometimes the assumptions made about the correct
spelling end up matching the addresses in the wrong location.
For example, many addresses in the Palmdale area have names such
as "East Avenue A" or "Avenue N". Many of
those accounts have been entered in CIS as "E AVENUEA"
or "AVENUEN". Automatch will assume that "AVENUEA"
is a typo, correct it to "AVENUE", and then match the
account to "AVENUE E", a different street and sometimes a
different tax area. Situations like these have forced the project
to employ more manual Quality Assurance/Quality Control of
meter locations than would be desired, with careful analysis performed
on each account before any changes in billing are made. This QA/QC
is performed using an ArcView application that Edison developed
using Avenue scripts, Exceed X Windowing software, and
ArcView for Unix. The billing department can query each meter
location, along with city, county, district, climate zone, and
street data, to determine the accuracy of the placement. In addition,
by using previously address matched meters, the billing department
can display all of the other meters located in the same meter
reader route as an additional data source for judging accuracy.
Conclusion
As utilities move towards competition in a deregulated
marketplace, increased customer service and service quality become
necessities. The AGVS project is just one method of using an existing
tabular database and adding a geographic component to allow Edison
to work more efficiently. With a GIS, utilities can use existing
customer data as a valuable tool in this newly emerging era.
Ryan Damon
Southern California Edison
GIAS Lab - Room 228, GO3
2131 Walnut Grove Ave.
Rosemead, CA 91770