Unleashing the power of Geographic
Information Systems (GIS) on Data Warehouses
Abstract
This paper illustrates how the use
of GIS with data warehouses containing spatial data results in the combination
of two powerful technologies. This opens up enormous opportunities for
developing and deploying powerful GIS applications using data from the data
warehouse. The methodology used to develop and integrate GIS applications with
data warehouses is discussed in this paper. Different real-life examples are
used to illustrate how this methodology is implemented in practice. Finally,
this paper looks at future trends in terms of applying new technologies to give
improved functionality to the end user.
1.0 Introduction
A
Data Warehouse is a place where data from the transactional system and other
varied sources is stored and specifically structured for querying, analysis and
decision-making by the end users. The
tables containing the data in the data warehouse are related to one another as
shown in Figure 1.1 thus forming the star schema. The Fact Table at the center of the Star Schema contains detailed
data about business measurements that are of interest to the end users. The
Dimension Tables are built around the Fact Table. They consist of a collection
of related attributes that are closely tied with the business measurements
stored in the Fact Table. The star schema allows us to present the data in a
standard framework that is intuitive and allows for high performance access.
It
is said that about
80% of all the data in the world has a spatial component to it. So, it comes as
no surprise that the data in many data warehouses contains a spatial component.
In most warehouses, the geographic or spatial component would be represented as
a dimension table that is part of the star schema of the data warehouse as
shown in Figure 1.1. For example, in a data warehouse that contains information
about a retail store, the fact table may contain the dollar value of the total
sales of the different products. The geography dimension might contain the
addresses of the different customers who shop at the retail store. Some of the
other dimension tables in the data warehouse can be the customer dimension
table and the time dimension table. The
retail store can use this information to identify the buying preferences of
their customers located in different parts of the country. Based on this
information, the retail store can determine the optimal number of products and
the amount of each product that should be stored in the different outlets
distributed across the country. This can lead to enormous savings in inventory
and operational costs. The information stored in the warehouse can also be used
to identify the effectiveness of the different promotional campaigns run by the
retail store in different parts of the country at different times of the year.
The retail store can use this information to make appropriate changes to its
marketing strategy to improve its promotional campaigns.
Figure 1.1: The Star Schema of a Data Warehouse.
Figure 1.1 shows two very commonly used dimension tables in a data warehouse – i. Geography ii. Time. GIS Applications can be used to analyze data from the Fact Table using these two dimensions.
2.0 Advantages of
interfacing GIS Applications with Data Warehouses
GIS
uses spatial and geographic relationships to perform the following basic
functions of an Information System:
·
Data
Acquisition
·
Data
Management
·
Data
Retrieval
·
Data
Analysis
·
Display
and Output
·
Dissemination
of Data and Information
Developing
GIS applications using data from a data warehouse eliminates the “Data
Acquisition”, “Data Management” and “Data Retrieval” steps. In many GIS
applications, these functions consume a lot of time and resources. The data
warehouse provides us with a repository of data from diverse sources that can
easily be used to implement GIS applications. The Extraction, Transformation
and Loading (ETL) process used for loading data into the data warehouse can
also be used to retrieve data from the various source systems for developing
the GIS applications. Most of the spatial files needed for the GIS applications
can be obtained from software vendors or can be obtained from local state and
county departments. The Database Management System of the data warehouse can be
used to store the data for the GIS applications. Thus, using a data warehouse
as a data source for GIS applications leads to great savings in time and
resources. Using a GIS tool greatly extends the capabilities of the Data
Warehouse by allowing the users to view and analyze data using geo-analytical
tools. It enables the users to gain fast and easy access to data based on the
geography dimension.
The
following methodology was found to be very useful for the design and development
of GIS applications to display data from the data warehouse. This methodology
was followed on the following data warehousing engagements:
Step 1: Identify the spatial data
in the data warehouse
Step 2: Work with the users to find
out their needs and requirements in terms of viewing and analyzing data using a
geography dimension
Step 3: Based on the needs and requirements of the users, and the availability
of resources and spatial data in the data warehouse, choose an appropriate GIS
tool to integrate with the data warehouse
Step 4: Design the back end for the GIS applications
Each of these steps is explained in the context of the WISDOM and Pennsylvania DPW data warehousing projects in the following sections.
3.1 Step 1:
Identify the spatial data in the data warehouse-- This enables the GIS application developer to scope out the possible GIS applications that can be developed using the spatial data in the data warehouse. In addition, it also helps to identify the data requirements for extending the scope of the GIS applications. The WISDOM data warehouse contains welfare participant information referenced by counties, tribes, agencies and zip codes. Based on a study of this data, the following GIS applications were developed:
· GIS application to show participant information for the different counties and tribes of Wisconsin.
· GIS application to show participant information for the different regions of Milwaukee.
Based on a study of the spatial data in the Pennsylvania Data Warehouse, the following GIS applications were identified and implemented:
· GIS application to show the distribution of the Employment and Training Projects in Pennsylvania. The different counties were color-coded based on the number of Employment and Training Projects in those counties. Using this application, the users can also obtain a list of all the available Employment and Training Projects in a particular county.
· GIS application to show the distribution of childcare providers in Pennsylvania. A color-coding scheme was used to display the distribution of childcare providers in the different counties and zip codes of Pennsylvania. This application also enables the users to obtain a list of childcare providers in a particular county or zip code. This application can be enhanced by geocoding the addresses of the childcare recipients. This will enable the users to compare the location of the childcare providers with that of the childcare recipients. Thus, this application can be used to identify counties that have very few childcare providers compared to the number of childcare recipients. Armed with this information, the management can develop schemes to enroll more childcare providers in those counties.
3.2 Step 2:
Identify the
users’ needs and requirements-- It is very important to identify the needs and
requirements of the users as early as possible. This enables the developer to
design an accurate work plan for developing a GIS application, which will
satisfy the users’ requirements. Proper implementation of this step results in
enormous savings of time and resources. Sometimes the users are not quite sure
what kind of GIS applications they need. In such cases, it is useful to develop
prototypes based on the spatial data in the warehouse to enable them to start
thinking in the right direction. In effect, Step1 and Step2 must be carried out
concurrently. Most of the GIS applications listed in the previous step were
identified after discussions with the user groups on those projects.
3.3 Step 3:
Choose an
appropriate GIS tool—This can often turn out to be a very complex process because many
factors need to be considered while choosing a particular product. Some of the
factors are:
·
Availability
of resources
·
Scale
of GIS applications to be developed using that tool
·
Means
of accessing data
·
Ability
to interface with the data warehouse tools
·
Strengths
and weaknesses of the different GIS tools
·
Cost
of the GIS tool
After
considering all these factors, ArcView3.2 and ArcView IMS were the tools that
were selected for developing GIS applications on the data warehousing projects.
The software vendor for these tools is Environmental Systems Research
Institute, Inc. (Esri). Esri is a leading GIS software vendor and is the de
facto GIS software vendor for the states of Wisconsin and Pennsylvania. The
applications developed using ArcView and ArcView IMS can easily be integrated
into the Decision Support Environment of the data warehouse. In Wisconsin, the
GIS applications were integrated with the Decision Support Environment provided
by Business Objects. In Pennsylvania, these applications were integrated with
the Decision Support Environment of Cognos. Cognos and Business Objects are
leading providers of Business Intelligence software used for developing
Decision Support Systems.
3.4 Step 4:
3.5
Step5:
User Interface Design for the GIS application--- This step involves creation of the Graphical User Interface
(GUI) for the GIS application. After the back end issues are resolved, the
front-end design consists of displaying the data using a GUI. The following
figures show the GUI for the different GIS applications developed for the
Wisconsin and Pennsylvania data warehouses.
Figure 3.1: Customized GIS application
displaying data about the welfare participants in Wisconsin. Many custom
scripts were written in Avenue, ArcView’s object oriented scripting language
for developing this application.
Figure
3.2: Web based GIS
application showing the distribution of the childcare vendors in the different
counties of Pennsylvania. ArcView IMS was used to enable the users to use this
application over the intranet.
Figure
3.3: Web based GIS
application showing the distribution of the childcare vendors in the different
zip codes of Allegheny County. Users can zoom into this map from the map shown
in Figure 3.2. ArcView IMS was used to develop this application.
Figure
3.4: Report listing
all the childcare vendors in zip code 15143 of Allegheny County. Users can
generate this report by clicking on zip code 15143 on the Allegheny county map
shown in Figure 3.3. Active Server Pages were used to populate these reports
with the most recent data from the database.
4.0 The Future
Like
everything else that has been touched by the World Wide Web, the Internet has
revolutionized the way GIS applications are being developed and deployed. More
and more businesses are moving away from stand-alone or client server based GIS
systems. There has been an increased use of corporate intranets and the
Internet to broadly deploy GIS applications. Many GIS vendors provide
extensions to their software products for deploying applications over the
Internet. In addition, the emergence of Java technologies has enabled
interactive web-based GIS applications, which connect to databases using JDBC
(Java Database Connectivity).
Another
related development is the emergence of enterprise
wide open GIS
technology that enables users to operate in a heterogeneous computing
environment. Open GIS interfaces provide transparent access to distributed
geodata and geoprocessing resources. GIS applications, CAD clients, 3D/Imaging
applications and other custom applications can now work together and share and
access data from different databases across different platforms.
As
web-deployment of GIS applications becomes easier and enterprise wide open GIS
technology becomes popular, more and more uses for spatial data will be
discovered thus allowing more advanced GIS applications to be interfaced with
data warehouses.
Acknowledgements
I
thank the Department of Public Welfare, Pennsylvania and Department of
Workforce Development, Wisconsin for allowing me to reference the GIS
applications that were developed on the WISDOM and Pennsylvania DPW Data
Warehousing engagements. My sincere thanks go out to all the people of Deloitte
Consulting who have provided me with opportunities to work on a wide spectrum
of GIS projects thus inspiring me to write this paper.
References
1.
Addresses and IS/GIS Implementation, Urban and Regional Information Systems
Association (URISA) Certified Workshop Handbook
2.
ArcUser, GIS Publication by Environmental Systems Research Institute, Inc. (Esri)
3. ArcNews, GIS Publication
by Environmental Systems Research Institute, Inc. (Esri)
4.
Diogenes E. Torres, Designing, Implementing and Operating a Data Warehouse
5. GEOWorld, GIS Publication
6.
Open GIS Consortium, http://www.opengis.org
7.
Ralph Kimball, Laura Reeves, Margy Ross and Warren Thornthwaite, The Data
Warehouse Lifecycle Toolkit
Vikram
Talada
Consultant
Deloitte
Consulting,
1000
One PPG Place,
Pittsburgh,
PA 15222-5414.
Telephone:
412-402-5442
Fax:
412-402-5817
Email:
vtalada@dc.com