Unleashing the power of Geographic Information Systems (GIS) on Data Warehouses

Vikram Talada

Abstract 

This paper illustrates how the use of GIS with data warehouses containing spatial data results in the combination of two powerful technologies. This opens up enormous opportunities for developing and deploying powerful GIS applications using data from the data warehouse. The methodology used to develop and integrate GIS applications with data warehouses is discussed in this paper. Different real-life examples are used to illustrate how this methodology is implemented in practice. Finally, this paper looks at future trends in terms of applying new technologies to give improved functionality to the end user.

 

1.0 Introduction

A Data Warehouse is a place where data from the transactional system and other varied sources is stored and specifically structured for querying, analysis and decision-making by the end users.  The tables containing the data in the data warehouse are related to one another as shown in Figure 1.1 thus forming the star schema.  The Fact Table at the center of the Star Schema contains detailed data about business measurements that are of interest to the end users. The Dimension Tables are built around the Fact Table. They consist of a collection of related attributes that are closely tied with the business measurements stored in the Fact Table. The star schema allows us to present the data in a standard framework that is intuitive and allows for high performance access.

 

It is said that about 80% of all the data in the world has a spatial component to it. So, it comes as no surprise that the data in many data warehouses contains a spatial component. In most warehouses, the geographic or spatial component would be represented as a dimension table that is part of the star schema of the data warehouse as shown in Figure 1.1. For example, in a data warehouse that contains information about a retail store, the fact table may contain the dollar value of the total sales of the different products. The geography dimension might contain the addresses of the different customers who shop at the retail store. Some of the other dimension tables in the data warehouse can be the customer dimension table and the time dimension table.  The retail store can use this information to identify the buying preferences of their customers located in different parts of the country. Based on this information, the retail store can determine the optimal number of products and the amount of each product that should be stored in the different outlets distributed across the country. This can lead to enormous savings in inventory and operational costs. The information stored in the warehouse can also be used to identify the effectiveness of the different promotional campaigns run by the retail store in different parts of the country at different times of the year. The retail store can use this information to make appropriate changes to its marketing strategy to improve its promotional campaigns.

 

Figure 1.1: The Star Schema of a Data Warehouse.

 

Figure 1.1 shows two very commonly used dimension tables in a data warehouse – i. Geography ii. Time. GIS Applications can be used to analyze data from the Fact Table using these two dimensions.

 

2.0 Advantages of interfacing GIS Applications with Data Warehouses

GIS uses spatial and geographic relationships to perform the following basic functions of an Information System:

·        Data Acquisition

·        Data Management

·        Data Retrieval

·        Data Analysis

·        Display and Output

·        Dissemination of Data and Information

Developing GIS applications using data from a data warehouse eliminates the “Data Acquisition”, “Data Management” and “Data Retrieval” steps. In many GIS applications, these functions consume a lot of time and resources. The data warehouse provides us with a repository of data from diverse sources that can easily be used to implement GIS applications. The Extraction, Transformation and Loading (ETL) process used for loading data into the data warehouse can also be used to retrieve data from the various source systems for developing the GIS applications. Most of the spatial files needed for the GIS applications can be obtained from software vendors or can be obtained from local state and county departments. The Database Management System of the data warehouse can be used to store the data for the GIS applications. Thus, using a data warehouse as a data source for GIS applications leads to great savings in time and resources. Using a GIS tool greatly extends the capabilities of the Data Warehouse by allowing the users to view and analyze data using geo-analytical tools. It enables the users to gain fast and easy access to data based on the geography dimension.  

 

3.0 Methodology

The following methodology was found to be very useful for the design and development of GIS applications to display data from the data warehouse. This methodology was followed on the following data warehousing engagements:

Step 1: Identify the spatial data in the data warehouse

Step 2: Work with the users to find out their needs and requirements in terms of viewing and analyzing data using a geography dimension

Step 3: Based on the needs and requirements of the users, and the availability of resources and spatial data in the data warehouse, choose an appropriate GIS tool to integrate with the data warehouse

Step 4: Design the back end for the GIS applications

Step 5: Design the user interface for the GIS applications

Each of these steps is explained in the context of the WISDOM and Pennsylvania DPW data warehousing projects in the following sections.

 

3.1 Step 1:

Identify the spatial data in the data warehouse-- This enables the GIS application developer to scope out the possible GIS applications that can be developed using the spatial data in the data warehouse. In addition, it also helps to identify the data requirements for extending the scope of the GIS applications. The WISDOM data warehouse contains welfare participant information referenced by counties, tribes, agencies and zip codes. Based on a study of this data, the following GIS applications were developed:

·        GIS application to show participant information for the different counties and tribes of Wisconsin.

·        GIS application to show participant information for the different regions of Milwaukee.

Based on a study of the spatial data in the Pennsylvania Data Warehouse, the following GIS applications were identified and implemented:

·        GIS application to show the distribution of the Employment and Training Projects in Pennsylvania. The different counties were color-coded based on the number of Employment and Training Projects in those counties. Using this application, the users can also obtain a list of all the available Employment and Training Projects in a particular county.

·        GIS application to show the distribution of childcare providers in Pennsylvania. A color-coding scheme was used to display the distribution of childcare providers in the different counties and zip codes of Pennsylvania. This application also enables the users to obtain a list of childcare providers in a particular county or zip code. This application can be enhanced by geocoding the addresses of the childcare recipients. This will enable the users to compare the location of the childcare providers with that of the childcare recipients. Thus, this application can be used to identify counties that have very few childcare providers compared to the number of childcare recipients. Armed with this information, the management can develop schemes to enroll more childcare providers in those counties. 

 

3.2 Step 2:

Identify the users’ needs and requirements-- It is very important to identify the needs and requirements of the users as early as possible. This enables the developer to design an accurate work plan for developing a GIS application, which will satisfy the users’ requirements. Proper implementation of this step results in enormous savings of time and resources. Sometimes the users are not quite sure what kind of GIS applications they need. In such cases, it is useful to develop prototypes based on the spatial data in the warehouse to enable them to start thinking in the right direction. In effect, Step1 and Step2 must be carried out concurrently. Most of the GIS applications listed in the previous step were identified after discussions with the user groups on those projects.

 

3.3 Step 3:

Choose an appropriate GIS tool—This can often turn out to be a very complex process because many factors need to be considered while choosing a particular product. Some of the factors are:

·        Availability of resources

·        Scale of GIS applications to be developed using that tool

·        Means of accessing data

·        Ability to interface with the data warehouse tools

·        Strengths and weaknesses of the different GIS tools

·        Cost of the GIS tool

After considering all these factors, ArcView3.2 and ArcView IMS were the tools that were selected for developing GIS applications on the data warehousing projects. The software vendor for these tools is Environmental Systems Research Institute, Inc. (Esri). Esri is a leading GIS software vendor and is the de facto GIS software vendor for the states of Wisconsin and Pennsylvania. The applications developed using ArcView and ArcView IMS can easily be integrated into the Decision Support Environment of the data warehouse. In Wisconsin, the GIS applications were integrated with the Decision Support Environment provided by Business Objects. In Pennsylvania, these applications were integrated with the Decision Support Environment of Cognos. Cognos and Business Objects are leading providers of Business Intelligence software used for developing Decision Support Systems. 

 

3.4 Step 4:

Back End Design for the GIS application--- This step deals with the retrieval of data from the data warehouse. Depending on the kind of database being used and the type of GIS tool, an optimal data retrieval mechanism is designed. For the WISDOM data warehouse, a SQL connection using ODBC was used to connect to the Oracle 7.3.2 database to download the necessary data to the users’ computers. This downloaded data is used to run all the subsequent queries resulting in improved performance. The data is refreshed every time the user opens a new session thus making sure that the user has access to the most recent data. For other cases where there is a large amount of data, this method might not be optimal. It might be better to use an ODBC connection to query and retrieve relevant data directly from the database over a fast network rather than a one-time download of data. For the Pennsylvania DPW Data Warehouse, data is retrieved from the source systems using the existing Extraction, Transformation and Loading (ETL) approach for loading data into the warehouse. Informatica PowerMart is the ETL tool used for this purpose. After the data is loaded into the database, an ODBC connection is used to retrieve data for the GIS applications.

Recognizing the growing need to store spatial data, many database vendors have added spatial components to their databases. Spatial data modeling enables the users to store and retrieve points, lines, and other spatial features in a database just like any other data.

 

3.5 Step5:
User Interface Design for the GIS application--- This step involves creation of the Graphical User Interface (GUI) for the GIS application. After the back end issues are resolved, the front-end design consists of displaying the data using a GUI. The following figures show the GUI for the different GIS applications developed for the Wisconsin and Pennsylvania data warehouses.

 


Figure 3.1: Customized GIS application displaying data about the welfare participants in Wisconsin. Many custom scripts were written in Avenue, ArcView’s object oriented scripting language for developing this application.

 


Figure 3.2: Web based GIS application showing the distribution of the childcare vendors in the different counties of Pennsylvania. ArcView IMS was used to enable the users to use this application over the intranet.

 

Figure 3.3: Web based GIS application showing the distribution of the childcare vendors in the different zip codes of Allegheny County. Users can zoom into this map from the map shown in Figure 3.2. ArcView IMS was used to develop this application.

 

 

Figure 3.4: Report listing all the childcare vendors in zip code 15143 of Allegheny County. Users can generate this report by clicking on zip code 15143 on the Allegheny county map shown in Figure 3.3. Active Server Pages were used to populate these reports with the most recent data from the database.

 

4.0 The Future

Like everything else that has been touched by the World Wide Web, the Internet has revolutionized the way GIS applications are being developed and deployed. More and more businesses are moving away from stand-alone or client server based GIS systems. There has been an increased use of corporate intranets and the Internet to broadly deploy GIS applications. Many GIS vendors provide extensions to their software products for deploying applications over the Internet. In addition, the emergence of Java technologies has enabled interactive web-based GIS applications, which connect to databases using JDBC (Java Database Connectivity).

Another related development is the emergence of enterprise wide open GIS technology that enables users to operate in a heterogeneous computing environment. Open GIS interfaces provide transparent access to distributed geodata and geoprocessing resources. GIS applications, CAD clients, 3D/Imaging applications and other custom applications can now work together and share and access data from different databases across different platforms.

As web-deployment of GIS applications becomes easier and enterprise wide open GIS technology becomes popular, more and more uses for spatial data will be discovered thus allowing more advanced GIS applications to be interfaced with data warehouses.

 

Acknowledgements

I thank the Department of Public Welfare, Pennsylvania and Department of Workforce Development, Wisconsin for allowing me to reference the GIS applications that were developed on the WISDOM and Pennsylvania DPW Data Warehousing engagements. My sincere thanks go out to all the people of Deloitte Consulting who have provided me with opportunities to work on a wide spectrum of GIS projects thus inspiring me to write this paper.

 

References

1. Addresses and IS/GIS Implementation, Urban and Regional Information Systems Association (URISA) Certified Workshop Handbook

2. ArcUser, GIS Publication by Environmental Systems Research Institute, Inc. (Esri)

3. ArcNews, GIS Publication by Environmental Systems Research Institute, Inc. (Esri)

4. Diogenes E. Torres, Designing, Implementing and Operating a Data Warehouse

5. GEOWorld, GIS Publication

6. Open GIS Consortium, http://www.opengis.org

7. Ralph Kimball, Laura Reeves, Margy Ross and Warren Thornthwaite, The Data Warehouse Lifecycle Toolkit

 

 

Vikram Talada

Consultant

Deloitte Consulting,

1000 One PPG Place,

Pittsburgh, PA 15222-5414.

Telephone: 412-402-5442

Fax: 412-402-5817

Email: vtalada@dc.com