Datawarehouse; where to locate GIS

Contents

  • Abstract
  • Datawarehouse; where to Locate a GIS
  • Introduction
  • Why a Datawarehouse?
  • What is a Datawarehouse?
  • What is Spatial Datawarehouse?
  • Why a Spatial Datawarehouse?
  • Benefits and costs
  • Spatial Datawarehouse advantages
  • References

    Abstract

    In supporting decision making and planning, information about the business, its products and its customers is most important. The data on which this decision making is based is widely dispersed: it is physically kept in different places, stored in transaction-based systems, accessible mostly in one way only, and very large in volume. In most cases the data is present; the problem lies in its adequacy, accessibility, form, performance and availability.

    Datawarehousing is not a single technology; it is an information technology system. A Datawarehouse is a collection of data stored in an orderly and accessible way. Facts and related data are used in a Datawarehouse for better analyses and decision support. The basic characteristics of data in a Datawarehouse are: consistent, subject-oriented, integrated, time-variant and non-volatile.

    Datawarehousing makes it possible to view operational data in a multidimensional way, and turns data into information. The advantages of using a Datawarehouse lie in a better understanding of the business, better service to the customer, a better understanding of the business risks, improvement of the business processes, and the ability to make more tailor-made products and services.

    The spatial component in the Datawarehouse architecture consists of three parts: firstly, the tools for geocoding data and for spatial aggregation; secondly, the spatial database; and thirdly, analysis and presentation. The most important attraction of integrating GIS in your Datawarehouse is being able to make dynamic geographic queries on your data and to aggregate your data to geographic areas.

    Logisterion has executed projects for a telecommunications company, public housing and the ministry of transportation and public works. Although these experiences are in some cases at an early stage, it is clear that there are several ways to look at a Datawarehouse in combination with a GIS. In all cases we see many benefits of Datawarehousing with a spatial component, and its additional value to the business.

    Datawarehouse; where to Locate a GIS

    Introduction

    We cannot calculate how much data is electronically available nowadays. Neither can we imagine how much it has cost to gather this data and to feed the systems. Almost all data is captured and used in special-purpose systems, the so-called operational systems. Data in these systems is modelled in a specific format, purely to serve the transactional system concerned. Now that this data exists, we want to use it not only for operational tasks but increasingly for decision support. This is becoming more and more important. If you want to keep ahead of your competitors, you must get a better understanding of your customers' needs, the trends in the market, the correlation between events, etc. Good information matters not only in competition; it also makes the whole company perform better. Good information means information that is there at the right moment, at the right place and in the right (usable) format.

    For people who are not used to working with GIS, and even for those who are, it is hard to imagine that over eighty percent of business data has some spatial context. This means that if you want to use data for your decision support systems, you have to consider the use of this spatial factor.

    To get and to keep all spatial and non-spatial data in a useable form for decision support, the concepts of Datawarehousing and GIS can be applied in an integrated fashion.

    Why a Datawarehouse?

    Until now information technology has mainly focused on automating manual processes. The goal is to get the work done at lower operating costs. It is rather easy to calculate the return on investment when you buy a bookkeeping program which saves you, for example, 20 man-hours a week. It takes another way of thinking to see the benefits of investing in a dedicated Datawarehouse system. This investment is based on the premise that the intellectual process becomes better informed, so that strategic decisions can be made better.

    The users of the corporate computers understand that the key to identifying co-operative threats and opportunities lies locked in the corporate data, which is often embedded in legacy systems built on different technologies. Customers nowadays have increasingly individual needs. Companies have to react to these needs and have to know what the trends are or, better, what the trends will be. They need tools that can analyse data: the so-called decision support systems. These systems are fed by data that is managed in a Datawarehouse.

    In very many cases the introduction of a Datawarehouse leads to fundamental changes in the way the market, processes or correlations between events are looked at. This is driven by the policy of micro-segmentation on the basis of data patterns, which allows the enterprise to observe, over time, the behaviour of data and the corresponding behaviour of customers, processes or events. It is no longer sufficient to satisfy a customer or to monitor processes and events. The aim must be to delight the customer and to predict the outcomes of processes and events. The competition has to be beaten. It is not enough to keep up with the competition; you have to surprise them. The information needed to do so must be available and accessible.

    There are three basic assumptions to justify a Datawarehouse. The first is that, locked inside the corporate data, there are valuable patterns of information which are very important in guiding the business. The second is that this information will form the basis of unique services to customers, of discovering new trends and of predicting outcomes, in a manner that will transform the understanding which the company may have of the market. The third is the shortening of the distance between the identification of strategy and the execution of strategy. This will progressively transform the understanding which the company may have of its own organisational structure. Through developments in hardware and software it is now possible to create the IT architecture (the Datawarehouse) which can handle this huge amount of data.

    What is a Datawarehouse?

    The Datawarehouse may be defined in terms of nine characteristics which differentiate the Datawarehouse from the legacy systems in the company. To make these differences clear, let us first have a look at the main characteristics of an operational system.

    The operational or 'legacy' systems are optimised for their operational tasks. They are therefore characterised by:

    • Very good performance. The operational data is available as quickly as possible and is always available.
    • Very dynamic content. The main purpose is to update the systems with new data and to edit existing data.
    • Minor history. Most of the older data is archived off-line on tape or other media. The reason for this is to keep the operational environment clean, since historical data is not used there.
    • External inconsistency. The structure and the contents of the data are not always consistent between operational systems.
    • Historical inconsistency. Operational systems are adapted to the business needs. Often the consequences for the historical data are not taken into account.

    The Datawarehouse is designed for business analysis. It has a different character from the operational systems.

    Typical characteristics of Datawarehouse systems are:

    • Separate. The Datawarehouse is separate from the operational systems in the company. It gets its data out of these legacy systems.
    • Available. The task of a Datawarehouse is to make data accessible for the user.
    • Integrated. The basis of this integration is the standard company model.
    • A lot of history. Questions have to be answered, and trends and correlations have to be discovered. The data is therefore time-stamped and associated with defined periods of time.
    • Subject oriented. Most of the time oriented on the subject 'customer'.
    • Not dynamic. When the data is updated, this is done only periodically, not on an individual basis.
    • Aggregation performance. The data which is requested by the user has to perform well on all scales of aggregation.
    • Consistency. Consistency in the structure and contents of the data is very important and can only be guaranteed by the use of metadata; this is independent of the source and collection date of the data.
    • Iterative development. A Datawarehouse is the implementation of a concept. The Datawarehouse starts small and grows bigger and bigger. The starting point is a subject area. Implementation is done in an iterative process. Iterations are clearly defined projects, which are added, in a modular way, to the Datawarehouse. The big challenge in an iterative process is to guarantee the consistency of the structure and contents of the data; it makes the Datawarehouse live or die.

    The Datawarehouse is in essence a response to the problems and constraints that exist in information technology. Datawarehousing is an answer to the problems of integrating operational applications, modelling the data to corporate standards, failing to meet the reporting requirements of decision makers, and being unable to ensure that the data in corporate databases is clean and consistent.

    The Datawarehouse makes it possible to do on-line analytical processing (OLAP). OLAP systems are used by decision makers to query and analyse the data in the Datawarehouse. The data for analysis with OLAP is accessed through metadata which document data source, frequency of update and location of data.

    The outcome of queries is represented multidimensionally. A multidimensional database is a database where the data is structured as measures and dimensions. Measures are numerical data such as sales. Dimensions are the categories by which measures can be summarised, such as store, region, or state. The user can specify high-level or detailed views of the data, navigating through drill-downs in reports to finer levels of detail and analysing by product, location, and time.

    The data returned from the queries can be used to drill down. This allows the user to ask more detailed questions. For example, after identifying the road with the most accidents, the user can then search the Datawarehouse for information about weather conditions, number of vehicles per day, road surface, etc.
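
    The road accident example can be sketched in a few lines of Python. This is only an illustration of measures, dimensions and drill-down, not how an OLAP product works internally; the fact rows, the road names and the function names roll_up and drill_down are all invented for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: three dimensions (road, year, weather)
# and one measure (accidents). All values are made up.
FACTS = [
    {"road": "R1", "year": 1996, "weather": "rain", "accidents": 12},
    {"road": "R1", "year": 1996, "weather": "dry", "accidents": 5},
    {"road": "R1", "year": 1997, "weather": "rain", "accidents": 9},
    {"road": "R2", "year": 1996, "weather": "dry", "accidents": 4},
    {"road": "R2", "year": 1997, "weather": "fog", "accidents": 7},
]


def roll_up(facts, dimensions, measure):
    """Sum the measure for every combination of the given dimensions."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


def drill_down(facts, fixed, dimensions, measure):
    """Keep only rows matching `fixed`, then summarise by finer dimensions."""
    subset = [r for r in facts if all(r[k] == v for k, v in fixed.items())]
    return roll_up(subset, dimensions, measure)


# High-level view: total accidents per road.
by_road = roll_up(FACTS, ["road"], "accidents")

# Pick the road with the most accidents, then drill down by year and weather.
worst_road = max(by_road, key=by_road.get)[0]
detail = drill_down(FACTS, {"road": worst_road}, ["year", "weather"], "accidents")

print(by_road)
print(worst_road, detail)
```

    The same roll_up function serves every level of detail; only the list of dimensions changes, which is the essence of navigating a multidimensional view.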

    What is Spatial Datawarehouse?

    To be able to use spatial data, and to take full advantage of the spatial dimension, the locational element of the data has to be integrated in the Datawarehouse. The following GIS concepts are used and, with GIS technology, implemented in the organisation. There are four main items to be distinguished:

    • Geo-reference. Making objects without geo-references spatially enabled. Objects are connected to a model which represents the real world. This is done by giving their x- and y-co-ordinates in a co-ordinate system. For the object tree: a point object with one co-ordinate pair; for the object building: a polygon object with several co-ordinate pairs.
    • Geo-coding. Objects such as statistics are objects without a geo-reference. To make it possible to visualise them in the model of the real world, that is, to display them on a map, they have to be connected to co-ordinates.
    • Topology. Separate objects which have relations with each other in the "real world" must also have those relations in the model of the real world. An example is a road which is connected to another road.
    • Spatial aggregation. Depending on the demands on the Datawarehouse, a certain aggregation of the data will be necessary. Not all the detailed data has to be stored in the Datawarehouse. For instance, for distribution planning analyses it is not necessary to have all the addresses (streets and house numbers); only the names and locations of the streets are needed.
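
    As a rough sketch of geo-coding followed by spatial aggregation, the Python fragment below attaches co-ordinates to address records through a street lookup table and then drops the house numbers, keeping one row per street. The street names, co-ordinates, field names and function names are assumptions made for the illustration; a real geo-coding tool would use a maintained address reference file.

```python
from collections import defaultdict

# Hypothetical reference table: street name -> (x, y) in an assumed
# projected co-ordinate system with metre units. Values are made up.
STREET_COORDS = {
    "Main Street": (1200.0, 3400.0),
    "Harbour Road": (1850.0, 2900.0),
}

# Hypothetical operational records: full addresses, no co-ordinates.
DELIVERIES = [
    {"street": "Main Street", "house_no": 45, "parcels": 3},
    {"street": "Main Street", "house_no": 12, "parcels": 2},
    {"street": "Harbour Road", "house_no": 40, "parcels": 5},
    {"street": "Unknown Lane", "house_no": 1, "parcels": 1},
]


def geocode(records, lookup):
    """Attach co-ordinates to each record; collect those that cannot be matched."""
    matched, unmatched = [], []
    for rec in records:
        xy = lookup.get(rec["street"])
        if xy is None:
            unmatched.append(rec)
        else:
            matched.append({**rec, "x": xy[0], "y": xy[1]})
    return matched, unmatched


def aggregate_to_street(geocoded, measure):
    """Drop house numbers and keep one row per street with the summed measure."""
    totals = defaultdict(int)
    location = {}
    for rec in geocoded:
        totals[rec["street"]] += rec[measure]
        location[rec["street"]] = (rec["x"], rec["y"])
    return [
        {"street": s, "x": location[s][0], "y": location[s][1], measure: totals[s]}
        for s in totals
    ]


matched, unmatched = geocode(DELIVERIES, STREET_COORDS)
per_street = aggregate_to_street(matched, "parcels")
print(per_street)
print("not geocoded:", unmatched)
```

    Keeping the unmatched records in a separate list, rather than silently discarding them, makes the quality of the geo-coding step visible before the data is loaded.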

    Figure 1: Spatial Datawarehouse (spatially enabling the DW)


    Why a Spatial Datawarehouse?

    Making your Datawarehouse spatially enabled provides you with four distinct capabilities:

    • Presentation, visualisation. Data which has a geo-reference can be visualised on a map. Maps themselves are a powerful tool to visually relate seemingly disparate data. For instance, marketing, engineering and financial data can be combined in a single analysis on a map.
    • Analysing data. A spatial Datawarehouse makes it possible to perform spatial analyses, such as network analyses and overlay analyses.
    • Aggregation of data. This makes it possible to aggregate information to a geographic boundary. An example is to aggregate data of neighbourhoods to the city or to districts.
    • Spatial (re)organisation. Geographical boundaries change over time; for instance, communities are merged or split into new ones. Data that carries co-ordinates can be regrouped to the new boundaries.
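
    Aggregation to a boundary, and re-aggregation after a boundary change, can be illustrated with a simple point-in-polygon test. The sketch below uses the standard ray-casting technique; the district polygons, the sales points and the function names are invented, and a production system would rely on the spatial functions of its GIS or spatial database instead.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: toggle `inside` at each edge crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge crosses the horizontal line through y only if its
        # end points lie on different sides of that line.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_to_areas(points, areas, measure):
    """Sum the measure of every point into the first area that contains it."""
    totals = {name: 0 for name in areas}
    for p in points:
        for name, polygon in areas.items():
            if point_in_polygon(p["x"], p["y"], polygon):
                totals[name] += p[measure]
                break
    return totals


# Hypothetical boundaries before and after two districts are merged.
DISTRICTS_OLD = {
    "North": [(0, 5), (10, 5), (10, 10), (0, 10)],
    "South": [(0, 0), (10, 0), (10, 5), (0, 5)],
}
DISTRICTS_NEW = {"Merged": [(0, 0), (10, 0), (10, 10), (0, 10)]}

# Hypothetical geo-referenced sales records.
SALES = [
    {"x": 2, "y": 7, "sales": 100},
    {"x": 8, "y": 2, "sales": 250},
    {"x": 5, "y": 9, "sales": 50},
]

old_totals = aggregate_to_areas(SALES, DISTRICTS_OLD, "sales")
new_totals = aggregate_to_areas(SALES, DISTRICTS_NEW, "sales")
print(old_totals)
print(new_totals)
```

    Because the sales records store co-ordinates rather than a district code, the same records can be summed against either set of boundaries without being changed.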

    Spatially enabling adds a new dimension to your database. This dimension is the geographical one, which does not need to be explicitly defined. By storing the geographical co-ordinates in the database, objects can be queried on their interrelationships based on geography. When, for example, the locations of stores or information about competitors are stored in the Datawarehouse, the following questions can be asked:

    • Where are my competitors located?
    • What are the total sales per ZIP-code area, neighbourhood or county?
    • Which bus stops are nearby (within 100, 200 or 500 metres)?
    • Which stores have 250 customers living within 1, 3 or more than 5 kilometres?

    An infinite number of questions can be created when you want to geographically relate subjects in the Datawarehouse (customers, competitors, stores, stations, generators etc.) to subjects of geographical interests (towns, gas pipes, power lines, bus routes, streets).
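
    Questions of the "what is nearby" kind reduce to distance calculations on the stored co-ordinates. The following sketch assumes planar co-ordinates in metres and uses straight-line distance; the store, bus stop and customer locations and the function names are hypothetical.

```python
import math


def distance_m(a, b):
    """Straight-line distance between two (x, y) points in metres."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def names_within(origin, candidates, radius_m):
    """Names of all candidate locations within radius_m of the origin."""
    return sorted(
        name for name, xy in candidates.items()
        if distance_m(origin, xy) <= radius_m
    )


def count_within(origin, points, radius_m):
    """Number of points (for example customers) within radius_m of the origin."""
    return sum(1 for p in points if distance_m(origin, p) <= radius_m)


# Hypothetical locations in an assumed metre-based co-ordinate system.
STORE = (0.0, 0.0)
BUS_STOPS = {
    "Stop A": (100.0, 0.0),
    "Stop B": (0.0, 180.0),
    "Stop C": (300.0, 400.0),
}
CUSTOMERS = [(30.0, 40.0), (600.0, 800.0), (0.0, 900.0)]

# Which bus stops are within 100, 200 or 500 metres of the store?
nearby_stops = {r: names_within(STORE, BUS_STOPS, r) for r in (100, 200, 500)}

# How many customers live within 1 kilometre of the store?
customers_1km = count_within(STORE, CUSTOMERS, 1000)

print(nearby_stops)
print(customers_1km)
```

    Straight-line distance is the simplest choice; distance along the road network would need the topology described earlier.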

    Benefits and costs

    Building an architecture like a Datawarehouse means investing. It has to be made clear what the benefits are. In most cases the real benefits of the Datawarehouse are not known or even anticipated at the moment of construction. This is because the Datawarehouse is used in an entirely different way from the operational systems: it is used in a trial-and-error way of working. The decision support analyst cannot say what the possibilities and potential of the Datawarehouse are until the first version of the Datawarehouse is ready to use. The normal way of calculating the return on investment therefore cannot be used.

    Fortunately, the Datawarehouse is built in small steps. The first step (iteration) can be done quickly and for a relatively small amount of money. Once the first portion of the Datawarehouse is built and filled, the user can start to explore the possibilities. At that point it is possible to make a justification of the development costs of a Datawarehouse.

    The benefits of the Datawarehouse come from the ability to make effective decisions with it. The possibility of discovering trends and correlations as they happen is what provides the benefits to the business. It is not easy to quantify these benefits in order to justify the Datawarehouse. How much is saved by giving the decision makers an effective process for making critical decisions? Experience with the Datawarehouses implemented so far teaches us that organisations which have such an information technology architecture could no longer do without it.

    Spatial Datawarehouse advantages

    The advantages of using a Datawarehouse lie in a better understanding of the business, better service to the customer, a better understanding of the business risks, improvement of the business processes, and the ability to make more tailor-made products and services.

    The most important attraction of spatially enabling your Datawarehouse is being able to make dynamic geographic queries on your data, to aggregate your data to geographic areas, to analyse data, to (re)organise your data spatially and, last but not least, to present and visualise it.

    Logisterion has executed projects for a telecommunications company, public housing and the ministry of public works. Although these experiences are in some cases at an early stage, it is clear that there are several ways to look at a Datawarehouse in combination with a GIS. In all cases, we see many benefits of Datawarehousing with a spatial component, and its additional value to the business.

    References

    Inmon, W.H., Building the Datawarehouse. 1992.

    Kelly, S., Datawarehousing: the route to mass customization. 1997.

    Vermeij, L.W. & Berkel, J. van, GIS Mapping for the Data Warehouse. 1997.

    Drs. Jan van Berkel
    Consultant, Logisterion Automatisering
    Stationsplein 45
    3013 AK Rotterdam
    Netherlands
    Telephone: 00 31 10 217 07 00
    Fax: 00 31 10 413 96 93
    Email: berkel@logisterion.nl