Murillo, Alvaro

A GIS DATA MODEL PROTOTYPE

Abstract

The choice of a GIS data model is an important step in integrating large volumes of diverse data. A data model named IGMX was designed for INGEOMINAS as a means to integrate all the themes and to provide users with the ability to organize, update, and display spatial and attribute data according to thematic domain. The domains are geology, geophysics, mining, geoenvironmental engineering, samples, and wells, over a topographical base provided by the Colombian Geographical Institute (IGAC). IGMX consists of a "backbone" formed by the Observation Point, the Spatial Referential Plane, and the Mapping Terrain Unit, together with 210 relational tables. From the point of view of thematic domains, IGMX resembles a cylinder. This paper describes the IGMX thematic layers and gives a brief history of how they were developed.



INTRODUCTION



As the need for large and complex databases continues to grow, so does the need for simpler access to the data. Databases have become a key component in managing the voluminous data gathered by geological surveys. However, searching and analyzing these data can become very time-consuming and complicated.



Digital spatial databases consist of geometrical descriptions of entities, with their associated topological relationships, and attributes. The database design must meet the needs of the organization, be dynamic and compatible with existing data, and avoid duplication and redundancy.



Given this special kind of data and these requirements for the database design, the first task was to devise a strategy for constructing a functionally integrated and coherent data model that supports the development of a multipurpose geographical database. An incorrect or incomplete data model can cause conflicts. This paper gives an overview of the development of a framework for data modelling and a description of the GIS data model developed for the Research Institute for Earth Sciences, Mining and Chemistry (INGEOMINAS) in Colombia.



The strategy



For the past two years INGEOMINAS has been developing an integrated Geographic Information System (GIS). For this project we selected ArcInfo and Sybase in a SUN SPARCstation 10 environment.



After a thorough functional analysis of INGEOMINAS, it was found that the existing departmental structure was a big obstacle to the development of multidisciplinary projects. This departmentalization had subdivided information and knowledge. Because of this situation, a project was begun with the purpose of defining and developing an integrated GIS data model for the geoscientific activities of INGEOMINAS.



Initially a work team was created, consisting of professionals from INGEOMINAS's different scientific and technical areas: Geology, Geophysics, Chemistry, Geoenvironmental Engineering, and Mining. The work team comprised three geologists, two systems engineers, and an electronic engineering student.



As we lacked experience in developing a GIS data model, systematic meetings were established and supplemented by other activities such as seminars, ArcInfo courses, Sybase courses, database workshops, and surveys of GIS projects in private and governmental organizations. Of paramount interest to the work team was the EPICENTRE Data Model developed by POSC (Petrotechnical Open Software Corporation, Houston, TX) for the exploration and production activities of the oil industry. The design concepts of this data model are described in POSC (1993). After eighteen months, a GIS data model prototype has been developed through different agreements that ensured access to the database, hardware, software, and participation in courses.



Need for a GIS data model



There were several reasons why a GIS data model proved to be both necessary and important. The first was that a GIS can accommodate the multidisciplinary nature of the earth sciences and allow scientists to process and interrelate diverse data types.



Another reason for data modelling was that data volumes and integration requirements are being driven by the complexity of environmental and economic problems. A GIS becomes necessary to automate the manual process of gathering and analyzing the wide variety of data needed to make land-use and resource-management decisions.



An additional reason was that the organization of the data is a "major factor for successful use of a GIS" (Aronoff, 1989). This organization requires a model of how the phenomenon exists in the real world; without a model, one has only the sample data, and no inferences can be made about the behaviour of the real world. A data model provides a simple and clear means of organizing the data. For these reasons, it became obvious to INGEOMINAS that the choice of a GIS data model was an important step in producing and maintaining information to support its geoscientific activities.



METHODOLOGY



In developing the methodology, four guiding principles were followed. First, the GIS data model must accommodate all the types of data required in any INGEOMINAS earth science project. Second, individual investigators in the various technical areas must be able to work independently on their data using accepted standards, while also having access to supplementary data. Third, individual investigators must be able to work on the total data set. Last, but not least, the data model must be strictly implementation-independent, whatever the evolution of database and application technology.



Functional analysis



The analysis of data, processes, requirements, functions, information flows, and systems made up the main activities of the functional analysis. From this analysis, we recognized that data lie at the center of information systems and that the entities, or "things", used in an organization do not change very much, apart from the occasional addition of new entity types. For this reason, the data model had to be "open-ended" so that investigators could accommodate new information without restrictions.



Aiming at a conceptual, implementation-independent data model, the functional analysis was oriented toward defining the structure of the data. A decision was made to produce a Topic structure (see below) that would fit the major business activities of INGEOMINAS.



Topic structure



A Topic consists of a set of entities. Each entity must belong to one and only one Topic (theme or subject domain). Each Topic has an internal owner responsible for maintaining the model of the Topic's entities. A Topic may or may not coincide with the functional structure of INGEOMINAS.
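The Topic rules above can be sketched in Python as a small registry. This is a hypothetical illustration, not INGEOMINAS code: the class names and the owner strings are assumptions, and only the rule that each entity belongs to one and only one Topic is enforced.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    code: str                 # two-letter Topic code, e.g. "GL"
    name: str
    owner: str                # internal owner (hypothetical values below)
    entities: set[str] = field(default_factory=set)


class TopicRegistry:
    """Hypothetical registry enforcing one-and-only-one Topic per entity."""

    def __init__(self) -> None:
        self.topics: dict[str, Topic] = {}
        self._topic_of: dict[str, str] = {}   # entity name -> Topic code

    def add_topic(self, topic: Topic) -> None:
        self.topics[topic.code] = topic

    def assign(self, entity: str, code: str) -> None:
        if code not in self.topics:
            raise KeyError(f"unknown Topic {code!r}")
        current = self._topic_of.get(entity)
        if current is not None and current != code:
            raise ValueError(f"{entity!r} already belongs to Topic {current!r}")
        self._topic_of[entity] = code
        self.topics[code].entities.add(entity)

    def topic_of(self, entity: str) -> str:
        return self._topic_of[entity]


registry = TopicRegistry()
registry.add_topic(Topic("GL", "Geology", owner="geology owner"))
registry.add_topic(Topic("MU", "Sample", owner="sample owner"))
registry.assign("Lithology", "GL")
registry.assign("Sample Analysis", "MU")
try:
    registry.assign("Lithology", "MU")   # a second Topic for the same entity
except ValueError as exc:
    print("rejected:", exc)
```

Re-assigning an entity to the Topic it already belongs to is accepted, so repeated loads of the same entity list are harmless.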



In INGEOMINAS, the Topic work environment consists of Geology (GL), Geophysics (GF), Mining (MI), Geoenvironmental Engineering (IG), Sample (MU), Well (ST), and Location (LC). Table 1 shows the Topic structure.



The Well and Location Topics do not coincide with the functional structure of INGEOMINAS. In fact, location is managed by IGAC (Colombian Geographical Institute); for that reason, we model IGAC's products. Location is nevertheless included because topographical information is an intrinsic component of the geosciences.



Data model architectural definitions



POSC (1993) states that a key to successfully integrating all the Topics into one coherent data model is an architecture, or overall structure, that clarifies how similar issues are to be handled consistently in different areas. INGEOMINAS has followed that approach here; readers interested in further details can consult POSC (1993, 1994).









CODE  TOPIC                          DESCRIPTION
GL    GEOLOGY                        Information about the physical nature and history of the earth.
GF    GEOPHYSICS                     Information describing the earth's subsurface.
MI    MINING                         Information about groundwater and mineral deposits.
IG    GEOENVIRONMENTAL ENGINEERING   Information about the earth's physical environment.
MU    SAMPLE                         Data describing the identification and measurements of the properties of the earth's material.
ST    WELL                           Descriptive information about wells.
LC    LOCATION                       The description of position, as required for locating objects, in terms of coordinates or land subdivisions.

Table 1. Topic data model structure of INGEOMINAS.





A fundamental architectural definition of the data model was to model each Topic in terms of objects, activities, and properties. Objects are modules that reflect physical things of the real world that are essential and irreducible for the purposes of the Topic; an object is similar to an entity. Properties are the characteristics that objects may have. An activity is an action that may create values for the properties.



Here is how it works. In the Sample Topic one might have an entity for a Sample Analysis. One of its attributes might be Type of Analysis. The sample analysis belongs to the type of entities called objects, and the type of analysis belongs to the type of attributes called activities. The kinds of values obtained in each activity belong to the type of entities called properties. For example, one might state that the sample analysis (object) is petrography (activity) and that petrographical_texture (property) is clastic (value).
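The object/activity/property/value pattern in this example can be sketched with Python dataclasses. The class and method names are hypothetical; the sketch only shows one object (Sample Analysis) carrying an activity (petrography) whose recorded property (petrographical_texture) takes a value (clastic).

```python
from dataclasses import dataclass, field


@dataclass
class PropertyValue:
    prop: str    # property, e.g. "petrographical_texture"
    value: str   # value, e.g. "clastic"


@dataclass
class Activity:
    name: str    # activity, e.g. "petrography"
    values: list[PropertyValue] = field(default_factory=list)

    def record(self, prop: str, value: str) -> None:
        """Store a value that this activity produced for a property."""
        self.values.append(PropertyValue(prop, value))


@dataclass
class DataObject:
    name: str    # object, e.g. "Sample Analysis"
    activities: list[Activity] = field(default_factory=list)

    def perform(self, activity_name: str) -> Activity:
        activity = Activity(activity_name)
        self.activities.append(activity)
        return activity

    def describe(self) -> list[str]:
        return [f"{self.name} / {a.name} / {pv.prop} = {pv.value}"
                for a in self.activities for pv in a.values]


analysis = DataObject("Sample Analysis")
petrography = analysis.perform("petrography")
petrography.record("petrographical_texture", "clastic")
print(analysis.describe())
```

Keeping properties as recorded rows rather than fixed fields is one way to honour the "open-ended" requirement, since a new property needs no structural change; whether IGMX stores them this way is not stated here.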



Another fundamental principle involves spatial objects and their use. ArcInfo is used so that the spatial relationships among objects from different parts of the model can be analyzed. Each of the various geometrical objects in different parts of the model may be connected to the earth through a relationship with one or more generic spatial objects.



Entity diagram conventions



A number of conventions have been adopted for diagramming the model. Entities are shown as rounded-corner boxes, and relationships are illustrated by lines joining the two associated objects (Figure 1). Each box has a title that is usually the initials of the entity name; for example, the Observation Point box is titled PO.

Figure 1: Entity Diagram Convention

Each relationship verb is shown and placed nearest to the "from" entity. A relationship cardinality of many is indicated by a crow's foot symbol at the "to" entity; absence of this symbol implies a singular relationship (one-to-one).



Major objects



During the functional analysis, it was found that four major objects, as described by Murillo (1994a, 1994b), are common to geoscientific information (graphical and non-graphical): Observation Point (PO), Description Point (PD), Spatial Referential Plane (PR), and Mapping Terrain Unit (UT). These are the "backbone" of all other entities in the INGEOMINAS data model. The architecture of the major objects is shown in Figure 2.


Figure 2: Major Objects Architecture

An Observation Point (PO) is a Topic's point located in or on the earth. A Description Point (PD) is a point or set of points of interest in, or on the surface of, the earth that appears on maps. PDs are geo_objects that can be conceived as single locations or occurrences and represented as point features.



A Spatial Referential Plane (PR) is the Colombian grid system and/or land subdivisions (Departamento, Municipio, etc.). The local referential plane is defined by IGAC (Colombian Geographical Institute). In terms of the Colombian grid system (scale 1:100,000), PR is a polygon with 89 nodes enclosing Colombia's continental boundary.



A Mapping Terrain Unit (UT) is a closed surface object having a homogeneous characteristic that differs from its surroundings. UTs are topological elements such as flood zones, geological formations, and geomorphological units. They are the most basic two-dimensional geo_objects; combined by the geoscientist, they produce geological or thematic maps.

 

A more detailed reading of the Major Objects diagram (Figure 2) gives:



Each PO may have one or more PDs.

Each PD must be about one and only one PO.



Each PR may have one or more POs.

Each PO must be about one and only one PR.



Each PR may have one or more UTs.

Each UT must be about one and only one PR.
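One way to read these sentences relationally is sketched below with Python's built-in sqlite3 module standing in for Sybase. Table and column names are assumptions, not the published IGMX schema. Each "must be about one and only one" clause becomes a NOT NULL foreign key on the child table; the "may have one or more" side needs no column, because many child rows may point at the same parent.

```python
import sqlite3

# Hypothetical DDL: the paper does not publish the IGMX table definitions.
SCHEMA = """
CREATE TABLE pr (                 -- Spatial Referential Plane
    pr_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL
);
CREATE TABLE po (                 -- Observation Point: about one and only one PR
    po_id INTEGER PRIMARY KEY,
    pr_id INTEGER NOT NULL REFERENCES pr(pr_id)
);
CREATE TABLE pd (                 -- Description Point: about one and only one PO
    pd_id INTEGER PRIMARY KEY,
    po_id INTEGER NOT NULL REFERENCES po(po_id)
);
CREATE TABLE ut (                 -- Mapping Terrain Unit: about one and only one PR
    ut_id INTEGER PRIMARY KEY,
    pr_id INTEGER NOT NULL REFERENCES pr(pr_id)
);
"""


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    return conn


conn = connect()
conn.execute("INSERT INTO pr VALUES (1, 'Colombian grid 1:100000')")
conn.execute("INSERT INTO po VALUES (10, 1)")
conn.executemany("INSERT INTO pd VALUES (?, ?)", [(100, 10), (101, 10)])  # one PO, many PDs

try:
    conn.execute("INSERT INTO pd VALUES (102, 99)")   # PO 99 does not exist
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```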



The purpose of the Major Objects is to specify entity classes that carry useful inheritable characteristics, so that these characteristics can be used without reference to a specific, specialized Topic.



THE IGMX DATA MODEL



IGMX is the name of the INGEOMINAS GIS data model. It is formed by the Major Objects and the Topic schemes (see below), which are the sets of entities essential to describing each Topic.



Topic schemes



Topic schemes have been developed to handle a wide variety of objects and yet remain interoperable. To achieve these goals, all the schemes have been designed as a tightly constructed, Topic-driven model that includes the concept of precedence (sequence).



The architecture of the IGMX Data Model includes the seven Topics that play a major role in INGEOMINAS (see Topic structure). Only the Geology scheme is illustrated here (Figure 3). It consists of 24 geological objects that are related to PD, PR, and UT.


Figure 3: Geologic Scheme

PD has three geological objects: DN, unconsolidated deposit; EL, local structural data; and LI, lithology. Lithology has seven further entities (not illustrated). The geoscientist describes each object at each geological outcrop examined during the field campaign. PD has a relationship cardinality of many with each object.



PR has four geological objects: LM, lineament; PL, fold; FG, geological fault; and ER, regional structural data. Fault has one additional entity. All these objects have been modelled as lines along geological interfaces, such as the earth's surface. These lines are represented by arc features in ArcInfo. PR has a relationship cardinality of many with each object.



UT has two geological objects: UCG, geological mapping unit; and UCQ, geochemical mapping unit. The first is characterized by five entities (not illustrated here). Mappable units are the most basic two-dimensional geological objects and are derived from the geologist's personal interpretation. Each UT must be about one and only one object.
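The Geology scheme's grouping of objects under PD, PR, and UT can be sketched as table definitions hung from the three Major Objects. This is an assumption-laden illustration using sqlite3: table names reuse the paper's codes, the columns are hypothetical, and the sub-entities of LI, FG, and UCG are omitted. Because the text states that each UT is about one and only one object, the UT foreign keys are declared UNIQUE; that limits each UT to one row per mapping-unit table but does not, by itself, stop a UT from appearing in both UCG and UCQ.

```python
import sqlite3

# Geological objects grouped by the Major Object they relate to (codes from the text).
GEOLOGY_SCHEME = {
    "pd": ["dn", "el", "li"],        # unconsolidated deposit, local structural data, lithology
    "pr": ["lm", "pl", "fg", "er"],  # lineament, fold, geological fault, regional structural data
    "ut": ["ucg", "ucq"],            # geological and geochemical mapping units
}


def build(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")
    for major in GEOLOGY_SCHEME:
        # Major Object tables reduced to their keys for this sketch.
        conn.execute(f"CREATE TABLE {major} ({major}_id INTEGER PRIMARY KEY)")
    for major, objects in GEOLOGY_SCHEME.items():
        # PD and PR: many object rows per instance; UT: at most one row per table.
        unique = " UNIQUE" if major == "ut" else ""
        for obj in objects:
            conn.execute(
                f"CREATE TABLE {obj} ("
                f"{obj}_id INTEGER PRIMARY KEY, "
                f"{major}_id INTEGER NOT NULL{unique} REFERENCES {major}({major}_id), "
                f"description TEXT)"
            )


conn = sqlite3.connect(":memory:")
build(conn)
conn.execute("INSERT INTO pd VALUES (1)")
conn.executemany("INSERT INTO li (pd_id, description) VALUES (?, ?)",
                 [(1, "sandstone"), (1, "shale")])   # many lithologies at one PD
conn.execute("INSERT INTO ut VALUES (7)")
conn.execute("INSERT INTO ucg (ut_id, description) VALUES (7, 'unit A')")
```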



Entities and attributes identification



The conversion from the conceptual model to a relational projection followed the rules described in Barker's book Case*Method, Entity Relationship Modelling (1990). According to Barker, entities generally represent conceptual or real-world things that are to be described by a common set of characteristics. An attribute specifies a characteristic of an entity; attributes may be referred to by name and are defined in terms of a domain. A relationship links the basic entities together. Generally, an entity becomes a table, an attribute becomes a column, and a relationship becomes an additional set of columns, or foreign keys.
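Barker's three conversion rules can be illustrated with a small DDL generator. The sketch below is hypothetical: the Entity and Attribute classes, the use of SQL type names to stand in for attribute domains, and the x/y coordinate attributes on PO are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    domain: str   # an SQL type standing in for the attribute's domain


@dataclass
class Entity:
    name: str
    attributes: list[Attribute] = field(default_factory=list)
    # Entities this one "must be about one and only one" of.
    parents: list[str] = field(default_factory=list)


def to_ddl(entity: Entity) -> str:
    """Entity -> table, attribute -> column, relationship -> foreign-key column."""
    cols = [f"{entity.name}_id INTEGER PRIMARY KEY"]
    cols += [f"{a.name} {a.domain}" for a in entity.attributes]
    cols += [f"{p}_id INTEGER NOT NULL REFERENCES {p}({p}_id)" for p in entity.parents]
    return f"CREATE TABLE {entity.name} (\n    " + ",\n    ".join(cols) + "\n);"


po = Entity("po", [Attribute("x", "REAL"), Attribute("y", "REAL")], parents=["pr"])
print(to_ddl(po))
```

The relationship to PR surfaces only as the pr_id column, which is exactly the "additional set of columns, or foreign keys" described above.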



A specific set of properties (attributes) was selected for each object according to the Topic. These attributes were organized in tabular form directly associated with the spatial data, and the attribute tables are related to each other by common keys. The structure of the tables for the Major Objects is shown in Figure 4.


Figure 4

The database was developed as part of a logical database design from which to build the physical database architecture and establish corporate metadata (Nyerges, 1989; Federal Geographic Data Committee, 1994). The physical model is being implemented with the Sybase RDBMS (Sybase, 1991).



A GIS DATA MODEL



Integration of GIS on an enterprise basis is an effective way of capturing the benefits that this technology offers institutions (Abel et al., 1994). There are three levels of integration: organizational, functional, and data (Bayham and Leppert, 1991). Our emphasis is on the integration of geoscientific data throughout the organization.



The model of an integrated GIS for INGEOMINAS was based on the Major Objects. Using that model as the data path between the other components reduced the integration effort. The integrated model is illustrated in Figure 5 using the Geology, Geophysics, and Location schemes. The Topic order shown in this figure is arbitrary, because the Topics can be arranged according to the type of query users make.
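The role of the Major Objects as the shared data path can be sketched as a cross-Topic query: a geology object and a geophysics object, each hung from Description Points, are joined through the Observation Point they share. This uses sqlite3 with hypothetical tables; the geophysics table (gf_reading) and all rows are invented for illustration, since the paper does not name its geophysical objects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE po (po_id INTEGER PRIMARY KEY, x REAL, y REAL);      -- Observation Point
CREATE TABLE pd (pd_id INTEGER PRIMARY KEY,
                 po_id INTEGER NOT NULL REFERENCES po(po_id));    -- Description Point
CREATE TABLE li (li_id INTEGER PRIMARY KEY,                       -- Geology: lithology
                 pd_id INTEGER NOT NULL REFERENCES pd(pd_id), rock TEXT);
CREATE TABLE gf_reading (gf_id INTEGER PRIMARY KEY,               -- Geophysics (hypothetical)
                 pd_id INTEGER NOT NULL REFERENCES pd(pd_id), method TEXT, reading REAL);

-- Made-up rows: PO 1 has both a lithology and a reading, PO 2 only a lithology.
INSERT INTO po VALUES (1, 0.0, 0.0), (2, 1.0, 1.0);
INSERT INTO pd VALUES (11, 1), (12, 1), (21, 2);
INSERT INTO li VALUES (1, 11, 'sandstone'), (2, 21, 'basalt');
INSERT INTO gf_reading VALUES (1, 12, 'resistivity', 120.0);
""")

# Geology -> PD -> PO <- PD <- Geophysics: the Major Objects form the join path.
CROSS_TOPIC = """
SELECT po.po_id, li.rock, g.method, g.reading
FROM po
JOIN pd AS pd_geo    ON pd_geo.po_id = po.po_id
JOIN li              ON li.pd_id = pd_geo.pd_id
JOIN pd AS pd_gf     ON pd_gf.po_id = po.po_id
JOIN gf_reading AS g ON g.pd_id = pd_gf.pd_id
ORDER BY po.po_id
"""
rows = conn.execute(CROSS_TOPIC).fetchall()
print(rows)
```

Starting the joins from the geophysics table instead returns the same rows, which mirrors the point that the Topic order is arbitrary when every Topic connects through the same Major Objects.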


Figure 5

The IGMX cylinder



The IGMX Data Model is represented by a cylinder (Figure 6) or, more exactly, a revolving door with an axis and three hinges formed by the Observation Point (PO), the Spatial Referential Plane (PR), and the Mapping Terrain Unit (UT). IGMX has seven planes, one for each Topic, and its objects can be seen arranged around the axis like the leaves of a revolving door.


Figure 6

The data model prototype



A data model prototype is a representation of a data model that will be developed. Prototyping helps build a working model with fewer errors that can be changed continually as the model is refined (Lucas, 1985; Peuquet and Bacastow, 1991; Abel et al., 1992). Its purpose is to involve users in determining the data requirements of the organization.



The data model prototype, named IGMX, was implemented at INGEOMINAS. It adopted the generic Major Objects, as shown in Figure 3 for Geology. Under this structure, a particular data set is viewed as a member of one Major Object. The prototype is in fact a set of entities and relationships with exclusive group memberships. However, a data set might be a member of two Major Objects, defined by derivation from a certain type of information content.



The prototype was designed in 1994 by INGEOMINAS to support the activities of earth scientists, particularly in geology, geophysics, geoenvironmental engineering, mining, wells, samples, and location. Their needs span information retrieval and data display, spatial and non-spatial analysis, and modelling.



IGMX includes 210 objects; its structure is shown in Figure 7 for only 22 of them. Three groups of relations are present, one for each Major Object. Each Major Object has a one-to-many relationship with each of its objects.


Figure 7

CONCLUSIONS



This paper has shown that a data model architecture based on objects, activities, and properties gives both data modellers and users more scope to represent their ideas about the real world for the management of GIS.



It has also been shown that a GIS data model prototype based on Topics and the Major Objects is a solution for developing a model with spatial and non-spatial data under the same architecture.



From the point of view of the Topics, the IGMX data model resembles a cylinder or, more exactly, a revolving door consisting of seven vanes (the Topics) hung on a central axle (the Major Objects). IGMX can also be seen as a multiuser database implemented over a client/server architecture.

 

This model offers a framework to describe 2-D information. The capacity to manipulate and analyze 3-D bodies is the next logical step to be made available in the IGMX data model. A final step to be considered in the future is the design of a GIS database chip.



ACKNOWLEDGEMENTS



I gratefully acknowledge the contribution of the GIS work team: Flor Marina Rocha, Orlando Hernandez, Germán Vargas, Luz Clemencia Valencia, and Hugo Forero Onofre. I gratefully thank Dr. Luis Vergara for his helpful comments and suggestions.



REFERENCES



Abel, D. J., Yap, S. K., Ackland, R., Cameron, M. A., Smith, D. F., and Walker, G. 1992. Environmental decision support system project: an exploration of alternative architectures for geographical information systems. Int. J. Geographical Information Systems, Vol. 6, No. 3, p. 193-204.



Aronoff, S. 1989. Geographic Information Systems: A management perspective. WDL Publications, Ottawa, Canada. 294p.



Barker, R. 1990. Case*Method, Entity Relationship Modelling. Addison-Wesley Pub., Reading, Mass. 237p.

 

Bayham, W. and Leppert, C. 1991. A database design strategy for integrating GIS within a large municipality. Proceedings of the 11th Annual ESRI User Conference, p. 599-614.



Federal Geographic Data Committee. 1994. Content standards for digital geospatial metadata. FGDC Secretariat, Reston, Va. 54p.



Lucas, Jr., H. C. 1985. The analysis, design, and implementation of information systems. 3rd ed. McGraw-Hill, New York. 495p.



Murillo, A. 1994a. Aplicaciones de un SIG en Geociencias. Seminario Int. Aplicaciones de los Sistemas de Información Geografica y Sensores Remotos en el Manejo de los Recursos Naturales. Instituto Geografico "Agustín Codazzi", Santafe de Bogotá, Colombia, August 24 and 26. 20p.



Murillo, A. 1994b. Un Sistema de Información Georreferenciada (SIG): Una Aplicación a las Areas Mineras. In Memorias IX Seminario Nacional de Ciencias y Tecnologías del Mar and V Congreso Latinoamericano en Ciencias del Mar. Universidad EAFIT, Medellín, Colombia, November 21 and 25. 15p.



Nyerges, T. L. 1989. Schema integration analysis for the development of GIS databases. Int. J. Geographical Information Systems, Vol. 3, No. 2, p. 153-183.



POSC. 1993. Software Integration Platform Specification EPICENTRE Data Model, version 1.0, vol. I. PTR Prentice-Hall Inc., New Jersey. 731p.



POSC. 1994. Software Integration Platform, version 2.0 (Snapshot). Petrotechnical Open Software Corp., Houston.



Peuquet, D. J. and Bacastow, T. 1991. Organizational issues in the development of geographical information systems: a case study of U. S. Army topographic information automation. Int. J. Geographical Information Systems, Vol. 5, No. 3, p. 305-319.



Mittra, S. S. 1991. Principles of relational database systems. Prentice-Hall Inc., New Jersey.



SYBASE. 1991. Fast Track to Sybase. Student Guide. Sybase, Inc. Product No. 5811. 267p.


Murillo, Alvaro
GIS Project Leader
INGEOMINAS
Diagonal 53 No. 34-53
Santafe de Bogota, D.C.
Telephone: 9057 1 2221811
FAX: 9057 1 2220797
e-mail: ingeomin@cdcnet.uniandes.edu.co