Murillo, Alvaro
Abstract
Choosing a GIS data model is an important step in integrating large volumes of diverse data. A data model named IGMX was designed for INGEOMINAS to integrate all the themes and give the user the ability to organize, update and display spatial and attribute data according to the thematic domain. The domains are geology, geophysics, mining, geoenvironmental engineering, samples and wells, over a topographical base provided by the Colombian Geographical Institute (IGAC). The IGMX consists of a "backbone", formed by the Observation Point, the Spatial Referential Plane, and the Mapping Terrain Unit, and 210 relational tables. From the point of view of thematic domains, the IGMX resembles a cylinder. This paper describes the IGMX thematic layers and gives a brief history of how they were developed.
INTRODUCTION
As the need for large and complex databases continues to grow, so does the need for simpler access to the data. Databases have become a key component in managing the voluminous data gathered by geological surveys. However, searching and analyzing these data can become very time-consuming and complicated. Digital spatial databases consist of geometrical descriptions of entities, with their associated topological relationships and attributes. The database design must meet the needs of the organization, be dynamic and compatible with existing data, and avoid duplication and redundancy. Given this special kind of data and these design requirements, the first task was to devise a strategy for constructing a functionally integrated and coherent data model that supports the development of a multi-purpose geographical database. An incorrect or incomplete data model can cause conflicts.
This paper gives an overview of the development of a framework for data modelling and describes the GIS data model developed for the Research Institute for Earth Sciences, Mining and Chemistry (INGEOMINAS) in Colombia.
The strategy
For the past two years INGEOMINAS has been developing an integrated Geographic Information System (GIS). For this project we selected ArcInfo and Sybase in a Sun SPARCstation 10 environment. A thorough functional analysis of INGEOMINAS found that the existing departmental structure was a major obstacle to the development of multidisciplinary projects, because departmentalization had subdivided information and knowledge. A project was therefore begun to define and develop an integrated GIS data model for the geoscientific activities of INGEOMINAS. Initially a work team was created, consisting of professionals from the different scientific-technical areas of INGEOMINAS: Geology, Geophysics, Chemistry, Geoenvironmental Engineering and Mining. The work team comprised three geologists, two systems engineers and an electronic engineering student. As we lacked experience in developing a GIS data model, regular meetings were established and supplemented with other activities, such as seminars, ArcInfo courses, Sybase courses, database workshops, and surveys of GIS projects in private and governmental organizations. Of paramount interest to the work team was the EPICENTRE Data Model developed by POSC (Petrotechnical Open Software Corporation, Houston, TX) for the exploration and production activities of the oil industry. The design concepts of this data model are described in POSC (1993). After eighteen months, a GIS data model prototype has been developed through various agreements that ensured access to the database, hardware, software and participation in courses.
Need for a GIS data model
There were several reasons why a GIS data model proved to be both necessary and important. The first was that a GIS can accommodate the multidisciplinary nature of the earth sciences and allow scientists to process and interrelate diverse data types. Another was that data volumes and integration requirements are being driven by the complexity of environmental and economic problems. A GIS becomes necessary to automate the manual process of gathering and analyzing the wide variety of data needed to make land-use and resource-management decisions. An additional reason was that the organization of the data is a "major factor for successful use of a GIS" (Aronoff, 1989). This organization requires a model of how the phenomenon exists in the real world; without a model, one has only the sample data, and no inferences can be made about the behaviour of the real world. A data model provides the means of making this organization simple and clear. For these reasons, it became obvious to INGEOMINAS that the choice of a GIS data model was an important step in producing and maintaining the information that supports its geoscientific activities.
METHODOLOGY
In developing the methodology, four guiding principles were followed. First, the GIS data model must accommodate all the types of data required in any earth science project of INGEOMINAS. Second, individual investigators in the various technical areas must be able to work independently on their data using accepted standards, while also having access to supplementary data. Third, individual investigators must be able to work on the total data set. Last, but not least, the data model must be strictly implementation-independent, whatever the evolution of database and computer technology.
Functional analysis
The main activities during the functional analysis were the analysis of data, processes, requirements, functions, information flows, and systems.
From this analysis, we recognized that data lie at the center of information systems and that the entities or "things" used in an organization do not change very much, except for the occasional addition of new entity types. For this reason, the data model had to be open-ended, so that investigators could accommodate new information without restrictions. To achieve a conceptual, implementation-independent data model, the functional analysis was oriented towards defining the structure of the data. A decision was made to produce a Topic structure (see below) that would fit the major business activities of INGEOMINAS.
Topic structure
A Topic consists of a set of entities. Each entity must belong to one and only one theme or subject domain. Each Topic has an internal owner responsible for maintaining the model of the Topic's entities. A Topic may or may not coincide with the functional structure of INGEOMINAS. In INGEOMINAS, the Topic work environment consists of Geology (GL), Geophysics (GP), Mining (MI), Geoenvironmental Engineering (IG), Sample (MU), Well (SX), and Location (LC). Table 1 shows the Topic structure. The Well and Location Topics do not coincide with the functional structure of INGEOMINAS. In fact, location is under the management of IGAC (Colombian Geographical Institute); we model its products because topographical information is an intrinsic component of the geosciences.
Data model architectural definitions
POSC (1993) states that a key to successfully integrating all the topics into one coherent data model is an architecture, or overall structure, that clarifies how similar issues are to be handled consistently in different areas. INGEOMINAS has followed that approach here; the reader interested in further details can consult POSC (1993, 1994).
CODE  TOPIC                          DESCRIPTION
GL    GEOLOGY                        Information about the physical nature and history of the earth.
GF    GEOPHYSICS                     Information describing the earth's subsurface.
MI    MINING                         Information about groundwater and mineral deposits.
IG    GEOENVIRONMENTAL ENGINEERING   Information about the earth's physical environment.
MU    SAMPLE                         Data describing the identification and measurement of the properties of the earth's material.
ST    WELL                           Descriptive information about wells.
LC    LOCATION                       The description of position, as required for locating objects, in terms of coordinates or land subdivisions.
Table 1. Topic data model structure of INGEOMINAS.
A fundamental architectural definition of the data model was to model each Topic in terms of objects, activities, and properties. Objects are modules that reflect physical things of the real world that are essential and irreducible for the purposes of the Topic. An object is similar to an entity. Properties are the characteristics that objects may have. An activity is an action that may create values for the properties. Here is how it works. In the Sample Topic one might have an entity for a Sample Analysis. One of its attributes might be Type of Analysis. The sample analysis belongs to the type of entities called objects, and the type of analysis belongs to the type of attributes called activities. The types of values obtained in each activity belong to the type of entities called properties. For example, one might state that the sample analysis (object) is petrography (activity) and petrographical_texture (property) is clastic (value). Another fundamental principle involves spatial objects and their use. We use ArcInfo to enable the spatial relationships of objects from different parts of the model to be analyzed. Each of the various geometrical objects in different parts of the model may be connected to the earth through relationships with one or more generic spatial objects.
Entity diagram conventions
A number of conventions have been adopted for diagramming the model. Entities are shown as rounded-corner boxes, and relationships are illustrated by lines joining the two associated objects (Figure 1).
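Returning to the object, activity, and property definitions, the Sample Analysis example can be sketched in a few lines of Python. This is only an illustrative reading of the text, not the IGMX implementation: the class names (TopicObject, Activity, PropertyValue) and the statements method are assumptions introduced here, while the values come from the petrography example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PropertyValue:
    """A property of an object together with the value recorded for it."""
    name: str
    value: str


@dataclass
class Activity:
    """An action that may create values for an object's properties."""
    name: str
    results: List[PropertyValue] = field(default_factory=list)


@dataclass
class TopicObject:
    """A real-world thing that is essential to a Topic (hypothetical class)."""
    topic: str
    name: str
    activities: List[Activity] = field(default_factory=list)

    def statements(self) -> List[str]:
        """Render one sentence per recorded property value."""
        return [
            f"{self.name} (object) is {act.name} (activity) and "
            f"{res.name} (property) is {res.value} (value)"
            for act in self.activities
            for res in act.results
        ]


def sample_analysis_example() -> TopicObject:
    """Build the petrography example given in the text."""
    texture = PropertyValue(name="petrographical_texture", value="clastic")
    petrography = Activity(name="petrography", results=[texture])
    return TopicObject(topic="Sample", name="sample analysis",
                       activities=[petrography])
```

Calling sample_analysis_example().statements() yields a single sentence that mirrors the wording of the example in the text.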
Each box has a title that is usually the initial letters of the name; for example, PO for Observation Point.
Each relationship verb is shown and placed nearest to the "from" entity. A relationship cardinality of many is indicated by a crow's foot symbol at the "to" entity; its absence implies a singular (one-to-one) relationship.
Major objects
During the functional analysis, it was found that four major objects, as described by Murillo (1994a, 1994b), are common to geoscientific information (graphical and non-graphical): Observation Point (PO), Description Point (PD), Spatial Referential Plane (PR), and Mapping Terrain Unit (UT). These are the "backbone" of all other entities in the INGEOMINAS data model. The architecture of the major objects is shown in Figure 2.
An Observation Point (PO) is a Topic's point located in or on the earth. A Description Point (PD) is a point or set of points of interest in, or on the surface of, the earth, which appears on maps. The PDs are geo_objects that can be conceived as single locations or occurrences and represented as point features. A Spatial Referential Plane (PR) is the Colombian grid system and/or land subdivisions (Departamento, Municipio, etc.). The local referential plane is defined by IGAC (Colombian Geographical Institute). In terms of the Colombian grid system (scale 1:100000), PR is a polygon with 89 nodes enclosing Colombia's continental boundary. A Mapping Terrain Unit (UT) is a closed surface object having a homogeneous characteristic different from its surroundings. The UTs are topological elements, such as flood zones, geological formations, geomorphological units, etc. They are the most basic two-dimensional geo_objects. Their combination, as determined by the geoscientist, produces the geological or thematic maps. Taking a more detailed view of the Major Objects diagram (Figure 2), we read: Each PO may have one or more PDs. Each PD must be about one and only one PO. Each PR may have one or more POs. Each PO must be about one and only one PR. Each PR may have one or more UTs. Each UT must be about one and only one PR. The purpose of the Major Objects is to specify entity classes that carry useful inheritable characteristics, enabling these characteristics to be used without reference to a specific, specialized Topic.
THE IGMX DATA MODEL
IGMX is the name of the INGEOMINAS GIS data model. It is formed by the Major Objects and the Topic schemes (see below), which are the sets of entities essential to describing a Topic.
Topic schemes
Topic schemes have been developed to handle a wide variety of objects and yet remain interoperable.
To achieve these goals, all the schemes have been designed as a tightly constructed, Topic-driven model, which includes the concept of precedence (sequence). The architecture of the IGMX Data Model includes the seven Topics that play a major role in INGEOMINAS (see Topic structure). Only the Geology scheme is illustrated in the next section (Figure 3). It consists of 24 geological objects that are related to PD, PR, and UT.
PD has three geological objects: DN, unconsolidated deposit; EL, local structural data; and LI, lithology. The last has seven entities (not illustrated). The geoscientist describes each object at each geological outcrop examined during the campaign. PD has a relationship cardinality of many with each object. PR has four geological objects: LM, lineament; PL, fold; FG, geological fault; and ER, regional structural data. Fault has one further entity. All these objects have been modelled as lines along geological interfaces, such as the earth's surface. These lines are represented by arc features in ArcInfo. PR has a relationship cardinality of many with each object. UT has two geological objects: UCG, geological mapping unit; and UCQ, geochemical mapping unit. The first is characterized by five entities (not illustrated here). Mappable units are the most basic two-dimensional geological objects, and they are derived from the geologist's personal interpretation. Each UT must be about one and only one object.
Entities and attributes identification
The conversion from the conceptual model to a relational projection followed the rules described in Barker's book Entity Relationship Modelling (1990). This book states that, in general, entities represent conceptual or real-world things that are described by a common set of characteristics. An attribute specifies a characteristic of an entity. Attributes may be referred to by name and are defined in terms of a domain. A relationship links the basic entities together. It also states that, in general, an entity becomes a table, an attribute becomes a column, and a relationship becomes an additional set of columns or foreign keys. A specific set of properties (attributes) was selected for each object according to the Topic. They were organized in tabular form and associated directly with the spatial data. The attribute tables are related to each other by common keys.
The structure of the tables is shown in Figure 4 for the Major Objects.
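As a rough illustration of the entity-to-table rule applied to the Major Objects, the following sketch uses Python's built-in sqlite3 module in place of Sybase. The table layout and column names (pr_id, po_id, label, and so on) are assumptions made for illustration and are not taken from the IGMX tables in Figure 4; only the one-to-many relationships between PR, PO, PD, and UT follow the text.

```python
import sqlite3

# Hypothetical schema: one table per Major Object, one foreign key per
# "must be about one and only one" relationship stated in the text.
SCHEMA = """
CREATE TABLE pr (
    pr_id INTEGER PRIMARY KEY,
    name  TEXT NOT NULL
);
CREATE TABLE po (
    po_id INTEGER PRIMARY KEY,
    pr_id INTEGER NOT NULL REFERENCES pr(pr_id),  -- each PO is about one PR
    topic TEXT NOT NULL
);
CREATE TABLE pd (
    pd_id INTEGER PRIMARY KEY,
    po_id INTEGER NOT NULL REFERENCES po(po_id),  -- each PD is about one PO
    label TEXT
);
CREATE TABLE ut (
    ut_id INTEGER PRIMARY KEY,
    pr_id INTEGER NOT NULL REFERENCES pr(pr_id),  -- each UT is about one PR
    label TEXT
);
"""


def build_database() -> sqlite3.Connection:
    """Create an in-memory database with foreign-key checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def descriptions_in_plane(conn: sqlite3.Connection, pr_id: int) -> list:
    """Return the labels of all PDs whose PO lies in the given PR."""
    rows = conn.execute(
        "SELECT pd.label FROM pd "
        "JOIN po ON pd.po_id = po.po_id "
        "WHERE po.pr_id = ? ORDER BY pd.pd_id",
        (pr_id,),
    )
    return [label for (label,) in rows]
```

The common keys do the work here: a Description Point is reached from a Spatial Referential Plane by joining through the Observation Point, and inserting a PD that refers to a non-existent PO is rejected by the foreign-key constraint.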
The database was developed as part of a logical database design from which to build the physical database architecture and establish corporate metadata (Nyerges, 1989; Federal Geographic Data Committee, 1994). The physical model is being implemented with the Sybase RDBMS (Sybase, 1991).
A GIS DATA MODEL
Integration of GIS on an enterprise basis is an effective way of capturing the benefits that this technology has to offer institutions (Abel et al., 1994). There are three levels of integration: organizational, functional and data (Bayham and Leppert, 1991). Our emphasis is on the integration of geoscientific data throughout the organization. The model of an integrated GIS for INGEOMINAS was based on the Major Objects. The integration effort was reduced by using that model with a data path between the other components. The integrated model is illustrated in Figure 5 using the Geology, Geophysics, and Location schemes. The Topic order shown in this figure is arbitrary, because the Topics can be arranged according to the type of query made by the users.
The IGMX cylinder
The IGMX Data Model was represented by a cylinder (Figure 6) or, more exactly, as a revolving door with an axis and three hinges formed by the Observation Point (PO), the Spatial Referential Plane (PR), and the Mapping Terrain Unit (UT). IGMX has seven planes, one for each Topic, and its objects are arranged around the axis like the leaves of a revolving door.
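One way to picture the revolving door is as an index that can be read either by Topic (one leaf) or by Major Object (one hinge across all leaves). The sketch below is a hypothetical illustration, not part of IGMX: the class IGMXIndex and its methods are invented here, and only the Geology scheme is populated, using the object codes listed for PD, PR, and UT in the Geology scheme description.

```python
from typing import Dict, List, Tuple

TOPICS = ("Geology", "Geophysics", "Mining", "Geoenvironmental Engineering",
          "Sample", "Well", "Location")
MAJOR_OBJECTS = ("PO", "PD", "PR", "UT")


class IGMXIndex:
    """Hypothetical index of Topic objects attached to Major Objects."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, str, str]] = []

    def register(self, topic: str, major_object: str, code: str) -> None:
        """Attach an object code to a Major Object within a Topic."""
        if topic not in TOPICS:
            raise ValueError(f"unknown Topic: {topic}")
        if major_object not in MAJOR_OBJECTS:
            raise ValueError(f"unknown Major Object: {major_object}")
        self._entries.append((topic, major_object, code))

    def leaf(self, topic: str) -> Dict[str, List[str]]:
        """View one Topic: object codes grouped by Major Object."""
        view: Dict[str, List[str]] = {}
        for entry_topic, major_object, code in self._entries:
            if entry_topic == topic:
                view.setdefault(major_object, []).append(code)
        return view

    def hinge(self, major_object: str) -> Dict[str, List[str]]:
        """View one Major Object: object codes grouped by Topic."""
        view: Dict[str, List[str]] = {}
        for topic, entry_major, code in self._entries:
            if entry_major == major_object:
                view.setdefault(topic, []).append(code)
        return view


def geology_index() -> IGMXIndex:
    """Populate the index with the Geology objects named in the text."""
    index = IGMXIndex()
    scheme = (
        ("PD", ["DN", "EL", "LI"]),
        ("PR", ["LM", "PL", "FG", "ER"]),
        ("UT", ["UCG", "UCQ"]),
    )
    for major_object, codes in scheme:
        for code in codes:
            index.register("Geology", major_object, code)
    return index
```

Reading the same entries through leaf or hinge corresponds to turning the door: the data are stored once, and the Topic order is simply a matter of which view a query asks for.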
The data model prototype
A data model prototype is a representation of a data model that will be developed. Prototyping helps build a working model with fewer errors that can be changed continually as the model is refined (Lucas, 1985; Peuquet and Bacastow, 1991; Abel et al., 1992). Its purpose is to involve the user in determining the data requirements of the organization. The data model prototype named IGMX was implemented at INGEOMINAS. It adopted the generic Major Objects, as demonstrated in Figure 3 for Geology. Under this structure, a particular data set is viewed as a member of one Major Object. The prototype is in effect a set of entities and relationships with exclusive group memberships. However, a data set might be a member of two Major Objects, as defined by derivation from a certain type of information content. The prototype was designed in 1994 by INGEOMINAS to support the activities of earth scientists, particularly in geology, geophysics, geoenvironmental engineering, mining, wells, samples, and location. Their needs span information retrieval and display of data; spatial and non-spatial analysis; and modelling. IGMX includes 210 objects; its structure is shown in Figure 7 for only 22 of them. Three groups of relations are present, one for each Major Object. Each Major Object has a relationship cardinality of one to many with each object.
CONCLUSIONS
This paper has shown that a data model architecture based on objects, activities and properties gives both data modellers and users more scope to represent their ideas about the real world for the management of GIS. It has demonstrated that a GIS data model prototype based on Topics and the Major Objects is a solution for developing a model with spatial and non-spatial data under the same architecture. From the point of view of the Topics, the IGMX data model resembles a cylinder or, more exactly, a revolving door consisting of seven vanes (the Topics) hung on a central axle (the Major Objects). IGMX can also be seen as a multiuser database implemented over a client/server architecture. This model offers a framework to describe 2-D information. The capacity to manipulate and analyze a 3-D body is the next logical step that should be made available in the IGMX data model. A final step to be considered in the future is the design of a GIS database chip.
ACKNOWLEDGEMENTS
I gratefully acknowledge the contribution of the GIS work team: Flor Marina Rocha, Orlando Hernandez, Germán Vargas, Luz Clemencia Valencia and Hugo Forero Onofre. I also thank Dr. Luis Vergara for his helpful comments and suggestions.
REFERENCES
Abel, D. J., Yap, S. K., Ackland, R., Cameron, M. A., Smith, D. F., and Walker, G. 1992. Environmental decision support system project: an exploration of alternative architectures for geographical information systems. Int. J. Geographical Information Systems, Vol. 6, No. 3, p. 193-204.
Aronoff, S. 1989. Geographic Information Systems: A management perspective. WDL Publications, Ottawa, Canada. 294p.
Barker, R. 1990. Case*Method, Entity Relationship Modelling. Addison-Wesley Pub., Reading, Mass. 237p.
Bayham, W. and Leppert, C. 1991. A database design strategy for integrating GIS within a large municipality. Proceedings of the 11th Annual Esri User Conference, p. 599-614.
Federal Geographic Data Committee. 1994. Content standards for digital geospatial metadata. FGDC Secretariat, Reston, Va. 54p.
Lucas, Jr., H. C. 1985. The analysis, design, and implementation of information systems. 3rd Ed. McGraw-Hill, New York. 495p.
Murillo, A. 1994a. Aplicaciones de un SIG en Geociencias. Seminario Int. Aplicaciones de los Sistemas de Información Geográfica y Sensores Remotos en el Manejo de los Recursos Naturales. Instituto Geográfico "Agustín Codazzi", Santafe de Bogotá, Colombia, Agosto 24 y 26. 20p.
Murillo, A. 1994b. Un Sistema de Información Georreferenciada (SIG): Una Aplicación a las Áreas Mineras. In Memorias IX Seminario Nacional de Ciencias y Tecnologías del Mar and V Congreso Latinoamericano en Ciencias del Mar. Universidad EAFIT, Medellín, Colombia, Noviembre 21 y 25. 15p.
Nyerges, T. L. 1989. Schema integration analysis for the development of GIS databases. Int. J. Geographical Information Systems, Vol. 3, No. 2, p. 153-183.
Peuquet, D. J. and Bacastow, T. 1991. Organizational issues in the development of geographical information systems: a case study of U. S. Army topographic information automation. Int. J. Geographical Information Systems, Vol. 5, No. 3, p. 305-319.
POSC. 1993. Software Integration Platform Specification EPICENTRE Data Model, version 1.0, vol. I. PTR Prentice-Hall Inc., New Jersey. 731p.
POSC. 1994. Software Integration Platform, version 2.0 (Snapshot). Petrotechnical Open Software Corp., Houston.
Sitansu, S. M. 1991. Principles of relational database systems. Prentice-Hall Inc., New Jersey.
SYBASE. 1991. Fast Track to Sybase. Student Guide. Sybase, Inc. Product No. 5811. 267p.
Murillo, Alvaro
GIS Project Leader
INGEOMINAS
Diagonal 53 No. 34-53
Santafe de Bogota, D.C.
Telephone: 9057 1 2221811
FAX: 9057 1 2220797
e-mail: ingeomin@cdcnet.uniandes.edu.co