Title: Geodata Across the Campus Network: Library GIS Data Services at North Carolina State University
Abstract: GIS data services at NCSU libraries are focused on delivery of data directly to departmental
computer laboratories, classrooms, offices, and research units. Data is accessible in two network
environments. To assist data access and utilization, phone, E-mail, and in-library consultations
are supplemented by on-site technical support visits. Extensive, Web-based documentation
facilitates spontaneous, unmediated access to data. In collaboration with the College of Forest
Resources and the North Carolina Cooperative Extension Service, the libraries are also
implementing a Web-based mapping system that provides public and non-GIS user access to
environmental, economic, and development data.
I) Introduction
Geographic Information Systems (GIS) services at the North Carolina State University (NCSU) Libraries date back to 1992, when the library became one of the original participants in the Association of Research Libraries (ARL) GIS Literacy program. (Abbott and Argentati 1995) Initial GIS efforts of the NCSU Libraries focused on provision of in-library public access to GIS workstations and data. Library staff received some basic training in GIS and consultation was provided to data seekers. Library-sponsored workshops were also added to the mix of services. As contacts and partnerships with campus academic and research units involved with GIS increased, it became apparent that great potential for services development lay in providing data directly to the campus units themselves via the campus network, either through network file access (i.e., mounted disk or mapped drive access) or download access via FTP. (Argentati 1997)
Network access to data carries the following advantages:
- It is generally more convenient for campus instructors, researchers and students to carry out GIS work within their own departmental laboratories, instructional facilities, or research units than to do so in the library.
- GIS data sets tend to be somewhat large and difficult to transfer by common media such as floppy diskettes.
- A central, network-accessible repository of data reduces the need for redundant data archival efforts at campus units.
- Data resources contributed by campus units become, via the library, accessible and known to those in other campus units.
Network file access (mounted disk or mapped drive) to data carries the following additional advantages:
- Users can access and use the data directly within the application, without downloading.
- Student computing space in departmental laboratories tends to be somewhat limited. While standard student data storage space is 40 megabytes, just one digital orthophoto data set approaches 50 megabytes in size.
- Multi-file data formats can create problems for FTP access.
II) Background
GIS at North Carolina State University
While Geography or City and Regional Planning departments typically serve as centers for GIS activity at universities, NCSU houses neither. The College of Forest Resources, including the Center for Earth Observation and the Dept. of Parks, Recreation & Tourism, took a leadership role in GIS early on and is the home of the Geographic Information Science (GIS) Research and Teaching Program at NC State. The College of Agriculture & Life Sciences, the School of Design and the Institute of Transportation Research and Education (ITRE) were also early adopters of GIS. The past two years have seen an explosion of GIS activity across roughly 30 different academic departments as well as in a number of campus research and administrative units.
Entry-level and advanced GIS education courses are offered through the Center for Earth Observation in the College of Forest Resources. Fee-based one- to three-day workshops are offered through ITRE. In addition, the library offers monthly or bimonthly six-hour introductory workshops free of charge to NCSU affiliates.
Software
NCSU owns a campus site license for Environmental Systems Research Institute (Esri) software products that includes an unlimited number of ArcView licenses and 5,000 Arc/Info seats. ArcView is a relatively user-friendly, entry-level desktop GIS software program. Arc/Info, which runs under NT or UNIX, is a more robust software package, but requires a much greater investment in time and training. New GIS practitioners at NCSU who use Esri software generally start with ArcView, and there are far more ArcView users on campus than there are Arc/Info users. ArcView and Arc/Info can be run from any Sun Unity workstation on campus, and installation media for the various software components are made available upon request for installation on computers with NCSU property tags. The four-piece combination of: a) campus software site licenses, b) campus-wide access to data via the library, c) readily accessible training, and d) available technical support from the library and other campus units makes it possible for an individual from any campus unit to spontaneously "buy in" and get started with GIS without waiting for a major resource commitment by their department.
Data Resources in the NCSU Libraries
The NCSU Libraries houses a wide range of feature data, imagery, digital orthophotography and tabular data resources. In the early years of GIS services the chief data resources were commercial in nature. These included the Wessex TIGER distribution and, from Esri, the Digital Chart of the World, ArcUSA and ArcWorld. Early in 1997, the library purchased a campus license for a statewide coverage of SPOT ten-meter resolution panchromatic satellite imagery. Through a cooperative agreement with the North Carolina Center for Geographic Information Analysis (CGIA), the state's coordinating GIS agency, NCSU gained free access via a Wide Area Network (WAN) connection to the state Corporate Geographic Database, which includes more than 100 data layers. Nearly all of these data layers are replicated on the library's servers for local access. Through the Federal Depository Library Program (FDLP), a number of other data resources--including Digital Orthophoto Quarter Quadrangles (DOQQs), Digital Raster Graphics (DRG), and Census data (population, economic and agriculture)--are available. The library recently acquired a statewide coverage of uncompressed black & white DOQQs. The library also houses a variety of data resources generated by local research and education units.
The Computing Environment at NCSU
NC State hosts Unity, one of the largest distributed academic computing networks in the world. Unity is the campus computing realm: a large-scale, multi-platform, UNIX-based distributed computing network of workstations and servers providing access to over 35,000 users and running over 50 software packages for the UNIX operating system, including Arc/Info and ArcView GIS software. Any NCSU affiliate (faculty, staff or student) can apply for a Unity account that includes 40 megabytes of storage space. Until very recently, all Unity workstations were UNIX machines. (McDaniel, 1998) In 1998 Computing Services began deploying Windows NT workstations that were configured to allow student use of personal Unity workstations via the NCSUGINA (Graphical Identification and Authentication) client.
Individual campus departments and units typically run a local area network (LAN), usually using Novell Netware. Novell's Netware offers a directory service called Novell Directory Services (NDS). A directory service gives users and administrators transparent access to distributed networked resources, including resources on other Novell networks. (Novell, 1995) The campus local area networking model is grounded in the idea of centrally supported and decentrally managed Novell services. Some campus units run local NT networks, but campus-wide NT networking is not currently supported--although trusts can be set up to allow resource sharing between individual units. Departmental computer labs typically run Windows 95 or Windows NT on client workstations. In some cases NT labs are configured for student access to personal Unity space using the NCSUGINA client. The NCSUGINA client is not available for Windows 95, but some laboratories have set up access to personal Unity space using Samba. Although there is little Macintosh-based GIS activity on campus, Mac laboratories do provide access to GIS software and allow networked access to both Novell- and UNIX-based resources.
III) Networked Data Implementation in the NCSU Libraries
In order to make data available to users across a heterogeneous mix of computing environments, it has been necessary for the library to create redundant networked data holdings in both the UNIX and Netware network environments.
Unity
In 1993, in an attempt to solve data access demands created by the new Esri campus license, the NCSU Libraries teamed with the College of Forest Resources to make a set of geodata available via the Unity UNIX environment. Space was rented from campus Computing Services at a megabyte-per-year rate. The initial set of data was primarily in Arc/Info software format and included the Esri Digital Chart of the World, ArcWorld, and ArcUSA products. As time went on, additional data resources associated with individual GIS courses were added. These course-related resources tended to incorporate TIGER, census population data, and data from the CGIA Corporate Geographic Database. As the statewide SPOT coverage and other data were added, the Unity space was expanded to 9 gigabytes. Initially, data management was handled by personnel from the Center for Earth Observation, but by late 1997 library personnel assumed full responsibility for data management.
Netware
In 1994, in order to accommodate the growing number of Windows 95 and NT users doing GIS--including those using the library GIS PC workstations--most of the data resources residing in Unity were replicated onto a nine gigabyte volume on the library's Netware server. Initially, an anonymous login allowed Netware users outside the library to connect. NDS deployment has nearly eliminated the need for an anonymous login because users are authenticated--based on their own Netware login--upon connection. In late 1997 the library deployed a new NT server for GIS services that would include Internet mapping applications. An effort to migrate the Netware-based data to that server was abandoned when it became clear that campus-wide NT networking would not be scalable under NT 4.0. As a consequence of the failed experience with NT networking, the decision was made to purchase a new Netware server that would be solely dedicated to GIS services. In fall 1998 a new Netware server with 54 gigabytes of storage (44 gigabytes useable with RAID) was deployed and data was migrated from the older Netware server. There are plans to double capacity in 1999/2000.
FTP
In addition to the Unity- and Netware-based access, an FTP server has been used to provide "near-line" and persistent data access. FTP access is an important complement for two main reasons:
- Only a small subset of data holdings can be housed on the network, so the remaining data must be temporarily uploaded on an as-needed basis.
- Not all users are able to conveniently access the data via Unity or Netware. FTP provides access through a common denominator: TCP/IP.
In the case of offline resources, data is typically loaded onto the server on request. Data requestors are mailed a set of instructions for data download and the requested data is left on the server for several days. In the case of persistently accessible data, Arc/Info coverages are replicated as export (or interchange) files for FTP download by users who lack network file access to the Netware server.
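The download step described above can be sketched in Python using the standard ftplib module. This is only an illustration: the function name, the host, and the directory in the usage note are hypothetical and do not describe the library's actual server layout.

```python
from ftplib import FTP
from pathlib import Path


def fetch_export_file(ftp: FTP, remote_name: str, local_dir: Path) -> Path:
    """Download one file from the current FTP directory into local_dir.

    The file is transferred byte for byte, so a single-file export
    (interchange) file arrives intact and can be imported afterwards.
    """
    local_path = local_dir / remote_name
    with open(local_path, "wb") as handle:
        # retrbinary calls handle.write once for each block received.
        ftp.retrbinary(f"RETR {remote_name}", handle.write)
    return local_path
```

A session might look like the following, where the host name and path are placeholders: open a connection with FTP("ftp.example.edu"), call login() for anonymous access, change to the posted directory with cwd("/pub/requested"), and then call fetch_export_file(ftp, "roads.e00", Path(".")).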
IV) Data Documentation for Networked Data Resources
A key advantage of networked access to data is support of spontaneous access and use of data; however, the data cannot be easily accessed and used unless the would-be user knows that the data exists and has ready access to the following information:
- data location
- data identification and content description (features, attributes, etc.)
- data quality (scale, accuracy, date, etc.)
- data structure (file names, table structures, data dictionaries, etc.)
- spatial referencing (coordinate system/projection, datum)
At NCSU, Web-based documentation has been used extensively to promote and provide documentation for networked data sets. Web pages typically provide the following:
- data descriptions
- data locations and file names
- data availability (geographic extent)
- metadata (if available)
- links to supporting materials and software utilities
A new hyperthesaurus lookup system, expected to be available in August 1999, will allow users to look up data resources by feature type and to browse listings of resources under broader, narrower and related terms.
Metadata
In 1994 the Federal Geographic Data Committee (FGDC) created a content standard for geospatial metadata. Metadata is data about the data. Geospatial metadata includes information pertaining to data identification, data quality, data organization, spatial referencing, features and attributes, and distribution. The metadata standard has been put into use by federal agencies and, to some extent, by state and local government agencies. (Federal Geographic Data Committee, 1997)
When metadata is available, the NCSU Libraries makes it available in two ways:
- as a text file residing in the same subdirectory as the data on the network.
- as a Web page accessible from the pertinent Web-based documentation page.
In cases involving non-static data that is copied from a remote source, a copy of the metadata that is valid at the time of data transfer is retained with the data copy. Library documentation then links to this metadata copy rather than the metadata available from the source organization. Metadata replication is needed because it is possible that the source organization will release a new version of the data and update the metadata, in which case a Web link to the source organization metadata would not be valid until the updated data has been replicated locally. Graphic 1 illustrates the relationship between remote and local metadata when local holdings and local derivatives (e.g., retiled or reprojected versions) have been updated in a timely manner. Graphic 2 illustrates the same relationship when local holdings and derivatives have not been updated.
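The replication practice described above, in which the metadata valid at transfer time travels with the local data copy, can be sketched as follows. The function name and the date-stamped file naming convention are assumptions made for illustration, not the library's actual procedure.

```python
import shutil
from datetime import date
from pathlib import Path


def replicate_with_metadata(data_file: Path, metadata_file: Path,
                            dest_dir: Path, transfer_date: date) -> Path:
    """Copy a data file and the metadata valid at transfer time into dest_dir.

    Returns the path of the local metadata copy, which is what local
    documentation would link to instead of the source organization's copy.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(data_file, dest_dir / data_file.name)

    # Hypothetical naming convention: keep the metadata beside the data and
    # stamp it with the transfer date so a stale local copy is easy to spot.
    stamp = transfer_date.isoformat()
    local_meta = dest_dir / f"{data_file.stem}_metadata_{stamp}.txt"
    shutil.copy2(metadata_file, local_meta)
    return local_meta
```

Because the local metadata is a copy rather than a link, a later update at the source leaves the local description unchanged until the data itself is replicated again.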
In cases where data resources have been customized or altered by NCSU campus units, efforts have been made to supplement the metadata with lineage information. The term "lineage" refers to information about: a) the data source or sources that act as inputs into a given data set and b) the processing steps carried out on the data to get it into its current form.
Because FGDC-compatible metadata records are somewhat lengthy, critical elements of the metadata are extracted and presented for easier access as part of a brief, citation-style display on library Web-based data documentation. These elements include: file name, network location, file size, software format, type (point, line, polygon, image, grid), source scale and coordinate system/projection.
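As a rough illustration of such a citation-style display, the sketch below holds the seven listed elements in a small record and renders them as aligned text. The class name, the function name, and all example values are hypothetical; the actual library pages may be laid out differently.

```python
from dataclasses import dataclass


@dataclass
class CitationElements:
    file_name: str
    network_location: str
    file_size: str
    software_format: str
    data_type: str        # point, line, polygon, image, or grid
    source_scale: str
    projection: str


def format_citation(elements: CitationElements) -> str:
    """Render the extracted metadata elements as a short aligned text block."""
    rows = [
        ("File name", elements.file_name),
        ("Network location", elements.network_location),
        ("File size", elements.file_size),
        ("Software format", elements.software_format),
        ("Type", elements.data_type),
        ("Source scale", elements.source_scale),
        ("Coordinate system/projection", elements.projection),
    ]
    width = max(len(label) for label, _ in rows)
    return "\n".join(f"{label:<{width}}  {value}" for label, value in rows)


# Hypothetical values, for illustration only.
example = CitationElements(
    file_name="roads.shp",
    network_location="/gisdata/tiger/shapefile/wake",
    file_size="1.2 MB",
    software_format="ArcView shapefile",
    data_type="line",
    source_scale="1:100,000",
    projection="Geographic (decimal degrees)",
)
print(format_citation(example))
```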
The pending NCSU Libraries GIS services plan is to make metadata searchable in two ways: 1) create a local ISITE index, with a CGI front end for searching by local users and 2) translate metadata records into MARC records for inclusion in the library's online catalog. (See graphic 3) The first search method would be geared to GIS users specifically interested in locating data resources. The second search method would be geared to "accidental users" who are not necessarily interested in GIS, but who through serendipitous discovery stumble upon GIS data resources in subject searches (e.g., "find wetlands") and, by virtue of curiosity or necessity, pursue the acquisition and use of the relevant data resources. The second search method is not ideal for established GIS users because MARC records and standard OPAC software do not fully exploit the indexing and descriptive potential of metadata records. In addition, non-GIS resources (books, etc.) would create background noise on a subject-based OPAC search.
Documenting External Data Resources
The library also provides Web-based documentation for some data resources that are not housed locally, but are available to the general public over the Internet. Some Internet-based data resources are downloaded to the library's networked Geodata collection for more convenient networked access by local users; however, the sheer number of data sets involved, the size of some data resources and the non-static nature of some data preclude download of every networked resource available. As an added-value access management function of library GIS services, Web pages document these external data resources and provide users with assistance in finding, downloading and processing the data.
One such external data resource is 1:24,000 Digital Elevation Model (DEM) data, which is downloadable for all of North Carolina. Because there are nearly 1,000 such files just for the state, it has not been deemed cost effective to download this data for local access; however, because the process of preparing this data for use in software such as ArcView is so difficult and convoluted, Web pages assisting the download and import process are provided.
The new Web-based hyperthesaurus lookup system hosted by the library incorporates Internet-based data resources in the following categories: a) state and local data from North Carolina, b) nationwide or U.S. regional data with extents including North Carolina, or c) global or world regional data with extents including North Carolina.
V) File Structures for Networked Data Resources
There are many different approaches to organizing data and directories on a server. Data may be organized by one or more of the following criteria: a) subject, b) geography, c) software format, d) permissions level, and e) data source. Networked data at the NCSU Libraries is organized into directories by major data resource (e.g., LULC or TIGER). Within each data source directory, data sets are often then subgrouped by data format (i.e., if there are coverage and shapefile versions) and then by geographic area and feature layer (or vice versa).
The reasons for organizing data by source are many:
- Data sets that are components of a particular data resource will tend to have similar or same update and maintenance cycles--and update is easier if the data is all in the same place.
- Documentation arriving with data will often have path structures explicitly shown. If those files are dispersed across the directory structure, then the documentation is rendered partially invalid.
- A data resource will often have a set of documentation that applies to the resource as a whole (i.e., all of the files). If those files are dispersed across the structure then the documentation must be replicated across the file structure.
- Such a directory structure is partially self-documenting. The origin of a particular data set can be known simply based on its file location. Directory structure complements documentation, Web-based or otherwise.
- The component pieces of a data resource will likely have the same permission level (public domain, NCSU only, etc.). Permissions can be set on data resources simply by setting permissions on the directories (again, levels of access in this case are self-documented by the directory structure).
- Some data resources include ArcView project files (.apr files), in which data paths are hard coded.
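The source-first layout described in this section can be sketched as a simple path convention. The root directory, the level names, and the helper functions below are hypothetical; the sketch uses the resource, then format, then area, then layer ordering, although the text notes that area and layer may be nested in either order.

```python
from pathlib import PurePosixPath


def dataset_path(root: str, resource: str, data_format: str,
                 area: str, layer: str) -> PurePosixPath:
    """Build a path of the form root/resource/format/area/layer."""
    return PurePosixPath(root) / resource / data_format / area / layer


def resource_of(path: PurePosixPath, root: str) -> str:
    """Recover the data resource from a path (the self-documenting property)."""
    return path.relative_to(root).parts[0]


# Hypothetical example: a TIGER roads layer for one county.
roads = dataset_path("/gisdata", "tiger", "shapefile", "wake", "roads.shp")
print(roads)
print(resource_of(roads, "/gisdata"))
```

Because the resource name is always the first directory under the root, permissions and documentation attached to that directory apply to every file beneath it.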
VI) Selection of Data Resources for Networked Access
Because available storage space for network accessible data is limited, decisions must be made as to what data should be made persistently network accessible as opposed to downloadable upon request. The following criteria have been used to select data resources for networked access:
- Geographic area: In many cases, such as TIGER and Land Use/Land Cover (LULC), only North Carolina data is made accessible on the network. In the case of data resources that consume more disk space, the focus tends to be more local, with an emphasis on the greater Research Triangle region.
- Frequency of request: In some cases data of a particular non-local area will have high demand. One example is the New Hanover County (coastal area around Wilmington, NC) digital orthophotography, which has been in high demand partly as a result of recent hurricane events in that vicinity.
- Size: Networked access to particularly large (in terms of file size) data sets is restricted. This is especially true in the case of digital orthophotography. Also, compressed, potentially lower-quality versions of image products are more likely to be networked than uncompressed versions because of smaller file sizes. To satisfy high end users, for whom compression-related data loss is unacceptable, uncompressed data can be served in a near-line, FTP environment or through CD-ROM loaning.
- Licensing: Not all data is licensed for campus-wide access. Where appropriate, campus licenses for data have been purchased in order to enable campus-wide networking.
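The four criteria above could be combined into a simple decision rule, sketched below. This is not the library's actual policy: the field names, the size threshold, the demand threshold, and the order in which the criteria are tested are all assumptions chosen only to make the logic concrete.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    in_focus_area: bool      # e.g. North Carolina or the Research Triangle
    requests_per_term: int   # how often users have asked for the data
    size_mb: float
    campus_licensed: bool    # licensed (or free) for campus-wide access


def access_mode(c: Candidate, max_size_mb: float = 500.0,
                high_demand: int = 10) -> str:
    """Classify a data set as 'networked', 'near-line', or 'not networked'.

    Both thresholds are hypothetical defaults, not values from the text.
    """
    if not c.campus_licensed:
        return "not networked"   # licensing rules out campus-wide access
    if c.size_mb > max_size_mb:
        return "near-line"       # too large; serve by FTP or CD-ROM loan
    if c.in_focus_area or c.requests_per_term >= high_demand:
        return "networked"
    return "near-line"
```

Under these assumed thresholds, a small, frequently requested data set from outside the focus area would still be networked, which mirrors the New Hanover County example.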
The NCSU Libraries' GIS data collection list can be browsed at: http://www.lib.ncsu.edu/stacks/gis/datalist.html.
VII) Data Format Issues
Choice of software formats carries important implications for networked delivery of data. Although formats for software such as Imagine and MapInfo can be found in use on campus, Esri formats are predominant among NCSU data holdings for the following reasons: a) there is a campus license for Esri software, b) CGIA and other key state data providers use Esri software, and c) several key commercial data resources originate from Esri or have been provided in Esri formats.
There are a few different choices of software format just within the Esri realm and each of these formats has different implications for networked data services.
Arc/Info Coverages: Arc/Info's native vector format is the "coverage". Data is structured into a set of files occurring in two directories: one carrying the data set name and one called "info". Data sets in a common work space or directory will share the same info directory. File paths are hard coded into the files. Arc/Info coverages cannot be easily FTP'd as is because of this distributed file structure. The Arc/Info copy command is the reliable method of copying coverages across a file system.
Arc/Info Interchange Files: Arc/Info files can be exported to create an "export" or "interchange" file that can be easily copied within or across file systems and can be FTP'd. Arc/Info data that is downloadable from the Internet is generally found in this format (indicated by .e00 file extension). These files can be turned into coverages again using either the Arc/Info import command or the ArcView Import utility.
Shapefiles: Shapefiles are the native format of Esri's ArcView software. A shapefile is made up of three or more separate files, of which at least the .shp, .dbf and .shx files must be present. There are no file path hard codings, so these files can be copied at will; however, because there are multiple files, it is common to "zip" the component files into one package for download. Shapefiles can be turned into Arc/Info coverages using Arc/Info.
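Packaging a shapefile for download, as described above, amounts to confirming that the required component files are present and zipping everything that shares the base name. The following is a minimal sketch using Python's standard zipfile module; the function name is hypothetical, and it assumes lowercase file extensions.

```python
import zipfile
from pathlib import Path

REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def zip_shapefile(shp_path: Path, zip_path: Path) -> list[str]:
    """Zip every component file that shares the shapefile's base name.

    Raises FileNotFoundError if a required component is missing.
    Returns the names of the files written to the archive.
    """
    missing = [ext for ext in REQUIRED_EXTENSIONS
               if not shp_path.with_suffix(ext).exists()]
    if missing:
        raise FileNotFoundError(
            f"{shp_path.stem} is missing: {', '.join(missing)}")

    # Gather required and any additional components with the same base name.
    members = sorted(p for p in shp_path.parent.iterdir()
                     if p.is_file() and p.stem == shp_path.stem
                     and p != zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for member in members:
            # Store bare file names so the archive unpacks into one folder.
            archive.write(member, arcname=member.name)
    return [m.name for m in members]
```

Storing bare file names is safe here precisely because shapefiles carry no hard-coded paths, unlike coverages.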
Data providers must decide whether to: a) provide the data in its original format, b) convert the data into a different format for access, or c) provide redundant access to the data in different formats. In choosing between Arc/Info coverages and shapefiles data providers must consider the advantages and disadvantages of each format.
Coverages have the following advantages:
- Coverages are in a ready to use format for Arc/Info users.
- Annotation layers are supported.
- Topology (adjacency and connectivity information) is included.
- Coverages can also be used by ArcView.
Shapefiles have the following advantages:
- Shapefiles appear to have faster draw times, especially across a network.
- File transfer is easier.
- It is easier for casual users to add multiple layers to an ArcView "View".
Arc/Info export files and zipped shapefiles are convenient for download, but these formats cannot do double duty by being used in place by networked file service users and Internet Map Server applications.
The NCSU Libraries is currently making Arc/Info data available on the network in both coverage (or GRID) and export formats. Shapefiles are not currently being zipped. Arc/Info coverages are only being mass converted to shapefile format for specific North Carolina regions in conjunction with Internet Map Server projects.
The issue of data format choice extends to non-Esri format data resources as well. For instance, in the case of Digital Orthophotography, one must decide whether to provide the data in compressed JPEG format, the original uncompressed BIP format, or in MrSID compressed format. Each of these choices carries implications for data quality, disk space requirements, flexibility of use, choice of software application, and application draw speeds. The tentative NCSU Libraries plan is to provide access to DOQQs in the following manner: 1) offline or nearline access to the original, uncompressed, unclipped UTM data, 2) offline or nearline access to uncompressed, clipped, State Plane 1983/meters TIFF versions, and 3) networked access to MrSID compressed versions (clipped in State Plane 1983/meters, possibly both as individual quarter quads and as 7.5 minute mosaics). (See graphic 4) MrSID compression has been used on an on-demand basis in NCSU Libraries GIS services since mid-1998.
Projections
Map projections make it possible for areas on the Earth's surface to be represented on flat surfaces such as a map. In order to do this, map projections distort shape, area, distance or direction to some extent. In GIS, different map projections are employed depending on geographic area, extent of the data, and type of application. Generally, it is currently necessary to have different data sets in the same projection in order to overlay data sets and make different data resources work together; however, newer versions of Esri software such as ArcView and MapObjects applications will have enhanced projection-on-the-fly capability.
Until now, the strategy of the library has been to leave data in its original projection, leaving it to the user to reproject the data into a projection of choice. This hands-off approach has been taken for the following reasons:
- Data projection can be time consuming.
- Reprojecting data would run against the strategy of avoiding data alterations.
- In the case of unprojected data (data in geographic coordinates), leaving the data in its original format allows users to project "on the fly" in ArcView to a projection of choice.
Nonetheless, there is a clear advantage in having data already in a projection favored by the user, because the user can then spontaneously access and combine data sets without undertaking the time-consuming task of copying and then projecting data sets. For less advanced GIS users, such as ArcView users lacking Arc/Info expertise (most of the NCSU user population), the task of projection can be daunting. Furthermore, the tools available for projecting data in ArcView may not be as reliable as those in Arc/Info, especially if datum conversions are involved. This situation is expected to change with the release of ArcView 3.2 and the improvement in projection handling.
Until recently, there was no clear choice for standard projection for North Carolina data. The state government was using State Plane 1927 (units feet), the State Dept. of Transportation was using State Plane 1983 (units meters), a wide range of USGS data resources were arriving in UTM (Universal Transverse Mercator), and numerous other resources were arriving in geographic coordinates (units decimal degrees). Data left in geographic coordinates could at least be projected on the fly to any of the other projections.
All this changed in 1998, when the state government decided to standardize on the North Carolina State Plane 1983 (units meters) projection, taking advantage of the improved accuracy of the 1983 North American Datum (NAD83), and bringing the CGIA data into conformity with the substantial North Carolina Dept. of Transportation (NCDOT) data resources. (North Carolina Geographic Information Coordinating Council, 1998) CGIA and NCDOT then began to convert UTM-based data resources such as Digital Orthophoto Quarter Quadrangles and Digital Raster Graphics into State Plane 1983 (units meters). The adoption of the State Plane 1983 projection as a standard for state data products improves the likelihood that the library will convert other data resources to this projection as well in order to facilitate ease of use by the end user. At the same time, improved projection handling in ArcView, MapObjects applications and other software makes such a mass conversion less necessary.
VIII) Web-Based GIS Application Development
In conjunction with networked data services, the NCSU Libraries has been exploring and implementing Web-based GIS applications that facilitate remote access and use of data residing on library servers. Beginning in May 1998, in cooperation with the College of Forest Resources and North Carolina Cooperative Extension Service, the library began hosting Web-based interactive access to more than 60 data layers using ArcView Internet Map Server software. (Korca Baran, 1998) This technology allows users with Web browsers and network connections to remotely access, view, and manipulate data resources residing on the library server. Remote users can make use of a Java client that acts as a miniature mapping application embedded in the Web browser, allowing the user to create maps, pan, zoom and run basic queries against the data resources. (Environmental Systems Research Institute, 1997)
The Internet Map Server technology makes geodata resources accessible to and useable by a much broader audience, including those lacking GIS software, expertise and data storage space. Also, because data processing is handled on the server side and the Web client only makes use of a snapshot of the data, not as much network bandwidth is required to make use of the data. As data resources become larger and larger in terms of file size, this "fat server, thin client" approach is expected to play an important role in the future of networked geodata services. Web-based mapping or Web-based GIS has the added benefit of providing some level of access to data resources for which access or distribution is otherwise restricted by licensing issues. Because the application and data can be utilized by those working on a standard networked PC without special software, GIS data services and a substantial part of the geodata collection have now been mainstreamed into the standard NCSU Libraries public access computing environment. Web-based mapping applications also provide "scratch & sniff" functionality to experienced GIS users who simply wish to evaluate datasets prior to acquisition.
A useful side benefit of the Upper Neuse Region Data System project has been the creation of a seamless coverage of more than 60 data layers, all in the same projection and mostly in shapefile format, for the greater Research Triangle region. This activity may be seen as a precursor to future production efforts oriented towards preparation of seamless sets of data resources in common projections and formats.
IX) Problems Experienced in Networked Data Services
Heterogeneous Network Environment
The biggest barrier to effective delivery of GIS data services has been the heterogeneous nature of the network environments in which campus users work. UNIX users can generally access both GIS applications and available data, but not all data available in Netware is replicated in UNIX because the current storage capacity in Unity is more limited. Windows 95/NT users can generally install the applications per the campus site license, but only with Netware clients can they gain network file access to the GIS data. A number of campus units are running NT without Netware clients, and hence can only access data by FTP.
Poor Performance
Slow application draw times involving networked data can prove frustrating to the end user. The problem can be particularly bad in times of peak network use as the network becomes congested. Performance also seems to vary considerably by department, depending on the local network infrastructure of the department in question. In most units data access times are quite fast, but in some others access times are frustratingly slow. New GIS users tend to have the most problems because they are more likely to be impatient and continue executing commands in the GIS software while the application is still waiting to receive or process data over the network. This often will cause the application in question to crash. Users are encouraged to wait for the application to finish current processing steps, and then, if necessary, to save the data locally in their work space for higher speed use.
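The advice to save data locally can be illustrated with a short script. The following is only a minimal sketch, assuming the data set is a shapefile whose component files share one base name; the function name `copy_shapefile_locally` and the list of extensions are illustrative choices, not part of any library-provided tool.

```python
import shutil
from pathlib import Path

# Component files that commonly accompany one shapefile data set.
# (Assumed list; a given data set may carry fewer or additional files.)
SHAPEFILE_EXTENSIONS = (".shp", ".shx", ".dbf", ".prj", ".sbn", ".sbx")


def copy_shapefile_locally(network_shp, local_dir):
    """Copy every existing component of a shapefile to a local directory.

    network_shp: path to the .shp file on the mapped or mounted network drive.
    local_dir:   local work space directory (created if it does not exist).
    Returns the list of local paths that were written.
    """
    source = Path(network_shp)
    target_dir = Path(local_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for ext in SHAPEFILE_EXTENSIONS:
        part = source.with_suffix(ext)
        if part.exists():
            dest = target_dir / part.name
            shutil.copy2(part, dest)  # copy2 also preserves file timestamps
            copied.append(dest)
    return copied
```

A user would call the function once with the network path of the .shp file and a local directory, then open the local copy in the GIS application, so that subsequent draws no longer depend on network conditions.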
Performance problems are rooted in the shortcomings of NFS-style communications and the nature of GIS data processing. NFS-style communication is connection-oriented communication and consumes considerable overhead in network bandwidth. GIS applications involve the transfer of large chunks of data and frequent access to the data across the network. While CPU and application speeds have grown rapidly in recent years, network bandwidth has not kept pace. (Environmental Systems Research Institute, 1998b)
Other Disadvantages of Network File Access
In addition to the issues of slow data draw times and problematic access in a heterogeneous network environment, network file access carries some additional disadvantages when compared with download access. First, in the case of Netware access, there is a need for Netware licenses to accommodate mapped drives. The 100 simultaneous user limit for the library's GIS Netware server was reached in spring 1999, largely because a number of campus laboratories had set persistent mapped drives to the server. The license was expanded to 350 simultaneous users. Second, data usage cannot be tracked as easily as it can in a download-based system. Third, without a Web-based interface for download (or "readme" files in a strictly FTP-based system) it is not possible to provide disclaimer and licensing click-throughs for data users.
Lack of Documentation
In the case of class or project-related data that is made available over the network, problems have arisen where the data is not adequately documented. Class and project data left on the network without metadata, or at least minimal documentation, eventually becomes "dead data." A particularly important piece of documentation is lineage information, because potential users will want to know the origin of a particular data set and what processing steps have been carried out on the data. The library has been removing undocumented data from the network and burning the data to CD-ROM for archival purposes.
The library now encourages users to do, at the very least, the following two things: 1) capture the original metadata record and save it with the data, and 2) record the lineage information describing the processing carried out on the data. In the best-case scenario, steps 1 and 2 are combined to create a new FGDC-compatible metadata record that can be both viewed and possibly indexed. Library personnel provide consultation to those wishing to create FGDC-compliant metadata records.
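To show what combining the original citation with lineage information might look like, the following is a minimal sketch that writes a small XML record. It assumes the short element names commonly used for XML encodings of the FGDC Content Standard (`idinfo`, `citeinfo`, `dataqual`, `lineage`, `procstep`, `procdesc`, `procdate`). The function name `build_lineage_record` and the sample values are hypothetical, and the record produced is far from a complete or validated FGDC record.

```python
import xml.etree.ElementTree as ET


def build_lineage_record(title, origin, process_steps):
    """Build a minimal metadata record carrying citation and lineage.

    process_steps is a list of (description, date) tuples, with the date
    given as a YYYYMMDD string. Returns the record as an XML string.
    """
    metadata = ET.Element("metadata")

    # Identification: who originated the data set and what it is called.
    idinfo = ET.SubElement(metadata, "idinfo")
    citation = ET.SubElement(idinfo, "citation")
    citeinfo = ET.SubElement(citation, "citeinfo")
    ET.SubElement(citeinfo, "origin").text = origin
    ET.SubElement(citeinfo, "title").text = title

    # Lineage: one process step per operation carried out on the data.
    dataqual = ET.SubElement(metadata, "dataqual")
    lineage = ET.SubElement(dataqual, "lineage")
    for description, date in process_steps:
        procstep = ET.SubElement(lineage, "procstep")
        ET.SubElement(procstep, "procdesc").text = description
        ET.SubElement(procstep, "procdate").text = date

    return ET.tostring(metadata, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(build_lineage_record(
        "County roads (class project copy)",
        "Original data producer",
        [("Reprojected to the class project projection", "19990301"),
         ("Clipped to the study area boundary", "19990302")],
    ))
```

Saving such a file alongside the data keeps the origin and processing history with the data set even if the original metadata record is later lost.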
X) Trends for the Future
For the near future the NCSU Libraries plans to continue the parallel development of the Novell- and UNIX-based geodata collections. In addition to providing a workaround for the lack of accessibility from one networked environment to another, the full replication of data holdings provides a full working backup in case of massive corruption in one environment or the other. It is hoped, however, that in the future the library will be able to employ a single approach that provides data services to users in all campus computing environments.
Spatial Database Engine
One technology under consideration is Esri's Spatial Database Engine (SDE). SDE is a "middle tier application server that disseminates spatial data based on highly efficient spatial search functions, provides geometric data validation, includes map projection functions, and works within heterogeneous hardware and network environments." SDE works in combination with commercial DBMS (Database Management Systems) software to provide for access to spatial data from various network environments. SDE clients are being built into Esri software to allow the user to access, via SDE, data that is stored in an RDBMS. (Environmental Systems Research Institute, 1998a)
The advantages of SDE for library networked data services might include:
- Access from any network environment: SDE communications are based on TCP/IP, which serves as a common denominator for NT, Novell and UNIX networks.
- Seamless data presentation: Tiled data sets, or data such as TIGER that is broken up into county-based files, can be presented as seamless datasets.
- More efficient network communications: Whereas NFS-style access works on connection-oriented communications that require considerable overhead in terms of network bandwidth, SDE uses TCP/IP, which is message-oriented and uses very little overhead.
- More efficient client/server interaction: Overall network traffic between the client and server is minimized by a balancing of tasks. The server transfers to the client only data and features needed by the application. (Environmental Systems Research Institute, 1998a)
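The idea that the server transfers only the features an application needs can be sketched with a simple extent filter. This is a conceptual illustration only, assuming point features and a rectangular map extent; it does not represent the SDE API, and the names `Feature` and `features_in_extent` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A point feature with a name and map coordinates."""
    name: str
    x: float
    y: float


def features_in_extent(features, xmin, ymin, xmax, ymax):
    """Return only the features that fall inside the requested extent.

    Run on the server side, a filter like this means the client receives
    the features for its current map view instead of the whole data set.
    """
    return [
        f for f in features
        if xmin <= f.x <= xmax and ymin <= f.y <= ymax
    ]
```

With file-based network access, by contrast, the client application typically has to read the data files across the network and discard what it does not need.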
For all its advantages, SDE is not likely to be implemented in the very near future for the following reasons:
- SDE is relatively expensive.
- There would likely be considerable database administration (DBA) workload associated with SDE and the chosen RDBMS.
- Current versions of SDE do not yet support images (although the upcoming version is expected to).
- The NCSU Libraries and campus users would like to evaluate SDE client functionality, particularly data extraction functions, before making a major commitment in this area.
- Password control would be problematic.
Internet Map Server
Web-based GIS applications are likely to play an increasing role in NCSU Libraries GIS services because such applications make data available to a much wider user audience and require so little in the way of equipment, software, expertise, or network bandwidth on the part of the end user. Also, as data sets grow larger and larger in size, solutions that involve leaving data on the server and transferring only a minimal amount of data--or only query results--to the client will prove increasingly attractive. Current Web-based GIS applications are somewhat limited in terms of client functionality, but the software is expected to improve. ArcIMS, with its feature-streaming functionality, is likely to play an important role in both serving casual users in a multi-platform environment and extending GIS services to remote users, including distance learners.
With these advantages in mind, in the near future the NCSU Upper Neuse Region Data System will be complemented by new Web-based data services focusing first on the western part of North Carolina and then on North Carolina as a whole.
Endnotes:
Abbott, Lisa T. & Argentati, Carolyn. 1995. GIS: A New Component of Public Services. Journal of Academic Librarianship. 21 (4) (July 1995): 251-256.
Argentati, Carolyn. 1997. Expanding Horizons for GIS Services in Academic Libraries. Journal of Academic Librarianship. 23 (6) (November 1997): 463-468.
Environmental Systems Research Institute. 1997. Dynamic Map Publishing on the Web: ArcView Internet Map Server. [Online]. Available: http://www.Esri.com/library/fliers/pdfs/ims76664.pdf [October, 1997]
Environmental Systems Research Institute. 1998a. Spatial Database Engine. [Online]. Available: http://www.Esri.com/library/brochures/pdfs/sdebroch.pdf [April, 1998]
Environmental Systems Research Institute. 1998b. System Design Strategies. [Online]. Available: http://www.Esri.com/library/whitepapers/pdfs/sysdesig.pdf [November, 1998]
Federal Geographic Data Committee. 1997. Geospatial Metadata. [Online]. Available: http://www.fgdc.gov/publications/documents/metadata/metafact.pdf [March, 1997].
Korca Baran, Perver. 1998. Upper Neuse Region Data System. [Online]. Available: http://www.lib.ncsu.edu/stacks/gis/regional/ [June 24, 1998]
McDaniel, Ellen. 1997. The EOS/Unity Academic Computing Environment. [Online]. Available: http://www.eos.ncsu.edu/manuals/guide/Introduction.html [August 15, 1998].
North Carolina Geographic Information Coordinating Council. 1998. Statement of Direction: North Carolina Corporate Geographic Database Horizontal Reference, Datum and Unit Of Measure. [Online]. Available: http://cgia.cgia.state.nc.us:80/gicc/policy/projsod.html [March 4, 1998].
Novell, Inc. 1995. Novell Directory Services, Operating Systems Division Marketing Brief Number 1, May 18, 1995. [Online]. Available: http://netware.novell.com/discover/mbnwds.htm [May 18, 1995].
Author Information
Name: Steven P. Morris
Title: Librarian for Spatial & Numeric Data Services
Organization: North Carolina State University Libraries
Address: DH Hill Library, Box 7111, NCSU, Raleigh, NC 27695
Telephone: (919) 513-2614
Fax: (919) 515-8264
E-mail: Steven_Morris@ncsu.edu