POSTSCRIPT FILES AND THE WORLD WIDE WEB

by

SD Lynch

Department of Agricultural Engineering

University of Natal

Pietermaritzburg, 3200

South Africa




Last updated April 7, 1997






ABSTRACT

The World Wide Web (WWW) has become the de facto medium for communicating and sharing information electronically. Research scientists, commercial companies and the lay public can use the WWW as a common interface, giving them access to each other's work. The formatting of a WWW page relies on the Hyper Text Markup Language (HTML), which supports text, graphics and sound.

The PostScript file format is widely used in the publishing world and in the Geographic Information Systems (GIS) arena to store and produce graphical images. This complex graphical file format is not supported in HTML, and PostScript files therefore have to be converted to another format before they can be displayed on the WWW.

This paper will examine inter alia the fundamentals of converting PostScript files into a graphical file format that can be displayed by a WWW browser.

INTRODUCTION

The World Wide Web (WWW) was developed in 1989 by the computer scientist Timothy Berners-Lee to enable information to be shared among internationally dispersed teams of researchers at the European Organisation for Nuclear Research facility near Geneva, Switzerland. It subsequently became a platform for related software development, and the number of linked computers grew rapidly to support a variety of endeavours, including a large business marketplace. Its further development is guided by the WWW Consortium based at the Massachusetts Institute of Technology in the United States of America. Users generally navigate through the WWW using an application known as a WWW browser client or simply as a browser. The browser presents formatted text, images in gif and JPEG formats, sound, or other objects, such as links, in the form of a WWW page on a computer screen (Rutkowski, 1996).

Geographic Information Systems (GIS) can be defined as a set of hardware and software tools that allow the users of real-world applications to, inter alia, display, modify, interrogate, store, update and query spatial information. The display of this information by users of different hardware and software systems forms the nucleus of this paper. The author uses the GIS package ArcInfo, which was developed and is distributed by Environmental Systems Research Institute, Inc. (Esri).

The Adobe Systems Inc. PostScript language gives you the power to create and print documents of any visual complexity with total precision. PostScript was introduced in 1985, and since then it has transformed what can be done with the printed page. Today, PostScript has become the world's standard printing and imaging technology. PostScript is a computer language that describes the appearance of a page, including elements such as text, graphics, and scanned images, to a printer or any other output device.

PostScript works seamlessly with every major operating system and colour management system. So whether you're using MS-DOS, Windows, OS/2, UNIX, Macintosh, a mini or mainframe system, or any combination of the above, you can print to any printer that has PostScript and expect the highest quality output every time.

The problem that we are faced with is that the majority of GIS packages produce graphical output in PostScript format. Before these PostScript files can be used on the WWW, they need to be converted into one of the graphical formats used by WWW browsers. PostScript files are actually programs in a language that has to be interpreted and executed, which requires a powerful processor, time and the necessary software; this makes them unsuitable for WWW publishing.

HYPERTEXT MARKUP LANGUAGE (HTML)

The HyperText Markup Language (HTML), whose specifications can be found on the WWW at http://www.w3.org, is a simple markup language used to create hypertext documents that are platform independent. HTML can represent hypertext news, mail, documentation and hypermedia, menus of options, database query results, simple structured documents with in-line graphics, and hypertext views of existing bodies of information.

GRAPHICAL IMAGE FILES IN HTML

The Graphics Interchange Format (gif) was initially designed for the efficient storage and transmission of images on the commercial CompuServe Inc. network. It defines a protocol intended for the on-line transmission and interchange of raster data in a way that is independent of the hardware used in its creation or display. The gif format is the copyright of CompuServe Inc., and only they are authorised to define, redefine, enhance, alter, modify or change in any way the definition of the format. CompuServe Inc. grants a limited, non-exclusive, royalty-free license for the use of the gif format in computer software. The name gif is a Service Mark property of CompuServe Inc.

The Joint Photographic Experts Group (JPEG) format was developed to facilitate lossy compression of large images with high colour depth. The system does not actually define a file format (that is, how the compressed data should be written into the disk file). An extension to the basic specification (the JPEG File Interchange Format, or JFIF) was created for this purpose.

COMPRESSION OF IMAGE FILES

Originally limited to a maximum resolution of 320x200, the gif format was expanded to cover an almost unlimited size range (actually up to 64K by 64K). Its major limitation remains its lack of colour depth: it stores only 8 bits per pixel, so it is limited to a maximum of 256 colours. It is probably the most widely used of all graphics formats. In recent years it has received something of a new lease of life as one of the two file formats (the other being JPEG) used on the pages of the WWW. The most widely used compression technique is the LZW algorithm, which builds on the work of Lempel and Ziv in the late 1970s and was published in its present form by Welch in 1984. Whilst the details of this algorithm are not relevant here, it is generally accepted as a superior approach that produces consistently higher compression ratios whilst still being as fast as the Run Length Encoded (RLE) methods. Some of the most common and most efficient bitmap formats (such as gif and TIFF) use this LZW encoding technique.

In the last twelve months the LZW method has been the subject of intense debate in both the computer graphics and the Internet communities, both of which make extensive use of the gif file format, which uses LZW compression. The method was patented by Welch's employers (Unisys), and they asserted their rights to royalties from the writers of software that uses this technique. Whilst end users of the file format were not affected, this generated a backlash against the technique and stimulated the rapid creation of a new file format using a public domain version of the same algorithm (Goodman, 1996). In reaction to the licensing problems around the LZW compression technique used in gif, the developers of Ghostscript have been prompted to exclude this format from their list of supported graphical interchange formats.

A public domain system was created by a group that grew out of the recognition of the need for improved formats for the electronic transmission of colour and grey-scale data, particularly by facsimile. Out of this came the Joint Photographic Experts Group, which in turn created the JPEG compression process, made it publicly available, and continues to develop it. This group includes industry representatives, but is essentially an international standards organisation.

POSTSCRIPT FILES

As the desktop publishing market developed on PCs in the mid-1980s, and the laser printer became widely used, PostScript became the most common language in this environment. However, given its origins as a printer language, PostScript is unsuitable as a means of transferring documents (combining text and images) between applications and platforms. To deal with this, Adobe Systems Inc. developed the Encapsulated PostScript (EPS) format. EPS files include a PostScript description of a "page" and a low-resolution bitmap that can be used to represent the "page" when it is incorporated into other documents (or shown on a different, non-PostScript display system).

TECHNIQUES FOR CONVERTING POSTSCRIPT TO gif

PostScript files are programs in a computer graphics language and therefore need to be interpreted before they can be displayed, whereas image files such as gif and JPEG have already been "compiled" into pixels and can therefore be displayed much more quickly than PostScript files.

Screen capture method

This is probably the most widely used method of converting between the PostScript and gif file formats. In this method the PostScript file is displayed on the monitor using, for example, a freely available package such as Ghostscript. A screen grabbing utility is then used to capture the display, which is saved into one of the WWW friendly graphics formats. There is a loss of resolution when using this method, but it has a major advantage in the speed of conversion.

Scanning method

The PostScript file is sent to a PostScript printer or to any other "bitmap" printer (using, for example, the Ghostscript software) to produce a hardcopy. This hardcopy is then scanned using an image scanner and this image is then saved into one of the WWW friendly graphics formats. The time and cost involved in producing the hardcopy are the main disadvantages of this method.

Multi-conversion method

In this method the PostScript file is first converted into a bitmapped format, and this file is then converted into one of the WWW friendly graphics formats. For example, the Ghostscript software can be used to convert the PostScript file to a Tagged Image File Format (TIFF) file, and Wingif or XV can then be used to convert the TIFF file to a WWW friendly file format.

ImageMagick method

ImageMagick is a package for the display and interactive manipulation of images under the X Window System. It is written in C and interfaces to the X library, and therefore does not require any proprietary toolkit in order to compile. Although the software is copyrighted, it is available free of charge and can be redistributed without fee. The conversion from PostScript format to the gif format can be done in a batch process using the CONVERT routine of ImageMagick. This is therefore the way to go when a large number of PostScript files require converting to the gif format. LZW compression is no longer available in the ImageMagick distribution (gif pixel data is saved uncompressed). Unisys claims the right to demand licenses and/or fees from free software incorporating the LZW algorithm, even though it is currently not doing so. This will hang over the head of the developer of any free software that creates gif files until the Welch patent expires on December 10, 2002 (17 years after its award date). The popular shareware program for Windows, Paint Shop Pro, can be used in batch mode to convert these uncompressed gif files to compressed gif files.
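The batch conversion described above might be sketched as a small shell function. This is only an illustration under stated assumptions: the function name batch_convert and the RUN variable are hypothetical, the files are assumed to end in .ps, and the -density 100 setting (the rasterising resolution in dots per inch) is an arbitrary choice. ImageMagick hands PostScript input to Ghostscript, so Ghostscript must also be installed, and newer ImageMagick releases may expect the command to be called magick rather than convert.

```shell
#!/bin/sh
# Sketch only: batch_convert and RUN are illustrative names, not part of ImageMagick.
# With RUN unset, each command is printed (a dry run); set RUN= (empty) to execute.
RUN=${RUN-echo}

batch_convert() {
  for f in "$@"; do
    # Strip a trailing .ps and write a gif next to the input file.
    $RUN convert -density 100 "$f" "${f%.ps}.gif"
  done
}

# Example (dry run):  batch_convert *.ps
```

Printing the commands first makes it easy to inspect what would be run on a few hundred files before committing to the conversion.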

The above mentioned methods can be used when only one or a few PostScript files require conversion. The author has yet to find an easy automatic method that can be used when a few hundred files require conversion. This is the case when a GIS is used to produce a number of different scenarios.

IMAGE RESOLUTION

The first questions that an author wanting to convert a PostScript file to a WWW friendly format needs to ask are:

who will be viewing the image, what size should the image be, and what is going to be done with the image?

Some images, such as Fig. 1, are used primarily to depict the distribution and sizes of, for example, the Provinces in Southern Africa and therefore do not require a high level of resolution. Fig. 2, on the other hand, requires a high level of resolution as the author wishes to portray the proximity of certain zonal, arc and point information.

The level of resolution should therefore be taken into account when deciding on which conversion technique to use. In the case of Fig. 1, a screen capture method would suffice whereas a more exact conversion technique is required when using an image as described by Fig. 2.

The resolution of a PostScript file depends only on the output device, and PostScript is therefore the ideal format to use when saving or sharing images. The only drawbacks, when using the WWW, are the complexity of the software that is required to render the image and, most important of all, the speed at which this can be done.
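Because a PostScript page is described in points (1/72 inch) rather than pixels, the pixel size of a converted image follows from the resolution chosen at conversion time. The sketch below shows the arithmetic; the function name points_to_pixels is hypothetical, and the rounding used by a particular rasteriser may differ by a pixel.

```shell
#!/bin/sh
# Sketch only: points_to_pixels is an illustrative helper name.
# Usage: points_to_pixels LENGTH_IN_POINTS DOTS_PER_INCH
# Converts a length in PostScript points (1/72 inch) to pixels,
# rounded to the nearest whole pixel using integer arithmetic.
points_to_pixels() {
  echo $(( ($1 * $2 + 36) / 72 ))
}

# An A4 page is 595 by 842 points.
# points_to_pixels 595 100   # width in pixels at 100 dots per inch
# points_to_pixels 842 100   # height in pixels at 100 dots per inch
```

At 100 dots per inch an A4 page therefore comes out at roughly 826 by 1169 pixels, which is larger than most monitors of the day can show at once; this is one reason for scaling the image down before publishing it.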

AUTHOR'S CHOICE

This paper is still in preparation, and interested parties should therefore view it every week or so to see what is happening. The author is in the process of researching the different Pbmplus tools. The process that I have used to generate the gif files that can be viewed at http://www.ccwr.ac.za/~lynch2/data.html is as follows:

1) use ArcInfo to produce an EPS file,

2) use Ghostscript to convert the EPS file (a.eps) to a PCX format (a.pcx)

gswin32 -q -r100 -dNOPAUSE -sOutputFile=a.pcx -sPAPERSIZE=a4 -sDEVICE=pcx256 a.eps

3) use Pbmplus to convert the PCX (a.pcx) file to portable pixmap file (a.ppm)

pcxtoppm.exe a.pcx > a.ppm

4) use Pbmplus to crop the pixmap file (a.ppm) to a pixmap file (a.crp)

pnmcrop.exe a.ppm > a.crp

5) use Pbmplus to reduce the size of the pixmap file (a.crp) onto a pixmap file (a.scl)

pnmscale.exe 0.5 a.crp > a.scl

6) use Pbmplus to reduce the number of colours in the pixmap file (a.scl) to a pixmap file (a.qua) containing a maximum of 256 colours (gif limitation)

ppmquant.exe 256 a.scl > a.qua

7) use Pbmplus to rotate the pixmap file (a.qua) through 90 degrees to produce a pixmap file (a.rot)

pnmrotat.exe -noantialias -90 a.qua > a.rot

8) finally use Pbmplus to convert the pixmap file (a.rot) to a gif file (a.gif)

ppmtogif.exe -interlace a.rot > a.gif

9) insert a URL pointing to the gif file (a.gif)

I have used an IBM RS6000 for steps 1 and 9 and a Pentium 100 Win95 machine for the other steps. I have set up an automated process to perform steps 2 through 8, and it takes approximately 2 minutes to produce the images mentioned above. The Pbmplus suite of Win95 utilities can be downloaded from a simtel site near you by searching the Shareware site for netpbm.
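Steps 2 through 8 could be automated along the following lines. This is a sketch and not the author's actual script: it assumes a Unix shell and the Unix names of the tools (gs, pnmrotate and so on, rather than gswin32 and the 8.3 names used under Win95), it adds -dBATCH so that Ghostscript exits when the page is done, and it pipes the Pbmplus tools together instead of writing the intermediate .ppm, .crp, .scl, .qua and .rot files. The function name eps_to_gif_cmds is hypothetical.

```shell
#!/bin/sh
# Sketch only: eps_to_gif_cmds is an illustrative name.
# Prints (does not run) the two commands that convert one EPS file
# to an interlaced gif, following steps 2 to 8 of the list above.
# File names containing spaces are not handled.
eps_to_gif_cmds() {
  base=${1%.eps}
  # Step 2: rasterise the EPS file to a 256-colour PCX file at 100 dpi.
  echo "gs -q -r100 -dNOPAUSE -dBATCH -sPAPERSIZE=a4 -sDEVICE=pcx256 -sOutputFile=$base.pcx $base.eps"
  # Steps 3 to 8: convert, crop, halve, quantise, rotate, write the gif.
  echo "pcxtoppm $base.pcx | pnmcrop | pnmscale 0.5 | ppmquant 256 | pnmrotate -noantialias -90 | ppmtogif -interlace > $base.gif"
}

# Example: generate the commands for every EPS file and run them.
# for f in *.eps; do eps_to_gif_cmds "$f"; done | sh
```

Generating the commands as text keeps the sketch inspectable: the output can be reviewed, saved as a batch file, or piped to sh once it looks right.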

DISCUSSION AND CONCLUSIONS

In the 1980's computer users had access to a host of different computer systems and each of these systems was using a different operating system. It was difficult to share information across platforms. The acceptance of UNIX and the Windows operating systems led to an almost transparent sharing of information between operating systems.

The introduction of the Internet, and in particular the WWW and HTML, has made it possible for different computers using different operating systems to share information with users all over the world via the Internet communications backbone. The ability to share text or ASCII data has never actually posed a problem to the computer user community. The major headache has been sharing graphical images between different operating systems. When Adobe Systems Inc. launched the PostScript format, a transparent graphical language was introduced that enabled users to output graphical images exactly and without loss of resolution to any PostScript printer, or to any bitmap printer using the appropriate software (e.g. Ghostscript software).

The GIS industry has expanded at almost the same rate as that of the Internet and the WWW. The ability of the GIS fraternity to share graphical images across the WWW is therefore of utmost importance. The majority of GIS packages are able to produce output in a PostScript format and it is hoped that this document will assist them in sharing their images with the WWW user community.

INDUSTRY WISH LIST

The GIS software developers need to produce code that will allow the users an option to save their graphical output into a WWW friendly format. When this has been done this document will become obsolete and the WWW and the GIS fraternity will be able to share information more freely.

ACKNOWLEDGEMENTS

The Computing Centre for Water Research (CCWR) is acknowledged gratefully for their assistance in making this research possible and for allowing the author to make use of their WWW server to publish and disseminate his published articles to the scientific community all over the World. The Research Fund of the University of Natal is thanked for their financial support in this project. The Water Research Commission (WRC) is also acknowledged for allowing time to do this research. Finally, the Internet user community, and in particular Richard Kunz, are also thanked for their assistance in making this research possible.

DISCLAIMER

The information provided herein is subject to change without notice. In no event will I be liable for damages, including loss of revenue, loss of profits or other incidental or consequential damages arising out of the use or inability to use the information presented in this document.

REFERENCES

Adobe Systems Inc.©, 345 Park Avenue, San Jose, California 95110-2704, UNITED STATES OF AMERICA.

Aladdin Enterprises, Aladdin Ghostscript, 203 Santa Margarita Ave., Menlo Park, California 94025, UNITED STATES OF AMERICA.

CompuServe Inc.©, Graphics Technology Department, 5000 Arlington Centre Boulevard, Columbus, Ohio 43220, UNITED STATES OF AMERICA.

Computing Centre for Water Research, c/o University of Natal, Private Bag X01, Scottsville, 3209, SOUTH AFRICA.

Environmental Systems Research Institute Inc.©, 380 New York Street, Redlands, California 92373-8100, UNITED STATES OF AMERICA.

Albert Goodman, School of Computing and Mathematics, Deakin University (Rusden), Clayton, Victoria 3168, AUSTRALIA.

JASC, Inc., Paint Shop Pro©, PO Box 44997, Eden Prairie, Minnesota 55344, UNITED STATES OF AMERICA.

ImageMagick, E. I. du Pont de Nemours and Company ©, 1007 Market Street, Wilmington, Delaware 19898, UNITED STATES OF AMERICA.

Independent JPEG Group

Anthony M. Rutkowski, Microsoft® Encarta® 96 Encyclopedia, Microsoft® Corporation, One Microsoft Way, Redmond, Washington 98052-6399, UNITED STATES OF AMERICA.

Unisys Corporation©, PO Box 500, Blue Bell, Pennsylvania 19424, UNITED STATES OF AMERICA.

Water Research Commission, PO Box 824, Pretoria, 0001, SOUTH AFRICA.

Welch Patent Licensing Department, Unisys, Mail Stop C1SW19, PO Box 500, Blue Bell, Pennsylvania 19424, UNITED STATES OF AMERICA.

Wingif©, SuperSet Software Corp., PO Box 50476, Provo, Utah 84605-0476, UNITED STATES OF AMERICA.

XV©, John Bradley, 1053 Floyd Terrace, Bryn Mawr, Pennsylvania 19010, UNITED STATES OF AMERICA.