POSTSCRIPT FILES AND THE WORLD WIDE WEB
by
SD Lynch
Department of Agricultural Engineering
University of Natal
Pietermaritzburg, 3200
South Africa
Last updated April 7, 1997
ABSTRACT
The World Wide Web (WWW) has become the de facto medium to communicate and to share
information electronically. Research scientists, commercial companies and the lay public can use
the WWW as a common interface, thus giving them access to each other's work. The formatting
of a WWW page relies on the use of the HyperText Markup Language (HTML), which supports
text, graphics and sound.
The PostScript file format is widely used in the publishing world and also in the Geographic
Information Systems (GIS) arena to store and to produce graphical images. This complex
graphical file format is not supported in HTML, and PostScript files therefore have to be
converted to another format before they can be displayed on the WWW.
This paper will examine inter alia the fundamentals of converting PostScript files into a graphical
file format that can be displayed by a WWW browser.
INTRODUCTION
The World Wide Web (WWW) was developed in 1989 by the computer scientist Timothy
Berners-Lee to enable information to be shared among internationally dispersed teams of
researchers at the European Organisation for Nuclear Research facility near Geneva,
Switzerland. It subsequently became a platform for related software development, and the
number of linked computers grew rapidly to support a variety of endeavours, including a large
business marketplace. Its further development is guided by the WWW Consortium based at the
Massachusetts Institute of Technology in the United States of America. Users generally navigate
through the WWW using an application known as a WWW browser client or simply as a browser.
The browser presents formatted text, images that are in gif and JPEG formats, sound, or other
objects, such as links, in the form of a WWW page on a computer screen (Rutkowski, 1996).
Geographic Information Systems (GIS) can be defined as a set of hardware and software tools
that allow users of real-world applications to, inter alia, display, modify, interrogate,
store, update and query spatial information. The display of this information by users of different
hardware and software systems forms the nucleus of this paper. The author uses the GIS package
ArcInfo, which was developed and is distributed by Environmental Systems Research Institute,
Inc. (Esri).
The Adobe Systems Inc. PostScript software language gives you the power to create and print
documents of any visual complexity with total precision. PostScript was introduced in 1985 and
has since transformed what can be done with the printed page. Today, PostScript has
become the world's standard printing and imaging technology. PostScript is a computer language
that describes the appearance of a page, including elements such as text, graphics, and scanned
images, to a printer or any other output device.
PostScript works seamlessly with every major operating system and colour management system.
So whether you're using MS-DOS, Windows, OS/2, UNIX, Macintosh, a mini or mainframe
system, or any combination of the above, you can print to any printer that has PostScript and
expect the highest quality output every time.
The problem that we are faced with is that the majority of GIS packages produce
graphical output in PostScript format. Before these PostScript files can be used on the
WWW they need to be converted into one of the graphical formats that are used by WWW
browsers. PostScript files are programs in a language that has to be interpreted and
executed, which requires a powerful processor, time and the necessary software. This makes them
unsuitable for direct WWW publishing.
HYPERTEXT MARKUP LANGUAGE (HTML)
The HyperText Markup Language (HTML), whose specifications can be found on the WWW at
http://www.w3.org, is a simple markup language used to create hypertext documents that are
platform independent. HTML can represent hypertext news, mail, documentation, and
hypermedia, menus of options, database query results, simple structured documents with in-line
graphics and hypertext views of existing bodies of information.
GRAPHICAL IMAGE FILES IN HTML
The Graphics Interchange Format (gif) was initially designed for the efficient storage and
transmission of images on the commercial CompuServe Inc. network. The format
defines a protocol intended for the on-line transmission and interchange of raster data in a
way that is independent of the hardware used in its creation or display. The gif is the copyright
of CompuServe Inc., gif is a Service Mark property of CompuServe Inc., and only CompuServe Inc.
is authorised to define, redefine, enhance, alter, modify or
change in any way the definition of the format. CompuServe Inc. grants a limited, non-exclusive,
royalty-free license for the use of the gif format in computer software.
The Joint Photographic Experts Group (JPEG) format was developed to facilitate lossy
compression of large images with high colour depth. The system does not actually define a file
format (that is, how the compressed data should be written into the disk file). An extension to the
basic specification (the JPEG File Interchange Format, or JFIF) was created for this purpose.
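A program that receives an image file can tell the two WWW formats apart from their first few bytes. The following Python sketch is illustrative only, and the function name sniff_image is hypothetical. It relies on the gif signature (GIF87a or GIF89a, followed by the image width and height as 16-bit little-endian integers) and on the two-byte start-of-image marker that opens a JPEG stream.

```python
import struct


def sniff_image(header: bytes):
    """Guess the format of an image from its leading bytes.

    Returns ("gif", width, height), ("jpeg", None, None) or
    ("unknown", None, None). Only the first 10 bytes are examined.
    """
    if header[:6] in (b"GIF87a", b"GIF89a") and len(header) >= 10:
        # Logical screen width and height: two unsigned 16-bit
        # little-endian integers directly after the signature.
        width, height = struct.unpack("<HH", header[6:10])
        return ("gif", width, height)
    if header[:2] == b"\xff\xd8":
        # JPEG streams (including JFIF files) begin with the
        # start-of-image marker FF D8.
        return ("jpeg", None, None)
    return ("unknown", None, None)


if __name__ == "__main__":
    # A hand-built gif header describing a 320x200 image.
    sample = b"GIF89a" + struct.pack("<HH", 320, 200)
    print(sniff_image(sample))
```

Because the gif width and height are each stored in 16 bits, neither dimension can exceed 65535 pixels.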
COMPRESSION OF IMAGE FILES
Originally limited to a maximum resolution of 320x200, the gif format was expanded to cover an
almost unlimited size range (actually up to 64K by 64K pixels). Its major limitation remains its lack of
colour depth: it stores only 8 bits per pixel, so it is limited to a maximum of 256 colours. It is
probably the most widely used of all graphics formats. In recent years it has received something of
a new lease of life as one of the two file formats (the other being JPEG) used on the pages of the
WWW. The most widely used compression technique is the LZW algorithm, which grew out of the
work of Lempel and Ziv in the late 1970s and was published in its present form by Welch in 1984.
Whilst the details of this algorithm are not
relevant here, it is generally accepted as a superior approach that produces consistently higher
compression ratios whilst still being as fast as the Run Length Encoded (RLE) methods. Some of
the most common and most efficient bitmap formats (such as gif and TIFF) use this LZW
encoding technique.
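For readers who nevertheless want a feel for how LZW works, the following Python sketch shows the textbook form of the algorithm. It is not the exact variant used inside gif files, which also packs the codes into variable-width bit fields and uses special clear and end codes. The function names are hypothetical. The idea is that the encoder builds a table of byte sequences it has already seen and emits one code per longest known sequence, and the decoder rebuilds the same table from the codes alone.

```python
def lzw_encode(data: bytes) -> list:
    """Encode bytes as a list of integer LZW codes (textbook variant)."""
    table = {bytes([i]): i for i in range(256)}  # all single bytes
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            # Keep extending the match while it is still known.
            current = candidate
        else:
            # Emit the longest known match and learn the new sequence.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes: list) -> bytes:
    """Invert lzw_encode by rebuilding the same table on the fly."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the sequence being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


if __name__ == "__main__":
    row = bytes([7]) * 64  # 64 pixels of one palette colour
    codes = lzw_encode(row)
    print(len(row), "bytes ->", len(codes), "codes")
    assert lzw_decode(codes) == row
```

Long runs of a single colour, which are common in GIS maps, collapse into a handful of codes, which is why the technique suits this kind of image.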
In the last twelve months the LZW method has been the subject of intense debate in both the
computer graphics and the Internet communities, both of which make extensive use of the gif file
format, which uses LZW compression. The method was patented by Welch's employer
(Unisys), which asserted its rights to royalties from the writers of software that uses this
technique. Whilst end users of the file format were not affected, this generated a backlash against
the technique and stimulated the rapid creation of a new file format using a public domain version
of the same algorithm (Goodman, 1996). In reaction to the licensing problems around the LZW
compression technique used in the gif, the developers of Ghostscript have been prompted to
exclude this format from their list of supported graphical interchange formats.
A public domain system was created by a group that grew out of the recognition of the need for
improved formats for the electronic transmission of colour and grey-scale data, particularly by
facsimile. Out of this came the JPEG, who in turn created the JPEG compression process, made it
publicly available, and continue to develop it. This group includes industry representatives, but is
essentially an international standards organisation.
POSTSCRIPT FILES
As the desktop publishing market developed on PCs in the mid-1980s, and the laser printer
became widely used, PostScript became the most common language in this environment.
However, given its origins, PostScript is unsuitable as a means of transferring documents
(combining text and images) between applications and platforms. To deal with this Adobe
Systems Inc. developed the Encapsulated PostScript (EPS) format. EPS files include a PostScript
description of a "page", and a low-resolution bitmap that can be used to represent the "page"
when incorporated into other documents (or shown on a different, non-PostScript display
system).
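To make the structure of an EPS file more concrete, the following Python sketch holds a very small EPS program as text and reads its bounding box. It is an illustration under stated assumptions: the sample file is hand-written (it is not ArcInfo output and it omits the preview bitmap), and the names MINIMAL_EPS and read_bounding_box are hypothetical. The bounding box comment, given in PostScript points (1/72 inch), tells an importing application how much of the "page" the drawing occupies.

```python
# A hand-written EPS file: a header comment, a bounding box comment,
# and a short PostScript program that strokes a rectangle.
MINIMAL_EPS = """%!PS-Adobe-3.0 EPSF-3.0
%%BoundingBox: 0 0 200 100
newpath
10 10 moveto
190 10 lineto
190 90 lineto
10 90 lineto
closepath
stroke
showpage
"""


def read_bounding_box(eps_text: str):
    """Return (llx, lly, urx, ury) from the %%BoundingBox comment.

    Returns None when the comment is missing. A "(atend)" value, which
    defers the numbers to the end of the file, is skipped so that the
    later comment carrying the real values is used instead.
    """
    for line in eps_text.splitlines():
        if line.startswith("%%BoundingBox:"):
            fields = line.split(":", 1)[1].split()
            if fields and fields[0] == "(atend)":
                continue
            return tuple(int(value) for value in fields[:4])
    return None


if __name__ == "__main__":
    print(read_bounding_box(MINIMAL_EPS))
```

Note that everything after the comments is program text that an interpreter must execute to obtain the picture, in contrast to a bitmap file, which stores the pixels themselves.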
TECHNIQUES FOR CONVERTING POSTSCRIPT TO gif
PostScript files are programs in a computer graphics language and therefore need to be interpreted
and rendered before they can be displayed, whereas image files such as gif and JPEG already hold
the rendered picture and can therefore be displayed much more quickly than PostScript files.
Screen capture method
This is probably the most widely used method of converting between the PostScript and gif file formats. In this
method the PostScript file is displayed on the monitor using, for example, a freely available
package such as Ghostscript. A screen-grabbing utility is then used to capture the display, which is
then saved in one of the WWW friendly graphics formats. There is a loss of resolution when
using this method, but it has a major advantage in the speed of conversion.
Scanning method
The PostScript file is sent to a PostScript printer or to any other "bitmap" printer (using, for
example, the Ghostscript software) to produce a hardcopy. This hardcopy is then scanned using
an image scanner and this image is then saved into one of the WWW friendly graphics formats.
The time and cost involved in producing the hardcopy are the main disadvantages of this method.
Multi-conversion method
In this method the PostScript file is first converted into a bitmapped format and this file is then
converted into one of the WWW friendly graphics formats. For example, the Ghostscript software
can be used to convert the PostScript file to a Tagged Image File Format (TIFF) file and Wingif or
XV can then be used to convert the TIFF file to a WWW friendly file format.
ImageMagick method
ImageMagick is a package for the display and interactive manipulation of images for the X Window
System. It is written in C and interfaces to the X library, and therefore does not require any
proprietary toolkit in order to compile. Although the software is copyrighted, it is available for free
and can be redistributed without fee. The conversion from PostScript format to the gif format
can be done in a batch process using the convert program of ImageMagick. This is therefore
the way to go when a large number of PostScript files require converting to the gif format. LZW
compression is no longer available in the ImageMagick distribution (gif pixel data is saved
uncompressed). Unisys claims that it has the right to demand licenses and/or fees from free
software incorporating the LZW algorithms, even though it is currently not doing this. This
will hang over the head of the developer of any free software that creates gif files until the Welch
patent expires on December 10, 2002 (17 years after its award date). The popular shareware
program for Windows, Paint Shop Pro, can be used in a batch mode to convert these
uncompressed gif files to compressed gif files.
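A batch run of this kind could be driven from a short script. The following Python sketch is one possible approach, not the author's own procedure. It assumes that the ImageMagick convert program is installed and reachable on the search path under that name, and the function names are hypothetical. It builds one "convert input output" command per PostScript file and only runs them when asked to.

```python
import subprocess
from pathlib import Path


def convert_commands(sources):
    """Build one ImageMagick command line per PostScript file.

    Each command has the form: convert <input> <output>, where the
    output name is the input name with its extension changed to .gif.
    """
    commands = []
    for source in sources:
        target = Path(source).with_suffix(".gif")
        commands.append(["convert", str(source), str(target)])
    return commands


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Collect every .ps file in the current directory (an assumed layout)
    # and print the commands instead of running them.
    files = sorted(str(path) for path in Path(".").glob("*.ps"))
    for argv in convert_commands(files):
        print(" ".join(argv))
```

Printing the commands first makes it easy to check the file names before committing to a long conversion run. Calling run_all on the same list performs the conversion.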
The above-mentioned methods can be used when only one or a few PostScript files require
conversion. The author has yet to find an easy automatic method that can be used when a few
hundred files require conversion. This is the case when a GIS is used to produce a number of
different scenarios.
IMAGE RESOLUTION
The first questions that an author wanting to convert a PostScript file to a WWW friendly format
needs to ask are: who will be viewing the image, what size should the image be, and what is going
to be done with the image?
Some images, such as Fig. 1, are used primarily to depict the distribution and sizes of, for example, the
Provinces in Southern Africa and therefore do not require a high level of resolution. Fig. 2, on the
other hand, requires a high level of resolution as the author wishes to portray the proximity of certain
zonal, arc and point information.
The level of resolution should therefore be taken into account when deciding on which conversion
technique to use. In the case of Fig. 1, a screen capture method would suffice whereas a more
exact conversion technique is required when using an image as described by Fig. 2.
The resolution of a PostScript file depends only on the output device, and PostScript is therefore the ideal
format to use when saving or sharing images. The only drawbacks, when using the WWW, are the
complexity of the software that is required to render the image and, most important of all, the
speed at which this can be done.
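Once a rendering resolution has been chosen, the pixel dimensions of the resulting bitmap follow from simple arithmetic. The following Python sketch works this through for a page size given in millimetres. The function name and the example values (an A4 page, 100 dots per inch and a scale factor of 0.5) are assumptions made for illustration, and a real renderer may round slightly differently.

```python
MM_PER_INCH = 25.4


def pixel_size(width_mm, height_mm, dpi, scale=1.0):
    """Return the (width, height) in pixels of a rendered page.

    width_mm, height_mm: page size in millimetres
    dpi:                 rendering resolution in dots per inch
    scale:               optional reduction applied after rendering
    """
    width_px = round(width_mm / MM_PER_INCH * dpi * scale)
    height_px = round(height_mm / MM_PER_INCH * dpi * scale)
    return width_px, height_px


if __name__ == "__main__":
    full = pixel_size(210, 297, 100)          # A4 at 100 dots per inch
    half = pixel_size(210, 297, 100, 0.5)     # the same page at half size
    print(full, half)
    # At 8 bits per pixel an uncompressed bitmap needs one byte per pixel.
    print(full[0] * full[1], "bytes uncompressed")
```

For an A4 page (210 mm by 297 mm) at 100 dots per inch this gives 827 by 1169 pixels, and about 413 by 585 pixels after scaling by 0.5. Doubling the resolution quadruples the number of pixels, which is why the intended use of the image should decide the resolution.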
AUTHOR'S CHOICE
This paper is still in preparation, and interested parties should therefore revisit it every week or so to see what has changed. The author is in the process of researching the different Pbmplus tools. The process that I have used to generate the gif files that can be viewed at http://www.ccwr.ac.za/~lynch2/data.html is as follows:
1) use ArcInfo to produce an EPS file,
2) use Ghostscript to convert the EPS file (a.eps) to a PCX file (a.pcx),
    gswin32 -q -r100 -dNOPAUSE -sOutputFile=a.pcx -sPAPERSIZE=a4 -sDEVICE=pcx256 a.eps
3) use Pbmplus to convert the PCX file (a.pcx) to a portable pixmap file (a.ppm),
    pcxtoppm.exe a.pcx > a.ppm
4) use Pbmplus to crop the pixmap file (a.ppm) to a pixmap file (a.crp),
    pnmcrop.exe a.ppm > a.crp
5) use Pbmplus to reduce the size of the pixmap file (a.crp) into a pixmap file (a.scl),
    pnmscale.exe 0.5 a.crp > a.scl
6) use Pbmplus to reduce the number of colours in the pixmap file (a.scl) to a pixmap file (a.qua) containing a maximum of 256 colours (gif limitation),
    ppmquant.exe 256 a.scl > a.qua
7) use Pbmplus to rotate the pixmap file (a.qua) through 90 degrees to produce a pixmap file (a.rot),
    pnmrotat.exe -noantialias -90 a.qua > a.rot
8) finally use Pbmplus to convert the pixmap file (a.rot) to a gif file (a.gif),
    ppmtogif.exe -interlace a.rot > a.gif
9) insert a URL pointing to the gif file (a.gif).
I have used an IBM RS6000 for steps 1 and 9 and a Pentium 100 Win95 machine for the other
steps. I have set up an automated process to perform steps 2 through 8 and it takes approximately
2 minutes to produce the images mentioned above. The Pbmplus suite of Win95 utilities can be
downloaded from a simtel site near you by searching the Shareware site for netpbm.
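Steps 2 through 8 above could be automated along the following lines. This Python sketch is not the author's actual automation, which is not described in this paper. It assumes that the programs named in the steps are on the search path under exactly those names, and the function names build_pipeline and run_pipeline are hypothetical. Each Pbmplus tool writes its result to standard output, so the script captures that output into the next intermediate file.

```python
import subprocess


def build_pipeline(stem, dpi=100, scale=0.5, colours=256, angle=-90):
    """Return steps 2 to 8 as (argv, stdout_file) pairs for <stem>.eps.

    stdout_file is None when the program writes its own output file.
    """
    return [
        (["gswin32", "-q", f"-r{dpi}", "-dNOPAUSE",
          f"-sOutputFile={stem}.pcx", "-sPAPERSIZE=a4",
          "-sDEVICE=pcx256", f"{stem}.eps"], None),
        (["pcxtoppm.exe", f"{stem}.pcx"], f"{stem}.ppm"),
        (["pnmcrop.exe", f"{stem}.ppm"], f"{stem}.crp"),
        (["pnmscale.exe", str(scale), f"{stem}.crp"], f"{stem}.scl"),
        (["ppmquant.exe", str(colours), f"{stem}.scl"], f"{stem}.qua"),
        (["pnmrotat.exe", "-noantialias", str(angle), f"{stem}.qua"],
         f"{stem}.rot"),
        (["ppmtogif.exe", "-interlace", f"{stem}.rot"], f"{stem}.gif"),
    ]


def run_pipeline(stem):
    """Run every step in order, stopping at the first failure."""
    for argv, stdout_file in build_pipeline(stem):
        if stdout_file is None:
            # Give the program an empty standard input so that it sees
            # end-of-file instead of waiting for keyboard input.
            subprocess.run(argv, stdin=subprocess.DEVNULL, check=True)
        else:
            with open(stdout_file, "wb") as out:
                subprocess.run(argv, stdin=subprocess.DEVNULL,
                               stdout=out, check=True)


if __name__ == "__main__":
    # Show the commands for a.eps without running them.
    for argv, stdout_file in build_pipeline("a"):
        suffix = f" > {stdout_file}" if stdout_file else ""
        print(" ".join(argv) + suffix)
```

Calling run_pipeline once per EPS file name would then process a whole set of GIS scenarios unattended. Keeping the command list separate from the code that runs it makes the steps easy to inspect or adjust, for example to change the resolution or the scale factor.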
DISCUSSION AND CONCLUSIONS
In the 1980s computer users had access to a host of different computer systems and each of these
systems was using a different operating system. It was difficult to share information across
platforms. The acceptance of UNIX and the Windows operating systems led to an almost
transparent sharing of information between operating systems.
The introduction of the Internet and in particular the WWW and HTML, has made it possible for
different computers using different operating systems to share information among users all over the
world via the Internet communications backbone. The ability to share text or ASCII data has
never actually posed a problem to the computer user community. The major headache has been
the ability to share graphical images between different operating systems. When Adobe Systems
Inc. launched the PostScript format, a transparent graphical language was introduced that enabled
users to output the graphical images exactly and without loss of resolution to any PostScript
printer or to any bitmap printer using the appropriate software (e.g. Ghostscript software).
The GIS industry has expanded at almost the same rate as that of the Internet and the WWW. The
ability of the GIS fraternity to share graphical images across the WWW is therefore of utmost
importance. The majority of GIS packages are able to produce output in a PostScript format and
it is hoped that this document will assist them in sharing their images with the WWW user
community.
INDUSTRY WISH LIST
The GIS software developers need to produce code that will allow the users an option to save
their graphical output into a WWW friendly format. When this has been done this document will
become obsolete and the WWW and the GIS fraternity will be able to share information more
freely.
ACKNOWLEDGEMENTS
The Computing Centre for Water Research (CCWR) is acknowledged gratefully for their
assistance in making this research possible and for allowing the author to make use of their WWW
server to publish and disseminate his articles to the scientific community all over the
world. The Research Fund of the University of Natal is thanked for their financial support in this
project. The Water Research Commission (WRC) is also acknowledged for allowing time to do
this research. Finally, the Internet user community, and in particular Richard Kunz, are also
thanked for their assistance in making this research possible.
DISCLAIMER
The information provided herein is subject to change without notice. In no event will I be liable
for damages, including loss of revenue, loss of profits or other incidental or consequential
damages arising out of the use or inability to use the information presented in this document.
REFERENCES
Adobe Systems Inc.©
Aladdin Enterprises
CompuServe Inc.©
Computing Centre for Water Research
Environmental Systems Research Institute Inc.©
Albert Goodman
JASC, Inc., Paint Shop Pro©, PO Box 44997, Eden Prairie, Minnesota 55344, UNITED STATES OF AMERICA.
ImageMagick, E. I. du Pont de Nemours and Company©, 1007 Market Street, Wilmington, Delaware 19898, UNITED STATES OF AMERICA.
Independent JPEG Group
Anthony M. Rutkowski, Microsoft® Encarta® 96 Encyclopedia, Microsoft® Corporation, One Microsoft Way, Redmond, Washington 98052-6399, UNITED STATES OF AMERICA.
Unisys Corporation©
Water Research Commission
Welch Patent Licensing Department
Wingif©, SuperSet Software Corp., PO Box 50476, Provo, Utah 84605-0476, UNITED STATES OF AMERICA.
XV©