Huang Zhengdong

The Manipulation of Chinese Characters in PC ARC/INFO

ABSTRACT

For software such as PC ARC/INFO (DOS version), which cannot run in a Chinese Language Environment (CLE), manipulating Chinese characters for map output is a troublesome problem for Chinese users. Based on PC ARC/INFO (version 3.4D plus), a study was carried out on methods of coding and inputting Chinese characters. Several scenarios were proposed and implemented. However, because of the software's own characteristics, manipulating Chinese characters in PC ARC/INFO is still far from convenient.


1. General Methods of Chinese Character Manipulation

Operating systems and related software packages are built around the small set of characters from which words and sentences in Western languages are composed. Chinese, however, is a very different language: its characters bear almost no relationship to one another in shape. This makes Chinese very inconvenient to use on computers.

Nowadays, however, many software packages have a Chinese version in which the main information is displayed in Chinese. There are basically two ways of adding Chinese to a software package: one is to modify the source code of the programs under a Chinese operating system; the other is to "hang" the Chinese characters on the display background of the software.

The appearance of Chinese versions is of course to the benefit of Chinese users. Their common characteristic is that the characters shown in the various interfaces are built with the dot-matrix method, and at least one Chinese character set of several hundred kilobytes must be available. These character sets serve display purposes only.
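To give a sense of the storage a dot-matrix character set needs, the sketch below estimates its size. It assumes one bit per dot, bitmap rows padded to whole bytes, and the 6763 Chinese characters of the GB2312 standard set; real font files may add headers or symbols, so treat the numbers as rough.

```python
# Rough size of a dot-matrix Chinese character set, assuming one bit per dot
# and each bitmap row padded to whole bytes.
GB2312_HANZI = 6763  # Chinese characters in the GB2312 standard set


def glyph_bytes(size: int) -> int:
    """Bytes for one size x size dot-matrix glyph."""
    bytes_per_row = (size + 7) // 8
    return bytes_per_row * size


def charset_bytes(size: int, count: int = GB2312_HANZI) -> int:
    """Bytes for a whole character set of `count` glyphs."""
    return glyph_bytes(size) * count


for size in (16, 24):
    total = charset_bytes(size)
    print(f"{size}x{size}: {glyph_bytes(size)} bytes/glyph, {total} bytes (~{total // 1024} KB)")
```

Under these assumptions a 16x16 set takes 216,416 bytes (about 211 KB) and a 24x24 set 486,936 bytes (about 475 KB).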

The other kind of manipulation concerns "vector Chinese", the registration of Chinese characters as coordinate pairs. Many applications require hardcopy output such as thematic maps, graphs and tables. In a GIS application, plotting maps on plotters is of great interest, and the basic element of plotter output is coordinates. Therefore any Chinese characters included in the output must be written as coordinates; these are called vector Chinese characters.
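A minimal sketch of what "registering a character in coordinate pairs" can look like: each stroke is a polyline in a local design grid, and placing the character on a map is a scale plus a shift. The 0..100 design grid and the strokes for "十" are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]  # one stroke = one polyline of coordinate pairs

# Hypothetical vector form of the character "十" on a 0..100 design grid.
SHI: List[Stroke] = [
    [(10.0, 55.0), (90.0, 55.0)],  # horizontal stroke
    [(50.0, 95.0), (50.0, 5.0)],   # vertical stroke
]


def place(strokes: List[Stroke], x0: float, y0: float, height: float,
          design_size: float = 100.0) -> List[Stroke]:
    """Scale a character from design units to map units and shift it to (x0, y0)."""
    k = height / design_size
    return [[(x0 + x * k, y0 + y * k) for x, y in stroke] for stroke in strokes]


for stroke in place(SHI, x0=1000.0, y0=2000.0, height=5.0):
    print(stroke)
```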

In fact, since many users of Western software packages have good foreign-language skills, they may not be interested in the "modified" (Chinese-version) software packages. But they will badly need vector Chinese if they have maps to plot.

Western software packages obviously do not provide programs for creating vector Chinese characters, and programs developed by Chinese users are difficult to integrate into a package's own functions. However, a well-designed software package does have features that Chinese users can exploit to solve the "vector Chinese problem". There are basically three methods.

The first method is file transformation. Many software packages can convert information from files in a standard format into their own internal format. The Chinese characters can first be converted to coordinates and stored in a file of a standard format; the software's conversion function then reads this file, and the characters become part of the software's internal data, ready to be plotted.
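As an illustration of the file-transformation route, the sketch below writes character strokes to a minimal DXF file containing only an ENTITIES section of LINE entities. Whether a particular DXF reader accepts such an ENTITIES-only file is an assumption to check; the layer name "CHINESE" and the function name are hypothetical.

```python
from typing import List, Tuple

Stroke = List[Tuple[float, float]]


def strokes_to_dxf(strokes: List[Stroke], layer: str = "CHINESE") -> str:
    """Write strokes as a minimal DXF file: an ENTITIES section of LINE entities.

    Each DXF record is a group code line followed by a value line:
    0 = entity type, 8 = layer, 10/20/30 = start X/Y/Z, 11/21/31 = end X/Y/Z.
    """
    out = ["0", "SECTION", "2", "ENTITIES"]
    for stroke in strokes:
        for (x1, y1), (x2, y2) in zip(stroke, stroke[1:]):  # consecutive vertex pairs
            out += ["0", "LINE", "8", layer,
                    "10", f"{x1:.4f}", "20", f"{y1:.4f}", "30", "0.0",
                    "11", f"{x2:.4f}", "21", f"{y2:.4f}", "31", "0.0"]
    out += ["0", "ENDSEC", "0", "EOF"]
    return "\n".join(out) + "\n"


# Hypothetical character "十": two one-segment strokes in design units.
dxf = strokes_to_dxf([[(10, 55), (90, 55)], [(50, 95), (50, 5)]])
```

Each multi-vertex stroke becomes a chain of LINE entities; a POLYLINE entity would keep the stroke as one object but needs more group codes.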

The second method is to use the macro language or user development language provided by some software packages. Such a language can read data from text files and pass them to system commands. For example, AutoCAD provides two development languages, AutoLISP and C, which can read files of Chinese character coordinates and plot the characters on a map.
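The macro-language route boils down to "read coordinates from a text file, emit drawing commands". The sketch below does this in Python rather than AutoLISP or C. The input layout (one stroke per line, `x1 y1 x2 y2 ...`) is hypothetical, and the `LINE x,y x,y` command strings only illustrate the idea; they are not a verified AutoCAD script syntax.

```python
import io


def read_strokes(text_file):
    """Parse one stroke per line: x1 y1 x2 y2 ... (hypothetical file layout)."""
    strokes = []
    for line in text_file:
        nums = [float(v) for v in line.split()]
        if len(nums) >= 4 and len(nums) % 2 == 0:  # at least two points, complete pairs
            strokes.append(list(zip(nums[0::2], nums[1::2])))
    return strokes


def to_commands(strokes):
    """Turn each stroke into a LINE-style command string (syntax is illustrative)."""
    cmds = []
    for stroke in strokes:
        pts = " ".join(f"{x:g},{y:g}" for x, y in stroke)
        cmds.append(f"LINE {pts} ")
    return cmds


sample = io.StringIO("10 55 90 55\n50 95 50 5\n")
for cmd in to_commands(read_strokes(sample)):
    print(cmd)
```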

The last method is to take advantage of special symbol files, in which each Chinese character is one symbol. For instance, an AutoCAD shape is such a symbol, and vector Chinese characters have been loaded as shapes successfully. The font file IGLFNT in PC ARC/INFO is also such a symbol file.

Of the three methods, the first two are less flexible than the third because they treat all input as plain coordinates: once vector Chinese characters have been put into a map, they are quite difficult to change. The third method, on the other hand, treats each character as one object, so the character can be moved, zoomed and rotated.
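The difference between the methods can be made concrete: with a symbol file, a placed character is a small record that refers to a stored symbol, and its coordinates are produced only at plot time. The sketch below is a hypothetical model of that idea (the `SymbolRef` record, font 17 and pattern 3 are illustrative, not PC ARC/INFO structures); moving or rotating a label changes one field instead of every coordinate.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SymbolRef:
    """One placed character: a reference to a stored symbol, not its coordinates."""
    font: int
    pattern: int
    x: float
    y: float
    size: float = 1.0
    angle_deg: float = 0.0


def expand(ref, strokes, design_size=100.0):
    """Expand a symbol reference to map coordinates (scale, rotate, then shift)."""
    k = ref.size / design_size
    a = math.radians(ref.angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[(ref.x + k * (c * px - s * py), ref.y + k * (s * px + c * py))
             for px, py in stroke] for stroke in strokes]


label = SymbolRef(font=17, pattern=3, x=500.0, y=800.0, size=4.0)
moved = replace(label, x=520.0)            # move: change one field
rotated = replace(moved, angle_deg=90.0)   # rotate: change one field
```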

2. The Scenarios of Using Vector Chinese in PC ARC/INFO

PC ARC/INFO has many users in P.R. China, and the output of maps to paper inevitably includes plotting Chinese characters. Although this GIS software is undoubtedly powerful, handling vector Chinese characters in it is not easy: much preparation is needed, and CAD software packages must also be used. This study identified two scenarios for adding Chinese characters to maps handled by PC ARC/INFO.

2.1 Transform Through DXF

PC ARC/INFO can read data from DXF files into its coverages. The vector Chinese characters are formed in AutoCAD, written to a DXF file, and then read into coverages.
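To show what "reading a DXF file into coverages" involves at the lowest level, here is a minimal sketch of recovering line segments from DXF text. It is not PC ARC/INFO's converter; it only handles LINE entities and ignores Z values and all other entity types.

```python
def read_dxf_lines(dxf_text):
    """Collect LINE entities from DXF text as ((x1, y1), (x2, y2)) tuples.

    DXF is a sequence of (group code, value) line pairs; group 0 starts a new
    entity, 10/20 give the start point and 11/21 the end point.
    """
    lines = dxf_text.splitlines()
    pairs = [(lines[i].strip(), lines[i + 1].strip()) for i in range(0, len(lines) - 1, 2)]
    segments, current = [], None
    for code, value in pairs:
        if code == "0":
            # A new entity begins: flush the previous LINE if it was complete.
            if current is not None and len(current) == 4:
                segments.append(((current["10"], current["20"]),
                                 (current["11"], current["21"])))
            current = {} if value == "LINE" else None
        elif current is not None and code in ("10", "20", "11", "21"):
            current[code] = float(value)
    return segments


SAMPLE = ("0\nSECTION\n2\nENTITIES\n"
          "0\nLINE\n8\n0\n10\n1.0\n20\n2.0\n11\n3.0\n21\n4.0\n"
          "0\nENDSEC\n0\nEOF\n")
print(read_dxf_lines(SAMPLE))
```

Each recovered segment would then become (part of) an arc in the target coverage.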

The general procedures are:

Through the above procedures, a new coverage with the old map and the added Chinese characters is generated and ready for output to plotters. This scenario has the following characteristics:

Inconvenient as this scenario is, it is a realistic and proven method.

2.2 Utilize the IGLFNT File

IGLFNT in PC ARC/INFO is the file that stores all text and marker symbols. It contains 17 FONTs (0 to 16), each containing 128 PATTERNs, most of which are defined. To satisfy users' needs, the software allows users to add 8 more FONTs to IGLFNT, numbered 17 to 24. Users can define and modify patterns in these FONTs with the FONTEDIT program.

Symbols are defined on a grid: each turn point of a symbol is attached to a grid point. Symbol coordinates must lie within the range (-49, -49) to (200, 200), which is sufficient for vector Chinese characters. Before definition, the pattern type must be declared as either TEXT or MARKER, because the two types differ slightly in coordinate origin and access method. Chinese symbols should be of the MARKER type.
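The grid rule can be checked before a character is handed to FONTEDIT. This sketch snaps every turn point to the nearest integer grid point and rejects points outside the stated (-49, -49) to (200, 200) range. The function name is hypothetical, and integer grid coordinates are an assumption about how the grid is addressed.

```python
GRID_MIN, GRID_MAX = -49, 200  # symbol coordinate range stated for IGLFNT patterns


def to_grid(strokes):
    """Snap every turn point to the nearest grid point and check the allowed range.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    snapped = []
    for stroke in strokes:
        pts = [(round(x), round(y)) for x, y in stroke]
        for x, y in pts:
            if not (GRID_MIN <= x <= GRID_MAX and GRID_MIN <= y <= GRID_MAX):
                raise ValueError(f"point ({x}, {y}) outside {GRID_MIN}..{GRID_MAX}")
        snapped.append(pts)
    return snapped


print(to_grid([[(10.4, 55.6), (89.7, 55.2)]]))
```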

In FONTEDIT, Chinese characters cannot be defined manually. The only possible way is to convert symbols from coverages containing the arcs of vector Chinese characters. The rule is that each coverage represents one character, and the coordinate system is predefined.
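Because every character coverage shares one predefined coordinate system, converting a coverage to a pattern can be a single linear mapping from that common character box to grid units. The sketch assumes the box is known as (xmin, ymin, xmax, ymax) and maps it onto a hypothetical 0..150 grid square, which lies inside the allowed range. Using the shared box, rather than each character's own extent, keeps baselines and relative sizes consistent across characters.

```python
def coverage_to_pattern(arcs, frame, grid_size=150):
    """Map arcs from a character coverage to pattern grid units.

    frame: (xmin, ymin, xmax, ymax) of the predefined character box, the same
    for every character coverage. grid_size is a hypothetical choice.
    """
    xmin, ymin, xmax, ymax = frame
    sx = grid_size / (xmax - xmin)
    sy = grid_size / (ymax - ymin)
    return [[(round((x - xmin) * sx), round((y - ymin) * sy)) for x, y in arc]
            for arc in arcs]


print(coverage_to_pattern([[(100.0, 200.0), (105.0, 205.0)]], (100.0, 200.0, 110.0, 210.0)))
```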

Through a MARKERSET file (e.g. plotter.MRK), ordinary users can access the symbols in IGLFNT. The MARKERSET file is a TABLE whose items include symbol number, font number, pattern number, color and size, and its contents can be modified.
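The MARKERSET table can be pictured as a lookup from a marker symbol number to a (font, pattern) address plus drawing attributes. The record below mirrors the items listed above, but its field names, the symbol numbers 101 and 102, and the 0..127 pattern numbering are hypothetical; it is a model of the idea, not the actual file format.

```python
from dataclasses import dataclass


@dataclass
class MarkerEntry:
    symbol: int   # marker symbol number used when drawing
    font: int     # IGLFNT font number (user fonts are 17..24)
    pattern: int  # pattern number within the font (assumed 0..127)
    color: int
    size: float


# Hypothetical rows: two Chinese characters stored in user font 17.
markerset = {e.symbol: e for e in [
    MarkerEntry(symbol=101, font=17, pattern=0, color=1, size=0.2),
    MarkerEntry(symbol=102, font=17, pattern=1, color=1, size=0.2),
]}


def lookup(symbol):
    """Return the (font, pattern) address of a marker symbol."""
    entry = markerset[symbol]
    return entry.font, entry.pattern


markerset[101].size = 0.3  # the table contents can be modified
```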

Acquiring Chinese characters through IGLFNT thus seems practical and flexible. But one problem remains unsettled: the capacity of IGLFNT. Including the user-defined fonts, IGLFNT has 24 fonts in total. To be effective, each font should store only one division (Qu) of the standard Chinese character set, which has 72 divisions altogether, so a complete set would need 72 fonts. Clearly, IGLFNT cannot hold all the Chinese symbols, and it is impossible to build a complete vector Chinese symbol set in PC ARC/INFO.
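The capacity argument follows from the layout of the GB2312 standard set: a character's two-byte code gives its division (qu) and position (wei), the Chinese characters occupy divisions 16 to 87 (72 divisions of 94 positions each), and a 94-position division fits in one 128-pattern font. The sketch uses Python's built-in `gb2312` codec; the constant names are illustrative.

```python
def qu_wei(ch):
    """Return the GB2312 division (qu) and position (wei) of a character."""
    b = ch.encode("gb2312")
    if len(b) != 2:
        raise ValueError(f"{ch!r} is not a double-byte GB2312 character")
    return b[0] - 0xA0, b[1] - 0xA0


HANZI_DIVISIONS = range(16, 88)   # divisions 16..87 hold the Chinese characters
POSITIONS_PER_DIVISION = 94
PATTERNS_PER_FONT = 128           # one division fits in one font
USER_FONTS = range(17, 25)        # fonts users may add to IGLFNT

print(qu_wei("中"))               # (54, 48)
print(len(HANZI_DIVISIONS), "divisions, one font each;",
      len(USER_FONTS), "user-definable fonts")
```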

Because of this capacity limit, a CASE method was developed for storing Chinese characters in IGLFNT, i.e. only the characters actually needed are stored. The procedures may include:

Because users have to make many preparations, this method is not very efficient. There is, however, a definite difference between Chinese characters added with this method and those added in the first scenario: here the characters are treated as symbols, which can be accessed and edited easily. SML (the Simple Macro Language of PC ARC/INFO) may provide an ideal means for batch processing and menu interfaces.
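The core bookkeeping of storing only the needed characters can be sketched as collecting the distinct Chinese characters used in a map's labels and assigning each one a free (font, pattern) slot in the user fonts 17 to 24. The function name, the 0-based pattern numbering, and skipping ASCII (assumed to come from the standard text fonts) are assumptions of this sketch.

```python
def allocate(labels, fonts=range(17, 25), patterns_per_font=128):
    """Assign each distinct Chinese character in the labels a (font, pattern) slot."""
    # dict.fromkeys keeps first-seen order and drops duplicates.
    needed = list(dict.fromkeys(
        ch for text in labels for ch in text
        if not ch.isascii() and not ch.isspace()))
    capacity = len(fonts) * patterns_per_font
    if len(needed) > capacity:
        raise ValueError(f"{len(needed)} characters exceed {capacity} slots")
    return {ch: (fonts[i // patterns_per_font], i % patterns_per_font)
            for i, ch in enumerate(needed)}


print(allocate(["武汉", "汉口"]))
```

With 8 user fonts of 128 patterns, a single map can use at most 1024 distinct characters under this scheme.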

An experiment was carried out to make Chinese characters into symbols in a different way. With some effort, all the standard Chinese characters (more than 6000 in total) were written into three font files with user-defined names other than IGLFNT. With this approach, all a user has to do is copy the needed characters from the three font files into IGLFNT and use them.
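To copy a character, a user first has to know where it lives in the three master files. The text does not give their internal layout, so the sketch assumes a hypothetical one: the 72 Chinese divisions 16 to 87 split evenly, 24 per file, one division per font, with the pattern number equal to the position (wei) within the division.

```python
def master_location(ch):
    """Locate a character in one of three hypothetical master font files.

    Assumed layout: divisions 16..87 split 24 per file, one division per
    font (fonts 1..24), pattern number = position (wei) in the division.
    """
    b = ch.encode("gb2312")
    if len(b) != 2:
        raise ValueError(f"{ch!r} is not a double-byte GB2312 character")
    qu, wei = b[0] - 0xA0, b[1] - 0xA0
    if not 16 <= qu <= 87:
        raise ValueError(f"{ch!r} is not in a Chinese-character division")
    index = qu - 16                 # 0..71
    file_no = index // 24 + 1       # master file 1, 2 or 3
    font = index % 24 + 1           # font 1..24 within that file
    return file_no, font, wei


print(master_location("中"))
```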

3. Conclusion

The two scenarios presented can meet the need for Chinese character annotation in PC ARC/INFO. There are, however, clear inadequacies, which include:

Building a Chinese symbol set is an efficient means of supporting Chinese characters in Western software packages. IGLFNT in PC ARC/INFO can store vector Chinese symbols, but its limited capacity restricts this application.



Huang Zhengdong
ECURSPAM
Wuhan Technical University of Surveying and Mapping
39 Luoyu Road, Wuhan 430070
P.R. China