annotate man/lispref/mule.texi @ 981:0205cafe98ff

[xemacs-hg @ 2002-08-30 08:25:48 by youngs] Don't look now, but 21.5.9 is on its way out the door! Don't forget what good 'ol Ma used to say... "Eat your brussels sprouts, little Johnny, so you can grow up big and strong."
author youngs
date Fri, 30 Aug 2002 08:26:22 +0000
parents 37e56e920ac5
children c1553814932e
rev   line source
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1 @c -*-texinfo-*-
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2 @c This is part of the XEmacs Lisp Reference Manual.
775
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
3 @c Copyright (C) 1996 Ben Wing, 2001-2002 Free Software Foundation.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
4 @c See the file lispref.texi for copying conditions.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
5 @setfilename ../../info/internationalization.info
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
6 @node MULE, Tips, Internationalization, top
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
7 @chapter MULE
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
8
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
9 @dfn{MULE} is the name originally given to the version of GNU Emacs
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
10 extended for multi-lingual (and in particular Asian-language) support.
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
11 ``MULE'' is short for ``MUlti-Lingual Emacs''. It is an extension and
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
12 complete rewrite of Nemacs (``Nihon Emacs'' where ``Nihon'' is the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
13 Japanese word for ``Japan''), which only provided support for Japanese.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
14 XEmacs refers to its multi-lingual support as @dfn{MULE support} since
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
15 it is based on @dfn{MULE}.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
16
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
17 @menu
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
18 * Internationalization Terminology::
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
19 Definition of various internationalization terms.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
20 * Charsets:: Sets of related characters.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
21 * MULE Characters:: Working with characters in XEmacs/MULE.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
22 * Composite Characters:: Making new characters by overstriking other ones.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
23 * Coding Systems:: Ways of representing a string of chars using integers.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
24 * CCL:: A special language for writing fast converters.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
25 * Category Tables:: Subdividing charsets into groups.
775
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
26 * Unicode Support:: The universal coded character set.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
27 @end menu
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
28
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
29 @node Internationalization Terminology, Charsets, , MULE
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
30 @section Internationalization Terminology
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
31
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
32 In internationalization terminology, a string of text is divided up
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
33 into @dfn{characters}, which are the printable units that make up the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
34 text. A single character is (for example) a capital @samp{A}, the
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
35 number @samp{2}, a Katakana character, a Hangul character, a Kanji
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
36 ideograph (an @dfn{ideograph} is a ``picture'' character, such as is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
37 used in Japanese Kanji, Chinese Hanzi, and Korean Hanja; typically there
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
38 are thousands of such ideographs in each language), etc. The basic
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
39 property of a character is that it is the smallest unit of text with
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
40 semantic significance in text processing.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
41
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
42 Human beings normally process text visually, so to a first approximation
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
43 a character may be identified with its shape. Note that the same
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
44 character may be drawn by two different people (or in two different
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
45 fonts) in slightly different ways, although the ``basic shape'' will be the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
46 same. But consider the works of Scott Kim; human beings can recognize
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
47 hugely variant shapes as the ``same'' character. Sometimes, especially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
48 where characters are extremely complicated to write, completely
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
49 different shapes may be defined as the ``same'' character in national
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
50 standards. The Taiwanese variant of Hanzi is generally the most
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
51 complicated; over the centuries, the Japanese, Koreans, and the People's
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
52 Republic of China have adopted simplifications of the shape, but the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
53 line of descent from the original shape is recorded, and the meanings
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
54 and pronunciation of different forms of the same character are
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
55 considered to be identical within each language. (Of course, it may
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
56 take a specialist to recognize the related form; the point is that the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
57 relations are standardized, despite the differing shapes.)
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
58
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
59 In some cases, the differences will be significant enough that it is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
60 actually possible to identify two or more distinct shapes that both
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
61 represent the same character. For example, the lowercase letters
440
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
62 @samp{a} and @samp{g} each have two distinct possible shapes---the
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
63 @samp{a} can optionally have a curved tail projecting off the top, and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
64 the @samp{g} can be formed either of two loops, or of one loop and a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
65 tail hanging off the bottom. Such distinct possible shapes of a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
66 character are called @dfn{glyphs}. The important characteristic of two
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
67 glyphs making up the same character is that the choice between one or
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
68 the other is purely stylistic and has no linguistic effect on a word
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
69 (this is the reason why a capital @samp{A} and lowercase @samp{a}
440
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
70 are different characters rather than different glyphs---e.g.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
71 @samp{Aspen} is a city while @samp{aspen} is a kind of tree).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
72
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
73 Note that @dfn{character} and @dfn{glyph} are used differently
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
74 here than elsewhere in XEmacs.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
75
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
76 A @dfn{character set} is essentially a set of related characters. ASCII,
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
77 for example, is a set of 94 characters (or 128, if you count
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
78 non-printing characters). Other character sets are ISO8859-1 (ASCII
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
79 plus various accented characters and other international symbols),
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
80 JIS X 0201 (ASCII, more or less, plus half-width Katakana), JIS X 0208
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
81 (Japanese Kanji), JIS X 0212 (a second set of less-used Japanese Kanji),
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
82 GB2312 (Mainland Chinese Hanzi), etc.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
83
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
84 The definition of a character set will implicitly or explicitly give
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
85 it an @dfn{ordering}, a way of assigning a number to each character in
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
86 the set. For many character sets, there is a natural ordering, for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
87 example the ``ABC'' ordering of the Roman letters. But it is not clear
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
88 whether digits should come before or after the letters, and in fact
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
89 different European languages treat the ordering of accented characters
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
90 differently. It is useful to use the natural order where available, of
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
91 course. The number assigned to any particular character is called the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
92 character's @dfn{code point}. (Within a given character set, each
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
93 character has a unique code point. Thus the word ``set'' is ill-chosen;
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
94 different orderings of the same characters are different character sets.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
95 Identifying characters is simple enough for alphabetic character sets,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
96 but the difference in ordering can cause great headaches when the same
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
97 thousands of characters are used by different cultures as in the Hanzi.)
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
98
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
99 A code point may be broken into a number of @dfn{position codes}. The
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
100 number of position codes required to index a particular character in a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
101 character set is called the @dfn{dimension} of the character set. For
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
102 practical purposes, a position code may be thought of as a byte-sized
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
103 index. The printing subset of ASCII, being a relatively small
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
104 character set, is of dimension one, and each character in the set is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
105 indexed using a single position code, in the range 1 through 94. Use of
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
106 this unusual range, rather than the familiar 33 through 126, is an
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
107 intentional abstraction; to understand the programming issues you must
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
108 break the equation between character sets and encodings.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
109
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
110 JIS X 0208, i.e. Japanese Kanji, has thousands of characters, and is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
111 of dimension two -- every character is indexed by two position codes,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
112 each in the range 1 through 94. (This number ``94'' is not a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
113 coincidence; we shall see that the JIS position codes were chosen so
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
114 that JIS kanji could be encoded without using codes that in ASCII are
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
115 associated with device control functions.) Note that the choice of the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
116 range here is somewhat arbitrary. You could just as easily index the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
117 printing characters in ASCII using numbers in the range 0 through 93, 2
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
118 through 95, 3 through 96, etc. In fact, the standardized
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
119 @emph{encoding} for the ASCII @emph{character set} uses the range 33
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
120 through 126.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
121
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
122 An @dfn{encoding} is a way of numerically representing characters from
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
123 one or more character sets into a stream of like-sized numerical values
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
124 called @dfn{words}; typically these are 8-bit, 16-bit, or 32-bit
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
125 quantities. If an encoding encompasses only one character set, then the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
126 position codes for the characters in that character set could be used
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
127 directly. (This is the case with the trivial cipher used by children,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
128 assigning 1 to `A', 2 to `B', and so on.) However, even with ASCII,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
129 other considerations intrude. For example, why are the upper- and
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
130 lowercase alphabets separated by 6 characters? Why do the digits start
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
131 with `0' being assigned the code 48? In both cases because semantically
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
132 interesting operations (case conversion and numerical value extraction)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
133 become convenient masking operations. Other artificial aspects (the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
134 control characters being assigned to codes 0--31 and 127) are historical
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
135 accidents. (The use of 127 for @samp{DEL} is an artifact of the ``punch
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
136 once'' nature of paper tape, for example.)
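
The masking operations mentioned above can be sketched as follows. This is only an illustration in Python, not XEmacs code; the function names are hypothetical. It relies solely on the standard ASCII assignments: upper- and lowercase letters differ in the single bit 0x20, and the digits occupy codes 0x30 through 0x39, so the low four bits of a digit's code are its numerical value.

```python
CASE_BIT = 0x20  # the only bit that differs between 'A' (0x41) and 'a' (0x61)


def to_upper(code: int) -> int:
    """Clear the case bit of an ASCII lowercase letter; leave other codes alone."""
    if ord("a") <= code <= ord("z"):
        return code & ~CASE_BIT
    return code


def to_lower(code: int) -> int:
    """Set the case bit of an ASCII uppercase letter; leave other codes alone."""
    if ord("A") <= code <= ord("Z"):
        return code | CASE_BIT
    return code


def digit_value(code: int) -> int:
    """Extract the numerical value of an ASCII digit by masking the low nibble."""
    if not ord("0") <= code <= ord("9"):
        raise ValueError("not an ASCII digit: %r" % code)
    return code & 0x0F
```

The range checks are there because the mask alone would also alter punctuation that happens to have the case bit set.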
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
137
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
138 Naive use of the position code is not possible, however, if more than
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
139 one character set is to be used in the encoding. For example, printed
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
140 Japanese text typically requires characters from multiple character sets
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
141 -- ASCII, JIS X 0208, and JIS X 0212, to be specific. Each of these is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
142 indexed using one or more position codes in the range 1 through 94, so
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
143 the position codes could not be used directly or there would be no way
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
144 to tell which character was meant. Different Japanese encodings handle
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
145 this differently -- JIS uses special escape characters to denote
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
146 different character sets; EUC sets the high bit of the position codes
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
147 for JIS X 0208 and JIS X 0212, and puts a special extra byte before each
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
148 JIS X 0212 character; etc. (JIS, EUC, and most of the other encodings
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
149 you will encounter in files are 7-bit or 8-bit encodings. There is one
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
150 common 16-bit encoding, which is Unicode; this strives to represent all
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
151 the world's characters in a single large character set. 32-bit
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
152 encodings are often used internally in programs, such as XEmacs with
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
153 MULE support, to simplify the code that manipulates them; however, they
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
154 are not used externally because they are not very space-efficient.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
155
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
156 A general method of handling text using multiple character sets
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
157 (whether for multilingual text, or simply text in an extremely
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
158 complicated single language like Japanese) is defined in the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
159 international standard ISO 2022. ISO 2022 will be discussed in more
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
160 detail later (@pxref{ISO 2022}), but for now suffice it to say that text
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
161 needs control functions (at least spacing), and if escape sequences are
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
162 to be used, an escape sequence introducer. It was decided to make all
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
163 text streams compatible with ASCII in the sense that the codes 0--31
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
164 (and 128-159) would always be control codes, never graphic characters,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
165 and where defined by the character set the @samp{SPC} character would be
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
166 assigned code 32, and @samp{DEL} would be assigned 127. Thus there are
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
167 94 code points remaining if 7 bits are used. This is the reason that
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
168 most character sets are defined using position codes in the range 1
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
169 through 94. Then ISO 2022 compatible encodings are produced by shifting
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
170 the position codes 1 to 94 into character codes 33 to 126, or (if 8 bit
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
171 codes are available) into character codes 161 to 254.
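
The shift from position codes to character codes described above might be sketched like this. The Python below is an illustration only; the names GL_OFFSET, GR_OFFSET and shift_position_codes are hypothetical, and the position codes (4, 2) used for the JIS X 0208 hiragana letter A are an assumption taken from that standard's row and cell numbering.

```python
GL_OFFSET = 0x20  # position codes 1..94 become character codes 33..126
GR_OFFSET = 0xA0  # position codes 1..94 become character codes 161..254


def shift_position_codes(position_codes, offset):
    """Turn a sequence of position codes (each 1..94) into encoded bytes."""
    for p in position_codes:
        if not 1 <= p <= 94:
            raise ValueError("position code out of range: %r" % (p,))
    return bytes(p + offset for p in position_codes)


# 'A' is the 33rd printing character of ASCII, so its position code is 33.
ascii_a = shift_position_codes([33], GL_OFFSET)

# A dimension-two character needs two position codes.
kana_7bit = shift_position_codes([4, 2], GL_OFFSET)
kana_8bit = shift_position_codes([4, 2], GR_OFFSET)
```

Note that the same pair of position codes yields different byte values depending on which half of the 8-bit code space is used, which is the sense in which a character set is distinct from its encoding.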
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
172
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
173 Encodings are classified as either @dfn{modal} or @dfn{non-modal}. In
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
174 a @dfn{modal encoding}, there are multiple states that the encoding can
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
175 be in, and the interpretation of the values in the stream depends on the
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
176 current global state of the encoding. Special values in the encoding,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
177 called @dfn{escape sequences}, are used to change the global state.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
178 JIS, for example, is a modal encoding. The bytes @samp{ESC $ B}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
179 indicate that, from then on, bytes are to be interpreted as position
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
180 codes for JIS X 0208, rather than as ASCII. This effect is cancelled
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
181 using the bytes @samp{ESC ( B}, which mean ``switch from whatever the
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
182 current state is to ASCII''. To switch to JIS X 0212, the escape
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
183 sequence @samp{ESC $ ( D} is used. (Note that here, as is common, the escape
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
184 sequences do in fact begin with @samp{ESC}. This is not necessarily the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
185 case, however. Some encodings use control characters called ``locking
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
186 shifts'' (effect persists until cancelled) to switch character sets.)
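
To make the notion of global state concrete, here is a minimal sketch of a reader for such a modal stream. It is written in Python purely for illustration and is not how XEmacs implements its decoders; the function name tokenize_jis and the charset labels are hypothetical. Only the three escape sequences named above are recognized, and the stream is assumed to start in the ASCII state.

```python
ESC = 0x1B

# Escape sequences from the text, mapped to (charset label, dimension).
DESIGNATIONS = {
    b"\x1b$B": ("jisx0208", 2),
    b"\x1b(B": ("ascii", 1),
    b"\x1b$(D": ("jisx0212", 2),
}


def tokenize_jis(data: bytes):
    """Split a modal byte stream into (charset, raw bytes) pairs."""
    charset, dimension = "ascii", 1  # assumed initial state
    tokens = []
    i = 0
    while i < len(data):
        if data[i] == ESC:
            for sequence, state in DESIGNATIONS.items():
                if data.startswith(sequence, i):
                    charset, dimension = state  # change the global state
                    i += len(sequence)
                    break
            else:
                raise ValueError("unsupported escape sequence at offset %d" % i)
            continue
        chunk = data[i:i + dimension]
        if len(chunk) < dimension:
            raise ValueError("truncated character at offset %d" % i)
        tokens.append((charset, chunk))
        i += dimension
    return tokens


stream = b"A" + b"\x1b$B" + b"AA" + b"\x1b(B" + b"A"
tokens = tokenize_jis(stream)
```

In the sample stream the byte 0x41 occurs four times, but only the first and last occurrences are the letter A; the middle two form a single two-byte character because of the state set by the preceding escape sequence.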
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
187
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
188 A @dfn{non-modal encoding} has no global state that extends past the
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
189 character currently being interpreted. EUC, for example, is a
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
190 non-modal encoding. Characters in JIS X 0208 are encoded by setting
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
191 the high bit of the position codes, and characters in JIS X 0212 are
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
192 encoded by doing the same but also prefixing the character with the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
193 byte 0x8F.
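
The EUC rule just described can be sketched as a small function. This is a hedged illustration in Python, not an XEmacs interface; the name euc_encode and the charset labels are hypothetical. It assumes that "setting the high bit" applies to the 7-bit form of each position code (that is, after the position code 1 through 94 has been shifted to 33 through 126).

```python
JISX0212_PREFIX = 0x8F  # the prefix byte given in the text


def euc_encode(charset: str, row: int, cell: int) -> bytes:
    """Encode one dimension-two character from its two position codes."""
    for p in (row, cell):
        if not 1 <= p <= 94:
            raise ValueError("position code out of range: %r" % (p,))
    # Shift each position code to its 7-bit form, then set the high bit.
    body = bytes([(row + 0x20) | 0x80, (cell + 0x20) | 0x80])
    if charset == "jisx0208":
        return body
    if charset == "jisx0212":
        return bytes([JISX0212_PREFIX]) + body
    raise ValueError("unsupported charset: %r" % (charset,))
```

Because every byte produced this way has its high bit set, none of them can be mistaken for an ASCII character, which is why no state needs to be carried from one character to the next.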
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
194
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
195 The advantage of a modal encoding is that it is generally more
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
196 space-efficient, and is easily extendible because there are essentially
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
197 an arbitrary number of escape sequences that can be created. The
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
198 disadvantage, however, is that it is much more difficult to work with
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
199 if it is not being processed in a sequential manner. In the non-modal
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
200 EUC encoding, for example, the byte 0x41 always refers to the letter
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
201 @samp{A}; whereas in JIS, it could either be the letter @samp{A}, or
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
202 one of the two position codes in a JIS X 0208 character, or one of the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
203 two position codes in a JIS X 0212 character. Determining exactly which
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
204 one is meant could be difficult and time-consuming if the previous
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
205 bytes in the string have not already been processed, or impossible if
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
206 they are drawn from an external stream that cannot be rewound.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
207
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
208 Non-modal encodings are further divided into @dfn{fixed-width} and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
209 @dfn{variable-width} formats. A fixed-width encoding always uses
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
210 the same number of words per character, whereas a variable-width
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
211 encoding does not. EUC is a good example of a variable-width
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
212 encoding: one to three bytes are used per character, depending on
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
213 the character set. 16-bit and 32-bit encodings are nearly always
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
214 fixed-width, and this is in fact one of the main reasons for using
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
215 an encoding with a larger word size. The advantages of fixed-width
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
216 encodings should be obvious. The advantages of variable-width
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
217 encodings are that they are generally more space-efficient and allow
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
218 for compatibility with existing 8-bit encodings such as ASCII. (For
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
219 example, in Unicode, ASCII characters are simply promoted to a 16-bit
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
220 representation. That means that every ASCII character contains a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
221 @samp{NUL} byte; evidently all of the standard string manipulation
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
222 functions will lose badly in a fixed-width Unicode environment.)
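
The difference can be observed with the codecs in Python's standard library. The sketch below is an illustration only; the helper name bytes_per_character is hypothetical, and the codec utf-16-be stands in for the 16-bit Unicode encoding discussed above (it is fixed-width only for characters that fit in a single 16-bit word, which holds for the sample).

```python
def bytes_per_character(text: str, encoding: str) -> list:
    """Return the encoded length in bytes of each character of TEXT."""
    return [len(ch.encode(encoding)) for ch in text]


sample = "A\u3042"  # ASCII 'A' followed by the hiragana letter A

euc_widths = bytes_per_character(sample, "euc_jp")        # variable-width
unicode_widths = bytes_per_character(sample, "utf-16-be")  # fixed-width here

# The 16-bit form of an ASCII character contains a NUL byte.
wide_a = "A".encode("utf-16-be")
has_nul = 0 in wide_a
```

The embedded NUL byte is what defeats string functions that treat NUL as a terminator.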
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
223
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
224 The bytes in an 8-bit encoding are often referred to as @dfn{octets}
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
225 rather than simply as bytes. This terminology dates back to the days
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
226 before 8-bit bytes were universal, when some computers had 9-bit bytes,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
227 others had 10-bit bytes, etc.

@node Charsets, MULE Characters, Internationalization Terminology, MULE
@section Charsets

A @dfn{charset} in MULE is an object that encapsulates a
particular character set as well as an ordering of those characters.
Charsets are permanent objects and are named using symbols, like
faces.

@defun charsetp object
This function returns non-@code{nil} if @var{object} is a charset.
@end defun
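
A minimal sketch of how the predicate might be used, assuming a
MULE-enabled XEmacs in which the predefined @code{ascii} charset exists.
The comments describe what the definition above suggests, not verified
output:

@example
;; A charset object obtained by name: expected to be non-nil.
(charsetp (find-charset 'ascii))

;; A bare symbol is a name, not a charset object.
(charsetp 'ascii)

;; Neither is a string.
(charsetp "ascii")
@end example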

@menu
* Charset Properties::          Properties of a charset.
* Basic Charset Functions::     Functions for working with charsets.
* Charset Property Functions::  Functions for accessing charset properties.
* Predefined Charsets::         Predefined charset objects.
@end menu

@node Charset Properties, Basic Charset Functions, , Charsets
@subsection Charset Properties

Charsets have the following properties:

@table @code
@item name
A symbol naming the charset. Every charset must have a different name;
this allows a charset to be referred to using its name rather than
the actual charset object.
@item doc-string
A documentation string describing the charset.
@item registry
A regular expression matching the font registry field for this character
set. For example, both the @code{ascii} and @code{latin-iso8859-1}
charsets use the registry @code{"ISO8859-1"}. This field is used to
choose an appropriate font when the user gives a general font
specification such as @samp{-*-courier-medium-r-*-140-*}, i.e. a
14-point upright medium-weight Courier font.
@item dimension
Number of position codes used to index a character in the character set.
XEmacs/MULE can only handle character sets of dimension 1 or 2.
This property defaults to 1.
@item chars
Number of characters in each dimension. In XEmacs/MULE, the only
allowed values are 94 or 96. (There are a couple of pre-defined
character sets, such as ASCII, that do not follow this, but you cannot
define new ones like this.) Defaults to 94. Note that if the dimension
is 2, the character set thus described is 94x94 or 96x96.
@item columns
Number of columns used to display a character in this charset.
Only used in TTY mode. (Under X, the actual width of a character
can be derived from the font used to display the characters.)
If unspecified, defaults to the dimension. (This is almost
always the correct value, because character sets with dimension 2
are usually ideograph character sets, which need two columns to
display the intricate ideographs.)
@item direction
A symbol, either @code{l2r} (left-to-right) or @code{r2l}
(right-to-left). Defaults to @code{l2r}. This specifies the
direction that the text should be displayed in, and will be
left-to-right for most charsets but right-to-left for Hebrew
and Arabic. (Right-to-left display is not currently implemented.)
@item final
Final byte of the standard ISO 2022 escape sequence designating this
charset. Must be supplied. Each combination of (@var{dimension},
@var{chars}) defines a separate namespace for final bytes, and each
charset within a particular namespace must have a different final byte.
Note that ISO 2022 restricts the final byte to the range 0x30 - 0x7E if
the dimension is 1, and 0x30 - 0x5F if the dimension is 2. Note also
that final bytes in the range 0x30 - 0x3F are reserved for user-defined
(not official) character sets. For more information on ISO 2022, see
@ref{Coding Systems}.
@item graphic
0 (use left half of font on output) or 1 (use right half of font on
output). Defaults to 0. This specifies how to convert the position
codes that index a character in a character set into an index into the
font used to display the character set. With @code{graphic} set to 0,
position codes 33 through 126 map to font indices 33 through 126; with
it set to 1, position codes 33 through 126 map to font indices 161
through 254 (i.e. the same number but with the high bit set). For
example, for a font whose registry is ISO8859-1, the left half of the
font (octets 0x20 - 0x7F) is the @code{ascii} charset, while the right
half (octets 0xA0 - 0xFF) is the @code{latin-iso8859-1} charset.
@item ccl-program
A compiled CCL program used to convert a character in this charset into
an index into the font. This is in addition to the @code{graphic}
property. If a CCL program is defined, the position codes of a
character will first be processed according to @code{graphic} and
then passed through the CCL program, with the resulting values used
to index the font.

This is used, for example, in the Big5 character set (used in Taiwan).
This character set is not ISO-2022-compliant, and its size (94x157) does
not fit within the maximum 96x96 size of ISO-2022-compliant character
sets. As a result, XEmacs/MULE splits it (in a rather complex fashion,
so as to group the most commonly used characters together) into two
charset objects (@code{chinese-big5-1} and @code{chinese-big5-2}), each
of size 94x94, and each charset object uses a CCL program to convert the
modified position codes back into standard Big5 indices to retrieve a
character from a Big5 font.
@end table
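
The effect of the @code{graphic} property described above can be
sketched with plain integer arithmetic. The function name below is
made up for illustration and is not part of XEmacs; the real conversion
happens inside the display code.

@example
;; Hypothetical helper: map a position code to a font index.
;; GRAPHIC is 0 (left half of the font) or 1 (right half).
(defun my-position-code-to-font-index (code graphic)
  (if (= graphic 1)
      (logior code 128)                 ; set the high bit
    code))

(my-position-code-to-font-index 33 0)
     @result{} 33
(my-position-code-to-font-index 33 1)
     @result{} 161
(my-position-code-to-font-index 126 1)
     @result{} 254
@end example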

Most of the above properties can only be set when the charset is
initialized, and cannot be changed later.
@xref{Charset Property Functions}.
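
The restrictions on the @code{final} property listed in the table above
can be written down as a small predicate. This is only a sketch of the
ranges quoted in this section, with a hypothetical function name; it
says nothing about how XEmacs itself validates the property.

@example
;; Hypothetical helper.  FINAL is the final byte as an integer.
;; 48 = 0x30, 126 = 0x7E, 95 = 0x5F.
(defun my-final-byte-in-range-p (final dimension)
  (and (>= final 48)
       (<= final (if (= dimension 1) 126 95))))

;; 66 is the code of `B', the final byte of the ascii charset.
(my-final-byte-in-range-p 66 1)
     @result{} t
;; 96 (0x60) lies outside the range quoted for dimension 2.
(my-final-byte-in-range-p 96 2)
     @result{} nil
@end example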

@node Basic Charset Functions, Charset Property Functions, Charset Properties, Charsets
@subsection Basic Charset Functions

@defun find-charset charset-or-name
This function retrieves the charset of the given name. If
@var{charset-or-name} is a charset object, it is simply returned.
Otherwise, @var{charset-or-name} should be a symbol. If there is no
such charset, @code{nil} is returned. Otherwise the associated charset
object is returned.
@end defun

@defun get-charset name
This function retrieves the charset of the given name. Same as
@code{find-charset} except an error is signalled if there is no such
charset instead of returning @code{nil}.
@end defun
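
The difference between @code{find-charset} and @code{get-charset} might
be exercised as follows. This is a sketch assuming a MULE-enabled
XEmacs; @code{no-such-charset} is a made-up name chosen on the
assumption that no charset is defined under it.

@example
;; Look up a predefined charset by name.
(find-charset 'ascii)

;; An unknown name yields nil from find-charset ...
(find-charset 'no-such-charset)

;; ... but signals an error from get-charset, which can be caught.
(condition-case nil
    (get-charset 'no-such-charset)
  (error 'not-found))
@end example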

@defun charset-list
This function returns a list of the names of all defined charsets.
@end defun

@defun make-charset name doc-string props
This function defines a new character set. This function is for use
with MULE support. @var{name} is a symbol, the name by which the
character set is normally referred to. @var{doc-string} is a string
describing the character set. @var{props} is a property list
describing the specific nature of the character set. The recognized
properties are @code{registry}, @code{dimension}, @code{columns},
@code{chars}, @code{final}, @code{graphic}, @code{direction}, and
@code{ccl-program}, as previously described.
@end defun
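
A call to @code{make-charset} might look like the sketch below. The
charset name, documentation string and registry are invented for
illustration. It is also an assumption that the final byte is written
as a character (here @code{?5}, in the range reserved for user-defined
sets) and that no other charset of dimension 1 with 94 characters
already uses it in your XEmacs.

@example
;; Hypothetical private charset; every value here is illustrative.
(make-charset 'my-private-charset
              "A hypothetical private 94-character set."
              '(registry "my-registry"
                dimension 1
                chars 94
                columns 1
                direction l2r
                final ?5
                graphic 0))
@end example

Properties left out of the list take the defaults given in
@ref{Charset Properties}; only @code{final} has no default.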

@defun make-reverse-direction-charset charset new-name
This function makes a charset equivalent to @var{charset} but which goes
in the opposite direction. @var{new-name} is the name of the new
charset. The new charset is returned.
@end defun

@defun charset-from-attributes dimension chars final &optional direction
This function returns a charset with the given @var{dimension},
@var{chars}, @var{final}, and @var{direction}. If @var{direction} is
omitted, both directions will be checked (left-to-right will be returned
if character sets exist for both directions).
@end defun

@defun charset-reverse-direction-charset charset
This function returns the charset (if any) with the same dimension,
number of characters, and final byte as @var{charset}, but which is
displayed in the opposite direction.
@end defun
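
The two lookup functions above could be tried as follows. This is a
sketch only: it assumes that the final byte is passed as a character,
and the comments state what the descriptions in this chapter suggest
rather than verified results.

@example
;; Dimension 1, 94 characters, final byte `B': by the table of
;; predefined charsets this combination belongs to ascii.
(charset-from-attributes 1 94 ?B)

;; The same attributes, restricted to right-to-left charsets.
(charset-from-attributes 1 94 ?B 'r2l)

;; The opposite-direction counterpart of ascii, if one is defined.
(charset-reverse-direction-charset (find-charset 'ascii))
@end example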

@node Charset Property Functions, Predefined Charsets, Basic Charset Functions, Charsets
@subsection Charset Property Functions

All of these functions accept either a charset name or charset object.

@defun charset-property charset prop
This function returns property @var{prop} of @var{charset}.
@xref{Charset Properties}.
@end defun
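
For instance, assuming that property names are passed as the symbols
listed in @ref{Charset Properties}, a property can be fetched through
either a name or an object:

@example
;; By charset name.
(charset-property 'ascii 'dimension)

;; By charset object.
(charset-property (find-charset 'ascii) 'registry)
@end example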

Convenience functions are also provided for retrieving individual
properties of a charset.

@defun charset-name charset
This function returns the name of @var{charset}. This will be a symbol.
@end defun

@defun charset-description charset
This function returns the documentation string of @var{charset}.
@end defun

@defun charset-registry charset
This function returns the registry of @var{charset}.
@end defun

@defun charset-dimension charset
This function returns the dimension of @var{charset}.
@end defun

@defun charset-chars charset
This function returns the number of characters per dimension of
@var{charset}.
@end defun

@defun charset-width charset
This function returns the number of display columns per character (in
TTY mode) of @var{charset}.
@end defun

@defun charset-direction charset
This function returns the display direction of @var{charset}, which is
either @code{l2r} or @code{r2l}.
@end defun

@defun charset-iso-final-char charset
This function returns the final byte of the ISO 2022 escape sequence
designating @var{charset}.
@end defun

@defun charset-iso-graphic-plane charset
This function returns either 0 or 1, depending on whether the position
codes of characters in @var{charset} map to the left or right half
of their font, respectively.
@end defun

@defun charset-ccl-program charset
This function returns the CCL program, if any, for converting
position codes of characters in @var{charset} into font indices.
@end defun

The only property of a charset that can currently be set after
the charset has been created is the CCL program.

@defun set-charset-ccl-program charset ccl-program
This function sets the @code{ccl-program} property of @var{charset} to
@var{ccl-program}.
@end defun

@node Predefined Charsets, , Charset Property Functions, Charsets
@subsection Predefined Charsets

The following charsets are predefined in the C code.

@example
Name                    Type  Fi Gr Dir Registry
--------------------------------------------------------------
ascii                   94    B  0  l2r ISO8859-1
control-1               94       0  l2r ---
latin-iso8859-1         96    A  1  l2r ISO8859-1
latin-iso8859-2         96    B  1  l2r ISO8859-2
latin-iso8859-3         96    C  1  l2r ISO8859-3
latin-iso8859-4         96    D  1  l2r ISO8859-4
cyrillic-iso8859-5      96    L  1  l2r ISO8859-5
arabic-iso8859-6        96    G  1  r2l ISO8859-6
greek-iso8859-7         96    F  1  l2r ISO8859-7
hebrew-iso8859-8        96    H  1  r2l ISO8859-8
latin-iso8859-9         96    M  1  l2r ISO8859-9
thai-tis620             96    T  1  l2r TIS620
katakana-jisx0201       94    I  1  l2r JISX0201.1976
latin-jisx0201          94    J  0  l2r JISX0201.1976
japanese-jisx0208-1978  94x94 @@  0  l2r JISX0208.1978
japanese-jisx0208       94x94 B  0  l2r JISX0208.19(83|90)
japanese-jisx0212       94x94 D  0  l2r JISX0212
chinese-gb2312          94x94 A  0  l2r GB2312
chinese-cns11643-1      94x94 G  0  l2r CNS11643.1
chinese-cns11643-2      94x94 H  0  l2r CNS11643.2
chinese-big5-1          94x94 0  0  l2r Big5
chinese-big5-2          94x94 1  0  l2r Big5
korean-ksc5601          94x94 C  0  l2r KSC5601
composite               96x96    0  l2r ---
@end example

The following charsets are predefined in the Lisp code.

@example
Name                    Type  Fi Gr Dir Registry
--------------------------------------------------------------
arabic-digit            94    2  0  l2r MuleArabic-0
arabic-1-column         94    3  0  r2l MuleArabic-1
arabic-2-column         94    4  0  r2l MuleArabic-2
sisheng                 94    0  0  l2r sisheng_cwnn\|OMRON_UDC_ZH
chinese-cns11643-3      94x94 I  0  l2r CNS11643.1
chinese-cns11643-4      94x94 J  0  l2r CNS11643.1
chinese-cns11643-5      94x94 K  0  l2r CNS11643.1
chinese-cns11643-6      94x94 L  0  l2r CNS11643.1
chinese-cns11643-7      94x94 M  0  l2r CNS11643.1
ethiopic                94x94 2  0  l2r Ethio
ascii-r2l               94    B  0  r2l ISO8859-1
ipa                     96    0  1  l2r MuleIPA
vietnamese-lower        96    1  1  l2r VISCII1.1
vietnamese-upper        96    2  1  l2r VISCII1.1
@end example
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
508
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
509 For all of the above charsets, the dimension and number of columns are
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
510 the same.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
511
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
512 Note that ASCII, Control-1, and Composite are handled specially.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
513 This is why some of the fields are blank, and why some of the filled-in
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
514 fields (e.g. the type) are not really accurate.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
515
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
516 @node MULE Characters, Composite Characters, Charsets, MULE
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
517 @section MULE Characters
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
518
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
519 @defun make-char charset arg1 &optional arg2
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
520 This function makes a multi-byte character from @var{charset} and octets
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
521 @var{arg1} and @var{arg2}. @var{arg2} is required only for two-dimensional charsets.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
522 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
523
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
524 @defun char-charset character
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
525 This function returns the character set of char @var{character}.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
526 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
527
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
528 @defun char-octet character &optional n
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
529 This function returns the octet (i.e. position code) numbered @var{n}
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
530 (should be 0 or 1) of char @var{character}. @var{n} defaults to 0 if omitted.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
531 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
532
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
533 @defun find-charset-region start end &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
534 This function returns a list of the charsets in the region between
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
535 @var{start} and @var{end}. @var{buffer} defaults to the current buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
536 if omitted.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
537 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
538
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
539 @defun find-charset-string string
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
540 This function returns a list of the charsets in @var{string}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
541 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
542
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
543 @node Composite Characters, Coding Systems, MULE Characters, MULE
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
544 @section Composite Characters
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
545
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
546 Composite characters are not yet completely implemented.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
547
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
548 @defun make-composite-char string
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
549 This function converts a string into a single composite character. The
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
550 character is the result of overstriking all the characters in the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
551 string.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
552 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
553
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
554 @defun composite-char-string character
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
555 This function returns a string of the characters comprising a composite
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
556 character.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
557 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
558
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
559 @defun compose-region start end &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
560 This function composes the characters in the region from @var{start} to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
561 @var{end} in @var{buffer} into one composite character. The composite
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
562 character replaces the composed characters. @var{buffer} defaults to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
563 the current buffer if omitted.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
564 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
565
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
566 @defun decompose-region start end &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
567 This function decomposes any composite characters in the region from
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
568 @var{start} to @var{end} in @var{buffer}. This converts each composite
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
569 character into one or more characters, the individual characters out of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
570 which the composite character was formed. Non-composite characters are
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
571 left as-is. @var{buffer} defaults to the current buffer if omitted.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
572 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
573
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
574 @node Coding Systems, CCL, Composite Characters, MULE
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
575 @section Coding Systems
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
576
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
577 A coding system is an object that defines how text containing multiple
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
578 character sets is encoded into a stream of (typically 8-bit) bytes. The
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
579 coding system is used to decode the stream into a series of characters
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
580 (which may be from multiple charsets) when the text is read from a file
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
581 or process, and is used to encode the text back into the same format
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
582 when it is written out to a file or process.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
583
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
584 For example, many ISO-2022-compliant coding systems (such as Compound
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
585 Text, which is used for inter-client data under the X Window System) use
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
586 escape sequences to switch between different charsets -- Japanese Kanji,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
587 for instance, is invoked with @samp{ESC $ ( B}; ASCII is invoked with
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
588 @samp{ESC ( B}; and Cyrillic is invoked with @samp{ESC - L}. See
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
589 @code{make-coding-system} for more information.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
590
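The charset switching described above can be observed outside XEmacs.
The following is only an illustrative sketch using Python's standard
@code{iso2022_jp} codec (an assumption of this example, not an XEmacs
interface).  Note that this codec emits the older three-byte form
@samp{ESC $ B} for JIS X 0208 rather than @samp{ESC $ ( B}; both
designate the same charset.

```python
# Illustration only: Python's standard "iso2022_jp" codec, not an XEmacs API.
ESC = b"\x1b"

text = "A\u65e5\u672cB"            # "A", two kanji, "B"
encoded = text.encode("iso2022_jp")

# The stream starts in ASCII, switches to JIS X 0208 before the kanji,
# and switches back to ASCII afterwards.
to_kanji = encoded.index(ESC + b"$B")
to_ascii = encoded.index(ESC + b"(B")

print(encoded)
print(to_kanji, to_ascii)
print(encoded.decode("iso2022_jp") == text)
```

The two kanji occupy the four bytes between the escape sequences, and
decoding the stream with the same codec recovers the original text.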
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
591 Coding systems are normally identified using a symbol, and the symbol is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
592 accepted in place of the actual coding system object whenever a coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
593 system is called for. (This is similar to how faces and charsets work.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
594
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
595 @defun coding-system-p object
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
596 This function returns non-@code{nil} if @var{object} is a coding system.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
597 @end defun
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
598
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
599 @menu
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
600 * Coding System Types:: Classifying coding systems.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
601 * ISO 2022:: An international standard for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
602 charsets and encodings.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
603 * EOL Conversion:: Dealing with different ways of denoting
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
604 the end of a line.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
605 * Coding System Properties:: Properties of a coding system.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
606 * Basic Coding System Functions:: Working with coding systems.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
607 * Coding System Property Functions:: Retrieving a coding system's properties.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
608 * Encoding and Decoding Text:: Encoding and decoding text.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
609 * Detection of Textual Encoding:: Determining how text is encoded.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
610 * Big5 and Shift-JIS Functions:: Special functions for these non-standard
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
611 encodings.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
612 * Predefined Coding Systems:: Coding systems implemented by MULE.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
613 @end menu
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
614
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
615 @node Coding System Types, ISO 2022, , Coding Systems
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
616 @subsection Coding System Types
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
617
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
618 The coding system type determines the basic algorithm XEmacs will use to
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
619 decode or encode a data stream. Character encodings will be converted
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
620 to the MULE encoding, escape sequences processed, and newline sequences
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
621 converted to XEmacs's internal representation. There are three basic
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
622 classes of coding system type: no-conversion, ISO-2022, and special.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
623
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
624 No conversion allows you to look at the file's internal representation.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
625 Since XEmacs is basically a text editor, "no conversion" does convert
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
626 newline conventions by default. (Use the @code{binary} coding system if this
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
627 is not desired.)
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
628
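The difference between converting and preserving line endings can be
sketched outside XEmacs.  The Python function below is hypothetical
(neither the name @code{decode_no_conversion} nor its
@code{convert_eol} parameter exists in XEmacs), and it assumes a
simplified rule in which every CRLF or bare CR becomes LF.

```python
def decode_no_conversion(data, convert_eol=True):
    """Map each byte to the character with the same code; optionally
    normalize CRLF and bare CR line endings to LF (a simplification)."""
    text = data.decode("latin-1")      # bytes 0x00-0xFF map one-to-one
    if convert_eol:
        text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text

raw = b"one\r\ntwo\r\n"
print(repr(decode_no_conversion(raw)))                     # line endings converted
print(repr(decode_no_conversion(raw, convert_eol=False)))  # bytes kept as they are
```

With @code{convert_eol} turned off, the sketch behaves like a purely
binary read: every byte, including the carriage returns, survives.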
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
629 ISO 2022 (@pxref{ISO 2022}) is the basic international standard regulating
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
630 use of "coded character sets for the exchange of data", i.e., text
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
631 streams. ISO 2022 contains functions that make it possible to encode
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
632 text streams to comply with restrictions of the Internet mail system and
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
633 de facto restrictions of most file systems (e.g., use of the separator
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
634 character in file names). Coding systems which are not ISO 2022
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
635 conformant can be difficult to handle. Perhaps more important, they are
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
636 not adaptable to multilingual information interchange, with the obvious
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
637 exception of ISO 10646 (Unicode). (Unicode is partially supported by
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
638 XEmacs with the addition of the Lisp package ucs-conv.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
639
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
640 The special class of coding systems includes automatic detection, CCL (a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
641 "little language" embedded as an interpreter, useful for translating
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
642 between variants of a single character set), non-ISO-2022-conformant
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
643 encodings like Unicode, Shift JIS, and Big5, and MULE internal coding.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
644 (NB: this list is based on XEmacs 21.2. Terminology may vary slightly
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
645 for other versions of XEmacs and for GNU Emacs 20.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
646
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
647 @table @code
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
648 @item no-conversion
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
649 No conversion, for binary files, and a few special cases of non-ISO-2022
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
650 coding systems where conversion is done by hook functions (usually
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
651 implemented in CCL). On output, graphic characters that are not in
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
652 ASCII or Latin-1 will be replaced by a @samp{?}. (For a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
653 no-conversion-encoded buffer, these characters will only be present if
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
654 you explicitly insert them.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
655 @item iso2022
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
656 Any ISO-2022-compliant encoding. Among others, this includes JIS (the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
657 Japanese encoding commonly used for e-mail), national variants of EUC
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
658 (the standard Unix encoding for Japanese and other languages), and
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
659 Compound Text (an encoding used in X11). You can supply more specific
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
660 information about the conversion with the @var{flags} argument.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
661 @item ucs-4
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
662 ISO 10646 UCS-4 encoding. A 31-bit fixed-width superset of Unicode.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
663 @item utf-8
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
664 ISO 10646 UTF-8 encoding. A ``file system safe'' transformation format
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
665 that can be used with both UCS-4 and Unicode.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
666 @item undecided
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
667 Automatic conversion. XEmacs attempts to detect the coding system used
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
668 in the file.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
669 @item shift-jis
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
670 Shift-JIS (a Japanese encoding commonly used in PC operating systems).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
671 @item big5
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
672 Big5 (the encoding commonly used for Traditional Chinese, e.g. in Taiwan).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
673 @item ccl
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
674 The conversion is performed using a user-written pseudo-code program.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
675 CCL (Code Conversion Language) is the name of this pseudo-code. For
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
676 example, CCL is used to map KOI8-R characters (an encoding for Russian
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
677 Cyrillic) to ISO 8859-5 (the form used internally by MULE).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
678 @item internal
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
679 Write out or read in the raw contents of the memory representing the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
680 buffer's text. This is primarily useful for debugging purposes, and is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
681 only enabled when XEmacs has been compiled with @code{DEBUG_XEMACS} set
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
682 (the @samp{--debug} configure option). @strong{Warning}: Reading in a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
683 file using @code{internal} conversion can result in an internal
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
684 inconsistency in the memory representing a buffer's text, which will
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
685 produce unpredictable results and may cause XEmacs to crash. Under
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
686 normal circumstances you should never use @code{internal} conversion.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
687 @end table
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
688
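A few of the properties listed in the table can be checked with a short
Python sketch.  It is an illustration under stated assumptions, not
XEmacs code: Python's @code{utf-32-be} codec stands in for UCS-4
(it covers only the Unicode range, not the full 31 bits), and
@code{latin-1} with replacement stands in for the @samp{?}
substitution described under @code{no-conversion}.

```python
sample = "a\u00e9\u65e5"   # ASCII "a", Latin-1 e-acute, one kanji

# UCS-4 (Python's "utf-32-be" codec here): always four bytes per character.
ucs4_widths = [len(ch.encode("utf-32-be")) for ch in sample]

# UTF-8: one to several bytes, depending on the character.
utf8_widths = [len(ch.encode("utf-8")) for ch in sample]

# no-conversion-style output: characters outside ASCII/Latin-1 become "?".
lossy = sample.encode("latin-1", errors="replace")

print(ucs4_widths)   # [4, 4, 4]
print(utf8_widths)   # [1, 2, 3]
print(lossy)         # b'a\xe9?'
```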
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
689 @node ISO 2022, EOL Conversion, Coding System Types, Coding Systems
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
690 @section ISO 2022
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
691
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
692 This section briefly describes the ISO 2022 encoding standard. A more
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
693 thorough treatment is available in the original document of ISO
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
694 2022 as well as various national standards (such as JIS X 0202).
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
695
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
696 Character sets (@dfn{charsets}) are classified into the following four
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
697 categories, according to the number of characters in the charset:
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
698 94-charset, 96-charset, 94x94-charset, and 96x96-charset. This means
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
699 that although an ISO 2022 coding system may have variable width
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
700 characters, each charset used is fixed-width (in contrast to the MULE
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
701 character set and UTF-8, for example).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
702
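The four categories differ only in the number of positions per byte and
in the number of bytes per character.  As a small worked example (the
names @code{CHARSET_TYPES}, @code{charset_size}, and
@code{bytes_per_char} are made up for this sketch):

```python
# Positions per byte and bytes per character, for each ISO 2022 charset type.
CHARSET_TYPES = dict([
    ("94", (94, 1)),
    ("96", (96, 1)),
    ("94x94", (94, 2)),
    ("96x96", (96, 2)),
])

def charset_size(kind):
    """Maximum number of characters a charset of this type can hold."""
    chars, dimension = CHARSET_TYPES[kind]
    return chars ** dimension

def bytes_per_char(kind):
    """Every character of a given charset uses this many bytes."""
    return CHARSET_TYPES[kind][1]

for kind in ("94", "96", "94x94", "96x96"):
    print(kind, charset_size(kind), bytes_per_char(kind))
```

A 94x94 charset therefore has room for 8836 characters, each taking
exactly two bytes.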
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
703 ISO 2022 provides for switching between character sets via escape
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
704 sequences. This switching is somewhat complicated, because ISO 2022
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
705 provides for both legacy applications like Internet mail that accept
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
706 only 7 significant bits in some contexts (RFC 822 headers, for example),
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
707 and more modern "8-bit clean" applications. It also provides for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
708 compact and transparent representation of languages like Japanese which
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
709 mix ASCII and a national script (even outside of computer programs).
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
710
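The trade-off between 7-bit and 8-bit environments can be made concrete
with two Japanese encodings built on ISO 2022.  This sketch relies on
Python's standard @code{iso2022_jp} and @code{euc_jp} codecs, which are
an assumption of the example and unrelated to the XEmacs implementation;
@code{is_seven_bit} is a helper invented here.

```python
text = "XEmacs \u65e5\u672c\u8a9e"     # ASCII followed by three kanji

def is_seven_bit(data):
    """True if no byte has its high bit set."""
    return all(byte < 0x80 for byte in data)

seven_bit = text.encode("iso2022_jp")   # escape sequences, high bit never set
eight_bit = text.encode("euc_jp")       # no escapes, kanji bytes have high bit set

print(is_seven_bit(seven_bit), len(seven_bit))
print(is_seven_bit(eight_bit), len(eight_bit))
```

The 7-bit form is safe for channels that strip the high bit, at the
cost of the extra escape sequences; the 8-bit form is more compact.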
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
711 First, ISO 2022 codified prevailing practice by dividing the code space
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
712 into "control" and "graphic" regions. The code points 0x00-0x1F and
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
713 0x80-0x9F are reserved for "control characters", while "graphic
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
714 characters" must be assigned to code points in the regions 0x20-0x7F and
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
715 0xA0-0xFF. The positions 0x20 and 0x7F are special, and under some
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
716 circumstances must be assigned the graphic character "ASCII SPACE" and
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
717 the control character "ASCII DEL" respectively.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
718
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
719 The various regions are given the names C0 (0x00-0x1F), GL (0x20-0x7F),
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
720 C1 (0x80-0x9F), and GR (0xA0-0xFF). GL and GR stand for "graphic left"
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
721 and "graphic right", respectively, because of the standard method of
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
722 displaying graphic character sets in tables with the high nibble indexing
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
723 columns and the low nibble indexing rows. I don't find it very intuitive,
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
724 but these are called "registers".
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
725
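The division of the code space can be written down directly.  The
following Python sketch is illustrative only; the function names
@code{region} and @code{column_row} are invented for this example.

```python
def region(byte):
    """Return the ISO 2022 code-space region containing a byte value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte value")
    if byte <= 0x1F:
        return "C0"
    if byte <= 0x7F:
        return "GL"
    if byte <= 0x9F:
        return "C1"
    return "GR"

def column_row(byte):
    """Split a byte into the (column, row) position used in code tables:
    the high four bits select the column, the low four bits the row."""
    return byte >> 4, byte & 0x0F

for value in (0x0A, 0x41, 0x85, 0xE9):
    print(hex(value), region(value), column_row(value))
```

Columns 0 and 1 of such a table hold C0, columns 2 through 7 hold GL on
the left, columns 8 and 9 hold C1, and columns 10 through 15 hold GR on
the right.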
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
726 An ISO 2022-conformant encoding for a graphic character set must use a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
727 fixed number of bytes per character, and the values must fit into a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
728 single register; that is, each byte must range over either 0x20-0x7F, or
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
729 0xA0-0xFF. It is not allowed to extend the range of the repertoire of a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
730 character set by using both ranges at the same time. This is why a standard
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
731 character set such as ISO 8859-1 is actually considered by ISO 2022 to
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
732 be an aggregation of two character sets, ASCII and LATIN-1, and why it
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
733 is technically incorrect to refer to ISO 8859-1 as "Latin 1". Also, a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
734 single character's bytes must all be drawn from the same register; this
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
735 is why Shift JIS (for Japanese) and Big5 (for Chinese) are not ISO
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
736 2022-compatible encodings.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
737
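The same-register rule is easy to test mechanically.  The sketch below
uses Python's standard @code{shift_jis} and @code{euc_jp} codecs to
encode one kanji both ways; the helper names @code{register} and
@code{fits_one_register} are hypothetical, and the check covers only
the register rule, not the rest of ISO 2022.

```python
def register(byte):
    """Name the graphic register a byte falls in, or None for control codes."""
    if 0x20 <= byte <= 0x7F:
        return "GL"
    if 0xA0 <= byte <= 0xFF:
        return "GR"
    return None

def fits_one_register(char_bytes):
    """True if every byte of one encoded character lies in the same
    graphic register, as ISO 2022 requires."""
    registers = set(register(b) for b in char_bytes)
    return len(registers) == 1 and None not in registers

kanji = "\u8868"                    # its Shift JIS form is 0x95 0x5C
sjis = kanji.encode("shift_jis")
euc = kanji.encode("euc_jp")

print(fits_one_register(sjis))      # False
print(fits_one_register(euc))       # True
```

In the Shift JIS form the first byte lies in the C1 control region and
the second in GL, so the character straddles regions; in EUC-JP both
bytes lie in GR.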
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
738 The reason for this restriction becomes clear when you attempt to define
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
739 an efficient, robust encoding for a language like Japanese. Like ISO
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
740 8859, Japanese encodings are aggregations of several character sets. In
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
741 practice, the vast majority of characters are drawn from the "JIS Roman"
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
742 character set (a derivative of ASCII; it won't hurt to think of it as
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
743 ASCII) and the JIS X 0208 standard "basic Japanese" character set
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
744 including not only ideographic characters ("kanji") but syllabic
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
745 Japanese characters ("kana"), a wide variety of symbols, and many
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
746 alphabetic characters (Roman, Greek, and Cyrillic) as well. Although
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
747 JIS X 0208 includes the whole Roman alphabet, as a 2-byte code it is not
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
748 suited to programming; thus the inclusion of ASCII in the standard
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
749 Japanese encodings.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
750
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
751 For normal Japanese text such as in newspapers, a broad repertoire of
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
752 approximately 3000 characters is used. Evidently this won't fit into
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
753 one byte; two must be used. But much of the text processed by Japanese
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
754 computers is computer source code, nearly all of which is ASCII. A not
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
755 insignificant portion of ordinary text is English (as such or as
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
756 borrowed Japanese vocabulary) or other languages which can be represented
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
757 at least approximately in ASCII, as well. It seems reasonable then to
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
758 represent ASCII in one byte, and JIS X 0208 in two. And this is exactly
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
759 what the Extended Unix Code for Japanese (EUC-JP) does. ASCII is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
760 invoked to the GL register, and JIS X 0208 is invoked to the GR
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
761 register. Thus, each byte can be tested for its character set by
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
762 looking at the high bit; if set, it is Japanese, if clear, it is ASCII.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
763 Furthermore, since control characters like newline can never be part of
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
764 a graphic character, even in the case of corruption in transmission the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
765 stream will be resynchronized at every line break, on the order of 60-80
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
766 bytes. This coding system requires no escape sequences or special
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
767 control codes to represent 99.9% of all Japanese text.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
768
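The high-bit test described above can be sketched in a few lines of
Python. This is only an illustration, not code from MULE: the function
name is hypothetical, and it relies on the euc_jp codec shipped with
Python to produce sample bytes. It splits a byte string into runs of
ASCII bytes and runs of bytes with the high bit set.

```python
def split_euc_jp_runs(data: bytes):
    """Split EUC-JP bytes into (kind, bytes) runs using only the high bit.

    Hypothetical helper for illustration: "ascii" means high bit clear
    (GL), "gr" means high bit set (GR).
    """
    runs = []
    for byte in data:
        kind = "gr" if byte & 0x80 else "ascii"
        if runs and runs[-1][0] == kind:
            runs[-1][1].append(byte)      # extend the current run
        else:
            runs.append((kind, bytearray([byte])))
    return [(kind, bytes(chunk)) for kind, chunk in runs]


# "XEmacs " followed by three kanji and a newline.
sample = "XEmacs \u65e5\u672c\u8a9e\n".encode("euc_jp")
runs = split_euc_jp_runs(sample)
for kind, chunk in runs:
    print(kind, chunk.hex())
```

Note that the newline ends up in an ASCII run of its own, which is the
resynchronization property mentioned above: a control character never
appears inside a two-byte character.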
Note carefully the distinction between the character sets (ASCII and JIS
X 0208), the encoding (EUC-JP), and the coding system (ISO 2022). The
JIS X 0208 character set is used in three different encodings for
Japanese, but in ISO-2022-JP it is invoked into GL (so the high bit is
always clear), in EUC-JP it is invoked into GR (setting the high bit in
the process), and in Shift JIS the high bit may be set or reset, and the
significant bits are shifted within the 16-bit character so that the two
main character sets can coexist with a third (the ``halfwidth katakana''
of JIS X 0201). As the name implies, the ISO-2022-JP encoding is also a
version of the ISO-2022 coding system.

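To make the difference between the three encodings concrete, the
following sketch encodes the same two JIS X 0208 kanji with the
iso2022_jp, euc_jp, and shift_jis codecs from the Python standard
library and prints the high-bit pattern of each result. The helper name
and the sample text are chosen here for illustration only.

```python
text = "\u65e5\u672c"  # two kanji taken from JIS X 0208

iso = text.encode("iso2022_jp")   # 7-bit: charset invoked into GL
euc = text.encode("euc_jp")       # 8-bit: charset invoked into GR
sjis = text.encode("shift_jis")   # 8-bit: code points rearranged


def high_bits(data: bytes) -> str:
    """Return one character per byte: '1' if the high bit is set, else '0'."""
    return "".join("1" if b & 0x80 else "0" for b in data)


# Drop the three-byte designation escapes that bracket the ISO-2022-JP text.
payload = iso[3:-3]

for name, data in (("iso-2022-jp", iso), ("euc-jp", euc), ("shift_jis", sjis)):
    print(name.ljust(12), data.hex(" ").ljust(32), high_bits(data))
```

Setting the high bit of every byte in the ISO-2022-JP payload yields the
EUC-JP bytes, whereas the Shift JIS bytes cannot be obtained by such a
simple bitwise change.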
In order to systematically treat subsidiary character sets (like the
``halfwidth katakana'' already mentioned, and the ``supplementary kanji'' of
JIS X 0212), four further registers are defined: G0, G1, G2, and G3.
Unlike GL and GR, they are not logically distinguished by internal
format. Instead, the process of ``invocation'' mentioned earlier is
broken into two steps: first, a character set is @dfn{designated} to one
of the registers G0-G3 by use of an @dfn{escape sequence} of the form:

@example
ESC [@var{I}] @var{I} @var{F}
@end example

where @var{I} is an intermediate character or characters in the range
0x20-0x2F, and @var{F}, from the range 0x30-0x7E, is the final
character identifying this charset. (Final characters in the range
0x30-0x3F are reserved for private use and will never have a publicly
registered meaning.)

Then that register is @dfn{invoked} to either GL or GR, either
automatically (designations to G0 normally involve invocation to GL as
well), or by use of shifting (affecting only the following character in
the data stream) or locking (effective until the next designation or
locking) control sequences. An encoding conformant to ISO 2022 is
typically defined by designating the initial contents of the G0-G3
registers, specifying a 7-bit or 8-bit environment, and specifying whether
further designations will be recognized.

Some examples of character sets and the registered final characters
@var{F} used to designate them:

@need 1000
@table @asis
@item 94-charset
ASCII (B), left (J) and right (I) half of JIS X 0201, ...
@item 96-charset
Latin-1 (A), Latin-2 (B), Latin-3 (C), ...
@item 94x94-charset
GB2312 (A), JIS X 0208 (B), KSC5601 (C), ...
@item 96x96-charset
none for the moment
@end table

The meanings of the various characters in these sequences, where not
specified by the ISO 2022 standard (such as the ESC character), are
assigned by @dfn{ECMA}, the European Computer Manufacturers Association.

The meanings of the intermediate characters are:

@example
@group
$ [0x24]: indicate charset of dimension 2 (94x94 or 96x96).
( [0x28]: designate to G0 a 94-charset whose final byte is @var{F}.
) [0x29]: designate to G1 a 94-charset whose final byte is @var{F}.
* [0x2A]: designate to G2 a 94-charset whose final byte is @var{F}.
+ [0x2B]: designate to G3 a 94-charset whose final byte is @var{F}.
, [0x2C]: designate to G0 a 96-charset whose final byte is @var{F}.
- [0x2D]: designate to G1 a 96-charset whose final byte is @var{F}.
. [0x2E]: designate to G2 a 96-charset whose final byte is @var{F}.
/ [0x2F]: designate to G3 a 96-charset whose final byte is @var{F}.
@end group
@end example

The comma may be used in files read and written only by MULE, as a MULE
extension, but this is illegal in ISO 2022. (The reason is that in ISO
2022 G0 must be a 94-member character set, with 0x20 assigned the value
SPACE, and 0x7F assigned the value DEL.)

Here are examples of designations:

@example
@group
ESC ( B : designate to G0 ASCII
ESC - A : designate to G1 Latin-1
ESC $ ( A or ESC $ A : designate to G0 GB2312
ESC $ ( B or ESC $ B : designate to G0 JISX0208
ESC $ ) C : designate to G1 KSC5601
@end group
@end example

(The short forms used to designate GB2312 and JIS X 0208 are for
backwards compatibility; the long forms are preferred.)

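As a rough sketch of how the designation sequences above can be taken
apart, the following Python function maps an escape sequence to the
register, charset size, dimension, and final character. The function and
table names are hypothetical and are not part of MULE; the accepted
final-byte range 0x30-0x7E is the one defined by ISO 2022, and the
comma (0x2C) entry is the MULE extension described above.

```python
# Intermediate byte -> (register, characters per dimension).
INTERMEDIATES = {
    0x28: ("G0", 94), 0x29: ("G1", 94), 0x2A: ("G2", 94), 0x2B: ("G3", 94),
    0x2C: ("G0", 96),  # MULE extension, not valid ISO 2022
    0x2D: ("G1", 96), 0x2E: ("G2", 96), 0x2F: ("G3", 96),
}


def parse_designation(seq: bytes):
    """Parse ESC [$] I F, or the short form ESC $ F.

    Returns (register, chars_per_dimension, dimension, final_character).
    """
    if len(seq) < 3 or seq[0] != 0x1B:
        raise ValueError("not a designation escape sequence")
    body = seq[1:]
    dimension = 1
    if body[0] == 0x24:              # '$' marks a charset of dimension 2
        dimension = 2
        body = body[1:]
    if len(body) == 1 and dimension == 2:
        # Short form: only '$' precedes F; a 94x94 charset goes to G0.
        register, size = "G0", 94
        final = body[0]
    elif len(body) == 2 and body[0] in INTERMEDIATES:
        register, size = INTERMEDIATES[body[0]]
        final = body[1]
    else:
        raise ValueError("unrecognized designation")
    if not 0x30 <= final <= 0x7E:
        raise ValueError("final character out of range")
    return register, size, dimension, chr(final)


for seq in (b"\x1b(B", b"\x1b-A", b"\x1b$(A", b"\x1b$B", b"\x1b$)C"):
    print(seq, parse_designation(seq))
```

The function only classifies the sequence; mapping the final character
to an actual character set (for example B to ASCII or to JIS X 0208)
additionally depends on the size and dimension, as the table of final
characters shows.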
To use a charset designated to G2 or G3, and to use a charset designated
to G1 in a 7-bit environment, you must explicitly invoke G1, G2, or G3
into GL. There are two types of invocation, Locking Shift (forever) and
Single Shift (one character only).

Locking Shift is done as follows:

@example
LS0 or SI (0x0F): invoke G0 into GL
LS1 or SO (0x0E): invoke G1 into GL
LS2: invoke G2 into GL
LS3: invoke G3 into GL
LS1R: invoke G1 into GR
LS2R: invoke G2 into GR
LS3R: invoke G3 into GR
@end example

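The effect of SO and SI can be observed with a 7-bit encoding that uses
locking shifts. The sketch below uses the iso2022_kr codec from the
Python standard library to produce sample data and then splits the bytes
at each SO or SI; the function name is hypothetical, and the tracker
deliberately ignores every other control function.

```python
SO, SI = 0x0E, 0x0F   # LS1 and LS0


def split_by_locking_shift(data: bytes):
    """Return (register, bytes) runs, where register is the one locked into GL."""
    current = "G0"            # G0 is invoked into GL initially
    runs = []
    chunk = bytearray()
    for byte in data:
        if byte in (SO, SI):
            if chunk:
                runs.append((current, bytes(chunk)))
                chunk = bytearray()
            current = "G1" if byte == SO else "G0"
        else:
            chunk.append(byte)
    if chunk:
        runs.append((current, bytes(chunk)))
    return runs


text = "A\ud55c\uae00B"   # ASCII, two Hangul syllables, ASCII
encoded = text.encode("iso2022_kr")
runs = split_by_locking_shift(encoded)
for register, chunk in runs:
    print(register, chunk)
```

All bytes stay below 0x80; the two Hangul syllables are carried as four
GL bytes while G1 is locked in, and SI restores G0 before the final
ASCII letter.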
Single Shift is done as follows:

@example
@group
SS2 or ESC N: invoke G2 into GL
SS3 or ESC O: invoke G3 into GL
@end group
@end example

The shift functions (such as LS1R and SS3) are represented by control
characters (from C1) in 8-bit environments and by escape sequences in 7-bit
environments.

(#### Ben says: I think the above is slightly incorrect. It appears that
SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N and
ESC O behave as indicated. The above definitions will not parse
EUC-encoded text correctly, and it looks like the code in mule-coding.c
has similar problems.)

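The behavior of single shifts in EUC text can be checked against the
euc_jp codec in the Python standard library. The following sketch is an
illustration under that assumption, not a description of mule-coding.c:
it walks EUC-JP bytes and labels each character with the register it is
taken from, treating 0x8E as SS2 and 0x8F as SS3. The function name is
hypothetical.

```python
SS2, SS3 = 0x8E, 0x8F   # C1 single-shift control characters


def euc_jp_characters(data: bytes):
    """Return a list of (register, character_bytes) for EUC-JP input."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # G0 (ASCII), one GL byte
            out.append(("G0", data[i:i + 1]))
            i += 1
        elif b == SS2:                    # G2, one byte follows the shift
            out.append(("G2", data[i + 1:i + 2]))
            i += 2
        elif b == SS3:                    # G3, two bytes follow the shift
            out.append(("G3", data[i + 1:i + 3]))
            i += 3
        else:                             # G1 (JIS X 0208), two GR bytes
            out.append(("G1", data[i:i + 2]))
            i += 2
    return out


# ASCII letter, halfwidth katakana (JIS X 0201), kanji (JIS X 0208).
sample = "a\uff71\u65e5".encode("euc_jp")
chars = euc_jp_characters(sample)
for register, raw in chars:
    print(register, raw.hex())
```

In this output the byte that follows SS2 has its high bit set, which
matches the observation in the note above that, in EUC, the shifted
character is read from GR rather than GL.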
Evidently there are a lot of ISO-2022-compliant ways of encoding
multilingual text. Many coding systems in actual use, such as X11's
Compound Text, Japanese JUNET code, and so-called EUC (Extended UNIX
Code), are variants of ISO 2022.

In MULE, we characterize a version of ISO 2022 by the following
attributes:

@enumerate
@item
The character sets initially designated to G0 through G3.
@item
Whether short form designations are allowed for Japanese and Chinese.
@item
Whether ASCII should be designated to G0 before control characters.
@item
Whether ASCII should be designated to G0 at the end of line.
@item
7-bit environment or 8-bit environment.
@item
Whether Locking Shifts are used or not.
@item
Whether to use ASCII or the variant JIS X 0201-1976-Roman.
@item
Whether to use JIS X 0208-1983 or the older version JIS X 0208-1976.
@end enumerate

(The last two are only for Japanese.)

By specifying these attributes, you can create any variant
of ISO 2022.

Here are several examples:

@example
@group
ISO-2022-JP -- Coding system used in Japanese email (RFC 1468).
1. G0 <- ASCII, G1..3 <- never used.
2. Yes.
3. Yes.
4. Yes.
5. 7-bit environment.
6. No.
7. Use ASCII.
8. Use JIS X 0208-1983.
@end group

@group
ctext -- X11 Compound Text
1. G0 <- ASCII, G1 <- Latin-1, G2,3 <- never used.
2. No.
3. No.
4. Yes.
5. 8-bit environment.
6. No.
7. Use ASCII.
8. Use JIS X 0208-1983.
@end group

@group
euc-china -- Chinese EUC. Often called the "GB encoding", but that is
technically incorrect.
1. G0 <- ASCII, G1 <- GB 2312, G2,3 <- never used.
2. No.
3. Yes.
4. Yes.
5. 8-bit environment.
6. No.
7. Use ASCII.
8. Use JIS X 0208-1983.
@end group

@group
ISO-2022-KR -- Coding system used in Korean email.
1. G0 <- ASCII, G1 <- KSC 5601, G2,3 <- never used.
2. No.
3. Yes.
4. Yes.
5. 7-bit environment.
6. Yes.
7. Use ASCII.
8. Use JIS X 0208-1983.
@end group
@end example

MULE creates all of these coding systems by default.

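The eight attributes can be thought of as a small record per coding
system. The sketch below writes the four examples above as Python data;
the class and field names are invented here for illustration and do not
correspond to MULE's actual property names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Iso2022Variant:
    """One ISO 2022 variant, described by the eight attributes listed above."""
    initial_charsets: Tuple[Optional[str], ...]  # G0..G3; None = never used
    short_forms: bool             # 2. short form designations allowed
    ascii_before_control: bool    # 3. designate ASCII to G0 before controls
    ascii_at_eol: bool            # 4. designate ASCII to G0 at end of line
    eight_bit: bool               # 5. True = 8-bit, False = 7-bit environment
    locking_shift: bool           # 6. locking shifts used
    jis_roman: bool = False       # 7. JIS X 0201-1976-Roman instead of ASCII
    old_jis_x_0208: bool = False  # 8. JIS X 0208-1976 instead of -1983


VARIANTS = {
    "iso-2022-jp": Iso2022Variant(
        ("ascii", None, None, None), short_forms=True,
        ascii_before_control=True, ascii_at_eol=True,
        eight_bit=False, locking_shift=False),
    "ctext": Iso2022Variant(
        ("ascii", "latin-1", None, None), short_forms=False,
        ascii_before_control=False, ascii_at_eol=True,
        eight_bit=True, locking_shift=False),
    "euc-china": Iso2022Variant(
        ("ascii", "gb2312", None, None), short_forms=False,
        ascii_before_control=True, ascii_at_eol=True,
        eight_bit=True, locking_shift=False),
    "iso-2022-kr": Iso2022Variant(
        ("ascii", "ksc5601", None, None), short_forms=False,
        ascii_before_control=True, ascii_at_eol=True,
        eight_bit=False, locking_shift=True),
}

# Variants that reach G1 in a 7-bit environment need a locking shift.
seven_bit_with_g1 = [
    name for name, v in VARIANTS.items()
    if not v.eight_bit and v.initial_charsets[1] is not None
]
print(seven_bit_with_g1)
```

Laid out this way, it is easy to see that ISO-2022-KR is the only one of
the four examples that both runs in a 7-bit environment and designates a
character set to G1, which is why it is also the only one that uses
locking shifts.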
@node EOL Conversion, Coding System Properties, ISO 2022, Coding Systems
@subsection EOL Conversion

@table @code
@item nil
Automatically detect the end-of-line type (LF, CRLF, or CR). Also
generate subsidiary coding systems named @code{@var{name}-unix},
@code{@var{name}-dos}, and @code{@var{name}-mac}, that are identical to
this coding system but have an EOL-TYPE value of @code{lf}, @code{crlf},
and @code{cr}, respectively.
@item lf
The end of a line is marked externally using ASCII LF. Since this is
also the way that XEmacs represents an end-of-line internally,
specifying this option results in no end-of-line conversion. This is
the standard format for Unix text files.
@item crlf
The end of a line is marked externally using ASCII CRLF. This is the
standard format for MS-DOS text files.
@item cr
The end of a line is marked externally using ASCII CR. This is the
standard format for Macintosh text files.
@item t
Automatically detect the end-of-line type but do not generate subsidiary
coding systems. (This value is converted to @code{nil} when stored
internally, and @code{coding-system-property} will return @code{nil}.)
@end table

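The following Python sketch illustrates the idea behind end-of-line
detection and conversion. It is not the algorithm XEmacs uses: the
majority-vote heuristic and all function names are assumptions made for
this example. Only the three EOL types and the -unix, -dos, and -mac
suffixes are taken from the table above.

```python
# EOL type -> suffix of the subsidiary coding system name.
SUFFIX = {"lf": "unix", "crlf": "dos", "cr": "mac"}


def detect_eol_type(data: bytes):
    """Guess 'lf', 'crlf', or 'cr' by counting; None if no line break occurs."""
    crlf = data.count(b"\r\n")
    counts = {
        "crlf": crlf,
        "cr": data.count(b"\r") - crlf,   # bare CR only
        "lf": data.count(b"\n") - crlf,   # bare LF only
    }
    best = max(counts, key=counts.get)
    return best if counts[best] else None


def subsidiary_name(name: str, eol_type: str) -> str:
    """Build a subsidiary coding system name such as 'euc-jp-dos'."""
    return name + "-" + SUFFIX[eol_type]


def to_internal(data: bytes, eol_type: str) -> bytes:
    """Convert external line endings to the internal LF convention."""
    if eol_type == "crlf":
        return data.replace(b"\r\n", b"\n")
    if eol_type == "cr":
        return data.replace(b"\r", b"\n")
    return data                            # 'lf' needs no conversion


external = b"first\r\nsecond\r\n"
eol = detect_eol_type(external)
print(eol, subsidiary_name("euc-jp", eol), to_internal(external, eol))
```

As the @code{lf} entry notes, data that already uses LF passes through
unchanged, since that is also the internal representation.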
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1012 @node Coding System Properties, Basic Coding System Functions, EOL Conversion, Coding Systems
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1013 @subsection Coding System Properties
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1014
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1015 @table @code
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1016 @item mnemonic
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1017 String to be displayed in the modeline when this coding system is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1018 active.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1019
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1020 @item eol-type
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1021 End-of-line conversion to be used. It should be one of the types
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1022 listed in @ref{EOL Conversion}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1023
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1024 @item eol-lf
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
The coding system which is the same as this one, except that it uses the
Unix line-breaking convention.

@item eol-crlf
The coding system which is the same as this one, except that it uses the
DOS line-breaking convention.

@item eol-cr
The coding system which is the same as this one, except that it uses the
Macintosh line-breaking convention.

@item post-read-conversion
Function called after a file has been read in, to perform the decoding.
Called with two arguments, @var{start} and @var{end}, denoting a region of
the current buffer to be decoded.

@item pre-write-conversion
Function called before a file is written out, to perform the encoding.
Called with two arguments, @var{start} and @var{end}, denoting a region of
the current buffer to be encoded.
@end table
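
As a rough illustration of the calling convention for the two conversion
properties, the sketch below defines a pair of functions that each take
@var{start} and @var{end} and operate on that region of the current
buffer.  The function names and the transformations they perform are
hypothetical, chosen only to show the argument list; they are not part
of XEmacs.

@example
;; Hypothetical post-read-conversion function: called with the bounds
;; of the region just read in; here it replaces each tab with a space.
(defun my-post-read-conversion (start end)
  (save-excursion
    (save-restriction
      (narrow-to-region start end)
      (goto-char (point-min))
      (while (search-forward "\t" nil t)
        (replace-match " " t t)))))

;; Hypothetical pre-write-conversion function, with the same argument
;; list; here it strips trailing spaces before the text is written out.
(defun my-pre-write-conversion (start end)
  (save-excursion
    (save-restriction
      (narrow-to-region start end)
      (goto-char (point-min))
      (while (re-search-forward " +$" nil t)
        (replace-match "" t t)))))
@end example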

The following additional properties are recognized if @var{type} is
@code{iso2022}:

@table @code
@item charset-g0
@itemx charset-g1
@itemx charset-g2
@itemx charset-g3
The character set initially designated to the corresponding register
(G0 through G3).  The value should be one of

@itemize @bullet
@item
A charset object (designate that character set)
@item
@code{nil} (do not ever use this register)
@item
@code{t} (no character set is initially designated to the register, but
may be later on; this automatically sets the corresponding
@code{force-g*-on-output} property)
@end itemize

@item force-g0-on-output
@itemx force-g1-on-output
@itemx force-g2-on-output
@itemx force-g3-on-output
If non-@code{nil}, send an explicit designation sequence on output
before using the specified register.

@item short
If non-@code{nil}, use the short forms @samp{ESC $ @@}, @samp{ESC $ A},
and @samp{ESC $ B} on output in place of the full designation sequences
@samp{ESC $ ( @@}, @samp{ESC $ ( A}, and @samp{ESC $ ( B}.

@item no-ascii-eol
If non-@code{nil}, don't designate ASCII to G0 at each end of line on
output.  Setting this to non-@code{nil} also suppresses other
state-resetting that normally happens at the end of a line.

@item no-ascii-cntl
If non-@code{nil}, don't designate ASCII to G0 before control chars on
output.

@item seven
If non-@code{nil}, use 7-bit environment on output.  Otherwise, use 8-bit
environment.

@item lock-shift
If non-@code{nil}, use locking-shift (SO/SI) instead of single-shift or
designation by escape sequence.

@item no-iso6429
If non-@code{nil}, don't use ISO 6429's direction specification.

@item escape-quoted
If non-@code{nil}, literal control characters that are the same as the
beginning of a recognized ISO 2022 or ISO 6429 escape sequence (in
particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E), SS3 (0x8F),
and CSI (0x9B)) are ``quoted'' with an escape character so that they can
be properly distinguished from an escape sequence.  (Note that doing
this results in a non-portable encoding.)  This encoding flag is used for
byte-compiled files.  Note that ESC is a good choice for a quoting
character because there are no escape sequences whose second byte is a
character from the Control-0 or Control-1 character sets; this is
explicitly disallowed by the ISO 2022 standard.

@item input-charset-conversion
A list of conversion specifications, specifying conversion of characters
in one charset to another when decoding is performed.  Each
specification is a list of two elements: the source charset, and the
destination charset.

@item output-charset-conversion
A list of conversion specifications, specifying conversion of characters
in one charset to another when encoding is performed.  The form of each
specification is the same as for @code{input-charset-conversion}.
@end table
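
The @code{iso2022} properties are supplied together as a property list.
The following is a minimal sketch of a 7-bit ISO 2022 coding system
definition.  The name @code{my-iso-2022-7} and its doc string are made
up for this example, and the sketch assumes that the @code{ascii},
@code{latin-jisx0201}, @code{japanese-jisx0208-1978} and
@code{japanese-jisx0208} charsets are defined, as in a Mule-enabled
XEmacs.

@example
(make-coding-system
 'my-iso-2022-7 'iso2022
 "Hypothetical 7-bit ISO 2022 encoding for Japanese text."
 '(charset-g0 ascii          ; ASCII is designated to G0 initially
   short t                   ; emit ESC $ B rather than ESC $ ( B
   seven t                   ; 7-bit environment on output
   ;; when decoding, map older charsets onto their usual replacements
   input-charset-conversion ((latin-jisx0201 ascii)
                             (japanese-jisx0208-1978 japanese-jisx0208))))
@end example

Each element of @code{input-charset-conversion} is a two-element list,
source charset first and destination charset second.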

The following additional properties are recognized (and required) if
@var{type} is @code{ccl}:

@table @code
@item decode
CCL program used for decoding (converting to internal format).

@item encode
CCL program used for encoding (converting to external format).
@end table

The following properties are used internally: @code{eol-cr},
@code{eol-crlf}, @code{eol-lf}, and @code{base}.

@node Basic Coding System Functions, Coding System Property Functions, Coding System Properties, Coding Systems
@subsection Basic Coding System Functions

@defun find-coding-system coding-system-or-name
This function retrieves the coding system of the given name.

If @var{coding-system-or-name} is a coding-system object, it is simply
returned.  Otherwise, @var{coding-system-or-name} should be a symbol.
If there is no such coding system, @code{nil} is returned.  Otherwise
the associated coding system object is returned.
@end defun

@defun get-coding-system name
This function retrieves the coding system of the given name.  It is the
same as @code{find-coding-system}, except that an error is signalled if
there is no such coding system, instead of returning @code{nil}.
@end defun
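
The difference between the two lookup functions shows up with a name
that is not defined.  In this sketch, @code{no-such-coding-system} is a
made-up symbol that is assumed not to name any coding system.

@example
(find-coding-system 'no-such-coding-system)
     @result{} nil

;; get-coding-system signals an error instead, which can be caught:
(condition-case nil
    (get-coding-system 'no-such-coding-system)
  (error 'not-defined))
     @result{} not-defined
@end example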

@defun coding-system-list
This function returns a list of the names of all defined coding systems.
@end defun

@defun coding-system-name coding-system
This function returns the name of the given coding system.
@end defun

@defun coding-system-base coding-system
This function returns the base coding system of @var{coding-system},
that is, the variant with an undecided EOL convention.
@end defun

@defun make-coding-system name type &optional doc-string props
This function registers symbol @var{name} as a coding system.

@var{type} describes the conversion method used and should be one of
the types listed in @ref{Coding System Types}.

@var{doc-string} is a string describing the coding system.

@var{props} is a property list, describing the specific nature of the
coding system.  Recognized properties are as in @ref{Coding System
Properties}.
@end defun

@defun copy-coding-system old-coding-system new-name
This function copies @var{old-coding-system} to @var{new-name}.  If
@var{new-name} does not name an existing coding system, a new one will
be created.
@end defun

@defun subsidiary-coding-system coding-system eol-type
This function returns the subsidiary coding system of
@var{coding-system} with eol type @var{eol-type}.
@end defun
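
A short sketch of the two functions above follows.  It assumes that a
coding system named @code{iso-2022-jp} is defined, and that
@var{eol-type} is given as one of the symbols @code{lf}, @code{crlf} or
@code{cr}; both are assumptions rather than something stated in this
section.  The name @code{my-iso-2022-jp} is made up.

@example
;; Create a copy of an existing coding system under a new name.
(copy-coding-system 'iso-2022-jp 'my-iso-2022-jp)

;; Fetch the variant of a coding system that uses CRLF line endings,
;; then ask for its name.
(coding-system-name
 (subsidiary-coding-system 'iso-2022-jp 'crlf))
@end example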

@node Coding System Property Functions, Encoding and Decoding Text, Basic Coding System Functions, Coding Systems
@subsection Coding System Property Functions

@defun coding-system-doc-string coding-system
This function returns the doc string for @var{coding-system}.
@end defun

@defun coding-system-type coding-system
This function returns the type of @var{coding-system}.
@end defun

@defun coding-system-property coding-system prop
This function returns the @var{prop} property of @var{coding-system}.
@end defun
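
The three accessors can be combined to inspect a coding system.  This
sketch assumes that @code{iso-2022-jp} is defined and is of type
@code{iso2022}, so that asking for its @code{charset-g0} property is
meaningful.

@example
(let ((cs (get-coding-system 'iso-2022-jp)))
  (list (coding-system-type cs)          ; the conversion method
        (coding-system-doc-string cs)    ; the descriptive string
        (coding-system-property cs 'charset-g0)))
@end example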

@node Encoding and Decoding Text, Detection of Textual Encoding, Coding System Property Functions, Coding Systems
@subsection Encoding and Decoding Text

@defun decode-coding-region start end coding-system &optional buffer
This function decodes the text between @var{start} and @var{end} which
is encoded in @var{coding-system}.  This is useful if you've read in
encoded text from a file without decoding it (e.g. you read in a
JIS-formatted file but used the @code{binary} or @code{no-conversion} coding
system, so that it shows up as @samp{^[$B!<!+^[(B}).  The length of the
decoded text is returned.  @var{buffer} defaults to the current buffer
if unspecified.
@end defun

@defun encode-coding-region start end coding-system &optional buffer
This function encodes the text between @var{start} and @var{end} using
@var{coding-system}.  This will, for example, convert Japanese
characters into stuff such as @samp{^[$B!<!+^[(B} if you use the JIS
encoding.  The length of the encoded text is returned.  @var{buffer}
defaults to the current buffer if unspecified.
@end defun
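
The two functions are inverses of each other, which the following
sketch exercises in a scratch buffer.  It assumes that
@code{iso-2022-jp} is defined; the sample text is arbitrary.

@example
(with-temp-buffer
  (insert "plain text")
  ;; Convert the buffer contents to the external representation ...
  (encode-coding-region (point-min) (point-max) 'iso-2022-jp)
  ;; ... and back to the internal one.
  (decode-coding-region (point-min) (point-max) 'iso-2022-jp)
  (buffer-string))
@end example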

@node Detection of Textual Encoding, Big5 and Shift-JIS Functions, Encoding and Decoding Text, Coding Systems
@subsection Detection of Textual Encoding

@defun coding-category-list
This function returns a list of all recognized coding categories.
@end defun

@defun set-coding-priority-list list
This function changes the priority order of the coding categories.
@var{list} should be a list of coding categories, in descending order of
priority.  Unspecified coding categories will be lower in priority than
all specified ones, in the same relative order they were in previously.
@end defun

@defun coding-priority-list
This function returns a list of coding categories in descending order of
priority.
@end defun

@defun set-coding-category-system coding-category coding-system
This function changes the coding system associated with a coding category.
@end defun

@defun coding-category-system coding-category
This function returns the coding system associated with a coding category.
@end defun

@defun detect-coding-region start end &optional buffer
This function detects the coding system of the text in the region
between @var{start} and @var{end}.  The returned value is a list of
possible coding systems ordered by priority.  If only ASCII characters
are found, it returns @code{autodetect} or one of its subsidiary coding
systems according to the detected end-of-line type.  The optional
argument @var{buffer} defaults to the current buffer.
@end defun
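
Detection depends on the current category priorities, so it can be
useful to change them temporarily.  The sketch below promotes whichever
category currently has the lowest priority, runs detection on the whole
current buffer, and then restores the previous order.  The choice of
category is arbitrary and serves only to avoid hard-coding a category
name.

@example
(let ((saved (coding-priority-list)))
  (unwind-protect
      (progn
        ;; Put the last (lowest-priority) category first; the others
        ;; keep their relative order behind it.
        (set-coding-priority-list (list (car (reverse saved))))
        (detect-coding-region (point-min) (point-max)))
    ;; Restore the original priority order, even after an error.
    (set-coding-priority-list saved)))
@end example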

@node Big5 and Shift-JIS Functions, Predefined Coding Systems, Detection of Textual Encoding, Coding Systems
@subsection Big5 and Shift-JIS Functions

These are special functions for working with the non-standard
Shift-JIS and Big5 encodings.

@defun decode-shift-jis-char code
This function decodes a JIS X 0208 character encoded in the Shift-JIS
coding system.  @var{code} is the character code in Shift-JIS as a cons
of two bytes.  The corresponding character is returned.
@end defun

@defun encode-shift-jis-char character
This function encodes the JIS X 0208 character @var{character} in the
Shift-JIS coding system.  The corresponding character code in Shift-JIS
is returned as a cons of two bytes.
@end defun

@defun decode-big5-char code
This function decodes a character of the Big5 coding system.
@var{code} is the character code in Big5.  The corresponding character
is returned.
@end defun

@defun encode-big5-char character
This function encodes the Big5 character @var{character} in the Big5
coding system.  The corresponding character code in Big5 is returned.
@end defun
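
As a small sketch of how the Shift-JIS pair fits together, the
following builds a JIS X 0208 character, encodes it, and decodes the
result again.  It assumes that the @code{japanese-jisx0208} charset is
defined and that @code{make-char} accepts the two position codes shown
(36 and 34, that is, 0x24 0x22).

@example
(let* ((ch (make-char 'japanese-jisx0208 36 34))
       ;; sjis is a cons of two bytes, as described above
       (sjis (encode-shift-jis-char ch)))
  ;; Decoding the Shift-JIS code should give back the same character.
  (eq ch (decode-shift-jis-char sjis)))
@end example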

@node Predefined Coding Systems, , Big5 and Shift-JIS Functions, Coding Systems
@subsection Coding Systems Implemented

MULE initializes most of the commonly used coding systems at XEmacs's
startup.  A few others are initialized only when the relevant language
environment is selected and support libraries are loaded.  (NB: The
following list is based on XEmacs 21.2.19, the development branch at the
time of writing.  The list may be somewhat different for other
versions.  Recent versions of GNU Emacs 20 implement a few more rare
coding systems; work is being done to port these to XEmacs.)

Unfortunately, there is not a consistent naming convention for character
sets, and for practical purposes coding systems often take their name
from their principal character sets (ASCII, KOI8-R, Shift JIS).  Others
take their names from the coding system (ISO-2022-JP, EUC-KR), and a few
from their non-text usages (internal, binary).  To provide for this, and
for the fact that many coding systems have several common names, an
aliasing system is provided.  Finally, some effort has been made to use
names that are registered as MIME charsets (this is why the name
@code{shift_jis} contains that un-Lisp-y underscore).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1315
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1316 There is a systematic naming convention regarding end-of-line (EOL)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1317 conventions for different systems. A coding system whose name ends in
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1318 "-unix" forces the assumptions that lines are broken by newlines (0x0A).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1319 A coding system whose name ends in "-mac" forces the assumption that
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1320 lines are broken by ASCII CRs (0x0D). A coding system whose name ends
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1321 in "-dos" forces the assumption that lines are broken by CRLF sequences
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1322 (0x0D 0x0A). These subsidiary coding systems are automatically derived
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1323 from a base coding system. Use of the base coding system implies
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1324 autodetection of the text file convention. (The fact that the -unix,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1325 -mac, and -dos variants are derived from a base system results in them showing up
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1326 as "aliases" in `list-coding-systems'.) These subsidiaries have a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1327 consistent modeline indicator as well. "-dos" coding systems have ":T"
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1328 appended to their modeline indicator, while "-mac" coding systems have
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1329 ":t" appended (e.g., "ISO8:t" for iso-2022-8-mac).
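The suffix convention described above is mechanical enough to model in a few lines. The Python sketch below is an illustration of the naming rule, not of XEmacs internals: the names `EOL_BYTES`, `MODELINE_SUFFIX`, `split_eol_suffix`, `modeline_indicator`, and `to_internal_newlines` are all hypothetical. It also assumes that "-unix" subsidiaries append nothing to the modeline indicator, since the text only specifies the "-dos" and "-mac" markers.

```python
# Illustrative only: models the naming convention, not XEmacs internals.
EOL_BYTES = {"-unix": b"\n", "-mac": b"\r", "-dos": b"\r\n"}

# Assumption: "-unix" appends nothing; the text only specifies -dos and -mac.
MODELINE_SUFFIX = {"-unix": "", "-mac": ":t", "-dos": ":T"}


def split_eol_suffix(name):
    """Return (base_name, suffix); suffix is None for a base coding system."""
    for suffix in EOL_BYTES:
        if name.endswith(suffix):
            return name[:-len(suffix)], suffix
    return name, None


def modeline_indicator(base_indicator, name):
    """Append the subsidiary marker, if any, to the base indicator."""
    _, suffix = split_eol_suffix(name)
    return base_indicator + MODELINE_SUFFIX.get(suffix, "")


def to_internal_newlines(data, name):
    """Rewrite the line breaks implied by NAME as newlines (0x0A)."""
    _, suffix = split_eol_suffix(name)
    if suffix is None:
        # A base coding system implies autodetection, which is not modeled.
        raise ValueError("base coding system: EOL convention must be detected")
    return data.replace(EOL_BYTES[suffix], b"\n")


if __name__ == "__main__":
    print(split_eol_suffix("iso-2022-8-mac"))
    print(modeline_indicator("ISO8", "iso-2022-8-mac"))
    print(to_internal_newlines(b"one\r\ntwo\r\n", "big5-dos"))
```

Passing a base name such as `euc-jp` to `to_internal_newlines` raises an error on purpose: under the convention, a base coding system means the line-break style has to be detected from the data, and this sketch leaves that step out.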
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1330
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1331 In the following table, each coding system is given with its mode line
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1332 indicator in parentheses. Non-textual coding systems are listed first,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1333 followed by textual coding systems and their aliases. (The coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1334 subsidiary modeline indicators ":T" and ":t" will be omitted from the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1335 table of coding systems.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1336
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1337 ### SJT 1999-08-23 Maybe should order these by language? Definitely
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1338 need language usage for the ISO-8859 family.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1339
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1340 Note that although true coding system aliases have been implemented for
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1341 XEmacs 21.2, the coding system initialization has not yet been converted
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1342 as of 21.2.19. So coding systems described as aliases have the same
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1343 properties as the aliased coding system, but will not be equal as Lisp
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1344 objects.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1345
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1346 @table @code
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1347
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1348 @item automatic-conversion
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1349 @itemx undecided
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1350 @itemx undecided-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1351 @itemx undecided-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1352 @itemx undecided-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1353
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1354 Modeline indicator: @code{Auto}. A type @code{undecided} coding system.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1355 Attempts to determine an appropriate coding system from file contents or
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1356 the environment.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1357
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1358 @item raw-text
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1359 @itemx no-conversion
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1360 @itemx raw-text-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1361 @itemx raw-text-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1362 @itemx raw-text-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1363 @itemx no-conversion-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1364 @itemx no-conversion-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1365 @itemx no-conversion-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1366
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1367 Modeline indicator: @code{Raw}. A type @code{no-conversion} coding system,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1368 which converts only line-break codes. An implementation quirk means
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1369 that this coding system is also used for ISO8859-1.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1370
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1371 @item binary
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1372 Modeline indicator: @code{Binary}. A type @code{no-conversion} coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1373 system which does no character coding or EOL conversions. An alias for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1374 @code{raw-text-unix}.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1375
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1376 @item alternativnyj
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1377 @itemx alternativnyj-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1378 @itemx alternativnyj-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1379 @itemx alternativnyj-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1380
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1381 Modeline indicator: @code{Cy.Alt}. A type @code{ccl} coding system used for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1382 Alternativnyj, an encoding of the Cyrillic alphabet.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1383
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1384 @item big5
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1385 @itemx big5-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1386 @itemx big5-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1387 @itemx big5-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1388
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1389 Modeline indicator: @code{Zh/Big5}. A type @code{big5} coding system used for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1390 BIG5, the most common encoding of traditional Chinese as used in Taiwan.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1391
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1392 @item cn-gb-2312
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1393 @itemx cn-gb-2312-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1394 @itemx cn-gb-2312-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1395 @itemx cn-gb-2312-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1396
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1397 Modeline indicator: @code{Zh-GB/EUC}. A type @code{iso2022} coding system used
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1398 for simplified Chinese (as used in the People's Republic of China), with
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1399 the @code{ascii} (G0), @code{chinese-gb2312} (G1), and @code{sisheng}
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1400 (G2) character sets initially designated. Chinese EUC (Extended Unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1401 Code).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1402
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1403 @item ctext-hebrew
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1404 @itemx ctext-hebrew-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1405 @itemx ctext-hebrew-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1406 @itemx ctext-hebrew-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1407
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1408 Modeline indicator: @code{CText/Hbrw}. A type @code{iso2022} coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1409 with the @code{ascii} (G0) and @code{hebrew-iso8859-8} (G1) character
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1410 sets initially designated for Hebrew.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1411
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1412 @item ctext
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1413 @itemx ctext-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1414 @itemx ctext-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1415 @itemx ctext-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1416
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1417 Modeline indicator: @code{CText}. A type @code{iso2022} 8-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1418 with the @code{ascii} (G0) and @code{latin-iso8859-1} (G1) character
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1419 sets initially designated. X11 Compound Text Encoding. Often
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1420 mistakenly recognized instead of EUC encodings; the usual cause is an
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1421 inappropriate setting of @code{coding-priority-list}.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1422
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1423 @item escape-quoted
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1424
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1425 Modeline indicator: @code{ESC/Quot}. A type @code{iso2022} 8-bit coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1426 system with the @code{ascii} (G0) and @code{latin-iso8859-1} (G1)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1427 character sets initially designated and escape quoting. Unix EOL
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1428 conversion (i.e., no conversion). It is used for @file{.elc} files.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1429
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1430 @item euc-jp
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1431 @itemx euc-jp-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1432 @itemx euc-jp-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1433 @itemx euc-jp-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1434
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1435 Modeline indicator: @code{Ja/EUC}. A type @code{iso2022} 8-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1436 with @code{ascii} (G0), @code{japanese-jisx0208} (G1),
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1437 @code{katakana-jisx0201} (G2), and @code{japanese-jisx0212} (G3)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1438 initially designated. Japanese EUC (Extended Unix Code).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1439
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1440 @item euc-kr
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1441 @itemx euc-kr-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1442 @itemx euc-kr-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1443 @itemx euc-kr-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1444
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1445 Modeline indicator: @code{ko/EUC}. A type @code{iso2022} 8-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1446 with @code{ascii} (G0) and @code{korean-ksc5601} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1447 designated. Korean EUC (Extended Unix Code).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1448
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1449 @item hz-gb-2312
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1450 Modeline indicator: @code{Zh-GB/Hz}. A type @code{no-conversion} coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1451 system with Unix EOL convention (i.e., no conversion) using
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1452 post-read-decode and pre-write-encode functions to translate the Hz/ZW
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1453 coding system used for Chinese.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1454
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1455 @item iso-2022-7bit
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1456 @itemx iso-2022-7bit-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1457 @itemx iso-2022-7bit-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1458 @itemx iso-2022-7bit-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1459 @itemx iso-2022-7
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1460
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1461 Modeline indicator: @code{ISO7}. A type @code{iso2022} 7-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1462 with @code{ascii} (G0) initially designated. Other character sets must
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1463 be explicitly designated to be used.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1464
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1465 @item iso-2022-7bit-ss2
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1466 @itemx iso-2022-7bit-ss2-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1467 @itemx iso-2022-7bit-ss2-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1468 @itemx iso-2022-7bit-ss2-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1469
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1470 Modeline indicator: @code{ISO7/SS}. A type @code{iso2022} 7-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1471 with @code{ascii} (G0) initially designated. Other character sets must
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1472 be explicitly designated to be used. SS2 is used to invoke a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1473 96-charset, one character at a time.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1474
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1475 @item iso-2022-8
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1476 @itemx iso-2022-8-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1477 @itemx iso-2022-8-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1478 @itemx iso-2022-8-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1479
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1480 Modeline indicator: @code{ISO8}. A type @code{iso2022} 8-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1481 with @code{ascii} (G0) and @code{latin-iso8859-1} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1482 designated. Other character sets must be explicitly designated to be
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1483 used. No single-shift or locking-shift.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1484
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1485 @item iso-2022-8bit-ss2
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1486 @itemx iso-2022-8bit-ss2-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1487 @itemx iso-2022-8bit-ss2-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1488 @itemx iso-2022-8bit-ss2-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1489
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1490 Modeline indicator: @code{ISO8/SS}. A type @code{iso2022} 8-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1491 with @code{ascii} (G0) and @code{latin-iso8859-1} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1492 designated. Other character sets must be explicitly designated to be
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1493 used. SS2 is used to invoke a 96-charset, one character at a time.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1494
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1495 @item iso-2022-int-1
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1496 @itemx iso-2022-int-1-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1497 @itemx iso-2022-int-1-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1498 @itemx iso-2022-int-1-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1499
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1500 Modeline indicator: @code{INT-1}. A type @code{iso2022} 7-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1501 with @code{ascii} (G0) and @code{korean-ksc5601} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1502 designated. ISO-2022-INT-1.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1503
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1504 @item iso-2022-jp-1978-irv
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1505 @itemx iso-2022-jp-1978-irv-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1506 @itemx iso-2022-jp-1978-irv-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1507 @itemx iso-2022-jp-1978-irv-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1508
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1509 Modeline indicator: @code{Ja-78/7bit}. A type @code{iso2022} 7-bit coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1510 system. For compatibility with old Japanese terminals; if you need to
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1511 know, look at the source.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1512
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1513 @item iso-2022-jp
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1514 @itemx iso-2022-jp-2 (ISO7/SS)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1515 @itemx iso-2022-jp-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1516 @itemx iso-2022-jp-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1517 @itemx iso-2022-jp-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1518 @itemx iso-2022-jp-2-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1519 @itemx iso-2022-jp-2-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1520 @itemx iso-2022-jp-2-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1521
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1522 Modeline indicator: @code{MULE/7bit}. A type @code{iso2022} 7-bit coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1523 system with @code{ascii} (G0) initially designated, and complex
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1524 specifications to ensure backward compatibility with old Japanese
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1525 systems. Used for communication with mail and news in Japan. The "-2"
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1526 versions also use SS2 to invoke a 96-charset one character at a time.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1527
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1528 @item iso-2022-kr
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1529 Modeline indicator: @code{Ko/7bit}. A type @code{iso2022} 7-bit coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1530 system with @code{ascii} (G0) and @code{korean-ksc5601} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1531 designated. Used for e-mail in Korea.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1532
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1533 @item iso-2022-lock
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1534 @itemx iso-2022-lock-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1535 @itemx iso-2022-lock-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1536 @itemx iso-2022-lock-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1537
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1538 Modeline indicator: @code{ISO7/Lock}. A type @code{iso2022} 7-bit coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1539 system with @code{ascii} (G0) initially designated, using Locking-Shift
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1540 to invoke a 96-charset.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1541
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1542 @item iso-8859-1
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1543 @itemx iso-8859-1-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1544 @itemx iso-8859-1-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1545 @itemx iso-8859-1-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1546
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1547 Due to an implementation quirk, this is not a type @code{iso2022} coding system,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1548 but rather an alias for the @code{raw-text} coding system.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1549
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1550 @item iso-8859-2
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1551 @itemx iso-8859-2-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1552 @itemx iso-8859-2-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1553 @itemx iso-8859-2-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1554
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1555 Modeline indicator: @code{MIME/Ltn-2}. A type @code{iso2022} coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1556 system with @code{ascii} (G0) and @code{latin-iso8859-2} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1557 invoked.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1558
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1559 @item iso-8859-3
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1560 @itemx iso-8859-3-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1561 @itemx iso-8859-3-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1562 @itemx iso-8859-3-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1563
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1564 Modeline indicator: @code{MIME/Ltn-3}. A type @code{iso2022} coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1565 with @code{ascii} (G0) and @code{latin-iso8859-3} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1566 invoked.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1567
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1568 @item iso-8859-4
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1569 @itemx iso-8859-4-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1570 @itemx iso-8859-4-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1571 @itemx iso-8859-4-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1572
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1573 Modeline indicator: @code{MIME/Ltn-4}. A type @code{iso2022} coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1574 with @code{ascii} (G0) and @code{latin-iso8859-4} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1575 invoked.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1576
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1577 @item iso-8859-5
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1578 @itemx iso-8859-5-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1579 @itemx iso-8859-5-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1580 @itemx iso-8859-5-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1581
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1582 Modeline indicator: @code{ISO8/Cyr}. A type @code{iso2022} coding system with
@code{ascii} (G0) and @code{cyrillic-iso8859-5} (G1) initially invoked.

@item iso-8859-7
@itemx iso-8859-7-dos
@itemx iso-8859-7-mac
@itemx iso-8859-7-unix

Modeline indicator: @code{Grk}. A type @code{iso2022} coding system with
@code{ascii} (G0) and @code{greek-iso8859-7} (G1) initially invoked.

@item iso-8859-8
@itemx iso-8859-8-dos
@itemx iso-8859-8-mac
@itemx iso-8859-8-unix

Modeline indicator: @code{MIME/Hbrw}. A type @code{iso2022} coding system with
@code{ascii} (G0) and @code{hebrew-iso8859-8} (G1) initially invoked.

@item iso-8859-9
@itemx iso-8859-9-dos
@itemx iso-8859-9-mac
@itemx iso-8859-9-unix

Modeline indicator: @code{MIME/Ltn-5}. A type @code{iso2022} coding system
with @code{ascii} (G0) and @code{latin-iso8859-9} (G1) initially
invoked.

@item koi8-r
@itemx koi8-r-dos
@itemx koi8-r-mac
@itemx koi8-r-unix

Modeline indicator: @code{KOI8}. A type @code{ccl} coding-system used for
KOI8-R, an encoding of the Cyrillic alphabet.

@item shift_jis
@itemx shift_jis-dos
@itemx shift_jis-mac
@itemx shift_jis-unix

Modeline indicator: @code{Ja/SJIS}. A type @code{shift-jis} coding system
implementing the Shift-JIS encoding for Japanese. The underscore in the
name matches the name of the MIME charset for this encoding.

@item tis-620
@itemx tis-620-dos
@itemx tis-620-mac
@itemx tis-620-unix

Modeline indicator: @code{TIS620}. A type @code{ccl} coding system for Thai.
The external encoding is defined by TIS620; the internal encoding, called
@code{thai-xtis}, is peculiar to MULE.

@item viqr

Modeline indicator: @code{VIQR}. A type @code{no-conversion} coding
system with Unix EOL convention (i.e., no conversion) using
post-read-decode and pre-write-encode functions to translate the VIQR
coding system for Vietnamese.

@item viscii
@itemx viscii-dos
@itemx viscii-mac
@itemx viscii-unix

Modeline indicator: @code{VISCII}. A type @code{ccl} coding-system used
for VISCII 1.1 for Vietnamese. Differs slightly from VSCII; VISCII is
given priority by XEmacs.

@item vscii
@itemx vscii-dos
@itemx vscii-mac
@itemx vscii-unix

Modeline indicator: @code{VSCII}. A type @code{ccl} coding-system used
for VSCII 1.1 for Vietnamese. Differs slightly from VISCII, which is
given priority by XEmacs. Use
@code{(prefer-coding-system 'vietnamese-vscii)} to give priority to VSCII.

@end table

@node CCL, Category Tables, Coding Systems, MULE
@section CCL

CCL (Code Conversion Language) is a simple structured programming
language designed for character coding conversions. A CCL program is
compiled to CCL code (represented by a vector of integers) and executed
by the CCL interpreter embedded in Emacs. The CCL interpreter
implements a virtual machine with 8 registers called @code{r0}, ...,
@code{r7}, a number of control structures, and some I/O operators. Take
care when using registers @code{r0} (used in implicit @dfn{set}
statements) and especially @code{r7} (used internally by several
statements and operations, notably for multiple return values and I/O
operations).

CCL is used for code conversion during process I/O and file I/O for
non-ISO2022 coding systems. (It is the only way for a user to specify a
code conversion function.) It is also used for calculating the code
point of an X11 font from a character code. However, since CCL is
designed as a powerful programming language, it can be used for more
general calculation where efficiency is demanded. A combination of
three or more arithmetic operations can be calculated faster by CCL than
by Emacs Lisp.

@strong{Warning:} The code in @file{src/mule-ccl.c} and
@file{$packages/lisp/mule-base/mule-ccl.el} is the definitive
description of CCL's semantics. The previous version of this section
contained several typos and obsolete names left from earlier versions of
MULE, and many may remain. (I am not an experienced CCL programmer; the
few who know CCL well find writing English painful.)

A CCL program transforms an input data stream into an output data
stream. The input stream, held in a buffer of constant bytes, is left
unchanged. The buffer may be filled by an external input operation,
taken from an Emacs buffer, or taken from a Lisp string. The output
buffer is a dynamic array of bytes, which can be written by an external
output operation, inserted into an Emacs buffer, or returned as a Lisp
string.

A CCL program is a (Lisp) list containing two or three members. The
first member is the @dfn{buffer magnification}, which indicates the
required minimum size of the output buffer as a multiple of the input
buffer. It is followed by the @dfn{main block}, which executes while
there is input remaining, and an optional @dfn{EOF block}, which is
executed when the input is exhausted. Both the main block and the EOF
block are CCL blocks.

A @dfn{CCL block} is either a CCL statement or a list of CCL statements.
A @dfn{CCL statement} is either a @dfn{set statement} (an integer
or an @dfn{assignment}, which is a list of a register to receive the
assignment, an assignment operator, and an expression) or a @dfn{control
statement} (a list starting with a keyword, whose allowable syntax
depends on the keyword). A string by itself is also a statement,
equivalent to a @code{write} of that string.
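
As an illustration (a sketch written for this section, not taken from
the XEmacs sources), a complete CCL program that copies its input to its
output might look like the following. The magnification of 2 and the
line feed written by the EOF block are arbitrary choices.

@example
(2                        ; buffer magnification
 ((loop                   ; main block: a list holding one statement
   (read r0)              ; control statement: read a byte into r0
   (write r0)             ; control statement: write it back out
   (repeat)))             ; control statement: restart the loop
 (write 10))              ; EOF block: a single statement
@end example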

@menu
* CCL Syntax::          CCL program syntax in BNF notation.
* CCL Statements::      Semantics of CCL statements.
* CCL Expressions::     Operators and expressions in CCL.
* Calling CCL::         Running CCL programs.
* CCL Examples::        The encoding functions for Big5 and KOI-8.
@end menu

@node CCL Syntax, CCL Statements, , CCL
@comment Node, Next, Previous, Up
@subsection CCL Syntax

The full syntax of a CCL program in BNF notation:

@format
CCL_PROGRAM :=
        (BUFFER_MAGNIFICATION
         CCL_MAIN_BLOCK
         [ CCL_EOF_BLOCK ])

BUFFER_MAGNIFICATION := integer
CCL_MAIN_BLOCK := CCL_BLOCK
CCL_EOF_BLOCK := CCL_BLOCK

CCL_BLOCK :=
        STATEMENT | (STATEMENT [STATEMENT ...])
STATEMENT :=
        SET | IF | BRANCH | LOOP | REPEAT | BREAK | READ | WRITE
        | CALL | END

SET :=
        (REG = EXPRESSION)
        | (REG ASSIGNMENT_OPERATOR EXPRESSION)
        | integer

EXPRESSION := ARG | (EXPRESSION OPERATOR ARG)

IF := (if EXPRESSION CCL_BLOCK [CCL_BLOCK])
BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
LOOP := (loop STATEMENT [STATEMENT ...])
BREAK := (break)
REPEAT :=
        (repeat)
        | (write-repeat [REG | integer | string])
        | (write-read-repeat REG [integer | ARRAY])
READ :=
        (read REG ...)
        | (read-if (REG OPERATOR ARG) CCL_BLOCK CCL_BLOCK)
        | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
WRITE :=
        (write REG ...)
        | (write EXPRESSION)
        | (write integer) | (write string) | (write REG ARRAY)
        | string
CALL := (call ccl-program-name)
END := (end)

REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
ARG := REG | integer
OPERATOR :=
        + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
        | < | > | == | <= | >= | != | de-sjis | en-sjis
ASSIGNMENT_OPERATOR :=
        += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
ARRAY := '[' integer ... ']'
@end format
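
To relate the grammar to concrete Lisp data, here is an illustrative
fragment (not from the XEmacs sources) annotated with the nonterminals
it instantiates. Note that EXPRESSION nests only on the left, so
@code{((r0 & 127) << 1)} fits the grammar as written, while
@code{(r0 & (r1 << 1))} does not.

@example
(loop                            ; LOOP
 (read r0)                       ; READ
 (if (r0 < 128)                  ; IF; (r0 < 128) is an EXPRESSION
     (write r0)                  ; CCL_BLOCK given as one STATEMENT
   ((r1 = ((r0 & 127) << 1))     ; CCL_BLOCK given as a list; SET
    (write r1)))                 ; WRITE
 (repeat))                       ; REPEAT
@end example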

@node CCL Statements, CCL Expressions, CCL Syntax, CCL
@comment Node, Next, Previous, Up
@subsection CCL Statements

The Emacs Code Conversion Language provides the following statement
types: @dfn{set}, @dfn{if}, @dfn{branch}, @dfn{loop}, @dfn{repeat},
@dfn{break}, @dfn{read}, @dfn{write}, @dfn{call}, and @dfn{end}.

@heading Set statement:

The @dfn{set} statement has three variants with the syntaxes
@samp{(@var{reg} = @var{expression})},
@samp{(@var{reg} @var{assignment_operator} @var{expression})}, and
@samp{@var{integer}}. The assignment operator variant of the
@dfn{set} statement works the same way as the corresponding C expression
statement. The assignment operators are @code{+=}, @code{-=},
@code{*=}, @code{/=}, @code{%=}, @code{&=}, @code{|=}, @code{^=},
@code{<<=}, and @code{>>=}, with the same meanings as in C. A
``naked integer'' @var{integer} is equivalent to a @dfn{set} statement of
the form @code{(r0 = @var{integer})}.
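
For instance, the following block (an illustrative sketch with
arbitrary values) uses each variant of the @dfn{set} statement:

@example
((r1 = 65)           ; plain assignment: r1 is now 65
 (r1 += 1)           ; as in C: r1 is now 66
 (r2 = (r1 << 8))    ; an expression on the right-hand side
 7)                  ; naked integer: same as (r0 = 7)
@end example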

@heading I/O statements:

The @dfn{read} statement takes one or more registers as arguments. It
reads one byte (a C char) from the input into each register in turn.

The @dfn{write} statement takes several forms. In the form
@samp{(write @var{reg} ...)} it takes one or more registers as arguments
and writes each in turn to the output. The integer in a register
(interpreted as an Emchar) is encoded to multibyte form (i.e., Bufbytes)
and written to the current output buffer. If it is less than 256, it is
written as is. The forms @samp{(write @var{expression})} and
@samp{(write @var{integer})} are treated analogously. The form
@samp{(write @var{string})} writes the constant string to the output. A
``naked string'' @samp{@var{string}} is equivalent to the statement
@samp{(write @var{string})}. The form @samp{(write @var{reg} @var{array})}
writes the @var{reg}th element of the @var{array} to the output.
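
The forms of @code{write} can be sketched as follows. The values are
arbitrary, and this sketch assumes that arrays are zero-indexed, as the
blocks of a @code{branch} statement are.

@example
((read r0 r1)             ; read two bytes, into r0 and then r1
 (write r0 r1)            ; write both registers, in that order
 (write (r0 + 1))         ; write the value of an expression
 (write 10)               ; write an integer
 (write "CCL")            ; write a constant string
 "CCL"                    ; naked string: same as (write "CCL")
 (write r1 [48 49 50]))   ; write the array element selected by r1
@end example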

@heading Conditional statements:

The @dfn{if} statement takes an @var{expression}, a @var{CCL block}, and
an optional @var{second CCL block} as arguments. If the
@var{expression} evaluates to non-zero, the first @var{CCL block} is
executed. Otherwise, if there is a @var{second CCL block}, it is
executed.

The @dfn{read-if} variant of the @dfn{if} statement takes an
@var{expression}, a @var{CCL block}, and an optional @var{second CCL
block} as arguments. The @var{expression} must have the form
@code{(@var{reg} @var{operator} @var{operand})} (where @var{operand} is
a register or an integer). The @code{read-if} statement first reads
from the input into @var{reg}, then evaluates the @var{expression} and
conditionally executes a CCL block just as the @code{if} statement
does.

The @dfn{branch} statement takes an @var{expression} and one or more CCL
blocks as arguments. The CCL blocks are treated as a zero-indexed
array, and the @code{branch} statement uses the @var{expression} as the
index of the CCL block to execute. Null CCL blocks may be used as
no-ops, continuing execution with the statement following the
@code{branch} statement in the containing CCL block. Out-of-range
values for the @var{expression} are also treated as no-ops.

The @dfn{read-branch} variant of the @dfn{branch} statement takes a
@var{register} and one or more CCL blocks as arguments. The
@code{read-branch} statement first reads from the input into the
@var{register}, then uses the value read as the index of the CCL block
to execute, just as the @code{branch} statement does.
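
The following sketches (illustrative only) show each conditional. The
first two fragments are intended to be equivalent, as are the last two.
Writing a null block as the empty list @code{()} is an assumption of
this sketch.

@example
;; Replace a carriage return (13) with a line feed (10).
((read r0)
 (if (r0 == 13)
     (write 10)
   (write r0)))

(read-if (r0 == 13)
    (write 10)
  (write r0))

;; Dispatch on the value of the byte read.
((read r0)
 (branch r0
   (write "zero")         ; index 0
   ()                     ; index 1: null block, a no-op
   (write "two")))        ; index 2; larger values are no-ops

(read-branch r0
  (write "zero")
  ()
  (write "two"))
@end example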

@heading Loop control statements:

The @dfn{loop} statement creates a block with an implied jump from the
end of the block back to its head. The loop is exited on a @code{break}
statement, and continued without executing the tail by a @code{repeat}
statement.

The @dfn{break} statement, written @samp{(break)}, terminates the
current loop and continues with the statement following the loop.

The @dfn{repeat} statement has three variants, @code{repeat},
@code{write-repeat}, and @code{write-read-repeat}. Each continues the
current loop from its head, possibly after performing I/O.
@code{repeat} takes no arguments and does no I/O before jumping.
@code{write-repeat} takes a single argument (a register, an
integer, or a string), writes it to the output, then jumps.
@code{write-read-repeat} takes one or two arguments. The first must
be a register. The second may be an integer or an array; if absent, it
is implicitly set to the first (register) argument.
@code{write-read-repeat} writes its second argument to the output, then
reads from the input into the register, and finally jumps. See the
@code{write} and @code{read} statements for the semantics of the I/O
operations for each type of argument.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1877
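As a sketch of how @code{break} and @code{write-repeat} interact, the
following loop copies input bytes to the output until it reads a zero
byte.  It assumes that the @code{loop}, @code{read}, and @code{if}
statements behave as described earlier in this section; the use of
@code{r0} and of zero as a terminator is purely illustrative.

@example
(loop
  (read r0)            ; fetch the next input byte into r0
  (if (r0 == 0)
      (break))         ; leave the loop on a zero byte
  (write-repeat r0))   ; otherwise write r0, then jump to the loop head
@end example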
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1878 @heading Other control statements:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1879
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1880 The @dfn{call} statement, written @samp{(call @var{ccl-program-name})},
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1881 executes a CCL program as a subroutine. It does not return a value to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1882 the caller, but can modify the register status.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1883
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1884 The @dfn{end} statement, written @samp{(end)}, terminates the CCL
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1885 program successfully, and returns to caller (which may be a CCL
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1886 program). It does not alter the status of the registers.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1887
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1888 @node CCL Expressions, Calling CCL, CCL Statements, CCL
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1889 @comment Node, Next, Previous, Up
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1890 @subsection CCL Expressions
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1891
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1892 CCL, unlike Lisp, uses infix expressions. The simplest CCL expressions
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1893 consist of a single @var{operand}, either a register (one of @code{r0},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1894 ..., @code{r7}) or an integer. Complex expressions are lists of the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1895 form @code{( @var{expression} @var{operator} @var{operand} )}. Unlike
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1896 C, assignments are not expressions.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1897
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1898 In the following table, @var{X} is the target register for a @dfn{set}.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1899 In subexpressions, this is implicitly @code{r7}. This means that
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1900 @code{>8}, @code{//}, @code{de-sjis}, and @code{en-sjis} cannot be used
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1901 freely in subexpressions, since they return parts of their values in
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1902 @code{r7}. @var{Y} may be an expression, register, or integer, while
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1903 @var{Z} must be a register or an integer.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1904
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1905 @multitable @columnfractions .22 .14 .09 .55
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1906 @item Name @tab Operator @tab Code @tab C-like Description
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1907 @item CCL_PLUS @tab @code{+} @tab 0x00 @tab X = Y + Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1908 @item CCL_MINUS @tab @code{-} @tab 0x01 @tab X = Y - Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1909 @item CCL_MUL @tab @code{*} @tab 0x02 @tab X = Y * Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1910 @item CCL_DIV @tab @code{/} @tab 0x03 @tab X = Y / Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1911 @item CCL_MOD @tab @code{%} @tab 0x04 @tab X = Y % Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1912 @item CCL_AND @tab @code{&} @tab 0x05 @tab X = Y & Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1913 @item CCL_OR @tab @code{|} @tab 0x06 @tab X = Y | Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1914 @item CCL_XOR @tab @code{^} @tab 0x07 @tab X = Y ^ Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1915 @item CCL_LSH @tab @code{<<} @tab 0x08 @tab X = Y << Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1916 @item CCL_RSH @tab @code{>>} @tab 0x09 @tab X = Y >> Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1917 @item CCL_LSH8 @tab @code{<8} @tab 0x0A @tab X = (Y << 8) | Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1918 @item CCL_RSH8 @tab @code{>8} @tab 0x0B @tab X = Y >> 8, r[7] = Y & 0xFF
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1919 @item CCL_DIVMOD @tab @code{//} @tab 0x0C @tab X = Y / Z, r[7] = Y % Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1920 @item CCL_LS @tab @code{<} @tab 0x10 @tab X = (Y < Z)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1921 @item CCL_GT @tab @code{>} @tab 0x11 @tab X = (Y > Z)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1922 @item CCL_EQ @tab @code{==} @tab 0x12 @tab X = (Y == Z)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1923 @item CCL_LE @tab @code{<=} @tab 0x13 @tab X = (Y <= Z)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1924 @item CCL_GE @tab @code{>=} @tab 0x14 @tab X = (Y >= Z)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1925 @item CCL_NE @tab @code{!=} @tab 0x15 @tab X = (Y != Z)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1926 @item CCL_ENCODE_SJIS @tab @code{en-sjis} @tab 0x16 @tab X = HIGHER_BYTE (SJIS (Y, Z))
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1927 @item @tab @tab @tab r[7] = LOWER_BYTE (SJIS (Y, Z))
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1928 @item CCL_DECODE_SJIS @tab @code{de-sjis} @tab 0x17 @tab X = HIGHER_BYTE (DE-SJIS (Y, Z))
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1929 @item @tab @tab @tab r[7] = LOWER_BYTE (DE-SJIS (Y, Z))
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1930 @end multitable
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1931
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1932 The CCL operators are as in C, with the addition of CCL_LSH8, CCL_RSH8,
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1933 CCL_DIVMOD, CCL_ENCODE_SJIS, and CCL_DECODE_SJIS. The CCL_ENCODE_SJIS
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1934 and CCL_DECODE_SJIS operators treat their first and second operands as the high and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1935 low bytes of a two-byte character code. (SJIS stands for Shift JIS, an
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1936 encoding of Japanese characters used by Microsoft. CCL_ENCODE_SJIS is a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1937 complicated transformation of the Japanese standard JIS encoding to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1938 Shift JIS. CCL_DECODE_SJIS is its inverse.) It is somewhat odd to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1939 represent the SJIS operations in infix form.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1940
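The operators that have no direct C counterpart can be modeled in
ordinary Lisp.  The functions below are hypothetical helpers written
only for illustration (they are not part of XEmacs).  Following the
table above, the two operators that also set @code{r7} return a cons of
the value assigned to @var{X} and the value left in @code{r7}.

@example
;; Model of CCL_LSH8 (<8): X = (Y << 8) | Z
(defun my-ccl-lsh8 (y z)
  (logior (ash y 8) z))

;; Model of CCL_RSH8 (>8): X = Y >> 8, r7 = Y & 0xFF
(defun my-ccl-rsh8 (y)
  (cons (ash y -8) (logand y #xff)))

;; Model of CCL_DIVMOD (//): X = Y / Z, r7 = Y % Z
(defun my-ccl-divmod (y z)
  (cons (/ y z) (% y z)))

(my-ccl-lsh8 #x12 #x34)
     @result{} 4660            ; that is, #x1234
(my-ccl-rsh8 #x1234)
     @result{} (18 . 52)       ; that is, (#x12 . #x34)
(my-ccl-divmod 17 5)
     @result{} (3 . 2)
@end example

The cons return value makes the side effect on @code{r7} explicit, which
is why these operators are awkward inside subexpressions.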
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1941 @node Calling CCL, CCL Examples, CCL Expressions, CCL
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1942 @comment Node, Next, Previous, Up
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1943 @subsection Calling CCL
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1944
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1945 CCL programs are called automatically during Emacs buffer I/O when the
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1946 external representation has a coding system type of @code{shift-jis},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1947 @code{big5}, or @code{ccl}. The program is specified by the coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1948 system (@pxref{Coding Systems}). You can also call CCL programs from
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1949 other CCL programs, and from Lisp using these functions:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1950
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1951 @defun ccl-execute ccl-program status
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1952 Execute @var{ccl-program} with registers initialized by
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1953 @var{status}. @var{ccl-program} is a vector of compiled CCL code
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1954 created by @code{ccl-compile}. It is an error for the program to try to
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1955 execute a CCL I/O command. @var{status} must be a vector of nine
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1956 values, specifying the initial value for the R0, R1 .. R7 registers and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1957 for the instruction counter IC. A @code{nil} value for a register
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1958 initializer causes the register to be set to 0. A @code{nil} value for
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1959 the IC initializer causes execution to start at the beginning of the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1960 program. When the program is done, @var{status} is modified (by
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1961 side-effect) to contain the ending values for the corresponding
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1962 registers and IC.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1963 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1964
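A minimal sketch of calling @code{ccl-execute} from Lisp follows.  The
one-statement program and the initial register values are illustrative
assumptions; the program form follows the CCL syntax described earlier
in this chapter.  The program performs no I/O, as @code{ccl-execute}
requires.

@example
(let ((status (make-vector 9 nil))   ; R0..R7 and IC; nil means 0 or "start"
      (program (ccl-compile '(1 ((r0 = (r0 + r1)))))))
  (aset status 0 2)                  ; initial value of R0
  (aset status 1 3)                  ; initial value of R1
  (ccl-execute program status)
  (aref status 0))                   ; final R0, read back from STATUS
@end example

If the program runs as described above, the last form should yield 5,
because @var{status} is updated in place with the ending register
values.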
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1965 @defun ccl-execute-on-string ccl-program status string &optional continue
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1966 Execute @var{ccl-program} with initial @var{status} on
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1967 @var{string}. @var{ccl-program} is a vector of compiled CCL code
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1968 created by @code{ccl-compile}. @var{status} must be a vector of nine
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1969 values, specifying the initial value for the R0, R1 .. R7 registers and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1970 for the instruction counter IC. A @code{nil} value for a register
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1971 initializer causes the register to be set to 0. A @code{nil} value for
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1972 the IC initializer causes execution to start at the beginning of the
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1973 program. An optional fourth argument @var{continue}, if non-@code{nil}, causes
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1974 the IC to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1975 remain on the unsatisfied read operation if the program terminates due
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1976 to exhaustion of the input buffer. Otherwise the IC is set to the end
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1977 of the program. When the program is done, @var{status} is modified (by
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1978 side-effect) to contain the ending values for the corresponding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1979 registers and IC. Returns the resulting string.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1980 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1981
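The following sketch runs a small copying program over a string.  The
program itself is an assumption made for illustration: it reads one byte
into @code{r0}, then uses @code{write-read-repeat} to write that byte
and read the next one until the input is exhausted.

@example
(let ((status (make-vector 9 nil))   ; all registers 0, start at the top
      (copy (ccl-compile
             '(1 ((read r0)
                  (loop
                    (write-read-repeat r0)))))))
  (ccl-execute-on-string copy status "abc"))
@end example

Under the semantics described above, the call is expected to return a
string with the same contents as its input.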
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1982 To call a CCL program from another CCL program, it must first be
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1983 registered:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1984
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1985 @defun register-ccl-program name ccl-program
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1986 Register @var{name} for CCL program @var{ccl-program} in
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1987 @code{ccl-program-table}. @var{ccl-program} should be the compiled form of
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1988 a CCL program, or @code{nil}. Return index number of the registered CCL
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1989 program.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1990 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1991
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1992 Information about the processor time used by the CCL interpreter can be
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1993 obtained using these functions:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1994
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1995 @defun ccl-elapsed-time
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1996 Returns the elapsed processor time of the CCL interpreter as a cons of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1997 user and system time, as
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1998 floating point numbers measured in seconds. If only one
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1999 overall value can be determined, the return value will be a cons of that
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2000 value and 0.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2001 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2002
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2003 @defun ccl-reset-elapsed-time
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2004 Resets the CCL interpreter's internal elapsed time registers.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2005 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2006
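A typical measurement, sketched here under the assumption that the
timers accumulate across all CCL runs since the last reset, clears the
timers, runs the code of interest, and then reports both components of
the returned cons.

@example
(ccl-reset-elapsed-time)
;; ... run the CCL programs to be measured here ...
(let ((elapsed (ccl-elapsed-time)))
  (message "CCL time: user %s s, system %s s"
           (car elapsed) (cdr elapsed)))
@end example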
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
2007 @node CCL Examples, , Calling CCL, CCL
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2008 @comment Node, Next, Previous, Up
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2009 @subsection CCL Examples
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2010
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
2011 This section is not yet written.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2012
775
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2013 @node Category Tables, Unicode Support, CCL, MULE
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2014 @section Category Tables
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2015
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2016 A category table is a type of char table used for keeping track of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2017 categories. Categories are used for classifying characters for use in
440
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
2018 regexps---you can refer to a category rather than having to use a
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2019 complicated [] expression (and category lookups are significantly
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2020 faster).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2021
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2022 There are 95 different categories available, one for each printable
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2023 character (including space) in the ASCII charset. Each category is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2024 designated by one such character, called a @dfn{category designator}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2025 They are specified in a regexp using the syntax @samp{\c@var{X}}, where @var{X} is a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2026 category designator. (This is not yet implemented.)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2027
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2028 A category table specifies, for each character, the categories that
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2029 the character is in. Note that a character can be in more than one
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2030 category. More specifically, a category table maps from a character to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2031 either the value @code{nil} (meaning the character is in no categories)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2032 or a 95-element bit vector, specifying for each of the 95 categories
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2033 whether the character is in that category.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2034
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2035 Special Lisp functions are provided that abstract this, so you do not
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2036 have to directly manipulate bit vectors.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2037
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2038 @defun category-table-p object
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2039 This function returns @code{t} if @var{object} is a category table.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2040 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2041
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2042 @defun category-table &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2043 This function returns the current category table. This is the one
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2044 specified by the current buffer, or by @var{buffer} if it is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2045 non-@code{nil}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2046 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2047
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2048 @defun standard-category-table
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2049 This function returns the standard category table. This is the one used
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2050 for new buffers.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2051 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2052
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2053 @defun copy-category-table &optional category-table
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2054 This function returns a new category table which is a copy of
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2055 @var{category-table}, which defaults to the standard category table.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2056 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2057
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2058 @defun set-category-table category-table &optional buffer
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2059 This function selects @var{category-table} as the new category table for
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2060 @var{buffer}. @var{buffer} defaults to the current buffer if omitted.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2061 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2062
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2063 @defun category-designator-p object
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2064 This function returns @code{t} if @var{object} is a category designator (a
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2065 char in the range @samp{' '} to @samp{'~'}).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2066 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2067
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2068 @defun category-table-value-p object
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2069 This function returns @code{t} if @var{object} is a category table value.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2070 Valid values are @code{nil} or a bit vector of size 95.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2071 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2072
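The functions above might be combined as in the following sketch.  It
assumes a MULE-enabled XEmacs; @code{make-bit-vector} is used here only
to build a value of the documented shape.

@example
;; Work on a private copy rather than on the standard table.
(let ((table (copy-category-table)))
  (category-table-p table)       ; expected non-nil: TABLE is a category table
  (set-category-table table))    ; install it in the current buffer

(category-designator-p ?a)       ; printable ASCII, so a valid designator
(category-table-value-p nil)     ; nil is valid: the char is in no categories
(category-table-value-p (make-bit-vector 95 0))  ; one bit per category
@end example

Copying before modifying keeps changes local to one buffer and leaves
the standard category table untouched.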
775
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2073
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2074 @c Added 2002-03-13 sjt
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2075 @node Unicode Support, , Category Tables, MULE
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2076 @section Unicode Support
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2077 @cindex unicode
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2078 @cindex utf-8
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2079 @cindex utf-16
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2080 @cindex ucs-2
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2081 @cindex ucs-4
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2082 @cindex bmp
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2083 @cindex basic multilingual plane
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2084
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2085 Unicode support was added by Ben Wing to XEmacs 21.5.6.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2086
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2087 @defun set-language-unicode-precedence-list list
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2088 Set the language-specific precedence list used for Unicode decoding.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2089 This is a list of charsets, which are consulted in order for a translation
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2090 matching a given Unicode character. If no matches are found, the charsets
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2091 in the default precedence list (see
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2092 @code{set-default-unicode-precedence-list}) are consulted, and then all
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2093 remaining charsets, in some arbitrary order.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2094
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2095 The language-specific precedence list is meant to be set as part of the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2096 language environment initialization; the default precedence list is meant
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2097 to be set by the user.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2098 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2099
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2100 @defun language-unicode-precedence-list
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2101 Return the language-specific precedence list used for Unicode decoding.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2102 See @code{set-language-unicode-precedence-list} for more information.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2103 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2104
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2105 @defun set-default-unicode-precedence-list list
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2106 Set the default precedence list used for Unicode decoding.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2107 This is meant to be set by the user. See
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2108 @code{set-language-unicode-precedence-list} for more information.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2109 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2110
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2111 @defun default-unicode-precedence-list
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2112 Return the default precedence list used for Unicode decoding.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2113 See @code{set-language-unicode-precedence-list} for more information.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2114 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2115
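As a sketch, a user who mostly reads Japanese text might set the
default precedence list as follows.  The charset names shown are
assumptions about which charsets are defined in the running XEmacs.

@example
;; Consult ASCII first, then JIS X 0208, then Latin-1.
(set-default-unicode-precedence-list
 '(ascii japanese-jisx0208 latin-iso8859-1))

(default-unicode-precedence-list)   ; inspect the list now in effect
@end example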
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2116 @defun set-unicode-conversion character code
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2117 Add conversion information between Unicode codepoints and characters.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2118 @var{character} is one of the following:
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2119
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2120 @itemize @bullet
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2121 @item A character (in which case @var{code} must be a non-negative integer)
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2122 @item A vector of characters (in which case @var{code} must be a vector of
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2123 non-negative integers of the same length)
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2124 @end itemize
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2125 Values of @var{code} above 2^20 - 1 are allowed for the purpose of specifying
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2126 private characters, but will cause errors when converted to UTF-16 or UTF-32.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2127 UCS-4 and UTF-8 can handle values to 2^31 - 1, but XEmacs Lisp integers top
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2128 out at 2^30 - 1.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2129 @end defun
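
The size limits given for @code{set-unicode-conversion} can be made
concrete with a small sketch.  It is written in Python purely for
illustration and is not XEmacs code: the helper @code{usable_encodings}
is hypothetical, and the thresholds are copied from the description of
the function rather than taken from the XEmacs sources.

```python
# Limits quoted in the description of set-unicode-conversion.
UTF16_UTF32_MAX = 2**20 - 1   # larger values fail in UTF-16 / UTF-32
UCS4_UTF8_MAX = 2**31 - 1     # upper bound for UCS-4 / UTF-8
LISP_INT_MAX = 2**30 - 1      # largest XEmacs Lisp integer, per the text

# The Lisp integer range is the tighter bound, so it is the one enforced.
assert LISP_INT_MAX < UCS4_UTF8_MAX


def usable_encodings(code):
    """Return the encodings that can carry CODE, per the limits above.

    Hypothetical helper for illustration only; not an XEmacs function.
    """
    if code < 0:
        raise ValueError("code must be a non-negative integer")
    if code > LISP_INT_MAX:
        raise ValueError("code does not fit in an XEmacs Lisp integer")
    encodings = ["UCS-4", "UTF-8"]
    if code <= UTF16_UTF32_MAX:
        encodings += ["UTF-16", "UTF-32"]
    return encodings
```

Under these assumptions, a private codepoint such as 2^20 is accepted
but can be written out only as UCS-4 or UTF-8.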

@defun character-to-unicode character
Convert @var{character} to its Unicode codepoint.
When there is no international support (i.e.@: MULE is not defined),
this function simply behaves like @code{char-to-int}.
@end defun

@defun unicode-to-character code &optional charsets
Convert Unicode codepoint @var{code} to a character.
@var{code} should be a non-negative integer.
If @var{charsets} is given, it should be a list of charsets, and only those
charsets will be consulted, in the given order, for a translation.
Otherwise, the default ordering of all charsets will be used (see
@code{set-unicode-charset-precedence}).

When there is no international support (i.e.@: MULE is not defined),
this function simply behaves like @code{int-to-char} and ignores the
@var{charsets} argument.
@end defun
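
The precedence rule described for @code{unicode-to-character} (consult
only the listed charsets, in order, and otherwise fall back to a default
ordering) can be sketched as follows.  This is an illustrative Python
model, not the XEmacs implementation: @code{lookup_character}, the
charset names, and the table contents are all hypothetical.

```python
# Hypothetical per-charset tables: Unicode codepoint -> character label.
SAMPLE_TABLES = dict([
    ("charset-a", dict([(0x41, "a-version")])),
    ("charset-b", dict([(0x41, "b-version"), (0x42, "b-only")])),
])
DEFAULT_ORDER = ("charset-a", "charset-b")


def lookup_character(code, tables, charsets=None,
                     default_order=DEFAULT_ORDER):
    """Return (charset, character) for CODE, or None if nothing matches.

    If CHARSETS is given, only those charsets are consulted, in the
    given order; otherwise DEFAULT_ORDER is used.
    """
    if code < 0:
        raise ValueError("code must be a non-negative integer")
    order = charsets if charsets is not None else default_order
    for charset in order:
        char = tables.get(charset, dict()).get(code)
        if char is not None:
            return (charset, char)
    return None
```

In this model the same codepoint can resolve to different characters
depending on the order supplied, which is why an explicit
@var{charsets} list is useful when a specific charset is wanted.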

@defun parse-unicode-translation-table filename charset &optional start end offset flags
Parse Unicode translation data in @var{filename} for MULE @var{charset}.
The data is text, with one translation per line: a charset
codepoint followed by a Unicode codepoint.  Numbers are decimal or hex
(preceded by @samp{0x}).  Comments are marked with a @samp{#}.  Charset
codepoints for two-dimensional charsets should have the first octet
stored in the high 8 bits of the hex number and the second in the low
8 bits.

If @var{start} and @var{end} are given, only charset codepoints within
the given range will be processed.  If @var{offset} is given, that value
will be added to all charset codepoints in the file to obtain the
internal charset codepoint.  @var{start} and @var{end} apply to the
codepoints in the file, before @var{offset} is applied.

(Note that, as usual, we assume that octets are in the range 32 to
127 or 33 to 126.  If you have a table in kuten form, with octets in
the range 1 to 94, you will have to use an offset of 8224,
i.e.@: 0x2020.)

@var{flags}, if specified, control further how the tables are interpreted
and are used to special-case certain known irregularities in the
Unicode tables:

@table @code
@item ignore-first-column
Ignore the first column of each line.  The JIS X 0208 tables have 3
columns of data instead of 2; the first is the Shift-JIS codepoint.

@item big5
The charset codepoint is a Big Five codepoint; convert it to the
proper hacked-up codepoint in @code{chinese-big5-1} or
@code{chinese-big5-2}.
@end table
@end defun
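
The table format read by @code{parse-unicode-translation-table} and the
interaction of @var{start}, @var{end}, and @var{offset} can be
illustrated with a short Python sketch.  It is not the XEmacs parser:
@code{parse_translation_lines} and @code{parse_number} are hypothetical
helpers, the range is assumed to be inclusive at both ends, and the
@code{big5} flag is not modelled.

```python
def parse_number(token):
    """Parse a decimal number or a hex number preceded by 0x."""
    if token.lower().startswith("0x"):
        return int(token, 16)
    return int(token, 10)


def parse_translation_lines(lines, start=None, end=None, offset=0,
                            ignore_first_column=False):
    """Return a list of (charset_codepoint, unicode_codepoint) pairs."""
    pairs = []
    for line in lines:
        text = line.split("#", 1)[0].strip()   # strip comments
        if not text:
            continue
        fields = text.split()
        if ignore_first_column:
            fields = fields[1:]                # e.g. drop Shift-JIS column
        if len(fields) < 2:
            continue
        charset_code = parse_number(fields[0])
        unicode_code = parse_number(fields[1])
        # The range test uses the codepoint as written in the file,
        # before the offset is added.
        if start is not None and charset_code < start:
            continue
        if end is not None and charset_code > end:
            continue
        pairs.append((charset_code + offset, unicode_code))
    return pairs


# A kuten-form entry (row 1, cell 1) shifted by 0x2020 becomes 0x2121.
KUTEN_EXAMPLE = parse_translation_lines(
    ["0x0101 0x3000  # kuten 1-1"], offset=0x2020)
```

The kuten example shows where the offset of 0x2020 (8224 decimal) comes
from: adding 0x20 to each octet moves the range 1 to 94 up to 33 to 126.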