annotate man/lispref/mule.texi @ 5044:e84a30b0e4a2

remove duplicative code in change_frame_size() -------------------- ChangeLog entries follow: -------------------- src/ChangeLog addition: 2010-02-15 Ben Wing <ben@xemacs.org> * frame.c (change_frame_size_1): Simplify the logic in this function. (1) Don't allow 0 as the value of height or width. The old code that tried to allow this was totally broken, anyway, so obviously this never happens any more. (2) Don't duplicate the code in frame_conversion_internal() that converts displayable pixel size to total pixel size -- just call that function.
author Ben Wing <ben@xemacs.org>
date Mon, 15 Feb 2010 22:58:10 -0600
parents d1754e7f0cea
children 3889ef128488
@c -*-texinfo-*-
@c This is part of the XEmacs Lisp Reference Manual.
@c Copyright (C) 1996 Ben Wing, 2001-2002 Free Software Foundation.
@c See the file lispref.texi for copying conditions.
@setfilename ../../info/internationalization.info
@node MULE, Tips, Internationalization, top
@chapter MULE

@dfn{MULE} is the name originally given to the version of GNU Emacs
extended for multi-lingual (and in particular Asian-language) support.
``MULE'' is short for ``MUlti-Lingual Emacs''.  It is an extension and
complete rewrite of Nemacs (``Nihon Emacs'' where ``Nihon'' is the
Japanese word for ``Japan''), which only provided support for Japanese.
XEmacs refers to its multi-lingual support as @dfn{MULE support} since
it is based on @dfn{MULE}.

@menu
* Internationalization Terminology::
                        Definition of various internationalization terms.
* Charsets::            Sets of related characters.
* MULE Characters::     Working with characters in XEmacs/MULE.
* Composite Characters:: Making new characters by overstriking other ones.
* Coding Systems::      Ways of representing a string of chars using integers.
* CCL::                 A special language for writing fast converters.
* Category Tables::     Subdividing charsets into groups.
* Unicode Support::     The universal coded character set.
* Charset Unification:: Handling overlapping character sets.
* Charsets and Coding Systems:: Tables and reference information.
@end menu

@node Internationalization Terminology, Charsets, , MULE
@section Internationalization Terminology

In internationalization terminology, a string of text is divided up
into @dfn{characters}, which are the printable units that make up the
text.  A single character is (for example) a capital @samp{A}, the
number @samp{2}, a Katakana character, a Hangul character, a Kanji
ideograph (an @dfn{ideograph} is a ``picture'' character, such as is
used in Japanese Kanji, Chinese Hanzi, and Korean Hanja; typically there
are thousands of such ideographs in each language), etc.  The basic
property of a character is that it is the smallest unit of text with
semantic significance in text processing---i.e., characters are abstract
units defined by their meaning, not by their exact appearance.

Human beings normally process text visually, so to a first approximation
a character may be identified with its shape.  Note that the same
character may be drawn by two different people (or in two different
fonts) in slightly different ways, although the ``basic shape'' will be the
same.  But consider the works of Scott Kim; human beings can recognize
hugely variant shapes as the ``same'' character.  Sometimes, especially
where characters are extremely complicated to write, completely
different shapes may be defined as the ``same'' character in national
standards.  The Taiwanese variant of Hanzi is generally the most
complicated; over the centuries, the Japanese, Koreans, and the People's
Republic of China have adopted simplifications of the shape, but the
line of descent from the original shape is recorded, and the meanings
and pronunciation of different forms of the same character are
considered to be identical within each language.  (Of course, it may
take a specialist to recognize the related form; the point is that the
relations are standardized, despite the differing shapes.)

In some cases, the differences will be significant enough that it is
actually possible to identify two or more distinct shapes that both
represent the same character.  For example, the lowercase letters
@samp{a} and @samp{g} each have two distinct possible shapes---the
@samp{a} can optionally have a curved tail projecting off the top, and
the @samp{g} can be formed either of two loops, or of one loop and a
tail hanging off the bottom.  Such distinct possible shapes of a
character are called @dfn{glyphs}.  The important characteristic of two
glyphs making up the same character is that the choice between one or
the other is purely stylistic and has no linguistic effect on a word
(this is the reason why a capital @samp{A} and lowercase @samp{a}
are different characters rather than different glyphs---e.g.
@samp{Aspen} is a city while @samp{aspen} is a kind of tree).

Note that @dfn{character} and @dfn{glyph} are used differently
here than elsewhere in XEmacs.

A @dfn{character set} is essentially a set of related characters.  ASCII,
for example, is a set of 94 characters (or 128, if you count
non-printing characters).  Other character sets are ISO8859-1 (ASCII
plus various accented characters and other international symbols),
JIS X 0201 (ASCII, more or less, plus half-width Katakana), JIS X 0208
(Japanese Kanji), JIS X 0212 (a second set of less-used Japanese Kanji),
GB2312 (Mainland Chinese Hanzi), etc.

The definition of a character set will implicitly or explicitly give
it an @dfn{ordering}, a way of assigning a number to each character in
the set.  For many character sets, there is a natural ordering, for
example the ``ABC'' ordering of the Roman letters.  But it is not clear
whether digits should come before or after the letters, and in fact
different European languages treat the ordering of accented characters
differently.  It is useful to use the natural order where available, of
course.  The number assigned to any particular character is called the
character's @dfn{code point}.  (Within a given character set, each
character has a unique code point.  Thus the word ``set'' is ill-chosen;
different orderings of the same characters are different character sets.
Identifying characters is simple enough for alphabetic character sets,
but the difference in ordering can cause great headaches when the same
thousands of characters are used by different cultures as in the Hanzi.)

It's important to understand that a character is defined not by any
number attached to it, but by its meaning.  For example, ASCII and
EBCDIC are two charsets containing exactly the same characters
(lowercase and uppercase letters, numbers 0 through 9, particular
punctuation marks) but with different numberings.  The @samp{comma}
character in ASCII and EBCDIC, for instance, is the same character
despite having a different numbering.  Conversely, when comparing ASCII
and JIS-Roman, which look the same except that the latter has a yen sign
substituted for the backslash, we would say that the backslash and yen
sign are @emph{not} the same characters, despite having the same number
(92) and despite the fact that all other characters are present in both
charsets, with the same numbering.  ASCII and JIS-Roman, then, do
@emph{not} have exactly the same characters in them (ASCII has a
backslash character but no yen-sign character, and vice-versa for
JIS-Roman), unlike ASCII and EBCDIC, even though the numberings in ASCII
and JIS-Roman are closer.
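
The comma example can be tried out with a short Python sketch.  It is
only an illustration, not part of XEmacs: the helper names are made up
for this example, and Python's @code{cp037} codec is assumed to be an
acceptable stand-in for ``EBCDIC'', which in practice exists in many
variants.

```python
# Sketch: the same abstract character carries different numbers in
# ASCII and in EBCDIC.  "cp037" is used as one representative EBCDIC
# code page (an assumption; EBCDIC has many variants).

def code_point(char, charset):
    """Return the number that CHARSET assigns to the single character CHAR."""
    encoded = char.encode(charset)
    if len(encoded) != 1:
        raise ValueError("expected a single-byte charset")
    return encoded[0]

def same_character(number_a, charset_a, number_b, charset_b):
    """True if the two (number, charset) pairs denote the same character."""
    char_a = bytes([number_a]).decode(charset_a)
    char_b = bytes([number_b]).decode(charset_b)
    return char_a == char_b

comma_ascii = code_point(",", "ascii")
comma_ebcdic = code_point(",", "cp037")

# Different numbers, yet both decode to the same character.
print(comma_ascii, comma_ebcdic)
print(same_character(comma_ascii, "ascii", comma_ebcdic, "cp037"))
```

The comparison is made on the decoded character, never on the raw
number, which is the point of the paragraph above.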

Sometimes, a code point is not a single number, but instead a group of
numbers, called @dfn{position codes}.  In such cases, the number of
position codes required to index a particular character in a character
set is called the @dfn{dimension} of the character set.  Character sets
indexed by more than one position code typically use byte-sized position
codes.  Small character sets, e.g. ASCII, invariably use a single
position code, but for larger character sets, the choice of whether to
use multiple position codes or a single large (16-bit or 32-bit) number
is arbitrary.  Unicode typically uses a single large number, but
language-specific or ``national'' character sets often use multiple
(usually two) position codes.  For example, JIS X 0208, i.e. Japanese
Kanji, has thousands of characters, and is of dimension two -- every
character is indexed by two position codes, each in the range 1 through
94.  (This number ``94'' is not a coincidence; it is the same as the
number of printable characters in ASCII, and was chosen so that JIS
characters could be directly encoded using two printable ASCII
characters.)  Note that the choice of the range here is somewhat
arbitrary -- it could just as easily be 0 through 93, 2 through 95, etc.
In fact, the range for JIS position codes (and for other character sets
modeled after it) is often given as range 33 through 126, so as to
directly match ASCII printing characters.
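
As a rough illustration of position codes, the following Python sketch
recovers the two position codes of a JIS X 0208 character and shifts
them into the alternative range.  The function names are invented for
this example, and it relies on the assumption that Python's
@code{euc_jp} codec stores each JIS X 0208 position code as the code
plus 160.

```python
# Sketch: the two position codes of a JIS X 0208 character.
# Assumption: in the "euc_jp" codec each position code is stored as
# (position code + 0xA0), one byte per position code.

def position_codes(char):
    """Return the position codes (each 1..94) of CHAR in JIS X 0208."""
    encoded = char.encode("euc_jp")
    if len(encoded) != 2 or encoded[0] < 0xA1:
        raise ValueError("not a JIS X 0208 character")
    return tuple(byte - 0xA0 for byte in encoded)

def as_printable_range(codes):
    """Shift position codes from 1..94 to the alternative range 33..126."""
    return tuple(code + 32 for code in codes)

kanji = "\u4e9c"                      # the first Kanji listed in JIS X 0208
codes = position_codes(kanji)
shifted = as_printable_range(codes)

print(len(codes))                     # the dimension of the character set
print(codes, shifted)
print(bytes(shifted).decode("ascii"))  # two printable ASCII characters
```

The last line shows why the range 33 through 126 is convenient: the
shifted position codes are themselves printable ASCII characters.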

An @dfn{encoding} is a way of numerically representing characters from
one or more character sets as a stream of like-sized numerical values
called @dfn{words} -- typically 8-bit bytes, but sometimes 16-bit or
32-bit quantities.  In a context where dealing with Japanese motivates
much of XEmacs' design in this area, it's important to clearly
distinguish between charsets and encodings.  For a simple charset like
ASCII, there is only one encoding normally used -- each character is
represented by a single byte, with the same value as its code point.
For more complicated charsets, however, or when a single encoding needs
to represent more than one charset, things are not so obvious.  Unicode
version 2, for example, is a large charset with thousands of characters,
each indexed by a 16-bit number, often represented in hex, e.g. 0x05D0
for the Hebrew letter ``aleph''.  One obvious encoding (actually two
encodings, depending on which of the two possible byte orderings is
chosen) simply uses two bytes per character.  This encoding is
convenient for internal processing of Unicode text; however, it's
incompatible with ASCII, and thus external text (files, e-mail, etc.)
that is encoded this way is completely uninterpretable by programs
lacking Unicode support.  For this reason, a different, ASCII-compatible
encoding, e.g. UTF-8, is usually used for external text.  UTF-8
represents these 16-bit Unicode characters with one to three bytes (the
original definition allowed up to six bytes to handle characters with up
to 31-bit indices; the current standard uses at most four).  Unicode
characters 00 to 7F (identical with ASCII) are directly represented with
one byte, and other characters with two or more bytes, each in the range
80 to FF.  Applications that don't understand Unicode will still be able
to process ASCII characters represented in UTF-8-encoded text, and will
typically ignore (and hopefully preserve) the high-bit characters.
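
The difference between a code point and its encodings can be seen in a
few lines of Python.  This is a sketch for illustration only; the codec
names @code{utf-16-be}, @code{utf-16-le} and @code{utf-8} are Python's
names for the two fixed-width byte orderings and for UTF-8.

```python
# Sketch: one code point, several encodings.
aleph = "\u05d0"                           # Hebrew letter aleph, 0x05D0

big_endian = aleph.encode("utf-16-be")     # most significant byte first
little_endian = aleph.encode("utf-16-le")  # least significant byte first
utf8 = aleph.encode("utf-8")               # ASCII-compatible encoding

print(hex(ord(aleph)), big_endian.hex(), little_endian.hex(), utf8.hex())

# ASCII text is left unchanged by UTF-8 but not by the two-byte encoding.
ascii_in_utf8 = "A".encode("utf-8")
ascii_in_two_bytes = "A".encode("utf-16-be")
print(ascii_in_utf8, ascii_in_two_bytes)

# Every byte of a multi-byte UTF-8 sequence lies in the range 80 to FF.
high_bit_only = all(byte >= 0x80 for byte in utf8)
print(high_bit_only)
```

The zero byte that the two-byte encoding adds to every ASCII character
is what makes such text unreadable to programs that expect plain ASCII.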

Similarly, Shift-JIS and EUC-JP are different encodings normally used to
encode the same character set(s), these character sets being subsets of
Unicode.  However, the obvious approach of unifying XEmacs' internal
encoding across character sets, as was part of the motivation behind
Unicode, wasn't taken.  This means that characters in these character
sets that are identical to characters in other character sets---for
example, the Greek alphabet is in the large Japanese character sets and
at least one European character set---are unfortunately disjoint.
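
The Greek example can be made concrete with a Python sketch.  Python
strings are Unicode, so decoding unifies the three representations;
this shows what unification buys, and by contrast what is lost when
each character set keeps its own disjoint copy.  The codec names are
Python's, and the choice of the letter alpha is arbitrary.

```python
# Sketch: the same Greek letter as found in three external forms.
alpha = "\u03b1"                           # Greek small letter alpha

in_euc_jp = alpha.encode("euc_jp")         # JIS X 0208, EUC-JP encoding
in_shift_jis = alpha.encode("shift_jis")   # JIS X 0208, Shift-JIS encoding
in_greek = alpha.encode("iso8859_7")       # the ISO 8859-7 Greek charset

# Three different byte sequences ...
print(in_euc_jp.hex(), in_shift_jis.hex(), in_greek.hex())

# ... which all decode to one and the same unified character.
decoded = [
    in_euc_jp.decode("euc_jp"),
    in_shift_jis.decode("shift_jis"),
    in_greek.decode("iso8859_7"),
]
print(all(char == alpha for char in decoded))
```

Without unification, the alpha read from a Japanese file and the alpha
read from a Greek file would compare as different characters.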

Naive use of code points is also not possible if more than one
character set is to be used in the encoding.  For example, printed
Japanese text typically requires characters from multiple character sets
-- ASCII, JIS X 0208, and JIS X 0212, to be specific.  Each of these is
indexed using one or more position codes in the range 1 through 94 (or
33 through 126), so the position codes could not be used directly or
there would be no way to tell which character was meant.  Different
Japanese encodings handle this differently -- JIS uses special escape
characters to denote different character sets; EUC sets the high bit of
the position codes for JIS X 0208 and JIS X 0212, and puts a special
extra byte before each JIS X 0212 character; etc.
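
The two strategies can be compared with a small Python sketch.  It
assumes that Python's @code{iso2022_jp} codec is a fair representative
of the ``JIS'' encoding and @code{euc_jp} of EUC, and that the second
Kanji used below belongs to JIS X 0212 but not to JIS X 0208.

```python
# Sketch: two ways of mixing character sets in one byte stream.
text = "A\u4e9c"                  # ASCII "A" followed by a JIS X 0208 Kanji

jis = text.encode("iso2022_jp")   # escape sequences announce each charset
euc = text.encode("euc_jp")       # high bit marks the JIS X 0208 bytes

# In the JIS form the Kanji appears as the printable bytes "0!", framed
# by ESC $ B (switch to JIS X 0208) and ESC ( B (switch back to ASCII).
print(jis)

# In the EUC form the same two position codes have the high bit set.
print(euc)

# A Kanji assumed to be in JIS X 0212 only: EUC puts an extra byte first.
rare = "\u4e02".encode("euc_jp")
print(rare.hex())
```

Both results carry the same position codes for the Kanji; only the way
the character set is signalled differs.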

The encodings described above are all 7-bit or 8-bit encodings.  The
fixed-width Unicode encoding previously described, however, is sometimes
considered to be a 16-bit encoding, in which case the issue of byte
ordering does not come up.  (Imagine, for example, that the text is
represented as an array of shorts.)  Similarly, Unicode version 3 (which
has characters with indices above 0xFFFF), and other very large
character sets, may be represented internally as 32-bit encodings,
i.e. arrays of ints.  However, it does not make too much sense to talk
about 16-bit or 32-bit encodings for external data, since nowadays 8-bit
data is a universal standard -- the closest you can get is fixed-width
encodings using two or four bytes to encode 16-bit or 32-bit values.  (A
``7-bit'' encoding is used when it cannot be guaranteed that the high bit
of 8-bit data will be correctly preserved.  Some e-mail gateways, for
example, strip the high bit of text passing through them.  These same
gateways often handle non-printable characters incorrectly, and so 7-bit
encodings usually avoid using bytes with such values.)
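
The ``array of shorts'' view can be sketched in Python as follows.  A
plain list of integers stands in for the array (an assumption made to
keep the example short), and the standard @code{struct} module is used
to write the values out as bytes.

```python
# Sketch: 16-bit values have no byte order until they become bytes.
import struct

text = "\u05d0\u05d1"                    # two Hebrew letters
units = [ord(char) for char in text]     # the "array of shorts"

# Writing the same values as external 8-bit data forces a choice.
big_endian = b"".join(struct.pack(">H", unit) for unit in units)
little_endian = b"".join(struct.pack("<H", unit) for unit in units)

print([hex(unit) for unit in units])
print(big_endian.hex(), little_endian.hex())
```

The list of values is the same in both cases; the two byte strings are
the two fixed-width external encodings mentioned earlier.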
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
206
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
207 A general method of handling text using multiple character sets
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
208 (whether for multilingual text, or simply text in an extremely
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
209 complicated single language like Japanese) is defined in the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
210 international standard ISO 2022. ISO 2022 will be discussed in more
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
211 detail later (@pxref{ISO 2022}), but for now suffice it to say that text
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
212 needs control functions (at least spacing), and if escape sequences are
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
213 to be used, an escape sequence introducer. It was decided to make all
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
214 text streams compatible with ASCII in the sense that the codes 0--31
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
215 (and 128-159) would always be control codes, never graphic characters,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
216 and where defined by the character set the @samp{SPC} character would be
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
217 assigned code 32, and @samp{DEL} would be assigned 127. Thus there are
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
218 94 code points remaining if 7 bits are used. This is the reason that
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
219 most character sets are defined using position codes in the range 1
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
220 through 94. Then ISO 2022 compatible encodings are produced by shifting
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
221 the position codes 1 to 94 into character codes 33 to 126, or (if 8 bit
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
222 codes are available) into character codes 161 to 254.
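The shift from position codes to character codes described above is plain arithmetic. The following is a minimal sketch in Python, written outside XEmacs purely for illustration; the helper names @code{to_gl} and @code{to_gr} are invented for this example and are not part of any XEmacs API.

```python
def to_gl(position_code):
    """Map a position code 1..94 to the 7-bit character code 33..126."""
    if not 1 <= position_code <= 94:
        raise ValueError("position code must be in 1..94")
    return position_code + 32


def to_gr(position_code):
    """Map a position code 1..94 to the 8-bit character code 161..254.

    This is the 7-bit code with the high bit set.
    """
    return to_gl(position_code) | 0x80
```

Because 0--31 and 127 are never produced, neither mapping can collide with the control codes, @samp{SPC}, or @samp{DEL}.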
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
223
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
224 Encodings are classified as either @dfn{modal} or @dfn{non-modal}. In
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
225 a @dfn{modal encoding}, there are multiple states that the encoding can
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
226 be in, and the interpretation of the values in the stream depends on the
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
227 current global state of the encoding. Special values in the encoding,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
228 called @dfn{escape sequences}, are used to change the global state.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
229 JIS, for example, is a modal encoding. The bytes @samp{ESC $ B}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
230 indicate that, from then on, bytes are to be interpreted as position
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
231 codes for JIS X 0208, rather than as ASCII. This effect is cancelled
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
232 using the bytes @samp{ESC ( B}, which mean ``switch from whatever the
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
233 current state is to ASCII''. To switch to JIS X 0212, the escape
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
234 sequence @samp{ESC $ ( D} is used. (Note that here, as is common, the escape
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
235 sequences do in fact begin with @samp{ESC}. This is not necessarily the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
236 case, however. Some encodings use control characters called ``locking
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
237 shifts'' (effect persists until cancelled) to switch character sets.)
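To make the idea of global state concrete, here is a small sketch of a reader for a JIS-style modal stream, written in Python for illustration only. It recognizes just the three escape sequences named above and assumes well-formed input; the function name @code{split_runs} and the charset labels are assumptions of this sketch, and a real ISO 2022 decoder handles far more cases.

```python
ESC = 0x1B

# Escape sequence (bytes after ESC) -> (charset label, bytes per character).
DESIGNATIONS = {
    b"(B": ("ascii", 1),
    b"$B": ("jisx0208", 2),
    b"$(D": ("jisx0212", 2),
}


def split_runs(data):
    """Split a modal byte stream into (charset, raw bytes) pairs."""
    state = ("ascii", 1)          # the global state; starts as ASCII
    out = []
    i = 0
    while i < len(data):
        if data[i] == ESC:
            for seq, new_state in DESIGNATIONS.items():
                if data.startswith(seq, i + 1):
                    state = new_state      # escape sequence changes the state
                    i += 1 + len(seq)
                    break
            else:
                raise ValueError("unknown escape sequence at offset %d" % i)
            continue
        name, width = state
        out.append((name, bytes(data[i:i + width])))
        i += width
    return out
```

The same byte is reported differently depending on the state in force: in @code{split_runs(b"\x1b$BAA\x1b(BA")} the first two @code{0x41} bytes form one JIS X 0208 character, while the last one is the ASCII letter @samp{A}.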
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
238
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
239 A @dfn{non-modal encoding} has no global state that extends past the
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
240 character currently being interpreted. EUC, for example, is a
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
241 non-modal encoding. Characters in JIS X 0208 are encoded by setting
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
242 the high bit of the position codes, and characters in JIS X 0212 are
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
243 encoded by doing the same but also prefixing the character with the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
244 byte 0x8F.
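The EUC rule just described can be sketched as follows. This is an illustrative Python function, not XEmacs code; the name @code{euc_jp_bytes} is hypothetical, and it assumes the position codes are given as row and cell numbers in the range 1 through 94, which are first shifted to 33 through 126 before the high bit is set.

```python
def euc_jp_bytes(charset, row, cell):
    """Encode one two-dimensional character in EUC (Japanese) form."""
    for code in (row, cell):
        if not 1 <= code <= 94:
            raise ValueError("position codes must be in 1..94")
    hi = (row + 32) | 0x80      # shift to 33..126, then set the high bit
    lo = (cell + 32) | 0x80
    if charset == "jisx0208":
        return bytes([hi, lo])
    if charset == "jisx0212":
        return bytes([0x8F, hi, lo])   # same bytes, prefixed with 0x8F
    raise ValueError("unsupported charset: %r" % (charset,))
```

No state carries over from one character to the next: every byte with the high bit clear is ASCII, and the prefix byte travels with the character it modifies.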
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
245
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
246 The advantage of a modal encoding is that it is generally more
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
247 space-efficient, and is easily extendible because there are essentially
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
248 an arbitrary number of escape sequences that can be created. The
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
249 disadvantage, however, is that it is much more difficult to work with
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
250 if it is not being processed in a sequential manner. In the non-modal
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
251 EUC encoding, for example, the byte 0x41 always refers to the letter
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
252 @samp{A}; whereas in JIS, it could either be the letter @samp{A}, or
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
253 one of the two position codes in a JIS X 0208 character, or one of the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
254 two position codes in a JIS X 0212 character. Determining exactly which
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
255 one is meant could be difficult and time-consuming if the previous
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
256 bytes in the string have not already been processed, or impossible if
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
257 they are drawn from an external stream that cannot be rewound.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
258
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
259 Non-modal encodings are further divided into @dfn{fixed-width} and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
260 @dfn{variable-width} formats. A fixed-width encoding always uses
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
261 the same number of words per character, whereas a variable-width
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
262 encoding does not. EUC is a good example of a variable-width
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
263 encoding: one to three bytes are used per character, depending on
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
264 the character set. 16-bit and 32-bit encodings are nearly always
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
265 fixed-width, and this is in fact one of the main reasons for using
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
266 an encoding with a larger word size. The advantages of fixed-width
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
267 encodings should be obvious. The advantages of variable-width
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
268 encodings are that they are generally more space-efficient and allow
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
269 for compatibility with existing 8-bit encodings such as ASCII. (For
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
270 example, in Unicode, ASCII characters are simply promoted to a 16-bit
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
271 representation. That means that every ASCII character contains a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
272 @samp{NUL} byte; evidently all of the standard string manipulation
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
273 functions will lose badly in a fixed-width Unicode environment.)
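The @samp{NUL}-byte problem is easy to reproduce. The sketch below uses Python's @code{utf-16-be} codec as a stand-in for a fixed-width 16-bit Unicode representation (an assumption that holds for ASCII text), and a hypothetical helper @code{c_strlen} that mimics how a byte-oriented C string function finds the end of a string.

```python
def c_strlen(buf):
    """Count the bytes before the first NUL, as C's strlen would."""
    n = buf.find(b"\x00")
    return len(buf) if n < 0 else n


text = "ABC"
ascii_bytes = text.encode("ascii")       # b"ABC"
wide_bytes = text.encode("utf-16-be")    # each letter preceded by a NUL byte

print(c_strlen(ascii_bytes))   # length seen by a byte-oriented function
print(c_strlen(wide_bytes))    # stops at the very first byte
```

With the 16-bit form the byte-oriented function sees an empty string, which is why such functions cannot be used unchanged on fixed-width Unicode data.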
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
274
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
275 The bytes in an 8-bit encoding are often referred to as @dfn{octets}
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
276 rather than simply as bytes. This terminology dates back to the days
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
277 before 8-bit bytes were universal, when some computers had 9-bit bytes,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
278 others had 10-bit bytes, etc.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
279
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
280 @node Charsets, MULE Characters, Internationalization Terminology, MULE
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
281 @section Charsets
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
282
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
283 A @dfn{charset} in MULE is an object that encapsulates a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
284 particular character set as well as an ordering of those characters.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
285 Charsets are permanent objects and are named using symbols, like
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
286 faces.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
287
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
288 @defun charsetp object
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
289 This function returns non-@code{nil} if @var{object} is a charset.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
290 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
291
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
292 @menu
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
293 * Charset Properties:: Properties of a charset.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
294 * Basic Charset Functions:: Functions for working with charsets.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
295 * Charset Property Functions:: Functions for accessing charset properties.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
296 * Predefined Charsets:: Predefined charset objects.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
297 @end menu
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
298
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
299 @node Charset Properties, Basic Charset Functions, , Charsets
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
300 @subsection Charset Properties
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
301
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
302 Charsets have the following properties:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
303
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
304 @table @code
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
305 @item name
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
306 A symbol naming the charset. Every charset must have a different name;
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
307 this allows a charset to be referred to using its name rather than
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
308 the actual charset object.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
309 @item doc-string
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
310 A documentation string describing the charset.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
311 @item registry
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
312 A regular expression matching the font registry field for this character
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
313 set. For example, both the @code{ascii} and @code{latin-iso8859-1}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
314 charsets use the registry @code{"ISO8859-1"}. This field is used to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
315 choose an appropriate font when the user gives a general font
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
316 specification such as @samp{-*-courier-medium-r-*-140-*}, i.e. a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
317 14-point upright medium-weight Courier font.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
318 @item dimension
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
319 Number of position codes used to index a character in the character set.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
320 XEmacs/MULE can only handle character sets of dimension 1 or 2.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
321 This property defaults to 1.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
322 @item chars
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
323 Number of characters in each dimension. In XEmacs/MULE, the only
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
324 allowed values are 94 or 96. (There are a couple of pre-defined
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
325 character sets, such as ASCII, that do not follow this, but you cannot
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
326 define new ones like this.) Defaults to 94. Note that if the dimension
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
327 is 2, the character set thus described is 94x94 or 96x96.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
328 @item columns
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
329 Number of columns used to display a character in this charset.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
330 Only used in TTY mode. (Under X, the actual width of a character
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
331 can be derived from the font used to display the characters.)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
332 If unspecified, defaults to the dimension. (This is almost
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
333 always the correct value, because character sets with dimension 2
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
334 are usually ideograph character sets, which need two columns to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
335 display the intricate ideographs.)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
336 @item direction
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
337 A symbol, either @code{l2r} (left-to-right) or @code{r2l}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
338 (right-to-left). Defaults to @code{l2r}. This specifies the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
339 direction that the text should be displayed in, and will be
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
340 left-to-right for most charsets but right-to-left for Hebrew
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
341 and Arabic. (Right-to-left display is not currently implemented.)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
342 @item final
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
343 Final byte of the standard ISO 2022 escape sequence designating this
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
344 charset. Must be supplied. Each combination of (@var{dimension},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
345 @var{chars}) defines a separate namespace for final bytes, and each
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
346 charset within a particular namespace must have a different final byte.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
347 Note that ISO 2022 restricts the final byte to the range 0x30 - 0x7E if
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
348 dimension == 1, and 0x30 - 0x5F if dimension == 2. Note also that final
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
349 bytes in the range 0x30 - 0x3F are reserved for user-defined (not
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
350 official) character sets. For more information on ISO 2022, see @ref{Coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
351 Systems}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
352 @item graphic
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
353 0 (use left half of font on output) or 1 (use right half of font on
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
354 output). Defaults to 0. This specifies how to convert the position
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
355 codes that index a character in a character set into an index into the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
356 font used to display the character set. With @code{graphic} set to 0,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
357 position codes 33 through 126 map to font indices 33 through 126; with
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
358 it set to 1, position codes 33 through 126 map to font indices 161
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
359 through 254 (i.e. the same number but with the high bit set). For
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
360 example, for a font whose registry is ISO8859-1, the left half of the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
361 font (octets 0x20 - 0x7F) is the @code{ascii} charset, while the right
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
362 half (octets 0xA0 - 0xFF) is the @code{latin-iso8859-1} charset.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
363 @item ccl-program
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
364 A compiled CCL program used to convert a character in this charset into
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
365 an index into the font. This is in addition to the @code{graphic}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
366 property. If a CCL program is defined, the position codes of a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
367 character will first be processed according to @code{graphic} and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
368 then passed through the CCL program, with the resulting values used
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
369 to index the font.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
370
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
371 This is used, for example, in the Big5 character set (used in Taiwan).
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
372 This character set is not ISO-2022-compliant, and its size (94x157) does
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
373 not fit within the maximum 96x96 size of ISO-2022-compliant character
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
374 sets. As a result, XEmacs/MULE splits it (in a rather complex fashion,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
375 so as to group the most commonly used characters together) into two
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
376 charset objects (@code{big5-1} and @code{big5-2}), each of size 94x94,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
377 and each charset object uses a CCL program to convert the modified
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
378 position codes back into standard Big5 indices to retrieve a character
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
379 from a Big5 font.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
380 @end table
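The constraints and defaults in the table above can be summarized in a short model. The following Python sketch is only an illustration of the rules as stated here; it is not how XEmacs implements charsets, and the names @code{check_charset_props} and @code{font_index} are invented for this example.

```python
def check_charset_props(dimension=1, chars=94, final=None,
                        graphic=0, columns=None):
    """Validate charset properties and fill in the documented defaults."""
    if dimension not in (1, 2):
        raise ValueError("dimension must be 1 or 2")
    if chars not in (94, 96):
        raise ValueError("chars must be 94 or 96")
    if final is None:
        raise ValueError("final must be supplied")
    # Final byte range depends on the dimension.
    upper = 0x7E if dimension == 1 else 0x5F
    if not 0x30 <= final <= upper:
        raise ValueError("final byte out of range for this dimension")
    if graphic not in (0, 1):
        raise ValueError("graphic must be 0 or 1")
    return {
        "dimension": dimension,
        "chars": chars,
        "final": final,
        "graphic": graphic,
        # columns defaults to the dimension when unspecified
        "columns": dimension if columns is None else columns,
    }


def font_index(position_code, graphic):
    """Apply the graphic property: 1 sets the high bit, 0 leaves it alone."""
    return position_code | 0x80 if graphic else position_code
```

For instance, a dimension-2 charset that gives only a final byte ends up with 94 characters per dimension and a display width of two columns, and with @code{graphic} set to 1 the position code 33 selects font index 161.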
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
381
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
382 Most of the above properties can only be set when the charset is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
383 initialized, and cannot be changed later.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
384 @xref{Charset Property Functions}.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
385
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
386 @node Basic Charset Functions, Charset Property Functions, Charset Properties, Charsets
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
387 @subsection Basic Charset Functions
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
388
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
389 @defun find-charset charset-or-name
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
390 This function retrieves the charset of the given name. If
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
391 @var{charset-or-name} is a charset object, it is simply returned.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
392 Otherwise, @var{charset-or-name} should be a symbol. If there is no
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
393 such charset, @code{nil} is returned. Otherwise the associated charset
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
394 object is returned.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
395 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
396
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
397 @defun get-charset name
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
398 This function retrieves the charset of the given name. Same as
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
399 @code{find-charset} except an error is signalled if there is no such
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
400 charset instead of returning @code{nil}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
401 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
402
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
403 @defun charset-list
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
404 This function returns a list of the names of all defined charsets.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
405 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
406
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
407 @defun make-charset name doc-string props
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
408 This function defines a new character set. This function is for use
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
409 with MULE support. @var{name} is a symbol, the name by which the
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
410 character set is normally referred to. @var{doc-string} is a string
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
411 describing the character set. @var{props} is a property list,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
412 describing the specific nature of the character set. The recognized
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
413 properties are @code{registry}, @code{dimension}, @code{columns},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
414 @code{chars}, @code{final}, @code{graphic}, @code{direction}, and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
415 @code{ccl-program}, as previously described.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
416 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
417
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
418 @defun make-reverse-direction-charset charset new-name
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
419 This function makes a charset equivalent to @var{charset} but which goes
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
420 in the opposite direction. @var{new-name} is the name of the new
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
421 charset. The new charset is returned.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
422 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
423
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
424 @defun charset-from-attributes dimension chars final &optional direction
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
425 This function returns a charset with the given @var{dimension},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
426 @var{chars}, @var{final}, and @var{direction}. If @var{direction} is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
427 omitted, both directions will be checked (left-to-right will be returned
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
428 if character sets exist for both directions).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
429 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
430
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
431 @defun charset-reverse-direction-charset charset
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
432 This function returns the charset (if any) with the same dimension,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
433 number of characters, and final byte as @var{charset}, but which is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
434 displayed in the opposite direction.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
435 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
436
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
437 @node Charset Property Functions, Predefined Charsets, Basic Charset Functions, Charsets
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
438 @subsection Charset Property Functions
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
439
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
440 All of these functions accept either a charset name or charset object.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
441
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
442 @defun charset-property charset prop
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
443 This function returns property @var{prop} of @var{charset}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
444 @xref{Charset Properties}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
445 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
446
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
447 Convenience functions are also provided for retrieving individual
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
448 properties of a charset.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
449
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
450 @defun charset-name charset
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
451 This function returns the name of @var{charset}. This will be a symbol.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
452 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
453
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
454 @defun charset-description charset
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
455 This function returns the documentation string of @var{charset}.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
456 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
457
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
458 @defun charset-registry charset
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
459 This function returns the registry of @var{charset}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
460 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
461
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
462 @defun charset-dimension charset
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
463 This function returns the dimension of @var{charset}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
464 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
465
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
466 @defun charset-chars charset
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
467 This function returns the number of characters per dimension of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
468 @var{charset}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
469 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
470
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
471 @defun charset-width charset
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
472 This function returns the number of display columns per character (in
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
473 TTY mode) of @var{charset}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
474 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
475
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
476 @defun charset-direction charset
440
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
477 This function returns the display direction of @var{charset}---either
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
478 @code{l2r} or @code{r2l}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
479 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
480
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
481 @defun charset-iso-final-char charset
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
482 This function returns the final byte of the ISO 2022 escape sequence
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
483 designating @var{charset}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
484 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
485
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
486 @defun charset-iso-graphic-plane charset
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
487 This function returns either 0 or 1, depending on whether the position
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
488 codes of characters in @var{charset} map to the left or right half
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
489 of their font, respectively.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
490 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
491
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
492 @defun charset-ccl-program charset
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
493 This function returns the CCL program, if any, for converting
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
494 position codes of characters in @var{charset} into font indices.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
495 @end defun
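As a minimal sketch of how the accessor functions above might be used,
the following forms query a few predefined charsets.  It assumes a
MULE-enabled XEmacs in which a charset may be named by its symbol; the
charset names are taken from the tables of predefined charsets.

@example
;; Sketch only: assumes a MULE build in which these predefined
;; charsets exist and can be named by symbol.
(charset-dimension 'japanese-jisx0208)    ; dimension of a 94x94 charset
(charset-chars 'japanese-jisx0208)        ; characters per dimension
(charset-width 'japanese-jisx0208)        ; display columns in TTY mode
(charset-direction 'hebrew-iso8859-8)     ; l2r or r2l
(charset-iso-final-char 'latin-iso8859-2) ; final byte of the escape sequence
@end example

For a 94x94 charset the first two forms should return 2 and 94
respectively; the remaining values can be compared against the table of
predefined charsets.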
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
496
1734
d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent]
stephent
parents: 1261
diff changeset
497 The two properties of a charset that can currently be set after the
d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent]
stephent
parents: 1261
diff changeset
498 charset has been created are the CCL program and the font registry.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
499
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
500 @defun set-charset-ccl-program charset ccl-program
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
501 This function sets the @code{ccl-program} property of @var{charset} to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
502 @var{ccl-program}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
503 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
504
1734
d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent]
stephent
parents: 1261
diff changeset
505 @defun set-charset-registry charset registry
d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent]
stephent
parents: 1261
diff changeset
506 This function sets the @code{registry} property of @var{charset} to
d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent]
stephent
parents: 1261
diff changeset
507 @var{registry}.
d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent]
stephent
parents: 1261
diff changeset
508 @end defun
d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent]
stephent
parents: 1261
diff changeset
509
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
510 @node Predefined Charsets, , Charset Property Functions, Charsets
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
511 @subsection Predefined Charsets
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
512
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
513 The following charsets are predefined in the C code.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
514
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
515 @example
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
516 Name Type Fi Gr Dir Registry
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
517 --------------------------------------------------------------
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
518 ascii 94 B 0 l2r ISO8859-1
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
519 control-1 94 0 l2r ---
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
520 latin-iso8859-1 96 A 1 l2r ISO8859-1
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
521 latin-iso8859-2 96 B 1 l2r ISO8859-2
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
522 latin-iso8859-3 96 C 1 l2r ISO8859-3
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
523 latin-iso8859-4 96 D 1 l2r ISO8859-4
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
524 cyrillic-iso8859-5 96 L 1 l2r ISO8859-5
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
525 arabic-iso8859-6 96 G 1 r2l ISO8859-6
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
526 greek-iso8859-7 96 F 1 l2r ISO8859-7
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
527 hebrew-iso8859-8 96 H 1 r2l ISO8859-8
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
528 latin-iso8859-9 96 M 1 l2r ISO8859-9
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
529 thai-tis620 96 T 1 l2r TIS620
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
530 katakana-jisx0201 94 I 1 l2r JISX0201.1976
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
531 latin-jisx0201 94 J 0 l2r JISX0201.1976
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
532 japanese-jisx0208-1978 94x94 @@ 0 l2r JISX0208.1978
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
533 japanese-jisx0208 94x94 B 0 l2r JISX0208.19(83|90)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
534 japanese-jisx0212 94x94 D 0 l2r JISX0212
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
535 chinese-gb2312 94x94 A 0 l2r GB2312
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
536 chinese-cns11643-1 94x94 G 0 l2r CNS11643.1
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
537 chinese-cns11643-2 94x94 H 0 l2r CNS11643.2
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
538 chinese-big5-1 94x94 0 0 l2r Big5
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
539 chinese-big5-2 94x94 1 0 l2r Big5
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
540 korean-ksc5601 94x94 C 0 l2r KSC5601
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
541 composite 96x96 0 l2r ---
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
542 @end example
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
543
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
544 The following charsets are predefined in the Lisp code.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
545
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
546 @example
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
547 Name Type Fi Gr Dir Registry
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
548 --------------------------------------------------------------
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
549 arabic-digit 94 2 0 l2r MuleArabic-0
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
550 arabic-1-column 94 3 0 r2l MuleArabic-1
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
551 arabic-2-column 94 4 0 r2l MuleArabic-2
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
552 sisheng 94 0 0 l2r sisheng_cwnn\|OMRON_UDC_ZH
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
553 chinese-cns11643-3 94x94 I 0 l2r CNS11643.1
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
554 chinese-cns11643-4 94x94 J 0 l2r CNS11643.1
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
555 chinese-cns11643-5 94x94 K 0 l2r CNS11643.1
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
556 chinese-cns11643-6 94x94 L 0 l2r CNS11643.1
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
557 chinese-cns11643-7 94x94 M 0 l2r CNS11643.1
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
558 ethiopic 94x94 2 0 l2r Ethio
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
559 ascii-r2l 94 B 0 r2l ISO8859-1
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
560 ipa 96 0 1 l2r MuleIPA
1734
d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent]
stephent
parents: 1261
diff changeset
561 vietnamese-viscii-lower 96 1 1 l2r VISCII1.1
d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent]
stephent
parents: 1261
diff changeset
562 vietnamese-viscii-upper 96 2 1 l2r VISCII1.1
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
563 @end example
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
564
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
565 For all of the above charsets, the dimension and number of columns are
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
566 the same.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
567
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
568 Note that ASCII, Control-1, and Composite are handled specially.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
569 This is why some of the fields are blank, and why some of the filled-in
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
570 fields (e.g. the type) are not really accurate.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
571
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
572 @node MULE Characters, Composite Characters, Charsets, MULE
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
573 @section MULE Characters
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
574
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
575 @defun make-char charset arg1 &optional arg2
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
576 This function makes a multi-byte character from @var{charset} and octets
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
577 @var{arg1} and @var{arg2}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
578 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
579
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
580 @defun char-charset character
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
581 This function returns the character set of char @var{character}.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
582 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
583
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
584 @defun char-octet character &optional n
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
585 This function returns the octet (i.e. position code) numbered @var{n}
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
586 (should be 0 or 1) of char @var{character}. @var{n} defaults to 0 if omitted.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
587 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
588
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
589 @defun find-charset-region start end &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
590 This function returns a list of the charsets in the region between
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
591 @var{start} and @var{end}. @var{buffer} defaults to the current buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
592 if omitted.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
593 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
594
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
595 @defun find-charset-string string
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
596 This function returns a list of the charsets in @var{string}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
597 @end defun
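The functions in this section can be combined as in the following
sketch, which builds characters from a charset and position codes and
then takes them apart again.  It assumes a MULE-enabled XEmacs; the
position codes used (105, and the pair 52 and 65) are arbitrary
illustrative values, not taken from this manual.

@example
;; Sketch only: assumes a MULE build.  105 is used as a position
;; code in latin-iso8859-1 purely for illustration.
(let ((ch (make-char 'latin-iso8859-1 105)))
  (list (char-charset ch)      ; charset the character belongs to
        (char-octet ch 0)))    ; its first (and only) position code

;; A two-dimensional charset takes both octets.
(let ((ch (make-char 'japanese-jisx0208 52 65)))
  (list (char-octet ch 0) (char-octet ch 1)))

;; List the charsets occurring in a string.
(find-charset-string "plain text")
@end example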
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
598
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
599 @node Composite Characters, Coding Systems, MULE Characters, MULE
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
600 @section Composite Characters
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
601
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
602 Composite characters are not yet completely implemented.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
603
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
604 @defun make-composite-char string
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
605 This function converts @var{string} into a single composite character. The
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
606 character is the result of overstriking all the characters in the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
607 string.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
608 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
609
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
610 @defun composite-char-string character
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
611 This function returns a string of the characters comprising a composite
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
612 character @var{character}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
613 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
614
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
615 @defun compose-region start end &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
616 This function composes the characters in the region from @var{start} to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
617 @var{end} in @var{buffer} into one composite character. The composite
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
618 character replaces the composed characters. @var{buffer} defaults to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
619 the current buffer if omitted.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
620 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
621
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
622 @defun decompose-region start end &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
623 This function decomposes any composite characters in the region from
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
624 @var{start} to @var{end} in @var{buffer}. This converts each composite
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
625 character into one or more characters, the individual characters out of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
626 which the composite character was formed. Non-composite characters are
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
627 left as-is. @var{buffer} defaults to the current buffer if omitted.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
628 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
629
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
630 @node Coding Systems, CCL, Composite Characters, MULE
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
631 @section Coding Systems
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
632
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
633 A coding system is an object that defines how text containing multiple
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
634 character sets is encoded into a stream of (typically 8-bit) bytes. The
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
635 coding system is used to decode the stream into a series of characters
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
636 (which may be from multiple charsets) when the text is read from a file
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
637 or process, and is used to encode the text back into the same format
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
638 when it is written out to a file or process.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
639
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
640 For example, many ISO-2022-compliant coding systems (such as Compound
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
641 Text, which is used for inter-client data under the X Window System) use
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
642 escape sequences to switch between different charsets: Japanese Kanji,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
643 for example, is invoked with @samp{ESC $ ( B}; ASCII is invoked with
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
644 @samp{ESC ( B}; and Cyrillic is invoked with @samp{ESC - L}. See
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
645 @code{make-coding-system} for more information.
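To make the escape sequences concrete, the following sketch encodes a
short string that mixes ASCII with one character from a Japanese
charset.  The function @code{encode-coding-string} and the coding system
name @code{iso-2022-jp} are assumptions here, since neither is described
in this section; check that both exist in your XEmacs before relying on
them.

@example
;; Sketch only: `encode-coding-string' and the `iso-2022-jp' coding
;; system are assumed to be available; they are not defined above.
;; The position codes 52 and 65 are arbitrary illustrative values.
(let* ((kanji (make-char 'japanese-jisx0208 52 65))
       (text (concat "a" (char-to-string kanji) "b")))
  (encode-coding-string text 'iso-2022-jp))
@end example

In the result one would expect an escape sequence designating the
Japanese charset before the two position-code bytes, and another
switching back to ASCII before the final @samp{b}.  The exact bytes
depend on the properties of the coding system.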
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
646
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
647 Coding systems are normally identified using a symbol, and the symbol is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
648 accepted in place of the actual coding system object whenever a coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
649 system is called for. (This is similar to how faces and charsets work.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
650
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
651 @defun coding-system-p object
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
652 This function returns non-@code{nil} if @var{object} is a coding system.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
653 @end defun
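The distinction between a coding system object and the symbol that names
it can be explored with a sketch like the following.  It assumes that
@code{find-coding-system} is available to look up the object from its
name and that a coding system named @code{binary} is defined; both are
assumptions, not something this section establishes.

@example
;; Sketch only: `find-coding-system' and the `binary' coding system
;; are assumed to exist in the running XEmacs.
(coding-system-p (find-coding-system 'binary)) ; a coding system object
(coding-system-p 'binary)                      ; only the symbol naming it
(coding-system-p "binary")                     ; a string, not a coding system
@end example

Comparing the first two results shows whether the predicate accepts the
naming symbol or only the object itself.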
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
654
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
655 @menu
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
656 * Coding System Types:: Classifying coding systems.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
657 * ISO 2022:: An international standard for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
658 charsets and encodings.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
659 * EOL Conversion:: Dealing with different ways of denoting
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
660 the end of a line.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
661 * Coding System Properties:: Properties of a coding system.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
662 * Basic Coding System Functions:: Working with coding systems.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
663 * Coding System Property Functions:: Retrieving a coding system's properties.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
664 * Encoding and Decoding Text:: Encoding and decoding text.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
665 * Detection of Textual Encoding:: Determining how text is encoded.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
666 * Big5 and Shift-JIS Functions:: Special functions for these non-standard
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
667 encodings.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
668 * Predefined Coding Systems:: Coding systems implemented by MULE.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
669 @end menu
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
670
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
671 @node Coding System Types, ISO 2022, , Coding Systems
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
672 @subsection Coding System Types
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
673
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
674 The coding system type determines the basic algorithm XEmacs will use to
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
675 decode or encode a data stream. Character encodings will be converted
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
676 to the MULE encoding, escape sequences processed, and newline sequences
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
677 converted to XEmacs's internal representation. There are three basic
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
678 classes of coding system type: no-conversion, ISO-2022, and special.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
679
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
680 No conversion allows you to look at the file's internal representation.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
681 Since XEmacs is basically a text editor, ``no conversion'' does convert
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
682 newline conventions by default. (Use the @code{binary} coding system if this
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
683 is not desired.)
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
684
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
685 ISO 2022 (@pxref{ISO 2022}) is the basic international standard regulating
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
686 use of ``coded character sets for the exchange of data'', i.e. text
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
687 streams. ISO 2022 contains functions that make it possible to encode
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
688 text streams to comply with restrictions of the Internet mail system and
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
689 de facto restrictions of most file systems (e.g. use of the separator
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
690 character in file names). Coding systems which are not ISO 2022
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
691 conformant can be difficult to handle. Perhaps more important, they are
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
692 not adaptable to multilingual information interchange, with the obvious
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
693 exception of ISO 10646 (Unicode). (Unicode is partially supported by
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
694 XEmacs with the addition of the Lisp package ucs-conv.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
695
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
696 The special class of coding systems includes automatic detection, CCL (a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
697 ``little language'' embedded as an interpreter, useful for translating
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
698 between variants of a single character set), non-ISO-2022-conformant
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
699 encodings like Unicode, Shift JIS, and Big5, and MULE internal coding.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
700 (NB: this list is based on XEmacs 21.2. Terminology may vary slightly
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
701 for other versions of XEmacs and for GNU Emacs 20.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
702
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
703 @table @code
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
704 @item no-conversion
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
705 No conversion, for binary files, and a few special cases of non-ISO-2022
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
706 coding systems where conversion is done by hook functions (usually
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
707 implemented in CCL). On output, graphic characters that are not in
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
708 ASCII or Latin-1 will be replaced by a @samp{?}. (For a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
709 no-conversion-encoded buffer, these characters will only be present if
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
710 you explicitly insert them.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
711 @item iso2022
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
712 Any ISO-2022-compliant encoding. Among others, this includes JIS (the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
713 Japanese encoding commonly used for e-mail), national variants of EUC
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
714 (the standard Unix encoding for Japanese and other languages), and
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
715 Compound Text (an encoding used in X11). You can specify more specific
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
716 information about the conversion with the @var{flags} argument.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
717 @item ucs-4
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
718 ISO 10646 UCS-4 encoding. A 31-bit fixed-width superset of Unicode.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
719 @item utf-8
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
720 ISO 10646 UTF-8 encoding. A ``file system safe'' transformation format
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
721 that can be used with both UCS-4 and Unicode.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
722 @item undecided
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
723 Automatic conversion. XEmacs attempts to detect the coding system used
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
724 in the file.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
725 @item shift-jis
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
726 Shift-JIS (a Japanese encoding commonly used in PC operating systems).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
727 @item big5
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
728 Big5 (the encoding commonly used for Traditional Chinese, particularly in Taiwan).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
729 @item ccl
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
730 The conversion is performed using a user-written pseudo-code program.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
731 CCL (Code Conversion Language) is the name of this pseudo-code. For
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
732 example, CCL is used to map KOI8-R characters (an encoding for Russian
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
733 Cyrillic) to ISO8859-5 (the form used internally by MULE).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
734 @item internal
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
735 Write out or read in the raw contents of the memory representing the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
736 buffer's text. This is primarily useful for debugging purposes, and is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
737 only enabled when XEmacs has been compiled with @code{DEBUG_XEMACS} set
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
738 (the @samp{--debug} configure option). @strong{Warning}: Reading in a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
739 file using @code{internal} conversion can result in an internal
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
740 inconsistency in the memory representing a buffer's text, which will
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
741 produce unpredictable results and may cause XEmacs to crash. Under
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
742 normal circumstances you should never use @code{internal} conversion.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
743 @end table

@node ISO 2022, EOL Conversion, Coding System Types, Coding Systems
@section ISO 2022

This section briefly describes the ISO 2022 encoding standard. A more
thorough treatment is available in the original document of ISO
2022 as well as various national standards (such as JIS X 0202).

Character sets (@dfn{charsets}) are classified into the following four
categories, according to the number of characters in the charset:
94-charset, 96-charset, 94x94-charset, and 96x96-charset. This means
that although an ISO 2022 coding system may have variable width
characters, each charset used is fixed-width (in contrast to the MULE
character set and UTF-8, for example).

ISO 2022 provides for switching between character sets via escape
sequences. This switching is somewhat complicated, because ISO 2022
provides for both legacy applications like Internet mail that accept
only 7 significant bits in some contexts (RFC 822 headers, for example),
and more modern "8-bit clean" applications. It also provides for
compact and transparent representation of languages like Japanese which
mix ASCII and a national script (even outside of computer programs).

First, ISO 2022 codified prevailing practice by dividing the code space
into "control" and "graphic" regions. The code points 0x00-0x1F and
0x80-0x9F are reserved for "control characters", while "graphic
characters" must be assigned to code points in the regions 0x20-0x7F and
0xA0-0xFF. The positions 0x20 and 0x7F are special, and under some
circumstances must be assigned the graphic character "ASCII SPACE" and
the control character "ASCII DEL" respectively.

The various regions are given the names C0 (0x00-0x1F), GL (0x20-0x7F),
C1 (0x80-0x9F), and GR (0xA0-0xFF). GL and GR stand for "graphic left"
and "graphic right", respectively, because of the standard method of
displaying graphic character sets in tables with the high nibble indexing
columns and the low nibble indexing rows. I don't find it very intuitive,
but these are called "registers".
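
The four regions can be told apart by simple range tests on the byte
value. The following is a minimal Python sketch, not code from XEmacs;
the function name @code{iso2022_region} is hypothetical and chosen only
for this illustration.

```python
def iso2022_region(byte):
    """Return the ISO 2022 region name for a single byte value (0-255)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte value: %r" % (byte,))
    if byte <= 0x1F:
        return "C0"   # control characters, 7-bit half
    if byte <= 0x7F:
        return "GL"   # graphic left; 0x20 and 0x7F may be SPACE and DEL
    if byte <= 0x9F:
        return "C1"   # control characters, 8-bit half
    return "GR"       # graphic right


# Newline, ASCII "A", a C1 control, and a byte from the right half.
print([iso2022_region(b) for b in (0x0A, 0x41, 0x85, 0xE9)])
```

Note that the test never looks at more than one byte, which is what
makes the control/graphic division cheap to apply to a data stream.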

An ISO 2022-conformant encoding for a graphic character set must use a
fixed number of bytes per character, and the values must fit into a
single register; that is, each byte must range over either 0x20-0x7F, or
0xA0-0xFF. It is not allowed to extend the range of the repertoire of a
character set by using both ranges at the same time. This is why a standard
character set such as ISO 8859-1 is actually considered by ISO 2022 to
be an aggregation of two character sets, ASCII and LATIN-1, and why it
is technically incorrect to refer to ISO 8859-1 as "Latin 1". Also, a
single character's bytes must all be drawn from the same register; this
is why Shift JIS (for Japanese) and Big 5 (for Chinese) are not ISO
2022-compatible encodings.
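
The single-register rule can be checked mechanically. The sketch below
is an illustration in Python using its standard @code{euc_jp} and
@code{shift_jis} codecs; the helper names @code{graphic_register} and
@code{fits_one_register} are hypothetical, not part of any XEmacs API.

```python
def graphic_register(byte):
    """Return "GL" or "GR" for a graphic byte, or None for a control byte."""
    if 0x20 <= byte <= 0x7F:
        return "GL"
    if 0xA0 <= byte <= 0xFF:
        return "GR"
    return None


def fits_one_register(char_bytes):
    """True if every byte of one encoded character lies in the same register."""
    registers = set(graphic_register(b) for b in char_bytes)
    return len(registers) == 1 and None not in registers


# HIRAGANA LETTER A (U+3042) in two encodings.
print(fits_one_register("\u3042".encode("euc_jp")))     # both bytes lie in GR
print(fits_one_register("\u3042".encode("shift_jis")))  # lead byte falls in C1
```

The Shift JIS form fails the test because its lead byte is taken from
the C1 control region, which is the kind of mixing that ISO 2022
forbids.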

The reason for this restriction becomes clear when you attempt to define
an efficient, robust encoding for a language like Japanese. Like ISO
8859, Japanese encodings are aggregations of several character sets. In
practice, the vast majority of characters are drawn from the "JIS Roman"
character set (a derivative of ASCII; it won't hurt to think of it as
ASCII) and the JIS X 0208 standard "basic Japanese" character set
including not only ideographic characters ("kanji") but syllabic
Japanese characters ("kana"), a wide variety of symbols, and many
alphabetic characters (Roman, Greek, and Cyrillic) as well. Although
JIS X 0208 includes the whole Roman alphabet, as a 2-byte code it is not
suited to programming; thus the inclusion of ASCII in the standard
Japanese encodings.

For normal Japanese text such as in newspapers, a broad repertoire of
approximately 3000 characters is used. Evidently this won't fit into
one byte; two must be used. But much of the text processed by Japanese
computers is computer source code, nearly all of which is ASCII. A not
insignificant portion of ordinary text is English (as such or as
borrowed Japanese vocabulary) or other languages which can be represented
at least approximately in ASCII, as well. It seems reasonable then to
represent ASCII in one byte, and JIS X 0208 in two. And this is exactly
what the Extended Unix Code for Japanese (EUC-JP) does. ASCII is
invoked to the GL register, and JIS X 0208 is invoked to the GR
register. Thus, each byte can be tested for its character set by
looking at the high bit; if set, it is Japanese, if clear, it is ASCII.
Furthermore, since control characters like newline can never be part of
a graphic character, even in the case of corruption in transmission the
stream will be resynchronized at every line break, on the order of 60-80
bytes. This coding system requires no escape sequences or special
control codes to represent 99.9% of all Japanese text.
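
The high-bit test described above is enough to split an EUC-JP byte
stream into characters. Here is a minimal Python sketch under a stated
simplification: only ASCII and JIS X 0208 are handled, and the
single-shift bytes 0x8E and 0x8F that EUC-JP uses for its other
character sets are rejected. The names @code{is_gr94} and
@code{split_euc_jp} are hypothetical.

```python
def is_gr94(byte):
    """True for the 94 graphic positions of GR, 0xA1-0xFE."""
    return 0xA1 <= byte <= 0xFE


def split_euc_jp(data):
    """Split EUC-JP bytes into (charset, bytes) pairs using the high bit.

    Assumption of this sketch: only ASCII and JIS X 0208 occur in the input.
    """
    result = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            # High bit clear: a single byte of ASCII (or a C0 control).
            result.append(("ascii", data[i:i + 1]))
            i += 1
        elif is_gr94(byte) and i + 1 < len(data) and is_gr94(data[i + 1]):
            # High bit set on both bytes: one JIS X 0208 character in GR.
            result.append(("jisx0208", data[i:i + 2]))
            i += 2
        else:
            raise ValueError("unsupported or truncated byte at offset %d" % i)
    return result


sample = "XEmacs \u65e5\u672c\u8a9e\n".encode("euc_jp")
for charset, raw in split_euc_jp(sample):
    print(charset, raw)
```

Because the newline has its high bit clear, a reader that loses track
of byte pairs within a line can start again cleanly at the next line.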

Note carefully the distinction between the character sets (ASCII and JIS
X 0208), the encoding (EUC-JP), and the coding system (ISO 2022). The
JIS X 0208 character set is used in three different encodings for
Japanese, but in ISO-2022-JP it is invoked into GL (so the high bit is
always clear), in EUC-JP it is invoked into GR (setting the high bit in
the process), and in Shift JIS the high bit may be set or reset, and the
significant bits are shifted within the 16-bit character so that the two
main character sets can coexist with a third (the "halfwidth katakana"
of JIS X 0201). As the name implies, the ISO-2022-JP encoding is also a
version of the ISO-2022 coding system.
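
To see the difference concretely, the same two kanji can be encoded
three ways. This is an illustrative Python sketch that relies on
Python's standard @code{iso2022_jp}, @code{euc_jp}, and
@code{shift_jis} codecs rather than on MULE; the helper
@code{high_bits} is hypothetical.

```python
def high_bits(data):
    """Return one 0/1 flag per byte, showing whether the high bit is set."""
    return "".join("1" if b & 0x80 else "0" for b in data)


text = "\u65e5\u672c"              # two kanji from JIS X 0208

iso = text.encode("iso2022_jp")    # JIS X 0208 invoked into GL
euc = text.encode("euc_jp")        # JIS X 0208 invoked into GR
sjis = text.encode("shift_jis")    # not an ISO 2022 encoding

print("ISO-2022-JP", iso, high_bits(iso))
print("EUC-JP     ", euc, high_bits(euc))
print("Shift JIS  ", sjis, high_bits(sjis))

# With Python's codec, the ISO-2022-JP payload sits between the
# designations ESC $ B and ESC ( B; setting the high bit of each
# payload byte gives the EUC-JP form.
payload = iso[3:-3]
print(bytes(b | 0x80 for b in payload) == euc)
```

The last line shows that the ISO-2022-JP and EUC-JP forms carry the
same JIS X 0208 code points and differ only in which register they are
invoked into.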

In order to systematically treat subsidiary character sets (like the
"halfwidth katakana" already mentioned, and the "supplementary kanji" of
JIS X 0212), four further registers are defined: G0, G1, G2, and G3.
Unlike GL and GR, they are not logically distinguished by internal
format. Instead, the process of "invocation" mentioned earlier is
broken into two steps: first, a character set is @dfn{designated} to one
of the registers G0-G3 by use of an @dfn{escape sequence} of the form:

@example
ESC [@var{I}] @var{I} @var{F}
@end example

where @var{I} is an intermediate character or characters in the range
0x20-0x2F, and @var{F}, from the range 0x30-0x7E, is the final
character identifying this charset. (Final characters in the range
0x30-0x3F are reserved for private use and will never have a publicly
registered meaning.)

Then that register is @dfn{invoked} to either GL or GR, either
automatically (designations to G0 normally involve invocation to GL as
well), or by use of shifting (affecting only the following character in
the data stream) or locking (effective until the next designation or
locking) control sequences. An encoding conformant to ISO 2022 is
typically defined by designating the initial contents of the G0-G3
registers, specifying a 7 or 8 bit environment, and specifying whether
further designations will be recognized.

Some examples of character sets and the registered final characters
@var{F} used to designate them:

@need 1000
@table @asis
@item 94-charset
ASCII (B), left (J) and right (I) half of JIS X 0201, ...
@item 96-charset
Latin-1 (A), Latin-2 (B), Latin-3 (C), ...
@item 94x94-charset
GB2312 (A), JIS X 0208 (B), KSC5601 (C), ...
@item 96x96-charset
none for the moment
@end table

The meanings of the various characters in these sequences, where not
specified by the ISO 2022 standard (such as the ESC character), are
assigned by @dfn{ECMA}, the European Computer Manufacturers Association.

The meanings of the intermediate characters are:

@example
@group
$ [0x24]: indicate charset of dimension 2 (94x94 or 96x96).
( [0x28]: designate to G0 a 94-charset whose final byte is @var{F}.
) [0x29]: designate to G1 a 94-charset whose final byte is @var{F}.
* [0x2A]: designate to G2 a 94-charset whose final byte is @var{F}.
+ [0x2B]: designate to G3 a 94-charset whose final byte is @var{F}.
, [0x2C]: designate to G0 a 96-charset whose final byte is @var{F}.
- [0x2D]: designate to G1 a 96-charset whose final byte is @var{F}.
. [0x2E]: designate to G2 a 96-charset whose final byte is @var{F}.
/ [0x2F]: designate to G3 a 96-charset whose final byte is @var{F}.
@end group
@end example

The comma may be used in files read and written only by MULE, as a MULE
extension, but this is illegal in ISO 2022. (The reason is that in ISO
2022 G0 must be a 94-member character set, with 0x20 assigned the value
SPACE, and 0x7F assigned the value DEL.)

Here are examples of designations:

@example
@group
ESC ( B : designate to G0 ASCII
ESC - A : designate to G1 Latin-1
ESC $ ( A or ESC $ A : designate to G0 GB2312
ESC $ ( B or ESC $ B : designate to G0 JISX0208
ESC $ ) C : designate to G1 KSC5601
@end group
@end example

(The short forms used to designate GB2312 and JIS X 0208 are for
backwards compatibility; the long forms are preferred.)
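
The designation sequences tabulated above are regular enough to parse
with a few string tests. The following Python sketch is illustrative
only and is not how @file{mule-coding.c} is written; the names
@code{REGISTER_94}, @code{REGISTER_96}, and @code{parse_designation}
are hypothetical. It accepts the comma intermediate, which as noted is
a MULE extension.

```python
REGISTER_94 = "()*+"   # 0x28-0x2B: designate a 94-charset to G0-G3
REGISTER_96 = ",-./"   # 0x2C-0x2F: designate a 96-charset to G0-G3


def parse_designation(seq):
    """Parse a designation such as ESC $ ) C into (register, kind, final).

    kind is one of "94", "96", "94x94", or "96x96".
    """
    if len(seq) < 3 or seq[:1] != b"\x1b":
        raise ValueError("not a designation sequence")
    body = seq[1:].decode("ascii")
    multibyte = body.startswith("$")       # charset of dimension 2
    if multibyte:
        body = body[1:]
    final = body[-1]
    if not "0" <= final <= "~":
        raise ValueError("final character out of range")
    if multibyte and len(body) == 1:
        # Short form ESC $ F: a 94x94 charset designated to G0.
        return ("G0", "94x94", final)
    if len(body) != 2:
        raise ValueError("unexpected sequence length")
    intermediate = body[0]
    if intermediate in REGISTER_94:
        size, index = "94", REGISTER_94.index(intermediate)
    elif intermediate in REGISTER_96:
        size, index = "96", REGISTER_96.index(intermediate)
    else:
        raise ValueError("unknown intermediate character")
    kind = size + "x" + size if multibyte else size
    return ("G%d" % index, kind, final)


print(parse_designation(b"\x1b(B"))     # long form, 94-charset to G0
print(parse_designation(b"\x1b$)C"))    # 94x94-charset to G1
```

The function reports only the register, the charset class, and the
final character; mapping a final character to a charset name would
need the registry of final characters, which is outside this sketch.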

To use a charset designated to G2 or G3, and to use a charset designated
to G1 in a 7-bit environment, you must explicitly invoke G1, G2, or G3
into GL. There are two types of invocation, Locking Shift (which remains
in effect until changed) and Single Shift (one character only).

Locking Shift is done as follows:

@example
LS0 or SI (0x0F): invoke G0 into GL
LS1 or SO (0x0E): invoke G1 into GL
LS2: invoke G2 into GL
LS3: invoke G3 into GL
LS1R: invoke G1 into GR
LS2R: invoke G2 into GR
LS3R: invoke G3 into GR
@end example

Single Shift is done as follows:

@example
@group
SS2 or ESC N: invoke G2 into GL
SS3 or ESC O: invoke G3 into GL
@end group
@end example

The shift functions (such as LS1R and SS3) are represented by control
characters (from C1) in 8 bit environments and by escape sequences in 7
bit environments.
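
The effect of the two simplest locking shifts, SI and SO, can be
modelled with a single state variable. The Python sketch below is an
illustration under narrow assumptions: a 7-bit environment, only G0 and
G1 in use, and no single shifts, so it takes no position on the SS2 and
SS3 question. The name @code{label_bytes_7bit} and the charset labels
are hypothetical.

```python
SI, SO = 0x0F, 0x0E   # LS0 and LS1


def label_bytes_7bit(data, g0="ascii", g1="g1-charset"):
    """Label each graphic byte of a 7-bit stream with the charset in GL."""
    registers = [g0, g1]
    gl = 0                      # index of the register invoked into GL
    labelled = []
    for byte in data:
        if byte == SI:
            gl = 0              # LS0: invoke G0 into GL
        elif byte == SO:
            gl = 1              # LS1: invoke G1 into GL
        elif 0x20 <= byte <= 0x7F:
            labelled.append((registers[gl], byte))
        # Other control characters are passed over in this sketch.
    return labelled


# "A", shift out, one byte read through G1, shift in, "B".
print(label_bytes_7bit(b"A\x0e1\x0fB", g1="katakana"))
```

The same byte value 0x31 would be read as an ASCII digit before the SO
and as a G1 character after it, which is why a locking shift must be
tracked for the whole stream.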

(#### Ben says: I think the above is slightly incorrect. It appears that
SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N and
ESC O behave as indicated. The above definitions will not parse
EUC-encoded text correctly, and it looks like the code in mule-coding.c
has similar problems.)

Evidently there are a lot of ISO-2022-compliant ways of encoding
multilingual text. Many coding systems in actual use, such as X11's
Compound Text, Japanese JUNET code, and so-called EUC (Extended UNIX
Code), are variants of ISO 2022.

In MULE, we characterize a version of ISO 2022 by the following
attributes:

@enumerate
@item
The character sets initially designated to G0 through G3.
@item
Whether short form designations are allowed for Japanese and Chinese.
@item
Whether ASCII should be designated to G0 before control characters.
@item
Whether ASCII should be designated to G0 at the end of line.
@item
7-bit environment or 8-bit environment.
@item
Whether Locking Shifts are used or not.
@item
Whether to use ASCII or the variant JIS X 0201-1976-Roman.
@item
Whether to use JIS X 0208-1983 or the older version JIS X 0208-1976.
@end enumerate

(The last two are only for Japanese.)

By specifying these attributes, you can create any variant
of ISO 2022.
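
One way to picture the eight attributes is as a record with one field
per attribute. The Python sketch below is only an illustration; the
class name @code{Iso2022Variant} and every field name are hypothetical
and do not correspond to XEmacs property names. The sample values are
the ones this manual gives for ISO-2022-JP.

```python
from typing import NamedTuple, Optional, Tuple


class Iso2022Variant(NamedTuple):
    """One field per attribute, in the order of the numbered list."""
    initial_charsets: Tuple[Optional[str], ...]  # 1. G0..G3; None = unused
    short_forms: bool              # 2. short form designations allowed
    ascii_before_control: bool     # 3. ASCII to G0 before control characters
    ascii_at_eol: bool             # 4. ASCII to G0 at end of line
    eight_bit: bool                # 5. False = 7-bit environment
    locking_shift: bool            # 6. Locking Shifts used
    jis_roman: bool                # 7. JIS X 0201-1976-Roman instead of ASCII
    old_jisx0208: bool             # 8. JIS X 0208-1976 instead of -1983


iso_2022_jp = Iso2022Variant(
    initial_charsets=("ascii", None, None, None),
    short_forms=True,
    ascii_before_control=True,
    ascii_at_eol=True,
    eight_bit=False,
    locking_shift=False,
    jis_roman=False,
    old_jisx0208=False,
)

print(iso_2022_jp)
```

Keeping the fields in the order of the numbered list makes it easy to
compare a record against the attribute summaries used in this section.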

Here are several examples:

@example
@group
ISO-2022-JP -- Coding system used in Japanese email (RFC 1468).
1. G0 <- ASCII, G1..3 <- never used
2. Yes.
3. Yes.
4. Yes.
5. 7-bit environment
6. No.
7. Use ASCII
8. Use JIS X 0208-1983
@end group

@group
ctext -- X11 Compound Text
1. G0 <- ASCII, G1 <- Latin-1, G2,3 <- never used.
2. No.
3. No.
4. Yes.
5. 8-bit environment.
6. No.
7. Use ASCII.
8. Use JIS X 0208-1983.
@end group

@group
euc-china -- Chinese EUC. Often called the "GB encoding", but that is
technically incorrect.
1. G0 <- ASCII, G1 <- GB 2312, G2,3 <- never used.
2. No.
3. Yes.
4. Yes.
5. 8-bit environment.
6. No.
7. Use ASCII.
8. Use JIS X 0208-1983.
@end group

@group
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1027 ISO-2022-KR -- Coding system used in Korean email.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1028 1. G0 <- ASCII, G1 <- KSC 5601, G2,3 <- never used.
440
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
1029 2. No.
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
1030 3. Yes.
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
1031 4. Yes.
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1032 5. 7-bit environment.
440
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
1033 6. Yes.
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1034 7. Use ASCII.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1035 8. Use JIS X 0208-1983.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1036 @end group
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1037 @end example
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1038
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1039 MULE creates all of these coding systems by default.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1040
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1041 @node EOL Conversion, Coding System Properties, ISO 2022, Coding Systems
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1042 @subsection EOL Conversion
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1043
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1044 @table @code
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1045 @item nil
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1046 Automatically detect the end-of-line type (LF, CRLF, or CR). Also
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1047 generate subsidiary coding systems named @code{@var{name}-unix},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1048 @code{@var{name}-dos}, and @code{@var{name}-mac}, that are identical to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1049 this coding system but have an EOL-TYPE value of @code{lf}, @code{crlf},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1050 and @code{cr}, respectively.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1051 @item lf
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1052 The end of a line is marked externally using ASCII LF. Since this is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1053 also the way that XEmacs represents an end-of-line internally,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1054 specifying this option results in no end-of-line conversion. This is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1055 the standard format for Unix text files.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1056 @item crlf
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1057 The end of a line is marked externally using ASCII CRLF. This is the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1058 standard format for MS-DOS text files.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1059 @item cr
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1060 The end of a line is marked externally using ASCII CR. This is the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1061 standard format for Macintosh text files.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1062 @item t
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1063 Automatically detect the end-of-line type but do not generate subsidiary
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1064 coding systems. (This value is converted to @code{nil} when stored
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1065 internally, and @code{coding-system-property} will return @code{nil}.)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1066 @end table
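
As a brief illustration of the subsidiary naming scheme described under
@code{nil}, the following sketch asks for the three subsidiaries of one
coding system and collects their names.  It assumes a MULE-enabled XEmacs
in which @code{iso-2022-jp} is defined with automatic EOL detection; the
functions used are described in @ref{Basic Coding System Functions}.

@example
@group
;; Sketch only: assumes `iso-2022-jp' exists and has an eol-type of nil.
(mapcar (lambda (eol)
          (coding-system-name
           (subsidiary-coding-system 'iso-2022-jp eol)))
        '(lf crlf cr))
;; If the naming rule above holds, the result lists the names ending
;; in -unix, -dos, and -mac, in that order.
@end group
@end example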
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1067
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1068 @node Coding System Properties, Basic Coding System Functions, EOL Conversion, Coding Systems
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1069 @subsection Coding System Properties
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1070
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1071 @table @code
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1072 @item mnemonic
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1073 String to be displayed in the modeline when this coding system is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1074 active.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1075
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1076 @item eol-type
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1077 End-of-line conversion to be used. It should be one of the types
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1078 listed in @ref{EOL Conversion}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1079
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1080 @item eol-lf
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1081 The coding system which is the same as this one, except that it uses the
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1082 Unix line-breaking convention.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1083
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1084 @item eol-crlf
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1085 The coding system which is the same as this one, except that it uses the
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1086 DOS line-breaking convention.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1087
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1088 @item eol-cr
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1089 The coding system which is the same as this one, except that it uses the
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1090 Macintosh line-breaking convention.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1091
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1092 @item post-read-conversion
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1093 Function called after a file has been read in, to perform the decoding.
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1094 Called with two arguments, @var{start} and @var{end}, denoting a region of
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1095 the current buffer to be decoded.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1096
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1097 @item pre-write-conversion
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1098 Function called before a file is written out, to perform the encoding.
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1099 Called with two arguments, @var{start} and @var{end}, denoting a region of
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1100 the current buffer to be encoded.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1101 @end table
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1102
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1103 The following additional properties are recognized if @var{type} is
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1104 @code{iso2022}:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1105
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1106 @table @code
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1107 @item charset-g0
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1108 @itemx charset-g1
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1109 @itemx charset-g2
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1110 @itemx charset-g3
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1111 The character set initially designated to the G0 - G3 registers.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1112 The value should be one of:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1113
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1114 @itemize @bullet
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1115 @item
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1116 A charset object (designate that character set)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1117 @item
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1118 @code{nil} (do not ever use this register)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1119 @item
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1120 @code{t} (no character set is initially designated to the register, but
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1121 may be later on; this automatically sets the corresponding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1122 @code{force-g*-on-output} property)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1123 @end itemize
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1124
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1125 @item force-g0-on-output
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1126 @itemx force-g1-on-output
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1127 @itemx force-g2-on-output
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1128 @itemx force-g3-on-output
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1129 If non-@code{nil}, send an explicit designation sequence on output
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1130 before using the specified register.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1131
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1132 @item short
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1133 If non-@code{nil}, use the short forms @samp{ESC $ @@}, @samp{ESC $ A},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1134 and @samp{ESC $ B} on output in place of the full designation sequences
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1135 @samp{ESC $ ( @@}, @samp{ESC $ ( A}, and @samp{ESC $ ( B}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1136
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1137 @item no-ascii-eol
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1138 If non-@code{nil}, don't designate ASCII to G0 at each end of line on
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1139 output. Setting this to non-@code{nil} also suppresses other
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1140 state-resetting that normally happens at the end of a line.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1141
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1142 @item no-ascii-cntl
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1143 If non-@code{nil}, don't designate ASCII to G0 before control chars on
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1144 output.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1145
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1146 @item seven
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1147 If non-@code{nil}, use 7-bit environment on output. Otherwise, use 8-bit
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1148 environment.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1149
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1150 @item lock-shift
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1151 If non-@code{nil}, use locking-shift (SO/SI) instead of single-shift or
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1152 designation by escape sequence.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1153
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1154 @item no-iso6429
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1155 If non-@code{nil}, don't use ISO 6429's direction specification.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1156
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1157 @item escape-quoted
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1158 If non-@code{nil}, literal control characters that are the same as the
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1159 beginning of a recognized ISO 2022 or ISO 6429 escape sequence (in
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1160 particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E), SS3 (0x8F),
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1161 and CSI (0x9B)) are ``quoted'' with an escape character so that they can
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1162 be properly distinguished from an escape sequence. (Note that doing
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1163 this results in a non-portable encoding.) This encoding flag is used for
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1164 byte-compiled files. Note that ESC is a good choice for a quoting
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1165 character because there are no escape sequences whose second byte is a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1166 character from the Control-0 or Control-1 character sets; this is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1167 explicitly disallowed by the ISO 2022 standard.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1168
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1169 @item input-charset-conversion
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1170 A list of conversion specifications, specifying conversion of characters
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1171 in one charset to another when decoding is performed. Each
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1172 specification is a list of two elements: the source charset, and the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1173 destination charset.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1174
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1175 @item output-charset-conversion
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1176 A list of conversion specifications, specifying conversion of characters
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1177 in one charset to another when encoding is performed. The form of each
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1178 specification is the same as for @code{input-charset-conversion}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1179 @end table
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1180
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1181 The following additional properties are recognized (and required) if
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1182 @var{type} is @code{ccl}:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1183
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1184 @table @code
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1185 @item decode
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1186 CCL program used for decoding (converting to internal format).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1187
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1188 @item encode
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1189 CCL program used for encoding (converting to external format).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1190 @end table
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1191
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1192 The following properties are used internally: @code{eol-cr},
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1193 @code{eol-crlf}, @code{eol-lf}, and @code{base}.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1194
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1195 @node Basic Coding System Functions, Coding System Property Functions, Coding System Properties, Coding Systems
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1196 @subsection Basic Coding System Functions
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1197
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1198 @defun find-coding-system coding-system-or-name
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1199 This function retrieves the coding system of the given name.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1200
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1201 If @var{coding-system-or-name} is a coding-system object, it is simply
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1202 returned. Otherwise, @var{coding-system-or-name} should be a symbol.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1203 If there is no such coding system, @code{nil} is returned. Otherwise
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1204 the associated coding system object is returned.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1205 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1206
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1207 @defun get-coding-system name
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1208 This function retrieves the coding system of the given name. It is the same as
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1209 @code{find-coding-system} except an error is signalled if there is no
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1210 such coding system instead of returning @code{nil}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1211 @end defun
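
The difference between the two lookup functions can be sketched as
follows.  The name @code{no-such-coding-system} is hypothetical and is
assumed not to be defined in the running XEmacs.

@example
@group
;; Sketch only: `no-such-coding-system' is assumed to be undefined.
(find-coding-system 'no-such-coding-system)   ; nil, as described above

;; `get-coding-system' signals an error instead, so trap it.
(condition-case err
    (get-coding-system 'no-such-coding-system)
  (error (message "Lookup failed: %S" err)))
@end group
@end example

Use @code{find-coding-system} when a missing coding system is an expected
case, and @code{get-coding-system} when it indicates a real problem.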
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1212
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1213 @defun coding-system-list
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1214 This function returns a list of the names of all defined coding systems.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1215 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1216
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1217 @defun coding-system-name coding-system
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1218 This function returns the name of the given coding system.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1219 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1220
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1221 @defun coding-system-base coding-system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1222 This function returns the base coding system of @var{coding-system},
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1223 that is, the variant whose EOL convention is undecided.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1224 @end defun
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1225
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1226 @defun make-coding-system name type &optional doc-string props
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1227 This function registers symbol @var{name} as a coding system.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1228
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1229 @var{type} describes the conversion method used and should be one of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1230 the types listed in @ref{Coding System Types}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1231
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1232 @var{doc-string} is a string describing the coding system.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1233
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1234 @var{props} is a property list, describing the specific nature of the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1235 coding system. Recognized properties are as in @ref{Coding System
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1236 Properties}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1237 @end defun
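
A minimal sketch of a call to @code{make-coding-system} is shown below.
The name @code{example-gb-unix} and the mnemonic string are hypothetical,
and the sketch assumes that the charsets @code{ascii} and
@code{chinese-gb2312} are defined and that @code{get-charset} returns the
corresponding charset objects.

@example
@group
;; Sketch only.  `example-gb-unix' is a hypothetical name; `ascii' and
;; `chinese-gb2312' are assumed to be charsets defined in this XEmacs.
(make-coding-system
 'example-gb-unix 'iso2022
 "Example: ASCII in G0, GB 2312 in G1, LF line endings."
 (list 'charset-g0 (get-charset 'ascii)
       'charset-g1 (get-charset 'chinese-gb2312)
       'eol-type 'lf
       'mnemonic "ExGB"))
@end group
@end example

Because @code{eol-type} is given as @code{lf} here, no end-of-line
detection is performed and no subsidiary coding systems are generated.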
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1238
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1239 @defun copy-coding-system old-coding-system new-name
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1240 This function copies @var{old-coding-system} to @var{new-name}. If
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1241 @var{new-name} does not name an existing coding system, a new one will
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1242 be created.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1243 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1244
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1245 @defun subsidiary-coding-system coding-system eol-type
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1246 This function returns the subsidiary coding system of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1247 @var{coding-system} with eol type @var{eol-type}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1248 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1249
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1250 @node Coding System Property Functions, Encoding and Decoding Text, Basic Coding System Functions, Coding Systems
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1251 @subsection Coding System Property Functions
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1252
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1253 @defun coding-system-doc-string coding-system
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1254 This function returns the doc string for @var{coding-system}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1255 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1256
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1257 @defun coding-system-type coding-system
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1258 This function returns the type of @var{coding-system}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1259 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1260
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1261 @defun coding-system-property coding-system prop
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1262 This function returns the @var{prop} property of @var{coding-system}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1263 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1264
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1265 @node Encoding and Decoding Text, Detection of Textual Encoding, Coding System Property Functions, Coding Systems
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1266 @subsection Encoding and Decoding Text
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1267
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1268 @defun decode-coding-region start end coding-system &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1269 This function decodes the text between @var{start} and @var{end} which
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1270 is encoded in @var{coding-system}. This is useful if you've read in
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1271 encoded text from a file without decoding it (e.g. you read in a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1272 JIS-formatted file but used the @code{binary} or @code{no-conversion} coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1273 system, so that it shows up as @samp{^[$B!<!+^[(B}). The length of the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1274 decoded text is returned. @var{buffer} defaults to the current buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1275 if unspecified.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1276 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1277
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1278 @defun encode-coding-region start end coding-system &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1279 This function encodes the text between @var{start} and @var{end} using
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1280 @var{coding-system}. This will, for example, convert Japanese
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1281 characters into byte sequences such as @samp{^[$B!<!+^[(B} if you use the JIS
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1282 encoding. The length of the encoded text is returned. @var{buffer}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1283 defaults to the current buffer if unspecified.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1284 @end defun
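
The two functions can be tried together in a scratch buffer.  The
following is a sketch under the assumption that @code{iso-2022-jp} is
defined; the inserted string is the raw 7-bit form of the escape-sequence
sample used in the descriptions above.

@example
@group
;; Sketch only: assumes `iso-2022-jp' is defined.
(with-temp-buffer
  (insert "\e$B!<!+\e(B")
  ;; Raw bytes -> internal characters.
  (decode-coding-region (point-min) (point-max) 'iso-2022-jp)
  ;; Internal characters -> external byte sequence again.
  (encode-coding-region (point-min) (point-max) 'iso-2022-jp)
  (buffer-string))
@end group
@end example

The returned string can be compared with the original to check the round
trip.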
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1285
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1286 @node Detection of Textual Encoding, Big5 and Shift-JIS Functions, Encoding and Decoding Text, Coding Systems
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1287 @subsection Detection of Textual Encoding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1288
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1289 @defun coding-category-list
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1290 This function returns a list of all recognized coding categories.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1291 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1292
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1293 @defun set-coding-priority-list list
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1294 This function changes the priority order of the coding categories.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1295 @var{list} should be a list of coding categories, in descending order of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1296 priority. Unspecified coding categories will be lower in priority than
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1297 all specified ones, in the same relative order they were in previously.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1298 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1299
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1300 @defun coding-priority-list
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1301 This function returns a list of coding categories in descending order of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1302 priority.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1303 @end defun
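
A sketch of how the two priority functions might be combined, using only
the functions documented in this section: it saves the current order,
temporarily promotes one category, and restores the saved order
afterwards.  Which category to promote is an arbitrary choice made for
illustration.

@example
@group
(let ((saved (coding-priority-list)))
  (unwind-protect
      (progn
        ;; Promote the category that currently has the lowest priority.
        ;; Categories not named in the argument keep their relative
        ;; order behind it.
        (set-coding-priority-list (list (car (reverse saved))))
        ;; ... detection-sensitive work would go here ...
        (coding-priority-list))
    ;; Restore the original order even if an error occurred.
    (set-coding-priority-list saved)))
@end group
@end example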
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1304
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1305 @defun set-coding-category-system coding-category coding-system
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1306 This function changes the coding system associated with a coding category.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1307 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1308
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1309 @defun coding-category-system coding-category
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1310 This function returns the coding system associated with a coding category.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1311 @end defun
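
For illustration, the association between categories and coding systems
can be inspected by combining @code{coding-category-list} with
@code{coding-category-system}.  The second form is only a sketch of the
argument order of @code{set-coding-category-system}; it re-installs the
value that is already there, so it should leave the configuration
unchanged.

@example
@group
;; Build an alist mapping each coding category to its coding system.
(mapcar (lambda (category)
          (cons category (coding-category-system category)))
        (coding-category-list))
@end group

@group
;; Re-associate the first category with its current coding system,
;; skipping the call if the category has no coding system assigned.
(let* ((category (car (coding-category-list)))
       (system (coding-category-system category)))
  (if system
      (set-coding-category-system category system)))
@end group
@end example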
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1312
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1313 @defun detect-coding-region start end &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1314 This function detects the coding system of the text in the region between
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1315 @var{start} and @var{end}. The returned value is a list of possible coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1316 systems ordered by priority. If only ASCII characters are found, it
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1317 returns @code{autodetect} or one of its subsidiary coding systems
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1318 according to a detected end-of-line type. Optional arg @var{buffer}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1319 defaults to the current buffer.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1320 @end defun
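
The description above suggests that the result may be either a list of
candidates or a single coding system (in the ASCII-only case).  A
minimal sketch that picks the highest-priority candidate under that
assumption:

@example
@group
;; Detect the coding system of the whole current buffer.
(let ((result (detect-coding-region (point-min) (point-max))))
  ;; If a list of candidates was returned, take the first one, which
  ;; has the highest priority; otherwise use the value as is.
  (if (consp result)
      (car result)
    result))
@end group
@end example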
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1321
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1322 @node Big5 and Shift-JIS Functions, Predefined Coding Systems, Detection of Textual Encoding, Coding Systems
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1323 @subsection Big5 and Shift-JIS Functions
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1324
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1325 These are special functions for working with the non-standard
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1326 Shift-JIS and Big5 encodings.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1327
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1328 @defun decode-shift-jis-char code
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1329 This function decodes a JIS X 0208 character encoded in the Shift-JIS coding system.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1330 @var{code} is the character code in Shift-JIS as a cons of two bytes.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1331 The corresponding character is returned.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1332 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1333
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1334 @defun encode-shift-jis-char character
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1335 This function encodes the JIS X 0208 character @var{character} in the
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1336 Shift-JIS coding system. The corresponding character code in Shift-JIS
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1337 is returned as a cons of two bytes.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1338 @end defun
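
As a small sketch of the two Shift-JIS functions used together: in
Shift-JIS, HIRAGANA LETTER A is the byte pair 0x82 0xA0, so decoding
that pair and encoding the result again would be expected to round-trip
to an equal cons.

@example
@group
(let* ((code '(130 . 160))           ; bytes 0x82 and 0xA0
       (char (decode-shift-jis-char code)))
  ;; Encode the decoded character and compare with the original pair.
  (equal (encode-shift-jis-char char) code))
@end group
@end example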
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1339
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1340 @defun decode-big5-char code
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1341 This function decodes a character encoded in the Big5 coding system.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1342 @var{code} is the character code in Big5. The corresponding character
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1343 is returned.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1344 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1345
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1346 @defun encode-big5-char character
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1347 This function encodes the character @var{character} in the Big5
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1348 coding system. The corresponding character code in Big5 is returned.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1349 @end defun
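
The Big5 functions can be sketched the same way.  The text above does
not state the form of @var{code}; this example assumes it is a cons of
two bytes, as for the Shift-JIS functions.  In Big5 the byte pair
0xA4 0x40 is the ideograph for ``one''.

@example
@group
(let* ((code '(164 . 64))            ; bytes 0xA4 and 0x40 (assumed form)
       (char (decode-big5-char code)))
  ;; Encode the decoded character and compare with the original pair.
  (equal (encode-big5-char char) code))
@end group
@end example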
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1350
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1351 @node Predefined Coding Systems, , Big5 and Shift-JIS Functions, Coding Systems
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1352 @subsection Coding Systems Implemented
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1353
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1354 MULE initializes most of the commonly used coding systems at XEmacs's
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1355 startup. A few others are initialized only when the relevant language
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1356 environment is selected and support libraries are loaded. (NB: The
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1357 following list is based on XEmacs 21.2.19, the development branch at the
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1358 time of writing. The list may be somewhat different for other
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1359 versions. Recent versions of GNU Emacs 20 implement a few more rare
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1360 coding systems; work is being done to port these to XEmacs.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1361
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1362 Unfortunately, there is not a consistent naming convention for character
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1363 sets, and for practical purposes coding systems often take their name
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1364 from their principal character sets (ASCII, KOI8-R, Shift JIS). Others
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1365 take their names from the coding system (ISO-2022-JP, EUC-KR), and a few
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1366 from their non-text usages (internal, binary). To accommodate this, and
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1367 for the fact that many coding systems have several common names, an
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1368 aliasing system is provided. Finally, some effort has been made to use
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1369 names that are registered as MIME charsets (this is why the name
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1370 @code{shift_jis} contains that un-Lisp-y underscore).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1371
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1372 There is a systematic naming convention regarding end-of-line (EOL)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1373 conventions for different systems. A coding system whose name ends in
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1374 "-unix" forces the assumption that lines are broken by newlines (0x0A).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1375 A coding system whose name ends in "-mac" forces the assumption that
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1376 lines are broken by ASCII CRs (0x0D). A coding system whose name ends
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1377 in "-dos" forces the assumption that lines are broken by CRLF sequences
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1378 (0x0D 0x0A). These subsidiary coding systems are automatically derived
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1379 from a base coding system. Use of the base coding system implies
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1380 autodetection of the text file convention. (The fact that the -unix,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1381 -mac, and -dos variants are derived from a base system results in them showing up
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1382 as "aliases" in `list-coding-systems'.) These subsidiaries have a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1383 consistent modeline indicator as well. "-dos" coding systems have ":T"
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1384 appended to their modeline indicator, while "-mac" coding systems have
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1385 ":t" appended (e.g., "ISO8:t" for @code{iso-2022-8-mac}).
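
The suffix convention can be sketched in Lisp.  The helper below,
@code{my-eol-subsidiaries}, is hypothetical and not part of XEmacs; it
only constructs the three subsidiary names from a base name and does not
check that the coding systems actually exist.

@example
@group
(defun my-eol-subsidiaries (base)
  "Return the names of the EOL subsidiaries of the coding system BASE.
This is an illustrative helper, not an XEmacs function."
  (mapcar (lambda (suffix)
            ;; Append the EOL suffix to the base name and intern it.
            (intern (concat (symbol-name base) suffix)))
          '("-unix" "-mac" "-dos")))
@end group

@group
(my-eol-subsidiaries 'euc-jp)
     @result{} (euc-jp-unix euc-jp-mac euc-jp-dos)
@end group
@end example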
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1386
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1387 In the following table, each coding system is given with its modeline
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1388 indicator in parentheses. Non-textual coding systems are listed first,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1389 followed by textual coding systems and their aliases. (The coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1390 subsidiary modeline indicators ":T" and ":t" will be omitted from the
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1391 table of coding systems.)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1392
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1393 ### SJT 1999-08-23 Maybe should order these by language? Definitely
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1394 need language usage for the ISO-8859 family.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1395
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1396 Note that although true coding system aliases have been implemented for
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1397 XEmacs 21.2, the coding system initialization has not yet been converted
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1398 as of 21.2.19. So coding systems described as aliases have the same
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1399 properties as the aliased coding system, but will not be equal as Lisp
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1400 objects.
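
One way to observe this, sketched under the assumption that
@code{find-coding-system} returns the coding system object for a name:
compare a coding system with one described as its alias in the table
below (here @code{binary} and @code{raw-text-unix}).

@example
@group
(let ((alias  (find-coding-system 'binary))
      (target (find-coding-system 'raw-text-unix)))
  ;; With the initialization described above, the two objects are
  ;; expected to be distinct even though their properties match.
  (eq alias target))
@end group
@end example

Code that needs to treat aliases as equivalent should therefore not rely
on @code{eq} alone.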
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1401
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1402 @table @code
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1403
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1404 @item automatic-conversion
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1405 @itemx undecided
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1406 @itemx undecided-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1407 @itemx undecided-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1408 @itemx undecided-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1409
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1410 Modeline indicator: @code{Auto}. A type @code{undecided} coding system.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1411 Attempts to determine an appropriate coding system from file contents or
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1412 the environment.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1413
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1414 @item raw-text
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1415 @itemx no-conversion
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1416 @itemx raw-text-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1417 @itemx raw-text-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1418 @itemx raw-text-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1419 @itemx no-conversion-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1420 @itemx no-conversion-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1421 @itemx no-conversion-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1422
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1423 Modeline indicator: @code{Raw}. A type @code{no-conversion} coding system,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1424 which converts only line-break-codes. An implementation quirk means
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1425 that this coding system is also used for ISO8859-1.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1426
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1427 @item binary
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1428 Modeline indicator: @code{Binary}. A type @code{no-conversion} coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1429 system which does no character coding or EOL conversions. An alias for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1430 @code{raw-text-unix}.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1431
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1432 @item alternativnyj
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1433 @itemx alternativnyj-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1434 @itemx alternativnyj-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1435 @itemx alternativnyj-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1436
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1437 Modeline indicator: @code{Cy.Alt}. A type @code{ccl} coding system used for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1438 Alternativnyj, an encoding of the Cyrillic alphabet.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1439
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1440 @item big5
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1441 @itemx big5-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1442 @itemx big5-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1443 @itemx big5-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1444
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1445 Modeline indicator: @code{Zh/Big5}. A type @code{big5} coding system used for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1446 BIG5, the most common encoding of traditional Chinese as used in Taiwan.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1447
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1448 @item cn-gb-2312
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1449 @itemx cn-gb-2312-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1450 @itemx cn-gb-2312-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1451 @itemx cn-gb-2312-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1452
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1453 Modeline indicator: @code{Zh-GB/EUC}. A type @code{iso2022} coding system used
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1454 for simplified Chinese (as used in the People's Republic of China), with
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1455 the @code{ascii} (G0), @code{chinese-gb2312} (G1), and @code{sisheng}
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1456 (G2) character sets initially designated. Chinese EUC (Extended Unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1457 Code).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1458
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1459 @item ctext-hebrew
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1460 @itemx ctext-hebrew-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1461 @itemx ctext-hebrew-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1462 @itemx ctext-hebrew-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1463
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1464 Modeline indicator: @code{CText/Hbrw}. A type @code{iso2022} coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1465 with the @code{ascii} (G0) and @code{hebrew-iso8859-8} (G1) character
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1466 sets initially designated for Hebrew.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1467
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1468 @item ctext
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1469 @itemx ctext-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1470 @itemx ctext-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1471 @itemx ctext-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1472
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1473 Modeline indicator: @code{CText}. A type @code{iso2022} 8-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1474 with the @code{ascii} (G0) and @code{latin-iso8859-1} (G1) character
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1475 sets initially designated. X11 Compound Text Encoding. Often
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1476 mistakenly recognized instead of EUC encodings; the usual cause is an
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1477 inappropriate setting of @code{coding-priority-list}.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1478
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1479 @item escape-quoted
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1480
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1481 Modeline indicator: @code{ESC/Quot}. A type @code{iso2022} 8-bit coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1482 system with the @code{ascii} (G0) and @code{latin-iso8859-1} (G1)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1483 character sets initially designated and escape quoting. Unix EOL
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1484 conversion (i.e., no conversion). It is used for @file{.elc} files.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1485
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1486 @item euc-jp
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1487 @itemx euc-jp-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1488 @itemx euc-jp-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1489 @itemx euc-jp-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1490
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1491 Modeline indicator: @code{Ja/EUC}. A type @code{iso2022} 8-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1492 with @code{ascii} (G0), @code{japanese-jisx0208} (G1),
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1493 @code{katakana-jisx0201} (G2), and @code{japanese-jisx0212} (G3)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1494 initially designated. Japanese EUC (Extended Unix Code).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1495
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1496 @item euc-kr
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1497 @itemx euc-kr-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1498 @itemx euc-kr-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1499 @itemx euc-kr-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1500
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1501 Modeline indicator: @code{ko/EUC}. A type @code{iso2022} 8-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1502 with @code{ascii} (G0) and @code{korean-ksc5601} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1503 designated. Korean EUC (Extended Unix Code).
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1504
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1505 @item hz-gb-2312
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1506 Modeline indicator: @code{Zh-GB/Hz}. A type @code{no-conversion} coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1507 system with Unix EOL convention (i.e., no conversion) using
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1508 post-read-decode and pre-write-encode functions to translate the Hz/ZW
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1509 coding system used for Chinese.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1510
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1511 @item iso-2022-7bit
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1512 @itemx iso-2022-7bit-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1513 @itemx iso-2022-7bit-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1514 @itemx iso-2022-7bit-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1515 @itemx iso-2022-7
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1516
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1517 Modeline indicator: @code{ISO7}. A type @code{iso2022} 7-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1518 with @code{ascii} (G0) initially designated. Other character sets must
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1519 be explicitly designated to be used.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1520
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1521 @item iso-2022-7bit-ss2
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1522 @itemx iso-2022-7bit-ss2-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1523 @itemx iso-2022-7bit-ss2-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1524 @itemx iso-2022-7bit-ss2-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1525
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1526 Modeline indicator: @code{ISO7/SS}. A type @code{iso2022} 7-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1527 with @code{ascii} (G0) initially designated. Other character sets must
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1528 be explicitly designated to be used. SS2 is used to invoke a
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1529 96-charset, one character at a time.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1530
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1531 @item iso-2022-8
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1532 @itemx iso-2022-8-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1533 @itemx iso-2022-8-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1534 @itemx iso-2022-8-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1535
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1536 Modeline indicator: @code{ISO8}. A type @code{iso2022} 8-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1537 with @code{ascii} (G0) and @code{latin-iso8859-1} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1538 designated. Other character sets must be explicitly designated to be
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1539 used. No single-shift or locking-shift.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1540
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1541 @item iso-2022-8bit-ss2
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1542 @itemx iso-2022-8bit-ss2-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1543 @itemx iso-2022-8bit-ss2-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1544 @itemx iso-2022-8bit-ss2-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1545
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1546 Modeline indicator: @code{ISO8/SS}. A type @code{iso2022} 8-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1547 with @code{ascii} (G0) and @code{latin-iso8859-1} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1548 designated. Other character sets must be explicitly designated to be
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1549 used. SS2 is used to invoke a 96-charset, one character at a time.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1550
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1551 @item iso-2022-int-1
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1552 @itemx iso-2022-int-1-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1553 @itemx iso-2022-int-1-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1554 @itemx iso-2022-int-1-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1555
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1556 Modeline indicator: @code{INT-1}. A type @code{iso2022} 7-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1557 with @code{ascii} (G0) and @code{korean-ksc5601} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1558 designated. ISO-2022-INT-1.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1559
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1560 @item iso-2022-jp-1978-irv
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1561 @itemx iso-2022-jp-1978-irv-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1562 @itemx iso-2022-jp-1978-irv-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1563 @itemx iso-2022-jp-1978-irv-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1564
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1565 Modeline indicator: @code{Ja-78/7bit}. A type @code{iso2022} 7-bit coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1566 system. For compatibility with old Japanese terminals; if you need to
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1567 know, look at the source.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1568
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1569 @item iso-2022-jp
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1570 @itemx iso-2022-jp-2 (ISO7/SS)
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1571 @itemx iso-2022-jp-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1572 @itemx iso-2022-jp-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1573 @itemx iso-2022-jp-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1574 @itemx iso-2022-jp-2-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1575 @itemx iso-2022-jp-2-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1576 @itemx iso-2022-jp-2-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1577
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1578 Modeline indicator: @code{MULE/7bit}. A type @code{iso2022} 7-bit coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1579 system with @code{ascii} (G0) initially designated, and complex
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1580 specifications to ensure backward compatibility with old Japanese
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1581 systems. Used for communication with mail and news in Japan. The "-2"
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1582 versions also use SS2 to invoke a 96-charset one character at a time.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1583
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1584 @item iso-2022-kr
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1585 Modeline indicator: @code{Ko/7bit}. A type @code{iso2022} 7-bit coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1586 system with @code{ascii} (G0) and @code{korean-ksc5601} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1587 designated. Used for e-mail in Korea.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1588
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1589 @item iso-2022-lock
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1590 @itemx iso-2022-lock-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1591 @itemx iso-2022-lock-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1592 @itemx iso-2022-lock-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1593
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1594 Modeline indicator: @code{ISO7/Lock}. A type @code{iso2022} 7-bit coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1595 system with @code{ascii} (G0) initially designated, using Locking-Shift
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1596 to invoke a 96-charset.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1597
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1598 @item iso-8859-1
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1599 @itemx iso-8859-1-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1600 @itemx iso-8859-1-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1601 @itemx iso-8859-1-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1602
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1603 For implementation reasons, this is not a type @code{iso2022} coding system,
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1604 but rather an alias for the @code{raw-text} coding system.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1605
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1606 @item iso-8859-2
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1607 @itemx iso-8859-2-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1608 @itemx iso-8859-2-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1609 @itemx iso-8859-2-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1610
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1611 Modeline indicator: @code{MIME/Ltn-2}. A type @code{iso2022} coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1612 system with @code{ascii} (G0) and @code{latin-iso8859-2} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1613 invoked.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1614
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1615 @item iso-8859-3
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1616 @itemx iso-8859-3-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1617 @itemx iso-8859-3-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1618 @itemx iso-8859-3-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1619
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1620 Modeline indicator: @code{MIME/Ltn-3}. A type @code{iso2022} coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1621 with @code{ascii} (G0) and @code{latin-iso8859-3} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1622 invoked.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1623
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1624 @item iso-8859-4
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1625 @itemx iso-8859-4-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1626 @itemx iso-8859-4-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1627 @itemx iso-8859-4-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1628
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1629 Modeline indicator: @code{MIME/Ltn-4}. A type @code{iso2022} coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1630 with @code{ascii} (G0) and @code{latin-iso8859-4} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1631 invoked.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1632
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1633 @item iso-8859-5
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1634 @itemx iso-8859-5-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1635 @itemx iso-8859-5-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1636 @itemx iso-8859-5-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1637
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1638 Modeline indicator: @code{ISO8/Cyr}. A type @code{iso2022} coding system with
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1639 @code{ascii} (G0) and @code{cyrillic-iso8859-5} (G1) initially invoked.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1640
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1641 @item iso-8859-7
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1642 @itemx iso-8859-7-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1643 @itemx iso-8859-7-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1644 @itemx iso-8859-7-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1645
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1646 Modeline indicator: @code{Grk}. A type @code{iso2022} coding system with
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1647 @code{ascii} (G0) and @code{greek-iso8859-7} (G1) initially invoked.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1648
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1649 @item iso-8859-8
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1650 @itemx iso-8859-8-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1651 @itemx iso-8859-8-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1652 @itemx iso-8859-8-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1653
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1654 Modeline indicator: @code{MIME/Hbrw}. A type @code{iso2022} coding system with
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1655 @code{ascii} (G0) and @code{hebrew-iso8859-8} (G1) initially invoked.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1656
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1657 @item iso-8859-9
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1658 @itemx iso-8859-9-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1659 @itemx iso-8859-9-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1660 @itemx iso-8859-9-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1661
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1662 Modeline indicator: @code{MIME/Ltn-5}. A type @code{iso2022} coding system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1663 with @code{ascii} (G0) and @code{latin-iso8859-9} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1664 invoked.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1665
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1666 @item koi8-r
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1667 @itemx koi8-r-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1668 @itemx koi8-r-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1669 @itemx koi8-r-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1670
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1671 Modeline indicator: @code{KOI8}. A type @code{ccl} coding-system used for
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1672 KOI8-R, an encoding of the Cyrillic alphabet.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1673
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1674 @item shift_jis
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1675 @itemx shift_jis-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1676 @itemx shift_jis-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1677 @itemx shift_jis-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1678
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1679 Modeline indicator: @code{Ja/SJIS}. A type @code{shift-jis} coding-system
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1680 implementing the Shift-JIS encoding for Japanese. The underscore is to
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1681 conform to the MIME charset implementing this encoding.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1682
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1683 @item tis-620
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1684 @itemx tis-620-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1685 @itemx tis-620-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1686 @itemx tis-620-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1687
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1688 Modeline indicator: @code{TIS620}. A type @code{ccl} encoding for Thai. The
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1689 external encoding is defined by TIS620; the internal encoding is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1690 peculiar to MULE, and called @code{thai-xtis}.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1691
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1692 @item viqr
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1693
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1694 Modeline indicator: @code{VIQR}. A type @code{no-conversion} coding
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1695 system with Unix EOL convention (i.e., no conversion) using
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1696 post-read-decode and pre-write-encode functions to translate the VIQR
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1697 coding system for Vietnamese.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1698
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1699 @item viscii
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1700 @itemx viscii-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1701 @itemx viscii-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1702 @itemx viscii-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1703
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1704 Modeline indicator: @code{VISCII}. A type @code{ccl} coding-system used
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1705 for VISCII 1.1 for Vietnamese. Differs slightly from VSCII; VISCII is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1706 given priority by XEmacs.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1707
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1708 @item vscii
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1709 @itemx vscii-dos
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1710 @itemx vscii-mac
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1711 @itemx vscii-unix
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1712
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1713 Modeline indicator: @code{VSCII}. A type @code{ccl} coding-system used
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1714 for VSCII 1.1 for Vietnamese. Differs slightly from VISCII, which is
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1715 given priority by XEmacs. Use
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1716 @code{(prefer-coding-system 'vietnamese-vscii)} to give priority to VSCII.
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1717
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1718 @end table
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1719
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1720 @node CCL, Category Tables, Coding Systems, MULE
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1721 @section CCL
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1722
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1723 CCL (Code Conversion Language) is a simple structured programming
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1724 language designed for character coding conversions. A CCL program is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1725 compiled to CCL code (represented by a vector of integers) and executed
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1726 by the CCL interpreter embedded in Emacs. The CCL interpreter
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1727 implements a virtual machine with 8 registers called @code{r0}, ...,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1728 @code{r7}, a number of control structures, and some I/O operators. Take
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1729 care when using registers @code{r0} (used in implicit @dfn{set}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1730 statements) and especially @code{r7} (used internally by several
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1731 statements and operations, especially for multiple return values and I/O
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1732 operations).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1733
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1734 CCL is used for code conversion during process I/O and file I/O for
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1735 non-ISO2022 coding systems. (It is the only way for a user to specify a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1736 code conversion function.) It is also used for calculating the code
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1737 point of an X11 font from a character code. However, since CCL is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1738 designed as a powerful programming language, it can be used for more
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1739 generic calculation where efficiency is demanded. A combination of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1740 three or more arithmetic operations can be calculated faster by CCL than
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1741 by Emacs Lisp.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1742
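As a rough illustration of using CCL for plain arithmetic, the sketch
below defines a program with no I/O and runs it on a register vector.
It assumes the @code{define-ccl-program} macro and the
@code{ccl-execute} function behave as in @file{mule-ccl.el}; the
program name @code{my-ccl-arith-sketch} is hypothetical and does not
appear in the XEmacs sources.

@example
;; Sketch only: `my-ccl-arith-sketch' is a hypothetical name.
(define-ccl-program my-ccl-arith-sketch
  '(0                            ; buffer magnification; this program writes nothing
    ((r0 = ((r0 * 3) + 1)))))    ; main block: r0 := r0 * 3 + 1

;; Registers r0..r7 are passed in, and handed back, in an 8-element vector.
(let ((regs (vector 5 0 0 0 0 0 0 0)))
  (ccl-execute 'my-ccl-arith-sketch regs)
  (aref regs 0))                 ; value of r0 after the program has run
@end example

Whether this is actually faster than the equivalent Lisp for a given
calculation is best checked by measurement.
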
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1743 @strong{Warning:} The code in @file{src/mule-ccl.c} and
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1744 @file{$packages/lisp/mule-base/mule-ccl.el} is the definitive
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1745 description of CCL's semantics. The previous version of this section
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1746 contained several typos and obsolete names left from earlier versions of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1747 MULE, and many may remain. (I am not an experienced CCL programmer; the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1748 few who know CCL well find writing English painful.)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1749
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1750 A CCL program transforms an input data stream into an output data
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1751 stream. The input stream, held in a buffer of constant bytes, is left
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1752 unchanged. The buffer may be filled by an external input operation,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1753 taken from an Emacs buffer, or taken from a Lisp string. The output
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1754 buffer is a dynamic array of bytes, which can be written by an external
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1755 output operation, inserted into an Emacs buffer, or returned as a Lisp
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1756 string.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1757
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1758 A CCL program is a (Lisp) list containing two or three members. The
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1759 first member is the @dfn{buffer magnification}, which indicates the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1760 required minimum size of the output buffer as a multiple of the input
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1761 buffer. It is followed by the @dfn{main block} which executes while
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1762 there is input remaining, and an optional @dfn{EOF block} which is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1763 executed when the input is exhausted. Both the main block and the EOF
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1764 block are CCL blocks.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1765
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1766 A @dfn{CCL block} is either a CCL statement or a list of CCL statements.
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1767 A @dfn{CCL statement} is either a @dfn{set statement} (either an integer
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1768 or an @dfn{assignment}, which is a list of a register to receive the
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1769 assignment, an assignment operator, and an expression) or a @dfn{control
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1770 statement} (a list starting with a keyword, whose allowable syntax
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1771 depends on the keyword).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1772
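The three-member layout described above can be sketched as follows.
This is an illustrative program only, not one taken from the XEmacs
sources, and the variable name @code{my-ccl-copy-sketch} is
hypothetical. It is intended to copy each input byte to the output and
to append a newline once the input is exhausted.

@example
;; Sketch only: `my-ccl-copy-sketch' is a hypothetical name.
(defconst my-ccl-copy-sketch
  '(1                      ; buffer magnification: output at most 1x input
    ((loop                 ; main block: a list holding one loop statement
      (read r0)            ;   read the next input byte into r0
      (write r0)           ;   write it unchanged
      (repeat)))           ;   jump back to the start of the loop
    ((write ?\n))))        ; EOF block: a list holding one write statement
@end example

Here both blocks are written as lists of statements; since a CCL block
may also be a single statement, the extra level of parentheses around
each block is optional.
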
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1773 @menu
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1774 * CCL Syntax:: CCL program syntax in BNF notation.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1775 * CCL Statements:: Semantics of CCL statements.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1776 * CCL Expressions:: Operators and expressions in CCL.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1777 * Calling CCL:: Running CCL programs.
2640
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
1778 * CCL Example:: A trivial program to transform the Web's URL encoding.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1779 @end menu
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1780
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1781 @node CCL Syntax, CCL Statements, , CCL
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1782 @comment Node, Next, Previous, Up
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1783 @subsection CCL Syntax
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1784
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1785 The full syntax of a CCL program in BNF notation:
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1786
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1787 @format
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1788 CCL_PROGRAM :=
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1789 (BUFFER_MAGNIFICATION
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1790 CCL_MAIN_BLOCK
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1791 [ CCL_EOF_BLOCK ])
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1792
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1793 BUFFER_MAGNIFICATION := integer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1794 CCL_MAIN_BLOCK := CCL_BLOCK
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1795 CCL_EOF_BLOCK := CCL_BLOCK
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1796
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1797 CCL_BLOCK :=
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1798 STATEMENT | (STATEMENT [STATEMENT ...])
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1799 STATEMENT :=
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 1734
diff changeset
1800 SET | IF | BRANCH | LOOP | REPEAT | BREAK | READ | WRITE | CALL
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 1734
diff changeset
1801 | TRANSLATE | MAP | END
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1802
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1803 SET :=
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1804 (REG = EXPRESSION)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1805 | (REG ASSIGNMENT_OPERATOR EXPRESSION)
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 1734
diff changeset
1806 | INT-OR-CHAR
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1807
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1808 EXPRESSION := ARG | (EXPRESSION OPERATOR ARG)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1809
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1810 IF := (if EXPRESSION CCL_BLOCK [CCL_BLOCK])
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1811 BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1812 LOOP := (loop STATEMENT [STATEMENT ...])
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1813 BREAK := (break)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1814 REPEAT :=
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1815 (repeat)
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 1734
diff changeset
1816 | (write-repeat [REG | INT-OR-CHAR | string])
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 1734
diff changeset
1817 | (write-read-repeat REG [INT-OR-CHAR | ARRAY])
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1818 READ :=
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1819 (read REG ...)
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 1734
diff changeset
1820 | (read-if (REG OPERATOR ARG) CCL_BLOCK [CCL_BLOCK])
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1821 | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1822 WRITE :=
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1823 (write REG ...)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1824 | (write EXPRESSION)
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 1734
diff changeset
1825 | (write INT-OR-CHAR) | (write string) | (write REG ARRAY)
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1826 | string
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1827 CALL := (call ccl-program-name)
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1828
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1829
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1830 TRANSLATE := ;; Not implemented under XEmacs, except mule-to-unicode and
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1831 ;; unicode-to-mule.
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1832 (translate-character REG(table) REG(charset) REG(codepoint))
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1833 | (translate-character SYMBOL REG(charset) REG(codepoint))
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1834 | (mule-to-unicode REG(charset) REG(codepoint))
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1835 | (unicode-to-mule REG(unicode,code) REG(CHARSET))
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1836
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1837 END := (end)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1838
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1839 REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 1734
diff changeset
1840 ARG := REG | INT-OR-CHAR
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1841 OPERATOR :=
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1842 + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1843 | < | > | == | <= | >= | != | de-sjis | en-sjis
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1844 ASSIGNMENT_OPERATOR :=
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1845 += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 1734
diff changeset
1846 ARRAY := '[' INT-OR-CHAR ... ']'
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 1734
diff changeset
1847 INT-OR-CHAR := integer | character
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 1734
diff changeset
1848
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1849 @end format
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1850
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1851 @node CCL Statements, CCL Expressions, CCL Syntax, CCL
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1852 @comment Node, Next, Previous, Up
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1853 @subsection CCL Statements
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1854
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1855 The Emacs Code Conversion Language provides the following statement
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1856 types: @dfn{set}, @dfn{if}, @dfn{branch}, @dfn{loop}, @dfn{repeat},
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1857 @dfn{break}, @dfn{read}, @dfn{write}, @dfn{call}, @dfn{translate} and
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1858 @dfn{end}.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1859
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1860 @heading Set statement:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1861
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1862 The @dfn{set} statement has three variants with the syntaxes
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1863 @samp{(@var{reg} = @var{expression})},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1864 @samp{(@var{reg} @var{assignment_operator} @var{expression})}, and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1865 @samp{@var{integer}}. The assignment operator variation of the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1866 @dfn{set} statement works the same way as the corresponding C expression
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1867 statement does. The assignment operators are @code{+=}, @code{-=},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1868 @code{*=}, @code{/=}, @code{%=}, @code{&=}, @code{|=}, @code{^=},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1869 @code{<<=}, and @code{>>=}, and they have the same meanings as in C. A
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1870 ``naked integer'' @var{integer} is equivalent to a @dfn{set} statement of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1871 the form @code{(r0 = @var{integer})}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1872
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1873 @heading I/O statements:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1874
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1875 The @dfn{read} statement takes one or more registers as arguments. It
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1876 reads one byte (a C char) from the input into each register in turn.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1877
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1878 The @dfn{write} statement takes several forms. In the form @samp{(write @var{reg}
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1879 ...)} it takes one or more registers as arguments and writes each in
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1880 turn to the output. The integer in a register (interpreted as an
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 1734
diff changeset
1881 Ichar) is encoded to multibyte form (i.e., Ibytes) and written to the
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1882 current output buffer. If it is less than 256, it is written as is.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1883 The forms @samp{(write @var{expression})} and @samp{(write
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1884 @var{integer})} are treated analogously. The form @samp{(write
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1885 @var{string})} writes the constant string to the output. A
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1886 ``naked string'' @samp{@var{string}} is equivalent to the statement @samp{(write
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1887 @var{string})}. The form @samp{(write @var{reg} @var{array})} writes
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1888 the @var{reg}th element of the @var{array} to the output.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1889
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1890 @heading Conditional statements:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1891
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1892 The @dfn{if} statement takes an @var{expression}, a @var{CCL block}, and
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1893 an optional @var{second CCL block} as arguments. If the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1894 @var{expression} evaluates to non-zero, the first @var{CCL block} is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1895 executed. Otherwise, if there is a @var{second CCL block}, it is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1896 executed.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1897
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1898 The @dfn{read-if} variant of the @dfn{if} statement takes an
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1899 @var{expression}, a @var{CCL block}, and an optional @var{second CCL
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1900 block} as arguments. The @var{expression} must have the form
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1901 @code{(@var{reg} @var{operator} @var{operand})} (where @var{operand} is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1902 a register or an integer). The @code{read-if} statement first reads
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1903 from the input into the first register operand in the @var{expression},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1904 then conditionally executes a CCL block just as the @code{if} statement
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1905 does.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1906
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1907 The @dfn{branch} statement takes an @var{expression} and one or more CCL
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1908 blocks as arguments. The CCL blocks are treated as a zero-indexed
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1909 array, and the @code{branch} statement uses the @var{expression} as the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1910 index of the CCL block to execute. Null CCL blocks may be used as
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1911 no-ops, continuing execution with the statement following the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1912 @code{branch} statement in the containing CCL block. Out-of-range
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1913 values for the @var{expression} are also treated as no-ops.
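
The dispatch rule of @code{branch} can be modeled outside Emacs. The
following Python sketch is only an illustration and not part of CCL:
@code{run_branch} and the list-of-callables representation of CCL blocks
are hypothetical, and @code{None} stands in for a null block.

```python
def run_branch(index, blocks):
    """Run the block selected by index; return True if a block ran."""
    # Out-of-range values select nothing, so execution simply
    # continues with the statement after the branch.
    if index < 0 or index >= len(blocks):
        return False
    block = blocks[index]
    # A null block (None in this model) is likewise a no-op.
    if block is None:
        return False
    block()
    return True

log = []
blocks = [lambda: log.append("zero"), None, lambda: log.append("two")]
run_branch(0, blocks)   # runs the first block
run_branch(1, blocks)   # null block: nothing happens
run_branch(2, blocks)   # runs the third block
run_branch(7, blocks)   # out of range: nothing happens
```

After these four calls only the first and third blocks have run.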
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1914
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1915 The @dfn{read-branch} variant of the @dfn{branch} statement takes a
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1916 @var{register} and one or more CCL blocks
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1917 as arguments. The @code{read-branch} statement first reads from
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1918 the input into the @var{register}, then conditionally executes a CCL
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1919 block just as the @code{branch} statement does.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1920
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1921 @heading Loop control statements:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1922
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1923 The @dfn{loop} statement creates a block with an implied jump from the
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1924 end of the block back to its head. The loop is exited on a @code{break}
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1925 statement, and continued without executing the tail by a @code{repeat}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1926 statement.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1927
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1928 The @dfn{break} statement, written @samp{(break)}, terminates the
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1929 current loop and continues with the next statement in the current
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1930 block.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1931
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1932 The @dfn{repeat} statement has three variants, @code{repeat},
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1933 @code{write-repeat}, and @code{write-read-repeat}. Each continues the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1934 current loop from its head, possibly after performing I/O.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1935 @code{repeat} takes no arguments and does no I/O before jumping.
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
1936 @code{write-repeat} takes a single argument (a register, an
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1937 integer, or a string), writes it to the output, then jumps.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1938 @code{write-read-repeat} takes one or two arguments. The first must
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1939 be a register. The second may be an integer or an array; if absent, it
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1940 is implicitly set to the first (register) argument.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1941 @code{write-read-repeat} writes its second argument to the output, then
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1942 reads from the input into the register, and finally jumps. See the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1943 @code{write} and @code{read} statements for the semantics of the I/O
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1944 operations for each type of argument.
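
As a rough model of how @code{loop} and @code{write-read-repeat}
interact, the Python sketch below imitates a program that reads one byte
into @code{r0} and then loops on @samp{(write-read-repeat r0)}. The
function name @code{copy_loop} is hypothetical, and the sketch assumes
that running out of input simply ends the program.

```python
def copy_loop(data):
    """Copy input bytes to output, one register-sized value at a time."""
    out = bytearray()
    source = iter(data)
    try:
        r0 = next(source)        # initial read into r0
        while True:              # the loop block
            out.append(r0)       # write the register to the output
            r0 = next(source)    # read the next byte into r0
            continue             # repeat: jump back to the loop head
    except StopIteration:
        pass                     # assumption: exhausted input ends the run
    return bytes(out)

result = copy_loop(b"abc")
```

Because every value read here is below 256, each one is written as is.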
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1945
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1946 @heading Other statements:
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1947
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1948 The @dfn{call} statement, written @samp{(call @var{ccl-program-name})},
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1949 executes a CCL program as a subroutine. It does not return a value to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1950 the caller, but can modify the register status.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1951
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1952 The @dfn{mule-to-unicode} statement translates an XEmacs character into a
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1953 UCS code point, using U+FFFD REPLACEMENT CHARACTER if the given XEmacs
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1954 character has no known corresponding code point. It takes two
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1955 arguments; the first is a register in which is stored the character set
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1956 ID of the character to be translated, and into which the UCS code is
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1957 stored. The second is a register which stores the XEmacs code of the
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1958 character in question; if it is from a multidimensional character set,
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1959 like most of the East Asian national sets, it's stored as @samp{((c1 <<
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1960 8) | c2)}, where @samp{c1} is the first code, and @samp{c2} the second.
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1961 (That is, as a single integer, the high-order eight bits of which encode
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1962 the first position code, and the low order bits of which encode the
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1963 second.)
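
The packing described in the parenthesis amounts to a shift followed by
a bitwise OR. The Python sketch below shows the arithmetic only; the
helper names @code{pack_code} and @code{unpack_code} are hypothetical,
and the sample position codes are arbitrary.

```python
def pack_code(c1, c2):
    """Combine two position codes into a single integer."""
    return (c1 << 8) | c2

def unpack_code(code):
    """Split a packed integer back into the pair (c1, c2)."""
    return code >> 8, code & 0xFF

packed = pack_code(0x30, 0x21)
pair = unpack_code(packed)
```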
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1964
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1965 The @dfn{unicode-to-mule} statement translates a Unicode code point
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1966 (an integer) into an XEmacs character. Its first argument is a register
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1967 containing the UCS code point; the code for the corresponding character
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1968 will be written into this register, in the same format as for
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1969 @samp{mule-to-unicode}. The second argument is a register into which will
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1970 be written the character set ID of the converted character.
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 2818
diff changeset
1971
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1972 The @dfn{end} statement, written @samp{(end)}, terminates the CCL
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1973 program successfully, and returns to caller (which may be a CCL
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1974 program). It does not alter the status of the registers.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1975
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1976 @node CCL Expressions, Calling CCL, CCL Statements, CCL
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1977 @comment Node, Next, Previous, Up
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1978 @subsection CCL Expressions
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1979
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1980 CCL, unlike Lisp, uses infix expressions. The simplest CCL expressions
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1981 consist of a single @var{operand}, either a register (one of @code{r0},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1982 ..., @code{r7}) or an integer. Complex expressions are lists of the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1983 form @code{( @var{expression} @var{operator} @var{operand} )}. Unlike
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1984 C, assignments are not expressions.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1985
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
1986 In the following table, @var{X} is the target register for a @dfn{set}.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1987 In subexpressions, this is implicitly @code{r7}. This means that
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1988 @code{>8}, @code{//}, @code{de-sjis}, and @code{en-sjis} cannot be used
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1989 freely in subexpressions, since they return parts of their values in
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1990 @code{r7}. @var{Y} may be an expression, register, or integer, while
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1991 @var{Z} must be a register or an integer.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1992
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1993 @multitable @columnfractions .22 .14 .09 .55
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1994 @item Name @tab Operator @tab Code @tab C-like Description
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1995 @item CCL_PLUS @tab @code{+} @tab 0x00 @tab X = Y + Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1996 @item CCL_MINUS @tab @code{-} @tab 0x01 @tab X = Y - Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1997 @item CCL_MUL @tab @code{*} @tab 0x02 @tab X = Y * Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1998 @item CCL_DIV @tab @code{/} @tab 0x03 @tab X = Y / Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1999 @item CCL_MOD @tab @code{%} @tab 0x04 @tab X = Y % Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2000 @item CCL_AND @tab @code{&} @tab 0x05 @tab X = Y & Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2001 @item CCL_OR @tab @code{|} @tab 0x06 @tab X = Y | Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2002 @item CCL_XOR @tab @code{^} @tab 0x07 @tab X = Y ^ Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2003 @item CCL_LSH @tab @code{<<} @tab 0x08 @tab X = Y << Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2004 @item CCL_RSH @tab @code{>>} @tab 0x09 @tab X = Y >> Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2005 @item CCL_LSH8 @tab @code{<8} @tab 0x0A @tab X = (Y << 8) | Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2006 @item CCL_RSH8 @tab @code{>8} @tab 0x0B @tab X = Y >> 8, r[7] = Y & 0xFF
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2007 @item CCL_DIVMOD @tab @code{//} @tab 0x0C @tab X = Y / Z, r[7] = Y % Z
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2008 @item CCL_LS @tab @code{<} @tab 0x10 @tab X = (Y < Z)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2009 @item CCL_GT @tab @code{>} @tab 0x11 @tab X = (Y > Z)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2010 @item CCL_EQ @tab @code{==} @tab 0x12 @tab X = (Y == Z)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2011 @item CCL_LE @tab @code{<=} @tab 0x13 @tab X = (Y <= Z)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2012 @item CCL_GE @tab @code{>=} @tab 0x14 @tab X = (Y >= Z)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2013 @item CCL_NE @tab @code{!=} @tab 0x15 @tab X = (Y != Z)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2014 @item CCL_ENCODE_SJIS @tab @code{en-sjis} @tab 0x16 @tab X = HIGHER_BYTE (SJIS (Y, Z))
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2015 @item @tab @tab @tab r[7] = LOWER_BYTE (SJIS (Y, Z))
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2016 @item CCL_DECODE_SJIS @tab @code{de-sjis} @tab 0x17 @tab X = HIGHER_BYTE (DE-SJIS (Y, Z))
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2017 @item @tab @tab @tab r[7] = LOWER_BYTE (DE-SJIS (Y, Z))
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2018 @end multitable
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2019
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
2020 The CCL operators are as in C, with the addition of CCL_LSH8, CCL_RSH8,
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2021 CCL_DIVMOD, CCL_ENCODE_SJIS, and CCL_DECODE_SJIS. The CCL_ENCODE_SJIS
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2022 and CCL_DECODE_SJIS operators treat their first and second operands as the high and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2023 low bytes of a two-byte character code. (SJIS stands for Shift JIS, an
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2024 encoding of Japanese characters used by Microsoft. CCL_ENCODE_SJIS is a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2025 complicated transformation of the Japanese standard JIS encoding to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2026 Shift JIS. CCL_DECODE_SJIS is its inverse.) It is somewhat odd to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2027 represent the SJIS operations in infix form.
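
The three byte-oriented additions to the C operator set can be checked
with ordinary integer arithmetic. The Python sketch below mirrors the
``C-like Description'' column of the table; the function names are
hypothetical, each function returns the pair (X, r7) where the table
lists a second result, and operands are assumed to be non-negative.

```python
def lsh8(y, z):
    """<8: shift Y left by one byte and place Z in the low byte."""
    return (y << 8) | z

def rsh8(y):
    """>8: return (X, r7) = (Y >> 8, Y & 0xFF)."""
    return y >> 8, y & 0xFF

def divmod_op(y, z):
    """//: return (X, r7) = (Y / Z, Y % Z) for non-negative operands."""
    return y // z, y % z

joined = lsh8(0x12, 0x34)
split = rsh8(joined)
quotient_and_remainder = divmod_op(17, 5)
```

Returning two values from one operator is what makes these operators
awkward inside subexpressions, where the second result always lands in
@code{r7}.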
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2028
2640
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2029 @node Calling CCL, CCL Example, CCL Expressions, CCL
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2030 @comment Node, Next, Previous, Up
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2031 @subsection Calling CCL
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2032
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
2033 CCL programs are called automatically during Emacs buffer I/O when the
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2034 external representation has a coding system type of @code{shift-jis},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2035 @code{big5}, or @code{ccl}. The program is specified by the coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2036 system (@pxref{Coding Systems}). You can also call CCL programs from
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2037 other CCL programs, and from Lisp using these functions:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2038
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2039 @defun ccl-execute ccl-program status
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2040 Execute @var{ccl-program} with registers initialized by
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2041 @var{status}. @var{ccl-program} is a vector of compiled CCL code
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2042 created by @code{ccl-compile}. It is an error for the program to try to
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2043 execute a CCL I/O command. @var{status} must be a vector of nine
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2044 values, specifying the initial value for the R0, R1 .. R7 registers and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2045 for the instruction counter IC. A @code{nil} value for a register
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2046 initializer causes the register to be set to 0. A @code{nil} value for
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2047 the IC initializer causes execution to start at the beginning of the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2048 program. When the program is done, @var{status} is modified (by
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2049 side-effect) to contain the ending values for the corresponding
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2050 registers and IC.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2051 @end defun
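
The treatment of @code{nil} entries in @var{status} can be pictured with
the following Python sketch. It is a model only: @code{initial_state} is
a hypothetical helper, @code{None} plays the role of @code{nil}, and
representing ``the beginning of the program'' as 0 is an assumption of
the model.

```python
def initial_state(status):
    """Return (registers, ic) for a nine-element status sequence."""
    if len(status) != 9:
        raise ValueError("status must have nine elements")
    # A nil (None) register initializer sets the register to 0.
    registers = [0 if value is None else value for value in status[:8]]
    # A nil (None) IC initializer starts at the beginning of the
    # program, modeled here as instruction 0.
    ic = 0 if status[8] is None else status[8]
    return registers, ic

registers, ic = initial_state([65] + [None] * 8)
```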
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2052
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2053 @defun ccl-execute-on-string ccl-program status string &optional continue
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2054 Execute @var{ccl-program} with initial @var{status} on
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2055 @var{string}. @var{ccl-program} is a vector of compiled CCL code
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2056 created by @code{ccl-compile}. @var{status} must be a vector of nine
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2057 values, specifying the initial value for the R0, R1 .. R7 registers and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2058 for the instruction counter IC. A @code{nil} value for a register
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2059 initializer causes the register to be set to 0. A @code{nil} value for
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2060 the IC initializer causes execution to start at the beginning of the
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2061 program. An optional fourth argument @var{continue}, if non-@code{nil}, causes
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2062 the IC to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2063 remain on the unsatisfied read operation if the program terminates due
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2064 to exhaustion of the input buffer. Otherwise the IC is set to the end
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2065 of the program. When the program is done, @var{status} is modified (by
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2066 side-effect) to contain the ending values for the corresponding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2067 registers and IC. Returns the resulting string.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2068 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2069
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
2070 To call a CCL program from another CCL program, it must first be
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2071 registered:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2072
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2073 @defun register-ccl-program name ccl-program
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2074 Register @var{name} for CCL program @var{ccl-program} in
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2075 @code{ccl-program-table}. @var{ccl-program} should be the compiled form of
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2076 a CCL program, or @code{nil}. Return index number of the registered CCL
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2077 program.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2078 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2079
442
abe6d1db359e Import from CVS: tag r21-2-36
cvs
parents: 440
diff changeset
2080 Information about the processor time used by the CCL interpreter can be
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2081 obtained using these functions:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2082
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2083 @defun ccl-elapsed-time
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2084 Returns the elapsed processor time of the CCL interpreter as a cons of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2085 user and system time, as
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2086 floating point numbers measured in seconds. If only one
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2087 overall value can be determined, the return value will be a cons of that
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2088 value and 0.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2089 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2090
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2091 @defun ccl-reset-elapsed-time
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2092 Resets the CCL interpreter's internal elapsed time registers.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2093 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2094
2640
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2095 @node CCL Example, , Calling CCL, CCL
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2096 @comment Node, Next, Previous, Up
2640
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2097 @subsection CCL Example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2098
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2099 In this section, we describe the implementation of a trivial coding
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2100 system to transform from the Web's URL encoding to XEmacs' internal
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2101 coding. Many people will have been first exposed to URL encoding when
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2102 they saw ``%20'' where they expected a space in a file's name on their
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2103 local hard disk; this can happen when a browser saves a file from the
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2104 web and doesn't decode the name, as passed from the server, properly.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2105
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2106 URL encoding itself is underspecified with regard to encodings beyond
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2107 ASCII. The relevant document, RFC 1738, explicitly doesn't give any
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2108 information on how to encode non-ASCII characters, and the ``obvious''
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2109 way---use the %xx values for the octets of the eight bit MIME character
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2110 set in which the page was served---breaks when a user types a character
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2111 outside that character set. Best practice for web development is to
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2112 serve all pages as UTF-8 and treat incoming form data as using that
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2113 coding system. (Oh, and gamble that your clients won't ever want to
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2114 type anything outside Unicode. But that's not so much of a gamble with
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2115 today's client operating systems.) We don't treat non-ASCII in this
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2116 example, as dealing with @samp{(read-multibyte-character ...)} and
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2117 errors therewith would make it much harder to understand.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2118
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2119 Since CCL isn't a very rich language, we move much of the logic that
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2120 would ordinarily be computed from operations like @code{(member ..)},
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2121 @code{(and ...)} and @code{(or ...)} into tables, from which register
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2122 values are read and written, and on which @code{if} statements are
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2123 predicated. Much more of the implementation of this coding system is
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2124 occupied with constructing these tables---in normal Emacs Lisp---than it
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2125 is with actual CCL code.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2126
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2127 All the @code{defvar} statements we deal with in the next few sections
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2128 are surrounded by a @code{(eval-and-compile ...)}, which means that the
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2129 logic which initializes these variables executes at compile time, and if
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2130 XEmacs loads the compiled version of the file, these variables are
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2131 initialized as constants.
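
As a minimal sketch of that pattern, here is a table built inside
@code{eval-and-compile}.  The name @code{my-example-table} is
hypothetical and is not part of the coding system described here.

@example
(eval-and-compile
  (defvar my-example-table
    (let ((val (make-vector 4 0))
          (i 0))
      ;; Fill each slot with the square of its index.
      (while (< i (length val))
        (aset val i (* i i))
        (setq i (1+ i)))
      val)
    "A hypothetical table, computed when the file is compiled or loaded."))

my-example-table
     @result{} [0 1 4 9]
@end example

Because the value is already available at compile time, later code in
the same file can splice it into a backquoted form with a comma.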
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2132
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2133 @menu
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2134 * Four bits to ASCII:: Two tables mapping an octet to its ASCII hex digits.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2135 * URI Encoding constants:: Useful predefined characters.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2136 * Numeric to ASCII-hexadecimal conversion:: Trivial in Lisp, not so in CCL.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2137 * Characters to be preserved:: No transformation needed for these characters.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2138 * The program to decode to internal format:: .
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2139 * The program to encode from internal format:: .
2690
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2140 * The actual coding system:: .
2640
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2141 @end menu
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2142
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2143 @node Four bits to ASCII, URI Encoding constants, , CCL Example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2144 @subsubsection Four bits to ASCII
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2145
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2146 The first @code{defvar} is for
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2147 @code{url-coding-high-order-nybble-as-ascii}, a 256-entry table that
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2148 maps from an octet's value to the ASCII encoding for the hex value of
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2149 its most significant four bits. That might sound complex, but it isn't;
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2150 for decimal 65, hex value @samp{#x41}, the entry in the table is the
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2151 ASCII encoding of `4'. For decimal 122, ASCII `z', hex value
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2152 @code{#x7a}, @code{(elt url-coding-high-order-nybble-as-ascii #x7a)}
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2153 after this file is loaded gives the ASCII encoding of `7'.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2154
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2155 @example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2156 (defvar url-coding-high-order-nybble-as-ascii
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2157 (let ((val (make-vector 256 0))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2158 (i 0))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2159 (while (< i (length val))
2690
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2160 (aset val i (char-to-int (aref (format "%02X" i) 0)))
2640
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2161 (setq i (1+ i)))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2162 val)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2163 "Table to find an ASCII version of an octet's most significant 4 bits.")
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2164 @end example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2165
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2166 The next table, @code{url-coding-low-order-nybble-as-ascii}, is almost
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2167 the same thing, but this time it has a map for the hex encoding of the
2690
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2168 low-order four bits. So the entry for decimal 65 (offset @samp{#x41}) is
2640
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2169 the ASCII encoding of `1', and the entry for decimal 122 (offset
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2170 @samp{#x7a}) is the ASCII encoding of `A'.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2171
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2172 @example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2173 (defvar url-coding-low-order-nybble-as-ascii
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2174 (let ((val (make-vector 256 0))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2175 (i 0))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2176 (while (< i (length val))
2690
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2177 (aset val i (char-to-int (aref (format "%02X" i) 1)))
2640
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2178 (setq i (1+ i)))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2179 val)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2180 "Table to find an ASCII version of an octet's least significant 4 bits.")
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2181 @end example
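
Taken together, the two tables supply the two hex digits that follow
the percent sign when an octet is escaped.  The same result can be
sketched in plain Lisp with @code{format}.  The helper name
@code{my-octet-to-percent-escape} is hypothetical; it only illustrates
what the tables compute and is not used by the CCL programs.

@example
(defun my-octet-to-percent-escape (octet)
  "Return the URL escape sequence for OCTET, an integer from 0 to 255."
  ;; "%%" yields a literal percent sign; "%02X" yields two uppercase
  ;; hex digits, the same digits the two tables hold for OCTET.
  (format "%%%02X" octet))

(my-octet-to-percent-escape #x7a)
     @result{} "%7A"
(my-octet-to-percent-escape 32)
     @result{} "%20"
@end example

CCL has no equivalent of @code{format}, which is why the digits are
looked up in precomputed tables instead.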
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2182
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2183 @node URI Encoding constants, Numeric to ASCII-hexadecimal conversion, Four bits to ASCII, CCL Example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2184 @subsubsection URI Encoding constants
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2185
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2186 Next, we have a couple of variables that make the CCL code more
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2187 readable. The first is the ASCII encoding of the percentage sign; this
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2188 character is used as an escape code, to start the encoding of a
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2189 non-printable character. For historical reasons, URL encoding allows
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2190 the space character to be encoded as a plus sign--it does make typing
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2191 URLs like @samp{http://google.com/search?q=XEmacs+home+page} easier--and
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2192 as such, we have to check when decoding for this value, and map it to
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2193 the space character. When doing this in CCL, we use the
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2194 @code{url-coding-escaped-space-code} variable.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2195
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2196 @example
2690
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2197 (defvar url-coding-escape-character-code (char-to-int ?%)
2640
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2198 "The code point for the percentage sign, in ASCII.")
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2199
2690
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2200 (defvar url-coding-escaped-space-code (char-to-int ?+)
2640
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2201 "The URL-encoded value of the space character, that is, +.")
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2202 @end example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2203
2690
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2204 @node Numeric to ASCII-hexadecimal conversion, Characters to be preserved, URI Encoding constants, CCL Example
2640
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2205 @subsubsection Numeric to ASCII-hexadecimal conversion
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2206
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2207 Now, we have a couple of utility tables that wouldn't be necessary in
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2208 a more expressive programming language than is CCL. The first is sixteen
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2209 in length, and maps a hexadecimal number to the ASCII encoding of that
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2210 number; so zero maps to ASCII `0', ten maps to ASCII `A.' The second
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2211 does the reverse; that is, it maps an ASCII character to its value when
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2212 interpreted as a hexadecimal digit. ('A' => 10, 'c' => 12, '2' => 2, as
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2213 a few examples.)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2214
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2215 @example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2216 (defvar url-coding-hex-digit-table
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2217 (let ((i 0)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2218 (val (make-vector 16 0)))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2219 (while (< i 16)
2690
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2220 (aset val i (char-to-int (aref (format "%X" i) 0)))
2640
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2221 (setq i (1+ i)))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2222 val)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2223 "A map from a hexadecimal digit's numeric value to its encoding in ASCII.")
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2224
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2225 (defvar url-coding-latin-1-as-hex-table
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2226 (let ((val (make-vector 256 0))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2227 (i 0))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2228 (while (< i (length val))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2229 ;; Get a hex val for this ASCII character.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2230 (aset val i (string-to-int (format "%c" i) 16))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2231 (setq i (1+ i)))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2232 val)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2233 "A map from Latin 1 code points to their values as hexadecimal digits.")
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2234 @end example
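
The reverse mapping can be checked by hand.  This sketch assumes
@code{string-to-number} accepts a base argument in the same way that
@code{string-to-int} does in the table above.

@example
(string-to-number "A" 16)
     @result{} 10
(string-to-number "c" 16)
     @result{} 12
(string-to-number "2" 16)
     @result{} 2
;; A character that is not a hex digit yields zero, which is why
;; most entries of the second table are 0.
(string-to-number "g" 16)
     @result{} 0
@end example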
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2235
2690
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2236 @node Characters to be preserved, The program to decode to internal format, Numeric to ASCII-hexadecimal conversion, CCL Example
2640
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2237 @subsubsection Characters to be preserved
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2238
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2239 And finally, the last of these tables. URL encoding says that
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2240 alphanumeric characters, the underscore, hyphen and the full stop
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2241 @footnote{That's what the standards call it, though my North American
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2242 readers will be more familiar with it as the period character.} retain
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2243 their ASCII encoding, and don't undergo transformation.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2244 @code{url-coding-should-preserve-table} is an array in which the entries
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2245 are one if the corresponding ASCII character should be left as-is, and
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2246 zero if they should be transformed. So the entries for all the control
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2247 and most of the punctuation characters are zero. Lisp programmers will
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2248 observe that this initialization is particularly inefficient, but
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2249 they'll also be aware that this is a long way from an inner loop where
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2250 every nanosecond counts.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2251
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2252 @example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2253 (defvar url-coding-should-preserve-table
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2254 (let ((preserve
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2255 (list ?- ?_ ?. ?a ?b ?c ?d ?e ?f ?g ?h ?i ?j ?k ?l ?m ?n ?o
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2256 ?p ?q ?r ?s ?t ?u ?v ?w ?x ?y ?z ?A ?B ?C ?D ?E ?F ?G
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2257 ?H ?I ?J ?K ?L ?M ?N ?O ?P ?Q ?R ?S ?T ?U ?V ?W ?X ?Y
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2258 ?Z ?0 ?1 ?2 ?3 ?4 ?5 ?6 ?7 ?8 ?9))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2259 (i 0)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2260 (res (make-vector 256 0)))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2261 (while (< i 256)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2262 (when (member (int-char i) preserve)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2263 (aset res i 1))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2264 (setq i (1+ i)))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2265 res)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2266 "A 256-entry array of flags, indicating whether or not to preserve an
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2267 octet as its ASCII encoding.")
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2268 @end example
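
For comparison, the flag for a single octet can be computed with a
regular expression.  This is only a sketch; the function name
@code{my-url-preserve-flag} is hypothetical, and CCL itself has no
regular expressions, which is why the table is needed.

@example
(defun my-url-preserve-flag (octet)
  "Return 1 if OCTET is left as-is by URL encoding, otherwise 0."
  ;; The character class lists the hyphen, underscore, full stop and
  ;; the ASCII alphanumerics, matching the `preserve' list above.
  (if (string-match "[-_.A-Za-z0-9]" (format "%c" octet))
      1
    0))

(my-url-preserve-flag 97)    ; `a'
     @result{} 1
(my-url-preserve-flag 37)    ; `%'
     @result{} 0
@end example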
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2269
2690
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2270 @node The program to decode to internal format, The program to encode from internal format, Characters to be preserved, CCL Example
2640
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2271 @subsubsection The program to decode to internal format
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2272
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2273 After the almost interminable tables, we get to the CCL. The first
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2274 CCL program, @code{ccl-decode-urlcoding}, decodes from the URL coding to
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2275 our internal format; since this version of CCL doesn't have support for
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2276 error checking on the input, we don't do any verification on it.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2277
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2278 The buffer magnification--approximate ratio of the size of the output
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2279 buffer to the size of the input buffer--is declared as one, because
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2280 fractional values aren't allowed. (Since all those %20's will map to
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2281 ` ', the length of the output text will be less than that of the input
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2282 text.)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2283
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2284 So, first we read an octet from the input buffer into register
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2285 @samp{r0}, to set up the loop. Next, we start the loop, with a
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2286 @code{(loop ...)} statement, and we check if the value in @samp{r0} is a
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2287 percentage sign. (Note the comma before
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2288 @code{url-coding-escape-character-code}; since CCL is a Lisp macro
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2289 language, we can break out of the macro evaluation with a comma, and as
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2290 such, ``@code{,url-coding-escape-character-code}'' will be evaluated as a
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2291 literal `37.')
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2292
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2293 If it is a percentage sign, we read the next two octets into @samp{r2}
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2294 and @samp{r3}, and convert them into their hexadecimal numeric values,
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2295 using the @code{url-coding-latin-1-as-hex-table} array declared above.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2296 (But again, it'll be interpreted as a literal array.) We then left
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2297 shift the first by four bits, OR the two together, and write the
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2298 result to the output buffer.
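
As a worked instance of that arithmetic, written in ordinary Lisp
rather than CCL, the escape @samp{%41} has the digit values 4 and 1:

@example
;; Shift the high digit left by four bits, then OR in the low digit.
(logior (ash 4 4) 1)
     @result{} 65
@end example

65 is the ASCII code for `A', so @samp{%41} decodes to that character.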
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2299
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2300 If it isn't a percentage sign, and it is a `+' sign, we write a
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2301 space--hexadecimal 20--to the output buffer.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2302
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2303 If none of those things are true, we pass the octet to the output buffer
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2304 untransformed. (This could be a place to put error checking, in a more
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2305 expressive language.) We then read one more octet from the input
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2306 buffer, and move to the next iteration of the loop.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2307
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2308 @example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2309 (define-ccl-program ccl-decode-urlcoding
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2310 `(1
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2311 ((read r0)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2312 (loop
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2313 (if (r0 == ,url-coding-escape-character-code)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2314 ((read r2 r3)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2315 ;; Replace r2 and r3 with the values found at those offsets in
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2316 ;; url-coding-latin-1-as-hex-table.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2317 (r2 = r2 ,url-coding-latin-1-as-hex-table)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2318 (r3 = r3 ,url-coding-latin-1-as-hex-table)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2319 (r2 <<= 4)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2320 (r3 |= r2)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2321 (write r3))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2322 (if (r0 == ,url-coding-escaped-space-code)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2323 (write #x20)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2324 (write r0)))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2325 (read r0)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2326 (repeat))))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2327 "CCL program to take URI-encoded ASCII text and transform it to our
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2328 internal encoding.")
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2329 @end example
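
The nybble-combining step in the escape branch can be sketched in plain
Lisp.  This only illustrates the arithmetic and is not part of the
coding system; @code{my-combine-hex-digits} is a hypothetical helper
name, and it assumes both arguments are already digit values in the
range 0 to 15.

@example
(defun my-combine-hex-digits (high low)
  "Combine two hex digit values HIGH and LOW into one octet."
  ;; Same steps as (r2 <<= 4) followed by (r3 |= r2) in the CCL program.
  (logior (lsh high 4) low))

;; The escape sequence %2B carries the digit values 2 and 11.
(my-combine-hex-digits 2 11)
     @result{} 43
@end example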
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2330
2690
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2331 @node The program to encode from internal format, The actual coding system, The program to decode to internal format, CCL Example
2640
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2332 @subsubsection The program to encode from internal format
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2333
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2334 Next, we see the CCL program to encode ASCII text as URL coded text.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2335 Here, the buffer magnification is specified as three, to account for ` '
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2336 mapping to %20, etc. As before, we read an octet from the input into
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2337 @samp{r0}, and move into the body of the loop. Next, we check if we
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2338 should preserve the value of this octet, by reading from offset
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2339 @samp{r0} in the @code{url-coding-should-preserve-table} into @samp{r1}.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2340 Then we have an @samp{if} statement predicated on the value in
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2341 @samp{r1}; for the true branch, we write the input octet directly. For
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2342 the false branch, we write a percentage sign, the ASCII encoding of the
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2343 high four bits in hex, and then the ASCII encoding of the low four bits
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2344 in hex.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2345
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2346 We then read an octet from the input into @samp{r0}, and repeat the loop.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2347
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2348 @example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2349 (define-ccl-program ccl-encode-urlcoding
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2350 `(3
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2351 ((read r0)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2352 (loop
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2353 (r1 = r0 ,url-coding-should-preserve-table)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2354 ;; If we should preserve the value, just write the octet directly.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2355 (if r1
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2356 (write r0)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2357 ;; else, write a percentage sign, and the hex value of the octet, in
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2358 ;; an ASCII-friendly format.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2359 ((write ,url-coding-escape-character-code)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2360 (write r0 ,url-coding-high-order-nybble-as-ascii)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2361 (write r0 ,url-coding-low-order-nybble-as-ascii)))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2362 (read r0)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2363 (repeat))))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2364 "CCL program to encode octets (almost) according to RFC 1738")
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent]
stephent
parents: 2367
diff changeset
2365 @end example
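
As a rough cross-check on the escaping branch, the same three-octet
output can be produced in plain Lisp with @code{format}.  This is a
sketch only: @code{my-percent-escape} is a hypothetical name, and the
use of upper-case hex digits is an assumption, since the lookup tables
used by the CCL program are defined elsewhere.

@example
(defun my-percent-escape (octet)
  "Return OCTET as a three-character escape such as \"%20\"."
  ;; %% produces a literal percent sign; %02X produces two
  ;; upper-case hex digits, high nybble first.
  (format "%%%02X" octet))

(my-percent-escape 32)
     @result{} "%20"
@end example

Each escaped input octet becomes three output octets, which is why the
buffer magnification is three.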
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2366
2690
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2367 @node The actual coding system, , The program to encode from internal format, CCL Example
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2368 @subsubsection The actual coding system
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2369
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2370 To actually create the coding system, we call
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2371 @samp{make-coding-system}. The first argument is the symbol that is to
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2372 be the name of the coding system, in our case @samp{url-coding}. The
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2373 second specifies that the coding system is to be of type
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2374 @samp{ccl}---there are several other coding system types available;
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2375 see the documentation for @samp{make-coding-system} for the
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2376 full list. Then there's a documentation string describing the wherefore
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2377 and caveats of the coding system, and the final argument is a property
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2378 list giving information about the CCL programs and the coding system's
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2379 mnemonic.
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2380
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2381 @example
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2382 (make-coding-system
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2383 'url-coding 'ccl
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2384 "The coding used by application/x-www-form-urlencoded HTTP applications.
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2385 This coding form doesn't specify anything about non-ASCII characters, so
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2386 make sure you've transformed to a seven-bit coding system first."
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2387 '(decode ccl-decode-urlcoding
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2388 encode ccl-encode-urlcoding
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2389 mnemonic "URLenc"))
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2390 @end example
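
Once the coding system exists, it can be used like any other.  The
following is a minimal usage sketch, assuming the two CCL programs and
the tables they refer to have already been defined; the sample strings
are made up for illustration.

@example
;; Decode form data received from elsewhere.  Going by the decode
;; program, %2E should come back as a period and each + as a space.
(decode-coding-string "name=J%2E+Random+Hacker" 'url-coding)

;; Encode a string for transmission.
(encode-coding-string "J. Random Hacker" 'url-coding)
@end example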
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2391
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2392 If you're lucky, the @samp{url-coding} coding system described here
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2393 should be available in the XEmacs package system. Otherwise, downloading
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2394 it from @samp{http://www.parhasard.net/url-coding.el} should work for
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2395 the foreseeable future.
d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan]
aidan
parents: 2640
diff changeset
2396
775
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2397 @node Category Tables, Unicode Support, CCL, MULE
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2398 @section Category Tables
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2399
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2400 A category table is a type of char table used for keeping track of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2401 categories. Categories are used for classifying characters for use in
440
8de8e3f6228a Import from CVS: tag r21-2-28
cvs
parents: 428
diff changeset
2402 regexps---you can refer to a category rather than having to use a
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2403 complicated [] expression (and category lookups are significantly
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2404 faster).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2405
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2406 There are 95 different categories available, one for each printable
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2407 character (including space) in the ASCII charset. Each category is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2408 designated by one such character, called a @dfn{category designator}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2409 They are specified in a regexp using the syntax @samp{\cX}, where X is a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2410 category designator. (This is not yet implemented.)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2411
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2412 A category table specifies, for each character, the categories that
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2413 the character is in. Note that a character can be in more than one
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2414 category. More specifically, a category table maps from a character to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2415 either the value @code{nil} (meaning the character is in no categories)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2416 or a 95-element bit vector, specifying for each of the 95 categories
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2417 whether the character is in that category.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2418
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2419 Special Lisp functions are provided that abstract this, so you do not
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2420 have to directly manipulate bit vectors.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2421
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2422 @defun category-table-p object
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2423 This function returns @code{t} if @var{object} is a category table.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2424 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2425
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2426 @defun category-table &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2427 This function returns the current category table. This is the one
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2428 specified by the current buffer, or by @var{buffer} if it is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2429 non-@code{nil}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2430 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2431
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2432 @defun standard-category-table
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2433 This function returns the standard category table. This is the one used
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2434 for new buffers.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2435 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2436
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2437 @defun copy-category-table &optional category-table
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2438 This function returns a new category table which is a copy of
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2439 @var{category-table}, which defaults to the standard category table.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2440 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2441
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2442 @defun set-category-table category-table &optional buffer
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2443 This function selects @var{category-table} as the new category table for
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2444 @var{buffer}. @var{buffer} defaults to the current buffer if omitted.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2445 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2446
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2447 @defun category-designator-p object
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2448 This function returns @code{t} if @var{object} is a category designator (a
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2449 char in the range @samp{' '} to @samp{'~'}).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2450 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2451
444
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2452 @defun category-table-value-p object
576fb035e263 Import from CVS: tag r21-2-37
cvs
parents: 442
diff changeset
2453 This function returns @code{t} if @var{object} is a category table value.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2454 Valid values are @code{nil} or a bit vector of size 95.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2455 @end defun
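
The functions above might be combined as follows.  This is a sketch
that uses only the calls documented in this section; no return values
are shown because they depend on the build.

@example
;; Work on a private copy so the standard table is left alone.
(let ((table (copy-category-table)))
  (when (category-table-p table)
    (set-category-table table)))

;; Any printable ASCII character may serve as a designator.
(category-designator-p ?a)
(category-designator-p ?\n)   ; newline is outside the range
@end example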
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2456
775
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2457
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2458 @c Added 2002-03-13 sjt
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2459 @node Unicode Support, Charset Unification, Category Tables, MULE
775
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2460 @section Unicode Support
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2461 @cindex unicode
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2462 @cindex utf-8
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2463 @cindex utf-16
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2464 @cindex ucs-2
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2465 @cindex ucs-4
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2466 @cindex bmp
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2467 @cindex basic multilingual plane
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2468
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2469 Unicode support was added by Ben Wing to XEmacs 21.5.6.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2470
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2471 @defun set-language-unicode-precedence-list list
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2472 Set the language-specific precedence list used for Unicode decoding.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2473 This is a list of charsets, which are consulted in order for a translation
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2474 matching a given Unicode character. If no matches are found, the charsets
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2475 in the default precedence list (see
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2476 @code{set-default-unicode-precedence-list}) are consulted, and then all
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2477 remaining charsets, in some arbitrary order.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2478
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2479 The language-specific precedence list is meant to be set as part of the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2480 language environment initialization; the default precedence list is meant
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2481 to be set by the user.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2482 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2483
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2484 @defun language-unicode-precedence-list
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2485 Return the language-specific precedence list used for Unicode decoding.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2486 See @code{set-language-unicode-precedence-list} for more information.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2487 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2488
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2489 @defun set-default-unicode-precedence-list list
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2490 Set the default precedence list used for Unicode decoding.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2491 This is meant to be set by the user. See
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2492 @code{set-language-unicode-precedence-list} for more information.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2493 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2494
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2495 @defun default-unicode-precedence-list
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2496 Return the default precedence list used for Unicode decoding.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2497 See @code{set-language-unicode-precedence-list} for more information.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2498 @end defun
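
A user might set the default precedence list from an init file along
these lines.  This is only a sketch: the charset names are assumptions
about what a given MULE build provides, and the ordering is a matter of
personal preference.

@example
;; Prefer Japanese charsets when a Unicode codepoint has several
;; possible translations.
(set-default-unicode-precedence-list
 '(japanese-jisx0208 japanese-jisx0212 latin-iso8859-1))

;; Inspect the list that is now in effect.
(default-unicode-precedence-list)
@end example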
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2499
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2500 @defun set-unicode-conversion character code
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2501 Add conversion information between Unicode codepoints and characters.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2502 @var{character} is one of the following:
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2503
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2504 @c #### fix this markup
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2505 -- A character (in which case @var{code} must be a non-negative integer)
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2506 -- A vector of characters (in which case @var{code} must be a vector of
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2507 non-negative integers of the same length)
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2508
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2509 Values of @var{code} above 2^20 - 1 are allowed for the purpose of specifying
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2510 private characters, but will cause errors when converted to UTF-16 or UTF-32.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2511 UCS-4 and UTF-8 can handle values to 2^31 - 1, but XEmacs Lisp integers top
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2512 out at 2^30 - 1.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2513 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2514
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2515 @defun character-to-unicode character
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2516 Convert @var{character} to Unicode codepoint.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2517 When there is no international support (i.e. MULE is not defined),
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2518 this function simply does @code{char-to-int}.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2519 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2520
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2521 @defun unicode-to-character code [charsets]
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2522 Convert Unicode codepoint @var{code} to character.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2523 @var{code} should be a non-negative integer.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2524 If @var{charsets} is given, it should be a list of charsets, and only those
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2525 charsets will be consulted, in the given order, for a translation.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2526 Otherwise, the default ordering of all charsets will be used (see
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2527 @code{set-unicode-charset-precedence}).
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2528
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2529 When there is no international support (i.e. MULE is not defined),
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2530 this function simply does @code{int-to-char} and ignores the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2531 @var{charsets} argument.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2532 @end defun
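
The two conversion functions can be exercised as below.  This is a
minimal sketch; the charset name in the last form is an assumption
about the build, and no results are shown because they depend on the
conversion tables that have been loaded.

@example
;; Character to codepoint, and codepoint to character.
(character-to-unicode ?A)
(unicode-to-character #x41)

;; Restrict the lookup to a single charset.
(unicode-to-character #xE9 '(latin-iso8859-1))
@end example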
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2533
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2534 @defun parse-unicode-translation-table filename charset start end offset flags
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2535 Parse Unicode translation data in @var{filename} for MULE @var{charset}.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2536 Data is text, in the form of one translation per line -- charset
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2537 codepoint followed by Unicode codepoint. Numbers are decimal or hex
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2538 (preceded by 0x). Comments are marked with a #. Charset codepoints
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2539 for two-dimensional charsets should have the first octet stored in the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2540 high 8 bits of the hex number and the second in the low 8 bits.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2541
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2542 If @var{start} and @var{end} are given, only charset codepoints within
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2543 the given range will be processed. If @var{offset} is given, that value
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2544 will be added to all charset codepoints in the file to obtain the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2545 internal charset codepoint. @var{start} and @var{end} apply to the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2546 codepoints in the file, before @var{offset} is applied.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2547
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2548 (Note that, as usual, we assume that octets are in the range 32 to
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2549 127 or 33 to 126. If you have a table in kuten form, with octets in
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2550 the range 1 to 94, you will have to use an offset of 8224,
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2551 i.e. 0x2020.)
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2552
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2553 @var{flags}, if specified, further control how the tables are interpreted
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2554 and are used to special-case certain known table weirdnesses in the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2555 Unicode tables:
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2556
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2557 @table @code
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2558 @item ignore-first-column
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2559 Exactly as it sounds. The JIS X 0208 tables have 3 columns of data instead
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2560 of 2; the first is the Shift-JIS codepoint.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2561
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2562 @item big5
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2563 The charset codepoint is a Big Five codepoint; convert it to the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2564 proper hacked-up codepoint in @code{chinese-big5-1} or @code{chinese-big5-2}.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2565 @end table
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2566 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent]
stephent
parents: 444
diff changeset
2567
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2568
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2569 @node Charset Unification, Charsets and Coding Systems, Unicode Support, MULE
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2570 @section Character Set Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2571
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2572 Mule suffers from a design defect that causes it to consider the ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2573 Latin character sets to be disjoint. This results in oddities such as
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2574 files containing both ISO 8859/1 and ISO 8859/15 codes, and using ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2575 2022 control sequences to switch between them, as well as more plausible
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2576 but often unnecessary combinations like ISO 8859/1 with ISO 8859/2.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2577 This can be very annoying when sending messages or even in simple
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2578 editing on a single host. Unification works around the problem by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2579 converting as many characters as possible to use a single Latin coded
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2580 character set before saving the buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2581
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2582 This node and its children were ripp'd untimely from
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2583 @file{latin-unity.texi}, and have been quickly converted for use here.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2584 However as APIs are likely to diverge, beware of inaccuracies. Please
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2585 report any you discover with @kbd{M-x report-xemacs-bug RET}, as well
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2586 as any ambiguities or downright unintelligible passages.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2587
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2588 A lot of the stuff here doesn't belong here; it belongs in the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2589 @ref{Top, , , xemacs, XEmacs User's Manual}. Report those as bugs,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2590 too, preferably with patches.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2591
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2592 @menu
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2593 * Overview:: Unification history and general information.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2594 * Usage:: An overview of the operation of Unification.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2595 * Configuration:: Configuring Unification for use.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2596 * Theory of Operation:: How Unification works.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2597 * What Unification Cannot Do for You:: Inherent problems of 8-bit charsets.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2598 * Charsets and Coding Systems:: Reference lists with annotations.
1188
11ff4edb6bb7 [xemacs-hg @ 2003-01-05 10:53:58 by youngs]
youngs
parents: 1183
diff changeset
2599 * Unification Internals:: Utilities and implementation details.
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2600 @end menu
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2601
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2602 @node Overview, Usage, Charset Unification, Charset Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2603 @subsection An Overview of Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2604
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2605 Mule suffers from a design defect that causes it to consider the ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2606 Latin character sets to be disjoint. This manifests itself when a user
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2607 enters characters using input methods associated with different coded
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2608 character sets into a single buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2609
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2610 A very important example involves email. Many sites, especially in the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2611 U.S., default to use of the ISO 8859/1 coded character set (also called
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2612 ``Latin 1,'' though these are somewhat different concepts). However,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2613 ISO 8859/1 provides a generic CURRENCY SIGN character. Now that the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2614 Euro has become the official currency of most countries in Europe, this
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2615 is unsatisfactory (and in practice, useless). So Europeans generally
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2616 use ISO 8859/15, which is nearly identical to ISO 8859/1 for most
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2617 languages, except that it substitutes EURO SIGN for CURRENCY SIGN.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2618
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2619 Suppose a European user yanks text from a post encoded in ISO 8859/1
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2620 into a message composition buffer, and enters some text including the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2621 Euro sign. Then Mule will consider the buffer to contain both ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2622 8859/1 and ISO 8859/15 text, and MUAs such as Gnus will (if naively
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2623 programmed) send the message as a multipart mixed MIME body!
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2624
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2625 This is clearly stupid. What is not as obvious is that, just as any
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2626 European can include American English in their text because ASCII is a
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2627 subset of ISO 8859/15, most European languages which use Latin
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2628 characters (e.g., German and Polish) can typically be mixed while using
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2629 only one Latin coded character set (in this case, ISO 8859/2). However,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2630 this often depends on exactly what text is to be encoded.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2631
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2632 Unification works around the problem by converting as many characters as
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2633 possible to use a single Latin coded character set before saving the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2634 buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2635
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2636 @node Usage, Configuration, Overview, Charset Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2637 @subsection Operation of Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2638
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2639 Normally, Unification works in the background by installing
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2640 @code{unity-sanity-check} on @code{write-region-pre-hook}. This is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2641 done by default for the ISO 8859 Latin family of character sets. The
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2642 user activates this functionality for other character set families by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2643 invoking @code{enable-unification}, either interactively or in her
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2644 init file. @xref{Init File, , , xemacs}. Unification can be
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2645 deactivated by invoking @code{disable-unification}.
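
For example, a minimal init-file fragment might look like the following.
This is a sketch that assumes the Unification code is already loaded, so
that both functions are defined:

@example
;; Install the sanity check on write-region-pre-hook.
(enable-unification)

;; Later, to remove it again:
(disable-unification)
@end example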
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2646
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2647 Unification also provides a few functions for remapping or recoding the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2648 buffer by hand. To @dfn{remap} a character means to change the buffer
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2649 representation of the character by using another coded character set.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2650 Remapping never changes the identity of the character, but may involve
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2651 altering the code point of the character. To @dfn{recode} a character
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2652 means to simply change the coded character set. Recoding never alters
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2653 the code point of the character, but may change the identity of the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2654 character. @xref{Theory of Operation}.
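
To illustrate the distinction, here are two cases using code points
taken from the ISO 8859 standards themselves.  This is an illustration
only, not the output of any Unification function:

@example
Remap ISO 8859/2 -> ISO 8859/15:
  0xA9 LATIN CAPITAL LETTER S WITH CARON
    becomes 0xA6 LATIN CAPITAL LETTER S WITH CARON
  (same character, different code point)

Recode ISO 8859/1 -> ISO 8859/15:
  0xA4 CURRENCY SIGN
    becomes 0xA4 EURO SIGN
  (same code point, different character)
@end example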
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2655
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2656 There are a few variables which determine which coding systems are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2657 always acceptable to Unification: @code{unity-ucs-list},
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2658 @code{unity-preferred-coding-system-list}, and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2659 @code{unity-preapproved-coding-system-list}. The latter two default
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2660 to @code{()}, and should probably be avoided because they short-circuit
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2661 the sanity check. If you find you need to use them, consider reporting
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2662 it as a bug or request for enhancement. Because they seem unsafe, the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2663 recommended interface is likely to change.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2664
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2665 @menu
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2666 * Basic Functionality:: User interface and customization.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2667 * Interactive Usage:: Treating text by hand.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2668 Also documents the hook function(s).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2669 @end menu
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2670
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2671
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2672 @node Basic Functionality, Interactive Usage, , Usage
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2673 @subsubsection Basic Functionality
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2674
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2675 These functions and user options initialize and configure Unification.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2676 In normal use, none of these should be needed.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2677
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2678 @strong{These APIs are certain to change.}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2679
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2680 @defun enable-unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2681 Set up hooks and initialize variables for latin-unity.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2682
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2683 There are no arguments.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2684
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2685 This function is idempotent. It will reinitialize any hooks or variables
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2686 that are not in initial state.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2687 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2688
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2689 @defun disable-unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2690 Clean up hooks and void variables used by latin-unity.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2691
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2692 There are no arguments.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2693 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2694
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2695 @defopt unity-ucs-list
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2696 List of coding systems considered to be universal.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2697
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2698 The default value is @code{'(utf-8 iso-2022-7 ctext escape-quoted)}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2699
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2700 Order matters; coding systems earlier in the list will be preferred when
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2701 recommending a coding system. These coding systems will not be used
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2702 without querying the user (unless they are also present in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2703 @code{unity-preapproved-coding-system-list}), and follow the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2704 @code{unity-preferred-coding-system-list} in the list of suggested
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2705 coding systems.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2706
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2707 If none of the preferred coding systems are feasible, the first in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2708 this list will be the default.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2709
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2710 Notes on certain coding systems: @code{escape-quoted} is a special
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2711 coding system used for autosaves and compiled Lisp in Mule. You should
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2712 @c #### fix in latin-unity.texi
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2713 never delete this, although it is rare that a user would want to use it
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2714 directly. Unification does not try to be ``smart'' about other general
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2715 ISO 2022 coding systems, such as ISO-2022-JP. (They are not recognized
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2716 as equivalent to @code{iso-2022-7}.) If your preferred coding system is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2717 one of these, you may consider adding it to @code{unity-ucs-list}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2718 However, this will typically have the side effect that (e.g.) ISO 8859/1
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2719 files will be saved in 7-bit form with ISO 2022 escape sequences.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2720 @end defopt
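
As a sketch of the last point above, a user whose preferred coding
system is ISO-2022-JP might append it to @code{unity-ucs-list}.  The
symbol @code{iso-2022-jp} is assumed here to be the name of that coding
system in your XEmacs, and the variable is assumed to be already
defined:

@example
;; Append, so that the existing entries keep their priority.
(setq unity-ucs-list
      (append unity-ucs-list '(iso-2022-jp)))
@end example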
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2721
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2722 Coding systems which are not Latin and not in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2723 @code{unity-ucs-list} are handled by short-circuiting checks of the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2724 coding system against the next two variables.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2725
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2726 @defopt unity-preapproved-coding-system-list
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2727 List of coding systems used without querying the user if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2728
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2729 The default value is @samp{(buffer-default preferred)}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2730
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2731 The first feasible coding system in this list is used. The special values
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2732 @samp{preferred} and @samp{buffer-default} may be present:
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2733
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2734 @table @code
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2735 @item buffer-default
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2736 Use the coding system used by @samp{write-region}, if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2737
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2738 @item preferred
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2739 Use the coding system specified by @samp{prefer-coding-system} if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2740 @end table
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2741
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2742 ``Feasible'' means that all characters in the buffer can be represented by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2743 the coding system. Coding systems in @samp{unity-ucs-list} are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2744 always considered feasible. Other feasible coding systems are computed
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2745 by @samp{unity-representations-feasible-region}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2746
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2747 Note that the first universal coding system in this list shadows all
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2748 other coding systems. In particular, if your preferred coding system is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2749 a universal coding system, and @code{preferred} is a member of this
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2750 list, unification will blithely convert all your files to that coding
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2751 system. This is considered a feature, but it may surprise most users.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2752 Users who don't like this behavior should put @code{preferred} in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2753 @code{unity-preferred-coding-system-list}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2754 @end defopt
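
For instance, a user who does not want files converted silently to a
universal preferred coding system could configure the two lists roughly
as follows.  This is a sketch only, and it assumes the Unification
variables are already defined:

@example
;; Only the buffer's own coding system is used without asking.
(setq unity-preapproved-coding-system-list '(buffer-default))

;; The preferred coding system is merely suggested, ahead of the rest.
(setq unity-preferred-coding-system-list
      (cons 'preferred unity-preferred-coding-system-list))
@end example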
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2755
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2756 @defopt unity-preferred-coding-system-list
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2757 @c #### fix in latin-unity.texi
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2758 List of coding systems suggested to the user if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2759
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2760 The default value is @samp{(iso-8859-1 iso-8859-15 iso-8859-2 iso-8859-3
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2761 iso-8859-4 iso-8859-9)}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2762
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2763 If none of the coding systems in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2764 @c #### fix in latin-unity.texi
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2765 @code{unity-preapproved-coding-system-list} are feasible, this list
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2766 will be recommended to the user, followed by the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2767 @code{unity-ucs-list}. The first coding system in this list is the default. The
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2768 special values @samp{preferred} and @samp{buffer-default} may be
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2769 present:
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2770
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2771 @table @code
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2772 @item buffer-default
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2773 Use the coding system used by @samp{write-region}, if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2774
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2775 @item preferred
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2776 Use the coding system specified by @samp{prefer-coding-system} if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2777 @end table
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2778
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2779 "Feasible" means that all characters in the buffer can be represented by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2780 the coding system. Coding systems in @samp{unity-ucs-list} are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2781 always considered feasible. Other feasible coding systems are computed
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2782 by @samp{unity-representations-feasible-region}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2783 @end defopt


@defvar unity-iso-8859-1-aliases
List of coding systems to be treated as aliases of ISO 8859/1.

The default value is @code{'(iso-8859-1)}.

This is not a user variable; to customize input of coding systems or
charsets, use @samp{unity-coding-system-alias-alist} or
@samp{unity-charset-alias-alist}.
@end defvar


@node Interactive Usage, , Basic Functionality, Usage
@section Interactive Usage

First, the hook function @code{unity-sanity-check} is documented.
(It is placed here because it is not an interactive function, and there
is not yet a programmer's section of the manual.)

The remaining functions in this section provide access to internal
functionality (such as the remapping function) and to extra
functionality (the recoding functions and the test function).


@defun unity-sanity-check begin end filename append visit lockname &optional coding-system

Check if @var{coding-system} can represent all characters between
@var{begin} and @var{end}.

For compatibility with old broken versions of @code{write-region},
@var{coding-system} defaults to @code{buffer-file-coding-system}.
@var{filename}, @var{append}, @var{visit}, and @var{lockname} are
ignored.

Return @code{nil} if @code{buffer-file-coding-system} is not
(ISO-2022-compatible) Latin. If @code{buffer-file-coding-system} is
safe for the charsets actually present in the buffer, return it.
Otherwise, ask the user to choose a coding system, and return that.

This function does @emph{not} do the safe thing when
@code{buffer-file-coding-system} is @code{nil} (aka
@code{no-conversion}). It considers that ``non-Latin,'' and passes it
on to the Mule detection mechanism.

This function is intended for use as a @code{write-region-pre-hook}. It
does nothing except return @var{coding-system} if @code{write-region}
handlers are inhibited.
@end defun
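
As a minimal sketch (not taken from the package source), the hook can be
installed by hand and the check can also be called directly on the whole
buffer. The hook variable name is the one given above. The coding system
@code{iso-8859-1} is only an illustrative choice, and the direct call may
prompt the user if that coding system is not feasible.

@example
;; Install the check by hand (assumes the hook variable named above).
(add-hook 'write-region-pre-hook 'unity-sanity-check)

;; Call the check directly.  The four nil arguments stand for
;; FILENAME, APPEND, VISIT and LOCKNAME, which are ignored.
(unity-sanity-check (point-min) (point-max) nil nil nil nil 'iso-8859-1)
@end example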

@defun unity-buffer-representations-feasible

There are no arguments.

Apply @code{unity-region-representations-feasible} to the current buffer.
@end defun

@defun unity-region-representations-feasible begin end &optional buf

Return character sets that can represent the text from @var{begin} to
@var{end} in @var{buf}.

@var{buf} defaults to the current buffer. When called interactively,
the function is applied to the region. It assumes @var{begin} <=
@var{end}.

The return value is a cons. The car is the list of character sets
that can individually represent all of the non-ASCII portion of the
buffer, and the cdr is the list of character sets that can
individually represent all of the ASCII portion.

The following is taken from a comment in the source. Please refer to
the source to be sure of an accurate description.

The basic algorithm is to map over the region, compute the set of
charsets that can represent each character (the ``feasible charset''),
and take the intersection of those sets.

The current implementation takes advantage of the fact that ASCII
characters are common and cannot change asciisets, so using
@code{skip-chars-forward} makes motion over ASCII subregions very fast.

This same strategy could be applied generally by precomputing classes
of characters equivalent according to their effect on latinsets, and
adding a whole class to the @code{skip-chars-forward} string once a
member is found.

Efficiency is probably a function of the number of characters matched,
or perhaps of the length of the match string. With
@code{skip-category-forward} over a precomputed category table it
should be really fast. In practice for Latin character sets there are
only 29 classes.
@end defun
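
The intersection step described above can be sketched in a few lines of
self-contained Lisp. This is an illustration only, not the package's
implementation: the function names, the table, and the use of symbols
to stand for characters are all hypothetical, and the table lists only
a few of the character sets that really contain each character.

@example
(defun demo-intersect (a b)
  "Return the elements of list A that also occur in list B."
  (let ((result nil))
    (while a
      (if (memq (car a) b)
          (setq result (cons (car a) result)))
      (setq a (cdr a)))
    (nreverse result)))

(defun demo-feasible-charsets (chars table all)
  "Intersect the feasible charsets of every element of CHARS.
TABLE maps a character name to the charsets that contain it, and ALL
is the initial set.  Names missing from TABLE, such as ASCII
characters, leave the set unchanged."
  (let ((feasible all)
        entry)
    (while chars
      (setq entry (assq (car chars) table))
      (if entry
          (setq feasible (demo-intersect feasible (cdr entry))))
      (setq chars (cdr chars)))
    feasible))

;; Hypothetical data: e-acute is in all three sets, l-stroke only in
;; ISO 8859-2.
(defvar demo-table
  '((e-acute latin-iso8859-1 latin-iso8859-2 latin-iso8859-9)
    (l-stroke latin-iso8859-2)))

(demo-feasible-charsets
 '(a e-acute l-stroke) demo-table
 '(latin-iso8859-1 latin-iso8859-2 latin-iso8859-9))
     @result{} (latin-iso8859-2)
@end example

The ASCII stand-in @code{a} has no table entry and so never shrinks the
set, which mirrors the reason the real function can skip over ASCII
subregions.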

@defun unity-remap-region begin end character-set &optional coding-system

Remap characters between @var{begin} and @var{end} to equivalents in
@var{character-set}. Optional argument @var{coding-system} may be a
coding system name (a symbol) or @code{nil}. Characters with no
equivalent are left as-is.

When called interactively, @var{begin} and @var{end} are set to the
beginning and end, respectively, of the active region, and the function
prompts for @var{character-set}. The function does completion, knows
how to guess a character set name from a coding system name, and also
provides some common aliases. See @code{unity-guess-charset}.
There is no way to specify @var{coding-system} interactively, as it
serves no useful purpose there.

Return @var{coding-system} if @var{coding-system} can encode all
characters in the region, @code{t} if @var{coding-system} is @code{nil}
and the coding system with G0 = @code{ascii} and G1 =
@var{character-set} can encode all characters, and otherwise
@code{nil}. Note that a non-null return does @emph{not} mean it is safe
to write the file, only the specified region.
(This behavior is useful for multipart MIME encoding and the like.)

Note: by default this function is quite strict about universal coding
systems. It only admits @samp{utf-8}, @samp{iso-2022-7}, and
@samp{ctext}. Customize @code{unity-approved-ucs-list} to change
this.

This function remaps characters that are artificially distinguished by Mule
internal code. It may change the code point as well as the character set.
To recode characters that were decoded in the wrong coding system, use
@code{unity-recode-region}.
@end defun
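
A noninteractive call might look like the following sketch. The charset
@code{latin-iso8859-2} is just an illustrative choice, and the messages
are hypothetical; the return value is interpreted as described above.

@example
;; Remap the whole buffer to ISO 8859-2 equivalents where they exist.
;; With CODING-SYSTEM omitted, a non-nil result means the remapped
;; text fits in a coding system with G0 = ascii and
;; G1 = latin-iso8859-2.
(if (unity-remap-region (point-min) (point-max) 'latin-iso8859-2)
    (message "Buffer can be encoded as ASCII plus ISO 8859-2")
  (message "Some characters cannot be encoded that way"))
@end example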

@defun unity-recode-region begin end wrong-cs right-cs

Recode characters between @var{begin} and @var{end} from @var{wrong-cs}
to @var{right-cs}.

@var{wrong-cs} and @var{right-cs} are character sets. Characters retain
the same code point but the character set is changed. Only characters
from @var{wrong-cs} are changed to @var{right-cs}. The identity of the
character may change. Note that this could be dangerous if characters
whose identities you do not want changed are included in the region.
This function cannot guess which characters you want changed, and which
should be left alone.

When called interactively, @var{begin} and @var{end} are set to the
beginning and end, respectively, of the active region, and the function
prompts for @var{wrong-cs} and @var{right-cs}. The function does
completion, knows how to guess a character set name from a coding system
name, and also provides some common aliases. See
@code{unity-guess-charset}.

Another way to accomplish this, using coding systems rather than
character sets to specify the desired recoding, is
@samp{unity-recode-coding-region}. That function may be faster
but is somewhat more dangerous, because it may recode more than one
character set.

To change from one Mule representation to another without changing identity
of any characters, use @samp{unity-remap-region}.
@end defun
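
As a hedged illustration, suppose text that is really ISO 8859-2 was
decoded as ISO 8859-1. The charset names below are only examples, and
the call is a sketch of the noninteractive form rather than code from
the package.

@example
;; Reinterpret only the Latin-1 characters in the active region as
;; ISO 8859-2.  Code points are kept; character identities may change.
(unity-recode-region (region-beginning) (region-end)
                     'latin-iso8859-1 'latin-iso8859-2)
@end example

Restricting the call to a region that is known to be misdecoded limits
the risk of changing characters that were already correct.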

@defun unity-recode-coding-region begin end wrong-cs right-cs

Recode text between @var{begin} and @var{end} from @var{wrong-cs} to
@var{right-cs}.

@var{wrong-cs} and @var{right-cs} are coding systems. Characters retain
the same code point but the character set is changed. The identity of
characters may change. This is an inherently dangerous function;
multilingual text may be recoded in unexpected ways. #### It's also
dangerous because the coding systems are not sanity-checked in the
current implementation.

When called interactively, @var{begin} and @var{end} are set to the
beginning and end, respectively, of the active region, and the function
prompts for @var{wrong-cs} and @var{right-cs}. The function does
completion, knows how to guess a coding system name from a character set
name, and also provides some common aliases. See
@code{unity-guess-coding-system}.

Another, safer, way to accomplish this, using character sets rather
than coding systems to specify the desired recoding, is to use
@c #### fixme in latin-unity.texi
@code{unity-recode-region}.

To change from one Mule representation to another without changing identity
of any characters, use @code{unity-remap-region}.
@end defun

The following are helper functions for input of coding system and
character set names.

@defun unity-guess-charset candidate
Guess a charset based on the symbol @var{candidate}.

@var{candidate} itself is not tried as the value.

Uses the natural mapping in @samp{unity-cset-codesys-alist}, and
the values in @samp{unity-charset-alias-alist}.
@end defun

@defun unity-guess-coding-system candidate
Guess a coding system based on the symbol @var{candidate}.

@var{candidate} itself is not tried as the value.

Uses the natural mapping in @samp{unity-cset-codesys-alist}, and
the values in @samp{unity-coding-system-alias-alist}.
@end defun

@defun unity-example

A cheesy example for Unification.

At present it just makes a multilingual buffer. To test, @code{setq}
@code{buffer-file-coding-system} to some value, make the buffer dirty
(e.g., with @key{RET} @key{BackSpace}), and save.
@end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2994
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2995
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2996 @node Configuration, Theory of Operation, Usage, Charset Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2997 @subsection Configuring Unification for Use
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2998
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
2999 If you want Unification to be automatically initialized, invoke
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3000 @samp{enable-unification} with no arguments in your init file.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3001 @xref{Init File, , , xemacs}. If you are using GNU Emacs or an XEmacs
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3002 earlier than 21.1, you should also load @file{auto-autoloads} using the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3003 full path (@emph{never} @samp{require} @file{auto-autoloads} libraries).
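
For concreteness, a minimal init-file sketch follows.  The directory in
the commented-out @code{load} form is purely a placeholder invented for
this example; substitute the location where the package is actually
installed on your system.

@example
;; Only for GNU Emacs or an XEmacs earlier than 21.1: load the
;; autoloads by full path first.  The path below is hypothetical.
;; (load "/hypothetical/path/to/package/auto-autoloads")

;; Initialize Unification at startup (no arguments).
(enable-unification)
@end example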
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3004
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3005 You may wish to define aliases for commonly used character sets and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3006 coding systems for convenience in input.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3007
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3008 @defopt unity-charset-alias-alist
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3009 Alist mapping aliases to Mule charset names (symbols).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3010
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3011 The default value is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3012 @example
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3013 ((latin-1 . latin-iso8859-1)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3014 (latin-2 . latin-iso8859-2)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3015 (latin-3 . latin-iso8859-3)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3016 (latin-4 . latin-iso8859-4)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3017 (latin-5 . latin-iso8859-9)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3018 (latin-9 . latin-iso8859-15)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3019 (latin-10 . latin-iso8859-16))
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3020 @end example
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3021
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3022 If a charset does not exist on your system, it will not complete and you
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3023 will not be able to enter it in response to prompts. A real charset
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3024 with the same name as an alias in this list will shadow the alias.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3025 @end defopt
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3026
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3027 @defopt unity-coding-system-alias-alist
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3028 Alist mapping aliases to Mule coding system names (symbols).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3029
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3030 The default value is @samp{nil}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3031 @end defopt
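
As an illustration, coding system aliases might be set from the init
file along the following lines.  This is only a sketch: it assumes that
the alist has the same @code{(@var{alias} . @var{name})} shape as the
charset alist shown above, and that @code{iso-8859-1} and
@code{iso-8859-2} are coding systems defined in your XEmacs.

@example
;; Assumed shape: (ALIAS . CODING-SYSTEM-NAME), by analogy with
;; unity-charset-alias-alist.  The default value is nil.
(setq unity-coding-system-alias-alist
      '((latin-1 . iso-8859-1)
        (latin-2 . iso-8859-2)))
@end example

Since both variables are user options, they can presumably also be set
through the Customize interface instead.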
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3032
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3033
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3034 @node Theory of Operation, What Unification Cannot Do for You, Configuration, Charset Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3035 @subsection Theory of Operation
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3036
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3037 Standard encodings suffer from the design defect that they do not
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3038 provide a reliable way to recognize which coded character sets are in use.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3039 @xref{What Unification Cannot Do for You}. There are scores of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3040 character sets which can be represented by a single octet (8-bit byte),
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3041 whose union contains many hundreds of characters. Obviously this
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3042 results in great confusion, since you can't tell the players without a
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3043 scorecard, and there is no scorecard.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3044
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3045 There are two ways to solve this problem. The first is to create a
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3046 universal coded character set. This is the concept behind Unicode.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3047 However, there have been satisfactory (nearly) universal character sets
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3048 for several decades, but even today many Westerners resist using Unicode
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3049 because they consider its space requirements excessive. On the other
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3050 hand, Asians dislike Unicode because they consider it to be incomplete.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3051 (This is partly, but not entirely, political.)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3052
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3053 In any case, Unicode only solves the internal representation problem.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3054 Many data sets will contain files in ``legacy'' encodings, and Unicode
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3055 does not help distinguish among them.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3056
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3057 The second approach is to embed information about the encodings used in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3058 a document in its text. This approach is taken by the ISO 2022
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3059 standard. This would solve the problem completely from the user's point of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3060 view, except that ISO 2022 is basically not implemented at all, in the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3061 sense that few applications or systems implement more than a small
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3062 subset of ISO 2022 functionality. This is due to the fact that
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3063 mono-literate users object to the presence of escape sequences in their
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3064 texts (which they, with some justification, consider data corruption).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3065 Programmers are more than willing to cater to these users, since
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3066 implementing ISO 2022 is a painstaking task.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3067
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3068 In fact, Emacs/Mule adopts both of these approaches. Internally it uses
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3069 a universal character set, @dfn{Mule code}. Externally it uses ISO 2022
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3070 techniques both to save files in forms robust to encoding issues, and as
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3071 hints when attempting to ``guess'' an unknown encoding. However, Mule
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3072 suffers from a design defect, namely it embeds the character set
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3073 information that ISO 2022 attaches to runs of characters by introducing
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3074 them with a control sequence in each character. That causes Mule to
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3075 consider the ISO Latin character sets to be disjoint. This manifests
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3076 itself when a user enters characters using input methods associated with
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3077 different coded character sets into a single buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3078
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3079 There are two problems stemming from this design. First, Mule
1188
11ff4edb6bb7 [xemacs-hg @ 2003-01-05 10:53:58 by youngs]
youngs
parents: 1183
diff changeset
3080 represents the same character in different ways. Abstractly, 'ó'
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3081 (LATIN SMALL LETTER O WITH ACUTE) can get represented as
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3082 [latin-iso8859-1 #x73] or as [latin-iso8859-2 #x73]. So what looks like
1188
11ff4edb6bb7 [xemacs-hg @ 2003-01-05 10:53:58 by youngs]
youngs
parents: 1183
diff changeset
3083 'óó' in the display might actually be represented [latin-iso8859-1
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3084 #x73][latin-iso8859-2 #x73] in the buffer, and saved as [#xF3 ESC - B
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3085 #xF3 ESC - A] in the file. In some cases this treatment would be
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3086 appropriate (consider HYPHEN, MINUS SIGN, EN DASH, EM DASH, and U+4E00
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3087 (the CJK ideographic character meaning ``one'')), and although arguably
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3088 incorrect it is convenient when mixing the CJK scripts. But in the case
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3089 of the Latin scripts this is wrong.
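
The duplicate representation can be observed directly.  The following
sketch assumes a Mule-enabled XEmacs in which both charsets are
defined, and uses the Mule primitives @code{make-char} and
@code{char-charset}; the variable names are invented for the example.

@example
;; Build the "same" letter from two different Latin charsets.
(let ((o-latin-1 (make-char 'latin-iso8859-1 #x73))
      (o-latin-2 (make-char 'latin-iso8859-2 #x73)))
  ;; Both display as o-acute, but each carries its own charset,
  ;; so Mule treats them as different characters.
  (list (char-charset o-latin-1)
        (char-charset o-latin-2)
        (eq o-latin-1 o-latin-2)))
@end example

If the two characters are distinct, as described above, the last
element of the result is @code{nil}.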
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3090
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3091 Worse yet, it is very likely to occur when mixing ``different'' encodings
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3092 (such as ISO 8859/1 and ISO 8859/15) that differ only in a few code
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3093 points that are almost never used. A very important example involves
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3094 email. Many sites, especially in the U.S., default to use of the ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3095 8859/1 coded character set (also called ``Latin 1,'' though these are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3096 somewhat different concepts). However, ISO 8859/1 provides a generic
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3097 CURRENCY SIGN character. Now that the Euro has become the official
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3098 currency of most countries in Europe, this is unsatisfactory (and in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3099 practice, useless). So Europeans generally use ISO 8859/15, which is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3100 nearly identical to ISO 8859/1 for most languages, except that it
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3101 substitutes EURO SIGN for CURRENCY SIGN.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3102
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3103 Suppose a European user yanks text from a post encoded in ISO 8859/1
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3104 into a message composition buffer, and enters some text including the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3105 Euro sign. Then Mule will consider the buffer to contain both ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3106 8859/1 and ISO 8859/15 text, and MUAs such as Gnus will (if naively
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3107 programmed) send the message as a multipart mixed MIME body!
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3108
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3109 This is clearly stupid. What is not as obvious is that, just as any
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3110 European can include American English in their text because ASCII is a
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3111 subset of ISO 8859/15, most European languages which use Latin
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3112 characters (e.g., German and Polish) can typically be mixed while using
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3113 only one Latin coded character set (in the case of German and Polish,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3114 ISO 8859/2). However, this often depends on exactly what text is to be
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3115 encoded (even for the same pair of languages).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3116
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3117 Unification works around the problem by converting as many characters as
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3118 possible to use a single Latin coded character set before saving the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3119 buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3120
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3121 Because the problem is rarely noticeable in editing a buffer, but tends
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3122 to manifest when that buffer is exported to a file or process, the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3123 Unification package uses the strategy of examining the buffer prior to
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3124 export. If use of multiple Latin coded character sets is detected,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3125 Unification attempts to unify them by finding a single coded character
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3126 set which contains all of the Latin characters in the buffer.
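
The detection step can be approximated as follows.  This is a rough
sketch, not the package's actual implementation: the function name is
hypothetical, and it only reports which ISO 8859 Latin charsets occur
in the buffer, leaving out the search for a single charset that covers
them all.  It assumes the Mule functions @code{charsets-in-region} and
@code{charset-name} are available.

@example
;; Hypothetical helper, not part of the Unification package.
(defun my-latin-charsets-in-buffer ()
  "Return the ISO 8859 Latin charsets used in the current buffer."
  (delq nil
        (mapcar (lambda (cs)
                  ;; Keep only names such as latin-iso8859-2.
                  (if (string-match "^latin-iso8859-"
                                    (symbol-name (charset-name cs)))
                      (charset-name cs)))
                (charsets-in-region (point-min) (point-max)))))

;; More than one element means the buffer mixes Latin charsets.
(> (length (my-latin-charsets-in-buffer)) 1)
@end example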
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3127
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3128 The primary purpose of Unification is to fix the problem by giving the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3129 user the choice to change the representation of all characters to one
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3130 character set and give sensible recommendations based on context. In
1188
11ff4edb6bb7 [xemacs-hg @ 2003-01-05 10:53:58 by youngs]
youngs
parents: 1183
diff changeset
3131 the 'ó' example, either ISO 8859/1 or ISO 8859/2 is satisfactory, and
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3132 both will be suggested. In the EURO SIGN example, only ISO 8859/15
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3133 makes sense, and that is what will be recommended. In both cases, the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3134 user will be reminded that there are universal encodings available.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3135
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3136 I call this @dfn{remapping} (from the universal character set to a
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3137 particular ISO 8859 coded character set). It is mere accident that this
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3138 letter has the same code point in both character sets. (Not entirely,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3139 but there are many examples of Latin characters that have different code
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3140 points in different Latin-X sets.)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3141
1188
11ff4edb6bb7 [xemacs-hg @ 2003-01-05 10:53:58 by youngs]
youngs
parents: 1183
diff changeset
3142 Note that, in the 'ó' example, treating the buffer in this way will
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3143 result in a representation such as [latin-iso8859-2
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3144 #x73][latin-iso8859-2 #x73], and the file will be saved as [#xF3 #xF3].
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3145 This is guaranteed to occasionally result in the second problem you
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3146 observed, to which we now turn.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3147
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3148 This problem is that, although the file is intended to be an
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3149 ISO-8859/2-encoded file, in an ISO 8859/1 locale Mule (and every POSIX
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3150 compliant program---this is required by the standard, obvious if you
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3151 think a bit, @pxref{What Unification Cannot Do for You}) will read that
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3152 file as [latin-iso8859-1 #x73] [latin-iso8859-1 #x73]. Of course this
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3153 is no problem if all of the characters in the file are contained in ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3154 8859/1, but suppose there are some which are not, but are contained in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3155 the (intended) ISO 8859/2.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3156
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3157 You now want to fix this, but not by finding the same character in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3158 another set. Instead, you want to simply change the character set that
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3159 Mule associates with that buffer position without changing the code.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3160 (This is conceptually somewhat distinct from the first problem, and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3161 logically ought to be handled in the code that defines coding systems.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3162 However, unification is not an unreasonable place for it.) Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3163 provides two functions (one fast and dangerous, the other slow and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3164 careful) to handle this. I call this @dfn{recoding}, because the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3165 transformation actually involves @emph{encoding} the buffer to file
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3166 representation, then @emph{decoding} it to buffer representation (in a
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3167 different character set). This cannot be done automatically because
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3168 Mule can have no idea what the correct encoding is---after all, it
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3169 already gave you its best guess. @xref{What Unification Cannot Do for
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3170 You}. So these functions must be invoked by the user. @xref{Interactive
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3171 Usage}.
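
The encode-then-decode idea behind recoding can be sketched as below.
The function name and argument names are hypothetical, and this is not
one of the two functions the package provides; it performs no safety
checks at all, so it is closer in spirit to the ``fast and dangerous''
variant.  The coding system names in the usage comment are assumed to
exist in your XEmacs.

@example
;; Hypothetical helper for illustration only.
(defun my-recode-region-sketch (start end wrong-cs right-cs)
  "Reinterpret START..END, read as WRONG-CS, as RIGHT-CS."
  (save-restriction
    (narrow-to-region start end)
    ;; Back to the octets that were in the file...
    (encode-coding-region (point-min) (point-max) wrong-cs)
    ;; ...and forward again under the intended interpretation.
    (decode-coding-region (point-min) (point-max) right-cs)))

;; Text read as ISO 8859/1 that was meant to be ISO 8859/2:
;; (my-recode-region-sketch (point-min) (point-max)
;;                          'iso-8859-1 'iso-8859-2)
@end example

Narrowing to the region first keeps the two conversions aligned even
when the intermediate octet form has a different length than the
original text.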
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3172
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3173
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3174 @node What Unification Cannot Do for You, Unification Internals, Theory of Operation, Charset Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3175 @subsection What Unification Cannot Do for You
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3176
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3177 Unification @strong{cannot} save you if you insist on exporting data in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3178 8-bit encodings in a multilingual environment. @emph{You will
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3179 eventually corrupt data if you do this.} It is not Mule's, or any
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3180 application's, fault. You will have only yourself to blame; consider
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3181 yourself warned. (It is true that Mule has bugs, which make Mule
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3182 somewhat more dangerous and inconvenient than some naive applications.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3183 We're working to address those, but no application can remedy the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3184 inherent defect of 8-bit encodings.)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3185
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3186 Use standard universal encodings, preferably Unicode (UTF-8) unless
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3187 applicable standards indicate otherwise. The most important such case
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3188 is Internet messages, where MIME should be used, whether or not the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3189 subordinate encoding is a universal encoding. (Note that since one of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3190 the important provisions of MIME is the @samp{Content-Type} header,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3191 which has the charset parameter, MIME is to be considered a universal
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3192 encoding for the purposes of this manual. Of course, technically
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3193 speaking it's neither a coded character set nor a coding extension
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3194 technique compliant with ISO 2022.)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3195
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3196 As mentioned earlier, the problem is that standard encodings suffer from
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3197 the design defect that they do not provide a reliable way to recognize
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3198 which coded character sets are in use. There are scores of character
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3199 sets which can be represented by a single octet (8-bit byte), whose
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3200 union contains many hundreds of characters. Thus any 8-bit coded
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3201 character set must contain characters that share code points used for
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3202 different characters in other coded character sets.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3203
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3204 This means that a given file's intended encoding cannot be identified
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3205 with 100% reliability unless it contains encoding markers such as those
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3206 provided by MIME or ISO 2022.
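
The ambiguity is easy to demonstrate with a single octet.  The sketch
below assumes that @code{iso-8859-1} and @code{iso-8859-2} are coding
systems defined in your XEmacs; the variable name is invented for the
example.

@example
;; One octet, two readings.  #xB1 (octal 261) is PLUS-MINUS SIGN
;; in ISO 8859/1 but LATIN SMALL LETTER A WITH OGONEK in
;; ISO 8859/2.  Nothing in the octet itself says which is meant.
(let ((octets "\261"))
  (list (decode-coding-string octets 'iso-8859-1)
        (decode-coding-string octets 'iso-8859-2)))
@end example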
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3207
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3208 Unification actually makes it more likely that you will have problems of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3209 this kind. Traditionally Mule has been ``helpful'' by simply using an
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3210 ISO 2022 universal coding system when the current buffer coding system
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3211 cannot handle all the characters in the buffer. This has the effect
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3212 that, because the file contains control sequences, it is not recognized
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3213 as being in the locale's normal 8-bit encoding. It may be annoying if
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3214 you are not a Mule expert, but your data is automatically recoverable
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3215 with a tool you already have: Mule.
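
The effect of ISO 2022 control sequences on a file can be sketched outside XEmacs. The Python example below is an assumption-laden stand-in: it uses the ISO-2022-JP codec from the Python standard library in place of whatever universal ISO 2022 coding system Mule would actually select, and the variable names are hypothetical.

```python
# ISO-2022-JP is a 7-bit ISO 2022 coding system shipped with Python;
# it stands in here for Mule's universal ISO 2022 output.
text = "abc \u65e5\u672c"
encoded = text.encode("iso2022_jp")

has_escape = b"\x1b" in encoded                 # designation sequences present
is_seven_bit = all(b < 0x80 for b in encoded)   # no octet has the high bit set
round_trip = encoded.decode("iso2022_jp")

print(encoded)
```

The escape sequences are what make the data recoverable by any tool that understands ISO 2022, and they are also what stops the file from looking like plain text in the locale's 8-bit encoding.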
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3216
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3217 However, with unification, Mule converts to a single 8-bit character set
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3218 when possible. But typically this will @emph{not} be in your usual
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3219 locale. That is, an ISO 8859/1 user will need Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3220 when there are ISO 8859/2 characters in the buffer. But then most
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3221 likely the file will be saved in a pure 8-bit encoding that is not ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3222 8859/1, i.e., ISO 8859/2. Mule's autorecognizer (which is probably the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3223 most sophisticated yet available) cannot tell the difference between ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3224 8859/1 and ISO 8859/2, and in a Western European locale will choose the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3225 former even though the latter was intended. Even the extension
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3226 (``statistical recognition'') planned for XEmacs 22 is unlikely to be at
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3227 all accurate in the case of mixed codes.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3228
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3229 So now consider adding some additional ISO 8859/1 text to the buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3230 If it includes any ISO 8859/1 codes that are used by different
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3231 characters in ISO 8859/2, you now have a file that cannot be
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3232 mechanically disentangled. You need a human being who can recognize
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3233 that @emph{this is German and Swedish} and stays in Latin-1, while
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3234 @emph{that is Polish} and needs to be recoded to Latin-2.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3235
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3236 Moral: switch to a universal coded character set, preferably Unicode
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3237 using the UTF-8 transformation format. If you really need the space,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3238 compress your files.
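
The mixed-encoding problem and the moral can be illustrated with a short Python sketch. It is only an illustration using the standard codecs, not a description of what Mule does; the sample words and variable names are assumptions made for the example.

```python
swedish = "Sk\u00e5ne"            # contains a-ring, stored as ISO 8859-1
polish = "\u0142\u00f3d\u017a"    # l-stroke, o-acute, z-acute, stored as ISO 8859-2
intended = swedish + " " + polish

# A file that mixes octets from both 8-bit encodings.
mixed = swedish.encode("iso8859-1") + b" " + polish.encode("iso8859-2")

read_as_latin1 = mixed.decode("iso8859-1")   # the Polish word is garbled
read_as_latin2 = mixed.decode("iso8859-2")   # the Swedish word is garbled

# A universal coded character set has one code for each character,
# so the same text survives a round trip unchanged.
via_utf8 = intended.encode("utf-8").decode("utf-8")

print(read_as_latin1)
print(read_as_latin2)
print(via_utf8)
```

Both single-encoding readings decode without error, which is exactly why the damage cannot be detected mechanically.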
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3239
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3240
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3241 @node Unification Internals, , What Unification Cannot Do for You, Charset Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3242 @subsection Internals
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3243
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3244 No internals documentation yet.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3245
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3246 @file{unity-utils.el} provides one utility function.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3247
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3248 @defun unity-dump-tables
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3249
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3250 Dump the temporary table created by loading @file{unity-utils.el}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3251 to @file{unity-tables.el}. Loading the latter file initializes
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3252 @samp{unity-equivalences}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3253 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3254
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3255
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3256 @node Charsets and Coding Systems, , Charset Unification, MULE
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3257 @subsection Charsets and Coding Systems
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3258
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3259 This section provides reference lists of Mule charsets and coding
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3260 systems. Mule charsets are typically named by character set and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3261 standard.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3262
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3263 @table @strong
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3264 @item ASCII variants
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3265
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3266 Identification of equivalent characters in these sets is not properly
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3267 implemented. Unification does not distinguish the two charsets.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3268
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3269 @samp{ascii} @samp{latin-jisx0201}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3270
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3271 @item Extended Latin
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3272
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3273 Characters from the following ISO 2022 conformant charsets are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3274 identified with equivalents in other charsets in the group by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3275 Unification.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3276
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3277 @samp{latin-iso8859-1} @samp{latin-iso8859-15} @samp{latin-iso8859-2}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3278 @samp{latin-iso8859-3} @samp{latin-iso8859-4} @samp{latin-iso8859-9}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3279 @samp{latin-iso8859-13} @samp{latin-iso8859-16}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3280
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3281 The following charsets are Latin variants which are not understood by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3282 Unification. In addition, many of the Asian language standards provide
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3283 ASCII, at least, and sometimes other Latin characters. None of these
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3284 are identified with their ISO 8859 equivalents.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3285
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3286 @samp{vietnamese-viscii-lower}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3287 @samp{vietnamese-viscii-upper}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3288
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3289 @item Other character sets
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3290
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3291 @samp{arabic-1-column}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3292 @samp{arabic-2-column}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3293 @samp{arabic-digit}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3294 @samp{arabic-iso8859-6}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3295 @samp{chinese-big5-1}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3296 @samp{chinese-big5-2}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3297 @samp{chinese-cns11643-1}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3298 @samp{chinese-cns11643-2}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3299 @samp{chinese-cns11643-3}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3300 @samp{chinese-cns11643-4}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3301 @samp{chinese-cns11643-5}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3302 @samp{chinese-cns11643-6}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3303 @samp{chinese-cns11643-7}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3304 @samp{chinese-gb2312}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3305 @samp{chinese-isoir165}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3306 @samp{cyrillic-iso8859-5}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3307 @samp{ethiopic}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3308 @samp{greek-iso8859-7}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3309 @samp{hebrew-iso8859-8}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3310 @samp{ipa}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3311 @samp{japanese-jisx0208}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3312 @samp{japanese-jisx0208-1978}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3313 @samp{japanese-jisx0212}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3314 @samp{katakana-jisx0201}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3315 @samp{korean-ksc5601}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3316 @samp{sisheng}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3317 @samp{thai-tis620}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3318 @samp{thai-xtis}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3319
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3320 @item Non-graphic charsets
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3321
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3322 @samp{control-1}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3323 @end table
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3324
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3325 @table @strong
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3326 @item No conversion
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3327
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3328 Some of these coding systems may specify EOL conventions. Note that
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3329 @samp{iso-8859-1} is a no-conversion coding system, not an ISO 2022
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3330 coding system. Although unification attempts to compensate for this, it
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3331 is possible that the @samp{iso-8859-1} coding system will behave
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3332 differently from other ISO 8859 coding systems.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3333
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3334 @samp{binary} @samp{no-conversion} @samp{raw-text} @samp{iso-8859-1}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3335
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3336 @item Latin coding systems
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3337
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3338 These coding systems are all single-byte, 8-bit ISO 2022 coding systems,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3339 combining ASCII in the GL register (bytes with high-bit clear) and an
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3340 extended Latin character set in the GR register (bytes with high-bit set).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3341
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3342 @samp{iso-8859-15} @samp{iso-8859-2} @samp{iso-8859-3} @samp{iso-8859-4}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3343 @samp{iso-8859-9} @samp{iso-8859-13} @samp{iso-8859-14} @samp{iso-8859-16}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3344
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3345 These coding systems are single-byte, 8-bit coding systems that do not
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3346 conform to international standards. They should be avoided in all
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3347 potentially multilingual contexts, including any text distributed over
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3348 the Internet and World Wide Web.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3349
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3350 @samp{windows-1251}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3351
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3352 @item Multilingual coding systems
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3353
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3354 The following ISO-2022-based coding systems are useful for multilingual
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3355 text.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3356
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3357 @samp{ctext} @samp{iso-2022-lock} @samp{iso-2022-7} @samp{iso-2022-7bit}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3358 @samp{iso-2022-7bit-ss2} @samp{iso-2022-8} @samp{iso-2022-8bit-ss2}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3359
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3360 XEmacs also supports Unicode with the Mule-UCS package. The following are the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3361 preferred coding systems for multilingual use. (There is a possible
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3362 exception for texts that mix several Asian ideographic character sets.)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3363
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3364 @samp{utf-16-be} @samp{utf-16-be-no-signature} @samp{utf-16-le}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3365 @samp{utf-16-le-no-signature} @samp{utf-7} @samp{utf-7-safe}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3366 @samp{utf-8} @samp{utf-8-ws}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3367
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3368 Development versions of XEmacs (the 21.5 series) support Unicode
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3369 internally, with (at least) the following coding systems implemented:
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3370
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3371 @samp{utf-16-be} @samp{utf-16-be-bom} @samp{utf-16-le}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3372 @samp{utf-16-le-bom} @samp{utf-8} @samp{utf-8-bom}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3373
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3374 @item Asian ideographic languages
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3375
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3376 The following coding systems are based on ISO 2022, and are more or less
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3377 suitable for encoding multilingual texts. They all can represent ASCII
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3378 at least, and sometimes several other foreign character sets, without
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3379 resort to arbitrary ISO 2022 designations. However, these subsets are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3380 not identified with the corresponding national standards in XEmacs Mule.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3381
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3382 @samp{chinese-euc} @samp{cn-big5} @samp{cn-gb-2312} @samp{gb2312}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3383 @samp{hz} @samp{hz-gb-2312} @samp{old-jis} @samp{japanese-euc}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3384 @samp{junet} @samp{euc-japan} @samp{euc-jp} @samp{iso-2022-jp}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3385 @samp{iso-2022-jp-1978-irv} @samp{iso-2022-jp-2} @samp{euc-kr}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3386 @samp{korean-euc} @samp{iso-2022-kr} @samp{iso-2022-int-1}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3387
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3388 The following coding systems cannot be used for general multilingual
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3389 text and do not cooperate well with other coding systems.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3390
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3391 @samp{big5} @samp{shift_jis}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3392
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3393 @item Other languages
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3394
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3395 The following coding systems are based on ISO 2022. Though none of them
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3396 provides any Latin characters beyond ASCII, XEmacs Mule allows (and up
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3397 to 21.4 defaults to) use of ISO 2022 control sequences to designate
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3398 other character sets for inclusion in the text.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3399
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3400 @samp{iso-8859-5} @samp{iso-8859-7} @samp{iso-8859-8}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3401 @samp{ctext-hebrew}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3402
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3403 The following are coding systems that do not conform to ISO 2022 and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3404 thus cannot be safely used in a multilingual context.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3405
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3406 @samp{alternativnyj} @samp{koi8-r} @samp{tis-620} @samp{viqr}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3407 @samp{viscii} @samp{vscii}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3408
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3409 @item Special coding systems
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3410
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3411 Mule uses the following coding systems for special purposes.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3412
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3413 @samp{automatic-conversion} @samp{undecided} @samp{escape-quoted}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3414
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3415 @samp{escape-quoted} is especially important, as it is used internally
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3416 as the coding system for autosaved data.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3417
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3418 The following coding systems are aliases for others, and are used for
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3419 communication with the host operating system.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3420
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3421 @samp{file-name} @samp{keyboard} @samp{terminal}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3422
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3423 @end table
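
The GL/GR division described under the Latin coding systems above can be sketched as follows. This Python fragment follows the simplification used in the text (high bit clear versus high bit set) and ignores the separate control-code areas that ISO 2022 also defines; the function name is hypothetical.

```python
def split_gl_gr(data):
    """Return (gl, gr): octets with the high bit clear and set, respectively."""
    gl = bytes(b for b in data if b < 0x80)    # ASCII half
    gr = bytes(b for b in data if b >= 0x80)   # extended Latin half
    return gl, gr

sample = "na\u00efve".encode("iso8859-15")
gl, gr = split_gl_gr(sample)
print(gl, gr)
```

Only the octets in the second group depend on which ISO 8859 part is in use; the ASCII half reads the same in all of them.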
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3424
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3425 Mule detection of coding systems is actually limited to detection of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3426 classes of coding systems called @dfn{coding categories}. These coding
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3427 categories are identified by the ISO 2022 control sequences they use, if
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3428 any, by their conformance to ISO 2022 restrictions on code points that
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3429 may be used, and by characteristic patterns of use of 8-bit code points.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3430
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3431 @samp{no-conversion}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3432 @samp{utf-8}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3433 @samp{ucs-4}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3434 @samp{iso-7}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3435 @samp{iso-lock-shift}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3436 @samp{iso-8-1}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3437 @samp{iso-8-2}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3438 @samp{iso-8-designate}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3439 @samp{shift-jis}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3440 @samp{big5}
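
To give a feel for what detection by category means, here is a deliberately crude Python sketch. It is not Mule's detector and covers only a few of the clues mentioned above (escape sequences and patterns of 8-bit octets); the function name and the labels marked as hypothetical are assumptions made for the example.

```python
def rough_category(data):
    """Classify octets using a few coarse clues, loosely after coding categories."""
    seven_bit = all(b < 0x80 for b in data)
    if b"\x1b" in data and seven_bit:
        return "iso-7"        # escape sequences, no high-bit octets
    if seven_bit:
        return "ascii"        # hypothetical label, not a Mule category
    try:
        data.decode("utf-8")
        return "utf-8"        # high-bit octets form valid UTF-8 sequences
    except UnicodeDecodeError:
        return "8-bit"        # hypothetical label: some unlabelled 8-bit code

polish = "\u017c\u00f3\u0142\u0107"
print(rough_category("\u65e5\u672c".encode("iso2022_jp")))
print(rough_category(polish.encode("utf-8")))
print(rough_category(polish.encode("iso8859-2")))
```

The last case shows the limit of this kind of detection: the sketch can tell that the data is in some 8-bit code, but not whether ISO 8859-1 or ISO 8859-2 was intended.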
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3441
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3442
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3443 @c end of mule.texi
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 901
diff changeset
3444