@c -*-texinfo-*-
@c This is part of the XEmacs Lisp Reference Manual.
@c Copyright (C) 1996 Ben Wing, 2001-2002 Free Software Foundation.
@c See the file lispref.texi for copying conditions.
@setfilename ../../info/internationalization.info
@node MULE, Tips, Internationalization, top
@chapter MULE

@dfn{MULE} is the name originally given to the version of GNU Emacs
extended for multi-lingual (and in particular Asian-language) support.
``MULE'' is short for ``MUlti-Lingual Emacs''.  It is an extension and
complete rewrite of Nemacs (``Nihon Emacs'' where ``Nihon'' is the
Japanese word for ``Japan''), which only provided support for Japanese.
XEmacs refers to its multi-lingual support as @dfn{MULE support} since
it is based on @dfn{MULE}.

@menu
* Internationalization Terminology::
                        Definition of various internationalization terms.
* Charsets::            Sets of related characters.
* MULE Characters::     Working with characters in XEmacs/MULE.
* Composite Characters:: Making new characters by overstriking other ones.
* Coding Systems::      Ways of representing a string of chars using integers.
* CCL::                 A special language for writing fast converters.
* Category Tables::     Subdividing charsets into groups.
* Unicode Support::     The universal coded character set.
* Charset Unification:: Handling overlapping character sets.
* Charsets and Coding Systems::  Tables and reference information.
@end menu

@node Internationalization Terminology, Charsets, , MULE
@section Internationalization Terminology

In internationalization terminology, a string of text is divided up
into @dfn{characters}, which are the printable units that make up the
text. A single character is (for example) a capital @samp{A}, the
number @samp{2}, a Katakana character, a Hangul character, a Kanji
ideograph (an @dfn{ideograph} is a ``picture'' character, such as is
used in Japanese Kanji, Chinese Hanzi, and Korean Hanja; typically there
are thousands of such ideographs in each language), etc. The basic
property of a character is that it is the smallest unit of text with
semantic significance in text processing---i.e., characters are abstract
units defined by their meaning, not by their exact appearance.

Human beings normally process text visually, so to a first approximation
a character may be identified with its shape. Note that the same
character may be drawn by two different people (or in two different
fonts) in slightly different ways, although the "basic shape" will be the
same. But consider the works of Scott Kim; human beings can recognize
hugely variant shapes as the "same" character. Sometimes, especially
where characters are extremely complicated to write, completely
different shapes may be defined as the "same" character in national
standards. The Taiwanese variant of Hanzi is generally the most
complicated; over the centuries, the Japanese, Koreans, and the People's
Republic of China have adopted simplifications of the shape, but the
line of descent from the original shape is recorded, and the meanings
and pronunciation of different forms of the same character are
considered to be identical within each language. (Of course, it may
take a specialist to recognize the related form; the point is that the
relations are standardized, despite the differing shapes.)

In some cases, the differences will be significant enough that it is
actually possible to identify two or more distinct shapes that both
represent the same character. For example, the lowercase letters
@samp{a} and @samp{g} each have two distinct possible shapes---the
@samp{a} can optionally have a curved tail projecting off the top, and
the @samp{g} can be formed either of two loops, or of one loop and a
tail hanging off the bottom. Such distinct possible shapes of a
character are called @dfn{glyphs}. The important characteristic of two
glyphs making up the same character is that the choice between one or
the other is purely stylistic and has no linguistic effect on a word
(this is the reason why a capital @samp{A} and lowercase @samp{a}
are different characters rather than different glyphs---e.g.
@samp{Aspen} is a city while @samp{aspen} is a kind of tree).

Note that @dfn{character} and @dfn{glyph} are used differently
here than elsewhere in XEmacs.

A @dfn{character set} is essentially a set of related characters. ASCII,
for example, is a set of 94 characters (or 128, if you count
non-printing characters). Other character sets are ISO8859-1 (ASCII
plus various accented characters and other international symbols),
JIS X 0201 (ASCII, more or less, plus half-width Katakana), JIS X 0208
(Japanese Kanji), JIS X 0212 (a second set of less-used Japanese Kanji),
GB2312 (Mainland Chinese Hanzi), etc.

The definition of a character set will implicitly or explicitly give
it an @dfn{ordering}, a way of assigning a number to each character in
the set. For many character sets, there is a natural ordering, for
example the ``ABC'' ordering of the Roman letters. But it is not clear
whether digits should come before or after the letters, and in fact
different European languages treat the ordering of accented characters
differently. It is useful to use the natural order where available, of
course. The number assigned to any particular character is called the
character's @dfn{code point}. (Within a given character set, each
character has a unique code point. Thus the word "set" is ill-chosen;
different orderings of the same characters are different character sets.
Identifying characters is simple enough for alphabetic character sets,
but the difference in ordering can cause great headaches when the same
thousands of characters are used by different cultures as in the Hanzi.)

It's important to understand that a character is defined not by any
number attached to it, but by its meaning. For example, ASCII and
EBCDIC are two charsets containing exactly the same characters
(lowercase and uppercase letters, numbers 0 through 9, particular
punctuation marks) but with different numberings. The @samp{comma}
character in ASCII and EBCDIC, for instance, is the same character
despite having a different numbering. Conversely, when comparing ASCII
and JIS-Roman, which look the same except that the latter has a yen sign
substituted for the backslash, we would say that the backslash and yen
sign are @emph{not} the same characters, despite having the same number
(92) and despite the fact that all other characters are present in both
charsets, with the same numbering. ASCII and JIS-Roman, then, do
@emph{not} have exactly the same characters in them (ASCII has a
backslash character but no yen-sign character, and vice-versa for
JIS-Roman), unlike ASCII and EBCDIC, even though the numberings in ASCII
and JIS-Roman are closer.

Sometimes, a code point is not a single number, but instead a group of
numbers, called @dfn{position codes}. In such cases, the number of
position codes required to index a particular character in a character
set is called the @dfn{dimension} of the character set. Character sets
indexed by more than one position code typically use byte-sized position
codes. Small character sets, e.g. ASCII, invariably use a single
position code, but for larger character sets, the choice of whether to
use multiple position codes or a single large (16-bit or 32-bit) number
is arbitrary. Unicode typically uses a single large number, but
language-specific or "national" character sets often use multiple
(usually two) position codes. For example, JIS X 0208, i.e. Japanese
Kanji, has thousands of characters, and is of dimension two -- every
character is indexed by two position codes, each in the range 1 through
94. (This number ``94'' is not a coincidence; it is the same as the
number of printable characters in ASCII, and was chosen so that JIS
characters could be directly encoded using two printable ASCII
characters.) Note that the choice of the range here is somewhat
arbitrary -- it could just as easily be 0 through 93, 2 through 95, etc.
In fact, the range for JIS position codes (and for other character sets
modeled after it) is often given as range 33 through 126, so as to
directly match ASCII printing characters.
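
The relationship between the two conventions is a simple offset.  The
following hypothetical Emacs Lisp helpers (not part of the MULE API,
shown only to make the arithmetic concrete) shift a position code in
the range 1 through 94 into the ASCII printing range 33 through 126
and back:

@example
;; Illustrative only: map a JIS position code (1-94) into the ASCII
;; printing range (33-126), and back again.
(defun sketch-position-to-printable (code)
  (+ code 32))

(defun sketch-printable-to-position (byte)
  (- byte 32))

(sketch-position-to-printable 1)    ; => 33
(sketch-position-to-printable 94)   ; => 126
(sketch-printable-to-position 126)  ; => 94
@end example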

An @dfn{encoding} is a way of numerically representing characters from
one or more character sets into a stream of like-sized numerical values
called @dfn{words} -- typically 8-bit bytes, but sometimes 16-bit or
32-bit quantities. In a context where dealing with Japanese motivates
much of XEmacs' design in this area, it's important to clearly
distinguish between charsets and encodings. For a simple charset like
ASCII, there is only one encoding normally used -- each character is
represented by a single byte, with the same value as its code point.
For more complicated charsets, however, or when a single encoding needs
to represent more than one charset, things are not so obvious. Unicode
version 2, for example, is a large charset with thousands of characters,
each indexed by a 16-bit number, often represented in hex, e.g. 0x05D0
for the Hebrew letter "aleph". One obvious encoding (actually two
encodings, depending on which of the two possible byte orderings is
chosen) simply uses two bytes per character. This encoding is
convenient for internal processing of Unicode text; however, it's
incompatible with ASCII, and thus external text (files, e-mail, etc.)
that is encoded this way is completely uninterpretable by programs
lacking Unicode support. For this reason, a different, ASCII-compatible
encoding, e.g. UTF-8, is usually used for external text. UTF-8
represents Unicode characters with one to three bytes (often extended to
six bytes to handle characters with up to 31-bit indices). Unicode
characters 00 to 7F (identical with ASCII) are directly represented with
one byte, and other characters with two or more bytes, each in the range
80 to FF. Applications that don't understand Unicode will still be able
to process ASCII characters represented in UTF-8-encoded text, and will
typically ignore (and hopefully preserve) the high-bit characters.
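
The variable-width scheme just described can be sketched in a few
lines.  The following is illustrative Emacs Lisp only (XEmacs provides
real UTF-8 support through its coding systems); it encodes a single
code point below 0x10000 into the one- to three-byte form described
above:

@example
;; Illustrative only: encode one code point (below #x10000) as a list
;; of UTF-8 bytes, following the pattern described in the text.
(defun sketch-utf-8-bytes (cp)
  (cond ((< cp #x80)                    ; 00-7F: identical to ASCII
         (list cp))
        ((< cp #x800)                   ; two bytes, each in 80-FF
         (list (logior #xC0 (lsh cp -6))
               (logior #x80 (logand cp #x3F))))
        (t                              ; three bytes
         (list (logior #xE0 (lsh cp -12))
               (logior #x80 (logand (lsh cp -6) #x3F))
               (logior #x80 (logand cp #x3F))))))

(sketch-utf-8-bytes #x41)    ; => (65), a plain ASCII A
(sketch-utf-8-bytes #x05D0)  ; => (215 144), the Hebrew letter aleph
@end example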

Similarly, Shift-JIS and EUC-JP are different encodings normally used to
encode the same character set(s), these character sets being subsets of
Unicode. However, the obvious approach of unifying XEmacs' internal
encoding across character sets, as was part of the motivation behind
Unicode, wasn't taken. This means that characters in these character
sets that are identical to characters in other character sets---for
example, the Greek alphabet is in the large Japanese character sets and
at least one European character set---are unfortunately disjoint.

Naive use of code points is also not possible if more than one
character set is to be used in the encoding. For example, printed
Japanese text typically requires characters from multiple character sets
-- ASCII, JIS X 0208, and JIS X 0212, to be specific. Each of these is
indexed using one or more position codes in the range 1 through 94 (or
33 through 126), so the position codes could not be used directly or
there would be no way to tell which character was meant. Different
Japanese encodings handle this differently -- JIS uses special escape
characters to denote different character sets; EUC sets the high bit of
the position codes for JIS X 0208 and JIS X 0212, and puts a special
extra byte before each JIS X 0212 character; etc.

The encodings described above are all 7-bit or 8-bit encodings. The
fixed-width Unicode encoding previously described, however, is sometimes
considered to be a 16-bit encoding, in which case the issue of byte
ordering does not come up. (Imagine, for example, that the text is
represented as an array of shorts.) Similarly, Unicode version 3 (which
has characters with indices above 0xFFFF), and other very large
character sets, may be represented internally as 32-bit encodings,
i.e. arrays of ints. However, it does not make too much sense to talk
about 16-bit or 32-bit encodings for external data, since nowadays 8-bit
data is a universal standard -- the closest you can get is fixed-width
encodings using two or four bytes to encode 16-bit or 32-bit values. (A
"7-bit" encoding is used when it cannot be guaranteed that the high bit
of 8-bit data will be correctly preserved. Some e-mail gateways, for
example, strip the high bit of text passing through them. These same
gateways often handle non-printable characters incorrectly, and so 7-bit
encodings usually avoid using bytes with such values.)

A general method of handling text using multiple character sets
(whether for multilingual text, or simply text in an extremely
complicated single language like Japanese) is defined in the
international standard ISO 2022. ISO 2022 will be discussed in more
detail later (@pxref{ISO 2022}), but for now suffice it to say that text
needs control functions (at least spacing), and if escape sequences are
to be used, an escape sequence introducer. It was decided to make all
text streams compatible with ASCII in the sense that the codes 0--31
(and 128--159) would always be control codes, never graphic characters,
and where defined by the character set the @samp{SPC} character would be
assigned code 32, and @samp{DEL} would be assigned 127. Thus there are
94 code points remaining if 7 bits are used. This is the reason that
most character sets are defined using position codes in the range 1
through 94. Then ISO 2022 compatible encodings are produced by shifting
the position codes 1 to 94 into character codes 33 to 126, or (if 8 bit
codes are available) into character codes 161 to 254.

Encodings are classified as either @dfn{modal} or @dfn{non-modal}. In
a @dfn{modal encoding}, there are multiple states that the encoding can
be in, and the interpretation of the values in the stream depends on the
current global state of the encoding. Special values in the encoding,
called @dfn{escape sequences}, are used to change the global state.
JIS, for example, is a modal encoding. The bytes @samp{ESC $ B}
indicate that, from then on, bytes are to be interpreted as position
codes for JIS X 0208, rather than as ASCII. This effect is cancelled
using the bytes @samp{ESC ( B}, which mean ``switch from whatever the
current state is to ASCII''. To switch to JIS X 0212, the escape
sequence @samp{ESC $ ( D} is used. (Note that here, as is common, the
escape sequences do in fact begin with @samp{ESC}. This is not
necessarily the case, however. Some encodings use control characters
called "locking shifts" (effect persists until cancelled) to switch
character sets.)

A @dfn{non-modal encoding} has no global state that extends past the
character currently being interpreted. EUC, for example, is a
non-modal encoding. Characters in JIS X 0208 are encoded by setting
the high bit of the position codes, and characters in JIS X 0212 are
encoded by doing the same but also prefixing the character with the
byte 0x8F.
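
As an illustration of how little context a non-modal encoding needs,
the following sketch (not part of MULE, and deliberately simplified)
guesses the character set of a single EUC-JP byte given only that byte
and the byte before it; a real decoder must also track whether it is
on the first or second byte of a two-byte character:

@example
;; Illustrative only.  PREV-BYTE is the preceding byte, or nil at the
;; start of the stream.
(defun sketch-euc-jp-byte-class (byte prev-byte)
  (cond ((< byte #x80) 'ascii)               ; high bit clear
        ((= byte #x8F) 'jisx0212-prefix)     ; marks a JIS X 0212 char
        ((and prev-byte (= prev-byte #x8F))
         'japanese-jisx0212)                 ; first byte after 0x8F
        (t 'japanese-jisx0208)))             ; plain high-bit byte

(sketch-euc-jp-byte-class #x41 nil)    ; => ascii
(sketch-euc-jp-byte-class #xB0 #xA4)   ; => japanese-jisx0208
(sketch-euc-jp-byte-class #xB0 #x8F)   ; => japanese-jisx0212
@end example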

The advantage of a modal encoding is that it is generally more
space-efficient, and is easily extendible because there are essentially
an arbitrary number of escape sequences that can be created. The
disadvantage, however, is that it is much more difficult to work with
if it is not being processed in a sequential manner. In the non-modal
EUC encoding, for example, the byte 0x41 always refers to the letter
@samp{A}; whereas in JIS, it could either be the letter @samp{A}, or
one of the two position codes in a JIS X 0208 character, or one of the
two position codes in a JIS X 0212 character. Determining exactly which
one is meant could be difficult and time-consuming if the previous
bytes in the string have not already been processed, or impossible if
they are drawn from an external stream that cannot be rewound.

Non-modal encodings are further divided into @dfn{fixed-width} and
@dfn{variable-width} formats. A fixed-width encoding always uses
the same number of words per character, whereas a variable-width
encoding does not. EUC is a good example of a variable-width
encoding: one to three bytes are used per character, depending on
the character set. 16-bit and 32-bit encodings are nearly always
fixed-width, and this is in fact one of the main reasons for using
an encoding with a larger word size. The advantages of fixed-width
encodings should be obvious. The advantages of variable-width
encodings are that they are generally more space-efficient and allow
for compatibility with existing 8-bit encodings such as ASCII. (For
example, in Unicode ASCII characters are simply promoted to a 16-bit
representation. That means that every ASCII character contains a
@samp{NUL} byte; evidently all of the standard string manipulation
functions will lose badly in a fixed-width Unicode environment.)

The bytes in an 8-bit encoding are often referred to as @dfn{octets}
rather than simply as bytes. This terminology dates back to the days
before 8-bit bytes were universal, when some computers had 9-bit bytes,
others had 10-bit bytes, etc.

@node Charsets, MULE Characters, Internationalization Terminology, MULE
@section Charsets

A @dfn{charset} in MULE is an object that encapsulates a
particular character set as well as an ordering of those characters.
Charsets are permanent objects and are named using symbols, like
faces.

@defun charsetp object
This function returns non-@code{nil} if @var{object} is a charset.
@end defun

@menu
* Charset Properties::          Properties of a charset.
* Basic Charset Functions::     Functions for working with charsets.
* Charset Property Functions::  Functions for accessing charset properties.
* Predefined Charsets::         Predefined charset objects.
@end menu

@node Charset Properties, Basic Charset Functions, , Charsets
@subsection Charset Properties

Charsets have the following properties:

@table @code
@item name
A symbol naming the charset. Every charset must have a different name;
this allows a charset to be referred to using its name rather than
the actual charset object.
@item doc-string
A documentation string describing the charset.
@item registry
A regular expression matching the font registry field for this character
set. For example, both the @code{ascii} and @code{latin-iso8859-1}
charsets use the registry @code{"ISO8859-1"}. This field is used to
choose an appropriate font when the user gives a general font
specification such as @samp{-*-courier-medium-r-*-140-*}, i.e. a
14-point upright medium-weight Courier font.
@item dimension
Number of position codes used to index a character in the character set.
XEmacs/MULE can only handle character sets of dimension 1 or 2.
This property defaults to 1.
@item chars
Number of characters in each dimension. In XEmacs/MULE, the only
allowed values are 94 or 96. (There are a couple of pre-defined
character sets, such as ASCII, that do not follow this, but you cannot
define new ones like this.) Defaults to 94. Note that if the dimension
is 2, the character set thus described is 94x94 or 96x96.
@item columns
Number of columns used to display a character in this charset.
Only used in TTY mode. (Under X, the actual width of a character
can be derived from the font used to display the characters.)
If unspecified, defaults to the dimension. (This is almost
always the correct value, because character sets with dimension 2
are usually ideograph character sets, which need two columns to
display the intricate ideographs.)
@item direction
A symbol, either @code{l2r} (left-to-right) or @code{r2l}
(right-to-left). Defaults to @code{l2r}. This specifies the
direction that the text should be displayed in, and will be
left-to-right for most charsets but right-to-left for Hebrew
and Arabic. (Right-to-left display is not currently implemented.)
@item final
Final byte of the standard ISO 2022 escape sequence designating this
charset. Must be supplied. Each combination of (@var{dimension},
@var{chars}) defines a separate namespace for final bytes, and each
charset within a particular namespace must have a different final byte.
Note that ISO 2022 restricts the final byte to the range 0x30 - 0x7E if
dimension == 1, and 0x30 - 0x5F if dimension == 2. Note also that final
bytes in the range 0x30 - 0x3F are reserved for user-defined (not
official) character sets. For more information on ISO 2022, see @ref{Coding
Systems}.
@item graphic
0 (use left half of font on output) or 1 (use right half of font on
output). Defaults to 0. This specifies how to convert the position
codes that index a character in a character set into an index into the
font used to display the character set. With @code{graphic} set to 0,
position codes 33 through 126 map to font indices 33 through 126; with
it set to 1, position codes 33 through 126 map to font indices 161
through 254 (i.e. the same number but with the high bit set). For
example, for a font whose registry is ISO8859-1, the left half of the
font (octets 0x20 - 0x7F) is the @code{ascii} charset, while the right
half (octets 0xA0 - 0xFF) is the @code{latin-iso8859-1} charset.
@item ccl-program
A compiled CCL program used to convert a character in this charset into
an index into the font. This is in addition to the @code{graphic}
property. If a CCL program is defined, the position codes of a
character will first be processed according to @code{graphic} and
then passed through the CCL program, with the resulting values used
to index the font.

This is used, for example, in the Big5 character set (used in Taiwan).
This character set is not ISO-2022-compliant, and its size (94x157) does
not fit within the maximum 96x96 size of ISO-2022-compliant character
sets. As a result, XEmacs/MULE splits it (in a rather complex fashion,
so as to group the most commonly used characters together) into two
charset objects (@code{big5-1} and @code{big5-2}), each of size 94x94,
and each charset object uses a CCL program to convert the modified
position codes back into standard Big5 indices to retrieve a character
from a Big5 font.
@end table

Most of the above properties can only be set when the charset is
initialized, and cannot be changed later.
@xref{Charset Property Functions}.

@node Basic Charset Functions, Charset Property Functions, Charset Properties, Charsets
@subsection Basic Charset Functions

@defun find-charset charset-or-name
This function retrieves the charset of the given name. If
@var{charset-or-name} is a charset object, it is simply returned.
Otherwise, @var{charset-or-name} should be a symbol. If there is no
such charset, @code{nil} is returned. Otherwise the associated charset
object is returned.
@end defun

@defun get-charset name
This function retrieves the charset of the given name. Same as
@code{find-charset} except an error is signalled if there is no such
charset instead of returning @code{nil}.
@end defun

@defun charset-list
This function returns a list of the names of all defined charsets.
@end defun
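
For example, the following calls illustrate how these functions
behave; the printed representation of a charset object shown in the
comments is only indicative, and the exact set of defined charsets
depends on your build:

@example
(find-charset 'ascii)            ; => a charset object for ascii
(find-charset 'no-such-charset)  ; => nil
(get-charset 'no-such-charset)   ; signals an error
(charsetp (find-charset 'ascii)) ; => t
(memq 'latin-iso8859-1 (charset-list))  ; => non-nil in a MULE XEmacs
@end example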

@defun make-charset name doc-string props
This function defines a new character set. This function is for use
with MULE support. @var{name} is a symbol, the name by which the
character set is normally referred to. @var{doc-string} is a string
describing the character set. @var{props} is a property list,
describing the specific nature of the character set. The recognized
properties are @code{registry}, @code{dimension}, @code{columns},
@code{chars}, @code{final}, @code{graphic}, @code{direction}, and
@code{ccl-program}, as previously described.
@end defun
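
As a sketch of how the pieces fit together, a private 94x94 charset
might be registered along the following lines. The charset name,
registry, and final byte here are hypothetical examples; the final
byte is taken from the user-defined range 0x30 - 0x3F described in
@ref{Charset Properties}:

@example
;; Hypothetical example: register a private 94x94 charset.
(make-charset
 'my-private-charset "My private two-dimensional character set"
 '(dimension 2
   chars 94
   final ?0              ; 0x30, in the user-defined range
   graphic 0
   direction l2r
   registry "MyRegistry-1"))
@end example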

@defun make-reverse-direction-charset charset new-name
This function makes a charset equivalent to @var{charset} but which goes
in the opposite direction. @var{new-name} is the name of the new
charset. The new charset is returned.
@end defun

@defun charset-from-attributes dimension chars final &optional direction
This function returns a charset with the given @var{dimension},
@var{chars}, @var{final}, and @var{direction}. If @var{direction} is
omitted, both directions will be checked (left-to-right will be returned
if character sets exist for both directions).
@end defun

@defun charset-reverse-direction-charset charset
This function returns the charset (if any) with the same dimension,
number of characters, and final byte as @var{charset}, but which is
displayed in the opposite direction.
@end defun

@node Charset Property Functions, Predefined Charsets, Basic Charset Functions, Charsets
@subsection Charset Property Functions

All of these functions accept either a charset name or charset object.

@defun charset-property charset prop
This function returns property @var{prop} of @var{charset}.
@xref{Charset Properties}.
@end defun

Convenience functions are also provided for retrieving individual
properties of a charset.

@defun charset-name charset
This function returns the name of @var{charset}. This will be a symbol.
@end defun

@defun charset-description charset
This function returns the documentation string of @var{charset}.
@end defun

@defun charset-registry charset
This function returns the registry of @var{charset}.
@end defun

@defun charset-dimension charset
This function returns the dimension of @var{charset}.
@end defun

@defun charset-chars charset
This function returns the number of characters per dimension of
@var{charset}.
@end defun

@defun charset-width charset
This function returns the number of display columns per character (in
TTY mode) of @var{charset}.
@end defun

@defun charset-direction charset
This function returns the display direction of @var{charset}---either
@code{l2r} or @code{r2l}.
@end defun

@defun charset-iso-final-char charset
This function returns the final byte of the ISO 2022 escape sequence
designating @var{charset}.
@end defun

@defun charset-iso-graphic-plane charset
This function returns either 0 or 1, depending on whether the position
codes of characters in @var{charset} map to the left or right half
of their font, respectively.
@end defun

@defun charset-ccl-program charset
This function returns the CCL program, if any, for converting
position codes of characters in @var{charset} into font indices.
@end defun
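
As an illustration, here is how some of these accessors behave on the
predefined @code{japanese-jisx0208} charset; the values shown in the
comments follow from the tables in @ref{Predefined Charsets}, though
the exact printed form (character versus integer for the final byte,
for instance) may vary:

@example
(charset-dimension 'japanese-jisx0208)          ; => 2
(charset-chars 'japanese-jisx0208)              ; => 94
(charset-iso-final-char 'japanese-jisx0208)     ; => ?B
(charset-direction 'japanese-jisx0208)          ; => l2r
(charset-property 'japanese-jisx0208 'graphic)  ; => 0
@end example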

The two properties of a charset that can currently be set after the
charset has been created are the CCL program and the font registry.

@defun set-charset-ccl-program charset ccl-program
This function sets the @code{ccl-program} property of @var{charset} to
@var{ccl-program}.
@end defun

@defun set-charset-registry charset registry
This function sets the @code{registry} property of @var{charset} to
@var{registry}.
@end defun

@node Predefined Charsets, , Charset Property Functions, Charsets
@subsection Predefined Charsets

The following charsets are predefined in the C code.

@example
Name                    Type  Fi Gr Dir Registry
--------------------------------------------------------------
ascii                   94    B  0  l2r ISO8859-1
control-1               94       0  l2r ---
latin-iso8859-1         94    A  1  l2r ISO8859-1
latin-iso8859-2         96    B  1  l2r ISO8859-2
latin-iso8859-3         96    C  1  l2r ISO8859-3
latin-iso8859-4         96    D  1  l2r ISO8859-4
cyrillic-iso8859-5      96    L  1  l2r ISO8859-5
arabic-iso8859-6        96    G  1  r2l ISO8859-6
greek-iso8859-7         96    F  1  l2r ISO8859-7
hebrew-iso8859-8        96    H  1  r2l ISO8859-8
latin-iso8859-9         96    M  1  l2r ISO8859-9
thai-tis620             96    T  1  l2r TIS620
katakana-jisx0201       94    I  1  l2r JISX0201.1976
latin-jisx0201          94    J  0  l2r JISX0201.1976
japanese-jisx0208-1978  94x94 @@  0  l2r JISX0208.1978
japanese-jisx0208       94x94 B  0  l2r JISX0208.19(83|90)
japanese-jisx0212       94x94 D  0  l2r JISX0212
chinese-gb2312          94x94 A  0  l2r GB2312
chinese-cns11643-1      94x94 G  0  l2r CNS11643.1
chinese-cns11643-2      94x94 H  0  l2r CNS11643.2
chinese-big5-1          94x94 0  0  l2r Big5
chinese-big5-2          94x94 1  0  l2r Big5
korean-ksc5601          94x94 C  0  l2r KSC5601
composite               96x96    0  l2r ---
@end example

The following charsets are predefined in the Lisp code.

@example
Name                    Type  Fi Gr Dir Registry
--------------------------------------------------------------
arabic-digit            94    2  0  l2r MuleArabic-0
arabic-1-column         94    3  0  r2l MuleArabic-1
arabic-2-column         94    4  0  r2l MuleArabic-2
sisheng                 94    0  0  l2r sisheng_cwnn\|OMRON_UDC_ZH
chinese-cns11643-3      94x94 I  0  l2r CNS11643.1
chinese-cns11643-4      94x94 J  0  l2r CNS11643.1
chinese-cns11643-5      94x94 K  0  l2r CNS11643.1
chinese-cns11643-6      94x94 L  0  l2r CNS11643.1
chinese-cns11643-7      94x94 M  0  l2r CNS11643.1
ethiopic                94x94 2  0  l2r Ethio
ascii-r2l               94    B  0  r2l ISO8859-1
ipa                     96    0  1  l2r MuleIPA
vietnamese-viscii-lower 96    1  1  l2r VISCII1.1
vietnamese-viscii-upper 96    2  1  l2r VISCII1.1
@end example

For all of the above charsets, the dimension and number of columns are
the same.

Note that ASCII, Control-1, and Composite are handled specially.
This is why some of the fields are blank; and some of the filled-in
fields (e.g. the type) are not really accurate.

@node MULE Characters, Composite Characters, Charsets, MULE
@section MULE Characters

@defun make-char charset arg1 &optional arg2
This function makes a multi-byte character from @var{charset} and octets
@var{arg1} and @var{arg2}.
@end defun

@defun char-charset character
This function returns the character set of char @var{character}.
@end defun

@defun char-octet character &optional n
This function returns the octet (i.e. position code) numbered @var{n}
(should be 0 or 1) of char @var{character}. @var{n} defaults to 0 if omitted.
@end defun

@defun find-charset-region start end &optional buffer
This function returns a list of the charsets in the region between
@var{start} and @var{end}. @var{buffer} defaults to the current buffer
if omitted.
@end defun

@defun find-charset-string string
This function returns a list of the charsets in @var{string}.
@end defun
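
Putting these together, the following illustrative session builds a
JIS X 0208 character and takes it apart again. The octet values 38
and 34 are arbitrary examples, and the return values shown in the
comments are informal (for instance, @code{char-charset} may return a
charset object rather than its name, depending on version):

@example
;; Illustrative only; the octets are arbitrary example values.
(setq ch (make-char 'japanese-jisx0208 38 34))
(char-charset ch)   ; => japanese-jisx0208 (or the charset object)
(char-octet ch 0)   ; => 38
(char-octet ch 1)   ; => 34
(find-charset-string (char-to-string ch))
                    ; => a list containing japanese-jisx0208
@end example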

@node Composite Characters, Coding Systems, MULE Characters, MULE
@section Composite Characters

Composite characters are not yet completely implemented.

@defun make-composite-char string
This function converts a string into a single composite character. The
character is the result of overstriking all the characters in the
string.
@end defun

@defun composite-char-string character
This function returns a string of the characters comprising a composite
character.
@end defun

@defun compose-region start end &optional buffer
This function composes the characters in the region from @var{start} to
@var{end} in @var{buffer} into one composite character. The composite
character replaces the composed characters. @var{buffer} defaults to
the current buffer if omitted.
@end defun

@defun decompose-region start end &optional buffer
This function decomposes any composite characters in the region from
@var{start} to @var{end} in @var{buffer}. This converts each composite
character into one or more characters, the individual characters out of
which the composite character was formed. Non-composite characters are
left as-is. @var{buffer} defaults to the current buffer if omitted.
@end defun
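
To the extent that they are implemented, the first two functions can
be exercised as in the following sketch; since the feature is
incomplete, this may not work in all builds:

@example
;; Purely illustrative; composite characters are only partially
;; implemented.
(let ((ch (make-composite-char "ab")))
  (composite-char-string ch))   ; => "ab"
@end example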

@node Coding Systems, CCL, Composite Characters, MULE
@section Coding Systems

A coding system is an object that defines how text containing multiple
character sets is encoded into a stream of (typically 8-bit) bytes. The
coding system is used to decode the stream into a series of characters
(which may be from multiple charsets) when the text is read from a file
or process, and is used to encode the text back into the same format
when it is written out to a file or process.

For example, many ISO-2022-compliant coding systems (such as Compound
Text, which is used for inter-client data under the X Window System) use
escape sequences to switch between different charsets -- Japanese Kanji,
for example, is invoked with @samp{ESC $ ( B}; ASCII is invoked with
@samp{ESC ( B}; and Cyrillic is invoked with @samp{ESC - L}. See
@code{make-coding-system} for more information.

Coding systems are normally identified using a symbol, and the symbol is
accepted in place of the actual coding system object whenever a coding
system is called for. (This is similar to how faces and charsets work.)

@defun coding-system-p object
This function returns non-@code{nil} if @var{object} is a coding system.
@end defun

@menu
* Coding System Types::         Classifying coding systems.
* ISO 2022::                    An international standard for
                                  charsets and encodings.
* EOL Conversion::              Dealing with different ways of denoting
                                  the end of a line.
* Coding System Properties::    Properties of a coding system.
* Basic Coding System Functions::  Working with coding systems.
* Coding System Property Functions:: Retrieving a coding system's properties.
* Encoding and Decoding Text::  Encoding and decoding text.
* Detection of Textual Encoding:: Determining how text is encoded.
* Big5 and Shift-JIS Functions:: Special functions for these non-standard
                                  encodings.
* Predefined Coding Systems::   Coding systems implemented by MULE.
@end menu

@node Coding System Types, ISO 2022, , Coding Systems
@subsection Coding System Types

The coding system type determines the basic algorithm XEmacs will use to
decode or encode a data stream. Character encodings will be converted
to the MULE encoding, escape sequences processed, and newline sequences
converted to XEmacs's internal representation. There are three basic
classes of coding system type: no-conversion, ISO-2022, and special.

No conversion allows you to look at the file's internal representation.
Since XEmacs is basically a text editor, "no conversion" does convert
newline conventions by default. (Use the @code{binary} coding system if
this is not desired.)

ISO 2022 (@pxref{ISO 2022}) is the basic international standard regulating
use of "coded character sets for the exchange of data", i.e., text
streams. ISO 2022 contains functions that make it possible to encode
text streams to comply with restrictions of the Internet mail system and
de facto restrictions of most file systems (e.g., use of the separator
character in file names). Coding systems which are not ISO 2022
conformant can be difficult to handle. Perhaps more important, they are
not adaptable to multilingual information interchange, with the obvious
exception of ISO 10646 (Unicode). (Unicode is partially supported by
XEmacs with the addition of the Lisp package ucs-conv.)

The special class of coding systems includes automatic detection, CCL (a
"little language" embedded as an interpreter, useful for translating
between variants of a single character set), non-ISO-2022-conformant
encodings like Unicode, Shift JIS, and Big5, and MULE internal coding.
(NB: this list is based on XEmacs 21.2. Terminology may vary slightly
for other versions of XEmacs and for GNU Emacs 20.)

@table @code
@item no-conversion
No conversion, for binary files, and a few special cases of non-ISO-2022
coding systems where conversion is done by hook functions (usually
implemented in CCL). On output, graphic characters that are not in
ASCII or Latin-1 will be replaced by a @samp{?}. (For a
no-conversion-encoded buffer, these characters will only be present if
you explicitly insert them.)
@item iso2022
Any ISO-2022-compliant encoding. Among others, this includes JIS (the
Japanese encoding commonly used for e-mail), national variants of EUC
(the standard Unix encoding for Japanese and other languages), and
Compound Text (an encoding used in X11). You can specify more specific
information about the conversion with the @var{flags} argument.
@item ucs-4
ISO 10646 UCS-4 encoding. A 31-bit fixed-width superset of Unicode.
@item utf-8
ISO 10646 UTF-8 encoding. A ``file system safe'' transformation format
that can be used with both UCS-4 and Unicode.
@item undecided
Automatic conversion. XEmacs attempts to detect the coding system used
in the file.
@item shift-jis
Shift-JIS (a Japanese encoding commonly used in PC operating systems).
@item big5
Big5 (the encoding commonly used for Taiwanese).
@item ccl
The conversion is performed using a user-written pseudo-code program.
CCL (Code Conversion Language) is the name of this pseudo-code. For
example, CCL is used to map KOI8-R characters (an encoding for Russian
Cyrillic) to ISO8859-5 (the form used internally by MULE).
@item internal
Write out or read in the raw contents of the memory representing the
buffer's text. This is primarily useful for debugging purposes, and is
only enabled when XEmacs has been compiled with @code{DEBUG_XEMACS} set
(the @samp{--debug} configure option). @strong{Warning}: Reading in a
file using @code{internal} conversion can result in an internal
inconsistency in the memory representing a buffer's text, which will
produce unpredictable results and may cause XEmacs to crash. Under
normal circumstances you should never use @code{internal} conversion.
@end table

@node ISO 2022, EOL Conversion, Coding System Types, Coding Systems
@subsection ISO 2022

This section briefly describes the ISO 2022 encoding standard. A more
thorough treatment is available in the original document of ISO
2022 as well as various national standards (such as JIS X 0202).

Character sets (@dfn{charsets}) are classified into the following four
categories, according to the number of characters in the charset:
94-charset, 96-charset, 94x94-charset, and 96x96-charset. This means
that although an ISO 2022 coding system may have variable width
characters, each charset used is fixed-width (in contrast to the MULE
character set and UTF-8, for example).

ISO 2022 provides for switching between character sets via escape
sequences. This switching is somewhat complicated, because ISO 2022
provides for both legacy applications like Internet mail that accept
only 7 significant bits in some contexts (RFC 822 headers, for example),
and more modern "8-bit clean" applications. It also provides for
compact and transparent representation of languages like Japanese which
mix ASCII and a national script (even outside of computer programs).

First, ISO 2022 codified prevailing practice by dividing the code space
into "control" and "graphic" regions. The code points 0x00-0x1F and
0x80-0x9F are reserved for "control characters", while "graphic
characters" must be assigned to code points in the regions 0x20-0x7F and
0xA0-0xFF. The positions 0x20 and 0x7F are special, and under some
circumstances must be assigned the graphic character "ASCII SPACE" and
the control character "ASCII DEL" respectively.

The various regions are given the name C0 (0x00-0x1F), GL (0x20-0x7F),
C1 (0x80-0x9F), and GR (0xA0-0xFF). GL and GR stand for "graphic left"
and "graphic right", respectively, because of the standard method of
displaying graphic character sets in tables with the high byte indexing
columns and the low byte indexing rows. I don't find it very intuitive,
but these are called "registers".

An ISO 2022-conformant encoding for a graphic character set must use a
fixed number of bytes per character, and the values must fit into a
single register; that is, each byte must range over either 0x20-0x7F, or
0xA0-0xFF. It is not allowed to extend the range of the repertoire of a
character set by using both ranges at the same time. This is why a
standard character set such as ISO 8859-1 is actually considered by ISO
2022 to be an aggregation of two character sets, ASCII and LATIN-1, and
why it is technically incorrect to refer to ISO 8859-1 as "Latin 1".
Also, a single character's bytes must all be drawn from the same
register; this is why Shift JIS (for Japanese) and Big 5 (for Chinese)
are not ISO 2022-compatible encodings.

The reason for this restriction becomes clear when you attempt to define
an efficient, robust encoding for a language like Japanese. Like ISO
8859, Japanese encodings are aggregations of several character sets. In
practice, the vast majority of characters are drawn from the "JIS Roman"
character set (a derivative of ASCII; it won't hurt to think of it as
ASCII) and the JIS X 0208 standard "basic Japanese" character set
including not only ideographic characters ("kanji") but syllabic
Japanese characters ("kana"), a wide variety of symbols, and many
alphabetic characters (Roman, Greek, and Cyrillic) as well. Although
JIS X 0208 includes the whole Roman alphabet, as a 2-byte code it is not
suited to programming; thus the inclusion of ASCII in the standard
Japanese encodings.

For normal Japanese text such as in newspapers, a broad repertoire of
approximately 3000 characters is used. Evidently this won't fit into
one byte; two must be used. But much of the text processed by Japanese
computers is computer source code, nearly all of which is ASCII. A not
insignificant portion of ordinary text is English (as such or as
borrowed Japanese vocabulary) or other languages which can be
represented at least approximately in ASCII, as well. It seems
reasonable then to represent ASCII in one byte, and JIS X 0208 in two.
And this is exactly what the Extended Unix Code for Japanese (EUC-JP)
does. ASCII is invoked to the GL register, and JIS X 0208 is invoked to
the GR register. Thus, each byte can be tested for its character set by
looking at the high bit; if set, it is Japanese, if clear, it is ASCII.
Furthermore, since control characters like newline can never be part of
a graphic character, even in the case of corruption in transmission the
stream will be resynchronized at every line break, on the order of 60-80
bytes. This coding system requires no escape sequences or special
control codes to represent 99.9% of all Japanese text.

Note carefully the distinction between the character sets (ASCII and JIS
X 0208), the encoding (EUC-JP), and the coding system (ISO 2022). The
JIS X 0208 character set is used in three different encodings for
Japanese, but in ISO-2022-JP it is invoked into GL (so the high bit is
always clear), in EUC-JP it is invoked into GR (setting the high bit in
the process), and in Shift JIS the high bit may be set or reset, and the
significant bits are shifted within the 16-bit character so that the two
main character sets can coexist with a third (the "halfwidth katakana"
of JIS X 0201). As the name implies, the ISO-2022-JP encoding is also a
version of the ISO-2022 coding system.

In order to systematically treat subsidiary character sets (like the
"halfwidth katakana" already mentioned, and the "supplementary kanji" of
JIS X 0212), four further registers are defined: G0, G1, G2, and G3.
Unlike GL and GR, they are not logically distinguished by internal
format. Instead, the process of "invocation" mentioned earlier is
broken into two steps: first, a character set is @dfn{designated} to one
of the registers G0-G3 by use of an @dfn{escape sequence} of the form:

@example
ESC [@var{I}] @var{I} @var{F}
@end example

where @var{I} is an intermediate character or characters in the range
0x20 - 0x3F, and @var{F}, from the range 0x30-0x7F, is the final
character identifying this charset. (Final characters in the range
0x30-0x3F are reserved for private use and will never have a publicly
registered meaning.)

Then that register is @dfn{invoked} to either GL or GR, either
automatically (designations to G0 normally involve invocation to GL as
well), or by use of shifting (affecting only the following character in
the data stream) or locking (effective until the next designation or
locking) control sequences. An encoding conformant to ISO 2022 is
typically defined by designating the initial contents of the G0-G3
registers, specifying a 7 or 8 bit environment, and specifying whether
further designations will be recognized.

Some examples of character sets and the registered final characters
@var{F} used to designate them:

@need 1000
@table @asis
@item 94-charset
ASCII (B), left (J) and right (I) half of JIS X 0201, ...
@item 96-charset
Latin-1 (A), Latin-2 (B), Latin-3 (C), ...
@item 94x94-charset
GB2312 (A), JIS X 0208 (B), KSC5601 (C), ...
@item 96x96-charset
none for the moment
@end table

The meanings of the various characters in these sequences, where not
specified by the ISO 2022 standard (such as the ESC character), are
assigned by @dfn{ECMA}, the European Computer Manufacturers Association.

The meanings of the intermediate characters are:

@example
@group
$ [0x24]: indicate charset of dimension 2 (94x94 or 96x96).
( [0x28]: designate to G0 a 94-charset whose final byte is @var{F}.
) [0x29]: designate to G1 a 94-charset whose final byte is @var{F}.
* [0x2A]: designate to G2 a 94-charset whose final byte is @var{F}.
+ [0x2B]: designate to G3 a 94-charset whose final byte is @var{F}.
, [0x2C]: designate to G0 a 96-charset whose final byte is @var{F}.
- [0x2D]: designate to G1 a 96-charset whose final byte is @var{F}.
. [0x2E]: designate to G2 a 96-charset whose final byte is @var{F}.
/ [0x2F]: designate to G3 a 96-charset whose final byte is @var{F}.
@end group
@end example

The comma may be used in files read and written only by MULE, as a MULE
extension, but this is illegal in ISO 2022. (The reason is that in ISO
2022 G0 must be a 94-member character set, with 0x20 assigned the value
SPACE, and 0x7F assigned the value DEL.)

Here are examples of designations:

@example
@group
ESC ( B              : designate to G0 ASCII
ESC - A              : designate to G1 Latin-1
ESC $ ( A or ESC $ A : designate to G0 GB2312
ESC $ ( B or ESC $ B : designate to G0 JISX0208
ESC $ ) C            : designate to G1 KSC5601
@end group
@end example

(The short forms used to designate GB2312 and JIS X 0208 are for
backwards compatibility; the long forms are preferred.)
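
The designation rules can be captured in a few lines of illustrative
Emacs Lisp. This is not part of MULE; it simply rebuilds the long-form
escape sequences from the tables above:

@example
;; Illustrative only: build an ISO 2022 designation escape sequence.
;; REGISTER is 0 through 3 (for G0 through G3), DIMENSION is 1 or 2,
;; CHARS is 94 or 96, and FINAL is the final character, e.g. ?B.
;; (Remember that the 96-charset-to-G0 form, using the comma, is a
;; MULE extension and is illegal in ISO 2022 itself.)
(defun sketch-iso2022-designation (register dimension chars final)
  (format "\e%s%c%c"
          (if (= dimension 2) "$" "")
          (+ (if (= chars 94) #x28 #x2C) register)
          final))

(sketch-iso2022-designation 0 1 94 ?B)  ; => "\e(B"   ASCII to G0
(sketch-iso2022-designation 1 1 96 ?A)  ; => "\e-A"   Latin-1 to G1
(sketch-iso2022-designation 0 2 94 ?B)  ; => "\e$(B"  JIS X 0208 to G0
@end example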

To use a charset designated to G2 or G3, and to use a charset designated
to G1 in a 7-bit environment, you must explicitly invoke G1, G2, or G3
into GL. There are two types of invocation, Locking Shift (forever) and
Single Shift (one character only).

Locking Shift is done as follows:

@example
LS0 or SI (0x0F): invoke G0 into GL
LS1 or SO (0x0E): invoke G1 into GL
LS2:  invoke G2 into GL
LS3:  invoke G3 into GL
LS1R: invoke G1 into GR
LS2R: invoke G2 into GR
LS3R: invoke G3 into GR
@end example

Single Shift is done as follows:

@example
@group
SS2 or ESC N: invoke G2 into GL
SS3 or ESC O: invoke G3 into GL
@end group
@end example

The shift functions (such as LS1R and SS3) are represented by control
characters (from C1) in 8 bit environments and by escape sequences in 7
bit environments.

(#### Ben says: I think the above is slightly incorrect. It appears that
SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N and
ESC O behave as indicated. The above definitions will not parse
EUC-encoded text correctly, and it looks like the code in mule-coding.c
has similar problems.)

Evidently there are a lot of ISO-2022-compliant ways of encoding
multilingual text. Now, in the world, there exist many coding systems
such as X11's Compound Text, Japanese JUNET code, and so-called EUC
(Extended UNIX Code); all of these are variants of ISO 2022.

In MULE, we characterize a version of ISO 2022 by the following
attributes:

@enumerate
@item
The character sets initially designated to G0 thru G3.
@item
Whether short form designations are allowed for Japanese and Chinese.
@item
Whether ASCII should be designated to G0 before control characters.
@item
Whether ASCII should be designated to G0 at the end of line.
@item
7-bit environment or 8-bit environment.
@item
Whether Locking Shifts are used or not.
@item
Whether to use ASCII or the variant JIS X 0201-1976-Roman.
@item
Whether to use JIS X 0208-1983 or the older version JIS X 0208-1976.
@end enumerate

(The last two are only for Japanese.)

By specifying these attributes, you can create any variant
of ISO 2022.

Here are several examples:

@example
@group
ISO-2022-JP -- Coding system used in Japanese email (RFC 1468).
1. G0 <- ASCII, G1..3 <- never used
2. Yes.
3. Yes.
4. Yes.
5. 7-bit environment
6. No.
7. Use ASCII
8. Use JIS X 0208-1983
@end group

@group
ctext -- X11 Compound Text
1. G0 <- ASCII, G1 <- Latin-1, G2,3 <- never used.
2. No.
3. No.
4. Yes.
5. 8-bit environment.
6. No.
7. Use ASCII.
8. Use JIS X 0208-1983.
@end group

@group
euc-china -- Chinese EUC. Often called the "GB encoding", but that is
technically incorrect.
1. G0 <- ASCII, G1 <- GB 2312, G2,3 <- never used.
2. No.
3. Yes.
4. Yes.
5. 8-bit environment.
6. No.
7. Use ASCII.
8. Use JIS X 0208-1983.
@end group

@group
ISO-2022-KR -- Coding system used in Korean email.
1. G0 <- ASCII, G1 <- KSC 5601, G2,3 <- never used.
2. No.
3. Yes.
4. Yes.
5. 7-bit environment.
6. Yes.
7. Use ASCII.
8. Use JIS X 0208-1983.
@end group
@end example

MULE creates all of these coding systems by default.
428
+ − 1040
442
+ − 1041 @node EOL Conversion, Coding System Properties, ISO 2022, Coding Systems
428
+ − 1042 @subsection EOL Conversion
+ − 1043
+ − 1044 @table @code
+ − 1045 @item nil
+ − 1046 Automatically detect the end-of-line type (LF, CRLF, or CR). Also
+ − 1047 generate subsidiary coding systems named @code{@var{name}-unix},
+ − 1048 @code{@var{name}-dos}, and @code{@var{name}-mac}, that are identical to
+ − 1049 this coding system but have an EOL-TYPE value of @code{lf}, @code{crlf},
+ − 1050 and @code{cr}, respectively.
+ − 1051 @item lf
+ − 1052 The end of a line is marked externally using ASCII LF. Since this is
+ − 1053 also the way that XEmacs represents an end-of-line internally,
+ − 1054 specifying this option results in no end-of-line conversion. This is
+ − 1055 the standard format for Unix text files.
+ − 1056 @item crlf
+ − 1057 The end of a line is marked externally using ASCII CRLF. This is the
+ − 1058 standard format for MS-DOS text files.
+ − 1059 @item cr
+ − 1060 The end of a line is marked externally using ASCII CR. This is the
+ − 1061 standard format for Macintosh text files.
+ − 1062 @item t
+ − 1063 Automatically detect the end-of-line type but do not generate subsidiary
+ − 1064 coding systems. (This value is converted to @code{nil} when stored
+ − 1065 internally, and @code{coding-system-property} will return @code{nil}.)
+ − 1066 @end table
+ − 1067
442
+ − 1068 @node Coding System Properties, Basic Coding System Functions, EOL Conversion, Coding Systems
428
+ − 1069 @subsection Coding System Properties
+ − 1070
+ − 1071 @table @code
+ − 1072 @item mnemonic
+ − 1073 String to be displayed in the modeline when this coding system is
+ − 1074 active.
+ − 1075
+ − 1076 @item eol-type
+ − 1077 End-of-line conversion to be used. It should be one of the types
+ − 1078 listed in @ref{EOL Conversion}.
+ − 1079
442
+ − 1080 @item eol-lf
444
+ − 1081 The coding system which is the same as this one, except that it uses the
442
+ − 1082 Unix line-breaking convention.
+ − 1083
+ − 1084 @item eol-crlf
444
+ − 1085 The coding system which is the same as this one, except that it uses the
442
+ − 1086 DOS line-breaking convention.
+ − 1087
+ − 1088 @item eol-cr
444
+ − 1089 The coding system which is the same as this one, except that it uses the
442
+ − 1090 Macintosh line-breaking convention.
+ − 1091
428
+ − 1092 @item post-read-conversion
+ − 1093 Function called after a file has been read in, to perform the decoding.
444
+ − 1094 Called with two arguments, @var{start} and @var{end}, denoting a region of
428
+ − 1095 the current buffer to be decoded.
+ − 1096
+ − 1097 @item pre-write-conversion
+ − 1098 Function called before a file is written out, to perform the encoding.
444
+ − 1099 Called with two arguments, @var{start} and @var{end}, denoting a region of
428
+ − 1100 the current buffer to be encoded.
+ − 1101 @end table
+ − 1102
442
+ − 1103 The following additional properties are recognized if @var{type} is
428
+ − 1104 @code{iso2022}:
+ − 1105
+ − 1106 @table @code
+ − 1107 @item charset-g0
+ − 1108 @itemx charset-g1
+ − 1109 @itemx charset-g2
+ − 1110 @itemx charset-g3
+ − 1111 The character set initially designated to the G0 - G3 registers.
+ − 1112 The value should be one of
+ − 1113
+ − 1114 @itemize @bullet
+ − 1115 @item
+ − 1116 A charset object (designate that character set)
+ − 1117 @item
+ − 1118 @code{nil} (do not ever use this register)
+ − 1119 @item
+ − 1120 @code{t} (no character set is initially designated to the register, but
+ − 1121 may be later on; this automatically sets the corresponding
+ − 1122 @code{force-g*-on-output} property)
+ − 1123 @end itemize
+ − 1124
+ − 1125 @item force-g0-on-output
+ − 1126 @itemx force-g1-on-output
+ − 1127 @itemx force-g2-on-output
+ − 1128 @itemx force-g3-on-output
+ − 1129 If non-@code{nil}, send an explicit designation sequence on output
+ − 1130 before using the specified register.
+ − 1131
+ − 1132 @item short
+ − 1133 If non-@code{nil}, use the short forms @samp{ESC $ @@}, @samp{ESC $ A},
+ − 1134 and @samp{ESC $ B} on output in place of the full designation sequences
+ − 1135 @samp{ESC $ ( @@}, @samp{ESC $ ( A}, and @samp{ESC $ ( B}.
+ − 1136
+ − 1137 @item no-ascii-eol
+ − 1138 If non-@code{nil}, don't designate ASCII to G0 at each end of line on
+ − 1139 output. Setting this to non-@code{nil} also suppresses other
+ − 1140 state-resetting that normally happens at the end of a line.
+ − 1141
+ − 1142 @item no-ascii-cntl
+ − 1143 If non-@code{nil}, don't designate ASCII to G0 before control chars on
+ − 1144 output.
+ − 1145
+ − 1146 @item seven
+ − 1147 If non-@code{nil}, use 7-bit environment on output. Otherwise, use 8-bit
+ − 1148 environment.
+ − 1149
+ − 1150 @item lock-shift
+ − 1151 If non-@code{nil}, use locking-shift (SO/SI) instead of single-shift or
+ − 1152 designation by escape sequence.
+ − 1153
+ − 1154 @item no-iso6429
+ − 1155 If non-@code{nil}, don't use ISO6429's direction specification.
+ − 1156
+ − 1157 @item escape-quoted
444
+ − 1158 If non-@code{nil}, literal control characters that are the same as the
428
+ − 1159 beginning of a recognized ISO 2022 or ISO 6429 escape sequence (in
+ − 1160 particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E), SS3 (0x8F),
+ − 1161 and CSI (0x9B)) are ``quoted'' with an escape character so that they can
+ − 1162 be properly distinguished from an escape sequence. (Note that doing
+ − 1163 this results in a non-portable encoding.) This encoding flag is used for
+ − 1164 byte-compiled files. Note that ESC is a good choice for a quoting
+ − 1165 character because there are no escape sequences whose second byte is a
+ − 1166 character from the Control-0 or Control-1 character sets; this is
+ − 1167 explicitly disallowed by the ISO 2022 standard.
+ − 1168
+ − 1169 @item input-charset-conversion
+ − 1170 A list of conversion specifications, specifying conversion of characters
+ − 1171 in one charset to another when decoding is performed. Each
+ − 1172 specification is a list of two elements: the source charset, and the
+ − 1173 destination charset.
+ − 1174
+ − 1175 @item output-charset-conversion
+ − 1176 A list of conversion specifications, specifying conversion of characters
+ − 1177 in one charset to another when encoding is performed. The form of each
+ − 1178 specification is the same as for @code{input-charset-conversion}.
+ − 1179 @end table
+ − 1180
442
+ − 1181 The following additional properties are recognized (and required) if
428
+ − 1182 @var{type} is @code{ccl}:
+ − 1183
+ − 1184 @table @code
+ − 1185 @item decode
+ − 1186 CCL program used for decoding (converting to internal format).
+ − 1187
+ − 1188 @item encode
+ − 1189 CCL program used for encoding (converting to external format).
+ − 1190 @end table
+ − 1191
442
+ − 1192 The following properties are used internally: @code{eol-cr},
+ − 1193 @code{eol-crlf}, @code{eol-lf}, and @code{base}.
+ − 1194
+ − 1195 @node Basic Coding System Functions, Coding System Property Functions, Coding System Properties, Coding Systems
428
+ − 1196 @subsection Basic Coding System Functions
+ − 1197
+ − 1198 @defun find-coding-system coding-system-or-name
+ − 1199 This function retrieves the coding system of the given name.
+ − 1200
442
+ − 1201 If @var{coding-system-or-name} is a coding-system object, it is simply
428
+ − 1202 returned. Otherwise, @var{coding-system-or-name} should be a symbol.
+ − 1203 If there is no such coding system, @code{nil} is returned. Otherwise
+ − 1204 the associated coding system object is returned.
+ − 1205 @end defun
+ − 1206
+ − 1207 @defun get-coding-system name
+ − 1208 This function retrieves the coding system of the given name. Same as
+ − 1209 @code{find-coding-system} except an error is signalled if there is no
+ − 1210 such coding system instead of returning @code{nil}.
+ − 1211 @end defun
+ − 1212
+ − 1213 @defun coding-system-list
+ − 1214 This function returns a list of the names of all defined coding systems.
+ − 1215 @end defun
+ − 1216
+ − 1217 @defun coding-system-name coding-system
+ − 1218 This function returns the name of the given coding system.
+ − 1219 @end defun
+ − 1220
442
+ − 1221 @defun coding-system-base coding-system
+ − 1222 This function returns the base coding system corresponding to
+ − 1223 @var{coding-system}, i.e. the version with an undecided EOL convention.
+ − 1224 @end defun
+ − 1225
428
+ − 1226 @defun make-coding-system name type &optional doc-string props
+ − 1227 This function registers symbol @var{name} as a coding system.
+ − 1228
+ − 1229 @var{type} describes the conversion method used and should be one of
+ − 1230 the types listed in @ref{Coding System Types}.
+ − 1231
+ − 1232 @var{doc-string} is a string describing the coding system.
+ − 1233
+ − 1234 @var{props} is a property list, describing the specific nature of the
+ − 1235 character set. Recognized properties are as in @ref{Coding System
+ − 1236 Properties}.
+ − 1237 @end defun
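
For example, a new ISO 2022 coding system might be defined as follows.
This is only a sketch; the name @code{my-iso-2022-8}, its docstring, and
its mnemonic are invented for illustration.

@example
(make-coding-system
 'my-iso-2022-8 'iso2022
 "8-bit ISO 2022, ASCII in G0 and Latin-1 in G1 (illustrative example)."
 '(charset-g0 ascii
   charset-g1 latin-iso8859-1
   mnemonic   "MyISO8"
   eol-type   lf))
@end example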
+ − 1238
+ − 1239 @defun copy-coding-system old-coding-system new-name
+ − 1240 This function copies @var{old-coding-system} to @var{new-name}. If
+ − 1241 @var{new-name} does not name an existing coding system, a new one will
+ − 1242 be created.
+ − 1243 @end defun
+ − 1244
+ − 1245 @defun subsidiary-coding-system coding-system eol-type
+ − 1246 This function returns the subsidiary coding system of
+ − 1247 @var{coding-system} with eol type @var{eol-type}.
+ − 1248 @end defun
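
As a sketch of how the EOL subsidiaries relate to their base coding
system (the return values shown are illustrative and follow the naming
convention described in @ref{EOL Conversion}):

@example
(coding-system-name
 (subsidiary-coding-system (get-coding-system 'iso-2022-8) 'crlf))
     @result{} iso-2022-8-dos

(coding-system-name
 (coding-system-base (get-coding-system 'iso-2022-8-dos)))
     @result{} iso-2022-8
@end example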
+ − 1249
442
+ − 1250 @node Coding System Property Functions, Encoding and Decoding Text, Basic Coding System Functions, Coding Systems
428
+ − 1251 @subsection Coding System Property Functions
+ − 1252
+ − 1253 @defun coding-system-doc-string coding-system
+ − 1254 This function returns the doc string for @var{coding-system}.
+ − 1255 @end defun
+ − 1256
+ − 1257 @defun coding-system-type coding-system
+ − 1258 This function returns the type of @var{coding-system}.
+ − 1259 @end defun
+ − 1260
+ − 1261 @defun coding-system-property coding-system prop
+ − 1262 This function returns the @var{prop} property of @var{coding-system}.
+ − 1263 @end defun
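
For example (the return values shown are illustrative; the exact
objects printed may differ between XEmacs versions):

@example
(coding-system-type (find-coding-system 'euc-jp))
     @result{} iso2022

(coding-system-property (get-coding-system 'euc-jp) 'mnemonic)
     @result{} "Ja/EUC"
@end example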
+ − 1264
442
+ − 1265 @node Encoding and Decoding Text, Detection of Textual Encoding, Coding System Property Functions, Coding Systems
428
+ − 1266 @subsection Encoding and Decoding Text
+ − 1267
+ − 1268 @defun decode-coding-region start end coding-system &optional buffer
+ − 1269 This function decodes the text between @var{start} and @var{end} which
+ − 1270 is encoded in @var{coding-system}. This is useful if you've read in
+ − 1271 encoded text from a file without decoding it (e.g. you read in a
+ − 1272 JIS-formatted file but used the @code{binary} or @code{no-conversion} coding
+ − 1273 system, so that it shows up as @samp{^[$B!<!+^[(B}). The length of the
+ − 1274 encoded text is returned. @var{buffer} defaults to the current buffer
+ − 1275 if unspecified.
+ − 1276 @end defun
+ − 1277
+ − 1278 @defun encode-coding-region start end coding-system &optional buffer
+ − 1279 This function encodes the text between @var{start} and @var{end} using
+ − 1280 @var{coding-system}. This will, for example, convert Japanese
+ − 1281 characters into stuff such as @samp{^[$B!<!+^[(B} if you use the JIS
+ − 1282 encoding. The length of the encoded text is returned. @var{buffer}
+ − 1283 defaults to the current buffer if unspecified.
+ − 1284 @end defun
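
A typical use of @code{decode-coding-region} is decoding text that was
deliberately read in without conversion.  The following sketch assumes
a hypothetical EUC-JP-encoded file named @file{japanese.euc}:

@example
(with-temp-buffer
  ;; Read the raw bytes, bypassing any automatic conversion.
  (let ((coding-system-for-read 'binary))
    (insert-file-contents "japanese.euc"))
  ;; Now decode the whole buffer as Japanese EUC.
  (decode-coding-region (point-min) (point-max) 'euc-jp)
  (buffer-string))
@end example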
+ − 1285
442
+ − 1286 @node Detection of Textual Encoding, Big5 and Shift-JIS Functions, Encoding and Decoding Text, Coding Systems
428
+ − 1287 @subsection Detection of Textual Encoding
+ − 1288
+ − 1289 @defun coding-category-list
+ − 1290 This function returns a list of all recognized coding categories.
+ − 1291 @end defun
+ − 1292
+ − 1293 @defun set-coding-priority-list list
+ − 1294 This function changes the priority order of the coding categories.
+ − 1295 @var{list} should be a list of coding categories, in descending order of
+ − 1296 priority. Unspecified coding categories will be lower in priority than
+ − 1297 all specified ones, in the same relative order they were in previously.
+ − 1298 @end defun
+ − 1299
+ − 1300 @defun coding-priority-list
+ − 1301 This function returns a list of coding categories in descending order of
+ − 1302 priority.
+ − 1303 @end defun
+ − 1304
+ − 1305 @defun set-coding-category-system coding-category coding-system
+ − 1306 This function changes the coding system associated with a coding category.
+ − 1307 @end defun
+ − 1308
+ − 1309 @defun coding-category-system coding-category
+ − 1310 This function returns the coding system associated with a coding category.
+ − 1311 @end defun
+ − 1312
+ − 1313 @defun detect-coding-region start end &optional buffer
+ − 1314 This function detects coding system of the text in the region between
+ − 1315 @var{start} and @var{end}. Returned value is a list of possible coding
+ − 1316 systems ordered by priority. If only ASCII characters are found, it
+ − 1317 returns @code{autodetect} or one of its subsidiary coding systems
+ − 1318 according to a detected end-of-line type. Optional arg @var{buffer}
+ − 1319 defaults to the current buffer.
+ − 1320 @end defun
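
For example, to inspect the defined categories and then favor Big5
detection (the category name @code{big5} is an assumption here; use
@code{coding-category-list} to see the exact names defined in your
XEmacs):

@example
(coding-category-list)
     @result{} (iso-7 iso-8-1 iso-8-2 big5 shift-jis @dots{})  ; illustrative

;; Put Big5 first; unlisted categories keep their previous relative order.
(set-coding-priority-list '(big5))
(car (coding-priority-list))
     @result{} big5
@end example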
+ − 1321
442
+ − 1322 @node Big5 and Shift-JIS Functions, Predefined Coding Systems, Detection of Textual Encoding, Coding Systems
428
+ − 1323 @subsection Big5 and Shift-JIS Functions
+ − 1324
442
+ − 1325 These are special functions for working with the non-standard
428
+ − 1326 Shift-JIS and Big5 encodings.
+ − 1327
+ − 1328 @defun decode-shift-jis-char code
442
+ − 1329 This function decodes a JIS X 0208 character of Shift-JIS coding-system.
428
+ − 1330 @var{code} is the character code in Shift-JIS as a cons of type bytes.
+ − 1331 The corresponding character is returned.
+ − 1332 @end defun
+ − 1333
444
+ − 1334 @defun encode-shift-jis-char character
+ − 1335 This function encodes a JIS X 0208 character @var{character} to
+ − 1336 SHIFT-JIS coding-system. The corresponding character code in SHIFT-JIS
+ − 1337 is returned as a cons of two bytes.
428
+ − 1338 @end defun
+ − 1339
+ − 1340 @defun decode-big5-char code
+ − 1341 This function decodes a Big5 character @var{code} of BIG5 coding-system.
+ − 1342 @var{code} is the character code in BIG5. The corresponding character
+ − 1343 is returned.
+ − 1344 @end defun
+ − 1345
444
+ − 1346 @defun encode-big5-char character
+ − 1347 This function encodes the Big5 character @var{character} to BIG5
428
+ − 1348 coding-system. The corresponding character code in Big5 is returned.
+ − 1349 @end defun
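
For instance, a round trip through the Shift-JIS functions looks like
this (the two-byte code @samp{#x88 #xA0} is an arbitrary valid Shift-JIS
code chosen for illustration):

@example
(setq c (decode-shift-jis-char '(#x88 . #xA0)))  ; a JIS X 0208 character
(encode-shift-jis-char c)
     @result{} (136 . 160)   ; that is, (#x88 . #xA0)
@end example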
+ − 1350
442
+ − 1351 @node Predefined Coding Systems, , Big5 and Shift-JIS Functions, Coding Systems
+ − 1352 @subsection Coding Systems Implemented
+ − 1353
+ − 1354 MULE initializes most of the commonly used coding systems at XEmacs's
+ − 1355 startup. A few others are initialized only when the relevant language
+ − 1356 environment is selected and support libraries are loaded. (NB: The
444
+ − 1357 following list is based on XEmacs 21.2.19, the development branch at the
442
+ − 1358 time of writing. The list may be somewhat different for other
+ − 1359 versions. Recent versions of GNU Emacs 20 implement a few more rare
+ − 1360 coding systems; work is being done to port these to XEmacs.)
+ − 1361
444
+ − 1362 Unfortunately, there is not a consistent naming convention for character
+ − 1363 sets, and for practical purposes coding systems often take their name
442
+ − 1364 from their principal character sets (ASCII, KOI8-R, Shift JIS). Others
444
+ − 1365 take their names from the coding system (ISO-2022-JP, EUC-KR), and a few
+ − 1366 from their non-text usages (internal, binary). To provide for this, and
442
+ − 1367 for the fact that many coding systems have several common names, an
+ − 1368 aliasing system is provided. Finally, some effort has been made to use
+ − 1369 names that are registered as MIME charsets (this is why the name
+ − 1370 'shift_jis contains that un-Lisp-y underscore).
+ − 1371
+ − 1372 There is a systematic naming convention regarding end-of-line (EOL)
+ − 1373 conventions for different systems. A coding system whose name ends in
+ − 1374 "-unix" forces the assumptions that lines are broken by newlines (0x0A).
+ − 1375 A coding system whose name ends in "-mac" forces the assumptions that
+ − 1376 lines are broken by ASCII CRs (0x0D). A coding system whose name ends
+ − 1377 in "-dos" forces the assumptions that lines are broken by CRLF sequences
+ − 1378 (0x0D 0x0A). These subsidiary coding systems are automatically derived
+ − 1379 from a base coding system. Use of the base coding system implies
+ − 1380 autodetection of the text file convention. (The fact that the -unix,
+ − 1381 -mac, and -dos are derived from a base system results in them showing up
+ − 1382 as "aliases" in `list-coding-systems'.) These subsidiaries have a
+ − 1383 consistent modeline indicator as well. "-dos" coding systems have ":T"
+ − 1384 appended to their modeline indicator, while "-mac" coding systems have
+ − 1385 ":t" appended (eg, "ISO8:t" for iso-2022-8-mac).
+ − 1386
+ − 1387 In the following table, each coding system is given with its mode line
+ − 1388 indicator in parentheses. Non-textual coding systems are listed first,
+ − 1389 followed by textual coding systems and their aliases. (The coding system
+ − 1390 subsidiary modeline indicators ":T" and ":t" will be omitted from the
+ − 1391 table of coding systems.)
+ − 1392
+ − 1393 ### SJT 1999-08-23 Maybe should order these by language? Definitely
+ − 1394 need language usage for the ISO-8859 family.
+ − 1395
+ − 1396 Note that although true coding system aliases have been implemented for
444
+ − 1397 XEmacs 21.2, the coding system initialization has not yet been converted
442
+ − 1398 as of 21.2.19. So coding systems described as aliases have the same
+ − 1399 properties as the aliased coding system, but will not be equal as Lisp
+ − 1400 objects.
+ − 1401
+ − 1402 @table @code
+ − 1403
+ − 1404 @item automatic-conversion
+ − 1405 @itemx undecided
+ − 1406 @itemx undecided-dos
+ − 1407 @itemx undecided-mac
+ − 1408 @itemx undecided-unix
+ − 1409
+ − 1410 Modeline indicator: @code{Auto}. A type @code{undecided} coding system.
+ − 1411 Attempts to determine an appropriate coding system from file contents or
+ − 1412 the environment.
+ − 1413
+ − 1414 @item raw-text
+ − 1415 @itemx no-conversion
+ − 1416 @itemx raw-text-dos
+ − 1417 @itemx raw-text-mac
+ − 1418 @itemx raw-text-unix
+ − 1419 @itemx no-conversion-dos
+ − 1420 @itemx no-conversion-mac
+ − 1421 @itemx no-conversion-unix
+ − 1422
+ − 1423 Modeline indicator: @code{Raw}. A type @code{no-conversion} coding system,
+ − 1424 which converts only line-break-codes. An implementation quirk means
+ − 1425 that this coding system is also used for ISO8859-1.
+ − 1426
+ − 1427 @item binary
+ − 1428 Modeline indicator: @code{Binary}. A type @code{no-conversion} coding
+ − 1429 system which does no character coding or EOL conversions. An alias for
+ − 1430 @code{raw-text-unix}.
+ − 1431
+ − 1432 @item alternativnyj
+ − 1433 @itemx alternativnyj-dos
+ − 1434 @itemx alternativnyj-mac
+ − 1435 @itemx alternativnyj-unix
+ − 1436
+ − 1437 Modeline indicator: @code{Cy.Alt}. A type @code{ccl} coding system used for
+ − 1438 Alternativnyj, an encoding of the Cyrillic alphabet.
+ − 1439
+ − 1440 @item big5
+ − 1441 @itemx big5-dos
+ − 1442 @itemx big5-mac
+ − 1443 @itemx big5-unix
+ − 1444
+ − 1445 Modeline indicator: @code{Zh/Big5}. A type @code{big5} coding system used for
+ − 1446 BIG5, the most common encoding of traditional Chinese as used in Taiwan.
+ − 1447
+ − 1448 @item cn-gb-2312
+ − 1449 @itemx cn-gb-2312-dos
+ − 1450 @itemx cn-gb-2312-mac
+ − 1451 @itemx cn-gb-2312-unix
+ − 1452
+ − 1453 Modeline indicator: @code{Zh-GB/EUC}. A type @code{iso2022} coding system used
+ − 1454 for simplified Chinese (as used in the People's Republic of China), with
+ − 1455 the @code{ascii} (G0), @code{chinese-gb2312} (G1), and @code{sisheng}
+ − 1456 (G2) character sets initially designated. Chinese EUC (Extended Unix
+ − 1457 Code).
+ − 1458
+ − 1459 @item ctext-hebrew
+ − 1460 @itemx ctext-hebrew-dos
+ − 1461 @itemx ctext-hebrew-mac
+ − 1462 @itemx ctext-hebrew-unix
+ − 1463
+ − 1464 Modeline indicator: @code{CText/Hbrw}. A type @code{iso2022} coding system
+ − 1465 with the @code{ascii} (G0) and @code{hebrew-iso8859-8} (G1) character
+ − 1466 sets initially designated for Hebrew.
+ − 1467
+ − 1468 @item ctext
+ − 1469 @itemx ctext-dos
+ − 1470 @itemx ctext-mac
+ − 1471 @itemx ctext-unix
+ − 1472
+ − 1473 Modeline indicator: @code{CText}. A type @code{iso2022} 8-bit coding system
+ − 1474 with the @code{ascii} (G0) and @code{latin-iso8859-1} (G1) character
+ − 1475 sets initially designated. X11 Compound Text Encoding. Often
+ − 1476 mistakenly recognized instead of EUC encodings; the usual cause is an
+ − 1477 inappropriate setting of @code{coding-priority-list}.
+ − 1478
+ − 1479 @item escape-quoted
+ − 1480
+ − 1481 Modeline indicator: @code{ESC/Quot}. A type @code{iso2022} 8-bit coding
+ − 1482 system with the @code{ascii} (G0) and @code{latin-iso8859-1} (G1)
+ − 1483 character sets initially designated and escape quoting. Unix EOL
+ − 1484 conversion (i.e., no conversion). It is used for .ELC files.
+ − 1485
+ − 1486 @item euc-jp
+ − 1487 @itemx euc-jp-dos
+ − 1488 @itemx euc-jp-mac
+ − 1489 @itemx euc-jp-unix
+ − 1490
+ − 1491 Modeline indicator: @code{Ja/EUC}. A type @code{iso2022} 8-bit coding system
+ − 1492 with @code{ascii} (G0), @code{japanese-jisx0208} (G1),
+ − 1493 @code{katakana-jisx0201} (G2), and @code{japanese-jisx0212} (G3)
+ − 1494 initially designated. Japanese EUC (Extended Unix Code).
+ − 1495
+ − 1496 @item euc-kr
+ − 1497 @itemx euc-kr-dos
+ − 1498 @itemx euc-kr-mac
+ − 1499 @itemx euc-kr-unix
+ − 1500
+ − 1501 Modeline indicator: @code{ko/EUC}. A type @code{iso2022} 8-bit coding system
+ − 1502 with @code{ascii} (G0) and @code{korean-ksc5601} (G1) initially
+ − 1503 designated. Korean EUC (Extended Unix Code).
+ − 1504
+ − 1505 @item hz-gb-2312
+ − 1506 Modeline indicator: @code{Zh-GB/Hz}. A type @code{no-conversion} coding
+ − 1507 system with Unix EOL convention (i.e., no conversion) using
+ − 1508 post-read-decode and pre-write-encode functions to translate the Hz/ZW
+ − 1509 coding system used for Chinese.
+ − 1510
+ − 1511 @item iso-2022-7bit
+ − 1512 @itemx iso-2022-7bit-unix
+ − 1513 @itemx iso-2022-7bit-dos
+ − 1514 @itemx iso-2022-7bit-mac
+ − 1515 @itemx iso-2022-7
+ − 1516
+ − 1517 Modeline indicator: @code{ISO7}. A type @code{iso2022} 7-bit coding system
+ − 1518 with @code{ascii} (G0) initially designated. Other character sets must
+ − 1519 be explicitly designated to be used.
+ − 1520
+ − 1521 @item iso-2022-7bit-ss2
+ − 1522 @itemx iso-2022-7bit-ss2-dos
+ − 1523 @itemx iso-2022-7bit-ss2-mac
+ − 1524 @itemx iso-2022-7bit-ss2-unix
+ − 1525
+ − 1526 Modeline indicator: @code{ISO7/SS}. A type @code{iso2022} 7-bit coding system
+ − 1527 with @code{ascii} (G0) initially designated. Other character sets must
+ − 1528 be explicitly designated to be used. SS2 is used to invoke a
+ − 1529 96-charset, one character at a time.
+ − 1530
+ − 1531 @item iso-2022-8
+ − 1532 @itemx iso-2022-8-dos
+ − 1533 @itemx iso-2022-8-mac
+ − 1534 @itemx iso-2022-8-unix
+ − 1535
+ − 1536 Modeline indicator: @code{ISO8}. A type @code{iso2022} 8-bit coding system
+ − 1537 with @code{ascii} (G0) and @code{latin-iso8859-1} (G1) initially
+ − 1538 designated. Other character sets must be explicitly designated to be
+ − 1539 used. No single-shift or locking-shift.
+ − 1540
+ − 1541 @item iso-2022-8bit-ss2
+ − 1542 @itemx iso-2022-8bit-ss2-dos
+ − 1543 @itemx iso-2022-8bit-ss2-mac
+ − 1544 @itemx iso-2022-8bit-ss2-unix
+ − 1545
+ − 1546 Modeline indicator: @code{ISO8/SS}. A type @code{iso2022} 8-bit coding system
+ − 1547 with @code{ascii} (G0) and @code{latin-iso8859-1} (G1) initially
+ − 1548 designated. Other character sets must be explicitly designated to be
+ − 1549 used. SS2 is used to invoke a 96-charset, one character at a time.
+ − 1550
+ − 1551 @item iso-2022-int-1
+ − 1552 @itemx iso-2022-int-1-dos
+ − 1553 @itemx iso-2022-int-1-mac
+ − 1554 @itemx iso-2022-int-1-unix
+ − 1555
+ − 1556 Modeline indicator: @code{INT-1}. A type @code{iso2022} 7-bit coding system
+ − 1557 with @code{ascii} (G0) and @code{korean-ksc5601} (G1) initially
+ − 1558 designated. ISO-2022-INT-1.
+ − 1559
+ − 1560 @item iso-2022-jp-1978-irv
+ − 1561 @itemx iso-2022-jp-1978-irv-dos
+ − 1562 @itemx iso-2022-jp-1978-irv-mac
+ − 1563 @itemx iso-2022-jp-1978-irv-unix
+ − 1564
+ − 1565 Modeline indicator: @code{Ja-78/7bit}. A type @code{iso2022} 7-bit coding
+ − 1566 system. For compatibility with old Japanese terminals; if you need to
+ − 1567 know, look at the source.
+ − 1568
+ − 1569 @item iso-2022-jp
+ − 1570 @itemx iso-2022-jp-2 (ISO7/SS)
+ − 1571 @itemx iso-2022-jp-dos
+ − 1572 @itemx iso-2022-jp-mac
+ − 1573 @itemx iso-2022-jp-unix
+ − 1574 @itemx iso-2022-jp-2-dos
+ − 1575 @itemx iso-2022-jp-2-mac
+ − 1576 @itemx iso-2022-jp-2-unix
+ − 1577
+ − 1578 Modeline indicator: @code{MULE/7bit}. A type @code{iso2022} 7-bit coding
+ − 1579 system with @code{ascii} (G0) initially designated, and complex
+ − 1580 specifications to ensure backward compatibility with old Japanese
+ − 1581 systems. Used for communication with mail and news in Japan. The "-2"
+ − 1582 versions also use SS2 to invoke a 96-charset one character at a time.
+ − 1583
+ − 1584 @item iso-2022-kr
+ − 1585 Modeline indicator: @code{Ko/7bit}. A type @code{iso2022} 7-bit coding
+ − 1586 system with @code{ascii} (G0) and @code{korean-ksc5601} (G1) initially
+ − 1587 designated. Used for e-mail in Korea.
+ − 1588
+ − 1589 @item iso-2022-lock
+ − 1590 @itemx iso-2022-lock-dos
+ − 1591 @itemx iso-2022-lock-mac
+ − 1592 @itemx iso-2022-lock-unix
+ − 1593
+ − 1594 Modeline indicator: @code{ISO7/Lock}. A type @code{iso2022} 7-bit coding
+ − 1595 system with @code{ascii} (G0) initially designated, using Locking-Shift
+ − 1596 to invoke a 96-charset.
+ − 1597
+ − 1598 @item iso-8859-1
+ − 1599 @itemx iso-8859-1-dos
+ − 1600 @itemx iso-8859-1-mac
+ − 1601 @itemx iso-8859-1-unix
+ − 1602
+ − 1603 Due to implementation, this is not a type @code{iso2022} coding system,
+ − 1604 but rather an alias for the @code{raw-text} coding system.
+ − 1605
+ − 1606 @item iso-8859-2
+ − 1607 @itemx iso-8859-2-dos
+ − 1608 @itemx iso-8859-2-mac
+ − 1609 @itemx iso-8859-2-unix
+ − 1610
+ − 1611 Modeline indicator: @code{MIME/Ltn-2}. A type @code{iso2022} coding
+ − 1612 system with @code{ascii} (G0) and @code{latin-iso8859-2} (G1) initially
+ − 1613 invoked.
+ − 1614
+ − 1615 @item iso-8859-3
+ − 1616 @itemx iso-8859-3-dos
+ − 1617 @itemx iso-8859-3-mac
+ − 1618 @itemx iso-8859-3-unix
+ − 1619
+ − 1620 Modeline indicator: @code{MIME/Ltn-3}. A type @code{iso2022} coding system
+ − 1621 with @code{ascii} (G0) and @code{latin-iso8859-3} (G1) initially
+ − 1622 invoked.
+ − 1623
+ − 1624 @item iso-8859-4
+ − 1625 @itemx iso-8859-4-dos
+ − 1626 @itemx iso-8859-4-mac
+ − 1627 @itemx iso-8859-4-unix
+ − 1628
+ − 1629 Modeline indicator: @code{MIME/Ltn-4}. A type @code{iso2022} coding system
+ − 1630 with @code{ascii} (G0) and @code{latin-iso8859-4} (G1) initially
+ − 1631 invoked.
+ − 1632
+ − 1633 @item iso-8859-5
+ − 1634 @itemx iso-8859-5-dos
+ − 1635 @itemx iso-8859-5-mac
+ − 1636 @itemx iso-8859-5-unix
+ − 1637
+ − 1638 Modeline indicator: @code{ISO8/Cyr}. A type @code{iso2022} coding system with
+ − 1639 @code{ascii} (G0) and @code{cyrillic-iso8859-5} (G1) initially invoked.
+ − 1640
+ − 1641 @item iso-8859-7
+ − 1642 @itemx iso-8859-7-dos
+ − 1643 @itemx iso-8859-7-mac
+ − 1644 @itemx iso-8859-7-unix
+ − 1645
+ − 1646 Modeline indicator: @code{Grk}. A type @code{iso2022} coding system with
+ − 1647 @code{ascii} (G0) and @code{greek-iso8859-7} (G1) initially invoked.
+ − 1648
+ − 1649 @item iso-8859-8
+ − 1650 @itemx iso-8859-8-dos
+ − 1651 @itemx iso-8859-8-mac
+ − 1652 @itemx iso-8859-8-unix
+ − 1653
+ − 1654 Modeline indicator: @code{MIME/Hbrw}. A type @code{iso2022} coding system with
+ − 1655 @code{ascii} (G0) and @code{hebrew-iso8859-8} (G1) initially invoked.
+ − 1656
+ − 1657 @item iso-8859-9
+ − 1658 @itemx iso-8859-9-dos
+ − 1659 @itemx iso-8859-9-mac
+ − 1660 @itemx iso-8859-9-unix
+ − 1661
+ − 1662 Modeline indicator: @code{MIME/Ltn-5}. A type @code{iso2022} coding system
+ − 1663 with @code{ascii} (G0) and @code{latin-iso8859-9} (G1) initially
+ − 1664 invoked.
+ − 1665
+ − 1666 @item koi8-r
+ − 1667 @itemx koi8-r-dos
+ − 1668 @itemx koi8-r-mac
+ − 1669 @itemx koi8-r-unix
+ − 1670
+ − 1671 Modeline indicator: @code{KOI8}. A type @code{ccl} coding-system used for
+ − 1672 KOI8-R, an encoding of the Cyrillic alphabet.
+ − 1673
+ − 1674 @item shift_jis
+ − 1675 @itemx shift_jis-dos
+ − 1676 @itemx shift_jis-mac
+ − 1677 @itemx shift_jis-unix
+ − 1678
+ − 1679 Modeline indicator: @code{Ja/SJIS}. A type @code{shift-jis} coding-system
+ − 1680 implementing the Shift-JIS encoding for Japanese. The underscore is
+ − 1681 there to conform to the name of the MIME charset for this encoding.
+ − 1682
+ − 1683 @item tis-620
+ − 1684 @itemx tis-620-dos
+ − 1685 @itemx tis-620-mac
+ − 1686 @itemx tis-620-unix
+ − 1687
+ − 1688 Modeline indicator: @code{TIS620}. A type @code{ccl} encoding for Thai. The
+ − 1689 external encoding is defined by TIS620, the internal encoding is
+ − 1690 peculiar to MULE, and called @code{thai-xtis}.
+ − 1691
+ − 1692 @item viqr
+ − 1693
+ − 1694 Modeline indicator: @code{VIQR}. A type @code{no-conversion} coding
+ − 1695 system with Unix EOL convention (i.e., no conversion) using
+ − 1696 post-read-decode and pre-write-encode functions to translate the VIQR
+ − 1697 coding system for Vietnamese.
+ − 1698
+ − 1699 @item viscii
+ − 1700 @itemx viscii-dos
+ − 1701 @itemx viscii-mac
+ − 1702 @itemx viscii-unix
+ − 1703
+ − 1704 Modeline indicator: @code{VISCII}. A type @code{ccl} coding-system used
+ − 1705 for VISCII 1.1 for Vietnamese. Differs slightly from VSCII; VISCII is
+ − 1706 given priority by XEmacs.
+ − 1707
+ − 1708 @item vscii
+ − 1709 @itemx vscii-dos
+ − 1710 @itemx vscii-mac
+ − 1711 @itemx vscii-unix
+ − 1712
+ − 1713 Modeline indicator: @code{VSCII}. A type @code{ccl} coding-system used
+ − 1714 for VSCII 1.1 for Vietnamese. Differs slightly from VISCII, which is
+ − 1715 given priority by XEmacs. Use
+ − 1716 @code{(prefer-coding-system 'vietnamese-vscii)} to give priority to VSCII.
+ − 1717
+ − 1718 @end table
+ − 1719
428
+ − 1720 @node CCL, Category Tables, Coding Systems, MULE
+ − 1721 @section CCL
+ − 1722
442
+ − 1723 CCL (Code Conversion Language) is a simple structured programming
428
+ − 1724 language designed for character coding conversions. A CCL program is
+ − 1725 compiled to CCL code (represented by a vector of integers) and executed
+ − 1726 by the CCL interpreter embedded in Emacs. The CCL interpreter
+ − 1727 implements a virtual machine with 8 registers called @code{r0}, ...,
+ − 1728 @code{r7}, a number of control structures, and some I/O operators. Take
+ − 1729 care when using registers @code{r0} (used in implicit @dfn{set}
+ − 1730 statements) and especially @code{r7} (used internally by several
444
+ − 1731 statements and operations, especially for multiple return values and I/O
428
+ − 1732 operations).
+ − 1733
442
+ − 1734 CCL is used for code conversion during process I/O and file I/O for
428
+ − 1735 non-ISO2022 coding systems. (It is the only way for a user to specify a
+ − 1736 code conversion function.) It is also used for calculating the code
+ − 1737 point of an X11 font from a character code. However, since CCL is
+ − 1738 designed as a powerful programming language, it can be used for more
+ − 1739 generic calculation where efficiency is demanded. A combination of
+ − 1740 three or more arithmetic operations can be calculated faster by CCL than
+ − 1741 by Emacs Lisp.
+ − 1742
442
+ − 1743 @strong{Warning:} The code in @file{src/mule-ccl.c} and
428
+ − 1744 @file{$packages/lisp/mule-base/mule-ccl.el} is the definitive
+ − 1745 description of CCL's semantics. The previous version of this section
+ − 1746 contained several typos and obsolete names left from earlier versions of
+ − 1747 MULE, and many may remain. (I am not an experienced CCL programmer; the
+ − 1748 few who know CCL well find writing English painful.)
+ − 1749
442
+ − 1750 A CCL program transforms an input data stream into an output data
428
+ − 1751 stream. The input stream, held in a buffer of constant bytes, is left
+ − 1752 unchanged. The buffer may be filled by an external input operation,
+ − 1753 taken from an Emacs buffer, or taken from a Lisp string. The output
+ − 1754 buffer is a dynamic array of bytes, which can be written by an external
+ − 1755 output operation, inserted into an Emacs buffer, or returned as a Lisp
+ − 1756 string.
+ − 1757
442
+ − 1758 A CCL program is a (Lisp) list containing two or three members. The
428
+ − 1759 first member is the @dfn{buffer magnification}, which indicates the
+ − 1760 required minimum size of the output buffer as a multiple of the input
+ − 1761 buffer. It is followed by the @dfn{main block} which executes while
+ − 1762 there is input remaining, and an optional @dfn{EOF block} which is
+ − 1763 executed when the input is exhausted. Both the main block and the EOF
+ − 1764 block are CCL blocks.
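
Schematically, a small but complete CCL program looks like the
following sketch; it copies its input to its output unchanged and
appends a @samp{!} once the input runs out:

@example
(2                              ; buffer magnification (chosen generously)
 ((loop                         ; main block: runs while input remains
    (read r0)                   ; read one byte into r0
    (write r0)                  ; write it out unchanged
    (repeat)))                  ; jump back to the head of the loop
 ((write "!")))                 ; EOF block: runs when the input is exhausted
@end example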
+ − 1765
442
+ − 1766 A @dfn{CCL block} is either a CCL statement or list of CCL statements.
444
+ − 1767 A @dfn{CCL statement} is either a @dfn{set statement} (either an integer
428
+ − 1768 or an @dfn{assignment}, which is a list of a register to receive the
444
+ − 1769 assignment, an assignment operator, and an expression) or a @dfn{control
428
+ − 1770 statement} (a list starting with a keyword, whose allowable syntax
+ − 1771 depends on the keyword).
+ − 1772
+ − 1773 @menu
+ − 1774 * CCL Syntax:: CCL program syntax in BNF notation.
+ − 1775 * CCL Statements:: Semantics of CCL statements.
+ − 1776 * CCL Expressions:: Operators and expressions in CCL.
+ − 1777 * Calling CCL:: Running CCL programs.
2640
+ − 1778 * CCL Example:: A trivial program to transform the Web's URL encoding.
428
+ − 1779 @end menu
+ − 1780
442
+ − 1781 @node CCL Syntax, CCL Statements, , CCL
428
+ − 1782 @comment Node, Next, Previous, Up
+ − 1783 @subsection CCL Syntax
+ − 1784
442
+ − 1785 The full syntax of a CCL program in BNF notation:
428
+ − 1786
+ − 1787 @format
+ − 1788 CCL_PROGRAM :=
+ − 1789 (BUFFER_MAGNIFICATION
+ − 1790 CCL_MAIN_BLOCK
+ − 1791 [ CCL_EOF_BLOCK ])
+ − 1792
+ − 1793 BUFFER_MAGNIFICATION := integer
+ − 1794 CCL_MAIN_BLOCK := CCL_BLOCK
+ − 1795 CCL_EOF_BLOCK := CCL_BLOCK
+ − 1796
+ − 1797 CCL_BLOCK :=
+ − 1798 STATEMENT | (STATEMENT [STATEMENT ...])
+ − 1799 STATEMENT :=
2367
+ − 1800 SET | IF | BRANCH | LOOP | REPEAT | BREAK | READ | WRITE | CALL
+ − 1801 | TRANSLATE | MAP | END
428
+ − 1802
+ − 1803 SET :=
+ − 1804 (REG = EXPRESSION)
+ − 1805 | (REG ASSIGNMENT_OPERATOR EXPRESSION)
2367
+ − 1806 | INT-OR-CHAR
428
+ − 1807
+ − 1808 EXPRESSION := ARG | (EXPRESSION OPERATOR ARG)
+ − 1809
+ − 1810 IF := (if EXPRESSION CCL_BLOCK [CCL_BLOCK])
+ − 1811 BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
+ − 1812 LOOP := (loop STATEMENT [STATEMENT ...])
+ − 1813 BREAK := (break)
+ − 1814 REPEAT :=
+ − 1815 (repeat)
2367
+ − 1816 | (write-repeat [REG | INT-OR-CHAR | string])
+ − 1817 | (write-read-repeat REG [INT-OR-CHAR | ARRAY])
428
+ − 1818 READ :=
+ − 1819 (read REG ...)
2367
+ − 1820 | (read-if (REG OPERATOR ARG) CCL_BLOCK [CCL_BLOCK])
428
+ − 1821 | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
+ − 1822 WRITE :=
+ − 1823 (write REG ...)
+ − 1824 | (write EXPRESSION)
2367
+ − 1825 | (write INT-OR-CHAR) | (write string) | (write REG ARRAY)
428
+ − 1826 | string
+ − 1827 CALL := (call ccl-program-name)
3439
+ − 1828
+ − 1829
+ − 1830 TRANSLATE := ;; Not implemented under XEmacs, except mule-to-unicode and
+ − 1831 ;; unicode-to-mule.
+ − 1832 (translate-character REG(table) REG(charset) REG(codepoint))
+ − 1833 | (translate-character SYMBOL REG(charset) REG(codepoint))
+ − 1834 | (mule-to-unicode REG(charset) REG(codepoint))
+ − 1835 | (unicode-to-mule REG(unicode,code) REG(CHARSET))
+ − 1836
428
+ − 1837 END := (end)
+ − 1838
+ − 1839 REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
2367
+ − 1840 ARG := REG | INT-OR-CHAR
428
+ − 1841 OPERATOR :=
+ − 1842 + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
+ − 1843 | < | > | == | <= | >= | != | de-sjis | en-sjis
+ − 1844 ASSIGNMENT_OPERATOR :=
+ − 1845 += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
2367
+ − 1846 ARRAY := '[' INT-OR-CHAR ... ']'
+ − 1847 INT-OR-CHAR := integer | character
+ − 1848
428
+ − 1849 @end format
+ − 1850
+ − 1851 @node CCL Statements, CCL Expressions, CCL Syntax, CCL
+ − 1852 @comment Node, Next, Previous, Up
+ − 1853 @subsection CCL Statements
+ − 1854
442
+ − 1855 The Emacs Code Conversion Language provides the following statement
428
+ − 1856 types: @dfn{set}, @dfn{if}, @dfn{branch}, @dfn{loop}, @dfn{repeat},
3439
+ − 1857 @dfn{break}, @dfn{read}, @dfn{write}, @dfn{call}, @dfn{translate} and
+ − 1858 @dfn{end}.
428
+ − 1859
+ − 1860 @heading Set statement:
+ − 1861
442
+ − 1862 The @dfn{set} statement has three variants with the syntaxes
428
+ − 1863 @samp{(@var{reg} = @var{expression})},
+ − 1864 @samp{(@var{reg} @var{assignment_operator} @var{expression})}, and
+ − 1865 @samp{@var{integer}}. The assignment operator variation of the
+ − 1866 @dfn{set} statement works the same way as the corresponding C expression
+ − 1867 statement does. The assignment operators are @code{+=}, @code{-=},
+ − 1868 @code{*=}, @code{/=}, @code{%=}, @code{&=}, @code{|=}, @code{^=},
+ − 1869 @code{<<=}, and @code{>>=}, and they have the same meanings as in C. A
+ − 1870 "naked integer" @var{integer} is equivalent to a @var{set} statement of
+ − 1871 the form @code{(r0 = @var{integer})}.
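
A few concrete @dfn{set} statements, shown in isolation (in a real
program they would appear inside a CCL block):

@example
(r1 = 32)          ; store decimal 32 (ASCII space) in r1
(r1 += 1)          ; increment r1, as in C
(r2 = (r1 + 3))    ; store the value of an expression
64                 ; a "naked integer": equivalent to (r0 = 64)
@end example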
+ − 1872
+ − 1873 @heading I/O statements:
+ − 1874
442
+ − 1875 The @dfn{read} statement takes one or more registers as arguments. It
444
+ − 1876 reads one byte (a C char) from the input into each register in turn.
428
+ − 1877
442
+ − 1878 The @dfn{write} takes several forms. In the form @samp{(write @var{reg}
428
+ − 1879 ...)} it takes one or more registers as arguments and writes each in
+ − 1880 turn to the output. The integer in a register (interpreted as an
2367
+ − 1881 Ichar) is encoded to multibyte form (i.e., Ibytes) and written to the
428
+ − 1882 current output buffer. If it is less than 256, it is written as is.
+ − 1883 The forms @samp{(write @var{expression})} and @samp{(write
+ − 1884 @var{integer})} are treated analogously. The form @samp{(write
+ − 1885 @var{string})} writes the constant string to the output. A
+ − 1886 "naked string" @samp{@var{string}} is equivalent to the statement @samp{(write
+ − 1887 @var{string})}. The form @samp{(write @var{reg} @var{array})} writes
+ − 1888 the @var{reg}th element of the @var{array} to the output.
+ − 1889
+ − 1890 @heading Conditional statements:
+ − 1891
442
+ − 1892 The @dfn{if} statement takes an @var{expression}, a @var{CCL block}, and
428
+ − 1893 an optional @var{second CCL block} as arguments. If the
+ − 1894 @var{expression} evaluates to non-zero, the first @var{CCL block} is
+ − 1895 executed. Otherwise, if there is a @var{second CCL block}, it is
+ − 1896 executed.
+ − 1897
442
+ − 1898 The @dfn{read-if} variant of the @dfn{if} statement takes an
428
+ − 1899 @var{expression}, a @var{CCL block}, and an optional @var{second CCL
+ − 1900 block} as arguments. The @var{expression} must have the form
+ − 1901 @code{(@var{reg} @var{operator} @var{operand})} (where @var{operand} is
+ − 1902 a register or an integer). The @code{read-if} statement first reads
+ − 1903 from the input into the first register operand in the @var{expression},
+ − 1904 then conditionally executes a CCL block just as the @code{if} statement
+ − 1905 does.
+ − 1906
442
+ − 1907 The @dfn{branch} statement takes an @var{expression} and one or more CCL
428
+ − 1908 blocks as arguments. The CCL blocks are treated as a zero-indexed
+ − 1909 array, and the @code{branch} statement uses the @var{expression} as the
+ − 1910 index of the CCL block to execute. Null CCL blocks may be used as
+ − 1911 no-ops, continuing execution with the statement following the
+ − 1912 @code{branch} statement in the containing CCL block. Out-of-range
444
+ − 1913 values for the @var{expression} are also treated as no-ops.
428
+ − 1914
442
+ − 1915 The @dfn{read-branch} variant of the @dfn{branch} statement takes an
428
+ − 1916 @var{register}, a @var{CCL block}, and an optional @var{second CCL
+ − 1917 block} as arguments. The @code{read-branch} statement first reads from
+ − 1918 the input into the @var{register}, then conditionally executes a CCL
+ − 1919 block just as the @code{branch} statement does.
+ − 1920
+ − 1921 @heading Loop control statements:
+ − 1922
442
+ − 1923 The @dfn{loop} statement creates a block with an implied jump from the
444
+ − 1924 end of the block back to its head. The loop is exited on a @code{break}
428
+ − 1925 statement, and continued without executing the tail by a @code{repeat}
+ − 1926 statement.
+ − 1927
442
+ − 1928 The @dfn{break} statement, written @samp{(break)}, terminates the
428
+ − 1929 current loop and continues with the next statement in the current
444
+ − 1930 block.
428
+ − 1931
442
+ − 1932 The @dfn{repeat} statement has three variants, @code{repeat},
428
+ − 1933 @code{write-repeat}, and @code{write-read-repeat}. Each continues the
+ − 1934 current loop from its head, possibly after performing I/O.
+ − 1935 @code{repeat} takes no arguments and does no I/O before jumping.
444
+ − 1936 @code{write-repeat} takes a single argument (a register, an
428
+ − 1937 integer, or a string), writes it to the output, then jumps.
+ − 1938 @code{write-read-repeat} takes one or two arguments. The first must
+ − 1939 be a register. The second may be an integer or an array; if absent, it
+ − 1940 is implicitly set to the first (register) argument.
+ − 1941 @code{write-read-repeat} writes its second argument to the output, then
+ − 1942 reads from the input into the register, and finally jumps. See the
+ − 1943 @code{write} and @code{read} statements for the semantics of the I/O
+ − 1944 operations for each type of argument.
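
Putting the loop and I/O statements together, the following sketch of a
main block copies its input to its output while mapping the byte
@samp{#x2B} (@samp{+}) to @samp{#x20} (space), in the spirit of the
URL-decoding example later in this chapter:

@example
(loop
  (read-if (r0 == #x2B)    ; read a byte into r0 and test it
      (write-repeat #x20)  ; it was `+': write a space and loop again
    (write-repeat r0)))    ; anything else: write it unchanged and loop
@end example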
+ − 1945
3439
+ − 1946 @heading Other statements:
428
+ − 1947
442
+ − 1948 The @dfn{call} statement, written @samp{(call @var{ccl-program-name})},
428
+ − 1949 executes a CCL program as a subroutine. It does not return a value to
+ − 1950 the caller, but can modify the register status.
+ − 1951
3439
+ − 1952 The @dfn{mule-to-unicode} statement translates an XEmacs character into a
+ − 1953 UCS code point, using U+FFFD REPLACEMENT CHARACTER if the given XEmacs
+ − 1954 character has no known corresponding code point. It takes two
+ − 1955 arguments; the first is a register in which is stored the character set
+ − 1956 ID of the character to be translated, and into which the UCS code is
+ − 1957 stored. The second is a register which stores the XEmacs code of the
+ − 1958 character in question; if it is from a multidimensional character set,
+ − 1959 like most of the East Asian national sets, it's stored as @samp{((c1 <<
+ − 1960 8) | c2)}, where @samp{c1} is the first code, and @samp{c2} the second.
+ − 1961 (That is, as a single integer, the high-order eight bits of which encode
+ − 1962 the first position code, and the low order bits of which encode the
+ − 1963 second.)
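
For example, if the first position code is @samp{#x30} and the second
is @samp{#x21} (values chosen purely for illustration), the register
holds:

@example
(logior (ash #x30 8) #x21)
     @result{} 12321        ; that is, #x3021
@end example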
+ − 1964
+ − 1965 The @dfn{unicode-to-mule} statement translates a Unicode code point
+ − 1966 (an integer) into an XEmacs character. Its first argument is a register
+ − 1967 containing the UCS code point; the code for the corresponding character
+ − 1968 will be written into this register, in the same format as for
+ − 1969 @samp{mule-to-unicode}. The second argument is a register into which will
+ − 1970 be written the character set ID of the converted character.
+ − 1971
442
+ − 1972 The @dfn{end} statement, written @samp{(end)}, terminates the CCL
428
+ − 1973 program successfully, and returns to caller (which may be a CCL
+ − 1974 program). It does not alter the status of the registers.
+ − 1975
+ − 1976 @node CCL Expressions, Calling CCL, CCL Statements, CCL
+ − 1977 @comment Node, Next, Previous, Up
+ − 1978 @subsection CCL Expressions
+ − 1979
442
+ − 1980 CCL, unlike Lisp, uses infix expressions. The simplest CCL expressions
428
+ − 1981 consist of a single @var{operand}, either a register (one of @code{r0},
+ − 1982 ..., @code{r7}) or an integer. Complex expressions are lists of the
+ − 1983 form @code{( @var{expression} @var{operator} @var{operand} )}. Unlike
+ − 1984 C, assignments are not expressions.
+ − 1985
442
+ − 1986 In the following table, @var{X} is the target register for a @dfn{set}.
428
+ − 1987 In subexpressions, this is implicitly @code{r7}. This means that
+ − 1988 @code{>8}, @code{//}, @code{de-sjis}, and @code{en-sjis} cannot be used
+ − 1989 freely in subexpressions, since they return parts of their values in
+ − 1990 @code{r7}. @var{Y} may be an expression, register, or integer, while
+ − 1991 @var{Z} must be a register or an integer.
+ − 1992
+ − 1993 @multitable @columnfractions .22 .14 .09 .55
+ − 1994 @item Name @tab Operator @tab Code @tab C-like Description
+ − 1995 @item CCL_PLUS @tab @code{+} @tab 0x00 @tab X = Y + Z
+ − 1996 @item CCL_MINUS @tab @code{-} @tab 0x01 @tab X = Y - Z
+ − 1997 @item CCL_MUL @tab @code{*} @tab 0x02 @tab X = Y * Z
+ − 1998 @item CCL_DIV @tab @code{/} @tab 0x03 @tab X = Y / Z
+ − 1999 @item CCL_MOD @tab @code{%} @tab 0x04 @tab X = Y % Z
+ − 2000 @item CCL_AND @tab @code{&} @tab 0x05 @tab X = Y & Z
+ − 2001 @item CCL_OR @tab @code{|} @tab 0x06 @tab X = Y | Z
+ − 2002 @item CCL_XOR @tab @code{^} @tab 0x07 @tab X = Y ^ Z
+ − 2003 @item CCL_LSH @tab @code{<<} @tab 0x08 @tab X = Y << Z
+ − 2004 @item CCL_RSH @tab @code{>>} @tab 0x09 @tab X = Y >> Z
+ − 2005 @item CCL_LSH8 @tab @code{<8} @tab 0x0A @tab X = (Y << 8) | Z
+ − 2006 @item CCL_RSH8 @tab @code{>8} @tab 0x0B @tab X = Y >> 8, r[7] = Y & 0xFF
+ − 2007 @item CCL_DIVMOD @tab @code{//} @tab 0x0C @tab X = Y / Z, r[7] = Y % Z
+ − 2008 @item CCL_LS @tab @code{<} @tab 0x10 @tab X = (X < Y)
+ − 2009 @item CCL_GT @tab @code{>} @tab 0x11 @tab X = (X > Y)
+ − 2010 @item CCL_EQ @tab @code{==} @tab 0x12 @tab X = (X == Y)
+ − 2011 @item CCL_LE @tab @code{<=} @tab 0x13 @tab X = (X <= Y)
+ − 2012 @item CCL_GE @tab @code{>=} @tab 0x14 @tab X = (X >= Y)
+ − 2013 @item CCL_NE @tab @code{!=} @tab 0x15 @tab X = (X != Y)
+ − 2014 @item CCL_ENCODE_SJIS @tab @code{en-sjis} @tab 0x16 @tab X = HIGHER_BYTE (SJIS (Y, Z))
+ − 2015 @item @tab @tab @tab r[7] = LOWER_BYTE (SJIS (Y, Z))
+ − 2016 @item CCL_DECODE_SJIS @tab @code{de-sjis} @tab 0x17 @tab X = HIGHER_BYTE (DE-SJIS (Y, Z))
+ − 2017 @item @tab @tab @tab r[7] = LOWER_BYTE (DE-SJIS (Y, Z))
+ − 2018 @end multitable
+ − 2019
442
+ − 2020 The CCL operators are as in C, with the addition of CCL_LSH8, CCL_RSH8,
428
+ − 2021 CCL_DIVMOD, CCL_ENCODE_SJIS, and CCL_DECODE_SJIS. The CCL_ENCODE_SJIS
+ − 2022 and CCL_DECODE_SJIS treat their first and second bytes as the high and
+ − 2023 low bytes of a two-byte character code. (SJIS stands for Shift JIS, an
+ − 2024 encoding of Japanese characters used by Microsoft. CCL_ENCODE_SJIS is a
+ − 2025 complicated transformation of the Japanese standard JIS encoding to
+ − 2026 Shift JIS. CCL_DECODE_SJIS is its inverse.) It is somewhat odd to
+ − 2027 represent the SJIS operations in infix form.
+ − 2028
2640
+ − 2029 @node Calling CCL, CCL Example, CCL Expressions, CCL
428
+ − 2030 @comment Node, Next, Previous, Up
+ − 2031 @subsection Calling CCL
+ − 2032
442
+ − 2033 CCL programs are called automatically during Emacs buffer I/O when the
428
+ − 2034 external representation has a coding system type of @code{shift-jis},
+ − 2035 @code{big5}, or @code{ccl}. The program is specified by the coding
+ − 2036 system (@pxref{Coding Systems}). You can also call CCL programs from
+ − 2037 other CCL programs, and from Lisp using these functions:
+ − 2038
+ − 2039 @defun ccl-execute ccl-program status
+ − 2040 Execute @var{ccl-program} with registers initialized by
+ − 2041 @var{status}. @var{ccl-program} is a vector of compiled CCL code
444
+ − 2042 created by @code{ccl-compile}. It is an error for the program to try to
428
+ − 2043 execute a CCL I/O command. @var{status} must be a vector of nine
+ − 2044 values, specifying the initial value for the R0, R1 .. R7 registers and
+ − 2045 for the instruction counter IC. A @code{nil} value for a register
+ − 2046 initializer causes the register to be set to 0. A @code{nil} value for
+ − 2047 the IC initializer causes execution to start at the beginning of the
+ − 2048 program. When the program is done, @var{status} is modified (by
+ − 2049 side-effect) to contain the ending values for the corresponding
444
+ − 2050 registers and IC.
428
+ − 2051 @end defun
+ − 2052
444
+ − 2053 @defun ccl-execute-on-string ccl-program status string &optional continue
428
+ − 2054 Execute @var{ccl-program} with initial @var{status} on
+ − 2055 @var{string}. @var{ccl-program} is a vector of compiled CCL code
+ − 2056 created by @code{ccl-compile}. @var{status} must be a vector of nine
+ − 2057 values, specifying the initial value for the R0, R1 .. R7 registers and
+ − 2058 for the instruction counter IC. A @code{nil} value for a register
+ − 2059 initializer causes the register to be set to 0. A @code{nil} value for
+ − 2060 the IC initializer causes execution to start at the beginning of the
444
+ − 2061 program. An optional fourth argument @var{continue}, if non-@code{nil}, causes
428
+ − 2062 the IC to
+ − 2063 remain on the unsatisfied read operation if the program terminates due
+ − 2064 to exhaustion of the input buffer. Otherwise the IC is set to the end
444
+ − 2065 of the program. When the program is done, @var{status} is modified (by
428
+ − 2066 side-effect) to contain the ending values for the corresponding
+ − 2067 registers and IC. Returns the resulting string.
+ − 2068 @end defun
+ − 2069
442
+ − 2070 To call a CCL program from another CCL program, it must first be
428
+ − 2071 registered:
+ − 2072
+ − 2073 @defun register-ccl-program name ccl-program
444
+ − 2074 Register @var{name} for CCL program @var{ccl-program} in
+ − 2075 @code{ccl-program-table}. @var{ccl-program} should be the compiled form of
+ − 2076 a CCL program, or @code{nil}. Return index number of the registered CCL
428
+ − 2077 program.
+ − 2078 @end defun
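
Here is a complete round trip, sketched under the assumption that
@code{ccl-compile} is available (it is part of the MULE support code);
the program and variable names are invented for illustration:

@example
(defvar my-ccl-plus-to-space
  (ccl-compile
   '(1
     ((loop
        (read-if (r0 == #x2B)     ; `+' in ASCII
            (write-repeat #x20)   ; write a space instead
          (write-repeat r0))))))  ; otherwise copy the byte through
  "Compiled CCL program that maps `+' to space.")

(register-ccl-program 'my-ccl-plus-to-space my-ccl-plus-to-space)

(ccl-execute-on-string my-ccl-plus-to-space (make-vector 9 nil) "a+b")
     @result{} "a b"
@end example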
+ − 2079
442
+ − 2080 Information about the processor time used by the CCL interpreter can be
428
+ − 2081 obtained using these functions:
+ − 2082
+ − 2083 @defun ccl-elapsed-time
+ − 2084 Returns the elapsed processor time of the CCL interpreter as cons of
+ − 2085 user and system time, as
+ − 2086 floating point numbers measured in seconds. If only one
+ − 2087 overall value can be determined, the return value will be a cons of that
+ − 2088 value and 0.
+ − 2089 @end defun
+ − 2090
+ − 2091 @defun ccl-reset-elapsed-time
+ − 2092 Resets the CCL interpreter's internal elapsed time registers.
+ − 2093 @end defun
+ − 2094
2640
+ − 2095 @node CCL Example, , Calling CCL, CCL
428
+ − 2096 @comment Node, Next, Previous, Up
2640
+ − 2097 @subsection CCL Example
+ − 2098
+ − 2099 In this section, we describe the implementation of a trivial coding
+ − 2100 system to transform from the Web's URL encoding to XEmacs' internal
+ − 2101 coding. Many people will have been first exposed to URL encoding when
+ − 2102 they saw ``%20'' where they expected a space in a file's name on their
+ − 2103 local hard disk; this can happen when a browser saves a file from the
+ − 2104 web and doesn't encode the name, as passed from the server, properly.
+ − 2105
+ − 2106 URL encoding itself is underspecified with regard to encodings beyond
+ − 2107 ASCII. The relevant document, RFC 1738, explicitly doesn't give any
+ − 2108 information on how to encode non-ASCII characters, and the ``obvious''
+ − 2109 way---use the %xx values for the octets of the eight bit MIME character
+ − 2110 set in which the page was served---breaks when a user types a character
+ − 2111 outside that character set. Best practice for web development is to
+ − 2112 serve all pages as UTF-8 and treat incoming form data as using that
+ − 2113 coding system. (Oh, and gamble that your clients won't ever want to
+ − 2114 type anything outside Unicode. But that's not so much of a gamble with
+ − 2115 today's client operating systems.) We don't treat non-ASCII in this
+ − 2116 example, as dealing with @samp{(read-multibyte-character ...)} and
+ − 2117 errors therewith would make it much harder to understand.
+ − 2118
+ − 2119 Since CCL isn't a very rich language, we move much of the logic that
+ − 2120 would ordinarily be computed from operations like @code{(member ..)},
+ − 2121 @code{(and ...)} and @code{(or ...)} into tables, from which register
+ − 2122 values are read and written, and on which @code{if} statements are
+ − 2123 predicated. Much more of the implementation of this coding system is
+ − 2124 occupied with constructing these tables---in normal Emacs Lisp---than it
+ − 2125 is with actual CCL code.
+ − 2126
+ − 2127 All the @code{defvar} statements we deal with in the next few sections
+ − 2128 are surrounded by a @code{(eval-and-compile ...)}, which means that the
+ − 2129 logic which initializes these variables executes at compile time, and if
+ − 2130 XEmacs loads the compiled version of the file, these variables are
+ − 2131 initialized as constants.
+ − 2132
+ − 2133 @menu
+ − 2134 * Four bits to ASCII:: Two tables used for getting hex digits from ASCII.
+ − 2135 * URI Encoding constants:: Useful predefined characters.
+ − 2136 * Numeric to ASCII-hexadecimal conversion:: Trivial in Lisp, not so in CCL.
+ − 2137 * Characters to be preserved:: No transformation needed for these characters.
+ − 2138 * The program to decode to internal format:: .
+ − 2139 * The program to encode from internal format:: .
2690
+ − 2140 * The actual coding system:: .
2640
+ − 2141 @end menu
+ − 2142
+ − 2143 @node Four bits to ASCII, URI Encoding constants, , CCL Example
+ − 2144 @subsubsection Four bits to ASCII
+ − 2145
+ − 2146 The first @code{defvar} is for
+ − 2147 @code{url-coding-high-order-nybble-as-ascii}, a 256-entry table that
+ − 2148 maps from an octet's value to the ASCII encoding for the hex value of
+ − 2149 its most significant four bits. That might sound complex, but it isn't;
+ − 2150 for decimal 65, hex value @samp{#x41}, the entry in the table is the
+ − 2151 ASCII encoding of `4'. For decimal 122, ASCII `z', hex value
+ − 2152 @code{#x7a}, @code{(elt url-coding-high-order-nybble-as-ascii #x7a)}
+ − 2153 after this file is loaded gives the ASCII encoding of `7'.
+ − 2154
+ − 2155 @example
+ − 2156 (defvar url-coding-high-order-nybble-as-ascii
+ − 2157 (let ((val (make-vector 256 0))
+ − 2158 (i 0))
+ − 2159 (while (< i (length val))
2690
+ − 2160 (aset val i (char-to-int (aref (format "%02X" i) 0)))
2640
+ − 2161 (setq i (1+ i)))
+ − 2162 val)
+ − 2163 "Table to find an ASCII version of an octet's most significant 4 bits.")
+ − 2164 @end example
+ − 2165
+ − 2166 The next table, @code{url-coding-low-order-nybble-as-ascii} is almost
+ − 2167 the same thing, but this time it has a map for the hex encoding of the
2690
+ − 2168 low-order four bits. So the sixty-fifth entry (offset @samp{#x41}) is
2640
+ − 2169 the ASCII encoding of `1', the hundred-and-twenty-second (offset
+ − 2170 @samp{#x7a}) is the ASCII encoding of `A'.
+ − 2171
+ − 2172 @example
+ − 2173 (defvar url-coding-low-order-nybble-as-ascii
+ − 2174 (let ((val (make-vector 256 0))
+ − 2175 (i 0))
+ − 2176 (while (< i (length val))
2690
+ − 2177 (aset val i (char-to-int (aref (format "%02X" i) 1)))
2640
+ − 2178 (setq i (1+ i)))
+ − 2179 val)
+ − 2180 "Table to find an ASCII version of an octet's least significant 4 bits.")
+ − 2181 @end example
+ − 2182
+ − 2183 @node URI Encoding constants, Numeric to ASCII-hexadecimal conversion, Four bits to ASCII, CCL Example
+ − 2184 @subsubsection URI Encoding constants
+ − 2185
+ − 2186 Next, we have a couple of variables that make the CCL code more
+ − 2187 readable. The first is the ASCII encoding of the percentage sign; this
+ − 2188 character is used as an escape code, to start the encoding of a
+ − 2189 non-printable character. For historical reasons, URL encoding allows
+ − 2190 the space character to be encoded as a plus sign--it does make typing
+ − 2191 URLs like @samp{http://google.com/search?q=XEmacs+home+page} easier--and
+ − 2192 as such, we have to check when decoding for this value, and map it to
+ − 2193 the space character. When doing this in CCL, we use the
+ − 2194 @code{url-coding-escaped-space-code} variable.
+ − 2195
@example
(defvar url-coding-escape-character-code (char-to-int ?%)
  "The code point for the percentage sign, in ASCII.")

(defvar url-coding-escaped-space-code (char-to-int ?+)
  "The URL-encoded value of the space character, that is, +.")
@end example
+ − 2203
@node Numeric to ASCII-hexadecimal conversion, Characters to be preserved, URI Encoding constants, CCL Example
+ − 2205 @subsubsection Numeric to ASCII-hexadecimal conversion
+ − 2206
Now, we have a couple of utility tables that wouldn't be necessary in
a more expressive programming language than CCL. The first is sixteen
entries long, and maps a hexadecimal digit's value to the ASCII encoding
of that digit; so zero maps to ASCII `0', ten maps to ASCII `A'. The
second does the reverse; that is, it maps an ASCII character to its value
when interpreted as a hexadecimal digit. (`A' => 10, `c' => 12, `2' => 2,
as a few examples.)
+ − 2214
@example
(defvar url-coding-hex-digit-table
  (let ((i 0)
        (val (make-vector 16 0)))
    (while (< i 16)
      (aset val i (char-to-int (aref (format "%X" i) 0)))
      (setq i (1+ i)))
    val)
  "A map from a hexadecimal digit's numeric value to its encoding in ASCII.")

(defvar url-coding-latin-1-as-hex-table
  (let ((val (make-vector 256 0))
        (i 0))
    (while (< i (length val))
      ;; Get a hex val for this ASCII character.
      (aset val i (string-to-int (format "%c" i) 16))
      (setq i (1+ i)))
    val)
  "A map from Latin 1 code points to their values as hexadecimal digits.")
@end example
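
A couple of illustrative look-ups, assuming the file has been loaded
(these are not part of the original code):

@example
(aref url-coding-hex-digit-table 10)
     @result{} 65   ; the ASCII code for `A'
(aref url-coding-latin-1-as-hex-table (char-to-int ?c))
     @result{} 12
@end example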
+ − 2235
@node Characters to be preserved, The program to decode to internal format, Numeric to ASCII-hexadecimal conversion, CCL Example
+ − 2237 @subsubsection Characters to be preserved
+ − 2238
+ − 2239 And finally, the last of these tables. URL encoding says that
+ − 2240 alphanumeric characters, the underscore, hyphen and the full stop
+ − 2241 @footnote{That's what the standards call it, though my North American
+ − 2242 readers will be more familiar with it as the period character.} retain
+ − 2243 their ASCII encoding, and don't undergo transformation.
@code{url-coding-should-preserve-table} is an array in which an entry
is one if the corresponding ASCII character should be left as-is, and
zero if it should be transformed. So the entries for all the control
and most of the punctuation characters are zero. Lisp programmers will
+ − 2248 observe that this initialization is particularly inefficient, but
+ − 2249 they'll also be aware that this is a long way from an inner loop where
+ − 2250 every nanosecond counts.
+ − 2251
@example
(defvar url-coding-should-preserve-table
  (let ((preserve
         (list ?- ?_ ?. ?a ?b ?c ?d ?e ?f ?g ?h ?i ?j ?k ?l ?m ?n ?o
               ?p ?q ?r ?s ?t ?u ?v ?w ?x ?y ?z ?A ?B ?C ?D ?E ?F ?G
               ?H ?I ?J ?K ?L ?M ?N ?O ?P ?Q ?R ?S ?T ?U ?V ?W ?X ?Y
               ?Z ?0 ?1 ?2 ?3 ?4 ?5 ?6 ?7 ?8 ?9))
        (i 0)
        (res (make-vector 256 0)))
    (while (< i 256)
      (when (member (int-char i) preserve)
        (aset res i 1))
      (setq i (1+ i)))
    res)
  "A 256-entry array of flags, indicating whether or not to preserve an
octet as its ASCII encoding.")
@end example
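
For instance (illustrative evaluations, not part of the original code):

@example
(aref url-coding-should-preserve-table (char-to-int ?a))
     @result{} 1   ; alphanumerics pass through unchanged
(aref url-coding-should-preserve-table (char-to-int ?%))
     @result{} 0   ; the escape character itself must be encoded
@end example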
+ − 2269
@node The program to decode to internal format, The program to encode from internal format, Characters to be preserved, CCL Example
+ − 2271 @subsubsection The program to decode to internal format
+ − 2272
+ − 2273 After the almost interminable tables, we get to the CCL. The first
+ − 2274 CCL program, @code{ccl-decode-urlcoding} decodes from the URL coding to
+ − 2275 our internal format; since this version of CCL doesn't have support for
+ − 2276 error checking on the input, we don't do any verification on it.
+ − 2277
+ − 2278 The buffer magnification--approximate ratio of the size of the output
+ − 2279 buffer to the size of the input buffer--is declared as one, because
+ − 2280 fractional values aren't allowed. (Since all those %20's will map to
+ − 2281 ` ', the length of the output text will be less than that of the input
+ − 2282 text.)
+ − 2283
+ − 2284 So, first we read an octet from the input buffer into register
+ − 2285 @samp{r0}, to set up the loop. Next, we start the loop, with a
+ − 2286 @code{(loop ...)} statement, and we check if the value in @samp{r0} is a
+ − 2287 percentage sign. (Note the comma before
+ − 2288 @code{url-coding-escape-character-code}; since CCL is a Lisp macro
language, we can break out of the macro evaluation with a comma, and as
such, ``@code{,url-coding-escape-character-code}'' will be evaluated as a
literal `37'.)
+ − 2292
+ − 2293 If it is a percentage sign, we read the next two octets into @samp{r2}
+ − 2294 and @samp{r3}, and convert them into their hexadecimal numeric values,
+ − 2295 using the @code{url-coding-latin-1-as-hex-table} array declared above.
+ − 2296 (But again, it'll be interpreted as a literal array.) We then left
+ − 2297 shift the first by four bits, mask the two together, and write the
+ − 2298 result to the output buffer.
+ − 2299
+ − 2300 If it isn't a percentage sign, and it is a `+' sign, we write a
+ − 2301 space--hexadecimal 20--to the output buffer.
+ − 2302
+ − 2303 If none of those things are true, we pass the octet to the output buffer
+ − 2304 untransformed. (This could be a place to put error checking, in a more
+ − 2305 expressive language.) We then read one more octet from the input
+ − 2306 buffer, and move to the next iteration of the loop.
+ − 2307
@example
(define-ccl-program ccl-decode-urlcoding
  `(1
    ((read r0)
     (loop
       (if (r0 == ,url-coding-escape-character-code)
           ((read r2 r3)
            ;; Convert the ASCII octets in r2 and r3 to their values as
            ;; hexadecimal digits, using url-coding-latin-1-as-hex-table.
            (r2 = r2 ,url-coding-latin-1-as-hex-table)
            (r3 = r3 ,url-coding-latin-1-as-hex-table)
            (r2 <<= 4)
            (r3 |= r2)
            (write r3))
         (if (r0 == ,url-coding-escaped-space-code)
             (write #x20)
           (write r0)))
       (read r0)
       (repeat))))
  "CCL program to take URI-encoded ASCII text and transform it to our
internal encoding.")
@end example
+ − 2330
@node The program to encode from internal format, The actual coding system, The program to decode to internal format, CCL Example
+ − 2332 @subsubsection The program to encode from internal format
+ − 2333
+ − 2334 Next, we see the CCL program to encode ASCII text as URL coded text.
+ − 2335 Here, the buffer magnification is specified as three, to account for ` '
+ − 2336 mapping to %20, etc. As before, we read an octet from the input into
+ − 2337 @samp{r0}, and move into the body of the loop. Next, we check if we
+ − 2338 should preserve the value of this octet, by reading from offset
+ − 2339 @samp{r0} in the @code{url-coding-should-preserve-table} into @samp{r1}.
+ − 2340 Then we have an @samp{if} statement predicated on the value in
+ − 2341 @samp{r1}; for the true branch, we write the input octet directly. For
+ − 2342 the false branch, we write a percentage sign, the ASCII encoding of the
+ − 2343 high four bits in hex, and then the ASCII encoding of the low four bits
+ − 2344 in hex.
+ − 2345
+ − 2346 We then read an octet from the input into @samp{r0}, and repeat the loop.
+ − 2347
@example
(define-ccl-program ccl-encode-urlcoding
  `(3
    ((read r0)
     (loop
       (r1 = r0 ,url-coding-should-preserve-table)
       ;; If we should preserve the value, just write the octet directly.
       (if r1
           (write r0)
         ;; else, write a percentage sign, and the hex value of the
         ;; octet, in an ASCII-friendly format.
         ((write ,url-coding-escape-character-code)
          (write r0 ,url-coding-high-order-nybble-as-ascii)
          (write r0 ,url-coding-low-order-nybble-as-ascii)))
       (read r0)
       (repeat))))
  "CCL program to encode octets (almost) according to RFC 1738.")
@end example
+ − 2367 @node The actual coding system, , The program to encode from internal format, CCL Example
+ − 2368 @subsubsection The actual coding system
+ − 2369
+ − 2370 To actually create the coding system, we call
+ − 2371 @samp{make-coding-system}. The first argument is the symbol that is to
+ − 2372 be the name of the coding system, in our case @samp{url-coding}. The
+ − 2373 second specifies that the coding system is to be of type
@samp{ccl}---there are several other coding system types available;
see the documentation for @samp{make-coding-system} for the
full list. Then there's a documentation string describing the wherefore
+ − 2377 and caveats of the coding system, and the final argument is a property
+ − 2378 list giving information about the CCL programs and the coding system's
+ − 2379 mnemonic.
+ − 2380
@example
(make-coding-system
 'url-coding 'ccl
 "The coding used by application/x-www-form-urlencoded HTTP applications.
This coding form doesn't specify anything about non-ASCII characters, so
make sure you've transformed to a seven-bit coding system first."
 '(decode ccl-decode-urlcoding
   encode ccl-encode-urlcoding
   mnemonic "URLenc"))
@end example
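
Once the coding system is defined, it can be exercised from Lisp with
the standard conversion functions. An illustrative session (not part of
the original file; it assumes the code above has been loaded):

@example
(decode-coding-string "XEmacs+home+page%21" 'url-coding)
     @result{} "XEmacs home page!"
(encode-coding-string "XEmacs home page!" 'url-coding)
     @result{} "XEmacs%20home%20page%21"
@end example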
+ − 2391
If you're lucky, the @samp{url-coding} coding system described here
+ − 2393 should be available in the XEmacs package system. Otherwise, downloading
+ − 2394 it from @samp{http://www.parhasard.net/url-coding.el} should work for
+ − 2395 the foreseeable future.
+ − 2396
@node Category Tables, Unicode Support, CCL, MULE
+ − 2398 @section Category Tables
+ − 2399
+ − 2400 A category table is a type of char table used for keeping track of
+ − 2401 categories. Categories are used for classifying characters for use in
regexps---you can refer to a category rather than having to use a
+ − 2403 complicated [] expression (and category lookups are significantly
+ − 2404 faster).
+ − 2405
+ − 2406 There are 95 different categories available, one for each printable
+ − 2407 character (including space) in the ASCII charset. Each category is
+ − 2408 designated by one such character, called a @dfn{category designator}.
+ − 2409 They are specified in a regexp using the syntax @samp{\cX}, where X is a
+ − 2410 category designator. (This is not yet implemented.)
+ − 2411
+ − 2412 A category table specifies, for each character, the categories that
+ − 2413 the character is in. Note that a character can be in more than one
+ − 2414 category. More specifically, a category table maps from a character to
+ − 2415 either the value @code{nil} (meaning the character is in no categories)
+ − 2416 or a 95-element bit vector, specifying for each of the 95 categories
+ − 2417 whether the character is in that category.
+ − 2418
+ − 2419 Special Lisp functions are provided that abstract this, so you do not
+ − 2420 have to directly manipulate bit vectors.
+ − 2421
@defun category-table-p object
This function returns @code{t} if @var{object} is a category table.
+ − 2424 @end defun
+ − 2425
+ − 2426 @defun category-table &optional buffer
+ − 2427 This function returns the current category table. This is the one
+ − 2428 specified by the current buffer, or by @var{buffer} if it is
+ − 2429 non-@code{nil}.
+ − 2430 @end defun
+ − 2431
+ − 2432 @defun standard-category-table
+ − 2433 This function returns the standard category table. This is the one used
+ − 2434 for new buffers.
+ − 2435 @end defun
+ − 2436
@defun copy-category-table &optional category-table
This function returns a new category table which is a copy of
@var{category-table}, which defaults to the standard category table.
+ − 2440 @end defun
+ − 2441
@defun set-category-table category-table &optional buffer
This function selects @var{category-table} as the new category table for
@var{buffer}. @var{buffer} defaults to the current buffer if omitted.
+ − 2445 @end defun
+ − 2446
@defun category-designator-p object
This function returns @code{t} if @var{object} is a category designator (a
+ − 2449 char in the range @samp{' '} to @samp{'~'}).
+ − 2450 @end defun
+ − 2451
@defun category-table-value-p object
This function returns @code{t} if @var{object} is a category table value.
+ − 2454 Valid values are @code{nil} or a bit vector of size 95.
+ − 2455 @end defun
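
As a sketch of how these functions fit together (illustrative only; it
uses nothing beyond the functions documented above):

@example
;; Give the current buffer a private copy of the standard category
;; table, leaving the standard table itself untouched.
(set-category-table (copy-category-table (standard-category-table)))

(category-table-p (category-table))   ; @result{} t
(category-designator-p ?a)            ; @result{} t
(category-table-value-p nil)          ; @result{} t
@end example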
+ − 2456

@c Added 2002-03-13 sjt
@node Unicode Support, Charset Unification, Category Tables, MULE
+ − 2460 @section Unicode Support
+ − 2461 @cindex unicode
+ − 2462 @cindex utf-8
+ − 2463 @cindex utf-16
+ − 2464 @cindex ucs-2
+ − 2465 @cindex ucs-4
+ − 2466 @cindex bmp
@cindex basic multilingual plane
+ − 2468
+ − 2469 Unicode support was added by Ben Wing to XEmacs 21.5.6.
+ − 2470
+ − 2471 @defun set-language-unicode-precedence-list list
+ − 2472 Set the language-specific precedence list used for Unicode decoding.
+ − 2473 This is a list of charsets, which are consulted in order for a translation
+ − 2474 matching a given Unicode character. If no matches are found, the charsets
+ − 2475 in the default precedence list (see
+ − 2476 @code{set-default-unicode-precedence-list}) are consulted, and then all
+ − 2477 remaining charsets, in some arbitrary order.
+ − 2478
+ − 2479 The language-specific precedence list is meant to be set as part of the
+ − 2480 language environment initialization; the default precedence list is meant
+ − 2481 to be set by the user.
+ − 2482 @end defun
+ − 2483
+ − 2484 @defun language-unicode-precedence-list
+ − 2485 Return the language-specific precedence list used for Unicode decoding.
+ − 2486 See @code{set-language-unicode-precedence-list} for more information.
+ − 2487 @end defun
+ − 2488
+ − 2489 @defun set-default-unicode-precedence-list list
+ − 2490 Set the default precedence list used for Unicode decoding.
This is meant to be set by the user. See
@code{set-language-unicode-precedence-list} for more information.
+ − 2493 @end defun
+ − 2494
+ − 2495 @defun default-unicode-precedence-list
+ − 2496 Return the default precedence list used for Unicode decoding.
+ − 2497 See @code{set-language-unicode-precedence-list} for more information.
+ − 2498 @end defun
+ − 2499
+ − 2500 @defun set-unicode-conversion character code
+ − 2501 Add conversion information between Unicode codepoints and characters.
+ − 2502 @var{character} is one of the following:
+ − 2503
@itemize @bullet
@item
A character (in which case @var{code} must be a non-negative integer)
@item
A vector of characters (in which case @var{code} must be a vector of
non-negative integers of the same length)
@end itemize
+ − 2508
+ − 2509 Values of @var{code} above 2^20 - 1 are allowed for the purpose of specifying
+ − 2510 private characters, but will cause errors when converted to UTF-16 or UTF-32.
+ − 2511 UCS-4 and UTF-8 can handle values to 2^31 - 1, but XEmacs Lisp integers top
+ − 2512 out at 2^30 - 1.
+ − 2513 @end defun
+ − 2514
+ − 2515 @defun character-to-unicode character
+ − 2516 Convert @var{character} to Unicode codepoint.
+ − 2517 When there is no international support (i.e. MULE is not defined),
+ − 2518 this function simply does @code{char-to-int}.
+ − 2519 @end defun
+ − 2520
+ − 2521 @defun unicode-to-character code [charsets]
+ − 2522 Convert Unicode codepoint @var{code} to character.
+ − 2523 @var{code} should be a non-negative integer.
+ − 2524 If @var{charsets} is given, it should be a list of charsets, and only those
+ − 2525 charsets will be consulted, in the given order, for a translation.
+ − 2526 Otherwise, the default ordering of all charsets will be given (see
+ − 2527 @code{set-unicode-charset-precedence}).
+ − 2528
+ − 2529 When there is no international support (i.e. MULE is not defined),
+ − 2530 this function simply does @code{int-to-char} and ignores the
+ − 2531 @var{charsets} argument.
+ − 2532 @end defun
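
A few illustrative evaluations on a MULE-enabled XEmacs (the last one
assumes the Latin-1 charset and its translation table are available):

@example
(character-to-unicode ?A)
     @result{} 65
(unicode-to-character #x41)
     @result{} ?A
;; Consult only Latin 1 for a translation of U+00F3; returns the
;; Latin-1 character if that charset provides one.
(unicode-to-character #xF3 '(latin-iso8859-1))
@end example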
+ − 2533
+ − 2534 @defun parse-unicode-translation-table filename charset start end offset flags
+ − 2535 Parse Unicode translation data in @var{filename} for MULE @var{charset}.
+ − 2536 Data is text, in the form of one translation per line -- charset
+ − 2537 codepoint followed by Unicode codepoint. Numbers are decimal or hex
(preceded by 0x). Comments are marked with a #. Charset codepoints
+ − 2539 for two-dimensional charsets should have the first octet stored in the
+ − 2540 high 8 bits of the hex number and the second in the low 8 bits.
+ − 2541
+ − 2542 If @var{start} and @var{end} are given, only charset codepoints within
+ − 2543 the given range will be processed. If @var{offset} is given, that value
+ − 2544 will be added to all charset codepoints in the file to obtain the
+ − 2545 internal charset codepoint. @var{start} and @var{end} apply to the
+ − 2546 codepoints in the file, before @var{offset} is applied.
+ − 2547
+ − 2548 (Note that, as usual, we assume that octets are in the range 32 to
+ − 2549 127 or 33 to 126. If you have a table in kuten form, with octets in
the range 1 to 94, you will have to use an offset of 8224,
i.e. 0x2020.)
+ − 2552
+ − 2553 @var{flags}, if specified, control further how the tables are interpreted
+ − 2554 and are used to special-case certain known table weirdnesses in the
+ − 2555 Unicode tables:
+ − 2556
+ − 2557 @table @code
@item ignore-first-column
+ − 2559 Exactly as it sounds. The JIS X 0208 tables have 3 columns of data instead
+ − 2560 of 2; the first is the Shift-JIS codepoint.
+ − 2561
+ − 2562 @item big5
+ − 2563 The charset codepoint is a Big Five codepoint; convert it to the
+ − 2564 proper hacked-up codepoint in `chinese-big5-1' or `chinese-big5-2'.
+ − 2565 @end table
+ − 2566 @end defun
+ − 2567
+ − 2568
+ − 2569 @node Charset Unification, Charsets and Coding Systems, Unicode Support, MULE
+ − 2570 @section Character Set Unification
+ − 2571
+ − 2572 Mule suffers from a design defect that causes it to consider the ISO
+ − 2573 Latin character sets to be disjoint. This results in oddities such as
+ − 2574 files containing both ISO 8859/1 and ISO 8859/15 codes, and using ISO
+ − 2575 2022 control sequences to switch between them, as well as more plausible
+ − 2576 but often unnecessary combinations like ISO 8859/1 with ISO 8859/2.
+ − 2577 This can be very annoying when sending messages or even in simple
+ − 2578 editing on a single host. Unification works around the problem by
+ − 2579 converting as many characters as possible to use a single Latin coded
+ − 2580 character set before saving the buffer.
+ − 2581
+ − 2582 This node and its children were ripp'd untimely from
+ − 2583 @file{latin-unity.texi}, and have been quickly converted for use here.
+ − 2584 However as APIs are likely to diverge, beware of inaccuracies. Please
+ − 2585 report any you discover with @kbd{M-x report-xemacs-bug RET}, as well
+ − 2586 as any ambiguities or downright unintelligible passages.
+ − 2587
+ − 2588 A lot of the stuff here doesn't belong here; it belongs in the
+ − 2589 @ref{Top, , , xemacs, XEmacs User's Manual}. Report those as bugs,
+ − 2590 too, preferably with patches.
+ − 2591
+ − 2592 @menu
+ − 2593 * Overview:: Unification history and general information.
+ − 2594 * Usage:: An overview of the operation of Unification.
+ − 2595 * Configuration:: Configuring Unification for use.
+ − 2596 * Theory of Operation:: How Unification works.
+ − 2597 * What Unification Cannot Do for You:: Inherent problems of 8-bit charsets.
+ − 2598 * Charsets and Coding Systems:: Reference lists with annotations.
* Unification Internals:: Utilities and implementation details.
+ − 2600 @end menu
+ − 2601
+ − 2602 @node Overview, Usage, Charset Unification, Charset Unification
+ − 2603 @subsection An Overview of Unification
+ − 2604
+ − 2605 Mule suffers from a design defect that causes it to consider the ISO
+ − 2606 Latin character sets to be disjoint. This manifests itself when a user
+ − 2607 enters characters using input methods associated with different coded
+ − 2608 character sets into a single buffer.
+ − 2609
+ − 2610 A very important example involves email. Many sites, especially in the
+ − 2611 U.S., default to use of the ISO 8859/1 coded character set (also called
+ − 2612 ``Latin 1,'' though these are somewhat different concepts). However,
+ − 2613 ISO 8859/1 provides a generic CURRENCY SIGN character. Now that the
+ − 2614 Euro has become the official currency of most countries in Europe, this
+ − 2615 is unsatisfactory (and in practice, useless). So Europeans generally
+ − 2616 use ISO 8859/15, which is nearly identical to ISO 8859/1 for most
+ − 2617 languages, except that it substitutes EURO SIGN for CURRENCY SIGN.
+ − 2618
+ − 2619 Suppose a European user yanks text from a post encoded in ISO 8859/1
+ − 2620 into a message composition buffer, and enters some text including the
+ − 2621 Euro sign. Then Mule will consider the buffer to contain both ISO
+ − 2622 8859/1 and ISO 8859/15 text, and MUAs such as Gnus will (if naively
+ − 2623 programmed) send the message as a multipart mixed MIME body!
+ − 2624
+ − 2625 This is clearly stupid. What is not as obvious is that, just as any
+ − 2626 European can include American English in their text because ASCII is a
+ − 2627 subset of ISO 8859/15, most European languages which use Latin
+ − 2628 characters (eg, German and Polish) can typically be mixed while using
+ − 2629 only one Latin coded character set (in this case, ISO 8859/2). However,
+ − 2630 this often depends on exactly what text is to be encoded.
+ − 2631
+ − 2632 Unification works around the problem by converting as many characters as
+ − 2633 possible to use a single Latin coded character set before saving the
+ − 2634 buffer.
+ − 2635
+ − 2636 @node Usage, Configuration, Overview, Charset Unification
+ − 2637 @subsection Operation of Unification
+ − 2638
+ − 2639 Normally, Unification works in the background by installing
+ − 2640 @code{unity-sanity-check} on @code{write-region-pre-hook}. This is
+ − 2641 done by default for the ISO 8859 Latin family of character sets. The
+ − 2642 user activates this functionality for other character set families by
+ − 2643 invoking @code{enable-unification}, either interactively or in her
+ − 2644 init file. @xref{Init File, , , xemacs}. Unification can be
+ − 2645 deactivated by invoking @code{disable-unification}.
+ − 2646
+ − 2647 Unification also provides a few functions for remapping or recoding the
+ − 2648 buffer by hand. To @dfn{remap} a character means to change the buffer
+ − 2649 representation of the character by using another coded character set.
+ − 2650 Remapping never changes the identity of the character, but may involve
+ − 2651 altering the code point of the character. To @dfn{recode} a character
+ − 2652 means to simply change the coded character set. Recoding never alters
+ − 2653 the code point of the character, but may change the identity of the
+ − 2654 character. @xref{Theory of Operation}.
+ − 2655
+ − 2656 There are a few variables which determine which coding systems are
+ − 2657 always acceptable to Unification: @code{unity-ucs-list},
+ − 2658 @code{unity-preferred-coding-system-list}, and
+ − 2659 @code{unity-preapproved-coding-system-list}. The latter two default
+ − 2660 to @code{()}, and should probably be avoided because they short-circuit
+ − 2661 the sanity check. If you find you need to use them, consider reporting
+ − 2662 it as a bug or request for enhancement. Because they seem unsafe, the
+ − 2663 recommended interface is likely to change.
+ − 2664
+ − 2665 @menu
+ − 2666 * Basic Functionality:: User interface and customization.
+ − 2667 * Interactive Usage:: Treating text by hand.
+ − 2668 Also documents the hook function(s).
+ − 2669 @end menu
+ − 2670
+ − 2671
+ − 2672 @node Basic Functionality, Interactive Usage, , Usage
@subsubsection Basic Functionality
+ − 2674
+ − 2675 These functions and user options initialize and configure Unification.
+ − 2676 In normal use, none of these should be needed.
+ − 2677
+ − 2678 @strong{These APIs are certain to change.}
+ − 2679
+ − 2680 @defun enable-unification
+ − 2681 Set up hooks and initialize variables for latin-unity.
+ − 2682
+ − 2683 There are no arguments.
+ − 2684
+ − 2685 This function is idempotent. It will reinitialize any hooks or variables
+ − 2686 that are not in initial state.
+ − 2687 @end defun
+ − 2688
+ − 2689 @defun disable-unification
+ − 2690 There are no arguments.
+ − 2691
+ − 2692 Clean up hooks and void variables used by latin-unity.
+ − 2693 @end defun
+ − 2694
+ − 2695 @defopt unity-ucs-list
+ − 2696 List of coding systems considered to be universal.
+ − 2697
+ − 2698 The default value is @code{'(utf-8 iso-2022-7 ctext escape-quoted)}.
+ − 2699
+ − 2700 Order matters; coding systems earlier in the list will be preferred when
+ − 2701 recommending a coding system. These coding systems will not be used
+ − 2702 without querying the user (unless they are also present in
+ − 2703 @code{unity-preapproved-coding-system-list}), and follow the
+ − 2704 @code{unity-preferred-coding-system-list} in the list of suggested
+ − 2705 coding systems.
+ − 2706
+ − 2707 If none of the preferred coding systems are feasible, the first in
+ − 2708 this list will be the default.
+ − 2709
+ − 2710 Notes on certain coding systems: @code{escape-quoted} is a special
+ − 2711 coding system used for autosaves and compiled Lisp in Mule. You should
+ − 2712 @c #### fix in latin-unity.texi
+ − 2713 never delete this, although it is rare that a user would want to use it
directly. Unification does not try to be ``smart'' about other general
+ − 2715 ISO 2022 coding systems, such as ISO-2022-JP. (They are not recognized
+ − 2716 as equivalent to @code{iso-2022-7}.) If your preferred coding system is
+ − 2717 one of these, you may consider adding it to @code{unity-ucs-list}.
+ − 2718 However, this will typically have the side effect that (eg) ISO 8859/1
+ − 2719 files will be saved in 7-bit form with ISO 2022 escape sequences.
+ − 2720 @end defopt
+ − 2721
+ − 2722 Coding systems which are not Latin and not in
+ − 2723 @code{unity-ucs-list} are handled by short circuiting checks of
+ − 2724 coding system against the next two variables.
+ − 2725
+ − 2726 @defopt unity-preapproved-coding-system-list
+ − 2727 List of coding systems used without querying the user if feasible.
+ − 2728
+ − 2729 The default value is @samp{(buffer-default preferred)}.
+ − 2730
+ − 2731 The first feasible coding system in this list is used. The special values
+ − 2732 @samp{preferred} and @samp{buffer-default} may be present:
+ − 2733
+ − 2734 @table @code
+ − 2735 @item buffer-default
+ − 2736 Use the coding system used by @samp{write-region}, if feasible.
+ − 2737
+ − 2738 @item preferred
+ − 2739 Use the coding system specified by @samp{prefer-coding-system} if feasible.
+ − 2740 @end table
+ − 2741
+ − 2742 "Feasible" means that all characters in the buffer can be represented by
+ − 2743 the coding system. Coding systems in @samp{unity-ucs-list} are
+ − 2744 always considered feasible. Other feasible coding systems are computed
+ − 2745 by @samp{unity-representations-feasible-region}.
+ − 2746
+ − 2747 Note that the first universal coding system in this list shadows all
+ − 2748 other coding systems. In particular, if your preferred coding system is
+ − 2749 a universal coding system, and @code{preferred} is a member of this
+ − 2750 list, unification will blithely convert all your files to that coding
+ − 2751 system. This is considered a feature, but it may surprise most users.
+ − 2752 Users who don't like this behavior should put @code{preferred} in
+ − 2753 @code{unity-preferred-coding-system-list}.
+ − 2754 @end defopt
+ − 2755
+ − 2756 @defopt unity-preferred-coding-system-list
+ − 2757 @c #### fix in latin-unity.texi
+ − 2758 List of coding systems suggested to the user if feasible.
+ − 2759
+ − 2760 The default value is @samp{(iso-8859-1 iso-8859-15 iso-8859-2 iso-8859-3
+ − 2761 iso-8859-4 iso-8859-9)}.
+ − 2762
+ − 2763 If none of the coding systems in
+ − 2764 @c #### fix in latin-unity.texi
+ − 2765 @code{unity-preapproved-coding-system-list} are feasible, this list
+ − 2766 will be recommended to the user, followed by the
@code{unity-ucs-list}. The first coding system in this list is the default. The
+ − 2768 special values @samp{preferred} and @samp{buffer-default} may be
+ − 2769 present:
+ − 2770
+ − 2771 @table @code
+ − 2772 @item buffer-default
+ − 2773 Use the coding system used by @samp{write-region}, if feasible.
+ − 2774
+ − 2775 @item preferred
+ − 2776 Use the coding system specified by @samp{prefer-coding-system} if feasible.
+ − 2777 @end table
+ − 2778
+ − 2779 "Feasible" means that all characters in the buffer can be represented by
+ − 2780 the coding system. Coding systems in @samp{unity-ucs-list} are
+ − 2781 always considered feasible. Other feasible coding systems are computed
+ − 2782 by @samp{unity-representations-feasible-region}.
+ − 2783 @end defopt
+ − 2784
+ − 2785
+ − 2786 @defvar unity-iso-8859-1-aliases
+ − 2787 List of coding systems to be treated as aliases of ISO 8859/1.
+ − 2788
+ − 2789 The default value is '(iso-8859-1).
+ − 2790
This is not a user variable; to customize input of coding systems or
charsets, use @samp{unity-coding-system-alias-alist} or
@samp{unity-charset-alias-alist}.
+ − 2794 @end defvar
+ − 2795
+ − 2796
+ − 2797 @node Interactive Usage, , Basic Functionality, Usage
@subsubsection Interactive Usage
+ − 2799
+ − 2800 First, the hook function @code{unity-sanity-check} is documented.
+ − 2801 (It is placed here because it is not an interactive function, and there
+ − 2802 is not yet a programmer's section of the manual.)
+ − 2803
+ − 2804 These functions provide access to internal functionality (such as the
+ − 2805 remapping function) and to extra functionality (the recoding functions
+ − 2806 and the test function).
+ − 2807
+ − 2808
+ − 2809 @defun unity-sanity-check begin end filename append visit lockname &optional coding-system
+ − 2810
+ − 2811 Check if @var{coding-system} can represent all characters between
+ − 2812 @var{begin} and @var{end}.
+ − 2813
+ − 2814 For compatibility with old broken versions of @code{write-region},
+ − 2815 @var{coding-system} defaults to @code{buffer-file-coding-system}.
+ − 2816 @var{filename}, @var{append}, @var{visit}, and @var{lockname} are
+ − 2817 ignored.
+ − 2818
+ − 2819 Return nil if buffer-file-coding-system is not (ISO-2022-compatible)
+ − 2820 Latin. If @code{buffer-file-coding-system} is safe for the charsets
+ − 2821 actually present in the buffer, return it. Otherwise, ask the user to
+ − 2822 choose a coding system, and return that.
+ − 2823
+ − 2824 This function does @emph{not} do the safe thing when
+ − 2825 @code{buffer-file-coding-system} is nil (aka no-conversion). It
+ − 2826 considers that ``non-Latin,'' and passes it on to the Mule detection
+ − 2827 mechanism.
+ − 2828
+ − 2829 This function is intended for use as a @code{write-region-pre-hook}. It
+ − 2830 does nothing except return @var{coding-system} if @code{write-region}
+ − 2831 handlers are inhibited.
+ − 2832 @end defun
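
By hand, installing it would look something like the following sketch;
in normal use @code{enable-unification} arranges this (and more) for you:

@example
;; Illustrative only: have the sanity check run before every write.
(add-hook 'write-region-pre-hook 'unity-sanity-check)
@end example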
+ − 2833
+ − 2834 @defun unity-buffer-representations-feasible
+ − 2835
+ − 2836 There are no arguments.
+ − 2837
Apply @code{unity-region-representations-feasible} to the current buffer.
+ − 2839 @end defun
+ − 2840
+ − 2841 @defun unity-region-representations-feasible begin end &optional buf
+ − 2842
+ − 2843 Return character sets that can represent the text from @var{begin} to @var{end} in @var{buf}.
+ − 2844
@var{buf} defaults to the current buffer. When called interactively, the
function is applied to the region. It assumes @var{begin} <= @var{end}.
+ − 2847
+ − 2848 The return value is a cons. The car is the list of character sets
+ − 2849 that can individually represent all of the non-ASCII portion of the
+ − 2850 buffer, and the cdr is the list of character sets that can
+ − 2851 individually represent all of the ASCII portion.
+ − 2852
+ − 2853 The following is taken from a comment in the source. Please refer to
+ − 2854 the source to be sure of an accurate description.
+ − 2855
+ − 2856 The basic algorithm is to map over the region, compute the set of
+ − 2857 charsets that can represent each character (the ``feasible charset''),
+ − 2858 and take the intersection of those sets.
+ − 2859
+ − 2860 The current implementation takes advantage of the fact that ASCII
+ − 2861 characters are common and cannot change asciisets. Then using
+ − 2862 skip-chars-forward makes motion over ASCII subregions very fast.
+ − 2863
+ − 2864 This same strategy could be applied generally by precomputing classes
+ − 2865 of characters equivalent according to their effect on latinsets, and
+ − 2866 adding a whole class to the skip-chars-forward string once a member is
+ − 2867 found.
+ − 2868
+ − 2869 Probably efficiency is a function of the number of characters matched,
+ − 2870 or maybe the length of the match string? With @code{skip-category-forward}
+ − 2871 over a precomputed category table it should be really fast. In practice
+ − 2872 for Latin character sets there are only 29 classes.
+ − 2873 @end defun
+ − 2874
+ − 2875 @defun unity-remap-region begin end character-set &optional coding-system
+ − 2876
+ − 2877 Remap characters between @var{begin} and @var{end} to equivalents in
+ − 2878 @var{character-set}. Optional argument @var{coding-system} may be a
+ − 2879 coding system name (a symbol) or nil. Characters with no equivalent are
+ − 2880 left as-is.
+ − 2881
+ − 2882 When called interactively, @var{begin} and @var{end} are set to the
+ − 2883 beginning and end, respectively, of the active region, and the function
+ − 2884 prompts for @var{character-set}. The function does completion, knows
+ − 2885 how to guess a character set name from a coding system name, and also
+ − 2886 provides some common aliases. See @code{unity-guess-charset}.
+ − 2887 There is no way to specify @var{coding-system}, as it has no useful
+ − 2888 function interactively.
+ − 2889
+ − 2890 Return @var{coding-system} if @var{coding-system} can encode all
+ − 2891 characters in the region, t if @var{coding-system} is nil and the coding
+ − 2892 system with G0 = 'ascii and G1 = @var{character-set} can encode all
+ − 2893 characters, and otherwise nil. Note that a non-null return does
+ − 2894 @emph{not} mean it is safe to write the file, only the specified region.
+ − 2895 (This behavior is useful for multipart MIME encoding and the like.)
+ − 2896
+ − 2897 Note: by default this function is quite fascist about universal coding
+ − 2898 systems. It only admits @samp{utf-8}, @samp{iso-2022-7}, and
@samp{ctext}. Customize @code{unity-ucs-list} to change
+ − 2900 this.
+ − 2901
+ − 2902 This function remaps characters that are artificially distinguished by Mule
+ − 2903 internal code. It may change the code point as well as the character set.
+ − 2904 To recode characters that were decoded in the wrong coding system, use
+ − 2905 @code{unity-recode-region}.
+ − 2906 @end defun
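
For example, a non-interactive caller might do the following
(hypothetical usage based on the signature above):

@example
;; Try to express every character in the buffer in ISO 8859/2,
;; leaving characters with no Latin-2 equivalent alone.
(unity-remap-region (point-min) (point-max) 'latin-iso8859-2)
@end example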
+ − 2907
+ − 2908 @defun unity-recode-region begin end wrong-cs right-cs
+ − 2909
+ − 2910 Recode characters between @var{begin} and @var{end} from @var{wrong-cs}
+ − 2911 to @var{right-cs}.
+ − 2912
+ − 2913 @var{wrong-cs} and @var{right-cs} are character sets. Characters retain
+ − 2914 the same code point but the character set is changed. Only characters
+ − 2915 from @var{wrong-cs} are changed to @var{right-cs}. The identity of the
+ − 2916 character may change. Note that this could be dangerous, if characters
+ − 2917 whose identities you do not want changed are included in the region.
+ − 2918 This function cannot guess which characters you want changed, and which
+ − 2919 should be left alone.
+ − 2920
+ − 2921 When called interactively, @var{begin} and @var{end} are set to the
+ − 2922 beginning and end, respectively, of the active region, and the function
+ − 2923 prompts for @var{wrong-cs} and @var{right-cs}. The function does
+ − 2924 completion, knows how to guess a character set name from a coding system
+ − 2925 name, and also provides some common aliases. See
+ − 2926 @code{unity-guess-charset}.
+ − 2927
+ − 2928 Another way to accomplish this, but using coding systems rather than
+ − 2929 character sets to specify the desired recoding, is
+ − 2930 @samp{unity-recode-coding-region}. That function may be faster
+ − 2931 but is somewhat more dangerous, because it may recode more than one
+ − 2932 character set.
+ − 2933
+ − 2934 To change from one Mule representation to another without changing identity
+ − 2935 of any characters, use @samp{unity-remap-region}.
+ − 2936 @end defun
+ − 2937
+ − 2938 @defun unity-recode-coding-region begin end wrong-cs right-cs
+ − 2939
+ − 2940 Recode text between @var{begin} and @var{end} from @var{wrong-cs} to
+ − 2941 @var{right-cs}.
+ − 2942
+ − 2943 @var{wrong-cs} and @var{right-cs} are coding systems. Characters retain
+ − 2944 the same code point but the character set is changed. The identity of
+ − 2945 characters may change. This is an inherently dangerous function;
multilingual text may be recoded in unexpected ways. It's also
+ − 2947 dangerous because the coding systems are not sanity-checked in the
+ − 2948 current implementation.
+ − 2949
+ − 2950 When called interactively, @var{begin} and @var{end} are set to the
+ − 2951 beginning and end, respectively, of the active region, and the function
+ − 2952 prompts for @var{wrong-cs} and @var{right-cs}. The function does
+ − 2953 completion, knows how to guess a coding system name from a character set
+ − 2954 name, and also provides some common aliases. See
+ − 2955 @code{unity-guess-coding-system}.
+ − 2956
+ − 2957 Another, safer, way to accomplish this, using character sets rather
+ − 2958 than coding systems to specify the desired recoding, is to use
+ − 2959 @c #### fixme in latin-unity.texi
+ − 2960 @code{unity-recode-region}.
+ − 2961
+ − 2962 To change from one Mule representation to another without changing identity
+ − 2963 of any characters, use @code{unity-remap-region}.
+ − 2964 @end defun
+ − 2965
+ − 2966 Helper functions for input of coding system and character set names.
+ − 2967
+ − 2968 @defun unity-guess-charset candidate
+ − 2969 Guess a charset based on the symbol @var{candidate}.
+ − 2970
+ − 2971 @var{candidate} itself is not tried as the value.
+ − 2972
+ − 2973 Uses the natural mapping in @samp{unity-cset-codesys-alist}, and
the values in @samp{unity-charset-alias-alist}.
+ − 2975 @end defun
+ − 2976
+ − 2977 @defun unity-guess-coding-system candidate
+ − 2978 Guess a coding system based on the symbol @var{candidate}.
+ − 2979
+ − 2980 @var{candidate} itself is not tried as the value.
+ − 2981
+ − 2982 Uses the natural mapping in @samp{unity-cset-codesys-alist}, and
the values in @samp{unity-coding-system-alias-alist}.
+ − 2984 @end defun
+ − 2985
+ − 2986 @defun unity-example
+ − 2987
+ − 2988 A cheesy example for Unification.
+ − 2989
+ − 2990 At present it just makes a multilingual buffer. To test, setq
+ − 2991 buffer-file-coding-system to some value, make the buffer dirty (eg
+ − 2992 with RET BackSpace), and save.
+ − 2993 @end defun
+ − 2994
+ − 2995
+ − 2996 @node Configuration, Theory of Operation, Usage, Charset Unification
+ − 2997 @subsection Configuring Unification for Use
+ − 2998
+ − 2999 If you want Unification to be automatically initialized, invoke
+ − 3000 @samp{enable-unification} with no arguments in your init file.
+ − 3001 @xref{Init File, , , xemacs}. If you are using GNU Emacs or an XEmacs
+ − 3002 earlier than 21.1, you should also load @file{auto-autoloads} using the
+ − 3003 full path (@emph{never} @samp{require} @file{auto-autoloads} libraries).
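
In other words, the init file fragment is simply:

@example
(enable-unification)
@end example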
+ − 3004
+ − 3005 You may wish to define aliases for commonly used character sets and
+ − 3006 coding systems for convenience in input.
+ − 3007
+ − 3008 @defopt unity-charset-alias-alist
Alist mapping aliases to Mule charset names (symbols).
+ − 3010
+ − 3011 The default value is
+ − 3012 @example
+ − 3013 ((latin-1 . latin-iso8859-1)
+ − 3014 (latin-2 . latin-iso8859-2)
+ − 3015 (latin-3 . latin-iso8859-3)
+ − 3016 (latin-4 . latin-iso8859-4)
+ − 3017 (latin-5 . latin-iso8859-9)
+ − 3018 (latin-9 . latin-iso8859-15)
+ − 3019 (latin-10 . latin-iso8859-16))
+ − 3020 @end example
+ − 3021
+ − 3022 If a charset does not exist on your system, it will not complete and you
+ − 3023 will not be able to enter it in response to prompts. A real charset
+ − 3024 with the same name as an alias in this list will shadow the alias.
+ − 3025 @end defopt
+ − 3026
@defopt unity-coding-system-alias-alist
+ − 3028 Alist mapping aliases to Mule coding system names (symbols).
+ − 3029
+ − 3030 The default value is @samp{nil}.
+ − 3031 @end defopt
+ − 3032
+ − 3033
+ − 3034 @node Theory of Operation, What Unification Cannot Do for You, Configuration, Charset Unification
+ − 3035 @subsection Theory of Operation
+ − 3036
+ − 3037 Standard encodings suffer from the design defect that they do not
provide a reliable way to recognize which coded character sets are in use.
+ − 3039 @xref{What Unification Cannot Do for You}. There are scores of
+ − 3040 character sets which can be represented by a single octet (8-bit byte),
+ − 3041 whose union contains many hundreds of characters. Obviously this
+ − 3042 results in great confusion, since you can't tell the players without a
+ − 3043 scorecard, and there is no scorecard.
+ − 3044
+ − 3045 There are two ways to solve this problem. The first is to create a
+ − 3046 universal coded character set. This is the concept behind Unicode.
However, although there have been satisfactory (nearly) universal
character sets for several decades, even today many Westerners resist
using Unicode because they consider its space requirements excessive. On the other
+ − 3050 hand, Asians dislike Unicode because they consider it to be incomplete.
+ − 3051 (This is partly, but not entirely, political.)
+ − 3052
+ − 3053 In any case, Unicode only solves the internal representation problem.
+ − 3054 Many data sets will contain files in ``legacy'' encodings, and Unicode
+ − 3055 does not help distinguish among them.
+ − 3056
+ − 3057 The second approach is to embed information about the encodings used in
+ − 3058 a document in its text. This approach is taken by the ISO 2022
standard. This would solve the problem completely from the users' point of
+ − 3060 view, except that ISO 2022 is basically not implemented at all, in the
+ − 3061 sense that few applications or systems implement more than a small
+ − 3062 subset of ISO 2022 functionality. This is due to the fact that
+ − 3063 mono-literate users object to the presence of escape sequences in their
+ − 3064 texts (which they, with some justification, consider data corruption).
+ − 3065 Programmers are more than willing to cater to these users, since
+ − 3066 implementing ISO 2022 is a painstaking task.
+ − 3067
+ − 3068 In fact, Emacs/Mule adopts both of these approaches. Internally it uses
+ − 3069 a universal character set, @dfn{Mule code}. Externally it uses ISO 2022
+ − 3070 techniques both to save files in forms robust to encoding issues, and as
+ − 3071 hints when attempting to ``guess'' an unknown encoding. However, Mule
suffers from a design defect: the character set information that ISO
2022 attaches to a whole run of characters with a single introducing
control sequence is instead embedded in each individual character. That causes Mule to
+ − 3075 consider the ISO Latin character sets to be disjoint. This manifests
+ − 3076 itself when a user enters characters using input methods associated with
+ − 3077 different coded character sets into a single buffer.
+ − 3078
+ − 3079 There are two problems stemming from this design. First, Mule
represents the same character in different ways. Abstractly, 'ó'
(LATIN SMALL LETTER O WITH ACUTE) can get represented as
[latin-iso8859-1 #x73] or as [latin-iso8859-2 #x73]. So what looks like
'óó' in the display might actually be represented [latin-iso8859-1
#x73][latin-iso8859-2 #x73] in the buffer, and saved as [#xF3 ESC - B
+ − 3085 #xF3 ESC - A] in the file. In some cases this treatment would be
+ − 3086 appropriate (consider HYPHEN, MINUS SIGN, EN DASH, EM DASH, and U+4E00
+ − 3087 (the CJK ideographic character meaning ``one'')), and although arguably
+ − 3088 incorrect it is convenient when mixing the CJK scripts. But in the case
+ − 3089 of the Latin scripts this is wrong.
+ − 3090
+ − 3091 Worse yet, it is very likely to occur when mixing ``different'' encodings
+ − 3092 (such as ISO 8859/1 and ISO 8859/15) that differ only in a few code
+ − 3093 points that are almost never used. A very important example involves
+ − 3094 email. Many sites, especially in the U.S., default to use of the ISO
+ − 3095 8859/1 coded character set (also called ``Latin 1,'' though these are
+ − 3096 somewhat different concepts). However, ISO 8859/1 provides a generic
+ − 3097 CURRENCY SIGN character. Now that the Euro has become the official
+ − 3098 currency of most countries in Europe, this is unsatisfactory (and in
+ − 3099 practice, useless). So Europeans generally use ISO 8859/15, which is
+ − 3100 nearly identical to ISO 8859/1 for most languages, except that it
+ − 3101 substitutes EURO SIGN for CURRENCY SIGN.
+ − 3102
+ − 3103 Suppose a European user yanks text from a post encoded in ISO 8859/1
+ − 3104 into a message composition buffer, and enters some text including the
+ − 3105 Euro sign. Then Mule will consider the buffer to contain both ISO
+ − 3106 8859/1 and ISO 8859/15 text, and MUAs such as Gnus will (if naively
+ − 3107 programmed) send the message as a multipart mixed MIME body!
+ − 3108
+ − 3109 This is clearly stupid. What is not as obvious is that, just as any
+ − 3110 European can include American English in their text because ASCII is a
+ − 3111 subset of ISO 8859/15, most European languages which use Latin
+ − 3112 characters (eg, German and Polish) can typically be mixed while using
+ − 3113 only one Latin coded character set (in the case of German and Polish,
+ − 3114 ISO 8859/2). However, this often depends on exactly what text is to be
+ − 3115 encoded (even for the same pair of languages).
+ − 3116
+ − 3117 Unification works around the problem by converting as many characters as
+ − 3118 possible to use a single Latin coded character set before saving the
+ − 3119 buffer.
+ − 3120
Because the problem is rarely noticeable in editing a buffer, but tends
+ − 3122 to manifest when that buffer is exported to a file or process, the
+ − 3123 Unification package uses the strategy of examining the buffer prior to
+ − 3124 export. If use of multiple Latin coded character sets is detected,
+ − 3125 Unification attempts to unify them by finding a single coded character
+ − 3126 set which contains all of the Latin characters in the buffer.
+ − 3127
+ − 3128 The primary purpose of Unification is to fix the problem by giving the
+ − 3129 user the choice to change the representation of all characters to one
+ − 3130 character set and give sensible recommendations based on context. In
the 'ó' example, either ISO 8859/1 or ISO 8859/2 is satisfactory, and
+ − 3132 both will be suggested. In the EURO SIGN example, only ISO 8859/15
+ − 3133 makes sense, and that is what will be recommended. In both cases, the
+ − 3134 user will be reminded that there are universal encodings available.
+ − 3135
+ − 3136 I call this @dfn{remapping} (from the universal character set to a
+ − 3137 particular ISO 8859 coded character set). It is mere accident that this
+ − 3138 letter has the same code point in both character sets. (Not entirely,
+ − 3139 but there are many examples of Latin characters that have different code
+ − 3140 points in different Latin-X sets.)
+ − 3141
Note that, in the 'ó' example, treating the buffer in this way will
+ − 3143 result in a representation such as [latin-iso8859-2
+ − 3144 #x73][latin-iso8859-2 #x73], and the file will be saved as [#xF3 #xF3].
+ − 3145 This is guaranteed to occasionally result in the second problem you
+ − 3146 observed, to which we now turn.
+ − 3147
+ − 3148 This problem is that, although the file is intended to be an
+ − 3149 ISO-8859/2-encoded file, in an ISO 8859/1 locale Mule (and every POSIX
+ − 3150 compliant program---this is required by the standard, obvious if you
+ − 3151 think a bit, @pxref{What Unification Cannot Do for You}) will read that
+ − 3152 file as [latin-iso8859-1 #x73] [latin-iso8859-1 #x73]. Of course this
+ − 3153 is no problem if all of the characters in the file are contained in ISO
+ − 3154 8859/1, but suppose there are some which are not, but are contained in
+ − 3155 the (intended) ISO 8859/2.
+ − 3156
+ − 3157 You now want to fix this, but not by finding the same character in
+ − 3158 another set. Instead, you want to simply change the character set that
+ − 3159 Mule associates with that buffer position without changing the code.
+ − 3160 (This is conceptually somewhat distinct from the first problem, and
+ − 3161 logically ought to be handled in the code that defines coding systems.
+ − 3162 However, unification is not an unreasonable place for it.) Unification
+ − 3163 provides two functions (one fast and dangerous, the other slow and
+ − 3164 careful) to handle this. I call this @dfn{recoding}, because the
+ − 3165 transformation actually involves @emph{encoding} the buffer to file
+ − 3166 representation, then @emph{decoding} it to buffer representation (in a
+ − 3167 different character set). This cannot be done automatically because
+ − 3168 Mule can have no idea what the correct encoding is---after all, it
+ − 3169 already gave you its best guess. @xref{What Unification Cannot Do for
+ − 3170 You}. So these functions must be invoked by the user. @xref{Interactive
+ − 3171 Usage}.
+ − 3172
+ − 3173
+ − 3174 @node What Unification Cannot Do for You, Unification Internals, Theory of Operation, Charset Unification
+ − 3175 @subsection What Unification Cannot Do for You
+ − 3176
+ − 3177 Unification @strong{cannot} save you if you insist on exporting data in
+ − 3178 8-bit encodings in a multilingual environment. @emph{You will
+ − 3179 eventually corrupt data if you do this.} It is not Mule's, or any
+ − 3180 application's, fault. You will have only yourself to blame; consider
+ − 3181 yourself warned. (It is true that Mule has bugs, which make Mule
+ − 3182 somewhat more dangerous and inconvenient than some naive applications.
+ − 3183 We're working to address those, but no application can remedy the
+ − 3184 inherent defect of 8-bit encodings.)
+ − 3185
+ − 3186 Use standard universal encodings, preferably Unicode (UTF-8) unless
+ − 3187 applicable standards indicate otherwise. The most important such case
+ − 3188 is Internet messages, where MIME should be used, whether or not the
+ − 3189 subordinate encoding is a universal encoding. (Note that since one of
+ − 3190 the important provisions of MIME is the @samp{Content-Type} header,
+ − 3191 which has the charset parameter, MIME is to be considered a universal
+ − 3192 encoding for the purposes of this manual. Of course, technically
+ − 3193 speaking it's neither a coded character set nor a coding extension
+ − 3194 technique compliant with ISO 2022.)
+ − 3195
+ − 3196 As mentioned earlier, the problem is that standard encodings suffer from
+ − 3197 the design defect that they do not provide a reliable way to recognize
+ − 3198 which coded character sets are in use. There are scores of character
+ − 3199 sets which can be represented by a single octet (8-bit byte), whose
+ − 3200 union contains many hundreds of characters. Thus any 8-bit coded
+ − 3201 character set must contain characters that share code points used for
+ − 3202 different characters in other coded character sets.
+ − 3203
+ − 3204 This means that a given file's intended encoding cannot be identified
+ − 3205 with 100% reliability unless it contains encoding markers such as those
+ − 3206 provided by MIME or ISO 2022.
+ − 3207
+ − 3208 Unification actually makes it more likely that you will have problems of
+ − 3209 this kind. Traditionally Mule has been ``helpful'' by simply using an
+ − 3210 ISO 2022 universal coding system when the current buffer coding system
+ − 3211 cannot handle all the characters in the buffer. This has the effect
+ − 3212 that, because the file contains control sequences, it is not recognized
+ − 3213 as being in the locale's normal 8-bit encoding. It may be annoying if
+ − 3214 you are not a Mule expert, but your data is automatically recoverable
+ − 3215 with a tool you already have: Mule.
+ − 3216
+ − 3217 However, with unification, Mule converts to a single 8-bit character set
+ − 3218 when possible. But typically this will @emph{not} be in your usual
locale. That is, the time an ISO 8859/1 user will need Unification is
when there are ISO 8859/2 characters in the buffer. But then most
likely the file will be saved in a pure 8-bit encoding that is not ISO
8859/1, i.e., ISO 8859/2. Mule's autorecognizer (which is probably the
+ − 3223 most sophisticated yet available) cannot tell the difference between ISO
+ − 3224 8859/1 and ISO 8859/2, and in a Western European locale will choose the
+ − 3225 former even though the latter was intended. Even the extension
+ − 3226 (``statistical recognition'') planned for XEmacs 22 is unlikely to be at
+ − 3227 all accurate in the case of mixed codes.
+ − 3228
+ − 3229 So now consider adding some additional ISO 8859/1 text to the buffer.
+ − 3230 If it includes any ISO 8859/1 codes that are used by different
+ − 3231 characters in ISO 8859/2, you now have a file that cannot be
+ − 3232 mechanically disentangled. You need a human being who can recognize
+ − 3233 that @emph{this is German and Swedish} and stays in Latin-1, while
+ − 3234 @emph{that is Polish} and needs to be recoded to Latin-2.
+ − 3235
+ − 3236 Moral: switch to a universal coded character set, preferably Unicode
+ − 3237 using the UTF-8 transformation format. If you really need the space,
+ − 3238 compress your files.
+ − 3239
+ − 3240
+ − 3241 @node Unification Internals, , What Unification Cannot Do for You, Charset Unification
+ − 3242 @subsection Internals
+ − 3243
+ − 3244 No internals documentation yet.
+ − 3245
+ − 3246 @file{unity-utils.el} provides one utility function.
+ − 3247
+ − 3248 @defun unity-dump-tables
+ − 3249
+ − 3250 Dump the temporary table created by loading @file{unity-utils.el}
+ − 3251 to @file{unity-tables.el}. Loading the latter file initializes
+ − 3252 @samp{unity-equivalences}.
+ − 3253 @end defun
+ − 3254
+ − 3255
+ − 3256 @node Charsets and Coding Systems, , Charset Unification, MULE
+ − 3257 @subsection Charsets and Coding Systems
+ − 3258
+ − 3259 This section provides reference lists of Mule charsets and coding
+ − 3260 systems. Mule charsets are typically named by character set and
+ − 3261 standard.
+ − 3262
+ − 3263 @table @strong
+ − 3264 @item ASCII variants
+ − 3265
+ − 3266 Identification of equivalent characters in these sets is not properly
+ − 3267 implemented. Unification does not distinguish the two charsets.
+ − 3268
+ − 3269 @samp{ascii} @samp{latin-jisx0201}
+ − 3270
+ − 3271 @item Extended Latin
+ − 3272
+ − 3273 Characters from the following ISO 2022 conformant charsets are
+ − 3274 identified with equivalents in other charsets in the group by
+ − 3275 Unification.
+ − 3276
+ − 3277 @samp{latin-iso8859-1} @samp{latin-iso8859-15} @samp{latin-iso8859-2}
+ − 3278 @samp{latin-iso8859-3} @samp{latin-iso8859-4} @samp{latin-iso8859-9}
+ − 3279 @samp{latin-iso8859-13} @samp{latin-iso8859-16}
+ − 3280
The following charsets are Latin variants which are not understood by
+ − 3282 Unification. In addition, many of the Asian language standards provide
+ − 3283 ASCII, at least, and sometimes other Latin characters. None of these
+ − 3284 are identified with their ISO 8859 equivalents.
+ − 3285
+ − 3286 @samp{vietnamese-viscii-lower}
+ − 3287 @samp{vietnamese-viscii-upper}
+ − 3288
@item Other character sets

@samp{arabic-1-column}
@samp{arabic-2-column}
@samp{arabic-digit}
@samp{arabic-iso8859-6}
@samp{chinese-big5-1}
@samp{chinese-big5-2}
@samp{chinese-cns11643-1}
@samp{chinese-cns11643-2}
@samp{chinese-cns11643-3}
@samp{chinese-cns11643-4}
@samp{chinese-cns11643-5}
@samp{chinese-cns11643-6}
@samp{chinese-cns11643-7}
@samp{chinese-gb2312}
@samp{chinese-isoir165}
@samp{cyrillic-iso8859-5}
@samp{ethiopic}
@samp{greek-iso8859-7}
@samp{hebrew-iso8859-8}
@samp{ipa}
@samp{japanese-jisx0208}
@samp{japanese-jisx0208-1978}
@samp{japanese-jisx0212}
@samp{katakana-jisx0201}
@samp{korean-ksc5601}
@samp{sisheng}
@samp{thai-tis620}
@samp{thai-xtis}

@item Non-graphic charsets

@samp{control-1}
@end table

@table @strong
@item No conversion

Some of these coding systems may specify EOL conventions. Note that
@samp{iso-8859-1} is a no-conversion coding system, not an ISO 2022
coding system. Although unification attempts to compensate for this, it
is possible that the @samp{iso-8859-1} coding system will behave
differently from other ISO 8859 coding systems.

@samp{binary} @samp{no-conversion} @samp{raw-text} @samp{iso-8859-1}

@item Latin coding systems

These coding systems are all single-byte, 8-bit ISO 2022 coding systems,
combining ASCII in the GL register (bytes with high-bit clear) and an
extended Latin character set in the GR register (bytes with high-bit set).

@samp{iso-8859-15} @samp{iso-8859-2} @samp{iso-8859-3} @samp{iso-8859-4}
@samp{iso-8859-9} @samp{iso-8859-13} @samp{iso-8859-14} @samp{iso-8859-16}
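
As an illustration of that GL/GR structure, a coding system of this
kind could in principle be defined with @code{make-coding-system}; the
name @code{my-latin-2} below is made up for the example:

@example
;; Sketch only: an ISO 2022 coding system with ASCII in G0 (GL)
;; and Latin-2 in G1 (GR), similar in spirit to iso-8859-2.
(make-coding-system
 'my-latin-2 'iso2022
 "Latin-2 sketch (ASCII in GL, ISO 8859/2 in GR)"
 '(charset-g0 ascii
   charset-g1 latin-iso8859-2
   mnemonic "MyLtn2"))
@end example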

The following coding systems are single-byte, 8-bit coding systems that
do not conform to international standards. They should be avoided in
all potentially multilingual contexts, including any text distributed
over the Internet and World Wide Web.

@samp{windows-1251}

@item Multilingual coding systems

The following ISO-2022-based coding systems are useful for multilingual
text.

@samp{ctext} @samp{iso-2022-lock} @samp{iso-2022-7} @samp{iso-2022-7bit}
@samp{iso-2022-7bit-ss2} @samp{iso-2022-8} @samp{iso-2022-8bit-ss2}

XEmacs also supports Unicode with the Mule-UCS package. These are the
preferred coding systems for multilingual use. (There is a possible
exception for texts that mix several Asian ideographic character sets.)

@samp{utf-16-be} @samp{utf-16-be-no-signature} @samp{utf-16-le}
@samp{utf-16-le-no-signature} @samp{utf-7} @samp{utf-7-safe}
@samp{utf-8} @samp{utf-8-ws}

Development versions of XEmacs (the 21.5 series) support Unicode
internally, with (at least) the following coding systems implemented:

@samp{utf-16-be} @samp{utf-16-be-bom} @samp{utf-16-le}
@samp{utf-16-le-bom} @samp{utf-8} @samp{utf-8-bom}

@item Asian ideographic languages

The following coding systems are based on ISO 2022, and are more or less
suitable for encoding multilingual texts. They can all represent at
least ASCII, and sometimes several other foreign character sets, without
resorting to arbitrary ISO 2022 designations. However, these subsets
are not identified with the corresponding national standards in XEmacs
Mule.

@samp{chinese-euc} @samp{cn-big5} @samp{cn-gb-2312} @samp{gb2312}
@samp{hz} @samp{hz-gb-2312} @samp{old-jis} @samp{japanese-euc}
@samp{junet} @samp{euc-japan} @samp{euc-jp} @samp{iso-2022-jp}
@samp{iso-2022-jp-1978-irv} @samp{iso-2022-jp-2} @samp{euc-kr}
@samp{korean-euc} @samp{iso-2022-kr} @samp{iso-2022-int-1}

The following coding systems cannot be used for general multilingual
text and do not cooperate well with other coding systems.

@samp{big5} @samp{shift_jis}

@item Other languages

The following coding systems are based on ISO 2022. Though none of them
provides any Latin characters beyond ASCII, XEmacs Mule allows (and up
to 21.4 defaults to) use of ISO 2022 control sequences to designate
other character sets for inclusion in the text.

@samp{iso-8859-5} @samp{iso-8859-7} @samp{iso-8859-8}
@samp{ctext-hebrew}

The following coding systems are based on character sets that do not
conform to ISO 2022, and thus cannot be safely used in a multilingual
context.

@samp{alternativnyj} @samp{koi8-r} @samp{tis-620} @samp{viqr}
@samp{viscii} @samp{vscii}

@item Special coding systems

Mule uses the following coding systems for special purposes.

@samp{automatic-conversion} @samp{undecided} @samp{escape-quoted}

@samp{escape-quoted} is especially important, as it is used internally
as the coding system for autosaved data.

The following coding systems are aliases for others, and are used for
communication with the host operating system.

@samp{file-name} @samp{keyboard} @samp{terminal}

@end table
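
Coding systems and their aliases can be examined from Lisp; a brief
sketch (whether the alias name or the underlying coding system's name
is reported may depend on your XEmacs version and locale):

@example
;; Look up a coding system by name and report the name of the
;; object it resolves to.
(coding-system-name (find-coding-system 'file-name))

;; Names of all currently defined coding systems.
(coding-system-list)
@end example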

Mule detection of coding systems is actually limited to detection of
classes of coding systems called @dfn{coding categories}. These coding
categories are identified by the ISO 2022 control sequences they use, if
any, by their conformance to ISO 2022 restrictions on code points that
may be used, and by characteristic patterns of use of 8-bit code points.

@samp{no-conversion}
@samp{utf-8}
@samp{ucs-4}
@samp{iso-7}
@samp{iso-lock-shift}
@samp{iso-8-1}
@samp{iso-8-2}
@samp{iso-8-designate}
@samp{shift-jis}
@samp{big5}
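
Detection itself can be invoked from Lisp with
@code{detect-coding-region}; a minimal sketch (the precise form of the
return value varies between XEmacs versions):

@example
;; Ask Mule which coding systems could plausibly decode the
;; current buffer's contents, most preferred first.
(detect-coding-region (point-min) (point-max))
@end example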


@c end of mule.texi