xemacs-beta: man/lispref/mule.texi annotate

annotate man/lispref/mule.texi @ 5840:93a18dbcfd8c

Don't leave fields uninitialized.

author	Marcus Crestani <marcus@crestani.de>
date	Sat, 13 Dec 2014 14:20:17 +0100
parents	9fae6227ede5
children

rev	line source
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1 @c --texinfo--
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2 @c This is part of the XEmacs Lisp Reference Manual.
775 7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	3 @c Copyright (C) 1996 Ben Wing, 2001-2002 Free Software Foundation.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	4 @c See the file lispref.texi for copying conditions.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	5 @setfilename ../../info/internationalization.info
5791 9fae6227ede5 Silence texinfo 5.2 warnings, primarily by adding next, prev, and up Jerry James <james@xemacs.org> parents: 5384 diff changeset	6 @node MULE, Tips, Internationalization, Top
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	7 @chapter MULE
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	8
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	9 @dfn{MULE} is the name originally given to the version of GNU Emacs
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	10 extended for multi-lingual (and in particular Asian-language) support.
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	11 ``MULE'' is short for ``MUlti-Lingual Emacs''. It is an extension and
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	12 complete rewrite of Nemacs (``Nihon Emacs'' where ``Nihon'' is the
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	13 Japanese word for ``Japan''), which only provided support for Japanese.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	14 XEmacs refers to its multi-lingual support as @dfn{MULE support} since
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	15 it is based on @dfn{MULE}.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	16
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	17 @menu
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	18 * Internationalization Terminology::
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	19 Definition of various internationalization terms.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	20 * Charsets:: Sets of related characters.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	21 * MULE Characters:: Working with characters in XEmacs/MULE.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	22 * Composite Characters:: Making new characters by overstriking other ones.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	23 * Coding Systems:: Ways of representing a string of chars using integers.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	24 * CCL:: A special language for writing fast converters.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	25 * Category Tables:: Subdividing charsets into groups.
775 7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	26 * Unicode Support:: The universal coded character set.
1183 c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	27 * Charset Unification:: Handling overlapping character sets.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	28 * Charsets and Coding Systems:: Tables and reference information.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	29 @end menu
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	30
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	31 @node Internationalization Terminology, Charsets, , MULE
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	32 @section Internationalization Terminology
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	33
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	34 In internationalization terminology, a string of text is divided up
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	35 into @dfn{characters}, which are the printable units that make up the
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	36 text. A single character is (for example) a capital @samp{A}, the
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	37 number @samp{2}, a Katakana character, a Hangul character, a Kanji
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	38 ideograph (an @dfn{ideograph} is a ``picture'' character, such as is
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	39 used in Japanese Kanji, Chinese Hanzi, and Korean Hanja; typically there
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	40 are thousands of such ideographs in each language), etc. The basic
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	41 property of a character is that it is the smallest unit of text with
1261 465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	42 semantic significance in text processing---i.e., characters are abstract
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	43 units defined by their meaning, not by their exact appearance.
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	44
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	45 Human beings normally process text visually, so to a first approximation
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	46 a character may be identified with its shape. Note that the same
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	47 character may be drawn by two different people (or in two different
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	48 fonts) in slightly different ways, although the "basic shape" will be the
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	49 same. But consider the works of Scott Kim; human beings can recognize
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	50 hugely variant shapes as the "same" character. Sometimes, especially
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	51 where characters are extremely complicated to write, completely
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	52 different shapes may be defined as the "same" character in national
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	53 standards. The Taiwanese variant of Hanzi is generally the most
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	54 complicated; over the centuries, the Japanese, Koreans, and the People's
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	55 Republic of China have adopted simplifications of the shape, but the
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	56 line of descent from the original shape is recorded, and the meanings
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	57 and pronunciation of different forms of the same character are
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	58 considered to be identical within each language. (Of course, it may
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	59 take a specialist to recognize the related form; the point is that the
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	60 relations are standardized, despite the differing shapes.)
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	61
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	62 In some cases, the differences will be significant enough that it is
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	63 actually possible to identify two or more distinct shapes that both
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	64 represent the same character. For example, the lowercase letters
440 8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	65 @samp{a} and @samp{g} each have two distinct possible shapes---the
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	66 @samp{a} can optionally have a curved tail projecting off the top, and
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	67 the @samp{g} can be formed either of two loops, or of one loop and a
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	68 tail hanging off the bottom. Such distinct possible shapes of a
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	69 character are called @dfn{glyphs}. The important characteristic of two
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	70 glyphs making up the same character is that the choice between one or
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	71 the other is purely stylistic and has no linguistic effect on a word
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	72 (this is the reason why a capital @samp{A} and lowercase @samp{a}
440 8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	73 are different characters rather than different glyphs---e.g.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	74 @samp{Aspen} is a city while @samp{aspen} is a kind of tree).
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	75
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	76 Note that @dfn{character} and @dfn{glyph} are used differently
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	77 here than elsewhere in XEmacs.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	78
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	79 A @dfn{character set} is essentially a set of related characters. ASCII,
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	80 for example, is a set of 94 characters (or 128, if you count
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	81 non-printing characters). Other character sets are ISO8859-1 (ASCII
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	82 plus various accented characters and other international symbols),
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	83 JIS X 0201 (ASCII, more or less, plus half-width Katakana), JIS X 0208
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	84 (Japanese Kanji), JIS X 0212 (a second set of less-used Japanese Kanji),
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	85 GB2312 (Mainland Chinese Hanzi), etc.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	86
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	87 The definition of a character set will implicitly or explicitly give
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	88 it an @dfn{ordering}, a way of assigning a number to each character in
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	89 the set. For many character sets, there is a natural ordering, for
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	90 example the ``ABC'' ordering of the Roman letters. But it is not clear
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	91 whether digits should come before or after the letters, and in fact
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	92 different European languages treat the ordering of accented characters
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	93 differently. It is useful to use the natural order where available, of
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	94 course. The number assigned to any particular character is called the
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	95 character's @dfn{code point}. (Within a given character set, each
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	96 character has a unique code point. Thus the word "set" is ill-chosen;
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	97 different orderings of the same characters are different character sets.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	98 Identifying characters is simple enough for alphabetic character sets,
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	99 but the difference in ordering can cause great headaches when the same
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	100 thousands of characters are used by different cultures as in the Hanzi.)
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	101
1261 465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	102 It's important to understand that a character is defined not by any
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	103 number attached to it, but by its meaning. For example, ASCII and
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	104 EBCDIC are two charsets containing exactly the same characters
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	105 (lowercase and uppercase letters, numbers 0 through 9, particular
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	106 punctuation marks) but with different numberings. The @samp{comma}
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	107 character in ASCII and EBCDIC, for instance, is the same character
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	108 despite having a different numbering. Conversely, when comparing ASCII
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	109 and JIS-Roman, which look the same except that the latter has a yen sign
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	110 substituted for the backslash, we would say that the backslash and yen
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	111 sign are @emph{not} the same characters, despite having the same number
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	112 (92) and despite the fact that all other characters are present in both
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	113 charsets, with the same numbering. ASCII and JIS-Roman, then, do
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	114 @emph{not} have exactly the same characters in them (ASCII has a
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	115 backslash character but no yen-sign character, and vice-versa for
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	116 JIS-Roman), unlike ASCII and EBCDIC, even though the numberings in ASCII
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	117 and JIS-Roman are closer.
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	118
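As a small illustration of the point above, the following Python sketch looks up the number that two charsets assign to the comma. Python itself, the helper name @code{code_point}, and the use of the @code{cp037} codec (IBM code page 037) as a stand-in for ``EBCDIC'' are assumptions made for this example; the text does not name a particular EBCDIC code page.

```python
# Same characters, different numberings: ASCII versus EBCDIC.
# Assumption: Python's "cp037" codec is used as a representative EBCDIC.

def code_point(char, charset):
    """Return the number a single-byte charset assigns to `char`."""
    encoded = char.encode(charset)
    if len(encoded) != 1:
        raise ValueError("expected a charset of dimension one")
    return encoded[0]

comma_ascii = code_point(",", "ascii")
comma_ebcdic = code_point(",", "cp037")

# Interpreting each number in its own charset yields the same character.
same_character = (
    bytes([comma_ascii]).decode("ascii") == bytes([comma_ebcdic]).decode("cp037")
)
```

The two numbers differ, yet both decode to the same abstract character, which is the sense in which ASCII and EBCDIC contain ``the same'' comma.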
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	119 Sometimes, a code point is not a single number, but instead a group of
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	120 numbers, called @dfn{position codes}. In such cases, the number of
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	121 position codes required to index a particular character in a character
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	122 set is called the @dfn{dimension} of the character set. Character sets
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	123 indexed by more than one position code typically use byte-sized position
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	124 codes. Small character sets, e.g. ASCII, invariably use a single
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	125 position code, but for larger character sets, the choice of whether to
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	126 use multiple position codes or a single large (16-bit or 32-bit) number
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	127 is arbitrary. Unicode typically uses a single large number, but
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	128 language-specific or "national" character sets often use multiple
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	129 (usually two) position codes. For example, JIS X 0208, i.e. Japanese
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	130 Kanji, has thousands of characters, and is of dimension two -- every
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	131 character is indexed by two position codes, each in the range 1 through
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	132 94. (This number ``94'' is not a coincidence; it is the same as the
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	133 number of printable characters in ASCII, and was chosen so that JIS
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	134 characters could be directly encoded using two printable ASCII
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	135 characters.) Note that the choice of the range here is somewhat
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	136 arbitrary -- it could just as easily be 0 through 93, 2 through 95, etc.
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	137 In fact, the range for JIS position codes (and for other character sets
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	138 modeled after it) is often given as range 33 through 126, so as to
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	139 directly match ASCII printing characters.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	140
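The relation between position codes and the 33 through 126 range can be sketched as follows. This is only an illustration in Python: the function name @code{positions_to_bytes} is hypothetical, and the example relies on the assumption that hiragana ``a'' (U+3042) sits at row 4, cell 2 of JIS X 0208.

```python
# Position codes in 1..94 mapped onto printable ASCII values 33..126.

def positions_to_bytes(positions, offset=32):
    """Shift each position code from 1..94 to offset+1..offset+94."""
    for p in positions:
        if not 1 <= p <= 94:
            raise ValueError(f"position code out of range: {p}")
    return bytes(p + offset for p in positions)

# A JIS X 0208 character needs two position codes, so the charset
# has dimension two; an ASCII character would need only one.
hiragana_a = (4, 2)
dimension = len(hiragana_a)
seven_bit = positions_to_bytes(hiragana_a)
```

Both resulting bytes are printable ASCII values, which is why the range is often quoted as 33 through 126 rather than 1 through 94.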
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	141 An @dfn{encoding} is a way of numerically representing characters from
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	142 one or more character sets into a stream of like-sized numerical values
1261 465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	143 called @dfn{words} -- typically 8-bit bytes, but sometimes 16-bit or
2818 9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	144 32-bit quantities. In a context where dealing with Japanese motivates
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	145 much of XEmacs' design in this area, it's important to clearly
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	146 distinguish between charsets and encodings. For a simple charset like
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	147 ASCII, there is only one encoding normally used -- each character is
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	148 represented by a single byte, with the same value as its code point.
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	149 For more complicated charsets, however, or when a single encoding needs
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	150 to represent more than one charset, things are not so obvious. Unicode
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	151 version 2, for example, is a large charset with thousands of characters,
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	152 each indexed by a 16-bit number, often represented in hex, e.g. 0x05D0
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	153 for the Hebrew letter "aleph". One obvious encoding (actually two
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	154 encodings, depending on which of the two possible byte orderings is
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	155 chosen) simply uses two bytes per character. This encoding is
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	156 convenient for internal processing of Unicode text; however, it's
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	157 incompatible with ASCII, and thus external text (files, e-mail, etc.)
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	158 that is encoded this way is completely uninterpretable by programs
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	159 lacking Unicode support. For this reason, a different, ASCII-compatible
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	160 encoding, e.g. UTF-8, is usually used for external text. UTF-8
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	161 represents Unicode characters with one to three bytes (often extended to
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	162 six bytes to handle characters with up to 31-bit indices). Unicode
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	163 characters 00 to 7F (identical with ASCII) are directly represented with
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	164 one byte, and other characters with two or more bytes, each in the range
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	165 80 to FF. Applications that don't understand Unicode will still be able
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	166 to process ASCII characters represented in UTF-8-encoded text, and will
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	167 typically ignore (and hopefully preserve) the high-bit characters.
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	168
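The encodings just described can be compared directly with Python's standard codecs. This is an illustrative sketch rather than anything XEmacs does internally; the variable names are invented for the example.

```python
# One code point, three byte-level encodings.
aleph = "\u05d0"  # Hebrew letter aleph, code point 0x05D0

big_endian = aleph.encode("utf-16-be")     # high byte first
little_endian = aleph.encode("utf-16-le")  # low byte first
utf8 = aleph.encode("utf-8")               # every byte has the high bit set

# ASCII characters keep their one-byte form in UTF-8, so a program
# that only understands ASCII can still pick them out of mixed text.
mixed = ("a" + aleph + "b").encode("utf-8")
ascii_visible = bytes(b for b in mixed if b < 0x80)
```

The two fixed-width results contain the same two bytes in opposite order, while the UTF-8 result leaves the surrounding ASCII letters untouched.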
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	169 Similarly, Shift-JIS and EUC-JP are different encodings normally used to
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	170 encode the same character set(s), these character sets being subsets of
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	171 Unicode. However, the obvious approach of unifying XEmacs' internal
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	172 encoding across character sets, as was part of the motivation behind
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	173 Unicode, wasn't taken. This means that characters in these character
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	174 sets that are identical to characters in other character sets---for
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	175 example, the Greek alphabet is in the large Japanese character sets and
9fa10603c898 [xemacs-hg @ 2005-06-19 20:49:43 by aidan] aidan parents: 2690 diff changeset	176 at least one European character set---are unfortunately disjoint.
1261 465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	177
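To make the distinction between a character set and its encodings concrete, the sketch below encodes one JIS X 0208 character with Python's @code{euc_jp} and @code{shift_jis} codecs. Note the assumption built into the example: Python converts through Unicode, which is exactly the unification that the text says XEmacs did not adopt internally.

```python
# One character, two encodings of the same underlying character set.
hiragana_a = "\u3042"

euc = hiragana_a.encode("euc_jp")
sjis = hiragana_a.encode("shift_jis")

# The byte sequences differ, but each decodes back to the same character.
round_trip_ok = (
    euc.decode("euc_jp") == hiragana_a and sjis.decode("shift_jis") == hiragana_a
)
```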
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	178 Naive use of code points is also not possible if more than one
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	179 character set is to be used in the encoding. For example, printed
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	180 Japanese text typically requires characters from multiple character sets
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	181 -- ASCII, JIS X 0208, and JIS X 0212, to be specific. Each of these is
1261 465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	182 indexed using one or more position codes in the range 1 through 94 (or
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	183 33 through 126), so the position codes could not be used directly or
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	184 there would be no way to tell which character was meant. Different
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	185 Japanese encodings handle this differently -- JIS uses special escape
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	186 characters to denote different character sets; EUC sets the high bit of
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	187 the position codes for JIS X 0208 and JIS X 0212, and puts a special
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	188 extra byte before each JIS X 0212 character; etc.
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	189
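The two strategies mentioned above can be sketched by hand. The function names are hypothetical, the byte 0x8F used as the extra JIS X 0212 prefix and the escape sequences @samp{ESC $ B} and @samp{ESC ( B} are stated here as assumptions of the sketch, and the JIS X 0212 position codes are arbitrary illustrative values.

```python
ESC = b"\x1b"
SS3 = b"\x8f"  # assumed prefix byte placed before each JIS X 0212 character

def euc_jp_bytes(charset, positions):
    """Non-modal: set the high bit on each position code shifted to 33..126."""
    high = bytes(p + 32 + 0x80 for p in positions)
    if charset == "jisx0208":
        return high
    if charset == "jisx0212":
        return SS3 + high
    raise ValueError(f"unsupported charset: {charset}")

def jis_bytes(positions):
    """Modal: announce JIS X 0208 with an escape, then return to ASCII."""
    return ESC + b"$B" + bytes(p + 32 for p in positions) + ESC + b"(B"

hiragana_a = (4, 2)  # assumed JIS X 0208 position codes for U+3042
euc = euc_jp_bytes("jisx0208", hiragana_a)
jis = jis_bytes(hiragana_a)
supplementary = euc_jp_bytes("jisx0212", (16, 1))  # illustrative positions
```

In the EUC form each character can be identified from its own bytes, whereas the JIS form depends on the most recent escape sequence seen in the stream.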
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	190 The encodings described above are all 7-bit or 8-bit encodings. The
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	191 fixed-width Unicode encoding previously described, however, is sometimes
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	192 considered to be a 16-bit encoding, in which case the issue of byte
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	193 ordering does not come up. (Imagine, for example, that the text is
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	194 represented as an array of shorts.) Similarly, Unicode version 3 (which
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	195 has characters with indices above 0xFFFF), and other very large
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	196 character sets, may be represented internally as 32-bit encodings,
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	197 i.e. arrays of ints. However, it does not make too much sense to talk
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	198 about 16-bit or 32-bit encodings for external data, since nowadays 8-bit
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	199 data is a universal standard -- the closest you can get is fixed-width
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	200 encodings using two or four bytes to encode 16-bit or 32-bit values. (A
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	201 "7-bit" encoding is used when it cannot be guaranteed that the high bit
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	202 of 8-bit data will be correctly preserved. Some e-mail gateways, for
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	203 example, strip the high bit of text passing through them. These same
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	204 gateways often handle non-printable characters incorrectly, and so 7-bit
465bd3c7d932 [xemacs-hg @ 2003-02-06 06:35:47 by ben] ben parents: 1188 diff changeset	205 encodings usually avoid using bytes with such values.)
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	206
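The effect of a gateway that strips the high bit can be simulated in a few lines. The function @code{strip_high_bit} is a hypothetical model of such a gateway, not a description of any real mail software.

```python
def strip_high_bit(data):
    """Model a 7-bit channel by clearing bit 7 of every byte."""
    return bytes(b & 0x7F for b in data)

euc = "\u3042".encode("euc_jp")   # 8-bit encoding, both bytes >= 0x80
damaged = strip_high_bit(euc)     # now looks like two ASCII characters
plain = strip_high_bit(b"plain ASCII")
```

Pure ASCII text survives unchanged, while the 8-bit text silently turns into unrelated printable characters; a 7-bit encoding avoids this by never relying on the high bit.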
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	207 A general method of handling text using multiple character sets
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	208 (whether for multilingual text, or simply text in an extremely
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	209 complicated single language like Japanese) is defined in the
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	210 international standard ISO 2022. ISO 2022 will be discussed in more
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	211 detail later (@pxref{ISO 2022}), but for now suffice it to say that text
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	212 needs control functions (at least spacing), and if escape sequences are
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	213 to be used, an escape sequence introducer. It was decided to make all
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	214 text streams compatible with ASCII in the sense that the codes 0--31
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	215 (and 128--159) would always be control codes, never graphic characters,
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	216 and where defined by the character set the @samp{SPC} character would be
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	217 assigned code 32, and @samp{DEL} would be assigned 127. Thus there are
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	218 94 code points remaining if 7 bits are used. This is the reason that
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	219 most character sets are defined using position codes in the range 1
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	220 through 94. Then ISO 2022 compatible encodings are produced by shifting
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	221 the position codes 1 to 94 into character codes 33 to 126, or (if 8-bit
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	222 codes are available) into character codes 161 to 254.
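
As a rough illustration of the shift described above, the following
sketch maps a position code in the range 1 through 94 to the
corresponding 7-bit or 8-bit character code. The function name
@code{my-position-code-to-octet} is hypothetical and is not part of
XEmacs.

@example
;; Hypothetical helper, for illustration only.
;; POS is a position code from 1 to 94.  With EIGHT-BIT non-nil the
;; result lies in 161..254, otherwise in 33..126.
(defun my-position-code-to-octet (pos &optional eight-bit)
  (+ pos (if eight-bit 160 32)))

(my-position-code-to-octet 1)
     @result{} 33
(my-position-code-to-octet 94)
     @result{} 126
(my-position-code-to-octet 1 t)
     @result{} 161
(my-position-code-to-octet 94 t)
     @result{} 254
@end example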
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	223
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	224 Encodings are classified as either @dfn{modal} or @dfn{non-modal}. In
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	225 a @dfn{modal encoding}, there are multiple states that the encoding can
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	226 be in, and the interpretation of the values in the stream depends on the
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	227 current global state of the encoding. Special values in the encoding,
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	228 called @dfn{escape sequences}, are used to change the global state.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	229 JIS, for example, is a modal encoding. The bytes @samp{ESC $ B}
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	230 indicate that, from then on, bytes are to be interpreted as position
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	231 codes for JIS X 0208, rather than as ASCII. This effect is cancelled
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	232 using the bytes @samp{ESC ( B}, which mean ``switch from whatever the
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	233 current state is to ASCII''. To switch to JIS X 0212, the escape
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	234 sequence @samp{ESC $ ( D} is used. (Note that here, as is common, the
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	235 escape sequences do in fact begin with @samp{ESC}. This is not necessarily
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	236 the case, however. Some encodings use control characters called ``locking
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	237 shifts'' (whose effect persists until cancelled) to switch character sets.)
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	238
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	239 A @dfn{non-modal encoding} has no global state that extends past the
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	240 character currently being interpreted. EUC, for example, is a
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	241 non-modal encoding. Characters in JIS X 0208 are encoded by setting
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	242 the high bit of the position codes, and characters in JIS X 0212 are
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	243 encoded by doing the same but also prefixing the character with the
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	244 byte 0x8F.
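
The EUC rule just described can be sketched in a few lines of Lisp.
This assumes the position codes are given as 7-bit values in the range
33 through 126, as they would appear in a JIS stream. The function name
@code{my-euc-octets} is hypothetical.

@example
;; Hypothetical helper, for illustration only.
;; C1 and C2 are the two 7-bit codes of a character.  If JISX0212 is
;; non-nil, the prefix byte 0x8F (143) is added in front.
(defun my-euc-octets (c1 c2 &optional jisx0212)
  (let ((octets (list (logior c1 128) (logior c2 128))))
    (if jisx0212
        (cons 143 octets)
      octets)))

(my-euc-octets 48 33)          ; 0x30 0x21 as JIS X 0208
     @result{} (176 161)
(my-euc-octets 48 33 t)        ; 0x30 0x21 as JIS X 0212
     @result{} (143 176 161)
@end example

Because every octet of a multi-byte character has its high bit set, no
state is needed to tell such octets apart from ASCII.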
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	245
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	246 The advantage of a modal encoding is that it is generally more
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	247 space-efficient, and is easily extensible because essentially an
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	248 arbitrary number of escape sequences can be created. The
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	249 disadvantage, however, is that it is much more difficult to work with
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	250 if it is not being processed in a sequential manner. In the non-modal
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	251 EUC encoding, for example, the byte 0x41 always refers to the letter
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	252 @samp{A}; whereas in JIS, it could either be the letter @samp{A}, or
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	253 one of the two position codes in a JIS X 0208 character, or one of the
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	254 two position codes in a JIS X 0212 character. Determining exactly which
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	255 one is meant could be difficult and time-consuming if the previous
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	256 bytes in the string have not already been processed, or impossible if
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	257 they are drawn from an external stream that cannot be rewound.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	258
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	259 Non-modal encodings are further divided into @dfn{fixed-width} and
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	260 @dfn{variable-width} formats. A fixed-width encoding always uses
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	261 the same number of words per character, whereas a variable-width
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	262 encoding does not. EUC is a good example of a variable-width
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	263 encoding: one to three bytes are used per character, depending on
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	264 the character set. 16-bit and 32-bit encodings are nearly always
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	265 fixed-width, and this is in fact one of the main reasons for using
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	266 an encoding with a larger word size. The advantage of fixed-width
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	267 encodings is that characters can be indexed directly. The advantages of variable-width
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	268 encodings are that they are generally more space-efficient and allow
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	269 for compatibility with existing 8-bit encodings such as ASCII. (For
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	270 example, in Unicode, ASCII characters are simply promoted to a 16-bit
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	271 representation. That means that every ASCII character contains a
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	272 @samp{NUL} byte; evidently all of the standard string manipulation
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	273 functions will lose badly in a fixed-width Unicode environment.)
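
The @samp{NUL} byte mentioned above can be made visible with a short
sketch that splits a 16-bit value into its two octets. The function
name @code{my-16-bit-octets} is hypothetical, and the high-octet-first
order is an arbitrary choice made for the illustration.

@example
;; Hypothetical helper, for illustration only.
;; Returns the high and low octets of the 16-bit value CODE.
(defun my-16-bit-octets (code)
  (list (logand (lsh code -8) 255)
        (logand code 255)))

(my-16-bit-octets 65)          ; the letter A
     @result{} (0 65)
@end example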
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	274
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	275 The bytes in an 8-bit encoding are often referred to as @dfn{octets}
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	276 rather than simply as bytes. This terminology dates back to the days
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	277 before 8-bit bytes were universal, when some computers had 9-bit bytes,
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	278 others had 10-bit bytes, etc.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	279
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	280 @node Charsets, MULE Characters, Internationalization Terminology, MULE
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	281 @section Charsets
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	282
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	283 A @dfn{charset} in MULE is an object that encapsulates a
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	284 particular character set as well as an ordering of those characters.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	285 Charsets are permanent objects and are named using symbols, like
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	286 faces.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	287
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	288 @defun charsetp object
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	289 This function returns non-@code{nil} if @var{object} is a charset.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	290 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	291
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	292 @menu
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	293 * Charset Properties:: Properties of a charset.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	294 * Basic Charset Functions:: Functions for working with charsets.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	295 * Charset Property Functions:: Functions for accessing charset properties.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	296 * Predefined Charsets:: Predefined charset objects.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	297 @end menu
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	298
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	299 @node Charset Properties, Basic Charset Functions, , Charsets
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	300 @subsection Charset Properties
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	301
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	302 Charsets have the following properties:
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	303
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	304 @table @code
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	305 @item name
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	306 A symbol naming the charset. Every charset must have a different name;
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	307 this allows a charset to be referred to using its name rather than
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	308 the actual charset object.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	309 @item doc-string
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	310 A documentation string describing the charset.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	311 @item registry
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	312 A regular expression matching the font registry field for this character
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	313 set. For example, both the @code{ascii} and @code{latin-iso8859-1}
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	314 charsets use the registry @code{"ISO8859-1"}. This field is used to
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	315 choose an appropriate font when the user gives a general font
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	316 specification such as @samp{-*-courier-medium-r-*-140-*}, i.e. a
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	317 14-point upright medium-weight Courier font.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	318 @item dimension
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	319 Number of position codes used to index a character in the character set.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	320 XEmacs/MULE can only handle character sets of dimension 1 or 2.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	321 This property defaults to 1.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	322 @item chars
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	323 Number of characters in each dimension. In XEmacs/MULE, the only
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	324 allowed values are 94 or 96. (There are a couple of pre-defined
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	325 character sets, such as ASCII, that do not follow this, but you cannot
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	326 define new ones like this.) Defaults to 94. Note that if the dimension
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	327 is 2, the character set thus described is 94x94 or 96x96.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	328 @item columns
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	329 Number of columns used to display a character in this charset.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	330 Only used in TTY mode. (Under X, the actual width of a character
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	331 can be derived from the font used to display the characters.)
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	332 If unspecified, defaults to the dimension. (This is almost
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	333 always the correct value, because character sets with dimension 2
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	334 are usually ideograph character sets, which need two columns to
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	335 display the intricate ideographs.)
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	336 @item direction
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	337 A symbol, either @code{l2r} (left-to-right) or @code{r2l}
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	338 (right-to-left). Defaults to @code{l2r}. This specifies the
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	339 direction that the text should be displayed in, and will be
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	340 left-to-right for most charsets but right-to-left for Hebrew
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	341 and Arabic. (Right-to-left display is not currently implemented.)
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	342 @item final
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	343 Final byte of the standard ISO 2022 escape sequence designating this
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	344 charset. Must be supplied. Each combination of (@var{dimension},
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	345 @var{chars}) defines a separate namespace for final bytes, and each
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	346 charset within a particular namespace must have a different final byte.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	347 Note that ISO 2022 restricts the final byte to the range 0x30 - 0x7E if
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	348 dimension == 1, and 0x30 - 0x5F if dimension == 2. Note also that final
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	349 bytes in the range 0x30 - 0x3F are reserved for user-defined (not
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	350 official) character sets. For more information on ISO 2022, see @ref{Coding
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	351 Systems}.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	352 @item graphic
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	353 0 (use left half of font on output) or 1 (use right half of font on
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	354 output). Defaults to 0. This specifies how to convert the position
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	355 codes that index a character in a character set into an index into the
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	356 font used to display the character set. With @code{graphic} set to 0,
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	357 position codes 33 through 126 map to font indices 33 through 126; with
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	358 it set to 1, position codes 33 through 126 map to font indices 161
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	359 through 254 (i.e. the same number but with the high bit set). For
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	360 example, for a font whose registry is ISO8859-1, the left half of the
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	361 font (octets 0x20 - 0x7F) is the @code{ascii} charset, while the right
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	362 half (octets 0xA0 - 0xFF) is the @code{latin-iso8859-1} charset.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	363 @item ccl-program
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	364 A compiled CCL program used to convert a character in this charset into
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	365 an index into the font. This is in addition to the @code{graphic}
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	366 property. If a CCL program is defined, the position codes of a
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	367 character will first be processed according to @code{graphic} and
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	368 then passed through the CCL program, with the resulting values used
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	369 to index the font.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	370
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	371 This is used, for example, in the Big5 character set (used in Taiwan).
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	372 This character set is not ISO-2022-compliant, and its size (94x157) does
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	373 not fit within the maximum 96x96 size of ISO-2022-compliant character
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	374 sets. As a result, XEmacs/MULE splits it (in a rather complex fashion,
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	375 so as to group the most commonly used characters together) into two
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	376 charset objects (@code{big5-1} and @code{big5-2}), each of size 94x94,
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	377 and each charset object uses a CCL program to convert the modified
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	378 position codes back into standard Big5 indices to retrieve a character
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	379 from a Big5 font.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	380 @end table
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	381
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	382 Most of the above properties can only be set when the charset is
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	383 initialized, and cannot be changed later.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	384 @xref{Charset Property Functions}.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	385
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	386 @node Basic Charset Functions, Charset Property Functions, Charset Properties, Charsets
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	387 @subsection Basic Charset Functions
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	388
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	389 @defun find-charset charset-or-name
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	390 This function retrieves the charset of the given name. If
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	391 @var{charset-or-name} is a charset object, it is simply returned.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	392 Otherwise, @var{charset-or-name} should be a symbol. If there is no
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	393 such charset, @code{nil} is returned. Otherwise the associated charset
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	394 object is returned.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	395 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	396
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	397 @defun get-charset name
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	398 This function retrieves the charset of the given name. Same as
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	399 @code{find-charset} except an error is signalled if there is no such
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	400 charset instead of returning @code{nil}.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	401 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	402
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	403 @defun charset-list
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	404 This function returns a list of the names of all defined charsets.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	405 @end defun
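
A brief sketch of how these lookup functions might be used follows. It
assumes an XEmacs built with MULE support, and the symbol
@code{my-no-such-charset} is a made-up name that is assumed not to be
defined.

@example
(find-charset 'ascii)          ; the charset object named ascii

(find-charset 'my-no-such-charset)
     @result{} nil

;; get-charset signals an error instead, which can be caught.
(condition-case nil
    (get-charset 'my-no-such-charset)
  (error 'not-found))
     @result{} not-found

(memq 'ascii (charset-list))   ; non-nil, since ascii is defined
@end example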
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	406
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	407 @defun make-charset name doc-string props
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	408 This function defines a new character set. This function is for use
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	409 with MULE support. @var{name} is a symbol, the name by which the
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	410 character set is normally referred to. @var{doc-string} is a string
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	411 describing the character set. @var{props} is a property list,
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	412 describing the specific nature of the character set. The recognized
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	413 properties are @code{registry}, @code{dimension}, @code{columns},
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	414 @code{chars}, @code{final}, @code{graphic}, @code{direction}, and
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	415 @code{ccl-program}, as previously described.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	416 @end defun
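
A call might look roughly like the following sketch, which has not
been evaluated here. The charset name, documentation string and
registry are invented for the illustration, and passing the final byte
as a character is an assumption about the expected value type. The
final byte @samp{0} (0x30) is taken from the range reserved for
user-defined character sets (@pxref{Charset Properties}).

@example
;; Hypothetical charset, for illustration only.
(make-charset 'my-private-charset
              "A made-up 94-character charset."
              '(registry "MyRegistry"
                dimension 1
                chars 94
                columns 1
                direction l2r
                final ?0       ; 0x30, user-defined range
                graphic 0))
@end example

Since each charset within a (@var{dimension}, @var{chars}) namespace
must have a different final byte, such a call would presumably fail if
that final byte were already taken.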
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	417
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	418 @defun make-reverse-direction-charset charset new-name
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	419 This function makes a charset equivalent to @var{charset} but which goes
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	420 in the opposite direction. @var{new-name} is the name of the new
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	421 charset. The new charset is returned.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	422 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	423
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	424 @defun charset-from-attributes dimension chars final &optional direction
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	425 This function returns a charset with the given @var{dimension},
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	426 @var{chars}, @var{final}, and @var{direction}. If @var{direction} is
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	427 omitted, both directions will be checked (left-to-right will be returned
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	428 if character sets exist for both directions).
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	429 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	430
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	431 @defun charset-reverse-direction-charset charset
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	432 This function returns the charset (if any) with the same dimension,
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	433 number of characters, and final byte as @var{charset}, but which is
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	434 displayed in the opposite direction.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	435 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	436
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	437 @node Charset Property Functions, Predefined Charsets, Basic Charset Functions, Charsets
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	438 @subsection Charset Property Functions
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	439
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	440 All of these functions accept either a charset name or charset object.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	441
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	442 @defun charset-property charset prop
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	443 This function returns property @var{prop} of @var{charset}.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	444 @xref{Charset Properties}.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	445 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	446
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	447 Convenience functions are also provided for retrieving individual
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	448 properties of a charset.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	449
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	450 @defun charset-name charset
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	451 This function returns the name of @var{charset}. This will be a symbol.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	452 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	453
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	454 @defun charset-description charset
576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	455 This function returns the documentation string of @var{charset}.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	456 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	457
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	458 @defun charset-registry charset
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	459 This function returns the registry of @var{charset}.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	460 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	461
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	462 @defun charset-dimension charset
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	463 This function returns the dimension of @var{charset}.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	464 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	465
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	466 @defun charset-chars charset
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	467 This function returns the number of characters per dimension of
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	468 @var{charset}.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	469 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	470
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	471 @defun charset-width charset
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	472 This function returns the number of display columns per character (in
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	473 TTY mode) of @var{charset}.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	474 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	475
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	476 @defun charset-direction charset
440 8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	477 This function returns the display direction of @var{charset}---either
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	478 @code{l2r} or @code{r2l}.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	479 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	480
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	481 @defun charset-iso-final-char charset
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	482 This function returns the final byte of the ISO 2022 escape sequence
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	483 designating @var{charset}.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	484 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	485
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	486 @defun charset-iso-graphic-plane charset
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	487 This function returns either 0 or 1, depending on whether the position
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	488 codes of characters in @var{charset} map to the left or right half
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	489 of their font, respectively.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	490 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	491
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	492 @defun charset-ccl-program charset
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	493 This function returns the CCL program, if any, for converting
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	494 position codes of characters in @var{charset} into font indices.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	495 @end defun
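
The property functions above can be tried on the predefined charsets
(@pxref{Predefined Charsets}).  The following is only a sketch: it
assumes a MULE-enabled XEmacs and that a charset symbol is accepted
wherever a charset object is expected.  The comments give the values
suggested by the tables of predefined charsets, not output captured from
a running XEmacs.

@example
(charset-dimension 'japanese-jisx0208)       ; expect 2 (type 94x94)
(charset-chars 'japanese-jisx0208)           ; expect 94
(charset-direction 'hebrew-iso8859-8)        ; expect r2l
(charset-iso-final-char 'cyrillic-iso8859-5) ; expect the character L
(charset-iso-graphic-plane 'latin-iso8859-2) ; expect 1 (right half)
@end example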
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	496
1734 d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent] stephent parents: 1261 diff changeset	497 The two properties of a charset that can currently be set after the
d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent] stephent parents: 1261 diff changeset	498 charset has been created are the CCL program and the font registry.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	499
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	500 @defun set-charset-ccl-program charset ccl-program
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	501 This function sets the @code{ccl-program} property of @var{charset} to
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	502 @var{ccl-program}.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	503 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	504
1734 d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent] stephent parents: 1261 diff changeset	505 @defun set-charset-registry charset registry
d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent] stephent parents: 1261 diff changeset	506 This function sets the @code{registry} property of @var{charset} to
d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent] stephent parents: 1261 diff changeset	507 @var{registry}.
d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent] stephent parents: 1261 diff changeset	508 @end defun
d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent] stephent parents: 1261 diff changeset	509
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	510 @node Predefined Charsets, , Charset Property Functions, Charsets
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	511 @subsection Predefined Charsets
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	512
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	513 The following charsets are predefined in the C code.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	514
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	515 @example
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	516 Name                    Type  Fi Gr Dir Registry
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	517 --------------------------------------------------------------
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	518 ascii                   94    B  0  l2r ISO8859-1
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	519 control-1               94       0  l2r ---
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	520 latin-iso8859-1         96    A  1  l2r ISO8859-1
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	521 latin-iso8859-2         96    B  1  l2r ISO8859-2
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	522 latin-iso8859-3         96    C  1  l2r ISO8859-3
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	523 latin-iso8859-4         96    D  1  l2r ISO8859-4
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	524 cyrillic-iso8859-5      96    L  1  l2r ISO8859-5
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	525 arabic-iso8859-6        96    G  1  r2l ISO8859-6
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	526 greek-iso8859-7         96    F  1  l2r ISO8859-7
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	527 hebrew-iso8859-8        96    H  1  r2l ISO8859-8
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	528 latin-iso8859-9         96    M  1  l2r ISO8859-9
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	529 thai-tis620             96    T  1  l2r TIS620
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	530 katakana-jisx0201       94    I  1  l2r JISX0201.1976
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	531 latin-jisx0201          94    J  0  l2r JISX0201.1976
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	532 japanese-jisx0208-1978  94x94 @@  0  l2r JISX0208.1978
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	533 japanese-jisx0208       94x94 B  0  l2r JISX0208.19(83\|90)
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	534 japanese-jisx0212       94x94 D  0  l2r JISX0212
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	535 chinese-gb2312          94x94 A  0  l2r GB2312
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	536 chinese-cns11643-1      94x94 G  0  l2r CNS11643.1
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	537 chinese-cns11643-2      94x94 H  0  l2r CNS11643.2
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	538 chinese-big5-1          94x94 0  0  l2r Big5
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	539 chinese-big5-2          94x94 1  0  l2r Big5
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	540 korean-ksc5601          94x94 C  0  l2r KSC5601
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	541 composite               96x96    0  l2r ---
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	542 @end example
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	543
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	544 The following charsets are predefined in the Lisp code.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	545
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	546 @example
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	547 Name                    Type  Fi Gr Dir Registry
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	548 --------------------------------------------------------------
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	549 arabic-digit            94    2  0  l2r MuleArabic-0
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	550 arabic-1-column         94    3  0  r2l MuleArabic-1
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	551 arabic-2-column         94    4  0  r2l MuleArabic-2
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	552 sisheng                 94    0  0  l2r sisheng_cwnn\\|OMRON_UDC_ZH
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	553 chinese-cns11643-3      94x94 I  0  l2r CNS11643.1
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	554 chinese-cns11643-4      94x94 J  0  l2r CNS11643.1
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	555 chinese-cns11643-5      94x94 K  0  l2r CNS11643.1
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	556 chinese-cns11643-6      94x94 L  0  l2r CNS11643.1
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	557 chinese-cns11643-7      94x94 M  0  l2r CNS11643.1
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	558 ethiopic                94x94 2  0  l2r Ethio
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	559 ascii-r2l               94    B  0  r2l ISO8859-1
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	560 ipa                     96    0  1  l2r MuleIPA
1734 d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent] stephent parents: 1261 diff changeset	561 vietnamese-viscii-lower 96    1  1  l2r VISCII1.1
d6d41d23b6ec [xemacs-hg @ 2003-10-10 10:18:24 by stephent] stephent parents: 1261 diff changeset	562 vietnamese-viscii-upper 96    2  1  l2r VISCII1.1
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	563 @end example
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	564
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	565 For all of the above charsets, the dimension and number of columns are
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	566 the same.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	567
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	568 Note that ASCII, Control-1, and Composite are handled specially.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	569 This is why some of the fields are blank and why some of the filled-in
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	570 fields (e.g. the type) are not really accurate.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	571
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	572 @node MULE Characters, Composite Characters, Charsets, MULE
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	573 @section MULE Characters
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	574
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	575 @defun make-char charset arg1 &optional arg2
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	576 This function makes a multi-byte character from @var{charset} and octets
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	577 @var{arg1} and @var{arg2}. @var{arg2} is used only if @var{charset} has dimension 2.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	578 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	579
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	580 @defun char-charset character
576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	581 This function returns the character set of char @var{character}.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	582 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	583
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	584 @defun char-octet character &optional n
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	585 This function returns the octet (i.e. position code) numbered @var{n}
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	586 (should be 0 or 1) of char @var{character}. @var{n} defaults to 0 if omitted.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	587 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	588
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	589 @defun find-charset-region start end &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	590 This function returns a list of the charsets in the region between
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	591 @var{start} and @var{end}. @var{buffer} defaults to the current buffer
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	592 if omitted.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	593 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	594
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	595 @defun find-charset-string string
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	596 This function returns a list of the charsets in @var{string}.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	597 @end defun
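
As a sketch of how these functions fit together (assuming a MULE-enabled
XEmacs; the variable name @code{ch} is only illustrative), the
following builds a character from @code{japanese-jisx0208} with position
codes 36 and 34 and then takes it apart again.  No printed results are
shown because the printed form of characters and charsets depends on the
XEmacs version.

@example
(let ((ch (make-char 'japanese-jisx0208 36 34)))
  (list (char-charset ch)       ; the charset the character belongs to
        (char-octet ch 0)       ; first position code
        (char-octet ch 1)       ; second position code
        (find-charset-string (char-to-string ch))))
@end example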
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	598
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	599 @node Composite Characters, Coding Systems, MULE Characters, MULE
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	600 @section Composite Characters
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	601
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	602 Composite characters are not yet completely implemented.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	603
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	604 @defun make-composite-char string
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	605 This function converts a string into a single composite character. The
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	606 character is the result of overstriking all the characters in the
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	607 string.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	608 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	609
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	610 @defun composite-char-string character
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	611 This function returns a string of the characters comprising a composite
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	612 character.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	613 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	614
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	615 @defun compose-region start end &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	616 This function composes the characters in the region from @var{start} to
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	617 @var{end} in @var{buffer} into one composite character. The composite
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	618 character replaces the composed characters. @var{buffer} defaults to
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	619 the current buffer if omitted.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	620 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	621
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	622 @defun decompose-region start end &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	623 This function decomposes any composite characters in the region from
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	624 @var{start} to @var{end} in @var{buffer}. This converts each composite
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	625 character into one or more characters, the individual characters out of
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	626 which the composite character was formed. Non-composite characters are
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	627 left as-is. @var{buffer} defaults to the current buffer if omitted.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	628 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	629
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	630 @node Coding Systems, CCL, Composite Characters, MULE
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	631 @section Coding Systems
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	632
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	633 A coding system is an object that defines how text containing multiple
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	634 character sets is encoded into a stream of (typically 8-bit) bytes. The
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	635 coding system is used to decode the stream into a series of characters
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	636 (which may be from multiple charsets) when the text is read from a file
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	637 or process, and is used to encode the text back into the same format
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	638 when it is written out to a file or process.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	639
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	640 For example, many ISO-2022-compliant coding systems (such as Compound
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	641 Text, which is used for inter-client data under the X Window System) use
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	642 escape sequences to switch between different charsets: Japanese Kanji
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	643 is invoked with @samp{ESC $ ( B}, ASCII is invoked with
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	644 @samp{ESC ( B}, and Cyrillic is invoked with @samp{ESC - L}. See
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	645 @code{make-coding-system} for more information.
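
The three escape sequences quoted above follow a regular pattern, which
the sketch below reconstructs from the charset properties.  The helper
name @code{my-iso2022-designation} is hypothetical and not part of
XEmacs, and the rule it encodes is a simplification assumed for
illustration: 94-character sets are designated to G0 with @samp{(},
96-character sets to G1 with @samp{-}, and two-dimensional charsets add
a leading @samp{$}.  The coding systems themselves allow more choices
than this.

@example
(defun my-iso2022-designation (charset)
  "Return an escape sequence designating CHARSET (simplified sketch)."
  (concat "\e"                                          ; ESC
          (if (= (charset-dimension charset) 2) "$" "") ; multibyte marker
          (if (= (charset-chars charset) 96) "-" "(")   ; 96-set or 94-set
          (char-to-string (charset-iso-final-char charset))))

(my-iso2022-designation 'japanese-jisx0208)   ; ESC $ ( B, per the tables
(my-iso2022-designation 'cyrillic-iso8859-5)  ; ESC - L, per the tables
@end example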
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	646
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	647 Coding systems are normally identified using a symbol, and the symbol is
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	648 accepted in place of the actual coding system object whenever a coding
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	649 system is called for. (This is similar to how faces and charsets work.)
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	650
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	651 @defun coding-system-p object
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	652 This function returns non-@code{nil} if @var{object} is a coding system.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	653 @end defun
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	654
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	655 @menu
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	656 * Coding System Types:: Classifying coding systems.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	657 * ISO 2022:: An international standard for
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	658 charsets and encodings.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	659 * EOL Conversion:: Dealing with different ways of denoting
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	660 the end of a line.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	661 * Coding System Properties:: Properties of a coding system.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	662 * Basic Coding System Functions:: Working with coding systems.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	663 * Coding System Property Functions:: Retrieving a coding system's properties.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	664 * Encoding and Decoding Text:: Encoding and decoding text.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	665 * Detection of Textual Encoding:: Determining how text is encoded.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	666 * Big5 and Shift-JIS Functions:: Special functions for these non-standard
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	667 encodings.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	668 * Predefined Coding Systems:: Coding systems implemented by MULE.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	669 @end menu
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	670
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	671 @node Coding System Types, ISO 2022, , Coding Systems
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	672 @subsection Coding System Types
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	673
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	674 The coding system type determines the basic algorithm XEmacs will use to
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	675 decode or encode a data stream. Character encodings will be converted
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	676 to the MULE encoding, escape sequences processed, and newline sequences
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	677 converted to XEmacs's internal representation. There are three basic
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	678 classes of coding system type: no-conversion, ISO-2022, and special.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	679
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	680 No conversion allows you to look at the file's internal representation.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	681 Since XEmacs is basically a text editor, ``no conversion'' does convert
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	682 newline conventions by default. (Use the @code{binary} coding system if
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	683 this is not desired.)
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	684
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	685 ISO 2022 (@pxref{ISO 2022}) is the basic international standard regulating
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	686 use of ``coded character sets for the exchange of data'', i.e., text
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	687 streams. ISO 2022 contains functions that make it possible to encode
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	688 text streams to comply with restrictions of the Internet mail system and
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	689 de facto restrictions of most file systems (e.g., use of the separator
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	690 character in file names). Coding systems which are not ISO 2022
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	691 conformant can be difficult to handle. Perhaps more important, they are
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	692 not adaptable to multilingual information interchange, with the obvious
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	693 exception of ISO 10646 (Unicode). (Unicode is partially supported by
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	694 XEmacs with the addition of the Lisp package ucs-conv.)
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	695
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	696 The special class of coding systems includes automatic detection, CCL (a
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	697 ``little language'' embedded as an interpreter, useful for translating
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	698 between variants of a single character set), non-ISO-2022-conformant
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	699 encodings like Unicode, Shift JIS, and Big5, and MULE internal coding.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	700 (NB: this list is based on XEmacs 21.2. Terminology may vary slightly
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	701 for other versions of XEmacs and for GNU Emacs 20.)
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	702
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	703 @table @code
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	704 @item no-conversion
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	705 No conversion, for binary files, and a few special cases of non-ISO-2022
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	706 coding systems where conversion is done by hook functions (usually
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	707 implemented in CCL). On output, graphic characters that are not in
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	708 ASCII or Latin-1 will be replaced by a @samp{?}. (For a
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	709 no-conversion-encoded buffer, these characters will only be present if
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	710 you explicitly insert them.)
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	711 @item iso2022
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	712 Any ISO-2022-compliant encoding. Among others, this includes JIS (the
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	713 Japanese encoding commonly used for e-mail), national variants of EUC
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	714 (the standard Unix encoding for Japanese and other languages), and
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	715 Compound Text (an encoding used in X11). You can specify more specific
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	716 information about the conversion with the @var{flags} argument.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	717 @item ucs-4
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	718 ISO 10646 UCS-4 encoding. A 31-bit fixed-width superset of Unicode.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	719 @item utf-8
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	720 ISO 10646 UTF-8 encoding. A ``file system safe'' transformation format
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	721 that can be used with both UCS-4 and Unicode.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	722 @item undecided
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	723 Automatic conversion. XEmacs attempts to detect the coding system used
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	724 in the file.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	725 @item shift-jis
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	726 Shift-JIS (a Japanese encoding commonly used in PC operating systems).
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	727 @item big5
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	728 Big5 (an encoding of Traditional Chinese commonly used in Taiwan).
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	729 @item ccl
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	730 The conversion is performed using a user-written pseudo-code program.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	731 CCL (Code Conversion Language) is the name of this pseudo-code. For
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	732 example, CCL is used to map KOI8-R characters (an encoding for Russian
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	733 Cyrillic) to ISO8859-5 (the form used internally by MULE).
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	734 @item internal
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	735 Write out or read in the raw contents of the memory representing the
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	736 buffer's text. This is primarily useful for debugging purposes, and is
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	737 only enabled when XEmacs has been compiled with @code{DEBUG_XEMACS} set
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	738 (the @samp{--debug} configure option). @strong{Warning}: Reading in a
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	739 file using @code{internal} conversion can result in an internal
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	740 inconsistency in the memory representing a buffer's text, which will
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	741 produce unpredictable results and may cause XEmacs to crash. Under
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	742 normal circumstances you should never use @code{internal} conversion.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	743 @end table
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	744
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	745 @node ISO 2022, EOL Conversion, Coding System Types, Coding Systems
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	746 @subsection ISO 2022
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	747
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	748 This section briefly describes the ISO 2022 encoding standard. A more
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	749 thorough treatment is available in the original document of ISO
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	750 2022 as well as various national standards (such as JIS X 0202).
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	751
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	752 Character sets (@dfn{charsets}) are classified into the following four
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	753 categories, according to the number of characters in the charset:
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	754 94-charset, 96-charset, 94x94-charset, and 96x96-charset. This means
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	755 that although an ISO 2022 coding system may have variable width
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	756 characters, each charset used is fixed-width (in contrast to the MULE
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	757 character set and UTF-8, for example).
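As a rough illustration of the four categories, the following Python sketch computes how many code positions each kind of charset offers and which byte values each byte may take when the charset is placed in the lower half of the code space. The function names are hypothetical and are not part of MULE.

```python
def positions(chars_per_byte, dimension):
    """Number of code positions in a charset of the given shape."""
    if chars_per_byte not in (94, 96) or dimension not in (1, 2):
        raise ValueError("ISO 2022 charsets are 94 or 96 wide, dimension 1 or 2")
    return chars_per_byte ** dimension


def byte_range(chars_per_byte):
    """Byte values available to each byte of the charset (7-bit form)."""
    # A 94-charset leaves 0x20 and 0x7F alone; a 96-charset may use them.
    if chars_per_byte == 94:
        return range(0x21, 0x7F)
    return range(0x20, 0x80)


print(positions(94, 1), positions(96, 1), positions(94, 2), positions(96, 2))
# prints: 94 96 8836 9216
```

Because every byte of a given charset ranges over the same fixed interval, the width of a character is known as soon as its charset is known.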
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	758
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	759 ISO 2022 provides for switching between character sets via escape
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	760 sequences. This switching is somewhat complicated, because ISO 2022
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	761 provides for both legacy applications like Internet mail that accept
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	762 only 7 significant bits in some contexts (RFC 822 headers, for example),
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	763 and more modern "8-bit clean" applications. It also provides for
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	764 compact and transparent representation of languages like Japanese which
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	765 mix ASCII and a national script (even outside of computer programs).
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	766
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	767 First, ISO 2022 codified prevailing practice by dividing the code space
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	768 into "control" and "graphic" regions. The code points 0x00-0x1F and
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	769 0x80-0x9F are reserved for "control characters", while "graphic
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	770 characters" must be assigned to code points in the regions 0x20-0x7F and
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	771 0xA0-0xFF. The positions 0x20 and 0x7F are special, and under some
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	772 circumstances must be assigned the graphic character "ASCII SPACE" and
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	773 the control character "ASCII DEL" respectively.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	774
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	775 The various regions are given the name C0 (0x00-0x1F), GL (0x20-0x7F),
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	776 C1 (0x80-0x9F), and GR (0xA0-0xFF). GL and GR stand for "graphic left"
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	777 and "graphic right", respectively, because of the standard method of
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	778 displaying graphic character sets in tables with the high byte indexing
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	779 columns and the low byte indexing rows. I don't find it very intuitive,
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	780 but these are called "registers".
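The division of the code space can be sketched in a few lines of Python. This is only an illustration of the four ranges named above; the function name is hypothetical.

```python
def region(byte):
    """Return the name of the ISO 2022 code-space region holding BYTE."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not a byte value")
    if byte <= 0x1F:
        return "C0"
    if byte <= 0x7F:
        return "GL"
    if byte <= 0x9F:
        return "C1"
    return "GR"


print([region(b) for b in b"A\n\x8e\xa4"])
# prints: ['GL', 'C0', 'C1', 'GR']
```

Note that the test depends only on the top three bits of the byte, which is part of what makes the scheme cheap to implement.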
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	781
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	782 An ISO 2022-conformant encoding for a graphic character set must use a
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	783 fixed number of bytes per character, and the values must fit into a
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	784 single register; that is, each byte must range over either 0x20-0x7F, or
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	785 0xA0-0xFF. It is not allowed to extend the range of the repertoire of a
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	786 character set by using both ranges at the same time. This is why a standard
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	787 character set such as ISO 8859-1 is actually considered by ISO 2022 to
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	788 be an aggregation of two character sets, ASCII and LATIN-1, and why it
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	789 is technically incorrect to refer to ISO 8859-1 as "Latin 1". Also, a
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	790 single character's bytes must all be drawn from the same register; this
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	791 is why Shift JIS (for Japanese) and Big 5 (for Chinese) are not ISO
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	792 2022-compatible encodings.
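The single-register rule is easy to check mechanically. The Python sketch below (helper names are hypothetical) tests whether all bytes of one encoded character fall in GL or all fall in GR, and applies the test to the same Japanese character in EUC-JP and in Shift JIS, using the codecs shipped with Python.

```python
def register_of(byte):
    """Return "GL" or "GR" for a graphic byte, or None for a control byte."""
    if 0x20 <= byte <= 0x7F:
        return "GL"
    if 0xA0 <= byte <= 0xFF:
        return "GR"
    return None


def fits_one_register(char_bytes):
    """True if every byte of the character lies in the same register."""
    registers = set(register_of(b) for b in char_bytes)
    return len(registers) == 1 and None not in registers


hiragana_a = "\u3042"
print(fits_one_register(hiragana_a.encode("euc_jp")))     # prints: True
print(fits_one_register(hiragana_a.encode("shift_jis")))  # prints: False
```

The Shift JIS form fails because its first byte, 0x82, lies in the C1 control region rather than in either graphic register.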
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	793
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	794 The reason for this restriction becomes clear when you attempt to define
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	795 an efficient, robust encoding for a language like Japanese. Like ISO
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	796 8859, Japanese encodings are aggregations of several character sets. In
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	797 practice, the vast majority of characters are drawn from the "JIS Roman"
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	798 character set (a derivative of ASCII; it won't hurt to think of it as
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	799 ASCII) and the JIS X 0208 standard "basic Japanese" character set
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	800 including not only ideographic characters ("kanji") but syllabic
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	801 Japanese characters ("kana"), a wide variety of symbols, and many
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	802 alphabetic characters (Roman, Greek, and Cyrillic) as well. Although
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	803 JIS X 0208 includes the whole Roman alphabet, as a 2-byte code it is not
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	804 suited to programming; thus the inclusion of ASCII in the standard
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	805 Japanese encodings.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	806
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	807 For normal Japanese text such as in newspapers, a broad repertoire of
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	808 approximately 3000 characters is used. Evidently this won't fit into
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	809 one byte; two must be used. But much of the text processed by Japanese
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	810 computers is computer source code, nearly all of which is ASCII. A not
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	811 insignificant portion of ordinary text is English (as such or as
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	812 borrowed Japanese vocabulary) or other languages which can be represented
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	813 at least approximately in ASCII, as well. It seems reasonable then to
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	814 represent ASCII in one byte, and JIS X 0208 in two. And this is exactly
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	815 what the Extended Unix Code for Japanese (EUC-JP) does. ASCII is
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	816 invoked to the GL register, and JIS X 0208 is invoked to the GR
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	817 register. Thus, each byte can be tested for its character set by
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	818 looking at the high bit; if set, it is Japanese, if clear, it is ASCII.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	819 Furthermore, since control characters like newline can never be part of
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	820 a graphic character, even in the case of corruption in transmission the
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	821 stream will be resynchronized at every line break, on the order of 60-80
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	822 bytes. This coding system requires no escape sequences or special
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	823 control codes to represent 99.9% of all Japanese text.
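The high-bit test described above can be sketched as follows. This is a simplified illustration, not the decoder used by MULE: it assumes the stream contains only ASCII and JIS X 0208, and it ignores the single-shift codes that full EUC-JP uses for its other charsets. The function name is hypothetical.

```python
def split_euc_jp(data):
    """Split EUC-JP bytes into (charset, bytes) pairs using the high bit."""
    pieces = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte < 0x80:
            # High bit clear: one byte of ASCII (or a C0 control character).
            pieces.append(("ascii", data[i:i + 1]))
            i += 1
        elif (0xA1 <= byte <= 0xFE and i + 1 < len(data)
              and 0xA1 <= data[i + 1] <= 0xFE):
            # High bit set on both bytes: one JIS X 0208 character in GR.
            pieces.append(("jisx0208", data[i:i + 2]))
            i += 2
        else:
            raise ValueError("byte 0x%02X at offset %d not handled" % (byte, i))
    return pieces


print(split_euc_jp("a\u3042\n".encode("euc_jp")))
```

Because a newline can never appear inside a two-byte character, a decoder that loses its place can always restart cleanly at the next line break.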
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	824
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	825 Note carefully the distinction between the character sets (ASCII and JIS
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	826 X 0208), the encoding (EUC-JP), and the coding system (ISO 2022). The
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	827 JIS X 0208 character set is used in three different encodings for
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	828 Japanese, but in ISO-2022-JP it is invoked into GL (so the high bit is
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	829 always clear), in EUC-JP it is invoked into GR (setting the high bit in
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	830 the process), and in Shift JIS the high bit may be set or reset, and the
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	831 significant bits are shifted within the 16-bit character so that the two
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	832 main character sets can coexist with a third (the "halfwidth katakana"
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	833 of JIS X 0201). As the name implies, the ISO-2022-JP encoding is also a
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	834 version of the ISO-2022 coding system.
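The relationship between the three encodings can be observed with Python's standard codecs. The sketch below encodes one JIS X 0208 character three ways and checks that the EUC-JP bytes are the ISO-2022-JP bytes with the high bit set. The helper name is hypothetical, and the slicing assumes the codec brackets the character with exactly one three-byte designation on each side.

```python
def invoke_into_gr(gl_bytes):
    """Set the high bit of each byte, turning a GL form into a GR form."""
    return bytes(b | 0x80 for b in gl_bytes)


text = "\u3042"                      # HIRAGANA LETTER A
jis = text.encode("iso2022_jp")      # ESC $ B, two GL bytes, ESC ( B
euc = text.encode("euc_jp")          # the same two bytes, in GR
sjis = text.encode("shift_jis")      # a rearranged form, outside ISO 2022

payload = jis[3:-3]                  # drop the two designation sequences
print(payload.hex(), euc.hex(), sjis.hex())
# prints: 2422 a4a2 82a0
assert invoke_into_gr(payload) == euc
```

The Shift JIS bytes cannot be derived by a per-byte bit operation, which reflects the fact that Shift JIS rearranges the code point rather than invoking the charset into a register.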
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	835
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	836 In order to systematically treat subsidiary character sets (like the
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	837 "halfwidth katakana" already mentioned, and the "supplementary kanji" of
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	838 JIS X 0212), four further registers are defined: G0, G1, G2, and G3.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	839 Unlike GL and GR, they are not logically distinguished by internal
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	840 format. Instead, the process of "invocation" mentioned earlier is
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	841 broken into two steps: first, a character set is @dfn{designated} to one
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	842 of the registers G0-G3 by use of an @dfn{escape sequence} of the form:
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	843
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	844 @example
440 8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	845 ESC [@var{I}] @var{I} @var{F}
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	846 @end example
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	847
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	848 where @var{I} is an intermediate character or characters in the range
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	849 0x20-0x2F, and @var{F}, from the range 0x30-0x7E, is the final
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	850 character identifying this charset. (Final characters in the range
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	851 0x30-0x3F are reserved for private use and will never have a publicly
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	852 registered meaning.)
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	853
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	854 Then that register is @dfn{invoked} to either GL or GR, either
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	855 automatically (designations to G0 normally involve invocation to GL as
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	856 well), or by use of shifting (affecting only the following character in
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	857 the data stream) or locking (effective until the next designation or
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	858 locking) control sequences. An encoding conformant to ISO 2022 is
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	859 typically defined by designating the initial contents of the G0-G3
901 37e56e920ac5 [xemacs-hg @ 2002-07-05 20:35:47 by adrian] adrian parents: 775 diff changeset	860 registers, specifying a 7 or 8 bit environment, and specifying whether
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	861 further designations will be recognized.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	862
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	863 Some examples of character sets and the registered final characters
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	864 @var{F} used to designate them:
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	865
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	866 @need 1000
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	867 @table @asis
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	868 @item 94-charset
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	869 ASCII (B), left (J) and right (I) half of JIS X 0201, ...
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	870 @item 96-charset
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	871 Latin-1 (A), Latin-2 (B), Latin-3 (C), ...
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	872 @item 94x94-charset
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	873 GB2312 (A), JIS X 0208 (B), KSC5601 (C), ...
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	874 @item 96x96-charset
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	875 none for the moment
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	876 @end table
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	877
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	878 The meanings of the various characters in these sequences, where not
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	879 specified by the ISO 2022 standard (such as the ESC character), are
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	880 assigned by @dfn{ECMA}, the European Computer Manufacturers Association.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	881
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	882 The meanings of the intermediate characters are:
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	883
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	884 @example
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	885 @group
440 8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	886 $ [0x24]: indicate charset of dimension 2 (94x94 or 96x96).
8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	887 ( [0x28]: designate to G0 a 94-charset whose final byte is @var{F}.
8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	888 ) [0x29]: designate to G1 a 94-charset whose final byte is @var{F}.
8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	889 * [0x2A]: designate to G2 a 94-charset whose final byte is @var{F}.
8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	890 + [0x2B]: designate to G3 a 94-charset whose final byte is @var{F}.
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	891 , [0x2C]: designate to G0 a 96-charset whose final byte is @var{F}.
440 8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	892 - [0x2D]: designate to G1 a 96-charset whose final byte is @var{F}.
8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	893 . [0x2E]: designate to G2 a 96-charset whose final byte is @var{F}.
8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	894 / [0x2F]: designate to G3 a 96-charset whose final byte is @var{F}.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	895 @end group
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	896 @end example
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	897
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	898 The comma may be used in files read and written only by MULE, as a MULE
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	899 extension, but this is illegal in ISO 2022. (The reason is that in ISO
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	900 2022 G0 must be a 94-member character set, with 0x20 assigned the value
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	901 SPACE, and 0x7F assigned the value DEL.)
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	902
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	903 Here are examples of designations:
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	904
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	905 @example
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	906 @group
440 8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	907 ESC ( B : designate to G0 ASCII
8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	908 ESC - A : designate to G1 Latin-1
8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	909 ESC $ ( A or ESC $ A : designate to G0 GB2312
8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	910 ESC $ ( B or ESC $ B : designate to G0 JISX0208
8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	911 ESC $ ) C : designate to G1 KSC5601
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	912 @end group
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	913 @end example
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	914
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	915 (The short forms used to designate GB2312 and JIS X 0208 are for
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	916 backwards compatibility; the long forms are preferred.)
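The designation sequences listed above can be recognized with a small table-driven parser. The following Python sketch is an illustration only; the names are hypothetical, and it handles just the intermediate characters shown in this section, including the short forms for dimension-2 charsets.

```python
ESC = 0x1B

# (intermediate byte, target register, charset width)
DESIGNATORS = (
    (0x28, "G0", 94), (0x29, "G1", 94), (0x2A, "G2", 94), (0x2B, "G3", 94),
    (0x2C, "G0", 96), (0x2D, "G1", 96), (0x2E, "G2", 96), (0x2F, "G3", 96),
)


def parse_designation(seq):
    """Parse one designation sequence into (register, kind, final)."""
    if len(seq) < 3 or seq[0] != ESC:
        raise ValueError("not a designation sequence")
    body = seq[1:]
    two_byte = body[0] == 0x24        # "$" marks a dimension-2 charset
    if two_byte:
        body = body[1:]
    final = body[-1]
    if not 0x30 <= final <= 0x7E:
        raise ValueError("bad final byte")
    if two_byte and len(body) == 1:
        # Short form "ESC $ F": a 94x94 charset designated to G0.
        register, width = "G0", 94
    elif len(body) == 2:
        found = [(r, w) for inter, r, w in DESIGNATORS if inter == body[0]]
        if not found:
            raise ValueError("unknown intermediate byte")
        register, width = found[0]
    else:
        raise ValueError("unexpected sequence length")
    kind = "%dx%d" % (width, width) if two_byte else str(width)
    return register, kind, chr(final)


for seq in (b"\x1b(B", b"\x1b-A", b"\x1b$(B", b"\x1b$B", b"\x1b$)C"):
    print(seq, parse_designation(seq))
```

The long and short forms for JIS X 0208 parse to the same result, which is why a decoder can accept both even when an encoder emits only the preferred long form.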
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	917
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	918 To use a charset designated to G2 or G3, and to use a charset designated
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	919 to G1 in a 7-bit environment, you must explicitly invoke G1, G2, or G3
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	920 into GL. There are two types of invocation, Locking Shift (forever) and
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	921 Single Shift (one character only).
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	922
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	923 Locking Shift is done as follows:
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	924
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	925 @example
440 8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	926 LS0 or SI (0x0F): invoke G0 into GL
8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	927 LS1 or SO (0x0E): invoke G1 into GL
8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	928 LS2: invoke G2 into GL
8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	929 LS3: invoke G3 into GL
8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	930 LS1R: invoke G1 into GR
8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	931 LS2R: invoke G2 into GR
8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	932 LS3R: invoke G3 into GR
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	933 @end example
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	934
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	935 Single Shift is done as follows:
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	936
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	937 @example
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	938 @group
440 8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	939 SS2 or ESC N: invoke G2 into GL
8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	940 SS3 or ESC O: invoke G3 into GL
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	941 @end group
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	942 @end example
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	943
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	944 The shift functions (such as LS1R and SS3) are represented by control
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	945 characters (from C1) in 8 bit environments and by escape sequences in 7
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	946 bit environments.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	947
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	948 (#### Ben says: I think the above is slightly incorrect. It appears that
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	949 SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N and
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	950 ESC O behave as indicated. The above definitions will not parse
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	951 EUC-encoded text correctly, and it looks like the code in mule-coding.c
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	952 has similar problems.)
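The difference between locking and single shifts can be sketched as a small state machine. The Python code below is a simplified illustration under several assumptions: a 7-bit environment, one-byte charsets only, and only the shift functions SI, SO, ESC N, and ESC O. It does not attempt to settle the question raised in the note above about the 8-bit forms. The function name is hypothetical.

```python
SI, SO, ESC = 0x0F, 0x0E, 0x1B


def tag_registers(data):
    """Label each graphic byte with the G register it is read through."""
    locked = 0        # register currently invoked into GL by a locking shift
    single = None     # register selected for the next character only
    tagged = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == SI:
            locked = 0                      # LS0: invoke G0 into GL
        elif byte == SO:
            locked = 1                      # LS1: invoke G1 into GL
        elif (byte == ESC and i + 1 < len(data)
              and data[i + 1] in (0x4E, 0x4F)):
            # ESC N (SS2) or ESC O (SS3): affects one character only.
            single = 2 if data[i + 1] == 0x4E else 3
            i += 1
        elif 0x20 <= byte <= 0x7F:
            register = locked if single is None else single
            tagged.append(("G%d" % register, byte))
            single = None                   # a single shift is now used up
        i += 1
    return tagged


print(tag_registers(b"a\x0eb\x1bNcd\x0fe"))
# prints: [('G0', 97), ('G1', 98), ('G2', 99), ('G1', 100), ('G0', 101)]
```

After the single shift, the stream falls back to G1 rather than G0, because the earlier locking shift remains in effect until SI is seen.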
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	953
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	954 Evidently there are many ISO-2022-compliant ways of encoding
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	955 multilingual text. Many coding systems in actual use, such as
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	956 X11's Compound Text, Japanese JUNET code, and so-called EUC
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	957 (Extended UNIX Code), are variants of ISO 2022.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	958
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	959 In MULE, we characterize a version of ISO 2022 by the following
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	960 attributes:
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	961
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	962 @enumerate
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	963 @item
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	964 The character sets initially designated to G0 through G3.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	965 @item
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	966 Whether short form designations are allowed for Japanese and Chinese.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	967 @item
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	968 Whether ASCII should be designated to G0 before control characters.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	969 @item
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	970 Whether ASCII should be designated to G0 at the end of line.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	971 @item
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	972 7-bit environment or 8-bit environment.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	973 @item
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	974 Whether Locking Shifts are used or not.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	975 @item
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	976 Whether to use ASCII or the variant JIS X 0201-1976-Roman.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	977 @item
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	978 Whether to use JIS X 0208-1983 or the older version JIS X 0208-1976.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	979 @end enumerate
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	980
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	981 (The last two are only for Japanese.)
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	982
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	983 By specifying these attributes, you can create any variant
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	984 of ISO 2022.
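One way to picture such a characterization is as a record with one field per attribute. The Python sketch below is purely illustrative: the class and field names are hypothetical and do not correspond to the properties that MULE itself uses when a coding system is defined. The two sample values follow the ISO-2022-JP and Compound Text descriptions in this section.

```python
from typing import NamedTuple


class Iso2022Variant(NamedTuple):
    """The eight attributes, in the order they are enumerated above."""
    initial_charsets: tuple       # G0..G3; None means "never used"
    short_forms: bool             # short designations for Japanese/Chinese
    ascii_before_control: bool    # designate ASCII to G0 before controls
    ascii_at_eol: bool            # designate ASCII to G0 at end of line
    bits: int                     # 7 or 8
    locking_shift: bool
    use_jis_roman: bool           # JIS X 0201-1976-Roman instead of ASCII
    use_old_jisx0208: bool        # JIS X 0208-1976 instead of -1983


ISO_2022_JP = Iso2022Variant(
    ("ascii", None, None, None), True, True, True, 7, False, False, False)

CTEXT = Iso2022Variant(
    ("ascii", "latin-1", None, None), False, False, True, 8, False, False,
    False)

print(ISO_2022_JP.bits, CTEXT.initial_charsets[1])
# prints: 7 latin-1
```

Two variants that share the same character sets can still differ in every other field, which is why the character sets alone do not identify an encoding.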
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	985
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	986 Here are several examples:
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	987
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	988 @example
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	989 @group
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	990 ISO-2022-JP -- Coding system used in Japanese email (RFC 1468).
440 8de8e3f6228a Import from CVS: tag r21-2-28 cvs parents: 428 diff changeset	991 1. G0 <- ASCII, G1..3 <- never used
2. Yes.
3. Yes.
4. Yes.
5. 7-bit environment.
6. No.
7. Use ASCII.
8. Use JIS X 0208-1983.
@end group

@group
ctext -- X11 Compound Text
1. G0 <- ASCII, G1 <- Latin-1, G2,3 <- never used.
2. No.
3. No.
4. Yes.
5. 8-bit environment.
6. No.
7. Use ASCII.
8. Use JIS X 0208-1983.
@end group

@group
euc-china -- Chinese EUC. Often called the "GB encoding", but that is
technically incorrect.
1. G0 <- ASCII, G1 <- GB 2312, G2,3 <- never used.
2. No.
3. Yes.
4. Yes.
5. 8-bit environment.
6. No.
7. Use ASCII.
8. Use JIS X 0208-1983.
@end group

@group
ISO-2022-KR -- Coding system used in Korean email.
1. G0 <- ASCII, G1 <- KSC 5601, G2,3 <- never used.
2. No.
3. Yes.
4. Yes.
5. 7-bit environment.
6. Yes.
7. Use ASCII.
8. Use JIS X 0208-1983.
@end group
@end example

MULE creates all of these coding systems by default.

@node EOL Conversion, Coding System Properties, ISO 2022, Coding Systems
@subsection EOL Conversion

The @code{eol-type} property of a coding system (@pxref{Coding System
Properties}) selects the end-of-line conversion to be used. It takes
one of the following values:

@table @code
@item nil
Automatically detect the end-of-line type (LF, CRLF, or CR). Also
generate subsidiary coding systems named @code{@var{name}-unix},
@code{@var{name}-dos}, and @code{@var{name}-mac}, that are identical to
this coding system but have an @code{eol-type} value of @code{lf},
@code{crlf}, and @code{cr}, respectively.
@item lf
The end of a line is marked externally using ASCII LF. Since this is
also the way that XEmacs represents an end-of-line internally,
specifying this option results in no end-of-line conversion. This is
the standard format for Unix text files.
@item crlf
The end of a line is marked externally using ASCII CRLF. This is the
standard format for MS-DOS text files.
@item cr
The end of a line is marked externally using ASCII CR. This is the
standard format for Macintosh text files.
@item t
Automatically detect the end-of-line type but do not generate subsidiary
coding systems. (This value is converted to @code{nil} when stored
internally, and @code{coding-system-property} will return @code{nil}.)
@end table
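The naming rule for subsidiary coding systems can be checked
interactively. The following is only a sketch: the helper
@code{my-eol-subsidiaries} is hypothetical (it is not part of XEmacs),
and the example assumes that @code{ctext} was created with an
@code{eol-type} of @code{nil}, so that its subsidiaries exist.

@example
@group
;; Hypothetical helper, not part of XEmacs: look up the three
;; subsidiaries of BASE by appending the documented suffixes.
;; Each element of the result is a coding system object, or nil
;; if no coding system of that name is defined.
(defun my-eol-subsidiaries (base)
  (mapcar #'(lambda (suffix)
              (find-coding-system
               (intern (concat (symbol-name base) suffix))))
          '("-unix" "-dos" "-mac")))

(my-eol-subsidiaries 'ctext)
@end group
@end example

A @code{nil} element in the result would indicate that the base coding
system was defined with a fixed end-of-line type, or with @code{t}.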

@node Coding System Properties, Basic Coding System Functions, EOL Conversion, Coding Systems
@subsection Coding System Properties

@table @code
@item mnemonic
String to be displayed in the modeline when this coding system is
active.

@item eol-type
End-of-line conversion to be used. It should be one of the types
listed in @ref{EOL Conversion}.

@item eol-lf
The coding system which is the same as this one, except that it uses the
Unix line-breaking convention.

@item eol-crlf
The coding system which is the same as this one, except that it uses the
DOS line-breaking convention.

@item eol-cr
The coding system which is the same as this one, except that it uses the
Macintosh line-breaking convention.

@item post-read-conversion
Function called after a file has been read in, to perform the decoding.
Called with two arguments, @var{start} and @var{end}, denoting a region of
the current buffer to be decoded.

@item pre-write-conversion
Function called before a file is written out, to perform the encoding.
Called with two arguments, @var{start} and @var{end}, denoting a region of
the current buffer to be encoded.
@end table

The following additional properties are recognized if @var{type} is
@code{iso2022}:

@table @code
@item charset-g0
@itemx charset-g1
@itemx charset-g2
@itemx charset-g3
The character set initially designated to the corresponding register,
G0 through G3. The value should be one of:

@itemize @bullet
@item
A charset object (designate that character set)
@item
@code{nil} (do not ever use this register)
@item
@code{t} (no character set is initially designated to the register, but
may be later on; this automatically sets the corresponding
@code{force-g*-on-output} property)
@end itemize

@item force-g0-on-output
@itemx force-g1-on-output
@itemx force-g2-on-output
@itemx force-g3-on-output
If non-@code{nil}, send an explicit designation sequence on output
before using the specified register.

@item short
If non-@code{nil}, use the short forms @samp{ESC $ @@}, @samp{ESC $ A},
and @samp{ESC $ B} on output in place of the full designation sequences
@samp{ESC $ ( @@}, @samp{ESC $ ( A}, and @samp{ESC $ ( B}.

@item no-ascii-eol
If non-@code{nil}, don't designate ASCII to G0 at each end of line on
output. Setting this to non-@code{nil} also suppresses other
state-resetting that normally happens at the end of a line.

@item no-ascii-cntl
If non-@code{nil}, don't designate ASCII to G0 before control chars on
output.

@item seven
If non-@code{nil}, use 7-bit environment on output. Otherwise, use 8-bit
environment.

@item lock-shift
If non-@code{nil}, use locking-shift (SO/SI) instead of single-shift or
designation by escape sequence.

@item no-iso6429
If non-@code{nil}, don't use ISO 6429's direction specification.

@item escape-quoted
If non-@code{nil}, literal control characters that are the same as the
beginning of a recognized ISO 2022 or ISO 6429 escape sequence (in
particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E), SS3 (0x8F),
and CSI (0x9B)) are ``quoted'' with an escape character so that they can
be properly distinguished from an escape sequence. (Note that doing
this results in a non-portable encoding.) This encoding flag is used for
byte-compiled files. Note that ESC is a good choice for a quoting
character because there are no escape sequences whose second byte is a
character from the Control-0 or Control-1 character sets; this is
explicitly disallowed by the ISO 2022 standard.

@item input-charset-conversion
A list of conversion specifications, specifying conversion of characters
in one charset to another when decoding is performed. Each
specification is a list of two elements: the source charset, and the
destination charset.

@item output-charset-conversion
A list of conversion specifications, specifying conversion of characters
in one charset to another when encoding is performed. The form of each
specification is the same as for @code{input-charset-conversion}.
@end table
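As an illustration of how these properties combine, the following sketch
defines a 7-bit coding system with ASCII in G0 and Latin-1 in G1,
reached by locking shift. The name @code{my-latin-1-7bit}, the doc
string, and the mnemonic are hypothetical, and the example assumes that
the charsets @code{ascii} and @code{latin-iso8859-1} are defined. It is
not one of the coding systems that MULE creates by default.

@example
@group
;; Hypothetical coding system, for illustration only.
(make-coding-system
 'my-latin-1-7bit 'iso2022
 "Illustration only: ASCII in G0, Latin-1 in G1, 7-bit output."
 (list 'charset-g0 (get-charset 'ascii)            ; initial G0
       'charset-g1 (get-charset 'latin-iso8859-1)  ; initial G1
       'charset-g2 nil                             ; never use G2
       'charset-g3 nil                             ; never use G3
       'seven t                                    ; 7-bit environment
       'lock-shift t                               ; SO/SI invoke G1/G0
       'eol-type 'lf                               ; no EOL conversion
       'mnemonic "My7"))
@end group
@end example

The property list is built with @code{list} so that the charset
properties receive charset objects, as the table above requires.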

The following additional properties are recognized (and required) if
@var{type} is @code{ccl}:

@table @code
@item decode
CCL program used for decoding (converting to internal format).

@item encode
CCL program used for encoding (converting to external format).
@end table

The following properties are used internally: @code{eol-cr},
@code{eol-crlf}, @code{eol-lf}, and @code{base}.

@node Basic Coding System Functions, Coding System Property Functions, Coding System Properties, Coding Systems
@subsection Basic Coding System Functions

@defun find-coding-system coding-system-or-name
This function retrieves the coding system of the given name.

If @var{coding-system-or-name} is a coding-system object, it is simply
returned. Otherwise, @var{coding-system-or-name} should be a symbol.
If there is no such coding system, @code{nil} is returned. Otherwise
the associated coding system object is returned.
@end defun

@defun get-coding-system name
This function retrieves the coding system of the given name. It is the
same as @code{find-coding-system}, except that an error is signalled if
there is no such coding system, instead of returning @code{nil}.
@end defun

@defun coding-system-list
This function returns a list of the names of all defined coding systems.
@end defun

@defun coding-system-name coding-system
This function returns the name of the given coding system.
@end defun
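The difference between the two lookup functions shows up only when the
name is unknown. In the sketch below, @code{my-no-such-coding-system}
is a hypothetical name assumed not to be defined, and @code{binary} is
assumed to be available.

@example
@group
;; find-coding-system is documented to return nil for an
;; unknown name.
(find-coding-system 'my-no-such-coding-system)

;; get-coding-system signals an error instead, so trap it.
(condition-case err
    (get-coding-system 'my-no-such-coding-system)
  (error (message "Lookup failed: %S" err)))

;; Map a coding system object back to its name.
(let ((cs (find-coding-system 'binary)))
  (and cs (coding-system-name cs)))

;; Count the coding systems that are currently defined.
(length (coding-system-list))
@end group
@end example

Use @code{find-coding-system} when a missing coding system is an
expected case, and @code{get-coding-system} when it indicates a bug.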

@defun coding-system-base coding-system
This function returns the base coding system of @var{coding-system},
that is, the corresponding coding system with an undecided EOL
convention.
@end defun

@defun make-coding-system name type &optional doc-string props
This function registers symbol @var{name} as a coding system.

@var{type} describes the conversion method used and should be one of
the types listed in @ref{Coding System Types}.

@var{doc-string} is a string describing the coding system.

@var{props} is a property list, describing the specific nature of the
coding system. Recognized properties are as in @ref{Coding System
Properties}.
@end defun

@defun copy-coding-system old-coding-system new-name
This function copies @var{old-coding-system} to @var{new-name}. If
@var{new-name} does not name an existing coding system, a new one will
be created.
@end defun

@defun subsidiary-coding-system coding-system eol-type
This function returns the subsidiary coding system of
@var{coding-system} with eol type @var{eol-type}.
@end defun
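The relationship between a base coding system and its subsidiaries can
be explored as follows. This is a sketch: it assumes that @code{ctext}
is defined, and @code{my-ctext-copy} is a hypothetical name. The
@code{and} guards allow for the case in which no subsidiary is returned.

@example
@group
(let* ((base (get-coding-system 'ctext))
       (dos (subsidiary-coding-system base 'crlf)))
  (list (coding-system-name base)
        ;; Name of the CRLF subsidiary, if there is one.
        (and dos (coding-system-name dos))
        ;; Going back to the base from the subsidiary.
        (and dos (coding-system-name (coding-system-base dos)))))

;; Copy the coding system under a hypothetical new name.
(copy-coding-system (get-coding-system 'ctext) 'my-ctext-copy)
@end group
@end example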

@node Coding System Property Functions, Encoding and Decoding Text, Basic Coding System Functions, Coding Systems
@subsection Coding System Property Functions

@defun coding-system-doc-string coding-system
This function returns the doc string for @var{coding-system}.
@end defun

@defun coding-system-type coding-system
This function returns the type of @var{coding-system}.
@end defun

@defun coding-system-property coding-system prop
This function returns the @var{prop} property of @var{coding-system}.
@end defun
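For instance, the three functions can be combined to summarize a coding
system. The sketch below assumes only that @code{ctext} is defined; the
property names come from @ref{Coding System Properties}.

@example
@group
(let ((cs (get-coding-system 'ctext)))
  (list (coding-system-type cs)
        (coding-system-doc-string cs)
        (coding-system-property cs 'mnemonic)
        (coding-system-property cs 'eol-type)))
@end group
@end example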

@node Encoding and Decoding Text, Detection of Textual Encoding, Coding System Property Functions, Coding Systems
@subsection Encoding and Decoding Text

@defun decode-coding-region start end coding-system &optional buffer
This function decodes the text between @var{start} and @var{end} which
is encoded in @var{coding-system}. This is useful if you've read in
encoded text from a file without decoding it (e.g. you read in a
JIS-formatted file but used the @code{binary} or @code{no-conversion} coding
system, so that it shows up as @samp{^[$B!<!+^[(B}). The length of the
decoded text is returned. @var{buffer} defaults to the current buffer
if unspecified.
@end defun

@defun encode-coding-region start end coding-system &optional buffer
This function encodes the text between @var{start} and @var{end} using
@var{coding-system}. This will, for example, convert Japanese
characters into byte sequences such as @samp{^[$B!<!+^[(B} if you use
the JIS encoding. The length of the encoded text is returned.
@var{buffer} defaults to the current buffer if unspecified.
@end defun
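The two functions are inverses of each other, which the following sketch
exercises in a scratch buffer. It assumes that @code{ctext} is defined
and that @code{with-temp-buffer} is available; the sample text is
arbitrary.

@example
@group
(with-temp-buffer
  (insert "sample text")
  ;; Replace the buffer contents with their external representation.
  (encode-coding-region (point-min) (point-max) 'ctext)
  ;; Convert the external representation back to internal format.
  (decode-coding-region (point-min) (point-max) 'ctext)
  (buffer-string))
@end group
@end example

If the coding system can represent every character in the region, the
final buffer contents should equal the text that was inserted.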

@node Detection of Textual Encoding, Big5 and Shift-JIS Functions, Encoding and Decoding Text, Coding Systems
@subsection Detection of Textual Encoding

@defun coding-category-list
This function returns a list of all recognized coding categories.
@end defun

@defun set-coding-priority-list list
This function changes the priority order of the coding categories.
@var{list} should be a list of coding categories, in descending order of
priority. Unspecified coding categories will be lower in priority than
all specified ones, in the same relative order they were in previously.
@end defun

@defun coding-priority-list
This function returns a list of coding categories in descending order of
priority.
@end defun

@defun set-coding-category-system coding-category coding-system
This function changes the coding system associated with a coding category.
@end defun

@defun coding-category-system coding-category
This function returns the coding system associated with a coding category.
@end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1312
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1313 @defun detect-coding-region start end &optional buffer
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1314 This function detects the coding system of the text in the region between
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1315 @var{start} and @var{end}. The returned value is a list of possible coding
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1316 systems ordered by priority. If only ASCII characters are found, it
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1317 returns @code{autodetect} or one of its subsidiary coding systems
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1318 according to a detected end-of-line type. Optional arg @var{buffer}
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1319 defaults to the current buffer.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1320 @end defun
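
The following is a minimal sketch of how the functions above might be
combined; it is illustrative rather than definitive. It hard-codes no
coding category names, since the set of categories differs between
XEmacs versions, and it restores the original priority order when done.

@example
;; Inspect the detector's current state.
(coding-category-list)      ; all recognized coding categories
(coding-priority-list)      ; the categories, highest priority first

;; The coding system currently bound to the top-priority category.
(coding-category-system (car (coding-priority-list)))

;; Guess the encoding of the whole current buffer.
(detect-coding-region (point-min) (point-max))

;; Temporarily promote the lowest-priority category, run detection
;; again, then restore the saved order.
(let ((saved (coding-priority-list)))
  (unwind-protect
      (progn
        (set-coding-priority-list (list (car (last saved))))
        (detect-coding-region (point-min) (point-max)))
    (set-coding-priority-list saved)))
@end example

Passing a one-element list to @code{set-coding-priority-list} is enough
here because unspecified categories keep their previous relative order
below the specified ones.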
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1321
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1322 @node Big5 and Shift-JIS Functions, Predefined Coding Systems, Detection of Textual Encoding, Coding Systems
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1323 @subsection Big5 and Shift-JIS Functions
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1324
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1325 These are special functions for working with the non-standard
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1326 Shift-JIS and Big5 encodings.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1327
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1328 @defun decode-shift-jis-char code
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1329 This function decodes a JIS X 0208 character in the Shift-JIS coding system.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1330 @var{code} is the character code in Shift-JIS as a cons of two bytes.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1331 The corresponding character is returned.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1332 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1333
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1334 @defun encode-shift-jis-char character
576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1335 This function encodes a JIS X 0208 character @var{character} in the
576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1336 Shift-JIS coding system. The corresponding character code in Shift-JIS
576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1337 is returned as a cons of two bytes.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1338 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1339
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1340 @defun decode-big5-char code
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1341 This function decodes a character @var{code} in the Big5 coding system.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1342 @var{code} is the character code in Big5. The corresponding character
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1343 is returned.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1344 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1345
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1346 @defun encode-big5-char character
576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1347 This function encodes the Big5 character @var{character} in the Big5
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1348 coding system. The corresponding character code in Big5 is returned.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1349 @end defun
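
As a hedged illustration, the round trips below decode a two-byte code
and encode the resulting character again. The Shift-JIS bytes 0x82 0xA0
(hiragana ``a'') and the Big5 bytes 0xA4 0x40 (the ideograph for
``one'') are sample values chosen for this sketch. It also assumes that
Big5 codes, like Shift-JIS codes, are passed as a cons of two bytes,
which the descriptions above do not state explicitly.

@example
;; Shift-JIS: decode a byte pair, then encode the character again.
(let ((ch (decode-shift-jis-char '(#x82 . #xa0))))
  (encode-shift-jis-char ch))

;; Big5: same round trip.
;; Assumption: the code is a cons of two bytes.
(let ((ch (decode-big5-char '(#xa4 . #x40))))
  (encode-big5-char ch))
@end example

If the assumptions hold, each form should return the byte pair it
started from.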
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1350
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1351 @node Predefined Coding Systems, , Big5 and Shift-JIS Functions, Coding Systems
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1352 @subsection Coding Systems Implemented
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1353
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1354 MULE initializes most of the commonly used coding systems at XEmacs's
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1355 startup. A few others are initialized only when the relevant language
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1356 environment is selected and support libraries are loaded. (NB: The
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1357 following list is based on XEmacs 21.2.19, the development branch at the
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1358 time of writing. The list may be somewhat different for other
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1359 versions. Recent versions of GNU Emacs 20 implement a few more rare
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1360 coding systems; work is being done to port these to XEmacs.)
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1361
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1362 Unfortunately, there is not a consistent naming convention for character
576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1363 sets, and for practical purposes coding systems often take their name
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1364 from their principal character sets (ASCII, KOI8-R, Shift JIS). Others
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1365 take their names from the coding system (ISO-2022-JP, EUC-KR), and a few
576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1366 from their non-text usages (internal, binary). To provide for this, and
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1367 for the fact that many coding systems have several common names, an
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1368 aliasing system is provided. Finally, some effort has been made to use
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1369 names that are registered as MIME charsets (this is why the name
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1370 @code{shift_jis} contains that un-Lisp-y underscore).
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1371
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1372 There is a systematic naming convention regarding end-of-line (EOL)
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1373 conventions for different systems. A coding system whose name ends in
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1374 @samp{-unix} forces the assumption that lines are broken by newlines (0x0A).
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1375 A coding system whose name ends in @samp{-mac} forces the assumption that
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1376 lines are broken by ASCII CRs (0x0D). A coding system whose name ends
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1377 in @samp{-dos} forces the assumption that lines are broken by CRLF sequences
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1378 (0x0D 0x0A). These subsidiary coding systems are automatically derived
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1379 from a base coding system. Use of the base coding system implies
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1380 autodetection of the text file convention. (Because the @samp{-unix},
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1381 @samp{-mac}, and @samp{-dos} variants are derived from a base system, they show up
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1382 as ``aliases'' in @code{list-coding-systems}.) These subsidiaries have a
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1383 consistent modeline indicator as well. @samp{-dos} coding systems have @samp{:T}
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1384 appended to their modeline indicator, while @samp{-mac} coding systems have
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1385 @samp{:t} appended (e.g., @samp{ISO8:t} for @code{iso-2022-8-mac}).
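
The naming convention can be explored from Lisp. The sketch below is
illustrative only: it assumes that @code{find-coding-system},
@code{coding-system-eol-type}, and the variable
@code{coding-system-for-read} are available in your XEmacs, and the file
name is hypothetical.

@example
;; Non-nil if the derived subsidiary exists.
(find-coding-system 'iso-2022-8-dos)

;; EOL convention forced by a subsidiary, versus its base system,
;; which leaves the convention to autodetection.
(coding-system-eol-type 'iso-2022-8-dos)
(coding-system-eol-type 'iso-2022-8)

;; Force CRLF line breaks when reading a (hypothetical) file.
(let ((coding-system-for-read 'iso-2022-8-dos))
  (insert-file-contents "example-crlf.txt"))
@end example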
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1386
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1387 In the following table, each coding system is given with its mode line
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1388 indicator in parentheses. Non-textual coding systems are listed first,
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1389 followed by textual coding systems and their aliases. (The subsidiary
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1390 modeline indicators @samp{:T} and @samp{:t} are omitted from the
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1391 table of coding systems.)
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1392
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1393 @c ### SJT 1999-08-23 Maybe should order these by language? Definitely
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1394 @c need language usage for the ISO-8859 family.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1395
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1396 Note that although true coding system aliases have been implemented for
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1397 XEmacs 21.2, the coding system initialization has not yet been converted
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1398 as of 21.2.19. So coding systems described as aliases have the same
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1399 properties as the aliased coding system, but will not be equal as Lisp
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1400 objects.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1401
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1402 @table @code
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1403
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1404 @item automatic-conversion
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1405 @itemx undecided
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1406 @itemx undecided-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1407 @itemx undecided-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1408 @itemx undecided-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1409
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1410 Modeline indicator: @code{Auto}. A type @code{undecided} coding system.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1411 Attempts to determine an appropriate coding system from file contents or
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1412 the environment.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1413
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1414 @item raw-text
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1415 @itemx no-conversion
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1416 @itemx raw-text-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1417 @itemx raw-text-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1418 @itemx raw-text-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1419 @itemx no-conversion-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1420 @itemx no-conversion-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1421 @itemx no-conversion-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1422
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1423 Modeline indicator: @code{Raw}. A type @code{no-conversion} coding system,
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1424 which converts only line-break codes. An implementation quirk means
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1425 that this coding system is also used for ISO8859-1.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1426
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1427 @item binary
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1428 Modeline indicator: @code{Binary}. A type @code{no-conversion} coding
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1429 system which does no character coding or EOL conversions. An alias for
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1430 @code{raw-text-unix}.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1431
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1432 @item alternativnyj
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1433 @itemx alternativnyj-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1434 @itemx alternativnyj-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1435 @itemx alternativnyj-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1436
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1437 Modeline indicator: @code{Cy.Alt}. A type @code{ccl} coding system used for
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1438 Alternativnyj, an encoding of the Cyrillic alphabet.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1439
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1440 @item big5
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1441 @itemx big5-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1442 @itemx big5-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1443 @itemx big5-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1444
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1445 Modeline indicator: @code{Zh/Big5}. A type @code{big5} coding system used for
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1446 BIG5, the most common encoding of traditional Chinese as used in Taiwan.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1447
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1448 @item cn-gb-2312
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1449 @itemx cn-gb-2312-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1450 @itemx cn-gb-2312-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1451 @itemx cn-gb-2312-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1452
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1453 Modeline indicator: @code{Zh-GB/EUC}. A type @code{iso2022} coding system used
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1454 for simplified Chinese (as used in the People's Republic of China), with
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1455 the @code{ascii} (G0), @code{chinese-gb2312} (G1), and @code{sisheng}
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1456 (G2) character sets initially designated. Chinese EUC (Extended Unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1457 Code).
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1458
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1459 @item ctext-hebrew
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1460 @itemx ctext-hebrew-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1461 @itemx ctext-hebrew-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1462 @itemx ctext-hebrew-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1463
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1464 Modeline indicator: @code{CText/Hbrw}. A type @code{iso2022} coding system
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1465 with the @code{ascii} (G0) and @code{hebrew-iso8859-8} (G1) character
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1466 sets initially designated for Hebrew.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1467
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1468 @item ctext
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1469 @itemx ctext-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1470 @itemx ctext-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1471 @itemx ctext-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1472
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1473 Modeline indicator: @code{CText}. A type @code{iso2022} 8-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1474 with the @code{ascii} (G0) and @code{latin-iso8859-1} (G1) character
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1475 sets initially designated. X11 Compound Text Encoding. Often
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1476 mistakenly recognized instead of EUC encodings; usual cause is
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1477 inappropriate setting of @code{coding-priority-list}.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1478
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1479 @item escape-quoted
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1480
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1481 Modeline indicator: @code{ESC/Quot}. A type @code{iso2022} 8-bit coding
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1482 system with the @code{ascii} (G0) and @code{latin-iso8859-1} (G1)
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1483 character sets initially designated and escape quoting. Unix EOL
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1484 conversion (i.e., no conversion). It is used for @file{.elc} files.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1485
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1486 @item euc-jp
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1487 @itemx euc-jp-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1488 @itemx euc-jp-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1489 @itemx euc-jp-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1490
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1491 Modeline indicator: @code{Ja/EUC}. A type @code{iso2022} 8-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1492 with @code{ascii} (G0), @code{japanese-jisx0208} (G1),
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1493 @code{katakana-jisx0201} (G2), and @code{japanese-jisx0212} (G3)
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1494 initially designated. Japanese EUC (Extended Unix Code).
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1495
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1496 @item euc-kr
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1497 @itemx euc-kr-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1498 @itemx euc-kr-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1499 @itemx euc-kr-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1500
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1501 Modeline indicator: @code{ko/EUC}. A type @code{iso2022} 8-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1502 with @code{ascii} (G0) and @code{korean-ksc5601} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1503 designated. Korean EUC (Extended Unix Code).
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1504
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1505 @item hz-gb-2312
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1506 Modeline indicator: @code{Zh-GB/Hz}. A type @code{no-conversion} coding
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1507 system with Unix EOL convention (i.e., no conversion) using
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1508 post-read-decode and pre-write-encode functions to translate the Hz/ZW
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1509 coding system used for Chinese.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1510
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1511 @item iso-2022-7bit
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1512 @itemx iso-2022-7bit-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1513 @itemx iso-2022-7bit-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1514 @itemx iso-2022-7bit-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1515 @itemx iso-2022-7
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1516
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1517 Modeline indicator: @code{ISO7}. A type @code{iso2022} 7-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1518 with @code{ascii} (G0) initially designated. Other character sets must
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1519 be explicitly designated to be used.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1520
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1521 @item iso-2022-7bit-ss2
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1522 @itemx iso-2022-7bit-ss2-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1523 @itemx iso-2022-7bit-ss2-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1524 @itemx iso-2022-7bit-ss2-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1525
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1526 Modeline indicator: @code{ISO7/SS}. A type @code{iso2022} 7-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1527 with @code{ascii} (G0) initially designated. Other character sets must
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1528 be explicitly designated to be used. SS2 is used to invoke a
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1529 96-charset, one character at a time.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1530
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1531 @item iso-2022-8
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1532 @itemx iso-2022-8-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1533 @itemx iso-2022-8-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1534 @itemx iso-2022-8-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1535
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1536 Modeline indicator: @code{ISO8}. A type @code{iso2022} 8-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1537 with @code{ascii} (G0) and @code{latin-iso8859-1} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1538 designated. Other character sets must be explicitly designated to be
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1539 used. No single-shift or locking-shift.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1540
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1541 @item iso-2022-8bit-ss2
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1542 @itemx iso-2022-8bit-ss2-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1543 @itemx iso-2022-8bit-ss2-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1544 @itemx iso-2022-8bit-ss2-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1545
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1546 Modeline indicator: @code{ISO8/SS}. A type @code{iso2022} 8-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1547 with @code{ascii} (G0) and @code{latin-iso8859-1} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1548 designated. Other character sets must be explicitly designated to be
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1549 used. SS2 is used to invoke a 96-charset, one character at a time.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1550
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1551 @item iso-2022-int-1
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1552 @itemx iso-2022-int-1-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1553 @itemx iso-2022-int-1-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1554 @itemx iso-2022-int-1-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1555
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1556 Modeline indicator: @code{INT-1}. A type @code{iso2022} 7-bit coding system
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1557 with @code{ascii} (G0) and @code{korean-ksc5601} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1558 designated. ISO-2022-INT-1.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1559
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1560 @item iso-2022-jp-1978-irv
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1561 @itemx iso-2022-jp-1978-irv-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1562 @itemx iso-2022-jp-1978-irv-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1563 @itemx iso-2022-jp-1978-irv-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1564
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1565 Modeline indicator: @code{Ja-78/7bit}. A type @code{iso2022} 7-bit coding
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1566 system. For compatibility with old Japanese terminals; if you need to
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1567 know, look at the source.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1568
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1569 @item iso-2022-jp
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1570 @itemx iso-2022-jp-2 (ISO7/SS)
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1571 @itemx iso-2022-jp-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1572 @itemx iso-2022-jp-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1573 @itemx iso-2022-jp-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1574 @itemx iso-2022-jp-2-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1575 @itemx iso-2022-jp-2-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1576 @itemx iso-2022-jp-2-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1577
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1578 Modeline indicator: @code{MULE/7bit}. A type @code{iso2022} 7-bit coding
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1579 system with @code{ascii} (G0) initially designated, and complex
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1580 specifications to ensure backward compatibility with old Japanese
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1581 systems. Used for communication with mail and news in Japan. The ``-2''
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1582 versions also use SS2 to invoke a 96-charset one character at a time.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1583
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1584 @item iso-2022-kr
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1585 Modeline indicator: @code{Ko/7bit}. A type @code{iso2022} 7-bit coding
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1586 system with @code{ascii} (G0) and @code{korean-ksc5601} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1587 designated. Used for e-mail in Korea.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1588
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1589 @item iso-2022-lock
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1590 @itemx iso-2022-lock-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1591 @itemx iso-2022-lock-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1592 @itemx iso-2022-lock-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1593
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1594 Modeline indicator: @code{ISO7/Lock}. A type @code{iso2022} 7-bit coding
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1595 system with @code{ascii} (G0) initially designated, using Locking-Shift
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1596 to invoke a 96-charset.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1597
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1598 @item iso-8859-1
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1599 @itemx iso-8859-1-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1600 @itemx iso-8859-1-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1601 @itemx iso-8859-1-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1602
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1603 For implementation reasons, this is not a type @code{iso2022} coding system,
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1604 but rather an alias for the @code{raw-text} coding system.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1605
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1606 @item iso-8859-2
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1607 @itemx iso-8859-2-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1608 @itemx iso-8859-2-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1609 @itemx iso-8859-2-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1610
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1611 Modeline indicator: @code{MIME/Ltn-2}. A type @code{iso2022} coding
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1612 system with @code{ascii} (G0) and @code{latin-iso8859-2} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1613 invoked.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1614
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1615 @item iso-8859-3
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1616 @itemx iso-8859-3-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1617 @itemx iso-8859-3-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1618 @itemx iso-8859-3-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1619
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1620 Modeline indicator: @code{MIME/Ltn-3}. A type @code{iso2022} coding system
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1621 with @code{ascii} (G0) and @code{latin-iso8859-3} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1622 invoked.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1623
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1624 @item iso-8859-4
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1625 @itemx iso-8859-4-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1626 @itemx iso-8859-4-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1627 @itemx iso-8859-4-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1628
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1629 Modeline indicator: @code{MIME/Ltn-4}. A type @code{iso2022} coding system
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1630 with @code{ascii} (G0) and @code{latin-iso8859-4} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1631 invoked.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1632
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1633 @item iso-8859-5
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1634 @itemx iso-8859-5-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1635 @itemx iso-8859-5-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1636 @itemx iso-8859-5-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1637
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1638 Modeline indicator: @code{ISO8/Cyr}. A type @code{iso2022} coding system with
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1639 @code{ascii} (G0) and @code{cyrillic-iso8859-5} (G1) initially invoked.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1640
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1641 @item iso-8859-7
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1642 @itemx iso-8859-7-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1643 @itemx iso-8859-7-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1644 @itemx iso-8859-7-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1645
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1646 Modeline indicator: @code{Grk}. A type @code{iso2022} coding system with
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1647 @code{ascii} (G0) and @code{greek-iso8859-7} (G1) initially invoked.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1648
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1649 @item iso-8859-8
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1650 @itemx iso-8859-8-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1651 @itemx iso-8859-8-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1652 @itemx iso-8859-8-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1653
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1654 Modeline indicator: @code{MIME/Hbrw}. A type @code{iso2022} coding system with
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1655 @code{ascii} (G0) and @code{hebrew-iso8859-8} (G1) initially invoked.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1656
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1657 @item iso-8859-9
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1658 @itemx iso-8859-9-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1659 @itemx iso-8859-9-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1660 @itemx iso-8859-9-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1661
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1662 Modeline indicator: @code{MIME/Ltn-5}. A type @code{iso2022} coding system
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1663 with @code{ascii} (G0) and @code{latin-iso8859-9} (G1) initially
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1664 invoked.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1665
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1666 @item koi8-r
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1667 @itemx koi8-r-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1668 @itemx koi8-r-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1669 @itemx koi8-r-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1670
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1671 Modeline indicator: @code{KOI8}. A type @code{ccl} coding-system used for
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1672 KOI8-R, an encoding of the Cyrillic alphabet.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1673
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1674 @item shift_jis
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1675 @itemx shift_jis-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1676 @itemx shift_jis-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1677 @itemx shift_jis-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1678
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1679 Modeline indicator: @code{Ja/SJIS}. A type @code{shift-jis} coding-system
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1680 implementing the Shift-JIS encoding for Japanese. The underscore is to
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1681 conform to the MIME charset implementing this encoding.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1682
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1683 @item tis-620
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1684 @itemx tis-620-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1685 @itemx tis-620-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1686 @itemx tis-620-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1687
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1688 Modeline indicator: @code{TIS620}. A type @code{ccl} encoding for Thai. The
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1689 external encoding is defined by TIS620, the internal encoding is
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1690 peculiar to MULE, and called @code{thai-xtis}.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1691
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1692 @item viqr
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1693
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1694 Modeline indicator: @code{VIQR}. A type @code{no-conversion} coding
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1695 system with Unix EOL convention (i.e., no conversion) using
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1696 post-read-decode and pre-write-encode functions to translate the VIQR
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1697 coding system for Vietnamese.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1698
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1699 @item viscii
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1700 @itemx viscii-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1701 @itemx viscii-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1702 @itemx viscii-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1703
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1704 Modeline indicator: @code{VISCII}. A type @code{ccl} coding-system used
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1705 for VISCII 1.1 for Vietnamese. Differs slightly from VSCII; VISCII is
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1706 given priority by XEmacs.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1707
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1708 @item vscii
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1709 @itemx vscii-dos
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1710 @itemx vscii-mac
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1711 @itemx vscii-unix
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1712
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1713 Modeline indicator: @code{VSCII}. A type @code{ccl} coding-system used
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1714 for VSCII 1.1 for Vietnamese. Differs slightly from VISCII, which is
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1715 given priority by XEmacs. Use
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1716 @code{(prefer-coding-system 'vietnamese-vscii)} to give priority to VSCII.
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1717
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1718 @end table
abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1719
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1720 @node CCL, Category Tables, Coding Systems, MULE
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1721 @section CCL
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1722
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1723 CCL (Code Conversion Language) is a simple structured programming
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1724 language designed for character coding conversions. A CCL program is
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1725 compiled to CCL code (represented by a vector of integers) and executed
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1726 by the CCL interpreter embedded in Emacs. The CCL interpreter
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1727 implements a virtual machine with 8 registers called @code{r0}, ...,
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1728 @code{r7}, a number of control structures, and some I/O operators. Take
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1729 care when using registers @code{r0} (used in implicit @dfn{set}
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1730 statements) and especially @code{r7} (used internally by several
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1731 statements and operations, especially for multiple return values and I/O
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1732 operations).
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1733
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1734 CCL is used for code conversion during process I/O and file I/O for
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1735 non-ISO2022 coding systems. (It is the only way for a user to specify a
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1736 code conversion function.) It is also used for calculating the code
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1737 point of an X11 font from a character code. However, since CCL is
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1738 designed as a powerful programming language, it can be used for more
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1739 generic calculation where efficiency is demanded. A combination of
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1740 three or more arithmetic operations can be calculated faster by CCL than
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1741 by Emacs Lisp.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1742
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1743 @strong{Warning:} The code in @file{src/mule-ccl.c} and
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1744 @file{$packages/lisp/mule-base/mule-ccl.el} is the definitive
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1745 description of CCL's semantics. The previous version of this section
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1746 contained several typos and obsolete names left from earlier versions of
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1747 MULE, and many may remain. (I am not an experienced CCL programmer; the
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1748 few who know CCL well find writing English painful.)
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1749
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1750 A CCL program transforms an input data stream into an output data
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1751 stream. The input stream, held in a buffer of constant bytes, is left
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1752 unchanged. The buffer may be filled by an external input operation,
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1753 taken from an Emacs buffer, or taken from a Lisp string. The output
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1754 buffer is a dynamic array of bytes, which can be written by an external
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1755 output operation, inserted into an Emacs buffer, or returned as a Lisp
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1756 string.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1757
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1758 A CCL program is a (Lisp) list containing two or three members. The
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1759 first member is the @dfn{buffer magnification}, which indicates the
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1760 required minimum size of the output buffer as a multiple of the input
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1761 buffer. It is followed by the @dfn{main block} which executes while
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1762 there is input remaining, and an optional @dfn{EOF block} which is
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1763 executed when the input is exhausted. Both the main block and the EOF
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1764 block are CCL blocks.
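
The three-member structure described above can be sketched as follows.
This is an untested illustration, not code taken from the XEmacs sources:
the name @code{my-ccl-copy-then-newline} is hypothetical, and it assumes
that @code{define-ccl-program} accepts a quoted program list as its second
argument.  The main block contains an explicit loop that copies each input
byte to the output; the optional EOF block writes one extra byte (10, a
newline) once the input is exhausted.

@example
(define-ccl-program my-ccl-copy-then-newline
  '(1                   ; buffer magnification: output about as large as input
    ;; Main block: read a byte into r0, write it back out, and repeat.
    (loop
      (read r0)
      (write r0)
      (repeat))
    ;; EOF block: executed when the input is exhausted.
    (write 10)))
@end example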
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1765
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1766 A @dfn{CCL block} is either a CCL statement or a list of CCL statements.
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1767 A @dfn{CCL statement} is either a @dfn{set statement} (either an integer
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1768 or an @dfn{assignment}, which is a list of a register to receive the
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1769 assignment, an assignment operator, and an expression) or a @dfn{control
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1770 statement} (a list starting with a keyword, whose allowable syntax
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1771 depends on the keyword).
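
As a rough illustration of blocks and statements, the sketch below
combines control statements (@code{loop}, @code{if}, @code{read},
@code{write}, @code{repeat}) with a set statement (@code{(r0 -= 32)}).
The program name @code{my-ccl-upcase-ascii} is hypothetical and the code
is untested; it is meant to convert ASCII lower-case letters (codes 97
through 122) to upper case and pass every other byte through unchanged.
Each @code{if} here has a single statement as its block and no
@dfn{else} block.

@example
(define-ccl-program my-ccl-upcase-ascii
  '(1
    (loop
      (read r0)
      ;; Control statement whose block is another control statement.
      (if (r0 >= 97)
          (if (r0 <= 122)
              ;; Set statement using an assignment operator.
              (r0 -= 32)))
      (write r0)
      (repeat))))

;; Possible use, assuming a nine-element status vector (r0 ... r7
;; plus one extra slot) as expected by `ccl-execute-on-string':
(ccl-execute-on-string my-ccl-upcase-ascii (make-vector 9 0) "abc")
@end example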
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1772
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1773 @menu
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1774 * CCL Syntax:: CCL program syntax in BNF notation.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1775 * CCL Statements:: Semantics of CCL statements.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1776 * CCL Expressions:: Operators and expressions in CCL.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1777 * Calling CCL:: Running CCL programs.
2640 a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	1778 * CCL Example:: A trivial program to transform the Web's URL encoding.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1779 @end menu
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1780
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1781 @node CCL Syntax, CCL Statements, , CCL
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1782 @comment Node, Next, Previous, Up
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1783 @subsection CCL Syntax
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1784
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1785 The full syntax of a CCL program in BNF notation:
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1786
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1787 @format
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1788 CCL_PROGRAM :=
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1789 (BUFFER_MAGNIFICATION
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1790 CCL_MAIN_BLOCK
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1791 [ CCL_EOF_BLOCK ])
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1792
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1793 BUFFER_MAGNIFICATION := integer
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1794 CCL_MAIN_BLOCK := CCL_BLOCK
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1795 CCL_EOF_BLOCK := CCL_BLOCK
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1796
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1797 CCL_BLOCK :=
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1798 STATEMENT | (STATEMENT [STATEMENT ...])
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1799 STATEMENT :=
2367 ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben] ben parents: 1734 diff changeset	1800 SET | IF | BRANCH | LOOP | REPEAT | BREAK | READ | WRITE | CALL
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben] ben parents: 1734 diff changeset	1801 | TRANSLATE | MAP | END
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1802
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1803 SET :=
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1804 (REG = EXPRESSION)
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1805 | (REG ASSIGNMENT_OPERATOR EXPRESSION)
2367 ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben] ben parents: 1734 diff changeset	1806 | INT-OR-CHAR
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1807
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1808 EXPRESSION := ARG | (EXPRESSION OPERATOR ARG)
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1809
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1810 IF := (if EXPRESSION CCL_BLOCK [CCL_BLOCK])
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1811 BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1812 LOOP := (loop STATEMENT [STATEMENT ...])
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1813 BREAK := (break)
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1814 REPEAT :=
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1815 (repeat)
2367 ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben] ben parents: 1734 diff changeset	1816 | (write-repeat [REG | INT-OR-CHAR | string])
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben] ben parents: 1734 diff changeset	1817 | (write-read-repeat REG [INT-OR-CHAR | ARRAY])
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1818 READ :=
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1819 (read REG ...)
2367 ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben] ben parents: 1734 diff changeset	1820 | (read-if (REG OPERATOR ARG) CCL_BLOCK [CCL_BLOCK])
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1821 | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1822 WRITE :=
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1823 (write REG ...)
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1824 | (write EXPRESSION)
2367 ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben] ben parents: 1734 diff changeset	1825 | (write INT-OR-CHAR) | (write string) | (write REG ARRAY)
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1826 | string
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1827 CALL := (call ccl-program-name)
3439 d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1828
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1829
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1830 TRANSLATE := ;; Not implemented under XEmacs, except mule-to-unicode and
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1831 ;; unicode-to-mule.
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1832 (translate-character REG(table) REG(charset) REG(codepoint))
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1833 | (translate-character SYMBOL REG(charset) REG(codepoint))
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1834 | (mule-to-unicode REG(charset) REG(codepoint))
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1835 | (unicode-to-mule REG(unicode,code) REG(CHARSET))
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1836
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1837 END := (end)
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1838
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1839 REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
2367 ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben] ben parents: 1734 diff changeset	1840 ARG := REG | INT-OR-CHAR
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1841 OPERATOR :=
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1842 + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1843 | < | > | == | <= | >= | != | de-sjis | en-sjis
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1844 ASSIGNMENT_OPERATOR :=
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1845 += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
2367 ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben] ben parents: 1734 diff changeset	1846 ARRAY := '[' INT-OR-CHAR ... ']'
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben] ben parents: 1734 diff changeset	1847 INT-OR-CHAR := integer | character
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben] ben parents: 1734 diff changeset	1848
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1849 @end format
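
One consequence of the rule for EXPRESSION in the grammar above is that
a nested expression may appear only as the left operand; the right
operand must be a register or an integer (or character).  The following
untested sketch, with the hypothetical name
@code{my-ccl-low-nibble-digit}, shows a nested expression in that
position.  Because compound expressions are evaluated with the help of
register @code{r7}, the sketch avoids keeping its own data there.

@example
(define-ccl-program my-ccl-low-nibble-digit
  '(1
    (loop
      (read r0)
      ;; (EXPRESSION OPERATOR ARG): the nested expression (r0 & 15)
      ;; is the left operand, and the right operand 48 is an integer.
      ;; A form such as (48 + (r0 & 15)) does not match the grammar.
      (r1 = ((r0 & 15) + 48))
      (write r1)
      (repeat))))
@end example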
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1850
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1851 @node CCL Statements, CCL Expressions, CCL Syntax, CCL
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1852 @comment Node, Next, Previous, Up
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1853 @subsection CCL Statements
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1854
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1855 The Emacs Code Conversion Language provides the following statement
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1856 types: @dfn{set}, @dfn{if}, @dfn{branch}, @dfn{loop}, @dfn{repeat},
3439 d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1857 @dfn{break}, @dfn{read}, @dfn{write}, @dfn{call}, @dfn{translate} and
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1858 @dfn{end}.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1859
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1860 @heading Set statement:
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1861
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1862 The @dfn{set} statement has three variants with the syntaxes
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1863 @samp{(@var{reg} = @var{expression})},
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1864 @samp{(@var{reg} @var{assignment_operator} @var{expression})}, and
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1865 @samp{@var{integer}}. The assignment operator variation of the
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1866 @dfn{set} statement works the same way as the corresponding C expression
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1867 statement does. The assignment operators are @code{+=}, @code{-=},
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1868 @code{*=}, @code{/=}, @code{%=}, @code{&=}, @code{|=}, @code{^=},
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1869 @code{<<=}, and @code{>>=}, and they have the same meanings as in C. A
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1870 ``naked integer'' @var{integer} is equivalent to a @dfn{set} statement of
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1871 the form @code{(r0 = @var{integer})}.
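
As a small illustration of the three variants, a CCL block might contain
statements such as the following.  This is only a sketch built from the
syntax described above; the registers and constants are arbitrary.

@example
(r0 = (r1 + 2))    ; plain assignment: r0 = r1 + 2
(r0 += 1)          ; assignment operator: like r0 += 1 in C
65                 ; naked integer: same as (r0 = 65)
@end example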
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1872
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1873 @heading I/O statements:
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1874
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1875 The @dfn{read} statement takes one or more registers as arguments. It
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1876 reads one byte (a C char) from the input into each register in turn.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1877
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1878 The @dfn{write} statement takes several forms. In the form @samp{(write @var{reg}
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1879 ...)} it takes one or more registers as arguments and writes each in
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1880 turn to the output. The integer in a register (interpreted as an
2367 ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben] ben parents: 1734 diff changeset	1881 Ichar) is encoded to multibyte form (i.e., Ibytes) and written to the
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1882 current output buffer. If it is less than 256, it is written as is.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1883 The forms @samp{(write @var{expression})} and @samp{(write
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1884 @var{integer})} are treated analogously. The form @samp{(write
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1885 @var{string})} writes the constant string to the output. A
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1886 ``naked string'' @samp{@var{string}} is equivalent to the statement @samp{(write
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1887 @var{string})}. The form @samp{(write @var{reg} @var{array})} writes
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1888 the @var{reg}th element of the @var{array} to the output.
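
The forms described above might be written as follows.  These are
illustrative fragments only, not a complete program; the registers,
string, and array contents are arbitrary choices.

@example
(read r0 r1)            ; read one byte into r0, then one into r1
(write r1 r0)           ; write r1, then r0
(write (r0 + 1))        ; write the value of an expression
(write "ok")            ; write a constant string
"ok"                    ; naked string: same as (write "ok")
(write r2 [97 98 99])   ; write the r2-th element of the array
@end example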
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1889
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1890 @heading Conditional statements:
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1891
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1892 The @dfn{if} statement takes an @var{expression}, a @var{CCL block}, and
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1893 an optional @var{second CCL block} as arguments. If the
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1894 @var{expression} evaluates to non-zero, the first @var{CCL block} is
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1895 executed. Otherwise, if there is a @var{second CCL block}, it is
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1896 executed.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1897
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1898 The @dfn{read-if} variant of the @dfn{if} statement takes an
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1899 @var{expression}, a @var{CCL block}, and an optional @var{second CCL
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1900 block} as arguments. The @var{expression} must have the form
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1901 @code{(@var{reg} @var{operator} @var{operand})} (where @var{operand} is
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1902 a register or an integer). The @code{read-if} statement first reads
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1903 from the input into the first register operand in the @var{expression},
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1904 then conditionally executes a CCL block just as the @code{if} statement
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1905 does.
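
For instance, the following sketch replaces a space (code 32) with a
plus sign (code 43) and passes any other value through unchanged.  The
second form does the same, but reads @code{r0} from the input first.
The constants are chosen purely for illustration.

@example
(if (r0 == 32)
    (write 43)          ; r0 is a space: write a plus sign
  (write r0))           ; otherwise write r0 as is

(read-if (r0 == 32)     ; read into r0, then test it
    (write 43)
  (write r0))
@end example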
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1906
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1907 The @dfn{branch} statement takes an @var{expression} and one or more CCL
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1908 blocks as arguments. The CCL blocks are treated as a zero-indexed
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1909 array, and the @code{branch} statement uses the @var{expression} as the
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1910 index of the CCL block to execute. Null CCL blocks may be used as
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1911 no-ops, continuing execution with the statement following the
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1912 @code{branch} statement in the containing CCL block. Out-of-range
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1913 values for the @var{expression} are also treated as no-ops.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1914
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1915 The @dfn{read-branch} variant of the @dfn{branch} statement takes an
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1916 @var{register} and one or more CCL blocks as
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1917 arguments. The @code{read-branch} statement first reads from
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1918 the input into the @var{register}, then conditionally executes a CCL
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1919 block just as the @code{branch} statement does.
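
A sketch of both variants, using arbitrary strings as the block bodies:

@example
(branch r0
  (write "zero")        ; selected when r0 is 0
  (write "one")         ; selected when r0 is 1
  (write "two"))        ; selected when r0 is 2
;; Any other value of r0 selects no block, and execution
;; continues with the statement after the branch.

(read-branch r0         ; read into r0, then branch on it
  (write "zero")
  (write "one"))
@end example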
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1920
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1921 @heading Loop control statements:
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1922
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1923 The @dfn{loop} statement creates a block with an implied jump from the
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1924 end of the block back to its head. The loop is exited on a @code{break}
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1925 statement, and continued without executing the tail by a @code{repeat}
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1926 statement.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1927
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1928 The @dfn{break} statement, written @samp{(break)}, terminates the
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1929 current loop and continues with the next statement in the current
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1930 block.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1931
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1932 The @dfn{repeat} statement has three variants, @code{repeat},
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1933 @code{write-repeat}, and @code{write-read-repeat}. Each continues the
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1934 current loop from its head, possibly after performing I/O.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1935 @code{repeat} takes no arguments and does no I/O before jumping.
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	1936 @code{write-repeat} takes a single argument (a register, an
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1937 integer, or a string), writes it to the output, then jumps.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1938 @code{write-read-repeat} takes one or two arguments. The first must
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1939 be a register. The second may be an integer or an array; if absent, it
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1940 is implicitly set to the first (register) argument.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1941 @code{write-read-repeat} writes its second argument to the output, then
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1942 reads from the input into the register, and finally jumps. See the
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1943 @code{write} and @code{read} statements for the semantics of the I/O
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1944 operations for each type of argument.
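
The following fragments sketch how these statements combine.  The first
copies input to output until it reads a zero byte; the second is the
conventional way to write a plain copy with @code{write-read-repeat}.
Both are illustrations assembled from the descriptions above, not code
taken from XEmacs.

@example
(loop
  (read r0)
  (if (r0 == 0)
      (break))          ; leave the loop on a zero byte
  (write r0)
  (repeat))             ; jump back to the head of the loop

((read r0)              ; prime r0 before entering the loop
 (loop
   (write-read-repeat r0)))   ; write r0, read the next byte, jump
@end example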
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1945
3439 d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1946 @heading Other statements:
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1947
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1948 The @dfn{call} statement, written @samp{(call @var{ccl-program-name})},
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1949 executes a CCL program as a subroutine. It does not return a value to
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1950 the caller, but can modify the register status.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1951
3439 d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1952 The @dfn{mule-to-unicode} statement translates an XEmacs character into a
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1953 UCS code point, using U+FFFD REPLACEMENT CHARACTER if the given XEmacs
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1954 character has no known corresponding code point. It takes two
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1955 arguments; the first is a register in which is stored the character set
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1956 ID of the character to be translated, and into which the UCS code is
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1957 stored. The second is a register which stores the XEmacs code of the
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1958 character in question; if it is from a multidimensional character set,
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1959 like most of the East Asian national sets, it's stored as @samp{((c1 <<
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1960 8) | c2)}, where @samp{c1} is the first code, and @samp{c2} the second.
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1961 (That is, as a single integer, the high-order eight bits of which encode
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1962 the first position code, and the low-order bits of which encode the
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1963 second.)
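
As a sketch, suppose @code{r0} already holds the character set ID and
@code{r2} and @code{r3} hold the two position codes (this register
assignment is an assumption made for the example).  The codes can be
packed with the @code{<8} operator before translating:

@example
(r1 = (r2 <8 r3))         ; pack both position codes into r1
(mule-to-unicode r0 r1)   ; r0 now holds the UCS code point
@end example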
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1964
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1965 The @dfn{unicode-to-mule} statement translates a Unicode code point
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1966 (an integer) into an XEmacs character. Its first argument is a register
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1967 containing the UCS code point; the code for the corresponding character
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1968 will be written into this register, in the same format as for
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1969 @samp{mule-to-unicode}. The second argument is a register into which will
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1970 be written the character set ID of the converted character.
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan] aidan parents: 2818 diff changeset	1971
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1972 The @dfn{end} statement, written @samp{(end)}, terminates the CCL
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1973 program successfully, and returns to caller (which may be a CCL
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1974 program). It does not alter the status of the registers.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1975
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1976 @node CCL Expressions, Calling CCL, CCL Statements, CCL
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1977 @comment Node, Next, Previous, Up
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1978 @subsection CCL Expressions
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1979
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1980 CCL, unlike Lisp, uses infix expressions. The simplest CCL expressions
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1981 consist of a single @var{operand}, either a register (one of @code{r0},
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1982 ..., @code{r7}) or an integer. Complex expressions are lists of the
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1983 form @code{( @var{expression} @var{operator} @var{operand} )}. Unlike
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1984 C, assignments are not expressions.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1985
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	1986 In the following table, @var{X} is the target register for a @dfn{set}.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1987 In subexpressions, this is implicitly @code{r7}. This means that
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1988 @code{>8}, @code{//}, @code{de-sjis}, and @code{en-sjis} cannot be used
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1989 freely in subexpressions, since they return parts of their values in
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1990 @code{r7}. @var{Y} may be an expression, register, or integer, while
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1991 @var{Z} must be a register or an integer.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1992
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1993 @multitable @columnfractions .22 .14 .09 .55
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1994 @item Name @tab Operator @tab Code @tab C-like Description
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1995 @item CCL_PLUS @tab @code{+} @tab 0x00 @tab X = Y + Z
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1996 @item CCL_MINUS @tab @code{-} @tab 0x01 @tab X = Y - Z
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1997 @item CCL_MUL @tab @code{*} @tab 0x02 @tab X = Y * Z
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1998 @item CCL_DIV @tab @code{/} @tab 0x03 @tab X = Y / Z
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	1999 @item CCL_MOD @tab @code{%} @tab 0x04 @tab X = Y % Z
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2000 @item CCL_AND @tab @code{&} @tab 0x05 @tab X = Y & Z
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2001 @item CCL_OR @tab @code{|} @tab 0x06 @tab X = Y | Z
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2002 @item CCL_XOR @tab @code{^} @tab 0x07 @tab X = Y ^ Z
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2003 @item CCL_LSH @tab @code{<<} @tab 0x08 @tab X = Y << Z
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2004 @item CCL_RSH @tab @code{>>} @tab 0x09 @tab X = Y >> Z
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2005 @item CCL_LSH8 @tab @code{<8} @tab 0x0A @tab X = (Y << 8) | Z
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2006 @item CCL_RSH8 @tab @code{>8} @tab 0x0B @tab X = Y >> 8, r[7] = Y & 0xFF
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2007 @item CCL_DIVMOD @tab @code{//} @tab 0x0C @tab X = Y / Z, r[7] = Y % Z
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2008 @item CCL_LS @tab @code{<} @tab 0x10 @tab X = (Y < Z)
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2009 @item CCL_GT @tab @code{>} @tab 0x11 @tab X = (Y > Z)
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2010 @item CCL_EQ @tab @code{==} @tab 0x12 @tab X = (Y == Z)
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2011 @item CCL_LE @tab @code{<=} @tab 0x13 @tab X = (Y <= Z)
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2012 @item CCL_GE @tab @code{>=} @tab 0x14 @tab X = (Y >= Z)
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2013 @item CCL_NE @tab @code{!=} @tab 0x15 @tab X = (Y != Z)
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2014 @item CCL_ENCODE_SJIS @tab @code{en-sjis} @tab 0x16 @tab X = HIGHER_BYTE (SJIS (Y, Z))
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2015 @item @tab @tab @tab r[7] = LOWER_BYTE (SJIS (Y, Z))
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2016 @item CCL_DECODE_SJIS @tab @code{de-sjis} @tab 0x17 @tab X = HIGHER_BYTE (DE-SJIS (Y, Z))
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2017 @item @tab @tab @tab r[7] = LOWER_BYTE (DE-SJIS (Y, Z))
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2018 @end multitable
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2019
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	2020 The CCL operators are as in C, with the addition of CCL_LSH8, CCL_RSH8,
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2021 CCL_DIVMOD, CCL_ENCODE_SJIS, and CCL_DECODE_SJIS. The CCL_ENCODE_SJIS
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2022 and CCL_DECODE_SJIS operators treat their first and second operands as the high and
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2023 low bytes of a two-byte character code. (SJIS stands for Shift JIS, an
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2024 encoding of Japanese characters used by Microsoft. CCL_ENCODE_SJIS is a
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2025 complicated transformation of the Japanese standard JIS encoding to
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2026 Shift JIS. CCL_DECODE_SJIS is its inverse.) It is somewhat odd to
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2027 represent the SJIS operations in infix form.
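
A few set statements illustrate the expression forms in the table.
These are sketches with arbitrary registers and constants.  Note that
only the left operand may be nested: a form such as
@code{(r1 + (r2 * 3))} does not fit the syntax given above, because the
right operand must be a register or an integer.

@example
(r0 = ((r1 + r2) * 3))   ; left operand is itself an expression
(r0 = (r1 <8 r2))        ; r0 = (r1 << 8) | r2
(r0 = (r1 // 10))        ; r0 = r1 / 10, and r7 = r1 % 10
@end example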
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2028
2640 a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2029 @node Calling CCL, CCL Example, CCL Expressions, CCL
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2030 @comment Node, Next, Previous, Up
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2031 @subsection Calling CCL
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2032
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	2033 CCL programs are called automatically during Emacs buffer I/O when the
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2034 external representation has a coding system type of @code{shift-jis},
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2035 @code{big5}, or @code{ccl}. The program is specified by the coding
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2036 system (@pxref{Coding Systems}). You can also call CCL programs from
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2037 other CCL programs, and from Lisp using these functions:
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2038
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2039 @defun ccl-execute ccl-program status
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2040 Execute @var{ccl-program} with registers initialized by
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2041 @var{status}. @var{ccl-program} is a vector of compiled CCL code
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	2042 created by @code{ccl-compile}. It is an error for the program to try to
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2043 execute a CCL I/O command. @var{status} must be a vector of nine
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2044 values, specifying the initial value for the R0, R1 .. R7 registers and
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2045 for the instruction counter IC. A @code{nil} value for a register
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2046 initializer causes the register to be set to 0. A @code{nil} value for
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2047 the IC initializer causes execution to start at the beginning of the
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2048 program. When the program is done, @var{status} is modified (by
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2049 side-effect) to contain the ending values for the corresponding
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	2050 registers and IC.
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2051 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2052
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	2053 @defun ccl-execute-on-string ccl-program status string &optional continue
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2054 Execute @var{ccl-program} with initial @var{status} on
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2055 @var{string}. @var{ccl-program} is a vector of compiled CCL code
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2056 created by @code{ccl-compile}. @var{status} must be a vector of nine
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2057 values, specifying the initial value for the R0, R1 .. R7 registers and
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2058 for the instruction counter IC. A @code{nil} value for a register
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2059 initializer causes the register to be set to 0. A @code{nil} value for
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2060 the IC initializer causes execution to start at the beginning of the
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	2061 program. An optional fourth argument @var{continue}, if non-@code{nil}, causes
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2062 the IC to
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2063 remain on the unsatisfied read operation if the program terminates due
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2064 to exhaustion of the input buffer. Otherwise the IC is set to the end
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	2065 of the program. When the program is done, @var{status} is modified (by
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2066 side-effect) to contain the ending values for the corresponding
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2067 registers and IC. Returns the resulting string.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2068 @end defun
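
The following untested sketch shows how a call might look, assuming the
functions behave as described above.  The CCL program is intended to
copy its input to its output unchanged, and the input string
@code{"hello"} is arbitrary.

@example
(let ((status (make-vector 9 nil)))   ; R0 .. R7 and IC, all defaulted
  (ccl-execute-on-string
   (ccl-compile '(1 ((read r0)
                     (loop (write-read-repeat r0)))))
   status
   "hello"))
@end example

After the call, @code{status} can be inspected for the final register
values.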
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2069
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	2070 To call a CCL program from another CCL program, it must first be
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2071 registered:
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2072
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2073 @defun register-ccl-program name ccl-program
444 576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	2074 Register @var{name} for CCL program @var{ccl-program} in
576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	2075 @code{ccl-program-table}. @var{ccl-program} should be the compiled form of
576fb035e263 Import from CVS: tag r21-2-37 cvs parents: 442 diff changeset	2076 a CCL program, or @code{nil}. Return index number of the registered CCL
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2077 program.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2078 @end defun
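
As an untested sketch, a subroutine might be registered and then called
from a second program as follows.  The name @code{my-increment} is
hypothetical and chosen only for this example.

@example
;; Register a one-statement program under a (hypothetical) name.
(register-ccl-program 'my-increment
                      (ccl-compile '(1 (r0 += 1))))

;; Compile a second program that calls it as a subroutine.
(ccl-compile '(1 ((r0 = 41)
                  (call my-increment))))
@end example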
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2079
442 abe6d1db359e Import from CVS: tag r21-2-36 cvs parents: 440 diff changeset	2080 Information about the processor time used by the CCL interpreter can be
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2081 obtained using these functions:
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2082
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2083 @defun ccl-elapsed-time
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2084 Returns the elapsed processor time of the CCL interpreter as a cons of
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2085 user and system time, as
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2086 floating point numbers measured in seconds. If only one
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2087 overall value can be determined, the return value will be a cons of that
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2088 value and 0.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2089 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2090
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2091 @defun ccl-reset-elapsed-time
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2092 Resets the CCL interpreter's internal elapsed time registers.
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2093 @end defun
3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2094
2640 a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2095 @node CCL Example, , Calling CCL, CCL
428 3ecd8885ac67 Import from CVS: tag r21-2-22 cvs parents: diff changeset	2096 @comment Node, Next, Previous, Up
2640 a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2097 @subsection CCL Example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2098
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2099 In this section, we describe the implementation of a trivial coding
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2100 system to transform from the Web's URL encoding to XEmacs' internal
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2101 coding. Many people will have been first exposed to URL encoding when
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2102 they saw ``%20'' where they expected a space in a file's name on their
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2103 local hard disk; this can happen when a browser saves a file from the
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2104 web and doesn't encode the name, as passed from the server, properly.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2105
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2106 URL encoding itself is underspecified with regard to encodings beyond
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2107 ASCII. The relevant document, RFC 1738, explicitly doesn't give any
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2108 information on how to encode non-ASCII characters, and the ``obvious''
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2109 way---use the %xx values for the octets of the eight bit MIME character
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2110 set in which the page was served---breaks when a user types a character
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2111 outside that character set. Best practice for web development is to
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2112 serve all pages as UTF-8 and treat incoming form data as using that
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2113 coding system. (Oh, and gamble that your clients won't ever want to
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2114 type anything outside Unicode. But that's not so much of a gamble with
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2115 today's client operating systems.) We don't treat non-ASCII in this
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2116 example, as dealing with @samp{(read-multibyte-character ...)} and
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2117 errors therewith would make it much harder to understand.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2118
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2119 Since CCL isn't a very rich language, we move much of the logic that
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2120 would ordinarily be computed from operations like @code{(member ..)},
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2121 @code{(and ...)} and @code{(or ...)} into tables, from which register
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2122 values are read and written, and on which @code{if} statements are
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2123 predicated. Much more of the implementation of this coding system is
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2124 occupied with constructing these tables---in normal Emacs Lisp---than it
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2125 is with actual CCL code.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2126
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2127 All the @code{defvar} statements we deal with in the next few sections
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2128 are surrounded by a @code{(eval-and-compile ...)}, which means that the
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2129 logic which initializes these variables executes at compile time, and if
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2130 XEmacs loads the compiled version of the file, these variables are
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2131 initialized as constants.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2132
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2133 @menu
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2134 * Four bits to ASCII:: Two tables used for getting hex digits from ASCII.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2135 * URI Encoding constants:: Useful predefined characters.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2136 * Numeric to ASCII-hexadecimal conversion:: Trivial in Lisp, not so in CCL.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2137 * Characters to be preserved:: No transformation needed for these characters.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2138 * The program to decode to internal format:: .
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2139 * The program to encode from internal format:: .
2690 d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan] aidan parents: 2640 diff changeset	2140 * The actual coding system:: .
2640 a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2141 @end menu
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2142
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2143 @node Four bits to ASCII, URI Encoding constants, , CCL Example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2144 @subsubsection Four bits to ASCII
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2145
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2146 The first @code{defvar} is for
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2147 @code{url-coding-high-order-nybble-as-ascii}, a 256-entry table that
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2148 maps from an octet's value to the ASCII encoding for the hex value of
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2149 its most significant four bits. That might sound complex, but it isn't;
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2150 for decimal 65, hex value @samp{#x41}, the entry in the table is the
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2151 ASCII encoding of `4'. For decimal 122, ASCII `z', hex value
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2152 @code{#x7a}, @code{(elt url-coding-high-order-nybble-as-ascii #x7a)}
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2153 after this file is loaded gives the ASCII encoding of 7.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2154
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2155 @example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2156 (defvar url-coding-high-order-nybble-as-ascii
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2157 (let ((val (make-vector 256 0))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2158 (i 0))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2159 (while (< i (length val))
2690 d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan] aidan parents: 2640 diff changeset	2160 (aset val i (char-to-int (aref (format "%02X" i) 0)))
2640 a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2161 (setq i (1+ i)))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2162 val)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2163 "Table to find an ASCII version of an octet's most significant 4 bits.")
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2164 @end example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2165
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2166 The next table, @code{url-coding-low-order-nybble-as-ascii}, is almost
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2167 the same thing, but this time it maps to the hex encoding of the
2690 d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan] aidan parents: 2640 diff changeset	2168 low-order four bits. So the entry at offset @samp{#x41} (decimal 65) is
2640 a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2169 the ASCII encoding of `1', and the entry at offset @samp{#x7a}
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2170 (decimal 122) is the ASCII encoding of `A'.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2171
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2172 @example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2173 (defvar url-coding-low-order-nybble-as-ascii
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2174 (let ((val (make-vector 256 0))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2175 (i 0))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2176 (while (< i (length val))
2690 d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan] aidan parents: 2640 diff changeset	2177 (aset val i (char-to-int (aref (format "%02X" i) 1)))
2640 a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2178 (setq i (1+ i)))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2179 val)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2180 "Table to find an ASCII version of an octet's least significant 4 bits.")
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2181 @end example
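
As a rough illustration of how the two nybble tables fit together, the
following sketch builds the three-character escape for a single octet in
ordinary Lisp. The function name @code{url-coding-example-escape-octet}
is hypothetical and is not part of the coding system described here; it
only assumes that the two @code{defvar} forms above have been evaluated.

@example
;; Hypothetical helper, for illustration only.
(defun url-coding-example-escape-octet (octet)
  "Return the URL escape sequence for OCTET as a string."
  (string ?%
          (int-char (aref url-coding-high-order-nybble-as-ascii octet))
          (int-char (aref url-coding-low-order-nybble-as-ascii octet))))

(url-coding-example-escape-octet #x7a)
     @result{} "%7A"
@end example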
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2182
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2183 @node URI Encoding constants, Numeric to ASCII-hexadecimal conversion, Four bits to ASCII, CCL Example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2184 @subsubsection URI Encoding constants
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2185
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2186 Next, we have a couple of variables that make the CCL code more
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2187 readable. The first is the ASCII encoding of the percentage sign; this
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2188 character is used as an escape code, to start the encoding of a
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2189 non-printable character. For historical reasons, URL encoding allows
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2190 the space character to be encoded as a plus sign--it does make typing
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2191 URLs like @samp{http://google.com/search?q=XEmacs+home+page} easier--and
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2192 as such, we have to check when decoding for this value, and map it to
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2193 the space character. When doing this in CCL, we use the
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2194 @code{url-coding-escaped-space-code} variable.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2195
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2196 @example
2690 d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan] aidan parents: 2640 diff changeset	2197 (defvar url-coding-escape-character-code (char-to-int ?%)
2640 a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2198 "The code point for the percentage sign, in ASCII.")
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2199
2690 d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan] aidan parents: 2640 diff changeset	2200 (defvar url-coding-escaped-space-code (char-to-int ?+)
2640 a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2201 "The URL-encoded value of the space character, that is, +.")
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2202 @end example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2203
2690 d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan] aidan parents: 2640 diff changeset	2204 @node Numeric to ASCII-hexadecimal conversion, Characters to be preserved, URI Encoding constants, CCL Example
2640 a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2205 @subsubsection Numeric to ASCII-hexadecimal conversion
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2206
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2207 Now, we have a couple of utility tables that wouldn't be necessary in
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2208 a more expressive programming language than is CCL. The first is sixteen
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2209 in length, and maps a hexadecimal number to the ASCII encoding of that
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2210 number; so zero maps to ASCII `0', and ten maps to ASCII `A'. The second
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2211 does the reverse; that is, it maps an ASCII character to its value when
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2212 interpreted as a hexadecimal digit. ('A' => 10, 'c' => 12, '2' => 2, as
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2213 a few examples.)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2214
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2215 @example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2216 (defvar url-coding-hex-digit-table
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2217 (let ((i 0)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2218 (val (make-vector 16 0)))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2219 (while (< i 16)
2690 d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan] aidan parents: 2640 diff changeset	2220 (aset val i (char-to-int (aref (format "%X" i) 0)))
2640 a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2221 (setq i (1+ i)))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2222 val)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2223 "A map from a hexadecimal digit's numeric value to its encoding in ASCII.")
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2224
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2225 (defvar url-coding-latin-1-as-hex-table
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2226 (let ((val (make-vector 256 0))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2227 (i 0))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2228 (while (< i (length val))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2229 ;; Get a hex val for this ASCII character.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2230 (aset val i (string-to-int (format "%c" i) 16))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2231 (setq i (1+ i)))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2232 val)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2233 "A map from Latin 1 code points to their values as hexadecimal digits.")
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2234 @end example
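
To make the relationship between these two tables concrete, here is a
small sketch of some lookups, assuming both @code{defvar} forms above
have been evaluated. The last form shows that, for a value between
zero and fifteen, the second table undoes the first.

@example
(aref url-coding-hex-digit-table 10)
     @result{} 65           ; the ASCII code for `A'

(aref url-coding-latin-1-as-hex-table (char-to-int ?c))
     @result{} 12

;; Digit value -> ASCII code -> digit value again.
(aref url-coding-latin-1-as-hex-table
      (aref url-coding-hex-digit-table 11))
     @result{} 11
@end example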
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2235
2690 d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan] aidan parents: 2640 diff changeset	2236 @node Characters to be preserved, The program to decode to internal format, Numeric to ASCII-hexadecimal conversion, CCL Example
2640 a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2237 @subsubsection Characters to be preserved
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2238
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2239 And finally, the last of these tables. URL encoding says that
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2240 alphanumeric characters, the underscore, hyphen and the full stop
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2241 @footnote{That's what the standards call it, though my North American
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2242 readers will be more familiar with it as the period character.} retain
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2243 their ASCII encoding, and don't undergo transformation.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2244 @code{url-coding-should-preserve-table} is an array in which the entries
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2245 are one if the corresponding ASCII character should be left as-is, and
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2246 zero if it should be transformed. So the entries for all the control
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2247 and most of the punctuation characters are zero. Lisp programmers will
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2248 observe that this initialization is particularly inefficient, but
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2249 they'll also be aware that this is a long way from an inner loop where
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2250 every nanosecond counts.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2251
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2252 @example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2253 (defvar url-coding-should-preserve-table
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2254 (let ((preserve
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2255 (list ?- ?_ ?. ?a ?b ?c ?d ?e ?f ?g ?h ?i ?j ?k ?l ?m ?n ?o
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2256 ?p ?q ?r ?s ?t ?u ?v ?w ?x ?y ?z ?A ?B ?C ?D ?E ?F ?G
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2257 ?H ?I ?J ?K ?L ?M ?N ?O ?P ?Q ?R ?S ?T ?U ?V ?W ?X ?Y
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2258 ?Z ?0 ?1 ?2 ?3 ?4 ?5 ?6 ?7 ?8 ?9))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2259 (i 0)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2260 (res (make-vector 256 0)))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2261 (while (< i 256)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2262 (when (member (int-char i) preserve)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2263 (aset res i 1))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2264 (setq i (1+ i)))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2265 res)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2266 "A 256-entry array of flags, indicating whether or not to preserve an
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2267 octet as its ASCII encoding.")
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2268 @end example
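
A quick way to check the table, once the @code{defvar} above has been
evaluated, is to look up a few characters by hand. This is only a
sketch for experimentation; the characters chosen are arbitrary.

@example
;; 1 means ``leave as-is'', 0 means ``escape''.
(mapcar #'(lambda (ch)
            (aref url-coding-should-preserve-table (char-to-int ch)))
        (list ?a ?Z ?7 ?. ?% ?+ ?/))
     @result{} (1 1 1 1 0 0 0)
@end example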
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2269
2690 d5bfa26d5c3f [xemacs-hg @ 2005-03-26 16:20:01 by aidan] aidan parents: 2640 diff changeset	2270 @node The program to decode to internal format, The program to encode from internal format, Characters to be preserved, CCL Example
2640 a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2271 @subsubsection The program to decode to internal format
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2272
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2273 After the almost interminable tables, we get to the CCL. The first
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2274 CCL program, @code{ccl-decode-urlcoding}, decodes from the URL coding to
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2275 our internal format; since this version of CCL doesn't have support for
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2276 error checking on the input, we don't do any verification on it.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2277
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2278 The buffer magnification--approximate ratio of the size of the output
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2279 buffer to the size of the input buffer--is declared as one, because
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2280 fractional values aren't allowed. (Since all those %20's will map to
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2281 ` ', the length of the output text will be less than that of the input
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2282 text.)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2283
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2284 So, first we read an octet from the input buffer into register
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2285 @samp{r0}, to set up the loop. Next, we start the loop, with a
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2286 @code{(loop ...)} statement, and we check if the value in @samp{r0} is a
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2287 percentage sign. (Note the comma before
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2288 @code{url-coding-escape-character-code}; since the program is a
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2289 backquoted Lisp form, the comma causes the variable to be evaluated, and as
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2290 such, ``@code{,url-coding-escape-character-code}'' will be replaced by a
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2291 literal `37'.)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2292
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2293 If it is a percentage sign, we read the next two octets into @samp{r2}
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2294 and @samp{r3}, and convert them into their hexadecimal numeric values,
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2295 using the @code{url-coding-latin-1-as-hex-table} array declared above.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2296 (But again, it'll be interpreted as a literal array.) We then left
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2297 shift the first by four bits, bitwise-OR the two together, and write the
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2298 result to the output buffer.
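
For comparison, the arithmetic of this step can be sketched in ordinary
Lisp as follows. The function name @code{url-coding-example-unescape}
is hypothetical and appears nowhere in the coding system itself; the
sketch assumes @code{url-coding-latin-1-as-hex-table} has been defined
as above, and like the CCL program it does no checking that its
arguments are valid hex digits.

@example
;; Hypothetical helper, for illustration only.
(defun url-coding-example-unescape (high low)
  "Return the octet encoded by the hex digit characters HIGH and LOW."
  (logior
   (lsh (aref url-coding-latin-1-as-hex-table (char-to-int high)) 4)
   (aref url-coding-latin-1-as-hex-table (char-to-int low))))

(url-coding-example-unescape ?2 ?0)
     @result{} 32           ; the space character
@end example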
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2299
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2300 If it isn't a percentage sign, and it is a `+' sign, we write a
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2301 space--hexadecimal 20--to the output buffer.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2302
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2303 If none of those things are true, we pass the octet to the output buffer
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2304 untransformed. (This could be a place to put error checking, in a more
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2305 expressive language.) We then read one more octet from the input
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2306 buffer, and move to the next iteration of the loop.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2307
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2308 @example
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2309 (define-ccl-program ccl-decode-urlcoding
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2310 `(1
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2311 ((read r0)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2312 (loop
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2313 (if (r0 == ,url-coding-escape-character-code)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2314 ((read r2 r3)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2315 ;; Replace r2 and r3 with the values at those offsets in
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2316 ;; url-coding-latin-1-as-hex-table.
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2317 (r2 = r2 ,url-coding-latin-1-as-hex-table)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2318 (r3 = r3 ,url-coding-latin-1-as-hex-table)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2319 (r2 <<= 4)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2320 (r3 \|= r2)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2321 (write r3))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2322 (if (r0 == ,url-coding-escaped-space-code)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2323 (write #x20)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2324 (write r0)))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2325 (read r0)
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2326 (repeat))))
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2327 "CCL program to take URI-encoded ASCII text and transform it to our
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2328 internal encoding. ")
a4040d921acc [xemacs-hg @ 2005-03-09 05:36:28 by stephent] stephent parents: 2367 diff changeset	2329 @end example
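
The shift-and-or step that combines the two hex digits can be modelled
in ordinary Lisp.  The following is only a sketch for illustration: the
function name @code{my-combine-hex-digits} is hypothetical and is not
part of the url-coding code.

@example
(defun my-combine-hex-digits (high low)
  "Return the octet whose hex digits have the values HIGH and LOW.
HIGH and LOW are integers from 0 to 15."
  ;; Same arithmetic as (r2 <<= 4) followed by (r3 \|= r2) above.
  (logior (ash high 4) low))

(my-combine-hex-digits 4 1)   ; #x41, the code for the letter A
@end example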

@node The program to encode from internal format, The actual coding system, The program to decode to internal format, CCL Example
@subsubsection The program to encode from internal format

Next, we see the CCL program to encode ASCII text as URL-coded text.
Here, the buffer magnification is specified as three, because an escaped
octet takes three octets on output (a space becomes @samp{%20}, for
example).  As before, we read an octet from the input into
@samp{r0}, and move into the body of the loop.  Next, we check whether we
should preserve the value of this octet, by reading from offset
@samp{r0} in the @code{url-coding-should-preserve-table} into @samp{r1}.
Then we have an @samp{if} statement predicated on the value in
@samp{r1}; for the true branch, we write the input octet directly.  For
the false branch, we write a percent sign, the ASCII encoding of the
high four bits in hex, and then the ASCII encoding of the low four bits
in hex.

We then read an octet from the input into @samp{r0}, and repeat the loop.

@example
(define-ccl-program ccl-encode-urlcoding
  `(3
    ((read r0)
     (loop
       (r1 = r0 ,url-coding-should-preserve-table)
       ;; If we should preserve the value, just write the octet directly.
       (if r1
           (write r0)
         ;; else, write a percent sign, and the hex value of the octet, in
         ;; an ASCII-friendly format.
         ((write ,url-coding-escape-character-code)
          (write r0 ,url-coding-high-order-nybble-as-ascii)
          (write r0 ,url-coding-low-order-nybble-as-ascii)))
       (read r0)
       (repeat))))
  "CCL program to encode octets (almost) according to RFC 1738")
@end example
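
What the false branch writes can be sketched in ordinary Lisp.  This is
an illustration only: @code{my-url-escape-octet} is a hypothetical name,
and the use of upper-case hex digits is an assumption, since the
contents of the two nybble tables are not shown here.

@example
(defun my-url-escape-octet (octet)
  "Return OCTET, an integer from 0 to 255, as a string of the form %XX."
  (let ((hex "0123456789ABCDEF")
        ;; The high and low four bits, as the two table lookups select.
        (high (logand (ash octet -4) 15))
        (low (logand octet 15)))
    (concat "%"
            (substring hex high (1+ high))
            (substring hex low (1+ low)))))

(my-url-escape-octet 32)   ; "%20", the escaped form of a space
@end example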

@node The actual coding system, , The program to encode from internal format, CCL Example
@subsubsection The actual coding system

To actually create the coding system, we call
@code{make-coding-system}.  The first argument is the symbol that is to
be the name of the coding system, in our case @samp{url-coding}.  The
second specifies that the coding system is to be of type
@samp{ccl}.  Several other coding system types are available; see the
documentation for @code{make-coding-system} for the full list.  Then
there's a documentation string describing the wherefore
and caveats of the coding system, and the final argument is a property
list giving information about the CCL programs and the coding system's
mnemonic.

@example
(make-coding-system
 'url-coding 'ccl
 "The coding used by application/x-www-form-urlencoded HTTP applications.
This coding form doesn't specify anything about non-ASCII characters, so
make sure you've transformed to a seven-bit coding system first."
 '(decode ccl-decode-urlcoding
   encode ccl-encode-urlcoding
   mnemonic "URLenc"))
@end example
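
Once the coding system exists, it can be used like any other.  The
following sketch assumes a Mule-enabled XEmacs in which the two CCL
programs and the @samp{url-coding} coding system above have been
evaluated, and that @code{encode-coding-string} and
@code{decode-coding-string} are available; the sample strings are
arbitrary.

@example
;; Encode a string.  According to the description of the encoder,
;; the space should be written as %20.
(encode-coding-string "hello world" 'url-coding)

;; Decode an escaped string back again.
(decode-coding-string "hello%20world" 'url-coding)
@end example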

If you're lucky, the @samp{url-coding} coding system described here
should be available in the XEmacs package system.  Otherwise, downloading
it from @samp{http://www.parhasard.net/url-coding.el} should work for
the foreseeable future.

@node Category Tables, Unicode Support, CCL, MULE
@section Category Tables

  A category table is a type of char table used for keeping track of
categories.  Categories are used for classifying characters for use in
regexps: you can refer to a category rather than having to use a
complicated @samp{[]} expression (and category lookups are significantly
faster).

  There are 95 different categories available, one for each printable
character (including space) in the ASCII charset.  Each category is
designated by one such character, called a @dfn{category designator}.
They are specified in a regexp using the syntax @samp{\cX}, where X is a
category designator.  (This is not yet implemented.)

  A category table specifies, for each character, the categories that
the character is in.  Note that a character can be in more than one
category.  More specifically, a category table maps from a character to
either the value @code{nil} (meaning the character is in no categories)
or a 95-element bit vector, specifying for each of the 95 categories
whether the character is in that category.

  Special Lisp functions are provided that abstract this, so you do not
have to directly manipulate bit vectors.

@defun category-table-p object
This function returns @code{t} if @var{object} is a category table.
@end defun

@defun category-table &optional buffer
This function returns the current category table.  This is the one
specified by the current buffer, or by @var{buffer} if it is
non-@code{nil}.
@end defun

@defun standard-category-table
This function returns the standard category table.  This is the one used
for new buffers.
@end defun

@defun copy-category-table &optional category-table
This function returns a new category table which is a copy of
@var{category-table}, which defaults to the standard category table.
@end defun

@defun set-category-table category-table &optional buffer
This function selects @var{category-table} as the new category table for
@var{buffer}.  @var{buffer} defaults to the current buffer if omitted.
@end defun

@defun category-designator-p object
This function returns @code{t} if @var{object} is a category designator (a
char in the range @samp{' '} to @samp{'~'}).
@end defun

@defun category-table-value-p object
This function returns @code{t} if @var{object} is a category table value.
Valid values are @code{nil} or a bit vector of size 95.
@end defun
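
  The functions above can be combined as in the following sketch.  It
assumes a Mule-enabled XEmacs; @code{make-bit-vector} is the ordinary
bit vector constructor and is not specific to category tables.

@example
;; The table in effect for the current buffer is a category table.
(category-table-p (category-table))

;; Designators are the printable ASCII characters, space through ~.
(category-designator-p ?a)
(category-designator-p ?\n)    ; a newline lies outside that range

;; The two kinds of valid table value.
(category-table-value-p nil)
(category-table-value-p (make-bit-vector 95 0))

;; Give the current buffer its own copy of the standard table.
(set-category-table (copy-category-table))
@end example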


@c Added 2002-03-13 sjt
@node Unicode Support, Charset Unification, Category Tables, MULE
@section Unicode Support
@cindex unicode
@cindex utf-8
@cindex utf-16
@cindex ucs-2
@cindex ucs-4
@cindex bmp
@cindex basic multilingual plane

Unicode support was added by Ben Wing to XEmacs 21.5.6.

@defun set-language-unicode-precedence-list list
Set the language-specific precedence list used for Unicode decoding.
This is a list of charsets, which are consulted in order for a translation
matching a given Unicode character.  If no matches are found, the charsets
in the default precedence list (see
@code{set-default-unicode-precedence-list}) are consulted, and then all
remaining charsets, in some arbitrary order.

The language-specific precedence list is meant to be set as part of the
language environment initialization; the default precedence list is meant
to be set by the user.
@end defun

@defun language-unicode-precedence-list
Return the language-specific precedence list used for Unicode decoding.
See @code{set-language-unicode-precedence-list} for more information.
@end defun

@defun set-default-unicode-precedence-list list
Set the default precedence list used for Unicode decoding.
This is meant to be set by the user.  See
@code{set-language-unicode-precedence-list} for more information.
@end defun

@defun default-unicode-precedence-list
Return the default precedence list used for Unicode decoding.
See @code{set-language-unicode-precedence-list} for more information.
@end defun
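
A user might set the default list from an init file, as in the
following sketch.  It assumes that the charsets
@code{latin-iso8859-1} and @code{japanese-jisx0208} are defined in the
running XEmacs and that charset names (symbols) are accepted wherever a
charset is expected; the choice of charsets is only an example.

@example
;; Prefer Latin-1, then JIS X 0208, when decoding Unicode.
(set-default-unicode-precedence-list
 '(latin-iso8859-1 japanese-jisx0208))

;; Inspect both lists.
(default-unicode-precedence-list)
(language-unicode-precedence-list)
@end example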

@defun set-unicode-conversion character code
Add conversion information between Unicode codepoints and characters.
@var{character} is one of the following:

@itemize @bullet
@item
A character (in which case @var{code} must be a non-negative integer)
@item
A vector of characters (in which case @var{code} must be a vector of
non-negative integers of the same length)
@end itemize

Values of @var{code} above 2^20 - 1 are allowed for the purpose of specifying
private characters, but will cause errors when converted to UTF-16 or UTF-32.
UCS-4 and UTF-8 can handle values up to 2^31 - 1, but XEmacs Lisp integers top
out at 2^30 - 1.
@end defun

@defun character-to-unicode character
Convert @var{character} to its Unicode codepoint.
When there is no international support (i.e.@: MULE is not defined),
this function simply does @code{char-to-int}.
@end defun

@defun unicode-to-character code &optional charsets
Convert Unicode codepoint @var{code} to a character.
@var{code} should be a non-negative integer.
If @var{charsets} is given, it should be a list of charsets, and only those
charsets will be consulted, in the given order, for a translation.
Otherwise, the default ordering of all charsets will be used (see
@code{set-language-unicode-precedence-list}).

When there is no international support (i.e.@: MULE is not defined),
this function simply does @code{int-to-char} and ignores the
@var{charsets} argument.
@end defun
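
The two conversion functions are inverses of each other for characters
that have a translation.  The following sketch uses the function names
as documented in this section, which may differ between XEmacs
releases, and assumes that the charset @code{japanese-jisx0208} is
defined; the codepoints are arbitrary examples.

@example
;; From a character to its codepoint, and back again.
(character-to-unicode ?A)
(unicode-to-character (character-to-unicode ?A))

;; Restrict the lookup to a single charset.
(unicode-to-character #x3042 '(japanese-jisx0208))
@end example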
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2533
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2534 @defun parse-unicode-translation-table filename charset start end offset flags
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2535 Parse Unicode translation data in @var{filename} for MULE @var{charset}.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2536 Data is text, in the form of one translation per line -- charset
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2537 codepoint followed by Unicode codepoint. Numbers are decimal or hex
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2538 \(preceded by 0x). Comments are marked with a #. Charset codepoints
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2539 for two-dimensional charsets should have the first octet stored in the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2540 high 8 bits of the hex number and the second in the low 8 bits.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2541
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2542 If @var{start} and @var{end} are given, only charset codepoints within
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2543 the given range will be processed. If @var{offset} is given, that value
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2544 will be added to all charset codepoints in the file to obtain the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2545 internal charset codepoint. @var{start} and @var{end} apply to the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2546 codepoints in the file, before @var{offset} is applied.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2547
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2548 (Note that, as usual, we assume that octets are in the range 32 to
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2549 127 or 33 to 126. If you have a table in kuten form, with octets in
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2550 the range 1 to 94, you will have to use an offset of 5140,
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2551 i.e. 0x2020.)
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2552
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2553 @var{flags}, if specified, control further how the tables are interpreted
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2554 and are used to special-case certain known table weirdnesses in the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2555 Unicode tables:
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2556
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2557 @table @code
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2558 @item ignore-first-column'
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2559 Exactly as it sounds. The JIS X 0208 tables have 3 columns of data instead
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2560 of 2; the first is the Shift-JIS codepoint.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2561
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2562 @item big5
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2563 The charset codepoint is a Big Five codepoint; convert it to the
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2564 proper hacked-up codepoint in @code{chinese-big5-1} or @code{chinese-big5-2}.
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2565 @end table
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2566 @end defun
7d972c3de90a [xemacs-hg @ 2002-03-14 11:50:12 by stephent] stephent parents: 444 diff changeset	2567
1183 c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2568
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2569 @node Charset Unification, Charsets and Coding Systems, Unicode Support, MULE
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2570 @section Character Set Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2571
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2572 Mule suffers from a design defect that causes it to consider the ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2573 Latin character sets to be disjoint. This results in oddities such as
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2574 files containing both ISO 8859/1 and ISO 8859/15 codes, and using ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2575 2022 control sequences to switch between them, as well as more plausible
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2576 but often unnecessary combinations like ISO 8859/1 with ISO 8859/2.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2577 This can be very annoying when sending messages or even in simple
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2578 editing on a single host. Unification works around the problem by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2579 converting as many characters as possible to use a single Latin coded
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2580 character set before saving the buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2581
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2582 This node and its children were ripp'd untimely from
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2583 @file{latin-unity.texi}, and have been quickly converted for use here.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2584 However as APIs are likely to diverge, beware of inaccuracies. Please
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2585 report any you discover with @kbd{M-x report-xemacs-bug RET}, as well
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2586 as any ambiguities or downright unintelligible passages.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2587
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2588 A lot of the stuff here doesn't belong here; it belongs in the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2589 @ref{Top, , , xemacs, XEmacs User's Manual}. Report those as bugs,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2590 too, preferably with patches.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2591
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2592 @menu
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2593 * Overview:: Unification history and general information.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2594 * Usage:: An overview of the operation of Unification.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2595 * Configuration:: Configuring Unification for use.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2596 * Theory of Operation:: How Unification works.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2597 * What Unification Cannot Do for You:: Inherent problems of 8-bit charsets.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2598 * Charsets and Coding Systems:: Reference lists with annotations.
1188 11ff4edb6bb7 [xemacs-hg @ 2003-01-05 10:53:58 by youngs] youngs parents: 1183 diff changeset	2599 * Unification Internals:: Utilities and implementation details.
1183 c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2600 @end menu
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2601
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2602 @node Overview, Usage, Charset Unification, Charset Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2603 @subsection An Overview of Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2604
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2605 Mule suffers from a design defect that causes it to consider the ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2606 Latin character sets to be disjoint. This manifests itself when a user
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2607 enters characters using input methods associated with different coded
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2608 character sets into a single buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2609
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2610 A very important example involves email. Many sites, especially in the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2611 U.S., default to use of the ISO 8859/1 coded character set (also called
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2612 ``Latin 1,'' though these are somewhat different concepts). However,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2613 ISO 8859/1 provides a generic CURRENCY SIGN character. Now that the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2614 Euro has become the official currency of most countries in Europe, this
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2615 is unsatisfactory (and in practice, useless). So Europeans generally
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2616 use ISO 8859/15, which is nearly identical to ISO 8859/1 for most
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2617 languages, except that it substitutes EURO SIGN for CURRENCY SIGN.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2618
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2619 Suppose a European user yanks text from a post encoded in ISO 8859/1
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2620 into a message composition buffer, and enters some text including the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2621 Euro sign. Then Mule will consider the buffer to contain both ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2622 8859/1 and ISO 8859/15 text, and MUAs such as Gnus will (if naively
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2623 programmed) send the message as a multipart mixed MIME body!
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2624
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2625 This is clearly stupid. What is not as obvious is that, just as any
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2626 European can include American English in their text because ASCII is a
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2627 subset of ISO 8859/15, most European languages which use Latin
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2628 characters (e.g., German and Polish) can typically be mixed while using
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2629 only one Latin coded character set (in this case, ISO 8859/2). However,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2630 this often depends on exactly what text is to be encoded.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2631
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2632 Unification works around the problem by converting as many characters as
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2633 possible to use a single Latin coded character set before saving the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2634 buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2635
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2636 @node Usage, Configuration, Overview, Charset Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2637 @subsection Operation of Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2638
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2639 Normally, Unification works in the background by installing
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2640 @code{unity-sanity-check} on @code{write-region-pre-hook}. This is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2641 done by default for the ISO 8859 Latin family of character sets. The
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2642 user activates this functionality for other character set families by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2643 invoking @code{enable-unification}, either interactively or in her
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2644 init file. @xref{Init File, , , xemacs}. Unification can be
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2645 deactivated by invoking @code{disable-unification}.
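
As a minimal sketch, an init file could turn Unification on (or off)
like this.  The @code{fboundp} guard is an addition for illustration, so
that the form is harmless in an XEmacs where the function is not
defined.

@example
;; Activate Unification's sanity check on save, if available.
(when (fboundp 'enable-unification)
  (enable-unification))

;; To turn it off again later:
;; (disable-unification)
@end example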
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2646
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2647 Unification also provides a few functions for remapping or recoding the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2648 buffer by hand. To @dfn{remap} a character means to change the buffer
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2649 representation of the character by using another coded character set.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2650 Remapping never changes the identity of the character, but may involve
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2651 altering the code point of the character. To @dfn{recode} a character
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2652 means to simply change the coded character set. Recoding never alters
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2653 the code point of the character, but may change the identity of the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2654 character. @xref{Theory of Operation}.
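
The distinction can be sketched with a toy model.  The table and the
@code{toy-} functions below are hypothetical and exist only for this
illustration; they are not part of Unification.  The code points are
those assigned by the respective ISO 8859 parts.

@example
;; Toy tables: for each charset, an alist of (CODE-POINT . NAME).
(defvar toy-charsets
  '((iso-8859-1  . ((#xA4 . "CURRENCY SIGN")))
    (iso-8859-2  . ((#xB9 . "LATIN SMALL LETTER S WITH CARON")))
    (iso-8859-15 . ((#xA4 . "EURO SIGN")
                    (#xA8 . "LATIN SMALL LETTER S WITH CARON")))))

(defun toy-name (charset code)
  "Return the name of the character at CODE in CHARSET, or nil."
  (cdr (assq code (cdr (assq charset toy-charsets)))))

(defun toy-remap (charset code target)
  "Keep the character's identity; return its code point in TARGET."
  (car (rassoc (toy-name charset code)
               (cdr (assq target toy-charsets)))))

(defun toy-recode (charset code target)
  "Keep the code point; return the character it names in TARGET."
  (toy-name target code))

(toy-remap 'iso-8859-2 #xB9 'iso-8859-15)
     @result{} 168          ; #xA8: same character, new code point
(toy-recode 'iso-8859-1 #xA4 'iso-8859-15)
     @result{} "EURO SIGN"  ; same code point, different character
@end example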
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2655
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2656 There are a few variables which determine which coding systems are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2657 always acceptable to Unification: @code{unity-ucs-list},
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2658 @code{unity-preferred-coding-system-list}, and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2659 @code{unity-preapproved-coding-system-list}. The latter two default
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2660 to @code{()}, and should probably be avoided because they short-circuit
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2661 the sanity check. If you find you need to use them, consider reporting
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2662 it as a bug or request for enhancement. Because they seem unsafe, the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2663 recommended interface is likely to change.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2664
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2665 @menu
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2666 * Basic Functionality:: User interface and customization.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2667 * Interactive Usage:: Treating text by hand.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2668 Also documents the hook function(s).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2669 @end menu
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2670
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2671
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2672 @node Basic Functionality, Interactive Usage, , Usage
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2673 @subsubsection Basic Functionality
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2674
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2675 These functions and user options initialize and configure Unification.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2676 In normal use, none of these should be needed.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2677
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2678 @strong{These APIs are certain to change.}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2679
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2680 @defun enable-unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2681 Set up hooks and initialize variables for latin-unity.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2682
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2683 There are no arguments.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2684
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2685 This function is idempotent. It will reinitialize any hooks or variables
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2686 that are not in initial state.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2687 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2688
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2689 @defun disable-unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2690 There are no arguments.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2691
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2692 Clean up hooks and void variables used by latin-unity.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2693 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2694
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2695 @defopt unity-ucs-list
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2696 List of coding systems considered to be universal.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2697
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2698 The default value is @code{'(utf-8 iso-2022-7 ctext escape-quoted)}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2699
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2700 Order matters; coding systems earlier in the list will be preferred when
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2701 recommending a coding system. These coding systems will not be used
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2702 without querying the user (unless they are also present in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2703 @code{unity-preapproved-coding-system-list}), and follow the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2704 @code{unity-preferred-coding-system-list} in the list of suggested
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2705 coding systems.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2706
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2707 If none of the preferred coding systems are feasible, the first in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2708 this list will be the default.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2709
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2710 Notes on certain coding systems: @code{escape-quoted} is a special
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2711 coding system used for autosaves and compiled Lisp in Mule. You should
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2712 @c #### fix in latin-unity.texi
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2713 never delete this, although it is rare that a user would want to use it
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2714 directly. Unification does not try to be ``smart'' about other general
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2715 ISO 2022 coding systems, such as ISO-2022-JP. (They are not recognized
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2716 as equivalent to @code{iso-2022-7}.) If your preferred coding system is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2717 one of these, you may consider adding it to @code{unity-ucs-list}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2718 However, this will typically have the side effect that (e.g.) ISO 8859/1
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2719 files will be saved in 7-bit form with ISO 2022 escape sequences.
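
For instance, a user whose preferred coding system is ISO-2022-JP might
append it to the list as sketched here.  The coding system name
@code{iso-2022-jp} and the @code{boundp} guard are assumptions made for
this illustration.  Appending keeps the default entries ahead of it in
the order of preference.

@example
(when (boundp 'unity-ucs-list)
  (setq unity-ucs-list
        (append unity-ucs-list '(iso-2022-jp))))
@end example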
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2720 @end defopt
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2721
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2722 Coding systems which are neither Latin nor in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2723 @code{unity-ucs-list} are handled by short-circuit checks of the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2724 coding system against the next two variables.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2725
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2726 @defopt unity-preapproved-coding-system-list
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2727 List of coding systems used without querying the user if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2728
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2729 The default value is @samp{(buffer-default preferred)}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2730
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2731 The first feasible coding system in this list is used. The special values
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2732 @samp{preferred} and @samp{buffer-default} may be present:
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2733
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2734 @table @code
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2735 @item buffer-default
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2736 Use the coding system used by @samp{write-region}, if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2737
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2738 @item preferred
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2739 Use the coding system specified by @samp{prefer-coding-system} if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2740 @end table
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2741
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2742 ``Feasible'' means that all characters in the buffer can be represented by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2743 the coding system. Coding systems in @samp{unity-ucs-list} are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2744 always considered feasible. Other feasible coding systems are computed
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2745 by @samp{unity-representations-feasible-region}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2746
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2747 Note that the first universal coding system in this list shadows all
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2748 other coding systems. In particular, if your preferred coding system is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2749 a universal coding system, and @code{preferred} is a member of this
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2750 list, unification will blithely convert all your files to that coding
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2751 system. This is considered a feature, but it may surprise most users.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2752 Users who don't like this behavior should put @code{preferred} in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2753 @code{unity-preferred-coding-system-list}.
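
A hedged sketch of that advice for an init file follows.  It assumes
that both options are already bound, and that @code{preferred} should
also be removed from the preapproved list; the text above only says
where to put it.

@example
;; Stop using the preferred coding system without asking ...
(setq unity-preapproved-coding-system-list
      (delq 'preferred
            (copy-sequence unity-preapproved-coding-system-list)))

;; ... and offer it as the first suggestion instead.
(unless (memq 'preferred unity-preferred-coding-system-list)
  (setq unity-preferred-coding-system-list
        (cons 'preferred unity-preferred-coding-system-list)))
@end example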
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2754 @end defopt
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2755
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2756 @defopt unity-preferred-coding-system-list
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2757 @c #### fix in latin-unity.texi
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2758 List of coding systems suggested to the user if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2759
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2760 The default value is @samp{(iso-8859-1 iso-8859-15 iso-8859-2 iso-8859-3
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2761 iso-8859-4 iso-8859-9)}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2762
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2763 If none of the coding systems in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2764 @c #### fix in latin-unity.texi
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2765 @code{unity-preapproved-coding-system-list} are feasible, this list
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2766 will be recommended to the user, followed by the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2767 @code{unity-ucs-list}. The first coding system in this list is the default. The
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2768 special values @samp{preferred} and @samp{buffer-default} may be
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2769 present:
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2770
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2771 @table @code
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2772 @item buffer-default
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2773 Use the coding system used by @samp{write-region}, if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2774
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2775 @item preferred
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2776 Use the coding system specified by @samp{prefer-coding-system} if feasible.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2777 @end table
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2778
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2779 ``Feasible'' means that all characters in the buffer can be represented by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2780 the coding system. Coding systems in @samp{unity-ucs-list} are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2781 always considered feasible. Other feasible coding systems are computed
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2782 by @samp{unity-representations-feasible-region}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2783 @end defopt
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2784
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2785
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2786 @defvar unity-iso-8859-1-aliases
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2787 List of coding systems to be treated as aliases of ISO 8859/1.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2788
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2789 The default value is @code{'(iso-8859-1)}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2790
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2791 This is not a user variable; to customize input of coding systems or
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2792 charsets, use @samp{unity-coding-system-alias-alist} or
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2793 @samp{unity-charset-alias-alist}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2794 @end defvar
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2795
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2796
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2797 @node Interactive Usage, , Basic Functionality, Usage
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2798 @section Interactive Usage
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2799
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2800 This section first documents the hook function @code{unity-sanity-check}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2801 (It is placed here, although it is not an interactive function, because
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2802 there is not yet a programmer's section of the manual.)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2803
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2804 The other functions in this section provide access to internal functionality (such as the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2805 remapping function) and to extra functionality (the recoding functions
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2806 and the test function).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2807
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2808
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2809 @defun unity-sanity-check begin end filename append visit lockname &optional coding-system
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2810
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2811 Check if @var{coding-system} can represent all characters between
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2812 @var{begin} and @var{end}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2813
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2814 For compatibility with old broken versions of @code{write-region},
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2815 @var{coding-system} defaults to @code{buffer-file-coding-system}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2816 @var{filename}, @var{append}, @var{visit}, and @var{lockname} are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2817 ignored.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2818
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2819 Return @code{nil} if @code{buffer-file-coding-system} is not
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2820 (ISO-2022-compatible) Latin. If @code{buffer-file-coding-system} is safe
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2821 for the charsets actually present in the buffer, return it. Otherwise,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2822 ask the user to choose a coding system, and return that.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2823
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2824 This function does @emph{not} do the safe thing when
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2825 @code{buffer-file-coding-system} is @code{nil} (aka @code{no-conversion}). It
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2826 considers that ``non-Latin,'' and passes it on to the Mule detection
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2827 mechanism.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2828
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2829 This function is intended for use as a @code{write-region-pre-hook}. It
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2830 does nothing except return @var{coding-system} if @code{write-region}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2831 handlers are inhibited.
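
As a minimal sketch of the intended use, the function can be installed on
the hook by hand. This assumes that @code{write-region-pre-hook} is
available in your XEmacs and that the package's own initialization has
not already installed the function:

@example
;; Assumption: the hook is not already set up by the package itself.
(add-hook 'write-region-pre-hook 'unity-sanity-check)
@end example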
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2832 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2833
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2834 @defun unity-buffer-representations-feasible
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2835
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2836 There are no arguments.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2837
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2838 Apply @code{unity-region-representations-feasible} to the current buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2839 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2840
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2841 @defun unity-region-representations-feasible begin end &optional buf
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2842
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2843 Return character sets that can represent the text from @var{begin} to @var{end} in @var{buf}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2844
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2845 @var{buf} defaults to the current buffer. When called interactively, the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2846 function is applied to the region. It assumes @var{begin} <= @var{end}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2847
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2848 The return value is a cons. The car is the list of character sets
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2849 that can individually represent all of the non-ASCII portion of the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2850 buffer, and the cdr is the list of character sets that can
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2851 individually represent all of the ASCII portion.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2852
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2853 The following is taken from a comment in the source. Please refer to
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2854 the source to be sure of an accurate description.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2855
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2856 The basic algorithm is to map over the region, compute the set of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2857 charsets that can represent each character (the ``feasible charset''),
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2858 and take the intersection of those sets.
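
The intersection step can be illustrated in isolation. The following is
only a sketch, not the package's implementation:
@code{example-intersect-feasible} is a hypothetical helper, and the
per-character feasible sets are written out by hand rather than computed
from a buffer.

@example
;; Hypothetical helper: SETS is a list of lists of charset symbols,
;; one list per character.  Return the charsets common to all of them.
(defun example-intersect-feasible (sets)
  (let ((result (car sets)))
    (setq sets (cdr sets))
    (while sets
      (let ((kept nil)
            (candidates result))
        ;; Keep only the candidates that also occur in the next set.
        (while candidates
          (if (memq (car candidates) (car sets))
              (setq kept (cons (car candidates) kept)))
          (setq candidates (cdr candidates)))
        (setq result (nreverse kept)))
      (setq sets (cdr sets)))
    result))

(example-intersect-feasible
 '((latin-iso8859-1 latin-iso8859-2 latin-iso8859-9)
   (latin-iso8859-1 latin-iso8859-9)
   (latin-iso8859-9 latin-iso8859-1 latin-iso8859-15)))
     @result{} (latin-iso8859-1 latin-iso8859-9)
@end example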
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2859
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2860 The current implementation takes advantage of the fact that ASCII
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2861 characters are common and cannot change @code{asciisets}, so using
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2862 @code{skip-chars-forward} makes motion over ASCII subregions very fast.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2863
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2864 This same strategy could be applied generally by precomputing classes
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2865 of characters equivalent according to their effect on @code{latinsets}, and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2866 adding a whole class to the @code{skip-chars-forward} string once a member is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2867 found.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2868
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2869 Probably efficiency is a function of the number of characters matched,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2870 or maybe the length of the match string? With @code{skip-category-forward}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2871 over a precomputed category table it should be really fast. In practice
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2872 for Latin character sets there are only 29 classes.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2873 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2874
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2875 @defun unity-remap-region begin end character-set &optional coding-system
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2876
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2877 Remap characters between @var{begin} and @var{end} to equivalents in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2878 @var{character-set}. Optional argument @var{coding-system} may be a
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2879 coding system name (a symbol) or nil. Characters with no equivalent are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2880 left as-is.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2881
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2882 When called interactively, @var{begin} and @var{end} are set to the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2883 beginning and end, respectively, of the active region, and the function
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2884 prompts for @var{character-set}. The function does completion, knows
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2885 how to guess a character set name from a coding system name, and also
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2886 provides some common aliases. See @code{unity-guess-charset}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2887 There is no way to specify @var{coding-system}, as it has no useful
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2888 function interactively.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2889
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2890 Return @var{coding-system} if @var{coding-system} can encode all
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2891 characters in the region, @code{t} if @var{coding-system} is @code{nil} and the coding
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2892 system with G0 = @code{'ascii} and G1 = @var{character-set} can encode all
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2893 characters, and otherwise @code{nil}. Note that a non-null return does
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2894 @emph{not} mean it is safe to write the file, only the specified region.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2895 (This behavior is useful for multipart MIME encoding and the like.)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2896
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2897 Note: by default this function is quite strict about universal coding
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2898 systems. It only admits @samp{utf-8}, @samp{iso-2022-7}, and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2899 @samp{ctext}. Customize @code{unity-approved-ucs-list} to change
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2900 this.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2901
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2902 This function remaps characters that are artificially distinguished by Mule
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2903 internal code. It may change the code point as well as the character set.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2904 To recode characters that were decoded in the wrong coding system, use
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2905 @code{unity-recode-region}.
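
A non-interactive call might look like the following sketch. The choice
of @code{latin-iso8859-1} as the target and of the whole buffer as the
region is only an illustration; the return value depends on the buffer
contents and is interpreted as described above.

@example
;; Remap every character in the current buffer that has an
;; equivalent in ISO 8859/1; characters without one are left as-is.
(unity-remap-region (point-min) (point-max) 'latin-iso8859-1)
@end example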
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2906 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2907
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2908 @defun unity-recode-region begin end wrong-cs right-cs
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2909
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2910 Recode characters between @var{begin} and @var{end} from @var{wrong-cs}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2911 to @var{right-cs}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2912
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2913 @var{wrong-cs} and @var{right-cs} are character sets. Characters retain
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2914 the same code point but the character set is changed. Only characters
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2915 from @var{wrong-cs} are changed to @var{right-cs}. The identity of the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2916 character may change. Note that this could be dangerous, if characters
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2917 whose identities you do not want changed are included in the region.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2918 This function cannot guess which characters you want changed, and which
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2919 should be left alone.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2920
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2921 When called interactively, @var{begin} and @var{end} are set to the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2922 beginning and end, respectively, of the active region, and the function
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2923 prompts for @var{wrong-cs} and @var{right-cs}. The function does
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2924 completion, knows how to guess a character set name from a coding system
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2925 name, and also provides some common aliases. See
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2926 @code{unity-guess-charset}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2927
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2928 Another way to accomplish this, but using coding systems rather than
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2929 character sets to specify the desired recoding, is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2930 @samp{unity-recode-coding-region}. That function may be faster
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2931 but is somewhat more dangerous, because it may recode more than one
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2932 character set.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2933
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2934 To change from one Mule representation to another without changing identity
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2935 of any characters, use @samp{unity-remap-region}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2936 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2937
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2938 @defun unity-recode-coding-region begin end wrong-cs right-cs
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2939
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2940 Recode text between @var{begin} and @var{end} from @var{wrong-cs} to
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2941 @var{right-cs}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2942
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2943 @var{wrong-cs} and @var{right-cs} are coding systems. Characters retain
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2944 the same code point but the character set is changed. The identity of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2945 characters may change. This is an inherently dangerous function;
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2946 multilingual text may be recoded in unexpected ways. #### It's also
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2947 dangerous because the coding systems are not sanity-checked in the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2948 current implementation.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2949
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2950 When called interactively, @var{begin} and @var{end} are set to the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2951 beginning and end, respectively, of the active region, and the function
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2952 prompts for @var{wrong-cs} and @var{right-cs}. The function does
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2953 completion, knows how to guess a coding system name from a character set
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2954 name, and also provides some common aliases. See
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2955 @code{unity-guess-coding-system}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2956
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2957 Another, safer, way to accomplish this, using character sets rather
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2958 than coding systems to specify the desired recoding, is to use
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2959 @c #### fixme in latin-unity.texi
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2960 @code{unity-recode-region}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2961
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2962 To change from one Mule representation to another without changing identity
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2963 of any characters, use @code{unity-remap-region}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2964 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2965
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2966 The following helper functions support input of coding system and character set names.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2967
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2968 @defun unity-guess-charset candidate
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2969 Guess a charset based on the symbol @var{candidate}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2970
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2971 @var{candidate} itself is not tried as the value.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2972
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2973 Uses the natural mapping in @samp{unity-cset-codesys-alist}, and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2974 the values in @samp{unity-charset-alias-alist}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2975 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2976
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2977 @defun unity-guess-coding-system candidate
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2978 Guess a coding system based on the symbol @var{candidate}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2979
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2980 @var{candidate} itself is not tried as the value.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2981
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2982 Uses the natural mapping in @samp{unity-cset-codesys-alist}, and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2983 the values in @samp{unity-coding-system-alias-alist}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2984 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2985
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2986 @defun unity-example
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2987
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2988 A cheesy example for Unification.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2989
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2990 At present it just makes a multilingual buffer. To test, @code{setq}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2991 @code{buffer-file-coding-system} to some value, make the buffer dirty (e.g.,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2992 with @kbd{RET BackSpace}), and save.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2993 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2994
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2995
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2996 @node Configuration, Theory of Operation, Usage, Charset Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2997 @subsection Configuring Unification for Use
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2998
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	2999 If you want Unification to be automatically initialized, invoke
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3000 @samp{enable-unification} with no arguments in your init file.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3001 @xref{Init File, , , xemacs}. If you are using GNU Emacs or an XEmacs
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3002 earlier than 21.1, you should also load @file{auto-autoloads} using the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3003 full path (@emph{never} @samp{require} @file{auto-autoloads} libraries).
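
In the simplest case, the init file entry is a single form. This sketch
assumes that the package's autoloads are already available, so that
@code{enable-unification} is defined when the init file is read:

@example
;; Initialize Unification at startup.
(enable-unification)
@end example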
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3004
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3005 You may wish to define aliases for commonly used character sets and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3006 coding systems for convenience in input.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3007
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3008 @defopt unity-charset-alias-alist
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3009 Alist mapping aliases to Mule charset names (symbols).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3010
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3011 The default value is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3012 @example
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3013 ((latin-1 . latin-iso8859-1)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3014 (latin-2 . latin-iso8859-2)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3015 (latin-3 . latin-iso8859-3)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3016 (latin-4 . latin-iso8859-4)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3017 (latin-5 . latin-iso8859-9)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3018 (latin-9 . latin-iso8859-15)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3019 (latin-10 . latin-iso8859-16))
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3020 @end example
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3021
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3022 If a charset does not exist on your system, it will not complete and you
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3023 will not be able to enter it in response to prompts. A real charset
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3024 with the same name as an alias in this list will shadow the alias.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3025 @end defopt
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3026
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3027 @defopt unity-coding-system-alias-alist nil
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3028 Alist mapping aliases to Mule coding system names (symbols).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3029
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3030 The default value is @samp{nil}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3031 @end defopt
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3032
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3033
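The following sketch shows how the two alias options might be customized
from an init file. Several things here are assumptions rather than
documented behavior: that the forms run after the Unification package
has been loaded (so that the variables are already bound), that entries
of @code{unity-coding-system-alias-alist} have the same
@code{(ALIAS . NAME)} shape as the charset entries shown above, and that
a coding system named @code{iso-8859-2} exists on your system. The
alias names @code{l1} and @code{l2-file} are invented for the example.

@example
;; Add a shorter alias for an existing charset, keeping the defaults.
(setq unity-charset-alias-alist
      (cons '(l1 . latin-iso8859-1) unity-charset-alias-alist))

;; Define one coding system alias (the default value is nil).
(setq unity-coding-system-alias-alist
      '((l2-file . iso-8859-2)))
@end example
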
@node Theory of Operation, What Unification Cannot Do for You, Configuration, Charset Unification
@subsection Theory of Operation

Standard encodings suffer from the design defect that they do not
provide a reliable way to recognize which coded character sets are in
use. @xref{What Unification Cannot Do for You}. There are scores of
character sets which can be represented by a single octet (8-bit byte),
whose union contains many hundreds of characters. Obviously this
results in great confusion, since you can't tell the players without a
scorecard, and there is no scorecard.

There are two ways to solve this problem. The first is to create a
universal coded character set. This is the concept behind Unicode.
Satisfactory (nearly) universal character sets have existed for several
decades, yet even today many Westerners resist using Unicode because
they consider its space requirements excessive. On the other hand,
many Asians dislike Unicode because they consider it to be incomplete.
(This is partly, but not entirely, political.)

In any case, Unicode only solves the internal representation problem.
Many data sets will contain files in ``legacy'' encodings, and Unicode
does not help distinguish among them.

The second approach is to embed information about the encodings used in
a document in its text. This approach is taken by the ISO 2022
standard. This would solve the problem completely from the user's point
of view, except that ISO 2022 is basically not implemented at all, in the
sense that few applications or systems implement more than a small
subset of ISO 2022 functionality. This is due to the fact that
mono-literate users object to the presence of escape sequences in their
texts (which they, with some justification, consider data corruption).
Programmers are more than willing to cater to these users, since
implementing ISO 2022 is a painstaking task.

In fact, Emacs/Mule adopts both of these approaches. Internally it uses
a universal character set, @dfn{Mule code}. Externally it uses ISO 2022
techniques both to save files in forms robust to encoding issues, and as
hints when attempting to ``guess'' an unknown encoding. However, Mule
suffers from a design defect: it embeds in each character the character
set information that ISO 2022 attaches to a run of characters by
introducing the run with a control sequence. That causes Mule to
consider the ISO Latin character sets to be disjoint. This manifests
itself when a user enters characters using input methods associated with
different coded character sets into a single buffer.

There are two problems stemming from this design. First, Mule
represents the same character in different ways. Abstractly, 'ó'
(LATIN SMALL LETTER O WITH ACUTE) can get represented as
[latin-iso8859-1 #x73] or as [latin-iso8859-2 #x73]. So what looks like
'óó' in the display might actually be represented [latin-iso8859-1
#x73][latin-iso8859-2 #x73] in the buffer, and saved as [#xF3 ESC - B
#xF3 ESC - A] in the file. In some cases this treatment would be
appropriate (consider HYPHEN, MINUS SIGN, EN DASH, EM DASH, and U+4E00
(the CJK ideographic character meaning ``one'')), and although arguably
incorrect it is convenient when mixing the CJK scripts. But in the case
of the Latin scripts this is wrong.

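The two representations can be inspected from Lisp. The following is
only a sketch: it assumes a Mule-enabled XEmacs whose internal
representation is Mule code as described above, and in which
@code{make-char} and @code{split-char} are available. The variable
names are illustrative.

@example
;; Build the "same" letter in two Latin charsets and compare them.
(let ((o1 (make-char 'latin-iso8859-1 #x73))
      (o2 (make-char 'latin-iso8859-2 #x73)))
  (list (split-char o1)   ; charset and position code of the first
        (split-char o2)   ; likewise for the second
        (eq o1 o2)))      ; are they the same Lisp character?
@end example

If Mule treats the two charsets as disjoint, the last element of the
result is @code{nil} even though both characters are displayed with the
same glyph.
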
Worse yet, it is very likely to occur when mixing ``different'' encodings
(such as ISO 8859/1 and ISO 8859/15) that differ only in a few code
points that are almost never used. A very important example involves
email. Many sites, especially in the U.S., default to use of the ISO
8859/1 coded character set (also called ``Latin 1,'' though these are
somewhat different concepts). However, ISO 8859/1 provides only a
generic CURRENCY SIGN character. Now that the Euro has become the
official currency of many countries in Europe, this is unsatisfactory
(and in practice, useless). So Europeans generally use ISO 8859/15,
which is nearly identical to ISO 8859/1 for most languages, except that
it substitutes EURO SIGN for CURRENCY SIGN.

Suppose a European user yanks text from a post encoded in ISO 8859/1
into a message composition buffer, and enters some text including the
Euro sign. Then Mule will consider the buffer to contain both ISO
8859/1 and ISO 8859/15 text, and MUAs such as Gnus will (if naively
programmed) send the message as a multipart mixed MIME body!

This is clearly stupid. What is not as obvious is that, just as any
European can include American English in their text because ASCII is a
subset of ISO 8859/15, most European languages which use Latin
characters (e.g., German and Polish) can typically be mixed while using
only one Latin coded character set (in the case of German and Polish,
ISO 8859/2). However, this often depends on exactly what text is to be
encoded (even for the same pair of languages).

Unification works around the problem by converting as many characters as
possible to use a single Latin coded character set before saving the
buffer.

Because the problem is rarely noticeable in editing a buffer, but tends
to manifest when that buffer is exported to a file or process, the
Unification package uses the strategy of examining the buffer prior to
export. If use of multiple Latin coded character sets is detected,
Unification attempts to unify them by finding a single coded character
set which contains all of the Latin characters in the buffer.

The primary purpose of Unification is to fix the problem by giving the
user the choice to change the representation of all characters to one
character set, and by giving sensible recommendations based on context.
In the 'ó' example, either ISO 8859/1 or ISO 8859/2 is satisfactory, and
both will be suggested. In the EURO SIGN example, only ISO 8859/15
makes sense, and that is what will be recommended. In both cases, the
user will be reminded that there are universal encodings available.

I call this @dfn{remapping} (from the universal character set to a
particular ISO 8859 coded character set). It is mere accident that this
letter has the same code point in both character sets. (Not entirely,
but there are many examples of Latin characters that have different code
points in different Latin-X sets.)

Note that, in the 'ó' example, treating the buffer in this way will
result in a representation such as [latin-iso8859-2
#x73][latin-iso8859-2 #x73], and the file will be saved as [#xF3 #xF3].
This is guaranteed to occasionally result in the second problem, to
which we now turn.

This problem is that, although the file is intended to be an
ISO-8859/2-encoded file, in an ISO 8859/1 locale Mule (and every
POSIX-compliant program---this is required by the standard, obvious if
you think a bit, @pxref{What Unification Cannot Do for You}) will read
that file as [latin-iso8859-1 #x73] [latin-iso8859-1 #x73]. Of course
this is no problem if all of the characters in the file are contained in
ISO 8859/1, but suppose there are some which are not, but are contained
in the (intended) ISO 8859/2.

You now want to fix this, but not by finding the same character in
another set. Instead, you want to simply change the character set that
Mule associates with that buffer position without changing the code.
(This is conceptually somewhat distinct from the first problem, and
logically ought to be handled in the code that defines coding systems.
However, unification is not an unreasonable place for it.) Unification
provides two functions (one fast and dangerous, the other slow and
careful) to handle this. I call this @dfn{recoding}, because the
transformation actually involves @emph{encoding} the buffer to file
representation, then @emph{decoding} it to buffer representation (in a
different character set). This cannot be done automatically because
Mule can have no idea what the correct encoding is---after all, it
already gave you its best guess. @xref{What Unification Cannot Do for
You}. So these functions must be invoked by the user. @xref{Interactive
Usage}.


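The idea behind recoding can be sketched with the general Mule region
functions @code{encode-coding-region} and @code{decode-coding-region}.
This only illustrates the encode-then-decode round trip; it is not the
code of the two functions that Unification provides. The helper name
and its arguments are hypothetical, and the coding system names in the
usage comment are assumed to exist on your system.

@example
;; Hypothetical helper, not part of the Unification package.
(defun my-recode-region-sketch (start end wrong-cs right-cs)
  "Re-interpret the text between START and END.
Encode it with WRONG-CS, the coding system Mule guessed, then decode
the resulting octets with RIGHT-CS, the one that was intended."
  (save-restriction
    (narrow-to-region start end)
    ;; Step 1: back to file representation (raw octets).
    (encode-coding-region (point-min) (point-max) wrong-cs)
    ;; Step 2: octets to buffer representation, in the other charset.
    (decode-coding-region (point-min) (point-max) right-cs)))

;; For text read as ISO 8859/1 but meant as ISO 8859/2:
;; (my-recode-region-sketch (point-min) (point-max)
;;                          'iso-8859-1 'iso-8859-2)
@end example

As the text above explains, only the user can supply the second coding
system, so a helper of this kind has to be invoked by hand.
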
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3174 @node What Unification Cannot Do for You, Unification Internals, Theory of Operation, Charset Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3175 @subsection What Unification Cannot Do for You
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3176
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3177 Unification @strong{cannot} save you if you insist on exporting data in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3178 8-bit encodings in a multilingual environment. @emph{You will
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3179 eventually corrupt data if you do this.} It is not Mule's, or any
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3180 application's, fault. You will have only yourself to blame; consider
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3181 yourself warned. (It is true that Mule has bugs, which make Mule
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3182 somewhat more dangerous and inconvenient than some naive applications.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3183 We're working to address those, but no application can remedy the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3184 inherent defect of 8-bit encodings.)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3185
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3186 Use standard universal encodings, preferably Unicode (UTF-8) unless
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3187 applicable standards indicate otherwise. The most important such case
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3188 is Internet messages, where MIME should be used, whether or not the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3189 subordinate encoding is a universal encoding. (Note that since one of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3190 the important provisions of MIME is the @samp{Content-Type} header,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3191 which has the charset parameter, MIME is to be considered a universal
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3192 encoding for the purposes of this manual. Of course, technically
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3193 speaking it's neither a coded character set nor a coding extension
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3194 technique compliant with ISO 2022.)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3195
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3196 As mentioned earlier, the problem is that standard encodings suffer from
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3197 the design defect that they do not provide a reliable way to recognize
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3198 which coded character sets are in use. There are scores of character
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3199 sets which can be represented by a single octet (8-bit byte), whose
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3200 union contains many hundreds of characters. Thus any 8-bit coded
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3201 character set must contain characters that share code points used for
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3202 different characters in other coded character sets.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3203
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3204 This means that a given file's intended encoding cannot be identified
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3205 with 100% reliability unless it contains encoding markers such as those
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3206 provided by MIME or ISO 2022.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3207
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3208 Unification actually makes it more likely that you will have problems of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3209 this kind. Traditionally Mule has been ``helpful'' by simply using an
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3210 ISO 2022 universal coding system when the current buffer coding system
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3211 cannot handle all the characters in the buffer. This has the effect
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3212 that, because the file contains control sequences, it is not recognized
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3213 as being in the locale's normal 8-bit encoding. It may be annoying if
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3214 you are not a Mule expert, but your data is automatically recoverable
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3215 with a tool you already have: Mule.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3216
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3217 However, with unification, Mule converts to a single 8-bit character set
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3218 when possible. But typically this will @emph{not} be in your usual
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3219 locale. I.e., the time that an ISO 8859/1 user will need Unification is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3220 when there are ISO 8859/2 characters in the buffer. But then most
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3221 likely the file will be saved in a pure 8-bit encoding that is not ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3222 8859/1, i.e., ISO 8859/2. Mule's autorecognizer (which is probably the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3223 most sophisticated yet available) cannot tell the difference between ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3224 8859/1 and ISO 8859/2, and in a Western European locale will choose the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3225 former even though the latter was intended. Even the extension
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3226 (``statistical recognition'') planned for XEmacs 22 is unlikely to be at
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3227 all accurate in the case of mixed codes.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3228
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3229 So now consider adding some additional ISO 8859/1 text to the buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3230 If it includes any ISO 8859/1 codes that are used by different
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3231 characters in ISO 8859/2, you now have a file that cannot be
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3232 mechanically disentangled. You need a human being who can recognize
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3233 that @emph{this is German and Swedish} and stays in Latin-1, while
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3234 @emph{that is Polish} and needs to be recoded to Latin-2.
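
The collision can be made concrete from Lisp. The following is only a
sketch, assuming a Mule-enabled XEmacs in which the
@samp{latin-iso8859-1} and @samp{latin-iso8859-2} charsets are defined,
and assuming that @code{make-char} accepts the 8-bit code point and
strips the high bit itself. The code point @code{#xb1} is chosen for
illustration: it is PLUS-MINUS SIGN in ISO 8859/1 and LATIN SMALL
LETTER A WITH OGONEK in ISO 8859/2.

@example
;; One octet on disk, two different characters in the buffer,
;; depending on which charset the decoder assumed.
(let ((as-latin-1 (make-char 'latin-iso8859-1 #xb1))
      (as-latin-2 (make-char 'latin-iso8859-2 #xb1)))
  (list (char-charset as-latin-1)      ; charset of the first reading
        (char-charset as-latin-2)      ; charset of the second reading
        (eq as-latin-1 as-latin-2)))   ; are they the same character?
@end example

Once both kinds of text have been written out as bare octets, nothing in
the file records which of the two readings was intended for a given
byte.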
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3235
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3236 Moral: switch to a universal coded character set, preferably Unicode
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3237 using the UTF-8 transformation format. If you really need the space,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3238 compress your files.
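
As a hedged sketch of acting on this advice for the current buffer:
the example assumes that a coding system named @samp{utf-8} is
available, which in the 21.4 series may require the Mule-UCS package.
The message text is purely illustrative.

@example
;; Save the current buffer as UTF-8 from now on, but only if this
;; XEmacs actually defines a coding system by that name.
(if (find-coding-system 'utf-8)
    (set-buffer-file-coding-system 'utf-8)
  (message "No utf-8 coding system here; is Mule-UCS installed?"))
@end example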
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3239
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3240
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3241 @node Unification Internals, , What Unification Cannot Do for You, Charset Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3242 @subsection Internals
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3243
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3244 No internals documentation yet.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3245
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3246 @file{unity-utils.el} provides one utility function.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3247
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3248 @defun unity-dump-tables
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3249
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3250 Dump the temporary table created by loading @file{unity-utils.el}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3251 to @file{unity-tables.el}. Loading the latter file initializes
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3252 @samp{unity-equivalences}.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3253 @end defun
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3254
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3255
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3256 @node Charsets and Coding Systems, , Charset Unification, MULE
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3257 @subsection Charsets and Coding Systems
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3258
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3259 This section provides reference lists of Mule charsets and coding
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3260 systems. Mule charsets are typically named by character set and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3261 standard.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3262
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3263 @table @strong
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3264 @item ASCII variants
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3265
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3266 Identification of equivalent characters in these sets is not properly
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3267 implemented. Unification does not distinguish the two charsets.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3268
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3269 @samp{ascii} @samp{latin-jisx0201}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3270
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3271 @item Extended Latin
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3272
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3273 Characters from the following ISO 2022 conformant charsets are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3274 identified with equivalents in other charsets in the group by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3275 Unification.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3276
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3277 @samp{latin-iso8859-1} @samp{latin-iso8859-15} @samp{latin-iso8859-2}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3278 @samp{latin-iso8859-3} @samp{latin-iso8859-4} @samp{latin-iso8859-9}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3279 @samp{latin-iso8859-13} @samp{latin-iso8859-16}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3280
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3281 The following charsets are Latin variants which are not understood by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3282 Unification. In addition, many of the Asian language standards provide
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3283 ASCII, at least, and sometimes other Latin characters. None of these
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3284 are identified with their ISO 8859 equivalents.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3285
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3286 @samp{vietnamese-viscii-lower}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3287 @samp{vietnamese-viscii-upper}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3288
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3289 @item Other character sets
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3290
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3291 @samp{arabic-1-column}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3292 @samp{arabic-2-column}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3293 @samp{arabic-digit}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3294 @samp{arabic-iso8859-6}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3295 @samp{chinese-big5-1}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3296 @samp{chinese-big5-2}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3297 @samp{chinese-cns11643-1}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3298 @samp{chinese-cns11643-2}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3299 @samp{chinese-cns11643-3}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3300 @samp{chinese-cns11643-4}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3301 @samp{chinese-cns11643-5}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3302 @samp{chinese-cns11643-6}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3303 @samp{chinese-cns11643-7}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3304 @samp{chinese-gb2312}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3305 @samp{chinese-isoir165}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3306 @samp{cyrillic-iso8859-5}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3307 @samp{ethiopic}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3308 @samp{greek-iso8859-7}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3309 @samp{hebrew-iso8859-8}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3310 @samp{ipa}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3311 @samp{japanese-jisx0208}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3312 @samp{japanese-jisx0208-1978}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3313 @samp{japanese-jisx0212}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3314 @samp{katakana-jisx0201}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3315 @samp{korean-ksc5601}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3316 @samp{sisheng}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3317 @samp{thai-tis620}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3318 @samp{thai-xtis}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3319
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3320 @item Non-graphic charsets
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3321
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3322 @samp{control-1}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3323 @end table
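
Not every build defines every charset listed above. A minimal sketch of
checking a few names from Lisp follows, assuming only the standard Mule
function @code{find-charset}; the three names tested are arbitrary
picks from the lists above.

@example
;; Pair each charset name with t if this XEmacs defines it,
;; or nil if it does not.
(mapcar #'(lambda (name)
            (cons name (and (find-charset name) t)))
        '(latin-iso8859-1 latin-iso8859-2 latin-iso8859-15))
@end example

@code{(charset-list)} returns the names of all charsets that are
currently defined, which can be compared against the lists above in the
same way.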
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3324
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3325 @table @strong
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3326 @item No conversion
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3327
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3328 Some of these coding systems may specify EOL conventions. Note that
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3329 @samp{iso-8859-1} is a no-conversion coding system, not an ISO 2022
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3330 coding system. Although unification attempts to compensate for this, it
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3331 is possible that the @samp{iso-8859-1} coding system will behave
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3332 differently from other ISO 8859 coding systems.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3333
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3334 @samp{binary} @samp{no-conversion} @samp{raw-text} @samp{iso-8859-1}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3335
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3336 @item Latin coding systems
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3337
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3338 These coding systems are all single-byte, 8-bit ISO 2022 coding systems,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3339 combining ASCII in the GL register (bytes with high-bit clear) and an
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3340 extended Latin character set in the GR register (bytes with high-bit set).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3341
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3342 @samp{iso-8859-15} @samp{iso-8859-2} @samp{iso-8859-3} @samp{iso-8859-4}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3343 @samp{iso-8859-9} @samp{iso-8859-13} @samp{iso-8859-14} @samp{iso-8859-16}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3344
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3345 These coding systems are single-byte, 8-bit coding systems that do not
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3346 conform to international standards. They should be avoided in all
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3347 potentially multilingual contexts, including any text distributed over
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3348 the Internet and World Wide Web.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3349
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3350 @samp{windows-1251}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3351
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3352 @item Multilingual coding systems
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3353
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3354 The following ISO-2022-based coding systems are useful for multilingual
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3355 text.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3356
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3357 @samp{ctext} @samp{iso-2022-lock} @samp{iso-2022-7} @samp{iso-2022-7bit}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3358 @samp{iso-2022-7bit-ss2} @samp{iso-2022-8} @samp{iso-2022-8bit-ss2}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3359
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3360 XEmacs also supports Unicode with the Mule-UCS package. These are the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3361 preferred coding systems for multilingual use. (There is a possible
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3362 exception for texts that mix several Asian ideographic character sets.)
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3363
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3364 @samp{utf-16-be} @samp{utf-16-be-no-signature} @samp{utf-16-le}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3365 @samp{utf-16-le-no-signature} @samp{utf-7} @samp{utf-7-safe}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3366 @samp{utf-8} @samp{utf-8-ws}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3367
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3368 Development versions of XEmacs (the 21.5 series) support Unicode
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3369 internally, with (at least) the following coding systems implemented:
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3370
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3371 @samp{utf-16-be} @samp{utf-16-be-bom} @samp{utf-16-le}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3372 @samp{utf-16-le-bom} @samp{utf-8} @samp{utf-8-bom}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3373
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3374 @item Asian ideographic languages
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3375
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3376 The following coding systems are based on ISO 2022, and are more or less
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3377 suitable for encoding multilingual texts. They all can represent ASCII
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3378 at least, and sometimes several other foreign character sets, without
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3379 resort to arbitrary ISO 2022 designations. However, these subsets are
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3380 not identified with the corresponding national standards in XEmacs Mule.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3381
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3382 @samp{chinese-euc} @samp{cn-big5} @samp{cn-gb-2312} @samp{gb2312}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3383 @samp{hz} @samp{hz-gb-2312} @samp{old-jis} @samp{japanese-euc}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3384 @samp{junet} @samp{euc-japan} @samp{euc-jp} @samp{iso-2022-jp}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3385 @samp{iso-2022-jp-1978-irv} @samp{iso-2022-jp-2} @samp{euc-kr}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3386 @samp{korean-euc} @samp{iso-2022-kr} @samp{iso-2022-int-1}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3387
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3388 The following coding systems cannot be used for general multilingual
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3389 text and do not cooperate well with other coding systems.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3390
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3391 @samp{big5} @samp{shift_jis}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3392
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3393 @item Other languages
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3394
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3395 The following coding systems are based on ISO 2022. Though none of them
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3396 provides any Latin characters beyond ASCII, XEmacs Mule allows (and up
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3397 to 21.4 defaults to) use of ISO 2022 control sequences to designate
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3398 other character sets for inclusion in the text.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3399
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3400 @samp{iso-8859-5} @samp{iso-8859-7} @samp{iso-8859-8}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3401 @samp{ctext-hebrew}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3402
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3403 The following are coding systems that do not conform to ISO 2022 and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3404 thus cannot be safely used in a multilingual context.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3405
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3406 @samp{alternativnyj} @samp{koi8-r} @samp{tis-620} @samp{viqr}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3407 @samp{viscii} @samp{vscii}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3408
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3409 @item Special coding systems
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3410
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3411 Mule uses the following coding systems for special purposes.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3412
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3413 @samp{automatic-conversion} @samp{undecided} @samp{escape-quoted}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3414
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3415 @samp{escape-quoted} is especially important, as it is used internally
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3416 as the coding system for autosaved data.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3417
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3418 The following coding systems are aliases for others, and are used for
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3419 communication with the host operating system.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3420
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3421 @samp{file-name} @samp{keyboard} @samp{terminal}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3422
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3423 @end table
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3424
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3425 Mule detection of coding systems is actually limited to detection of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3426 classes of coding systems called @dfn{coding categories}. These coding
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3427 categories are identified by the ISO 2022 control sequences they use, if
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3428 any, by their conformance to ISO 2022 restrictions on code points that
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3429 may be used, and by characteristic patterns of use of 8-bit code points.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3430
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3431 @samp{no-conversion}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3432 @samp{utf-8}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3433 @samp{ucs-4}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3434 @samp{iso-7}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3435 @samp{iso-lock-shift}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3436 @samp{iso-8-1}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3437 @samp{iso-8-2}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3438 @samp{iso-8-designate}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3439 @samp{shift-jis}
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3440 @samp{big5}
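
The categories, and the coding system currently associated with one of
them, can be inspected from Lisp. This is a sketch only: it assumes
the detection interface of the 21.4 series (@code{coding-category-list},
@code{coding-priority-list}, @code{coding-category-system}), which may
differ in development versions.

@example
;; All coding categories the detector knows about.
(coding-category-list)

;; The same categories, in the order the detector prefers them.
(coding-priority-list)

;; The one coding system that stands for a whole category; every
;; file detected as that category is decoded with this system.
(coding-category-system 'iso-8-1)
@end example

Because a category maps to a single coding system, two coding systems
that fall into the same category cannot be told apart by detection
alone.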
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3441
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3442
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3443 @c end of mule.texi
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent] stephent parents: 901 diff changeset	3444
