annotate man/xemacs/mule.texi @ 5576:071b810ceb18

Declare labels as line where appropriate; use #'labels, not #'flet, tests.

lisp/ChangeLog addition:

2011-10-03  Aidan Kehoe  <kehoea@parhasard.net>

    * simple.el (handle-pre-motion-command-current-command-is-motion):
    Implement #'keysyms-equal with #'labels + (declare (inline ...)),
    instead of abusing macrolet to the same end.
    * specifier.el (let-specifier):
    * mule/mule-cmds.el (describe-language-environment):
    * mule/mule-cmds.el (set-language-environment-coding-systems):
    * mule/mule-x-init.el (x-use-halfwidth-roman-font):
    * faces.el (Face-frob-property):
    * keymap.el (key-sequence-list-description):
    * lisp-mode.el (construct-lisp-mode-menu):
    * loadhist.el (unload-feature):
    * mouse.el (default-mouse-track-check-for-activation):
    Declare various labels inline in dumped files when that reduces
    the size of the dumped image. Declaring labels inline is normally
    only worthwhile for inner loops and so on, but it's reasonable
    exercise of the related code to have these changes in core.

tests/ChangeLog addition:

2011-10-03  Aidan Kehoe  <kehoea@parhasard.net>

    * automated/case-tests.el (uni-mappings):
    * automated/database-tests.el (delete-database-files):
    * automated/hash-table-tests.el (iterations):
    * automated/lisp-tests.el (test1):
    * automated/lisp-tests.el (a):
    * automated/lisp-tests.el (cl-floor):
    * automated/lisp-tests.el (foo):
    * automated/lisp-tests.el (list-nreverse):
    * automated/lisp-tests.el (needs-lexical-context):
    * automated/mule-tests.el (featurep):
    * automated/os-tests.el (original-string):
    * automated/os-tests.el (with):
    * automated/symbol-tests.el (check-weak-list-unique):
    Replace #'flet with #'labels where appropriate in these tests,
    following my own advice on style in the docstrings of those
    functions.
author Aidan Kehoe <kehoea@parhasard.net>
date Mon, 03 Oct 2011 20:16:14 +0100
parents 6b0000935adc
children
rev   line source
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
1 @c This is part of the Emacs manual.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
2 @c Copyright (C) 1997 Free Software Foundation, Inc.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
3 @c See file emacs.texi for copying conditions.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
4 @node Mule, Major Modes, Windows, Top
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
5 @chapter World Scripts Support
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
6 @cindex MULE
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
7 @cindex international scripts
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
8 @cindex multibyte characters
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
9 @cindex encoding of characters
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
10
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
11 @cindex Chinese
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
12 @cindex Greek
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
13 @cindex IPA
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
14 @cindex Japanese
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
15 @cindex Korean
600
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
16 @cindex Cyrillic
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
17 @cindex Russian
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
18 @c #### It's a lie that this file tells you about Unicode....
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
19 @cindex Unicode
600
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
20 If you build XEmacs using the @code{--with-mule} option, it supports a
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
21 wide variety of world scripts, including the Latin script, the Arabic
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
22 script, Simplified Chinese (for mainland China), Traditional Chinese
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
23 (for Taiwan and Hong Kong), the Greek script, the Hebrew script, IPA
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
24 symbols, Japanese scripts (Hiragana, Katakana and Kanji), Korean scripts
600
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
25 (Hangul and Hanja) and the Cyrillic script (for Byelorussian, Bulgarian,
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
26 Russian, Serbian and Ukrainian). These features have been merged from
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
27 the modified version of Emacs known as MULE (for ``MULti-lingual
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
28 Enhancement to GNU Emacs'').
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
29
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
30 @menu
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
31 * Mule Intro:: Basic concepts of Mule.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
32 * Language Environments:: Setting things up for the language you use.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
33 * Input Methods:: Entering text characters not on your keyboard.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
34 * Select Input Method:: Specifying your choice of input methods.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
35 * Coding Systems:: Character set conversion when you read and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
36 write files, and so on.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
37 * Recognize Coding:: How XEmacs figures out which conversion to use.
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
38 * Unification:: Integrating overlapping character sets.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
39 * Specify Coding:: Various ways to choose which conversion to use.
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
40 * Charsets and Coding Systems:: Tables and other reference material.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
41 @end menu
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
42
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
43 @node Mule Intro, Language Environments, Mule, Mule
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
44 @section Introduction: The Wide Variety of Scripts and Codings in Use
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
45
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
46 There are hundreds of scripts in use world-wide. The users of these
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
47 scripts have established many more-or-less standard coding systems for
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
48 storing text written in them in files. XEmacs translates between its
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
49 internal character encoding and various other coding systems when
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
50 reading and writing files, when exchanging data with subprocesses, and
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
51 (in some cases) in the @kbd{C-q} command (see below).
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
52 @footnote{Historically the internal encoding was a specially designed
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
53 encoding, called @dfn{Mule encoding}, intended for easy conversion to
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
54 and from versions of ISO 2022. However, this encoding shares many
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
55 properties with UTF-8, and conversion to UTF-8 as the internal code is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
56 proposed.}
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
57
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
58 @kindex C-h h
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
59 @findex view-hello-file
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
60 The command @kbd{C-h h} (@code{view-hello-file}) displays the file
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
61 @file{etc/HELLO}, which shows how to say ``hello'' in many languages.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
62 This illustrates various scripts.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
63
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
64 Keyboards, even in the countries where these character sets are used,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
65 generally don't have keys for all the characters in them. So XEmacs
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
66 supports various @dfn{input methods}, typically one for each script or
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
67 language, to make it convenient to type them.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
68
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
69 @kindex C-x RET
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
70 The prefix key @kbd{C-x @key{RET}} is used for commands that pertain
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
71 to world scripts, coding systems, and input methods.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
72
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
73
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
74 @node Language Environments, Input Methods, Mule Intro, Mule
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
75 @section Language Environments
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
76 @cindex language environments
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
77
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
78 All supported character sets can be used in XEmacs buffers if it is
873
26f7cf2a4792 [xemacs-hg @ 2002-06-20 21:39:19 by adrian]
adrian
parents: 602
diff changeset
79 compiled with mule; there is no need to select a particular language in
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
80 order to display its characters in an XEmacs buffer. However, it is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
81 important to select a @dfn{language environment} in order to set various
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
82 defaults. The language environment really represents a choice of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
83 preferred script (more or less) rather than a choice of language.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
84
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
85 The language environment controls which coding systems to recognize
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
86 when reading text (@pxref{Recognize Coding}). This applies to files,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
87 incoming mail, netnews, and any other text you read into XEmacs. It may
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
88 also specify the default coding system to use when you create a file.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
89 Each language environment also specifies a default input method.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
90
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
91 @findex set-language-environment
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
92 The command to select a language environment is @kbd{M-x
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
93 set-language-environment}. It makes no difference which buffer is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
94 current when you use this command, because the effects apply globally to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
95 the XEmacs session. The supported language environments include:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
96
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
97 @quotation
600
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
98 ASCII, Chinese-BIG5, Chinese-GB, Croatian, Cyrillic-ALT, Cyrillic-ISO,
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
99 Cyrillic-KOI8, Cyrillic-Win, Czech, English, Ethiopic, French, German,
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
100 Greek, Hebrew, IPA, Japanese, Korean, Latin-1, Latin-2, Latin-3, Latin-4,
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
101 Latin-5, Norwegian, Polish, Romanian, Slovenian, Thai-XTIS, Vietnamese.
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
102 @end quotation
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
103
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
104 Some operating systems let you specify the language you are using by
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
105 setting locale environment variables. XEmacs handles one common special
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
106 case of this: if your locale name for character types contains the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
107 string @samp{8859-@var{n}}, XEmacs automatically selects the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
108 corresponding language environment.
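
The locale rule described above can be illustrated with a small sketch. This is not the code XEmacs uses; it only shows the idea of finding `8859-N` in a locale name and looking up an environment. The function name `guess_language_environment` and the table `ISO_8859_ENVIRONMENTS` are hypothetical, and the pairing of each ISO 8859 part with one of the language environment names listed earlier is an assumption made for illustration.

```python
import re

# Hypothetical lookup table: ISO 8859 part number -> language environment.
# The script covered by each part follows ISO 8859; which environment name
# XEmacs actually picks for each part is an assumption of this sketch.
ISO_8859_ENVIRONMENTS = {
    1: "Latin-1",
    2: "Latin-2",
    3: "Latin-3",
    4: "Latin-4",
    5: "Cyrillic-ISO",
    7: "Greek",
    8: "Hebrew",
    9: "Latin-5",
}


def guess_language_environment(locale_name):
    """Return an environment name if the locale contains '8859-N', else None."""
    match = re.search(r"8859-(\d+)", locale_name)
    if match is None:
        return None  # no ISO 8859 marker in the locale name
    # Parts without a table entry also yield None.
    return ISO_8859_ENVIRONMENTS.get(int(match.group(1)))
```

A locale name without the `8859-N` marker, or with a part number missing from the table, gives `None`, which corresponds to making no automatic selection.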
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
109
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
110 @kindex C-h L
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
111 @findex describe-language-environment
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
112 To display information about the effects of a certain language
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
113 environment @var{lang-env}, use the command @kbd{C-h L @var{lang-env}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
114 @key{RET}} (@code{describe-language-environment}). This tells you which
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
115 languages this language environment is useful for, and lists the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
116 character sets, coding systems, and input methods that go with it. It
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
117 also shows some sample text to illustrate scripts used in this language
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
118 environment. By default, this command describes the chosen language
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
119 environment.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
120
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
121 @node Input Methods, Select Input Method, Language Environments, Mule
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
122 @section Input Methods
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
123
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
124 @cindex input methods
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
125 An @dfn{input method} is a kind of character conversion designed
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
126 specifically for interactive input. In XEmacs, typically each language
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
127 has its own input method; sometimes several languages which use the same
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
128 characters can share one input method. A few languages support several
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
129 input methods.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
130
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
131 The simplest kind of input method works by mapping ASCII letters into
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
132 another alphabet. This is how the Greek and Russian input methods work.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
133
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
134 A more powerful technique is composition: converting sequences of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
135 characters into one letter. Many European input methods use composition
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
136 to produce a single non-ASCII letter from a sequence that consists of a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
137 letter followed by accent characters. For example, some methods convert
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
138 the sequence @kbd{a'} into a single accented letter.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
139
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
140 The input methods for syllabic scripts typically use mapping followed
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
141 by composition. The input methods for Thai and Korean work this way.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
142 First, letters are mapped into symbols for particular sounds or tone
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
143 marks; then, sequences of these which make up a whole syllable are
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
144 mapped into one syllable sign.
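
The two stages, mapping and then composition, can be sketched for Korean as follows. This is a toy model, not the XEmacs implementation. The key assignments in `LEAD`, `VOWEL` and `TAIL` are assumptions that cover only a handful of keys, and `compose_syllable` is a hypothetical name. The composition step uses the standard Unicode arithmetic for precomposed Hangul syllables.

```python
# Stage 1 (mapping): an ASCII key selects a sound symbol, stored here as
# its index in the Unicode Hangul composition tables. Only a few keys are
# filled in, and the key choices are an assumption of this sketch.
LEAD = {"r": 0, "s": 2, "g": 18}    # initial consonants: g, n, h
VOWEL = {"k": 0, "h": 8, "l": 20}   # vowels: a, o, i
TAIL = {"": 0, "r": 1, "s": 4}      # final consonants: none, g, n

HANGUL_BASE = 0xAC00   # first precomposed Hangul syllable
VOWEL_COUNT = 21
TAIL_COUNT = 28


def compose_syllable(lead_key, vowel_key, tail_key=""):
    """Stage 2 (composition): combine the mapped symbols into one syllable sign."""
    lead = LEAD[lead_key]
    vowel = VOWEL[vowel_key]
    tail = TAIL[tail_key]
    code_point = HANGUL_BASE + (lead * VOWEL_COUNT + vowel) * TAIL_COUNT + tail
    return chr(code_point)
```

For instance, the three keys `g`, `k`, `s` map to an initial consonant, a vowel and a final consonant, and `compose_syllable("g", "k", "s")` returns the single character U+D55C.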
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
145
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
146 Chinese and Japanese require more complex methods. In Chinese input
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
147 methods, first you enter the phonetic spelling of a Chinese word (in
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
148 input method @code{chinese-py}, among others), or a sequence of portions
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
149 of the character (input methods @code{chinese-4corner} and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
150 @code{chinese-sw}, and others). Since one phonetic spelling typically
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
151 corresponds to many different Chinese characters, you must select one of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
152 the alternatives using special XEmacs commands. Keys such as @kbd{C-f},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
153 @kbd{C-b}, @kbd{C-n}, @kbd{C-p}, and digits have special definitions in
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
154 this situation, used for selecting among the alternatives. @key{TAB}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
155 displays a buffer showing all the possibilities.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
156
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
157 In Japanese input methods, first you input a whole word using
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
158 phonetic spelling; then, after the word is in the buffer, XEmacs
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
159 converts it into one or more characters using a large dictionary. One
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
160 phonetic spelling corresponds to many differently written Japanese
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
161 words, so you must select one of them; use @kbd{C-n} and @kbd{C-p} to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
162 cycle through the alternatives.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
163
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
164 Sometimes it is useful to cut off input method processing so that the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
165 characters you have just entered will not combine with subsequent
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
166 characters. For example, in input method @code{latin-1-postfix}, the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
167 sequence @kbd{e '} combines to form an @samp{e} with an accent. What if
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
168 you want to enter them as separate characters?
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
169
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
170 One way is to type the accent twice; that is a special feature for
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
171 entering the separate letter and accent. For example, @kbd{e ' '} gives
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
172 you the two characters @samp{e'}. Another way is to type another letter
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
173 after the @kbd{e}---something that won't combine with that---and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
174 immediately delete it. For example, you could type @kbd{e e @key{DEL}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
175 '} to get separate @samp{e} and @samp{'}.
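
The composition rule and the doubled-accent escape can be modelled in a few lines. This is a simplified sketch that processes a finished string of keystrokes, whereas a real input method works interactively; the table `COMPOSE` and the function `postfix_compose` are hypothetical and cover only three combinations.

```python
# Toy table for a postfix method: (base letter, accent key) -> composed letter.
COMPOSE = {
    ("e", "'"): "\u00e9",  # e with acute accent
    ("a", "'"): "\u00e1",  # a with acute accent
    ("e", "`"): "\u00e8",  # e with grave accent
}


def postfix_compose(keys):
    """Turn a string of keystrokes into text, composing letter + accent pairs."""
    out = []
    i = 0
    while i < len(keys):
        ch = keys[i]
        nxt = keys[i + 1] if i + 1 < len(keys) else None
        if nxt is not None and (ch, nxt) in COMPOSE:
            if i + 2 < len(keys) and keys[i + 2] == nxt:
                # Accent typed twice: keep the letter and the accent separate.
                out.append(ch + nxt)
                i += 3
            else:
                # Letter followed by one accent: emit the composed letter.
                out.append(COMPOSE[(ch, nxt)])
                i += 2
        else:
            # Anything else passes through unchanged.
            out.append(ch)
            i += 1
    return "".join(out)
```

In this model, a pair that is not in the table, such as `x` followed by an accent key, is left as two separate characters.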
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
176
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
177 Another method, more general but not quite as easy to type, is to use
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
178 @kbd{C-\ C-\} between two characters to stop them from combining. This
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
179 is the command @kbd{C-\} (@code{toggle-input-method}) used twice.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
180 @ifinfo
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
181 @xref{Select Input Method}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
182 @end ifinfo
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
183
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
184 @kbd{C-\ C-\} is especially useful inside an incremental search,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
185 because it stops waiting for more characters to combine, and starts
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
186 searching for what you have already entered.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
187
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
188 @vindex input-method-verbose-flag
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
189 @vindex input-method-highlight-flag
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
190 The variables @code{input-method-highlight-flag} and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
191 @code{input-method-verbose-flag} control how input methods explain what
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
192 is happening. If @code{input-method-highlight-flag} is non-@code{nil},
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
193 the partial sequence is highlighted in the buffer. If
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
194 @code{input-method-verbose-flag} is non-@code{nil}, the list of possible
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
195 characters to type next is displayed in the echo area (but not when you
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
196 are in the minibuffer).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
197
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
198 @node Select Input Method, Coding Systems, Input Methods, Mule
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
199 @section Selecting an Input Method
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
200
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
201 @table @kbd
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
202 @item C-\
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
203 Enable or disable use of the selected input method.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
204
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
205 @item C-x @key{RET} C-\ @var{method} @key{RET}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
206 Select a new input method for the current buffer.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
207
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
208 @item C-h I @var{method} @key{RET}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
209 @itemx C-h C-\ @var{method} @key{RET}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
210 @findex describe-input-method
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
211 @kindex C-h I
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
212 @kindex C-h C-\
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
213 Describe the input method @var{method} (@code{describe-input-method}).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
214 By default, it describes the current input method (if any).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
215
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
216 @item M-x list-input-methods
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
217 Display a list of all the supported input methods.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
218 @end table
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
219
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
220 @findex select-input-method
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
221 @vindex current-input-method
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
222 @kindex C-x RET C-\
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
223 To choose an input method for the current buffer, use @kbd{C-x
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
224 @key{RET} C-\} (@code{select-input-method}). This command reads the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
225 input method name with the minibuffer; the name normally starts with the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
226 language environment that it is meant to be used with. The variable
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
227 @code{current-input-method} records which input method is selected.
600
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
228
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
229 @findex toggle-input-method
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
230 @kindex C-\
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
231 Input methods use various sequences of ASCII characters to stand for
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
232 non-ASCII characters. Sometimes it is useful to turn off the input
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
233 method temporarily. To do this, type @kbd{C-\}
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
234 (@code{toggle-input-method}). To reenable the input method, type
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
235 @kbd{C-\} again.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
236
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
237 If you type @kbd{C-\} and you have not yet selected an input method,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
238 it prompts you to specify one. This has the same effect as using
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
239 @kbd{C-x @key{RET} C-\} to specify an input method.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
240
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
241 @vindex default-input-method
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
242 Selecting a language environment specifies a default input method for
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
243 use in various buffers. When you have a default input method, you can
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
244 select it in the current buffer by typing @kbd{C-\}. The variable
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
245 @code{default-input-method} specifies the default input method
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
246 (@code{nil} means there is none).
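
  As a minimal sketch, an init file could set the default input method
directly.  The input method name @code{"latin-1-prefix"} is only an
illustrative assumption; substitute a name that @kbd{M-x
list-input-methods} reports on your system.

@smallexample
;; Hypothetical choice of input method; replace it with one that
;; `list-input-methods' shows as available.
(setq default-input-method "latin-1-prefix")
@end smallexample

@noindent
With such a setting in place, @kbd{C-\} should select that method
without prompting.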
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
247
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
248 @findex quail-set-keyboard-layout
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
249 Some input methods for alphabetic scripts work by (in effect)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
250 remapping the keyboard to emulate various keyboard layouts commonly used
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
251 for those scripts. How to do this remapping properly depends on your
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
252 actual keyboard layout. To specify which layout your keyboard has, use
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
253 the command @kbd{M-x quail-set-keyboard-layout}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
254
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
255 @findex list-input-methods
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
256 To display a list of all the supported input methods, type @kbd{M-x
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
257 list-input-methods}. The list gives information about each input
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
258 method, including the string that stands for it in the mode line.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
259
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
260 @node Coding Systems, Recognize Coding, Select Input Method, Mule
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
261 @section Coding Systems
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
262 @cindex coding systems
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
263
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
264 Users of various languages have established many more-or-less standard
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
265 coding systems for representing them. XEmacs does not use these coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
266 systems internally; instead, it converts from various coding systems to
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
267 its own system when reading data, and converts the internal coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
268 system to other coding systems when writing data. Conversion is
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
269 possible in reading or writing files, in sending to or receiving from the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
270 terminal, and in exchanging data with subprocesses.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
271
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
272 XEmacs assigns a name to each coding system. Most coding systems are
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
273 used for one language, and the name of the coding system starts with the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
274 language name. Some coding systems are used for several languages;
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
275 their names usually start with @samp{iso}. There are also special
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
276 coding systems @code{binary} and @code{no-conversion} which do not
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
277 convert printing characters at all.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
278
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
279 In addition to converting various representations of non-ASCII
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
280 characters, a coding system can perform end-of-line conversion. XEmacs
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
281 handles three different conventions for how to separate lines in a file:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
282 newline, carriage-return linefeed, and just carriage-return.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
283
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
284 @table @kbd
600
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
285 @item C-x @key{RET} C @var{coding} @key{RET}
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
286 Describe coding system @var{coding}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
287
600
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
288 @item C-x @key{RET} C @key{RET}
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
289 Describe the coding systems currently in use.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
290
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
291 @item M-x list-coding-systems
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
292 Display a list of all the supported coding systems.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
293
600
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
294 @item C-u M-x list-coding-systems
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
295 Display a comprehensive list of details for all supported coding
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
296 systems.
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
297
602
b9f1a2e84ead [xemacs-hg @ 2001-06-01 08:17:05 by martinb]
martinb
parents: 600
diff changeset
298 @end table
b9f1a2e84ead [xemacs-hg @ 2001-06-01 08:17:05 by martinb]
martinb
parents: 600
diff changeset
299
600
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
300 @kindex C-x RET C
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
301 @findex describe-coding-system
600
a99eebfee7d3 [xemacs-hg @ 2001-06-01 07:15:24 by martinb]
martinb
parents: 442
diff changeset
302 The command @kbd{C-x @key{RET} C} (@code{describe-coding-system}) displays
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
303 information about particular coding systems. You can specify a coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
304 system name as argument; alternatively, with an empty argument, it
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
305 describes the coding systems currently selected for various purposes,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
306 both in the current buffer and as the defaults, and the priority list
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
307 for recognizing coding systems (@pxref{Recognize Coding}).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
308
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
309 @findex list-coding-systems
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
310 To display a list of all the supported coding systems, type @kbd{M-x
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
311 list-coding-systems}. The list gives information about each coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
312 system, including the letter that stands for it in the mode line
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
313 (@pxref{Mode Line}).
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
314
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
315 Each of the coding systems that appear in this list---except for
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
316 @code{binary}, which means no conversion of any kind---specifies how and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
317 whether to convert printing characters, but leaves the choice of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
318 end-of-line conversion to be decided based on the contents of each file.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
319 For example, if the file appears to use carriage-return linefeed between
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
320 lines, that end-of-line conversion will be used.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
321
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
322 Each of the listed coding systems has three variants which specify
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
323 exactly what to do for end-of-line conversion:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
324
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
325 @table @code
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
326 @item @dots{}-unix
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
327 Don't do any end-of-line conversion; assume the file uses
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
328 newline to separate lines. (This is the convention normally used
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
329 on Unix and GNU systems.)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
330
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
331 @item @dots{}-dos
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
332 Assume the file uses carriage-return linefeed to separate lines,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
333 and do the appropriate conversion. (This is the convention normally used
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
334 on Microsoft systems.)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
335
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
336 @item @dots{}-mac
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
337 Assume the file uses carriage-return to separate lines, and do the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
338 appropriate conversion. (This is the convention normally used on the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
339 Macintosh system.)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
340 @end table
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
341
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
342 These variant coding systems are omitted from the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
343 @code{list-coding-systems} display for brevity, since they are entirely
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
344 predictable. For example, the coding system @code{iso-8859-1} has
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
345 variants @code{iso-8859-1-unix}, @code{iso-8859-1-dos} and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
346 @code{iso-8859-1-mac}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
347
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
348 In contrast, the coding system @code{binary} specifies no character
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
349 code conversion at all---none for non-Latin-1 byte values and none for
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
350 end of line. This is useful for reading or writing binary files, tar
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
351 files, and other files that must be examined verbatim.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
352
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
353 The easiest way to edit a file with no conversion of any kind is with
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
354 the @kbd{M-x find-file-literally} command. This uses @code{binary}, and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
355 also suppresses other XEmacs features that might convert the file
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
356 contents before you see them. @xref{Visiting}.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
357
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
358 The coding system @code{no-conversion} means that the file contains
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
359 non-Latin-1 characters stored with the internal XEmacs encoding. It
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
360 handles end-of-line conversion based on the data encountered, and has
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
361 the usual three variants to specify the kind of end-of-line conversion.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
362
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
363
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
364 @node Recognize Coding, Unification, Coding Systems, Mule
428
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
365 @section Recognizing Coding Systems
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
366
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
367 Most of the time, XEmacs can recognize which coding system to use for
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
368 any given file, once you have specified your preferences.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
369
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
370 Some coding systems can be recognized or distinguished by which byte
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
371 sequences appear in the data. However, there are coding systems that
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
372 cannot be distinguished, not even potentially. For example, there is no
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
373 way to distinguish between Latin-1 and Latin-2; they use the same byte
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
374 values with different meanings.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
375
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
376 XEmacs handles this situation by means of a priority list of coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
377 systems. Whenever XEmacs reads a file, if you do not specify the coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
378 system to use, XEmacs checks the data against each coding system,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
379 starting with the first in priority and working down the list, until it
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
380 finds a coding system that fits the data. Then it converts the file
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
381 contents assuming that they are represented in this coding system.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
382
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
383 The priority list of coding systems depends on the selected language
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
384 environment (@pxref{Language Environments}). For example, if you use
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
385 French, you probably want XEmacs to prefer Latin-1 to Latin-2; if you
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
386 use Czech, you probably want Latin-2 to be preferred. This is one of
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
387 the reasons to specify a language environment.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
388
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
389 @findex prefer-coding-system
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
390 However, you can alter the priority list in detail with the command
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
391 @kbd{M-x prefer-coding-system}. This command reads the name of a coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
392 system from the minibuffer, and adds it to the front of the priority
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
393 list, so that it is preferred to all others. If you use this command
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
394 several times, each use adds one element to the front of the priority
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
395 list.
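
  The same adjustment can presumably be made from Lisp, for instance in
an init file.  The following is a sketch only; the choice of
@code{iso-8859-2} is an assumption made for illustration.

@smallexample
;; Put iso-8859-2 at the front of the priority list, so that it is
;; tried before the other coding systems when recognizing a file.
(prefer-coding-system 'iso-8859-2)
@end smallexample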
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
396
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
397 @vindex file-coding-system-alist
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
398 Sometimes a file name indicates which coding system to use for the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
399 file. The variable @code{file-coding-system-alist} specifies this
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
400 correspondence. There is a special function
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
401 @code{modify-coding-system-alist} for adding elements to this list. For
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
402 example, to read and write all @samp{.txt} files using the coding system
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
403 @code{china-iso-8bit}, you can execute this Lisp expression:
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
404
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
405 @smallexample
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
406 (modify-coding-system-alist 'file "\\.txt\\'" 'china-iso-8bit)
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
407 @end smallexample
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
408
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
409 @noindent
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
410 The first argument should be @code{file}, the second argument should be
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
411 a regular expression that determines which files this applies to, and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
412 the third argument says which coding system to use for these files.
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
413
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
414 @vindex coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
415 You can specify the coding system for a particular file using the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
416 @samp{-*-@dots{}-*-} construct at the beginning of a file, or a local
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
417 variables list at the end (@pxref{File Variables}). You do this by
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
418 defining a value for the ``variable'' named @code{coding}. XEmacs does
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
419 not really have a variable @code{coding}; instead of setting a variable,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
420 it uses the specified coding system for the file. For example,
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
421 @samp{-*-mode: C; coding: iso-8859-1;-*-} specifies use of the
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
422 iso-8859-1 coding system, as well as C mode.
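
  As a further sketch, the first line of an Emacs Lisp file could carry
the same kind of declaration inside a comment.  The mode and coding
system named here are illustrative assumptions, not a recommendation.

@smallexample
;; -*- mode: emacs-lisp; coding: iso-8859-1 -*-
@end smallexample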
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
423
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
424 @vindex buffer-file-coding-system
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
425 Once XEmacs has chosen a coding system for a buffer, it stores that
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
426 coding system in @code{buffer-file-coding-system} and uses that coding
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
427 system, by default, for operations that write from this buffer into a
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
428 file. This includes the commands @code{save-buffer} and
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
429 @code{write-region}. If you want to write files from this buffer using
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
430 a different coding system, you can specify a different coding system for
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
431 the buffer using @code{set-buffer-file-coding-system} (@pxref{Specify
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
432 Coding}).
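
  A minimal sketch of doing this from Lisp follows.  The coding system
@code{iso-8859-1-unix} is an assumed example; it is one of the
end-of-line variants described earlier.

@smallexample
;; Write this buffer as ISO 8859/1 with newline line separators the
;; next time it is saved.  The coding system name is only an example.
(set-buffer-file-coding-system 'iso-8859-1-unix)
@end smallexample

@noindent
Subsequent calls to @code{save-buffer} in that buffer should then use
the newly selected coding system.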
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
433
3ecd8885ac67 Import from CVS: tag r21-2-22
cvs
parents:
diff changeset
434
1183
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
435 @node Unification, Specify Coding, Recognize Coding, Mule
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
436 @section Character Set Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
437
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
438 Mule suffers from a design defect that causes it to consider the ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
439 Latin character sets to be disjoint. This results in oddities such as
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
440 files containing both ISO 8859/1 and ISO 8859/15 codes, and using ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
441 2022 control sequences to switch between them, as well as more
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
442 plausible but often unnecessary combinations like ISO 8859/1 with ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
443 8859/2. This can be very annoying when sending messages or even in
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
444 simple editing on a single host. XEmacs works around the problem by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
445 converting as many characters as possible to use a single Latin coded
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
446 character set before saving the buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
447
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
448 Unification is planned for extension to other character set families,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
449 in particular the Han family of character sets based on the Chinese
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
450 ideographic characters. At least for the Han sets, however, the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
451 unification feature will be disabled by default.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
452
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
453 This functionality is based on the @file{latin-unity} package by
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
454 Stephen Turnbull @email{stephen@@xemacs.org}, but is somewhat
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
455 divergent. This documentation is also based on the package
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
456 documentation, and is likely to be inaccurate because of the different
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
457 constraints we place on ``core'' and packaged functionality.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
458
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
459 @menu
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
460 * Unification Overview:: History and general information.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
461 * Unification Usage:: An overview of operation.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
462 * Unification Configuration:: Configuring unification.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
463 * Unification FAQs:: Questions and answers from the mailing list.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
464 * Unification Theory:: How unification works.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
465 * What Unification Cannot Do for You:: Inherent problems of 8-bit charsets.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
466 @end menu
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
467
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
468 @node Unification Overview, Unification Usage, Unification, Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
469 @subsection An Overview of Character Set Unification
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
470
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
471 Mule suffers from a design defect that causes it to consider the ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
472 Latin character sets to be disjoint. This manifests itself when a user
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
473 enters characters using input methods associated with different coded
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
474 character sets into a single buffer.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
475
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
476 A very important example involves email. Many sites, especially in the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
477 U.S., default to use of the ISO 8859/1 coded character set (also called
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
478 ``Latin 1,'' though these are somewhat different concepts). However,
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
479 ISO 8859/1 provides a generic CURRENCY SIGN character. Now that the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
480 Euro has become the official currency of most countries in Europe, this
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
481 is unsatisfactory (and in practice, useless). So Europeans generally
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
482 use ISO 8859/15, which is nearly identical to ISO 8859/1 for most
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
483 languages, except that it substitutes EURO SIGN for CURRENCY SIGN.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
484
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
485 Suppose a European user yanks text from a post encoded in ISO 8859/1
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
486 into a message composition buffer, and enters some text including the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
487 Euro sign. Then Mule will consider the buffer to contain both ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
488 8859/1 and ISO 8859/15 text, and MUAs such as Gnus will (if naively
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
489 programmed) send the message as a multipart mixed MIME body!
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
490
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
491 This is clearly stupid. What is not as obvious is that, just as any
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
492 European can include American English in their text because ASCII is a
493 subset of ISO 8859/15, most European languages which use Latin
494 characters (e.g., German and Polish) can typically be mixed while using
495 only one Latin coded character set (in this case, ISO 8859/2). However,
496 this often depends on exactly what text is to be encoded.
497
498 Unification works around the problem by converting as many characters as
499 possible to use a single Latin coded character set before saving the
500 buffer.
501
502
503 @node Unification Usage, Unification Configuration, Unification Overview, Unification
504 @subsection Operation of Unification
505
506 This is a description of the early hack to include unification in
507 XEmacs 21.5. This will almost surely change.
508
509 Normally, unification works in the background by installing
510 @code{unity-sanity-check} on @code{write-region-pre-hook}.
511 Unification is on by default for the ISO-8859 Latin sets. The user
4488
6b0000935adc Spelling fixes.
"Ville Skyttä <scop@xemacs.org>"
parents: 1183
diff changeset
512 activates this functionality for other character set families by
513 invoking @code{enable-unification}, either interactively or in her
514 init file. @xref{Init File, , , xemacs}. Unification can be
515 deactivated by invoking @code{disable-unification}.
516
517 Unification also provides a few functions for remapping or recoding the
518 buffer by hand. To @dfn{remap} a character means to change the buffer
519 representation of the character by using another coded character set.
520 Remapping never changes the identity of the character, but may involve
521 altering the code point of the character. To @dfn{recode} a character
522 means to simply change the coded character set. Recoding never alters
523 the code point of the character, but may change the identity of the
524 character. @xref{Unification Theory}.
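
As a sketch of the distinction, consider LATIN SMALL LETTER S WITH
CARON, which occupies octet #xB9 in ISO 8859/2 and octet #xA8 in ISO
8859/15, while octet #xB9 in ISO 8859/1 is SUPERSCRIPT ONE.  The forms
below only construct the characters involved with @code{make-char}.
They assume a Mule build in which the charsets @code{latin-iso8859-1},
@code{latin-iso8859-2}, and @code{latin-iso8859-15} are all defined, and
they do not call any unification function.

@example
;; The original character: s with caron, from ISO 8859/2.
(make-char 'latin-iso8859-2 #xb9)

;; Remapped to ISO 8859/15: same identity, different code point.
(make-char 'latin-iso8859-15 #xa8)

;; Recoded to ISO 8859/1: same code point, different identity
;; (the result is SUPERSCRIPT ONE).
(make-char 'latin-iso8859-1 #xb9)
@end example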
525
526 There are a few variables which determine which coding systems are
527 always acceptable to unification: @code{unity-ucs-list},
528 @code{unity-preferred-coding-system-list}, and
529 @code{unity-preapproved-coding-system-list}. The last defaults to
530 @code{(buffer-default preferred)}, and you should probably avoid changing it
531 because it short-circuits the sanity check. If you find you need to
532 use it, consider reporting it as a bug or request for enhancement.
533
534 @menu
535 * Basic Functionality:: User interface and customization.
536 * Interactive Usage:: Treating text by hand.
537 Also documents the hook function(s).
538 @end menu
539
540
541 @node Basic Functionality, Interactive Usage, , Unification Usage
542 @subsubsection Basic Functionality
543
544 These functions and user options initialize and configure unification.
545 In normal use, they are not needed.
546
547 @strong{These interfaces will change. Also, the @samp{unity-} prefix
548 is likely to be changed for many of the variables and functions, as
549 they are of more general usefulness.}
550
551 @defun enable-unification
552 Set up hooks and initialize variables for unification.
553
554 There are no arguments.
555
556 This function is idempotent. It will reinitialize any hooks or variables
557 that are not in their initial state.
558 @end defun
559
560 @defun disable-unification
561 There are no arguments.
562
563 Clean up hooks and void variables used by unification.
564 @end defun
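
For illustration, a minimal init file sketch follows.  It assumes that
@code{enable-unification} is already defined (or autoloaded) in your
XEmacs build; the @code{fboundp} guard is an addition for safety, not
something the function itself requires.

@example
;; Turn unification on at startup.  Calling this more than once is
;; harmless, since the function is idempotent.
(when (fboundp 'enable-unification)
  (enable-unification))

;; To switch it off again later in the session:
;; (disable-unification)
@end example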
565
566 @c #### several changes should go to latin-unity.texi
567 @defopt unity-ucs-list
568 List of universal coding systems recommended for character set unification.
569
570 The default value is @code{'(utf-8 iso-2022-7 ctext escape-quoted)}.
571
572 Order matters; coding systems earlier in the list will be preferred when
573 recommending a coding system. These coding systems will not be used
574 without querying the user (unless they are also present in
575 @code{unity-preapproved-coding-system-list}), and follow the
576 @code{unity-preferred-coding-system-list} in the list of suggested
577 coding systems.
578
579 If none of the preferred coding systems are feasible, the first in
580 this list will be the default.
581
582 Notes on certain coding systems: @code{escape-quoted} is a special
583 coding system used for autosaves and compiled Lisp in Mule. You should
584 never delete this, although it is rare that a user would want to use it
585 directly. Unification does not try to be ``smart'' about other general
586 ISO 2022 coding systems, such as ISO-2022-JP. (They are not recognized
587 as equivalent to @code{iso-2022-7}.) If your preferred coding system is
588 one of these, you may consider adding it to @code{unity-ucs-list}.
589 @end defopt
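
For example, a user whose preferred coding system is ISO-2022-JP might
extend the list as sketched below.  The sketch assumes that the coding
system is named @code{iso-2022-jp} in your XEmacs and that
@code{unity-ucs-list} has already been initialized when the form is
evaluated; appending keeps the existing order, so the default
recommendation does not change.

@example
(when (boundp 'unity-ucs-list)
  (setq unity-ucs-list
        (append unity-ucs-list '(iso-2022-jp))))
@end example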
590
591 Coding systems which are not Latin and not in
592 @code{unity-ucs-list} are handled by short-circuiting checks of
593 the coding system against the next two variables.
594
595 @defopt unity-preapproved-coding-system-list
596 List of coding systems used without querying the user if feasible.
597
598 The default value is @samp{(buffer-default preferred)}.
599
600 The first feasible coding system in this list is used. The special values
601 @samp{preferred} and @samp{buffer-default} may be present:
602
603 @table @code
604 @item buffer-default
605 Use the coding system used by @samp{write-region}, if feasible.
606
607 @item preferred
608 Use the coding system specified by @samp{prefer-coding-system} if feasible.
609 @end table
610
611 ``Feasible'' means that all characters in the buffer can be represented by
612 the coding system. Coding systems in @samp{unity-ucs-list} are
613 always considered feasible. Other feasible coding systems are computed
614 by @samp{unity-representations-feasible-region}.
615
616 Note that, by definition, the first universal coding system in this
617 list shadows all other coding systems. In particular, if your
618 preferred coding system is a universal coding system, and
619 @code{preferred} is a member of this list, unification will blithely
620 convert all your files to that coding system. This is considered a
621 feature, but it may surprise most users. Users who don't like this
622 behavior may put @code{preferred} in
623 @code{unity-preferred-coding-system-list}, but not in
624 @code{unity-preapproved-coding-system-list}.
625 @end defopt
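
The arrangement just described might look like this in an init file.
This is a sketch; it assumes both variables have already been
initialized, with their documented defaults, when the forms are
evaluated.

@example
;; Never convert silently to the preferred coding system ...
(setq unity-preapproved-coding-system-list '(buffer-default))

;; ... but still suggest it first when the user has to be asked.
(setq unity-preferred-coding-system-list
      (cons 'preferred unity-preferred-coding-system-list))
@end example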
626
627
628 @defopt unity-preferred-coding-system-list
629 List of coding systems suggested to the user if feasible.
630
631 The default value is @samp{(iso-8859-1 iso-8859-15 iso-8859-2 iso-8859-3
632 iso-8859-4 iso-8859-9)}.
633
634 If none of the coding systems in
635 @samp{unity-preapproved-coding-system-list} are feasible, this list
636 will be recommended to the user, followed by the
637 @samp{unity-ucs-list} (so those coding systems should not be in
638 this list). The first coding system in this list is the default. The
639 special values @samp{preferred} and @samp{buffer-default} may be
640 present:
641
642 @table @code
643 @item buffer-default
644 Use the coding system used by @samp{write-region}, if feasible.
645
646 @item preferred
647 Use the coding system specified by @samp{prefer-coding-system} if feasible.
648 @end table
649
650 ``Feasible'' means that all characters in the buffer can be represented by
651 the coding system. Coding systems in @samp{unity-ucs-list} are
652 always considered feasible. Other feasible coding systems are computed
653 by @samp{unity-representations-feasible-region}.
654 @end defopt
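
For instance, a user who often needs the Euro sign might want ISO
8859/15 suggested ahead of ISO 8859/1.  The sketch below merely
reorders the documented default value; whether that ordering suits you
depends on the text you usually write.

@example
(setq unity-preferred-coding-system-list
      '(iso-8859-15 iso-8859-1 iso-8859-2 iso-8859-3
        iso-8859-4 iso-8859-9))
@end example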
655
656
657 @defvar unity-iso-8859-1-aliases
658 List of coding systems to be treated as aliases of ISO 8859/1.
659
660 The default value is @code{'(iso-8859-1)}.
661
662 This is not a user variable; to customize input of coding systems or
663 charsets, use @samp{unity-coding-system-alias-alist} or
664 @samp{unity-charset-alias-alist}.
665 @end defvar
666
667
668 @node Interactive Usage, , Basic Functionality, Unification Usage
669 @subsubsection Interactive Usage
670
671 First, the hook function @code{unity-sanity-check} is documented.
672 (It is placed here because it is not an interactive function, and there
673 is not yet a programmer's section of the manual.)
674
675 These functions provide access to internal functionality (such as the
676 remapping function) and to extra functionality (the recoding functions
677 and the test function).
678
679 @defun unity-sanity-check begin end filename append visit lockname &optional coding-system
680
681 Check if @var{coding-system} can represent all characters between
682 @var{begin} and @var{end}.
683
684 For compatibility with old broken versions of @code{write-region},
685 @var{coding-system} defaults to @code{buffer-file-coding-system}.
686 @var{filename}, @var{append}, @var{visit}, and @var{lockname} are
687 ignored.
688
689 Return @code{nil} if @code{buffer-file-coding-system} is not (ISO-2022-compatible)
690 Latin. If @code{buffer-file-coding-system} is safe for the charsets
691 actually present in the buffer, return it. Otherwise, ask the user to
692 choose a coding system, and return that.
693
694 This function does @emph{not} do the safe thing when
695 @code{buffer-file-coding-system} is @code{nil} (aka @code{no-conversion}). It
696 considers that ``non-Latin,'' and passes it on to the Mule detection
697 mechanism.
698
699 This function is intended for use as a @code{write-region-pre-hook}. It
700 does nothing except return @var{coding-system} if @code{write-region}
701 handlers are inhibited.
702 @end defun
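
Although it is normally run from the hook, the function can also be
called by hand.  The following is only a sketch: it uses the argument
list given above, passes @code{nil} for the four ignored arguments, and
assumes unification has been enabled so that the function is defined.

@example
;; Ask whether ISO 8859/1 can represent the whole current buffer.
(unity-sanity-check (point-min) (point-max)
                    nil nil nil nil
                    'iso-8859-1)
@end example

@noindent
Depending on the buffer contents, this may prompt for a coding system,
as described above.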
703
704 @defun unity-buffer-representations-feasible
705 There are no arguments.
706
707 Apply @code{unity-region-representations-feasible} to the current buffer.
708 @end defun
709
710 @defun unity-region-representations-feasible begin end &optional buf
711 Return character sets that can represent the text from @var{begin} to
712 @var{end} in @var{buf}.
713
714 @c #### Fix in latin-unity.texi.
715 @var{buf} defaults to the current buffer. When called interactively, the
716 function is applied to the region. It assumes @var{begin} <= @var{end}.
717
718 The return value is a cons. The car is the list of character sets
719 that can individually represent all of the non-ASCII portion of the
720 buffer, and the cdr is the list of character sets that can
721 individually represent all of the ASCII portion.
722
723 The following is taken from a comment in the source. Please refer to
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
724 the source to be sure of an accurate description.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
725
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
726 The basic algorithm is to map over the region, compute the set of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
727 charsets that can represent each character (the ``feasible charset''),
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
728 and take the intersection of those sets.
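
As a rough illustration of the intersection step only, the following
sketch represents each character's feasible charsets as a list of
symbols. The function name @code{my-intersect-feasible} and the list
representation are assumptions made for this example; they are not
taken from the unification source.

@example
;; Hypothetical helper for illustration; not part of the package.
(defun my-intersect-feasible (feasible-lists)
  "Return the charsets that occur in every list in FEASIBLE-LISTS."
  (let ((result (car feasible-lists)))
    (dolist (next (cdr feasible-lists))
      (let (kept)
        ;; Keep only the charsets that the next character also allows.
        (dolist (cs result)
          (when (memq cs next)
            (setq kept (cons cs kept))))
        (setq result (nreverse kept))))
    result))

(my-intersect-feasible
 '((latin-iso8859-1 latin-iso8859-2 latin-iso8859-15)
   (latin-iso8859-1 latin-iso8859-15)
   (latin-iso8859-15 latin-iso8859-1)))
     @result{} (latin-iso8859-1 latin-iso8859-15)
@end example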

The current implementation takes advantage of the fact that ASCII
characters are common and cannot change asciisets. Using
@code{skip-chars-forward} therefore makes motion over ASCII subregions
very fast.

This same strategy could be applied generally by precomputing classes
of characters equivalent according to their effect on latinsets, and
adding a whole class to the @code{skip-chars-forward} string once a
member is found.

Efficiency is probably a function of the number of characters matched,
or maybe of the length of the match string. With
@code{skip-category-forward} over a precomputed category table it should
be really fast. In practice, for Latin character sets there are only 29
classes.
@end defun

@defun unity-remap-region begin end character-set &optional coding-system

Remap characters between @var{begin} and @var{end} to equivalents in
@var{character-set}. Optional argument @var{coding-system} may be a
coding system name (a symbol) or @code{nil}. Characters with no
equivalent are left as-is.

When called interactively, @var{begin} and @var{end} are set to the
beginning and end, respectively, of the active region, and the function
prompts for @var{character-set}. The function does completion, knows
how to guess a character set name from a coding system name, and also
provides some common aliases. See @code{unity-guess-charset}.
There is no way to specify @var{coding-system}, as it has no useful
function interactively.

Return @var{coding-system} if @var{coding-system} can encode all
characters in the region, @code{t} if @var{coding-system} is @code{nil}
and the coding system with G0 = 'ascii and G1 = @var{character-set} can
encode all characters, and otherwise @code{nil}. Note that a non-null
return does @emph{not} mean it is safe to write the file, only the
specified region. (This behavior is useful for multipart MIME encoding
and the like.)

Note: by default this function is quite strict about universal coding
systems. It only admits @samp{utf-8}, @samp{iso-2022-7}, and
@samp{ctext}. Customize @code{unity-approved-ucs-list} to change
this.

This function remaps characters that are artificially distinguished by Mule
internal code. It may change the code point as well as the character set.
To recode characters that were decoded in the wrong coding system, use
@code{unity-recode-region}.
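
A call from Lisp might look like the following sketch, which relies only
on the argument list and return values described above. The choice of
@code{latin-iso8859-15} and of the whole accessible buffer as the region
is arbitrary.

@example
;; Remap to ISO 8859-15 equivalents where they exist, then report
;; whether the region is representable afterwards.
(if (unity-remap-region (point-min) (point-max) 'latin-iso8859-15)
    (message "All characters in the region can be encoded")
  (message "Some characters in the region cannot be encoded"))
@end example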
@end defun

@defun unity-recode-region begin end wrong-cs right-cs

Recode characters between @var{begin} and @var{end} from @var{wrong-cs}
to @var{right-cs}.

@var{wrong-cs} and @var{right-cs} are character sets. Characters retain
the same code point but the character set is changed. Only characters
from @var{wrong-cs} are changed to @var{right-cs}. The identity of the
character may change. Note that this could be dangerous, if characters
whose identities you do not want changed are included in the region.
This function cannot guess which characters you want changed, and which
should be left alone.

When called interactively, @var{begin} and @var{end} are set to the
beginning and end, respectively, of the active region, and the function
prompts for @var{wrong-cs} and @var{right-cs}. The function does
completion, knows how to guess a character set name from a coding system
name, and also provides some common aliases. See
@code{unity-guess-charset}.

Another way to accomplish this, but using coding systems rather than
character sets to specify the desired recoding, is
@code{unity-recode-coding-region}. That function may be faster
but is somewhat more dangerous, because it may recode more than one
character set.

To change from one Mule representation to another without changing identity
of any characters, use @code{unity-remap-region}.
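
For instance, if text that is really ISO 8859-2 was decoded as
ISO 8859-1, a call along these lines would reinterpret it. This is a
sketch based only on the argument list above; it assumes both charsets
exist in your XEmacs.

@example
;; Reinterpret Latin-1 characters in the region as Latin-2.
;; Characters from other charsets are left alone.
(unity-recode-region (region-beginning) (region-end)
                     'latin-iso8859-1 'latin-iso8859-2)
@end example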
@end defun

@defun unity-recode-coding-region begin end wrong-cs right-cs

Recode text between @var{begin} and @var{end} from @var{wrong-cs} to
@var{right-cs}.

@var{wrong-cs} and @var{right-cs} are coding systems. Characters retain
the same code point but the character set is changed. The identity of
characters may change. This is an inherently dangerous function;
multilingual text may be recoded in unexpected ways. #### It's also
dangerous because the coding systems are not sanity-checked in the
current implementation.

When called interactively, @var{begin} and @var{end} are set to the
beginning and end, respectively, of the active region, and the function
prompts for @var{wrong-cs} and @var{right-cs}. The function does
completion, knows how to guess a coding system name from a character set
name, and also provides some common aliases. See
@code{unity-guess-coding-system}.

Another, safer, way to accomplish this, using character sets rather
than coding systems to specify the desired recoding, is to use
@code{unity-recode-region}.

To change from one Mule representation to another without changing identity
of any characters, use @code{unity-remap-region}.
@end defun

The following are helper functions for input of coding system and
character set names.

@defun unity-guess-charset candidate
Guess a charset based on the symbol @var{candidate}.

@var{candidate} itself is not tried as the value.

This function uses the natural mapping in
@code{unity-cset-codesys-alist}, and the values in
@code{unity-charset-alias-alist}.
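
A call might look like the following sketch. The alias @code{latin-9}
is one of the entries in the default value of
@code{unity-charset-alias-alist}; the return value is not shown because
it depends on your configuration.

@example
;; Ask for the charset associated with the alias latin-9.
(unity-guess-charset 'latin-9)
@end example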
@end defun

@defun unity-guess-coding-system candidate
Guess a coding system based on the symbol @var{candidate}.

@var{candidate} itself is not tried as the value.

This function uses the natural mapping in
@code{unity-cset-codesys-alist}, and the values in
@code{unity-coding-system-alias-alist}.
@end defun

@defun unity-example

A cheesy example for unification.

At present it just makes a multilingual buffer. To test, set
@code{buffer-file-coding-system} to some value with @code{setq}, make
the buffer dirty (e.g., with @kbd{RET BackSpace}), and save.
@end defun


@node Unification Configuration, Unification FAQs, Unification Usage, Unification
@subsection Configuring Unification for Use

If you want unification to be automatically initialized, invoke
@samp{enable-unification} with no arguments in your init file.
@xref{Init File, , , xemacs}. If you are using GNU Emacs or an XEmacs
earlier than 21.1, you should also load @file{auto-autoloads} using the
full path (@emph{never} @samp{require} @file{auto-autoloads} libraries).
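
A minimal init file fragment might look like this. The @code{fboundp}
guard is an addition for this sketch, so that the init file still loads
on installations where unification is not available.

@example
;; Enable unification at startup if the function is defined
;; or autoloaded.
(when (fboundp 'enable-unification)
  (enable-unification))
@end example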

You may wish to define aliases for commonly used character sets and
coding systems for convenience in input.

@defopt unity-charset-alias-alist
Alist mapping aliases to Mule charset names (symbols).

The default value is
@example
((latin-1 . latin-iso8859-1)
 (latin-2 . latin-iso8859-2)
 (latin-3 . latin-iso8859-3)
 (latin-4 . latin-iso8859-4)
 (latin-5 . latin-iso8859-9)
 (latin-9 . latin-iso8859-15)
 (latin-10 . latin-iso8859-16))
@end example

If a charset does not exist on your system, it will not complete and you
will not be able to enter it in response to prompts. A real charset
with the same name as an alias in this list will shadow the alias.
@end defopt

@defopt unity-coding-system-alias-alist
Alist mapping aliases to Mule coding system names (symbols).

The default value is @code{nil}.
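
For example, assuming this alist has the same @code{(ALIAS . NAME)}
shape as @code{unity-charset-alias-alist}, an init file could define
short aliases like these. The alias names @code{l1} and @code{l2} are
invented for illustration, and the coding system names are assumed to
exist in your XEmacs.

@example
;; Hypothetical aliases; adjust to the coding systems you use.
(setq unity-coding-system-alias-alist
      '((l1 . iso-8859-1)
        (l2 . iso-8859-2)))
@end example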
@end defopt


@node Unification FAQs, Unification Theory, Unification Configuration, Unification
@subsection Frequently Asked Questions About Unification

@enumerate
@item
I'm smarter than XEmacs's unification feature! How can that be?

Don't be surprised. Trust yourself.

Unification is very young as yet. Teach it what you know by
Customizing its variables, and report your changes to the maintainer
(@kbd{M-x report-xemacs-bug RET}).

@item
What is a UCS?

According to ISO 10646, a Universal Coded character Set. In
XEmacs, it's a Universal (Mule) Coding System.
@xref{Coding Systems, , , xemacs}.

@item
I know @code{utf-16-le-bom} is a UCS, but unification won't use it.
Why not?

There are an awful lot of UCSes in Mule, and you probably do not want to
ever use, and definitely not be asked about, most of them. So the
default set includes a few that the author thought plausible, but
they're surely not comprehensive or optimal.

Customize @code{unity-ucs-list} to include the ones you use often, and
report your favorites to the maintainer for consideration for
inclusion in the defaults using @kbd{M-x report-xemacs-bug RET}.
(Note that you @emph{must} include @code{escape-quoted} in this list,
because Mule uses it internally as the coding system for auto-save
files.)

Alternatively, if you just want to use it this one time, simply type
it in at the prompt. Unification will confirm that it is a real coding
system, and then assume that you know what you're doing.

@item
This is crazy: I can't quit XEmacs, and I get queried on autosaves! Why?

You probably removed @code{escape-quoted} from
@code{unity-ucs-list}. Put it back.

@item
Unification is really buggy and I can't get any work done.

First, use @kbd{M-x disable-unification RET}, then report your
problems as a bug (@kbd{M-x report-xemacs-bug RET}).
@end enumerate
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
955
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
956
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
@node Unification Theory, What Unification Cannot Do for You, Unification FAQs, Unification
@subsection Unification Theory

Standard encodings suffer from the design defect that they do not
provide a reliable way to recognize which coded character sets are in use.
@xref{What Unification Cannot Do for You}. There are scores of
character sets which can be represented by a single octet (8-bit
byte), whose union contains many hundreds of characters. Obviously
this results in great confusion, since you can't tell the players
without a scorecard, and there is no scorecard.

There are two ways to solve this problem. The first is to create a
universal coded character set. This is the concept behind Unicode.
However, although satisfactory (nearly) universal character sets have
existed for several decades, even today many Westerners resist using
Unicode because they consider its space requirements excessive. On
the other hand, many Asians dislike Unicode because they consider it
to be incomplete. (This is partly, but not entirely, political.)

In any case, Unicode only solves the internal representation problem.
Many data sets will contain files in ``legacy'' encodings, and Unicode
does not help distinguish among them.

The second approach is to embed information about the encodings used in
a document in its text. This approach is taken by the ISO 2022
standard. This would solve the problem completely from the user's point
of view, except that ISO 2022 is basically not implemented at all, in the
sense that few applications or systems implement more than a small
subset of ISO 2022 functionality. This is because
mono-literate users object to the presence of escape sequences in their
texts (which they, with some justification, consider data corruption).
Programmers are more than willing to cater to these users, since
implementing ISO 2022 is a painstaking task.

In fact, Emacs/Mule adopts both of these approaches. Internally it uses
a universal character set, @dfn{Mule code}. Externally it uses ISO 2022
techniques both to save files in forms robust to encoding issues, and as
hints when attempting to ``guess'' an unknown encoding. However, Mule
suffers from a design defect, namely that it embeds in each character the
character set information that ISO 2022 attaches to runs of characters
(by introducing each run with a control sequence). That causes Mule to
consider the ISO Latin character sets to be disjoint. This manifests
itself when a user enters characters using input methods associated with
different coded character sets into a single buffer.

There are two problems stemming from this design. First, Mule
represents the same character in different ways. Abstractly, '@'o'
(LATIN SMALL LETTER O WITH ACUTE) can get represented as
[latin-iso8859-1 #x73] or as [latin-iso8859-2 #x73]. So what looks like
'@'o@'o' in the display might actually be represented [latin-iso8859-1
#x73][latin-iso8859-2 #x73] in the buffer, and saved as [#xF3 ESC - B
#xF3 ESC - A] in the file. In some cases this treatment would be
appropriate (consider HYPHEN, MINUS SIGN, EN DASH, EM DASH, and U+4E00
(the CJK ideographic character meaning ``one'')), and although arguably
incorrect it is convenient when mixing the CJK scripts. But in the case
of the Latin scripts this is wrong.

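The coincidence of code points described above can be checked outside of
Mule. The following is only a sketch in Python, whose codecs stand in for
Mule's charsets; the helper name @code{code_points} is hypothetical and is
not part of XEmacs or of unification. It assumes, following the notation
used above, that the position #x73 is the file byte #xF3 with its high
bit cleared.

```python
# Illustration only: Python's codecs stand in for Mule's charsets here.
O_ACUTE = "\u00f3"  # LATIN SMALL LETTER O WITH ACUTE

def code_points(char, charsets):
    """Return (charset, byte) pairs for one character in each charset."""
    return [(name, char.encode(name)[0]) for name in charsets]

pairs = code_points(O_ACUTE, ["iso8859-1", "iso8859-2"])
for name, byte in pairs:
    # The 7-bit position is the file byte with the high bit cleared.
    print("%s: file byte #x%02X, 7-bit position #x%02X"
          % (name, byte, byte & 0x7F))
```

Both charsets report the same byte, so the two buffer representations
differ only in the charset tag that Mule attaches to the character.
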
Worse yet, it is very likely to occur when mixing ``different'' encodings
(such as ISO 8859/1 and ISO 8859/15) that differ only in a few code
points that are almost never used. A very important example involves
email. Many sites, especially in the U.S., default to use of the ISO
8859/1 coded character set (also called ``Latin 1,'' though these are
somewhat different concepts). However, ISO 8859/1 provides only a generic
CURRENCY SIGN character. Now that the Euro has become the official
currency of most countries in Europe, this is unsatisfactory (and in
practice, useless). So Europeans generally use ISO 8859/15, which is
nearly identical to ISO 8859/1 for most languages, except that it
substitutes EURO SIGN for CURRENCY SIGN.

Suppose a European user yanks text from a post encoded in ISO 8859/1
into a message composition buffer, and enters some text including the
Euro sign. Then Mule will consider the buffer to contain both ISO
8859/1 and ISO 8859/15 text, and MUAs such as Gnus will (if naively
programmed) send the message as a multipart mixed MIME body!

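To make the Euro example concrete, here is a small Python sketch (not
XEmacs code; the helper @code{can_encode} and the sample strings are
invented for illustration). It shows that the byte #xA4 is CURRENCY SIGN
in ISO 8859/1 but EURO SIGN in ISO 8859/15, and that the combined message
fits in ISO 8859/15 alone.

```python
CURRENCY_SIGN = "\u00a4"
EURO_SIGN = "\u20ac"

def can_encode(text, charset):
    """True if every character of text exists in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# The same byte names two different characters.
in_latin1 = bytes([0xA4]).decode("iso8859-1")
in_latin9 = bytes([0xA4]).decode("iso8859-15")

yanked = "caf\u00e9"          # text taken from an ISO 8859/1 post
typed = " costs 3 \u20ac"     # text the user adds, with a Euro sign
message = yanked + typed

print(can_encode(message, "iso8859-1"))    # False: no Euro sign
print(can_encode(message, "iso8859-15"))   # True: one charset suffices
```

A single-part message in ISO 8859/15 is therefore possible; the multipart
result comes only from treating the two charsets as disjoint.
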
This is clearly stupid. What is not as obvious is that, just as any
European can include American English in their text because ASCII is a
subset of ISO 8859/15, most European languages which use Latin
characters (e.g., German and Polish) can typically be mixed while using
only one Latin coded character set (in the case of German and Polish,
ISO 8859/2). However, this often depends on exactly what text is to be
encoded (even for the same pair of languages).

Unification works around the problem by converting as many characters as
possible to use a single Latin coded character set before saving the
buffer.

Because the problem is rarely noticeable in editing a buffer, but tends
to manifest when that buffer is exported to a file or process,
unification uses the strategy of examining the buffer prior to export.
If use of multiple Latin coded character sets is detected, unification
attempts to unify them by finding a single coded character set which
contains all of the Latin characters in the buffer.

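The search for a single coded character set can be sketched as follows.
This is a Python illustration of the idea only, not the algorithm that
unification actually implements; the function name
@code{unifying_charsets} and the short candidate list are assumptions
made for the example.

```python
# Hypothetical candidate list; a real implementation would know more sets.
CANDIDATES = ("iso8859-1", "iso8859-2", "iso8859-15")

def unifying_charsets(text, candidates=CANDIDATES):
    """Return the candidates that contain every character of text."""
    found = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this set
        found.append(name)
    return found

o_acute_twice = unifying_charsets("\u00f3\u00f3")
german_polish = unifying_charsets("Gr\u00fc\u00dfe, \u0142\u00f3d\u017a")
with_euro = unifying_charsets("caf\u00e9 3 \u20ac")
no_single_set = unifying_charsets("\u0142 \u20ac")

print(o_acute_twice)   # every candidate contains the letter
print(german_polish)   # only ISO 8859/2 has the Polish letters
print(with_euro)       # only ISO 8859/15 has the Euro sign
print(no_single_set)   # empty: a universal encoding is needed
```

When the result is empty, no 8-bit Latin charset can hold the text, which
is the case where only a universal encoding will do.
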
The primary purpose of unification is to fix the problem by giving the
user the choice to change the representation of all characters to one
character set, and by giving sensible recommendations based on context. In
the '@'o' example, either ISO 8859/1 or ISO 8859/2 is satisfactory, and
both will be suggested. In the EURO SIGN example, only ISO 8859/15
makes sense, and that is what will be recommended. In both cases, the
user will be reminded that there are universal encodings available.

I call this @dfn{remapping} (from the universal character set to a
particular ISO 8859 coded character set). It is mere accident that this
letter has the same code point in both character sets. (Not entirely,
but there are many examples of Latin characters that have different code
points in different Latin-X sets.)

Note that, in the '@'o' example, treating the buffer in this way will
result in a representation such as [latin-iso8859-2
#x73][latin-iso8859-2 #x73], and the file will be saved as [#xF3 #xF3].
This is guaranteed to occasionally result in the second problem,
to which we now turn.

This problem is that, although the file is intended to be an
ISO-8859/2-encoded file, in an ISO 8859/1 locale Mule (and every
POSIX-compliant program---this is required by the standard, obvious if you
think a bit, @pxref{What Unification Cannot Do for You}) will read that
file as [latin-iso8859-1 #x73] [latin-iso8859-1 #x73]. Of course this
is no problem if all of the characters in the file are contained in ISO
8859/1, but suppose there are some which are not, but are contained in
the (intended) ISO 8859/2.

You now want to fix this, but not by finding the same character in
another set. Instead, you want to simply change the character set
that Mule associates with that buffer position without changing the
code. (This is conceptually somewhat distinct from the first problem,
and logically ought to be handled in the code that defines coding
systems. However, unification is not an unreasonable place for it.)
Unification provides two functions (one fast and dangerous, the other
@c #### fix latin-unity.texi
slower and more careful) to handle this. I call this @dfn{recoding}, because
the transformation actually involves @emph{encoding} the buffer to
file representation, then @emph{decoding} it to buffer representation
(in a different character set). This cannot be done automatically
because Mule can have no idea what the correct encoding is---after
all, it already gave you its best guess. @xref{What Unification
Cannot Do for You}. So these functions must be invoked by the user.
@xref{Interactive Usage}.

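The difference between remapping and recoding can be seen in a few lines
of Python. This is a sketch of the encode-then-decode idea only; the
function @code{recode} is a hypothetical stand-in and does not correspond
to either of the two unification functions mentioned above. Remapping
keeps the character and may change the code; recoding keeps the code and
changes the character.

```python
def recode(text, assumed, intended):
    """Encode with the charset that was wrongly assumed, then decode
    the same bytes with the charset that was actually intended."""
    return text.encode(assumed).decode(intended)

polish = "\u0142\u00f3d\u017a"                # intended ISO 8859/2 text
file_bytes = polish.encode("iso8859-2")       # what is on disk
misread = file_bytes.decode("iso8859-1")      # read in a Latin-1 locale
fixed = recode(misread, "iso8859-1", "iso8859-2")

print(misread == polish)   # False: two letters were misinterpreted
print(fixed == polish)     # True: same bytes, different charset
```

Note that the program has to be told that ISO 8859/2 was intended; nothing
in the bytes themselves says so, which is why a user must make the call.
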
@node What Unification Cannot Do for You, , Unification Theory, Unification
@subsection What Unification Cannot Do for You

Unification @strong{cannot} save you if you insist on exporting data in
8-bit encodings in a multilingual environment. @emph{You will
eventually corrupt data if you do this.} It is not Mule's, or any
application's, fault. You will have only yourself to blame; consider
yourself warned. (It is true that Mule has bugs, which make Mule
somewhat more dangerous and inconvenient than some naive applications.
We're working to address those, but no application can remedy the
inherent defect of 8-bit encodings.)

Use standard universal encodings, preferably Unicode (UTF-8) unless
applicable standards indicate otherwise. The most important such case
is Internet messages, where MIME should be used, whether or not the
subordinate encoding is a universal encoding. (Note that since one of
the important provisions of MIME is the @samp{Content-Type} header,
which has the charset parameter, MIME is to be considered a universal
encoding for the purposes of this manual. Of course, technically
speaking it's neither a coded character set nor a coding extension
technique compliant with ISO 2022.)

As mentioned earlier, the problem is that standard encodings suffer from
the design defect that they do not provide a reliable way to recognize
which coded character sets are in use. There are scores of character
sets which can be represented by a single octet (8-bit byte), whose
union contains many hundreds of characters. Thus any 8-bit coded
character set must contain characters that share code points used for
different characters in other coded character sets.

This means that a given file's intended encoding cannot be identified
with 100% reliability unless it contains encoding markers such as those
provided by MIME or ISO 2022.

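The shared code points can be enumerated directly. The Python sketch
below is an illustration under the assumption that Python's
@code{iso8859-1} and @code{iso8859-2} codecs are faithful to the
standards; the function name @code{differing_bytes} is invented for the
example. Every byte value decodes without error under both charsets, so a
decoder never gets a signal that it picked the wrong one.

```python
def differing_bytes(first, second):
    """Byte values that name different characters in the two charsets."""
    return [n for n in range(256)
            if bytes([n]).decode(first) != bytes([n]).decode(second)]

differing = differing_bytes("iso8859-1", "iso8859-2")

# Each of these bytes is valid in both sets but means something else.
print(len(differing))
print(0xF3 in differing)   # False: o with acute agrees in both sets
print(0xB3 in differing)   # True: superscript three vs. l with stroke
```
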
1132 Unification actually makes it more likely that you will have problems of
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1133 this kind. Traditionally Mule has been ``helpful'' by simply using an
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1134 ISO 2022 universal coding system when the current buffer coding system
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1135 cannot handle all the characters in the buffer. This has the effect
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1136 that, because the file contains control sequences, it is not recognized
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1137 as being in the locale's normal 8-bit encoding. It may be annoying if
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1138 @c #### fix in latin-unity.texi
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1139 you are not a Mule expert, but your data is guaranteed to be recoverable
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1140 with a tool you already have: Mule.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1141
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1142 However, with unification, Mule converts to a single 8-bit character set
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1143 when possible. But typically this will @emph{not} be in your usual
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1144 locale. Ie, the times that an ISO 8859/1 user will need unification is
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1145 when there are ISO 8859/2 characters in the buffer. But then most
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1146 likely the file will be saved in a pure 8-bit encoding that is not ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1147 8859/1, ie, ISO 8859/2. Mule's autorecognizer (which is probably the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1148 most sophisticated yet available) cannot tell the difference between ISO
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1149 8859/1 and ISO 8859/2, and in a Western European locale will choose the
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1150 former even though the latter was intended. Even the extension
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1151 @c #### fix in latin-unity.texi
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1152 (``statistical recognition'') planned for XEmacs 22 is unlikely to be
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1153 acceptably accurate in the case of mixed codes.
c1553814932e [xemacs-hg @ 2003-01-03 12:12:30 by stephent]
stephent
parents: 873
diff changeset
1154
So now consider adding some additional ISO 8859/1 text to the buffer.
If it includes any ISO 8859/1 codes that are used by different
characters in ISO 8859/2, you now have a file that cannot be
mechanically disentangled. You need a human being who can recognize
that @emph{this is German and Swedish} and should stay in Latin-1, while
@emph{that is Polish} and needs to be recoded to Latin-2.

Moral: switch to a universal coded character set, preferably Unicode
using the UTF-8 transformation format. If you really need the space,
compress your files.


@node Specify Coding, Charsets and Coding Systems, Unification, Mule
@section Specifying a Coding System

In cases where XEmacs does not automatically choose the right coding
system, you can use these commands to specify one:

@table @kbd
@item C-x @key{RET} f @var{coding} @key{RET}
Use coding system @var{coding} for the visited file
in the current buffer.

@item C-x @key{RET} c @var{coding} @key{RET}
Specify coding system @var{coding} for the immediately following
command.

@item C-x @key{RET} k @var{coding} @key{RET}
Use coding system @var{coding} for keyboard input. (This feature is
non-functional and is temporarily disabled.)

@item C-x @key{RET} t @var{coding} @key{RET}
Use coding system @var{coding} for terminal output.

@item C-x @key{RET} p @var{coding} @key{RET}
Use coding system @var{coding} for subprocess input and output
in the current buffer.
@end table

@kindex C-x RET f
@findex set-buffer-file-coding-system
The command @kbd{C-x @key{RET} f} (@code{set-buffer-file-coding-system})
specifies the file coding system for the current buffer---in other
words, which coding system to use when saving or rereading the visited
file. You specify which coding system using the minibuffer. Since this
command applies to a file you have already visited, it affects only the
way the file is saved.

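As a minimal sketch, the same command can also be called from Lisp. The
choice of @samp{iso-8859-2} is only an illustration, and it assumes
that this coding system is available in your XEmacs:

@example
;; Illustrative value only: save the current buffer's file
;; as ISO 8859-2 from now on.
(set-buffer-file-coding-system 'iso-8859-2)
@end example
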
@kindex C-x RET c
@findex universal-coding-system-argument
Another way to specify the coding system for a file is when you visit
the file. First use the command @kbd{C-x @key{RET} c}
(@code{universal-coding-system-argument}); this command uses the
minibuffer to read a coding system name. After you exit the minibuffer,
the specified coding system is used for @emph{the immediately following
command}.

So if the immediately following command is @kbd{C-x C-f}, for example,
it reads the file using that coding system (and records the coding
system for when the file is saved). Or if the immediately following
command is @kbd{C-x C-w}, it writes the file using that coding system.
Other file commands affected by a specified coding system include
@kbd{C-x C-i} and @kbd{C-x C-v}, as well as the other-window variants of
@kbd{C-x C-f}.

In addition, if you run certain file input commands with a preceding
@kbd{C-u}, you are prompted in the minibuffer for the coding system to
use when reading the file. So if the command is @kbd{C-x C-f}, for
example, it reads the file using that coding system (and records the
coding system for when the file is saved). Other file commands affected
by a specified coding system include @kbd{C-x C-i} and @kbd{C-x C-v}, as
well as the other-window variants of @kbd{C-x C-f}.

@vindex default-buffer-file-coding-system
The variable @code{default-buffer-file-coding-system} specifies the
choice of coding system to use when you create a new file. It applies
when you find a new file, and when you create a buffer and then save it
in a file. Selecting a language environment typically sets this
variable to a good choice of default coding system for that language
environment.

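For instance, an init file might set this default explicitly instead of
relying on the language environment. This is a sketch only: it assumes
the variable behaves as described above, and the value
@samp{iso-8859-2} is an arbitrary choice for illustration.

@example
;; Assumed example value: new files default to ISO 8859-2.
(setq default-buffer-file-coding-system 'iso-8859-2)
@end example
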
@kindex C-x RET t
@findex set-terminal-coding-system
The command @kbd{C-x @key{RET} t} (@code{set-terminal-coding-system})
specifies the coding system for terminal output. If you specify a
coding system for terminal output, all characters output to the
terminal are translated into that coding system.

This feature is useful for certain character-only terminals built to
support specific languages or character sets---for example, European
terminals that support one of the ISO Latin character sets.

By default, output to the terminal is not translated at all.

@kindex C-x RET k
@findex set-keyboard-coding-system
The command @kbd{C-x @key{RET} k} (@code{set-keyboard-coding-system})
specifies the coding system for keyboard input. Character-code
translation of keyboard input is useful for terminals with keys that
send non-ASCII graphic characters---for example, some terminals designed
for ISO Latin-1 or subsets of it.

(This feature is non-functional and is temporarily disabled.)

By default, keyboard input is not translated at all.

There is a similarity between using a coding system translation for
keyboard input, and using an input method: both define sequences of
keyboard input that translate into single characters. However, input
methods are designed to be convenient for interactive use by humans, and
the sequences that are translated are typically sequences of ASCII
printing characters. Coding systems typically translate sequences of
non-graphic characters.

@kindex C-x RET p
@findex set-buffer-process-coding-system
The command @kbd{C-x @key{RET} p} (@code{set-buffer-process-coding-system})
specifies the coding system for input and output to a subprocess. This
command applies to the current buffer; normally, each subprocess has its
own buffer, and thus you can use this command to specify translation to
and from a particular subprocess by giving the command in the
corresponding buffer.

By default, process input and output are not translated at all.

@vindex file-name-coding-system
The variable @code{file-name-coding-system} specifies a coding system
to use for encoding file names. If you set the variable to a coding
system name (as a Lisp symbol or a string), XEmacs encodes file names
using that coding system for all file operations. This makes it
possible to use non-Latin-1 characters in file names---or, at least,
those non-Latin-1 characters which the specified coding system can
encode. By default, this variable is @code{nil}, which implies that you
cannot use non-Latin-1 characters in file names.


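A sketch of setting @code{file-name-coding-system} in an init file,
assuming that file names on your system are encoded in EUC-JP (any
other available coding system name could be substituted):

@example
;; Hypothetical choice: encode file names as EUC-JP.
(setq file-name-coding-system 'euc-jp)
@end example
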
@node Charsets and Coding Systems, , Specify Coding, Mule
@section Charsets and Coding Systems

This section provides reference lists of Mule charsets and coding
systems. Mule charsets are typically named by character set and
standard.

@table @strong
@item ASCII variants

Identification of equivalent characters in these sets is not properly
implemented. Unification does not distinguish the two charsets.

@samp{ascii} @samp{latin-jisx0201}

@item Extended Latin

Characters from the following ISO 2022 conformant charsets are
identified with equivalents in other charsets in the group by
unification.

@samp{latin-iso8859-1} @samp{latin-iso8859-15} @samp{latin-iso8859-2}
@samp{latin-iso8859-3} @samp{latin-iso8859-4} @samp{latin-iso8859-9}
@samp{latin-iso8859-13} @samp{latin-iso8859-16}

The following charsets are Latin variants which are not understood by
unification. In addition, many of the Asian language standards provide
ASCII, at least, and sometimes other Latin characters. None of these
are identified with their ISO 8859 equivalents.

@samp{vietnamese-viscii-lower}
@samp{vietnamese-viscii-upper}

@item Other character sets

@samp{arabic-1-column}
@samp{arabic-2-column}
@samp{arabic-digit}
@samp{arabic-iso8859-6}
@samp{chinese-big5-1}
@samp{chinese-big5-2}
@samp{chinese-cns11643-1}
@samp{chinese-cns11643-2}
@samp{chinese-cns11643-3}
@samp{chinese-cns11643-4}
@samp{chinese-cns11643-5}
@samp{chinese-cns11643-6}
@samp{chinese-cns11643-7}
@samp{chinese-gb2312}
@samp{chinese-isoir165}
@samp{cyrillic-iso8859-5}
@samp{ethiopic}
@samp{greek-iso8859-7}
@samp{hebrew-iso8859-8}
@samp{ipa}
@samp{japanese-jisx0208}
@samp{japanese-jisx0208-1978}
@samp{japanese-jisx0212}
@samp{katakana-jisx0201}
@samp{korean-ksc5601}
@samp{sisheng}
@samp{thai-tis620}
@samp{thai-xtis}

@item Non-graphic charsets

@samp{control-1}
@end table

@table @strong
@item No conversion

Some of these coding systems may specify EOL conventions. Note that
@samp{iso-8859-1} is a no-conversion coding system, not an ISO 2022
coding system. Although unification attempts to compensate for this, it
is possible that the @samp{iso-8859-1} coding system will behave
differently from other ISO 8859 coding systems.

@samp{binary} @samp{no-conversion} @samp{raw-text} @samp{iso-8859-1}

@item Latin coding systems

These coding systems are all single-byte, 8-bit ISO 2022 coding systems,
combining ASCII in the GL register (bytes with high-bit clear) and an
extended Latin character set in the GR register (bytes with high-bit set).

@samp{iso-8859-15} @samp{iso-8859-2} @samp{iso-8859-3} @samp{iso-8859-4}
@samp{iso-8859-9} @samp{iso-8859-13} @samp{iso-8859-14} @samp{iso-8859-16}

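The split between the GL and GR registers in the ISO 8859 coding
systems above can be sketched in Emacs Lisp. The function name
@code{example-iso-8859-half} is hypothetical and not part of XEmacs; it
only tests the high bit of an octet, and it ignores the
control-character ranges.

@example
(defun example-iso-8859-half (octet)
  "Return `GL' if OCTET has its high bit clear, `GR' otherwise."
  (if (zerop (logand octet 128))
      'GL     ; ASCII half
    'GR))     ; extended Latin half

(example-iso-8859-half 65)      ; high bit clear, so GL
(example-iso-8859-half 233)     ; high bit set, so GR
@end example
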
These coding systems are single-byte, 8-bit coding systems that do not
conform to international standards. They should be avoided in all
potentially multilingual contexts, including any text distributed over
the Internet and World Wide Web.

@samp{windows-1251}

@item Multilingual coding systems

The following ISO-2022-based coding systems are useful for multilingual
text.

@samp{ctext} @samp{iso-2022-lock} @samp{iso-2022-7} @samp{iso-2022-7bit}
@samp{iso-2022-7bit-ss2} @samp{iso-2022-8} @samp{iso-2022-8bit-ss2}

XEmacs also supports Unicode with the Mule-UCS package. The Unicode
coding systems listed below are the preferred coding systems for
multilingual use. (There is a possible exception for texts that mix
several Asian ideographic character sets.)

@samp{utf-16-be} @samp{utf-16-be-no-signature} @samp{utf-16-le}
@samp{utf-16-le-no-signature} @samp{utf-7} @samp{utf-7-safe}
@samp{utf-8} @samp{utf-8-ws}

Development versions of XEmacs (the 21.5 series) support Unicode
internally, with (at least) the following coding systems implemented:

@samp{utf-16-be} @samp{utf-16-be-bom} @samp{utf-16-le}
@samp{utf-16-le-bom} @samp{utf-8} @samp{utf-8-bom}

1409 @item Asian ideographic languages
1410
1411 The following coding systems are based on ISO 2022, and are more or less
1412 suitable for encoding multilingual texts. They all can represent ASCII
1413 at least, and sometimes several other foreign character sets, without
1414 resort to arbitrary ISO 2022 designations. However, these subsets are
1415 not identified with the corresponding national standards in XEmacs Mule.
1416
1417 @samp{chinese-euc} @samp{cn-big5} @samp{cn-gb-2312} @samp{gb2312}
1418 @samp{hz} @samp{hz-gb-2312} @samp{old-jis} @samp{japanese-euc}
1419 @samp{junet} @samp{euc-japan} @samp{euc-jp} @samp{iso-2022-jp}
1420 @samp{iso-2022-jp-1978-irv} @samp{iso-2022-jp-2} @samp{euc-kr}
1421 @samp{korean-euc} @samp{iso-2022-kr} @samp{iso-2022-int-1}
1422
1423 The following coding systems cannot be used for general multilingual
1424 text and do not cooperate well with other coding systems.
1425
1426 @samp{big5} @samp{shift_jis}
1427
1428 @item Other languages
1429
1430 The following coding systems are based on ISO 2022. Though none of them
1431 provides any Latin characters beyond ASCII, XEmacs Mule allows (and up
1432 to 21.4 defaults to) use of ISO 2022 control sequences to designate
1433 other character sets for inclusion in the text.
1434
1435 @samp{iso-8859-5} @samp{iso-8859-7} @samp{iso-8859-8}
1436 @samp{ctext-hebrew}
1437
1438 The following are coding systems that do not conform to ISO 2022 and
1439 thus cannot be safely used in a multilingual context.
1440
1441 @samp{alternativnyj} @samp{koi8-r} @samp{tis-620} @samp{viqr}
1442 @samp{viscii} @samp{vscii}
1443
1444 @item Special coding systems
1445
1446 Mule uses the following coding systems for special purposes.
1447
1448 @samp{automatic-conversion} @samp{undecided} @samp{escape-quoted}
1449
1450 @samp{escape-quoted} is especially important, as it is used internally
1451 as the coding system for autosaved data.
1452
1453 The following coding systems are aliases for others, and are used for
1454 communication with the host operating system.
1455
1456 @samp{file-name} @samp{keyboard} @samp{terminal}
1457
1458 @end table
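
As their names suggest, the Unicode coding systems listed in the table above differ mainly in byte order and in whether a signature (byte order mark) precedes the text, as in @samp{utf-16-be} versus @samp{utf-16-be-bom}. The following Python sketch shows that difference at the byte level. It uses Python's own codec names, which are only stand-ins here; how they correspond to the XEmacs coding system names is an assumption made for illustration.

```python
import codecs

# Sample text: "A" followed by e-acute (U+00E9).
text = "A\u00e9"

# Big-endian UTF-16 without a signature: two bytes per character here.
be_plain = text.encode("utf-16-be")

# The same bytes preceded by the big-endian byte order mark FE FF.
be_signed = codecs.BOM_UTF16_BE + be_plain

# Little-endian UTF-16 swaps the two bytes of each 16-bit unit.
le_plain = text.encode("utf-16-le")

# UTF-8 without and with the three-byte signature EF BB BF.
utf8_plain = text.encode("utf-8")
utf8_signed = text.encode("utf-8-sig")

samples = [
    ("UTF-16 big-endian, no signature", be_plain),
    ("UTF-16 big-endian, signature", be_signed),
    ("UTF-16 little-endian, no signature", le_plain),
    ("UTF-8, no signature", utf8_plain),
    ("UTF-8, signature", utf8_signed),
]

for label, data in samples:
    # Print each encoding as a hexadecimal string for comparison.
    print(label + ": " + data.hex())
```

A reader that expects a signature and one that does not will disagree about the first bytes of such a file, which is why the variants are kept as separate coding systems.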
1459
1460 Mule detection of coding systems is actually limited to detection of
1461 classes of coding systems called @dfn{coding categories}. These coding
1462 categories are identified by the ISO 2022 control sequences they use, if
1463 any, by their conformance to ISO 2022 restrictions on code points that
1464 may be used, and by characteristic patterns of use of 8-bit code points.
1465
1466 @samp{no-conversion}
1467 @samp{utf-8}
1468 @samp{ucs-4}
1469 @samp{iso-7}
1470 @samp{iso-lock-shift}
1471 @samp{iso-8-1}
1472 @samp{iso-8-2}
1473 @samp{iso-8-designate}
1474 @samp{shift-jis}
1475 @samp{big5}
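
The cues described above (ISO 2022 control sequences and patterns of 8-bit code points) can be illustrated with a toy classifier. The Python sketch below is not Mule's detection algorithm. The function name @code{guess_category} is hypothetical, and the mapping from byte patterns to category names is a deliberate simplification that covers only a few of the categories listed.

```python
# Control characters that ISO 2022 uses for switching character sets.
ESC = 0x1B  # introduces designation sequences
SO = 0x0E   # locking shift out
SI = 0x0F   # locking shift in


def guess_category(data):
    """Return a rough category label for a byte string (illustration only)."""
    has_escape = ESC in data
    has_shift = SO in data or SI in data
    has_8bit = any(byte >= 0x80 for byte in data)

    if not has_8bit:
        # Only 7-bit bytes: look for ISO 2022 control characters.
        if has_shift:
            return "iso-lock-shift"
        if has_escape:
            return "iso-7"
        return "ascii"

    if has_escape:
        # 8-bit bytes combined with escape sequences.
        return "iso-8-designate"

    try:
        # Well-formed UTF-8 has a characteristic 8-bit byte pattern.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Telling iso-8-1, iso-8-2, shift-jis and big5 apart needs
        # finer checks than this sketch attempts.
        return "undetermined"


# ESC $ B and ESC ( B are the designations used by ISO-2022-JP text.
print(guess_category(b"\x1b$B0l\x1b(B"))
print(guess_category("h\u00e9llo".encode("utf-8")))
print(guess_category(b"\xe9t\xe9"))
print(guess_category(b"hello"))
```

Even this crude version shows why detection yields a category rather than a single coding system: several coding systems share the same byte-level characteristics.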
1476
1477