comparison man/mule/mule.texi @ 70:131b0175ea99 r20-0b30

Import from CVS: tag r20-0b30
author cvs
date Mon, 13 Aug 2007 09:02:59 +0200

@node Coding-system
@section Coding-system

@noindent
A `coding-system' is a method for encoding text made of several
character sets.  It is represented by a symbol that has the properties
'coding-system and 'eol-type.

You can specify a different coding-system for each of: file I/O,
process I/O, output to the terminal (if not running on X), and input
from the keyboard (if not running on X).

@menu
* Structure::                   Structure of coding-system
    o Property 'coding-system
    o Property 'eol-type
    o Property 'post-read-conversion
    o Property 'pre-write-conversion
* Creation::                    How to create a coding-system
* Predefined coding-system::
* Automatic conversion::
    o Category of coding-system
    o How automatic conversion works
    o Priority of category
* Mode-line::                   How a coding-system is shown in the mode-line
* ISO2022 restriction::
* Big5::                        Special treatment of Big5
@end menu

@node Structure
@subsection Structure of coding-system

@subsubsection Property 'coding-system

The value of the property 'coding-system is either a vector:
@quotation
[ TYPE MNEMONIC DOCUMENT DUMMY FLAGS ]
@end quotation
or another coding-system.  The elements of the vector are:
@example
TYPE: nil: no conversion, t: automatic conversion,
      0: Internal, 1: Shift-JIS, 2: ISO2022, 3: Big5, 4: CCL.
MNEMONIC: a character shown in the mode-line to indicate the coding-system.
DOCUMENT: a documentation string describing the coding-system.
DUMMY: always nil (for backward compatibility).
FLAGS (optional): more precise information about the coding-system.
  If TYPE is 2 (ISO2022), FLAGS should be a list of:
    LB-G0, LB-G1, LB-G2, LB-G3:
      leading character of the charset initially designated to the
      graphic set Gn (n = 0..3);
      nil means Gn is not designated initially,
      lb-invalid means nothing can ever be designated to Gn,
      if (- leading-char) is specified, it is designated on output,
    SHORT: non-nil - allow short forms such as "ESC $ B",
           nil - always use "ESC $ ( B",
    ASCII-EOL: non-nil - designate ASCII to G0 at end of line on output,
    ASCII-CNTL: non-nil - designate ASCII to G0 at control codes on output,
    SEVEN: non-nil - use a 7-bit environment on output,
    LOCK-SHIFT: non-nil - use locking-shift (SO/SI) instead of single-shift
                or designation by escape sequence,
    USE-ROMAN: non-nil - designate JISX0201-1976-Roman instead of ASCII,
    USE-OLDJIS: non-nil - designate JISX0208-1976 instead of JISX0208-1983,
    NO-ISO6429: non-nil - don't use ISO6429's direction specification.
  If TYPE is 3 (Big5), FLAGS `t' means Big5-ETen, `nil' means Big5-HKU.
  If TYPE is 4 (CCL), FLAGS should be a cons of CCL programs
    for encoding and decoding.  See the documentation of CCL for details.
@end example
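
The structure above can be illustrated with a small model in plain
Emacs Lisp.  It is only a sketch, not Mule's implementation: the
symbols @code{my-junet} and @code{my-junet-alias} and the @code{my-}
accessor functions are hypothetical, and the code merely stores and
reads symbol properties with @code{put} and @code{get}.  It shows how an
alias, whose 'coding-system property is another coding-system, can be
followed until the vector is reached.

@example
;; Model only: plain symbols stand in for coding-systems.  The names
;; my-junet, my-junet-alias and the my- functions are hypothetical.
(put 'my-junet 'coding-system
     (vector 2 ?J "Illustrative 7-bit ISO2022 coding-system" nil nil))
;; An alias: the property value is another coding-system symbol.
(put 'my-junet-alias 'coding-system 'my-junet)

(defun my-coding-system-vector (cs)
  "Follow the 'coding-system property of CS until a vector is found.
Assumes the chain of aliases contains no cycle."
  (let ((value (get cs 'coding-system)))
    (while (and value (symbolp value))
      (setq value (get value 'coding-system)))
    value))

(defun my-coding-system-type (cs)
  "Return the TYPE element (index 0) of the vector of CS."
  (aref (my-coding-system-vector cs) 0))

(defun my-coding-system-mnemonic (cs)
  "Return the MNEMONIC element (index 1) of the vector of CS."
  (aref (my-coding-system-vector cs) 1))
@end example

Here @code{(my-coding-system-type 'my-junet-alias)} evaluates to 2 and
@code{(my-coding-system-mnemonic 'my-junet-alias)} to the character
@code{?J}.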

@subsubsection Property 'eol-type

The value of the property 'eol-type is one of:
@table @asis
@item nil
no conversion of end-of-line type
@item 1
LF
@item 2
CRLF
@item 3
CR
@item a vector of length 3
automatic detection of end-of-line type.  The elements are:
@enumerate
@item
the coding-system of eol-type LF
@item
the coding-system of eol-type CRLF
@item
the coding-system of eol-type CR
@end enumerate
@end table
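
A minimal sketch of how a 3-element 'eol-type vector could be used
during automatic detection.  The helper names are hypothetical, and the
rule that the first line end in the text decides the type is an
assumption made for illustration; Mule's actual detection may differ.

@example
;; Hypothetical helpers; the "first line end wins" rule is an assumption.
(defun my-detect-eol-type (text)
  "Return 1 (LF), 2 (CRLF), 3 (CR), or nil if TEXT has no line end.
The first line end found in TEXT decides the type."
  (when (string-match "\r\n\\|\n\\|\r" text)
    (let ((match (match-string 0 text)))
      (cond ((equal match "\r\n") 2)
            ((equal match "\n") 1)
            (t 3)))))

(defun my-select-eol-variant (eol-vector text)
  "Return the element of EOL-VECTOR matching the line ends in TEXT.
EOL-VECTOR is [LF-system CRLF-system CR-system].  Return nil when
TEXT contains no line end, i.e. the type is not yet decided."
  (let ((type (my-detect-eol-type text)))
    (and type (aref eol-vector (1- type)))))
@end example

For example, @code{(my-select-eol-variant [foounix foodos foomac] "x\r\n")}
returns the symbol @code{foodos}.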

@subsubsection Property 'post-read-conversion

The value of the property 'post-read-conversion is a function that
converts text just read into a buffer.  When the function is called,
the text has already been converted according to the 'coding-system
and 'eol-type properties of the coding-system.  The function receives
two arguments, START and END, which delimit the inserted text.
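
As an illustration of the calling convention, here is a hypothetical
post-read-conversion function that simply upcases the inserted region.
The function name and the symbol @code{my-coding-system} are made up for
this sketch.

@example
;; Hypothetical post-read-conversion function.  START and END delimit
;; the text just inserted (already converted); as an illustration only,
;; it upcases that region.
(defun my-post-read-upcase (start end)
  (upcase-region start end))

;; Attach it to an illustrative coding-system symbol.
(put 'my-coding-system 'post-read-conversion 'my-post-read-upcase)
@end example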

@subsubsection Property 'pre-write-conversion

The value of the property 'pre-write-conversion is a function that
converts text just before it is written out.  After the function is
called, the text is converted according to the 'coding-system and
'eol-type properties of the coding-system.  The function receives two
arguments, START and END, which delimit the text.
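
Similarly, a hypothetical pre-write-conversion function receives the
region about to be written.  This sketch replaces tabs with spaces
using @code{untabify}; the names are again illustrative only.

@example
;; Hypothetical pre-write-conversion function.  START and END delimit
;; the text about to be written; tabs are turned into spaces before the
;; coding-system's own conversion is applied.
(defun my-pre-write-untabify (start end)
  (untabify start end))

(put 'my-coding-system 'pre-write-conversion 'my-pre-write-untabify)
@end example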

@node Creation
@subsection How to create a coding-system

Mule provides the function `make-coding-system' to create a
coding-system.

FUNCTION make-coding-system: NAME TYPE MNEMONIC DOC &optional EOL-TYPE FLAGS

Register the symbol NAME as a coding-system whose 'coding-system
property is the vector [ TYPE MNEMONIC DOC nil FLAGS ] and whose
'eol-type property is EOL-TYPE.  If EOL-TYPE is `t', the value of the
'eol-type property becomes a vector of generated coding-systems whose
'eol-type properties are 1 (LF), 2 (CRLF), and 3 (CR).  The generated
coding-systems are named NAMEunix, NAMEdos, and NAMEmac respectively.

To simply make an alias of an existing coding-system, call the
function `copy-coding-system'.

FUNCTION copy-coding-system: ORIGINAL ALIAS

Make a coding-system identical to ORIGINAL and name it ALIAS.  If the
'eol-type property of ORIGINAL is a vector, the coding-systems
ALIASunix, ALIASdos, and ALIASmac are also generated, and the
'eol-type property of ALIAS becomes a vector of them.
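
The naming rule for generated coding-systems (NAMEunix, NAMEdos,
NAMEmac, in the same LF, CRLF, CR order as the 'eol-type vector) can be
sketched as follows.  @code{my-eol-variant-names} is a hypothetical
helper, not a Mule function; it only computes the names.

@example
;; Hypothetical helper: compute the names of the three eol variants
;; of NAME, in 'eol-type vector order (LF, CRLF, CR).
(defun my-eol-variant-names (name)
  "Return the vector [NAMEunix NAMEdos NAMEmac] of interned symbols."
  (apply #'vector
         (mapcar (lambda (suffix)
                   (intern (concat (symbol-name name) suffix)))
                 '("unix" "dos" "mac"))))
@end example

@code{(my-eol-variant-names '*junet*)} returns
@code{[*junet*unix *junet*dos *junet*mac]}.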

@node Predefined coding-system
@subsection Predefined coding-system

See @file{lisp/mule.el}.

@node Automatic conversion
@subsection Automatic conversion

@subsubsection Category of coding-system

Mule has a facility to detect the coding-system of text automatically.
However, what Mule actually detects is not a coding-system itself but
a category of coding-system.  A category is also represented by a
symbol, whose value should be an actual coding-system.

There are eight categories:
@table @asis
@item *coding-category-internal*
The coding-system used in a buffer.
@item *coding-category-sjis*
Shift-JIS.
@item *coding-category-iso-7*
ISO2022 variation with the following features:
@itemize @bullet
@item
no locking shift and no single shift
@item
only G0 is used
@end itemize
@item *coding-category-iso-8-1*
ISO2022 variation with the following features:
@itemize @bullet
@item
no locking shift
@item
designation sequences are allowed only for G0 and G1
@item
G1 is used only for 1-byte character sets
@end itemize
@item *coding-category-iso-8-2*
ISO2022 variation with the following features:
@itemize @bullet
@item
no locking shift
@item
designation sequences are allowed only for G0 and G1
@item
G1 is used only for 2-byte character sets
@end itemize
@item *coding-category-iso-else*
ISO2022 variation that satisfies none of the above.
@item *coding-category-big5*
Big5 (ETen or HKU).
@item *coding-category-bin*
Any other coding-system that uses the MSB.
@end table

The values of these symbols are predefined as follows:

@example
----- lisp/mule.el -----------------------------------------
(defvar *coding-category-internal* '*internal*)
(defvar *coding-category-sjis* '*sjis*)
(defvar *coding-category-iso-7* '*junet*)
(defvar *coding-category-iso-8-1* '*ctext*)
(defvar *coding-category-iso-8-2* '*euc-japan*)
(defvar *coding-category-iso-else* '*iso-2022-ss2-7*)
(defvar *coding-category-big5* '*big5-eten*)
(defvar *coding-category-bin* '*noconv*)
------------------------------------------------------------
@end example

However, some of them are overridden in language-specific files such
as japanese.el, chinese.el, etc.

@subsubsection How automatic conversion works

When the coding-system `*autoconv*' is specified for reading text
(this is the default), Mule tries to detect the category of
coding-system in which the text is encoded.  If an appropriate
category is found, Mule converts the text according to the
coding-system bound to that category.  If the 'eol-type property of
that coding-system is a vector of coding-systems and Mule detects the
end-of-line type (LF, CRLF, or CR) of the text, the matching one of
those coding-systems is used.

Automatic conversion occurs both when reading from files and when
receiving input from a process.  In the latter case, if a
coding-system is found, the output-coding-system of the process is
also set to that coding-system.

@subsubsection Priority of category

When more than one category is found, the category with the highest
priority is selected.

The priority of the categories is predefined as follows:

@example
----- lisp/mule.el -----------------------------------------
(set-coding-priority
 '(*coding-category-iso-8-2*
   *coding-category-sjis*
   *coding-category-iso-8-1*
   *coding-category-big5*
   *coding-category-iso-7*
   *coding-category-iso-else*
   *coding-category-bin*
   *coding-category-internal*))
------------------------------------------------------------
@end example

The function `set-coding-priority' puts a property 'priority on each
element of its argument, numbered from 0 to 7 (a smaller number means
a higher priority).  Some language-specific files may override this
priority.
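
The selection rule can be modelled in a few lines.  This is a sketch
under the assumptions stated above: priorities are stored in the
'priority property, smaller numbers win, and the value of a category
symbol is its coding-system.  @code{my-set-coding-priority} and
@code{my-select-category} are hypothetical stand-ins, not Mule's own
code.

@example
;; Model only; the my- functions are hypothetical, not Mule's code.
(defun my-set-coding-priority (categories)
  "Put 'priority 0, 1, 2, ... on the symbols in CATEGORIES, in order."
  (let ((n 0))
    (dolist (category categories)
      (put category 'priority n)
      (setq n (1+ n)))))

(defun my-select-category (candidates)
  "Return the category in CANDIDATES with the smallest 'priority.
Every candidate is assumed to have a numeric 'priority property."
  (let ((best nil))
    (dolist (category candidates best)
      (when (or (null best)
                (< (get category 'priority) (get best 'priority)))
        (setq best category)))))
@end example

After calling @code{my-set-coding-priority} with the default list above,
@code{(my-select-category '(*coding-category-sjis* *coding-category-iso-8-2*))}
returns @code{*coding-category-iso-8-2*}, and the value of that symbol
(@code{*euc-japan*} by default) is the coding-system to use.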

@node Mode-line
@subsection How a coding-system is shown in the mode-line

Each coding-system has a unique one-character mnemonic.  By default,
the mnemonic of a buffer's `file-coding-system' is shown at the left of
the buffer's mode-line.  It is followed by a second mnemonic that shows
the eol-type of the coding-system, defined as follows:
@table @asis
@item "."
LF
@item ":"
CRLF
@item "'"
CR
@item "_"
not yet decided
@item "-"
nil (for a coding-system of nil, *noconv*, or *internal*)
@end table
So the usual appearance of the mode-line for a buffer visiting a file
(*junet* encoding on a Unix system) is:

@example
    +-- mnemonic of file-coding-system
    |+-- mnemonic of eol-type
    VV
[--]J.:----Mule: filename
@end example

The leftmost bracket is the indicator for the input method.
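
The two-character indicator can be sketched as a pure function.  The
helper names are hypothetical; they take the mnemonic and 'eol-type
directly instead of reading them from a real coding-system, and the
@code{"?"} fallback for unexpected values is an assumption.

@example
;; Hypothetical helpers modelling the mode-line mnemonics.
(defun my-eol-mnemonic (eol-type)
  "Map an 'eol-type value to its one-character mode-line mnemonic."
  (cond ((null eol-type) "-")
        ((vectorp eol-type) "_")   ; not yet decided
        ((eql eol-type 1) ".")     ; LF
        ((eql eol-type 2) ":")     ; CRLF
        ((eql eol-type 3) "'")     ; CR
        (t "?")))                  ; fallback (an assumption)

(defun my-coding-indicator (mnemonic eol-type)
  "Return e.g. \"J.\" for MNEMONIC ?J and EOL-TYPE 1.
A nil MNEMONIC (coding-system nil) is shown as \"--\"."
  (if (null mnemonic)
      "--"
    (concat (char-to-string mnemonic) (my-eol-mnemonic eol-type))))
@end example

Concatenating @code{(my-coding-indicator ?+ [a b c])},
@code{(my-coding-indicator ?+ 1)} and @code{(my-coding-indicator nil nil)}
gives @code{"+_+.--"}, the same six characters as in the process example
that follows.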

When a buffer is attached to a process, the coding-systems for input
and output of the process are also shown, as follows:

@example
    +-- mnemonic of file-coding-system
    |+-- mnemonic of eol-type of file-coding-system
    ||+-- mnemonic of input-coding-system of a process
    |||+-- mnemonic of eol-type of input-coding-system
    ||||+-- mnemonic of output-coding-system of a process
    |||||+-- mnemonic of eol-type of output-coding-system
    VVVVVV
[--]+_+.--:--**-Mule: *shell*
@end example

This means that Mule is now communicating with the shell using the
coding-systems *autoconv*unix ("+.") for input and nil ("--") for
output.

@node ISO2022 restriction
@subsection ISO2022 restriction

For decoding to Type 2 (ISO2022), we have the following
restrictions:

@table @asis
@item Locking-Shift:
Use SI and SO only when decoding with a coding-system whose LOCK-SHIFT
and SEVEN are both t.

@item Single-Shift:
Use SS2 and SS3 (if SEVEN is nil) or ESC N and ESC O (if SEVEN is t).

@item Invocation:
G0 is always invoked to GL, and G1 to GR (but only if SEVEN is nil).
G2 and G3 are invoked to GL by the single-shifts SS2 and SS3.

@item Unofficial use of ESC sequence for designation:
If SEVEN is t, LOCK-SHIFT is nil, and designation to G2 and G3 is
prohibited, we must designate all character sets to G0 (and hence
invoke them to GL).  To designate a 96-character set to G0, we use
"ESC , <F>".  For instance, to designate ISO8859-1 to G0, we use
"ESC , A".

@item Unofficial use of ESC sequence for composite characters:
To indicate the start and end of a composite character, we use ESC 0
(start) and ESC 1 (end).

@item Text direction specifier of ISO6429
We use ISO6429's ESC sequence "ESC [ 2 ]" to change the text direction
to right-to-left, and "ESC [ 0 ]" to revert it to left-to-right.
@end table
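
For reference, designation sequences follow the ISO 2022 pattern of
ESC, a @code{$} for multi-byte sets, an intermediate character selecting
the graphic set, and the final character.  The sketch below builds
them; @code{my-iso2022-designation} is a hypothetical helper, and the
@code{,} intermediate for a 96-character set to G0 is the unofficial
Mule extension described above.

@example
;; Hypothetical helper.  Intermediates per ISO 2022: ( ) * + for
;; 94-charsets to G0..G3, and , - . / for 96-charsets (the "," for G0
;; is Mule's unofficial extension).  Multi-byte sets get a "$" prefix.
(defun my-iso2022-designation (g chars bytes final &optional short)
  "Return the sequence designating a charset with final char FINAL.
G is 0..3, CHARS is 94 or 96, BYTES is 1 or 2.  With SHORT non-nil,
a 2-byte 94-charset designated to G0 with final #x40, A or B uses
the short form ESC $ F."
  (let ((intermediate (+ (if (= chars 94) ?\( ?\,) g)))
    (cond ((= bytes 1)
           (string ?\e intermediate final))
          ((and short (= g 0) (= chars 94)
                (memq final (list #x40 ?A ?B)))
           (string ?\e ?$ final))
          (t
           (string ?\e ?$ intermediate final)))))
@end example

@code{(my-iso2022-designation 0 96 1 ?A)} returns the string ESC , A
used for ISO8859-1, and @code{(my-iso2022-designation 0 94 2 ?B t)}
returns the short form ESC $ B.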

@node Big5
@subsection Special treatment of Big5

As far as I know, there are several different codes called Big5.  The
most famous ones are Big5-ETen and Big5-HKU-form2.  Both of them use
the code range 0xa140 - 0xfefe (in each row, the columns (second byte)
0x7f - 0xa0 are skipped) and contain more than 13000 characters, so it
is impossible to treat either of them as a single character set in the
current Mule system.  Mule therefore treats them in a rather irregular
manner, as described below:

@enumerate
@item
Mule does not treat them as different character sets, but as the same
character set called Big5.
Caution!! Big5 is a different character set from GB.

@item
Mule divides Big5 into two sub-character-sets, 0xa140 - 0xc67e
(Level 1) and 0xc6a1 - 0xfefe (Level 2), and allocates the two
leading-chars lc-big5-1 and lc-big5-2 to them.  (See character.txt.)

@item
Usually, each leading-char (or character set) has a unique character
category.  But lc-big5-1 and lc-big5-2 share the same character
category, whose mnemonic is 't'.  So the regular expression "\\ct"
matches any Big5 character (Level 1 or Level 2).  (See syntax.txt.)

@item
If you specify an ISO2022 type coding-system on output, Mule converts
Big5 code using the unofficial final characters '0' (for Level 1) and
'1' (for Level 2).

@item
You can use either ETen or HKU fonts for displaying Big5 code.  Mule
judges which font is in use by checking whether the character whose
code point is 0xC6A1 exists.  If it exists, the font is HKU;
otherwise, the font is ETen.
@end enumerate
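
The Level 1 / Level 2 split can be sketched as a classification of
two-byte code points.  @code{my-big5-level} is hypothetical; the
accepted second-byte ranges (0x40 - 0x7e and 0xa1 - 0xfe) are inferred
from the code range and the skipped columns given above.

@example
;; Hypothetical helper.  CODE is a two-byte Big5 code point as an
;; integer, e.g. #xa140.
(defun my-big5-level (code)
  "Return 1 or 2 for a Big5 CODE in Level 1 or Level 2, else nil."
  (let ((byte2 (logand code #xff)))
    (cond
     ;; Second bytes outside 0x40-0x7e and 0xa1-0xfe are not used.
     ((not (or (and (>= byte2 #x40) (<= byte2 #x7e))
               (and (>= byte2 #xa1) (<= byte2 #xfe))))
      nil)
     ((and (>= code #xa140) (<= code #xc67e)) 1)
     ((and (>= code #xc6a1) (<= code #xfefe)) 2)
     (t nil))))
@end example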

@node Syntax
@section Syntax and Category of character

@subsection Syntax

Mule can define the syntax of all multi-byte characters with
@code{modify-syntax-entry}.

The first argument of @code{modify-syntax-entry} should be one of the
following:
@enumerate
@item
an ASCII character
@item
a multi-byte character
@item
the leading character of a multi-byte character
@item
a partially defined character returned by:

@quotation
@code{(make-character leading-char arg)}
@end quotation
@end enumerate

There is a restriction on specifying the matching character within the
second argument.  If the first argument specifies a multi-byte
character or the leading char of a multi-byte character, the matching
character should have the same leading character.  If the character is
a 2-byte code, its first byte should also be the same as the first
byte of the first argument.

@subsection Category

Like syntax, category also defines characteristics of characters.  The
differences are:
@enumerate
@item
Each character can have more than one category.
@item
Users can define new types of category as they wish.
Example: See japanese.el
@item
@code{char-category} returns all mnemonics of the character as a string.
@item
For regular expression search, you can use \cm or \Cm (where any
mnemonic can take the place of 'm') instead of \sm and \Sm.
@end enumerate

@node Font
@section Font

A FONTSET is a set of fonts that have the same height and style.  A
fontset should ideally contain enough fonts to display characters of
various character sets.

Mule uses fontsets instead of fonts.  You can specify a fontset at any
place where you can specify a font.  You can still specify a font, in
which case a fontset that includes the font is searched for and used.

Like a font, a fontset is specified by a string naming it.

@menu
* Initial fontsets::            Fontsets which Mule has at startup time.
* Specify fontset::             How to specify a fontset
* Manage fontset::              How to create or modify a fontset
@end menu

@node Initial fontsets
@subsection Initial fontsets

@subsubsection "default-fontset"

Mule automatically creates a fontset named "default-fontset" at
startup.  Each font in this fontset is specified by a very generic name
such as "-*-fixed-medium-r-*--16-*-iso8859-1" for ASCII and
"-*-fixed-medium-r-*--*-jisx0208.1983-*" for JISX0208 (Kanji).  These
values are defined in @file{lisp/term/x-win.el}.

If no other fontsets are specified by X resources, "default-fontset"
is used for the first frame of Mule.

In most cases, this is enough.  You probably don't need any other
fontsets.

@subsubsection X resources

Mule also creates the fontsets specified in the X resource
"fontSetList (class FontSetList)".  Its value is a comma-separated list
of fontset names.

@example
*FontSetList: 16,24
@end example

The actual contents of each fontset are specified by the resource
"fontSet-xxx (class FontSet-xxx)", where "xxx" is the name of the
corresponding fontset.  The value of this resource is a comma-separated
list of font names.

@example
*FontSet-16: -etl-fixed-medium-r-*--24-*-iso8859-1
@end example

A font name should not contain the wild card `*' or `?' in its
CHARSET_REGISTRY field, because the character set of the font is
recognized from this field.  This means that you don't have to care
about the order of font names.

For instance,

@example
*FontSet-16:\
-etl-fixed-medium-r-*--16-*-iso8859-1,\
-ming-fixed-medium-r-*--*-*-jisx0208.1983-*
@end example

is enough to tell Mule that the fontset "16" contains an ASCII font and
a JISX0208 font.  Note that the second name has a wild card, not a
number, in its PIXEL_SIZE field.  Since Mule tries to open a font with
the same PIXEL_SIZE as the ASCII font of the same fontset, you had
better not specify an actual value in the PIXEL_SIZE field except for
the ASCII font.
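
Because the character set is taken from the CHARSET_REGISTRY field, the
charset part of a font name can be recovered from its last two
hyphen-separated fields.  The sketch below does that and rejects a wild
card in the registry.  @code{my-font-charset} is a hypothetical helper,
and treating the last two fields as CHARSET_REGISTRY and
CHARSET_ENCODING is an assumption that holds for names written like the
examples above.

@example
;; Hypothetical helper: return "REGISTRY-ENCODING" taken from the last
;; two hyphen-separated fields of FONT-NAME, or nil when the registry
;; field contains a wild card.
(defun my-font-charset (font-name)
  "Return e.g. \"iso8859-1\" for FONT-NAME, or nil if its registry
field contains `*' or `?'."
  (let* ((fields (split-string font-name "-"))
         (tail (last fields 2))
         (registry (car tail)))
    (unless (string-match-p "[*?]" registry)
      (mapconcat #'identity tail "-"))))
@end example

With the fontset "16" example above, the two font names yield
@code{"iso8859-1"} and @code{"jisx0208.1983-*"}.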

For fonts not listed in the specification of a fontset, the
corresponding font names in "default-fontset" are used.

The first fontset in FontSetList is used for the first frame of Mule.
If you want to use "default-fontset" while specifying other fontsets in
the resource, put "default-fontset" first in the value.

@example
*FontSetList: default-fontset,16,24
@end example

In this case, you don't need the resource "FontSet-default-fontset".

@node Specify fontset
@subsection How to specify a fontset

You can specify a fontset at any place where you can specify a font.

To change the fontset used for the first frame of Mule:

@enumerate
@item
command line arguments "-fn xxx" or "-font xxx"

If this argument exists, a fontset is searched for in the following
order:
@enumerate
@item
A fontset whose name is "xxx".
@item
A fontset which contains the ASCII font "xxx".
@item
Otherwise, a new fontset "xxx" which contains the ASCII font "xxx" is
created.
@end enumerate

@item
In your ~/.emacs,

@example
;; Works whether or not default-frame-alist already has a font entry.
(add-to-list 'default-frame-alist '(font . "xxx"))
@end example

@end enumerate
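
The three-step search for @samp{-fn xxx} can be modelled over a plain
association list.  The list layout (each fontset as
@code{(NAME (CHARSET . FONT) ...)}, with the ASCII font under the key
@code{ascii}) and the function name are hypothetical; the sketch only
mirrors the order of the steps listed above.

@example
;; Hypothetical model of the -fn search order over an alist of fontsets.
(defun my-find-fontset (name fontsets)
  "Return the fontset to use for -fn NAME, searching FONTSETS."
  (or
   ;; 1. A fontset whose name is NAME.
   (assoc name fontsets)
   ;; 2. A fontset whose ASCII font is NAME.
   (let ((found nil))
     (dolist (fontset fontsets found)
       (when (and (null found)
                  (equal (cdr (assq 'ascii (cdr fontset))) name))
         (setq found fontset))))
   ;; 3. Otherwise, a new fontset NAME containing ASCII font NAME.
   ;;    Only the return value is modelled; nothing is registered.
   (list name (cons 'ascii name))))
@end example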

To change the fontset after Mule has started:

@enumerate
@item
By the command

@example
M-x set-default-fontset<CR>xxx<CR>
@end example

@item
By @key{Ctl-Mouse-3}

@end enumerate

@node Manage fontset
@subsection How to create or modify a fontset

You can create a new fontset with `new-fontset' and modify an existing
fontset with `set-fontset-font'.

You can get a list of the fontsets currently created with
`fontset-list'.

You can check whether a fontset has already been created with
`fontsetp'.