comparison man/mule/mule.texi @ 70:131b0175ea99 r20-0b30
Import from CVS: tag r20-0b30
author | cvs |
---|---|
date | Mon, 13 Aug 2007 09:02:59 +0200 |
parents | |
children | 360340f9fd5f |
69:804d1389bcd6 | 70:131b0175ea99 |
---|---|
1 | |
2 @node Coding-system | |
3 @section Coding-system | |
4 | |
5 @noindent | |
6 A `coding-system' is a method for encoding several | |
7 character-sets. It is represented by a symbol which has | |
8 the properties 'coding-system and 'eol-type. | |
9 | |
10 You can specify a different coding-system for file I/O, process | |
11 I/O, output to the terminal (if not running on X), and input from | |
12 the keyboard (if not running on X). | |
13 | |
14 | |
15 @menu | |
16 * Structure:: Structure of coding-system | |
17 o Property 'coding-system | |
18 o Property 'eol-type | |
19 o Property 'post-read-conversion | |
20 o Property 'pre-write-conversion | |
21 * Creation:: How to create coding-system? | |
22 * Predefined coding-system:: | |
23 * Automatic conversion:: | |
24 o Category of coding-system | |
25 o How automatic conversion works? | |
26 o Priority of category | |
27 * Mode-line:: How coding-system is shown in mode-line? | |
28 * ISO2022 restriction:: | |
29 * Big5:: Special treatment of Big5 | |
30 @end menu | |
31 | |
32 @node Structure | |
33 @subsection Structure of coding-system | |
34 | |
35 @subsubsection Property 'coding-system | |
36 | |
37 The value of the property 'coding-system is a vector: | |
38 @quotation | |
39 [ TYPE MNEMONIC DOCUMENT DUMMY FLAGS ] | |
40 @end quotation | |
41 or another coding-system. The contents of the vector are: | |
42 @example | |
43 TYPE: nil: no conversion, t: automatic conversion, | |
44 0:Internal, 1:Shift-JIS, 2:ISO2022, 3:Big5, 4:CCL. | |
45 MNEMONIC: a character shown at mode-line to indicate the coding-system. | |
46 DOCUMENT: a documentation string describing the coding-system. | |
47 DUMMY: always nil (for backward compatibility) | |
48 FLAGS (option): more precise information about the coding-system, | |
49 If TYPE is 2 (ISO2022), FLAGS should be a list of: | |
50 LB-G0, LB-G1, LB-G2, LB-G3: | |
51 Leading character of charset initially designated to G? graphic set, | |
52 nil means G? is not designated initially, | |
53 lb-invalid means G? can never be designated to, | |
54 if (- leading-char) is specified, it is designated on output, | |
55 SHORT: non-nil - allow such as "ESC $ B", nil - always "ESC $ ( B", | |
56 ASCII-EOL: non-nil - designate ASCII to g0 at end of line on output, | |
57 ASCII-CNTL: non-nil - designate ASCII to g0 at control codes on output | |
58 SEVEN: non-nil - use 7-bit environment on output, | |
59 LOCK-SHIFT: non-nil - use locking-shift (SO/SI) instead of single-shift | |
60 or designation by escape sequence, | |
61 USE-ROMAN: non-nil - designate JIS0201-1976-Roman instead of ASCII, | |
62 USE-OLDJIS: non-nil - designate JIS0208-1976 instead of JIS0208-1983, | |
63 NO-ISO6429: non-nil - don't use ISO6429's direction specification, | |
64 If TYPE is 3 (Big5), FLAGS `t' means Big5-ETen, `nil' means Big5-HKU, | |
65 If TYPE is 4 (CCL), FLAGS should be a cons of CCL programs | |
66 for encoding and decoding. See documentation of CCL for more detail. | |
67 @end example | |
68 | |
69 @subsubsection Property 'eol-type | |
70 | |
71 The value of the property 'eol-type is: | |
72 nil: no conversion for end-of-line type | |
73 1: LF | |
74 2: CRLF | |
75 3: CR | |
76 vector of length 3: automatic detection of end-of-line type. | |
77 1st element: coding-system of eol-type LF | |
78 2nd element: coding-system of eol-type CRLF | |
79 3rd element: coding-system of eol-type CR | |
80 | |
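The 'eol-type values above can be illustrated with a minimal sketch. It is not taken from Mule's sources: the function names `my-detect-eol-type' and `my-select-eol-variant' are hypothetical, and the sketch only assumes the numeric codes (1: LF, 2: CRLF, 3: CR) and the three-element vector layout described in this section.

```elisp
;; Hypothetical helpers, not part of Mule.  They only mirror the
;; numeric 'eol-type codes described above (1: LF, 2: CRLF, 3: CR).
(defun my-detect-eol-type (text)
  "Return 1 (LF), 2 (CRLF), 3 (CR), or nil if TEXT has no line ending."
  (let ((pos (string-match "[\r\n]" text)))
    (cond
     ((null pos) nil)                    ; no line ending seen
     ((eq (aref text pos) ?\n) 1)        ; bare LF
     ((and (< (1+ pos) (length text))
           (eq (aref text (1+ pos)) ?\n))
      2)                                 ; CR immediately followed by LF
     (t 3))))                            ; bare CR

(defun my-select-eol-variant (eol-vector text)
  "Return the element of EOL-VECTOR that matches the line ending of TEXT.
EOL-VECTOR holds three coding-system symbols in the order LF, CRLF, CR.
Return nil when TEXT contains no line ending."
  (let ((type (my-detect-eol-type text)))
    (and type (aref eol-vector (1- type)))))
```

For example, `(my-select-eol-variant [foounix foodos foomac] "a\r\nb")' picks the second element because the first line ending found is CRLF. Only the first line ending is examined, which is a simplification made for this sketch.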
81 @subsubsection Property 'post-read-conversion | |
82 | |
83 The value of the property 'post-read-conversion is a | |
84 function to convert some text just read into a buffer. When | |
85 the function is called, the text has already been converted | |
86 according to 'coding-system and 'eol-type of the | |
87 coding-system. The argument of the function is the region | |
88 (START and END) of inserted text. | |
89 | |
90 @subsubsection Property 'pre-write-conversion | |
91 | |
92 The value of the property 'pre-write-conversion is a | |
93 function to convert some text just before writing it out. | |
94 After the function is called, the text is converted according | |
95 to 'coding-system and 'eol-type of the coding-system. The | |
96 argument of the function is the region (START and END) of | |
97 the text. | |
98 | |
99 @node Creation | |
100 @subsection How to create coding-system? | |
101 | |
102 Mule provides a function `make-coding-system' to create a | |
103 coding-system. | |
104 | |
105 FUNCTION make-coding-system: NAME TYPE MNEMONIC DOC &optional EOL-TYPE FLAGS | |
106 | |
107 Register symbol NAME as a coding-system whose 'coding-system | |
108 property is a vector [ TYPE MNEMONIC DOC nil FLAGS ] and | |
109 'eol-type property is EOL-TYPE. If `t' is specified as | |
110 EOL-TYPE, the value of 'eol-type property is a vector of | |
111 generated coding-systems whose 'eol-type properties are 1 | |
112 (LF), 2 (CRLF), and 3 (CR). The names of generated | |
113 coding-systems are NAMEunix, NAMEdos, and NAMEmac respectively. | |
114 | |
115 To just make an alias of some coding-system, call the function | |
116 `copy-coding-system'. | |
117 | |
118 FUNCTION copy-coding-system: ORIGINAL ALIAS | |
119 | |
120 Make the same coding-system as ORIGINAL and name it ALIAS. | |
121 If 'eol-type property of ORIGINAL is a vector, coding-systems | |
122 ALIASunix, ALIASdos, and ALIASmac are generated, and | |
123 'eol-type property of ALIAS becomes a vector of them. | |
124 | |
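The naming rule for the generated coding-systems can be sketched as follows. The helper `my-eol-variant-names' is hypothetical and only reproduces the suffix convention (unix, dos, mac) stated above; it does not register anything with Mule.

```elisp
;; Hypothetical helper, not part of Mule.  It only builds the three
;; names that the text says are generated for NAME.
(defun my-eol-variant-names (name)
  "Return a vector of the symbols NAMEunix, NAMEdos, and NAMEmac.
NAME is a symbol.  The order matches 'eol-type 1 (LF), 2 (CRLF), 3 (CR)."
  (let ((base (symbol-name name)))
    (vector (intern (concat base "unix"))
            (intern (concat base "dos"))
            (intern (concat base "mac")))))

;; (my-eol-variant-names '*junet*)
;; => [*junet*unix *junet*dos *junet*mac]
```

A vector of this shape is what the 'eol-type property holds when automatic end-of-line detection is requested.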
125 @node Predefined coding-system | |
126 @subsection Predefined coding-system | |
127 | |
128 See lisp/mule.el. | |
129 | |
130 @node Automatic conversion | |
131 @subsection Automatic conversion | |
132 | |
133 @subsubsection Category of coding-system | |
134 | |
135 Mule has a facility to detect the coding-system of text | |
136 automatically. However, what Mule actually detects is not a | |
137 coding-system itself but a category of coding-system. A | |
138 category is also represented by a symbol, and its value should | |
139 be an actual coding-system. | |
140 | |
141 There are eight categories: | |
142 @table @asis | |
143 @item *coding-category-internal*: | |
144 coding-system used in a buffer | |
145 @item *coding-category-sjis* | |
146 Shift-JIS | |
147 @item *coding-category-iso-7* | |
148 ISO2022 variation with the following features: | |
149 o no locking shift, single shift | |
150 o only G0 is used | |
151 @item *coding-category-iso-8-1* | |
152 ISO2022 variation with the following features: | |
153 o no locking shift | |
154 o designation sequence is allowed only for G0 and G1 | |
155 o G1 is used only for 1-byte character set | |
156 @item *coding-category-iso-8-2* | |
157 ISO2022 variation with the following features: | |
158 o no locking shift | |
159 o designation sequence is allowed only for G0 and G1 | |
160 o G1 is used only for 2-byte character set | |
161 @item *coding-category-iso-else* | |
162 ISO2022 variation which doesn't satisfy any of above. | |
163 @item *coding-category-big5* | |
164 Big5 (ETen or HKU) | |
165 @item *coding-category-bin* | |
166 Any other coding-system which uses MSB. | |
167 @end table | |
168 | |
169 The values of these symbols are pre-defined as follows: | |
170 | |
171 @example | |
172 ----- lisp/mule.el ----------------------------------------- | |
173 (defvar *coding-category-internal* '*internal*) | |
174 (defvar *coding-category-sjis* '*sjis*) | |
175 (defvar *coding-category-iso-7* '*junet*) | |
176 (defvar *coding-category-iso-8-1* '*ctext*) | |
177 (defvar *coding-category-iso-8-2* '*euc-japan*) | |
178 (defvar *coding-category-iso-else* '*iso-2022-ss2-7*) | |
179 (defvar *coding-category-big5* '*big5-eten*) | |
180 (defvar *coding-category-bin* '*noconv*) | |
181 ------------------------------------------------------------ | |
182 @end example | |
183 | |
184 However, some of them are overridden in language-specific | |
185 files such as japanese.el and chinese.el. | |
186 | |
187 @subsubsection How automatic conversion works? | |
188 | |
189 When the coding-system `*autoconv*' is specified on reading text | |
190 (this is the default), Mule tries to detect the category of | |
191 coding-system by which the text is encoded. If an appropriate | |
192 category is found, it converts the text according to the | |
193 coding-system bound to the category. If the 'eol-type | |
194 property of the coding-system is a vector of coding-systems | |
195 and Mule detects the type of end-of-line (LF, CRLF, or CR) of | |
196 the text, the corresponding one of those coding-systems is used. | |
197 | |
198 Automatic conversion occurs both on reading from files and | |
199 on receiving input from a process. In the latter case, if some | |
200 coding-system is found, the output-coding-system of the process | |
201 is also set to the found coding-system. | |
202 | |
203 @subsubsection Priority of category | |
204 | |
205 In the case that more than one category is found, the | |
206 category of the highest priority is selected. | |
207 | |
208 The priority of categories is pre-defined as follows: | |
209 | |
210 @example | |
211 ----- lisp/mule.el ----------------------------------------- | |
212 (set-coding-priority | |
213 '(*coding-category-iso-8-2* | |
214 *coding-category-sjis* | |
215 *coding-category-iso-8-1* | |
216 *coding-category-big5* | |
217 *coding-category-iso-7* | |
218 *coding-category-iso-else* | |
219 *coding-category-bin* | |
220 *coding-category-internal*)) | |
221 ------------------------------------------------------------ | |
222 @end example | |
223 | |
224 The function `set-coding-priority' puts a property 'priority | |
225 on each element of the argument, numbered from 0 to 7 (a smaller | |
226 number means higher priority). Some language-specific files may | |
227 override this priority. | |
228 | |
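The selection rule described above (the smallest priority number wins) can be sketched in a few lines. This is an illustration only: `my-coding-priority', `my-category-priority', and `my-select-category' are hypothetical names, and the sketch uses a plain list position in place of the 'priority property that `set-coding-priority' is said to install.

```elisp
;; Hypothetical sketch, not Mule's implementation.  The list copies the
;; default order shown above; position in the list stands in for the
;; 'priority property (0 is the highest priority).
(defvar my-coding-priority
  '(*coding-category-iso-8-2*
    *coding-category-sjis*
    *coding-category-iso-8-1*
    *coding-category-big5*
    *coding-category-iso-7*
    *coding-category-iso-else*
    *coding-category-bin*
    *coding-category-internal*)
  "Categories in decreasing order of priority.")

(defun my-category-priority (category)
  "Return the position of CATEGORY in `my-coding-priority', or nil."
  (let ((tail (memq category my-coding-priority)))
    (and tail (- (length my-coding-priority) (length tail)))))

(defun my-select-category (candidates)
  "Return the member of CANDIDATES with the smallest priority number.
Categories that are not listed in `my-coding-priority' are ignored."
  (let (best best-rank)
    (dolist (cat candidates best)
      (let ((rank (my-category-priority cat)))
        (when (and rank (or (null best-rank) (< rank best-rank)))
          (setq best cat
                best-rank rank))))))
```

With the default order, a text that matches both the Big5 and the Shift-JIS categories resolves to `*coding-category-sjis*', since its number (1) is smaller than that of `*coding-category-big5*' (3).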
229 @node Mode-line | |
230 @subsection How coding-system is shown in mode-line? | |
231 | |
232 Each coding-system has a unique mnemonic (one character). | |
233 By default, the mnemonic of the `file-coding-system' of a buffer is | |
234 shown at the left of the mode-line of the buffer. In addition, | |
235 the mnemonic is followed by another mnemonic to show the | |
236 eol-type of the coding-system. This mnemonic is defined as | |
237 follows: | |
238 ".": LF | |
239 ":": CRLF | |
240 "'": CR | |
241 "_": not yet decided | |
242 "-": nil (for coding-system of nil, *noconv*, or *internal*) | |
243 So, the usual appearance of the mode-line for a buffer which is | |
244 visiting a file (*junet* encoding on a Unix system) is: | |
245 | |
246 @example | |
247 +-- mnemonic of file-coding-system | |
248 |+-- mnemonic of eol-type | |
249 VV | |
250 [--]J.:----Mule: filename | |
251 @end example | |
252 | |
253 The leftmost bracket is the indicator for the input method. | |
254 | |
255 When a buffer is attached to some process, the coding-systems | |
256 for input and output of the process are also shown as | |
257 follows: | |
258 | |
259 @example | |
260 +-- mnemonic of file-coding-system | |
261 |+-- mnemonic of eol-type of file-coding-system | |
262 ||+-- mnemonic of input-coding-system of a process | |
263 |||+-- mnemonic of eol-type of input-coding-system | |
264 ||||+-- mnemonic of output-coding-system of a process | |
265 |||||+-- mnemonic of eol-type of output-coding-system | |
266 VVVVVV | |
267 [--]+_+.--:--**-Mule: *shell* | |
268 @end example | |
269 | |
270 This means that Mule is now communicating with shell with | |
271 coding-systems *autoconv*unix ("+.") for input and nil | |
272 ("--") for output. | |
273 | |
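The eol-type mnemonics listed in this subsection can be written as a small lookup. The function `my-eol-mnemonic' is hypothetical. It assumes that "not yet decided" corresponds to an 'eol-type that is still a vector of candidate coding-systems; the text above does not state this mapping explicitly, so treat it as an assumption of the sketch.

```elisp
;; Hypothetical helper, not part of Mule.
(defun my-eol-mnemonic (eol-type)
  "Return the mode-line mnemonic string for EOL-TYPE.
EOL-TYPE is nil, 1 (LF), 2 (CRLF), 3 (CR), or a vector.  A vector is
assumed here to mean that the end-of-line type is not yet decided."
  (cond ((null eol-type) "-")
        ((vectorp eol-type) "_")   ; assumption: detection still pending
        ((eq eol-type 1) ".")
        ((eq eol-type 2) ":")
        ((eq eol-type 3) "'")))
```

Under this reading, the "+_" at the start of the *shell* example above would be the mnemonic "+" followed by an undecided eol-type, and "+." the same coding-system after LF has been detected.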
274 @node ISO2022 restriction | |
275 @subsection ISO2022 restriction | |
276 | |
277 For decoding to Type 2 (ISO2022), we have the following | |
278 restrictions: | |
279 | |
280 @table @asis | |
281 @item Locking-Shift: | |
282 Use SI and SO only when decoding with a coding-system | |
283 whose LOCK-SHIFT and SEVEN are t. | |
284 | |
285 @item Single-Shift: | |
286 Use SS2 and SS3 (if SEVEN is nil) or ESC N and ESC O (if | |
287 SEVEN is t). | |
288 | |
289 @item Invocation: | |
290 G0 is always invoked to GL, G1 to GR (but only if SEVEN is | |
291 nil). G2 and G3 are invoked to GL by Single-Shift of SS2 | |
292 and SS3. | |
293 | |
294 @item Unofficial use of ESC sequence for designation: | |
295 If SEVEN is t, LOCK-SHIFT is nil, and designation to G2 | |
296 and G3 is prohibited, we should designate all character | |
297 sets to G0 (and hence invoke them to GL). To designate a 96 | |
298 char-set to G0, we use "ESC , <F>". For instance, to | |
299 designate ISO8859-1 to G0, we use "ESC , A". | |
300 | |
301 @item Unofficial use of ESC sequence for composite character: | |
302 To indicate the start and end of a composite character, we | |
303 use ESC 0 (start) and ESC 1 (end). | |
304 | |
305 @item Text direction specifier of ISO6429 | |
306 We use ISO6429's ESC sequence "ESC [ 2 ]" to change text | |
307 direction to right-to-left, and "ESC [ 0 ]" to revert it | |
308 to left-to-right. | |
309 @end table | |
310 | |
311 @node Big5 | |
312 @subsection Special treatment of Big5 | |
313 | |
314 As far as I know, there are several different codes called | |
315 Big5. The most famous ones are Big5-ETen and | |
316 Big5-HKU-form2. Since both of them use the code range 0xa140 | |
317 - 0xfefe (in each row, columns (second byte) 0x7f - 0xa0 are | |
318 skipped) and the number of characters is more than 13000, it's | |
319 impossible to treat each of them as a single character-set | |
320 in the current Mule system. So, Mule treats them in a quite | |
321 irregular manner as described below: | |
322 | |
323 @enumerate | |
324 @item | |
325 Mule does not treat them as different character sets, | |
326 but as the same character set called Big5. | |
327 Caution!! Big5 is a different character set from GB. | |
328 | |
329 @item | |
330 Mule divides Big5 into two sub-character-sets: | |
331 0xa140 - 0xc67e (Level 1) | |
332 0xc6a1 - 0xfefe (Level 2) | |
333 and allocates two leading-chars lc-big5-1 and lc-big5-2 to | |
334 them. (See character.txt) | |
335 | |
336 @item | |
337 Usually, each leading-char (or character-set) has a unique | |
338 character category. But lc-big5-1 and lc-big5-2 have the | |
339 same character category of mnemonic 't'. So, the regular | |
340 expression "\\ct" matches any Big5 (Level 1 and Level 2) | |
341 character. (See syntax.txt) | |
342 | |
343 @item | |
344 If you specify ISO2022 type coding-system on output, | |
345 Mule converts Big5 code using unofficial final-characters | |
346 '0' (for Level 1) and '1' (for Level 2). | |
347 | |
348 @item | |
349 You can use fonts of either ETen or HKU for displaying | |
350 Big5 code. Mule judges which font is used by examining the | |
351 existence of a character whose code point is 0xC6A1. If it | |
352 exists, the font is HKU, else the font is ETen. | |
353 @end enumerate | |
354 | |
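The split of Big5 into two sub-character-sets can be sketched as a range check. The function `my-big5-level' is hypothetical and is not how Mule assigns lc-big5-1 and lc-big5-2 internally. It uses only the ranges given above, and it assumes that valid second bytes are 0x40 - 0x7e and 0xa1 - 0xfe, which is inferred from the range endpoints and the skipped columns 0x7f - 0xa0.

```elisp
;; Hypothetical helper, not part of Mule.
(defun my-big5-level (code)
  "Return 1 or 2 for the Big5 sub-character-set containing CODE, else nil.
CODE is a two-byte Big5 code given as an integer such as #xa140."
  (let ((second (logand code #xff)))
    (cond
     ;; Assumption: second bytes outside 0x40-0x7e and 0xa1-0xfe are
     ;; not valid columns (0x7f-0xa0 are skipped, per the text above).
     ((not (or (and (>= second #x40) (<= second #x7e))
               (and (>= second #xa1) (<= second #xfe))))
      nil)
     ((and (>= code #xa140) (<= code #xc67e)) 1)   ; Level 1
     ((and (>= code #xc6a1) (<= code #xfefe)) 2)))) ; Level 2
```

For instance, `(my-big5-level #xc6a1)' returns 2. That is the code point whose presence in a font Mule examines to tell an HKU font from an ETen font.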
355 @node Syntax | |
356 @section Syntax and Category of character | |
357 | |
358 @subsection Syntax | |
359 | |
360 Mule can define syntax of all multi-byte characters by | |
361 @code{modify-syntax-entry}. | |
362 | |
363 The first argument of @code{modify-syntax-entry} should be one of the following: | |
364 @enumerate | |
365 @item | |
366 ASCII character | |
367 @item | |
368 multi-byte character | |
369 @item | |
370 leading character of multi-byte character | |
371 @item | |
372 partially defined characters returned by: | |
373 | |
374 @quotation | |
375 @code{(make-character leading-char arg)} | |
376 @end quotation | |
377 @end enumerate | |
378 | |
379 There's a restriction on specifying a matching character within the | |
380 second argument. If the first argument specifies a multi-byte | |
381 character or the leading char of a multi-byte character, the | |
382 matching character should have the same leading character. If | |
383 the character is a 2-byte code, its first byte should | |
384 also be the same as the first byte of the first argument. | |
385 | |
386 @subsection Category | |
387 | |
388 Like syntax, category also defines characteristics of | |
389 characters. The differences are: | |
390 @enumerate | |
391 @item | |
392 Each character can have more than one category. | |
393 @item | |
394 A user can define a new type of category as needed. | |
395 Example: See japanese.el | |
396 @item | |
397 @code{char-category} returns all mnemonics of the character as a string. | |
398 @item | |
399 For regular expression search, you can use \cm or \Cm (any mnemonic | |
400 can come in the place of 'm') instead of \sm and \Sm. | |
401 @end enumerate | |
402 | |
403 @node Font | |
404 @section Font | |
405 | |
406 A FONTSET is a set of fonts which have the same height and style. A | |
407 fontset should hopefully contain enough fonts to display characters of | |
408 various character sets. | |
409 | |
410 Mule uses fontsets instead of fonts. You can specify a fontset at any place | |
411 where you can specify a font. You can still specify a font, in which case | |
412 a fontset which includes the font is searched for and used. | |
413 | |
414 Like a font, a fontset is also specified by a string giving its name. | |
415 | |
416 @menu | |
417 * Initial fontsets:: Fontsets which Mule has at startup time. | |
418 * Specify fontset:: How to specify a fontset? | |
419 * Manage fontset:: How to create or modify a fontset? | |
420 @end menu | |
421 | |
422 @node Initial fontsets | |
423 @subsection Initial fontsets | |
424 | |
425 @subsubsection "default-fontset" | |
426 | |
427 Mule automatically creates a fontset named "default-fontset" at startup | |
428 time. Each font in this fontset is specified by a very generic name such | |
429 as "-*-fixed-medium-r-*--16-*-iso8859-1" for ASCII and | |
430 "-*-fixed-medium-r-*--*-jisx0208.1983-*" for JISX0208 (Kanji). | |
431 These values are defined in @file{lisp/term/x-win.el}. | |
432 | |
433 If there are no other fontsets specified by X's resource, "default-fontset" | |
434 is used for the first frame of Mule. | |
435 | |
436 In most cases, this is enough. You probably don't have to have any | |
437 other fontsets. | |
438 | |
439 @subsubsection X's resource | |
440 | |
441 Mule also creates fontsets specified in X's resource "fontSetList (class | |
442 FontSetList)". The value is a comma separated list of fontset names. | |
443 | |
444 @example | |
445 *FontSetList: 16,24 | |
446 @end example | |
447 | |
448 The actual contents of each fontset is specified by "fontSet-xxx (class | |
449 FontSet-xxx)" where "xxx" is a name of the corresponding fontset. The | |
450 value of this resource is a comma separated list of font names. | |
451 | |
452 @example | |
453 *FontSet-16: -etl-fixed-medium-r-*--24-*-iso8859-1 | |
454 @end example | |
455 | |
456 Each font name should not contain the wild cards `*' or `?' in the | |
457 CHARSET_REGISTRY field because the character set for this font is | |
458 recognized by this field. This means that you don't have to care about | |
459 the order of font names. | |
460 | |
461 For instance, | |
462 | |
463 @example | |
464 *FontSet-16:\ | |
465 -etl-fixed-medium-r-*--16-*-iso8859-1,\ | |
466 -ming-fixed-medium-r-*--*-*-jisx0208.1983-* | |
467 @end example | |
468 | |
469 is enough to tell Mule that the fontset "16" contains an ASCII font and a | |
470 JISX0208 font. Please note that the second name has only a wild card in the | |
471 PIXEL_SIZE field. Since Mule tries to open a font of the same PIXEL_SIZE | |
472 as the ASCII font of the same fontset, you had better not specify an actual | |
473 value in the PIXEL_SIZE field except for the ASCII font. | |
474 | |
475 As for fonts not listed in the specification of a fontset, the corresponding | |
476 font names in "default-fontset" are used. | |
477 | |
478 The first fontset in FontSetList is used for the first frame of Mule. | |
479 If you want to use "default-fontset" while specifying other fontsets in | |
480 the resource, please put "default-fontset" first in the value. | |
481 | |
482 @example | |
483 *FontSetList: default-fontset,16,24 | |
484 @end example | |
485 | |
486 In this case, you don't have to have the resource | |
487 "FontSet-default-fontset". | |
488 | |
489 @node Specify fontset | |
490 @subsection How to specify a fontset? | |
491 | |
492 You can specify a fontset at any place where you can specify a font. | |
493 | |
494 To change the fontset used for the first frame of Mule: | |
495 | |
496 @enumerate | |
497 @item | |
498 command line arguments "-fn xxx" or "-font xxx" | |
499 | |
500 If this argument exists, a fontset is searched for in the following order: | |
501 @enumerate | |
502 @item | |
503 A fontset whose name is "xxx". | |
504 @item | |
505 A fontset which contains ASCII font "xxx". | |
506 @item | |
507 Create a new fontset "xxx" which contains ASCII font "xxx". | |
508 @end enumerate | |
509 | |
510 @item | |
511 In your ~/.emacs, | |
512 | |
513 @example | |
514 (setcdr (assoc 'font default-frame-alist) "xxx") | |
515 @end example | |
516 | |
517 @end enumerate | |
518 | |
519 To change a fontset after Mule has started: | |
520 | |
521 @enumerate | |
522 @item | |
523 By the command | |
524 | |
525 @example | |
526 M-x set-default-fontset<CR>xxx<CR> | |
527 @end example | |
528 | |
529 @item | |
530 By @key{Ctl-Mouse-3} | |
531 | |
532 @end enumerate | |
533 | |
534 @node Manage fontset | |
535 @subsection How to create or modify a fontset? | |
536 | |
537 You can create a new fontset by `new-fontset' and modify an | |
538 existing fontset by `set-fontset-font'. | |
539 | |
540 You can get a list of the fontsets currently created by | |
541 `fontset-list'. | |
542 | |
543 You can check whether a fontset is already created by | |
544 `fontsetp'. |