@node Coding-system
@section Coding-system

@noindent
A `coding-system' is a method for encoding several
character-sets.  It is represented by a symbol which has the
properties 'coding-system and 'eol-type.

You can specify a different coding-system for file I/O, process
I/O, output to the terminal (if not running on X), and input from
the keyboard (if not running on X).

@menu
* Structure::                   Structure of coding-system
        o Property 'coding-system
        o Property 'eol-type
        o Property 'post-read-conversion
        o Property 'pre-write-conversion
* Creation::                    How to create coding-system?
* Predefined coding-system::
* Automatic conversion::
        o Category of coding-system
        o How automatic conversion works?
        o Priority of category
* Mode-line::                   How coding-system is shown in mode-line?
* ISO2022 restriction::
* Big5::                        Special treatment of Big5
@end menu

@node Structure
@subsection Structure of coding-system

@subsubsection Property 'coding-system

The value of the property 'coding-system is either a vector:
@quotation
[ TYPE MNEMONIC DOCUMENT DUMMY FLAGS ]
@end quotation
or another coding-system.  The contents of the vector are:
@example
TYPE: nil: no conversion, t: automatic conversion,
      0:Internal, 1:Shift-JIS, 2:ISO2022, 3:Big5, 4:CCL.
MNEMONIC: a character shown in the mode-line to indicate the coding-system.
DOCUMENT: a documentation string describing the coding-system.
DUMMY: always nil (for backward compatibility)
FLAGS (optional): more precise information about the coding-system.
  If TYPE is 2 (ISO2022), FLAGS should be a list of:
    LB-G0, LB-G1, LB-G2, LB-G3:
      Leading character of the charset initially designated to the G? graphic set,
      nil means G? is not designated initially,
      lb-invalid means G? can never be designated to,
      if (- leading-char) is specified, it is designated on output,
    SHORT: non-nil - allow such as "ESC $ B", nil - always "ESC $ ( B",
    ASCII-EOL: non-nil - designate ASCII to g0 at end of line on output,
    ASCII-CNTL: non-nil - designate ASCII to g0 at control codes on output,
    SEVEN: non-nil - use 7-bit environment on output,
    LOCK-SHIFT: non-nil - use locking-shift (SO/SI) instead of single-shift
      or designation by escape sequence,
    USE-ROMAN: non-nil - designate JIS0201-1976-Roman instead of ASCII,
    USE-OLDJIS: non-nil - designate JIS0208-1976 instead of JIS0208-1983,
    NO-ISO6429: non-nil - don't use ISO6429's direction specification.
  If TYPE is 3 (Big5), FLAGS `t' means Big5-ETen, `nil' means Big5-HKU.
  If TYPE is 4 (CCL), FLAGS should be a cons of CCL programs
    for encoding and decoding.  See the documentation of CCL for more detail.
@end example

@subsubsection Property 'eol-type

The value of the property 'eol-type is one of:
@example
nil: no conversion for end-of-line type
1: LF
2: CRLF
3: CR
vector of length 3: automatic detection of end-of-line type.
  1st element: coding-system of eol-type LF
  2nd element: coding-system of eol-type CRLF
  3rd element: coding-system of eol-type CR
@end example

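As a rough illustration of the values listed above, the following
sketch maps an 'eol-type value to a descriptive string.  The function
name `my-eol-type-description' is hypothetical and is not part of
Mule; the code uses only plain Emacs Lisp.

@example
;; Hypothetical helper for illustration; not part of Mule.
(defun my-eol-type-description (eol-type)
  "Return a string describing EOL-TYPE, the value of an 'eol-type property."
  (cond ((null eol-type) "no conversion")
        ((eq eol-type 1) "LF")
        ((eq eol-type 2) "CRLF")
        ((eq eol-type 3) "CR")
        ;; A vector of three coding-systems requests automatic detection.
        ((and (vectorp eol-type) (= (length eol-type) 3))
         "automatic detection")
        (t (error "Invalid eol-type: %S" eol-type))))

(my-eol-type-description 2)     ; => "CRLF"
(my-eol-type-description nil)   ; => "no conversion"
@end example
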
@subsubsection Property 'post-read-conversion

The value of the property 'post-read-conversion is a
function to convert some text just read into a buffer.  When
the function is called, the text has already been converted
according to the 'coding-system and 'eol-type properties of the
coding-system.  The arguments of the function are the bounds
(START and END) of the region of inserted text.

@subsubsection Property 'pre-write-conversion

The value of the property 'pre-write-conversion is a
function to convert some text just before writing it out.
After the function is called, the text is converted according
to the 'coding-system and 'eol-type properties of the
coding-system.  The arguments of the function are the bounds
(START and END) of the region of the text.

@node Creation
@subsection How to create coding-system?

Mule provides the function `make-coding-system' to create a
coding-system.

FUNCTION make-coding-system: NAME TYPE MNEMONIC DOC &optional EOL-TYPE FLAGS

Register the symbol NAME as a coding-system whose 'coding-system
property is the vector [ TYPE MNEMONIC DOC nil FLAGS ] and whose
'eol-type property is EOL-TYPE.  If `t' is specified as
EOL-TYPE, the value of the 'eol-type property is a vector of
generated coding-systems whose 'eol-type properties are 1
(LF), 2 (CRLF), and 3 (CR).  The names of the generated
coding-systems are NAMEunix, NAMEdos, and NAMEmac respectively.

To merely make an alias of an existing coding-system, call the
function `copy-coding-system'.

FUNCTION copy-coding-system: ORIGINAL ALIAS

Make the same coding-system as ORIGINAL and name it ALIAS.
If the 'eol-type property of ORIGINAL is a vector, the coding-systems
ALIASunix, ALIASdos, and ALIASmac are generated, and the
'eol-type property of ALIAS becomes a vector of them.

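The naming rule for the generated coding-systems can be sketched in
plain Emacs Lisp as follows.  The helper `my-eol-variant-names' and
the name `*my-coding*' are hypothetical; the helper only computes the
three symbols and does not register any coding-system.

@example
;; Hypothetical helper for illustration; not part of Mule.
(defun my-eol-variant-names (name)
  "Return a vector of the symbols NAMEunix, NAMEdos, and NAMEmac.
NAME is a symbol.  The order matches eol-types 1 (LF), 2 (CRLF), 3 (CR)."
  (vector (intern (concat (symbol-name name) "unix"))
          (intern (concat (symbol-name name) "dos"))
          (intern (concat (symbol-name name) "mac"))))

(my-eol-variant-names '*my-coding*)
;; => [*my-coding*unix *my-coding*dos *my-coding*mac]
@end example
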
@node Predefined coding-system
@subsection Predefined coding-system

See @file{lisp/mule.el}.

@node Automatic conversion
@subsection Automatic conversion

@subsubsection Category of coding-system

Mule has a facility to detect the coding-system of text
automatically.  However, what Mule actually detects is not a
coding-system itself but a category of coding-system.  A
category is also represented by a symbol, and its value should
be an actual coding-system.

There are eight categories:
@table @asis
@item *coding-category-internal*
The coding-system used in a buffer
@item *coding-category-sjis*
Shift-JIS
@item *coding-category-iso-7*
ISO2022 variation with the following features:
  o no locking shift, no single shift
  o only G0 is used
@item *coding-category-iso-8-1*
ISO2022 variation with the following features:
  o no locking shift
  o designation sequence is allowed only for G0 and G1
  o G1 is used only for 1-byte character sets
@item *coding-category-iso-8-2*
ISO2022 variation with the following features:
  o no locking shift
  o designation sequence is allowed only for G0 and G1
  o G1 is used only for 2-byte character sets
@item *coding-category-iso-else*
ISO2022 variation which doesn't satisfy any of the above.
@item *coding-category-big5*
Big5 (ETen or HKU)
@item *coding-category-bin*
Any other coding-system which uses the MSB.
@end table

The values of these symbols are pre-defined as follows:

@example
----- lisp/mule.el -----------------------------------------
(defvar *coding-category-internal* '*internal*)
(defvar *coding-category-sjis* '*sjis*)
(defvar *coding-category-iso-7* '*junet*)
(defvar *coding-category-iso-8-1* '*ctext*)
(defvar *coding-category-iso-8-2* '*euc-japan*)
(defvar *coding-category-iso-else* '*iso-2022-ss2-7*)
(defvar *coding-category-big5* '*big5-eten*)
(defvar *coding-category-bin* '*noconv*)
------------------------------------------------------------
@end example

However, some of them are overridden in language-specific
files such as japanese.el, chinese.el, etc.

@subsubsection How automatic conversion works?

When the coding-system `*autoconv*' is specified on reading text
(this is the default), Mule tries to detect the category of
coding-system by which the text is encoded.  If an appropriate
category is found, it converts the text according to the
coding-system bound to the category.  If the 'eol-type
property of the coding-system is a vector of coding-systems
and Mule detects the type of end-of-line (LF, CRLF, or CR) of
the text, the corresponding coding-system in the vector is used.

Automatic conversion occurs both on reading from files and
on input from a process.  In the latter case, if some
coding-system is found, the output-coding-system of the process
is also set to the found coding-system.

@subsubsection Priority of category

If more than one category is found, the
category with the highest priority is selected.

The priority of the categories is pre-defined as follows:

@example
----- lisp/mule.el -----------------------------------------
(set-coding-priority
 '(*coding-category-iso-8-2*
   *coding-category-sjis*
   *coding-category-iso-8-1*
   *coding-category-big5*
   *coding-category-iso-7*
   *coding-category-iso-else*
   *coding-category-bin*
   *coding-category-internal*))
------------------------------------------------------------
@end example

The function `set-coding-priority' puts a property 'priority
on each element of the argument, from 0 to 7 (a smaller number
means a higher priority).  Some language-specific files may
override this priority.

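The selection rule can be sketched as follows, assuming the detected
categories are available as a list of symbols.  The variable
`my-priority-list' and the function `my-select-category' are
hypothetical and use only plain Emacs Lisp; Mule performs the actual
detection internally.

@example
;; Hypothetical names for illustration; not part of Mule.
(defvar my-priority-list
  '(*coding-category-iso-8-2*
    *coding-category-sjis*
    *coding-category-iso-8-1*
    *coding-category-big5*
    *coding-category-iso-7*
    *coding-category-iso-else*
    *coding-category-bin*
    *coding-category-internal*)
  "Categories in priority order; earlier elements have higher priority.")

(defun my-select-category (found)
  "Return the element of the list FOUND that has the highest priority.
Return nil if no element of FOUND is in `my-priority-list'."
  (let ((rest my-priority-list)
        (result nil))
    ;; Walk the priority list and stop at the first category in FOUND.
    (while (and rest (null result))
      (if (memq (car rest) found)
          (setq result (car rest)))
      (setq rest (cdr rest)))
    result))

(my-select-category '(*coding-category-big5* *coding-category-sjis*))
;; => *coding-category-sjis*
@end example
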
@node Mode-line
@subsection How coding-system is shown in mode-line?

Each coding-system has a unique mnemonic (one character).
By default, the mnemonic of the `file-coding-system' of a buffer is
shown at the left of the mode-line of the buffer.  In addition,
the mnemonic is followed by another mnemonic which shows the
eol-type of the coding-system.  This mnemonic is defined as
follows:
@example
".": LF
":": CRLF
"'": CR
"_": not yet decided
"-": nil (for coding-system of nil, *noconv*, or *internal*)
@end example
So, the usual appearance of the mode-line for a buffer which is
visiting a file (*junet* encoding on a Unix system) is:

@example
    +-- mnemonic of file-coding-system
    |+-- mnemonic of eol-type
    VV
[--]J.:----Mule: filename
@end example

The leftmost pair of brackets is the indicator for the input method.

When a buffer is attached to some process, the coding-systems
for input and output of the process are also shown as
follows:

@example
    +-- mnemonic of file-coding-system
    |+-- mnemonic of eol-type of file-coding-system
    ||+-- mnemonic of input-coding-system of a process
    |||+-- mnemonic of eol-type of input-coding-system
    ||||+-- mnemonic of output-coding-system of a process
    |||||+-- mnemonic of eol-type of output-coding-system
    VVVVVV
[--]+_+.--:--**-Mule: *shell*
@end example

This means that Mule is now communicating with the shell with
the coding-systems *autoconv*unix ("+.") for input and nil
("--") for output.

@node ISO2022 restriction
@subsection ISO2022 restriction

For decoding with Type 2 (ISO2022) coding-systems, we have the
following restrictions:

@table @asis
@item Locking-Shift:
Use SI and SO only when decoding with a coding-system
whose LOCK-SHIFT and SEVEN are both t.

@item Single-Shift:
Use SS2 and SS3 (if SEVEN is nil) or ESC N and ESC O (if
SEVEN is t).

@item Invocation:
G0 is always invoked to GL, and G1 to GR (but only if SEVEN is
nil).  G2 and G3 are invoked to GL by the single-shifts SS2
and SS3.

@item Unofficial use of ESC sequence for designation:
If SEVEN is t, LOCK-SHIFT is nil, and designation to G2
and G3 is prohibited, we should designate all character
sets to G0 (and hence invoke them to GL).  To designate a
96-character set to G0, we use "ESC , <F>".  For instance, to
designate ISO8859-1 to G0, we use "ESC , A".

@item Unofficial use of ESC sequence for composite character:
To indicate the start and end of a composite character, we
use ESC 0 (start) and ESC 1 (end).

@item Text direction specifier of ISO6429
We use ISO6429's ESC sequence "ESC [ 2 ]" to change the text
direction to right-to-left, and "ESC [ 0 ]" to revert it
to left-to-right.
@end table

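The Single-Shift rule above can be sketched as follows.  The function
`my-single-shift-bytes' is hypothetical and not part of Mule; it
assumes the standard ISO 2022 code values, namely SS2 = 0x8E and
SS3 = 0x8F in an 8-bit environment, and ESC (0x1B) followed by N or O
in a 7-bit environment.

@example
;; Hypothetical helper for illustration; not part of Mule.
(defun my-single-shift-bytes (shift seven)
  "Return a list of byte values for the single-shift SHIFT (2 or 3).
If SEVEN is non-nil, return the 7-bit escape sequence instead of
the 8-bit control code."
  (cond ((and (eq shift 2) seven) (list 27 ?N))   ; ESC N
        ((and (eq shift 3) seven) (list 27 ?O))   ; ESC O
        ((eq shift 2) (list #x8e))                ; SS2
        ((eq shift 3) (list #x8f))                ; SS3
        (t (error "SHIFT must be 2 or 3: %S" shift))))

(my-single-shift-bytes 2 t)     ; => (27 78)
(my-single-shift-bytes 3 nil)   ; => (143)
@end example
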
@node Big5
@subsection Special treatment of Big5

As far as I know, there are several different codes called
Big5.  The most famous ones are Big5-ETen and
Big5-HKU-form2.  Since both of them use the code range 0xa140
- 0xfefe (in each row, columns (second byte) 0x7f - 0xa0 are
skipped) and the number of characters is more than 13000, it is
impossible to treat each of them as a single character-set
in the current Mule system.  So, Mule treats them in a quite
irregular manner, as described below:

@enumerate
@item
Mule does not treat them as different character sets,
but as the same character set called Big5.
Caution!! Big5 is a different character set from GB.

@item
Mule divides Big5 into two sub-character-sets:
@example
0xa140 - 0xc67e (Level 1)
0xc6a1 - 0xfefe (Level 2)
@end example
and allocates two leading-chars, lc-big5-1 and lc-big5-2, to
them.  (See character.txt)

@item
Usually, each leading-char (or character-set) has a unique
character category.  But lc-big5-1 and lc-big5-2 have the
same character category, with mnemonic 't'.  So, the regular
expression "\\ct" matches any Big5 (Level 1 and Level 2)
character.  (See syntax.txt)

@item
If you specify an ISO2022 type coding-system on output,
Mule converts Big5 code using the unofficial final-characters
'0' (for Level 1) and '1' (for Level 2).

@item
You can use fonts of either ETen or HKU for displaying
Big5 code.  Mule judges which font is used by examining the
existence of the character whose code point is 0xC6A1.  If it
exists, the font is HKU; otherwise the font is ETen.
@end enumerate

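The division into Level 1 and Level 2 can be sketched in plain Emacs
Lisp as follows.  The function `my-big5-level' is hypothetical and not
part of Mule.  It also assumes that a valid second byte lies between
0x40 and 0xfe, which is inferred from the end points of the code range
rather than stated explicitly above.

@example
;; Hypothetical helper for illustration; not part of Mule.
(defun my-big5-level (code)
  "Return 1 or 2 for the level of the 2-byte Big5 CODE, or nil if invalid.
CODE is an integer such as #xa440 (first byte #xa4, second byte #x40)."
  (let ((second (logand code 255)))
    (cond
     ;; Assumed limits of the second byte (from the range end points).
     ((or (< second #x40) (> second #xfe)) nil)
     ;; Columns 0x7f - 0xa0 are skipped in each row.
     ((and (>= second #x7f) (<= second #xa0)) nil)
     ((and (>= code #xa140) (<= code #xc67e)) 1)
     ((and (>= code #xc6a1) (<= code #xfefe)) 2)
     (t nil))))

(my-big5-level #xa440)   ; => 1
(my-big5-level #xc6a1)   ; => 2
(my-big5-level #xa480)   ; => nil
@end example
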
@node Syntax
@section Syntax and Category of character

@subsection Syntax

Mule can define the syntax of all multi-byte characters by
@code{modify-syntax-entry}.

The first argument of @code{modify-syntax-entry} should be one of the
following:
@enumerate
@item
ASCII character
@item
multi-byte character
@item
leading character of multi-byte character
@item
partially defined characters returned by:

@quotation
@code{(make-character leading-char arg)}
@end quotation
@end enumerate

There is a restriction on specifying a matching character in the
second argument.  If the first argument specifies a multi-byte
character or the leading char of a multi-byte character, the
matching character should have the same leading character.  If
the character is a 2-byte code, its first byte should
also be the same as the first byte of the first argument.

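As a small illustration of a matching character in the second
argument, the following sketch uses ASCII characters only, so the
restriction on leading characters does not come into play.  The
variable name `my-angle-syntax-table' and the choice of `<' and `>'
are arbitrary assumptions made for the example.

@example
;; Hypothetical variable for illustration.
(defvar my-angle-syntax-table
  (let ((table (make-syntax-table)))
    ;; `<' is an open delimiter whose matching character is `>'.
    (modify-syntax-entry ?< "(>" table)
    ;; `>' is a close delimiter whose matching character is `<'.
    (modify-syntax-entry ?> ")<" table)
    table)
  "Syntax table in which `<' and `>' form a matching pair.")
@end example
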
@subsection Category

Like syntax, category also defines characteristics of
characters.  The differences are:
@enumerate
@item
Each character can have more than one category.
@item
Users can define new types of category as they wish.
Example: See japanese.el
@item
@code{char-category} returns all category mnemonics of a character as a
string.
@item
For regular expression search, you can use \cm or \Cm (where `m' stands
for any category mnemonic) instead of \sm and \Sm.
@end enumerate

@node Font
@section Font

A FONTSET is a set of fonts which have the same height and style.  A
fontset should ideally contain enough fonts to display characters of
various character sets.

Mule uses fontsets instead of fonts.  You can specify a fontset at any
place where you can specify a font.  You can still specify a font, in
which case a fontset which includes the font is searched for and used.

Like a font, a fontset is also identified by a string specifying its name.

@menu
* Initial fontsets::            Fontsets which Mule has at startup time.
* Specify fontset::             How to specify a fontset?
* Manage fontset::              How to create or modify a fontset?
@end menu

@node Initial fontsets
@subsection Initial fontsets

@subsubsection "default-fontset"

Mule automatically creates a fontset named "default-fontset" at startup
time.  Each font in this fontset is specified by a very generic name such
as "-*-fixed-medium-r-*--16-*-iso8859-1" for ASCII and
"-*-fixed-medium-r-*--*-jisx0208.1983-*" for JISX0208 (Kanji).
These values are defined in @file{lisp/term/x-win.el}.

If no other fontset is specified by X's resource, "default-fontset"
is used for the first frame of Mule.

In most cases, this is enough.  You probably don't need any
other fontsets.

@subsubsection X's resource

Mule also creates the fontsets specified in X's resource "fontSetList
(class FontSetList)".  The value is a comma separated list of fontset
names.

@example
*FontSetList: 16,24
@end example

The actual contents of each fontset are specified by "fontSet-xxx (class
FontSet-xxx)" where "xxx" is the name of the corresponding fontset.  The
value of this resource is a comma separated list of font names.

@example
*FontSet-16: -etl-fixed-medium-r-*--24-*-iso8859-1
@end example

A font name should not contain the wild cards `*' or `?' in the
CHARSET_REGISTRY field, because the character set for the font is
recognized by this field.  This means that you don't have to care about
the order of font names.

For instance,

@example
*FontSet-16:\
        -etl-fixed-medium-r-*--16-*-iso8859-1,\
        -ming-fixed-medium-r-*--*-*-jisx0208.1983-*
@end example

is enough to tell Mule that the fontset "16" contains an ASCII font and
a JISX0208 font.  Please note that the second name has only a wild card
in the PIXEL_SIZE field.  Since Mule tries to open a font of the same
PIXEL_SIZE as the ASCII font of the same fontset, you had better not
specify an actual value in the PIXEL_SIZE field except for the ASCII
font.

As for fonts not listed in the specification of a fontset, the
corresponding font names in "default-fontset" are used.

The first fontset in FontSetList is used for the first frame of Mule.
If you want to use "default-fontset" while specifying other fontsets in
the resource, please put "default-fontset" at the beginning of the value.

@example
*FontSetList: default-fontset,16,24
@end example

In this case, you don't need the resource
"FontSet-default-fontset".

@node Specify fontset
@subsection How to specify a fontset?

You can specify a fontset at any place where you can specify a font.

To change the fontset used for the first frame of Mule:

@enumerate
@item
Use the command line arguments "-fn xxx" or "-font xxx".

If this argument exists, a fontset is searched for in the following order:
@enumerate
@item
A fontset whose name is "xxx".
@item
A fontset which contains the ASCII font "xxx".
@item
Otherwise, a new fontset "xxx" which contains the ASCII font "xxx" is
created.
@end enumerate

@item
In your ~/.emacs,

@example
(setcdr (assoc 'font default-frame-alist) "xxx")
@end example

@end enumerate

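Note that the @code{setcdr} form for ~/.emacs signals an error when
@code{default-frame-alist} has no @code{font} entry, because
@code{assoc} then returns nil.  A more defensive sketch, which adds
the entry when it is missing, might look like this ("xxx" is the same
placeholder fontset name as above):

@example
;; Update the existing `font' entry, or add one if there is none.
(let ((entry (assq 'font default-frame-alist)))
  (if entry
      (setcdr entry "xxx")
    (setq default-frame-alist
          (cons (cons 'font "xxx") default-frame-alist))))
@end example
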
To change the fontset after Mule has started:

@enumerate
@item
By the command

@example
M-x set-default-fontset<CR>xxx<CR>
@end example

@item
By @key{Ctl-Mouse-3}

@end enumerate

@node Manage fontset
@subsection How to create or modify a fontset?

You can create a new fontset with `new-fontset' and modify an
existing fontset with `set-fontset-font'.

You can get a list of the fontsets currently created with
`fontset-list'.

You can check whether a fontset has already been created with
`fontsetp'.