comparison man/lispref/mule.texi @ 54:05472e90ae02 r19-16-pre2

Import from CVS: tag r19-16-pre2
author cvs
date Mon, 13 Aug 2007 08:57:55 +0200
parents 376386a54a3c
children 131b0175ea99
53:875393c1a535 54:05472e90ae02
201 the actual charset object. 201 the actual charset object.
202 @item doc-string 202 @item doc-string
203 A documentation string describing the charset. 203 A documentation string describing the charset.
204 @item registry 204 @item registry
205 A regular expression matching the font registry field for this character 205 A regular expression matching the font registry field for this character
206 set. For example, both the @code{ascii} and @code{latin-1} charsets 206 set. For example, both the @code{ascii} and @code{latin-iso8859-1}
207 use the registry @code{"ISO8859-1"}. This field is used to choose 207 charsets use the registry @code{"ISO8859-1"}. This field is used to
208 an appropriate font when the user gives a general font specification 208 choose an appropriate font when the user gives a general font
209 such as @samp{-*-courier-medium-r-*-140-*}, i.e. a 14-point upright 209 specification such as @samp{-*-courier-medium-r-*-140-*}, i.e. a
210 medium-weight Courier font. 210 14-point upright medium-weight Courier font.
211 @item dimension 211 @item dimension
212 Number of position codes used to index a character in the character set. 212 Number of position codes used to index a character in the character set.
213 XEmacs/MULE can only handle character sets of dimension 1 or 2. 213 XEmacs/MULE can only handle character sets of dimension 1 or 2.
214 This property defaults to 1. 214 This property defaults to 1.
215 @item chars 215 @item chars
249 font used to display the character set. With @code{graphic} set to 0, 249 font used to display the character set. With @code{graphic} set to 0,
250 position codes 33 through 126 map to font indices 33 through 126; with 250 position codes 33 through 126 map to font indices 33 through 126; with
251 it set to 1, position codes 33 through 126 map to font indices 161 251 it set to 1, position codes 33 through 126 map to font indices 161
252 through 254 (i.e. the same number but with the high bit set). For 252 through 254 (i.e. the same number but with the high bit set). For
253 example, for a font whose registry is ISO8859-1, the left half of the 253 example, for a font whose registry is ISO8859-1, the left half of the
254 font (octets 0x20 - 0x7F) is the @code{ascii} charset, while the 254 font (octets 0x20 - 0x7F) is the @code{ascii} charset, while the right
255 right half (octets 0xA0 - 0xFF) is the @code{latin-1} charset. 255 half (octets 0xA0 - 0xFF) is the @code{latin-iso8859-1} charset.
256 @item ccl-program 256 @item ccl-program
257 A compiled CCL program used to convert a character in this charset into 257 A compiled CCL program used to convert a character in this charset into
258 an index into the font. This is in addition to the @code{graphic} 258 an index into the font. This is in addition to the @code{graphic}
259 property. If a CCL program is defined, the position codes of a 259 property. If a CCL program is defined, the position codes of a
260 character will first be processed according to @code{graphic} and 260 character will first be processed according to @code{graphic} and
398 @subsection Predefined Charsets 398 @subsection Predefined Charsets
399 399
400 The following charsets are predefined in the C code. 400 The following charsets are predefined in the C code.
401 401
402 @example 402 @example
403 Name Doc String Type Fi Gr Dir Registry 403 Name Type Fi Gr Dir Registry
404 -------------------------------------------------------------- 404 --------------------------------------------------------------
405 ascii ASCII 94 B 0 l2r ISO8859-1 405 ascii 94 B 0 l2r ISO8859-1
406 control-1 Control characters 94 0 l2r --- 406 control-1 94 0 l2r ---
407 latin-1 Latin-1 94 A 1 l2r ISO8859-1 407 latin-iso8859-1 94 A 1 l2r ISO8859-1
408 latin-2 Latin-2 96 B 1 l2r ISO8859-2 408 latin-iso8859-2 96 B 1 l2r ISO8859-2
409 latin-3 Latin-3 96 C 1 l2r ISO8859-3 409 latin-iso8859-3 96 C 1 l2r ISO8859-3
410 latin-4 Latin-4 96 D 1 l2r ISO8859-4 410 latin-iso8859-4 96 D 1 l2r ISO8859-4
411 cyrillic Cyrillic 96 L 1 l2r ISO8859-5 411 cyrillic-iso8859-5 96 L 1 l2r ISO8859-5
412 arabic Arabic 96 G 1 r2l ISO8859-6 412 arabic-iso8859-6 96 G 1 r2l ISO8859-6
413 greek Greek 96 F 1 l2r ISO8859-7 413 greek-iso8859-7 96 F 1 l2r ISO8859-7
414 hebrew Hebrew 96 H 1 r2l ISO8859-8 414 hebrew-iso8859-8 96 H 1 r2l ISO8859-8
415 latin-5 Latin-5 96 M 1 l2r ISO8859-9 415 latin-iso8859-9 96 M 1 l2r ISO8859-9
416 thai Thai 96 T 1 l2r TIS620 416 thai-tis620 96 T 1 l2r TIS620
417 japanese-kana Japanese Katakana 94 I 1 l2r JISX0201.1976 417 katakana-jisx0201 94 I 1 l2r JISX0201.1976
418 japanese-roman Japanese Roman 94 J 0 l2r JISX0201.1976 418 latin-jisx0201 94 J 0 l2r JISX0201.1976
419 japanese-old Japanese Old 94x94 @@ 0 l2r JISX0208.1978 419 japanese-jisx0208-1978 94x94 @@ 0 l2r JISX0208.1978
420 chinese-gb Chinese GB 94x94 A 0 l2r GB2312 420 japanese-jisx0208 94x94 B 0 l2r JISX0208.19(83|90)
421 japanese Japanese 94x94 B 0 l2r JISX0208.19(83|90) 421 japanese-jisx0212 94x94 D 0 l2r JISX0212
422 korean Korean 94x94 C 0 l2r KSC5601 422 chinese-gb2312 94x94 A 0 l2r GB2312
423 japanese-2 Japanese Supplement 94x94 D 0 l2r JISX0212 423 chinese-cns11643-1 94x94 G 0 l2r CNS11643.1
424 chinese-cns-1 Chinese CNS Plane 1 94x94 G 0 l2r CNS11643.1 424 chinese-cns11643-2 94x94 H 0 l2r CNS11643.2
425 chinese-cns-2 Chinese CNS Plane 2 94x94 H 0 l2r CNS11643.2 425 chinese-big5-1 94x94 0 0 l2r Big5
426 chinese-big5-1 Chinese Big5 Level 1 94x94 0 0 l2r Big5 426 chinese-big5-2 94x94 1 0 l2r Big5
427 chinese-big5-2 Chinese Big5 Level 2 94x94 1 0 l2r Big5 427 korean-ksc5601 94x94 C 0 l2r KSC5601
428 composite Composite 96x96 0 l2r --- 428 composite 96x96 0 l2r ---
429 @end example 429 @end example
430 430
431 The following charsets are predefined in the Lisp code. 431 The following charsets are predefined in the Lisp code.
432 432
433 @example 433 @example
434 Name Doc String Type Fi Gr Dir Registry 434 Name Type Fi Gr Dir Registry
435 -------------------------------------------------------------- 435 --------------------------------------------------------------
436 arabic-0 Arabic digits 94 2 0 l2r MuleArabic-0 436 arabic-digit 94 2 0 l2r MuleArabic-0
437 arabic-1 one-column Arabic 94 3 0 r2l MuleArabic-1 437 arabic-1-column 94 3 0 r2l MuleArabic-1
438 arabic-2 one-column Arabic 94 4 0 r2l MuleArabic-2 438 arabic-2-column 94 4 0 r2l MuleArabic-2
439 sisheng PinYin-ZhuYin 94 0 0 l2r sisheng_cwnn\| 439 sisheng 94 0 0 l2r sisheng_cwnn\|OMRON_UDC_ZH
440 OMRON_UDC_ZH 440 chinese-cns11643-3 94x94 I 0 l2r CNS11643.1
441 chinese-cns-3 Chinese CNS Plane 3 94x94 I 0 l2r CNS11643.1 441 chinese-cns11643-4 94x94 J 0 l2r CNS11643.1
442 chinese-cns-4 Chinese CNS Plane 4 94x94 J 0 l2r CNS11643.1 442 chinese-cns11643-5 94x94 K 0 l2r CNS11643.1
443 chinese-cns-5 Chinese CNS Plane 5 94x94 K 0 l2r CNS11643.1 443 chinese-cns11643-6 94x94 L 0 l2r CNS11643.1
444 chinese-cns-6 Chinese CNS Plane 6 94x94 L 0 l2r CNS11643.1 444 chinese-cns11643-7 94x94 M 0 l2r CNS11643.1
445 chinese-cns-7 Chinese CNS Plane 7 94x94 M 0 l2r CNS11643.1 445 ethiopic 94x94 2 0 l2r Ethio
446 ethiopic Ethiopic 94x94 2 0 l2r Ethio 446 ascii-r2l 94 B 0 r2l ISO8859-1
447 ascii-r2l Right-to-Left ASCII 94 B 0 r2l ISO8859-1 447 ipa 96 0 1 l2r MuleIPA
448 ipa IPA for Mule 96 0 1 l2r MuleIPA 448 vietnamese-lower 96 1 1 l2r VISCII1.1
449 vietnamese-1 VISCII lower 96 1 1 l2r VISCII1.1 449 vietnamese-upper 96 2 1 l2r VISCII1.1
450 vietnamese-2 VISCII upper 96 2 1 l2r VISCII1.1
451 @end example 450 @end example
452 451
453 For all of the above charsets, the dimension and number of columns are 452 For all of the above charsets, the dimension and number of columns are
454 the same. 453 the same.
455 454
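The Type and Gr columns in the tables above can be read mechanically. The following is a minimal Python sketch, not XEmacs code: `parse_charset_type` and `font_index` are hypothetical helper names introduced only for illustration. It assumes the Type field is either `N` or `NxN` with N equal to 94 or 96, and that `graphic` is 0 or 1 as described under the `graphic` property.

```python
def parse_charset_type(type_field):
    """Split a Type entry such as "94" or "94x94" into (dimension, chars).

    dimension is the number of position codes per character; chars is the
    number of characters per dimension (94 or 96).
    """
    parts = type_field.split("x")
    sizes = {int(p) for p in parts}
    if len(parts) not in (1, 2) or len(sizes) != 1 or sizes - {94, 96}:
        raise ValueError("unsupported charset type: %r" % type_field)
    return len(parts), sizes.pop()


def font_index(position_code, graphic):
    """Map one position code to a font index according to `graphic`.

    graphic 0 leaves the code unchanged (left half of the font);
    graphic 1 sets the high bit (right half of the font).
    """
    if not 32 <= position_code <= 127:
        raise ValueError("position code out of range: %r" % position_code)
    if graphic not in (0, 1):
        raise ValueError("graphic must be 0 or 1")
    return position_code | (graphic << 7)


print(parse_charset_type("94x94"))  # japanese-jisx0208 style entry
print(font_index(33, 0), font_index(33, 1))
```

Under these assumptions a `94x94` entry has dimension 2, and a charset with Gr set to 1 (for example `latin-iso8859-1`) indexes the right half of an ISO8859-1 font.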
472 @defun char-octet ch &optional n 471 @defun char-octet ch &optional n
473 This function returns the octet (i.e. position code) numbered @var{n} 472 This function returns the octet (i.e. position code) numbered @var{n}
474 (should be 0 or 1) of char @var{ch}. @var{n} defaults to 0 if omitted. 473 (should be 0 or 1) of char @var{ch}. @var{n} defaults to 0 if omitted.
475 @end defun 474 @end defun
476 475
477 @defun charsets-in-region start end &optional buffer 476 @defun find-charset-region start end &optional buffer
478 This function returns a list of the charsets in the region between 477 This function returns a list of the charsets in the region between
479 @var{start} and @var{end}. @var{buffer} defaults to the current buffer 478 @var{start} and @var{end}. @var{buffer} defaults to the current buffer
480 if omitted. 479 if omitted.
481 @end defun 480 @end defun
482 481
483 @defun charsets-in-string string 482 @defun find-charset-string string
484 This function returns a list of the charsets in @var{string}. 483 This function returns a list of the charsets in @var{string}.
485 @end defun 484 @end defun
486 485
487 @node Composite Characters 486 @node Composite Characters
488 @section Composite Characters 487 @section Composite Characters
516 @end defun 515 @end defun
517 516
518 @node ISO 2022 517 @node ISO 2022
519 @section ISO 2022 518 @section ISO 2022
520 519
521 This section briefly describes the ISO2022 encoding standard. For more 520 This section briefly describes the ISO 2022 encoding standard. For more
522 thorough understanding, please refer to the original document of 521 thorough understanding, please refer to the original document of ISO
523 ISO2022. 522 2022.
524 523
525 Character sets (@dfn{charsets}) are classified into the following four 524 Character sets (@dfn{charsets}) are classified into the following four
526 categories, according to the number of characters of charset: 525 categories, according to the number of characters of charset:
527 94-charset, 96-charset, 94x94-charset, and 96x96-charset. 526 94-charset, 96-charset, 94x94-charset, and 96x96-charset.
528 527
564 @end example 563 @end example
565 564
566 Usually, in the initial state, G0 is invoked into GL, and G1 565 Usually, in the initial state, G0 is invoked into GL, and G1
567 is invoked into GR. 566 is invoked into GR.
568 567
569 ISO2022 distinguishes 7-bit environments and 8-bit 568 ISO 2022 distinguishes 7-bit environments and 8-bit environments. In
570 environments. In 7-bit environments, only C0 and GL are used. 569 7-bit environments, only C0 and GL are used.
571 570
572 Charset designation is done by escape sequences of the form: 571 Charset designation is done by escape sequences of the form:
573 572
574 @example 573 @example
575 ESC [@var{I}] @var{I} @var{F} 574 ESC [@var{I}] @var{I} @var{F}
587 ) [0x29]: designate to G1 a 94-charset whose final byte is @var{F}. 586 ) [0x29]: designate to G1 a 94-charset whose final byte is @var{F}.
588 * [0x2A]: designate to G2 a 94-charset whose final byte is @var{F}. 587 * [0x2A]: designate to G2 a 94-charset whose final byte is @var{F}.
589 + [0x2B]: designate to G3 a 94-charset whose final byte is @var{F}. 588 + [0x2B]: designate to G3 a 94-charset whose final byte is @var{F}.
590 - [0x2D]: designate to G1 a 96-charset whose final byte is @var{F}. 589 - [0x2D]: designate to G1 a 96-charset whose final byte is @var{F}.
591 . [0x2E]: designate to G2 a 96-charset whose final byte is @var{F}. 590 . [0x2E]: designate to G2 a 96-charset whose final byte is @var{F}.
592 / [0x2F]: designate to G3 a 96-charset whose final byte is 591 / [0x2F]: designate to G3 a 96-charset whose final byte is @var{F}.
593 @var{F}.
594 @end group 592 @end group
595 @end example 593 @end example
596 594
597 The following rule is not allowed in ISO2022 but can be used 595 The following rule is not allowed in ISO 2022 but can be used in Mule.
598 in Mule.
599 596
600 @example 597 @example
601 , [0x2C]: designate to G0 a 96-charset whose final byte is @var{F}. 598 , [0x2C]: designate to G0 a 96-charset whose final byte is @var{F}.
602 @end example 599 @end example
603 600
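The designation rules can be sketched as a small lookup table. This is an illustrative Python sketch, not the Mule implementation: `designation` and its parameters are hypothetical names. It assumes `(` (0x28) designates a 94-charset to G0, as in the `ESC $ ( B` form listed in this section, and that a leading `$` intermediate marks a multibyte (94x94 or 96x96) charset.

```python
ESC = 0x1B

# Intermediate byte for each (register, characters per dimension) pair.
INTERMEDIATE = {
    ("G0", 94): "(", ("G1", 94): ")", ("G2", 94): "*", ("G3", 94): "+",
    ("G1", 96): "-", ("G2", 96): ".", ("G3", 96): "/",
    ("G0", 96): ",",  # accepted by Mule, not allowed by ISO 2022
}


def designation(register, chars, final, multibyte=False, mule_extension=False):
    """Build the escape sequence ESC [I] I F as bytes.

    register: "G0".."G3"; chars: 94 or 96; final: the final byte F as a
    one-character string; multibyte: prepend "$" for 94x94/96x96 charsets.
    """
    key = (register, chars)
    if key not in INTERMEDIATE:
        raise ValueError("no designation for %r" % (key,))
    if key == ("G0", 96) and not mule_extension:
        raise ValueError("designating a 96-charset to G0 is a Mule extension")
    seq = [ESC]
    if multibyte:
        seq.append(ord("$"))
    seq.append(ord(INTERMEDIATE[key]))
    seq.append(ord(final))
    return bytes(seq)


print(designation("G0", 94, "B", multibyte=True))  # JISX0208 to G0
print(designation("G1", 96, "L"))                  # a 96-charset to G1
```

The short form `ESC $ B` (without the `(`) is deliberately not produced here; the sketch covers only the general form.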
611 ESC $ ( B or ESC $ B : designate to G0 JISX0208 608 ESC $ ( B or ESC $ B : designate to G0 JISX0208
612 ESC $ ) C : designate to G1 KSC5601 609 ESC $ ) C : designate to G1 KSC5601
613 @end group 610 @end group
614 @end example 611 @end example
615 612
616 To use a charset designated to G2 or G3, and to use a 613 To use a charset designated to G2 or G3, and to use a charset designated
617 charset designated to G1 in a 7-bit environment, you must 614 to G1 in a 7-bit environment, you must explicitly invoke G1, G2, or G3
618 explicitly invoke G1, G2, or G3 into GL. There are two 615 into GL. There are two types of invocation, Locking Shift (forever) and
619 types of invocation, Locking Shift (forever) and Single 616 Single Shift (one character only).
620 Shift (one character only).
621 617
622 Locking Shift is done as follows: 618 Locking Shift is done as follows:
623 619
624 @example 620 @example
625 SI or LS0: invoke G0 into GL 621 LS0 or SI (0x0F): invoke G0 into GL
626 SO or LS1: invoke G1 into GL 622 LS1 or SO (0x0E): invoke G1 into GL
627 LS2: invoke G2 into GL 623 LS2: invoke G2 into GL
628 LS3: invoke G3 into GL 624 LS3: invoke G3 into GL
629 LS1R: invoke G1 into GR 625 LS1R: invoke G1 into GR
630 LS2R: invoke G2 into GR 626 LS2R: invoke G2 into GR
631 LS3R: invoke G3 into GR 627 LS3R: invoke G3 into GR
644 SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N and 640 SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N and
645 ESC O behave as indicated. The above definitions will not parse 641 ESC O behave as indicated. The above definitions will not parse
646 EUC-encoded text correctly, and it looks like the code in mule-coding.c 642 EUC-encoded text correctly, and it looks like the code in mule-coding.c
647 has similar problems.) 643 has similar problems.)
648 644
649 You may realize that there are a lot of ISO2022-compliant ways of 645 You may realize that there are a lot of ISO-2022-compliant ways of
650 encoding multilingual text. Now, in the world, there exist many coding 646 encoding multilingual text. Now, in the world, there exist many coding
651 systems such as X11's Compound Text, Japanese JUNET code, and so-called 647 systems such as X11's Compound Text, Japanese JUNET code, and so-called
652 EUC (Extended UNIX Code); all of these are variants of ISO2022. 648 EUC (Extended UNIX Code); all of these are variants of ISO 2022.
653 649
654 In Mule, we characterize ISO2022 by the following attributes: 650 In Mule, we characterize ISO 2022 by the following attributes:
655 651
656 @enumerate 652 @enumerate
657 @item 653 @item
658 Initial designation to G0 thru G3. 654 Initial designation to G0 thru G3.
659 @item 655 @item
673 @end enumerate 669 @end enumerate
674 670
675 (The last two are only for Japanese.) 671 (The last two are only for Japanese.)
676 672
677 By specifying these attributes, you can create any variant 673 By specifying these attributes, you can create any variant
678 of ISO2022. 674 of ISO 2022.
679 675
680 Here are several examples: 676 Here are several examples:
681 677
682 @example 678 @example
683 @group 679 @group
740 coding system is used to decode the stream into a series of characters 736 coding system is used to decode the stream into a series of characters
741 (which may be from multiple charsets) when the text is read from a file 737 (which may be from multiple charsets) when the text is read from a file
742 or process, and is used to encode the text back into the same format 738 or process, and is used to encode the text back into the same format
743 when it is written out to a file or process. 739 when it is written out to a file or process.
744 740
745 For example, many ISO2022-compliant coding systems (such as Compound 741 For example, many ISO-2022-compliant coding systems (such as Compound
746 Text, which is used for inter-client data under the X Window System) use 742 Text, which is used for inter-client data under the X Window System) use
747 escape sequences to switch between different charsets -- Japanese Kanji, 743 escape sequences to switch between different charsets -- Japanese Kanji,
748 for example, is invoked with @samp{ESC $ ( B}; ASCII is invoked with 744 for example, is invoked with @samp{ESC $ ( B}; ASCII is invoked with
749 @samp{ESC ( B}; and Cyrillic is invoked with @samp{ESC - L}. See 745 @samp{ESC ( B}; and Cyrillic is invoked with @samp{ESC - L}. See
750 @code{make-coding-system} for more information. 746 @code{make-coding-system} for more information.
776 @table @code 772 @table @code
777 @item nil 773 @item nil
778 @itemx autodetect 774 @itemx autodetect
779 Automatic conversion. XEmacs attempts to detect the coding system used 775 Automatic conversion. XEmacs attempts to detect the coding system used
780 in the file. 776 in the file.
781 @item noconv 777 @item no-conversion
782 No conversion. Use this for binary files and such. On output, graphic 778 No conversion. Use this for binary files and such. On output, graphic
783 characters that are not in ASCII or Latin-1 will be replaced by a 779 characters that are not in ASCII or Latin-1 will be replaced by a
784 @samp{?}. (For a noconv-encoded buffer, these characters will only be 780 @samp{?}. (For a no-conversion-encoded buffer, these characters will
785 present if you explicitly insert them.) 781 only be present if you explicitly insert them.)
786 @item shift-jis 782 @item shift-jis
787 Shift-JIS (a Japanese encoding commonly used in PC operating systems). 783 Shift-JIS (a Japanese encoding commonly used in PC operating systems).
788 @item iso2022 784 @item iso2022
789 Any ISO2022-compliant encoding. Among other things, this includes JIS 785 Any ISO-2022-compliant encoding. Among other things, this includes JIS
790 (the Japanese encoding commonly used for e-mail), EUC (the standard Unix 786 (the Japanese encoding commonly used for e-mail), national variants of
791 encoding for Japanese and other languages), and Compound Text (the 787 EUC (the standard Unix encoding for Japanese and other languages), and
792 encoding used in X11). You can specify more specific information about 788 Compound Text (an encoding used in X11). You can specify more specific
793 the conversion with the @var{flags} argument. 789 information about the conversion with the @var{flags} argument.
794 @item big5 790 @item big5
795 Big5 (the encoding commonly used for Taiwanese). 791 Big5 (the encoding commonly used for Taiwanese).
796 @item ccl 792 @item ccl
797 The conversion is performed using a user-written pseudo-code program. 793 The conversion is performed using a user-written pseudo-code program.
798 CCL (Code Conversion Language) is the name of this pseudo-code. 794 CCL (Code Conversion Language) is the name of this pseudo-code.
880 @end itemize 876 @end itemize
881 877
882 @item force-g0-on-output 878 @item force-g0-on-output
883 @itemx force-g1-on-output 879 @itemx force-g1-on-output
884 @itemx force-g2-on-output 880 @itemx force-g2-on-output
885 @itemx force-g2-on-output 881 @itemx force-g3-on-output
886 If non-@code{nil}, send an explicit designation sequence on output 882 If non-@code{nil}, send an explicit designation sequence on output
887 before using the specified register. 883 before using the specified register.
888 884
889 @item short 885 @item short
890 If non-@code{nil}, use the short forms @samp{ESC $ @@}, @samp{ESC $ A}, 886 If non-@code{nil}, use the short forms @samp{ESC $ @@}, @samp{ESC $ A},
911 @item no-iso6429 907 @item no-iso6429
912 If non-@code{nil}, don't use ISO6429's direction specification. 908 If non-@code{nil}, don't use ISO6429's direction specification.
913 909
914 @item escape-quoted 910 @item escape-quoted
915 If non-nil, literal control characters that are the same as the 911 If non-nil, literal control characters that are the same as the
916 beginning of a recognized ISO2022 or ISO6429 escape sequence (in 912 beginning of a recognized ISO 2022 or ISO 6429 escape sequence (in
917 particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E), SS3 (0x8F), 913 particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E), SS3 (0x8F),
918 and CSI (0x9B)) are ``quoted'' with an escape character so that they can 914 and CSI (0x9B)) are ``quoted'' with an escape character so that they can
919 be properly distinguished from an escape sequence. (Note that doing 915 be properly distinguished from an escape sequence. (Note that doing
920 this results in a non-portable encoding.) This encoding flag is used for 916 this results in a non-portable encoding.) This encoding flag is used for
921 byte-compiled files. Note that ESC is a good choice for a quoting 917 byte-compiled files. Note that ESC is a good choice for a quoting
922 character because there are no escape sequences whose second byte is a 918 character because there are no escape sequences whose second byte is a
923 character from the Control-0 or Control-1 character sets; this is 919 character from the Control-0 or Control-1 character sets; this is
924 explicitly disallowed by the ISO2022 standard. 920 explicitly disallowed by the ISO 2022 standard.
925 921
926 @item input-charset-conversion 922 @item input-charset-conversion
927 A list of conversion specifications, specifying conversion of characters 923 A list of conversion specifications, specifying conversion of characters
928 in one charset to another when decoding is performed. Each 924 in one charset to another when decoding is performed. Each
929 specification is a list of two elements: the source charset, and the 925 specification is a list of two elements: the source charset, and the
1016 1012
1017 @defun decode-coding-region start end coding-system &optional buffer 1013 @defun decode-coding-region start end coding-system &optional buffer
1018 This function decodes the text between @var{start} and @var{end} which 1014 This function decodes the text between @var{start} and @var{end} which
1019 is encoded in @var{coding-system}. This is useful if you've read in 1015 is encoded in @var{coding-system}. This is useful if you've read in
1020 encoded text from a file without decoding it (e.g. you read in a 1016 encoded text from a file without decoding it (e.g. you read in a
1021 JIS-formatted file but used the @code{binary} or @code{noconv} coding 1017 JIS-formatted file but used the @code{binary} or @code{no-conversion} coding
1022 system, so that it shows up as @samp{^[$B!<!+^[(B}). The length of the 1018 system, so that it shows up as @samp{^[$B!<!+^[(B}). The length of the
1023 encoded text is returned. @var{buffer} defaults to the current buffer 1019 encoded text is returned. @var{buffer} defaults to the current buffer
1024 if unspecified. 1020 if unspecified.
1025 @end defun 1021 @end defun
1026 1022
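The difference between undecoded and decoded JIS text can be illustrated outside XEmacs. The sketch below uses Python's built-in `iso2022_jp` codec as a stand-in for a JIS coding system; this is an assumption made for illustration and is not how `decode-coding-region` is implemented. `caret_notation` is a hypothetical helper that mimics how control characters are displayed in a buffer.

```python
# The ten bytes behind the sample shown above: ESC $ B ! < ! + ESC ( B
raw = b"\x1b$B!<!+\x1b(B"

# Reading without conversion keeps one character per byte.
undecoded = raw.decode("latin-1")

# Decoding interprets the escape sequences and the two-byte codes.
decoded = raw.decode("iso2022_jp")


def caret_notation(text):
    """Render control characters below 0x20 as ^X, leaving others as-is."""
    return "".join(
        "^" + chr(ord(c) ^ 0x40) if ord(c) < 0x20 else c for c in text
    )


print(caret_notation(undecoded))
print(len(undecoded), len(decoded))
```

The undecoded form is ten characters long and contains the escape sequences literally, while the decoded form holds two characters from the Japanese charset.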