Mercurial > hg > xemacs-beta
comparison man/lispref/mule.texi @ 398:74fd4e045ea6 r21-2-29
Import from CVS: tag r21-2-29
author | cvs |
---|---|
date | Mon, 13 Aug 2007 11:13:30 +0200 |
parents | cc15677e0335 |
children | 2f8bb876ab1d |
397:f4aeb21a5bad | 398:74fd4e045ea6 |
---|---|
41 ways, although the basic shape will be the same. | 41 ways, although the basic shape will be the same. |
42 | 42 |
43 In some cases, the differences will be significant enough that it is | 43 In some cases, the differences will be significant enough that it is |
44 actually possible to identify two or more distinct shapes that both | 44 actually possible to identify two or more distinct shapes that both |
45 represent the same character. For example, the lowercase letters | 45 represent the same character. For example, the lowercase letters |
46 @samp{a} and @samp{g} each have two distinct possible shapes -- the | 46 @samp{a} and @samp{g} each have two distinct possible shapes---the |
47 @samp{a} can optionally have a curved tail projecting off the top, and | 47 @samp{a} can optionally have a curved tail projecting off the top, and |
48 the @samp{g} can be formed either of two loops, or of one loop and a | 48 the @samp{g} can be formed either of two loops, or of one loop and a |
49 tail hanging off the bottom. Such distinct possible shapes of a | 49 tail hanging off the bottom. Such distinct possible shapes of a |
50 character are called @dfn{glyphs}. The important characteristic of two | 50 character are called @dfn{glyphs}. The important characteristic of two |
51 glyphs making up the same character is that the choice between one or | 51 glyphs making up the same character is that the choice between one or |
52 the other is purely stylistic and has no linguistic effect on a word | 52 the other is purely stylistic and has no linguistic effect on a word |
53 (this is the reason why a capital @samp{A} and lowercase @samp{a} | 53 (this is the reason why a capital @samp{A} and lowercase @samp{a} |
54 are different characters rather than different glyphs -- e.g. | 54 are different characters rather than different glyphs---e.g. |
55 @samp{Aspen} is a city while @samp{aspen} is a kind of tree). | 55 @samp{Aspen} is a city while @samp{aspen} is a kind of tree). |
56 | 56 |
57 Note that @dfn{character} and @dfn{glyph} are used differently | 57 Note that @dfn{character} and @dfn{glyph} are used differently |
58 here than elsewhere in XEmacs. | 58 here than elsewhere in XEmacs. |
59 | 59 |
72 particular ordering. ASCII, for example, places letters in their | 72 particular ordering. ASCII, for example, places letters in their |
73 ``natural'' order, puts uppercase letters before lowercase letters, | 73 ``natural'' order, puts uppercase letters before lowercase letters, |
74 numbers before letters, etc. Note that for many of the Asian character | 74 numbers before letters, etc. Note that for many of the Asian character |
75 sets, there is no natural ordering of the characters. The actual | 75 sets, there is no natural ordering of the characters. The actual |
76 orderings are based on one or more salient characteristics, of which | 76 orderings are based on one or more salient characteristics, of which |
77 there are many to choose from -- e.g. number of strokes, common | 77 there are many to choose from---e.g. number of strokes, common |
78 radicals, phonetic ordering, etc. | 78 radicals, phonetic ordering, etc. |
79 | 79 |
80 The set of numbers assigned to any particular character are called | 80 The set of numbers assigned to any particular character are called |
81 the character's @dfn{position codes}. The number of position codes | 81 the character's @dfn{position codes}. The number of position codes |
82 required to index a particular character in a character set is called | 82 required to index a particular character in a character set is called |
103 position codes for the characters in that character set could be used | 103 position codes for the characters in that character set could be used |
104 directly. (This is the case with ASCII, and as a result, most people do | 104 directly. (This is the case with ASCII, and as a result, most people do |
105 not understand the difference between a character set and an encoding.) | 105 not understand the difference between a character set and an encoding.) |
106 This is not possible, however, if more than one character set is to be | 106 This is not possible, however, if more than one character set is to be |
107 used in the encoding. For example, printed Japanese text typically | 107 used in the encoding. For example, printed Japanese text typically |
108 requires characters from multiple character sets -- ASCII, JISX0208, and | 108 requires characters from multiple character sets---ASCII, JISX0208, and |
109 JISX0212, to be specific. Each of these is indexed using one or more | 109 JISX0212, to be specific. Each of these is indexed using one or more |
110 position codes in the range 33 through 126, so the position codes could | 110 position codes in the range 33 through 126, so the position codes could |
111 not be used directly or there would be no way to tell which character | 111 not be used directly or there would be no way to tell which character |
112 was meant. Different Japanese encodings handle this differently -- JIS | 112 was meant. Different Japanese encodings handle this differently---JIS |
113 uses special escape characters to denote different character sets; EUC | 113 uses special escape characters to denote different character sets; EUC |
114 sets the high bit of the position codes for JISX0208 and JISX0212, and | 114 sets the high bit of the position codes for JISX0208 and JISX0212, and |
115 puts a special extra byte before each JISX0212 character; etc. (JIS, | 115 puts a special extra byte before each JISX0212 character; etc. (JIS, |
116 EUC, and most of the other encodings you will encounter are 7-bit or | 116 EUC, and most of the other encodings you will encounter are 7-bit or |
117 8-bit encodings. There is one common 16-bit encoding, which is Unicode; | 117 8-bit encodings. There is one common 16-bit encoding, which is Unicode; |
364 This function returns the number of display columns per character (in | 364 This function returns the number of display columns per character (in |
365 TTY mode) of @var{charset}. | 365 TTY mode) of @var{charset}. |
366 @end defun | 366 @end defun |
367 | 367 |
368 @defun charset-direction charset | 368 @defun charset-direction charset |
369 This function returns the display direction of @var{charset} -- either | 369 This function returns the display direction of @var{charset}---either |
370 @code{l2r} or @code{r2l}. | 370 @code{l2r} or @code{r2l}. |
371 @end defun | 371 @end defun |
372 | 372 |
373 @defun charset-final charset | 373 @defun charset-final charset |
374 This function returns the final byte of the ISO 2022 escape sequence | 374 This function returns the final byte of the ISO 2022 escape sequence |
553 4 areas: C0, GL, C1, and GR. GL and GR are the areas into which a | 553 4 areas: C0, GL, C1, and GR. GL and GR are the areas into which a |
554 register of charset can be invoked. | 554 register of charset can be invoked. |
555 | 555 |
556 @example | 556 @example |
557 @group | 557 @group |
558 C0: 0x00 - 0x1F | 558 C0: 0x00 - 0x1F |
559 GL: 0x20 - 0x7F | 559 GL: 0x20 - 0x7F |
560 C1: 0x80 - 0x9F | 560 C1: 0x80 - 0x9F |
561 GR: 0xA0 - 0xFF | 561 GR: 0xA0 - 0xFF |
562 @end group | 562 @end group |
563 @end example | 563 @end example |
564 | 564 |
565 Usually, in the initial state, G0 is invoked into GL, and G1 | 565 Usually, in the initial state, G0 is invoked into GL, and G1 |
566 is invoked into GR. | 566 is invoked into GR. |
569 7-bit environments, only C0 and GL are used. | 569 7-bit environments, only C0 and GL are used. |
570 | 570 |
571 Charset designation is done by escape sequences of the form: | 571 Charset designation is done by escape sequences of the form: |
572 | 572 |
573 @example | 573 @example |
574 ESC [@var{I}] @var{I} @var{F} | 574 ESC [@var{I}] @var{I} @var{F} |
575 @end example | 575 @end example |
576 | 576 |
577 where @var{I} is an intermediate character in the range 0x20 - 0x2F, and | 577 where @var{I} is an intermediate character in the range 0x20 - 0x2F, and |
578 @var{F} is the final character identifying this charset. | 578 @var{F} is the final character identifying this charset. |
579 | 579 |
580 The meanings of the intermediate characters are: | 580 The meanings of the intermediate characters are: |
581 | 581 |
582 @example | 582 @example |
583 @group | 583 @group |
584 $ [0x24]: indicate charset of dimension 2 (94x94 or 96x96). | 584 $ [0x24]: indicate charset of dimension 2 (94x94 or 96x96). |
585 ( [0x28]: designate to G0 a 94-charset whose final byte is @var{F}. | 585 ( [0x28]: designate to G0 a 94-charset whose final byte is @var{F}. |
586 ) [0x29]: designate to G1 a 94-charset whose final byte is @var{F}. | 586 ) [0x29]: designate to G1 a 94-charset whose final byte is @var{F}. |
587 * [0x2A]: designate to G2 a 94-charset whose final byte is @var{F}. | 587 * [0x2A]: designate to G2 a 94-charset whose final byte is @var{F}. |
588 + [0x2B]: designate to G3 a 94-charset whose final byte is @var{F}. | 588 + [0x2B]: designate to G3 a 94-charset whose final byte is @var{F}. |
589 - [0x2D]: designate to G1 a 96-charset whose final byte is @var{F}. | 589 - [0x2D]: designate to G1 a 96-charset whose final byte is @var{F}. |
590 . [0x2E]: designate to G2 a 96-charset whose final byte is @var{F}. | 590 . [0x2E]: designate to G2 a 96-charset whose final byte is @var{F}. |
591 / [0x2F]: designate to G3 a 96-charset whose final byte is @var{F}. | 591 / [0x2F]: designate to G3 a 96-charset whose final byte is @var{F}. |
592 @end group | 592 @end group |
593 @end example | 593 @end example |
594 | 594 |
595 The following rule is not allowed in ISO 2022 but can be used in Mule. | 595 The following rule is not allowed in ISO 2022 but can be used in Mule. |
596 | 596 |
597 @example | 597 @example |
598 , [0x2C]: designate to G0 a 96-charset whose final byte is @var{F}. | 598 , [0x2C]: designate to G0 a 96-charset whose final byte is @var{F}. |
599 @end example | 599 @end example |
600 | 600 |
601 Here are examples of designations: | 601 Here are examples of designations: |
602 | 602 |
603 @example | 603 @example |
604 @group | 604 @group |
605 ESC ( B : designate to G0 ASCII | 605 ESC ( B : designate to G0 ASCII |
606 ESC - A : designate to G1 Latin-1 | 606 ESC - A : designate to G1 Latin-1 |
607 ESC $ ( A or ESC $ A : designate to G0 GB2312 | 607 ESC $ ( A or ESC $ A : designate to G0 GB2312 |
608 ESC $ ( B or ESC $ B : designate to G0 JISX0208 | 608 ESC $ ( B or ESC $ B : designate to G0 JISX0208 |
609 ESC $ ) C : designate to G1 KSC5601 | 609 ESC $ ) C : designate to G1 KSC5601 |
610 @end group | 610 @end group |
611 @end example | 611 @end example |
612 | 612 |
613 To use a charset designated to G2 or G3, and to use a charset designated | 613 To use a charset designated to G2 or G3, and to use a charset designated |
614 to G1 in a 7-bit environment, you must explicitly invoke G1, G2, or G3 | 614 to G1 in a 7-bit environment, you must explicitly invoke G1, G2, or G3 |
616 Single Shift (one character only). | 616 Single Shift (one character only). |
617 | 617 |
618 Locking Shift is done as follows: | 618 Locking Shift is done as follows: |
619 | 619 |
620 @example | 620 @example |
621 LS0 or SI (0x0F): invoke G0 into GL | 621 LS0 or SI (0x0F): invoke G0 into GL |
622 LS1 or SO (0x0E): invoke G1 into GL | 622 LS1 or SO (0x0E): invoke G1 into GL |
623 LS2: invoke G2 into GL | 623 LS2: invoke G2 into GL |
624 LS3: invoke G3 into GL | 624 LS3: invoke G3 into GL |
625 LS1R: invoke G1 into GR | 625 LS1R: invoke G1 into GR |
626 LS2R: invoke G2 into GR | 626 LS2R: invoke G2 into GR |
627 LS3R: invoke G3 into GR | 627 LS3R: invoke G3 into GR |
628 @end example | 628 @end example |
629 | 629 |
630 Single Shift is done as follows: | 630 Single Shift is done as follows: |
631 | 631 |
632 @example | 632 @example |
633 @group | 633 @group |
634 SS2 or ESC N: invoke G2 into GL | 634 SS2 or ESC N: invoke G2 into GL |
635 SS3 or ESC O: invoke G3 into GL | 635 SS3 or ESC O: invoke G3 into GL |
636 @end group | 636 @end group |
637 @end example | 637 @end example |
638 | 638 |
639 (#### Ben says: I think the above is slightly incorrect. It appears that | 639 (#### Ben says: I think the above is slightly incorrect. It appears that |
640 SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N and | 640 SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N and |
676 Here are several examples: | 676 Here are several examples: |
677 | 677 |
678 @example | 678 @example |
679 @group | 679 @group |
680 junet -- Coding system used in JUNET. | 680 junet -- Coding system used in JUNET. |
681 1. G0 <- ASCII, G1..3 <- never used | 681 1. G0 <- ASCII, G1..3 <- never used |
682 2. Yes. | 682 2. Yes. |
683 3. Yes. | 683 3. Yes. |
684 4. Yes. | 684 4. Yes. |
685 5. 7-bit environment | 685 5. 7-bit environment |
686 6. No. | 686 6. No. |
687 7. Use ASCII | 687 7. Use ASCII |
688 8. Use JISX0208-1983 | 688 8. Use JISX0208-1983 |
689 @end group | 689 @end group |
690 | 690 |
691 @group | 691 @group |
692 ctext -- Compound Text | 692 ctext -- Compound Text |
693 1. G0 <- ASCII, G1 <- Latin-1, G2,3 <- never used | 693 1. G0 <- ASCII, G1 <- Latin-1, G2,3 <- never used |
694 2. No. | 694 2. No. |
695 3. No. | 695 3. No. |
696 4. Yes. | 696 4. Yes. |
697 5. 8-bit environment | 697 5. 8-bit environment |
698 6. No. | 698 6. No. |
699 7. Use ASCII | 699 7. Use ASCII |
700 8. Use JISX0208-1983 | 700 8. Use JISX0208-1983 |
701 @end group | 701 @end group |
702 | 702 |
703 @group | 703 @group |
704 euc-china -- Chinese EUC. Although many people call this | 704 euc-china -- Chinese EUC. Although many people call this |
705 "GB encoding", the name may cause misunderstanding. | 705 "GB encoding", the name may cause misunderstanding. |
706 1. G0 <- ASCII, G1 <- GB2312, G2,3 <- never used | 706 1. G0 <- ASCII, G1 <- GB2312, G2,3 <- never used |
707 2. No. | 707 2. No. |
708 3. Yes. | 708 3. Yes. |
709 4. Yes. | 709 4. Yes. |
710 5. 8-bit environment | 710 5. 8-bit environment |
711 6. No. | 711 6. No. |
712 7. Use ASCII | 712 7. Use ASCII |
713 8. Use JISX0208-1983 | 713 8. Use JISX0208-1983 |
714 @end group | 714 @end group |
715 | 715 |
716 @group | 716 @group |
717 korean-mail -- Coding system used in Korean network. | 717 korean-mail -- Coding system used in Korean network. |
718 1. G0 <- ASCII, G1 <- KSC5601, G2,3 <- never used | 718 1. G0 <- ASCII, G1 <- KSC5601, G2,3 <- never used |
719 2. No. | 719 2. No. |
720 3. Yes. | 720 3. Yes. |
721 4. Yes. | 721 4. Yes. |
722 5. 7-bit environment | 722 5. 7-bit environment |
723 6. Yes. | 723 6. Yes. |
724 7. No. | 724 7. No. |
725 8. No. | 725 8. No. |
726 @end group | 726 @end group |
727 @end example | 727 @end example |
728 | 728 |
729 Mule creates all these coding systems by default. | 729 Mule creates all these coding systems by default. |
730 | 730 |
738 or process, and is used to encode the text back into the same format | 738 or process, and is used to encode the text back into the same format |
739 when it is written out to a file or process. | 739 when it is written out to a file or process. |
740 | 740 |
741 For example, many ISO-2022-compliant coding systems (such as Compound | 741 For example, many ISO-2022-compliant coding systems (such as Compound |
742 Text, which is used for inter-client data under the X Window System) use | 742 Text, which is used for inter-client data under the X Window System) use |
743 escape sequences to switch between different charsets -- Japanese Kanji, | 743 escape sequences to switch between different charsets---Japanese Kanji, |
744 for example, is invoked with @samp{ESC $ ( B}; ASCII is invoked with | 744 for example, is invoked with @samp{ESC $ ( B}; ASCII is invoked with |
745 @samp{ESC ( B}; and Cyrillic is invoked with @samp{ESC - L}. See | 745 @samp{ESC ( B}; and Cyrillic is invoked with @samp{ESC - L}. See |
746 @code{make-coding-system} for more information. | 746 @code{make-coding-system} for more information. |
747 | 747 |
748 Coding systems are normally identified using a symbol, and the symbol is | 748 Coding systems are normally identified using a symbol, and the symbol is |
1091 @defun encode-big5-char ch | 1091 @defun encode-big5-char ch |
1092 This function encodes the Big5 character @var{ch} to BIG5 | 1092 This function encodes the Big5 character @var{ch} to BIG5 |
1093 coding-system. The corresponding character code in Big5 is returned. | 1093 coding-system. The corresponding character code in Big5 is returned. |
1094 @end defun | 1094 @end defun |
1095 | 1095 |
1096 @node CCL | 1096 @node CCL, Category Tables, Coding Systems, MULE |
1097 @section CCL | 1097 @section CCL |
1098 | 1098 |
1099 @defun execute-ccl-program ccl-program status | 1099 CCL (Code Conversion Language) is a simple structured programming |
1100 This function executes @var{ccl-program} with registers initialized by | 1100 language designed for character coding conversions. A CCL program is |
1101 compiled to CCL code (represented by a vector of integers) and executed | |
1102 by the CCL interpreter embedded in Emacs. The CCL interpreter | |
1103 implements a virtual machine with 8 registers called @code{r0}, ..., | |
1104 @code{r7}, a number of control structures, and some I/O operators. Take | |
1105 care when using registers @code{r0} (used in implicit @dfn{set} | |
1106 statements) and especially @code{r7} (used internally by several | |
1107 statements and operations, especially for multiple return values and I/O | |
1108 operations). | |
1109 | |
1110 CCL is used for code conversion during process I/O and file I/O for | |
1111 non-ISO2022 coding systems. (It is the only way for a user to specify a | |
1112 code conversion function.) It is also used for calculating the code | |
1113 point of an X11 font from a character code. However, since CCL is | |
1114 designed as a powerful programming language, it can be used for more | |
1115 generic calculation where efficiency is demanded. A combination of | |
1116 three or more arithmetic operations can be calculated faster by CCL than | |
1117 by Emacs Lisp. | |
1118 | |
1119 @strong{Warning:} The code in @file{src/mule-ccl.c} and | |
1120 @file{$packages/lisp/mule-base/mule-ccl.el} is the definitive | |
1121 description of CCL's semantics. The previous version of this section | |
1122 contained several typos and obsolete names left from earlier versions of | |
1123 MULE, and many may remain. (I am not an experienced CCL programmer; the | |
1124 few who know CCL well find writing English painful.) | |
1125 | |
1126 A CCL program transforms an input data stream into an output data | |
1127 stream. The input stream, held in a buffer of constant bytes, is left | |
1128 unchanged. The buffer may be filled by an external input operation, | |
1129 taken from an Emacs buffer, or taken from a Lisp string. The output | |
1130 buffer is a dynamic array of bytes, which can be written by an external | |
1131 output operation, inserted into an Emacs buffer, or returned as a Lisp | |
1132 string. | |
1133 | |
1134 A CCL program is a (Lisp) list containing two or three members. The | |
1135 first member is the @dfn{buffer magnification}, which indicates the | |
1136 required minimum size of the output buffer as a multiple of the input | |
1137 buffer. It is followed by the @dfn{main block} which executes while | |
1138 there is input remaining, and an optional @dfn{EOF block} which is | |
1139 executed when the input is exhausted. Both the main block and the EOF | |
1140 block are CCL blocks. | |
1141 | |
1142 A @dfn{CCL block} is either a CCL statement or a list of CCL statements. | |
1143 A @dfn{CCL statement} is either a @dfn{set statement} (either an integer | |
1144 or an @dfn{assignment}, which is a list of a register to receive the | |
1145 assignment, an assignment operator, and an expression) or a @dfn{control | |
1146 statement} (a list starting with a keyword, whose allowable syntax | |
1147 depends on the keyword). | |
1148 | |
1149 @menu | |
1150 * CCL Syntax:: CCL program syntax in BNF notation. | |
1151 * CCL Statements:: Semantics of CCL statements. | |
1152 * CCL Expressions:: Operators and expressions in CCL. | |
1153 * Calling CCL:: Running CCL programs. | |
1154 * CCL Examples:: The encoding functions for Big5 and KOI-8. | |
1155 @end menu | |
1156 | |
1157 @node CCL Syntax, CCL Statements, CCL, CCL | |
1158 @comment Node, Next, Previous, Up | |
1159 @subsection CCL Syntax | |
1160 | |
1161 The full syntax of a CCL program in BNF notation: | |
1162 | |
1163 @format | |
1164 CCL_PROGRAM := | |
1165 (BUFFER_MAGNIFICATION | |
1166 CCL_MAIN_BLOCK | |
1167 [ CCL_EOF_BLOCK ]) | |
1168 | |
1169 BUFFER_MAGNIFICATION := integer | |
1170 CCL_MAIN_BLOCK := CCL_BLOCK | |
1171 CCL_EOF_BLOCK := CCL_BLOCK | |
1172 | |
1173 CCL_BLOCK := | |
1174 STATEMENT | (STATEMENT [STATEMENT ...]) | |
1175 STATEMENT := | |
1176 SET | IF | BRANCH | LOOP | REPEAT | BREAK | READ | WRITE | |
1177 | CALL | END | |
1178 | |
1179 SET := | |
1180 (REG = EXPRESSION) | |
1181 | (REG ASSIGNMENT_OPERATOR EXPRESSION) | |
1182 | integer | |
1183 | |
1184 EXPRESSION := ARG | (EXPRESSION OPERATOR ARG) | |
1185 | |
1186 IF := (if EXPRESSION CCL_BLOCK [CCL_BLOCK]) | |
1187 BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...]) | |
1188 LOOP := (loop STATEMENT [STATEMENT ...]) | |
1189 BREAK := (break) | |
1190 REPEAT := | |
1191 (repeat) | |
1192 | (write-repeat [REG | integer | string]) | |
1193 | (write-read-repeat REG [integer | ARRAY]) | |
1194 READ := | |
1195 (read REG ...) | |
1196 | (read-if (REG OPERATOR ARG) CCL_BLOCK CCL_BLOCK) | |
1197 | (read-branch REG CCL_BLOCK [CCL_BLOCK ...]) | |
1198 WRITE := | |
1199 (write REG ...) | |
1200 | (write EXPRESSION) | |
1201 | (write integer) | (write string) | (write REG ARRAY) | |
1202 | string | |
1203 CALL := (call ccl-program-name) | |
1204 END := (end) | |
1205 | |
1206 REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 | |
1207 ARG := REG | integer | |
1208 OPERATOR := | |
1209 + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | // | |
1210 | < | > | == | <= | >= | != | de-sjis | en-sjis | |
1211 ASSIGNMENT_OPERATOR := | |
1212 += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>= | |
1213 ARRAY := '[' integer ... ']' | |
1214 @end format | |
1215 | |
1216 @node CCL Statements, CCL Expressions, CCL Syntax, CCL | |
1217 @comment Node, Next, Previous, Up | |
1218 @subsection CCL Statements | |
1219 | |
1220 The Emacs Code Conversion Language provides the following statement | |
1221 types: @dfn{set}, @dfn{if}, @dfn{branch}, @dfn{loop}, @dfn{repeat}, | |
1222 @dfn{break}, @dfn{read}, @dfn{write}, @dfn{call}, and @dfn{end}. | |
1223 | |
1224 @heading Set statement: | |
1225 | |
1226 The @dfn{set} statement has three variants with the syntaxes | |
1227 @samp{(@var{reg} = @var{expression})}, | |
1228 @samp{(@var{reg} @var{assignment_operator} @var{expression})}, and | |
1229 @samp{@var{integer}}. The assignment operator variation of the | |
1230 @dfn{set} statement works the same way as the corresponding C expression | |
1231 statement does. The assignment operators are @code{+=}, @code{-=}, | |
1232 @code{*=}, @code{/=}, @code{%=}, @code{&=}, @code{|=}, @code{^=}, | |
1233 @code{<<=}, and @code{>>=}, and they have the same meanings as in C. A | |
1234 "naked integer" @var{integer} is equivalent to a @dfn{set} statement of | |
1235 the form @code{(r0 = @var{integer})}. | |
1236 | |
1237 @heading I/O statements: | |
1238 | |
1239 The @dfn{read} statement takes one or more registers as arguments. It | |
1240 reads one byte (a C char) from the input into each register in turn. | |
1241 | |
1242 The @dfn{write} statement takes several forms. In the form @samp{(write @var{reg} | |
1243 ...)} it takes one or more registers as arguments and writes each in | |
1244 turn to the output. The integer in a register (interpreted as an | |
1245 Emchar) is encoded to multibyte form (i.e., Bufbytes) and written to the | |
1246 current output buffer. If it is less than 256, it is written as is. | |
1247 The forms @samp{(write @var{expression})} and @samp{(write | |
1248 @var{integer})} are treated analogously. The form @samp{(write | |
1249 @var{string})} writes the constant string to the output. A | |
1250 "naked string" @samp{@var{string}} is equivalent to the statement @samp{(write | |
1251 @var{string})}. The form @samp{(write @var{reg} @var{array})} writes | |
1252 the @var{reg}th element of the @var{array} to the output. | |
1253 | |
1254 @heading Conditional statements: | |
1255 | |
1256 The @dfn{if} statement takes an @var{expression}, a @var{CCL block}, and | |
1257 an optional @var{second CCL block} as arguments. If the | |
1258 @var{expression} evaluates to non-zero, the first @var{CCL block} is | |
1259 executed. Otherwise, if there is a @var{second CCL block}, it is | |
1260 executed. | |
1261 | |
1262 The @dfn{read-if} variant of the @dfn{if} statement takes an | |
1263 @var{expression}, a @var{CCL block}, and an optional @var{second CCL | |
1264 block} as arguments. The @var{expression} must have the form | |
1265 @code{(@var{reg} @var{operator} @var{operand})} (where @var{operand} is | |
1266 a register or an integer). The @code{read-if} statement first reads | |
1267 from the input into the first register operand in the @var{expression}, | |
1268 then conditionally executes a CCL block just as the @code{if} statement | |
1269 does. | |
1270 | |
1271 The @dfn{branch} statement takes an @var{expression} and one or more CCL | |
1272 blocks as arguments. The CCL blocks are treated as a zero-indexed | |
1273 array, and the @code{branch} statement uses the @var{expression} as the | |
1274 index of the CCL block to execute. Null CCL blocks may be used as | |
1275 no-ops, continuing execution with the statement following the | |
1276 @code{branch} statement in the containing CCL block. Out-of-range | |
1277 values for the @var{expression} are also treated as no-ops. | |
1278 | |
1279 The @dfn{read-branch} variant of the @dfn{branch} statement takes a | |
1280 @var{register} and one or more CCL blocks as arguments. The | |
1281 @code{read-branch} statement first reads from | |
1282 the input into the @var{register}, then conditionally executes a CCL | |
1283 block just as the @code{branch} statement does. | |
1284 | |
1285 @heading Loop control statements: | |
1286 | |
1287 The @dfn{loop} statement creates a block with an implied jump from the | |
1288 end of the block back to its head. The loop is exited on a @code{break} | |
1289 statement, and continued without executing the tail by a @code{repeat} | |
1290 statement. | |
1291 | |
1292 The @dfn{break} statement, written @samp{(break)}, terminates the | |
1293 current loop and continues with the next statement in the current | |
1294 block. | |
1295 | |
1296 The @dfn{repeat} statement has three variants, @code{repeat}, | |
1297 @code{write-repeat}, and @code{write-read-repeat}. Each continues the | |
1298 current loop from its head, possibly after performing I/O. | |
1299 @code{repeat} takes no arguments and does no I/O before jumping. | |
1300 @code{write-repeat} takes a single argument (a register, an | |
1301 integer, or a string), writes it to the output, then jumps. | |
1302 @code{write-read-repeat} takes one or two arguments. The first must | |
1303 be a register. The second may be an integer or an array; if absent, it | |
1304 is implicitly set to the first (register) argument. | |
1305 @code{write-read-repeat} writes its second argument to the output, then | |
1306 reads from the input into the register, and finally jumps. See the | |
1307 @code{write} and @code{read} statements for the semantics of the I/O | |
1308 operations for each type of argument. | |
1309 | |
1310 @heading Other control statements: | |
1311 | |
1312 The @dfn{call} statement, written @samp{(call @var{ccl-program-name})}, | |
1313 executes a CCL program as a subroutine. It does not return a value to | |
1314 the caller, but can modify the register status. | |
1315 | |
1316 The @dfn{end} statement, written @samp{(end)}, terminates the CCL | |
1317 program successfully, and returns to the caller (which may be a CCL | |
1318 program). It does not alter the status of the registers. | |
1319 | |
1320 @node CCL Expressions, Calling CCL, CCL Statements, CCL | |
1321 @comment Node, Next, Previous, Up | |
1322 @subsection CCL Expressions | |
1323 | |
1324 CCL, unlike Lisp, uses infix expressions. The simplest CCL expressions | |
1325 consist of a single @var{operand}, either a register (one of @code{r0}, | |
1326 ..., @code{r7}) or an integer. Complex expressions are lists of the | |
1327 form @code{( @var{expression} @var{operator} @var{operand} )}. Unlike | |
1328 C, assignments are not expressions. | |
1329 | |
1330 In the following table, @var{X} is the target register for a @dfn{set}. | |
1331 In subexpressions, this is implicitly @code{r7}. This means that | |
1332 @code{>8}, @code{//}, @code{de-sjis}, and @code{en-sjis} cannot be used | |
1333 freely in subexpressions, since they return parts of their values in | |
1334 @code{r7}. @var{Y} may be an expression, register, or integer, while | |
1335 @var{Z} must be a register or an integer. | |
1336 | |
1337 @multitable @columnfractions .22 .14 .09 .55 | |
1338 @item Name @tab Operator @tab Code @tab C-like Description | |
1339 @item CCL_PLUS @tab @code{+} @tab 0x00 @tab X = Y + Z | |
1340 @item CCL_MINUS @tab @code{-} @tab 0x01 @tab X = Y - Z | |
1341 @item CCL_MUL @tab @code{*} @tab 0x02 @tab X = Y * Z | |
1342 @item CCL_DIV @tab @code{/} @tab 0x03 @tab X = Y / Z | |
1343 @item CCL_MOD @tab @code{%} @tab 0x04 @tab X = Y % Z | |
1344 @item CCL_AND @tab @code{&} @tab 0x05 @tab X = Y & Z | |
1345 @item CCL_OR @tab @code{|} @tab 0x06 @tab X = Y | Z | |
1346 @item CCL_XOR @tab @code{^} @tab 0x07 @tab X = Y ^ Z | |
1347 @item CCL_LSH @tab @code{<<} @tab 0x08 @tab X = Y << Z | |
1348 @item CCL_RSH @tab @code{>>} @tab 0x09 @tab X = Y >> Z | |
1349 @item CCL_LSH8 @tab @code{<8} @tab 0x0A @tab X = (Y << 8) | Z | |
1350 @item CCL_RSH8 @tab @code{>8} @tab 0x0B @tab X = Y >> 8, r[7] = Y & 0xFF | |
1351 @item CCL_DIVMOD @tab @code{//} @tab 0x0C @tab X = Y / Z, r[7] = Y % Z | |
1352 @item CCL_LS @tab @code{<} @tab 0x10 @tab X = (X < Y) | |
1353 @item CCL_GT @tab @code{>} @tab 0x11 @tab X = (X > Y) | |
1354 @item CCL_EQ @tab @code{==} @tab 0x12 @tab X = (X == Y) | |
1355 @item CCL_LE @tab @code{<=} @tab 0x13 @tab X = (X <= Y) | |
1356 @item CCL_GE @tab @code{>=} @tab 0x14 @tab X = (X >= Y) | |
1357 @item CCL_NE @tab @code{!=} @tab 0x15 @tab X = (X != Y) | |
1358 @item CCL_ENCODE_SJIS @tab @code{en-sjis} @tab 0x16 @tab X = HIGHER_BYTE (SJIS (Y, Z)) | |
1359 @item @tab @tab @tab r[7] = LOWER_BYTE (SJIS (Y, Z)) | |
1360 @item CCL_DECODE_SJIS @tab @code{de-sjis} @tab 0x17 @tab X = HIGHER_BYTE (DE-SJIS (Y, Z)) | |
1361 @item @tab @tab @tab r[7] = LOWER_BYTE (DE-SJIS (Y, Z)) | |
1362 @end multitable | |
1363 | |
1364 The CCL operators are as in C, with the addition of CCL_LSH8, CCL_RSH8, | |
1365 CCL_DIVMOD, CCL_ENCODE_SJIS, and CCL_DECODE_SJIS. The CCL_ENCODE_SJIS | |
1366 and CCL_DECODE_SJIS operators treat their first and second operands as the high and | |
1367 low bytes of a two-byte character code. (SJIS stands for Shift JIS, an | |
1368 encoding of Japanese characters used by Microsoft. CCL_ENCODE_SJIS is a | |
1369 complicated transformation of the Japanese standard JIS encoding to | |
1370 Shift JIS. CCL_DECODE_SJIS is its inverse.) It is somewhat odd to | |
1371 represent the SJIS operations in infix form. | |
1372 | |
1373 @node Calling CCL, CCL Examples, CCL Expressions, CCL | |
1374 @comment Node, Next, Previous, Up | |
1375 @subsection Calling CCL | |
1376 | |
1377 CCL programs are called automatically during Emacs buffer I/O when the | |
1378 external representation has a coding system type of @code{shift-jis}, | |
1379 @code{big5}, or @code{ccl}. The program is specified by the coding | |
1380 system (@pxref{Coding Systems}). You can also call CCL programs from | |
1381 other CCL programs, and from Lisp using these functions: | |
1382 | |
1383 @defun ccl-execute ccl-program status | |
1384 Execute @var{ccl-program} with registers initialized by | |
1101 @var{status}. @var{ccl-program} is a vector of compiled CCL code | 1385 @var{status}. @var{ccl-program} is a vector of compiled CCL code |
1102 created by @code{ccl-compile}. @var{status} must be a vector of nine | 1386 created by @code{ccl-compile}. It is an error for the program to try to |
1387 execute a CCL I/O command. @var{status} must be a vector of nine | |
1103 values, specifying the initial value for the R0, R1 .. R7 registers and | 1388 values, specifying the initial value for the R0, R1 .. R7 registers and |
1104 for the instruction counter IC. A @code{nil} value for a register | 1389 for the instruction counter IC. A @code{nil} value for a register |
1105 initializer causes the register to be set to 0. A @code{nil} value for | 1390 initializer causes the register to be set to 0. A @code{nil} value for |
1106 the IC initializer causes execution to start at the beginning of the | 1391 the IC initializer causes execution to start at the beginning of the |
1107 program. When the program is done, @var{status} is modified (by | 1392 program. When the program is done, @var{status} is modified (by |
1108 side-effect) to contain the ending values for the corresponding | 1393 side-effect) to contain the ending values for the corresponding |
1109 registers and IC. | 1394 registers and IC. |
1110 @end defun | 1395 @end defun |
1111 | 1396 |
1112 @defun execute-ccl-program-string ccl-program status str | 1397 @defun ccl-execute-on-string ccl-program status str &optional continue |
1113 This function executes @var{ccl-program} with initial @var{status} on | 1398 Execute @var{ccl-program} with initial @var{status} on |
1114 @var{string}. @var{ccl-program} is a vector of compiled CCL code | 1399 @var{string}. @var{ccl-program} is a vector of compiled CCL code |
1115 created by @code{ccl-compile}. @var{status} must be a vector of nine | 1400 created by @code{ccl-compile}. @var{status} must be a vector of nine |
1116 values, specifying the initial value for the R0, R1 .. R7 registers and | 1401 values, specifying the initial value for the R0, R1 .. R7 registers and |
1117 for the instruction counter IC. A @code{nil} value for a register | 1402 for the instruction counter IC. A @code{nil} value for a register |
1118 initializer causes the register to be set to 0. A @code{nil} value for | 1403 initializer causes the register to be set to 0. A @code{nil} value for |
1119 the IC initializer causes execution to start at the beginning of the | 1404 the IC initializer causes execution to start at the beginning of the |
1120 program. When the program is done, @var{status} is modified (by | 1405 program. An optional fourth argument @var{continue}, if non-nil, causes |
1406 the IC to | |
1407 remain on the unsatisfied read operation if the program terminates due | |
1408 to exhaustion of the input buffer. Otherwise the IC is set to the end | |
1409 of the program. When the program is done, @var{status} is modified (by | |
1121 side-effect) to contain the ending values for the corresponding | 1410 side-effect) to contain the ending values for the corresponding |
1122 registers and IC. Returns the resulting string. | 1411 registers and IC. Returns the resulting string. |
1123 @end defun | 1412 @end defun |
1124 | 1413 |
1125 @defun ccl-reset-elapsed-time | 1414 To call a CCL program from another CCL program, it must first be |
1126 This function resets the internal value which holds the time elapsed by | 1415 registered: |
1127 CCL interpreter. | 1416 |
1128 @end defun | 1417 @defun register-ccl-program name ccl-program |
1418 Register @var{name} for CCL program @var{ccl-program} in | |
1419 @code{ccl-program-table}. @var{ccl-program} should be the compiled form of | |
1420 a CCL program, or @code{nil}. Returns the index number of the registered CCL | |
1421 program. | |
1422 @end defun | |
1423 | |
1424 Information about the processor time used by the CCL interpreter can be | |
1425 obtained using these functions: | |
1129 | 1426 |
1130 @defun ccl-elapsed-time | 1427 @defun ccl-elapsed-time |
1131 This function returns the time elapsed by CCL interpreter as cons of | 1428 Returns the elapsed processor time of the CCL interpreter as cons of |
1132 user and system time. This measures processor time, not real time. | 1429 user and system time, as |
1133 Both values are floating point numbers measured in seconds. If only one | 1430 floating point numbers measured in seconds. If only one |
1134 overall value can be determined, the return value will be a cons of that | 1431 overall value can be determined, the return value will be a cons of that |
1135 value and 0. | 1432 value and 0. |
1136 @end defun | 1433 @end defun |
1137 | 1434 |
1138 @node Category Tables | 1435 @defun ccl-reset-elapsed-time |
1436 Resets the CCL interpreter's internal elapsed time registers. | |
1437 @end defun | |
1438 | |
1439 @node CCL Examples, , Calling CCL, CCL | |
1440 @comment Node, Next, Previous, Up | |
1441 @subsection CCL Examples | |
1442 | |
1443 This section is not yet written. | |
1444 | |
1445 @node Category Tables, , CCL, MULE | |
1139 @section Category Tables | 1446 @section Category Tables |
1140 | 1447 |
1141 A category table is a type of char table used for keeping track of | 1448 A category table is a type of char table used for keeping track of |
1142 categories. Categories are used for classifying characters for use in | 1449 categories. Categories are used for classifying characters for use in |
1143 regexps -- you can refer to a category rather than having to use a | 1450 regexps---you can refer to a category rather than having to use a |
1144 complicated [] expression (and category lookups are significantly | 1451 complicated [] expression (and category lookups are significantly |
1145 faster). | 1452 faster). |
1146 | 1453 |
1147 There are 95 different categories available, one for each printable | 1454 There are 95 different categories available, one for each printable |
1148 character (including space) in the ASCII charset. Each category is | 1455 character (including space) in the ASCII charset. Each category is |