comparison man/lispref/mule.texi @ 398:74fd4e045ea6 r21-2-29

Import from CVS: tag r21-2-29
author cvs
date Mon, 13 Aug 2007 11:13:30 +0200
parents cc15677e0335
children 2f8bb876ab1d
397:f4aeb21a5bad 398:74fd4e045ea6
41 ways, although the basic shape will be the same. 41 ways, although the basic shape will be the same.
42 42
43 In some cases, the differences will be significant enough that it is 43 In some cases, the differences will be significant enough that it is
44 actually possible to identify two or more distinct shapes that both 44 actually possible to identify two or more distinct shapes that both
45 represent the same character. For example, the lowercase letters 45 represent the same character. For example, the lowercase letters
46 @samp{a} and @samp{g} each have two distinct possible shapes -- the 46 @samp{a} and @samp{g} each have two distinct possible shapes---the
47 @samp{a} can optionally have a curved tail projecting off the top, and 47 @samp{a} can optionally have a curved tail projecting off the top, and
48 the @samp{g} can be formed either of two loops, or of one loop and a 48 the @samp{g} can be formed either of two loops, or of one loop and a
49 tail hanging off the bottom. Such distinct possible shapes of a 49 tail hanging off the bottom. Such distinct possible shapes of a
50 character are called @dfn{glyphs}. The important characteristic of two 50 character are called @dfn{glyphs}. The important characteristic of two
51 glyphs making up the same character is that the choice between one or 51 glyphs making up the same character is that the choice between one or
52 the other is purely stylistic and has no linguistic effect on a word 52 the other is purely stylistic and has no linguistic effect on a word
53 (this is the reason why a capital @samp{A} and lowercase @samp{a} 53 (this is the reason why a capital @samp{A} and lowercase @samp{a}
54 are different characters rather than different glyphs -- e.g. 54 are different characters rather than different glyphs---e.g.
55 @samp{Aspen} is a city while @samp{aspen} is a kind of tree). 55 @samp{Aspen} is a city while @samp{aspen} is a kind of tree).
56 56
57 Note that @dfn{character} and @dfn{glyph} are used differently 57 Note that @dfn{character} and @dfn{glyph} are used differently
58 here than elsewhere in XEmacs. 58 here than elsewhere in XEmacs.
59 59
72 particular ordering. ASCII, for example, places letters in their 72 particular ordering. ASCII, for example, places letters in their
73 ``natural'' order, puts uppercase letters before lowercase letters, 73 ``natural'' order, puts uppercase letters before lowercase letters,
74 numbers before letters, etc. Note that for many of the Asian character 74 numbers before letters, etc. Note that for many of the Asian character
75 sets, there is no natural ordering of the characters. The actual 75 sets, there is no natural ordering of the characters. The actual
76 orderings are based on one or more salient characteristics, of which 76 orderings are based on one or more salient characteristics, of which
77 there are many to choose from -- e.g. number of strokes, common 77 there are many to choose from---e.g. number of strokes, common
78 radicals, phonetic ordering, etc. 78 radicals, phonetic ordering, etc.
79 79
80 The set of numbers assigned to any particular character are called 80 The set of numbers assigned to any particular character are called
81 the character's @dfn{position codes}. The number of position codes 81 the character's @dfn{position codes}. The number of position codes
82 required to index a particular character in a character set is called 82 required to index a particular character in a character set is called
103 position codes for the characters in that character set could be used 103 position codes for the characters in that character set could be used
104 directly. (This is the case with ASCII, and as a result, most people do 104 directly. (This is the case with ASCII, and as a result, most people do
105 not understand the difference between a character set and an encoding.) 105 not understand the difference between a character set and an encoding.)
106 This is not possible, however, if more than one character set is to be 106 This is not possible, however, if more than one character set is to be
107 used in the encoding. For example, printed Japanese text typically 107 used in the encoding. For example, printed Japanese text typically
108 requires characters from multiple character sets -- ASCII, JISX0208, and 108 requires characters from multiple character sets---ASCII, JISX0208, and
109 JISX0212, to be specific. Each of these is indexed using one or more 109 JISX0212, to be specific. Each of these is indexed using one or more
110 position codes in the range 33 through 126, so the position codes could 110 position codes in the range 33 through 126, so the position codes could
111 not be used directly or there would be no way to tell which character 111 not be used directly or there would be no way to tell which character
112 was meant. Different Japanese encodings handle this differently -- JIS 112 was meant. Different Japanese encodings handle this differently---JIS
113 uses special escape characters to denote different character sets; EUC 113 uses special escape characters to denote different character sets; EUC
114 sets the high bit of the position codes for JISX0208 and JISX0212, and 114 sets the high bit of the position codes for JISX0208 and JISX0212, and
115 puts a special extra byte before each JISX0212 character; etc. (JIS, 115 puts a special extra byte before each JISX0212 character; etc. (JIS,
116 EUC, and most of the other encodings you will encounter are 7-bit or 116 EUC, and most of the other encodings you will encounter are 7-bit or
117 8-bit encodings. There is one common 16-bit encoding, which is Unicode; 117 8-bit encodings. There is one common 16-bit encoding, which is Unicode;
364 This function returns the number of display columns per character (in 364 This function returns the number of display columns per character (in
365 TTY mode) of @var{charset}. 365 TTY mode) of @var{charset}.
366 @end defun 366 @end defun
367 367
368 @defun charset-direction charset 368 @defun charset-direction charset
369 This function returns the display direction of @var{charset} -- either 369 This function returns the display direction of @var{charset}---either
370 @code{l2r} or @code{r2l}. 370 @code{l2r} or @code{r2l}.
371 @end defun 371 @end defun
372 372
373 @defun charset-final charset 373 @defun charset-final charset
374 This function returns the final byte of the ISO 2022 escape sequence 374 This function returns the final byte of the ISO 2022 escape sequence
553 4 areas: C0, GL, C1, and GR. GL and GR are the areas into which a 553 4 areas: C0, GL, C1, and GR. GL and GR are the areas into which a
554 register of charset can be invoked. 554 register of charset can be invoked.
555 555
556 @example 556 @example
557 @group 557 @group
558 C0: 0x00 - 0x1F 558 C0: 0x00 - 0x1F
559 GL: 0x20 - 0x7F 559 GL: 0x20 - 0x7F
560 C1: 0x80 - 0x9F 560 C1: 0x80 - 0x9F
561 GR: 0xA0 - 0xFF 561 GR: 0xA0 - 0xFF
562 @end group 562 @end group
563 @end example 563 @end example
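The area table above can be expressed as a small lookup. The following Python sketch is only an illustration of the byte ranges listed; the function name `iso2022_area` is hypothetical and is not part of Mule or XEmacs.

```python
def iso2022_area(byte: int) -> str:
    """Return the ISO 2022 code area (C0, GL, C1, or GR) that holds a byte."""
    if not 0x00 <= byte <= 0xFF:
        raise ValueError("expected a value in the range 0x00-0xFF")
    if byte <= 0x1F:
        return "C0"  # 0x00 - 0x1F
    if byte <= 0x7F:
        return "GL"  # 0x20 - 0x7F
    if byte <= 0x9F:
        return "C1"  # 0x80 - 0x9F
    return "GR"      # 0xA0 - 0xFF


if __name__ == "__main__":
    for sample in (0x1B, 0x41, 0x85, 0xE9):
        print(hex(sample), iso2022_area(sample))
```

In a 7-bit environment only the first two branches can ever be taken, since no byte has its high bit set.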
564 564
565 Usually, in the initial state, G0 is invoked into GL, and G1 565 Usually, in the initial state, G0 is invoked into GL, and G1
566 is invoked into GR. 566 is invoked into GR.
569 7-bit environments, only C0 and GL are used. 569 7-bit environments, only C0 and GL are used.
570 570
571 Charset designation is done by escape sequences of the form: 571 Charset designation is done by escape sequences of the form:
572 572
573 @example 573 @example
574 ESC [@var{I}] @var{I} @var{F} 574 ESC [@var{I}] @var{I} @var{F}
575 @end example 575 @end example
576 576
577 where @var{I} is an intermediate character in the range 0x20 - 0x2F, and 577 where @var{I} is an intermediate character in the range 0x20 - 0x2F, and
578 @var{F} is the final character identifying this charset. 578 @var{F} is the final character identifying this charset.
579 579
580 The meanings of the intermediate characters are: 580 The meanings of the intermediate characters are:
581 581
582 @example 582 @example
583 @group 583 @group
584 $ [0x24]: indicate charset of dimension 2 (94x94 or 96x96). 584 $ [0x24]: indicate charset of dimension 2 (94x94 or 96x96).
585 ( [0x28]: designate to G0 a 94-charset whose final byte is @var{F}. 585 ( [0x28]: designate to G0 a 94-charset whose final byte is @var{F}.
586 ) [0x29]: designate to G1 a 94-charset whose final byte is @var{F}. 586 ) [0x29]: designate to G1 a 94-charset whose final byte is @var{F}.
587 * [0x2A]: designate to G2 a 94-charset whose final byte is @var{F}. 587 * [0x2A]: designate to G2 a 94-charset whose final byte is @var{F}.
588 + [0x2B]: designate to G3 a 94-charset whose final byte is @var{F}. 588 + [0x2B]: designate to G3 a 94-charset whose final byte is @var{F}.
589 - [0x2D]: designate to G1 a 96-charset whose final byte is @var{F}. 589 - [0x2D]: designate to G1 a 96-charset whose final byte is @var{F}.
590 . [0x2E]: designate to G2 a 96-charset whose final byte is @var{F}. 590 . [0x2E]: designate to G2 a 96-charset whose final byte is @var{F}.
591 / [0x2F]: designate to G3 a 96-charset whose final byte is @var{F}. 591 / [0x2F]: designate to G3 a 96-charset whose final byte is @var{F}.
592 @end group 592 @end group
593 @end example 593 @end example
594 594
595 The following rule is not allowed in ISO 2022 but can be used in Mule. 595 The following rule is not allowed in ISO 2022 but can be used in Mule.
596 596
597 @example 597 @example
598 , [0x2C]: designate to G0 a 96-charset whose final byte is @var{F}. 598 , [0x2C]: designate to G0 a 96-charset whose final byte is @var{F}.
599 @end example 599 @end example
600 600
601 Here are examples of designations: 601 Here are examples of designations:
602 602
603 @example 603 @example
604 @group 604 @group
605 ESC ( B : designate to G0 ASCII 605 ESC ( B : designate to G0 ASCII
606 ESC - A : designate to G1 Latin-1 606 ESC - A : designate to G1 Latin-1
607 ESC $ ( A or ESC $ A : designate to G0 GB2312 607 ESC $ ( A or ESC $ A : designate to G0 GB2312
608 ESC $ ( B or ESC $ B : designate to G0 JISX0208 608 ESC $ ( B or ESC $ B : designate to G0 JISX0208
609 ESC $ ) C : designate to G1 KSC5601 609 ESC $ ) C : designate to G1 KSC5601
610 @end group 610 @end group
611 @end example 611 @end example
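The designation rules and examples above can be checked mechanically. The Python sketch below decodes a designation sequence of the form ESC [I] I F using only the intermediate characters listed in this section. The names `DESIGNATORS` and `parse_designation` are hypothetical, the final byte is not validated, and the short forms such as ESC $ B are assumed to imply G0 and a 94x94 charset, as in the examples.

```python
ESC = 0x1B

# Intermediate byte -> (register, characters per dimension), from the table above.
DESIGNATORS = {
    0x28: ("G0", 94), 0x29: ("G1", 94), 0x2A: ("G2", 94), 0x2B: ("G3", 94),
    0x2D: ("G1", 96), 0x2E: ("G2", 96), 0x2F: ("G3", 96),
    0x2C: ("G0", 96),  # Mule extension, not allowed in ISO 2022
}


def parse_designation(seq: bytes):
    """Decode ESC [I] I F into (register, set_size, dimension, final_char)."""
    if len(seq) < 3 or seq[0] != ESC:
        raise ValueError("not a designation sequence")
    body = seq[1:]
    dimension = 1
    if body[0] == 0x24:  # '$' indicates a charset of dimension 2
        dimension = 2
        body = body[1:]
    if len(body) == 1 and dimension == 2:
        # Short form such as ESC $ B: G0 and a 94x94 charset are assumed.
        return ("G0", 94, 2, chr(body[0]))
    if len(body) != 2 or body[0] not in DESIGNATORS:
        raise ValueError("unrecognized designation sequence")
    register, size = DESIGNATORS[body[0]]
    return (register, size, dimension, chr(body[1]))


if __name__ == "__main__":
    for seq in (b"\x1b(B", b"\x1b-A", b"\x1b$(B", b"\x1b$B", b"\x1b$)C"):
        print(seq, parse_designation(seq))
```

The sketch only reports what a sequence designates; it does not model invocation into GL or GR.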
612 612
613 To use a charset designated to G2 or G3, and to use a charset designated 613 To use a charset designated to G2 or G3, and to use a charset designated
614 to G1 in a 7-bit environment, you must explicitly invoke G1, G2, or G3 614 to G1 in a 7-bit environment, you must explicitly invoke G1, G2, or G3
616 Single Shift (one character only). 616 Single Shift (one character only).
617 617
618 Locking Shift is done as follows: 618 Locking Shift is done as follows:
619 619
620 @example 620 @example
621 LS0 or SI (0x0F): invoke G0 into GL 621 LS0 or SI (0x0F): invoke G0 into GL
622 LS1 or SO (0x0E): invoke G1 into GL 622 LS1 or SO (0x0E): invoke G1 into GL
623 LS2: invoke G2 into GL 623 LS2: invoke G2 into GL
624 LS3: invoke G3 into GL 624 LS3: invoke G3 into GL
625 LS1R: invoke G1 into GR 625 LS1R: invoke G1 into GR
626 LS2R: invoke G2 into GR 626 LS2R: invoke G2 into GR
627 LS3R: invoke G3 into GR 627 LS3R: invoke G3 into GR
628 @end example 628 @end example
629 629
630 Single Shift is done as follows: 630 Single Shift is done as follows:
631 631
632 @example 632 @example
633 @group 633 @group
634 SS2 or ESC N: invoke G2 into GL 634 SS2 or ESC N: invoke G2 into GL
635 SS3 or ESC O: invoke G3 into GL 635 SS3 or ESC O: invoke G3 into GL
636 @end group 636 @end group
637 @end example 637 @end example
638 638
639 (#### Ben says: I think the above is slightly incorrect. It appears that 639 (#### Ben says: I think the above is slightly incorrect. It appears that
640 SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N and 640 SS2 invokes G2 into GR and SS3 invokes G3 into GR, whereas ESC N and
676 Here are several examples: 676 Here are several examples:
677 677
678 @example 678 @example
679 @group 679 @group
680 junet -- Coding system used in JUNET. 680 junet -- Coding system used in JUNET.
681 1. G0 <- ASCII, G1..3 <- never used 681 1. G0 <- ASCII, G1..3 <- never used
682 2. Yes. 682 2. Yes.
683 3. Yes. 683 3. Yes.
684 4. Yes. 684 4. Yes.
685 5. 7-bit environment 685 5. 7-bit environment
686 6. No. 686 6. No.
687 7. Use ASCII 687 7. Use ASCII
688 8. Use JISX0208-1983 688 8. Use JISX0208-1983
689 @end group 689 @end group
690 690
691 @group 691 @group
692 ctext -- Compound Text 692 ctext -- Compound Text
693 1. G0 <- ASCII, G1 <- Latin-1, G2,3 <- never used 693 1. G0 <- ASCII, G1 <- Latin-1, G2,3 <- never used
694 2. No. 694 2. No.
695 3. No. 695 3. No.
696 4. Yes. 696 4. Yes.
697 5. 8-bit environment 697 5. 8-bit environment
698 6. No. 698 6. No.
699 7. Use ASCII 699 7. Use ASCII
700 8. Use JISX0208-1983 700 8. Use JISX0208-1983
701 @end group 701 @end group
702 702
703 @group 703 @group
704 euc-china -- Chinese EUC. Although many people call this 704 euc-china -- Chinese EUC. Although many people call this
705 as "GB encoding", the name may cause misunderstanding. 705 as "GB encoding", the name may cause misunderstanding.
706 1. G0 <- ASCII, G1 <- GB2312, G2,3 <- never used 706 1. G0 <- ASCII, G1 <- GB2312, G2,3 <- never used
707 2. No. 707 2. No.
708 3. Yes. 708 3. Yes.
709 4. Yes. 709 4. Yes.
710 5. 8-bit environment 710 5. 8-bit environment
711 6. No. 711 6. No.
712 7. Use ASCII 712 7. Use ASCII
713 8. Use JISX0208-1983 713 8. Use JISX0208-1983
714 @end group 714 @end group
715 715
716 @group 716 @group
717 korean-mail -- Coding system used in Korean network. 717 korean-mail -- Coding system used in Korean network.
718 1. G0 <- ASCII, G1 <- KSC5601, G2,3 <- never used 718 1. G0 <- ASCII, G1 <- KSC5601, G2,3 <- never used
719 2. No. 719 2. No.
720 3. Yes. 720 3. Yes.
721 4. Yes. 721 4. Yes.
722 5. 7-bit environment 722 5. 7-bit environment
723 6. Yes. 723 6. Yes.
724 7. No. 724 7. No.
725 8. No. 725 8. No.
726 @end group 726 @end group
727 @end example 727 @end example
728 728
729 Mule creates all these coding systems by default. 729 Mule creates all these coding systems by default.
730 730
738 or process, and is used to encode the text back into the same format 738 or process, and is used to encode the text back into the same format
739 when it is written out to a file or process. 739 when it is written out to a file or process.
740 740
741 For example, many ISO-2022-compliant coding systems (such as Compound 741 For example, many ISO-2022-compliant coding systems (such as Compound
742 Text, which is used for inter-client data under the X Window System) use 742 Text, which is used for inter-client data under the X Window System) use
743 escape sequences to switch between different charsets -- Japanese Kanji, 743 escape sequences to switch between different charsets---Japanese Kanji,
744 for example, is invoked with @samp{ESC $ ( B}; ASCII is invoked with 744 for example, is invoked with @samp{ESC $ ( B}; ASCII is invoked with
745 @samp{ESC ( B}; and Cyrillic is invoked with @samp{ESC - L}. See 745 @samp{ESC ( B}; and Cyrillic is invoked with @samp{ESC - L}. See
746 @code{make-coding-system} for more information. 746 @code{make-coding-system} for more information.
747 747
748 Coding systems are normally identified using a symbol, and the symbol is 748 Coding systems are normally identified using a symbol, and the symbol is
1091 @defun encode-big5-char ch 1091 @defun encode-big5-char ch
1092 This function encodes the Big5 character @var{ch} to BIG5 1092 This function encodes the Big5 character @var{ch} to BIG5
1093 coding-system. The corresponding character code in Big5 is returned. 1093 coding-system. The corresponding character code in Big5 is returned.
1094 @end defun 1094 @end defun
1095 1095
1096 @node CCL 1096 @node CCL, Category Tables, Coding Systems, MULE
1097 @section CCL 1097 @section CCL
1098 1098
1099 @defun execute-ccl-program ccl-program status 1099 CCL (Code Conversion Language) is a simple structured programming
1100 This function executes @var{ccl-program} with registers initialized by 1100 language designed for character coding conversions. A CCL program is
1101 compiled to CCL code (represented by a vector of integers) and executed
1102 by the CCL interpreter embedded in Emacs. The CCL interpreter
1103 implements a virtual machine with 8 registers called @code{r0}, ...,
1104 @code{r7}, a number of control structures, and some I/O operators. Take
1105 care when using registers @code{r0} (used in implicit @dfn{set}
1106 statements) and especially @code{r7} (used internally by several
1107 statements and operations, especially for multiple return values and I/O
1108 operations).
1109
1110 CCL is used for code conversion during process I/O and file I/O for
1111 non-ISO2022 coding systems. (It is the only way for a user to specify a
1112 code conversion function.) It is also used for calculating the code
1113 point of an X11 font from a character code. However, since CCL is
1114 designed as a powerful programming language, it can be used for more
1115 generic calculation where efficiency is demanded. A combination of
1116 three or more arithmetic operations can be calculated faster by CCL than
1117 by Emacs Lisp.
1118
1119 @strong{Warning:} The code in @file{src/mule-ccl.c} and
1120 @file{$packages/lisp/mule-base/mule-ccl.el} is the definitive
1121 description of CCL's semantics. The previous version of this section
1122 contained several typos and obsolete names left from earlier versions of
1123 MULE, and many may remain. (I am not an experienced CCL programmer; the
1124 few who know CCL well find writing English painful.)
1125
1126 A CCL program transforms an input data stream into an output data
1127 stream. The input stream, held in a buffer of constant bytes, is left
1128 unchanged. The buffer may be filled by an external input operation,
1129 taken from an Emacs buffer, or taken from a Lisp string. The output
1130 buffer is a dynamic array of bytes, which can be written by an external
1131 output operation, inserted into an Emacs buffer, or returned as a Lisp
1132 string.
1133
1134 A CCL program is a (Lisp) list containing two or three members. The
1135 first member is the @dfn{buffer magnification}, which indicates the
1136 required minimum size of the output buffer as a multiple of the input
1137 buffer. It is followed by the @dfn{main block} which executes while
1138 there is input remaining, and an optional @dfn{EOF block} which is
1139 executed when the input is exhausted. Both the main block and the EOF
1140 block are CCL blocks.
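As a rough illustration of the program layout just described, the Python sketch below models a CCL program as a nested tuple and splits it into its parts. This is a model for exposition only, not the CCL compiler: the tuple representation and the names `split_ccl_program` and `min_output_size` are hypothetical.

```python
def split_ccl_program(program):
    """Split a modeled CCL program into (magnification, main_block, eof_block)."""
    if len(program) not in (2, 3):
        raise ValueError("a CCL program has two or three members")
    magnification, main_block = program[0], program[1]
    if not isinstance(magnification, int):
        raise ValueError("buffer magnification must be an integer")
    # The EOF block is optional; None stands for "no EOF block".
    eof_block = program[2] if len(program) == 3 else None
    return magnification, main_block, eof_block


def min_output_size(magnification, input_size):
    """Minimum output buffer size, as a multiple of the input size."""
    return magnification * input_size


if __name__ == "__main__":
    # Models (1 (loop (read r0) (write r0) (repeat))), with symbols as strings.
    copy_program = (1, ("loop", ("read", "r0"), ("write", "r0"), ("repeat",)))
    magnification, main_block, eof_block = split_ccl_program(copy_program)
    print(magnification, main_block, eof_block)
    print(min_output_size(magnification, 100))
```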
1141
1142 A @dfn{CCL block} is either a CCL statement or list of CCL statements.
1143 A @dfn{CCL statement} is either a @dfn{set statement} (either an integer
1144 or an @dfn{assignment}, which is a list of a register to receive the
1145 assignment, an assignment operator, and an expression) or a @dfn{control
1146 statement} (a list starting with a keyword, whose allowable syntax
1147 depends on the keyword).
1148
1149 @menu
1150 * CCL Syntax:: CCL program syntax in BNF notation.
1151 * CCL Statements:: Semantics of CCL statements.
1152 * CCL Expressions:: Operators and expressions in CCL.
1153 * Calling CCL:: Running CCL programs.
1154 * CCL Examples:: The encoding functions for Big5 and KOI-8.
1155 @end menu
1156
1157 @node CCL Syntax, CCL Statements, CCL, CCL
1158 @comment Node, Next, Previous, Up
1159 @subsection CCL Syntax
1160
1161 The full syntax of a CCL program in BNF notation:
1162
1163 @format
1164 CCL_PROGRAM :=
1165 (BUFFER_MAGNIFICATION
1166 CCL_MAIN_BLOCK
1167 [ CCL_EOF_BLOCK ])
1168
1169 BUFFER_MAGNIFICATION := integer
1170 CCL_MAIN_BLOCK := CCL_BLOCK
1171 CCL_EOF_BLOCK := CCL_BLOCK
1172
1173 CCL_BLOCK :=
1174 STATEMENT | (STATEMENT [STATEMENT ...])
1175 STATEMENT :=
1176 SET | IF | BRANCH | LOOP | REPEAT | BREAK | READ | WRITE
1177 | CALL | END
1178
1179 SET :=
1180 (REG = EXPRESSION)
1181 | (REG ASSIGNMENT_OPERATOR EXPRESSION)
1182 | integer
1183
1184 EXPRESSION := ARG | (EXPRESSION OPERATOR ARG)
1185
1186 IF := (if EXPRESSION CCL_BLOCK [CCL_BLOCK])
1187 BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
1188 LOOP := (loop STATEMENT [STATEMENT ...])
1189 BREAK := (break)
1190 REPEAT :=
1191 (repeat)
1192 | (write-repeat [REG | integer | string])
1193 | (write-read-repeat REG [integer | ARRAY])
1194 READ :=
1195 (read REG ...)
1196 | (read-if (REG OPERATOR ARG) CCL_BLOCK CCL_BLOCK)
1197 | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
1198 WRITE :=
1199 (write REG ...)
1200 | (write EXPRESSION)
1201 | (write integer) | (write string) | (write REG ARRAY)
1202 | string
1203 CALL := (call ccl-program-name)
1204 END := (end)
1205
1206 REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
1207 ARG := REG | integer
1208 OPERATOR :=
1209 + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
1210 | < | > | == | <= | >= | != | de-sjis | en-sjis
1211 ASSIGNMENT_OPERATOR :=
1212 += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
1213 ARRAY := '[' integer ... ']'
1214 @end format
1215
1216 @node CCL Statements, CCL Expressions, CCL Syntax, CCL
1217 @comment Node, Next, Previous, Up
1218 @subsection CCL Statements
1219
1220 The Emacs Code Conversion Language provides the following statement
1221 types: @dfn{set}, @dfn{if}, @dfn{branch}, @dfn{loop}, @dfn{repeat},
1222 @dfn{break}, @dfn{read}, @dfn{write}, @dfn{call}, and @dfn{end}.
1223
1224 @heading Set statement:
1225
1226 The @dfn{set} statement has three variants with the syntaxes
1227 @samp{(@var{reg} = @var{expression})},
1228 @samp{(@var{reg} @var{assignment_operator} @var{expression})}, and
1229 @samp{@var{integer}}. The assignment operator variation of the
1230 @dfn{set} statement works the same way as the corresponding C expression
1231 statement does. The assignment operators are @code{+=}, @code{-=},
1232 @code{*=}, @code{/=}, @code{%=}, @code{&=}, @code{|=}, @code{^=},
1233 @code{<<=}, and @code{>>=}, and they have the same meanings as in C. A
1234 ``naked integer'' @var{integer} is equivalent to a @dfn{set} statement of
1235 the form @code{(r0 = @var{integer})}.
1236
1237 @heading I/O statements:
1238
1239 The @dfn{read} statement takes one or more registers as arguments. It
1240 reads one byte (a C char) from the input into each register in turn.
1241
1242 The @dfn{write} statement takes several forms. In the form @samp{(write @var{reg}
1243 ...)} it takes one or more registers as arguments and writes each in
1244 turn to the output. The integer in a register (interpreted as an
1245 Emchar) is encoded to multibyte form (i.e., Bufbytes) and written to the
1246 current output buffer. If it is less than 256, it is written as is.
1247 The forms @samp{(write @var{expression})} and @samp{(write
1248 @var{integer})} are treated analogously. The form @samp{(write
1249 @var{string})} writes the constant string to the output. A
1250 ``naked string'' @samp{@var{string}} is equivalent to the statement @samp{(write
1251 @var{string})}. The form @samp{(write @var{reg} @var{array})} writes
1252 the @var{reg}th element of the @var{array} to the output.
1253
1254 @heading Conditional statements:
1255
1256 The @dfn{if} statement takes an @var{expression}, a @var{CCL block}, and
1257 an optional @var{second CCL block} as arguments. If the
1258 @var{expression} evaluates to non-zero, the first @var{CCL block} is
1259 executed. Otherwise, if there is a @var{second CCL block}, it is
1260 executed.
1261
1262 The @dfn{read-if} variant of the @dfn{if} statement takes an
1263 @var{expression}, a @var{CCL block}, and an optional @var{second CCL
1264 block} as arguments. The @var{expression} must have the form
1265 @code{(@var{reg} @var{operator} @var{operand})} (where @var{operand} is
1266 a register or an integer). The @code{read-if} statement first reads
1267 from the input into the first register operand in the @var{expression},
1268 then conditionally executes a CCL block just as the @code{if} statement
1269 does.
1270
1271 The @dfn{branch} statement takes an @var{expression} and one or more CCL
1272 blocks as arguments. The CCL blocks are treated as a zero-indexed
1273 array, and the @code{branch} statement uses the @var{expression} as the
1274 index of the CCL block to execute. Null CCL blocks may be used as
1275 no-ops, continuing execution with the statement following the
1276 @code{branch} statement in the containing CCL block. Out-of-range
1277 values for the @var{expression} are also treated as no-ops.
1278
1279 The @dfn{read-branch} variant of the @dfn{branch} statement takes a
1280 @var{register} and one or more CCL blocks as arguments. The
1281 @code{read-branch} statement first reads from
1282 the input into the @var{register}, then conditionally executes a CCL
1283 block just as the @code{branch} statement does.
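The dispatch rule of the @code{branch} statement can be sketched in Python as follows. This models only the selection logic described above (zero-indexed blocks, null blocks and out-of-range values acting as no-ops); the name `run_branch` is hypothetical, and CCL blocks are stood in for by Python callables.

```python
def run_branch(index, blocks):
    """Run the block selected by index; a null block or an out-of-range
    index is a no-op and yields None."""
    if 0 <= index < len(blocks) and blocks[index] is not None:
        return blocks[index]()
    return None


if __name__ == "__main__":
    blocks = [lambda: "block 0", None, lambda: "block 2"]
    for value in (0, 1, 2, 7):
        print(value, run_branch(value, blocks))
```

A @code{read-branch} would behave the same way after first reading one value from the input into the register that supplies the index.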
1284
1285 @heading Loop control statements:
1286
1287 The @dfn{loop} statement creates a block with an implied jump from the
1288 end of the block back to its head. The loop is exited on a @code{break}
1289 statement, and continued without executing the tail by a @code{repeat}
1290 statement.
1291
1292 The @dfn{break} statement, written @samp{(break)}, terminates the
1293 current loop and continues with the next statement in the current
1294 block.
1295
1296 The @dfn{repeat} statement has three variants, @code{repeat},
1297 @code{write-repeat}, and @code{write-read-repeat}. Each continues the
1298 current loop from its head, possibly after performing I/O.
1299 @code{repeat} takes no arguments and does no I/O before jumping.
1300 @code{write-repeat} takes a single argument (a register, an
1301 integer, or a string), writes it to the output, then jumps.
1302 @code{write-read-repeat} takes one or two arguments. The first must
1303 be a register. The second may be an integer or an array; if absent, it
1304 is implicitly set to the first (register) argument.
1305 @code{write-read-repeat} writes its second argument to the output, then
1306 reads from the input into the register, and finally jumps. See the
1307 @code{write} and @code{read} statements for the semantics of the I/O
1308 operations for each type of argument.
1309
1310 @heading Other control statements:
1311
1312 The @dfn{call} statement, written @samp{(call @var{ccl-program-name})},
1313 executes a CCL program as a subroutine. It does not return a value to
1314 the caller, but can modify the register status.
1315
1316 The @dfn{end} statement, written @samp{(end)}, terminates the CCL
1317 program successfully, and returns to caller (which may be a CCL
1318 program). It does not alter the status of the registers.
1319
1320 @node CCL Expressions, Calling CCL, CCL Statements, CCL
1321 @comment Node, Next, Previous, Up
1322 @subsection CCL Expressions
1323
1324 CCL, unlike Lisp, uses infix expressions. The simplest CCL expressions
1325 consist of a single @var{operand}, either a register (one of @code{r0},
1326 ..., @code{r7}) or an integer. Complex expressions are lists of the
1327 form @code{( @var{expression} @var{operator} @var{operand} )}. Unlike
1328 C, assignments are not expressions.
1329
1330 In the following table, @var{X} is the target register for a @dfn{set}.
1331 In subexpressions, this is implicitly @code{r7}. This means that
1332 @code{>8}, @code{//}, @code{de-sjis}, and @code{en-sjis} cannot be used
1333 freely in subexpressions, since they return parts of their values in
1334 @code{r7}. @var{Y} may be an expression, register, or integer, while
1335 @var{Z} must be a register or an integer.
1336
1337 @multitable @columnfractions .22 .14 .09 .55
1338 @item Name @tab Operator @tab Code @tab C-like Description
1339 @item CCL_PLUS @tab @code{+} @tab 0x00 @tab X = Y + Z
1340 @item CCL_MINUS @tab @code{-} @tab 0x01 @tab X = Y - Z
1341 @item CCL_MUL @tab @code{*} @tab 0x02 @tab X = Y * Z
1342 @item CCL_DIV @tab @code{/} @tab 0x03 @tab X = Y / Z
1343 @item CCL_MOD @tab @code{%} @tab 0x04 @tab X = Y % Z
1344 @item CCL_AND @tab @code{&} @tab 0x05 @tab X = Y & Z
1345 @item CCL_OR @tab @code{|} @tab 0x06 @tab X = Y | Z
1346 @item CCL_XOR @tab @code{^} @tab 0x07 @tab X = Y ^ Z
1347 @item CCL_LSH @tab @code{<<} @tab 0x08 @tab X = Y << Z
1348 @item CCL_RSH @tab @code{>>} @tab 0x09 @tab X = Y >> Z
1349 @item CCL_LSH8 @tab @code{<8} @tab 0x0A @tab X = (Y << 8) | Z
1350 @item CCL_RSH8 @tab @code{>8} @tab 0x0B @tab X = Y >> 8, r[7] = Y & 0xFF
1351 @item CCL_DIVMOD @tab @code{//} @tab 0x0C @tab X = Y / Z, r[7] = Y % Z
1352 @item CCL_LS @tab @code{<} @tab 0x10 @tab X = (X < Y)
1353 @item CCL_GT @tab @code{>} @tab 0x11 @tab X = (X > Y)
1354 @item CCL_EQ @tab @code{==} @tab 0x12 @tab X = (X == Y)
1355 @item CCL_LE @tab @code{<=} @tab 0x13 @tab X = (X <= Y)
1356 @item CCL_GE @tab @code{>=} @tab 0x14 @tab X = (X >= Y)
1357 @item CCL_NE @tab @code{!=} @tab 0x15 @tab X = (X != Y)
1358 @item CCL_ENCODE_SJIS @tab @code{en-sjis} @tab 0x16 @tab X = HIGHER_BYTE (SJIS (Y, Z))
1359 @item @tab @tab @tab r[7] = LOWER_BYTE (SJIS (Y, Z))
1360 @item CCL_DECODE_SJIS @tab @code{de-sjis} @tab 0x17 @tab X = HIGHER_BYTE (DE-SJIS (Y, Z))
1361 @item @tab @tab @tab r[7] = LOWER_BYTE (DE-SJIS (Y, Z))
1362 @end multitable
1363
1364 The CCL operators are as in C, with the addition of CCL_LSH8, CCL_RSH8,
1365 CCL_DIVMOD, CCL_ENCODE_SJIS, and CCL_DECODE_SJIS. The CCL_ENCODE_SJIS
1366 and CCL_DECODE_SJIS operators treat their first and second operands as the high and
1367 low bytes of a two-byte character code. (SJIS stands for Shift JIS, an
1368 encoding of Japanese characters used by Microsoft. CCL_ENCODE_SJIS is a
1369 complicated transformation of the Japanese standard JIS encoding to
1370 Shift JIS. CCL_DECODE_SJIS is its inverse.) It is somewhat odd to
1371 represent the SJIS operations in infix form.
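The three arithmetic operators that have no direct C counterpart can be modeled in a few lines of Python. The sketch follows the C-like descriptions in the table and returns the secondary result that CCL would leave in @code{r7} as the second element of a tuple. The function names are hypothetical, and the model assumes non-negative operands (Python and C round differently when dividing negative numbers).

```python
def ccl_lsh8(y, z):
    """<8 : X = (Y << 8) | Z."""
    return (y << 8) | z


def ccl_rsh8(y):
    """>8 : X = Y >> 8, r7 = Y & 0xFF.  Returns (X, r7)."""
    return y >> 8, y & 0xFF


def ccl_divmod(y, z):
    """// : X = Y / Z, r7 = Y % Z.  Returns (X, r7)."""
    return y // z, y % z


if __name__ == "__main__":
    code = ccl_lsh8(0x30, 0x21)      # combine two bytes into one code
    print(hex(code))
    print(ccl_rsh8(code))            # split it again: (high byte, low byte)
    print(ccl_divmod(17, 5))
```

Because the second result lands in @code{r7}, using these operators inside a subexpression overwrites whatever @code{r7} held, which is the restriction noted before the table.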
1372
1373 @node Calling CCL, CCL Examples, CCL Expressions, CCL
1374 @comment Node, Next, Previous, Up
1375 @subsection Calling CCL
1376
1377 CCL programs are called automatically during Emacs buffer I/O when the
1378 external representation has a coding system type of @code{shift-jis},
1379 @code{big5}, or @code{ccl}. The program is specified by the coding
1380 system (@pxref{Coding Systems}). You can also call CCL programs from
1381 other CCL programs, and from Lisp using these functions:
1382
1383 @defun ccl-execute ccl-program status
1384 Execute @var{ccl-program} with registers initialized by
1101 @var{status}. @var{ccl-program} is a vector of compiled CCL code 1385 @var{status}. @var{ccl-program} is a vector of compiled CCL code
1102 created by @code{ccl-compile}. @var{status} must be a vector of nine 1386 created by @code{ccl-compile}. It is an error for the program to try to
1387 execute a CCL I/O command. @var{status} must be a vector of nine
1103 values, specifying the initial value for the R0, R1 .. R7 registers and 1388 values, specifying the initial value for the R0, R1 .. R7 registers and
1104 for the instruction counter IC. A @code{nil} value for a register 1389 for the instruction counter IC. A @code{nil} value for a register
1105 initializer causes the register to be set to 0. A @code{nil} value for 1390 initializer causes the register to be set to 0. A @code{nil} value for
1106 the IC initializer causes execution to start at the beginning of the 1391 the IC initializer causes execution to start at the beginning of the
1107 program. When the program is done, @var{status} is modified (by 1392 program. When the program is done, @var{status} is modified (by
1108 side-effect) to contain the ending values for the corresponding 1393 side-effect) to contain the ending values for the corresponding
1109 registers and IC. 1394 registers and IC.
1110 @end defun 1395 @end defun
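The handling of the @var{status} vector can be pictured with the following Python sketch. It models only the initialization rules stated above and does not call or emulate the real interpreter. `normalize_status` and `START_OF_PROGRAM` are hypothetical names, and representing "the beginning of the program" as 0 is an assumption made for illustration.

```python
START_OF_PROGRAM = 0  # hypothetical marker for "start at the beginning"


def normalize_status(status):
    """Return (registers, ic) from a nine-element status vector.

    None plays the role of Lisp nil.
    """
    if len(status) != 9:
        raise ValueError("status must hold nine values: R0..R7 and IC")
    # A nil register initializer sets the register to 0.
    registers = [0 if value is None else value for value in status[:8]]
    # A nil IC initializer starts execution at the beginning of the program.
    ic = START_OF_PROGRAM if status[8] is None else status[8]
    return registers, ic


if __name__ == "__main__":
    print(normalize_status([65, None, None, None, None, None, None, None, None]))
```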
1111 1396
1112 @defun execute-ccl-program-string ccl-program status str 1397 @defun ccl-execute-on-string ccl-program status str &optional continue
1113 This function executes @var{ccl-program} with initial @var{status} on 1398 Execute @var{ccl-program} with initial @var{status} on
1114 @var{str}. @var{ccl-program} is a vector of compiled CCL code 1399 @var{str}. @var{ccl-program} is a vector of compiled CCL code
1115 created by @code{ccl-compile}. @var{status} must be a vector of nine 1400 created by @code{ccl-compile}. @var{status} must be a vector of nine
1116 values, specifying the initial value for the R0, R1 .. R7 registers and 1401 values, specifying the initial value for the R0, R1 .. R7 registers and
1117 for the instruction counter IC. A @code{nil} value for a register 1402 for the instruction counter IC. A @code{nil} value for a register
1118 initializer causes the register to be set to 0. A @code{nil} value for 1403 initializer causes the register to be set to 0. A @code{nil} value for
1119 the IC initializer causes execution to start at the beginning of the 1404 the IC initializer causes execution to start at the beginning of the
1120 program. When the program is done, @var{status} is modified (by 1405 program. An optional fourth argument @var{continue}, if non-@code{nil}, causes
1406 the IC to
1407 remain on the unsatisfied read operation if the program terminates due
1408 to exhaustion of the input buffer. Otherwise the IC is set to the end
1409 of the program. When the program is done, @var{status} is modified (by
1121 side-effect) to contain the ending values for the corresponding 1410 side-effect) to contain the ending values for the corresponding
1122 registers and IC. Returns the resulting string. 1411 registers and IC. Returns the resulting string.
1123 @end defun 1412 @end defun
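
The following sketch shows one way the string variant might be used: a
CCL program that copies its input to its output one unit at a time.
The program text and the names @code{copy} and @code{status} are
assumptions made for illustration only.

@example
(let ((copy (ccl-compile
             '(1 ((loop (read r0) (write r0) (repeat))))))
      ;; All registers start at 0; execution starts at the beginning.
      (status (make-vector 9 nil)))
  ;; The loop ends when the input string is exhausted.  The return
  ;; value is the string written by the program.
  (ccl-execute-on-string copy status "abc"))
@end example

Since the optional fourth argument is omitted here, the IC slot of the
status vector is left at the end of the program rather than on the
unsatisfied read.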
1124 1413
1125 @defun ccl-reset-elapsed-time 1414 To call a CCL program from another CCL program, it must first be
1126 This function resets the internal value which holds the time elapsed by 1415 registered:
1127 CCL interpreter. 1416
1128 @end defun 1417 @defun register-ccl-program name ccl-program
1418 Register @var{name} for CCL program @var{ccl-program} in
1419 @code{ccl-program-table}. @var{ccl-program} should be the compiled form of
1420 a CCL program, or @code{nil}. Returns the index number of the registered CCL
1421 program.
1422 @end defun
1423
1424 Information about the processor time used by the CCL interpreter can be
1425 obtained using these functions:
1129 1426
1130 @defun ccl-elapsed-time 1427 @defun ccl-elapsed-time
1131 This function returns the time elapsed by CCL interpreter as cons of 1428 Returns the elapsed processor time of the CCL interpreter as a cons of
1132 user and system time. This measures processor time, not real time. 1429 user and system time, as
1133 Both values are floating point numbers measured in seconds. If only one 1430 floating point numbers measured in seconds. If only one
1134 overall value can be determined, the return value will be a cons of that 1431 overall value can be determined, the return value will be a cons of that
1135 value and 0. 1432 value and 0.
1136 @end defun 1433 @end defun
1137 1434
1138 @node Category Tables 1435 @defun ccl-reset-elapsed-time
1436 Resets the CCL interpreter's internal elapsed time registers.
1437 @end defun
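
A hedged sketch of how the two timing functions might be combined to
measure a single run follows.  The CCL program and the variable names
are illustrative assumptions; only the function names come from the
descriptions above.

@example
(let ((program (ccl-compile '(1 ((r0 = (r1 + r2))))))
      (status (make-vector 9 nil)))
  ;; Clear the counters so that only this run is measured.
  (ccl-reset-elapsed-time)
  (ccl-execute program status)
  (let ((elapsed (ccl-elapsed-time)))
    ;; The car is user time and the cdr is system time, in seconds.
    (message "CCL user time: %s, system time: %s"
             (car elapsed) (cdr elapsed))))
@end example

On systems where only one overall value can be determined, the second
number reported will be 0.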
1438
1439 @node CCL Examples, , Calling CCL, CCL
1440 @comment Node, Next, Previous, Up
1441 @subsection CCL Examples
1442
1443 This section is not yet written.
1444
1445 @node Category Tables, , CCL, MULE
1139 @section Category Tables 1446 @section Category Tables
1140 1447
1141 A category table is a type of char table used for keeping track of 1448 A category table is a type of char table used for keeping track of
1142 categories. Categories are used for classifying characters for use in 1449 categories. Categories are used for classifying characters for use in
1143 regexps -- you can refer to a category rather than having to use a 1450 regexps---you can refer to a category rather than having to use a
1144 complicated [] expression (and category lookups are significantly 1451 complicated [] expression (and category lookups are significantly
1145 faster). 1452 faster).
1146 1453
1147 There are 95 different categories available, one for each printable 1454 There are 95 different categories available, one for each printable
1148 character (including space) in the ASCII charset. Each category is 1455 character (including space) in the ASCII charset. Each category is