Mercurial repository: xemacs-beta
comparison man/lispref/mule.texi @ 343:8bec6624d99b r21-1-1
Import from CVS: tag r21-1-1
author | cvs |
---|---|
date | Mon, 13 Aug 2007 10:52:53 +0200 |
parents | 8619ce7e4c50 |
children | cc15677e0335 |
342:b036ce23deaa | 343:8bec6624d99b |
---|---|
1091 @defun encode-big5-char ch | 1091 @defun encode-big5-char ch |
1092 This function encodes the Big5 character @var{ch} to BIG5 | 1092 This function encodes the Big5 character @var{ch} to BIG5 |
1093 coding-system. The corresponding character code in Big5 is returned. | 1093 coding-system. The corresponding character code in Big5 is returned. |
1094 @end defun | 1094 @end defun |
1095 | 1095 |
1096 @node CCL | 1096 @node CCL, Category Tables, Coding Systems, MULE |
1097 @section CCL | 1097 @section CCL |
1098 | 1098 |
1099 @defun execute-ccl-program ccl-program status | 1099 CCL (Code Conversion Language) is a simple structured programming |
1100 This function executes @var{ccl-program} with registers initialized by | 1100 language designed for character coding conversions. A CCL program is |
1101 compiled to CCL code (represented by a vector of integers) and executed | |
1102 by the CCL interpreter embedded in Emacs. The CCL interpreter | |
1103 implements a virtual machine with 8 registers called @code{r0}, ..., | |
1104 @code{r7}, a number of control structures, and some I/O operators. Take | |
1105 care when using registers @code{r0} (used in implicit @dfn{set} | |
1106 statements) and especially @code{r7} (used internally by several | |
1107 statements and operations, especially for multiple return values and I/O | |
1108 operations). | |
1109 | |
1110 CCL is used for code conversion during process I/O and file I/O for | |
1111 non-ISO2022 coding systems. (It is the only way for a user to specify a | |
1112 code conversion function.) It is also used for calculating the code | |
1113 point of an X11 font from a character code. Since CCL is | |
1114 designed as a powerful programming language, it can also be used for more | |
1115 general calculations where efficiency matters. A combination of | |
1116 three or more arithmetic operations can be calculated faster by CCL than | |
1117 by Emacs Lisp. | |
1118 | |
1119 @strong{Warning:} The code in @file{src/mule-ccl.c} and | |
1120 @file{$packages/lisp/mule-base/mule-ccl.el} is the definitive | |
1121 description of CCL's semantics. The previous version of this section | |
1122 contained several typos and obsolete names left from earlier versions of | |
1123 MULE, and many may remain. (I am not an experienced CCL programmer; the | |
1124 few who know CCL well find writing English painful.) | |
1125 | |
1126 A CCL program transforms an input data stream into an output data | |
1127 stream. The input stream, held in a buffer of constant bytes, is left | |
1128 unchanged. The buffer may be filled by an external input operation, | |
1129 taken from an Emacs buffer, or taken from a Lisp string. The output | |
1130 buffer is a dynamic array of bytes, which can be written by an external | |
1131 output operation, inserted into an Emacs buffer, or returned as a Lisp | |
1132 string. | |
1133 | |
1134 A CCL program is a (Lisp) list containing two or three members. The | |
1135 first member is the @dfn{buffer magnification}, which indicates the | |
1136 required minimum size of the output buffer as a multiple of the input | |
1137 buffer. It is followed by the @dfn{main block} which executes while | |
1138 there is input remaining, and an optional @dfn{EOF block} which is | |
1139 executed when the input is exhausted. Both the main block and the EOF | |
1140 block are CCL blocks. | |
1141 | |
1142 A @dfn{CCL block} is either a CCL statement or a list of CCL statements. | |
1143 A @dfn{CCL statement} is either a @dfn{set statement} (either an integer | |
1144 or an @dfn{assignment}, which is a list of a register to receive the | |
1145 assignment, an assignment operator, and an expression) or a @dfn{control | |
1146 statement} (a list starting with a keyword, whose allowable syntax | |
1147 depends on the keyword). | |
1148 | |
1149 @menu | |
1150 * CCL Syntax:: CCL program syntax in BNF notation. | |
1151 * CCL Statements:: Semantics of CCL statements. | |
1152 * CCL Expressions:: Operators and expressions in CCL. | |
1153 * Calling CCL:: Running CCL programs. | |
1154 * CCL Examples:: The encoding functions for Big5 and KOI-8. | |
1155 @end menu | |
1156 | |
1157 @node CCL Syntax, CCL Statements, CCL, CCL | |
1158 @comment Node, Next, Previous, Up | |
1159 @subsection CCL Syntax | |
1160 | |
1161 The full syntax of a CCL program in BNF notation: | |
1162 | |
1163 @format | |
1164 CCL_PROGRAM := | |
1165 (BUFFER_MAGNIFICATION | |
1166 CCL_MAIN_BLOCK | |
1167 [ CCL_EOF_BLOCK ]) | |
1168 | |
1169 BUFFER_MAGNIFICATION := integer | |
1170 CCL_MAIN_BLOCK := CCL_BLOCK | |
1171 CCL_EOF_BLOCK := CCL_BLOCK | |
1172 | |
1173 CCL_BLOCK := | |
1174 STATEMENT | (STATEMENT [STATEMENT ...]) | |
1175 STATEMENT := | |
1176 SET | IF | BRANCH | LOOP | REPEAT | BREAK | READ | WRITE | |
1177 | CALL | END | |
1178 | |
1179 SET := | |
1180 (REG = EXPRESSION) | |
1181 | (REG ASSIGNMENT_OPERATOR EXPRESSION) | |
1182 | integer | |
1183 | |
1184 EXPRESSION := ARG | (EXPRESSION OPERATOR ARG) | |
1185 | |
1186 IF := (if EXPRESSION CCL_BLOCK [CCL_BLOCK]) | |
1187 BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...]) | |
1188 LOOP := (loop STATEMENT [STATEMENT ...]) | |
1189 BREAK := (break) | |
1190 REPEAT := | |
1191 (repeat) | |
1192 | (write-repeat [REG | integer | string]) | |
1193 | (write-read-repeat REG [integer | ARRAY]) | |
1194 READ := | |
1195 (read REG ...) | |
1196 | (read-if (REG OPERATOR ARG) CCL_BLOCK CCL_BLOCK) | |
1197 | (read-branch REG CCL_BLOCK [CCL_BLOCK ...]) | |
1198 WRITE := | |
1199 (write REG ...) | |
1200 | (write EXPRESSION) | |
1201 | (write integer) | (write string) | (write REG ARRAY) | |
1202 | string | |
1203 CALL := (call ccl-program-name) | |
1204 END := (end) | |
1205 | |
1206 REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7 | |
1207 ARG := REG | integer | |
1208 OPERATOR := | |
1209 + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | // | |
1210 | < | > | == | <= | >= | != | de-sjis | en-sjis | |
1211 ASSIGNMENT_OPERATOR := | |
1212 += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>= | |
1213 ARRAY := '[' integer ... ']' | |
1214 @end format | |
1215 | |
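The top-level shape given by the grammar can be checked mechanically. The following Python sketch is illustrative only and is not the parser in @file{mule-ccl.el}: it assumes a CCL program is modeled as nested Python lists in which Lisp symbols are @code{str} and CCL string constants are @code{bytes}, and the names @code{classify_statement}, @code{block_statements}, and @code{check_program} are hypothetical.

```python
# Hypothetical model: Lisp symbols are str, CCL string constants are bytes,
# and Lisp lists are Python lists.
ASSIGNMENT_OPERATORS = {"+=", "-=", "*=", "/=", "%=", "&=", "|=", "^=", "<<=", ">>="}
CONTROL_KEYWORDS = {
    "if", "branch", "loop", "break", "repeat", "write-repeat",
    "write-read-repeat", "read", "read-if", "read-branch", "write",
    "call", "end",
}


def classify_statement(stmt):
    """Return "set", "write", or the leading keyword of a CCL statement."""
    if isinstance(stmt, bool):
        raise ValueError("not a CCL statement: %r" % (stmt,))
    if isinstance(stmt, int):
        return "set"                      # naked integer
    if isinstance(stmt, bytes):
        return "write"                    # naked string
    if isinstance(stmt, list) and stmt:
        head = stmt[0]
        if isinstance(head, str) and head in CONTROL_KEYWORDS:
            return head
        if len(stmt) == 3 and isinstance(stmt[1], str) and (
            stmt[1] == "=" or stmt[1] in ASSIGNMENT_OPERATORS
        ):
            return "set"                  # (REG = EXPR) or (REG op= EXPR)
    raise ValueError("not a CCL statement: %r" % (stmt,))


def block_statements(block):
    """A CCL block is one statement or a list of statements."""
    try:
        classify_statement(block)
        return [block]
    except ValueError:
        pass
    if isinstance(block, list):
        for stmt in block:
            classify_statement(stmt)      # raises if any member is malformed
        return block
    raise ValueError("not a CCL block: %r" % (block,))


def check_program(prog):
    """Split (BUFFER_MAGNIFICATION MAIN-BLOCK [EOF-BLOCK]) into its parts."""
    if not isinstance(prog, list) or len(prog) not in (2, 3):
        raise ValueError("a CCL program has two or three members")
    magnification = prog[0]
    if isinstance(magnification, bool) or not isinstance(magnification, int):
        raise ValueError("BUFFER_MAGNIFICATION must be an integer")
    main = block_statements(prog[1])
    eof = block_statements(prog[2]) if len(prog) == 3 else []
    return magnification, main, eof
```

Only the shape is checked; the argument lists of individual control statements are not validated.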
1216 @node CCL Statements, CCL Expressions, CCL Syntax, CCL | |
1217 @comment Node, Next, Previous, Up | |
1218 @subsection CCL Statements | |
1219 | |
1220 The Emacs Code Conversion Language provides the following statement | |
1221 types: @dfn{set}, @dfn{if}, @dfn{branch}, @dfn{loop}, @dfn{repeat}, | |
1222 @dfn{break}, @dfn{read}, @dfn{write}, @dfn{call}, and @dfn{end}. | |
1223 | |
1224 @heading Set statement: | |
1225 | |
1226 The @dfn{set} statement has three variants with the syntaxes | |
1227 @samp{(@var{reg} = @var{expression})}, | |
1228 @samp{(@var{reg} @var{assignment_operator} @var{expression})}, and | |
1229 @samp{@var{integer}}. The assignment operator variation of the | |
1230 @dfn{set} statement works the same way as the corresponding C expression | |
1231 statement does. The assignment operators are @code{+=}, @code{-=}, | |
1232 @code{*=}, @code{/=}, @code{%=}, @code{&=}, @code{|=}, @code{^=}, | |
1233 @code{<<=}, and @code{>>=}, and they have the same meanings as in C. A | |
1234 "naked integer" @var{integer} is equivalent to a @dfn{set} statement of | |
1235 the form @code{(r0 = @var{integer})}. | |
1236 | |
1237 @heading I/O statements: | |
1238 | |
1239 The @dfn{read} statement takes one or more registers as arguments. It | |
1240 reads one byte (a C char) from the input into each register in turn. | |
1241 | |
1242 The @dfn{write} statement takes several forms. In the form @samp{(write @var{reg} | |
1243 ...)} it takes one or more registers as arguments and writes each in | |
1244 turn to the output. The integer in a register (interpreted as an | |
1245 Emchar) is encoded to multibyte form (i.e., Bufbytes) and written to the | |
1246 current output buffer. If it is less than 256, it is written as is. | |
1247 The forms @samp{(write @var{expression})} and @samp{(write | |
1248 @var{integer})} are treated analogously. The form @samp{(write | |
1249 @var{string})} writes the constant string to the output. A | |
1250 "naked string" @samp{@var{string}} is equivalent to the statement @samp{(write | |
1251 @var{string})}. The form @samp{(write @var{reg} @var{array})} writes | |
1252 the @var{reg}th element of the @var{array} to the output. | |
1253 | |
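The shorthand forms described above can be expanded mechanically. This Python sketch assumes Lisp symbols are modeled as @code{str}, CCL string constants as @code{bytes}, and lists as Python lists; @code{desugar} is a hypothetical helper, not part of CCL. Rewriting @code{(@var{reg} @var{op}= @var{expr})} as @code{(@var{reg} = (@var{reg} @var{op} @var{expr}))} states only the C-like meaning: when @var{expr} is itself compound, the result need not be valid CCL syntax, because the grammar allows only a register or an integer to the right of an operator.

```python
ASSIGNMENT_OPERATORS = ("+=", "-=", "*=", "/=", "%=", "&=", "|=", "^=", "<<=", ">>=")


def desugar(stmt):
    """Expand CCL shorthand statements into explicit forms (a model only)."""
    if isinstance(stmt, bool):
        raise ValueError("not a CCL statement")
    if isinstance(stmt, int):
        return ["r0", "=", stmt]                 # naked integer: (r0 = integer)
    if isinstance(stmt, bytes):
        return ["write", stmt]                   # naked string: (write string)
    if (
        isinstance(stmt, list)
        and len(stmt) == 3
        and isinstance(stmt[1], str)
        and stmt[1] in ASSIGNMENT_OPERATORS
    ):
        reg, op, expr = stmt
        return [reg, "=", [reg, op[:-1], expr]]  # X op= E means X = X op E, as in C
    return stmt                                  # already explicit
```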
1254 @heading Conditional statements: | |
1255 | |
1256 The @dfn{if} statement takes an @var{expression}, a @var{CCL block}, and | |
1257 an optional @var{second CCL block} as arguments. If the | |
1258 @var{expression} evaluates to non-zero, the first @var{CCL block} is | |
1259 executed. Otherwise, if there is a @var{second CCL block}, it is | |
1260 executed. | |
1261 | |
1262 The @dfn{read-if} variant of the @dfn{if} statement takes an | |
1263 @var{expression}, a @var{CCL block}, and an optional @var{second CCL | |
1264 block} as arguments. The @var{expression} must have the form | |
1265 @code{(@var{reg} @var{operator} @var{operand})} (where @var{operand} is | |
1266 a register or an integer). The @code{read-if} statement first reads | |
1267 from the input into the first register operand in the @var{expression}, | |
1268 then conditionally executes a CCL block just as the @code{if} statement | |
1269 does. | |
1270 | |
1271 The @dfn{branch} statement takes an @var{expression} and one or more CCL | |
1272 blocks as arguments. The CCL blocks are treated as a zero-indexed | |
1273 array, and the @code{branch} statement uses the @var{expression} as the | |
1274 index of the CCL block to execute. Null CCL blocks may be used as | |
1275 no-ops, continuing execution with the statement following the | |
1276 @code{branch} statement in the containing CCL block. Out-of-range | |
1277 values for the @var{expression} are also treated as no-ops. | |
1278 | |
1279 The @dfn{read-branch} variant of the @dfn{branch} statement takes a | |
1280 @var{register} and one or more CCL blocks as arguments. The | |
1281 @code{read-branch} statement first reads from | |
1282 the input into the @var{register}, then conditionally executes a CCL | |
1283 block just as the @code{branch} statement does. | |
1284 | |
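The selection rule for @code{branch} fits in a few lines. In this Python sketch a null block is modeled as @code{None} or an empty list, registers as a list of eight integers, and input as an iterator over bytes; treating negative values as out of range is an assumption of the sketch, and @code{select_branch} and @code{read_branch} are hypothetical names.

```python
def select_branch(index, blocks):
    """Return the CCL block a branch statement runs, or None for a no-op.

    Null blocks (modeled as None or []) and out-of-range indices do
    nothing; execution continues after the branch.  Negative indices are
    treated as out of range (an assumption of this model).
    """
    if 0 <= index < len(blocks):
        block = blocks[index]
        if block is None or block == []:
            return None
        return block
    return None


def read_branch(regs, reg, blocks, data):
    """(read-branch REG BLOCK...): read one byte into REG, then branch on it.

    `regs` is a list of eight ints, `reg` an index 0-7, and `data` an
    iterator over input bytes; these are modeling assumptions.
    """
    regs[reg] = next(data)
    return select_branch(regs[reg], blocks)
```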
1285 @heading Loop control statements: | |
1286 | |
1287 The @dfn{loop} statement creates a block with an implied jump from the | |
1288 end of the block back to its head. The loop is exited on a @code{break} | |
1289 statement, and continued without executing the tail by a @code{repeat} | |
1290 statement. | |
1291 | |
1292 The @dfn{break} statement, written @samp{(break)}, terminates the | |
1293 current loop and continues with the next statement in the current | |
1294 block. | |
1295 | |
1296 The @dfn{repeat} statement has three variants, @code{repeat}, | |
1297 @code{write-repeat}, and @code{write-read-repeat}. Each continues the | |
1298 current loop from its head, possibly after performing I/O. | |
1299 @code{repeat} takes no arguments and does no I/O before jumping. | |
1300 @code{write-repeat} takes a single argument (a register, an | |
1301 integer, or a string), writes it to the output, then jumps. | |
1302 @code{write-read-repeat} takes one or two arguments. The first must | |
1303 be a register. The second may be an integer or an array; if absent, it | |
1304 is implicitly set to the first (register) argument. | |
1305 @code{write-read-repeat} writes its second argument to the output, then | |
1306 reads from the input into the register, and finally jumps. See the | |
1307 @code{write} and @code{read} statements for the semantics of the I/O | |
1308 operations for each type of argument. | |
1309 | |
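The loop-control statements can be modeled with exceptions standing in for the jumps. The following Python sketch is a toy model under stated assumptions, not the CCL interpreter: statements are Python lists with symbols as @code{str}, a CCL array is a Python tuple, only byte values below 256 are written (the multibyte encoding of larger values is not modeled), and running out of input simply stops the main block. The names @code{run}, @code{run_stmt}, @code{emit}, and the exception classes are hypothetical.

```python
class Break(Exception):
    """Raised by (break): leave the innermost loop."""


class Repeat(Exception):
    """Raised by the repeat family: jump back to the head of the loop."""


class InputExhausted(Exception):
    """Raised when a read finds no more input."""


def reg(name):
    """Map "r0" .. "r7" to 0 .. 7."""
    if isinstance(name, str) and len(name) == 2 and name[0] == "r" and name[1] in "01234567":
        return int(name[1])
    raise ValueError("not a register: %r" % (name,))


def read_byte(inp):
    try:
        return next(inp)
    except StopIteration:
        raise InputExhausted from None


def emit(arg, regs, out):
    """Write a register, integer, or string; only values < 256 are modeled."""
    if isinstance(arg, bytes):
        out.extend(arg)
    elif isinstance(arg, int):
        out.append(arg)
    else:
        out.append(regs[reg(arg)])


def run_stmt(stmt, regs, inp, out):
    op, *args = stmt
    if op == "read":
        for r in args:
            regs[reg(r)] = read_byte(inp)
    elif op == "write":
        for a in args:
            emit(a, regs, out)
    elif op == "loop":
        while True:
            try:
                for s in args:
                    run_stmt(s, regs, inp, out)
            except Repeat:
                pass        # back to the head without running the tail
            except Break:
                return      # continue with the statement after the loop
            # falling off the end of the body also jumps back to the head
    elif op == "break":
        raise Break
    elif op == "repeat":
        raise Repeat
    elif op == "write-repeat":
        emit(args[0], regs, out)
        raise Repeat
    elif op == "write-read-repeat":
        r = reg(args[0])
        if len(args) == 1:
            out.append(regs[r])             # second argument defaults to REG
        elif isinstance(args[1], tuple):
            out.append(args[1][regs[r]])    # ARRAY: write its REGth element
        else:
            out.append(args[1])             # integer
        regs[r] = read_byte(inp)
        raise Repeat
    else:
        raise ValueError("statement not modeled: %r" % (op,))


def run(statements, data):
    """Run a main block (a list of statements) over DATA; return (output, regs)."""
    regs, out, inp = [0] * 8, bytearray(), iter(data)
    try:
        for s in statements:
            run_stmt(s, regs, inp, out)
    except InputExhausted:
        pass                                # the main block stops at end of input
    return bytes(out), regs
```

For example, with @code{UPPER} a 256-entry tuple mapping lowercase ASCII bytes to uppercase, the block @code{[["read", "r0"], ["loop", ["write-read-repeat", "r0", UPPER]]]} reads the first byte before the loop; each iteration then writes the translated byte and reads the next, so @code{b"abc1"} becomes @code{b"ABC1"}.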
1310 @heading Other control statements: | |
1311 | |
1312 The @dfn{call} statement, written @samp{(call @var{ccl-program-name})}, | |
1313 executes a CCL program as a subroutine. It does not return a value to | |
1314 the caller, but can modify the register status. | |
1315 | |
1316 The @dfn{end} statement, written @samp{(end)}, terminates the CCL | |
1317 program successfully, and returns to the caller (which may be a CCL | |
1318 program). It does not alter the status of the registers. | |
1319 | |
1320 @node CCL Expressions, Calling CCL, CCL Statements, CCL | |
1321 @comment Node, Next, Previous, Up | |
1322 @subsection CCL Expressions | |
1323 | |
1324 CCL, unlike Lisp, uses infix expressions. The simplest CCL expressions | |
1325 consist of a single @var{operand}, either a register (one of @code{r0}, | |
1326 ..., @code{r7}) or an integer. Complex expressions are lists of the | |
1327 form @code{( @var{expression} @var{operator} @var{operand} )}. Unlike | |
1328 C, assignments are not expressions. | |
1329 | |
1330 In the following table, @var{X} is the target register for a @dfn{set}. | |
1331 In subexpressions, this is implicitly @code{r7}. This means that | |
1332 @code{>8}, @code{//}, @code{de-sjis}, and @code{en-sjis} cannot be used | |
1333 freely in subexpressions, since they return parts of their values in | |
1334 @code{r7}. @var{Y} may be an expression, register, or integer, while | |
1335 @var{Z} must be a register or an integer. | |
1336 | |
1337 @multitable @columnfractions .22 .14 .09 .55 | |
1338 @item Name @tab Operator @tab Code @tab C-like Description | |
1339 @item CCL_PLUS @tab @code{+} @tab 0x00 @tab X = Y + Z | |
1340 @item CCL_MINUS @tab @code{-} @tab 0x01 @tab X = Y - Z | |
1341 @item CCL_MUL @tab @code{*} @tab 0x02 @tab X = Y * Z | |
1342 @item CCL_DIV @tab @code{/} @tab 0x03 @tab X = Y / Z | |
1343 @item CCL_MOD @tab @code{%} @tab 0x04 @tab X = Y % Z | |
1344 @item CCL_AND @tab @code{&} @tab 0x05 @tab X = Y & Z | |
1345 @item CCL_OR @tab @code{|} @tab 0x06 @tab X = Y | Z | |
1346 @item CCL_XOR @tab @code{^} @tab 0x07 @tab X = Y ^ Z | |
1347 @item CCL_LSH @tab @code{<<} @tab 0x08 @tab X = Y << Z | |
1348 @item CCL_RSH @tab @code{>>} @tab 0x09 @tab X = Y >> Z | |
1349 @item CCL_LSH8 @tab @code{<8} @tab 0x0A @tab X = (Y << 8) | Z | |
1350 @item CCL_RSH8 @tab @code{>8} @tab 0x0B @tab X = Y >> 8, r[7] = Y & 0xFF | |
1351 @item CCL_DIVMOD @tab @code{//} @tab 0x0C @tab X = Y / Z, r[7] = Y % Z | |
1352 @item CCL_LS @tab @code{<} @tab 0x10 @tab X = (X < Y) | |
1353 @item CCL_GT @tab @code{>} @tab 0x11 @tab X = (X > Y) | |
1354 @item CCL_EQ @tab @code{==} @tab 0x12 @tab X = (X == Y) | |
1355 @item CCL_LE @tab @code{<=} @tab 0x13 @tab X = (X <= Y) | |
1356 @item CCL_GE @tab @code{>=} @tab 0x14 @tab X = (X >= Y) | |
1357 @item CCL_NE @tab @code{!=} @tab 0x15 @tab X = (X != Y) | |
1358 @item CCL_ENCODE_SJIS @tab @code{en-sjis} @tab 0x16 @tab X = HIGHER_BYTE (SJIS (Y, Z)) | |
1359 @item @tab @tab @tab r[7] = LOWER_BYTE (SJIS (Y, Z)) | |
1360 @item CCL_DECODE_SJIS @tab @code{de-sjis} @tab 0x17 @tab X = HIGHER_BYTE (DE-SJIS (Y, Z)) | |
1361 @item @tab @tab @tab r[7] = LOWER_BYTE (DE-SJIS (Y, Z)) | |
1362 @end multitable | |
1363 | |
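The table can be turned into a small evaluator. The following Python sketch is an illustrative model, not the interpreter in @file{src/mule-ccl.c}: it assumes expressions are nested Python lists of the form @code{[left, op, arg]} with registers named @code{"r0"} to @code{"r7"}, stores each compound subexpression's value in @code{r7} as described above, and refuses to evaluate @code{>8} or @code{//} with @code{r7} as the target, since the table does not say which of the two writes would win. @code{de-sjis} and @code{en-sjis} are left out; Python integers are unbounded, so overflow and C's treatment of shifts on negative values are not modeled, and division by zero raises a Python error. All names are hypothetical.

```python
import operator


def c_div(y, z):
    """Integer division truncating toward zero, as C does."""
    q = abs(y) // abs(z)
    return q if (y >= 0) == (z >= 0) else -q


def c_mod(y, z):
    """Remainder consistent with c_div, as C's % is."""
    return y - z * c_div(y, z)


OPS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    "/": c_div, "%": c_mod,
    "&": operator.and_, "|": operator.or_, "^": operator.xor,
    "<<": operator.lshift, ">>": operator.rshift,
    "<8": lambda y, z: (y << 8) | z,
    "<": lambda y, z: int(y < z), ">": lambda y, z: int(y > z),
    "==": lambda y, z: int(y == z), "<=": lambda y, z: int(y <= z),
    ">=": lambda y, z: int(y >= z), "!=": lambda y, z: int(y != z),
}


def reg(name):
    """Map "r0" .. "r7" to 0 .. 7."""
    if isinstance(name, str) and len(name) == 2 and name[0] == "r" and name[1] in "01234567":
        return int(name[1])
    raise ValueError("not a register: %r" % (name,))


def operand(arg, regs):
    """An operand is an integer constant or a register."""
    return arg if isinstance(arg, int) else regs[reg(arg)]


def eval_into(expr, regs, target):
    """Evaluate a CCL expression and store its value in regs[target]."""
    if not isinstance(expr, list):
        regs[target] = operand(expr, regs)
        return
    left, op, arg = expr
    if isinstance(left, list):
        eval_into(left, regs, 7)          # a subexpression's X is r7
        y = regs[7]
    else:
        y = operand(left, regs)
    z = operand(arg, regs)
    if op in (">8", "//"):
        if target == 7:
            # X and the r7 side value would collide; refuse rather than guess.
            raise ValueError("%s cannot be used with r7 as its target" % op)
        if op == ">8":
            regs[target], regs[7] = y >> 8, y & 0xFF
        else:
            regs[target], regs[7] = c_div(y, z), c_mod(y, z)
        return
    regs[target] = OPS[op](y, z)
```

Evaluating @code{(r0 = (r1 // 10))} with @code{r1} holding 47 leaves 4 in @code{r0} and the remainder 7 in @code{r7}, which is why a following statement that relies on @code{r7} must be written with care.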
1364 The CCL operators are as in C, with the addition of CCL_LSH8, CCL_RSH8, | |
1365 CCL_DIVMOD, CCL_ENCODE_SJIS, and CCL_DECODE_SJIS. The CCL_ENCODE_SJIS | |
1366 and CCL_DECODE_SJIS operators treat their two operands as the high and | |
1367 low bytes of a two-byte character code. (SJIS stands for Shift JIS, an | |
1368 encoding of Japanese characters used by Microsoft. CCL_ENCODE_SJIS is a | |
1369 complicated transformation of the Japanese standard JIS encoding to | |
1370 Shift JIS. CCL_DECODE_SJIS is its inverse.) It is somewhat odd to | |
1371 represent the SJIS operations in infix form. | |
1372 | |
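The SJIS transformation is ordinary integer arithmetic. The following Python sketch uses the textbook formulas for 7-bit JIS X 0208 bytes (each in 0x21..0x7E); it was not taken from @file{src/mule-ccl.c}, which may expect a different byte range, and the function names are hypothetical. Each function returns a (high, low) pair, mirroring how the table returns the higher byte in @var{X} and the lower byte in @code{r[7]}.

```python
def jis_to_sjis(j1, j2):
    """Map a JIS X 0208 byte pair (0x21..0x7E each) to Shift JIS (high, low)."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("JIS bytes must be in 0x21..0x7E")
    # Two JIS rows share one Shift JIS lead byte; lead bytes skip 0xA0..0xDF.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 & 1:                      # odd row: trail bytes 0x40..0x9E, skipping 0x7F
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1
    else:                           # even row: trail bytes 0x9F..0xFC
        s2 = j2 + 0x7E
    return s1, s2


def sjis_to_jis(s1, s2):
    """Inverse of jis_to_sjis for the byte pairs it produces."""
    t = s1 - 0x40 if s1 >= 0xE0 else s1    # close the gap left for 0xA0..0xDF
    base = (t - 0x70) * 2                  # the even row of the pair
    if s2 >= 0x9F:
        return base, s2 - 0x7E
    return base - 1, s2 - (0x20 if s2 >= 0x80 else 0x1F)
```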
1373 @node Calling CCL, CCL Examples, CCL Expressions, CCL | |
1374 @comment Node, Next, Previous, Up | |
1375 @subsection Calling CCL | |
1376 | |
1377 CCL programs are called automatically during Emacs buffer I/O when the | |
1378 external representation has a coding system type of @code{shift-jis}, | |
1379 @code{big5}, or @code{ccl}. The program is specified by the coding | |
1380 system (@pxref{Coding Systems}). You can also call CCL programs from | |
1381 other CCL programs, and from Lisp using these functions: | |
1382 | |
1383 @defun ccl-execute ccl-program status | |
1384 Execute @var{ccl-program} with registers initialized by | |
1101 @var{status}. @var{ccl-program} is a vector of compiled CCL code | 1385 @var{status}. @var{ccl-program} is a vector of compiled CCL code |
1102 created by @code{ccl-compile}. @var{status} must be a vector of nine | 1386 created by @code{ccl-compile}. It is an error for the program to try to |
1387 execute a CCL I/O command. @var{status} must be a vector of nine | |
1103 values, specifying the initial value for the R0, R1 .. R7 registers and | 1388 values, specifying the initial value for the R0, R1 .. R7 registers and |
1104 for the instruction counter IC. A @code{nil} value for a register | 1389 for the instruction counter IC. A @code{nil} value for a register |
1105 initializer causes the register to be set to 0. A @code{nil} value for | 1390 initializer causes the register to be set to 0. A @code{nil} value for |
1106 the IC initializer causes execution to start at the beginning of the | 1391 the IC initializer causes execution to start at the beginning of the |
1107 program. When the program is done, @var{status} is modified (by | 1392 program. When the program is done, @var{status} is modified (by |
1108 side-effect) to contain the ending values for the corresponding | 1393 side-effect) to contain the ending values for the corresponding |
1109 registers and IC. | 1394 registers and IC. |
1110 @end defun | 1395 @end defun |
1111 | 1396 |
1112 @defun execute-ccl-program-string ccl-program status str | 1397 @defun ccl-execute-on-string ccl-program status str &optional continue |
1113 This function executes @var{ccl-program} with initial @var{status} on | 1398 Execute @var{ccl-program} with initial @var{status} on |
1114 @var{str}. @var{ccl-program} is a vector of compiled CCL code | 1399 @var{str}. @var{ccl-program} is a vector of compiled CCL code |
1115 created by @code{ccl-compile}. @var{status} must be a vector of nine | 1400 created by @code{ccl-compile}. @var{status} must be a vector of nine |
1116 values, specifying the initial value for the R0, R1 .. R7 registers and | 1401 values, specifying the initial value for the R0, R1 .. R7 registers and |
1117 for the instruction counter IC. A @code{nil} value for a register | 1402 for the instruction counter IC. A @code{nil} value for a register |
1118 initializer causes the register to be set to 0. A @code{nil} value for | 1403 initializer causes the register to be set to 0. A @code{nil} value for |
1119 the IC initializer causes execution to start at the beginning of the | 1404 the IC initializer causes execution to start at the beginning of the |
1120 program. When the program is done, @var{status} is modified (by | 1405 program. An optional fourth argument @var{continue}, if non-nil, causes |
1406 the IC to | |
1407 remain on the unsatisfied read operation if the program terminates due | |
1408 to exhaustion of the input buffer. Otherwise the IC is set to the end | |
1409 of the program. When the program is done, @var{status} is modified (by | |
1121 side-effect) to contain the ending values for the corresponding | 1410 side-effect) to contain the ending values for the corresponding |
1122 registers and IC. Returns the resulting string. | 1411 registers and IC. Returns the resulting string. |
1123 @end defun | 1412 @end defun |
1124 | 1413 |
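The interplay of @var{status} and @var{continue} can be illustrated with a toy interpreter. This Python sketch is a model only: instead of compiled CCL code it runs a hypothetical three-instruction program format (@code{read}, @code{write}, @code{jump}), the parameter is spelled @code{continue_} because @code{continue} is a Python keyword, and @code{run_on_string} is not a real function. It shows the @code{nil} defaults, the in-place update of @var{status}, and why a non-@code{nil} @var{continue} is useful: a later call with the same @var{status} can presumably resume at the pending read.

```python
def run_on_string(program, status, data, continue_=False):
    """Model of the status/IC conventions of ccl-execute-on-string.

    `program` uses a hypothetical instruction format, not real CCL code:
    ("read", r), ("write", r), and ("jump", target).
    """
    if len(status) != 9:
        raise ValueError("status must have nine elements: r0..r7 and IC")
    regs = [0 if v is None else v for v in status[:8]]   # nil register -> 0
    ic = 0 if status[8] is None else status[8]           # nil IC -> start
    out, pos = bytearray(), 0
    while ic < len(program):
        op, arg = program[ic]
        if op == "read":
            if pos == len(data):                # input exhausted
                if not continue_:
                    ic = len(program)           # IC set to the end of the program
                break                           # otherwise IC stays on this read
            regs[arg] = data[pos]
            pos += 1
        elif op == "write":
            out.append(regs[arg])
        elif op == "jump":
            ic = arg
            continue
        ic += 1
    status[:] = regs + [ic]                     # update status by side effect
    return bytes(out)
```

Calling it twice on @code{b"ab"} and then @code{b"cd"} with @code{continue_=True} and the same status list returns @code{b"ab"} and then @code{b"cd"}, with the IC left on the pending read after each call.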
1125 @defun ccl-reset-elapsed-time | 1414 To call a CCL program from another CCL program, it must first be |
1126 This function resets the internal value which holds the time elapsed by | 1415 registered: |
1127 CCL interpreter. | 1416 |
1128 @end defun | 1417 @defun register-ccl-program name ccl-program |
1418 Register @var{name} for CCL program @var{ccl-program} in | |
1419 @code{ccl-program-table}. @var{ccl-program} should be the compiled form of | |
1420 a CCL program, or @code{nil}. Returns the index number of the registered CCL | |
1421 program. | |
1422 @end defun | |
1423 | |
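A minimal Python sketch of the registration step and of how @code{(call @var{ccl-program-name})} then finds a program by name. It is a model only: programs are Python callables acting on a list of eight registers, @code{CCLProgramTable} is hypothetical, and two behaviors are assumptions rather than documented facts: registering a name again reuses its index, and calling a name registered with @code{nil} is an error.

```python
class CCLProgramTable:
    """Hypothetical model of ccl-program-table: names map to index numbers."""

    def __init__(self):
        self._index = {}        # name -> index number
        self._programs = []     # index number -> program (None models nil)

    def register(self, name, program):
        """Register NAME for PROGRAM and return its index number.

        Assumption: registering an existing name again replaces its
        program and keeps the old index.
        """
        if name in self._index:
            idx = self._index[name]
            self._programs[idx] = program
        else:
            idx = len(self._programs)
            self._index[name] = idx
            self._programs.append(program)
        return idx

    def call(self, name, regs):
        """Model (call NAME): run the program for its effect on the registers."""
        program = self._programs[self._index[name]]   # KeyError if never registered
        if program is None:
            raise LookupError("no program registered for %r" % (name,))
        program(regs)           # no return value; results travel in registers
```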
1424 Information about the processor time used by the CCL interpreter can be | |
1425 obtained using these functions: | |
1129 | 1426 |
1130 @defun ccl-elapsed-time | 1427 @defun ccl-elapsed-time |
1131 This function returns the time elapsed by CCL interpreter as cons of | 1428 Returns the elapsed processor time of the CCL interpreter as a cons of |
1132 user and system time. This measures processor time, not real time. | 1429 user and system time, as |
1133 Both values are floating point numbers measured in seconds. If only one | 1430 floating point numbers measured in seconds. If only one |
1134 overall value can be determined, the return value will be a cons of that | 1431 overall value can be determined, the return value will be a cons of that |
1135 value and 0. | 1432 value and 0. |
1136 @end defun | 1433 @end defun |
1137 | 1434 |
1138 @node Category Tables | 1435 @defun ccl-reset-elapsed-time |
1436 Resets the CCL interpreter's internal elapsed time registers. | |
1437 @end defun | |
1438 | |
1439 @node CCL Examples, , Calling CCL, CCL | |
1440 @comment Node, Next, Previous, Up | |
1441 @subsection CCL Examples | |
1442 | |
1443 This section is not yet written. | |
1444 | |
1445 @node Category Tables, , CCL, MULE | |
1139 @section Category Tables | 1446 @section Category Tables |
1140 | 1447 |
1141 A category table is a type of char table used for keeping track of | 1448 A category table is a type of char table used for keeping track of |
1142 categories. Categories are used for classifying characters for use in | 1449 categories. Categories are used for classifying characters for use in |
1143 regexps -- you can refer to a category rather than having to use a | 1450 regexps -- you can refer to a category rather than having to use a |