comparison man/lispref/mule.texi @ 343:8bec6624d99b r21-1-1

Import from CVS: tag r21-1-1
author cvs
date Mon, 13 Aug 2007 10:52:53 +0200
parents 8619ce7e4c50
children cc15677e0335
@defun encode-big5-char ch
This function encodes the Big5 character @var{ch} into the Big5 coding
system and returns the corresponding character code in Big5.
@end defun

@node CCL, Category Tables, Coding Systems, MULE
@section CCL

CCL (Code Conversion Language) is a simple structured programming
language designed for character coding conversions. A CCL program is
compiled to CCL code (represented by a vector of integers) and executed
by the CCL interpreter embedded in Emacs. The CCL interpreter
implements a virtual machine with 8 registers called @code{r0}, ...,
@code{r7}, a number of control structures, and some I/O operators. Take
care when using register @code{r0}, which implicit @dfn{set} statements
assign to, and especially @code{r7}, which several statements and
operations use internally, notably for multiple return values and I/O.

CCL is used for code conversion during process I/O and file I/O for
non-ISO 2022 coding systems. (It is the only way for a user to specify a
code conversion function.) It is also used for calculating the code
point of an X11 font from a character code. Because CCL is designed as
a powerful programming language, it can also be used for more generic
calculations where efficiency matters: a combination of three or more
arithmetic operations can be calculated faster by CCL than by Emacs
Lisp.

@strong{Warning:} The code in @file{src/mule-ccl.c} and
@file{$packages/lisp/mule-base/mule-ccl.el} is the definitive
description of CCL's semantics. The previous version of this section
contained several typos and obsolete names left over from earlier
versions of MULE, and many may remain. (I am not an experienced CCL
programmer; the few who know CCL well find writing English painful.)

A CCL program transforms an input data stream into an output data
stream. The input stream, held in a buffer of constant bytes, is left
unchanged. The buffer may be filled by an external input operation,
taken from an Emacs buffer, or taken from a Lisp string. The output
buffer is a dynamic array of bytes, which can be written by an external
output operation, inserted into an Emacs buffer, or returned as a Lisp
string.

A CCL program is a (Lisp) list containing two or three members. The
first member is the @dfn{buffer magnification}, which indicates the
required minimum size of the output buffer as a multiple of the size of
the input buffer. It is followed by the @dfn{main block}, which executes
while input remains, and an optional @dfn{EOF block}, which is executed
when the input is exhausted. Both the main block and the EOF block are
CCL blocks.
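
As a rough illustration of this two- or three-member structure, the following Python sketch models only the driver: the main block runs until a read finds the input exhausted, and then the optional EOF block runs once. It is not the interpreter in @file{src/mule-ccl.c}; modelling blocks as Python callables is an assumption, and so is reading the buffer magnification as ``input length times magnification''. The names @code{Machine}, @code{run}, @code{copy_main} and @code{newline_eof} are hypothetical.

```python
# Toy model of the CCL program driver; not XEmacs's implementation.
class EndOfInput(Exception):
    """Raised when a read finds the input buffer exhausted."""


class Machine:
    """Eight registers, a constant input buffer and a growable output."""

    def __init__(self, data, magnification):
        self.r = [0] * 8
        self.inp = bytes(data)   # the input is never modified
        self.pos = 0
        self.out = []            # dynamic output array
        # Assumption: minimum output size = input length * magnification.
        self.min_out = len(self.inp) * magnification

    def read(self):
        if self.pos >= len(self.inp):
            raise EndOfInput
        byte = self.inp[self.pos]
        self.pos += 1
        return byte

    def write(self, value):
        self.out.append(value)


def run(program, data):
    """program is (magnification, main_block[, eof_block])."""
    magnification, main_block = program[0], program[1]
    eof_block = program[2] if len(program) > 2 else None
    m = Machine(data, magnification)
    try:
        main_block(m)            # runs until it reads past the end of input
    except EndOfInput:
        pass
    if eof_block is not None:
        eof_block(m)             # executed once the input is exhausted
    return bytes(m.out), m.min_out


def copy_main(m):
    while True:                  # like (loop (read r0) (write r0))
        m.r[0] = m.read()
        m.write(m.r[0])


def newline_eof(m):
    m.write(0x0A)                # append a newline at end of input
```

In this model, @code{run((1, copy_main, newline_eof), b"abc")} returns @code{(b"abc\n", 3)}.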

A @dfn{CCL block} is either a CCL statement or a list of CCL statements.
A @dfn{CCL statement} is either a @dfn{set statement} (either an integer
or an @dfn{assignment}, which is a list of a register to receive the
assignment, an assignment operator, and an expression) or a @dfn{control
statement} (a list starting with a keyword, whose allowable syntax
depends on the keyword).

@menu
* CCL Syntax:: CCL program syntax in BNF notation.
* CCL Statements:: Semantics of CCL statements.
* CCL Expressions:: Operators and expressions in CCL.
* Calling CCL:: Running CCL programs.
* CCL Examples:: The encoding functions for Big5 and KOI-8.
@end menu

@node CCL Syntax, CCL Statements, CCL, CCL
@comment Node, Next, Previous, Up
@subsection CCL Syntax

The full syntax of a CCL program in BNF notation:

@format
CCL_PROGRAM :=
        (BUFFER_MAGNIFICATION
         CCL_MAIN_BLOCK
         [ CCL_EOF_BLOCK ])

BUFFER_MAGNIFICATION := integer
CCL_MAIN_BLOCK := CCL_BLOCK
CCL_EOF_BLOCK := CCL_BLOCK

CCL_BLOCK :=
        STATEMENT | (STATEMENT [STATEMENT ...])
STATEMENT :=
        SET | IF | BRANCH | LOOP | REPEAT | BREAK | READ | WRITE
        | CALL | END

SET :=
        (REG = EXPRESSION)
        | (REG ASSIGNMENT_OPERATOR EXPRESSION)
        | integer

EXPRESSION := ARG | (EXPRESSION OPERATOR ARG)

IF := (if EXPRESSION CCL_BLOCK [CCL_BLOCK])
BRANCH := (branch EXPRESSION CCL_BLOCK [CCL_BLOCK ...])
LOOP := (loop STATEMENT [STATEMENT ...])
BREAK := (break)
REPEAT :=
        (repeat)
        | (write-repeat [REG | integer | string])
        | (write-read-repeat REG [integer | ARRAY])
READ :=
        (read REG ...)
        | (read-if (REG OPERATOR ARG) CCL_BLOCK CCL_BLOCK)
        | (read-branch REG CCL_BLOCK [CCL_BLOCK ...])
WRITE :=
        (write REG ...)
        | (write EXPRESSION)
        | (write integer) | (write string) | (write REG ARRAY)
        | string
CALL := (call ccl-program-name)
END := (end)

REG := r0 | r1 | r2 | r3 | r4 | r5 | r6 | r7
ARG := REG | integer
OPERATOR :=
        + | - | * | / | % | & | '|' | ^ | << | >> | <8 | >8 | //
        | < | > | == | <= | >= | != | de-sjis | en-sjis
ASSIGNMENT_OPERATOR :=
        += | -= | *= | /= | %= | &= | '|=' | ^= | <<= | >>=
ARRAY := '[' integer ... ']'
@end format

@node CCL Statements, CCL Expressions, CCL Syntax, CCL
@comment Node, Next, Previous, Up
@subsection CCL Statements

The Emacs Code Conversion Language provides the following statement
types: @dfn{set}, @dfn{if}, @dfn{branch}, @dfn{loop}, @dfn{repeat},
@dfn{break}, @dfn{read}, @dfn{write}, @dfn{call}, and @dfn{end}.

@heading Set statement:

The @dfn{set} statement has three variants with the syntaxes
@samp{(@var{reg} = @var{expression})},
@samp{(@var{reg} @var{assignment_operator} @var{expression})}, and
@samp{@var{integer}}. The assignment operator variation of the
@dfn{set} statement works the same way as the corresponding C expression
statement does. The assignment operators are @code{+=}, @code{-=},
@code{*=}, @code{/=}, @code{%=}, @code{&=}, @code{|=}, @code{^=},
@code{<<=}, and @code{>>=}, and they have the same meanings as in C. A
``naked integer'' @var{integer} is equivalent to a set statement of
the form @code{(r0 = @var{integer})}.
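
A minimal Python model of the three @dfn{set} variants follows. It assumes registers behave like unbounded integers (real CCL registers are C integers and can overflow) and that @code{/=} and @code{%=} truncate toward zero as in C. Only a register name or an integer is accepted on the right-hand side; nested expressions are not handled. The names @code{execute_set} and @code{ASSIGN_OPS} are hypothetical.

```python
import operator


def c_div(a, b):
    """C-style integer division, truncating toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def c_mod(a, b):
    """C-style remainder: the sign follows the dividend."""
    return a - c_div(a, b) * b


ASSIGN_OPS = dict([
    ("+=", operator.add), ("-=", operator.sub), ("*=", operator.mul),
    ("/=", c_div), ("%=", c_mod), ("&=", operator.and_),
    ("|=", operator.or_), ("^=", operator.xor),
    ("<<=", operator.lshift), (">>=", operator.rshift),
])


def value_of(regs, arg):
    """An argument is either an int or a register name such as "r3"."""
    return regs[int(arg[1:])] if isinstance(arg, str) else arg


def execute_set(regs, stmt):
    """Apply one set statement to regs, a list of eight ints (r0..r7)."""
    if isinstance(stmt, int):            # naked integer == (r0 = integer)
        regs[0] = stmt
        return
    reg, op, arg = stmt                  # e.g. ("r1", "<<=", 2)
    idx, val = int(reg[1:]), value_of(regs, arg)
    if op == "=":
        regs[idx] = val
    else:
        regs[idx] = ASSIGN_OPS[op](regs[idx], val)
```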

@heading I/O statements:

The @dfn{read} statement takes one or more registers as arguments. It
reads one byte (a C char) from the input into each register in turn.

The @dfn{write} statement takes several forms. In the form
@samp{(write @var{reg} ...)} it takes one or more registers as arguments
and writes each in turn to the output. The integer in a register
(interpreted as an Emchar) is encoded to multibyte form (i.e., Bufbytes)
and written to the current output buffer. If it is less than 256, it is
written as is. The forms @samp{(write @var{expression})} and
@samp{(write @var{integer})} are treated analogously. The form
@samp{(write @var{string})} writes the constant string to the output. A
``naked string'' @samp{@var{string}} is equivalent to the statement
@samp{(write @var{string})}. The form @samp{(write @var{reg} @var{array})}
writes the @var{reg}th element of the @var{array} to the output.
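
The write forms can be sketched in Python as below. The multibyte (Bufbyte) encoding of values of 256 and above is deliberately delegated to a hypothetical @code{encode_char} hook, since this section does not specify that encoding; the function names are likewise illustrative.

```python
def write_value(out, value, encode_char=None):
    """(write REG), (write EXPRESSION) or (write integer)."""
    if value < 256:
        out.append(value)              # small values are written as is
    else:
        # Hypothetical hook: XEmacs would emit the character's internal
        # multibyte (Bufbyte) form here; that encoding is not modelled.
        out.extend(encode_char(value))


def write_string(out, s):
    """(write "string"), or a naked string: copy the constant bytes."""
    out.extend(s)


def write_indexed(out, index, array, encode_char=None):
    """(write REG ARRAY): write element number REG of ARRAY.

    An out-of-range index raises IndexError here; this section does not
    say what CCL itself does in that case.
    """
    write_value(out, array[index], encode_char)
```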

@heading Conditional statements:

The @dfn{if} statement takes an @var{expression}, a @var{CCL block}, and
an optional @var{second CCL block} as arguments. If the
@var{expression} evaluates to non-zero, the first @var{CCL block} is
executed. Otherwise, if there is a @var{second CCL block}, it is
executed.

The @dfn{read-if} variant of the @dfn{if} statement takes an
@var{expression}, a @var{CCL block}, and an optional @var{second CCL
block} as arguments. The @var{expression} must have the form
@code{(@var{reg} @var{operator} @var{operand})} (where @var{operand} is
a register or an integer). The @code{read-if} statement first reads
from the input into the register @var{reg}, then conditionally executes
a CCL block just as the @code{if} statement does.

The @dfn{branch} statement takes an @var{expression} and one or more CCL
blocks as arguments. The CCL blocks are treated as a zero-indexed
array, and the @code{branch} statement uses the @var{expression} as the
index of the CCL block to execute. Null CCL blocks may be used as
no-ops, continuing execution with the statement following the
@code{branch} statement in the containing CCL block. Out-of-range
values for the @var{expression} are also treated as no-ops.

The @dfn{read-branch} variant of the @dfn{branch} statement takes a
@var{register} and one or more CCL blocks as arguments. The
@code{read-branch} statement first reads from the input into the
@var{register}, then executes the selected CCL block just as the
@code{branch} statement does.
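
The no-op rules of @code{branch} and @code{read-branch} can be checked against the small Python model below. Representing a null CCL block as @code{None} and blocks as Python callables are assumptions of the sketch, end-of-input handling for the read is omitted, and the names are hypothetical.

```python
class Mini:
    """Minimal machine: eight registers and a byte input."""

    def __init__(self, data):
        self.r = [0] * 8
        self.data, self.pos = bytes(data), 0

    def read(self):
        byte = self.data[self.pos]     # end-of-input handling omitted
        self.pos += 1
        return byte


def execute_branch(index, blocks, m):
    """(branch EXPRESSION BLOCK0 BLOCK1 ...): run blocks[index].

    None stands for a null block; like an out-of-range index it is a
    no-op, so execution simply continues after the branch.
    """
    if 0 <= index < len(blocks) and blocks[index] is not None:
        blocks[index](m)


def execute_read_branch(reg, blocks, m):
    """(read-branch REG BLOCK0 ...): read a byte into REG, then branch."""
    m.r[reg] = m.read()
    execute_branch(m.r[reg], blocks, m)
```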

@heading Loop control statements:

The @dfn{loop} statement creates a block with an implied jump from the
end of the block back to its head. A @code{break} statement exits the
loop, and a @code{repeat} statement jumps back to the head without
executing the rest of the block.

The @dfn{break} statement, written @samp{(break)}, terminates the
current loop and continues with the next statement in the current
block.

The @dfn{repeat} statement has three variants, @code{repeat},
@code{write-repeat}, and @code{write-read-repeat}. Each continues the
current loop from its head, possibly after performing I/O.
@code{repeat} takes no arguments and does no I/O before jumping.
@code{write-repeat} takes a single argument (a register, an
integer, or a string), writes it to the output, then jumps.
@code{write-read-repeat} takes one or two arguments. The first must
be a register. The second may be an integer or an array; if it is
absent, the first (register) argument is used in its place.
@code{write-read-repeat} writes its second argument to the output, then
reads from the input into the register, and finally jumps. See the
@code{write} and @code{read} statements for the semantics of the I/O
operations for each type of argument.
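
As an illustration of @code{loop} together with @code{write-read-repeat}, a main block that maps every input byte through a table might be written @code{((read r0) (loop (write-read-repeat r0 @var{table})))}. The Python function below mimics that control flow; it is a model, not compiled CCL, and @code{translate} is a hypothetical name.

```python
def translate(data, table):
    """Model of the CCL main block

        ((read r0)
         (loop (write-read-repeat r0 TABLE)))

    write-read-repeat writes element r0 of TABLE, reads the next byte
    into r0 and jumps back to the head of the loop.
    """
    it = iter(data)
    out = []
    r0 = next(it, None)            # (read r0); None means input exhausted
    while r0 is not None:          # head of the loop
        out.append(table[r0])      # write: element r0 of TABLE
        r0 = next(it, None)        # read: next byte into r0, then repeat
    return bytes(out)              # reading past the end ends the block
```

For example, with the upcasing table @code{[b - 32 if 97 <= b <= 122 else b for b in range(256)]}, @code{translate(b"Hi, ccl!", table)} returns @code{b"HI, CCL!"}.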

@heading Other control statements:

The @dfn{call} statement, written @samp{(call @var{ccl-program-name})},
executes a CCL program as a subroutine. It does not return a value to
the caller, but it can modify the registers.

The @dfn{end} statement, written @samp{(end)}, terminates the CCL
program successfully and returns to the caller (which may be a CCL
program). It does not alter the status of the registers.

@node CCL Expressions, Calling CCL, CCL Statements, CCL
@comment Node, Next, Previous, Up
@subsection CCL Expressions

CCL, unlike Lisp, uses infix expressions. The simplest CCL expressions
consist of a single @var{operand}, either a register (one of @code{r0},
..., @code{r7}) or an integer. Complex expressions are lists of the
form @code{( @var{expression} @var{operator} @var{operand} )}. Unlike
C, assignments are not expressions.

In the following table, @var{X} is the target register for a @dfn{set}.
In subexpressions, this is implicitly @code{r7}. This means that
@code{>8}, @code{//}, @code{de-sjis}, and @code{en-sjis} cannot be used
freely in subexpressions, since they return parts of their values in
@code{r7}. @var{Y} may be an expression, register, or integer, while
@var{Z} must be a register or an integer.

@multitable @columnfractions .22 .14 .09 .55
@item Name @tab Operator @tab Code @tab C-like Description
@item CCL_PLUS @tab @code{+} @tab 0x00 @tab X = Y + Z
@item CCL_MINUS @tab @code{-} @tab 0x01 @tab X = Y - Z
@item CCL_MUL @tab @code{*} @tab 0x02 @tab X = Y * Z
@item CCL_DIV @tab @code{/} @tab 0x03 @tab X = Y / Z
@item CCL_MOD @tab @code{%} @tab 0x04 @tab X = Y % Z
@item CCL_AND @tab @code{&} @tab 0x05 @tab X = Y & Z
@item CCL_OR @tab @code{|} @tab 0x06 @tab X = Y | Z
@item CCL_XOR @tab @code{^} @tab 0x07 @tab X = Y ^ Z
@item CCL_LSH @tab @code{<<} @tab 0x08 @tab X = Y << Z
@item CCL_RSH @tab @code{>>} @tab 0x09 @tab X = Y >> Z
@item CCL_LSH8 @tab @code{<8} @tab 0x0A @tab X = (Y << 8) | Z
@item CCL_RSH8 @tab @code{>8} @tab 0x0B @tab X = Y >> 8, r[7] = Y & 0xFF
@item CCL_DIVMOD @tab @code{//} @tab 0x0C @tab X = Y / Z, r[7] = Y % Z
@item CCL_LS @tab @code{<} @tab 0x10 @tab X = (Y < Z)
@item CCL_GT @tab @code{>} @tab 0x11 @tab X = (Y > Z)
@item CCL_EQ @tab @code{==} @tab 0x12 @tab X = (Y == Z)
@item CCL_LE @tab @code{<=} @tab 0x13 @tab X = (Y <= Z)
@item CCL_GE @tab @code{>=} @tab 0x14 @tab X = (Y >= Z)
@item CCL_NE @tab @code{!=} @tab 0x15 @tab X = (Y != Z)
@item CCL_ENCODE_SJIS @tab @code{en-sjis} @tab 0x16 @tab X = HIGHER_BYTE (SJIS (Y, Z))
@item @tab @tab @tab r[7] = LOWER_BYTE (SJIS (Y, Z))
@item CCL_DECODE_SJIS @tab @code{de-sjis} @tab 0x17 @tab X = HIGHER_BYTE (DE-SJIS (Y, Z))
@item @tab @tab @tab r[7] = LOWER_BYTE (DE-SJIS (Y, Z))
@end multitable
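
The operator table can be cross-checked with the Python model below. It follows the table literally, including @code{>8} ignoring its right operand. C-style truncating division for @code{/}, @code{%} and @code{//}, and unbounded integers, are assumptions; @code{en-sjis} and @code{de-sjis} are left out. The name @code{ccl_op} is hypothetical.

```python
import operator


def c_div(a, b):
    """C-style division, truncating toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def c_mod(a, b):
    """C-style remainder: the sign follows the dividend."""
    return a - c_div(a, b) * b


SIMPLE_OPS = dict([
    ("+", operator.add), ("-", operator.sub), ("*", operator.mul),
    ("/", c_div), ("%", c_mod), ("&", operator.and_),
    ("|", operator.or_), ("^", operator.xor),
    ("<<", operator.lshift), (">>", operator.rshift),
    ("<", operator.lt), (">", operator.gt), ("==", operator.eq),
    ("<=", operator.le), (">=", operator.ge), ("!=", operator.ne),
])


def ccl_op(op, y, z):
    """Return (X, r7) for Y OP Z; r7 is None when OP leaves it alone."""
    if op == "<8":
        return (y << 8) | z, None
    if op == ">8":
        return y >> 8, y & 0xFF              # per the table, Z is unused
    if op == "//":
        return c_div(y, z), c_mod(y, z)
    return int(SIMPLE_OPS[op](y, z)), None   # comparisons yield 1 or 0
```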

The CCL operators are as in C, with the addition of CCL_LSH8, CCL_RSH8,
CCL_DIVMOD, CCL_ENCODE_SJIS, and CCL_DECODE_SJIS. The CCL_ENCODE_SJIS
and CCL_DECODE_SJIS operators treat their first and second operands as
the high and low bytes of a two-byte character code. (SJIS stands for
Shift JIS, an encoding of Japanese characters used by Microsoft.
CCL_ENCODE_SJIS is a complicated transformation of the Japanese standard
JIS encoding to Shift JIS. CCL_DECODE_SJIS is its inverse.) It is
somewhat odd to represent the SJIS operations in infix form.
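
For reference, the usual arithmetic between a JIS X 0208 code and Shift JIS is sketched in Python below. Whether @code{en-sjis} and @code{de-sjis} take exactly these 7-bit JIS bytes (0x21 to 0x7E), rather than some other byte range, is an assumption not confirmed by this section; the returned pair corresponds to @var{X} and @code{r7} in the table. The function names are hypothetical.

```python
def en_sjis(y, z):
    """JIS X 0208 bytes (y = row, z = cell, each 0x21..0x7E) to Shift JIS.

    Returns (high byte, low byte), matching X and r7 in the table.
    """
    s1 = ((y - 0x21) >> 1) + 0x81
    if s1 > 0x9F:
        s1 += 0x40          # lead bytes skip 0xA0..0xDF (single-byte kana)
    if y & 1:               # odd row: trail byte in 0x40..0x9E
        s2 = z + 0x1F
        if s2 >= 0x7F:
            s2 += 1         # 0x7F is never used as a trail byte
    else:                   # even row: trail byte in 0x9F..0xFC
        s2 = z + 0x7E
    return s1, s2


def de_sjis(s1, s2):
    """Inverse of en_sjis: Shift JIS bytes back to a JIS X 0208 pair."""
    if s1 >= 0xE0:
        s1 -= 0x40
    y = (s1 - 0x81) * 2 + 0x21
    if s2 >= 0x9F:          # trail byte of an even JIS row
        return y + 1, s2 - 0x7E
    if s2 >= 0x80:          # undo the skip over 0x7F
        s2 -= 1
    return y, s2 - 0x1F
```

For example, the hiragana letter A (JIS 0x2422) maps to the Shift JIS bytes 0x82 and 0xA0.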

@node Calling CCL, CCL Examples, CCL Expressions, CCL
@comment Node, Next, Previous, Up
@subsection Calling CCL

CCL programs are called automatically during Emacs buffer I/O when the
external representation has a coding system type of @code{shift-jis},
@code{big5}, or @code{ccl}. The program is specified by the coding
system (@pxref{Coding Systems}). You can also call CCL programs from
other CCL programs, and from Lisp using these functions:

@defun ccl-execute ccl-program status
Execute @var{ccl-program} with registers initialized by
@var{status}. @var{ccl-program} is a vector of compiled CCL code
created by @code{ccl-compile}. It is an error for the program to try to
execute a CCL I/O command. @var{status} must be a vector of nine
values, specifying the initial values for the @code{r0} through
@code{r7} registers and for the instruction counter IC. A @code{nil}
value for a register initializer causes the register to be set to 0. A
@code{nil} value for the IC initializer causes execution to start at the
beginning of the program. When the program is done, @var{status} is
modified (by side effect) to contain the ending values for the
corresponding registers and IC.
@end defun
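
The handling of @var{status} can be pictured with the Python sketch below. A Python callable stands in for the compiled program, @code{None} stands in for @code{nil}, and treating IC 0 as ``the beginning of the program'' is an assumption. The names @code{load_status} and @code{execute} are hypothetical.

```python
NREGS = 8


def load_status(status):
    """Split a nine-element status into registers r0..r7 and the IC.

    None plays the part of nil: a register defaults to 0, and the IC
    defaults to 0, taken here to mean the start of the program.
    """
    if len(status) != NREGS + 1:
        raise ValueError("status must have nine elements")
    regs = [0 if v is None else v for v in status[:NREGS]]
    ic = 0 if status[NREGS] is None else status[NREGS]
    return regs, ic


def execute(program, status):
    """Run program (a Python stand-in for compiled CCL code) on status.

    program(regs, ic) mutates regs and returns the final IC; status is
    then updated in place, as ccl-execute does by side effect.
    """
    regs, ic = load_status(status)
    ic = program(regs, ic)
    status[:NREGS] = regs
    status[NREGS] = ic
```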

@defun ccl-execute-on-string ccl-program status str &optional continue
Execute @var{ccl-program} with initial @var{status} on @var{str}.
@var{ccl-program} is a vector of compiled CCL code created by
@code{ccl-compile}. @var{status} must be a vector of nine values,
specifying the initial values for the @code{r0} through @code{r7}
registers and for the instruction counter IC. A @code{nil} value for a
register initializer causes the register to be set to 0. A @code{nil}
value for the IC initializer causes execution to start at the beginning
of the program. An optional fourth argument @var{continue}, if
non-@code{nil}, causes the IC to remain on the unsatisfied read
operation if the program terminates due to exhaustion of the input
buffer. Otherwise the IC is set to the end of the program. When the
program is done, @var{status} is modified (by side effect) to contain
the ending values for the corresponding registers and IC. Returns the
resulting string.
@end defun
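
The effect of @var{continue} is easiest to see with a toy interpreter. The three-opcode instruction set below is hypothetical (real CCL code is a vector of integers), as are the names @code{run_on_string} and @code{SWAP_PAIRS}, but the IC bookkeeping follows the description above: on an unsatisfied read the IC is parked on the read when @code{cont} is true, and moved past the end of the program otherwise.

```python
def run_on_string(program, status, data, cont=False):
    """Toy interpreter; program is a list of (opcode, argument) pairs.

    Opcodes: ("read", reg), ("write", reg) and ("jump", target).
    status is a nine-element list (r0..r7, IC) updated in place.
    """
    regs = [0 if v is None else v for v in status[:8]]
    ic = 0 if status[8] is None else status[8]
    pos, out = 0, []
    while ic < len(program):
        op, arg = program[ic]
        if op == "read":
            if pos >= len(data):
                # Input exhausted: with cont, park the IC on this read so
                # the next call resumes here; otherwise jump past the end.
                if not cont:
                    ic = len(program)
                break
            regs[arg] = data[pos]
            pos += 1
            ic += 1
        elif op == "write":
            out.append(regs[arg])
            ic += 1
        elif op == "jump":
            ic = arg
        else:
            raise ValueError("unknown opcode: " + op)
    status[:8] = regs
    status[8] = ic
    return bytes(out)


# Swap each pair of input bytes: read r0, read r1, write r1, write r0, loop.
SWAP_PAIRS = [("read", 0), ("read", 1), ("write", 1), ("write", 0),
              ("jump", 0)]
```

In this model, converting a string in pieces with @code{cont=True} handles a byte pair that straddles two pieces: the first byte stays in a register and the parked IC resumes at the second read.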

To call a CCL program from another CCL program, it must first be
registered:

@defun register-ccl-program name ccl-program
Register @var{name} for CCL program @var{ccl-program} in
@code{ccl-program-table}. @var{ccl-program} should be the compiled form
of a CCL program, or @code{nil}. Returns the index number of the
registered CCL program.
@end defun
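
A Python stand-in for @code{ccl-program-table} and the @code{call} statement follows. The class and method names are hypothetical, and so is the behavior on re-registration (replacing the program while keeping the old index), which this section does not specify.

```python
class ProgramTable:
    """Toy stand-in for ccl-program-table: names map to indices."""

    def __init__(self):
        self.names = []
        self.programs = []

    def register(self, name, program):
        """Register program (a callable, or None) under name; return its index.

        Assumption: re-registering an existing name replaces the program
        and keeps the old index.
        """
        if name in self.names:
            idx = self.names.index(name)
            self.programs[idx] = program
        else:
            idx = len(self.names)
            self.names.append(name)
            self.programs.append(program)
        return idx

    def call(self, name, regs):
        """(call NAME): run the registered program on the caller's registers.

        Nothing is returned; the callee communicates by modifying regs.
        """
        self.programs[self.names.index(name)](regs)
```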

Information about the processor time used by the CCL interpreter can be
obtained using these functions:

@defun ccl-elapsed-time
Returns the elapsed processor time of the CCL interpreter as a cons of
user and system time, as floating point numbers measured in seconds.
If only one overall value can be determined, the return value will be a
cons of that value and 0.
@end defun

@defun ccl-reset-elapsed-time
Resets the CCL interpreter's internal elapsed time registers.
@end defun

@node CCL Examples, , Calling CCL, CCL
@comment Node, Next, Previous, Up
@subsection CCL Examples

This section is not yet written.

@node Category Tables, , CCL, MULE
@section Category Tables

A category table is a type of char table used for keeping track of
categories. Categories are used for classifying characters for use in
regexps -- you can refer to a category rather than having to use a