comparison man/lispref/mule.texi @ 2690:d5bfa26d5c3f
[xemacs-hg @ 2005-03-26 16:20:01 by aidan]
Cleanup of the CCL coding system example, based on Stephen's feedback.
author | aidan |
---|---|
date | Sat, 26 Mar 2005 16:20:05 +0000 |
parents | a4040d921acc |
children | 9fa10603c898 |
2689:9e54f5421792 | 2690:d5bfa26d5c3f |
---|---|
2095 * URI Encoding constants:: Useful predefined characters. | 2095 * URI Encoding constants:: Useful predefined characters. |
2096 * Numeric to ASCII-hexadecimal conversion:: Trivial in Lisp, not so in CCL. | 2096 * Numeric to ASCII-hexadecimal conversion:: Trivial in Lisp, not so in CCL. |
2097 * Characters to be preserved:: No transformation needed for these characters. | 2097 * Characters to be preserved:: No transformation needed for these characters. |
2098 * The program to decode to internal format:: . | 2098 * The program to decode to internal format:: . |
2099 * The program to encode from internal format:: . | 2099 * The program to encode from internal format:: . |
2100 | 2100 * The actual coding system:: . |
2101 @end menu | 2101 @end menu |
2102 | 2102 |
2103 @node Four bits to ASCII, URI Encoding constants, , CCL Example | 2103 @node Four bits to ASCII, URI Encoding constants, , CCL Example |
2104 @subsubsection Four bits to ASCII | 2104 @subsubsection Four bits to ASCII |
2105 | 2105 |
2115 @example | 2115 @example |
2116 (defvar url-coding-high-order-nybble-as-ascii | 2116 (defvar url-coding-high-order-nybble-as-ascii |
2117 (let ((val (make-vector 256 0)) | 2117 (let ((val (make-vector 256 0)) |
2118 (i 0)) | 2118 (i 0)) |
2119 (while (< i (length val)) | 2119 (while (< i (length val)) |
2120 (aset val i (char-int (aref (format "%02X" i) 0))) | 2120 (aset val i (char-to-int (aref (format "%02X" i) 0))) |
2121 (setq i (1+ i))) | 2121 (setq i (1+ i))) |
2122 val) | 2122 val) |
2123 "Table to find an ASCII version of an octet's most significant 4 bits.") | 2123 "Table to find an ASCII version of an octet's most significant 4 bits.") |
2124 @end example | 2124 @end example |
2125 | 2125 |
2126 The next table, @code{url-coding-low-order-nybble-as-ascii} is almost | 2126 The next table, @code{url-coding-low-order-nybble-as-ascii} is almost |
2127 the same thing, but this time it has a map for the hex encoding of the | 2127 the same thing, but this time it has a map for the hex encoding of the |
2128 low-order four bits. So the sixty-fifth entry (offset @samp{#x51}) is | 2128 low-order four bits. So the sixty-fifth entry (offset @samp{#x41}) is |
2129 the ASCII encoding of `1', the hundred-and-twenty-second (offset | 2129 the ASCII encoding of `1', the hundred-and-twenty-second (offset |
2130 @samp{#x7a}) is the ASCII encoding of `A'. | 2130 @samp{#x7a}) is the ASCII encoding of `A'. |
2131 | 2131 |
2132 @example | 2132 @example |
2133 (defvar url-coding-low-order-nybble-as-ascii | 2133 (defvar url-coding-low-order-nybble-as-ascii |
2134 (let ((val (make-vector 256 0)) | 2134 (let ((val (make-vector 256 0)) |
2135 (i 0)) | 2135 (i 0)) |
2136 (while (< i (length val)) | 2136 (while (< i (length val)) |
2137 (aset val i (char-int (aref (format "%02X" i) 1))) | 2137 (aset val i (char-to-int (aref (format "%02X" i) 1))) |
2138 (setq i (1+ i))) | 2138 (setq i (1+ i))) |
2139 val) | 2139 val) |
2140 "Table to find an ASCII version of an octet's least significant 4 bits.") | 2140 "Table to find an ASCII version of an octet's least significant 4 bits.") |
2141 @end example | 2141 @end example |
2142 | 2142 |
2152 as such, we have to check when decoding for this value, and map it to | 2152 as such, we have to check when decoding for this value, and map it to |
2153 the space character. When doing this in CCL, we use the | 2153 the space character. When doing this in CCL, we use the |
2154 @code{url-coding-escaped-space-code} variable. | 2154 @code{url-coding-escaped-space-code} variable. |
2155 | 2155 |
2156 @example | 2156 @example |
2157 (defvar url-coding-escape-character-code (char-int ?%) | 2157 (defvar url-coding-escape-character-code (char-to-int ?%) |
2158 "The code point for the percentage sign, in ASCII.") | 2158 "The code point for the percentage sign, in ASCII.") |
2159 | 2159 |
2160 (defvar url-coding-escaped-space-code (char-int ?+) | 2160 (defvar url-coding-escaped-space-code (char-to-int ?+) |
2161 "The URL-encoded value of the space character, that is, +.") | 2161 "The URL-encoded value of the space character, that is, +.") |
2162 @end example | 2162 @end example |
2163 | 2163 |
2164 @node Numeric to ASCII-hexadecimal conversion | 2164 @node Numeric to ASCII-hexadecimal conversion, Characters to be preserved, URI Encoding constants, CCL Example |
2165 @subsubsection Numeric to ASCII-hexadecimal conversion | 2165 @subsubsection Numeric to ASCII-hexadecimal conversion |
2166 | 2166 |
2167 Now, we have a couple of utility tables that wouldn't be necessary in | 2167 Now, we have a couple of utility tables that wouldn't be necessary in |
2168 a more expressive programming language than is CCL. The first is sixteen | 2168 a more expressive programming language than is CCL. The first is sixteen |
2169 in length, and maps a hexadecimal number to the ASCII encoding of that | 2169 in length, and maps a hexadecimal number to the ASCII encoding of that |
2175 @example | 2175 @example |
2176 (defvar url-coding-hex-digit-table | 2176 (defvar url-coding-hex-digit-table |
2177 (let ((i 0) | 2177 (let ((i 0) |
2178 (val (make-vector 16 0))) | 2178 (val (make-vector 16 0))) |
2179 (while (< i 16) | 2179 (while (< i 16) |
2180 (aset val i (char-int (aref (format "%X" i) 0))) | 2180 (aset val i (char-to-int (aref (format "%X" i) 0))) |
2181 (setq i (1+ i))) | 2181 (setq i (1+ i))) |
2182 val) | 2182 val) |
2183 "A map from a hexadecimal digit's numeric value to its encoding in ASCII.") | 2183 "A map from a hexadecimal digit's numeric value to its encoding in ASCII.") |
2184 | 2184 |
2185 (defvar url-coding-latin-1-as-hex-table | 2185 (defvar url-coding-latin-1-as-hex-table |
2191 (setq i (1+ i))) | 2191 (setq i (1+ i))) |
2192 val) | 2192 val) |
2193 "A map from Latin 1 code points to their values as hexadecimal digits.") | 2193 "A map from Latin 1 code points to their values as hexadecimal digits.") |
2194 @end example | 2194 @end example |
2195 | 2195 |
2196 @node Characters to be preserved | 2196 @node Characters to be preserved, The program to decode to internal format, Numeric to ASCII-hexadecimal conversion, CCL Example |
2197 @subsubsection Characters to be preserved | 2197 @subsubsection Characters to be preserved |
2198 | 2198 |
2199 And finally, the last of these tables. URL encoding says that | 2199 And finally, the last of these tables. URL encoding says that |
2200 alphanumeric characters, the underscore, hyphen and the full stop | 2200 alphanumeric characters, the underscore, hyphen and the full stop |
2201 @footnote{That's what the standards call it, though my North American | 2201 @footnote{That's what the standards call it, though my North American |
2225 res) | 2225 res) |
2226 "A 256-entry array of flags, indicating whether or not to preserve an | 2226 "A 256-entry array of flags, indicating whether or not to preserve an |
2227 octet as its ASCII encoding.") | 2227 octet as its ASCII encoding.") |
2228 @end example | 2228 @end example |
2229 | 2229 |
2230 @node The program to decode to internal format | 2230 @node The program to decode to internal format, The program to encode from internal format, Characters to be preserved, CCL Example |
2231 @subsubsection The program to decode to internal format | 2231 @subsubsection The program to decode to internal format |
2232 | 2232 |
2233 After the almost interminable tables, we get to the CCL. The first | 2233 After the almost interminable tables, we get to the CCL. The first |
2234 CCL program, @code{ccl-decode-urlcoding} decodes from the URL coding to | 2234 CCL program, @code{ccl-decode-urlcoding} decodes from the URL coding to |
2235 our internal format; since this version of CCL doesn't have support for | 2235 our internal format; since this version of CCL doesn't have support for |
2286 (repeat)))) | 2286 (repeat)))) |
2287 "CCL program to take URI-encoded ASCII text and transform it to our | 2287 "CCL program to take URI-encoded ASCII text and transform it to our |
2288 internal encoding. ") | 2288 internal encoding. ") |
2289 @end example | 2289 @end example |
2290 | 2290 |
2291 @node The program to encode from internal format | 2291 @node The program to encode from internal format, The actual coding system, The program to decode to internal format, CCL Example |
2292 @subsubsection The program to encode from internal format | 2292 @subsubsection The program to encode from internal format |
2293 | 2293 |
2294 Next, we see the CCL program to encode ASCII text as URL coded text. | 2294 Next, we see the CCL program to encode ASCII text as URL coded text. |
2295 Here, the buffer magnification is specified as three, to account for ` ' | 2295 Here, the buffer magnification is specified as three, to account for ` ' |
2296 mapping to %20, etc. As before, we read an octet from the input into | 2296 mapping to %20, etc. As before, we read an octet from the input into |
2321 (write r0 ,url-coding-low-order-nybble-as-ascii))) | 2321 (write r0 ,url-coding-low-order-nybble-as-ascii))) |
2322 (read r0) | 2322 (read r0) |
2323 (repeat)))) | 2323 (repeat)))) |
2324 "CCL program to encode octets (almost) according to RFC 1738") | 2324 "CCL program to encode octets (almost) according to RFC 1738") |
2325 @end example | 2325 @end example |
2326 | |
2327 @node The actual coding system, , The program to encode from internal format, CCL Example | |
2328 @subsubsection The actual coding system | |
2329 | |
2330 To actually create the coding system, we call | |
2331 @samp{make-coding-system}. The first argument is the symbol that is to | |
2332 be the name of the coding system, in our case @samp{url-coding}. The | |
2333 second specifies that the coding system is to be of type | |
2334 @samp{ccl}---there are several other coding system types available, | |
2335 including, see the documentation for @samp{make-coding-system} for the | |
2336 full list. Then there's a documentation string describing the wherefore | |
2337 and caveats of the coding system, and the final argument is a property | |
2338 list giving information about the CCL programs and the coding system's | |
2339 mnemonic. | |
2340 | |
2341 @example | |
2342 (make-coding-system | |
2343 'url-coding 'ccl | |
2344 "The coding used by application/x-www-form-urlencoded HTTP applications. | |
2345 This coding form doesn't specify anything about non-ASCII characters, so | |
2346 make sure you've transformed to a seven-bit coding system first." | |
2347 '(decode ccl-decode-urlcoding | |
2348 encode ccl-encode-urlcoding | |
2349 mnemonic "URLenc")) | |
2350 @end example | |
2351 | |
2352 If you're lucky, the @samp{url-coding} coding system describe here | |
2353 should be available in the XEmacs package system. Otherwise, downloading | |
2354 it from @samp{http://www.parhasard.net/url-coding.el} should work for | |
2355 the foreseeable future. | |
2326 | 2356 |
2327 @node Category Tables, Unicode Support, CCL, MULE | 2357 @node Category Tables, Unicode Support, CCL, MULE |
2328 @section Category Tables | 2358 @section Category Tables |
2329 | 2359 |
2330 A category table is a type of char table used for keeping track of | 2360 A category table is a type of char table used for keeping track of |