comparison man/lispref/mule.texi @ 2690:d5bfa26d5c3f

[xemacs-hg @ 2005-03-26 16:20:01 by aidan] Cleanup of the CCL coding system example, based on Stephen's feedback.
author aidan
date Sat, 26 Mar 2005 16:20:05 +0000
parents a4040d921acc
children 9fa10603c898
2689:9e54f5421792 2690:d5bfa26d5c3f
2095 * URI Encoding constants:: Useful predefined characters. 2095 * URI Encoding constants:: Useful predefined characters.
2096 * Numeric to ASCII-hexadecimal conversion:: Trivial in Lisp, not so in CCL. 2096 * Numeric to ASCII-hexadecimal conversion:: Trivial in Lisp, not so in CCL.
2097 * Characters to be preserved:: No transformation needed for these characters. 2097 * Characters to be preserved:: No transformation needed for these characters.
2098 * The program to decode to internal format:: . 2098 * The program to decode to internal format:: .
2099 * The program to encode from internal format:: . 2099 * The program to encode from internal format:: .
2100 2100 * The actual coding system:: .
2101 @end menu 2101 @end menu
2102 2102
2103 @node Four bits to ASCII, URI Encoding constants, , CCL Example 2103 @node Four bits to ASCII, URI Encoding constants, , CCL Example
2104 @subsubsection Four bits to ASCII 2104 @subsubsection Four bits to ASCII
2105 2105
2115 @example 2115 @example
2116 (defvar url-coding-high-order-nybble-as-ascii 2116 (defvar url-coding-high-order-nybble-as-ascii
2117 (let ((val (make-vector 256 0)) 2117 (let ((val (make-vector 256 0))
2118 (i 0)) 2118 (i 0))
2119 (while (< i (length val)) 2119 (while (< i (length val))
2120 (aset val i (char-int (aref (format "%02X" i) 0))) 2120 (aset val i (char-to-int (aref (format "%02X" i) 0)))
2121 (setq i (1+ i))) 2121 (setq i (1+ i)))
2122 val) 2122 val)
2123 "Table to find an ASCII version of an octet's most significant 4 bits.") 2123 "Table to find an ASCII version of an octet's most significant 4 bits.")
2124 @end example 2124 @end example
2125 2125
2126 The next table, @code{url-coding-low-order-nybble-as-ascii} is almost 2126 The next table, @code{url-coding-low-order-nybble-as-ascii} is almost
2127 the same thing, but this time it has a map for the hex encoding of the 2127 the same thing, but this time it has a map for the hex encoding of the
2128 low-order four bits. So the sixty-fifth entry (offset @samp{#x51}) is 2128 low-order four bits. So the sixty-fifth entry (offset @samp{#x41}) is
2129 the ASCII encoding of `1', the hundred-and-twenty-second (offset 2129 the ASCII encoding of `1', the hundred-and-twenty-second (offset
2130 @samp{#x7a}) is the ASCII encoding of `A'. 2130 @samp{#x7a}) is the ASCII encoding of `A'.
2131 2131
2132 @example 2132 @example
2133 (defvar url-coding-low-order-nybble-as-ascii 2133 (defvar url-coding-low-order-nybble-as-ascii
2134 (let ((val (make-vector 256 0)) 2134 (let ((val (make-vector 256 0))
2135 (i 0)) 2135 (i 0))
2136 (while (< i (length val)) 2136 (while (< i (length val))
2137 (aset val i (char-int (aref (format "%02X" i) 1))) 2137 (aset val i (char-to-int (aref (format "%02X" i) 1)))
2138 (setq i (1+ i))) 2138 (setq i (1+ i)))
2139 val) 2139 val)
2140 "Table to find an ASCII version of an octet's least significant 4 bits.") 2140 "Table to find an ASCII version of an octet's least significant 4 bits.")
2141 @end example 2141 @end example
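As a rough illustration of what the two tables hold, the sketch below computes the same pair of digits arithmetically for a single octet. The helper name @code{example-octet-hex-digits} is hypothetical and is not part of the manual's example. The tables exist because CCL can look a value up in an array far more easily than it can do this kind of formatting.

```elisp
;; Hypothetical helper for illustration only.  It derives the two
;; upper-case hex digit characters of an octet without any lookup table.
(defun example-octet-hex-digits (octet)
  "Return a list of the two ASCII hex digit characters for OCTET."
  (let ((digits "0123456789ABCDEF"))
    (list (aref digits (logand (ash octet -4) #xf))  ; most significant 4 bits
          (aref digits (logand octet #xf)))))        ; least significant 4 bits

;; For the octet #x7a this yields the characters `7' and `A'.
(example-octet-hex-digits #x7a)
```

Indexing the high-order table with an octet should give the first of these characters, and indexing the low-order table should give the second.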
2142 2142
2152 as such, we have to check when decoding for this value, and map it to 2152 as such, we have to check when decoding for this value, and map it to
2153 the space character. When doing this in CCL, we use the 2153 the space character. When doing this in CCL, we use the
2154 @code{url-coding-escaped-space-code} variable. 2154 @code{url-coding-escaped-space-code} variable.
2155 2155
2156 @example 2156 @example
2157 (defvar url-coding-escape-character-code (char-int ?%) 2157 (defvar url-coding-escape-character-code (char-to-int ?%)
2158 "The code point for the percentage sign, in ASCII.") 2158 "The code point for the percentage sign, in ASCII.")
2159 2159
2160 (defvar url-coding-escaped-space-code (char-int ?+) 2160 (defvar url-coding-escaped-space-code (char-to-int ?+)
2161 "The URL-encoded value of the space character, that is, +.") 2161 "The URL-encoded value of the space character, that is, +.")
2162 @end example 2162 @end example
2163 2163
2164 @node Numeric to ASCII-hexadecimal conversion 2164 @node Numeric to ASCII-hexadecimal conversion, Characters to be preserved, URI Encoding constants, CCL Example
2165 @subsubsection Numeric to ASCII-hexadecimal conversion 2165 @subsubsection Numeric to ASCII-hexadecimal conversion
2166 2166
2167 Now, we have a couple of utility tables that wouldn't be necessary in 2167 Now, we have a couple of utility tables that wouldn't be necessary in
2168 a more expressive programming language than is CCL. The first is sixteen 2168 a more expressive programming language than is CCL. The first is sixteen
2169 in length, and maps a hexadecimal number to the ASCII encoding of that 2169 in length, and maps a hexadecimal number to the ASCII encoding of that
2175 @example 2175 @example
2176 (defvar url-coding-hex-digit-table 2176 (defvar url-coding-hex-digit-table
2177 (let ((i 0) 2177 (let ((i 0)
2178 (val (make-vector 16 0))) 2178 (val (make-vector 16 0)))
2179 (while (< i 16) 2179 (while (< i 16)
2180 (aset val i (char-int (aref (format "%X" i) 0))) 2180 (aset val i (char-to-int (aref (format "%X" i) 0)))
2181 (setq i (1+ i))) 2181 (setq i (1+ i)))
2182 val) 2182 val)
2183 "A map from a hexadecimal digit's numeric value to its encoding in ASCII.") 2183 "A map from a hexadecimal digit's numeric value to its encoding in ASCII.")
2184 2184
2185 (defvar url-coding-latin-1-as-hex-table 2185 (defvar url-coding-latin-1-as-hex-table
2191 (setq i (1+ i))) 2191 (setq i (1+ i)))
2192 val) 2192 val)
2193 "A map from Latin 1 code points to their values as hexadecimal digits.") 2193 "A map from Latin 1 code points to their values as hexadecimal digits.")
2194 @end example 2194 @end example
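The reverse mapping, from the ASCII code of a hexadecimal digit back to its numeric value, is what the decoder needs when it meets a percent escape. A minimal sketch of that step in plain Lisp follows. The function names are hypothetical, and both the acceptance of lower-case digits and the @code{nil} return for non-digits are assumptions, not taken from the table defined above.

```elisp
;; Hypothetical helpers for illustration only.
(defun example-hex-digit-value (char)
  "Return the numeric value of the hex digit CHAR, or nil if it is not one."
  (cond ((and (>= char ?0) (<= char ?9)) (- char ?0))
        ((and (>= char ?A) (<= char ?F)) (+ 10 (- char ?A)))
        ((and (>= char ?a) (<= char ?f)) (+ 10 (- char ?a)))))

(defun example-decode-escape (high low)
  "Combine hex digit characters HIGH and LOW into the octet they encode."
  (+ (* 16 (example-hex-digit-value high))
     (example-hex-digit-value low)))

;; The escape sequence %41 stands for octet 65, the letter `A'.
(example-decode-escape ?4 ?1)
```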
2195 2195
2196 @node Characters to be preserved 2196 @node Characters to be preserved, The program to decode to internal format, Numeric to ASCII-hexadecimal conversion, CCL Example
2197 @subsubsection Characters to be preserved 2197 @subsubsection Characters to be preserved
2198 2198
2199 And finally, the last of these tables. URL encoding says that 2199 And finally, the last of these tables. URL encoding says that
2200 alphanumeric characters, the underscore, hyphen and the full stop 2200 alphanumeric characters, the underscore, hyphen and the full stop
2201 @footnote{That's what the standards call it, though my North American 2201 @footnote{That's what the standards call it, though my North American
2225 res) 2225 res)
2226 "A 256-entry array of flags, indicating whether or not to preserve an 2226 "A 256-entry array of flags, indicating whether or not to preserve an
2227 octet as its ASCII encoding.") 2227 octet as its ASCII encoding.")
2228 @end example 2228 @end example
2229 2229
2230 @node The program to decode to internal format 2230 @node The program to decode to internal format, The program to encode from internal format, Characters to be preserved, CCL Example
2231 @subsubsection The program to decode to internal format 2231 @subsubsection The program to decode to internal format
2232 2232
2233 After the almost interminable tables, we get to the CCL. The first 2233 After the almost interminable tables, we get to the CCL. The first
2234 CCL program, @code{ccl-decode-urlcoding} decodes from the URL coding to 2234 CCL program, @code{ccl-decode-urlcoding} decodes from the URL coding to
2235 our internal format; since this version of CCL doesn't have support for 2235 our internal format; since this version of CCL doesn't have support for
2286 (repeat)))) 2286 (repeat))))
2287 "CCL program to take URI-encoded ASCII text and transform it to our 2287 "CCL program to take URI-encoded ASCII text and transform it to our
2288 internal encoding. ") 2288 internal encoding. ")
2289 @end example 2289 @end example
2290 2290
2291 @node The program to encode from internal format 2291 @node The program to encode from internal format, The actual coding system, The program to decode to internal format, CCL Example
2292 @subsubsection The program to encode from internal format 2292 @subsubsection The program to encode from internal format
2293 2293
2294 Next, we see the CCL program to encode ASCII text as URL coded text. 2294 Next, we see the CCL program to encode ASCII text as URL coded text.
2295 Here, the buffer magnification is specified as three, to account for ` ' 2295 Here, the buffer magnification is specified as three, to account for ` '
2296 mapping to %20, etc. As before, we read an octet from the input into 2296 mapping to %20, etc. As before, we read an octet from the input into
2321 (write r0 ,url-coding-low-order-nybble-as-ascii))) 2321 (write r0 ,url-coding-low-order-nybble-as-ascii)))
2322 (read r0) 2322 (read r0)
2323 (repeat)))) 2323 (repeat))))
2324 "CCL program to encode octets (almost) according to RFC 1738") 2324 "CCL program to encode octets (almost) according to RFC 1738")
2325 @end example 2325 @end example
2326
2327 @node The actual coding system, , The program to encode from internal format, CCL Example
2328 @subsubsection The actual coding system
2329
2330 To actually create the coding system, we call
2331 @samp{make-coding-system}. The first argument is the symbol that is to
2332 be the name of the coding system, in our case @samp{url-coding}. The
2333 second specifies that the coding system is to be of type
2334 @samp{ccl}---there are several other coding system types available;
2335 see the documentation for @samp{make-coding-system} for the
2336 full list. Then there's a documentation string describing the wherefore
2337 and caveats of the coding system, and the final argument is a property
2338 list giving information about the CCL programs and the coding system's
2339 mnemonic.
2340
2341 @example
2342 (make-coding-system
2343 'url-coding 'ccl
2344 "The coding used by application/x-www-form-urlencoded HTTP applications.
2345 This coding form doesn't specify anything about non-ASCII characters, so
2346 make sure you've transformed to a seven-bit coding system first."
2347 '(decode ccl-decode-urlcoding
2348 encode ccl-encode-urlcoding
2349 mnemonic "URLenc"))
2350 @end example
2351
2352 If you're lucky, the @samp{url-coding} coding system described here
2353 should be available in the XEmacs package system. Otherwise, downloading
2354 it from @samp{http://www.parhasard.net/url-coding.el} should work for
2355 the foreseeable future.
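Once the coding system exists, it can be used like any other, for instance with @code{encode-coding-string} and @code{decode-coding-string}. The sketch below is only a usage illustration. The helper name is hypothetical, and it assumes the @samp{url-coding} coding system has already been created as shown above; where it has not, the helper returns @code{nil} instead of signalling an error.

```elisp
;; Hypothetical helper for illustration only.
(defun example-url-coding-round-trip (text)
  "Encode TEXT with `url-coding' and decode the result again.
Return a list (ENCODED DECODED), or nil if `url-coding' is not defined."
  (when (and (fboundp 'find-coding-system)
             (find-coding-system 'url-coding))
    (let ((encoded (encode-coding-string text 'url-coding)))
      (list encoded (decode-coding-string encoded 'url-coding)))))

;; With the coding system loaded, the second element should equal the input.
(example-url-coding-round-trip "a b")
```

As the documentation string of the coding system says, text containing non-ASCII characters should be converted to a seven-bit coding system before it is passed through @samp{url-coding}.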
2326 2356
2327 @node Category Tables, Unicode Support, CCL, MULE 2357 @node Category Tables, Unicode Support, CCL, MULE
2328 @section Category Tables 2358 @section Category Tables
2329 2359
2330 A category table is a type of char table used for keeping track of 2360 A category table is a type of char table used for keeping track of