Mercurial > hg > xemacs-beta
comparison lisp/mule/make-coding-system.el @ 4690:257b468bf2ca
Move the #'query-coding-region implementation to C.
This is necessary because there is no reasonable way to access the
corresponding mswindows-multibyte functionality from Lisp, and we need such
functionality if we're going to have a reliable and portable
#'query-coding-region implementation. However, this change doesn't yet
provide #'query-coding-region for the mswindow-multibyte coding systems,
there should be no functional differences between an XEmacs with this change
and one without it.
src/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
Move the #'query-coding-region implementation to C.
This is necessary because there is no reasonable way to access the
corresponding mswindows-multibyte functionality from Lisp, and we
need such functionality if we're going to have a reliable and
portable #'query-coding-region implementation. However, this
change doesn't yet provide #'query-coding-region for the
mswindow-multibyte coding systems, there should be no functional
differences between an XEmacs with this change and one without it.
* mule-coding.c (struct fixed_width_coding_system):
Add a new coding system type, fixed_width, and implement it. It
uses the CCL infrastructure but has a much simpler creation API,
and its own query_method, formerly in lisp/mule/mule-coding.el.
* unicode.c:
Move the Unicode query method implementation here from
unicode.el.
* lisp.h: Declare Fmake_coding_system_internal, Fcopy_range_table
here.
* intl-win32.c (complex_vars_of_intl_win32):
Use Fmake_coding_system_internal, not Fmake_coding_system.
* general-slots.h: Add Qsucceeded, Qunencodable, Qinvalid_sequence
here.
* file-coding.h (enum coding_system_variant):
Add fixed_width_coding_system here.
(struct coding_system_methods):
Add query_method and query_lstream_method to the coding system
methods.
Provide flags for the query methods.
Declare the default query method; initialise it correctly in
INITIALIZE_CODING_SYSTEM_TYPE.
* file-coding.c (default_query_method):
New function, the default query method for coding systems that do
not set it. Moved from coding.el.
(make_coding_system_1):
Accept new elements in PROPS in #'make-coding-system; aliases, a
list of aliases; safe-chars and safe-charsets (these were
previously accepted but not saved); and category.
(Fmake_coding_system_internal):
New function, what used to be #'make-coding-system--on Mule
builds, we've now moved some of the functionality of this to
Lisp.
(Fcoding_system_canonical_name_p):
Move this earlier in the file, since it's now called from within
make_coding_system_1.
(Fquery_coding_region):
Move the implementation of this here, from coding.el.
(complex_vars_of_file_coding):
Call Fmake_coding_system_internal, not Fmake_coding_system;
specify safe-charsets properties when we're a mule build.
* extents.h (mouse_highlight_priority, Fset_extent_priority,
Fset_extent_face, Fmap_extents):
Make these available to other C files.
lisp/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
Move the #'query-coding-region implementation to C.
* coding.el:
Consolidate code that depends on the presence or absence of Mule
at the end of this file.
(default-query-coding-region, query-coding-region):
Move these functions to C.
(default-query-coding-region-safe-charset-skip-chars-map):
Remove this variable, the corresponding C variable is
Vdefault_query_coding_region_chartab_cache in file-coding.c.
(query-coding-string): Update docstring to reflect actual multiple
values, be more careful about not modifying a range table that
we're currently mapping over.
(encode-coding-char): Make the implementation of this simpler.
(featurep 'mule): Autoload #'make-coding-system from
mule/make-coding-system.el if we're a mule build; provide an
appropriate compiler macro.
Do various non-mule compatibility things if we're not a mule
build.
* update-elc.el (additional-dump-dependencies):
Add mule/make-coding-system as a dump time dependency if we're a
mule build.
* unicode.el (ccl-encode-to-ucs-2):
(decode-char):
(encode-char):
Move these earlier in the file, for the sake of some byte compile
warnings.
(unicode-query-coding-region):
Move this to unicode.c
* mule/make-coding-system.el:
New file, not dumped. Contains the functionality to rework the
arguments necessary for fixed-width coding systems, and contains
the implementation of #'make-coding-system, which now calls
#'make-coding-system-internal.
* mule/vietnamese.el (viscii):
* mule/latin.el (iso-8859-2):
(windows-1250):
(iso-8859-3):
(iso-8859-4):
(iso-8859-14):
(iso-8859-15):
(iso-8859-16):
(iso-8859-9):
(macintosh):
(windows-1252):
* mule/hebrew.el (iso-8859-8):
* mule/greek.el (iso-8859-7):
(windows-1253):
* mule/cyrillic.el (iso-8859-5):
(koi8-r):
(koi8-u):
(windows-1251):
(alternativnyj):
(koi8-ru):
(koi8-t):
(koi8-c):
(koi8-o):
* mule/arabic.el (iso-8859-6):
(windows-1256):
Move all these coding systems to being of type fixed-width, not of
type CCL. This allows the distinct query-coding-region for them to
be in C, something which will eventually allow us to implement
query-coding-region for the mswindows-multibyte coding systems.
* mule/general-late.el (posix-charset-to-coding-system-hash):
Document why we're pre-emptively persuading the byte compiler that
the ELC for this file needs to be written using escape-quoted.
Call #'set-unicode-query-skip-chars-args, now the Unicode
query-coding-region implementation is in C.
* mule/thai-xtis.el (tis-620):
Don't bother checking whether we're XEmacs or not here.
* mule/mule-coding.el:
Move the eight bit fixed-width functionality from this file to
make-coding-system.el.
tests/ChangeLog addition:
2009-09-19 Aidan Kehoe <kehoea@parhasard.net>
* automated/mule-tests.el:
Check a coding system's type, not an 8-bit-fixed property, for
whether that coding system should be treated as a fixed-width
coding system.
* automated/query-coding-tests.el:
Don't test the query coding functionality for mswindows-multibyte
coding systems, it's not yet implemented.
author | Aidan Kehoe <kehoea@parhasard.net> |
---|---|
date | Sat, 19 Sep 2009 22:53:13 +0100 |
parents | |
children | dc3c2f298857 |
comparison
equal
deleted
inserted
replaced
4689:0636c6ccb430 | 4690:257b468bf2ca |
---|---|
1 ;;; make-coding-system.el; Provides the #'make-coding-system function and | |
2 ;;; much of the implementation of the fixed-width coding system type. | |
3 | |
4 ;; Copyright (C) 2009 Free Software Foundation | |
5 | |
6 ;; Author: Aidan Kehoe | |
7 | |
8 ;; This file is part of XEmacs. | |
9 | |
10 ;; XEmacs is free software; you can redistribute it and/or modify it | |
11 ;; under the terms of the GNU General Public License as published by | |
12 ;; the Free Software Foundation; either version 2, or (at your option) | |
13 ;; any later version. | |
14 | |
15 ;; XEmacs is distributed in the hope that it will be useful, but | |
16 ;; WITHOUT ANY WARRANTY; without even the implied warranty of | |
17 ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU | |
18 ;; General Public License for more details. | |
19 | |
20 ;; You should have received a copy of the GNU General Public License | |
21 ;; along with XEmacs; see the file COPYING. If not, write to the | |
22 ;; Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, | |
23 ;; Boston, MA 02110-1301, USA. | |
24 | |
25 ;;; Commentary: | |
26 | |
27 ;;; Code: | |
28 | |
29 (defvar fixed-width-private-use-start (decode-char 'ucs #xE000) | |
30 "Start of a 256 code private use area for fixed-width coding systems. | |
31 | |
32 This is used to ensure that distinct octets on disk for a given coding | |
33 system map to distinct XEmacs characters, preventing spurious changes when | |
34 a file is read, not changed, and then written. ") | |
35 | |
36 (defun fixed-width-generate-helper (decode-table encode-table | |
37 encode-failure-octet) | |
38 "Helper func, `fixed-width-generate-encode-program-and-skip-chars-strings', | |
39 which see. | |
40 | |
41 Deals with the case where ASCII and another character set can both be | |
42 encoded unambiguously and completely into the coding-system; if this is so, | |
43 returns multiple values comprisig of such a ccl-program and the character | |
44 set in question. If not, it returns nil." | |
45 (let ((tentative-encode-program-parts | |
46 (eval-when-compile | |
47 (let* ((vec-len 128) | |
48 (compiled | |
49 (append | |
50 (ccl-compile | |
51 `(1 | |
52 (loop | |
53 (read-multibyte-character r0 r1) | |
54 (if (r0 == ,(charset-id 'ascii)) | |
55 (write r1) | |
56 ((if (r0 == #xABAB) | |
57 ;; #xBFFE is a sentinel in the compiled | |
58 ;; program. | |
59 ((r0 = r1 & #x7F) | |
60 (write r0 ,(make-vector vec-len #xBFFE))) | |
61 ((mule-to-unicode r0 r1) | |
62 (if (r0 == #xFFFD) | |
63 (write #xBEEF) | |
64 ((lookup-integer encode-table-sym r0 r3) | |
65 (if r7 | |
66 (write-multibyte-character r0 r3) | |
67 (write #xBEEF)))))))) | |
68 (repeat)))) nil)) | |
69 (first-part compiled) | |
70 (last-part | |
71 (member-if-not (lambda (entr) (eq #xBFFE entr)) | |
72 (member-if | |
73 (lambda (entr) (eq #xBFFE entr)) | |
74 first-part)))) | |
75 (while compiled | |
76 (when (eq #xBFFE (cadr compiled)) | |
77 (assert (= vec-len (search '(#xBFFE) (cdr compiled) | |
78 :test #'/=)) nil | |
79 "Strange ccl vector length") | |
80 (setcdr compiled nil)) | |
81 (setq compiled (cdr compiled))) | |
82 ;; Is the generated code as we expect it to be? | |
83 (assert (and (memq #xABAB first-part) | |
84 (memq #xBEEF14 last-part)) | |
85 nil | |
86 "This code assumes that the constant #xBEEF is #xBEEF14 in \ | |
87 compiled CCL code,\nand that the constant #xABAB is #xABAB. If that is | |
88 not the case, and it appears not to be--that's why you're getting this | |
89 message--it will not work. ") | |
90 (list first-part last-part vec-len)))) | |
91 (charset-lower -1) | |
92 (charset-upper -1) | |
93 worth-trying known-charsets encode-program | |
94 other-charset-vector ucs) | |
95 | |
96 (loop for char across decode-table | |
97 do (pushnew (char-charset char) known-charsets)) | |
98 (setq known-charsets (delq 'ascii known-charsets)) | |
99 | |
100 (loop for known-charset in known-charsets | |
101 do | |
102 ;; This is not possible for two dimensional charsets. | |
103 (when (eq 1 (charset-dimension known-charset)) | |
104 (if (eq 'control-1 known-charset) | |
105 (setq charset-lower 0 | |
106 charset-upper 31) | |
107 ;; There should be a nicer way to get the limits here. | |
108 (condition-case args-out-of-range | |
109 (make-char known-charset #x100) | |
110 (args-out-of-range | |
111 (setq charset-lower (third args-out-of-range) | |
112 charset-upper (fourth args-out-of-range))))) | |
113 (loop | |
114 for i from charset-lower to charset-upper | |
115 always (and (setq ucs | |
116 (encode-char (make-char known-charset i) 'ucs)) | |
117 (gethash ucs encode-table)) | |
118 finally (setq worth-trying known-charset)) | |
119 | |
120 ;; Only trying this for one charset at a time, the first find. | |
121 (when worth-trying (return)) | |
122 | |
123 ;; Okay, this charset is not worth trying, Try the next. | |
124 (setq charset-lower -1 | |
125 charset-upper -1 | |
126 worth-trying nil))) | |
127 | |
128 (when worth-trying | |
129 (setq other-charset-vector | |
130 (make-vector (third tentative-encode-program-parts) | |
131 encode-failure-octet)) | |
132 (loop for i from charset-lower to charset-upper | |
133 do (aset other-charset-vector i | |
134 (gethash (encode-char (make-char worth-trying i) | |
135 'ucs) encode-table))) | |
136 (setq encode-program | |
137 (nsublis | |
138 (list (cons #xABAB (charset-id worth-trying))) | |
139 (nconc | |
140 (copy-list (first | |
141 tentative-encode-program-parts)) | |
142 (append other-charset-vector nil) | |
143 (copy-tree (second | |
144 tentative-encode-program-parts)))))) | |
145 (and encode-program (values encode-program worth-trying)))) | |
146 | |
147 (defun fixed-width-generate-encode-program-and-skip-chars-strings | |
148 (decode-table encode-table encode-failure-octet) | |
149 "Generate a CCL program to encode a 8-bit fixed-width charset. | |
150 | |
151 DECODE-TABLE must have 256 non-cons entries, and will be regarded as | |
152 describing a map from the octet corresponding to an offset in the | |
153 table to the that entry in the table. ENCODE-TABLE is a hash table | |
154 map from unicode values to characters in the range [0,255]. | |
155 ENCODE-FAILURE-OCTET describes an integer between 0 and 255 | |
156 \(inclusive) to write in the event that a character cannot be encoded." | |
157 (check-argument-type #'vectorp decode-table) | |
158 (check-argument-range (length decode-table) #x100 #x100) | |
159 (check-argument-type #'hash-table-p encode-table) | |
160 (check-argument-type #'integerp encode-failure-octet) | |
161 (check-argument-range encode-failure-octet #x00 #xFF) | |
162 (let ((encode-program nil) | |
163 (general-encode-program | |
164 (eval-when-compile | |
165 (let ((prog (append | |
166 (ccl-compile | |
167 `(1 | |
168 (loop | |
169 (read-multibyte-character r0 r1) | |
170 (mule-to-unicode r0 r1) | |
171 (if (r0 == #xFFFD) | |
172 (write #xBEEF) | |
173 ((lookup-integer encode-table-sym r0 r3) | |
174 (if r7 | |
175 (write-multibyte-character r0 r3) | |
176 (write #xBEEF)))) | |
177 (repeat)))) nil))) | |
178 (assert (memq #xBEEF14 prog) | |
179 nil | |
180 "This code assumes that the constant #xBEEF is #xBEEF14 \ | |
181 in compiled CCL code.\nIf that is not the case, and it appears not to | |
182 be--that's why you're getting this message--it will not work. ") | |
183 prog))) | |
184 (encode-program-with-ascii-optimisation | |
185 (eval-when-compile | |
186 (let ((prog (append | |
187 (ccl-compile | |
188 `(1 | |
189 (loop | |
190 (read-multibyte-character r0 r1) | |
191 (if (r0 == ,(charset-id 'ascii)) | |
192 (write r1) | |
193 ((mule-to-unicode r0 r1) | |
194 (if (r0 == #xFFFD) | |
195 (write #xBEEF) | |
196 ((lookup-integer encode-table-sym r0 r3) | |
197 (if r7 | |
198 (write-multibyte-character r0 r3) | |
199 (write #xBEEF)))))) | |
200 (repeat)))) nil))) | |
201 (assert (memq #xBEEF14 prog) | |
202 nil | |
203 "This code assumes that the constant #xBEEF is #xBEEF14 \ | |
204 in compiled CCL code.\nIf that is not the case, and it appears not to | |
205 be--that's why you're getting this message--it will not work. ") | |
206 prog))) | |
207 (ascii-encodes-as-itself nil) | |
208 (control-1-encodes-as-itself t) | |
209 (invalid-sequence-code-point-start | |
210 (eval-when-compile | |
211 (char-to-unicode | |
212 (aref (decode-coding-string "\xd8\x00\x00\x00" 'utf-16-be) 3)))) | |
213 further-char-set skip-chars invalid-sequences-skip-chars) | |
214 | |
215 ;; Is this coding system ASCII-compatible? If so, we can avoid the hash | |
216 ;; table lookup for those characters. | |
217 (loop | |
218 for i from #x00 to #x7f | |
219 always (eq (int-to-char i) (gethash i encode-table)) | |
220 finally (setq ascii-encodes-as-itself t)) | |
221 | |
222 ;; Note that this logic handles EBCDIC badly. For example, CP037, | |
223 ;; MIME name ebcdic-na, has the entire repertoire of ASCII and | |
224 ;; Latin 1, and thus a more optimal ccl encode program would check | |
225 ;; for those character sets and use tables. But for now, we do a | |
226 ;; hash table lookup for every character. | |
227 (if (null ascii-encodes-as-itself) | |
228 ;; General encode program. Pros; general and correct. Cons; | |
229 ;; slow, a hash table lookup + mule-unicode conversion is done | |
230 ;; for every character encoding. | |
231 (setq encode-program general-encode-program) | |
232 (multiple-value-setq | |
233 (encode-program further-char-set) | |
234 ;; Encode program with ascii-ascii mapping (based on a | |
235 ;; character's mule character set), and one other mule | |
236 ;; character set using table-based encoding, other | |
237 ;; character sets using hash table lookups. | |
238 ;; fixed-width-non-ascii-completely-coveredp only returns | |
239 ;; such a mapping if some non-ASCII charset with | |
240 ;; characters in decode-table is entirely covered by | |
241 ;; encode-table. | |
242 (fixed-width-generate-helper decode-table encode-table | |
243 encode-failure-octet)) | |
244 (unless encode-program | |
245 ;; If fixed-width-non-ascii-completely-coveredp returned nil, | |
246 ;; but ASCII still encodes as itself, do one-to-one mapping | |
247 ;; for ASCII, and a hash table lookup for everything else. | |
248 (setq encode-program encode-program-with-ascii-optimisation))) | |
249 | |
250 (setq encode-program | |
251 (nsublis | |
252 (list (cons #xBEEF14 | |
253 (logior (lsh encode-failure-octet 8) | |
254 #x14))) | |
255 (copy-tree encode-program))) | |
256 (loop | |
257 for i from #x80 to #x9f | |
258 do (unless (= i (aref decode-table i)) | |
259 (setq control-1-encodes-as-itself nil) | |
260 (return))) | |
261 (loop | |
262 for i from #x00 to #xFF | |
263 initially (setq skip-chars | |
264 (cond | |
265 ((and ascii-encodes-as-itself | |
266 control-1-encodes-as-itself further-char-set) | |
267 (concat "\x00-\x9f" (charset-skip-chars-string | |
268 further-char-set))) | |
269 ((and ascii-encodes-as-itself | |
270 control-1-encodes-as-itself) | |
271 "\x00-\x9f") | |
272 ((null ascii-encodes-as-itself) | |
273 (skip-chars-quote (apply #'string | |
274 (append decode-table nil)))) | |
275 (further-char-set | |
276 (concat (charset-skip-chars-string 'ascii) | |
277 (charset-skip-chars-string further-char-set))) | |
278 (t | |
279 (charset-skip-chars-string 'ascii))) | |
280 invalid-sequences-skip-chars "") | |
281 with decoded-ucs = nil | |
282 with decoded = nil | |
283 with no-ascii-transparency-skip-chars-list = | |
284 (unless ascii-encodes-as-itself (append decode-table nil)) | |
285 ;; Can't use #'match-string here, see: | |
286 ;; http://mid.gmane.org/18829.34118.709782.704574@parhasard.net | |
287 with skip-chars-test = | |
288 #'(lambda (skip-chars-string testing) | |
289 (with-temp-buffer | |
290 (insert testing) | |
291 (goto-char (point-min)) | |
292 (skip-chars-forward skip-chars-string) | |
293 (= (point) (point-max)))) | |
294 do | |
295 (setq decoded (aref decode-table i) | |
296 decoded-ucs (char-to-unicode decoded)) | |
297 (cond | |
298 ((<= invalid-sequence-code-point-start decoded-ucs | |
299 (+ invalid-sequence-code-point-start #xFF)) | |
300 (setq invalid-sequences-skip-chars | |
301 (concat (string decoded) | |
302 invalid-sequences-skip-chars)) | |
303 (assert (not (funcall skip-chars-test skip-chars decoded)) | |
304 "This char should only be skipped with \ | |
305 `invalid-sequences-skip-chars', not by `skip-chars'")) | |
306 ((not (funcall skip-chars-test skip-chars decoded)) | |
307 (if ascii-encodes-as-itself | |
308 (setq skip-chars (concat skip-chars (string decoded))) | |
309 (push decoded no-ascii-transparency-skip-chars-list)))) | |
310 finally (unless ascii-encodes-as-itself | |
311 (setq skip-chars | |
312 (skip-chars-quote | |
313 (apply #'string | |
314 no-ascii-transparency-skip-chars-list))))) | |
315 (values encode-program skip-chars invalid-sequences-skip-chars))) | |
316 | |
317 (defun fixed-width-create-decode-encode-tables (unicode-map) | |
318 "Return multiple values \(DECODE-TABLE ENCODE-TABLE) given UNICODE-MAP. | |
319 UNICODE-MAP should be an alist mapping from integer octet values to | |
320 characters with UCS code points; DECODE-TABLE will be a 256-element | |
321 vector, and ENCODE-TABLE will be a hash table mapping from 256 numbers | |
322 to 256 distinct characters." | |
323 (check-argument-type #'listp unicode-map) | |
324 (let ((decode-table (make-vector 256 nil)) | |
325 (encode-table (make-hash-table :size 256)) | |
326 (private-use-start (encode-char fixed-width-private-use-start 'ucs)) | |
327 (invalid-sequence-code-point-start | |
328 (eval-when-compile | |
329 (char-to-unicode | |
330 (aref (decode-coding-string "\xd8\x00\x00\x00" 'utf-16-be) 3)))) | |
331 desired-ucs decode-table-entry) | |
332 | |
333 (loop for (external internal) | |
334 in unicode-map | |
335 do | |
336 (aset decode-table external internal) | |
337 (assert (not (eq (encode-char internal 'ucs) -1)) | |
338 nil | |
339 "Looks like you're creating a fixed-width coding system \ | |
340 in a dumped file, \nand you're either not providing a literal unicode map | |
341 or PROPS. Don't do that; fixed-width coding systems rely on sensible | |
342 Unicode mappings being available, which they are at compile time for | |
343 dumped files (but this requires the mentioned literals), but not, for | |
344 most of them, at run time. ") | |
345 | |
346 (puthash (encode-char internal 'ucs) | |
347 ;; This is semantically an integer, but Dave Love's design | |
348 ;; for lookup-integer in CCL means we need to store it as a | |
349 ;; character. | |
350 (int-to-char external) | |
351 encode-table)) | |
352 | |
353 ;; Now, go through the decode table. For octet values above #x7f, if the | |
354 ;; decode table entry is nil, this means that they have an undefined | |
355 ;; mapping (= they map to XEmacs characters with keys in | |
356 ;; unicode-error-default-translation-table); for octet values below or | |
357 ;; equal to #x7f, it means that they map to ASCII. | |
358 | |
359 ;; If any entry (whether below or above #x7f) in the decode-table | |
360 ;; already maps to some character with a key in | |
361 ;; unicode-error-default-translation-table, it is treated as an | |
362 ;; undefined octet by `query-coding-region'. That is, it is not | |
363 ;; necessary for an octet value to be above #x7f for this to happen. | |
364 | |
365 (dotimes (i 256) | |
366 (setq decode-table-entry (aref decode-table i)) | |
367 (if decode-table-entry | |
368 (when (get-char-table | |
369 decode-table-entry | |
370 unicode-error-default-translation-table) | |
371 ;; The caller is explicitly specifying that this octet | |
372 ;; corresponds to an invalid sequence on disk: | |
373 (assert (= (get-char-table | |
374 decode-table-entry | |
375 unicode-error-default-translation-table) i) | |
376 "Bad argument for a fixed-width coding system. | |
377 If you're going to designate an octet with value below #x80 as invalid | |
378 for this coding system, make sure to map it to the invalid sequence | |
379 character corresponding to its octet value on disk. ")) | |
380 | |
381 ;; decode-table-entry is nil; either the octet is to be treated as | |
382 ;; contributing to an error sequence (when (> #x7f i)), or it should | |
383 ;; be attempted to treat it as ASCII-equivalent. | |
384 (setq desired-ucs (or (and (< i #x80) i) | |
385 (+ invalid-sequence-code-point-start i))) | |
386 (while (gethash desired-ucs encode-table) | |
387 (assert (not (< i #x80)) | |
388 "UCS code point should not already be in encode-table!" | |
389 ;; There is one invalid sequence char per octet value; | |
390 ;; with fixed-width coding systems, it makes no sense | |
391 ;; for us to be multiply allocating them. | |
392 (gethash desired-ucs encode-table)) | |
393 (setq desired-ucs (+ private-use-start desired-ucs) | |
394 private-use-start (+ private-use-start 1))) | |
395 (puthash desired-ucs (int-to-char i) encode-table) | |
396 (setq desired-ucs (if (> desired-ucs #xFF) | |
397 (unicode-to-char desired-ucs) | |
398 ;; So we get Latin-1 when run at dump time, | |
399 ;; instead of JIT-allocated characters. | |
400 (int-to-char desired-ucs))) | |
401 (aset decode-table i desired-ucs))) | |
402 (values decode-table encode-table))) | |
403 | |
404 (defun fixed-width-generate-decode-program (decode-table) | |
405 "Given DECODE-TABLE, generate a CCL program to decode an 8-bit charset. | |
406 DECODE-TABLE must have 256 non-cons entries, and will be regarded as | |
407 describing a map from the octet corresponding to an offset in the | |
408 table to the that entry in the table. " | |
409 (check-argument-type #'vectorp decode-table) | |
410 (check-argument-range (length decode-table) #x100 #x100) | |
411 (let ((decode-program-parts | |
412 (eval-when-compile | |
413 (let* ((compiled | |
414 (append | |
415 (ccl-compile | |
416 `(3 | |
417 ((read r0) | |
418 (loop | |
419 (write-read-repeat r0 ,(make-vector | |
420 256 'sentinel)))))) nil)) | |
421 (first-part compiled) | |
422 (last-part | |
423 (member-if-not #'symbolp | |
424 (member-if-not #'integerp first-part)))) | |
425 ;; Chop off the sentinel sentinel sentinel [..] part. | |
426 (while compiled | |
427 (if (symbolp (cadr compiled)) | |
428 (setcdr compiled nil)) | |
429 (setq compiled (cdr compiled))) | |
430 (list first-part last-part))))) | |
431 (nconc | |
432 ;; copy-list needed, because the structure of the literal provided | |
433 ;; by our eval-when-compile hangs around. | |
434 (copy-list (first decode-program-parts)) | |
435 (append decode-table nil) | |
436 (second decode-program-parts)))) | |
437 | |
438 (defun fixed-width-choose-category (decode-table) | |
439 "Given DECODE-TABLE, return an appropriate coding category. | |
440 DECODE-TABLE is a 256-entry vector describing the mapping from octets on | |
441 disk to XEmacs characters for some fixed-width 8-bit coding system." | |
442 (check-argument-type #'vectorp decode-table) | |
443 (check-argument-range (length decode-table) #x100 #x100) | |
444 (loop | |
445 named category | |
446 for i from #x80 to #x9F | |
447 do (unless (= i (aref decode-table i)) | |
448 (return-from category 'no-conversion)) | |
449 finally return 'iso-8-1)) | |
450 | |
451 (defun fixed-width-rework-props-runtime (name props) | |
452 "Rework PROPS to a form understood by `make-coding-system-internal'. | |
453 | |
454 NAME must be a symbol, describing a fixed-width coding system that is | |
455 about to be created. Much of the implementation of the fixed-width | |
456 coding system is in Lisp, and this function allows us to rework the | |
457 arguments that `make-coding-system-internal' sees accordingly. | |
458 | |
459 If you are calling this function from anywhere but | |
460 `make-coding-system', you're probably doing something wrong." | |
461 (check-argument-type #'symbolp name) | |
462 (check-valid-plist props) | |
463 (let ((encode-failure-octet (or (plist-get props 'encode-failure-octet) | |
464 (char-to-int ?~))) | |
465 (unicode-map (plist-get props 'unicode-map)) | |
466 (hash-table-sym (gensym (format "%s-encode-table" name))) | |
467 encode-program decode-program decode-table encode-table skip-chars | |
468 invalid-sequences-skip-chars category) | |
469 | |
470 (check-argument-range encode-failure-octet 0 #xFF) | |
471 ;; unicode-map must be a true list, and must be non-nil. | |
472 (check-argument-type #'true-list-p unicode-map) | |
473 (check-argument-type #'consp unicode-map) | |
474 | |
475 ;; Don't pass on our extra data to make-coding-system-internal. | |
476 (setq props (plist-remprop props 'encode-failure-octet) | |
477 props (plist-remprop props 'unicode-map)) | |
478 | |
479 (multiple-value-setq | |
480 (decode-table encode-table) | |
481 (fixed-width-create-decode-encode-tables unicode-map)) | |
482 | |
483 ;; Register the decode-table. | |
484 (define-translation-hash-table hash-table-sym encode-table) | |
485 | |
486 ;; Generate the programs and skip-chars strings. | |
487 (setq decode-program (fixed-width-generate-decode-program decode-table)) | |
488 (multiple-value-setq | |
489 (encode-program skip-chars invalid-sequences-skip-chars) | |
490 (fixed-width-generate-encode-program-and-skip-chars-strings | |
491 decode-table encode-table encode-failure-octet)) | |
492 | |
493 (setq category (fixed-width-choose-category decode-table)) | |
494 | |
495 (unless (vectorp encode-program) | |
496 (setq encode-program | |
497 (apply #'vector | |
498 (nsublis (list (cons 'encode-table-sym hash-table-sym)) | |
499 (copy-tree encode-program))))) | |
500 (unless (vectorp decode-program) | |
501 (setq decode-program | |
502 (apply #'vector decode-program))) | |
503 | |
504 (loop for (symbol . value) | |
505 in `((decode . ,decode-program) | |
506 (encode . ,encode-program) | |
507 (from-unicode . ,encode-table) | |
508 (query-skip-chars . ,skip-chars) | |
509 (invalid-sequences-skip-chars . ,invalid-sequences-skip-chars) | |
510 (category . ,category)) | |
511 with default = (gensym) | |
512 do | |
513 (unless (eq default (plist-get props symbol default)) | |
514 (error | |
515 'invalid-argument | |
516 "Explicit property not allowed for fixed-width coding systems" | |
517 symbol)) | |
518 (setq props (nconc (list symbol value) props))) | |
519 props)) | |
520 | |
521 ;;;###autoload | |
522 (defun make-coding-system (name type description props) | |
523 "Register symbol NAME as a coding system. | |
524 | |
525 TYPE describes the conversion method used and should be one of | |
526 | |
527 nil or `undecided' | |
528 Automatic conversion. XEmacs attempts to detect the coding system | |
529 used in the file. | |
530 `chain' | |
531 Chain two or more coding systems together to make a combination coding | |
532 system. | |
533 `no-conversion' | |
534 No conversion. Use this for binary files and such. On output, | |
535 graphic characters that are not in ASCII or Latin-1 will be | |
536 replaced by a ?. (For a no-conversion-encoded buffer, these | |
537 characters will only be present if you explicitly insert them.) | |
538 `convert-eol' | |
539 Convert CRLF sequences or CR to LF. | |
540 `shift-jis' | |
541 Shift-JIS (a Japanese encoding commonly used in PC operating systems). | |
542 `unicode' | |
543 Any Unicode encoding (UCS-4, UTF-8, UTF-16, etc.). | |
544 `mswindows-unicode-to-multibyte' | |
545 (MS Windows only) Converts from Windows Unicode to Windows Multibyte | |
546 (any code page encoding) upon encoding, and the other way upon decoding. | |
547 `mswindows-multibyte' | |
548 Converts to or from Windows Multibyte (any code page encoding). | |
549 This is resolved into a chain of `mswindows-unicode' and | |
550 `mswindows-unicode-to-multibyte'. | |
551 `iso2022' | |
552 Any ISO2022-compliant encoding. Among other things, this includes | |
553 JIS (the Japanese encoding commonly used for e-mail), EUC (the | |
554 standard Unix encoding for Japanese and other languages), and | |
555 Compound Text (the encoding used in X11). You can specify more | |
556 specific information about the conversion with the PROPS argument. | |
557 `fixed-width' | |
558 A fixed-width eight bit encoding that is not necessarily compliant with | |
559 ISO 2022. This coding system assumes Unicode equivalency, that is, if | |
560 two given XEmacs characters have the same Unicode mapping, they will | |
561 always map to the same octet on disk. | |
562 `big5' | |
563 Big5 (the encoding commonly used for Mandarin Chinese in Taiwan). | |
564 `ccl' | |
565 The conversion is performed using a user-written pseudo-code | |
566 program. CCL (Code Conversion Language) is the name of this | |
567 pseudo-code. | |
568 `gzip' | |
569 GZIP compression format. | |
570 `internal' | |
571 Write out or read in the raw contents of the memory representing | |
572 the buffer's text. This is primarily useful for debugging | |
573 purposes, and is only enabled when XEmacs has been compiled with | |
574 DEBUG_XEMACS defined (via the --debug configure option). | |
575 WARNING: Reading in a file using `internal' conversion can result | |
576 in an internal inconsistency in the memory representing a | |
577 buffer's text, which will produce unpredictable results and may | |
578 cause XEmacs to crash. Under normal circumstances you should | |
579 never use `internal' conversion. | |
580 | |
581 DESCRIPTION is a short English phrase describing the coding system, | |
582 suitable for use as a menu item. (See also the `documentation' property | |
583 below.) | |
584 | |
585 PROPS is a property list, describing the specific nature of the | |
586 character set. Recognized properties are: | |
587 | |
588 `mnemonic' | |
589 String to be displayed in the modeline when this coding system is | |
590 active. | |
591 | |
592 `documentation' | |
593 Detailed documentation on the coding system. | |
594 | |
595 `aliases' | |
596 A list of aliases for the coding system. See | |
597 `define-coding-system-alias'. | |
598 | |
599 `eol-type' | |
600 End-of-line conversion to be used. It should be one of | |
601 | |
602 nil | |
603 Automatically detect the end-of-line type (LF, CRLF, | |
604 or CR). Also generate subsidiary coding systems named | |
605 `NAME-unix', `NAME-dos', and `NAME-mac', that are | |
606 identical to this coding system but have an EOL-TYPE | |
607 value of `lf', `crlf', and `cr', respectively. | |
608 `lf' | |
609 The end of a line is marked externally using ASCII LF. | |
610 Since this is also the way that XEmacs represents an | |
611 end-of-line internally, specifying this option results | |
612 in no end-of-line conversion. This is the standard | |
613 format for Unix text files. | |
614 `crlf' | |
615 The end of a line is marked externally using ASCII | |
616 CRLF. This is the standard format for MS-DOS text | |
617 files. | |
618 `cr' | |
619 The end of a line is marked externally using ASCII CR. | |
620 This is the standard format for Macintosh text files. | |
621 t | |
622 Automatically detect the end-of-line type but do not | |
623 generate subsidiary coding systems. (This value is | |
624 converted to nil when stored internally, and | |
625 `coding-system-property' will return nil.) | |
626 | |
627 `post-read-conversion' | |
628 The value is a function to call after some text is inserted and | |
629 decoded by the coding system itself and before any functions in | |
630 `after-change-functions' are called. (#### Not actually true in | |
631 XEmacs. `after-change-functions' will be called twice if | |
632 `post-read-conversion' changes something.) The argument of this | |
633 function is the same as for a function in | |
634 `after-insert-file-functions', i.e. LENGTH of the text inserted, | |
635 with point at the head of the text to be decoded. | |
636 | |
637 `pre-write-conversion' | |
638 The value is a function to call after all functions in | |
639 `write-region-annotate-functions' and `buffer-file-format' are | |
640 called, and before the text is encoded by the coding system itself. | |
641 The arguments to this function are the same as those of a function | |
642 in `write-region-annotate-functions', i.e. FROM and TO, specifying | |
643 a region of text. | |
644 | |
645 The following properties are used by `default-query-coding-region', | |
646 the default implementation of `query-coding-region'. This | |
647 implementation and these properties are not used by the Unicode coding | |
648 systems, nor by fixed-width coding systems. | |
649 | |
650 `safe-chars' | |
651 The value is a char table. If a character has non-nil value in it, | |
652 the character is safely supported by the coding system. | |
653 This overrides the `safe-charsets' property. | |
654 | |
655 `safe-charsets' | |
656 The value is a list of charsets safely supported by the coding | |
657 system. For coding systems based on ISO 2022, XEmacs may try to | |
658 encode characters outside these character sets, but outside of | |
659 East Asia and East Asian coding systems, it is unlikely that | |
660 consumers of the data will understand XEmacs' encoding. | |
661 The value t means that all XEmacs character sets handles are supported. | |
662 | |
663 The following properties are allowed for FSF compatibility but currently | |
664 ignored: | |
665 | |
666 `translation-table-for-decode' | |
667 The value is a translation table to be applied on decoding. See | |
668 the function `make-translation-table' for the format of translation | |
669 table. This is not applicable to CCL-based coding systems. | |
670 | |
671 `translation-table-for-encode' | |
672 The value is a translation table to be applied on encoding. This is | |
673 not applicable to CCL-based coding systems. | |
674 | |
675 `mime-charset' | |
676 The value is a symbol of which name is `MIME-charset' parameter of | |
677 the coding system. | |
678 | |
679 `valid-codes' (meaningful only for a coding system based on CCL) | |
680 The value is a list to indicate valid byte ranges of the encoded | |
681 file. Each element of the list is an integer or a cons of integer. | |
682 In the former case, the integer value is a valid byte code. In the | |
683 latter case, the integers specifies the range of valid byte codes. | |
684 | |
685 The following additional property is recognized if TYPE is `convert-eol': | |
686 | |
687 `subtype' | |
688 One of `lf', `crlf', `cr' or nil (for autodetection). When decoding, | |
689 the corresponding sequence will be converted to LF. When encoding, | |
690 the opposite happens. This coding system converts characters to | |
691 characters. | |
692 | |
693 | |
694 | |
695 The following additional properties are recognized if TYPE is `iso2022': | |
696 | |
697 `charset-g0' | |
698 `charset-g1' | |
699 `charset-g2' | |
700 `charset-g3' | |
701 The character set initially designated to the G0 - G3 registers. | |
702 The value should be one of | |
703 | |
704 -- A charset object (designate that character set) | |
705 -- nil (do not ever use this register) | |
706 -- t (no character set is initially designated to | |
707 the register, but may be later on; this automatically | |
708 sets the corresponding `force-g*-on-output' property) | |
709 | |
710 `force-g0-on-output' | |
711 `force-g1-on-output' | |
712 `force-g2-on-output' | |
713 `force-g2-on-output' | |
714 If non-nil, send an explicit designation sequence on output before | |
715 using the specified register. | |
716 | |
717 `short' | |
718 If non-nil, use the short forms \"ESC $ @\", \"ESC $ A\", and | |
719 \"ESC $ B\" on output in place of the full designation sequences | |
720 \"ESC $ ( @\", \"ESC $ ( A\", and \"ESC $ ( B\". | |
721 | |
722 `no-ascii-eol' | |
723 If non-nil, don't designate ASCII to G0 at each end of line on output. | |
724 Setting this to non-nil also suppresses other state-resetting that | |
725 normally happens at the end of a line. | |
726 | |
727 `no-ascii-cntl' | |
728 If non-nil, don't designate ASCII to G0 before control chars on output. | |
729 | |
730 `seven' | |
731 If non-nil, use 7-bit environment on output. Otherwise, use 8-bit | |
732 environment. | |
733 | |
734 `lock-shift' | |
735 If non-nil, use locking-shift (SO/SI) instead of single-shift | |
736 or designation by escape sequence. | |
737 | |
738 `no-iso6429' | |
739 If non-nil, don't use ISO6429's direction specification. | |
740 | |
741 `escape-quoted' | |
742 If non-nil, literal control characters that are the same as | |
743 the beginning of a recognized ISO2022 or ISO6429 escape sequence | |
744 (in particular, ESC (0x1B), SO (0x0E), SI (0x0F), SS2 (0x8E), | |
745 SS3 (0x8F), and CSI (0x9B)) are \"quoted\" with an escape character | |
746 so that they can be properly distinguished from an escape sequence. | |
747 (Note that doing this results in a non-portable encoding.) This | |
748 encoding flag is used for byte-compiled files. Note that ESC | |
749 is a good choice for a quoting character because there are no | |
750 escape sequences whose second byte is a character from the Control-0 | |
751 or Control-1 character sets; this is explicitly disallowed by the | |
752 ISO2022 standard. | |
753 | |
754 `input-charset-conversion' | |
755 A list of conversion specifications, specifying conversion of | |
756 characters in one charset to another when decoding is performed. | |
757 Each specification is a list of two elements: the source charset, | |
758 and the destination charset. | |
759 | |
760 `output-charset-conversion' | |
761 A list of conversion specifications, specifying conversion of | |
762 characters in one charset to another when encoding is performed. | |
763 The form of each specification is the same as for | |
764 `input-charset-conversion'. | |
765 | |
766 The following additional properties are recognized if TYPE is | |
767 `fixed-width': | |
768 | |
769 `unicode-map' | |
770 Required. A plist describing a map from octets in the coding system | |
771 NAME (as integers) to XEmacs characters. Those XEmacs characters will | |
772 be used explicitly on decoding, but for encoding (most relevantly, on | |
773 writing to disk) XEmacs characters that map to the same Unicode code | |
774 point will be unified. This means that the ISO-8859-? characters that | |
775 map to the same Unicode code point will not be distinct when written to | |
776 disk, which is normally what is intended; it also means that East Asian | |
777 Han characters from different XEmacs character sets will not be | |
778 distinct when written to disk, which is less often what is intended. | |
779 | |
780 Any octets not mapped, and with values above #x7f, will be decoded into | |
781 XEmacs characters that reflect that their values are undefined. These | |
782 characters will be displayed in a language-environment-specific | |
783 way. See `unicode-error-default-translation-table' and the | |
784 `invalid-sequence-coding-system' argument to `set-language-info'. | |
785 | |
786 These characters will normally be treated as invalid when checking | |
787 whether text can be encoded with `query-coding-region'--see the | |
788 IGNORE-INVALID-SEQUENCESP argument to that function to avoid this. It | |
789 is possible to specify that octets with values less than #x80 (or | |
790 indeed greater than it) be treated in this way, by specifying | |
791 explicitly that they correspond to the character mapping to that octet | |
792 in `unicode-error-default-translation-table'. Far fewer coding systems | |
793 override the ASCII mapping, though, so this is not the default. | |
794 | |
795 `encode-failure-octet' | |
796 An integer between 0 and 255 to write in place of XEmacs characters | |
797 that cannot be encoded, defaulting to the code for tilde `~'. | |
798 | |
799 The following additional properties are recognized (and required) | |
800 if TYPE is `ccl': | |
801 | |
802 `decode' | |
803 CCL program used for decoding (converting to internal format). | |
804 | |
805 `encode' | |
806 CCL program used for encoding (converting to external format). | |
807 | |
808 | |
809 The following additional properties are recognized if TYPE is `chain': | |
810 | |
811 `chain' | |
812 List of coding systems to be chained together, in decoding order. | |
813 | |
814 `canonicalize-after-coding' | |
815 Coding system to be returned by the detector routines in place of | |
816 this coding system. | |
817 | |
818 | |
819 | |
820 The following additional properties are recognized if TYPE is `unicode': | |
821 | |
822 `unicode-type' | |
823 One of `utf-16', `utf-8', `ucs-4', or `utf-7' (the latter is not | |
824 yet implemented). `utf-16' is the basic two-byte encoding; | |
825 `ucs-4' is the four-byte encoding; `utf-8' is an ASCII-compatible | |
826 variable-width 8-bit encoding; `utf-7' is a 7-bit encoding using | |
827 only characters that will safely pass through all mail gateways. | |
828 [[ This should be \"transformation format\". There should also be | |
829 `ucs-2' (or `bmp' -- no surrogates) and `utf-32' (range checked). ]] | |
830 | |
831 `little-endian' | |
832 If non-nil, `utf-16' and `ucs-4' will write out the groups of two | |
833 or four bytes little-endian instead of big-endian. This is required, | |
834 for example, under Windows. | |
835 | |
836 `need-bom' | |
837 If non-nil, a byte order mark (BOM, or Unicode FFFE) should be | |
838 written out at the beginning of the data. This serves both to | |
839 identify the endianness of the following data and to mark the | |
840 data as Unicode (at least, this is how Windows uses it). | |
841 [[ The correct term is \"signature\", since this technique may also | |
842 be used with UTF-8. That is the term used in the standard. ]] | |
843 | |
844 | |
845 The following additional properties are recognized if TYPE is | |
846 `mswindows-multibyte': | |
847 | |
848 `code-page' | |
849 Either a number (specifying a particular code page) or one of the | |
850 symbols `ansi', `oem', `mac', or `ebcdic', specifying the ANSI, | |
851 OEM, Macintosh, or EBCDIC code page associated with a particular | |
852 locale (given by the `locale' property). NOTE: EBCDIC code pages | |
853 only exist in Windows 2000 and later. | |
854 | |
855 `locale' | |
856 If `code-page' is a symbol, this specifies the locale whose code | |
857 page of the corresponding type should be used. This should be | |
858 one of the following: A cons of two strings, (LANGUAGE | |
859 . SUBLANGUAGE) (see `mswindows-set-current-locale'); a string (a | |
860 language; SUBLANG_DEFAULT, i.e. the default sublanguage, is | |
861 used); or one of the symbols `current', `user-default', or | |
862 `system-default', corresponding to the values of | |
863 `mswindows-current-locale', `mswindows-user-default-locale', or | |
864 `mswindows-system-default-locale', respectively. | |
865 | |
866 | |
867 The following additional properties are recognized if TYPE is `undecided': | |
868 \[[ Doesn't GNU use \"detect-*\" for the following two? ]] | |
869 | |
870 `do-eol' | |
871 Do EOL detection. | |
872 | |
873 `do-coding' | |
874 Do encoding detection. | |
875 | |
876 `coding-system' | |
877 If encoding detection is not done, use the specified coding system | |
878 to do decoding. This is used internally when implementing coding | |
879 systems with an EOL type that specifies autodetection (the default), | |
880 so that the detector routines return the proper subsidiary. | |
881 | |
882 | |
883 | |
884 The following additional property is recognized if TYPE is `gzip': | |
885 | |
886 `level' | |
887 Compression level: 0 through 9, or `default' (currently 6)." | |
888 (when (eq 'fixed-width type) | |
889 (setq props (fixed-width-rework-props-runtime name props))) | |
890 (make-coding-system-internal name type description props)) | |
891 | |
892 (define-compiler-macro make-coding-system (&whole form name type | |
893 &optional description props) | |
894 (if (equal '(quote fixed-width) type) | |
895 (if (memq (car-safe props) '(quote eval-when-compile)) | |
896 (let* ((props (if (eq 'eval-when-compile (car props)) | |
897 (eval (cadr props)) | |
898 (cadr props))) | |
899 (encode-failure-octet | |
900 (or (plist-get props 'encode-failure-octet) | |
901 (char-to-int ?~))) | |
902 (unicode-map (plist-get props 'unicode-map)) | |
903 (default-plist-entry (gensym)) | |
904 (encode-table-sym (gensym | |
905 (if (eq 'quote (car name)) | |
906 (format "%s-enc-" (second name))))) | |
907 encode-program decode-program | |
908 decode-table encode-table | |
909 skip-chars invalid-sequences-skip-chars category) | |
910 | |
911 (check-argument-range encode-failure-octet 0 #xFF) | |
912 ;; unicode-map must be a true list, and must be non-nil. | |
913 (check-argument-type #'true-list-p unicode-map) | |
914 (check-argument-type #'consp unicode-map) | |
915 | |
916 ;; Don't pass on our extra data to make-coding-system-internal. | |
917 (setq props (plist-remprop props 'encode-failure-octet) | |
918 props (plist-remprop props 'unicode-map)) | |
919 | |
920 (multiple-value-setq | |
921 (decode-table encode-table) | |
922 (fixed-width-create-decode-encode-tables unicode-map)) | |
923 | |
924 ;; Generate the decode and encode programs, and the skip-chars | |
925 ;; arguments. | |
926 (setq decode-program | |
927 (fixed-width-generate-decode-program decode-table) | |
928 category (fixed-width-choose-category decode-table)) | |
929 | |
930 (multiple-value-setq | |
931 (encode-program skip-chars invalid-sequences-skip-chars) | |
932 (fixed-width-generate-encode-program-and-skip-chars-strings | |
933 decode-table encode-table encode-failure-octet)) | |
934 | |
935 (unless (vectorp decode-program) | |
936 (setq decode-program | |
937 (apply #'vector decode-program))) | |
938 | |
939 (unless (eq default-plist-entry (plist-get props 'encode | |
940 default-plist-entry)) | |
941 (error | |
942 'invalid-argument | |
943 "Explicit property not allowed for fixed-width coding system" | |
944 'encode)) | |
945 (loop for (symbol . value) | |
946 in `((decode . ,decode-program) | |
947 (from-unicode . ,encode-table) | |
948 (query-skip-chars . ,skip-chars) | |
949 (invalid-sequences-skip-chars . | |
950 ,invalid-sequences-skip-chars) | |
951 (category . ,category)) | |
952 do | |
953 (unless (eq default-plist-entry (plist-get props symbol | |
954 default-plist-entry)) | |
955 (error | |
956 'invalid-argument | |
957 "Explicit property not allowed for \ | |
958 fixed-width coding systems" | |
959 symbol)) | |
960 (setq props (nconc (list symbol value) props))) | |
961 `(progn | |
962 (define-translation-hash-table ',encode-table-sym ,encode-table) | |
963 (make-coding-system-internal | |
964 ,name ,type ,description | |
965 ',(nconc (list 'encode | |
966 (apply #'vector | |
967 (nsublis | |
968 (list (cons 'encode-table-sym | |
969 encode-table-sym)) | |
970 encode-program))) | |
971 props)))) | |
972 ;; The form does not use literals; call make-coding-system at | |
973 ;; run time. | |
974 form) | |
975 (if (byte-compile-constp type) | |
976 ;; This is not a fixed-width call; compile it to a form that 21.4 | |
977 ;; can also understand. | |
978 `(funcall (or (and (fboundp 'make-coding-system-internal) | |
979 'make-coding-system-internal) | |
980 'make-coding-system) | |
981 ,@(cdr form)) | |
982 ;; TYPE is not literal; work things out at runtime. | |
983 form))) | |
984 |