Mercurial > hg > xemacs-beta
annotate lisp/mule/mule-coding.el @ 4604:e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
lisp/ChangeLog addition:
2009-02-07 Aidan Kehoe <kehoea@parhasard.net>
* coding.el (query-coding-clear-highlights):
Rename the BUFFER argument to BUFFER-OR-STRING, describe it as
possibly being a string in its documentation.
(default-query-coding-region):
Add a new IGNORE-INVALID-SEQUENCESP argument, document that this
function does not support it.
Bind case-fold-search to nil, we don't want this to influence what the
function thinks is encodable or not.
(query-coding-region):
Add a new IGNORE-INVALID-SEQUENCESP argument, document what it
does; reflect this new argument in the associated compiler macro.
(query-coding-string):
Add a new IGNORE-INVALID-SEQUENCESP argument, document what it
does. Support the HIGHLIGHT argument correctly.
* unicode.el (unicode-query-coding-region):
Add a new IGNORE-INVALID-SEQUENCESP argument, document what it
does, implement this. Document a potential problem.
Use #'query-coding-clear-highlights instead of reimplementing it
ourselves.
Remove some debugging messages.
* mule/arabic.el (iso-8859-6):
* mule/cyrillic.el (iso-8859-5):
* mule/greek.el (iso-8859-7):
* mule/hebrew.el (iso-8859-8):
* mule/latin.el (iso-8859-2):
* mule/latin.el (iso-8859-3):
* mule/latin.el (iso-8859-4):
* mule/latin.el (iso-8859-14):
* mule/latin.el (iso-8859-15):
* mule/latin.el (iso-8859-16):
* mule/latin.el (iso-8859-9):
* mule/latin.el (windows-1252):
* mule/mule-coding.el (iso-8859-1):
Avoid the assumption that characters not given an explicit mapping
in these coding systems map to the ISO 8859-1 characters
corresponding to the octets on disk; this makes it much more
reasonable to implement the IGNORE-INVALID-SEQUENCESP argument to
query-coding-region.
* mule/mule-cmds.el (set-language-info):
Correct the docstring.
* mule/mule-cmds.el (finish-set-language-environment):
Treat invalid Unicode sequences produced from
invalid-sequence-coding-system and corresponding to control
characters the same as control characters in redisplay.
* mule/mule-cmds.el:
Document that encode-coding-char is available in coding.el
* mule/mule-coding.el (make-8-bit-generate-helper):
Change to return the both the encode-program generated and the
relevant non-ASCII charset; update the docstring to reflect this.
* mule/mule-coding.el
(make-8-bit-generate-encode-program-and-skip-chars-strings):
Rename this function; have it return skip-chars-strings as well as
the encode program. Have these skip-chars-strings use ranges for
charsets, where possible.
* mule/mule-coding.el (make-8-bit-create-decode-encode-tables):
Revise this to allow people to specify explicitly characters that
should be undefined (= corresponding to keys in
unicode-error-default-translation-table), and treating unspecified
octets above #x7f as undefined by default.
* mule/mule-coding.el (8-bit-fixed-query-coding-region):
Add a new IGNORE-INVALID-SEQUENCESP argument, implement support
for it using the 8-bit-fixed-invalid-sequences-skip-chars coding
system property; remove some debugging messages.
* mule/mule-coding.el (make-8-bit-coding-system):
This function is dumped, autoloading it makes no sense.
Document what happens when characters above #x7f are not
specified, implement this.
* mule/vietnamese.el:
Correct spelling.
tests/ChangeLog addition:
2009-02-07 Aidan Kehoe <kehoea@parhasard.net>
* automated/query-coding-tests.el:
Add FAILING-CASE arguments to the Assert calls, making #'q-c-debug
mostly unnecessary. Remove #'q-c-debug.
Add new tests that use the IGNORE-INVALID-SEQUENCESP argument to
#'query-coding-region; rework the existing ones to respect it.
author | Aidan Kehoe <kehoea@parhasard.net> |
---|---|
date | Sat, 07 Feb 2009 17:13:37 +0000 |
parents | 1d74a1d115ee |
children | c786c3fd0740 |
rev | line source |
---|---|
502 | 1 ;;; mule-coding.el --- Coding-system functions for Mule. -*- coding: iso-2022-7bit; -*- |
333 | 2 |
3 ;; Copyright (C) 1995 Electrotechnical Laboratory, JAPAN. | |
4 ;; Licensed to the Free Software Foundation. | |
5 ;; Copyright (C) 1995 Amdahl Corporation. | |
6 ;; Copyright (C) 1995 Sun Microsystems. | |
7 ;; Copyright (C) 1997 MORIOKA Tomohiko | |
771 | 8 ;; Copyright (C) 2001 Ben Wing. |
333 | 9 |
10 ;; This file is part of XEmacs. | |
11 | |
12 ;; XEmacs is free software; you can redistribute it and/or modify it | |
13 ;; under the terms of the GNU General Public License as published by | |
14 ;; the Free Software Foundation; either version 2, or (at your option) | |
15 ;; any later version. | |
16 | |
17 ;; XEmacs is distributed in the hope that it will be useful, but | |
18 ;; WITHOUT ANY WARRANTY; without even the implied warranty of | |
19 ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU | |
20 ;; General Public License for more details. | |
21 | |
22 ;; You should have received a copy of the GNU General Public License | |
444 | 23 ;; along with XEmacs; see the file COPYING. If not, write to the |
333 | 24 ;; Free Software Foundation, Inc., 59 Temple Place - Suite 330, |
25 ;; Boston, MA 02111-1307, USA. | |
26 | |
27 ;;; Commentary: | |
28 | |
29 ;;; split off of mule.el and mostly moved to coding.el | |
30 | |
4072 | 31 ;; Needed for make-8-bit-coding-system. |
4080 | 32 (eval-when-compile (require 'ccl)) |
4072 | 33 |
333 | 34 ;;; Code: |
35 | |
36 (defun coding-system-force-on-output (coding-system register) | |
37 "Return the 'force-on-output property of CODING-SYSTEM for the specified REGISTER." | |
444 | 38 (check-type register integer) |
333 | 39 (coding-system-property |
40 coding-system | |
41 (case register | |
42 (0 'force-g0-on-output) | |
43 (1 'force-g1-on-output) | |
44 (2 'force-g2-on-output) | |
45 (3 'force-g3-on-output) | |
46 (t (signal 'args-out-of-range (list register 0 3)))))) | |
47 | |
48 (defun coding-system-short (coding-system) | |
49 "Return the 'short property of CODING-SYSTEM." | |
50 (coding-system-property coding-system 'short)) | |
51 | |
52 (defun coding-system-no-ascii-eol (coding-system) | |
53 "Return the 'no-ascii-eol property of CODING-SYSTEM." | |
54 (coding-system-property coding-system 'no-ascii-eol)) | |
55 | |
56 (defun coding-system-no-ascii-cntl (coding-system) | |
57 "Return the 'no-ascii-cntl property of CODING-SYSTEM." | |
58 (coding-system-property coding-system 'no-ascii-cntl)) | |
59 | |
60 (defun coding-system-seven (coding-system) | |
61 "Return the 'seven property of CODING-SYSTEM." | |
62 (coding-system-property coding-system 'seven)) | |
63 | |
64 (defun coding-system-lock-shift (coding-system) | |
65 "Return the 'lock-shift property of CODING-SYSTEM." | |
66 (coding-system-property coding-system 'lock-shift)) | |
67 | |
68 ;;(defun coding-system-use-japanese-jisx0201-roman (coding-system) | |
69 ;; "Return the 'use-japanese-jisx0201-roman property of CODING-SYSTEM." | |
70 ;; (coding-system-property coding-system 'use-japanese-jisx0201-roman)) | |
71 | |
72 ;;(defun coding-system-use-japanese-jisx0208-1978 (coding-system) | |
73 ;; "Return the 'use-japanese-jisx0208-1978 property of CODING-SYSTEM." | |
74 ;; (coding-system-property coding-system 'use-japanese-jisx0208-2978)) | |
75 | |
76 (defun coding-system-no-iso6429 (coding-system) | |
77 "Return the 'no-iso6429 property of CODING-SYSTEM." | |
78 (coding-system-property coding-system 'no-iso6429)) | |
79 | |
80 (defun coding-system-ccl-encode (coding-system) | |
81 "Return the CCL 'encode property of CODING-SYSTEM." | |
82 (coding-system-property coding-system 'encode)) | |
83 | |
84 (defun coding-system-ccl-decode (coding-system) | |
85 "Return the CCL 'decode property of CODING-SYSTEM." | |
86 (coding-system-property coding-system 'decode)) | |
87 | |
771 | 88 (defun coding-system-iso2022-charset (coding-system register) |
89 "Return the charset initially designated to REGISTER in CODING-SYSTEM. | |
90 The allowable range of REGISTER is 0 through 3." | |
91 (if (or (< register 0) (> register 3)) | |
92 (error 'args-out-of-range "coding-system-charset REGISTER" register 0 3)) | |
93 (coding-system-property coding-system (nth register '(charset-g0 | |
94 charset-g1 | |
95 charset-g2 | |
96 charset-g3)))) | |
97 | |
333 | 98 |
99 ;;;; Definitions of predefined coding systems | |
100 | |
101 (make-coding-system | |
102 'ctext 'iso2022 | |
771 | 103 "Compound Text" |
333 | 104 '(charset-g0 ascii |
105 charset-g1 latin-iso8859-1 | |
106 eol-type nil | |
4568
1d74a1d115ee
Add #'query-coding-region tests; do the work necessary to get them running.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4567
diff
changeset
|
107 safe-charsets t ;; Reasonable |
333 | 108 mnemonic "CText")) |
109 | |
110 (make-coding-system | |
111 'iso-2022-8bit-ss2 'iso2022 | |
771 | 112 "ISO-2022 8-bit w/SS2" |
333 | 113 '(charset-g0 ascii |
114 charset-g1 latin-iso8859-1 | |
115 charset-g2 t ;; unspecified but can be used later. | |
116 short t | |
4568
1d74a1d115ee
Add #'query-coding-region tests; do the work necessary to get them running.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4567
diff
changeset
|
117 safe-charsets (ascii katakana-jisx0201 japanese-jisx0208-1978 |
1d74a1d115ee
Add #'query-coding-region tests; do the work necessary to get them running.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4567
diff
changeset
|
118 japanese-jisx0208 japanese-jisx0212 japanese-jisx0213-1 |
1d74a1d115ee
Add #'query-coding-region tests; do the work necessary to get them running.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4567
diff
changeset
|
119 japanese-jisx0213-2) |
333 | 120 mnemonic "ISO8/SS" |
771 | 121 documentation "ISO 2022 based 8-bit encoding using SS2 for 96-charset" |
333 | 122 )) |
123 | |
124 (make-coding-system | |
125 'iso-2022-7bit-ss2 'iso2022 | |
771 | 126 "ISO-2022 7-bit w/SS2" |
333 | 127 '(charset-g0 ascii |
128 charset-g2 t ;; unspecified but can be used later. | |
129 seven t | |
130 short t | |
4568
1d74a1d115ee
Add #'query-coding-region tests; do the work necessary to get them running.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4567
diff
changeset
|
131 safe-charsets t |
333 | 132 mnemonic "ISO7/SS" |
771 | 133 documentation "ISO 2022 based 7-bit encoding using SS2 for 96-charset" |
333 | 134 eol-type nil)) |
135 | |
136 ;; (copy-coding-system 'iso-2022-7bit-ss2 'iso-2022-jp-2) | |
137 (make-coding-system | |
138 'iso-2022-jp-2 'iso2022 | |
771 | 139 "ISO-2022-JP-2" |
333 | 140 '(charset-g0 ascii |
141 charset-g2 t ;; unspecified but can be used later. | |
142 seven t | |
143 short t | |
4568
1d74a1d115ee
Add #'query-coding-region tests; do the work necessary to get them running.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4567
diff
changeset
|
144 safe-charsets t |
333 | 145 mnemonic "ISO7/SS" |
146 eol-type nil)) | |
147 | |
148 (make-coding-system | |
149 'iso-2022-7bit 'iso2022 | |
771 | 150 "ISO 2022 7-bit" |
333 | 151 '(charset-g0 ascii |
152 seven t | |
153 short t | |
4568
1d74a1d115ee
Add #'query-coding-region tests; do the work necessary to get them running.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4567
diff
changeset
|
154 safe-charsets t |
771 | 155 mnemonic "ISO7" |
156 documentation "ISO-2022-based 7-bit encoding using only G0" | |
157 )) | |
333 | 158 |
159 ;; compatibility for old XEmacsen | |
771 | 160 (define-coding-system-alias 'iso-2022-7 'iso-2022-7bit) |
333 | 161 |
162 (make-coding-system | |
163 'iso-2022-8 'iso2022 | |
771 | 164 "ISO-2022 8-bit" |
333 | 165 '(charset-g0 ascii |
166 charset-g1 latin-iso8859-1 | |
167 short t | |
4568
1d74a1d115ee
Add #'query-coding-region tests; do the work necessary to get them running.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4567
diff
changeset
|
168 safe-charsets t |
333 | 169 mnemonic "ISO8" |
771 | 170 documentation "ISO-2022 eight-bit coding system. No single-shift or locking-shift." |
333 | 171 )) |
172 | |
173 (make-coding-system | |
174 'escape-quoted 'iso2022 | |
771 | 175 "Escape-Quoted (for .ELC files)" |
333 | 176 '(charset-g0 ascii |
177 charset-g1 latin-iso8859-1 | |
178 eol-type lf | |
179 escape-quoted t | |
4568
1d74a1d115ee
Add #'query-coding-region tests; do the work necessary to get them running.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4567
diff
changeset
|
180 safe-charsets t |
333 | 181 mnemonic "ESC/Quot" |
771 | 182 documentation "ISO-2022 eight-bit coding system with escape quoting; used for .ELC files." |
333 | 183 )) |
184 | |
185 (make-coding-system | |
186 'iso-2022-lock 'iso2022 | |
771 | 187 "ISO-2022 w/locking-shift" |
333 | 188 '(charset-g0 ascii |
189 charset-g1 t ;; unspecified but can be used later. | |
190 seven t | |
191 lock-shift t | |
4568
1d74a1d115ee
Add #'query-coding-region tests; do the work necessary to get them running.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4567
diff
changeset
|
192 safe-charsets t |
333 | 193 mnemonic "ISO7/Lock" |
771 | 194 documentation "ISO-2022 coding system using Locking-Shift for 96-charset." |
333 | 195 )) |
4072 | 196 |
333 | 197 |
4072 | 198 ;; This is used by people writing CCL programs, but is called at runtime. |
199 (defun define-translation-hash-table (symbol table) | |
200 "Define SYMBOL as the name of the hash translation TABLE for use in CCL. | |
201 | |
202 Analogous to `define-translation-table', but updates | |
203 `translation-hash-table-vector' and the table is for use in the CCL | |
204 `lookup-integer' and `lookup-character' functions." | |
4145 | 205 (check-argument-type #'symbolp symbol) |
206 (check-argument-type #'hash-table-p table) | |
4072 | 207 (let ((len (length translation-hash-table-vector)) |
208 (id 0) | |
209 done) | |
210 (put symbol 'translation-hash-table table) | |
211 (while (not done) | |
212 (if (>= id len) | |
213 (setq translation-hash-table-vector | |
214 (vconcat translation-hash-table-vector [nil]))) | |
215 (let ((slot (aref translation-hash-table-vector id))) | |
216 (if (or (not slot) | |
217 (eq (car slot) symbol)) | |
218 (progn | |
219 (aset translation-hash-table-vector id (cons symbol table)) | |
220 (setq done t)) | |
221 (setq id (1+ id))))) | |
222 (put symbol 'translation-hash-table-id id) | |
223 id)) | |
224 | |
225 (defvar make-8-bit-private-use-start (decode-char 'ucs #xE000) | |
226 "Start of a 256 code private use area for make-8-bit-coding-system. | |
227 | |
228 This is used to ensure that distinct octets on disk for a given coding | |
229 system map to distinct XEmacs characters, preventing a spurious changes when | |
230 a file is read, not changed, and then written. ") | |
231 | |
232 (defun make-8-bit-generate-helper (decode-table encode-table | |
233 encode-failure-octet) | |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
234 "Helper function, `make-8-bit-generate-encode-program-and-skip-chars-strings', |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
235 which see. |
4072 | 236 |
4145 | 237 Deals with the case where ASCII and another character set can both be |
238 encoded unambiguously and completely into the coding-system; if this is so, | |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
239 returns a list comprised of such a ccl-program and the character set in |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
240 question. If not, it returns a list with both entries nil." |
4072 | 241 (let ((tentative-encode-program-parts |
242 (eval-when-compile | |
4295 | 243 (let* ((vec-len 128) |
244 (compiled | |
4072 | 245 (append |
246 (ccl-compile | |
247 `(1 | |
248 (loop | |
249 (read-multibyte-character r0 r1) | |
250 (if (r0 == ,(charset-id 'ascii)) | |
251 (write r1) | |
252 ((if (r0 == #xABAB) | |
253 ;; #xBFFE is a sentinel in the compiled | |
254 ;; program. | |
4295 | 255 ((r0 = r1 & #x7F) |
256 (write r0 ,(make-vector vec-len #xBFFE))) | |
4072 | 257 ((mule-to-unicode r0 r1) |
258 (if (r0 == #xFFFD) | |
259 (write #xBEEF) | |
260 ((lookup-integer encode-table-sym r0 r3) | |
261 (if r7 | |
262 (write-multibyte-character r0 r3) | |
263 (write #xBEEF)))))))) | |
264 (repeat)))) nil)) | |
265 (first-part compiled) | |
266 (last-part | |
267 (member-if-not (lambda (entr) (eq #xBFFE entr)) | |
268 (member-if | |
269 (lambda (entr) (eq #xBFFE entr)) | |
270 first-part)))) | |
271 (while compiled | |
4295 | 272 (when (eq #xBFFE (cadr compiled)) |
273 (assert (= vec-len (search '(#xBFFE) (cdr compiled) | |
274 :test #'/=)) nil | |
275 "Strange ccl vector length") | |
276 (setcdr compiled nil)) | |
4072 | 277 (setq compiled (cdr compiled))) |
278 ;; Is the generated code as we expect it to be? | |
279 (assert (and (memq #xABAB first-part) | |
280 (memq #xBEEF14 last-part)) | |
281 nil | |
282 "This code assumes that the constant #xBEEF is #xBEEF14 in \ | |
283 compiled CCL code,\nand that the constant #xABAB is #xABAB. If that is | |
284 not the case, and it appears not to be--that's why you're getting this | |
285 message--it will not work. ") | |
4295 | 286 (list first-part last-part vec-len)))) |
4072 | 287 (charset-lower -1) |
288 (charset-upper -1) | |
289 worth-trying known-charsets encode-program | |
4295 | 290 other-charset-vector ucs args-out-of-range) |
4072 | 291 |
292 (loop for char across decode-table | |
293 do (pushnew (char-charset char) known-charsets)) | |
294 (setq known-charsets (delq 'ascii known-charsets)) | |
295 | |
296 (loop for known-charset in known-charsets | |
297 do | |
298 ;; This is not possible for two dimensional charsets. | |
299 (when (eq 1 (charset-dimension known-charset)) | |
300 (setq args-out-of-range t) | |
301 (if (eq 'control-1 known-charset) | |
302 (setq charset-lower 0 | |
303 charset-upper 31) | |
304 ;; There should be a nicer way to get the limits here. | |
305 (condition-case args-out-of-range | |
306 (make-char known-charset #x100) | |
307 (args-out-of-range | |
308 (setq charset-lower (third args-out-of-range) | |
309 charset-upper (fourth args-out-of-range))))) | |
310 (loop | |
311 for i from charset-lower to charset-upper | |
312 always (and (setq ucs | |
313 (encode-char (make-char known-charset i) 'ucs)) | |
314 (gethash ucs encode-table)) | |
315 finally (setq worth-trying known-charset)) | |
316 | |
317 ;; Only trying this for one charset at a time, the first find. | |
318 (when worth-trying (return)) | |
319 | |
320 ;; Okay, this charset is not worth trying, Try the next. | |
321 (setq charset-lower -1 | |
322 charset-upper -1 | |
323 worth-trying nil))) | |
324 | |
325 (when worth-trying | |
4295 | 326 (setq other-charset-vector |
327 (make-vector (third tentative-encode-program-parts) | |
328 encode-failure-octet)) | |
4072 | 329 (loop for i from charset-lower to charset-upper |
4090 | 330 do (aset other-charset-vector i |
4072 | 331 (gethash (encode-char (make-char worth-trying i) |
332 'ucs) encode-table))) | |
333 (setq encode-program | |
334 (nsublis | |
335 (list (cons #xABAB (charset-id worth-trying))) | |
336 (nconc | |
337 (copy-list (first | |
338 tentative-encode-program-parts)) | |
339 (append other-charset-vector nil) | |
340 (copy-tree (second | |
341 tentative-encode-program-parts)))))) | |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
342 (values encode-program worth-trying))) |
4072 | 343 |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
344 (defun make-8-bit-generate-encode-program-and-skip-chars-strings |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
345 (decode-table encode-table encode-failure-octet) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
346 "Generate a CCL program to encode a 8-bit fixed-width charset. |
4072 | 347 |
348 DECODE-TABLE must have 256 non-cons entries, and will be regarded as | |
349 describing a map from the octet corresponding to an offset in the | |
350 table to the that entry in the table. ENCODE-TABLE is a hash table | |
351 map from unicode values to characters in the range [0,255]. | |
352 ENCODE-FAILURE-OCTET describes an integer between 0 and 255 | |
353 \(inclusive) to write in the event that a character cannot be encoded. " | |
354 (check-argument-type #'vectorp decode-table) | |
355 (check-argument-range (length decode-table) #x100 #x100) | |
356 (check-argument-type #'hash-table-p encode-table) | |
357 (check-argument-type #'integerp encode-failure-octet) | |
358 (check-argument-range encode-failure-octet #x00 #xFF) | |
359 (let ((encode-program nil) | |
360 (general-encode-program | |
361 (eval-when-compile | |
362 (let ((prog (append | |
363 (ccl-compile | |
364 `(1 | |
365 (loop | |
366 (read-multibyte-character r0 r1) | |
367 (mule-to-unicode r0 r1) | |
368 (if (r0 == #xFFFD) | |
369 (write #xBEEF) | |
370 ((lookup-integer encode-table-sym r0 r3) | |
371 (if r7 | |
372 (write-multibyte-character r0 r3) | |
373 (write #xBEEF)))) | |
374 (repeat)))) nil))) | |
375 (assert (memq #xBEEF14 prog) | |
376 nil | |
377 "This code assumes that the constant #xBEEF is #xBEEF14 \ | |
378 in compiled CCL code.\nIf that is not the case, and it appears not to | |
379 be--that's why you're getting this message--it will not work. ") | |
380 prog))) | |
381 (encode-program-with-ascii-optimisation | |
382 (eval-when-compile | |
383 (let ((prog (append | |
384 (ccl-compile | |
385 `(1 | |
386 (loop | |
387 (read-multibyte-character r0 r1) | |
388 (if (r0 == ,(charset-id 'ascii)) | |
389 (write r1) | |
390 ((mule-to-unicode r0 r1) | |
391 (if (r0 == #xFFFD) | |
392 (write #xBEEF) | |
393 ((lookup-integer encode-table-sym r0 r3) | |
394 (if r7 | |
395 (write-multibyte-character r0 r3) | |
396 (write #xBEEF)))))) | |
397 (repeat)))) nil))) | |
398 (assert (memq #xBEEF14 prog) | |
399 nil | |
400 "This code assumes that the constant #xBEEF is #xBEEF14 \ | |
401 in compiled CCL code.\nIf that is not the case, and it appears not to | |
402 be--that's why you're getting this message--it will not work. ") | |
403 prog))) | |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
404 (ascii-encodes-as-itself nil) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
405 (control-1-encodes-as-itself t) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
406 (invalid-sequence-code-point-start |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
407 (eval-when-compile |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
408 (char-to-unicode |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
409 (aref (decode-coding-string "\xd8\x00\x00\x00" 'utf-16-be) 3)))) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
410 further-char-set skip-chars invalid-sequences-skip-chars) |
4072 | 411 |
412 ;; Is this coding system ASCII-compatible? If so, we can avoid the hash | |
413 ;; table lookup for those characters. | |
414 (loop | |
415 for i from #x00 to #x7f | |
416 always (eq (int-to-char i) (gethash i encode-table)) | |
417 finally (setq ascii-encodes-as-itself t)) | |
418 | |
419 ;; Note that this logic handles EBCDIC badly. For example, CP037, | |
420 ;; MIME name ebcdic-na, has the entire repertoire of ASCII and | |
421 ;; Latin 1, and thus a more optimal ccl encode program would check | |
422 ;; for those character sets and use tables. But for now, we do a | |
423 ;; hash table lookup for every character. | |
424 (if (null ascii-encodes-as-itself) | |
425 ;; General encode program. Pros; general and correct. Cons; | |
426 ;; slow, a hash table lookup + mule-unicode conversion is done | |
427 ;; for every character encoding. | |
428 (setq encode-program general-encode-program) | |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
429 (multiple-value-setq |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
430 (encode-program further-char-set) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
431 ;; Encode program with ascii-ascii mapping (based on a |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
432 ;; character's mule character set), and one other mule |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
433 ;; character set using table-based encoding, other |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
434 ;; character sets using hash table lookups. |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
435 ;; make-8-bit-non-ascii-completely-coveredp only returns |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
436 ;; such a mapping if some non-ASCII charset with |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
437 ;; characters in decode-table is entirely covered by |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
438 ;; encode-table. |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
439 (make-8-bit-generate-helper decode-table encode-table |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
440 encode-failure-octet)) |
4072 | 441 (unless encode-program |
442 ;; If make-8-bit-non-ascii-completely-coveredp returned nil, | |
443 ;; but ASCII still encodes as itself, do one-to-one mapping | |
444 ;; for ASCII, and a hash table lookup for everything else. | |
445 (setq encode-program encode-program-with-ascii-optimisation))) | |
446 | |
447 (setq encode-program | |
448 (nsublis | |
449 (list (cons #xBEEF14 | |
450 (logior (lsh encode-failure-octet 8) | |
451 #x14))) | |
452 (copy-tree encode-program))) | |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
453 (loop |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
454 for i from #x80 to #x9f |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
455 do (unless (= i (aref decode-table i)) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
456 (setq control-1-encodes-as-itself nil) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
457 (return))) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
458 (loop |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
459 for i from #x00 to #xFF |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
460 initially (setq skip-chars |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
461 (cond |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
462 ((and ascii-encodes-as-itself |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
463 control-1-encodes-as-itself further-char-set) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
464 (concat "\x00-\x9f" (charset-skip-chars-string |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
465 further-char-set))) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
466 ((and ascii-encodes-as-itself |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
467 control-1-encodes-as-itself) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
468 "\x00-\x9f") |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
469 ((null ascii-encodes-as-itself) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
470 (skip-chars-quote (apply #'string |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
471 (append decode-table nil)))) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
472 (further-char-set |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
473 (concat (charset-skip-chars-string 'ascii) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
474 (charset-skip-chars-string further-char-set))) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
475 (t |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
476 (charset-skip-chars-string 'ascii))) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
477 invalid-sequences-skip-chars "") |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
478 with decoded-ucs = nil |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
479 with decoded = nil |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
480 with no-ascii-transparency-skip-chars-list = |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
481 (unless ascii-encodes-as-itself (append decode-table nil)) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
482 ;; Can't use #'match-string here, see: |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
483 ;; http://mid.gmane.org/18829.34118.709782.704574@parhasard.net |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
484 with skip-chars-test = |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
485 #'(lambda (skip-chars-string testing) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
486 (with-temp-buffer |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
487 (insert testing) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
488 (goto-char (point-min)) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
489 (skip-chars-forward skip-chars-string) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
490 (= (point) (point-max)))) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
491 do |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
492 (setq decoded (aref decode-table i) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
493 decoded-ucs (char-to-unicode decoded)) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
494 (cond |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
495 ((<= invalid-sequence-code-point-start decoded-ucs |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
496 (+ invalid-sequence-code-point-start #xFF)) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
497 (setq invalid-sequences-skip-chars |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
498 (concat (string decoded) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
499 invalid-sequences-skip-chars)) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
500 (assert (not (funcall skip-chars-test skip-chars decoded)) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
501 "This char should only be skipped with \ |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
502 `invalid-sequences-skip-chars', not by `skip-chars'")) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
503 ((not (funcall skip-chars-test skip-chars decoded)) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
504 (if ascii-encodes-as-itself |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
505 (setq skip-chars (concat skip-chars (string decoded))) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
506 (push decoded no-ascii-transparency-skip-chars-list)))) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
507 finally (unless ascii-encodes-as-itself |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
508 (setq skip-chars |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
509 (skip-chars-quote |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
510 (apply #'string |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
511 no-ascii-transparency-skip-chars-list))))) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
512 (values encode-program skip-chars invalid-sequences-skip-chars))) |
4072 | 513 |
514 (defun make-8-bit-create-decode-encode-tables (unicode-map) | |
515 "Return a list \(DECODE-TABLE ENCODE-TABLE) given UNICODE-MAP. | |
516 UNICODE-MAP should be an alist mapping from integer octet values to | |
517 characters with UCS code points; DECODE-TABLE will be a 256-element | |
518 vector, and ENCODE-TABLE will be a hash table mapping from 256 numbers | |
519 to 256 distinct characters. " | |
520 (check-argument-type #'listp unicode-map) | |
521 (let ((decode-table (make-vector 256 nil)) | |
522 (encode-table (make-hash-table :size 256)) | |
523 (private-use-start (encode-char make-8-bit-private-use-start 'ucs)) | |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
524 (invalid-sequence-code-point-start |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
525 (eval-when-compile |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
526 (char-to-unicode |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
527 (aref (decode-coding-string "\xd8\x00\x00\x00" 'utf-16-be) 3)))) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
528 desired-ucs decode-table-entry) |
4072 | 529 |
530 (loop for (external internal) | |
531 in unicode-map | |
532 do | |
533 (aset decode-table external internal) | |
534 (assert (not (eq (encode-char internal 'ucs) -1)) | |
535 nil | |
536 "Looks like you're calling make-8-bit-coding-system in a \ | |
537 dumped file, \nand you're either not providing a literal UNICODE-MAP | |
538 or PROPS. Don't do that; make-8-bit-coding-system relies on sensible | |
539 Unicode mappings being available, which they are at compile time for | |
540 dumped files (but this requires the mentioned literals), but not, for | |
541 most of them, at run time. ") | |
542 | |
543 (puthash (encode-char internal 'ucs) | |
544 ;; This is semantically an integer, but Dave Love's design | |
545 ;; for lookup-integer in CCL means we need to store it as a | |
546 ;; character. | |
547 (int-to-char external) | |
548 encode-table)) | |
549 | |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
550 ;; Now, go through the decode table. For octet values above #x7f, if the |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
551 ;; decode table entry is nil, this means that they have an undefined |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
552 ;; mapping (= they map to XEmacs characters with keys in |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
553 ;; unicode-error-default-translation-table); for octet values below or |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
554 ;; equal to #x7f, it means that they map to ASCII. |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
555 |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
556 ;; If any entry (whether below or above #x7f) in the decode-table |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
557 ;; already maps to some character with a key in |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
558 ;; unicode-error-default-translation-table, it is treated as an |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
559 ;; undefined octet by `query-coding-region'. That is, it is not |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
560 ;; necessary for an octet value to be above #x7f for this to happen. |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
561 |
4072 | 562 (dotimes (i 256) |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
563 (setq decode-table-entry (aref decode-table i)) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
564 (if decode-table-entry |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
565 (when (get-char-table |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
566 decode-table-entry |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
567 unicode-error-default-translation-table) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
568 ;; The caller is explicitly specifying that this octet |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
569 ;; corresponds to an invalid sequence on disk: |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
570 (assert (= (get-char-table |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
571 decode-table-entry |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
572 unicode-error-default-translation-table) i) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
573 "Bad argument to `make-8-bit-coding-system'. |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
574 If you're going to designate an octet with value below #x80 as invalid |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
575 for this coding system, make sure to map it to the invalid sequence |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
576 character corresponding to its octet value on disk. ")) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
577 |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
578 ;; decode-table-entry is nil; either the octet is to be treated as |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
579 ;; contributing to an error sequence (when (> #x7f i)), or it should |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
580 ;; be attempted to treat it as ASCII-equivalent. |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
581 (setq desired-ucs (or (and (< i #x80) i) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
582 (+ invalid-sequence-code-point-start i))) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
583 (while (gethash desired-ucs encode-table) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
584 (assert (not (< i #x80)) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
585 "UCS code point should not already be in encode-table!" |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
586 ;; There is one invalid sequence char per octet value; |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
587 ;; with eight-bit-fixed coding systems, it makes no sense |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
588 ;; for us to be multiply allocating them. |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
589 (gethash desired-ucs encode-table)) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
590 (setq desired-ucs (+ private-use-start desired-ucs) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
591 private-use-start (+ private-use-start 1))) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
592 (puthash desired-ucs (int-to-char i) encode-table) |
4085 | 593 (setq desired-ucs (if (> desired-ucs #xFF) |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
594 (unicode-to-char desired-ucs) |
4085 | 595 ;; So we get Latin-1 when run at dump time, |
596 ;; instead of JIT-allocated characters. | |
597 (int-to-char desired-ucs))) | |
598 (aset decode-table i desired-ucs))) | |
4072 | 599 (values decode-table encode-table))) |
600 | |
601 (defun make-8-bit-generate-decode-program (decode-table) | |
602 "Given DECODE-TABLE, generate a CCL program to decode an 8-bit charset. | |
603 DECODE-TABLE must have 256 non-cons entries, and will be regarded as | |
604 describing a map from the octet corresponding to an offset in the | |
605 table to the that entry in the table. " | |
606 (check-argument-type #'vectorp decode-table) | |
607 (check-argument-range (length decode-table) #x100 #x100) | |
608 (let ((decode-program-parts | |
609 (eval-when-compile | |
610 (let* ((compiled | |
611 (append | |
612 (ccl-compile | |
613 `(3 | |
614 ((read r0) | |
615 (loop | |
616 (write-read-repeat r0 ,(make-vector | |
617 256 'sentinel)))))) nil)) | |
618 (first-part compiled) | |
619 (last-part | |
620 (member-if-not #'symbolp | |
621 (member-if-not #'integerp first-part)))) | |
622 ;; Chop off the sentinel sentinel sentinel [..] part. | |
623 (while compiled | |
624 (if (symbolp (cadr compiled)) | |
625 (setcdr compiled nil)) | |
626 (setq compiled (cdr compiled))) | |
627 (list first-part last-part))))) | |
628 (nconc | |
629 ;; copy-list needed, because the structure of the literal provided | |
630 ;; by our eval-when-compile hangs around. | |
631 (copy-list (first decode-program-parts)) | |
632 (append decode-table nil) | |
633 (second decode-program-parts)))) | |
634 | |
4145 | 635 (defun make-8-bit-choose-category (decode-table) |
636 "Given DECODE-TABLE, return an appropriate coding category. | |
637 DECODE-TABLE is a 256-entry vector describing the mapping from octets on | |
638 disk to XEmacs characters for some fixed-width 8-bit coding system. " | |
639 (check-argument-type #'vectorp decode-table) | |
640 (check-argument-range (length decode-table) #x100 #x100) | |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
641 (loop |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
642 named category |
4559
bd1a68c34d44
Merge my change of 2008-05-14 to the query-coding-region code.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4558
diff
changeset
|
643 for i from #x80 to #x9F |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
644 do (unless (= i (aref decode-table i)) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
645 (return-from category 'no-conversion)) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
646 finally return 'iso-8-1)) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
647 |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
648 (defun 8-bit-fixed-query-coding-region (begin end coding-system &optional |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
649 buffer ignore-invalid-sequencesp |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
650 errorp highlightp) |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
651 "The `query-coding-region' implementation for 8-bit-fixed coding systems. |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
652 |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
653 Uses the `8-bit-fixed-query-from-unicode' and `8-bit-fixed-query-skip-chars' |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
654 coding system properties. The former is a hash table mapping from valid |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
655 Unicode code points to on-disk octets in the coding system; the latter a set |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
656 of characters as used by `skip-chars-forward'. Both of these properties are |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
657 generated automatically by `make-8-bit-coding-system'. |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
658 |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
659 See that the documentation of `query-coding-region'; see also |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
660 `make-8-bit-coding-system'. " |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
661 (check-argument-type #'coding-system-p |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
662 (setq coding-system (find-coding-system coding-system))) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
663 (check-argument-type #'integer-or-marker-p begin) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
664 (check-argument-type #'integer-or-marker-p end) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
665 (let ((from-unicode |
4551 | 666 (or (coding-system-get coding-system '8-bit-fixed-query-from-unicode) |
667 (coding-system-get (coding-system-base coding-system) | |
668 '8-bit-fixed-query-from-unicode))) | |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
669 (skip-chars-arg |
4551 | 670 (or (coding-system-get coding-system '8-bit-fixed-query-skip-chars) |
671 (coding-system-get (coding-system-base coding-system) | |
672 '8-bit-fixed-query-skip-chars))) | |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
673 (invalid-sequences-skip-chars |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
674 (or (coding-system-get coding-system |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
675 '8-bit-fixed-invalid-sequences-skip-chars) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
676 (coding-system-get (coding-system-base coding-system) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
677 '8-bit-fixed-invalid-sequences-skip-chars))) |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
678 (ranges (make-range-table)) |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
679 (case-fold-search nil) |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
680 char-after fail-range-start fail-range-end previous-fail extent |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
681 failed invalid-sequences-looking-at failed-reason |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
682 previous-failed-reason) |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
683 (check-type from-unicode hash-table) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
684 (check-type skip-chars-arg string) |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
685 (check-type invalid-sequences-skip-chars string) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
686 (setq invalid-sequences-looking-at |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
687 (if (equal "" invalid-sequences-skip-chars) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
688 ;; Regexp that will never match. |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
689 #r".\{0,0\}" |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
690 (concat "[" invalid-sequences-skip-chars "]"))) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
691 (when ignore-invalid-sequencesp |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
692 (setq skip-chars-arg |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
693 (concat skip-chars-arg invalid-sequences-skip-chars))) |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
694 (save-excursion |
4551 | 695 (when highlightp |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
696 (query-coding-clear-highlights begin end buffer)) |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
697 (goto-char begin buffer) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
698 (skip-chars-forward skip-chars-arg end buffer) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
699 (while (< (point buffer) end) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
700 (setq char-after (char-after (point buffer) buffer) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
701 fail-range-start (point buffer)) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
702 (while (and |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
703 (< (point buffer) end) |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
704 (or (and |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
705 (not (gethash (encode-char char-after 'ucs) from-unicode)) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
706 (setq failed-reason 'unencodable)) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
707 (and (not ignore-invalid-sequencesp) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
708 (looking-at invalid-sequences-looking-at buffer) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
709 (setq failed-reason 'invalid-sequence))) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
710 (or (null previous-failed-reason) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
711 (eq previous-failed-reason failed-reason))) |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
712 (forward-char 1 buffer) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
713 (setq char-after (char-after (point buffer) buffer) |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
714 failed t |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
715 previous-failed-reason failed-reason)) |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
716 (if (= fail-range-start (point buffer)) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
717 ;; The character can actually be encoded by the coding |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
718 ;; system; check the characters past it. |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
719 (forward-char 1 buffer) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
720 ;; The character actually failed. |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
721 (when errorp |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
722 (error 'text-conversion-error |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
723 (format "Cannot encode %s using coding system" |
4551 | 724 (buffer-substring fail-range-start (point buffer) |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
725 buffer)) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
726 (coding-system-name coding-system))) |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
727 (assert (not (null previous-failed-reason)) t |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
728 "previous-failed-reason should always be non-nil here") |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
729 (put-range-table fail-range-start |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
730 ;; If char-after is non-nil, we're not at |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
731 ;; the end of the buffer. |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
732 (setq fail-range-end (if char-after |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
733 (point buffer) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
734 (point-max buffer))) |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
735 previous-failed-reason ranges) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
736 (setq previous-failed-reason nil) |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
737 (when highlightp |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
738 (setq extent (make-extent fail-range-start fail-range-end buffer)) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
739 (set-extent-priority extent (+ mouse-highlight-priority 2)) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
740 (set-extent-face extent 'query-coding-warning-face)) |
4551 | 741 (skip-chars-forward skip-chars-arg end buffer))) |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
742 (if failed |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
743 (values nil ranges) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
744 (values t nil))))) |
4145 | 745 |
4072 | 746 (defun make-8-bit-coding-system (name unicode-map &optional description props) |
747 "Make and return a fixed-width 8-bit CCL coding system named NAME. | |
748 NAME must be a symbol, and UNICODE-MAP a list. | |
749 | |
750 UNICODE-MAP is a plist describing a map from octets in the coding | |
751 system NAME (as integers) to XEmacs characters. Those XEmacs | |
752 characters will be used explicitly on decoding, but for encoding (most | |
753 relevantly, on writing to disk) XEmacs characters that map to the same | |
754 Unicode code point will be unified. This means that the ISO-8859-? | |
755 characters that map to the same Unicode code point will not be | |
756 distinct when written to disk, which is normally what is intended; it | |
757 also means that East Asian Han characters from different XEmacs | |
758 character sets will not be distinct when written to disk, which is | |
759 less often what is intended. | |
760 | |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
761 Any octets not mapped, and with values above #x7f, will be decoded into |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
762 XEmacs characters that reflect that their values are undefined. These |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
763 characters will be displayed in a language-environment-specific way. See |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
764 `unicode-error-default-translation-table' and the |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
765 `invalid-sequence-coding-system' argument to `set-language-info'. |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
766 |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
767 These characters will normally be treated as invalid when checking whether |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
768 text can be encoded with `query-coding-region'--see the |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
769 IGNORE-INVALID-SEQUENCESP argument to that function to avoid this. It is |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
770 possible to specify that octets with values less than #x80 (or indeed |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
771 greater than it) be treated in this way, by specifying explicitly that they |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
772 correspond to the character mapping to that octet in |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
773 `unicode-error-default-translation-table'. Far fewer coding systems |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
774 override the ASCII mapping, though, so this is not the default. |
4072 | 775 |
776 DESCRIPTION and PROPS are as in `make-coding-system', which see. This | |
777 function also accepts two additional (optional) properties in PROPS; | |
778 `aliases', giving a list of aliases to be initialized for this | |
779 coding-system, and `encode-failure-octet', an integer between 0 and 256 to | |
780 write in place of XEmacs characters that cannot be encoded, defaulting to | |
781 the code for tilde `~'. " | |
782 (check-argument-type #'symbolp name) | |
783 (check-argument-type #'listp unicode-map) | |
784 (check-argument-type #'stringp | |
785 (or description | |
786 (setq description | |
787 (format "Coding system used for %s." name)))) | |
788 (check-valid-plist props) | |
789 (let ((encode-failure-octet (or (plist-get props 'encode-failure-octet) | |
790 (char-to-int ?~))) | |
791 (aliases (plist-get props 'aliases)) | |
792 (hash-table-sym (gentemp (format "%s-encode-table" name))) | |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
793 encode-program decode-program result decode-table encode-table |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
794 skip-chars invalid-sequences-skip-chars) |
4072 | 795 |
796 ;; Some more sanity checking. | |
797 (check-argument-range encode-failure-octet 0 #xFF) | |
798 (check-argument-type #'listp aliases) | |
799 | |
800 ;; Don't pass on our extra data to make-coding-system. | |
801 (setq props (plist-remprop props 'encode-failure-octet) | |
802 props (plist-remprop props 'aliases)) | |
803 | |
804 (multiple-value-setq | |
805 (decode-table encode-table) | |
806 (make-8-bit-create-decode-encode-tables unicode-map)) | |
807 | |
808 ;; Register the decode-table. | |
809 (define-translation-hash-table hash-table-sym encode-table) | |
810 | |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
811 ;; Generate the programs and skip-chars strings. |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
812 (setq decode-program (make-8-bit-generate-decode-program decode-table)) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
813 (multiple-value-setq |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
814 (encode-program skip-chars invalid-sequences-skip-chars) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
815 (make-8-bit-generate-encode-program-and-skip-chars-strings |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
816 decode-table encode-table encode-failure-octet)) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
817 |
4072 | 818 (unless (vectorp encode-program) |
819 (setq encode-program | |
820 (apply #'vector | |
821 (nsublis (list (cons 'encode-table-sym hash-table-sym)) | |
822 (copy-tree encode-program))))) | |
823 (unless (vectorp decode-program) | |
824 (setq decode-program | |
825 (apply #'vector decode-program))) | |
826 | |
827 ;; And now generate the actual coding system. | |
828 (setq result | |
829 (make-coding-system | |
830 name 'ccl | |
831 description | |
832 (plist-put (plist-put props 'decode decode-program) | |
833 'encode encode-program))) | |
4295 | 834 (coding-system-put name '8-bit-fixed t) |
4145 | 835 (coding-system-put name 'category |
836 (make-8-bit-choose-category decode-table)) | |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
837 (coding-system-put name '8-bit-fixed-query-skip-chars |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
838 skip-chars) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
839 (coding-system-put name '8-bit-fixed-invalid-sequences-skip-chars |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
840 invalid-sequences-skip-chars) |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
841 (coding-system-put name '8-bit-fixed-query-from-unicode encode-table) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
842 (coding-system-put name 'query-coding-function |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
843 #'8-bit-fixed-query-coding-region) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
844 (coding-system-put (intern (format "%s-unix" name)) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
845 'query-coding-function |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
846 #'8-bit-fixed-query-coding-region) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
847 (coding-system-put (intern (format "%s-dos" name)) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
848 'query-coding-function |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
849 #'8-bit-fixed-query-coding-region) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
850 (coding-system-put (intern (format "%s-mac" name)) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
851 'query-coding-function |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
852 #'8-bit-fixed-query-coding-region) |
4072 | 853 (loop for alias in aliases |
854 do (define-coding-system-alias alias name)) | |
855 result)) | |
856 | |
857 (define-compiler-macro make-8-bit-coding-system (&whole form name unicode-map | |
858 &optional description props) | |
859 ;; We provide the compiler macro (= macro that is expanded only on | |
860 ;; compilation, and that can punt to a runtime version of the | |
861 ;; associate function if necessary) not for reasons of speed, though | |
862 ;; it does speed up things at runtime a little, but because the | |
863 ;; Unicode mappings are available at compile time in the dumped | |
864 ;; files, but they are not available at run time for the vast | |
865 ;; majority of them. | |
866 | |
867 (if (not (and (and (consp name) (eq (car name) 'quote)) | |
868 (and (consp unicode-map) (eq (car unicode-map) 'quote)) | |
869 (and (or (and (consp props) (eq (car props) 'quote)) | |
870 (null props))))) | |
871 ;; The call does not use literals; do it at runtime. | |
872 form | |
873 (setq name (cadr name) | |
874 unicode-map (cadr unicode-map) | |
875 props (if props (cadr props))) | |
876 (let ((encode-failure-octet | |
877 (or (plist-get props 'encode-failure-octet) (char-to-int ?~))) | |
878 (aliases (plist-get props 'aliases)) | |
879 encode-program decode-program | |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
880 decode-table encode-table |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
881 skip-chars invalid-sequences-skip-chars) |
4072 | 882 |
883 ;; Some sanity checking. | |
884 (check-argument-range encode-failure-octet 0 #xFF) | |
885 (check-argument-type #'listp aliases) | |
886 | |
887 ;; Don't pass on our extra data to make-coding-system. | |
888 (setq props (plist-remprop props 'encode-failure-octet) | |
889 props (plist-remprop props 'aliases)) | |
890 | |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
891 ;; Work out encode-table and decode-table |
4072 | 892 (multiple-value-setq |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
893 (decode-table encode-table) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
894 (make-8-bit-create-decode-encode-tables unicode-map)) |
4072 | 895 |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
896 ;; Generate the decode and encode programs, and the skip-chars |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
897 ;; arguments. |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
898 (setq decode-program (make-8-bit-generate-decode-program decode-table)) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
899 (multiple-value-setq |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
900 (encode-program skip-chars invalid-sequences-skip-chars) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
901 (make-8-bit-generate-encode-program-and-skip-chars-strings |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
902 decode-table encode-table encode-failure-octet)) |
4072 | 903 |
904 ;; And return the generated code. | |
905 `(let ((encode-table-sym (gentemp (format "%s-encode-table" ',name))) | |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
906 (encode-table ,encode-table)) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
907 (define-translation-hash-table encode-table-sym encode-table) |
4103 | 908 (make-coding-system |
909 ',name 'ccl ,description | |
910 (plist-put (plist-put ',props 'decode | |
911 ,(apply #'vector decode-program)) | |
912 'encode | |
913 (apply #'vector | |
914 (nsublis | |
915 (list (cons | |
916 'encode-table-sym | |
917 (symbol-value 'encode-table-sym))) | |
918 ',encode-program)))) | |
4295 | 919 (coding-system-put ',name '8-bit-fixed t) |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
920 (coding-system-put ',name 'category |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
921 ',(make-8-bit-choose-category decode-table)) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
922 (coding-system-put ',name '8-bit-fixed-query-skip-chars |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
923 ,skip-chars) |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
924 (coding-system-put ',name '8-bit-fixed-invalid-sequences-skip-chars |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
925 ,invalid-sequences-skip-chars) |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
926 (coding-system-put ',name '8-bit-fixed-query-from-unicode encode-table) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
927 (coding-system-put ',name 'query-coding-function |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
928 #'8-bit-fixed-query-coding-region) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
929 (coding-system-put ',(intern (format "%s-unix" name)) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
930 'query-coding-function |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
931 #'8-bit-fixed-query-coding-region) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
932 (coding-system-put ',(intern (format "%s-dos" name)) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
933 'query-coding-function |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
934 #'8-bit-fixed-query-coding-region) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
935 (coding-system-put ',(intern (format "%s-mac" name)) |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
936 'query-coding-function |
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4299
diff
changeset
|
937 #'8-bit-fixed-query-coding-region) |
4072 | 938 ,(macroexpand `(loop for alias in ',aliases |
939 do (define-coding-system-alias alias | |
940 ',name))) | |
4103 | 941 (find-coding-system ',name))))) |
4299 | 942 |
943 ;; Ideally this would be in latin.el, but code-init.el uses it. | |
944 (make-8-bit-coding-system | |
945 'iso-8859-1 | |
4604
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
946 (loop |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
947 for i from #x80 to #xff |
e0a8715fdb1f
Support new IGNORE-INVALID-SEQUENCESP argument, #'query-coding-region.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4568
diff
changeset
|
948 collect (list i (int-char i))) ;; Identical to Latin-1. |
4299 | 949 "ISO-8859-1 (Latin-1)" |
950 '(mnemonic "Latin 1" | |
951 documentation "The most used encoding of Western Europe and the Americas." | |
952 aliases (iso-latin-1 latin-1))) |