Mercurial > hg > xemacs-beta
annotate lisp/unicode.el @ 4570:e6a7054a9c30
Add check-coding-systems-region, test it and others, fix some bugs.
tests/ChangeLog addition:
2008-12-28 Aidan Kehoe <kehoea@parhasard.net>
* automated/query-coding-tests.el:
Add tests for #'unencodable-char-position,
#'check-coding-systems-region, #'encode-coding-char. Remove some
debugging statements.
lisp/ChangeLog addition:
2008-12-28 Aidan Kehoe <kehoea@parhasard.net>
* coding.el (query-coding-region):
(query-coding-string):
Make these defsubsts; they're short enough, and they're called
explicitly rarely enough, that it makes some sense. The alternative
would be compiler macros that avoid the binding of the arguments.
(unencodable-char-position):
Document where the docstring and API are from.
Correct a special case for zero: check-argument-type returns nil
when it succeeds, so we can't usefully chain its result in an `and'
here.
(check-coding-systems-region): New. API taken from GNU; docstring
and implementation are independent.
(encode-coding-char):
Add an optional third argument, as used by recent GNU. Document
the origin of the docstring.
(default-query-coding-region): Add a short docstring to the
non-Mule implementation of this function.
* unicode.el:
Don't set the query-coding-function property for unicode coding
systems if we're on non-mule. Unintern
unicode-query-coding-region, unicode-query-coding-skip-chars-arg
in the same context.
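The check-argument-type entry above depends on a subtle point. A checker that signals on failure and returns nil on success cannot be chained inside `and', because the nil success value short-circuits the form and the real work is silently skipped. The Emacs Lisp sketch below illustrates the pitfall. It is not the code from coding.el: `my-check-type', `buggy-double' and `fixed-double' are hypothetical helpers written for this illustration, and `my-check-type' stands in for XEmacs's check-argument-type under the assumption, taken from the entry above, that it returns nil when the check passes.

```elisp
;; Hypothetical stand-in for XEmacs's `check-argument-type': signal
;; `wrong-type-argument' if PREDICATE rejects ARGUMENT, otherwise
;; return nil (the success value described in the ChangeLog entry).
(defun my-check-type (predicate argument)
  (unless (funcall predicate argument)
    (signal 'wrong-type-argument (list predicate argument)))
  nil)

;; Buggy: the check succeeds and returns nil, so `and' stops there;
;; the doubling is never evaluated and the result is always nil.
(defun buggy-double (count)
  (and (my-check-type #'natnump count)
       (* 2 count)))

;; Fixed: run the check for its side effect (signalling on bad input),
;; then compute the result unconditionally.
(defun fixed-double (count)
  (my-check-type #'natnump count)
  (* 2 count))

;; (buggy-double 3)  => nil
;; (fixed-double 3)  => 6
;; (fixed-double -1) signals wrong-type-argument
```

Running the check as a separate statement, rather than as the first clause of `and', is what keeps a successful check from discarding the computed value.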
author | Aidan Kehoe <kehoea@parhasard.net> |
---|---|
date | Sun, 28 Dec 2008 22:51:14 +0000 |
parents | 1d74a1d115ee |
children | e0a8715fdb1f |
rev | line source |
---|---|
771 | 1 ;;; unicode.el --- Unicode support -*- coding: iso-2022-7bit; -*- |
2 | |
778 | 3 ;; Copyright (C) 2001, 2002 Ben Wing. |
771 | 4 |
5 ;; Keywords: multilingual, Unicode | |
6 | |
7 ;; This file is part of XEmacs. | |
8 | |
9 ;; XEmacs is free software; you can redistribute it and/or modify it | |
10 ;; under the terms of the GNU General Public License as published by | |
11 ;; the Free Software Foundation; either version 2, or (at your option) | |
12 ;; any later version. | |
13 | |
14 ;; XEmacs is distributed in the hope that it will be useful, but | |
15 ;; WITHOUT ANY WARRANTY; without even the implied warranty of | |
16 ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU | |
17 ;; General Public License for more details. | |
18 | |
19 ;; You should have received a copy of the GNU General Public License | |
20 ;; along with XEmacs; see the file COPYING. If not, write to the Free | |
21 ;; Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA | |
22 ;; 02111-1307, USA. | |
23 | |
24 ;;; Synched up with: Not in FSF. | |
25 | |
26 ;;; Commentary: | |
27 | |
28 ;; Lisp support for Unicode, e.g. initialize the translation tables. | |
29 | |
30 ;;; Code: | |
31 | |
3659 | 32 ;; GNU Emacs has the charsets: |
778 | 33 |
3659 | 34 ;; mule-unicode-2500-33ff |
35 ;; mule-unicode-e000-ffff | |
36 ;; mule-unicode-0100-24ff | |
778 | 37 |
3659 | 38 ;; built-in. This is a hack--and an incomplete hack at that--against the |
39 ;; spirit and the letter of standard ISO 2022 character sets. Instead of | |
40 ;; this, we have the jit-ucs-charset-N Mule character sets, created in | |
41 ;; unicode.c on encountering a Unicode code point that we don't recognise, | |
42 ;; and saved in ISO 2022 coding systems using the UTF-8 escape described in | |
43 ;; ISO-IR 196. | |
778 | 44 |
4083 | 45 (eval-when-compile (when (featurep 'mule) (require 'ccl))) |
46 | |
2367 | 47 ;; accessed in loadup.el, mule-cmds.el; see discussion in unicode.c |
48 (defvar load-unicode-tables-at-dump-time (eq system-type 'windows-nt) | |
49 "[INTERNAL] Whether to load the Unicode tables at dump time. | |
50 Setting this at run-time does nothing.") | |
51 | |
771 | 52 ;; NOTE: This takes only a fraction of a second on my Pentium III |
53 ;; 700MHz even with a totally optimization-disabled XEmacs. | |
54 (defun load-unicode-tables () | |
55 "Initialize the Unicode translation tables for all standard charsets." | |
780 | 56 (let ((parse-args |
57 '(("unicode/unicode-consortium" | |
877 | 58 ;; Due to the braindamaged way Mule treats the ASCII and Control-1 |
59 ;; charsets' types, trying to load them results in out-of-range | |
60 ;; warnings at unicode.c:1439. They're no-ops anyway, they're | |
61 ;; hardwired in unicode.c (unicode_to_ichar, ichar_to_unicode). | |
62 ;; ("8859-1.TXT" ascii #x00 #x7F #x0) | |
63 ;; ("8859-1.TXT" control-1 #x80 #x9F #x-80) | |
64 ;; The 8859-1.TXT G1 assignments are half no-ops, hardwired in | |
65 ;; unicode.c ichar_to_unicode, but not in unicode_to_ichar. | |
780 | 66 ("8859-1.TXT" latin-iso8859-1 #xA0 #xFF #x-80) |
67 ;; "8859-10.TXT" | |
68 ;; "8859-13.TXT" | |
69 ("8859-14.TXT" latin-iso8859-14 #xA0 #xFF #x-80) | |
70 ("8859-15.TXT" latin-iso8859-15 #xA0 #xFF #x-80) | |
2575 | 71 ("8859-16.TXT" latin-iso8859-16 #xA0 #xFF #x-80) |
780 | 72 ("8859-2.TXT" latin-iso8859-2 #xA0 #xFF #x-80) |
73 ("8859-3.TXT" latin-iso8859-3 #xA0 #xFF #x-80) | |
74 ("8859-4.TXT" latin-iso8859-4 #xA0 #xFF #x-80) | |
75 ("8859-5.TXT" cyrillic-iso8859-5 #xA0 #xFF #x-80) | |
76 ("8859-7.TXT" greek-iso8859-7 #xA0 #xFF #x-80) | |
77 ("8859-8.TXT" hebrew-iso8859-8 #xA0 #xFF #x-80) | |
78 ("8859-9.TXT" latin-iso8859-9 #xA0 #xFF #x-80) | |
79 ;; charset for Big5 does not matter; specifying `big5' will | |
80 ;; automatically make the right thing happen | |
81 ("BIG5.TXT" chinese-big5-1 nil nil nil big5) | |
82 ("CNS11643.TXT" chinese-cns11643-1 #x10000 #x1FFFF #x-10000) | |
83 ("CNS11643.TXT" chinese-cns11643-2 #x20000 #x2FFFF #x-20000) | |
84 ;; "CP1250.TXT" | |
85 ;; "CP1251.TXT" | |
86 ;; "CP1252.TXT" | |
87 ;; "CP1253.TXT" | |
88 ;; "CP1254.TXT" | |
89 ;; "CP1255.TXT" | |
90 ;; "CP1256.TXT" | |
91 ;; "CP1257.TXT" | |
92 ;; "CP1258.TXT" | |
93 ;; "CP874.TXT" | |
94 ;; "CP932.TXT" | |
95 ;; "CP936.TXT" | |
96 ;; "CP949.TXT" | |
97 ;; "CP950.TXT" | |
98 ;; "GB12345.TXT" | |
99 ("GB2312.TXT" chinese-gb2312) | |
2297 | 100 ;; "HANGUL.TXT" |
101 ;; #### shouldn't JIS X 0201's upper limit be 7f? | |
780 | 102 ("JIS0201.TXT" latin-jisx0201 #x21 #x80) |
103 ("JIS0201.TXT" katakana-jisx0201 #xA0 #xFF #x-80) | |
104 ("JIS0208.TXT" japanese-jisx0208 nil nil nil ignore-first-column) | |
105 ("JIS0212.TXT" japanese-jisx0212) | |
106 ;; "JOHAB.TXT" | |
107 ;; "KOI8-R.TXT" | |
108 ;; "KSC5601.TXT" | |
109 ;; note that KSC5601.TXT as currently distributed is NOT what | |
110 ;; it claims to be! see comments in KSX1001.TXT. | |
111 ("KSX1001.TXT" korean-ksc5601) | |
112 ;; "OLD5601.TXT" | |
113 ;; "SHIFTJIS.TXT" | |
114 ) | |
115 ("unicode/mule-ucs" | |
2297 | 116 ;; #### we don't support surrogates?!?? |
780 | 117 ;; use these instead of the above ones once we support surrogates |
118 ;;("chinese-cns11643-1.txt" chinese-cns11643-1) | |
119 ;;("chinese-cns11643-2.txt" chinese-cns11643-2) | |
120 ;;("chinese-cns11643-3.txt" chinese-cns11643-3) | |
121 ;;("chinese-cns11643-4.txt" chinese-cns11643-4) | |
122 ;;("chinese-cns11643-5.txt" chinese-cns11643-5) | |
123 ;;("chinese-cns11643-6.txt" chinese-cns11643-6) | |
124 ;;("chinese-cns11643-7.txt" chinese-cns11643-7) | |
125 ("chinese-sisheng.txt" chinese-sisheng) | |
126 ("ethiopic.txt" ethiopic) | |
127 ("indian-is13194.txt" indian-is13194) | |
128 ("ipa.txt" ipa) | |
129 ("thai-tis620.txt" thai-tis620) | |
130 ("tibetan.txt" tibetan) | |
131 ("vietnamese-viscii-lower.txt" vietnamese-viscii-lower) | |
132 ("vietnamese-viscii-upper.txt" vietnamese-viscii-upper) | |
133 ) | |
134 ("unicode/other" | |
135 ("lao.txt" lao) | |
136 ) | |
771 | 137 ))) |
780 | 138 (mapcar #'(lambda (tables) |
139 (let ((undir | |
140 (expand-file-name (car tables) data-directory))) | |
141 (mapcar #'(lambda (args) | |
1318 | 142 (apply 'load-unicode-mapping-table |
780 | 143 (expand-file-name (car args) undir) |
144 (cdr args))) | |
145 (cdr tables)))) | |
4145 | 146 parse-args) |
147 ;; The default-unicode-precedence-list. We set this here to default to | |
148 ;; *not* mapping various European characters to East Asian characters; | |
149 ;; otherwise the default-unicode-precedence-list is numerically ordered | |
150 ;; by charset ID. | |
4317 | 151 (declare-fboundp |
152 (set-default-unicode-precedence-list |
153 '(ascii control-1 latin-iso8859-1 latin-iso8859-2 latin-iso8859-15 |
154 greek-iso8859-7 hebrew-iso8859-8 ipa cyrillic-iso8859-5 |
155 latin-iso8859-16 latin-iso8859-3 latin-iso8859-4 latin-iso8859-9 |
4491 | 156 vietnamese-viscii-lower vietnamese-viscii-upper |
4317 | 157 jit-ucs-charset-0 japanese-jisx0208 japanese-jisx0208-1978 |
158 japanese-jisx0212 japanese-jisx0213-1 japanese-jisx0213-2 |
159 chinese-gb2312 chinese-sisheng chinese-big5-1 chinese-big5-2 |
160 indian-is13194 korean-ksc5601 chinese-cns11643-1 chinese-cns11643-2 |
4491 | 161 chinese-isoir165 |
4317 | 162 composite ethiopic indian-1-column indian-2-column jit-ucs-charset-0 |
163 katakana-jisx0201 lao thai-tis620 thai-xtis tibetan tibetan-1-column |
164 latin-jisx0201 chinese-cns11643-3 chinese-cns11643-4 |
165 chinese-cns11643-5 chinese-cns11643-6 chinese-cns11643-7))))) |
771 | 166 |
167 (make-coding-system | |
168 'utf-16 'unicode | |
169 "UTF-16" | |
170 '(mnemonic "UTF-16" | |
3767 | 171 documentation |
771 | 172 "UTF-16 Unicode encoding -- the standard (almost-) fixed-width |
173 two-byte encoding, with surrogates. It will be fixed-width if all | |
174 characters are in the BMP (Basic Multilingual Plane -- first 65536 | |
175 codepoints). Cannot represent characters with codepoints above | |
176 0x10FFFF (a little more than 1,000,000). Unicode and ISO guarantee | |
177 never to encode any characters outside this range -- all the rest are | |
178 for private, corporate or internal use." | |
3767 | 179 unicode-type utf-16)) |
771 | 180 |
2574 | 181 (define-coding-system-alias 'utf-16-be 'utf-16) |
182 | |
771 | 183 (make-coding-system |
184 'utf-16-bom 'unicode | |
185 "UTF-16 w/BOM" | |
186 '(mnemonic "UTF16-BOM" | |
3767 | 187 documentation |
771 | 188 "UTF-16 Unicode encoding with byte order mark (BOM) at the beginning. |
189 The BOM is Unicode character U+FEFF -- i.e. the first two bytes are | |
190 0xFE and 0xFF, respectively, or reversed in a little-endian | |
191 representation. It has been sanctioned by the Unicode Consortium for | |
192 use at the beginning of a Unicode stream as a marker of the byte order | |
193 of the stream, and commonly appears in Unicode files under Microsoft | |
194 Windows, where it also functions as a magic cookie identifying a | |
195 Unicode file. The character is called \"ZERO WIDTH NO-BREAK SPACE\" | |
196 and is suitable as a byte-order marker because: | |
197 | |
198 -- it has no displayable representation | |
199 -- due to its semantics it never normally appears at the beginning | |
200 of a stream | |
201 -- its reverse U+FFFE is not a legal Unicode character | |
202 -- neither byte sequence is at all likely in any other standard | |
203 encoding, particularly at the beginning of a stream | |
204 | |
205 This coding system will insert a BOM at the beginning of a stream when | |
206 writing and strip it off when reading." | |
3767 | 207 unicode-type utf-16 |
771 | 208 need-bom t)) |
209 | |
210 (make-coding-system | |
211 'utf-16-little-endian 'unicode | |
212 "UTF-16 Little Endian" | |
213 '(mnemonic "UTF16-LE" | |
214 documentation | |
215 "Little-endian version of UTF-16 Unicode encoding. | |
216 See `utf-16' coding system." | |
3767 | 217 unicode-type utf-16 |
771 | 218 little-endian t)) |
219 | |
2574 | 220 (define-coding-system-alias 'utf-16-le 'utf-16-little-endian) |
221 | |
771 | 222 (make-coding-system |
223 'utf-16-little-endian-bom 'unicode | |
224 "UTF-16 Little Endian w/BOM" | |
225 '(mnemonic "MSW-Unicode" | |
226 documentation | |
227 "Little-endian version of UTF-16 Unicode encoding, with byte order mark. | |
228 Standard encoding for representing Unicode under MS Windows. See | |
229 `utf-16-bom' coding system." | |
3767 | 230 unicode-type utf-16 |
771 | 231 little-endian t |
232 need-bom t)) | |
233 | |
234 (make-coding-system | |
235 'ucs-4 'unicode | |
236 "UCS-4" | |
237 '(mnemonic "UCS4" | |
238 documentation | |
239 "UCS-4 Unicode encoding -- fully fixed-width four-byte encoding." | |
3767 | 240 unicode-type ucs-4)) |
771 | 241 |
242 (make-coding-system | |
243 'ucs-4-little-endian 'unicode | |
244 "UCS-4 Little Endian" | |
245 '(mnemonic "UCS4-LE" | |
246 documentation | |
2297 | 247 ;; #### I don't think this is permitted by ISO 10646, only Unicode. |
248 ;; Call it UTF-32 instead? | |
771 | 249 "Little-endian version of UCS-4 Unicode encoding. See `ucs-4' coding system." |
3767 | 250 unicode-type ucs-4 |
771 | 251 little-endian t)) |
252 | |
253 (make-coding-system | |
4096 | 254 'utf-32 'unicode |
255 "UTF-32" | |
256 '(mnemonic "UTF32" | |
257 documentation | |
258 "UTF-32 Unicode encoding -- fixed-width four-byte encoding, | |
259 characters greater than #x10FFFF are not supported. " |
260 unicode-type utf-32)) | |
261 | |
262 (make-coding-system | |
263 'utf-32-little-endian 'unicode | |
264 "UTF-32 Little Endian" | |
265 '(mnemonic "UTF32-LE" | |
266 documentation | |
267 "Little-endian version of UTF-32 Unicode encoding. | |
268 | |
269 A fixed-width four-byte encoding, characters greater than #x10FFFF are not |
270 supported. " | |
271 unicode-type ucs-4 little-endian t)) | |
272 | |
273 (make-coding-system | |
771 | 274 'utf-8 'unicode |
275 "UTF-8" | |
276 '(mnemonic "UTF8" | |
3767 | 277 documentation " |
278 UTF-8 Unicode encoding -- ASCII-compatible 8-bit variable-width encoding | |
2297 | 279 sharing the following principles with the Mule-internal encoding: |
771 | 280 |
281 -- All ASCII characters (codepoints 0 through 127) are represented | |
282 by themselves (i.e. using one byte, with the same value as the | |
283 ASCII codepoint), and these bytes are disjoint from bytes | |
284 representing non-ASCII characters. | |
285 | |
286 This means that any 8-bit clean application can safely process | |
287 UTF-8-encoded text as if it were ASCII, with no corruption (e.g. a |
288 '/' byte is always a slash character, never the second byte of | |
289 some other character, as with Big5, so a pathname encoded in | |
290 UTF-8 can safely be split up into components and reassembled | |
291 again using standard ASCII processes). | |
292 | |
293 -- Leading bytes and non-leading bytes in the encoding of a | |
294 character are disjoint, so moving backwards is easy. | |
295 | |
296 -- Given only the leading byte, you know how many following bytes | |
297 are present. | |
298 " | |
3767 | 299 unicode-type utf-8)) |
771 | 300 |
985 | 301 (make-coding-system |
302 'utf-8-bom 'unicode | |
303 "UTF-8 w/BOM" | |
304 '(mnemonic "MSW-UTF8" | |
305 documentation | |
306 "UTF-8 Unicode encoding, with byte order mark. | |
307 Standard encoding for representing UTF-8 under MS Windows." | |
3767 | 308 unicode-type utf-8 |
985 | 309 little-endian t |
310 need-bom t)) | |
311 | |
2633 | 312 (defun decode-char (quote-ucs code &optional restriction) |
3659 | 313 "FSF compatibility--return Mule character with Unicode codepoint CODE. |
2633 | 314 The second argument must be 'ucs, the third argument is ignored. " |
4096 | 315 ;; We're prepared to accept invalid Unicode in unicode-to-char, but not in |
316 ;; this function, which is the API that should actually be used, since | |
317 ;; it's available in GNU and in Mule-UCS. | |
318 (check-argument-range code #x0 #x10FFFF) | |
3506 | 319 (assert (eq quote-ucs 'ucs) t |
2633 | 320 "Sorry, decode-char doesn't yet support anything but the UCS. ") |
321 (unicode-to-char code)) | |
322 | |
323 (defun encode-char (char quote-ucs &optional restriction) | |
3659 | 324 "FSF compatibility--return the Unicode code point of CHAR. |
2633 | 325 The second argument must be 'ucs, the third argument is ignored. " |
3506 | 326 (assert (eq quote-ucs 'ucs) t |
2633 | 327 "Sorry, encode-char doesn't yet support anything but the UCS. ") |
328 (char-to-unicode char)) | |
329 | |
4317 | 330 (defconst ccl-encode-to-ucs-2 |
331 (eval-when-compile |
332 (let ((pre-existing |
333 ;; This is the compiled CCL program from the assert |
334 ;; below. Since this file is dumped and ccl.el isn't (and |
335 ;; even when it was, it was dumped much later than this |
336 ;; one), we can't compile the program at dump time. We can |
337 ;; check at byte compile time that the program is as |
338 ;; expected, though. |
339 [1 16 131127 7 98872 65823 1307 5 -65536 65313 64833 1028 |
340 147513 8 82009 255 22])) |
341 (when (featurep 'mule) |
342 ;; Check that the pre-existing constant reflects the intended |
343 ;; CCL program. |
344 (assert |
345 (equal pre-existing |
346 (ccl-compile |
347 `(1 |
348 ( ;; mule-to-unicode's first argument is the |
349 ;; charset ID, the second its first byte |
350 ;; left shifted by 7 bits, ORed with its |
351 ;; second byte. |
352 (r1 = (r1 << 7)) |
353 (r1 = (r1 | r2)) |
354 (mule-to-unicode r0 r1) |
355 (if (r0 & ,(lognot #xFFFF)) |
356 ;; Redisplay looks in r1 and r2 for the first |
357 ;; and second bytes of the X11 font, |
358 ;; respectively. For non-BMP characters we |
359 ;; display U+FFFD. |
360 ((r1 = #xFF) |
361 (r2 = #xFD)) |
362 ((r1 = (r0 >> 8)) |
363 (r2 = (r0 & #xFF)))))))) |
364 nil |
365 "The pre-compiled CCL program appears broken. ")) |
366 pre-existing)) |
367 "CCL program to transform Mule characters to UCS-2.") |
368 |
3667 | 369 (when (featurep 'mule) |
4317 | 370 (put 'ccl-encode-to-ucs-2 'ccl-program-idx |
371 (declare-fboundp |
372 (register-ccl-program 'ccl-encode-to-ucs-2 ccl-encode-to-ucs-2)))) |
4145 | 373 |
4317
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
374 ;; Now, create jit-ucs-charset-0 entries for those characters in Windows |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
375 ;; Glyph List 4 that would otherwise end up in East Asian character sets. |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
376 ;; |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
377 ;; WGL4 is a character repertoire from Microsoft that gives a guideline |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
378 ;; for font implementors as to what characters are sufficient for |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
379 ;; pan-European support. The intention of this code is to avoid the |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
380 ;; situation where these characters end up mapping to East Asian XEmacs |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
381 ;; characters, which generally clash strongly with European characters |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
382 ;; both in font choice and character width; jit-ucs-charset-0 is a |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
383 ;; single-width character set which comes before the East Asian character |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
384 ;; sets in the default-unicode-precedence-list above. |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
385 (loop for (ucs ascii-or-latin-1) |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
386 in '((#x2013 ?-) ;; U+2013 EN DASH |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
387 (#x2014 ?-) ;; U+2014 EM DASH |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
388 (#x2105 ?%) ;; U+2105 CARE OF |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
389 (#x203e ?-) ;; U+203E OVERLINE |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
390 (#x221f ?|) ;; U+221F RIGHT ANGLE |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
391 (#x2584 ?|) ;; U+2584 LOWER HALF BLOCK |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
392 (#x2588 ?|) ;; U+2588 FULL BLOCK |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
393 (#x258c ?|) ;; U+258C LEFT HALF BLOCK |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
394 (#x2550 ?|) ;; U+2550 BOX DRAWINGS DOUBLE HORIZONTAL |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
395 (#x255e ?|) ;; U+255E BOX DRAWINGS VERTICAL SINGLE AND RIGHT DOUBLE |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
396 (#x256a ?|) ;; U+256A BOX DRAWINGS VERTICAL SINGLE & HORIZONTAL DOUBLE |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
397 (#x2561 ?|) ;; U+2561 BOX DRAWINGS VERTICAL SINGLE AND LEFT DOUBLE |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
398 (#x2215 ?/) ;; U+2215 DIVISION SLASH |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
399 (#x02c9 ?`) ;; U+02C9 MODIFIER LETTER MACRON |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
400 (#x2211 ?s) ;; U+2211 N-ARY SUMMATION |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
401 (#x220f ?s) ;; U+220F N-ARY PRODUCT |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
402 (#x2248 ?=) ;; U+2248 ALMOST EQUAL TO |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
403 (#x2264 ?=) ;; U+2264 LESS-THAN OR EQUAL TO |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
404 (#x2265 ?=) ;; U+2265 GREATER-THAN OR EQUAL TO |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
405 (#x201c ?') ;; U+201C LEFT DOUBLE QUOTATION MARK |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
406 (#x2026 ?.) ;; U+2026 HORIZONTAL ELLIPSIS |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
407 (#x2212 ?-) ;; U+2212 MINUS SIGN |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
408 (#x2260 ?=) ;; U+2260 NOT EQUAL TO |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
409 (#x221e ?=) ;; U+221E INFINITY |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
410 (#x2642 ?=) ;; U+2642 MALE SIGN |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
411 (#x2640 ?=) ;; U+2640 FEMALE SIGN |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
412 (#x2032 ?=) ;; U+2032 PRIME |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
413 (#x2033 ?=) ;; U+2033 DOUBLE PRIME |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
414 (#x25cb ?=) ;; U+25CB WHITE CIRCLE |
15d36164ebd7
Eliminate lost docstring warnings on 21.5.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4268
diff
changeset
|
415 (#x25cf ?=) ;; U+25CF BLACK CIRCLE |
416 (#x25a1 ?=) ;; U+25A1 WHITE SQUARE |
417 (#x25a0 ?=) ;; U+25A0 BLACK SQUARE |
418 (#x25b2 ?=) ;; U+25B2 BLACK UP-POINTING TRIANGLE |
419 (#x25bc ?=) ;; U+25BC BLACK DOWN-POINTING TRIANGLE |
420 (#x2192 ?=) ;; U+2192 RIGHTWARDS ARROW |
421 (#x2190 ?=) ;; U+2190 LEFTWARDS ARROW |
422 (#x2191 ?=) ;; U+2191 UPWARDS ARROW |
423 (#x2193 ?=) ;; U+2193 DOWNWARDS ARROW |
424 (#x2229 ?=) ;; U+2229 INTERSECTION |
425 (#x2202 ?=) ;; U+2202 PARTIAL DIFFERENTIAL |
426 (#x2261 ?=) ;; U+2261 IDENTICAL TO |
427 (#x221a ?=) ;; U+221A SQUARE ROOT |
428 (#x222b ?=) ;; U+222B INTEGRAL |
429 (#x2030 ?=) ;; U+2030 PER MILLE SIGN |
430 (#x266a ?=) ;; U+266A EIGHTH NOTE |
431 (#x2020 ?*) ;; U+2020 DAGGER |
432 (#x2021 ?*) ;; U+2021 DOUBLE DAGGER |
433 (#x2500 ?|) ;; U+2500 BOX DRAWINGS LIGHT HORIZONTAL |
434 (#x2502 ?|) ;; U+2502 BOX DRAWINGS LIGHT VERTICAL |
435 (#x250c ?|) ;; U+250C BOX DRAWINGS LIGHT DOWN AND RIGHT |
436 (#x2510 ?|) ;; U+2510 BOX DRAWINGS LIGHT DOWN AND LEFT |
437 (#x2518 ?|) ;; U+2518 BOX DRAWINGS LIGHT UP AND LEFT |
438 (#x2514 ?|) ;; U+2514 BOX DRAWINGS LIGHT UP AND RIGHT |
439 (#x251c ?|) ;; U+251C BOX DRAWINGS LIGHT VERTICAL AND RIGHT |
440 (#x252c ?|) ;; U+252C BOX DRAWINGS LIGHT DOWN AND HORIZONTAL |
441 (#x2524 ?|) ;; U+2524 BOX DRAWINGS LIGHT VERTICAL AND LEFT |
442 (#x2534 ?|) ;; U+2534 BOX DRAWINGS LIGHT UP AND HORIZONTAL |
443 (#x253c ?|) ;; U+253C BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL |
444 (#x02da ?^) ;; U+02DA RING ABOVE |
445 (#x2122 ?\xa9) ;; U+2122 TRADE MARK SIGN, ?© |
4145 | 446 |
4317
447 (#x0132 ?\xe6) ;; U+0132 LATIN CAPITAL LIGATURE IJ, ?æ |
448 (#x013f ?\xe6) ;; U+013F LATIN CAPITAL LETTER L WITH MIDDLE DOT, ?æ |
4145 | 449 |
4317
450 (#x0133 ?\xe6) ;; U+0133 LATIN SMALL LIGATURE IJ, ?æ |
451 (#x0140 ?\xe6) ;; U+0140 LATIN SMALL LETTER L WITH MIDDLE DOT, ?æ |
452 (#x0149 ?\xe6) ;; U+0149 LATIN SMALL LETTER N PRECEDED BY APOSTROPHE, ?æ |
4145 | 453 |
4317
454 (#x2194 ?|) ;; U+2194 LEFT RIGHT ARROW |
455 (#x2660 ?*) ;; U+2660 BLACK SPADE SUIT |
456 (#x2665 ?*) ;; U+2665 BLACK HEART SUIT |
457 (#x2663 ?*) ;; U+2663 BLACK CLUB SUIT |
458 (#x2592 ?|) ;; U+2592 MEDIUM SHADE |
459 (#x2195 ?|) ;; U+2195 UP DOWN ARROW |
4145 | 460 |
4317
461 (#x2113 ?\xb9) ;; U+2113 SCRIPT SMALL L, ?¹ |
462 (#x215b ?\xbe) ;; U+215B VULGAR FRACTION ONE EIGHTH, ?¾ |
463 (#x215c ?\xbe) ;; U+215C VULGAR FRACTION THREE EIGHTHS, ?¾ |
464 (#x215d ?\xbe) ;; U+215D VULGAR FRACTION FIVE EIGHTHS, ?¾ |
465 (#x215e ?\xbe) ;; U+215E VULGAR FRACTION SEVEN EIGHTHS, ?¾ |
466 (#x207f ?\xbe) ;; U+207F SUPERSCRIPT LATIN SMALL LETTER N, ?¾ |
4145 | 467 |
4317
468 ;; These are not in WGL 4, but are IPA characters that should not |
469 ;; be double width. They are the only IPA characters that both |
470 ;; occur in packages/mule-packages/leim/ipa.el and end up in East |
471 ;; Asian character sets when that file is loaded in an XEmacs |
472 ;; without packages. |
473 (#x2197 ?|) ;; U+2197 NORTH EAST ARROW |
474 (#x2199 ?|) ;; U+2199 SOUTH WEST ARROW |
475 (#x2191 ?|) ;; U+2191 UPWARDS ARROW |
476 (#x207f ?\xb9)) ;; U+207F SUPERSCRIPT LATIN SMALL LETTER N, ?¹ |
477 with decoded = nil |
478 with syntax-table = (standard-syntax-table) |
479 initially (unless (featurep 'mule) (return)) |
480 ;; This creates jit-ucs-charset-0 entries because: |
481 ;; |
482 ;; 1. If the tables are dumped, it is run at dump time before they are |
483 ;; dumped, and as such before the relevant conversions are available |
484 ;; (they are made available in mule/general-late.el). |
485 ;; |
486 ;; 2. If the tables are not dumped, it is run at dump time, long before |
487 ;; any of the other mappings are available. |
488 ;; |
489 do |
490 (setq decoded (decode-char 'ucs ucs)) |
491 (assert (eq (declare-fboundp (char-charset decoded)) |
492 'jit-ucs-charset-0) nil |
493 "Unexpected Unicode decoding behavior. ") |
494 (modify-syntax-entry decoded |
495 (string |
496 (char-syntax ascii-or-latin-1)) |
497 syntax-table)) |
4145 | 498 |
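The loop that ends just above gives each listed non-ASCII character the syntax class of its ASCII or Latin-1 stand-in, through `(modify-syntax-entry decoded (string (char-syntax ascii-or-latin-1)) syntax-table)`. The following is a minimal sketch of that technique in isolation. It assumes GNU Emacs, where `char-syntax` reads the current buffer's syntax table (hence the `with-syntax-table` wrapper). The names `my-borrow-syntax` and `my-demo-syntax-table` are hypothetical and are not part of unicode.el.

```elisp
;; Hypothetical helper for illustration; not part of unicode.el.
(defun my-borrow-syntax (char stand-in table)
  "Give CHAR, in syntax TABLE, the syntax class that STAND-IN has there."
  (modify-syntax-entry
   char
   ;; `char-syntax' returns the class designator as a character
   ;; (?w for word constituents, ?_ for symbol constituents, ?. for
   ;; punctuation, ...); `string' turns it into the one-character
   ;; descriptor that `modify-syntax-entry' expects.
   (string (with-syntax-table table (char-syntax stand-in)))
   table))

;; A scratch table that inherits from the standard syntax table, so the
;; standard table itself is left untouched.
(defvar my-demo-syntax-table (make-syntax-table)
  "Scratch syntax table for the `my-borrow-syntax' example.")

;; U+2260 NOT EQUAL TO borrows the syntax of ?=, as in the list above.
(my-borrow-syntax #x2260 ?= my-demo-syntax-table)
```

Only the syntax class is copied. Matching-paren characters and syntax flags of the stand-in are not carried over, and the loop above behaves the same way.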
4268 | 499 ;; *Sigh*, declarations need to be at the start of the line to be picked up |
500 ;; by make-docfile. Not so much an issue with ccl-encode-to-ucs-2, which we | |
501 ;; don't necessarily want to advertise, but the following are important. | |
502 | |
503 ;; Create all the Unicode error sequences, normally as jit-ucs-charset-0 | |
504 ;; characters starting at U+200000 (which isn't a valid Unicode code | |
505 ;; point). Make them available to user code. | |
506 (defvar unicode-error-default-translation-table | |
507 (loop | |
4468
a78d697ccd2c
Import and extend GNU's descr-text.el, supporting prefix argument for C-x =
Aidan Kehoe <kehoea@parhasard.net>
parents:
4317
diff
changeset
|
508 with char-table = (make-char-table 'generic) |
4268 | 509 for i from ?\x00 to ?\xFF |
4317
510 initially (unless (featurep 'mule) (return)) |
4268 | 511 do |
512 (put-char-table (aref | |
513 ;; #xd800 is the first leading surrogate; | |
514 ;; trailing surrogates must be in the range | |
515 ;; #xdc00-#xdfff. These examples are not, so we | |
516 ;; intentionally provoke an error sequence. | |
517 (decode-coding-string (format "\xd8\x00\x00%c" i) | |
518 'utf-16-be) | |
519 3) | |
520 i | |
521 char-table) | |
522 finally return char-table) | |
523 "Translation table mapping Unicode error sequences to Latin-1 chars. | |
4145 | 524 |
4202 | 525 To transform XEmacs Unicode error sequences to the Latin-1 characters that |
526 correspond to the octets on disk, you can use this variable. ") | |
4145 | 527 |
4490
67fbcaf3dbdc
error-sequence -> invalid-sequence
Aidan Kehoe <kehoea@parhasard.net>
parents:
4489
diff
changeset
|
528 (defvar unicode-invalid-sequence-regexp-range |
4317
529 (and (featurep 'mule) |
530 (format "%c%c-%c" |
531 (aref (decode-coding-string "\xd8\x00\x00\x00" 'utf-16-be) 0) |
532 (aref (decode-coding-string "\xd8\x00\x00\x00" 'utf-16-be) 3) |
533 (aref (decode-coding-string "\xd8\x00\x00\xFF" 'utf-16-be) 3))) |
4268 | 534 "Regular expression range to match Unicode error sequences in XEmacs. |
4145 | 535 |
4202 | 536 Invalid Unicode sequences on input are represented as XEmacs |
537 characters with values stored as the keys in | |
538 `unicode-error-default-translation-table', one character for each | |
539 invalid octet. You can use this variable (with `re-search-forward' or | |
540 `skip-chars-forward') to search for such characters; see also | |
541 `unicode-error-translate-region'. ") | |
542 | |
4317
543 ;; Check that the lookup table is correct, and that all the actual error |
544 ;; sequences are caught by the regexp. |
545 (with-temp-buffer |
546 (loop |
547 for i from ?\x00 to ?\xFF |
548 with to-check = (make-string 20 ?\x20) |
549 initially (unless (featurep 'mule) (return)) |
550 do |
551 (delete-region (point-min) (point-max)) |
552 (insert to-check) |
553 (goto-char 10) |
554 (insert (decode-coding-string (format "\xd8\x00\x00%c" i) |
555 'utf-16-be)) |
556 (backward-char) |
557 (assert (= i (get-char-table (char-after (point)) |
558 unicode-error-default-translation-table)) |
559 (format "Char ?\\x%x not the expected error sequence!" |
560 i)) |
4202 | 561 |
4317
562 (goto-char (point-min)) |
563 ;; Comment out until the issue in |
564 ;; 18179.49815.622843.336527@parhasard.net is fixed. |
565 (assert t ; (re-search-forward (concat "[" |
4490
566 ; unicode-invalid-sequence-regexp-range |
4317
567 ; "]")) |
568 nil |
569 (format "Could not find char ?\\x%x in buffer" i)))) |
4202 | 570 |
4268 | 571 (defun frob-unicode-errors-region (frob-function begin end &optional buffer) |
572 "Call FROB-FUNCTION on the Unicode error sequences between BEGIN and END. | |
4202 | 573 |
574 Optional argument BUFFER specifies the buffer that should be examined for | |
575 such sequences. " | |
4268 | 576 (check-argument-type #'functionp frob-function) |
577 (check-argument-range begin (point-min buffer) (point-max buffer)) | |
578 (check-argument-range end (point-min buffer) (point-max buffer)) | |
4202 | 579 (save-excursion |
580 (save-restriction | |
581 (if buffer (set-buffer buffer)) | |
582 (narrow-to-region begin end) | |
583 (goto-char (point-min)) | |
584 (while end | |
585 (setq begin | |
586 (progn | |
587 (skip-chars-forward | |
4490
588 (concat "^" unicode-invalid-sequence-regexp-range)) |
4202 | 589 (point)) |
590 end (and (not (= (point) (point-max))) | |
591 (progn | |
592 (skip-chars-forward | |
4490
593 unicode-invalid-sequence-regexp-range) |
4202 | 594 (point)))) |
595 (if end | |
596 (funcall frob-function begin end)))))) | |
597 | |
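`frob-unicode-errors-region` alternates two `skip-chars-forward` calls. The first skips characters outside `unicode-invalid-sequence-regexp-range`, the second skips the run inside it, and the bounds of each run are passed to FROB-FUNCTION. Below is a minimal, portable sketch of that run-scanning pattern. It assumes GNU Emacs and uses ASCII digits as a stand-in for the invalid-sequence characters, which only exist under XEmacs Mule. It also collects the runs rather than calling a function on them. `my-collect-runs` is a hypothetical name.

```elisp
;; Hypothetical helper for illustration; not part of unicode.el.
(defun my-collect-runs (range begin end)
  "Return a list of (START . FINISH) pairs, one per run of RANGE chars.
RANGE is a `skip-chars-forward' character set such as \"0-9\"; only the
text between BEGIN and END in the current buffer is examined."
  (let ((runs '()))
    (save-excursion
      (save-restriction
        (narrow-to-region begin end)
        (goto-char (point-min))
        (while (< (point) (point-max))
          ;; First skip every character that is NOT in RANGE...
          (skip-chars-forward (concat "^" range))
          (let ((start (point)))
            ;; ...then skip the run of characters that ARE in RANGE.
            (skip-chars-forward range)
            ;; At the end of the region both skips stop without moving,
            ;; so only non-empty runs are recorded.
            (when (> (point) start)
              (push (cons start (point)) runs))))))
    (nreverse runs)))
```

For example, in a buffer containing `ab12cd345`, `(my-collect-runs "0-9" (point-min) (point-max))` returns `((3 . 5) (7 . 10))`. If you replace the `push` with `(funcall frob-function start (point))`, you get the same shape as the original function.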
4317
598 (defun unicode-error-translate-region (begin end &optional buffer table) |
599 "Translate the Unicode error sequences in BUFFER between BEGIN and END. |
4202 | 600 |
601 The error sequences are transformed, by default, into the ASCII, | |
602 control-1 and latin-iso8859-1 characters with the numeric values | |
603 corresponding to the incorrect octets encountered. This is achieved | |
604 by using `unicode-error-default-translation-table' (which see) for | |
605 TABLE; you can change this by supplying another character table, | |
606 mapping from the error sequences to the desired characters. " | |
607 (unless table (setq table unicode-error-default-translation-table)) | |
608 (frob-unicode-errors-region | |
609 (lambda (start finish) | |
610 (translate-region start finish table)) | |
4317
611 begin end buffer)) |
612 |
4489
b75b075a9041
Support displaying invalid UTF-8 in language-environment-specific ways.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4468
diff
changeset
|
613 ;; Sure would be nice to be able to use defface here. |
4490
614 (copy-face 'highlight 'unicode-invalid-sequence-warning-face) |
4489
615 |
4549
68d1ca56cffa
First part of interactive checks that coding systems encode regions.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4317
diff
changeset
|
616 (defvar unicode-query-coding-skip-chars-arg nil ;; Set in general-late.el |
617 "Used by `unicode-query-coding-region' to skip chars with known mappings.") |
618 |
619 (defun unicode-query-coding-region (begin end coding-system |
620 &optional buffer errorp highlightp) |
621 "The `query-coding-region' implementation for Unicode coding systems." |
622 (check-argument-type #'coding-system-p |
623 (setq coding-system (find-coding-system coding-system))) |
624 (check-argument-type #'integer-or-marker-p begin) |
625 (check-argument-type #'integer-or-marker-p end) |
626 (let* ((skip-chars-arg unicode-query-coding-skip-chars-arg) |
627 (ranges (make-range-table)) |
628 (looking-at-arg (concat "[" skip-chars-arg "]")) |
4568
1d74a1d115ee
Add #'query-coding-region tests; do the work necessary to get them running.
Aidan Kehoe <kehoea@parhasard.net>
parents:
4566
diff
changeset
|
629 fail-range-start fail-range-end char-after failed |
4551 | 630 extent) |
4549
631 (save-excursion |
4551 | 632 (when highlightp |
633 (map-extents #'(lambda (extent ignored-arg) | |
634 (when (eq 'query-coding-warning-face | |
635 (extent-face extent)) | |
636 (delete-extent extent))) buffer begin end)) | |
4549
637 (goto-char begin buffer) |
638 (skip-chars-forward skip-chars-arg end buffer) |
639 (while (< (point buffer) end) |
4551 | 640 ; (message |
4568
641 ; "fail-range-start is %S, point is %S, end is %S" |
642 ; fail-range-start (point buffer) end) |
4549
643 (setq char-after (char-after (point buffer) buffer) |
644 fail-range-start (point buffer)) |
645 (while (and |
646 (< (point buffer) end) |
647 (not (looking-at looking-at-arg)) |
648 (= -1 (char-to-unicode char-after))) |
649 (forward-char 1 buffer) |
650 (setq char-after (char-after (point buffer) buffer) |
651 failed t)) |
652 (if (= fail-range-start (point buffer)) |
653 ;; The character can actually be encoded by the coding |
654 ;; system; check the characters past it. |
4551 | 655 (forward-char 1 buffer) |
4549
656 ;; Can't be encoded; note this. |
657 (when errorp |
658 (error 'text-conversion-error |
659 (format "Cannot encode %s using coding system" |
660 (buffer-substring fail-range-start (point buffer) |
661 buffer)) |
662 (coding-system-name coding-system))) |
663 (put-range-table fail-range-start |
664 ;; If char-after is non-nil, we're not at |
665 ;; the end of the buffer. |
666 (setq fail-range-end (if char-after |
667 (point buffer) |
668 (point-max buffer))) |
669 t ranges) |
670 (when highlightp |
671 (setq extent (make-extent fail-range-start fail-range-end buffer)) |
672 (set-extent-priority extent (+ mouse-highlight-priority 2)) |
673 (set-extent-face extent 'query-coding-warning-face))) |
674 (skip-chars-forward skip-chars-arg end buffer)) |
675 (if failed |
676 (values nil ranges) |
677 (values t nil))))) |
678 |
679 (loop |
680 for coding-system in (coding-system-list) |
4570 | 681 initially (unless (featurep 'mule) (return)) |
4549 | 682 do (when (eq 'unicode (coding-system-type coding-system)) |
683 (coding-system-put coding-system 'query-coding-function |
684 #'unicode-query-coding-region))) |
685 |
4317 | 686 (unless (featurep 'mule) |
687 ;; We do this in such a roundabout way--instead of having the above defun |
688 ;; and defvar calls inside a (when (featurep 'mule) ...) form--to have |
689 ;; make-docfile.c pick up symbol and function documentation correctly. An |
690 ;; alternative approach would be to fix make-docfile.c to be able to read |
691 ;; Lisp. |
692 (mapcar #'unintern |
693 '(ccl-encode-to-ucs-2 unicode-error-default-translation-table |
4490 | 694 unicode-invalid-regexp-range frob-unicode-errors-region |
4570 | 695 unicode-error-translate-region unicode-query-coding-region |
696 unicode-query-coding-skip-chars-arg))) |
3667 | 697 |
771 | 698 ;; #### UTF-7 is not yet implemented, and it's tricky to do. There's |
699 ;; an implementation in appendix A.1 of the Unicode Standard, Version | |
700 ;; 2.0, but I don't know its licensing characteristics. | |
701 | |
702 ; (make-coding-system | |
703 ; 'utf-7 'unicode | |
704 ; "UTF-7" | |
705 ; '(mnemonic "UTF7" | |
3659 | 706 ; documentation "UTF-7 Unicode encoding -- 7-bit-ASCII modal Internet-mail-compatible |
771 | 707 ; encoding especially designed for headers, with the following |
708 ; properties: | |
709 | |
710 ; -- Only characters that are considered safe for passing through any mail | |
711 ; gateway without damage are used. | |
712 | |
713 ; -- This is a modal encoding, with two states. The first, default | |
714 ; state encodes the most common Unicode characters (upper and | |
715 ; lowercase letters, digits, and 9 common punctuation marks) as | |
716 ; themselves, and the second state, entered using '+' and | |
717 ; terminated with '-' or any character disallowed in state 2, | |
718 ; encodes any Unicode characters by first converting to UTF-16, | |
719 ; most significant byte first, and then to a slightly modified | |
720 ; Base64 encoding. (Thus, UTF-7 has the same limitations on the | |
721 ; characters it can encode as UTF-16.) | |
722 | |
723 ; -- The modified Base64 encoding deviates from standard Base64 in | |
724 ; that it omits the `=' pad character. This is eliminated so as to | |
725 ; avoid conflicts with the use of `=' as an escape in the | |
726 ; Quoted-Printable encoding and the related Q encoding for headers: | |
727 ; With this modification, non-whitespace chars in UTF-7 will be | |
728 ; represented in Quoted-Printable and in Q as-is, with no further | |
729 ; encoding. | |
730 | |
731 ; For more information, see Appendix A.1 of The Unicode Standard 2.0, or | |
732 ; wherever it is in v3.0." | |
3767 | 733 ; unicode-type utf-7)) |
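The commented-out definition above only describes UTF-7; the file states that it is not implemented. As a rough illustration of the rules its docstring lists (two states, `+` to enter Base64, UTF-16 big-endian input, no `=` padding), here is a minimal Emacs Lisp sketch of an encoder. It is a sketch under stated assumptions, not the author's design: all `utf7-sketch-*` names are hypothetical, it follows RFC 2152 rather than the Unicode 2.0 appendix, it assumes GNU Emacs integer characters (it has not been checked on XEmacs, where characters are a separate type), it also writes space, tab, CR and LF directly, it Base64-encodes every other character, and it always closes a Base64 run with an explicit `-`.

```elisp
;;; Hypothetical sketch, not part of unicode.el: a UTF-7 encoder
;;; following RFC 2152.  Assumes GNU Emacs, where characters are integers.

(defconst utf7-sketch-base64-alphabet
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
  "Standard Base64 alphabet; UTF-7 only drops the `=' padding.")

(defconst utf7-sketch-direct-chars
  (append (concat "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                  "abcdefghijklmnopqrstuvwxyz"
                  "0123456789"
                  "'(),-./:?"     ; the 9 punctuation marks
                  " \t\r\n")      ; whitespace, also direct per RFC 2152
          nil)
  "Characters written as themselves in UTF-7's default state.")

(defun utf7-sketch-utf16-units (code-point)
  "Return CODE-POINT as a list of 16-bit UTF-16 code units."
  (if (< code-point #x10000)
      (list code-point)
    ;; Outside the BMP: split into a high and a low surrogate.
    (let ((v (- code-point #x10000)))
      (list (+ #xD800 (ash v -10))
            (+ #xDC00 (logand v #x3FF))))))

(defun utf7-sketch-base64 (units)
  "Modified Base64 of UNITS, most significant byte first, with no padding."
  (let ((bits 0) (nbits 0) (out '()))
    (dolist (unit units)
      (setq bits (logior (ash bits 16) unit)
            nbits (+ nbits 16))
      ;; Emit every complete 6-bit group.
      (while (>= nbits 6)
        (setq nbits (- nbits 6))
        (push (aref utf7-sketch-base64-alphabet
                    (logand (ash bits (- nbits)) 63))
              out))
      ;; Keep only the bits that have not been emitted yet.
      (setq bits (logand bits (1- (ash 1 nbits)))))
    ;; Zero-fill the last partial group instead of adding `=' padding.
    (when (> nbits 0)
      (push (aref utf7-sketch-base64-alphabet
                  (logand (ash bits (- 6 nbits)) 63))
            out))
    (concat (nreverse out))))

(defun utf7-sketch-encode (code-points)
  "Encode CODE-POINTS, a list of Unicode code points, as a UTF-7 string.
Every Base64 run is closed with an explicit `-', which RFC 2152 permits."
  (let ((parts '())   ; output strings, most recent first
        (run '()))    ; UTF-16 units awaiting Base64, most recent first
    ;; The trailing nil marks end of input so a final run gets flushed.
    (dolist (cp (append code-points (list nil)))
      (if (and cp (/= cp ?+) (not (memq cp utf7-sketch-direct-chars)))
          ;; Second state: queue the character's UTF-16 code units.
          (setq run (append (reverse (utf7-sketch-utf16-units cp)) run))
        ;; Leaving the second state: emit "+", the Base64 text and "-".
        (when run
          (push (concat "+" (utf7-sketch-base64 (nreverse run)) "-") parts)
          (setq run nil))
        (cond ((null cp))                      ; end of input
              ((= cp ?+) (push "+-" parts))    ; literal plus sign
              (t (push (string cp) parts)))))  ; default state: as itself
    (apply #'concat (nreverse parts))))
```

For example, `(utf7-sketch-encode (list ?A #x2262 #x391 ?.))` should return "A+ImIDkQ-.", which matches the RFC 2152 example apart from the closing `-` that the RFC leaves optional before a non-Base64 character. Always emitting `-` keeps the state machine simple at the cost of one byte per run.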