annotate lisp/unicode.el @ 3767:6b2ef948e140

[xemacs-hg @ 2006-12-29 18:09:38 by aidan] etc/ChangeLog addition: 2006-12-21 Aidan Kehoe <kehoea@parhasard.net> * unicode/unicode-consortium/8859-7.TXT: Update the mapping to the 2003 version of ISO 8859-7. lisp/ChangeLog addition: 2006-12-21 Aidan Kehoe <kehoea@parhasard.net> * mule/cyrillic.el: * mule/cyrillic.el (iso-8859-5): * mule/cyrillic.el (cyrillic-koi8-r-encode-table): Add syntax, case support for Cyrillic; make some parentheses more Lispy. * mule/european.el: Content moved to latin.el, file deleted. * mule/general-late.el: If Unicode tables are to be loaded at dump time, do it here, not in loadup.el. * mule/greek.el: Add syntax, case support for Greek. * mule/latin.el: Move the content of european.el here. Change the case table mappings to use hexadecimal codes, to make cross reference to the standards easier. In all cases, take character syntax from similar characters in Latin-1 , rather than deciding separately what syntax they should take. Add (incomplete) support for case with Turkish. Remove description of the character sets used from the language environments' doc strings, since now that we create variant language environments on the fly, such descriptions will often be inaccurate. Set the native-coding-system language info property while setting the other coding-system properties of the language. * mule/misc-lang.el (ipa): Remove the language environment. The International Phonetic _Alphabet_ is not a language, it's inane to have a corresponding language environment in XEmacs. * mule/mule-cmds.el (create-variant-language-environment): Also modify the coding-priority when creating a new language environment; document that. * mule/mule-cmds.el (get-language-environment-from-locale): Recognise that the 'native-coding-system language-info property can be a list, interpret it correctly when it is one. 2006-12-21 Aidan Kehoe <kehoea@parhasard.net> * coding.el (coding-system-category): Use the new 'unicode-type property for finding what sort of Unicode coding system subtype a coding system is, instead of the overshadowed 'type property. * dumped-lisp.el (preloaded-file-list): mule/european.el has been removed. * loadup.el (really-early-error-handler): Unicode tables loaded at dump time are now in mule/general-late.el. * simple.el (count-lines): Add some backslashes to to parentheses in docstrings to help fontification along. * simple.el (what-cursor-position): Wrap a line to fit in 80 characters. * unicode.el: Use the 'unicode-type property, not 'type, for setting the Unicode coding-system subtype. src/ChangeLog addition: 2006-12-21 Aidan Kehoe <kehoea@parhasard.net> * file-coding.c: Update the make-coding-system docstring to reflect unicode-type * general-slots.h: New symbol, unicode-type, since 'type was being overridden when accessing a coding system's Unicode subtype. * intl-win32.c: Backslash a few parentheses, to help fontification along. * intl-win32.c (complex_vars_of_intl_win32): Use the 'unicode-type symbol, not 'type, when creating the Microsoft Unicode coding system. * unicode.c (unicode_putprop): * unicode.c (unicode_getprop): * unicode.c (unicode_print): Using 'type as the property name when working out what Unicode subtype a given coding system is was broken, since there's a general coding system property called 'type. Change the former to use 'unicode-type instead.
author aidan
date Fri, 29 Dec 2006 18:09:51 +0000
parents 4c8ad140bcec
children aa28d959af41
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1 ;;; unicode.el --- Unicode support -*- coding: iso-2022-7bit; -*-
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2
778
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
3 ;; Copyright (C) 2001, 2002 Ben Wing.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
4
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
5 ;; Keywords: multilingual, Unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
6
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
7 ;; This file is part of XEmacs.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
8
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
9 ;; XEmacs is free software; you can redistribute it and/or modify it
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
10 ;; under the terms of the GNU General Public License as published by
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
11 ;; the Free Software Foundation; either version 2, or (at your option)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
12 ;; any later version.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
13
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
14 ;; XEmacs is distributed in the hope that it will be useful, but
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
15 ;; WITHOUT ANY WARRANTY; without even the implied warranty of
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
16 ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
17 ;; General Public License for more details.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
18
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
19 ;; You should have received a copy of the GNU General Public License
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
20 ;; along with XEmacs; see the file COPYING. If not, write to the Free
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
21 ;; Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
22 ;; 02111-1307, USA.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
23
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
24 ;;; Synched up with: Not in FSF.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
25
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
26 ;;; Commentary:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
27
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
28 ;; Lisp support for Unicode, e.g. initialize the translation tables.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
29
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
30 ;;; Code:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
31
3659
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3506
diff changeset
32 ;; GNU Emacs has the charsets:
778
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
33
3659
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3506
diff changeset
34 ;; mule-unicode-2500-33ff
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3506
diff changeset
35 ;; mule-unicode-e000-ffff
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3506
diff changeset
36 ;; mule-unicode-0100-24ff
778
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
37
3659
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3506
diff changeset
38 ;; built-in. This is hack--and an incomplete hack at that--against the
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3506
diff changeset
39 ;; spirit and the letter of standard ISO 2022 character sets. Instead of
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3506
diff changeset
40 ;; this, we have the jit-ucs-charset-N Mule character sets, created in
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3506
diff changeset
41 ;; unicode.c on encountering a Unicode code point that we don't recognise,
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3506
diff changeset
42 ;; and saved in ISO 2022 coding systems using the UTF-8 escape described in
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3506
diff changeset
43 ;; ISO-IR 196.
778
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
44
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2297
diff changeset
45 ;; accessed in loadup.el, mule-cmds.el; see discussion in unicode.c
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2297
diff changeset
46 (defvar load-unicode-tables-at-dump-time (eq system-type 'windows-nt)
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2297
diff changeset
47 "[INTERNAL] Whether to load the Unicode tables at dump time.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2297
diff changeset
48 Setting this at run-time does nothing.")
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2297
diff changeset
49
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
50 ;; NOTE: This takes only a fraction of a second on my Pentium III
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
51 ;; 700Mhz even with a totally optimization-disabled XEmacs.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
52 (defun load-unicode-tables ()
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
53 "Initialize the Unicode translation tables for all standard charsets."
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
54 (let ((parse-args
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
55 '(("unicode/unicode-consortium"
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
56 ;; Due to the braindamaged way Mule treats the ASCII and Control-1
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
57 ;; charsets' types, trying to load them results in out-of-range
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
58 ;; warnings at unicode.c:1439. They're no-ops anyway, they're
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
59 ;; hardwired in unicode.c (unicode_to_ichar, ichar_to_unicode).
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
60 ;; ("8859-1.TXT" ascii #x00 #x7F #x0)
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
61 ;; ("8859-1.TXT" control-1 #x80 #x9F #x-80)
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
62 ;; The 8859-1.TXT G1 assignments are half no-ops, hardwired in
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
63 ;; unicode.c ichar_to_unicode, but not in unicode_to_ichar.
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
64 ("8859-1.TXT" latin-iso8859-1 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
65 ;; "8859-10.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
66 ;; "8859-13.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
67 ("8859-14.TXT" latin-iso8859-14 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
68 ("8859-15.TXT" latin-iso8859-15 #xA0 #xFF #x-80)
2575
e71117a6ddac [xemacs-hg @ 2005-02-09 15:29:07 by aidan]
aidan
parents: 2574
diff changeset
69 ("8859-16.TXT" latin-iso8859-16 #xA0 #xFF #x-80)
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
70 ("8859-2.TXT" latin-iso8859-2 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
71 ("8859-3.TXT" latin-iso8859-3 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
72 ("8859-4.TXT" latin-iso8859-4 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
73 ("8859-5.TXT" cyrillic-iso8859-5 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
74 ("8859-6.TXT" arabic-iso8859-6 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
75 ("8859-7.TXT" greek-iso8859-7 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
76 ("8859-8.TXT" hebrew-iso8859-8 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
77 ("8859-9.TXT" latin-iso8859-9 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
78 ;; charset for Big5 does not matter; specifying `big5' will
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
79 ;; automatically make the right thing happen
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
80 ("BIG5.TXT" chinese-big5-1 nil nil nil big5)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
81 ("CNS11643.TXT" chinese-cns11643-1 #x10000 #x1FFFF #x-10000)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
82 ("CNS11643.TXT" chinese-cns11643-2 #x20000 #x2FFFF #x-20000)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
83 ;; "CP1250.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
84 ;; "CP1251.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
85 ;; "CP1252.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
86 ;; "CP1253.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
87 ;; "CP1254.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
88 ;; "CP1255.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
89 ;; "CP1256.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
90 ;; "CP1257.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
91 ;; "CP1258.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
92 ;; "CP874.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
93 ;; "CP932.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
94 ;; "CP936.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
95 ;; "CP949.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
96 ;; "CP950.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
97 ;; "GB12345.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
98 ("GB2312.TXT" chinese-gb2312)
2297
13a418960a88 [xemacs-hg @ 2004-09-22 02:05:42 by stephent]
stephent
parents: 1318
diff changeset
99 ;; "HANGUL.TXT"
13a418960a88 [xemacs-hg @ 2004-09-22 02:05:42 by stephent]
stephent
parents: 1318
diff changeset
100 ;; #### shouldn't JIS X 0201's upper limit be 7f?
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
101 ("JIS0201.TXT" latin-jisx0201 #x21 #x80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
102 ("JIS0201.TXT" katakana-jisx0201 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
103 ("JIS0208.TXT" japanese-jisx0208 nil nil nil ignore-first-column)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
104 ("JIS0212.TXT" japanese-jisx0212)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
105 ;; "JOHAB.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
106 ;; "KOI8-R.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
107 ;; "KSC5601.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
108 ;; note that KSC5601.TXT as currently distributed is NOT what
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
109 ;; it claims to be! see comments in KSX1001.TXT.
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
110 ("KSX1001.TXT" korean-ksc5601)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
111 ;; "OLD5601.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
112 ;; "SHIFTJIS.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
113 )
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
114 ("unicode/mule-ucs"
2297
13a418960a88 [xemacs-hg @ 2004-09-22 02:05:42 by stephent]
stephent
parents: 1318
diff changeset
115 ;; #### we don't support surrogates?!??
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
116 ;; use these instead of the above ones once we support surrogates
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
117 ;;("chinese-cns11643-1.txt" chinese-cns11643-1)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
118 ;;("chinese-cns11643-2.txt" chinese-cns11643-2)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
119 ;;("chinese-cns11643-3.txt" chinese-cns11643-3)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
120 ;;("chinese-cns11643-4.txt" chinese-cns11643-4)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
121 ;;("chinese-cns11643-5.txt" chinese-cns11643-5)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
122 ;;("chinese-cns11643-6.txt" chinese-cns11643-6)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
123 ;;("chinese-cns11643-7.txt" chinese-cns11643-7)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
124 ("chinese-sisheng.txt" chinese-sisheng)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
125 ("ethiopic.txt" ethiopic)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
126 ("indian-is13194.txt" indian-is13194)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
127 ("ipa.txt" ipa)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
128 ("thai-tis620.txt" thai-tis620)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
129 ("tibetan.txt" tibetan)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
130 ("vietnamese-viscii-lower.txt" vietnamese-viscii-lower)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
131 ("vietnamese-viscii-upper.txt" vietnamese-viscii-upper)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
132 )
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
133 ("unicode/other"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
134 ("lao.txt" lao)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
135 )
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
136 )))
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
137 (mapcar #'(lambda (tables)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
138 (let ((undir
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
139 (expand-file-name (car tables) data-directory)))
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
140 (mapcar #'(lambda (args)
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 985
diff changeset
141 (apply 'load-unicode-mapping-table
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
142 (expand-file-name (car args) undir)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
143 (cdr args)))
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
144 (cdr tables))))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
145 parse-args)))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
146
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
147 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
148 'utf-16 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
149 "UTF-16"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
150 '(mnemonic "UTF-16"
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
151 documentation
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
152 "UTF-16 Unicode encoding -- the standard (almost-) fixed-width
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
153 two-byte encoding, with surrogates. It will be fixed-width if all
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
154 characters are in the BMP (Basic Multilingual Plane -- first 65536
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
155 codepoints). Cannot represent characters with codepoints above
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
156 0x10FFFF (a little more than 1,000,000). Unicode and ISO guarantee
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
157 never to encode any characters outside this range -- all the rest are
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
158 for private, corporate or internal use."
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
159 unicode-type utf-16))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
160
2574
5e2653bc0ab0 [xemacs-hg @ 2005-02-08 23:59:50 by aidan]
aidan
parents: 2367
diff changeset
161 (define-coding-system-alias 'utf-16-be 'utf-16)
5e2653bc0ab0 [xemacs-hg @ 2005-02-08 23:59:50 by aidan]
aidan
parents: 2367
diff changeset
162
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
163 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
164 'utf-16-bom 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
165 "UTF-16 w/BOM"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
166 '(mnemonic "UTF16-BOM"
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
167 documentation
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
168 "UTF-16 Unicode encoding with byte order mark (BOM) at the beginning.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
169 The BOM is Unicode character U+FEFF -- i.e. the first two bytes are
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
170 0xFE and 0xFF, respectively, or reversed in a little-endian
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
171 representation. It has been sanctioned by the Unicode Consortium for
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
172 use at the beginning of a Unicode stream as a marker of the byte order
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
173 of the stream, and commonly appears in Unicode files under Microsoft
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
174 Windows, where it also functions as a magic cookie identifying a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
175 Unicode file. The character is called \"ZERO WIDTH NO-BREAK SPACE\"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
176 and is suitable as a byte-order marker because:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
177
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
178 -- it has no displayable representation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
179 -- due to its semantics it never normally appears at the beginning
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
180 of a stream
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
181 -- its reverse U+FFFE is not a legal Unicode character
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
182 -- neither byte sequence is at all likely in any other standard
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
183 encoding, particularly at the beginning of a stream
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
184
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
185 This coding system will insert a BOM at the beginning of a stream when
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
186 writing and strip it off when reading."
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
187 unicode-type utf-16
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
188 need-bom t))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
189
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
190 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
191 'utf-16-little-endian 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
192 "UTF-16 Little Endian"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
193 '(mnemonic "UTF16-LE"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
194 documentation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
195 "Little-endian version of UTF-16 Unicode encoding.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
196 See `utf-16' coding system."
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
197 unicode-type utf-16
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
198 little-endian t))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
199
2574
5e2653bc0ab0 [xemacs-hg @ 2005-02-08 23:59:50 by aidan]
aidan
parents: 2367
diff changeset
200 (define-coding-system-alias 'utf-16-le 'utf-16-little-endian)
5e2653bc0ab0 [xemacs-hg @ 2005-02-08 23:59:50 by aidan]
aidan
parents: 2367
diff changeset
201
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
202 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
203 'utf-16-little-endian-bom 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
204 "UTF-16 Little Endian w/BOM"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
205 '(mnemonic "MSW-Unicode"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
206 documentation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
207 "Little-endian version of UTF-16 Unicode encoding, with byte order mark.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
208 Standard encoding for representing Unicode under MS Windows. See
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
209 `utf-16-bom' coding system."
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
210 unicode-type utf-16
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
211 little-endian t
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
212 need-bom t))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
213
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
214 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
215 'ucs-4 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
216 "UCS-4"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
217 '(mnemonic "UCS4"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
218 documentation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
219 "UCS-4 Unicode encoding -- fully fixed-width four-byte encoding."
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
220 unicode-type ucs-4))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
221
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
222 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
223 'ucs-4-little-endian 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
224 "UCS-4 Little Endian"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
225 '(mnemonic "UCS4-LE"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
226 documentation
2297
13a418960a88 [xemacs-hg @ 2004-09-22 02:05:42 by stephent]
stephent
parents: 1318
diff changeset
227 ;; #### I don't think this is permitted by ISO 10646, only Unicode.
13a418960a88 [xemacs-hg @ 2004-09-22 02:05:42 by stephent]
stephent
parents: 1318
diff changeset
228 ;; Call it UTF-32 instead?
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
229 "Little-endian version of UCS-4 Unicode encoding. See `ucs-4' coding system."
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
230 unicode-type ucs-4
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
231 little-endian t))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
232
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
233 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
234 'utf-8 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
235 "UTF-8"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
236 '(mnemonic "UTF8"
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
237 documentation "
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
238 UTF-8 Unicode encoding -- ASCII-compatible 8-bit variable-width encoding
2297
13a418960a88 [xemacs-hg @ 2004-09-22 02:05:42 by stephent]
stephent
parents: 1318
diff changeset
239 sharing the following principles with the Mule-internal encoding:
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
240
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
241 -- All ASCII characters (codepoints 0 through 127) are represented
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
242 by themselves (i.e. using one byte, with the same value as the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
243 ASCII codepoint), and these bytes are disjoint from bytes
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
244 representing non-ASCII characters.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
245
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
246 This means that any 8-bit clean application can safely process
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
247 UTF-8-encoded text as it were ASCII, with no corruption (e.g. a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
248 '/' byte is always a slash character, never the second byte of
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
249 some other character, as with Big5, so a pathname encoded in
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
250 UTF-8 can safely be split up into components and reassembled
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
251 again using standard ASCII processes).
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
252
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
253 -- Leading bytes and non-leading bytes in the encoding of a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
254 character are disjoint, so moving backwards is easy.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
255
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
256 -- Given only the leading byte, you know how many following bytes
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
257 are present.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
258 "
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
259 unicode-type utf-8))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
260
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
261 (make-coding-system
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
262 'utf-8-bom 'unicode
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
263 "UTF-8 w/BOM"
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
264 '(mnemonic "MSW-UTF8"
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
265 documentation
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
266 "UTF-8 Unicode encoding, with byte order mark.
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
267 Standard encoding for representing UTF-8 under MS Windows."
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
268 unicode-type utf-8
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
269 little-endian t
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
270 need-bom t))
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
271
2633
b921e3d0ac3e [xemacs-hg @ 2005-03-04 21:59:42 by aidan]
aidan
parents: 2575
diff changeset
272 (defun decode-char (quote-ucs code &optional restriction)
3659
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3506
diff changeset
273 "FSF compatibility--return Mule character with Unicode codepoint CODE.
2633
b921e3d0ac3e [xemacs-hg @ 2005-03-04 21:59:42 by aidan]
aidan
parents: 2575
diff changeset
274 The second argument must be 'ucs, the third argument is ignored. "
3506
bcc2611d4cfc [xemacs-hg @ 2006-07-13 20:45:48 by aidan]
aidan
parents: 3439
diff changeset
275 (assert (eq quote-ucs 'ucs) t
2633
b921e3d0ac3e [xemacs-hg @ 2005-03-04 21:59:42 by aidan]
aidan
parents: 2575
diff changeset
276 "Sorry, decode-char doesn't yet support anything but the UCS. ")
b921e3d0ac3e [xemacs-hg @ 2005-03-04 21:59:42 by aidan]
aidan
parents: 2575
diff changeset
277 (unicode-to-char code))
b921e3d0ac3e [xemacs-hg @ 2005-03-04 21:59:42 by aidan]
aidan
parents: 2575
diff changeset
278
b921e3d0ac3e [xemacs-hg @ 2005-03-04 21:59:42 by aidan]
aidan
parents: 2575
diff changeset
279 (defun encode-char (char quote-ucs &optional restriction)
3659
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3506
diff changeset
280 "FSF compatibility--return the Unicode code point of CHAR.
2633
b921e3d0ac3e [xemacs-hg @ 2005-03-04 21:59:42 by aidan]
aidan
parents: 2575
diff changeset
281 The second argument must be 'ucs, the third argument is ignored. "
3506
bcc2611d4cfc [xemacs-hg @ 2006-07-13 20:45:48 by aidan]
aidan
parents: 3439
diff changeset
282 (assert (eq quote-ucs 'ucs) t
2633
b921e3d0ac3e [xemacs-hg @ 2005-03-04 21:59:42 by aidan]
aidan
parents: 2575
diff changeset
283 "Sorry, encode-char doesn't yet support anything but the UCS. ")
b921e3d0ac3e [xemacs-hg @ 2005-03-04 21:59:42 by aidan]
aidan
parents: 2575
diff changeset
284 (char-to-unicode char))
b921e3d0ac3e [xemacs-hg @ 2005-03-04 21:59:42 by aidan]
aidan
parents: 2575
diff changeset
285
3667
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
286 (when (featurep 'mule)
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
287 ;; This CCL program is used for displaying the fallback UCS character set,
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
288 ;; and can be repurposed to lao and the IPA, all going well.
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
289 ;;
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
290 ;; define-ccl-program is available after mule-ccl is loaded, much later
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
291 ;; than this file in the build process. The below is the result of
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
292 ;;
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
293 ;; (macroexpand
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
294 ;; '(define-ccl-program ccl-encode-to-ucs-2
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
295 ;; `(1
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
296 ;; ((r1 = (r1 << 8))
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
297 ;; (r1 = (r1 | r2))
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
298 ;; (mule-to-unicode r0 r1)
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
299 ;; (r1 = (r0 >> 8))
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
300 ;; (r2 = (r0 & 255))))
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
301 ;; "CCL program to transform Mule characters to UCS-2."))
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
302 ;;
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
303 ;; and it should occasionally be confirmed that the correspondence still
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
304 ;; holds.
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
305
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
306 (let ((prog [1 10 131127 8 98872 65823 147513 8 82009 255 22]))
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
307 (defconst ccl-encode-to-ucs-2 prog
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
308 "CCL program to transform Mule characters to UCS-2.")
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
309 (put (quote ccl-encode-to-ucs-2) (quote ccl-program-idx)
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
310 (register-ccl-program (quote ccl-encode-to-ucs-2) prog)) nil))
4c8ad140bcec [xemacs-hg @ 2006-11-07 18:51:21 by aidan]
aidan
parents: 3666
diff changeset
311
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
312 ;; #### UTF-7 is not yet implemented, and it's tricky to do. There's
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
313 ;; an implementation in appendix A.1 of the Unicode Standard, Version
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
314 ;; 2.0, but I don't know its licensing characteristics.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
315
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
316 ; (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
317 ; 'utf-7 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
318 ; "UTF-7"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
319 ; '(mnemonic "UTF7"
3659
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3506
diff changeset
320 ; documentation; "UTF-7 Unicode encoding -- 7-bit-ASCII modal Internet-mail-compatible
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
321 ; encoding especially designed for headers, with the following
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
322 ; properties:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
323
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
324 ; -- Only characters that are considered safe for passing through any mail
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
325 ; gateway without damage are used.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
326
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
327 ; -- This is a modal encoding, with two states. The first, default
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
328 ; state encodes the most common Unicode characters (upper and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
329 ; lowercase letters, digits, and 9 common punctuation marks) as
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
330 ; themselves, and the second state, entered using '+' and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
331 ; terminated with '-' or any character disallowed in state 2,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
332 ; encodes any Unicode characters by first converting to UTF-16,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
333 ; most significant byte first, and then to a slightly modified
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
334 ; Base64 encoding. (Thus, UTF-7 has the same limitations on the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
335 ; characters it can encode as UTF-16.)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
336
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
337 ; -- The modified Base64 encoding deviates from standard Base64 in
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
338 ; that it omits the `=' pad character. This is eliminated so as to
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
339 ; avoid conflicts with the use of `=' as an escape in the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
340 ; Quoted-Printable encoding and the related Q encoding for headers:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
341 ; With this modification, non-whitespace chars in UTF-7 will be
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
342 ; represented in Quoted-Printable and in Q as-is, with no further
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
343 ; encoding.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
344
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
345 ; For more information, see Appendix A.1 of The Unicode Standard 2.0, or
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
346 ; wherever it is in v3.0."
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3667
diff changeset
347 ; unicode-type utf-7))