annotate lisp/unicode.el @ 2417:8b907450718f

[xemacs-hg @ 2004-12-05 08:48:12 by ben] The section on Troubleshooting (now 2.3) has been completely written and includes a lot of stuff that is not properly documented anywhere else. A fair amount of obsolete info has been deleted and I've incorporated the comments that people (mostly Stephen T) made. Former chapter 3 has been split up in two, one pertaining to basic I/O and the other to external I/O. What were formerly chapters 5 and 6 no longer exist as such; the info in them has been distributed across various other chapters. Old chapter 4 got split up, part going to the new chapter 4 on external I/O and part going to the new chapter 5 on the Internet. In this new chapter, stuff not pertaining to a specific package (e.g. VM or GNUS) was taken out of package-specific sections and a general mail section was constituted. Part of old chapter 5 remains in a new chapter 6 devoted to Emacs Lisp and other advanced stuff, and a section from old chapter 3 on basic init-file Lisp and some stuff from old chapter 5 on Info. The rest of chapter 5 was just misc and has gotten scattered to the winds (mostly in chapters 3 and 4). Old chapter 6 has also gotten quite scattered; there is no longer any section specifically devoted to Windows except one of the Installation sections (along with a section specfically devoted to Unix), and the rest has moved to join the appropriate non-Windows-specific section elsewhere. A lot of chapters had their sections rearranged and likewise for sections having entries rearranged, with the intention that the new arrangement should be more natural. In general I hope that stuff should be much easier to locate. I also rewrote the entries on the relation between XEmacs and GNU Emacs on the authors of XEmacs, including lots of info on who wrote specific subsections. However, this history is certainly not complete; I hope people will look over this and fix it up as necessary.
author ben
date Sun, 05 Dec 2004 08:48:12 +0000
parents ecf1ebac70d8
children 5e2653bc0ab0
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1 ;;; unicode.el --- Unicode support -*- coding: iso-2022-7bit; -*-
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2
778
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
3 ;; Copyright (C) 2001, 2002 Ben Wing.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
4
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
5 ;; Keywords: multilingual, Unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
6
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
7 ;; This file is part of XEmacs.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
8
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
9 ;; XEmacs is free software; you can redistribute it and/or modify it
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
10 ;; under the terms of the GNU General Public License as published by
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
11 ;; the Free Software Foundation; either version 2, or (at your option)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
12 ;; any later version.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
13
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
14 ;; XEmacs is distributed in the hope that it will be useful, but
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
15 ;; WITHOUT ANY WARRANTY; without even the implied warranty of
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
16 ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
17 ;; General Public License for more details.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
18
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
19 ;; You should have received a copy of the GNU General Public License
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
20 ;; along with XEmacs; see the file COPYING. If not, write to the Free
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
21 ;; Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
22 ;; 02111-1307, USA.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
23
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
24 ;;; Synched up with: Not in FSF.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
25
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
26 ;;; Commentary:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
27
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
28 ;; Lisp support for Unicode, e.g. initialize the translation tables.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
29
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
30 ;;; Code:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
31
778
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
32 ; ;; Subsets of Unicode.
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
33
2297
13a418960a88 [xemacs-hg @ 2004-09-22 02:05:42 by stephent]
stephent
parents: 1318
diff changeset
34 ; #### what is this bogosity ... "chars 96, final ?2" !!?!
778
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
35 ; (make-charset 'mule-unicode-2500-33ff
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
36 ; "Unicode characters of the range U+2500..U+33FF."
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
37 ; '(dimension
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
38 ; 2
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
39 ; registry "ISO10646-1"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
40 ; chars 96
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
41 ; columns 1
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
42 ; direction l2r
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
43 ; final ?2
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
44 ; graphic 0
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
45 ; short-name "Unicode subset 2"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
46 ; long-name "Unicode subset (U+2500..U+33FF)"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
47 ; ))
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
48
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
49
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
50 ; (make-charset 'mule-unicode-e000-ffff
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
51 ; "Unicode characters of the range U+E000..U+FFFF."
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
52 ; '(dimension
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
53 ; 2
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
54 ; registry "ISO10646-1"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
55 ; chars 96
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
56 ; columns 1
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
57 ; direction l2r
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
58 ; final ?3
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
59 ; graphic 0
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
60 ; short-name "Unicode subset 3"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
61 ; long-name "Unicode subset (U+E000+FFFF)"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
62 ; ))
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
63
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
64
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
65 ; (make-charset 'mule-unicode-0100-24ff
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
66 ; "Unicode characters of the range U+0100..U+24FF."
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
67 ; '(dimension
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
68 ; 2
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
69 ; registry "ISO10646-1"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
70 ; chars 96
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
71 ; columns 1
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
72 ; direction l2r
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
73 ; final ?1
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
74 ; graphic 0
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
75 ; short-name "Unicode subset"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
76 ; long-name "Unicode subset (U+0100..U+24FF)"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
77 ; ))
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
78
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
79
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2297
diff changeset
80 ;; accessed in loadup.el, mule-cmds.el; see discussion in unicode.c
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2297
diff changeset
81 (defvar load-unicode-tables-at-dump-time (eq system-type 'windows-nt)
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2297
diff changeset
82 "[INTERNAL] Whether to load the Unicode tables at dump time.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2297
diff changeset
83 Setting this at run-time does nothing.")
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2297
diff changeset
84
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
85 ;; NOTE: This takes only a fraction of a second on my Pentium III
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
86 ;; 700Mhz even with a totally optimization-disabled XEmacs.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
87 (defun load-unicode-tables ()
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
88 "Initialize the Unicode translation tables for all standard charsets."
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
89 (let ((parse-args
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
90 '(("unicode/unicode-consortium"
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
91 ;; Due to the braindamaged way Mule treats the ASCII and Control-1
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
92 ;; charsets' types, trying to load them results in out-of-range
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
93 ;; warnings at unicode.c:1439. They're no-ops anyway, they're
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
94 ;; hardwired in unicode.c (unicode_to_ichar, ichar_to_unicode).
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
95 ;; ("8859-1.TXT" ascii #x00 #x7F #x0)
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
96 ;; ("8859-1.TXT" control-1 #x80 #x9F #x-80)
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
97 ;; The 8859-1.TXT G1 assignments are half no-ops, hardwired in
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
98 ;; unicode.c ichar_to_unicode, but not in unicode_to_ichar.
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
99 ("8859-1.TXT" latin-iso8859-1 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
100 ;; "8859-10.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
101 ;; "8859-13.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
102 ("8859-14.TXT" latin-iso8859-14 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
103 ("8859-15.TXT" latin-iso8859-15 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
104 ("8859-2.TXT" latin-iso8859-2 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
105 ("8859-3.TXT" latin-iso8859-3 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
106 ("8859-4.TXT" latin-iso8859-4 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
107 ("8859-5.TXT" cyrillic-iso8859-5 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
108 ("8859-6.TXT" arabic-iso8859-6 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
109 ("8859-7.TXT" greek-iso8859-7 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
110 ("8859-8.TXT" hebrew-iso8859-8 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
111 ("8859-9.TXT" latin-iso8859-9 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
112 ;; charset for Big5 does not matter; specifying `big5' will
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
113 ;; automatically make the right thing happen
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
114 ("BIG5.TXT" chinese-big5-1 nil nil nil big5)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
115 ("CNS11643.TXT" chinese-cns11643-1 #x10000 #x1FFFF #x-10000)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
116 ("CNS11643.TXT" chinese-cns11643-2 #x20000 #x2FFFF #x-20000)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
117 ;; "CP1250.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
118 ;; "CP1251.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
119 ;; "CP1252.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
120 ;; "CP1253.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
121 ;; "CP1254.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
122 ;; "CP1255.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
123 ;; "CP1256.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
124 ;; "CP1257.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
125 ;; "CP1258.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
126 ;; "CP874.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
127 ;; "CP932.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
128 ;; "CP936.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
129 ;; "CP949.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
130 ;; "CP950.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
131 ;; "GB12345.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
132 ("GB2312.TXT" chinese-gb2312)
2297
13a418960a88 [xemacs-hg @ 2004-09-22 02:05:42 by stephent]
stephent
parents: 1318
diff changeset
133 ;; "HANGUL.TXT"
13a418960a88 [xemacs-hg @ 2004-09-22 02:05:42 by stephent]
stephent
parents: 1318
diff changeset
134 ;; #### shouldn't JIS X 0201's upper limit be 7f?
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
135 ("JIS0201.TXT" latin-jisx0201 #x21 #x80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
136 ("JIS0201.TXT" katakana-jisx0201 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
137 ("JIS0208.TXT" japanese-jisx0208 nil nil nil ignore-first-column)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
138 ("JIS0212.TXT" japanese-jisx0212)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
139 ;; "JOHAB.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
140 ;; "KOI8-R.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
141 ;; "KSC5601.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
142 ;; note that KSC5601.TXT as currently distributed is NOT what
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
143 ;; it claims to be! see comments in KSX1001.TXT.
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
144 ("KSX1001.TXT" korean-ksc5601)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
145 ;; "OLD5601.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
146 ;; "SHIFTJIS.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
147 )
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
148 ("unicode/mule-ucs"
2297
13a418960a88 [xemacs-hg @ 2004-09-22 02:05:42 by stephent]
stephent
parents: 1318
diff changeset
149 ;; #### we don't support surrogates?!??
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
150 ;; use these instead of the above ones once we support surrogates
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
151 ;;("chinese-cns11643-1.txt" chinese-cns11643-1)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
152 ;;("chinese-cns11643-2.txt" chinese-cns11643-2)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
153 ;;("chinese-cns11643-3.txt" chinese-cns11643-3)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
154 ;;("chinese-cns11643-4.txt" chinese-cns11643-4)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
155 ;;("chinese-cns11643-5.txt" chinese-cns11643-5)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
156 ;;("chinese-cns11643-6.txt" chinese-cns11643-6)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
157 ;;("chinese-cns11643-7.txt" chinese-cns11643-7)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
158 ("chinese-sisheng.txt" chinese-sisheng)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
159 ("ethiopic.txt" ethiopic)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
160 ("indian-is13194.txt" indian-is13194)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
161 ("ipa.txt" ipa)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
162 ("thai-tis620.txt" thai-tis620)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
163 ("tibetan.txt" tibetan)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
164 ("vietnamese-viscii-lower.txt" vietnamese-viscii-lower)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
165 ("vietnamese-viscii-upper.txt" vietnamese-viscii-upper)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
166 )
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
167 ("unicode/other"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
168 ("lao.txt" lao)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
169 )
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
170 )))
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
171 (mapcar #'(lambda (tables)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
172 (let ((undir
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
173 (expand-file-name (car tables) data-directory)))
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
174 (mapcar #'(lambda (args)
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 985
diff changeset
175 (apply 'load-unicode-mapping-table
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
176 (expand-file-name (car args) undir)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
177 (cdr args)))
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
178 (cdr tables))))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
179 parse-args)))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
180
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
181 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
182 'utf-16 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
183 "UTF-16"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
184 '(mnemonic "UTF-16"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
185 documentation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
186 "UTF-16 Unicode encoding -- the standard (almost-) fixed-width
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
187 two-byte encoding, with surrogates. It will be fixed-width if all
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
188 characters are in the BMP (Basic Multilingual Plane -- first 65536
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
189 codepoints). Cannot represent characters with codepoints above
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
190 0x10FFFF (a little more than 1,000,000). Unicode and ISO guarantee
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
191 never to encode any characters outside this range -- all the rest are
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
192 for private, corporate or internal use."
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
193 type utf-16))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
194
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
195 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
196 'utf-16-bom 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
197 "UTF-16 w/BOM"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
198 '(mnemonic "UTF16-BOM"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
199 documentation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
200 "UTF-16 Unicode encoding with byte order mark (BOM) at the beginning.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
201 The BOM is Unicode character U+FEFF -- i.e. the first two bytes are
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
202 0xFE and 0xFF, respectively, or reversed in a little-endian
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
203 representation. It has been sanctioned by the Unicode Consortium for
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
204 use at the beginning of a Unicode stream as a marker of the byte order
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
205 of the stream, and commonly appears in Unicode files under Microsoft
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
206 Windows, where it also functions as a magic cookie identifying a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
207 Unicode file. The character is called \"ZERO WIDTH NO-BREAK SPACE\"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
208 and is suitable as a byte-order marker because:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
209
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
210 -- it has no displayable representation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
211 -- due to its semantics it never normally appears at the beginning
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
212 of a stream
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
213 -- its reverse U+FFFE is not a legal Unicode character
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
214 -- neither byte sequence is at all likely in any other standard
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
215 encoding, particularly at the beginning of a stream
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
216
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
217 This coding system will insert a BOM at the beginning of a stream when
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
218 writing and strip it off when reading."
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
219 type utf-16
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
220 need-bom t))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
221
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
222 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
223 'utf-16-little-endian 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
224 "UTF-16 Little Endian"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
225 '(mnemonic "UTF16-LE"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
226 documentation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
227 "Little-endian version of UTF-16 Unicode encoding.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
228 See `utf-16' coding system."
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
229 type utf-16
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
230 little-endian t))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
231
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
232 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
233 'utf-16-little-endian-bom 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
234 "UTF-16 Little Endian w/BOM"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
235 '(mnemonic "MSW-Unicode"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
236 documentation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
237 "Little-endian version of UTF-16 Unicode encoding, with byte order mark.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
238 Standard encoding for representing Unicode under MS Windows. See
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
239 `utf-16-bom' coding system."
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
240 type utf-16
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
241 little-endian t
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
242 need-bom t))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
243
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
244 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
245 'ucs-4 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
246 "UCS-4"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
247 '(mnemonic "UCS4"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
248 documentation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
249 "UCS-4 Unicode encoding -- fully fixed-width four-byte encoding."
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
250 type ucs-4))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
251
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
252 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
253 'ucs-4-little-endian 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
254 "UCS-4 Little Endian"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
255 '(mnemonic "UCS4-LE"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
256 documentation
2297
13a418960a88 [xemacs-hg @ 2004-09-22 02:05:42 by stephent]
stephent
parents: 1318
diff changeset
257 ;; #### I don't think this is permitted by ISO 10646, only Unicode.
13a418960a88 [xemacs-hg @ 2004-09-22 02:05:42 by stephent]
stephent
parents: 1318
diff changeset
258 ;; Call it UTF-32 instead?
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
259 "Little-endian version of UCS-4 Unicode encoding. See `ucs-4' coding system."
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
260 type ucs-4
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
261 little-endian t))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
262
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
263 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
264 'utf-8 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
265 "UTF-8"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
266 '(mnemonic "UTF8"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
267 documentation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
268 "UTF-8 Unicode encoding -- ASCII-compatible 8-bit variable-width encoding
2297
13a418960a88 [xemacs-hg @ 2004-09-22 02:05:42 by stephent]
stephent
parents: 1318
diff changeset
269 sharing the following principles with the Mule-internal encoding:
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
270
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
271 -- All ASCII characters (codepoints 0 through 127) are represented
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
272 by themselves (i.e. using one byte, with the same value as the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
273 ASCII codepoint), and these bytes are disjoint from bytes
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
274 representing non-ASCII characters.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
275
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
276 This means that any 8-bit clean application can safely process
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
277 UTF-8-encoded text as it were ASCII, with no corruption (e.g. a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
278 '/' byte is always a slash character, never the second byte of
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
279 some other character, as with Big5, so a pathname encoded in
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
280 UTF-8 can safely be split up into components and reassembled
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
281 again using standard ASCII processes).
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
282
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
283 -- Leading bytes and non-leading bytes in the encoding of a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
284 character are disjoint, so moving backwards is easy.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
285
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
286 -- Given only the leading byte, you know how many following bytes
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
287 are present.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
288 "
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
289 type utf-8))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
290
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
291 (make-coding-system
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
292 'utf-8-bom 'unicode
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
293 "UTF-8 w/BOM"
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
294 '(mnemonic "MSW-UTF8"
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
295 documentation
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
296 "UTF-8 Unicode encoding, with byte order mark.
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
297 Standard encoding for representing UTF-8 under MS Windows."
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
298 type utf-8
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
299 little-endian t
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
300 need-bom t))
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
301
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
302 ;; #### UTF-7 is not yet implemented, and it's tricky to do. There's
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
303 ;; an implementation in appendix A.1 of the Unicode Standard, Version
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
304 ;; 2.0, but I don't know its licensing characteristics.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
305
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
306 ; (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
307 ; 'utf-7 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
308 ; "UTF-7"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
309 ; '(mnemonic "UTF7"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
310 ; documentation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
311 ; "UTF-7 Unicode encoding -- 7-bit-ASCII modal Internet-mail-compatible
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
312 ; encoding especially designed for headers, with the following
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
313 ; properties:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
314
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
315 ; -- Only characters that are considered safe for passing through any mail
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
316 ; gateway without damage are used.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
317
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
318 ; -- This is a modal encoding, with two states. The first, default
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
319 ; state encodes the most common Unicode characters (upper and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
320 ; lowercase letters, digits, and 9 common punctuation marks) as
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
321 ; themselves, and the second state, entered using '+' and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
322 ; terminated with '-' or any character disallowed in state 2,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
323 ; encodes any Unicode characters by first converting to UTF-16,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
324 ; most significant byte first, and then to a slightly modified
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
325 ; Base64 encoding. (Thus, UTF-7 has the same limitations on the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
326 ; characters it can encode as UTF-16.)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
327
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
328 ; -- The modified Base64 encoding deviates from standard Base64 in
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
329 ; that it omits the `=' pad character. This is eliminated so as to
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
330 ; avoid conflicts with the use of `=' as an escape in the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
331 ; Quoted-Printable encoding and the related Q encoding for headers:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
332 ; With this modification, non-whitespace chars in UTF-7 will be
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
333 ; represented in Quoted-Printable and in Q as-is, with no further
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
334 ; encoding.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
335
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
336 ; For more information, see Appendix A.1 of The Unicode Standard 2.0, or
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
337 ; wherever it is in v3.0."
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
338 ; type utf-7))