annotate lisp/unicode.el @ 1318:b531bf8658e9

[xemacs-hg @ 2003-02-21 06:56:46 by ben] redisplay fixes et al.

PROBLEMS: Add comment about Cygwin, unexec and sysmalloc. Move some non-general stuff out of general. Make a section for x86.
configure.in: Add check for broken alloca in funcalls.
mule/mule-cmds.el: Alias file-name to native not vice-versa. Do set EOL of native but not of process output to fix various problems and be consistent with code-init.el.
code-cmds.el: Return a name not a coding system.
code-init.el: Reindent. Remove `file-name' since it should always be the same as native.
unicode.el: Rename to load-unicode-mapping-table as suggested by the anonymous (but rather Turnbullian) comment in unicode.c.
xemacs.dsp: Add /k to default build.
alloc.c: Make gc_currently_forbidden static.
config.h.in, lisp.h: Move some stuff to lisp.h.
console-gtk.h, console-impl.h, console-msw.h, console-x.h, event-Xt.c, event-msw.c, redisplay-gtk.c, redisplay-msw.c, redisplay-output.c, redisplay-x.c, gtk-xemacs.c: Remove duplicated code to redraw exposed area. Add deadbox method needed by the generalized redraw code. Defer redrawing if already in redisplay.
frame-msw.c, event-stream.c, frame.c: Add comments about calling Lisp.
debug.c, general-slots.h: Move generalish symbols to general-slots.h.
doprnt.c: reindent.
lisp.h, dynarr.c: Add debug code for locking a dynarr to catch invalid mods. Use in redisplay.c.
eval.c: file-coding.c: Define file-name as alias for native not vice-versa.
frame-gtk.c, frame-x.c: Move Qwindow_id to general-slots.
dialog-msw.c, glyphs-gtk.c, glyphs-msw.c, glyphs-widget.c, glyphs-x.c, gui.c, gui.h, menubar-msw.c, menubar.c: Ensure that various glyph functions that eval within redisplay protect the evals. Same for calls to internal_equal(). Modify various functions, e.g. gui_item_*(), to protect evals within redisplay, taking an in_redisplay parameter if it's possible for them to be called both inside and outside of redisplay.
gutter.c: Defer specifier-changed updating till after redisplay, if necessary, since we need to enter redisplay to do it.
gutter.c: Do nothing if in redisplay.
lisp.h: Add version of alloca() for use in function calls.
lisp.h: Add XCAD[D+]R up to 6 D's, and aliases X1ST, X2ND, etc.
frame.c, frame.h, redisplay.c, redisplay.h, signal.c, toolbar.c: Redo critical-section code and move from frame.c to redisplay.c. Require that every place inside of redisplay catch errors itself, not at the edge of the critical section (thereby bypassing the rest of redisplay and leaving things in an inconsistent state). Introduce separate means of holding frame-size changes without entering a complete critical section. Introduce "post-redisplay" methods for deferring things till after redisplay. Abort if we enter redisplay reentrantly. Disable all quit checking in redisplay since it's too dangerous. Ensure that all calls to QUIT trigger an abort if unprotected.
redisplay.c, scrollbar-gtk.c, scrollbar-x.c, scrollbar.c: Create enter/exit_redisplay_critical_section_maybe() for code that needs to ensure it's in a critical section but doesn't interfere with an existing critical section.
sysdep.c: Use _wexecve() when under Windows NT for Unicode correctness.
text.c, text.h: Add new_dfc() functions, which return an alloca()ed value rather than requiring an lvalue. (Not really used yet; used in another workspace, to come.) Add some macros for SIZED_EXTERNAL. Update the encoding aliases after involved scrutinization of the X manual.
unicode.c: Answer the anonymous but suspiciously Turnbullian questions. Rename parse-unicode-translation-table to load-unicode-mapping-table, as suggested.
author ben
date Fri, 21 Feb 2003 06:57:21 +0000
parents 7f62a956b825
children 13a418960a88
rev   line source
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1 ;;; unicode.el --- Unicode support -*- coding: iso-2022-7bit; -*-
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2
778
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
3 ;; Copyright (C) 2001, 2002 Ben Wing.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
4
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
5 ;; Keywords: multilingual, Unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
6
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
7 ;; This file is part of XEmacs.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
8
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
9 ;; XEmacs is free software; you can redistribute it and/or modify it
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
10 ;; under the terms of the GNU General Public License as published by
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
11 ;; the Free Software Foundation; either version 2, or (at your option)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
12 ;; any later version.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
13
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
14 ;; XEmacs is distributed in the hope that it will be useful, but
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
15 ;; WITHOUT ANY WARRANTY; without even the implied warranty of
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
16 ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
17 ;; General Public License for more details.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
18
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
19 ;; You should have received a copy of the GNU General Public License
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
20 ;; along with XEmacs; see the file COPYING. If not, write to the Free
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
21 ;; Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
22 ;; 02111-1307, USA.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
23
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
24 ;;; Synched up with: Not in FSF.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
25
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
26 ;;; Commentary:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
27
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
28 ;; Lisp support for Unicode, e.g. initialize the translation tables.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
29
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
30 ;;; Code:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
31
778
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
32 ; ;; Subsets of Unicode.
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
33
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
34 ; (make-charset 'mule-unicode-2500-33ff
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
35 ; "Unicode characters of the range U+2500..U+33FF."
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
36 ; '(dimension
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
37 ; 2
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
38 ; registry "ISO10646-1"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
39 ; chars 96
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
40 ; columns 1
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
41 ; direction l2r
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
42 ; final ?2
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
43 ; graphic 0
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
44 ; short-name "Unicode subset 2"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
45 ; long-name "Unicode subset (U+2500..U+33FF)"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
46 ; ))
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
47
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
48
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
49 ; (make-charset 'mule-unicode-e000-ffff
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
50 ; "Unicode characters of the range U+E000..U+FFFF."
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
51 ; '(dimension
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
52 ; 2
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
53 ; registry "ISO10646-1"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
54 ; chars 96
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
55 ; columns 1
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
56 ; direction l2r
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
57 ; final ?3
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
58 ; graphic 0
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
59 ; short-name "Unicode subset 3"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
60 ; long-name "Unicode subset (U+E000..U+FFFF)"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
61 ; ))
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
62
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
63
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
64 ; (make-charset 'mule-unicode-0100-24ff
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
65 ; "Unicode characters of the range U+0100..U+24FF."
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
66 ; '(dimension
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
67 ; 2
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
68 ; registry "ISO10646-1"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
69 ; chars 96
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
70 ; columns 1
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
71 ; direction l2r
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
72 ; final ?1
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
73 ; graphic 0
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
74 ; short-name "Unicode subset"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
75 ; long-name "Unicode subset (U+0100..U+24FF)"
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
76 ; ))
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
77
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
78
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
79 ;; NOTE: This takes only a fraction of a second on my Pentium III
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
80 ;; 700MHz even with a totally optimization-disabled XEmacs.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
81 (defun load-unicode-tables ()
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
82 "Initialize the Unicode translation tables for all standard charsets."
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
83 (let ((parse-args
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
84 '(("unicode/unicode-consortium"
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
85 ;; Due to the braindamaged way Mule treats the ASCII and Control-1
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
86 ;; charsets' types, trying to load them results in out-of-range
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
87 ;; warnings at unicode.c:1439. They're no-ops anyway, they're
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
88 ;; hardwired in unicode.c (unicode_to_ichar, ichar_to_unicode).
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
89 ;; ("8859-1.TXT" ascii #x00 #x7F #x0)
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
90 ;; ("8859-1.TXT" control-1 #x80 #x9F #x-80)
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
91 ;; The 8859-1.TXT G1 assignments are half no-ops, hardwired in
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 780
diff changeset
92 ;; unicode.c ichar_to_unicode, but not in unicode_to_ichar.
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
93 ("8859-1.TXT" latin-iso8859-1 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
94 ;; "8859-10.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
95 ;; "8859-13.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
96 ("8859-14.TXT" latin-iso8859-14 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
97 ("8859-15.TXT" latin-iso8859-15 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
98 ("8859-2.TXT" latin-iso8859-2 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
99 ("8859-3.TXT" latin-iso8859-3 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
100 ("8859-4.TXT" latin-iso8859-4 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
101 ("8859-5.TXT" cyrillic-iso8859-5 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
102 ("8859-6.TXT" arabic-iso8859-6 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
103 ("8859-7.TXT" greek-iso8859-7 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
104 ("8859-8.TXT" hebrew-iso8859-8 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
105 ("8859-9.TXT" latin-iso8859-9 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
106 ;; charset for Big5 does not matter; specifying `big5' will
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
107 ;; automatically make the right thing happen
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
108 ("BIG5.TXT" chinese-big5-1 nil nil nil big5)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
109 ("CNS11643.TXT" chinese-cns11643-1 #x10000 #x1FFFF #x-10000)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
110 ("CNS11643.TXT" chinese-cns11643-2 #x20000 #x2FFFF #x-20000)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
111 ;; "CP1250.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
112 ;; "CP1251.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
113 ;; "CP1252.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
114 ;; "CP1253.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
115 ;; "CP1254.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
116 ;; "CP1255.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
117 ;; "CP1256.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
118 ;; "CP1257.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
119 ;; "CP1258.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
120 ;; "CP874.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
121 ;; "CP932.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
122 ;; "CP936.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
123 ;; "CP949.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
124 ;; "CP950.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
125 ;; "GB12345.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
126 ("GB2312.TXT" chinese-gb2312)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
127 ;; "HANGUL.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
128 ("JIS0201.TXT" latin-jisx0201 #x21 #x80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
129 ("JIS0201.TXT" katakana-jisx0201 #xA0 #xFF #x-80)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
130 ("JIS0208.TXT" japanese-jisx0208 nil nil nil ignore-first-column)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
131 ("JIS0212.TXT" japanese-jisx0212)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
132 ;; "JOHAB.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
133 ;; "KOI8-R.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
134 ;; "KSC5601.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
135 ;; note that KSC5601.TXT as currently distributed is NOT what
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
136 ;; it claims to be! see comments in KSX1001.TXT.
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
137 ("KSX1001.TXT" korean-ksc5601)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
138 ;; "OLD5601.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
139 ;; "SHIFTJIS.TXT"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
140 )
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
141 ("unicode/mule-ucs"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
142 ;; use these instead of the above ones once we support surrogates
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
143 ;;("chinese-cns11643-1.txt" chinese-cns11643-1)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
144 ;;("chinese-cns11643-2.txt" chinese-cns11643-2)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
145 ;;("chinese-cns11643-3.txt" chinese-cns11643-3)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
146 ;;("chinese-cns11643-4.txt" chinese-cns11643-4)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
147 ;;("chinese-cns11643-5.txt" chinese-cns11643-5)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
148 ;;("chinese-cns11643-6.txt" chinese-cns11643-6)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
149 ;;("chinese-cns11643-7.txt" chinese-cns11643-7)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
150 ("chinese-sisheng.txt" chinese-sisheng)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
151 ("ethiopic.txt" ethiopic)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
152 ("indian-is13194.txt" indian-is13194)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
153 ("ipa.txt" ipa)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
154 ("thai-tis620.txt" thai-tis620)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
155 ("tibetan.txt" tibetan)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
156 ("vietnamese-viscii-lower.txt" vietnamese-viscii-lower)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
157 ("vietnamese-viscii-upper.txt" vietnamese-viscii-upper)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
158 )
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
159 ("unicode/other"
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
160 ("lao.txt" lao)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
161 )
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
162 )))
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
163 (mapcar #'(lambda (tables)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
164 (let ((undir
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
165 (expand-file-name (car tables) data-directory)))
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
166 (mapcar #'(lambda (args)
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 985
diff changeset
167 (apply 'load-unicode-mapping-table
780
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
168 (expand-file-name (car args) undir)
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
169 (cdr args)))
578cb2932d72 [xemacs-hg @ 2002-03-18 10:07:30 by ben]
ben
parents: 778
diff changeset
170 (cdr tables))))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
171 parse-args)))
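The nested `mapcar' calls in `load-unicode-tables' above can be sketched as a pure expansion step, separated from the actual loading so the result can be inspected without touching any tables. This is only an illustration: the name `my-unicode-table-calls' and its DATA-DIR parameter are hypothetical and not part of unicode.el. Each returned list has the shape that the function above hands to `load-unicode-mapping-table' through `apply', namely the expanded file name followed by the remaining elements of the table entry.

```elisp
;; Hypothetical helper, not part of unicode.el.
;; PARSE-ARGS has the same shape as the `parse-args' list above:
;;   ((SUBDIR (FILE CHARSET [START END OFFSET FLAGS]) ...) ...)
(defun my-unicode-table-calls (parse-args data-dir)
  "Return the argument lists implied by PARSE-ARGS, relative to DATA-DIR.
Each element is (ABSOLUTE-FILE CHARSET ...), in source order."
  (let (calls)
    (dolist (tables parse-args)
      ;; The first element names the subdirectory holding the tables.
      (let ((undir (expand-file-name (car tables) data-dir)))
        (dolist (args (cdr tables))
          ;; Expand the file name; keep the other arguments untouched.
          (push (cons (expand-file-name (car args) undir) (cdr args))
                calls))))
    (nreverse calls)))
```

Under these assumptions, passing each returned list to `apply' together with `load-unicode-mapping-table' would mirror what the function above does, and the intermediate list makes it easy to check which files a given entry resolves to.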
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
172
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
173 (defun init-unicode-at-startup ()
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
174 (load-unicode-tables))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
175
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
176 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
177 'utf-16 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
178 "UTF-16"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
179 '(mnemonic "UTF-16"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
180 documentation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
181 "UTF-16 Unicode encoding -- the standard (almost-) fixed-width
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
182 two-byte encoding, with surrogates. It will be fixed-width if all
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
183 characters are in the BMP (Basic Multilingual Plane -- first 65536
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
184 codepoints). Cannot represent characters with codepoints above
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
185 0x10FFFF (a little more than 1,000,000). Unicode and ISO guarantee
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
186 never to encode any characters outside this range -- all the rest are
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
187 for private, corporate or internal use."
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
188 type utf-16))
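The "almost fixed-width" behavior described in the documentation string above can be illustrated with a small sketch that splits a code point into UTF-16 code units. The function name `my-utf-16-code-units' is hypothetical and this is not how the `utf-16' coding system is implemented (that work happens in C); the arithmetic is the standard surrogate-pair calculation.

```elisp
;; Hypothetical illustration, not part of unicode.el.
(defun my-utf-16-code-units (codepoint)
  "Return the list of 16-bit UTF-16 code units for CODEPOINT.
Return nil if CODEPOINT is negative, above #x10FFFF, or itself in the
surrogate range #xD800..#xDFFF."
  (cond ((or (< codepoint 0) (> codepoint #x10FFFF)) nil)
        ((and (>= codepoint #xD800) (<= codepoint #xDFFF)) nil)
        ;; BMP characters fit in a single two-byte unit.
        ((< codepoint #x10000) (list codepoint))
        ;; Everything else becomes a high/low surrogate pair,
        ;; each carrying 10 bits of the offset from #x10000.
        (t (let ((v (- codepoint #x10000)))
             (list (+ #xD800 (/ v #x400))
                   (+ #xDC00 (% v #x400)))))))
```

A text containing only BMP characters therefore yields exactly one unit per character, which is the fixed-width case the documentation string mentions.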
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
189
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
190 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
191 'utf-16-bom 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
192 "UTF-16 w/BOM"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
193 '(mnemonic "UTF16-BOM"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
194 documentation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
195 "UTF-16 Unicode encoding with byte order mark (BOM) at the beginning.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
196 The BOM is Unicode character U+FEFF -- i.e. the first two bytes are
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
197 0xFE and 0xFF, respectively, or reversed in a little-endian
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
198 representation. It has been sanctioned by the Unicode Consortium for
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
199 use at the beginning of a Unicode stream as a marker of the byte order
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
200 of the stream, and commonly appears in Unicode files under Microsoft
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
201 Windows, where it also functions as a magic cookie identifying a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
202 Unicode file. The character is called \"ZERO WIDTH NO-BREAK SPACE\"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
203 and is suitable as a byte-order marker because:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
204
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
205 -- it has no displayable representation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
206 -- due to its semantics it never normally appears at the beginning
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
207 of a stream
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
208 -- its reverse U+FFFE is not a legal Unicode character
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
209 -- neither byte sequence is at all likely in any other standard
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
210 encoding, particularly at the beginning of a stream
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
211
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
212 This coding system will insert a BOM at the beginning of a stream when
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
213 writing and strip it off when reading."
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
214 type utf-16
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
215 need-bom t))
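The byte-order check that the documentation string above describes can be sketched as a comparison on the first two bytes of a stream. The function name `my-utf-16-bom-byte-order' is hypothetical; it only illustrates the idea and is not the detection code the coding system itself uses.

```elisp
;; Hypothetical illustration, not part of unicode.el.
(defun my-utf-16-bom-byte-order (byte1 byte2)
  "Classify the first two bytes of a stream as a UTF-16 BOM.
Return `big-endian' for #xFE #xFF, `little-endian' for #xFF #xFE,
and nil when the bytes are not a byte order mark."
  (cond ((and (= byte1 #xFE) (= byte2 #xFF)) 'big-endian)
        ((and (= byte1 #xFF) (= byte2 #xFE)) 'little-endian)
        (t nil)))
```

Because U+FFFE is not a legal Unicode character, a stream that starts with #xFF #xFE can be read as a byte-swapped BOM rather than as real text, which is what makes the two-way test above unambiguous.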
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
216
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
217 (make-coding-system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
218 'utf-16-little-endian 'unicode
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
219 "UTF-16 Little Endian"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
220 '(mnemonic "UTF16-LE"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
221 documentation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
222 "Little-endian version of UTF-16 Unicode encoding.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
223 See `utf-16' coding system."
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
224 type utf-16
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
225 little-endian t))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
226
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
(make-coding-system
 'utf-16-little-endian-bom 'unicode
 "UTF-16 Little Endian w/BOM"
 '(mnemonic "MSW-Unicode"
   documentation
   "Little-endian version of UTF-16 Unicode encoding, with byte order mark.
Standard encoding for representing Unicode under MS Windows. See
`utf-16-bom' coding system."
   type utf-16
   little-endian t
   need-bom t))

(make-coding-system
 'ucs-4 'unicode
 "UCS-4"
 '(mnemonic "UCS4"
   documentation
   "UCS-4 Unicode encoding -- fully fixed-width four-byte encoding."
   type ucs-4))

(make-coding-system
 'ucs-4-little-endian 'unicode
 "UCS-4 Little Endian"
 '(mnemonic "UCS4-LE"
   documentation
   "Little-endian version of UCS-4 Unicode encoding. See `ucs-4' coding system."
   type ucs-4
   little-endian t))

(make-coding-system
 'utf-8 'unicode
 "UTF-8"
 '(mnemonic "UTF8"
   documentation
   "UTF-8 Unicode encoding -- ASCII-compatible 8-bit variable-width encoding
with the same principles as the Mule-internal encoding:

  -- All ASCII characters (codepoints 0 through 127) are represented
     by themselves (i.e. using one byte, with the same value as the
     ASCII codepoint), and these bytes are disjoint from bytes
     representing non-ASCII characters.

     This means that any 8-bit clean application can safely process
     UTF-8-encoded text as if it were ASCII, with no corruption (e.g. a
     '/' byte is always a slash character, never the second byte of
     some other character, as with Big5, so a pathname encoded in
     UTF-8 can safely be split up into components and reassembled
     again using standard ASCII processes).

  -- Leading bytes and non-leading bytes in the encoding of a
     character are disjoint, so moving backwards is easy.

  -- Given only the leading byte, you know how many following bytes
     are present.
"
   type utf-8))

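;; A minimal sketch, not part of the original unicode.el, of how the
;; three properties listed in the `utf-8' docstring above can be used
;; on raw bytes.  The function names are hypothetical and exist only
;; for illustration.  Bytes are plain integers 0..255.  Only the
;; structural bit patterns of the 1- to 4-byte forms are examined;
;; this does not validate UTF-8 (overlong forms and out-of-range
;; values are not rejected).

(defun utf-8-sketch-continuation-byte-p (byte)
  "Return non-nil if BYTE is a UTF-8 non-leading byte (binary 10xxxxxx)."
  (and (>= byte 128) (< byte 192)))

(defun utf-8-sketch-sequence-length (byte)
  "Return the length of the UTF-8 sequence that starts with leading BYTE.
Return nil if BYTE cannot start a sequence."
  (cond ((< byte 128) 1)                ; 0xxxxxxx: ASCII, stands for itself
        ((< byte 192) nil)              ; 10xxxxxx: non-leading byte
        ((< byte 224) 2)                ; 110xxxxx
        ((< byte 240) 3)                ; 1110xxxx
        ((< byte 248) 4)                ; 11110xxx
        (t nil)))

(defun utf-8-sketch-sequence-start (bytes index)
  "Return the index of the leading byte of the sequence covering INDEX.
BYTES is a vector of integers assumed to hold well-formed UTF-8."
  (while (and (> index 0)
              (utf-8-sketch-continuation-byte-p (aref bytes index)))
    (setq index (1- index)))
  index)

;; For example, U+00E9 is encoded as the bytes 195 169 and '/' as 47:
;;   (utf-8-sketch-sequence-length 195)              => 2
;;   (utf-8-sketch-sequence-start [47 195 169 47] 2) => 1
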
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 877
diff changeset
(make-coding-system
 'utf-8-bom 'unicode
 "UTF-8 w/BOM"
 '(mnemonic "MSW-UTF8"
   documentation
   "UTF-8 Unicode encoding, with byte order mark.
Standard encoding for representing UTF-8 under MS Windows."
   type utf-8
   little-endian t
   need-bom t))

771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
;; #### UTF-7 is not yet implemented, and it's tricky to do. There's
;; an implementation in appendix A.1 of the Unicode Standard, Version
;; 2.0, but I don't know its licensing characteristics.

; (make-coding-system
;  'utf-7 'unicode
;  "UTF-7"
;  '(mnemonic "UTF7"
;    documentation
;    "UTF-7 Unicode encoding -- 7-bit-ASCII modal Internet-mail-compatible
; encoding especially designed for headers, with the following
; properties:

;   -- Only characters that are considered safe for passing through any mail
;      gateway without damage are used.

;   -- This is a modal encoding, with two states. The first, default
;      state encodes the most common Unicode characters (upper and
;      lowercase letters, digits, and 9 common punctuation marks) as
;      themselves, and the second state, entered using '+' and
;      terminated with '-' or any character disallowed in state 2,
;      encodes any Unicode characters by first converting to UTF-16,
;      most significant byte first, and then to a slightly modified
;      Base64 encoding. (Thus, UTF-7 has the same limitations on the
;      characters it can encode as UTF-16.)

;   -- The modified Base64 encoding deviates from standard Base64 in
;      that it omits the `=' pad character. This is eliminated so as to
;      avoid conflicts with the use of `=' as an escape in the
;      Quoted-Printable encoding and the related Q encoding for headers:
;      With this modification, non-whitespace chars in UTF-7 will be
;      represented in Quoted-Printable and in Q as-is, with no further
;      encoding.

; For more information, see Appendix A.1 of The Unicode Standard 2.0, or
; wherever it is in v3.0."
;    type utf-7))