annotate src/unicode.c @ 1318:b531bf8658e9

[xemacs-hg @ 2003-02-21 06:56:46 by ben] redisplay fixes et al. PROBLEMS: Add comment about Cygwin, unexec and sysmalloc. Move some non-general stuff out of general. Make a section for x86. configure.in: Add check for broken alloca in funcalls. mule/mule-cmds.el: Alias file-name to native not vice-versa. Do set EOL of native but not of process output to fix various problems and be consistent with code-init.el. code-cmds.el: Return a name not a coding system. code-init.el: Reindent. Remove `file-name' since it should always be the same as native. unicode.el: Rename to load-unicode-mapping-table as suggested by the anonymous (but rather Turnbullian) comment in unicode.c. xemacs.dsp: Add /k to default build. alloc.c: Make gc_currently_forbidden static. config.h.in, lisp.h: Move some stuff to lisp.h. console-gtk.h, console-impl.h, console-msw.h, console-x.h, event-Xt.c, event-msw.c, redisplay-gtk.c, redisplay-msw.c, redisplay-output.c, redisplay-x.c, gtk-xemacs.c: Remove duplicated code to redraw exposed area. Add deadbox method needed by the generalized redraw code. Defer redrawing if already in redisplay. frame-msw.c, event-stream.c, frame.c: Add comments about calling Lisp. debug.c, general-slots.h: Move generalish symbols to general-slots.h. doprnt.c: reindent. lisp.h, dynarr.c: Add debug code for locking a dynarr to catch invalid mods. Use in redisplay.c. eval.c: file-coding.c: Define file-name as alias for native not vice-versa. frame-gtk.c, frame-x.c: Move Qwindow_id to general-slots. dialog-msw.c, glyphs-gtk.c, glyphs-msw.c, glyphs-widget.c, glyphs-x.c, gui.c, gui.h, menubar-msw.c, menubar.c: Ensure that various glyph functions that eval within redisplay protect the evals. Same for calls to internal_equal(). Modify various functions, e.g. gui_item_*(), to protect evals within redisplay, taking an in_redisplay parameter if it's possible for them to be called both inside and outside of redisplay. gutter.c: Defer specifier-changed updating till after redisplay, if necessary, since we need to enter redisplay to do it. gutter.c: Do nothing if in redisplay. lisp.h: Add version of alloca() for use in function calls. lisp.h: Add XCAD[D+]R up to 6 D's, and aliases X1ST, X2ND, etc. frame.c, frame.h, redisplay.c, redisplay.h, signal.c, toolbar.c: Redo critical-section code and move from frame.c to redisplay.c. Require that every place inside of redisplay catch errors itself, not at the edge of the critical section (thereby bypassing the rest of redisplay and leaving things in an inconsistent state). Introduce separate means of holding frame-size changes without entering a complete critical section. Introduce "post-redisplay" methods for deferring things till after redisplay. Abort if we enter redisplay reentrantly. Disable all quit checking in redisplay since it's too dangerous. Ensure that all calls to QUIT trigger an abort if unprotected. redisplay.c, scrollbar-gtk.c, scrollbar-x.c, scrollbar.c: Create enter/exit_redisplay_critical_section_maybe() for code that needs to ensure it's in a critical section but doesn't interfere with an existing critical section. sysdep.c: Use _wexecve() when under Windows NT for Unicode correctness. text.c, text.h: Add new_dfc() functions, which return an alloca()ed value rather than requiring an lvalue. (Not really used yet; used in another workspace, to come.) Add some macros for SIZED_EXTERNAL. Update the encoding aliases after involved scrutinization of the X manual. unicode.c: Answer the anonymous but suspiciously Turnbullian questions. Rename parse-unicode-translation-table to load-unicode-mapping-table, as suggested.
author ben
date Fri, 21 Feb 2003 06:57:21 +0000
parents 70921960b980
children 969b7290edca
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1 /* Code to handle Unicode conversion.
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2 Copyright (C) 2000, 2001, 2002, 2003 Ben Wing.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
4 This file is part of XEmacs.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
5
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
6 XEmacs is free software; you can redistribute it and/or modify it
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
7 under the terms of the GNU General Public License as published by the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
8 Free Software Foundation; either version 2, or (at your option) any
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
9 later version.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
10
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
11 XEmacs is distributed in the hope that it will be useful, but WITHOUT
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
12 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
13 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
14 for more details.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
15
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
16 You should have received a copy of the GNU General Public License
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
17 along with XEmacs; see the file COPYING. If not, write to
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
18 the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
19 Boston, MA 02111-1307, USA. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
20
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
21 /* Synched up with: FSF 20.3. Not in FSF. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
22
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
23 /* Authorship:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
24
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
25 Current primary author: Ben Wing <ben@xemacs.org>
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
26
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
27 Written by Ben Wing <ben@xemacs.org>, June, 2001.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
28 Separated out into this file, August, 2001.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
29 Includes Unicode coding systems, some parts of which have been written
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
30 by someone else. #### Morioka and Hayashi, I think.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
31
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
32 As of September 2001, the detection code is here and abstraction of the
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
33 detection system is finished. The unicode detectors have been rewritten
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
34 to include multiple levels of likelihood.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
35 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
36
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
37 #include <config.h>
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
38 #include "lisp.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
39
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
40 #include "charset.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
41 #include "file-coding.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
42 #include "opaque.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
43
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
44 #include "sysfile.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
45
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
46 /* #### WARNING! The current sledgehammer routines have a fundamental
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
47 problem in that they can't handle two characters mapping to a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
48 single Unicode codepoint or vice-versa in a single charset table.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
49 It's not clear there is any way to handle this and still make the
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
50 sledgehammer routines useful.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
51
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
52 Inquiring Minds Want To Know Dept: does the above WARNING mean that
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
53 _if_ it happens, then it will signal error, or then it will do
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
54 something evil and unpredictable? Signaling an error is OK: for
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
55 all national standards, the national to Unicode map is an inclusion
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
56 (1-to-1). Any character set that does not behave that way is
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
57 broken according to the Unicode standard.
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
58
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
59 Answer: You will get an abort(), since the purpose of the sledgehammer
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
60 routines is self-checking. The above problem with non-1-to-1 mapping
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
61 occurs in the Big5 tables, as provided by the Unicode Consortium. */
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
62
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
63 /* #define SLEDGEHAMMER_CHECK_UNICODE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
64
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
65 /* We currently use the following format for tables:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
66
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
67 If dimension == 1, to_unicode_table is a 96-element array of ints
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
68 (Unicode code points); else, it's a 96-element array of int *
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
69 pointers, each of which points to a 96-element array of ints. If no
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
70 elements in a row have been filled in, the pointer will point to a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
71 default empty table; that way, memory usage is more reasonable but
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
72 lookup still fast.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
73
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
74 -- If from_unicode_levels == 1, from_unicode_table is a 256-element
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
75 array of shorts (octet 1 in high byte, octet 2 in low byte; we don't
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
76 store Ichars directly to save space).
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
77
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
78 -- If from_unicode_levels == 2, from_unicode_table is a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
79 256-element array of short * pointers, each of which points to a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
80 256-element array of shorts.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
81
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
82 -- If from_unicode_levels == 3, from_unicode_table is a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
83 256-element array of short ** pointers, each of which points to
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
84 a 256-element array of short * pointers, each of which points to
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
85 a 256-element array of shorts.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
86
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
87 -- If from_unicode_levels == 4, same thing but one level deeper.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
88
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
89 Just as for to_unicode_table, we use default tables to fill in
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
90 all entries with no values in them.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
91
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
92 #### An obvious space-saving optimization is to use variable-sized
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
93 tables, where each table instead of just being a 256-element array,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
94 is a structure with a start value, an end value, and a variable
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
95 number of entries (END - START + 1). Only 8 bits are needed for
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
96 END and START, and could be stored at the end to avoid alignment
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
97 problems. However, before charging off and implementing this,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
98 we need to consider whether it's worth it:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
99
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
100 (1) Most tables will be highly localized in which code points are
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
101 defined, heavily reducing the possible memory waste. Before
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
102 doing any rewriting, write some code to see how much memory is
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
103 actually being wasted (i.e. ratio of empty entries to total # of
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
104 entries) and only start rewriting if it's unacceptably high. You
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
105 have to check over all charsets.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
106
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
107 (2) Since entries are usually added one at a time, you have to be
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
108 very careful when creating the tables to avoid realloc()/free()
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
109 thrashing in the common case when you are in an area of high
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
110 localization and are going to end up using most entries in the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
111 table. You'd certainly want to allow only certain sizes, not
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
112 arbitrary ones (probably powers of 2, where you want the entire
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
113 block including the START/END values to fit into a power of 2,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
114 minus any malloc overhead if there is any -- there's none under
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
115 gmalloc.c, and probably most system malloc() functions are quite
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
116 smart nowadays and also have no overhead). You could optimize
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
117 somewhat during the in-C initializations, because you can compute
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
118 the actual usage of various tables by scanning the entries you're
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
119 going to add in a separate pass before adding them. (You could
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
120 actually do the same thing when entries are added on the Lisp
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
121 level by making the assumption that all the entries will come in
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
122 one after another before any use is made of the data. So as
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
123 they're coming in, you just store them in a big long list, and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
124 the first time you need to retrieve an entry, you compute the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
125 whole table at once.) You'd still have to deal with the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
126 possibility of later entries coming in, though.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
127
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
128 (3) You do lose some speed using START/END values, since you need
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
129 a couple of comparisons at each level. This could easily make
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
130 each single lookup become 3-4 times slower. The Unicode book
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
131 considers this a big issue, and recommends against variable-sized
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
132 tables for this reason; however, they almost certainly have in
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
133 mind applications that primarily involve conversion of large
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
134 amounts of data. Most Unicode strings that are translated in
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
135 XEmacs are fairly small. The only place where this might matter
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
136 is in loading large files -- e.g. a 3-megabyte Unicode-encoded
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
137 file. So think about this, and maybe do a trial implementation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
138 where you don't worry too much about the intricacies of (2) and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
139 just implement some basic "multiply by 1.5" trick or something to
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
140 do the resizing. There is a very good FAQ on Unicode called
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
141 something like the Linux-Unicode How-To (it should be part of the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
142 Linux How-To's, I think), that lists the url of a guy with a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
143 whole bunch of unicode files you can use to stress-test your
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
144 implementations, and he's highly likely to have a good
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
145 multi-megabyte Unicode-encoded file (with normal text in it -- if
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
146 you created your own just by creating repeated strings of letters
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
147 and numbers, you probably wouldn't get accurate results).
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
148 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
149
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
150 /* When MULE is not defined, we may still need some Unicode support --
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
151 in particular, some Windows API's always want Unicode, and the way
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
152 we've set up the Unicode encapsulation, we may as well go ahead and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
153 always use the Unicode versions of split API's. (It would be
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
154 trickier to not use them, and pointless -- under NT, the ANSI API's
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
155 call the Unicode ones anyway, so in the case of structures, we'd be
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
156 converting from Unicode to ANSI structures, only to have the OS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
157 convert them back.) */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
158
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
159 Lisp_Object Qunicode;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
160 Lisp_Object Qutf_16, Qutf_8, Qucs_4, Qutf_7;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
161 Lisp_Object Qneed_bom;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
162
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
163 Lisp_Object Qutf_16_little_endian, Qutf_16_bom;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
164 Lisp_Object Qutf_16_little_endian_bom;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
165
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
166 Lisp_Object Qutf_8_bom;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
167
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
168 #ifdef MULE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
169
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
170 /* #### Using ints for to_unicode is OK (as long as they are >= 32 bits).
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
171 However, shouldn't the shorts below be unsigned?
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
172
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
173 Answer: Doesn't matter because the values being converted to are only
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
174 96x96. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
175 static int *to_unicode_blank_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
176 static int **to_unicode_blank_2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
177
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
178 static short *from_unicode_blank_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
179 static short **from_unicode_blank_2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
180 static short ***from_unicode_blank_3;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
181 static short ****from_unicode_blank_4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
182
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
183 static const struct memory_description to_unicode_level_0_desc_1[] = {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
184 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
185 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
186
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
187 static const struct sized_memory_description to_unicode_level_0_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
188 sizeof (int), to_unicode_level_0_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
189 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
190
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
191 static const struct memory_description to_unicode_level_1_desc_1[] = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
192 { XD_STRUCT_PTR, 0, 96, &to_unicode_level_0_desc },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
193 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
194 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
195
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
196 static const struct sized_memory_description to_unicode_level_1_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
197 sizeof (void *), to_unicode_level_1_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
198 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
199
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
200 static const struct memory_description to_unicode_description_1[] = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
201 { XD_STRUCT_PTR, 1, 96, &to_unicode_level_0_desc },
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
202 { XD_STRUCT_PTR, 2, 96, &to_unicode_level_1_desc },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
203 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
204 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
205
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
206 /* Not static because each charset has a set of to and from tables and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
207 needs to describe them to pdump. */
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
208 const struct sized_memory_description to_unicode_description = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
209 sizeof (void *), to_unicode_description_1
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
210 };
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
211
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
212 static const struct memory_description from_unicode_level_0_desc_1[] = {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
213 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
214 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
215
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
216 static const struct sized_memory_description from_unicode_level_0_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
217 sizeof (short), from_unicode_level_0_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
218 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
219
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
220 static const struct memory_description from_unicode_level_1_desc_1[] = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
221 { XD_STRUCT_PTR, 0, 256, &from_unicode_level_0_desc },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
222 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
223 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
224
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
225 static const struct sized_memory_description from_unicode_level_1_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
226 sizeof (void *), from_unicode_level_1_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
227 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
228
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
229 static const struct memory_description from_unicode_level_2_desc_1[] = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
230 { XD_STRUCT_PTR, 0, 256, &from_unicode_level_1_desc },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
231 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
232 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
233
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
234 static const struct sized_memory_description from_unicode_level_2_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
235 sizeof (void *), from_unicode_level_2_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
236 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
237
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
238 static const struct memory_description from_unicode_level_3_desc_1[] = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
239 { XD_STRUCT_PTR, 0, 256, &from_unicode_level_2_desc },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
240 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
241 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
242
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
243 static const struct sized_memory_description from_unicode_level_3_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
244 sizeof (void *), from_unicode_level_3_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
245 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
246
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
247 static const struct memory_description from_unicode_description_1[] = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
248 { XD_STRUCT_PTR, 1, 256, &from_unicode_level_0_desc },
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
249 { XD_STRUCT_PTR, 2, 256, &from_unicode_level_1_desc },
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
250 { XD_STRUCT_PTR, 3, 256, &from_unicode_level_2_desc },
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
251 { XD_STRUCT_PTR, 4, 256, &from_unicode_level_3_desc },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
252 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
253 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
254
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
255 /* Not static because each charset has a set of to and from tables and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
256 needs to describe them to pdump. */
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
257 const struct sized_memory_description from_unicode_description = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
258 sizeof (void *), from_unicode_description_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
259 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
260
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
261 static Lisp_Object_dynarr *unicode_precedence_dynarr;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
262
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
263 static const struct memory_description lod_description_1[] = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
264 XD_DYNARR_DESC (Lisp_Object_dynarr, &lisp_object_description),
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
265 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
266 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
267
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
268 static const struct sized_memory_description lisp_object_dynarr_description = {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
269 sizeof (Lisp_Object_dynarr),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
270 lod_description_1
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
271 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
272
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
273 Lisp_Object Vlanguage_unicode_precedence_list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
274 Lisp_Object Vdefault_unicode_precedence_list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
275
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
276 Lisp_Object Qignore_first_column;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
277
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
278
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
279 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
280 /* Unicode implementation */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
281 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
282
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
283 #define BREAKUP_UNICODE_CODE(val, u1, u2, u3, u4, levels) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
284 do { \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
285 int buc_val = (val); \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
286 \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
287 (u1) = buc_val >> 24; \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
288 (u2) = (buc_val >> 16) & 255; \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
289 (u3) = (buc_val >> 8) & 255; \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
290 (u4) = buc_val & 255; \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
291 (levels) = (buc_val <= 0xFF ? 1 : \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
292 buc_val <= 0xFFFF ? 2 : \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
293 buc_val <= 0xFFFFFF ? 3 : \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
294 4); \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
295 } while (0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
296
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
297 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
298 init_blank_unicode_tables (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
299 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
300 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
301
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
302 from_unicode_blank_1 = xnew_array (short, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
303 from_unicode_blank_2 = xnew_array (short *, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
304 from_unicode_blank_3 = xnew_array (short **, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
305 from_unicode_blank_4 = xnew_array (short ***, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
306 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
307 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
308 /* #### IMWTK: Why does using -1 here work? Simply because there are
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
309 no existing 96x96 charsets?
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
310
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
311 Answer: I don't understand the concern. -1 indicates there is no
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
312 entry for this particular codepoint, which is always the case for
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
313 blank tables. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
314 from_unicode_blank_1[i] = (short) -1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
315 from_unicode_blank_2[i] = from_unicode_blank_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
316 from_unicode_blank_3[i] = from_unicode_blank_2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
317 from_unicode_blank_4[i] = from_unicode_blank_3;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
318 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
319
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
320 to_unicode_blank_1 = xnew_array (int, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
321 to_unicode_blank_2 = xnew_array (int *, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
322 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
323 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
324 /* Here -1 is guaranteed OK. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
325 to_unicode_blank_1[i] = -1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
326 to_unicode_blank_2[i] = to_unicode_blank_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
327 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
328 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
329
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
330 static void *
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
331 create_new_from_unicode_table (int level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
332 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
333 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
334 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
335 /* WARNING: If you are thinking of compressing these, keep in
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
336 mind that sizeof (short) does not equal sizeof (short *). */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
337 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
338 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
339 short *newtab = xnew_array (short, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
340 memcpy (newtab, from_unicode_blank_1, 256 * sizeof (short));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
341 return newtab;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
342 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
343 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
344 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
345 short **newtab = xnew_array (short *, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
346 memcpy (newtab, from_unicode_blank_2, 256 * sizeof (short *));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
347 return newtab;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
348 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
349 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
350 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
351 short ***newtab = xnew_array (short **, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
352 memcpy (newtab, from_unicode_blank_3, 256 * sizeof (short **));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
353 return newtab;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
354 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
355 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
356 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
357 short ****newtab = xnew_array (short ***, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
358 memcpy (newtab, from_unicode_blank_4, 256 * sizeof (short ***));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
359 return newtab;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
360 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
361 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
362 abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
363 return 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
364 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
365 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
366
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
367 /* Allocate and blank the tables.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
368 Loading them up is done by load-unicode-mapping-table. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
369 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
370 init_charset_unicode_tables (Lisp_Object charset)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
371 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
372 if (XCHARSET_DIMENSION (charset) == 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
373 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
374 int *to_table = xnew_array (int, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
375 memcpy (to_table, to_unicode_blank_1, 96 * sizeof (int));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
376 XCHARSET_TO_UNICODE_TABLE (charset) = to_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
377 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
378 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
379 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
380 int **to_table = xnew_array (int *, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
381 memcpy (to_table, to_unicode_blank_2, 96 * sizeof (int *));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
382 XCHARSET_TO_UNICODE_TABLE (charset) = to_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
383 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
384
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
385 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
386 XCHARSET_FROM_UNICODE_TABLE (charset) = create_new_from_unicode_table (1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
387 XCHARSET_FROM_UNICODE_LEVELS (charset) = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
388 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
389 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
390
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
391 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
392 free_from_unicode_table (void *table, int level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
393 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
394 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
395
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
396 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
397 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
398 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
399 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
400 short **tab = (short **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
401 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
402 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
403 if (tab[i] != from_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
404 free_from_unicode_table (tab[i], 1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
405 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
406 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
407 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
408 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
409 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
410 short ***tab = (short ***) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
411 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
412 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
413 if (tab[i] != from_unicode_blank_2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
414 free_from_unicode_table (tab[i], 2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
415 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
416 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
417 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
418 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
419 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
420 short ****tab = (short ****) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
421 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
422 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
423 if (tab[i] != from_unicode_blank_3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
424 free_from_unicode_table (tab[i], 3);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
425 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
426 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
427 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
428 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
429
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
430 xfree (table);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
431 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
432
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
433 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
434 free_to_unicode_table (void *table, int level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
435 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
436 if (level == 2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
437 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
438 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
439 int **tab = (int **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
440
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
441 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
442 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
443 if (tab[i] != to_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
444 free_to_unicode_table (tab[i], 1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
445 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
446 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
447
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
448 xfree (table);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
449 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
450
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
451 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
452 free_charset_unicode_tables (Lisp_Object charset)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
453 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
454 free_to_unicode_table (XCHARSET_TO_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
455 XCHARSET_DIMENSION (charset));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
456 free_from_unicode_table (XCHARSET_FROM_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
457 XCHARSET_FROM_UNICODE_LEVELS (charset));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
458 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
459
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
460 #ifdef MEMORY_USAGE_STATS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
461
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
462 static Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
463 compute_from_unicode_table_size_1 (void *table, int level,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
464 struct overhead_stats *stats)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
465 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
466 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
467 Bytecount size = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
468
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
469 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
470 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
471 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
472 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
473 short **tab = (short **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
474 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
475 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
476 if (tab[i] != from_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
477 size += compute_from_unicode_table_size_1 (tab[i], 1, stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
478 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
479 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
480 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
481 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
482 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
483 short ***tab = (short ***) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
484 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
485 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
486 if (tab[i] != from_unicode_blank_2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
487 size += compute_from_unicode_table_size_1 (tab[i], 2, stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
488 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
489 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
490 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
491 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
492 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
493 short ****tab = (short ****) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
494 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
495 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
496 if (tab[i] != from_unicode_blank_3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
497 size += compute_from_unicode_table_size_1 (tab[i], 3, stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
498 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
499 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
500 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
501 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
502
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
503 size += malloced_storage_size (table,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
504 256 * (level == 1 ? sizeof (short) :
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
505 sizeof (void *)),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
506 stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
507 return size;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
508 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
509
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
510 static Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
511 compute_to_unicode_table_size_1 (void *table, int level,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
512 struct overhead_stats *stats)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
513 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
514 Bytecount size = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
515
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
516 if (level == 2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
517 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
518 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
519 int **tab = (int **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
520
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
521 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
522 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
523 if (tab[i] != to_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
524 size += compute_to_unicode_table_size_1 (tab[i], 1, stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
525 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
526 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
527
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
528 size += malloced_storage_size (table,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
529 96 * (level == 1 ? sizeof (int) :
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
530 sizeof (void *)),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
531 stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
532 return size;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
533 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
534
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
535 Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
536 compute_from_unicode_table_size (Lisp_Object charset,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
537 struct overhead_stats *stats)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
538 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
539 return (compute_from_unicode_table_size_1
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
540 (XCHARSET_FROM_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
541 XCHARSET_FROM_UNICODE_LEVELS (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
542 stats));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
543 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
544
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
545 Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
546 compute_to_unicode_table_size (Lisp_Object charset,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
547 struct overhead_stats *stats)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
548 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
549 return (compute_to_unicode_table_size_1
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
550 (XCHARSET_TO_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
551 XCHARSET_DIMENSION (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
552 stats));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
553 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
554
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
555 #endif
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
556
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
557 #ifdef SLEDGEHAMMER_CHECK_UNICODE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
558
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
559 /* "Sledgehammer checks" are checks that verify the self-consistency
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
560 of an entire structure every time a change is about to be made or
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
561 has been made to the structure. Not fast but a pretty much
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
562 sure-fire way of flushing out any incorrectnesses in the algorithms
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
563 that create the structure.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
564
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
565 Checking only after a change has been made will speed things up by
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
566 a factor of 2, but it doesn't absolutely prove that the code just
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
567 checked caused the problem; perhaps it happened elsewhere, either
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
568 in some code you forgot to sledgehammer check or as a result of
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
569 data corruption. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
570
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
571 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
572 assert_not_any_blank_table (void *tab)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
573 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
574 assert (tab != from_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
575 assert (tab != from_unicode_blank_2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
576 assert (tab != from_unicode_blank_3);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
577 assert (tab != from_unicode_blank_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
578 assert (tab != to_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
579 assert (tab != to_unicode_blank_2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
580 assert (tab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
581 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
582
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
583 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
584 sledgehammer_check_from_table (Lisp_Object charset, void *table, int level,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
585 int codetop)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
586 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
587 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
588
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
589 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
590 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
591 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
592 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
593 short *tab = (short *) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
594 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
595 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
596 if (tab[i] != -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
597 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
598 Lisp_Object char_charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
599 int c1, c2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
600
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
601 assert (valid_ichar_p (tab[i]));
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
602 BREAKUP_ICHAR (tab[i], char_charset, c1, c2);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
603 assert (EQ (charset, char_charset));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
604 if (XCHARSET_DIMENSION (charset) == 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
605 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
606 int *to_table =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
607 (int *) XCHARSET_TO_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
608 assert_not_any_blank_table (to_table);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
609 assert (to_table[c1 - 32] == (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
610 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
611 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
612 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
613 int **to_table =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
614 (int **) XCHARSET_TO_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
615 assert_not_any_blank_table (to_table);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
616 assert_not_any_blank_table (to_table[c1 - 32]);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
617 assert (to_table[c1 - 32][c2 - 32] == (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
618 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
619 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
620 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
621 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
622 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
623 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
624 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
625 short **tab = (short **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
626 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
627 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
628 if (tab[i] != from_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
629 sledgehammer_check_from_table (charset, tab[i], 1,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
630 (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
631 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
632 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
633 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
634 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
635 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
636 short ***tab = (short ***) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
637 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
638 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
639 if (tab[i] != from_unicode_blank_2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
640 sledgehammer_check_from_table (charset, tab[i], 2,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
641 (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
642 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
643 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
644 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
645 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
646 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
647 short ****tab = (short ****) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
648 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
649 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
650 if (tab[i] != from_unicode_blank_3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
651 sledgehammer_check_from_table (charset, tab[i], 3,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
652 (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
653 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
654 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
655 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
656 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
657 abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
658 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
659 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
660
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
661 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
662 sledgehammer_check_to_table (Lisp_Object charset, void *table, int level,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
663 int codetop)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
664 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
665 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
666
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
667 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
668 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
669 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
670 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
671 int *tab = (int *) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
672
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
673 if (XCHARSET_CHARS (charset) == 94)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
674 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
675 assert (tab[0] == -1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
676 assert (tab[95] == -1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
677 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
678
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
679 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
680 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
681 if (tab[i] != -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
682 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
683 int u4, u3, u2, u1, levels;
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
684 Ichar ch;
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
685 Ichar this_ch;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
686 short val;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
687 void *frtab = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
688
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
689 if (XCHARSET_DIMENSION (charset) == 1)
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
690 this_ch = make_ichar (charset, i + 32, 0);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
691 else
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
692 this_ch = make_ichar (charset, codetop + 32, i + 32);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
693
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
694 assert (tab[i] >= 0);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
695 BREAKUP_UNICODE_CODE (tab[i], u4, u3, u2, u1, levels);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
696 assert (levels <= XCHARSET_FROM_UNICODE_LEVELS (charset));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
697
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
698 switch (XCHARSET_FROM_UNICODE_LEVELS (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
699 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
700 case 1: val = ((short *) frtab)[u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
701 case 2: val = ((short **) frtab)[u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
702 case 3: val = ((short ***) frtab)[u3][u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
703 case 4: val = ((short ****) frtab)[u4][u3][u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
704 default: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
705 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
706
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
707 ch = make_ichar (charset, val >> 8, val & 0xFF);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
708 assert (ch == this_ch);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
709
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
710 switch (XCHARSET_FROM_UNICODE_LEVELS (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
711 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
712 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
713 assert_not_any_blank_table (frtab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
714 frtab = ((short ****) frtab)[u4];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
715 /* fall through */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
716 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
717 assert_not_any_blank_table (frtab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
718 frtab = ((short ***) frtab)[u3];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
719 /* fall through */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
720 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
721 assert_not_any_blank_table (frtab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
722 frtab = ((short **) frtab)[u2];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
723 /* fall through */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
724 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
725 assert_not_any_blank_table (frtab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
726 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
727 default: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
728 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
729 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
730 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
731 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
732 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
733 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
734 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
735 int **tab = (int **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
736
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
737 if (XCHARSET_CHARS (charset) == 94)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
738 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
739 assert (tab[0] == to_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
740 assert (tab[95] == to_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
741 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
742
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
743 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
744 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
745 if (tab[i] != to_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
746 sledgehammer_check_to_table (charset, tab[i], 1, i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
747 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
748 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
749 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
750 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
751 abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
752 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
753 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
754
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
755 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
756 sledgehammer_check_unicode_tables (Lisp_Object charset)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
757 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
758 /* verify that the blank tables have not been modified */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
759 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
760 int from_level = XCHARSET_FROM_UNICODE_LEVELS (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
761 int to_level = XCHARSET_FROM_UNICODE_LEVELS (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
762
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
763 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
764 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
765 assert (from_unicode_blank_1[i] == (short) -1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
766 assert (from_unicode_blank_2[i] == from_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
767 assert (from_unicode_blank_3[i] == from_unicode_blank_2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
768 assert (from_unicode_blank_4[i] == from_unicode_blank_3);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
769 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
770
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
771 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
772 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
773 assert (to_unicode_blank_1[i] == -1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
774 assert (to_unicode_blank_2[i] == to_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
775 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
776
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
777 assert (from_level >= 1 && from_level <= 4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
778
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
779 sledgehammer_check_from_table (charset,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
780 XCHARSET_FROM_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
781 from_level, 0);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
782
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
783 sledgehammer_check_to_table (charset,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
784 XCHARSET_TO_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
785 XCHARSET_DIMENSION (charset), 0);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
786 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
787
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
788 #endif /* SLEDGEHAMMER_CHECK_UNICODE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
789
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
790 static void
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
791 set_unicode_conversion (Ichar chr, int code)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
792 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
793 Lisp_Object charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
794 int c1, c2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
795
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
796 BREAKUP_ICHAR (chr, charset, c1, c2);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
797
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
798 /* I tried an assert on code > 255 || chr == code, but that fails because
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
799 Mule gives many Latin characters separate code points for different
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
800 ISO 8859 coded character sets. Obvious in hindsight.... */
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
801 assert (!EQ (charset, Vcharset_ascii) || chr == code);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
802 assert (!EQ (charset, Vcharset_latin_iso8859_1) || chr == code);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
803 assert (!EQ (charset, Vcharset_control_1) || chr == code);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
804
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
805 /* This assert is needed because it is simply unimplemented. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
806 assert (!EQ (charset, Vcharset_composite));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
807
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
808 #ifdef SLEDGEHAMMER_CHECK_UNICODE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
809 sledgehammer_check_unicode_tables (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
810 #endif
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
811
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
812 /* First, the char -> unicode translation */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
813
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
814 if (XCHARSET_DIMENSION (charset) == 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
815 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
816 int *to_table = (int *) XCHARSET_TO_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
817 to_table[c1 - 32] = code;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
818 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
819 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
820 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
821 int **to_table_2 = (int **) XCHARSET_TO_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
822 int *to_table_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
823
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
824 assert (XCHARSET_DIMENSION (charset) == 2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
825 to_table_1 = to_table_2[c1 - 32];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
826 if (to_table_1 == to_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
827 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
828 to_table_1 = xnew_array (int, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
829 memcpy (to_table_1, to_unicode_blank_1, 96 * sizeof (int));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
830 to_table_2[c1 - 32] = to_table_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
831 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
832 to_table_1[c2 - 32] = code;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
833 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
834
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
835 /* Then, unicode -> char: much harder */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
836
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
837 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
838 int charset_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
839 int u4, u3, u2, u1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
840 int code_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
841 BREAKUP_UNICODE_CODE (code, u4, u3, u2, u1, code_levels);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
842
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
843 charset_levels = XCHARSET_FROM_UNICODE_LEVELS (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
844
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
845 /* Make sure the charset's tables have at least as many levels as
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
846 the code point has: Note that the charset is guaranteed to have
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
847 at least one level, because it was created that way */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
848 if (charset_levels < code_levels)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
849 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
850 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
851
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
852 assert (charset_levels > 0);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
853 for (i = 2; i <= code_levels; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
854 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
855 if (charset_levels < i)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
856 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
857 void *old_table = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
858 void *table = create_new_from_unicode_table (i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
859 XCHARSET_FROM_UNICODE_TABLE (charset) = table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
860
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
861 switch (i)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
862 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
863 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
864 ((short **) table)[0] = (short *) old_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
865 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
866 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
867 ((short ***) table)[0] = (short **) old_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
868 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
869 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
870 ((short ****) table)[0] = (short ***) old_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
871 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
872 default: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
873 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
874 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
875 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
876
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
877 charset_levels = code_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
878 XCHARSET_FROM_UNICODE_LEVELS (charset) = code_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
879 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
880
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
881 /* Now, make sure there is a non-default table at each level */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
882 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
883 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
884 void *table = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
885
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
886 for (i = charset_levels; i >= 2; i--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
887 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
888 switch (i)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
889 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
890 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
891 if (((short ****) table)[u4] == from_unicode_blank_3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
892 ((short ****) table)[u4] =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
893 ((short ***) create_new_from_unicode_table (3));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
894 table = ((short ****) table)[u4];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
895 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
896 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
897 if (((short ***) table)[u3] == from_unicode_blank_2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
898 ((short ***) table)[u3] =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
899 ((short **) create_new_from_unicode_table (2));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
900 table = ((short ***) table)[u3];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
901 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
902 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
903 if (((short **) table)[u2] == from_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
904 ((short **) table)[u2] =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
905 ((short *) create_new_from_unicode_table (1));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
906 table = ((short **) table)[u2];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
907 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
908 default: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
909 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
910 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
911 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
912
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
913 /* Finally, set the character */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
914
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
915 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
916 void *table = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
917 switch (charset_levels)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
918 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
919 case 1: ((short *) table)[u1] = (c1 << 8) + c2; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
920 case 2: ((short **) table)[u2][u1] = (c1 << 8) + c2; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
921 case 3: ((short ***) table)[u3][u2][u1] = (c1 << 8) + c2; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
922 case 4: ((short ****) table)[u4][u3][u2][u1] = (c1 << 8) + c2; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
923 default: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
924 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
925 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
926 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
927
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
928 #ifdef SLEDGEHAMMER_CHECK_UNICODE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
929 sledgehammer_check_unicode_tables (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
930 #endif
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
931 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
932
788
026c5bf9c134 [xemacs-hg @ 2002-03-21 07:29:57 by ben]
ben
parents: 778
diff changeset
933 int
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
934 ichar_to_unicode (Ichar chr)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
935 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
936 Lisp_Object charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
937 int c1, c2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
938
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
939 type_checking_assert (valid_ichar_p (chr));
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
940 /* This shortcut depends on the representation of an Ichar, see text.c. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
941 if (chr < 256)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
942 return (int) chr;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
943
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
944 BREAKUP_ICHAR (chr, charset, c1, c2);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
945 if (EQ (charset, Vcharset_composite))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
946 return -1; /* #### don't know how to handle */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
947 else if (XCHARSET_DIMENSION (charset) == 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
948 return ((int *) XCHARSET_TO_UNICODE_TABLE (charset))[c1 - 32];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
949 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
950 return ((int **) XCHARSET_TO_UNICODE_TABLE (charset))[c1 - 32][c2 - 32];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
951 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
952
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
953 static Ichar
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
954 unicode_to_ichar (int code, Lisp_Object_dynarr *charsets)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
955 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
956 int u1, u2, u3, u4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
957 int code_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
958 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
959 int n = Dynarr_length (charsets);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
960
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
961 type_checking_assert (code >= 0);
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
962 /* This shortcut depends on the representation of an Ichar, see text.c.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
963 Note that it may _not_ be extended to U+00A0 to U+00FF (many ISO 8859
893
c9f067fd71a3 [xemacs-hg @ 2002-07-02 12:32:34 by stephent]
stephent
parents: 877
diff changeset
964 coded character sets have points that map into that region, so this
c9f067fd71a3 [xemacs-hg @ 2002-07-02 12:32:34 by stephent]
stephent
parents: 877
diff changeset
965 function is many-valued). */
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
966 if (code < 0xA0)
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
967 return (Ichar) code;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
968
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
969 BREAKUP_UNICODE_CODE (code, u4, u3, u2, u1, code_levels);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
970
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
971 for (i = 0; i < n; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
972 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
973 Lisp_Object charset = Dynarr_at (charsets, i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
974 int charset_levels = XCHARSET_FROM_UNICODE_LEVELS (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
975 if (charset_levels >= code_levels)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
976 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
977 void *table = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
978 short retval;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
979
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
980 switch (charset_levels)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
981 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
982 case 1: retval = ((short *) table)[u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
983 case 2: retval = ((short **) table)[u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
984 case 3: retval = ((short ***) table)[u3][u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
985 case 4: retval = ((short ****) table)[u4][u3][u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
986 default: abort (); retval = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
987 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
988
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
989 if (retval != -1)
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
990 return make_ichar (charset, retval >> 8, retval & 0xFF);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
991 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
992 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
993
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
994 return (Ichar) -1;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
995 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
996
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
997 /* Add charsets to precedence list.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
998 LIST must be a list of charsets. Charsets which are in the list more
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
999 than once are given the precedence implied by their earliest appearance.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1000 Later appearances are ignored. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1001 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1002 add_charsets_to_precedence_list (Lisp_Object list, int *lbs,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1003 Lisp_Object_dynarr *dynarr)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1004 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1005 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1006 EXTERNAL_LIST_LOOP_2 (elt, list)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1007 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1008 Lisp_Object charset = Fget_charset (elt);
778
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
1009 int lb = XCHARSET_LEADING_BYTE (charset);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1010 if (lbs[lb - MIN_LEADING_BYTE] == 0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1011 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1012 Dynarr_add (dynarr, charset);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1013 lbs[lb - MIN_LEADING_BYTE] = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1014 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1015 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1016 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1017 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1018
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1019 /* Rebuild the charset precedence array.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1020 The "charsets preferred for the current language" get highest precedence,
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1021 followed by the "charsets preferred by default", ordered as in
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1022 Vlanguage_unicode_precedence_list and Vdefault_unicode_precedence_list,
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1023 respectively. All remaining charsets follow in an arbitrary order. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1024 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1025 recalculate_unicode_precedence (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1026 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1027 int lbs[NUM_LEADING_BYTES];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1028 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1029
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1030 for (i = 0; i < NUM_LEADING_BYTES; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1031 lbs[i] = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1032
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1033 Dynarr_reset (unicode_precedence_dynarr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1034
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1035 add_charsets_to_precedence_list (Vlanguage_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1036 lbs, unicode_precedence_dynarr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1037 add_charsets_to_precedence_list (Vdefault_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1038 lbs, unicode_precedence_dynarr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1039
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1040 for (i = 0; i < NUM_LEADING_BYTES; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1041 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1042 if (lbs[i] == 0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1043 {
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
1044 Lisp_Object charset = charset_by_leading_byte (i + MIN_LEADING_BYTE);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1045 if (!NILP (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1046 Dynarr_add (unicode_precedence_dynarr, charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1047 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1048 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1049 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1050
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1051 DEFUN ("unicode-precedence-list",
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1052 Funicode_precedence_list,
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1053 0, 0, 0, /*
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1054 Return the precedence order among charsets used for Unicode decoding.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1055
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1056 Value is a list of charsets, which are searched in order for a translation
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1057 matching a given Unicode character.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1058
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1059 The highest precedence is given to the language-specific precedence list of
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1060 charsets, defined by `set-language-unicode-precedence-list'. These are
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1061 followed by charsets in the default precedence list, defined by
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1062 `set-default-unicode-precedence-list'. Charsets occurring multiple times are
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1063 given precedence according to their first occurrance in either list. These
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1064 are followed by the remaining charsets, in some arbitrary order.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1065
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1066 The language-specific precedence list is meant to be set as part of the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1067 language environment initialization; the default precedence list is meant
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1068 to be set by the user.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1069
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1070 #### NOTE: This interface may be changed.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1071 */
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1072 ())
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1073 {
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1074 int i;
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1075 Lisp_Object list = Qnil;
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1076
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1077 for (i = Dynarr_length (unicode_precedence_dynarr) - 1; i >= 0; i--)
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1078 list = Fcons (Dynarr_at (unicode_precedence_dynarr, i), list);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1079 return list;
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1080 }
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1081
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1082
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1083 /* #### This interface is wrong. Cyrillic users and Chinese users are going
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1084 to have varying opinions about whether ISO Cyrillic, KOI8-R, or Windows
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1085 1251 should take precedence, and whether Big Five or CNS should take
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1086 precedence, respectively. This means that users are sometimes going to
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1087 want to set Vlanguage_unicode_precedence_list.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1088 Furthermore, this should be language-local (buffer-local would be a
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1089 reasonable approximation).
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1090
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1091 Answer: You are right, this needs rethinking. */
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1092 DEFUN ("set-language-unicode-precedence-list",
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1093 Fset_language_unicode_precedence_list,
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1094 1, 1, 0, /*
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1095 Set the language-specific precedence of charsets in Unicode decoding.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1096 LIST is a list of charsets.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1097 See `unicode-precedence-list' for more information.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1098
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1099 #### NOTE: This interface may be changed.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1100 */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1101 (list))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1102 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1103 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1104 EXTERNAL_LIST_LOOP_2 (elt, list)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1105 Fget_charset (elt);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1106 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1107
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1108 Vlanguage_unicode_precedence_list = list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1109 recalculate_unicode_precedence ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1110 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1111 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1112
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1113 DEFUN ("language-unicode-precedence-list",
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1114 Flanguage_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1115 0, 0, 0, /*
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1116 Return the language-specific precedence list used for Unicode decoding.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1117 See `unicode-precedence-list' for more information.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1118
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1119 #### NOTE: This interface may be changed.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1120 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1121 ())
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1122 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1123 return Vlanguage_unicode_precedence_list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1124 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1125
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1126 DEFUN ("set-default-unicode-precedence-list",
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1127 Fset_default_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1128 1, 1, 0, /*
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1129 Set the default precedence list used for Unicode decoding.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1130 This is intended to be set by the user. See
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1131 `unicode-precedence-list' for more information.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1132
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1133 #### NOTE: This interface may be changed.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1134 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1135 (list))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1136 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1137 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1138 EXTERNAL_LIST_LOOP_2 (elt, list)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1139 Fget_charset (elt);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1140 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1141
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1142 Vdefault_unicode_precedence_list = list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1143 recalculate_unicode_precedence ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1144 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1145 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1146
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1147 DEFUN ("default-unicode-precedence-list",
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1148 Fdefault_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1149 0, 0, 0, /*
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1150 Return the default precedence list used for Unicode decoding.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1151 See `unicode-precedence-list' for more information.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1152
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1153 #### NOTE: This interface may be changed.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1154 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1155 ())
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1156 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1157 return Vdefault_unicode_precedence_list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1158 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1159
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1160 DEFUN ("set-unicode-conversion", Fset_unicode_conversion,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1161 2, 2, 0, /*
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1162 Add conversion information between Unicode codepoints and characters.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1163 Conversions for U+0000 to U+00FF are hardwired to ASCII, Control-1, and
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1164 Latin-1. Attempts to set these values will raise an error.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1165
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1166 CHARACTER is one of the following:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1167
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1168 -- A character (in which case CODE must be a non-negative integer; values
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1169 above 2^20 - 1 are allowed for the purpose of specifying private
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1170 characters, but are illegal in standard Unicode---they will cause errors
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1171 when converted to utf-16)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1172 -- A vector of characters (in which case CODE must be a vector of integers
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1173 of the same length)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1174 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1175 (character, code))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1176 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1177 Lisp_Object charset;
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1178 int ichar, unicode;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1179
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1180 CHECK_CHAR (character);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1181 CHECK_NATNUM (code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1182
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1183 unicode = XINT (code);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1184 ichar = XCHAR (character);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1185 charset = ichar_charset (ichar);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1186
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1187 /* The translations of ASCII, Control-1, and Latin-1 code points are
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1188 hard-coded in ichar_to_unicode and unicode_to_ichar.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1189
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1190 Checking unicode < 256 && ichar != unicode is wrong because Mule gives
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1191 many Latin characters code points in a few different character sets. */
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1192 if ((EQ (charset, Vcharset_ascii) ||
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1193 EQ (charset, Vcharset_control_1) ||
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1194 EQ (charset, Vcharset_latin_iso8859_1))
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1195 && unicode != ichar)
893
c9f067fd71a3 [xemacs-hg @ 2002-07-02 12:32:34 by stephent]
stephent
parents: 877
diff changeset
1196 signal_error (Qinvalid_argument, "Can't change Unicode translation for ASCII, Control-1 or Latin-1 character",
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1197 character);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1198
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1199 /* #### Composite characters are not properly implemented yet. */
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1200 if (EQ (charset, Vcharset_composite))
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1201 signal_error (Qinvalid_argument, "Can't set Unicode translation for Composite char",
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1202 character);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1203
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1204 set_unicode_conversion (ichar, unicode);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1205 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1206 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1207
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1208 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1209
800
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
1210 DEFUN ("char-to-unicode", Fchar_to_unicode, 1, 1, 0, /*
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1211 Convert character to Unicode codepoint.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1212 When there is no international support (i.e. the 'mule feature is not
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1213 present), this function simply does `char-to-int'.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1214 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1215 (character))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1216 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1217 CHECK_CHAR (character);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1218 #ifdef MULE
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1219 return make_int (ichar_to_unicode (XCHAR (character)));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1220 #else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1221 return Fchar_to_int (character);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1222 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1223 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1224
800
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
1225 DEFUN ("unicode-to-char", Funicode_to_char, 1, 2, 0, /*
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1226 Convert Unicode codepoint to character.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1227 CODE should be a non-negative integer.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1228 If CHARSETS is given, it should be a list of charsets, and only those
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1229 charsets will be consulted, in the given order, for a translation.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1230 Otherwise, the default ordering of all charsets will be given (see
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1231 `set-unicode-charset-precedence').
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1232
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1233 When there is no international support (i.e. the 'mule feature is not
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1234 present), this function simply does `int-to-char' and ignores the CHARSETS
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1235 argument.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1236 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1237 (code, charsets))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1238 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1239 #ifdef MULE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1240 Lisp_Object_dynarr *dyn;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1241 int lbs[NUM_LEADING_BYTES];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1242 int c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1243
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1244 CHECK_NATNUM (code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1245 c = XINT (code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1246 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1247 EXTERNAL_LIST_LOOP_2 (elt, charsets)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1248 Fget_charset (elt);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1249 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1250
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1251 if (NILP (charsets))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1252 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1253 Ichar ret = unicode_to_ichar (c, unicode_precedence_dynarr);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1254 if (ret == -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1255 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1256 return make_char (ret);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1257 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1258
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1259 dyn = Dynarr_new (Lisp_Object);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1260 memset (lbs, 0, NUM_LEADING_BYTES * sizeof (int));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1261 add_charsets_to_precedence_list (charsets, lbs, dyn);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1262 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1263 Ichar ret = unicode_to_ichar (c, dyn);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1264 Dynarr_free (dyn);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1265 if (ret == -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1266 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1267 return make_char (ret);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1268 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1269 #else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1270 CHECK_NATNUM (code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1271 return Fint_to_char (code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1272 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1273 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1274
872
79c6ff3eef26 [xemacs-hg @ 2002-06-20 21:18:01 by ben]
ben
parents: 867
diff changeset
1275 #ifdef MULE
79c6ff3eef26 [xemacs-hg @ 2002-06-20 21:18:01 by ben]
ben
parents: 867
diff changeset
1276
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1277 static Lisp_Object
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1278 cerrar_el_fulano (Lisp_Object fulano)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1279 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1280 FILE *file = (FILE *) get_opaque_ptr (fulano);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1281 retry_fclose (file);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1282 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1283 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1284
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1285 DEFUN ("load-unicode-mapping-table", Fload_unicode_mapping_table,
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1286 2, 6, 0, /*
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1287 Load Unicode tables with the Unicode mapping data in FILENAME for CHARSET.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1288 Data is text, in the form of one translation per line -- charset
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1289 codepoint followed by Unicode codepoint. Numbers are decimal or hex
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1290 \(preceded by 0x). Comments are marked with a #. Charset codepoints
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1291 for two-dimensional charsets have the first octet stored in the
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1292 high 8 bits of the hex number and the second in the low 8 bits.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1293
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1294 If START and END are given, only charset codepoints within the given
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1295 range will be processed. (START and END apply to the codepoints in the
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1296 file, before OFFSET is applied.)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1297
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1298 If OFFSET is given, that value will be added to all charset codepoints
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1299 in the file to obtain the internal charset codepoint. \(We assume
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1300 that octets in the table are in the range 33 to 126 or 32 to 127. If
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1301 you have a table in ku-ten form, with octets in the range 1 to 94, you
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1302 will have to use an offset of 5140, i.e. 0x2020.)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1303
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1304 FLAGS, if specified, control further how the tables are interpreted
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1305 and are used to special-case certain known format deviations in the
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1306 Unicode tables or in the charset:
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1307
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1308 `ignore-first-column'
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1309 The JIS X 0208 tables have 3 columns of data instead of 2. The first
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1310 column contains the Shift-JIS codepoint, which we ignore.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1311 `big5'
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1312 The charset codepoints are Big Five codepoints; convert it to the
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1313 hacked-up Mule codepoint in `chinese-big5-1' or `chinese-big5-2'.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1314 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1315 (filename, charset, start, end, offset, flags))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1316 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1317 int st = 0, en = INT_MAX, of = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1318 FILE *file;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1319 struct gcpro gcpro1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1320 char line[1025];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1321 int fondo = specpdl_depth ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1322 int ignore_first_column = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1323 int big5 = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1324
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1325 CHECK_STRING (filename);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1326 charset = Fget_charset (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1327 if (!NILP (start))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1328 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1329 CHECK_INT (start);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1330 st = XINT (start);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1331 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1332 if (!NILP (end))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1333 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1334 CHECK_INT (end);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1335 en = XINT (end);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1336 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1337 if (!NILP (offset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1338 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1339 CHECK_INT (offset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1340 of = XINT (offset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1341 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1342
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1343 if (!LISTP (flags))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1344 flags = list1 (flags);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1345
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1346 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1347 EXTERNAL_LIST_LOOP_2 (elt, flags)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1348 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1349 if (EQ (elt, Qignore_first_column))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1350 ignore_first_column = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1351 else if (EQ (elt, Qbig5))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1352 big5 = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1353 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1354 invalid_constant
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1355 ("Unrecognized `load-unicode-mapping-table' flag", elt);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1356 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1357 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1358
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1359 GCPRO1 (filename);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1360 filename = Fexpand_file_name (filename, Qnil);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1361 file = qxe_fopen (XSTRING_DATA (filename), READ_TEXT);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1362 if (!file)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1363 report_file_error ("Cannot open", filename);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1364 record_unwind_protect (cerrar_el_fulano, make_opaque_ptr (file));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1365 while (fgets (line, sizeof (line), file))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1366 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1367 char *p = line;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1368 int cp1, cp2, endcount;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1369 int cp1high, cp1low;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1370 int dummy;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1371
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1372 while (*p) /* erase all comments out of the line */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1373 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1374 if (*p == '#')
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1375 *p = '\0';
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1376 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1377 p++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1378 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1379 /* see if line is nothing but whitespace and skip if so */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1380 p = line + strspn (line, " \t\n\r\f");
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1381 if (!*p)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1382 continue;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1383 /* NOTE: It appears that MS Windows and Newlib sscanf() have
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1384 different interpretations for whitespace (== "skip all whitespace
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1385 at processing point"): Newlib requires at least one corresponding
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1386 whitespace character in the input, but MS allows none. The
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1387 following would be easier to write if we could count on the MS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1388 interpretation.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1389
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1390 Also, the return value does NOT include %n storage. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1391 if ((!ignore_first_column ?
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1392 sscanf (p, "%i %i%n", &cp1, &cp2, &endcount) < 2 :
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1393 sscanf (p, "%i %i %i%n", &dummy, &cp1, &cp2, &endcount) < 3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1394 || *(p + endcount + strspn (p + endcount, " \t\n\r\f")))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1395 {
793
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1396 warn_when_safe (Qunicode, Qwarning,
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1397 "Unrecognized line in translation file %s:\n%s",
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1398 XSTRING_DATA (filename), line);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1399 continue;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1400 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1401 if (cp1 >= st && cp1 <= en)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1402 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1403 cp1 += of;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1404 if (cp1 < 0 || cp1 >= 65536)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1405 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1406 out_of_range:
793
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1407 warn_when_safe (Qunicode, Qwarning,
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1408 "Out of range first codepoint 0x%x in "
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1409 "translation file %s:\n%s",
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1410 cp1, XSTRING_DATA (filename), line);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1411 continue;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1412 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1413
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1414 cp1high = cp1 >> 8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1415 cp1low = cp1 & 255;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1416
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1417 if (big5)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1418 {
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1419 Ichar ch = decode_big5_char (cp1high, cp1low);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1420 if (ch == -1)
793
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1421
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1422 warn_when_safe (Qunicode, Qwarning,
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1423 "Out of range Big5 codepoint 0x%x in "
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1424 "translation file %s:\n%s",
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1425 cp1, XSTRING_DATA (filename), line);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1426 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1427 set_unicode_conversion (ch, cp2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1428 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1429 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1430 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1431 int l1, h1, l2, h2;
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1432 Ichar emch;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1433
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1434 switch (XCHARSET_TYPE (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1435 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1436 case CHARSET_TYPE_94: l1 = 33; h1 = 126; l2 = 0; h2 = 0; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1437 case CHARSET_TYPE_96: l1 = 32; h1 = 127; l2 = 0; h2 = 0; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1438 case CHARSET_TYPE_94X94: l1 = 33; h1 = 126; l2 = 33; h2 = 126;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1439 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1440 case CHARSET_TYPE_96X96: l1 = 32; h1 = 127; l2 = 32; h2 = 127;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1441 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1442 default: abort (); l1 = 0; h1 = 0; l2 = 0; h2 = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1443 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1444
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1445 if (cp1high < l2 || cp1high > h2 || cp1low < l1 || cp1low > h1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1446 goto out_of_range;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1447
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1448 emch = (cp1high == 0 ? make_ichar (charset, cp1low, 0) :
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1449 make_ichar (charset, cp1high, cp1low));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1450 set_unicode_conversion (emch, cp2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1451 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1452 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1453 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1454
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1455 if (ferror (file))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1456 report_file_error ("IO error when reading", filename);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1457
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1458 unbind_to (fondo); /* close file */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1459 UNGCPRO;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1460 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1461 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1462
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1463 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1464
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1465
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1466 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1467 /* Unicode coding system */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1468 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1469
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1470 /* ISO 10646 UTF-16, UCS-4, UTF-8, UTF-7, etc. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1471
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1472 enum unicode_type
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1473 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1474 UNICODE_UTF_16,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1475 UNICODE_UTF_8,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1476 UNICODE_UTF_7,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1477 UNICODE_UCS_4,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1478 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1479
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1480 struct unicode_coding_system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1481 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1482 enum unicode_type type;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1483 int little_endian :1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1484 int need_bom :1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1485 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1486
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1487 #define CODING_SYSTEM_UNICODE_TYPE(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1488 (CODING_SYSTEM_TYPE_DATA (codesys, unicode)->type)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1489 #define XCODING_SYSTEM_UNICODE_TYPE(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1490 CODING_SYSTEM_UNICODE_TYPE (XCODING_SYSTEM (codesys))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1491 #define CODING_SYSTEM_UNICODE_LITTLE_ENDIAN(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1492 (CODING_SYSTEM_TYPE_DATA (codesys, unicode)->little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1493 #define XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1494 CODING_SYSTEM_UNICODE_LITTLE_ENDIAN (XCODING_SYSTEM (codesys))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1495 #define CODING_SYSTEM_UNICODE_NEED_BOM(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1496 (CODING_SYSTEM_TYPE_DATA (codesys, unicode)->need_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1497 #define XCODING_SYSTEM_UNICODE_NEED_BOM(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1498 CODING_SYSTEM_UNICODE_NEED_BOM (XCODING_SYSTEM (codesys))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1499
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1500 struct unicode_coding_stream
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1501 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1502 /* decode */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1503 unsigned char counter;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1504 int seen_char;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1505 /* encode */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1506 Lisp_Object current_charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1507 int current_char_boundary;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1508 int wrote_bom;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1509 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1510
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
1511 static const struct memory_description unicode_coding_system_description[] = {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1512 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1513 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1514
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
1515 DEFINE_CODING_SYSTEM_TYPE_WITH_DATA (unicode);
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
1516
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1517 /* Decode a UCS-2 or UCS-4 character into a buffer. If the lookup fails, use
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1518 <GETA MARK> (U+3013) of JIS X 0208, which means correct character
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1519 is not found, instead.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1520 #### do something more appropriate (use blob?)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1521 Danger, Will Robinson! Data loss. Should we signal user? */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1522 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1523 decode_unicode_char (int ch, unsigned_char_dynarr *dst,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1524 struct unicode_coding_stream *data, int ignore_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1525 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1526 if (ch == 0xFEFF && !data->seen_char && ignore_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1527 ;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1528 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1529 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1530 #ifdef MULE
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1531 Ichar chr = unicode_to_ichar (ch, unicode_precedence_dynarr);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1532
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1533 if (chr != -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1534 {
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1535 Ibyte work[MAX_ICHAR_LEN];
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1536 int len;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1537
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1538 len = set_itext_ichar (work, chr);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1539 Dynarr_add_many (dst, work, len);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1540 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1541 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1542 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1543 Dynarr_add (dst, LEADING_BYTE_JAPANESE_JISX0208);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1544 Dynarr_add (dst, 34 + 128);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1545 Dynarr_add (dst, 46 + 128);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1546 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1547 #else
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1548 Dynarr_add (dst, (Ibyte) ch);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1549 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1550 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1551
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1552 data->seen_char = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1553 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1554
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1555 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1556 encode_unicode_char_1 (int code, unsigned_char_dynarr *dst,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1557 enum unicode_type type, int little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1558 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1559 switch (type)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1560 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1561 case UNICODE_UTF_16:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1562 if (little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1563 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1564 Dynarr_add (dst, (unsigned char) (code & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1565 Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1566 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1567 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1568 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1569 Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1570 Dynarr_add (dst, (unsigned char) (code & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1571 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1572 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1573
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1574 case UNICODE_UCS_4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1575 if (little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1576 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1577 Dynarr_add (dst, (unsigned char) (code & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1578 Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1579 Dynarr_add (dst, (unsigned char) ((code >> 16) & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1580 Dynarr_add (dst, (unsigned char) (code >> 24));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1581 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1582 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1583 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1584 Dynarr_add (dst, (unsigned char) (code >> 24));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1585 Dynarr_add (dst, (unsigned char) ((code >> 16) & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1586 Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1587 Dynarr_add (dst, (unsigned char) (code & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1588 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1589 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1590
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1591 case UNICODE_UTF_8:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1592 if (code <= 0x7f)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1593 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1594 Dynarr_add (dst, (unsigned char) code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1595 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1596 else if (code <= 0x7ff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1597 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1598 Dynarr_add (dst, (unsigned char) ((code >> 6) | 0xc0));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1599 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1600 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1601 else if (code <= 0xffff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1602 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1603 Dynarr_add (dst, (unsigned char) ((code >> 12) | 0xe0));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1604 Dynarr_add (dst, (unsigned char) (((code >> 6) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1605 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1606 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1607 else if (code <= 0x1fffff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1608 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1609 Dynarr_add (dst, (unsigned char) ((code >> 18) | 0xf0));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1610 Dynarr_add (dst, (unsigned char) (((code >> 12) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1611 Dynarr_add (dst, (unsigned char) (((code >> 6) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1612 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1613 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1614 else if (code <= 0x3ffffff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1615 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1616 Dynarr_add (dst, (unsigned char) ((code >> 24) | 0xf8));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1617 Dynarr_add (dst, (unsigned char) (((code >> 18) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1618 Dynarr_add (dst, (unsigned char) (((code >> 12) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1619 Dynarr_add (dst, (unsigned char) (((code >> 6) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1620 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1621 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1622 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1623 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1624 Dynarr_add (dst, (unsigned char) ((code >> 30) | 0xfc));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1625 Dynarr_add (dst, (unsigned char) (((code >> 24) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1626 Dynarr_add (dst, (unsigned char) (((code >> 18) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1627 Dynarr_add (dst, (unsigned char) (((code >> 12) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1628 Dynarr_add (dst, (unsigned char) (((code >> 6) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1629 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1630 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1631 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1632
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1633 case UNICODE_UTF_7: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1634
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1635 default: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1636 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1637 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1638
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1639 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1640 encode_unicode_char (Lisp_Object charset, int h, int l,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1641 unsigned_char_dynarr *dst, enum unicode_type type,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1642 int little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1643 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1644 #ifdef MULE
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1645 int code = ichar_to_unicode (make_ichar (charset, h & 127, l & 127));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1646
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1647 if (code == -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1648 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1649 if (type != UNICODE_UTF_16 &&
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1650 XCHARSET_DIMENSION (charset) == 2 &&
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1651 XCHARSET_CHARS (charset) == 94)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1652 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1653 unsigned char final = XCHARSET_FINAL (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1654
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1655 if (('@' <= final) && (final < 0x7f))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1656 code = (0xe00000 + (final - '@') * 94 * 94
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1657 + ((h & 127) - 33) * 94 + (l & 127) - 33);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1658 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1659 code = '?';
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1660 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1661 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1662 code = '?';
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1663 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1664 #else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1665 int code = h;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1666 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1667
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1668 encode_unicode_char_1 (code, dst, type, little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1669 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1670
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1671 static Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1672 unicode_convert (struct coding_stream *str, const UExtbyte *src,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1673 unsigned_char_dynarr *dst, Bytecount n)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1674 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1675 unsigned int ch = str->ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1676 struct unicode_coding_stream *data = CODING_STREAM_TYPE_DATA (str, unicode);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1677 enum unicode_type type =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1678 XCODING_SYSTEM_UNICODE_TYPE (str->codesys);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1679 int little_endian = XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (str->codesys);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1680 int ignore_bom = XCODING_SYSTEM_UNICODE_NEED_BOM (str->codesys);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1681 Bytecount orign = n;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1682
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1683 if (str->direction == CODING_DECODE)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1684 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1685 unsigned char counter = data->counter;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1686
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1687 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1688 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1689 UExtbyte c = *src++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1690
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1691 switch (type)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1692 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1693 case UNICODE_UTF_8:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1694 switch (counter)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1695 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1696 case 0:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1697 if (c >= 0xfc)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1698 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1699 ch = c & 0x01;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1700 counter = 5;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1701 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1702 else if (c >= 0xf8)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1703 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1704 ch = c & 0x03;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1705 counter = 4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1706 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1707 else if (c >= 0xf0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1708 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1709 ch = c & 0x07;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1710 counter = 3;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1711 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1712 else if (c >= 0xe0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1713 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1714 ch = c & 0x0f;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1715 counter = 2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1716 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1717 else if (c >= 0xc0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1718 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1719 ch = c & 0x1f;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1720 counter = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1721 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1722 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1723 decode_unicode_char (c, dst, data, ignore_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1724 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1725 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1726 ch = (ch << 6) | (c & 0x3f);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1727 decode_unicode_char (ch, dst, data, ignore_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1728 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1729 counter = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1730 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1731 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1732 ch = (ch << 6) | (c & 0x3f);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1733 counter--;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1734 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1735 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1736
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1737 case UNICODE_UTF_16:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1738 if (little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1739 ch = (c << counter) | ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1740 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1741 ch = (ch << 8) | c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1742 counter += 8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1743 if (counter == 16)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1744 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1745 int tempch = ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1746 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1747 counter = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1748 decode_unicode_char (tempch, dst, data, ignore_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1749 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1750 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1751
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1752 case UNICODE_UCS_4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1753 if (little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1754 ch = (c << counter) | ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1755 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1756 ch = (ch << 8) | c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1757 counter += 8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1758 if (counter == 32)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1759 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1760 int tempch = ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1761 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1762 counter = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1763 if (tempch < 0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1764 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1765 /* !!#### indicate an error */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1766 tempch = '~';
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1767 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1768 decode_unicode_char (tempch, dst, data, ignore_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1769 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1770 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1772 case UNICODE_UTF_7:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1773 abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1774 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1775
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1776 default: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1777 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1778
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1779 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1780 if (str->eof)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1781 DECODE_OUTPUT_PARTIAL_CHAR (ch, dst);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1782
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1783 data->counter = counter;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1784 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1785 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1786 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1787 unsigned char char_boundary = data->current_char_boundary;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1788 Lisp_Object charset = data->current_charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1789
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1790 #ifdef ENABLE_COMPOSITE_CHARS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1791 /* flags for handling composite chars. We do a little switcheroo
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1792 on the source while we're outputting the composite char. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1793 Bytecount saved_n = 0;
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1794 const Ibyte *saved_src = NULL;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1795 int in_composite = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1796
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1797 back_to_square_n:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1798 #endif /* ENABLE_COMPOSITE_CHARS */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1799
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1800 if (XCODING_SYSTEM_UNICODE_NEED_BOM (str->codesys) && !data->wrote_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1801 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1802 encode_unicode_char_1 (0xFEFF, dst, type, little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1803 data->wrote_bom = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1804 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1805
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1806 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1807 {
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1808 Ibyte c = *src++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1809
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1810 #ifdef MULE
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
1811 if (byte_ascii_p (c))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1812 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1813 { /* Processing ASCII character */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1814 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1815 encode_unicode_char (Vcharset_ascii, c, 0, dst, type,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1816 little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1817
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1818 char_boundary = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1819 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1820 #ifdef MULE
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1821 else if (ibyte_leading_byte_p (c) || ibyte_leading_byte_p (ch))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1822 { /* Processing Leading Byte */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1823 ch = 0;
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
1824 charset = charset_by_leading_byte (c);
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
1825 if (leading_byte_prefix_p(c))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1826 ch = c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1827 char_boundary = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1828 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1829 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1830 { /* Processing Non-ASCII character */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1831 char_boundary = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1832 if (EQ (charset, Vcharset_control_1))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1833 encode_unicode_char (Vcharset_control_1, c, 0, dst,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1834 type, little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1835 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1836 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1837 switch (XCHARSET_REP_BYTES (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1838 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1839 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1840 encode_unicode_char (charset, c, 0, dst, type,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1841 little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1842 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1843 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1844 if (XCHARSET_PRIVATE_P (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1845 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1846 encode_unicode_char (charset, c, 0, dst, type,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1847 little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1848 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1849 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1850 else if (ch)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1851 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1852 #ifdef ENABLE_COMPOSITE_CHARS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1853 if (EQ (charset, Vcharset_composite))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1854 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1855 if (in_composite)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1856 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1857 /* #### Bother! We don't know how to
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1858 handle this yet. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1859 encode_unicode_char (Vcharset_ascii, '~', 0,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1860 dst, type,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1861 little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1862 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1863 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1864 {
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1865 Ichar emch = make_ichar (Vcharset_composite,
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1866 ch & 0x7F,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1867 c & 0x7F);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1868 Lisp_Object lstr =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1869 composite_char_string (emch);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1870 saved_n = n;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1871 saved_src = src;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1872 in_composite = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1873 src = XSTRING_DATA (lstr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1874 n = XSTRING_LENGTH (lstr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1875 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1876 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1877 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1878 #endif /* ENABLE_COMPOSITE_CHARS */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1879 encode_unicode_char (charset, ch, c, dst, type,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1880 little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1881 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1882 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1883 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1884 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1885 ch = c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1886 char_boundary = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1887 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1888 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1889 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1890 if (ch)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1891 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1892 encode_unicode_char (charset, ch, c, dst, type,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1893 little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1894 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1895 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1896 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1897 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1898 ch = c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1899 char_boundary = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1900 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1901 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1902 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1903 abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1904 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1905 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1906 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1907 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1908 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1909
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1910 #ifdef ENABLE_COMPOSITE_CHARS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1911 if (in_composite)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1912 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1913 n = saved_n;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1914 src = saved_src;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1915 in_composite = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1916 goto back_to_square_n; /* Wheeeeeeeee ..... */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1917 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1918 #endif /* ENABLE_COMPOSITE_CHARS */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1919
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1920 data->current_char_boundary = char_boundary;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1921 data->current_charset = charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1922
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1923 /* La palabra se hizo carne! */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1924 /* A palavra fez-se carne! */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1925 /* Whatever. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1926 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1927
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1928 str->ch = ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1929 return orign;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1930 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1931
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1932 /* DEFINE_DETECTOR (utf_7); */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1933 DEFINE_DETECTOR (utf_8);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1934 DEFINE_DETECTOR_CATEGORY (utf_8, utf_8);
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
1935 DEFINE_DETECTOR_CATEGORY (utf_8, utf_8_bom);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1936 DEFINE_DETECTOR (ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1937 DEFINE_DETECTOR_CATEGORY (ucs_4, ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1938 DEFINE_DETECTOR (utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1939 DEFINE_DETECTOR_CATEGORY (utf_16, utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1940 DEFINE_DETECTOR_CATEGORY (utf_16, utf_16_little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1941 DEFINE_DETECTOR_CATEGORY (utf_16, utf_16_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1942 DEFINE_DETECTOR_CATEGORY (utf_16, utf_16_little_endian_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1943
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1944 struct ucs_4_detector
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1945 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1946 int in_ucs_4_byte;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1947 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1948
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1949 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1950 ucs_4_detect (struct detection_state *st, const UExtbyte *src,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1951 Bytecount n)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1952 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1953 struct ucs_4_detector *data = DETECTION_STATE_DATA (st, ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1954
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1955 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1956 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1957 UExtbyte c = *src++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1958 switch (data->in_ucs_4_byte)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1959 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1960 case 0:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1961 if (c >= 128)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1962 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1963 DET_RESULT (st, ucs_4) = DET_NEARLY_IMPOSSIBLE;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1964 return;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1965 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1966 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1967 data->in_ucs_4_byte++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1968 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1969 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1970 data->in_ucs_4_byte = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1971 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1972 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1973 data->in_ucs_4_byte++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1974 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1975 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1976
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1977 /* !!#### write this for real */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1978 DET_RESULT (st, ucs_4) = DET_AS_LIKELY_AS_UNLIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1979 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1980
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1981 struct utf_16_detector
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1982 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1983 unsigned int seen_ffff:1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1984 unsigned int seen_forward_bom:1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1985 unsigned int seen_rev_bom:1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1986 int byteno;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1987 int prev_char;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1988 int text, rev_text;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
1989 int sep, rev_sep;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
1990 int num_ascii;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1991 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1992
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1993 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1994 utf_16_detect (struct detection_state *st, const UExtbyte *src,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1995 Bytecount n)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1996 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1997 struct utf_16_detector *data = DETECTION_STATE_DATA (st, utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1998
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1999 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2000 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2001 UExtbyte c = *src++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2002 int prevc = data->prev_char;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2003 if (data->byteno == 1 && c == 0xFF && prevc == 0xFE)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2004 data->seen_forward_bom = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2005 else if (data->byteno == 1 && c == 0xFE && prevc == 0xFF)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2006 data->seen_rev_bom = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2007
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2008 if (data->byteno & 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2009 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2010 if (c == 0xFF && prevc == 0xFF)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2011 data->seen_ffff = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2012 if (prevc == 0
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2013 && (c == '\r' || c == '\n'
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2014 || (c >= 0x20 && c <= 0x7E)))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2015 data->text++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2016 if (c == 0
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2017 && (prevc == '\r' || prevc == '\n'
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2018 || (prevc >= 0x20 && prevc <= 0x7E)))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2019 data->rev_text++;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2020 /* #### 0x2028 is LINE SEPARATOR and 0x2029 is PARAGRAPH SEPARATOR.
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2021 I used to count these in text and rev_text but that is very bad,
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2022 as 0x2028 is also space + left-paren in ASCII, which is extremely
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2023 common. So, what do we do with these? */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2024 if (prevc == 0x20 && (c == 0x28 || c == 0x29))
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2025 data->sep++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2026 if (c == 0x20 && (prevc == 0x28 || prevc == 0x29))
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2027 data->rev_sep++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2028 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2029
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2030 if ((c >= ' ' && c <= '~') || c == '\n' || c == '\r' || c == '\t' ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2031 c == '\f' || c == '\v')
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2032 data->num_ascii++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2033 data->byteno++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2034 data->prev_char = c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2035 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2036
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2037 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2038 int variance_indicates_big_endian =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2039 (data->text >= 10
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2040 && (data->rev_text == 0
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2041 || data->text / data->rev_text >= 10));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2042 int variance_indicates_little_endian =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2043 (data->rev_text >= 10
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2044 && (data->text == 0
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2045 || data->rev_text / data->text >= 10));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2046
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2047 if (data->seen_ffff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2048 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2049 else if (data->seen_forward_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2050 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2051 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2052 if (variance_indicates_big_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2053 DET_RESULT (st, utf_16_bom) = DET_NEAR_CERTAINTY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2054 else if (variance_indicates_little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2055 DET_RESULT (st, utf_16_bom) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2056 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2057 DET_RESULT (st, utf_16_bom) = DET_QUITE_PROBABLE;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2058 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2059 else if (data->seen_forward_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2060 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2061 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2062 if (variance_indicates_big_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2063 DET_RESULT (st, utf_16_bom) = DET_NEAR_CERTAINTY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2064 else if (variance_indicates_little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2065 /* #### may need to rethink */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2066 DET_RESULT (st, utf_16_bom) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2067 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2068 /* #### may need to rethink */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2069 DET_RESULT (st, utf_16_bom) = DET_QUITE_PROBABLE;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2070 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2071 else if (data->seen_rev_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2072 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2073 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2074 if (variance_indicates_little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2075 DET_RESULT (st, utf_16_little_endian_bom) = DET_NEAR_CERTAINTY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2076 else if (variance_indicates_big_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2077 /* #### may need to rethink */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2078 DET_RESULT (st, utf_16_little_endian_bom) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2079 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2080 /* #### may need to rethink */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2081 DET_RESULT (st, utf_16_little_endian_bom) = DET_QUITE_PROBABLE;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2082 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2083 else if (variance_indicates_big_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2084 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2085 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2086 DET_RESULT (st, utf_16) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2087 DET_RESULT (st, utf_16_little_endian) = DET_SOMEWHAT_UNLIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2088 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2089 else if (variance_indicates_little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2090 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2091 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2092 DET_RESULT (st, utf_16) = DET_SOMEWHAT_UNLIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2093 DET_RESULT (st, utf_16_little_endian) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2094 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2095 else
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2096 {
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2097 /* #### FUCKME! There should really be an ASCII detector. This
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2098 would rule out the need to have this built-in here as
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2099 well. --ben */
1292
f3437b56874d [xemacs-hg @ 2003-02-13 09:57:04 by ben]
ben
parents: 1267
diff changeset
2100 int pct_ascii = data->byteno ? (100 * data->num_ascii) / data->byteno
f3437b56874d [xemacs-hg @ 2003-02-13 09:57:04 by ben]
ben
parents: 1267
diff changeset
2101 : 100;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2102
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2103 if (pct_ascii > 90)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2104 SET_DET_RESULTS (st, utf_16, DET_QUITE_IMPROBABLE);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2105 else if (pct_ascii > 75)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2106 SET_DET_RESULTS (st, utf_16, DET_SOMEWHAT_UNLIKELY);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2107 else
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2108 SET_DET_RESULTS (st, utf_16, DET_AS_LIKELY_AS_UNLIKELY);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2109 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2110 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2111 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2112
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2113 struct utf_8_detector
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2114 {
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2115 int byteno;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2116 int first_byte;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2117 int second_byte;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2118 int prev_byte;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2119 int in_utf_8_byte;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2120 int recent_utf_8_sequence;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2121 int seen_bogus_utf8;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2122 int seen_really_bogus_utf8;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2123 int seen_2byte_sequence;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2124 int seen_longer_sequence;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2125 int seen_iso2022_esc;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2126 int seen_iso_shift;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2127 int seen_utf_bom:1;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2128 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2129
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2130 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2131 utf_8_detect (struct detection_state *st, const UExtbyte *src,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2132 Bytecount n)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2133 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2134 struct utf_8_detector *data = DETECTION_STATE_DATA (st, utf_8);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2135
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2136 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2137 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2138 UExtbyte c = *src++;
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2139 switch (data->byteno)
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2140 {
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2141 case 0:
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2142 data->first_byte = c;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2143 break;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2144 case 1:
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2145 data->second_byte = c;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2146 break;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2147 case 2:
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2148 if (data->first_byte == 0xef &&
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2149 data->second_byte == 0xbb &&
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2150 c == 0xbf)
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2151 data->seen_utf_bom = 1;
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2152 break;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2153 }
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2154
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2155 switch (data->in_utf_8_byte)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2156 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2157 case 0:
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2158 if (data->prev_byte == ISO_CODE_ESC && c >= 0x28 && c <= 0x2F)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2159 data->seen_iso2022_esc++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2160 else if (c == ISO_CODE_SI || c == ISO_CODE_SO)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2161 data->seen_iso_shift++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2162 else if (c >= 0xfc)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2163 data->in_utf_8_byte = 5;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2164 else if (c >= 0xf8)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2165 data->in_utf_8_byte = 4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2166 else if (c >= 0xf0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2167 data->in_utf_8_byte = 3;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2168 else if (c >= 0xe0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2169 data->in_utf_8_byte = 2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2170 else if (c >= 0xc0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2171 data->in_utf_8_byte = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2172 else if (c >= 0x80)
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2173 data->seen_bogus_utf8++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2174 if (data->in_utf_8_byte > 0)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2175 data->recent_utf_8_sequence = data->in_utf_8_byte;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2176 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2177 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2178 if ((c & 0xc0) != 0x80)
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2179 data->seen_really_bogus_utf8++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2180 else
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2181 {
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2182 data->in_utf_8_byte--;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2183 if (data->in_utf_8_byte == 0)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2184 {
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2185 if (data->recent_utf_8_sequence == 1)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2186 data->seen_2byte_sequence++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2187 else
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2188 {
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2189 assert (data->recent_utf_8_sequence >= 2);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2190 data->seen_longer_sequence++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2191 }
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2192 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2193 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2194 }
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2195
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2196 data->byteno++;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2197 data->prev_byte = c;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2198 }
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2199
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2200 /* either BOM or no BOM, but not both */
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2201 SET_DET_RESULTS (st, utf_8, DET_NEARLY_IMPOSSIBLE);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2202
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2203
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2204 if (data->seen_utf_bom)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2205 DET_RESULT (st, utf_8_bom) = DET_NEAR_CERTAINTY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2206 else
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2207 {
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2208 if (data->seen_really_bogus_utf8 ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2209 data->seen_bogus_utf8 >= 2)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2210 ; /* bogus */
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2211 else if (data->seen_bogus_utf8)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2212 DET_RESULT (st, utf_8) = DET_SOMEWHAT_UNLIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2213 else if ((data->seen_longer_sequence >= 5 ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2214 data->seen_2byte_sequence >= 10) &&
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2215 (!(data->seen_iso2022_esc + data->seen_iso_shift) ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2216 (data->seen_longer_sequence * 2 + data->seen_2byte_sequence) /
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2217 (data->seen_iso2022_esc + data->seen_iso_shift) >= 10))
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2218 /* heuristics, heuristics, we love heuristics */
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2219 DET_RESULT (st, utf_8) = DET_QUITE_PROBABLE;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2220 else if (data->seen_iso2022_esc ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2221 data->seen_iso_shift >= 3)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2222 DET_RESULT (st, utf_8) = DET_SOMEWHAT_UNLIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2223 else if (data->seen_longer_sequence ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2224 data->seen_2byte_sequence)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2225 DET_RESULT (st, utf_8) = DET_SOMEWHAT_LIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2226 else if (data->seen_iso_shift)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2227 DET_RESULT (st, utf_8) = DET_SOMEWHAT_UNLIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2228 else
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2229 DET_RESULT (st, utf_8) = DET_AS_LIKELY_AS_UNLIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2230 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2231 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2232
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2233 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2234 unicode_init_coding_stream (struct coding_stream *str)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2235 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2236 struct unicode_coding_stream *data =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2237 CODING_STREAM_TYPE_DATA (str, unicode);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2238 xzero (*data);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2239 data->current_charset = Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2240 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2241
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2242 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2243 unicode_rewind_coding_stream (struct coding_stream *str)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2244 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2245 unicode_init_coding_stream (str);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2246 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2247
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2248 static int
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2249 unicode_putprop (Lisp_Object codesys, Lisp_Object key, Lisp_Object value)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2250 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2251 if (EQ (key, Qtype))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2252 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2253 enum unicode_type type;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2254
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2255 if (EQ (value, Qutf_8))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2256 type = UNICODE_UTF_8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2257 else if (EQ (value, Qutf_16))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2258 type = UNICODE_UTF_16;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2259 else if (EQ (value, Qutf_7))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2260 type = UNICODE_UTF_7;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2261 else if (EQ (value, Qucs_4))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2262 type = UNICODE_UCS_4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2263 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2264 invalid_constant ("Invalid Unicode type", key);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2265
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2266 XCODING_SYSTEM_UNICODE_TYPE (codesys) = type;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2267 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2268 else if (EQ (key, Qlittle_endian))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2269 XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (codesys) = !NILP (value);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2270 else if (EQ (key, Qneed_bom))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2271 XCODING_SYSTEM_UNICODE_NEED_BOM (codesys) = !NILP (value);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2272 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2273 return 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2274 return 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2275 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2276
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2277 static Lisp_Object
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2278 unicode_getprop (Lisp_Object coding_system, Lisp_Object prop)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2279 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2280 if (EQ (prop, Qtype))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2281 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2282 switch (XCODING_SYSTEM_UNICODE_TYPE (coding_system))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2283 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2284 case UNICODE_UTF_16: return Qutf_16;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2285 case UNICODE_UTF_8: return Qutf_8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2286 case UNICODE_UTF_7: return Qutf_7;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2287 case UNICODE_UCS_4: return Qucs_4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2288 default: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2289 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2290 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2291 else if (EQ (prop, Qlittle_endian))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2292 return XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (coding_system) ? Qt : Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2293 else if (EQ (prop, Qneed_bom))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2294 return XCODING_SYSTEM_UNICODE_NEED_BOM (coding_system) ? Qt : Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2295 return Qunbound;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2296 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2297
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2298 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2299 unicode_print (Lisp_Object cs, Lisp_Object printcharfun, int escapeflag)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2300 {
800
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
2301 write_fmt_string_lisp (printcharfun, "(%s", 1, unicode_getprop (cs, Qtype));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2302 if (XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (cs))
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
2303 write_c_string (printcharfun, ", little-endian");
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2304 if (XCODING_SYSTEM_UNICODE_NEED_BOM (cs))
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
2305 write_c_string (printcharfun, ", need-bom");
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
2306 write_c_string (printcharfun, ")");
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2307 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2308
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2309 int
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2310 dfc_coding_system_is_unicode (Lisp_Object codesys)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2311 {
1315
70921960b980 [xemacs-hg @ 2003-02-20 08:19:28 by ben]
ben
parents: 1292
diff changeset
2312 #ifdef WIN32_ANY
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2313 codesys = Fget_coding_system (codesys);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2314 return (EQ (XCODING_SYSTEM_TYPE (codesys), Qunicode) &&
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2315 XCODING_SYSTEM_UNICODE_TYPE (codesys) == UNICODE_UTF_16 &&
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2316 XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (codesys));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2317
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2318 #else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2319 return 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2320 #endif
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2321 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2322
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2323
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2324 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2325 /* Initialization */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2326 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2327
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2328 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2329 syms_of_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2330 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2331 #ifdef MULE
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
2332 DEFSUBR (Funicode_precedence_list);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2333 DEFSUBR (Fset_language_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2334 DEFSUBR (Flanguage_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2335 DEFSUBR (Fset_default_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2336 DEFSUBR (Fdefault_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2337 DEFSUBR (Fset_unicode_conversion);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2338
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
2339 DEFSUBR (Fload_unicode_mapping_table);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2340
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2341 DEFSYMBOL (Qignore_first_column);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2342 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2343
800
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
2344 DEFSUBR (Fchar_to_unicode);
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
2345 DEFSUBR (Funicode_to_char);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2346
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2347 DEFSYMBOL (Qunicode);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2348 DEFSYMBOL (Qucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2349 DEFSYMBOL (Qutf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2350 DEFSYMBOL (Qutf_8);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2351 DEFSYMBOL (Qutf_7);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2352
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2353 DEFSYMBOL (Qneed_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2354
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2355 DEFSYMBOL (Qutf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2356 DEFSYMBOL (Qutf_16_little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2357 DEFSYMBOL (Qutf_16_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2358 DEFSYMBOL (Qutf_16_little_endian_bom);
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2359
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2360 DEFSYMBOL (Qutf_8);
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2361 DEFSYMBOL (Qutf_8_bom);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2362 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2363
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2364 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2365 coding_system_type_create_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2366 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2367 INITIALIZE_CODING_SYSTEM_TYPE_WITH_DATA (unicode, "unicode-coding-system-p");
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2368 CODING_SYSTEM_HAS_METHOD (unicode, print);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2369 CODING_SYSTEM_HAS_METHOD (unicode, convert);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2370 CODING_SYSTEM_HAS_METHOD (unicode, init_coding_stream);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2371 CODING_SYSTEM_HAS_METHOD (unicode, rewind_coding_stream);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2372 CODING_SYSTEM_HAS_METHOD (unicode, putprop);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2373 CODING_SYSTEM_HAS_METHOD (unicode, getprop);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2374
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2375 INITIALIZE_DETECTOR (utf_8);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2376 DETECTOR_HAS_METHOD (utf_8, detect);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2377 INITIALIZE_DETECTOR_CATEGORY (utf_8, utf_8);
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2378 INITIALIZE_DETECTOR_CATEGORY (utf_8, utf_8_bom);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2379
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2380 INITIALIZE_DETECTOR (ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2381 DETECTOR_HAS_METHOD (ucs_4, detect);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2382 INITIALIZE_DETECTOR_CATEGORY (ucs_4, ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2383
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2384 INITIALIZE_DETECTOR (utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2385 DETECTOR_HAS_METHOD (utf_16, detect);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2386 INITIALIZE_DETECTOR_CATEGORY (utf_16, utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2387 INITIALIZE_DETECTOR_CATEGORY (utf_16, utf_16_little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2388 INITIALIZE_DETECTOR_CATEGORY (utf_16, utf_16_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2389 INITIALIZE_DETECTOR_CATEGORY (utf_16, utf_16_little_endian_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2390 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2391
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2392 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2393 reinit_coding_system_type_create_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2394 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2395 REINITIALIZE_CODING_SYSTEM_TYPE (unicode);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2396 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2397
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2398 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2399 reinit_vars_of_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2400 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2401 #ifdef MULE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2402 init_blank_unicode_tables ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2403 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2404 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2405
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2406 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2407 vars_of_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2408 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2409 reinit_vars_of_unicode ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2410
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2411 Fprovide (intern ("unicode"));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2412
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2413 #ifdef MULE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2414 staticpro (&Vlanguage_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2415 Vlanguage_unicode_precedence_list = Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2416
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2417 staticpro (&Vdefault_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2418 Vdefault_unicode_precedence_list = Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2419
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2420 unicode_precedence_dynarr = Dynarr_new (Lisp_Object);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2421 dump_add_root_struct_ptr (&unicode_precedence_dynarr,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2422 &lisp_object_dynarr_description);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2423 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2424 }