annotate src/unicode.c @ 1292:f3437b56874d

[xemacs-hg @ 2003-02-13 09:57:04 by ben] profile updates profile.c: Major reworking. Keep track of new information -- total function timing (includes descendants), GC usage, total GC usage (includes descendants). New functions to be called appropriately from eval.c, alloc.c to keep track of this information. Keep track of when we're actually in a function vs. in its profile, for more accurate timing counts. Track profile overhead separately. Create new mechanism for specifying "internal sections" that are tracked just like regular Lisp functions and even appear in the backtrace if `backtrace-with-internal-sections' is non-nil (t by default for error-checking builds). Add some KKCC information for the straight (non-Elisp) hash table used by profile, which contains Lisp objects in its keys -- but not used yet. Remove old ad-hoc methods for tracking garbage collection, redisplay (which was incorrect anyway when Lisp was called within these sections). Don't record any tick info when blocking under MS Windows, since the timer there is in real time rather than in process time. Make `start-profiling', `stop-profiling' interactive. Be consistent wrt. recursive functions and functions currently on the stack when starting or stopping -- together these make implementing the `total' values extremely difficult. When we start profiling, we act as if we just entered all the functions currently on the stack. Likewise when exiting. Create vars in_profile for tracking time spent inside of profiling, and profiling_lock for setting exclusive access to the main hash table when reading from it or modifying it. (protects against getting screwed up by the signal handle going off at the same time. profile.h: New file. Create macros for declaring internal profiling sections. lisp.h: Move profile-related stuff to profile.h. alloc.c: Keep track of total consing, for profile. Tell profile when we are consing. Use new profile-section method for noting garbage-collection. alloc.c: Abort if we attempt to call the allocator reentrantly. backtrace.h, eval.c: Add info for use by profile in the backtrace frame and transfer PUSH_BACKTRACE/POP_BACKTRACE from eval.c, for use with profile. elhash.c: Author comment. eval.c, lisp.h: New Lisp var `backtrace-with-internal-sections'. Set to t when error-checking is on. eval.c: When unwinding, eval.c: Report to profile when we are about-to-call and just-called wrt. a function. alloc.c, eval.c: Allow for "fake" backtrace frames, for internal sections (used by profile and `backtrace-with-internal-sections'. event-Xt.c, event-gtk.c, event-msw.c, event-tty.c: Record when we are actually blocking on an event, for profile's sake. event-stream.c: Record internal profiling sections for getting, dispatching events. extents.c: Record internal profiling sections for map_extents. hash.c, hash.h: Add pregrow_hash_table_if_necessary(). (Used in profile code since the signal handler is the main grower but can't allow a realloc(). We make sure, at critical points, that the table is large enough.) lread.c: Create internal profiling sections for `load' (which may be triggered internally by autoload, etc.). redisplay.c: Remove old profile_redisplay_flag. Use new macros to declare internal profiling section for redisplay. text.c: Use new macros to declare internal profiling sections for char-byte conversion and internal-external conversion. SEMI-UNRELATED CHANGES: ----------------------- text.c: Update the long comments.
author ben
date Thu, 13 Feb 2003 09:57:08 +0000
parents c57f32e44416
children 70921960b980
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1 /* Code to handle Unicode conversion.
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2 Copyright (C) 2000, 2001, 2002, 2003 Ben Wing.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
4 This file is part of XEmacs.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
5
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
6 XEmacs is free software; you can redistribute it and/or modify it
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
7 under the terms of the GNU General Public License as published by the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
8 Free Software Foundation; either version 2, or (at your option) any
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
9 later version.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
10
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
11 XEmacs is distributed in the hope that it will be useful, but WITHOUT
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
12 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
13 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
14 for more details.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
15
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
16 You should have received a copy of the GNU General Public License
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
17 along with XEmacs; see the file COPYING. If not, write to
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
18 the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
19 Boston, MA 02111-1307, USA. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
20
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
21 /* Synched up with: FSF 20.3. Not in FSF. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
22
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
23 /* Authorship:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
24
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
25 Current primary author: Ben Wing <ben@xemacs.org>
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
26
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
27 Written by Ben Wing <ben@xemacs.org>, June, 2001.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
28 Separated out into this file, August, 2001.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
29 Includes Unicode coding systems, some parts of which have been written
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
30 by someone else. #### Morioka and Hayashi, I think.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
31
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
32 As of September 2001, the detection code is here and abstraction of the
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
33 detection system is finished. The unicode detectors have been rewritten
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
34 to include multiple levels of likelihood.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
35 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
36
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
37 #include <config.h>
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
38 #include "lisp.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
39
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
40 #include "charset.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
41 #include "file-coding.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
42 #include "opaque.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
43
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
44 #include "sysfile.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
45
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
46 /* #### WARNING! The current sledgehammer routines have a fundamental
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
47 problem in that they can't handle two characters mapping to a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
48 single Unicode codepoint or vice-versa in a single charset table.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
49 It's not clear there is any way to handle this and still make the
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
50 sledgehammer routines useful.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
51
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
52 Inquiring Minds Want To Know Dept: does the above WARNING mean that
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
53 _if_ it happens, then it will signal error, or then it will do
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
54 something evil and unpredictable? Signaling an error is OK: for
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
55 all national standards, the national to Unicode map is an inclusion
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
56 (1-to-1). Any character set that does not behave that way is
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
57 broken according to the Unicode standard. */
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
58
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
59 /* #define SLEDGEHAMMER_CHECK_UNICODE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
60
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
61 /* We currently use the following format for tables:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
62
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
63 If dimension == 1, to_unicode_table is a 96-element array of ints
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
64 (Unicode code points); else, it's a 96-element array of int *
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
65 pointers, each of which points to a 96-element array of ints. If no
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
66 elements in a row have been filled in, the pointer will point to a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
67 default empty table; that way, memory usage is more reasonable but
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
68 lookup still fast.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
69
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
70 -- If from_unicode_levels == 1, from_unicode_table is a 256-element
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
71 array of shorts (octet 1 in high byte, octet 2 in low byte; we don't
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
72 store Ichars directly to save space).
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
73
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
74 -- If from_unicode_levels == 2, from_unicode_table is a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
75 256-element array of short * pointers, each of which points to a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
76 256-element array of shorts.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
77
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
78 -- If from_unicode_levels == 3, from_unicode_table is a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
79 256-element array of short ** pointers, each of which points to
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
80 a 256-element array of short * pointers, each of which points to
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
81 a 256-element array of shorts.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
82
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
83 -- If from_unicode_levels == 4, same thing but one level deeper.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
84
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
85 Just as for to_unicode_table, we use default tables to fill in
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
86 all entries with no values in them.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
87
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
88 #### An obvious space-saving optimization is to use variable-sized
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
89 tables, where each table instead of just being a 256-element array,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
90 is a structure with a start value, an end value, and a variable
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
91 number of entries (END - START + 1). Only 8 bits are needed for
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
92 END and START, and could be stored at the end to avoid alignment
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
93 problems. However, before charging off and implementing this,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
94 we need to consider whether it's worth it:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
95
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
96 (1) Most tables will be highly localized in which code points are
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
97 defined, heavily reducing the possible memory waste. Before
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
98 doing any rewriting, write some code to see how much memory is
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
99 actually being wasted (i.e. ratio of empty entries to total # of
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
100 entries) and only start rewriting if it's unacceptably high. You
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
101 have to check over all charsets.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
102
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
103 (2) Since entries are usually added one at a time, you have to be
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
104 very careful when creating the tables to avoid realloc()/free()
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
105 thrashing in the common case when you are in an area of high
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
106 localization and are going to end up using most entries in the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
107 table. You'd certainly want to allow only certain sizes, not
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
108 arbitrary ones (probably powers of 2, where you want the entire
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
109 block including the START/END values to fit into a power of 2,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
110 minus any malloc overhead if there is any -- there's none under
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
111 gmalloc.c, and probably most system malloc() functions are quite
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
112 smart nowadays and also have no overhead). You could optimize
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
113 somewhat during the in-C initializations, because you can compute
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
114 the actual usage of various tables by scanning the entries you're
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
115 going to add in a separate pass before adding them. (You could
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
116 actually do the same thing when entries are added on the Lisp
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
117 level by making the assumption that all the entries will come in
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
118 one after another before any use is made of the data. So as
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
119 they're coming in, you just store them in a big long list, and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
120 the first time you need to retrieve an entry, you compute the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
121 whole table at once.) You'd still have to deal with the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
122 possibility of later entries coming in, though.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
123
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
124 (3) You do lose some speed using START/END values, since you need
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
125 a couple of comparisons at each level. This could easily make
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
126 each single lookup become 3-4 times slower. The Unicode book
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
127 considers this a big issue, and recommends against variable-sized
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
128 tables for this reason; however, they almost certainly have in
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
129 mind applications that primarily involve conversion of large
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
130 amounts of data. Most Unicode strings that are translated in
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
131 XEmacs are fairly small. The only place where this might matter
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
132 is in loading large files -- e.g. a 3-megabyte Unicode-encoded
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
133 file. So think about this, and maybe do a trial implementation
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
134 where you don't worry too much about the intricacies of (2) and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
135 just implement some basic "multiply by 1.5" trick or something to
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
136 do the resizing. There is a very good FAQ on Unicode called
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
137 something like the Linux-Unicode How-To (it should be part of the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
138 Linux How-To's, I think), that lists the url of a guy with a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
139 whole bunch of unicode files you can use to stress-test your
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
140 implementations, and he's highly likely to have a good
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
141 multi-megabyte Unicode-encoded file (with normal text in it -- if
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
142 you created your own just by creating repeated strings of letters
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
143 and numbers, you probably wouldn't get accurate results).
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
144 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
145
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
146 /* When MULE is not defined, we may still need some Unicode support --
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
147 in particular, some Windows API's always want Unicode, and the way
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
148 we've set up the Unicode encapsulation, we may as well go ahead and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
149 always use the Unicode versions of split API's. (It would be
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
150 trickier to not use them, and pointless -- under NT, the ANSI API's
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
151 call the Unicode ones anyway, so in the case of structures, we'd be
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
152 converting from Unicode to ANSI structures, only to have the OS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
153 convert them back.) */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
154
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
155 Lisp_Object Qunicode;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
156 Lisp_Object Qutf_16, Qutf_8, Qucs_4, Qutf_7;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
157 Lisp_Object Qneed_bom;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
158
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
159 Lisp_Object Qutf_16_little_endian, Qutf_16_bom;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
160 Lisp_Object Qutf_16_little_endian_bom;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
161
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
162 Lisp_Object Qutf_8_bom;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
163
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
164 #ifdef MULE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
165
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
166 /* #### Using ints for to_unicode is OK (as long as they are >= 32 bits).
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
167 However, shouldn't the shorts below be unsigned? */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
168 static int *to_unicode_blank_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
169 static int **to_unicode_blank_2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
170
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
171 static short *from_unicode_blank_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
172 static short **from_unicode_blank_2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
173 static short ***from_unicode_blank_3;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
174 static short ****from_unicode_blank_4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
175
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
176 static const struct memory_description to_unicode_level_0_desc_1[] = {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
177 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
178 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
179
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
180 static const struct sized_memory_description to_unicode_level_0_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
181 sizeof (int), to_unicode_level_0_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
182 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
183
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
184 static const struct memory_description to_unicode_level_1_desc_1[] = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
185 { XD_STRUCT_PTR, 0, 96, &to_unicode_level_0_desc },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
186 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
187 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
188
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
189 static const struct sized_memory_description to_unicode_level_1_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
190 sizeof (void *), to_unicode_level_1_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
191 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
192
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
193 static const struct memory_description to_unicode_description_1[] = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
194 { XD_STRUCT_PTR, 1, 96, &to_unicode_level_0_desc },
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
195 { XD_STRUCT_PTR, 2, 96, &to_unicode_level_1_desc },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
196 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
197 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
198
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
199 /* Not static because each charset has a set of to and from tables and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
200 needs to describe them to pdump. */
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
201 const struct sized_memory_description to_unicode_description = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
202 sizeof (void *), to_unicode_description_1
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
203 };
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
205 static const struct memory_description from_unicode_level_0_desc_1[] = {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
206 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
207 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
208
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
209 static const struct sized_memory_description from_unicode_level_0_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
210 sizeof (short), from_unicode_level_0_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
211 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
212
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
213 static const struct memory_description from_unicode_level_1_desc_1[] = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
214 { XD_STRUCT_PTR, 0, 256, &from_unicode_level_0_desc },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
215 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
216 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
217
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
218 static const struct sized_memory_description from_unicode_level_1_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
219 sizeof (void *), from_unicode_level_1_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
220 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
221
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
222 static const struct memory_description from_unicode_level_2_desc_1[] = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
223 { XD_STRUCT_PTR, 0, 256, &from_unicode_level_1_desc },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
224 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
225 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
226
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
227 static const struct sized_memory_description from_unicode_level_2_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
228 sizeof (void *), from_unicode_level_2_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
229 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
230
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
231 static const struct memory_description from_unicode_level_3_desc_1[] = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
232 { XD_STRUCT_PTR, 0, 256, &from_unicode_level_2_desc },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
233 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
234 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
235
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
236 static const struct sized_memory_description from_unicode_level_3_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
237 sizeof (void *), from_unicode_level_3_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
238 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
239
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
240 static const struct memory_description from_unicode_description_1[] = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
241 { XD_STRUCT_PTR, 1, 256, &from_unicode_level_0_desc },
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
242 { XD_STRUCT_PTR, 2, 256, &from_unicode_level_1_desc },
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
243 { XD_STRUCT_PTR, 3, 256, &from_unicode_level_2_desc },
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
244 { XD_STRUCT_PTR, 4, 256, &from_unicode_level_3_desc },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
245 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
246 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
247
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
248 /* Not static because each charset has a set of to and from tables and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
249 needs to describe them to pdump. */
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
250 const struct sized_memory_description from_unicode_description = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
251 sizeof (void *), from_unicode_description_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
252 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
253
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
254 static Lisp_Object_dynarr *unicode_precedence_dynarr;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
255
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
256 static const struct memory_description lod_description_1[] = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
257 XD_DYNARR_DESC (Lisp_Object_dynarr, &lisp_object_description),
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
258 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
259 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
260
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
261 static const struct sized_memory_description lisp_object_dynarr_description = {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
262 sizeof (Lisp_Object_dynarr),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
263 lod_description_1
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
264 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
265
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
266 Lisp_Object Vlanguage_unicode_precedence_list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
267 Lisp_Object Vdefault_unicode_precedence_list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
268
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
269 Lisp_Object Qignore_first_column;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
270
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
271
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
272 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
273 /* Unicode implementation */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
274 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
275
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
276 #define BREAKUP_UNICODE_CODE(val, u1, u2, u3, u4, levels) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
277 do { \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
278 int buc_val = (val); \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
279 \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
280 (u1) = buc_val >> 24; \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
281 (u2) = (buc_val >> 16) & 255; \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
282 (u3) = (buc_val >> 8) & 255; \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
283 (u4) = buc_val & 255; \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
284 (levels) = (buc_val <= 0xFF ? 1 : \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
285 buc_val <= 0xFFFF ? 2 : \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
286 buc_val <= 0xFFFFFF ? 3 : \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
287 4); \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
288 } while (0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
289
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
290 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
291 init_blank_unicode_tables (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
292 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
293 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
294
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
295 from_unicode_blank_1 = xnew_array (short, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
296 from_unicode_blank_2 = xnew_array (short *, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
297 from_unicode_blank_3 = xnew_array (short **, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
298 from_unicode_blank_4 = xnew_array (short ***, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
299 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
300 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
301 /* #### IMWTK: Why does using -1 here work? Simply because there are
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
302 no existing 96x96 charsets? */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
303 from_unicode_blank_1[i] = (short) -1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
304 from_unicode_blank_2[i] = from_unicode_blank_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
305 from_unicode_blank_3[i] = from_unicode_blank_2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
306 from_unicode_blank_4[i] = from_unicode_blank_3;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
307 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
308
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
309 to_unicode_blank_1 = xnew_array (int, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
310 to_unicode_blank_2 = xnew_array (int *, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
311 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
312 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
313 /* Here -1 is guaranteed OK. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
314 to_unicode_blank_1[i] = -1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
315 to_unicode_blank_2[i] = to_unicode_blank_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
316 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
317 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
318
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
319 static void *
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
320 create_new_from_unicode_table (int level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
321 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
322 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
323 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
324 /* WARNING: If you are thinking of compressing these, keep in
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
325 mind that sizeof (short) does not equal sizeof (short *). */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
326 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
327 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
328 short *newtab = xnew_array (short, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
329 memcpy (newtab, from_unicode_blank_1, 256 * sizeof (short));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
330 return newtab;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
331 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
332 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
333 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
334 short **newtab = xnew_array (short *, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
335 memcpy (newtab, from_unicode_blank_2, 256 * sizeof (short *));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
336 return newtab;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
337 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
338 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
339 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
340 short ***newtab = xnew_array (short **, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
341 memcpy (newtab, from_unicode_blank_3, 256 * sizeof (short **));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
342 return newtab;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
343 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
344 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
345 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
346 short ****newtab = xnew_array (short ***, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
347 memcpy (newtab, from_unicode_blank_4, 256 * sizeof (short ***));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
348 return newtab;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
349 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
350 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
351 abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
352 return 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
353 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
354 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
355
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
356 /* Allocate and blank the tables.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
357 Loading them up is done by parse-unicode-translation-table. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
358 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
359 init_charset_unicode_tables (Lisp_Object charset)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
360 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
361 if (XCHARSET_DIMENSION (charset) == 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
362 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
363 int *to_table = xnew_array (int, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
364 memcpy (to_table, to_unicode_blank_1, 96 * sizeof (int));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
365 XCHARSET_TO_UNICODE_TABLE (charset) = to_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
366 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
367 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
368 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
369 int **to_table = xnew_array (int *, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
370 memcpy (to_table, to_unicode_blank_2, 96 * sizeof (int *));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
371 XCHARSET_TO_UNICODE_TABLE (charset) = to_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
372 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
373
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
374 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
375 XCHARSET_FROM_UNICODE_TABLE (charset) = create_new_from_unicode_table (1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
376 XCHARSET_FROM_UNICODE_LEVELS (charset) = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
377 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
378 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
379
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
380 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
381 free_from_unicode_table (void *table, int level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
382 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
383 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
384
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
385 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
386 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
387 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
388 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
389 short **tab = (short **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
390 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
391 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
392 if (tab[i] != from_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
393 free_from_unicode_table (tab[i], 1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
394 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
395 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
396 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
397 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
398 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
399 short ***tab = (short ***) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
400 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
401 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
402 if (tab[i] != from_unicode_blank_2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
403 free_from_unicode_table (tab[i], 2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
404 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
405 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
406 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
407 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
408 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
409 short ****tab = (short ****) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
410 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
411 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
412 if (tab[i] != from_unicode_blank_3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
413 free_from_unicode_table (tab[i], 3);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
414 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
415 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
416 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
417 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
418
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
419 xfree (table);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
420 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
421
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
422 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
423 free_to_unicode_table (void *table, int level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
424 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
425 if (level == 2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
426 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
427 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
428 int **tab = (int **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
429
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
430 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
431 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
432 if (tab[i] != to_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
433 free_to_unicode_table (tab[i], 1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
434 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
435 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
436
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
437 xfree (table);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
438 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
439
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
440 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
441 free_charset_unicode_tables (Lisp_Object charset)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
442 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
443 free_to_unicode_table (XCHARSET_TO_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
444 XCHARSET_DIMENSION (charset));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
445 free_from_unicode_table (XCHARSET_FROM_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
446 XCHARSET_FROM_UNICODE_LEVELS (charset));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
447 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
448
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
449 #ifdef MEMORY_USAGE_STATS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
450
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
451 static Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
452 compute_from_unicode_table_size_1 (void *table, int level,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
453 struct overhead_stats *stats)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
454 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
455 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
456 Bytecount size = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
457
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
458 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
459 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
460 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
461 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
462 short **tab = (short **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
463 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
464 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
465 if (tab[i] != from_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
466 size += compute_from_unicode_table_size_1 (tab[i], 1, stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
467 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
468 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
469 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
470 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
471 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
472 short ***tab = (short ***) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
473 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
474 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
475 if (tab[i] != from_unicode_blank_2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
476 size += compute_from_unicode_table_size_1 (tab[i], 2, stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
477 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
478 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
479 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
480 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
481 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
482 short ****tab = (short ****) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
483 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
484 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
485 if (tab[i] != from_unicode_blank_3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
486 size += compute_from_unicode_table_size_1 (tab[i], 3, stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
487 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
488 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
489 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
490 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
491
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
492 size += malloced_storage_size (table,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
493 256 * (level == 1 ? sizeof (short) :
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
494 sizeof (void *)),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
495 stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
496 return size;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
497 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
498
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
499 static Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
500 compute_to_unicode_table_size_1 (void *table, int level,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
501 struct overhead_stats *stats)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
502 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
503 Bytecount size = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
504
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
505 if (level == 2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
506 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
507 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
508 int **tab = (int **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
509
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
510 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
511 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
512 if (tab[i] != to_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
513 size += compute_to_unicode_table_size_1 (tab[i], 1, stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
514 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
515 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
516
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
517 size += malloced_storage_size (table,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
518 96 * (level == 1 ? sizeof (int) :
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
519 sizeof (void *)),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
520 stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
521 return size;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
522 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
523
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
524 Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
525 compute_from_unicode_table_size (Lisp_Object charset,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
526 struct overhead_stats *stats)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
527 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
528 return (compute_from_unicode_table_size_1
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
529 (XCHARSET_FROM_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
530 XCHARSET_FROM_UNICODE_LEVELS (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
531 stats));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
532 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
533
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
534 Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
535 compute_to_unicode_table_size (Lisp_Object charset,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
536 struct overhead_stats *stats)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
537 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
538 return (compute_to_unicode_table_size_1
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
539 (XCHARSET_TO_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
540 XCHARSET_DIMENSION (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
541 stats));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
542 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
543
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
544 #endif
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
545
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
546 #ifdef SLEDGEHAMMER_CHECK_UNICODE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
547
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
548 /* "Sledgehammer checks" are checks that verify the self-consistency
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
549 of an entire structure every time a change is about to be made or
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
550 has been made to the structure. Not fast but a pretty much
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
551 sure-fire way of flushing out any incorrectnesses in the algorithms
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
552 that create the structure.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
553
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
554 Checking only after a change has been made will speed things up by
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
555 a factor of 2, but it doesn't absolutely prove that the code just
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
556 checked caused the problem; perhaps it happened elsewhere, either
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
557 in some code you forgot to sledgehammer check or as a result of
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
558 data corruption. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
559
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
560 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
561 assert_not_any_blank_table (void *tab)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
562 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
563 assert (tab != from_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
564 assert (tab != from_unicode_blank_2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
565 assert (tab != from_unicode_blank_3);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
566 assert (tab != from_unicode_blank_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
567 assert (tab != to_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
568 assert (tab != to_unicode_blank_2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
569 assert (tab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
570 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
571
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
572 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
573 sledgehammer_check_from_table (Lisp_Object charset, void *table, int level,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
574 int codetop)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
575 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
576 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
577
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
578 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
579 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
580 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
581 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
582 short *tab = (short *) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
583 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
584 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
585 if (tab[i] != -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
586 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
587 Lisp_Object char_charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
588 int c1, c2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
589
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
590 assert (valid_ichar_p (tab[i]));
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
591 BREAKUP_ICHAR (tab[i], char_charset, c1, c2);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
592 assert (EQ (charset, char_charset));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
593 if (XCHARSET_DIMENSION (charset) == 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
594 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
595 int *to_table =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
596 (int *) XCHARSET_TO_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
597 assert_not_any_blank_table (to_table);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
598 assert (to_table[c1 - 32] == (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
599 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
600 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
601 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
602 int **to_table =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
603 (int **) XCHARSET_TO_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
604 assert_not_any_blank_table (to_table);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
605 assert_not_any_blank_table (to_table[c1 - 32]);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
606 assert (to_table[c1 - 32][c2 - 32] == (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
607 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
608 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
609 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
610 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
611 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
612 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
613 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
614 short **tab = (short **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
615 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
616 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
617 if (tab[i] != from_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
618 sledgehammer_check_from_table (charset, tab[i], 1,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
619 (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
620 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
621 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
622 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
623 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
624 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
625 short ***tab = (short ***) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
626 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
627 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
628 if (tab[i] != from_unicode_blank_2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
629 sledgehammer_check_from_table (charset, tab[i], 2,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
630 (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
631 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
632 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
633 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
634 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
635 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
636 short ****tab = (short ****) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
637 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
638 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
639 if (tab[i] != from_unicode_blank_3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
640 sledgehammer_check_from_table (charset, tab[i], 3,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
641 (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
642 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
643 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
644 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
645 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
646 abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
647 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
648 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
649
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
650 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
651 sledgehammer_check_to_table (Lisp_Object charset, void *table, int level,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
652 int codetop)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
653 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
654 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
655
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
656 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
657 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
658 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
659 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
660 int *tab = (int *) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
661
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
662 if (XCHARSET_CHARS (charset) == 94)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
663 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
664 assert (tab[0] == -1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
665 assert (tab[95] == -1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
666 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
667
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
668 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
669 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
670 if (tab[i] != -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
671 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
672 int u4, u3, u2, u1, levels;
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
673 Ichar ch;
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
674 Ichar this_ch;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
675 short val;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
676 void *frtab = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
677
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
678 if (XCHARSET_DIMENSION (charset) == 1)
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
679 this_ch = make_ichar (charset, i + 32, 0);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
680 else
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
681 this_ch = make_ichar (charset, codetop + 32, i + 32);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
682
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
683 assert (tab[i] >= 0);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
684 BREAKUP_UNICODE_CODE (tab[i], u4, u3, u2, u1, levels);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
685 assert (levels <= XCHARSET_FROM_UNICODE_LEVELS (charset));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
686
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
687 switch (XCHARSET_FROM_UNICODE_LEVELS (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
688 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
689 case 1: val = ((short *) frtab)[u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
690 case 2: val = ((short **) frtab)[u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
691 case 3: val = ((short ***) frtab)[u3][u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
692 case 4: val = ((short ****) frtab)[u4][u3][u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
693 default: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
694 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
695
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
696 ch = make_ichar (charset, val >> 8, val & 0xFF);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
697 assert (ch == this_ch);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
698
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
699 switch (XCHARSET_FROM_UNICODE_LEVELS (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
700 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
701 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
702 assert_not_any_blank_table (frtab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
703 frtab = ((short ****) frtab)[u4];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
704 /* fall through */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
705 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
706 assert_not_any_blank_table (frtab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
707 frtab = ((short ***) frtab)[u3];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
708 /* fall through */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
709 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
710 assert_not_any_blank_table (frtab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
711 frtab = ((short **) frtab)[u2];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
712 /* fall through */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
713 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
714 assert_not_any_blank_table (frtab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
715 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
716 default: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
717 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
718 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
719 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
720 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
721 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
722 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
723 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
724 int **tab = (int **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
725
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
726 if (XCHARSET_CHARS (charset) == 94)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
727 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
728 assert (tab[0] == to_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
729 assert (tab[95] == to_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
730 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
731
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
732 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
733 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
734 if (tab[i] != to_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
735 sledgehammer_check_to_table (charset, tab[i], 1, i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
736 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
737 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
738 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
739 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
740 abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
741 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
742 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
743
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
744 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
745 sledgehammer_check_unicode_tables (Lisp_Object charset)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
746 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
747 /* verify that the blank tables have not been modified */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
748 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
749 int from_level = XCHARSET_FROM_UNICODE_LEVELS (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
750 int to_level = XCHARSET_FROM_UNICODE_LEVELS (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
751
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
752 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
753 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
754 assert (from_unicode_blank_1[i] == (short) -1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
755 assert (from_unicode_blank_2[i] == from_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
756 assert (from_unicode_blank_3[i] == from_unicode_blank_2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
757 assert (from_unicode_blank_4[i] == from_unicode_blank_3);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
758 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
759
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
760 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
761 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
762 assert (to_unicode_blank_1[i] == -1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
763 assert (to_unicode_blank_2[i] == to_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
764 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
765
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
766 assert (from_level >= 1 && from_level <= 4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
767
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
768 sledgehammer_check_from_table (charset,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
769 XCHARSET_FROM_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
770 from_level, 0);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
772 sledgehammer_check_to_table (charset,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
773 XCHARSET_TO_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
774 XCHARSET_DIMENSION (charset), 0);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
775 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
776
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
777 #endif /* SLEDGEHAMMER_CHECK_UNICODE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
778
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
779 static void
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
780 set_unicode_conversion (Ichar chr, int code)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
781 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
782 Lisp_Object charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
783 int c1, c2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
784
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
785 BREAKUP_ICHAR (chr, charset, c1, c2);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
786
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
787 /* I tried an assert on code > 255 || chr == code, but that fails because
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
788 Mule gives many Latin characters separate code points for different
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
789 ISO 8859 coded character sets. Obvious in hindsight.... */
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
790 assert (!EQ (charset, Vcharset_ascii) || chr == code);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
791 assert (!EQ (charset, Vcharset_latin_iso8859_1) || chr == code);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
792 assert (!EQ (charset, Vcharset_control_1) || chr == code);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
793
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
794 /* This assert is needed because it is simply unimplemented. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
795 assert (!EQ (charset, Vcharset_composite));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
796
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
797 #ifdef SLEDGEHAMMER_CHECK_UNICODE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
798 sledgehammer_check_unicode_tables (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
799 #endif
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
800
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
801 /* First, the char -> unicode translation */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
802
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
803 if (XCHARSET_DIMENSION (charset) == 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
804 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
805 int *to_table = (int *) XCHARSET_TO_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
806 to_table[c1 - 32] = code;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
807 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
808 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
809 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
810 int **to_table_2 = (int **) XCHARSET_TO_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
811 int *to_table_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
812
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
813 assert (XCHARSET_DIMENSION (charset) == 2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
814 to_table_1 = to_table_2[c1 - 32];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
815 if (to_table_1 == to_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
816 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
817 to_table_1 = xnew_array (int, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
818 memcpy (to_table_1, to_unicode_blank_1, 96 * sizeof (int));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
819 to_table_2[c1 - 32] = to_table_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
820 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
821 to_table_1[c2 - 32] = code;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
822 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
823
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
824 /* Then, unicode -> char: much harder */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
825
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
826 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
827 int charset_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
828 int u4, u3, u2, u1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
829 int code_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
830 BREAKUP_UNICODE_CODE (code, u4, u3, u2, u1, code_levels);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
831
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
832 charset_levels = XCHARSET_FROM_UNICODE_LEVELS (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
833
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
834 /* Make sure the charset's tables have at least as many levels as
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
835 the code point has: Note that the charset is guaranteed to have
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
836 at least one level, because it was created that way */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
837 if (charset_levels < code_levels)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
838 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
839 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
840
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
841 assert (charset_levels > 0);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
842 for (i = 2; i <= code_levels; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
843 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
844 if (charset_levels < i)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
845 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
846 void *old_table = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
847 void *table = create_new_from_unicode_table (i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
848 XCHARSET_FROM_UNICODE_TABLE (charset) = table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
849
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
850 switch (i)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
851 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
852 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
853 ((short **) table)[0] = (short *) old_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
854 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
855 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
856 ((short ***) table)[0] = (short **) old_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
857 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
858 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
859 ((short ****) table)[0] = (short ***) old_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
860 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
861 default: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
862 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
863 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
864 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
865
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
866 charset_levels = code_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
867 XCHARSET_FROM_UNICODE_LEVELS (charset) = code_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
868 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
869
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
870 /* Now, make sure there is a non-default table at each level */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
871 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
872 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
873 void *table = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
874
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
875 for (i = charset_levels; i >= 2; i--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
876 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
877 switch (i)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
878 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
879 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
880 if (((short ****) table)[u4] == from_unicode_blank_3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
881 ((short ****) table)[u4] =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
882 ((short ***) create_new_from_unicode_table (3));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
883 table = ((short ****) table)[u4];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
884 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
885 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
886 if (((short ***) table)[u3] == from_unicode_blank_2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
887 ((short ***) table)[u3] =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
888 ((short **) create_new_from_unicode_table (2));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
889 table = ((short ***) table)[u3];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
890 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
891 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
892 if (((short **) table)[u2] == from_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
893 ((short **) table)[u2] =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
894 ((short *) create_new_from_unicode_table (1));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
895 table = ((short **) table)[u2];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
896 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
897 default: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
898 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
899 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
900 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
901
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
902 /* Finally, set the character */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
903
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
904 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
905 void *table = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
906 switch (charset_levels)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
907 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
908 case 1: ((short *) table)[u1] = (c1 << 8) + c2; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
909 case 2: ((short **) table)[u2][u1] = (c1 << 8) + c2; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
910 case 3: ((short ***) table)[u3][u2][u1] = (c1 << 8) + c2; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
911 case 4: ((short ****) table)[u4][u3][u2][u1] = (c1 << 8) + c2; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
912 default: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
913 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
914 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
915 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
916
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
917 #ifdef SLEDGEHAMMER_CHECK_UNICODE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
918 sledgehammer_check_unicode_tables (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
919 #endif
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
920 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
921
788
026c5bf9c134 [xemacs-hg @ 2002-03-21 07:29:57 by ben]
ben
parents: 778
diff changeset
922 int
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
923 ichar_to_unicode (Ichar chr)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
924 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
925 Lisp_Object charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
926 int c1, c2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
927
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
928 type_checking_assert (valid_ichar_p (chr));
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
929 /* This shortcut depends on the representation of an Ichar, see text.c. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
930 if (chr < 256)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
931 return (int) chr;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
932
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
933 BREAKUP_ICHAR (chr, charset, c1, c2);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
934 if (EQ (charset, Vcharset_composite))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
935 return -1; /* #### don't know how to handle */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
936 else if (XCHARSET_DIMENSION (charset) == 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
937 return ((int *) XCHARSET_TO_UNICODE_TABLE (charset))[c1 - 32];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
938 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
939 return ((int **) XCHARSET_TO_UNICODE_TABLE (charset))[c1 - 32][c2 - 32];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
940 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
941
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
942 static Ichar
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
943 unicode_to_ichar (int code, Lisp_Object_dynarr *charsets)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
944 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
945 int u1, u2, u3, u4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
946 int code_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
947 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
948 int n = Dynarr_length (charsets);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
949
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
950 type_checking_assert (code >= 0);
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
951 /* This shortcut depends on the representation of an Ichar, see text.c.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
952 Note that it may _not_ be extended to U+00A0 to U+00FF (many ISO 8859
893
c9f067fd71a3 [xemacs-hg @ 2002-07-02 12:32:34 by stephent]
stephent
parents: 877
diff changeset
953 coded character sets have points that map into that region, so this
c9f067fd71a3 [xemacs-hg @ 2002-07-02 12:32:34 by stephent]
stephent
parents: 877
diff changeset
954 function is many-valued). */
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
955 if (code < 0xA0)
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
956 return (Ichar) code;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
957
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
958 BREAKUP_UNICODE_CODE (code, u4, u3, u2, u1, code_levels);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
959
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
960 for (i = 0; i < n; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
961 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
962 Lisp_Object charset = Dynarr_at (charsets, i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
963 int charset_levels = XCHARSET_FROM_UNICODE_LEVELS (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
964 if (charset_levels >= code_levels)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
965 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
966 void *table = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
967 short retval;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
968
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
969 switch (charset_levels)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
970 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
971 case 1: retval = ((short *) table)[u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
972 case 2: retval = ((short **) table)[u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
973 case 3: retval = ((short ***) table)[u3][u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
974 case 4: retval = ((short ****) table)[u4][u3][u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
975 default: abort (); retval = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
976 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
977
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
978 if (retval != -1)
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
979 return make_ichar (charset, retval >> 8, retval & 0xFF);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
980 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
981 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
982
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
983 return (Ichar) -1;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
984 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
985
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
986 /* Add charsets to precedence list.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
987 LIST must be a list of charsets. Charsets which are in the list more
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
988 than once are given the precedence implied by their earliest appearance.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
989 Later appearances are ignored. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
990 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
991 add_charsets_to_precedence_list (Lisp_Object list, int *lbs,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
992 Lisp_Object_dynarr *dynarr)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
993 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
994 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
995 EXTERNAL_LIST_LOOP_2 (elt, list)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
996 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
997 Lisp_Object charset = Fget_charset (elt);
778
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
998 int lb = XCHARSET_LEADING_BYTE (charset);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
999 if (lbs[lb - MIN_LEADING_BYTE] == 0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1000 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1001 Dynarr_add (dynarr, charset);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1002 lbs[lb - MIN_LEADING_BYTE] = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1003 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1004 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1005 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1006 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1007
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1008 /* Rebuild the charset precedence array.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1009 The "charsets preferred for the current language" get highest precedence,
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1010 followed by the "charsets preferred by default", ordered as in
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1011 Vlanguage_unicode_precedence_list and Vdefault_unicode_precedence_list,
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1012 respectively. All remaining charsets follow in an arbitrary order. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1013 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1014 recalculate_unicode_precedence (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1015 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1016 int lbs[NUM_LEADING_BYTES];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1017 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1018
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1019 for (i = 0; i < NUM_LEADING_BYTES; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1020 lbs[i] = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1021
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1022 Dynarr_reset (unicode_precedence_dynarr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1023
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1024 add_charsets_to_precedence_list (Vlanguage_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1025 lbs, unicode_precedence_dynarr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1026 add_charsets_to_precedence_list (Vdefault_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1027 lbs, unicode_precedence_dynarr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1028
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1029 for (i = 0; i < NUM_LEADING_BYTES; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1030 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1031 if (lbs[i] == 0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1032 {
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
1033 Lisp_Object charset = charset_by_leading_byte (i + MIN_LEADING_BYTE);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1034 if (!NILP (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1035 Dynarr_add (unicode_precedence_dynarr, charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1036 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1037 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1038 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1039
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1040 DEFUN ("unicode-precedence-list",
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1041 Funicode_precedence_list,
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1042 0, 0, 0, /*
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1043 Return the precedence order among charsets used for Unicode decoding.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1044
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1045 Value is a list of charsets, which are searched in order for a translation
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1046 matching a given Unicode character.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1047
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1048 The highest precedence is given to the language-specific precedence list of
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1049 charsets, defined by `set-language-unicode-precedence-list'. These are
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1050 followed by charsets in the default precedence list, defined by
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1051 `set-default-unicode-precedence-list'. Charsets occurring multiple times are
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1052 given precedence according to their first occurrance in either list. These
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1053 are followed by the remaining charsets, in some arbitrary order.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1054
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1055 The language-specific precedence list is meant to be set as part of the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1056 language environment initialization; the default precedence list is meant
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1057 to be set by the user.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1058 */
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1059 ())
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1060 {
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1061 int i;
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1062 Lisp_Object list = Qnil;
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1063
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1064 for (i = Dynarr_length (unicode_precedence_dynarr) - 1; i >= 0; i--)
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1065 list = Fcons (Dynarr_at (unicode_precedence_dynarr, i), list);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1066 return list;
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1067 }
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1068
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1069
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1070 /* #### This interface is wrong. Cyrillic users and Chinese users are going
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1071 to have varying opinions about whether ISO Cyrillic, KOI8-R, or Windows
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1072 1251 should take precedence, and whether Big Five or CNS should take
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1073 precedence, respectively. This means that users are sometimes going to
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1074 want to set Vlanguage_unicode_precedence_list.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1075 Furthermore, this should be language-local (buffer-local would be a
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1076 reasonable approximation). */
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1077 DEFUN ("set-language-unicode-precedence-list",
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1078 Fset_language_unicode_precedence_list,
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1079 1, 1, 0, /*
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1080 Set the language-specific precedence of charsets in Unicode decoding.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1081 LIST is a list of charsets.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1082 See `unicode-precedence-list' for more information.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1083 */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1084 (list))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1085 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1086 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1087 EXTERNAL_LIST_LOOP_2 (elt, list)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1088 Fget_charset (elt);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1089 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1090
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1091 Vlanguage_unicode_precedence_list = list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1092 recalculate_unicode_precedence ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1093 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1094 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1095
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1096 DEFUN ("language-unicode-precedence-list",
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1097 Flanguage_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1098 0, 0, 0, /*
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1099 Return the language-specific precedence list used for Unicode decoding.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1100 See `unicode-precedence-list' for more information.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1101 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1102 ())
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1103 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1104 return Vlanguage_unicode_precedence_list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1105 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1106
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1107 DEFUN ("set-default-unicode-precedence-list",
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1108 Fset_default_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1109 1, 1, 0, /*
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1110 Set the default precedence list used for Unicode decoding.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1111 This is intended to be set by the user. See
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1112 `unicode-precedence-list' for more information.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1113 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1114 (list))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1115 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1116 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1117 EXTERNAL_LIST_LOOP_2 (elt, list)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1118 Fget_charset (elt);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1119 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1120
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1121 Vdefault_unicode_precedence_list = list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1122 recalculate_unicode_precedence ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1123 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1124 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1125
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1126 DEFUN ("default-unicode-precedence-list",
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1127 Fdefault_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1128 0, 0, 0, /*
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1129 Return the default precedence list used for Unicode decoding.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1130 See `unicode-precedence-list' for more information.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1131 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1132 ())
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1133 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1134 return Vdefault_unicode_precedence_list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1135 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1136
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1137 DEFUN ("set-unicode-conversion", Fset_unicode_conversion,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1138 2, 2, 0, /*
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1139 Add conversion information between Unicode codepoints and characters.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1140 Conversions for U+0000 to U+00FF are hardwired to ASCII, Control-1, and
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1141 Latin-1. Attempts to set these values will raise an error.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1142
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1143 CHARACTER is one of the following:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1144
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1145 -- A character (in which case CODE must be a non-negative integer; values
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1146 above 2^20 - 1 are allowed for the purpose of specifying private
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1147 characters, but are illegal in standard Unicode---they will cause errors
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1148 when converted to utf-16)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1149 -- A vector of characters (in which case CODE must be a vector of integers
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1150 of the same length)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1151 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1152 (character, code))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1153 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1154 Lisp_Object charset;
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1155 int ichar, unicode;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1156
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1157 CHECK_CHAR (character);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1158 CHECK_NATNUM (code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1159
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1160 unicode = XINT (code);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1161 ichar = XCHAR (character);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1162 charset = ichar_charset (ichar);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1163
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1164 /* The translations of ASCII, Control-1, and Latin-1 code points are
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1165 hard-coded in ichar_to_unicode and unicode_to_ichar.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1166
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1167 Checking unicode < 256 && ichar != unicode is wrong because Mule gives
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1168 many Latin characters code points in a few different character sets. */
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1169 if ((EQ (charset, Vcharset_ascii) ||
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1170 EQ (charset, Vcharset_control_1) ||
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1171 EQ (charset, Vcharset_latin_iso8859_1))
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1172 && unicode != ichar)
893
c9f067fd71a3 [xemacs-hg @ 2002-07-02 12:32:34 by stephent]
stephent
parents: 877
diff changeset
1173 signal_error (Qinvalid_argument, "Can't change Unicode translation for ASCII, Control-1 or Latin-1 character",
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1174 character);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1175
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1176 /* #### Composite characters are not properly implemented yet. */
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1177 if (EQ (charset, Vcharset_composite))
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1178 signal_error (Qinvalid_argument, "Can't set Unicode translation for Composite char",
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1179 character);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1180
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1181 set_unicode_conversion (ichar, unicode);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1182 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1183 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1184
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1185 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1186
800
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
1187 DEFUN ("char-to-unicode", Fchar_to_unicode, 1, 1, 0, /*
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1188 Convert character to Unicode codepoint.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1189 When there is no international support (i.e. the 'mule feature is not
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1190 present), this function simply does `char-to-int'.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1191 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1192 (character))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1193 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1194 CHECK_CHAR (character);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1195 #ifdef MULE
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1196 return make_int (ichar_to_unicode (XCHAR (character)));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1197 #else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1198 return Fchar_to_int (character);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1199 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1200 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1201
800
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
1202 DEFUN ("unicode-to-char", Funicode_to_char, 1, 2, 0, /*
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1203 Convert Unicode codepoint to character.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1204 CODE should be a non-negative integer.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1205 If CHARSETS is given, it should be a list of charsets, and only those
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1206 charsets will be consulted, in the given order, for a translation.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1207 Otherwise, the default ordering of all charsets will be given (see
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1208 `set-unicode-charset-precedence').
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1209
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1210 When there is no international support (i.e. the 'mule feature is not
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1211 present), this function simply does `int-to-char' and ignores the CHARSETS
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1212 argument.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1213 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1214 (code, charsets))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1215 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1216 #ifdef MULE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1217 Lisp_Object_dynarr *dyn;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1218 int lbs[NUM_LEADING_BYTES];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1219 int c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1220
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1221 CHECK_NATNUM (code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1222 c = XINT (code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1223 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1224 EXTERNAL_LIST_LOOP_2 (elt, charsets)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1225 Fget_charset (elt);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1226 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1227
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1228 if (NILP (charsets))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1229 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1230 Ichar ret = unicode_to_ichar (c, unicode_precedence_dynarr);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1231 if (ret == -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1232 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1233 return make_char (ret);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1234 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1235
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1236 dyn = Dynarr_new (Lisp_Object);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1237 memset (lbs, 0, NUM_LEADING_BYTES * sizeof (int));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1238 add_charsets_to_precedence_list (charsets, lbs, dyn);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1239 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1240 Ichar ret = unicode_to_ichar (c, dyn);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1241 Dynarr_free (dyn);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1242 if (ret == -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1243 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1244 return make_char (ret);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1245 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1246 #else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1247 CHECK_NATNUM (code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1248 return Fint_to_char (code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1249 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1250 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1251
872
79c6ff3eef26 [xemacs-hg @ 2002-06-20 21:18:01 by ben]
ben
parents: 867
diff changeset
1252 #ifdef MULE
79c6ff3eef26 [xemacs-hg @ 2002-06-20 21:18:01 by ben]
ben
parents: 867
diff changeset
1253
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1254 static Lisp_Object
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1255 cerrar_el_fulano (Lisp_Object fulano)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1256 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1257 FILE *file = (FILE *) get_opaque_ptr (fulano);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1258 retry_fclose (file);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1259 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1260 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1261
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1262 /* #### shouldn't this interface be called load-unicode-mapping-table
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1263 for consistency with Unicode Consortium terminology? */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1264 DEFUN ("parse-unicode-translation-table", Fparse_unicode_translation_table,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1265 2, 6, 0, /*
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1266 Load Unicode tables with the Unicode mapping data in FILENAME for CHARSET.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1267 Data is text, in the form of one translation per line -- charset
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1268 codepoint followed by Unicode codepoint. Numbers are decimal or hex
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1269 \(preceded by 0x). Comments are marked with a #. Charset codepoints
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1270 for two-dimensional charsets have the first octet stored in the
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1271 high 8 bits of the hex number and the second in the low 8 bits.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1272
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1273 If START and END are given, only charset codepoints within the given
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1274 range will be processed. (START and END apply to the codepoints in the
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1275 file, before OFFSET is applied.)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1276
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1277 If OFFSET is given, that value will be added to all charset codepoints
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1278 in the file to obtain the internal charset codepoint. \(We assume
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1279 that octets in the table are in the range 33 to 126 or 32 to 127. If
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1280 you have a table in ku-ten form, with octets in the range 1 to 94, you
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1281 will have to use an offset of 5140, i.e. 0x2020.)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1282
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1283 FLAGS, if specified, control further how the tables are interpreted
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1284 and are used to special-case certain known format deviations in the
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1285 Unicode tables or in the charset:
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1286
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1287 `ignore-first-column'
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1288 The JIS X 0208 tables have 3 columns of data instead of 2. The first
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1289 column contains the Shift-JIS codepoint, which we ignore.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1290 `big5'
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1291 The charset codepoints are Big Five codepoints; convert it to the
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1292 hacked-up Mule codepoint in `chinese-big5-1' or `chinese-big5-2'.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1293 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1294 (filename, charset, start, end, offset, flags))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1295 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1296 int st = 0, en = INT_MAX, of = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1297 FILE *file;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1298 struct gcpro gcpro1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1299 char line[1025];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1300 int fondo = specpdl_depth ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1301 int ignore_first_column = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1302 int big5 = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1303
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1304 CHECK_STRING (filename);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1305 charset = Fget_charset (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1306 if (!NILP (start))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1307 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1308 CHECK_INT (start);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1309 st = XINT (start);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1310 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1311 if (!NILP (end))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1312 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1313 CHECK_INT (end);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1314 en = XINT (end);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1315 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1316 if (!NILP (offset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1317 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1318 CHECK_INT (offset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1319 of = XINT (offset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1320 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1321
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1322 if (!LISTP (flags))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1323 flags = list1 (flags);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1324
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1325 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1326 EXTERNAL_LIST_LOOP_2 (elt, flags)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1327 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1328 if (EQ (elt, Qignore_first_column))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1329 ignore_first_column = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1330 else if (EQ (elt, Qbig5))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1331 big5 = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1332 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1333 invalid_constant
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1334 ("Unrecognized `parse-unicode-translation-table' flag", elt);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1335 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1336 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1337
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1338 GCPRO1 (filename);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1339 filename = Fexpand_file_name (filename, Qnil);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1340 file = qxe_fopen (XSTRING_DATA (filename), READ_TEXT);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1341 if (!file)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1342 report_file_error ("Cannot open", filename);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1343 record_unwind_protect (cerrar_el_fulano, make_opaque_ptr (file));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1344 while (fgets (line, sizeof (line), file))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1345 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1346 char *p = line;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1347 int cp1, cp2, endcount;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1348 int cp1high, cp1low;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1349 int dummy;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1350
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1351 while (*p) /* erase all comments out of the line */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1352 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1353 if (*p == '#')
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1354 *p = '\0';
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1355 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1356 p++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1357 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1358 /* see if line is nothing but whitespace and skip if so */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1359 p = line + strspn (line, " \t\n\r\f");
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1360 if (!*p)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1361 continue;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1362 /* NOTE: It appears that MS Windows and Newlib sscanf() have
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1363 different interpretations for whitespace (== "skip all whitespace
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1364 at processing point"): Newlib requires at least one corresponding
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1365 whitespace character in the input, but MS allows none. The
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1366 following would be easier to write if we could count on the MS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1367 interpretation.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1368
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1369 Also, the return value does NOT include %n storage. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1370 if ((!ignore_first_column ?
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1371 sscanf (p, "%i %i%n", &cp1, &cp2, &endcount) < 2 :
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1372 sscanf (p, "%i %i %i%n", &dummy, &cp1, &cp2, &endcount) < 3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1373 || *(p + endcount + strspn (p + endcount, " \t\n\r\f")))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1374 {
793
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1375 warn_when_safe (Qunicode, Qwarning,
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1376 "Unrecognized line in translation file %s:\n%s",
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1377 XSTRING_DATA (filename), line);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1378 continue;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1379 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1380 if (cp1 >= st && cp1 <= en)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1381 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1382 cp1 += of;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1383 if (cp1 < 0 || cp1 >= 65536)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1384 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1385 out_of_range:
793
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1386 warn_when_safe (Qunicode, Qwarning,
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1387 "Out of range first codepoint 0x%x in "
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1388 "translation file %s:\n%s",
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1389 cp1, XSTRING_DATA (filename), line);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1390 continue;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1391 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1392
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1393 cp1high = cp1 >> 8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1394 cp1low = cp1 & 255;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1395
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1396 if (big5)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1397 {
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1398 Ichar ch = decode_big5_char (cp1high, cp1low);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1399 if (ch == -1)
793
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1400
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1401 warn_when_safe (Qunicode, Qwarning,
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1402 "Out of range Big5 codepoint 0x%x in "
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1403 "translation file %s:\n%s",
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1404 cp1, XSTRING_DATA (filename), line);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1405 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1406 set_unicode_conversion (ch, cp2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1407 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1408 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1409 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1410 int l1, h1, l2, h2;
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1411 Ichar emch;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1412
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1413 switch (XCHARSET_TYPE (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1414 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1415 case CHARSET_TYPE_94: l1 = 33; h1 = 126; l2 = 0; h2 = 0; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1416 case CHARSET_TYPE_96: l1 = 32; h1 = 127; l2 = 0; h2 = 0; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1417 case CHARSET_TYPE_94X94: l1 = 33; h1 = 126; l2 = 33; h2 = 126;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1418 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1419 case CHARSET_TYPE_96X96: l1 = 32; h1 = 127; l2 = 32; h2 = 127;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1420 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1421 default: abort (); l1 = 0; h1 = 0; l2 = 0; h2 = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1422 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1423
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1424 if (cp1high < l2 || cp1high > h2 || cp1low < l1 || cp1low > h1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1425 goto out_of_range;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1426
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1427 emch = (cp1high == 0 ? make_ichar (charset, cp1low, 0) :
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1428 make_ichar (charset, cp1high, cp1low));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1429 set_unicode_conversion (emch, cp2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1430 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1431 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1432 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1433
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1434 if (ferror (file))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1435 report_file_error ("IO error when reading", filename);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1436
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1437 unbind_to (fondo); /* close file */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1438 UNGCPRO;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1439 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1440 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1441
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1442 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1443
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1444
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1445 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1446 /* Unicode coding system */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1447 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1448
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1449 /* ISO 10646 UTF-16, UCS-4, UTF-8, UTF-7, etc. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1450
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1451 enum unicode_type
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1452 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1453 UNICODE_UTF_16,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1454 UNICODE_UTF_8,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1455 UNICODE_UTF_7,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1456 UNICODE_UCS_4,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1457 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1458
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1459 struct unicode_coding_system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1460 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1461 enum unicode_type type;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1462 int little_endian :1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1463 int need_bom :1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1464 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1465
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1466 #define CODING_SYSTEM_UNICODE_TYPE(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1467 (CODING_SYSTEM_TYPE_DATA (codesys, unicode)->type)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1468 #define XCODING_SYSTEM_UNICODE_TYPE(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1469 CODING_SYSTEM_UNICODE_TYPE (XCODING_SYSTEM (codesys))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1470 #define CODING_SYSTEM_UNICODE_LITTLE_ENDIAN(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1471 (CODING_SYSTEM_TYPE_DATA (codesys, unicode)->little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1472 #define XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1473 CODING_SYSTEM_UNICODE_LITTLE_ENDIAN (XCODING_SYSTEM (codesys))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1474 #define CODING_SYSTEM_UNICODE_NEED_BOM(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1475 (CODING_SYSTEM_TYPE_DATA (codesys, unicode)->need_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1476 #define XCODING_SYSTEM_UNICODE_NEED_BOM(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1477 CODING_SYSTEM_UNICODE_NEED_BOM (XCODING_SYSTEM (codesys))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1478
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1479 struct unicode_coding_stream
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1480 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1481 /* decode */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1482 unsigned char counter;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1483 int seen_char;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1484 /* encode */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1485 Lisp_Object current_charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1486 int current_char_boundary;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1487 int wrote_bom;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1488 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1489
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
1490 static const struct memory_description unicode_coding_system_description[] = {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1491 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1492 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1493
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
1494 DEFINE_CODING_SYSTEM_TYPE_WITH_DATA (unicode);
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
1495
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1496 /* Decode a UCS-2 or UCS-4 character into a buffer. If the lookup fails, use
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1497 <GETA MARK> (U+3013) of JIS X 0208, which means correct character
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1498 is not found, instead.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1499 #### do something more appropriate (use blob?)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1500 Danger, Will Robinson! Data loss. Should we signal user? */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1501 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1502 decode_unicode_char (int ch, unsigned_char_dynarr *dst,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1503 struct unicode_coding_stream *data, int ignore_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1504 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1505 if (ch == 0xFEFF && !data->seen_char && ignore_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1506 ;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1507 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1508 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1509 #ifdef MULE
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1510 Ichar chr = unicode_to_ichar (ch, unicode_precedence_dynarr);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1511
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1512 if (chr != -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1513 {
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1514 Ibyte work[MAX_ICHAR_LEN];
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1515 int len;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1516
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1517 len = set_itext_ichar (work, chr);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1518 Dynarr_add_many (dst, work, len);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1519 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1520 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1521 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1522 Dynarr_add (dst, LEADING_BYTE_JAPANESE_JISX0208);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1523 Dynarr_add (dst, 34 + 128);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1524 Dynarr_add (dst, 46 + 128);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1525 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1526 #else
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1527 Dynarr_add (dst, (Ibyte) ch);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1528 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1529 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1530
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1531 data->seen_char = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1532 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1533
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1534 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1535 encode_unicode_char_1 (int code, unsigned_char_dynarr *dst,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1536 enum unicode_type type, int little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1537 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1538 switch (type)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1539 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1540 case UNICODE_UTF_16:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1541 if (little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1542 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1543 Dynarr_add (dst, (unsigned char) (code & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1544 Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1545 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1546 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1547 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1548 Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1549 Dynarr_add (dst, (unsigned char) (code & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1550 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1551 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1552
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1553 case UNICODE_UCS_4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1554 if (little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1555 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1556 Dynarr_add (dst, (unsigned char) (code & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1557 Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1558 Dynarr_add (dst, (unsigned char) ((code >> 16) & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1559 Dynarr_add (dst, (unsigned char) (code >> 24));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1560 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1561 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1562 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1563 Dynarr_add (dst, (unsigned char) (code >> 24));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1564 Dynarr_add (dst, (unsigned char) ((code >> 16) & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1565 Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1566 Dynarr_add (dst, (unsigned char) (code & 255));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1567 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1568 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1569
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1570 case UNICODE_UTF_8:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1571 if (code <= 0x7f)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1572 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1573 Dynarr_add (dst, (unsigned char) code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1574 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1575 else if (code <= 0x7ff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1576 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1577 Dynarr_add (dst, (unsigned char) ((code >> 6) | 0xc0));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1578 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1579 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1580 else if (code <= 0xffff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1581 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1582 Dynarr_add (dst, (unsigned char) ((code >> 12) | 0xe0));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1583 Dynarr_add (dst, (unsigned char) (((code >> 6) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1584 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1585 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1586 else if (code <= 0x1fffff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1587 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1588 Dynarr_add (dst, (unsigned char) ((code >> 18) | 0xf0));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1589 Dynarr_add (dst, (unsigned char) (((code >> 12) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1590 Dynarr_add (dst, (unsigned char) (((code >> 6) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1591 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1592 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1593 else if (code <= 0x3ffffff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1594 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1595 Dynarr_add (dst, (unsigned char) ((code >> 24) | 0xf8));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1596 Dynarr_add (dst, (unsigned char) (((code >> 18) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1597 Dynarr_add (dst, (unsigned char) (((code >> 12) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1598 Dynarr_add (dst, (unsigned char) (((code >> 6) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1599 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1600 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1601 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1602 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1603 Dynarr_add (dst, (unsigned char) ((code >> 30) | 0xfc));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1604 Dynarr_add (dst, (unsigned char) (((code >> 24) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1605 Dynarr_add (dst, (unsigned char) (((code >> 18) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1606 Dynarr_add (dst, (unsigned char) (((code >> 12) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1607 Dynarr_add (dst, (unsigned char) (((code >> 6) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1608 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1609 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1610 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1611
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1612 case UNICODE_UTF_7: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1613
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1614 default: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1615 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1616 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1617
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1618 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1619 encode_unicode_char (Lisp_Object charset, int h, int l,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1620 unsigned_char_dynarr *dst, enum unicode_type type,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1621 int little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1622 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1623 #ifdef MULE
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1624 int code = ichar_to_unicode (make_ichar (charset, h & 127, l & 127));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1625
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1626 if (code == -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1627 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1628 if (type != UNICODE_UTF_16 &&
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1629 XCHARSET_DIMENSION (charset) == 2 &&
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1630 XCHARSET_CHARS (charset) == 94)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1631 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1632 unsigned char final = XCHARSET_FINAL (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1633
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1634 if (('@' <= final) && (final < 0x7f))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1635 code = (0xe00000 + (final - '@') * 94 * 94
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1636 + ((h & 127) - 33) * 94 + (l & 127) - 33);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1637 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1638 code = '?';
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1639 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1640 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1641 code = '?';
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1642 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1643 #else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1644 int code = h;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1645 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1646
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1647 encode_unicode_char_1 (code, dst, type, little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1648 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1649
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1650 static Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1651 unicode_convert (struct coding_stream *str, const UExtbyte *src,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1652 unsigned_char_dynarr *dst, Bytecount n)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1653 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1654 unsigned int ch = str->ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1655 struct unicode_coding_stream *data = CODING_STREAM_TYPE_DATA (str, unicode);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1656 enum unicode_type type =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1657 XCODING_SYSTEM_UNICODE_TYPE (str->codesys);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1658 int little_endian = XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (str->codesys);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1659 int ignore_bom = XCODING_SYSTEM_UNICODE_NEED_BOM (str->codesys);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1660 Bytecount orign = n;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1661
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1662 if (str->direction == CODING_DECODE)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1663 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1664 unsigned char counter = data->counter;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1665
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1666 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1667 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1668 UExtbyte c = *src++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1669
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1670 switch (type)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1671 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1672 case UNICODE_UTF_8:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1673 switch (counter)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1674 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1675 case 0:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1676 if (c >= 0xfc)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1677 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1678 ch = c & 0x01;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1679 counter = 5;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1680 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1681 else if (c >= 0xf8)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1682 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1683 ch = c & 0x03;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1684 counter = 4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1685 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1686 else if (c >= 0xf0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1687 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1688 ch = c & 0x07;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1689 counter = 3;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1690 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1691 else if (c >= 0xe0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1692 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1693 ch = c & 0x0f;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1694 counter = 2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1695 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1696 else if (c >= 0xc0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1697 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1698 ch = c & 0x1f;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1699 counter = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1700 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1701 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1702 decode_unicode_char (c, dst, data, ignore_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1703 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1704 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1705 ch = (ch << 6) | (c & 0x3f);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1706 decode_unicode_char (ch, dst, data, ignore_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1707 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1708 counter = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1709 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1710 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1711 ch = (ch << 6) | (c & 0x3f);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1712 counter--;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1713 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1714 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1715
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1716 case UNICODE_UTF_16:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1717 if (little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1718 ch = (c << counter) | ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1719 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1720 ch = (ch << 8) | c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1721 counter += 8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1722 if (counter == 16)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1723 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1724 int tempch = ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1725 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1726 counter = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1727 decode_unicode_char (tempch, dst, data, ignore_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1728 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1729 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1730
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1731 case UNICODE_UCS_4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1732 if (little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1733 ch = (c << counter) | ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1734 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1735 ch = (ch << 8) | c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1736 counter += 8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1737 if (counter == 32)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1738 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1739 int tempch = ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1740 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1741 counter = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1742 if (tempch < 0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1743 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1744 /* !!#### indicate an error */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1745 tempch = '~';
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1746 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1747 decode_unicode_char (tempch, dst, data, ignore_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1748 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1749 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1750
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1751 case UNICODE_UTF_7:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1752 abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1753 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1754
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1755 default: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1756 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1757
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1758 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1759 if (str->eof)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1760 DECODE_OUTPUT_PARTIAL_CHAR (ch, dst);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1761
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1762 data->counter = counter;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1763 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1764 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1765 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1766 unsigned char char_boundary = data->current_char_boundary;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1767 Lisp_Object charset = data->current_charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1768
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1769 #ifdef ENABLE_COMPOSITE_CHARS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1770 /* flags for handling composite chars. We do a little switcheroo
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1771 on the source while we're outputting the composite char. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1772 Bytecount saved_n = 0;
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1773 const Ibyte *saved_src = NULL;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1774 int in_composite = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1775
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1776 back_to_square_n:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1777 #endif /* ENABLE_COMPOSITE_CHARS */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1778
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1779 if (XCODING_SYSTEM_UNICODE_NEED_BOM (str->codesys) && !data->wrote_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1780 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1781 encode_unicode_char_1 (0xFEFF, dst, type, little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1782 data->wrote_bom = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1783 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1784
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1785 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1786 {
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1787 Ibyte c = *src++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1788
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1789 #ifdef MULE
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
1790 if (byte_ascii_p (c))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1791 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1792 { /* Processing ASCII character */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1793 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1794 encode_unicode_char (Vcharset_ascii, c, 0, dst, type,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1795 little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1796
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1797 char_boundary = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1798 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1799 #ifdef MULE
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1800 else if (ibyte_leading_byte_p (c) || ibyte_leading_byte_p (ch))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1801 { /* Processing Leading Byte */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1802 ch = 0;
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
1803 charset = charset_by_leading_byte (c);
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
1804 if (leading_byte_prefix_p(c))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1805 ch = c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1806 char_boundary = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1807 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1808 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1809 { /* Processing Non-ASCII character */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1810 char_boundary = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1811 if (EQ (charset, Vcharset_control_1))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1812 encode_unicode_char (Vcharset_control_1, c, 0, dst,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1813 type, little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1814 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1815 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1816 switch (XCHARSET_REP_BYTES (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1817 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1818 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1819 encode_unicode_char (charset, c, 0, dst, type,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1820 little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1821 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1822 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1823 if (XCHARSET_PRIVATE_P (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1824 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1825 encode_unicode_char (charset, c, 0, dst, type,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1826 little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1827 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1828 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1829 else if (ch)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1830 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1831 #ifdef ENABLE_COMPOSITE_CHARS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1832 if (EQ (charset, Vcharset_composite))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1833 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1834 if (in_composite)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1835 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1836 /* #### Bother! We don't know how to
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1837 handle this yet. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1838 encode_unicode_char (Vcharset_ascii, '~', 0,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1839 dst, type,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1840 little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1841 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1842 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1843 {
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1844 Ichar emch = make_ichar (Vcharset_composite,
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1845 ch & 0x7F,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1846 c & 0x7F);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1847 Lisp_Object lstr =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1848 composite_char_string (emch);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1849 saved_n = n;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1850 saved_src = src;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1851 in_composite = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1852 src = XSTRING_DATA (lstr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1853 n = XSTRING_LENGTH (lstr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1854 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1855 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1856 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1857 #endif /* ENABLE_COMPOSITE_CHARS */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1858 encode_unicode_char (charset, ch, c, dst, type,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1859 little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1860 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1861 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1862 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1863 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1864 ch = c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1865 char_boundary = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1866 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1867 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1868 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1869 if (ch)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1870 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1871 encode_unicode_char (charset, ch, c, dst, type,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1872 little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1873 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1874 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1875 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1876 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1877 ch = c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1878 char_boundary = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1879 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1880 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1881 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1882 abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1883 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1884 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1885 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1886 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1887 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1888
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1889 #ifdef ENABLE_COMPOSITE_CHARS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1890 if (in_composite)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1891 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1892 n = saved_n;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1893 src = saved_src;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1894 in_composite = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1895 goto back_to_square_n; /* Wheeeeeeeee ..... */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1896 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1897 #endif /* ENABLE_COMPOSITE_CHARS */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1898
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1899 data->current_char_boundary = char_boundary;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1900 data->current_charset = charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1901
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1902 /* La palabra se hizo carne! */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1903 /* A palavra fez-se carne! */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1904 /* Whatever. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1905 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1906
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1907 str->ch = ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1908 return orign;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1909 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1910
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1911 /* DEFINE_DETECTOR (utf_7); */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1912 DEFINE_DETECTOR (utf_8);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1913 DEFINE_DETECTOR_CATEGORY (utf_8, utf_8);
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
1914 DEFINE_DETECTOR_CATEGORY (utf_8, utf_8_bom);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1915 DEFINE_DETECTOR (ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1916 DEFINE_DETECTOR_CATEGORY (ucs_4, ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1917 DEFINE_DETECTOR (utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1918 DEFINE_DETECTOR_CATEGORY (utf_16, utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1919 DEFINE_DETECTOR_CATEGORY (utf_16, utf_16_little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1920 DEFINE_DETECTOR_CATEGORY (utf_16, utf_16_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1921 DEFINE_DETECTOR_CATEGORY (utf_16, utf_16_little_endian_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1922
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1923 struct ucs_4_detector
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1924 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1925 int in_ucs_4_byte;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1926 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1927
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1928 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1929 ucs_4_detect (struct detection_state *st, const UExtbyte *src,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1930 Bytecount n)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1931 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1932 struct ucs_4_detector *data = DETECTION_STATE_DATA (st, ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1933
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1934 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1935 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1936 UExtbyte c = *src++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1937 switch (data->in_ucs_4_byte)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1938 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1939 case 0:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1940 if (c >= 128)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1941 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1942 DET_RESULT (st, ucs_4) = DET_NEARLY_IMPOSSIBLE;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1943 return;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1944 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1945 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1946 data->in_ucs_4_byte++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1947 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1948 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1949 data->in_ucs_4_byte = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1950 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1951 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1952 data->in_ucs_4_byte++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1953 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1954 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1955
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1956 /* !!#### write this for real */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1957 DET_RESULT (st, ucs_4) = DET_AS_LIKELY_AS_UNLIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1958 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1959
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1960 struct utf_16_detector
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1961 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1962 unsigned int seen_ffff:1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1963 unsigned int seen_forward_bom:1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1964 unsigned int seen_rev_bom:1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1965 int byteno;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1966 int prev_char;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1967 int text, rev_text;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
1968 int sep, rev_sep;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
1969 int num_ascii;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1970 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1971
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1972 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1973 utf_16_detect (struct detection_state *st, const UExtbyte *src,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1974 Bytecount n)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1975 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1976 struct utf_16_detector *data = DETECTION_STATE_DATA (st, utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1977
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1978 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1979 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1980 UExtbyte c = *src++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1981 int prevc = data->prev_char;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1982 if (data->byteno == 1 && c == 0xFF && prevc == 0xFE)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1983 data->seen_forward_bom = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1984 else if (data->byteno == 1 && c == 0xFE && prevc == 0xFF)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1985 data->seen_rev_bom = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1986
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1987 if (data->byteno & 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1988 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1989 if (c == 0xFF && prevc == 0xFF)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1990 data->seen_ffff = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1991 if (prevc == 0
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1992 && (c == '\r' || c == '\n'
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1993 || (c >= 0x20 && c <= 0x7E)))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1994 data->text++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1995 if (c == 0
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1996 && (prevc == '\r' || prevc == '\n'
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1997 || (prevc >= 0x20 && prevc <= 0x7E)))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1998 data->rev_text++;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
1999 /* #### 0x2028 is LINE SEPARATOR and 0x2029 is PARAGRAPH SEPARATOR.
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2000 I used to count these in text and rev_text but that is very bad,
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2001 as 0x2028 is also space + left-paren in ASCII, which is extremely
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2002 common. So, what do we do with these? */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2003 if (prevc == 0x20 && (c == 0x28 || c == 0x29))
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2004 data->sep++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2005 if (c == 0x20 && (prevc == 0x28 || prevc == 0x29))
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2006 data->rev_sep++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2007 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2008
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2009 if ((c >= ' ' && c <= '~') || c == '\n' || c == '\r' || c == '\t' ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2010 c == '\f' || c == '\v')
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2011 data->num_ascii++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2012 data->byteno++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2013 data->prev_char = c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2014 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2015
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2016 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2017 int variance_indicates_big_endian =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2018 (data->text >= 10
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2019 && (data->rev_text == 0
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2020 || data->text / data->rev_text >= 10));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2021 int variance_indicates_little_endian =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2022 (data->rev_text >= 10
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2023 && (data->text == 0
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2024 || data->rev_text / data->text >= 10));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2025
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2026 if (data->seen_ffff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2027 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2028 else if (data->seen_forward_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2029 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2030 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2031 if (variance_indicates_big_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2032 DET_RESULT (st, utf_16_bom) = DET_NEAR_CERTAINTY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2033 else if (variance_indicates_little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2034 DET_RESULT (st, utf_16_bom) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2035 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2036 DET_RESULT (st, utf_16_bom) = DET_QUITE_PROBABLE;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2037 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2038 else if (data->seen_forward_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2039 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2040 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2041 if (variance_indicates_big_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2042 DET_RESULT (st, utf_16_bom) = DET_NEAR_CERTAINTY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2043 else if (variance_indicates_little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2044 /* #### may need to rethink */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2045 DET_RESULT (st, utf_16_bom) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2046 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2047 /* #### may need to rethink */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2048 DET_RESULT (st, utf_16_bom) = DET_QUITE_PROBABLE;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2049 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2050 else if (data->seen_rev_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2051 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2052 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2053 if (variance_indicates_little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2054 DET_RESULT (st, utf_16_little_endian_bom) = DET_NEAR_CERTAINTY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2055 else if (variance_indicates_big_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2056 /* #### may need to rethink */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2057 DET_RESULT (st, utf_16_little_endian_bom) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2058 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2059 /* #### may need to rethink */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2060 DET_RESULT (st, utf_16_little_endian_bom) = DET_QUITE_PROBABLE;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2061 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2062 else if (variance_indicates_big_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2063 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2064 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2065 DET_RESULT (st, utf_16) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2066 DET_RESULT (st, utf_16_little_endian) = DET_SOMEWHAT_UNLIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2067 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2068 else if (variance_indicates_little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2069 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2070 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2071 DET_RESULT (st, utf_16) = DET_SOMEWHAT_UNLIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2072 DET_RESULT (st, utf_16_little_endian) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2073 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2074 else
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2075 {
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2076 /* #### FUCKME! There should really be an ASCII detector. This
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2077 would rule out the need to have this built-in here as
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2078 well. --ben */
1292
f3437b56874d [xemacs-hg @ 2003-02-13 09:57:04 by ben]
ben
parents: 1267
diff changeset
2079 int pct_ascii = data->byteno ? (100 * data->num_ascii) / data->byteno
f3437b56874d [xemacs-hg @ 2003-02-13 09:57:04 by ben]
ben
parents: 1267
diff changeset
2080 : 100;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2081
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2082 if (pct_ascii > 90)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2083 SET_DET_RESULTS (st, utf_16, DET_QUITE_IMPROBABLE);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2084 else if (pct_ascii > 75)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2085 SET_DET_RESULTS (st, utf_16, DET_SOMEWHAT_UNLIKELY);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2086 else
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2087 SET_DET_RESULTS (st, utf_16, DET_AS_LIKELY_AS_UNLIKELY);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2088 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2089 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2090 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2091
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2092 struct utf_8_detector
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2093 {
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2094 int byteno;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2095 int first_byte;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2096 int second_byte;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2097 int prev_byte;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2098 int in_utf_8_byte;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2099 int recent_utf_8_sequence;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2100 int seen_bogus_utf8;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2101 int seen_really_bogus_utf8;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2102 int seen_2byte_sequence;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2103 int seen_longer_sequence;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2104 int seen_iso2022_esc;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2105 int seen_iso_shift;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2106 int seen_utf_bom:1;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2107 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2108
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2109 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2110 utf_8_detect (struct detection_state *st, const UExtbyte *src,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2111 Bytecount n)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2112 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2113 struct utf_8_detector *data = DETECTION_STATE_DATA (st, utf_8);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2114
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2115 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2116 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2117 UExtbyte c = *src++;
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2118 switch (data->byteno)
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2119 {
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2120 case 0:
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2121 data->first_byte = c;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2122 break;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2123 case 1:
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2124 data->second_byte = c;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2125 break;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2126 case 2:
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2127 if (data->first_byte == 0xef &&
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2128 data->second_byte == 0xbb &&
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2129 c == 0xbf)
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2130 data->seen_utf_bom = 1;
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2131 break;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2132 }
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2133
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2134 switch (data->in_utf_8_byte)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2135 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2136 case 0:
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2137 if (data->prev_byte == ISO_CODE_ESC && c >= 0x28 && c <= 0x2F)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2138 data->seen_iso2022_esc++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2139 else if (c == ISO_CODE_SI || c == ISO_CODE_SO)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2140 data->seen_iso_shift++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2141 else if (c >= 0xfc)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2142 data->in_utf_8_byte = 5;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2143 else if (c >= 0xf8)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2144 data->in_utf_8_byte = 4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2145 else if (c >= 0xf0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2146 data->in_utf_8_byte = 3;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2147 else if (c >= 0xe0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2148 data->in_utf_8_byte = 2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2149 else if (c >= 0xc0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2150 data->in_utf_8_byte = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2151 else if (c >= 0x80)
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2152 data->seen_bogus_utf8++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2153 if (data->in_utf_8_byte > 0)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2154 data->recent_utf_8_sequence = data->in_utf_8_byte;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2155 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2156 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2157 if ((c & 0xc0) != 0x80)
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2158 data->seen_really_bogus_utf8++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2159 else
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2160 {
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2161 data->in_utf_8_byte--;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2162 if (data->in_utf_8_byte == 0)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2163 {
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2164 if (data->recent_utf_8_sequence == 1)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2165 data->seen_2byte_sequence++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2166 else
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2167 {
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2168 assert (data->recent_utf_8_sequence >= 2);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2169 data->seen_longer_sequence++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2170 }
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2171 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2172 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2173 }
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2174
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2175 data->byteno++;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2176 data->prev_byte = c;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2177 }
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2178
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2179 /* either BOM or no BOM, but not both */
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2180 SET_DET_RESULTS (st, utf_8, DET_NEARLY_IMPOSSIBLE);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2181
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2182
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2183 if (data->seen_utf_bom)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2184 DET_RESULT (st, utf_8_bom) = DET_NEAR_CERTAINTY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2185 else
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2186 {
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2187 if (data->seen_really_bogus_utf8 ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2188 data->seen_bogus_utf8 >= 2)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2189 ; /* bogus */
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2190 else if (data->seen_bogus_utf8)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2191 DET_RESULT (st, utf_8) = DET_SOMEWHAT_UNLIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2192 else if ((data->seen_longer_sequence >= 5 ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2193 data->seen_2byte_sequence >= 10) &&
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2194 (!(data->seen_iso2022_esc + data->seen_iso_shift) ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2195 (data->seen_longer_sequence * 2 + data->seen_2byte_sequence) /
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2196 (data->seen_iso2022_esc + data->seen_iso_shift) >= 10))
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2197 /* heuristics, heuristics, we love heuristics */
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2198 DET_RESULT (st, utf_8) = DET_QUITE_PROBABLE;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2199 else if (data->seen_iso2022_esc ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2200 data->seen_iso_shift >= 3)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2201 DET_RESULT (st, utf_8) = DET_SOMEWHAT_UNLIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2202 else if (data->seen_longer_sequence ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2203 data->seen_2byte_sequence)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2204 DET_RESULT (st, utf_8) = DET_SOMEWHAT_LIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2205 else if (data->seen_iso_shift)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2206 DET_RESULT (st, utf_8) = DET_SOMEWHAT_UNLIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2207 else
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2208 DET_RESULT (st, utf_8) = DET_AS_LIKELY_AS_UNLIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2209 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2210 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2211
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2212 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2213 unicode_init_coding_stream (struct coding_stream *str)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2214 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2215 struct unicode_coding_stream *data =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2216 CODING_STREAM_TYPE_DATA (str, unicode);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2217 xzero (*data);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2218 data->current_charset = Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2219 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2220
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2221 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2222 unicode_rewind_coding_stream (struct coding_stream *str)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2223 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2224 unicode_init_coding_stream (str);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2225 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2226
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2227 static int
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2228 unicode_putprop (Lisp_Object codesys, Lisp_Object key, Lisp_Object value)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2229 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2230 if (EQ (key, Qtype))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2231 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2232 enum unicode_type type;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2233
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2234 if (EQ (value, Qutf_8))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2235 type = UNICODE_UTF_8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2236 else if (EQ (value, Qutf_16))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2237 type = UNICODE_UTF_16;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2238 else if (EQ (value, Qutf_7))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2239 type = UNICODE_UTF_7;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2240 else if (EQ (value, Qucs_4))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2241 type = UNICODE_UCS_4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2242 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2243 invalid_constant ("Invalid Unicode type", key);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2244
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2245 XCODING_SYSTEM_UNICODE_TYPE (codesys) = type;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2246 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2247 else if (EQ (key, Qlittle_endian))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2248 XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (codesys) = !NILP (value);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2249 else if (EQ (key, Qneed_bom))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2250 XCODING_SYSTEM_UNICODE_NEED_BOM (codesys) = !NILP (value);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2251 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2252 return 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2253 return 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2254 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2255
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2256 static Lisp_Object
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2257 unicode_getprop (Lisp_Object coding_system, Lisp_Object prop)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2258 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2259 if (EQ (prop, Qtype))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2260 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2261 switch (XCODING_SYSTEM_UNICODE_TYPE (coding_system))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2262 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2263 case UNICODE_UTF_16: return Qutf_16;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2264 case UNICODE_UTF_8: return Qutf_8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2265 case UNICODE_UTF_7: return Qutf_7;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2266 case UNICODE_UCS_4: return Qucs_4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2267 default: abort ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2268 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2269 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2270 else if (EQ (prop, Qlittle_endian))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2271 return XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (coding_system) ? Qt : Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2272 else if (EQ (prop, Qneed_bom))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2273 return XCODING_SYSTEM_UNICODE_NEED_BOM (coding_system) ? Qt : Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2274 return Qunbound;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2275 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2276
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2277 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2278 unicode_print (Lisp_Object cs, Lisp_Object printcharfun, int escapeflag)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2279 {
800
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
2280 write_fmt_string_lisp (printcharfun, "(%s", 1, unicode_getprop (cs, Qtype));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2281 if (XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (cs))
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
2282 write_c_string (printcharfun, ", little-endian");
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2283 if (XCODING_SYSTEM_UNICODE_NEED_BOM (cs))
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
2284 write_c_string (printcharfun, ", need-bom");
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
2285 write_c_string (printcharfun, ")");
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2286 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2287
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2288 int
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2289 dfc_coding_system_is_unicode (Lisp_Object codesys)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2290 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2291 #ifdef HAVE_WIN32_CODING_SYSTEMS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2292 codesys = Fget_coding_system (codesys);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2293 return (EQ (XCODING_SYSTEM_TYPE (codesys), Qunicode) &&
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2294 XCODING_SYSTEM_UNICODE_TYPE (codesys) == UNICODE_UTF_16 &&
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2295 XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (codesys));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2296
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2297 #else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2298 return 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2299 #endif
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2300 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2301
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2302
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2303 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2304 /* Initialization */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2305 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2306
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2307 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2308 syms_of_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2309 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2310 #ifdef MULE
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
2311 DEFSUBR (Funicode_precedence_list);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2312 DEFSUBR (Fset_language_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2313 DEFSUBR (Flanguage_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2314 DEFSUBR (Fset_default_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2315 DEFSUBR (Fdefault_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2316 DEFSUBR (Fset_unicode_conversion);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2317
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2318 DEFSUBR (Fparse_unicode_translation_table);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2319
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2320 DEFSYMBOL (Qignore_first_column);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2321 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2322
800
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
2323 DEFSUBR (Fchar_to_unicode);
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
2324 DEFSUBR (Funicode_to_char);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2325
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2326 DEFSYMBOL (Qunicode);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2327 DEFSYMBOL (Qucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2328 DEFSYMBOL (Qutf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2329 DEFSYMBOL (Qutf_8);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2330 DEFSYMBOL (Qutf_7);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2331
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2332 DEFSYMBOL (Qneed_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2333
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2334 DEFSYMBOL (Qutf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2335 DEFSYMBOL (Qutf_16_little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2336 DEFSYMBOL (Qutf_16_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2337 DEFSYMBOL (Qutf_16_little_endian_bom);
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2338
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2339 DEFSYMBOL (Qutf_8);
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2340 DEFSYMBOL (Qutf_8_bom);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2341 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2342
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2343 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2344 coding_system_type_create_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2345 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2346 INITIALIZE_CODING_SYSTEM_TYPE_WITH_DATA (unicode, "unicode-coding-system-p");
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2347 CODING_SYSTEM_HAS_METHOD (unicode, print);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2348 CODING_SYSTEM_HAS_METHOD (unicode, convert);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2349 CODING_SYSTEM_HAS_METHOD (unicode, init_coding_stream);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2350 CODING_SYSTEM_HAS_METHOD (unicode, rewind_coding_stream);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2351 CODING_SYSTEM_HAS_METHOD (unicode, putprop);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2352 CODING_SYSTEM_HAS_METHOD (unicode, getprop);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2353
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2354 INITIALIZE_DETECTOR (utf_8);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2355 DETECTOR_HAS_METHOD (utf_8, detect);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2356 INITIALIZE_DETECTOR_CATEGORY (utf_8, utf_8);
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2357 INITIALIZE_DETECTOR_CATEGORY (utf_8, utf_8_bom);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2358
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2359 INITIALIZE_DETECTOR (ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2360 DETECTOR_HAS_METHOD (ucs_4, detect);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2361 INITIALIZE_DETECTOR_CATEGORY (ucs_4, ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2362
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2363 INITIALIZE_DETECTOR (utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2364 DETECTOR_HAS_METHOD (utf_16, detect);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2365 INITIALIZE_DETECTOR_CATEGORY (utf_16, utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2366 INITIALIZE_DETECTOR_CATEGORY (utf_16, utf_16_little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2367 INITIALIZE_DETECTOR_CATEGORY (utf_16, utf_16_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2368 INITIALIZE_DETECTOR_CATEGORY (utf_16, utf_16_little_endian_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2369 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2370
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2371 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2372 reinit_coding_system_type_create_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2373 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2374 REINITIALIZE_CODING_SYSTEM_TYPE (unicode);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2375 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2376
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2377 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2378 reinit_vars_of_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2379 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2380 #ifdef MULE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2381 init_blank_unicode_tables ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2382 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2383 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2384
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2385 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2386 vars_of_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2387 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2388 reinit_vars_of_unicode ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2389
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2390 Fprovide (intern ("unicode"));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2391
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2392 #ifdef MULE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2393 staticpro (&Vlanguage_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2394 Vlanguage_unicode_precedence_list = Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2395
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2396 staticpro (&Vdefault_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2397 Vdefault_unicode_precedence_list = Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2398
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2399 unicode_precedence_dynarr = Dynarr_new (Lisp_Object);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2400 dump_add_root_struct_ptr (&unicode_precedence_dynarr,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2401 &lisp_object_dynarr_description);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2402 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2403 }