annotate src/unicode.c @ 4539:061e030e3270

Fix some bugs in load-history construction, built-in symbol file names. lib-src/ChangeLog addition: 2008-12-27 Aidan Kehoe <kehoea@parhasard.net> * make-docfile.c (main): Allow more than one -d argument, followed by a directory to change to. (put_filename): Don't strip directory information; with previous change, allows retrieval of Lisp function and variable origin files from #'built-in-symbol-file relative to lisp-directory. (scan_lisp_file): Don't add an extraneous newline after the file name, put_filename has added the newline already. lisp/ChangeLog addition: 2008-12-27 Aidan Kehoe <kehoea@parhasard.net> * loadup.el (load-history): Add the contents of current-load-list to load-history before clearing it. Move the variable declarations earlier in the file to a format understood by make-docfile.c. * custom.el (custom-declare-variable): Add the variable's symbol to the current file's load history entry correctly, don't use a cons. Eliminate a comment that we don't need to worry about, we don't need to check the `initialized' C variable in Lisp. * bytecomp.el (byte-compile-output-file-form): Merge Andreas Schwab's pre-GPLv3 GNU change of 19970831 here; treat #'custom-declare-variable correctly, generating the docstrings in a format understood by make-docfile.c. * loadhist.el (symbol-file): Correct behaviour for checking autoloaded macros and functions when supplied with a TYPE argument. Accept fully-qualified paths from #'built-in-symbol-file; if a path is not fully-qualified, return it relative to lisp-directory if the filename corresponds to a Lisp file, and relative to (concat source-directory "/src/") otherwise. * make-docfile.el (preloaded-file-list): Rationalise some let bindings a little. Use the "-d" argument to make-docfile.c to supply Lisp paths relative to lisp-directory, not absolutely. Add in loadup.el explicitly to the list of files to be processed by make-docfile.c--it doesn't make sense to add it to preloaded-file-list, since that is used for purposes of byte-compilation too. src/ChangeLog addition: 2008-12-27 Aidan Kehoe <kehoea@parhasard.net> * doc.c (Fbuilt_in_symbol_file): Return a subr's filename immediately if we've found it. Check for compiled function and compiled macro docstrings in DOC too, and return them if they exist. The branch of the if statement focused on functions may have executed, but we may still want to check variable bindings; an else clause isn't appropriate.
author Aidan Kehoe <kehoea@parhasard.net>
date Sat, 27 Dec 2008 14:05:50 +0000
parents bd9b678f4db7
children 2669b1b7e33b
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1 /* Code to handle Unicode conversion.
3025
facf3239ba30 [xemacs-hg @ 2005-10-25 11:16:19 by ben]
ben
parents: 3024
diff changeset
2 Copyright (C) 2000, 2001, 2002, 2003, 2004, 2005 Ben Wing.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
4 This file is part of XEmacs.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
5
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
6 XEmacs is free software; you can redistribute it and/or modify it
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
7 under the terms of the GNU General Public License as published by the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
8 Free Software Foundation; either version 2, or (at your option) any
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
9 later version.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
10
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
11 XEmacs is distributed in the hope that it will be useful, but WITHOUT
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
12 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
13 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
14 for more details.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
15
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
16 You should have received a copy of the GNU General Public License
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
17 along with XEmacs; see the file COPYING. If not, write to
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
18 the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
19 Boston, MA 02111-1307, USA. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
20
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
21 /* Synched up with: FSF 20.3. Not in FSF. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
22
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
23 /* Authorship:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
24
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
25 Current primary author: Ben Wing <ben@xemacs.org>
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
26
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
27 Written by Ben Wing <ben@xemacs.org>, June, 2001.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
28 Separated out into this file, August, 2001.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
29 Includes Unicode coding systems, some parts of which have been written
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
30 by someone else. #### Morioka and Hayashi, I think.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
31
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
32 As of September 2001, the detection code is here and abstraction of the
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
33 detection system is finished. The unicode detectors have been rewritten
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
34 to include multiple levels of likelihood.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
35 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
36
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
37 #include <config.h>
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
38 #include "lisp.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
39
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
40 #include "charset.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
41 #include "file-coding.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
42 #include "opaque.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
43
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
44 #include "sysfile.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
45
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
46 /* For more info about how Unicode works under Windows, see intl-win32.c. */
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
47
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
48 /* Info about Unicode translation tables [ben]:
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
49
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
50 FORMAT:
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
51 -------
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
52
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
53 We currently use the following format for tables:
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
54
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
55 If dimension == 1, to_unicode_table is a 96-element array of ints
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
56 (Unicode code points); else, it's a 96-element array of int * pointers,
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
57 each of which points to a 96-element array of ints. If no elements in a
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
58 row have been filled in, the pointer will point to a default empty
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
59 table; that way, memory usage is more reasonable but lookup still fast.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
60
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
61 -- If from_unicode_levels == 1, from_unicode_table is a 256-element
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
62 array of shorts (octet 1 in high byte, octet 2 in low byte; we don't
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
63 store Ichars directly to save space).
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
64
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
65 -- If from_unicode_levels == 2, from_unicode_table is a 256-element
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
66 array of short * pointers, each of which points to a 256-element array
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
67 of shorts.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
68
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
69 -- If from_unicode_levels == 3, from_unicode_table is a 256-element
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
70 array of short ** pointers, each of which points to a 256-element array
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
71 of short * pointers, each of which points to a 256-element array of
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
72 shorts.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
73
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
74 -- If from_unicode_levels == 4, same thing but one level deeper.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
75
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
76 Just as for to_unicode_table, we use default tables to fill in all
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
77 entries with no values in them.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
78
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
79 #### An obvious space-saving optimization is to use variable-sized
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
80 tables, where each table instead of just being a 256-element array, is a
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
81 structure with a start value, an end value, and a variable number of
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
82 entries (END - START + 1). Only 8 bits are needed for END and START,
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
83 and could be stored at the end to avoid alignment problems. However,
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
84 before charging off and implementing this, we need to consider whether
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
85 it's worth it:
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
86
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
87 (1) Most tables will be highly localized in which code points are
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
88 defined, heavily reducing the possible memory waste. Before doing any
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
89 rewriting, write some code to see how much memory is actually being
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
90 wasted (i.e. ratio of empty entries to total # of entries) and only
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
91 start rewriting if it's unacceptably high. You have to check over all
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
92 charsets.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
93
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
94 (2) Since entries are usually added one at a time, you have to be very
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
95 careful when creating the tables to avoid realloc()/free() thrashing in
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
96 the common case when you are in an area of high localization and are
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
97 going to end up using most entries in the table. You'd certainly want
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
98 to allow only certain sizes, not arbitrary ones (probably powers of 2,
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
99 where you want the entire block including the START/END values to fit
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
100 into a power of 2, minus any malloc overhead if there is any -- there's
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
101 none under gmalloc.c, and probably most system malloc() functions are
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
102 quite smart nowadays and also have no overhead). You could optimize
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
103 somewhat during the in-C initializations, because you can compute the
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
104 actual usage of various tables by scanning the entries you're going to
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
105 add in a separate pass before adding them. (You could actually do the
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
106 same thing when entries are added on the Lisp level by making the
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
107 assumption that all the entries will come in one after another before
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
108 any use is made of the data. So as they're coming in, you just store
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
109 them in a big long list, and the first time you need to retrieve an
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
110 entry, you compute the whole table at once.) You'd still have to deal
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
111 with the possibility of later entries coming in, though.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
112
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
113 (3) You do lose some speed using START/END values, since you need a
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
114 couple of comparisons at each level. This could easily make each single
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
115 lookup become 3-4 times slower. The Unicode book considers this a big
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
116 issue, and recommends against variable-sized tables for this reason;
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
117 however, they almost certainly have in mind applications that primarily
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
118 involve conversion of large amounts of data. Most Unicode strings that
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
119 are translated in XEmacs are fairly small. The only place where this
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
120 might matter is in loading large files -- e.g. a 3-megabyte
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
121 Unicode-encoded file. So think about this, and maybe do a trial
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
122 implementation where you don't worry too much about the intricacies of
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
123 (2) and just implement some basic "multiply by 1.5" trick or something
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
124 to do the resizing. There is a very good FAQ on Unicode called
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
125 something like the Linux-Unicode How-To (it should be part of the Linux
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
126 How-To's, I think), that lists the url of a guy with a whole bunch of
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
127 unicode files you can use to stress-test your implementations, and he's
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
128 highly likely to have a good multi-megabyte Unicode-encoded file (with
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
129 normal text in it -- if you created your own just by creating repeated
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
130 strings of letters and numbers, you probably wouldn't get accurate
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
131 results).
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
132
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
133 INITIALIZATION:
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
134 ---------------
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
135
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
136 There are advantages and disadvantages to loading the tables at
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
137 run-time.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
138
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
139 Advantages:
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
140
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
141 They're big, and it's very fast to recreate them (a fraction of a second
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
142 on modern processors).
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
143
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
144 Disadvantages:
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
145
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
146 (1) User-defined charsets: It would be inconvenient to require all
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
147 dumped user-defined charsets to be reloaded at init time.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
148
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
149 NB With run-time loading, we load in init-mule-at-startup, in
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
150 mule-cmds.el. This is called from startup.el, which is quite late in
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
151 the initialization process -- but data-directory isn't set until then.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
152 With dump-time loading, you still can't dump in a Japanese directory
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
153 (again, until we move to Unicode internally), but this is not such an
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
154 imposition.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
155
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
156
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
157 */
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
158
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
159 /* #### WARNING! The current sledgehammer routines have a fundamental
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
160 problem in that they can't handle two characters mapping to a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
161 single Unicode codepoint or vice-versa in a single charset table.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
162 It's not clear there is any way to handle this and still make the
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
163 sledgehammer routines useful.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
164
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
165 Inquiring Minds Want To Know Dept: does the above WARNING mean that
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
166 _if_ it happens, then it will signal error, or then it will do
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
167 something evil and unpredictable? Signaling an error is OK: for
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
168 all national standards, the national to Unicode map is an inclusion
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
169 (1-to-1). Any character set that does not behave that way is
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
170 broken according to the Unicode standard.
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
171
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
172 Answer: You will get an ABORT(), since the purpose of the sledgehammer
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
173 routines is self-checking. The above problem with non-1-to-1 mapping
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
174 occurs in the Big5 tables, as provided by the Unicode Consortium. */
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
175
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
176 /* #define SLEDGEHAMMER_CHECK_UNICODE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
177
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
178 /* When MULE is not defined, we may still need some Unicode support --
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
179 in particular, some Windows API's always want Unicode, and the way
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
180 we've set up the Unicode encapsulation, we may as well go ahead and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
181 always use the Unicode versions of split API's. (It would be
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
182 trickier to not use them, and pointless -- under NT, the ANSI API's
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
183 call the Unicode ones anyway, so in the case of structures, we'd be
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
184 converting from Unicode to ANSI structures, only to have the OS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
185 convert them back.) */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
186
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
187 Lisp_Object Qunicode;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
188 Lisp_Object Qutf_16, Qutf_8, Qucs_4, Qutf_7, Qutf_32;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
189 Lisp_Object Qneed_bom;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
190
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
191 Lisp_Object Qutf_16_little_endian, Qutf_16_bom;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
192 Lisp_Object Qutf_16_little_endian_bom;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
193
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
194 Lisp_Object Qutf_8_bom;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
195
3952
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
196 /* See the Unicode FAQ, http://www.unicode.org/faq/utf_bom.html#35 for this
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
197 algorithm.
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
198
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
199 (They also give another, really verbose one, as part of their explanation
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
200 of the various planes of the encoding, but we won't use that.) */
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
201
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
202 #define UTF_16_LEAD_OFFSET (0xD800 - (0x10000 >> 10))
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
203 #define UTF_16_SURROGATE_OFFSET (0x10000 - (0xD800 << 10) - 0xDC00)
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
204
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
205 #define utf_16_surrogates_to_code(lead, trail) \
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
206 (((lead) << 10) + (trail) + UTF_16_SURROGATE_OFFSET)
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
207
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
208 #define CODE_TO_UTF_16_SURROGATES(codepoint, lead, trail) do { \
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
209 int __ctu16s_code = (codepoint); \
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
210 lead = UTF_16_LEAD_OFFSET + (__ctu16s_code >> 10); \
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
211 trail = 0xDC00 + (__ctu16s_code & 0x3FF); \
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
212 } while (0)
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
213
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
214 #ifdef MULE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
215
3352
8dbdcd070418 [xemacs-hg @ 2006-04-22 15:18:54 by stephent]
stephent
parents: 3025
diff changeset
216 /* Using ints for to_unicode is OK (as long as they are >= 32 bits).
8dbdcd070418 [xemacs-hg @ 2006-04-22 15:18:54 by stephent]
stephent
parents: 3025
diff changeset
217 In from_unicode, we're converting from Mule characters, which means
8dbdcd070418 [xemacs-hg @ 2006-04-22 15:18:54 by stephent]
stephent
parents: 3025
diff changeset
218 that the values being converted to are only 96x96, and we can save
8dbdcd070418 [xemacs-hg @ 2006-04-22 15:18:54 by stephent]
stephent
parents: 3025
diff changeset
219 space by using shorts (signedness doesn't matter). */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
220 static int *to_unicode_blank_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
221 static int **to_unicode_blank_2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
222
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
223 static short *from_unicode_blank_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
224 static short **from_unicode_blank_2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
225 static short ***from_unicode_blank_3;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
226 static short ****from_unicode_blank_4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
227
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
228 static const struct memory_description to_unicode_level_0_desc_1[] = {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
229 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
230 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
231
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
232 static const struct sized_memory_description to_unicode_level_0_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
233 sizeof (int), to_unicode_level_0_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
234 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
235
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
236 static const struct memory_description to_unicode_level_1_desc_1[] = {
2551
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
237 { XD_BLOCK_PTR, 0, 96, { &to_unicode_level_0_desc } },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
238 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
239 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
240
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
241 static const struct sized_memory_description to_unicode_level_1_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
242 sizeof (void *), to_unicode_level_1_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
243 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
244
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
245 static const struct memory_description to_unicode_description_1[] = {
2551
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
246 { XD_BLOCK_PTR, 1, 96, { &to_unicode_level_0_desc } },
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
247 { XD_BLOCK_PTR, 2, 96, { &to_unicode_level_1_desc } },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
248 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
249 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
250
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
251 /* Not static because each charset has a set of to and from tables and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
252 needs to describe them to pdump. */
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
253 const struct sized_memory_description to_unicode_description = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
254 sizeof (void *), to_unicode_description_1
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
255 };
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
256
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
257 /* Used only for to_unicode_blank_2 */
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
258 static const struct memory_description to_unicode_level_2_desc_1[] = {
2551
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
259 { XD_BLOCK_PTR, 0, 96, { &to_unicode_level_1_desc } },
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
260 { XD_END }
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
261 };
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
262
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
263 static const struct memory_description from_unicode_level_0_desc_1[] = {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
264 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
265 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
266
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
267 static const struct sized_memory_description from_unicode_level_0_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
268 sizeof (short), from_unicode_level_0_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
269 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
270
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
271 static const struct memory_description from_unicode_level_1_desc_1[] = {
2551
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
272 { XD_BLOCK_PTR, 0, 256, { &from_unicode_level_0_desc } },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
273 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
274 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
275
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
276 static const struct sized_memory_description from_unicode_level_1_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
277 sizeof (void *), from_unicode_level_1_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
278 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
279
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
280 static const struct memory_description from_unicode_level_2_desc_1[] = {
2551
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
281 { XD_BLOCK_PTR, 0, 256, { &from_unicode_level_1_desc } },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
282 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
283 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
284
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
285 static const struct sized_memory_description from_unicode_level_2_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
286 sizeof (void *), from_unicode_level_2_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
287 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
288
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
289 static const struct memory_description from_unicode_level_3_desc_1[] = {
2551
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
290 { XD_BLOCK_PTR, 0, 256, { &from_unicode_level_2_desc } },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
291 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
292 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
293
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
294 static const struct sized_memory_description from_unicode_level_3_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
295 sizeof (void *), from_unicode_level_3_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
296 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
297
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
298 static const struct memory_description from_unicode_description_1[] = {
2551
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
299 { XD_BLOCK_PTR, 1, 256, { &from_unicode_level_0_desc } },
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
300 { XD_BLOCK_PTR, 2, 256, { &from_unicode_level_1_desc } },
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
301 { XD_BLOCK_PTR, 3, 256, { &from_unicode_level_2_desc } },
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
302 { XD_BLOCK_PTR, 4, 256, { &from_unicode_level_3_desc } },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
303 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
304 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
305
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
306 /* Not static because each charset has a set of to and from tables and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
307 needs to describe them to pdump. */
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
308 const struct sized_memory_description from_unicode_description = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
309 sizeof (void *), from_unicode_description_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
310 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
311
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
312 /* Used only for from_unicode_blank_4 */
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
313 static const struct memory_description from_unicode_level_4_desc_1[] = {
2551
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
314 { XD_BLOCK_PTR, 0, 256, { &from_unicode_level_3_desc } },
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
315 { XD_END }
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
316 };
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
317
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
318 static Lisp_Object_dynarr *unicode_precedence_dynarr;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
319
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
320 static const struct memory_description lod_description_1[] = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
321 XD_DYNARR_DESC (Lisp_Object_dynarr, &lisp_object_description),
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
322 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
323 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
324
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
325 static const struct sized_memory_description lisp_object_dynarr_description = {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
326 sizeof (Lisp_Object_dynarr),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
327 lod_description_1
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
328 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
329
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
330 Lisp_Object Vlanguage_unicode_precedence_list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
331 Lisp_Object Vdefault_unicode_precedence_list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
332
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
333 Lisp_Object Qignore_first_column;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
334
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
335 Lisp_Object Vcurrent_jit_charset;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
336 Lisp_Object Qlast_allocated_character;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
337 Lisp_Object Qccl_encode_to_ucs_2;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
338
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
339 Lisp_Object Vnumber_of_jit_charsets;
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
340 Lisp_Object Vlast_jit_charset_final;
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
341 Lisp_Object Vcharset_descr;
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
342
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
343
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
344
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
345 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
346 /* Unicode implementation */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
347 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
348
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
349 #define BREAKUP_UNICODE_CODE(val, u1, u2, u3, u4, levels) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
350 do { \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
351 int buc_val = (val); \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
352 \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
353 (u1) = buc_val >> 24; \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
354 (u2) = (buc_val >> 16) & 255; \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
355 (u3) = (buc_val >> 8) & 255; \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
356 (u4) = buc_val & 255; \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
357 (levels) = (buc_val <= 0xFF ? 1 : \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
358 buc_val <= 0xFFFF ? 2 : \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
359 buc_val <= 0xFFFFFF ? 3 : \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
360 4); \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
361 } while (0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
362
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
363 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
364 init_blank_unicode_tables (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
365 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
366 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
367
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
368 from_unicode_blank_1 = xnew_array (short, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
369 from_unicode_blank_2 = xnew_array (short *, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
370 from_unicode_blank_3 = xnew_array (short **, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
371 from_unicode_blank_4 = xnew_array (short ***, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
372 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
373 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
374 /* #### IMWTK: Why does using -1 here work? Simply because there are
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
375 no existing 96x96 charsets?
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
376
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
377 Answer: I don't understand the concern. -1 indicates there is no
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
378 entry for this particular codepoint, which is always the case for
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
379 blank tables. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
380 from_unicode_blank_1[i] = (short) -1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
381 from_unicode_blank_2[i] = from_unicode_blank_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
382 from_unicode_blank_3[i] = from_unicode_blank_2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
383 from_unicode_blank_4[i] = from_unicode_blank_3;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
384 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
385
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
386 to_unicode_blank_1 = xnew_array (int, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
387 to_unicode_blank_2 = xnew_array (int *, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
388 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
389 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
390 /* Here -1 is guaranteed OK. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
391 to_unicode_blank_1[i] = -1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
392 to_unicode_blank_2[i] = to_unicode_blank_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
393 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
394 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
395
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
396 static void *
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
397 create_new_from_unicode_table (int level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
398 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
399 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
400 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
401 /* WARNING: If you are thinking of compressing these, keep in
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
402 mind that sizeof (short) does not equal sizeof (short *). */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
403 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
404 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
405 short *newtab = xnew_array (short, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
406 memcpy (newtab, from_unicode_blank_1, 256 * sizeof (short));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
407 return newtab;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
408 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
409 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
410 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
411 short **newtab = xnew_array (short *, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
412 memcpy (newtab, from_unicode_blank_2, 256 * sizeof (short *));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
413 return newtab;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
414 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
415 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
416 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
417 short ***newtab = xnew_array (short **, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
418 memcpy (newtab, from_unicode_blank_3, 256 * sizeof (short **));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
419 return newtab;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
420 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
421 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
422 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
423 short ****newtab = xnew_array (short ***, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
424 memcpy (newtab, from_unicode_blank_4, 256 * sizeof (short ***));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
425 return newtab;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
426 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
427 default:
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
428 ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
429 return 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
430 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
431 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
432
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
433 /* Allocate and blank the tables.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
434 Loading them up is done by load-unicode-mapping-table. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
435 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
436 init_charset_unicode_tables (Lisp_Object charset)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
437 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
438 if (XCHARSET_DIMENSION (charset) == 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
439 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
440 int *to_table = xnew_array (int, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
441 memcpy (to_table, to_unicode_blank_1, 96 * sizeof (int));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
442 XCHARSET_TO_UNICODE_TABLE (charset) = to_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
443 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
444 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
445 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
446 int **to_table = xnew_array (int *, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
447 memcpy (to_table, to_unicode_blank_2, 96 * sizeof (int *));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
448 XCHARSET_TO_UNICODE_TABLE (charset) = to_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
449 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
450
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
451 {
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
452 XCHARSET_FROM_UNICODE_TABLE (charset) =
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
453 create_new_from_unicode_table (1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
454 XCHARSET_FROM_UNICODE_LEVELS (charset) = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
455 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
456 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
457
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
458 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
459 free_from_unicode_table (void *table, int level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
460 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
461 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
462
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
463 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
464 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
465 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
466 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
467 short **tab = (short **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
468 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
469 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
470 if (tab[i] != from_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
471 free_from_unicode_table (tab[i], 1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
472 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
473 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
474 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
475 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
476 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
477 short ***tab = (short ***) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
478 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
479 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
480 if (tab[i] != from_unicode_blank_2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
481 free_from_unicode_table (tab[i], 2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
482 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
483 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
484 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
485 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
486 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
487 short ****tab = (short ****) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
488 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
489 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
490 if (tab[i] != from_unicode_blank_3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
491 free_from_unicode_table (tab[i], 3);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
492 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
493 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
494 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
495 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
496
1726
a8d8f419b459 [xemacs-hg @ 2003-09-30 15:26:34 by james]
james
parents: 1429
diff changeset
497 xfree (table, void *);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
498 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
499
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
500 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
501 free_to_unicode_table (void *table, int level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
502 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
503 if (level == 2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
504 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
505 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
506 int **tab = (int **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
507
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
508 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
509 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
510 if (tab[i] != to_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
511 free_to_unicode_table (tab[i], 1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
512 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
513 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
514
1726
a8d8f419b459 [xemacs-hg @ 2003-09-30 15:26:34 by james]
james
parents: 1429
diff changeset
515 xfree (table, void *);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
516 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
517
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
518 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
519 free_charset_unicode_tables (Lisp_Object charset)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
520 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
521 free_to_unicode_table (XCHARSET_TO_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
522 XCHARSET_DIMENSION (charset));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
523 free_from_unicode_table (XCHARSET_FROM_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
524 XCHARSET_FROM_UNICODE_LEVELS (charset));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
525 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
526
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
527 #ifdef MEMORY_USAGE_STATS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
528
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
529 static Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
530 compute_from_unicode_table_size_1 (void *table, int level,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
531 struct overhead_stats *stats)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
532 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
533 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
534 Bytecount size = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
535
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
536 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
537 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
538 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
539 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
540 short **tab = (short **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
541 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
542 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
543 if (tab[i] != from_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
544 size += compute_from_unicode_table_size_1 (tab[i], 1, stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
545 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
546 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
547 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
548 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
549 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
550 short ***tab = (short ***) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
551 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
552 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
553 if (tab[i] != from_unicode_blank_2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
554 size += compute_from_unicode_table_size_1 (tab[i], 2, stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
555 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
556 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
557 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
558 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
559 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
560 short ****tab = (short ****) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
561 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
562 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
563 if (tab[i] != from_unicode_blank_3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
564 size += compute_from_unicode_table_size_1 (tab[i], 3, stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
565 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
566 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
567 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
568 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
569
3024
b7f26b2f78bd [xemacs-hg @ 2005-10-25 08:32:40 by ben]
ben
parents: 3017
diff changeset
570 size += malloced_storage_size (table,
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
571 256 * (level == 1 ? sizeof (short) :
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
572 sizeof (void *)),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
573 stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
574 return size;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
575 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
576
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
577 static Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
578 compute_to_unicode_table_size_1 (void *table, int level,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
579 struct overhead_stats *stats)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
580 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
581 Bytecount size = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
582
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
583 if (level == 2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
584 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
585 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
586 int **tab = (int **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
587
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
588 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
589 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
590 if (tab[i] != to_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
591 size += compute_to_unicode_table_size_1 (tab[i], 1, stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
592 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
593 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
594
3024
b7f26b2f78bd [xemacs-hg @ 2005-10-25 08:32:40 by ben]
ben
parents: 3017
diff changeset
595 size += malloced_storage_size (table,
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
596 96 * (level == 1 ? sizeof (int) :
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
597 sizeof (void *)),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
598 stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
599 return size;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
600 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
601
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
602 Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
603 compute_from_unicode_table_size (Lisp_Object charset,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
604 struct overhead_stats *stats)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
605 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
606 return (compute_from_unicode_table_size_1
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
607 (XCHARSET_FROM_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
608 XCHARSET_FROM_UNICODE_LEVELS (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
609 stats));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
610 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
611
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
612 Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
613 compute_to_unicode_table_size (Lisp_Object charset,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
614 struct overhead_stats *stats)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
615 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
616 return (compute_to_unicode_table_size_1
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
617 (XCHARSET_TO_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
618 XCHARSET_DIMENSION (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
619 stats));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
620 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
621
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
622 #endif
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
623
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
624 #ifdef SLEDGEHAMMER_CHECK_UNICODE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
625
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
626 /* "Sledgehammer checks" are checks that verify the self-consistency
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
627 of an entire structure every time a change is about to be made or
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
628 has been made to the structure. Not fast but a pretty much
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
629 sure-fire way of flushing out any incorrectnesses in the algorithms
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
630 that create the structure.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
631
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
632 Checking only after a change has been made will speed things up by
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
633 a factor of 2, but it doesn't absolutely prove that the code just
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
634 checked caused the problem; perhaps it happened elsewhere, either
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
635 in some code you forgot to sledgehammer check or as a result of
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
636 data corruption. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
637
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
638 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
639 assert_not_any_blank_table (void *tab)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
640 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
641 assert (tab != from_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
642 assert (tab != from_unicode_blank_2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
643 assert (tab != from_unicode_blank_3);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
644 assert (tab != from_unicode_blank_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
645 assert (tab != to_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
646 assert (tab != to_unicode_blank_2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
647 assert (tab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
648 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
649
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
650 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
651 sledgehammer_check_from_table (Lisp_Object charset, void *table, int level,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
652 int codetop)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
653 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
654 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
655
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
656 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
657 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
658 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
659 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
660 short *tab = (short *) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
661 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
662 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
663 if (tab[i] != -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
664 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
665 Lisp_Object char_charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
666 int c1, c2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
667
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
668 assert (valid_ichar_p (tab[i]));
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
669 BREAKUP_ICHAR (tab[i], char_charset, c1, c2);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
670 assert (EQ (charset, char_charset));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
671 if (XCHARSET_DIMENSION (charset) == 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
672 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
673 int *to_table =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
674 (int *) XCHARSET_TO_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
675 assert_not_any_blank_table (to_table);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
676 assert (to_table[c1 - 32] == (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
677 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
678 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
679 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
680 int **to_table =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
681 (int **) XCHARSET_TO_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
682 assert_not_any_blank_table (to_table);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
683 assert_not_any_blank_table (to_table[c1 - 32]);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
684 assert (to_table[c1 - 32][c2 - 32] == (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
685 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
686 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
687 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
688 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
689 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
690 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
691 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
692 short **tab = (short **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
693 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
694 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
695 if (tab[i] != from_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
696 sledgehammer_check_from_table (charset, tab[i], 1,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
697 (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
698 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
699 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
700 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
701 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
702 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
703 short ***tab = (short ***) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
704 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
705 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
706 if (tab[i] != from_unicode_blank_2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
707 sledgehammer_check_from_table (charset, tab[i], 2,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
708 (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
709 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
710 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
711 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
712 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
713 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
714 short ****tab = (short ****) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
715 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
716 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
717 if (tab[i] != from_unicode_blank_3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
718 sledgehammer_check_from_table (charset, tab[i], 3,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
719 (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
720 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
721 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
722 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
723 default:
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
724 ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
725 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
726 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
727
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
728 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
729 sledgehammer_check_to_table (Lisp_Object charset, void *table, int level,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
730 int codetop)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
731 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
732 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
733
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
734 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
735 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
736 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
737 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
738 int *tab = (int *) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
739
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
740 if (XCHARSET_CHARS (charset) == 94)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
741 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
742 assert (tab[0] == -1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
743 assert (tab[95] == -1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
744 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
745
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
746 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
747 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
748 if (tab[i] != -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
749 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
750 int u4, u3, u2, u1, levels;
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
751 Ichar ch;
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
752 Ichar this_ch;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
753 short val;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
754 void *frtab = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
755
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
756 if (XCHARSET_DIMENSION (charset) == 1)
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
757 this_ch = make_ichar (charset, i + 32, 0);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
758 else
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
759 this_ch = make_ichar (charset, codetop + 32, i + 32);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
760
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
761 assert (tab[i] >= 0);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
762 BREAKUP_UNICODE_CODE (tab[i], u4, u3, u2, u1, levels);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
763 assert (levels <= XCHARSET_FROM_UNICODE_LEVELS (charset));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
764
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
765 switch (XCHARSET_FROM_UNICODE_LEVELS (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
766 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
767 case 1: val = ((short *) frtab)[u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
768 case 2: val = ((short **) frtab)[u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
769 case 3: val = ((short ***) frtab)[u3][u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
770 case 4: val = ((short ****) frtab)[u4][u3][u2][u1]; break;
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
771 default: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
772 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
773
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
774 ch = make_ichar (charset, val >> 8, val & 0xFF);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
775 assert (ch == this_ch);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
776
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
777 switch (XCHARSET_FROM_UNICODE_LEVELS (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
778 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
779 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
780 assert_not_any_blank_table (frtab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
781 frtab = ((short ****) frtab)[u4];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
782 /* fall through */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
783 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
784 assert_not_any_blank_table (frtab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
785 frtab = ((short ***) frtab)[u3];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
786 /* fall through */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
787 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
788 assert_not_any_blank_table (frtab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
789 frtab = ((short **) frtab)[u2];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
790 /* fall through */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
791 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
792 assert_not_any_blank_table (frtab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
793 break;
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
794 default: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
795 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
796 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
797 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
798 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
799 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
800 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
801 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
802 int **tab = (int **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
803
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
804 if (XCHARSET_CHARS (charset) == 94)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
805 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
806 assert (tab[0] == to_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
807 assert (tab[95] == to_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
808 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
809
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
810 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
811 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
812 if (tab[i] != to_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
813 sledgehammer_check_to_table (charset, tab[i], 1, i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
814 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
815 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
816 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
817 default:
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
818 ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
819 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
820 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
821
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
822 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
823 sledgehammer_check_unicode_tables (Lisp_Object charset)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
824 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
825 /* verify that the blank tables have not been modified */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
826 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
827 int from_level = XCHARSET_FROM_UNICODE_LEVELS (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
828 int to_level = XCHARSET_FROM_UNICODE_LEVELS (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
829
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
830 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
831 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
832 assert (from_unicode_blank_1[i] == (short) -1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
833 assert (from_unicode_blank_2[i] == from_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
834 assert (from_unicode_blank_3[i] == from_unicode_blank_2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
835 assert (from_unicode_blank_4[i] == from_unicode_blank_3);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
836 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
837
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
838 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
839 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
840 assert (to_unicode_blank_1[i] == -1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
841 assert (to_unicode_blank_2[i] == to_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
842 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
843
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
844 assert (from_level >= 1 && from_level <= 4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
845
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
846 sledgehammer_check_from_table (charset,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
847 XCHARSET_FROM_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
848 from_level, 0);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
849
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
850 sledgehammer_check_to_table (charset,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
851 XCHARSET_TO_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
852 XCHARSET_DIMENSION (charset), 0);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
853 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
854
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
855 #endif /* SLEDGEHAMMER_CHECK_UNICODE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
856
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
857 static void
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
858 set_unicode_conversion (Ichar chr, int code)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
859 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
860 Lisp_Object charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
861 int c1, c2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
862
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
863 BREAKUP_ICHAR (chr, charset, c1, c2);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
864
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
865 /* I tried an assert on code > 255 || chr == code, but that fails because
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
866 Mule gives many Latin characters separate code points for different
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
867 ISO 8859 coded character sets. Obvious in hindsight.... */
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
868 assert (!EQ (charset, Vcharset_ascii) || chr == code);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
869 assert (!EQ (charset, Vcharset_latin_iso8859_1) || chr == code);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
870 assert (!EQ (charset, Vcharset_control_1) || chr == code);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
871
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
872 /* This assert is needed because it is simply unimplemented. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
873 assert (!EQ (charset, Vcharset_composite));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
874
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
875 #ifdef SLEDGEHAMMER_CHECK_UNICODE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
876 sledgehammer_check_unicode_tables (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
877 #endif
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
878
2704
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
879 if (EQ(charset, Vcharset_ascii) || EQ(charset, Vcharset_control_1))
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
880 return;
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
881
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
882 /* First, the char -> unicode translation */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
883
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
884 if (XCHARSET_DIMENSION (charset) == 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
885 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
886 int *to_table = (int *) XCHARSET_TO_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
887 to_table[c1 - 32] = code;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
888 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
889 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
890 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
891 int **to_table_2 = (int **) XCHARSET_TO_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
892 int *to_table_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
893
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
894 assert (XCHARSET_DIMENSION (charset) == 2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
895 to_table_1 = to_table_2[c1 - 32];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
896 if (to_table_1 == to_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
897 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
898 to_table_1 = xnew_array (int, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
899 memcpy (to_table_1, to_unicode_blank_1, 96 * sizeof (int));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
900 to_table_2[c1 - 32] = to_table_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
901 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
902 to_table_1[c2 - 32] = code;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
903 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
904
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
905 /* Then, unicode -> char: much harder */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
906
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
907 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
908 int charset_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
909 int u4, u3, u2, u1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
910 int code_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
911 BREAKUP_UNICODE_CODE (code, u4, u3, u2, u1, code_levels);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
912
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
913 charset_levels = XCHARSET_FROM_UNICODE_LEVELS (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
914
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
915 /* Make sure the charset's tables have at least as many levels as
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
916 the code point has: Note that the charset is guaranteed to have
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
917 at least one level, because it was created that way */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
918 if (charset_levels < code_levels)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
919 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
920 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
921
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
922 assert (charset_levels > 0);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
923 for (i = 2; i <= code_levels; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
924 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
925 if (charset_levels < i)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
926 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
927 void *old_table = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
928 void *table = create_new_from_unicode_table (i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
929 XCHARSET_FROM_UNICODE_TABLE (charset) = table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
930
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
931 switch (i)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
932 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
933 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
934 ((short **) table)[0] = (short *) old_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
935 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
936 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
937 ((short ***) table)[0] = (short **) old_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
938 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
939 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
940 ((short ****) table)[0] = (short ***) old_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
941 break;
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
942 default: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
943 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
944 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
945 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
946
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
947 charset_levels = code_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
948 XCHARSET_FROM_UNICODE_LEVELS (charset) = code_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
949 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
950
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
951 /* Now, make sure there is a non-default table at each level */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
952 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
953 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
954 void *table = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
955
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
956 for (i = charset_levels; i >= 2; i--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
957 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
958 switch (i)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
959 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
960 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
961 if (((short ****) table)[u4] == from_unicode_blank_3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
962 ((short ****) table)[u4] =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
963 ((short ***) create_new_from_unicode_table (3));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
964 table = ((short ****) table)[u4];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
965 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
966 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
967 if (((short ***) table)[u3] == from_unicode_blank_2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
968 ((short ***) table)[u3] =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
969 ((short **) create_new_from_unicode_table (2));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
970 table = ((short ***) table)[u3];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
971 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
972 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
973 if (((short **) table)[u2] == from_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
974 ((short **) table)[u2] =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
975 ((short *) create_new_from_unicode_table (1));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
976 table = ((short **) table)[u2];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
977 break;
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
978 default: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
979 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
980 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
981 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
982
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
983 /* Finally, set the character */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
984
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
985 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
986 void *table = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
987 switch (charset_levels)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
988 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
989 case 1: ((short *) table)[u1] = (c1 << 8) + c2; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
990 case 2: ((short **) table)[u2][u1] = (c1 << 8) + c2; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
991 case 3: ((short ***) table)[u3][u2][u1] = (c1 << 8) + c2; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
992 case 4: ((short ****) table)[u4][u3][u2][u1] = (c1 << 8) + c2; break;
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
993 default: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
994 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
995 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
996 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
997
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
998 #ifdef SLEDGEHAMMER_CHECK_UNICODE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
999 sledgehammer_check_unicode_tables (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1000 #endif
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1001 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1002
788
026c5bf9c134 [xemacs-hg @ 2002-03-21 07:29:57 by ben]
ben
parents: 778
diff changeset
1003 int
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1004 ichar_to_unicode (Ichar chr)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1005 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1006 Lisp_Object charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1007 int c1, c2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1008
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1009 type_checking_assert (valid_ichar_p (chr));
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1010 /* This shortcut depends on the representation of an Ichar, see text.c. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1011 if (chr < 256)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1012 return (int) chr;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1013
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1014 BREAKUP_ICHAR (chr, charset, c1, c2);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1015 if (EQ (charset, Vcharset_composite))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1016 return -1; /* #### don't know how to handle */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1017 else if (XCHARSET_DIMENSION (charset) == 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1018 return ((int *) XCHARSET_TO_UNICODE_TABLE (charset))[c1 - 32];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1019 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1020 return ((int **) XCHARSET_TO_UNICODE_TABLE (charset))[c1 - 32][c2 - 32];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1021 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1022
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1023 static Ichar
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1024 get_free_codepoint(Lisp_Object charset)
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1025 {
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1026 Lisp_Object name = Fcharset_name(charset);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1027 Lisp_Object zeichen = Fget(name, Qlast_allocated_character, Qnil);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1028 Ichar res;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1029
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1030 /* Only allow this with the 96x96 character sets we are using for
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1031 temporary Unicode support. */
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1032 assert(2 == XCHARSET_DIMENSION(charset) && 96 == XCHARSET_CHARS(charset));
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1033
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1034 if (!NILP(zeichen))
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1035 {
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1036 int c1, c2;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1037
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1038 BREAKUP_ICHAR(XCHAR(zeichen), charset, c1, c2);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1039
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1040 if (127 == c1 && 127 == c2)
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1041 {
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1042 /* We've already used the hightest-numbered character in this
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1043 set--tell our caller to create another. */
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1044 return -1;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1045 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1046
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1047 if (127 == c2)
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1048 {
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1049 ++c1;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1050 c2 = 0x20;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1051 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1052 else
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1053 {
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1054 ++c2;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1055 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1056
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1057 res = make_ichar(charset, c1, c2);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1058 Fput(name, Qlast_allocated_character, make_char(res));
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1059 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1060 else
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1061 {
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1062 res = make_ichar(charset, 32, 32);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1063 Fput(name, Qlast_allocated_character, make_char(res));
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1064 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1065 return res;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1066 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1067
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1068 /* The just-in-time creation of XEmacs characters that correspond to unknown
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1069 Unicode code points happens when:
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1070
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1071 1. The lookup would otherwise fail.
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1072
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1073 2. The charsets array is the nil or the default.
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1074
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1075 If there are no free code points in the just-in-time Unicode character
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1076 set, and the charsets array is the default unicode precedence list,
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1077 create a new just-in-time Unicode character set, add it at the end of the
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1078 unicode precedence list, create the XEmacs character in that character
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1079 set, and return it. */
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1080
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1081 static Ichar
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1082 unicode_to_ichar (int code, Lisp_Object_dynarr *charsets)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1083 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1084 int u1, u2, u3, u4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1085 int code_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1086 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1087 int n = Dynarr_length (charsets);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1088
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1089 type_checking_assert (code >= 0);
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1090 /* This shortcut depends on the representation of an Ichar, see text.c.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1091 Note that it may _not_ be extended to U+00A0 to U+00FF (many ISO 8859
893
c9f067fd71a3 [xemacs-hg @ 2002-07-02 12:32:34 by stephent]
stephent
parents: 877
diff changeset
1092 coded character sets have points that map into that region, so this
c9f067fd71a3 [xemacs-hg @ 2002-07-02 12:32:34 by stephent]
stephent
parents: 877
diff changeset
1093 function is many-valued). */
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1094 if (code < 0xA0)
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1095 return (Ichar) code;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1096
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1097 BREAKUP_UNICODE_CODE (code, u4, u3, u2, u1, code_levels);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1098
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1099 for (i = 0; i < n; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1100 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1101 Lisp_Object charset = Dynarr_at (charsets, i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1102 int charset_levels = XCHARSET_FROM_UNICODE_LEVELS (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1103 if (charset_levels >= code_levels)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1104 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1105 void *table = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1106 short retval;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1107
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1108 switch (charset_levels)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1109 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1110 case 1: retval = ((short *) table)[u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1111 case 2: retval = ((short **) table)[u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1112 case 3: retval = ((short ***) table)[u3][u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1113 case 4: retval = ((short ****) table)[u4][u3][u2][u1]; break;
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
1114 default: ABORT (); retval = 0;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1115 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1116
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1117 if (retval != -1)
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1118 return make_ichar (charset, retval >> 8, retval & 0xFF);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1119 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1120 }
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1121
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1122 /* Only do the magic just-in-time assignment if we're using the default
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1123 list. */
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1124 if (unicode_precedence_dynarr == charsets)
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1125 {
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1126 if (NILP (Vcurrent_jit_charset) ||
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1127 (-1 == (i = get_free_codepoint(Vcurrent_jit_charset))))
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1128 {
3452
551c008d3777 [xemacs-hg @ 2006-06-14 06:10:08 by aidan]
aidan
parents: 3439
diff changeset
1129 Ibyte setname[32];
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1130 int number_of_jit_charsets = XINT (Vnumber_of_jit_charsets);
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1131 Ascbyte last_jit_charset_final = XCHAR (Vlast_jit_charset_final);
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1132
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1133 /* This final byte shit is, umm, not that cool. */
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1134 assert (last_jit_charset_final >= 0x30);
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1135
3452
551c008d3777 [xemacs-hg @ 2006-06-14 06:10:08 by aidan]
aidan
parents: 3439
diff changeset
1136 /* Assertion added partly because our Win32 layer doesn't
551c008d3777 [xemacs-hg @ 2006-06-14 06:10:08 by aidan]
aidan
parents: 3439
diff changeset
1137 support snprintf; with this, we're sure it won't overflow
551c008d3777 [xemacs-hg @ 2006-06-14 06:10:08 by aidan]
aidan
parents: 3439
diff changeset
1138 the buffer. */
551c008d3777 [xemacs-hg @ 2006-06-14 06:10:08 by aidan]
aidan
parents: 3439
diff changeset
1139 assert(100 > number_of_jit_charsets);
551c008d3777 [xemacs-hg @ 2006-06-14 06:10:08 by aidan]
aidan
parents: 3439
diff changeset
1140
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1141 qxesprintf(setname, "jit-ucs-charset-%d", number_of_jit_charsets);
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1142
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1143 Vcurrent_jit_charset = Fmake_charset
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1144 (intern((const CIbyte *)setname), Vcharset_descr,
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1145 /* Set encode-as-utf-8 to t, to have this character set written
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1146 using UTF-8 escapes in escape-quoted and ctext. This
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1147 sidesteps the fact that our internal character -> Unicode
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1148 mapping is not stable from one invocation to the next. */
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1149 nconc2 (list2(Qencode_as_utf_8, Qt),
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1150 nconc2 (list6(Qcolumns, make_int(1), Qchars, make_int(96),
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1151 Qdimension, make_int(2)),
3659
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
1152 list6(Qregistries, Qunicode_registries,
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1153 Qfinal, make_char(last_jit_charset_final),
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1154 /* This CCL program is initialised in
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1155 unicode.el. */
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1156 Qccl_program, Qccl_encode_to_ucs_2))));
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1157
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1158 /* Record for the Unicode infrastructure that we've created
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1159 this character set. */
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1160 Vnumber_of_jit_charsets = make_int (number_of_jit_charsets + 1);
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1161 Vlast_jit_charset_final = make_char (last_jit_charset_final + 1);
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1162
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1163 i = get_free_codepoint(Vcurrent_jit_charset);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1164 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1165
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1166 if (-1 != i)
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1167 {
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1168 set_unicode_conversion((Ichar)i, code);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1169 /* No need to add the charset to the end of the list; it's done
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1170 automatically. */
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1171 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1172 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1173 return (Ichar) i;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1174 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1175
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1176 /* Add charsets to precedence list.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1177 LIST must be a list of charsets. Charsets which are in the list more
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1178 than once are given the precedence implied by their earliest appearance.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1179 Later appearances are ignored. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1180 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1181 add_charsets_to_precedence_list (Lisp_Object list, int *lbs,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1182 Lisp_Object_dynarr *dynarr)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1183 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1184 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1185 EXTERNAL_LIST_LOOP_2 (elt, list)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1186 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1187 Lisp_Object charset = Fget_charset (elt);
778
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
1188 int lb = XCHARSET_LEADING_BYTE (charset);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1189 if (lbs[lb - MIN_LEADING_BYTE] == 0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1190 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1191 Dynarr_add (dynarr, charset);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1192 lbs[lb - MIN_LEADING_BYTE] = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1193 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1194 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1195 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1196 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1197
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1198 /* Rebuild the charset precedence array.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1199 The "charsets preferred for the current language" get highest precedence,
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1200 followed by the "charsets preferred by default", ordered as in
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1201 Vlanguage_unicode_precedence_list and Vdefault_unicode_precedence_list,
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1202 respectively. All remaining charsets follow in an arbitrary order. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1203 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1204 recalculate_unicode_precedence (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1205 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1206 int lbs[NUM_LEADING_BYTES];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1207 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1208
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1209 for (i = 0; i < NUM_LEADING_BYTES; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1210 lbs[i] = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1211
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1212 Dynarr_reset (unicode_precedence_dynarr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1213
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1214 add_charsets_to_precedence_list (Vlanguage_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1215 lbs, unicode_precedence_dynarr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1216 add_charsets_to_precedence_list (Vdefault_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1217 lbs, unicode_precedence_dynarr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1218
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1219 for (i = 0; i < NUM_LEADING_BYTES; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1220 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1221 if (lbs[i] == 0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1222 {
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
1223 Lisp_Object charset = charset_by_leading_byte (i + MIN_LEADING_BYTE);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1224 if (!NILP (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1225 Dynarr_add (unicode_precedence_dynarr, charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1226 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1227 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1228 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1229
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1230 DEFUN ("unicode-precedence-list",
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1231 Funicode_precedence_list,
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1232 0, 0, 0, /*
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1233 Return the precedence order among charsets used for Unicode decoding.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1234
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1235 Value is a list of charsets, which are searched in order for a translation
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1236 matching a given Unicode character.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1237
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1238 The highest precedence is given to the language-specific precedence list of
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1239 charsets, defined by `set-language-unicode-precedence-list'. These are
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1240 followed by charsets in the default precedence list, defined by
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1241 `set-default-unicode-precedence-list'. Charsets occurring multiple times are
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1242 given precedence according to their first occurrance in either list. These
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1243 are followed by the remaining charsets, in some arbitrary order.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1244
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1245 The language-specific precedence list is meant to be set as part of the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1246 language environment initialization; the default precedence list is meant
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1247 to be set by the user.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1248
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1249 #### NOTE: This interface may be changed.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1250 */
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1251 ())
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1252 {
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1253 int i;
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1254 Lisp_Object list = Qnil;
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1255
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1256 for (i = Dynarr_length (unicode_precedence_dynarr) - 1; i >= 0; i--)
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1257 list = Fcons (Dynarr_at (unicode_precedence_dynarr, i), list);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1258 return list;
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1259 }
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1260
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1261
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1262 /* #### This interface is wrong. Cyrillic users and Chinese users are going
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1263 to have varying opinions about whether ISO Cyrillic, KOI8-R, or Windows
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1264 1251 should take precedence, and whether Big Five or CNS should take
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1265 precedence, respectively. This means that users are sometimes going to
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1266 want to set Vlanguage_unicode_precedence_list.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1267 Furthermore, this should be language-local (buffer-local would be a
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1268 reasonable approximation).
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1269
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1270 Answer: You are right, this needs rethinking. */
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1271 DEFUN ("set-language-unicode-precedence-list",
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1272 Fset_language_unicode_precedence_list,
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1273 1, 1, 0, /*
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1274 Set the language-specific precedence of charsets in Unicode decoding.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1275 LIST is a list of charsets.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1276 See `unicode-precedence-list' for more information.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1277
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1278 #### NOTE: This interface may be changed.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1279 */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1280 (list))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1281 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1282 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1283 EXTERNAL_LIST_LOOP_2 (elt, list)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1284 Fget_charset (elt);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1285 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1286
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1287 Vlanguage_unicode_precedence_list = list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1288 recalculate_unicode_precedence ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1289 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1290 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1291
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1292 DEFUN ("language-unicode-precedence-list",
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1293 Flanguage_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1294 0, 0, 0, /*
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1295 Return the language-specific precedence list used for Unicode decoding.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1296 See `unicode-precedence-list' for more information.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1297
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1298 #### NOTE: This interface may be changed.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1299 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1300 ())
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1301 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1302 return Vlanguage_unicode_precedence_list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1303 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1304
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1305 DEFUN ("set-default-unicode-precedence-list",
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1306 Fset_default_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1307 1, 1, 0, /*
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1308 Set the default precedence list used for Unicode decoding.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1309 This is intended to be set by the user. See
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1310 `unicode-precedence-list' for more information.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1311
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1312 #### NOTE: This interface may be changed.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1313 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1314 (list))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1315 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1316 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1317 EXTERNAL_LIST_LOOP_2 (elt, list)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1318 Fget_charset (elt);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1319 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1320
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1321 Vdefault_unicode_precedence_list = list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1322 recalculate_unicode_precedence ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1323 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1324 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1325
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1326 DEFUN ("default-unicode-precedence-list",
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1327 Fdefault_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1328 0, 0, 0, /*
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1329 Return the default precedence list used for Unicode decoding.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1330 See `unicode-precedence-list' for more information.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1331
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1332 #### NOTE: This interface may be changed.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1333 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1334 ())
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1335 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1336 return Vdefault_unicode_precedence_list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1337 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1338
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1339 DEFUN ("set-unicode-conversion", Fset_unicode_conversion,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1340 2, 2, 0, /*
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1341 Add conversion information between Unicode codepoints and characters.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1342 Conversions for U+0000 to U+00FF are hardwired to ASCII, Control-1, and
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1343 Latin-1. Attempts to set these values will raise an error.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1344
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1345 CHARACTER is one of the following:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1346
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1347 -- A character (in which case CODE must be a non-negative integer; values
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1348 above 2^20 - 1 are allowed for the purpose of specifying private
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1349 characters, but are illegal in standard Unicode---they will cause errors
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1350 when converted to utf-16)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1351 -- A vector of characters (in which case CODE must be a vector of integers
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1352 of the same length)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1353 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1354 (character, code))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1355 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1356 Lisp_Object charset;
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1357 int ichar, unicode;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1358
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1359 CHECK_CHAR (character);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1360 CHECK_NATNUM (code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1361
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1362 unicode = XINT (code);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1363 ichar = XCHAR (character);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1364 charset = ichar_charset (ichar);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1365
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1366 /* The translations of ASCII, Control-1, and Latin-1 code points are
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1367 hard-coded in ichar_to_unicode and unicode_to_ichar.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1368
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1369 Checking unicode < 256 && ichar != unicode is wrong because Mule gives
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1370 many Latin characters code points in a few different character sets. */
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1371 if ((EQ (charset, Vcharset_ascii) ||
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1372 EQ (charset, Vcharset_control_1) ||
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1373 EQ (charset, Vcharset_latin_iso8859_1))
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1374 && unicode != ichar)
893
c9f067fd71a3 [xemacs-hg @ 2002-07-02 12:32:34 by stephent]
stephent
parents: 877
diff changeset
1375 signal_error (Qinvalid_argument, "Can't change Unicode translation for ASCII, Control-1 or Latin-1 character",
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1376 character);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1377
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1378 /* #### Composite characters are not properly implemented yet. */
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1379 if (EQ (charset, Vcharset_composite))
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1380 signal_error (Qinvalid_argument, "Can't set Unicode translation for Composite char",
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1381 character);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1382
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1383 set_unicode_conversion (ichar, unicode);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1384 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1385 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1386
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1387 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1388
800
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
1389 DEFUN ("char-to-unicode", Fchar_to_unicode, 1, 1, 0, /*
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1390 Convert character to Unicode codepoint.
3025
facf3239ba30 [xemacs-hg @ 2005-10-25 11:16:19 by ben]
ben
parents: 3024
diff changeset
1391 When there is no international support (i.e. the `mule' feature is not
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1392 present), this function simply does `char-to-int'.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1393 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1394 (character))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1395 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1396 CHECK_CHAR (character);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1397 #ifdef MULE
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1398 return make_int (ichar_to_unicode (XCHAR (character)));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1399 #else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1400 return Fchar_to_int (character);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1401 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1402 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1403
800
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
1404 DEFUN ("unicode-to-char", Funicode_to_char, 1, 2, 0, /*
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1405 Convert Unicode codepoint to character.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1406 CODE should be a non-negative integer.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1407 If CHARSETS is given, it should be a list of charsets, and only those
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1408 charsets will be consulted, in the given order, for a translation.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1409 Otherwise, the default ordering of all charsets will be given (see
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1410 `set-unicode-charset-precedence').
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1411
3025
facf3239ba30 [xemacs-hg @ 2005-10-25 11:16:19 by ben]
ben
parents: 3024
diff changeset
1412 When there is no international support (i.e. the `mule' feature is not
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1413 present), this function simply does `int-to-char' and ignores the CHARSETS
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1414 argument.
2622
c8a9be2d4728 [xemacs-hg @ 2005-02-28 23:36:30 by aidan]
aidan
parents: 2551
diff changeset
1415
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1416 If the CODE would not otherwise be converted to an XEmacs character, and the
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1417 list of character sets to be consulted is nil or the default, a new XEmacs
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1418 character will be created for it in one of the `jit-ucs-charset' Mule
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1419 character sets, and that character will be returned.
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1420
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1421 This is limited to around 400,000 characters per XEmacs session, though, so
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1422 while normal usage will not be problematic, things like:
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1423
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1424 \(dotimes (i #x110000) (decode-char 'ucs i))
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1425
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1426 will eventually error. The long-term solution to this is Unicode as an
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1427 internal encoding.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1428 */
2333
ba4677f54a05 [xemacs-hg @ 2004-10-14 17:26:18 by james]
james
parents: 2286
diff changeset
1429 (code, USED_IF_MULE (charsets)))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1430 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1431 #ifdef MULE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1432 Lisp_Object_dynarr *dyn;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1433 int lbs[NUM_LEADING_BYTES];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1434 int c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1435
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1436 CHECK_NATNUM (code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1437 c = XINT (code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1438 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1439 EXTERNAL_LIST_LOOP_2 (elt, charsets)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1440 Fget_charset (elt);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1441 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1442
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1443 if (NILP (charsets))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1444 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1445 Ichar ret = unicode_to_ichar (c, unicode_precedence_dynarr);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1446 if (ret == -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1447 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1448 return make_char (ret);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1449 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1450
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1451 dyn = Dynarr_new (Lisp_Object);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1452 memset (lbs, 0, NUM_LEADING_BYTES * sizeof (int));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1453 add_charsets_to_precedence_list (charsets, lbs, dyn);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1454 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1455 Ichar ret = unicode_to_ichar (c, dyn);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1456 Dynarr_free (dyn);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1457 if (ret == -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1458 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1459 return make_char (ret);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1460 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1461 #else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1462 CHECK_NATNUM (code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1463 return Fint_to_char (code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1464 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1465 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1466
872
79c6ff3eef26 [xemacs-hg @ 2002-06-20 21:18:01 by ben]
ben
parents: 867
diff changeset
1467 #ifdef MULE
79c6ff3eef26 [xemacs-hg @ 2002-06-20 21:18:01 by ben]
ben
parents: 867
diff changeset
1468
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1469 static Lisp_Object
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1470 cerrar_el_fulano (Lisp_Object fulano)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1471 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1472 FILE *file = (FILE *) get_opaque_ptr (fulano);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1473 retry_fclose (file);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1474 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1475 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1476
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1477 DEFUN ("load-unicode-mapping-table", Fload_unicode_mapping_table,
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1478 2, 6, 0, /*
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1479 Load Unicode tables with the Unicode mapping data in FILENAME for CHARSET.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1480 Data is text, in the form of one translation per line -- charset
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1481 codepoint followed by Unicode codepoint. Numbers are decimal or hex
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1482 \(preceded by 0x). Comments are marked with a #. Charset codepoints
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1483 for two-dimensional charsets have the first octet stored in the
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1484 high 8 bits of the hex number and the second in the low 8 bits.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1485
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1486 If START and END are given, only charset codepoints within the given
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1487 range will be processed. (START and END apply to the codepoints in the
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1488 file, before OFFSET is applied.)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1489
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1490 If OFFSET is given, that value will be added to all charset codepoints
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1491 in the file to obtain the internal charset codepoint. \(We assume
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1492 that octets in the table are in the range 33 to 126 or 32 to 127. If
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1493 you have a table in ku-ten form, with octets in the range 1 to 94, you
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1494 will have to use an offset of 5140, i.e. 0x2020.)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1495
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1496 FLAGS, if specified, control further how the tables are interpreted
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1497 and are used to special-case certain known format deviations in the
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1498 Unicode tables or in the charset:
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1499
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1500 `ignore-first-column'
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1501 The JIS X 0208 tables have 3 columns of data instead of 2. The first
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1502 column contains the Shift-JIS codepoint, which we ignore.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1503 `big5'
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1504 The charset codepoints are Big Five codepoints; convert it to the
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1505 hacked-up Mule codepoint in `chinese-big5-1' or `chinese-big5-2'.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1506 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1507 (filename, charset, start, end, offset, flags))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1508 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1509 int st = 0, en = INT_MAX, of = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1510 FILE *file;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1511 struct gcpro gcpro1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1512 char line[1025];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1513 int fondo = specpdl_depth ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1514 int ignore_first_column = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1515 int big5 = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1516
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1517 CHECK_STRING (filename);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1518 charset = Fget_charset (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1519 if (!NILP (start))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1520 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1521 CHECK_INT (start);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1522 st = XINT (start);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1523 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1524 if (!NILP (end))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1525 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1526 CHECK_INT (end);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1527 en = XINT (end);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1528 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1529 if (!NILP (offset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1530 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1531 CHECK_INT (offset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1532 of = XINT (offset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1533 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1534
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1535 if (!LISTP (flags))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1536 flags = list1 (flags);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1537
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1538 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1539 EXTERNAL_LIST_LOOP_2 (elt, flags)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1540 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1541 if (EQ (elt, Qignore_first_column))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1542 ignore_first_column = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1543 else if (EQ (elt, Qbig5))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1544 big5 = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1545 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1546 invalid_constant
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1547 ("Unrecognized `load-unicode-mapping-table' flag", elt);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1548 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1549 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1550
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1551 GCPRO1 (filename);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1552 filename = Fexpand_file_name (filename, Qnil);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1553 file = qxe_fopen (XSTRING_DATA (filename), READ_TEXT);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1554 if (!file)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1555 report_file_error ("Cannot open", filename);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1556 record_unwind_protect (cerrar_el_fulano, make_opaque_ptr (file));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1557 while (fgets (line, sizeof (line), file))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1558 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1559 char *p = line;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1560 int cp1, cp2, endcount;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1561 int cp1high, cp1low;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1562 int dummy;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1563
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1564 while (*p) /* erase all comments out of the line */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1565 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1566 if (*p == '#')
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1567 *p = '\0';
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1568 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1569 p++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1570 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1571 /* see if line is nothing but whitespace and skip if so */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1572 p = line + strspn (line, " \t\n\r\f");
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1573 if (!*p)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1574 continue;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1575 /* NOTE: It appears that MS Windows and Newlib sscanf() have
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1576 different interpretations for whitespace (== "skip all whitespace
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1577 at processing point"): Newlib requires at least one corresponding
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1578 whitespace character in the input, but MS allows none. The
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1579 following would be easier to write if we could count on the MS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1580 interpretation.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1581
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1582 Also, the return value does NOT include %n storage. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1583 if ((!ignore_first_column ?
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1584 sscanf (p, "%i %i%n", &cp1, &cp2, &endcount) < 2 :
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1585 sscanf (p, "%i %i %i%n", &dummy, &cp1, &cp2, &endcount) < 3)
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1586 /* #### Temporary code! Cygwin newlib fucked up scanf() handling
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1587 of numbers beginning 0x0... starting in 04/2004, in an attempt
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1588 to fix another bug. A partial fix for this was put in in
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1589 06/2004, but as of 10/2004 the value of ENDCOUNT returned in
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1590 such case is still wrong. If this gets fixed soon, remove
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1591 this code. --ben */
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1592 #ifndef CYGWIN_SCANF_BUG
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1593 || *(p + endcount + strspn (p + endcount, " \t\n\r\f"))
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1594 #endif
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1595 )
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1596 {
793
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1597 warn_when_safe (Qunicode, Qwarning,
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1598 "Unrecognized line in translation file %s:\n%s",
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1599 XSTRING_DATA (filename), line);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1600 continue;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1601 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1602 if (cp1 >= st && cp1 <= en)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1603 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1604 cp1 += of;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1605 if (cp1 < 0 || cp1 >= 65536)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1606 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1607 out_of_range:
793
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1608 warn_when_safe (Qunicode, Qwarning,
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1609 "Out of range first codepoint 0x%x in "
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1610 "translation file %s:\n%s",
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1611 cp1, XSTRING_DATA (filename), line);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1612 continue;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1613 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1614
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1615 cp1high = cp1 >> 8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1616 cp1low = cp1 & 255;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1617
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1618 if (big5)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1619 {
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1620 Ichar ch = decode_big5_char (cp1high, cp1low);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1621 if (ch == -1)
793
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1622
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1623 warn_when_safe (Qunicode, Qwarning,
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1624 "Out of range Big5 codepoint 0x%x in "
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1625 "translation file %s:\n%s",
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1626 cp1, XSTRING_DATA (filename), line);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1627 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1628 set_unicode_conversion (ch, cp2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1629 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1630 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1631 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1632 int l1, h1, l2, h2;
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1633 Ichar emch;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1634
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1635 switch (XCHARSET_TYPE (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1636 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1637 case CHARSET_TYPE_94: l1 = 33; h1 = 126; l2 = 0; h2 = 0; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1638 case CHARSET_TYPE_96: l1 = 32; h1 = 127; l2 = 0; h2 = 0; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1639 case CHARSET_TYPE_94X94: l1 = 33; h1 = 126; l2 = 33; h2 = 126;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1640 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1641 case CHARSET_TYPE_96X96: l1 = 32; h1 = 127; l2 = 32; h2 = 127;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1642 break;
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
1643 default: ABORT (); l1 = 0; h1 = 0; l2 = 0; h2 = 0;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1644 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1645
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1646 if (cp1high < l2 || cp1high > h2 || cp1low < l1 || cp1low > h1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1647 goto out_of_range;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1648
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1649 emch = (cp1high == 0 ? make_ichar (charset, cp1low, 0) :
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1650 make_ichar (charset, cp1high, cp1low));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1651 set_unicode_conversion (emch, cp2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1652 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1653 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1654 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1655
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1656 if (ferror (file))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1657 report_file_error ("IO error when reading", filename);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1658
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1659 unbind_to (fondo); /* close file */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1660 UNGCPRO;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1661 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1662 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1663
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1664 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1665
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1666
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1667 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1668 /* Unicode coding system */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1669 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1670
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1671 struct unicode_coding_system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1672 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1673 enum unicode_type type;
1887
1e5b7843dfa0 [xemacs-hg @ 2004-01-27 17:55:15 by james]
james
parents: 1726
diff changeset
1674 unsigned int little_endian :1;
1e5b7843dfa0 [xemacs-hg @ 2004-01-27 17:55:15 by james]
james
parents: 1726
diff changeset
1675 unsigned int need_bom :1;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1676 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1677
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1678 #define CODING_SYSTEM_UNICODE_TYPE(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1679 (CODING_SYSTEM_TYPE_DATA (codesys, unicode)->type)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1680 #define XCODING_SYSTEM_UNICODE_TYPE(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1681 CODING_SYSTEM_UNICODE_TYPE (XCODING_SYSTEM (codesys))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1682 #define CODING_SYSTEM_UNICODE_LITTLE_ENDIAN(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1683 (CODING_SYSTEM_TYPE_DATA (codesys, unicode)->little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1684 #define XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1685 CODING_SYSTEM_UNICODE_LITTLE_ENDIAN (XCODING_SYSTEM (codesys))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1686 #define CODING_SYSTEM_UNICODE_NEED_BOM(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1687 (CODING_SYSTEM_TYPE_DATA (codesys, unicode)->need_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1688 #define XCODING_SYSTEM_UNICODE_NEED_BOM(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1689 CODING_SYSTEM_UNICODE_NEED_BOM (XCODING_SYSTEM (codesys))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1690
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1691 struct unicode_coding_stream
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1692 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1693 /* decode */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1694 unsigned char counter;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1695 unsigned char indicated_length;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1696 int seen_char;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1697 /* encode */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1698 Lisp_Object current_charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1699 int current_char_boundary;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1700 int wrote_bom;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1701 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1702
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
1703 static const struct memory_description unicode_coding_system_description[] = {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1704 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1705 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1706
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
1707 DEFINE_CODING_SYSTEM_TYPE_WITH_DATA (unicode);
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
1708
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1709 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1710 decode_unicode_char (int ch, unsigned_char_dynarr *dst,
1887
1e5b7843dfa0 [xemacs-hg @ 2004-01-27 17:55:15 by james]
james
parents: 1726
diff changeset
1711 struct unicode_coding_stream *data,
1e5b7843dfa0 [xemacs-hg @ 2004-01-27 17:55:15 by james]
james
parents: 1726
diff changeset
1712 unsigned int ignore_bom)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1713 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1714 if (ch == 0xFEFF && !data->seen_char && ignore_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1715 ;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1716 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1717 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1718 #ifdef MULE
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1719 Ichar chr = unicode_to_ichar (ch, unicode_precedence_dynarr);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1720
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1721 if (chr != -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1722 {
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1723 Ibyte work[MAX_ICHAR_LEN];
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1724 int len;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1725
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1726 len = set_itext_ichar (work, chr);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1727 Dynarr_add_many (dst, work, len);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1728 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1729 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1730 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1731 Dynarr_add (dst, LEADING_BYTE_JAPANESE_JISX0208);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1732 Dynarr_add (dst, 34 + 128);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1733 Dynarr_add (dst, 46 + 128);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1734 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1735 #else
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1736 Dynarr_add (dst, (Ibyte) ch);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1737 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1738 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1739
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1740 data->seen_char = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1741 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1742
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1743 #define DECODE_ERROR_OCTET(octet, dst, data, ignore_bom) \
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1744 decode_unicode_char ((octet) + UNICODE_ERROR_OCTET_RANGE_START, \
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1745 dst, data, ignore_bom)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1746
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1747 static inline void
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1748 indicate_invalid_utf_8 (unsigned char indicated_length,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1749 unsigned char counter,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1750 int ch, unsigned_char_dynarr *dst,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1751 struct unicode_coding_stream *data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1752 unsigned int ignore_bom)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1753 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1754 Binbyte stored = indicated_length - counter;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1755 Binbyte mask = "\x00\x00\xC0\xE0\xF0\xF8\xFC"[indicated_length];
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1756
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1757 while (stored > 0)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1758 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1759 DECODE_ERROR_OCTET (((ch >> (6 * (stored - 1))) & 0x3f) | mask,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1760 dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1761 mask = 0x80, stored--;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1762 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1763 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1764
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1765 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1766 encode_unicode_char_1 (int code, unsigned_char_dynarr *dst,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1767 enum unicode_type type, unsigned int little_endian,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1768 int write_error_characters_as_such)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1769 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1770 switch (type)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1771 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1772 case UNICODE_UTF_16:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1773 if (little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1774 {
3952
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
1775 if (code < 0x10000) {
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
1776 Dynarr_add (dst, (unsigned char) (code & 255));
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
1777 Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1778 } else if (write_error_characters_as_such &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1779 code >= UNICODE_ERROR_OCTET_RANGE_START &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1780 code < (UNICODE_ERROR_OCTET_RANGE_START + 0x100))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1781 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1782 Dynarr_add (dst, (unsigned char) ((code & 0xFF)));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1783 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1784 else if (code < 0x110000)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1785 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1786 /* Little endian; least significant byte first. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1787 int first, second;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1788
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1789 CODE_TO_UTF_16_SURROGATES(code, first, second);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1790
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1791 Dynarr_add (dst, (unsigned char) (first & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1792 Dynarr_add (dst, (unsigned char) ((first >> 8) & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1793
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1794 Dynarr_add (dst, (unsigned char) (second & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1795 Dynarr_add (dst, (unsigned char) ((second >> 8) & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1796 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1797 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1798 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1799 /* Not valid Unicode. Pass U+FFFD, least significant byte
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1800 first. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1801 Dynarr_add (dst, (unsigned char) 0xFD);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1802 Dynarr_add (dst, (unsigned char) 0xFF);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1803 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1804 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1805 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1806 {
3952
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
1807 if (code < 0x10000) {
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
1808 Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
1809 Dynarr_add (dst, (unsigned char) (code & 255));
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1810 } else if (write_error_characters_as_such &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1811 code >= UNICODE_ERROR_OCTET_RANGE_START &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1812 code < (UNICODE_ERROR_OCTET_RANGE_START + 0x100))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1813 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1814 Dynarr_add (dst, (unsigned char) ((code & 0xFF)));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1815 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1816 else if (code < 0x110000)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1817 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1818 /* Big endian; most significant byte first. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1819 int first, second;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1820
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1821 CODE_TO_UTF_16_SURROGATES(code, first, second);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1822
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1823 Dynarr_add (dst, (unsigned char) ((first >> 8) & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1824 Dynarr_add (dst, (unsigned char) (first & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1825
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1826 Dynarr_add (dst, (unsigned char) ((second >> 8) & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1827 Dynarr_add (dst, (unsigned char) (second & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1828 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1829 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1830 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1831 /* Not valid Unicode. Pass U+FFFD, most significant byte
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1832 first. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1833 Dynarr_add (dst, (unsigned char) 0xFF);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1834 Dynarr_add (dst, (unsigned char) 0xFD);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1835 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1836 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1837 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1838
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1839 case UNICODE_UCS_4:
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1840 case UNICODE_UTF_32:
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1841 if (little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1842 {
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1843 if (write_error_characters_as_such &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1844 code >= UNICODE_ERROR_OCTET_RANGE_START &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1845 code < (UNICODE_ERROR_OCTET_RANGE_START + 0x100))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1846 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1847 Dynarr_add (dst, (unsigned char) ((code & 0xFF)));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1848 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1849 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1850 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1851 /* We generate and accept incorrect sequences here, which is
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1852 okay, in the interest of preservation of the user's
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1853 data. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1854 Dynarr_add (dst, (unsigned char) (code & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1855 Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1856 Dynarr_add (dst, (unsigned char) ((code >> 16) & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1857 Dynarr_add (dst, (unsigned char) (code >> 24));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1858 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1859 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1860 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1861 {
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1862 if (write_error_characters_as_such &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1863 code >= UNICODE_ERROR_OCTET_RANGE_START &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1864 code < (UNICODE_ERROR_OCTET_RANGE_START + 0x100))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1865 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1866 Dynarr_add (dst, (unsigned char) ((code & 0xFF)));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1867 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1868 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1869 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1870 /* We generate and accept incorrect sequences here, which is okay,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1871 in the interest of preservation of the user's data. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1872 Dynarr_add (dst, (unsigned char) (code >> 24));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1873 Dynarr_add (dst, (unsigned char) ((code >> 16) & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1874 Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1875 Dynarr_add (dst, (unsigned char) (code & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1876 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1877 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1878 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1879
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1880 case UNICODE_UTF_8:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1881 if (code <= 0x7f)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1882 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1883 Dynarr_add (dst, (unsigned char) code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1884 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1885 else if (code <= 0x7ff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1886 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1887 Dynarr_add (dst, (unsigned char) ((code >> 6) | 0xc0));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1888 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1889 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1890 else if (code <= 0xffff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1891 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1892 Dynarr_add (dst, (unsigned char) ((code >> 12) | 0xe0));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1893 Dynarr_add (dst, (unsigned char) (((code >> 6) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1894 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1895 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1896 else if (code <= 0x1fffff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1897 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1898 Dynarr_add (dst, (unsigned char) ((code >> 18) | 0xf0));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1899 Dynarr_add (dst, (unsigned char) (((code >> 12) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1900 Dynarr_add (dst, (unsigned char) (((code >> 6) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1901 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1902 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1903 else if (code <= 0x3ffffff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1904 {
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1905
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1906 #if !(UNICODE_ERROR_OCTET_RANGE_START > 0x1fffff \
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1907 && UNICODE_ERROR_OCTET_RANGE_START < 0x3ffffff)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1908 #error "This code needs to be rewritten. "
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1909 #endif
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1910 if (write_error_characters_as_such &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1911 code >= UNICODE_ERROR_OCTET_RANGE_START &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1912 code < (UNICODE_ERROR_OCTET_RANGE_START + 0x100))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1913 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1914 Dynarr_add (dst, (unsigned char) ((code & 0xFF)));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1915 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1916 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1917 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1918 Dynarr_add (dst, (unsigned char) ((code >> 24) | 0xf8));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1919 Dynarr_add (dst, (unsigned char) (((code >> 18) & 0x3f) | 0x80));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1920 Dynarr_add (dst, (unsigned char) (((code >> 12) & 0x3f) | 0x80));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1921 Dynarr_add (dst, (unsigned char) (((code >> 6) & 0x3f) | 0x80));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1922 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1923 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1924 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1925 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1926 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1927 Dynarr_add (dst, (unsigned char) ((code >> 30) | 0xfc));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1928 Dynarr_add (dst, (unsigned char) (((code >> 24) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1929 Dynarr_add (dst, (unsigned char) (((code >> 18) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1930 Dynarr_add (dst, (unsigned char) (((code >> 12) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1931 Dynarr_add (dst, (unsigned char) (((code >> 6) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1932 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1933 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1934 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1935
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
1936 case UNICODE_UTF_7: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1937
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
1938 default: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1939 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1940 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1941
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1942 /* Also used in mule-coding.c for UTF-8 handling in ISO 2022-oriented
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1943 encodings. */
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1944 void
2333
ba4677f54a05 [xemacs-hg @ 2004-10-14 17:26:18 by james]
james
parents: 2286
diff changeset
1945 encode_unicode_char (Lisp_Object USED_IF_MULE (charset), int h,
ba4677f54a05 [xemacs-hg @ 2004-10-14 17:26:18 by james]
james
parents: 2286
diff changeset
1946 int USED_IF_MULE (l), unsigned_char_dynarr *dst,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1947 enum unicode_type type, unsigned int little_endian,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1948 int write_error_characters_as_such)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1949 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1950 #ifdef MULE
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1951 int code = ichar_to_unicode (make_ichar (charset, h & 127, l & 127));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1952
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1953 if (code == -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1954 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1955 if (type != UNICODE_UTF_16 &&
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1956 XCHARSET_DIMENSION (charset) == 2 &&
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1957 XCHARSET_CHARS (charset) == 94)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1958 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1959 unsigned char final = XCHARSET_FINAL (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1960
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1961 if (('@' <= final) && (final < 0x7f))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1962 code = (0xe00000 + (final - '@') * 94 * 94
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1963 + ((h & 127) - 33) * 94 + (l & 127) - 33);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1964 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1965 code = '?';
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1966 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1967 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1968 code = '?';
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1969 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1970 #else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1971 int code = h;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1972 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1973
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1974 encode_unicode_char_1 (code, dst, type, little_endian,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1975 write_error_characters_as_such);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1976 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1977
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1978 static Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1979 unicode_convert (struct coding_stream *str, const UExtbyte *src,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1980 unsigned_char_dynarr *dst, Bytecount n)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1981 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1982 unsigned int ch = str->ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1983 struct unicode_coding_stream *data = CODING_STREAM_TYPE_DATA (str, unicode);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1984 enum unicode_type type =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1985 XCODING_SYSTEM_UNICODE_TYPE (str->codesys);
1887
1e5b7843dfa0 [xemacs-hg @ 2004-01-27 17:55:15 by james]
james
parents: 1726
diff changeset
1986 unsigned int little_endian =
1e5b7843dfa0 [xemacs-hg @ 2004-01-27 17:55:15 by james]
james
parents: 1726
diff changeset
1987 XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (str->codesys);
1e5b7843dfa0 [xemacs-hg @ 2004-01-27 17:55:15 by james]
james
parents: 1726
diff changeset
1988 unsigned int ignore_bom = XCODING_SYSTEM_UNICODE_NEED_BOM (str->codesys);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1989 Bytecount orign = n;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1990
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1991 if (str->direction == CODING_DECODE)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1992 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1993 unsigned char counter = data->counter;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1994 unsigned char indicated_length
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1995 = data->indicated_length;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1996
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1997 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1998 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1999 UExtbyte c = *src++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2000
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2001 switch (type)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2002 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2003 case UNICODE_UTF_8:
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2004 if (0 == counter)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2005 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2006 if (0 == (c & 0x80))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2007 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2008 /* ASCII. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2009 decode_unicode_char (c, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2010 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2011 else if (0 == (c & 0x40))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2012 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2013 /* Highest bit set, second highest not--there's
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2014 something wrong. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2015 DECODE_ERROR_OCTET (c, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2016 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2017 else if (0 == (c & 0x20))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2018 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2019 ch = c & 0x1f;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2020 counter = 1;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2021 indicated_length = 2;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2022 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2023 else if (0 == (c & 0x10))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2024 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2025 ch = c & 0x0f;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2026 counter = 2;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2027 indicated_length = 3;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2028 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2029 else if (0 == (c & 0x08))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2030 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2031 ch = c & 0x0f;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2032 counter = 3;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2033 indicated_length = 4;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2034 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2035 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2036 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2037 /* We don't supports lengths longer than 4 in
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2038 external-format data. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2039 DECODE_ERROR_OCTET (c, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2040
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2041 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2042 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2043 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2044 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2045 /* counter != 0 */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2046 if ((0 == (c & 0x80)) || (0 != (c & 0x40)))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2047 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2048 indicate_invalid_utf_8(indicated_length,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2049 counter,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2050 ch, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2051 if (c & 0x80)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2052 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2053 DECODE_ERROR_OCTET (c, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2054 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2055 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2056 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2057 /* The character just read is ASCII. Treat it as
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2058 such. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2059 decode_unicode_char (c, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2060 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2061 ch = 0;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2062 counter = 0;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2063 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2064 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2065 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2066 ch = (ch << 6) | (c & 0x3f);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2067 counter--;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2068 /* Just processed the final byte. Emit the character. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2069 if (!counter)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2070 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2071 /* Don't accept over-long sequences, surrogates,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2072 or codes above #x10FFFF. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2073 if ((ch < 0x80) ||
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2074 ((ch < 0x800) && indicated_length > 2) ||
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2075 ((ch < 0x10000) && indicated_length > 3) ||
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2076 valid_utf_16_surrogate(ch) || (ch > 0x110000))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2077 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2078 indicate_invalid_utf_8(indicated_length,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2079 counter,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2080 ch, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2081 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2082 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2083 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2084 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2085 decode_unicode_char (ch, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2086 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2087 ch = 0;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2088 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2089 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2090 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2091 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2092
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2093 case UNICODE_UTF_16:
3952
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
2094
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2095 if (little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2096 ch = (c << counter) | ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2097 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2098 ch = (ch << 8) | c;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2099
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2100 counter += 8;
3952
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
2101
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2102 if (16 == counter)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2103 {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2104 int tempch = ch;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2105
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2106 if (valid_utf_16_first_surrogate(ch))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2107 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2108 break;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2109 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2110 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2111 counter = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2112 decode_unicode_char (tempch, dst, data, ignore_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2113 }
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2114 else if (32 == counter)
3952
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
2115 {
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
2116 int tempch;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2117
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2118 if (!valid_utf_16_last_surrogate(ch & 0xFFFF))
3952
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
2119 {
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2120 DECODE_ERROR_OCTET ((ch >> 24) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2121 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2122 DECODE_ERROR_OCTET ((ch >> 16) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2123 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2124 DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2125 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2126 DECODE_ERROR_OCTET (ch & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2127 ignore_bom);
3952
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
2128 }
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2129 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2130 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2131 tempch = utf_16_surrogates_to_code((ch >> 16),
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2132 (ch & 0xffff));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2133 decode_unicode_char(tempch, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2134 }
3952
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
2135 ch = 0;
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
2136 counter = 0;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2137 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2138 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2139 assert(8 == counter || 24 == counter);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2140 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2141
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2142 case UNICODE_UCS_4:
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2143 case UNICODE_UTF_32:
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2144 if (little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2145 ch = (c << counter) | ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2146 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2147 ch = (ch << 8) | c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2148 counter += 8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2149 if (counter == 32)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2150 {
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2151 if (ch > 0x10ffff)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2152 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2153 /* ch is not a legal Unicode character. We're fine
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2154 with that in UCS-4, though not in UTF-32. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2155 if (UNICODE_UCS_4 == type && ch < 0x80000000)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2156 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2157 decode_unicode_char (ch, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2158 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2159 else if (little_endian)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2160 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2161 DECODE_ERROR_OCTET (ch & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2162 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2163 DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2164 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2165 DECODE_ERROR_OCTET ((ch >> 16) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2166 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2167 DECODE_ERROR_OCTET ((ch >> 24) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2168 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2169 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2170 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2171 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2172 DECODE_ERROR_OCTET ((ch >> 24) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2173 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2174 DECODE_ERROR_OCTET ((ch >> 16) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2175 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2176 DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2177 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2178 DECODE_ERROR_OCTET (ch & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2179 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2180 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2181 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2182 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2183 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2184 decode_unicode_char (ch, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2185 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2186 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2187 counter = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2188 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2189 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2190
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2191 case UNICODE_UTF_7:
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
2192 ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2193 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2194
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
2195 default: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2196 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2197
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2198 }
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2199
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2200 if (str->eof && ch)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2201 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2202 switch (type)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2203 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2204 case UNICODE_UTF_8:
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2205 indicate_invalid_utf_8(indicated_length,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2206 counter, ch, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2207 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2208 break;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2209
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2210 case UNICODE_UTF_16:
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2211 case UNICODE_UCS_4:
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2212 case UNICODE_UTF_32:
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2213 if (8 == counter)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2214 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2215 DECODE_ERROR_OCTET (ch, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2216 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2217 else if (16 == counter)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2218 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2219 if (little_endian)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2220 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2221 DECODE_ERROR_OCTET (ch & 0xFF, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2222 DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2223 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2224 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2225 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2226 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2227 DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2228 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2229 DECODE_ERROR_OCTET (ch & 0xFF, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2230 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2231 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2232 else if (24 == counter)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2233 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2234 if (little_endian)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2235 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2236 DECODE_ERROR_OCTET ((ch >> 16) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2237 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2238 DECODE_ERROR_OCTET (ch & 0xFF, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2239 DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2240 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2241 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2242 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2243 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2244 DECODE_ERROR_OCTET ((ch >> 16) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2245 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2246 DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2247 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2248 DECODE_ERROR_OCTET (ch & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2249 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2250 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2251 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2252 else assert(0);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2253 break;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2254 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2255 ch = 0;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2256 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2257
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2258 data->counter = counter;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2259 data->indicated_length = indicated_length;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2260 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2261 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2262 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2263 unsigned char char_boundary = data->current_char_boundary;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2264 Lisp_Object charset = data->current_charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2265
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2266 #ifdef ENABLE_COMPOSITE_CHARS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2267 /* flags for handling composite chars. We do a little switcheroo
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2268 on the source while we're outputting the composite char. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2269 Bytecount saved_n = 0;
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
2270 const Ibyte *saved_src = NULL;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2271 int in_composite = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2272
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2273 back_to_square_n:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2274 #endif /* ENABLE_COMPOSITE_CHARS */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2275
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2276 if (XCODING_SYSTEM_UNICODE_NEED_BOM (str->codesys) && !data->wrote_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2277 {
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2278 encode_unicode_char_1 (0xFEFF, dst, type, little_endian, 1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2279 data->wrote_bom = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2280 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2281
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2282 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2283 {
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
2284 Ibyte c = *src++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2285
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2286 #ifdef MULE
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
2287 if (byte_ascii_p (c))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2288 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2289 { /* Processing ASCII character */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2290 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2291 encode_unicode_char (Vcharset_ascii, c, 0, dst, type,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2292 little_endian, 1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2293
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2294 char_boundary = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2295 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2296 #ifdef MULE
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
2297 else if (ibyte_leading_byte_p (c) || ibyte_leading_byte_p (ch))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2298 { /* Processing Leading Byte */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2299 ch = 0;
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
2300 charset = charset_by_leading_byte (c);
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
2301 if (leading_byte_prefix_p(c))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2302 ch = c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2303 char_boundary = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2304 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2305 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2306 { /* Processing Non-ASCII character */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2307 char_boundary = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2308 if (EQ (charset, Vcharset_control_1))
2704
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
2309 /* See:
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
2310
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
2311 (Info-goto-node "(internals)Internal String Encoding")
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
2312
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
2313 for the rationale behind subtracting #xa0 from the
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
2314 character's code. */
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
2315 encode_unicode_char (Vcharset_control_1, c - 0xa0, 0, dst,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2316 type, little_endian, 1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2317 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2318 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2319 switch (XCHARSET_REP_BYTES (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2320 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2321 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2322 encode_unicode_char (charset, c, 0, dst, type,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2323 little_endian, 1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2324 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2325 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2326 if (XCHARSET_PRIVATE_P (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2327 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2328 encode_unicode_char (charset, c, 0, dst, type,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2329 little_endian, 1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2330 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2331 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2332 else if (ch)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2333 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2334 #ifdef ENABLE_COMPOSITE_CHARS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2335 if (EQ (charset, Vcharset_composite))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2336 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2337 if (in_composite)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2338 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2339 /* #### Bother! We don't know how to
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2340 handle this yet. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2341 encode_unicode_char (Vcharset_ascii, '~', 0,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2342 dst, type,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2343 little_endian, 1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2344 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2345 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2346 {
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
2347 Ichar emch = make_ichar (Vcharset_composite,
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2348 ch & 0x7F,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2349 c & 0x7F);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2350 Lisp_Object lstr =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2351 composite_char_string (emch);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2352 saved_n = n;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2353 saved_src = src;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2354 in_composite = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2355 src = XSTRING_DATA (lstr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2356 n = XSTRING_LENGTH (lstr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2357 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2358 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2359 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2360 #endif /* ENABLE_COMPOSITE_CHARS */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2361 encode_unicode_char (charset, ch, c, dst, type,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2362 little_endian, 1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2363 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2364 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2365 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2366 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2367 ch = c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2368 char_boundary = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2369 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2370 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2371 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2372 if (ch)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2373 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2374 encode_unicode_char (charset, ch, c, dst, type,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2375 little_endian, 1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2376 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2377 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2378 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2379 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2380 ch = c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2381 char_boundary = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2382 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2383 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2384 default:
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
2385 ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2386 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2387 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2388 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2389 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2390 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2391
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2392 #ifdef ENABLE_COMPOSITE_CHARS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2393 if (in_composite)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2394 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2395 n = saved_n;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2396 src = saved_src;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2397 in_composite = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2398 goto back_to_square_n; /* Wheeeeeeeee ..... */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2399 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2400 #endif /* ENABLE_COMPOSITE_CHARS */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2401
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2402 data->current_char_boundary = char_boundary;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2403 data->current_charset = charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2404
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2405 /* La palabra se hizo carne! */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2406 /* A palavra fez-se carne! */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2407 /* Whatever. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2408 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2409
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2410 str->ch = ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2411 return orign;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2412 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2413
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2414 /* DEFINE_DETECTOR (utf_7); */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2415 DEFINE_DETECTOR (utf_8);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2416 DEFINE_DETECTOR_CATEGORY (utf_8, utf_8);
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2417 DEFINE_DETECTOR_CATEGORY (utf_8, utf_8_bom);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2418 DEFINE_DETECTOR (ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2419 DEFINE_DETECTOR_CATEGORY (ucs_4, ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2420 DEFINE_DETECTOR (utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2421 DEFINE_DETECTOR_CATEGORY (utf_16, utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2422 DEFINE_DETECTOR_CATEGORY (utf_16, utf_16_little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2423 DEFINE_DETECTOR_CATEGORY (utf_16, utf_16_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2424 DEFINE_DETECTOR_CATEGORY (utf_16, utf_16_little_endian_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2425
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2426 struct ucs_4_detector
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2427 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2428 int in_ucs_4_byte;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2429 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2430
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2431 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2432 ucs_4_detect (struct detection_state *st, const UExtbyte *src,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2433 Bytecount n)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2434 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2435 struct ucs_4_detector *data = DETECTION_STATE_DATA (st, ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2436
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2437 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2438 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2439 UExtbyte c = *src++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2440 switch (data->in_ucs_4_byte)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2441 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2442 case 0:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2443 if (c >= 128)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2444 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2445 DET_RESULT (st, ucs_4) = DET_NEARLY_IMPOSSIBLE;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2446 return;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2447 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2448 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2449 data->in_ucs_4_byte++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2450 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2451 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2452 data->in_ucs_4_byte = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2453 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2454 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2455 data->in_ucs_4_byte++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2456 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2457 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2458
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2459 /* !!#### write this for real */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2460 DET_RESULT (st, ucs_4) = DET_AS_LIKELY_AS_UNLIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2461 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2462
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2463 struct utf_16_detector
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2464 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2465 unsigned int seen_ffff:1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2466 unsigned int seen_forward_bom:1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2467 unsigned int seen_rev_bom:1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2468 int byteno;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2469 int prev_char;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2470 int text, rev_text;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2471 int sep, rev_sep;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2472 int num_ascii;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2473 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2474
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2475 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2476 utf_16_detect (struct detection_state *st, const UExtbyte *src,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2477 Bytecount n)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2478 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2479 struct utf_16_detector *data = DETECTION_STATE_DATA (st, utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2480
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2481 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2482 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2483 UExtbyte c = *src++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2484 int prevc = data->prev_char;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2485 if (data->byteno == 1 && c == 0xFF && prevc == 0xFE)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2486 data->seen_forward_bom = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2487 else if (data->byteno == 1 && c == 0xFE && prevc == 0xFF)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2488 data->seen_rev_bom = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2489
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2490 if (data->byteno & 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2491 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2492 if (c == 0xFF && prevc == 0xFF)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2493 data->seen_ffff = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2494 if (prevc == 0
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2495 && (c == '\r' || c == '\n'
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2496 || (c >= 0x20 && c <= 0x7E)))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2497 data->text++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2498 if (c == 0
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2499 && (prevc == '\r' || prevc == '\n'
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2500 || (prevc >= 0x20 && prevc <= 0x7E)))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2501 data->rev_text++;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2502 /* #### 0x2028 is LINE SEPARATOR and 0x2029 is PARAGRAPH SEPARATOR.
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2503 I used to count these in text and rev_text but that is very bad,
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2504 as 0x2028 is also space + left-paren in ASCII, which is extremely
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2505 common. So, what do we do with these? */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2506 if (prevc == 0x20 && (c == 0x28 || c == 0x29))
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2507 data->sep++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2508 if (c == 0x20 && (prevc == 0x28 || prevc == 0x29))
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2509 data->rev_sep++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2510 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2511
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2512 if ((c >= ' ' && c <= '~') || c == '\n' || c == '\r' || c == '\t' ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2513 c == '\f' || c == '\v')
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2514 data->num_ascii++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2515 data->byteno++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2516 data->prev_char = c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2517 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2518
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2519 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2520 int variance_indicates_big_endian =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2521 (data->text >= 10
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2522 && (data->rev_text == 0
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2523 || data->text / data->rev_text >= 10));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2524 int variance_indicates_little_endian =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2525 (data->rev_text >= 10
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2526 && (data->text == 0
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2527 || data->rev_text / data->text >= 10));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2528
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2529 if (data->seen_ffff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2530 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2531 else if (data->seen_forward_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2532 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2533 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2534 if (variance_indicates_big_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2535 DET_RESULT (st, utf_16_bom) = DET_NEAR_CERTAINTY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2536 else if (variance_indicates_little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2537 DET_RESULT (st, utf_16_bom) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2538 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2539 DET_RESULT (st, utf_16_bom) = DET_QUITE_PROBABLE;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2540 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2541 else if (data->seen_forward_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2542 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2543 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2544 if (variance_indicates_big_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2545 DET_RESULT (st, utf_16_bom) = DET_NEAR_CERTAINTY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2546 else if (variance_indicates_little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2547 /* #### may need to rethink */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2548 DET_RESULT (st, utf_16_bom) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2549 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2550 /* #### may need to rethink */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2551 DET_RESULT (st, utf_16_bom) = DET_QUITE_PROBABLE;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2552 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2553 else if (data->seen_rev_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2554 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2555 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2556 if (variance_indicates_little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2557 DET_RESULT (st, utf_16_little_endian_bom) = DET_NEAR_CERTAINTY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2558 else if (variance_indicates_big_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2559 /* #### may need to rethink */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2560 DET_RESULT (st, utf_16_little_endian_bom) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2561 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2562 /* #### may need to rethink */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2563 DET_RESULT (st, utf_16_little_endian_bom) = DET_QUITE_PROBABLE;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2564 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2565 else if (variance_indicates_big_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2566 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2567 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2568 DET_RESULT (st, utf_16) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2569 DET_RESULT (st, utf_16_little_endian) = DET_SOMEWHAT_UNLIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2570 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2571 else if (variance_indicates_little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2572 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2573 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2574 DET_RESULT (st, utf_16) = DET_SOMEWHAT_UNLIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2575 DET_RESULT (st, utf_16_little_endian) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2576 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2577 else
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2578 {
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2579 /* #### FUCKME! There should really be an ASCII detector. This
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2580 would rule out the need to have this built-in here as
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2581 well. --ben */
1292
f3437b56874d [xemacs-hg @ 2003-02-13 09:57:04 by ben]
ben
parents: 1267
diff changeset
2582 int pct_ascii = data->byteno ? (100 * data->num_ascii) / data->byteno
f3437b56874d [xemacs-hg @ 2003-02-13 09:57:04 by ben]
ben
parents: 1267
diff changeset
2583 : 100;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2584
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2585 if (pct_ascii > 90)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2586 SET_DET_RESULTS (st, utf_16, DET_QUITE_IMPROBABLE);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2587 else if (pct_ascii > 75)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2588 SET_DET_RESULTS (st, utf_16, DET_SOMEWHAT_UNLIKELY);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2589 else
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2590 SET_DET_RESULTS (st, utf_16, DET_AS_LIKELY_AS_UNLIKELY);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2591 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2592 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2593 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2594
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2595 struct utf_8_detector
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2596 {
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2597 int byteno;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2598 int first_byte;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2599 int second_byte;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2600 int prev_byte;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2601 int in_utf_8_byte;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2602 int recent_utf_8_sequence;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2603 int seen_bogus_utf8;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2604 int seen_really_bogus_utf8;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2605 int seen_2byte_sequence;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2606 int seen_longer_sequence;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2607 int seen_iso2022_esc;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2608 int seen_iso_shift;
1887
1e5b7843dfa0 [xemacs-hg @ 2004-01-27 17:55:15 by james]
james
parents: 1726
diff changeset
2609 unsigned int seen_utf_bom:1;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2610 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2611
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2612 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2613 utf_8_detect (struct detection_state *st, const UExtbyte *src,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2614 Bytecount n)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2615 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2616 struct utf_8_detector *data = DETECTION_STATE_DATA (st, utf_8);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2617
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2618 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2619 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2620 UExtbyte c = *src++;
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2621 switch (data->byteno)
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2622 {
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2623 case 0:
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2624 data->first_byte = c;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2625 break;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2626 case 1:
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2627 data->second_byte = c;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2628 break;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2629 case 2:
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2630 if (data->first_byte == 0xef &&
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2631 data->second_byte == 0xbb &&
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2632 c == 0xbf)
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2633 data->seen_utf_bom = 1;
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2634 break;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2635 }
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2636
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2637 switch (data->in_utf_8_byte)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2638 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2639 case 0:
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2640 if (data->prev_byte == ISO_CODE_ESC && c >= 0x28 && c <= 0x2F)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2641 data->seen_iso2022_esc++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2642 else if (c == ISO_CODE_SI || c == ISO_CODE_SO)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2643 data->seen_iso_shift++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2644 else if (c >= 0xfc)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2645 data->in_utf_8_byte = 5;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2646 else if (c >= 0xf8)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2647 data->in_utf_8_byte = 4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2648 else if (c >= 0xf0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2649 data->in_utf_8_byte = 3;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2650 else if (c >= 0xe0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2651 data->in_utf_8_byte = 2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2652 else if (c >= 0xc0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2653 data->in_utf_8_byte = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2654 else if (c >= 0x80)
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2655 data->seen_bogus_utf8++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2656 if (data->in_utf_8_byte > 0)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2657 data->recent_utf_8_sequence = data->in_utf_8_byte;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2658 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2659 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2660 if ((c & 0xc0) != 0x80)
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2661 data->seen_really_bogus_utf8++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2662 else
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2663 {
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2664 data->in_utf_8_byte--;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2665 if (data->in_utf_8_byte == 0)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2666 {
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2667 if (data->recent_utf_8_sequence == 1)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2668 data->seen_2byte_sequence++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2669 else
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2670 {
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2671 assert (data->recent_utf_8_sequence >= 2);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2672 data->seen_longer_sequence++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2673 }
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2674 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2675 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2676 }
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2677
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2678 data->byteno++;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2679 data->prev_byte = c;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2680 }
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2681
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2682 /* either BOM or no BOM, but not both */
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2683 SET_DET_RESULTS (st, utf_8, DET_NEARLY_IMPOSSIBLE);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2684
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2685
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2686 if (data->seen_utf_bom)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2687 DET_RESULT (st, utf_8_bom) = DET_NEAR_CERTAINTY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2688 else
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2689 {
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2690 if (data->seen_really_bogus_utf8 ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2691 data->seen_bogus_utf8 >= 2)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2692 ; /* bogus */
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2693 else if (data->seen_bogus_utf8)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2694 DET_RESULT (st, utf_8) = DET_SOMEWHAT_UNLIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2695 else if ((data->seen_longer_sequence >= 5 ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2696 data->seen_2byte_sequence >= 10) &&
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2697 (!(data->seen_iso2022_esc + data->seen_iso_shift) ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2698 (data->seen_longer_sequence * 2 + data->seen_2byte_sequence) /
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2699 (data->seen_iso2022_esc + data->seen_iso_shift) >= 10))
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2700 /* heuristics, heuristics, we love heuristics */
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2701 DET_RESULT (st, utf_8) = DET_QUITE_PROBABLE;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2702 else if (data->seen_iso2022_esc ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2703 data->seen_iso_shift >= 3)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2704 DET_RESULT (st, utf_8) = DET_SOMEWHAT_UNLIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2705 else if (data->seen_longer_sequence ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2706 data->seen_2byte_sequence)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2707 DET_RESULT (st, utf_8) = DET_SOMEWHAT_LIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2708 else if (data->seen_iso_shift)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2709 DET_RESULT (st, utf_8) = DET_SOMEWHAT_UNLIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2710 else
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2711 DET_RESULT (st, utf_8) = DET_AS_LIKELY_AS_UNLIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2712 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2713 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2714
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2715 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2716 unicode_init_coding_stream (struct coding_stream *str)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2717 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2718 struct unicode_coding_stream *data =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2719 CODING_STREAM_TYPE_DATA (str, unicode);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2720 xzero (*data);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2721 data->current_charset = Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2722 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2723
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2724 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2725 unicode_rewind_coding_stream (struct coding_stream *str)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2726 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2727 unicode_init_coding_stream (str);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2728 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2729
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2730 static int
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2731 unicode_putprop (Lisp_Object codesys, Lisp_Object key, Lisp_Object value)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2732 {
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3659
diff changeset
2733 if (EQ (key, Qunicode_type))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2734 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2735 enum unicode_type type;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2736
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2737 if (EQ (value, Qutf_8))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2738 type = UNICODE_UTF_8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2739 else if (EQ (value, Qutf_16))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2740 type = UNICODE_UTF_16;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2741 else if (EQ (value, Qutf_7))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2742 type = UNICODE_UTF_7;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2743 else if (EQ (value, Qucs_4))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2744 type = UNICODE_UCS_4;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2745 else if (EQ (value, Qutf_32))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2746 type = UNICODE_UTF_32;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2747 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2748 invalid_constant ("Invalid Unicode type", key);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2749
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2750 XCODING_SYSTEM_UNICODE_TYPE (codesys) = type;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2751 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2752 else if (EQ (key, Qlittle_endian))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2753 XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (codesys) = !NILP (value);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2754 else if (EQ (key, Qneed_bom))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2755 XCODING_SYSTEM_UNICODE_NEED_BOM (codesys) = !NILP (value);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2756 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2757 return 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2758 return 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2759 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2760
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2761 static Lisp_Object
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2762 unicode_getprop (Lisp_Object coding_system, Lisp_Object prop)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2763 {
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3659
diff changeset
2764 if (EQ (prop, Qunicode_type))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2765 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2766 switch (XCODING_SYSTEM_UNICODE_TYPE (coding_system))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2767 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2768 case UNICODE_UTF_16: return Qutf_16;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2769 case UNICODE_UTF_8: return Qutf_8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2770 case UNICODE_UTF_7: return Qutf_7;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2771 case UNICODE_UCS_4: return Qucs_4;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2772 case UNICODE_UTF_32: return Qutf_32;
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
2773 default: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2774 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2775 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2776 else if (EQ (prop, Qlittle_endian))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2777 return XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (coding_system) ? Qt : Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2778 else if (EQ (prop, Qneed_bom))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2779 return XCODING_SYSTEM_UNICODE_NEED_BOM (coding_system) ? Qt : Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2780 return Qunbound;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2781 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2782
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2783 static void
2286
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
2784 unicode_print (Lisp_Object cs, Lisp_Object printcharfun,
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
2785 int UNUSED (escapeflag))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2786 {
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3659
diff changeset
2787 write_fmt_string_lisp (printcharfun, "(%s", 1,
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3659
diff changeset
2788 unicode_getprop (cs, Qunicode_type));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2789 if (XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (cs))
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
2790 write_c_string (printcharfun, ", little-endian");
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2791 if (XCODING_SYSTEM_UNICODE_NEED_BOM (cs))
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
2792 write_c_string (printcharfun, ", need-bom");
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
2793 write_c_string (printcharfun, ")");
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2794 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2795
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2796 int
2286
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
2797 dfc_coding_system_is_unicode (
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
2798 #ifdef WIN32_ANY
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
2799 Lisp_Object codesys
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
2800 #else
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
2801 Lisp_Object UNUSED (codesys)
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
2802 #endif
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
2803 )
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2804 {
1315
70921960b980 [xemacs-hg @ 2003-02-20 08:19:28 by ben]
ben
parents: 1292
diff changeset
2805 #ifdef WIN32_ANY
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2806 codesys = Fget_coding_system (codesys);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2807 return (EQ (XCODING_SYSTEM_TYPE (codesys), Qunicode) &&
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2808 XCODING_SYSTEM_UNICODE_TYPE (codesys) == UNICODE_UTF_16 &&
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2809 XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (codesys));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2810
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2811 #else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2812 return 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2813 #endif
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2814 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2815
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2816
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2817 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2818 /* Initialization */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2819 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2820
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2821 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2822 syms_of_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2823 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2824 #ifdef MULE
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
2825 DEFSUBR (Funicode_precedence_list);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2826 DEFSUBR (Fset_language_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2827 DEFSUBR (Flanguage_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2828 DEFSUBR (Fset_default_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2829 DEFSUBR (Fdefault_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2830 DEFSUBR (Fset_unicode_conversion);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2831
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
2832 DEFSUBR (Fload_unicode_mapping_table);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2833
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
2834 DEFSYMBOL (Qccl_encode_to_ucs_2);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
2835 DEFSYMBOL (Qlast_allocated_character);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2836 DEFSYMBOL (Qignore_first_column);
3659
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
2837
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
2838 DEFSYMBOL (Qunicode_registries);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2839 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2840
800
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
2841 DEFSUBR (Fchar_to_unicode);
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
2842 DEFSUBR (Funicode_to_char);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2843
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2844 DEFSYMBOL (Qunicode);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2845 DEFSYMBOL (Qucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2846 DEFSYMBOL (Qutf_16);
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2847 DEFSYMBOL (Qutf_32);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2848 DEFSYMBOL (Qutf_8);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2849 DEFSYMBOL (Qutf_7);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2850
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2851 DEFSYMBOL (Qneed_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2852
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2853 DEFSYMBOL (Qutf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2854 DEFSYMBOL (Qutf_16_little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2855 DEFSYMBOL (Qutf_16_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2856 DEFSYMBOL (Qutf_16_little_endian_bom);
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2857
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2858 DEFSYMBOL (Qutf_8);
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2859 DEFSYMBOL (Qutf_8_bom);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2860 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2861
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2862 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2863 coding_system_type_create_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2864 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2865 INITIALIZE_CODING_SYSTEM_TYPE_WITH_DATA (unicode, "unicode-coding-system-p");
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2866 CODING_SYSTEM_HAS_METHOD (unicode, print);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2867 CODING_SYSTEM_HAS_METHOD (unicode, convert);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2868 CODING_SYSTEM_HAS_METHOD (unicode, init_coding_stream);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2869 CODING_SYSTEM_HAS_METHOD (unicode, rewind_coding_stream);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2870 CODING_SYSTEM_HAS_METHOD (unicode, putprop);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2871 CODING_SYSTEM_HAS_METHOD (unicode, getprop);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2872
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2873 INITIALIZE_DETECTOR (utf_8);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2874 DETECTOR_HAS_METHOD (utf_8, detect);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2875 INITIALIZE_DETECTOR_CATEGORY (utf_8, utf_8);
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2876 INITIALIZE_DETECTOR_CATEGORY (utf_8, utf_8_bom);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2877
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2878 INITIALIZE_DETECTOR (ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2879 DETECTOR_HAS_METHOD (ucs_4, detect);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2880 INITIALIZE_DETECTOR_CATEGORY (ucs_4, ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2881
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2882 INITIALIZE_DETECTOR (utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2883 DETECTOR_HAS_METHOD (utf_16, detect);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2884 INITIALIZE_DETECTOR_CATEGORY (utf_16, utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2885 INITIALIZE_DETECTOR_CATEGORY (utf_16, utf_16_little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2886 INITIALIZE_DETECTOR_CATEGORY (utf_16, utf_16_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2887 INITIALIZE_DETECTOR_CATEGORY (utf_16, utf_16_little_endian_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2888 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2889
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2890 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2891 reinit_coding_system_type_create_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2892 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2893 REINITIALIZE_CODING_SYSTEM_TYPE (unicode);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2894 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2895
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2896 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2897 vars_of_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2898 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2899 Fprovide (intern ("unicode"));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2900
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2901 #ifdef MULE
4270
bd9b678f4db7 [xemacs-hg @ 2007-11-15 10:05:14 by aidan]
aidan
parents: 4268
diff changeset
2902 staticpro (&Vnumber_of_jit_charsets);
bd9b678f4db7 [xemacs-hg @ 2007-11-15 10:05:14 by aidan]
aidan
parents: 4268
diff changeset
2903 Vnumber_of_jit_charsets = make_int (0);
bd9b678f4db7 [xemacs-hg @ 2007-11-15 10:05:14 by aidan]
aidan
parents: 4268
diff changeset
2904 staticpro (&Vlast_jit_charset_final);
bd9b678f4db7 [xemacs-hg @ 2007-11-15 10:05:14 by aidan]
aidan
parents: 4268
diff changeset
2905 Vlast_jit_charset_final = make_char (0x30);
bd9b678f4db7 [xemacs-hg @ 2007-11-15 10:05:14 by aidan]
aidan
parents: 4268
diff changeset
2906 staticpro (&Vcharset_descr);
bd9b678f4db7 [xemacs-hg @ 2007-11-15 10:05:14 by aidan]
aidan
parents: 4268
diff changeset
2907 Vcharset_descr
bd9b678f4db7 [xemacs-hg @ 2007-11-15 10:05:14 by aidan]
aidan
parents: 4268
diff changeset
2908 = build_string ("Mule charset for otherwise unknown Unicode code points.");
bd9b678f4db7 [xemacs-hg @ 2007-11-15 10:05:14 by aidan]
aidan
parents: 4268
diff changeset
2909
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2910 staticpro (&Vlanguage_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2911 Vlanguage_unicode_precedence_list = Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2912
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2913 staticpro (&Vdefault_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2914 Vdefault_unicode_precedence_list = Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2915
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2916 unicode_precedence_dynarr = Dynarr_new (Lisp_Object);
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2917 dump_add_root_block_ptr (&unicode_precedence_dynarr,
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2918 &lisp_object_dynarr_description);
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2919
3659
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
2920
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
2921
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2922 init_blank_unicode_tables ();
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2923
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
2924 staticpro (&Vcurrent_jit_charset);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
2925 Vcurrent_jit_charset = Qnil;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
2926
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2927 /* Note that the "block" we are describing is a single pointer, and hence
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2928 we could potentially use dump_add_root_block_ptr(). However, given
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2929 the way the descriptions are written, we couldn't use them, and would
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2930 have to write new descriptions for each of the pointers below, since
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2931 we would have to make use of a description with an XD_BLOCK_ARRAY
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2932 in it. */
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2933
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2934 dump_add_root_block (&to_unicode_blank_1, sizeof (void *),
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2935 to_unicode_level_1_desc_1);
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2936 dump_add_root_block (&to_unicode_blank_2, sizeof (void *),
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2937 to_unicode_level_2_desc_1);
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2938
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2939 dump_add_root_block (&from_unicode_blank_1, sizeof (void *),
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2940 from_unicode_level_1_desc_1);
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2941 dump_add_root_block (&from_unicode_blank_2, sizeof (void *),
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2942 from_unicode_level_2_desc_1);
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2943 dump_add_root_block (&from_unicode_blank_3, sizeof (void *),
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2944 from_unicode_level_3_desc_1);
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2945 dump_add_root_block (&from_unicode_blank_4, sizeof (void *),
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
2946 from_unicode_level_4_desc_1);
3659
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
2947
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
2948 DEFVAR_LISP ("unicode-registries", &Qunicode_registries /*
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
2949 Vector describing the X11 registries searched when using fallback fonts.
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
2950
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
2951 "Fallback fonts" here includes by default those fonts used by redisplay when
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
2952 displaying charsets for which the `encode-as-utf-8' property is true, and
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
2953 those used when no font matching the charset's registries property has been
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
2954 found (that is, they're probably Mule-specific charsets like Ethiopic or
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
2955 IPA.)
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
2956 */ );
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
2957 Qunicode_registries = vector1(build_string("iso10646-1"));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2958 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2959 }