annotate src/unicode.c @ 5576:071b810ceb18

Declare labels as line where appropriate; use #'labels, not #'flet, tests. lisp/ChangeLog addition: 2011-10-03 Aidan Kehoe <kehoea@parhasard.net> * simple.el (handle-pre-motion-command-current-command-is-motion): Implement #'keysyms-equal with #'labels + (declare (inline ...)), instead of abusing macrolet to the same end. * specifier.el (let-specifier): * mule/mule-cmds.el (describe-language-environment): * mule/mule-cmds.el (set-language-environment-coding-systems): * mule/mule-x-init.el (x-use-halfwidth-roman-font): * faces.el (Face-frob-property): * keymap.el (key-sequence-list-description): * lisp-mode.el (construct-lisp-mode-menu): * loadhist.el (unload-feature): * mouse.el (default-mouse-track-check-for-activation): Declare various labels inline in dumped files when that reduces the size of the dumped image. Declaring labels inline is normally only worthwhile for inner loops and so on, but it's reasonable exercise of the related code to have these changes in core. tests/ChangeLog addition: 2011-10-03 Aidan Kehoe <kehoea@parhasard.net> * automated/case-tests.el (uni-mappings): * automated/database-tests.el (delete-database-files): * automated/hash-table-tests.el (iterations): * automated/lisp-tests.el (test1): * automated/lisp-tests.el (a): * automated/lisp-tests.el (cl-floor): * automated/lisp-tests.el (foo): * automated/lisp-tests.el (list-nreverse): * automated/lisp-tests.el (needs-lexical-context): * automated/mule-tests.el (featurep): * automated/os-tests.el (original-string): * automated/os-tests.el (with): * automated/symbol-tests.el (check-weak-list-unique): Replace #'flet with #'labels where appropriate in these tests, following my own advice on style in the docstrings of those functions.
author Aidan Kehoe <kehoea@parhasard.net>
date Mon, 03 Oct 2011 20:16:14 +0100
parents 4dee0387b9de
children 56144c8593a8
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1 /* Code to handle Unicode conversion.
4834
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
2 Copyright (C) 2000, 2001, 2002, 2003, 2004, 2005, 2010 Ben Wing.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
4 This file is part of XEmacs.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
5
5402
308d34e9f07d Changed bulk of GPLv2 or later files identified by script
Mats Lidell <matsl@xemacs.org>
parents: 5157
diff changeset
6 XEmacs is free software: you can redistribute it and/or modify it
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
7 under the terms of the GNU General Public License as published by the
5402
308d34e9f07d Changed bulk of GPLv2 or later files identified by script
Mats Lidell <matsl@xemacs.org>
parents: 5157
diff changeset
8 Free Software Foundation, either version 3 of the License, or (at your
308d34e9f07d Changed bulk of GPLv2 or later files identified by script
Mats Lidell <matsl@xemacs.org>
parents: 5157
diff changeset
9 option) any later version.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
10
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
11 XEmacs is distributed in the hope that it will be useful, but WITHOUT
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
12 ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
13 FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
14 for more details.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
15
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
16 You should have received a copy of the GNU General Public License
5402
308d34e9f07d Changed bulk of GPLv2 or later files identified by script
Mats Lidell <matsl@xemacs.org>
parents: 5157
diff changeset
17 along with XEmacs. If not, see <http://www.gnu.org/licenses/>. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
18
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
19 /* Synched up with: FSF 20.3. Not in FSF. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
20
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
21 /* Authorship:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
22
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
23 Current primary author: Ben Wing <ben@xemacs.org>
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
24
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
25 Written by Ben Wing <ben@xemacs.org>, June, 2001.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
26 Separated out into this file, August, 2001.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
27 Includes Unicode coding systems, some parts of which have been written
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
28 by someone else. #### Morioka and Hayashi, I think.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
29
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
30 As of September 2001, the detection code is here and abstraction of the
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
31 detection system is finished. The unicode detectors have been rewritten
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
32 to include multiple levels of likelihood.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
33 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
34
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
35 #include <config.h>
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
36 #include "lisp.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
37
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
38 #include "charset.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
39 #include "file-coding.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
40 #include "opaque.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
41
4690
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
42 #include "buffer.h"
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
43 #include "rangetab.h"
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
44 #include "extents.h"
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
45
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
46 #include "sysfile.h"
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
47
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
48 /* For more info about how Unicode works under Windows, see intl-win32.c. */
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
49
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
50 /* Info about Unicode translation tables [ben]:
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
51
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
52 FORMAT:
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
53 -------
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
54
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
55 We currently use the following format for tables:
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
56
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
57 If dimension == 1, to_unicode_table is a 96-element array of ints
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
58 (Unicode code points); else, it's a 96-element array of int * pointers,
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
59 each of which points to a 96-element array of ints. If no elements in a
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
60 row have been filled in, the pointer will point to a default empty
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
61 table; that way, memory usage is more reasonable but lookup still fast.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
62
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
63 -- If from_unicode_levels == 1, from_unicode_table is a 256-element
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
64 array of shorts (octet 1 in high byte, octet 2 in low byte; we don't
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
65 store Ichars directly to save space).
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
66
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
67 -- If from_unicode_levels == 2, from_unicode_table is a 256-element
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
68 array of short * pointers, each of which points to a 256-element array
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
69 of shorts.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
70
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
71 -- If from_unicode_levels == 3, from_unicode_table is a 256-element
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
72 array of short ** pointers, each of which points to a 256-element array
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
73 of short * pointers, each of which points to a 256-element array of
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
74 shorts.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
75
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
76 -- If from_unicode_levels == 4, same thing but one level deeper.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
77
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
78 Just as for to_unicode_table, we use default tables to fill in all
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
79 entries with no values in them.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
80
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
81 #### An obvious space-saving optimization is to use variable-sized
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
82 tables, where each table instead of just being a 256-element array, is a
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
83 structure with a start value, an end value, and a variable number of
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
84 entries (END - START + 1). Only 8 bits are needed for END and START,
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
85 and could be stored at the end to avoid alignment problems. However,
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
86 before charging off and implementing this, we need to consider whether
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
87 it's worth it:
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
88
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
89 (1) Most tables will be highly localized in which code points are
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
90 defined, heavily reducing the possible memory waste. Before doing any
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
91 rewriting, write some code to see how much memory is actually being
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
92 wasted (i.e. ratio of empty entries to total # of entries) and only
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
93 start rewriting if it's unacceptably high. You have to check over all
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
94 charsets.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
95
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
96 (2) Since entries are usually added one at a time, you have to be very
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
97 careful when creating the tables to avoid realloc()/free() thrashing in
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
98 the common case when you are in an area of high localization and are
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
99 going to end up using most entries in the table. You'd certainly want
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
100 to allow only certain sizes, not arbitrary ones (probably powers of 2,
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
101 where you want the entire block including the START/END values to fit
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
102 into a power of 2, minus any malloc overhead if there is any -- there's
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
103 none under gmalloc.c, and probably most system malloc() functions are
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
104 quite smart nowadays and also have no overhead). You could optimize
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
105 somewhat during the in-C initializations, because you can compute the
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
106 actual usage of various tables by scanning the entries you're going to
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
107 add in a separate pass before adding them. (You could actually do the
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
108 same thing when entries are added on the Lisp level by making the
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
109 assumption that all the entries will come in one after another before
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
110 any use is made of the data. So as they're coming in, you just store
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
111 them in a big long list, and the first time you need to retrieve an
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
112 entry, you compute the whole table at once.) You'd still have to deal
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
113 with the possibility of later entries coming in, though.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
114
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
115 (3) You do lose some speed using START/END values, since you need a
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
116 couple of comparisons at each level. This could easily make each single
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
117 lookup become 3-4 times slower. The Unicode book considers this a big
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
118 issue, and recommends against variable-sized tables for this reason;
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
119 however, they almost certainly have in mind applications that primarily
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
120 involve conversion of large amounts of data. Most Unicode strings that
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
121 are translated in XEmacs are fairly small. The only place where this
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
122 might matter is in loading large files -- e.g. a 3-megabyte
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
123 Unicode-encoded file. So think about this, and maybe do a trial
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
124 implementation where you don't worry too much about the intricacies of
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
125 (2) and just implement some basic "multiply by 1.5" trick or something
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
126 to do the resizing. There is a very good FAQ on Unicode called
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
127 something like the Linux-Unicode How-To (it should be part of the Linux
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
128 How-To's, I think), that lists the url of a guy with a whole bunch of
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
129 unicode files you can use to stress-test your implementations, and he's
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
130 highly likely to have a good multi-megabyte Unicode-encoded file (with
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
131 normal text in it -- if you created your own just by creating repeated
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
132 strings of letters and numbers, you probably wouldn't get accurate
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
133 results).
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
134
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
135 INITIALIZATION:
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
136 ---------------
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
137
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
138 There are advantages and disadvantages to loading the tables at
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
139 run-time.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
140
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
141 Advantages:
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
142
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
143 They're big, and it's very fast to recreate them (a fraction of a second
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
144 on modern processors).
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
145
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
146 Disadvantages:
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
147
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
148 (1) User-defined charsets: It would be inconvenient to require all
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
149 dumped user-defined charsets to be reloaded at init time.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
150
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
151 NB With run-time loading, we load in init-mule-at-startup, in
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
152 mule-cmds.el. This is called from startup.el, which is quite late in
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
153 the initialization process -- but data-directory isn't set until then.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
154 With dump-time loading, you still can't dump in a Japanese directory
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
155 (again, until we move to Unicode internally), but this is not such an
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
156 imposition.
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
157
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
158
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
159 */
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
160
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
161 /* #### WARNING! The current sledgehammer routines have a fundamental
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
162 problem in that they can't handle two characters mapping to a
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
163 single Unicode codepoint or vice-versa in a single charset table.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
164 It's not clear there is any way to handle this and still make the
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
165 sledgehammer routines useful.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
166
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
167 Inquiring Minds Want To Know Dept: does the above WARNING mean that
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
168 _if_ it happens, then it will signal error, or then it will do
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
169 something evil and unpredictable? Signaling an error is OK: for
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
170 all national standards, the national to Unicode map is an inclusion
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
171 (1-to-1). Any character set that does not behave that way is
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
172 broken according to the Unicode standard.
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
173
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
174 Answer: You will get an ABORT(), since the purpose of the sledgehammer
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
175 routines is self-checking. The above problem with non-1-to-1 mapping
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
176 occurs in the Big5 tables, as provided by the Unicode Consortium. */
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
177
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
178 /* #define SLEDGEHAMMER_CHECK_UNICODE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
179
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
180 /* When MULE is not defined, we may still need some Unicode support --
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
181 in particular, some Windows API's always want Unicode, and the way
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
182 we've set up the Unicode encapsulation, we may as well go ahead and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
183 always use the Unicode versions of split API's. (It would be
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
184 trickier to not use them, and pointless -- under NT, the ANSI API's
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
185 call the Unicode ones anyway, so in the case of structures, we'd be
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
186 converting from Unicode to ANSI structures, only to have the OS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
187 convert them back.) */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
188
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
189 Lisp_Object Qunicode;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
190 Lisp_Object Qutf_16, Qutf_8, Qucs_4, Qutf_7, Qutf_32;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
191 Lisp_Object Qneed_bom;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
192
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
193 Lisp_Object Qutf_16_little_endian, Qutf_16_bom;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
194 Lisp_Object Qutf_16_little_endian_bom;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
195
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
196 Lisp_Object Qutf_8_bom;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
197
4690
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
198 #ifdef MULE
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
199 /* These range tables are not directly accessible from Lisp: */
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
200 static Lisp_Object Vunicode_invalid_and_query_skip_chars;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
201 static Lisp_Object Vutf_8_invalid_and_query_skip_chars;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
202 static Lisp_Object Vunicode_query_skip_chars;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
203
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
204 static Lisp_Object Vunicode_query_string, Vunicode_invalid_string,
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
205 Vutf_8_invalid_string;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
206 #endif /* MULE */
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
207
3952
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
208 /* See the Unicode FAQ, http://www.unicode.org/faq/utf_bom.html#35 for this
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
209 algorithm.
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
210
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
211 (They also give another, really verbose one, as part of their explanation
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
212 of the various planes of the encoding, but we won't use that.) */
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
213
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
214 #define UTF_16_LEAD_OFFSET (0xD800 - (0x10000 >> 10))
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
215 #define UTF_16_SURROGATE_OFFSET (0x10000 - (0xD800 << 10) - 0xDC00)
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
216
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
217 #define utf_16_surrogates_to_code(lead, trail) \
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
218 (((lead) << 10) + (trail) + UTF_16_SURROGATE_OFFSET)
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
219
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
220 #define CODE_TO_UTF_16_SURROGATES(codepoint, lead, trail) do { \
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
221 int __ctu16s_code = (codepoint); \
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
222 lead = UTF_16_LEAD_OFFSET + (__ctu16s_code >> 10); \
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
223 trail = 0xDC00 + (__ctu16s_code & 0x3FF); \
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
224 } while (0)
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
225
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
226 #ifdef MULE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
227
3352
8dbdcd070418 [xemacs-hg @ 2006-04-22 15:18:54 by stephent]
stephent
parents: 3025
diff changeset
228 /* Using ints for to_unicode is OK (as long as they are >= 32 bits).
8dbdcd070418 [xemacs-hg @ 2006-04-22 15:18:54 by stephent]
stephent
parents: 3025
diff changeset
229 In from_unicode, we're converting from Mule characters, which means
8dbdcd070418 [xemacs-hg @ 2006-04-22 15:18:54 by stephent]
stephent
parents: 3025
diff changeset
230 that the values being converted to are only 96x96, and we can save
8dbdcd070418 [xemacs-hg @ 2006-04-22 15:18:54 by stephent]
stephent
parents: 3025
diff changeset
231 space by using shorts (signedness doesn't matter). */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
232 static int *to_unicode_blank_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
233 static int **to_unicode_blank_2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
234
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
235 static short *from_unicode_blank_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
236 static short **from_unicode_blank_2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
237 static short ***from_unicode_blank_3;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
238 static short ****from_unicode_blank_4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
239
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
240 static const struct memory_description to_unicode_level_0_desc_1[] = {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
241 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
242 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
243
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
244 static const struct sized_memory_description to_unicode_level_0_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
245 sizeof (int), to_unicode_level_0_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
246 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
247
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
248 static const struct memory_description to_unicode_level_1_desc_1[] = {
2551
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
249 { XD_BLOCK_PTR, 0, 96, { &to_unicode_level_0_desc } },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
250 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
251 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
252
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
253 static const struct sized_memory_description to_unicode_level_1_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
254 sizeof (void *), to_unicode_level_1_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
255 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
256
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
257 static const struct memory_description to_unicode_description_1[] = {
2551
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
258 { XD_BLOCK_PTR, 1, 96, { &to_unicode_level_0_desc } },
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
259 { XD_BLOCK_PTR, 2, 96, { &to_unicode_level_1_desc } },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
260 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
261 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
262
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
263 /* Not static because each charset has a set of to and from tables and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
264 needs to describe them to pdump. */
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
265 const struct sized_memory_description to_unicode_description = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
266 sizeof (void *), to_unicode_description_1
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
267 };
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
268
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
269 /* Used only for to_unicode_blank_2 */
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
270 static const struct memory_description to_unicode_level_2_desc_1[] = {
2551
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
271 { XD_BLOCK_PTR, 0, 96, { &to_unicode_level_1_desc } },
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
272 { XD_END }
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
273 };
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
274
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
275 static const struct memory_description from_unicode_level_0_desc_1[] = {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
276 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
277 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
278
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
279 static const struct sized_memory_description from_unicode_level_0_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
280 sizeof (short), from_unicode_level_0_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
281 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
282
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
283 static const struct memory_description from_unicode_level_1_desc_1[] = {
2551
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
284 { XD_BLOCK_PTR, 0, 256, { &from_unicode_level_0_desc } },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
285 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
286 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
287
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
288 static const struct sized_memory_description from_unicode_level_1_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
289 sizeof (void *), from_unicode_level_1_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
290 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
291
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
292 static const struct memory_description from_unicode_level_2_desc_1[] = {
2551
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
293 { XD_BLOCK_PTR, 0, 256, { &from_unicode_level_1_desc } },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
294 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
295 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
296
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
297 static const struct sized_memory_description from_unicode_level_2_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
298 sizeof (void *), from_unicode_level_2_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
299 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
300
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
301 static const struct memory_description from_unicode_level_3_desc_1[] = {
2551
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
302 { XD_BLOCK_PTR, 0, 256, { &from_unicode_level_2_desc } },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
303 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
304 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
305
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
306 static const struct sized_memory_description from_unicode_level_3_desc = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
307 sizeof (void *), from_unicode_level_3_desc_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
308 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
309
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
310 static const struct memory_description from_unicode_description_1[] = {
2551
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
311 { XD_BLOCK_PTR, 1, 256, { &from_unicode_level_0_desc } },
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
312 { XD_BLOCK_PTR, 2, 256, { &from_unicode_level_1_desc } },
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
313 { XD_BLOCK_PTR, 3, 256, { &from_unicode_level_2_desc } },
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
314 { XD_BLOCK_PTR, 4, 256, { &from_unicode_level_3_desc } },
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
315 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
316 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
317
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
318 /* Not static because each charset has a set of to and from tables and
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
319 needs to describe them to pdump. */
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
320 const struct sized_memory_description from_unicode_description = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
321 sizeof (void *), from_unicode_description_1
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
322 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
323
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
324 /* Used only for from_unicode_blank_4 */
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
325 static const struct memory_description from_unicode_level_4_desc_1[] = {
2551
9f70af3ac939 [xemacs-hg @ 2005-02-03 16:14:02 by james]
james
parents: 2500
diff changeset
326 { XD_BLOCK_PTR, 0, 256, { &from_unicode_level_3_desc } },
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
327 { XD_END }
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
328 };
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
329
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
330 static Lisp_Object_dynarr *unicode_precedence_dynarr;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
331
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
332 static const struct memory_description lod_description_1[] = {
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
333 XD_DYNARR_DESC (Lisp_Object_dynarr, &lisp_object_description),
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
334 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
335 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
336
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
337 static const struct sized_memory_description lisp_object_dynarr_description = {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
338 sizeof (Lisp_Object_dynarr),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
339 lod_description_1
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
340 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
341
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
342 Lisp_Object Vlanguage_unicode_precedence_list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
343 Lisp_Object Vdefault_unicode_precedence_list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
344
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
345 Lisp_Object Qignore_first_column;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
346
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
347 Lisp_Object Vcurrent_jit_charset;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
348 Lisp_Object Qlast_allocated_character;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
349 Lisp_Object Qccl_encode_to_ucs_2;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
350
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
351 Lisp_Object Vnumber_of_jit_charsets;
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
352 Lisp_Object Vlast_jit_charset_final;
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
353 Lisp_Object Vcharset_descr;
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
354
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
355
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
356
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
357 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
358 /* Unicode implementation */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
359 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
360
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
361 #define BREAKUP_UNICODE_CODE(val, u1, u2, u3, u4, levels) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
362 do { \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
363 int buc_val = (val); \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
364 \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
365 (u1) = buc_val >> 24; \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
366 (u2) = (buc_val >> 16) & 255; \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
367 (u3) = (buc_val >> 8) & 255; \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
368 (u4) = buc_val & 255; \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
369 (levels) = (buc_val <= 0xFF ? 1 : \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
370 buc_val <= 0xFFFF ? 2 : \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
371 buc_val <= 0xFFFFFF ? 3 : \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
372 4); \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
373 } while (0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
374
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
375 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
376 init_blank_unicode_tables (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
377 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
378 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
379
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
380 from_unicode_blank_1 = xnew_array (short, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
381 from_unicode_blank_2 = xnew_array (short *, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
382 from_unicode_blank_3 = xnew_array (short **, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
383 from_unicode_blank_4 = xnew_array (short ***, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
384 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
385 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
386 /* #### IMWTK: Why does using -1 here work? Simply because there are
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
387 no existing 96x96 charsets?
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
388
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
389 Answer: I don't understand the concern. -1 indicates there is no
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
390 entry for this particular codepoint, which is always the case for
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
391 blank tables. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
392 from_unicode_blank_1[i] = (short) -1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
393 from_unicode_blank_2[i] = from_unicode_blank_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
394 from_unicode_blank_3[i] = from_unicode_blank_2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
395 from_unicode_blank_4[i] = from_unicode_blank_3;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
396 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
397
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
398 to_unicode_blank_1 = xnew_array (int, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
399 to_unicode_blank_2 = xnew_array (int *, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
400 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
401 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
402 /* Here -1 is guaranteed OK. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
403 to_unicode_blank_1[i] = -1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
404 to_unicode_blank_2[i] = to_unicode_blank_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
405 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
406 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
407
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
408 static void *
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
409 create_new_from_unicode_table (int level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
410 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
411 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
412 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
413 /* WARNING: If you are thinking of compressing these, keep in
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
414 mind that sizeof (short) does not equal sizeof (short *). */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
415 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
416 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
417 short *newtab = xnew_array (short, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
418 memcpy (newtab, from_unicode_blank_1, 256 * sizeof (short));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
419 return newtab;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
420 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
421 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
422 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
423 short **newtab = xnew_array (short *, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
424 memcpy (newtab, from_unicode_blank_2, 256 * sizeof (short *));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
425 return newtab;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
426 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
427 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
428 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
429 short ***newtab = xnew_array (short **, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
430 memcpy (newtab, from_unicode_blank_3, 256 * sizeof (short **));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
431 return newtab;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
432 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
433 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
434 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
435 short ****newtab = xnew_array (short ***, 256);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
436 memcpy (newtab, from_unicode_blank_4, 256 * sizeof (short ***));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
437 return newtab;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
438 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
439 default:
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
440 ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
441 return 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
442 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
443 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
444
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
445 /* Allocate and blank the tables.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
446 Loading them up is done by load-unicode-mapping-table. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
447 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
448 init_charset_unicode_tables (Lisp_Object charset)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
449 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
450 if (XCHARSET_DIMENSION (charset) == 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
451 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
452 int *to_table = xnew_array (int, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
453 memcpy (to_table, to_unicode_blank_1, 96 * sizeof (int));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
454 XCHARSET_TO_UNICODE_TABLE (charset) = to_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
455 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
456 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
457 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
458 int **to_table = xnew_array (int *, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
459 memcpy (to_table, to_unicode_blank_2, 96 * sizeof (int *));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
460 XCHARSET_TO_UNICODE_TABLE (charset) = to_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
461 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
462
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
463 {
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
464 XCHARSET_FROM_UNICODE_TABLE (charset) =
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
465 create_new_from_unicode_table (1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
466 XCHARSET_FROM_UNICODE_LEVELS (charset) = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
467 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
468 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
469
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
470 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
471 free_from_unicode_table (void *table, int level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
472 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
473 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
474
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
475 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
476 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
477 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
478 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
479 short **tab = (short **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
480 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
481 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
482 if (tab[i] != from_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
483 free_from_unicode_table (tab[i], 1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
484 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
485 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
486 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
487 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
488 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
489 short ***tab = (short ***) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
490 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
491 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
492 if (tab[i] != from_unicode_blank_2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
493 free_from_unicode_table (tab[i], 2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
494 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
495 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
496 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
497 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
498 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
499 short ****tab = (short ****) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
500 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
501 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
502 if (tab[i] != from_unicode_blank_3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
503 free_from_unicode_table (tab[i], 3);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
504 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
505 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
506 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
507 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
508
4976
16112448d484 Rename xfree(FOO, TYPE) -> xfree(FOO)
Ben Wing <ben@xemacs.org>
parents: 4953
diff changeset
509 xfree (table);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
510 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
511
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
512 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
513 free_to_unicode_table (void *table, int level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
514 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
515 if (level == 2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
516 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
517 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
518 int **tab = (int **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
519
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
520 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
521 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
522 if (tab[i] != to_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
523 free_to_unicode_table (tab[i], 1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
524 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
525 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
526
4976
16112448d484 Rename xfree(FOO, TYPE) -> xfree(FOO)
Ben Wing <ben@xemacs.org>
parents: 4953
diff changeset
527 xfree (table);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
528 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
529
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
530 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
531 free_charset_unicode_tables (Lisp_Object charset)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
532 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
533 free_to_unicode_table (XCHARSET_TO_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
534 XCHARSET_DIMENSION (charset));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
535 free_from_unicode_table (XCHARSET_FROM_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
536 XCHARSET_FROM_UNICODE_LEVELS (charset));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
537 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
538
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
539 #ifdef MEMORY_USAGE_STATS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
540
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
541 static Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
542 compute_from_unicode_table_size_1 (void *table, int level,
5157
1fae11d56ad2 redo memory-usage mechanism, add way of dynamically initializing Lisp objects
Ben Wing <ben@xemacs.org>
parents: 4976
diff changeset
543 struct usage_stats *stats)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
544 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
545 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
546 Bytecount size = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
547
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
548 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
549 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
550 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
551 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
552 short **tab = (short **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
553 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
554 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
555 if (tab[i] != from_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
556 size += compute_from_unicode_table_size_1 (tab[i], 1, stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
557 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
558 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
559 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
560 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
561 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
562 short ***tab = (short ***) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
563 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
564 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
565 if (tab[i] != from_unicode_blank_2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
566 size += compute_from_unicode_table_size_1 (tab[i], 2, stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
567 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
568 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
569 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
570 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
571 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
572 short ****tab = (short ****) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
573 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
574 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
575 if (tab[i] != from_unicode_blank_3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
576 size += compute_from_unicode_table_size_1 (tab[i], 3, stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
577 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
578 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
579 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
580 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
581
3024
b7f26b2f78bd [xemacs-hg @ 2005-10-25 08:32:40 by ben]
ben
parents: 3017
diff changeset
582 size += malloced_storage_size (table,
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
583 256 * (level == 1 ? sizeof (short) :
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
584 sizeof (void *)),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
585 stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
586 return size;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
587 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
588
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
589 static Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
590 compute_to_unicode_table_size_1 (void *table, int level,
5157
1fae11d56ad2 redo memory-usage mechanism, add way of dynamically initializing Lisp objects
Ben Wing <ben@xemacs.org>
parents: 4976
diff changeset
591 struct usage_stats *stats)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
592 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
593 Bytecount size = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
594
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
595 if (level == 2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
596 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
597 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
598 int **tab = (int **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
599
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
600 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
601 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
602 if (tab[i] != to_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
603 size += compute_to_unicode_table_size_1 (tab[i], 1, stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
604 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
605 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
606
3024
b7f26b2f78bd [xemacs-hg @ 2005-10-25 08:32:40 by ben]
ben
parents: 3017
diff changeset
607 size += malloced_storage_size (table,
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
608 96 * (level == 1 ? sizeof (int) :
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
609 sizeof (void *)),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
610 stats);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
611 return size;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
612 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
613
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
614 Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
615 compute_from_unicode_table_size (Lisp_Object charset,
5157
1fae11d56ad2 redo memory-usage mechanism, add way of dynamically initializing Lisp objects
Ben Wing <ben@xemacs.org>
parents: 4976
diff changeset
616 struct usage_stats *stats)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
617 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
618 return (compute_from_unicode_table_size_1
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
619 (XCHARSET_FROM_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
620 XCHARSET_FROM_UNICODE_LEVELS (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
621 stats));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
622 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
623
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
624 Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
625 compute_to_unicode_table_size (Lisp_Object charset,
5157
1fae11d56ad2 redo memory-usage mechanism, add way of dynamically initializing Lisp objects
Ben Wing <ben@xemacs.org>
parents: 4976
diff changeset
626 struct usage_stats *stats)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
627 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
628 return (compute_to_unicode_table_size_1
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
629 (XCHARSET_TO_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
630 XCHARSET_DIMENSION (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
631 stats));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
632 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
633
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
634 #endif
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
635
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
636 #ifdef SLEDGEHAMMER_CHECK_UNICODE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
637
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
638 /* "Sledgehammer checks" are checks that verify the self-consistency
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
639 of an entire structure every time a change is about to be made or
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
640 has been made to the structure. Not fast but a pretty much
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
641 sure-fire way of flushing out any incorrectnesses in the algorithms
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
642 that create the structure.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
643
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
644 Checking only after a change has been made will speed things up by
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
645 a factor of 2, but it doesn't absolutely prove that the code just
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
646 checked caused the problem; perhaps it happened elsewhere, either
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
647 in some code you forgot to sledgehammer check or as a result of
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
648 data corruption. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
649
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
650 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
651 assert_not_any_blank_table (void *tab)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
652 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
653 assert (tab != from_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
654 assert (tab != from_unicode_blank_2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
655 assert (tab != from_unicode_blank_3);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
656 assert (tab != from_unicode_blank_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
657 assert (tab != to_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
658 assert (tab != to_unicode_blank_2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
659 assert (tab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
660 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
661
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
662 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
663 sledgehammer_check_from_table (Lisp_Object charset, void *table, int level,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
664 int codetop)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
665 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
666 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
667
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
668 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
669 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
670 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
671 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
672 short *tab = (short *) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
673 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
674 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
675 if (tab[i] != -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
676 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
677 Lisp_Object char_charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
678 int c1, c2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
679
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
680 assert (valid_ichar_p (tab[i]));
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
681 BREAKUP_ICHAR (tab[i], char_charset, c1, c2);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
682 assert (EQ (charset, char_charset));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
683 if (XCHARSET_DIMENSION (charset) == 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
684 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
685 int *to_table =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
686 (int *) XCHARSET_TO_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
687 assert_not_any_blank_table (to_table);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
688 assert (to_table[c1 - 32] == (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
689 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
690 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
691 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
692 int **to_table =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
693 (int **) XCHARSET_TO_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
694 assert_not_any_blank_table (to_table);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
695 assert_not_any_blank_table (to_table[c1 - 32]);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
696 assert (to_table[c1 - 32][c2 - 32] == (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
697 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
698 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
699 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
700 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
701 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
702 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
703 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
704 short **tab = (short **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
705 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
706 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
707 if (tab[i] != from_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
708 sledgehammer_check_from_table (charset, tab[i], 1,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
709 (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
710 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
711 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
712 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
713 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
714 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
715 short ***tab = (short ***) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
716 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
717 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
718 if (tab[i] != from_unicode_blank_2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
719 sledgehammer_check_from_table (charset, tab[i], 2,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
720 (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
721 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
722 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
723 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
724 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
725 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
726 short ****tab = (short ****) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
727 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
728 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
729 if (tab[i] != from_unicode_blank_3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
730 sledgehammer_check_from_table (charset, tab[i], 3,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
731 (codetop << 8) + i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
732 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
733 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
734 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
735 default:
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
736 ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
737 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
738 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
739
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
740 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
741 sledgehammer_check_to_table (Lisp_Object charset, void *table, int level,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
742 int codetop)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
743 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
744 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
745
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
746 switch (level)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
747 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
748 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
749 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
750 int *tab = (int *) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
751
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
752 if (XCHARSET_CHARS (charset) == 94)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
753 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
754 assert (tab[0] == -1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
755 assert (tab[95] == -1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
756 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
757
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
758 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
759 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
760 if (tab[i] != -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
761 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
762 int u4, u3, u2, u1, levels;
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
763 Ichar ch;
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
764 Ichar this_ch;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
765 short val;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
766 void *frtab = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
767
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
768 if (XCHARSET_DIMENSION (charset) == 1)
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
769 this_ch = make_ichar (charset, i + 32, 0);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
770 else
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
771 this_ch = make_ichar (charset, codetop + 32, i + 32);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
772
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
773 assert (tab[i] >= 0);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
774 BREAKUP_UNICODE_CODE (tab[i], u4, u3, u2, u1, levels);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
775 assert (levels <= XCHARSET_FROM_UNICODE_LEVELS (charset));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
776
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
777 switch (XCHARSET_FROM_UNICODE_LEVELS (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
778 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
779 case 1: val = ((short *) frtab)[u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
780 case 2: val = ((short **) frtab)[u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
781 case 3: val = ((short ***) frtab)[u3][u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
782 case 4: val = ((short ****) frtab)[u4][u3][u2][u1]; break;
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
783 default: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
784 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
785
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
786 ch = make_ichar (charset, val >> 8, val & 0xFF);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
787 assert (ch == this_ch);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
788
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
789 switch (XCHARSET_FROM_UNICODE_LEVELS (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
790 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
791 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
792 assert_not_any_blank_table (frtab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
793 frtab = ((short ****) frtab)[u4];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
794 /* fall through */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
795 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
796 assert_not_any_blank_table (frtab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
797 frtab = ((short ***) frtab)[u3];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
798 /* fall through */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
799 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
800 assert_not_any_blank_table (frtab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
801 frtab = ((short **) frtab)[u2];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
802 /* fall through */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
803 case 1:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
804 assert_not_any_blank_table (frtab);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
805 break;
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
806 default: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
807 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
808 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
809 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
810 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
811 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
812 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
813 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
814 int **tab = (int **) table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
815
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
816 if (XCHARSET_CHARS (charset) == 94)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
817 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
818 assert (tab[0] == to_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
819 assert (tab[95] == to_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
820 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
821
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
822 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
823 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
824 if (tab[i] != to_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
825 sledgehammer_check_to_table (charset, tab[i], 1, i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
826 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
827 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
828 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
829 default:
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
830 ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
831 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
832 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
833
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
834 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
835 sledgehammer_check_unicode_tables (Lisp_Object charset)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
836 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
837 /* verify that the blank tables have not been modified */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
838 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
839 int from_level = XCHARSET_FROM_UNICODE_LEVELS (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
840 int to_level = XCHARSET_FROM_UNICODE_LEVELS (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
841
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
842 for (i = 0; i < 256; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
843 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
844 assert (from_unicode_blank_1[i] == (short) -1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
845 assert (from_unicode_blank_2[i] == from_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
846 assert (from_unicode_blank_3[i] == from_unicode_blank_2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
847 assert (from_unicode_blank_4[i] == from_unicode_blank_3);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
848 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
849
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
850 for (i = 0; i < 96; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
851 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
852 assert (to_unicode_blank_1[i] == -1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
853 assert (to_unicode_blank_2[i] == to_unicode_blank_1);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
854 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
855
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
856 assert (from_level >= 1 && from_level <= 4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
857
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
858 sledgehammer_check_from_table (charset,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
859 XCHARSET_FROM_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
860 from_level, 0);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
861
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
862 sledgehammer_check_to_table (charset,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
863 XCHARSET_TO_UNICODE_TABLE (charset),
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
864 XCHARSET_DIMENSION (charset), 0);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
865 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
866
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
867 #endif /* SLEDGEHAMMER_CHECK_UNICODE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
868
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
869 static void
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
870 set_unicode_conversion (Ichar chr, int code)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
871 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
872 Lisp_Object charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
873 int c1, c2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
874
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
875 BREAKUP_ICHAR (chr, charset, c1, c2);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
876
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
877 /* I tried an assert on code > 255 || chr == code, but that fails because
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
878 Mule gives many Latin characters separate code points for different
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
879 ISO 8859 coded character sets. Obvious in hindsight.... */
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
880 assert (!EQ (charset, Vcharset_ascii) || chr == code);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
881 assert (!EQ (charset, Vcharset_latin_iso8859_1) || chr == code);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
882 assert (!EQ (charset, Vcharset_control_1) || chr == code);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
883
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
884 /* This assert is needed because it is simply unimplemented. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
885 assert (!EQ (charset, Vcharset_composite));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
886
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
887 #ifdef SLEDGEHAMMER_CHECK_UNICODE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
888 sledgehammer_check_unicode_tables (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
889 #endif
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
890
2704
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
891 if (EQ(charset, Vcharset_ascii) || EQ(charset, Vcharset_control_1))
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
892 return;
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
893
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
894 /* First, the char -> unicode translation */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
895
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
896 if (XCHARSET_DIMENSION (charset) == 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
897 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
898 int *to_table = (int *) XCHARSET_TO_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
899 to_table[c1 - 32] = code;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
900 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
901 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
902 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
903 int **to_table_2 = (int **) XCHARSET_TO_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
904 int *to_table_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
905
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
906 assert (XCHARSET_DIMENSION (charset) == 2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
907 to_table_1 = to_table_2[c1 - 32];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
908 if (to_table_1 == to_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
909 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
910 to_table_1 = xnew_array (int, 96);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
911 memcpy (to_table_1, to_unicode_blank_1, 96 * sizeof (int));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
912 to_table_2[c1 - 32] = to_table_1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
913 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
914 to_table_1[c2 - 32] = code;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
915 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
916
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
917 /* Then, unicode -> char: much harder */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
918
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
919 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
920 int charset_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
921 int u4, u3, u2, u1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
922 int code_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
923 BREAKUP_UNICODE_CODE (code, u4, u3, u2, u1, code_levels);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
924
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
925 charset_levels = XCHARSET_FROM_UNICODE_LEVELS (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
926
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
927 /* Make sure the charset's tables have at least as many levels as
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
928 the code point has: Note that the charset is guaranteed to have
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
929 at least one level, because it was created that way */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
930 if (charset_levels < code_levels)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
931 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
932 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
933
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
934 assert (charset_levels > 0);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
935 for (i = 2; i <= code_levels; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
936 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
937 if (charset_levels < i)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
938 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
939 void *old_table = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
940 void *table = create_new_from_unicode_table (i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
941 XCHARSET_FROM_UNICODE_TABLE (charset) = table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
942
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
943 switch (i)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
944 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
945 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
946 ((short **) table)[0] = (short *) old_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
947 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
948 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
949 ((short ***) table)[0] = (short **) old_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
950 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
951 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
952 ((short ****) table)[0] = (short ***) old_table;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
953 break;
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
954 default: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
955 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
956 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
957 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
958
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
959 charset_levels = code_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
960 XCHARSET_FROM_UNICODE_LEVELS (charset) = code_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
961 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
962
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
963 /* Now, make sure there is a non-default table at each level */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
964 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
965 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
966 void *table = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
967
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
968 for (i = charset_levels; i >= 2; i--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
969 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
970 switch (i)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
971 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
972 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
973 if (((short ****) table)[u4] == from_unicode_blank_3)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
974 ((short ****) table)[u4] =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
975 ((short ***) create_new_from_unicode_table (3));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
976 table = ((short ****) table)[u4];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
977 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
978 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
979 if (((short ***) table)[u3] == from_unicode_blank_2)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
980 ((short ***) table)[u3] =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
981 ((short **) create_new_from_unicode_table (2));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
982 table = ((short ***) table)[u3];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
983 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
984 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
985 if (((short **) table)[u2] == from_unicode_blank_1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
986 ((short **) table)[u2] =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
987 ((short *) create_new_from_unicode_table (1));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
988 table = ((short **) table)[u2];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
989 break;
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
990 default: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
991 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
992 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
993 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
994
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
995 /* Finally, set the character */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
996
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
997 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
998 void *table = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
999 switch (charset_levels)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1000 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1001 case 1: ((short *) table)[u1] = (c1 << 8) + c2; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1002 case 2: ((short **) table)[u2][u1] = (c1 << 8) + c2; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1003 case 3: ((short ***) table)[u3][u2][u1] = (c1 << 8) + c2; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1004 case 4: ((short ****) table)[u4][u3][u2][u1] = (c1 << 8) + c2; break;
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
1005 default: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1006 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1007 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1008 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1009
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1010 #ifdef SLEDGEHAMMER_CHECK_UNICODE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1011 sledgehammer_check_unicode_tables (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1012 #endif
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1013 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1014
788
026c5bf9c134 [xemacs-hg @ 2002-03-21 07:29:57 by ben]
ben
parents: 778
diff changeset
1015 int
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1016 ichar_to_unicode (Ichar chr)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1017 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1018 Lisp_Object charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1019 int c1, c2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1020
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1021 type_checking_assert (valid_ichar_p (chr));
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1022 /* This shortcut depends on the representation of an Ichar, see text.c. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1023 if (chr < 256)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1024 return (int) chr;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1025
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1026 BREAKUP_ICHAR (chr, charset, c1, c2);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1027 if (EQ (charset, Vcharset_composite))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1028 return -1; /* #### don't know how to handle */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1029 else if (XCHARSET_DIMENSION (charset) == 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1030 return ((int *) XCHARSET_TO_UNICODE_TABLE (charset))[c1 - 32];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1031 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1032 return ((int **) XCHARSET_TO_UNICODE_TABLE (charset))[c1 - 32][c2 - 32];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1033 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1034
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1035 static Ichar
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1036 get_free_codepoint(Lisp_Object charset)
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1037 {
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1038 Lisp_Object name = Fcharset_name(charset);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1039 Lisp_Object zeichen = Fget(name, Qlast_allocated_character, Qnil);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1040 Ichar res;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1041
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1042 /* Only allow this with the 96x96 character sets we are using for
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1043 temporary Unicode support. */
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1044 assert(2 == XCHARSET_DIMENSION(charset) && 96 == XCHARSET_CHARS(charset));
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1045
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1046 if (!NILP(zeichen))
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1047 {
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1048 int c1, c2;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1049
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1050 BREAKUP_ICHAR(XCHAR(zeichen), charset, c1, c2);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1051
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1052 if (127 == c1 && 127 == c2)
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1053 {
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1054 /* We've already used the hightest-numbered character in this
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1055 set--tell our caller to create another. */
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1056 return -1;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1057 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1058
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1059 if (127 == c2)
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1060 {
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1061 ++c1;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1062 c2 = 0x20;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1063 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1064 else
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1065 {
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1066 ++c2;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1067 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1068
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1069 res = make_ichar(charset, c1, c2);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1070 Fput(name, Qlast_allocated_character, make_char(res));
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1071 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1072 else
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1073 {
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1074 res = make_ichar(charset, 32, 32);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1075 Fput(name, Qlast_allocated_character, make_char(res));
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1076 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1077 return res;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1078 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1079
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1080 /* The just-in-time creation of XEmacs characters that correspond to unknown
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1081 Unicode code points happens when:
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1082
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1083 1. The lookup would otherwise fail.
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1084
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1085 2. The charsets array is the nil or the default.
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1086
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1087 If there are no free code points in the just-in-time Unicode character
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1088 set, and the charsets array is the default unicode precedence list,
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1089 create a new just-in-time Unicode character set, add it at the end of the
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1090 unicode precedence list, create the XEmacs character in that character
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1091 set, and return it. */
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1092
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1093 static Ichar
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1094 unicode_to_ichar (int code, Lisp_Object_dynarr *charsets)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1095 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1096 int u1, u2, u3, u4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1097 int code_levels;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1098 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1099 int n = Dynarr_length (charsets);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1100
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1101 type_checking_assert (code >= 0);
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1102 /* This shortcut depends on the representation of an Ichar, see text.c.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1103 Note that it may _not_ be extended to U+00A0 to U+00FF (many ISO 8859
893
c9f067fd71a3 [xemacs-hg @ 2002-07-02 12:32:34 by stephent]
stephent
parents: 877
diff changeset
1104 coded character sets have points that map into that region, so this
c9f067fd71a3 [xemacs-hg @ 2002-07-02 12:32:34 by stephent]
stephent
parents: 877
diff changeset
1105 function is many-valued). */
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1106 if (code < 0xA0)
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1107 return (Ichar) code;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1108
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1109 BREAKUP_UNICODE_CODE (code, u4, u3, u2, u1, code_levels);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1110
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1111 for (i = 0; i < n; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1112 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1113 Lisp_Object charset = Dynarr_at (charsets, i);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1114 int charset_levels = XCHARSET_FROM_UNICODE_LEVELS (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1115 if (charset_levels >= code_levels)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1116 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1117 void *table = XCHARSET_FROM_UNICODE_TABLE (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1118 short retval;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1119
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1120 switch (charset_levels)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1121 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1122 case 1: retval = ((short *) table)[u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1123 case 2: retval = ((short **) table)[u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1124 case 3: retval = ((short ***) table)[u3][u2][u1]; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1125 case 4: retval = ((short ****) table)[u4][u3][u2][u1]; break;
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
1126 default: ABORT (); retval = 0;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1127 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1128
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1129 if (retval != -1)
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1130 return make_ichar (charset, retval >> 8, retval & 0xFF);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1131 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1132 }
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1133
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1134 /* Only do the magic just-in-time assignment if we're using the default
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1135 list. */
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1136 if (unicode_precedence_dynarr == charsets)
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1137 {
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1138 if (NILP (Vcurrent_jit_charset) ||
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1139 (-1 == (i = get_free_codepoint(Vcurrent_jit_charset))))
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1140 {
3452
551c008d3777 [xemacs-hg @ 2006-06-14 06:10:08 by aidan]
aidan
parents: 3439
diff changeset
1141 Ibyte setname[32];
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1142 int number_of_jit_charsets = XINT (Vnumber_of_jit_charsets);
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1143 Ascbyte last_jit_charset_final = XCHAR (Vlast_jit_charset_final);
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1144
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1145 /* This final byte shit is, umm, not that cool. */
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1146 assert (last_jit_charset_final >= 0x30);
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1147
3452
551c008d3777 [xemacs-hg @ 2006-06-14 06:10:08 by aidan]
aidan
parents: 3439
diff changeset
1148 /* Assertion added partly because our Win32 layer doesn't
551c008d3777 [xemacs-hg @ 2006-06-14 06:10:08 by aidan]
aidan
parents: 3439
diff changeset
1149 support snprintf; with this, we're sure it won't overflow
551c008d3777 [xemacs-hg @ 2006-06-14 06:10:08 by aidan]
aidan
parents: 3439
diff changeset
1150 the buffer. */
551c008d3777 [xemacs-hg @ 2006-06-14 06:10:08 by aidan]
aidan
parents: 3439
diff changeset
1151 assert(100 > number_of_jit_charsets);
551c008d3777 [xemacs-hg @ 2006-06-14 06:10:08 by aidan]
aidan
parents: 3439
diff changeset
1152
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1153 qxesprintf(setname, "jit-ucs-charset-%d", number_of_jit_charsets);
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1154
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1155 Vcurrent_jit_charset = Fmake_charset
4953
304aebb79cd3 function renamings to track names of char typedefs
Ben Wing <ben@xemacs.org>
parents: 4952
diff changeset
1156 (intern_istring (setname), Vcharset_descr,
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1157 /* Set encode-as-utf-8 to t, to have this character set written
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1158 using UTF-8 escapes in escape-quoted and ctext. This
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1159 sidesteps the fact that our internal character -> Unicode
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1160 mapping is not stable from one invocation to the next. */
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1161 nconc2 (list2(Qencode_as_utf_8, Qt),
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1162 nconc2 (list6(Qcolumns, make_int(1), Qchars, make_int(96),
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1163 Qdimension, make_int(2)),
3659
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
1164 list6(Qregistries, Qunicode_registries,
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1165 Qfinal, make_char(last_jit_charset_final),
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1166 /* This CCL program is initialised in
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1167 unicode.el. */
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1168 Qccl_program, Qccl_encode_to_ucs_2))));
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1169
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1170 /* Record for the Unicode infrastructure that we've created
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1171 this character set. */
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1172 Vnumber_of_jit_charsets = make_int (number_of_jit_charsets + 1);
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1173 Vlast_jit_charset_final = make_char (last_jit_charset_final + 1);
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1174
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1175 i = get_free_codepoint(Vcurrent_jit_charset);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1176 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1177
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1178 if (-1 != i)
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1179 {
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1180 set_unicode_conversion((Ichar)i, code);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1181 /* No need to add the charset to the end of the list; it's done
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1182 automatically. */
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1183 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1184 }
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1185 return (Ichar) i;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1186 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1187
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1188 /* Add charsets to precedence list.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1189 LIST must be a list of charsets. Charsets which are in the list more
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1190 than once are given the precedence implied by their earliest appearance.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1191 Later appearances are ignored. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1192 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1193 add_charsets_to_precedence_list (Lisp_Object list, int *lbs,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1194 Lisp_Object_dynarr *dynarr)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1195 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1196 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1197 EXTERNAL_LIST_LOOP_2 (elt, list)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1198 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1199 Lisp_Object charset = Fget_charset (elt);
778
2923009caf47 [xemacs-hg @ 2002-03-16 10:38:59 by ben]
ben
parents: 771
diff changeset
1200 int lb = XCHARSET_LEADING_BYTE (charset);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1201 if (lbs[lb - MIN_LEADING_BYTE] == 0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1202 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1203 Dynarr_add (dynarr, charset);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1204 lbs[lb - MIN_LEADING_BYTE] = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1205 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1206 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1207 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1208 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1209
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1210 /* Rebuild the charset precedence array.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1211 The "charsets preferred for the current language" get highest precedence,
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1212 followed by the "charsets preferred by default", ordered as in
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1213 Vlanguage_unicode_precedence_list and Vdefault_unicode_precedence_list,
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1214 respectively. All remaining charsets follow in an arbitrary order. */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1215 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1216 recalculate_unicode_precedence (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1217 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1218 int lbs[NUM_LEADING_BYTES];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1219 int i;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1220
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1221 for (i = 0; i < NUM_LEADING_BYTES; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1222 lbs[i] = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1223
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1224 Dynarr_reset (unicode_precedence_dynarr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1225
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1226 add_charsets_to_precedence_list (Vlanguage_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1227 lbs, unicode_precedence_dynarr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1228 add_charsets_to_precedence_list (Vdefault_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1229 lbs, unicode_precedence_dynarr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1230
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1231 for (i = 0; i < NUM_LEADING_BYTES; i++)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1232 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1233 if (lbs[i] == 0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1234 {
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
1235 Lisp_Object charset = charset_by_leading_byte (i + MIN_LEADING_BYTE);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1236 if (!NILP (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1237 Dynarr_add (unicode_precedence_dynarr, charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1238 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1239 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1240 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1241
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1242 DEFUN ("unicode-precedence-list",
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1243 Funicode_precedence_list,
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1244 0, 0, 0, /*
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1245 Return the precedence order among charsets used for Unicode decoding.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1246
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1247 Value is a list of charsets, which are searched in order for a translation
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1248 matching a given Unicode character.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1249
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1250 The highest precedence is given to the language-specific precedence list of
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1251 charsets, defined by `set-language-unicode-precedence-list'. These are
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1252 followed by charsets in the default precedence list, defined by
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1253 `set-default-unicode-precedence-list'. Charsets occurring multiple times are
5384
3889ef128488 Fix misspelled words, and some grammar, across the entire source tree.
Jerry James <james@xemacs.org>
parents: 5345
diff changeset
1254 given precedence according to their first occurrence in either list. These
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1255 are followed by the remaining charsets, in some arbitrary order.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1256
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1257 The language-specific precedence list is meant to be set as part of the
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1258 language environment initialization; the default precedence list is meant
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1259 to be set by the user.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1260
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1261 #### NOTE: This interface may be changed.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1262 */
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1263 ())
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1264 {
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1265 int i;
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1266 Lisp_Object list = Qnil;
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1267
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1268 for (i = Dynarr_length (unicode_precedence_dynarr) - 1; i >= 0; i--)
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1269 list = Fcons (Dynarr_at (unicode_precedence_dynarr, i), list);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1270 return list;
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1271 }
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1272
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1273
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1274 /* #### This interface is wrong. Cyrillic users and Chinese users are going
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1275 to have varying opinions about whether ISO Cyrillic, KOI8-R, or Windows
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1276 1251 should take precedence, and whether Big Five or CNS should take
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1277 precedence, respectively. This means that users are sometimes going to
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1278 want to set Vlanguage_unicode_precedence_list.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1279 Furthermore, this should be language-local (buffer-local would be a
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1280 reasonable approximation).
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1281
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1282 Answer: You are right, this needs rethinking. */
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1283 DEFUN ("set-language-unicode-precedence-list",
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1284 Fset_language_unicode_precedence_list,
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1285 1, 1, 0, /*
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1286 Set the language-specific precedence of charsets in Unicode decoding.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1287 LIST is a list of charsets.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1288 See `unicode-precedence-list' for more information.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1289
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1290 #### NOTE: This interface may be changed.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1291 */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1292 (list))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1293 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1294 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1295 EXTERNAL_LIST_LOOP_2 (elt, list)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1296 Fget_charset (elt);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1297 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1298
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1299 Vlanguage_unicode_precedence_list = list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1300 recalculate_unicode_precedence ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1301 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1302 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1303
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1304 DEFUN ("language-unicode-precedence-list",
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1305 Flanguage_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1306 0, 0, 0, /*
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1307 Return the language-specific precedence list used for Unicode decoding.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1308 See `unicode-precedence-list' for more information.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1309
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1310 #### NOTE: This interface may be changed.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1311 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1312 ())
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1313 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1314 return Vlanguage_unicode_precedence_list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1315 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1316
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1317 DEFUN ("set-default-unicode-precedence-list",
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1318 Fset_default_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1319 1, 1, 0, /*
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1320 Set the default precedence list used for Unicode decoding.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1321 This is intended to be set by the user. See
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1322 `unicode-precedence-list' for more information.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1323
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1324 #### NOTE: This interface may be changed.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1325 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1326 (list))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1327 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1328 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1329 EXTERNAL_LIST_LOOP_2 (elt, list)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1330 Fget_charset (elt);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1331 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1332
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1333 Vdefault_unicode_precedence_list = list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1334 recalculate_unicode_precedence ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1335 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1336 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1337
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1338 DEFUN ("default-unicode-precedence-list",
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1339 Fdefault_unicode_precedence_list,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1340 0, 0, 0, /*
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1341 Return the default precedence list used for Unicode decoding.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1342 See `unicode-precedence-list' for more information.
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1343
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1344 #### NOTE: This interface may be changed.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1345 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1346 ())
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1347 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1348 return Vdefault_unicode_precedence_list;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1349 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1350
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1351 DEFUN ("set-unicode-conversion", Fset_unicode_conversion,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1352 2, 2, 0, /*
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1353 Add conversion information between Unicode codepoints and characters.
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1354 Conversions for U+0000 to U+00FF are hardwired to ASCII, Control-1, and
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1355 Latin-1. Attempts to set these values will raise an error.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1356
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1357 CHARACTER is one of the following:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1358
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1359 -- A character (in which case CODE must be a non-negative integer; values
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1360 above 2^20 - 1 are allowed for the purpose of specifying private
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1361 characters, but are illegal in standard Unicode---they will cause errors
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1362 when converted to utf-16)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1363 -- A vector of characters (in which case CODE must be a vector of integers
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1364 of the same length)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1365 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1366 (character, code))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1367 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1368 Lisp_Object charset;
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1369 int ichar, unicode;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1370
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1371 CHECK_CHAR (character);
5307
c096d8051f89 Have NATNUMP give t for positive bignums; check limits appropriately.
Aidan Kehoe <kehoea@parhasard.net>
parents: 5157
diff changeset
1372
c096d8051f89 Have NATNUMP give t for positive bignums; check limits appropriately.
Aidan Kehoe <kehoea@parhasard.net>
parents: 5157
diff changeset
1373 check_integer_range (code, Qzero, make_integer (EMACS_INT_MAX));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1374
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1375 unicode = XINT (code);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1376 ichar = XCHAR (character);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1377 charset = ichar_charset (ichar);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1378
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1379 /* The translations of ASCII, Control-1, and Latin-1 code points are
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1380 hard-coded in ichar_to_unicode and unicode_to_ichar.
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1381
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1382 Checking unicode < 256 && ichar != unicode is wrong because Mule gives
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1383 many Latin characters code points in a few different character sets. */
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1384 if ((EQ (charset, Vcharset_ascii) ||
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1385 EQ (charset, Vcharset_control_1) ||
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1386 EQ (charset, Vcharset_latin_iso8859_1))
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1387 && unicode != ichar)
893
c9f067fd71a3 [xemacs-hg @ 2002-07-02 12:32:34 by stephent]
stephent
parents: 877
diff changeset
1388 signal_error (Qinvalid_argument, "Can't change Unicode translation for ASCII, Control-1 or Latin-1 character",
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1389 character);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1390
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1391 /* #### Composite characters are not properly implemented yet. */
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1392 if (EQ (charset, Vcharset_composite))
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1393 signal_error (Qinvalid_argument, "Can't set Unicode translation for Composite char",
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1394 character);
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1395
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1396 set_unicode_conversion (ichar, unicode);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1397 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1398 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1399
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1400 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1401
800
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
1402 DEFUN ("char-to-unicode", Fchar_to_unicode, 1, 1, 0, /*
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1403 Convert character to Unicode codepoint.
3025
facf3239ba30 [xemacs-hg @ 2005-10-25 11:16:19 by ben]
ben
parents: 3024
diff changeset
1404 When there is no international support (i.e. the `mule' feature is not
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1405 present), this function simply does `char-to-int'.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1406 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1407 (character))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1408 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1409 CHECK_CHAR (character);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1410 #ifdef MULE
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1411 return make_int (ichar_to_unicode (XCHAR (character)));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1412 #else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1413 return Fchar_to_int (character);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1414 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1415 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1416
800
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
1417 DEFUN ("unicode-to-char", Funicode_to_char, 1, 2, 0, /*
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1418 Convert Unicode codepoint to character.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1419 CODE should be a non-negative integer.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1420 If CHARSETS is given, it should be a list of charsets, and only those
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1421 charsets will be consulted, in the given order, for a translation.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1422 Otherwise, the default ordering of all charsets will be given (see
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1423 `set-unicode-charset-precedence').
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1424
3025
facf3239ba30 [xemacs-hg @ 2005-10-25 11:16:19 by ben]
ben
parents: 3024
diff changeset
1425 When there is no international support (i.e. the `mule' feature is not
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1426 present), this function simply does `int-to-char' and ignores the CHARSETS
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1427 argument.
2622
c8a9be2d4728 [xemacs-hg @ 2005-02-28 23:36:30 by aidan]
aidan
parents: 2551
diff changeset
1428
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1429 If the CODE would not otherwise be converted to an XEmacs character, and the
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1430 list of character sets to be consulted is nil or the default, a new XEmacs
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1431 character will be created for it in one of the `jit-ucs-charset' Mule
4268
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1432 character sets, and that character will be returned.
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1433
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1434 This is limited to around 400,000 characters per XEmacs session, though, so
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1435 while normal usage will not be problematic, things like:
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1436
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1437 \(dotimes (i #x110000) (decode-char 'ucs i))
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1438
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1439 will eventually error. The long-term solution to this is Unicode as an
75d0292c1bff [xemacs-hg @ 2007-11-14 19:41:04 by aidan]
aidan
parents: 4096
diff changeset
1440 internal encoding.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1441 */
2333
ba4677f54a05 [xemacs-hg @ 2004-10-14 17:26:18 by james]
james
parents: 2286
diff changeset
1442 (code, USED_IF_MULE (charsets)))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1443 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1444 #ifdef MULE
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1445 Lisp_Object_dynarr *dyn;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1446 int lbs[NUM_LEADING_BYTES];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1447 int c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1448
5307
c096d8051f89 Have NATNUMP give t for positive bignums; check limits appropriately.
Aidan Kehoe <kehoea@parhasard.net>
parents: 5157
diff changeset
1449 check_integer_range (code, Qzero, make_integer (EMACS_INT_MAX));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1450 c = XINT (code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1451 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1452 EXTERNAL_LIST_LOOP_2 (elt, charsets)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1453 Fget_charset (elt);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1454 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1455
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1456 if (NILP (charsets))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1457 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1458 Ichar ret = unicode_to_ichar (c, unicode_precedence_dynarr);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1459 if (ret == -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1460 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1461 return make_char (ret);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1462 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1463
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1464 dyn = Dynarr_new (Lisp_Object);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1465 memset (lbs, 0, NUM_LEADING_BYTES * sizeof (int));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1466 add_charsets_to_precedence_list (charsets, lbs, dyn);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1467 {
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1468 Ichar ret = unicode_to_ichar (c, dyn);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1469 Dynarr_free (dyn);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1470 if (ret == -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1471 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1472 return make_char (ret);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1473 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1474 #else
5307
c096d8051f89 Have NATNUMP give t for positive bignums; check limits appropriately.
Aidan Kehoe <kehoea@parhasard.net>
parents: 5157
diff changeset
1475 check_integer_range (code, Qzero, make_integer (EMACS_INT_MAX));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1476 return Fint_to_char (code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1477 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1478 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1479
872
79c6ff3eef26 [xemacs-hg @ 2002-06-20 21:18:01 by ben]
ben
parents: 867
diff changeset
1480 #ifdef MULE
79c6ff3eef26 [xemacs-hg @ 2002-06-20 21:18:01 by ben]
ben
parents: 867
diff changeset
1481
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1482 static Lisp_Object
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1483 cerrar_el_fulano (Lisp_Object fulano)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1484 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1485 FILE *file = (FILE *) get_opaque_ptr (fulano);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1486 retry_fclose (file);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1487 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1488 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1489
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1490 DEFUN ("load-unicode-mapping-table", Fload_unicode_mapping_table,
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1491 2, 6, 0, /*
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1492 Load Unicode tables with the Unicode mapping data in FILENAME for CHARSET.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1493 Data is text, in the form of one translation per line -- charset
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1494 codepoint followed by Unicode codepoint. Numbers are decimal or hex
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1495 \(preceded by 0x). Comments are marked with a #. Charset codepoints
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1496 for two-dimensional charsets have the first octet stored in the
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1497 high 8 bits of the hex number and the second in the low 8 bits.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1498
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1499 If START and END are given, only charset codepoints within the given
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1500 range will be processed. (START and END apply to the codepoints in the
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1501 file, before OFFSET is applied.)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1502
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1503 If OFFSET is given, that value will be added to all charset codepoints
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1504 in the file to obtain the internal charset codepoint. \(We assume
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1505 that octets in the table are in the range 33 to 126 or 32 to 127. If
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1506 you have a table in ku-ten form, with octets in the range 1 to 94, you
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1507 will have to use an offset of 5140, i.e. 0x2020.)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1508
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1509 FLAGS, if specified, control further how the tables are interpreted
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1510 and are used to special-case certain known format deviations in the
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1511 Unicode tables or in the charset:
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1512
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1513 `ignore-first-column'
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1514 The JIS X 0208 tables have 3 columns of data instead of 2. The first
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1515 column contains the Shift-JIS codepoint, which we ignore.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1516 `big5'
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1517 The charset codepoints are Big Five codepoints; convert it to the
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1518 hacked-up Mule codepoint in `chinese-big5-1' or `chinese-big5-2'.
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1519 */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1520 (filename, charset, start, end, offset, flags))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1521 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1522 int st = 0, en = INT_MAX, of = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1523 FILE *file;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1524 struct gcpro gcpro1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1525 char line[1025];
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1526 int fondo = specpdl_depth ();
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1527 int ignore_first_column = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1528 int big5 = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1529
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1530 CHECK_STRING (filename);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1531 charset = Fget_charset (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1532 if (!NILP (start))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1533 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1534 CHECK_INT (start);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1535 st = XINT (start);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1536 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1537 if (!NILP (end))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1538 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1539 CHECK_INT (end);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1540 en = XINT (end);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1541 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1542 if (!NILP (offset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1543 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1544 CHECK_INT (offset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1545 of = XINT (offset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1546 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1547
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1548 if (!LISTP (flags))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1549 flags = list1 (flags);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1550
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1551 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1552 EXTERNAL_LIST_LOOP_2 (elt, flags)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1553 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1554 if (EQ (elt, Qignore_first_column))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1555 ignore_first_column = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1556 else if (EQ (elt, Qbig5))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1557 big5 = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1558 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1559 invalid_constant
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
1560 ("Unrecognized `load-unicode-mapping-table' flag", elt);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1561 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1562 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1563
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1564 GCPRO1 (filename);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1565 filename = Fexpand_file_name (filename, Qnil);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1566 file = qxe_fopen (XSTRING_DATA (filename), READ_TEXT);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1567 if (!file)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1568 report_file_error ("Cannot open", filename);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1569 record_unwind_protect (cerrar_el_fulano, make_opaque_ptr (file));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1570 while (fgets (line, sizeof (line), file))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1571 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1572 char *p = line;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1573 int cp1, cp2, endcount;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1574 int cp1high, cp1low;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1575 int dummy;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1576
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1577 while (*p) /* erase all comments out of the line */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1578 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1579 if (*p == '#')
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1580 *p = '\0';
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1581 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1582 p++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1583 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1584 /* see if line is nothing but whitespace and skip if so */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1585 p = line + strspn (line, " \t\n\r\f");
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1586 if (!*p)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1587 continue;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1588 /* NOTE: It appears that MS Windows and Newlib sscanf() have
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1589 different interpretations for whitespace (== "skip all whitespace
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1590 at processing point"): Newlib requires at least one corresponding
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1591 whitespace character in the input, but MS allows none. The
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1592 following would be easier to write if we could count on the MS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1593 interpretation.
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1594
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1595 Also, the return value does NOT include %n storage. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1596 if ((!ignore_first_column ?
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1597 sscanf (p, "%i %i%n", &cp1, &cp2, &endcount) < 2 :
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1598 sscanf (p, "%i %i %i%n", &dummy, &cp1, &cp2, &endcount) < 3)
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1599 /* #### Temporary code! Cygwin newlib fucked up scanf() handling
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1600 of numbers beginning 0x0... starting in 04/2004, in an attempt
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1601 to fix another bug. A partial fix for this was put in in
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1602 06/2004, but as of 10/2004 the value of ENDCOUNT returned in
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1603 such case is still wrong. If this gets fixed soon, remove
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1604 this code. --ben */
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1605 #ifndef CYGWIN_SCANF_BUG
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1606 || *(p + endcount + strspn (p + endcount, " \t\n\r\f"))
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1607 #endif
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
1608 )
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1609 {
793
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1610 warn_when_safe (Qunicode, Qwarning,
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1611 "Unrecognized line in translation file %s:\n%s",
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1612 XSTRING_DATA (filename), line);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1613 continue;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1614 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1615 if (cp1 >= st && cp1 <= en)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1616 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1617 cp1 += of;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1618 if (cp1 < 0 || cp1 >= 65536)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1619 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1620 out_of_range:
793
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1621 warn_when_safe (Qunicode, Qwarning,
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1622 "Out of range first codepoint 0x%x in "
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1623 "translation file %s:\n%s",
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1624 cp1, XSTRING_DATA (filename), line);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1625 continue;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1626 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1627
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1628 cp1high = cp1 >> 8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1629 cp1low = cp1 & 255;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1630
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1631 if (big5)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1632 {
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1633 Ichar ch = decode_big5_char (cp1high, cp1low);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1634 if (ch == -1)
793
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1635
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1636 warn_when_safe (Qunicode, Qwarning,
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1637 "Out of range Big5 codepoint 0x%x in "
e38acbeb1cae [xemacs-hg @ 2002-03-29 04:46:17 by ben]
ben
parents: 788
diff changeset
1638 "translation file %s:\n%s",
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1639 cp1, XSTRING_DATA (filename), line);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1640 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1641 set_unicode_conversion (ch, cp2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1642 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1643 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1644 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1645 int l1, h1, l2, h2;
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1646 Ichar emch;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1647
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1648 switch (XCHARSET_TYPE (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1649 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1650 case CHARSET_TYPE_94: l1 = 33; h1 = 126; l2 = 0; h2 = 0; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1651 case CHARSET_TYPE_96: l1 = 32; h1 = 127; l2 = 0; h2 = 0; break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1652 case CHARSET_TYPE_94X94: l1 = 33; h1 = 126; l2 = 33; h2 = 126;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1653 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1654 case CHARSET_TYPE_96X96: l1 = 32; h1 = 127; l2 = 32; h2 = 127;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1655 break;
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
1656 default: ABORT (); l1 = 0; h1 = 0; l2 = 0; h2 = 0;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1657 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1658
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1659 if (cp1high < l2 || cp1high > h2 || cp1low < l1 || cp1low > h1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1660 goto out_of_range;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1661
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1662 emch = (cp1high == 0 ? make_ichar (charset, cp1low, 0) :
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1663 make_ichar (charset, cp1high, cp1low));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1664 set_unicode_conversion (emch, cp2);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1665 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1666 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1667 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1668
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1669 if (ferror (file))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1670 report_file_error ("IO error when reading", filename);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1671
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1672 unbind_to (fondo); /* close file */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1673 UNGCPRO;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1674 return Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1675 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1676
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1677 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1678
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1679
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1680 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1681 /* Unicode coding system */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1682 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1683
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1684 struct unicode_coding_system
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1685 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1686 enum unicode_type type;
1887
1e5b7843dfa0 [xemacs-hg @ 2004-01-27 17:55:15 by james]
james
parents: 1726
diff changeset
1687 unsigned int little_endian :1;
1e5b7843dfa0 [xemacs-hg @ 2004-01-27 17:55:15 by james]
james
parents: 1726
diff changeset
1688 unsigned int need_bom :1;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1689 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1690
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1691 #define CODING_SYSTEM_UNICODE_TYPE(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1692 (CODING_SYSTEM_TYPE_DATA (codesys, unicode)->type)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1693 #define XCODING_SYSTEM_UNICODE_TYPE(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1694 CODING_SYSTEM_UNICODE_TYPE (XCODING_SYSTEM (codesys))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1695 #define CODING_SYSTEM_UNICODE_LITTLE_ENDIAN(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1696 (CODING_SYSTEM_TYPE_DATA (codesys, unicode)->little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1697 #define XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1698 CODING_SYSTEM_UNICODE_LITTLE_ENDIAN (XCODING_SYSTEM (codesys))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1699 #define CODING_SYSTEM_UNICODE_NEED_BOM(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1700 (CODING_SYSTEM_TYPE_DATA (codesys, unicode)->need_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1701 #define XCODING_SYSTEM_UNICODE_NEED_BOM(codesys) \
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1702 CODING_SYSTEM_UNICODE_NEED_BOM (XCODING_SYSTEM (codesys))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1703
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1704 struct unicode_coding_stream
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1705 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1706 /* decode */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1707 unsigned char counter;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1708 unsigned char indicated_length;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1709 int seen_char;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1710 /* encode */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1711 Lisp_Object current_charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1712 int current_char_boundary;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1713 int wrote_bom;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1714 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1715
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
1716 static const struct memory_description unicode_coding_system_description[] = {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1717 { XD_END }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1718 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1719
1204
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
1720 DEFINE_CODING_SYSTEM_TYPE_WITH_DATA (unicode);
e22b0213b713 [xemacs-hg @ 2003-01-12 11:07:58 by michaels]
michaels
parents: 985
diff changeset
1721
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1722 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1723 decode_unicode_char (int ch, unsigned_char_dynarr *dst,
1887
1e5b7843dfa0 [xemacs-hg @ 2004-01-27 17:55:15 by james]
james
parents: 1726
diff changeset
1724 struct unicode_coding_stream *data,
1e5b7843dfa0 [xemacs-hg @ 2004-01-27 17:55:15 by james]
james
parents: 1726
diff changeset
1725 unsigned int ignore_bom)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1726 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1727 if (ch == 0xFEFF && !data->seen_char && ignore_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1728 ;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1729 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1730 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1731 #ifdef MULE
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
1732 Ichar chr = unicode_to_ichar (ch, unicode_precedence_dynarr);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1733
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1734 if (chr != -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1735 {
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1736 Ibyte work[MAX_ICHAR_LEN];
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1737 int len;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1738
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1739 len = set_itext_ichar (work, chr);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1740 Dynarr_add_many (dst, work, len);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1741 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1742 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1743 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1744 Dynarr_add (dst, LEADING_BYTE_JAPANESE_JISX0208);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1745 Dynarr_add (dst, 34 + 128);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1746 Dynarr_add (dst, 46 + 128);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1747 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1748 #else
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1749 Dynarr_add (dst, (Ibyte) ch);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1750 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1751 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1752
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1753 data->seen_char = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1754 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1755
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1756 #define DECODE_ERROR_OCTET(octet, dst, data, ignore_bom) \
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1757 decode_unicode_char ((octet) + UNICODE_ERROR_OCTET_RANGE_START, \
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1758 dst, data, ignore_bom)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1759
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1760 static inline void
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1761 indicate_invalid_utf_8 (unsigned char indicated_length,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1762 unsigned char counter,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1763 int ch, unsigned_char_dynarr *dst,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1764 struct unicode_coding_stream *data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1765 unsigned int ignore_bom)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1766 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1767 Binbyte stored = indicated_length - counter;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1768 Binbyte mask = "\x00\x00\xC0\xE0\xF0\xF8\xFC"[indicated_length];
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1769
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1770 while (stored > 0)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1771 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1772 DECODE_ERROR_OCTET (((ch >> (6 * (stored - 1))) & 0x3f) | mask,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1773 dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1774 mask = 0x80, stored--;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1775 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1776 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1777
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1778 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1779 encode_unicode_char_1 (int code, unsigned_char_dynarr *dst,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1780 enum unicode_type type, unsigned int little_endian,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1781 int write_error_characters_as_such)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1782 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1783 switch (type)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1784 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1785 case UNICODE_UTF_16:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1786 if (little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1787 {
3952
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
1788 if (code < 0x10000) {
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
1789 Dynarr_add (dst, (unsigned char) (code & 255));
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
1790 Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1791 } else if (write_error_characters_as_such &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1792 code >= UNICODE_ERROR_OCTET_RANGE_START &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1793 code < (UNICODE_ERROR_OCTET_RANGE_START + 0x100))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1794 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1795 Dynarr_add (dst, (unsigned char) ((code & 0xFF)));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1796 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1797 else if (code < 0x110000)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1798 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1799 /* Little endian; least significant byte first. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1800 int first, second;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1801
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1802 CODE_TO_UTF_16_SURROGATES(code, first, second);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1803
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1804 Dynarr_add (dst, (unsigned char) (first & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1805 Dynarr_add (dst, (unsigned char) ((first >> 8) & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1806
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1807 Dynarr_add (dst, (unsigned char) (second & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1808 Dynarr_add (dst, (unsigned char) ((second >> 8) & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1809 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1810 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1811 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1812 /* Not valid Unicode. Pass U+FFFD, least significant byte
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1813 first. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1814 Dynarr_add (dst, (unsigned char) 0xFD);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1815 Dynarr_add (dst, (unsigned char) 0xFF);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1816 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1817 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1818 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1819 {
3952
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
1820 if (code < 0x10000) {
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
1821 Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
1822 Dynarr_add (dst, (unsigned char) (code & 255));
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1823 } else if (write_error_characters_as_such &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1824 code >= UNICODE_ERROR_OCTET_RANGE_START &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1825 code < (UNICODE_ERROR_OCTET_RANGE_START + 0x100))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1826 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1827 Dynarr_add (dst, (unsigned char) ((code & 0xFF)));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1828 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1829 else if (code < 0x110000)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1830 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1831 /* Big endian; most significant byte first. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1832 int first, second;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1833
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1834 CODE_TO_UTF_16_SURROGATES(code, first, second);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1835
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1836 Dynarr_add (dst, (unsigned char) ((first >> 8) & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1837 Dynarr_add (dst, (unsigned char) (first & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1838
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1839 Dynarr_add (dst, (unsigned char) ((second >> 8) & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1840 Dynarr_add (dst, (unsigned char) (second & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1841 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1842 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1843 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1844 /* Not valid Unicode. Pass U+FFFD, most significant byte
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1845 first. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1846 Dynarr_add (dst, (unsigned char) 0xFF);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1847 Dynarr_add (dst, (unsigned char) 0xFD);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1848 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1849 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1850 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1851
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1852 case UNICODE_UCS_4:
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1853 case UNICODE_UTF_32:
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1854 if (little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1855 {
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1856 if (write_error_characters_as_such &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1857 code >= UNICODE_ERROR_OCTET_RANGE_START &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1858 code < (UNICODE_ERROR_OCTET_RANGE_START + 0x100))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1859 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1860 Dynarr_add (dst, (unsigned char) ((code & 0xFF)));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1861 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1862 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1863 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1864 /* We generate and accept incorrect sequences here, which is
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1865 okay, in the interest of preservation of the user's
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1866 data. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1867 Dynarr_add (dst, (unsigned char) (code & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1868 Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1869 Dynarr_add (dst, (unsigned char) ((code >> 16) & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1870 Dynarr_add (dst, (unsigned char) (code >> 24));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1871 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1872 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1873 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1874 {
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1875 if (write_error_characters_as_such &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1876 code >= UNICODE_ERROR_OCTET_RANGE_START &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1877 code < (UNICODE_ERROR_OCTET_RANGE_START + 0x100))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1878 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1879 Dynarr_add (dst, (unsigned char) ((code & 0xFF)));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1880 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1881 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1882 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1883 /* We generate and accept incorrect sequences here, which is okay,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1884 in the interest of preservation of the user's data. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1885 Dynarr_add (dst, (unsigned char) (code >> 24));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1886 Dynarr_add (dst, (unsigned char) ((code >> 16) & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1887 Dynarr_add (dst, (unsigned char) ((code >> 8) & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1888 Dynarr_add (dst, (unsigned char) (code & 255));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1889 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1890 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1891 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1892
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1893 case UNICODE_UTF_8:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1894 if (code <= 0x7f)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1895 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1896 Dynarr_add (dst, (unsigned char) code);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1897 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1898 else if (code <= 0x7ff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1899 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1900 Dynarr_add (dst, (unsigned char) ((code >> 6) | 0xc0));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1901 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1902 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1903 else if (code <= 0xffff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1904 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1905 Dynarr_add (dst, (unsigned char) ((code >> 12) | 0xe0));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1906 Dynarr_add (dst, (unsigned char) (((code >> 6) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1907 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1908 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1909 else if (code <= 0x1fffff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1910 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1911 Dynarr_add (dst, (unsigned char) ((code >> 18) | 0xf0));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1912 Dynarr_add (dst, (unsigned char) (((code >> 12) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1913 Dynarr_add (dst, (unsigned char) (((code >> 6) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1914 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1915 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1916 else if (code <= 0x3ffffff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1917 {
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1918
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1919 #if !(UNICODE_ERROR_OCTET_RANGE_START > 0x1fffff \
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1920 && UNICODE_ERROR_OCTET_RANGE_START < 0x3ffffff)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1921 #error "This code needs to be rewritten. "
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1922 #endif
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1923 if (write_error_characters_as_such &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1924 code >= UNICODE_ERROR_OCTET_RANGE_START &&
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1925 code < (UNICODE_ERROR_OCTET_RANGE_START + 0x100))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1926 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1927 Dynarr_add (dst, (unsigned char) ((code & 0xFF)));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1928 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1929 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1930 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1931 Dynarr_add (dst, (unsigned char) ((code >> 24) | 0xf8));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1932 Dynarr_add (dst, (unsigned char) (((code >> 18) & 0x3f) | 0x80));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1933 Dynarr_add (dst, (unsigned char) (((code >> 12) & 0x3f) | 0x80));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1934 Dynarr_add (dst, (unsigned char) (((code >> 6) & 0x3f) | 0x80));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1935 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1936 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1937 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1938 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1939 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1940 Dynarr_add (dst, (unsigned char) ((code >> 30) | 0xfc));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1941 Dynarr_add (dst, (unsigned char) (((code >> 24) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1942 Dynarr_add (dst, (unsigned char) (((code >> 18) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1943 Dynarr_add (dst, (unsigned char) (((code >> 12) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1944 Dynarr_add (dst, (unsigned char) (((code >> 6) & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1945 Dynarr_add (dst, (unsigned char) ((code & 0x3f) | 0x80));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1946 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1947 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1948
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
1949 case UNICODE_UTF_7: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1950
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
1951 default: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1952 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1953 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1954
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1955 /* Also used in mule-coding.c for UTF-8 handling in ISO 2022-oriented
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1956 encodings. */
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
1957 void
2333
ba4677f54a05 [xemacs-hg @ 2004-10-14 17:26:18 by james]
james
parents: 2286
diff changeset
1958 encode_unicode_char (Lisp_Object USED_IF_MULE (charset), int h,
ba4677f54a05 [xemacs-hg @ 2004-10-14 17:26:18 by james]
james
parents: 2286
diff changeset
1959 int USED_IF_MULE (l), unsigned_char_dynarr *dst,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1960 enum unicode_type type, unsigned int little_endian,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1961 int write_error_characters_as_such)
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1962 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1963 #ifdef MULE
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
1964 int code = ichar_to_unicode (make_ichar (charset, h & 127, l & 127));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1965
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1966 if (code == -1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1967 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1968 if (type != UNICODE_UTF_16 &&
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1969 XCHARSET_DIMENSION (charset) == 2 &&
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1970 XCHARSET_CHARS (charset) == 94)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1971 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1972 unsigned char final = XCHARSET_FINAL (charset);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1973
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1974 if (('@' <= final) && (final < 0x7f))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1975 code = (0xe00000 + (final - '@') * 94 * 94
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1976 + ((h & 127) - 33) * 94 + (l & 127) - 33);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1977 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1978 code = '?';
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1979 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1980 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1981 code = '?';
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1982 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1983 #else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1984 int code = h;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1985 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1986
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1987 encode_unicode_char_1 (code, dst, type, little_endian,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
1988 write_error_characters_as_such);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1989 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1990
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1991 static Bytecount
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1992 unicode_convert (struct coding_stream *str, const UExtbyte *src,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1993 unsigned_char_dynarr *dst, Bytecount n)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1994 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1995 unsigned int ch = str->ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1996 struct unicode_coding_stream *data = CODING_STREAM_TYPE_DATA (str, unicode);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1997 enum unicode_type type =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
1998 XCODING_SYSTEM_UNICODE_TYPE (str->codesys);
1887
1e5b7843dfa0 [xemacs-hg @ 2004-01-27 17:55:15 by james]
james
parents: 1726
diff changeset
1999 unsigned int little_endian =
1e5b7843dfa0 [xemacs-hg @ 2004-01-27 17:55:15 by james]
james
parents: 1726
diff changeset
2000 XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (str->codesys);
1e5b7843dfa0 [xemacs-hg @ 2004-01-27 17:55:15 by james]
james
parents: 1726
diff changeset
2001 unsigned int ignore_bom = XCODING_SYSTEM_UNICODE_NEED_BOM (str->codesys);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2002 Bytecount orign = n;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2003
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2004 if (str->direction == CODING_DECODE)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2005 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2006 unsigned char counter = data->counter;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2007 unsigned char indicated_length
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2008 = data->indicated_length;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2009
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2010 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2011 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2012 UExtbyte c = *src++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2013
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2014 switch (type)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2015 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2016 case UNICODE_UTF_8:
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2017 if (0 == counter)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2018 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2019 if (0 == (c & 0x80))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2020 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2021 /* ASCII. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2022 decode_unicode_char (c, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2023 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2024 else if (0 == (c & 0x40))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2025 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2026 /* Highest bit set, second highest not--there's
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2027 something wrong. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2028 DECODE_ERROR_OCTET (c, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2029 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2030 else if (0 == (c & 0x20))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2031 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2032 ch = c & 0x1f;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2033 counter = 1;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2034 indicated_length = 2;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2035 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2036 else if (0 == (c & 0x10))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2037 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2038 ch = c & 0x0f;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2039 counter = 2;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2040 indicated_length = 3;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2041 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2042 else if (0 == (c & 0x08))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2043 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2044 ch = c & 0x0f;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2045 counter = 3;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2046 indicated_length = 4;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2047 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2048 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2049 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2050 /* We don't supports lengths longer than 4 in
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2051 external-format data. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2052 DECODE_ERROR_OCTET (c, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2053
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2054 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2055 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2056 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2057 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2058 /* counter != 0 */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2059 if ((0 == (c & 0x80)) || (0 != (c & 0x40)))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2060 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2061 indicate_invalid_utf_8(indicated_length,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2062 counter,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2063 ch, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2064 if (c & 0x80)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2065 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2066 DECODE_ERROR_OCTET (c, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2067 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2068 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2069 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2070 /* The character just read is ASCII. Treat it as
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2071 such. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2072 decode_unicode_char (c, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2073 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2074 ch = 0;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2075 counter = 0;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2076 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2077 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2078 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2079 ch = (ch << 6) | (c & 0x3f);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2080 counter--;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2081 /* Just processed the final byte. Emit the character. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2082 if (!counter)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2083 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2084 /* Don't accept over-long sequences, surrogates,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2085 or codes above #x10FFFF. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2086 if ((ch < 0x80) ||
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2087 ((ch < 0x800) && indicated_length > 2) ||
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2088 ((ch < 0x10000) && indicated_length > 3) ||
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2089 valid_utf_16_surrogate(ch) || (ch > 0x110000))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2090 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2091 indicate_invalid_utf_8(indicated_length,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2092 counter,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2093 ch, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2094 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2095 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2096 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2097 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2098 decode_unicode_char (ch, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2099 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2100 ch = 0;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2101 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2102 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2103 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2104 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2105
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2106 case UNICODE_UTF_16:
3952
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
2107
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2108 if (little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2109 ch = (c << counter) | ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2110 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2111 ch = (ch << 8) | c;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2112
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2113 counter += 8;
3952
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
2114
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2115 if (16 == counter)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2116 {
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2117 int tempch = ch;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2118
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2119 if (valid_utf_16_first_surrogate(ch))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2120 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2121 break;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2122 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2123 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2124 counter = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2125 decode_unicode_char (tempch, dst, data, ignore_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2126 }
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2127 else if (32 == counter)
3952
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
2128 {
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
2129 int tempch;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2130
4583
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2131 if (little_endian)
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2132 {
4583
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2133 if (!valid_utf_16_last_surrogate(ch >> 16))
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2134 {
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2135 DECODE_ERROR_OCTET (ch & 0xFF, dst, data,
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2136 ignore_bom);
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2137 DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2138 ignore_bom);
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2139 DECODE_ERROR_OCTET ((ch >> 16) & 0xFF, dst, data,
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2140 ignore_bom);
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2141 DECODE_ERROR_OCTET ((ch >> 24) & 0xFF, dst, data,
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2142 ignore_bom);
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2143 }
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2144 else
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2145 {
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2146 tempch = utf_16_surrogates_to_code((ch & 0xffff),
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2147 (ch >> 16));
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2148 decode_unicode_char(tempch, dst, data, ignore_bom);
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2149 }
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2150 }
4583
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2151 else
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2152 {
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2153 if (!valid_utf_16_last_surrogate(ch & 0xFFFF))
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2154 {
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2155 DECODE_ERROR_OCTET ((ch >> 24) & 0xFF, dst, data,
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2156 ignore_bom);
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2157 DECODE_ERROR_OCTET ((ch >> 16) & 0xFF, dst, data,
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2158 ignore_bom);
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2159 DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2160 ignore_bom);
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2161 DECODE_ERROR_OCTET (ch & 0xFF, dst, data,
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2162 ignore_bom);
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2163 }
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2164 else
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2165 {
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2166 tempch = utf_16_surrogates_to_code((ch >> 16),
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2167 (ch & 0xffff));
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2168 decode_unicode_char(tempch, dst, data, ignore_bom);
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2169 }
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2170 }
2669b1b7e33b Correct little-endian UTF-16 surrogate handling.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4270
diff changeset
2171
3952
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
2172 ch = 0;
3584cb2c07db [xemacs-hg @ 2007-05-13 11:11:28 by aidan]
aidan
parents: 3767
diff changeset
2173 counter = 0;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2174 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2175 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2176 assert(8 == counter || 24 == counter);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2177 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2178
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2179 case UNICODE_UCS_4:
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2180 case UNICODE_UTF_32:
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2181 if (little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2182 ch = (c << counter) | ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2183 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2184 ch = (ch << 8) | c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2185 counter += 8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2186 if (counter == 32)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2187 {
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2188 if (ch > 0x10ffff)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2189 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2190 /* ch is not a legal Unicode character. We're fine
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2191 with that in UCS-4, though not in UTF-32. */
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2192 if (UNICODE_UCS_4 == type && ch < 0x80000000)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2193 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2194 decode_unicode_char (ch, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2195 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2196 else if (little_endian)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2197 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2198 DECODE_ERROR_OCTET (ch & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2199 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2200 DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2201 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2202 DECODE_ERROR_OCTET ((ch >> 16) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2203 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2204 DECODE_ERROR_OCTET ((ch >> 24) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2205 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2206 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2207 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2208 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2209 DECODE_ERROR_OCTET ((ch >> 24) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2210 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2211 DECODE_ERROR_OCTET ((ch >> 16) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2212 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2213 DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2214 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2215 DECODE_ERROR_OCTET (ch & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2216 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2217 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2218 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2219 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2220 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2221 decode_unicode_char (ch, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2222 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2223 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2224 counter = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2225 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2226 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2227
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2228 case UNICODE_UTF_7:
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
2229 ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2230 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2231
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
2232 default: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2233 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2234
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2235 }
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2236
4688
7e54adf407a1 Fix a bug with Unicode error sequences and very short input strings.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4583
diff changeset
2237 if (str->eof && counter)
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2238 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2239 switch (type)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2240 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2241 case UNICODE_UTF_8:
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2242 indicate_invalid_utf_8(indicated_length,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2243 counter, ch, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2244 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2245 break;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2246
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2247 case UNICODE_UTF_16:
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2248 case UNICODE_UCS_4:
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2249 case UNICODE_UTF_32:
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2250 if (8 == counter)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2251 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2252 DECODE_ERROR_OCTET (ch, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2253 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2254 else if (16 == counter)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2255 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2256 if (little_endian)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2257 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2258 DECODE_ERROR_OCTET (ch & 0xFF, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2259 DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2260 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2261 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2262 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2263 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2264 DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2265 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2266 DECODE_ERROR_OCTET (ch & 0xFF, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2267 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2268 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2269 else if (24 == counter)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2270 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2271 if (little_endian)
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2272 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2273 DECODE_ERROR_OCTET ((ch >> 16) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2274 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2275 DECODE_ERROR_OCTET (ch & 0xFF, dst, data, ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2276 DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2277 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2278 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2279 else
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2280 {
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2281 DECODE_ERROR_OCTET ((ch >> 16) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2282 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2283 DECODE_ERROR_OCTET ((ch >> 8) & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2284 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2285 DECODE_ERROR_OCTET (ch & 0xFF, dst, data,
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2286 ignore_bom);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2287 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2288 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2289 else assert(0);
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2290 break;
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2291 }
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2292 ch = 0;
4688
7e54adf407a1 Fix a bug with Unicode error sequences and very short input strings.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4583
diff changeset
2293 counter = 0;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2294 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2295
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2296 data->counter = counter;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2297 data->indicated_length = indicated_length;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2298 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2299 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2300 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2301 unsigned char char_boundary = data->current_char_boundary;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2302 Lisp_Object charset = data->current_charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2303
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2304 #ifdef ENABLE_COMPOSITE_CHARS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2305 /* flags for handling composite chars. We do a little switcheroo
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2306 on the source while we're outputting the composite char. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2307 Bytecount saved_n = 0;
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
2308 const Ibyte *saved_src = NULL;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2309 int in_composite = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2310
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2311 back_to_square_n:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2312 #endif /* ENABLE_COMPOSITE_CHARS */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2313
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2314 if (XCODING_SYSTEM_UNICODE_NEED_BOM (str->codesys) && !data->wrote_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2315 {
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2316 encode_unicode_char_1 (0xFEFF, dst, type, little_endian, 1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2317 data->wrote_bom = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2318 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2319
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2320 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2321 {
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
2322 Ibyte c = *src++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2323
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2324 #ifdef MULE
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
2325 if (byte_ascii_p (c))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2326 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2327 { /* Processing ASCII character */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2328 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2329 encode_unicode_char (Vcharset_ascii, c, 0, dst, type,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2330 little_endian, 1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2331
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2332 char_boundary = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2333 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2334 #ifdef MULE
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
2335 else if (ibyte_leading_byte_p (c) || ibyte_leading_byte_p (ch))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2336 { /* Processing Leading Byte */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2337 ch = 0;
826
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
2338 charset = charset_by_leading_byte (c);
6728e641994e [xemacs-hg @ 2002-05-05 11:30:15 by ben]
ben
parents: 800
diff changeset
2339 if (leading_byte_prefix_p(c))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2340 ch = c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2341 char_boundary = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2342 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2343 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2344 { /* Processing Non-ASCII character */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2345 char_boundary = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2346 if (EQ (charset, Vcharset_control_1))
2704
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
2347 /* See:
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
2348
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
2349 (Info-goto-node "(internals)Internal String Encoding")
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
2350
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
2351 for the rationale behind subtracting #xa0 from the
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
2352 character's code. */
114400ea911b [xemacs-hg @ 2005-03-31 14:56:37 by aidan]
aidan
parents: 2622
diff changeset
2353 encode_unicode_char (Vcharset_control_1, c - 0xa0, 0, dst,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2354 type, little_endian, 1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2355 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2356 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2357 switch (XCHARSET_REP_BYTES (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2358 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2359 case 2:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2360 encode_unicode_char (charset, c, 0, dst, type,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2361 little_endian, 1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2362 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2363 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2364 if (XCHARSET_PRIVATE_P (charset))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2365 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2366 encode_unicode_char (charset, c, 0, dst, type,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2367 little_endian, 1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2368 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2369 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2370 else if (ch)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2371 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2372 #ifdef ENABLE_COMPOSITE_CHARS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2373 if (EQ (charset, Vcharset_composite))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2374 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2375 if (in_composite)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2376 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2377 /* #### Bother! We don't know how to
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2378 handle this yet. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2379 encode_unicode_char (Vcharset_ascii, '~', 0,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2380 dst, type,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2381 little_endian, 1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2382 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2383 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2384 {
867
804517e16990 [xemacs-hg @ 2002-06-05 09:54:39 by ben]
ben
parents: 826
diff changeset
2385 Ichar emch = make_ichar (Vcharset_composite,
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2386 ch & 0x7F,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2387 c & 0x7F);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2388 Lisp_Object lstr =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2389 composite_char_string (emch);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2390 saved_n = n;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2391 saved_src = src;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2392 in_composite = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2393 src = XSTRING_DATA (lstr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2394 n = XSTRING_LENGTH (lstr);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2395 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2396 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2397 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2398 #endif /* ENABLE_COMPOSITE_CHARS */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2399 encode_unicode_char (charset, ch, c, dst, type,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2400 little_endian, 1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2401 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2402 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2403 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2404 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2405 ch = c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2406 char_boundary = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2407 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2408 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2409 case 4:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2410 if (ch)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2411 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2412 encode_unicode_char (charset, ch, c, dst, type,
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2413 little_endian, 1);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2414 ch = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2415 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2416 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2417 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2418 ch = c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2419 char_boundary = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2420 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2421 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2422 default:
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
2423 ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2424 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2425 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2426 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2427 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2428 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2429
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2430 #ifdef ENABLE_COMPOSITE_CHARS
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2431 if (in_composite)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2432 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2433 n = saved_n;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2434 src = saved_src;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2435 in_composite = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2436 goto back_to_square_n; /* Wheeeeeeeee ..... */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2437 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2438 #endif /* ENABLE_COMPOSITE_CHARS */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2439
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2440 data->current_char_boundary = char_boundary;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2441 data->current_charset = charset;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2442
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2443 /* La palabra se hizo carne! */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2444 /* A palavra fez-se carne! */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2445 /* Whatever. */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2446 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2447
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2448 str->ch = ch;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2449 return orign;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2450 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2451
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2452 /* DEFINE_DETECTOR (utf_7); */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2453 DEFINE_DETECTOR (utf_8);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2454 DEFINE_DETECTOR_CATEGORY (utf_8, utf_8);
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2455 DEFINE_DETECTOR_CATEGORY (utf_8, utf_8_bom);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2456 DEFINE_DETECTOR (ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2457 DEFINE_DETECTOR_CATEGORY (ucs_4, ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2458 DEFINE_DETECTOR (utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2459 DEFINE_DETECTOR_CATEGORY (utf_16, utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2460 DEFINE_DETECTOR_CATEGORY (utf_16, utf_16_little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2461 DEFINE_DETECTOR_CATEGORY (utf_16, utf_16_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2462 DEFINE_DETECTOR_CATEGORY (utf_16, utf_16_little_endian_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2463
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2464 struct ucs_4_detector
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2465 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2466 int in_ucs_4_byte;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2467 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2468
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2469 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2470 ucs_4_detect (struct detection_state *st, const UExtbyte *src,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2471 Bytecount n)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2472 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2473 struct ucs_4_detector *data = DETECTION_STATE_DATA (st, ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2474
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2475 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2476 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2477 UExtbyte c = *src++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2478 switch (data->in_ucs_4_byte)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2479 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2480 case 0:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2481 if (c >= 128)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2482 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2483 DET_RESULT (st, ucs_4) = DET_NEARLY_IMPOSSIBLE;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2484 return;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2485 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2486 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2487 data->in_ucs_4_byte++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2488 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2489 case 3:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2490 data->in_ucs_4_byte = 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2491 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2492 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2493 data->in_ucs_4_byte++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2494 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2495 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2496
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2497 /* !!#### write this for real */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2498 DET_RESULT (st, ucs_4) = DET_AS_LIKELY_AS_UNLIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2499 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2500
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2501 struct utf_16_detector
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2502 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2503 unsigned int seen_ffff:1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2504 unsigned int seen_forward_bom:1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2505 unsigned int seen_rev_bom:1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2506 int byteno;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2507 int prev_char;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2508 int text, rev_text;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2509 int sep, rev_sep;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2510 int num_ascii;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2511 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2512
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2513 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2514 utf_16_detect (struct detection_state *st, const UExtbyte *src,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2515 Bytecount n)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2516 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2517 struct utf_16_detector *data = DETECTION_STATE_DATA (st, utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2518
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2519 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2520 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2521 UExtbyte c = *src++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2522 int prevc = data->prev_char;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2523 if (data->byteno == 1 && c == 0xFF && prevc == 0xFE)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2524 data->seen_forward_bom = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2525 else if (data->byteno == 1 && c == 0xFE && prevc == 0xFF)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2526 data->seen_rev_bom = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2527
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2528 if (data->byteno & 1)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2529 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2530 if (c == 0xFF && prevc == 0xFF)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2531 data->seen_ffff = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2532 if (prevc == 0
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2533 && (c == '\r' || c == '\n'
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2534 || (c >= 0x20 && c <= 0x7E)))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2535 data->text++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2536 if (c == 0
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2537 && (prevc == '\r' || prevc == '\n'
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2538 || (prevc >= 0x20 && prevc <= 0x7E)))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2539 data->rev_text++;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2540 /* #### 0x2028 is LINE SEPARATOR and 0x2029 is PARAGRAPH SEPARATOR.
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2541 I used to count these in text and rev_text but that is very bad,
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2542 as 0x2028 is also space + left-paren in ASCII, which is extremely
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2543 common. So, what do we do with these? */
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2544 if (prevc == 0x20 && (c == 0x28 || c == 0x29))
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2545 data->sep++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2546 if (c == 0x20 && (prevc == 0x28 || prevc == 0x29))
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2547 data->rev_sep++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2548 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2549
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2550 if ((c >= ' ' && c <= '~') || c == '\n' || c == '\r' || c == '\t' ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2551 c == '\f' || c == '\v')
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2552 data->num_ascii++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2553 data->byteno++;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2554 data->prev_char = c;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2555 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2556
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2557 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2558 int variance_indicates_big_endian =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2559 (data->text >= 10
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2560 && (data->rev_text == 0
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2561 || data->text / data->rev_text >= 10));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2562 int variance_indicates_little_endian =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2563 (data->rev_text >= 10
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2564 && (data->text == 0
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2565 || data->rev_text / data->text >= 10));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2566
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2567 if (data->seen_ffff)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2568 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2569 else if (data->seen_forward_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2570 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2571 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2572 if (variance_indicates_big_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2573 DET_RESULT (st, utf_16_bom) = DET_NEAR_CERTAINTY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2574 else if (variance_indicates_little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2575 DET_RESULT (st, utf_16_bom) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2576 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2577 DET_RESULT (st, utf_16_bom) = DET_QUITE_PROBABLE;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2578 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2579 else if (data->seen_forward_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2580 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2581 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2582 if (variance_indicates_big_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2583 DET_RESULT (st, utf_16_bom) = DET_NEAR_CERTAINTY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2584 else if (variance_indicates_little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2585 /* #### may need to rethink */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2586 DET_RESULT (st, utf_16_bom) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2587 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2588 /* #### may need to rethink */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2589 DET_RESULT (st, utf_16_bom) = DET_QUITE_PROBABLE;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2590 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2591 else if (data->seen_rev_bom)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2592 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2593 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2594 if (variance_indicates_little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2595 DET_RESULT (st, utf_16_little_endian_bom) = DET_NEAR_CERTAINTY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2596 else if (variance_indicates_big_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2597 /* #### may need to rethink */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2598 DET_RESULT (st, utf_16_little_endian_bom) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2599 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2600 /* #### may need to rethink */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2601 DET_RESULT (st, utf_16_little_endian_bom) = DET_QUITE_PROBABLE;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2602 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2603 else if (variance_indicates_big_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2604 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2605 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2606 DET_RESULT (st, utf_16) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2607 DET_RESULT (st, utf_16_little_endian) = DET_SOMEWHAT_UNLIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2608 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2609 else if (variance_indicates_little_endian)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2610 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2611 SET_DET_RESULTS (st, utf_16, DET_NEARLY_IMPOSSIBLE);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2612 DET_RESULT (st, utf_16) = DET_SOMEWHAT_UNLIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2613 DET_RESULT (st, utf_16_little_endian) = DET_SOMEWHAT_LIKELY;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2614 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2615 else
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2616 {
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2617 /* #### FUCKME! There should really be an ASCII detector. This
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2618 would rule out the need to have this built-in here as
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2619 well. --ben */
1292
f3437b56874d [xemacs-hg @ 2003-02-13 09:57:04 by ben]
ben
parents: 1267
diff changeset
2620 int pct_ascii = data->byteno ? (100 * data->num_ascii) / data->byteno
f3437b56874d [xemacs-hg @ 2003-02-13 09:57:04 by ben]
ben
parents: 1267
diff changeset
2621 : 100;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2622
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2623 if (pct_ascii > 90)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2624 SET_DET_RESULTS (st, utf_16, DET_QUITE_IMPROBABLE);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2625 else if (pct_ascii > 75)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2626 SET_DET_RESULTS (st, utf_16, DET_SOMEWHAT_UNLIKELY);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2627 else
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2628 SET_DET_RESULTS (st, utf_16, DET_AS_LIKELY_AS_UNLIKELY);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2629 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2630 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2631 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2632
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2633 struct utf_8_detector
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2634 {
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2635 int byteno;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2636 int first_byte;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2637 int second_byte;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2638 int prev_byte;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2639 int in_utf_8_byte;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2640 int recent_utf_8_sequence;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2641 int seen_bogus_utf8;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2642 int seen_really_bogus_utf8;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2643 int seen_2byte_sequence;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2644 int seen_longer_sequence;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2645 int seen_iso2022_esc;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2646 int seen_iso_shift;
1887
1e5b7843dfa0 [xemacs-hg @ 2004-01-27 17:55:15 by james]
james
parents: 1726
diff changeset
2647 unsigned int seen_utf_bom:1;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2648 };
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2649
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2650 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2651 utf_8_detect (struct detection_state *st, const UExtbyte *src,
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2652 Bytecount n)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2653 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2654 struct utf_8_detector *data = DETECTION_STATE_DATA (st, utf_8);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2655
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2656 while (n--)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2657 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2658 UExtbyte c = *src++;
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2659 switch (data->byteno)
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2660 {
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2661 case 0:
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2662 data->first_byte = c;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2663 break;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2664 case 1:
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2665 data->second_byte = c;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2666 break;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2667 case 2:
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2668 if (data->first_byte == 0xef &&
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2669 data->second_byte == 0xbb &&
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2670 c == 0xbf)
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2671 data->seen_utf_bom = 1;
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2672 break;
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2673 }
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2674
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2675 switch (data->in_utf_8_byte)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2676 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2677 case 0:
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2678 if (data->prev_byte == ISO_CODE_ESC && c >= 0x28 && c <= 0x2F)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2679 data->seen_iso2022_esc++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2680 else if (c == ISO_CODE_SI || c == ISO_CODE_SO)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2681 data->seen_iso_shift++;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2682 else if (c >= 0xfc)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2683 data->in_utf_8_byte = 5;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2684 else if (c >= 0xf8)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2685 data->in_utf_8_byte = 4;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2686 else if (c >= 0xf0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2687 data->in_utf_8_byte = 3;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2688 else if (c >= 0xe0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2689 data->in_utf_8_byte = 2;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2690 else if (c >= 0xc0)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2691 data->in_utf_8_byte = 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2692 else if (c >= 0x80)
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2693 data->seen_bogus_utf8++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2694 if (data->in_utf_8_byte > 0)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2695 data->recent_utf_8_sequence = data->in_utf_8_byte;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2696 break;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2697 default:
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2698 if ((c & 0xc0) != 0x80)
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2699 data->seen_really_bogus_utf8++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2700 else
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2701 {
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2702 data->in_utf_8_byte--;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2703 if (data->in_utf_8_byte == 0)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2704 {
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2705 if (data->recent_utf_8_sequence == 1)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2706 data->seen_2byte_sequence++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2707 else
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2708 {
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2709 assert (data->recent_utf_8_sequence >= 2);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2710 data->seen_longer_sequence++;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2711 }
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2712 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2713 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2714 }
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2715
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
2716 data->byteno++;
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2717 data->prev_byte = c;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2718 }
1267
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2719
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2720 /* either BOM or no BOM, but not both */
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2721 SET_DET_RESULTS (st, utf_8, DET_NEARLY_IMPOSSIBLE);
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2722
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2723
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2724 if (data->seen_utf_bom)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2725 DET_RESULT (st, utf_8_bom) = DET_NEAR_CERTAINTY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2726 else
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2727 {
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2728 if (data->seen_really_bogus_utf8 ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2729 data->seen_bogus_utf8 >= 2)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2730 ; /* bogus */
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2731 else if (data->seen_bogus_utf8)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2732 DET_RESULT (st, utf_8) = DET_SOMEWHAT_UNLIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2733 else if ((data->seen_longer_sequence >= 5 ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2734 data->seen_2byte_sequence >= 10) &&
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2735 (!(data->seen_iso2022_esc + data->seen_iso_shift) ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2736 (data->seen_longer_sequence * 2 + data->seen_2byte_sequence) /
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2737 (data->seen_iso2022_esc + data->seen_iso_shift) >= 10))
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2738 /* heuristics, heuristics, we love heuristics */
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2739 DET_RESULT (st, utf_8) = DET_QUITE_PROBABLE;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2740 else if (data->seen_iso2022_esc ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2741 data->seen_iso_shift >= 3)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2742 DET_RESULT (st, utf_8) = DET_SOMEWHAT_UNLIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2743 else if (data->seen_longer_sequence ||
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2744 data->seen_2byte_sequence)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2745 DET_RESULT (st, utf_8) = DET_SOMEWHAT_LIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2746 else if (data->seen_iso_shift)
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2747 DET_RESULT (st, utf_8) = DET_SOMEWHAT_UNLIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2748 else
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2749 DET_RESULT (st, utf_8) = DET_AS_LIKELY_AS_UNLIKELY;
c57f32e44416 [xemacs-hg @ 2003-02-07 01:43:05 by ben]
ben
parents: 1204
diff changeset
2750 }
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2751 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2752
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2753 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2754 unicode_init_coding_stream (struct coding_stream *str)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2755 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2756 struct unicode_coding_stream *data =
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2757 CODING_STREAM_TYPE_DATA (str, unicode);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2758 xzero (*data);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2759 data->current_charset = Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2760 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2761
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2762 static void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2763 unicode_rewind_coding_stream (struct coding_stream *str)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2764 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2765 unicode_init_coding_stream (str);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2766 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2767
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2768 static int
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2769 unicode_putprop (Lisp_Object codesys, Lisp_Object key, Lisp_Object value)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2770 {
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3659
diff changeset
2771 if (EQ (key, Qunicode_type))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2772 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2773 enum unicode_type type;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2774
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2775 if (EQ (value, Qutf_8))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2776 type = UNICODE_UTF_8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2777 else if (EQ (value, Qutf_16))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2778 type = UNICODE_UTF_16;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2779 else if (EQ (value, Qutf_7))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2780 type = UNICODE_UTF_7;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2781 else if (EQ (value, Qucs_4))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2782 type = UNICODE_UCS_4;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2783 else if (EQ (value, Qutf_32))
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2784 type = UNICODE_UTF_32;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2785 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2786 invalid_constant ("Invalid Unicode type", key);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2787
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2788 XCODING_SYSTEM_UNICODE_TYPE (codesys) = type;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2789 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2790 else if (EQ (key, Qlittle_endian))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2791 XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (codesys) = !NILP (value);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2792 else if (EQ (key, Qneed_bom))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2793 XCODING_SYSTEM_UNICODE_NEED_BOM (codesys) = !NILP (value);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2794 else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2795 return 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2796 return 1;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2797 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2798
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2799 static Lisp_Object
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2800 unicode_getprop (Lisp_Object coding_system, Lisp_Object prop)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2801 {
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3659
diff changeset
2802 if (EQ (prop, Qunicode_type))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2803 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2804 switch (XCODING_SYSTEM_UNICODE_TYPE (coding_system))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2805 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2806 case UNICODE_UTF_16: return Qutf_16;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2807 case UNICODE_UTF_8: return Qutf_8;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2808 case UNICODE_UTF_7: return Qutf_7;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2809 case UNICODE_UCS_4: return Qucs_4;
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
2810 case UNICODE_UTF_32: return Qutf_32;
2500
3d8143fc88e1 [xemacs-hg @ 2005-01-24 23:33:30 by ben]
ben
parents: 2367
diff changeset
2811 default: ABORT ();
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2812 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2813 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2814 else if (EQ (prop, Qlittle_endian))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2815 return XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (coding_system) ? Qt : Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2816 else if (EQ (prop, Qneed_bom))
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2817 return XCODING_SYSTEM_UNICODE_NEED_BOM (coding_system) ? Qt : Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2818 return Qunbound;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2819 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2820
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2821 static void
2286
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
2822 unicode_print (Lisp_Object cs, Lisp_Object printcharfun,
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
2823 int UNUSED (escapeflag))
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2824 {
3767
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3659
diff changeset
2825 write_fmt_string_lisp (printcharfun, "(%s", 1,
6b2ef948e140 [xemacs-hg @ 2006-12-29 18:09:38 by aidan]
aidan
parents: 3659
diff changeset
2826 unicode_getprop (cs, Qunicode_type));
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2827 if (XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (cs))
4952
19a72041c5ed Mule-izing, various fixes related to char * arguments
Ben Wing <ben@xemacs.org>
parents: 4834
diff changeset
2828 write_ascstring (printcharfun, ", little-endian");
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2829 if (XCODING_SYSTEM_UNICODE_NEED_BOM (cs))
4952
19a72041c5ed Mule-izing, various fixes related to char * arguments
Ben Wing <ben@xemacs.org>
parents: 4834
diff changeset
2830 write_ascstring (printcharfun, ", need-bom");
19a72041c5ed Mule-izing, various fixes related to char * arguments
Ben Wing <ben@xemacs.org>
parents: 4834
diff changeset
2831 write_ascstring (printcharfun, ")");
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2832 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
2833
4690
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2834 #ifdef MULE
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2835 DEFUN ("set-unicode-query-skip-chars-args", Fset_unicode_query_skip_chars_args,
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2836 3, 3, 0, /*
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2837 Specify strings as matching characters known to Unicode coding systems.
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2838
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2839 QUERY-STRING is a string matching characters that can unequivocally be
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2840 encoded by the Unicode coding systems.
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2841
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2842 INVALID-STRING is a string to match XEmacs characters that represent known
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2843 octets on disk, but that are invalid sequences according to Unicode.
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2844
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2845 UTF-8-INVALID-STRING is a more restrictive string to match XEmacs characters
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2846 that are invalid UTF-8 octets.
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2847
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2848 All three strings are in the format accepted by `skip-chars-forward'.
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2849 */
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2850 (query_string, invalid_string, utf_8_invalid_string))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2851 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2852 CHECK_STRING (query_string);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2853 CHECK_STRING (invalid_string);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2854 CHECK_STRING (utf_8_invalid_string);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2855
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2856 Vunicode_query_string = query_string;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2857 Vunicode_invalid_string = invalid_string;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2858 Vutf_8_invalid_string = utf_8_invalid_string;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2859
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2860 return Qnil;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2861 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2862
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2863 static void
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2864 add_lisp_string_to_skip_chars_range (Lisp_Object string, Lisp_Object rtab,
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2865 Lisp_Object value)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2866 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2867 Ibyte *p, *pend;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2868 Ichar c;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2869
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2870 p = XSTRING_DATA (string);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2871 pend = p + XSTRING_LENGTH (string);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2872
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2873 while (p != pend)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2874 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2875 c = itext_ichar (p);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2876
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2877 INC_IBYTEPTR (p);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2878
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2879 if (c == '\\')
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2880 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2881 if (p == pend) break;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2882 c = itext_ichar (p);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2883 INC_IBYTEPTR (p);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2884 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2885
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2886 if (p != pend && *p == '-')
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2887 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2888 Ichar cend;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2889
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2890 /* Skip over the dash. */
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2891 p++;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2892 if (p == pend) break;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2893 cend = itext_ichar (p);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2894
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2895 Fput_range_table (make_int (c), make_int (cend), value,
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2896 rtab);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2897
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2898 INC_IBYTEPTR (p);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2899 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2900 else
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2901 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2902 Fput_range_table (make_int (c), make_int (c), value, rtab);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2903 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2904 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2905 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2906
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2907 /* This function wouldn't be necessary if initialised range tables were
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2908 dumped properly; see
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2909 http://mid.gmane.org/18179.49815.622843.336527@parhasard.net . */
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2910 static void
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2911 initialize_unicode_query_range_tables_from_strings (void)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2912 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2913 CHECK_STRING (Vunicode_query_string);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2914 CHECK_STRING (Vunicode_invalid_string);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2915 CHECK_STRING (Vutf_8_invalid_string);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2916
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2917 Vunicode_query_skip_chars = Fmake_range_table (Qstart_closed_end_closed);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2918
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2919 add_lisp_string_to_skip_chars_range (Vunicode_query_string,
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2920 Vunicode_query_skip_chars,
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2921 Qsucceeded);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2922
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2923 Vunicode_invalid_and_query_skip_chars
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2924 = Fcopy_range_table (Vunicode_query_skip_chars);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2925
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2926 add_lisp_string_to_skip_chars_range (Vunicode_invalid_string,
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2927 Vunicode_invalid_and_query_skip_chars,
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2928 Qinvalid_sequence);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2929
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2930 Vutf_8_invalid_and_query_skip_chars
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2931 = Fcopy_range_table (Vunicode_query_skip_chars);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2932
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2933 add_lisp_string_to_skip_chars_range (Vutf_8_invalid_string,
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2934 Vutf_8_invalid_and_query_skip_chars,
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2935 Qinvalid_sequence);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2936 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2937
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2938 static Lisp_Object
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2939 unicode_query (Lisp_Object codesys, struct buffer *buf, Charbpos end,
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2940 int flags)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2941 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2942 Charbpos pos = BUF_PT (buf), fail_range_start, fail_range_end;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2943 Charbpos pos_byte = BYTE_BUF_PT (buf);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2944 Lisp_Object skip_chars_range_table, result = Qnil;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2945 enum query_coding_failure_reasons failed_reason,
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2946 previous_failed_reason = query_coding_succeeded;
4824
c12b646d84ee changes to get things to compile under latest cygwin
Ben Wing <ben@xemacs.org>
parents: 4770
diff changeset
2947 int checked_unicode,
c12b646d84ee changes to get things to compile under latest cygwin
Ben Wing <ben@xemacs.org>
parents: 4770
diff changeset
2948 invalid_lower_limit = UNICODE_ERROR_OCTET_RANGE_START,
c12b646d84ee changes to get things to compile under latest cygwin
Ben Wing <ben@xemacs.org>
parents: 4770
diff changeset
2949 invalid_upper_limit = -1,
c12b646d84ee changes to get things to compile under latest cygwin
Ben Wing <ben@xemacs.org>
parents: 4770
diff changeset
2950 unicode_type = XCODING_SYSTEM_UNICODE_TYPE (codesys);
4690
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2951
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2952 if (flags & QUERY_METHOD_HIGHLIGHT &&
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2953 /* If we're being called really early, live without highlights getting
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2954 cleared properly: */
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2955 !(UNBOUNDP (XSYMBOL (Qquery_coding_clear_highlights)->function)))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2956 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2957 /* It's okay to call Lisp here, the only non-stack object we may have
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2958 allocated up to this point is skip_chars_range_table, and that's
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2959 reachable from its entry in Vfixed_width_query_ranges_cache. */
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2960 call3 (Qquery_coding_clear_highlights, make_int (pos), make_int (end),
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2961 wrap_buffer (buf));
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2962 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2963
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2964 if (NILP (Vunicode_query_skip_chars))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2965 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2966 initialize_unicode_query_range_tables_from_strings();
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2967 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2968
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2969 if (flags & QUERY_METHOD_IGNORE_INVALID_SEQUENCES)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2970 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2971 switch (unicode_type)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2972 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2973 case UNICODE_UTF_8:
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2974 skip_chars_range_table = Vutf_8_invalid_and_query_skip_chars;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2975 break;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2976 case UNICODE_UTF_7:
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2977 /* #### See above. */
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2978 return Qunbound;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2979 break;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2980 default:
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2981 skip_chars_range_table = Vunicode_invalid_and_query_skip_chars;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2982 break;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2983 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2984 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2985 else
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2986 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2987 switch (unicode_type)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2988 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2989 case UNICODE_UTF_8:
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2990 invalid_lower_limit = UNICODE_ERROR_OCTET_RANGE_START + 0x80;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2991 invalid_upper_limit = UNICODE_ERROR_OCTET_RANGE_START + 0xFF;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2992 break;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2993 case UNICODE_UTF_7:
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2994 /* #### Work out what to do here in reality, read the spec and decide
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2995 which octets are invalid. */
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2996 return Qunbound;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2997 break;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2998 default:
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
2999 invalid_lower_limit = UNICODE_ERROR_OCTET_RANGE_START;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3000 invalid_upper_limit = UNICODE_ERROR_OCTET_RANGE_START + 0xFF;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3001 break;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3002 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3003
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3004 skip_chars_range_table = Vunicode_query_skip_chars;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3005 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3006
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3007 while (pos < end)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3008 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3009 Ichar ch = BYTE_BUF_FETCH_CHAR (buf, pos_byte);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3010 if ((ch < 0x100 ? 1 :
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3011 (!EQ (Qnil, Fget_range_table (make_int (ch), skip_chars_range_table,
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3012 Qnil)))))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3013 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3014 pos++;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3015 INC_BYTEBPOS (buf, pos_byte);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3016 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3017 else
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3018 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3019 fail_range_start = pos;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3020 while ((pos < end) &&
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3021 ((checked_unicode = ichar_to_unicode (ch),
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3022 -1 == checked_unicode
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3023 && (failed_reason = query_coding_unencodable))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3024 || (!(flags & QUERY_METHOD_IGNORE_INVALID_SEQUENCES) &&
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3025 (invalid_lower_limit <= checked_unicode) &&
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3026 (checked_unicode <= invalid_upper_limit)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3027 && (failed_reason = query_coding_invalid_sequence)))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3028 && (previous_failed_reason == query_coding_succeeded
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3029 || previous_failed_reason == failed_reason))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3030 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3031 pos++;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3032 INC_BYTEBPOS (buf, pos_byte);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3033 ch = BYTE_BUF_FETCH_CHAR (buf, pos_byte);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3034 previous_failed_reason = failed_reason;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3035 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3036
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3037 if (fail_range_start == pos)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3038 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3039 /* The character can actually be encoded; move on. */
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3040 pos++;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3041 INC_BYTEBPOS (buf, pos_byte);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3042 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3043 else
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3044 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3045 assert (previous_failed_reason == query_coding_invalid_sequence
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3046 || previous_failed_reason == query_coding_unencodable);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3047
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3048 if (flags & QUERY_METHOD_ERRORP)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3049 {
4952
19a72041c5ed Mule-izing, various fixes related to char * arguments
Ben Wing <ben@xemacs.org>
parents: 4834
diff changeset
3050 signal_error_2
19a72041c5ed Mule-izing, various fixes related to char * arguments
Ben Wing <ben@xemacs.org>
parents: 4834
diff changeset
3051 (Qtext_conversion_error,
19a72041c5ed Mule-izing, various fixes related to char * arguments
Ben Wing <ben@xemacs.org>
parents: 4834
diff changeset
3052 "Cannot encode using coding system",
19a72041c5ed Mule-izing, various fixes related to char * arguments
Ben Wing <ben@xemacs.org>
parents: 4834
diff changeset
3053 make_string_from_buffer (buf, fail_range_start,
19a72041c5ed Mule-izing, various fixes related to char * arguments
Ben Wing <ben@xemacs.org>
parents: 4834
diff changeset
3054 pos - fail_range_start),
19a72041c5ed Mule-izing, various fixes related to char * arguments
Ben Wing <ben@xemacs.org>
parents: 4834
diff changeset
3055 XCODING_SYSTEM_NAME (codesys));
4690
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3056 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3057
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3058 if (NILP (result))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3059 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3060 result = Fmake_range_table (Qstart_closed_end_open);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3061 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3062
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3063 fail_range_end = pos;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3064
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3065 Fput_range_table (make_int (fail_range_start),
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3066 make_int (fail_range_end),
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3067 (previous_failed_reason
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3068 == query_coding_unencodable ?
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3069 Qunencodable : Qinvalid_sequence),
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3070 result);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3071 previous_failed_reason = query_coding_succeeded;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3072
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3073 if (flags & QUERY_METHOD_HIGHLIGHT)
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3074 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3075 Lisp_Object extent
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3076 = Fmake_extent (make_int (fail_range_start),
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3077 make_int (fail_range_end),
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3078 wrap_buffer (buf));
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3079
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3080 Fset_extent_priority
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3081 (extent, make_int (2 + mouse_highlight_priority));
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3082 Fset_extent_face (extent, Qquery_coding_warning_face);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3083 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3084 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3085 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3086 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3087
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3088 return result;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3089 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3090 #else /* !MULE */
4770
b9aaf2a18957 Add missing return value type to unicode_query.
Stephen J. Turnbull <stephen@xemacs.org>
parents: 4690
diff changeset
3091 static Lisp_Object
4690
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3092 unicode_query (Lisp_Object UNUSED (codesys),
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3093 struct buffer * UNUSED (buf),
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3094 Charbpos UNUSED (end), int UNUSED (flags))
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3095 {
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3096 return Qnil;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3097 }
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3098 #endif
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3099
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3100 int
2286
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
3101 dfc_coding_system_is_unicode (
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
3102 #ifdef WIN32_ANY
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
3103 Lisp_Object codesys
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
3104 #else
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
3105 Lisp_Object UNUSED (codesys)
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
3106 #endif
04bc9d2f42c7 [xemacs-hg @ 2004-09-20 19:18:55 by james]
james
parents: 1887
diff changeset
3107 )
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3108 {
1315
70921960b980 [xemacs-hg @ 2003-02-20 08:19:28 by ben]
ben
parents: 1292
diff changeset
3109 #ifdef WIN32_ANY
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3110 codesys = Fget_coding_system (codesys);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3111 return (EQ (XCODING_SYSTEM_TYPE (codesys), Qunicode) &&
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3112 XCODING_SYSTEM_UNICODE_TYPE (codesys) == UNICODE_UTF_16 &&
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3113 XCODING_SYSTEM_UNICODE_LITTLE_ENDIAN (codesys));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3114
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3115 #else
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3116 return 0;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3117 #endif
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3118 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3119
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3120
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3121 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3122 /* Initialization */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3123 /************************************************************************/
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3124
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3125 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3126 syms_of_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3127 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3128 #ifdef MULE
877
e54d47b2d736 [xemacs-hg @ 2002-06-23 09:54:35 by stephent]
stephent
parents: 872
diff changeset
3129 DEFSUBR (Funicode_precedence_list);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3130 DEFSUBR (Fset_language_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3131 DEFSUBR (Flanguage_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3132 DEFSUBR (Fset_default_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3133 DEFSUBR (Fdefault_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3134 DEFSUBR (Fset_unicode_conversion);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3135
1318
b531bf8658e9 [xemacs-hg @ 2003-02-21 06:56:46 by ben]
ben
parents: 1315
diff changeset
3136 DEFSUBR (Fload_unicode_mapping_table);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3137
4690
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3138 DEFSUBR (Fset_unicode_query_skip_chars_args);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3139
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
3140 DEFSYMBOL (Qccl_encode_to_ucs_2);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
3141 DEFSYMBOL (Qlast_allocated_character);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3142 DEFSYMBOL (Qignore_first_column);
3659
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
3143
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
3144 DEFSYMBOL (Qunicode_registries);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3145 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3146
800
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
3147 DEFSUBR (Fchar_to_unicode);
a5954632b187 [xemacs-hg @ 2002-03-31 08:27:14 by ben]
ben
parents: 793
diff changeset
3148 DEFSUBR (Funicode_to_char);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3149
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3150 DEFSYMBOL (Qunicode);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3151 DEFSYMBOL (Qucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3152 DEFSYMBOL (Qutf_16);
4096
1abf84db2c7f [xemacs-hg @ 2007-08-04 20:00:10 by aidan]
aidan
parents: 3952
diff changeset
3153 DEFSYMBOL (Qutf_32);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3154 DEFSYMBOL (Qutf_8);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3155 DEFSYMBOL (Qutf_7);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3156
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3157 DEFSYMBOL (Qneed_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3158
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3159 DEFSYMBOL (Qutf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3160 DEFSYMBOL (Qutf_16_little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3161 DEFSYMBOL (Qutf_16_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3162 DEFSYMBOL (Qutf_16_little_endian_bom);
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
3163
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
3164 DEFSYMBOL (Qutf_8);
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
3165 DEFSYMBOL (Qutf_8_bom);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3166 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3167
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3168 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3169 coding_system_type_create_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3170 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3171 INITIALIZE_CODING_SYSTEM_TYPE_WITH_DATA (unicode, "unicode-coding-system-p");
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3172 CODING_SYSTEM_HAS_METHOD (unicode, print);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3173 CODING_SYSTEM_HAS_METHOD (unicode, convert);
4690
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3174 CODING_SYSTEM_HAS_METHOD (unicode, query);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3175 CODING_SYSTEM_HAS_METHOD (unicode, init_coding_stream);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3176 CODING_SYSTEM_HAS_METHOD (unicode, rewind_coding_stream);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3177 CODING_SYSTEM_HAS_METHOD (unicode, putprop);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3178 CODING_SYSTEM_HAS_METHOD (unicode, getprop);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3179
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3180 INITIALIZE_DETECTOR (utf_8);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3181 DETECTOR_HAS_METHOD (utf_8, detect);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3182 INITIALIZE_DETECTOR_CATEGORY (utf_8, utf_8);
985
7f62a956b825 [xemacs-hg @ 2002-09-01 06:41:40 by youngs]
youngs
parents: 893
diff changeset
3183 INITIALIZE_DETECTOR_CATEGORY (utf_8, utf_8_bom);
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3184
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3185 INITIALIZE_DETECTOR (ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3186 DETECTOR_HAS_METHOD (ucs_4, detect);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3187 INITIALIZE_DETECTOR_CATEGORY (ucs_4, ucs_4);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3188
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3189 INITIALIZE_DETECTOR (utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3190 DETECTOR_HAS_METHOD (utf_16, detect);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3191 INITIALIZE_DETECTOR_CATEGORY (utf_16, utf_16);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3192 INITIALIZE_DETECTOR_CATEGORY (utf_16, utf_16_little_endian);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3193 INITIALIZE_DETECTOR_CATEGORY (utf_16, utf_16_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3194 INITIALIZE_DETECTOR_CATEGORY (utf_16, utf_16_little_endian_bom);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3195 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3196
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3197 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3198 reinit_coding_system_type_create_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3199 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3200 REINITIALIZE_CODING_SYSTEM_TYPE (unicode);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3201 }
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3202
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3203 void
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3204 vars_of_unicode (void)
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3205 {
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3206 Fprovide (intern ("unicode"));
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3207
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3208 #ifdef MULE
4270
bd9b678f4db7 [xemacs-hg @ 2007-11-15 10:05:14 by aidan]
aidan
parents: 4268
diff changeset
3209 staticpro (&Vnumber_of_jit_charsets);
bd9b678f4db7 [xemacs-hg @ 2007-11-15 10:05:14 by aidan]
aidan
parents: 4268
diff changeset
3210 Vnumber_of_jit_charsets = make_int (0);
bd9b678f4db7 [xemacs-hg @ 2007-11-15 10:05:14 by aidan]
aidan
parents: 4268
diff changeset
3211 staticpro (&Vlast_jit_charset_final);
bd9b678f4db7 [xemacs-hg @ 2007-11-15 10:05:14 by aidan]
aidan
parents: 4268
diff changeset
3212 Vlast_jit_charset_final = make_char (0x30);
bd9b678f4db7 [xemacs-hg @ 2007-11-15 10:05:14 by aidan]
aidan
parents: 4268
diff changeset
3213 staticpro (&Vcharset_descr);
bd9b678f4db7 [xemacs-hg @ 2007-11-15 10:05:14 by aidan]
aidan
parents: 4268
diff changeset
3214 Vcharset_descr
4952
19a72041c5ed Mule-izing, various fixes related to char * arguments
Ben Wing <ben@xemacs.org>
parents: 4834
diff changeset
3215 = build_defer_string ("Mule charset for otherwise unknown Unicode code points.");
4270
bd9b678f4db7 [xemacs-hg @ 2007-11-15 10:05:14 by aidan]
aidan
parents: 4268
diff changeset
3216
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3217 staticpro (&Vlanguage_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3218 Vlanguage_unicode_precedence_list = Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3219
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3220 staticpro (&Vdefault_unicode_precedence_list);
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3221 Vdefault_unicode_precedence_list = Qnil;
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3222
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3223 unicode_precedence_dynarr = Dynarr_new (Lisp_Object);
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3224 dump_add_root_block_ptr (&unicode_precedence_dynarr,
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3225 &lisp_object_dynarr_description);
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3226
3659
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
3227
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
3228
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3229 init_blank_unicode_tables ();
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3230
3439
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
3231 staticpro (&Vcurrent_jit_charset);
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
3232 Vcurrent_jit_charset = Qnil;
d1754e7f0cea [xemacs-hg @ 2006-06-03 17:50:39 by aidan]
aidan
parents: 3352
diff changeset
3233
2367
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3234 /* Note that the "block" we are describing is a single pointer, and hence
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3235 we could potentially use dump_add_root_block_ptr(). However, given
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3236 the way the descriptions are written, we couldn't use them, and would
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3237 have to write new descriptions for each of the pointers below, since
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3238 we would have to make use of a description with an XD_BLOCK_ARRAY
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3239 in it. */
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3240
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3241 dump_add_root_block (&to_unicode_blank_1, sizeof (void *),
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3242 to_unicode_level_1_desc_1);
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3243 dump_add_root_block (&to_unicode_blank_2, sizeof (void *),
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3244 to_unicode_level_2_desc_1);
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3245
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3246 dump_add_root_block (&from_unicode_blank_1, sizeof (void *),
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3247 from_unicode_level_1_desc_1);
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3248 dump_add_root_block (&from_unicode_blank_2, sizeof (void *),
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3249 from_unicode_level_2_desc_1);
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3250 dump_add_root_block (&from_unicode_blank_3, sizeof (void *),
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3251 from_unicode_level_3_desc_1);
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3252 dump_add_root_block (&from_unicode_blank_4, sizeof (void *),
ecf1ebac70d8 [xemacs-hg @ 2004-11-04 23:05:23 by ben]
ben
parents: 2333
diff changeset
3253 from_unicode_level_4_desc_1);
3659
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
3254
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
3255 DEFVAR_LISP ("unicode-registries", &Qunicode_registries /*
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
3256 Vector describing the X11 registries searched when using fallback fonts.
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
3257
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
3258 "Fallback fonts" here includes by default those fonts used by redisplay when
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
3259 displaying charsets for which the `encode-as-utf-8' property is true, and
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
3260 those used when no font matching the charset's registries property has been
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
3261 found (that is, they're probably Mule-specific charsets like Ethiopic or
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
3262 IPA.)
98af8a976fc3 [xemacs-hg @ 2006-11-05 22:31:31 by aidan]
aidan
parents: 3452
diff changeset
3263 */ );
4952
19a72041c5ed Mule-izing, various fixes related to char * arguments
Ben Wing <ben@xemacs.org>
parents: 4834
diff changeset
3264 Qunicode_registries = vector1(build_ascstring("iso10646-1"));
4690
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3265
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3266 /* Initialised in lisp/mule/general-late.el, by a call to
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3267 #'set-unicode-query-skip-chars-args. Or at least they would be, but we
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3268 can't do this at dump time right now, initialised range tables aren't
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3269 dumped properly. */
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3270 staticpro (&Vunicode_invalid_and_query_skip_chars);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3271 Vunicode_invalid_and_query_skip_chars = Qnil;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3272 staticpro (&Vutf_8_invalid_and_query_skip_chars);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3273 Vutf_8_invalid_and_query_skip_chars = Qnil;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3274 staticpro (&Vunicode_query_skip_chars);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3275 Vunicode_query_skip_chars = Qnil;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3276
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3277 /* If we could dump the range table above these wouldn't be necessary: */
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3278 staticpro (&Vunicode_query_string);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3279 Vunicode_query_string = Qnil;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3280 staticpro (&Vunicode_invalid_string);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3281 Vunicode_invalid_string = Qnil;
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3282 staticpro (&Vutf_8_invalid_string);
257b468bf2ca Move the #'query-coding-region implementation to C.
Aidan Kehoe <kehoea@parhasard.net>
parents: 4688
diff changeset
3283 Vutf_8_invalid_string = Qnil;
771
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3284 #endif /* MULE */
943eaba38521 [xemacs-hg @ 2002-03-13 08:51:24 by ben]
ben
parents:
diff changeset
3285 }
4834
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3286
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3287 void
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3288 complex_vars_of_unicode (void)
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3289 {
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3290 /* We used to define this in unicode.el. But we need it early for
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3291 Cygwin 1.7 -- used in LOCAL_FILE_FORMAT_TO_TSTR() et al. */
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3292 Fmake_coding_system_internal
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3293 (Qutf_8, Qunicode,
4952
19a72041c5ed Mule-izing, various fixes related to char * arguments
Ben Wing <ben@xemacs.org>
parents: 4834
diff changeset
3294 build_defer_string ("UTF-8"),
5345
db326b8fe982 Use Ben's recently-introduced listu (), where appropriate.
Aidan Kehoe <kehoea@parhasard.net>
parents: 5307
diff changeset
3295 listu (Qdocumentation,
db326b8fe982 Use Ben's recently-introduced listu (), where appropriate.
Aidan Kehoe <kehoea@parhasard.net>
parents: 5307
diff changeset
3296 build_defer_string (
4834
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3297 "UTF-8 Unicode encoding -- ASCII-compatible 8-bit variable-width encoding\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3298 "sharing the following principles with the Mule-internal encoding:\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3299 "\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3300 " -- All ASCII characters (codepoints 0 through 127) are represented\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3301 " by themselves (i.e. using one byte, with the same value as the\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3302 " ASCII codepoint), and these bytes are disjoint from bytes\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3303 " representing non-ASCII characters.\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3304 "\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3305 " This means that any 8-bit clean application can safely process\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3306 " UTF-8-encoded text as it were ASCII, with no corruption (e.g. a\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3307 " '/' byte is always a slash character, never the second byte of\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3308 " some other character, as with Big5, so a pathname encoded in\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3309 " UTF-8 can safely be split up into components and reassembled\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3310 " again using standard ASCII processes).\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3311 "\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3312 " -- Leading bytes and non-leading bytes in the encoding of a\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3313 " character are disjoint, so moving backwards is easy.\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3314 "\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3315 " -- Given only the leading byte, you know how many following bytes\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3316 " are present.\n"
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3317 ),
5345
db326b8fe982 Use Ben's recently-introduced listu (), where appropriate.
Aidan Kehoe <kehoea@parhasard.net>
parents: 5307
diff changeset
3318 Qmnemonic, build_ascstring ("UTF8"),
db326b8fe982 Use Ben's recently-introduced listu (), where appropriate.
Aidan Kehoe <kehoea@parhasard.net>
parents: 5307
diff changeset
3319 Qunicode_type, Qutf_8,
db326b8fe982 Use Ben's recently-introduced listu (), where appropriate.
Aidan Kehoe <kehoea@parhasard.net>
parents: 5307
diff changeset
3320 Qunbound));
4834
b3ea9c582280 Use new cygwin_conv_path API with Cygwin 1.7 for converting names between Win32 and POSIX, UTF-8-aware, with attendant changes elsewhere
Ben Wing <ben@xemacs.org>
parents: 4824
diff changeset
3321 }