This directory contains Unicode translation tables for most of the
charsets in XEmacs.

The tables in unicode-consortium/ come from:

http://www.unicode.org/Public/MAPPINGS/

The tables in ibm/ come from:

http://oss.software.ibm.com/icu/charset/

Someone needs to write a simple program to parse these tables.  You
should use the tables in unicode-consortium/; the ones in ibm/ can be
used to supplement or check the accuracy of the others.

Perhaps the best way is to put some C code in XEmacs, probably in the
form of a Lisp primitive, to parse a table in a specified file and add
the appropriate Unicode mappings using set_unicode_conversion.  Then
it will be easy to read the tables at dump time.  Doing it this way
avoids the need to create large Elisp files solely to initialize the
tables, or to embed a bunch of initialization data in the C code.

I'd suggest this:

DEFUN ("parse-unicode-translation-table", ..., 2, 5, 0 /*
Parse Unicode translation data in FILENAME for CHARSET.
Data is text, in the form of one translation per line -- charset codepoint
followed by Unicode codepoint.  Numbers are decimal or hex (preceded by 0x).
Comments are marked with a #.

If START and END are given, only charset codepoints within the given range
will be processed.  If OFFSET is given, that value will be added to all
charset codepoints in the file to obtain the internal charset codepoint.

(#### This still doesn't handle Big5 tables.  Either we need to special-case
this or allow a CCL program or Lisp routine to do the conversion.)
*/
       (filename, charset, start, end, offset))
{

}
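
The parsing described in the docstring above could look roughly like
the following standalone C sketch.  It is not XEmacs code and makes
several assumptions: the names parse_number, parse_table_line,
parse_table, mapping_fn and print_mapping are all hypothetical; the
START..END range is taken to be inclusive and to apply to the
codepoints as written in the file, before OFFSET is added; and lines
that give a charset codepoint but no Unicode codepoint are simply
skipped.  In a real primitive the callback is where
set_unicode_conversion would presumably be called; its signature is
not given here, so the sketch does not guess at it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback: receives one (internal charset codepoint,
   Unicode codepoint) pair for each mapping that is accepted. */
typedef void (*mapping_fn) (long charset_code, long unicode_code, void *data);

/* Read one number at *PP: hex if preceded by 0x, otherwise decimal.
   Returns 1 and advances *PP past the number, or 0 if none is there. */
int
parse_number (const char **pp, long *out)
{
  const char *p = *pp;
  char *end;
  int base;

  while (*p == ' ' || *p == '\t')
    p++;
  base = (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) ? 16 : 10;
  *out = strtol (p, &end, base);
  if (end == p)
    return 0;
  *pp = end;
  return 1;
}

/* Parse one line of a table.  Returns 1 and fills *CODE and *UCS if the
   line holds a mapping; returns 0 for blank, comment-only, or
   incomplete lines. */
int
parse_table_line (const char *line, long *code, long *ucs)
{
  char buf[256];
  char *hash;
  const char *p = buf;

  assert (line != NULL && code != NULL && ucs != NULL);
  /* Work on a copy so the line can be cut at the first '#'. */
  strncpy (buf, line, sizeof (buf) - 1);
  buf[sizeof (buf) - 1] = '\0';
  hash = strchr (buf, '#');
  if (hash != NULL)
    *hash = '\0';
  return parse_number (&p, code) && parse_number (&p, ucs);
}

/* Read every line from IN and call FN for each mapping whose file
   codepoint lies in START..END (assumed inclusive).  OFFSET is added to
   the file codepoint before FN sees it.  Returns the number of mappings
   passed to FN. */
int
parse_table (FILE *in, long start, long end, long offset,
             mapping_fn fn, void *data)
{
  char line[256];
  long code, ucs;
  int count = 0;

  while (fgets (line, sizeof (line), in) != NULL)
    {
      /* Discard the tail of an over-long line so that it is not
         mistaken for a new line on the next iteration. */
      if (strchr (line, '\n') == NULL)
        {
          int c;
          while ((c = getc (in)) != EOF && c != '\n')
            ;
        }
      if (!parse_table_line (line, &code, &ucs))
        continue;
      if (code < start || code > end)
        continue;
      fn (code + offset, ucs, data);
      count++;
    }
  return count;
}

/* Example callback that just prints each mapping. */
void
print_mapping (long charset_code, long unicode_code, void *data)
{
  (void) data;
  printf ("0x%lX -> U+%04lX\n",
          (unsigned long) charset_code, (unsigned long) unicode_code);
}
```

A driver would open one of the files in unicode-consortium/ with
fopen and call, for example, parse_table (fp, 0x80, 0xFF, 0,
print_mapping, NULL) to see only the upper half of an 8-bit table.
Passing a callback keeps the parsing separate from whatever is
eventually done with each mapping, which may also leave room for the
Big5 special case mentioned in the docstring.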