comparison etc/unicode/README @ 779:a29c4eef8f00

[xemacs-hg @ 2002-03-18 09:40:27 by ben] add more translation tables [from mule-ucs], fixup README's
author ben
date Mon, 18 Mar 2002 09:40:41 +0000
parents 943eaba38521
comparing 778:2923009caf47 with 779:a29c4eef8f00
 
 The tables in ibm/ come from:
 
 http://oss.software.ibm.com/icu/charset/
 
-Someone needs to write a simple program to parse these tables. You
-should use the tables in unicode-consortium/; the ones in ibm/ can be
-used to supplement or check the accuracy of the others.
+The tables in unicode-consortium/ should be used as source data; the ones
+in ibm/ can be used to supplement or check the accuracy of the others.
-
-Perhaps the best way is to put some C code in XEmacs, probably in the
-form of a Lisp primitive, to parse a table in a specified file and add
-the appropriate Unicode mappings using set_unicode_conversion. Then
-it will be easy to read the tables at dump time. Doing it this way
-avoids the need to create large Elisp files solely to initialize the
-tables, or embed a bunch of initializing data in the C code.
-
-I'd suggest this:
-
-DEFUN ("parse-unicode-translation-table", ..., 2, 5, 0 /*
-Parse Unicode translation data in FILENAME for CHARSET.
-Data is text, in the form of one translation per line -- charset codepoint
-followed by Unicode codepoint. Numbers are decimal or hex (preceded by 0x).
-Comments are marked with a #.
-
-If START and END are given, only charset codepoints within the given range
-will be processed. If OFFSET is given, that value will be added to all
-charset codepoints in the file to obtain the internal charset codepoint.
-
-(#### This still doesn't handle Big5 tables. Either we need to special-case
-this or allow a CCL program or Lisp routine to do the conversion.)
-*/
-       (filename, charset, start, end, offset))
-{
-
-}
-
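
The format spelled out in the docstring of the removed suggestion above (one
mapping per line, charset codepoint followed by Unicode codepoint, decimal or
0x-prefixed hex, # comments, optional START/END range and OFFSET) is simple
enough that the parsing step can be illustrated on its own. The following C
sketch parses that format under those assumptions only; the function name
parse_unicode_table, the default file name, and the printf output are invented
for the illustration. Inside XEmacs the loop would instead hand each pair to
something like set_unicode_conversion, as the removed text suggests.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only: parse FILENAME in the format described above.  Charset
   codepoints outside [start, end] are skipped; offset is added to each
   accepted charset codepoint.  Returns the number of mappings seen, or
   -1 if the file cannot be opened. */
static long
parse_unicode_table (const char *filename, long start, long end, long offset)
{
  FILE *fp = fopen (filename, "r");
  char line[512];
  long count = 0;

  if (!fp)
    return -1;

  while (fgets (line, sizeof line, fp))
    {
      char *hash = strchr (line, '#');
      char *p = line, *endp;
      long cp, ucs;

      if (hash)
        *hash = '\0';           /* '#' starts a comment */

      /* strtol with base 0 accepts decimal and 0x-prefixed hex alike */
      cp = strtol (p, &endp, 0);
      if (endp == p)
        continue;               /* blank or comment-only line */
      p = endp;
      ucs = strtol (p, &endp, 0);
      if (endp == p)
        continue;               /* no second number on the line */

      if (cp < start || cp > end)
        continue;               /* outside the requested range */

      /* In XEmacs this is where the mapping would be recorded, roughly:
         set_unicode_conversion (<character for cp + offset in CHARSET>, ucs); */
      printf ("charset 0x%lX -> U+%04lX\n",
              (unsigned long) (cp + offset), (unsigned long) ucs);
      count++;
    }

  fclose (fp);
  return count;
}

int
main (int argc, char **argv)
{
  /* Illustrative driver: table-file [start end [offset]]; the defaults
     (full range, no offset) are assumptions, not XEmacs behavior. */
  const char *file = argc > 1 ? argv[1] : "unicode-consortium/8859-2.TXT";
  long start  = argc > 3 ? strtol (argv[2], NULL, 0) : 0;
  long end    = argc > 3 ? strtol (argv[3], NULL, 0) : 0x7FFFFFFF;
  long offset = argc > 4 ? strtol (argv[4], NULL, 0) : 0;
  long n = parse_unicode_table (file, start, end, offset);

  if (n < 0)
    {
      fprintf (stderr, "cannot open %s\n", file);
      return 1;
    }
  fprintf (stderr, "%ld mappings\n", n);
  return 0;
}

If a primitive built around a loop like this were called once per table file
at dump time, as the removed paragraph proposes, the tables could be loaded
without generating large Elisp files or embedding initialization data in the
C code.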