comparison etc/unicode/README @ 779:a29c4eef8f00
[xemacs-hg @ 2002-03-18 09:40:27 by ben]
add more translation tables [from mule-ucs], fixup README's
| author | ben |
|---|---|
| date | Mon, 18 Mar 2002 09:40:41 +0000 |
| parents | 943eaba38521 |
| children |
comparison
| 778:2923009caf47 | 779:a29c4eef8f00 |
|---|---|
| 7 | 7 |
| 8 The tables in ibm/ come from: | 8 The tables in ibm/ come from: |
| 9 | 9 |
| 10 http://oss.software.ibm.com/icu/charset/ | 10 http://oss.software.ibm.com/icu/charset/ |
| 11 | 11 |
| 12 Someone needs to write a simple program to parse these tables. You | 12 The tables in unicode-consortium/ should be used as source data; the ones |
| 13 should use the tables in unicode-consortium/; the ones in ibm/ can be | 13 in ibm/ can be used to supplement or check the accuracy of the others. |
| 14 used to supplement or check the accuracy of the others. | |
| 15 | |
| 16 Perhaps the best way is to put some C code in XEmacs, probably in the | |
| 17 form of a Lisp primitive, to parse a table in a specified file and add | |
| 18 the appropriate Unicode mappings using set_unicode_conversion. Then | |
| 19 it will be easy to read the tables at dump time. Doing it this way | |
| 20 avoids the need to create large Elisp files solely to initialize the | |
| 21 tables, or embed a bunch of initializing data in the C code. | |
| 22 | |
| 23 I'd suggest this: | |
| 24 | |
| 25 DEFUN ("parse-unicode-translation-table", ..., 2, 5, 0 /* | |
| 26 Parse Unicode translation data in FILENAME for CHARSET. | |
| 27 Data is text, in the form of one translation per line -- charset codepoint | |
| 28 followed by Unicode codepoint. Numbers are decimal or hex (preceded by 0x). | |
| 29 Comments are marked with a #. | |
| 30 | |
| 31 If START and END are given, only charset codepoints within the given range | |
| 32 will be processed. If OFFSET is given, that value will be added to all | |
| 33 charset codepoints in the file to obtain the internal charset codepoint. | |
| 34 | |
| 35 (#### This still doesn't handle Big5 tables. Either we need to special-case | |
| 36 this or allow a CCL program or Lisp routine to do the conversion.) | |
| 37 */ | |
| 38 (filename, charset, start, end, offset)) | |
| 39 { | |
| 40 | |
| 41 } | |
| 42 |
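The line format described in the removed docstring (charset codepoint followed by Unicode codepoint, decimal or 0x-prefixed hex, `#` comments, optional START/END range and OFFSET) can be illustrated with a small standalone sketch. This is not the XEmacs implementation: `mapping`, `parse_number`, and `parse_mapping_line` are hypothetical names, the range is assumed to be inclusive and tested against the codepoint as written in the file (before OFFSET is added), and the Big5 case flagged in the docstring is not handled.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: one charset-to-Unicode mapping. */
typedef struct {
    long charset_code;  /* charset codepoint with OFFSET already added */
    long unicode_code;  /* Unicode codepoint */
} mapping;

/* Read one non-negative number: hex if prefixed by 0x, otherwise decimal.
   Returns 1 on success and leaves *endp just past the number. */
static int parse_number(const char *s, char **endp, long *value)
{
    int base;

    while (*s == ' ' || *s == '\t')
        s++;
    base = (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) ? 16 : 10;
    errno = 0;
    *value = strtol(s, endp, base);
    return *endp != s && errno == 0 && *value >= 0;
}

/* Parse one table line. Returns 1 and fills *out when the line holds a
   mapping whose file codepoint lies in [start, end] (assumed inclusive);
   returns 0 for blank, comment-only, malformed, or out-of-range lines.
   Lines longer than 255 characters are truncated in this sketch. */
static int parse_mapping_line(const char *line, long start, long end,
                              long offset, mapping *out)
{
    char buf[256];
    char *hash, *p;
    long cs, uc;

    assert(line != NULL && out != NULL);

    /* Work on a copy so the comment can be cut off at the first '#'. */
    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    hash = strchr(buf, '#');
    if (hash != NULL)
        *hash = '\0';

    if (!parse_number(buf, &p, &cs) || !parse_number(p, &p, &uc))
        return 0;
    if (cs < start || cs > end)
        return 0;

    out->charset_code = cs + offset;
    out->unicode_code = uc;
    return 1;
}
```

In the design sketched by the README, each accepted mapping would presumably be handed to set_unicode_conversion; that call is left out here because its signature is not given in the text.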
