xemacs-beta: etc/unicode/README comparison

comparison etc/unicode/README @ 771:943eaba38521

[xemacs-hg @ 2002-03-13 08:51:24 by ben] The big ben-mule-21-5 check-in! Various files were added and deleted. See CHANGES-ben-mule. There are still some test suite failures. No crashes, though. Many of the failures have to do with problems in the test suite itself rather than in the actual code. I'll be addressing these in the next day or so -- none of the test suite failures are at all critical. Meanwhile I'll be trying to address the biggest issues -- i.e. build or run failures, which will almost certainly happen on various platforms. All comments should be sent to ben@xemacs.org -- use a Cc: if necessary when sending to mailing lists. There will be pre- and post- tags, something like pre-ben-mule-21-5-merge-in, and post-ben-mule-21-5-merge-in.

author	ben
date	Wed, 13 Mar 2002 08:54:06 +0000
parents
children	a29c4eef8f00


-:336a418893b5
+:943eaba38521
+This directory contains Unicode translation tables for most of the
+charsets in XEmacs.
+The tables in unicode-consortium/ come from:
+http://www.unicode.org/Public/MAPPINGS/
+The tables in ibm/ come from:
+http://oss.software.ibm.com/icu/charset/
+Someone needs to write a simple program to parse these tables.  You
+should use the tables in unicode-consortium/; the ones in ibm/ can be
+used to supplement or check the accuracy of the others.
+Perhaps the best way is to put some C code in XEmacs, probably in the
+form of a Lisp primitive, to parse a table in a specified file and add
+the appropriate Unicode mappings using set_unicode_conversion.  Then
+it will be easy to read the tables at dump time.  Doing it this way
+avoids the need to create large Elisp files solely to initialize the
+tables, or embed a bunch of initializing data in the C code.
+I'd suggest this:
+DEFUN ("parse-unicode-translation-table", ..., 2, 5, 0 /*
+Parse Unicode translation data in FILENAME for CHARSET.
+Data is text, in the form of one translation per line -- charset codepoint
+followed by Unicode codepoint.  Numbers are decimal or hex (preceded by 0x).
+Comments are marked with a #.
+If START and END are given, only charset codepoints within the given range
+will be processed.  If OFFSET is given, that value will be added to all
+charset codepoints in the file to obtain the internal charset codepoint.
+(#### This still doesn't handle Big5 tables.  Either we need to special-case
+this or allow a CCL program or Lisp routine to do the conversion.)
+*/
+(filename, charset, start, end, offset))
+{
+}
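
The line format described in the docstring above can be sketched in standalone C as follows. This is only an illustration under stated assumptions, not XEmacs code: the names parse_number and parse_translation_line, the have_range flag, and the return-code convention are all hypothetical, and the sketch assumes START and END are compared against the codepoint as written in the file, before OFFSET is added. It does not call set_unicode_conversion and does not address the Big5 problem noted above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one number that is either decimal or hex
   with a 0x prefix.  Returns 1 on success and advances *pp past it. */
static int
parse_number (const char **pp, long *out)
{
  const char *p = *pp;
  char *endp;
  int base;

  while (*p == ' ' || *p == '\t')
    p++;
  base = (p[0] == '0' && (p[1] == 'x' || p[1] == 'X')) ? 16 : 10;
  *out = strtol (p, &endp, base);
  if (endp == p)
    return 0;			/* no digits found */
  *pp = endp;
  return 1;
}

/* Hypothetical sketch: parse one table line.
   Returns 1 if a mapping was stored in *charset_cp and *unicode_cp,
   0 if the line is blank, comment-only, or outside [start, end],
   -1 if the line does not contain two numbers.
   Assumption: the range test uses the file's codepoint, and OFFSET is
   added afterwards to obtain the internal charset codepoint. */
static int
parse_translation_line (const char *line, int have_range, long start,
			long end, long offset, long *charset_cp,
			long *unicode_cp)
{
  char buf[256];
  char *hash;
  const char *p;
  long code, ucs;

  assert (line != NULL);
  strncpy (buf, line, sizeof (buf) - 1);
  buf[sizeof (buf) - 1] = '\0';

  hash = strchr (buf, '#');	/* comments run from # to end of line */
  if (hash)
    *hash = '\0';

  p = buf;
  while (isspace ((unsigned char) *p))
    p++;
  if (*p == '\0')
    return 0;			/* nothing left after stripping */

  if (!parse_number (&p, &code) || !parse_number (&p, &ucs))
    return -1;
  if (have_range && (code < start || code > end))
    return 0;

  *charset_cp = code + offset;
  *unicode_cp = ucs;
  return 1;
}
```

A real primitive would call something like this once per line read from FILENAME and pass each accepted pair to the conversion-table setter. Lines that list a charset codepoint with no Unicode value are reported as malformed (-1) here; whether to skip or reject them is left open.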
