Mercurial > hg > xemacs-beta
diff man/lispref/mule.texi @ 775:7d972c3de90a
[xemacs-hg @ 2002-03-14 11:50:12 by stephent]
New 21.5 Info docs, misc. <87r8mn8j4v.fsf@tleeps18.sk.tsukuba.ac.jp>
author: stephent
date: Thu, 14 Mar 2002 11:50:17 +0000
parents: 576fb035e263
children: 37e56e920ac5
--- a/man/lispref/mule.texi	Thu Mar 14 03:54:28 2002 +0000
+++ b/man/lispref/mule.texi	Thu Mar 14 11:50:17 2002 +0000
@@ -1,6 +1,6 @@
 @c -*-texinfo-*-
 @c This is part of the XEmacs Lisp Reference Manual.
-@c Copyright (C) 1996 Ben Wing.
+@c Copyright (C) 1996 Ben Wing, 2001-2002 Free Software Foundation.
 @c See the file lispref.texi for copying conditions.
 @setfilename ../../info/internationalization.info
 @node MULE, Tips, Internationalization, top
@@ -23,6 +23,7 @@
 * Coding Systems:: Ways of representing a string of chars using integers.
 * CCL:: A special language for writing fast converters.
 * Category Tables:: Subdividing charsets into groups.
+* Unicode Support:: The universal coded character set.
 @end menu
 
 @node Internationalization Terminology, Charsets, , MULE
@@ -2009,7 +2010,7 @@
 
 This section is not yet written.
 
-@node Category Tables, , CCL, MULE
+@node Category Tables, Unicode Support, CCL, MULE
 @section Category Tables
 
 A category table is a type of char table used for keeping track of
@@ -2069,3 +2070,114 @@
 Valid values are @code{nil} or a bit vector of size 95.
 @end defun
 
+
+@c Added 2002-03-13 sjt
+@node Unicode Support, , Category Tables, MULE
+@section Unicode Support
+@cindex unicode
+@cindex utf-8
+@cindex utf-16
+@cindex ucs-2
+@cindex ucs-4
+@cindex bmp
+@cindex basic multilingual plane
+
+Unicode support was added by Ben Wing to XEmacs 21.5.6.
+
+@defun set-language-unicode-precedence-list list
+Set the language-specific precedence list used for Unicode decoding.
+This is a list of charsets, which are consulted in order for a translation
+matching a given Unicode character. If no matches are found, the charsets
+in the default precedence list (see
+@code{set-default-unicode-precedence-list}) are consulted, and then all
+remaining charsets, in some arbitrary order.
+
+The language-specific precedence list is meant to be set as part of the
+language environment initialization; the default precedence list is meant
+to be set by the user.
+@end defun
+
+@defun language-unicode-precedence-list
+Return the language-specific precedence list used for Unicode decoding.
+See @code{set-language-unicode-precedence-list} for more information.
+@end defun
+
+@defun set-default-unicode-precedence-list list
+Set the default precedence list used for Unicode decoding.
+This is meant to be set by the user. See
+@code{set-language-unicode-precedence-list} for more information.
+@end defun
+
+@defun default-unicode-precedence-list
+Return the default precedence list used for Unicode decoding.
+See @code{set-language-unicode-precedence-list} for more information.
+@end defun
+
+@defun set-unicode-conversion character code
+Add conversion information between Unicode codepoints and characters.
+@var{character} is one of the following:
+
+@itemize @bullet
+@item A character (in which case @var{code} must be a non-negative integer)
+@item A vector of characters (in which case @var{code} must be a vector of non-negative integers of the same length)
+@end itemize
+
+Values of @var{code} above 2^20 - 1 are allowed for the purpose of specifying
+private characters, but will cause errors when converted to UTF-16 or UTF-32.
+UCS-4 and UTF-8 can handle values to 2^31 - 1, but XEmacs Lisp integers top
+out at 2^30 - 1.
+@end defun
+
+@defun character-to-unicode character
+Convert @var{character} to a Unicode codepoint.
+When there is no international support (i.e. MULE is not defined),
+this function simply does @code{char-to-int}.
+@end defun
+
+@defun unicode-to-character code &optional charsets
+Convert Unicode codepoint @var{code} to a character.
+@var{code} should be a non-negative integer.
+If @var{charsets} is given, it should be a list of charsets, and only those
+charsets will be consulted, in the given order, for a translation.
+Otherwise, the default ordering of all charsets will be used (see
+@code{set-language-unicode-precedence-list}).
+
+When there is no international support (i.e. MULE is not defined),
+this function simply does @code{int-to-char} and ignores the
+@var{charsets} argument.
+@end defun
+
+@defun parse-unicode-translation-table filename charset &optional start end offset flags
+Parse Unicode translation data in @var{filename} for MULE @var{charset}.
+Data is text, with one translation per line: the charset
+codepoint followed by the Unicode codepoint. Numbers are decimal or hex
+(preceded by 0x). Comments are marked with a #. Charset codepoints
+for two-dimensional charsets should have the first octet stored in the
+high 8 bits of the hex number and the second in the low 8 bits.
+
+If @var{start} and @var{end} are given, only charset codepoints within
+the given range will be processed. If @var{offset} is given, that value
+will be added to all charset codepoints in the file to obtain the
+internal charset codepoint. @var{start} and @var{end} apply to the
+codepoints in the file, before @var{offset} is applied.
+
+(Note that, as usual, we assume that octets are in the range 32 to
+127 or 33 to 126. If you have a table in kuten form, with octets in
+the range 1 to 94, you will have to use an offset of 8224,
+i.e. 0x2020.)
+
+@var{flags}, if specified, further control how the tables are interpreted
+and are used to special-case certain known table weirdnesses in the
+Unicode tables:
+
+@table @code
+@item ignore-first-column
+Skip the first column. The JIS X 0208 tables have 3 columns of data instead
+of 2; the first is the Shift-JIS codepoint.
+
+@item big5
+The charset codepoint is a Big Five codepoint; convert it to the
+proper hacked-up codepoint in @code{chinese-big5-1} or @code{chinese-big5-2}.
+@end table
+@end defun
+
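The decoding order described for set-language-unicode-precedence-list has three stages. The language-specific list is tried first, then the default list, then every remaining charset. It can be modeled in a few lines.

The Python below is a sketch under stated assumptions and is not XEmacs's implementation. Each charset is represented as a hypothetical dict from Unicode codepoint to charset codepoint. The remaining charsets are visited in sorted name order, which stands in for the unspecified "arbitrary order".

```python
# Sketch (not XEmacs source): a model of the Unicode decoding precedence
# described for set-language-unicode-precedence-list.
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: charset name -> {Unicode codepoint: charset codepoint}.
Tables = Dict[str, Dict[int, int]]

# Illustrative data: U+4E00 exists in both JIS X 0208 and GB 2312.
EXAMPLE_TABLES: Tables = {
    "japanese-jisx0208": {0x4E00: 0x306C},
    "chinese-gb2312": {0x4E00: 0x523B},
}


def lookup_order(language: List[str], default: List[str],
                 tables: Tables) -> List[str]:
    """Language list, then default list, then every remaining charset.

    Sorting stands in for the arbitrary order of the remainder, unknown
    names are skipped, and a charset named twice is consulted only once.
    """
    order: List[str] = []
    for name in language + default + sorted(tables):
        if name in tables and name not in order:
            order.append(name)
    return order


def decode_unicode(code: int, language: List[str], default: List[str],
                   tables: Tables = EXAMPLE_TABLES) -> Optional[Tuple[str, int]]:
    """Return (charset, charset codepoint) from the first charset that maps code."""
    for name in lookup_order(language, default, tables):
        if code in tables[name]:
            return name, tables[name][code]
    return None  # no charset has a translation
```

With these illustrative tables, U+4E00 resolves to japanese-jisx0208 when that charset leads the language list. It resolves to chinese-gb2312 when only the default list names a charset. The explicit charsets argument of unicode-to-character would correspond to iterating over that list alone.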
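The line format accepted by parse-unicode-translation-table is simple enough to sketch. The Python below is a hypothetical reimplementation for illustration only, with these assumptions:

- It reads lines from memory instead of a file.
- It treats start and end as an inclusive range. The text does not say whether the bounds are inclusive.
- It supports only the ignore-first-column flag and leaves out the big5 conversion.

```python
# Hypothetical sketch of the translation-table format described for
# parse-unicode-translation-table; not the XEmacs implementation.
from typing import Dict, Iterable, Optional


def parse_int(token: str) -> int:
    """Numbers are decimal, or hex when preceded by 0x."""
    return int(token, 16) if token.lower().startswith("0x") else int(token, 10)


def parse_translation_table(lines: Iterable[str],
                            start: Optional[int] = None,
                            end: Optional[int] = None,
                            offset: int = 0,
                            ignore_first_column: bool = False) -> Dict[int, int]:
    """Return {internal charset codepoint: Unicode codepoint}."""
    result: Dict[int, int] = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if ignore_first_column:
            fields = fields[1:]                 # e.g. the Shift-JIS column
        if len(fields) < 2:
            continue                            # blank or comment-only line
        charset_code = parse_int(fields[0])
        unicode_code = parse_int(fields[1])
        # start/end filter the codepoints as written in the file,
        # before the offset is applied (assumed inclusive here).
        if start is not None and charset_code < start:
            continue
        if end is not None and charset_code > end:
            continue
        result[charset_code + offset] = unicode_code
    return result
```

For the kuten case in the note, the line `0x0101 0x3000` parsed with offset=0x2020 (8224) yields internal codepoint 0x2121. That is the JIS X 0208 position of U+3000 IDEOGRAPHIC SPACE.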
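The note under set-unicode-conversion says UTF-8 can carry values up to 2^31 - 1. That refers to the original definition of UTF-8 (RFC 2279), which allowed sequences of up to six bytes. RFC 3629 later limited UTF-8 to U+10FFFF, so modern codecs such as Python's str.encode reject larger values.

Below is a minimal sketch of the original scheme, written for illustration rather than taken from XEmacs.

```python
# Sketch of the original (RFC 2279) UTF-8 scheme, which encodes values
# up to 2**31 - 1 in at most six bytes.

def utf8_encode_31bit(code: int) -> bytes:
    if not 0 <= code <= 0x7FFFFFFF:
        raise ValueError("value does not fit in 31 bits")
    if code < 0x80:
        return bytes([code])  # ASCII range: a single byte
    # (exclusive upper bound, total byte count, lead-byte marker)
    for limit, length, marker in ((0x800, 2, 0xC0), (0x10000, 3, 0xE0),
                                  (0x200000, 4, 0xF0), (0x4000000, 5, 0xF8),
                                  (0x80000000, 6, 0xFC)):
        if code < limit:
            break
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (code & 0x3F))  # each continuation byte carries 6 bits
        code >>= 6
    # The lead byte holds the remaining high bits; continuation bytes follow
    # in most-significant-first order.
    return bytes([marker | code] + tail[::-1])
```

For values up to U+10FFFF outside the surrogate range, the output matches Python's built-in UTF-8 codec. 0x7FFFFFFF encodes as the six bytes FD BF BF BF BF BF. As the text notes, XEmacs Lisp integers stop at 2^30 - 1, so the top of this range is not reachable from Lisp.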