comparison man/lispref/mule.texi @ 775:7d972c3de90a

[xemacs-hg @ 2002-03-14 11:50:12 by stephent] New 21.5 Info docs, misc. <87r8mn8j4v.fsf@tleeps18.sk.tsukuba.ac.jp>
author stephent
date Thu, 14 Mar 2002 11:50:17 +0000
parents 576fb035e263
children 37e56e920ac5
comparison
774:703228f54913 775:7d972c3de90a
1 @c -*-texinfo-*- 1 @c -*-texinfo-*-
2 @c This is part of the XEmacs Lisp Reference Manual. 2 @c This is part of the XEmacs Lisp Reference Manual.
3 @c Copyright (C) 1996 Ben Wing. 3 @c Copyright (C) 1996 Ben Wing, 2001-2002 Free Software Foundation.
4 @c See the file lispref.texi for copying conditions. 4 @c See the file lispref.texi for copying conditions.
5 @setfilename ../../info/internationalization.info 5 @setfilename ../../info/internationalization.info
6 @node MULE, Tips, Internationalization, top 6 @node MULE, Tips, Internationalization, top
7 @chapter MULE 7 @chapter MULE
8 8
21 * MULE Characters:: Working with characters in XEmacs/MULE. 21 * MULE Characters:: Working with characters in XEmacs/MULE.
22 * Composite Characters:: Making new characters by overstriking other ones. 22 * Composite Characters:: Making new characters by overstriking other ones.
23 * Coding Systems:: Ways of representing a string of chars using integers. 23 * Coding Systems:: Ways of representing a string of chars using integers.
24 * CCL:: A special language for writing fast converters. 24 * CCL:: A special language for writing fast converters.
25 * Category Tables:: Subdividing charsets into groups. 25 * Category Tables:: Subdividing charsets into groups.
26 * Unicode Support:: The universal coded character set.
26 @end menu 27 @end menu
27 28
28 @node Internationalization Terminology, Charsets, , MULE 29 @node Internationalization Terminology, Charsets, , MULE
29 @section Internationalization Terminology 30 @section Internationalization Terminology
30 31
2007 @comment Node, Next, Previous, Up 2008 @comment Node, Next, Previous, Up
2008 @subsection CCL Examples 2009 @subsection CCL Examples
2009 2010
2010 This section is not yet written. 2011 This section is not yet written.
2011 2012
2012 @node Category Tables, , CCL, MULE 2013 @node Category Tables, Unicode Support, CCL, MULE
2013 @section Category Tables 2014 @section Category Tables
2014 2015
2015 A category table is a type of char table used for keeping track of 2016 A category table is a type of char table used for keeping track of
2016 categories. Categories are used for classifying characters for use in 2017 categories. Categories are used for classifying characters for use in
2017 regexps---you can refer to a category rather than having to use a 2018 regexps---you can refer to a category rather than having to use a
2067 @defun category-table-value-p object 2068 @defun category-table-value-p object
2068 This function returns @code{t} if @var{object} is a category table value. 2069 This function returns @code{t} if @var{object} is a category table value.
2069 Valid values are @code{nil} or a bit vector of size 95. 2070 Valid values are @code{nil} or a bit vector of size 95.
2070 @end defun 2071 @end defun
2071 2072
2073
2074 @c Added 2002-03-13 sjt
2075 @node Unicode Support, , Category Tables, MULE
2076 @section Unicode Support
2077 @cindex unicode
2078 @cindex utf-8
2079 @cindex utf-16
2080 @cindex ucs-2
2081 @cindex ucs-4
2082 @cindex bmp
2083 @cindex basic multilingual plane
2084
2085 Unicode support was added by Ben Wing to XEmacs 21.5.6.
2086
2087 @defun set-language-unicode-precedence-list list
2088 Set the language-specific precedence list used for Unicode decoding.
2089 This is a list of charsets, which are consulted in order for a translation
2090 matching a given Unicode character. If no matches are found, the charsets
2091 in the default precedence list (see
2092 @code{set-default-unicode-precedence-list}) are consulted, and then all
2093 remaining charsets, in some arbitrary order.
2094
2095 The language-specific precedence list is meant to be set as part of the
2096 language environment initialization; the default precedence list is meant
2097 to be set by the user.
2098 @end defun
2099
2100 @defun language-unicode-precedence-list
2101 Return the language-specific precedence list used for Unicode decoding.
2102 See @code{set-language-unicode-precedence-list} for more information.
2103 @end defun
2104
2105 @defun set-default-unicode-precedence-list list
2106 Set the default precedence list used for Unicode decoding.
2107 This is meant to be set by the user. See
2108 @code{set-language-unicode-precedence-list} for more information.
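
The following is a minimal sketch of how a user might set the default
list from an init file.  The particular charsets chosen are only an
illustration, and it is an assumption here that charset objects obtained
with @code{get-charset} are what the list should contain.

@example
;; Prefer Latin-1, then Japanese, when decoding Unicode text.
;; (The choice of charsets is illustrative, not a recommendation.)
(set-default-unicode-precedence-list
 (list (get-charset 'latin-iso8859-1)
       (get-charset 'japanese-jisx0208)))

;; Inspect the list that is now in effect.
(default-unicode-precedence-list)
@end example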
2109 @end defun
2110
2111 @defun default-unicode-precedence-list
2112 Return the default precedence list used for Unicode decoding.
2113 See @code{set-language-unicode-precedence-list} for more information.
2114 @end defun
2115
2116 @defun set-unicode-conversion character code
2117 Add conversion information between Unicode codepoints and characters.
2118 @var{character} is one of the following:
2119
2120 @itemize @bullet
2121 @item A character (in which case @var{code} must be a non-negative integer)
2122 @item A vector of characters (in which case @var{code} must be a vector of
2123 non-negative integers of the same length)
2124 @end itemize
2125 Values of @var{code} above 2^20 - 1 are allowed for the purpose of specifying
2126 private characters, but will cause errors when converted to UTF-16 or UTF-32.
2127 UCS-4 and UTF-8 can handle values to 2^31 - 1, but XEmacs Lisp integers top
2128 out at 2^30 - 1.
2129 @end defun
2130
2131 @defun character-to-unicode character
2132 Convert @var{character} to Unicode codepoint.
2133 When there is no international support (i.e. MULE is not defined),
2134 this function simply does @code{char-to-int}.
2135 @end defun
2136
2137 @defun unicode-to-character code [charsets]
2138 Convert Unicode codepoint @var{code} to character.
2139 @var{code} should be a non-negative integer.
2140 If @var{charsets} is given, it should be a list of charsets, and only those
2141 charsets will be consulted, in the given order, for a translation.
2142 Otherwise, the default ordering of all charsets will be used (see
2143 @code{set-unicode-charset-precedence}).
2144
2145 When there is no international support (i.e. MULE is not defined),
2146 this function simply does @code{int-to-char} and ignores the
2147 @var{charsets} argument.
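
As a rough sketch, the two conversion functions might be called as
below.  The codepoint and charset are arbitrary examples, and no return
values are shown because they depend on the translation tables that
have been loaded.

@example
;; Look up HIRAGANA LETTER A (U+3042), consulting only JIS X 0208.
(unicode-to-character #x3042 (list (get-charset 'japanese-jisx0208)))

;; Convert a character to its Unicode codepoint.
(character-to-unicode ?a)
@end example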
2148 @end defun
2149
2150 @defun parse-unicode-translation-table filename charset start end offset flags
2151 Parse Unicode translation data in @var{filename} for MULE @var{charset}.
2152 Data is text, in the form of one translation per line -- charset
2153 codepoint followed by Unicode codepoint. Numbers are decimal or hex
2154 (preceded by 0x). Comments are marked with a #. Charset codepoints
2155 for two-dimensional charsets should have the first octet stored in the
2156 high 8 bits of the hex number and the second in the low 8 bits.
2157
2158 If @var{start} and @var{end} are given, only charset codepoints within
2159 the given range will be processed. If @var{offset} is given, that value
2160 will be added to all charset codepoints in the file to obtain the
2161 internal charset codepoint. @var{start} and @var{end} apply to the
2162 codepoints in the file, before @var{offset} is applied.
2163
2164 (Note that, as usual, we assume that octets are in the range 32 to
2165 127 or 33 to 126. If you have a table in kuten form, with octets in
2166 the range 1 to 94, you will have to use an offset of 8224,
2167 i.e. 0x2020.)
2168
2169 @var{flags}, if specified, control further how the tables are interpreted
2170 and are used to special-case certain known table weirdnesses in the
2171 Unicode tables:
2172
2173 @table @code
2174 @item ignore-first-column
2175 Ignore the first column of each line. The JIS X 0208 tables have 3 columns of data instead
2176 of 2; the first is the Shift-JIS codepoint.
2177
2178 @item big5
2179 The charset codepoint is a Big Five codepoint; convert it to the
2180 proper hacked-up codepoint in @code{chinese-big5-1} or @code{chinese-big5-2}.
2181 @end table
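
A hedged sketch of two calls follows.  The file names are hypothetical,
and it is assumed that unused optional arguments may be passed as
@code{nil} and that @var{flags} is given as a list of symbols.

@example
;; A three-column JIS X 0208 table: skip the Shift-JIS column.
;; (Hypothetical path; substitute the location of your table.)
(parse-unicode-translation-table
 "/path/to/JIS0208.TXT" (get-charset 'japanese-jisx0208)
 nil nil nil '(ignore-first-column))

;; A table in kuten form (octets 1 to 94): add #x2020 to every
;; charset codepoint read from the file.
(parse-unicode-translation-table
 "/path/to/kuten-table.txt" (get-charset 'japanese-jisx0208)
 nil nil #x2020)
@end example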
2182 @end defun
2183