comparison man/lispref/mule.texi @ 775:7d972c3de90a
[xemacs-hg @ 2002-03-14 11:50:12 by stephent]
New 21.5 Info docs, misc. <87r8mn8j4v.fsf@tleeps18.sk.tsukuba.ac.jp>
author | stephent |
---|---|
date | Thu, 14 Mar 2002 11:50:17 +0000 |
parents | 576fb035e263 |
children | 37e56e920ac5 |
774:703228f54913 | 775:7d972c3de90a |
---|---|
1 @c -*-texinfo-*- | 1 @c -*-texinfo-*- |
2 @c This is part of the XEmacs Lisp Reference Manual. | 2 @c This is part of the XEmacs Lisp Reference Manual. |
3 @c Copyright (C) 1996 Ben Wing. | 3 @c Copyright (C) 1996 Ben Wing, 2001-2002 Free Software Foundation. |
4 @c See the file lispref.texi for copying conditions. | 4 @c See the file lispref.texi for copying conditions. |
5 @setfilename ../../info/internationalization.info | 5 @setfilename ../../info/internationalization.info |
6 @node MULE, Tips, Internationalization, top | 6 @node MULE, Tips, Internationalization, top |
7 @chapter MULE | 7 @chapter MULE |
8 | 8 |
21 * MULE Characters:: Working with characters in XEmacs/MULE. | 21 * MULE Characters:: Working with characters in XEmacs/MULE. |
22 * Composite Characters:: Making new characters by overstriking other ones. | 22 * Composite Characters:: Making new characters by overstriking other ones. |
23 * Coding Systems:: Ways of representing a string of chars using integers. | 23 * Coding Systems:: Ways of representing a string of chars using integers. |
24 * CCL:: A special language for writing fast converters. | 24 * CCL:: A special language for writing fast converters. |
25 * Category Tables:: Subdividing charsets into groups. | 25 * Category Tables:: Subdividing charsets into groups. |
26 * Unicode Support:: The universal coded character set. | |
26 @end menu | 27 @end menu |
27 | 28 |
28 @node Internationalization Terminology, Charsets, , MULE | 29 @node Internationalization Terminology, Charsets, , MULE |
29 @section Internationalization Terminology | 30 @section Internationalization Terminology |
30 | 31 |
2007 @comment Node, Next, Previous, Up | 2008 @comment Node, Next, Previous, Up |
2008 @subsection CCL Examples | 2009 @subsection CCL Examples |
2009 | 2010 |
2010 This section is not yet written. | 2011 This section is not yet written. |
2011 | 2012 |
2012 @node Category Tables, , CCL, MULE | 2013 @node Category Tables, Unicode Support, CCL, MULE |
2013 @section Category Tables | 2014 @section Category Tables |
2014 | 2015 |
2015 A category table is a type of char table used for keeping track of | 2016 A category table is a type of char table used for keeping track of |
2016 categories. Categories are used for classifying characters for use in | 2017 categories. Categories are used for classifying characters for use in |
2017 regexps---you can refer to a category rather than having to use a | 2018 regexps---you can refer to a category rather than having to use a |
2067 @defun category-table-value-p object | 2068 @defun category-table-value-p object |
2068 This function returns @code{t} if @var{object} is a category table value. | 2069 This function returns @code{t} if @var{object} is a category table value. |
2069 Valid values are @code{nil} or a bit vector of size 95. | 2070 Valid values are @code{nil} or a bit vector of size 95. |
2070 @end defun | 2071 @end defun |
2071 | 2072 |
2073 | |
2074 @c Added 2002-03-13 sjt | |
2075 @node Unicode Support, , Category Tables, MULE | |
2076 @section Unicode Support | |
2077 @cindex unicode | |
2078 @cindex utf-8 | |
2079 @cindex utf-16 | |
2080 @cindex ucs-2 | |
2081 @cindex ucs-4 | |
2082 @cindex bmp | |
2083 @cindex basic multilingual plane | |
2084 | |
2085 Unicode support was added by Ben Wing to XEmacs 21.5.6. | |
2086 | |
2087 @defun set-language-unicode-precedence-list list | |
2088 Set the language-specific precedence list used for Unicode decoding. | |
2089 This is a list of charsets, which are consulted in order for a translation | |
2090 matching a given Unicode character. If no matches are found, the charsets | |
2091 in the default precedence list (see | |
2092 @code{set-default-unicode-precedence-list}) are consulted, and then all | |
2093 remaining charsets, in some arbitrary order. | |
2094 | |
2095 The language-specific precedence list is meant to be set as part of the | |
2096 language environment initialization; the default precedence list is meant | |
2097 to be set by the user. | |
2098 @end defun | |
2099 | |
2100 @defun language-unicode-precedence-list | |
2101 Return the language-specific precedence list used for Unicode decoding. | |
2102 See @code{set-language-unicode-precedence-list} for more information. | |
2103 @end defun | |
2104 | |
2105 @defun set-default-unicode-precedence-list list | |
2106 Set the default precedence list used for Unicode decoding. | |
2107 This is meant to be set by the user. See | |
2108 @code{set-language-unicode-precedence-list} for more information. | |
2109 @end defun | |
2110 | |
2111 @defun default-unicode-precedence-list | |
2112 Return the default precedence list used for Unicode decoding. | |
2113 See @code{set-language-unicode-precedence-list} for more information. | |
2114 @end defun | |
2115 | |
2116 @defun set-unicode-conversion character code | |
2117 Add conversion information between Unicode codepoints and characters. | |
2118 @var{character} is one of the following: | |
2119 | |
2120 @c #### fix this markup | |
2121 -- A character (in which case @var{code} must be a non-negative integer) | |
2122 -- A vector of characters (in which case @var{code} must be a vector of | |
2123 non-negative integers of the same length) | |
2124 | |
2125 Values of @var{code} above 2^20 - 1 are allowed for the purpose of specifying | |
2126 private characters, but will cause errors when converted to UTF-16 or UTF-32. | |
2127 UCS-4 and UTF-8 can handle values to 2^31 - 1, but XEmacs Lisp integers top | |
2128 out at 2^30 - 1. | |
2129 @end defun | |
2130 | |
2131 @defun character-to-unicode character | |
2132 Convert @var{character} to Unicode codepoint. | |
2133 When there is no international support (i.e. MULE is not defined), | |
2134 this function simply does @code{char-to-int}. | |
2135 @end defun | |
2136 | |
2137 @defun unicode-to-character code [charsets] | |
2138 Convert Unicode codepoint @var{code} to character. | |
2139 @var{code} should be a non-negative integer. | |
2140 If @var{charsets} is given, it should be a list of charsets, and only those | |
2141 charsets will be consulted, in the given order, for a translation. | |
2142 Otherwise, the default ordering of all charsets will be used (see | |
2143 @code{set-unicode-charset-precedence}). | |
2144 | |
2145 When there is no international support (i.e. MULE is not defined), | |
2146 this function simply does @code{int-to-char} and ignores the | |
2147 @var{charsets} argument. | |
2148 @end defun | |
2149 | |
2150 @defun parse-unicode-translation-table filename charset start end offset flags | |
2151 Parse Unicode translation data in @var{filename} for MULE @var{charset}. | |
2152 Data is text, in the form of one translation per line -- charset | |
2153 codepoint followed by Unicode codepoint. Numbers are decimal or hex | |
2154 (preceded by 0x). Comments are marked with a #. Charset codepoints | |
2155 for two-dimensional charsets should have the first octet stored in the | |
2156 high 8 bits of the hex number and the second in the low 8 bits. | |
2157 | |
2158 If @var{start} and @var{end} are given, only charset codepoints within | |
2159 the given range will be processed. If @var{offset} is given, that value | |
2160 will be added to all charset codepoints in the file to obtain the | |
2161 internal charset codepoint. @var{start} and @var{end} apply to the | |
2162 codepoints in the file, before @var{offset} is applied. | |
2163 | |
2164 (Note that, as usual, we assume that octets are in the range 32 to | |
2165 127 or 33 to 126. If you have a table in kuten form, with octets in | |
2166 the range 1 to 94, you will have to use an offset of 8224, | |
2167 i.e. 0x2020.) | |
2168 | |
2169 @var{flags}, if specified, control further how the tables are interpreted | |
2170 and are used to special-case certain known table weirdnesses in the | |
2171 Unicode tables: | |
2172 | |
2173 @table @code | |
2174 @item ignore-first-column | |
2175 Exactly as it sounds. The JIS X 0208 tables have 3 columns of data instead | |
2176 of 2; the first is the Shift-JIS codepoint. | |
2177 | |
2178 @item big5 | |
2179 The charset codepoint is a Big Five codepoint; convert it to the | |
2180 proper hacked-up codepoint in @code{chinese-big5-1} or @code{chinese-big5-2}. | |
2181 @end table | |
2182 @end defun | |
2183 |
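
The table format described for `parse-unicode-translation-table` (one translation per line, `#` comments, decimal or `0x` hex numbers, optional `start`/`end`/`offset`, and the `ignore-first-column` flag) can be illustrated outside XEmacs. The following is a minimal Python sketch of that format, not the XEmacs implementation: the function names, the inclusive treatment of `start` and `end`, and the sample lines are all assumptions made for illustration, and the `big5` flag is not modeled.

```python
def parse_number(token):
    """Parse a decimal number or a 0x-prefixed hex number."""
    if token.lower().startswith("0x"):
        return int(token, 16)
    return int(token, 10)


def parse_translation_lines(lines, start=None, end=None, offset=0,
                            ignore_first_column=False):
    """Return (charset codepoint, Unicode codepoint) pairs.

    Hypothetical helper mirroring the documented file format. The range
    check is assumed to be inclusive at both ends.
    """
    pairs = []
    for raw in lines:
        # Everything from '#' to the end of the line is a comment.
        text = raw.split("#", 1)[0].strip()
        if not text:
            continue
        fields = text.split()
        if ignore_first_column:
            # E.g. a three-column table whose first column is Shift-JIS.
            fields = fields[1:]
        if len(fields) < 2:
            continue
        charset_cp = parse_number(fields[0])
        unicode_cp = parse_number(fields[1])
        # start/end apply to the value in the file, before offset is added.
        if start is not None and charset_cp < start:
            continue
        if end is not None and charset_cp > end:
            continue
        pairs.append((charset_cp + offset, unicode_cp))
    return pairs


# Sample data for illustration only: a kuten-form table (octets 1 to 94).
sample = [
    "# kuten-form sample",
    "0x0101 0x3000  # row 1, cell 1",
    "0x0102 0x3001",
    "",
    "0x1001 0x4E9C",
]

# Adding 0x2020 (decimal 8224) shifts each octet by 32, into 33 to 126.
for charset_cp, unicode_cp in parse_translation_lines(sample, offset=0x2020):
    print(hex(charset_cp), hex(unicode_cp))
```

Because the offset adds 0x20 to each of the two octets, a kuten pair such as (1, 1) becomes 0x2121, which is the form the documentation says is assumed by default.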