comparison man/internals/internals.texi @ 3955:c1f9ac9f66de

[xemacs-hg @ 2007-05-15 10:25:12 by aidan] Eliminate a few problems in man/internals/internals.texi.
author aidan
date Tue, 15 May 2007 10:25:16 +0000
parents 1dac67fc67ae
children 8d2106500793
comparison
3954:42f1cd0fb81d 3955:c1f9ac9f66de
7524 @end example 7524 @end example
7525 7525
7526 converts to a char that represents the lowercase letter b. 7526 converts to a char that represents the lowercase letter b.
7527 7527
7528 @example 7528 @example
7529 ?^[$(B#&^[(B 7529 ?\u5357
7530 @end example 7530 @end example
7531 7531
7532 (where @samp{^[} actually is an @samp{ESC} character) converts to a 7532 converts to a Han character meaning ``south, southwards''; depending on
7533 particular Kanji character when using an ISO2022-based coding system for 7533 how your XEmacs is configured, it will be assigned to either a Japanese
7534 input. (To decode this goo: @samp{ESC} begins an escape sequence; 7534 or Chinese character set (possibly even a Korean one).
7535 @samp{ESC $ (} is a class of escape sequences meaning ``switch to a
7536 94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese
7537 Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array
7538 of characters [subtract 33 from the ASCII value of each character to get
7539 the corresponding index]; @samp{ESC (} is a class of escape sequences
7540 meaning ``switch to a 94 character set''; @samp{ESC (B} means ``switch
7541 to US ASCII''. It is a coincidence that the letter @samp{B} is used to
7542 denote both Japanese Kanji and US ASCII. If the first @samp{B} were
7543 replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character
7544 from the GB2312 character set.)
7545 7535
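As a rough illustration of the replacement text in the right-hand column (this is not part of the changeset): U+5357 is a single Unicode code point that several national character sets also contain, which is consistent with the statement that XEmacs may assign it to a Japanese, Chinese, or Korean character set depending on configuration. The sketch below uses Python's standard codecs as stand-ins for those character sets. The helper name `legacy_encodings` and the `CODECS` tuple are hypothetical, and none of this is XEmacs code.

```python
import unicodedata

SOUTH = "\u5357"  # the character written ?\u5357 in the example

# Python codec names used here as stand-ins for national character sets
# (JIS X 0208 via EUC-JP, GB 2312, KS X 1001 via EUC-KR). This choice is
# an assumption for illustration; XEmacs has its own charset objects.
CODECS = ("euc_jp", "gb2312", "euc_kr")


def legacy_encodings(char):
    """Map each codec that can represent `char` to its encoded bytes."""
    found = {}
    for codec in CODECS:
        try:
            found[codec] = char.encode(codec)
        except UnicodeEncodeError:
            pass  # this character set does not contain the character
    return found


if __name__ == "__main__":
    print(unicodedata.name(SOUTH))
    for codec, raw in legacy_encodings(SOUTH).items():
        print(codec, raw.hex())
```

Each stand-in codec gives the character its own two-byte code, so the same Unicode code point has a distinct identity in each national set.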
7546 @example 7536 @example
7547 "foobar" 7537 "foobar"
7548 @end example 7538 @end example
7549 7539
9888 The internal makeup of the Ichar integer varies depending on whether 9878 The internal makeup of the Ichar integer varies depending on whether
9889 we have compiled with MULE support. If not, the Ichar integer is an 9879 we have compiled with MULE support. If not, the Ichar integer is an
9890 8-bit integer with possible values from 0 - 255. 0 - 127 are the 9880 8-bit integer with possible values from 0 - 255. 0 - 127 are the
9891 standard ASCII characters, while 128 - 255 are the characters from the 9881 standard ASCII characters, while 128 - 255 are the characters from the
9892 ISO-8859-1 character set. If we have compiled with MULE support, an 9882 ISO-8859-1 character set. If we have compiled with MULE support, an
9893 Ichar is a 19-bit integer, with the various bits having meanings 9883 Ichar is a 21-bit integer, with the various bits having meanings
9894 according to a complex scheme that will be detailed later. The 9884 according to a complex scheme that will be detailed later. The
9895 characters numbered 0 - 255 still have the same meanings as for the 9885 characters numbered 0 - 255 still have the same meanings as for the
9896 non-MULE case, though. 9886 non-MULE case, though.
9897 9887
9898 Internally, the text in a buffer is represented in a fairly simple 9888 Internally, the text in a buffer is represented in a fairly simple
9928 released back to the operating system. However, this tends to result in a 9918 released back to the operating system. However, this tends to result in a
9929 noticeable speed penalty.) 9919 noticeable speed penalty.)
9930 9920
9931 Astute readers may notice that the text in a buffer is represented as 9921 Astute readers may notice that the text in a buffer is represented as
9932 an array of @emph{bytes}, while (at least in the MULE case) an Ichar is 9922 an array of @emph{bytes}, while (at least in the MULE case) an Ichar is
9933 a 19-bit integer, which clearly cannot fit in a byte. This means (of 9923 a 21-bit integer, which clearly cannot fit in a byte. This means (of
9934 course) that the text in a buffer uses a different representation from 9924 course) that the text in a buffer uses a different representation from
9935 an Ichar: specifically, the 19-bit Ichar becomes a series of one to 9925 an Ichar: specifically, the 21-bit Ichar becomes a series of one to
9936 four bytes. The conversion between these two representations is complex 9926 four bytes. The conversion between these two representations is complex
9937 and will be described later. 9927 and will be described later.
9938 9928
9939 In the non-MULE case, everything is very simple: An Ichar 9929 In the non-MULE case, everything is very simple: An Ichar
9940 is an 8-bit value, which fits neatly into one byte. 9930 is an 8-bit value, which fits neatly into one byte.
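To make the width argument concrete, here is a minimal sketch, not taken from the XEmacs sources, of how a 21-bit integer can be spread over one to four bytes. It uses UTF-8 as a stand-in, because UTF-8 is a well-known scheme with the same two properties (21-bit values, one to four bytes each). XEmacs's own buffer representation is a different encoding, which the text above says is described later. The function name `encode_21bit` is hypothetical.

```python
def encode_21bit(value):
    """Encode an integer below 2**21 into 1 to 4 bytes, UTF-8 style.

    Stand-in only: this is the UTF-8 bit layout, not the XEmacs
    internal string encoding.
    """
    if not 0 <= value < (1 << 21):
        raise ValueError("value does not fit in 21 bits")
    if value < 0x80:             # 7 payload bits, one byte
        return bytes([value])
    if value < 0x800:            # 11 payload bits, two bytes
        return bytes([0xC0 | (value >> 6),
                      0x80 | (value & 0x3F)])
    if value < 0x10000:          # 16 payload bits, three bytes
        return bytes([0xE0 | (value >> 12),
                      0x80 | ((value >> 6) & 0x3F),
                      0x80 | (value & 0x3F)])
    return bytes([0xF0 | (value >> 18),          # 21 payload bits, four bytes
                  0x80 | ((value >> 12) & 0x3F),
                  0x80 | ((value >> 6) & 0x3F),
                  0x80 | (value & 0x3F)])


if __name__ == "__main__":
    for value in (0x62, 0xE9, 0x5357, 0x10FFFF):
        print(hex(value), value.bit_length(), encode_21bit(value).hex())
```

In this stand-in scheme the lowercase letter b (0x62) stays a single byte, while 0x5357 needs three bytes and the largest 21-bit values need four.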
10975 @item mswindows-unicode 10965 @item mswindows-unicode
10976 this is used for representing text passed to MS Window API calls with 10966 this is used for representing text passed to MS Window API calls with
10977 arguments that need to be in Unicode format. (mswindows-unicode is a 10967 arguments that need to be in Unicode format. (mswindows-unicode is a
10978 coding system of type UTF-16) 10968 coding system of type UTF-16)
10979 10969
10980 @item ms-windows-multi-byte 10970 @item mswindows-multi-byte
10981 this is used for representing text passed to MS Windows API calls with 10971 this is used for representing text passed to MS Windows API calls with
10982 arguments that need to be in multi-byte format. Note that there are 10972 arguments that need to be in multi-byte format. Note that there are
10983 very few if any examples of such calls. 10973 very few if any examples of such calls.
10984 10974
10985 @item mswindows-tstr 10975 @item mswindows-tstr
10993 10983
10994 @item terminal 10984 @item terminal
10995 used for text sent to or read from a text terminal in the absence of a 10985 used for text sent to or read from a text terminal in the absence of a
10996 more specific coding system (calls to window-system specific APIs should 10986 more specific coding system (calls to window-system specific APIs should
10997 use the appropriate window-specific coding system if it makes sense to 10987 use the appropriate window-specific coding system if it makes sense to
10998 do so.) 10988 do so.) Like others here, this is a coding system alias.
10999 10989
11000 @item file-name 10990 @item file-name
11001 used when specifying the names of files in the absence of a more 10991 used when specifying the names of files in the absence of a more
11002 specific encoding, such as ms-windows-tstr. 10992 specific encoding, such as ms-windows-tstr. This is a coding system
10993 alias -- what it's an alias of is determined at startup.
11003 10994
11004 @item native 10995 @item native
11005 the most general coding system for specifying text passed to system 10996 the most general coding system for specifying text passed to system
11006 calls. This generally translates to whatever coding system is specified 10997 calls. This generally translates to whatever coding system is specified
11007 by the current locale. This should only be used when none of the coding 10998 by the current locale. This should only be used when none of the coding
11008 systems mentioned above are appropriate. 10999 systems mentioned above are appropriate. This is a coding system
11000 alias -- what it's an alias of is determined at startup.
11009 @end table 11001 @end table
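The phrase "determined at startup" in the added text can be pictured with a toy model. The sketch below is not XEmacs code: the tuple `ALIAS_NAMES`, the functions `build_aliases` and `resolve`, the default of pointing the other aliases at `native`, and the target names used are all hypothetical. It only shows the general idea that an alias such as `native` or `file-name` is a name bound once, at startup, to some concrete coding system and resolved through that binding afterwards.

```python
ALIAS_NAMES = ("terminal", "file-name", "native")  # aliases named in the table


def build_aliases(native_target, overrides=None):
    """Build the alias table once, as startup code might.

    `native_target` stands for whatever would be derived from the
    environment (for instance the locale). Hypothetical default: every
    other alias points at `native` unless `overrides` says otherwise.
    """
    aliases = {name: "native" for name in ALIAS_NAMES if name != "native"}
    aliases["native"] = native_target
    aliases.update(overrides or {})
    return aliases


def resolve(name, aliases):
    """Follow alias bindings until a name that is not an alias is reached."""
    seen = set()
    while name in aliases:
        if name in seen:
            raise ValueError("alias cycle at " + name)
        seen.add(name)
        name = aliases[name]
    return name


if __name__ == "__main__":
    table = build_aliases("iso-8859-1", {"file-name": "utf-8"})
    for alias in ALIAS_NAMES:
        print(alias, "->", resolve(alias, table))
```

Code that asks for `file-name` never needs to know which concrete coding system was chosen; only the startup step does.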
11010 11002
11011 @subheading Proper Display of Multilingual Text 11003 @subheading Proper Display of Multilingual Text
11012 11004
11013 There are two things required to get this working correctly. One is 11005 There are two things required to get this working correctly. One is
11272 @end example 11264 @end example
11273 11265
11274 There are two internal encodings for characters in XEmacs/Mule. One is 11266 There are two internal encodings for characters in XEmacs/Mule. One is
11275 called @dfn{string encoding} and is an 8-bit encoding that is used for 11267 called @dfn{string encoding} and is an 8-bit encoding that is used for
11276 representing characters in a buffer or string. It uses 1 to 4 bytes per 11268 representing characters in a buffer or string. It uses 1 to 4 bytes per
11277 character. The other is called @dfn{character encoding} and is a 19-bit 11269 character. The other is called @dfn{character encoding} and is a 21-bit
11278 encoding that is used for representing characters individually in a 11270 encoding that is used for representing characters individually in a
11279 variable. 11271 variable.
11280 11272
11281 (In the following descriptions, we'll ignore composite characters for 11273 (In the following descriptions, we'll ignore composite characters for
11282 the moment. We also give a general (structural) overview first, 11274 the moment. We also give a general (structural) overview first,