comparison man/internals/internals.texi @ 3955:c1f9ac9f66de
[xemacs-hg @ 2007-05-15 10:25:12 by aidan]
Eliminate a few problems in man/internals/internals.texi.
author | aidan |
---|---|
date | Tue, 15 May 2007 10:25:16 +0000 |
parents | 1dac67fc67ae |
children | 8d2106500793 |
3954:42f1cd0fb81d | 3955:c1f9ac9f66de |
---|---|
7524 @end example | 7524 @end example |
7525 | 7525 |
7526 converts to a char that represents the lowercase letter b. | 7526 converts to a char that represents the lowercase letter b. |
7527 | 7527 |
7528 @example | 7528 @example |
7529 ?^[$(B#&^[(B | 7529 ?\u5357 |
7530 @end example | 7530 @end example |
7531 | 7531 |
7532 (where @samp{^[} actually is an @samp{ESC} character) converts to a | 7532 converts to a Han character meaning ``south, southwards''; depending on |
7533 particular Kanji character when using an ISO2022-based coding system for | 7533 how your XEmacs is configured, it will be assigned to either a Japanese |
7534 input. (To decode this goo: @samp{ESC} begins an escape sequence; | 7534 or Chinese character set (possibly even a Korean one). |
7535 @samp{ESC $ (} is a class of escape sequences meaning ``switch to a | |
7536 94x94 character set''; @samp{ESC $ ( B} means ``switch to Japanese | |
7537 Kanji''; @samp{#} and @samp{&} collectively index into a 94-by-94 array | |
7538 of characters [subtract 33 from the ASCII value of each character to get | |
7539 the corresponding index]; @samp{ESC (} is a class of escape sequences | |
7540 meaning ``switch to a 94 character set''; @samp{ESC (B} means ``switch | |
7541 to US ASCII''. It is a coincidence that the letter @samp{B} is used to | |
7542 denote both Japanese Kanji and US ASCII. If the first @samp{B} were | |
7543 replaced with an @samp{A}, you'd be requesting a Chinese Hanzi character | |
7544 from the GB2312 character set.) | |
7545 | 7535 |
7546 @example | 7536 @example |
7547 "foobar" | 7537 "foobar" |
7548 @end example | 7538 @end example |
7549 | 7539 |
9888 The internal makeup of the Ichar integer varies depending on whether | 9878 The internal makeup of the Ichar integer varies depending on whether |
9889 we have compiled with MULE support. If not, the Ichar integer is an | 9879 we have compiled with MULE support. If not, the Ichar integer is an |
9890 8-bit integer with possible values from 0 - 255. 0 - 127 are the | 9880 8-bit integer with possible values from 0 - 255. 0 - 127 are the |
9891 standard ASCII characters, while 128 - 255 are the characters from the | 9881 standard ASCII characters, while 128 - 255 are the characters from the |
9892 ISO-8859-1 character set. If we have compiled with MULE support, an | 9882 ISO-8859-1 character set. If we have compiled with MULE support, an |
9893 Ichar is a 19-bit integer, with the various bits having meanings | 9883 Ichar is a 21-bit integer, with the various bits having meanings |
9894 according to a complex scheme that will be detailed later. The | 9884 according to a complex scheme that will be detailed later. The |
9895 characters numbered 0 - 255 still have the same meanings as for the | 9885 characters numbered 0 - 255 still have the same meanings as for the |
9896 non-MULE case, though. | 9886 non-MULE case, though. |
9897 | 9887 |
9898 Internally, the text in a buffer is represented in a fairly simple | 9888 Internally, the text in a buffer is represented in a fairly simple |
9928 released back to the operating system. However, this tends to result in a | 9918 released back to the operating system. However, this tends to result in a |
9929 noticeable speed penalty.) | 9919 noticeable speed penalty.) |
9930 | 9920 |
9931 Astute readers may notice that the text in a buffer is represented as | 9921 Astute readers may notice that the text in a buffer is represented as |
9932 an array of @emph{bytes}, while (at least in the MULE case) an Ichar is | 9922 an array of @emph{bytes}, while (at least in the MULE case) an Ichar is |
9933 a 19-bit integer, which clearly cannot fit in a byte. This means (of | 9923 a 21-bit integer, which clearly cannot fit in a byte. This means (of |
9934 course) that the text in a buffer uses a different representation from | 9924 course) that the text in a buffer uses a different representation from |
9935 an Ichar: specifically, the 19-bit Ichar becomes a series of one to | 9925 an Ichar: specifically, the 21-bit Ichar becomes a series of one to |
9936 four bytes. The conversion between these two representations is complex | 9926 four bytes. The conversion between these two representations is complex |
9937 and will be described later. | 9927 and will be described later. |
9938 | 9928 |
9939 In the non-MULE case, everything is very simple: An Ichar | 9929 In the non-MULE case, everything is very simple: An Ichar |
9940 is an 8-bit value, which fits neatly into one byte. | 9930 is an 8-bit value, which fits neatly into one byte. |
10975 @item mswindows-unicode | 10965 @item mswindows-unicode |
10976 this is used for representing text passed to MS Window API calls with | 10966 this is used for representing text passed to MS Window API calls with |
10977 arguments that need to be in Unicode format. (mswindows-unicode is a | 10967 arguments that need to be in Unicode format. (mswindows-unicode is a |
10978 coding system of type UTF-16) | 10968 coding system of type UTF-16) |
10979 | 10969 |
10980 @item ms-windows-multi-byte | 10970 @item mswindows-multi-byte |
10981 this is used for representing text passed to MS Windows API calls with | 10971 this is used for representing text passed to MS Windows API calls with |
10982 arguments that need to be in multi-byte format. Note that there are | 10972 arguments that need to be in multi-byte format. Note that there are |
10983 very few if any examples of such calls. | 10973 very few if any examples of such calls. |
10984 | 10974 |
10985 @item mswindows-tstr | 10975 @item mswindows-tstr |
10993 | 10983 |
10994 @item terminal | 10984 @item terminal |
10995 used for text sent to or read from a text terminal in the absence of a | 10985 used for text sent to or read from a text terminal in the absence of a |
10996 more specific coding system (calls to window-system specific APIs should | 10986 more specific coding system (calls to window-system specific APIs should |
10997 use the appropriate window-specific coding system if it makes sense to | 10987 use the appropriate window-specific coding system if it makes sense to |
10998 do so.) | 10988 do so.) Like others here, this is a coding system alias. |
10999 | 10989 |
11000 @item file-name | 10990 @item file-name |
11001 used when specifying the names of files in the absence of a more | 10991 used when specifying the names of files in the absence of a more |
11002 specific encoding, such as ms-windows-tstr. | 10992 specific encoding, such as ms-windows-tstr. This is a coding system |
| | 10993 alias -- what it's an alias of is determined at startup. |
11003 | 10994 |
11004 @item native | 10995 @item native |
11005 the most general coding system for specifying text passed to system | 10996 the most general coding system for specifying text passed to system |
11006 calls. This generally translates to whatever coding system is specified | 10997 calls. This generally translates to whatever coding system is specified |
11007 by the current locale. This should only be used when none of the coding | 10998 by the current locale. This should only be used when none of the coding |
11008 systems mentioned above are appropriate. | 10999 systems mentioned above are appropriate. This is a coding system |
| | 11000 alias -- what it's an alias of is determined at startup. |
11009 @end table | 11001 @end table |
11010 | 11002 |
11011 @subheading Proper Display of Multilingual Text | 11003 @subheading Proper Display of Multilingual Text |
11012 | 11004 |
11013 There are two things required to get this working correctly. One is | 11005 There are two things required to get this working correctly. One is |
11272 @end example | 11264 @end example |
11273 | 11265 |
11274 There are two internal encodings for characters in XEmacs/Mule. One is | 11266 There are two internal encodings for characters in XEmacs/Mule. One is |
11275 called @dfn{string encoding} and is an 8-bit encoding that is used for | 11267 called @dfn{string encoding} and is an 8-bit encoding that is used for |
11276 representing characters in a buffer or string. It uses 1 to 4 bytes per | 11268 representing characters in a buffer or string. It uses 1 to 4 bytes per |
11277 character. The other is called @dfn{character encoding} and is a 19-bit | 11269 character. The other is called @dfn{character encoding} and is a 21-bit |
11278 encoding that is used for representing characters individually in a | 11270 encoding that is used for representing characters individually in a |
11279 variable. | 11271 variable. |
11280 | 11272 |
11281 (In the following descriptions, we'll ignore composite characters for | 11273 (In the following descriptions, we'll ignore composite characters for |
11282 the moment. We also give a general (structural) overview first, | 11274 the moment. We also give a general (structural) overview first, |