comparison man/lispref/mule.texi @ 1188:11ff4edb6bb7
[xemacs-hg @ 2003-01-05 10:53:58 by youngs]
2003-01-04 Steve Youngs <youngs@xemacs.org>
* lispref/mule.texi (Charset Unification): Menu item "Internals"
should be "Unification Internals".
author | youngs |
---|---|
date | Sun, 05 Jan 2003 10:54:04 +0000 |
parents | c1553814932e |
children | 465bd3c7d932 |
1187:6f18092b3b3c | 1188:11ff4edb6bb7 |
---|---|
2212 * Usage:: An overview of the operation of Unification. | 2212 * Usage:: An overview of the operation of Unification. |
2213 * Configuration:: Configuring Unification for use. | 2213 * Configuration:: Configuring Unification for use. |
2214 * Theory of Operation:: How Unification works. | 2214 * Theory of Operation:: How Unification works. |
2215 * What Unification Cannot Do for You:: Inherent problems of 8-bit charsets. | 2215 * What Unification Cannot Do for You:: Inherent problems of 8-bit charsets. |
2216 * Charsets and Coding Systems:: Reference lists with annotations. | 2216 * Charsets and Coding Systems:: Reference lists with annotations. |
2217 * Internals:: Utilities and implementation details. | 2217 * Unification Internals:: Utilities and implementation details. |
2218 @end menu | 2218 @end menu |
2219 | 2219 |
2220 @node Overview, Usage, Charset Unification, Charset Unification | 2220 @node Overview, Usage, Charset Unification, Charset Unification |
2221 @subsection An Overview of Unification | 2221 @subsection An Overview of Unification |
2222 | 2222 |
2693 consider the ISO Latin character sets to be disjoint. This manifests | 2693 consider the ISO Latin character sets to be disjoint. This manifests |
2694 itself when a user enters characters using input methods associated with | 2694 itself when a user enters characters using input methods associated with |
2695 different coded character sets into a single buffer. | 2695 different coded character sets into a single buffer. |
2696 | 2696 |
2697 There are two problems stemming from this design. First, Mule | 2697 There are two problems stemming from this design. First, Mule |
2698 represents the same character in different ways. Abstractly, 'ó' | 2698 represents the same character in different ways. Abstractly, 'ó' |
2699 (LATIN SMALL LETTER O WITH ACUTE) can get represented as | 2699 (LATIN SMALL LETTER O WITH ACUTE) can get represented as |
2700 [latin-iso8859-1 #x73] or as [latin-iso8859-2 #x73]. So what looks like | 2700 [latin-iso8859-1 #x73] or as [latin-iso8859-2 #x73]. So what looks like |
2701 'óó' in the display might actually be represented [latin-iso8859-1 | 2701 'óó' in the display might actually be represented [latin-iso8859-1 |
2702 #x73][latin-iso8859-2 #x73] in the buffer, and saved as [#xF3 ESC - B | 2702 #x73][latin-iso8859-2 #x73] in the buffer, and saved as [#xF3 ESC - B |
2703 #xF3 ESC - A] in the file. In some cases this treatment would be | 2703 #xF3 ESC - A] in the file. In some cases this treatment would be |
2704 appropriate (consider HYPHEN, MINUS SIGN, EN DASH, EM DASH, and U+4E00 | 2704 appropriate (consider HYPHEN, MINUS SIGN, EN DASH, EM DASH, and U+4E00 |
2705 (the CJK ideographic character meaning ``one'')), and although arguably | 2705 (the CJK ideographic character meaning ``one'')), and although arguably |
2706 incorrect it is convenient when mixing the CJK scripts. But in the case | 2706 incorrect it is convenient when mixing the CJK scripts. But in the case |
2744 set which contains all of the Latin characters in the buffer. | 2744 set which contains all of the Latin characters in the buffer. |
2745 | 2745 |
2746 The primary purpose of Unification is to fix the problem by giving the | 2746 The primary purpose of Unification is to fix the problem by giving the |
2747 user the choice to change the representation of all characters to one | 2747 user the choice to change the representation of all characters to one |
2748 character set and give sensible recommendations based on context. In | 2748 character set and give sensible recommendations based on context. In |
2749 the 'ó' example, either ISO 8859/1 or ISO 8859/2 is satisfactory, and | 2749 the 'ó' example, either ISO 8859/1 or ISO 8859/2 is satisfactory, and |
2750 both will be suggested. In the EURO SIGN example, only ISO 8859/15 | 2750 both will be suggested. In the EURO SIGN example, only ISO 8859/15 |
2751 makes sense, and that is what will be recommended. In both cases, the | 2751 makes sense, and that is what will be recommended. In both cases, the |
2752 user will be reminded that there are universal encodings available. | 2752 user will be reminded that there are universal encodings available. |
2753 | 2753 |
2754 I call this @dfn{remapping} (from the universal character set to a | 2754 I call this @dfn{remapping} (from the universal character set to a |
2755 particular ISO 8859 coded character set). It is mere accident that this | 2755 particular ISO 8859 coded character set). It is mere accident that this |
2756 letter has the same code point in both character sets. (Not entirely, | 2756 letter has the same code point in both character sets. (Not entirely, |
2757 but there are many examples of Latin characters that have different code | 2757 but there are many examples of Latin characters that have different code |
2758 points in different Latin-X sets.) | 2758 points in different Latin-X sets.) |
2759 | 2759 |
2760 Note that, in the 'ó' example, that treating the buffer in this way will | 2760 Note that, in the 'ó' example, that treating the buffer in this way will |
2761 result in a representation such as [latin-iso8859-2 | 2761 result in a representation such as [latin-iso8859-2 |
2762 #x73][latin-iso8859-2 #x73], and the file will be saved as [#xF3 #xF3]. | 2762 #x73][latin-iso8859-2 #x73], and the file will be saved as [#xF3 #xF3]. |
2763 This is guaranteed to occasionally result in the second problem you | 2763 This is guaranteed to occasionally result in the second problem you |
2764 observed, to which we now turn. | 2764 observed, to which we now turn. |
2765 | 2765 |
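
The recommendation step described in the hunk at 2746-2752 (suggest every 8-bit Latin character set that can hold all characters in the buffer) can be illustrated with a small sketch. This is not the Unification implementation, which is Emacs Lisp inside XEmacs; it is a Python approximation that relies on the standard codecs, and the names `CANDIDATES`, `suitable_charsets`, and `mule_code_point` are hypothetical helpers introduced only for this illustration. The 7-bit code point is assumed to be the encoded octet with its high bit cleared, which matches the [latin-iso8859-1 #x73] / #xF3 pairing in the text.

```python
# Hypothetical illustration only: these helpers are not part of XEmacs.
# Candidate 8-bit Latin charsets, named by their Python codec names.
CANDIDATES = ["iso8859_1", "iso8859_2", "iso8859_15"]


def suitable_charsets(text, candidates=CANDIDATES):
    """Return the candidate charsets that can encode every character in text."""
    ok = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            # At least one character has no code point in this charset.
            continue
        ok.append(name)
    return ok


def mule_code_point(char, charset):
    """Assumed 7-bit position in a 96-character set: the octet with bit 7 cleared."""
    return char.encode(charset)[0] & 0x7F


o_acute = "\u00f3"  # LATIN SMALL LETTER O WITH ACUTE
euro = "\u20ac"     # EURO SIGN

print(suitable_charsets(o_acute, ["iso8859_1", "iso8859_2"]))
print(suitable_charsets(euro))
print(hex(mule_code_point(o_acute, "iso8859_1")))
```

With the candidates restricted to ISO 8859/1 and ISO 8859/2, both are reported for 'ó', while the EURO SIGN leaves only ISO 8859/15 among the three candidates listed here. The candidate list is deliberately short; a fuller sketch would cover the other ISO 8859 parts and always mention a universal encoding such as UTF-8 as a fallback.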