comparison man/lispref/mule.texi @ 1188:11ff4edb6bb7

[xemacs-hg @ 2003-01-05 10:53:58 by youngs] 2003-01-04 Steve Youngs <youngs@xemacs.org> * lispref/mule.texi (Charset Unification): Menu item "Internals" should be "Unification Internals".
author youngs
date Sun, 05 Jan 2003 10:54:04 +0000
parents c1553814932e
children 465bd3c7d932
1187:6f18092b3b3c 1188:11ff4edb6bb7
2212 * Usage:: An overview of the operation of Unification. 2212 * Usage:: An overview of the operation of Unification.
2213 * Configuration:: Configuring Unification for use. 2213 * Configuration:: Configuring Unification for use.
2214 * Theory of Operation:: How Unification works. 2214 * Theory of Operation:: How Unification works.
2215 * What Unification Cannot Do for You:: Inherent problems of 8-bit charsets. 2215 * What Unification Cannot Do for You:: Inherent problems of 8-bit charsets.
2216 * Charsets and Coding Systems:: Reference lists with annotations. 2216 * Charsets and Coding Systems:: Reference lists with annotations.
2217 * Internals:: Utilities and implementation details. 2217 * Unification Internals:: Utilities and implementation details.
2218 @end menu 2218 @end menu
2219 2219
2220 @node Overview, Usage, Charset Unification, Charset Unification 2220 @node Overview, Usage, Charset Unification, Charset Unification
2221 @subsection An Overview of Unification 2221 @subsection An Overview of Unification
2222 2222
2693 consider the ISO Latin character sets to be disjoint. This manifests 2693 consider the ISO Latin character sets to be disjoint. This manifests
2694 itself when a user enters characters using input methods associated with 2694 itself when a user enters characters using input methods associated with
2695 different coded character sets into a single buffer. 2695 different coded character sets into a single buffer.
2696 2696
2697 There are two problems stemming from this design. First, Mule 2697 There are two problems stemming from this design. First, Mule
2698 represents the same character in different ways. Abstractly, 'ó' 2698 represents the same character in different ways. Abstractly, 'ó'
2699 (LATIN SMALL LETTER O WITH ACUTE) can get represented as 2699 (LATIN SMALL LETTER O WITH ACUTE) can get represented as
2700 [latin-iso8859-1 #x73] or as [latin-iso8859-2 #x73]. So what looks like 2700 [latin-iso8859-1 #x73] or as [latin-iso8859-2 #x73]. So what looks like
2701 'óó' in the display might actually be represented [latin-iso8859-1 2701 'óó' in the display might actually be represented [latin-iso8859-1
2702 #x73][latin-iso8859-2 #x73] in the buffer, and saved as [#xF3 ESC - B 2702 #x73][latin-iso8859-2 #x73] in the buffer, and saved as [#xF3 ESC - B
2703 #xF3 ESC - A] in the file. In some cases this treatment would be 2703 #xF3 ESC - A] in the file. In some cases this treatment would be
2704 appropriate (consider HYPHEN, MINUS SIGN, EN DASH, EM DASH, and U+4E00 2704 appropriate (consider HYPHEN, MINUS SIGN, EN DASH, EM DASH, and U+4E00
2705 (the CJK ideographic character meaning ``one'')), and although arguably 2705 (the CJK ideographic character meaning ``one'')), and although arguably
2706 incorrect it is convenient when mixing the CJK scripts. But in the case 2706 incorrect it is convenient when mixing the CJK scripts. But in the case
2744 set which contains all of the Latin characters in the buffer. 2744 set which contains all of the Latin characters in the buffer.
2745 2745
2746 The primary purpose of Unification is to fix the problem by giving the 2746 The primary purpose of Unification is to fix the problem by giving the
2747 user the choice to change the representation of all characters to one 2747 user the choice to change the representation of all characters to one
2748 character set and by giving sensible recommendations based on context. In 2748 character set and by giving sensible recommendations based on context. In
2749 the 'ó' example, either ISO 8859/1 or ISO 8859/2 is satisfactory, and 2749 the 'ó' example, either ISO 8859/1 or ISO 8859/2 is satisfactory, and
2750 both will be suggested. In the EURO SIGN example, only ISO 8859/15 2750 both will be suggested. In the EURO SIGN example, only ISO 8859/15
2751 makes sense, and that is what will be recommended. In both cases, the 2751 makes sense, and that is what will be recommended. In both cases, the
2752 user will be reminded that there are universal encodings available. 2752 user will be reminded that there are universal encodings available.
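A minimal Python sketch of the recommendation step described above. It assumes Python's built-in codecs (`latin_1`, `iso8859_2`, `iso8859_15`) as stand-ins for the Mule charsets, and `recommend_charsets` and `CANDIDATES` are hypothetical names, not part of XEmacs. The sketch keeps every candidate ISO 8859 charset in which each character of the text has a code point.

```python
# Hypothetical candidate list: display name -> Python codec standing in
# for the corresponding Mule Latin charset.
CANDIDATES = {
    "ISO 8859/1": "latin_1",
    "ISO 8859/2": "iso8859_2",
    "ISO 8859/15": "iso8859_15",
}

def recommend_charsets(text):
    """Return the candidate charsets that can encode every character of text."""
    suitable = []
    for name, codec in CANDIDATES.items():
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # some character has no code point in this charset
        suitable.append(name)
    return suitable

print(recommend_charsets("óó"))  # ['ISO 8859/1', 'ISO 8859/2', 'ISO 8859/15']
print(recommend_charsets("€"))   # ['ISO 8859/15']
```

For 'óó' every candidate qualifies, while the EURO SIGN leaves only ISO 8859/15, which matches the recommendations described in the text. Unification itself operates on Mule charsets; this sketch only illustrates the kind of coverage test involved.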
2753 2753
2754 I call this @dfn{remapping} (from the universal character set to a 2754 I call this @dfn{remapping} (from the universal character set to a
2755 particular ISO 8859 coded character set). It is a mere accident that this 2755 particular ISO 8859 coded character set). It is a mere accident that this
2756 letter has the same code point in both character sets. (Not entirely, 2756 letter has the same code point in both character sets. (Not entirely,
2757 but there are many examples of Latin characters that have different code 2757 but there are many examples of Latin characters that have different code
2758 points in different Latin-X sets.) 2758 points in different Latin-X sets.)
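To illustrate the parenthetical remark about differing code points, here is a small Python sketch. Python's codecs stand in for the Mule charsets, and `code_points` is a hypothetical helper. It reports the byte that encodes a character in each Latin charset, or `None` where the charset lacks the character.

```python
def code_points(char, codecs=("latin_1", "iso8859_2", "iso8859_15")):
    """Map each codec to the single byte encoding char, or None if absent."""
    result = {}
    for codec in codecs:
        try:
            result[codec] = char.encode(codec)[0]
        except UnicodeEncodeError:
            result[codec] = None
    return result

# 'ó' happens to sit at 0xF3 (243) in all three sets ...
print(code_points("ó"))  # {'latin_1': 243, 'iso8859_2': 243, 'iso8859_15': 243}
# ... but 'ž' is 0xBE (190) in Latin-2, 0xB8 (184) in Latin-9, absent in Latin-1.
print(code_points("ž"))  # {'latin_1': None, 'iso8859_2': 190, 'iso8859_15': 184}
```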
2759 2759
2760 Note that, in the 'ó' example, treating the buffer in this way will 2760 Note that, in the 'ó' example, treating the buffer in this way will
2761 result in a representation such as [latin-iso8859-2 2761 result in a representation such as [latin-iso8859-2
2762 #x73][latin-iso8859-2 #x73], and the file will be saved as [#xF3 #xF3]. 2762 #x73][latin-iso8859-2 #x73], and the file will be saved as [#xF3 #xF3].
2763 This is guaranteed to occasionally result in the second problem you 2763 This is guaranteed to occasionally result in the second problem you
2764 observed, to which we now turn. 2764 observed, to which we now turn.
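To make the two file representations quoted above concrete, here is a deliberately simplified Python reader for byte streams such as [#xF3 ESC - B #xF3 ESC - A] and [#xF3 #xF3]. It assumes that G1 initially holds ISO 8859-1 (or the charset passed as `initial`) and is always invoked into GR. It also recognizes only the designations ESC - A and ESC - B. `decode_with_designations` is a hypothetical name, and a real ISO 2022 decoder handles many more sequences.

```python
ESC = 0x1B
# Final bytes of "ESC - F" sequences that designate a 96-character set
# to G1; only the two used in the example are supported in this sketch.
G1_CHARSETS = {ord("A"): "latin_1", ord("B"): "iso8859_2"}

def decode_with_designations(data, initial="latin_1"):
    """Return (charset, character) pairs, tracking the current G1 charset."""
    g1 = initial
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC and i + 2 < len(data) and data[i + 1] == ord("-"):
            final = data[i + 2]
            if final not in G1_CHARSETS:
                raise ValueError(f"unsupported designation ESC - {chr(final)}")
            g1 = G1_CHARSETS[final]  # switch G1; emits no character
            i += 3
        elif b >= 0xA0:
            # GR byte: interpret it through whichever charset G1 holds now.
            out.append((g1, bytes([b]).decode(g1)))
            i += 1
        elif b < 0x80:
            out.append(("ascii", chr(b)))
            i += 1
        else:
            raise ValueError(f"C1 control byte {b:#04x} not handled")
    return out

print(decode_with_designations(b"\xf3\x1b-B\xf3\x1b-A"))
# [('latin_1', 'ó'), ('iso8859_2', 'ó')]
print(decode_with_designations(b"\xf3\xf3", initial="iso8859_2"))
# [('iso8859_2', 'ó'), ('iso8859_2', 'ó')]
```

Both calls yield the same visible text 'óó'. Only the charset tags differ, which is the distinction Mule preserves and remapping eliminates.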
2765 2765