comparison man/internals/internals.texi @ 1263:bada4b0bce3a
[xemacs-hg @ 2003-02-06 14:37:51 by stephent]
nits <87fzr1o4s3.fsf@tleepslib.sk.tsukuba.ac.jp>
author | stephent |
---|---|
date | Thu, 06 Feb 2003 14:37:56 +0000 |
parents | 465bd3c7d932 |
children | 1b0339b048ce |
1262:807c72f959fe | 1263:bada4b0bce3a |
---|---|
265 | 265 |
266 * Introduction to Buffers:: A buffer holds a block of text such as a file. | 266 * Introduction to Buffers:: A buffer holds a block of text such as a file. |
267 * The Text in a Buffer:: Representation of the text in a buffer. | 267 * The Text in a Buffer:: Representation of the text in a buffer. |
268 * Buffer Lists:: Keeping track of all buffers. | 268 * Buffer Lists:: Keeping track of all buffers. |
269 * Markers and Extents:: Tagging locations within a buffer. | 269 * Markers and Extents:: Tagging locations within a buffer. |
270 * Ibytes and Ichars:: Representation of individual characters. | 270 * Ibytes and Ichars:: Representation of individual characters. |
271 * The Buffer Object:: The Lisp object corresponding to a buffer. | 271 * The Buffer Object:: The Lisp object corresponding to a buffer. |
272 | 272 |
273 MULE Character Sets and Encodings | 273 MULE Character Sets and Encodings |
274 | 274 |
275 * Character Sets:: | 275 * Character Sets:: |
2766 @cindex Ichar | 2766 @cindex Ichar |
2767 An @code{Ichar} holds a single Emacs character. | 2767 An @code{Ichar} holds a single Emacs character. |
2768 | 2768 |
2769 Obviously, the equality between characters and bytes is lost in the Mule | 2769 Obviously, the equality between characters and bytes is lost in the Mule |
2770 world. Characters can be represented by one or more bytes in the | 2770 world. Characters can be represented by one or more bytes in the |
2771 buffer, and @code{Ichar} is the C type large enough to hold any | 2771 buffer, and @code{Ichar} is a C type large enough to hold any |
2772 character. | 2772 character. |
2773 | 2773 |
2774 Without Mule support, an @code{Ichar} is equivalent to an | 2774 Without Mule support, an @code{Ichar} is equivalent to an |
2775 @code{unsigned char}. | 2775 @code{unsigned char}. |
2776 | 2776 |
2781 | 2781 |
2782 XEmacs does not work with the same character formats all the time; when | 2782 XEmacs does not work with the same character formats all the time; when |
2783 reading characters from the outside, it decodes them to an internal | 2783 reading characters from the outside, it decodes them to an internal |
2784 format, and likewise encodes them when writing. @code{Ibyte} (in fact | 2784 format, and likewise encodes them when writing. @code{Ibyte} (in fact |
2785 @code{unsigned char}) is the basic unit of XEmacs internal buffers and | 2785 @code{unsigned char}) is the basic unit of XEmacs internal buffers and |
2786 strings format. A @code{Ibyte *} is the type that points at text | 2786 strings format. An @code{Ibyte *} is the type that points at text |
2787 encoded in the variable-width internal encoding. | 2787 encoded in the variable-width internal encoding. |
2788 | 2788 |
2789 One character can correspond to one or more @code{Ibyte}s. In the | 2789 One character can correspond to one or more @code{Ibyte}s. In the |
2790 current Mule implementation, an ASCII character is represented by the | 2790 current Mule implementation, an ASCII character is represented by the |
2791 same @code{Ibyte}, and other characters are represented by a sequence | 2791 same @code{Ibyte}, and other characters are represented by a sequence |
2985 | 2985 |
2986 The interface to conversion between the internal and external | 2986 The interface to conversion between the internal and external |
2987 representations of text are the numerous conversion macros defined in | 2987 representations of text are the numerous conversion macros defined in |
2988 @file{buffer.h}. There used to be a fixed set of external formats | 2988 @file{buffer.h}. There used to be a fixed set of external formats |
2989 supported by these macros, but now any coding system can be used with | 2989 supported by these macros, but now any coding system can be used with |
2990 these macros. The coding system alias mechanism is used to create the | 2990 them. The coding system alias mechanism is used to create the |
2991 following logical coding systems, which replace the fixed external | 2991 following logical coding systems, which replace the fixed external |
2992 formats. The (dontusethis-set-symbol-value-handler) mechanism was | 2992 formats. The (dontusethis-set-symbol-value-handler) mechanism was |
2993 enhanced to make this possible (more work on that is needed). | 2993 enhanced to make this possible (more work on that is needed). |
2994 | 2994 |
2995 Example useful coding systems: | 2995 Often useful coding systems: |
2996 | 2996 |
2997 @table @code | 2997 @table @code |
2998 @item Qbinary | 2998 @item Qbinary |
2999 This is the simplest format and is what we use in the absence of a more | 2999 This is the simplest format and is what we use in the absence of a more |
3000 appropriate format. This converts according to the @code{binary} coding | 3000 appropriate format. This converts according to the @code{binary} coding |
3038 accept data of type @code{LPTSTR} or @code{LPCSTR}. This maps to either | 3038 accept data of type @code{LPTSTR} or @code{LPCSTR}. This maps to either |
3039 @code{Qmswindows_multibyte} (a locale-specific encoding, same as | 3039 @code{Qmswindows_multibyte} (a locale-specific encoding, same as |
3040 @code{Qnative}) or @code{Qmswindows_unicode}, depending on whether | 3040 @code{Qnative}) or @code{Qmswindows_unicode}, depending on whether |
3041 XEmacs is being run under Windows 9X or Windows NT/2000/XP. | 3041 XEmacs is being run under Windows 9X or Windows NT/2000/XP. |
3042 @end table | 3042 @end table |
 | 3043 |
 | 3044 Many other coding systems are provided by default. |
3043 | 3045 |
3044 There are two fundamental macros to convert between external and | 3046 There are two fundamental macros to convert between external and |
3045 internal format, as well as various convenience macros to simplify the | 3047 internal format, as well as various convenience macros to simplify the |
3046 most common operations. | 3048 most common operations. |
3047 | 3049 |
3194 It is extremely important to always convert external data, because | 3196 It is extremely important to always convert external data, because |
3195 XEmacs can crash if unexpected 8-bit sequences are copied to its internal | 3197 XEmacs can crash if unexpected 8-bit sequences are copied to its internal |
3196 buffers literally. | 3198 buffers literally. |
3197 | 3199 |
3198 This means that when a system function, such as @code{readdir}, returns | 3200 This means that when a system function, such as @code{readdir}, returns |
3199 a string, you may need to convert it using one of the conversion macros | 3201 a string, you normally need to convert it using one of the conversion macros |
3200 described in the previous chapter, before passing it further to Lisp. | 3202 described in the previous chapter, before passing it further to Lisp. |
3201 | 3203 |
3202 Actually, most of the basic system functions that accept '\0'-terminated | 3204 Actually, most of the basic system functions that accept '\0'-terminated |
3203 string arguments, like @code{stat()} and @code{open()}, have | 3205 string arguments, like @code{stat()} and @code{open()}, have |
3204 @strong{encapsulated} equivalents that do the internal to external | 3206 @strong{encapsulated} equivalents that do the internal to external |
3214 to be decoded only once, when it is read. After that, it is passed | 3216 to be decoded only once, when it is read. After that, it is passed |
3215 around in internal format. | 3217 around in internal format. |
3216 | 3218 |
3217 @item Do all work in internal format | 3219 @item Do all work in internal format |
3218 External-formatted data is completely unpredictable in its format. It | 3220 External-formatted data is completely unpredictable in its format. It |
3219 may be Unicode (non-ASCII compatible); it may be a modal encoding, in | 3221 may be fixed-width Unicode (not even ASCII compatible); it may be a |
 | 3222 modal encoding, in |
3220 which case some occurrences of (e.g.) the slash character may be part of | 3223 which case some occurrences of (e.g.) the slash character may be part of |
3221 two-byte Asian-language characters, and a naive attempt to split apart a | 3224 two-byte Asian-language characters, and a naive attempt to split apart a |
3222 pathname by slashes will fail; etc. Internal-format text should be | 3225 pathname by slashes will fail; etc. Internal-format text should be |
3223 converted to external format only at the point where an external API is | 3226 converted to external format only at the point where an external API is |
3224 actually called, and the first thing done after receiving | 3227 actually called, and the first thing done after receiving |
8111 @menu | 8114 @menu |
8112 * Introduction to Buffers:: A buffer holds a block of text such as a file. | 8115 * Introduction to Buffers:: A buffer holds a block of text such as a file. |
8113 * The Text in a Buffer:: Representation of the text in a buffer. | 8116 * The Text in a Buffer:: Representation of the text in a buffer. |
8114 * Buffer Lists:: Keeping track of all buffers. | 8117 * Buffer Lists:: Keeping track of all buffers. |
8115 * Markers and Extents:: Tagging locations within a buffer. | 8118 * Markers and Extents:: Tagging locations within a buffer. |
8116 * Ibytes and Ichars:: Representation of individual characters. | 8119 * Ibytes and Ichars:: Representation of individual characters. |
8117 * The Buffer Object:: The Lisp object corresponding to a buffer. | 8120 * The Buffer Object:: The Lisp object corresponding to a buffer. |
8118 @end menu | 8121 @end menu |
8119 | 8122 |
8120 @node Introduction to Buffers | 8123 @node Introduction to Buffers |
8121 @section Introduction to Buffers | 8124 @section Introduction to Buffers |
8196 @dfn{character position}. We can speak of the character before or after | 8199 @dfn{character position}. We can speak of the character before or after |
8197 a particular buffer position, and when you insert a character at a | 8200 a particular buffer position, and when you insert a character at a |
8198 particular position, all characters after that position end up at new | 8201 particular position, all characters after that position end up at new |
8199 positions. When we speak of the character @dfn{at} a position, we | 8202 positions. When we speak of the character @dfn{at} a position, we |
8200 really mean the character after the position. (This schizophrenia | 8203 really mean the character after the position. (This schizophrenia |
8201 between a buffer position being ``between'' a character and ``on'' a | 8204 between a buffer position being ``between'' two characters and ``on'' a |
8202 character is rampant in Emacs.) | 8205 character is rampant in Emacs.) |
8203 | 8206 |
8204 Buffer positions are numbered starting at 1. This means that | 8207 Buffer positions are numbered starting at 1. This means that |
8205 position 1 is before the first character, and position 0 is not | 8208 position 1 is before the first character, and position 0 is not |
8206 valid. If there are N characters in a buffer, then buffer | 8209 valid. If there are N characters in a buffer, then buffer |
9794 Similarly, a string may or may not have an extent_info structure. | 9797 Similarly, a string may or may not have an extent_info structure. |
9795 (Generally it won't if there haven't been any extents added to the | 9798 (Generally it won't if there haven't been any extents added to the |
9796 string.) So use the @code{_force} version if you need the extent_info | 9799 string.) So use the @code{_force} version if you need the extent_info |
9797 structure to be there. | 9800 structure to be there. |
9798 | 9801 |
9799 A list of extents is maintained as a double gap array: One gap array | 9802 A list of extents is maintained as a double gap array. One gap array |
9800 is ordered by start index (the @dfn{display order}) and the other is | 9803 is ordered by start index (the @dfn{display order}) and the other is |
9801 ordered by end index (the @dfn{e-order}). Note that positions in an | 9804 ordered by end index (the @dfn{e-order}). Note that positions in an |
9802 extent list should logically be conceived of as referring @emph{to} a | 9805 extent list should logically be conceived of as referring @emph{to} a |
9803 particular extent (as is the norm in programs) rather than sitting | 9806 particular extent (as is the norm in programs) rather than sitting |
9804 between two extents. Note also that callers of these functions should | 9807 between two extents. Note also that callers of these functions should |
9827 | 9830 |
9828 @item | 9831 @item |
9829 Code to manipulate them is relatively simple to write. | 9832 Code to manipulate them is relatively simple to write. |
9830 @end enumerate | 9833 @end enumerate |
9831 | 9834 |
9832 An alternative would be a balanced binary trees, which have guaranteed | 9835 An alternative would be balanced binary trees, which have guaranteed |
9833 @math{O(log N)} time for all operations (although the constant factors | 9836 @math{O(log N)} time for all operations (although the constant factors |
9834 are not as good, and repeated localized operations will be slower than | 9837 are not as good, and repeated localized operations will be slower than |
9835 for a gap array). Such code is quite tricky to write, however. | 9838 for a gap array). Such code is quite tricky to write, however. |
9836 | 9839 |
9837 @node Zero-Length Extents | 9840 @node Zero-Length Extents |
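
The last hunks above compare gap arrays with balanced binary trees: a gap array is simple to write and cheap for repeated localized operations. The following is a minimal sketch of that idea in C, not the XEmacs implementation. The type `gap_array` and the `ga_*` functions are hypothetical names introduced only for illustration, and the elements are plain `int`s rather than extents.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical illustration type; this is not the XEmacs gap array. */
typedef struct {
    int *data;         /* live elements, with one run of free slots (the gap) */
    size_t len;        /* number of live elements */
    size_t gap_start;  /* logical index at which the gap sits */
    size_t gap_len;    /* number of free slots in the gap */
} gap_array;

static int ga_init(gap_array *ga, size_t capacity)
{
    if (capacity == 0)
        capacity = 1;  /* keep the storage pointer non-NULL */
    ga->data = malloc(capacity * sizeof *ga->data);
    if (ga->data == NULL)
        return -1;
    ga->len = 0;
    ga->gap_start = 0;
    ga->gap_len = capacity;
    return 0;
}

static void ga_free(gap_array *ga)
{
    free(ga->data);
    ga->data = NULL;
    ga->len = ga->gap_start = ga->gap_len = 0;
}

/* Move the gap to logical index pos.  The cost is proportional to the
   distance moved, which is why nearby edits are cheap. */
static void ga_move_gap(gap_array *ga, size_t pos)
{
    assert(pos <= ga->len);
    if (pos < ga->gap_start) {
        /* Slide elements [pos, gap_start) up to the far side of the gap. */
        memmove(ga->data + pos + ga->gap_len, ga->data + pos,
                (ga->gap_start - pos) * sizeof *ga->data);
    } else if (pos > ga->gap_start) {
        /* Slide elements [gap_start, pos) down to the near side of the gap. */
        memmove(ga->data + ga->gap_start,
                ga->data + ga->gap_start + ga->gap_len,
                (pos - ga->gap_start) * sizeof *ga->data);
    }
    ga->gap_start = pos;
}

/* Insert value before logical index pos; returns 0 on success, -1 on error. */
static int ga_insert(gap_array *ga, size_t pos, int value)
{
    if (pos > ga->len)
        return -1;
    if (ga->gap_len == 0) {
        /* Storage is full: double it and put the new free space at the end. */
        size_t old_cap = ga->len;
        size_t new_cap = old_cap * 2;
        int *p = realloc(ga->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        ga->data = p;
        ga->gap_start = old_cap;
        ga->gap_len = new_cap - old_cap;
    }
    ga_move_gap(ga, pos);
    ga->data[ga->gap_start++] = value;
    ga->gap_len--;
    ga->len++;
    return 0;
}

/* Delete the element at logical index pos; returns 0 on success, -1 on error. */
static int ga_delete(gap_array *ga, size_t pos)
{
    if (pos >= ga->len)
        return -1;
    ga_move_gap(ga, pos);  /* the gap now sits just before element pos */
    ga->gap_len++;         /* widen the gap over that element */
    ga->len--;
    return 0;
}

/* Fetch the element at logical index i, skipping over the gap. */
static int ga_get(const gap_array *ga, size_t i)
{
    assert(i < ga->len);
    return ga->data[i < ga->gap_start ? i : i + ga->gap_len];
}
```

In this sketch an insertion or deletion right next to the previous one leaves the gap where it is and touches a single slot, while an edit far away pays for one `memmove` over the intervening elements. A balanced tree would bound every operation at O(log N), at the price of noticeably more intricate code.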