comparison man/internals/internals.texi @ 377:d883f39b8495 r21-2b4
Import from CVS: tag r21-2b4
author | cvs |
---|---|
date | Mon, 13 Aug 2007 11:05:42 +0200 |
parents | 6240c7796c7a |
children | 8626e4521993 |
376:e2295b4d9f2e | 377:d883f39b8495 |
---|---|
2043 code generalization for future I18N work. | 2043 code generalization for future I18N work. |
2044 | 2044 |
2045 @menu | 2045 @menu |
2046 * Character-Related Data Types:: | 2046 * Character-Related Data Types:: |
2047 * Working With Character and Byte Positions:: | 2047 * Working With Character and Byte Positions:: |
2048 * Conversion of External Data:: | 2048 * Conversion to and from External Data:: |
2049 * General Guidelines for Writing Mule-Aware Code:: | 2049 * General Guidelines for Writing Mule-Aware Code:: |
2050 * An Example of Mule-Aware Code:: | 2050 * An Example of Mule-Aware Code:: |
2051 @end menu | 2051 @end menu |
2052 | 2052 |
2053 @node Character-Related Data Types | 2053 @node Character-Related Data Types |
2054 @subsection Character-Related Data Types | 2054 @subsection Character-Related Data Types |
2055 | 2055 |
2056 First, we will list the basic character-related datatypes used by | 2056 First, let's review the basic character-related datatypes used by |
2057 XEmacs. Note that the separate @code{typedef}s are not required for the | 2057 XEmacs. Note that the separate @code{typedef}s are not mandatory in the |
2058 code to work (all of them boil down to @code{unsigned char} or | 2058 current implementation (all of them boil down to @code{unsigned char} or |
2059 @code{int}), but they improve clarity of code a great deal, because one | 2059 @code{int}), but they improve clarity of code a great deal, because one |
2060 glance at the declaration can tell the intended use of the variable. | 2060 glance at the declaration can tell the intended use of the variable. |
2061 | 2061 |
2062 @table @code | 2062 @table @code |
2063 @item Emchar | 2063 @item Emchar |
2091 Without Mule support, a @code{Bufbyte} is equivalent to an | 2091 Without Mule support, a @code{Bufbyte} is equivalent to an |
2092 @code{Emchar}. | 2092 @code{Emchar}. |
2093 | 2093 |
2094 @item Bufpos | 2094 @item Bufpos |
2095 @itemx Charcount | 2095 @itemx Charcount |
2096 @cindex Bufpos | |
2097 @cindex Charcount | |
2096 A @code{Bufpos} represents a character position in a buffer or string. | 2098 A @code{Bufpos} represents a character position in a buffer or string. |
2097 A @code{Charcount} represents a number (count) of characters. | 2099 A @code{Charcount} represents a number (count) of characters. |
2098 Logically, subtracting two @code{Bufpos} values yields a | 2100 Logically, subtracting two @code{Bufpos} values yields a |
2099 @code{Charcount} value. Although all of these are @code{typedef}ed to | 2101 @code{Charcount} value. Although all of these are @code{typedef}ed to |
2100 @code{int}, we use them in preference to @code{int} to make it clear | 2102 @code{int}, we use them in preference to @code{int} to make it clear |
2103 @code{Bufpos} and @code{Charcount} values are the only ones that are | 2105 @code{Bufpos} and @code{Charcount} values are the only ones that are |
2104 ever visible to Lisp. | 2106 ever visible to Lisp. |
2105 | 2107 |
2106 @item Bytind | 2108 @item Bytind |
2107 @itemx Bytecount | 2109 @itemx Bytecount |
2110 @cindex Bytind | |
2111 @cindex Bytecount | |
2108 A @code{Bytind} represents a byte position in a buffer or string. A | 2112 A @code{Bytind} represents a byte position in a buffer or string. A |
2109 @code{Bytecount} represents the distance between two positions in bytes. | 2113 @code{Bytecount} represents the distance between two positions in bytes. |
2110 The relationship between @code{Bytind} and @code{Bytecount} is the same | 2114 The relationship between @code{Bytind} and @code{Bytecount} is the same |
2111 as the relationship between @code{Bufpos} and @code{Charcount}. | 2115 as the relationship between @code{Bufpos} and @code{Charcount}. |
2112 | 2116 |
2113 @item Extbyte | 2117 @item Extbyte |
2114 @itemx Extcount | 2118 @itemx Extcount |
2119 @cindex Extbyte | |
2120 @cindex Extcount | |
2115 When dealing with the outside world, XEmacs works with @code{Extbyte}s, | 2121 When dealing with the outside world, XEmacs works with @code{Extbyte}s, |
2116 which are equivalent to @code{unsigned char}. Obviously, an | 2122 which are equivalent to @code{unsigned char}. Obviously, an |
2117 @code{Extcount} is the distance between two @code{Extbyte}s. Extbytes | 2123 @code{Extcount} is the distance between two @code{Extbyte}s. Extbytes |
2118 and Extcounts are not all that frequent in XEmacs code. | 2124 and Extcounts are not all that frequent in XEmacs code. |
2119 @end table | 2125 @end table |
2128 most important ones. Examining the existing code is the best way to | 2134 most important ones. Examining the existing code is the best way to |
2129 learn about them. | 2135 learn about them. |
2130 | 2136 |
2131 @table @code | 2137 @table @code |
2132 @item MAX_EMCHAR_LEN | 2138 @item MAX_EMCHAR_LEN |
2139 @cindex MAX_EMCHAR_LEN | |
2133 This preprocessor constant is the maximum number of buffer bytes per | 2140 This preprocessor constant is the maximum number of buffer bytes per |
2134 Emacs character, i.e. the byte length of an @code{Emchar}. It is useful | 2141 Emacs character, i.e. the byte length of an @code{Emchar}. It is useful |
2135 when allocating temporary strings to keep a known number of characters. | 2142 when allocating temporary strings to keep a known number of characters. |
2136 For instance: | 2143 For instance: |
2137 | 2144 |
2153 | 2160 |
2154 In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4. | 2161 In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4. |
2155 Without Mule, it is 1. | 2162 Without Mule, it is 1. |
2156 | 2163 |
2157 @item charptr_emchar | 2164 @item charptr_emchar |
2158 @item set_charptr_emchar | 2165 @itemx set_charptr_emchar |
2159 @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and returns | 2166 @cindex charptr_emchar |
2160 the underlying @code{Emchar}. If it were a function, its prototype | 2167 @cindex set_charptr_emchar |
2161 would be: | 2168 The @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and |
2169 returns the @code{Emchar} stored at that position. If it were a | |
2170 function, its prototype would be: | |
2162 | 2171 |
2163 @example | 2172 @example |
2164 Emchar charptr_emchar (Bufbyte *p); | 2173 Emchar charptr_emchar (Bufbyte *p); |
2165 @end example | 2174 @end example |
2166 | 2175 |
2198 Note how @code{set_charptr_emchar} is used to store the @code{Emchar} | 2207 Note how @code{set_charptr_emchar} is used to store the @code{Emchar} |
2199 and increment the counter, at the same time. | 2208 and increment the counter, at the same time. |
2200 | 2209 |
2201 @item INC_CHARPTR | 2210 @item INC_CHARPTR |
2202 @itemx DEC_CHARPTR | 2211 @itemx DEC_CHARPTR |
2212 @cindex INC_CHARPTR | |
2213 @cindex DEC_CHARPTR | |
2203 These two macros increment and decrement a @code{Bufbyte} pointer, | 2214 These two macros increment and decrement a @code{Bufbyte} pointer, |
2204 respectively. The pointer needs to be correctly positioned at the | 2215 respectively. They will adjust the pointer by the appropriate number of |
2205 beginning of a valid character position. | 2216 bytes according to the byte length of the character stored there. Both |
2217 macros assume that the memory address is located at the beginning of a | |
2218 valid character. | |
2206 | 2219 |
2207 Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)} | 2220 Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)} |
2208 simply expand to @code{p++} and @code{p--}, respectively. | 2221 simply expand to @code{p++} and @code{p--}, respectively. |
2209 | 2222 |
2210 @item bytecount_to_charcount | 2223 @item bytecount_to_charcount |
2224 @cindex bytecount_to_charcount | |
2211 Given a pointer to a text string and a length in bytes, return the | 2225 Given a pointer to a text string and a length in bytes, return the |
2212 equivalent length in characters. | 2226 equivalent length in characters. |
2213 | 2227 |
2214 @example | 2228 @example |
2215 Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc); | 2229 Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc); |
2216 @end example | 2230 @end example |
2217 | 2231 |
2218 @item charcount_to_bytecount | 2232 @item charcount_to_bytecount |
2233 @cindex charcount_to_bytecount | |
2219 Given a pointer to a text string and a length in characters, return the | 2234 Given a pointer to a text string and a length in characters, return the |
2220 equivalent length in bytes. | 2235 equivalent length in bytes. |
2221 | 2236 |
2222 @example | 2237 @example |
2223 Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc); | 2238 Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc); |
2224 @end example | 2239 @end example |
2225 | 2240 |
2226 @item charptr_n_addr | 2241 @item charptr_n_addr |
2242 @cindex charptr_n_addr | |
2227 Return a pointer to the beginning of the character offset @var{cc} (in | 2243 Return a pointer to the beginning of the character offset @var{cc} (in |
2228 characters) from @var{p}. | 2244 characters) from @var{p}. |
2229 | 2245 |
2230 @example | 2246 @example |
2231 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc); | 2247 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc); |
2232 @end example | 2248 @end example |
2233 @end table | 2249 @end table |
2234 | 2250 |
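The byte/character helpers listed in the table above can be pictured with a small standalone sketch. It is not the XEmacs implementation: every `sketch_` name is hypothetical, and the rule used to recognize the first byte of a character (any byte below 0xA0) is an assumption made for illustration only. The text is also assumed to be zero-terminated, so that stepping over the last character stops at a valid first byte.

```c
#include <assert.h>

typedef unsigned char Bufbyte;  /* one byte of internal text */
typedef int Bytecount;          /* a length measured in bytes */
typedef int Charcount;          /* a length measured in characters */

/* ASSUMPTION (illustration only): a byte below 0xA0 starts a character,
   and bytes 0xA0..0xFF continue the current one. */
static int sketch_first_byte_p(Bufbyte b)
{
  return b < 0xA0;
}

/* Hypothetical stand-in for INC_CHARPTR: step over one whole character.
   P must point at the beginning of a character in zero-terminated text. */
static const Bufbyte *sketch_inc_charptr(const Bufbyte *p)
{
  do
    p++;
  while (!sketch_first_byte_p(*p));
  return p;
}

/* Hypothetical stand-in for bytecount_to_charcount: count the characters
   stored in the first BC bytes at P. */
static Charcount sketch_bytecount_to_charcount(const Bufbyte *p, Bytecount bc)
{
  const Bufbyte *end = p + bc;
  Charcount cc = 0;

  while (p < end)
    {
      p = sketch_inc_charptr(p);
      cc++;
    }
  return cc;
}

/* Hypothetical stand-in for charcount_to_bytecount: count the bytes
   occupied by the first CC characters at P. */
static Bytecount sketch_charcount_to_bytecount(const Bufbyte *p, Charcount cc)
{
  const Bufbyte *start = p;

  while (cc-- > 0)
    p = sketch_inc_charptr(p);
  return (Bytecount) (p - start);
}
```

The point of the sketch is that a character length can only be found by walking the text, which is why code should keep `Bytecount` and `Charcount` values strictly apart rather than treating both as plain `int`.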
2235 @node Conversion of External Data | 2251 @node Conversion to and from External Data |
2236 @subsection Conversion of External Data | 2252 @subsection Conversion to and from External Data |
2237 | 2253 |
2238 When an external function, such as a C library function, returns a | 2254 When an external function, such as a C library function, returns a |
2239 @code{char} pointer, you should never treat it as @code{Bufbyte}. This | 2255 @code{char} pointer, you should almost never treat it as @code{Bufbyte}. |
2240 is because these returned strings may contain 8bit characters which can | 2256 This is because these returned strings may contain 8bit characters which |
2241 be misinterpreted by XEmacs, and cause a crash. Instead, you should use | 2257 can be misinterpreted by XEmacs, and cause a crash. Likewise, when |
2242 a conversion macro. Many different conversion macros are defined in | 2258 exporting a piece of internal text to the outside world, you should |
2243 @file{buffer.h}, so I will try to order them logically, by direction and | 2259 always convert it to an appropriate external encoding, lest the internal |
2244 by format. | 2260 stuff (such as the infamous \201 characters) leak out. |
2245 | 2261 |
2246 Thus the basic conversion macros are @code{GET_CHARPTR_INT_DATA_ALLOCA} | 2262 The interface to conversion between the internal and external |
2247 and @code{GET_CHARPTR_EXT_DATA_ALLOCA}. The former is used to convert | 2263 representations of text consists of the numerous conversion macros defined in |
2248 external data to internal format, and the latter is used to convert the | 2264 @file{buffer.h}. Before looking at them, we'll look at the external |
2249 other way around. The arguments each of these receives are @var{ptr} | 2265 formats supported by these macros. |
2250 (pointer to the text in external format), @var{len} (length of texts in | 2266 |
2251 bytes), @var{fmt} (format of the external text), @var{ptr_out} (lvalue | 2267 Currently meaningful formats are @code{FORMAT_BINARY}, |
2252 to which new text should be copied), and @var{len_out} (lvalue which | 2268 @code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}. Here |
2253 will be assigned the length of the internal text in bytes). The | 2269 is a description of these. |
2254 resulting text is stored to a stack-allocated buffer. If the text | 2270 |
2255 doesn't need changing, these macros will do nothing, except for setting | 2271 @table @code |
2272 @item FORMAT_BINARY | |
2273 Binary format. This is the simplest format and is what we use in the | |
2274 absence of a more appropriate format. This converts according to the | |
2275 @code{binary} coding system: | |
2276 | |
2277 @enumerate a | |
2278 @item | |
2279 On input, bytes 0--255 are converted into characters 0--255. | |
2280 @item | |
2281 On output, characters 0--255 are converted into bytes 0--255 and other | |
2282 characters are converted into `X'. | |
2283 @end enumerate | |
2284 | |
2285 @item FORMAT_FILENAME | |
2286 Format used for filenames. In the original Mule, this is user-definable | |
2287 with the @code{pathname-coding-system} variable. For the moment, we | |
2288 just use the @code{binary} coding system. | |
2289 | |
2290 @item FORMAT_OS | |
2291 Format used for the external Unix environment---@code{argv[]}, stuff | |
2292 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc. | |
2293 | |
2294 Perhaps this should be the same as @code{FORMAT_FILENAME}. | |
2295 | |
2296 @item FORMAT_CTEXT | |
2297 Compound-text format. This is the standard X format used for data | |
2298 stored in properties, selections, and the like. This is an 8-bit | |
2299 no-lock-shift ISO2022 coding system. | |
2300 @end table | |
2301 | |
2302 The macros to convert between these formats and the internal format, and | |
2303 vice versa, follow. | |
2304 | |
2305 @table @code | |
2306 @item GET_CHARPTR_INT_DATA_ALLOCA | |
2307 @itemx GET_CHARPTR_EXT_DATA_ALLOCA | |
2308 These two are the most basic conversion macros. | |
2309 @code{GET_CHARPTR_INT_DATA_ALLOCA} converts external data to internal | |
2310 format, and @code{GET_CHARPTR_EXT_DATA_ALLOCA} converts the other way | |
2311 around. The arguments each of these receives are @var{ptr} (pointer to | |
2312 the text in external format), @var{len} (length of the text in bytes), | |
2313 @var{fmt} (format of the external text), @var{ptr_out} (lvalue to which | |
2314 new text should be copied), and @var{len_out} (lvalue which will be | |
2315 assigned the length of the internal text in bytes). The resulting text | |
2316 is stored to a stack-allocated buffer. If the text doesn't need | |
2317 changing, these macros will do nothing, except for setting | |
2256 @var{len_out}. | 2318 @var{len_out}. |
2257 | 2319 |
2258 Currently meaningful formats are @code{FORMAT_BINARY}, | 2320 The macros above take many arguments, which makes them unwieldy. For |
2259 @code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}. | 2321 this reason, a number of convenience macros are defined with obvious |
2260 | 2322 functionality, but accepting fewer arguments. The general rule is that |
2261 The two macros above take many arguments which makes them unwieldy. For | 2323 macros with @samp{INT} in their name convert text to internal Emacs |
2262 this reason, several convenience macros are defined with obvious | 2324 representation, whereas the @samp{EXT} macros convert to external |
2263 functionality, but accepting less arguments: | 2325 representation. |
2264 | 2326 |
2265 @table @code | 2327 @item GET_C_CHARPTR_INT_DATA_ALLOCA |
2266 @item GET_C_CHARPTR_EXT_DATA_ALLOCA | 2328 @itemx GET_C_CHARPTR_EXT_DATA_ALLOCA |
2267 @itemx GET_C_CHARPTR_INT_DATA_ALLOCA | 2329 As their names imply, these macros work on C char pointers, which are |
2268 These two macros work on ``C char pointers'', which are zero-terminated, | 2330 zero-terminated, and thus do not need @var{len} or @var{len_out} |
2269 and thus do not need @var{len} or @var{len_out} parameters. | 2331 parameters. |
2270 | 2332 |
2271 @item GET_STRING_EXT_DATA_ALLOCA | 2333 @item GET_STRING_EXT_DATA_ALLOCA |
2272 @itemx GET_C_STRING_EXT_DATA_ALLOCA | 2334 @itemx GET_C_STRING_EXT_DATA_ALLOCA |
2273 These two macros work on Lisp strings, thus also not needing a @var{len} | 2335 These two macros convert a Lisp string into an external representation. |
2274 parameter. However, @code{GET_STRING_EXT_DATA_ALLOCA} still provides a | 2336 The difference between them is that @code{GET_STRING_EXT_DATA_ALLOCA} |
2275 @var{len_out} parameter. Note that for Lisp strings only one conversion | 2337 stores its output to a generic string, providing @var{len_out}, the |
2276 direction makes sense. | 2338 length of the resulting external string. On the other hand, |
2339 @code{GET_C_STRING_EXT_DATA_ALLOCA} assumes that the caller will be | |
2340 satisfied with the output string being zero-terminated. | |
2341 | |
2342 Note that for Lisp strings only one conversion direction makes sense. | |
2277 | 2343 |
2278 @item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA | 2344 @item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA |
2345 @itemx GET_CHARPTR_EXT_BINARY_DATA_ALLOCA | |
2346 @itemx GET_STRING_BINARY_DATA_ALLOCA | |
2347 @itemx GET_C_STRING_BINARY_DATA_ALLOCA | |
2279 @itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA | 2348 @itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA |
2280 @itemx GET_C_CHARPTR_EXT_CTEXT_DATA_ALLOCA | |
2281 @itemx ... | 2349 @itemx ... |
2282 These macros are a combination of the above, but with the @var{fmt} | 2350 These macros convert internal text to a specific external |
2283 argument encoded into the name of the macro. | 2351 representation, with the external format being encoded into the name of |
2352 the macro. Note that the @code{GET_STRING_...} and | |
2353 @code{GET_C_STRING_...} macros lack the @samp{EXT} tag, because they | |
2354 only make sense in that direction. | |
2355 | |
2356 @item GET_C_CHARPTR_INT_BINARY_DATA_ALLOCA | |
2357 @itemx GET_CHARPTR_INT_BINARY_DATA_ALLOCA | |
2358 @itemx GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA | |
2359 @itemx ... | |
2360 These macros convert external text of a specific format to its internal | |
2361 representation, with the external format being encoded into the name of | |
2362 the macro. | |
2284 @end table | 2363 @end table |
2285 | 2364 |
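The `FORMAT_BINARY` rule described above (bytes 0--255 map to characters 0--255 on input; on output, characters outside that range become `X`) can be sketched as follows. This is a simplified illustration, not the code behind the conversion macros: the `sketch_` functions are hypothetical, they work on arrays of character codes rather than on the internal byte representation, and the caller supplies the output buffer instead of relying on stack allocation.

```c
#include <assert.h>

typedef int Emchar;             /* a character code */
typedef unsigned char Extbyte;  /* one byte of external data */
typedef int Extcount;           /* a length of external data, in bytes */
typedef int Charcount;          /* a length measured in characters */

/* Input direction: each external byte 0..255 becomes the character
   with the same code.  OUT must have room for LEN characters. */
static void sketch_binary_decode(const Extbyte *ext, Extcount len, Emchar *out)
{
  Extcount i;

  for (i = 0; i < len; i++)
    out[i] = (Emchar) ext[i];
}

/* Output direction: characters 0..255 become the byte with the same
   value; every other character is replaced by 'X'.  OUT must have room
   for LEN bytes. */
static void sketch_binary_encode(const Emchar *chars, Charcount len,
                                 Extbyte *out)
{
  Charcount i;

  for (i = 0; i < len; i++)
    out[i] = (chars[i] >= 0 && chars[i] <= 255)
      ? (Extbyte) chars[i]
      : (Extbyte) 'X';
}
```

Note that the output direction loses information for characters above 255, which is one reason to prefer a more specific format whenever one applies.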
2286 @node General Guidelines for Writing Mule-Aware Code | 2365 @node General Guidelines for Writing Mule-Aware Code |
2287 @subsection General Guidelines for Writing Mule-Aware Code | 2366 @subsection General Guidelines for Writing Mule-Aware Code |
2288 | 2367 |