comparison man/internals/internals.texi @ 377:d883f39b8495 r21-2b4

Import from CVS: tag r21-2b4
author cvs
date Mon, 13 Aug 2007 11:05:42 +0200
parents 6240c7796c7a
children 8626e4521993
376:e2295b4d9f2e 377:d883f39b8495
2043 code generalization for future I18N work. 2043 code generalization for future I18N work.
2044 2044
2045 @menu 2045 @menu
2046 * Character-Related Data Types:: 2046 * Character-Related Data Types::
2047 * Working With Character and Byte Positions:: 2047 * Working With Character and Byte Positions::
2048 * Conversion of External Data:: 2048 * Conversion to and from External Data::
2049 * General Guidelines for Writing Mule-Aware Code:: 2049 * General Guidelines for Writing Mule-Aware Code::
2050 * An Example of Mule-Aware Code:: 2050 * An Example of Mule-Aware Code::
2051 @end menu 2051 @end menu
2052 2052
2053 @node Character-Related Data Types 2053 @node Character-Related Data Types
2054 @subsection Character-Related Data Types 2054 @subsection Character-Related Data Types
2055 2055
2056 First, we will list the basic character-related datatypes used by 2056 First, let's review the basic character-related datatypes used by
2057 XEmacs. Note that the separate @code{typedef}s are not required for the 2057 XEmacs. Note that the separate @code{typedef}s are not mandatory in the
2058 code to work (all of them boil down to @code{unsigned char} or 2058 current implementation (all of them boil down to @code{unsigned char} or
2059 @code{int}), but they improve clarity of code a great deal, because one 2059 @code{int}), but they improve clarity of code a great deal, because one
2060 glance at the declaration can tell the intended use of the variable. 2060 glance at the declaration can tell the intended use of the variable.
2061 2061
2062 @table @code 2062 @table @code
2063 @item Emchar 2063 @item Emchar
2091 Without Mule support, a @code{Bufbyte} is equivalent to an 2091 Without Mule support, a @code{Bufbyte} is equivalent to an
2092 @code{Emchar}. 2092 @code{Emchar}.
2093 2093
2094 @item Bufpos 2094 @item Bufpos
2095 @itemx Charcount 2095 @itemx Charcount
2096 @cindex Bufpos
2097 @cindex Charcount
2096 A @code{Bufpos} represents a character position in a buffer or string. 2098 A @code{Bufpos} represents a character position in a buffer or string.
2097 A @code{Charcount} represents a number (count) of characters. 2099 A @code{Charcount} represents a number (count) of characters.
2098 Logically, subtracting two @code{Bufpos} values yields a 2100 Logically, subtracting two @code{Bufpos} values yields a
2099 @code{Charcount} value. Although all of these are @code{typedef}ed to 2101 @code{Charcount} value. Although all of these are @code{typedef}ed to
2100 @code{int}, we use them in preference to @code{int} to make it clear 2102 @code{int}, we use them in preference to @code{int} to make it clear
2103 @code{Bufpos} and @code{Charcount} values are the only ones that are 2105 @code{Bufpos} and @code{Charcount} values are the only ones that are
2104 ever visible to Lisp. 2106 ever visible to Lisp.
2105 2107
2106 @item Bytind 2108 @item Bytind
2107 @itemx Bytecount 2109 @itemx Bytecount
2110 @cindex Bytind
2111 @cindex Bytecount
2108 A @code{Bytind} represents a byte position in a buffer or string. A 2112 A @code{Bytind} represents a byte position in a buffer or string. A
2109 @code{Bytecount} represents the distance between two positions in bytes. 2113 @code{Bytecount} represents the distance between two positions in bytes.
2110 The relationship between @code{Bytind} and @code{Bytecount} is the same 2114 The relationship between @code{Bytind} and @code{Bytecount} is the same
2111 as the relationship between @code{Bufpos} and @code{Charcount}. 2115 as the relationship between @code{Bufpos} and @code{Charcount}.
2112 2116
2113 @item Extbyte 2117 @item Extbyte
2114 @itemx Extcount 2118 @itemx Extcount
2119 @cindex Extbyte
2120 @cindex Extcount
2115 When dealing with the outside world, XEmacs works with @code{Extbyte}s, 2121 When dealing with the outside world, XEmacs works with @code{Extbyte}s,
2116 which are equivalent to @code{unsigned char}. Obviously, an 2122 which are equivalent to @code{unsigned char}. Obviously, an
2117 @code{Extcount} is the distance between two @code{Extbyte}s. Extbytes 2123 @code{Extcount} is the distance between two @code{Extbyte}s. Extbytes
2118 and Extcounts are not all that frequent in XEmacs code. 2124 and Extcounts are not all that frequent in XEmacs code.
2119 @end table 2125 @end table
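The relationships between the position and count types in the table above can be sketched in a few lines of C. This is only an illustration: the @code{typedef}s below are assumptions that follow the statement that everything boils down to @code{unsigned char} or @code{int} (the real definitions live in the XEmacs headers), and @code{chars_between} and @code{bytes_between} are hypothetical helpers, not XEmacs functions.

```c
/* ASSUMPTION: illustrative stand-ins for the XEmacs typedefs. */
typedef int Emchar;             /* one character */
typedef unsigned char Bufbyte;  /* one byte of internal text */
typedef int Bufpos;             /* character position */
typedef int Charcount;          /* number of characters */
typedef int Bytind;             /* byte position */
typedef int Bytecount;          /* number of bytes */
typedef unsigned char Extbyte;  /* one byte of external text */
typedef int Extcount;           /* number of external bytes */

/* Hypothetical helper: subtracting two Bufpos values gives a Charcount. */
Charcount
chars_between (Bufpos from, Bufpos to)
{
  return to - from;
}

/* Hypothetical helper: the same relationship holds for Bytind and
   Bytecount. */
Bytecount
bytes_between (Bytind from, Bytind to)
{
  return to - from;
}
```

The compiler treats all the integer types alike, so the benefit is purely documentary: the signature of @code{chars_between} says which unit each value is measured in.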
2128 most important ones. Examining the existing code is the best way to 2134 most important ones. Examining the existing code is the best way to
2129 learn about them. 2135 learn about them.
2130 2136
2131 @table @code 2137 @table @code
2132 @item MAX_EMCHAR_LEN 2138 @item MAX_EMCHAR_LEN
2139 @cindex MAX_EMCHAR_LEN
2133 This preprocessor constant is the maximum number of buffer bytes per 2140 This preprocessor constant is the maximum number of buffer bytes per
2134 Emacs character, i.e. the byte length of an @code{Emchar}. It is useful 2141 Emacs character, i.e. the byte length of an @code{Emchar}. It is useful
2135 when allocating temporary strings to keep a known number of characters. 2142 when allocating temporary strings to keep a known number of characters.
2136 For instance: 2143 For instance:
2137 2144
2153 2160
2154 In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4. 2161 In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4.
2155 Without Mule, it is 1. 2162 Without Mule, it is 1.
2156 2163
2157 @item charptr_emchar 2164 @item charptr_emchar
2158 @item set_charptr_emchar 2165 @itemx set_charptr_emchar
2159 @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and returns 2166 @cindex charptr_emchar
2160 the underlying @code{Emchar}. If it were a function, its prototype 2167 @cindex set_charptr_emchar
2161 would be: 2168 The @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and
2169 returns the @code{Emchar} stored at that position. If it were a
2170 function, its prototype would be:
2162 2171
2163 @example 2172 @example
2164 Emchar charptr_emchar (Bufbyte *p); 2173 Emchar charptr_emchar (Bufbyte *p);
2165 @end example 2174 @end example
2166 2175
2198 Note how @code{set_charptr_emchar} is used to store the @code{Emchar} 2207 Note how @code{set_charptr_emchar} is used to store the @code{Emchar}
2199 and increment the counter, at the same time. 2208 and increment the counter, at the same time.
2200 2209
2201 @item INC_CHARPTR 2210 @item INC_CHARPTR
2202 @itemx DEC_CHARPTR 2211 @itemx DEC_CHARPTR
2212 @cindex INC_CHARPTR
2213 @cindex DEC_CHARPTR
2203 These two macros increment and decrement a @code{Bufbyte} pointer, 2214 These two macros increment and decrement a @code{Bufbyte} pointer,
2204 respectively. The pointer needs to be correctly positioned at the 2215 respectively. They will adjust the pointer by the appropriate number of
2205 beginning of a valid character position. 2216 bytes according to the byte length of the character stored there. Both
2217 macros assume that the memory address is located at the beginning of a
2218 valid character.
2206 2219
2207 Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)} 2220 Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)}
2208 simply expand to @code{p++} and @code{p--}, respectively. 2221 simply expand to @code{p++} and @code{p--}, respectively.
2209 2222
2210 @item bytecount_to_charcount 2223 @item bytecount_to_charcount
2224 @cindex bytecount_to_charcount
2211 Given a pointer to a text string and a length in bytes, return the 2225 Given a pointer to a text string and a length in bytes, return the
2212 equivalent length in characters. 2226 equivalent length in characters.
2213 2227
2214 @example 2228 @example
2215 Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc); 2229 Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc);
2216 @end example 2230 @end example
2217 2231
2218 @item charcount_to_bytecount 2232 @item charcount_to_bytecount
2233 @cindex charcount_to_bytecount
2219 Given a pointer to a text string and a length in characters, return the 2234 Given a pointer to a text string and a length in characters, return the
2220 equivalent length in bytes. 2235 equivalent length in bytes.
2221 2236
2222 @example 2237 @example
2223 Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc); 2238 Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc);
2224 @end example 2239 @end example
2225 2240
2226 @item charptr_n_addr 2241 @item charptr_n_addr
2242 @cindex charptr_n_addr
2227 Return a pointer to the beginning of the character offset @var{cc} (in 2243 Return a pointer to the beginning of the character offset @var{cc} (in
2228 characters) from @var{p}. 2244 characters) from @var{p}.
2229 2245
2230 @example 2246 @example
2231 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc); 2247 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
2232 @end example 2248 @end example
2233 @end table 2249 @end table
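To make the semantics of the macros in this table concrete, here is a minimal self-contained sketch. It does not use the XEmacs headers. Instead it models variable-width text with UTF-8 as a stand-in for the Mule internal encoding (the real encoding differs), and every @code{toy_} name is hypothetical. The typedefs are illustrative assumptions as well.

```c
#include <assert.h>

/* ASSUMPTION: illustrative stand-ins for the XEmacs typedefs. */
typedef unsigned char Bufbyte;
typedef int Bytecount;
typedef int Charcount;

/* Byte length of the character starting at P.  ASSUMPTION: the text is
   valid UTF-8, used here only as a stand-in for the Mule encoding. */
Bytecount
toy_char_len (const Bufbyte *p)
{
  if (*p < 0x80)
    return 1;                       /* ASCII */
  if ((*p & 0xE0) == 0xC0)
    return 2;
  if ((*p & 0xF0) == 0xE0)
    return 3;
  assert ((*p & 0xF8) == 0xF0);     /* P must be at a character start */
  return 4;
}

/* In the spirit of INC_CHARPTR: advance past one whole character. */
const Bufbyte *
toy_inc_charptr (const Bufbyte *p)
{
  return p + toy_char_len (p);
}

/* In the spirit of DEC_CHARPTR: step back to the start of the previous
   character by skipping continuation bytes (10xxxxxx). */
const Bufbyte *
toy_dec_charptr (const Bufbyte *p)
{
  do
    p--;
  while ((*p & 0xC0) == 0x80);
  return p;
}

/* Count the characters stored in the BC bytes starting at P. */
Charcount
toy_bytecount_to_charcount (const Bufbyte *p, Bytecount bc)
{
  const Bufbyte *end = p + bc;
  Charcount cc = 0;

  while (p < end)
    {
      p = toy_inc_charptr (p);
      cc++;
    }
  return cc;
}

/* Count the bytes occupied by the CC characters starting at P. */
Bytecount
toy_charcount_to_bytecount (const Bufbyte *p, Charcount cc)
{
  const Bufbyte *start = p;

  while (cc-- > 0)
    p = toy_inc_charptr (p);
  return (Bytecount) (p - start);
}

/* Address of the character CC characters after P. */
const Bufbyte *
toy_charptr_n_addr (const Bufbyte *p, Charcount cc)
{
  return p + toy_charcount_to_bytecount (p, cc);
}
```

In this model both conversions walk the text one character at a time, which is why code should not assume that a character offset and a byte offset are interchangeable once multi-byte characters are present.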
2234 2250
2235 @node Conversion of External Data 2251 @node Conversion to and from External Data
2236 @subsection Conversion of External Data 2252 @subsection Conversion to and from External Data
2237 2253
2238 When an external function, such as a C library function, returns a 2254 When an external function, such as a C library function, returns a
2239 @code{char} pointer, you should never treat it as @code{Bufbyte}. This 2255 @code{char} pointer, you should almost never treat it as @code{Bufbyte}.
2240 is because these returned strings may contain 8bit characters which can 2256 This is because these returned strings may contain 8-bit characters which
2241 be misinterpreted by XEmacs, and cause a crash. Instead, you should use 2257 can be misinterpreted by XEmacs, and cause a crash. Likewise, when
2242 a conversion macro. Many different conversion macros are defined in 2258 exporting a piece of internal text to the outside world, you should
2243 @file{buffer.h}, so I will try to order them logically, by direction and 2259 always convert it to an appropriate external encoding, lest the internal
2244 by format. 2260 stuff (such as the infamous \201 characters) leak out.
2245 2261
2246 Thus the basic conversion macros are @code{GET_CHARPTR_INT_DATA_ALLOCA} 2262 The interface to conversion between the internal and external
2247 and @code{GET_CHARPTR_EXT_DATA_ALLOCA}. The former is used to convert 2263 representations of text consists of the numerous conversion macros defined in
2248 external data to internal format, and the latter is used to convert the 2264 @file{buffer.h}. Before looking at them, we'll look at the external
2249 other way around. The arguments each of these receives are @var{ptr} 2265 formats supported by these macros.
2250 (pointer to the text in external format), @var{len} (length of texts in 2266
2251 bytes), @var{fmt} (format of the external text), @var{ptr_out} (lvalue 2267 Currently meaningful formats are @code{FORMAT_BINARY},
2252 to which new text should be copied), and @var{len_out} (lvalue which 2268 @code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}. Here
2253 will be assigned the length of the internal text in bytes). The 2269 is a description of these.
2254 resulting text is stored to a stack-allocated buffer. If the text 2270
2255 doesn't need changing, these macros will do nothing, except for setting 2271 @table @code
2272 @item FORMAT_BINARY
2273 Binary format. This is the simplest format and is what we use in the
2274 absence of a more appropriate format. This converts according to the
2275 @code{binary} coding system:
2276
2277 @enumerate a
2278 @item
2279 On input, bytes 0--255 are converted into characters 0--255.
2280 @item
2281 On output, characters 0--255 are converted into bytes 0--255 and other
2282 characters are converted into `X'.
2283 @end enumerate
2284
2285 @item FORMAT_FILENAME
2286 Format used for filenames. In the original Mule, this is user-definable
2287 with the @code{pathname-coding-system} variable. For the moment, we
2288 just use the @code{binary} coding system.
2289
2290 @item FORMAT_OS
2291 Format used for the external Unix environment---@code{argv[]}, stuff
2292 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
2293
2294 Perhaps this should be the same as @code{FORMAT_FILENAME}.
2295
2296 @item FORMAT_CTEXT
2297 Compound-text format. This is the standard X format used for data
2298 stored in properties, selections, and the like. This is an 8-bit
2299 no-lock-shift ISO2022 coding system.
2300 @end table
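The @code{FORMAT_BINARY} rules described in the table above are simple enough to sketch directly. The following is a minimal illustration, not the XEmacs implementation: it works on an array of character codes rather than on the real internal byte representation, the typedefs are assumptions, and all @code{toy_} names are hypothetical.

```c
#include <stddef.h>

/* ASSUMPTION: illustrative stand-ins for the XEmacs typedefs. */
typedef int Emchar;
typedef unsigned char Extbyte;

/* Input direction: bytes 0-255 become characters 0-255. */
Emchar
toy_binary_decode_byte (Extbyte b)
{
  return (Emchar) b;
}

/* Output direction: characters 0-255 become bytes 0-255; any other
   character is replaced with 'X'. */
Extbyte
toy_binary_encode_char (Emchar ch)
{
  return (ch >= 0 && ch <= 255) ? (Extbyte) ch : (Extbyte) 'X';
}

/* Encode N characters from SRC into DST.  The caller must provide room
   for at least N bytes in DST. */
void
toy_binary_encode (const Emchar *src, size_t n, Extbyte *dst)
{
  size_t i;

  for (i = 0; i < n; i++)
    dst[i] = toy_binary_encode_char (src[i]);
}
```

Note that the output direction loses information: every character outside the 0 to 255 range collapses to the same byte, so a round trip through this format is only safe for text that stays within that range.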
2301
2302 The macros to convert between these formats and the internal format, and
2303 vice versa, follow.
2304
2305 @table @code
2306 @item GET_CHARPTR_INT_DATA_ALLOCA
2307 @itemx GET_CHARPTR_EXT_DATA_ALLOCA
2308 These two are the most basic conversion macros.
2309 @code{GET_CHARPTR_INT_DATA_ALLOCA} converts external data to internal
2310 format, and @code{GET_CHARPTR_EXT_DATA_ALLOCA} converts the other way
2311 around. The arguments each of these receives are @var{ptr} (pointer to
2312 the text in external format), @var{len} (length of the text in bytes),
2313 @var{fmt} (format of the external text), @var{ptr_out} (lvalue to which
2314 new text should be copied), and @var{len_out} (lvalue which will be
2315 assigned the length of the internal text in bytes). The resulting text
2316 is stored to a stack-allocated buffer. If the text doesn't need
2317 changing, these macros will do nothing, except for setting
2256 @var{len_out}. 2318 @var{len_out}.
2257 2319
2258 Currently meaningful formats are @code{FORMAT_BINARY}, 2320 The macros above take many arguments which makes them unwieldy. For
2259 @code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}. 2321 this reason, a number of convenience macros are defined with obvious
2260 2322 functionality, but accepting fewer arguments. The general rule is that
2261 The two macros above take many arguments which makes them unwieldy. For 2323 macros with @samp{INT} in their name convert text to internal Emacs
2262 this reason, several convenience macros are defined with obvious 2324 representation, whereas the @samp{EXT} macros convert to external
2263 functionality, but accepting less arguments: 2325 representation.
2264 2326
2265 @table @code 2327 @item GET_C_CHARPTR_INT_DATA_ALLOCA
2266 @item GET_C_CHARPTR_EXT_DATA_ALLOCA 2328 @itemx GET_C_CHARPTR_EXT_DATA_ALLOCA
2267 @itemx GET_C_CHARPTR_INT_DATA_ALLOCA 2329 As their names imply, these macros work on C char pointers, which are
2268 These two macros work on ``C char pointers'', which are zero-terminated, 2330 zero-terminated, and thus do not need @var{len} or @var{len_out}
2269 and thus do not need @var{len} or @var{len_out} parameters. 2331 parameters.
2270 2332
2271 @item GET_STRING_EXT_DATA_ALLOCA 2333 @item GET_STRING_EXT_DATA_ALLOCA
2272 @itemx GET_C_STRING_EXT_DATA_ALLOCA 2334 @itemx GET_C_STRING_EXT_DATA_ALLOCA
2273 These two macros work on Lisp strings, thus also not needing a @var{len} 2335 These two macros convert a Lisp string into an external representation.
2274 parameter. However, @code{GET_STRING_EXT_DATA_ALLOCA} still provides a 2336 The difference between them is that @code{GET_STRING_EXT_DATA_ALLOCA}
2275 @var{len_out} parameter. Note that for Lisp strings only one conversion 2337 stores its output to a generic string, providing @var{len_out}, the
2276 direction makes sense. 2338 length of the resulting external string. On the other hand,
2339 @code{GET_C_STRING_EXT_DATA_ALLOCA} assumes that the caller will be
2340 satisfied with the output string being zero-terminated.
2341
2342 Note that for Lisp strings only one conversion direction makes sense.
2277 2343
2278 @item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA 2344 @item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA
2345 @itemx GET_CHARPTR_EXT_BINARY_DATA_ALLOCA
2346 @itemx GET_STRING_BINARY_DATA_ALLOCA
2347 @itemx GET_C_STRING_BINARY_DATA_ALLOCA
2279 @itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA 2348 @itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA
2280 @itemx GET_C_CHARPTR_EXT_CTEXT_DATA_ALLOCA
2281 @itemx ... 2349 @itemx ...
2282 These macros are a combination of the above, but with the @var{fmt} 2350 These macros convert internal text to a specific external
2283 argument encoded into the name of the macro. 2351 representation, with the external format being encoded into the name of
2352 the macro. Note that the @code{GET_STRING_...} and
2353 @code{GET_C_STRING...} macros lack the @samp{EXT} tag, because they
2354 only make sense in that direction.
2355
2356 @item GET_C_CHARPTR_INT_BINARY_DATA_ALLOCA
2357 @itemx GET_CHARPTR_INT_BINARY_DATA_ALLOCA
2358 @itemx GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA
2359 @itemx ...
2360 These macros convert external text of a specific format to its internal
2361 representation, with the external format being encoded into the name of
2362 the macro.
2284 @end table 2363 @end table
2285 2364
2286 @node General Guidelines for Writing Mule-Aware Code 2365 @node General Guidelines for Writing Mule-Aware Code
2287 @subsection General Guidelines for Writing Mule-Aware Code 2366 @subsection General Guidelines for Writing Mule-Aware Code
2288 2367