comparison man/internals/internals.texi @ 373:6240c7796c7a r21-2b2

Import from CVS: tag r21-2b2
author cvs
date Mon, 13 Aug 2007 11:04:06 +0200
parents cc15677e0335
children d883f39b8495
372:49e1ed2d7ed8 373:6240c7796c7a
596 @itemize @bullet 596 @itemize @bullet
597 @item 597 @item
598 version 20.1 released September 17, 1997. 598 version 20.1 released September 17, 1997.
599 @item 599 @item
600 version 20.2 released September 20, 1997. 600 version 20.2 released September 20, 1997.
601 @item
602 version 20.3 released August 19, 1998.
601 @end itemize 603 @end itemize
602 604
603 @node XEmacs 605 @node XEmacs
604 @section XEmacs 606 @section XEmacs
605 @cindex XEmacs 607 @cindex XEmacs
1652 1654
1653 @menu 1655 @menu
1654 * General Coding Rules:: 1656 * General Coding Rules::
1655 * Writing Lisp Primitives:: 1657 * Writing Lisp Primitives::
1656 * Adding Global Lisp Variables:: 1658 * Adding Global Lisp Variables::
1659 * Coding for Mule::
1657 * Techniques for XEmacs Developers:: 1660 * Techniques for XEmacs Developers::
1658 @end menu 1661 @end menu
1659 1662
1660 @node General Coding Rules 1663 @node General Coding Rules
1661 @section General Coding Rules 1664 @section General Coding Rules
1752 1755
1753 while (!NILP (args)) 1756 while (!NILP (args))
1754 @{ 1757 @{
1755 val = Feval (XCAR (args)); 1758 val = Feval (XCAR (args));
1756 if (!NILP (val)) 1759 if (!NILP (val))
1757 break; 1760 break;
1758 args = XCDR (args); 1761 args = XCDR (args);
1759 @} 1762 @}
1760 1763
1761 UNGCPRO; 1764 UNGCPRO;
1762 return val; 1765 return val;
2021 C variable in the @code{vars_of_*()} function. Otherwise, the 2024 C variable in the @code{vars_of_*()} function. Otherwise, the
2022 garbage-collection mechanism won't know that the object in this variable 2025 garbage-collection mechanism won't know that the object in this variable
2023 is in use, and will happily collect it and reuse its storage for another 2026 is in use, and will happily collect it and reuse its storage for another
2024 Lisp object, and you will be the one who's unhappy when you can't figure 2027 Lisp object, and you will be the one who's unhappy when you can't figure
2025 out how your variable got overwritten. 2028 out how your variable got overwritten.
2029
2030 @node Coding for Mule
2031 @section Coding for Mule
2032 @cindex Coding for Mule
2033
2034 Although Mule support is not compiled by default in XEmacs, many people
2035 are using it, and we consider it crucial that new code works correctly
2036 with multibyte characters. This is not hard; it is only a matter of
2037 following several simple user-interface guidelines. Even if you never
2038 compile with Mule, with a little practice you will find it quite easy
2039 to code Mule-correctly.
2040
2041 Note that these guidelines are not necessarily tied to the current Mule
2042 implementation; they are also a good idea to follow on the grounds of
2043 code generalization for future I18N work.
2044
2045 @menu
2046 * Character-Related Data Types::
2047 * Working With Character and Byte Positions::
2048 * Conversion of External Data::
2049 * General Guidelines for Writing Mule-Aware Code::
2050 * An Example of Mule-Aware Code::
2051 @end menu
2052
2053 @node Character-Related Data Types
2054 @subsection Character-Related Data Types
2055
2056 First, we will list the basic character-related datatypes used by
2057 XEmacs. Note that the separate @code{typedef}s are not required for the
2058 code to work (all of them boil down to @code{unsigned char} or
2059 @code{int}), but they improve clarity of code a great deal, because one
2060 glance at the declaration can tell the intended use of the variable.
2061
2062 @table @code
2063 @item Emchar
2064 @cindex Emchar
2065 An @code{Emchar} holds a single Emacs character.
2066
2067 Obviously, the equality between characters and bytes is lost in the Mule
2068 world. Characters can be represented by one or more bytes in the
2069 buffer, and @code{Emchar} is the C type large enough to hold any
2070 character.
2071
2072 Without Mule support, an @code{Emchar} is equivalent to an
2073 @code{unsigned char}.
2074
2075 @item Bufbyte
2076 @cindex Bufbyte
2077 The data representing the text in a buffer or string is logically a set
2078 of @code{Bufbyte}s.
2079
2080 XEmacs does not work with external character formats directly; when
2081 reading characters from the outside, it decodes them to an internal
2082 format, and likewise encodes them when writing. @code{Bufbyte} (in fact
2083 @code{unsigned char}) is the basic unit of the internal format of XEmacs
2084 buffers and strings.
2085
2086 One character can correspond to one or more @code{Bufbyte}s. In the
2087 current implementation, an ASCII character is represented by the same
2088 @code{Bufbyte}, and extended characters are represented by a sequence of
2089 @code{Bufbyte}s.
2090
2091 Without Mule support, a @code{Bufbyte} is equivalent to an
2092 @code{Emchar}.
2093
2094 @item Bufpos
2095 @itemx Charcount
2096 A @code{Bufpos} represents a character position in a buffer or string.
2097 A @code{Charcount} represents a number (count) of characters.
2098 Logically, subtracting two @code{Bufpos} values yields a
2099 @code{Charcount} value. Although all of these are @code{typedef}ed to
2100 @code{int}, we use them in preference to @code{int} to make it clear
2101 what sort of position is being used.
2102
2103 @code{Bufpos} and @code{Charcount} values are the only ones that are
2104 ever visible to Lisp.
2105
2106 @item Bytind
2107 @itemx Bytecount
2108 A @code{Bytind} represents a byte position in a buffer or string. A
2109 @code{Bytecount} represents the distance between two positions in bytes.
2110 The relationship between @code{Bytind} and @code{Bytecount} is the same
2111 as the relationship between @code{Bufpos} and @code{Charcount}.
2112
2113 @item Extbyte
2114 @itemx Extcount
2115 When dealing with the outside world, XEmacs works with @code{Extbyte}s,
2116 which are equivalent to @code{unsigned char}. Obviously, an
2117 @code{Extcount} is the distance between two @code{Extbyte}s. Extbytes
2118 and Extcounts are not all that frequent in XEmacs code.
2119 @end table
2120
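The relationships between these types can be sketched in plain C. The typedefs below are stand-ins that follow the description above (assuming a Mule build, where an @code{Emchar} is an @code{int}); the real definitions live in the XEmacs headers, and the helper names @code{chars_between} and @code{bytes_between} are hypothetical, used only for illustration.

```c
#include <assert.h>

/* Stand-in typedefs mirroring the names described above. */
typedef int Emchar;            /* one Emacs character */
typedef unsigned char Bufbyte; /* one byte of internal-format text */
typedef int Bufpos;            /* character position */
typedef int Charcount;         /* number of characters */
typedef int Bytind;            /* byte position */
typedef int Bytecount;         /* number of bytes */

/* Hypothetical helper: subtracting two character positions
   yields a character count. */
static Charcount
chars_between (Bufpos from, Bufpos to)
{
  assert (from <= to);
  return to - from;
}

/* Hypothetical helper: subtracting two byte positions
   yields a byte count. */
static Bytecount
bytes_between (Bytind from, Bytind to)
{
  assert (from <= to);
  return to - from;
}
```

Because all of these names are plain @code{int}s, the compiler will not catch a @code{Bytecount} passed where a @code{Charcount} is expected; the typedefs document intent, and keeping them straight is up to the programmer.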
2121 @node Working With Character and Byte Positions
2122 @subsection Working With Character and Byte Positions
2123
2124 Now that we have defined the basic character-related types, we can look
2125 at the macros and functions designed for work with them and for
2126 conversion between them. Most of these macros are defined in
2127 @file{buffer.h}, and we don't discuss all of them here, but only the
2128 most important ones. Examining the existing code is the best way to
2129 learn about them.
2130
2131 @table @code
2132 @item MAX_EMCHAR_LEN
2133 This preprocessor constant is the maximum number of buffer bytes per
2134 Emacs character, i.e. the largest byte length an encoded @code{Emchar}
2135 can have. It is useful when allocating temporary strings that must hold
2136 a known number of characters. For instance:
2137
2138 @example
2139 @group
2140 @{
2141 Charcount cclen;
2142 ...
2143 @{
2144 /* Allocate space for @var{cclen} characters. */
2145 Bufbyte *tmp_buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN);
2146 ...
2147 @end group
2148 @end example
2149
2150 If you followed the previous section, you can guess that, logically,
2151 multiplying a @code{Charcount} value with @code{MAX_EMCHAR_LEN} produces
2152 a @code{Bytecount} value.
2153
2154 In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4.
2155 Without Mule, it is 1.
2156
2157 @item charptr_emchar
2158 @itemx set_charptr_emchar
2159 The @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and returns
2160 the underlying @code{Emchar}. If it were a function, its prototype
2161 would be:
2162
2163 @example
2164 Emchar charptr_emchar (Bufbyte *p);
2165 @end example
2166
2167 @code{set_charptr_emchar} stores an @code{Emchar} to the specified byte
2168 position. It returns the number of bytes stored:
2169
2170 @example
2171 Bytecount set_charptr_emchar (Bufbyte *p, Emchar c);
2172 @end example
2173
2174 It is important to note that @code{set_charptr_emchar} is safe only for
2175 appending a character at the end of a buffer, not for overwriting a
2176 character in the middle. This is because the width of characters
2177 varies, and @code{set_charptr_emchar} cannot resize the string if it
2178 writes, say, a two-byte character where a single-byte character used to
2179 reside.
2180
2181 A typical use of @code{set_charptr_emchar} can be demonstrated by this
2182 example, which copies characters from buffer @var{buf} to a temporary
2183 string of Bufbytes.
2184
2185 @example
2186 @group
2187 @{
2188 Bufpos pos;
2189 for (pos = beg; pos < end; pos++)
2190 @{
2191 Emchar c = BUF_FETCH_CHAR (buf, pos);
2192 p += set_charptr_emchar (p, c);
2193 @}
2194 @}
2195 @end group
2196 @end example
2197
2198 Note how @code{set_charptr_emchar} is used to store the @code{Emchar}
2199 and advance the pointer @code{p} at the same time.
2200
2201 @item INC_CHARPTR
2202 @itemx DEC_CHARPTR
2203 These two macros increment and decrement a @code{Bufbyte} pointer,
2204 respectively. The pointer needs to be correctly positioned at the
2205 beginning of a valid character position.
2206
2207 Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)}
2208 simply expand to @code{p++} and @code{p--}, respectively.
2209
2210 @item bytecount_to_charcount
2211 Given a pointer to a text string and a length in bytes, return the
2212 equivalent length in characters.
2213
2214 @example
2215 Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc);
2216 @end example
2217
2218 @item charcount_to_bytecount
2219 Given a pointer to a text string and a length in characters, return the
2220 equivalent length in bytes.
2221
2222 @example
2223 Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc);
2224 @end example
2225
2226 @item charptr_n_addr
2227 Return a pointer to the beginning of the character offset @var{cc} (in
2228 characters) from @var{p}.
2229
2230 @example
2231 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc);
2232 @end example
2233 @end table
2234
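To make the behavior of the pointer and count helpers concrete, here is a minimal sketch built on a toy encoding. The encoding is an assumption made for this example only (a byte below 0xA0 starts a character, and bytes 0xA0 through 0xFF continue it); it is not taken from @file{buffer.h}. All @code{toy_} names are hypothetical stand-ins for the macros and functions listed above.

```c
#include <assert.h>

typedef unsigned char Bufbyte;
typedef int Charcount;
typedef int Bytecount;

/* Toy encoding (an assumption for this sketch): a byte below 0xA0
   starts a character; bytes 0xA0..0xFF continue the current one. */
static int
toy_starts_char (Bufbyte b)
{
  return b < 0xA0;
}

/* Advance P to the start of the next character (cf. INC_CHARPTR).
   P must point at the start of a character, and the text must be
   followed by a terminating zero byte or another character. */
static const Bufbyte *
toy_inc_charptr (const Bufbyte *p)
{
  assert (toy_starts_char (*p));
  do
    p++;
  while (!toy_starts_char (*p));
  return p;
}

/* Count the characters that start within the first BC bytes of P
   (cf. bytecount_to_charcount). */
static Charcount
toy_bytecount_to_charcount (const Bufbyte *p, Bytecount bc)
{
  Charcount cc = 0;
  Bytecount i;
  for (i = 0; i < bc; i++)
    if (toy_starts_char (p[i]))
      cc++;
  return cc;
}

/* Return the number of bytes occupied by the first CC characters
   of P (cf. charcount_to_bytecount). */
static Bytecount
toy_charcount_to_bytecount (const Bufbyte *p, Charcount cc)
{
  const Bufbyte *q = p;
  while (cc-- > 0)
    q = toy_inc_charptr (q);
  return (Bytecount) (q - p);
}

/* Return a pointer to the character CC characters past P
   (cf. charptr_n_addr). */
static const Bufbyte *
toy_charptr_n_addr (const Bufbyte *p, Charcount cc)
{
  return p + toy_charcount_to_bytecount (p, cc);
}
```

For the five-byte array @code{'a', 0x81, 0xE9, 'b', 0}, the first four bytes hold three characters under this toy encoding, and the third character starts at byte offset 3. Note that every conversion walks the text, which is why code should avoid converting back and forth more often than necessary.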
2235 @node Conversion of External Data
2236 @subsection Conversion of External Data
2237
2238 When an external function, such as a C library function, returns a
2239 @code{char} pointer, you should never treat it as @code{Bufbyte}. This
2240 is because these returned strings may contain 8-bit characters which can
2241 be misinterpreted by XEmacs, and cause a crash. Instead, you should use
2242 a conversion macro. Many different conversion macros are defined in
2243 @file{buffer.h}, so I will try to order them logically, by direction and
2244 by format.
2245
2246 The basic conversion macros are @code{GET_CHARPTR_INT_DATA_ALLOCA}
2247 and @code{GET_CHARPTR_EXT_DATA_ALLOCA}. The former is used to convert
2248 external data to internal format, and the latter is used to convert the
2249 other way around. The arguments each of these receives are @var{ptr}
2250 (pointer to the source text), @var{len} (length of the source text in
2251 bytes), @var{fmt} (format of the external text), @var{ptr_out} (lvalue
2252 that receives the converted text), and @var{len_out} (lvalue which
2253 will be assigned the length of the converted text in bytes). The
2254 resulting text is stored to a stack-allocated buffer. If the text
2255 doesn't need changing, these macros will do nothing, except for setting
2256 @var{len_out}.
2257
2258 Currently meaningful formats are @code{FORMAT_BINARY},
2259 @code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}.
2260
2261 The two macros above take many arguments, which makes them unwieldy. For
2262 this reason, several convenience macros are defined with obvious
2263 functionality, but accepting fewer arguments:
2264
2265 @table @code
2266 @item GET_C_CHARPTR_EXT_DATA_ALLOCA
2267 @itemx GET_C_CHARPTR_INT_DATA_ALLOCA
2268 These two macros work on ``C char pointers'', which are zero-terminated,
2269 and thus do not need @var{len} or @var{len_out} parameters.
2270
2271 @item GET_STRING_EXT_DATA_ALLOCA
2272 @itemx GET_C_STRING_EXT_DATA_ALLOCA
2273 These two macros work on Lisp strings, and thus do not need a @var{len}
2274 parameter either. However, @code{GET_STRING_EXT_DATA_ALLOCA} still
2275 provides a @var{len_out} parameter. Note that for Lisp strings only one
2276 conversion direction makes sense.
2277
2278 @item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA
2279 @itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA
2280 @itemx GET_C_CHARPTR_EXT_CTEXT_DATA_ALLOCA
2281 @itemx ...
2282 These macros are a combination of the above, but with the @var{fmt}
2283 argument encoded into the name of the macro.
2284 @end table
2285
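The general shape of an external-to-internal conversion can be sketched as follows. This is not the code behind the macros above: the function @code{toy_ext_to_int}, the constant @code{TOY_MAX_EMCHAR_LEN}, and the two-byte toy encoding for bytes of 0x80 and above are all assumptions made for illustration. The sketch also writes into a caller-supplied buffer rather than calling @code{alloca}, since memory obtained with @code{alloca} is released when the calling function returns.

```c
#include <assert.h>

typedef unsigned char Bufbyte;
typedef unsigned char Extbyte;
typedef int Bytecount;
typedef int Extcount;

/* Worst-case internal bytes per character in the toy encoding. */
#define TOY_MAX_EMCHAR_LEN 2

/* Convert LEN external bytes at SRC into the toy internal format,
   writing to DST, which must have room for
   LEN * TOY_MAX_EMCHAR_LEN bytes.  Return the number of internal
   bytes written.  The encoding is a hypothetical stand-in: bytes
   below 0x80 are copied, bytes 0xA0..0xFF get a 0x81 prefix, and
   bytes 0x80..0x9F get a 0x82 prefix and are shifted up by 0x20. */
static Bytecount
toy_ext_to_int (const Extbyte *src, Extcount len, Bufbyte *dst)
{
  Bufbyte *p = dst;
  Extcount i;

  assert (len >= 0);
  for (i = 0; i < len; i++)
    {
      Extbyte b = src[i];
      if (b < 0x80)
        *p++ = b;
      else if (b >= 0xA0)
        {
          *p++ = 0x81;
          *p++ = b;
        }
      else
        {
          *p++ = 0x82;
          *p++ = (Bufbyte) (b + 0x20);
        }
    }
  return (Bytecount) (p - dst);
}
```

The point to take away is the sizing rule: the caller reserves the worst case up front, and the returned @code{Bytecount} says how much of the buffer was actually used, much as @var{len_out} does for the real macros.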
2286 @node General Guidelines for Writing Mule-Aware Code
2287 @subsection General Guidelines for Writing Mule-Aware Code
2288
2289 This section contains some general guidance on how to write Mule-aware
2290 code, as well as some pitfalls you should avoid.
2291
2292 @table @emph
2293 @item Never use @code{char} and @code{char *}.
2294 In XEmacs, the use of @code{char} and @code{char *} is almost always a
2295 mistake. If you want to manipulate an Emacs character from ``C'', use
2296 @code{Emchar}. If you want to examine a specific octet in the internal
2297 format, use @code{Bufbyte}. If you want a Lisp-visible character, use a
2298 @code{Lisp_Object} and @code{make_char}. If you want a pointer to move
2299 through the internal text, use @code{Bufbyte *}. Also note that you
2300 almost certainly do not need @code{Emchar *}.
2301
2302 @item Be careful not to confuse @code{Charcount}, @code{Bytecount}, and @code{Bufpos}.
2303 The whole point of using different types is to avoid confusion about the
2304 use of certain variables. Lest this effect be nullified, you need to be
2305 careful about using the right types.
2306
2307 @item Always convert external data
2308 It is extremely important to always convert external data, because
2309 XEmacs can crash if unexpected 8-bit sequences are copied to its internal
2310 buffers literally.
2311
2312 This means that when a system function, such as @code{readdir}, returns
2313 a string, you need to convert it using one of the conversion macros
2314 described in the previous section before passing it on to Lisp.
2315 In the case of @code{readdir}, you would use the
2316 @code{GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA} macro.
2317
2318 Also note that many internal functions, such as @code{make_string},
2319 accept Bufbytes, which removes the need for them to convert the data
2320 they receive. This increases efficiency because that way external data
2321 needs to be decoded only once, when it is read. After that, it is
2322 passed around in internal format.
2323 @end table
2324
2325 @node An Example of Mule-Aware Code
2326 @subsection An Example of Mule-Aware Code
2327
2328 As an example of Mule-aware code, we will analyze the
2329 @code{string} function, which conses up a Lisp string from the character
2330 arguments it receives. Here is the definition, pasted from
2331 @file{alloc.c}:
2332
2333 @example
2334 @group
2335 DEFUN ("string", Fstring, 0, MANY, 0, /*
2336 Concatenate all the argument characters and make the result a string.
2337 */
2338 (int nargs, Lisp_Object *args))
2339 @{
2340 Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN);
2341 Bufbyte *p = storage;
2342
2343 for (; nargs; nargs--, args++)
2344 @{
2345 Lisp_Object lisp_char = *args;
2346 CHECK_CHAR_COERCE_INT (lisp_char);
2347 p += set_charptr_emchar (p, XCHAR (lisp_char));
2348 @}
2349 return make_string (storage, p - storage);
2350 @}
2351 @end group
2352 @end example
2353
2354 Now we can analyze the source line by line.
2355
2356 Obviously, the string will contain as many characters as there are
2357 arguments to the function. This is why we allocate
2358 @code{MAX_EMCHAR_LEN} * @var{nargs} bytes on the stack, i.e. the
2359 worst-case number of bytes needed for @var{nargs} @code{Emchar}s.
2360
2361 Then, the loop checks that each element is a character, converting
2362 integers in the process. Like many other functions in XEmacs, this
2363 function silently accepts integers where characters are expected, for
2364 historical and compatibility reasons; in new code that does not need
2365 this leniency, @code{CHECK_CHAR} will suffice. @code{XCHAR (lisp_char)}
2366 extracts the @code{Emchar}, and @code{set_charptr_emchar} stores it at
2367 @code{p}, advancing @code{p} in the process. Finally, @code{make_string}
2368 builds the Lisp string from the @code{p - storage} bytes actually written.
2369
2370 Other instructive examples of correct coding under Mule can be found all
2371 over XEmacs code. For starters, I recommend
2372 @code{Fnormalize_menu_item_name} in @file{menubar.c}. After you have
2373 understood this section of the manual and studied the examples, you can
2374 proceed to write new Mule-aware code.
2026 2375
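The append pattern used by @code{Fstring} can be tried outside XEmacs with a small stand-alone sketch. Everything prefixed with @code{toy_} or @code{TOY_} is hypothetical, and the two-byte encoding is an assumption chosen to keep the example short; only the structure of the loop (worst-case allocation, then @code{p += ...} for each character) follows the function analyzed above.

```c
#include <assert.h>

typedef int Emchar;
typedef unsigned char Bufbyte;
typedef int Bytecount;

/* Worst-case bytes per character in the toy encoding. */
#define TOY_MAX_EMCHAR_LEN 2

/* Toy stand-in for set_charptr_emchar: store C at P and return the
   number of bytes written.  Only characters 0..0xFF are handled. */
static Bytecount
toy_set_charptr_emchar (Bufbyte *p, Emchar c)
{
  assert (c >= 0 && c <= 0xFF);
  if (c < 0x80)
    {
      p[0] = (Bufbyte) c;
      return 1;
    }
  if (c >= 0xA0)
    {
      p[0] = 0x81;
      p[1] = (Bufbyte) c;
    }
  else
    {
      p[0] = 0x82;
      p[1] = (Bufbyte) (c + 0x20);
    }
  return 2;
}

/* Toy stand-in for charptr_emchar: decode the character at P. */
static Emchar
toy_charptr_emchar (const Bufbyte *p)
{
  if (p[0] < 0x80)
    return p[0];
  return p[0] == 0x81 ? p[1] : p[1] - 0x20;
}

/* Append NCHARS characters to STORAGE, which must have room for
   NCHARS * TOY_MAX_EMCHAR_LEN bytes.  Return the bytes used. */
static Bytecount
toy_build_text (const Emchar *chars, int nchars, Bufbyte *storage)
{
  Bufbyte *p = storage;

  assert (nchars >= 0);
  for (; nchars; nchars--, chars++)
    p += toy_set_charptr_emchar (p, *chars);
  return (Bytecount) (p - storage);
}
```

Three characters may thus need anywhere from three to six bytes here, which is why the buffer is sized for the worst case and the actual length is computed as @code{p - storage} only after the loop finishes.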
2027 @node Techniques for XEmacs Developers 2376 @node Techniques for XEmacs Developers
2028 @section Techniques for XEmacs Developers 2377 @section Techniques for XEmacs Developers
2029 2378
2030 To make a quantified XEmacs, do: @code{make quantmacs}. 2379 To make a quantified XEmacs, do: @code{make quantmacs}.