Mercurial > hg > xemacs-beta
comparison man/internals/internals.texi @ 373:6240c7796c7a r21-2b2
Import from CVS: tag r21-2b2
author | cvs |
---|---|
date | Mon, 13 Aug 2007 11:04:06 +0200 |
parents | cc15677e0335 |
children | d883f39b8495 |
372:49e1ed2d7ed8 | 373:6240c7796c7a |
---|---|
596 @itemize @bullet | 596 @itemize @bullet |
597 @item | 597 @item |
598 version 20.1 released September 17, 1997. | 598 version 20.1 released September 17, 1997. |
599 @item | 599 @item |
600 version 20.2 released September 20, 1997. | 600 version 20.2 released September 20, 1997. |
601 @item | |
602 version 20.3 released August 19, 1998. | |
601 @end itemize | 603 @end itemize |
602 | 604 |
603 @node XEmacs | 605 @node XEmacs |
604 @section XEmacs | 606 @section XEmacs |
605 @cindex XEmacs | 607 @cindex XEmacs |
1652 | 1654 |
1653 @menu | 1655 @menu |
1654 * General Coding Rules:: | 1656 * General Coding Rules:: |
1655 * Writing Lisp Primitives:: | 1657 * Writing Lisp Primitives:: |
1656 * Adding Global Lisp Variables:: | 1658 * Adding Global Lisp Variables:: |
1659 * Coding for Mule:: | |
1657 * Techniques for XEmacs Developers:: | 1660 * Techniques for XEmacs Developers:: |
1658 @end menu | 1661 @end menu |
1659 | 1662 |
1660 @node General Coding Rules | 1663 @node General Coding Rules |
1661 @section General Coding Rules | 1664 @section General Coding Rules |
1752 | 1755 |
1753 while (!NILP (args)) | 1756 while (!NILP (args)) |
1754 @{ | 1757 @{ |
1755 val = Feval (XCAR (args)); | 1758 val = Feval (XCAR (args)); |
1756 if (!NILP (val)) | 1759 if (!NILP (val)) |
1757 break; | 1760 break; |
1758 args = XCDR (args); | 1761 args = XCDR (args); |
1759 @} | 1762 @} |
1760 | 1763 |
1761 UNGCPRO; | 1764 UNGCPRO; |
1762 return val; | 1765 return val; |
2021 C variable in the @code{vars_of_*()} function. Otherwise, the | 2024 C variable in the @code{vars_of_*()} function. Otherwise, the |
2022 garbage-collection mechanism won't know that the object in this variable | 2025 garbage-collection mechanism won't know that the object in this variable |
2023 is in use, and will happily collect it and reuse its storage for another | 2026 is in use, and will happily collect it and reuse its storage for another |
2024 Lisp object, and you will be the one who's unhappy when you can't figure | 2027 Lisp object, and you will be the one who's unhappy when you can't figure |
2025 out how your variable got overwritten. | 2028 out how your variable got overwritten. |
2029 | |
2030 @node Coding for Mule | |
2031 @section Coding for Mule | |
2032 @cindex Coding for Mule | |
2033 | |
2034 Although Mule support is not compiled by default in XEmacs, many people | |
2035 are using it, and we consider it crucial that new code works correctly | |
2036 with multibyte characters. This is not hard; it is only a matter of | |
2037 following several simple coding guidelines. Even if you never | |
2038 compile with Mule, with a little practice you will find it quite easy | |
2039 to code Mule-correctly. | |
2040 | |
2041 Note that these guidelines are not necessarily tied to the current Mule | |
2042 implementation; they are also a good idea to follow on the grounds of | |
2043 code generalization for future I18N work. | |
2044 | |
2045 @menu | |
2046 * Character-Related Data Types:: | |
2047 * Working With Character and Byte Positions:: | |
2048 * Conversion of External Data:: | |
2049 * General Guidelines for Writing Mule-Aware Code:: | |
2050 * An Example of Mule-Aware Code:: | |
2051 @end menu | |
2052 | |
2053 @node Character-Related Data Types | |
2054 @subsection Character-Related Data Types | |
2055 | |
2056 First, we will list the basic character-related datatypes used by | |
2057 XEmacs. Note that the separate @code{typedef}s are not required for the | |
2058 code to work (all of them boil down to @code{unsigned char} or | |
2059 @code{int}), but they improve clarity of code a great deal, because one | |
2060 glance at the declaration can tell the intended use of the variable. | |
2061 | |
2062 @table @code | |
2063 @item Emchar | |
2064 @cindex Emchar | |
2065 An @code{Emchar} holds a single Emacs character. | |
2066 | |
2067 Obviously, the equality between characters and bytes is lost in the Mule | |
2068 world. Characters can be represented by one or more bytes in the | |
2069 buffer, and @code{Emchar} is the C type large enough to hold any | |
2070 character. | |
2071 | |
2072 Without Mule support, an @code{Emchar} is equivalent to an | |
2073 @code{unsigned char}. | |
2074 | |
2075 @item Bufbyte | |
2076 @cindex Bufbyte | |
2077 The data representing the text in a buffer or string is logically a set | |
2078 of @code{Bufbyte}s. | |
2079 | |
2080 XEmacs does not work with external character formats internally; when | |
2081 reading characters from the outside, it decodes them to an internal | |
2082 format, and likewise encodes them when writing. @code{Bufbyte} (in fact | |
2083 @code{unsigned char}) is the basic unit of the internal format of XEmacs | |
2084 buffers and strings. | |
2085 | |
2086 One character can correspond to one or more @code{Bufbyte}s. In the | |
2087 current implementation, an ASCII character is represented by the same | |
2088 @code{Bufbyte}, and extended characters are represented by a sequence of | |
2089 @code{Bufbyte}s. | |
2090 | |
2091 Without Mule support, a @code{Bufbyte} is equivalent to an | |
2092 @code{Emchar}. | |
2093 | |
2094 @item Bufpos | |
2095 @itemx Charcount | |
2096 A @code{Bufpos} represents a character position in a buffer or string. | |
2097 A @code{Charcount} represents a number (count) of characters. | |
2098 Logically, subtracting two @code{Bufpos} values yields a | |
2099 @code{Charcount} value. Although all of these are @code{typedef}ed to | |
2100 @code{int}, we use them in preference to @code{int} to make it clear | |
2101 what sort of position is being used. | |
2102 | |
2103 @code{Bufpos} and @code{Charcount} values are the only ones that are | |
2104 ever visible to Lisp. | |
2105 | |
2106 @item Bytind | |
2107 @itemx Bytecount | |
2108 A @code{Bytind} represents a byte position in a buffer or string. A | |
2109 @code{Bytecount} represents the distance between two positions in bytes. | |
2110 The relationship between @code{Bytind} and @code{Bytecount} is the same | |
2111 as the relationship between @code{Bufpos} and @code{Charcount}. | |
2112 | |
2113 @item Extbyte | |
2114 @itemx Extcount | |
2115 When dealing with the outside world, XEmacs works with @code{Extbyte}s, | |
2116 which are equivalent to @code{unsigned char}. Obviously, an | |
2117 @code{Extcount} is the distance between two @code{Extbyte}s. Extbytes | |
2118 and Extcounts are not all that frequent in XEmacs code. | |
2119 @end table | |
2120 | |
2121 @node Working With Character and Byte Positions | |
2122 @subsection Working With Character and Byte Positions | |
2123 | |
2124 Now that we have defined the basic character-related types, we can look | |
2125 at the macros and functions designed for work with them and for | |
2126 conversion between them. Most of these macros are defined in | |
2127 @file{buffer.h}, and we don't discuss all of them here, but only the | |
2128 most important ones. Examining the existing code is the best way to | |
2129 learn about them. | |
2130 | |
2131 @table @code | |
2132 @item MAX_EMCHAR_LEN | |
2133 This preprocessor constant is the maximum number of buffer bytes per | |
2134 Emacs character, i.e. the worst-case byte length of one encoded | |
2135 @code{Emchar}. It is useful when allocating temporary strings that must | |
2135 hold a known number of characters. | |
2136 For instance: | |
2137 | |
2138 @example | |
2139 @group | |
2140 @{ | |
2141 Charcount cclen; | |
2142 ... | |
2143 @{ | |
2144 /* Allocate place for @var{cclen} characters. */ | |
2145 Bufbyte *tmp_buf = (Bufbyte *)alloca (cclen * MAX_EMCHAR_LEN); | |
2146 ... | |
2147 @end group | |
2148 @end example | |
2149 | |
2150 If you followed the previous section, you can guess that, logically, | |
2151 multiplying a @code{Charcount} value with @code{MAX_EMCHAR_LEN} produces | |
2152 a @code{Bytecount} value. | |
2153 | |
2154 In the current Mule implementation, @code{MAX_EMCHAR_LEN} equals 4. | |
2155 Without Mule, it is 1. | |
2156 | |
2157 @item charptr_emchar | |
2158 @itemx set_charptr_emchar | |
2159 The @code{charptr_emchar} macro takes a @code{Bufbyte} pointer and returns | |
2160 the underlying @code{Emchar}. If it were a function, its prototype | |
2161 would be: | |
2162 | |
2163 @example | |
2164 Emchar charptr_emchar (Bufbyte *p); | |
2165 @end example | |
2166 | |
2167 @code{set_charptr_emchar} stores an @code{Emchar} to the specified byte | |
2168 position. It returns the number of bytes stored: | |
2169 | |
2170 @example | |
2171 Bytecount set_charptr_emchar (Bufbyte *p, Emchar c); | |
2172 @end example | |
2173 | |
2174 It is important to note that @code{set_charptr_emchar} is safe only for | |
2175 appending a character at the end of a buffer, not for overwriting a | |
2176 character in the middle. This is because the width of characters | |
2177 varies, and @code{set_charptr_emchar} cannot resize the string if it | |
2178 writes, say, a two-byte character where a single-byte character used to | |
2179 reside. | |
2180 | |
2181 A typical use of @code{set_charptr_emchar} can be demonstrated by this | |
2182 example, which copies characters from buffer @var{buf} to a temporary | |
2183 string of Bufbytes. | |
2184 | |
2185 @example | |
2186 @group | |
2187 @{ | |
2188 Bufpos pos; | |
2189 for (pos = beg; pos < end; pos++) | |
2190 @{ | |
2191 Emchar c = BUF_FETCH_CHAR (buf, pos); | |
2192 p += set_charptr_emchar (p, c); | |
2193 @} | |
2194 @} | |
2195 @end group | |
2196 @end example | |
2197 | |
2198 Note how @code{set_charptr_emchar} is used to store the @code{Emchar} | |
2199 and advance the pointer @code{p} at the same time. | |
2200 | |
2201 @item INC_CHARPTR | |
2202 @itemx DEC_CHARPTR | |
2203 These two macros increment and decrement a @code{Bufbyte} pointer, | |
2204 respectively. The pointer needs to be correctly positioned at the | |
2205 beginning of a valid character position. | |
2206 | |
2207 Without Mule support, @code{INC_CHARPTR (p)} and @code{DEC_CHARPTR (p)} | |
2208 simply expand to @code{p++} and @code{p--}, respectively. | |
2209 | |
2210 @item bytecount_to_charcount | |
2211 Given a pointer to a text string and a length in bytes, return the | |
2212 equivalent length in characters. | |
2213 | |
2214 @example | |
2215 Charcount bytecount_to_charcount (Bufbyte *p, Bytecount bc); | |
2216 @end example | |
2217 | |
2218 @item charcount_to_bytecount | |
2219 Given a pointer to a text string and a length in characters, return the | |
2220 equivalent length in bytes. | |
2221 | |
2222 @example | |
2223 Bytecount charcount_to_bytecount (Bufbyte *p, Charcount cc); | |
2224 @end example | |
2225 | |
2226 @item charptr_n_addr | |
2227 Return a pointer to the beginning of the character offset @var{cc} (in | |
2228 characters) from @var{p}. | |
2229 | |
2230 @example | |
2231 Bufbyte *charptr_n_addr (Bufbyte *p, Charcount cc); | |
2232 @end example | |
2233 @end table | |
2234 | |
2235 @node Conversion of External Data | |
2236 @subsection Conversion of External Data | |
2237 | |
2238 When an external function, such as a C library function, returns a | |
2239 @code{char} pointer, you should never treat it as @code{Bufbyte}. This | |
2240 is because these returned strings may contain 8-bit characters, which can | |
2241 be misinterpreted by XEmacs, and cause a crash. Instead, you should use | |
2242 a conversion macro. Many different conversion macros are defined in | |
2243 @file{buffer.h}, so I will try to order them logically, by direction and | |
2244 by format. | |
2245 | |
2246 The basic conversion macros are @code{GET_CHARPTR_INT_DATA_ALLOCA} | |
2247 and @code{GET_CHARPTR_EXT_DATA_ALLOCA}. The former is used to convert | |
2248 external data to internal format, and the latter is used to convert the | |
2249 other way around. The arguments each of these receives are @var{ptr} | |
2250 (pointer to the source text), @var{len} (length of the source text in | |
2251 bytes), @var{fmt} (format of the external text), @var{ptr_out} (lvalue | |
2252 to which new text should be copied), and @var{len_out} (lvalue which | |
2253 will be assigned the length of the converted text in bytes). The | |
2254 resulting text is stored to a stack-allocated buffer. If the text | |
2255 doesn't need changing, these macros will do nothing, except for setting | |
2256 @var{len_out}. | |
2257 | |
2258 Currently meaningful formats are @code{FORMAT_BINARY}, | |
2259 @code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}. | |
2260 | |
2261 The two macros above take many arguments, which makes them unwieldy. For | |
2262 this reason, several convenience macros are defined with obvious | |
2263 functionality, but accepting fewer arguments: | |
2264 | |
2265 @table @code | |
2266 @item GET_C_CHARPTR_EXT_DATA_ALLOCA | |
2267 @itemx GET_C_CHARPTR_INT_DATA_ALLOCA | |
2268 These two macros work on ``C char pointers'', which are zero-terminated, | |
2269 and thus do not need @var{len} or @var{len_out} parameters. | |
2270 | |
2271 @item GET_STRING_EXT_DATA_ALLOCA | |
2272 @itemx GET_C_STRING_EXT_DATA_ALLOCA | |
2273 These two macros work on Lisp strings, thus also not needing a @var{len} | |
2274 parameter. However, @code{GET_STRING_EXT_DATA_ALLOCA} still provides a | |
2275 @var{len_out} parameter. Note that for Lisp strings only one conversion | |
2276 direction makes sense. | |
2277 | |
2278 @item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA | |
2279 @itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA | |
2280 @itemx GET_C_CHARPTR_EXT_CTEXT_DATA_ALLOCA | |
2281 @itemx ... | |
2282 These macros are a combination of the above, but with the @var{fmt} | |
2283 argument encoded into the name of the macro. | |
2284 @end table | |
2285 | |
2286 @node General Guidelines for Writing Mule-Aware Code | |
2287 @subsection General Guidelines for Writing Mule-Aware Code | |
2288 | |
2289 This section contains some general guidance on how to write Mule-aware | |
2290 code, as well as some pitfalls you should avoid. | |
2291 | |
2292 @table @emph | |
2293 @item Never use @code{char} and @code{char *}. | |
2294 In XEmacs, the use of @code{char} and @code{char *} is almost always a | |
2295 mistake. If you want to manipulate an Emacs character from ``C'', use | |
2296 @code{Emchar}. If you want to examine a specific octet in the internal | |
2297 format, use @code{Bufbyte}. If you want a Lisp-visible character, use a | |
2298 @code{Lisp_Object} and @code{make_char}. If you want a pointer to move | |
2299 through the internal text, use @code{Bufbyte *}. Also note that you | |
2300 almost certainly do not need @code{Emchar *}. | |
2301 | |
2302 @item Be careful not to confuse @code{Charcount}, @code{Bytecount}, and @code{Bufpos}. | |
2303 The whole point of using different types is to avoid confusion about the | |
2304 use of certain variables. Lest this effect be nullified, you need to be | |
2305 careful about using the right types. | |
2306 | |
2307 @item Always convert external data | |
2308 It is extremely important to always convert external data, because | |
2309 XEmacs can crash if unexpected 8-bit sequences are copied to its internal | |
2310 buffers literally. | |
2311 | |
2312 This means that when a system function, such as @code{readdir}, returns | |
2313 a string, you need to convert it using one of the conversion macros | |
2314 described in the previous section, before passing it further to Lisp. | |
2315 In the case of @code{readdir}, you would use the | |
2316 @code{GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA} macro. | |
2317 | |
2318 Also note that many internal functions, such as @code{make_string}, | |
2319 accept Bufbytes, which removes the need for them to convert the data | |
2320 they receive. This increases efficiency because that way external data | |
2321 needs to be decoded only once, when it is read. After that, it is | |
2322 passed around in internal format. | |
2323 @end table | |
2324 | |
2325 @node An Example of Mule-Aware Code | |
2326 @subsection An Example of Mule-Aware Code | |
2327 | |
2328 As an example of Mule-aware code, we will analyze the | |
2329 @code{string} function, which conses up a Lisp string from the character | |
2330 arguments it receives. Here is the definition, pasted from | |
2331 @file{alloc.c}: | |
2332 | |
2333 @example | |
2334 @group | |
2335 DEFUN ("string", Fstring, 0, MANY, 0, /* | |
2336 Concatenate all the argument characters and make the result a string. | |
2337 */ | |
2338 (int nargs, Lisp_Object *args)) | |
2339 @{ | |
2340 Bufbyte *storage = alloca_array (Bufbyte, nargs * MAX_EMCHAR_LEN); | |
2341 Bufbyte *p = storage; | |
2342 | |
2343 for (; nargs; nargs--, args++) | |
2344 @{ | |
2345 Lisp_Object lisp_char = *args; | |
2346 CHECK_CHAR_COERCE_INT (lisp_char); | |
2347 p += set_charptr_emchar (p, XCHAR (lisp_char)); | |
2348 @} | |
2349 return make_string (storage, p - storage); | |
2350 @} | |
2351 @end group | |
2352 @end example | |
2353 | |
2354 Now we can analyze the source line by line. | |
2355 | |
2356 Obviously, the string will contain as many characters as there are | |
2357 arguments to the function. This is why we allocate | |
2358 @code{MAX_EMCHAR_LEN} * @var{nargs} bytes on the stack, i.e. the | |
2359 worst-case number of bytes for @var{nargs} @code{Emchar}s to fit in the string. | |
2360 | |
2361 Then, the loop checks that each element is a character, converting | |
2362 integers in the process. Like many other functions in XEmacs, this | |
2363 function silently accepts integers where characters are expected, for | |
2364 historical and compatibility reasons. Unless you know what you are | |
2365 doing, @code{CHECK_CHAR} will also suffice. @code{XCHAR (lisp_char)} | |
2366 extracts the @code{Emchar} from the @code{Lisp_Object}, and | |
2367 @code{set_charptr_emchar} stores it to storage, increasing @code{p} in | |
2368 the process. | |
2369 | |
2370 Other instructive examples of correct coding under Mule can be found all | |
2371 over XEmacs code. For starters, I recommend | |
2372 @code{Fnormalize_menu_item_name} in @file{menubar.c}. After you have | |
2373 understood this section of the manual and studied the examples, you can | |
2374 proceed to write new Mule-aware code. | |
2026 | 2375 |
2027 @node Techniques for XEmacs Developers | 2376 @node Techniques for XEmacs Developers |
2028 @section Techniques for XEmacs Developers | 2377 @section Techniques for XEmacs Developers |
2029 | 2378 |
2030 To make a quantified XEmacs, do: @code{make quantmacs}. | 2379 To make a quantified XEmacs, do: @code{make quantmacs}. |
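
As a companion to the "Coding for Mule" section added in this changeset, the worst-case allocation and append-only pattern it describes can be tried outside XEmacs with a small stand-alone sketch. Everything here is an assumption made for illustration: the `demo_` functions and `DEMO_MAX_EMCHAR_LEN` are hypothetical stand-ins, not the macros from `buffer.h`, and the variable-width encoding is UTF-8-style rather than the actual Mule internal format. The typedef names are borrowed from the manual text only to show how the types relate.

```c
/* Stand-in sketch, NOT the XEmacs definitions.  The demo_ names and the
   UTF-8-style encoding are assumptions chosen for illustration only. */
#include <assert.h>

typedef int Emchar;             /* one character (stand-in) */
typedef unsigned char Bufbyte;  /* one byte of internal text */
typedef int Charcount;          /* a count of characters */
typedef int Bytecount;          /* a count of bytes */

#define DEMO_MAX_EMCHAR_LEN 4   /* worst-case bytes per character here */

/* Store C at P and return the number of bytes written (1 to 4).
   Assumes 0 <= C <= 0x10FFFF and that P has room for the bytes. */
static Bytecount
demo_set_charptr_emchar (Bufbyte *p, Emchar c)
{
  assert (c >= 0 && c <= 0x10FFFF);
  if (c < 0x80)
    {
      p[0] = (Bufbyte) c;
      return 1;
    }
  if (c < 0x800)
    {
      p[0] = (Bufbyte) (0xC0 | (c >> 6));
      p[1] = (Bufbyte) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      p[0] = (Bufbyte) (0xE0 | (c >> 12));
      p[1] = (Bufbyte) (0x80 | ((c >> 6) & 0x3F));
      p[2] = (Bufbyte) (0x80 | (c & 0x3F));
      return 3;
    }
  p[0] = (Bufbyte) (0xF0 | (c >> 18));
  p[1] = (Bufbyte) (0x80 | ((c >> 12) & 0x3F));
  p[2] = (Bufbyte) (0x80 | ((c >> 6) & 0x3F));
  p[3] = (Bufbyte) (0x80 | (c & 0x3F));
  return 4;
}

/* Byte length of the character starting at P.  P must point at the
   first byte of a character, the same precondition the manual states
   for INC_CHARPTR. */
static Bytecount
demo_char_length (const Bufbyte *p)
{
  if (*p < 0x80) return 1;
  if (*p < 0xE0) return 2;
  if (*p < 0xF0) return 3;
  return 4;
}

/* Count the characters held in the BC bytes starting at P. */
static Charcount
demo_bytecount_to_charcount (const Bufbyte *p, Bytecount bc)
{
  const Bufbyte *end = p + bc;
  Charcount cc = 0;
  while (p < end)
    {
      p += demo_char_length (p);  /* step over one whole character */
      cc++;
    }
  return cc;
}

/* Append N characters to STORAGE, which the caller must size as
   N * DEMO_MAX_EMCHAR_LEN bytes.  Returns the bytes actually used. */
static Bytecount
demo_build_string (Bufbyte *storage, const Emchar *chars, Charcount n)
{
  Bufbyte *p = storage;
  Charcount i;
  for (i = 0; i < n; i++)
    p += demo_set_charptr_emchar (p, chars[i]);  /* store and advance */
  return (Bytecount) (p - storage);
}
```

Called with the three characters `'a'`, `0xE9`, and `0x20AC` and a 12-byte buffer, `demo_build_string` uses 6 bytes (1 + 2 + 3), and `demo_bytecount_to_charcount` on those 6 bytes gives back 3. The buffer is sized in bytes from a character count, while the loop only ever appends, which is the case the manual calls safe for `set_charptr_emchar`.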