comparison man/internals/internals.texi @ 400:a86b2b5e0111 r21-2-30

Import from CVS: tag r21-2-30
author cvs
date Mon, 13 Aug 2007 11:14:34 +0200
parents 74fd4e045ea6
children 2f8bb876ab1d
--- man/internals/internals.texi 399:376370fb5946
+++ man/internals/internals.texi 400:a86b2b5e0111
@@ -136,11 +136,13 @@
 * Menus::
 * Subprocesses::
 * Interface to X Windows::
 * Index::
 
-@detailmenu --- The Detailed Node Listing ---
+@detailmenu
+
+--- The Detailed Node Listing ---
 
 A History of Emacs
 
 * Through Version 18:: Unification prevails.
 * Lucid Emacs:: One version 19 Emacs.
@@ -187,11 +189,10 @@
 * Garbage Collection - Step by Step::
 * Integers and Characters::
 * Allocation from Frob Blocks::
 * lrecords::
 * Low-level allocation::
-* Pure Space::
 * Cons::
 * Vector::
 * Bit Vector::
 * Symbol::
 * Marker::
@@ -964,11 +965,11 @@
 Java, which is inexcusable.
 @end enumerate
 
 Unfortunately, there is no perfect language. Static typing allows a
 compiler to catch programmer errors and produce more efficient code, but
-makes programming more tedious and less fun. For the forseeable future,
+makes programming more tedious and less fun. For the foreseeable future,
 an Ideal Editing and Programming Environment (and that is what XEmacs
 aspires to) will be programmable in multiple languages: high level ones
 like Lisp for user customization and prototyping, and lower level ones
 for infrastructure and industrial strength applications. If I had my
 way, XEmacs would be friendly towards the Python, Scheme, C++, ML,
@@ -1617,29 +1618,21 @@
 
 @example
 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
 
-<---> ^ <------------------------------------------------------>
-tag | a pointer to a structure, or an integer
-|
-mark bit
-@end example
-
-The tag describes the type of the Lisp object. For integers and chars,
-the lower 28 bits contain the value of the integer or char; for all
-others, the lower 28 bits contain a pointer. The mark bit is used
-during garbage-collection, and is always 0 when garbage collection is
-not happening. (The way that garbage collection works, basically, is that it
-loops over all places where Lisp objects could exist---this includes
-all global variables in C that contain Lisp objects [including
-@code{Vobarray}, the C equivalent of @code{obarray}; through this, all
-Lisp variables will get marked], plus various other places---and
-recursively scans through the Lisp objects, marking each object it finds
-by setting the mark bit. Then it goes through the lists of all objects
-allocated, freeing the ones that are not marked and turning off the mark
-bit of the ones that are marked.)
+<---------------------------------------------------------> <->
+a pointer to a structure, or an integer tag
+@end example
+
+A tag of 00 is used for all pointer object types, a tag of 10 is used
+for characters, and the other two tags 01 and 11 are joined together to
+form the integer object type. This representation gives us 31 bits
+integers, 30 bits characters and pointers are represented directly
+without any bit masking. This representation, though, assumes that
+pointers to structs are always aligned to multiples of 4, so the lower 2
+bits are always zero.
 
 Lisp objects use the typedef @code{Lisp_Object}, but the actual C type
 used for the Lisp object can vary. It can be either a simple type
 (@code{long} on the DEC Alpha, @code{int} on other machines) or a
 structure whose fields are bit fields that line up properly (actually, a
@@ -1652,101 +1645,20 @@
 object is desired, you get a compile error), and it makes it easier to
 decode Lisp objects when debugging. The choice of which type to use is
 determined by the preprocessor constant @code{USE_UNION_TYPE} which is
 defined via the @code{--use-union-type} option to @code{configure}.
 
-@cindex record type
-
-Note that there are only eight types that the tag can represent, but
-many more actual types than this. This is handled by having one of the
-tag types specify a meta-type called a @dfn{record}; for all such
-objects, the first four bytes of the pointed-to structure indicate what
-the actual type is.
-
-Note also that having 28 bits for pointers and integers restricts a lot
-of things to 256 megabytes of memory. (Basically, enough pointers and
-indices and whatnot get stuffed into Lisp objects that the total amount
-of memory used by XEmacs can't grow above 256 megabytes. In older
-versions of XEmacs and GNU Emacs, the tag was 5 bits wide, allowing for
-32 types, which was more than the actual number of types that existed at
-the time, and no ``record'' type was necessary. However, this limited
-the editor to 64 megabytes total, which some users who edited large
-files might conceivably exceed.)
-
-Also, note that there is an implicit assumption here that all pointers
-are low enough that the top bits are all zero and can just be chopped
-off. On standard machines that allocate memory from the bottom up (and
-give each process its own address space), this works fine. Some
-machines, however, put the data space somewhere else in memory
-(e.g. beginning at 0x80000000). Those machines cope by defining
-@code{DATA_SEG_BITS} in the corresponding @file{m/} or @file{s/} file to
-the proper mask. Then, pointers retrieved from Lisp objects are
-automatically OR'ed with this value prior to being used.
-
-A corollary of the previous paragraph is that @strong{(pointers to)
-stack-allocated structures cannot be put into Lisp objects}. The stack
-is generally located near the top of memory; if you put such a pointer
-into a Lisp object, it will get its top bits chopped off, and you will
-lose.
-
-Actually, there's an alternative representation of a @code{Lisp_Object},
-invented by Kyle Jones, that is used when the
-@code{--use-minimal-tagbits} option to @code{configure} is used. In
-this case the 2 lower bits are used for the tag bits. This
-representation assumes that pointers to structs are always aligned to
-multiples of 4, so the lower 2 bits are always zero.
-
-@example
-[ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
-[ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]
-
-<---------------------------------------------------------> <->
-a pointer to a structure, or an integer tag
-@end example
-
-A tag of 00 is used for all pointer object types, a tag of 10 is used
-for characters, and the other two tags 01 and 11 are joined together to
-form the integer object type. The markbit is moved to part of the
-structure being pointed at (integers and chars do not need to be marked,
-since no memory is allocated). This representation has these
-advantages:
-
-@enumerate
-@item
-31 bits can be used for Lisp Integers.
-@item
-@emph{Any} pointer can be represented directly, and no bit masking
-operations are necessary.
-@end enumerate
-
-The disadvantages are:
-
-@enumerate
-@item
-An extra level of indirection is needed when accessing the object types
-that were not record types. So checking whether a Lisp object is a cons
-cell becomes a slower operation.
-@item
-Mark bits can no longer be stored directly in Lisp objects, so another
-place for them must be found. This means that a cons cell requires more
-memory than merely room for 2 lisp objects, leading to extra memory use.
-@end enumerate
-
 Various macros are used to construct Lisp objects and extract the
 components. Macros of the form @code{XINT()}, @code{XCHAR()},
-@code{XSTRING()}, @code{XSYMBOL()}, etc. mask out the pointer/integer
-field and cast it to the appropriate type. All of the macros that
-construct pointers will @code{OR} with @code{DATA_SEG_BITS} if
-necessary. @code{XINT()} needs to be a bit tricky so that negative
-numbers are properly sign-extended: Usually it does this by shifting the
-number four bits to the left and then four bits to the right. This
-assumes that the right-shift operator does an arithmetic shift (i.e. it
-leaves the most-significant bit as-is rather than shifting in a zero, so
-that it mimics a divide-by-two even for negative numbers). Not all
-machines/compilers do this, and on the ones that don't, a more
-complicated definition is selected by defining
-@code{EXPLICIT_SIGN_EXTEND}.
+@code{XSTRING()}, @code{XSYMBOL()}, etc. shift out the tag field if
+needed cast it to the appropriate type. @code{XINT()} needs to be a bit
+tricky so that negative numbers are properly sign-extended. Since
+integers are stored left-shifted, if the right-shift operator does an
+arithmetic shift (i.e. it leaves the most-significant bit as-is rather
+than shifting in a zero, so that it mimics a divide-by-two even for
+negative numbers) the shift to remove the tag bit is enough. This is
+the case on all the systems we support.
 
 Note that when @code{ERROR_CHECK_TYPECHECK} is defined, the extractor
 macros become more complicated---they check the tag bits and/or the
 type field in the first four bytes of a record type to ensure that the
 object is really of the correct type. This is great for catching places
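The new text describes a 2-bit tag scheme and explains that XINT() sign-extends through an arithmetic right shift. Both can be modeled in a few lines of C. This is a minimal, hypothetical sketch and not the XEmacs macros: the names obj_t, make_int, xint, make_char, xchar, make_pointer, xpointer and the is_* predicates are invented for illustration. It uses uintptr_t so that it also runs on 64-bit hosts, where integers get 63 bits instead of the 31 bits the text describes for 32-bit objects. Like the new XINT() text, it assumes two's-complement integers and an arithmetic right shift on signed values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 2-bit tag layout, not the XEmacs macros.
   Low two bits: 00 = pointer, 10 = character, 01 or 11 = integer. */
typedef uintptr_t obj_t;

#define TAG_MASK    ((obj_t) 3)
#define TAG_POINTER ((obj_t) 0)
#define TAG_CHAR    ((obj_t) 2)

/* An integer is stored shifted left one bit with the low bit set, so both
   01 and 11 mean "integer" and the value keeps one extra bit. */
static obj_t
make_int (intptr_t v)
{
  return ((obj_t) v << 1) | 1;
}

static int
is_int (obj_t o)
{
  return (o & 1) != 0;
}

/* Assumes >> on a negative intptr_t is an arithmetic shift: the sign bit
   is copied in, so dropping the tag bit also sign-extends the value. */
static intptr_t
xint (obj_t o)
{
  return (intptr_t) o >> 1;
}

/* A character is stored shifted left two bits and tagged 10. */
static obj_t
make_char (uint32_t c)
{
  return ((obj_t) c << 2) | TAG_CHAR;
}

static int
is_char (obj_t o)
{
  return (o & TAG_MASK) == TAG_CHAR;
}

static uint32_t
xchar (obj_t o)
{
  return (uint32_t) (o >> 2);
}

/* A pointer is stored unchanged. This works only because structures are
   aligned to multiples of 4, which leaves the two tag bits zero. */
static obj_t
make_pointer (void *p)
{
  assert (((obj_t) p & TAG_MASK) == 0);
  return (obj_t) p;
}

static int
is_pointer (obj_t o)
{
  return (o & TAG_MASK) == TAG_POINTER;
}

static void *
xpointer (obj_t o)
{
  return (void *) o;
}
```

For example, make_int (-5) stores -9 (binary ...11110111), and the arithmetic shift in xint recovers -5. make_char ('A') stores 262, whose low bits 10 mark it as a character.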
@@ -1842,13 +1754,12 @@
 some_variable = 0;}. The reason for this has to do with some kludges
 done during the dumping process: If possible, the initialized data
 segment is re-mapped so that it becomes part of the (unmodifiable) code
 segment in the dumped executable. This allows this memory to be shared
 among multiple running XEmacs processes. XEmacs is careful to place as
-much constant data as possible into initialized variables (in
-particular, into what's called the @dfn{pure space}---see below) during
-the @file{temacs} phase.
+much constant data as possible into initialized variables during the
+@file{temacs} phase.
 
 @cindex copy-on-write
 @strong{Please note:} This kludge only works on a few systems nowadays,
 and is rapidly becoming irrelevant because most modern operating systems
 provide @dfn{copy-on-write} semantics. All data is initially shared
@@ -2263,44 +2174,47 @@
 @item Bufbyte
 @cindex Bufbyte
 The data representing the text in a buffer or string is logically a set
 of @code{Bufbyte}s.
 
-XEmacs does not work with character formats all the time; when reading
-characters from the outside, it decodes them to an internal format, and
-likewise encodes them when writing. @code{Bufbyte} (in fact
+XEmacs does not work with the same character formats all the time; when
+reading characters from the outside, it decodes them to an internal
+format, and likewise encodes them when writing. @code{Bufbyte} (in fact
 @code{unsigned char}) is the basic unit of XEmacs internal buffers and
-strings format.
+strings format. A @code{Bufbyte *} is the type that points at text
+encoded in the variable-width internal encoding.
 
 One character can correspond to one or more @code{Bufbyte}s. In the
-current implementation, an ASCII character is represented by the same
-@code{Bufbyte}, and extended characters are represented by a sequence of
-@code{Bufbyte}s.
+current Mule implementation, an ASCII character is represented by the
+same @code{Bufbyte}, and other characters are represented by a sequence
+of two or more @code{Bufbyte}s.
 
-Without Mule support, a @code{Bufbyte} is equivalent to an
-@code{Emchar}.
+Without Mule support, there are exactly 256 characters, implicitly
+Latin-1, and each character is represented using one @code{Bufbyte}, and
+there is a one-to-one correspondence between @code{Bufbyte}s and
+@code{Emchar}s.
 
 @item Bufpos
 @itemx Charcount
 @cindex Bufpos
 @cindex Charcount
 A @code{Bufpos} represents a character position in a buffer or string.
 A @code{Charcount} represents a number (count) of characters.
 Logically, subtracting two @code{Bufpos} values yields a
 @code{Charcount} value. Although all of these are @code{typedef}ed to
-@code{int}, we use them in preference to @code{int} to make it clear
-what sort of position is being used.
+@code{EMACS_INT}, we use them in preference to @code{EMACS_INT} to make
+it clear what sort of position is being used.
 
 @code{Bufpos} and @code{Charcount} values are the only ones that are
 ever visible to Lisp.
 
 @item Bytind
 @itemx Bytecount
 @cindex Bytind
 @cindex Bytecount
 A @code{Bytind} represents a byte position in a buffer or string. A
-@code{Bytecount} represents the distance between two positions in bytes.
+@code{Bytecount} represents the distance between two positions, in bytes.
 The relationship between @code{Bytind} and @code{Bytecount} is the same
 as the relationship between @code{Bufpos} and @code{Charcount}.
 
 @item Extbyte
 @itemx Extcount
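One character may occupy several Bufbytes, so converting between a character position (Bufpos, Charcount) and a byte position (Bytind, Bytecount) requires scanning the text. The sketch below illustrates this with UTF-8 as a stand-in variable-width encoding. UTF-8 is not the Mule internal encoding. The helpers count_chars and charpos_to_bytepos, and the typedef of EMACS_INT to long, are assumptions made for illustration. The sketch shares one property with the text above: ASCII takes one byte and every other character takes two or more. Offsets here are 0-based within a string, whereas XEmacs buffer positions start at 1.

```c
#include <assert.h>

/* Stand-in typedefs. The text says these types are typedef'ed to
   EMACS_INT, which is assumed here to be long. */
typedef long EMACS_INT;
typedef unsigned char Bufbyte;
typedef EMACS_INT Charcount;
typedef EMACS_INT Bytecount;

/* Stand-in encoding is UTF-8: continuation bytes have the form 10xxxxxx,
   and every other byte begins a new character. */
static int
starts_char (Bufbyte b)
{
  return (b & 0xC0) != 0x80;
}

/* Number of characters in the LEN bytes of text at P. */
static Charcount
count_chars (const Bufbyte *p, Bytecount len)
{
  Charcount n = 0;
  for (Bytecount i = 0; i < len; i++)
    if (starts_char (p[i]))
      n++;
  return n;
}

/* Byte offset at which the character with 0-based character offset CPOS
   begins, found by a linear scan. Returns LEN when CPOS is at or past the
   end of the text. */
static Bytecount
charpos_to_bytepos (const Bufbyte *p, Bytecount len, Charcount cpos)
{
  assert (cpos >= 0);
  Charcount seen = 0;
  for (Bytecount i = 0; i < len; i++)
    if (starts_char (p[i]) && seen++ == cpos)
      return i;
  return len;
}
```

Take the 4-byte string "a\xC3\xA9" "b" (a, e-acute, b). count_chars returns 3, and the character at offset 2 (the b) begins at byte 3. Subtracting two byte offsets therefore gives a Bytecount that can differ from the matching Charcount.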
@@ -2323,14 +2237,14 @@
 learn about them.
 
 @table @code
 @item MAX_EMCHAR_LEN
 @cindex MAX_EMCHAR_LEN
-This preprocessor constant is the maximum number of buffer bytes per
-Emacs character, i.e. the byte length of an @code{Emchar}. It is useful
-when allocating temporary strings to keep a known number of characters.
-For instance:
+This preprocessor constant is the maximum number of buffer bytes to
+represent an Emacs character in the variable width internal encoding.
+It is useful when allocating temporary strings to keep a known number of
+characters. For instance:
 
 @example
 @group
 @{
 Charcount cclen;
@@ -2447,110 +2361,138 @@
 always convert it to an appropriate external encoding, lest the internal
 stuff (such as the infamous \201 characters) leak out.
 
 The interface to conversion between the internal and external
 representations of text are the numerous conversion macros defined in
-@file{buffer.h}. Before looking at them, we'll look at the external
-formats supported by these macros.
-
-Currently meaningful formats are @code{FORMAT_BINARY},
-@code{FORMAT_FILENAME}, @code{FORMAT_OS}, and @code{FORMAT_CTEXT}. Here
-is a description of these.
+@file{buffer.h}. There used to be a fixed set of external formats
+supported by these macros, but now any coding system can be used with
+these macros. The coding system alias mechanism is used to create the
+following logical coding systems, which replace the fixed external
+formats. The (dontusethis-set-symbol-value-handler) mechanism was
+enhanced to make this possible (more work on that is needed - like
+remove the @code{dontusethis-} prefix).
 
 @table @code
-@item FORMAT_BINARY
-Binary format. This is the simplest format and is what we use in the
-absence of a more appropriate format. This converts according to the
-@code{binary} coding system:
+@item Qbinary
+This is the simplest format and is what we use in the absence of a more
+appropriate format. This converts according to the @code{binary} coding
+system:
 
 @enumerate a
 @item
-On input, bytes 0--255 are converted into characters 0--255.
+On input, bytes 0--255 are converted into (implicitly Latin-1)
+characters 0--255. A non-Mule xemacs doesn't really know about
+different character sets and the fonts to display them, so the bytes can
+be treated as text in different 1-byte encodings by simply setting the
+appropriate fonts. So in a sense, non-Mule xemacs is a multi-lingual
+editor if, for example, different fonts are used to display text in
+different buffers, faces, or windows. The specifier mechanism gives the
+user complete control over this kind of behavior.
 @item
 On output, characters 0--255 are converted into bytes 0--255 and other
-characters are converted into `X'.
+characters are converted into `~'.
 @end enumerate
 
-@item FORMAT_FILENAME
-Format used for filenames. In the original Mule, this is user-definable
-with the @code{pathname-coding-system} variable. For the moment, we
-just use the @code{binary} coding system.
+@item Qfile_name
+Format used for filenames. This is user-definable via either the
+@code{file-name-coding-system} or @code{pathname-coding-system} (now
+obsolete) variables.
 
-@item FORMAT_OS
+@item Qnative
 Format used for the external Unix environment---@code{argv[]}, stuff
 from @code{getenv()}, stuff from the @file{/etc/passwd} file, etc.
-
-Perhaps should be the same as FORMAT_FILENAME.
+Currently this is the same as Qfile_name. The two should be
+distinguished for clarity and possible future separation.
 
-@item FORMAT_CTEXT
-Compound--text format. This is the standard X format used for data
+@item Qctext
+Compound--text format. This is the standard X11 format used for data
 stored in properties, selections, and the like. This is an 8-bit
-no-lock-shift ISO2022 coding system.
+no-lock-shift ISO2022 coding system. This is a real coding system,
+unlike Qfile_name, which is user-definable.
 @end table
 
-The macros to convert between these formats and the internal format, and
-vice versa, follow.
+There are two fundamental macros to convert between external and
+internal format.
+
+@code{TO_INTERNAL_FORMAT} converts external data to internal format, and
+@code{TO_EXTERNAL_FORMAT} converts the other way around. The arguments
+each of these receives are a source type, a source, a sink type, a sink,
+and a coding system (or a symbol naming a coding system).
+
+A typical call looks like
+@example
+TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
+@end example
+
+which means that the contents of the lisp string @code{str} are written
+to a malloc'ed memory area which will be pointed to by @code{ptr}, after
+the function returns. The conversion will be done using the
+@code{file-name} coding system, which will be controlled by the user
+indirectly by setting or binding the variable
+@code{file-name-coding-system}.
+
+Some sources and sinks require two C variables to specify. We use some
+preprocessor magic to allow different source and sink types, and even
+different numbers of arguments to specify different types of sources and
+sinks.
+
+So we can have a call that looks like
+@example
+TO_INTERNAL_FORMAT (DATA, (ptr, len),
+MALLOC, (ptr, len),
+coding_system);
+@end example
+
+The parenthesized argument pairs are required to make the preprocessor
+magic work.
+
+Here are the different source and sink types:
 
 @table @code
-@item GET_CHARPTR_INT_DATA_ALLOCA
-@itemx GET_CHARPTR_EXT_DATA_ALLOCA
-These two are the most basic conversion macros.
-@code{GET_CHARPTR_INT_DATA_ALLOCA} converts external data to internal
-format, and @code{GET_CHARPTR_EXT_DATA_ALLOCA} converts the other way
-around. The arguments each of these receives are @var{ptr} (pointer to
-the text in external format), @var{len} (length of texts in bytes),
-@var{fmt} (format of the external text), @var{ptr_out} (lvalue to which
-new text should be copied), and @var{len_out} (lvalue which will be
-assigned the length of the internal text in bytes). The resulting text
-is stored to a stack-allocated buffer. If the text doesn't need
-changing, these macros will do nothing, except for setting
-@var{len_out}.
-
-The macros above take many arguments which makes them unwieldy. For
-this reason, a number of convenience macros are defined with obvious
-functionality, but accepting less arguments. The general rule is that
-macros with @samp{INT} in their name convert text to internal Emacs
-representation, whereas the @samp{EXT} macros convert to external
-representation.
-
-@item GET_C_CHARPTR_INT_DATA_ALLOCA
-@itemx GET_C_CHARPTR_EXT_DATA_ALLOCA
-As their names imply, these macros work on C char pointers, which are
-zero-terminated, and thus do not need @var{len} or @var{len_out}
-parameters.
-
-@item GET_STRING_EXT_DATA_ALLOCA
-@itemx GET_C_STRING_EXT_DATA_ALLOCA
-These two macros convert a Lisp string into an external representation.
-The difference between them is that @code{GET_STRING_EXT_DATA_ALLOCA}
-stores its output to a generic string, providing @var{len_out}, the
-length of the resulting external string. On the other hand,
-@code{GET_C_STRING_EXT_DATA_ALLOCA} assumes that the caller will be
-satisfied with output string being zero-terminated.
-
-Note that for Lisp strings only one conversion direction makes sense.
-
-@item GET_C_CHARPTR_EXT_BINARY_DATA_ALLOCA
-@itemx GET_CHARPTR_EXT_BINARY_DATA_ALLOCA
-@itemx GET_STRING_BINARY_DATA_ALLOCA
-@itemx GET_C_STRING_BINARY_DATA_ALLOCA
-@itemx GET_C_CHARPTR_EXT_FILENAME_DATA_ALLOCA
-@itemx ...
-These macros convert internal text to a specific external
-representation, with the external format being encoded into the name of
-the macro. Note that the @code{GET_STRING_...} and
-@code{GET_C_STRING...} macros lack the @samp{EXT} tag, because they
-only make sense in that direction.
-
-@item GET_C_CHARPTR_INT_BINARY_DATA_ALLOCA
-@itemx GET_CHARPTR_INT_BINARY_DATA_ALLOCA
-@itemx GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA
-@itemx ...
-These macros convert external text of a specific format to its internal
-representation, with the external format being incoded into the name of
-the macro.
+@item @code{DATA, (ptr, len),}
+input data is a fixed buffer of size @var{len} at address @var{ptr}
+@item @code{ALLOCA, (ptr, len),}
+output data is placed in an alloca()ed buffer of size @var{len} pointed to by @var{ptr}
+@item @code{MALLOC, (ptr, len),}
+output data is in a malloc()ed buffer of size @var{len} pointed to by @var{ptr}
+@item @code{C_STRING_ALLOCA, ptr,}
+equivalent to @code{ALLOCA (ptr, len_ignored)} on output.
+@item @code{C_STRING_MALLOC, ptr,}
+equivalent to @code{MALLOC (ptr, len_ignored)} on output
+@item @code{C_STRING, ptr,}
+equivalent to @code{DATA, (ptr, strlen (ptr) + 1)} on input
+@item @code{LISP_STRING, string,}
+input or output is a Lisp_Object of type string
+@item @code{LISP_BUFFER, buffer,}
+output is written to @code{(point)} in lisp buffer @var{buffer}
+@item @code{LISP_LSTREAM, lstream,}
+input or output is a Lisp_Object of type lstream
+@item @code{LISP_OPAQUE, object,}
+input or output is a Lisp_Object of type opaque
 @end table
+
+Often, the data is being converted to a '\0'-byte-terminated string,
+which is the format required by many external system C APIs. For these
+purposes, a source type of @code{C_STRING} or a sink type of
+@code{C_STRING_ALLOCA} or @code{C_STRING_MALLOC} is appropriate.
+Otherwise, we should try to keep XEmacs '\0'-byte-clean, which means
+using (ptr, len) pairs.
+
+The sinks to be specified must be lvalues, unless they are the lisp
+object types @code{LISP_LSTREAM} or @code{LISP_BUFFER}.
+
+For the sink types @code{ALLOCA} and @code{C_STRING_ALLOCA}, the
+resulting text is stored in a stack-allocated buffer, which is
+automatically freed on returning from the function. However, the sink
+types @code{MALLOC} and @code{C_STRING_MALLOC} return @code{xmalloc()}ed
+memory. The caller is responsible for freeing this memory using
+@code{xfree()}.
+
+Note that it doesn't make sense for @code{LISP_STRING} to be a source
+for @code{TO_INTERNAL_FORMAT} or a sink for @code{TO_EXTERNAL_FORMAT}.
+You'll get an assertion failure if you try.
+
 
 @node General Guidelines for Writing Mule-Aware Code, An Example of Mule-Aware Code, Conversion to and from External Data, Coding for Mule
 @subsection General Guidelines for Writing Mule-Aware Code
 
 This section contains some general guidance on how to write Mule-aware
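The text above gives two rules that can be illustrated together. The first is the Qbinary behavior: bytes 0 to 255 decode to characters 0 to 255, and on output any character above 255 becomes '~'. The second is the ownership rule for the MALLOC sink types. This is a hypothetical sketch, not TO_INTERNAL_FORMAT or TO_EXTERNAL_FORMAT. The names binary_decode and binary_encode_malloc are invented, Emchar is assumed to be an int, and plain malloc()/free() stand in for xmalloc()/xfree().

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the XEmacs character type (assumed to be an int). */
typedef int Emchar;

/* Input direction of the binary coding system: byte N becomes character N
   (implicitly Latin-1). OUT must have room for LEN characters. */
static void
binary_decode (const unsigned char *in, size_t len, Emchar *out)
{
  assert (len == 0 || (in != NULL && out != NULL));
  for (size_t i = 0; i < len; i++)
    out[i] = in[i];
}

/* Output direction: characters 0..255 become the byte with the same value,
   and every other character becomes '~'. Like the MALLOC sink types, the
   result is heap memory that the caller must free(). Returns NULL if the
   allocation fails. */
static unsigned char *
binary_encode_malloc (const Emchar *in, size_t len)
{
  assert (len == 0 || in != NULL);
  unsigned char *out = malloc (len > 0 ? len : 1);
  if (out == NULL)
    return NULL;
  for (size_t i = 0; i < len; i++)
    out[i] = (in[i] >= 0 && in[i] <= 255) ? (unsigned char) in[i] : '~';
  return out;
}
```

Encoding the characters h, 0xE9, 0x3042 (an arbitrary value above 255) and i yields the bytes h, 0xE9, '~' and i. The caller then owns the returned buffer and must free() it, just as callers of the MALLOC sinks must xfree() their result. An ALLOCA-style sink would instead use stack memory, which is released automatically when the calling function returns.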
@@ -2575,14 +2517,27 @@
 It is extremely important to always convert external data, because
 XEmacs can crash if unexpected 8bit sequences are copied to its internal
 buffers literally.
 
 This means that when a system function, such as @code{readdir}, returns
-a string, you need to convert it using one of the conversion macros
+a string, you may need to convert it using one of the conversion macros
 described in the previous chapter, before passing it further to Lisp.
-In the case of @code{readdir}, you would use the
-@code{GET_C_CHARPTR_INT_FILENAME_DATA_ALLOCA} macro.
+
+Actually, most of the basic system functions that accept '\0'-terminated
+string arguments, like @code{stat()} and @code{open()}, have been
+@strong{encapsulated} so that they are they @code{always} do internal to
+external conversion themselves. This means you must pass internally
+encoded data, typically the @code{XSTRING_DATA} of a Lisp_String to
+these functions. This is actually a design bug, since it unexpectedly
+changes the semantics of the system functions. A better design would be
+to provide separate versions of these system functions that accepted
+Lisp_Objects which were lisp strings in place of their current
+@code{char *} arguments.
+
+@example
+int stat_lisp (Lisp_Object path, struct stat *buf); /* Implement me */
+@end example
 
 Also note that many internal functions, such as @code{make_string},
 accept Bufbytes, which removes the need for them to convert the data
 they receive. This increases efficiency because that way external data
 needs to be decoded only once, when it is read. After that, it is
2590 @end table 2545 @end table
2591 2546
2592 @node An Example of Mule-Aware Code, , General Guidelines for Writing Mule-Aware Code, Coding for Mule 2547 @node An Example of Mule-Aware Code, , General Guidelines for Writing Mule-Aware Code, Coding for Mule
2593 @subsection An Example of Mule-Aware Code 2548 @subsection An Example of Mule-Aware Code
2594 2549
2595 As an example of Mule-aware code, we shall will analyze the 2550 As an example of Mule-aware code, we will analyze the @code{string}
2596 @code{string} function, which conses up a Lisp string from the character 2551 function, which conses up a Lisp string from the character arguments it
2597 arguments it receives. Here is the definition, pasted from 2552 receives. Here is the definition, pasted from @code{alloc.c}:
2598 @code{alloc.c}:
2599 2553
2600 @example 2554 @example
2601 @group 2555 @group
2602 DEFUN ("string", Fstring, 0, MANY, 0, /* 2556 DEFUN ("string", Fstring, 0, MANY, 0, /*
2603 Concatenate all the argument characters and make the result a string. 2557 Concatenate all the argument characters and make the result a string.
2706 2660
2707 @item 2661 @item
2708 Generated header files should be included using the @code{#include <...>} syntax, 2662 Generated header files should be included using the @code{#include <...>} syntax,
2709 not the @code{#include "..."} syntax. The generated headers are: 2663 not the @code{#include "..."} syntax. The generated headers are:
2710 2664
2711 @file{config.h puresize-adjust.h sheap-adjust.h paths.h Emacs.ad.h} 2665 @file{config.h sheap-adjust.h paths.h Emacs.ad.h}
2712 2666
2713 The basic rule is that you should assume builds using @code{--srcdir} 2667 The basic rule is that you should assume builds using @code{--srcdir}
2714 and the @code{#include <...>} syntax needs to be used when the 2668 and the @code{#include <...>} syntax needs to be used when the
2715 to-be-included generated file is in a potentially different directory 2669 to-be-included generated file is in a potentially different directory
2716 @emph{at compile time}. The non-obvious C rule is that @code{#include "..."} 2670 @emph{at compile time}. The non-obvious C rule is that @code{#include "..."}
2739 @example 2693 @example
2740 make check 2694 make check
2741 @end example 2695 @end example
2742 @end itemize 2696 @end itemize
2743 2697
2698 Here is a checklist of things to do when creating a new Lisp object type
2699 named @var{foo}:
2700
2701 @enumerate
2702 @item
2703 create @var{foo}.h
2704 @item
2705 create @var{foo}.c
2706 @item
2707 add definitions of syms_of_@var{foo}, etc. to @var{foo}.c
2708 @item
2709 add declarations of syms_of_@var{foo}, etc. to symsinit.h
2710 @item
2711 add calls to syms_of_@var{foo}, etc. to emacs.c(main_1)
2712 @item
2713 add definitions of macros like CHECK_FOO and FOOP to @var{foo}.h
2714 @item
2715 add the new type index to enum lrecord_type
2716 @item
2717 add DEFINE_LRECORD_IMPLEMENTATION call to @var{foo}.c
2718 @end enumerate
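To make the type-index and predicate steps more concrete, the following standalone mock shows the pattern they follow: every record starts with a header holding a type index, and the predicate simply compares that index. All names with a @code{mock_} or @code{MOCK_} prefix are hypothetical. The real macros are built on the definitions in @file{lrecord.h} and are not reproduced here, and the error counter merely stands in for signalling a wrong-type error.

```c
#include <assert.h>
#include <stddef.h>

/* Mock version of "add the new type index to enum lrecord_type". */
enum mock_type_index
{
  mock_type_cons,
  mock_type_string,
  mock_type_foo,                        /* the newly added type */
  mock_type_count
};

/* Mock header: the first member of every record names its type. */
struct mock_type_header
{
  unsigned char type;
};

/* The new type's C structure, as it might appear in foo.h. */
struct Lisp_Foo
{
  struct mock_type_header header;       /* must come first */
  int some_slot;                        /* hypothetical payload */
};

/* Mock predicate in the spirit of FOOP. */
#define MOCK_FOOP(hdr) ((hdr) != NULL && (hdr)->type == mock_type_foo)

/* Counts wrong-type "errors" instead of signalling them. */
static int mock_wrong_type_count;

/* Mock checker in the spirit of CHECK_FOO. */
#define MOCK_CHECK_FOO(hdr) \
  do { if (!MOCK_FOOP (hdr)) mock_wrong_type_count++; } while (0)

/* Mock accessor in the spirit of XFOO: only valid after a type check.
   The cast is safe because the header is the first member. */
static struct Lisp_Foo *
mock_xfoo (struct mock_type_header *hdr)
{
  assert (MOCK_FOOP (hdr));
  return (struct Lisp_Foo *) hdr;
}
```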
2744 2719
2745 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top 2720 @node A Summary of the Various XEmacs Modules, Allocation of Objects in XEmacs Lisp, Rules When Writing New C Code, Top
2746 @chapter A Summary of the Various XEmacs Modules 2721 @chapter A Summary of the Various XEmacs Modules
2747 2722
2748 This is accurate as of XEmacs 20.0. 2723 This is accurate as of XEmacs 20.0.
3037 3012
3038 3013
3039 3014
3040 @example 3015 @example
3041 alloc.c 3016 alloc.c
3042 pure.c
3043 puresize.h
3044 @end example 3017 @end example
3045 3018
3046 The large module @file{alloc.c} implements all of the basic allocation and 3019 The large module @file{alloc.c} implements all of the basic allocation and
3047 garbage collection for Lisp objects. The most commonly used Lisp 3020 garbage collection for Lisp objects. The most commonly used Lisp
3048 objects are allocated in chunks, similar to the Blocktype data type 3021 objects are allocated in chunks, similar to the Blocktype data type
3063 Because the different subsystems are divided into general and specific 3036 Because the different subsystems are divided into general and specific
3064 code, adding a new subtype within a subsystem will in general not 3037 code, adding a new subtype within a subsystem will in general not
3065 require changes to the generic subsystem code or affect any of the other 3038 require changes to the generic subsystem code or affect any of the other
3066 subtypes in the subsystem; this provides a great deal of robustness to 3039 subtypes in the subsystem; this provides a great deal of robustness to
3067 the XEmacs code. 3040 the XEmacs code.
3068
3069 @cindex pure space
3070 @file{pure.c} contains the declaration of the @dfn{purespace} array.
3071 Pure space is a hack used to place some constant Lisp data into the code
3072 segment of the XEmacs executable, even though the data needs to be
3073 initialized through function calls. (See above in section VIII for more
3074 info about this.) During startup, certain sorts of data is
3075 automatically copied into pure space, and other data is copied manually
3076 in some of the basic Lisp files by calling the function @code{purecopy},
3077 which copies the object if possible (this only works in temacs, of
3078 course) and returns the new object. In particular, while temacs is
3079 executing, the Lisp reader automatically copies all compiled-function
3080 objects that it reads into pure space. Since compiled-function objects
3081 are large, are never modified, and typically comprise the majority of
3082 the contents of a compiled-Lisp file, this works well. While XEmacs is
3083 running, any attempt to modify an object that resides in pure space
3084 causes an error. Objects in pure space are never garbage collected --
3085 almost all of the time, they're intended to be permanent, and in any
3086 case you can't write into pure space to set the mark bits.
3087
3088 @file{puresize.h} contains the declaration of the size of the pure space
3089 array. This depends on the optional features that are compiled in, any
3090 extra purespace requested by the user at compile time, and certain other
3091 factors (e.g. 64-bit machines need more pure space because their Lisp
3092 objects are larger). The smallest size that suffices should be used, so
3093 that there's no wasted space. If there's not enough pure space, you
3094 will get an error during the build process, specifying how much more
3095 pure space is needed.
3096
3097 3041
3098 3042
3099 @example 3043 @example
3100 eval.c 3044 eval.c
3101 backtrace.h 3045 backtrace.h
4416 * Garbage Collection - Step by Step:: 4360 * Garbage Collection - Step by Step::
4417 * Integers and Characters:: 4361 * Integers and Characters::
4418 * Allocation from Frob Blocks:: 4362 * Allocation from Frob Blocks::
4419 * lrecords:: 4363 * lrecords::
4420 * Low-level allocation:: 4364 * Low-level allocation::
4421 * Pure Space::
4422 * Cons:: 4365 * Cons::
4423 * Vector:: 4366 * Vector::
4424 * Bit Vector:: 4367 * Bit Vector::
4425 * Symbol:: 4368 * Symbol::
4426 * Marker:: 4369 * Marker::
4447 symbols, the primitives are @code{make-symbol} and @code{intern}; etc. 4390 symbols, the primitives are @code{make-symbol} and @code{intern}; etc.
4448 Some Lisp objects, especially those that are primarily used internally, 4391 Some Lisp objects, especially those that are primarily used internally,
4449 have no corresponding Lisp primitives. Every Lisp object, though, 4392 have no corresponding Lisp primitives. Every Lisp object, though,
4450 has at least one C primitive for creating it. 4393 has at least one C primitive for creating it.
4451 4394
4452 Recall from section (VII) that a Lisp object, as stored in a 32-bit 4395 Recall from section (VII) that a Lisp object, as stored in a 32-bit or
4453 or 64-bit word, has a mark bit, a few tag bits, and a ``value'' that 4396 64-bit word, has a few tag bits, and a ``value'' that occupies the
4454 occupies the remainder of the bits. We can separate the different 4397 remainder of the bits. We can separate the different Lisp object types
4455 Lisp object types into four broad categories: 4398 into three broad categories:
4456 4399
4457 @itemize @bullet 4400 @itemize @bullet
4458 @item 4401 @item
4459 (a) Those for whom the value directly represents the contents of the 4402 (a) Those for whom the value directly represents the contents of the
4460 Lisp object. Only two types are in this category: integers and 4403 Lisp object. Only two types are in this category: integers and
4461 characters. No special allocation or garbage collection is necessary 4404 characters. No special allocation or garbage collection is necessary
4462 for such objects. Lisp objects of these types do not need to be 4405 for such objects. Lisp objects of these types do not need to be
4463 @code{GCPRO}ed. 4406 @code{GCPRO}ed.
4464 @end itemize 4407 @end itemize
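Here is a minimal sketch of category (a). It assumes a hypothetical layout with two low tag bits; the real tag assignments are defined in the XEmacs headers and are not reproduced here. The sketch shows why no allocation is needed: the integer or character is stored in the word itself. Record pointers, by contrast, rely on alignment to leave their low bits free for the tag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low 2 bits are the tag, the rest is the value. */
typedef intptr_t mock_lisp_object;

enum { MOCK_TAG_BITS = 2, MOCK_TAG_MASK = (1 << MOCK_TAG_BITS) - 1 };
enum mock_tag { MOCK_TAG_RECORD = 0, MOCK_TAG_INT = 1, MOCK_TAG_CHAR = 2 };

static int
mock_tag_of (mock_lisp_object obj)
{
  return (int) (obj & MOCK_TAG_MASK);
}

/* Category (a): the value *is* the object, so nothing is allocated.
   Multiplying by 4 acts as a left shift by MOCK_TAG_BITS without the
   undefined behaviour of shifting negative numbers.  The caller must
   keep n within the range reduced by the tag bits. */
static mock_lisp_object
mock_make_int (intptr_t n)
{
  return n * (1 << MOCK_TAG_BITS) + MOCK_TAG_INT;
}

static intptr_t
mock_xint (mock_lisp_object obj)
{
  assert (mock_tag_of (obj) == MOCK_TAG_INT);
  return (obj - MOCK_TAG_INT) / (1 << MOCK_TAG_BITS);   /* exact division */
}

static mock_lisp_object
mock_make_char (uint32_t ch)
{
  return (mock_lisp_object) ch * (1 << MOCK_TAG_BITS) + MOCK_TAG_CHAR;
}

static uint32_t
mock_xchar (mock_lisp_object obj)
{
  assert (mock_tag_of (obj) == MOCK_TAG_CHAR);
  return (uint32_t) ((obj - MOCK_TAG_CHAR) / (1 << MOCK_TAG_BITS));
}

/* Other categories: the value is an aligned pointer to a structure,
   whose low bits are zero and therefore free to hold the tag. */
static mock_lisp_object
mock_make_record (void *p)
{
  mock_lisp_object obj = (mock_lisp_object) p;

  assert ((obj & MOCK_TAG_MASK) == 0);
  return obj | MOCK_TAG_RECORD;
}
```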
4465 4408
4466 In the remaining three categories, the value is a pointer to a
4467 structure.
4468
4469 @itemize @bullet
4470 @item
4471 @cindex frob block
4472 (b) Those for whom the tag directly specifies the type. Recall that
4473 there are only three tag bits; this means that at most five types can be
4474 specified this way. The most commonly-used types are stored in this
4475 format; this includes conses, strings, vectors, and sometimes symbols.
4476 With the exception of vectors, objects in this category are allocated in
4477 @dfn{frob blocks}, i.e. large blocks of memory that are subdivided into
4478 individual objects. This saves a lot on malloc overhead, since there
4479 are typically quite a lot of these objects around, and the objects are
4480 small. (A cons, for example, occupies 8 bytes on 32-bit machines---4
4481 bytes for each of the two objects it contains.) Vectors are individually
4482 @code{malloc()}ed since they are of variable size. (It would be
4483 possible, and desirable, to allocate vectors of certain small sizes out
4484 of frob blocks, but it isn't currently done.) Strings are handled
4485 specially: Each string is allocated in two parts, a fixed size structure
4486 containing a length and a data pointer, and the actual data of the
4487 string. The former structure is allocated in frob blocks as usual, and
4488 the latter data is stored in @dfn{string chars blocks} and is relocated
4489 during garbage collection to eliminate holes.
4490 @end itemize
4491
4492 In the remaining two categories, the type is stored in the object 4409 In the remaining two categories, the type is stored in the object
4493 itself. The tag for all such objects is the generic @dfn{lrecord} 4410 itself. The tag for all such objects is the generic @dfn{lrecord}
4494 (Lisp_Record) tag. The first four bytes (or eight, for 64-bit machines) 4411 (Lisp_Type_Record) tag. The first bytes of the object's structure are an
4495 of the object's structure are a pointer to a structure that describes 4412 integer (actually a char) characterising the object's type and some
4496 the object's type, which includes method pointers and a pointer to a 4413 flags, in particular the mark bit used for garbage collection. A
4497 string naming the type. Note that it's possible to save some space by 4414 structure describing the type is accessible thru the
4498 using a one- or two-byte tag, rather than a four- or eight-byte pointer 4415 lrecord_implementation_table indexed with said integer. This structure
4499 to store the type, but it's not clear it's worth making the change. 4416 includes the method pointers and a pointer to a string naming the type.
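The following simplified model illustrates the header-plus-table scheme just described. All @code{mock_} names are hypothetical. A small type number selects a constant implementation structure, and the mark bit is an independent flag, so in this sketch setting or clearing it never disturbs the type number. The real @code{struct lrecord_header} and @code{struct lrecord_implementation} contain more fields.

```c
#include <assert.h>
#include <stddef.h>

enum mock_lrecord_type
{
  mock_lrecord_type_float,
  mock_lrecord_type_marker,
  mock_lrecord_type_count
};

/* Hypothetical simplified header: a small type number plus flags. */
struct mock_lrecord_header
{
  unsigned int type : 8;                /* index into the table below */
  unsigned int mark : 1;                /* set by the garbage collector */
};

/* One constant, statically declared entry per type. */
struct mock_lrecord_implementation
{
  const char *name;                     /* string naming the type */
  void (*finalizer) (void *obj);        /* example method; unused here */
};

static const struct mock_lrecord_implementation
mock_lrecord_implementation_table[mock_lrecord_type_count] =
{
  { "float", NULL },
  { "marker", NULL }
};

/* Look up the type's implementation through the table. */
static const struct mock_lrecord_implementation *
mock_implementation (const struct mock_lrecord_header *h)
{
  assert (h->type < mock_lrecord_type_count);
  return &mock_lrecord_implementation_table[h->type];
}

/* Marking only touches the flag bit. */
static void mock_mark (struct mock_lrecord_header *h) { h->mark = 1; }
static void mock_unmark (struct mock_lrecord_header *h) { h->mark = 0; }
static int mock_marked_p (const struct mock_lrecord_header *h) { return h->mark; }
```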
4500 4417
4501 @itemize @bullet 4418 @itemize @bullet
4502 @item 4419 @item
4503 (c) Those lrecords that are allocated in frob blocks (see above). This 4420 (b) Those lrecords that are allocated in frob blocks (see above). This
4504 includes the objects that are most common and relatively small, and 4421 includes the objects that are most common and relatively small, and
4505 includes floats, compiled functions, symbols (when not in category (b)), 4422 includes conses, strings, subrs, floats, compiled functions, symbols,
4506 extents, events, and markers. With the cleanup of frob blocks done in 4423 extents, events, and markers. With the cleanup of frob blocks done in
4507 19.12, it's not terribly hard to add more objects to this category, but 4424 19.12, it's not terribly hard to add more objects to this category, but
4508 it's a bit trickier than adding an object type to type (d) (esp. if the 4425 it's a bit trickier than adding an object type to type (c) (esp. if the
4509 object needs a finalization method), and is not likely to save much 4426 object needs a finalization method), and is not likely to save much
4510 space unless the object is small and there are many of them. (In fact, 4427 space unless the object is small and there are many of them. (In fact,
4511 if there are very few of them, it might actually waste space.) 4428 if there are very few of them, it might actually waste space.)
4512 @item 4429 @item
4513 (d) Those lrecords that are individually @code{malloc()}ed. These are 4430 (c) Those lrecords that are individually @code{malloc()}ed. These are
4514 called @dfn{lcrecords}. All other types are in this category. Adding a 4431 called @dfn{lcrecords}. All other types are in this category. Adding a
4515 new type to this category is comparatively easy, and all types added 4432 new type to this category is comparatively easy, and all types added
4516 since 19.8 (when the current allocation scheme was devised, by Richard 4433 since 19.8 (when the current allocation scheme was devised, by Richard
4517 Mlynarik), with the exception of the character type, have been in this 4434 Mlynarik), with the exception of the character type, have been in this
4518 category. 4435 category.
4519 @end itemize 4436 @end itemize
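Frob-block allocation, as used for category (b), can be sketched as follows. This is a simplification built on assumptions, not the code in @file{alloc.c}. One @code{malloc()} provides room for many small objects, and each allocation takes the next unused slot. Objects found unmarked during a sweep go onto a free list for reuse. The block size and the @code{next_free} field are hypothetical; a real implementation would avoid spending a whole field on the free-list link.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical small object allocated from frob blocks. */
struct mock_float
{
  struct mock_float *next_free;         /* used only while on the free list */
  double value;
};

#define MOCK_OBJECTS_PER_BLOCK 128      /* arbitrary for this sketch */

struct mock_float_block
{
  struct mock_float_block *prev;        /* chain of all blocks, for sweeping */
  struct mock_float objects[MOCK_OBJECTS_PER_BLOCK];
};

static struct mock_float_block *current_block;
static int current_block_used = MOCK_OBJECTS_PER_BLOCK; /* forces first malloc */
static struct mock_float *free_list;
static int blocks_allocated;

static struct mock_float *
mock_alloc_float (double value)
{
  struct mock_float *f;

  if (free_list)                        /* reuse a slot freed by a sweep */
    {
      f = free_list;
      free_list = f->next_free;
    }
  else
    {
      if (current_block_used == MOCK_OBJECTS_PER_BLOCK)
        {
          /* One malloc () for many small objects: this is the saving. */
          struct mock_float_block *b = malloc (sizeof *b);
          if (!b)
            abort ();
          b->prev = current_block;
          current_block = b;
          current_block_used = 0;
          blocks_allocated++;
        }
      f = &current_block->objects[current_block_used++];
    }
  f->next_free = NULL;
  f->value = value;
  return f;
}

/* What a sweep would do with an unmarked object. */
static void
mock_free_float (struct mock_float *f)
{
  assert (f != NULL);
  f->next_free = free_list;
  free_list = f;
}
```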
4520 4437
4521 Note that bit vectors are a bit of a special case. They are 4438 Note that bit vectors are a bit of a special case. They are
4522 simple lrecords as in category (c), but are individually @code{malloc()}ed 4439 simple lrecords as in category (b), but are individually @code{malloc()}ed
4523 like vectors. You can basically view them as exactly like vectors 4440 like vectors. You can basically view them as exactly like vectors
4524 except that their type is stored in lrecord fashion rather than 4441 except that their type is stored in lrecord fashion rather than
4525 in directly-tagged fashion. 4442 in directly-tagged fashion.
4526 4443
4527 Note that FSF Emacs redesigned their object system in 19.29 to follow
4528 a similar scheme. However, given RMS's expressed dislike for data
4529 abstraction, the FSF scheme is not nearly as clean or as easy to
4530 extend. (FSF calls items of type (c) @code{Lisp_Misc} and items of type
4531 (d) @code{Lisp_Vectorlike}, with separate tags for each, although
4532 @code{Lisp_Vectorlike} is also used for vectors.)
4533 4444
4534 @node Garbage Collection, GCPROing, Introduction to Allocation, Allocation of Objects in XEmacs Lisp 4445 @node Garbage Collection, GCPROing, Introduction to Allocation, Allocation of Objects in XEmacs Lisp
4535 @section Garbage Collection 4446 @section Garbage Collection
4536 @cindex garbage collection 4447 @cindex garbage collection
4537 4448
4547 that ``all of memory'' means all currently allocated objects. 4458 that ``all of memory'' means all currently allocated objects.
4548 Traversing all these objects means traversing all frob blocks, 4459 Traversing all these objects means traversing all frob blocks,
4549 all vectors (which are chained in one big list), and all 4460 all vectors (which are chained in one big list), and all
4550 lcrecords (which are likewise chained). 4461 lcrecords (which are likewise chained).
4551 4462
4552 Note that, when an object is marked, the mark has to occur 4463 Garbage collection can be invoked explicitly by calling
4553 inside of the object's structure, rather than in the 32-bit 4464 @code{garbage-collect} but is also called automatically by @code{eval},
4554 @code{Lisp_Object} holding the object's pointer; i.e. you can't just 4465 once a certain amount of memory has been allocated since the last
4555 set the pointer's mark bit. This is because there may be many 4466 garbage collection (according to @code{gc-cons-threshold}).
4556 pointers to the same object. This means that the method of 4467
4557 marking an object can differ depending on the type. The
4558 different marking methods are approximately as follows:
4559
4560 @enumerate
4561 @item
4562 For conses, the mark bit of the car is set.
4563 @item
4564 For strings, the mark bit of the string's plist is set.
4565 @item
4566 For symbols when not lrecords, the mark bit of the
4567 symbol's plist is set.
4568 @item
4569 For vectors, the length is negated after adding 1.
4570 @item
4571 For lrecords, the pointer to the structure describing
4572 the type is changed (see below).
4573 @item
4574 Integers and characters do not need to be marked, since
4575 no allocation occurs for them.
4576 @end enumerate
4577
4578 The details of this are in the @code{mark_object()} function.
4579
4580 Note that any code that operates during garbage collection has
4581 to be especially careful because of the fact that some objects
4582 may be marked and as such may not look like they normally do.
4583 In particular:
4584
4585 @itemize @bullet
4586 Some object pointers may have their mark bit set. This will make
4587 @code{FOOBARP()} predicates fail. Use @code{GC_FOOBARP()} to deal with
4588 this.
4589 @item
4590 Even if you clear the mark bit, @code{FOOBARP()} will still fail
4591 for lrecords because the implementation pointer has been
4592 changed (see below). @code{GC_FOOBARP()} will correctly deal with
4593 this.
4594 @item
4595 Vectors have their size field munged, so anything that
4596 looks at this field will fail.
4597 @item
4598 Note that @code{XFOOBAR()} macros @emph{will} work correctly on object
4599 pointers with their mark bit set, because the logical shift operations
4600 that remove the tag also remove the mark bit.
4601 @end itemize
4602
4603 Finally, note that garbage collection can be invoked explicitly
4604 by calling @code{garbage-collect} but is also called automatically
4605 by @code{eval}, once a certain amount of memory has been allocated
4606 since the last garbage collection (according to @code{gc-cons-threshold}).
4607 4468
4608 @node GCPROing, Garbage Collection - Step by Step, Garbage Collection, Allocation of Objects in XEmacs Lisp 4469 @node GCPROing, Garbage Collection - Step by Step, Garbage Collection, Allocation of Objects in XEmacs Lisp
4609 @section @code{GCPRO}ing 4470 @section @code{GCPRO}ing
4610 4471
4611 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs 4472 @code{GCPRO}ing is one of the ugliest and trickiest parts of Emacs
4614 other from one of the roots of accessibility. The roots 4475 other from one of the roots of accessibility. The roots
4615 of accessibility are: 4476 of accessibility are:
4616 4477
4617 @enumerate 4478 @enumerate
4618 @item 4479 @item
4619 All objects that have been @code{staticpro()}d. This is used for 4480 All objects that have been @code{staticpro()}d or
4620 any global C variables that hold Lisp objects. A call to 4481 @code{staticpro_nodump()}ed. This is used for any global C variables
4621 @code{staticpro()} happens implicitly as a result of any symbols 4482 that hold Lisp objects. A call to @code{staticpro()} happens implicitly
4622 declared with @code{defsymbol()} and any variables declared with 4483 as a result of any symbols declared with @code{defsymbol()} and any
4623 @code{DEFVAR_FOO()}. You need to explicitly call @code{staticpro()} 4484 variables declared with @code{DEFVAR_FOO()}. You need to explicitly
4624 (in the @code{vars_of_foo()} method of a module) for other global 4485 call @code{staticpro()} (in the @code{vars_of_foo()} method of a module)
4625 C variables holding Lisp objects. (This typically includes 4486 for other global C variables holding Lisp objects. (This typically
4626 internal lists and such things.) 4487 includes internal lists and such things.) Use
4488 @code{staticpro_nodump()} only in the rare cases when you do not want
4489 the pointed-to variable to be saved at dump time but rather recomputed at
4490 startup.
4627 4491
4628 Note that @code{obarray} is one of the @code{staticpro()}d things. 4492 Note that @code{obarray} is one of the @code{staticpro()}d things.
4629 Therefore, all functions and variables get marked through this. 4493 Therefore, all functions and variables get marked through this.
4630 @item 4494 @item
4631 Any shadowed bindings that are sitting on the @code{specpdl} stack. 4495 Any shadowed bindings that are sitting on the @code{specpdl} stack.
4820 @code{Feval}. 4684 @code{Feval}.
4821 @end enumerate 4685 @end enumerate
4822 4686
4823 The upshot is that garbage collection can basically occur wherever 4687 The upshot is that garbage collection can basically occur wherever
4824 @code{Feval} or @code{Ffuncall} is used, either directly or 4688 @code{Feval} or @code{Ffuncall} is used, either directly or
4825 through another function. Since calls to these two functions are 4689 through another function. Since calls to these two functions are hidden
4826 hidden in various other functions, many calls to 4690 in various other functions, many calls to @code{garbage_collect_1} are
4827 @code{garabge_collect_1} are not obviously foreseeable, and therefore 4691 not obviously foreseeable, and therefore unexpected. Instances where
4828 unexpected. Instances where they are used that are worth remembering are 4692 they are used that are worth remembering are various elisp commands, as
4829 various elisp commands, as for example @code{or}, 4693 for example @code{or}, @code{and}, @code{if}, @code{cond}, @code{while},
4830 @code{and}, @code{if}, @code{cond}, @code{while}, @code{setq}, etc., 4694 @code{setq}, etc., miscellaneous @code{gui_item_...} functions,
4831 miscellaneous @code{gui_item_...} functions, everything related to 4695 everything related to @code{eval} (@code{Feval_buffer}, @code{call0},
4832 @code{eval} (@code{Feval_buffer}, @code{call0}, ...) and inside 4696 ...) and inside @code{Fsignal}. The latter is used to handle signals, as
4833 @code{Fsignal}. The latter is used to handle signals, as for example the 4697 for example the ones raised by every @code{QUIT}-macro triggered after
4834 ones raised by every @code{QUIT}-macro triggered after pressing Ctrl-g. 4698 pressing Ctrl-g.
4835 4699
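Because any of these calls may collect garbage, C code must first register its live local Lisp objects as roots, which is what @code{GCPRO}ing does. The sketch below is modeled loosely on the @code{GCPRO1}/@code{UNGCPRO} pattern: a linked list of records on the C stack that the collector walks. The @code{mock_} names, the object type, and the root-counting function that stands in for @code{Feval} are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef long mock_obj;                  /* hypothetical object type */

/* One record per protected group of locals; it lives on the C stack. */
struct mock_gcpro
{
  struct mock_gcpro *next;
  mock_obj *var;
  int nvars;
};

static struct mock_gcpro *mock_gcprolist;   /* innermost protection first */

/* Modeled loosely on GCPRO1/UNGCPRO; these are not the real macros. */
#define MOCK_GCPRO1(v)                                          \
  struct mock_gcpro mock_gcpro1 = { mock_gcprolist, &(v), 1 };  \
  mock_gcprolist = &mock_gcpro1
#define MOCK_UNGCPRO() (mock_gcprolist = mock_gcpro1.next)

/* What the collector would do with the list: treat every registered
   variable as a root.  Here we only count the roots found. */
static int
mock_count_gcpro_roots (void)
{
  int n = 0;
  struct mock_gcpro *p;

  for (p = mock_gcprolist; p; p = p->next)
    n += p->nvars;
  return n;
}

/* A function that may trigger GC must protect its live locals first. */
static int
mock_function_calling_eval (mock_obj arg)
{
  int roots_during_call;

  MOCK_GCPRO1 (arg);
  roots_during_call = mock_count_gcpro_roots ();  /* stands in for Feval */
  MOCK_UNGCPRO ();
  return roots_during_call;
}
```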
4836 @node garbage_collect_1, mark_object, Invocation, Garbage Collection - Step by Step 4700 @node garbage_collect_1, mark_object, Invocation, Garbage Collection - Step by Step
4837 @subsection @code{garbage_collect_1} 4701 @subsection @code{garbage_collect_1}
4838 @cindex @code{garbage_collect_1} 4702 @cindex @code{garbage_collect_1}
4839 4703
4850 @item 4714 @item
4851 Next the correct frame in which to put 4715 Next the correct frame in which to put
4852 all the output occurring during garbage collecting is determined. In 4716 all the output occurring during garbage collecting is determined. In
4853 order to be able to restore the old display's state after displaying the 4717 order to be able to restore the old display's state after displaying the
4854 message, some data about the current cursor position has to be 4718 message, some data about the current cursor position has to be
4855 saved. The variables @code{pre_gc_curser} and @code{cursor_changed} take 4719 saved. The variables @code{pre_gc_cursor} and @code{cursor_changed} take
4856 care of that. 4720 care of that.
4857 @item 4721 @item
4858 The state of @code{gc_currently_forbidden} must be restored after 4722 The state of @code{gc_currently_forbidden} must be restored after
4859 the garbage collection, no matter what happens during the process. We 4723 the garbage collection, no matter what happens during the process. We
4860 accomplish this by @code{record_unwind_protect}ing the suitable function 4724 accomplish this by @code{record_unwind_protect}ing the suitable function
4993 @code{Vall_weak_lists} immediately. A marked list is treated more 4857 @code{Vall_weak_lists} immediately. A marked list is treated more
4994 carefully by going over it and removing just the unmarked pairs. 4858 carefully by going over it and removing just the unmarked pairs.
4995 4859
4996 @item 4860 @item
4997 The function @code{prune_specifiers} checks all listed specifiers held 4861 The function @code{prune_specifiers} checks all listed specifiers held
4998 in @code{Vall_speficiers} and removes the ones from the lists that are 4862 in @code{Vall_specifiers} and removes the ones from the lists that are
4999 unmarked. 4863 unmarked.
5000 4864
5001 @item 4865 @item
5002 All syntax tables are stored in a list called 4866 All syntax tables are stored in a list called
5003 @code{Vall_syntax_tables}. The function @code{prune_syntax_tables} walks 4867 @code{Vall_syntax_tables}. The function @code{prune_syntax_tables} walks
5299 @section lrecords 5163 @section lrecords
5300 5164
5301 [see @file{lrecord.h}] 5165 [see @file{lrecord.h}]
5302 5166
5303 All lrecords have at the beginning of their structure a @code{struct 5167 All lrecords have at the beginning of their structure a @code{struct
5304 lrecord_header}. This just contains a pointer to a @code{struct 5168 lrecord_header}. This just contains a type number and some flags,
5169 including the mark bit. The type number, through the
5170 @code{lrecord_implementation_table}, gives access to a @code{struct
5305 lrecord_implementation}, which is a structure containing method pointers 5171 lrecord_implementation}, which is a structure containing method pointers
5306 and such. There is one of these for each type, and it is a global, 5172 and such. There is one of these for each type, and it is a global,
5307 constant, statically-declared structure that is declared in the 5173 constant, statically-declared structure that is declared in the
5308 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro. (This macro actually 5174 @code{DEFINE_LRECORD_IMPLEMENTATION()} macro.
5309 declares an array of two @code{struct lrecord_implementation} 5175
5310 structures. The first one contains all the standard method pointers, 5176 Simple lrecords (of type (b) above) just have a @code{struct
5311 and is used in all normal circumstances. During garbage collection,
5312 however, the lrecord is @dfn{marked} by bumping its implementation
5313 pointer by one, so that it points to the second structure in the array.
5314 This structure contains a special indication in it that it's a
5315 @dfn{marked-object} structure: the finalize method is the special
5316 function @code{this_marks_a_marked_record()}, and all other methods are
5317 null pointers. At the end of garbage collection, all lrecords will
5318 either be reclaimed or unmarked by decrementing their implementation
5319 pointers, so this second structure pointer will never remain past
5320 garbage collection.
5321
5322 Simple lrecords (of type (c) above) just have a @code{struct
5323 lrecord_header} at their beginning. lcrecords, however, actually have a 5177 lrecord_header} at their beginning. lcrecords, however, actually have a
5324 @code{struct lcrecord_header}. This, in turn, has a @code{struct 5178 @code{struct lcrecord_header}. This, in turn, has a @code{struct
5325 lrecord_header} at its beginning, so sanity is preserved; but it also 5179 lrecord_header} at its beginning, so sanity is preserved; but it also
5326 has a pointer used to chain all lcrecords together, and a special ID 5180 has a pointer used to chain all lcrecords together, and a special ID
5327 field used to distinguish one lcrecord from another. (This field is used 5181 field used to distinguish one lcrecord from another. (This field is used
5532 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro.) This should 5386 (i.e. declared with a @code{_SEQUENCE_IMPLEMENTATION} macro.) This should
5533 simply return the object's size in bytes, exactly as you might expect. 5387 simply return the object's size in bytes, exactly as you might expect.
5534 For an example, see the methods for window configurations and opaques. 5388 For an example, see the methods for window configurations and opaques.
5535 @end enumerate 5389 @end enumerate
5536 5390
5537 @node Low-level allocation, Pure Space, lrecords, Allocation of Objects in XEmacs Lisp 5391 @node Low-level allocation, Cons, lrecords, Allocation of Objects in XEmacs Lisp
5538 @section Low-level allocation 5392 @section Low-level allocation
5539 5393
5540 Memory that you want to allocate directly should be allocated using 5394 Memory that you want to allocate directly should be allocated using
5541 @code{xmalloc()} rather than @code{malloc()}. This implements 5395 @code{xmalloc()} rather than @code{malloc()}. This implements
5542 error-checking on the return value, and once upon a time did some more 5396 error-checking on the return value, and once upon a time did some more
5604 routines. These routines also call @code{INCREMENT_CONS_COUNTER()} at the 5458 routines. These routines also call @code{INCREMENT_CONS_COUNTER()} at the
5605 appropriate times; this keeps statistics on how much memory is 5459 appropriate times; this keeps statistics on how much memory is
5606 allocated, so that garbage-collection can be invoked when the 5460 allocated, so that garbage-collection can be invoked when the
5607 threshold is reached. 5461 threshold is reached.
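This bookkeeping can be modeled as follows, assuming a single byte counter and a fixed threshold. The names are hypothetical stand-ins for @code{INCREMENT_CONS_COUNTER()} and @code{gc-cons-threshold}. Allocation only records how much was consed, and the collector runs when a later check, such as the one made by @code{eval}, finds the threshold exceeded.

```c
#include <assert.h>
#include <stdlib.h>

static size_t mock_consing_since_gc;            /* bytes since last GC */
static size_t mock_gc_cons_threshold = 1024;    /* cf. gc-cons-threshold */
static int mock_gc_count;

#define MOCK_INCREMENT_CONS_COUNTER(size) (mock_consing_since_gc += (size))

static void
mock_garbage_collect (void)
{
  /* Marking and sweeping would happen here. */
  mock_gc_count++;
  mock_consing_since_gc = 0;
}

/* Allocation only records how much was consed... */
static void *
mock_alloc (size_t size)
{
  void *p;

  assert (size > 0);
  p = malloc (size);
  if (!p)
    abort ();
  MOCK_INCREMENT_CONS_COUNTER (size);
  return p;
}

/* ...and the threshold is checked at a separate point. */
static void
mock_maybe_gc (void)
{
  if (mock_consing_since_gc > mock_gc_cons_threshold)
    mock_garbage_collect ();
}
```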
5608 5462
5609 @node Pure Space, Cons, Low-level allocation, Allocation of Objects in XEmacs Lisp 5463 @node Cons, Vector, Low-level allocation, Allocation of Objects in XEmacs Lisp
5610 @section Pure Space
5611
5612 Not yet documented.
5613
5614 @node Cons, Vector, Pure Space, Allocation of Objects in XEmacs Lisp
5615 @section Cons 5464 @section Cons
5616 5465
5617 Conses are allocated in standard frob blocks. The only thing to 5466 Conses are allocated in standard frob blocks. The only thing to
5618 note is that conses can be explicitly freed using @code{free_cons()} 5467 note is that conses can be explicitly freed using @code{free_cons()}
5619 and associated functions @code{free_list()} and @code{free_alist()}. This 5468 and associated functions @code{free_list()} and @code{free_alist()}. This
5647 ``vector''.) 5496 ``vector''.)
5648 5497
5649 @node Symbol, Marker, Bit Vector, Allocation of Objects in XEmacs Lisp 5498 @node Symbol, Marker, Bit Vector, Allocation of Objects in XEmacs Lisp
5650 @section Symbol 5499 @section Symbol
5651 5500
5652 Symbols are also allocated in frob blocks. Note that the code 5501 Symbols are also allocated in frob blocks. Symbols in the awful
5653 exists for symbols to be either lrecords (category (c) above) 5502 horrible obarray structure are chained through their @code{next} field.
5654 or simple types (category (b) above), and are lrecords by
5655 default (I think), although there is no good reason for this.
5656
5657 Note that symbols in the awful horrible obarray structure are
5658 chained through their @code{next} field.
5659 5503
5660 Remember that @code{intern} looks up a symbol in an obarray, creating 5504 Remember that @code{intern} looks up a symbol in an obarray, creating
5661 one if necessary. 5505 one if necessary.
5662 5506
5663 @node Marker, String, Symbol, Allocation of Objects in XEmacs Lisp 5507 @node Marker, String, Symbol, Allocation of Objects in XEmacs Lisp
6004 5848
6005 A bunch of tables needed to properly reassign the global pointers are 5849 A bunch of tables needed to properly reassign the global pointers are
6006 then written. They are: 5850 then written. They are:
6007 5851
6008 @enumerate 5852 @enumerate
6009 @item the staticpro array 5853 @item
6010 @item the dumpstruct array 5854 the staticpro array
6011 @item the lrecord_implementation_table array 5855 @item
6012 @item a vector of all the offsets to the objects in the file that include a 5856 the dumpstruct array
5857 @item
5858 the lrecord_implementation_table array
5859 @item
5860 a vector of all the offsets to the objects in the file that include a
6013 description (for faster relocation at reload time) 5861 description (for faster relocation at reload time)
6014 @item the pdump_wired and pdump_wired_list arrays 5862 @item
5863 the pdump_wired and pdump_wired_list arrays
6015 @end enumerate 5864 @end enumerate
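These tables pair the addresses of global variables with offsets into the dump. As a result, relocation at reload time amounts to adding the load address to each stored offset. The sketch below makes several assumptions:

- It uses a hypothetical table type, @code{struct reloc_entry}, which holds the address of a global pointer variable and the offset of its target object.
- It uses a hypothetical @code{relocate_globals} function that performs the addition.

The actual pdump file layout and reload code are more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: the address of a global pointer variable
   (still valid after a restart, since the variable is global) and the
   offset of its target object within the dump file. */
struct reloc_entry {
  void **var;
  size_t offset;
};

/* After the dump has been loaded at BASE, turn each stored offset back
   into a real pointer and store it in the corresponding global. */
static void relocate_globals(char *base, size_t dump_size,
                             const struct reloc_entry *table, size_t count) {
  for (size_t i = 0; i < count; i++) {
    assert(table[i].offset < dump_size);  /* target must lie inside the dump */
    *table[i].var = base + table[i].offset;
  }
}
```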
6016 5865
6017 For each of the arrays we write both the pointer to the variables and 5866 For each of the arrays we write both the pointer to the variables and
6018 the relocated offset of the object they point to. Since these variables 5867 the relocated offset of the object they point to. Since these variables
6019 are global, the pointers are still valid when restarting the program and 5868 are global, the pointers are still valid when restarting the program and
6579 @code{funcall_compiled_function()} calls the real byte-code interpreter 6428 @code{funcall_compiled_function()} calls the real byte-code interpreter
6580 @code{execute_optimized_program()} on the byte-code instructions, which 6429 @code{execute_optimized_program()} on the byte-code instructions, which
6581 are converted into an internal form for faster execution. 6430 are converted into an internal form for faster execution.
6582 6431
6583 When a compiled function is executed for the first time by 6432 When a compiled function is executed for the first time by
6584 @code{funcall_compiled_function()}, or when it is @code{Fpurecopy()}ed 6433 @code{funcall_compiled_function()}, or during the dump phase of building
6585 during the dump phase of building XEmacs, the byte-code instructions are 6434 XEmacs, the byte-code instructions are converted from a
6586 converted from a @code{Lisp_String} (which is inefficient to access, 6435 @code{Lisp_String} (which is inefficient to access, especially in the
6587 especially in the presence of MULE) into a @code{Lisp_Opaque} object 6436 presence of MULE) into a @code{Lisp_Opaque} object containing an array
6588 containing an array of unsigned char, which can be directly executed by 6437 of unsigned char, which can be directly executed by the byte-code
6589 the byte-code interpreter. At this time the byte code is also analyzed 6438 interpreter. At this time the byte code is also analyzed for validity
6590 for validity and transformed into a more optimized form, so that 6439 and transformed into a more optimized form, so that
6591 @code{execute_optimized_program()} can really fly. 6440 @code{execute_optimized_program()} can really fly.
6592 6441
6593 Here are some of the optimizations performed by the internal byte-code 6442 Here are some of the optimizations performed by the internal byte-code
6594 transformer: 6443 transformer:
6595 @enumerate 6444 @enumerate
6600 References to the @code{constants} array that will be used as a Lisp 6449 References to the @code{constants} array that will be used as a Lisp
6601 variable are checked for being correct non-constant (i.e. not @code{t}, 6450 variable are checked for being correct non-constant (i.e. not @code{t},
6602 @code{nil}, or @code{keywordp}) symbols, so that the byte interpreter 6451 @code{nil}, or @code{keywordp}) symbols, so that the byte interpreter
6603 doesn't have to. 6452 doesn't have to.
6604 @item 6453 @item
6605 The maxiumum number of variable bindings in the byte-code is 6454 The maximum number of variable bindings in the byte-code is
6606 pre-computed, so that space on the @code{specpdl} stack can be 6455 pre-computed, so that space on the @code{specpdl} stack can be
6607 pre-reserved once for the whole function execution. 6456 pre-reserved once for the whole function execution.
6608 @item 6457 @item
6609 All byte-code jumps are relative to the current program counter instead 6458 All byte-code jumps are relative to the current program counter instead
6610 of the start of the program, thereby saving a register. 6459 of the start of the program, thereby saving a register.
6706 All of these are very simple and work as expected, calling 6555 All of these are very simple and work as expected, calling
6707 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of 6556 @code{Feval()} or @code{Fprogn()} as necessary and (in the case of
6708 @code{let} and @code{let*}) using @code{specbind()} to create bindings 6557 @code{let} and @code{let*}) using @code{specbind()} to create bindings
6709 and @code{unbind_to()} to undo the bindings when finished. 6558 and @code{unbind_to()} to undo the bindings when finished.
6710 6559
6711 Note that, with the exeption of @code{Fprogn}, these functions are 6560 Note that, with the exception of @code{Fprogn}, these functions are
6712 typically called in real life only in interpreted code, since the byte 6561 typically called in real life only in interpreted code, since the byte
6713 compiler knows how to convert calls to these functions directly into 6562 compiler knows how to convert calls to these functions directly into
6714 byte code. 6563 byte code.
6715 6564
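A toy model of the @code{specbind()}/@code{unbind_to()} discipline used by @code{let} and @code{let*} may help. Each binding pushes the variable's old value onto a stack, and unbinding pops back down to a saved depth, restoring values in reverse order.

This sketch assumes integer-valued variables and a fixed-size stack. @code{toy_specbind}, @code{toy_unbind_to} and the toy @code{specpdl} array are hypothetical stand-ins. The real @code{specpdl} holds Lisp objects as well as other kinds of unwind records.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical binding record: which variable, and its value before binding. */
struct binding {
  int *var;
  int old_value;
};

#define SPECPDL_SIZE 64  /* hypothetical fixed capacity */

static struct binding specpdl[SPECPDL_SIZE];
static size_t specpdl_depth = 0;

/* Save VAR's current value on the stack, then give it NEW_VALUE. */
static void toy_specbind(int *var, int new_value) {
  assert(specpdl_depth < SPECPDL_SIZE);
  specpdl[specpdl_depth].var = var;
  specpdl[specpdl_depth].old_value = *var;
  specpdl_depth++;
  *var = new_value;
}

/* Undo bindings, most recent first, until the stack is back at COUNT. */
static void toy_unbind_to(size_t count) {
  while (specpdl_depth > count) {
    specpdl_depth--;
    *specpdl[specpdl_depth].var = specpdl[specpdl_depth].old_value;
  }
}
```

In this model, a @code{let} form would proceed as follows: save @code{specpdl_depth}, call @code{toy_specbind} once per variable, evaluate the body, and then call @code{toy_unbind_to} with the saved depth.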
6716 @node Catch and Throw, , Simple Special Forms, Evaluation; Stack Frames; Bindings 6565 @node Catch and Throw, , Simple Special Forms, Evaluation; Stack Frames; Bindings
8047 There is a separate Lisp object type for each of these four concepts. 7896 There is a separate Lisp object type for each of these four concepts.
8048 Furthermore, there is logically a @dfn{selected console}, 7897 Furthermore, there is logically a @dfn{selected console},
8049 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}. 7898 @dfn{selected display}, @dfn{selected frame}, and @dfn{selected window}.
8050 Each of these objects is distinguished in various ways, such as being the 7899 Each of these objects is distinguished in various ways, such as being the
8051 default object for various functions that act on objects of that type. 7900 default object for various functions that act on objects of that type.
8052 Note that every containing object rememembers the ``selected'' object 7901 Note that every containing object remembers the ``selected'' object
8053 among the objects that it contains: e.g. not only is there a selected 7902 among the objects that it contains: e.g. not only is there a selected
8054 window, but every frame remembers the last window in it that was 7903 window, but every frame remembers the last window in it that was
8055 selected, and changing the selected frame causes the remembered window 7904 selected, and changing the selected frame causes the remembered window
8056 within it to become the selected window. Similar relationships apply 7905 within it to become the selected window. Similar relationships apply
8057 for consoles to devices and devices to frames. 7906 for consoles to devices and devices to frames.
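The remembered-selection behavior can be modeled with one ``selected'' pointer per container. In this sketch, selecting a window records it in its frame, and selecting a frame makes that frame's remembered window the globally selected window.

Everything here is hypothetical: the types, the @code{toy_select_frame} and @code{toy_select_window} functions, and the globals are invented for illustration. The sketch also omits consoles, which relate to devices in the same way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified containers. */
struct window { int id; };

struct frame {
  struct window *selected_window;   /* last window selected in this frame */
};

struct device {
  struct frame *selected_frame;     /* last frame selected on this device */
};

/* The globally selected device and window. */
static struct device *toy_selected_device = NULL;
static struct window *toy_selected_window = NULL;

/* Selecting a frame makes the window it remembers the selected window. */
static void toy_select_frame(struct device *d, struct frame *f) {
  assert(f->selected_window != NULL);
  d->selected_frame = f;
  toy_selected_device = d;
  toy_selected_window = f->selected_window;
}

/* Selecting a window records it in its frame, then selects that frame. */
static void toy_select_window(struct device *d, struct frame *f,
                              struct window *w) {
  f->selected_window = w;
  toy_select_frame(d, f);
}
```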
8420 @item 8269 @item
8421 Output changes Implemented by @code{redisplay-output.c}, 8270 Output changes Implemented by @code{redisplay-output.c},
8422 @code{redisplay-x.c}, @code{redisplay-msw.c} and @code{redisplay-tty.c} 8271 @code{redisplay-x.c}, @code{redisplay-msw.c} and @code{redisplay-tty.c}
8423 @end enumerate 8272 @end enumerate
8424 8273
8425 Steps 1 and 2 are device-independant and relatively complex. Step 3 is 8274 Steps 1 and 2 are device-independent and relatively complex. Step 3 is
8426 mostly device-dependent. 8275 mostly device-dependent.
8427 8276
8428 Determining the desired display 8277 Determining the desired display
8429 8278
8430 Display attributes are stored in @code{display_line} structures. Each 8279 Display attributes are stored in @code{display_line} structures. Each
8431 @code{display_line} consists of a set of @code{display_block}s and each 8280 @code{display_line} consists of a set of @code{display_block}s and each
8432 @code{display_block} contains a number of @code{rune}s. Generally, 8281 @code{display_block} contains a number of @code{rune}s. Generally,
8433 dynarrs of @code{display_line}s are held by each window representing 8282 dynarrs of @code{display_line}s are held by each window representing
8434 the current display and the desired display. 8283 the current display and the desired display.
8435 8284
8436 The @code{display_line} structures are tighly tied to buffers, which 8285 The @code{display_line} structures are tightly tied to buffers, which
8437 presents a problem for redisplay as this connection is bogus for the 8286 presents a problem for redisplay as this connection is bogus for the
8438 modeline. Hence the @code{display_line} generation routines are 8287 modeline. Hence the @code{display_line} generation routines are
8439 duplicated for generating the modeline. This means that the modeline 8288 duplicated for generating the modeline. This means that the modeline
8440 display code has many bugs that the standard redisplay code does not. 8289 display code has many bugs that the standard redisplay code does not.
8441 8290
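To make the hierarchy concrete, here is a pared-down sketch of the three structures, with plain arrays and counts standing in for dynarrs. All fields shown are assumptions, and the real structures carry much more state.

Keeping both a current and a desired set of lines lets redisplay compare them. The hypothetical @code{display_lines_equal} illustrates this: only lines that differ would need to be output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down versions of the redisplay structures. */
struct rune {
  int ch;        /* character (a real rune can also describe glyphs, etc.) */
  int width;     /* width in pixels */
};

struct display_block {
  struct rune *runes;
  size_t nrunes;
};

struct display_line {
  struct display_block *blocks;
  size_t nblocks;
};

static int runes_equal(const struct rune *a, const struct rune *b) {
  return a->ch == b->ch && a->width == b->width;
}

/* Nonzero if two display lines would look identical on screen. */
static int display_lines_equal(const struct display_line *a,
                               const struct display_line *b) {
  assert(a != NULL && b != NULL);
  if (a->nblocks != b->nblocks)
    return 0;
  for (size_t i = 0; i < a->nblocks; i++) {
    const struct display_block *x = &a->blocks[i], *y = &b->blocks[i];
    if (x->nrunes != y->nrunes)
      return 0;
    for (size_t j = 0; j < x->nrunes; j++)
      if (!runes_equal(&x->runes[j], &y->runes[j]))
        return 0;
  }
  return 1;
}
```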
8764 caching is done by @code{image_instantiate} and is necessary because it 8613 caching is done by @code{image_instantiate} and is necessary because it
8765 is generally possible to display an image-instance in multiple 8614 is generally possible to display an image-instance in multiple
8766 domains. For instance, if we create a Pixmap, we can actually display 8615 domains. For instance, if we create a Pixmap, we can actually display
8767 this in multiple windows - even though we only need a single Pixmap 8616 this in multiple windows - even though we only need a single Pixmap
8768 instance to do this. If caching wasn't done, it would be necessary 8617 instance to do this. If caching wasn't done, it would be necessary
8769 to create image-instances for every displayable occurrance of a glyph - 8618 to create image-instances for every displayable occurrence of a glyph -
8770 and every usage - and this would be extremely memory and CPU intensive. 8619 and every usage - and this would be extremely memory and CPU intensive.
8771 8620
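The effect of this caching can be sketched with a small lookup table. The sketch assumes a cache keyed on a glyph and a device, so that every window on one device shares a single instance. The types and @code{cached_instantiate} are hypothetical, and the key actually used by @code{image_instantiate} may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical image instance; only records what it was made for. */
struct image_instance {
  int glyph_id;
  int device_id;
};

#define CACHE_SIZE 16  /* hypothetical fixed capacity */

struct instance_cache {
  struct image_instance entries[CACHE_SIZE];
  size_t count;
  int created;   /* how many instances were actually built */
};

/* Return the instance for GLYPH_ID on DEVICE_ID, creating it only once.
   Every window on that device reuses the same instance. */
static struct image_instance *
cached_instantiate(struct instance_cache *c, int glyph_id, int device_id) {
  for (size_t i = 0; i < c->count; i++)
    if (c->entries[i].glyph_id == glyph_id
        && c->entries[i].device_id == device_id)
      return &c->entries[i];          /* cache hit: reuse */

  assert(c->count < CACHE_SIZE);
  struct image_instance *inst = &c->entries[c->count++];
  inst->glyph_id = glyph_id;
  inst->device_id = device_id;
  c->created++;                       /* cache miss: build a new one */
  return inst;
}
```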
8772 Widget-glyphs (a.k.a.@: native widgets) are not cached in this way. This is 8621 Widget-glyphs (a.k.a.@: native widgets) are not cached in this way. This is
8773 because widget-glyph image-instances on screen are toolkit windows, and 8622 because widget-glyph image-instances on screen are toolkit windows, and
8774 thus cannot be reused in multiple XEmacs domains. Thus widget-glyphs are 8623 thus cannot be reused in multiple XEmacs domains. Thus widget-glyphs are
8800 tree recursively. 8649 tree recursively.
8801 8650
8802 This has desirable properties: for example, @code{lw_modify_all_widgets}, 8651 This has desirable properties: for example, @code{lw_modify_all_widgets},
8803 which is called from @code{glyphs-x.c}, updates all the properties of a widget 8652 which is called from @code{glyphs-x.c}, updates all the properties of a widget
8804 without having to know what the widget is or what toolkit it is from. 8653 without having to know what the widget is or what toolkit it is from.
8805 Unfortunately this also has hairy properrties such as making the lwlib 8654 Unfortunately this also has hairy properties such as making the lwlib
8806 code quite complex. And of course lwlib has to know at some level what 8655 code quite complex. And of course lwlib has to know at some level what
8807 the widget is and how to set its properties. 8656 the widget is and how to set its properties.
8808 8657
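One way to get this toolkit independence is to give each toolkit a table of function pointers, and to have generic code walk the widget tree and call through that table. The sketch is hypothetical: the @code{toy_widget} type, the two stand-in toolkits and @code{toy_modify_all} are invented, and lwlib's real data structures differ.

It still shows both sides of the trade-off. The generic walk knows nothing about toolkits, while each toolkit's code must still know how to set its own widgets' properties.

```c
#include <assert.h>
#include <stddef.h>

struct toy_widget;

/* Hypothetical per-toolkit operations: only the toolkit-specific code
   knows how to apply a property to its own kind of widget. */
struct toolkit_ops {
  void (*set_sensitive)(struct toy_widget *w, int sensitive);
};

struct toy_widget {
  const struct toolkit_ops *ops;   /* toolkit that implements this widget */
  int sensitive;
  struct toy_widget *children;     /* array of child widgets */
  size_t nchildren;
};

/* Two stand-in toolkits; the counters just record which code ran. */
static int motif_updates = 0, athena_updates = 0;

static void motif_like_set_sensitive(struct toy_widget *w, int sensitive) {
  w->sensitive = sensitive;
  motif_updates++;
}

static void athena_like_set_sensitive(struct toy_widget *w, int sensitive) {
  w->sensitive = sensitive;
  athena_updates++;
}

static const struct toolkit_ops motif_like_ops = { motif_like_set_sensitive };
static const struct toolkit_ops athena_like_ops = { athena_like_set_sensitive };

/* Generic code: walk the widget tree recursively, updating each widget
   through its toolkit's operations without knowing which toolkit it is. */
static void toy_modify_all(struct toy_widget *w, int sensitive) {
  assert(w->ops != NULL && w->ops->set_sensitive != NULL);
  w->ops->set_sensitive(w, sensitive);
  for (size_t i = 0; i < w->nchildren; i++)
    toy_modify_all(&w->children[i], sensitive);
}
```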
8809 @node Specifiers, Menus, Glyphs, Top 8658 @node Specifiers, Menus, Glyphs, Top
8810 @chapter Specifiers 8659 @chapter Specifiers